9  API Reference

Complete reference for every public function and constant in the zulipdata library.

Each entry shows the docstring, a live example, and a test. Each namespace also has a conceptual walkthrough elsewhere in the book; read those for context. This chapter is the API reference.

Sample data

A small pull, reused across every example below. Each layer of the pipeline is bound for direct reuse: sample-pull (raw pull result), sample-messages (flat seq of raw messages), sample-timeline (plain tablecloth view), sample-anon (anonymized), sample-with-time (anonymized + date columns).

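;; Aliases used throughout this chapter. They follow the namespace headings
;; below; tc is assumed to be tablecloth.api.
(require '[scicloj.zulipdata.client :as client]
         '[scicloj.zulipdata.pull :as pull]
         '[scicloj.zulipdata.views :as views]
         '[scicloj.zulipdata.anonymize :as anon]
         '[scicloj.zulipdata.narrative :as nar]
         '[scicloj.zulipdata.graph :as graph]
         '[tablecloth.api :as tc]
         'charred.api 'clojure.set 'clojure.string)
(def sample-channels
  ["clojurecivitas" "scicloj-webpublic" "gratitude" "events"])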
(def sample-pull
  (pull/pull-channels! sample-channels))
(def sample-messages
  (->> sample-pull
       (filter (fn [[k _]] (string? k)))
       (mapcat (fn [[_ r]] (pull/all-messages r)))))
(def sample-timeline
  (views/messages-timeline sample-messages))
(def sample-anon
  (anon/anonymized-timeline sample-messages))
(def sample-with-time
  (nar/with-time-columns sample-anon))

scicloj.zulipdata.client

base-url

API root for the Clojurians Zulip instance. All api-get paths are resolved relative to this prefix.

client/base-url
"https://clojurians.zulipchat.com/api/v1"

api-get

[path]

[path query-params]

Authenticated GET against the Clojurians Zulip API. path is resolved relative to base-url; query-params is an optional map. The request runs inside a small retry loop that waits longer before each successive retry, with a 90-second timeout per request. Returns the JSON body parsed with keyword keys.

(-> (client/api-get "/server_settings")
    :realm_name)
"Clojurians"

With query parameters:

(-> (client/api-get "/messages"
                    {"narrow"     (charred.api/write-json-str
                                   [{:operator "channel" :operand "clojurecivitas"}])
                     "anchor"     "newest"
                     "num_before" 1
                     "num_after"  0})
    :messages count)
1
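
The retry behaviour described for api-get can be pictured with a small sketch. This is not the library's code: with-retries, its option names, and the linear wait schedule are all assumptions made for illustration.

```clojure
;; Hypothetical helper, not part of zulipdata: retry a thunk with growing waits.
(defn with-retries
  "Calls (f) up to max-attempts times. After the n-th failure (1-based) it
  sleeps (* n base-wait-ms) milliseconds and tries again. Rethrows the last
  exception once the attempts are exhausted."
  [f {:keys [max-attempts base-wait-ms]
      :or   {max-attempts 3 base-wait-ms 1000}}]
  (loop [attempt 1]
    ;; recur is not allowed inside try, so the outcome is captured first.
    (let [result (try
                   {:ok (f)}
                   (catch Exception e
                     (if (< attempt max-attempts)
                       {:retry e}
                       (throw e))))]
      (if (contains? result :ok)
        (:ok result)
        (do (Thread/sleep (long (* attempt base-wait-ms)))
            (recur (inc attempt)))))))

;; Usage: a thunk that fails twice and then succeeds on the third call.
(def calls (atom 0))

(with-retries
  (fn []
    (if (< (swap! calls inc) 3)
      (throw (ex-info "transient failure" {}))
      :done))
  {:max-attempts 5 :base-wait-ms 10})
;; => :done
```

Capturing the outcome in a map keeps the loop tail-recursive while still letting the final failure propagate to the caller.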

whoami

[]

Calls /users/me and returns a short summary of the authenticated identity. Use this after configuring credentials to confirm everything works before running a pull.

(client/whoami)
{:email "user138175@clojurians.zulipchat.com",
 :full-name "Daniel Slutsky",
 :user-id 138175,
 :is-bot false,
 :is-admin true,
 :role 100}

get-me

[]

Full /users/me response for the authenticated account. Use whoami for a trimmed summary.

(-> (client/get-me) :user_id integer?)
true

get-streams

[]

Full /streams response — every stream the authenticated user can see. Returns the raw Zulip API map; the stream entries live under :streams.

(-> (client/get-streams) :streams count pos?)
true

get-messages

[{:keys [narrow anchor num-before num-after apply-markdown], :or {anchor "newest", num-before 100, num-after 0, apply-markdown false}}]

Fetch messages matching a narrow. narrow is a vector of maps, e.g. [{:operator "channel" :operand "data-science"}]. anchor may be "newest", "oldest", "first_unread", or a message id. Returns up to num-before + num-after + 1 messages around the anchor.

(-> (client/get-messages
     {:narrow     [{:operator "channel" :operand "clojurecivitas"}]
      :anchor     "newest"
      :num-before 3
      :num-after  0})
    :messages count)
3

scicloj.zulipdata.pull

default-batch-size

Messages requested per window when pull-channel! is called without an explicit :batch-size. 5000 is also Zulip’s per-request cap.

pull/default-batch-size
5000

fetch-window

[stream-name anchor-id batch-size]

Cached forward window. Returns the deref’d page map.

(-> (pull/fetch-window "clojurecivitas" 0 100)
    :messages count)
100

pull-channel!

[stream-name start-anchor-id & {:keys [batch-size refresh], :or {batch-size default-batch-size}}]

Walk forward through stream-name in cached windows, starting at start-anchor-id. Returns {:pages [...], :message-count n}.

Options:

  :batch-size - messages per window (default 5000)
  :refresh - when true, any cached page with found_newest: true is invalidated and re-fetched once, then the walk continues if new full windows appeared. Use it to catch up on messages posted since the last pull.

With :refresh false (default), repeated calls are served entirely from cache.

A complete walk from id zero to the channel’s tip. Result keys:

(-> (pull/pull-channel! "clojurecivitas" 0)
    (select-keys [:pages :message-count])
    keys
    set)
#{:pages :message-count}
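
The forward walk can be sketched against an in-memory channel. Everything here is hypothetical scaffolding (fake-channel, fake-fetch-window, walk-forward); the real function fetches cached pages from Zulip, and its exact stopping rule is assumed rather than taken from the source.

```clojure
;; In-memory stand-in for a channel: message ids 1..12.
(def fake-channel (mapv (fn [id] {:id id}) (range 1 13)))

(defn fake-fetch-window
  "Hypothetical stand-in for a page fetch: up to batch-size messages with
  id >= anchor-id, plus a :found_newest flag once the tip is reached."
  [anchor-id batch-size]
  (let [remaining (drop-while #(< (:id %) anchor-id) fake-channel)]
    {:messages     (vec (take batch-size remaining))
     :found_newest (<= (count remaining) batch-size)}))

(defn walk-forward
  "Collects pages from start-anchor-id until a page reports :found_newest."
  [start-anchor-id batch-size]
  (loop [anchor start-anchor-id
         pages  []]
    (let [page   (fake-fetch-window anchor batch-size)
          pages' (conj pages page)]
      (if (or (:found_newest page) (empty? (:messages page)))
        {:pages         pages'
         :message-count (reduce + (map (comp count :messages) pages'))}
        ;; The next window starts just past the last id seen,
        ;; so consecutive windows never overlap.
        (recur (inc (:id (peek (:messages page)))) pages')))))

(select-keys (walk-forward 0 5) [:message-count])
;; => {:message-count 12}
```

With a batch size of 5 the walk produces three pages (5, 5, and 2 messages), and only the last one carries :found_newest true.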

all-messages

[pull-result]

Flatten the :pages result of pull-channel! into a single sequence of messages, de-duplicating by :id (windows are non-overlapping by construction; this is a redundant safety check).

(let [walk     (pull/pull-channel! "clojurecivitas" 0)
      messages (pull/all-messages walk)]
  (= (count messages) (:message-count walk)))
true
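
A minimal sketch of the flatten-and-deduplicate step, assuming pages are maps with a :messages vector. flatten-pages is a hypothetical name, and the library may deduplicate differently.

```clojure
(defn flatten-pages
  "Concatenates the :messages of every page, keeping the first message seen
  for each :id and preserving the original order."
  [{:keys [pages]}]
  (->> pages
       (mapcat :messages)
       (reduce (fn [{:keys [seen] :as acc} {:keys [id] :as msg}]
                 (if (contains? seen id)
                   acc
                   (-> acc
                       (update :seen conj id)
                       (update :out conj msg))))
               {:seen #{} :out []})
       :out))

;; Two pages that (unusually) overlap on id 2.
(flatten-pages {:pages [{:messages [{:id 1} {:id 2}]}
                        {:messages [{:id 2} {:id 3}]}]})
;; => [{:id 1} {:id 2} {:id 3}]
```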

pull-channels!

[channel-names & {:keys [batch-size refresh parallelism], :as opts, :or {parallelism default-parallelism}}]

Pull a collection of channels by name. Returns a map {channel-name {:pages ... :message-count ... :stream-id ... :first-message-id ...}}.

First-message ids are resolved from /streams. Any unknown channel names are returned under key :not-found as a vector.

Options:

  :batch-size - passed through to pull-channel! (default 5000)
  :refresh - passed through to pull-channel!
  :parallelism - number of channels to pull concurrently (default default-parallelism, currently 8). Pass 1 for fully sequential pulls.

Successful entries are keyed by name; unknown names land in :not-found.

(-> (pull/pull-channels! ["clojurecivitas" "no-such-channel"])
    :not-found)
["no-such-channel"]
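
The split between resolved and unknown names can be sketched as follows. split-known and the :found key are hypothetical; the real function keys successful entries directly by channel name.

```clojure
(defn split-known
  "Partitions requested names into those present in known-names and those
  that are not, preserving the order in which they were requested."
  [known-names requested]
  (let [known? (set known-names)]
    {:found     (filterv known? requested)
     :not-found (vec (remove known? requested))}))

(split-known ["clojurecivitas" "events"]
             ["clojurecivitas" "no-such-channel"])
;; => {:found ["clojurecivitas"], :not-found ["no-such-channel"]}
```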

public-channel-names

[]

Names of all channels visible to the bot that are either public or web-public.

(-> (pull/public-channel-names) count pos?)
true

pull-public-channels!

[& opts]

Convenience: pull every public + web-public channel visible to the bot. Same options as pull-channels!.

Not run here, since a fresh full-corpus pull can take minutes. It pulls every name returned by pull/public-channel-names.

scicloj.zulipdata.views

messages-timeline

[messages]

One row per message — simple-valued fields only. Good for activity-over-time and sender/topic analyses.

(-> (views/messages-timeline sample-messages)
    tc/row-count)
1096

The columns:

(-> sample-timeline tc/column-names sort)
(:channel
 :client
 :content
 :content-length
 :edited
 :id
 :instant
 :last-edit-ts
 :sender
 :sender-id
 :stream-id
 :subject
 :timestamp)

reactions-long

[messages]

One row per (message, reaction). Fields: message-id, stream-id, channel, subject, emoji-name, emoji-code, reaction-type, user-id, message-ts.

(-> (views/reactions-long sample-messages)
    tc/column-names sort)
(:channel
 :emoji-code
 :emoji-name
 :message-id
 :message-ts
 :reaction-type
 :stream-id
 :subject
 :user-id)

edits-long

[messages]

One row per edit event in :edit_history. Note: some edits record only topic/stream moves (no :prev_content); we include prev-content as-is.

(-> (views/edits-long sample-messages)
    tc/column-names sort)
(:channel
 :edit-ts
 :edit-user-id
 :message-id
 :prev-content
 :prev-stream
 :prev-subject
 :stream-id)

scicloj.zulipdata.anonymize

user-key

[sender-id]

Stable, irreversible 16-hex-char identifier for a sender id.

(anon/user-key 42)
"62b81b15a6414d9b"

Stable across calls; nil passes through:

[(= (anon/user-key 42) (anon/user-key 42))
 (anon/user-key nil)]
[true nil]
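
One way to obtain a stable 16-hex-character key is to truncate a cryptographic hash. The sketch below assumes SHA-256 over a salt plus the value; hash-key, the salt, and the choice of hash are illustrative assumptions, so its output will not match the keys shown above.

```clojure
(import '(java.security MessageDigest)
        '(java.nio.charset StandardCharsets))

(defn hash-key
  "Hypothetical stand-in for a stable key: SHA-256 of salt + value,
  hex-encoded and truncated to 16 characters. nil passes through."
  [salt value]
  (when (some? value)
    (let [digest (MessageDigest/getInstance "SHA-256")
          raw    (.digest digest
                          (.getBytes (str salt value) StandardCharsets/UTF_8))]
      (->> raw
           (take 8)                                 ; 8 bytes -> 16 hex chars
           (map #(format "%02x" (bit-and % 0xff)))  ; unsigned byte as two hex digits
           (apply str)))))

;; Same input, same key; nil stays nil.
[(= (hash-key "demo-salt" 42) (hash-key "demo-salt" 42))
 (count (hash-key "demo-salt" 42))
 (hash-key "demo-salt" nil)]
;; => [true 16 nil]
```

Keeping only 64 bits of the digest shortens the key while leaving accidental collisions unlikely at the scale of a single chat corpus.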

subject-key

[subject]

Stable 16-hex-char identifier for a topic/subject string. Wide enough that two distinct subjects almost never collide, so the key uniquely identifies a topic given the full corpus.

(anon/subject-key "channel introductions")
"b61cd3d678d6f0da"

anonymized-timeline

[messages]

One row per message, anonymized. Sender identity and subject are replaced by stable hash keys; message text is replaced by length only. Reaction count is kept; the per-emoji breakdown lives in anonymized-reactions.

(-> (anon/anonymized-timeline sample-messages)
    tc/column-names sort)
(:channel
 :client
 :content-length
 :edited
 :id
 :last-edit-ts
 :reaction-count
 :stream-id
 :subject-key
 :timestamp
 :user-key)

anonymized-reactions

[messages]

One row per (message, reaction). Both the message's subject and the reactor's identity are anonymized; the emoji name is preserved (it captures community sentiment, not message content).

(-> (anon/anonymized-reactions sample-messages)
    tc/column-names sort)
(:channel
 :emoji-code
 :emoji-name
 :message-id
 :message-ts
 :reaction-type
 :reactor-user-key
 :stream-id
 :subject-key)

anonymized-edits

[messages]

One row per edit event. Editor and prior subject are anonymized; prior content is dropped. prev-stream is left as-is — it is a stream id, not personal data.

(-> (anon/anonymized-edits sample-messages)
    tc/column-names sort)
(:channel
 :edit-ts
 :editor-user-key
 :message-id
 :prev-stream
 :prev-subject-key
 :stream-id)

scicloj.zulipdata.narrative

ts->month-date

[ts]

Epoch-second -> first-of-month LocalDate (UTC).

(nar/ts->month-date 1725611765)
#object[java.time.LocalDate 0xe211652 "2024-09-01"]

ts->year-month

[ts]

Epoch-second -> “YYYY-MM” string (UTC).

(nar/ts->year-month 1725611765)
"2024-09"

ts->year

[ts]

Epoch-second -> integer year (UTC).

(nar/ts->year 1725611765)
2024
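
The three epoch-second conversions can be reproduced with java.time alone. The function names below are hypothetical and the sketch is not the library's implementation; it only shows the UTC arithmetic involved.

```clojure
(import '(java.time Instant ZoneOffset LocalDate))

(defn epoch->utc-date
  "Epoch seconds -> LocalDate in UTC."
  ^LocalDate [ts]
  (-> (Instant/ofEpochSecond ts)
      (.atZone ZoneOffset/UTC)
      .toLocalDate))

(defn month-date
  "First day of the month containing ts (UTC)."
  [ts]
  (.withDayOfMonth (epoch->utc-date ts) 1))

(defn year-month
  "\"YYYY-MM\" string for ts (UTC)."
  [ts]
  (let [d (epoch->utc-date ts)]
    (format "%d-%02d" (.getYear d) (.getMonthValue d))))

(defn year
  "Integer year for ts (UTC)."
  [ts]
  (.getYear (epoch->utc-date ts)))

;; 1725611765 is 2024-09-06T08:36:05Z.
[(str (month-date 1725611765)) (year-month 1725611765) (year 1725611765)]
;; => ["2024-09-01" "2024-09" 2024]
```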

with-time-columns

[timeline]

Add :month-date, :year-month, and :year columns to a timeline that has a :timestamp column (epoch seconds).

(-> (nar/with-time-columns sample-anon)
    tc/column-names
    set
    (clojure.set/intersection #{:month-date :year-month :year}))
#{:month-date :year :year-month}

channel-lifecycle

[timeline]

One row per channel: first-date, last-date, total messages, active months, distinct users. Sorted ascending by first-date by default.

(-> (nar/channel-lifecycle sample-with-time)
    tc/column-names sort)
(:active-months :channel :distinct-users :first-date :last-date :total)

channels-by-name-pattern

[timeline regex]

Channels whose name matches regex.

(nar/channels-by-name-pattern sample-with-time #"civitas|gratitude")
["clojurecivitas" "gratitude"]

channels-by-shared-users

[timeline seed-channel & {:keys [share min-msgs top-n], :or {share 0.4, min-msgs 30, top-n 30}}]

Channels where the top-N posters of seed-channel account for at least share of messages, restricted to channels with at least min-msgs total. Returns a sorted vector of channel names.

Use to build a curated cluster around a seed channel by user-overlap rather than name patterns.

The seed channel itself appears in the result whenever it meets the share and min-msgs thresholds. Its own top-N posters usually account for a large share of its activity, and for all of it when the channel has at most N posters.

(set
 (nar/channels-by-shared-users sample-with-time "clojurecivitas"
                               :share 0.5 :min-msgs 5 :top-n 5))
#{"scicloj-webpublic" "clojurecivitas" "events"}
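
The share test can be sketched on plain maps instead of a dataset. The rows, top-posters, and channels-sharing-users below are hypothetical; ties between equally active posters are broken by user key, which is an assumption of this sketch.

```clojure
(defn top-posters
  "Set of the n most active user keys in channel."
  [rows channel n]
  (->> rows
       (filter #(= channel (:channel %)))
       (map :user-key)
       frequencies
       (sort-by (fn [[user cnt]] [(- cnt) user]))
       (take n)
       (map first)
       set))

(defn channels-sharing-users
  "Sorted vector of channels with at least min-msgs messages in which the
  seed channel's top posters wrote at least `share` of the messages."
  [rows seed {:keys [share min-msgs top-n]}]
  (let [seed-users (top-posters rows seed top-n)]
    (->> (group-by :channel rows)
         (keep (fn [[channel msgs]]
                 (let [total     (count msgs)
                       from-seed (count (filter #(seed-users (:user-key %)) msgs))]
                   (when (and (>= total min-msgs)
                              (>= (/ from-seed total) share))
                     channel))))
         sort
         vec)))

(def demo-rows
  (concat (repeat 3 {:channel "a" :user-key "u1"})
          (repeat 1 {:channel "a" :user-key "u2"})
          (repeat 2 {:channel "b" :user-key "u1"})
          (repeat 2 {:channel "b" :user-key "u3"})
          (repeat 3 {:channel "c" :user-key "u3"})))

(channels-sharing-users demo-rows "a" {:share 0.5 :min-msgs 2 :top-n 1})
;; => ["a" "b"]
```

In this toy data the single top poster of "a" wrote 3 of its 4 messages and 2 of the 4 in "b", so both clear the 0.5 threshold, while "c" has no messages from that poster.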

first-posters-of-channel

[timeline channel n]

First n distinct user-keys to post in channel, with their first-post date. Useful for identifying a channel’s earliest contributors.

(-> (nar/first-posters-of-channel sample-with-time "clojurecivitas" 5)
    tc/column-names sort)
(:first-post-date :user-key)

prior-channels-of-newcomers

[timeline channel year-month]

For users whose first-ever post in channel falls in the given year-month (“YYYY-MM”), tally the channels they had posted in before that first post. Returns one row per (prior-channel) with counts of how many newcomers passed through it.

(-> (nar/prior-channels-of-newcomers sample-with-time "clojurecivitas" "2025-10")
    tc/column-names sort)
(:newcomers-touched :prior-channel)

channel-monthly-activity

[timeline]

[timeline channels]

Long-form: one row per (channel, month-date) with :msgs count. Restricted to channels if supplied, else all channels.

(-> (nar/channel-monthly-activity sample-with-time #{"clojurecivitas"})
    tc/column-names sort)
(:channel :month-date :msgs)
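
The long-form count amounts to a frequency table over (channel, month) pairs. The sketch below works on plain maps and uses a :year-month string in place of the :month-date column for brevity; monthly-activity and the demo rows are hypothetical.

```clojure
(defn monthly-activity
  "One map per (channel, year-month) with a :msgs count, optionally
  restricted to a set of channels."
  ([rows] (monthly-activity rows nil))
  ([rows channels]
   (->> rows
        (filter #(or (nil? channels) (contains? channels (:channel %))))
        (map (juxt :channel :year-month))
        frequencies
        (map (fn [[[channel ym] n]] {:channel channel :year-month ym :msgs n}))
        (sort-by (juxt :channel :year-month))
        vec)))

(monthly-activity [{:channel "a" :year-month "2024-09"}
                   {:channel "a" :year-month "2024-09"}
                   {:channel "a" :year-month "2024-10"}
                   {:channel "b" :year-month "2024-09"}]
                  #{"a"})
;; => [{:channel "a", :year-month "2024-09", :msgs 2}
;;     {:channel "a", :year-month "2024-10", :msgs 1}]
```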

scicloj.zulipdata.graph

user-channel-sets

[timeline]

[timeline min-channels]

Map of user-key → set of channels they posted in. Drops users with fewer than min-channels channels (default 1).

(let [u->c (graph/user-channel-sets sample-with-time)
      [_ chans] (first u->c)]
  (set? chans))
true
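
The user-to-channels map is a single reduction over the rows. A minimal sketch on plain maps, with user->channels as a hypothetical name:

```clojure
(defn user->channels
  "Map of user key -> set of channels, dropping users who posted in fewer
  than min-channels channels (default 1)."
  ([rows] (user->channels rows 1))
  ([rows min-channels]
   (->> rows
        (reduce (fn [acc {:keys [user-key channel]}]
                  (update acc user-key (fnil conj #{}) channel))
                {})
        (filter (fn [[_ chans]] (>= (count chans) min-channels)))
        (into {}))))

(user->channels [{:user-key "u1" :channel "a"}
                 {:user-key "u1" :channel "b"}
                 {:user-key "u2" :channel "a"}]
                2)
;; => {"u1" #{"a" "b"}}
```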

channel-comembership-graph

[timeline & {:keys [min-shared], :or {min-shared 1}}]

Undirected weighted graph: nodes are channels, edges weighted by shared user count. min-shared filters out edges with fewer than N shared users.

(let [g (graph/channel-comembership-graph sample-with-time :min-shared 1)]
  (= (set sample-channels) (.vertexSet g)))
true
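
The edge weights of a co-membership graph can be computed without a graph library, which may help when checking results by hand. comembership-edges is a hypothetical helper that returns a plain map rather than a JGraphT graph.

```clojure
(defn comembership-edges
  "Map of #{channel-a channel-b} -> number of users who posted in both,
  keeping only pairs shared by at least min-shared users."
  [user->chans min-shared]
  (->> (vals user->chans)
       (mapcat (fn [chans]
                 (let [sorted (vec (sort chans))]
                   ;; every unordered pair of this user's channels, once
                   (for [i (range (count sorted))
                         j (range (inc i) (count sorted))]
                     #{(sorted i) (sorted j)}))))
       frequencies
       (filter (fn [[_ weight]] (>= weight min-shared)))
       (into {})))

(comembership-edges {"u1" #{"a" "b"}
                     "u2" #{"a" "b" "c"}
                     "u3" #{"c"}}
                    2)
;; => {#{"a" "b"} 2}
```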

user-copresence-graph

[timeline & {:keys [min-shared min-channels], :or {min-shared 3, min-channels 3}}]

Undirected weighted graph: nodes are users, edges weighted by shared channel count. min-shared filters edges; min-channels filters users (active in ≥ N channels).

(let [g (graph/user-copresence-graph sample-with-time
                                     :min-shared 2 :min-channels 2)]
  (pos? (count (.vertexSet g))))
true

migration-graph

[timeline from-set & {:keys [min-users], :or {min-users 3}}]

Directed weighted graph: edge from from-channel to to-channel weighted by the number of users who posted in from-channel and later (after their last post in any from-set channel) posted in to-channel. Excludes self-loops and edges within from-set.

Only users with at least 5 posts in from-set are considered.

Edges from each from-set source to channels users moved to next. With clojurecivitas as the seed, no self-loops:

(let [g (graph/migration-graph sample-with-time #{"clojurecivitas"} :min-users 1)]
  (every? (fn [e] (not= (.getEdgeSource g e) (.getEdgeTarget g e)))
          (.edgeSet g)))
true
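
The edge definition can be traced on a handful of rows. The sketch below follows the description above under stated assumptions: rows are plain maps with :user-key, :channel, and :timestamp, the post threshold is exposed as a hypothetical :min-posts option, and the result is an edge map rather than a graph object.

```clojure
(defn migration-edges
  "Map of [from-channel to-channel] -> number of users who posted in
  from-channel and, after their last post in any from-set channel,
  posted in to-channel."
  [rows from-set {:keys [min-users min-posts] :or {min-users 3 min-posts 5}}]
  (->> (group-by :user-key rows)
       (mapcat
        (fn [[_ posts]]
          (let [in-from (filter #(from-set (:channel %)) posts)]
            (when (and (seq in-from) (>= (count in-from) min-posts))
              (let [last-ts (apply max (map :timestamp in-from))
                    sources (set (map :channel in-from))
                    ;; Posts later than last-ts cannot be in from-set,
                    ;; so no edge stays within from-set.
                    targets (->> posts
                                 (filter #(> (:timestamp %) last-ts))
                                 (map :channel)
                                 set)]
                ;; sets guarantee each user counts once per edge
                (for [s sources, t targets] [s t]))))))
       frequencies
       (filter (fn [[_ n]] (>= n min-users)))
       (into {})))

(def migration-rows
  [{:user-key "u1" :channel "a" :timestamp 1}
   {:user-key "u1" :channel "a" :timestamp 2}
   {:user-key "u1" :channel "b" :timestamp 3}
   {:user-key "u2" :channel "a" :timestamp 1}
   {:user-key "u2" :channel "b" :timestamp 3}   ; before u2's last post in "a"
   {:user-key "u2" :channel "a" :timestamp 5}
   {:user-key "u3" :channel "a" :timestamp 1}
   {:user-key "u3" :channel "a" :timestamp 2}
   {:user-key "u3" :channel "b" :timestamp 4}
   {:user-key "u3" :channel "c" :timestamp 5}])

(migration-edges migration-rows #{"a"} {:min-users 1 :min-posts 2})
;; => {["a" "b"] 2, ["a" "c"] 1}
```

User u2 contributes no edge because the post in "b" predates that user's last post in "a".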

betweenness

[g]

Map node → betweenness centrality score.

(let [g      (graph/channel-comembership-graph sample-with-time)
      scores (graph/betweenness g)]
  (= (.vertexSet g) (set (keys scores))))
true

girvan-newman

[g k]

Vector of node-sets, one per cluster. k is the desired number of clusters.

(let [g        (graph/channel-comembership-graph sample-with-time)
      clusters (graph/girvan-newman g 2)]
  (count clusters))
2

label-propagation

[g]

Vector of node-sets — communities found by label propagation (number of clusters chosen by the algorithm).

(let [g        (graph/channel-comembership-graph sample-with-time)
      clusters (graph/label-propagation g)]
  (every? set? clusters))
true

->cytoscape-elements

[g & {:keys [node-attrs edge-attrs], :or {node-attrs (constantly {}), edge-attrs (constantly {})}}]

Convert a JGraphT graph to a :elements map for kind/cytoscape. node-attrs and edge-attrs are optional fns of the node / [u v weight] returning a map of extra attributes (merged into :data).

(let [g (graph/channel-comembership-graph sample-with-time)
      e (graph/->cytoscape-elements g)]
  (set (keys e)))
#{:nodes :edges}

->dot

[g & {:keys [directed node-label edge-label name], :or {directed true, node-label str, edge-label (constantly nil), name "G"}}]

Render a JGraphT graph as Graphviz DOT source. directed chooses between digraph/graph. node-label and edge-label are optional fns producing label strings.

(let [g   (graph/channel-comembership-graph sample-with-time)
      dot (graph/->dot g :directed false)]
  (and (string? dot)
       (clojure.string/starts-with? dot "graph ")))
true
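
For reference, the DOT output has a simple shape that can be generated from an edge map. edges->dot is a hypothetical helper, and the label attribute layout is an assumption; the library's output may differ in detail.

```clojure
(require '[clojure.string :as str])

(defn edges->dot
  "Renders a map of [u v] -> weight as Graphviz DOT source."
  [edges {:keys [directed graph-name] :or {directed true graph-name "G"}}]
  (let [arrow (if directed " -> " " -- ")]
    (str (if directed "digraph " "graph ") graph-name " {\n"
         (str/join
          (for [[[u v] weight] (sort edges)]
            ;; pr-str quotes node names so names with spaces stay valid DOT
            (str "  " (pr-str u) arrow (pr-str v)
                 " [label=" (pr-str (str weight)) "];\n")))
         "}\n")))

(print (edges->dot {["clojurecivitas" "events"] 2} {:directed false}))
;; graph G {
;;   "clojurecivitas" -- "events" [label="2"];
;; }
```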
source: notebooks/zulipdata_book/api_reference.clj