8 Graph views
scicloj.zulipdata.graph turns the corpus into graph-shaped structures backed by JGraphT. Two graphs cover most analyses:
Channel co-membership — undirected weighted; nodes are channel names, edges weighted by the number of users who post in both endpoints.
User co-presence — undirected weighted; nodes are anonymized user-keys, edges weighted by the number of channels the two users share.
A third — migration — is directed and answers “after these users last posted in from-set, where did they show up next?”
The namespace also provides utilities for community detection (Girvan-Newman, Label Propagation), betweenness centrality, and conversion to shapes that kind/cytoscape and kind/graphviz know how to render.
(ns zulipdata-book.graph
(:require
;; Zulipdata pull -- paginated, cached channel history
[scicloj.zulipdata.pull :as pull]
;; Zulipdata anonymize -- HMAC-keyed anonymized projections
[scicloj.zulipdata.anonymize :as anon]
;; Zulipdata narrative -- date columns, lifecycles, newcomer tracking
[scicloj.zulipdata.narrative :as nar]
;; Zulipdata graph -- co-membership / co-presence graphs
[scicloj.zulipdata.graph :as graph]
;; Kindly -- notebook rendering protocol
[scicloj.kindly.v4.kind :as kind]
;; Tablecloth -- dataset manipulation
[tablecloth.api :as tc]))
A multi-channel sample
A small set of web-public channels — small enough to render cleanly, large enough to expose non-trivial graph structure.
(def sample-channels
["clojurecivitas" "scicloj-webpublic" "gratitude" "events"
"calva" "clojure-uk" "clojure-europe" "news-and-articles"])
(def timeline
(->> (pull/pull-channels! sample-channels)
(filter (fn [[k _]] (string? k)))
(mapcat (fn [[_ r]] (pull/all-messages r)))
anon/anonymized-timeline
nar/with-time-columns))
timeline
_unnamed [2359 14]:
| :last-edit-ts | :client | :reaction-count | :channel | :user-key | :stream-id | :edited | :content-length | :id | :subject-key | :timestamp | :month-date | :year-month | :year |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | Internal | 0 | clojure-uk | 30f24f0b44b99e93 | 151222 | false | 27 | 147403047 | 6777fcbe881b91ed | 1541800305 | 2018-11-01 | 2018-11 | 2018 |
| | ZulipMobile | 0 | clojure-uk | 59c5550a8a9f258f | 151222 | false | 8 | 147403098 | a621c785f8deecbf | 1541800328 | 2018-11-01 | 2018-11 | 2018 |
| | ZulipMobile | 0 | clojure-uk | 59c5550a8a9f258f | 151222 | false | 7 | 147422253 | a621c785f8deecbf | 1541832580 | 2018-11-01 | 2018-11 | 2018 |
| | website | 0 | clojure-uk | 7066f94b066c86cf | 151222 | false | 7 | 147542850 | a621c785f8deecbf | 1542047347 | 2018-11-01 | 2018-11 | 2018 |
| | ZulipMobile | 0 | clojure-uk | 59c5550a8a9f258f | 151222 | false | 8 | 147544737 | a621c785f8deecbf | 1542049359 | 2018-11-01 | 2018-11 | 2018 |
| | ZulipMobile | 0 | clojure-uk | 59c5550a8a9f258f | 151222 | false | 60 | 147554684 | 80ea9bf4e69d1493 | 1542060998 | 2018-11-01 | 2018-11 | 2018 |
| | ZulipMobile | 0 | clojure-uk | 59c5550a8a9f258f | 151222 | false | 12 | 147572603 | a621c785f8deecbf | 1542089372 | 2018-11-01 | 2018-11 | 2018 |
| | ZulipMobile | 0 | clojure-uk | 6b7c27d0b84e2cc4 | 151222 | false | 12 | 147575275 | a621c785f8deecbf | 1542094392 | 2018-11-01 | 2018-11 | 2018 |
| | ZulipMobile | 0 | clojure-uk | 6b7c27d0b84e2cc4 | 151222 | false | 27 | 147575328 | a621c785f8deecbf | 1542094451 | 2018-11-01 | 2018-11 | 2018 |
| | ZulipElectron | 0 | clojure-uk | 392433c10fddd53e | 151222 | false | 6 | 147575905 | a621c785f8deecbf | 1542095634 | 2018-11-01 | 2018-11 | 2018 |
| … | … | … | … | … | … | … | … | … | … | … | … | … | … |
| | Apache-HttpClient | 0 | events | d19c7e5c3106f475 | 262224 | false | 941 | 581605887 | c73d6239136b12de | 1774442659 | 2026-03-01 | 2026-03 | 2026 |
| | Apache-HttpClient | 1 | events | d19c7e5c3106f475 | 262224 | false | 1782 | 581834590 | c860442b6b6b2ee8 | 1774514202 | 2026-03-01 | 2026-03 | 2026 |
| 1774554234 | Apache-HttpClient | 1 | events | 23f35a98369b24ab | 262224 | true | 788 | 582024774 | c7d84a00b1e33ecb | 1774554138 | 2026-03-01 | 2026-03 | 2026 |
| | Apache-HttpClient | 0 | events | d19c7e5c3106f475 | 262224 | false | 843 | 582822755 | 7462ad1a80605ec3 | 1774968780 | 2026-03-01 | 2026-03 | 2026 |
| | Apache-HttpClient | 0 | events | 4dd19d9fa5a81a0f | 262224 | false | 592 | 583710500 | 3ee35fa470d40352 | 1775418564 | 2026-04-01 | 2026-04 | 2026 |
| | Apache-HttpClient | 0 | events | d19c7e5c3106f475 | 262224 | false | 844 | 584035929 | a716b082c7a51ed2 | 1775578980 | 2026-04-01 | 2026-04 | 2026 |
| | Apache-HttpClient | 0 | events | d19c7e5c3106f475 | 262224 | false | 844 | 585500688 | 94b35beb5f01d503 | 1776192547 | 2026-04-01 | 2026-04 | 2026 |
| | ZulipElectron | 0 | events | 2a7920fa288b6ac5 | 262224 | false | 9 | 585768949 | 94b35beb5f01d503 | 1776292144 | 2026-04-01 | 2026-04 | 2026 |
| | ZulipElectron | 0 | events | 2b13ca51d92878c3 | 262224 | false | 15 | 586110306 | 94b35beb5f01d503 | 1776425996 | 2026-04-01 | 2026-04 | 2026 |
| | website | 0 | events | 491c7331968ae4b2 | 262224 | false | 19 | 586168194 | 94b35beb5f01d503 | 1776441936 | 2026-04-01 | 2026-04 | 2026 |
| 1776448026 | website | 0 | events | d19c7e5c3106f475 | 262224 | true | 47 | 586186641 | 94b35beb5f01d503 | 1776448017 | 2026-04-01 | 2026-04 | 2026 |
Users to channels
The channel co-membership and user co-presence constructors both build on user-channel-sets, which returns a map from user-key to the set of channels that user has posted in. Its min-channels setting is a lower bound: users who posted in fewer than min-channels channels are dropped.
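To make the shape of that map concrete, here is a minimal plain-Clojure sketch. It assumes messages are maps with :user-key and :channel keys, as in the timeline columns above; user-channel-sets-sketch and toy-messages are hypothetical names, not part of scicloj.zulipdata.graph.

```clojure
;; Hypothetical helper, NOT scicloj.zulipdata.graph/user-channel-sets:
;; a plain-Clojure approximation of the user -> channel-set map.
;; Assumes each message is a map with :user-key and :channel keys.
(defn user-channel-sets-sketch
  "Map each :user-key to the set of channels it posted in, keeping only
  users seen in at least `min-channels` distinct channels."
  [messages min-channels]
  (->> messages
       ;; accumulate a set of channel names per poster
       (reduce (fn [acc {:keys [user-key channel]}]
                 (update acc user-key (fnil conj #{}) channel))
               {})
       ;; apply the lower bound on distinct channels
       (into {} (filter (fn [[_ chans]] (>= (count chans) min-channels))))))

(def toy-messages
  [{:user-key "u1" :channel "events"}
   {:user-key "u1" :channel "calva"}
   {:user-key "u1" :channel "events"}   ; repeat posts do not double-count
   {:user-key "u2" :channel "gratitude"}])

(user-channel-sets-sketch toy-messages 1)
;; => {"u1" #{"calva" "events"}, "u2" #{"gratitude"}}

(user-channel-sets-sketch toy-messages 2)
;; => {"u1" #{"calva" "events"}}
```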
(def u->chans (graph/user-channel-sets timeline))
(count u->chans)
96
The first five entries:
(->> u->chans (take 5) (into {}))
{"81c90f8fd6744027" #{"news-and-articles"},
"71093247047f6c02" #{"clojure-europe" "events"},
"81974a7a8a884239" #{"events"},
"13c6ca9ef033c774"
#{"news-and-articles"
"gratitude"
"scicloj-webpublic"
"clojurecivitas"},
"2dbe557be94bc5b0" #{"clojure-europe" "clojurecivitas"}}
The distribution of the number of channels each user posts in (channel count to number of users):
(->> u->chans
vals
(map count)
frequencies
(into (sorted-map)))
{1 69, 2 10, 3 8, 4 5, 5 2, 7 1, 8 1}
Channel co-membership
channel-comembership-graph returns a JGraphT DefaultUndirectedWeightedGraph. Nodes are channels, and edges are weighted by the number of shared users. The :min-shared option keeps only edges whose endpoints share at least that many users.
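The weighting can be sketched directly from a user-to-channel-set map: every user contributes one count to each pair of channels they posted in. comembership-weights is a hypothetical helper that returns plain pair-to-count data rather than a JGraphT graph, and toy-u->chans stands in for u->chans.

```clojure
;; Hypothetical sketch, NOT the library's implementation: derive
;; co-membership edge weights from a user -> channel-set map.
(defn comembership-weights
  "Map #{channel-a channel-b} -> number of users who posted in both,
  keeping only pairs with at least `min-shared` shared users."
  [user->channels min-shared]
  (->> (vals user->channels)
       ;; every unordered pair of channels one user posted in
       (mapcat (fn [chans]
                 (let [cs (vec (sort chans))]
                   (for [i (range (count cs))
                         j (range (inc i) (count cs))]
                     #{(cs i) (cs j)}))))
       ;; each occurrence of a pair is one shared user
       frequencies
       (into {} (filter (fn [[_ n]] (>= n min-shared))))))

(def toy-u->chans
  {"u1" #{"events" "calva"}
   "u2" #{"events" "calva" "gratitude"}
   "u3" #{"gratitude"}})

(comembership-weights toy-u->chans 1)
;; => {#{"calva" "events"} 2, #{"calva" "gratitude"} 1, #{"events" "gratitude"} 1}

(comembership-weights toy-u->chans 2)
;; => {#{"calva" "events"} 2}
```

Because each user adds at most one count per pair, the weight is exactly the number of shared users, and a threshold of 1 keeps every pair with any overlap.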
(def co-channel
(graph/channel-comembership-graph timeline :min-shared 1))
(.vertexSet co-channel)
#{"clojurecivitas" "news-and-articles" "gratitude" "clojure-europe" "clojure-uk" "events" "calva" "scicloj-webpublic"}
(count (.edgeSet co-channel))
28
The graph is complete: every pair of channels shares at least one user, so all 8 × 7 / 2 = 28 possible pairs become edges.
The edge-weight table:
(->> (.edgeSet co-channel)
(map (fn [e]
{:from (.getEdgeSource co-channel e)
:to (.getEdgeTarget co-channel e)
:weight (.getEdgeWeight co-channel e)}))
(sort-by :weight >)
tc/dataset)
_unnamed [28 3]:
| :from | :to | :weight |
|---|---|---|
| clojurecivitas | scicloj-webpublic | 10.0 |
| news-and-articles | scicloj-webpublic | 8.0 |
| clojure-europe | events | 7.0 |
| gratitude | scicloj-webpublic | 7.0 |
| events | scicloj-webpublic | 7.0 |
| calva | events | 7.0 |
| clojurecivitas | news-and-articles | 6.0 |
| clojure-europe | clojurecivitas | 6.0 |
| events | gratitude | 6.0 |
| clojurecivitas | gratitude | 5.0 |
| … | … | … |
| clojure-uk | events | 4.0 |
| calva | news-and-articles | 4.0 |
| calva | clojurecivitas | 4.0 |
| clojurecivitas | events | 4.0 |
| gratitude | news-and-articles | 3.0 |
| clojure-uk | gratitude | 3.0 |
| events | news-and-articles | 3.0 |
| clojure-europe | news-and-articles | 2.0 |
| clojure-uk | news-and-articles | 1.0 |
| clojure-uk | clojurecivitas | 1.0 |
| clojure-uk | scicloj-webpublic | 1.0 |
User co-presence
user-copresence-graph flips the construction: nodes are users, and edges are weighted by the number of channels the two users share. The settings used here, :min-shared 3 and :min-channels 3, are also the defaults. Together they keep only the densely connected core, namely users active in at least three channels, paired only when they share at least three of them.
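The two thresholds can be sketched the same way, with the roles of users and channels swapped. copresence-weights and toy-users are hypothetical; the sketch returns pair-to-count data only and says nothing about whether the library keeps users that end up with no edges.

```clojure
;; Hypothetical sketch, NOT graph/user-copresence-graph: apply the
;; :min-channels filter to users, then the :min-shared filter to pairs.
(require '[clojure.set :as set])

(defn copresence-weights
  "Map #{user-a user-b} -> number of shared channels, after dropping users
  in fewer than `min-channels` channels and pairs sharing fewer than
  `min-shared` channels."
  [user->channels min-shared min-channels]
  (let [core (->> user->channels
                  (filter (fn [[_ chans]] (>= (count chans) min-channels)))
                  (sort-by key)
                  vec)]
    (into {}
          (for [i (range (count core))
                j (range (inc i) (count core))
                :let [[ua ca] (core i)
                      [ub cb] (core j)
                      shared (count (set/intersection ca cb))]
                :when (>= shared min-shared)]
            [#{ua ub} shared]))))

(def toy-users
  {"a" #{"c1" "c2" "c3" "c4"}
   "b" #{"c1" "c2" "c3"}
   "c" #{"c2" "c3" "c4"}
   "d" #{"c1" "c5"}})                   ; only 2 channels: dropped

(copresence-weights toy-users 3 3)
;; => {#{"a" "b"} 3, #{"a" "c"} 3}     ; b and c share only c2, c3
```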
(def co-user
(graph/user-copresence-graph timeline :min-shared 3 :min-channels 3))
Node and edge counts:
{:nodes (count (.vertexSet co-user))
:edges (count (.edgeSet co-user))}
{:nodes 17, :edges 48}
Migration: where did people go next?
migration-graph is directed. For each user with at least five posts in from-set, it looks at every channel they posted in after their last from-set post and adds an edge from each from-set channel they used to each later destination. Edges supported by fewer than :min-users users are dropped.
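The rule described above can be sketched over plain message maps. This assumes each message has :user-key, :channel, and a numeric :timestamp, as in the timeline columns; migration-edges and its min-posts parameter are hypothetical, and the five-post threshold from the text is passed in explicitly rather than hard-coded.

```clojure
;; Hypothetical sketch, NOT graph/migration-graph.
;; Assumes messages are maps with :user-key, :channel, :timestamp.
(defn migration-edges
  "Map [source destination] -> number of users who, after their last post
  in `from-set`, posted in `destination`."
  [messages from-set min-posts min-users]
  (->> (group-by :user-key messages)
       vals
       (mapcat
        (fn [msgs]
          (let [from-msgs (filter #(contains? from-set (:channel %)) msgs)]
            (when (>= (count from-msgs) min-posts)
              (let [last-ts (apply max (map :timestamp from-msgs))
                    sources (set (map :channel from-msgs))
                    dests   (->> msgs
                                 (filter #(> (:timestamp %) last-ts))
                                 (map :channel)
                                 set)]
                ;; one edge per (source used, later destination) for this user
                (for [s sources, d dests] [s d]))))))
       frequencies
       (into {} (filter (fn [[_ n]] (>= n min-users))))))

(def toy-timeline
  (concat
   ;; u1: five clojurecivitas posts, then gratitude and events
   (for [t (range 5)] {:user-key "u1" :channel "clojurecivitas" :timestamp t})
   [{:user-key "u1" :channel "gratitude" :timestamp 10}
    {:user-key "u1" :channel "events" :timestamp 11}
    ;; u2: only two clojurecivitas posts, below a five-post threshold
    {:user-key "u2" :channel "clojurecivitas" :timestamp 1}
    {:user-key "u2" :channel "clojurecivitas" :timestamp 2}
    {:user-key "u2" :channel "gratitude" :timestamp 3}]))

(migration-edges toy-timeline #{"clojurecivitas"} 5 1)
;; => {["clojurecivitas" "gratitude"] 1, ["clojurecivitas" "events"] 1}
```

Because sources and destinations are collected into sets per user, each user adds at most one count to an edge, so the counts are numbers of users and a :min-users style threshold applies to them directly.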
Taking clojurecivitas as the seed shows where clojurecivitas posters subsequently appeared.
(def migration
(graph/migration-graph timeline #{"clojurecivitas"} :min-users 1))
(->> (.edgeSet migration)
(map (fn [e]
{:from (.getEdgeSource migration e)
:to (.getEdgeTarget migration e)
:weight (.getEdgeWeight migration e)}))
(sort-by :weight >)
tc/dataset)
_unnamed [1 3]:
| :from | :to | :weight |
|---|---|---|
| clojurecivitas | gratitude | 1.0 |
Centrality
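betweenness returns a map from node to its betweenness centrality score: for every pair of other nodes, the fraction of shortest paths between them that pass through the node, summed over all such pairs. The scores are not normalized, so they can exceed 1. The top five: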
(->> (graph/betweenness co-channel)
(sort-by val >)
(take 5)
(into (array-map)))
{"clojure-uk" 8.0,
"news-and-articles" 6.833333333333334,
"clojurecivitas" 0.3333333333333333,
"gratitude" 0.0,
"clojure-europe" 0.0}
The graph is a clique (every pair of channels is directly connected), yet betweenness is not uniformly zero. JGraphT treats edge weights as distances when computing shortest paths. With weight equal to shared-user count, a heavily shared pair has a long direct edge, and a two-hop detour through a thin-overlap channel can be shorter: the direct clojurecivitas to scicloj-webpublic edge has weight 10, while the route through clojure-uk costs 1 + 1 = 2. The high scorers here are channels with thin overlap to most others; they sit on the cheap detours.
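The detour effect can be checked on three channels from the edge-weight table above. shortest-distance is a hypothetical, minimal Dijkstra over a map from channel pairs to weights; its cost argument decides how a weight becomes a distance. The reciprocal 1/w shown second is one common alternative when heavy overlap should mean closeness; whether graph/betweenness exposes such an option is not covered here.

```clojure
;; Hypothetical minimal Dijkstra, for illustration only.
;; `edges` maps #{a b} -> weight; `cost` turns a weight into a distance.
(defn shortest-distance
  [edges cost src dst]
  (let [nbrs (reduce-kv (fn [acc e w]
                          (let [[a b] (seq e)]
                            (-> acc
                                (update a assoc b (cost w))
                                (update b assoc a (cost w)))))
                        {} edges)]
    (loop [dist {src 0.0}, done #{}]
      ;; tentative distances of nodes not yet finalized
      (let [frontier (remove (comp done key) dist)]
        (if (empty? frontier)
          (get dist dst)                ; nil when dst is unreachable
          (let [[u du] (apply min-key val frontier)]
            (if (= u dst)
              du
              ;; relax every edge leaving u
              (recur (reduce-kv (fn [d v w]
                                  (if (< (+ du w) (get d v Double/POSITIVE_INFINITY))
                                    (assoc d v (+ du w))
                                    d))
                                dist (get nbrs u))
                     (conj done u)))))))))

;; Weights copied from the edge-weight table above.
(def triangle
  {#{"clojurecivitas" "scicloj-webpublic"} 10.0
   #{"clojurecivitas" "clojure-uk"}        1.0
   #{"clojure-uk" "scicloj-webpublic"}     1.0})

;; Weight used directly as distance: the detour via clojure-uk wins.
(shortest-distance triangle identity "clojurecivitas" "scicloj-webpublic")
;; => 2.0

;; Reciprocal weights: the heavily shared direct edge becomes shortest.
(shortest-distance triangle #(/ 1.0 %) "clojurecivitas" "scicloj-webpublic")
;; => 0.1
```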
(boolean (some pos? (vals (graph/betweenness co-channel))))
true
Communities
Two algorithms, both returning a vector of node-sets — one set per cluster.
Girvan-Newman requires you to choose k, the desired number of clusters. On this small graph, k = 2 separates calva from the other seven channels.
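For intuition about what k controls, here is a hypothetical, unweighted sketch of the Girvan-Newman procedure: compute edge betweenness (Brandes-style accumulation from every source), delete the highest-scoring edge, and stop once there are at least k connected components. It ignores edge weights, so it need not reproduce the split above, and none of its names belong to the library.

```clojure
;; Hypothetical, unweighted Girvan-Newman sketch (NOT graph/girvan-newman).
;; The graph is a map node -> set of neighbours; weights are ignored.
(defn edge-credits-from
  "Brandes-style edge betweenness contributions from source node s."
  [adj s]
  (let [;; forward BFS: visit order, shortest-path counts, predecessors
        [order sigma preds]
        (loop [queue (conj clojure.lang.PersistentQueue/EMPTY s)
               dist {s 0}, sigma {s 1}, preds {}, order []]
          (if-let [v (peek queue)]
            (let [[queue dist sigma preds]
                  (reduce (fn [[q d sg p] w]
                            (cond
                              (not (contains? d w)) ; first time w is reached
                              [(conj q w) (assoc d w (inc (d v)))
                               (assoc sg w (sg v)) (assoc p w [v])]
                              (= (d w) (inc (d v))) ; another shortest route
                              [q d (update sg w + (sg v)) (update p w conj v)]
                              :else [q d sg p]))
                          [(pop queue) dist sigma preds]
                          (adj v))]
              (recur queue dist sigma preds (conj order v)))
            [order sigma preds]))]
    ;; backward pass: push dependencies from the farthest nodes inwards
    (loop [ws (rseq order), delta {}, credit {}]
      (if-let [w (first ws)]
        (let [[delta credit]
              (reduce (fn [[dl cr] v]
                        (let [c (* (/ (sigma v) (sigma w)) (inc (get dl w 0)))]
                          [(update dl v (fnil + 0) c)
                           (update cr #{v w} (fnil + 0) c)]))
                      [delta credit]
                      (preds w))]
          (recur (next ws) delta credit))
        credit))))

(defn edge-betweenness [adj]
  ;; each undirected pair is counted from both ends; the argmax is unchanged
  (apply merge-with + {} (map #(edge-credits-from adj %) (keys adj))))

(defn components [adj]
  (loop [todo (set (keys adj)), comps []]
    (if-let [s (first todo)]
      (let [component (loop [seen #{s}, stack [s]]
                        (if-let [v (peek stack)]
                          (let [fresh (remove seen (adj v))]
                            (recur (into seen fresh) (into (pop stack) fresh)))
                          seen))]
        (recur (reduce disj todo component) (conj comps component)))
      comps)))

(defn girvan-newman-sketch [adj k]
  (loop [adj adj]
    (let [comps (components adj)
          eb    (edge-betweenness adj)]
      (if (or (>= (count comps) k) (empty? eb))
        comps
        ;; sort first so ties are broken the same way on every run
        (let [[e _] (apply max-key val (sort-by #(vec (sort (key %))) eb))
              [a b] (seq e)]
          (recur (-> adj (update a disj b) (update b disj a))))))))

;; Two triangles joined by the bridge c--d.
(def two-triangles
  {"a" #{"b" "c"}, "b" #{"a" "c"}, "c" #{"a" "b" "d"},
   "d" #{"c" "e" "f"}, "e" #{"d" "f"}, "f" #{"d" "e"}})

(set (girvan-newman-sketch two-triangles 2))
;; => #{#{"a" "b" "c"} #{"d" "e" "f"}}
```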
(graph/girvan-newman co-channel 2)
[#{"clojure-uk"
"news-and-articles"
"gratitude"
"scicloj-webpublic"
"clojure-europe"
"clojurecivitas"
"events"}
#{"calva"}]
(count (graph/girvan-newman co-channel 2))
2
Label propagation chooses the number of clusters itself. On a small, dense graph it often produces a single cluster, which itself suggests that the graph has no strong community structure.
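A hypothetical sketch of weighted label propagation shows why the number of clusters is not an input: every node starts with its own label and repeatedly adopts the label with the most neighbour weight, so the clusters are whatever labels survive. The graph is a map from node to {neighbour weight}; ties are broken by the smallest label here, whereas the standard algorithm breaks them at random.

```clojure
;; Hypothetical sketch, NOT graph/label-propagation.
;; Fixed visiting order, in-place updates, deterministic tie-breaking.
(defn label-propagation-sketch [wadj max-rounds]
  (let [nodes (sort (keys wadj))]
    (loop [labels (zipmap nodes nodes), round 0]
      (let [updated
            (reduce (fn [lbl v]
                      ;; total edge weight behind each neighbouring label
                      (let [votes (reduce-kv (fn [acc u w]
                                               (update acc (lbl u) (fnil + 0) w))
                                             {} (wadj v))]
                        (if (empty? votes)
                          lbl
                          (let [best (apply max (vals votes))]
                            ;; among tied labels, keep the smallest
                            (assoc lbl v (first (sort (for [[l s] votes
                                                            :when (= s best)]
                                                        l))))))))
                    labels nodes)]
        (if (or (= updated labels) (>= round max-rounds))
          ;; group nodes by final label: one set per community
          (->> updated (group-by val) vals (map #(set (map key %))) set)
          (recur updated (inc round)))))))

;; Two strongly weighted triangles joined by a weak c--d edge.
(def toy-wadj
  {"a" {"b" 5 "c" 5}
   "b" {"a" 5 "c" 5}
   "c" {"a" 5 "b" 5 "d" 1}
   "d" {"c" 1 "e" 5 "f" 5}
   "e" {"d" 5 "f" 5}
   "f" {"d" 5 "e" 5}})

(label-propagation-sketch toy-wadj 10)
;; => #{#{"a" "b" "c"} #{"d" "e" "f"}}
```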
(graph/label-propagation co-channel)
[#{"clojure-uk"
"news-and-articles"
"gratitude"
"scicloj-webpublic"
"clojure-europe"
"clojurecivitas"
"calva"
"events"}]
(count (graph/label-propagation co-channel))
1
Rendering: kind/cytoscape
->cytoscape-elements converts a JGraphT graph to the :elements shape that kind/cytoscape consumes. Optional :node-attrs and :edge-attrs functions add extra attributes to each :data map, which is useful for colour-coding nodes by community or scaling edges by weight.
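For reference, Cytoscape.js accepts elements as {:nodes [...] :edges [...]}, where each element carries a :data map and edges name their :source and :target. The sketch below builds that shape from a plain pair-to-weight map; edges->cytoscape is hypothetical, and the library's actual output shape and the arguments its :node-attrs and :edge-attrs functions receive may differ. Storing :weight in each edge's :data is what a mapData(weight, ...) style expression reads.

```clojure
;; Hypothetical sketch, NOT graph/->cytoscape-elements.
;; `edges` maps #{a b} -> weight; attribute fns return maps merged into :data.
(defn edges->cytoscape
  [edges {:keys [node-attrs edge-attrs]
          :or   {node-attrs (constantly {}) edge-attrs (constantly {})}}]
  (let [nodes (sort (distinct (mapcat seq (keys edges))))]
    {:nodes (vec (for [n nodes]
                   {:data (merge {:id n} (node-attrs n))}))
     :edges (vec (for [[e w] (sort-by (comp vec sort key) edges)
                       :let [[a b] (sort e)]]
                   {:data (merge {:id (str a "--" b) :source a :target b :weight w}
                                 (edge-attrs [a b w]))}))}))

(def elems
  (edges->cytoscape {#{"calva" "events"} 7.0, #{"clojure-uk" "events"} 4.0}
                    {:node-attrs (fn [n] {:community (if (= n "calva") 1 0)})}))

(:nodes elems)
;; => [{:data {:id "calva", :community 1}}
;;     {:data {:id "clojure-uk", :community 0}}
;;     {:data {:id "events", :community 0}}]
```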
^{:kindly/options {:element/style {:height "500px" :width "100%"}}}
(let [weights (map #(.getEdgeWeight co-channel %) (.edgeSet co-channel))
w-min (apply min weights)
w-max (apply max weights)]
(kind/cytoscape
{:elements (graph/->cytoscape-elements co-channel)
:style [{:selector "node"
:css {:label "data(id)"
:content "data(id)"
:font-size 9}}
{:selector "edge"
:css {:width (str "mapData(weight, " w-min ", " w-max ", 1, 8)")}}]
:layout {:name "cose"}}))
Rendering: kind/graphviz
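->dot returns a Graphviz DOT string; pass it straight to kind/graphviz for a static rendering. :directed, :node-label, and :edge-label are optional keyword arguments. In the call below, the :edge-label function takes the edge weight from the third position of its argument.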
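DOT itself is plain text, so the idea behind ->dot can be sketched as string building. edges->dot is hypothetical: it takes a map from [source target] pairs to weights, mirrors the :directed and :edge-label options, and passes the label function a [source target weight] vector, matching the (fn [[_ _ w]] ...) label function in the call that follows. The library's own output formatting may differ.

```clojure
;; Hypothetical sketch, NOT graph/->dot.
;; `edges` maps [source target] -> weight.
(require '[clojure.string :as str])

(defn edges->dot
  [edges {:keys [directed edge-label] :or {directed false}}]
  (let [;; quote an identifier, escaping embedded double quotes
        q    (fn [x] (str "\"" (str/replace (str x) "\"" "\\\"") "\""))
        conn (if directed " -> " " -- ")
        body (for [[[a b] w] (sort-by key edges)]
               (str "  " (q a) conn (q b)
                    (when edge-label
                      (str " [label=" (q (edge-label [a b w])) "]"))
                    ";"))]
    (str (if directed "digraph" "graph") " G {\n"
         (str/join "\n" body)
         "\n}")))

(edges->dot {["calva" "events"] 7.0}
            {:edge-label (fn [[_ _ w]] (str (long w)))})
;; => "graph G {\n  \"calva\" -- \"events\" [label=\"7\"];\n}"
```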
(def co-channel-dot
(graph/->dot co-channel
:directed false
:edge-label (fn [[_ _ w]] (str (long w)))))
(kind/graphviz co-channel-dot)
Where to go next
You have now completed the full tutorial. From here:
API Reference — every public function in one chapter, with docstrings and a worked example each. The right place to look when you know which function you want.
The source under src/scicloj/zulipdata/ is small enough to read straight through whenever a docstring leaves you uncertain.