8  Graph views

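scicloj.zulipdata.graph turns the corpus into graph-shaped structures backed by JGraphT. Two graphs cover most analyses:

  • Channel co-membership: nodes are channels, and edges are weighted by the number of users two channels share.

  • User co-presence: nodes are users, and edges are weighted by the number of channels two users share.

A third graph, migration, is directed and answers “after these users last posted in from-set, where did they show up next?”

The namespace also provides utilities for community detection (Girvan-Newman, Label Propagation), betweenness centrality, and conversion to shapes that kind/cytoscape and kind/graphviz know how to render.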

(ns zulipdata-book.graph
  (:require
   ;; Zulipdata pull -- paginated, cached channel history
   [scicloj.zulipdata.pull :as pull]
   ;; Zulipdata anonymize -- HMAC-keyed anonymized projections
   [scicloj.zulipdata.anonymize :as anon]
   ;; Zulipdata narrative -- date columns, lifecycles, newcomer tracking
   [scicloj.zulipdata.narrative :as nar]
   ;; Zulipdata graph -- co-membership / co-presence graphs
   [scicloj.zulipdata.graph :as graph]
   ;; Kindly -- notebook rendering protocol
   [scicloj.kindly.v4.kind :as kind]
   ;; Tablecloth -- dataset manipulation
   [tablecloth.api :as tc]))

A multi-channel sample

A small set of web-public channels — small enough to render cleanly, large enough to expose non-trivial graph structure.

(def sample-channels
  ["clojurecivitas" "scicloj-webpublic" "gratitude" "events"
   "calva" "clojure-uk" "clojure-europe" "news-and-articles"])
(def timeline
  (->> (pull/pull-channels! sample-channels)
       (filter (fn [[k _]] (string? k)))
       (mapcat (fn [[_ r]] (pull/all-messages r)))
       anon/anonymized-timeline
       nar/with-time-columns))
timeline

_unnamed [2359 14]:

:last-edit-ts :client :reaction-count :channel :user-key :stream-id :edited :content-length :id :subject-key :timestamp :month-date :year-month :year
Internal 0 clojure-uk 30f24f0b44b99e93 151222 false 27 147403047 6777fcbe881b91ed 1541800305 2018-11-01 2018-11 2018
ZulipMobile 0 clojure-uk 59c5550a8a9f258f 151222 false 8 147403098 a621c785f8deecbf 1541800328 2018-11-01 2018-11 2018
ZulipMobile 0 clojure-uk 59c5550a8a9f258f 151222 false 7 147422253 a621c785f8deecbf 1541832580 2018-11-01 2018-11 2018
website 0 clojure-uk 7066f94b066c86cf 151222 false 7 147542850 a621c785f8deecbf 1542047347 2018-11-01 2018-11 2018
ZulipMobile 0 clojure-uk 59c5550a8a9f258f 151222 false 8 147544737 a621c785f8deecbf 1542049359 2018-11-01 2018-11 2018
ZulipMobile 0 clojure-uk 59c5550a8a9f258f 151222 false 60 147554684 80ea9bf4e69d1493 1542060998 2018-11-01 2018-11 2018
ZulipMobile 0 clojure-uk 59c5550a8a9f258f 151222 false 12 147572603 a621c785f8deecbf 1542089372 2018-11-01 2018-11 2018
ZulipMobile 0 clojure-uk 6b7c27d0b84e2cc4 151222 false 12 147575275 a621c785f8deecbf 1542094392 2018-11-01 2018-11 2018
ZulipMobile 0 clojure-uk 6b7c27d0b84e2cc4 151222 false 27 147575328 a621c785f8deecbf 1542094451 2018-11-01 2018-11 2018
ZulipElectron 0 clojure-uk 392433c10fddd53e 151222 false 6 147575905 a621c785f8deecbf 1542095634 2018-11-01 2018-11 2018
Apache-HttpClient 0 events d19c7e5c3106f475 262224 false 941 581605887 c73d6239136b12de 1774442659 2026-03-01 2026-03 2026
Apache-HttpClient 1 events d19c7e5c3106f475 262224 false 1782 581834590 c860442b6b6b2ee8 1774514202 2026-03-01 2026-03 2026
1774554234 Apache-HttpClient 1 events 23f35a98369b24ab 262224 true 788 582024774 c7d84a00b1e33ecb 1774554138 2026-03-01 2026-03 2026
Apache-HttpClient 0 events d19c7e5c3106f475 262224 false 843 582822755 7462ad1a80605ec3 1774968780 2026-03-01 2026-03 2026
Apache-HttpClient 0 events 4dd19d9fa5a81a0f 262224 false 592 583710500 3ee35fa470d40352 1775418564 2026-04-01 2026-04 2026
Apache-HttpClient 0 events d19c7e5c3106f475 262224 false 844 584035929 a716b082c7a51ed2 1775578980 2026-04-01 2026-04 2026
Apache-HttpClient 0 events d19c7e5c3106f475 262224 false 844 585500688 94b35beb5f01d503 1776192547 2026-04-01 2026-04 2026
ZulipElectron 0 events 2a7920fa288b6ac5 262224 false 9 585768949 94b35beb5f01d503 1776292144 2026-04-01 2026-04 2026
ZulipElectron 0 events 2b13ca51d92878c3 262224 false 15 586110306 94b35beb5f01d503 1776425996 2026-04-01 2026-04 2026
website 0 events 491c7331968ae4b2 262224 false 19 586168194 94b35beb5f01d503 1776441936 2026-04-01 2026-04 2026
1776448026 website 0 events d19c7e5c3106f475 262224 true 47 586186641 94b35beb5f01d503 1776448017 2026-04-01 2026-04 2026

Users to channels

The building block both whole-graph constructors share: user-channel-sets returns a map from user-key to the set of channels that user has posted in. The optional min-channels setting is a lower bound: users who posted in fewer than min-channels channels are dropped. The call below does not pass it, and single-channel users appear in the result.

(def u->chans (graph/user-channel-sets timeline))
(count u->chans)
96

The first five entries:

(->> u->chans (take 5) (into {}))
{"81c90f8fd6744027" #{"news-and-articles"},
 "71093247047f6c02" #{"clojure-europe" "events"},
 "81974a7a8a884239" #{"events"},
 "13c6ca9ef033c774"
 #{"news-and-articles"
   "gratitude"
   "scicloj-webpublic"
   "clojurecivitas"},
 "2dbe557be94bc5b0" #{"clojure-europe" "clojurecivitas"}}

The distribution over how many channels users participate in:

(->> u->chans
     vals
     (map count)
     frequencies
     (into (sorted-map)))
{1 69, 2 10, 3 8, 4 5, 5 2, 7 1, 8 1}
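
To make the shape of that map concrete, here is a minimal sketch of the same idea in plain Clojure. It is not the library's implementation: user-channel-sets-sketch is a hypothetical name, and it works on a vector of maps rather than on the tablecloth dataset the real function receives.

```clojure
;; Hypothetical stand-in for graph/user-channel-sets, on plain maps.
(defn user-channel-sets-sketch
  "Map each :user-key to the set of :channel values it appears with,
   keeping only users seen in at least `min-channels` channels."
  [rows min-channels]
  (->> rows
       ;; Accumulate channel sets per user; repeated posts collapse in the set.
       (reduce (fn [acc {:keys [user-key channel]}]
                 (update acc user-key (fnil conj #{}) channel))
               {})
       ;; Apply the lower bound on the number of distinct channels.
       (filter (fn [[_ chans]] (>= (count chans) min-channels)))
       (into {})))

;; Toy rows (invented for illustration, not taken from the corpus).
(def toy-rows
  [{:user-key "u1" :channel "events"}
   {:user-key "u1" :channel "calva"}
   {:user-key "u1" :channel "events"}
   {:user-key "u2" :channel "events"}])

(user-channel-sets-sketch toy-rows 1)
;; => {"u1" #{"events" "calva"}, "u2" #{"events"}}
(user-channel-sets-sketch toy-rows 2)
;; => {"u1" #{"events" "calva"}}
```

With a bound of 2, the single-channel user u2 drops out, which is the effect min-channels is described as having.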

Channel co-membership

channel-comembership-graph returns a JGraphT DefaultUndirectedWeightedGraph. Nodes are channels, edges are weighted by the number of shared users. The :min-shared option drops thin edges.

(def co-channel
  (graph/channel-comembership-graph timeline :min-shared 1))
(.vertexSet co-channel)
#{"clojurecivitas" "news-and-articles" "gratitude" "clojure-europe" "clojure-uk" "events" "calva" "scicloj-webpublic"}
(count (.edgeSet co-channel))
28

The graph is complete: every pair of channels shares at least one user, so all 8 × 7 / 2 = 28 possible pairs become edges.

The edge-weight table:

(->> (.edgeSet co-channel)
     (map (fn [e]
            {:from   (.getEdgeSource co-channel e)
             :to     (.getEdgeTarget co-channel e)
             :weight (.getEdgeWeight co-channel e)}))
     (sort-by :weight >)
     tc/dataset)

_unnamed [28 3]:

:from :to :weight
clojurecivitas scicloj-webpublic 10.0
news-and-articles scicloj-webpublic 8.0
clojure-europe events 7.0
gratitude scicloj-webpublic 7.0
events scicloj-webpublic 7.0
calva events 7.0
clojurecivitas news-and-articles 6.0
clojure-europe clojurecivitas 6.0
events gratitude 6.0
clojurecivitas gratitude 5.0
clojure-uk events 4.0
calva news-and-articles 4.0
calva clojurecivitas 4.0
clojurecivitas events 4.0
gratitude news-and-articles 3.0
clojure-uk gratitude 3.0
events news-and-articles 3.0
clojure-europe news-and-articles 2.0
clojure-uk news-and-articles 1.0
clojure-uk clojurecivitas 1.0
clojure-uk scicloj-webpublic 1.0
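
The edge weights can be understood as pair counts over the user-to-channels map. The sketch below is an assumption about the general technique, not the library's code: comembership-weights is a hypothetical helper that counts, for every pair of channels, how many users posted in both, and then applies a min-shared threshold.

```clojure
;; Hypothetical helper: shared-user counts per channel pair.
(defn comembership-weights
  "Given a map user -> set of channels, return a map
   #{chan-a chan-b} -> number of users present in both,
   keeping only pairs with at least `min-shared` users."
  [u->chans min-shared]
  (->> (vals u->chans)
       ;; Each user contributes one count to every pair of their channels.
       (mapcat (fn [chans]
                 (let [cs (vec (sort chans))]
                   (for [i (range (count cs))
                         j (range (inc i) (count cs))]
                     #{(cs i) (cs j)}))))
       frequencies
       (filter (fn [[_ w]] (>= w min-shared)))
       (into {})))

;; Toy input (invented for illustration).
(def toy-u->chans
  {"u1" #{"events" "calva"}
   "u2" #{"events" "calva" "gratitude"}
   "u3" #{"gratitude"}})

(comembership-weights toy-u->chans 1)
;; => {#{"calva" "events"} 2, #{"calva" "gratitude"} 1, #{"events" "gratitude"} 1}
(comembership-weights toy-u->chans 2)
;; => {#{"calva" "events"} 2}
```

Raising the threshold from 1 to 2 removes the two single-user edges, which is what "drops thin edges" means in practice.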

User co-presence

user-copresence-graph flips the construction: nodes are users, edges are weighted by shared-channel count. The defaults (:min-shared 3 :min-channels 3) keep only the densely-connected core — users active in at least three channels, paired only when they share at least three.

(def co-user
  (graph/user-copresence-graph timeline :min-shared 3 :min-channels 3))

Node and edge counts:

{:nodes (count (.vertexSet co-user))
 :edges (count (.edgeSet co-user))}
{:nodes 17, :edges 48}
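
The two thresholds act at different stages: min-channels filters nodes, min-shared filters edges. A minimal sketch in plain Clojure, under the assumption that the input is a user-to-channels map (copresence-edges is a hypothetical name, not part of the library):

```clojure
(require '[clojure.set :as set])

;; Hypothetical helper illustrating the two thresholds.
(defn copresence-edges
  "Return [user-a user-b shared-count] triples for users with at least
   `min-channels` channels who share at least `min-shared` of them."
  [u->chans min-shared min-channels]
  (let [users (->> u->chans
                   ;; Node filter: drop users active in too few channels.
                   (filter (fn [[_ cs]] (>= (count cs) min-channels)))
                   (sort-by key)
                   vec)]
    (for [i (range (count users))
          j (range (inc i) (count users))
          :let [[a ca] (users i)
                [b cb] (users j)
                shared (count (set/intersection ca cb))]
          ;; Edge filter: keep only pairs with enough shared channels.
          :when (>= shared min-shared)]
      [a b shared])))

;; Toy input (invented for illustration).
(def toy-presence
  {"u1" #{"a" "b" "c"}
   "u2" #{"a" "b" "c" "d"}
   "u3" #{"a" "b"}
   "u4" #{"b" "c" "d"}})

(copresence-edges toy-presence 3 3)
;; => (["u1" "u2" 3] ["u2" "u4" 3])
```

Here u3 never becomes a node (only two channels), and u1 and u4 are both nodes but share only two channels, so no edge joins them.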

Migration: where did people go next?

migration-graph is directed. For each user with at least five posts in from-set, it looks at every channel they posted in after their last from-set post, and adds an edge from each from-set source they used to each later destination. Edges with fewer than :min-users are dropped.

Taking clojurecivitas as the seed shows where clojurecivitas posters subsequently appeared.

(def migration
  (graph/migration-graph timeline #{"clojurecivitas"} :min-users 1))
(->> (.edgeSet migration)
     (map (fn [e]
            {:from   (.getEdgeSource migration e)
             :to     (.getEdgeTarget migration e)
             :weight (.getEdgeWeight migration e)}))
     (sort-by :weight >)
     tc/dataset)

_unnamed [1 3]:

:from :to :weight
clojurecivitas gratitude 1.0
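
The migration rule described above can be sketched in plain Clojure. This is an illustration under stated assumptions, not the library's implementation: migration-edges is a hypothetical name, the rows are plain maps with :user-key, :channel, and :timestamp, and the minimum post count is exposed as a min-posts parameter (the text above gives that threshold as five).

```clojure
;; Hypothetical sketch of the migration rule on plain maps.
(defn migration-edges
  "Return [from to user-count] triples: for each user with at least
   `min-posts` posts in `from-set`, link every from-set channel they used
   to every channel they posted in after their last from-set post."
  [rows from-set min-posts min-users]
  (->> (group-by :user-key rows)
       (mapcat
        (fn [[user posts]]
          (let [in-from (filter #(from-set (:channel %)) posts)]
            (when (and (seq in-from) (>= (count in-from) min-posts))
              (let [last-ts (apply max (map :timestamp in-from))
                    sources (set (map :channel in-from))
                    dests   (->> posts
                                 (filter #(> (:timestamp %) last-ts))
                                 (map :channel)
                                 set)]
                (for [s sources, d dests] [s d user]))))))
       ;; One triple per user per (source, destination), so the group size
       ;; is the number of users who made that move.
       (group-by (fn [[s d _]] [s d]))
       (map (fn [[[s d] triples]] [s d (count triples)]))
       (filter (fn [[_ _ n]] (>= n min-users)))
       (sort-by (juxt first second))
       vec))

;; Toy rows (invented for illustration).
(def toy-posts
  [{:user-key "u1" :channel "a" :timestamp 1}
   {:user-key "u1" :channel "a" :timestamp 2}
   {:user-key "u1" :channel "b" :timestamp 3}
   {:user-key "u1" :channel "c" :timestamp 4}
   {:user-key "u2" :channel "a" :timestamp 1}
   {:user-key "u2" :channel "b" :timestamp 3}
   {:user-key "u2" :channel "a" :timestamp 5}
   {:user-key "u3" :channel "a" :timestamp 1}
   {:user-key "u3" :channel "b" :timestamp 2}
   {:user-key "u4" :channel "a" :timestamp 1}
   {:user-key "u4" :channel "a" :timestamp 2}
   {:user-key "u4" :channel "b" :timestamp 9}])

(migration-edges toy-posts #{"a"} 2 1)
;; => [["a" "b" 2] ["a" "c" 1]]
(migration-edges toy-posts #{"a"} 2 2)
;; => [["a" "b" 2]]
```

In the toy data, u2 contributes nothing because their post in b precedes their last post in a, and u3 is skipped for having too few posts in the from-set.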

Centrality

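betweenness returns a map from node to its betweenness centrality score, which measures how many shortest paths between other pairs of nodes pass through the node. The scores below are raw, unnormalized values rather than shares, so they can exceed 1. The top scores: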

(->> (graph/betweenness co-channel)
     (sort-by val >)
     (take 5)
     (into (array-map)))
{"clojure-uk" 8.0,
 "news-and-articles" 6.833333333333334,
 "clojurecivitas" 0.3333333333333333,
 "gratitude" 0.0,
 "clojure-europe" 0.0}

The graph is a clique (every pair of channels is directly connected), yet betweenness is not uniformly zero. JGraphT treats edge weights as distances when computing shortest paths. With weight = shared-user count, a heavily-shared pair has a long direct edge, and a 2-hop detour through a thin-overlap channel can be shorter. The high scorers here are channels with thin overlap to most others — they sit on the cheap detours.

(boolean (some pos? (vals (graph/betweenness co-channel))))
true
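
A worked instance of the detour effect, using three weights copied from the edge-weight table and read as distances. The helper two-hop is hypothetical and only adds two edge lengths; it does not claim that this particular detour is the shortest path JGraphT finds, only that it beats the direct edge.

```clojure
(def dist
  "Three edge weights from the co-membership table, treated as distances."
  {#{"clojurecivitas" "scicloj-webpublic"} 10.0
   #{"clojure-uk" "clojurecivitas"}        1.0
   #{"clojure-uk" "scicloj-webpublic"}     1.0})

;; Hypothetical helper: length of the path a -> via -> b.
(defn two-hop [a via b]
  (+ (dist #{a via}) (dist #{via b})))

(def direct (dist #{"clojurecivitas" "scicloj-webpublic"}))
(def detour (two-hop "clojurecivitas" "clojure-uk" "scicloj-webpublic"))

[direct detour (< detour direct)]
;; => [10.0 2.0 true]
```

The most heavily shared pair is therefore the "farthest apart" under this reading, and the route through clojure-uk costs 2.0 instead of 10.0, which is consistent with clojure-uk topping the betweenness list.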

Communities

Two algorithms, both returning a vector of node-sets — one set per cluster.

Girvan-Newman needs you to pick k (the desired number of clusters). On this small graph, k = 2 separates calva from the other seven channels.

(graph/girvan-newman co-channel 2)
[#{"clojure-uk"
   "news-and-articles"
   "gratitude"
   "scicloj-webpublic"
   "clojure-europe"
   "clojurecivitas"
   "events"}
 #{"calva"}]
(count (graph/girvan-newman co-channel 2))
2

Label propagation chooses k itself. On a small dense graph it will often produce only one cluster — which is informative in its own right.

(graph/label-propagation co-channel)
[#{"clojure-uk"
   "news-and-articles"
   "gratitude"
   "scicloj-webpublic"
   "clojure-europe"
   "clojurecivitas"
   "calva"
   "events"}]
(count (graph/label-propagation co-channel))
1
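
Because both functions return a vector of node-sets, it is often convenient to invert that into a node-to-cluster lookup, for example when colouring nodes by community. A minimal sketch (node->cluster is a hypothetical helper, not part of the library):

```clojure
;; Hypothetical helper: invert a vector of node-sets into node -> index.
(defn node->cluster
  "Given a vector of node-sets, return a map from each node to the
   index of the set that contains it."
  [communities]
  (into {}
        (for [[idx members] (map-indexed vector communities)
              node members]
          [node idx])))

(node->cluster [#{"events" "gratitude"} #{"calva"}])
;; => {"events" 0, "gratitude" 0, "calva" 1}
```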

Rendering: kind/cytoscape

->cytoscape-elements converts a JGraphT graph to the :elements shape that kind/cytoscape consumes. Optional :node-attrs and :edge-attrs functions add extra attributes to each :data map — useful for colour-coding by community or thickness by weight.

^{:kindly/options {:element/style {:height "500px" :width "100%"}}}
(let [weights (map #(.getEdgeWeight co-channel %) (.edgeSet co-channel))
      w-min   (apply min weights)
      w-max   (apply max weights)]
  (kind/cytoscape
   {:elements (graph/->cytoscape-elements co-channel)
    :style    [{:selector "node"
                :css      {:label     "data(id)"
                           :content   "data(id)"
                           :font-size 9}}
               {:selector "edge"
                :css      {:width (str "mapData(weight, " w-min ", " w-max ", 1, 8)")}}]
    :layout   {:name "cose"}}))
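
The edge style relies on the Cytoscape.js mapData mapper, which interpolates linearly between two output values. The sketch below reproduces that arithmetic in Clojure so the resulting widths can be checked by hand. map-data is a hypothetical helper, and the clamping of out-of-range inputs is written from my understanding of the Cytoscape.js behaviour, so treat it as an assumption.

```clojure
;; Hypothetical helper mirroring mapData(weight, in-min, in-max, out-min, out-max).
(defn map-data
  "Map v from [in-min, in-max] linearly onto [out-min, out-max],
   clamping v to the input range first. Assumes in-min < in-max."
  [v in-min in-max out-min out-max]
  (let [v' (-> v (max in-min) (min in-max))
        t  (/ (- v' in-min) (- in-max in-min))]
    (+ out-min (* t (- out-max out-min)))))

;; With weights ranging from 1.0 to 10.0, as in the co-membership table:
(map-data 1.0  1.0 10.0 1 8)  ;; => 1.0
(map-data 10.0 1.0 10.0 1 8)  ;; => 8.0
(map-data 5.5  1.0 10.0 1 8)  ;; => 4.5
```

If every edge had the same weight, w-min and w-max would coincide and the interpolation would be undefined, so a graph like that would need a fixed width instead.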

Rendering: kind/graphviz

->dot returns a Graphviz DOT string. Pass it straight to kind/graphviz for a static rendering. directed?, node-label, and edge-label are optional settings.

(def co-channel-dot
  (graph/->dot co-channel
               :directed false
               :edge-label (fn [[_ _ w]] (str (long w)))))
(kind/graphviz co-channel-dot)
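
For orientation, this is roughly what an undirected DOT string with edge labels looks like. The sketch builds one from [from to weight] triples in plain Clojure. edges->dot is a hypothetical helper, and the exact text that ->dot emits may differ.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper: build an undirected DOT graph from edge triples.
(defn edges->dot
  "Render [from to weight] triples as a Graphviz DOT string,
   labelling each edge with its weight as an integer."
  [edges]
  (str "graph G {\n"
       (str/join
        (for [[a b w] edges]
          (format "  \"%s\" -- \"%s\" [label=\"%d\"];\n" a b (long w))))
       "}\n"))

(edges->dot [["clojurecivitas" "scicloj-webpublic" 10.0]])
;; => "graph G {\n  \"clojurecivitas\" -- \"scicloj-webpublic\" [label=\"10\"];\n}\n"
```

A directed graph would use the digraph keyword and -> instead of --, which is the distinction a directed setting controls in DOT output.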

Where to go next

You have now completed the full tutorial. From here:

  • API Reference — every public function in one chapter, with docstrings and a worked example each. The right place to look when you know which function you want.

  • The source under src/scicloj/zulipdata/ is small enough to read straight through whenever a docstring leaves you uncertain.

source: notebooks/zulipdata_book/graph.clj