7  Narrative

scicloj.zulipdata.narrative is a small toolkit for the kinds of questions that recur across analyses of this corpus: enriching a timeline with date columns, summarising channel lifecycles, selecting a subset of channels by name or by shared user base, tracing newcomers’ prior paths, and counting monthly activity.

Everything in this chapter operates on an anonymized timeline — a tablecloth dataset with :channel, :user-key, and :timestamp columns produced by anonymized-timeline (see Anonymized views). The helpers do not depend on anonymization; they only require those columns. We work on the anonymized form throughout because that is what the next step (Graph views) expects, and because there is no reason to handle real names for these aggregates.

(ns zulipdata-book.narrative
  (:require
   ;; Zulipdata pull -- paginated, cached channel history
   [scicloj.zulipdata.pull :as pull]
   ;; Zulipdata anonymize -- HMAC-keyed anonymized projections
   [scicloj.zulipdata.anonymize :as anon]
   ;; Zulipdata narrative -- date columns, lifecycles, newcomer tracking
   [scicloj.zulipdata.narrative :as nar]
   ;; Kindly -- notebook rendering protocol
   [scicloj.kindly.v4.kind :as kind]
   ;; Tablecloth -- dataset manipulation
   [tablecloth.api :as tc]))

A multi-channel sample

This chapter needs more than one channel — channels-by-shared-users and prior-channels-of-newcomers are about cross-channel structure. We pull every web-public channel of the Clojurians Zulip; the cache serves repeated runs.

(def sample-channels
  (pull/web-public-channel-names))
(def messages
  (->> (pull/pull-channels! sample-channels)
       (filter (fn [[k _]] (string? k)))
       (mapcat (fn [[_ r]] (pull/all-messages r)))))
(count messages)
1407577
(def base-timeline
  (anon/anonymized-timeline messages))
base-timeline

_unnamed [1407577 11]:

| :last-edit-ts | :client | :reaction-count | :channel | :user-key | :stream-id | :edited | :content-length | :id | :subject-key | :timestamp |
|---|---|---|---|---|---|---|---|---|---|---|
|  | Internal | 0 | clojure-uk | 30f24f0b44b99e93 | 151222 | false | 27 | 147403047 | 6777fcbe881b91ed | 1541800305 |
|  | ZulipMobile | 0 | clojure-uk | 59c5550a8a9f258f | 151222 | false | 8 | 147403098 | a621c785f8deecbf | 1541800328 |
|  | ZulipMobile | 0 | clojure-uk | 59c5550a8a9f258f | 151222 | false | 7 | 147422253 | a621c785f8deecbf | 1541832580 |
|  | website | 0 | clojure-uk | 7066f94b066c86cf | 151222 | false | 7 | 147542850 | a621c785f8deecbf | 1542047347 |
|  | ZulipMobile | 0 | clojure-uk | 59c5550a8a9f258f | 151222 | false | 8 | 147544737 | a621c785f8deecbf | 1542049359 |
|  | ZulipMobile | 0 | clojure-uk | 59c5550a8a9f258f | 151222 | false | 60 | 147554684 | 80ea9bf4e69d1493 | 1542060998 |
|  | ZulipMobile | 0 | clojure-uk | 59c5550a8a9f258f | 151222 | false | 12 | 147572603 | a621c785f8deecbf | 1542089372 |
|  | ZulipMobile | 0 | clojure-uk | 6b7c27d0b84e2cc4 | 151222 | false | 12 | 147575275 | a621c785f8deecbf | 1542094392 |
|  | ZulipMobile | 0 | clojure-uk | 6b7c27d0b84e2cc4 | 151222 | false | 27 | 147575328 | a621c785f8deecbf | 1542094451 |
|  | ZulipElectron | 0 | clojure-uk | 392433c10fddd53e | 151222 | false | 6 | 147575905 | a621c785f8deecbf | 1542095634 |
|  | website | 0 | clojurescript | c51b54546f12fd12 | 151762 | false | 78 | 565125188 | f3341590e4101633 | 1766482613 |
|  | ZulipElectron | 0 | clojurescript | 0b9294058f3df84e | 151762 | false | 75 | 565125482 | f3341590e4101633 | 1766482751 |
|  | ZulipElectron | 0 | clojurescript | 0b9294058f3df84e | 151762 | false | 138 | 565125572 | f3341590e4101633 | 1766482790 |
| 1766484713 | website | 0 | clojurescript | c51b54546f12fd12 | 151762 | true | 346 | 565129729 | f3341590e4101633 | 1766484689 |
|  | website | 0 | clojurescript | c51b54546f12fd12 | 151762 | false | 265 | 565129997 | f3341590e4101633 | 1766484814 |
|  | ZulipElectron | 0 | clojurescript | 0b9294058f3df84e | 151762 | false | 61 | 565130083 | f3341590e4101633 | 1766484864 |
|  | website | 0 | clojurescript | c51b54546f12fd12 | 151762 | false | 19 | 565130455 | f3341590e4101633 | 1766485028 |
|  | website | 0 | clojurescript | c51b54546f12fd12 | 151762 | false | 41 | 565130792 | f3341590e4101633 | 1766485163 |
|  | Internal | 0 | clojurescript | a150cbcc9f0efb7d | 151762 | false | 64 | 565134508 | f3341590e4101633 | 1766486844 |
| 1776438651 | website | 0 | clojurescript | 1410ba2085076651 | 151762 | true | 956 | 586152497 | c786937dcae35ee5 | 1776437652 |
|  | website | 1 | clojurescript | 7ac0128a57133cef | 151762 | false | 41 | 586165694 | c786937dcae35ee5 | 1776441304 |
(tc/row-count base-timeline)
1407577

Adding date columns

Most analyses bucket activity by month or year. with-time-columns adds three derived columns from :timestamp (epoch seconds, UTC): :month-date (a LocalDate set to the first of the month), :year-month (a "YYYY-MM" string), and :year (an integer).

The three are different shapes for different uses: LocalDate values plot on a real calendar axis, strings sort lexicographically for grouping, integers behave well as numeric facets.

(def timeline (nar/with-time-columns base-timeline))
(-> timeline tc/column-names sort)
(:channel
 :client
 :content-length
 :edited
 :id
 :last-edit-ts
 :month-date
 :reaction-count
 :stream-id
 :subject-key
 :timestamp
 :user-key
 :year
 :year-month)
(every? (set (tc/column-names timeline))
        [:month-date :year-month :year])
true

The three new columns, freshest first:

(-> timeline
    (tc/select-columns [:timestamp :month-date :year-month :year])
    (tc/order-by :timestamp :desc)
    (tc/head 5))

_unnamed [5 4]:

:timestamp :month-date :year-month :year
1777747330 2026-05-01 2026-05 2026
1777747161 2026-05-01 2026-05 2026
1777739498 2026-05-01 2026-05 2026
1777734093 2026-05-01 2026-05 2026
1777655748 2026-05-01 2026-05 2026

The same three derivations are also exposed as scalar helpers (ts->month-date, ts->year-month, ts->year), in case you need them for one-off arithmetic without a dataset:

(let [ts (-> timeline :timestamp first)]
  {:ts         ts
   :month-date (nar/ts->month-date ts)
   :year-month (nar/ts->year-month ts)
   :year       (nar/ts->year ts)})
{:ts 1541800305,
 :month-date #object[java.time.LocalDate 0x3e5a3a42 "2018-11-01"],
 :year-month "2018-11",
 :year 2018}
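
For readers who want to see what such a derivation involves, here is a minimal sketch using plain java.time interop. The namespace and function names (sketch.time-columns, epoch->utc-date, month-date, year-month, year) are hypothetical and are not the library's own helpers; the only thing assumed from the text above is that timestamps are epoch seconds interpreted in UTC.

```clojure
(ns sketch.time-columns
  (:import (java.time Instant LocalDate ZoneOffset)
           (java.time.format DateTimeFormatter)))

;; Hypothetical stand-ins for the scalar helpers, NOT the library's code.
(def ^DateTimeFormatter year-month-format
  (DateTimeFormatter/ofPattern "yyyy-MM"))

(defn epoch->utc-date
  "Epoch seconds -> calendar date in UTC."
  ^LocalDate [ts]
  (-> (Instant/ofEpochSecond (long ts))
      (.atZone ZoneOffset/UTC)
      (.toLocalDate)))

(defn month-date
  "First day of the month containing ts, as a LocalDate."
  [ts]
  (.withDayOfMonth (epoch->utc-date ts) 1))

(defn year-month
  "\"YYYY-MM\" string for ts."
  [ts]
  (.format (epoch->utc-date ts) year-month-format))

(defn year
  "Calendar year of ts as an integer."
  [ts]
  (.getYear (epoch->utc-date ts)))

(str (month-date 1541800305)) ;; => "2018-11-01"
(year-month 1541800305)       ;; => "2018-11"
(year 1541800305)             ;; => 2018

;; Zero-padded "YYYY-MM" strings sort chronologically as plain strings:
(sort ["2026-01" "2018-11" "2025-10"])
;; => ("2018-11" "2025-10" "2026-01")
```

The values agree with the map printed above for the same timestamp.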

Channel lifecycles

channel-lifecycle is the one-row-per-channel summary used in activity reports. It summarises every message in the timeline into five columns per channel: first month, last month, total messages, distinct active months, and distinct (anonymized) users. Rows are sorted ascending by :first-date.

(def lifecycles (nar/channel-lifecycle timeline))
lifecycles

_unnamed [26 6]:

:channel :first-date :last-date :total :active-months :distinct-users
clojure-uk 2018-11-01 2026-04-01 110 10 16
clojure 2018-11-01 2026-04-01 9020 72 194
general 2018-11-01 2026-03-01 267 29 60
announce 2018-11-01 2026-04-01 23252 90 51
calva 2018-11-01 2025-10-01 591 17 20
beginners 2018-11-01 2026-03-01 4443 77 207
honeysql 2018-11-01 2026-01-01 456 16 5
jobs 2018-11-01 2026-03-01 22 15 19
sql 2018-11-01 2026-04-01 459 33 14
zulip 2018-11-01 2026-01-01 775 52 61
windows-clojure 2023-01-01 2024-12-01 7 2 3
bubble-up 2024-07-01 2024-07-01 1 1 1
gratitude 2024-12-01 2026-03-01 23 8 15
clojure-europe 2024-12-01 2025-12-01 463 10 19
news-and-articles 2024-12-01 2026-04-01 99 14 15
project-announcements 2024-12-01 2026-04-01 30 10 9
clojars 2024-12-01 2024-12-01 3 1 2
clojure-ohio 2025-02-01 2025-02-01 2 1 2
scicloj-webpublic 2025-03-01 2026-04-01 336 14 25
std.lang-dev 2025-04-01 2025-12-01 73 3 8
clojurecivitas 2025-09-01 2026-05-01 273 9 17

The table has one row per distinct channel in the timeline. The printout above is abbreviated to 21 rows; the full count is:

(tc/row-count lifecycles)
26
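
The shape of this summary can be sketched in plain Clojure over a sequence of row maps. This is an illustration under stated assumptions, not the library's implementation: lifecycle-rows and the toy data are hypothetical, months are represented by "YYYY-MM" strings rather than LocalDate values, and ties on the first month are broken by channel name.

```clojure
;; Hypothetical sketch of a per-channel lifecycle summary.
(defn lifecycle-rows
  "One map per channel: first/last month, message total,
  distinct active months, distinct users."
  [rows]
  (->> rows
       (group-by :channel)
       (map (fn [[channel msgs]]
              (let [months (sort (map :year-month msgs))]
                {:channel        channel
                 :first-month    (first months)
                 :last-month     (last months)
                 :total          (count msgs)
                 :active-months  (count (distinct months))
                 :distinct-users (count (distinct (map :user-key msgs)))})))
       (sort-by (juxt :first-month :channel))
       vec))

(def lifecycle-toy
  [{:channel "a" :user-key "u1" :year-month "2025-01"}
   {:channel "a" :user-key "u2" :year-month "2025-03"}
   {:channel "a" :user-key "u1" :year-month "2025-03"}
   {:channel "b" :user-key "u3" :year-month "2024-12"}])

(lifecycle-rows lifecycle-toy)
;; => [{:channel "b", :first-month "2024-12", :last-month "2024-12",
;;      :total 1, :active-months 1, :distinct-users 1}
;;     {:channel "a", :first-month "2025-01", :last-month "2025-03",
;;      :total 3, :active-months 2, :distinct-users 2}]
```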

Selecting channels by name pattern

channels-by-name-pattern is a thin convenience around re-find against the distinct :channel values. It is a quick way to pick out a name-defined cluster, but a fragile one, because it depends on naming conventions.

(nar/channels-by-name-pattern timeline #"civitas|gratitude")
["clojurecivitas" "gratitude"]
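
A small sketch of the same idea over plain row maps also shows the fragility. The function name channels-matching and the toy rows are hypothetical, and sorting the result is an assumption of this sketch rather than documented behaviour.

```clojure
;; Hypothetical sketch: re-find against the distinct channel names.
(defn channels-matching
  [rows pattern]
  (->> rows
       (map :channel)
       distinct
       (filter #(re-find pattern %))
       sort
       vec))

(def pattern-toy
  (map (fn [c] {:channel c})
       ["clojure" "clojurescript" "clojure-uk"
        "clojurecivitas" "gratitude" "clojure"]))

(channels-matching pattern-toy #"civitas|gratitude")
;; => ["clojurecivitas" "gratitude"]

;; An unanchored pattern matches every name that merely contains it:
(channels-matching pattern-toy #"clojure")
;; => ["clojure" "clojure-uk" "clojurecivitas" "clojurescript"]

;; Anchoring restores an exact match:
(channels-matching pattern-toy #"^clojure$")
;; => ["clojure"]
```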

Selecting channels by shared user-base

channels-by-shared-users is the user-overlap counterpart to the name pattern. Pick a seed channel, take its :top-n most active posters, and return every channel where those users account for at least a :share fraction of the activity. Use this to build a cluster around a seed channel by who posts there, rather than by name.

Tightening :share shrinks the result: at 0.5 the seed’s top posters account for at least half the activity in only three channels (clojurecivitas itself, events, scicloj-webpublic).

(nar/channels-by-shared-users timeline "clojurecivitas"
                              :share 0.5 :min-msgs 5 :top-n 5)
["clojurecivitas" "events" "scicloj-webpublic"]
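
One plausible reading of this selection rule, sketched in plain Clojure. Everything here is an assumption for illustration: the names top-posters and channels-sharing-users are hypothetical, "activity" is taken to mean message count, :min-msgs is assumed to be a minimum total message count per candidate channel, and ties among posters are broken by user key. The library may define these details differently.

```clojure
;; Hypothetical sketch, not the library's implementation.
(defn top-posters
  "Set of the n most frequent :user-key values in the seed channel."
  [rows seed n]
  (->> rows
       (filter #(= seed (:channel %)))
       (map :user-key)
       frequencies
       (sort-by (juxt (comp - val) key)) ; most messages first, ties by key
       (take n)
       (map key)
       set))

(defn channels-sharing-users
  "Channels with at least min-msgs messages in which the seed's top
  posters wrote at least `share` of the messages."
  [rows seed {:keys [share min-msgs top-n]}]
  (let [core (top-posters rows seed top-n)]
    (->> (group-by :channel rows)
         (keep (fn [[channel msgs]]
                 (let [total   (count msgs)
                       by-core (count (filter #(core (:user-key %)) msgs))]
                   (when (and (>= total min-msgs)
                              (>= (/ by-core total) share))
                     channel))))
         sort
         vec)))

(def shared-toy
  (for [[c u] [["s" "u1"] ["s" "u1"] ["s" "u1"] ["s" "u2"]
               ["x" "u1"] ["x" "u1"] ["x" "u3"]
               ["y" "u3"] ["y" "u3"] ["y" "u1"]
               ["z" "u1"]]]
    {:channel c :user-key u}))

(channels-sharing-users shared-toy "s" {:share 0.5 :min-msgs 2 :top-n 1})
;; => ["s" "x"]
(channels-sharing-users shared-toy "s" {:share 0.3 :min-msgs 2 :top-n 1})
;; => ["s" "x" "y"]
```

In the toy data, loosening the share from 0.5 to 0.3 admits channel "y", where the seed's top poster wrote one message in three; channel "z" is excluded in both calls because it has fewer than two messages.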

First posters of a channel

first-posters-of-channel returns the first n distinct :user-key values to ever post in a channel, each with the date of their first post (in the output below, the :first-post-date values are first-of-month dates). This is useful for identifying a channel’s earliest contributors.

(def civitas-first-posters
  (nar/first-posters-of-channel timeline "clojurecivitas" 5))
civitas-first-posters

_unnamed [5 2]:

:user-key :first-post-date
13c6ca9ef033c774 2025-09-01
2a7920fa288b6ac5 2025-09-01
a150cbcc9f0efb7d 2025-09-01
757617661aa406b2 2025-09-01
d19c7e5c3106f475 2025-10-01
(tc/row-count civitas-first-posters)
5
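
The underlying computation can be sketched over plain row maps as follows. The function first-posters and the toy rows are hypothetical, and this sketch keeps the raw first timestamp rather than converting it to a date.

```clojure
;; Hypothetical sketch: earliest timestamp per user, then the first n users.
(defn first-posters
  [rows channel n]
  (->> rows
       (filter #(= channel (:channel %)))
       (group-by :user-key)
       (map (fn [[user msgs]]
              {:user-key user
               :first-ts (apply min (map :timestamp msgs))}))
       (sort-by (juxt :first-ts :user-key))
       (take n)
       vec))

(def posters-toy
  [{:channel "c" :user-key "u2" :timestamp 30}
   {:channel "c" :user-key "u1" :timestamp 10}
   {:channel "c" :user-key "u1" :timestamp 40}
   {:channel "c" :user-key "u3" :timestamp 20}
   {:channel "d" :user-key "u4" :timestamp 5}])

(first-posters posters-toy "c" 2)
;; => [{:user-key "u1", :first-ts 10} {:user-key "u3", :first-ts 20}]
```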

Tracing newcomers’ prior channels

prior-channels-of-newcomers answers: for everyone whose first post in the given channel falls in the given year-month, where else in the timeline had they posted before that first post? It returns one row per prior channel, with the count of newcomers who had posted there.

A note on scope. “Prior channels” is restricted to whatever you pulled. We pulled every web-public channel, so the answer covers the whole web-public community; non-web-public prior activity is invisible.

(nar/prior-channels-of-newcomers timeline "clojurecivitas" "2025-10")

_unnamed [16 2]:

:prior-channel :newcomers-touched
news-and-articles 1
gratitude 1
clojure 1
scicloj-webpublic 1
clojure-europe 1
slack-archive 1
general 1
announce 1
calva 1
beginners 1
events 1
xtdb 1
std.lang-dev 1
zulip 1
off-topic 1
clojurescript 1
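
The question can be made concrete with a sketch over plain row maps. This is one interpretation, not the library's code: prior-channels and the toy rows are hypothetical, each row is assumed to already carry a :year-month string, "before" is taken to mean a strictly smaller timestamp, and each newcomer is counted at most once per prior channel.

```clojure
;; Hypothetical sketch of the newcomer trace.
(defn prior-channels
  [rows channel year-month]
  (let [;; user -> that user's earliest message in the target channel
        first-post (->> rows
                        (filter #(= channel (:channel %)))
                        (group-by :user-key)
                        (map (fn [[user msgs]]
                               [user (apply min-key :timestamp msgs)]))
                        (into {}))
        ;; user -> first-post timestamp, only for first posts in year-month
        newcomers  (->> first-post
                        (filter (fn [[_ msg]] (= year-month (:year-month msg))))
                        (map (fn [[user msg]] [user (:timestamp msg)]))
                        (into {}))]
    (->> rows
         (filter (fn [{:keys [user-key timestamp] :as row}]
                   (when-let [t0 (newcomers user-key)]
                     (and (not= channel (:channel row))
                          (< timestamp t0)))))
         (map (juxt :channel :user-key))
         distinct                 ; count each newcomer once per channel
         (map first)
         frequencies
         (map (fn [[ch n]] {:prior-channel ch :newcomers-touched n}))
         (sort-by (juxt (comp - :newcomers-touched) :prior-channel))
         vec)))

(def newcomer-toy
  [{:channel "general"   :user-key "u1" :timestamp 10 :year-month "2025-08"}
   {:channel "beginners" :user-key "u1" :timestamp 20 :year-month "2025-09"}
   {:channel "target"    :user-key "u1" :timestamp 30 :year-month "2025-10"}
   {:channel "general"   :user-key "u2" :timestamp 15 :year-month "2025-08"}
   {:channel "target"    :user-key "u2" :timestamp 40 :year-month "2025-10"}
   {:channel "announce"  :user-key "u2" :timestamp 50 :year-month "2025-10"}
   {:channel "general"   :user-key "u3" :timestamp 1  :year-month "2025-08"}
   {:channel "target"    :user-key "u3" :timestamp 5  :year-month "2025-09"}])

(prior-channels newcomer-toy "target" "2025-10")
;; => [{:prior-channel "general", :newcomers-touched 2}
;;     {:prior-channel "beginners", :newcomers-touched 1}]
```

In the toy data, u3 is not a newcomer for "2025-10" because their first post in the target channel falls in "2025-09", and u2's post in "announce" is ignored because it comes after their first post in the target channel.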

Monthly activity per channel

channel-monthly-activity is the long-form basis for any activity-over-time chart: one row per (channel, month-date) with a :msgs count. Pass an optional set of channel names to restrict the output.

(def civitas-monthly
  (nar/channel-monthly-activity timeline #{"clojurecivitas"}))
civitas-monthly

_unnamed [9 3]:

:channel :month-date :msgs
clojurecivitas 2025-09-01 20
clojurecivitas 2025-10-01 3
clojurecivitas 2025-11-01 55
clojurecivitas 2025-12-01 40
clojurecivitas 2026-01-01 60
clojurecivitas 2026-02-01 29
clojurecivitas 2026-03-01 28
clojurecivitas 2026-04-01 26
clojurecivitas 2026-05-01 12

The total over the channel matches the lifecycle row:

(reduce + (:msgs civitas-monthly))
273
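
The long-form count and the consistency check above can be sketched in plain Clojure. The function monthly-activity and the toy rows are hypothetical, and months are "YYYY-MM" strings here rather than LocalDate values.

```clojure
;; Hypothetical sketch: one map per (channel, month) with a message count.
(defn monthly-activity
  ([rows] (monthly-activity rows nil))
  ([rows channels]
   (->> rows
        (filter #(or (nil? channels) (contains? channels (:channel %))))
        (map (juxt :channel :year-month))
        frequencies
        (map (fn [[[channel month] n]]
               {:channel channel :month month :msgs n}))
        (sort-by (juxt :channel :month))
        vec)))

(def monthly-toy
  [{:channel "a" :year-month "2025-09"}
   {:channel "a" :year-month "2025-09"}
   {:channel "a" :year-month "2025-10"}
   {:channel "b" :year-month "2025-09"}])

(monthly-activity monthly-toy #{"a"})
;; => [{:channel "a", :month "2025-09", :msgs 2}
;;     {:channel "a", :month "2025-10", :msgs 1}]

;; The monthly counts of a channel add up to its message total:
(= (reduce + (map :msgs (monthly-activity monthly-toy #{"a"})))
   (count (filter #(= "a" (:channel %)) monthly-toy)))
;; => true
```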

Where to go next

  • Graph views: scicloj.zulipdata.graph builds co-membership and co-presence graphs from the same anonymized timeline, plus utilities for community detection and rendering.

  • API Reference: every public function in one chapter, with docstrings and a worked example each.

source: notebooks/zulipdata_book/narrative.clj