6 Anonymized views
The plain views in scicloj.zulipdata.views carry real names, topic strings, and message text. That is fine for analyses that stay on your machine — but the moment a chart, a markdown table, or an exported dataset leaves your laptop, real identities and quoted content go with it.
scicloj.zulipdata.anonymize produces parallel views with:
- sender ids replaced by stable 16-hex-character :user-keys,
- topic strings replaced by stable 16-hex-character :subject-keys,
- message content dropped (only :content-length survives).
Same shape, same join keys, no real names or message bodies.
(ns zulipdata-book.anonymize
  (:require
   ;; Zulipdata pull -- paginated, cached channel history
   [scicloj.zulipdata.pull :as pull]
   ;; Zulipdata anonymize -- HMAC-keyed anonymized projections
   [scicloj.zulipdata.anonymize :as anon]
   ;; Kindly -- notebook rendering protocol
   [scicloj.kindly.v4.kind :as kind]
   ;; Tablecloth -- dataset manipulation
   [tablecloth.api :as tc]))
How the keys are derived
Both keys come from HMAC-SHA256 (a keyed one-way hash), keyed by a single committed salt; see src/scicloj/zulipdata/anonymize.clj. The salt is in source on purpose: re-running the analysis must produce the same keys, so that follow-up work links back to prior artifacts.
This means the published artifacts are pseudonymous, not anonymous. Anyone with the salt and access to the original Zulip data can re-identify by re-hashing. The goal is to keep real names and message text from appearing in checked-in markdown, slides, or dashboards — not to be unbreakable.
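The derivation and the re-identification risk described above can be sketched with the JDK's own HMAC support. This is a minimal illustration, not the library's code: the salt value, the names hmac-sha256-hex, demo-key, and reidentify, the choice to stringify the input, and the rule of keeping the first 16 hex characters are all assumptions made for the example.

```clojure
(import '(javax.crypto Mac)
        '(javax.crypto.spec SecretKeySpec)
        '(java.nio.charset StandardCharsets))

;; HYPOTHETICAL salt for illustration only; the real one is committed in
;; src/scicloj/zulipdata/anonymize.clj.
(def demo-salt "not-the-real-salt")

(defn hmac-sha256-hex
  "Hex-encoded HMAC-SHA256 of `text`, keyed with `salt` (64 hex characters)."
  [^String salt ^String text]
  (let [algo "HmacSHA256"
        mac  (doto (Mac/getInstance algo)
               (.init (SecretKeySpec. (.getBytes salt StandardCharsets/UTF_8) algo)))]
    (->> (.doFinal mac (.getBytes text StandardCharsets/UTF_8))
         ;; mask each signed byte to 0..255 before printing two hex digits
         (map #(format "%02x" (bit-and % 0xff)))
         (apply str))))

(defn demo-key
  "ASSUMED derivation: stringify the input, HMAC it, keep the first 16 hex
  characters (64 bits). nil stays nil."
  [x]
  (when (some? x)
    (subs (hmac-sha256-hex demo-salt (str x)) 0 16)))

(defn reidentify
  "Builds a lookup table from pseudonymous key back to the original value.
  Anyone holding the salt and the candidate values can do this."
  [candidates]
  (into {} (map (juxt demo-key identity)) candidates))

;; Re-hashing every known sender id recovers who is behind a published key.
(get (reidentify [41 42 43]) (demo-key 42))
```

The last expression returns the original id, which is exactly why the published artifacts count as pseudonymous rather than anonymous.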
Both keys are 16 hex characters (64 bits) — wide enough that collisions are not a practical concern at this corpus’s scale (low-thousands of users, low-thousands of subjects).
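The collision claim can be checked with the standard birthday-bound approximation, p ≈ 1 − exp(−n(n−1) / (2·2^bits)). The sketch below assumes the truncated HMAC output behaves like uniformly random 64-bit values; the function name collision-probability and the figure of 5,000 distinct inputs (standing in for "low thousands") are illustrative choices.

```clojure
(defn collision-probability
  "Birthday-bound estimate of the chance that at least two of `n` uniformly
  random `bits`-bit keys coincide."
  [n bits]
  (let [pairs (/ (* n (dec n)) 2.0)   ; number of distinct pairs of keys
        space (Math/pow 2.0 bits)]    ; number of possible key values
    ;; 1 - e^(-x), computed via expm1 so tiny probabilities keep their precision
    (- (Math/expm1 (- (/ pairs space))))))

;; 5,000 distinct users or subjects hashed to 64 bits
(collision-probability 5000 64)

;; the same corpus with only 32 bits (8 hex characters), for contrast
(collision-probability 5000 32)
```

At 64 bits the estimate is below one in a trillion. At 32 bits it rises to roughly three in a thousand, which suggests why a shorter key would be a poorer fit.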
Hashing one value
The two key functions, user-key and subject-key, are exposed for ad-hoc use. They are pure functions, accept nil, and return deterministic hex strings.
(anon/user-key 42)
"62b81b15a6414d9b"
The output is stable: hashing the same input always returns the same key.
(= (anon/user-key 42) (anon/user-key 42))
true
Different inputs almost certainly hash to different keys:
(not= (anon/user-key 42) (anon/user-key 43))
true
A nil sender id (which can happen for system messages) maps to nil rather than to a hash:
(anon/user-key nil)
nil
(anon/subject-key "channel introductions")
"b61cd3d678d6f0da"
A small sample
A single channel, kindly-dev, is enough to illustrate the anonymization layer. The cross-channel patterns come back in the narrative and graph chapters.
(def messages
  (-> (pull/pull-channels! ["kindly-dev"])
      (get "kindly-dev")
      pull/all-messages))
One row per message, anonymized
anonymized-timeline mirrors views/messages-timeline but with sender ids, sender names, subject strings, and message content replaced or removed.
(def anon-timeline (anon/anonymized-timeline messages))
Note that :user-key and :subject-key are hex strings, and there is no :content column.
anon-timeline
_unnamed [1134 11]:
| :last-edit-ts | :client | :reaction-count | :channel | :user-key | :stream-id | :edited | :content-length | :id | :subject-key | :timestamp |
|---|---|---|---|---|---|---|---|---|---|---|
| | Internal | 0 | kindly-dev | a150cbcc9f0efb7d | 454856 | false | 109 | 468115793 | 28c21dbfe5d7f7c2 | 1725611765 |
| 1725824532 | website | 0 | kindly-dev | d19c7e5c3106f475 | 454856 | true | 124 | 468115883 | 4335e23a0728e371 | 1725611792 |
| 1725828308 | website | 0 | kindly-dev | d19c7e5c3106f475 | 454856 | true | 124 | 468617117 | 4335e23a0728e371 | 1725824082 |
| 1725832278 | website | 0 | kindly-dev | d19c7e5c3106f475 | 454856 | true | 110 | 468624650 | 4335e23a0728e371 | 1725828301 |
| | ZulipElectron | 1 | kindly-dev | 2a7920fa288b6ac5 | 454856 | false | 88 | 469159581 | a902c6f46a0525a5 | 1725988465 |
| 1726053034 | website | 0 | kindly-dev | d19c7e5c3106f475 | 454856 | true | 251 | 469366033 | 4335e23a0728e371 | 1726053020 |
| | website | 0 | kindly-dev | d19c7e5c3106f475 | 454856 | false | 188 | 469366364 | 4335e23a0728e371 | 1726053109 |
| | ZulipElectron | 0 | kindly-dev | 2a7920fa288b6ac5 | 454856 | false | 122 | 470083364 | 5ae277ebef633a96 | 1726283759 |
| | website | 0 | kindly-dev | d19c7e5c3106f475 | 454856 | false | 110 | 470117042 | 5ae277ebef633a96 | 1726298921 |
| | website | 3 | kindly-dev | f936fd411b38fdf2 | 454856 | false | 120 | 470188381 | bae18381bff69790 | 1726328425 |
| … | … | … | … | … | … | … | … | … | … | … |
| | ZulipElectron | 0 | kindly-dev | 2a7920fa288b6ac5 | 454856 | false | 2204 | 584802081 | 6373027f93f758f9 | 1775887757 |
| | ZulipElectron | 0 | kindly-dev | 2a7920fa288b6ac5 | 454856 | false | 82 | 584862300 | fcaafb397059b579 | 1775888506 |
| | ZulipFlutter | 0 | kindly-dev | 5ccd816e6ea3fe7f | 454856 | false | 107 | 584937524 | fcaafb397059b579 | 1775900223 |
| | ZulipFlutter | 0 | kindly-dev | d19c7e5c3106f475 | 454856 | false | 181 | 584964602 | 6373027f93f758f9 | 1775924790 |
| | ZulipFlutter | 0 | kindly-dev | d19c7e5c3106f475 | 454856 | false | 50 | 584968058 | 6373027f93f758f9 | 1775927975 |
| | ZulipElectron | 0 | kindly-dev | 2a7920fa288b6ac5 | 454856 | false | 1128 | 584974476 | 6373027f93f758f9 | 1775934118 |
| | ZulipElectron | 0 | kindly-dev | 2a7920fa288b6ac5 | 454856 | false | 38 | 584974566 | fcaafb397059b579 | 1775934196 |
| | website | 0 | kindly-dev | d19c7e5c3106f475 | 454856 | false | 13 | 584978922 | 6373027f93f758f9 | 1775938908 |
| | ZulipElectron | 0 | kindly-dev | 2a7920fa288b6ac5 | 454856 | false | 677 | 585768804 | fcaafb397059b579 | 1776292063 |
| | website | 0 | kindly-dev | d19c7e5c3106f475 | 454856 | false | 6 | 585827259 | fcaafb397059b579 | 1776324973 |
| | ZulipFlutter | 0 | kindly-dev | 5ccd816e6ea3fe7f | 454856 | false | 220 | 586159679 | fcaafb397059b579 | 1776439559 |
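The projection behind one row of this table can be sketched with plain maps. This is an illustration under stated assumptions, not the library's implementation: the raw field names (:sender_id, :sender_full_name, :subject, :content) follow the Zulip message format but may not match what the pull layer returns, :content-length is assumed to be a character count, and the two key functions are passed in so that toy stand-ins can replace anon/user-key and anon/subject-key.

```clojure
(defn anonymize-row
  "Projects one raw message map onto an anonymized row: ids and topic strings
  go through the supplied key functions, the body is reduced to its length,
  and the sender's name is not carried over at all."
  [user-key-fn subject-key-fn msg]
  {:id             (:id msg)
   :timestamp      (:timestamp msg)
   :user-key       (user-key-fn (:sender_id msg))
   :subject-key    (subject-key-fn (:subject msg))
   :content-length (count (:content msg))})

;; HYPOTHETICAL raw message and toy key functions (these are labels, not hashes)
(def raw-message
  {:id 1
   :timestamp 1725611765
   :sender_id 42
   :sender_full_name "Example Person"
   :subject "hello"
   :content "some text"})

(def row
  (anonymize-row #(when (some? %) (str "user:" %))
                 #(when (some? %) (str "subject:" (count %)))
                 raw-message))

;; Only the whitelisted keys survive; :content and :sender_full_name are gone.
(keys row)
```

Building the row from an explicit list of keys, rather than removing sensitive keys from the raw map, means a new field in the source data cannot leak through by default.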
(tc/row-count anon-timeline)
1134
The distinct user-keys in this channel:
(-> anon-timeline :user-key distinct sort)
("0f228145198d6bce"
"1d7079ea9e63dc37"
"1f708947496ccc88"
"2a7920fa288b6ac5"
"2dbe557be94bc5b0"
"3783b4b149fcde59"
"3a6ab2db8b6fdeee"
"4c9e777783eb9222"
"563648fe0ef5e5d8"
"56f8fdabad6761cb"
"5ccd816e6ea3fe7f"
"73c3354ec3c7165d"
"858005e0ca117ee3"
"928916c8cb9db868"
"a150cbcc9f0efb7d"
"b384f0e07335b9de"
"b6e916f8e4b1f3cc"
"b7b54d61bcd4fb14"
"d19c7e5c3106f475"
"e991fe149be52ac1"
"f936fd411b38fdf2")
One row per reaction, anonymized
anonymized-reactions mirrors views/reactions-long. The emoji name (a community-sentiment signal, not message content) is preserved; the reactor’s identity and the message’s subject are both anonymized.
(def anon-reactions (anon/anonymized-reactions messages))
anon-reactions
_unnamed [113 9]:
| :message-id | :channel | :stream-id | :emoji-name | :message-ts | :emoji-code | :subject-key | :reactor-user-key | :reaction-type |
|---|---|---|---|---|---|---|---|---|
| 469159581 | kindly-dev | 454856 | thank_you | 1725988465 | 1f64f | a902c6f46a0525a5 | d19c7e5c3106f475 | unicode_emoji |
| 470188381 | kindly-dev | 454856 | tada | 1726328425 | 1f389 | bae18381bff69790 | d19c7e5c3106f475 | unicode_emoji |
| 470188381 | kindly-dev | 454856 | heart | 1726328425 | 2764 | bae18381bff69790 | 2a7920fa288b6ac5 | unicode_emoji |
| 470188381 | kindly-dev | 454856 | tada | 1726328425 | 1f389 | bae18381bff69790 | d3a87fc057dceac5 | unicode_emoji |
| 472817130 | kindly-dev | 454856 | smiley | 1727328746 | 1f603 | a016942505f43e89 | f936fd411b38fdf2 | unicode_emoji |
| 474132755 | kindly-dev | 454856 | pray | 1727818951 | 1f64f | 12a57bced4aecea6 | 117523698ec5b5f5 | unicode_emoji |
| 474132755 | kindly-dev | 454856 | tada | 1727818951 | 1f389 | 12a57bced4aecea6 | e991fe149be52ac1 | unicode_emoji |
| 475886087 | kindly-dev | 454856 | thank_you | 1728485510 | 1f64f | a09271f1ca7d309e | d19c7e5c3106f475 | unicode_emoji |
| 475912396 | kindly-dev | 454856 | heart | 1728491269 | 2764 | a09271f1ca7d309e | 858005e0ca117ee3 | unicode_emoji |
| 477014516 | kindly-dev | 454856 | +1 | 1729005919 | 1f44d | 4335e23a0728e371 | e991fe149be52ac1 | unicode_emoji |
| … | … | … | … | … | … | … | … | … |
| 539389999 | kindly-dev | 454856 | innocent | 1757836051 | 1f607 | cba143563e94cf64 | b7b54d61bcd4fb14 | unicode_emoji |
| 539390132 | kindly-dev | 454856 | +1 | 1757836249 | 1f44d | cba143563e94cf64 | b7b54d61bcd4fb14 | unicode_emoji |
| 539728157 | kindly-dev | 454856 | slight_smile | 1758016808 | 1f642 | cba143563e94cf64 | 2a7920fa288b6ac5 | unicode_emoji |
| 553199550 | kindly-dev | 454856 | +1 | 1762040995 | 1f44d | e57e9fabe80d53e5 | 1f708947496ccc88 | unicode_emoji |
| 553199550 | kindly-dev | 454856 | +1 | 1762040995 | 1f44d | e57e9fabe80d53e5 | b6e916f8e4b1f3cc | unicode_emoji |
| 564174719 | kindly-dev | 454856 | plus | 1765948953 | 2795 | fcaafb397059b579 | 1d7079ea9e63dc37 | unicode_emoji |
| 571198343 | kindly-dev | 454856 | plus | 1769867829 | 2795 | 9816c22ac615420f | b6e916f8e4b1f3cc | unicode_emoji |
| 571199039 | kindly-dev | 454856 | bulb | 1769868479 | 1f4a1 | 9816c22ac615420f | d19c7e5c3106f475 | unicode_emoji |
| 571261632 | kindly-dev | 454856 | plus | 1769927171 | 2795 | 5a9cf25a508fc2ae | 1d7079ea9e63dc37 | unicode_emoji |
| 571262798 | kindly-dev | 454856 | +1 | 1769928544 | 1f44d | 5a9cf25a508fc2ae | d19c7e5c3106f475 | unicode_emoji |
| 571498286 | kindly-dev | 454856 | bulb | 1770056363 | 1f4a1 | d6c9ae29a0615fb9 | d19c7e5c3106f475 | unicode_emoji |
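Because the anonymized views keep the same join keys, a reaction row can still be linked to the message it reacts to through :message-id and :id. The sketch below uses plain Clojure on a handful of rows copied from the two tables above, trimmed to the columns needed. The names timeline-sample, reactions-sample, author-of, and reactor->author are made up for the example.

```clojure
;; Two messages from the anonymized timeline (id and author key only)
(def timeline-sample
  [{:id 469159581 :user-key "2a7920fa288b6ac5"}
   {:id 470188381 :user-key "f936fd411b38fdf2"}])

;; The first four rows of the anonymized reactions table
(def reactions-sample
  [{:message-id 469159581 :emoji-name "thank_you" :reactor-user-key "d19c7e5c3106f475"}
   {:message-id 470188381 :emoji-name "tada"      :reactor-user-key "d19c7e5c3106f475"}
   {:message-id 470188381 :emoji-name "heart"     :reactor-user-key "2a7920fa288b6ac5"}
   {:message-id 470188381 :emoji-name "tada"      :reactor-user-key "d3a87fc057dceac5"}])

;; message id -> author's user-key
(def author-of
  (into {} (map (juxt :id :user-key)) timeline-sample))

;; [reactor author] pair -> number of reactions, all in pseudonymous keys
(def reactor->author
  (->> reactions-sample
       (map (fn [r] [(:reactor-user-key r) (author-of (:message-id r))]))
       frequencies))

;; emoji name -> how often it was used
(def emoji-counts
  (frequencies (map :emoji-name reactions-sample)))
```

Here reactor->author has four distinct pairs, each seen once, and emoji-counts records "tada" twice. Neither result needs a real name or any message text.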
One row per edit, anonymized
anonymized-edits mirrors views/edits-long with the editor and prior subject anonymized and prior content dropped. :prev-stream (a numeric stream id, not personal data) is left as-is.
(def anon-edits (anon/anonymized-edits messages))
anon-edits
_unnamed [362 7]:
| :message-id | :stream-id | :channel | :edit-ts | :editor-user-key | :prev-subject-key | :prev-stream |
|---|---|---|---|---|---|---|
| 468115883 | 454856 | kindly-dev | 1725824532 | d19c7e5c3106f475 | ||
| 468617117 | 454856 | kindly-dev | 1725828308 | d19c7e5c3106f475 | ||
| 468617117 | 454856 | kindly-dev | 1725828272 | d19c7e5c3106f475 | ||
| 468617117 | 454856 | kindly-dev | 1725824096 | d19c7e5c3106f475 | ||
| 468624650 | 454856 | kindly-dev | 1725832278 | d19c7e5c3106f475 | ||
| 469366033 | 454856 | kindly-dev | 1726053034 | d19c7e5c3106f475 | ||
| 474132755 | 454856 | kindly-dev | 1727819201 | d19c7e5c3106f475 | ||
| 474132755 | 454856 | kindly-dev | 1727819135 | d19c7e5c3106f475 | ||
| 474132755 | 454856 | kindly-dev | 1727819103 | d19c7e5c3106f475 | ||
| 474132755 | 454856 | kindly-dev | 1727818984 | d19c7e5c3106f475 | ||
| … | … | … | … | … | … | … |
| 564856994 | 454856 | kindly-dev | 1766268837 | 5ccd816e6ea3fe7f | ||
| 564859953 | 454856 | kindly-dev | 1766273466 | 2a7920fa288b6ac5 | ||
| 564860698 | 454856 | kindly-dev | 1766274811 | 5ccd816e6ea3fe7f | ||
| 564862263 | 454856 | kindly-dev | 1766277762 | 2a7920fa288b6ac5 | ||
| 564862263 | 454856 | kindly-dev | 1766277651 | 2a7920fa288b6ac5 | ||
| 571147459 | 454856 | kindly-dev | 1769820449 | 2a7920fa288b6ac5 | ||
| 571177970 | 454856 | kindly-dev | 1769849800 | d19c7e5c3106f475 | ||
| 571199039 | 454856 | kindly-dev | 1769868572 | 1d7079ea9e63dc37 | ||
| 571199039 | 454856 | kindly-dev | 1769868495 | 1d7079ea9e63dc37 | ||
| 571303192 | 454856 | kindly-dev | 1769967729 | d19c7e5c3106f475 | ||
| 584758936 | 454856 | kindly-dev | 1775855685 | 2a7920fa288b6ac5 | ||
What the anonymized data can — and cannot — answer
The anonymized views are designed for questions about who (as a stable pseudonymous key), when, and where, not about what was said.
Can be answered:
- Activity patterns over time, per channel, per user.
- Cohort tenure, retention, cross-channel migration.
- Reaction culture (emoji names are preserved).
- Edit rates and topic moves.
- Subject recurrence: the same :subject-key in many messages means the same topic string (within one channel, the same topic thread), even though the topic text is hidden.
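As a small illustration of the answerable questions, the sketch below summarizes subject recurrence from four timeline rows copied from the table earlier in this chapter, trimmed to three columns. It uses plain Clojure so it runs without the library; subject-summary and rows are names made up for the example.

```clojure
(def rows
  [{:user-key "d19c7e5c3106f475" :subject-key "4335e23a0728e371" :timestamp 1725611792}
   {:user-key "d19c7e5c3106f475" :subject-key "4335e23a0728e371" :timestamp 1725824082}
   {:user-key "2a7920fa288b6ac5" :subject-key "5ae277ebef633a96" :timestamp 1726283759}
   {:user-key "d19c7e5c3106f475" :subject-key "5ae277ebef633a96" :timestamp 1726298921}])

(defn subject-summary
  "Per subject-key: message count, number of distinct participants, and the
  timestamp of the first message."
  [rows]
  (into {}
        (map (fn [[subject-key msgs]]
               [subject-key
                {:messages     (count msgs)
                 :participants (count (distinct (map :user-key msgs)))
                 :first-ts     (apply min (map :timestamp msgs))}]))
        (group-by :subject-key rows)))

(subject-summary rows)
```

Both subjects have two messages in this sample, but only the second involves two participants. The summary says nothing about what either topic was called.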
Cannot be answered without un-anonymizing:
- What was discussed (no content, no subject text).
- Who specifically did what (no names).
- Sentiment beyond what reactions carry.
Where to go next
- Narrative: scicloj.zulipdata.narrative adds time columns, channel-lifecycle summaries, and newcomer-tracking helpers that operate on the anonymized timeline.
- Graph views: scicloj.zulipdata.graph builds co-membership and co-presence graphs from the same anonymized timeline.
- API Reference: every public function in one chapter, with docstrings and a worked example each.