6  Anonymized views

The plain views in scicloj.zulipdata.views carry real names, topic strings, and message text. That is fine for analyses that stay on your machine — but the moment a chart, a markdown table, or an exported dataset leaves your laptop, real identities and quoted content go with it.

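scicloj.zulipdata.anonymize produces parallel views in which sender ids and subject strings are replaced by hashed keys (:user-key and :subject-key), and sender names and message content are removed.

Same shape, same join keys, no real names or message bodies.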

(ns zulipdata-book.anonymize
  (:require
   ;; Zulipdata pull -- paginated, cached channel history
   [scicloj.zulipdata.pull :as pull]
   ;; Zulipdata anonymize -- HMAC-keyed anonymized projections
   [scicloj.zulipdata.anonymize :as anon]
   ;; Kindly -- notebook rendering protocol
   [scicloj.kindly.v4.kind :as kind]
   ;; Tablecloth -- dataset manipulation
   [tablecloth.api :as tc]))

How the keys are derived

Both keys, :user-key and :subject-key, come from HMAC-SHA256 (a keyed, one-way cryptographic hash) with a single committed salt; see src/scicloj/zulipdata/anonymize.clj. The salt is in source on purpose: re-running the analysis must produce the same keys, so that follow-up work links back to prior artifacts.

This means the published artifacts are pseudonymous, not anonymous. Anyone with the salt and access to the original Zulip data can re-identify by re-hashing. The goal is to keep real names and message text from appearing in checked-in markdown, slides, or dashboards — not to be unbreakable.
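To make the derivation and the re-identification risk concrete, here is a minimal sketch. It is not the code in anonymize.clj: the salt value, the helper names hmac-hex-16 and reidentify, the stringification of the input, and the choice to keep the first 8 bytes of the digest (16 hex characters) are all assumptions made for illustration.

```clojure
(import '(javax.crypto Mac)
        '(javax.crypto.spec SecretKeySpec)
        '(java.nio.charset StandardCharsets))

;; Hypothetical salt for illustration only; NOT the project's committed salt.
(def demo-salt "demo-salt-not-the-real-one")

(defn hmac-hex-16
  "HMAC-SHA256 of (str x) keyed by salt, truncated to the first 8 bytes
  and rendered as 16 lowercase hex characters. Returns nil for nil input.
  The truncation and stringification are assumptions of this sketch."
  [^String salt x]
  (when (some? x)
    (let [key-spec (SecretKeySpec. (.getBytes salt StandardCharsets/UTF_8)
                                   "HmacSHA256")
          mac      (doto (Mac/getInstance "HmacSHA256")
                     (.init key-spec))
          digest   (.doFinal mac (.getBytes (str x) StandardCharsets/UTF_8))]
      (->> digest
           (take 8)
           ;; bit-and converts the signed byte to an unsigned 0-255 value
           (map #(format "%02x" (bit-and % 0xff)))
           (apply str)))))

(defn reidentify
  "Builds a key -> original-id lookup by hashing every known id.
  Anyone holding the salt and the original ids can do this."
  [salt known-ids]
  (into {} (map (juxt #(hmac-hex-16 salt %) identity)) known-ids))

;; Looking up a published key recovers the id it was derived from.
(get (reidentify demo-salt [41 42 43])
     (hmac-hex-16 demo-salt 42))
;; => 42
```

The lookup table is the whole attack: no hash needs to be inverted, only recomputed for every candidate id. That is why the keys count as pseudonyms rather than anonymous identifiers.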

Both keys are 16 hex characters (64 bits) — wide enough that collisions are not a practical concern at this corpus’s scale (low-thousands of users, low-thousands of subjects).
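As a rough check of the collision claim, the birthday approximation p ≈ n(n-1) / (2 · 2^bits) can be evaluated directly. The figure of 5,000 distinct inputs is an assumed stand-in for "low-thousands", not a measured count, and collision-probability is a name introduced here for illustration.

```clojure
(defn collision-probability
  "Birthday-bound approximation of the chance that at least two of n
  distinct inputs share a key, for keys that are `bits` bits wide.
  Valid when the result is much smaller than 1."
  [n bits]
  (/ (* (double n) (dec n))
     (* 2.0 (Math/pow 2.0 bits))))

;; Assumed corpus size of 5,000 distinct users or subjects, 64-bit keys.
(collision-probability 5000 64)
;; roughly 7e-13
```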

Hashing one value

The two key functions, user-key and subject-key, are exposed for ad-hoc use. They are pure functions, accept nil, and return deterministic hex strings.

(anon/user-key 42)
"62b81b15a6414d9b"

The output is stable: hashing the same input always returns the same key.

(= (anon/user-key 42) (anon/user-key 42))
true

Different inputs almost certainly hash to different keys:

(not= (anon/user-key 42) (anon/user-key 43))
true

A nil sender id (which can happen for system messages) maps to nil rather than to a hash:

(anon/user-key nil)
nil

subject-key hashes a subject (topic) string in the same way:

(anon/subject-key "channel introductions")
"b61cd3d678d6f0da"

A small sample

A single channel, kindly-dev, is enough to illustrate the anonymization layer. The cross-channel patterns come back in the narrative and graph chapters.

(def messages
  (-> (pull/pull-channels! ["kindly-dev"])
      (get "kindly-dev")
      pull/all-messages))

One row per message — anonymized

anonymized-timeline mirrors views/messages-timeline but with sender ids, sender names, subject strings, and message content replaced or removed.

(def anon-timeline (anon/anonymized-timeline messages))

Note that :user-key and :subject-key are hex strings, and there is no :content column.

anon-timeline

_unnamed [1134 11]:

| :last-edit-ts | :client | :reaction-count | :channel | :user-key | :stream-id | :edited | :content-length | :id | :subject-key | :timestamp |
|---|---|---|---|---|---|---|---|---|---|---|
|  | Internal | 0 | kindly-dev | a150cbcc9f0efb7d | 454856 | false | 109 | 468115793 | 28c21dbfe5d7f7c2 | 1725611765 |
| 1725824532 | website | 0 | kindly-dev | d19c7e5c3106f475 | 454856 | true | 124 | 468115883 | 4335e23a0728e371 | 1725611792 |
| 1725828308 | website | 0 | kindly-dev | d19c7e5c3106f475 | 454856 | true | 124 | 468617117 | 4335e23a0728e371 | 1725824082 |
| 1725832278 | website | 0 | kindly-dev | d19c7e5c3106f475 | 454856 | true | 110 | 468624650 | 4335e23a0728e371 | 1725828301 |
|  | ZulipElectron | 1 | kindly-dev | 2a7920fa288b6ac5 | 454856 | false | 88 | 469159581 | a902c6f46a0525a5 | 1725988465 |
| 1726053034 | website | 0 | kindly-dev | d19c7e5c3106f475 | 454856 | true | 251 | 469366033 | 4335e23a0728e371 | 1726053020 |
|  | website | 0 | kindly-dev | d19c7e5c3106f475 | 454856 | false | 188 | 469366364 | 4335e23a0728e371 | 1726053109 |
|  | ZulipElectron | 0 | kindly-dev | 2a7920fa288b6ac5 | 454856 | false | 122 | 470083364 | 5ae277ebef633a96 | 1726283759 |
|  | website | 0 | kindly-dev | d19c7e5c3106f475 | 454856 | false | 110 | 470117042 | 5ae277ebef633a96 | 1726298921 |
|  | website | 3 | kindly-dev | f936fd411b38fdf2 | 454856 | false | 120 | 470188381 | bae18381bff69790 | 1726328425 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
|  | ZulipElectron | 0 | kindly-dev | 2a7920fa288b6ac5 | 454856 | false | 2204 | 584802081 | 6373027f93f758f9 | 1775887757 |
|  | ZulipElectron | 0 | kindly-dev | 2a7920fa288b6ac5 | 454856 | false | 82 | 584862300 | fcaafb397059b579 | 1775888506 |
|  | ZulipFlutter | 0 | kindly-dev | 5ccd816e6ea3fe7f | 454856 | false | 107 | 584937524 | fcaafb397059b579 | 1775900223 |
|  | ZulipFlutter | 0 | kindly-dev | d19c7e5c3106f475 | 454856 | false | 181 | 584964602 | 6373027f93f758f9 | 1775924790 |
|  | ZulipFlutter | 0 | kindly-dev | d19c7e5c3106f475 | 454856 | false | 50 | 584968058 | 6373027f93f758f9 | 1775927975 |
|  | ZulipElectron | 0 | kindly-dev | 2a7920fa288b6ac5 | 454856 | false | 1128 | 584974476 | 6373027f93f758f9 | 1775934118 |
|  | ZulipElectron | 0 | kindly-dev | 2a7920fa288b6ac5 | 454856 | false | 38 | 584974566 | fcaafb397059b579 | 1775934196 |
|  | website | 0 | kindly-dev | d19c7e5c3106f475 | 454856 | false | 13 | 584978922 | 6373027f93f758f9 | 1775938908 |
|  | ZulipElectron | 0 | kindly-dev | 2a7920fa288b6ac5 | 454856 | false | 677 | 585768804 | fcaafb397059b579 | 1776292063 |
|  | website | 0 | kindly-dev | d19c7e5c3106f475 | 454856 | false | 6 | 585827259 | fcaafb397059b579 | 1776324973 |
|  | ZulipFlutter | 0 | kindly-dev | 5ccd816e6ea3fe7f | 454856 | false | 220 | 586159679 | fcaafb397059b579 | 1776439559 |

(tc/row-count anon-timeline)
1134

The distinct user-keys in this channel:

(-> anon-timeline :user-key distinct sort)
("0f228145198d6bce"
 "1d7079ea9e63dc37"
 "1f708947496ccc88"
 "2a7920fa288b6ac5"
 "2dbe557be94bc5b0"
 "3783b4b149fcde59"
 "3a6ab2db8b6fdeee"
 "4c9e777783eb9222"
 "563648fe0ef5e5d8"
 "56f8fdabad6761cb"
 "5ccd816e6ea3fe7f"
 "73c3354ec3c7165d"
 "858005e0ca117ee3"
 "928916c8cb9db868"
 "a150cbcc9f0efb7d"
 "b384f0e07335b9de"
 "b6e916f8e4b1f3cc"
 "b7b54d61bcd4fb14"
 "d19c7e5c3106f475"
 "e991fe149be52ac1"
 "f936fd411b38fdf2")

One row per reaction — anonymized

anonymized-reactions mirrors views/reactions-long. The emoji name (a community-sentiment signal, not message content) is preserved; the reactor’s identity and the message’s subject are both anonymized.

(def anon-reactions (anon/anonymized-reactions messages))
anon-reactions

_unnamed [113 9]:

| :message-id | :channel | :stream-id | :emoji-name | :message-ts | :emoji-code | :subject-key | :reactor-user-key | :reaction-type |
|---|---|---|---|---|---|---|---|---|
| 469159581 | kindly-dev | 454856 | thank_you | 1725988465 | 1f64f | a902c6f46a0525a5 | d19c7e5c3106f475 | unicode_emoji |
| 470188381 | kindly-dev | 454856 | tada | 1726328425 | 1f389 | bae18381bff69790 | d19c7e5c3106f475 | unicode_emoji |
| 470188381 | kindly-dev | 454856 | heart | 1726328425 | 2764 | bae18381bff69790 | 2a7920fa288b6ac5 | unicode_emoji |
| 470188381 | kindly-dev | 454856 | tada | 1726328425 | 1f389 | bae18381bff69790 | d3a87fc057dceac5 | unicode_emoji |
| 472817130 | kindly-dev | 454856 | smiley | 1727328746 | 1f603 | a016942505f43e89 | f936fd411b38fdf2 | unicode_emoji |
| 474132755 | kindly-dev | 454856 | pray | 1727818951 | 1f64f | 12a57bced4aecea6 | 117523698ec5b5f5 | unicode_emoji |
| 474132755 | kindly-dev | 454856 | tada | 1727818951 | 1f389 | 12a57bced4aecea6 | e991fe149be52ac1 | unicode_emoji |
| 475886087 | kindly-dev | 454856 | thank_you | 1728485510 | 1f64f | a09271f1ca7d309e | d19c7e5c3106f475 | unicode_emoji |
| 475912396 | kindly-dev | 454856 | heart | 1728491269 | 2764 | a09271f1ca7d309e | 858005e0ca117ee3 | unicode_emoji |
| 477014516 | kindly-dev | 454856 | +1 | 1729005919 | 1f44d | 4335e23a0728e371 | e991fe149be52ac1 | unicode_emoji |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 539389999 | kindly-dev | 454856 | innocent | 1757836051 | 1f607 | cba143563e94cf64 | b7b54d61bcd4fb14 | unicode_emoji |
| 539390132 | kindly-dev | 454856 | +1 | 1757836249 | 1f44d | cba143563e94cf64 | b7b54d61bcd4fb14 | unicode_emoji |
| 539728157 | kindly-dev | 454856 | slight_smile | 1758016808 | 1f642 | cba143563e94cf64 | 2a7920fa288b6ac5 | unicode_emoji |
| 553199550 | kindly-dev | 454856 | +1 | 1762040995 | 1f44d | e57e9fabe80d53e5 | 1f708947496ccc88 | unicode_emoji |
| 553199550 | kindly-dev | 454856 | +1 | 1762040995 | 1f44d | e57e9fabe80d53e5 | b6e916f8e4b1f3cc | unicode_emoji |
| 564174719 | kindly-dev | 454856 | plus | 1765948953 | 2795 | fcaafb397059b579 | 1d7079ea9e63dc37 | unicode_emoji |
| 571198343 | kindly-dev | 454856 | plus | 1769867829 | 2795 | 9816c22ac615420f | b6e916f8e4b1f3cc | unicode_emoji |
| 571199039 | kindly-dev | 454856 | bulb | 1769868479 | 1f4a1 | 9816c22ac615420f | d19c7e5c3106f475 | unicode_emoji |
| 571261632 | kindly-dev | 454856 | plus | 1769927171 | 2795 | 5a9cf25a508fc2ae | 1d7079ea9e63dc37 | unicode_emoji |
| 571262798 | kindly-dev | 454856 | +1 | 1769928544 | 1f44d | 5a9cf25a508fc2ae | d19c7e5c3106f475 | unicode_emoji |
| 571498286 | kindly-dev | 454856 | bulb | 1770056363 | 1f4a1 | d6c9ae29a0615fb9 | d19c7e5c3106f475 | unicode_emoji |
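
Because the anonymized views keep their join keys, reactions can be linked back to message authors without any names. The sketch below uses plain Clojure maps holding a handful of rows copied from the timeline and reaction tables in this chapter. The names timeline-rows, reaction-rows, author-by-id, and reaction-edges are illustrative only; on the full datasets the same join would more naturally be done with the dataset library.

```clojure
;; Two timeline rows (author of each message), copied from the table above.
(def timeline-rows
  [{:id 469159581 :user-key "2a7920fa288b6ac5"}
   {:id 470188381 :user-key "f936fd411b38fdf2"}])

;; The reactions those two messages received.
(def reaction-rows
  [{:message-id 469159581 :emoji-name "thank_you" :reactor-user-key "d19c7e5c3106f475"}
   {:message-id 470188381 :emoji-name "tada"      :reactor-user-key "d19c7e5c3106f475"}
   {:message-id 470188381 :emoji-name "heart"     :reactor-user-key "2a7920fa288b6ac5"}
   {:message-id 470188381 :emoji-name "tada"      :reactor-user-key "d3a87fc057dceac5"}])

;; Index authors by message id, then join on :message-id = :id.
(def author-by-id
  (into {} (map (juxt :id :user-key)) timeline-rows))

(def reaction-edges
  (for [{:keys [message-id reactor-user-key emoji-name]} reaction-rows]
    {:from  reactor-user-key
     :to    (author-by-id message-id)
     :emoji emoji-name}))

;; Reactions received per (anonymized) author.
(frequencies (map :to reaction-edges))
;; => {"2a7920fa288b6ac5" 1, "f936fd411b38fdf2" 3}
```

The count of 3 for message 470188381 agrees with its :reaction-count in the anonymized timeline, a quick sanity check that the two views line up.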

One row per edit — anonymized

anonymized-edits mirrors views/edits-long with the editor and prior subject anonymized and prior content dropped. :prev-stream (a numeric stream id, not personal data) is left as-is.

(def anon-edits (anon/anonymized-edits messages))
anon-edits

_unnamed [362 7]:

| :message-id | :stream-id | :channel | :edit-ts | :editor-user-key | :prev-subject-key | :prev-stream |
|---|---|---|---|---|---|---|
| 468115883 | 454856 | kindly-dev | 1725824532 | d19c7e5c3106f475 |  |  |
| 468617117 | 454856 | kindly-dev | 1725828308 | d19c7e5c3106f475 |  |  |
| 468617117 | 454856 | kindly-dev | 1725828272 | d19c7e5c3106f475 |  |  |
| 468617117 | 454856 | kindly-dev | 1725824096 | d19c7e5c3106f475 |  |  |
| 468624650 | 454856 | kindly-dev | 1725832278 | d19c7e5c3106f475 |  |  |
| 469366033 | 454856 | kindly-dev | 1726053034 | d19c7e5c3106f475 |  |  |
| 474132755 | 454856 | kindly-dev | 1727819201 | d19c7e5c3106f475 |  |  |
| 474132755 | 454856 | kindly-dev | 1727819135 | d19c7e5c3106f475 |  |  |
| 474132755 | 454856 | kindly-dev | 1727819103 | d19c7e5c3106f475 |  |  |
| 474132755 | 454856 | kindly-dev | 1727818984 | d19c7e5c3106f475 |  |  |
| ... | ... | ... | ... | ... | ... | ... |
| 564856994 | 454856 | kindly-dev | 1766268837 | 5ccd816e6ea3fe7f |  |  |
| 564859953 | 454856 | kindly-dev | 1766273466 | 2a7920fa288b6ac5 |  |  |
| 564860698 | 454856 | kindly-dev | 1766274811 | 5ccd816e6ea3fe7f |  |  |
| 564862263 | 454856 | kindly-dev | 1766277762 | 2a7920fa288b6ac5 |  |  |
| 564862263 | 454856 | kindly-dev | 1766277651 | 2a7920fa288b6ac5 |  |  |
| 571147459 | 454856 | kindly-dev | 1769820449 | 2a7920fa288b6ac5 |  |  |
| 571177970 | 454856 | kindly-dev | 1769849800 | d19c7e5c3106f475 |  |  |
| 571199039 | 454856 | kindly-dev | 1769868572 | 1d7079ea9e63dc37 |  |  |
| 571199039 | 454856 | kindly-dev | 1769868495 | 1d7079ea9e63dc37 |  |  |
| 571303192 | 454856 | kindly-dev | 1769967729 | d19c7e5c3106f475 |  |  |
| 584758936 | 454856 | kindly-dev | 1775855685 | 2a7920fa288b6ac5 |  |  |

What the anonymized data can — and cannot — answer

The anonymized views are designed for questions about who, when, and where — not about what was said.

Can be answered:

  • Activity patterns over time, per channel, per user.
  • Cohort tenure, retention, cross-channel migration.
  • Reaction culture (emoji names are preserved).
  • Edit rates and topic moves.
  • Subject recurrence: same subject-key in many messages means the same topic thread, even though the topic text is hidden.
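
Two of the questions above, activity per user and subject recurrence, need nothing beyond the hashed keys. The following is a small sketch on plain Clojure maps, using a few user-key and subject-key pairs copied from the anonymized timeline shown earlier. The names sample-rows, messages-per-key, and participants-per-subject are made up for this example and are not part of the library.

```clojure
;; A few (user-key, subject-key) pairs taken from the anonymized timeline.
(def sample-rows
  [{:user-key "2a7920fa288b6ac5" :subject-key "6373027f93f758f9"}
   {:user-key "2a7920fa288b6ac5" :subject-key "fcaafb397059b579"}
   {:user-key "5ccd816e6ea3fe7f" :subject-key "fcaafb397059b579"}
   {:user-key "d19c7e5c3106f475" :subject-key "6373027f93f758f9"}
   {:user-key "d19c7e5c3106f475" :subject-key "6373027f93f758f9"}])

(defn messages-per-key
  "Counts rows per value of column k (for example :user-key or :subject-key)."
  [k rows]
  (frequencies (map k rows)))

(defn participants-per-subject
  "Counts distinct user-keys seen under each subject-key."
  [rows]
  (into {}
        (map (fn [[subject rs]]
               [subject (count (distinct (map :user-key rs)))]))
        (group-by :subject-key rows)))

(messages-per-key :user-key sample-rows)
;; => {"2a7920fa288b6ac5" 2, "5ccd816e6ea3fe7f" 1, "d19c7e5c3106f475" 2}

(messages-per-key :subject-key sample-rows)
;; => {"6373027f93f758f9" 3, "fcaafb397059b579" 2}

(participants-per-subject sample-rows)
;; => {"6373027f93f758f9" 2, "fcaafb397059b579" 2}
```

The results say that one topic drew three messages from two people. They say nothing about what the topic was, which is exactly the boundary this section describes.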

Cannot be answered without un-anonymizing:

  • What was discussed (no content, no subject text).
  • Who specifically did what (no names).
  • Sentiment beyond what reactions carry.

Where to go next

  • Narrative: scicloj.zulipdata.narrative adds time columns, channel-lifecycle summaries, and newcomer-tracking helpers that operate on the anonymized timeline.

  • Graph views: scicloj.zulipdata.graph builds co-membership and co-presence graphs from the same anonymized timeline.

  • API Reference: every public function in one chapter, with docstrings and a worked example each.

source: notebooks/zulipdata_book/anonymize.clj