9 Extending Pocket
Last modified: 2026-02-08
(ns pocket-book.extending-pocket
  (:require
   ;; Logging setup for this chapter (see Logging chapter):
   [pocket-book.logging]
   ;; Pocket API:
   [scicloj.pocket :as pocket]
   ;; Annotating kinds of visualizations:
   [scicloj.kindly.v4.kind :as kind]
   ;; For the dataset identity example:
   [tablecloth.api :as tc]
   [tech.v3.dataset.modelling :as ds-mod]
   ;; For the Nippy serialization example:
   [taoensso.nippy :as nippy]))
Setup
(def cache-dir "/tmp/pocket-extending")
(pocket/set-base-cache-dir! cache-dir)
10:06:43.918 INFO scicloj.pocket - Cache dir set to: /tmp/pocket-extending
"/tmp/pocket-extending"
(pocket/cleanup!)
10:06:43.919 INFO scicloj.pocket - Cache cleanup: /tmp/pocket-extending
{:dir "/tmp/pocket-extending", :existed false}
The PIdentifiable protocol
Pocket derives cache keys from the identity of the function and its arguments. The PIdentifiable protocol controls how each value contributes to the cache key:
(kind/doc #'pocket/->id)
->id
[x]
Return a cache key representation of a value. Dispatches via the PIdentifiable protocol.
For derefed Cached values, returns the same lightweight identity as the original Cached reference: the origin registry preserves the link automatically (see cache_keys notebook for details).
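To make the dispatch idea concrete, here is a minimal sketch of protocol-based identity in plain Clojure. The `Identifiable` protocol, `->id*`, and `call-id` are hypothetical names invented for this illustration. They are not Pocket's implementation and only mimic the behaviors described in this chapter.

```clojure
;; Hypothetical stand-in for pocket/PIdentifiable (illustration only).
(defprotocol Identifiable
  (->id* [x] "Return a value suitable for use in a cache key."))

(extend-protocol Identifiable
  ;; A var contributes its fully-qualified name as a symbol.
  clojure.lang.Var
  (->id* [v]
    (let [m (meta v)]
      (symbol (str (ns-name (:ns m))) (str (:name m)))))
  ;; Any other object contributes itself.
  Object
  (->id* [x] x)
  ;; nil contributes nil.
  nil
  (->id* [_] nil))

(defn call-id
  "Identity of a call: the function's id followed by each argument's id."
  [f & args]
  (cons (->id* f) (map ->id* args)))

(call-id #'clojure.core/+ 1 2)
;; => (clojure.core/+ 1 2)
```

Because each argument is routed through the protocol, a custom type can change how it appears in the key without touching the code that assembles the key.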
Default behaviors
Pocket provides default implementations for common types:
A var's identity is its fully-qualified name:
(pocket/->id #'clojure.core/map)
clojure.core/map
A map's identity is itself (keys are deep-sorted later for stable cache paths):
(pocket/->id {:b 2 :a 1})
{:b 2, :a 1}
A Cached object's identity captures the full computation graph:
(defn add [x y] (+ x y))
(pocket/->id (pocket/cached #'add 1 2))
(pocket-book.extending-pocket/add 1 2)
A derefed Cached value carries its origin identity (see Under the hood: cache keys for details). This works for maps, vectors, sets, and datasets:
(defn make-pair [a b] {:a a :b b})
(let [c (pocket/cached #'make-pair 1 2)]
  (= (pocket/->id (deref c)) (pocket/->id c)))
10:06:43.925 INFO scicloj.pocket.impl.cache - Cache miss, computing: pocket-book.extending-pocket/make-pair
10:06:43.926 DEBUG scicloj.pocket.impl.cache - Cache write: /tmp/pocket-extending/55/(pocket-book.extending-pocket_make-pair 1 2)
true
nil is handled:
(pocket/->id nil)
nil
Built-in dataset support
Pocket recognizes tech.ml.dataset datasets (the type behind tablecloth) and derives their identity from the actual column data and metadata, including annotations like inference targets.
A dataset's identity is a map of column names to {:data [...] :meta {...}}:
(def example-ds
  (-> (tc/dataset {:x (range 30) :y (range 30)})
      (ds-mod/set-inference-target :y)))
(pocket/->id example-ds)
{:x
 {:data
  [0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29],
  :meta {:name :x, :datatype :int64, :n-elems 30}},
 :y
 {:data
  [0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29],
  :meta
  {:name :y, :datatype :int64, :n-elems 30, :inference-target? true}}}
Two datasets with identical content produce the same identity, even when the default toString representation would truncate rows:
(def ds-a (tc/dataset {:x (range 30) :y (range 30)}))
(def ds-b (tc/dataset {:x (range 30) :y (range 30)}))
(= (pocket/->id ds-a) (pocket/->id ds-b))
true
Datasets with different content produce different identities, even when the difference falls in rows that toString would elide:
(def ds-c (tc/dataset {:x (range 30)
                       :y (concat (range 15) [999] (range 16 30))}))
(= (pocket/->id ds-a) (pocket/->id ds-c))
false
As a result, caching functions that take datasets as arguments (like ml/train) works correctly regardless of dataset size.
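As a rough illustration of that last point, the sketch below builds two Cached references over separately constructed but identical datasets and compares their identities. `summarize` is a hypothetical function made up for this example. The sketch assumes the cache directory from Setup, and it assumes that a Cached identity is assembled from the identities of its arguments, as the `add` example above suggests.

```clojure
(require '[scicloj.pocket :as pocket]
         '[tablecloth.api :as tc])

;; Assumption: the same cache dir as in Setup.
(pocket/set-base-cache-dir! "/tmp/pocket-extending")

;; Hypothetical function that takes a dataset argument.
(defn summarize [ds]
  {:rows (tc/row-count ds)})

;; Two datasets built independently from the same content.
(def ds-1 (tc/dataset {:x (range 30) :y (range 30)}))
(def ds-2 (tc/dataset {:x (range 30) :y (range 30)}))

;; Nothing is dereferenced here; we only compare identities.
(def same-identity?
  (= (pocket/->id (pocket/cached #'summarize ds-1))
     (pocket/->id (pocket/cached #'summarize ds-2))))
```

If `same-identity?` is true, a deref of either reference should resolve to the same cache entry, so the second one would be served from disk rather than recomputed.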
Extending for custom types
If you have domain-specific types, you can control how they appear in cache keys by extending PIdentifiable. This is useful when the default behavior (which uses the object itself) doesn't produce stable or meaningful cache keys.
For example, suppose you have a record representing a dataset reference:
(defrecord DatasetRef [source version])
Without extending the protocol, a DatasetRef would be treated as a plain map: its identity would be something like {:source "census", :version 3}, which works but isn't very readable in cache directory names.
Let's give it a concise, meaningful identity:
(extend-protocol pocket/PIdentifiable
  DatasetRef
  (->id [this]
    (symbol (str (:source this) "-v" (:version this)))))
Now the identity is a clean symbol:
(pocket/->id (->DatasetRef "census" 3))
census-v3
Using custom types in cached computations
(defn analyze-dataset
  "Simulate analyzing a dataset."
  [dataset-ref opts]
  (println " Analyzing" (:source dataset-ref) "v" (:version dataset-ref) "...")
  (Thread/sleep 200)
  {:source (:source dataset-ref)
   :version (:version dataset-ref)
   :rows 1000
   :method (:method opts)})
The cache key now includes our custom identity:
(def analysis
  (pocket/cached #'analyze-dataset
                 (->DatasetRef "census" 3)
                 {:method :regression}))
(pocket/->id analysis)
(pocket-book.extending-pocket/analyze-dataset
 census-v3
 {:method :regression})
First deref computes:
(deref analysis)
10:06:43.951 INFO scicloj.pocket.impl.cache - Cache miss, computing: pocket-book.extending-pocket/analyze-dataset
Analyzing census v 3 ...
10:06:44.153 DEBUG scicloj.pocket.impl.cache - Cache write: /tmp/pocket-extending/62/(pocket-book.extending-pocket_analyze-dataset census-v3 {:method :regression})
{:source "census", :version 3, :rows 1000, :method :regression}
Second deref loads from cache:
(deref analysis)
{:source "census", :version 3, :rows 1000, :method :regression}
A different version creates a different cache entry:
(deref (pocket/cached #'analyze-dataset
                      (->DatasetRef "census" 4)
                      {:method :regression}))
10:06:44.158 INFO scicloj.pocket.impl.cache - Cache miss, computing: pocket-book.extending-pocket/analyze-dataset
Analyzing census v 4 ...
10:06:44.360 DEBUG scicloj.pocket.impl.cache - Cache write: /tmp/pocket-extending/f6/(pocket-book.extending-pocket_analyze-dataset census-v4 {:method :regression})
{:source "census", :version 4, :rows 1000, :method :regression}
What's on disk?
The cache directory names reflect our custom identities:
(pocket/dir-tree)
pocket-extending
├── 55
│   └── (pocket-book.extending-pocket_make-pair 1 2)
│       ├── meta.edn
│       └── value.nippy
├── 62
│   └── (pocket-book.extending-pocket_analyze-dataset census-v3 {:method :regression})
│       ├── meta.edn
│       └── value.nippy
└── f6
    └── (pocket-book.extending-pocket_analyze-dataset census-v4 {:method :regression})
        ├── meta.edn
        └── value.nippy
Guidelines
When extending PIdentifiable:
- Return stable values. The identity must be the same across JVM sessions for the same logical input. Avoid including timestamps, random values, or object addresses.
- Return distinct values. Two logically different inputs must produce different identities. If they don't, Pocket will treat them as the same computation and return stale results.
- Keep it readable. The identity becomes part of the cache directory name. Symbols and short strings work well.
- Prefer symbols or keywords over complex nested structures. They produce clean, short directory names.
- Records and plain maps can collide. A record like (->DatasetRef "census" 3) and a plain map {:source "census" :version 3} produce the same default cache key (both are maps with the same keys). If you use records as cache arguments, extend PIdentifiable to give them a distinct identity, as shown above.
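The guidelines can be spot-checked at the REPL. The sketch below uses a hypothetical `ModelSpec` record, invented here for illustration, and relies only on `pocket/PIdentifiable` and `pocket/->id` as used earlier in this chapter. It checks that the identity is repeatable, that it differs for different inputs, and that it no longer matches a plain map with the same keys.

```clojure
(require '[scicloj.pocket :as pocket])

;; Hypothetical record used only for this example.
(defrecord ModelSpec [algorithm max-depth])

;; A short symbol built from the fields that define the logical input.
(extend-protocol pocket/PIdentifiable
  ModelSpec
  (->id [this]
    (symbol (str (name (:algorithm this)) "-depth" (:max-depth this)))))

(def id-a (pocket/->id (->ModelSpec :tree 3)))
(def id-b (pocket/->id (->ModelSpec :tree 3)))
(def id-c (pocket/->id (->ModelSpec :tree 4)))

id-a
;; => tree-depth3

;; Stable: equal inputs give equal identities.
(= id-a id-b)
;; => true

;; Distinct: different inputs give different identities.
(not= id-a id-c)
;; => true

;; No collision with a plain map carrying the same keys.
(not= id-a (pocket/->id {:algorithm :tree :max-depth 3}))
;; => true
```

These checks run within a single JVM session, so they cannot prove stability across sessions. Building the identity only from field values, as here, is what avoids session-dependent parts such as object addresses.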
Custom Nippy serialization
Pocket uses Nippy for fast binary serialization. Most Clojure data structures and many Java objects serialize automatically. However, if we cache values containing custom types, we may need to extend Nippy.
Common types that work out of the box:
- All Clojure collections (vectors, maps, sets, lists)
- Primitives, strings, keywords, symbols
- Java Date, UUID, BigDecimal, BigInteger
- Records and deftypes (if all fields are serializable)
- Tribuo ML models
- tech.ml.dataset datasets
Types that require extension:
- Objects with unserializable fields (e.g., open file handles, database connections, thread pools)
- Custom Java classes from external libraries. Even if they implement Serializable, Nippy 3 checks a thaw allowlist and will quarantine classes not on it. Extending Nippy directly (as shown below) avoids this issue entirely.
Example: a custom model type
Suppose we have a record that wraps model weights. Out of the box, Nippy can freeze records whose fields are all serializable, but let's say our record contains a Java array or another type that Nippy doesn't handle natively. We extend freeze and thaw explicitly:
(defrecord MyModel [weights bias])
(nippy/extend-freeze MyModel :my-model
  [x data-output]
  (nippy/freeze-to-out! data-output (:weights x))
  (nippy/freeze-to-out! data-output (:bias x)))
nil
(do (nippy/extend-thaw :my-model
      [data-input]
      (->MyModel (nippy/thaw-from-in! data-input)
                 (nippy/thaw-from-in! data-input)))
    :done)
Warning: resetting Nippy thaw for custom type with id: :my-model
:done
We can verify the round-trip works:
(def original (->MyModel [0.5 -0.3 1.2] 0.1))
(= original (nippy/thaw (nippy/freeze original)))
true
Now caching a function that returns a MyModel works seamlessly:
(defn train-my-model [data]
  (->MyModel (mapv #(* % 0.01) data) 0.42))
(let [result (deref (pocket/cached #'train-my-model [10 20 30]))]
  result)
10:06:44.387 INFO scicloj.pocket.impl.cache - Cache miss, computing: pocket-book.extending-pocket/train-my-model
10:06:44.388 DEBUG scicloj.pocket.impl.cache - Cache write: /tmp/pocket-extending/ac/(pocket-book.extending-pocket_train-my-model [10 20 30])
{:weights [0.1 0.2 0.3], :bias 0.42}
See the Nippy documentation for more details.
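Before caching a new kind of value, it can help to confirm that it survives a freeze/thaw cycle. The helper below is a small sketch built on `nippy/freeze` and `nippy/thaw`. The name `round-trips?` is hypothetical and is not part of Nippy or Pocket.

```clojure
(require '[taoensso.nippy :as nippy])

(defn round-trips?
  "True when x freezes, thaws, and compares equal to the original.
  Returns false if freezing or thawing throws."
  [x]
  (try
    (= x (nippy/thaw (nippy/freeze x)))
    (catch Exception _
      false)))

(round-trips? {:weights [0.5 -0.3 1.2] :bias 0.1})
;; => true
```

The check relies on `=`, so it suits values with value-based equality. Java arrays compare by identity, so a value containing one would need a custom comparison (for example, comparing `(seq arr)`).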
Cleanup
(pocket/cleanup!)
10:06:44.391 INFO scicloj.pocket - Cache cleanup: /tmp/pocket-extending
{:dir "/tmp/pocket-extending", :existed true}