11 🚧 Draft: pocket-model, drop-in caching for metamorph.ml
Last modified: 2026-02-08
This chapter shows how to cache model training in a metamorph.ml pipeline using Pocket. We define a small pocket-model function, a drop-in replacement for ml/model, and use it with cross-validation, grid search, and multiple model types.
Background
metamorph.ml is the Scicloj library for machine learning pipelines. It builds on metamorph, a data-transformation framework where each step is a function that takes a context map and returns an updated one. metamorph.ml distinguishes two modes, :fit (learn from training data) and :transform (apply to new data), so a pipeline can be trained once and reused for prediction.
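To make the context-map contract concrete, here is a minimal sketch in plain Clojure, with no metamorph dependency. The step name mean-center-step is hypothetical, and the data is a plain vector of numbers rather than a dataset; only the :metamorph/id, :metamorph/mode, and :metamorph/data keys mirror what real steps receive.

```clojure
;; A toy step in the metamorph style: a function from context map to context map.
;; `mean-center-step` is a hypothetical name, not part of metamorph.
(defn mean-center-step
  "Learns the mean in :fit mode and subtracts it from the data in both modes."
  [{:metamorph/keys [id data mode] :as ctx}]
  (case mode
    :fit
    (let [mu (/ (reduce + data) (double (count data)))]
      (-> ctx
          (assoc id {:mean mu})                        ; learned state lives under the step id
          (assoc :metamorph/data (mapv #(- % mu) data))))
    :transform
    (let [mu (get-in ctx [id :mean])]                  ; reuse the state learned during :fit
      (assoc ctx :metamorph/data (mapv #(- % mu) data)))))

(def fitted
  (mean-center-step {:metamorph/id   :center
                     :metamorph/mode :fit
                     :metamorph/data [1.0 2.0 3.0]}))

(def transformed
  (mean-center-step (assoc fitted
                           :metamorph/mode :transform
                           :metamorph/data [10.0 20.0])))

(:metamorph/data fitted)      ; => [-1.0 0.0 1.0]
(:metamorph/data transformed) ; => [8.0 18.0]
```

The fitted context carries the learned mean under the step id, which is the same place a model step stores its trained model for later prediction.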
On top of this, metamorph.ml adds model training/prediction, cross-validation (evaluate-pipelines), loss functions, and hyperparameter search. A typical workflow looks like:
- Define a pipeline of preprocessing + model steps
- Split data into folds
- Call evaluate-pipelines to train and score across folds
- Compare results, pick the best model
Why cache with Pocket?
metamorph.ml includes a built-in caching mechanism. This notebook explores what happens when we use Pocket's caching instead, bringing a few things that are natural to Pocket's design:
- Disk persistence: cached models survive JVM restarts, so we can pick up where we left off across sessions
- Content-based keys: cache keys are derived from function identity and full argument values via SHA-1
- Concurrent dedup: when multiple threads request the same computation, only one trains and the rest wait for the result
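The ideas of content-based keys and concurrent dedup can be illustrated with a small in-memory sketch. This is not Pocket's implementation: the names sha1-hex, cache-key, in-flight, and cached-call are hypothetical, and encoding the arguments with pr-str is an assumption made only for this example.

```clojure
(import '(java.security MessageDigest))

(defn sha1-hex
  "SHA-1 of a string, rendered as a 40-character hex string."
  [^String s]
  (let [digest (.digest (MessageDigest/getInstance "SHA-1") (.getBytes s "UTF-8"))]
    (apply str (map #(format "%02x" (bit-and % 0xff)) digest))))

(defn cache-key
  "Key derived from a function name and its full argument values.
  Encoding via pr-str is an assumption of this sketch."
  [fn-name args]
  (sha1-hex (pr-str [fn-name args])))

;; key -> delay. A delay runs its body at most once, even when several
;; threads deref it at the same time; the other threads block and share the result.
(defonce in-flight (atom {}))

(defn cached-call
  "Runs (apply f args) at most once per distinct key."
  [fn-name f & args]
  (let [k      (cache-key fn-name args)
        d      (delay (apply f args))
        winner (get (swap! in-flight #(if (contains? % k) % (assoc % k d))) k)]
    @winner))

;; Four threads request the same computation; only one of them trains.
(def calls (atom 0))
(defn slow-train [depth]
  (swap! calls inc)
  (Thread/sleep 50)
  {:depth depth})

(let [threads (mapv (fn [_] (Thread. ^Runnable (fn [] (cached-call "slow-train" slow-train 8))))
                    (range 4))]
  (run! #(.start ^Thread %) threads)
  (run! #(.join ^Thread %) threads))

@calls ; => 1
```

Changing any argument changes the key, so a different depth would trigger a separate computation while repeated requests for depth 8 share one result.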
The integration is lightweight: a pocket-model function that is a drop-in replacement for ml/model. We swap one pipeline step and everything else (evaluate-pipelines, preprocessing, grid search) stays the same.
What this gives us:
- Same pipeline code, same evaluate-pipelines
- Model training cached to disk (survives JVM restarts)
- Graceful fallback for non-serializable models
What this notebook does not cover: because pocket-model plugs into metamorph.ml's existing pipeline machinery, only the model-training step is cached through Pocket. Preprocessing, splitting, and evaluation happen outside Pocket's awareness: there is no computational DAG tracking the full pipeline, no per-step storage control (choosing whether each step caches to disk, memory, or not at all), and no provenance trail that connects a final metric back to the data and parameters that produced it. A companion notebook is in the works, exploring a deeper integration where every pipeline step is a Pocket caching-fn, giving us all of those things.
Setup
(ns pocket-book.pocket-model
  (:require
   ;; Logging setup for this chapter (see Logging chapter):
   [pocket-book.logging]
   ;; Pocket API:
   [scicloj.pocket :as pocket]
   ;; Annotating kinds of visualizations:
   [scicloj.kindly.v4.kind :as kind]
   ;; Data processing:
   [tablecloth.api :as tc]
   [tablecloth.column.api :as tcc]
   [tech.v3.dataset.modelling :as ds-mod]
   [tech.v3.dataset.column-filters :as cf]
   ;; Machine learning:
   [scicloj.metamorph.ml :as ml]
   [scicloj.metamorph.ml.loss :as loss]
   [scicloj.metamorph.ml.regression]
   [scicloj.metamorph.core :as mm]
   [scicloj.ml.tribuo]))
(def cache-dir "/tmp/pocket-model")
(pocket/set-base-cache-dir! cache-dir)
[nREPL-session-120ee500-d4ba-4b41-bcdc-e26822f35e2b] INFO scicloj.pocket - Cache dir set to: /tmp/pocket-model
"/tmp/pocket-model"
(pocket/cleanup!)
[nREPL-session-120ee500-d4ba-4b41-bcdc-e26822f35e2b] INFO scicloj.pocket - Cache cleanup: /tmp/pocket-model
{:dir "/tmp/pocket-model", :existed false}
The pocket-model function
This is the core of the integration. It follows the same contract as ml/model: a metamorph step that trains in :fit mode and predicts in :transform mode. The only difference is that ml/train is wrapped with pocket/cached.
If Nippy can't serialize a model (e.g., Apache Commons Math OLS), pocket-model falls back to uncached training automatically.
(defn pocket-model
  "Drop-in replacement for ml/model that caches training via Pocket.
  Falls back to uncached training if serialization fails."
  [options]
  (fn [{:metamorph/keys [id data mode] :as ctx}]
    (case mode
      :fit
      ;; Train through the cache; if that throws (e.g. the model cannot be
      ;; serialized), train again without caching.
      (let [model (try
                    (deref (pocket/cached #'ml/train data options))
                    (catch Exception _e
                      (ml/train data options)))]
        (assoc ctx id (assoc model :scicloj.metamorph.ml/unsupervised?
                             (get (ml/options->model-def options)
                                  :unsupervised? false))))
      :transform
      ;; Look up the model stored under the step id during :fit.
      (let [model (get ctx id)]
        (if (get model :scicloj.metamorph.ml/unsupervised?)
          ctx
          (-> ctx
              (update id assoc
                      :scicloj.metamorph.ml/feature-ds (cf/feature data)
                      :scicloj.metamorph.ml/target-ds (cf/target data))
              (assoc :metamorph/data (ml/predict data model))))))))
Test data
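Simple synthetic regression data: 200 rows, enough for quick feedback. The intended relationship is y = 3x + noise. Note that the expression for :y below draws a fresh random number instead of reusing the :x values, so the generated target does not actually depend on the feature; this is consistent with the RMSE values reported later, which stay close to the standard deviation of y (about 9).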
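For reference, here is a plain-Clojure sketch of the relationship y = 3x + noise in which each row draws x once and reuses it when computing y. The helper name make-rows is hypothetical, and the result is a vector of maps rather than a tablecloth dataset.

```clojure
(defn make-rows
  "Returns n rows of {:x ..., :y ...} where y = 3x + Gaussian noise (sd 2).
  Each row draws x once and reuses that same value when computing y."
  [n seed]
  (let [rng (java.util.Random. (long seed))]
    (vec (repeatedly n
                     (fn []
                       (let [x (* 10.0 (.nextDouble rng))]
                         {:x x
                          :y (+ (* 3.0 x) (* 2.0 (.nextGaussian rng)))}))))))

(def rows (make-rows 200 42))

(count rows) ; => 200
```

Because the generator is seeded, calling make-rows again with the same arguments returns the same rows, which keeps content-based cache keys stable across runs.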
(def ds (-> (let [rng (java.util.Random. 42)]
              (tc/dataset
               {:x (vec (repeatedly 200 #(* 10.0 (.nextDouble rng))))
                :y (vec (repeatedly 200 #(+ (* 3.0 (* 10.0 (.nextDouble rng)))
                                            (* 2.0 (.nextGaussian rng)))))}))
            (ds-mod/set-inference-target :y)))
(def splits (tc/split->seq ds :kfold {:k 3 :seed 42}))
(count splits)
3
Basic usage
Use pocket-model in place of ml/model. The {:metamorph/id :model} map step sets the step ID that evaluate-pipelines expects.
(def cart-spec
  {:model-type :scicloj.ml.tribuo/regression
   :tribuo-components [{:name "cart"
                        :type "org.tribuo.regression.rtree.CARTRegressionTrainer"
                        :properties {:maxDepth "8"}}]
   :tribuo-trainer-name "cart"})
(def pipe-cart
  (mm/pipeline
   {:metamorph/id :model}
   (pocket-model cart-spec)))
First run, which trains 3 models (one per fold):
(def results-1
  (ml/evaluate-pipelines
   [pipe-cart]
   splits
   loss/rmse
   :loss
   {:return-best-crossvalidation-only false
    :return-best-pipeline-only false}))
[nREPL-session-120ee500-d4ba-4b41-bcdc-e26822f35e2b] INFO scicloj.pocket.impl.cache - Cache miss, computing: scicloj.metamorph.ml/train
[nREPL-session-120ee500-d4ba-4b41-bcdc-e26822f35e2b] DEBUG scicloj.pocket.impl.cache - Cache write: /tmp/pocket-model/7a/7a2371066976291d06fe1aad1b48bbeba167ff70
[nREPL-session-120ee500-d4ba-4b41-bcdc-e26822f35e2b] INFO scicloj.pocket.impl.cache - Cache miss, computing: scicloj.metamorph.ml/train
[nREPL-session-120ee500-d4ba-4b41-bcdc-e26822f35e2b] DEBUG scicloj.pocket.impl.cache - Cache write: /tmp/pocket-model/9d/9d2799f31ec89ab47c28abaedf1a94632d6e4912
[nREPL-session-120ee500-d4ba-4b41-bcdc-e26822f35e2b] INFO scicloj.pocket.impl.cache - Cache miss, computing: scicloj.metamorph.ml/train
[nREPL-session-120ee500-d4ba-4b41-bcdc-e26822f35e2b] DEBUG scicloj.pocket.impl.cache - Cache write: /tmp/pocket-model/75/752a5761fad71dd397dad959c21a078b67503a46
(mapv #(-> % :test-transform :metric) (flatten results-1))
[10.938693265902357 11.23113067170221 12.12978921023711]
Cache now has 3 entries (one per fold):
(pocket/cache-stats)
{:total-entries 3,
 :total-size-bytes 53366,
 :entries-per-fn {"scicloj.metamorph.ml/train" 3}}
Second run, with all cache hits and the same metrics:
(def results-2
  (ml/evaluate-pipelines
   [pipe-cart]
   splits
   loss/rmse
   :loss
   {:return-best-crossvalidation-only false
    :return-best-pipeline-only false}))
(= (mapv #(-> % :test-transform :metric) (flatten results-1))
   (mapv #(-> % :test-transform :metric) (flatten results-2)))
true
Incremental grid search
Start with 3 depth values, then add 3 more. Only new combinations train; existing ones hit the cache.
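The expected number of trainings in each batch follows from simple set arithmetic, sketched here in plain Clojure. The var names are illustrative; the depth values and fold count are the ones used in this section.

```clojure
(require '[clojure.set :as set])

(def n-folds 3)
(def batch-1-depths [4 8 12])
(def batch-2-depths [4 6 8 10 12 16])

;; Depths whose models are not yet cached when batch 2 runs.
(def new-depths
  (set/difference (set batch-2-depths) (set batch-1-depths)))

(sort new-depths)                    ; => (6 10 16)
(* n-folds (count batch-1-depths))   ; => 9 trainings in batch 1
(* n-folds (count new-depths))       ; => 9 new trainings in batch 2
(* n-folds (count batch-2-depths))   ; => 18 cache entries in total
```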
(pocket/cleanup!)
[nREPL-session-120ee500-d4ba-4b41-bcdc-e26822f35e2b] INFO scicloj.pocket - Cache cleanup: /tmp/pocket-model
{:dir "/tmp/pocket-model", :existed true}
(defn cart-pipe [max-depth]
  (mm/pipeline
   {:metamorph/id :model}
   (pocket-model
    {:model-type :scicloj.ml.tribuo/regression
     :tribuo-components [{:name "cart"
                          :type "org.tribuo.regression.rtree.CARTRegressionTrainer"
                          :properties {:maxDepth (str max-depth)}}]
     :tribuo-trainer-name "cart"})))
Batch 1: depths 4, 8, 12
(def batch-1
  (ml/evaluate-pipelines
   (mapv cart-pipe [4 8 12])
   splits
   loss/rmse
   :loss
   {:return-best-crossvalidation-only false
    :return-best-pipeline-only false}))
[nREPL-session-120ee500-d4ba-4b41-bcdc-e26822f35e2b] INFO scicloj.pocket.impl.cache - Cache miss, computing: scicloj.metamorph.ml/train
[nREPL-session-120ee500-d4ba-4b41-bcdc-e26822f35e2b] DEBUG scicloj.pocket.impl.cache - Cache write: /tmp/pocket-model/46/46551ba53c7214873653fd678fb5af5911fd74a8
[nREPL-session-120ee500-d4ba-4b41-bcdc-e26822f35e2b] INFO scicloj.pocket.impl.cache - Cache miss, computing: scicloj.metamorph.ml/train
[nREPL-session-120ee500-d4ba-4b41-bcdc-e26822f35e2b] DEBUG scicloj.pocket.impl.cache - Cache write: /tmp/pocket-model/6f/6fbb7bae3b90b8268e38a91c951088c6cfa3cadb
[nREPL-session-120ee500-d4ba-4b41-bcdc-e26822f35e2b] INFO scicloj.pocket.impl.cache - Cache miss, computing: scicloj.metamorph.ml/train
[nREPL-session-120ee500-d4ba-4b41-bcdc-e26822f35e2b] DEBUG scicloj.pocket.impl.cache - Cache write: /tmp/pocket-model/45/45b255de5fd11e966a46bb2bc197bd25e7dc3f6d
[nREPL-session-120ee500-d4ba-4b41-bcdc-e26822f35e2b] INFO scicloj.pocket.impl.cache - Cache miss, computing: scicloj.metamorph.ml/train
[nREPL-session-120ee500-d4ba-4b41-bcdc-e26822f35e2b] DEBUG scicloj.pocket.impl.cache - Cache write: /tmp/pocket-model/7a/7a2371066976291d06fe1aad1b48bbeba167ff70
[nREPL-session-120ee500-d4ba-4b41-bcdc-e26822f35e2b] INFO scicloj.pocket.impl.cache - Cache miss, computing: scicloj.metamorph.ml/train
[nREPL-session-120ee500-d4ba-4b41-bcdc-e26822f35e2b] DEBUG scicloj.pocket.impl.cache - Cache write: /tmp/pocket-model/9d/9d2799f31ec89ab47c28abaedf1a94632d6e4912
[nREPL-session-120ee500-d4ba-4b41-bcdc-e26822f35e2b] INFO scicloj.pocket.impl.cache - Cache miss, computing: scicloj.metamorph.ml/train
[nREPL-session-120ee500-d4ba-4b41-bcdc-e26822f35e2b] DEBUG scicloj.pocket.impl.cache - Cache write: /tmp/pocket-model/75/752a5761fad71dd397dad959c21a078b67503a46
[nREPL-session-120ee500-d4ba-4b41-bcdc-e26822f35e2b] INFO scicloj.pocket.impl.cache - Cache miss, computing: scicloj.metamorph.ml/train
[nREPL-session-120ee500-d4ba-4b41-bcdc-e26822f35e2b] DEBUG scicloj.pocket.impl.cache - Cache write: /tmp/pocket-model/3b/3b861196a2d47cb991e62014fd3765c71da240f1
[nREPL-session-120ee500-d4ba-4b41-bcdc-e26822f35e2b] INFO scicloj.pocket.impl.cache - Cache miss, computing: scicloj.metamorph.ml/train
[nREPL-session-120ee500-d4ba-4b41-bcdc-e26822f35e2b] DEBUG scicloj.pocket.impl.cache - Cache write: /tmp/pocket-model/b9/b95687a74377288768e9d5b6046213e733681bd6
[nREPL-session-120ee500-d4ba-4b41-bcdc-e26822f35e2b] INFO scicloj.pocket.impl.cache - Cache miss, computing: scicloj.metamorph.ml/train
[nREPL-session-120ee500-d4ba-4b41-bcdc-e26822f35e2b] DEBUG scicloj.pocket.impl.cache - Cache write: /tmp/pocket-model/78/78e74ef9c84789d8e65025fb80d4d14782a68ba3
3 depths × 3 folds = 9 trainings:
(pocket/cache-stats)
{:total-entries 9,
 :total-size-bytes 158759,
 :entries-per-fn {"scicloj.metamorph.ml/train" 9}}
Batch 2: depths 4, 6, 8, 10, 12, 16
Depths 4, 8, 12 are already cached; only 6, 10, 16 are new.
(def batch-2
  (ml/evaluate-pipelines
   (mapv cart-pipe [4 6 8 10 12 16])
   splits
   loss/rmse
   :loss
   {:return-best-crossvalidation-only false
    :return-best-pipeline-only false}))
[nREPL-session-120ee500-d4ba-4b41-bcdc-e26822f35e2b] INFO scicloj.pocket.impl.cache - Cache miss, computing: scicloj.metamorph.ml/train
[nREPL-session-120ee500-d4ba-4b41-bcdc-e26822f35e2b] DEBUG scicloj.pocket.impl.cache - Cache write: /tmp/pocket-model/fa/fa43248538238553b66d37050b738a274e96a914
[nREPL-session-120ee500-d4ba-4b41-bcdc-e26822f35e2b] INFO scicloj.pocket.impl.cache - Cache miss, computing: scicloj.metamorph.ml/train
[nREPL-session-120ee500-d4ba-4b41-bcdc-e26822f35e2b] DEBUG scicloj.pocket.impl.cache - Cache write: /tmp/pocket-model/73/734a310323e487ea805c11c6c52bda4e9f95e81b
[nREPL-session-120ee500-d4ba-4b41-bcdc-e26822f35e2b] INFO scicloj.pocket.impl.cache - Cache miss, computing: scicloj.metamorph.ml/train
[nREPL-session-120ee500-d4ba-4b41-bcdc-e26822f35e2b] DEBUG scicloj.pocket.impl.cache - Cache write: /tmp/pocket-model/ee/eecb6e24aaa957374028b36ad146f7cbde70d19e
[nREPL-session-120ee500-d4ba-4b41-bcdc-e26822f35e2b] INFO scicloj.pocket.impl.cache - Cache miss, computing: scicloj.metamorph.ml/train
[nREPL-session-120ee500-d4ba-4b41-bcdc-e26822f35e2b] DEBUG scicloj.pocket.impl.cache - Cache write: /tmp/pocket-model/a1/a1e22191f6bb58b528e6f441dff21c21b53a6614
[nREPL-session-120ee500-d4ba-4b41-bcdc-e26822f35e2b] INFO scicloj.pocket.impl.cache - Cache miss, computing: scicloj.metamorph.ml/train
[nREPL-session-120ee500-d4ba-4b41-bcdc-e26822f35e2b] DEBUG scicloj.pocket.impl.cache - Cache write: /tmp/pocket-model/8a/8a0abfef601073e2647c21be0e36785ef4621535
[nREPL-session-120ee500-d4ba-4b41-bcdc-e26822f35e2b] INFO scicloj.pocket.impl.cache - Cache miss, computing: scicloj.metamorph.ml/train
[nREPL-session-120ee500-d4ba-4b41-bcdc-e26822f35e2b] DEBUG scicloj.pocket.impl.cache - Cache write: /tmp/pocket-model/3a/3a962b4c4929eecc8207ab8210d2761d069d7913
[nREPL-session-120ee500-d4ba-4b41-bcdc-e26822f35e2b] INFO scicloj.pocket.impl.cache - Cache miss, computing: scicloj.metamorph.ml/train
[nREPL-session-120ee500-d4ba-4b41-bcdc-e26822f35e2b] DEBUG scicloj.pocket.impl.cache - Cache write: /tmp/pocket-model/b5/b5dd934981c42d27635da730d93b58cd25d29824
[nREPL-session-120ee500-d4ba-4b41-bcdc-e26822f35e2b] INFO scicloj.pocket.impl.cache - Cache miss, computing: scicloj.metamorph.ml/train
[nREPL-session-120ee500-d4ba-4b41-bcdc-e26822f35e2b] DEBUG scicloj.pocket.impl.cache - Cache write: /tmp/pocket-model/ba/bacfed136727155cfce4b6aae389e18c5bccb87a
[nREPL-session-120ee500-d4ba-4b41-bcdc-e26822f35e2b] INFO scicloj.pocket.impl.cache - Cache miss, computing: scicloj.metamorph.ml/train
[nREPL-session-120ee500-d4ba-4b41-bcdc-e26822f35e2b] DEBUG scicloj.pocket.impl.cache - Cache write: /tmp/pocket-model/9f/9f60d660ba8f69b08c04e5e0789fb8c963ebb49e
3 new depths × 3 folds = 9 new + 9 cached = 18 total:
(pocket/cache-stats)
{:total-entries 18,
 :total-size-bytes 322356,
 :entries-per-fn {"scicloj.metamorph.ml/train" 18}}
Combine results and compare depths by mean RMSE:
(let [depths [4 6 8 10 12 16]
      means (mapv (fn [pipeline-results]
                    (tcc/mean (map #(-> % :test-transform :metric) pipeline-results)))
                  batch-2)]
  (tc/dataset {:depth depths :mean-rmse means}))
_unnamed [6 2]:
| :depth | :mean-rmse |
|---|---|
| 4 | 12.89681757 |
| 6 | 12.36344906 |
| 8 | 12.28048327 |
| 10 | 11.43320438 |
| 12 | 10.68677435 |
| 16 | 9.86061013 |
Multiple model types
Compare CART, linear SGD, and fastmath OLS in the same evaluation. Each model type is cached independently.
(pocket/cleanup!)
[nREPL-session-120ee500-d4ba-4b41-bcdc-e26822f35e2b] INFO scicloj.pocket - Cache cleanup: /tmp/pocket-model
{:dir "/tmp/pocket-model", :existed true}
(def sgd-spec
  {:model-type :scicloj.ml.tribuo/regression
   :tribuo-components [{:name "squared"
                        :type "org.tribuo.regression.sgd.objectives.SquaredLoss"}
                       {:name "linear-sgd"
                        :type "org.tribuo.regression.sgd.linear.LinearSGDTrainer"
                        :properties {:objective "squared"
                                     :epochs "50"
                                     :loggingInterval "10000"}}]
   :tribuo-trainer-name "linear-sgd"})
(def multi-results
  (ml/evaluate-pipelines
   [(mm/pipeline {:metamorph/id :model} (pocket-model cart-spec))
    (mm/pipeline {:metamorph/id :model} (pocket-model sgd-spec))
    (mm/pipeline {:metamorph/id :model} (pocket-model {:model-type :fastmath/ols}))]
   splits
   loss/rmse
   :loss
   {:return-best-crossvalidation-only false
    :return-best-pipeline-only false}))
[nREPL-session-120ee500-d4ba-4b41-bcdc-e26822f35e2b] INFO scicloj.pocket.impl.cache - Cache miss, computing: scicloj.metamorph.ml/train
[nREPL-session-120ee500-d4ba-4b41-bcdc-e26822f35e2b] DEBUG scicloj.pocket.impl.cache - Cache write: /tmp/pocket-model/7a/7a2371066976291d06fe1aad1b48bbeba167ff70
[nREPL-session-120ee500-d4ba-4b41-bcdc-e26822f35e2b] INFO scicloj.pocket.impl.cache - Cache miss, computing: scicloj.metamorph.ml/train
[nREPL-session-120ee500-d4ba-4b41-bcdc-e26822f35e2b] DEBUG scicloj.pocket.impl.cache - Cache write: /tmp/pocket-model/9d/9d2799f31ec89ab47c28abaedf1a94632d6e4912
[nREPL-session-120ee500-d4ba-4b41-bcdc-e26822f35e2b] INFO scicloj.pocket.impl.cache - Cache miss, computing: scicloj.metamorph.ml/train
[nREPL-session-120ee500-d4ba-4b41-bcdc-e26822f35e2b] DEBUG scicloj.pocket.impl.cache - Cache write: /tmp/pocket-model/75/752a5761fad71dd397dad959c21a078b67503a46
[nREPL-session-120ee500-d4ba-4b41-bcdc-e26822f35e2b] INFO scicloj.pocket.impl.cache - Cache miss, computing: scicloj.metamorph.ml/train
Mar 01, 2026 4:31:59 PM org.tribuo.common.sgd.AbstractSGDTrainer train
INFO: Training SGD model with 133 examples
Mar 01, 2026 4:31:59 PM org.tribuo.common.sgd.AbstractSGDTrainer train
INFO: Outputs - RegressionInfo({name=y,id=0,count=133,max=32.285163,min=-3.003255,mean=15.591786,variance=84.043799})
[nREPL-session-120ee500-d4ba-4b41-bcdc-e26822f35e2b] DEBUG scicloj.pocket.impl.cache - Cache write: /tmp/pocket-model/6a/6ac08d75a9c1dfba5441528a6c2cb027b0986f6f
[nREPL-session-120ee500-d4ba-4b41-bcdc-e26822f35e2b] INFO scicloj.pocket.impl.cache - Cache miss, computing: scicloj.metamorph.ml/train
Mar 01, 2026 4:31:59 PM org.tribuo.common.sgd.AbstractSGDTrainer train
INFO: Training SGD model with 133 examples
Mar 01, 2026 4:31:59 PM org.tribuo.common.sgd.AbstractSGDTrainer train
INFO: Outputs - RegressionInfo({name=y,id=0,count=133,max=31.652557,min=-1.736155,mean=15.631001,variance=80.557863})
[nREPL-session-120ee500-d4ba-4b41-bcdc-e26822f35e2b] DEBUG scicloj.pocket.impl.cache - Cache write: /tmp/pocket-model/cf/cff2b9e4351565863c5cf69ac6a1aa7a626936af
[nREPL-session-120ee500-d4ba-4b41-bcdc-e26822f35e2b] INFO scicloj.pocket.impl.cache - Cache miss, computing: scicloj.metamorph.ml/train
Mar 01, 2026 4:31:59 PM org.tribuo.common.sgd.AbstractSGDTrainer train
INFO: Training SGD model with 134 examples
Mar 01, 2026 4:31:59 PM org.tribuo.common.sgd.AbstractSGDTrainer train
INFO: Outputs - RegressionInfo({name=y,id=0,count=134,max=32.285163,min=-3.003255,mean=16.262557,variance=77.697467})
[nREPL-session-120ee500-d4ba-4b41-bcdc-e26822f35e2b] DEBUG scicloj.pocket.impl.cache - Cache write: /tmp/pocket-model/21/2174cb3cdbabf34a7fd782c8efb1ab0084db8081
[nREPL-session-120ee500-d4ba-4b41-bcdc-e26822f35e2b] INFO scicloj.pocket.impl.cache - Cache miss, computing: scicloj.metamorph.ml/train
[nREPL-session-120ee500-d4ba-4b41-bcdc-e26822f35e2b] DEBUG scicloj.pocket.impl.cache - Cache write: /tmp/pocket-model/c2/c2ccc1e0dcf1c2c00d9c621178aafc97ec23e85e
[nREPL-session-120ee500-d4ba-4b41-bcdc-e26822f35e2b] INFO scicloj.pocket.impl.cache - Cache miss, computing: scicloj.metamorph.ml/train
[nREPL-session-120ee500-d4ba-4b41-bcdc-e26822f35e2b] DEBUG scicloj.pocket.impl.cache - Cache write: /tmp/pocket-model/42/42cb21d2dea2f78ba1450f2f2eb4c3683652e07f
[nREPL-session-120ee500-d4ba-4b41-bcdc-e26822f35e2b] INFO scicloj.pocket.impl.cache - Cache miss, computing: scicloj.metamorph.ml/train
[nREPL-session-120ee500-d4ba-4b41-bcdc-e26822f35e2b] DEBUG scicloj.pocket.impl.cache - Cache write: /tmp/pocket-model/3c/3cb60cf5b7576d22ded8c276361bcc0eac5d3c40
3 model types × 3 folds = 9 entries:
(pocket/cache-stats)
{:total-entries 9,
 :total-size-bytes 188852,
 :entries-per-fn {"scicloj.metamorph.ml/train" 9}}
Mean RMSE per model type:
(let [model-names ["CART" "SGD" "fastmath-OLS"]
      means (mapv (fn [pipeline-results]
                    (tcc/mean (map #(-> % :test-transform :metric) pipeline-results)))
                  multi-results)]
  (tc/dataset {:model model-names :mean-rmse means}))
_unnamed [3 2]:
| :model | :mean-rmse |
|---|---|
| CART | 11.43320438 |
| SGD | 9.00886158 |
| fastmath-OLS | 9.01791979 |
Graceful fallback
The built-in metamorph.ml/ols uses Apache Commons Math, which Nippy can't serialize. pocket-model catches the error and falls back to uncached training: the pipeline still works, just without disk caching for that model.
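The fallback pattern itself is independent of Pocket and can be sketched in plain Clojure. The names with-fallback, cached-train, and train are hypothetical stand-ins: cached-train imitates a cached path that does the work and then fails while storing the result.

```clojure
(def train-calls (atom 0))

(defn train
  "Stand-in for a training function that always succeeds."
  []
  (swap! train-calls inc)
  {:model :ols})

(defn cached-train
  "Stand-in for a cached path: it trains, then fails to serialize the result."
  []
  (train)
  (throw (ex-info "cannot serialize model" {:reason :not-freezable})))

(defn with-fallback
  "Calls (primary); if it throws an Exception, calls (fallback) instead."
  [primary fallback]
  (try
    (primary)
    (catch Exception _e
      (fallback))))

(with-fallback cached-train train) ; => {:model :ols}
@train-calls                       ; => 2
```

In this sketch the training work runs twice when the cached path fails, once inside the failed attempt and once in the fallback. Catching Exception broadly also hides unrelated failures in the primary path, so a narrower catch may be preferable when the serialization error type is known.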
(pocket/cleanup!)
[nREPL-session-120ee500-d4ba-4b41-bcdc-e26822f35e2b] INFO scicloj.pocket - Cache cleanup: /tmp/pocket-model
{:dir "/tmp/pocket-model", :existed true}
(def fallback-results
  (ml/evaluate-pipelines
   [(mm/pipeline {:metamorph/id :model} (pocket-model cart-spec))
    (mm/pipeline {:metamorph/id :model} (pocket-model {:model-type :metamorph.ml/ols}))]
   splits
   loss/rmse
   :loss
   {:return-best-crossvalidation-only false
    :return-best-pipeline-only false}))
[nREPL-session-120ee500-d4ba-4b41-bcdc-e26822f35e2b] INFO scicloj.pocket.impl.cache - Cache miss, computing: scicloj.metamorph.ml/train
[nREPL-session-120ee500-d4ba-4b41-bcdc-e26822f35e2b] DEBUG scicloj.pocket.impl.cache - Cache write: /tmp/pocket-model/7a/7a2371066976291d06fe1aad1b48bbeba167ff70
[nREPL-session-120ee500-d4ba-4b41-bcdc-e26822f35e2b] INFO scicloj.pocket.impl.cache - Cache miss, computing: scicloj.metamorph.ml/train
[nREPL-session-120ee500-d4ba-4b41-bcdc-e26822f35e2b] DEBUG scicloj.pocket.impl.cache - Cache write: /tmp/pocket-model/9d/9d2799f31ec89ab47c28abaedf1a94632d6e4912
[nREPL-session-120ee500-d4ba-4b41-bcdc-e26822f35e2b] INFO scicloj.pocket.impl.cache - Cache miss, computing: scicloj.metamorph.ml/train
[nREPL-session-120ee500-d4ba-4b41-bcdc-e26822f35e2b] DEBUG scicloj.pocket.impl.cache - Cache write: /tmp/pocket-model/75/752a5761fad71dd397dad959c21a078b67503a46
[nREPL-session-120ee500-d4ba-4b41-bcdc-e26822f35e2b] INFO scicloj.pocket.impl.cache - Cache miss, computing: scicloj.metamorph.ml/train
[nREPL-session-120ee500-d4ba-4b41-bcdc-e26822f35e2b] INFO scicloj.pocket.impl.cache - Cache miss, computing: scicloj.metamorph.ml/train
[nREPL-session-120ee500-d4ba-4b41-bcdc-e26822f35e2b] INFO scicloj.pocket.impl.cache - Cache miss, computing: scicloj.metamorph.ml/train
CART models are cached: 3 entries, one per fold. OLS falls back to uncached training silently. The failed serialization attempts leave empty cache directories, which show up as entries with a nil function name:
(pocket/cache-stats)
{:total-entries 6,
 :total-size-bytes 53369,
 :entries-per-fn {"scicloj.metamorph.ml/train" 3, nil 3}}
Both model types produce valid metrics:
(let [model-names ["CART" "OLS-fallback"]
      means (mapv (fn [pipeline-results]
                    (tcc/mean (map #(-> % :test-transform :metric) pipeline-results)))
                  fallback-results)]
  (tc/dataset {:model model-names :mean-rmse means}))
_unnamed [2 2]:
| :model | :mean-rmse |
|---|---|
| CART | 11.43320438 |
| OLS-fallback | 9.00886158 |
Disk persistence
Models survive JVM restarts. After clearing the in-memory cache, models are loaded from disk on next access.
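The memory-then-disk lookup order can be sketched with a tiny two-tier cache in plain Clojure. This is an illustration under simplifying assumptions, not Pocket's implementation: values are stored as EDN text in a temporary directory, and the names tmp-dir, mem, computations, and lookup are hypothetical.

```clojure
(require '[clojure.edn :as edn]
         '[clojure.java.io :as io])

;; A fresh temporary directory standing in for the on-disk cache.
(def tmp-dir
  (doto (io/file (System/getProperty "java.io.tmpdir")
                 (str "two-tier-sketch-" (System/nanoTime)))
    (.mkdirs)))

(defonce mem (atom {}))        ; in-memory tier
(def computations (atom 0))    ; counts how often we really compute

(defn lookup
  "Returns the value for key k: memory first, then disk, else computes and stores."
  [k compute]
  (let [f (io/file tmp-dir (str k ".edn"))]
    (cond
      (contains? @mem k) (get @mem k)
      (.exists f)        (let [v (edn/read-string (slurp f))]
                           (swap! mem assoc k v)   ; repopulate the memory tier
                           v)
      :else              (let [v (compute)]
                           (swap! computations inc)
                           (spit f (pr-str v))
                           (swap! mem assoc k v)
                           v))))

(lookup "model-a" (fn [] {:weights [1 2 3]})) ; computed and written to disk
(reset! mem {})                               ; drop the in-memory tier
(lookup "model-a" (fn [] {:weights [1 2 3]})) ; read back from disk
@computations                                 ; => 1
```

Emptying the memory tier while leaving the files in place is what makes the second lookup behave like a first access after a restart.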
(pocket/cleanup!)
[nREPL-session-120ee500-d4ba-4b41-bcdc-e26822f35e2b] INFO scicloj.pocket - Cache cleanup: /tmp/pocket-model
{:dir "/tmp/pocket-model", :existed true}
Train fresh:
(def persist-results-1
  (ml/evaluate-pipelines
   [(mm/pipeline {:metamorph/id :model} (pocket-model cart-spec))]
   splits
   loss/rmse
   :loss
   {:return-best-crossvalidation-only false
    :return-best-pipeline-only false}))
[nREPL-session-120ee500-d4ba-4b41-bcdc-e26822f35e2b] INFO scicloj.pocket.impl.cache - Cache miss, computing: scicloj.metamorph.ml/train
[nREPL-session-120ee500-d4ba-4b41-bcdc-e26822f35e2b] DEBUG scicloj.pocket.impl.cache - Cache write: /tmp/pocket-model/7a/7a2371066976291d06fe1aad1b48bbeba167ff70
[nREPL-session-120ee500-d4ba-4b41-bcdc-e26822f35e2b] INFO scicloj.pocket.impl.cache - Cache miss, computing: scicloj.metamorph.ml/train
[nREPL-session-120ee500-d4ba-4b41-bcdc-e26822f35e2b] DEBUG scicloj.pocket.impl.cache - Cache write: /tmp/pocket-model/9d/9d2799f31ec89ab47c28abaedf1a94632d6e4912
[nREPL-session-120ee500-d4ba-4b41-bcdc-e26822f35e2b] INFO scicloj.pocket.impl.cache - Cache miss, computing: scicloj.metamorph.ml/train
[nREPL-session-120ee500-d4ba-4b41-bcdc-e26822f35e2b] DEBUG scicloj.pocket.impl.cache - Cache write: /tmp/pocket-model/75/752a5761fad71dd397dad959c21a078b67503a46
Clear in-memory cache (simulates JVM restart):
(pocket/clear-mem-cache!)
nil
Re-evaluate, loading from disk:
(def persist-results-2
  (ml/evaluate-pipelines
   [(mm/pipeline {:metamorph/id :model} (pocket-model cart-spec))]
   splits
   loss/rmse
   :loss
   {:return-best-crossvalidation-only false
    :return-best-pipeline-only false}))
[nREPL-session-120ee500-d4ba-4b41-bcdc-e26822f35e2b] DEBUG scicloj.pocket.impl.cache - Cache hit (disk): scicloj.metamorph.ml/train /tmp/pocket-model/7a/7a2371066976291d06fe1aad1b48bbeba167ff70
[nREPL-session-120ee500-d4ba-4b41-bcdc-e26822f35e2b] DEBUG scicloj.pocket.impl.cache - Cache hit (disk): scicloj.metamorph.ml/train /tmp/pocket-model/9d/9d2799f31ec89ab47c28abaedf1a94632d6e4912
[nREPL-session-120ee500-d4ba-4b41-bcdc-e26822f35e2b] DEBUG scicloj.pocket.impl.cache - Cache hit (disk): scicloj.metamorph.ml/train /tmp/pocket-model/75/752a5761fad71dd397dad959c21a078b67503a46
Same metrics:
(= (mapv #(-> % :test-transform :metric) (flatten persist-results-1))
   (mapv #(-> % :test-transform :metric) (flatten persist-results-2)))
true
Discussion
pocket-model is a thin wrapper, about 20 lines of code, that gives us disk-persistent model caching with zero changes to our pipeline structure. It works with evaluate-pipelines, preprocessing steps, learning curves, and grid search.
Serialization compatibility (tested):
| Backend | Cacheable? |
|---|---|
| Tribuo regression (CART, SGD) | Yes |
| Tribuo classification | Yes |
| fastmath/ols | Yes |
| metamorph.ml/ols (Commons Math) | No (falls back) |
| metamorph.ml/dummy-regressor | Yes |
When to use pocket-model:
- Grid search / hyperparameter tuning (train once, reuse)
- Iterative notebook development (change downstream code, keep models)
- Learning curves (add new sizes, only new ones train)
- Any workflow where we re-evaluate with the same data + options
Cache key efficiency: When pocket-model receives a derefed dataset (e.g., from ml/evaluate-pipelines, which passes real datasets through :metamorph/data), Pocket's origin registry recognizes it and uses the lightweight identity from the original Cached reference. This avoids hashing the full dataset content for the cache key, giving the same efficiency as passing a Cached reference directly.
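The general idea of looking up a lightweight key by object identity, instead of hashing content, can be sketched as follows. This is a hypothetical illustration and does not reflect how Pocket's origin registry is actually implemented; origin-registry, register-origin!, and key-for are invented names.

```clojure
(import '(java.util Collections IdentityHashMap))

;; Maps a realized value (compared by object identity, not equality) to the
;; lightweight key of the computation that produced it. Hypothetical names.
(def ^java.util.Map origin-registry
  (Collections/synchronizedMap (IdentityHashMap.)))

(defn register-origin!
  "Remembers which lightweight key produced this exact object."
  [value origin-key]
  (.put origin-registry value origin-key)
  value)

(defn key-for
  "Uses the registered origin key when this exact object is known;
  otherwise falls back to a key computed from the full content."
  [value]
  (or (.get origin-registry value)
      (str "content:" (hash value))))

(def big-data (vec (range 100000)))
(register-origin! big-data "load-data/seed-42")

(key-for big-data)               ; => "load-data/seed-42"
(key-for (vec (range 100000)))   ; equal content, different object: content key
```

A long-lived registry of this kind would probably need weak references so that registered datasets can still be garbage collected; that concern is left out of the sketch.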
Cleanup
(pocket/cleanup!)
[nREPL-session-120ee500-d4ba-4b41-bcdc-e26822f35e2b] INFO scicloj.pocket - Cache cleanup: /tmp/pocket-model
{:dir "/tmp/pocket-model", :existed true}