11 🚧 Draft: pocket-model - drop-in caching for metamorph.ml
Last modified: 2026-02-08
This chapter shows how to cache model training in a metamorph.ml pipeline using Pocket. We define a small pocket-model function, a drop-in replacement for ml/model, and use it with cross-validation, grid search, and multiple model types.
Background
metamorph.ml is the Scicloj library for machine learning pipelines. It builds on metamorph, a data-transformation framework where each step is a function that takes a context map and returns an updated one. metamorph.ml distinguishes two modes, :fit (learn from training data) and :transform (apply to new data), so a pipeline can be trained once and reused for prediction.
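To make the context-map idea concrete, here is a minimal sketch of a metamorph-style step in plain Clojure. It assumes the conventional context keys :metamorph/mode, :metamorph/id, and :metamorph/data; the center-step function is hypothetical, and it uses a plain vector where a real pipeline would carry a dataset.

```clojure
;; Hypothetical step for illustration only: it learns a mean in :fit mode
;; and reuses that learned mean in :transform mode.
(defn center-step []
  (fn [{:metamorph/keys [id data mode] :as ctx}]
    (case mode
      :fit
      (let [mean (/ (reduce + data) (count data))]
        (-> ctx
            (assoc id {:mean mean})                       ; state stored under the step id
            (assoc :metamorph/data (mapv #(- % mean) data))))
      :transform
      (let [mean (get-in ctx [id :mean])]                 ; state read back, not relearned
        (assoc ctx :metamorph/data (mapv #(- % mean) data))))))

(def step (center-step))

;; Fit on training data, then apply the fitted context to new data.
(def fitted
  (step {:metamorph/id :center
         :metamorph/mode :fit
         :metamorph/data [1.0 2.0 3.0]}))

(def transformed
  (step (assoc fitted
               :metamorph/mode :transform
               :metamorph/data [10.0])))
```

The fitted context carries the learned mean under :center, so the :transform call subtracts the training mean from the new data rather than recomputing it.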
On top of this, metamorph.ml adds model training/prediction, cross-validation (evaluate-pipelines), loss functions, and hyperparameter search. A typical workflow looks like:
- Define a pipeline of preprocessing + model steps
- Split data into folds
- Call evaluate-pipelines to train and score across folds
- Compare results, pick the best model
Why cache with Pocket?
metamorph.ml includes a built-in caching mechanism. This notebook explores what happens when we use Pocket's caching instead, bringing a few things that are natural to Pocket's design:
- Disk persistence: cached models survive JVM restarts, so we can pick up where we left off across sessions
- Content-based keys: cache keys are derived from function identity and full argument values via SHA-1
- Concurrent dedup: when multiple threads request the same computation, only one trains and the rest wait for the result
The integration is lightweight: a pocket-model function that is a drop-in replacement for ml/model. We swap one pipeline step and everything else (evaluate-pipelines, preprocessing, grid search) stays the same.
What this gives us:
- Same pipeline code, same evaluate-pipelines
- Model training cached to disk (survives JVM restarts)
- Graceful fallback for non-serializable models
What this notebook does not cover: because pocket-model plugs into metamorph.ml's existing pipeline machinery, only the model-training step is cached through Pocket. Preprocessing, splitting, and evaluation happen outside Pocket's awareness: there is no computational DAG tracking the full pipeline, no per-step storage control (choosing whether each step caches to disk, memory, or not at all), and no provenance trail that connects a final metric back to the data and parameters that produced it. A companion notebook is in the works, exploring a deeper integration where every pipeline step is a Pocket caching-fn, giving us all of those things.
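Two of the ideas listed earlier, content-based keys and concurrent dedup, can be sketched in plain Clojure. This is an illustration under stated assumptions, not Pocket's actual key derivation or locking strategy: the names sha1-hex, cache-key, in-flight, and compute-once are hypothetical, and the key here is simply a SHA-1 digest of the printed function name and arguments.

```clojure
(import 'java.security.MessageDigest)

(defn sha1-hex
  "Hex-encoded SHA-1 digest of a string."
  [^String s]
  (let [digest (.digest (MessageDigest/getInstance "SHA-1")
                        (.getBytes s "UTF-8"))]
    (apply str (map #(format "%02x" %) digest))))

(defn cache-key
  "Assumed scheme: hash the printed form of the function name and its arguments."
  [fn-name args]
  (sha1-hex (pr-str [fn-name args])))

;; Concurrent dedup: the first caller registers a delay for a key,
;; and every caller derefs the same delay, so the body runs once.
(def in-flight (atom {}))

(defn compute-once [k f]
  (let [candidate (delay (f))
        winner (get (swap! in-flight
                           #(if (contains? % k) % (assoc % k candidate)))
                    k)]
    @winner))
```

Equal arguments give equal keys, so a repeated call can be answered from storage; a Clojure delay guarantees that its body is evaluated at most once even when several threads deref it at the same time.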
Setup
(ns pocket-book.pocket-model
  (:require
   ;; Logging setup for this chapter (see Logging chapter):
   [pocket-book.logging]
   ;; Pocket API:
   [scicloj.pocket :as pocket]
   ;; Annotating kinds of visualizations:
   [scicloj.kindly.v4.kind :as kind]
   ;; Data processing:
   [tablecloth.api :as tc]
   [tablecloth.column.api :as tcc]
   [tech.v3.dataset.modelling :as ds-mod]
   [tech.v3.dataset.column-filters :as cf]
   ;; Machine learning:
   [scicloj.metamorph.ml :as ml]
   [scicloj.metamorph.ml.loss :as loss]
   [scicloj.metamorph.ml.regression]
   [scicloj.metamorph.core :as mm]
   [scicloj.ml.tribuo]))
(def cache-dir "/tmp/pocket-model")
(pocket/set-base-cache-dir! cache-dir)
10:06:45.235 INFO scicloj.pocket - Cache dir set to: /tmp/pocket-model
"/tmp/pocket-model"
(pocket/cleanup!)
10:06:45.236 INFO scicloj.pocket - Cache cleanup: /tmp/pocket-model
{:dir "/tmp/pocket-model", :existed false}
The pocket-model function
This is the core of the integration. It follows the same contract as ml/model: a metamorph step that trains in :fit mode and predicts in :transform mode. The only difference is that ml/train is wrapped with pocket/cached.
If Nippy can't serialize a model (e.g., Apache Commons Math OLS), it falls back to uncached training automatically. Note that the catch is broad: any exception on the cached path, not only a serialization failure, triggers a second, uncached call to ml/train.
(defn pocket-model
  "Drop-in replacement for ml/model that caches training via Pocket.
  Falls back to uncached training if serialization fails."
  [options]
  (fn [{:metamorph/keys [id data mode] :as ctx}]
    (case mode
      :fit
      ;; Train through Pocket; on any exception, retrain without caching.
      (let [model (try
                    (deref (pocket/cached #'ml/train data options))
                    (catch Exception _e
                      (ml/train data options)))]
        ;; Store the model under the step id, flagged as unsupervised or not.
        (assoc ctx id (assoc model :scicloj.metamorph.ml/unsupervised?
                             (get (ml/options->model-def options)
                                  :unsupervised? false))))
      :transform
      (let [model (get ctx id)]
        (if (get model :scicloj.metamorph.ml/unsupervised?)
          ctx
          ;; Keep features and target for scoring, then replace data with predictions.
          (-> ctx
              (update id assoc
                      :scicloj.metamorph.ml/feature-ds (cf/feature data)
                      :scicloj.metamorph.ml/target-ds (cf/target data))
              (assoc :metamorph/data (ml/predict data model))))))))
Test data
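Simple synthetic regression data, described as y = 3x + noise, with 200 rows for quick feedback. As written, the :y column draws fresh uniform values from rng instead of reusing :x, so y is in fact independent of x. This is consistent with the RMSE values of roughly 9 to 12 reported below, which are close to the standard deviation of y.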
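A minimal sketch of one way to generate data where each y is computed from its own x, so that y = 3x + noise holds row by row. The linear-data helper is hypothetical and returns plain vectors; no dataset library is assumed.

```clojure
;; Hypothetical helper: draw all x values first, then derive each y from its x.
(defn linear-data [n seed]
  (let [rng (java.util.Random. seed)
        xs (vec (repeatedly n #(* 10.0 (.nextDouble rng))))
        ys (mapv #(+ (* 3.0 %) (* 2.0 (.nextGaussian rng))) xs)]
    {:x xs :y ys}))

(def linear-sample (linear-data 200 42))

;; Residuals y - 3x contain only the Gaussian noise term.
(def residuals
  (mapv #(- %2 (* 3.0 %1)) (:x linear-sample) (:y linear-sample)))
```

With data built this way, a model that recovers the slope should reach an RMSE near the noise scale (standard deviation 2.0) rather than near the spread of y.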
(def ds (-> (let [rng (java.util.Random. 42)]
              (tc/dataset
               {:x (vec (repeatedly 200 #(* 10.0 (.nextDouble rng))))
                :y (vec (repeatedly 200 #(+ (* 3.0 (* 10.0 (.nextDouble rng)))
                                            (* 2.0 (.nextGaussian rng)))))}))
            (ds-mod/set-inference-target :y)))
(def splits (tc/split->seq ds :kfold {:k 3 :seed 42}))
(count splits)
3
Basic usage
Use pocket-model in place of ml/model. The {:metamorph/id :model} map step sets the step ID that evaluate-pipelines expects.
(def cart-spec
  {:model-type :scicloj.ml.tribuo/regression
   :tribuo-components [{:name "cart"
                        :type "org.tribuo.regression.rtree.CARTRegressionTrainer"
                        :properties {:maxDepth "8"}}]
   :tribuo-trainer-name "cart"})
(def pipe-cart
  (mm/pipeline
   {:metamorph/id :model}
   (pocket-model cart-spec)))
First run, which trains 3 models (one per fold):
(def results-1
  (ml/evaluate-pipelines
   [pipe-cart]
   splits
   loss/rmse
   :loss
   {:return-best-crossvalidation-only false
    :return-best-pipeline-only false}))
10:06:45.249 INFO scicloj.pocket.impl.cache - Cache miss, computing: scicloj.metamorph.ml/train
10:06:45.255 DEBUG scicloj.pocket.impl.cache - Cache write: /tmp/pocket-model/7a/7a2371066976291d06fe1aad1b48bbeba167ff70
10:06:45.258 INFO scicloj.pocket.impl.cache - Cache miss, computing: scicloj.metamorph.ml/train
10:06:45.264 DEBUG scicloj.pocket.impl.cache - Cache write: /tmp/pocket-model/9d/9d2799f31ec89ab47c28abaedf1a94632d6e4912
10:06:45.268 INFO scicloj.pocket.impl.cache - Cache miss, computing: scicloj.metamorph.ml/train
10:06:45.274 DEBUG scicloj.pocket.impl.cache - Cache write: /tmp/pocket-model/75/752a5761fad71dd397dad959c21a078b67503a46
(mapv #(-> % :test-transform :metric) (flatten results-1))
[10.938693265902357 11.23113067170221 12.12978921023711]
Cache now has 3 entries (one per fold):
(pocket/cache-stats)
{:total-entries 3,
 :total-size-bytes 53368,
 :entries-per-fn {"scicloj.metamorph.ml/train" 3}}
Second run, with all cache hits and the same metrics:
(def results-2
  (ml/evaluate-pipelines
   [pipe-cart]
   splits
   loss/rmse
   :loss
   {:return-best-crossvalidation-only false
    :return-best-pipeline-only false}))
(= (mapv #(-> % :test-transform :metric) (flatten results-1))
   (mapv #(-> % :test-transform :metric) (flatten results-2)))
true
Incremental grid search
Start with 3 depth values, then add 3 more. Only new combinations train; existing ones hit the cache.
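The bookkeeping behind an incremental search can be sketched with plain sets: each (depth, fold) pair stands for one training run, and only pairs absent from the earlier batch need to be trained. The combos helper and the fold indices are illustrative assumptions; Pocket's real keys are hashes of the training function and its arguments.

```clojure
(require '[clojure.set :as set])

;; Hypothetical helper: one entry per (depth, fold) training run.
(defn combos [depths folds]
  (set (for [d depths, f folds] [d f])))

(def folds [0 1 2])
(def batch-1-keys (combos [4 8 12] folds))
(def batch-2-keys (combos [4 6 8 10 12 16] folds))

;; Runs requested in batch 2 that batch 1 did not already produce.
(def new-keys (set/difference batch-2-keys batch-1-keys))

;; Depths that still need training in batch 2.
(def new-depths (set (map first new-keys)))
```

Counting these sets gives 9 runs in the first batch, 18 requested in the second, and 9 that are actually new, covering depths 6, 10, and 16.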
(pocket/cleanup!)
10:06:45.302 INFO scicloj.pocket - Cache cleanup: /tmp/pocket-model
{:dir "/tmp/pocket-model", :existed true}
(defn cart-pipe [max-depth]
  (mm/pipeline
   {:metamorph/id :model}
   (pocket-model
    {:model-type :scicloj.ml.tribuo/regression
     :tribuo-components [{:name "cart"
                          :type "org.tribuo.regression.rtree.CARTRegressionTrainer"
                          :properties {:maxDepth (str max-depth)}}]
     :tribuo-trainer-name "cart"})))
Batch 1: depths 4, 8, 12
(def batch-1
  (ml/evaluate-pipelines
   (mapv cart-pipe [4 8 12])
   splits
   loss/rmse
   :loss
   {:return-best-crossvalidation-only false
    :return-best-pipeline-only false}))
10:06:45.303 INFO scicloj.pocket.impl.cache - Cache miss, computing: scicloj.metamorph.ml/train
10:06:45.308 DEBUG scicloj.pocket.impl.cache - Cache write: /tmp/pocket-model/46/46551ba53c7214873653fd678fb5af5911fd74a8
10:06:45.312 INFO scicloj.pocket.impl.cache - Cache miss, computing: scicloj.metamorph.ml/train
10:06:45.316 DEBUG scicloj.pocket.impl.cache - Cache write: /tmp/pocket-model/6f/6fbb7bae3b90b8268e38a91c951088c6cfa3cadb
10:06:45.319 INFO scicloj.pocket.impl.cache - Cache miss, computing: scicloj.metamorph.ml/train
10:06:45.324 DEBUG scicloj.pocket.impl.cache - Cache write: /tmp/pocket-model/45/45b255de5fd11e966a46bb2bc197bd25e7dc3f6d
10:06:45.328 INFO scicloj.pocket.impl.cache - Cache miss, computing: scicloj.metamorph.ml/train
10:06:45.335 DEBUG scicloj.pocket.impl.cache - Cache write: /tmp/pocket-model/7a/7a2371066976291d06fe1aad1b48bbeba167ff70
10:06:45.339 INFO scicloj.pocket.impl.cache - Cache miss, computing: scicloj.metamorph.ml/train
10:06:45.345 DEBUG scicloj.pocket.impl.cache - Cache write: /tmp/pocket-model/9d/9d2799f31ec89ab47c28abaedf1a94632d6e4912
10:06:45.350 INFO scicloj.pocket.impl.cache - Cache miss, computing: scicloj.metamorph.ml/train
10:06:45.357 DEBUG scicloj.pocket.impl.cache - Cache write: /tmp/pocket-model/75/752a5761fad71dd397dad959c21a078b67503a46
10:06:45.362 INFO scicloj.pocket.impl.cache - Cache miss, computing: scicloj.metamorph.ml/train
10:06:45.381 DEBUG scicloj.pocket.impl.cache - Cache write: /tmp/pocket-model/3b/3b861196a2d47cb991e62014fd3765c71da240f1
10:06:45.388 INFO scicloj.pocket.impl.cache - Cache miss, computing: scicloj.metamorph.ml/train
10:06:45.394 DEBUG scicloj.pocket.impl.cache - Cache write: /tmp/pocket-model/b9/b95687a74377288768e9d5b6046213e733681bd6
10:06:45.399 INFO scicloj.pocket.impl.cache - Cache miss, computing: scicloj.metamorph.ml/train
10:06:45.406 DEBUG scicloj.pocket.impl.cache - Cache write: /tmp/pocket-model/78/78e74ef9c84789d8e65025fb80d4d14782a68ba3
3 depths × 3 folds = 9 trainings:
(pocket/cache-stats)
{:total-entries 9,
 :total-size-bytes 158771,
 :entries-per-fn {"scicloj.metamorph.ml/train" 9}}
Batch 2: depths 4, 6, 8, 10, 12, 16
Depths 4, 8, 12 are already cached; only 6, 10, 16 are new.
(def batch-2
  (ml/evaluate-pipelines
   (mapv cart-pipe [4 6 8 10 12 16])
   splits
   loss/rmse
   :loss
   {:return-best-crossvalidation-only false
    :return-best-pipeline-only false}))
10:06:45.438 INFO scicloj.pocket.impl.cache - Cache miss, computing: scicloj.metamorph.ml/train
10:06:45.446 DEBUG scicloj.pocket.impl.cache - Cache write: /tmp/pocket-model/fa/fa43248538238553b66d37050b738a274e96a914
10:06:45.450 INFO scicloj.pocket.impl.cache - Cache miss, computing: scicloj.metamorph.ml/train
10:06:45.456 DEBUG scicloj.pocket.impl.cache - Cache write: /tmp/pocket-model/73/734a310323e487ea805c11c6c52bda4e9f95e81b
10:06:45.459 INFO scicloj.pocket.impl.cache - Cache miss, computing: scicloj.metamorph.ml/train
10:06:45.463 DEBUG scicloj.pocket.impl.cache - Cache write: /tmp/pocket-model/ee/eecb6e24aaa957374028b36ad146f7cbde70d19e
10:06:45.476 INFO scicloj.pocket.impl.cache - Cache miss, computing: scicloj.metamorph.ml/train
10:06:45.482 DEBUG scicloj.pocket.impl.cache - Cache write: /tmp/pocket-model/a1/a1e22191f6bb58b528e6f441dff21c21b53a6614
10:06:45.485 INFO scicloj.pocket.impl.cache - Cache miss, computing: scicloj.metamorph.ml/train
10:06:45.490 DEBUG scicloj.pocket.impl.cache - Cache write: /tmp/pocket-model/8a/8a0abfef601073e2647c21be0e36785ef4621535
10:06:45.493 INFO scicloj.pocket.impl.cache - Cache miss, computing: scicloj.metamorph.ml/train
10:06:45.501 DEBUG scicloj.pocket.impl.cache - Cache write: /tmp/pocket-model/3a/3a962b4c4929eecc8207ab8210d2761d069d7913
10:06:45.535 INFO scicloj.pocket.impl.cache - Cache miss, computing: scicloj.metamorph.ml/train
10:06:45.544 DEBUG scicloj.pocket.impl.cache - Cache write: /tmp/pocket-model/b5/b5dd934981c42d27635da730d93b58cd25d29824
10:06:45.549 INFO scicloj.pocket.impl.cache - Cache miss, computing: scicloj.metamorph.ml/train
10:06:45.555 DEBUG scicloj.pocket.impl.cache - Cache write: /tmp/pocket-model/ba/bacfed136727155cfce4b6aae389e18c5bccb87a
10:06:45.558 INFO scicloj.pocket.impl.cache - Cache miss, computing: scicloj.metamorph.ml/train
10:06:45.564 DEBUG scicloj.pocket.impl.cache - Cache write: /tmp/pocket-model/9f/9f60d660ba8f69b08c04e5e0789fb8c963ebb49e
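3 new depths × 3 folds = 9 new + 9 cached = 18 total:
(pocket/cache-stats)
{:total-entries 18,
 :total-size-bytes 322365,
 :entries-per-fn {"scicloj.metamorph.ml/train" 18}}
Combine results and find the best depth by mean RMSE. The depth labels assume that evaluate-pipelines returns results in input order. That may not hold here: the mean of the three depth-8 fold metrics printed under Basic usage is about 11.43, which appears in the depth-10 row below, so the depth column should be read with caution.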
(let [depths [4 6 8 10 12 16]
      means (mapv (fn [pipeline-results]
                    (tcc/mean (map #(-> % :test-transform :metric) pipeline-results)))
                  batch-2)]
  (tc/dataset {:depth depths :mean-rmse means}))
_unnamed [6 2]:
| :depth | :mean-rmse |
|---|---|
| 4 | 12.89681757 |
| 6 | 12.36344906 |
| 8 | 12.28048327 |
| 10 | 11.43320438 |
| 12 | 10.68677435 |
| 16 | 9.86061013 |
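Each mean-rmse value is the average of one pipeline's per-fold test metrics. A small sketch of that arithmetic in plain Clojure, using the three fold metrics printed in the Basic usage run; the mean helper is a stand-in for tcc/mean, and the var name is illustrative.

```clojure
;; Stand-in for tcc/mean on a small collection of numbers.
(defn mean [xs]
  (/ (reduce + xs) (count xs)))

;; Per-fold RMSE values copied from the Basic usage output (maxDepth 8 spec).
(def basic-usage-fold-rmse
  [10.938693265902357 11.23113067170221 12.12978921023711])

;; Average across the three folds, approximately 11.4332.
(def basic-usage-mean-rmse (mean basic-usage-fold-rmse))
```

Recomputing a mean by hand like this is a quick way to check which pipeline a given table row actually belongs to.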
Multiple model types
Compare CART, linear SGD, and fastmath OLS in the same evaluation. Each model type is cached independently.
(pocket/cleanup!)
10:06:45.588 INFO scicloj.pocket - Cache cleanup: /tmp/pocket-model
{:dir "/tmp/pocket-model", :existed true}
(def sgd-spec
  {:model-type :scicloj.ml.tribuo/regression
   :tribuo-components [{:name "squared"
                        :type "org.tribuo.regression.sgd.objectives.SquaredLoss"}
                       {:name "linear-sgd"
                        :type "org.tribuo.regression.sgd.linear.LinearSGDTrainer"
                        :properties {:objective "squared"
                                     :epochs "50"
                                     :loggingInterval "10000"}}]
   :tribuo-trainer-name "linear-sgd"})
(def multi-results
  (ml/evaluate-pipelines
   [(mm/pipeline {:metamorph/id :model} (pocket-model cart-spec))
    (mm/pipeline {:metamorph/id :model} (pocket-model sgd-spec))
    (mm/pipeline {:metamorph/id :model} (pocket-model {:model-type :fastmath/ols}))]
   splits
   loss/rmse
   :loss
   {:return-best-crossvalidation-only false
    :return-best-pipeline-only false}))
10:06:45.589 INFO scicloj.pocket.impl.cache - Cache miss, computing: scicloj.metamorph.ml/train
10:06:45.596 DEBUG scicloj.pocket.impl.cache - Cache write: /tmp/pocket-model/7a/7a2371066976291d06fe1aad1b48bbeba167ff70
10:06:45.600 INFO scicloj.pocket.impl.cache - Cache miss, computing: scicloj.metamorph.ml/train
10:06:45.606 DEBUG scicloj.pocket.impl.cache - Cache write: /tmp/pocket-model/9d/9d2799f31ec89ab47c28abaedf1a94632d6e4912
10:06:45.610 INFO scicloj.pocket.impl.cache - Cache miss, computing: scicloj.metamorph.ml/train
10:06:45.618 DEBUG scicloj.pocket.impl.cache - Cache write: /tmp/pocket-model/75/752a5761fad71dd397dad959c21a078b67503a46
10:06:45.623 INFO scicloj.pocket.impl.cache - Cache miss, computing: scicloj.metamorph.ml/train
Feb 09, 2026 10:06:45 AM org.tribuo.common.sgd.AbstractSGDTrainer train
INFO: Training SGD model with 133 examples
Feb 09, 2026 10:06:45 AM org.tribuo.common.sgd.AbstractSGDTrainer train
INFO: Outputs - RegressionInfo({name=y,id=0,count=133,max=32.285163,min=-3.003255,mean=15.591786,variance=84.043799})
10:06:45.632 DEBUG scicloj.pocket.impl.cache - Cache write: /tmp/pocket-model/6a/6ac08d75a9c1dfba5441528a6c2cb027b0986f6f
10:06:45.635 INFO scicloj.pocket.impl.cache - Cache miss, computing: scicloj.metamorph.ml/train
Feb 09, 2026 10:06:45 AM org.tribuo.common.sgd.AbstractSGDTrainer train
INFO: Training SGD model with 133 examples
Feb 09, 2026 10:06:45 AM org.tribuo.common.sgd.AbstractSGDTrainer train
INFO: Outputs - RegressionInfo({name=y,id=0,count=133,max=31.652557,min=-1.736155,mean=15.631001,variance=80.557863})
10:06:45.643 DEBUG scicloj.pocket.impl.cache - Cache write: /tmp/pocket-model/cf/cff2b9e4351565863c5cf69ac6a1aa7a626936af
10:06:45.647 INFO scicloj.pocket.impl.cache - Cache miss, computing: scicloj.metamorph.ml/train
Feb 09, 2026 10:06:45 AM org.tribuo.common.sgd.AbstractSGDTrainer train
INFO: Training SGD model with 134 examples
Feb 09, 2026 10:06:45 AM org.tribuo.common.sgd.AbstractSGDTrainer train
INFO: Outputs - RegressionInfo({name=y,id=0,count=134,max=32.285163,min=-3.003255,mean=16.262557,variance=77.697467})
10:06:45.658 DEBUG scicloj.pocket.impl.cache - Cache write: /tmp/pocket-model/21/2174cb3cdbabf34a7fd782c8efb1ab0084db8081
10:06:45.663 INFO scicloj.pocket.impl.cache - Cache miss, computing: scicloj.metamorph.ml/train
10:06:45.673 DEBUG scicloj.pocket.impl.cache - Cache write: /tmp/pocket-model/c2/c2ccc1e0dcf1c2c00d9c621178aafc97ec23e85e
10:06:45.676 INFO scicloj.pocket.impl.cache - Cache miss, computing: scicloj.metamorph.ml/train
10:06:45.681 DEBUG scicloj.pocket.impl.cache - Cache write: /tmp/pocket-model/42/42cb21d2dea2f78ba1450f2f2eb4c3683652e07f
10:06:45.683 INFO scicloj.pocket.impl.cache - Cache miss, computing: scicloj.metamorph.ml/train
10:06:45.689 DEBUG scicloj.pocket.impl.cache - Cache write: /tmp/pocket-model/3c/3cb60cf5b7576d22ded8c276361bcc0eac5d3c40
3 model types × 3 folds = 9 entries:
(pocket/cache-stats)
{:total-entries 9,
 :total-size-bytes 188853,
 :entries-per-fn {"scicloj.metamorph.ml/train" 9}}
Mean RMSE per model type:
(let [model-names ["CART" "SGD" "fastmath-OLS"]
      means (mapv (fn [pipeline-results]
                    (tcc/mean (map #(-> % :test-transform :metric) pipeline-results)))
                  multi-results)]
  (tc/dataset {:model model-names :mean-rmse means}))
_unnamed [3 2]:
| :model | :mean-rmse |
|---|---|
| CART | 11.43320438 |
| SGD | 9.00886158 |
| fastmath-OLS | 9.01791979 |
Graceful fallback
The built-in metamorph.ml/ols uses Apache Commons Math, which Nippy can't serialize. pocket-model catches the error and falls back to uncached training: the pipeline still works, just without disk caching for that model.
(pocket/cleanup!)
10:06:45.705 INFO scicloj.pocket - Cache cleanup: /tmp/pocket-model
{:dir "/tmp/pocket-model", :existed true}
(def fallback-results
  (ml/evaluate-pipelines
   [(mm/pipeline {:metamorph/id :model} (pocket-model cart-spec))
    (mm/pipeline {:metamorph/id :model} (pocket-model {:model-type :metamorph.ml/ols}))]
   splits
   loss/rmse
   :loss
   {:return-best-crossvalidation-only false
    :return-best-pipeline-only false}))
10:06:45.706 INFO scicloj.pocket.impl.cache - Cache miss, computing: scicloj.metamorph.ml/train
10:06:45.713 DEBUG scicloj.pocket.impl.cache - Cache write: /tmp/pocket-model/7a/7a2371066976291d06fe1aad1b48bbeba167ff70
10:06:45.718 INFO scicloj.pocket.impl.cache - Cache miss, computing: scicloj.metamorph.ml/train
10:06:45.725 DEBUG scicloj.pocket.impl.cache - Cache write: /tmp/pocket-model/9d/9d2799f31ec89ab47c28abaedf1a94632d6e4912
10:06:45.730 INFO scicloj.pocket.impl.cache - Cache miss, computing: scicloj.metamorph.ml/train
10:06:45.738 DEBUG scicloj.pocket.impl.cache - Cache write: /tmp/pocket-model/75/752a5761fad71dd397dad959c21a078b67503a46
10:06:45.744 INFO scicloj.pocket.impl.cache - Cache miss, computing: scicloj.metamorph.ml/train
10:06:45.759 INFO scicloj.pocket.impl.cache - Cache miss, computing: scicloj.metamorph.ml/train
10:06:45.771 INFO scicloj.pocket.impl.cache - Cache miss, computing: scicloj.metamorph.ml/train
CART models are cached: 3 entries, one per fold. OLS falls back to uncached training silently. The failed serialization attempts leave empty cache directories, which show up as entries with a nil function name:
(pocket/cache-stats)
{:total-entries 6,
 :total-size-bytes 53369,
 :entries-per-fn {"scicloj.metamorph.ml/train" 3, nil 3}}
Both model types produce valid metrics:
(let [model-names ["CART" "OLS-fallback"]
      means (mapv (fn [pipeline-results]
                    (tcc/mean (map #(-> % :test-transform :metric) pipeline-results)))
                  fallback-results)]
  (tc/dataset {:model model-names :mean-rmse means}))
_unnamed [2 2]:
| :model | :mean-rmse |
|---|---|
| CART | 11.43320438 |
| OLS-fallback | 9.00886158 |
Disk persistence
Models survive JVM restarts. After clearing the in-memory cache, models are loaded from disk on next access.
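The memory-then-disk lookup described here can be sketched in plain Clojure. This is an assumption-laden illustration, not Pocket's implementation: it stores values as EDN text rather than Nippy bytes, and the names mem, sketch-dir, and lookup are hypothetical.

```clojure
(require '[clojure.edn :as edn]
         '[clojure.java.io :as io])

;; In-memory layer, lost on restart (or when reset).
(def mem (atom {}))

;; Disk layer: one EDN file per key in a temp directory.
(def sketch-dir
  (doto (io/file (System/getProperty "java.io.tmpdir") "sketch-cache")
    .mkdirs))

(defn lookup
  "Returns [source value], where source is :mem, :disk, or :miss."
  [k compute]
  (let [f (io/file sketch-dir (str k ".edn"))]
    (cond
      (contains? @mem k) [:mem (get @mem k)]
      (.exists f)        (let [v (edn/read-string (slurp f))]
                           (swap! mem assoc k v)   ; repopulate memory from disk
                           [:disk v])
      :else              (let [v (compute)]
                           (spit f (pr-str v))     ; persist for later sessions
                           (swap! mem assoc k v)
                           [:miss v]))))
```

Calling lookup twice with the same key reports :miss and then :mem; after (reset! mem {}), the next call reports :disk, which mirrors the effect of clearing the in-memory cache in the steps that follow.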
(pocket/cleanup!)
10:06:45.791 INFO scicloj.pocket - Cache cleanup: /tmp/pocket-model
{:dir "/tmp/pocket-model", :existed true}
Train fresh:
(def persist-results-1
  (ml/evaluate-pipelines
   [(mm/pipeline {:metamorph/id :model} (pocket-model cart-spec))]
   splits
   loss/rmse
   :loss
   {:return-best-crossvalidation-only false
    :return-best-pipeline-only false}))
10:06:45.792 INFO scicloj.pocket.impl.cache - Cache miss, computing: scicloj.metamorph.ml/train
10:06:45.800 DEBUG scicloj.pocket.impl.cache - Cache write: /tmp/pocket-model/7a/7a2371066976291d06fe1aad1b48bbeba167ff70
10:06:45.805 INFO scicloj.pocket.impl.cache - Cache miss, computing: scicloj.metamorph.ml/train
10:06:45.813 DEBUG scicloj.pocket.impl.cache - Cache write: /tmp/pocket-model/9d/9d2799f31ec89ab47c28abaedf1a94632d6e4912
10:06:45.817 INFO scicloj.pocket.impl.cache - Cache miss, computing: scicloj.metamorph.ml/train
10:06:45.824 DEBUG scicloj.pocket.impl.cache - Cache write: /tmp/pocket-model/75/752a5761fad71dd397dad959c21a078b67503a46
Clear in-memory cache (simulates JVM restart):
(pocket/clear-mem-cache!)
nil
Re-evaluate, loading from disk:
(def persist-results-2
  (ml/evaluate-pipelines
   [(mm/pipeline {:metamorph/id :model} (pocket-model cart-spec))]
   splits
   loss/rmse
   :loss
   {:return-best-crossvalidation-only false
    :return-best-pipeline-only false}))
10:06:45.831 DEBUG scicloj.pocket.impl.cache - Cache hit (disk): scicloj.metamorph.ml/train /tmp/pocket-model/7a/7a2371066976291d06fe1aad1b48bbeba167ff70
10:06:45.837 DEBUG scicloj.pocket.impl.cache - Cache hit (disk): scicloj.metamorph.ml/train /tmp/pocket-model/9d/9d2799f31ec89ab47c28abaedf1a94632d6e4912
10:06:45.841 DEBUG scicloj.pocket.impl.cache - Cache hit (disk): scicloj.metamorph.ml/train /tmp/pocket-model/75/752a5761fad71dd397dad959c21a078b67503a46
Same metrics:
(= (mapv #(-> % :test-transform :metric) (flatten persist-results-1))
   (mapv #(-> % :test-transform :metric) (flatten persist-results-2)))
true
Discussion
pocket-model is a thin wrapper, about 20 lines of code, that gives us disk-persistent model caching with zero changes to our pipeline structure. It works with evaluate-pipelines, preprocessing steps, learning curves, and grid search.
Serialization compatibility (tested):
| Backend | Cacheable? |
|---|---|
| Tribuo regression (CART, SGD) | Yes |
| Tribuo classification | Yes |
| fastmath/ols | Yes |
| metamorph.ml/ols (Commons Math) | No (falls back) |
| metamorph.ml/dummy-regressor | Yes |
When to use pocket-model:
- Grid search / hyperparameter tuning (train once, reuse)
- Iterative notebook development (change downstream code, keep models)
- Learning curves (add new sizes, only new ones train)
- Any workflow where we re-evaluate with the same data + options
Cache key efficiency: When pocket-model receives a derefed dataset (e.g., from ml/evaluate-pipelines, which passes real datasets through :metamorph/data), Pocket's origin registry recognizes it and uses the lightweight identity from the original Cached reference. This avoids hashing the full dataset content for the cache key, giving the same efficiency as passing a Cached reference directly.
Cleanup
(pocket/cleanup!)
10:06:45.848 INFO scicloj.pocket - Cache cleanup: /tmp/pocket-model
{:dir "/tmp/pocket-model", :existed true}