14 AutoML using metamorph pipelines

In this tutorial we see how to use metamorph.ml to perform automatic machine learning. By AutoML we mean trying many different models and hyperparameters and relying on automatic validation to pick the best-performing model.

Note that this chapter requires scicloj.ml.smile as an additional dependency to Noj.

(ns noj-book.automl
  (:require [noj-book.ml-basic :as ml-basic]
            [scicloj.kindly.v4.kind :as kind]
            [scicloj.metamorph.ml :as ml]
            [tablecloth.api :as tc]
            [scicloj.metamorph.ml.loss :as loss]
            [scicloj.metamorph.core :as mm]
            [scicloj.metamorph.ml.gridsearch :as gs]
            [tech.v3.dataset.modelling :as ds-mod]
            [scicloj.ml.tribuo]))

14.1 The metamorph pipeline abstraction

When doing AutoML, it is very useful to manage all the steps of a machine learning pipeline (including data transformations and modeling) as a single function that can be passed around freely. This does not work with a threading macro, because a threading macro executes immediately.

The Clojure way to do this is function composition and higher-order functions.

The following is a quick explanation of metamorph; see the chapter “Machine learning pipelines” for more details.

In the basic tutorial we used the pair train and predict to perform machine learning. AutoML needs a different abstraction, one that encapsulates both train and predict (or any other operation) in a single function.

We will use the concept of a “metamorph pipeline”. This is a sequence of specific functions, each of which can behave differently depending on the “mode” the pipeline is run in. A pipeline runs either in mode :fit or in mode :transform, and its functions can (but don't need to) do different things depending on the mode.

14.1.1 metamorph.ml/model

Specifically, there is a function called metamorph.ml/model, which trains a model in mode :fit and predicts with it in mode :transform.

The names :fit and :transform reflect that pipeline functions can do other things than train and predict. So :fit and :transform represent a more general concept than train/predict.
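The following is a minimal plain-Clojure sketch of a mode-dependent operation, in the spirit of ml/model with the dummy classifier. It makes the two modes concrete. It is not metamorph.ml's implementation. A dataset here is simply a vector of row maps, and toy-majority-model and the state key ::toy-model are hypothetical names.

```clojure
;; Plain-Clojure sketch, not metamorph.ml code: a "dataset" is a vector of row maps.
(defn toy-majority-model
  "Hypothetical ctx->ctx op in the spirit of the dummy classifier:
  at :fit it learns the most frequent :survived value, at :transform it predicts it."
  [state-key]
  (fn [{:metamorph/keys [data mode] :as ctx}]
    (case mode
      ;; :fit plays the role of train: store the majority class in the ctx
      :fit (assoc ctx state-key
                  {:majority-class (->> data
                                        (map :survived)
                                        frequencies
                                        (apply max-key val)
                                        key)})
      ;; :transform plays the role of predict: overwrite :survived with it
      :transform (let [majority (get-in ctx [state-key :majority-class])]
                   (assoc ctx :metamorph/data
                          (mapv #(assoc % :survived majority) data))))))

(def toy-model-op (toy-majority-model ::toy-model))

(def toy-fitted
  (toy-model-op {:metamorph/data [{:sex 0.0 :survived 0.0}
                                  {:sex 1.0 :survived 1.0}
                                  {:sex 0.0 :survived 0.0}]
                 :metamorph/mode :fit}))

(def toy-predicted
  (toy-model-op (assoc toy-fitted
                       :metamorph/data [{:sex 1.0} {:sex 0.0}]
                       :metamorph/mode :transform)))

(mapv :survived (:metamorph/data toy-predicted))
;; => [0.0 0.0]
```

The same function does two different things. The only link between the two calls is the context map returned by the :fit call.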

(require '[scicloj.metamorph.ml :as ml]
         '[scicloj.metamorph.core :as mm]
         '[tablecloth.api :as tc])

We will use the ready-for-modeling data from the basic ML tutorial:

(def titanic ml-basic/numeric-titanic-data)

Split the data

So let's first split the data into a train set and a test set. We take the first train/test pair of the sequence returned by tc/split->seq:

(def splits (first (tc/split->seq titanic)))
(def train-ds (:train splits))
(def test-ds (:test splits))

14.1.2 Create pipeline

At its foundation, a metamorph pipeline is a sequential composition of functions. Each of them takes a single map, the so-called context, as its only parameter and returns another context, changed by the function. The composed function, i.e. the pipeline as a whole, has this same property. Any other function parameters are closed over when the function is created. The following creates such a composed function out of other metamorph-compliant operations. The overall result of the pipeline function is the result of the last operation (in this case there is only one operation).

In nearly all cases, the last pipeline operation is ml/model, but this is not strictly required.

(def my-pipeline
  (mm/pipeline
   (ml/model {:model-type :metamorph.ml/dummy-classifier})))

As we can see, this is itself a function:

my-pipeline
#object[clojure.core$partial$fn__5929 0x41abf20f "clojure.core$partial$fn__5929@41abf20f"]

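This function is metamorph compliant. It is called with a map, as in (my-pipeline {...}), and returns a map.

But this map cannot be arbitrary. It needs to adhere to the metamorph conventions, for example holding the dataset under :metamorph/data and the mode under :metamorph/mode.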
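The following is a minimal plain-Clojure sketch of these conventions. It assumes only that a context is a map with :metamorph/data and :metamorph/mode. The hypothetical toy-pipeline composes ctx->ctx functions with reduce, so the composed pipeline is itself a ctx->ctx function, and its result is the result of the last operation. The real mm/pipeline does more than this (for example, it manages operation ids), so treat this as an illustration only.

```clojure
;; Plain-Clojure sketch of a pipeline as function composition (not mm/pipeline itself).
(defn toy-pipeline
  "Hypothetical: compose ctx->ctx ops so they run left to right."
  [& ops]
  (fn [ctx]
    (reduce (fn [acc op] (op acc)) ctx ops)))

;; Each op takes only the ctx; extra parameters (here k) are closed over.
(defn drop-key-op [k]
  (fn [ctx]
    (update ctx :metamorph/data (fn [rows] (mapv #(dissoc % k) rows)))))

(defn count-rows-op []
  (fn [ctx]
    (assoc ctx ::row-count (count (:metamorph/data ctx)))))

(def toy-pipe (toy-pipeline (drop-key-op :embarked) (count-rows-op)))

(def toy-result
  (toy-pipe {:metamorph/data [{:sex 0.0 :embarked 2.0} {:sex 1.0 :embarked 0.0}]
             :metamorph/mode :fit}))

(:metamorph/data toy-result) ;; => [{:sex 0.0} {:sex 1.0}]
(::row-count toy-result)     ;; => 2
```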

14.1.3 Run pipeline = train model

The following trains a model, because the ml/model function does this when called with :metamorph/mode :fit. It is the only operation in the pipeline, so the pipeline does exactly one thing: it trains a model.

(def ctx-after-train
  (my-pipeline {:metamorph/data train-ds
                :metamorph/mode :fit}))
ctx-after-train

{

:metamorph/data

Group: 0 [711 4]:

:sex :pclass :embarked :survived
0.0 2.0 2.0 0.0
0.0 1.0 2.0 0.0
1.0 2.0 0.0 0.0
0.0 1.0 0.0 1.0
0.0 3.0 0.0 0.0
1.0 2.0 2.0 1.0
0.0 3.0 0.0 0.0
1.0 2.0 0.0 1.0
0.0 3.0 0.0 1.0
0.0 3.0 1.0 0.0
... ... ... ...
0.0 3.0 0.0 0.0
0.0 1.0 2.0 0.0
0.0 1.0 0.0 0.0
0.0 1.0 0.0 1.0
0.0 3.0 0.0 0.0
0.0 1.0 2.0 0.0
1.0 3.0 1.0 1.0
1.0 3.0 0.0 0.0
0.0 3.0 2.0 0.0
0.0 3.0 0.0 0.0
1.0 2.0 0.0 1.0
:metamorph/mode :fit
#uuid "246256a9-1fbc-4189-a75c-f088eb289026" {:feature-columns [:sex :pclass :embarked], :target-categorical-maps {:survived #tech.v3.dataset.categorical.CategoricalMap{:lookup-table {"no" 0, "yes" 1}, :src-column :survived, :result-datatype :float64}}, :target-columns [:survived], :train-input-hash nil, :target-datatypes {:survived :float64}, :scicloj.metamorph.ml/unsupervised? nil, :model-data {:majority-class 0.0, :distinct-labels (0.0 1.0)}, :id #uuid "fda1d02d-7d90-43e3-8c66-cb3b5d24d650", :options {:model-type :metamorph.ml/dummy-classifier}}

}

The ctx contains a lot of information, so I only show its top-level keys:

(keys ctx-after-train)
(:metamorph/data
 :metamorph/mode
 #uuid "246256a9-1fbc-4189-a75c-f088eb289026")

This context map holds the data, the mode, and a UUID key. The only operation of this pipeline, ml/model, stored its state under that key.

(vals ctx-after-train)

(

Group: 0 [711 4]:

:sex :pclass :embarked :survived
0.0 2.0 2.0 0.0
0.0 1.0 2.0 0.0
1.0 2.0 0.0 0.0
0.0 1.0 0.0 1.0
0.0 3.0 0.0 0.0
1.0 2.0 2.0 1.0
0.0 3.0 0.0 0.0
1.0 2.0 0.0 1.0
0.0 3.0 0.0 1.0
0.0 3.0 1.0 0.0
... ... ... ...
0.0 3.0 0.0 0.0
0.0 1.0 2.0 0.0
0.0 1.0 0.0 0.0
0.0 1.0 0.0 1.0
0.0 3.0 0.0 0.0
0.0 1.0 2.0 0.0
1.0 3.0 1.0 1.0
1.0 3.0 0.0 0.0
0.0 3.0 2.0 0.0
0.0 3.0 0.0 0.0
1.0 2.0 0.0 1.0
:fit
{:feature-columns [:sex :pclass :embarked],
 :target-categorical-maps
 {:survived
  {:lookup-table {"no" 0, "yes" 1},
   :src-column :survived,
   :result-datatype :float64}},
 :target-columns [:survived],
 :train-input-hash nil,
 :target-datatypes {:survived :float64},
 :scicloj.metamorph.ml/unsupervised? nil,
 :model-data {:majority-class 0.0, :distinct-labels (0.0 1.0)},
 :id #uuid "fda1d02d-7d90-43e3-8c66-cb3b5d24d650",
 :options {:model-type :metamorph.ml/dummy-classifier}}

)

Each operation of the pipeline has a fixed id (the UUID seen above), so the model function knows “its id”. In :transform mode it can therefore retrieve the data it stored under that id during :fit. In this way the model function “sends” data to itself from :fit to :transform, namely the trained model.

The following therefore runs the prediction on new data:

(def ctx-after-predict
  (my-pipeline (assoc ctx-after-train
                      :metamorph/mode :transform
                      :metamorph/data test-ds)))
(keys ctx-after-predict)
(:metamorph/data
 :metamorph/mode
 #uuid "246256a9-1fbc-4189-a75c-f088eb289026")
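The same UUID key appears after :fit and after :transform. The following plain-Clojure sketch shows how a fixed per-operation id lets state travel from :fit to :transform. It assumes the pipeline creates one UUID per operation when the pipeline itself is created, and hands that UUID to the operation in the ctx under :metamorph/id. toy-pipeline-with-ids and remember-train-size-op are hypothetical, not metamorph's source.

```clojure
;; Plain-Clojure sketch; the id handling is an assumption, not metamorph's source.
(defn toy-pipeline-with-ids
  "Hypothetical: give each op a UUID once, when the pipeline is created,
  and pass it to the op in the ctx under :metamorph/id."
  [& ops]
  (let [ops-with-ids (mapv (fn [op] [(java.util.UUID/randomUUID) op]) ops)]
    (fn [ctx]
      (-> (reduce (fn [acc [id op]] (op (assoc acc :metamorph/id id)))
                  ctx
                  ops-with-ids)
          (dissoc :metamorph/id)))))

(defn remember-train-size-op []
  (fn [{:metamorph/keys [id mode data] :as ctx}]
    (case mode
      ;; :fit: store state under this op's own id
      :fit       (assoc ctx id {:train-size (count data)})
      ;; :transform: read the state back via the same id
      :transform (assoc ctx ::seen-train-size (get-in ctx [id :train-size])))))

(def id-pipe (toy-pipeline-with-ids (remember-train-size-op)))

(def id-after-fit
  (id-pipe {:metamorph/data [{:a 1} {:a 2} {:a 3}] :metamorph/mode :fit}))

(def id-after-transform
  (id-pipe (assoc id-after-fit :metamorph/data [{:a 4}] :metamorph/mode :transform)))

(::seen-train-size id-after-transform) ;; => 3
```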

For the dummy model we do not see a trained model object. Instead, it “communicates” the majority class of the training data to itself and uses it for prediction. So the dummy model has ‘learned’ the majority class from its training data.

We can then get the prediction result out of the ctx:

(-> ctx-after-predict :metamorph/data :survived)
#tech.v3.dataset.column<float64>[178]
:survived
[0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000...]

This works as long as all operations of the pipeline follow the metamorph convention. As we will see, such compliant functions can be created from normal dataset->dataset functions.

my-pipeline therefore represents a not-yet-executed model training/prediction flow. It can be freely moved around and applied to datasets when needed.

14.2 Use metamorph pipelines to do model training with higher level API

As users of metamorph.ml, we do not need to deal with these low-level details of how metamorph works. There are convenience functions that hide them.

The following code does the same as train, but returns a context object that contains the trained model. In other words, it executes the pipeline instead of only creating it.

It uses the convenience function mm/fit, which generates compliant context maps internally and also executes the pipeline.

The ctx acts as a collector of everything “learned” during :fit. This is mainly the trained model, but it could also be other information learned from the data during :fit and applied at :transform.

(def train-ctx
  (mm/fit titanic
          (ml/model {:model-type :metamorph.ml/dummy-classifier})))

(The dummy-classifier model does not have a lot of state, so there is little to see.)

(keys train-ctx)
(:metamorph/data
 :metamorph/mode
 #uuid "397c1f76-d6e3-4666-9bf1-0f0ae3e8a010")

To show the power of pipelines, I start with the simplest possible pipeline and then expand on it.

We can already chain train and predict with ordinary functions:

(->>
 (ml/train train-ds {:model-type :metamorph.ml/dummy-classifier})
 (ml/predict test-ds)
 :survived)
#tech.v3.dataset.column<float64>[178]
:survived
[0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000...]

The same with pipelines:

(def pipeline
  (mm/pipeline (ml/model {:model-type :metamorph.ml/dummy-classifier})))
(->>
 (mm/fit-pipe train-ds pipeline)
 (mm/transform-pipe test-ds pipeline)
 :metamorph/data :survived)
#tech.v3.dataset.column<float64>[178]
:survived
[0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000...]
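The fit-pipe/transform-pipe pair above can be approximated in plain Clojure. toy-fit-pipe and toy-transform-pipe are hypothetical helpers modelled on how mm/fit-pipe and mm/transform-pipe are used here, with the same argument order. They build the context map and run the pipeline in the corresponding mode. cap-pipe is a made-up stateful pipeline.

```clojure
;; Plain-Clojure sketch of the fit-pipe / transform-pipe idea; hypothetical helpers.
(defn toy-fit-pipe
  "Wrap data in a :fit ctx and run the pipeline."
  [data pipe-fn]
  (pipe-fn {:metamorph/data data :metamorph/mode :fit}))

(defn toy-transform-pipe
  "Reuse the fitted ctx, swapping in new data and the :transform mode."
  [data pipe-fn fitted-ctx]
  (pipe-fn (assoc fitted-ctx :metamorph/data data :metamorph/mode :transform)))

;; A tiny stateful pipeline: learns the largest :x at :fit, caps :x at it at :transform.
(def cap-pipe
  (fn [{:metamorph/keys [data mode] :as ctx}]
    (case mode
      :fit       (assoc ctx ::max-x (apply max (map :x data)))
      :transform (assoc ctx :metamorph/data
                        (mapv #(update % :x min (::max-x ctx)) data)))))

(->> (toy-fit-pipe [{:x 1} {:x 5}] cap-pipe)
     (toy-transform-pipe [{:x 3} {:x 9}] cap-pipe)
     :metamorph/data)
;; => [{:x 3} {:x 5}]
```

With ->>, the fitted context is threaded in as the last argument of toy-transform-pipe, exactly as in the mm/fit-pipe / mm/transform-pipe chain above.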

14.3 Create metamorph compliant functions

As said before, a metamorph pipeline is composed of metamorph-compliant functions (operations), which take a ctx as input and return a ctx as output. There are three ways to create them.

The following three expressions create the same metamorph-compliant function:

  1. Implementing a metamorph-compliant function directly as an anonymous function:
(def ops-1
  (fn [ctx]
    (assoc ctx :metamorph/data
           (tc/drop-columns (:metamorph/data ctx) [:embarked]))))
  2. Using mm/lift, which does the same as option 1:
(def ops-2 (mm/lift tc/drop-columns [:embarked]))
  3. Using a namespace that contains lifted functions (tablecloth.pipeline):
(require '[tablecloth.pipeline])
Output (stderr):
WARNING: bit-set already refers to: #'clojure.core/bit-set in namespace: tablecloth.pipeline, being replaced by: #'tablecloth.pipeline/bit-set
WARNING: bit-shift-right already refers to: #'clojure.core/bit-shift-right in namespace: tablecloth.pipeline, being replaced by: #'tablecloth.pipeline/bit-shift-right
WARNING: bit-shift-left already refers to: #'clojure.core/bit-shift-left in namespace: tablecloth.pipeline, being replaced by: #'tablecloth.pipeline/bit-shift-left
WARNING: < already refers to: #'clojure.core/< in namespace: tablecloth.pipeline, being replaced by: #'tablecloth.pipeline/<
WARNING: pos? already refers to: #'clojure.core/pos? in namespace: tablecloth.pipeline, being replaced by: #'tablecloth.pipeline/pos?
WARNING: bit-xor already refers to: #'clojure.core/bit-xor in namespace: tablecloth.pipeline, being replaced by: #'tablecloth.pipeline/bit-xor
WARNING: unsigned-bit-shift-right already refers to: #'clojure.core/unsigned-bit-shift-right in namespace: tablecloth.pipeline, being replaced by: #'tablecloth.pipeline/unsigned-bit-shift-right
WARNING: neg? already refers to: #'clojure.core/neg? in namespace: tablecloth.pipeline, being replaced by: #'tablecloth.pipeline/neg?
WARNING: <= already refers to: #'clojure.core/<= in namespace: tablecloth.pipeline, being replaced by: #'tablecloth.pipeline/<=
WARNING: * already refers to: #'clojure.core/* in namespace: tablecloth.pipeline, being replaced by: #'tablecloth.pipeline/*
WARNING: min already refers to: #'clojure.core/min in namespace: tablecloth.pipeline, being replaced by: #'tablecloth.pipeline/min
WARNING: identity already refers to: #'clojure.core/identity in namespace: tablecloth.pipeline, being replaced by: #'tablecloth.pipeline/identity
WARNING: bit-and-not already refers to: #'clojure.core/bit-and-not in namespace: tablecloth.pipeline, being replaced by: #'tablecloth.pipeline/bit-and-not
WARNING: quot already refers to: #'clojure.core/quot in namespace: tablecloth.pipeline, being replaced by: #'tablecloth.pipeline/quot
WARNING: > already refers to: #'clojure.core/> in namespace: tablecloth.pipeline, being replaced by: #'tablecloth.pipeline/>
WARNING: even? already refers to: #'clojure.core/even? in namespace: tablecloth.pipeline, being replaced by: #'tablecloth.pipeline/even?
WARNING: - already refers to: #'clojure.core/- in namespace: tablecloth.pipeline, being replaced by: #'tablecloth.pipeline/-
WARNING: or already refers to: #'clojure.core/or in namespace: tablecloth.pipeline, being replaced by: #'tablecloth.pipeline/or
WARNING: zero? already refers to: #'clojure.core/zero? in namespace: tablecloth.pipeline, being replaced by: #'tablecloth.pipeline/zero?
WARNING: rem already refers to: #'clojure.core/rem in namespace: tablecloth.pipeline, being replaced by: #'tablecloth.pipeline/rem
WARNING: bit-and already refers to: #'clojure.core/bit-and in namespace: tablecloth.pipeline, being replaced by: #'tablecloth.pipeline/bit-and
WARNING: not already refers to: #'clojure.core/not in namespace: tablecloth.pipeline, being replaced by: #'tablecloth.pipeline/not
WARNING: / already refers to: #'clojure.core// in namespace: tablecloth.pipeline, being replaced by: #'tablecloth.pipeline//
WARNING: bit-or already refers to: #'clojure.core/bit-or in namespace: tablecloth.pipeline, being replaced by: #'tablecloth.pipeline/bit-or
WARNING: >= already refers to: #'clojure.core/>= in namespace: tablecloth.pipeline, being replaced by: #'tablecloth.pipeline/>=
WARNING: bit-flip already refers to: #'clojure.core/bit-flip in namespace: tablecloth.pipeline, being replaced by: #'tablecloth.pipeline/bit-flip
WARNING: infinite? already refers to: #'clojure.core/infinite? in namespace: tablecloth.pipeline, being replaced by: #'tablecloth.pipeline/infinite?
WARNING: odd? already refers to: #'clojure.core/odd? in namespace: tablecloth.pipeline, being replaced by: #'tablecloth.pipeline/odd?
WARNING: bit-clear already refers to: #'clojure.core/bit-clear in namespace: tablecloth.pipeline, being replaced by: #'tablecloth.pipeline/bit-clear
WARNING: + already refers to: #'clojure.core/+ in namespace: tablecloth.pipeline, being replaced by: #'tablecloth.pipeline/+
WARNING: abs already refers to: #'clojure.core/abs in namespace: tablecloth.pipeline, being replaced by: #'tablecloth.pipeline/abs
WARNING: bit-not already refers to: #'clojure.core/bit-not in namespace: tablecloth.pipeline, being replaced by: #'tablecloth.pipeline/bit-not
WARNING: max already refers to: #'clojure.core/max in namespace: tablecloth.pipeline, being replaced by: #'tablecloth.pipeline/max
WARNING: and already refers to: #'clojure.core/and in namespace: tablecloth.pipeline, being replaced by: #'tablecloth.pipeline/and
(def ops-3 (tablecloth.pipeline/drop-columns [:embarked]))

All three create the same pipeline operation and can be used to make a pipeline:

(mm/pipeline ops-1)
#object[clojure.core$partial$fn__5929 0x7b5ac27c "clojure.core$partial$fn__5929@7b5ac27c"]
(mm/pipeline ops-2)
#object[clojure.core$partial$fn__5929 0x4e1adf68 "clojure.core$partial$fn__5929@4e1adf68"]
(mm/pipeline ops-3)
#object[clojure.core$partial$fn__5929 0x4b0d4fa2 "clojure.core$partial$fn__5929@4b0d4fa2"]

All three can be called as functions taking a dataset wrapped in a ctx map.

Pipelines can also be specified as data:

(def op-spec [[ml/model {:model-type :metamorph.ml/dummy-classifier}]])
(mm/->pipeline op-spec)
#object[clojure.core$partial$fn__5929 0x47a9a54a "clojure.core$partial$fn__5929@47a9a54a"]
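To illustrate the idea of pipelines as data, here is a plain-Clojure sketch. It assumes each spec entry is a vector of an operation constructor followed by its arguments, as in op-spec above. toy->pipeline, select-keys-op and add-constant-op are hypothetical. This sketch does not describe everything mm/->pipeline accepts.

```clojure
;; Plain-Clojure sketch of "pipeline as data" (not the implementation of mm/->pipeline).
(defn toy->pipeline
  "Hypothetical: build a ctx->ctx pipeline from [[op-constructor & args] ...]."
  [op-specs]
  (let [ops (mapv (fn [[ctor & args]] (apply ctor args)) op-specs)]
    (fn [ctx] (reduce (fn [acc op] (op acc)) ctx ops))))

(defn select-keys-op [ks]
  (fn [ctx] (update ctx :metamorph/data (fn [rows] (mapv #(select-keys % ks) rows)))))

(defn add-constant-op [k v]
  (fn [ctx] (update ctx :metamorph/data (fn [rows] (mapv #(assoc % k v) rows)))))

;; The spec is an ordinary vector, so it can be generated or edited like any other data.
(def toy-op-spec [[select-keys-op [:sex]]
                  [add-constant-op :bias 1.0]])

((toy->pipeline toy-op-spec) {:metamorph/data [{:sex 0.0 :pclass 3.0}]})
;; => {:metamorph/data [{:sex 0.0, :bias 1.0}]}
```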

Creating these functions does not execute anything yet. They are functions that can be executed against a context as part of a metamorph pipeline. Execution is triggered like this:

(ops-1 {:metamorph/data titanic})

{

:metamorph/data

_unnamed [889 3]:

:sex :pclass :survived
0.0 3.0 0.0
1.0 1.0 1.0
1.0 3.0 1.0
1.0 1.0 1.0
0.0 3.0 0.0
0.0 3.0 0.0
0.0 1.0 0.0
0.0 3.0 0.0
1.0 3.0 1.0
1.0 2.0 1.0
... ... ...
1.0 2.0 1.0
0.0 3.0 0.0
1.0 3.0 0.0
0.0 2.0 0.0
0.0 3.0 0.0
1.0 3.0 0.0
0.0 2.0 0.0
1.0 1.0 1.0
1.0 3.0 0.0
0.0 1.0 1.0
0.0 3.0 0.0

}

(ops-2 {:metamorph/data titanic})

{

:metamorph/data

_unnamed [889 3]:

:sex :pclass :survived
0.0 3.0 0.0
1.0 1.0 1.0
1.0 3.0 1.0
1.0 1.0 1.0
0.0 3.0 0.0
0.0 3.0 0.0
0.0 1.0 0.0
0.0 3.0 0.0
1.0 3.0 1.0
1.0 2.0 1.0
... ... ...
1.0 2.0 1.0
0.0 3.0 0.0
1.0 3.0 0.0
0.0 2.0 0.0
0.0 3.0 0.0
1.0 3.0 0.0
0.0 2.0 0.0
1.0 1.0 1.0
1.0 3.0 0.0
0.0 1.0 1.0
0.0 3.0 0.0

}

(ops-3 {:metamorph/data titanic})

{

:metamorph/data

_unnamed [889 3]:

:sex :pclass :survived
0.0 3.0 0.0
1.0 1.0 1.0
1.0 3.0 1.0
1.0 1.0 1.0
0.0 3.0 0.0
0.0 3.0 0.0
0.0 1.0 0.0
0.0 3.0 0.0
1.0 3.0 1.0
1.0 2.0 1.0
... ... ...
1.0 2.0 1.0
0.0 3.0 0.0
1.0 3.0 0.0
0.0 2.0 0.0
0.0 3.0 0.0
1.0 3.0 0.0
0.0 2.0 0.0
1.0 1.0 1.0
1.0 3.0 0.0
0.0 1.0 1.0
0.0 3.0 0.0

}

The mm/lift function transforms any dataset->dataset function into a ctx->ctx function that follows the metamorph convention, as required for metamorph pipeline operations.
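What lifting does can be sketched in plain Clojure. The hypothetical toy-lift reads the dataset from :metamorph/data, applies an ordinary dataset->dataset function with the extra arguments, and stores the result back. It behaves the same in both modes. drop-cols is a made-up stand-in for tc/drop-columns that works on vectors of row maps.

```clojure
;; Plain-Clojure sketch of what lifting means (not the source of mm/lift).
(defn toy-lift
  "Hypothetical: turn (f dataset & args) into a ctx->ctx op."
  [f & args]
  (fn [ctx]
    (assoc ctx :metamorph/data (apply f (:metamorph/data ctx) args))))

;; An ordinary dataset->dataset function, with a dataset as a vector of row maps.
(defn drop-cols [rows cols]
  (mapv #(apply dissoc % cols) rows))

(def drop-embarked-op (toy-lift drop-cols [:embarked]))

(drop-embarked-op {:metamorph/data [{:sex 0.0 :embarked 2.0}] :metamorph/mode :fit})
;; => {:metamorph/data [{:sex 0.0}], :metamorph/mode :fit}
```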

For convenience, tablecloth contains a namespace (tablecloth.pipeline) in which all dataset->dataset functions are lifted into ctx->ctx operations. They can therefore be added to pipelines directly, without using lift.

So a metamorph pipeline can encapsulate arbitrary transformations of a dataset in the two modes. Operations can be “stateless” (only transforming the dataset, such as drop-columns) or “stateful”. A stateful operation stores data in the ctx during :fit and can use it in :transform. In the pipeline above, the trained model is stored in this way.

This state is not stored globally but inside the context map, which keeps pipeline executions isolated from each other.
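Here is a minimal stateful operation, sketched in plain Clojure. The hypothetical center-op learns the mean of a column during :fit and stores it in the ctx. During :transform it subtracts that same training mean, so new data is shifted using statistics learned from the training data only. The state key ::age-mean is passed in explicitly here for simplicity.

```clojure
;; Plain-Clojure sketch of a stateful op; name and state key are hypothetical.
(defn center-op
  "Subtract the mean of column k, where the mean is learned only at :fit."
  [k state-key]
  (fn [{:metamorph/keys [data mode] :as ctx}]
    (let [ctx  (if (= mode :fit)
                 ;; learn: store the training mean in the ctx
                 (assoc ctx state-key (/ (reduce + (map k data)) (count data)))
                 ctx)
          mean (get ctx state-key)]
      ;; apply: in both modes, always with the stored training mean
      (assoc ctx :metamorph/data (mapv #(update % k - mean) data)))))

(def center-age (center-op :age ::age-mean))

(def centered-train
  (center-age {:metamorph/data [{:age 20} {:age 40}] :metamorph/mode :fit}))

(def centered-test
  (center-age (assoc centered-train
                     :metamorph/data [{:age 35}]
                     :metamorph/mode :transform)))

(:metamorph/data centered-train) ;; => [{:age -10} {:age 10}]
(:metamorph/data centered-test)  ;; => [{:age 5}]
```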

So now we can add more operations to the pipeline, for example dropping columns, and nothing else changes.

Most metamorph-compliant operations behave the same in :fit and :transform, but some behave differently. They have a certain notion of “fit” and “transform” that determines how their behavior changes between these two modes.

They are therefore called “transformers” and are listed in the “Transformer reference” at the end of the Noj book.

Some transformers also exist as models and can be used with the function ml/model.

14.4 Automatic ML with metamorph.ml

AutoML support in metamorph.ml consists of the ability to create an arbitrary number of different pipelines and run them against arbitrary train/test splits. metamorph.ml then automatically chooses the best model, evaluated by a user-provided metric function.
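The selection idea can be sketched in plain Clojure. This is an illustration, not ml/evaluate-pipelines. The hypothetical best-pipeline fits every pipeline on the :train part of every split and scores its predictions on the :test part with a metric function. It then averages the scores over the splits and keeps the pipeline with the highest mean score (higher is better, as for accuracy). The toy splits and the two constant pipelines are made up.

```clojure
;; Plain-Clojure sketch of the AutoML selection idea; not ml/evaluate-pipelines.
(defn toy-accuracy [truth predicted]
  (/ (count (filter true? (map = truth predicted)))
     (double (count truth))))

(defn mean-score
  "Fit pipe-fn on each split's :train rows, predict on its :test rows,
  score with metric-fn and average over all splits."
  [pipe-fn splits target metric-fn]
  (let [scores (for [{:keys [train test]} splits]
                 (let [fitted    (pipe-fn {:metamorph/data train :metamorph/mode :fit})
                       ;; hide the target column from the pipeline at :transform
                       predicted (pipe-fn (assoc fitted
                                                 :metamorph/data (mapv #(dissoc % target) test)
                                                 :metamorph/mode :transform))]
                   (metric-fn (mapv target test)
                              (mapv target (:metamorph/data predicted)))))]
    (/ (reduce + scores) (count scores))))

(defn best-pipeline
  "Evaluate all pipelines and keep the one with the highest mean score
  (higher is better, as for accuracy)."
  [pipe-fns splits target metric-fn]
  (->> pipe-fns
       (map (fn [p] {:pipe-fn p :mean-score (mean-score p splits target metric-fn)}))
       (apply max-key :mean-score)))

;; Two trivial candidate pipelines: always predict 0.0 or always predict 1.0.
(defn constant-model [label]
  (fn [{:metamorph/keys [data mode] :as ctx}]
    (if (= mode :transform)
      (assoc ctx :metamorph/data (mapv #(assoc % :survived label) data))
      ctx)))

(def always-0 (constant-model 0.0))
(def always-1 (constant-model 1.0))

(def toy-splits
  [{:train [{:survived 0.0}] :test [{:survived 0.0} {:survived 1.0} {:survived 0.0}]}
   {:train [{:survived 1.0}] :test [{:survived 0.0} {:survived 0.0}]}])

;; always-0 wins, with a mean accuracy of about 0.83
(:mean-score (best-pipeline [always-1 always-0] toy-splits :survived toy-accuracy))
```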

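A helper function for later use. It turns the evaluation results into a dataset with the model options, the used features and the mean accuracy of each evaluated pipeline: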

(defn make-results-ds [evaluation-results]
  (->> evaluation-results
       flatten
       (map #(hash-map :options (-> % :test-transform :ctx :model :options)
                       :used-features (-> % :fit-ctx :used-features)
                       :mean-accuracy (-> % :test-transform :mean)))
       tc/dataset))
(require '[scicloj.metamorph.ml :as ml]
         '[scicloj.metamorph.ml.loss :as loss]
         '[scicloj.metamorph.core :as mm]
         '[scicloj.ml.tribuo]
         '[scicloj.ml.xgboost]
         '[scicloj.ml.smile.classification]
         '[scicloj.sklearn-clj.ml])
Output (stderr):
Boxed math warning, scicloj/ml/xgboost/csr.clj:17:32 - call: public static boolean clojure.lang.Numbers.lte(java.lang.Object,long).
Boxed math warning, scicloj/ml/xgboost/csr.clj:18:60 - call: public static java.lang.Number clojure.lang.Numbers.unchecked_dec(java.lang.Object).
Output (stdout):
Register model:  :xgboost/regression
Register model:  :xgboost/classification
Register model:  :xgboost/logistic-binary-raw-classification
Register model:  :xgboost/linear-regression
Register model:  :xgboost/gpu-binary-logistic-raw-classification
Register model:  :xgboost/gpu-linear-regression
Register model:  :xgboost/count-poisson
Register model:  :xgboost/survival-cox
Register model:  :xgboost/gpu-logistic-regression
Register model:  :xgboost/tweedie-regression
Register model:  :xgboost/squared-error-regression
Register model:  :xgboost/multiclass-softprob
Register model:  :xgboost/logistic-binary-classification
Register model:  :xgboost/gamma-regression
Register model:  :xgboost/rank-map
Register model:  :xgboost/multiclass-softmax
Register model:  :xgboost/rank-pairwise
Register model:  :xgboost/gpu-binary-logistic-classification
Register model:  :xgboost/logistic-regression
Register model:  :xgboost/binary-hinge-loss
Register model:  :xgboost/rank-ndcg
Register model:  :smile.classification/linear-discriminant-analysis
Register model:  :smile.classification/fld
Register model:  :smile.classification/random-forest
Register model:  :smile.classification/ada-boost
Register model:  :smile.classification/knn
Register model:  :smile.classification/decision-tree
Register model:  :smile.classification/gradient-tree-boost
Register model:  :smile.classification/regularized-discriminant-analysis
Register model:  :smile.classification/quadratic-discriminant-analysis
Register model:  :smile.classification/logistic-regression
Register model:  :smile.classification/svm
Register model:  :smile.classification/maxent-multinomial
Register model:  :smile.classification/maxent-binomial
Register model:  :smile.classification/mlp
Register model:  :smile.classification/discrete-naive-bayes
Register model:  :smile.classification/sparse-svm
Register model:  :smile.classification/sparse-logistic-regression
'sklearn' version found:  1.5.2
Register model:  :sklearn.regression/ard-regression
Register model:  :sklearn.regression/ada-boost-regressor
Register model:  :sklearn.regression/bagging-regressor
Register model:  :sklearn.regression/bayesian-ridge
Register model:  :sklearn.regression/cca
Register model:  :sklearn.regression/decision-tree-regressor
Register model:  :sklearn.regression/dummy-regressor
Register model:  :sklearn.regression/elastic-net
Register model:  :sklearn.regression/elastic-net-cv
Register model:  :sklearn.regression/extra-tree-regressor
Register model:  :sklearn.regression/extra-trees-regressor
Register model:  :sklearn.regression/gamma-regressor
Register model:  :sklearn.regression/gaussian-process-regressor
Register model:  :sklearn.regression/gradient-boosting-regressor
Register model:  :sklearn.regression/hist-gradient-boosting-regressor
Register model:  :sklearn.regression/huber-regressor
Register model:  :sklearn.regression/isotonic-regression
Register model:  :sklearn.regression/k-neighbors-regressor
Register model:  :sklearn.regression/kernel-ridge
Register model:  :sklearn.regression/lars
Register model:  :sklearn.regression/lars-cv
Register model:  :sklearn.regression/lasso
Register model:  :sklearn.regression/lasso-cv
Register model:  :sklearn.regression/lasso-lars
Register model:  :sklearn.regression/lasso-lars-cv
Register model:  :sklearn.regression/lasso-lars-ic
Register model:  :sklearn.regression/linear-regression
Register model:  :sklearn.regression/linear-svr
Register model:  :sklearn.regression/mlp-regressor
Register model:  :sklearn.regression/multi-task-elastic-net
Register model:  :sklearn.regression/multi-task-elastic-net-cv
Register model:  :sklearn.regression/multi-task-lasso
Register model:  :sklearn.regression/multi-task-lasso-cv
Register model:  :sklearn.regression/nu-svr
Register model:  :sklearn.regression/orthogonal-matching-pursuit
Register model:  :sklearn.regression/orthogonal-matching-pursuit-cv
Register model:  :sklearn.regression/pls-canonical
Register model:  :sklearn.regression/pls-regression
Register model:  :sklearn.regression/passive-aggressive-regressor
Register model:  :sklearn.regression/poisson-regressor
Register model:  :sklearn.regression/quantile-regressor
Register model:  :sklearn.regression/ransac-regressor
Register model:  :sklearn.regression/radius-neighbors-regressor
Register model:  :sklearn.regression/random-forest-regressor
Register model:  :sklearn.regression/ridge
Register model:  :sklearn.regression/ridge-cv
Register model:  :sklearn.regression/sgd-regressor
Register model:  :sklearn.regression/svr
Register model:  :sklearn.regression/theil-sen-regressor
Register model:  :sklearn.regression/transformed-target-regressor
Register model:  :sklearn.regression/tweedie-regressor
Register model:  :sklearn.classification/ada-boost-classifier
Register model:  :sklearn.classification/bagging-classifier
Register model:  :sklearn.classification/bernoulli-nb
Register model:  :sklearn.classification/calibrated-classifier-cv
Register model:  :sklearn.classification/categorical-nb
Register model:  :sklearn.classification/complement-nb
Register model:  :sklearn.classification/decision-tree-classifier
Register model:  :sklearn.classification/dummy-classifier
Register model:  :sklearn.classification/extra-tree-classifier
Register model:  :sklearn.classification/extra-trees-classifier
Register model:  :sklearn.classification/gaussian-nb
Register model:  :sklearn.classification/gaussian-process-classifier
Register model:  :sklearn.classification/gradient-boosting-classifier
Register model:  :sklearn.classification/hist-gradient-boosting-classifier
Register model:  :sklearn.classification/k-neighbors-classifier
Register model:  :sklearn.classification/label-propagation
Register model:  :sklearn.classification/label-spreading
Register model:  :sklearn.classification/linear-discriminant-analysis
Register model:  :sklearn.classification/linear-svc
Register model:  :sklearn.classification/logistic-regression
Register model:  :sklearn.classification/logistic-regression-cv
Register model:  :sklearn.classification/mlp-classifier
Register model:  :sklearn.classification/multinomial-nb
Register model:  :sklearn.classification/nearest-centroid
Register model:  :sklearn.classification/nu-svc
Register model:  :sklearn.classification/passive-aggressive-classifier
Register model:  :sklearn.classification/perceptron
Register model:  :sklearn.classification/quadratic-discriminant-analysis
Register model:  :sklearn.classification/radius-neighbors-classifier
Register model:  :sklearn.classification/random-forest-classifier
Register model:  :sklearn.classification/ridge-classifier
Register model:  :sklearn.classification/ridge-classifier-cv
Register model:  :sklearn.classification/sgd-classifier
Register model:  :sklearn.classification/svc

14.5 Finding the best model automatically

The advantage of pipelines becomes even more visible when we want configurable pipelines and a grid search to find optimal settings.

The following will find the best model across:

  • 17 model specifications from Smile, Tribuo, XGBoost, scikit-learn and metamorph.ml, including 10 hyperparameter variants of logistic regression

  • 6 different selections of used features

  • k-fold cross-validation of each combination over different train/test splits

(defn make-pipe-fn [model-spec features]
  (mm/pipeline
   ;; store the used features in ctx, so we can retrieve them at the end
   (fn [ctx]
     (assoc ctx :used-features features))
   (mm/lift tc/select-columns (conj features :survived))
   {:metamorph/id :model} (ml/model model-spec)))

Create a 5-fold cross-validation split of the data:

(def titanic-k-fold (tc/split->seq ml-basic/numeric-titanic-data :kfold {:seed 12345}))
(-> titanic-k-fold count)
5
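What a k-fold split does can be sketched in plain Clojure. The hypothetical toy-kfold shuffles the row indices with a seeded java.util.Random and deals them round-robin into k folds. It returns one {:train .. :test ..} pair per fold, so each row appears in exactly one test set. This is not tablecloth's implementation, and its fold assignment will differ from tc/split->seq even for the same seed.

```clojure
;; Plain-Clojure sketch of k-fold splitting; not tablecloth's implementation.
(defn toy-kfold
  "Hypothetical: return k {:train .. :test ..} maps; every row is in exactly one :test."
  [rows k seed]
  (let [;; shuffle row indices reproducibly with a seeded RNG
        idx (vec (doto (java.util.ArrayList. ^java.util.Collection (range (count rows)))
                   (java.util.Collections/shuffle (java.util.Random. (long seed)))))]
    (for [fold (range k)]
      ;; fold j takes shuffled positions j, j+k, j+2k, ...
      (let [test-idx (set (take-nth k (drop fold idx)))]
        {:train (vec (keep-indexed (fn [i row] (when-not (test-idx i) row)) rows))
         :test  (vec (keep-indexed (fn [i row] (when (test-idx i) row)) rows))}))))

(def toy-rows (mapv (fn [i] {:id i}) (range 10)))
(def toy-folds (toy-kfold toy-rows 5 12345))

(count toy-folds)                  ;; => 5
(map (comp count :test) toy-folds) ;; => (2 2 2 2 2)
```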

We also add 10 hyperparameter variants of logistic regression, obtained via a Sobol search over the model's hyperparameter space.

(def hyper-params 
  (->>
   (ml/hyperparameters :smile.classification/logistic-regression)
   (gs/sobol-gridsearch)
   (take 10)))
hyper-params
({:lambda 51.72462068965517,
  :tolerance 0.05263157942105263,
  :max-iterations 5310}
 {:lambda 75.86231034482758,
  :tolerance 0.026315790210526314,
  :max-iterations 2705}
 {:lambda 24.138689655172413,
  :tolerance 0.07368421078947368,
  :max-iterations 7394}
 {:lambda 37.931655172413784,
  :tolerance 0.03684210589473684,
  :max-iterations 6352}
 {:lambda 86.20703448275862,
  :tolerance 0.08947368431578948,
  :max-iterations 1142}
 {:lambda 62.06934482758621,
  :tolerance 0.010526316684210526,
  :max-iterations 8957}
 {:lambda 13.793965517241379,
  :tolerance 0.06315789510526315,
  :max-iterations 3747}
 {:lambda 17.242206896551725,
  :tolerance 0.031578948052631575,
  :max-iterations 9478}
 {:lambda 68.9658275862069,
  :tolerance 0.07894736863157896,
  :max-iterations 4268}
 {:lambda 93.10351724137931,
  :tolerance 0.0052631588421052635,
  :max-iterations 6873})
(def logistic-regression-specs
  (map
   #(assoc %
           :model-type :smile.classification/logistic-regression)
   hyper-params))
logistic-regression-specs
({:lambda 51.72462068965517,
  :tolerance 0.05263157942105263,
  :max-iterations 5310,
  :model-type :smile.classification/logistic-regression}
 {:lambda 75.86231034482758,
  :tolerance 0.026315790210526314,
  :max-iterations 2705,
  :model-type :smile.classification/logistic-regression}
 {:lambda 24.138689655172413,
  :tolerance 0.07368421078947368,
  :max-iterations 7394,
  :model-type :smile.classification/logistic-regression}
 {:lambda 37.931655172413784,
  :tolerance 0.03684210589473684,
  :max-iterations 6352,
  :model-type :smile.classification/logistic-regression}
 {:lambda 86.20703448275862,
  :tolerance 0.08947368431578948,
  :max-iterations 1142,
  :model-type :smile.classification/logistic-regression}
 {:lambda 62.06934482758621,
  :tolerance 0.010526316684210526,
  :max-iterations 8957,
  :model-type :smile.classification/logistic-regression}
 {:lambda 13.793965517241379,
  :tolerance 0.06315789510526315,
  :max-iterations 3747,
  :model-type :smile.classification/logistic-regression}
 {:lambda 17.242206896551725,
  :tolerance 0.031578948052631575,
  :max-iterations 9478,
  :model-type :smile.classification/logistic-regression}
 {:lambda 68.9658275862069,
  :tolerance 0.07894736863157896,
  :max-iterations 4268,
  :model-type :smile.classification/logistic-regression}
 {:lambda 93.10351724137931,
  :tolerance 0.0052631588421052635,
  :max-iterations 6873,
  :model-type :smile.classification/logistic-regression})
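For contrast with the Sobol sampling used above, here is a plain-Clojure sketch of an exhaustive grid, which is a different technique. The hypothetical toy-grid expands a map of parameter -> candidate values into every combination (the Cartesian product). Each combination is then tagged with a :model-type, just like logistic-regression-specs. The candidate values are made up.

```clojure
;; Plain-Clojure sketch of an exhaustive grid (a different technique than Sobol sampling).
(defn toy-grid
  "Hypothetical: every combination of {param [candidate-values ...]}."
  [param->values]
  (reduce (fn [combos [param values]]
            (for [combo combos
                  v     values]
              (assoc combo param v)))
          [{}]
          param->values))

(def toy-grid-specs
  (map #(assoc % :model-type :smile.classification/logistic-regression)
       ;; made-up candidate values, for illustration only
       (toy-grid {:lambda         [0.1 1.0 10.0]
                  :max-iterations [500 1000]})))

(count toy-grid-specs) ;; => 6
```

The number of grid points is the product of the number of candidates per parameter, so it grows quickly as hyperparameters are added. With a Sobol search, as above, the number of candidates is chosen directly, here via (take 10).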

The list of model specifications we want to try:

(def models-specs
  (concat logistic-regression-specs
          [;; Tribuo random forest: 500 CART trees combined by a voting combiner
           {:model-type :scicloj.ml.tribuo/classification
            :tribuo-components [{:name "cart"
                                 :type "org.tribuo.classification.dtree.CARTClassificationTrainer"
                                 :properties {:maxDepth "8"
                                              :useRandomSplitPoints "false"
                                              :fractionFeaturesInSplit "0.5"}}
                                {:name "combiner"
                                 :type "org.tribuo.classification.ensemble.VotingCombiner"}
                                {:name "random-forest"
                                 :type "org.tribuo.common.tree.RandomForestTrainer"
                                 :properties {:innerTrainer "cart"
                                              :combiner "combiner"
                                              :seed "1234"
                                              :numMembers "500"}}]
            :tribuo-trainer-name "random-forest"}
           {:model-type :xgboost/classification :round 10}
           {:model-type :sklearn.classification/decision-tree-classifier}
           {:model-type :sklearn.classification/logistic-regression}
           {:model-type :sklearn.classification/random-forest-classifier}
           {:model-type :metamorph.ml/dummy-classifier}
           ;; Tribuo logistic regression (SGD-based linear trainer)
           {:model-type :scicloj.ml.tribuo/classification
            :tribuo-components [{:name "logistic"
                                 :type "org.tribuo.classification.sgd.linear.LogisticRegressionTrainer"}]
            :tribuo-trainer-name "logistic"}]))

This uses models from Smile, Tribuo, XGBoost and scikit-learn, but any metamorph.ml-compliant model could be used.

The list of feature combinations to try for each model:

(def feature-combinations
  [[:sex :pclass :embarked]
   [:sex]
   [:pclass :embarked]
   [:embarked]
   [:sex :embarked]
   [:sex :pclass]])

Generate 102 pipeline functions (17 model specifications × 6 feature combinations):

(def pipe-fns
  (for [model-spec models-specs
        feature-combination feature-combinations]
    (make-pipe-fn model-spec feature-combination)))
(count pipe-fns)
102

Execute all pipelines on every split of the cross-validation and return the best model by classification accuracy:

(add-tap println)
nil
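The (add-tap println) call registers println as a tap: every value passed to tap> is handed to it. evaluate-pipelines appears to report its progress this way, which would explain the progress maps printed for the call below. Taps are invoked asynchronously on a separate thread, presumably why the notebook shows that output as "THREAD OUT". A minimal sketch with clojure.core only; capture-tap and the message map are made up for illustration:

```clojure
;; tap> queues a value; a background thread hands it to every registered tap.
(def received (promise))

(defn capture-tap [x]
  (deliver received x))

(add-tap capture-tap)

;; A message shaped like the progress maps (made up, not sent by metamorph.ml).
(tap> {:topic :pppmap-progress :index 0})

;; Delivery is asynchronous, so wait up to one second for it.
(def first-message (deref received 1000 :timed-out))

;; Unregister the tap; (remove-tap println) would likewise silence the
;; progress output enabled above.
(remove-tap capture-tap)

first-message
;; => {:topic :pppmap-progress, :index 0}
```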
(def evaluation-results
  (ml/evaluate-pipelines
   pipe-fns
   titanic-k-fold
   loss/classification-accuracy
   :accuracy))
THREAD OUT (abridged; one progress message per pipeline, :index 0 to 101):
{:ns #object[clojure.lang.Namespace 0x726b74f0 noj-book.automl], :topic :pppmap-progress, :label map: evaluate pipelines, :index 0, :size 101}
{:ns #object[clojure.lang.Namespace 0x726b74f0 noj-book.automl], :topic :pppmap-progress, :label map: evaluate pipelines, :index 1, :size 101}
...
{:ns #object[clojure.lang.Namespace 0x726b74f0 noj-book.automl], :topic :pppmap-progress, :label map: evaluate pipelines, :index 101, :size 101}

By default, evaluate-pipelines returns only the best model:

(make-results-ds evaluation-results)

_unnamed [1 3]:

:used-features :mean-accuracy :options
[:sex :pclass :embarked] 0.81107726 {:model-type :sklearn.classification/random-forest-classifier}
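Conceptually, evaluate-pipelines fits every pipeline on the training part of every fold, scores it on the held-out part, averages the scores and keeps the best. The sketch below is a heavily simplified, hypothetical re-implementation of that idea in plain Clojure, not metamorph.ml's actual code: a "pipeline" is just a function from training rows to a predict function, and the two pipelines, the 12-row dataset and the split by row index are all made up for illustration.

```clojure
;; Hypothetical, simplified model of "evaluate pipelines with k-fold CV".
;; NOT metamorph.ml's implementation.

(defn k-folds
  "Splits rows into k {:train :test} pairs by row index modulo k."
  [k rows]
  (for [i (range k)]
    {:test  (keep-indexed (fn [j r] (when (= i (mod j k)) r)) rows)
     :train (keep-indexed (fn [j r] (when (not= i (mod j k)) r)) rows)}))

(defn majority
  "Most frequent value in labels."
  [labels]
  (key (apply max-key val (frequencies labels))))

(defn majority-pipe
  "Ignores all features and always predicts the majority label."
  [train]
  (let [label (majority (map :survived train))]
    (fn [_row] label)))

(defn by-sex-pipe
  "Predicts the majority label of the passenger's :sex group."
  [train]
  (let [rule (into {} (for [[sex rs] (group-by :sex train)]
                        [sex (majority (map :survived rs))]))]
    (fn [row] (get rule (:sex row)))))

(defn accuracy
  "Fraction of rows whose prediction equals the :survived label."
  [predict rows]
  (/ (count (filter #(= (predict %) (:survived %)) rows))
     (double (count rows))))

(defn evaluate-all
  "Mean accuracy over all folds, for every named pipeline."
  [pipelines folds]
  (for [[pipe-name pipe] pipelines]
    {:pipeline pipe-name
     :mean-accuracy (/ (reduce + (for [{:keys [train test]} folds]
                                   (accuracy (pipe train) test)))
                       (count folds))}))

;; Toy data, made up for illustration.
(def rows
  (mapv (fn [[sex survived]] {:sex sex :survived survived})
        [[:f 1] [:f 1] [:f 1] [:f 1] [:f 0] [:f 1]
         [:m 0] [:m 0] [:m 0] [:m 0] [:m 0] [:m 1]]))

(def results
  (evaluate-all {:majority majority-pipe :by-sex by-sex-pipe}
                (k-folds 2 rows)))

;; Keep the pipeline with the highest mean accuracy.
(apply max-key :mean-accuracy results)
;; => {:pipeline :by-sex, :mean-accuracy 0.8333333333333334}
```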

The key observation here is that metamorph pipelines let us grid-search not only over model hyperparameters but also over arbitrary pipeline variations, such as which features to include. Both are handled in the same way.
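To make this concrete: once a variation is expressed as data, feature sets and hyperparameters can be enumerated by the same code. A minimal sketch; expand-grid and the :max-depth option are hypothetical names chosen for illustration, and each resulting map would still need to be turned into a pipeline function, as make-pipe-fn does for a model spec and a feature combination.

```clojure
;; Hypothetical helper: expand a map of option -> candidate values into
;; every combination. Feature sets and hyperparameters are just keys here.
(defn expand-grid [options]
  (reduce (fn [combos [k vs]]
            (for [combo combos, v vs]
              (assoc combo k v)))
          [{}]
          options))

(expand-grid {:features  [[:sex] [:sex :pclass]]
              :max-depth [4 8]})
;; => ({:features [:sex], :max-depth 4} {:features [:sex], :max-depth 8}
;;     {:features [:sex :pclass], :max-depth 4} {:features [:sex :pclass], :max-depth 8})
```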

We can also get all results by setting :return-best-crossvalidation-only and :return-best-pipeline-only to false:

(def evaluation-results-all
  (ml/evaluate-pipelines
   pipe-fns
   titanic-k-fold
   loss/classification-accuracy
   :accuracy
   {:map-fn :map
    :return-best-crossvalidation-only false
    :return-best-pipeline-only false}))

In total, it creates and evaluates 17 model specifications (including hyperparameter variations) × 6 feature combinations × 5 cross-validation folds = 510 models:

(-> evaluation-results-all flatten count)
510
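Why does flatten followed by count give 510? Assuming the results are nested as one collection per pipeline holding one result map per fold, flatten only walks into sequential collections and treats each map as a single leaf. A toy structure of the same shape, with placeholder maps, reproduces the count:

```clojure
;; Hypothetical nested result shape: 102 pipelines x 5 folds.
;; The maps are placeholders, not real evaluation results.
(def nested-results
  (vec (for [pipeline (range 102)]
         (vec (for [fold (range 5)]
                {:pipeline pipeline :fold fold})))))

;; Maps are not sequential, so flatten keeps each one as a single element.
(-> nested-results flatten count)
;; => 510
```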

We can also find the best model by hand: it is the first row when the results are sorted by mean accuracy in descending order.
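Outside tablecloth, the same "sort, then take the first" idea works on any sequence of result maps. A minimal sketch with made-up pipelines and scores: clojure.core/sort-by is stable, so in this sketch pipelines with tied scores keep their input order. That matters here, because several pipelines in the results tie at 0.81107726, so which of them counts as "best" is a matter of ordering.

```clojure
;; Made-up result maps; :b appears twice to mimic duplicate rows.
(def toy-results
  [{:pipeline :a :mean-accuracy 0.79}
   {:pipeline :b :mean-accuracy 0.81}
   {:pipeline :c :mean-accuracy 0.81}
   {:pipeline :b :mean-accuracy 0.81}])

(def best-toy
  (->> toy-results
       distinct                     ; drop exact duplicates
       (sort-by :mean-accuracy >)   ; highest accuracy first, stable on ties
       first))

best-toy
;; => {:pipeline :b, :mean-accuracy 0.81}
```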

(-> (make-results-ds evaluation-results-all)
    (tc/unique-by)
    (tc/order-by [:mean-accuracy] :desc)
    (tc/head 20)
    (kind/dataset))

_unnamed [20 3]:

:used-features :mean-accuracy :options
[:sex :pclass :embarked] 0.81107726 {:model-type :sklearn.classification/random-forest-classifier}
[:sex :pclass :embarked] 0.81107726 {:model-type :sklearn.classification/decision-tree-classifier}
[:sex :pclass :embarked] 0.81107726 {:model-type :xgboost/classification, :round 10}
[:sex :pclass :embarked] 0.81107726 {:model-type :scicloj.ml.tribuo/classification,
:tribuo-components
[{:name cart,
:type org.tribuo.classification.dtree.CARTClassificationTrainer,
:properties
{:maxDepth 8,
:useRandomSplitPoints false,
:fractionFeaturesInSplit 0.5}}
{:name combiner,
:type org.tribuo.classification.ensemble.VotingCombiner}
{:name random-forest,
:type org.tribuo.common.tree.RandomForestTrainer,
:properties
{:innerTrainer cart,
:combiner combiner,
:seed 1234,
:numMembers 500}}],
:tribuo-trainer-name random-forest}
[:sex :pclass :embarked] 0.80206945 {:lambda 17.242206896551725,
:tolerance 0.031578948052631575,
:max-iterations 9478,
:model-type :smile.classification/logistic-regression}
[:sex :pclass :embarked] 0.80094585 {:lambda 24.138689655172413,
:tolerance 0.07368421078947368,
:max-iterations 7394,
:model-type :smile.classification/logistic-regression}
[:sex :pclass :embarked] 0.79083349 {:lambda 13.793965517241379,
:tolerance 0.06315789510526315,
:max-iterations 3747,
:model-type :smile.classification/logistic-regression}
[:sex :pclass :embarked] 0.78969720 {:lambda 37.931655172413784,
:tolerance 0.03684210589473684,
:max-iterations 6352,
:model-type :smile.classification/logistic-regression}
[:sex :pclass] 0.78633911 {:lambda 51.72462068965517,
:tolerance 0.05263157942105263,
:max-iterations 5310,
:model-type :smile.classification/logistic-regression}
[:sex :pclass] 0.78633911 {:lambda 75.86231034482758,
:tolerance 0.026315790210526314,
:max-iterations 2705,
:model-type :smile.classification/logistic-regression}
[:sex :pclass] 0.78633911 {:lambda 37.931655172413784,
:tolerance 0.03684210589473684,
:max-iterations 6352,
:model-type :smile.classification/logistic-regression}
[:sex :pclass] 0.78633911 {:lambda 62.06934482758621,
:tolerance 0.010526316684210526,
:max-iterations 8957,
:model-type :smile.classification/logistic-regression}
[:sex :pclass] 0.78633911 {:lambda 68.9658275862069,
:tolerance 0.07894736863157896,
:max-iterations 4268,
:model-type :smile.classification/logistic-regression}
[:sex :embarked] 0.78633276 {:model-type :scicloj.ml.tribuo/classification,
:tribuo-components
[{:name cart,
:type org.tribuo.classification.dtree.CARTClassificationTrainer,
:properties
{:maxDepth 8,
:useRandomSplitPoints false,
:fractionFeaturesInSplit 0.5}}
{:name combiner,
:type org.tribuo.classification.ensemble.VotingCombiner}
{:name random-forest,
:type org.tribuo.common.tree.RandomForestTrainer,
:properties
{:innerTrainer cart,
:combiner combiner,
:seed 1234,
:numMembers 500}}],
:tribuo-trainer-name random-forest}
[:sex] 0.78633276 {:model-type :scicloj.ml.tribuo/classification,
:tribuo-components
[{:name cart,
:type org.tribuo.classification.dtree.CARTClassificationTrainer,
:properties
{:maxDepth 8,
:useRandomSplitPoints false,
:fractionFeaturesInSplit 0.5}}
{:name combiner,
:type org.tribuo.classification.ensemble.VotingCombiner}
{:name random-forest,
:type org.tribuo.common.tree.RandomForestTrainer,
:properties
{:innerTrainer cart,
:combiner combiner,
:seed 1234,
:numMembers 500}}],
:tribuo-trainer-name random-forest}
[:sex] 0.78633276 {:lambda 68.9658275862069,
:tolerance 0.07894736863157896,
:max-iterations 4268,
:model-type :smile.classification/logistic-regression}
[:sex :embarked] 0.78633276 {:lambda 17.242206896551725,
:tolerance 0.031578948052631575,
:max-iterations 9478,
:model-type :smile.classification/logistic-regression}
[:sex] 0.78633276 {:lambda 17.242206896551725,
:tolerance 0.031578948052631575,
:max-iterations 9478,
:model-type :smile.classification/logistic-regression}
[:sex :pclass] 0.78633276 {:lambda 13.793965517241379,
:tolerance 0.06315789510526315,
:max-iterations 3747,
:model-type :smile.classification/logistic-regression}
[:sex :embarked] 0.78633276 {:lambda 13.793965517241379,
:tolerance 0.06315789510526315,
:max-iterations 3747,
:model-type :smile.classification/logistic-regression}

14.6 Best practices for data transformation steps: inside or outside the pipeline

(require '[scicloj.metamorph.ml.toydata :as data]
         '[tech.v3.dataset.modelling :as ds-mod]
         '[tech.v3.dataset.categorical :as ds-cat]
         '[tech.v3.dataset :as ds])

We have seen that there are two ways to transform the input data: outside the pipeline and inside the pipeline.

These are all the steps, from raw data to model input, for the Titanic use case:

  1. raw data
(def titanic
  (:train
   (data/titanic-ds-split)))
  2. first transformation, outside the metamorph pipeline
(def relevant-titanic-data
  (-> titanic
      (tc/select-columns (conj ml-basic/categorical-feature-columns :survived))
      (tc/drop-missing)
      (ds/categorical->number [:sex :pclass :embarked] [0 1 2 "male" "female" "S" "Q" "C"] :float64)
      (ds/categorical->number [:survived] [0 1] :float64)
      (ds-mod/set-inference-target :survived)))
  3. transformation inside the metamorph pipeline
(defn make-pipe-fn [model-type features]
  (mm/pipeline
   ;; store the used features in ctx, so we can retrieve them at the end
   (fn [ctx]
     (assoc ctx :used-features features))
   (mm/lift tc/select-columns (conj features :survived))
   {:metamorph/id :model} (ml/model {:model-type model-type})))

It would be technically possible to move all steps of the “first transformation” into the pipeline by using the “lifted” form (mm/lift) of each transformation, and this should give the same result. I would nevertheless not do so.
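To make the “lifted” form concrete: a lifted step turns a data->data function into a ctx->ctx function that replaces `:metamorph/data`. The sketch below re-creates that idea in plain Clojure to show that the two placements give the same result. `toy-lift`, `toy-pipeline`, `select-cols` and `drop-missing` are hypothetical stand-ins written for illustration (they are not the metamorph or tablecloth implementations), and a vector of row maps replaces a real dataset.

```clojure
;; Hypothetical stand-ins, written only to illustrate the idea;
;; they are not the metamorph implementation.
(defn toy-lift
  "Wrap a data->data fn `f` (with extra args) into a ctx->ctx step
  that replaces :metamorph/data with (apply f data args)."
  [f & args]
  (fn [ctx]
    (update ctx :metamorph/data #(apply f % args))))

(defn toy-pipeline
  "Compose ctx->ctx steps, applied from left to right."
  [& steps]
  (fn [ctx]
    (reduce (fn [c step] (step c)) ctx steps)))

;; Toy "dataset": a vector of row maps instead of a real dataset.
(def raw-rows
  [{:name "a" :sex "male"   :pclass 3 :embarked "S" :survived 0}
   {:name "b" :sex "female" :pclass 1 :embarked nil :survived 1}
   {:name "c" :sex "female" :pclass 2 :embarked "C" :survived 1}])

(defn select-cols [rows cols] (mapv #(select-keys % cols) rows))
(defn drop-missing [rows] (filterv #(not-any? nil? (vals %)) rows))

(def cols [:sex :pclass :embarked :survived])

;; Fixed steps outside the pipeline: plain threading, run once.
(def prepared-outside
  (-> raw-rows
      (select-cols cols)
      drop-missing))

;; The same steps in lifted form inside a pipeline: run on every call.
(def prep-pipe
  (toy-pipeline
   (toy-lift select-cols cols)
   (toy-lift drop-missing)))

(def prepared-inside
  (:metamorph/data (prep-pipe {:metamorph/data raw-rows})))
```

Both values contain the same two rows; the difference is only in when, and how often, the steps are executed.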

It is often better to separate the “fixed” steps from the parameterized ones, i.e. those whose best values we want to find by trying them out.

In my view there are two reasons for this:
* Debugging: it is harder to debug a pipeline and inspect the results of individual steps. One macro helps with this: mm/def-ctx.
* Performance: the pipeline is executed many times, once for every split and every variant of the pipeline. Doing the data transformations only once, before the metamorph pipeline starts, should therefore be faster.
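The performance argument can be made tangible by counting how often a fixed step runs. In this sketch `prepare` is a hypothetical stand-in for a fixed transformation, an atom counts its calls, and the numbers of variants and splits are placeholders; `(count prepared)` merely stands in for fitting a model on the prepared data.

```clojure
;; Counts how often the fixed preparation step is executed.
(def prep-calls (atom 0))

(defn prepare
  "Stand-in for a fixed step such as drop-missing or categorical encoding."
  [rows]
  (swap! prep-calls inc)
  (filterv some? rows))

(def toy-rows [1 nil 2 3 nil 4])
(def n-variants 3) ; hypothetical number of pipeline variants
(def n-splits 5)   ; hypothetical number of train/test splits

;; Fixed step inside the pipeline: executed once per variant and split.
(reset! prep-calls 0)
(doseq [_variant (range n-variants)
        _split (range n-splits)]
  (prepare toy-rows))
(def calls-inside @prep-calls)

;; Fixed step outside the pipeline: executed once, the result is reused.
(reset! prep-calls 0)
(let [prepared (prepare toy-rows)]
  (doseq [_variant (range n-variants)
          _split (range n-splits)]
    ;; stands in for fitting a model on the prepared data
    (count prepared)))
(def calls-outside @prep-calls)
```

With these placeholder sizes the step runs 15 times inside the loop versus once outside it; with 102 pipelines and 5 folds, as earlier in this chapter, the gap grows accordingly.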

Nevertheless, in some scenarios it is very useful to build the full transformation as a metamorph pipeline. This allows, for example, performing very different transformation steps per model while still managing only a single seq of pipeline functions, so that each pipeline is fully self-contained.
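This works because every pipeline is, in the end, a single ctx->ctx function. The sketch below models that with hypothetical helpers (`make-step`, `make-pipeline`) and a plain vector in place of a dataset; it is not the metamorph API, only an illustration of why pipelines with completely different steps can share one evaluation loop.

```clojure
;; Hypothetical sketch: a pipeline is a ctx->ctx function that carries
;; its own preprocessing, so callers manage only one seq of pipelines.
(defn make-step
  "Turn a data->data fn into a ctx->ctx step acting on :data."
  [f]
  (fn [ctx] (update ctx :data f)))

(defn make-pipeline
  "Compose steps left to right into a single ctx->ctx function."
  [& steps]
  (fn [ctx] (reduce (fn [c step] (step c)) ctx steps)))

(def variant-pipelines
  [;; variant 1: double every value, then keep only positive ones
   (make-pipeline (make-step (fn [xs] (mapv #(* 2 %) xs)))
                  (make-step (fn [xs] (filterv pos? xs))))
   ;; variant 2: an entirely different preprocessing
   (make-pipeline (make-step (fn [xs] (mapv inc xs))))])

;; One uniform loop runs every variant, whatever steps it contains.
(def variant-results
  (mapv (fn [pipe] (:data (pipe {:data [-1 0 1 2]})))
        variant-pipelines))
;; => [[2 4] [0 1 2 3]]
```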

source: notebooks/noj_book/automl.clj