2 Machine learning - DRAFT
This is part of the Scicloj Clojure Data Scrapbook.
(ns ml
  (:require [scicloj.ml.core :as ml]
            [scicloj.ml.metamorph :as mm]
            [scicloj.ml.dataset :as ds :refer [dataset add-column]]
            [fastmath.stats]
            [tablecloth.api :as tc]
            [scicloj.noj.v1.datasets :as datasets]
            [scicloj.kindly.v4.kind :as kind]))

2.1 Linear regression
We will explore the Iris dataset:
(tc/head datasets/iris)

_unnamed [5 5]:
| :sepal-length | :sepal-width | :petal-length | :petal-width | :species |
|---|---|---|---|---|
| 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 5.0 | 3.6 | 1.4 | 0.2 | setosa |
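
The pipeline defined next uses :sepal-length as the target and drops :species, so the model being fitted can be written out explicitly. As a sketch (the symbols β and ε are notation introduced here, not names from the notebook):

$$
\text{sepal-length} = \beta_0 + \beta_1\,\text{sepal-width} + \beta_2\,\text{petal-length} + \beta_3\,\text{petal-width} + \varepsilon
$$

Ordinary least squares chooses the coefficients β that minimize the sum of squared residuals over the training rows.
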
A Metamorph pipeline for linear regression:
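(def additive-pipeline
  (ml/pipeline
   ;; predict :sepal-length from the remaining columns
   (mm/set-inference-target :sepal-length)
   ;; :species is categorical; leave it out of this model
   (mm/drop-columns [:species])
   ;; store the trained model under the :model key of the context
   {:metamorph/id :model}
   (mm/model {:model-type :smile.regression/ordinary-least-square})))

Training and evaluating the pipeline on a train/test split (here a single :holdout split):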
(def evaluations
  (ml/evaluate-pipelines
   [additive-pipeline]
   (ds/split->seq datasets/iris :holdout)
   ml/rmse   ; main metric
   :loss     ; RMSE is a loss: lower is better
   {:other-metrices [{:name :r2
                      :metric-fn fastmath.stats/r2-determination}]}))

Printing one of the trained models (note that the Smile regression model is recognized by Kindly and printed correctly):
(-> evaluations
    flatten
    first
    :fit-ctx
    :model
    ml/thaw-model)

Linear Model:
Residuals:
       Min          1Q      Median          3Q         Max
   -0.7326     -0.2096     -0.0182      0.1866      0.8517

Coefficients:
              Estimate Std. Error    t value   Pr(>|t|)
Intercept       1.7373     0.2932     5.9256     0.0000 ***
sepal-width     0.6949     0.0767     9.0631     0.0000 ***
petal-length    0.6458     0.0660     9.7786     0.0000 ***
petal-width    -0.3970     0.1481    -2.6810     0.0086 **
---------------------------------------------------------------------
Significance codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3194 on 96 degrees of freedom
Multiple R-squared: 0.8511, Adjusted R-squared: 0.8465
F-statistic: 182.9573 on 4 and 96 DF, p-value: 1.434e-39

source: projects/noj/notebooks/ml.clj
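
The printed coefficients define an explicit prediction rule, and the two metrics passed to ml/evaluate-pipelines have simple closed forms. The sketch below, in plain Clojure with no library dependencies, applies the printed coefficients to the five rows shown earlier and spells out RMSE, R² and adjusted R² by hand. The names coefficients, head-rows, predict-sepal-length, rmse, r-squared and adjusted-r-squared are introduced here for illustration only; they are not the implementations behind ml/rmse or fastmath.stats/r2-determination. The values n = 100 and p = 3 are inferred from the 96 residual degrees of freedom and are an assumption, not something the notebook states.

```clojure
;; Coefficients copied (rounded) from the printed model summary.
(def coefficients
  {:intercept    1.7373
   :sepal-width  0.6949
   :petal-length 0.6458
   :petal-width  -0.3970})

;; The five rows shown by (tc/head datasets/iris), without :species.
(def head-rows
  [{:sepal-length 5.1 :sepal-width 3.5 :petal-length 1.4 :petal-width 0.2}
   {:sepal-length 4.9 :sepal-width 3.0 :petal-length 1.4 :petal-width 0.2}
   {:sepal-length 4.7 :sepal-width 3.2 :petal-length 1.3 :petal-width 0.2}
   {:sepal-length 4.6 :sepal-width 3.1 :petal-length 1.5 :petal-width 0.2}
   {:sepal-length 5.0 :sepal-width 3.6 :petal-length 1.4 :petal-width 0.2}])

(defn predict-sepal-length
  "Applies the fitted linear equation to one row (a map of column -> value)."
  [row]
  (+ (:intercept coefficients)
     (* (:sepal-width coefficients) (:sepal-width row))
     (* (:petal-length coefficients) (:petal-length row))
     (* (:petal-width coefficients) (:petal-width row))))

(defn- sum-of-squares
  "Sum of squared elementwise differences between two sequences."
  [xs ys]
  (reduce + (map (fn [x y] (let [d (- x y)] (* d d))) xs ys)))

(defn rmse
  "Root mean squared error between actual and predicted values."
  [actual predicted]
  (Math/sqrt (/ (sum-of-squares actual predicted) (count actual))))

(defn r-squared
  "Coefficient of determination: 1 - SS_res / SS_tot."
  [actual predicted]
  (let [mean   (/ (reduce + actual) (count actual))
        ss-res (sum-of-squares actual predicted)
        ss-tot (sum-of-squares actual (repeat mean))]
    (- 1.0 (/ ss-res ss-tot))))

(defn adjusted-r-squared
  "Adjusts R^2 for n observations and p predictors (intercept excluded)."
  [r2 n p]
  (- 1.0 (* (- 1.0 r2) (/ (- n 1.0) (- n p 1.0)))))

;; Arithmetic illustration only: these five rows are not a proper test set.
(let [actual    (map :sepal-length head-rows)
      predicted (map predict-sepal-length head-rows)]
  {:predicted predicted
   :rmse      (rmse actual predicted)})

;; Assumed n = 100 training rows and p = 3 predictors.
(adjusted-r-squared 0.8511 100 3)
```

For the first row this gives a predicted sepal length of about 4.99 against an observed 5.1. The adjusted R² computed from the rounded inputs comes out near 0.846, which agrees with the printed 0.8465 up to the rounding of R².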