2 Machine learning - DRAFT
This is part of the Scicloj Clojure Data Tutorials.
(ns ml
  (:require [scicloj.ml.core :as ml]
            [scicloj.ml.metamorph :as mm]
            [scicloj.ml.dataset :refer [dataset add-column]]
            [scicloj.ml.dataset :as ds]
            [fastmath.stats]
            [tablecloth.api :as tc]
            [scicloj.noj.v1.datasets :as datasets]
            [scicloj.kindly.v4.kind :as kind]))
2.1 Linear regression
We will explore the Iris dataset:
(tc/head datasets/iris)
_unnamed [5 5]:
:sepal-length | :sepal-width | :petal-length | :petal-width | :species |
---|---|---|---|---|
5.1 | 3.5 | 1.4 | 0.2 | setosa |
4.9 | 3.0 | 1.4 | 0.2 | setosa |
4.7 | 3.2 | 1.3 | 0.2 | setosa |
4.6 | 3.1 | 1.5 | 0.2 | setosa |
5.0 | 3.6 | 1.4 | 0.2 | setosa |
A Metamorph pipeline for linear regression:
(def additive-pipeline
  (ml/pipeline
   (mm/set-inference-target :sepal-length)
   (mm/drop-columns [:species])
   {:metamorph/id :model}
   (mm/model {:model-type :smile.regression/ordinary-least-square})))
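To make the shape of such a pipeline more concrete, here is a toy model of the idea in plain Clojure. It is only a sketch and not the library's implementation: every `toy-` name is hypothetical, the "model" is just the mean of the target column, and the data is a vector of row maps rather than a dataset. It assumes that each step is a function from a context map to a context map, that a map step such as `{:metamorph/id :model}` is merged into the context, and that the same pipeline runs once in `:fit` mode and again in `:transform` mode.

```clojure
;; Toy model of a Metamorph-style pipeline (illustration only).
;; All toy- names are hypothetical and not part of scicloj.ml.

(defn toy-pipeline
  "Composes steps left to right. A function step receives and returns the
  context map; a map step is merged into the context."
  [& steps]
  (fn [ctx]
    (reduce (fn [ctx step]
              (if (map? step)
                (merge ctx step)
                (step ctx)))
            ctx
            steps)))

(defn toy-drop-columns
  "Step that removes the given keys from every row of :metamorph/data."
  [ks]
  (fn [ctx]
    (update ctx :metamorph/data
            (fn [rows] (mapv #(apply dissoc % ks) rows)))))

(defn toy-mean-model
  "Step that, in :fit mode, stores the mean of `target` under the current
  :metamorph/id. In :transform mode it adds that mean as :prediction."
  [target]
  (fn [{:metamorph/keys [data mode id] :as ctx}]
    (case mode
      :fit (assoc ctx id (/ (reduce + (map target data)) (count data)))
      :transform (assoc ctx :metamorph/data
                        (mapv #(assoc % :prediction (get ctx id)) data)))))

(def toy
  (toy-pipeline
   (toy-drop-columns [:species])
   {:metamorph/id :model}
   (toy-mean-model :sepal-length)))

;; Fit on two made-up rows: the fitted "model" lands under :model.
(def fitted
  (toy {:metamorph/data [{:sepal-length 5.0 :species "setosa"}
                         {:sepal-length 7.0 :species "virginica"}]
        :metamorph/mode :fit}))

;; Reuse the fitted context in :transform mode on new data.
(def predicted
  (toy (assoc fitted
              :metamorph/mode :transform
              :metamorph/data [{:sepal-length 6.5 :species "versicolor"}])))
```

The fitted model is stored under the `:model` key of the context, which mirrors why `:model` is later looked up inside `:fit-ctx`.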
Training and evaluating the pipeline on various subsets:
(def evaluations
  (ml/evaluate-pipelines
   [additive-pipeline]
   (ds/split->seq datasets/iris :holdout)
   ml/rmse
   :loss
   {:other-metrices [{:name :r2
                      :metric-fn fastmath.stats/r2-determination}]}))
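The evaluation uses two metrics: RMSE, treated as a loss (lower is better), and R² as an additional metric. As a reminder of what these numbers mean, here is a minimal sketch of both in plain Clojure. The helper names (`mean`, `squared-errors`, `rmse`, `r2`) and the sample vectors are made up for illustration; how `ml/rmse` and `fastmath.stats/r2-determination` compute their results internally is not shown here.

```clojure
;; Plain-Clojure sketch of the two metrics (hypothetical helper names).

(defn mean [xs]
  (/ (reduce + xs) (double (count xs))))

(defn squared-errors
  "Element-wise squared differences between two sequences."
  [actual predicted]
  (map (fn [a p] (let [d (- a p)] (* d d))) actual predicted))

(defn rmse
  "Root mean squared error between actual and predicted values."
  [actual predicted]
  (Math/sqrt (mean (squared-errors actual predicted))))

(defn r2
  "Coefficient of determination: 1 - SS_res / SS_tot."
  [actual predicted]
  (let [m      (mean actual)
        ss-res (reduce + (squared-errors actual predicted))
        ss-tot (reduce + (squared-errors actual (repeat m)))]
    (- 1.0 (/ ss-res ss-tot))))

;; Made-up values: three perfect predictions and one that is off by 2.
(def actual    [1.0 2.0 3.0 4.0])
(def predicted [1.0 2.0 3.0 6.0])

(rmse actual predicted) ;; squared errors sum to 4 over 4 points
(r2 actual predicted)   ;; SS_res = 4, SS_tot = 5
```

With these values the mean squared error is 1, so the RMSE is 1, and R² is 1 - 4/5, roughly 0.2.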
Printing one of the trained models (note that the Smile regression model is recognized by Kindly and printed correctly):
(-> evaluations
    flatten
    first
    :fit-ctx
    :model
    ml/thaw-model)
Linear Model:
Residuals:
       Min          1Q      Median          3Q         Max
   -0.8517     -0.2316      0.0315      0.2308      0.6501
Coefficients:
                  Estimate Std. Error    t value   Pr(>|t|)
Intercept           1.8837     0.2870     6.5634     0.0000 ***
sepal-width         0.6507     0.0753     8.6382     0.0000 ***
petal-length        0.6913     0.0673    10.2732     0.0000 ***
petal-width        -0.5117     0.1490    -3.4344     0.0009 ***
---------------------------------------------------------------------
Significance codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.3172 on 96 degrees of freedom
Multiple R-squared: 0.8534,    Adjusted R-squared: 0.8488
F-statistic: 186.2321 on 4 and 96 DF,  p-value: 6.948e-40
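Two of the printed quantities can be checked by hand with the usual textbook formulas: each t value is the estimate divided by its standard error, and the adjusted R² corrects R² for the number of predictors. The sketch below uses hypothetical helper names. It also assumes n = 100 training rows, which is inferred from the 96 residual degrees of freedom plus 4 fitted coefficients and is not stated in the output itself.

```clojure
;; Sanity checks on the printed summary (hypothetical helper names).

(defn t-value
  "t statistic of a coefficient: estimate divided by its standard error."
  [estimate std-error]
  (/ estimate std-error))

(defn adjusted-r2
  "Adjusted R-squared for n observations and p predictors
  (p excludes the intercept)."
  [r2 n p]
  (- 1.0 (* (- 1.0 r2)
            (/ (dec n) (double (- n p 1))))))

;; Intercept row: estimate 1.8837, standard error 0.2870.
(def intercept-t (t-value 1.8837 0.2870))

;; R-squared 0.8534 with 3 predictors; n = 100 is an inferred assumption.
(def adj-r2 (adjusted-r2 0.8534 100 3))
```

Up to the rounding of the printed inputs, these reproduce the reported t value of 6.5634 for the intercept and the adjusted R² of 0.8488.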
source: projects/noj/notebooks/ml.clj