2 Machine learning - DRAFT
This is part of the Scicloj Clojure Data Tutorials.
(ns ml
  (:require [scicloj.ml.core :as ml]
            [scicloj.ml.metamorph :as mm]
            [scicloj.ml.dataset :refer [dataset add-column]]
            [scicloj.ml.dataset :as ds]
            [fastmath.stats]
            [tablecloth.api :as tc]
            [scicloj.noj.v1.datasets :as datasets]
            [scicloj.kindly.v4.kind :as kind]))
2.1 Linear regression
We will explore the Iris dataset:
(tc/head datasets/iris)
_unnamed [5 5]:
:sepal-length | :sepal-width | :petal-length | :petal-width | :species |
---|---|---|---|---|
5.1 | 3.5 | 1.4 | 0.2 | setosa |
4.9 | 3.0 | 1.4 | 0.2 | setosa |
4.7 | 3.2 | 1.3 | 0.2 | setosa |
4.6 | 3.1 | 1.5 | 0.2 | setosa |
5.0 | 3.6 | 1.4 | 0.2 | setosa |
A Metamorph pipeline for linear regression:
(def additive-pipeline
  (ml/pipeline
   (mm/set-inference-target :sepal-length)
   (mm/drop-columns [:species])
   {:metamorph/id :model}
   (mm/model {:model-type :smile.regression/ordinary-least-square})))
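The pipeline above is a chain of steps that each take a context map and return a context map. The context carries the data under :metamorph/data and the current mode under :metamorph/mode, and the model step keeps its fitted model under its :metamorph/id (here :model). The following dependency-free sketch imitates that idea with a toy "model" that only predicts the mean of the target. The names compose-steps, drop-columns-step and mean-model-step are hypothetical helpers written for this illustration and are not part of the scicloj.ml API.

```clojure
;; A sketch of context passing in a pipeline. All function names here are
;; hypothetical; the real steps come from scicloj.ml.metamorph.

(defn compose-steps
  "Chains steps left to right; each step maps a context to a context."
  [& steps]
  (fn [ctx] (reduce (fn [c step] (step c)) ctx steps)))

(defn drop-columns-step
  "Removes the given keys from every row of :metamorph/data."
  [cols]
  (fn [ctx]
    (update ctx :metamorph/data
            (fn [rows] (mapv #(apply dissoc % cols) rows)))))

(defn mean-model-step
  "A toy 'model'. In :fit mode it stores the mean of the target column under
  the step id. In :transform mode it adds that stored mean to every row as
  :prediction."
  [id target]
  (fn [{:metamorph/keys [mode data] :as ctx}]
    (case mode
      :fit       (assoc ctx id {:mean (/ (reduce + (map target data))
                                         (double (count data)))})
      :transform (assoc ctx :metamorph/data
                        (mapv #(assoc % :prediction (:mean (get ctx id)))
                              data)))))

(def toy-pipeline
  (compose-steps (drop-columns-step [:species])
                 (mean-model-step :model :sepal-length)))

;; Fit on two made-up rows; the fitted state ends up under :model.
(def fit-ctx
  (toy-pipeline {:metamorph/mode :fit
                 :metamorph/data [{:sepal-length 5.0 :species "setosa"}
                                  {:sepal-length 7.0 :species "virginica"}]}))

;; Reuse the fitted context on new data in :transform mode.
(def transform-ctx
  (toy-pipeline (assoc fit-ctx
                       :metamorph/mode :transform
                       :metamorph/data [{:sepal-length 6.5
                                         :species "versicolor"}])))
```

Because the fitted state is stored in the context under the step id, a trained model can later be looked up in the fitted context by that same key.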
Training and evaluating the pipeline on a train/test split of the data (here a :holdout split):
(def evaluations
  (ml/evaluate-pipelines
   [additive-pipeline]
   (ds/split->seq datasets/iris :holdout)
   ml/rmse
   :loss
   {:other-metrices [{:name :r2
                      :metric-fn fastmath.stats/r2-determination}]}))
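The evaluation above scores each trained pipeline with RMSE, treated as a loss (lower is better), and additionally records an R² value. As a rough, dependency-free sketch of what these two metrics measure, the textbook definitions can be written in plain Clojure. The function names rmse and r-squared below are hypothetical stand-ins, not the library functions. In particular, fastmath.stats/r2-determination may compute R² differently (for example as a squared correlation), which has not been checked here.

```clojure
;; Textbook definitions, assuming `actual` and `predicted` are
;; equal-length sequences of numbers. These are illustrative helpers only.

(defn- sum-of-squares
  "Sum of squared elementwise differences between xs and ys."
  [xs ys]
  (reduce + (map (fn [x y] (let [d (- x y)] (* d d))) xs ys)))

(defn rmse
  "Root mean squared error: sqrt of the mean squared residual."
  [actual predicted]
  (Math/sqrt (/ (sum-of-squares actual predicted)
                (double (count actual)))))

(defn r-squared
  "1 - RSS/TSS: the share of the target's variance around its mean
  that the predictions account for."
  [actual predicted]
  (let [mean (/ (reduce + actual) (double (count actual)))
        rss  (sum-of-squares actual predicted)
        tss  (sum-of-squares actual (repeat mean))]
    (- 1.0 (/ rss tss))))

;; Made-up numbers: three exact predictions and one that is off by 2.
(def actual    [1.0 2.0 3.0 4.0])
(def predicted [1.0 2.0 3.0 6.0])

(def example-rmse (rmse actual predicted))
(def example-r2   (r-squared actual predicted))
```

With these made-up numbers the RMSE is 1 and the R² is roughly 0.2; perfect predictions would give an RMSE of 0 and an R² of 1.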
Printing one of the trained models (note that the Smile regression model is recognized by Kindly and printed correctly):
(-> evaluations
    flatten
    first
    :fit-ctx
    :model
    ml/thaw-model)
Linear Model:

Residuals:
       Min          1Q      Median          3Q         Max
   -0.8590     -0.2245      0.0465      0.2136      0.8509

Coefficients:
                  Estimate Std. Error    t value   Pr(>|t|)
Intercept           2.0365     0.3037     6.7065     0.0000 ***
sepal-width         0.6058     0.0810     7.4785     0.0000 ***
petal-length        0.6784     0.0725     9.3580     0.0000 ***
petal-width        -0.4970     0.1676    -2.9659     0.0038 **
---------------------------------------------------------------------
Significance codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3247 on 96 degrees of freedom
Multiple R-squared: 0.8626,    Adjusted R-squared: 0.8583
F-statistic: 200.8142 on 4 and 96 DF,  p-value: 3.132e-41
source: projects/noj/notebooks/ml.clj