2 Machine learning - DRAFT
This is part of the Scicloj Clojure Data Scrapbook.
(ns ml
  (:require [scicloj.ml.core :as ml]
            [scicloj.ml.metamorph :as mm]
            [scicloj.ml.dataset :refer [dataset add-column]]
            [scicloj.ml.dataset :as ds]
            [fastmath.stats]
            [tablecloth.api :as tc]
            [scicloj.noj.v1.datasets :as datasets]
            [scicloj.kindly.v4.kind :as kind]))
2.1 Linear regression
We will explore the Iris dataset:
(tc/head datasets/iris)
_unnamed [5 5]:
:sepal-length | :sepal-width | :petal-length | :petal-width | :species |
---|---|---|---|---|
5.1 | 3.5 | 1.4 | 0.2 | setosa |
4.9 | 3.0 | 1.4 | 0.2 | setosa |
4.7 | 3.2 | 1.3 | 0.2 | setosa |
4.6 | 3.1 | 1.5 | 0.2 | setosa |
5.0 | 3.6 | 1.4 | 0.2 | setosa |
A Metamorph pipeline for linear regression:
(def additive-pipeline
  (ml/pipeline
   (mm/set-inference-target :sepal-length)
   (mm/drop-columns [:species])
   {:metamorph/id :model}
   (mm/model {:model-type :smile.regression/ordinary-least-square})))
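As a sketch of what this pipeline fits: with :sepal-length set as the inference target and :species dropped, the three remaining numeric columns act as predictors. The notation below (the coefficients β, the error term ε, and the subscripts sw, pl, pw for sepal width, petal length and petal width) is introduced here for illustration and does not appear in the notebook.

```latex
\begin{aligned}
y_i &= \beta_0 + \beta_1 x_{i,\mathrm{sw}} + \beta_2 x_{i,\mathrm{pl}} + \beta_3 x_{i,\mathrm{pw}} + \varepsilon_i \\
\hat{\beta} &= \arg\min_{\beta} \sum_{i=1}^{n}
  \left( y_i - \beta_0 - \beta_1 x_{i,\mathrm{sw}} - \beta_2 x_{i,\mathrm{pl}} - \beta_3 x_{i,\mathrm{pw}} \right)^2
\end{aligned}
```

Here y is the sepal length, and ordinary least squares picks the coefficients that minimize the sum of squared residuals on the training rows. The model is called additive because each predictor contributes its own term, with no interaction terms.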
Training and evaluating the pipeline on various subsets:
(def evaluations
  (ml/evaluate-pipelines
   [additive-pipeline]
   (ds/split->seq datasets/iris :holdout)
   ml/rmse
   :loss
   {:other-metrices [{:name :r2
                      :metric-fn fastmath.stats/r2-determination}]}))
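For reference, the loss passed to the evaluation is the root mean squared error between the observed sepal lengths and the model's predictions. The symbols are notation introduced here, and the R² formula is the conventional textbook definition; the source does not state which exact variant fastmath.stats/r2-determination computes, so treat that part as an assumption.

```latex
\begin{aligned}
\mathrm{RMSE} &= \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2} \\
R^2 &= 1 - \frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}{\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2}
\end{aligned}
```

RMSE is expressed in the units of the target (centimetres for the Iris measurements), so lower is better, which is why it is registered as a :loss. R² is unitless, and values closer to 1 indicate that the predictions account for more of the variance in the target.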
Printing one of the trained models (note that the Smile regression model is recognized by Kindly and printed correctly):
(-> evaluations
    flatten
    first
    :fit-ctx
    :model
    ml/thaw-model)
Linear Model:

Residuals:
       Min          1Q      Median          3Q         Max
   -0.7326     -0.2096     -0.0182      0.1866      0.8517

Coefficients:
              Estimate Std. Error    t value   Pr(>|t|)
Intercept       1.7373     0.2932     5.9256     0.0000 ***
sepal-width     0.6949     0.0767     9.0631     0.0000 ***
petal-length    0.6458     0.0660     9.7786     0.0000 ***
petal-width    -0.3970     0.1481    -2.6810     0.0086 **
---------------------------------------------------------------------
Significance codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3194 on 96 degrees of freedom
Multiple R-squared: 0.8511,    Adjusted R-squared: 0.8465
F-statistic: 182.9573 on 4 and 96 DF,  p-value: 1.434e-39
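Read as an equation, the coefficient estimates printed above give the following fitted model. The worked line plugs in the first row of the Iris table shown earlier (sepal width 3.5, petal length 1.4, petal width 0.2) purely as an illustration. Whether that row fell into the training or the test part of the split is not known from the source, and the coefficients would presumably change with a different random split.

```latex
\begin{aligned}
\widehat{\text{sepal-length}} &= 1.7373 + 0.6949\,\text{sepal-width} + 0.6458\,\text{petal-length} - 0.3970\,\text{petal-width} \\
&= 1.7373 + 0.6949 \cdot 3.5 + 0.6458 \cdot 1.4 - 0.3970 \cdot 0.2 \\
&= 1.7373 + 2.43215 + 0.90412 - 0.0794 \\
&\approx 4.99
\end{aligned}
```

The observed sepal length for that row is 5.1, so the residual is about 0.11, which sits inside the interquartile range of the residuals reported above (-0.2096 to 0.1866).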
source: projects/noj/notebooks/ml.clj