8 Statistics (experimental)
author: Daniel Slutsky
(ns noj-book.statistics
  (:require [scicloj.noj.v1.stats :as stats]
            [tablecloth.api :as tc]))
8.1 Example data
(def iris
  (-> "https://vincentarelbundock.github.io/Rdatasets/csv/datasets/iris.csv"
      (tc/dataset {:key-fn keyword})
      (tc/rename-columns {:Sepal.Length :sepal-length
                          :Sepal.Width :sepal-width
                          :Petal.Length :petal-length
                          :Petal.Width :petal-width
                          :Species :species})))
8.2 Multivariate regression
The stats/regression-model function computes a regression model (using scicloj.ml) and adds some relevant information, such as the R^2 measure.
(-> iris
    (stats/regression-model :sepal-length
                            [:sepal-width :petal-length :petal-width]
                            {:model-type :smile.regression/elastic-net})
    (dissoc :model-data))
{:feature-columns [:sepal-width :petal-length :petal-width],
 :target-columns [:sepal-length],
 :explained #function[clojure.lang.AFunction/1],
 :R2 0.8582120394596505,
 :id #uuid "7d815ff7-51a4-45e4-aa28-e90bb578f1e6",
 :predictions #tech.v3.dataset.column<float64>[150]
:sepal-length
[5.022, 4.724, 4.775, 4.851, 5.081, 5.360, 4.911, 5.030, 4.664, 4.903, 5.209, 5.098, 4.775, 4.572, 5.184, 5.522, 5.089, 4.970, 5.352, 5.217...],
 :predict
 #function[scicloj.noj.v1.stats/regression-model/predict--64674],
 :options {:model-type :smile.regression/elastic-net}}
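To make the :R2 entry in the result map more concrete, here is a minimal sketch of the textbook coefficient of determination, 1 - SS_res / SS_tot, in plain Clojure. The names mean and r-squared are hypothetical helpers introduced for illustration, and the input values are toy numbers rather than iris data. This is an assumption about the usual definition; the library's exact computation may differ.

```clojure
;; Hypothetical helpers for illustration; not part of scicloj.noj.

(defn mean
  "Arithmetic mean of a non-empty collection of numbers."
  [xs]
  (/ (reduce + xs) (double (count xs))))

(defn r-squared
  "Textbook coefficient of determination: 1 - SS_res / SS_tot.
  `actual` and `predicted` are equal-length sequences of numbers."
  [actual predicted]
  (let [m      (mean actual)
        ;; residual sum of squares: squared gaps between actual and predicted
        ss-res (reduce + (map (fn [y y-hat]
                                (let [d (- y y-hat)] (* d d)))
                              actual predicted))
        ;; total sum of squares: squared gaps between actual and its mean
        ss-tot (reduce + (map (fn [y]
                                (let [d (- y m)] (* d d)))
                              actual))]
    (- 1.0 (/ ss-res ss-tot))))

;; Toy values (not taken from iris): predictions close to the actual
;; values give a result close to 1.
(r-squared [1.0 2.0 3.0 4.0] [1.1 1.9 3.2 3.8])
```

Under this definition, a model that reproduces the target exactly scores 1, and a model that always predicts the target mean scores 0.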
(-> iris
    (stats/regression-model :sepal-length
                            [:sepal-width :petal-length :petal-width]
                            {:model-type :smile.regression/ordinary-least-square})
    (dissoc :model-data))
{:feature-columns [:sepal-width :petal-length :petal-width],
 :target-columns [:sepal-length],
 :explained #function[clojure.lang.AFunction/1],
 :R2 0.8586117200663171,
 :id #uuid "3fcc1add-7628-4139-81c0-90300b681a74",
 :predictions #tech.v3.dataset.column<float64>[150]
:sepal-length
[5.015, 4.690, 4.749, 4.826, 5.080, 5.377, 4.895, 5.021, 4.625, 4.882, 5.216, 5.092, 4.746, 4.533, 5.199, 5.561, 5.094, 4.960, 5.368, 5.226...],
 :predict
 #function[scicloj.noj.v1.stats/regression-model/predict--64674],
 :options {:model-type :smile.regression/ordinary-least-square}}
The stats/linear-regression-model convenience function specifically uses the :smile.regression/ordinary-least-square model type.
(-> iris
    (stats/linear-regression-model :sepal-length
                                   [:sepal-width :petal-length :petal-width])
    (dissoc :model-data))
{:feature-columns [:sepal-width :petal-length :petal-width],
 :target-columns [:sepal-length],
 :explained #function[clojure.lang.AFunction/1],
 :R2 0.8586117200663171,
 :id #uuid "68741935-cd8a-48bf-9f29-b133fc3be3ca",
 :predictions #tech.v3.dataset.column<float64>[150]
:sepal-length
[5.015, 4.690, 4.749, 4.826, 5.080, 5.377, 4.895, 5.021, 4.625, 4.882, 5.216, 5.092, 4.746, 4.533, 5.199, 5.561, 5.094, 4.960, 5.368, 5.226...],
 :predict
 #function[scicloj.noj.v1.stats/regression-model/predict--64674],
 :options {:model-type :smile.regression/ordinary-least-square}}
8.3 Adding regression predictions to a dataset
The stats/add-predictions function models a target column using feature columns and adds a new prediction column holding the model's predictions.
(-> iris
    (stats/add-predictions :sepal-length
                           [:sepal-width :petal-length :petal-width]
                           {:model-type :smile.regression/ordinary-least-square}))
https://vincentarelbundock.github.io/Rdatasets/csv/datasets/iris.csv [150 7]:
:rownames | :sepal-length | :sepal-width | :petal-length | :petal-width | :species | :sepal-length-prediction |
---|---|---|---|---|---|---|
1 | 5.1 | 3.5 | 1.4 | 0.2 | setosa | 5.01541576 |
2 | 4.9 | 3.0 | 1.4 | 0.2 | setosa | 4.68999718 |
3 | 4.7 | 3.2 | 1.3 | 0.2 | setosa | 4.74925142 |
4 | 4.6 | 3.1 | 1.5 | 0.2 | setosa | 4.82599409 |
5 | 5.0 | 3.6 | 1.4 | 0.2 | setosa | 5.08049948 |
6 | 5.4 | 3.9 | 1.7 | 0.4 | setosa | 5.37719368 |
7 | 4.6 | 3.4 | 1.4 | 0.3 | setosa | 4.89468378 |
8 | 5.0 | 3.4 | 1.5 | 0.2 | setosa | 5.02124524 |
9 | 4.4 | 2.9 | 1.4 | 0.2 | setosa | 4.62491347 |
10 | 4.9 | 3.1 | 1.5 | 0.1 | setosa | 4.88164236 |
… | … | … | … | … | … | … |
140 | 6.9 | 3.1 | 5.4 | 2.1 | virginica | 6.53429168 |
141 | 6.7 | 3.1 | 5.6 | 2.4 | virginica | 6.50917327 |
142 | 6.9 | 3.1 | 5.1 | 2.3 | virginica | 6.21025556 |
143 | 5.8 | 2.7 | 5.1 | 1.9 | virginica | 6.17251376 |
144 | 6.8 | 3.2 | 5.9 | 2.3 | virginica | 6.84264484 |
145 | 6.7 | 3.3 | 5.7 | 2.5 | virginica | 6.65460564 |
146 | 6.7 | 3.0 | 5.2 | 2.3 | virginica | 6.21608504 |
147 | 6.3 | 2.5 | 5.0 | 1.9 | virginica | 5.97143313 |
148 | 6.5 | 3.0 | 5.2 | 2.0 | virginica | 6.38302984 |
149 | 6.2 | 3.4 | 5.4 | 2.3 | virginica | 6.61824630 |
150 | 5.9 | 3.0 | 5.1 | 1.8 | virginica | 6.42341317 |
stats/add-predictions also attaches the model's information to the metadata of that new column.
(-> iris
    (stats/add-predictions :sepal-length
                           [:sepal-width :petal-length :petal-width]
                           {:model-type :smile.regression/ordinary-least-square})
    :sepal-length-prediction
    meta
    (update :model
            dissoc :model-data :predict :predictions))
{:name :sepal-length-prediction,
 :datatype :float64,
 :n-elems 150,
 :column-type :prediction,
 :model
 {:feature-columns [:sepal-width :petal-length :petal-width],
  :target-columns [:sepal-length],
  :explained #function[clojure.lang.AFunction/1],
  :R2 0.8586117200663171,
  :id #uuid "36d384d1-2e4b-45c2-be09-0594533e0780",
  :options {:model-type :smile.regression/ordinary-least-square}}}
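Once a prediction column sits next to the target column, a natural next step is to compare the two. The sketch below does this in plain Clojure for the first three rows of the prediction table in this section (the :sepal-length and :sepal-length-prediction values). The names rows, residuals and rmse are hypothetical, introduced only for illustration; they are not part of the library.

```clojure
;; First three rows of the prediction table in this section,
;; as [actual predicted] pairs.
(def rows
  [[5.1 5.01541576]
   [4.9 4.68999718]
   [4.7 4.74925142]])

(defn residuals
  "Actual minus predicted, for each [actual predicted] pair."
  [rows]
  (mapv (fn [[actual predicted]] (- actual predicted)) rows))

(defn rmse
  "Root mean squared error over [actual predicted] pairs."
  [rows]
  (let [rs (residuals rows)]
    (Math/sqrt (/ (reduce + (map #(* % %) rs))
                  (count rs)))))

;; Positive residuals mean the model under-predicted that row.
(residuals rows)
(rmse rows)
```

With only three rows this says little about the model as a whole; the same functions could be applied to all 150 pairs.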