scicloj.metamorph.ml.column-metric

Model evaluation metrics for classification and regression tasks.

This namespace provides functions to compute standard machine learning metrics on model predictions vs. ground truth labels, with support for both binary and multiclass classification as well as regression tasks.

Key Functions:

classification-metric: Evaluate classification model predictions
regression-metric: Evaluate regression model predictions

Classification Metrics (from fastmath.stats):

Supports binary and multiclass metrics including accuracy, precision, recall, F1-score, and more. Multiclass metrics can be averaged using:

:macro - Unweighted mean of per-class metrics
:micro - Computed from true/false positives aggregated globally over all classes

Also supports :roc-auc for multiclass AUC scoring.
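To make the two averaging modes concrete, here is a minimal sketch in plain Clojure that computes macro- and micro-averaged precision from two label vectors. It is an illustration under stated assumptions, not the fastmath implementation: the helper names are hypothetical, and a class that is never predicted is given a precision of 0.0, which is only one possible convention.

```clojure
;; Illustration only: not the fastmath.stats implementation.
(defn per-class-counts
  "Maps every class seen in y-true or y-pred to its {:tp n :fp n} counts."
  [y-true y-pred]
  (let [pairs (map vector y-true y-pred)]
    (into {}
          (for [c (distinct (concat y-true y-pred))]
            [c {:tp (count (filter (fn [[t p]] (and (= p c) (= t c))) pairs))
                :fp (count (filter (fn [[t p]] (and (= p c) (not= t c))) pairs))}]))))

(defn precision
  "tp / (tp + fp); 0.0 for a class that is never predicted (assumed convention)."
  [{:keys [tp fp]}]
  (if (zero? (+ tp fp))
    0.0
    (double (/ tp (+ tp fp)))))

(defn macro-precision
  "Unweighted mean of the per-class precisions."
  [y-true y-pred]
  (let [ps (map precision (vals (per-class-counts y-true y-pred)))]
    (/ (reduce + ps) (count ps))))

(defn micro-precision
  "Precision from tp and fp summed over all classes before dividing."
  [y-true y-pred]
  (let [counts (vals (per-class-counts y-true y-pred))]
    (precision {:tp (reduce + (map :tp counts))
                :fp (reduce + (map :fp counts))})))
```

With y-true [0 0 0 0 1 2] and y-pred [0 0 0 1 1 1], the per-class precisions are 1.0, 1/3 and 0.0, so the macro average is 4/9, while micro averaging pools all predictions and gives 4/6. For single-label data, micro-averaged precision equals accuracy.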

Regression Metrics (from fastmath.stats):

Distance and similarity metrics such as MAE, MSE, RMSE, R², etc.
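For reference, a sketch of how a few of these metrics are defined, written in plain Clojure over numeric vectors. The function names are hypothetical; the namespace itself relies on fastmath.stats, whose exact definitions (for example of R²) should be checked in its documentation.

```clojure
;; Illustrative definitions only; the namespace itself uses fastmath.stats.
(defn- sq [x] (* x x))

(defn mae
  "Mean absolute error."
  [y-true y-pred]
  (/ (reduce + (map (fn [t p] (Math/abs (double (- t p)))) y-true y-pred))
     (count y-true)))

(defn mse
  "Mean squared error."
  [y-true y-pred]
  (/ (reduce + (map (fn [t p] (sq (double (- t p)))) y-true y-pred))
     (count y-true)))

(defn rmse
  "Root mean squared error: the square root of the MSE."
  [y-true y-pred]
  (Math/sqrt (mse y-true y-pred)))

(defn r2
  "Coefficient of determination: 1 - SS_res / SS_tot."
  [y-true y-pred]
  (let [mean   (/ (reduce + y-true) (double (count y-true)))
        ss-res (reduce + (map (fn [t p] (sq (double (- t p)))) y-true y-pred))
        ss-tot (reduce + (map (fn [t] (sq (- t mean))) y-true))]
    (- 1.0 (/ ss-res ss-tot))))
```

With y-true [3.0 -0.5 2.0 7.0] and y-pred [2.5 0.0 2.0 8.0], mae returns 0.5 and mse returns 0.375.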

Data Format:

Input datasets must be tech.ml.dataset (TMD) datasets
Columns must carry the appropriate metadata (:column-type :prediction, :inference-target? true, etc.)
Categorical mappings are supported via :categorical-map metadata
Missing values and NaNs are detected and rejected

Validation: The functions perform extensive validation including:

Column metadata correctness
Missing values and NaN detection
Type and datatype uniformity
Row count alignment between datasets
Single-label assumption (multi-label not yet supported)
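The kinds of checks listed above can be sketched on plain vectors as follows. This is a hypothetical illustration (validate-columns! is not part of the namespace and does not operate on TMD columns) that checks row-count alignment, missing values and NaNs, and datatype uniformity, throwing ex-info on the first failure.

```clojure
;; Hypothetical helper, not part of scicloj.metamorph.ml.column-metric.
(defn validate-columns!
  "Throws ex-info on the first failed check; returns true otherwise."
  [y-true y-pred]
  (let [all-values (concat y-true y-pred)
        missing?   (fn [x] (or (nil? x)
                               (and (double? x) (Double/isNaN (double x)))))
        types      (set (map type (remove nil? all-values)))]
    ;; row count alignment between truth and prediction
    (when (not= (count y-true) (count y-pred))
      (throw (ex-info "Row count mismatch"
                      {:true-rows (count y-true) :pred-rows (count y-pred)})))
    ;; missing values (nil) and NaNs
    (when (some missing? all-values)
      (throw (ex-info "Missing values or NaNs present" {})))
    ;; datatype uniformity across both columns
    (when (> (count types) 1)
      (throw (ex-info "Non-uniform datatypes" {:types types})))
    true))
```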

See also: fastmath.stats documentation for available metric names

classification-metric

(classification-metric y-true y-pred metric averaging options)
(classification-metric y-true y-pred metric averaging)

Calculates various classification metrics, supporting binary and multiclass data. Returns a single float.

y-true A TMD dataset containing the ground truth
y-pred A TMD dataset containing the predictions
metric A keyword; any metric listed at https://generateme.github.io/fastmath/clay/stats.html#binary-classification-metrics is supported, as well as :roc-auc
averaging How the metrics, most of which are defined for binary data, are averaged over classes; supports :macro and :micro
options Options for the :metric-fn

Multi-label data is not yet supported.

Both datasets need columns carrying the appropriate column metadata as defined by TMD (see https://techascent.github.io/tech.ml.dataset/tech.v3.dataset.column-filters.html), e.g.:

:column-type being :prediction or :probability-distribution
:inference-target? true
:categorical-map column metadata is explicitly supported: when present, it is taken into account when comparing columns
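Why the categorical map matters can be shown with a small, hypothetical example: if the truth and prediction columns were encoded with different label-to-code lookup tables, comparing the raw codes would be meaningless, whereas comparing decoded labels is not. The plain maps below only stand in for TMD's :categorical-map metadata; their exact shape is an assumption, and decode is a hypothetical helper.

```clojure
;; Hypothetical lookup tables (label -> integer code); they stand in for
;; TMD's :categorical-map metadata, whose exact shape is not shown here.
(def truth-lookup {"a" 0 "b" 1})
(def pred-lookup  {"a" 1 "b" 0})

(def truth-codes [0 1 1])   ; decodes to ["a" "b" "b"] with truth-lookup
(def pred-codes  [1 0 0])   ; decodes to ["a" "b" "b"] with pred-lookup

(defn decode
  "Turns integer codes back into labels using the inverse of `lookup`."
  [lookup codes]
  (let [inverse (into {} (map (fn [[label code]] [code label]) lookup))]
    (mapv inverse codes)))
```

Comparing truth-codes and pred-codes directly would report zero matching rows, whereas the decoded labels agree on every row.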

The ml/predict fn produces datasets of this kind.

The function validates various aspects and may reject data that has:

wrong column metadata
missing values or NaNs
non-discrete values in :prediction column
non-uniform datatypes
multi-label data (more than one :inference-target? column)
a mismatch in shape between y-true and y-pred
other problems

The exact checks may depend on the concrete metric used.

Examples

Calculate ‘accuracy’

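(require '[tech.v3.dataset :as ds]
         '[scicloj.metamorph.ml.column-metric :refer [classification-metric]])

(classification-metric
 (ds/new-dataset
  [(ds/new-column :truth [1 1 1 1] {:inference-target? true})])
 (ds/new-dataset
  [(ds/new-column :pred [1 0 1 0] {:column-type :prediction})])
 :accuracy
 :macro)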
;;=> 0.5

Calculate ‘true positives’

(classification-metric
 (ds/new-dataset
  [(ds/new-column :truth [1 1 1 1] {:inference-target? true})])
 (ds/new-dataset
  [(ds/new-column :pred [1 0 1 0] {:column-type :prediction})])
 :tp
 :micro)
;;=> 2.0
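The numbers in both examples can be checked by hand. A minimal sketch, assuming 1 is treated as the positive class, that tallies the confusion counts for the example vectors and derives accuracy from them (confusion-counts and accuracy are hypothetical helpers, not part of the namespace):

```clojure
;; Tallies confusion counts for a binary problem; `positive` marks the
;; positive class (treating 1 as positive is an assumption).
(defn confusion-counts [y-true y-pred positive]
  (reduce (fn [acc [t p]]
            (update acc
                    (cond (and (= t positive) (= p positive)) :tp
                          (= p positive) :fp   ; predicted positive, truly negative
                          (= t positive) :fn   ; truly positive, predicted negative
                          :else :tn)
                    inc))
          {:tp 0 :fp 0 :fn 0 :tn 0}
          (map vector y-true y-pred)))

(defn accuracy
  "Share of rows where prediction and truth agree: (tp + tn) / n."
  [counts]
  (/ (double (+ (:tp counts) (:tn counts)))
     (reduce + (vals counts))))

(def example-counts (confusion-counts [1 1 1 1] [1 0 1 0] 1))
```

Two of the four rows are correctly predicted positives, so tp is 2 and accuracy is (tp + tn) / n = 2/4 = 0.5, matching the outputs shown above.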


regression-metric

(regression-metric y-true y-pred metric-fn)

Calculates various regression metrics and returns a single float.

y-true A TMD dataset containing the ground truth
y-pred A TMD dataset containing the predictions
metric-fn A keyword; any metric listed at https://generateme.github.io/fastmath/clay/stats.html#distance-and-similarity-metrics is supported

Both datasets need columns carrying the appropriate column metadata as defined by TMD (see https://techascent.github.io/tech.ml.dataset/tech.v3.dataset.column-filters.html), e.g.:

:column-type being :prediction
:inference-target? true

The ml/predict fn produces datasets of this kind.

The function validates various aspects and may reject data that has:

wrong column metadata
missing values or NaNs
non-continuous values in the :prediction column
non-uniform datatypes
multi-label data (more than one :inference-target? column)
a mismatch in shape between y-true and y-pred
other problems

The exact checks may depend on the concrete metric used.

Examples

Train a regression model and calculate RMSE

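(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.modelling :as ds-mod]
         '[tech.v3.dataset.column-filters :as cf]
         '[scicloj.metamorph.ml :as ml]
         '[scicloj.metamorph.ml.column-metric :as col-metric]
         '[scicloj.metamorph.ml.rdatasets :as rdatasets])

;; assumes a namespace registering the :fastmath/ols model type is loaded
(let [split (-> (rdatasets/datasets-iris)
                (ds/remove-columns [:rownames :species])
                (ds-mod/set-inference-target [:petal-width])
                (ds-mod/train-test-split))
      model (ml/train (:train-ds split) {:model-type :fastmath/ols})
      prediction (ml/predict (:test-ds split) model)]
  ;; compare the true :petal-width of the test set with the predictions
  (col-metric/regression-metric (cf/target (:test-ds split))
                                prediction
                                :rmse))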
;;=> 0.20672462887913676


Generated by Codox with the Clojang UI theme
