scicloj.metamorph.ml.preprocessing
Feature scaling and normalization transformers for metamorph pipelines.
This namespace provides metamorph-compatible transformers for standardizing and normalizing numeric features. These preprocessing steps are essential for many machine learning algorithms to perform well.
Available Transformers:
- std-scale: Standardization (z-score normalization)
- min-max-scale: Min-max scaling to a specified range
Standard Scaling (std-scale):
Centers each numeric column (subtracts the mean) and/or scales it by its standard deviation. With both steps enabled, the result has zero mean and unit variance. Useful for:
- Algorithms sensitive to feature magnitude (SVMs, neural networks, KNN)
- Distance-based models
Options:
- :mean? (default true): Center by subtracting column mean
- :stddev? (default true): Scale by standard deviation
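The two options can be sketched on a single column in plain Clojure. This is only an illustration: `std-scale-column` is a hypothetical helper, not a library function, and the way the flags combine is an assumption read off the option descriptions. It uses the sample standard deviation (dividing by n - 1), which is consistent with the std-scale example output shown further down this page.

```clojure
(defn std-scale-column
  "Hypothetical helper (not part of the library): standardizes one
  column of numbers, honouring :mean? and :stddev? flags."
  [xs {:keys [mean? stddev?] :or {mean? true stddev? true}}]
  (let [n        (count xs)
        mu       (/ (reduce + xs) (double n))
        ;; sample variance: sum of squared deviations divided by n - 1
        variance (/ (reduce + (map #(let [d (- % mu)] (* d d)) xs))
                    (dec n))
        sd       (Math/sqrt variance)]
    (mapv (fn [x]
            (cond-> (double x)
              mean?   (- mu)    ; center on the column mean
              stddev? (/ sd)))  ; scale by the standard deviation
          xs)))

;; Centering only: the column mean (50) is subtracted from each value.
(std-scale-column [100 8 50 88 4] {:stddev? false})
;;=> [50.0 -42.0 0.0 38.0 -46.0]
```

The sketch does not guard against a column with fewer than two values or with zero variance.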
Min-Max Scaling (min-max-scale):
Rescales each numeric column to a specified range (default -0.5 to 0.5).
Options:
- :min (default -0.5): Target minimum value
- :max (default 0.5): Target maximum value
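As a rough illustration of the rescaling, the sketch below maps one column linearly so that its smallest value lands on the target minimum and its largest on the target maximum. `min-max-scale-column` is a hypothetical helper written in plain Clojure, not the library's implementation; the formula is the usual min-max one and agrees with the min-max-scale example output shown further down this page.

```clojure
(defn min-max-scale-column
  "Hypothetical helper (not part of the library): linearly rescales
  one column of numbers into the range [lo, hi]."
  [xs lo hi]
  (let [col-min (apply min xs)
        col-max (apply max xs)
        span    (double (- col-max col-min))]
    ;; (x - col-min) / span lies in [0, 1]; stretch and shift it to [lo, hi]
    (mapv #(+ lo (* (- hi lo) (/ (- % col-min) span))) xs)))

;; The column minimum (4) maps to -1 and the maximum (100) maps to 1.
(min-max-scale-column [100 8 50 88 4] -1 1)
```

A constant column makes `span` zero, which this sketch does not handle.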
Metamorph Integration: Both transformers follow the metamorph pipeline pattern:
- :fit mode: Learn scaling parameters from training data
- :transform mode: Apply learned parameters to new data
- Stores transformation parameters in context under their assigned :metamorph/id
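The fit/transform pattern can be illustrated with a toy step written in plain Clojure. Everything here is hypothetical: `center-step` is not a library function, it works on a plain vector instead of a dataset, and the `:metamorph/id` value `:center` is set by hand (a real pipeline assigns it). The point is only to show parameters being learned in :fit mode, stored in the context, and reused in :transform mode.

```clojure
(defn center-step
  "Hypothetical metamorph-style step: subtracts the training mean
  from a vector of numbers stored under :metamorph/data."
  [{:metamorph/keys [data mode id] :as ctx}]
  (case mode
    ;; :fit learns the mean, stores it under the step's id, and transforms the data
    :fit       (let [mean (/ (reduce + data) (double (count data)))]
                 (assoc ctx
                        id {:mean mean}
                        :metamorph/data (mapv #(- % mean) data)))
    ;; :transform reuses the stored mean instead of recomputing it
    :transform (let [{:keys [mean]} (get ctx id)]
                 (assoc ctx :metamorph/data (mapv #(- % mean) data)))))

(def fitted
  (center-step {:metamorph/data [100 8 50 88 4]
                :metamorph/mode :fit
                :metamorph/id   :center}))

(def transformed
  (center-step (assoc fitted
                      :metamorph/data [50 60]
                      :metamorph/mode :transform)))

(:metamorph/data transformed)
;;=> [0.0 10.0]
```

The new data is shifted by the training mean (50.0), not by its own mean, which is what keeps training and test data on the same scale.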
min-max-scale
(min-max-scale columns-selector {:keys [min max], :or {min -0.5, max 0.5}, :as options})
Metamorph transformer that scales the column data into a given range.
- columns-selector: tablecloth columns-selector to choose columns to work on
- meta-field: tablecloth meta-field working with columns-selector
- options: Options for the scaler, can take:
  - min: Minimum value to scale to (default -0.5)
  - max: Maximum value to scale to (default 0.5)
| metamorph | . |
|---|---|
| Behaviour in mode :fit | Scales the dataset at key :metamorph/data and stores the trained model in ctx under the key given by :metamorph/id |
| Behaviour in mode :transform | Reads the trained min-max-scale model from ctx and applies it to the data in :metamorph/data |
| Reads keys from ctx | In mode :transform: reads the trained model from the key given by :metamorph/id |
| Writes keys to ctx | In mode :fit: stores the trained model under the key given by :metamorph/id |
Examples
Usage
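;; Assumes the aliases below; they are not shown in the original example:
;; (require '[tablecloth.api :as tc]
;;          '[scicloj.metamorph.core :as mm]
;;          '[scicloj.metamorph.ml.preprocessing :refer [min-max-scale]])
(let [data   (tc/dataset [[100 0.001] [8 0.05] [50 0.005] [88 0.07]
                          [4 0.1]]
                         {:layout :as-row})
      ;; scale columns 0 and 1 into the range [-1, 1]
      pipe   (mm/pipeline (min-max-scale [0 1] {:min -1, :max 1}))
      fitted (pipe {:metamorph/data data, :metamorph/mode :fit})]
  (str (:metamorph/data fitted)))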
;;=> :_unnamed [5 2]:
;;=>
;;=> | 0 | 1 |
;;=> |------------:|------------:|
;;=> | 1.00000000 | -1.00000000 |
;;=> | -0.91666667 | -0.01010101 |
;;=> | -0.04166667 | -0.91919192 |
;;=> | 0.75000000 | 0.39393939 |
;;=> | -1.00000000 | 1.00000000 |
std-scale
(std-scale columns-selector meta-field {:keys [mean? stddev?], :or {mean? true, stddev? true}, :as options})
(std-scale columns-selector options)
Metamorph transformer that centers and scales the dataset per column.
- columns-selector: tablecloth columns-selector to choose columns to work on
- meta-field: tablecloth meta-field working with columns-selector
- options: Options for the scaler, can take:
  - mean?: If true (default), the data is shifted by the column mean, so that it is centered on 0
  - stddev?: If true (default), the data is scaled by the standard deviation of the column
| metamorph | . |
|---|---|
| Behaviour in mode :fit | Centers and scales the dataset at key :metamorph/data and stores the trained model in ctx under the key given by :metamorph/id |
| Behaviour in mode :transform | Reads the trained std-scale model from ctx and applies it to the data in :metamorph/data |
| Reads keys from ctx | In mode :transform: reads the trained model from the key given by :metamorph/id |
| Writes keys to ctx | In mode :fit: stores the trained model under the key given by :metamorph/id |
Examples
Usage
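;; Assumes the aliases below; they are not shown in the original example:
;; (require '[tablecloth.api :as tc]
;;          '[scicloj.metamorph.core :as mm]
;;          '[scicloj.metamorph.ml.preprocessing :refer [std-scale]])
(let [data   (tc/dataset [[100 0.001] [8 0.05] [50 0.005] [88 0.07]
                          [4 0.1]]
                         {:layout :as-row})
      ;; empty options map: center and scale columns 0 and 1 with the defaults
      pipe   (mm/pipeline (std-scale [0 1] {}))
      fitted (pipe {:metamorph/data data, :metamorph/mode :fit})]
  (str (:metamorph/data fitted)))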
;;=> :_unnamed [5 2]:
;;=>
;;=> | 0 | 1 |
;;=> |------------:|------------:|
;;=> | 1.13053908 | -1.04102352 |
;;=> | -0.94965283 | 0.11305233 |
;;=> | 0.00000000 | -0.94681324 |
;;=> | 0.85920970 | 0.58410369 |
;;=> | -1.04009595 | 1.29068074 |