scicloj.metamorph.ml.preprocessing

Feature scaling and normalization transformers for metamorph pipelines.

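This namespace provides metamorph-compatible transformers for standardizing and normalizing numeric features. Many machine learning algorithms, particularly distance-based and gradient-based ones, perform poorly when features are on very different scales, so these steps usually run before model training.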

Available Transformers:

  • std-scale: Standardization (z-score normalization)
  • min-max-scale: Min-max scaling to a specified range

Standard Scaling (std-scale): Centers each numeric column by subtracting its mean and/or scales it by its standard deviation. With both options enabled (the default), each column ends up with zero mean and unit variance. Useful for:

  • Algorithms sensitive to feature magnitude (SVMs, neural networks, KNN)
  • Distance-based models

Options:

  • :mean? (default true): Center by subtracting column mean
  • :stddev? (default true): Scale by standard deviation

Min-Max Scaling (min-max-scale):

Rescales each numeric column linearly so that its smallest value maps to :min and its largest to :max (default range -0.5 to 0.5). Options:

  • :min (default -0.5): Target minimum value
  • :max (default 0.5): Target maximum value
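
The rescaling can be sketched in plain Clojure as x' = lo + (hi - lo) * (x - x-min) / (x-max - x-min). This is a minimal illustration under that assumed formula, which agrees with the first column of the min-max-scale usage example in this document; min-max-rescale is a hypothetical helper, not part of the library.

```clojure
(defn min-max-rescale
  "Hypothetical helper: linearly maps the values of one column
  from [x-min, x-max] onto the target range [lo, hi]."
  [xs lo hi]
  (let [xs    (map double xs)          ; avoid Clojure ratio arithmetic
        x-min (apply min xs)
        x-max (apply max xs)
        span  (- x-max x-min)]
    (mapv (fn [x]
            (+ lo (* (- hi lo) (/ (- x x-min) span))))
          xs)))

;; The column minimum (4) maps to -1 and the maximum (100) maps to 1.
(min-max-rescale [100 8 50 88 4] -1 1)
```

A constant column makes span zero, which this sketch does not guard against.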

Metamorph Integration: Both transformers follow the metamorph pipeline pattern:

  • :fit mode: Learn scaling parameters from training data
  • :transform mode: Apply learned parameters to new data
  • Stores transformation parameters in context under their assigned :metamorph/id
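
The fit-then-transform flow might look like the sketch below. It assumes the aliases tc (tablecloth.api) and mm (scicloj.metamorph.core) used in the usage examples in this document. The column names :a and :b and the new-data values are made up for illustration.

```clojure
(require '[tablecloth.api :as tc]
         '[scicloj.metamorph.core :as mm]
         '[scicloj.metamorph.ml.preprocessing :refer [std-scale]])

;; Training data: the same values as in the usage examples.
(def train
  (tc/dataset {:a [100 8 50 88 4]
               :b [0.001 0.05 0.005 0.07 0.1]}))

;; Illustrative unseen data (made-up values).
(def new-data
  (tc/dataset {:a [25 75]
               :b [0.01 0.09]}))

(def pipe (mm/pipeline (std-scale [:a :b] {})))

;; :fit learns the scaling parameters and stores them in the context.
(def fitted
  (pipe {:metamorph/data train
         :metamorph/mode :fit}))

;; :transform reuses the fitted context, so new-data is scaled with
;; the parameters learned from train, not with its own statistics.
(def transformed
  (pipe (merge fitted
               {:metamorph/data new-data
                :metamorph/mode :transform})))

(:metamorph/data transformed)
```

Passing the fitted context into the :transform call is what carries the learned parameters across; a fresh context would have no stored model to read.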

Categories

    Other vars: min-max-scale std-scale

    min-max-scale

    (min-max-scale columns-selector {:keys [min max], :or {min -0.5, max 0.5}, :as options})

    Metamorph transformer which scales the column data into a given range.

      • columns-selector: tablecloth columns-selector to choose the columns to work on
      • meta-field: tablecloth meta-field working with columns-selector
      • options: options for the scaler, which can take:
          • :min: minimal value to scale to (default -0.5)
          • :max: maximum value to scale to (default 0.5)

    Metamorph behaviour:

      • Behaviour in mode :fit: scales the dataset at key :metamorph/data and stores the trained model in ctx under the key given by :metamorph/id
      • Behaviour in mode :transform: reads the trained min-max-scale model from ctx and applies it to the data at :metamorph/data
      • Reads keys from ctx: in mode :transform, reads the trained model from the key given by :metamorph/id
      • Writes keys to ctx: in mode :fit, stores the trained model under the key given by :metamorph/id

    Examples

    Usage

    (let [data (tc/dataset [[100 0.001] [8 0.05] [50 0.005] [88 0.07]
                            [4 0.1]]
                           {:layout :as-row})
          pipe (mm/pipeline (min-max-scale [0 1] {:min -1, :max 1}))
          fitted (pipe {:metamorph/data data, :metamorph/mode :fit})]
      (str (:metamorph/data fitted)))
    ;;=> :_unnamed [5 2]:
    ;;=> 
    ;;=> |           0 |           1 |
    ;;=> |------------:|------------:|
    ;;=> |  1.00000000 | -1.00000000 |
    ;;=> | -0.91666667 | -0.01010101 |
    ;;=> | -0.04166667 | -0.91919192 |
    ;;=> |  0.75000000 |  0.39393939 |
    ;;=> | -1.00000000 |  1.00000000 |

    std-scale

    (std-scale columns-selector meta-field {:keys [mean? stddev?], :or {mean? true, stddev? true}, :as options})
    (std-scale columns-selector options)

    Metamorph transformer which centers and scales the dataset per column.

      • columns-selector: tablecloth columns-selector to choose the columns to work on
      • meta-field: tablecloth meta-field working with columns-selector
      • options: options for the scaler, which can take:
          • :mean?: if true (default), the data is shifted by the column mean, so that it is centered on 0
          • :stddev?: if true (default), the data is scaled by the standard deviation of the column

    Metamorph behaviour:

      • Behaviour in mode :fit: centers and scales the dataset at key :metamorph/data and stores the trained model in ctx under the key given by :metamorph/id
      • Behaviour in mode :transform: reads the trained std-scale model from ctx and applies it to the data at :metamorph/data
      • Reads keys from ctx: in mode :transform, reads the trained model from the key given by :metamorph/id
      • Writes keys to ctx: in mode :fit, stores the trained model under the key given by :metamorph/id

    Examples

    Usage

    (let [data (tc/dataset [[100 0.001] [8 0.05] [50 0.005] [88 0.07]
                            [4 0.1]]
                           {:layout :as-row})
          pipe (mm/pipeline (std-scale [0 1] {}))
          fitted (pipe {:metamorph/data data, :metamorph/mode :fit})]
      (str (:metamorph/data fitted)))
    ;;=> :_unnamed [5 2]:
    ;;=> 
    ;;=> |           0 |           1 |
    ;;=> |------------:|------------:|
    ;;=> |  1.13053908 | -1.04102352 |
    ;;=> | -0.94965283 |  0.11305233 |
    ;;=> |  0.00000000 | -0.94681324 |
    ;;=> |  0.85920970 |  0.58410369 |
    ;;=> | -1.04009595 |  1.29068074 |