scicloj.metamorph.ml.r-model-matrix

R-style formula-based feature engineering and linear regression.

This namespace provides tools to leverage R’s powerful formula syntax for feature engineering and linear modeling within Clojure. R formulas enable expressive specification of interactions, transformations, and categorical expansions without manual column manipulation.

Key Functions:

  • r-model-matrix: Convert dataset + R formula to design matrix
  • lm: Simplified linear regression using R formulas

Implementation Backends: The namespace supports multiple R execution backends:

  • :ocpu Remote R via OpenCPU (cloud.opencpu.org) - no local R needed
  • :renjin Java-based R implementation (https://renjin.org/)
  • :clojisr Local R via clojisr (requires R installation)

Model Matrix Capabilities: R formulas handle:

  • Basic features: y ~ x1 + x2
  • Interactions: y ~ x1 * x2 (expands to x1 + x2 + x1:x2)
  • Polynomial terms: y ~ x + I(x^2)
  • Categorical encoding: Automatic dummy variable creation
  • Intercept control: y ~ x - 1 (remove intercept)
  • Exclusions: y ~ . - x3 (all columns except x3)
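
As a rough illustration of what these expansions mean, the sketch below builds a few design-matrix rows by hand in plain Clojure. It is not how the R backends compute the matrix; the sample rows and the helper names (rows, interaction-row, polynomial-row, dummy-row) are hypothetical and exist only for this example. The dummy coding assumes R's default treatment contrasts, where the first level in sorted order is the baseline and gets no column.

```clojure
;; Hand-rolled illustration only; the real work is done by R's model.matrix.
(def rows
  [{:x1 2.0 :x2 3.0 :g "a"}
   {:x1 4.0 :x2 0.5 :g "b"}
   {:x1 1.0 :x2 1.0 :g "c"}])

(defn interaction-row
  "Columns for `y ~ x1 * x2`: intercept, x1, x2 and the product x1:x2."
  [{:keys [x1 x2]}]
  {"(Intercept)" 1.0 "x1" x1 "x2" x2 "x1:x2" (* x1 x2)})

(defn polynomial-row
  "Columns for `y ~ x1 + I(x1^2)`: intercept, x1 and x1 squared."
  [{:keys [x1]}]
  {"(Intercept)" 1.0 "x1" x1 "I(x1^2)" (* x1 x1)})

(defn dummy-row
  "Treatment-coded dummies for the categorical column :g.
  The first level (sorted) is the baseline and gets no column."
  [levels {:keys [g]}]
  (into {"(Intercept)" 1.0}
        (for [level (rest (sort levels))]
          [(str "g" level) (if (= g level) 1.0 0.0)])))

(mapv interaction-row rows)
(mapv polynomial-row rows)
(mapv #(dummy-row ["a" "b" "c"] %) rows)
```

Each helper returns one map per input row, keyed by the column names R would typically assign, so the result can be compared column by column with a backend's output.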

Linear Regression (lm): Combines formula-based feature engineering with OLS regression training. Returns a ready-to-use trained model for predictions.
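
To make the OLS step concrete, here is a minimal sketch of ordinary least squares for a single predictor, written in plain Clojure. It only illustrates the estimator; it makes no claim about how lm or its underlying regression library is implemented, and the names mean and simple-ols are hypothetical.

```clojure
(defn mean
  "Arithmetic mean of a non-empty sequence of numbers."
  [xs]
  (/ (reduce + xs) (double (count xs))))

(defn simple-ols
  "Closed-form OLS fit of y = intercept + slope * x."
  [xs ys]
  (let [mx    (mean xs)
        my    (mean ys)
        ;; sum of cross-deviations and sum of squared deviations of x
        sxy   (reduce + (map (fn [x y] (* (- x mx) (- y my))) xs ys))
        sxx   (reduce + (map (fn [x] (let [d (- x mx)] (* d d))) xs))
        slope (/ sxy sxx)]
    {:intercept (- my (* slope mx))
     :slope     slope}))

;; The data lie exactly on y = 1 + 2x, so the fit recovers those coefficients.
(simple-ols [1.0 2.0 3.0 4.0] [3.0 5.0 7.0 9.0])
;; => {:intercept 1.0, :slope 2.0}
```

With several predictors the same idea generalizes to solving the least-squares problem over all columns of the design matrix, which is the part a formula-built model matrix feeds into.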

Notes:

  • OpenCPU backend is convenient but requires internet connectivity
  • Renjin is standalone but may have some R incompatibilities
  • clojisr requires a local R installation but offers full R compatibility
  • Returned model matrices exclude row names and intercept columns by default

See also: scicloj.metamorph.ml.design-matrix for Clojure-native feature engineering

Categories

    Other vars: lm model r-model-matrix

    lm

    (lm ds formula target-var formula-impl)

    Train a linear model using an R-style formula.

    This function combines R formula-based feature engineering with ordinary least squares (OLS) regression. It creates a design matrix from the input dataset using the specified R formula, then trains a linear model on the resulting features.

    Parameters:

    • ds A tech.ml.dataset dataset containing the input data, including every variable referenced in the formula and the target variable.
    • formula A string containing the R formula (e.g., “y ~ x1 + x2 * x3”). The formula is interpreted by the R backend.
    • target-var A keyword or string naming the target variable for regression. This variable must be present in the input dataset.
    • formula-impl An implementation keyword for formula evaluation:
      • :ocpu Uses OpenCPU (cloud.opencpu.org), no local R needed
      • :renjin Uses Renjin, a Java implementation of R
      • :clojisr Uses clojisr with local R installation

    Requires the dependencies of the chosen engine to be set up; see r-model-matrix.

    Returns: A trained linear model (OLS from fastmath) ready for predictions. The model excludes the intercept column and row names from the design matrix by default.

    Examples

    Fit a linear model from a formula using R with the :renjin backend

    (require (quote [scicloj.metamorph.ml.rdatasets :as rdatasets])
             (quote [scicloj.metamorph.ml.regression]))
    ;;=> nil
    (def model
      (-> (rdatasets/datasets-iris)
          (lm "`sepal-width` ~ `sepal-length` + `petal-length` "
              :sepal-width
              :renjin)))
    ;;=> #'scicloj.metamorph.ml.r-model-matrix/model

    model

    r-model-matrix

    (r-model-matrix dataset r-formula impl)

    Compute a model matrix from a dataset and an R-style formula.

    Parameters:

    • ds A tech.ml.dataset dataset representing the input data.
    • r-formula A string containing the R formula to use for model matrix construction. The formula is interpreted by R itself, so it should be fully compatible with R's formula syntax.
    • impl An implementation keyword: :ocpu, :renjin, or :clojisr

    Each implementation requires its own dependencies to be added to the project.

    Dispatches to the backend implementation selected by impl.

    If ds contains target columns, they are added to the returned design-matrix dataset.

    Returns a map with

    • :model-matrix-dataset the tech.ml.dataset (TMD) holding the design matrix specified by r-formula
    • :attributes the (R) attributes of the model.matrix object

    Examples

    Call with renjin backend

    ;; print is assumed to alias tech.v3.dataset.print
    (require (quote [scicloj.metamorph.ml.rdatasets :as rdatasets])
             (quote [tech.v3.dataset.print :as print]))
    ;;=> nil
    (-> (rdatasets/datasets-mtcars)
        (r-model-matrix "mpg ~ as.factor(cyl) * hp + disp" :renjin)
        :model-matrix-dataset
        (print/dataset->str))
    ;;=> _unnamed [32 7]:
    ;;=> 
    ;;=> | X.Intercept. | as.factor.cyl.6 | as.factor.cyl.8 |    hp |  disp | as.factor.cyl.6.hp | as.factor.cyl.8.hp |
    ;;=> |-------------:|----------------:|----------------:|------:|------:|-------------------:|-------------------:|
    ;;=> |          1.0 |             1.0 |             0.0 | 110.0 | 160.0 |              110.0 |                0.0 |
    ;;=> |          1.0 |             1.0 |             0.0 | 110.0 | 160.0 |              110.0 |                0.0 |
    ;;=> |          1.0 |             0.0 |             0.0 |  93.0 | 108.0 |                0.0 |                0.0 |
    ;;=> |          1.0 |             1.0 |             0.0 | 110.0 | 258.0 |              110.0 |                0.0 |
    ;;=> |          1.0 |             0.0 |             1.0 | 175.0 | 360.0 |                0.0 |              175.0 |
    ;;=> |          1.0 |             1.0 |             0.0 | 105.0 | 225.0 |              105.0 |                0.0 |
    ;;=> |          1.0 |             0.0 |             1.0 | 245.0 | 360.0 |                0.0 |              245.0 |
    ;;=> |          1.0 |             0.0 |             0.0 |  62.0 | 146.7 |                0.0 |                0.0 |
    ;;=> |          1.0 |             0.0 |             0.0 |  95.0 | 140.8 |                0.0 |                0.0 |
    ;;=> |          1.0 |             1.0 |             0.0 | 123.0 | 167.6 |              123.0 |                0.0 |
    ;;=> |          ... |             ... |             ... |   ... |   ... |                ... |                ... |
    ;;=> |          1.0 |             0.0 |             1.0 | 150.0 | 318.0 |                0.0 |              150.0 |
    ;;=> |          1.0 |             0.0 |             1.0 | 150.0 | 304.0 |                0.0 |              150.0 |
    ;;=> |          1.0 |             0.0 |             1.0 | 245.0 | 350.0 |                0.0 |              245.0 |
    ;;=> |          1.0 |             0.0 |             1.0 | 175.0 | 400.0 |                0.0 |              175.0 |
    ;;=> |          1.0 |             0.0 |             0.0 |  66.0 |  79.0 |                0.0 |                0.0 |
    ;;=> |          1.0 |             0.0 |             0.0 |  91.0 | 120.3 |                0.0 |                0.0 |
    ;;=> |          1.0 |             0.0 |             0.0 | 113.0 |  95.1 |                0.0 |                0.0 |
    ;;=> |          1.0 |             0.0 |             1.0 | 264.0 | 351.0 |                0.0 |              264.0 |
    ;;=> |          1.0 |             1.0 |             0.0 | 175.0 | 145.0 |              175.0 |                0.0 |
    ;;=> |          1.0 |             0.0 |             1.0 | 335.0 | 301.0 |                0.0 |              335.0 |
    ;;=> |          1.0 |             0.0 |             0.0 | 109.0 | 121.0 |                0.0 |                0.0 |

    Call with ocpu backend

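    ;; ds, ds-mod and print are assumed to alias the tech.v3.dataset namespaces below
    (require (quote [scicloj.metamorph.ml.rdatasets :as rdatasets])
             (quote [tech.v3.dataset :as ds])
             (quote [tech.v3.dataset.modelling :as ds-mod])
             (quote [tech.v3.dataset.print :as print]))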
    ;;=> nil
    (-> (rdatasets/datasets-iris)
        (ds/remove-column :rownames)
        (ds-mod/set-inference-target [:species])
        (r-model-matrix "species ~ ." :ocpu)
        :model-matrix-dataset
        (print/dataset->str))
    ;;=> :_unnamed [150 6]:
    ;;=> 
    ;;=> | (Intercept) | `sepal-length` | `sepal-width` | `petal-length` | `petal-width` |  :species |
    ;;=> |------------:|---------------:|--------------:|---------------:|--------------:|-----------|
    ;;=> |           1 |            5.1 |           3.5 |            1.4 |           0.2 |    setosa |
    ;;=> |           1 |            4.9 |           3.0 |            1.4 |           0.2 |    setosa |
    ;;=> |           1 |            4.7 |           3.2 |            1.3 |           0.2 |    setosa |
    ;;=> |           1 |            4.6 |           3.1 |            1.5 |           0.2 |    setosa |
    ;;=> |           1 |            5.0 |           3.6 |            1.4 |           0.2 |    setosa |
    ;;=> |           1 |            5.4 |           3.9 |            1.7 |           0.4 |    setosa |
    ;;=> |           1 |            4.6 |           3.4 |            1.4 |           0.3 |    setosa |
    ;;=> |           1 |            5.0 |           3.4 |            1.5 |           0.2 |    setosa |
    ;;=> |           1 |            4.4 |           2.9 |            1.4 |           0.2 |    setosa |
    ;;=> |           1 |            4.9 |           3.1 |            1.5 |           0.1 |    setosa |
    ;;=> |         ... |            ... |           ... |            ... |           ... |       ... |
    ;;=> |           1 |            6.9 |           3.1 |            5.4 |           2.1 | virginica |
    ;;=> |           1 |            6.7 |           3.1 |            5.6 |           2.4 | virginica |
    ;;=> |           1 |            6.9 |           3.1 |            5.1 |           2.3 | virginica |
    ;;=> |           1 |            5.8 |           2.7 |            5.1 |           1.9 | virginica |
    ;;=> |           1 |            6.8 |           3.2 |            5.9 |           2.3 | virginica |
    ;;=> |           1 |            6.7 |           3.3 |            5.7 |           2.5 | virginica |
    ;;=> |           1 |            6.7 |           3.0 |            5.2 |           2.3 | virginica |
    ;;=> |           1 |            6.3 |           2.5 |            5.0 |           1.9 | virginica |
    ;;=> |           1 |            6.5 |           3.0 |            5.2 |           2.0 | virginica |
    ;;=> |           1 |            6.2 |           3.4 |            5.4 |           2.3 | virginica |
    ;;=> |           1 |            5.9 |           3.0 |            5.1 |           1.8 | virginica |