scicloj.metamorph.ml.r-model-matrix
R-style formula-based feature engineering and linear regression.
This namespace provides tools to leverage R’s powerful formula syntax for feature engineering and linear modeling within Clojure. R formulas enable expressive specification of interactions, transformations, and categorical expansions without manual column manipulation.
Key Functions:
- r-model-matrix: Convert a dataset and an R formula to a design matrix
- lm: Simplified linear regression using R formulas
Implementation Backends: The namespace supports multiple R execution backends:
- :ocpu: Remote R via OpenCPU (cloud.opencpu.org), no local R needed
- :renjin: Java-based R implementation (https://renjin.org/)
- :clojisr: Local R via clojisr (requires R installation)
Model Matrix Capabilities: R formulas handle:
- Basic features: y ~ x1 + x2
- Interactions: y ~ x1 * x2 (expands to x1 + x2 + x1:x2)
- Polynomial terms: y ~ x + I(x^2)
- Categorical encoding: Automatic dummy variable creation
- Intercept control: y ~ x - 1 (remove intercept)
- Exclusions: y ~ . - x3 (all columns except x3)
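To make the expansion concrete, here is a minimal sketch in plain Clojure, with no R backend involved, of roughly what one design-matrix row looks like for a formula of the shape y ~ f * x, where f is categorical and x is numeric. The helper names dummy-code and design-row are hypothetical and not part of this namespace. The sketch assumes R's default treatment contrasts, in which the first level is the reference and gets no column of its own.

```clojure
(defn dummy-code
  "Treatment-contrast dummy coding: one 0/1 indicator per non-reference level.
   `levels` is the ordered collection of levels; the first one is the reference."
  [levels value]
  (mapv (fn [level] (if (= level value) 1.0 0.0))
        (rest levels)))

(defn design-row
  "Builds the row for `y ~ f * x`: intercept, dummies for f, x itself,
   then each dummy multiplied by x (the f:x interaction terms)."
  [levels f x]
  (let [dummies (dummy-code levels f)
        x       (double x)]
    (vec (concat [1.0]
                 dummies
                 [x]
                 (map #(* % x) dummies)))))

;; Levels 4, 6, 8 with 4 as the reference level
(design-row [4 6 8] 6 110)
;;=> [1.0 1.0 0.0 110.0 110.0 0.0]
(design-row [4 6 8] 4 93)
;;=> [1.0 0.0 0.0 93.0 0.0 0.0]
```

The backends do this work inside R, so column names and ordering in the real output follow R's conventions, not this sketch.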
Linear Regression (lm): Combines formula-based feature engineering with OLS regression training. Returns a ready-to-use trained model for predictions.
Notes:
- OpenCPU backend is convenient but requires internet connectivity
- Renjin is standalone but may have some R incompatibilities
- clojisr requires a local R installation but offers full R compatibility
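- By default, lm excludes row names and the intercept column from the design matrix before fitting; the r-model-matrix output itself keeps the intercept column, as the examples below show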
See also: scicloj.metamorph.ml.design-matrix for Clojure-native feature engineering
lm
(lm ds formula target-var formula-impl)
Train a linear model using an R-style formula.
This function combines R formula-based feature engineering with ordinary least squares (OLS) regression. It creates a design matrix from the input dataset using the specified R formula, then trains a linear model on the resulting features.
Parameters:
- ds: A tech.ml.dataset dataset containing the input data, including every variable referenced in the formula and the target variable.
- formula: A string containing the R formula (e.g., "y ~ x1 + x2 * x3"). The formula is interpreted by the R backend.
- target-var: A keyword or string naming the target variable for regression. This variable must be present in the input dataset.
- formula-impl: An implementation keyword for formula evaluation:
  - :ocpu: Uses OpenCPU (cloud.opencpu.org), no local R needed
  - :renjin: Uses Renjin, a Java implementation of R
  - :clojisr: Uses clojisr with a local R installation
Each engine requires its own dependencies to be set up; see r-model-matrix.
Returns: A trained linear model (OLS from fastmath) ready for predictions. The model excludes the intercept column and row names from the design matrix by default.
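As an illustration of what an ordinary least squares fit computes, here is a minimal sketch for the single-predictor case in plain Clojure. It is not how lm or fastmath perform the fit, which handle many predictors and report additional statistics. The names simple-ols and predict are hypothetical. Like the model described above, the sketch estimates the intercept itself and does not expect a column of ones in the input.

```clojure
(defn simple-ols
  "Closed-form least-squares fit of y = intercept + slope * x for one predictor."
  [xs ys]
  (let [n     (count xs)
        mean  (fn [coll] (/ (reduce + coll) n))
        mx    (mean xs)
        my    (mean ys)
        ;; sum of cross-deviations and sum of squared deviations of x
        sxy   (reduce + (map (fn [x y] (* (- x mx) (- y my))) xs ys))
        sxx   (reduce + (map (fn [x] (* (- x mx) (- x mx))) xs))
        slope (/ sxy sxx)]
    {:intercept (- my (* slope mx))
     :slope     slope}))

(defn predict
  "Applies a fitted {:intercept .. :slope ..} map to a new x value."
  [{:keys [intercept slope]} x]
  (+ intercept (* slope x)))

;; Points lying exactly on y = 1 + 2x
(def fit (simple-ols [0.0 1.0 2.0 3.0] [1.0 3.0 5.0 7.0]))
fit
;;=> {:intercept 1.0, :slope 2.0}
(predict fit 4.0)
;;=> 9.0
```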
Examples
Fit a linear model from an R formula using the :renjin backend
(require (quote [scicloj.metamorph.ml.rdatasets :as rdatasets])
         (quote [scicloj.metamorph.ml.regression]))
;;=> nil
(def model
  (-> (rdatasets/datasets-iris)
      (lm "`sepal-width` ~ `sepal-length` + `petal-length` "
          :sepal-width
          :renjin)))
;;=> #'scicloj.metamorph.ml.r-model-matrix/model
model
r-model-matrix
(r-model-matrix dataset r-formula impl)
Compute a model matrix from a dataset and an R-style formula.
Parameters:
- dataset: A tech.ml.dataset dataset representing the input data.
- r-formula: A string containing the R formula to use for model matrix construction. The formula is interpreted by R itself, so R's formula syntax should be fully supported.
- impl: An implementation keyword, one of:
  - :ocpu: Uses an online service, https://www.opencpu.org/api.html (server: cloud.opencpu.org)
  - :renjin: Uses https://renjin.org/
  - :clojisr: Uses https://github.com/scicloj/clojisr, which requires a local R installation
Each implementation requires dependencies to be added:
- :ocpu: opencpu-clj/opencpu-clj "0.3.1"
- :renjin: org.renjin/renjin-script-engine "3.5-beta76"
- :clojisr: scicloj/clojisr "1.1.0"
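For a project using the Clojure CLI, the coordinates above could be declared roughly as in the following deps.edn fragment. This is a sketch that assumes a deps.edn-based project. Only the backend you intend to use needs to be included, and depending on where each artifact is published, an additional Maven repository entry may be required; check the respective project's documentation.

```clojure
;; deps.edn (fragment): keep only the entry for the backend you use
{:deps {;; :ocpu backend
        opencpu-clj/opencpu-clj         {:mvn/version "0.3.1"}
        ;; :renjin backend
        org.renjin/renjin-script-engine {:mvn/version "3.5-beta76"}
        ;; :clojisr backend (also needs a local R installation)
        scicloj/clojisr                 {:mvn/version "1.1.0"}}}
```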
Dispatches to the appropriate backend implementation.
Returns a map with:
- :model-matrix-dataset: the TMD (tech.ml.dataset) containing the design matrix specified by r-formula. If dataset contains target columns, they are added to this dataset.
- :attributes: the (R) attributes of the model.matrix object
Examples
Call with renjin backend
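(require (quote [scicloj.metamorph.ml.rdatasets :as rdatasets])
         ;; assumption: the `print` alias refers to tech.v3.dataset.print
         (quote [tech.v3.dataset.print :as print]))
;;=> nil
(-> (rdatasets/datasets-mtcars)
    (r-model-matrix "mpg ~ as.factor(cyl) * hp + disp" :renjin)
    :model-matrix-dataset
    (print/dataset->str))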
;;=> _unnamed [32 7]:
;;=>
;;=> | X.Intercept. | as.factor.cyl.6 | as.factor.cyl.8 | hp | disp | as.factor.cyl.6.hp | as.factor.cyl.8.hp |
;;=> |-------------:|----------------:|----------------:|------:|------:|-------------------:|-------------------:|
;;=> | 1.0 | 1.0 | 0.0 | 110.0 | 160.0 | 110.0 | 0.0 |
;;=> | 1.0 | 1.0 | 0.0 | 110.0 | 160.0 | 110.0 | 0.0 |
;;=> | 1.0 | 0.0 | 0.0 | 93.0 | 108.0 | 0.0 | 0.0 |
;;=> | 1.0 | 1.0 | 0.0 | 110.0 | 258.0 | 110.0 | 0.0 |
;;=> | 1.0 | 0.0 | 1.0 | 175.0 | 360.0 | 0.0 | 175.0 |
;;=> | 1.0 | 1.0 | 0.0 | 105.0 | 225.0 | 105.0 | 0.0 |
;;=> | 1.0 | 0.0 | 1.0 | 245.0 | 360.0 | 0.0 | 245.0 |
;;=> | 1.0 | 0.0 | 0.0 | 62.0 | 146.7 | 0.0 | 0.0 |
;;=> | 1.0 | 0.0 | 0.0 | 95.0 | 140.8 | 0.0 | 0.0 |
;;=> | 1.0 | 1.0 | 0.0 | 123.0 | 167.6 | 123.0 | 0.0 |
;;=> | ... | ... | ... | ... | ... | ... | ... |
;;=> | 1.0 | 0.0 | 1.0 | 150.0 | 318.0 | 0.0 | 150.0 |
;;=> | 1.0 | 0.0 | 1.0 | 150.0 | 304.0 | 0.0 | 150.0 |
;;=> | 1.0 | 0.0 | 1.0 | 245.0 | 350.0 | 0.0 | 245.0 |
;;=> | 1.0 | 0.0 | 1.0 | 175.0 | 400.0 | 0.0 | 175.0 |
;;=> | 1.0 | 0.0 | 0.0 | 66.0 | 79.0 | 0.0 | 0.0 |
;;=> | 1.0 | 0.0 | 0.0 | 91.0 | 120.3 | 0.0 | 0.0 |
;;=> | 1.0 | 0.0 | 0.0 | 113.0 | 95.1 | 0.0 | 0.0 |
;;=> | 1.0 | 0.0 | 1.0 | 264.0 | 351.0 | 0.0 | 264.0 |
;;=> | 1.0 | 1.0 | 0.0 | 175.0 | 145.0 | 175.0 | 0.0 |
;;=> | 1.0 | 0.0 | 1.0 | 335.0 | 301.0 | 0.0 | 335.0 |
;;=> | 1.0 | 0.0 | 0.0 | 109.0 | 121.0 | 0.0 | 0.0 |
Call with ocpu backend
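(require (quote [scicloj.metamorph.ml.rdatasets :as rdatasets])
         ;; assumption: these aliases refer to the tech.v3.dataset namespaces
         (quote [tech.v3.dataset :as ds])
         (quote [tech.v3.dataset.modelling :as ds-mod])
         (quote [tech.v3.dataset.print :as print]))
;;=> nil
(-> (rdatasets/datasets-iris)
    (ds/remove-column :rownames)
    (ds-mod/set-inference-target [:species])
    (r-model-matrix "species ~ ." :ocpu)
    :model-matrix-dataset
    (print/dataset->str))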
;;=> :_unnamed [150 6]:
;;=>
;;=> | (Intercept) | `sepal-length` | `sepal-width` | `petal-length` | `petal-width` | :species |
;;=> |------------:|---------------:|--------------:|---------------:|--------------:|-----------|
;;=> | 1 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
;;=> | 1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
;;=> | 1 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
;;=> | 1 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
;;=> | 1 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |
;;=> | 1 | 5.4 | 3.9 | 1.7 | 0.4 | setosa |
;;=> | 1 | 4.6 | 3.4 | 1.4 | 0.3 | setosa |
;;=> | 1 | 5.0 | 3.4 | 1.5 | 0.2 | setosa |
;;=> | 1 | 4.4 | 2.9 | 1.4 | 0.2 | setosa |
;;=> | 1 | 4.9 | 3.1 | 1.5 | 0.1 | setosa |
;;=> | ... | ... | ... | ... | ... | ... |
;;=> | 1 | 6.9 | 3.1 | 5.4 | 2.1 | virginica |
;;=> | 1 | 6.7 | 3.1 | 5.6 | 2.4 | virginica |
;;=> | 1 | 6.9 | 3.1 | 5.1 | 2.3 | virginica |
;;=> | 1 | 5.8 | 2.7 | 5.1 | 1.9 | virginica |
;;=> | 1 | 6.8 | 3.2 | 5.9 | 2.3 | virginica |
;;=> | 1 | 6.7 | 3.3 | 5.7 | 2.5 | virginica |
;;=> | 1 | 6.7 | 3.0 | 5.2 | 2.3 | virginica |
;;=> | 1 | 6.3 | 2.5 | 5.0 | 1.9 | virginica |
;;=> | 1 | 6.5 | 3.0 | 5.2 | 2.0 | virginica |
;;=> | 1 | 6.2 | 3.4 | 5.4 | 2.3 | virginica |
;;=> | 1 | 5.9 | 3.0 | 5.1 | 1.8 | virginica |