20 Smile classification models reference
As discussed in the Machine Learning chapter, this book contains reference chapters for machine learning models that can be registered in metamorph.ml.
This specific chapter focuses on classification models of Smile version 2.6, which are wrapped by scicloj.ml.smile.
Note that this chapter requires scicloj.ml.smile as an additional dependency to Noj.
```clojure
(ns noj-book.smile-classification
  (:require
   [noj-book.utils.example-code :refer [iris-std make-iris-pipeline]]
   [noj-book.utils.render-tools :refer [render-key-info]]
   [noj-book.utils.surface-plot :refer [surface-plot]]
   [scicloj.kindly.v4.kind :as kind]
   [scicloj.metamorph.core :as mm]
   [scicloj.metamorph.ml :as ml]
   [scicloj.ml.smile.classification]
   [scicloj.ml.xgboost]
   ;; used below for tc/dataset, tc/info, tc/column-names
   [tablecloth.api :as tc]
   [tech.v3.dataset.metamorph :as ds-mm]
   ;; used below for column-values->categorical
   [tech.v3.dataset.modelling :as ds-mod]))
;; Note: the `datasets` and `kroki` helpers used below come from the book's
;; utility namespaces, which are not shown here.
```

20.1 Smile classification models reference
In the following we list all model keys of Smile classification models, including their parameters. They can be used like this:

```clojure
(comment
  (ml/train df
            {:model-type <model-key>
             :param-1 0
             :param-2 1}))
```

20.2 :smile.classification/ada-boost
| name | type | default | description |
|---|---|---|---|
| trees | int32 | 500 | Number of trees |
| max-depth | int32 | 200 | Maximum depth of the tree |
| max-nodes | int32 | 6 | Maximum number of leaf nodes in the tree |
| node-size | int32 | 1 | Number of instances in a node below which the tree will not split, setting nodeSize = 5 generally gives good results |
In this example we use the capability of the AdaBoost classifier to report the importance of variables.
As data, we use the Wisconsin Breast Cancer dataset, which has 30 feature variables.
```clojure
(def df (datasets/breast-cancer-ds))

(tc/column-names df)
```

```
(:mean-radius
 :mean-texture
 :mean-perimeter
 :mean-area
 :mean-smoothness
 :mean-compactness
 :mean-concavity
 :mean-concave-points
 :mean-symmetry
 :mean-fractal-dimension
 :radius-error
 :texture-error
 :perimeter-error
 :area-error
 :smoothness-error
 :compactness-error
 :concavity-error
 :concave-points-error
 :symmetry-error
 :fractal-dimension-error
 :worst-radius
 :worst-texture
 :worst-perimeter
 :worst-area
 :worst-smoothness
 :worst-compactness
 :worst-concavity
 :worst-concave-points
 :worst-symmetry
 :worst-fractal-dimension
 :class)
```

To get an overview of the dataset, we print its summary:
```clojure
(-> df tc/info)
```

https://vincentarelbundock.github.io/Rdatasets/csv/dslabs/brca.csv: descriptive-stats [31 12]:

| :col-name | :datatype | :n-valid | :n-missing | :min | :mean | :mode | :max | :standard-deviation | :skew | :first | :last |
|---|---|---|---|---|---|---|---|---|---|---|---|
| :mean-radius | :float64 | 569 | 0 | 6.9810000 | 14.12729174 | | 28.11000 | 3.52404883 | 0.94237957 | 13.540000 | 20.600000 |
| :mean-texture | :float64 | 569 | 0 | 9.7100000 | 19.28964851 | | 39.28000 | 4.30103577 | 0.65044954 | 14.360000 | 29.330000 |
| :mean-perimeter | :float64 | 569 | 0 | 43.7900000 | 91.96903339 | | 188.50000 | 24.29898104 | 0.99065043 | 87.460000 | 140.100000 |
| :mean-area | :float64 | 569 | 0 | 143.5000000 | 654.88910369 | | 2501.00000 | 351.91412918 | 1.64573218 | 566.300000 | 1265.000000 |
| :mean-smoothness | :float64 | 569 | 0 | 0.0526300 | 0.09636028 | | 0.16340 | 0.01406413 | 0.45632376 | 0.097790 | 0.117800 |
| :mean-compactness | :float64 | 569 | 0 | 0.0193800 | 0.10434098 | | 0.34540 | 0.05281276 | 1.19012303 | 0.081290 | 0.277000 |
| :mean-concavity | :float64 | 569 | 0 | 0.0000000 | 0.08879932 | | 0.42680 | 0.07971981 | 1.40117974 | 0.066640 | 0.351400 |
| :mean-concave-points | :float64 | 569 | 0 | 0.0000000 | 0.04891915 | | 0.20120 | 0.03880284 | 1.17118008 | 0.047810 | 0.152000 |
| :mean-symmetry | :float64 | 569 | 0 | 0.1060000 | 0.18116186 | | 0.30400 | 0.02741428 | 0.72560897 | 0.188500 | 0.239700 |
| :mean-fractal-dimension | :float64 | 569 | 0 | 0.0499600 | 0.06279761 | | 0.09744 | 0.00706036 | 1.30448881 | 0.057660 | 0.070160 |
| :radius-error | :float64 | 569 | 0 | 0.1115000 | 0.40517206 | | 2.87300 | 0.27731273 | 3.08861217 | 0.269900 | 0.726000 |
| :texture-error | :float64 | 569 | 0 | 0.3602000 | 1.21685343 | | 4.88500 | 0.55164839 | 1.64644381 | 0.788600 | 1.595000 |
| :perimeter-error | :float64 | 569 | 0 | 0.7570000 | 2.86605923 | | 21.98000 | 2.02185455 | 3.44361520 | 2.058000 | 5.772000 |
| :area-error | :float64 | 569 | 0 | 6.8020000 | 40.33707909 | | 542.20000 | 45.49100552 | 5.44718628 | 23.560000 | 86.220000 |
| :smoothness-error | :float64 | 569 | 0 | 0.0017130 | 0.00704098 | | 0.03113 | 0.00300252 | 2.31445006 | 0.008462 | 0.006522 |
| :compactness-error | :float64 | 569 | 0 | 0.0022520 | 0.02547814 | | 0.13540 | 0.01790818 | 1.90222071 | 0.014600 | 0.061580 |
| :concavity-error | :float64 | 569 | 0 | 0.0000000 | 0.03189372 | | 0.39600 | 0.03018606 | 5.11046305 | 0.023870 | 0.071170 |
| :concave-points-error | :float64 | 569 | 0 | 0.0000000 | 0.01179614 | | 0.05279 | 0.00617029 | 1.44467814 | 0.013150 | 0.016640 |
| :symmetry-error | :float64 | 569 | 0 | 0.0078820 | 0.02054230 | | 0.07895 | 0.00826637 | 2.19513290 | 0.019800 | 0.023240 |
| :fractal-dimension-error | :float64 | 569 | 0 | 0.0008948 | 0.00379490 | | 0.02984 | 0.00264607 | 3.92396862 | 0.002300 | 0.006185 |
| :worst-radius | :float64 | 569 | 0 | 7.9300000 | 16.26918981 | | 36.04000 | 4.83324158 | 1.10311521 | 15.110000 | 25.740000 |
| :worst-texture | :float64 | 569 | 0 | 12.0200000 | 25.67722320 | | 49.54000 | 6.14625762 | 0.49832131 | 19.260000 | 39.420000 |
| :worst-perimeter | :float64 | 569 | 0 | 50.4100000 | 107.26121265 | | 251.20000 | 33.60254227 | 1.12816387 | 99.700000 | 184.600000 |
| :worst-area | :float64 | 569 | 0 | 185.2000000 | 880.58312830 | | 4254.00000 | 569.35699267 | 1.85937327 | 711.200000 | 1821.000000 |
| :worst-smoothness | :float64 | 569 | 0 | 0.0711700 | 0.13236859 | | 0.22260 | 0.02283243 | 0.41542600 | 0.144000 | 0.165000 |
| :worst-compactness | :float64 | 569 | 0 | 0.0272900 | 0.25426504 | | 1.05800 | 0.15733649 | 1.47355490 | 0.177300 | 0.868100 |
| :worst-concavity | :float64 | 569 | 0 | 0.0000000 | 0.27218848 | | 1.25200 | 0.20862428 | 1.15023682 | 0.239000 | 0.938700 |
| :worst-concave-points | :float64 | 569 | 0 | 0.0000000 | 0.11460622 | | 0.29100 | 0.06573234 | 0.49261553 | 0.128800 | 0.265000 |
| :worst-symmetry | :float64 | 569 | 0 | 0.1565000 | 0.29007557 | | 0.66380 | 0.06186747 | 1.43392777 | 0.297700 | 0.408700 |
| :worst-fractal-dimension | :float64 | 569 | 0 | 0.0550400 | 0.08394582 | | 0.20750 | 0.01806127 | 1.66257927 | 0.072590 | 0.124000 |
| :class | :int16 | 569 | 0 | 0 | | | 1 | | | 0.000000 | 1.000000 |
Then we create a metamorph pipeline with the AdaBoost model:
```clojure
(def ada-pipe-fn
  (mm/pipeline
   (ds-mm/set-inference-target :class)
   (ds-mm/categorical->number [:class])
   (ml/model {:model-type :smile.classification/ada-boost})))
```

We run the pipeline in mode :fit. As we are just exploring the data, no train/test split is needed.
```clojure
(def trained-ctx (mm/fit-pipe df ada-pipe-fn))
```

Next we take the model out of the pipeline:
```clojure
(def model (-> trained-ctx vals (nth 2) ml/thaw-model))
```

The variable importance can be obtained from the trained model:
```clojure
(def var-importances
  (mapv
   #(hash-map :variable %1 :importance %2)
   (map #(first (.variables %)) (.. model formula predictors))
   (.importance model)))

var-importances
```

```
[{:variable "mean-radius", :importance 29.065369497534117}
 {:variable "mean-texture", :importance 38.787135865841286}
 {:variable "mean-perimeter", :importance 4.276342216959478}
 {:variable "mean-area", :importance 6.49083987169664}
 {:variable "mean-smoothness", :importance 17.13100474281572}
 {:variable "mean-compactness", :importance 11.039539086184842}
 {:variable "mean-concavity", :importance 11.078034237258203}
 {:variable "mean-concave-points", :importance 20.76804690217321}
 {:variable "mean-symmetry", :importance 12.948995467737223}
 {:variable "mean-fractal-dimension", :importance 9.961748451892227}
 {:variable "radius-error", :importance 11.287573604717792}
 {:variable "texture-error", :importance 9.931347675503154}
 {:variable "perimeter-error", :importance 12.763864109009916}
 {:variable "area-error", :importance 14.530125703392487}
 {:variable "smoothness-error", :importance 12.358806749040111}
 {:variable "compactness-error", :importance 13.916811329262362}
 {:variable "concavity-error", :importance 5.217113039229697}
 {:variable "concave-points-error", :importance 9.379969518785954}
 {:variable "symmetry-error", :importance 7.420769656049224}
 {:variable "fractal-dimension-error", :importance 15.550420136987462}
 {:variable "worst-radius", :importance 10.851618262760901}
 {:variable "worst-texture", :importance 28.18262416245342}
 {:variable "worst-perimeter", :importance 10.321125896155744}
 {:variable "worst-area", :importance 11.199088024427994}
 {:variable "worst-smoothness", :importance 17.582237702011934}
 {:variable "worst-compactness", :importance 5.625785824349872}
 {:variable "worst-concavity", :importance 15.561260423064136}
 {:variable "worst-concave-points", :importance 16.104322818468752}
 {:variable "worst-symmetry", :importance 11.404577675521441}
 {:variable "worst-fractal-dimension", :importance 7.837453382125137}]
```

and we plot the variables:
```clojure
(kind/vega-lite
 {:data {:values var-importances},
  :width 800,
  :height 500,
  :mark {:type "bar"},
  :encoding
  {:x {:field :variable, :type "nominal", :sort "-y"},
   :y {:field :importance, :type "quantitative"}}})
```

20.3 :smile.classification/decision-tree
| name | type | default | description |
|---|---|---|---|
| max-nodes | int32 | 100 | maximum number of leaf nodes in the tree |
| node-size | int32 | 1 | minimum size of leaf nodes |
| max-depth | int32 | 20 | maximum depth of the tree |
| split-rule | keyword | gini | the splitting rule |
A decision tree learns a set of rules from the data in the form of a tree, which we will plot in this example. We use the iris dataset:
```clojure
(def iris (datasets/iris-ds))

iris
```

https://vincentarelbundock.github.io/Rdatasets/csv/datasets/iris.csv [150 5]:
| :sepal-length | :sepal-width | :petal-length | :petal-width | :species |
|---|---|---|---|---|
| 5.1 | 3.5 | 1.4 | 0.2 | 0 |
| 4.9 | 3.0 | 1.4 | 0.2 | 0 |
| 4.7 | 3.2 | 1.3 | 0.2 | 0 |
| 4.6 | 3.1 | 1.5 | 0.2 | 0 |
| 5.0 | 3.6 | 1.4 | 0.2 | 0 |
| 5.4 | 3.9 | 1.7 | 0.4 | 0 |
| 4.6 | 3.4 | 1.4 | 0.3 | 0 |
| 5.0 | 3.4 | 1.5 | 0.2 | 0 |
| 4.4 | 2.9 | 1.4 | 0.2 | 0 |
| 4.9 | 3.1 | 1.5 | 0.1 | 0 |
| … | … | … | … | … |
| 6.9 | 3.1 | 5.4 | 2.1 | 1 |
| 6.7 | 3.1 | 5.6 | 2.4 | 1 |
| 6.9 | 3.1 | 5.1 | 2.3 | 1 |
| 5.8 | 2.7 | 5.1 | 1.9 | 1 |
| 6.8 | 3.2 | 5.9 | 2.3 | 1 |
| 6.7 | 3.3 | 5.7 | 2.5 | 1 |
| 6.7 | 3.0 | 5.2 | 2.3 | 1 |
| 6.3 | 2.5 | 5.0 | 1.9 | 1 |
| 6.5 | 3.0 | 5.2 | 2.0 | 1 |
| 6.2 | 3.4 | 5.4 | 2.3 | 1 |
| 5.9 | 3.0 | 5.1 | 1.8 | 1 |
We create a pipeline containing only the model, as the dataset is already in a form usable by metamorph.ml.
```clojure
(def trained-pipe-tree
  (mm/fit-pipe
   iris
   (mm/pipeline
    #:metamorph{:id :model}
    (ml/model {:model-type :smile.classification/decision-tree}))))
```

We extract the Java object of the trained model.
```clojure
(def tree-model (-> trained-pipe-tree :model ml/thaw-model))

tree-model
```

The printed smile.classification.DecisionTree object shows the learned tree:

```
n=150
node), split, n, loss, yval, (yprob)
* denotes terminal node
1) root 150 329.58 0 (0.33333 0.33333 0.33333)
 2) petal-length<=2.45000 50 3.8466 0 (0.96226 0.018868 0.018868) *
 3) petal-length>2.45000 100 140.58 1 (0.0097087 0.49515 0.49515)
  6) petal-width<=1.75000 54 35.354 2 (0.017544 0.10526 0.87719)
   12) sepal-length<=7.10000 53 30.434 2 (0.017857 0.089286 0.89286)
    24) petal-width<=1.65000 51 24.944 2 (0.018519 0.074074 0.90741) *
    25) petal-width>1.65000 2 3.6652 1 (0.20000 0.40000 0.40000)
     50) sepal-width<=2.75000 1 1.3863 1 (0.25000 0.50000 0.25000) *
     51) sepal-width>2.75000 1 1.3863 2 (0.25000 0.25000 0.50000) *
   13) sepal-length>7.10000 1 1.3863 1 (0.25000 0.50000 0.25000) *
  7) petal-width>1.75000 46 12.083 1 (0.020408 0.93878 0.040816) *
```

The model has a .dot method, which returns a GraphViz textual representation of the decision tree. We render it to SVG using the kroki service.
```clojure
(kind/html
 (String. (:body (kroki (.dot tree-model) :graphviz :svg)) "UTF-8"))
```

20.4 :smile.classification/discrete-naive-bayes
| name | type | default | description |
|---|---|---|---|
| p | int32 | | |
| k | int32 | | |
| discrete-naive-bayes-model | keyword | | |
| sparse-column | keyword | | |
20.5 :smile.classification/fld
| name | type | default | description |
|---|---|---|---|
| dimension | int32 | -1 | The dimensionality of the mapped space. |
| tolerance | float64 | 1.0E-4 | A tolerance to decide if a covariance matrix is singular; it will reject variables whose variance is less than tol |
20.6 :smile.classification/gradient-tree-boost
| name | type | default | description |
|---|---|---|---|
| ntrees | int32 | 500 | number of iterations (trees) |
| max-depth | int32 | 20 | maximum depth of the tree |
| max-nodes | int32 | 6 | maximum number of leaf nodes in the tree |
| node-size | int32 | 5 | number of instances in a node below which the tree will not split, setting nodeSize = 5 generally gives good results |
| shrinkage | float64 | 0.05 | the shrinkage parameter in (0, 1] controls the learning rate of procedure |
| sampling-rate | float64 | 0.7 | the sampling fraction for stochastic tree boosting |
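Like the other models in this chapter, gradient tree boosting is trained through the generic metamorph interface. The following is a minimal sketch, assuming the iris dataset loaded earlier in this chapter; the chosen parameter values are illustrative, not recommendations:

```clojure
;; A sketch of training a gradient-tree-boost model on the iris dataset.
;; Assumes the `iris` dataset and the namespace aliases from the top of
;; this chapter (mm, ds-mm, ml).
(def gtb-pipe
  (mm/pipeline
   (ds-mm/set-inference-target :species)
   (ds-mm/categorical->number [:species])
   (ml/model {:model-type :smile.classification/gradient-tree-boost
              :ntrees 100       ;; fewer trees than the default 500
              :shrinkage 0.1})))  ;; a faster learning rate than the default 0.05

(def gtb-ctx (mm/fit-pipe iris gtb-pipe))
```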
20.7 :smile.classification/knn
| name | type | default | description |
|---|---|---|---|
| k | int32 | 3 | number of neighbors for decision |
In this example we use a k-NN model to classify some dummy data. The training data is this:
```clojure
(def df-knn
  (tc/dataset
   {:x1 [7 7 3 1], :x2 [7 4 4 4], :y [:bad :bad :good :good]}))

df-knn
```

_unnamed [4 3]:
| :x1 | :x2 | :y |
|---|---|---|
| 7 | 7 | :bad |
| 7 | 4 | :bad |
| 3 | 4 | :good |
| 1 | 4 | :good |
Then we construct a pipeline with the k-NN model, using 3 neighbors for decision.
```clojure
(def knn-pipe-fn
  (mm/pipeline
   (ds-mm/set-inference-target :y)
   (ds-mm/categorical->number [:y])
   (ml/model {:model-type :smile.classification/knn, :k 3})))
```

We run the pipeline in mode :fit:
```clojure
(def trained-ctx-knn
  (knn-pipe-fn #:metamorph{:data df-knn, :mode :fit}))
```

Then we run the pipeline in mode :transform with some test data, take the prediction, and convert it from numeric back to categorical:
```clojure
(-> trained-ctx-knn
    (merge
     #:metamorph{:data (tc/dataset {:x1 [3 5], :x2 [7 5], :y [nil nil]}),
                 :mode :transform})
    knn-pipe-fn
    :metamorph/data
    (ds-mod/column-values->categorical :y)
    seq)
;; => (:good :bad)
```

20.8 :smile.classification/linear-discriminant-analysis
| name | type | default | description |
|---|---|---|---|
| prioiri | float64-array | | The prior probability of each class. If null, it is estimated from the training data. |
| tolerance | float64 | 1.0E-4 | A tolerance to decide if a covariance matrix is singular; it will reject variables whose variance is less than tol |
20.9 :smile.classification/logistic-regression
| name | type | default | description |
|---|---|---|---|
| lambda | float64 | 0.1 | lambda > 0 gives a regularized estimate of linear weights which often has superior generalization performance, especially when the dimensionality is high |
| tolerance | float64 | 1.0E-5 | tolerance for stopping iterations |
| max-iterations | int32 | 500 | maximum number of iterations |
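For example, logistic regression can be trained on the breast-cancer dataset from the AdaBoost section with stronger regularization than the default. This is a sketch, assuming the dataset `df` and the pipeline pattern used earlier in this chapter; the parameter values are illustrative:

```clojure
;; A sketch of training a regularized logistic regression; assumes `df`
;; (the breast-cancer dataset) and the aliases mm, ds-mm, ml from above.
(def lr-pipe
  (mm/pipeline
   (ds-mm/set-inference-target :class)
   (ds-mm/categorical->number [:class])
   (ml/model {:model-type :smile.classification/logistic-regression
              :lambda 1.0           ;; stronger regularization than the default 0.1
              :max-iterations 1000})))

(def lr-ctx (mm/fit-pipe df lr-pipe))
```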
20.10 :smile.classification/maxent-binomial
20.11 :smile.classification/maxent-multinomial
20.12 :smile.classification/mlp
| name | type | default | description |
|---|---|---|---|
| layer-builders | seq | | Sequence of type smile.base.mlp.LayerBuilder describing the layers of the neural network |
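The :layer-builders option takes Smile's Java layer-builder objects directly. The following sketch is based on our reading of the Smile 2.6 smile.base.mlp API; treat the Java class and method names (Layer/rectifier, Layer/mle, OutputFunction/SOFTMAX) as assumptions to verify against the Smile javadoc:

```clojure
;; A sketch of an MLP for the three iris classes: one hidden ReLU layer
;; with 10 units and a softmax output layer. Assumes the `iris` dataset
;; and the aliases mm, ds-mm, ml from the top of this chapter.
(import '[smile.base.mlp Layer OutputFunction])

(def mlp-pipe
  (mm/pipeline
   (ds-mm/set-inference-target :species)
   (ds-mm/categorical->number [:species])
   (ml/model {:model-type :smile.classification/mlp
              :layer-builders [(Layer/rectifier 10)
                               (Layer/mle 3 OutputFunction/SOFTMAX)]})))
```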
20.13 :smile.classification/quadratic-discriminant-analysis
| name | type | default | description |
|---|---|---|---|
| prioiri | float64-array | | The prior probability of each class. If null, it is estimated from the training data. |
| tolerance | float64 | 1.0E-4 | A tolerance to decide if a covariance matrix is singular; it will reject variables whose variance is less than tol |
20.14 :smile.classification/random-forest
| name | type | default | description |
|---|---|---|---|
| trees | int32 | 500 | Number of trees |
| mtry | int32 | 0 | number of input variables to be used to determine the decision at a node of the tree. floor(sqrt(p)) generally gives good performance, where p is the number of variables |
| split-rule | keyword | gini | Decision tree split rule |
| max-depth | int32 | 20 | Maximum depth of tree |
| max-nodes | int32 | (dynamic: computed from the dataset at train time) | Maximum number of leaf nodes in the tree |
| node-size | int32 | 5 | number of instances in a node below which the tree will not split, nodeSize = 5 generally gives good results |
| sample-rate | float32 | 1.0 | the sampling rate for training tree. 1.0 means sampling with replacement. < 1.0 means sampling without replacement. |
| class-weight | string | | Priors of the classes. The weight of each class is roughly the ratio of samples in each class. For example, if there are 400 positive samples and 100 negative samples, the classWeight should be [1, 4] (assuming label 0 is negative and label 1 is positive) |
The following code plots the decision surfaces of the random forest model on pairs of features.
We use the Iris dataset for this.
```clojure
iris-std
```

https://vincentarelbundock.github.io/Rdatasets/csv/datasets/iris.csv [150 5]:
| :sepal-length | :sepal-width | :petal-length | :petal-width | :species |
|---|---|---|---|---|
| -0.89767388 | 1.01560199 | -1.33575163 | -1.31105215 | 0 |
| -1.13920048 | -0.13153881 | -1.33575163 | -1.31105215 | 0 |
| -1.38072709 | 0.32731751 | -1.39239929 | -1.31105215 | 0 |
| -1.50149039 | 0.09788935 | -1.27910398 | -1.31105215 | 0 |
| -1.01843718 | 1.24503015 | -1.33575163 | -1.31105215 | 0 |
| -0.53538397 | 1.93331463 | -1.16580868 | -1.04866679 | 0 |
| -1.50149039 | 0.78617383 | -1.33575163 | -1.17985947 | 0 |
| -1.01843718 | 0.78617383 | -1.27910398 | -1.31105215 | 0 |
| -1.74301699 | -0.36096697 | -1.33575163 | -1.31105215 | 0 |
| -1.13920048 | 0.09788935 | -1.27910398 | -1.44224482 | 0 |
| … | … | … | … | … |
| 1.27606556 | 0.09788935 | 0.93015445 | 1.18160871 | 1 |
| 1.03453895 | 0.09788935 | 1.04344975 | 1.57518674 | 1 |
| 1.27606556 | 0.09788935 | 0.76021149 | 1.44399406 | 1 |
| -0.05233076 | -0.81982329 | 0.76021149 | 0.91922335 | 1 |
| 1.15530226 | 0.32731751 | 1.21339271 | 1.44399406 | 1 |
| 1.03453895 | 0.55674567 | 1.10009740 | 1.70637941 | 1 |
| 1.03453895 | -0.13153881 | 0.81685914 | 1.44399406 | 1 |
| 0.55148575 | -1.27867961 | 0.70356384 | 0.91922335 | 1 |
| 0.79301235 | -0.13153881 | 0.81685914 | 1.05041603 | 1 |
| 0.43072244 | 0.78617383 | 0.93015445 | 1.44399406 | 1 |
| 0.06843254 | -0.13153881 | 0.76021149 | 0.78803068 | 1 |
The next function creates a Vega-Lite specification for the random forest decision surface for a given pair of column names.
```clojure
(def rf-pipe
  (make-iris-pipeline {:model-type :smile.classification/random-forest}))

(kind/vega-lite
 (surface-plot iris
               [:sepal-length :sepal-width]
               rf-pipe
               :smile.classification/random-forest))

(kind/vega-lite
 (surface-plot iris-std
               [:sepal-length :petal-length]
               rf-pipe
               :smile.classification/random-forest))

(kind/vega-lite
 (surface-plot iris-std
               [:sepal-length :petal-width]
               rf-pipe
               :smile.classification/random-forest))

(kind/vega-lite
 (surface-plot iris-std
               [:sepal-width :petal-length]
               rf-pipe
               :smile.classification/random-forest))

(kind/vega-lite
 (surface-plot iris-std
               [:sepal-width :petal-width]
               rf-pipe
               :smile.classification/random-forest))

(kind/vega-lite
 (surface-plot iris-std
               [:petal-length :petal-width]
               rf-pipe
               :smile.classification/random-forest))
```

20.15 :smile.classification/regularized-discriminant-analysis
| name | type | default | description |
|---|---|---|---|
| prioiri | float64-array | | The prior probability of each class. If null, it is estimated from the training data. |
| alpha | float64 | 0.9 | Regularization factor in [0, 1] allows a continuum of models between LDA and QDA. |
| tolerance | float64 | 1.0E-4 | A tolerance to decide if a covariance matrix is singular; it will reject variables whose variance is less than tol |
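Since alpha interpolates between LDA (alpha = 0) and QDA (alpha = 1), it is natural to treat it as a hyperparameter. A sketch, assuming the iris dataset and pipeline pattern from earlier in this chapter:

```clojure
;; A sketch of building RDA pipelines across the LDA <-> QDA continuum.
;; Assumes `iris` and the aliases mm, ds-mm, ml from the top of this chapter.
(defn rda-pipe [alpha]
  (mm/pipeline
   (ds-mm/set-inference-target :species)
   (ds-mm/categorical->number [:species])
   (ml/model {:model-type :smile.classification/regularized-discriminant-analysis
              :alpha alpha})))

;; one pipeline per regularization factor
(def rda-pipes (mapv rda-pipe [0.0 0.25 0.5 0.75 1.0]))
```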
20.16 :smile.classification/sparse-logistic-regression
| name | type | default | description |
|---|---|---|---|
| lambda | float32 | 0.1 | |
| tolerance | float32 | 1.0E-5 | |
| max-iterations | int32 | 500 | |
| sparse-column | key-string-symbol | | |
| n-sparse-columns | int32 | | |
20.17 :smile.classification/sparse-svm
| name | type | default | description |
|---|---|---|---|
| C | float32 | 1.0 | soft margin penalty parameter |
| tol | float32 | 1.0E-4 | tolerance of convergence test |
| sparse-column | keyword | | |
| p | int32 | | |
20.18 :smile.classification/svm
| name | type | default | description |
|---|---|---|---|
| C | float32 | 1.0 | soft margin penalty parameter |
| tol | float32 | 1.0E-4 | tolerance of convergence test |
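The SVM is used like any other model key. A sketch on the small dummy dataset from the k-NN section; the parameter values are illustrative, and note that (as in the other examples) the target is encoded numerically before training:

```clojure
;; A sketch of training an SVM on the `df-knn` dummy data from the k-NN
;; section. Assumes the aliases mm, ds-mm, ml from the top of this chapter.
(def svm-pipe
  (mm/pipeline
   (ds-mm/set-inference-target :y)
   (ds-mm/categorical->number [:y])
   (ml/model {:model-type :smile.classification/svm
              :C 10.0        ;; harder margin than the default 1.0
              :tol 1.0e-3})))

(def svm-ctx (mm/fit-pipe df-knn svm-pipe))
```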
21 Compare decision surfaces of different classification models
In the following we see the decision surfaces of several models on the same data from the Iris dataset, using the two columns :sepal-width and :sepal-length:
This illustrates nicely that different model types differ in their ability to separate, and therefore classify, the data.