20 Smile classification models reference

As discussed in the Machine Learning chapter, this book contains reference chapters for machine learning models that can be registered in metamorph.ml.

This specific chapter focuses on classification models of Smile version 2.6, which are wrapped by scicloj.ml.smile.

Note that this chapter requires scicloj.ml.smile as an additional dependency to Noj.

(ns noj-book.smile-classification
  (:require
   [noj-book.utils.example-code :refer [iris-std]]
   [noj-book.utils.render-tools :refer [render-key-info]]
   [noj-book.utils.surface-plot :refer [surface-plot]]
   [scicloj.kindly.v4.kind :as kind]
   [scicloj.metamorph.core :as mm]
   [scicloj.metamorph.ml :as ml]
   [scicloj.ml.smile.classification]
   [scicloj.ml.xgboost]
   [tech.v3.dataset.metamorph :as ds-mm]))

20.1 Smile classification models reference

In the following we have a list of all model keys of Smile classification models, including parameters. They can be used like this:

(comment
  (ml/train df
            {:model-type <model-key>
             :param-1 0
             :param-2 1}))

20.2 :smile.classification/ada-boost

javadoc

user guide

name	type	default	description
trees	int32	500	Number of trees
max-depth	int32	200	Maximum depth of the tree
max-nodes	int32	6	Maximum number of leaf nodes in the tree
node-size	int32	1	Number of instances in a node below which the tree will not split, setting nodeSize = 5 generally gives good results

In this example we will use the capability of the AdaBoost classifier to give us the importance of variables.

As data we take here the Wisconsin Breast Cancer dataset, which has 30 variables.

(def df (-> (datasets/breast-cancer-ds)))

(tc/column-names df)

(:mean-radius
 :mean-texture
 :mean-perimeter
 :mean-area
 :mean-smoothness
 :mean-compactness
 :mean-concavity
 :mean-concave-points
 :mean-symmetry
 :mean-fractal-dimension
 :radius-error
 :texture-error
 :perimeter-error
 :area-error
 :smoothness-error
 :compactness-error
 :concavity-error
 :concave-points-error
 :symmetry-error
 :fractal-dimension-error
 :worst-radius
 :worst-texture
 :worst-perimeter
 :worst-area
 :worst-smoothness
 :worst-compactness
 :worst-concavity
 :worst-concave-points
 :worst-symmetry
 :worst-fractal-dimension
 :class)

To get an overview of the dataset, we print its summary:

(-> df tc/info)

https://vincentarelbundock.github.io/Rdatasets/csv/dslabs/brca.csv: descriptive-stats [31 12]:

:col-name	:datatype	:n-valid	:min	:mean	:mode	:max	:standard-deviation	:skew	:first	:last
:mean-radius	:float64	569	6.9810000	14.12729174		28.11000	3.52404883	0.94237957	13.540000	20.600000
:mean-texture	:float64	569	9.7100000	19.28964851		39.28000	4.30103577	0.65044954	14.360000	29.330000
:mean-perimeter	:float64	569	43.7900000	91.96903339		188.50000	24.29898104	0.99065043	87.460000	140.100000
:mean-area	:float64	569	143.5000000	654.88910369		2501.00000	351.91412918	1.64573218	566.300000	1265.000000
:mean-smoothness	:float64	569	0.0526300	0.09636028		0.16340	0.01406413	0.45632376	0.097790	0.117800
:mean-compactness	:float64	569	0.0193800	0.10434098		0.34540	0.05281276	1.19012303	0.081290	0.277000
:mean-concavity	:float64	569	0.0000000	0.08879932		0.42680	0.07971981	1.40117974	0.066640	0.351400
:mean-concave-points	:float64	569	0.0000000	0.04891915		0.20120	0.03880284	1.17118008	0.047810	0.152000
:mean-symmetry	:float64	569	0.1060000	0.18116186		0.30400	0.02741428	0.72560897	0.188500	0.239700
:mean-fractal-dimension	:float64	569	0.0499600	0.06279761		0.09744	0.00706036	1.30448881	0.057660	0.070160
:radius-error	:float64	569	0.1115000	0.40517206		2.87300	0.27731273	3.08861217	0.269900	0.726000
:texture-error	:float64	569	0.3602000	1.21685343		4.88500	0.55164839	1.64644381	0.788600	1.595000
:perimeter-error	:float64	569	0.7570000	2.86605923		21.98000	2.02185455	3.44361520	2.058000	5.772000
:area-error	:float64	569	6.8020000	40.33707909		542.20000	45.49100552	5.44718628	23.560000	86.220000
:smoothness-error	:float64	569	0.0017130	0.00704098		0.03113	0.00300252	2.31445006	0.008462	0.006522
:compactness-error	:float64	569	0.0022520	0.02547814		0.13540	0.01790818	1.90222071	0.014600	0.061580
:concavity-error	:float64	569	0.0000000	0.03189372		0.39600	0.03018606	5.11046305	0.023870	0.071170
:concave-points-error	:float64	569	0.0000000	0.01179614		0.05279	0.00617029	1.44467814	0.013150	0.016640
:symmetry-error	:float64	569	0.0078820	0.02054230		0.07895	0.00826637	2.19513290	0.019800	0.023240
:fractal-dimension-error	:float64	569	0.0008948	0.00379490		0.02984	0.00264607	3.92396862	0.002300	0.006185
:worst-radius	:float64	569	7.9300000	16.26918981		36.04000	4.83324158	1.10311521	15.110000	25.740000
:worst-texture	:float64	569	12.0200000	25.67722320		49.54000	6.14625762	0.49832131	19.260000	39.420000
:worst-perimeter	:float64	569	50.4100000	107.26121265		251.20000	33.60254227	1.12816387	99.700000	184.600000
:worst-area	:float64	569	185.2000000	880.58312830		4254.00000	569.35699267	1.85937327	711.200000	1821.000000
:worst-smoothness	:float64	569	0.0711700	0.13236859		0.22260	0.02283243	0.41542600	0.144000	0.165000
:worst-compactness	:float64	569	0.0272900	0.25426504		1.05800	0.15733649	1.47355490	0.177300	0.868100
:worst-concavity	:float64	569	0.0000000	0.27218848		1.25200	0.20862428	1.15023682	0.239000	0.938700
:worst-concave-points	:float64	569	0.0000000	0.11460622		0.29100	0.06573234	0.49261553	0.128800	0.265000
:worst-symmetry	:float64	569	0.1565000	0.29007557		0.66380	0.06186747	1.43392777	0.297700	0.408700
:worst-fractal-dimension	:float64	569	0.0550400	0.08394582		0.20750	0.01806127	1.66257927	0.072590	0.124000
:class	:int16	569			0				0.000000	1.000000

Then we create a metamorph pipeline with the AdaBoost model:

(def ada-pipe-fn
 (mm/pipeline
   (ds-mm/set-inference-target :class)
   (ds-mm/categorical->number [:class])
   (ml/model {:model-type :smile.classification/ada-boost})))

We run the pipeline in :fit. As we just explore the data, no train/test split is needed.

(def trained-ctx (mm/fit-pipe df ada-pipe-fn))

Next we take the model out of the pipeline:

(def model (-> trained-ctx vals (nth 2) ml/thaw-model))

The variable importance can be obtained from the trained model,

(def var-importances
 (mapv
   #(hash-map :variable %1 :importance %2)
   (map #(first (.variables %)) (.. model formula predictors))
   (.importance model)))

var-importances

[{:variable "mean-radius", :importance 29.065369497534117}
 {:variable "mean-texture", :importance 38.787135865841286}
 {:variable "mean-perimeter", :importance 4.276342216959478}
 {:variable "mean-area", :importance 6.49083987169664}
 {:variable "mean-smoothness", :importance 17.13100474281572}
 {:variable "mean-compactness", :importance 11.039539086184842}
 {:variable "mean-concavity", :importance 11.078034237258203}
 {:variable "mean-concave-points", :importance 20.76804690217321}
 {:variable "mean-symmetry", :importance 12.948995467737223}
 {:variable "mean-fractal-dimension", :importance 9.961748451892227}
 {:variable "radius-error", :importance 11.287573604717792}
 {:variable "texture-error", :importance 9.931347675503154}
 {:variable "perimeter-error", :importance 12.763864109009916}
 {:variable "area-error", :importance 14.530125703392487}
 {:variable "smoothness-error", :importance 12.358806749040111}
 {:variable "compactness-error", :importance 13.916811329262362}
 {:variable "concavity-error", :importance 5.217113039229697}
 {:variable "concave-points-error", :importance 9.379969518785954}
 {:variable "symmetry-error", :importance 7.420769656049224}
 {:variable "fractal-dimension-error", :importance 15.550420136987462}
 {:variable "worst-radius", :importance 10.851618262760901}
 {:variable "worst-texture", :importance 28.18262416245342}
 {:variable "worst-perimeter", :importance 10.321125896155744}
 {:variable "worst-area", :importance 11.199088024427994}
 {:variable "worst-smoothness", :importance 17.582237702011934}
 {:variable "worst-compactness", :importance 5.625785824349872}
 {:variable "worst-concavity", :importance 15.561260423064136}
 {:variable "worst-concave-points", :importance 16.104322818468752}
 {:variable "worst-symmetry", :importance 11.404577675521441}
 {:variable "worst-fractal-dimension", :importance 7.837453382125137}]

and we plot the variables:

(kind/vega-lite
  {:data {:values var-importances},
   :width 800,
   :height 500,
   :mark {:type "bar"},
   :encoding
   {:x {:field :variable, :type "nominal", :sort "-y"},
    :y {:field :importance, :type "quantitative"}}})

20.3 :smile.classification/decision-tree

javadoc

user guide

name	type	default	description
max-nodes	int32	100	maximum number of leaf nodes in the tree
node-size	int32	1	minimum size of leaf nodes
max-depth	int32	20	maximum depth of the tree
split-rule	keyword	gini	the splitting rule

A decision tree learns a set of rules from the data in the form of a tree, that we will plot in this example. We use the iris dataset:

(def iris (datasets/iris-ds))

iris

https://vincentarelbundock.github.io/Rdatasets/csv/datasets/iris.csv [150 5]:

:sepal-length	:sepal-width	:petal-length	:petal-width	:species
5.1	3.5	1.4	0.2	0
4.9	3.0	1.4	0.2	0
4.7	3.2	1.3	0.2	0
4.6	3.1	1.5	0.2	0
5.0	3.6	1.4	0.2	0
5.4	3.9	1.7	0.4	0
4.6	3.4	1.4	0.3	0
5.0	3.4	1.5	0.2	0
4.4	2.9	1.4	0.2	0
4.9	3.1	1.5	0.1	0
…	…	…	…	…
6.9	3.1	5.4	2.1	1
6.7	3.1	5.6	2.4	1
6.9	3.1	5.1	2.3	1
5.8	2.7	5.1	1.9	1
6.8	3.2	5.9	2.3	1
6.7	3.3	5.7	2.5	1
6.7	3.0	5.2	2.3	1
6.3	2.5	5.0	1.9	1
6.5	3.0	5.2	2.0	1
6.2	3.4	5.4	2.3	1
5.9	3.0	5.1	1.8	1

We make a pipe only containing the model, as the dataset is ready to be used by metamorph.ml.

(def trained-pipe-tree
 (mm/fit-pipe
   iris
   (mm/pipeline
     #:metamorph{:id :model}
     (ml/model {:model-type :smile.classification/decision-tree}))))

We extract the Java object of the trained model.

(def tree-model (-> trained-pipe-tree :model ml/thaw-model))

tree-model

#object[smile.classification.DecisionTree 0x75a9e37b "n=150\nnode), split, n, loss, yval, (yprob)\n* denotes terminal node\n1) root 150 329.58 0 (0.33333 0.33333 0.33333)\n 2) petal-length<=2.45000 50 3.8466 0 (0.96226 0.018868 0.018868) *\n 3) petal-length>2.45000 100 140.58 1 (0.0097087 0.49515 0.49515)\n  6) petal-width<=1.75000 54 35.354 2 (0.017544 0.10526 0.87719)\n   12) sepal-length<=7.10000 53 30.434 2 (0.017857 0.089286 0.89286)\n    24) petal-width<=1.65000 51 24.944 2 (0.018519 0.074074 0.90741) *\n    25) petal-width>1.65000 2 3.6652 1 (0.20000 0.40000 0.40000)\n     50) sepal-width<=2.75000 1 1.3863 1 (0.25000 0.50000 0.25000) *\n     51) sepal-width>2.75000 1 1.3863 2 (0.25000 0.25000 0.50000) *\n   13) sepal-length>7.10000 1 1.3863 1 (0.25000 0.50000 0.25000) *\n  7) petal-width>1.75000 46 12.083 1 (0.020408 0.93878 0.040816) *"]

The model has a .dot function, which returns a GraphViz textual representation of the decision tree. We render to svg using the kroki service.

(kind/html
  (String. (:body (kroki (.dot tree-model) :graphviz :svg)) "UTF-8"))

20.4 :smile.classification/discrete-naive-bayes

javadoc

user guide

name	type	default	description
p	int32
k	int32
discrete-naive-bayes-model	keyword
sparse-column	keyword

20.5 :smile.classification/fld

javadoc

user guide

name	type	default	description
dimension	int32	-1.0	The dimensionality of mapped space.
tolerance	float64	1.0E-4	A tolerance to decide if a covariance matrix is singular; it will reject variables whose variance is less than tol

20.6 :smile.classification/gradient-tree-boost

javadoc

user guide

name	type	default	description
ntrees	int32	500.0	number of iterations (trees)
max-depth	int32	20.0	maximum depth of the tree
max-nodes	int32	6.0	maximum number of leaf nodes in the tree
node-size	int32	5.0	number of instances in a node below which the tree will not split, setting nodeSize = 5 generally gives good results
shrinkage	float64	0.05	the shrinkage parameter in (0, 1] controls the learning rate of procedure
sampling-rate	float64	0.7	the sampling fraction for stochastic tree boosting

20.7 :smile.classification/knn

javadoc

user guide

name	type	default	description
k	int32	3	number of neighbors for decision

In this example we use a k-NN model to classify some dummy data. The training data is this:

(def df-knn
 (tc/dataset
   {:x1 [7 7 3 1], :x2 [7 4 4 4], :y [:bad :bad :good :good]}))

df-knn

_unnamed [4 3]:

:x1	:x2	:y
7	7	:bad
7	4	:bad
3	4	:good
1	4	:good

Then we construct a pipeline with the k-NN model, using 3 neighbors for decision.

(def knn-pipe-fn
 (mm/pipeline
   (ds-mm/set-inference-target :y)
   (ds-mm/categorical->number [:y])
   (ml/model {:model-type :smile.classification/knn, :k 3})))

We run the pipeline in mode :fit:

(def trained-ctx-knn
 (knn-pipe-fn #:metamorph{:data df-knn, :mode :fit}))

Then we run the pipeline in mode :transform with some test data, take the prediction, and convert it from numeric into categorical:

(-> trained-ctx-knn
 (merge
   #:metamorph{:data (tc/dataset {:x1 [3 5], :x2 [7 5], :y [nil nil]}),
               :mode :transform})
 knn-pipe-fn
 :metamorph/data
 (ds-mod/column-values->categorical :y)
 seq)

(:good :bad)

20.8 :smile.classification/linear-discriminant-analysis

javadoc

user guide

name	type	default	description
prioiri	float64-array		The priori probability of each class. If null, it will be estimated from the training data.
tolerance	float64	1.0E-4	A tolerance to decide if a covariance matrix is singular; it will reject variables whose variance is less than tol

20.9 :smile.classification/logistic-regression

javadoc

user guide

name	type	default	description
lambda	float64	0.1	lambda > 0 gives a regularized estimate of linear weights which often has superior generalization performance, especially when the dimensionality is high
tolerance	float64	1.0E-5	tolerance for stopping iterations
max-iterations	int32	500.0	maximum number of iterations

20.10 :smile.classification/maxent-binomial

20.11 :smile.classification/maxent-multinomial

20.12 :smile.classification/mlp

javadoc

user guide

name	type	default	description
layer-builders	seq	`[]`	Sequence of type smile.base.mlp.LayerBuilder describing the layers of the neural network

20.13 :smile.classification/quadratic-discriminant-analysis

javadoc

user guide

name	type	default	description
prioiri	float64-array		The priori probability of each class. If null, it will be estimated from the training data.
tolerance	float64	1.0E-4	A tolerance to decide if a covariance matrix is singular; it will reject variables whose variance is less than tol

20.14 :smile.classification/random-forest

javadoc

user guide

name	type	default	description
trees	int32	500	Number of trees
mtry	int32	0	number of input variables to be used to determine the decision at a node of the tree. floor(sqrt(p)) generally gives good performance, where p is the number of variables
split-rule	keyword	gini	Decision tree split rule
max-depth	int32	20	Maximum depth of tree
max-nodes	int32	scicloj.ml.smile.classification$fn__67531@56c93174	Maximum number of leaf nodes in the tree
node-size	int32	5	number of instances in a node below which the tree will not split, nodeSize = 5 generally gives good results
sample-rate	float32	1.0	the sampling rate for training tree. 1.0 means sampling with replacement. < 1.0 means sampling without replacement.
class-weight	string		Priors of the classes. The weight of each class is roughly the ratio of samples in each class. For example, if there are 400 positive samples and 100 negative samples, the classWeight should be [1, 4] (assuming label 0 is of negative, label 1 is of positive)

The following code plots the decision surfaces of the random forest model on pairs of features.

We use the Iris dataset for this.

iris-std

https://vincentarelbundock.github.io/Rdatasets/csv/datasets/iris.csv [150 5]:

:sepal-length	:sepal-width	:petal-length	:petal-width	:species
-0.89767388	1.01560199	-1.33575163	-1.31105215	0
-1.13920048	-0.13153881	-1.33575163	-1.31105215	0
-1.38072709	0.32731751	-1.39239929	-1.31105215	0
-1.50149039	0.09788935	-1.27910398	-1.31105215	0
-1.01843718	1.24503015	-1.33575163	-1.31105215	0
-0.53538397	1.93331463	-1.16580868	-1.04866679	0
-1.50149039	0.78617383	-1.33575163	-1.17985947	0
-1.01843718	0.78617383	-1.27910398	-1.31105215	0
-1.74301699	-0.36096697	-1.33575163	-1.31105215	0
-1.13920048	0.09788935	-1.27910398	-1.44224482	0
…	…	…	…	…
1.27606556	0.09788935	0.93015445	1.18160871	1
1.03453895	0.09788935	1.04344975	1.57518674	1
1.27606556	0.09788935	0.76021149	1.44399406	1
-0.05233076	-0.81982329	0.76021149	0.91922335	1
1.15530226	0.32731751	1.21339271	1.44399406	1
1.03453895	0.55674567	1.10009740	1.70637941	1
1.03453895	-0.13153881	0.81685914	1.44399406	1
0.55148575	-1.27867961	0.70356384	0.91922335	1
0.79301235	-0.13153881	0.81685914	1.05041603	1
0.43072244	0.78617383	0.93015445	1.44399406	1
0.06843254	-0.13153881	0.76021149	0.78803068	1

The next function creates a Vega-Lite specification for the random forest decision surface for a given pair of column names.

#'noj-book.utils.example-code/make-iris-pipeline

(def rf-pipe
 (make-iris-pipeline {:model-type :smile.classification/random-forest}))

#'noj-book.utils.example-code/iris

(kind/vega-lite
  (surface-plot
    iris
    [:sepal-length :sepal-width]
    rf-pipe
    :smile.classification/random-forest))

(kind/vega-lite
  (surface-plot
    iris-std
    [:sepal-length :petal-length]
    rf-pipe
    :smile.classification/random-forest))

(kind/vega-lite
  (surface-plot
    iris-std
    [:sepal-length :petal-width]
    rf-pipe
    :smile.classification/random-forest))

(kind/vega-lite
  (surface-plot
    iris-std
    [:sepal-width :petal-length]
    rf-pipe
    :smile.classification/random-forest))

(kind/vega-lite
  (surface-plot
    iris-std
    [:sepal-width :petal-width]
    rf-pipe
    :smile.classification/random-forest))

(kind/vega-lite
  (surface-plot
    iris-std
    [:petal-length :petal-width]
    rf-pipe
    :smile.classification/random-forest))

20.15 :smile.classification/regularized-discriminant-analysis

javadoc

user guide

name	type	default	description
prioiri	float64-array		The priori probability of each class. If null, it will be estimated from the training data.
alpha	float64	0.9	Regularization factor in [0, 1] allows a continuum of models between LDA and QDA.
tolerance	float64	1.0E-4	A tolerance to decide if a covariance matrix is singular; it will reject variables whose variance is less than tol

20.16 :smile.classification/sparse-logistic-regression

name	type	default
lambda	float32	0.1
tolerance	float32	1.0E-5
max-iterations	int32	500.0
sparse-column	key-string-symbol
n-sparse-columns	int32

20.17 :smile.classification/sparse-svm

javadoc

user guide

name	type	default	description
C	float32	1.0	soft margin penalty parameter
tol	float32	1.0E-4	tolerance of convergence test
sparse-column	keyword
p	int32

20.18 :smile.classification/svm

javadoc

user guide

name	type	default	description
C	float32	1.0	soft margin penalty parameter
tol	float32	1.0E-4	tolerance of convergence test

21 Compare decision surfaces of different classification models

In the following we see the decision surfaces of some models on the same data from the Iris dataset using 2 columns :sepal-width and :sepal-length:

[

]

This shows nicely that different model types have different capabilities to separate and therefore classify data.

source: notebooks/noj_book/smile_classification.clj