24  Sklearn model reference

As discussed in the Machine Learning chapter, this book contains reference chapters for machine learning models that can be registered in metamorph.ml.

This specific chapter focuses on the models of the scikit-learn Python library, which is wrapped by sklearn-clj.

(ns noj-book.sklearn-reference
  (:require
   [clojure.pprint]
   [noj-book.utils.render-tools :refer [render-key-info]]
   [scicloj.kindly.v4.kind :as kind]
   [scicloj.metamorph.core :as mm]
   [scicloj.metamorph.ml :as ml]
   [tech.v3.dataset.tensor :as dst]
   [libpython-clj2.python :refer [py.- ->jvm]]
   [tech.v3.dataset.metamorph :as ds-mm]
   [noj-book.utils.render-tools-sklearn]
   [scicloj.sklearn-clj.ml]))

24.1 Sklearn model reference

Below we find all sklearn models with their parameters and the original documentation.

The parameters are given as Clojure keys in kebab-case. As the documentation texts are imported from Python, they refer to the Python spelling of the parameters.

But the translation between the two should be obvious.

Example: logistic regression

(def ds (dst/tensor->dataset [[0 0 0] [1 1 1] [2 2 2]]))

Make a pipeline with the sklearn model ‘logistic-regression’:

(def pipe
  (mm/pipeline
   (ds-mm/set-inference-target 2)
   {:metamorph/id :model}
   (ml/model {:model-type :sklearn.classification/logistic-regression
              :max-iter 100})))

Train model:

(def fitted-ctx
  (pipe {:metamorph/data ds
         :metamorph/mode :fit}))

Predict on new data:

(->
 (mm/transform-pipe
  (dst/tensor->dataset [[3 4 5]])
  pipe
  fitted-ctx)
 :metamorph/data)

:_unnamed [1 3]:

0 1 2
0.00725794 0.10454345 2.0

Access model details via Python interop (using libpython-clj):

(-> fitted-ctx :model :model-data :model
    (py.- coef_)
    (->jvm))
#tech.v3.tensor<float64>[3 2]
[[   -0.4807    -0.4807]
 [-2.061E-05 -2.061E-05]
 [    0.4807     0.4807]]

All model attributes are also included in the context.

(def model-attributes
  (-> fitted-ctx :model :model-data :attributes))
(kind/hiccup
 [:dl (map
       (fn [[k v]]
         [:span
          (vector :dt k)
          (vector :dd  (clojure.pprint/write v :stream nil))])
       model-attributes)])
n_features_in_: 2
coef_: [[-4.80679547e-01 -4.80679547e-01] [-2.06085772e-05 -2.06085772e-05] [ 4.80700156e-01 4.80700156e-01]]
intercept_: [ 0.87322115 0.17611579 -1.04933694]
n_iter_: [11]
classes_: [0. 1. 2.]
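
The attributes map can also be inspected directly from Clojure; for example (a small sketch, not evaluated here):

(keys model-attributes)
;; the keys mirror the Python attribute names rendered above,
;; e.g. coef_, intercept_, n_iter_ and classes_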

24.2 :sklearn.classification models

24.2.1 /ada-boost-classifier

Clojure option keys: estimator, learning-rate, n-estimators, random-state, predict-proba?

An AdaBoost classifier.

An AdaBoost [1] classifier is a meta-estimator that begins by fitting a classifier on the original dataset and then fits additional copies of the classifier on the same dataset but where the weights of incorrectly classified instances are adjusted such that subsequent classifiers focus more on difficult cases.

This class implements the algorithm based on [2].

Read more in the User Guide: adaboost.

Added in 0.14

Parameters

  • estimator: object, default=None The base estimator from which the boosted ensemble is built. Support for sample weighting is required, as well as proper classes_ and n_classes_ attributes. If None, then the base estimator is ~sklearn.tree.DecisionTreeClassifier initialized with max_depth=1.

    Added in 1.2 base_estimator was renamed to estimator.

  • n_estimators: int, default=50 The maximum number of estimators at which boosting is terminated. In case of perfect fit, the learning procedure is stopped early. Values must be in the range [1, inf).

  • learning_rate: float, default=1.0 Weight applied to each classifier at each boosting iteration. A higher learning rate increases the contribution of each classifier. There is a trade-off between the learning_rate and n_estimators parameters. Values must be in the range (0.0, inf).

  • random_state: int, RandomState instance or None, default=None Controls the random seed given at each estimator at each boosting iteration. Thus, it is only used when estimator exposes a random_state. Pass an int for reproducible output across multiple function calls. See the Glossary.

Attributes

  • estimator_: estimator The base estimator from which the ensemble is grown.

    Added in 1.2 base_estimator_ was renamed to estimator_.

  • estimators_: list of classifiers The collection of fitted sub-estimators.

  • classes_: ndarray of shape (n_classes,) The classes labels.

  • n_classes_: int The number of classes.

  • estimator_weights_: ndarray of floats Weights for each estimator in the boosted ensemble.

  • estimator_errors_: ndarray of floats Classification error for each estimator in the boosted ensemble.

  • feature_importances_: ndarray of shape (n_features,) The impurity-based feature importances if supported by the estimator (when based on decision trees).

    Warning: impurity-based feature importances can be misleading for high cardinality features (many unique values). See sklearn.inspection.permutation_importance as an alternative.

  • n_features_in_: int Number of features seen during fit.

    Added in 0.24

  • feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen during fit. Defined only when X has feature names that are all strings.

    Added in 1.0

See Also

  • AdaBoostRegressor: An AdaBoost regressor that begins by fitting a regressor on the original dataset and then fits additional copies of the regressor on the same dataset but where the weights of instances are adjusted according to the error of the current prediction.

  • GradientBoostingClassifier: GB builds an additive model in a forward stage-wise fashion. Regression trees are fit on the negative gradient of the binomial or multinomial deviance loss function. Binary classification is a special case where only a single regression tree is induced.

  • sklearn.tree.DecisionTreeClassifier: A non-parametric supervised learning method used for classification. Creates a model that predicts the value of a target variable by learning simple decision rules inferred from the data features.

References

  • [1] Y. Freund, R. Schapire, "A Decision-Theoretic Generalization of on-Line Learning and an Application to Boosting", 1995.

  • [2] J. Zhu, H. Zou, S. Rosset, T. Hastie, "Multi-class adaboost." Statistics and its Interface 2.3 (2009): 349-360. doi:10.4310/SII.2009.v2.n3.a8

Examples

from sklearn.ensemble import AdaBoostClassifier
from sklearn.datasets import make_classification
X, y = make_classification(n_samples=1000, n_features=4,
                           n_informative=2, n_redundant=0,
                           random_state=0, shuffle=False)
clf = AdaBoostClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)
AdaBoostClassifier(n_estimators=100, random_state=0)
clf.predict([[0, 0, 0, 0]])
array([1])
clf.score(X, y)
0.96

For a detailed example of using AdaBoost to fit a sequence of DecisionTrees as weak learners, please refer to :ref:sphx_glr_auto_examples_ensemble_plot_adaboost_multiclass.py.

For a detailed example of using AdaBoost to fit a non-linearly separable classification dataset composed of two Gaussian quantiles clusters, please refer to :ref:sphx_glr_auto_examples_ensemble_plot_adaboost_twoclass.py.
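
The same model is available through metamorph.ml under the model type shown in this section's heading. By analogy with the logistic regression example at the top of the chapter, a minimal pipeline sketch over the toy dataset ds (not evaluated here; the option keys are the kebab-case parameters listed at the start of this section):

(def ada-pipe
  (mm/pipeline
   (ds-mm/set-inference-target 2)
   {:metamorph/id :model}
   (ml/model {:model-type :sklearn.classification/ada-boost-classifier
              :n-estimators 100
              :random-state 0})))

(def ada-fitted-ctx
  (ada-pipe {:metamorph/data ds
             :metamorph/mode :fit}))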



24.2.2 /bagging-classifier

Clojure option keys: bootstrap, bootstrap-features, n-jobs, random-state, estimator, oob-score, max-features, warm-start, n-estimators, max-samples, verbose, predict-proba?

A Bagging classifier.

A Bagging classifier is an ensemble meta-estimator that fits base classifiers each on random subsets of the original dataset and then aggregates their individual predictions (either by voting or by averaging) to form a final prediction. Such a meta-estimator can typically be used as a way to reduce the variance of a black-box estimator (e.g., a decision tree), by introducing randomization into its construction procedure and then making an ensemble out of it.

This algorithm encompasses several works from the literature. When random subsets of the dataset are drawn as random subsets of the samples, then this algorithm is known as Pasting [1]. If samples are drawn with replacement, then the method is known as Bagging [2]. When random subsets of the dataset are drawn as random subsets of the features, then the method is known as Random Subspaces [3]. Finally, when base estimators are built on subsets of both samples and features, then the method is known as Random Patches [4].

Read more in the User Guide: bagging.

Added in 0.15

Parameters

  • estimator: object, default=None The base estimator to fit on random subsets of the dataset. If None, then the base estimator is a ~sklearn.tree.DecisionTreeClassifier.

    Added in 1.2 base_estimator was renamed to estimator.

  • n_estimators: int, default=10 The number of base estimators in the ensemble.

  • max_samples: int or float, default=None The number of samples to draw from X to train each base estimator (with replacement by default, see bootstrap for more details).

    • If None, then draw X.shape[0] samples irrespective of sample_weight.
    • If int, then draw max_samples samples.
    • If float, then draw max_samples * X.shape[0] unweighted samples or max_samples * sample_weight.sum() weighted samples.
  • max_features: int or float, default=1.0 The number of features to draw from X to train each base estimator ( without replacement by default, see bootstrap_features for more details).

    • If int, then draw max_features features.
    • If float, then draw max(1, int(max_features * n_features_in_)) features.
  • bootstrap: bool, default=True Whether samples are drawn with replacement. If False, sampling without replacement is performed. If fitting with sample_weight, it is strongly recommended to choose True, as only drawing with replacement will ensure the expected frequency semantics of sample_weight.

  • bootstrap_features: bool, default=False Whether features are drawn with replacement.

  • oob_score: bool, default=False Whether to use out-of-bag samples to estimate the generalization error. Only available if bootstrap=True.

  • warm_start: bool, default=False When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just fit a whole new ensemble. See the Glossary.

    Added in 0.17 warm_start constructor parameter.

  • n_jobs: int, default=None The number of jobs to run in parallel for both fit and predict. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.

  • random_state: int, RandomState instance or None, default=None Controls the random resampling of the original dataset (sample wise and feature wise). If the base estimator accepts a random_state attribute, a different seed is generated for each instance in the ensemble. Pass an int for reproducible output across multiple function calls. See the Glossary.

  • verbose: int, default=0 Controls the verbosity when fitting and predicting.

Attributes

  • estimator_: estimator The base estimator from which the ensemble is grown.

    Added in 1.2 base_estimator_ was renamed to estimator_.

  • n_features_in_: int Number of features seen during fit.

    Added in 0.24

  • feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen during fit. Defined only when X has feature names that are all strings.

    Added in 1.0

  • estimators_: list of estimators The collection of fitted base estimators.

  • estimators_samples_: list of arrays The subset of drawn samples (i.e., the in-bag samples) for each base estimator. Each subset is defined by an array of the indices selected.

  • estimators_features_: list of arrays The subset of drawn features for each base estimator.

  • classes_: ndarray of shape (n_classes,) The classes labels.

  • n_classes_: int or list The number of classes.

  • oob_score_: float Score of the training dataset obtained using an out-of-bag estimate. This attribute exists only when oob_score is True.

  • oob_decision_function_: ndarray of shape (n_samples, n_classes) Decision function computed with out-of-bag estimate on the training set. If n_estimators is small it might be possible that a data point was never left out during the bootstrap. In this case, oob_decision_function_ might contain NaN. This attribute exists only when oob_score is True.

See Also

  • BaggingRegressor: A Bagging regressor.

References

  • [1] L. Breiman, "Pasting small votes for classification in large databases and on-line", Machine Learning, 36(1), 85-103, 1999.

  • [2] L. Breiman, "Bagging predictors", Machine Learning, 24(2), 123-140, 1996.

  • [3] T. Ho, "The random subspace method for constructing decision forests", Pattern Analysis and Machine Intelligence, 20(8), 832-844, 1998.

  • [4] G. Louppe and P. Geurts, "Ensembles on Random Patches", Machine Learning and Knowledge Discovery in Databases, 346-361, 2012.

Examples

from sklearn.svm import SVC
from sklearn.ensemble import BaggingClassifier
from sklearn.datasets import make_classification
X, y = make_classification(n_samples=100, n_features=4,
                           n_informative=2, n_redundant=0,
                           random_state=0, shuffle=False)
clf = BaggingClassifier(estimator=SVC(),
                        n_estimators=10, random_state=0).fit(X, y)
clf.predict([[0, 0, 0, 0]])
array([1])
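
Analogously, a metamorph.ml model step for the bagging classifier might look like this (a sketch only, using the kebab-case option keys listed above):

(ml/model {:model-type :sklearn.classification/bagging-classifier
           :n-estimators 10
           :random-state 0})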


24.2.3 /bernoulli-nb

Clojure option keys: alpha, binarize, class-prior, fit-prior, force-alpha, predict-proba?

Naive Bayes classifier for multivariate Bernoulli models.

Like MultinomialNB, this classifier is suitable for discrete data. The difference is that while MultinomialNB works with occurrence counts, BernoulliNB is designed for binary/boolean features.

Read more in the User Guide: bernoulli_naive_bayes.

Parameters

  • alpha: float or array-like of shape (n_features,), default=1.0 Additive (Laplace/Lidstone) smoothing parameter (set alpha=0 and force_alpha=True, for no smoothing).

  • force_alpha: bool, default=True If False and alpha is less than 1e-10, it will set alpha to 1e-10. If True, alpha will remain unchanged. This may cause numerical errors if alpha is too close to 0.

    Added in 1.2 Changed in 1.4 The default value of force_alpha changed to True.

  • binarize: float or None, default=0.0 Threshold for binarizing (mapping to booleans) of sample features. If None, input is presumed to already consist of binary vectors.

  • fit_prior: bool, default=True Whether to learn class prior probabilities or not. If false, a uniform prior will be used.

  • class_prior: array-like of shape (n_classes,), default=None Prior probabilities of the classes. If specified, the priors are not adjusted according to the data.

Attributes

  • class_count_: ndarray of shape (n_classes,) Number of samples encountered for each class during fitting. This value is weighted by the sample weight when provided.

  • class_log_prior_: ndarray of shape (n_classes,) Log probability of each class (smoothed).

  • classes_: ndarray of shape (n_classes,) Class labels known to the classifier

  • feature_count_: ndarray of shape (n_classes, n_features) Number of samples encountered for each (class, feature) during fitting. This value is weighted by the sample weight when provided.

  • feature_log_prob_: ndarray of shape (n_classes, n_features) Empirical log probability of features given a class, P(x_i|y).

  • n_features_in_: int Number of features seen during fit.

    Added in 0.24

  • feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen during fit. Defined only when X has feature names that are all strings.

    Added in 1.0

See Also

  • CategoricalNB: Naive Bayes classifier for categorical features.
  • ComplementNB: The Complement Naive Bayes classifier described in Rennie et al. (2003).
  • GaussianNB: Gaussian Naive Bayes (GaussianNB).
  • MultinomialNB: Naive Bayes classifier for multinomial models.

References

C.D. Manning, P. Raghavan and H. Schuetze (2008). Introduction to Information Retrieval. Cambridge University Press, pp. 234-265. https://nlp.stanford.edu/IR-book/html/htmledition/the-bernoulli-model-1.html

A. McCallum and K. Nigam (1998). A comparison of event models for naive Bayes text classification. Proc. AAAI/ICML-98 Workshop on Learning for Text Categorization, pp. 41-48.

V. Metsis, I. Androutsopoulos and G. Paliouras (2006). Spam filtering with naive Bayes -- Which naive Bayes? 3rd Conf. on Email and Anti-Spam (CEAS).

Examples

import numpy as np
rng = np.random.RandomState(1)
X = rng.randint(5, size=(6, 100))
Y = np.array([1, 2, 3, 4, 4, 5])
from sklearn.naive_bayes import BernoulliNB
clf = BernoulliNB()
clf.fit(X, Y)
BernoulliNB()
print(clf.predict(X[2:3]))
[3]
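
A corresponding model step in a metamorph.ml pipeline could be (a sketch, not evaluated; option keys as listed above):

(ml/model {:model-type :sklearn.classification/bernoulli-nb
           :alpha 1.0
           :binarize 0.0
           :fit-prior true})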


24.2.4 /calibrated-classifier-cv

Clojure option keys: cv, ensemble, estimator, method, n-jobs, predict-proba?

Calibrate probabilities using isotonic, sigmoid, or temperature scaling.

This class uses cross-validation to both estimate the parameters of a classifier and subsequently calibrate a classifier. With ensemble=True, for each cv split it fits a copy of the base estimator to the training subset, and calibrates it using the testing subset. For prediction, predicted probabilities are averaged across these individual calibrated classifiers. When ensemble=False, cross-validation is used to obtain unbiased predictions, via ~sklearn.model_selection.cross_val_predict, which are then used for calibration. For prediction, the base estimator, trained using all the data, is used. This is the prediction method implemented when probabilities=True for ~sklearn.svm.SVC and ~sklearn.svm.NuSVC estimators (see User Guide: scores_probabilities for details).

Already fitted classifiers can be calibrated by wrapping the model in a ~sklearn.frozen.FrozenEstimator. In this case all provided data is used for calibration. The user has to take care manually that data for model fitting and calibration are disjoint.

The calibration is based on the decision_function method of the estimator if it exists, else on predict_proba.

Read more in the User Guide: calibration. In order to learn more on the CalibratedClassifierCV class, see the following calibration examples: :ref:sphx_glr_auto_examples_calibration_plot_calibration.py, :ref:sphx_glr_auto_examples_calibration_plot_calibration_curve.py, and :ref:sphx_glr_auto_examples_calibration_plot_calibration_multiclass.py.

Parameters

  • estimator: estimator instance, default=None The classifier whose output needs to be calibrated to provide more accurate predict_proba outputs. The default classifier is a ~sklearn.svm.LinearSVC.

    Added in 1.2

  • method: {'sigmoid', 'isotonic', 'temperature'}, default='sigmoid' The method to use for calibration. Can be:

    • 'sigmoid', which corresponds to Platt's method (i.e. a binary logistic regression model).
    • 'isotonic', which is a non-parametric approach.
    • 'temperature', temperature scaling.

    Sigmoid and isotonic calibration methods natively support only binary classifiers and extend to multi-class classification using a One-vs-Rest (OvR) strategy with post-hoc renormalization, i.e., adjusting the probabilities after calibration to ensure they sum up to 1.

    In contrast, temperature scaling naturally supports multi-class calibration by applying softmax(classifier_logits/T) with a value of T (temperature) that optimizes the log loss.

    For very uncalibrated classifiers on very imbalanced datasets, sigmoid calibration might be preferred because it fits an additional intercept parameter. This helps shift decision boundaries appropriately when the classifier being calibrated is biased towards the majority class.

    Isotonic calibration is not recommended when the number of calibration samples is too low (≪1000) since it then tends to overfit.

    Changed in 1.8 Added option 'temperature'.

  • cv: int, cross-validation generator, or iterable, default=None Determines the cross-validation splitting strategy. Possible inputs for cv are:

    • None, to use the default 5-fold cross-validation,
    • integer, to specify the number of folds.
    • CV splitter,
    • An iterable yielding (train, test) splits as arrays of indices.

    For integer/None inputs, if y is binary or multiclass, ~sklearn.model_selection.StratifiedKFold is used. If y is neither binary nor multiclass, ~sklearn.model_selection.KFold is used.

    Refer to the User Guide: cross_validation for the various cross-validation strategies that can be used here.

    Changed in 0.22 cv default value if None changed from 3-fold to 5-fold.

  • n_jobs: int, default=None Number of jobs to run in parallel. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors.

    Base estimator clones are fitted in parallel across cross-validation iterations.

    See Glossary for more details.

    Added in 0.24

  • ensemble: bool, or "auto", default="auto" Determines how the calibrator is fitted.

    "auto" will use False if the estimator is a ~sklearn.frozen.FrozenEstimator, and True otherwise.

    If True, the estimator is fitted using training data, and calibrated using testing data, for each cv fold. The final estimator is an ensemble of n_cv fitted classifier and calibrator pairs, where n_cv is the number of cross-validation folds. The output is the average predicted probabilities of all pairs.

    If False, cv is used to compute unbiased predictions, via ~sklearn.model_selection.cross_val_predict, which are then used for calibration. At prediction time, the classifier used is the estimator trained on all the data. Note that this method is also internally implemented in sklearn.svm estimators with the probabilities=True parameter.

    Added in 0.24

    Changed in 1.6 "auto" option is added and is the default.

Attributes

  • classes_: ndarray of shape (n_classes,) The class labels.

  • n_features_in_: int Number of features seen during fit. Only defined if the underlying estimator exposes such an attribute when fit.

    Added in 0.24

  • feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen during fit. Only defined if the underlying estimator exposes such an attribute when fit.

    Added in 1.0

  • calibrated_classifiers_: list (len() equal to cv or 1 if ensemble=False) The list of classifier and calibrator pairs.

    • When ensemble=True, n_cv fitted estimator and calibrator pairs. n_cv is the number of cross-validation folds.
    • When ensemble=False, the estimator, fitted on all the data, and fitted calibrator.

    Changed in 0.24 Single calibrated classifier case when ensemble=False.

See Also

  • calibration_curve: Compute true and predicted probabilities for a calibration curve.

References

Examples

from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB
from sklearn.calibration import CalibratedClassifierCV
X, y = make_classification(n_samples=100, n_features=2,
                           n_redundant=0, random_state=42)
base_clf = GaussianNB()
calibrated_clf = CalibratedClassifierCV(base_clf, cv=3)
calibrated_clf.fit(X, y)
CalibratedClassifierCV(...)
len(calibrated_clf.calibrated_classifiers_)
3
calibrated_clf.predict_proba(X)[:5, :]
array([[0.110, 0.889],
       [0.072, 0.927],
       [0.928, 0.072],
       [0.928, 0.072],
       [0.072, 0.928]])
from sklearn.model_selection import train_test_split
X, y = make_classification(n_samples=100, n_features=2,
                           n_redundant=0, random_state=42)
X_train, X_calib, y_train, y_calib = train_test_split(
       X, y, random_state=42
)
base_clf = GaussianNB()
base_clf.fit(X_train, y_train)
GaussianNB()
from sklearn.frozen import FrozenEstimator
calibrated_clf = CalibratedClassifierCV(FrozenEstimator(base_clf))
calibrated_clf.fit(X_calib, y_calib)
CalibratedClassifierCV(...)
len(calibrated_clf.calibrated_classifiers_)
1
calibrated_clf.predict_proba([[-0.5, 0.5]])
array([[0.936, 0.063]])
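
Via metamorph.ml, this calibration wrapper is addressed by its model type as well. A hedged sketch (the string value for method follows the Python spelling from the documentation above):

(ml/model {:model-type :sklearn.classification/calibrated-classifier-cv
           :method "sigmoid"
           :cv 3})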


24.2.5 /categorical-nb

Clojure option keys: alpha, class-prior, fit-prior, force-alpha, min-categories, predict-proba?

Naive Bayes classifier for categorical features.

The categorical Naive Bayes classifier is suitable for classification with discrete features that are categorically distributed. The categories of each feature are drawn from a categorical distribution.

Read more in the User Guide: categorical_naive_bayes.

Parameters

  • alpha: float, default=1.0 Additive (Laplace/Lidstone) smoothing parameter (set alpha=0 and force_alpha=True, for no smoothing).

  • force_alpha: bool, default=True If False and alpha is less than 1e-10, it will set alpha to 1e-10. If True, alpha will remain unchanged. This may cause numerical errors if alpha is too close to 0.

    Added in 1.2 Changed in 1.4 The default value of force_alpha changed to True.

  • fit_prior: bool, default=True Whether to learn class prior probabilities or not. If false, a uniform prior will be used.

  • class_prior: array-like of shape (n_classes,), default=None Prior probabilities of the classes. If specified, the priors are not adjusted according to the data.

  • min_categories: int or array-like of shape (n_features,), default=None Minimum number of categories per feature.

    • integer: Sets the minimum number of categories per feature to n_categories for each feature.
    • array-like: shape (n_features,) where n_categories[i] holds the minimum number of categories for the ith column of the input.
    • None (default): Determines the number of categories automatically from the training data.

    Added in 0.24

Attributes

  • category_count_: list of arrays of shape (n_features,) Holds arrays of shape (n_classes, n_categories of respective feature) for each feature. Each array provides the number of samples encountered for each class and category of the specific feature.

  • class_count_: ndarray of shape (n_classes,) Number of samples encountered for each class during fitting. This value is weighted by the sample weight when provided.

  • class_log_prior_: ndarray of shape (n_classes,) Smoothed empirical log probability for each class.

  • classes_: ndarray of shape (n_classes,) Class labels known to the classifier

  • feature_log_prob_: list of arrays of shape (n_features,) Holds arrays of shape (n_classes, n_categories of respective feature) for each feature. Each array provides the empirical log probability of categories given the respective feature and class, P(x_i|y).

  • n_features_in_: int Number of features seen during fit.

    Added in 0.24

  • feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen during fit. Defined only when X has feature names that are all strings.

    Added in 1.0

  • n_categories_: ndarray of shape (n_features,), dtype=np.int64 Number of categories for each feature. This value is inferred from the data or set by the minimum number of categories.

    Added in 0.24

See Also

  • BernoulliNB: Naive Bayes classifier for multivariate Bernoulli models.
  • ComplementNB: Complement Naive Bayes classifier.
  • GaussianNB: Gaussian Naive Bayes.
  • MultinomialNB: Naive Bayes classifier for multinomial models.

Examples

import numpy as np
rng = np.random.RandomState(1)
X = rng.randint(5, size=(6, 100))
y = np.array([1, 2, 3, 4, 5, 6])
from sklearn.naive_bayes import CategoricalNB
clf = CategoricalNB()
clf.fit(X, y)
CategoricalNB()
print(clf.predict(X[2:3]))
[3]
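
The corresponding metamorph.ml model step might look like this (sketch only; option keys as listed above):

(ml/model {:model-type :sklearn.classification/categorical-nb
           :alpha 1.0
           :fit-prior true})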


24.2.6 /complement-nb

Clojure option keys: alpha, class-prior, fit-prior, force-alpha, norm, predict-proba?

The Complement Naive Bayes classifier described in Rennie et al. (2003).

The Complement Naive Bayes classifier was designed to correct the "severe assumptions" made by the standard Multinomial Naive Bayes classifier. It is particularly suited for imbalanced data sets.

Read more in the User Guide: complement_naive_bayes.

Added in 0.20

Parameters

  • alpha: float or array-like of shape (n_features,), default=1.0 Additive (Laplace/Lidstone) smoothing parameter (set alpha=0 and force_alpha=True, for no smoothing).

  • force_alpha: bool, default=True If False and alpha is less than 1e-10, it will set alpha to 1e-10. If True, alpha will remain unchanged. This may cause numerical errors if alpha is too close to 0.

    Added in 1.2 Changed in 1.4 The default value of force_alpha changed to True.

  • fit_prior: bool, default=True Only used in edge case with a single class in the training set.

  • class_prior: array-like of shape (n_classes,), default=None Prior probabilities of the classes. Not used.

  • norm: bool, default=False Whether or not a second normalization of the weights is performed. The default behavior mirrors the implementations found in Mahout and Weka, which do not follow the full algorithm described in Table 9 of the paper.

Attributes

  • class_count_: ndarray of shape (n_classes,) Number of samples encountered for each class during fitting. This value is weighted by the sample weight when provided.

  • class_log_prior_: ndarray of shape (n_classes,) Smoothed empirical log probability for each class. Only used in edge case with a single class in the training set.

  • classes_: ndarray of shape (n_classes,) Class labels known to the classifier

  • feature_all_: ndarray of shape (n_features,) Number of samples encountered for each feature during fitting. This value is weighted by the sample weight when provided.

  • feature_count_: ndarray of shape (n_classes, n_features) Number of samples encountered for each (class, feature) during fitting. This value is weighted by the sample weight when provided.

  • feature_log_prob_: ndarray of shape (n_classes, n_features) Empirical weights for class complements.

  • n_features_in_: int Number of features seen during fit.

    Added in 0.24

  • feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen during fit. Defined only when X has feature names that are all strings.

    Added in 1.0

See Also

  • BernoulliNB: Naive Bayes classifier for multivariate Bernoulli models.
  • CategoricalNB: Naive Bayes classifier for categorical features.
  • GaussianNB: Gaussian Naive Bayes.
  • MultinomialNB: Naive Bayes classifier for multinomial models.

References

Rennie, J. D., Shih, L., Teevan, J., & Karger, D. R. (2003). Tackling the poor assumptions of naive bayes text classifiers. In ICML (Vol. 3, pp. 616-623). https://people.csail.mit.edu/jrennie/papers/icml03-nb.pdf

Examples

import numpy as np
rng = np.random.RandomState(1)
X = rng.randint(5, size=(6, 100))
y = np.array([1, 2, 3, 4, 5, 6])
from sklearn.naive_bayes import ComplementNB
clf = ComplementNB()
clf.fit(X, y)
ComplementNB()
print(clf.predict(X[2:3]))
[3]
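
As a sketch, the same classifier as a metamorph.ml model step (option keys from the list above, not evaluated here):

(ml/model {:model-type :sklearn.classification/complement-nb
           :alpha 1.0
           :norm false})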


24.2.7 /decision-tree-classifier

Clojure option keys: min-weight-fraction-leaf, max-leaf-nodes, min-impurity-decrease, min-samples-split, ccp-alpha, splitter, random-state, min-samples-leaf, max-features, monotonic-cst, max-depth, class-weight, criterion, predict-proba?

A decision tree classifier.

Read more in the User Guide: tree.

Parameters

  • criterion: {"gini", "entropy", "log_loss"}, default="gini" The function to measure the quality of a split. Supported criteria are "gini" for the Gini impurity and "log_loss" and "entropy" both for the Shannon information gain, see :ref:tree_mathematical_formulation.

  • splitter: {"best", "random"}, default="best" The strategy used to choose the split at each node. Supported strategies are "best" to choose the best split and "random" to choose the best random split.

  • max_depth: int, default=None The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.

  • min_samples_split: int or float, default=2 The minimum number of samples required to split an internal node:

    • If int, then consider min_samples_split as the minimum number.
    • If float, then min_samples_split is a fraction and ceil(min_samples_split * n_samples) are the minimum number of samples for each split.

    Changed in 0.18 Added float values for fractions.

  • min_samples_leaf: int or float, default=1 The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at least min_samples_leaf training samples in each of the left and right branches. This may have the effect of smoothing the model, especially in regression.

    • If int, then consider min_samples_leaf as the minimum number.
    • If float, then min_samples_leaf is a fraction and ceil(min_samples_leaf * n_samples) are the minimum number of samples for each node.

    Changed in 0.18 Added float values for fractions.

  • min_weight_fraction_leaf: float, default=0.0 The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sample_weight is not provided.

  • max_features: int, float or {"sqrt", "log2"}, default=None The number of features to consider when looking for the best split:

    • If int, then consider max_features features at each split.
    • If float, then max_features is a fraction and max(1, int(max_features * n_features_in_)) features are considered at each split.
    • If "sqrt", then max_features=sqrt(n_features).
    • If "log2", then max_features=log2(n_features).
    • If None, then max_features=n_features.

🛈 Note

The search for a split does not stop until at least one valid partition of the node samples is found, even if it requires to effectively inspect more than max_features features.


  • random_state: int, RandomState instance or None, default=None Controls the randomness of the estimator. The features are always randomly permuted at each split, even if splitter is set to "best". When max_features < n_features, the algorithm will select max_features at random at each split before finding the best split among them. But the best found split may vary across different runs, even if max_features=n_features. That is the case, if the improvement of the criterion is identical for several splits and one split has to be selected at random. To obtain a deterministic behaviour during fitting, random_state has to be fixed to an integer. See Glossary for details.

  • max_leaf_nodes: int, default=None Grow a tree with max_leaf_nodes in best-first fashion. Best nodes are defined as relative reduction in impurity. If None then unlimited number of leaf nodes.

  • min_impurity_decrease: float, default=0.0 A node will be split if this split induces a decrease of the impurity greater than or equal to this value.

    The weighted impurity decrease equation is the following

N_t / N * (impurity - N_t_R / N_t * right_impurity
                    - N_t_L / N_t * left_impurity)

    where N is the total number of samples, N_t is the number of samples at the current node, N_t_L is the number of samples in the left child, and N_t_R is the number of samples in the right child.

    N, N_t, N_t_R and N_t_L all refer to the weighted sum, if sample_weight is passed.

    Added in 0.19

  • class_weight: dict, list of dict or "balanced", default=None Weights associated with classes in the form {class_label: weight}. If None, all classes are supposed to have weight one. For multi-output problems, a list of dicts can be provided in the same order as the columns of y.

    Note that for multioutput (including multilabel) weights should be defined for each class of every column in its own dict. For example, for four-class multilabel classification weights should be [{0: 1, 1: 1}, {0: 1, 1: 5}, {0: 1, 1: 1}, {0: 1, 1: 1}] instead of [{1:1}, {2:5}, {3:1}, {4:1}].

    The "balanced" mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y))

    For multi-output, the weights of each column of y will be multiplied.

    Note that these weights will be multiplied with sample_weight (passed through the fit method) if sample_weight is specified.

  • ccp_alpha: non-negative float, default=0.0 Complexity parameter used for Minimal Cost-Complexity Pruning. The subtree with the largest cost complexity that is smaller than ccp_alpha will be chosen. By default, no pruning is performed. See :ref:minimal_cost_complexity_pruning for details. See :ref:sphx_glr_auto_examples_tree_plot_cost_complexity_pruning.py for an example of such pruning.

    Added in 0.22

  • monotonic_cst: array-like of int of shape (n_features), default=None Indicates the monotonicity constraint to enforce on each feature: 1 = monotonic increase, 0 = no constraint, -1 = monotonic decrease.

    If monotonic_cst is None, no constraints are applied.

    Monotonicity constraints are not supported for multiclass classifications (i.e. when n_classes > 2), multioutput classifications (i.e. when n_outputs_ > 1), or classifications trained on data with missing values.

    The constraints hold over the probability of the positive class.

    Read more in the User Guide: monotonic_cst_gbdt.

    Added in 1.4

Attributes

  • classes_: ndarray of shape (n_classes,) or list of ndarray The classes labels (single output problem), or a list of arrays of class labels (multi-output problem).

  • feature_importances_: ndarray of shape (n_features,) The impurity-based feature importances. The higher, the more important the feature. The importance of a feature is computed as the (normalized) total reduction of the criterion brought by that feature. It is also known as the Gini importance [4].

    Warning: impurity-based feature importances can be misleading for high cardinality features (many unique values). See sklearn.inspection.permutation_importance as an alternative.

  • max_features_: int The inferred value of max_features.

  • n_classes_: int or list of int The number of classes (for single output problems), or a list containing the number of classes for each output (for multi-output problems).

  • n_features_in_: int Number of features seen during fit.

    Added in 0.24

  • feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen during fit. Defined only when X has feature names that are all strings.

    Added in 1.0

  • n_outputs_: int The number of outputs when fit is performed.

  • tree_: Tree instance The underlying Tree object. Please refer to help(sklearn.tree._tree.Tree) for attributes of Tree object and :ref:sphx_glr_auto_examples_tree_plot_unveil_tree_structure.py for basic usage of these attributes.

See Also

  • DecisionTreeRegressor: A decision tree regressor.

Notes

The default values for the parameters controlling the size of the trees (e.g. max_depth, min_samples_leaf, etc.) lead to fully grown and unpruned trees which can potentially be very large on some data sets. To reduce memory consumption, the complexity and size of the trees should be controlled by setting those parameter values.

The predict method operates using the numpy.argmax function on the outputs of predict_proba. This means that in case the highest predicted probabilities are tied, the classifier will predict the tied class with the lowest index in classes_.

References

Examples

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
clf = DecisionTreeClassifier(random_state=0)
iris = load_iris()
cross_val_score(clf, iris.data, iris.target, cv=10)
                            # doctest: +SKIP
array([ 1.     ,  0.93,  0.86,  0.93,  0.93,
        0.93,  0.93,  1.     ,  0.93,  1.      ])


24.2.8 /dummy-classifier

Clojure option keys: constant, random-state, strategy, predict-proba?

DummyClassifier makes predictions that ignore the input features.

This classifier serves as a simple baseline to compare against other more complex classifiers.

The specific behavior of the baseline is selected with the strategy parameter.

All strategies make predictions that ignore the input feature values passed as the X argument to fit and predict. The predictions, however, typically depend on values observed in the y parameter passed to fit.

Note that the "stratified" and "uniform" strategies lead to non-deterministic predictions that can be rendered deterministic by setting the random_state parameter if needed. The other strategies are naturally deterministic and, once fit, always return the same constant prediction for any value of X.

Read more in the User Guide: dummy_estimators.

Added in 0.13

Parameters

  • strategy: {"most_frequent", "prior", "stratified", "uniform", "constant"}, default="prior" Strategy to use to generate predictions.

    • "most_frequent": the predict method always returns the most frequent class label in the observed y argument passed to fit. The predict_proba method returns the matching one-hot encoded vector.

    • "prior": the predict method always returns the most frequent class label in the observed y argument passed to fit (like "most_frequent"). predict_proba always returns the empirical class distribution of y also known as the empirical class prior distribution.

    • "stratified": the predict_proba method randomly samples one-hot vectors from a multinomial distribution parametrized by the empirical class prior probabilities. The predict method returns the class label which got probability one in the one-hot vector of predict_proba. Each sampled row of both methods is therefore independent and identically distributed.

    • "uniform": generates predictions uniformly at random from the list of unique classes observed in y, i.e. each class has equal probability.

    • "constant": always predicts a constant label that is provided by the user. This is useful for metrics that evaluate a non-majority class.

      Changed in 0.24 The default value of strategy has changed to "prior" in version 0.24.

  • random_state: int, RandomState instance or None, default=None Controls the randomness to generate the predictions when strategy='stratified' or strategy='uniform'. Pass an int for reproducible output across multiple function calls. See the Glossary.

  • constant: int or str or array-like of shape (n_outputs,), default=None The explicit constant as predicted by the "constant" strategy. This parameter is useful only for the "constant" strategy.

Attributes

  • classes_: ndarray of shape (n_classes,) or list of such arrays Unique class labels observed in y. For multi-output classification problems, this attribute is a list of arrays as each output has an independent set of possible classes.

  • n_classes_: int or list of int Number of labels for each output.

  • class_prior_: ndarray of shape (n_classes,) or list of such arrays Frequency of each class observed in y. For multioutput classification problems, this is computed independently for each output.

  • n_features_in_: int Number of features seen during fit.

  • feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen during fit. Defined only when X has feature names that are all strings.

  • n_outputs_: int Number of outputs.

  • sparse_output_: bool True if the array returned from predict is to be in sparse CSC format. Is automatically set to True if the input y is passed in sparse format.

See Also

  • DummyRegressor: Regressor that makes predictions using simple rules.

Examples

import numpy as np
from sklearn.dummy import DummyClassifier
X = np.array([-1, 1, 1, 1])
y = np.array([0, 1, 1, 1])
dummy_clf = DummyClassifier(strategy="most_frequent")
dummy_clf.fit(X, y)
DummyClassifier(strategy='most_frequent')
dummy_clf.predict(X)
array([1, 1, 1, 1])
dummy_clf.score(X, y)
0.75
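
As a baseline step inside a metamorph.ml pipeline, the dummy classifier could be sketched as (the strategy value follows the Python spelling documented above):

(ml/model {:model-type :sklearn.classification/dummy-classifier
           :strategy "most_frequent"})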


24.2.9 /extra-tree-classifier

Clojure option keys: min-weight-fraction-leaf, max-leaf-nodes, min-impurity-decrease, min-samples-split, ccp-alpha, splitter, random-state, min-samples-leaf, max-features, monotonic-cst, max-depth, class-weight, criterion, predict-proba?

An extremely randomized tree classifier.

Extra-trees differ from classic decision trees in the way they are built. When looking for the best split to separate the samples of a node into two groups, random splits are drawn for each of the max_features randomly selected features and the best split among those is chosen. When max_features is set to 1, this amounts to building a totally random decision tree.

Warning: Extra-trees should only be used within ensemble methods.

Read more in the User Guide: tree.

Parameters

  • criterion: {"gini", "entropy", "log_loss"}, default="gini" The function to measure the quality of a split. Supported criteria are "gini" for the Gini impurity and "log_loss" and "entropy" both for the Shannon information gain, see :ref:tree_mathematical_formulation.

  • splitter: {"random", "best"}, default="random" The strategy used to choose the split at each node. Supported strategies are "best" to choose the best split and "random" to choose the best random split.

  • max_depth: int, default=None The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.

  • min_samples_split: int or float, default=2 The minimum number of samples required to split an internal node:

    • If int, then consider min_samples_split as the minimum number.
    • If float, then min_samples_split is a fraction and ceil(min_samples_split * n_samples) are the minimum number of samples for each split.

    Changed in 0.18 Added float values for fractions.

  • min_samples_leaf: int or float, default=1 The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at least min_samples_leaf training samples in each of the left and right branches. This may have the effect of smoothing the model, especially in regression.

    • If int, then consider min_samples_leaf as the minimum number.
    • If float, then min_samples_leaf is a fraction and ceil(min_samples_leaf * n_samples) are the minimum number of samples for each node.

    Changed in 0.18 Added float values for fractions.

  • min_weight_fraction_leaf: float, default=0.0 The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sample_weight is not provided.

  • max_features: int, float, {"sqrt", "log2"} or None, default="sqrt" The number of features to consider when looking for the best split:

    • If int, then consider max_features features at each split.
    • If float, then max_features is a fraction and max(1, int(max_features * n_features_in_)) features are considered at each split.
    • If "sqrt", then max_features=sqrt(n_features).
    • If "log2", then max_features=log2(n_features).
    • If None, then max_features=n_features.

    Changed in 1.1 The default of max_features changed from "auto" to "sqrt".

    Note: the search for a split does not stop until at least one valid partition of the node samples is found, even if it requires to effectively inspect more than max_features features.

  • random_state: int, RandomState instance or None, default=None Used to pick randomly the max_features used at each split. See Glossary for details.

  • max_leaf_nodes: int, default=None Grow a tree with max_leaf_nodes in best-first fashion. Best nodes are defined as relative reduction in impurity. If None then unlimited number of leaf nodes.

  • min_impurity_decrease: float, default=0.0 A node will be split if this split induces a decrease of the impurity greater than or equal to this value.

    The weighted impurity decrease equation is the following

N_t / N * (impurity - N_t_R / N_t * right_impurity
                    - N_t_L / N_t * left_impurity)

    where N is the total number of samples, N_t is the number of samples at the current node, N_t_L is the number of samples in the left child, and N_t_R is the number of samples in the right child.

    N, N_t, N_t_R and N_t_L all refer to the weighted sum, if sample_weight is passed.

    Added in 0.19

  • class_weight: dict, list of dict or "balanced", default=None Weights associated with classes in the form {class_label: weight}. If None, all classes are supposed to have weight one. For multi-output problems, a list of dicts can be provided in the same order as the columns of y.

    Note that for multioutput (including multilabel) weights should be defined for each class of every column in its own dict. For example, for four-class multilabel classification weights should be [{0: 1, 1: 1}, {0: 1, 1: 5}, {0: 1, 1: 1}, {0: 1, 1: 1}] instead of [{1:1}, {2:5}, {3:1}, {4:1}].

    The "balanced" mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y))

    For multi-output, the weights of each column of y will be multiplied.

    Note that these weights will be multiplied with sample_weight (passed through the fit method) if sample_weight is specified.

  • ccp_alpha: non-negative float, default=0.0 Complexity parameter used for Minimal Cost-Complexity Pruning. The subtree with the largest cost complexity that is smaller than ccp_alpha will be chosen. By default, no pruning is performed. See :ref:minimal_cost_complexity_pruning for details. See :ref:sphx_glr_auto_examples_tree_plot_cost_complexity_pruning.py for an example of such pruning.

    Added in 0.22

  • monotonic_cst: array-like of int of shape (n_features), default=None Indicates the monotonicity constraint to enforce on each feature: 1 = monotonic increase, 0 = no constraint, -1 = monotonic decrease.

    If monotonic_cst is None, no constraints are applied.

    Monotonicity constraints are not supported for multiclass classifications (i.e. when n_classes > 2), multioutput classifications (i.e. when n_outputs_ > 1), or classifications trained on data with missing values.

    The constraints hold over the probability of the positive class.

    Read more in the User Guide: monotonic_cst_gbdt.

    Added in 1.4

Attributes

  • classes_: ndarray of shape (n_classes,) or list of ndarray The classes labels (single output problem), or a list of arrays of class labels (multi-output problem).

  • max_features_: int The inferred value of max_features.

  • n_classes_: int or list of int The number of classes (for single output problems), or a list containing the number of classes for each output (for multi-output problems).

  • feature_importances_: ndarray of shape (n_features,) The impurity-based feature importances. The higher, the more important the feature. The importance of a feature is computed as the (normalized) total reduction of the criterion brought by that feature. It is also known as the Gini importance.

    Warning: impurity-based feature importances can be misleading for high cardinality features (many unique values). See sklearn.inspection.permutation_importance as an alternative.

  • n_features_in_: int Number of features seen during fit.

    Added in 0.24

  • feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen during fit. Defined only when X has feature names that are all strings.

    Added in 1.0

  • n_outputs_: int The number of outputs when fit is performed.

  • tree_: Tree instance The underlying Tree object. Please refer to help(sklearn.tree._tree.Tree) for attributes of Tree object and :ref:sphx_glr_auto_examples_tree_plot_unveil_tree_structure.py for basic usage of these attributes.

See Also

  • ExtraTreeRegressor: An extremely randomized tree regressor.
  • sklearn.ensemble.ExtraTreesClassifier: An extra-trees classifier.
  • sklearn.ensemble.ExtraTreesRegressor: An extra-trees regressor.
  • sklearn.ensemble.RandomForestClassifier: A random forest classifier.
  • sklearn.ensemble.RandomForestRegressor: A random forest regressor.
  • sklearn.ensemble.RandomTreesEmbedding: An ensemble of totally random trees.

Notes

The default values for the parameters controlling the size of the trees (e.g. max_depth, min_samples_leaf, etc.) lead to fully grown and unpruned trees which can potentially be very large on some data sets. To reduce memory consumption, the complexity and size of the trees should be controlled by setting those parameter values.

References

  • [1] P. Geurts, D. Ernst., and L. Wehenkel, "Extremely randomized trees", Machine Learning, 63(1), 3-42, 2006.

Examples

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import ExtraTreeClassifier
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
   X, y, random_state=0)
extra_tree = ExtraTreeClassifier(random_state=0)
cls = BaggingClassifier(extra_tree, random_state=0).fit(
   X_train, y_train)
cls.score(X_test, y_test)
0.8947
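
A metamorph.ml model step for a single extremely randomized tree might look like this (sketch only; option keys as listed above, string values follow the Python spelling):

(ml/model {:model-type :sklearn.classification/extra-tree-classifier
           :max-features "sqrt"
           :random-state 0})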


24.2.10 /extra-trees-classifier

Clojure option keys: min-weight-fraction-leaf, max-leaf-nodes, min-impurity-decrease, min-samples-split, bootstrap, ccp-alpha, n-jobs, random-state, oob-score, min-samples-leaf, max-features, monotonic-cst, warm-start, max-depth, class-weight, n-estimators, max-samples, criterion, verbose, predict-proba?

An extra-trees classifier.

This class implements a meta estimator that fits a number of randomized decision trees (a.k.a. extra-trees) on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting.

This estimator has native support for missing values (NaNs) for random splits. During training, a random threshold will be chosen to split the non-missing values on. Then the non-missing values will be sent to the left and right child based on the randomly selected threshold, while the missing values will also be randomly sent to the left or right child. This is repeated for every feature considered at each split. The best split among these is chosen.

Read more in the User Guide: forest.
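
Before going into the parameters, here is a hedged sketch of the corresponding metamorph.ml model step (option keys from the list above, not evaluated here):

(ml/model {:model-type :sklearn.classification/extra-trees-classifier
           :n-estimators 100
           :random-state 0})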

Parameters

  • n_estimators: int, default=100 The number of trees in the forest.

    Changed in 0.22 The default value of n_estimators changed from 10 to 100 in 0.22.

  • criterion: {"gini", "entropy", "log_loss"}, default="gini" The function to measure the quality of a split. Supported criteria are "gini" for the Gini impurity and "log_loss" and "entropy" both for the Shannon information gain, see :ref:tree_mathematical_formulation. Note: This parameter is tree-specific.

  • max_depth: int, default=None The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.

  • min_samples_split: int or float, default=2 The minimum number of samples required to split an internal node:

    • If int, then consider min_samples_split as the minimum number.
    • If float, then min_samples_split is a fraction and ceil(min_samples_split * n_samples) are the minimum number of samples for each split.

    Changed in 0.18 Added float values for fractions.

  • min_samples_leaf: int or float, default=1 The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at least min_samples_leaf training samples in each of the left and right branches. This may have the effect of smoothing the model, especially in regression.

    • If int, then consider min_samples_leaf as the minimum number.
    • If float, then min_samples_leaf is a fraction and ceil(min_samples_leaf * n_samples) are the minimum number of samples for each node.

    Changed in 0.18 Added float values for fractions.

  • min_weight_fraction_leaf: float, default=0.0 The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sample_weight is not provided.

  • max_features: {"sqrt", "log2", None}, int or float, default="sqrt" The number of features to consider when looking for the best split:

    • If int, then consider max_features features at each split.
    • If float, then max_features is a fraction and max(1, int(max_features * n_features_in_)) features are considered at each split.
    • If "sqrt", then max_features=sqrt(n_features).
    • If "log2", then max_features=log2(n_features).
    • If None, then max_features=n_features.

    Changed in 1.1 The default of max_features changed from "auto" to "sqrt".

    Note: the search for a split does not stop until at least one valid partition of the node samples is found, even if it requires to effectively inspect more than max_features features.

  • max_leaf_nodes: int, default=None Grow trees with max_leaf_nodes in best-first fashion. Best nodes are defined as relative reduction in impurity. If None then unlimited number of leaf nodes.

  • min_impurity_decrease: float, default=0.0 A node will be split if this split induces a decrease of the impurity greater than or equal to this value.

    The weighted impurity decrease equation is the following

N_t / N * (impurity - N_t_R / N_t * right_impurity
                    - N_t_L / N_t * left_impurity)

    where N is the total number of samples, N_t is the number of samples at the current node, N_t_L is the number of samples in the left child, and N_t_R is the number of samples in the right child.

    N, N_t, N_t_R and N_t_L all refer to the weighted sum, if sample_weight is passed.

    Added in 0.19

  • bootstrap: bool, default=False Whether bootstrap samples are used when building trees. If False, the whole dataset is used to build each tree.

  • oob_score: bool or callable, default=False Whether to use out-of-bag samples to estimate the generalization score. By default, ~sklearn.metrics.accuracy_score is used. Provide a callable with signature metric(y_true, y_pred) to use a custom metric. Only available if bootstrap=True.

    For an illustration of out-of-bag (OOB) error estimation, see the example :ref:sphx_glr_auto_examples_ensemble_plot_ensemble_oob.py.

  • n_jobs: int, default=None The number of jobs to run in parallel. fit, predict, decision_path and apply are all parallelized over the trees. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.

  • random_state: int, RandomState instance or None, default=None Controls 3 sources of randomness:

    • the bootstrapping of the samples used when building trees (if bootstrap=True)
    • the sampling of the features to consider when looking for the best split at each node (if max_features < n_features)
    • the draw of the splits for each of the max_features

    See Glossary for details.

  • verbose: int, default=0 Controls the verbosity when fitting and predicting.

  • warm_start: bool, default=False When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just fit a whole new forest. See Glossary and :ref:tree_ensemble_warm_start for details.

  • class_weight: {"balanced", "balanced_subsample"}, dict or list of dicts, default=None Weights associated with classes in the form {class_label: weight}. If not given, all classes are supposed to have weight one. For multi-output problems, a list of dicts can be provided in the same order as the columns of y.

    Note that for multioutput (including multilabel) weights should be defined for each class of every column in its own dict. For example, for four-class multilabel classification weights should be [{0: 1, 1: 1}, {0: 1, 1: 5}, {0: 1, 1: 1}, {0: 1, 1: 1}] instead of [{1:1}, {2:5}, {3:1}, {4:1}].

    The "balanced" mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y))

    The "balanced_subsample" mode is the same as "balanced" except that weights are computed based on the bootstrap sample for every tree grown.

    For multi-output, the weights of each column of y will be multiplied.

    Note that these weights will be multiplied with sample_weight (passed through the fit method) if sample_weight is specified.

  • ccp_alpha: non-negative float, default=0.0 Complexity parameter used for Minimal Cost-Complexity Pruning. The subtree with the largest cost complexity that is smaller than ccp_alpha will be chosen. By default, no pruning is performed. See :ref:minimal_cost_complexity_pruning for details. See :ref:sphx_glr_auto_examples_tree_plot_cost_complexity_pruning.py for an example of such pruning.

    Added in 0.22

  • max_samples: int or float, default=None If bootstrap is True, the number of samples to draw from X to train each base estimator.

    • If None (default), then draw X.shape[0] samples.
    • If int, then draw max_samples samples.
    • If float, then draw max_samples * X.shape[0] samples. Thus, max_samples should be in the interval (0.0, 1.0].

    Added in 0.22

  • monotonic_cst: array-like of int of shape (n_features), default=None Indicates the monotonicity constraint to enforce on each feature.

    • 1: monotonically increasing
    • 0: no constraint
    • -1: monotonically decreasing

    If monotonic_cst is None, no constraints are applied.

    Monotonicity constraints are not supported for:

    • multiclass classifications (i.e. when n_classes > 2),
    • multioutput classifications (i.e. when n_outputs_ > 1),
    • classifications trained on data with missing values.

    The constraints hold over the probability of the positive class.

    Read more in the User Guide: monotonic_cst_gbdt.

    Added in 1.4

Attributes

  • estimator_: ~sklearn.tree.ExtraTreeClassifier The child estimator template used to create the collection of fitted sub-estimators.

    Added in 1.2 base_estimator_ was renamed to estimator_.

  • estimators_: list of DecisionTreeClassifier The collection of fitted sub-estimators.

  • classes_: ndarray of shape (n_classes,) or a list of such arrays The classes labels (single output problem), or a list of arrays of class labels (multi-output problem).

  • n_classes_: int or list The number of classes (single output problem), or a list containing the number of classes for each output (multi-output problem).

  • feature_importances_: ndarray of shape (n_features,) The impurity-based feature importances. The higher, the more important the feature. The importance of a feature is computed as the (normalized) total reduction of the criterion brought by that feature. It is also known as the Gini importance.

    Warning: impurity-based feature importances can be misleading for high cardinality features (many unique values). See sklearn.inspection.permutation_importance as an alternative.

  • n_features_in_: int Number of features seen during fit.

    Added in 0.24

  • feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen during fit. Defined only when X has feature names that are all strings.

    Added in 1.0

  • n_outputs_: int The number of outputs when fit is performed.

  • oob_score_: float Score of the training dataset obtained using an out-of-bag estimate. This attribute exists only when oob_score is True.

  • oob_decision_function_: ndarray of shape (n_samples, n_classes) or (n_samples, n_classes, n_outputs) Decision function computed with out-of-bag estimate on the training set. If n_estimators is small it might be possible that a data point was never left out during the bootstrap. In this case, oob_decision_function_ might contain NaN. This attribute exists only when oob_score is True.

  • estimators_samples_: list of arrays The subset of drawn samples (i.e., the in-bag samples) for each base estimator. Each subset is defined by an array of the indices selected.

    Added in 1.4

See Also

  • ExtraTreesRegressor: An extra-trees regressor with random splits.
  • RandomForestClassifier: A random forest classifier with optimal splits.
  • RandomForestRegressor: Ensemble regressor using trees with optimal splits.

Notes

The default values for the parameters controlling the size of the trees (e.g. max_depth, min_samples_leaf, etc.) lead to fully grown and unpruned trees which can potentially be very large on some data sets. To reduce memory consumption, the complexity and size of the trees should be controlled by setting those parameter values.

References

  • [1] P. Geurts, D. Ernst., and L. Wehenkel, "Extremely randomized trees", Machine Learning, 63(1), 3-42, 2006.

Examples

from sklearn.ensemble import ExtraTreesClassifier
from sklearn.datasets import make_classification
X, y = make_classification(n_features=4, random_state=0)
clf = ExtraTreesClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)
ExtraTreesClassifier(random_state=0)
clf.predict([[0, 0, 0, 0]])
array([1])
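
To use this model from Clojure, it can be plugged into a metamorph pipeline like any other model in this chapter, with the parameters above written as kebab-case keys. A minimal sketch, assuming the namespace requires from the start of this chapter and that the model is registered as :sklearn.classification/extra-trees-classifier, following the naming of the section headings (the toy data and parameter values are purely illustrative):

;; toy dataset: columns 0 and 1 are features, column 2 is the target
(def extra-trees-ds
  (dst/tensor->dataset [[0 0 0] [1 1 1] [2 2 2]]))

;; pipeline with the extra-trees model step; the kebab-case keys map to the
;; Python parameters n_estimators and random_state
(def extra-trees-pipe
  (mm/pipeline
   (ds-mm/set-inference-target 2)
   {:metamorph/id :model}
   (ml/model {:model-type :sklearn.classification/extra-trees-classifier
              :n-estimators 100
              :random-state 0})))

;; train the model
(def extra-trees-fitted
  (extra-trees-pipe {:metamorph/data extra-trees-ds
                     :metamorph/mode :fit}))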


24.2.11 /gaussian-nb

name type default description
priors
var-smoothing
predict-proba?

Gaussian Naive Bayes (GaussianNB).

Can perform online updates to model parameters via partial_fit. For details on the algorithm used to update feature means and variance online, see the Stanford CS tech report STAN-CS-79-773 by Chan, Golub, and LeVeque.

Read more in the User Guide: gaussian_naive_bayes.

Parameters

  • priors: array-like of shape (n_classes,), default=None Prior probabilities of the classes. If specified, the priors are not adjusted according to the data.

  • var_smoothing: float, default=1e-9 Portion of the largest variance of all features that is added to variances for calculation stability.

    Added in 0.20

Attributes

  • class_count_: ndarray of shape (n_classes,) Number of training samples observed in each class.

  • class_prior_: ndarray of shape (n_classes,) Probability of each class.

  • classes_: ndarray of shape (n_classes,) Class labels known to the classifier.

  • epsilon_: float Absolute additive value to variances.

  • n_features_in_: int Number of features seen during fit.

    Added in 0.24

  • feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen during fit. Defined only when X has feature names that are all strings.

    Added in 1.0

  • var_: ndarray of shape (n_classes, n_features) Variance of each feature per class.

    Added in 1.0

  • theta_: ndarray of shape (n_classes, n_features) Mean of each feature per class.

See Also

  • BernoulliNB: Naive Bayes classifier for multivariate Bernoulli models.
  • CategoricalNB: Naive Bayes classifier for categorical features.
  • ComplementNB: Complement Naive Bayes classifier.
  • MultinomialNB: Naive Bayes classifier for multinomial models.

Examples

import numpy as np
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
Y = np.array([1, 1, 1, 2, 2, 2])
from sklearn.naive_bayes import GaussianNB
clf = GaussianNB()
clf.fit(X, Y)
GaussianNB()
print(clf.predict([[-0.8, -1]]))
[1]
clf_pf = GaussianNB()
clf_pf.partial_fit(X, Y, np.unique(Y))
GaussianNB()
print(clf_pf.predict([[-0.8, -1]]))
[1]
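
The same fit can be expressed from Clojure with a metamorph pipeline. A minimal sketch, mirroring the toy data of the Python example above and assuming the chapter's namespace requires (column 2 holds the class label; the placeholder target value in the prediction row is overwritten by the prediction):

(def nb-ds
  (dst/tensor->dataset [[-1 -1 1] [-2 -1 1] [-3 -2 1]
                        [1 1 2] [2 1 2] [3 2 2]]))

(def nb-pipe
  (mm/pipeline
   (ds-mm/set-inference-target 2)
   {:metamorph/id :model}
   (ml/model {:model-type :sklearn.classification/gaussian-nb
              :var-smoothing 1e-9})))

;; fit, then predict on a new observation
(def nb-fitted
  (nb-pipe {:metamorph/data nb-ds
            :metamorph/mode :fit}))

(-> (mm/transform-pipe
     (dst/tensor->dataset [[-0.8 -1 0]])
     nb-pipe
     nb-fitted)
    :metamorph/data)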


24.2.12 /gaussian-process-classifier

name type default description
kernel
optimizer
multi-class
n-jobs
random-state
max-iter-predict
copy-x-train
n-restarts-optimizer
warm-start
predict-proba?

Gaussian process classification (GPC) based on Laplace approximation.

The implementation is based on Algorithm 3.1, 3.2, and 5.1 from [RW2006].

Internally, the Laplace approximation is used for approximating the non-Gaussian posterior by a Gaussian.

Currently, the implementation is restricted to using the logistic link function. For multi-class classification, several binary one-versus rest classifiers are fitted. Note that this class thus does not implement a true multi-class Laplace approximation.

Read more in the User Guide: gaussian_process.

Added in 0.18

Parameters

  • kernel: kernel instance, default=None The kernel specifying the covariance function of the GP. If None is passed, the kernel "1.0 * RBF(1.0)" is used as default. Note that the kernel's hyperparameters are optimized during fitting. Also kernel cannot be a CompoundKernel.

  • optimizer: 'fmin_l_bfgs_b', callable or None, default='fmin_l_bfgs_b' Can either be one of the internally supported optimizers for optimizing the kernel's parameters, specified by a string, or an externally defined optimizer passed as a callable. If a callable is passed, it must have the signature

def optimizer(obj_func, initial_theta, bounds):
    # * 'obj_func' is the objective function to be maximized, which
    #   takes the hyperparameters theta as parameter and an
    #   optional flag eval_gradient, which determines if the
    #   gradient is returned additionally to the function value
    # * 'initial_theta': the initial value for theta, which can be
    #   used by local optimizers
    # * 'bounds': the bounds on the values of theta
    ....
    # Returned are the best found hyperparameters theta and
    # the corresponding value of the target function.
    return theta_opt, func_min

    Per default, the 'L-BFGS-B' algorithm from scipy.optimize.minimize is used. If None is passed, the kernel's parameters are kept fixed. Available internal optimizers are: 'fmin_l_bfgs_b'.

  • n_restarts_optimizer: int, default=0 The number of restarts of the optimizer for finding the kernel's parameters which maximize the log-marginal likelihood. The first run of the optimizer is performed from the kernel's initial parameters, the remaining ones (if any) from thetas sampled log-uniform randomly from the space of allowed theta-values. If greater than 0, all bounds must be finite. Note that n_restarts_optimizer=0 implies that one run is performed.

  • max_iter_predict: int, default=100 The maximum number of iterations in Newton's method for approximating the posterior during predict. Smaller values will reduce computation time at the cost of worse results.

  • warm_start: bool, default=False If warm-starts are enabled, the solution of the last Newton iteration on the Laplace approximation of the posterior mode is used as initialization for the next call of _posterior_mode(). This can speed up convergence when _posterior_mode is called several times on similar problems as in hyperparameter optimization. See the Glossary.

  • copy_X_train: bool, default=True If True, a persistent copy of the training data is stored in the object. Otherwise, just a reference to the training data is stored, which might cause predictions to change if the data is modified externally.

  • random_state: int, RandomState instance or None, default=None Determines random number generation used to initialize the centers. Pass an int for reproducible results across multiple function calls. See Glossary.

  • multi_class: {'one_vs_rest', 'one_vs_one'}, default='one_vs_rest' Specifies how multi-class classification problems are handled. Supported are 'one_vs_rest' and 'one_vs_one'. In 'one_vs_rest', one binary Gaussian process classifier is fitted for each class, which is trained to separate this class from the rest. In 'one_vs_one', one binary Gaussian process classifier is fitted for each pair of classes, which is trained to separate these two classes. The predictions of these binary predictors are combined into multi-class predictions. Note that 'one_vs_one' does not support predicting probability estimates.

  • n_jobs: int, default=None The number of jobs to use for the computation: the specified multiclass problems are computed in parallel. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.

Attributes

  • base_estimator_: Estimator instance The estimator instance that defines the likelihood function using the observed data.

  • kernel_: kernel instance The kernel used for prediction. In case of binary classification, the structure of the kernel is the same as the one passed as parameter but with optimized hyperparameters. In case of multi-class classification, a CompoundKernel is returned which consists of the different kernels used in the one-versus-rest classifiers.

  • log_marginal_likelihood_value_: float The log-marginal-likelihood of self.kernel_.theta.

  • classes_: array-like of shape (n_classes,) Unique class labels.

  • n_classes_: int The number of classes in the training data.

  • n_features_in_: int Number of features seen during fit.

    Added in 0.24

  • feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen during fit. Defined only when X has feature names that are all strings.

    Added in 1.0

See Also

  • GaussianProcessRegressor: Gaussian process regression (GPR).

References

  • [RW2006] Carl E. Rasmussen and Christopher K.I. Williams, "Gaussian Processes for Machine Learning", MIT Press 2006

Examples

from sklearn.datasets import load_iris
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF
X, y = load_iris(return_X_y=True)
kernel = 1.0 * RBF(1.0)
gpc = GaussianProcessClassifier(kernel=kernel,
        random_state=0).fit(X, y)
gpc.score(X, y)
0.9866...
gpc.predict_proba(X[:2,:])
array([[0.83548752, 0.03228706, 0.13222543],
       [0.79064206, 0.06525643, 0.14410151]])

For a comparison of the GaussianProcessClassifier with other classifiers see: :ref:sphx_glr_auto_examples_classification_plot_classification_probability.py.
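
From Clojure, the corresponding model step in a metamorph pipeline would look roughly as follows. A minimal sketch, assuming the chapter's namespace requires; the kebab-case keys correspond to the max_iter_predict and random_state parameters above, and the values are illustrative only:

(def gpc-pipe
  (mm/pipeline
   (ds-mm/set-inference-target 2)
   {:metamorph/id :model}
   (ml/model {:model-type :sklearn.classification/gaussian-process-classifier
              :max-iter-predict 100
              :random-state 0})))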



24.2.13 /gradient-boosting-classifier

name type default description
n-iter-no-change
learning-rate
min-weight-fraction-leaf
max-leaf-nodes
min-impurity-decrease
min-samples-split
tol
subsample
ccp-alpha
random-state
min-samples-leaf
max-features
init
warm-start
max-depth
validation-fraction
n-estimators
criterion
loss
verbose
predict-proba?

Gradient Boosting for classification.

This algorithm builds an additive model in a forward stage-wise fashion; it allows for the optimization of arbitrary differentiable loss functions. In each stage n_classes_ regression trees are fit on the negative gradient of the loss function, e.g. binary or multiclass log loss. Binary classification is a special case where only a single regression tree is induced.

~sklearn.ensemble.HistGradientBoostingClassifier is a much faster variant of this algorithm for intermediate and large datasets (n_samples >= 10_000) and supports monotonic constraints.

Read more in the User Guide: gradient_boosting.

Parameters

  • loss: {'log_loss', 'exponential'}, default='log_loss' The loss function to be optimized. 'log_loss' refers to binomial and multinomial deviance, the same as used in logistic regression. It is a good choice for classification with probabilistic outputs. For loss 'exponential', gradient boosting recovers the AdaBoost algorithm.

  • learning_rate: float, default=0.1 Learning rate shrinks the contribution of each tree by learning_rate. There is a trade-off between learning_rate and n_estimators. Values must be in the range [0.0, inf).

    For an example of the effects of this parameter and its interaction with subsample, see :ref:sphx_glr_auto_examples_ensemble_plot_gradient_boosting_regularization.py.

  • n_estimators: int, default=100 The number of boosting stages to perform. Gradient boosting is fairly robust to over-fitting so a large number usually results in better performance. Values must be in the range [1, inf).

  • subsample: float, default=1.0 The fraction of samples to be used for fitting the individual base learners. If smaller than 1.0 this results in Stochastic Gradient Boosting. subsample interacts with the parameter n_estimators. Choosing subsample < 1.0 leads to a reduction of variance and an increase in bias. Values must be in the range (0.0, 1.0].

  • criterion: {'friedman_mse', 'squared_error'}, default='friedman_mse' The function to measure the quality of a split. Supported criteria are 'friedman_mse' for the mean squared error with improvement score by Friedman, 'squared_error' for mean squared error. The default value of 'friedman_mse' is generally the best as it can provide a better approximation in some cases.

    Added in 0.18

  • min_samples_split: int or float, default=2 The minimum number of samples required to split an internal node:

    • If int, values must be in the range [2, inf).
    • If float, values must be in the range (0.0, 1.0] and min_samples_split will be ceil(min_samples_split * n_samples).

    Changed in 0.18 Added float values for fractions.

  • min_samples_leaf: int or float, default=1 The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at least min_samples_leaf training samples in each of the left and right branches. This may have the effect of smoothing the model, especially in regression.

    • If int, values must be in the range [1, inf).
    • If float, values must be in the range (0.0, 1.0) and min_samples_leaf will be ceil(min_samples_leaf * n_samples).

    Changed in 0.18 Added float values for fractions.

  • min_weight_fraction_leaf: float, default=0.0 The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sample_weight is not provided. Values must be in the range [0.0, 0.5].

  • max_depth: int or None, default=3 Maximum depth of the individual regression estimators. The maximum depth limits the number of nodes in the tree. Tune this parameter for best performance; the best value depends on the interaction of the input variables. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples. If int, values must be in the range [1, inf).

  • min_impurity_decrease: float, default=0.0 A node will be split if this split induces a decrease of the impurity greater than or equal to this value. Values must be in the range [0.0, inf).

    The weighted impurity decrease equation is the following

N_t / N * (impurity - N_t_R / N_t * right_impurity
                    - N_t_L / N_t * left_impurity)

    where N is the total number of samples, N_t is the number of samples at the current node, N_t_L is the number of samples in the left child, and N_t_R is the number of samples in the right child.

    N, N_t, N_t_R and N_t_L all refer to the weighted sum, if sample_weight is passed.

    Added in 0.19

  • init: estimator or 'zero', default=None An estimator object that is used to compute the initial predictions. init has to provide fit and predict_proba. If 'zero', the initial raw predictions are set to zero. By default, a DummyEstimator predicting the classes priors is used.

  • random_state: int, RandomState instance or None, default=None Controls the random seed given to each Tree estimator at each boosting iteration. In addition, it controls the random permutation of the features at each split (see Notes for more details). It also controls the random splitting of the training data to obtain a validation set if n_iter_no_change is not None. Pass an int for reproducible output across multiple function calls. See Glossary.

  • max_features: {'sqrt', 'log2'}, int or float, default=None The number of features to consider when looking for the best split:

    • If int, values must be in the range [1, inf).
    • If float, values must be in the range (0.0, 1.0] and the features considered at each split will be max(1, int(max_features * n_features_in_)).
    • If 'sqrt', then max_features=sqrt(n_features).
    • If 'log2', then max_features=log2(n_features).
    • If None, then max_features=n_features.

    Choosing max_features < n_features leads to a reduction of variance and an increase in bias.

    Note: the search for a split does not stop until at least one valid partition of the node samples is found, even if it requires to effectively inspect more than max_features features.

  • verbose: int, default=0 Enable verbose output. If 1 then it prints progress and performance once in a while (the more trees the lower the frequency). If greater than 1 then it prints progress and performance for every tree. Values must be in the range [0, inf).

  • max_leaf_nodes: int, default=None Grow trees with max_leaf_nodes in best-first fashion. Best nodes are defined as relative reduction in impurity. Values must be in the range [2, inf). If None, then unlimited number of leaf nodes.

  • warm_start: bool, default=False When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just erase the previous solution. See the Glossary.

  • validation_fraction: float, default=0.1 The proportion of training data to set aside as validation set for early stopping. Values must be in the range (0.0, 1.0). Only used if n_iter_no_change is set to an integer.

    Added in 0.20

  • n_iter_no_change: int, default=None n_iter_no_change is used to decide if early stopping will be used to terminate training when validation score is not improving. By default it is set to None to disable early stopping. If set to a number, it will set aside validation_fraction size of the training data as validation and terminate training when validation score is not improving in all of the previous n_iter_no_change numbers of iterations. The split is stratified. Values must be in the range [1, inf). See :ref:sphx_glr_auto_examples_ensemble_plot_gradient_boosting_early_stopping.py.

    Added in 0.20

  • tol: float, default=1e-4 Tolerance for the early stopping. When the loss is not improving by at least tol for n_iter_no_change iterations (if set to a number), the training stops. Values must be in the range [0.0, inf).

    Added in 0.20

  • ccp_alpha: non-negative float, default=0.0 Complexity parameter used for Minimal Cost-Complexity Pruning. The subtree with the largest cost complexity that is smaller than ccp_alpha will be chosen. By default, no pruning is performed. Values must be in the range [0.0, inf). See :ref:minimal_cost_complexity_pruning for details. See :ref:sphx_glr_auto_examples_tree_plot_cost_complexity_pruning.py for an example of such pruning.

    Added in 0.22

Attributes

  • n_estimators_: int The number of estimators as selected by early stopping (if n_iter_no_change is specified). Otherwise it is set to n_estimators.

    Added in 0.20

  • n_trees_per_iteration_: int The number of trees that are built at each iteration. For binary classifiers, this is always 1.

    Added in 1.4.0

  • feature_importances_: ndarray of shape (n_features,) The impurity-based feature importances. The higher, the more important the feature. The importance of a feature is computed as the (normalized) total reduction of the criterion brought by that feature. It is also known as the Gini importance.

    Warning: impurity-based feature importances can be misleading for high cardinality features (many unique values). See sklearn.inspection.permutation_importance as an alternative.

  • oob_improvement_: ndarray of shape (n_estimators,) The improvement in loss on the out-of-bag samples relative to the previous iteration. oob_improvement_[0] is the improvement in loss of the first stage over the init estimator. Only available if subsample < 1.0.

  • oob_scores_: ndarray of shape (n_estimators,) The full history of the loss values on the out-of-bag samples. Only available if subsample < 1.0.

    Added in 1.3

  • oob_score_: float The last value of the loss on the out-of-bag samples. It is the same as oob_scores_[-1]. Only available if subsample < 1.0.

    Added in 1.3

  • train_score_: ndarray of shape (n_estimators,) The i-th score train_score_[i] is the loss of the model at iteration i on the in-bag sample. If subsample == 1 this is the loss on the training data.

  • init_: estimator The estimator that provides the initial predictions. Set via the init argument.

  • estimators_: ndarray of DecisionTreeRegressor of shape (n_estimators, n_trees_per_iteration_) The collection of fitted sub-estimators. n_trees_per_iteration_ is 1 for binary classification, otherwise n_classes.

  • classes_: ndarray of shape (n_classes,) The classes labels.

  • n_features_in_: int Number of features seen during fit.

    Added in 0.24

  • feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen during fit. Defined only when X has feature names that are all strings.

    Added in 1.0

  • n_classes_: int The number of classes.

  • max_features_: int The inferred value of max_features.

See Also

  • HistGradientBoostingClassifier: Histogram-based Gradient Boosting Classification Tree.
  • sklearn.tree.DecisionTreeClassifier: A decision tree classifier.
  • RandomForestClassifier: A meta-estimator that fits a number of decision tree classifiers on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting.
  • AdaBoostClassifier: A meta-estimator that begins by fitting a classifier on the original dataset and then fits additional copies of the classifier on the same dataset where the weights of incorrectly classified instances are adjusted such that subsequent classifiers focus more on difficult cases.

Notes

The features are always randomly permuted at each split. Therefore, the best found split may vary, even with the same training data and max_features=n_features, if the improvement of the criterion is identical for several splits enumerated during the search of the best split. To obtain a deterministic behaviour during fitting, random_state has to be fixed.

References

J. Friedman, Greedy Function Approximation: A Gradient Boosting Machine, The Annals of Statistics, Vol. 29, No. 5, 2001.

J. Friedman, Stochastic Gradient Boosting, 1999

T. Hastie, R. Tibshirani and J. Friedman. Elements of Statistical Learning Ed. 2, Springer, 2009.

Examples

The following example shows how to fit a gradient boosting classifier with 100 decision stumps as weak learners.

from sklearn.datasets import make_hastie_10_2
from sklearn.ensemble import GradientBoostingClassifier
X, y = make_hastie_10_2(random_state=0)
X_train, X_test = X[:2000], X[2000:]
y_train, y_test = y[:2000], y[2000:]
clf = GradientBoostingClassifier(n_estimators=100, learning_rate=1.0,
    max_depth=1, random_state=0).fit(X_train, y_train)
clf.score(X_test, y_test)
0.913
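
The same configuration of 100 decision stumps as weak learners can be sketched from Clojure; the kebab-case keys below correspond to the n_estimators, learning_rate, max_depth and random_state parameters used in the Python example (a hedged sketch, assuming the chapter's namespace requires):

(ml/model {:model-type :sklearn.classification/gradient-boosting-classifier
           :n-estimators 100
           :learning-rate 1.0
           :max-depth 1
           :random-state 0})

This step is then composed into a pipeline with mm/pipeline and ds-mm/set-inference-target, as in the earlier examples.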


24.2.14 /hist-gradient-boosting-classifier

name type default description
n-iter-no-change
learning-rate
max-leaf-nodes
scoring
tol
early-stopping
max-iter
random-state
max-bins
min-samples-leaf
max-features
monotonic-cst
warm-start
max-depth
validation-fraction
class-weight
loss
interaction-cst
verbose
categorical-features
l-2-regularization
predict-proba?

Histogram-based Gradient Boosting Classification Tree.

This estimator is much faster than GradientBoostingClassifier for big datasets (n_samples >= 10 000).

This estimator has native support for missing values (NaNs). During training, the tree grower learns at each split point whether samples with missing values should go to the left or right child, based on the potential gain. When predicting, samples with missing values are assigned to the left or right child consequently. If no missing values were encountered for a given feature during training, then samples with missing values are mapped to whichever child has the most samples.

This implementation is inspired by LightGBM .

Read more in the User Guide: histogram_based_gradient_boosting.

Added in 0.21

Parameters

  • loss: {'log_loss'}, default='log_loss' The loss function to use in the boosting process.

    For binary classification problems, 'log_loss' is also known as logistic loss, binomial deviance or binary crossentropy. Internally, the model fits one tree per boosting iteration and uses the logistic sigmoid function (expit) as inverse link function to compute the predicted positive class probability.

    For multiclass classification problems, 'log_loss' is also known as multinomial deviance or categorical crossentropy. Internally, the model fits one tree per boosting iteration and per class and uses the softmax function as inverse link function to compute the predicted probabilities of the classes.

  • learning_rate: float, default=0.1 The learning rate, also known as shrinkage. This is used as a multiplicative factor for the leaves values. Use 1 for no shrinkage.

  • max_iter: int, default=100 The maximum number of iterations of the boosting process, i.e. the maximum number of trees for binary classification. For multiclass classification, n_classes trees per iteration are built.

  • max_leaf_nodes: int or None, default=31 The maximum number of leaves for each tree. Must be strictly greater than 1. If None, there is no maximum limit.

  • max_depth: int or None, default=None The maximum depth of each tree. The depth of a tree is the number of edges to go from the root to the deepest leaf. Depth isn't constrained by default.

  • min_samples_leaf: int, default=20 The minimum number of samples per leaf. For small datasets with less than a few hundred samples, it is recommended to lower this value since only very shallow trees would be built.

  • l2_regularization: float, default=0 The L2 regularization parameter penalizing leaves with small hessians. Use 0 for no regularization (default).

  • max_features: float, default=1.0 Proportion of randomly chosen features in each and every node split. This is a form of regularization, smaller values make the trees weaker learners and might prevent overfitting. If interaction constraints from interaction_cst are present, only allowed features are taken into account for the subsampling.

    Added in 1.4

  • max_bins: int, default=255 The maximum number of bins to use for non-missing values. Before training, each feature of the input array X is binned into integer-valued bins, which allows for a much faster training stage. Features with a small number of unique values may use less than max_bins bins. In addition to the max_bins bins, one more bin is always reserved for missing values. Must be no larger than 255.

  • categorical_features: array-like of {bool, int, str} of shape (n_features) or shape (n_categorical_features,), default='from_dtype' Indicates the categorical features.

    • None : no feature will be considered categorical.
    • boolean array-like : boolean mask indicating categorical features.
    • integer array-like : integer indices indicating categorical features.
    • str array-like: names of categorical features (assuming the training data has feature names).
    • "from_dtype": dataframe columns with dtype "category" are considered to be categorical features. The input must be an object exposing a __dataframe__ method such as pandas or polars DataFrames to use this feature.

    For each categorical feature, there must be at most max_bins unique categories. Negative values for categorical features encoded as numeric dtypes are treated as missing values. All categorical values are converted to floating point numbers. This means that categorical values of 1.0 and 1 are treated as the same category.

    Read more in the User Guide: categorical_support_gbdt.

    Added in 0.24

    Changed in 1.2 Added support for feature names.

    Changed in 1.4 Added "from_dtype" option.

    Changed in 1.6 The default value changed from None to "from_dtype".

  • monotonic_cst: array-like of int of shape (n_features) or dict, default=None Monotonic constraint to enforce on each feature are specified using the following integer values:

    • 1: monotonic increase
    • 0: no constraint
    • -1: monotonic decrease

    If a dict with str keys, map feature to monotonic constraints by name. If an array, the features are mapped to constraints by position. See :ref:monotonic_cst_features_names for a usage example.

    The constraints are only valid for binary classifications and hold over the probability of the positive class. Read more in the User Guide: monotonic_cst_gbdt.

    Added in 0.23

    Changed in 1.2 Accept dict of constraints with feature names as keys.

  • interaction_cst: {"pairwise", "no_interactions"} or sequence of lists/tuples/sets of int, default=None Specify interaction constraints, the sets of features which can interact with each other in child node splits.

    Each item specifies the set of feature indices that are allowed to interact with each other. If there are more features than specified in these constraints, they are treated as if they were specified as an additional set.

    The strings "pairwise" and "no_interactions" are shorthands for allowing only pairwise or no interactions, respectively.

    For instance, with 5 features in total, interaction_cst=[{0, 1}] is equivalent to interaction_cst=[{0, 1}, {2, 3, 4}], and specifies that each branch of a tree will either only split on features 0 and 1 or only split on features 2, 3 and 4.

    See this example: ice-vs-pdp on how to use interaction_cst.

    Added in 1.2

  • warm_start: bool, default=False When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble. For results to be valid, the estimator should be re-trained on the same data only. See the Glossary.

  • early_stopping: 'auto' or bool, default='auto' If 'auto', early stopping is enabled if the sample size is larger than 10000 or if X_val and y_val are passed to fit. If True, early stopping is enabled, otherwise early stopping is disabled.

    Added in 0.23

  • scoring: str or callable or None, default='loss' Scoring method to use for early stopping. Only used if early_stopping is enabled. Options:

    • str: see :ref:scoring_string_names for options.
    • callable: a scorer callable object (e.g., function) with signature scorer(estimator, X, y). See :ref:scoring_callable for details.
    • None: accuracy: accuracy_score is used.
    • 'loss': early stopping is checked w.r.t the loss value.
  • validation_fraction: int or float or None, default=0.1 Proportion (or absolute size) of training data to set aside as validation data for early stopping. If None, early stopping is done on the training data. The value is ignored if either early stopping is not performed, e.g. early_stopping=False, or if X_val and y_val are passed to fit.

  • n_iter_no_change: int, default=10 Used to determine when to "early stop". The fitting process is stopped when none of the last n_iter_no_change scores are better than the (n_iter_no_change - 1)-th-to-last one, up to some tolerance. Only used if early stopping is performed.

  • tol: float, default=1e-7 The absolute tolerance to use when comparing scores. The higher the tolerance, the more likely we are to early stop: higher tolerance means that it will be harder for subsequent iterations to be considered an improvement upon the reference score.

  • verbose: int, default=0 The verbosity level. If not zero, print some information about the fitting process. 1 prints only summary info, 2 prints info per iteration.

  • random_state: int, RandomState instance or None, default=None Pseudo-random number generator to control the subsampling in the binning process, and the train/validation data split if early stopping is enabled. Pass an int for reproducible output across multiple function calls. See Glossary.

  • class_weight: dict or 'balanced', default=None Weights associated with classes in the form {class_label: weight}. If not given, all classes are supposed to have weight one. The "balanced" mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y)). Note that these weights will be multiplied with sample_weight (passed through the fit method) if sample_weight is specified.

    Added in 1.2

Attributes

  • classes_: array, shape = (n_classes,) Class labels.

  • do_early_stopping_: bool Indicates whether early stopping is used during training.

  • n_iter_: int The number of iterations as selected by early stopping, depending on the early_stopping parameter. Otherwise it corresponds to max_iter.

  • n_trees_per_iteration_: int The number of tree that are built at each iteration. This is equal to 1 for binary classification, and to n_classes for multiclass classification.

  • train_score_: ndarray, shape (n_iter_+1,) The scores at each iteration on the training data. The first entry is the score of the ensemble before the first iteration. Scores are computed according to the scoring parameter. If scoring is not 'loss', scores are computed on a subset of at most 10 000 samples. Empty if no early stopping.

  • validation_score_: ndarray, shape (n_iter_+1,) The scores at each iteration on the held-out validation data. The first entry is the score of the ensemble before the first iteration. Scores are computed according to the scoring parameter. Empty if no early stopping or if validation_fraction is None.

  • is_categorical_: ndarray, shape (n_features, ) or None Boolean mask for the categorical features. None if there are no categorical features.

  • n_features_in_: int Number of features seen during fit.

    Added in 0.24

  • feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen during fit. Defined only when X has feature names that are all strings.

    Added in 1.0

See Also

  • GradientBoostingClassifier: Exact gradient boosting method that does not scale as well on datasets with a large number of samples.
  • sklearn.tree.DecisionTreeClassifier: A decision tree classifier.
  • RandomForestClassifier: A meta-estimator that fits a number of decision tree classifiers on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting.
  • AdaBoostClassifier: A meta-estimator that begins by fitting a classifier on the original dataset and then fits additional copies of the classifier on the same dataset where the weights of incorrectly classified instances are adjusted such that subsequent classifiers focus more on difficult cases.

Examples

from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.datasets import load_iris
X, y = load_iris(return_X_y=True)
clf = HistGradientBoostingClassifier().fit(X, y)
clf.score(X, y)
1.0
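
Note how the kebab-case parameter names listed in the table above translate back to the Python parameters, e.g. :l-2-regularization for l2_regularization. A minimal Clojure sketch of the model step (assuming the chapter's namespace requires; :min-samples-leaf is lowered here only because toy datasets are tiny):

(ml/model {:model-type :sklearn.classification/hist-gradient-boosting-classifier
           :max-iter 100
           :min-samples-leaf 1
           :l-2-regularization 0.0})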


24.2.15 /k-neighbors-classifier

name type default description
algorithm
leaf-size
metric
metric-params
n-jobs
n-neighbors
p
weights
predict-proba?

Classifier implementing the k-nearest neighbors vote.

Read more in the User Guide: classification.

Parameters

  • n_neighbors: int, default=5 Number of neighbors to use by default for kneighbors queries.

  • weights: {'uniform', 'distance'}, callable or None, default='uniform' Weight function used in prediction. Possible values:

    • 'uniform' : uniform weights. All points in each neighborhood are weighted equally.
    • 'distance' : weight points by the inverse of their distance. In this case, closer neighbors of a query point will have a greater influence than neighbors which are further away.
    • [callable] : a user-defined function which accepts an array of distances, and returns an array of the same shape containing the weights.

    Refer to the example entitled :ref:sphx_glr_auto_examples_neighbors_plot_classification.py showing the impact of the weights parameter on the decision boundary.

  • algorithm: {'auto', 'ball_tree', 'kd_tree', 'brute'}, default='auto' Algorithm used to compute the nearest neighbors:

    • 'ball_tree' will use BallTree
    • 'kd_tree' will use KDTree
    • 'brute' will use a brute-force search.
    • 'auto' will attempt to decide the most appropriate algorithm based on the values passed to fit method.

    Note: fitting on sparse input will override the setting of this parameter, using brute force.

  • leaf_size: int, default=30 Leaf size passed to BallTree or KDTree. This can affect the speed of the construction and query, as well as the memory required to store the tree. The optimal value depends on the nature of the problem.

  • p: float, default=2 Power parameter for the Minkowski metric. When p = 1, this is equivalent to using manhattan_distance (l1), and euclidean_distance (l2) for p = 2. For arbitrary p, minkowski_distance (l_p) is used. This parameter is expected to be positive.

  • metric: str or callable, default='minkowski' Metric to use for distance computation. Default is "minkowski", which results in the standard Euclidean distance when p = 2. See the documentation of scipy.spatial.distance and the metrics listed in ~sklearn.metrics.pairwise.distance_metrics for valid metric values.

    If metric is "precomputed", X is assumed to be a distance matrix and must be square during fit. X may be a sparse graph, in which case only "nonzero" elements may be considered neighbors.

    If metric is a callable function, it takes two arrays representing 1D vectors as inputs and must return one value indicating the distance between those vectors. This works for Scipy's metrics, but is less efficient than passing the metric name as a string.

  • metric_params: dict, default=None Additional keyword arguments for the metric function.

  • n_jobs: int, default=None The number of parallel jobs to run for neighbors search. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details. Doesn't affect fit method.

Attributes

  • classes_: array of shape (n_classes,) Class labels known to the classifier.

  • effective_metric_: str or callable The distance metric used. It will be the same as the metric parameter or a synonym of it, e.g. 'euclidean' if the metric parameter is set to 'minkowski' and the p parameter to 2.

  • effective_metric_params_: dict Additional keyword arguments for the metric function. For most metrics it will be the same as the metric_params parameter, but it may also contain the p parameter value if the effective_metric_ attribute is set to 'minkowski'.

  • n_features_in_: int Number of features seen during fit.

    Added in 0.24

  • feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen during fit. Defined only when X has feature names that are all strings.

    Added in 1.0

  • n_samples_fit_: int Number of samples in the fitted data.

  • outputs_2d_: bool False when y's shape is (n_samples,) or (n_samples, 1) during fit, otherwise True.

See Also

  • RadiusNeighborsClassifier: Classifier based on neighbors within a fixed radius.
  • KNeighborsRegressor: Regression based on k-nearest neighbors.
  • RadiusNeighborsRegressor: Regression based on neighbors within a fixed radius.
  • NearestNeighbors: Unsupervised learner for implementing neighbor searches.

Notes

See Nearest Neighbors: neighbors in the online documentation for a discussion of the choice of algorithm and leaf_size.


⚠️ Warning

Regarding the Nearest Neighbors algorithms, if it is found that two neighbors, neighbor k+1 and k, have identical distances but different labels, the results will depend on the ordering of the training data.


https://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm

Examples

X = [[0], [1], [2], [3]]
y = [0, 0, 1, 1]
from sklearn.neighbors import KNeighborsClassifier
neigh = KNeighborsClassifier(n_neighbors=3)
neigh.fit(X, y)
KNeighborsClassifier(...)
print(neigh.predict([[1.1]]))
[0]
print(neigh.predict_proba([[0.9]]))
[[0.666 0.333]]
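
A Clojure counterpart of this example, as a minimal sketch assuming the chapter's namespace requires (column 0 is the single feature, column 1 the class label):

(def knn-ds
  (dst/tensor->dataset [[0 0] [1 0] [2 1] [3 1]]))

(def knn-pipe
  (mm/pipeline
   (ds-mm/set-inference-target 1)
   {:metamorph/id :model}
   (ml/model {:model-type :sklearn.classification/k-neighbors-classifier
              :n-neighbors 3})))

(def knn-fitted
  (knn-pipe {:metamorph/data knn-ds
             :metamorph/mode :fit}))

;; predict for a new point; the trailing 0 is just a placeholder target value
(-> (mm/transform-pipe
     (dst/tensor->dataset [[1.1 0]])
     knn-pipe
     knn-fitted)
    :metamorph/data)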


24.2.16 /label-propagation

name type default description
gamma
kernel
max-iter
n-jobs
n-neighbors
tol
predict-proba?

Label Propagation classifier.

Read more in the User Guide: label_propagation.

Parameters

  • kernel: {'knn', 'rbf'} or callable, default='rbf' String identifier for kernel function to use or the kernel function itself. Only 'rbf' and 'knn' strings are valid inputs. The function passed should take two inputs, each of shape (n_samples, n_features), and return a (n_samples, n_samples) shaped weight matrix.

  • gamma: float, default=20 Parameter for rbf kernel.

  • n_neighbors: int, default=7 Parameter for knn kernel which needs to be strictly positive.

  • max_iter: int, default=1000 Change maximum number of iterations allowed.

  • tol: float, default=1e-3 Convergence tolerance: threshold to consider the system at steady state.

  • n_jobs: int, default=None The number of parallel jobs to run. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.

Attributes

  • X_: {array-like, sparse matrix} of shape (n_samples, n_features) Input array.

  • classes_: ndarray of shape (n_classes,) The distinct labels used in classifying instances.

  • label_distributions_: ndarray of shape (n_samples, n_classes) Categorical distribution for each item.

  • transduction_: ndarray of shape (n_samples) Label assigned to each item during fit.

  • n_features_in_: int Number of features seen during fit.

    Added in 0.24

  • feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen during fit. Defined only when X has feature names that are all strings.

    Added in 1.0

  • n_iter_: int Number of iterations run.

See Also

  • LabelSpreading: Alternate label propagation strategy more robust to noise.

References

Xiaojin Zhu and Zoubin Ghahramani. Learning from labeled and unlabeled data with label propagation. Technical Report CMU-CALD-02-107, Carnegie Mellon University, 2002 http://pages.cs.wisc.edu/~jerryzhu/pub/CMU-CALD-02-107.pdf

Examples

import numpy as np
from sklearn import datasets
from sklearn.semi_supervised import LabelPropagation
label_prop_model = LabelPropagation()
iris = datasets.load_iris()
rng = np.random.RandomState(42)
random_unlabeled_points = rng.rand(len(iris.target)) < 0.3
labels = np.copy(iris.target)
labels[random_unlabeled_points] = -1
label_prop_model.fit(iris.data, labels)
LabelPropagation(...)
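
From Clojure, the model step follows the same pattern as the other classifiers in this chapter; unlabeled rows carry -1 in the target column, as in the Python example above. A minimal sketch (assuming the chapter's namespace requires; the parameter values shown are the documented defaults and purely illustrative):

(ml/model {:model-type :sklearn.classification/label-propagation
           :gamma 20
           :max-iter 1000})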


24.2.17 /label-spreading

name type default description
alpha
gamma
kernel
max-iter
n-jobs
n-neighbors
tol
predict-proba?

LabelSpreading model for semi-supervised learning.

This model is similar to the basic Label Propagation algorithm, but uses affinity matrix based on the normalized graph Laplacian and soft clamping across the labels.

Read more in the User Guide: label_propagation.

Parameters

  • kernel: {'knn', 'rbf'} or callable, default='rbf' String identifier for kernel function to use or the kernel function itself. Only 'rbf' and 'knn' strings are valid inputs. The function passed should take two inputs, each of shape (n_samples, n_features), and return a (n_samples, n_samples) shaped weight matrix.

  • gamma: float, default=20 Parameter for rbf kernel.

  • n_neighbors: int, default=7 Parameter for knn kernel which is a strictly positive integer.

  • alpha: float, default=0.2 Clamping factor. A value in (0, 1) that specifies the relative amount that an instance should adopt the information from its neighbors as opposed to its initial label. alpha=0 means keeping the initial label information; alpha=1 means replacing all initial information.

  • max_iter: int, default=30 Maximum number of iterations allowed.

  • tol: float, default=1e-3 Convergence tolerance: threshold to consider the system at steady state.

  • n_jobs: int, default=None The number of parallel jobs to run. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.

Attributes

  • X_: ndarray of shape (n_samples, n_features) Input array.

  • classes_: ndarray of shape (n_classes,) The distinct labels used in classifying instances.

  • label_distributions_: ndarray of shape (n_samples, n_classes) Categorical distribution for each item.

  • transduction_: ndarray of shape (n_samples,) Label assigned to each item during fit.

  • n_features_in_: int Number of features seen during fit.

    Added in 0.24

  • feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen during fit. Defined only when X has feature names that are all strings.

    Added in 1.0

  • n_iter_: int Number of iterations run.

See Also

  • LabelPropagation: Unregularized graph based semi-supervised learning.

References

Dengyong Zhou, Olivier Bousquet, Thomas Navin Lal, Jason Weston, Bernhard Schoelkopf. Learning with local and global consistency (2004)

Examples

import numpy as np
from sklearn import datasets
from sklearn.semi_supervised import LabelSpreading
label_prop_model = LabelSpreading()
iris = datasets.load_iris()
rng = np.random.RandomState(42)
random_unlabeled_points = rng.rand(len(iris.target)) < 0.3
labels = np.copy(iris.target)
labels[random_unlabeled_points] = -1
label_prop_model.fit(iris.data, labels)
LabelSpreading(...)
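
The corresponding Clojure model step differs from label-propagation mainly by the additional clamping factor :alpha; a minimal sketch, assuming the chapter's namespace requires:

(ml/model {:model-type :sklearn.classification/label-spreading
           :alpha 0.2
           :max-iter 30})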


24.2.18 /linear-discriminant-analysis

name type default description
covariance-estimator
n-components
priors
shrinkage
solver
store-covariance
tol
predict-proba?

Linear Discriminant Analysis.

A classifier with a linear decision boundary, generated by fitting class conditional densities to the data and using Bayes' rule.

The model fits a Gaussian density to each class, assuming that all classes share the same covariance matrix.

The fitted model can also be used to reduce the dimensionality of the input by projecting it to the most discriminative directions, using the transform method.

Added in 0.17

For a comparison between ~sklearn.discriminant_analysis.LinearDiscriminantAnalysis and ~sklearn.discriminant_analysis.QuadraticDiscriminantAnalysis, see :ref:sphx_glr_auto_examples_classification_plot_lda_qda.py.

Read more in the User Guide: lda_qda.

Parameters

  • solver: {'svd', 'lsqr', 'eigen'}, default='svd' Solver to use, possible values:

    • 'svd': Singular value decomposition (default). Does not compute the covariance matrix, therefore this solver is recommended for data with a large number of features.
    • 'lsqr': Least squares solution. Can be combined with shrinkage or custom covariance estimator.
    • 'eigen': Eigenvalue decomposition. Can be combined with shrinkage or custom covariance estimator.

    Changed in 1.2 solver="svd" now has experimental Array API support. See the Array API User Guide: array_api for more details.

  • shrinkage: 'auto' or float, default=None Shrinkage parameter, possible values:

    • None: no shrinkage (default).
    • 'auto': automatic shrinkage using the Ledoit-Wolf lemma.
    • float between 0 and 1: fixed shrinkage parameter.

    This should be left to None if covariance_estimator is used. Note that shrinkage works only with 'lsqr' and 'eigen' solvers.

    For a usage example, see :ref:sphx_glr_auto_examples_classification_plot_lda.py.

  • priors: array-like of shape (n_classes,), default=None The class prior probabilities. By default, the class proportions are inferred from the training data.

  • n_components: int, default=None Number of components (<= min(n_classes - 1, n_features)) for dimensionality reduction. If None, will be set to min(n_classes - 1, n_features). This parameter only affects the transform method.

    For a usage example, see :ref:sphx_glr_auto_examples_decomposition_plot_pca_vs_lda.py.

  • store_covariance: bool, default=False If True, explicitly compute the weighted within-class covariance matrix when solver is 'svd'. The matrix is always computed and stored for the other solvers.

    Added in 0.17

  • tol: float, default=1.0e-4 Absolute threshold for a singular value of X to be considered significant, used to estimate the rank of X. Dimensions whose singular values are non-significant are discarded. Only used if solver is 'svd'.

    Added in 0.17

  • covariance_estimator: covariance estimator, default=None If not None, covariance_estimator is used to estimate the covariance matrices instead of relying on the empirical covariance estimator (with potential shrinkage). The object should have a fit method and a covariance_ attribute like the estimators in sklearn.covariance. If None, the shrinkage parameter drives the estimate.

    This should be left to None if shrinkage is used. Note that covariance_estimator works only with 'lsqr' and 'eigen' solvers.

    Added in 0.24

Attributes

  • coef_: ndarray of shape (n_features,) or (n_classes, n_features) Weight vector(s).

  • intercept_: ndarray of shape (n_classes,) Intercept term.

  • covariance_: array-like of shape (n_features, n_features) Weighted within-class covariance matrix. It corresponds to sum_k prior_k * C_k where C_k is the covariance matrix of the samples in class k. The C_k are estimated using the (potentially shrunk) biased estimator of covariance. If solver is 'svd', only exists when store_covariance is True.

  • explained_variance_ratio_: ndarray of shape (n_components,) Percentage of variance explained by each of the selected components. If n_components is not set then all components are stored and the sum of explained variances is equal to 1.0. Only available when eigen or svd solver is used.

  • means_: array-like of shape (n_classes, n_features) Class-wise means.

  • priors_: array-like of shape (n_classes,) Class priors (sum to 1).

  • scalings_: array-like of shape (rank, n_classes - 1) Scaling of the features in the space spanned by the class centroids. Only available for 'svd' and 'eigen' solvers.

  • xbar_: array-like of shape (n_features,) Overall mean. Only present if solver is 'svd'.

  • classes_: array-like of shape (n_classes,) Unique class labels.

  • n_features_in_: int Number of features seen during fit.

    Added in 0.24

  • feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen during fit. Defined only when X has feature names that are all strings.

    Added in 1.0

See Also

  • QuadraticDiscriminantAnalysis: Quadratic Discriminant Analysis.

Examples

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
y = np.array([1, 1, 1, 2, 2, 2])
clf = LinearDiscriminantAnalysis()
clf.fit(X, y)
LinearDiscriminantAnalysis()
print(clf.predict([[-0.8, -1]]))
[1]
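
The same kind of fit can also be expressed from Clojure through a metamorph.ml pipeline. The following is a minimal sketch only: it assumes the model is registered under the key :sklearn.classification/linear-discriminant-analysis (following the naming pattern of this chapter) and that the Python parameters translate to kebab-case keys such as :solver.

(def lda-pipe
  (mm/pipeline
   (ds-mm/set-inference-target 2)
   {:metamorph/id :model}
   ;; assumed model key and kebab-case parameter name
   (ml/model {:model-type :sklearn.classification/linear-discriminant-analysis
              :solver "svd"})))

;; Fit on a tiny two-class dataset; the last column is the target.
(def lda-ctx
  (lda-pipe {:metamorph/data (dst/tensor->dataset
                              [[-1 -1 0] [-2 -1 0] [1 1 1] [2 1 1]])
             :metamorph/mode :fit}))

;; Predict for a new observation (the target column is a placeholder).
(-> (mm/transform-pipe (dst/tensor->dataset [[-0.8 -1 0]]) lda-pipe lda-ctx)
    :metamorph/data)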


24.2.19 /linear-svc

name type default description
tol
intercept-scaling
multi-class
penalty
c
max-iter
random-state
dual
fit-intercept
class-weight
loss
verbose
predict-proba?

Linear Support Vector Classification.

Similar to SVC with parameter kernel='linear', but implemented in terms of liblinear rather than libsvm, so it has more flexibility in the choice of penalties and loss functions and should scale better to large numbers of samples.

The main differences between ~sklearn.svm.LinearSVC and ~sklearn.svm.SVC lie in the loss function used by default, and in the handling of intercept regularization between those two implementations.

This class supports both dense and sparse input and the multiclass support is handled according to a one-vs-the-rest scheme.

Read more in the User Guide: svm_classification.

Parameters

  • penalty: {'l1', 'l2'}, default='l2' Specifies the norm used in the penalization. The 'l2' penalty is the standard used in SVC. The 'l1' leads to coef_ vectors that are sparse.

  • loss: {'hinge', 'squared_hinge'}, default='squared_hinge' Specifies the loss function. 'hinge' is the standard SVM loss (used e.g. by the SVC class) while 'squared_hinge' is the square of the hinge loss. The combination of penalty='l1' and loss='hinge' is not supported.

  • dual: "auto" or bool, default="auto" Select the algorithm to either solve the dual or primal optimization problem. Prefer dual=False when n_samples > n_features. dual="auto" will choose the value of the parameter automatically, based on the values of n_samples, n_features, loss, multi_class and penalty. If n_samples < n_features and optimizer supports chosen loss, multi_class and penalty, then dual will be set to True, otherwise it will be set to False.

    Changed in 1.3 The "auto" option is added in version 1.3 and will be the default in version 1.5.

  • tol: float, default=1e-4 Tolerance for stopping criteria.

  • C: float, default=1.0 Regularization parameter. The strength of the regularization is inversely proportional to C. Must be strictly positive. For an intuitive visualization of the effects of scaling the regularization parameter C, see :ref:sphx_glr_auto_examples_svm_plot_svm_scale_c.py.

  • multi_class: {'ovr', 'crammer_singer'}, default='ovr' Determines the multi-class strategy if y contains more than two classes. "ovr" trains n_classes one-vs-rest classifiers, while "crammer_singer" optimizes a joint objective over all classes. While crammer_singer is interesting from a theoretical perspective as it is consistent, it is seldom used in practice as it rarely leads to better accuracy and is more expensive to compute. If "crammer_singer" is chosen, the options loss, penalty and dual will be ignored.

  • fit_intercept: bool, default=True Whether or not to fit an intercept. If set to True, the feature vector is extended to include an intercept term: [x_1, ..., x_n, 1], where 1 corresponds to the intercept. If set to False, no intercept will be used in calculations (i.e. data is expected to be already centered).

  • intercept_scaling: float, default=1.0 When fit_intercept is True, the instance vector x becomes [x_1, ..., x_n, intercept_scaling], i.e. a "synthetic" feature with a constant value equal to intercept_scaling is appended to the instance vector. The intercept becomes intercept_scaling * synthetic feature weight. Note that liblinear internally penalizes the intercept, treating it like any other term in the feature vector. To reduce the impact of the regularization on the intercept, the intercept_scaling parameter can be set to a value greater than 1; the higher the value of intercept_scaling, the lower the impact of regularization on it. Then, the weights become [w_x_1, ..., w_x_n, w_intercept*intercept_scaling], where w_x_1, ..., w_x_n represent the feature weights and the intercept weight is scaled by intercept_scaling. This scaling allows the intercept term to have a different regularization behavior compared to the other features.

  • class_weight: dict or 'balanced', default=None Set the parameter C of class i to class_weight[i]*C for SVC. If not given, all classes are supposed to have weight one. The "balanced" mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y)).

  • verbose: int, default=0 Enable verbose output. Note that this setting takes advantage of a per-process runtime setting in liblinear that, if enabled, may not work properly in a multithreaded context.

  • random_state: int, RandomState instance or None, default=None Controls the pseudo random number generation for shuffling the data for the dual coordinate descent (if dual=True). When dual=False the underlying implementation of LinearSVC is not random and random_state has no effect on the results. Pass an int for reproducible output across multiple function calls. See Glossary .

  • max_iter: int, default=1000 The maximum number of iterations to be run.

Attributes

  • coef_: ndarray of shape (1, n_features) if n_classes == 2 else (n_classes, n_features) Weights assigned to the features (coefficients in the primal problem).

    coef_ is a readonly property derived from raw_coef_ that follows the internal memory layout of liblinear.

  • intercept_: ndarray of shape (1,) if n_classes == 2 else (n_classes,) Constants in decision function.

  • classes_: ndarray of shape (n_classes,) The unique classes labels.

  • n_features_in_: int Number of features seen during fit.

    Added in 0.24

  • feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen during fit. Defined only when X has feature names that are all strings.

    Added in 1.0

  • n_iter_: int Maximum number of iterations run across all classes.

See Also

  • SVC: Implementation of Support Vector Machine classifier using libsvm: the kernel can be non-linear but its SMO algorithm does not scale to large number of samples as LinearSVC does.

    Furthermore SVC multi-class mode is implemented using one vs one scheme while LinearSVC uses one vs the rest. It is possible to implement one vs the rest with SVC by using the ~sklearn.multiclass.OneVsRestClassifier wrapper.

    Finally SVC can fit dense data without memory copy if the input is C-contiguous. Sparse data will still incur memory copy though.

  • sklearn.linear_model.SGDClassifier: SGDClassifier can optimize the same cost function as LinearSVC by adjusting the penalty and loss parameters. In addition it requires less memory, allows incremental (online) learning, and implements various loss functions and regularization regimes.

Notes

The underlying C implementation uses a random number generator to select features when fitting the model. It is thus not uncommon to have slightly different results for the same input data. If that happens, try with a smaller tol parameter.

The underlying implementation, liblinear, uses a sparse internal representation for the data that will incur a memory copy.

Predict output may not match that of standalone liblinear in certain cases. See differences from liblinear: liblinear_differences in the narrative documentation.

References

LIBLINEAR: A Library for Large Linear Classification

Examples

from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import make_classification
X, y = make_classification(n_features=4, random_state=0)
clf = make_pipeline(StandardScaler(),
                    LinearSVC(random_state=0, tol=1e-5))
clf.fit(X, y)
Pipeline(steps=[('standardscaler', StandardScaler()),
                ('linearsvc', LinearSVC(random_state=0, tol=1e-05))])
print(clf.named_steps['linearsvc'].coef_)
[[0.141   0.526 0.679 0.493]]
print(clf.named_steps['linearsvc'].intercept_)
[0.1693]
print(clf.predict([[0, 0, 0, 0]]))
[1]
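
For comparison with the Python example above, here is a minimal metamorph.ml sketch. It assumes the model key :sklearn.classification/linear-svc and the kebab-case parameter keys from the table above (for example :c, :tol and :max-iter).

(def linear-svc-pipe
  (mm/pipeline
   (ds-mm/set-inference-target 2)
   {:metamorph/id :model}
   ;; assumed model key; :c, :tol and :max-iter mirror C, tol and max_iter
   (ml/model {:model-type :sklearn.classification/linear-svc
              :c 1.0
              :tol 1e-4
              :max-iter 1000})))

;; Fit on a tiny two-class dataset; the last column is the target.
(def linear-svc-ctx
  (linear-svc-pipe
   {:metamorph/data (dst/tensor->dataset
                     [[0 0 0] [1 0 0] [4 5 1] [5 5 1]])
    :metamorph/mode :fit}))

(-> (mm/transform-pipe (dst/tensor->dataset [[4 4 0]]) linear-svc-pipe linear-svc-ctx)
    :metamorph/data)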


24.2.20 /logistic-regression

name type default description
tol
intercept-scaling
solver
penalty
c
max-iter
n-jobs
random-state
dual
fit-intercept
warm-start
l-1-ratio
class-weight
verbose
predict-proba?

Logistic Regression (aka logit, MaxEnt) classifier.

This class implements regularized logistic regression using a set of available solvers. Note that regularization is applied by default. It can handle both dense and sparse input X. Use C-ordered arrays or CSR matrices containing 64-bit floats for optimal performance; any other input format will be converted (and copied).

The solvers 'lbfgs', 'newton-cg', 'newton-cholesky' and 'sag' support only L2 regularization with primal formulation, or no regularization. The 'liblinear' solver supports both L1 and L2 regularization (but not both, i.e. elastic-net), with a dual formulation only for the L2 penalty. The Elastic-Net (combination of L1 and L2) regularization is only supported by the 'saga' solver.

For multiclass problems (whenever n_classes >= 3), all solvers except 'liblinear' optimize the (penalized) multinomial loss. 'liblinear' only handles binary classification but can be extended to handle multiclass by using ~sklearn.multiclass.OneVsRestClassifier.

Read more in the User Guide: logistic_regression.

Parameters

  • penalty: {'l1', 'l2', 'elasticnet', None}, default='l2' Specify the norm of the penalty:

    • None: no penalty is added;
    • 'l2': add a L2 penalty term and it is the default choice;
    • 'l1': add a L1 penalty term;
    • 'elasticnet': both L1 and L2 penalty terms are added.

⚠️ Warning

Some penalties may not work with some solvers. See the parameter solver below, to know the compatibility between the penalty and solver.

Added in 0.19 l1 penalty with SAGA solver (allowing 'multinomial' + L1)

Deprecated since 1.8 penalty was deprecated in version 1.8 and will be removed in 1.10. Use l1_ratio instead: l1_ratio=0 for penalty='l2', l1_ratio=1 for penalty='l1', and any float between 0 and 1 for penalty='elasticnet'.


  • C: float, default=1.0 Inverse of regularization strength; must be a positive float. Like in support vector machines, smaller values specify stronger regularization. C=np.inf results in unpenalized logistic regression. For a visual example on the effect of tuning the C parameter with an L1 penalty, see: :ref:sphx_glr_auto_examples_linear_model_plot_logistic_path.py.

  • l1_ratio: float, default=0.0 The Elastic-Net mixing parameter, with 0 <= l1_ratio <= 1. Setting l1_ratio=1 gives a pure L1-penalty, setting l1_ratio=0 a pure L2-penalty. Any value between 0 and 1 gives an Elastic-Net penalty of the form l1_ratio * L1 + (1 - l1_ratio) * L2.


⚠️ Warning

Certain values of l1_ratio, i.e. some penalties, may not work with some solvers. See the parameter solver below, to know the compatibility between the penalty and solver.

Changed in 1.8 Default value changed from None to 0.0.

Deprecated since 1.8 None is deprecated and will be removed in version 1.10. Always use l1_ratio to specify the penalty type.


  • dual: bool, default=False Dual (constrained) or primal (regularized, see also this equation: regularized-logistic-loss) formulation. Dual formulation is only implemented for l2 penalty with liblinear solver. Prefer dual=False when n_samples > n_features.

  • tol: float, default=1e-4 Tolerance for stopping criteria.

  • fit_intercept: bool, default=True Specifies if a constant (a.k.a. bias or intercept) should be added to the decision function.

  • intercept_scaling: float, default=1 Useful only when the solver liblinear is used and self.fit_intercept is set to True. In this case, x becomes [x, self.intercept_scaling], i.e. a "synthetic" feature with constant value equal to intercept_scaling is appended to the instance vector. The intercept becomes intercept_scaling * synthetic_feature_weight.


🛈 Note

The synthetic feature weight is subject to L1 or L2 regularization as all other features. To lessen the effect of regularization on synthetic feature weight (and therefore on the intercept) intercept_scaling has to be increased.


  • class_weight: dict or 'balanced', default=None Weights associated with classes in the form {class_label: weight}. If not given, all classes are supposed to have weight one.

    The "balanced" mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y)).

    Note that these weights will be multiplied with sample_weight (passed through the fit method) if sample_weight is specified.

    Added in 0.17 class_weight='balanced'

  • random_state: int, RandomState instance, default=None Used when solver == 'sag', 'saga' or 'liblinear' to shuffle the data. See Glossary for details.

  • solver: {'lbfgs', 'liblinear', 'newton-cg', 'newton-cholesky', 'sag', 'saga'}, default='lbfgs'

    Algorithm to use in the optimization problem. Default is 'lbfgs'. To choose a solver, you might want to consider the following aspects:

    • 'lbfgs' is a good default solver because it works reasonably well for a wide class of problems.
    • For multiclass problems (n_classes >= 3), all solvers except 'liblinear' minimize the full multinomial loss, 'liblinear' will raise an error.
    • 'newton-cholesky' is a good choice for n_samples >> n_features * n_classes, especially with one-hot encoded categorical features with rare categories. Be aware that the memory usage of this solver has a quadratic dependency on n_features * n_classes because it explicitly computes the full Hessian matrix.
    • For small datasets, 'liblinear' is a good choice, whereas 'sag' and 'saga' are faster for large ones;
    • 'liblinear' can only handle binary classification by default. To apply a one-versus-rest scheme for the multiclass setting one can wrap it with the ~sklearn.multiclass.OneVsRestClassifier.

⚠️ Warning

The choice of the algorithm depends on the penalty chosen (l1_ratio=0 for L2-penalty, l1_ratio=1 for L1-penalty and 0 < l1_ratio < 1 for Elastic-Net) and on (multinomial) multiclass support:

solver              l1_ratio                   multinomial multiclass
'lbfgs'             l1_ratio=0                 yes
'liblinear'         l1_ratio=1 or l1_ratio=0   no
'newton-cg'         l1_ratio=0                 yes
'newton-cholesky'   l1_ratio=0                 yes
'sag'               l1_ratio=0                 yes
'saga'              0 <= l1_ratio <= 1         yes

🛈 Note

'sag' and 'saga' fast convergence is only guaranteed on features with approximately the same scale. You can preprocess the data with a scaler from sklearn.preprocessing.

See also: the User Guide for more information regarding LogisticRegression, and more specifically the table summarizing solver/penalty support.

Added in 0.17 Stochastic Average Gradient (SAG) descent solver. Multinomial support in version 0.18.

Added in 0.19 SAGA solver.

Changed in 0.22 The default solver changed from 'liblinear' to 'lbfgs' in 0.22.

Added in 1.2 newton-cholesky solver. Multinomial support in version 1.6.


  • max_iter: int, default=100 Maximum number of iterations taken for the solvers to converge.

  • verbose: int, default=0 For the liblinear and lbfgs solvers set verbose to any positive number for verbosity.

  • warm_start: bool, default=False When set to True, reuse the solution of the previous call to fit as initialization, otherwise, just erase the previous solution. Useless for liblinear solver. See the Glossary .

    Added in 0.17 warm_start to support lbfgs, newton-cg, sag, saga solvers.

  • n_jobs: int, default=None Does not have any effect.

    Deprecated since 1.8 n_jobs is deprecated in version 1.8 and will be removed in 1.10.

Attributes

  • classes_: ndarray of shape (n_classes, ) A list of class labels known to the classifier.

  • coef_: ndarray of shape (1, n_features) or (n_classes, n_features) Coefficient of the features in the decision function.

    coef_ is of shape (1, n_features) when the given problem is binary.

  • intercept_: ndarray of shape (1,) or (n_classes,) Intercept (a.k.a. bias) added to the decision function.

    If fit_intercept is set to False, the intercept is set to zero. intercept_ is of shape (1,) when the given problem is binary.

  • n_features_in_: int Number of features seen during fit.

    Added in 0.24

  • feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen during fit. Defined only when X has feature names that are all strings.

    Added in 1.0

  • n_iter_: ndarray of shape (1, ) Actual number of iterations for all classes.

    Changed in 0.20 In SciPy <= 1.0.0 the number of lbfgs iterations may exceed max_iter. n_iter_ will now report at most max_iter.

See Also

  • SGDClassifier: Incrementally trained logistic regression (when given the parameter loss="log_loss").
  • LogisticRegressionCV: Logistic regression with built-in cross validation.

Notes

The underlying C implementation uses a random number generator to select features when fitting the model. It is thus not uncommon to have slightly different results for the same input data. If that happens, try with a smaller tol parameter.

Predict output may not match that of standalone liblinear in certain cases. See differences from liblinear: liblinear_differences in the narrative documentation.

References

L-BFGS-B -- Software for Large-scale Bound-constrained Optimization Ciyou Zhu, Richard Byrd, Jorge Nocedal and Jose Luis Morales. http://users.iems.northwestern.edu/~nocedal/lbfgsb.html

LIBLINEAR -- A Library for Large Linear Classification https://www.csie.ntu.edu.tw/~cjlin/liblinear/

SAG -- Mark Schmidt, Nicolas Le Roux, and Francis Bach Minimizing Finite Sums with the Stochastic Average Gradient https://hal.inria.fr/hal-00860051/document

SAGA -- Defazio, A., Bach F. & Lacoste-Julien S. (2014). "SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives" (arXiv:1407.0202)

Hsiang-Fu Yu, Fang-Lan Huang, Chih-Jen Lin (2011). Dual coordinate descent methods for logistic regression and maximum entropy models. Machine Learning 85(1-2):41-75. https://www.csie.ntu.edu.tw/~cjlin/papers/maxent_dual.pdf

Examples

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
X, y = load_iris(return_X_y=True)
clf = LogisticRegression(random_state=0).fit(X, y)
clf.predict(X[:2, :])
array([0, 0])
clf.predict_proba(X[:2, :])
array([[9.82e-01, 1.82e-02, 1.44e-08],
       [9.72e-01, 2.82e-02, 3.02e-08]])
clf.score(X, y)
0.97

For a comparison of the LogisticRegression with other classifiers see: :ref:sphx_glr_auto_examples_classification_plot_classification_probability.py.
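
The solver/penalty constraints discussed above apply equally when the model is used from Clojure. As a hedged sketch of requesting an elastic-net fit, the 'saga' solver can be combined with an :l-1-ratio between 0 and 1 (assuming these kebab-case keys map onto solver and l1_ratio):

(def elastic-lr-pipe
  (mm/pipeline
   (ds-mm/set-inference-target 2)
   {:metamorph/id :model}
   ;; 'saga' is the only solver that supports 0 < l1_ratio < 1 (elastic-net);
   ;; on older scikit-learn versions the deprecated :penalty "elasticnet"
   ;; key may also be required.
   (ml/model {:model-type :sklearn.classification/logistic-regression
              :solver "saga"
              :l-1-ratio 0.5
              :max-iter 1000})))

The resulting pipeline is fitted and applied exactly like the logistic regression pipeline shown earlier in this chapter.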



24.2.21 /logistic-regression-cv

name type default description
refit
scoring
tol
intercept-scaling
solver
penalty
max-iter
n-jobs
random-state
dual
use-legacy-attributes
fit-intercept
cv
cs
class-weight
verbose
l-1-ratios
predict-proba?

Logistic Regression CV (aka logit, MaxEnt) classifier.

See glossary entry for cross-validation estimator.

This class implements regularized logistic regression with implicit cross validation for the penalty parameters C and l1_ratio, see LogisticRegression, using a set of available solvers.

The solvers 'lbfgs', 'newton-cg', 'newton-cholesky' and 'sag' support only L2 regularization with primal formulation. The 'liblinear' solver supports both L1 and L2 regularization (but not both, i.e. elastic-net), with a dual formulation only for the L2 penalty. The Elastic-Net (combination of L1 and L2) regularization is only supported by the 'saga' solver.

For the grid of Cs values and l1_ratios values, the best hyperparameter is selected by the cross-validator ~sklearn.model_selection.StratifiedKFold, but it can be changed using the cv parameter. All solvers except 'liblinear' can warm-start the coefficients (see Glossary).

Read more in the User Guide: logistic_regression.

Parameters

  • Cs: int or list of floats, default=10 Each of the values in Cs describes the inverse of regularization strength. If Cs is an int, then a grid of Cs values is chosen in a logarithmic scale between 1e-4 and 1e4. Like in support vector machines, smaller values specify stronger regularization.

  • l1_ratios: array-like of shape (n_l1_ratios), default=None Floats between 0 and 1 passed as Elastic-Net mixing parameter (scaling between L1 and L2 penalties). For l1_ratio = 0 the penalty is an L2 penalty. For l1_ratio = 1 it is an L1 penalty. For 0 < l1_ratio < 1, the penalty is a combination of L1 and L2. All the values of the given array-like are tested by cross-validation and the one giving the best prediction score is used.


⚠️ Warning

Certain values of l1_ratios, i.e. some penalties, may not work with some solvers. See the parameter solver below, to know the compatibility between the penalty and solver.

Deprecated since 1.8 l1_ratios=None is deprecated in 1.8 and will raise an error in version 1.10. The default value will change from None to (0.0,) in version 1.10.


  • fit_intercept: bool, default=True Specifies if a constant (a.k.a. bias or intercept) should be added to the decision function.

  • cv: int or cross-validation generator, default=None The default cross-validation generator used is Stratified K-Folds. If an integer is provided, it specifies the number of folds, n_folds, used. See the sklearn.model_selection module for the list of possible cross-validation objects.

    Changed in 0.22 cv default value if None changed from 3-fold to 5-fold.

  • dual: bool, default=False Dual (constrained) or primal (regularized, see also this equation: regularized-logistic-loss) formulation. Dual formulation is only implemented for l2 penalty with liblinear solver. Prefer dual=False when n_samples > n_features.

  • penalty: {'l1', 'l2', 'elasticnet'}, default='l2' Specify the norm of the penalty:

    • 'l2': add a L2 penalty term (used by default);
    • 'l1': add a L1 penalty term;
    • 'elasticnet': both L1 and L2 penalty terms are added.

⚠️ Warning

Some penalties may not work with some solvers. See the parameter solver below, to know the compatibility between the penalty and solver.

Deprecated since 1.8 penalty was deprecated in version 1.8 and will be removed in 1.10. Use l1_ratio instead: l1_ratio=0 for penalty='l2', l1_ratio=1 for penalty='l1', and any float between 0 and 1 for penalty='elasticnet'.


  • scoring: str or callable, default=None The scoring method to use for cross-validation. Options:

    • str: see :ref:scoring_string_names for options.
    • callable: a scorer callable object (e.g., function) with signature scorer(estimator, X, y). See :ref:scoring_callable for details.
    • None: accuracy: accuracy_score is used.
  • solver: {'lbfgs', 'liblinear', 'newton-cg', 'newton-cholesky', 'sag', 'saga'}, default='lbfgs'

    Algorithm to use in the optimization problem. Default is 'lbfgs'. To choose a solver, you might want to consider the following aspects:

    • 'lbfgs' is a good default solver because it works reasonably well for a wide class of problems.
    • For multiclass problems (n_classes >= 3), all solvers except 'liblinear' minimize the full multinomial loss, 'liblinear' will raise an error.
    • 'newton-cholesky' is a good choice for n_samples >> n_features * n_classes, especially with one-hot encoded categorical features with rare categories. Be aware that the memory usage of this solver has a quadratic dependency on n_features * n_classes because it explicitly computes the full Hessian matrix.
    • For small datasets, 'liblinear' is a good choice, whereas 'sag' and 'saga' are faster for large ones;
    • 'liblinear' might be slower in LogisticRegressionCV because it does not handle warm-starting.
    • 'liblinear' can only handle binary classification by default. To apply a one-versus-rest scheme for the multiclass setting one can wrap it with the ~sklearn.multiclass.OneVsRestClassifier.

⚠️ Warning

The choice of the algorithm depends on the penalty (l1_ratio=0 for L2-penalty, l1_ratio=1 for L1-penalty and 0 < l1_ratio < 1 for Elastic-Net) chosen and on (multinomial) multiclass support:

solver              l1_ratio                   multinomial multiclass
'lbfgs'             l1_ratio=0                 yes
'liblinear'         l1_ratio=1 or l1_ratio=0   no
'newton-cg'         l1_ratio=0                 yes
'newton-cholesky'   l1_ratio=0                 yes
'sag'               l1_ratio=0                 yes
'saga'              0 <= l1_ratio <= 1         yes

🛈 Note

'sag' and 'saga' fast convergence is only guaranteed on features with approximately the same scale. You can preprocess the data with a scaler from sklearn.preprocessing.

Added in 0.17 Stochastic Average Gradient (SAG) descent solver. Multinomial support in version 0.18.

Added in 0.19 SAGA solver.

Added in 1.2 newton-cholesky solver. Multinomial support in version 1.6.


  • tol: float, default=1e-4 Tolerance for stopping criteria.

  • max_iter: int, default=100 Maximum number of iterations of the optimization algorithm.

  • class_weight: dict or 'balanced', default=None Weights associated with classes in the form {class_label: weight}. If not given, all classes are supposed to have weight one.

    The "balanced" mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y)).

    Note that these weights will be multiplied with sample_weight (passed through the fit method) if sample_weight is specified.

    Added in 0.17 class_weight == 'balanced'

  • n_jobs: int, default=None Number of CPU cores used during the cross-validation loop. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.

  • verbose: int, default=0 For the 'liblinear', 'sag' and 'lbfgs' solvers set verbose to any positive number for verbosity.

  • refit: bool, default=True If set to True, the scores are averaged across all folds, and the coefs and the C that correspond to the best score are taken, and a final refit is done using these parameters. Otherwise the coefs, intercepts and C that correspond to the best scores across folds are averaged.

  • intercept_scaling: float, default=1 Useful only when the solver liblinear is used and self.fit_intercept is set to True. In this case, x becomes [x, self.intercept_scaling], i.e. a "synthetic" feature with constant value equal to intercept_scaling is appended to the instance vector. The intercept becomes intercept_scaling * synthetic_feature_weight.


🛈 Note

The synthetic feature weight is subject to L1 or L2 regularization as all other features. To lessen the effect of regularization on synthetic feature weight (and therefore on the intercept) intercept_scaling has to be increased.


  • random_state: int, RandomState instance, default=None Used when solver='sag', 'saga' or 'liblinear' to shuffle the data. Note that this only applies to the solver and not the cross-validation generator. See Glossary for details.

  • use_legacy_attributes: bool, default=True If True, use legacy values for attributes:

    • C_ is an ndarray of shape (n_classes,) with the same value repeated
    • l1_ratio_ is an ndarray of shape (n_classes,) with the same value repeated
    • coefs_paths_ is a dict with class labels as keys and ndarrays as values
    • scores_ is a dict with class labels as keys and ndarrays as values
    • n_iter_ is an ndarray of shape (1, n_folds, n_cs) or similar

    If False, use new values for attributes:

    • C_ is a float
    • l1_ratio_ is a float
    • coefs_paths_ is an ndarray of shape (n_folds, n_l1_ratios, n_cs, n_classes, n_features) For binary problems (n_classes=2), the 2nd last dimension is 1.
    • scores_ is an ndarray of shape (n_folds, n_l1_ratios, n_cs)
    • n_iter_ is an ndarray of shape (n_folds, n_l1_ratios, n_cs)

    Changed in 1.10 The default will change from True to False in version 1.10. Deprecated since 1.10 use_legacy_attributes will be deprecated in version 1.10 and be removed in 1.12.

Attributes

  • classes_: ndarray of shape (n_classes, ) A list of class labels known to the classifier.

  • coef_: ndarray of shape (1, n_features) or (n_classes, n_features) Coefficient of the features in the decision function.

    coef_ is of shape (1, n_features) when the given problem is binary.

  • intercept_: ndarray of shape (1,) or (n_classes,) Intercept (a.k.a. bias) added to the decision function.

    If fit_intercept is set to False, the intercept is set to zero. intercept_ is of shape (1,) when the problem is binary.

  • Cs_: ndarray of shape (n_cs) Array of C i.e. inverse of regularization parameter values used for cross-validation.

  • l1_ratios_: ndarray of shape (n_l1_ratios) Array of l1_ratios used for cross-validation. If l1_ratios=None is used (i.e. penalty is not 'elasticnet'), this is set to [None].

  • coefs_paths_: dict of ndarray of shape (n_folds, n_cs, n_dof) or (n_folds, n_cs, n_l1_ratios, n_dof) A dict with classes as the keys, and the path of coefficients obtained during cross-validating across each fold (n_folds) and then across each Cs (n_cs). The size of the coefficients is the number of degrees of freedom (n_dof), i.e. without intercept n_dof=n_features and with intercept n_dof=n_features+1. If penalty='elasticnet', there is an additional dimension for the number of l1_ratio values (n_l1_ratios), which gives a shape of (n_folds, n_cs, n_l1_ratios_, n_dof). See also parameter use_legacy_attributes.

  • scores_: dict A dict with classes as the keys, and the values as the grid of scores obtained during cross-validating each fold. The same score is repeated across all classes. Each dict value has shape (n_folds, n_cs) or (n_folds, n_cs, n_l1_ratios) if penalty='elasticnet'. See also parameter use_legacy_attributes.

  • C_: ndarray of shape (n_classes,) or (1,) The value of C that maps to the best score, repeated n_classes times. If refit is set to False, the best C is the average of the C's that correspond to the best score for each fold. C_ is of shape (1,) when the problem is binary. See also parameter use_legacy_attributes.

  • l1_ratio_: ndarray of shape (n_classes,) or (n_classes - 1,) The value of l1_ratio that maps to the best score, repeated n_classes times. If refit is set to False, the best l1_ratio is the average of the l1_ratio's that correspond to the best score for each fold. l1_ratio_ is of shape (1,) when the problem is binary. See also parameter use_legacy_attributes.

  • n_iter_: ndarray of shape (1, n_folds, n_cs) or (1, n_folds, n_cs, n_l1_ratios) Actual number of iterations for all classes, folds and Cs. If penalty='elasticnet', the shape is (1, n_folds, n_cs, n_l1_ratios). See also parameter use_legacy_attributes.

  • n_features_in_: int Number of features seen during fit.

    Added in 0.24

  • feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen during fit. Defined only when X has feature names that are all strings.

    Added in 1.0

See Also

  • LogisticRegression: Logistic regression without tuning the hyperparameter C.

Examples

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegressionCV
X, y = load_iris(return_X_y=True)
clf = LogisticRegressionCV(
    cv=5, random_state=0, use_legacy_attributes=False, l1_ratios=(0,)
).fit(X, y)
clf.predict(X[:2, :])
array([0, 0])
clf.predict_proba(X[:2, :]).shape
(2, 3)
clf.score(X, y)
0.98...
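
A corresponding metamorph.ml sketch, assuming the key :sklearn.classification/logistic-regression-cv and kebab-case versions of the parameters above (for example :cs and :cv):

(def lr-cv-pipe
  (mm/pipeline
   (ds-mm/set-inference-target 2)
   {:metamorph/id :model}
   ;; assumed model key; :cs and :cv mirror Cs and cv
   (ml/model {:model-type :sklearn.classification/logistic-regression-cv
              :cs 5
              :cv 2
              :max-iter 500})))

The pipeline is then fitted and applied like the other pipelines in this chapter; on a real dataset the number of folds given via :cv should of course be compatible with the class sizes.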


24.2.22 /mlp-classifier

name type default description
n-iter-no-change
learning-rate
activation
hidden-layer-sizes
tol
beta-2
early-stopping
nesterovs-momentum
batch-size
solver
shuffle
power-t
max-fun
beta-1
max-iter
random-state
momentum
learning-rate-init
alpha
warm-start
validation-fraction
verbose
epsilon
predict-proba?

Multi-layer Perceptron classifier.

This model optimizes the log-loss function using LBFGS or stochastic gradient descent.

Added in 0.18

Parameters

  • hidden_layer_sizes: array-like of shape (n_layers - 2,), default=(100,) The ith element represents the number of neurons in the ith hidden layer.

  • activation: {'identity', 'logistic', 'tanh', 'relu'}, default='relu' Activation function for the hidden layer.

    • 'identity', no-op activation, useful to implement linear bottleneck, returns f(x) = x

    • 'logistic', the logistic sigmoid function, returns f(x) = 1 / (1 + exp(-x)).

    • 'tanh', the hyperbolic tan function, returns f(x) = tanh(x).

    • 'relu', the rectified linear unit function, returns f(x) = max(0, x)

  • solver: {'lbfgs', 'sgd', 'adam'}, default='adam' The solver for weight optimization.

    • 'lbfgs' is an optimizer in the family of quasi-Newton methods.

    • 'sgd' refers to stochastic gradient descent.

    • 'adam' refers to a stochastic gradient-based optimizer proposed by Kingma, Diederik, and Jimmy Ba

    For a comparison between Adam optimizer and SGD, see :ref:sphx_glr_auto_examples_neural_networks_plot_mlp_training_curves.py.

    Note: The default solver 'adam' works pretty well on relatively large datasets (with thousands of training samples or more) in terms of both training time and validation score. For small datasets, however, 'lbfgs' can converge faster and perform better.

  • alpha: float, default=0.0001 Strength of the L2 regularization term. The L2 regularization term is divided by the sample size when added to the loss.

    For an example usage and visualization of varying regularization, see :ref:sphx_glr_auto_examples_neural_networks_plot_mlp_alpha.py.

  • batch_size: int, default='auto' Size of minibatches for stochastic optimizers. If the solver is 'lbfgs', the classifier will not use minibatch. When set to "auto", batch_size=min(200, n_samples).

  • learning_rate: {'constant', 'invscaling', 'adaptive'}, default='constant' Learning rate schedule for weight updates.

    • 'constant' is a constant learning rate given by 'learning_rate_init'.

    • 'invscaling' gradually decreases the learning rate at each time step 't' using an inverse scaling exponent of 'power_t'. effective_learning_rate = learning_rate_init / pow(t, power_t)

    • 'adaptive' keeps the learning rate constant to 'learning_rate_init' as long as training loss keeps decreasing. Each time two consecutive epochs fail to decrease training loss by at least tol, or fail to increase validation score by at least tol if 'early_stopping' is on, the current learning rate is divided by 5.

    Only used when solver='sgd'.

  • learning_rate_init: float, default=0.001 The initial learning rate used. It controls the step-size in updating the weights. Only used when solver='sgd' or 'adam'.

  • power_t: float, default=0.5 The exponent for inverse scaling learning rate. It is used in updating effective learning rate when the learning_rate is set to 'invscaling'. Only used when solver='sgd'.

  • max_iter: int, default=200 Maximum number of iterations. The solver iterates until convergence (determined by 'tol') or this number of iterations. For stochastic solvers ('sgd', 'adam'), note that this determines the number of epochs (how many times each data point will be used), not the number of gradient steps.

  • shuffle: bool, default=True Whether to shuffle samples in each iteration. Only used when solver='sgd' or 'adam'.

  • random_state: int, RandomState instance, default=None Determines random number generation for weights and bias initialization, train-test split if early stopping is used, and batch sampling when solver='sgd' or 'adam'. Pass an int for reproducible results across multiple function calls. See Glossary .

  • tol: float, default=1e-4 Tolerance for the optimization. When the loss or score is not improving by at least tol for n_iter_no_change consecutive iterations, unless learning_rate is set to 'adaptive', convergence is considered to be reached and training stops.

  • verbose: bool, default=False Whether to print progress messages to stdout.

  • warm_start: bool, default=False When set to True, reuse the solution of the previous call to fit as initialization, otherwise, just erase the previous solution. See the Glossary .

  • momentum: float, default=0.9 Momentum for gradient descent update. Should be between 0 and 1. Only used when solver='sgd'.

  • nesterovs_momentum: bool, default=True Whether to use Nesterov's momentum. Only used when solver='sgd' and momentum > 0.

  • early_stopping: bool, default=False Whether to use early stopping to terminate training when validation score is not improving. If set to True, it will automatically set aside validation_fraction of training data as validation and terminate training when validation score is not improving by at least tol for n_iter_no_change consecutive epochs. The split is stratified, except in a multilabel setting. If early stopping is False, then the training stops when the training loss does not improve by more than tol for n_iter_no_change consecutive passes over the training set. Only effective when solver='sgd' or 'adam'.

  • validation_fraction: float, default=0.1 The proportion of training data to set aside as validation set for early stopping. Must be between 0 and 1. Only used if early_stopping is True.

  • beta_1: float, default=0.9 Exponential decay rate for estimates of first moment vector in adam, should be in [0, 1). Only used when solver='adam'.

  • beta_2: float, default=0.999 Exponential decay rate for estimates of second moment vector in adam, should be in [0, 1). Only used when solver='adam'.

  • epsilon: float, default=1e-8 Value for numerical stability in adam. Only used when solver='adam'.

  • n_iter_no_change: int, default=10 Maximum number of epochs to not meet tol improvement. Only effective when solver='sgd' or 'adam'.

    Added in 0.20

  • max_fun: int, default=15000 Only used when solver='lbfgs'. Maximum number of loss function calls. The solver iterates until convergence (determined by 'tol'), number of iterations reaches max_iter, or this number of loss function calls. Note that number of loss function calls will be greater than or equal to the number of iterations for the MLPClassifier.

    Added in 0.22

Attributes

  • classes_: ndarray or list of ndarray of shape (n_classes,) Class labels for each output.

  • loss_: float The current loss computed with the loss function.

  • best_loss_: float or None The minimum loss reached by the solver throughout fitting. If early_stopping=True, this attribute is set to None. Refer to the best_validation_score_ fitted attribute instead.

  • loss_curve_: list of shape (n_iter_,) The ith element in the list represents the loss at the ith iteration.

  • validation_scores_: list of shape (n_iter_,) or None The score at each iteration on a held-out validation set. The score reported is the accuracy score. Only available if early_stopping=True, otherwise the attribute is set to None.

  • best_validation_score_: float or None The best validation score (i.e. accuracy score) that triggered the early stopping. Only available if early_stopping=True, otherwise the attribute is set to None.

  • t_: int The number of training samples seen by the solver during fitting.

  • coefs_: list of shape (n_layers - 1,) The ith element in the list represents the weight matrix corresponding to layer i.

  • intercepts_: list of shape (n_layers - 1,) The ith element in the list represents the bias vector corresponding to layer i + 1.

  • n_features_in_: int Number of features seen during fit.

    Added in 0.24

  • feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen during fit. Defined only when X has feature names that are all strings.

    Added in 1.0

  • n_iter_: int The number of iterations the solver has run.

  • n_layers_: int Number of layers.

  • n_outputs_: int Number of outputs.

  • out_activation_: str Name of the output activation function.

See Also

  • MLPRegressor: Multi-layer Perceptron regressor.
  • BernoulliRBM: Bernoulli Restricted Boltzmann Machine (RBM).

Notes

MLPClassifier trains iteratively since at each time step the partial derivatives of the loss function with respect to the model parameters are computed to update the parameters.

It can also have a regularization term added to the loss function that shrinks model parameters to prevent overfitting.

This implementation works with data represented as dense numpy arrays or sparse scipy arrays of floating point values.

References

Hinton, Geoffrey E. "Connectionist learning procedures." Artificial intelligence 40.1 (1989): 185-234.

Glorot, Xavier, and Yoshua Bengio. "Understanding the difficulty of training deep feedforward neural networks." International Conference on Artificial Intelligence and Statistics. 2010.

He, Kaiming, et al (2015). "Delving deep into rectifiers: Surpassing human-level performance on imagenet classification." (arXiv:1502.01852)

Kingma, Diederik, and Jimmy Ba (2014). "Adam: A method for stochastic optimization." (arXiv:1412.6980)

Examples

from sklearn.neural_network import MLPClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
X, y = make_classification(n_samples=100, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y,
                                                    random_state=1)
clf = MLPClassifier(random_state=1, max_iter=300).fit(X_train, y_train)
clf.predict_proba(X_test[:1])
array([[0.0383, 0.961]])
clf.predict(X_test[:5, :])
array([1, 0, 1, 0, 1])
clf.score(X_test, y_test)
0.8...
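
A minimal Clojure counterpart, assuming the key :sklearn.classification/mlp-classifier; note that :hidden-layer-sizes is passed as a Clojure vector, which sklearn-clj is expected to hand to Python as a sequence:

(def mlp-pipe
  (mm/pipeline
   (ds-mm/set-inference-target 2)
   {:metamorph/id :model}
   ;; assumed model key; vector value mirrors hidden_layer_sizes
   (ml/model {:model-type :sklearn.classification/mlp-classifier
              :hidden-layer-sizes [8]
              :max-iter 300
              :random-state 1})))

;; Fit on a tiny two-class dataset; the last column is the target.
(def mlp-ctx
  (mlp-pipe {:metamorph/data (dst/tensor->dataset
                              [[0 0 0] [1 0 0] [4 5 1] [5 5 1]])
             :metamorph/mode :fit}))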


24.2.23 /multinomial-nb

name type default description
alpha
class-prior
fit-prior
force-alpha
predict-proba?

Naive Bayes classifier for multinomial models.

The multinomial Naive Bayes classifier is suitable for classification with discrete features (e.g., word counts for text classification). The multinomial distribution normally requires integer feature counts. However, in practice, fractional counts such as tf-idf may also work.

Read more in the User Guide: multinomial_naive_bayes.

Parameters

  • alpha: float or array-like of shape (n_features,), default=1.0 Additive (Laplace/Lidstone) smoothing parameter (set alpha=0 and force_alpha=True, for no smoothing).

  • force_alpha: bool, default=True If False and alpha is less than 1e-10, it will set alpha to 1e-10. If True, alpha will remain unchanged. This may cause numerical errors if alpha is too close to 0.

    Added in 1.2 Changed in 1.4 The default value of force_alpha changed to True.

  • fit_prior: bool, default=True Whether to learn class prior probabilities or not. If false, a uniform prior will be used.

  • class_prior: array-like of shape (n_classes,), default=None Prior probabilities of the classes. If specified, the priors are not adjusted according to the data.

Attributes

  • class_count_: ndarray of shape (n_classes,) Number of samples encountered for each class during fitting. This value is weighted by the sample weight when provided.

  • class_log_prior_: ndarray of shape (n_classes,) Smoothed empirical log probability for each class.

  • classes_: ndarray of shape (n_classes,) Class labels known to the classifier

  • feature_count_: ndarray of shape (n_classes, n_features) Number of samples encountered for each (class, feature) during fitting. This value is weighted by the sample weight when provided.

  • feature_log_prob_: ndarray of shape (n_classes, n_features) Empirical log probability of features given a class, P(x_i|y).

  • n_features_in_: int Number of features seen during fit.

    Added in 0.24

  • feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen during fit. Defined only when X has feature names that are all strings.

    Added in 1.0

See Also

  • BernoulliNB: Naive Bayes classifier for multivariate Bernoulli models.
  • CategoricalNB: Naive Bayes classifier for categorical features.
  • ComplementNB: Complement Naive Bayes classifier.
  • GaussianNB: Gaussian Naive Bayes.

References

C.D. Manning, P. Raghavan and H. Schuetze (2008). Introduction to Information Retrieval. Cambridge University Press, pp. 234-265. https://nlp.stanford.edu/IR-book/html/htmledition/naive-bayes-text-classification-1.html

Examples

import numpy as np
rng = np.random.RandomState(1)
X = rng.randint(5, size=(6, 100))
y = np.array([1, 2, 3, 4, 5, 6])
from sklearn.naive_bayes import MultinomialNB
clf = MultinomialNB()
clf.fit(X, y)
MultinomialNB()
print(clf.predict(X[2:3]))
[3]
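
A hedged Clojure sketch, assuming the key :sklearn.classification/multinomial-nb; the features must be non-negative counts (or count-like values such as tf-idf):

(def mnb-pipe
  (mm/pipeline
   (ds-mm/set-inference-target 2)
   {:metamorph/id :model}
   ;; assumed model key; :alpha mirrors the smoothing parameter alpha
   (ml/model {:model-type :sklearn.classification/multinomial-nb
              :alpha 1.0})))

;; Fit on tiny count data; the last column is the target.
(def mnb-ctx
  (mnb-pipe {:metamorph/data (dst/tensor->dataset
                              [[3 0 0] [4 1 0] [0 5 1] [1 6 1]])
             :metamorph/mode :fit}))

(-> (mm/transform-pipe (dst/tensor->dataset [[0 4 0]]) mnb-pipe mnb-ctx)
    :metamorph/data)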


24.2.24 /nearest-centroid

name type default description
metric
priors
shrink-threshold
predict-proba?

Nearest centroid classifier.

Each class is represented by its centroid, with test samples classified to the class with the nearest centroid.

Read more in the User Guide: nearest_centroid_classifier.

Parameters

  • metric: {"euclidean", "manhattan"}, default="euclidean" Metric to use for distance computation.

    If metric="euclidean", the centroid for the samples corresponding to each class is the arithmetic mean, which minimizes the sum of squared L1 distances. If metric="manhattan", the centroid is the feature-wise median, which minimizes the sum of L1 distances.

    Changed in 1.5 All metrics but "euclidean" and "manhattan" were deprecated and now raise an error.

    Changed in 0.19 metric='precomputed' was deprecated and now raises an error

  • shrink_threshold: float, default=None Threshold for shrinking centroids to remove features.

  • priors: {"uniform", "empirical"} or array-like of shape (n_classes,), default="uniform" The class prior probabilities. By default, the class proportions are inferred from the training data.

    Added in 1.6

Attributes

  • centroids_: array-like of shape (n_classes, n_features) Centroid of each class.

  • classes_: array of shape (n_classes,) The unique classes labels.

  • n_features_in_: int Number of features seen during fit.

    Added in 0.24

  • feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen during fit. Defined only when X has feature names that are all strings.

    Added in 1.0

  • deviations_: ndarray of shape (n_classes, n_features) Deviations (or shrinkages) of the centroids of each class from the overall centroid. Equal to eq. (18.4) if shrink_threshold=None, else (18.5) p. 653 of [2]. Can be used to identify features used for classification.

    Added in 1.6

  • within_class_std_dev_: ndarray of shape (n_features,) Pooled or within-class standard deviation of input data.

    Added in 1.6

  • class_prior_: ndarray of shape (n_classes,) The class prior probabilities.

    Added in 1.6

See Also

  • KNeighborsClassifier: Nearest neighbors classifier.

Notes

When used for text classification with tf-idf vectors, this classifier is also known as the Rocchio classifier.

References

[1] Tibshirani, R., Hastie, T., Narasimhan, B., & Chu, G. (2002). Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proceedings of the National Academy of Sciences of the United States of America, 99(10), 6567-6572. The National Academy of Sciences.

[2] Hastie, T., Tibshirani, R., Friedman, J. (2009). The Elements of Statistical Learning Data Mining, Inference, and Prediction. 2nd Edition. New York, Springer.

Examples

from sklearn.neighbors import NearestCentroid
import numpy as np
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
y = np.array([1, 1, 1, 2, 2, 2])
clf = NearestCentroid()
clf.fit(X, y)
NearestCentroid()
print(clf.predict([[-0.8, -1]]))
[1]
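
The equivalent usage from Clojure, sketched under the assumption that the model key is :sklearn.classification/nearest-centroid and that :metric maps onto metric:

(def centroid-pipe
  (mm/pipeline
   (ds-mm/set-inference-target 2)
   {:metamorph/id :model}
   ;; assumed model key
   (ml/model {:model-type :sklearn.classification/nearest-centroid
              :metric "euclidean"})))

;; Fit on a tiny two-class dataset; the last column is the target.
(def centroid-ctx
  (centroid-pipe {:metamorph/data (dst/tensor->dataset
                                   [[-1 -1 1] [-2 -1 1] [1 1 2] [2 1 2]])
                  :metamorph/mode :fit}))

(-> (mm/transform-pipe (dst/tensor->dataset [[-0.8 -1 1]]) centroid-pipe centroid-ctx)
    :metamorph/data)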


24.2.25 /nu-svc

name type default description
break-ties
kernel
gamma
degree
decision-function-shape
probability
tol
nu
shrinking
max-iter
random-state
coef-0
class-weight
cache-size
verbose
predict-proba?

Nu-Support Vector Classification.

Similar to SVC but uses a parameter to control the number of support vectors.

The implementation is based on libsvm.

Read more in the User Guide: svm_classification.

Parameters

  • nu: float, default=0.5 An upper bound on the fraction of margin errors (see User Guide: nu_svc) and a lower bound of the fraction of support vectors. Should be in the interval (0, 1].

  • kernel: {'linear', 'poly', 'rbf', 'sigmoid', 'precomputed'} or callable, default='rbf' Specifies the kernel type to be used in the algorithm. If none is given, 'rbf' will be used. If a callable is given it is used to precompute the kernel matrix. For an intuitive visualization of different kernel types see :ref:sphx_glr_auto_examples_svm_plot_svm_kernels.py.

  • degree: int, default=3 Degree of the polynomial kernel function ('poly'). Must be non-negative. Ignored by all other kernels.

  • gamma: {'scale', 'auto'} or float, default='scale' Kernel coefficient for 'rbf', 'poly' and 'sigmoid'.

    • if gamma='scale' (default) is passed then it uses 1 / (n_features * X.var()) as value of gamma,
    • if 'auto', uses 1 / n_features
    • if float, must be non-negative.

    Changed in 0.22 The default value of gamma changed from 'auto' to 'scale'.

  • coef0: float, default=0.0 Independent term in kernel function. It is only significant in 'poly' and 'sigmoid'.

  • shrinking: bool, default=True Whether to use the shrinking heuristic. See the User Guide: shrinking_svm.

  • probability: bool, default=False Whether to enable probability estimates. This must be enabled prior to calling fit, will slow down that method as it internally uses 5-fold cross-validation, and predict_proba may be inconsistent with predict. Read more in the User Guide: scores_probabilities.

  • tol: float, default=1e-3 Tolerance for stopping criterion.

  • cache_size: float, default=200 Specify the size of the kernel cache (in MB).

  • class_weight: {dict, 'balanced'}, default=None Set the parameter C of class i to class_weight[i]*C for SVC. If not given, all classes are supposed to have weight one. The "balanced" mode uses the values of y to automatically adjust weights inversely proportional to class frequencies as n_samples / (n_classes * np.bincount(y)).

  • verbose: bool, default=False Enable verbose output. Note that this setting takes advantage of a per-process runtime setting in libsvm that, if enabled, may not work properly in a multithreaded context.

  • max_iter: int, default=-1 Hard limit on iterations within solver, or -1 for no limit.

  • decision_function_shape: {'ovo', 'ovr'}, default='ovr' Whether to return a one-vs-rest ('ovr') decision function of shape (n_samples, n_classes) as all other classifiers, or the original one-vs-one ('ovo') decision function of libsvm which has shape (n_samples, n_classes * (n_classes - 1) / 2). However, one-vs-one ('ovo') is always used as multi-class strategy. The parameter is ignored for binary classification.

    Changed in 0.19 decision_function_shape is 'ovr' by default.

    Added in 0.17 decision_function_shape='ovr' is recommended.

    Changed in 0.17 Deprecated decision_function_shape='ovo' and None.

  • break_ties: bool, default=False If true, decision_function_shape='ovr', and number of classes > 2, predict will break ties according to the confidence values of decision_function; otherwise the first class among the tied classes is returned. Please note that breaking ties comes at a relatively high computational cost compared to a simple predict. See :ref:sphx_glr_auto_examples_svm_plot_svm_tie_breaking.py for an example of its usage with decision_function_shape='ovr'.

    Added in 0.22

  • random_state: int, RandomState instance or None, default=None Controls the pseudo random number generation for shuffling the data for probability estimates. Ignored when probability is False. Pass an int for reproducible output across multiple function calls. See Glossary .

Attributes

  • class_weight_: ndarray of shape (n_classes,) Multipliers of parameter C of each class. Computed based on the class_weight parameter.

  • classes_: ndarray of shape (n_classes,) The unique classes labels.

  • coef_: ndarray of shape (n_classes * (n_classes -1) / 2, n_features) Weights assigned to the features (coefficients in the primal problem). This is only available in the case of a linear kernel.

    coef_ is a readonly property derived from dual_coef_ and support_vectors_.

  • dual_coef_: ndarray of shape (n_classes - 1, n_SV) Dual coefficients of the support vector in the decision function (see :ref:sgd_mathematical_formulation), multiplied by their targets. For multiclass, coefficient for all 1-vs-1 classifiers. The layout of the coefficients in the multiclass case is somewhat non-trivial. See the multi-class section of the User Guide: svm_multi_class for details.

  • fit_status_: int 0 if correctly fitted, 1 if the algorithm did not converge.

  • intercept_: ndarray of shape (n_classes * (n_classes - 1) / 2,) Constants in decision function.

  • n_features_in_: int Number of features seen during fit.

    Added in 0.24

  • feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen during fit. Defined only when X has feature names that are all strings.

    Added in 1.0

  • n_iter_: ndarray of shape (n_classes * (n_classes - 1) // 2,) Number of iterations run by the optimization routine to fit the model. The shape of this attribute depends on the number of models optimized which in turn depends on the number of classes.

    Added in 1.1

  • support_: ndarray of shape (n_SV,) Indices of support vectors.

  • support_vectors_: ndarray of shape (n_SV, n_features) Support vectors.

  • n_support_: ndarray of shape (n_classes,), dtype=int32 Number of support vectors for each class.

  • probA_: ndarray of shape (n_classes * (n_classes - 1) / 2,)

  • probB_: ndarray of shape (n_classes * (n_classes - 1) / 2,) If probability=True, it corresponds to the parameters learned in Platt scaling to produce probability estimates from decision values. If probability=False, it's an empty array. Platt scaling uses the logistic function 1 / (1 + exp(decision_value * probA_ + probB_)) where probA_ and probB_ are learned from the dataset [2]. For more information on the multiclass case and training procedure see section 8 of [1].

  • shape_fit_: tuple of int of shape (n_dimensions_of_X,) Array dimensions of training vector X.

See Also

  • SVC: Support Vector Machine for classification using libsvm.

  • LinearSVC: Scalable linear Support Vector Machine for classification using liblinear.

References

Examples

import numpy as np
X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
y = np.array([1, 1, 2, 2])
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import NuSVC
clf = make_pipeline(StandardScaler(), NuSVC())
clf.fit(X, y)
Pipeline(steps=[('standardscaler', StandardScaler()), ('nusvc', NuSVC())])
print(clf.predict([[-0.8, -1]]))
[1]
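
A minimal metamorph.ml sketch, assuming the key :sklearn.classification/nu-svc and the kebab-case keys :nu and :kernel; note that, unlike the Python pipeline above, no feature-scaling step is included here:

(def nu-svc-pipe
  (mm/pipeline
   (ds-mm/set-inference-target 2)
   {:metamorph/id :model}
   ;; assumed model key; :nu mirrors nu, :kernel mirrors kernel
   (ml/model {:model-type :sklearn.classification/nu-svc
              :nu 0.5
              :kernel "rbf"})))

;; Fit on a tiny two-class dataset; the last column is the target.
(def nu-svc-ctx
  (nu-svc-pipe {:metamorph/data (dst/tensor->dataset
                                 [[-1 -1 1] [-2 -1 1] [1 1 2] [2 1 2]])
                :metamorph/mode :fit}))

(-> (mm/transform-pipe (dst/tensor->dataset [[-0.8 -1 1]]) nu-svc-pipe nu-svc-ctx)
    :metamorph/data)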


24.2.26 /passive-aggressive-classifier

name type default description
n-iter-no-change
average
tol
early-stopping
shuffle
c
max-iter
n-jobs
random-state
fit-intercept
warm-start
validation-fraction
class-weight
loss
verbose
predict-proba?

Passive Aggressive Classifier.

Deprecated since 1.8 The whole class PassiveAggressiveClassifier was deprecated in version 1.8 and will be removed in 1.10. Instead use:

    clf = SGDClassifier(
        loss="hinge",
        penalty=None,
        learning_rate="pa1",  # or "pa2"
        eta0=1.0,  # for parameter C
    )

Read more in the User Guide: passive_aggressive.

Parameters

  • C: float, default=1.0 Aggressiveness parameter for the passive-aggressive algorithm, see [1]. For PA-I it is the maximum step size. For PA-II it regularizes the step size (the smaller C the more it regularizes). As a general rule-of-thumb, C should be small when the data is noisy.

  • fit_intercept: bool, default=True Whether the intercept should be estimated or not. If False, the data is assumed to be already centered.

  • max_iter: int, default=1000 The maximum number of passes over the training data (aka epochs). It only impacts the behavior in the fit method, and not the ~sklearn.linear_model.PassiveAggressiveClassifier.partial_fit method.

    Added in 0.19

  • tol: float or None, default=1e-3 The stopping criterion. If it is not None, the iterations will stop when (loss > previous_loss - tol).

    Added in 0.19

  • early_stopping: bool, default=False Whether to use early stopping to terminate training when validation score is not improving. If set to True, it will automatically set aside a stratified fraction of training data as validation and terminate training when validation score is not improving by at least tol for n_iter_no_change consecutive epochs.

    Added in 0.20

  • validation_fraction: float, default=0.1 The proportion of training data to set aside as validation set for early stopping. Must be between 0 and 1. Only used if early_stopping is True.

    Added in 0.20

  • n_iter_no_change: int, default=5 Number of iterations with no improvement to wait before early stopping.

    Added in 0.20

  • shuffle: bool, default=True Whether or not the training data should be shuffled after each epoch.

  • verbose: int, default=0 The verbosity level.

  • loss: str, default="hinge" The loss function to be used: hinge: equivalent to PA-I in the reference paper. squared_hinge: equivalent to PA-II in the reference paper.

  • n_jobs: int or None, default=None The number of CPUs to use to do the OVA (One Versus All, for multi-class problems) computation. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.

  • random_state: int, RandomState instance, default=None Used to shuffle the training data, when shuffle is set to True. Pass an int for reproducible output across multiple function calls. See Glossary.

  • warm_start: bool, default=False When set to True, reuse the solution of the previous call to fit as initialization, otherwise, just erase the previous solution. See the Glossary.

    Repeatedly calling fit or partial_fit when warm_start is True can result in a different solution than when calling fit a single time because of the way the data is shuffled.

  • class_weight: dict, {class_label: weight} or "balanced" or None, default=None Preset for the class_weight fit parameter.

    Weights associated with classes. If not given, all classes are supposed to have weight one.

    The "balanced" mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y)).

    Added in 0.17 parameter class_weight to automatically weight samples.

  • average: bool or int, default=False When set to True, computes the averaged SGD weights and stores the result in the coef_ attribute. If set to an int greater than 1, averaging will begin once the total number of samples seen reaches average. So average=10 will begin averaging after seeing 10 samples.

    Added in 0.19 parameter average to use weights averaging in SGD.

Attributes

  • coef_: ndarray of shape (1, n_features) if n_classes == 2 else (n_classes, n_features) Weights assigned to the features.

  • intercept_: ndarray of shape (1,) if n_classes == 2 else (n_classes,) Constants in decision function.

  • n_features_in_: int Number of features seen during fit.

    Added in 0.24

  • feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen during fit. Defined only when X has feature names that are all strings.

    Added in 1.0

  • n_iter_: int The actual number of iterations to reach the stopping criterion. For multiclass fits, it is the maximum over every binary fit.

  • classes_: ndarray of shape (n_classes,) The unique classes labels.

  • t_: int Number of weight updates performed during training. Same as (n_iter_ * n_samples + 1).

See Also

  • SGDClassifier: Incrementally trained logistic regression.
  • Perceptron: Linear perceptron classifier.

References

Examples

from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.datasets import make_classification
X, y = make_classification(n_features=4, random_state=0)
clf = PassiveAggressiveClassifier(max_iter=1000, random_state=0,
                                  tol=1e-3)
clf.fit(X, y)
PassiveAggressiveClassifier(random_state=0)
print(clf.coef_)
[[0.26642044 0.45070924 0.67251877 0.64185414]]
print(clf.intercept_)
[1.84127814]
print(clf.predict([[0, 0, 0, 0]]))
[1]
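
As a sketch (assuming the registered key :sklearn.classification/passive-aggressive-classifier), the configuration of the Python example translates to kebab-case options in a metamorph.ml pipeline:

(def pa-pipe
  (mm/pipeline
   (ds-mm/set-inference-target 2)
   {:metamorph/id :model}
   (ml/model {:model-type :sklearn.classification/passive-aggressive-classifier
              :max-iter 1000
              :random-state 0
              :tol 1e-3})))

Given the deprecation note above, new code may prefer :sklearn.classification/sgd-classifier with :learning-rate "pa1" or "pa2" instead.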


24.2.27 /perceptron

name type default description
n-iter-no-change
tol
early-stopping
eta-0
shuffle
penalty
max-iter
n-jobs
random-state
fit-intercept
alpha
warm-start
l-1-ratio
validation-fraction
class-weight
verbose
predict-proba?

Linear perceptron classifier.

The implementation is a wrapper around ~sklearn.linear_model.SGDClassifier by fixing the loss and learning_rate parameters as

SGDClassifier(loss="perceptron", learning_rate="constant")

Other available parameters are described below and are forwarded to ~sklearn.linear_model.SGDClassifier.

Read more in the User Guide: perceptron.

Parameters

  • penalty: {'l2','l1','elasticnet'}, default=None The penalty (aka regularization term) to be used.

  • alpha: float, default=0.0001 Constant that multiplies the regularization term if regularization is used.

  • l1_ratio: float, default=0.15 The Elastic Net mixing parameter, with 0 <= l1_ratio <= 1. l1_ratio=0 corresponds to L2 penalty, l1_ratio=1 to L1. Only used if penalty='elasticnet'.

    Added in 0.24

  • fit_intercept: bool, default=True Whether the intercept should be estimated or not. If False, the data is assumed to be already centered.

  • max_iter: int, default=1000 The maximum number of passes over the training data (aka epochs). It only impacts the behavior in the fit method, and not the partial_fit method.

    Added in 0.19

  • tol: float or None, default=1e-3 The stopping criterion. If it is not None, the iterations will stop when (loss > previous_loss - tol).

    Added in 0.19

  • shuffle: bool, default=True Whether or not the training data should be shuffled after each epoch.

  • verbose: int, default=0 The verbosity level.

  • eta0: float, default=1 Constant by which the updates are multiplied.

  • n_jobs: int, default=None The number of CPUs to use to do the OVA (One Versus All, for multi-class problems) computation. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.

  • random_state: int, RandomState instance or None, default=0 Used to shuffle the training data, when shuffle is set to True. Pass an int for reproducible output across multiple function calls. See Glossary.

  • early_stopping: bool, default=False Whether to use early stopping to terminate training when validation score is not improving. If set to True, it will automatically set aside a stratified fraction of training data as validation and terminate training when validation score is not improving by at least tol for n_iter_no_change consecutive epochs.

    Added in 0.20

  • validation_fraction: float, default=0.1 The proportion of training data to set aside as validation set for early stopping. Must be between 0 and 1. Only used if early_stopping is True.

    Added in 0.20

  • n_iter_no_change: int, default=5 Number of iterations with no improvement to wait before early stopping.

    Added in 0.20

  • class_weight: dict, {class_label: weight} or "balanced", default=None Preset for the class_weight fit parameter.

    Weights associated with classes. If not given, all classes are supposed to have weight one.

    The "balanced" mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y)).

  • warm_start: bool, default=False When set to True, reuse the solution of the previous call to fit as initialization, otherwise, just erase the previous solution. See the Glossary.

Attributes

  • classes_: ndarray of shape (n_classes,) The unique classes labels.

  • coef_: ndarray of shape (1, n_features) if n_classes == 2 else (n_classes, n_features) Weights assigned to the features.

  • intercept_: ndarray of shape (1,) if n_classes == 2 else (n_classes,) Constants in decision function.

  • n_features_in_: int Number of features seen during fit.

    Added in 0.24

  • feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen during fit. Defined only when X has feature names that are all strings.

    Added in 1.0

  • n_iter_: int The actual number of iterations to reach the stopping criterion. For multiclass fits, it is the maximum over every binary fit.

  • t_: int Number of weight updates performed during training. Same as (n_iter_ * n_samples + 1).

See Also

  • sklearn.linear_model.SGDClassifier: Linear classifiers (SVM, logistic regression, etc.) with SGD training.

Notes

Perceptron is a classification algorithm which shares the same underlying implementation with SGDClassifier. In fact, Perceptron() is equivalent to SGDClassifier(loss="perceptron", eta0=1, learning_rate="constant", penalty=None).

References

https://en.wikipedia.org/wiki/Perceptron and references therein.

Examples

from sklearn.datasets import load_digits
from sklearn.linear_model import Perceptron
X, y = load_digits(return_X_y=True)
clf = Perceptron(tol=1e-3, random_state=0)
clf.fit(X, y)
Perceptron()
clf.score(X, y)
0.939...
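
A corresponding model step in metamorph.ml might look like this (a sketch; the key :sklearn.classification/perceptron is assumed, and the options are the kebab-case forms of the parameters used in the Python example plus eta0):

(ml/model {:model-type :sklearn.classification/perceptron
           :tol 1e-3
           :random-state 0
           :eta-0 1.0})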


24.2.28 /quadratic-discriminant-analysis

name type default description
covariance-estimator
priors
reg-param
shrinkage
solver
store-covariance
tol
predict-proba?

Quadratic Discriminant Analysis.

A classifier with a quadratic decision boundary, generated by fitting class conditional densities to the data and using Bayes' rule.

The model fits a Gaussian density to each class.

Added in 0.17

For a comparison between ~sklearn.discriminant_analysis.QuadraticDiscriminantAnalysis and ~sklearn.discriminant_analysis.LinearDiscriminantAnalysis, see :ref:sphx_glr_auto_examples_classification_plot_lda_qda.py.

Read more in the User Guide: lda_qda.

Parameters

  • solver: {'svd', 'eigen'}, default='svd' Solver to use, possible values:

    • 'svd': Singular value decomposition (default). Does not compute the covariance matrix, therefore this solver is recommended for data with a large number of features.
    • 'eigen': Eigenvalue decomposition. Can be combined with shrinkage or custom covariance estimator.

  • shrinkage: 'auto' or float, default=None Shrinkage parameter, possible values:

    • None: no shrinkage (default).
    • 'auto': automatic shrinkage using the Ledoit-Wolf lemma.
    • float between 0 and 1: fixed shrinkage parameter.

    Enabling shrinkage is expected to improve the model when some classes have a relatively small number of training data points compared to the number of features by mitigating overfitting during the covariance estimation step.

    This should be left to None if covariance_estimator is used. Note that shrinkage works only with 'eigen' solver.

  • priors: array-like of shape (n_classes,), default=None Class priors. By default, the class proportions are inferred from the training data.

  • reg_param: float, default=0.0 Regularizes the per-class covariance estimates by transforming S2 as S2 = (1 - reg_param) * S2 + reg_param * np.eye(n_features), where S2 corresponds to the scaling_ attribute of a given class.

  • store_covariance: bool, default=False If True, the class covariance matrices are explicitly computed and stored in the self.covariance_ attribute.

    Added in 0.17

  • tol: float, default=1.0e-4 Absolute threshold for the covariance matrix to be considered rank deficient after applying some regularization (see reg_param) to each Sk where Sk represents covariance matrix for k-th class. This parameter does not affect the predictions. It controls when a warning is raised if the covariance matrix is not full rank.

    Added in 0.17

  • covariance_estimator: covariance estimator, default=None If not None, covariance_estimator is used to estimate the covariance matrices instead of relying on the empirical covariance estimator (with potential shrinkage). The object should have a fit method and a covariance_ attribute like the estimators in sklearn.covariance. If None the shrinkage parameter drives the estimate.

    This should be left to None if shrinkage is used. Note that covariance_estimator works only with the 'eigen' solver.

Attributes

  • covariance_: list of len n_classes of ndarray of shape (n_features, n_features) For each class, gives the covariance matrix estimated using the samples of that class. The estimations are unbiased. Only present if store_covariance is True.

  • means_: array-like of shape (n_classes, n_features) Class-wise means.

  • priors_: array-like of shape (n_classes,) Class priors (sum to 1).

  • rotations_: list of len n_classes of ndarray of shape (n_features, n_k) For each class k an array of shape (n_features, n_k), where n_k = min(n_features, number of elements in class k) It is the rotation of the Gaussian distribution, i.e. its principal axis. It corresponds to V, the matrix of eigenvectors coming from the SVD of Xk = U S Vt where Xk is the centered matrix of samples from class k.

  • scalings_: list of len n_classes of ndarray of shape (n_k,) For each class, contains the scaling of the Gaussian distributions along its principal axes, i.e. the variance in the rotated coordinate system. It corresponds to S^2 / (n_samples - 1), where S is the diagonal matrix of singular values from the SVD of Xk, where Xk is the centered matrix of samples from class k.

  • classes_: ndarray of shape (n_classes,) Unique class labels.

  • n_features_in_: int Number of features seen during fit.

    Added in 0.24

  • feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen during fit. Defined only when X has feature names that are all strings.

    Added in 1.0

See Also

  • LinearDiscriminantAnalysis: Linear Discriminant Analysis.

Examples

from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
import numpy as np
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
y = np.array([1, 1, 1, 2, 2, 2])
clf = QuadraticDiscriminantAnalysis()
clf.fit(X, y)
QuadraticDiscriminantAnalysis()
print(clf.predict([[-0.8, -1]]))
[1]
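
A hedged Clojure sketch of the same example, assuming the key :sklearn.classification/quadratic-discriminant-analysis and the chapter's aliases (store_covariance becomes :store-covariance):

(def qda-pipe
  (mm/pipeline
   (ds-mm/set-inference-target 2)   ;; column 2 holds the class label
   {:metamorph/id :model}
   (ml/model {:model-type :sklearn.classification/quadratic-discriminant-analysis
              :store-covariance true})))

(def qda-ctx
  (qda-pipe {:metamorph/data (dst/tensor->dataset
                              [[-1 -1 1] [-2 -1 1] [-3 -2 1]
                               [1 1 2] [2 1 2] [3 2 2]])
             :metamorph/mode :fit}))

(-> (mm/transform-pipe (dst/tensor->dataset [[-0.8 -1 0]]) qda-pipe qda-ctx)
    :metamorph/data)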


24.2.29 /radius-neighbors-classifier

name type default description
weights
p
leaf-size
metric-params
radius
outlier-label
algorithm
n-jobs
metric
predict-proba?

Classifier implementing a vote among neighbors within a given radius.

Read more in the User Guide: classification.

Parameters

  • radius: float, default=1.0 Range of parameter space to use by default for radius_neighbors queries.

  • weights: {'uniform', 'distance'}, callable or None, default='uniform' Weight function used in prediction. Possible values:

    • 'uniform' : uniform weights. All points in each neighborhood are weighted equally.
    • 'distance' : weight points by the inverse of their distance. In this case, closer neighbors of a query point will have a greater influence than neighbors which are further away.
    • [callable] : a user-defined function which accepts an array of distances, and returns an array of the same shape containing the weights.

    Uniform weights are used by default.

  • algorithm: {'auto', 'ball_tree', 'kd_tree', 'brute'}, default='auto' Algorithm used to compute the nearest neighbors:

    • 'ball_tree' will use BallTree
    • 'kd_tree' will use KDTree
    • 'brute' will use a brute-force search.
    • 'auto' will attempt to decide the most appropriate algorithm based on the values passed to fit method.

    Note: fitting on sparse input will override the setting of this parameter, using brute force.

  • leaf_size: int, default=30 Leaf size passed to BallTree or KDTree. This can affect the speed of the construction and query, as well as the memory required to store the tree. The optimal value depends on the nature of the problem.

  • p: float, default=2 Power parameter for the Minkowski metric. When p = 1, this is equivalent to using manhattan_distance (l1), and euclidean_distance (l2) for p = 2. For arbitrary p, minkowski_distance (l_p) is used. This parameter is expected to be positive.

  • metric: str or callable, default='minkowski' Metric to use for distance computation. Default is "minkowski", which results in the standard Euclidean distance when p = 2. See the documentation of scipy.spatial.distance and the metrics listed in ~sklearn.metrics.pairwise.distance_metrics for valid metric values.

    If metric is "precomputed", X is assumed to be a distance matrix and must be square during fit. X may be a sparse graph, in which case only "nonzero" elements may be considered neighbors.

    If metric is a callable function, it takes two arrays representing 1D vectors as inputs and must return one value indicating the distance between those vectors. This works for Scipy's metrics, but is less efficient than passing the metric name as a string.

  • outlier_label: {manual label, 'most_frequent'}, default=None Label for outlier samples (samples with no neighbors in given radius).

    • manual label: str or int label (should be the same type as y) or list of manual labels if multi-output is used.
    • 'most_frequent' : assign the most frequent label of y to outliers.
    • None : when any outlier is detected, ValueError will be raised.

    The outlier label should be selected from among the unique 'Y' labels. If it is specified with a different value a warning will be raised and all class probabilities of outliers will be assigned to be 0.

  • metric_params: dict, default=None Additional keyword arguments for the metric function.

  • n_jobs: int, default=None The number of parallel jobs to run for neighbors search. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.

Attributes

  • classes_: ndarray of shape (n_classes,) Class labels known to the classifier.

  • effective_metric_: str or callable The distance metric used. It will be same as the metric parameter or a synonym of it, e.g. 'euclidean' if the metric parameter set to 'minkowski' and p parameter set to 2.

  • effective_metric_params_: dict Additional keyword arguments for the metric function. For most metrics will be same with metric_params parameter, but may also contain the p parameter value if the effective_metric_ attribute is set to 'minkowski'.

  • n_features_in_: int Number of features seen during fit.

    Added in 0.24

  • feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen during fit. Defined only when X has feature names that are all strings.

    Added in 1.0

  • n_samples_fit_: int Number of samples in the fitted data.

  • outlier_label_: int or array-like of shape (n_class,) Label which is given for outlier samples (samples with no neighbors on given radius).

  • outputs_2d_: bool False when y's shape is (n_samples, ) or (n_samples, 1) during fit otherwise True.

See Also

  • KNeighborsClassifier: Classifier implementing the k-nearest neighbors vote.
  • RadiusNeighborsRegressor: Regression based on neighbors within a fixed radius.
  • KNeighborsRegressor: Regression based on k-nearest neighbors.
  • NearestNeighbors: Unsupervised learner for implementing neighbor searches.

Notes

See Nearest Neighbors: neighbors in the online documentation for a discussion of the choice of algorithm and leaf_size.

https://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm

Examples

X = [[0], [1], [2], [3]]
y = [0, 0, 1, 1]
from sklearn.neighbors import RadiusNeighborsClassifier
neigh = RadiusNeighborsClassifier(radius=1.0)
neigh.fit(X, y)
RadiusNeighborsClassifier(...)
print(neigh.predict([[1.5]]))
[0]
print(neigh.predict_proba([[1.0]]))
[[0.66666667 0.33333333]]
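
As elsewhere in this chapter, the model plugs into a metamorph.ml pipeline via kebab-case options; a minimal sketch of the model step (registered key assumed):

(ml/model {:model-type :sklearn.classification/radius-neighbors-classifier
           :radius 1.0
           :weights "uniform"})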


24.2.30 /random-forest-classifier

name type default description
min-weight-fraction-leaf
max-leaf-nodes
min-impurity-decrease
min-samples-split
bootstrap
ccp-alpha
n-jobs
random-state
oob-score
min-samples-leaf
max-features
monotonic-cst
warm-start
max-depth
class-weight
n-estimators
max-samples
criterion
verbose
predict-proba?

A random forest classifier.

A random forest is a meta estimator that fits a number of decision tree classifiers on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting. Trees in the forest use the best split strategy, i.e. equivalent to passing splitter="best" to the underlying ~sklearn.tree.DecisionTreeClassifier. The sub-sample size is controlled with the max_samples parameter if bootstrap=True (default), otherwise the whole dataset is used to build each tree.

For a comparison between tree-based ensemble models see the example :ref:sphx_glr_auto_examples_ensemble_plot_forest_hist_grad_boosting_comparison.py.

This estimator has native support for missing values (NaNs). During training, the tree grower learns at each split point whether samples with missing values should go to the left or right child, based on the potential gain. When predicting, samples with missing values are assigned to the left or right child consequently. If no missing values were encountered for a given feature during training, then samples with missing values are mapped to whichever child has the most samples.

Read more in the User Guide: forest.

Parameters

  • n_estimators: int, default=100 The number of trees in the forest.

    Changed in 0.22 The default value of n_estimators changed from 10 to 100 in 0.22.

  • criterion: {"gini", "entropy", "log_loss"}, default="gini" The function to measure the quality of a split. Supported criteria are "gini" for the Gini impurity and "log_loss" and "entropy" both for the Shannon information gain, see :ref:tree_mathematical_formulation. Note: This parameter is tree-specific.

  • max_depth: int, default=None The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.

  • min_samples_split: int or float, default=2 The minimum number of samples required to split an internal node:

    • If int, then consider min_samples_split as the minimum number.
    • If float, then min_samples_split is a fraction and ceil(min_samples_split * n_samples) are the minimum number of samples for each split.

    Changed in 0.18 Added float values for fractions.

  • min_samples_leaf: int or float, default=1 The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at least min_samples_leaf training samples in each of the left and right branches. This may have the effect of smoothing the model, especially in regression.

    • If int, then consider min_samples_leaf as the minimum number.
    • If float, then min_samples_leaf is a fraction and ceil(min_samples_leaf * n_samples) are the minimum number of samples for each node.

    Changed in 0.18 Added float values for fractions.

  • min_weight_fraction_leaf: float, default=0.0 The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sample_weight is not provided.

  • max_features: {"sqrt", "log2", None}, int or float, default="sqrt" The number of features to consider when looking for the best split:

    • If int, then consider max_features features at each split.
    • If float, then max_features is a fraction and max(1, int(max_features * n_features_in_)) features are considered at each split.
    • If "sqrt", then max_features=sqrt(n_features).
    • If "log2", then max_features=log2(n_features).
    • If None, then max_features=n_features.

    Changed in 1.1 The default of max_features changed from "auto" to "sqrt".

    Note: the search for a split does not stop until at least one valid partition of the node samples is found, even if it requires to effectively inspect more than max_features features.

  • max_leaf_nodes: int, default=None Grow trees with max_leaf_nodes in best-first fashion. Best nodes are defined as relative reduction in impurity. If None then unlimited number of leaf nodes.

  • min_impurity_decrease: float, default=0.0 A node will be split if this split induces a decrease of the impurity greater than or equal to this value.

    The weighted impurity decrease equation is the following:

        N_t / N * (impurity - N_t_R / N_t * right_impurity
                            - N_t_L / N_t * left_impurity)

    where N is the total number of samples, N_t is the number of samples at the current node, N_t_L is the number of samples in the left child, and N_t_R is the number of samples in the right child.

    N, N_t, N_t_R and N_t_L all refer to the weighted sum, if sample_weight is passed.

    Added in 0.19
  • bootstrap: bool, default=True Whether bootstrap samples are used when building trees. If False, the whole dataset is used to build each tree.

  • oob_score: bool or callable, default=False Whether to use out-of-bag samples to estimate the generalization score. By default, ~sklearn.metrics.accuracy_score is used. Provide a callable with signature metric(y_true, y_pred) to use a custom metric. Only available if bootstrap=True.

    For an illustration of out-of-bag (OOB) error estimation, see the example :ref:sphx_glr_auto_examples_ensemble_plot_ensemble_oob.py.

  • n_jobs: int, default=None The number of jobs to run in parallel. fit, predict, decision_path and apply are all parallelized over the trees. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.

  • random_state: int, RandomState instance or None, default=None Controls both the randomness of the bootstrapping of the samples used when building trees (if bootstrap=True) and the sampling of the features to consider when looking for the best split at each node (if max_features < n_features). See Glossary for details.

  • verbose: int, default=0 Controls the verbosity when fitting and predicting.

  • warm_start: bool, default=False When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just fit a whole new forest. See Glossary and :ref:tree_ensemble_warm_start for details.

  • class_weight: {"balanced", "balanced_subsample"}, dict or list of dicts, default=None Weights associated with classes in the form {class_label: weight}. If not given, all classes are supposed to have weight one. For multi-output problems, a list of dicts can be provided in the same order as the columns of y.

    Note that for multioutput (including multilabel) weights should be defined for each class of every column in its own dict. For example, for four-class multilabel classification weights should be [{0: 1, 1: 1}, {0: 1, 1: 5}, {0: 1, 1: 1}, {0: 1, 1: 1}] instead of [{1:1}, {2:5}, {3:1}, {4:1}].

    The "balanced" mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y))

    The "balanced_subsample" mode is the same as "balanced" except that weights are computed based on the bootstrap sample for every tree grown.

    For multi-output, the weights of each column of y will be multiplied.

    Note that these weights will be multiplied with sample_weight (passed through the fit method) if sample_weight is specified.

  • ccp_alpha: non-negative float, default=0.0 Complexity parameter used for Minimal Cost-Complexity Pruning. The subtree with the largest cost complexity that is smaller than ccp_alpha will be chosen. By default, no pruning is performed. See :ref:minimal_cost_complexity_pruning for details. See :ref:sphx_glr_auto_examples_tree_plot_cost_complexity_pruning.py for an example of such pruning.

    Added in 0.22

  • max_samples: int or float, default=None If bootstrap is True, the number of samples to draw from X to train each base estimator.

    • If None (default), then draw X.shape[0] samples.
    • If int, then draw max_samples samples.
    • If float, then draw max(round(n_samples * max_samples), 1) samples. Thus, max_samples should be in the interval (0.0, 1.0].

    Added in 0.22

  • monotonic_cst: array-like of int of shape (n_features), default=None Indicates the monotonicity constraint to enforce on each feature.

    • 1: monotonic increase
    • 0: no constraint
    • -1: monotonic decrease

    If monotonic_cst is None, no constraints are applied.

    Monotonicity constraints are not supported for:

    • multiclass classifications (i.e. when n_classes > 2),
    • multioutput classifications (i.e. when n_outputs_ > 1),
    • classifications trained on data with missing values.

    The constraints hold over the probability of the positive class.

    Read more in the User Guide: monotonic_cst_gbdt.

    Added in 1.4

Attributes

  • estimator_: ~sklearn.tree.DecisionTreeClassifier The child estimator template used to create the collection of fitted sub-estimators.

    Added in 1.2 base_estimator_ was renamed to estimator_.

  • estimators_: list of DecisionTreeClassifier The collection of fitted sub-estimators.

  • classes_: ndarray of shape (n_classes,) or a list of such arrays The classes labels (single output problem), or a list of arrays of class labels (multi-output problem).

  • n_classes_: int or list The number of classes (single output problem), or a list containing the number of classes for each output (multi-output problem).

  • n_features_in_: int Number of features seen during fit.

    Added in 0.24

  • feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen during fit. Defined only when X has feature names that are all strings.

    Added in 1.0

  • n_outputs_: int The number of outputs when fit is performed.

  • feature_importances_: ndarray of shape (n_features,) The impurity-based feature importances. The higher, the more important the feature. The importance of a feature is computed as the (normalized) total reduction of the criterion brought by that feature. It is also known as the Gini importance.

    Warning: impurity-based feature importances can be misleading for high cardinality features (many unique values). See sklearn.inspection.permutation_importance as an alternative.

  • oob_score_: float Score of the training dataset obtained using an out-of-bag estimate. This attribute exists only when oob_score is True.

  • oob_decision_function_: ndarray of shape (n_samples, n_classes) or (n_samples, n_classes, n_outputs) Decision function computed with out-of-bag estimate on the training set. If n_estimators is small it might be possible that a data point was never left out during the bootstrap. In this case, oob_decision_function_ might contain NaN. This attribute exists only when oob_score is True.

  • estimators_samples_: list of arrays The subset of drawn samples (i.e., the in-bag samples) for each base estimator. Each subset is defined by an array of the indices selected.

    Added in 1.4

See Also

  • sklearn.tree.DecisionTreeClassifier: A decision tree classifier.
  • sklearn.ensemble.ExtraTreesClassifier: Ensemble of extremely randomized tree classifiers.
  • sklearn.ensemble.HistGradientBoostingClassifier: A Histogram-based Gradient Boosting Classification Tree, very fast for big datasets (n_samples >= 10_000).

Notes

The default values for the parameters controlling the size of the trees (e.g. max_depth, min_samples_leaf, etc.) lead to fully grown and unpruned trees which can potentially be very large on some data sets. To reduce memory consumption, the complexity and size of the trees should be controlled by setting those parameter values.

The features are always randomly permuted at each split. Therefore, the best found split may vary, even with the same training data, max_features=n_features and bootstrap=False, if the improvement of the criterion is identical for several splits enumerated during the search of the best split. To obtain a deterministic behaviour during fitting, random_state has to be fixed.

References

  • [1] :doi:L. Breiman, "Random Forests", Machine Learning, 45(1), 5-32, 2001. <10.1023/A:1010933404324>

Examples

from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
X, y = make_classification(n_samples=1000, n_features=4,
                           n_informative=2, n_redundant=0,
                           random_state=0, shuffle=False)
clf = RandomForestClassifier(max_depth=2, random_state=0)
clf.fit(X, y)
RandomForestClassifier(...)
print(clf.predict([[0, 0, 0, 0]]))
[1]
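
A Clojure sketch with a tiny hand-made dataset (the make_classification data above is not reproduced); the key :sklearn.classification/random-forest-classifier and the kebab-case option names are assumed from the table above:

(def rf-pipe
  (mm/pipeline
   (ds-mm/set-inference-target 2)   ;; column 2 holds the class label
   {:metamorph/id :model}
   (ml/model {:model-type :sklearn.classification/random-forest-classifier
              :n-estimators 100
              :max-depth 2
              :random-state 0})))

(def rf-ctx
  (rf-pipe {:metamorph/data (dst/tensor->dataset
                             [[0 0 0] [0 1 0] [1 0 1] [1 1 1]])
            :metamorph/mode :fit}))

(-> (mm/transform-pipe (dst/tensor->dataset [[1 0 0]]) rf-pipe rf-ctx)
    :metamorph/data)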


24.2.31 /ridge-classifier

name type default description
positive
tol
solver
max-iter
random-state
copy-x
fit-intercept
alpha
class-weight
predict-proba?

Classifier using Ridge regression.

This classifier first converts the target values into {-1, 1} and then treats the problem as a regression task (multi-output regression in the multiclass case).

Read more in the User Guide: ridge_regression.

Parameters

  • alpha: float, default=1.0 Regularization strength; must be a positive float. Regularization improves the conditioning of the problem and reduces the variance of the estimates. Larger values specify stronger regularization. Alpha corresponds to 1 / (2C) in other linear models such as ~sklearn.linear_model.LogisticRegression or ~sklearn.svm.LinearSVC.

  • fit_intercept: bool, default=True Whether to calculate the intercept for this model. If set to false, no intercept will be used in calculations (e.g. data is expected to be already centered).

  • copy_X: bool, default=True If True, X will be copied; else, it may be overwritten.

  • max_iter: int, default=None Maximum number of iterations for conjugate gradient solver. The default value is determined by scipy.sparse.linalg.

  • tol: float, default=1e-4 The precision of the solution (coef_) is determined by tol which specifies a different convergence criterion for each solver:

    • 'svd': tol has no impact.

    • 'cholesky': tol has no impact.

    • 'sparse_cg': norm of residuals smaller than tol.

    • 'lsqr': tol is set as atol and btol of scipy.sparse.linalg.lsqr, which control the norm of the residual vector in terms of the norms of matrix and coefficients.

    • 'sag' and 'saga': relative change of coef smaller than tol.

    • 'lbfgs': maximum of the absolute (projected) gradient=max|residuals| smaller than tol.

    Changed in 1.2 Default value changed from 1e-3 to 1e-4 for consistency with other linear models.

  • class_weight: dict or 'balanced', default=None Weights associated with classes in the form {class_label: weight}. If not given, all classes are supposed to have weight one.

    The "balanced" mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y)).

  • solver: {'auto', 'svd', 'cholesky', 'lsqr', 'sparse_cg', 'sag', 'saga', 'lbfgs'}, default='auto' Solver to use in the computational routines:

    • 'auto' chooses the solver automatically based on the type of data.

    • 'svd' uses a Singular Value Decomposition of X to compute the Ridge coefficients. It is the most stable solver, in particular more stable for singular matrices than 'cholesky' at the cost of being slower.

    • 'cholesky' uses the standard scipy.linalg.solve function to obtain a closed-form solution.

    • 'sparse_cg' uses the conjugate gradient solver as found in scipy.sparse.linalg.cg. As an iterative algorithm, this solver is more appropriate than 'cholesky' for large-scale data (possibility to set tol and max_iter).

    • 'lsqr' uses the dedicated regularized least-squares routine scipy.sparse.linalg.lsqr. It is the fastest and uses an iterative procedure.

    • 'sag' uses a Stochastic Average Gradient descent, and 'saga' uses its unbiased and more flexible version named SAGA. Both methods use an iterative procedure, and are often faster than other solvers when both n_samples and n_features are large. Note that 'sag' and 'saga' fast convergence is only guaranteed on features with approximately the same scale. You can preprocess the data with a scaler from sklearn.preprocessing.

      Added in 0.17 Stochastic Average Gradient descent solver. Added in 0.19 SAGA solver.

    • 'lbfgs' uses L-BFGS-B algorithm implemented in scipy.optimize.minimize. It can be used only when positive is True.

  • positive: bool, default=False When set to True, forces the coefficients to be positive. Only 'lbfgs' solver is supported in this case.

  • random_state: int, RandomState instance, default=None Used when solver == 'sag' or 'saga' to shuffle the data. See Glossary for details.

Attributes

  • coef_: ndarray of shape (1, n_features) or (n_classes, n_features) Coefficient of the features in the decision function.

    coef_ is of shape (1, n_features) when the given problem is binary.

  • intercept_: float or ndarray of shape (n_targets,) Independent term in decision function. Set to 0.0 if fit_intercept = False.

  • n_iter_: None or ndarray of shape (n_targets,) Actual number of iterations for each target. Available only for sag and lsqr solvers. Other solvers will return None.

  • classes_: ndarray of shape (n_classes,) The classes labels.

  • n_features_in_: int Number of features seen during fit.

    Added in 0.24

  • feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen during fit. Defined only when X has feature names that are all strings.

    Added in 1.0

  • solver_: str The solver that was used at fit time by the computational routines.

    Added in 1.5

See Also

  • Ridge: Ridge regression.
  • RidgeClassifierCV: Ridge classifier with built-in cross validation.

Notes

For multi-class classification, n_class classifiers are trained in a one-versus-all approach. Concretely, this is implemented by taking advantage of the multi-variate response support in Ridge.

Examples

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import RidgeClassifier
X, y = load_breast_cancer(return_X_y=True)
clf = RidgeClassifier().fit(X, y)
clf.score(X, y)
0.9595...
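
The corresponding metamorph.ml model step, as a sketch (registered key assumed):

(ml/model {:model-type :sklearn.classification/ridge-classifier
           :alpha 1.0})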


24.2.32 /ridge-classifier-cv

name type default description
alphas
class-weight
cv
fit-intercept
scoring
store-cv-results
predict-proba?

Ridge classifier with built-in cross-validation.

See glossary entry for cross-validation estimator.

By default, it performs Leave-One-Out Cross-Validation. Currently, only the n_features > n_samples case is handled efficiently.

Read more in the User Guide: ridge_regression.

Parameters

  • alphas: array-like of shape (n_alphas,), default=(0.1, 1.0, 10.0) Array of alpha values to try. Regularization strength; must be a positive float. Regularization improves the conditioning of the problem and reduces the variance of the estimates. Larger values specify stronger regularization. Alpha corresponds to 1 / (2C) in other linear models such as ~sklearn.linear_model.LogisticRegression or ~sklearn.svm.LinearSVC. If using Leave-One-Out cross-validation, alphas must be strictly positive.

  • fit_intercept: bool, default=True Whether to calculate the intercept for this model. If set to false, no intercept will be used in calculations (i.e. data is expected to be centered).

  • scoring: str, callable, default=None The scoring method to use for cross-validation. Options:

    • str: see :ref:scoring_string_names for options.
    • callable: a scorer callable object (e.g., function) with signature scorer(estimator, X, y). See :ref:scoring_callable for details.
    • None: negative mean squared error: mean_squared_error if cv is None (i.e. when using leave-one-out cross-validation), or accuracy: accuracy_score otherwise.

  • cv: int, cross-validation generator or an iterable, default=None Determines the cross-validation splitting strategy. Possible inputs for cv are:

    • None, to use the efficient Leave-One-Out cross-validation
    • integer, to specify the number of folds.
    • CV splitter,
    • An iterable yielding (train, test) splits as arrays of indices.

    Refer User Guide: cross_validation for the various cross-validation strategies that can be used here.

  • class_weight: dict or 'balanced', default=None Weights associated with classes in the form {class_label: weight}. If not given, all classes are supposed to have weight one.

    The "balanced" mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y)).

  • store_cv_results: bool, default=False Flag indicating if the cross-validation results corresponding to each alpha should be stored in the cv_results_ attribute (see below). This flag is only compatible with cv=None (i.e. using Leave-One-Out Cross-Validation).

    Changed in 1.5 Parameter name changed from store_cv_values to store_cv_results.

Attributes

  • cv_results_: ndarray of shape (n_samples, n_targets, n_alphas), optional Cross-validation results for each alpha (only if store_cv_results=True and cv=None). After fit() has been called, this attribute will contain the mean squared errors if scoring is None otherwise it will contain standardized per point prediction values.

    Changed in 1.5 cv_values_ changed to cv_results_.

  • coef_: ndarray of shape (1, n_features) or (n_targets, n_features) Coefficient of the features in the decision function.

    coef_ is of shape (1, n_features) when the given problem is binary.

  • intercept_: float or ndarray of shape (n_targets,) Independent term in decision function. Set to 0.0 if fit_intercept = False.

  • alpha_: float Estimated regularization parameter.

  • best_score_: float Score of base estimator with best alpha.

    Added in 0.23

  • classes_: ndarray of shape (n_classes,) The classes labels.

  • n_features_in_: int Number of features seen during fit.

    Added in 0.24

  • feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen during fit. Defined only when X has feature names that are all strings.

    Added in 1.0

See Also

  • Ridge: Ridge regression.
  • RidgeClassifier: Ridge classifier.
  • RidgeCV: Ridge regression with built-in cross validation.

Notes

For multi-class classification, n_class classifiers are trained in a one-versus-all approach. Concretely, this is implemented by taking advantage of the multi-variate response support in Ridge.

Examples

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import RidgeClassifierCV
X, y = load_breast_cancer(return_X_y=True)
clf = RidgeClassifierCV(alphas=[1e-3, 1e-2, 1e-1, 1]).fit(X, y)
clf.score(X, y)
0.9630...
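
A sketch of the model step (registered key assumed); passing the alphas grid as a Clojure vector assumes the usual libpython-clj conversion to a Python sequence:

(ml/model {:model-type :sklearn.classification/ridge-classifier-cv
           :alphas [1e-3 1e-2 1e-1 1]})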


24.2.33 /self-training-classifier

name type default description
criterion
estimator
k-best
max-iter
threshold
verbose
predict-proba?

Self-training classifier.

This metaestimator allows a given supervised classifier to function as a semi-supervised classifier, allowing it to learn from unlabeled data. It does this by iteratively predicting pseudo-labels for the unlabeled data and adding them to the training set.

The classifier will continue iterating until either max_iter is reached, or no pseudo-labels were added to the training set in the previous iteration.

Read more in the User Guide: self_training.

Parameters

  • estimator: estimator object An estimator object implementing fit and predict_proba. Invoking the fit method will fit a clone of the passed estimator, which will be stored in the estimator_ attribute.

    Added in 1.6 estimator was added to replace base_estimator.

  • threshold: float, default=0.75 The decision threshold for use with criterion='threshold'. Should be in [0, 1). When using the 'threshold' criterion, a well calibrated classifier: calibration should be used.

  • criterion: {'threshold', 'k_best'}, default='threshold' The selection criterion used to select which labels to add to the training set. If 'threshold', pseudo-labels with prediction probabilities above threshold are added to the dataset. If 'k_best', the k_best pseudo-labels with highest prediction probabilities are added to the dataset. When using the 'threshold' criterion, a well calibrated classifier: calibration should be used.

  • k_best: int, default=10 The amount of samples to add in each iteration. Only used when criterion='k_best'.

  • max_iter: int or None, default=10 Maximum number of iterations allowed. Should be greater than or equal to 0. If it is None, the classifier will continue to predict labels until no new pseudo-labels are added, or all unlabeled samples have been labeled.

  • verbose: bool, default=False Enable verbose output.

Attributes

  • estimator_: estimator object The fitted estimator.

  • classes_: ndarray or list of ndarray of shape (n_classes,) Class labels for each output. (Taken from the trained estimator_).

  • transduction_: ndarray of shape (n_samples,) The labels used for the final fit of the classifier, including pseudo-labels added during fit.

  • labeled_iter_: ndarray of shape (n_samples,) The iteration in which each sample was labeled. When a sample has iteration 0, the sample was already labeled in the original dataset. When a sample has iteration -1, the sample was not labeled in any iteration.

  • n_features_in_: int Number of features seen during fit.

    Added in 0.24

  • feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen during fit. Defined only when X has feature names that are all strings.

    Added in 1.0

  • n_iter_: int The number of rounds of self-training, that is the number of times the base estimator is fitted on relabeled variants of the training set.

  • termination_condition_: {'max_iter', 'no_change', 'all_labeled'} The reason that fitting was stopped.

    • 'max_iter': n_iter_ reached max_iter.
    • 'no_change': no new labels were predicted.
    • 'all_labeled': all unlabeled samples were labeled before max_iter was reached.

See Also

  • LabelPropagation: Label propagation classifier.
  • LabelSpreading: Label spreading model for semi-supervised learning.

References

:doi:David Yarowsky. 1995. Unsupervised word sense disambiguation rivaling supervised methods. In Proceedings of the 33rd annual meeting on Association for Computational Linguistics (ACL '95). Association for Computational Linguistics, Stroudsburg, PA, USA, 189-196. <10.3115/981658.981684>

Examples

import numpy as np
from sklearn import datasets
from sklearn.semi_supervised import SelfTrainingClassifier
from sklearn.svm import SVC
rng = np.random.RandomState(42)
iris = datasets.load_iris()
random_unlabeled_points = rng.rand(iris.target.shape[0]) < 0.3
iris.target[random_unlabeled_points] = -1
svc = SVC(probability=True, gamma="auto")
self_training_model = SelfTrainingClassifier(svc)
self_training_model.fit(iris.data, iris.target)
SelfTrainingClassifier(...)


24.2.34 /sgd-classifier

name type default description
n-iter-no-change
learning-rate
average
tol
early-stopping
eta-0
shuffle
penalty
power-t
max-iter
n-jobs
random-state
fit-intercept
alpha
warm-start
l-1-ratio
validation-fraction
class-weight
loss
verbose
epsilon
predict-proba?

Linear classifiers (SVM, logistic regression, etc.) with SGD training.

This estimator implements regularized linear models with stochastic gradient descent (SGD) learning: the gradient of the loss is estimated one sample at a time and the model is updated along the way with a decreasing strength schedule (aka learning rate). SGD allows minibatch (online/out-of-core) learning via the partial_fit method. For best results using the default learning rate schedule, the data should have zero mean and unit variance.

This implementation works with data represented as dense or sparse arrays of floating point values for the features. The model it fits can be controlled with the loss parameter; by default, it fits a linear support vector machine (SVM).

The regularizer is a penalty added to the loss function that shrinks model parameters towards the zero vector using either the squared euclidean norm L2 or the absolute norm L1 or a combination of both (Elastic Net). If the parameter update crosses the 0.0 value because of the regularizer, the update is truncated to 0.0 to allow for learning sparse models and achieve online feature selection.

Read more in the User Guide: sgd.

Parameters

  • loss: {'hinge', 'log_loss', 'modified_huber', 'squared_hinge', 'perceptron', 'squared_error', 'huber', 'epsilon_insensitive', 'squared_epsilon_insensitive'}, default='hinge' The loss function to be used.

    • 'hinge' gives a linear SVM.
    • 'log_loss' gives logistic regression, a probabilistic classifier.
    • 'modified_huber' is another smooth loss that brings tolerance to outliers as well as probability estimates.
    • 'squared_hinge' is like hinge but is quadratically penalized.
    • 'perceptron' is the linear loss used by the perceptron algorithm.
    • The other losses, 'squared_error', 'huber', 'epsilon_insensitive' and 'squared_epsilon_insensitive' are designed for regression but can be useful in classification as well; see ~sklearn.linear_model.SGDRegressor for a description.

    More details about the losses formulas can be found in the User Guide: sgd_mathematical_formulation and you can find a visualisation of the loss functions in :ref:sphx_glr_auto_examples_linear_model_plot_sgd_loss_functions.py.

  • penalty: {'l2', 'l1', 'elasticnet', None}, default='l2' The penalty (aka regularization term) to be used. Defaults to 'l2' which is the standard regularizer for linear SVM models. 'l1' and 'elasticnet' might bring sparsity to the model (feature selection) not achievable with 'l2'. No penalty is added when set to None.

    You can see a visualisation of the penalties in :ref:sphx_glr_auto_examples_linear_model_plot_sgd_penalties.py.

  • alpha: float, default=0.0001 Constant that multiplies the regularization term. The higher the value, the stronger the regularization. Also used to compute the learning rate when learning_rate is set to 'optimal'. Values must be in the range [0.0, inf).

  • l1_ratio: float, default=0.15 The Elastic Net mixing parameter, with 0 <= l1_ratio <= 1. l1_ratio=0 corresponds to L2 penalty, l1_ratio=1 to L1. Only used if penalty is 'elasticnet'. Values must be in the range [0.0, 1.0] or can be None if penalty is not elasticnet.

    Changed in 1.7 l1_ratio can be None when penalty is not "elasticnet".

  • fit_intercept: bool, default=True Whether the intercept should be estimated or not. If False, the data is assumed to be already centered.

  • max_iter: int, default=1000 The maximum number of passes over the training data (aka epochs). It only impacts the behavior in the fit method, and not the partial_fit method. Values must be in the range [1, inf).

    Added in 0.19

  • tol: float or None, default=1e-3 The stopping criterion. If it is not None, training will stop when (loss > best_loss - tol) for n_iter_no_change consecutive epochs. Convergence is checked against the training loss or the validation loss depending on the early_stopping parameter. Values must be in the range [0.0, inf).

    Added in 0.19

  • shuffle: bool, default=True Whether or not the training data should be shuffled after each epoch.

  • verbose: int, default=0 The verbosity level. Values must be in the range [0, inf).

  • epsilon: float, default=0.1 Epsilon in the epsilon-insensitive loss functions; only if loss is 'huber', 'epsilon_insensitive', or 'squared_epsilon_insensitive'. For 'huber', determines the threshold at which it becomes less important to get the prediction exactly right. For epsilon-insensitive, any differences between the current prediction and the correct label are ignored if they are less than this threshold. Values must be in the range [0.0, inf).

  • n_jobs: int, default=None The number of CPUs to use to do the OVA (One Versus All, for multi-class problems) computation. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.

  • random_state: int, RandomState instance, default=None Used for shuffling the data, when shuffle is set to True. Pass an int for reproducible output across multiple function calls. See Glossary. Integer values must be in the range [0, 2**32 - 1].

  • learning_rate: str, default='optimal' The learning rate schedule:

    • 'constant': eta = eta0
    • 'optimal': eta = 1.0 / (alpha * (t + t0)) where t0 is chosen by a heuristic proposed by Leon Bottou.
    • 'invscaling': eta = eta0 / pow(t, power_t)
    • 'adaptive': eta = eta0, as long as the training keeps decreasing. Each time n_iter_no_change consecutive epochs fail to decrease the training loss by tol or fail to increase validation score by tol if early_stopping is True, the current learning rate is divided by 5.
    • 'pa1': passive-aggressive algorithm 1, see [1]_. Only with loss='hinge'. Update is w += eta y x with eta = min(eta0, loss/||x||**2).
    • 'pa2': passive-aggressive algorithm 2, see [1]_. Only with loss='hinge'. Update is w += eta y x with eta = hinge_loss / (||x||**2 + 1/(2 eta0)).

    Added in 0.20 Added 'adaptive' option.

    Added in 1.8 Added options 'pa1' and 'pa2'

  • eta0: float, default=0.01 The initial learning rate for the 'constant', 'invscaling' or 'adaptive' schedules. The default value is 0.01, but note that eta0 is not used by the default learning rate 'optimal'. Values must be in the range (0.0, inf).

    For PA-I (learning_rate=pa1) and PA-II (pa2), it specifies the aggressiveness parameter for the passive-aggressive algorithm, see [1] where it is called C:

    • For PA-I it is the maximum step size.
    • For PA-II it regularizes the step size (the smaller eta0 the more it regularizes).

    As a general rule of thumb for PA, eta0 should be small when the data is noisy.

  • power_t: float, default=0.5 The exponent for inverse scaling learning rate. Values must be in the range [0.0, inf).

    Deprecated since 1.8 Negative values for power_t are deprecated in version 1.8 and will raise an error in 1.10. Use values in the range [0.0, inf) instead.

  • early_stopping: bool, default=False Whether to use early stopping to terminate training when validation score is not improving. If set to True, it will automatically set aside a stratified fraction of training data as validation and terminate training when validation score returned by the score method is not improving by at least tol for n_iter_no_change consecutive epochs.

    See :ref:sphx_glr_auto_examples_linear_model_plot_sgd_early_stopping.py for an example of the effects of early stopping.

    Added in 0.20 Added 'early_stopping' option

  • validation_fraction: float, default=0.1 The proportion of training data to set aside as validation set for early stopping. Must be between 0 and 1. Only used if early_stopping is True. Values must be in the range (0.0, 1.0).

    Added in 0.20 Added 'validation_fraction' option

  • n_iter_no_change: int, default=5 Number of iterations with no improvement to wait before stopping fitting. Convergence is checked against the training loss or the validation loss depending on the early_stopping parameter. Integer values must be in the range [1, max_iter).

    Added in 0.20 Added 'n_iter_no_change' option

  • class_weight: dict, {class_label: weight} or "balanced", default=None Preset for the class_weight fit parameter.

    Weights associated with classes. If not given, all classes are supposed to have weight one.

    The "balanced" mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y)).

  • warm_start: bool, default=False When set to True, reuse the solution of the previous call to fit as initialization, otherwise, just erase the previous solution. See the Glossary.

    Repeatedly calling fit or partial_fit when warm_start is True can result in a different solution than when calling fit a single time because of the way the data is shuffled. If a dynamic learning rate is used, the learning rate is adapted depending on the number of samples already seen. Calling fit resets this counter, while partial_fit will result in increasing the existing counter.

  • average: bool or int, default=False When set to True, computes the averaged SGD weights across all updates and stores the result in the coef_ attribute. If set to an int greater than 1, averaging will begin once the total number of samples seen reaches average. So average=10 will begin averaging after seeing 10 samples. Integer values must be in the range [1, n_samples].

Attributes

  • coef_: ndarray of shape (1, n_features) if n_classes == 2 else (n_classes, n_features) Weights assigned to the features.

  • intercept_: ndarray of shape (1,) if n_classes == 2 else (n_classes,) Constants in decision function.

  • n_iter_: int The actual number of iterations before reaching the stopping criterion. For multiclass fits, it is the maximum over every binary fit.

  • classes_: array of shape (n_classes,)

  • t_: int Number of weight updates performed during training. Same as (n_iter_ * n_samples + 1).

  • n_features_in_: int Number of features seen during fit.

    Added in 0.24

  • feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen during fit. Defined only when X has feature names that are all strings.

    Added in 1.0

See Also

  • sklearn.svm.LinearSVC: Linear support vector classification.
  • LogisticRegression: Logistic regression.
  • Perceptron: Inherits from SGDClassifier. Perceptron() is equivalent to SGDClassifier(loss="perceptron", eta0=1, learning_rate="constant", penalty=None).

References

Examples

import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
Y = np.array([1, 1, 2, 2])
# Always scale the input. The most convenient way is to use a pipeline.
clf = make_pipeline(StandardScaler(),
                    SGDClassifier(max_iter=1000, tol=1e-3))
clf.fit(X, Y)
# => Pipeline(steps=[('standardscaler', StandardScaler()),
#                    ('sgdclassifier', SGDClassifier())])
print(clf.predict([[-0.8, -1]]))
# => [1]
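
The same estimator should also be usable from Clojure via the metamorph.ml pipeline pattern used throughout this chapter. The sketch below is hedged: it assumes the model is registered under the key :sklearn.classification/sgd-classifier and that the kebab-case option keys (:max-iter, :tol, ...) translate to the Python parameters documented above; options such as early_stopping, validation_fraction and n_iter_no_change would then be passed as :early-stopping, :validation-fraction and :n-iter-no-change.

(def sgd-pipe
  (mm/pipeline
   (ds-mm/set-inference-target 2)
   {:metamorph/id :model}
   ;; model-type key and option names are assumptions, not verified here
   (ml/model {:model-type :sklearn.classification/sgd-classifier
              :max-iter 1000
              :tol 1e-3})))

;; fit it on the small example dataset ds used earlier in this chapter
(def sgd-fitted
  (sgd-pipe {:metamorph/data ds
             :metamorph/mode :fit}))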


24.2.35 /svc

name type default description
break-ties
kernel
gamma
degree
decision-function-shape
probability
tol
shrinking
c
max-iter
random-state
coef-0
class-weight
cache-size
verbose
predict-proba?

C-Support Vector Classification.

The implementation is based on libsvm. The fit time scales at least quadratically with the number of samples and may be impractical beyond tens of thousands of samples. For large datasets consider using ~sklearn.svm.LinearSVC or ~sklearn.linear_model.SGDClassifier instead, possibly after a ~sklearn.kernel_approximation.Nystroem transformer or another kernel approximation (see the User Guide: kernel_approximation).

The multiclass support is handled according to a one-vs-one scheme.

For details on the precise mathematical formulation of the provided kernel functions and how gamma, coef0 and degree affect each other, see the corresponding section in the narrative documentation: svm_kernels.

To learn how to tune SVC's hyperparameters, see the following example: sphx_glr_auto_examples_model_selection_plot_nested_cross_validation_iris.py.

Read more in the User Guide: svm_classification.

Parameters

  • C: float, default=1.0 Regularization parameter. The strength of the regularization is inversely proportional to C. Must be strictly positive. The penalty is a squared l2 penalty. For an intuitive visualization of the effects of scaling the regularization parameter C, see sphx_glr_auto_examples_svm_plot_svm_scale_c.py.

  • kernel: {'linear', 'poly', 'rbf', 'sigmoid', 'precomputed'} or callable, default='rbf' Specifies the kernel type to be used in the algorithm. If none is given, 'rbf' will be used. If a callable is given it is used to pre-compute the kernel matrix from data matrices; that matrix should be an array of shape (n_samples, n_samples). For an intuitive visualization of different kernel types see sphx_glr_auto_examples_svm_plot_svm_kernels.py.

  • degree: int, default=3 Degree of the polynomial kernel function ('poly'). Must be non-negative. Ignored by all other kernels.

  • gamma: {'scale', 'auto'} or float, default='scale' Kernel coefficient for 'rbf', 'poly' and 'sigmoid'.

    • if gamma='scale' (default) is passed then it uses 1 / (n_features * X.var()) as value of gamma,
    • if 'auto', uses 1 / n_features
    • if float, must be non-negative.

    Changed in 0.22 The default value of gamma changed from 'auto' to 'scale'.

  • coef0: float, default=0.0 Independent term in kernel function. It is only significant in 'poly' and 'sigmoid'.

  • shrinking: bool, default=True Whether to use the shrinking heuristic. See the User Guide: shrinking_svm.

  • probability: bool, default=False Whether to enable probability estimates. This must be enabled prior to calling fit, will slow down that method as it internally uses 5-fold cross-validation, and predict_proba may be inconsistent with predict. Read more in the User Guide: scores_probabilities.

  • tol: float, default=1e-3 Tolerance for stopping criterion.

  • cache_size: float, default=200 Specify the size of the kernel cache (in MB).

  • class_weight: dict or 'balanced', default=None Set the parameter C of class i to class_weight[i]*C for SVC. If not given, all classes are supposed to have weight one. The "balanced" mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y)).

  • verbose: bool, default=False Enable verbose output. Note that this setting takes advantage of a per-process runtime setting in libsvm that, if enabled, may not work properly in a multithreaded context.

  • max_iter: int, default=-1 Hard limit on iterations within solver, or -1 for no limit.

  • decision_function_shape: {'ovo', 'ovr'}, default='ovr' Whether to return a one-vs-rest ('ovr') decision function of shape (n_samples, n_classes) as all other classifiers, or the original one-vs-one ('ovo') decision function of libsvm which has shape (n_samples, n_classes * (n_classes - 1) / 2). However, note that internally, one-vs-one ('ovo') is always used as a multi-class strategy to train models; an ovr matrix is only constructed from the ovo matrix. The parameter is ignored for binary classification.

    Changed in 0.19 decision_function_shape is 'ovr' by default.

    Added in 0.17 decision_function_shape='ovr' is recommended.

    Changed in 0.17 Deprecated decision_function_shape='ovo' and None.

  • break_ties: bool, default=False If true, decision_function_shape='ovr', and number of classes > 2, predict will break ties according to the confidence values of decision_function; otherwise the first class among the tied classes is returned. Please note that breaking ties comes at a relatively high computational cost compared to a simple predict. See sphx_glr_auto_examples_svm_plot_svm_tie_breaking.py for an example of its usage with decision_function_shape='ovr'.

    Added in 0.22

  • random_state: int, RandomState instance or None, default=None Controls the pseudo random number generation for shuffling the data for probability estimates. Ignored when probability is False. Pass an int for reproducible output across multiple function calls. See the Glossary.

Attributes

  • class_weight_: ndarray of shape (n_classes,) Multipliers of parameter C for each class. Computed based on the class_weight parameter.

  • classes_: ndarray of shape (n_classes,) The classes labels.

  • coef_: ndarray of shape (n_classes * (n_classes - 1) / 2, n_features) Weights assigned to the features (coefficients in the primal problem). This is only available in the case of a linear kernel.

    coef_ is a readonly property derived from dual_coef_ and support_vectors_.

  • dual_coef_: ndarray of shape (n_classes - 1, n_SV) Dual coefficients of the support vector in the decision function (see the User Guide: sgd_mathematical_formulation), multiplied by their targets. For multiclass, coefficient for all 1-vs-1 classifiers. The layout of the coefficients in the multiclass case is somewhat non-trivial. See the multi-class section of the User Guide: svm_multi_class for details.

  • fit_status_: int 0 if correctly fitted, 1 otherwise (will raise warning)

  • intercept_: ndarray of shape (n_classes * (n_classes - 1) / 2,) Constants in decision function.

  • n_features_in_: int Number of features seen during fit.

    Added in 0.24

  • feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen during fit. Defined only when X has feature names that are all strings.

    Added in 1.0

  • n_iter_: ndarray of shape (n_classes * (n_classes - 1) // 2,) Number of iterations run by the optimization routine to fit the model. The shape of this attribute depends on the number of models optimized which in turn depends on the number of classes.

    Added in 1.1

  • support_: ndarray of shape (n_SV) Indices of support vectors.

  • support_vectors_: ndarray of shape (n_SV, n_features) Support vectors. An empty array if kernel is precomputed.

  • n_support_: ndarray of shape (n_classes,), dtype=int32 Number of support vectors for each class.

  • probA_: ndarray of shape (n_classes * (n_classes - 1) / 2)

  • probB_: ndarray of shape (n_classes * (n_classes - 1) / 2) If probability=True, it corresponds to the parameters learned in Platt scaling to produce probability estimates from decision values. If probability=False, it's an empty array. Platt scaling uses the logistic function 1 / (1 + exp(decision_value * probA_ + probB_)) where probA_ and probB_ are learned from the dataset [2]. For more information on the multiclass case and training procedure see section 8 of [1].

  • shape_fit_: tuple of int of shape (n_dimensions_of_X,) Array dimensions of training vector X.

See Also

  • SVR: Support Vector Machine for Regression implemented using libsvm.

  • LinearSVC: Scalable Linear Support Vector Machine for classification implemented using liblinear. Check the See Also section of LinearSVC for more comparison elements.

References

Examples

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
y = np.array([1, 1, 2, 2])
clf = make_pipeline(StandardScaler(), SVC(gamma='auto'))
clf.fit(X, y)
# => Pipeline(steps=[('standardscaler', StandardScaler()),
#                    ('svc', SVC(gamma='auto'))])
print(clf.predict([[-0.8, -1]]))
# => [1]

For a comparison of SVC with other classifiers, see the example sphx_glr_auto_examples_classification_plot_classification_probability.py.
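
From Clojure, a minimal sketch of the corresponding pipeline could look as follows. It assumes the model-type key :sklearn.classification/svc and that the kebab-case keys from the table above (:c, :kernel, :gamma) map onto the parameters documented here.

(def svc-pipe
  (mm/pipeline
   (ds-mm/set-inference-target 2)
   {:metamorph/id :model}
   ;; model-type key and option names are assumptions, not verified here
   (ml/model {:model-type :sklearn.classification/svc
              :c 1.0
              :kernel "rbf"
              :gamma "auto"})))

(def svc-fitted
  (svc-pipe {:metamorph/data ds
             :metamorph/mode :fit}))

;; fitted attributes such as support_vectors_ and n_support_ should then
;; appear in the attributes map of the fitted context
(keys (-> svc-fitted :model :model-data :attributes))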



24.3 :sklearn.regression models
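
The regression models listed below are used in the same way as the classification models above, only with a :sklearn.regression/... model type and a numeric inference target. As a minimal, hedged sketch (assuming the model-type key :sklearn.regression/linear-regression and the kebab-case option :fit-intercept from the /linear-regression table further down):

(def reg-pipe
  (mm/pipeline
   (ds-mm/set-inference-target 2)
   {:metamorph/id :model}
   ;; model-type key and option name are assumptions, not verified here
   (ml/model {:model-type :sklearn.regression/linear-regression
              :fit-intercept true})))

;; fit on a small synthetic dataset where column 2 is twice column 0
(def reg-fitted
  (reg-pipe {:metamorph/data (dst/tensor->dataset [[0 0 0] [1 1 2] [2 2 4] [3 3 6]])
             :metamorph/mode :fit}))

;; predict for a new row; column 2 of the result should hold the prediction
(-> (mm/transform-pipe
     (dst/tensor->dataset [[4 4 0]])
     reg-pipe
     reg-fitted)
    :metamorph/data)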

24.3.1 /ada-boost-regressor

name type default description
estimator
learning-rate
loss
n-estimators
random-state
predict-proba?


24.3.2 /ard-regression

name type default description
tol
alpha-2
threshold-lambda
max-iter
lambda-1
copy-x
lambda-2
fit-intercept
alpha-1
verbose
compute-score
predict-proba?


24.3.3 /bagging-regressor

name type default description
bootstrap
bootstrap-features
n-jobs
random-state
estimator
oob-score
max-features
warm-start
n-estimators
max-samples
verbose
predict-proba?


24.3.4 /bayesian-ridge

name type default description
tol
alpha-2
max-iter
lambda-1
copy-x
lambda-2
alpha-init
fit-intercept
alpha-1
lambda-init
verbose
compute-score
predict-proba?


24.3.5 /cca

name type default description
copy
max-iter
n-components
scale
tol
predict-proba?


24.3.6 /decision-tree-regressor

name type default description
min-weight-fraction-leaf
max-leaf-nodes
min-impurity-decrease
min-samples-split
ccp-alpha
splitter
random-state
min-samples-leaf
max-features
monotonic-cst
max-depth
criterion
predict-proba?


24.3.7 /dummy-regressor

name type default description
constant
quantile
strategy
predict-proba?


24.3.8 /elastic-net

name type default description
positive
tol
max-iter
random-state
copy-x
precompute
fit-intercept
alpha
warm-start
selection
l-1-ratio
predict-proba?


24.3.9 /elastic-net-cv

name type default description
positive
tol
n-alphas
eps
alphas
max-iter
n-jobs
random-state
copy-x
precompute
fit-intercept
cv
selection
l-1-ratio
verbose
predict-proba?


24.3.10 /extra-tree-regressor

name type default description
min-weight-fraction-leaf
max-leaf-nodes
min-impurity-decrease
min-samples-split
ccp-alpha
splitter
random-state
min-samples-leaf
max-features
monotonic-cst
max-depth
criterion
predict-proba?


24.3.11 /extra-trees-regressor

name type default description
min-weight-fraction-leaf
max-leaf-nodes
min-impurity-decrease
min-samples-split
bootstrap
ccp-alpha
n-jobs
random-state
oob-score
min-samples-leaf
max-features
monotonic-cst
warm-start
max-depth
n-estimators
max-samples
criterion
verbose
predict-proba?


24.3.12 /gamma-regressor

name type default description
alpha
fit-intercept
max-iter
solver
tol
verbose
warm-start
predict-proba?


24.3.13 /gaussian-process-regressor

name type default description
alpha
copy-x-train
kernel
n-restarts-optimizer
n-targets
normalize-y
optimizer
random-state
predict-proba?


24.3.14 /gradient-boosting-regressor

name type default description
n-iter-no-change
learning-rate
min-weight-fraction-leaf
max-leaf-nodes
min-impurity-decrease
min-samples-split
tol
subsample
ccp-alpha
random-state
min-samples-leaf
max-features
init
alpha
warm-start
max-depth
validation-fraction
n-estimators
criterion
loss
verbose
predict-proba?


24.3.15 /hist-gradient-boosting-regressor

name type default description
n-iter-no-change
learning-rate
max-leaf-nodes
scoring
tol
early-stopping
quantile
max-iter
random-state
max-bins
min-samples-leaf
max-features
monotonic-cst
warm-start
max-depth
validation-fraction
loss
interaction-cst
verbose
categorical-features
l-2-regularization
predict-proba?


24.3.16 /huber-regressor

name type default description
alpha
epsilon
fit-intercept
max-iter
tol
warm-start
predict-proba?


24.3.17 /isotonic-regression

name type default description
increasing
out-of-bounds
y-max
y-min
predict-proba?


24.3.18 /k-neighbors-regressor

name type default description
algorithm
leaf-size
metric
metric-params
n-jobs
n-neighbors
p
weights
predict-proba?


24.3.19 /kernel-ridge

name type default description
alpha
coef-0
degree
gamma
kernel
kernel-params
predict-proba?


24.3.20 /lars

name type default description
fit-path
eps
random-state
jitter
copy-x
precompute
fit-intercept
n-nonzero-coefs
verbose
predict-proba?


24.3.21 /lars-cv

name type default description
eps
max-n-alphas
max-iter
n-jobs
copy-x
precompute
fit-intercept
cv
verbose
predict-proba?


24.3.22 /lasso

name type default description
positive
tol
max-iter
random-state
copy-x
precompute
fit-intercept
alpha
warm-start
selection
predict-proba?


24.3.23 /lasso-cv

name type default description
positive
tol
n-alphas
eps
alphas
max-iter
n-jobs
random-state
copy-x
precompute
fit-intercept
cv
selection
verbose
predict-proba?


24.3.24 /lasso-lars

name type default description
positive
fit-path
eps
max-iter
random-state
jitter
copy-x
precompute
fit-intercept
alpha
verbose
predict-proba?


24.3.25 /lasso-lars-cv

name type default description
positive
eps
max-n-alphas
max-iter
n-jobs
copy-x
precompute
fit-intercept
cv
verbose
predict-proba?


24.3.26 /lasso-lars-ic

name type default description
positive
eps
noise-variance
max-iter
copy-x
precompute
fit-intercept
criterion
verbose
predict-proba?


24.3.27 /linear-regression

name type default description
copy-x
fit-intercept
n-jobs
positive
tol
predict-proba?


24.3.28 /linear-svr

name type default description
tol
intercept-scaling
c
max-iter
random-state
dual
fit-intercept
loss
verbose
epsilon
predict-proba?


24.3.29 /mlp-regressor

name type default description
n-iter-no-change
learning-rate
activation
hidden-layer-sizes
tol
beta-2
early-stopping
nesterovs-momentum
batch-size
solver
shuffle
power-t
max-fun
beta-1
max-iter
random-state
momentum
learning-rate-init
alpha
warm-start
validation-fraction
loss
verbose
epsilon
predict-proba?


24.3.30 /multi-task-elastic-net

name type default description
tol
max-iter
random-state
copy-x
fit-intercept
alpha
warm-start
selection
l-1-ratio
predict-proba?


24.3.31 /multi-task-elastic-net-cv

name type default description
tol
n-alphas
eps
alphas
max-iter
n-jobs
random-state
copy-x
fit-intercept
cv
selection
l-1-ratio
verbose
predict-proba?


24.3.32 /multi-task-lasso

name type default description
alpha
copy-x
fit-intercept
max-iter
random-state
selection
tol
warm-start
predict-proba?


24.3.33 /multi-task-lasso-cv

name type default description
tol
n-alphas
eps
alphas
max-iter
n-jobs
random-state
copy-x
fit-intercept
cv
selection
verbose
predict-proba?


24.3.34 /nu-svr

name type default description
kernel
gamma
degree
tol
nu
shrinking
c
max-iter
coef-0
cache-size
verbose
predict-proba?


24.3.35 /orthogonal-matching-pursuit

name type default description
fit-intercept
n-nonzero-coefs
precompute
tol
predict-proba?


24.3.36 /orthogonal-matching-pursuit-cv

name type default description
copy
cv
fit-intercept
max-iter
n-jobs
verbose
predict-proba?


24.3.37 /passive-aggressive-regressor

name type default description
n-iter-no-change
average
tol
early-stopping
shuffle
c
max-iter
random-state
fit-intercept
warm-start
validation-fraction
loss
verbose
epsilon
predict-proba?


24.3.38 /pls-canonical

name type default description
algorithm
copy
max-iter
n-components
scale
tol
predict-proba?


24.3.39 /pls-regression

name type default description
copy
max-iter
n-components
scale
tol
predict-proba?


24.3.40 /poisson-regressor

name type default description
alpha
fit-intercept
max-iter
solver
tol
verbose
warm-start
predict-proba?


24.3.41 /quantile-regressor

name type default description
alpha
fit-intercept
quantile
solver
solver-options
predict-proba?


24.3.42 /radius-neighbors-regressor

name type default description
algorithm
leaf-size
metric
metric-params
n-jobs
p
radius
weights
predict-proba?


24.3.43 /random-forest-regressor

name type default description
min-weight-fraction-leaf
max-leaf-nodes
min-impurity-decrease
min-samples-split
bootstrap
ccp-alpha
n-jobs
random-state
oob-score
min-samples-leaf
max-features
monotonic-cst
warm-start
max-depth
n-estimators
max-samples
criterion
verbose
predict-proba?


24.3.44 /ransac-regressor

name type default description
is-data-valid
max-skips
random-state
min-samples
stop-probability
estimator
stop-n-inliers
max-trials
residual-threshold
is-model-valid
loss
stop-score
predict-proba?


24.3.45 /ridge

name type default description
alpha
copy-x
fit-intercept
max-iter
positive
random-state
solver
tol
predict-proba?


24.3.46 /ridge-cv

name type default description
alpha-per-target
alphas
cv
fit-intercept
gcv-mode
scoring
store-cv-results
predict-proba?


24.3.47 /sgd-regressor

name type default description
n-iter-no-change
learning-rate
average
tol
early-stopping
eta-0
shuffle
penalty
power-t
max-iter
random-state
fit-intercept
alpha
warm-start
l-1-ratio
validation-fraction
loss
verbose
epsilon
predict-proba?


24.3.48 /svr

name type default description
kernel
gamma
degree
tol
shrinking
c
max-iter
coef-0
cache-size
verbose
epsilon
predict-proba?


24.3.49 /theil-sen-regressor

name type default description
fit-intercept
max-iter
max-subpopulation
n-jobs
n-subsamples
random-state
tol
verbose
predict-proba?


24.3.50 /transformed-target-regressor

name type default description
check-inverse
func
inverse-func
regressor
transformer
predict-proba?


24.3.51 /tweedie-regressor

name type default description
tol
solver
power
max-iter
link
fit-intercept
alpha
warm-start
verbose
predict-proba?


source: notebooks/noj_book/sklearn_reference.clj