24  Sklearn model reference

As discussed in the Machine Learning chapter, this book contains reference chapters for machine learning models that can be registered in metamorph.ml.

This specific chapter focuses on the models of the scikit-learn Python library, which is wrapped by sklearn-clj.

(ns noj-book.sklearn-reference
  (:require
   [clojure.pprint]
   [noj-book.utils.render-tools :refer [render-key-info]]
   [scicloj.kindly.v4.kind :as kind]
   [scicloj.metamorph.core :as mm]
   [scicloj.metamorph.ml :as ml]
   [tech.v3.dataset.tensor :as dst]
   [libpython-clj2.python :refer [py.- ->jvm]]
   [tech.v3.dataset.metamorph :as ds-mm]
   [noj-book.utils.render-tools-sklearn]
   [scicloj.sklearn-clj.ml]))

24.1 Sklearn model reference

Below we find all sklearn models with their parameters and the original documentation.

The parameters are given as Clojure keys in kebab-case. As the documentation texts are imported from Python, they refer to the Python spelling of the parameters.

But the translation between the two should be obvious.

Example: logistic regression

(def ds (dst/tensor->dataset [[0 0 0] [1 1 1] [2 2 2]]))

Make a pipeline with the sklearn model ‘logistic-regression’:

(def pipe
  (mm/pipeline
   (ds-mm/set-inference-target 2)
   {:metamorph/id :model}
   (ml/model {:model-type :sklearn.classification/logistic-regression
              :max-iter 100})))

Train model:

(def fitted-ctx
  (pipe {:metamorph/data ds
         :metamorph/mode :fit}))

Predict on new data:

(->
 (mm/transform-pipe
  (dst/tensor->dataset [[3 4 5]])
  pipe
  fitted-ctx)
 :metamorph/data)

:_unnamed [1 3]:

0 1 2
0.00725794 0.10454345 2.0

Access model details via Python interop (using libpython-clj):

(-> fitted-ctx :model :model-data :model
    (py.- coef_)
    (->jvm))
#tech.v3.tensor<float64>[3 2]
[[   -0.4807    -0.4807]
 [-2.061E-05 -2.061E-05]
 [    0.4807     0.4807]]

All model attributes are also included in the context.

(def model-attributes
  (-> fitted-ctx :model :model-data :attributes))
(kind/hiccup
 [:dl (map
       (fn [[k v]]
         [:span
          (vector :dt k)
          (vector :dd  (clojure.pprint/write v :stream nil))])
       model-attributes)])
n_features_in_: 2
coef_: [[-4.80679547e-01 -4.80679547e-01] [-2.06085772e-05 -2.06085772e-05] [ 4.80700156e-01 4.80700156e-01]]
intercept_: [ 0.87322115 0.17611579 -1.04933694]
n_iter_: [11]
classes_: [0. 1. 2.]
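
The attributes map can also be inspected directly from Clojure; for example (a small sketch, not evaluated here):

(keys model-attributes)
;; the keys mirror the Python attribute names rendered above,
;; e.g. coef_, intercept_, n_iter_ and classes_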

24.2 :sklearn.classification models

24.2.1 /ada-boost-classifier

Clojure option keys: estimator, learning-rate, n-estimators, random-state, predict-proba?

An AdaBoost classifier.

An AdaBoost [1] classifier is a meta-estimator that begins by fitting a classifier on the original dataset and then fits additional copies of the classifier on the same dataset but where the weights of incorrectly classified instances are adjusted such that subsequent classifiers focus more on difficult cases.

This class implements the algorithm based on [2].

Read more in the User Guide: adaboost.

Added in 0.14

Parameters

  • estimator: object, default=None The base estimator from which the boosted ensemble is built. Support for sample weighting is required, as well as proper classes_ and n_classes_ attributes. If None, then the base estimator is ~sklearn.tree.DecisionTreeClassifier initialized with max_depth=1.

    Added in 1.2 base_estimator was renamed to estimator.

  • n_estimators: int, default=50 The maximum number of estimators at which boosting is terminated. In case of perfect fit, the learning procedure is stopped early. Values must be in the range [1, inf).

  • learning_rate: float, default=1.0 Weight applied to each classifier at each boosting iteration. A higher learning rate increases the contribution of each classifier. There is a trade-off between the learning_rate and n_estimators parameters. Values must be in the range (0.0, inf).

  • random_state: int, RandomState instance or None, default=None Controls the random seed given at each estimator at each boosting iteration. Thus, it is only used when estimator exposes a random_state. Pass an int for reproducible output across multiple function calls. See the Glossary.

Attributes

  • estimator_: estimator The base estimator from which the ensemble is grown.

    Added in 1.2 base_estimator_ was renamed to estimator_.

  • estimators_: list of classifiers The collection of fitted sub-estimators.

  • classes_: ndarray of shape (n_classes,) The classes labels.

  • n_classes_: int The number of classes.

  • estimator_weights_: ndarray of floats Weights for each estimator in the boosted ensemble.

  • estimator_errors_: ndarray of floats Classification error for each estimator in the boosted ensemble.

  • feature_importances_: ndarray of shape (n_features,) The impurity-based feature importances if supported by the estimator (when based on decision trees).

    Warning: impurity-based feature importances can be misleading for high cardinality features (many unique values). See sklearn.inspection.permutation_importance as an alternative.

  • n_features_in_: int Number of features seen during fit.

    Added in 0.24

  • feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen during fit. Defined only when X has feature names that are all strings.

    Added in 1.0

See Also

  • AdaBoostRegressor: An AdaBoost regressor that begins by fitting a regressor on the original dataset and then fits additional copies of the regressor on the same dataset but where the weights of instances are adjusted according to the error of the current prediction.

  • GradientBoostingClassifier: GB builds an additive model in a forward stage-wise fashion. Regression trees are fit on the negative gradient of the binomial or multinomial deviance loss function. Binary classification is a special case where only a single regression tree is induced.

  • sklearn.tree.DecisionTreeClassifier: A non-parametric supervised learning method used for classification. Creates a model that predicts the value of a target variable by learning simple decision rules inferred from the data features.

References

  • [1] Y. Freund, R. Schapire, "A Decision-Theoretic Generalization of on-Line Learning and an Application to Boosting", 1995.

  • [2] J. Zhu, H. Zou, S. Rosset, T. Hastie, "Multi-class adaboost." Statistics and its Interface 2.3 (2009): 349-360. doi:10.4310/SII.2009.v2.n3.a8

Examples

from sklearn.ensemble import AdaBoostClassifier
from sklearn.datasets import make_classification
X, y = make_classification(n_samples=1000, n_features=4,
                           n_informative=2, n_redundant=0,
                           random_state=0, shuffle=False)
clf = AdaBoostClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)
AdaBoostClassifier(n_estimators=100, random_state=0)
clf.predict([[0, 0, 0, 0]])
array([1])
clf.score(X, y)
0.96

For a detailed example of using AdaBoost to fit a sequence of DecisionTrees as weak learners, please refer to :ref:sphx_glr_auto_examples_ensemble_plot_adaboost_multiclass.py.

For a detailed example of using AdaBoost to fit a non-linearly separable classification dataset composed of two Gaussian quantiles clusters, please refer to :ref:sphx_glr_auto_examples_ensemble_plot_adaboost_twoclass.py.
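
The same model is available through metamorph.ml under the model type shown in this section's heading. By analogy with the logistic regression example at the top of the chapter, a minimal pipeline sketch over the toy dataset ds (not evaluated here; the option keys are the kebab-case parameters listed at the start of this section):

(def ada-pipe
  (mm/pipeline
   (ds-mm/set-inference-target 2)
   {:metamorph/id :model}
   (ml/model {:model-type :sklearn.classification/ada-boost-classifier
              :n-estimators 100
              :random-state 0})))

(def ada-fitted-ctx
  (ada-pipe {:metamorph/data ds
             :metamorph/mode :fit}))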



24.2.2 /bagging-classifier

Clojure option keys: bootstrap, bootstrap-features, n-jobs, random-state, estimator, oob-score, max-features, warm-start, n-estimators, max-samples, verbose, predict-proba?

A Bagging classifier.

A Bagging classifier is an ensemble meta-estimator that fits base classifiers each on random subsets of the original dataset and then aggregates their individual predictions (either by voting or by averaging) to form a final prediction. Such a meta-estimator can typically be used as a way to reduce the variance of a black-box estimator (e.g., a decision tree), by introducing randomization into its construction procedure and then making an ensemble out of it.

This algorithm encompasses several works from the literature. When random subsets of the dataset are drawn as random subsets of the samples, then this algorithm is known as Pasting [1]. If samples are drawn with replacement, then the method is known as Bagging [2]. When random subsets of the dataset are drawn as random subsets of the features, then the method is known as Random Subspaces [3]. Finally, when base estimators are built on subsets of both samples and features, then the method is known as Random Patches [4].

Read more in the User Guide: bagging.

Added in 0.15

Parameters

  • estimator: object, default=None The base estimator to fit on random subsets of the dataset. If None, then the base estimator is a ~sklearn.tree.DecisionTreeClassifier.

    Added in 1.2 base_estimator was renamed to estimator.

  • n_estimators: int, default=10 The number of base estimators in the ensemble.

  • max_samples: int or float, default=None The number of samples to draw from X to train each base estimator (with replacement by default, see bootstrap for more details).

    • If None, then draw X.shape[0] samples irrespective of sample_weight.
    • If int, then draw max_samples samples.
    • If float, then draw max_samples * X.shape[0] unweighted samples or max_samples * sample_weight.sum() weighted samples.
  • max_features: int or float, default=1.0 The number of features to draw from X to train each base estimator ( without replacement by default, see bootstrap_features for more details).

    • If int, then draw max_features features.
    • If float, then draw max(1, int(max_features * n_features_in_)) features.
  • bootstrap: bool, default=True Whether samples are drawn with replacement. If False, sampling without replacement is performed. If fitting with sample_weight, it is strongly recommended to choose True, as only drawing with replacement will ensure the expected frequency semantics of sample_weight.

  • bootstrap_features: bool, default=False Whether features are drawn with replacement.

  • oob_score: bool, default=False Whether to use out-of-bag samples to estimate the generalization error. Only available if bootstrap=True.

  • warm_start: bool, default=False When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just fit a whole new ensemble. See the Glossary.

    Added in 0.17 warm_start constructor parameter.

  • n_jobs: int, default=None The number of jobs to run in parallel for both fit and predict. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.

  • random_state: int, RandomState instance or None, default=None Controls the random resampling of the original dataset (sample wise and feature wise). If the base estimator accepts a random_state attribute, a different seed is generated for each instance in the ensemble. Pass an int for reproducible output across multiple function calls. See the Glossary.

  • verbose: int, default=0 Controls the verbosity when fitting and predicting.

Attributes

  • estimator_: estimator The base estimator from which the ensemble is grown.

    Added in 1.2 base_estimator_ was renamed to estimator_.

  • n_features_in_: int Number of features seen during fit.

    Added in 0.24

  • feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen during fit. Defined only when X has feature names that are all strings.

    Added in 1.0

  • estimators_: list of estimators The collection of fitted base estimators.

  • estimators_samples_: list of arrays The subset of drawn samples (i.e., the in-bag samples) for each base estimator. Each subset is defined by an array of the indices selected.

  • estimators_features_: list of arrays The subset of drawn features for each base estimator.

  • classes_: ndarray of shape (n_classes,) The classes labels.

  • n_classes_: int or list The number of classes.

  • oob_score_: float Score of the training dataset obtained using an out-of-bag estimate. This attribute exists only when oob_score is True.

  • oob_decision_function_: ndarray of shape (n_samples, n_classes) Decision function computed with out-of-bag estimate on the training set. If n_estimators is small it might be possible that a data point was never left out during the bootstrap. In this case, oob_decision_function_ might contain NaN. This attribute exists only when oob_score is True.

See Also

  • BaggingRegressor: A Bagging regressor.

References

  • [1] L. Breiman, "Pasting small votes for classification in large databases and on-line", Machine Learning, 36(1), 85-103, 1999.

  • [2] L. Breiman, "Bagging predictors", Machine Learning, 24(2), 123-140, 1996.

  • [3] T. Ho, "The random subspace method for constructing decision forests", Pattern Analysis and Machine Intelligence, 20(8), 832-844, 1998.

  • [4] G. Louppe and P. Geurts, "Ensembles on Random Patches", Machine Learning and Knowledge Discovery in Databases, 346-361, 2012.

Examples

from sklearn.svm import SVC
from sklearn.ensemble import BaggingClassifier
from sklearn.datasets import make_classification
X, y = make_classification(n_samples=100, n_features=4,
                           n_informative=2, n_redundant=0,
                           random_state=0, shuffle=False)
clf = BaggingClassifier(estimator=SVC(),
                        n_estimators=10, random_state=0).fit(X, y)
clf.predict([[0, 0, 0, 0]])
array([1])
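
Analogously, a metamorph.ml model step for the bagging classifier might look like this (a sketch only, using the kebab-case option keys listed above):

(ml/model {:model-type :sklearn.classification/bagging-classifier
           :n-estimators 10
           :random-state 0})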


24.2.3 /bernoulli-nb

Clojure option keys: alpha, binarize, class-prior, fit-prior, force-alpha, predict-proba?

Naive Bayes classifier for multivariate Bernoulli models.

Like MultinomialNB, this classifier is suitable for discrete data. The difference is that while MultinomialNB works with occurrence counts, BernoulliNB is designed for binary/boolean features.

Read more in the User Guide: bernoulli_naive_bayes.

Parameters

  • alpha: float or array-like of shape (n_features,), default=1.0 Additive (Laplace/Lidstone) smoothing parameter (set alpha=0 and force_alpha=True, for no smoothing).

  • force_alpha: bool, default=True If False and alpha is less than 1e-10, it will set alpha to 1e-10. If True, alpha will remain unchanged. This may cause numerical errors if alpha is too close to 0.

    Added in 1.2 Changed in 1.4 The default value of force_alpha changed to True.

  • binarize: float or None, default=0.0 Threshold for binarizing (mapping to booleans) of sample features. If None, input is presumed to already consist of binary vectors.

  • fit_prior: bool, default=True Whether to learn class prior probabilities or not. If false, a uniform prior will be used.

  • class_prior: array-like of shape (n_classes,), default=None Prior probabilities of the classes. If specified, the priors are not adjusted according to the data.

Attributes

  • class_count_: ndarray of shape (n_classes,) Number of samples encountered for each class during fitting. This value is weighted by the sample weight when provided.

  • class_log_prior_: ndarray of shape (n_classes,) Log probability of each class (smoothed).

  • classes_: ndarray of shape (n_classes,) Class labels known to the classifier

  • feature_count_: ndarray of shape (n_classes, n_features) Number of samples encountered for each (class, feature) during fitting. This value is weighted by the sample weight when provided.

  • feature_log_prob_: ndarray of shape (n_classes, n_features) Empirical log probability of features given a class, P(x_i|y).

  • n_features_in_: int Number of features seen during fit.

    Added in 0.24

  • feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen during fit. Defined only when X has feature names that are all strings.

    Added in 1.0

See Also

  • CategoricalNB: Naive Bayes classifier for categorical features.
  • ComplementNB: The Complement Naive Bayes classifier described in Rennie et al. (2003).
  • GaussianNB: Gaussian Naive Bayes (GaussianNB).
  • MultinomialNB: Naive Bayes classifier for multinomial models.

References

C.D. Manning, P. Raghavan and H. Schuetze (2008). Introduction to Information Retrieval. Cambridge University Press, pp. 234-265. https://nlp.stanford.edu/IR-book/html/htmledition/the-bernoulli-model-1.html

A. McCallum and K. Nigam (1998). A comparison of event models for naive Bayes text classification. Proc. AAAI/ICML-98 Workshop on Learning for Text Categorization, pp. 41-48.

V. Metsis, I. Androutsopoulos and G. Paliouras (2006). Spam filtering with naive Bayes -- Which naive Bayes? 3rd Conf. on Email and Anti-Spam (CEAS).

Examples

import numpy as np
rng = np.random.RandomState(1)
X = rng.randint(5, size=(6, 100))
Y = np.array([1, 2, 3, 4, 4, 5])
from sklearn.naive_bayes import BernoulliNB
clf = BernoulliNB()
clf.fit(X, Y)
BernoulliNB()
print(clf.predict(X[2:3]))
[3]
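
A corresponding model step in a metamorph.ml pipeline could be (a sketch, not evaluated; option keys as listed above):

(ml/model {:model-type :sklearn.classification/bernoulli-nb
           :alpha 1.0
           :binarize 0.0
           :fit-prior true})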


24.2.4 /calibrated-classifier-cv

Clojure option keys: cv, ensemble, estimator, method, n-jobs, predict-proba?

Calibrate probabilities using isotonic, sigmoid, or temperature scaling.

This class uses cross-validation to both estimate the parameters of a classifier and subsequently calibrate a classifier. With ensemble=True, for each cv split it fits a copy of the base estimator to the training subset, and calibrates it using the testing subset. For prediction, predicted probabilities are averaged across these individual calibrated classifiers. When ensemble=False, cross-validation is used to obtain unbiased predictions, via ~sklearn.model_selection.cross_val_predict, which are then used for calibration. For prediction, the base estimator, trained using all the data, is used. This is the prediction method implemented when probabilities=True for ~sklearn.svm.SVC and ~sklearn.svm.NuSVC estimators (see User Guide: scores_probabilities for details).

Already fitted classifiers can be calibrated by wrapping the model in a ~sklearn.frozen.FrozenEstimator. In this case all provided data is used for calibration. The user has to take care manually that data for model fitting and calibration are disjoint.

The calibration is based on the decision_function method of the estimator if it exists, else on predict_proba.

Read more in the User Guide: calibration. In order to learn more on the CalibratedClassifierCV class, see the following calibration examples: :ref:sphx_glr_auto_examples_calibration_plot_calibration.py, :ref:sphx_glr_auto_examples_calibration_plot_calibration_curve.py, and :ref:sphx_glr_auto_examples_calibration_plot_calibration_multiclass.py.

Parameters

  • estimator: estimator instance, default=None The classifier whose output needs to be calibrated to provide more accurate predict_proba outputs. The default classifier is a ~sklearn.svm.LinearSVC.

    Added in 1.2

  • method: {'sigmoid', 'isotonic', 'temperature'}, default='sigmoid' The method to use for calibration. Can be:

    • 'sigmoid', which corresponds to Platt's method (i.e. a binary logistic regression model).
    • 'isotonic', which is a non-parametric approach.
    • 'temperature', temperature scaling.

    Sigmoid and isotonic calibration methods natively support only binary classifiers and extend to multi-class classification using a One-vs-Rest (OvR) strategy with post-hoc renormalization, i.e., adjusting the probabilities after calibration to ensure they sum up to 1.

    In contrast, temperature scaling naturally supports multi-class calibration by applying softmax(classifier_logits/T) with a value of T (temperature) that optimizes the log loss.

    For very uncalibrated classifiers on very imbalanced datasets, sigmoid calibration might be preferred because it fits an additional intercept parameter. This helps shift decision boundaries appropriately when the classifier being calibrated is biased towards the majority class.

    Isotonic calibration is not recommended when the number of calibration samples is too low (≪1000) since it then tends to overfit.

    Changed in 1.8 Added option 'temperature'.

  • cv: int, cross-validation generator, or iterable, default=None Determines the cross-validation splitting strategy. Possible inputs for cv are:

    • None, to use the default 5-fold cross-validation,
    • integer, to specify the number of folds.
    • CV splitter,
    • An iterable yielding (train, test) splits as arrays of indices.

    For integer/None inputs, if y is binary or multiclass, ~sklearn.model_selection.StratifiedKFold is used. If y is neither binary nor multiclass, ~sklearn.model_selection.KFold is used.

    Refer to the User Guide: cross_validation for the various cross-validation strategies that can be used here.

    Changed in 0.22 cv default value if None changed from 3-fold to 5-fold.

  • n_jobs: int, default=None Number of jobs to run in parallel. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors.

    Base estimator clones are fitted in parallel across cross-validation iterations.

    See Glossary for more details.

    Added in 0.24

  • ensemble: bool, or "auto", default="auto" Determines how the calibrator is fitted.

    "auto" will use False if the estimator is a ~sklearn.frozen.FrozenEstimator, and True otherwise.

    If True, the estimator is fitted using training data, and calibrated using testing data, for each cv fold. The final estimator is an ensemble of n_cv fitted classifier and calibrator pairs, where n_cv is the number of cross-validation folds. The output is the average predicted probabilities of all pairs.

    If False, cv is used to compute unbiased predictions, via ~sklearn.model_selection.cross_val_predict, which are then used for calibration. At prediction time, the classifier used is the estimator trained on all the data. Note that this method is also internally implemented in sklearn.svm estimators with the probabilities=True parameter.

    Added in 0.24

    Changed in 1.6 "auto" option is added and is the default.

Attributes

  • classes_: ndarray of shape (n_classes,) The class labels.

  • n_features_in_: int Number of features seen during fit. Only defined if the underlying estimator exposes such an attribute when fit.

    Added in 0.24

  • feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen during fit. Only defined if the underlying estimator exposes such an attribute when fit.

    Added in 1.0

  • calibrated_classifiers_: list (len() equal to cv or 1 if ensemble=False) The list of classifier and calibrator pairs.

    • When ensemble=True, n_cv fitted estimator and calibrator pairs. n_cv is the number of cross-validation folds.
    • When ensemble=False, the estimator, fitted on all the data, and fitted calibrator.

    Changed in 0.24 Single calibrated classifier case when ensemble=False.

See Also

  • calibration_curve: Compute true and predicted probabilities for a calibration curve.

References

Examples

from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB
from sklearn.calibration import CalibratedClassifierCV
X, y = make_classification(n_samples=100, n_features=2,
                           n_redundant=0, random_state=42)
base_clf = GaussianNB()
calibrated_clf = CalibratedClassifierCV(base_clf, cv=3)
calibrated_clf.fit(X, y)
CalibratedClassifierCV(...)
len(calibrated_clf.calibrated_classifiers_)
3
calibrated_clf.predict_proba(X)[:5, :]
array([[0.110, 0.889],
       [0.072, 0.927],
       [0.928, 0.072],
       [0.928, 0.072],
       [0.072, 0.928]])
from sklearn.model_selection import train_test_split
X, y = make_classification(n_samples=100, n_features=2,
                           n_redundant=0, random_state=42)
X_train, X_calib, y_train, y_calib = train_test_split(
       X, y, random_state=42
)
base_clf = GaussianNB()
base_clf.fit(X_train, y_train)
GaussianNB()
from sklearn.frozen import FrozenEstimator
calibrated_clf = CalibratedClassifierCV(FrozenEstimator(base_clf))
calibrated_clf.fit(X_calib, y_calib)
CalibratedClassifierCV(...)
len(calibrated_clf.calibrated_classifiers_)
1
calibrated_clf.predict_proba([[-0.5, 0.5]])
array([[0.936, 0.063]])
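
Via metamorph.ml, this calibration wrapper is addressed by its model type as well. A hedged sketch (the string value for method follows the Python spelling from the documentation above):

(ml/model {:model-type :sklearn.classification/calibrated-classifier-cv
           :method "sigmoid"
           :cv 3})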


24.2.5 /categorical-nb

Clojure option keys: alpha, class-prior, fit-prior, force-alpha, min-categories, predict-proba?

Naive Bayes classifier for categorical features.

The categorical Naive Bayes classifier is suitable for classification with discrete features that are categorically distributed. The categories of each feature are drawn from a categorical distribution.

Read more in the User Guide: categorical_naive_bayes.

Parameters

  • alpha: float, default=1.0 Additive (Laplace/Lidstone) smoothing parameter (set alpha=0 and force_alpha=True, for no smoothing).

  • force_alpha: bool, default=True If False and alpha is less than 1e-10, it will set alpha to 1e-10. If True, alpha will remain unchanged. This may cause numerical errors if alpha is too close to 0.

    Added in 1.2 Changed in 1.4 The default value of force_alpha changed to True.

  • fit_prior: bool, default=True Whether to learn class prior probabilities or not. If false, a uniform prior will be used.

  • class_prior: array-like of shape (n_classes,), default=None Prior probabilities of the classes. If specified, the priors are not adjusted according to the data.

  • min_categories: int or array-like of shape (n_features,), default=None Minimum number of categories per feature.

    • integer: Sets the minimum number of categories per feature to n_categories for each feature.
    • array-like: shape (n_features,) where n_categories[i] holds the minimum number of categories for the ith column of the input.
    • None (default): Determines the number of categories automatically from the training data.

    Added in 0.24

Attributes

  • category_count_: list of arrays of shape (n_features,) Holds arrays of shape (n_classes, n_categories of respective feature) for each feature. Each array provides the number of samples encountered for each class and category of the specific feature.

  • class_count_: ndarray of shape (n_classes,) Number of samples encountered for each class during fitting. This value is weighted by the sample weight when provided.

  • class_log_prior_: ndarray of shape (n_classes,) Smoothed empirical log probability for each class.

  • classes_: ndarray of shape (n_classes,) Class labels known to the classifier

  • feature_log_prob_: list of arrays of shape (n_features,) Holds arrays of shape (n_classes, n_categories of respective feature) for each feature. Each array provides the empirical log probability of categories given the respective feature and class, P(x_i|y).

  • n_features_in_: int Number of features seen during fit.

    Added in 0.24

  • feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen during fit. Defined only when X has feature names that are all strings.

    Added in 1.0

  • n_categories_: ndarray of shape (n_features,), dtype=np.int64 Number of categories for each feature. This value is inferred from the data or set by the minimum number of categories.

    Added in 0.24

See Also

  • BernoulliNB: Naive Bayes classifier for multivariate Bernoulli models.
  • ComplementNB: Complement Naive Bayes classifier.
  • GaussianNB: Gaussian Naive Bayes.
  • MultinomialNB: Naive Bayes classifier for multinomial models.

Examples

import numpy as np
rng = np.random.RandomState(1)
X = rng.randint(5, size=(6, 100))
y = np.array([1, 2, 3, 4, 5, 6])
from sklearn.naive_bayes import CategoricalNB
clf = CategoricalNB()
clf.fit(X, y)
CategoricalNB()
print(clf.predict(X[2:3]))
[3]
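
The corresponding metamorph.ml model step might look like this (sketch only; option keys as listed above):

(ml/model {:model-type :sklearn.classification/categorical-nb
           :alpha 1.0
           :fit-prior true})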


24.2.6 /complement-nb

Clojure option keys: alpha, class-prior, fit-prior, force-alpha, norm, predict-proba?

The Complement Naive Bayes classifier described in Rennie et al. (2003).

The Complement Naive Bayes classifier was designed to correct the "severe assumptions" made by the standard Multinomial Naive Bayes classifier. It is particularly suited for imbalanced data sets.

Read more in the User Guide: complement_naive_bayes.

Added in 0.20

Parameters

  • alpha: float or array-like of shape (n_features,), default=1.0 Additive (Laplace/Lidstone) smoothing parameter (set alpha=0 and force_alpha=True, for no smoothing).

  • force_alpha: bool, default=True If False and alpha is less than 1e-10, it will set alpha to 1e-10. If True, alpha will remain unchanged. This may cause numerical errors if alpha is too close to 0.

    Added in 1.2 Changed in 1.4 The default value of force_alpha changed to True.

  • fit_prior: bool, default=True Only used in edge case with a single class in the training set.

  • class_prior: array-like of shape (n_classes,), default=None Prior probabilities of the classes. Not used.

  • norm: bool, default=False Whether or not a second normalization of the weights is performed. The default behavior mirrors the implementations found in Mahout and Weka, which do not follow the full algorithm described in Table 9 of the paper.

Attributes

  • class_count_: ndarray of shape (n_classes,) Number of samples encountered for each class during fitting. This value is weighted by the sample weight when provided.

  • class_log_prior_: ndarray of shape (n_classes,) Smoothed empirical log probability for each class. Only used in edge case with a single class in the training set.

  • classes_: ndarray of shape (n_classes,) Class labels known to the classifier

  • feature_all_: ndarray of shape (n_features,) Number of samples encountered for each feature during fitting. This value is weighted by the sample weight when provided.

  • feature_count_: ndarray of shape (n_classes, n_features) Number of samples encountered for each (class, feature) during fitting. This value is weighted by the sample weight when provided.

  • feature_log_prob_: ndarray of shape (n_classes, n_features) Empirical weights for class complements.

  • n_features_in_: int Number of features seen during fit.

    Added in 0.24

  • feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen during fit. Defined only when X has feature names that are all strings.

    Added in 1.0

See Also

  • BernoulliNB: Naive Bayes classifier for multivariate Bernoulli models.
  • CategoricalNB: Naive Bayes classifier for categorical features.
  • GaussianNB: Gaussian Naive Bayes.
  • MultinomialNB: Naive Bayes classifier for multinomial models.

References

Rennie, J. D., Shih, L., Teevan, J., & Karger, D. R. (2003). Tackling the poor assumptions of naive bayes text classifiers. In ICML (Vol. 3, pp. 616-623). https://people.csail.mit.edu/jrennie/papers/icml03-nb.pdf

Examples

import numpy as np
rng = np.random.RandomState(1)
X = rng.randint(5, size=(6, 100))
y = np.array([1, 2, 3, 4, 5, 6])
from sklearn.naive_bayes import ComplementNB
clf = ComplementNB()
clf.fit(X, y)
ComplementNB()
print(clf.predict(X[2:3]))
[3]
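
As a sketch, the same classifier as a metamorph.ml model step (option keys from the list above, not evaluated here):

(ml/model {:model-type :sklearn.classification/complement-nb
           :alpha 1.0
           :norm false})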


24.2.7 /decision-tree-classifier

Clojure option keys: min-weight-fraction-leaf, max-leaf-nodes, min-impurity-decrease, min-samples-split, ccp-alpha, splitter, random-state, min-samples-leaf, max-features, monotonic-cst, max-depth, class-weight, criterion, predict-proba?

A decision tree classifier.

Read more in the User Guide: tree.

Parameters

  • criterion: {"gini", "entropy", "log_loss"}, default="gini" The function to measure the quality of a split. Supported criteria are "gini" for the Gini impurity and "log_loss" and "entropy" both for the Shannon information gain, see :ref:tree_mathematical_formulation.

  • splitter: {"best", "random"}, default="best" The strategy used to choose the split at each node. Supported strategies are "best" to choose the best split and "random" to choose the best random split.

  • max_depth: int, default=None The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.

  • min_samples_split: int or float, default=2 The minimum number of samples required to split an internal node:

    • If int, then consider min_samples_split as the minimum number.
    • If float, then min_samples_split is a fraction and ceil(min_samples_split * n_samples) are the minimum number of samples for each split.

    Changed in 0.18 Added float values for fractions.

  • min_samples_leaf: int or float, default=1 The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at least min_samples_leaf training samples in each of the left and right branches. This may have the effect of smoothing the model, especially in regression.

    • If int, then consider min_samples_leaf as the minimum number.
    • If float, then min_samples_leaf is a fraction and ceil(min_samples_leaf * n_samples) are the minimum number of samples for each node.

    Changed in 0.18 Added float values for fractions.

  • min_weight_fraction_leaf: float, default=0.0 The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sample_weight is not provided.

  • max_features: int, float or {"sqrt", "log2"}, default=None The number of features to consider when looking for the best split:

    • If int, then consider max_features features at each split.
    • If float, then max_features is a fraction and max(1, int(max_features * n_features_in_)) features are considered at each split.
    • If "sqrt", then max_features=sqrt(n_features).
    • If "log2", then max_features=log2(n_features).
    • If None, then max_features=n_features.

🛈 Note

The search for a split does not stop until at least one valid partition of the node samples is found, even if it requires to effectively inspect more than max_features features.


  • random_state: int, RandomState instance or None, default=None Controls the randomness of the estimator. The features are always randomly permuted at each split, even if splitter is set to "best". When max_features < n_features, the algorithm will select max_features at random at each split before finding the best split among them. But the best found split may vary across different runs, even if max_features=n_features. That is the case, if the improvement of the criterion is identical for several splits and one split has to be selected at random. To obtain a deterministic behaviour during fitting, random_state has to be fixed to an integer. See Glossary for details.

  • max_leaf_nodes: int, default=None Grow a tree with max_leaf_nodes in best-first fashion. Best nodes are defined as relative reduction in impurity. If None then unlimited number of leaf nodes.

  • min_impurity_decrease: float, default=0.0 A node will be split if this split induces a decrease of the impurity greater than or equal to this value.

    The weighted impurity decrease equation is the following

N_t / N * (impurity - N_t_R / N_t * right_impurity
                    - N_t_L / N_t * left_impurity)

    where N is the total number of samples, N_t is the number of samples at the current node, N_t_L is the number of samples in the left child, and N_t_R is the number of samples in the right child.

    N, N_t, N_t_R and N_t_L all refer to the weighted sum, if sample_weight is passed.

    Added in 0.19

  • class_weight: dict, list of dict or "balanced", default=None Weights associated with classes in the form {class_label: weight}. If None, all classes are supposed to have weight one. For multi-output problems, a list of dicts can be provided in the same order as the columns of y.

    Note that for multioutput (including multilabel) weights should be defined for each class of every column in its own dict. For example, for four-class multilabel classification weights should be [{0: 1, 1: 1}, {0: 1, 1: 5}, {0: 1, 1: 1}, {0: 1, 1: 1}] instead of [{1:1}, {2:5}, {3:1}, {4:1}].

    The "balanced" mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y))

    For multi-output, the weights of each column of y will be multiplied.

    Note that these weights will be multiplied with sample_weight (passed through the fit method) if sample_weight is specified.

  • ccp_alpha: non-negative float, default=0.0 Complexity parameter used for Minimal Cost-Complexity Pruning. The subtree with the largest cost complexity that is smaller than ccp_alpha will be chosen. By default, no pruning is performed. See :ref:minimal_cost_complexity_pruning for details. See :ref:sphx_glr_auto_examples_tree_plot_cost_complexity_pruning.py for an example of such pruning.

    Added in 0.22

  • monotonic_cst: array-like of int of shape (n_features), default=None Indicates the monotonicity constraint to enforce on each feature: 1 = monotonic increase, 0 = no constraint, -1 = monotonic decrease.

    If monotonic_cst is None, no constraints are applied.

    Monotonicity constraints are not supported for multiclass classifications (i.e. when n_classes > 2), multioutput classifications (i.e. when n_outputs_ > 1), or classifications trained on data with missing values.

    The constraints hold over the probability of the positive class.

    Read more in the User Guide: monotonic_cst_gbdt.

    Added in 1.4

Attributes

  • classes_: ndarray of shape (n_classes,) or list of ndarray The classes labels (single output problem), or a list of arrays of class labels (multi-output problem).

  • feature_importances_: ndarray of shape (n_features,) The impurity-based feature importances. The higher, the more important the feature. The importance of a feature is computed as the (normalized) total reduction of the criterion brought by that feature. It is also known as the Gini importance [4].

    Warning: impurity-based feature importances can be misleading for high cardinality features (many unique values). See sklearn.inspection.permutation_importance as an alternative.

  • max_features_: int The inferred value of max_features.

  • n_classes_: int or list of int The number of classes (for single output problems), or a list containing the number of classes for each output (for multi-output problems).

  • n_features_in_: int Number of features seen during fit.

    Added in 0.24

  • feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen during fit. Defined only when X has feature names that are all strings.

    Added in 1.0

  • n_outputs_: int The number of outputs when fit is performed.

  • tree_: Tree instance The underlying Tree object. Please refer to help(sklearn.tree._tree.Tree) for attributes of Tree object and :ref:sphx_glr_auto_examples_tree_plot_unveil_tree_structure.py for basic usage of these attributes.

See Also

  • DecisionTreeRegressor: A decision tree regressor.

Notes

The default values for the parameters controlling the size of the trees (e.g. max_depth, min_samples_leaf, etc.) lead to fully grown and unpruned trees which can potentially be very large on some data sets. To reduce memory consumption, the complexity and size of the trees should be controlled by setting those parameter values.

The predict method operates using the numpy.argmax function on the outputs of predict_proba. This means that in case the highest predicted probabilities are tied, the classifier will predict the tied class with the lowest index in classes_.

References

Examples

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
clf = DecisionTreeClassifier(random_state=0)
iris = load_iris()
cross_val_score(clf, iris.data, iris.target, cv=10)
                            # doctest: +SKIP
array([ 1.     ,  0.93,  0.86,  0.93,  0.93,
        0.93,  0.93,  1.     ,  0.93,  1.      ])


24.2.8 /dummy-classifier

Clojure option keys: constant, random-state, strategy, predict-proba?

DummyClassifier makes predictions that ignore the input features.

This classifier serves as a simple baseline to compare against other more complex classifiers.

The specific behavior of the baseline is selected with the strategy parameter.

All strategies make predictions that ignore the input feature values passed as the X argument to fit and predict. The predictions, however, typically depend on values observed in the y parameter passed to fit.

Note that the "stratified" and "uniform" strategies lead to non-deterministic predictions that can be rendered deterministic by setting the random_state parameter if needed. The other strategies are naturally deterministic and, once fit, always return the same constant prediction for any value of X.

Read more in the User Guide: dummy_estimators.

Added in 0.13

Parameters

  • strategy: {"most_frequent", "prior", "stratified", "uniform", "constant"}, default="prior" Strategy to use to generate predictions.

    • "most_frequent": the predict method always returns the most frequent class label in the observed y argument passed to fit. The predict_proba method returns the matching one-hot encoded vector.

    • "prior": the predict method always returns the most frequent class label in the observed y argument passed to fit (like "most_frequent"). predict_proba always returns the empirical class distribution of y also known as the empirical class prior distribution.

    • "stratified": the predict_proba method randomly samples one-hot vectors from a multinomial distribution parametrized by the empirical class prior probabilities. The predict method returns the class label which got probability one in the one-hot vector of predict_proba. Each sampled row of both methods is therefore independent and identically distributed.

    • "uniform": generates predictions uniformly at random from the list of unique classes observed in y, i.e. each class has equal probability.

    • "constant": always predicts a constant label that is provided by the user. This is useful for metrics that evaluate a non-majority class.

      Changed in 0.24 The default value of strategy has changed to "prior" in version 0.24.

  • random_state: int, RandomState instance or None, default=None Controls the randomness to generate the predictions when strategy='stratified' or strategy='uniform'. Pass an int for reproducible output across multiple function calls. See the Glossary.

  • constant: int or str or array-like of shape (n_outputs,), default=None The explicit constant as predicted by the "constant" strategy. This parameter is useful only for the "constant" strategy.

Attributes

  • classes_: ndarray of shape (n_classes,) or list of such arrays Unique class labels observed in y. For multi-output classification problems, this attribute is a list of arrays as each output has an independent set of possible classes.

  • n_classes_: int or list of int Number of labels for each output.

  • class_prior_: ndarray of shape (n_classes,) or list of such arrays Frequency of each class observed in y. For multioutput classification problems, this is computed independently for each output.

  • n_features_in_: int Number of features seen during fit.

  • feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen during fit. Defined only when X has feature names that are all strings.

  • n_outputs_: int Number of outputs.

  • sparse_output_: bool True if the array returned from predict is to be in sparse CSC format. Is automatically set to True if the input y is passed in sparse format.

See Also

  • DummyRegressor: Regressor that makes predictions using simple rules.

Examples

import numpy as np
from sklearn.dummy import DummyClassifier
X = np.array([-1, 1, 1, 1])
y = np.array([0, 1, 1, 1])
dummy_clf = DummyClassifier(strategy="most_frequent")
dummy_clf.fit(X, y)
DummyClassifier(strategy='most_frequent')
dummy_clf.predict(X)
array([1, 1, 1, 1])
dummy_clf.score(X, y)
0.75
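
As a baseline step inside a metamorph.ml pipeline, the dummy classifier could be sketched as (the strategy value follows the Python spelling documented above):

(ml/model {:model-type :sklearn.classification/dummy-classifier
           :strategy "most_frequent"})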


24.2.9 /extra-tree-classifier

Clojure option keys: min-weight-fraction-leaf, max-leaf-nodes, min-impurity-decrease, min-samples-split, ccp-alpha, splitter, random-state, min-samples-leaf, max-features, monotonic-cst, max-depth, class-weight, criterion, predict-proba?

An extremely randomized tree classifier.

Extra-trees differ from classic decision trees in the way they are built. When looking for the best split to separate the samples of a node into two groups, random splits are drawn for each of the max_features randomly selected features and the best split among those is chosen. When max_features is set to 1, this amounts to building a totally random decision tree.

Warning: Extra-trees should only be used within ensemble methods.

Read more in the User Guide: tree.

Parameters

  • criterion: {"gini", "entropy", "log_loss"}, default="gini" The function to measure the quality of a split. Supported criteria are "gini" for the Gini impurity and "log_loss" and "entropy" both for the Shannon information gain, see :ref:tree_mathematical_formulation.

  • splitter: {"random", "best"}, default="random" The strategy used to choose the split at each node. Supported strategies are "best" to choose the best split and "random" to choose the best random split.

  • max_depth: int, default=None The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.

  • min_samples_split: int or float, default=2 The minimum number of samples required to split an internal node:

    • If int, then consider min_samples_split as the minimum number.
    • If float, then min_samples_split is a fraction and ceil(min_samples_split * n_samples) are the minimum number of samples for each split.

    Changed in 0.18 Added float values for fractions.

  • min_samples_leaf: int or float, default=1 The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at least min_samples_leaf training samples in each of the left and right branches. This may have the effect of smoothing the model, especially in regression.

    • If int, then consider min_samples_leaf as the minimum number.
    • If float, then min_samples_leaf is a fraction and ceil(min_samples_leaf * n_samples) are the minimum number of samples for each node.

    Changed in 0.18 Added float values for fractions.

  • min_weight_fraction_leaf: float, default=0.0 The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sample_weight is not provided.

  • max_features: int, float, {"sqrt", "log2"} or None, default="sqrt" The number of features to consider when looking for the best split:

    • If int, then consider max_features features at each split.
    • If float, then max_features is a fraction and max(1, int(max_features * n_features_in_)) features are considered at each split.
    • If "sqrt", then max_features=sqrt(n_features).
    • If "log2", then max_features=log2(n_features).
    • If None, then max_features=n_features.

    Changed in 1.1 The default of max_features changed from "auto" to "sqrt".

    Note: the search for a split does not stop until at least one valid partition of the node samples is found, even if it requires to effectively inspect more than max_features features.

  • random_state: int, RandomState instance or None, default=None Used to pick randomly the max_features used at each split. See Glossary for details.

  • max_leaf_nodes: int, default=None Grow a tree with max_leaf_nodes in best-first fashion. Best nodes are defined as relative reduction in impurity. If None then unlimited number of leaf nodes.

  • min_impurity_decrease: float, default=0.0 A node will be split if this split induces a decrease of the impurity greater than or equal to this value.

    The weighted impurity decrease equation is the following

N_t / N * (impurity - N_t_R / N_t * right_impurity
                    - N_t_L / N_t * left_impurity)

    where N is the total number of samples, N_t is the number of samples at the current node, N_t_L is the number of samples in the left child, and N_t_R is the number of samples in the right child.

    N, N_t, N_t_R and N_t_L all refer to the weighted sum, if sample_weight is passed.

    Added in 0.19

  • class_weight: dict, list of dict or "balanced", default=None Weights associated with classes in the form {class_label: weight}. If None, all classes are supposed to have weight one. For multi-output problems, a list of dicts can be provided in the same order as the columns of y.

    Note that for multioutput (including multilabel) weights should be defined for each class of every column in its own dict. For example, for four-class multilabel classification weights should be [{0: 1, 1: 1}, {0: 1, 1: 5}, {0: 1, 1: 1}, {0: 1, 1: 1}] instead of [{1:1}, {2:5}, {3:1}, {4:1}].

    The "balanced" mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y))

    For multi-output, the weights of each column of y will be multiplied.

    Note that these weights will be multiplied with sample_weight (passed through the fit method) if sample_weight is specified.

  • ccp_alpha: non-negative float, default=0.0 Complexity parameter used for Minimal Cost-Complexity Pruning. The subtree with the largest cost complexity that is smaller than ccp_alpha will be chosen. By default, no pruning is performed. See :ref:minimal_cost_complexity_pruning for details. See :ref:sphx_glr_auto_examples_tree_plot_cost_complexity_pruning.py for an example of such pruning.

    Added in 0.22

  • monotonic_cst: array-like of int of shape (n_features), default=None Indicates the monotonicity constraint to enforce on each feature: 1 = monotonic increase, 0 = no constraint, -1 = monotonic decrease.

    If monotonic_cst is None, no constraints are applied.

    Monotonicity constraints are not supported for multiclass classifications (i.e. when n_classes > 2), multioutput classifications (i.e. when n_outputs_ > 1), or classifications trained on data with missing values.

    The constraints hold over the probability of the positive class.

    Read more in the User Guide: monotonic_cst_gbdt.

    Added in 1.4

Attributes

  • classes_: ndarray of shape (n_classes,) or list of ndarray The classes labels (single output problem), or a list of arrays of class labels (multi-output problem).

  • max_features_: int The inferred value of max_features.

  • n_classes_: int or list of int The number of classes (for single output problems), or a list containing the number of classes for each output (for multi-output problems).

  • feature_importances_: ndarray of shape (n_features,) The impurity-based feature importances. The higher, the more important the feature. The importance of a feature is computed as the (normalized) total reduction of the criterion brought by that feature. It is also known as the Gini importance.

    Warning: impurity-based feature importances can be misleading for high cardinality features (many unique values). See sklearn.inspection.permutation_importance as an alternative.

  • n_features_in_: int Number of features seen during fit.

    Added in 0.24

  • feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen during fit. Defined only when X has feature names that are all strings.

    Added in 1.0

  • n_outputs_: int The number of outputs when fit is performed.

  • tree_: Tree instance The underlying Tree object. Please refer to help(sklearn.tree._tree.Tree) for attributes of Tree object and :ref:sphx_glr_auto_examples_tree_plot_unveil_tree_structure.py for basic usage of these attributes.

See Also

  • ExtraTreeRegressor: An extremely randomized tree regressor.
  • sklearn.ensemble.ExtraTreesClassifier: An extra-trees classifier.
  • sklearn.ensemble.ExtraTreesRegressor: An extra-trees regressor.
  • sklearn.ensemble.RandomForestClassifier: A random forest classifier.
  • sklearn.ensemble.RandomForestRegressor: A random forest regressor.
  • sklearn.ensemble.RandomTreesEmbedding: An ensemble of totally random trees.

Notes

The default values for the parameters controlling the size of the trees (e.g. max_depth, min_samples_leaf, etc.) lead to fully grown and unpruned trees which can potentially be very large on some data sets. To reduce memory consumption, the complexity and size of the trees should be controlled by setting those parameter values.

References

  • [1] P. Geurts, D. Ernst., and L. Wehenkel, "Extremely randomized trees", Machine Learning, 63(1), 3-42, 2006.

Examples

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import ExtraTreeClassifier
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
   X, y, random_state=0)
extra_tree = ExtraTreeClassifier(random_state=0)
cls = BaggingClassifier(extra_tree, random_state=0).fit(
   X_train, y_train)
cls.score(X_test, y_test)
0.8947
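
A metamorph.ml model step for a single extremely randomized tree might look like this (sketch only; option keys as listed above, string values follow the Python spelling):

(ml/model {:model-type :sklearn.classification/extra-tree-classifier
           :max-features "sqrt"
           :random-state 0})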


24.2.10 /extra-trees-classifier

Clojure option keys: min-weight-fraction-leaf, max-leaf-nodes, min-impurity-decrease, min-samples-split, bootstrap, ccp-alpha, n-jobs, random-state, oob-score, min-samples-leaf, max-features, monotonic-cst, warm-start, max-depth, class-weight, n-estimators, max-samples, criterion, verbose, predict-proba?

An extra-trees classifier.

This class implements a meta estimator that fits a number of randomized decision trees (a.k.a. extra-trees) on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting.

This estimator has native support for missing values (NaNs) for random splits. During training, a random threshold will be chosen to split the non-missing values on. Then the non-missing values will be sent to the left and right child based on the randomly selected threshold, while the missing values will also be randomly sent to the left or right child. This is repeated for every feature considered at each split. The best split among these is chosen.

Read more in the User Guide: forest.
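
Before going into the parameters, here is a hedged sketch of the corresponding metamorph.ml model step (option keys from the list above, not evaluated here):

(ml/model {:model-type :sklearn.classification/extra-trees-classifier
           :n-estimators 100
           :random-state 0})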

Parameters

  • n_estimators: int, default=100 The number of trees in the forest.

    Changed in 0.22 The default value of n_estimators changed from 10 to 100 in 0.22.

  • criterion: {"gini", "entropy", "log_loss"}, default="gini" The function to measure the quality of a split. Supported criteria are "gini" for the Gini impurity and "log_loss" and "entropy" both for the Shannon information gain, see :ref:tree_mathematical_formulation. Note: This parameter is tree-specific.

  • max_depth: int, default=None The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.

  • min_samples_split: int or float, default=2 The minimum number of samples required to split an internal node:

    • If int, then consider min_samples_split as the minimum number.
    • If float, then min_samples_split is a fraction and ceil(min_samples_split * n_samples) are the minimum number of samples for each split.

    Changed in 0.18 Added float values for fractions.

  • min_samples_leaf: int or float, default=1 The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at least min_samples_leaf training samples in each of the left and right branches. This may have the effect of smoothing the model, especially in regression.

    • If int, then consider min_samples_leaf as the minimum number.
    • If float, then min_samples_leaf is a fraction and ceil(min_samples_leaf * n_samples) are the minimum number of samples for each node.

    Changed in 0.18 Added float values for fractions.

  • min_weight_fraction_leaf: float, default=0.0 The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sample_weight is not provided.

  • max_features: {"sqrt", "log2", None}, int or float, default="sqrt" The number of features to consider when looking for the best split:

    • If int, then consider max_features features at each split.
    • If float, then max_features is a fraction and max(1, int(max_features * n_features_in_)) features are considered at each split.
    • If "sqrt", then max_features=sqrt(n_features).
    • If "log2", then max_features=log2(n_features).
    • If None, then max_features=n_features.

    Changed in 1.1 The default of max_features changed from "auto" to "sqrt".

    Note: the search for a split does not stop until at least one valid partition of the node samples is found, even if it requires to effectively inspect more than max_features features.

  • max_leaf_nodes: int, default=None Grow trees with max_leaf_nodes in best-first fashion. Best nodes are defined as relative reduction in impurity. If None then unlimited number of leaf nodes.

  • min_impurity_decrease: float, default=0.0 A node will be split if this split induces a decrease of the impurity greater than or equal to this value.

    The weighted impurity decrease equation is the following

N_t / N * (impurity - N_t_R / N_t * right_impurity
                    - N_t_L / N_t * left_impurity)

    where N is the total number of samples, N_t is the number of samples at the current node, N_t_L is the number of samples in the left child, and N_t_R is the number of samples in the right child.

    N, N_t, N_t_R and N_t_L all refer to the weighted sum, if sample_weight is passed.

    Added in 0.19

  • bootstrap: bool, default=False Whether bootstrap samples are used when building trees. If False, the whole dataset is used to build each tree.

  • oob_score: bool or callable, default=False Whether to use out-of-bag samples to estimate the generalization score. By default, ~sklearn.metrics.accuracy_score is used. Provide a callable with signature metric(y_true, y_pred) to use a custom metric. Only available if bootstrap=True.

    For an illustration of out-of-bag (OOB) error estimation, see the example :ref:sphx_glr_auto_examples_ensemble_plot_ensemble_oob.py.

  • n_jobs: int, default=None The number of jobs to run in parallel. fit, predict, decision_path and apply are all parallelized over the trees. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.

  • random_state: int, RandomState instance or None, default=None Controls 3 sources of randomness:

    • the bootstrapping of the samples used when building trees (if bootstrap=True)
    • the sampling of the features to consider when looking for the best split at each node (if max_features < n_features)
    • the draw of the splits for each of the max_features

    See Glossary for details.

  • verbose: int, default=0 Controls the verbosity when fitting and predicting.

  • warm_start: bool, default=False When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just fit a whole new forest. See Glossary and :ref:tree_ensemble_warm_start for details.

  • class_weight: {"balanced", "balanced_subsample"}, dict or list of dicts, default=None Weights associated with classes in the form {class_label: weight}. If not given, all classes are supposed to have weight one. For multi-output problems, a list of dicts can be provided in the same order as the columns of y.

    Note that for multioutput (including multilabel) weights should be defined for each class of every column in its own dict. For example, for four-class multilabel classification weights should be [{0: 1, 1: 1}, {0: 1, 1: 5}, {0: 1, 1: 1}, {0: 1, 1: 1}] instead of [{1:1}, {2:5}, {3:1}, {4:1}].

    The "balanced" mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y))

    The "balanced_subsample" mode is the same as "balanced" except that weights are computed based on the bootstrap sample for every tree grown.

    For multi-output, the weights of each column of y will be multiplied.

    Note that these weights will be multiplied with sample_weight (passed through the fit method) if sample_weight is specified.

  • ccp_alpha: non-negative float, default=0.0 Complexity parameter used for Minimal Cost-Complexity Pruning. The subtree with the largest cost complexity that is smaller than ccp_alpha will be chosen. By default, no pruning is performed. See :ref:minimal_cost_complexity_pruning for details. See :ref:sphx_glr_auto_examples_tree_plot_cost_complexity_pruning.py for an example of such pruning.

    Added in 0.22

  • max_samples: int or float, default=None If bootstrap is True, the number of samples to draw from X to train each base estimator.

    • If None (default), then draw X.shape[0] samples.
    • If int, then draw max_samples samples.
    • If float, then draw max_samples * X.shape[0] samples. Thus, max_samples should be in the interval (0.0, 1.0].

    Added in 0.22

  • monotonic_cst: array-like of int of shape (n_features), default=None Indicates the monotonicity constraint to enforce on each feature.

    • 1: monotonically increasing
    • 0: no constraint
    • -1: monotonically decreasing

    If monotonic_cst is None, no constraints are applied.

    Monotonicity constraints are not supported for:

    • multiclass classifications (i.e. when n_classes > 2),
    • multioutput classifications (i.e. when n_outputs_ > 1),
    • classifications trained on data with missing values.

    The constraints hold over the probability of the positive class.

    Read more in the User Guide: monotonic_cst_gbdt.

    Added in 1.4

Attributes

  • estimator_: ~sklearn.tree.ExtraTreeClassifier The child estimator template used to create the collection of fitted sub-estimators.

    Added in 1.2 base_estimator_ was renamed to estimator_.

  • estimators_: list of DecisionTreeClassifier The collection of fitted sub-estimators.

  • classes_: ndarray of shape (n_classes,) or a list of such arrays The classes labels (single output problem), or a list of arrays of class labels (multi-output problem).

  • n_classes_: int or list The number of classes (single output problem), or a list containing the number of classes for each output (multi-output problem).

  • feature_importances_: ndarray of shape (n_features,) The impurity-based feature importances. The higher, the more important the feature. The importance of a feature is computed as the (normalized) total reduction of the criterion brought by that feature. It is also known as the Gini importance.

    Warning: impurity-based feature importances can be misleading for high cardinality features (many unique values). See sklearn.inspection.permutation_importance as an alternative.

  • n_features_in_: int Number of features seen during fit.

    Added in 0.24

  • feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen during fit. Defined only when X has feature names that are all strings.

    Added in 1.0

  • n_outputs_: int The number of outputs when fit is performed.

  • oob_score_: float Score of the training dataset obtained using an out-of-bag estimate. This attribute exists only when oob_score is True.

  • oob_decision_function_: ndarray of shape (n_samples, n_classes) or (n_samples, n_classes, n_outputs) Decision function computed with out-of-bag estimate on the training set. If n_estimators is small it might be possible that a data point was never left out during the bootstrap. In this case, oob_decision_function_ might contain NaN. This attribute exists only when oob_score is True.

  • estimators_samples_: list of arrays The subset of drawn samples (i.e., the in-bag samples) for each base estimator. Each subset is defined by an array of the indices selected.

    Added in 1.4

See Also

  • ExtraTreesRegressor: An extra-trees regressor with random splits.
  • RandomForestClassifier: A random forest classifier with optimal splits.
  • RandomForestRegressor: Ensemble regressor using trees with optimal splits.

Notes

The default values for the parameters controlling the size of the trees (e.g. max_depth, min_samples_leaf, etc.) lead to fully grown and unpruned trees which can potentially be very large on some data sets. To reduce memory consumption, the complexity and size of the trees should be controlled by setting those parameter values.

References

  • [1] P. Geurts, D. Ernst., and L. Wehenkel, "Extremely randomized trees", Machine Learning, 63(1), 3-42, 2006.

Examples

from sklearn.ensemble import ExtraTreesClassifier
from sklearn.datasets import make_classification
X, y = make_classification(n_features=4, random_state=0)
clf = ExtraTreesClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)
ExtraTreesClassifier(random_state=0)
clf.predict([[0, 0, 0, 0]])
array([1])
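
To use this model from Clojure, it can be plugged into a metamorph pipeline like any other model in this chapter, with the parameters above written as kebab-case keys. A minimal sketch, assuming the namespace requires from the start of this chapter and that the model is registered as :sklearn.classification/extra-trees-classifier, following the naming of the section headings (the toy data and parameter values are purely illustrative):

;; toy dataset: columns 0 and 1 are features, column 2 is the target
(def extra-trees-ds
  (dst/tensor->dataset [[0 0 0] [1 1 1] [2 2 2]]))

;; pipeline with the extra-trees model step; the kebab-case keys map to the
;; Python parameters n_estimators and random_state
(def extra-trees-pipe
  (mm/pipeline
   (ds-mm/set-inference-target 2)
   {:metamorph/id :model}
   (ml/model {:model-type :sklearn.classification/extra-trees-classifier
              :n-estimators 100
              :random-state 0})))

;; train the model
(def extra-trees-fitted
  (extra-trees-pipe {:metamorph/data extra-trees-ds
                     :metamorph/mode :fit}))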


24.2.11 /gaussian-nb

name type default description
priors
var-smoothing
predict-proba?

Gaussian Naive Bayes (GaussianNB).

Can perform online updates to model parameters via partial_fit. For details on the algorithm used to update feature means and variance online, see the Stanford CS tech report STAN-CS-79-773 by Chan, Golub, and LeVeque.

Read more in the User Guide: gaussian_naive_bayes.

Parameters

  • priors: array-like of shape (n_classes,), default=None Prior probabilities of the classes. If specified, the priors are not adjusted according to the data.

  • var_smoothing: float, default=1e-9 Portion of the largest variance of all features that is added to variances for calculation stability.

    Added in 0.20

Attributes

  • class_count_: ndarray of shape (n_classes,) Number of training samples observed in each class.

  • class_prior_: ndarray of shape (n_classes,) Probability of each class.

  • classes_: ndarray of shape (n_classes,) Class labels known to the classifier.

  • epsilon_: float Absolute additive value to variances.

  • n_features_in_: int Number of features seen during fit.

    Added in 0.24

  • feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen during fit. Defined only when X has feature names that are all strings.

    Added in 1.0

  • var_: ndarray of shape (n_classes, n_features) Variance of each feature per class.

    Added in 1.0

  • theta_: ndarray of shape (n_classes, n_features) Mean of each feature per class.

See Also

  • BernoulliNB: Naive Bayes classifier for multivariate Bernoulli models.
  • CategoricalNB: Naive Bayes classifier for categorical features.
  • ComplementNB: Complement Naive Bayes classifier.
  • MultinomialNB: Naive Bayes classifier for multinomial models.

Examples

import numpy as np
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
Y = np.array([1, 1, 1, 2, 2, 2])
from sklearn.naive_bayes import GaussianNB
clf = GaussianNB()
clf.fit(X, Y)
GaussianNB()
print(clf.predict([[-0.8, -1]]))
[1]
clf_pf = GaussianNB()
clf_pf.partial_fit(X, Y, np.unique(Y))
GaussianNB()
print(clf_pf.predict([[-0.8, -1]]))
[1]
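
The same fit can be expressed from Clojure with a metamorph pipeline. A minimal sketch, mirroring the toy data of the Python example above and assuming the chapter's namespace requires (column 2 holds the class label; the placeholder target value in the prediction row is overwritten by the prediction):

(def nb-ds
  (dst/tensor->dataset [[-1 -1 1] [-2 -1 1] [-3 -2 1]
                        [1 1 2] [2 1 2] [3 2 2]]))

(def nb-pipe
  (mm/pipeline
   (ds-mm/set-inference-target 2)
   {:metamorph/id :model}
   (ml/model {:model-type :sklearn.classification/gaussian-nb
              :var-smoothing 1e-9})))

;; fit, then predict on a new observation
(def nb-fitted
  (nb-pipe {:metamorph/data nb-ds
            :metamorph/mode :fit}))

(-> (mm/transform-pipe
     (dst/tensor->dataset [[-0.8 -1 0]])
     nb-pipe
     nb-fitted)
    :metamorph/data)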


24.2.12 /gaussian-process-classifier

name type default description
kernel
optimizer
multi-class
n-jobs
random-state
max-iter-predict
copy-x-train
n-restarts-optimizer
warm-start
predict-proba?

Gaussian process classification (GPC) based on Laplace approximation.

The implementation is based on Algorithm 3.1, 3.2, and 5.1 from [RW2006].

Internally, the Laplace approximation is used for approximating the non-Gaussian posterior by a Gaussian.

Currently, the implementation is restricted to using the logistic link function. For multi-class classification, several binary one-versus rest classifiers are fitted. Note that this class thus does not implement a true multi-class Laplace approximation.

Read more in the User Guide: gaussian_process.

Added in 0.18

Parameters

  • kernel: kernel instance, default=None The kernel specifying the covariance function of the GP. If None is passed, the kernel "1.0 * RBF(1.0)" is used as default. Note that the kernel's hyperparameters are optimized during fitting. Also kernel cannot be a CompoundKernel.

  • optimizer: 'fmin_l_bfgs_b', callable or None, default='fmin_l_bfgs_b' Can either be one of the internally supported optimizers for optimizing the kernel's parameters, specified by a string, or an externally defined optimizer passed as a callable. If a callable is passed, it must have the signature

def optimizer(obj_func, initial_theta, bounds):
    # * 'obj_func' is the objective function to be maximized, which
    #   takes the hyperparameters theta as parameter and an
    #   optional flag eval_gradient, which determines if the
    #   gradient is returned additionally to the function value
    # * 'initial_theta': the initial value for theta, which can be
    #   used by local optimizers
    # * 'bounds': the bounds on the values of theta
    ....
    # Returned are the best found hyperparameters theta and
    # the corresponding value of the target function.
    return theta_opt, func_min

    Per default, the 'L-BFGS-B' algorithm from scipy.optimize.minimize is used. If None is passed, the kernel's parameters are kept fixed. Available internal optimizers are: 'fmin_l_bfgs_b'.

  • n_restarts_optimizer: int, default=0 The number of restarts of the optimizer for finding the kernel's parameters which maximize the log-marginal likelihood. The first run of the optimizer is performed from the kernel's initial parameters, the remaining ones (if any) from thetas sampled log-uniform randomly from the space of allowed theta-values. If greater than 0, all bounds must be finite. Note that n_restarts_optimizer=0 implies that one run is performed.

  • max_iter_predict: int, default=100 The maximum number of iterations in Newton's method for approximating the posterior during predict. Smaller values will reduce computation time at the cost of worse results.

  • warm_start: bool, default=False If warm-starts are enabled, the solution of the last Newton iteration on the Laplace approximation of the posterior mode is used as initialization for the next call of _posterior_mode(). This can speed up convergence when _posterior_mode is called several times on similar problems as in hyperparameter optimization. See the Glossary.

  • copy_X_train: bool, default=True If True, a persistent copy of the training data is stored in the object. Otherwise, just a reference to the training data is stored, which might cause predictions to change if the data is modified externally.

  • random_state: int, RandomState instance or None, default=None Determines random number generation used to initialize the centers. Pass an int for reproducible results across multiple function calls. See Glossary.

  • multi_class: {'one_vs_rest', 'one_vs_one'}, default='one_vs_rest' Specifies how multi-class classification problems are handled. Supported are 'one_vs_rest' and 'one_vs_one'. In 'one_vs_rest', one binary Gaussian process classifier is fitted for each class, which is trained to separate this class from the rest. In 'one_vs_one', one binary Gaussian process classifier is fitted for each pair of classes, which is trained to separate these two classes. The predictions of these binary predictors are combined into multi-class predictions. Note that 'one_vs_one' does not support predicting probability estimates.

  • n_jobs: int, default=None The number of jobs to use for the computation: the specified multiclass problems are computed in parallel. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.

Attributes

  • base_estimator_: Estimator instance The estimator instance that defines the likelihood function using the observed data.

  • kernel_: kernel instance The kernel used for prediction. In case of binary classification, the structure of the kernel is the same as the one passed as parameter but with optimized hyperparameters. In case of multi-class classification, a CompoundKernel is returned which consists of the different kernels used in the one-versus-rest classifiers.

  • log_marginal_likelihood_value_: float The log-marginal-likelihood of self.kernel_.theta.

  • classes_: array-like of shape (n_classes,) Unique class labels.

  • n_classes_: int The number of classes in the training data.

  • n_features_in_: int Number of features seen during fit.

    Added in 0.24

  • feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen during fit. Defined only when X has feature names that are all strings.

    Added in 1.0

See Also

  • GaussianProcessRegressor: Gaussian process regression (GPR).

References

  • [RW2006] Carl E. Rasmussen and Christopher K.I. Williams, "Gaussian Processes for Machine Learning", MIT Press 2006

Examples

from sklearn.datasets import load_iris
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF
X, y = load_iris(return_X_y=True)
kernel = 1.0 * RBF(1.0)
gpc = GaussianProcessClassifier(kernel=kernel,
        random_state=0).fit(X, y)
gpc.score(X, y)
0.9866...
gpc.predict_proba(X[:2,:])
array([[0.83548752, 0.03228706, 0.13222543],
       [0.79064206, 0.06525643, 0.14410151]])

For a comparison of the GaussianProcessClassifier with other classifiers see: :ref:sphx_glr_auto_examples_classification_plot_classification_probability.py.
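
From Clojure, the corresponding model step in a metamorph pipeline would look roughly as follows. A minimal sketch, assuming the chapter's namespace requires; the kebab-case keys correspond to the max_iter_predict and random_state parameters above, and the values are illustrative only:

(def gpc-pipe
  (mm/pipeline
   (ds-mm/set-inference-target 2)
   {:metamorph/id :model}
   (ml/model {:model-type :sklearn.classification/gaussian-process-classifier
              :max-iter-predict 100
              :random-state 0})))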



24.2.13 /gradient-boosting-classifier

name type default description
n-iter-no-change
learning-rate
min-weight-fraction-leaf
max-leaf-nodes
min-impurity-decrease
min-samples-split
tol
subsample
ccp-alpha
random-state
min-samples-leaf
max-features
init
warm-start
max-depth
validation-fraction
n-estimators
criterion
loss
verbose
predict-proba?

Gradient Boosting for classification.

This algorithm builds an additive model in a forward stage-wise fashion; it allows for the optimization of arbitrary differentiable loss functions. In each stage n_classes_ regression trees are fit on the negative gradient of the loss function, e.g. binary or multiclass log loss. Binary classification is a special case where only a single regression tree is induced.

~sklearn.ensemble.HistGradientBoostingClassifier is a much faster variant of this algorithm for intermediate and large datasets (n_samples >= 10_000) and supports monotonic constraints.

Read more in the User Guide: gradient_boosting.

Parameters

  • loss: {'log_loss', 'exponential'}, default='log_loss' The loss function to be optimized. 'log_loss' refers to binomial and multinomial deviance, the same as used in logistic regression. It is a good choice for classification with probabilistic outputs. For loss 'exponential', gradient boosting recovers the AdaBoost algorithm.

  • learning_rate: float, default=0.1 Learning rate shrinks the contribution of each tree by learning_rate. There is a trade-off between learning_rate and n_estimators. Values must be in the range [0.0, inf).

    For an example of the effects of this parameter and its interaction with subsample, see :ref:sphx_glr_auto_examples_ensemble_plot_gradient_boosting_regularization.py.

  • n_estimators: int, default=100 The number of boosting stages to perform. Gradient boosting is fairly robust to over-fitting so a large number usually results in better performance. Values must be in the range [1, inf).

  • subsample: float, default=1.0 The fraction of samples to be used for fitting the individual base learners. If smaller than 1.0 this results in Stochastic Gradient Boosting. subsample interacts with the parameter n_estimators. Choosing subsample < 1.0 leads to a reduction of variance and an increase in bias. Values must be in the range (0.0, 1.0].

  • criterion: {'friedman_mse', 'squared_error'}, default='friedman_mse' The function to measure the quality of a split. Supported criteria are 'friedman_mse' for the mean squared error with improvement score by Friedman, 'squared_error' for mean squared error. The default value of 'friedman_mse' is generally the best as it can provide a better approximation in some cases.

    Added in 0.18

  • min_samples_split: int or float, default=2 The minimum number of samples required to split an internal node:

    • If int, values must be in the range [2, inf).
    • If float, values must be in the range (0.0, 1.0] and min_samples_split will be ceil(min_samples_split * n_samples).

    Changed in 0.18 Added float values for fractions.

  • min_samples_leaf: int or float, default=1 The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at least min_samples_leaf training samples in each of the left and right branches. This may have the effect of smoothing the model, especially in regression.

    • If int, values must be in the range [1, inf).
    • If float, values must be in the range (0.0, 1.0) and min_samples_leaf will be ceil(min_samples_leaf * n_samples).

    Changed in 0.18 Added float values for fractions.

  • min_weight_fraction_leaf: float, default=0.0 The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sample_weight is not provided. Values must be in the range [0.0, 0.5].

  • max_depth: int or None, default=3 Maximum depth of the individual regression estimators. The maximum depth limits the number of nodes in the tree. Tune this parameter for best performance; the best value depends on the interaction of the input variables. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples. If int, values must be in the range [1, inf).

  • min_impurity_decrease: float, default=0.0 A node will be split if this split induces a decrease of the impurity greater than or equal to this value. Values must be in the range [0.0, inf).

    The weighted impurity decrease equation is the following

N_t / N * (impurity - N_t_R / N_t * right_impurity
                    - N_t_L / N_t * left_impurity)

    where N is the total number of samples, N_t is the number of samples at the current node, N_t_L is the number of samples in the left child, and N_t_R is the number of samples in the right child.

    N, N_t, N_t_R and N_t_L all refer to the weighted sum, if sample_weight is passed.

    Added in 0.19

  • init: estimator or 'zero', default=None An estimator object that is used to compute the initial predictions. init has to provide fit and predict_proba. If 'zero', the initial raw predictions are set to zero. By default, a DummyEstimator predicting the classes priors is used.

  • random_state: int, RandomState instance or None, default=None Controls the random seed given to each Tree estimator at each boosting iteration. In addition, it controls the random permutation of the features at each split (see Notes for more details). It also controls the random splitting of the training data to obtain a validation set if n_iter_no_change is not None. Pass an int for reproducible output across multiple function calls. See Glossary.

  • max_features: {'sqrt', 'log2'}, int or float, default=None The number of features to consider when looking for the best split:

    • If int, values must be in the range [1, inf).
    • If float, values must be in the range (0.0, 1.0] and the features considered at each split will be max(1, int(max_features * n_features_in_)).
    • If 'sqrt', then max_features=sqrt(n_features).
    • If 'log2', then max_features=log2(n_features).
    • If None, then max_features=n_features.

    Choosing max_features < n_features leads to a reduction of variance and an increase in bias.

    Note: the search for a split does not stop until at least one valid partition of the node samples is found, even if it requires to effectively inspect more than max_features features.

  • verbose: int, default=0 Enable verbose output. If 1 then it prints progress and performance once in a while (the more trees the lower the frequency). If greater than 1 then it prints progress and performance for every tree. Values must be in the range [0, inf).

  • max_leaf_nodes: int, default=None Grow trees with max_leaf_nodes in best-first fashion. Best nodes are defined as relative reduction in impurity. Values must be in the range [2, inf). If None, then unlimited number of leaf nodes.

  • warm_start: bool, default=False When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just erase the previous solution. See the Glossary.

  • validation_fraction: float, default=0.1 The proportion of training data to set aside as validation set for early stopping. Values must be in the range (0.0, 1.0). Only used if n_iter_no_change is set to an integer.

    Added in 0.20

  • n_iter_no_change: int, default=None n_iter_no_change is used to decide if early stopping will be used to terminate training when validation score is not improving. By default it is set to None to disable early stopping. If set to a number, it will set aside validation_fraction size of the training data as validation and terminate training when validation score is not improving in all of the previous n_iter_no_change numbers of iterations. The split is stratified. Values must be in the range [1, inf). See :ref:sphx_glr_auto_examples_ensemble_plot_gradient_boosting_early_stopping.py.

    Added in 0.20

  • tol: float, default=1e-4 Tolerance for the early stopping. When the loss is not improving by at least tol for n_iter_no_change iterations (if set to a number), the training stops. Values must be in the range [0.0, inf).

    Added in 0.20

  • ccp_alpha: non-negative float, default=0.0 Complexity parameter used for Minimal Cost-Complexity Pruning. The subtree with the largest cost complexity that is smaller than ccp_alpha will be chosen. By default, no pruning is performed. Values must be in the range [0.0, inf). See :ref:minimal_cost_complexity_pruning for details. See :ref:sphx_glr_auto_examples_tree_plot_cost_complexity_pruning.py for an example of such pruning.

    Added in 0.22

Attributes

  • n_estimators_: int The number of estimators as selected by early stopping (if n_iter_no_change is specified). Otherwise it is set to n_estimators.

    Added in 0.20

  • n_trees_per_iteration_: int The number of trees that are built at each iteration. For binary classifiers, this is always 1.

    Added in 1.4.0

  • feature_importances_: ndarray of shape (n_features,) The impurity-based feature importances. The higher, the more important the feature. The importance of a feature is computed as the (normalized) total reduction of the criterion brought by that feature. It is also known as the Gini importance.

    Warning: impurity-based feature importances can be misleading for high cardinality features (many unique values). See sklearn.inspection.permutation_importance as an alternative.

  • oob_improvement_: ndarray of shape (n_estimators,) The improvement in loss on the out-of-bag samples relative to the previous iteration. oob_improvement_[0] is the improvement in loss of the first stage over the init estimator. Only available if subsample < 1.0.

  • oob_scores_: ndarray of shape (n_estimators,) The full history of the loss values on the out-of-bag samples. Only available if subsample < 1.0.

    Added in 1.3

  • oob_score_: float The last value of the loss on the out-of-bag samples. It is the same as oob_scores_[-1]. Only available if subsample < 1.0.

    Added in 1.3

  • train_score_: ndarray of shape (n_estimators,) The i-th score train_score_[i] is the loss of the model at iteration i on the in-bag sample. If subsample == 1 this is the loss on the training data.

  • init_: estimator The estimator that provides the initial predictions. Set via the init argument.

  • estimators_: ndarray of DecisionTreeRegressor of shape (n_estimators, n_trees_per_iteration_) The collection of fitted sub-estimators. n_trees_per_iteration_ is 1 for binary classification, otherwise n_classes.

  • classes_: ndarray of shape (n_classes,) The classes labels.

  • n_features_in_: int Number of features seen during fit.

    Added in 0.24

  • feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen during fit. Defined only when X has feature names that are all strings.

    Added in 1.0

  • n_classes_: int The number of classes.

  • max_features_: int The inferred value of max_features.

See Also

  • HistGradientBoostingClassifier: Histogram-based Gradient Boosting Classification Tree.
  • sklearn.tree.DecisionTreeClassifier: A decision tree classifier.
  • RandomForestClassifier: A meta-estimator that fits a number of decision tree classifiers on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting.
  • AdaBoostClassifier: A meta-estimator that begins by fitting a classifier on the original dataset and then fits additional copies of the classifier on the same dataset where the weights of incorrectly classified instances are adjusted such that subsequent classifiers focus more on difficult cases.

Notes

The features are always randomly permuted at each split. Therefore, the best found split may vary, even with the same training data and max_features=n_features, if the improvement of the criterion is identical for several splits enumerated during the search of the best split. To obtain a deterministic behaviour during fitting, random_state has to be fixed.

References

J. Friedman, Greedy Function Approximation: A Gradient Boosting Machine, The Annals of Statistics, Vol. 29, No. 5, 2001.

J. Friedman, Stochastic Gradient Boosting, 1999

T. Hastie, R. Tibshirani and J. Friedman. Elements of Statistical Learning Ed. 2, Springer, 2009.

Examples

The following example shows how to fit a gradient boosting classifier with 100 decision stumps as weak learners.

from sklearn.datasets import make_hastie_10_2
from sklearn.ensemble import GradientBoostingClassifier
X, y = make_hastie_10_2(random_state=0)
X_train, X_test = X[:2000], X[2000:]
y_train, y_test = y[:2000], y[2000:]
clf = GradientBoostingClassifier(n_estimators=100, learning_rate=1.0,
    max_depth=1, random_state=0).fit(X_train, y_train)
clf.score(X_test, y_test)
0.913
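
The same configuration of 100 decision stumps as weak learners can be sketched from Clojure; the kebab-case keys below correspond to the n_estimators, learning_rate, max_depth and random_state parameters used in the Python example (a hedged sketch, assuming the chapter's namespace requires):

(ml/model {:model-type :sklearn.classification/gradient-boosting-classifier
           :n-estimators 100
           :learning-rate 1.0
           :max-depth 1
           :random-state 0})

This step is then composed into a pipeline with mm/pipeline and ds-mm/set-inference-target, as in the earlier examples.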


24.2.14 /hist-gradient-boosting-classifier

name type default description
n-iter-no-change
learning-rate
max-leaf-nodes
scoring
tol
early-stopping
max-iter
random-state
max-bins
min-samples-leaf
max-features
monotonic-cst
warm-start
max-depth
validation-fraction
class-weight
loss
interaction-cst
verbose
categorical-features
l-2-regularization
predict-proba?

Histogram-based Gradient Boosting Classification Tree.

This estimator is much faster than GradientBoostingClassifier for big datasets (n_samples >= 10 000).

This estimator has native support for missing values (NaNs). During training, the tree grower learns at each split point whether samples with missing values should go to the left or right child, based on the potential gain. When predicting, samples with missing values are assigned to the left or right child consequently. If no missing values were encountered for a given feature during training, then samples with missing values are mapped to whichever child has the most samples.

This implementation is inspired by LightGBM .

Read more in the User Guide: histogram_based_gradient_boosting.

Added in 0.21

Parameters

  • loss: {'log_loss'}, default='log_loss' The loss function to use in the boosting process.

    For binary classification problems, 'log_loss' is also known as logistic loss, binomial deviance or binary crossentropy. Internally, the model fits one tree per boosting iteration and uses the logistic sigmoid function (expit) as inverse link function to compute the predicted positive class probability.

    For multiclass classification problems, 'log_loss' is also known as multinomial deviance or categorical crossentropy. Internally, the model fits one tree per boosting iteration and per class and uses the softmax function as inverse link function to compute the predicted probabilities of the classes.

  • learning_rate: float, default=0.1 The learning rate, also known as shrinkage. This is used as a multiplicative factor for the leaves values. Use 1 for no shrinkage.

  • max_iter: int, default=100 The maximum number of iterations of the boosting process, i.e. the maximum number of trees for binary classification. For multiclass classification, n_classes trees per iteration are built.

  • max_leaf_nodes: int or None, default=31 The maximum number of leaves for each tree. Must be strictly greater than 1. If None, there is no maximum limit.

  • max_depth: int or None, default=None The maximum depth of each tree. The depth of a tree is the number of edges to go from the root to the deepest leaf. Depth isn't constrained by default.

  • min_samples_leaf: int, default=20 The minimum number of samples per leaf. For small datasets with less than a few hundred samples, it is recommended to lower this value since only very shallow trees would be built.

  • l2_regularization: float, default=0 The L2 regularization parameter penalizing leaves with small hessians. Use 0 for no regularization (default).

  • max_features: float, default=1.0 Proportion of randomly chosen features in each and every node split. This is a form of regularization, smaller values make the trees weaker learners and might prevent overfitting. If interaction constraints from interaction_cst are present, only allowed features are taken into account for the subsampling.

    Added in 1.4

  • max_bins: int, default=255 The maximum number of bins to use for non-missing values. Before training, each feature of the input array X is binned into integer-valued bins, which allows for a much faster training stage. Features with a small number of unique values may use less than max_bins bins. In addition to the max_bins bins, one more bin is always reserved for missing values. Must be no larger than 255.

  • categorical_features: array-like of {bool, int, str} of shape (n_features) or shape (n_categorical_features,), default='from_dtype' Indicates the categorical features.

    • None : no feature will be considered categorical.
    • boolean array-like : boolean mask indicating categorical features.
    • integer array-like : integer indices indicating categorical features.
    • str array-like: names of categorical features (assuming the training data has feature names).
    • "from_dtype": dataframe columns with dtype "category" are considered to be categorical features. The input must be an object exposing a __dataframe__ method such as pandas or polars DataFrames to use this feature.

    For each categorical feature, there must be at most max_bins unique categories. Negative values for categorical features encoded as numeric dtypes are treated as missing values. All categorical values are converted to floating point numbers. This means that categorical values of 1.0 and 1 are treated as the same category.

    Read more in the User Guide: categorical_support_gbdt.

    Added in 0.24

    Changed in 1.2 Added support for feature names.

    Changed in 1.4 Added "from_dtype" option.

    Changed in 1.6 The default value changed from None to "from_dtype".

  • monotonic_cst: array-like of int of shape (n_features) or dict, default=None Monotonic constraint to enforce on each feature are specified using the following integer values:

    • 1: monotonic increase
    • 0: no constraint
    • -1: monotonic decrease

    If a dict with str keys, map feature to monotonic constraints by name. If an array, the features are mapped to constraints by position. See :ref:monotonic_cst_features_names for a usage example.

    The constraints are only valid for binary classifications and hold over the probability of the positive class. Read more in the User Guide: monotonic_cst_gbdt.

    Added in 0.23

    Changed in 1.2 Accept dict of constraints with feature names as keys.

  • interaction_cst: {"pairwise", "no_interactions"} or sequence of lists/tuples/sets of int, default=None Specify interaction constraints, the sets of features which can interact with each other in child node splits.

    Each item specifies the set of feature indices that are allowed to interact with each other. If there are more features than specified in these constraints, they are treated as if they were specified as an additional set.

    The strings "pairwise" and "no_interactions" are shorthands for allowing only pairwise or no interactions, respectively.

    For instance, with 5 features in total, interaction_cst=[{0, 1}] is equivalent to interaction_cst=[{0, 1}, {2, 3, 4}], and specifies that each branch of a tree will either only split on features 0 and 1 or only split on features 2, 3 and 4.

    See this example: ice-vs-pdp on how to use interaction_cst.

    Added in 1.2

  • warm_start: bool, default=False When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble. For results to be valid, the estimator should be re-trained on the same data only. See the Glossary.

  • early_stopping: 'auto' or bool, default='auto' If 'auto', early stopping is enabled if the sample size is larger than 10000 or if X_val and y_val are passed to fit. If True, early stopping is enabled, otherwise early stopping is disabled.

    Added in 0.23

  • scoring: str or callable or None, default='loss' Scoring method to use for early stopping. Only used if early_stopping is enabled. Options:

    • str: see :ref:scoring_string_names for options.
    • callable: a scorer callable object (e.g., function) with signature scorer(estimator, X, y). See :ref:scoring_callable for details.
    • None: accuracy: accuracy_score is used.
    • 'loss': early stopping is checked w.r.t the loss value.
  • validation_fraction: int or float or None, default=0.1 Proportion (or absolute size) of training data to set aside as validation data for early stopping. If None, early stopping is done on the training data. The value is ignored if either early stopping is not performed, e.g. early_stopping=False, or if X_val and y_val are passed to fit.

  • n_iter_no_change: int, default=10 Used to determine when to "early stop". The fitting process is stopped when none of the last n_iter_no_change scores are better than the (n_iter_no_change - 1)-th-to-last one, up to some tolerance. Only used if early stopping is performed.

  • tol: float, default=1e-7 The absolute tolerance to use when comparing scores. The higher the tolerance, the more likely we are to early stop: higher tolerance means that it will be harder for subsequent iterations to be considered an improvement upon the reference score.

  • verbose: int, default=0 The verbosity level. If not zero, print some information about the fitting process. 1 prints only summary info, 2 prints info per iteration.

  • random_state: int, RandomState instance or None, default=None Pseudo-random number generator to control the subsampling in the binning process, and the train/validation data split if early stopping is enabled. Pass an int for reproducible output across multiple function calls. See Glossary.

  • class_weight: dict or 'balanced', default=None Weights associated with classes in the form {class_label: weight}. If not given, all classes are supposed to have weight one. The "balanced" mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y)). Note that these weights will be multiplied with sample_weight (passed through the fit method) if sample_weight is specified.

    Added in 1.2

Attributes

  • classes_: array, shape = (n_classes,) Class labels.

  • do_early_stopping_: bool Indicates whether early stopping is used during training.

  • n_iter_: int The number of iterations as selected by early stopping, depending on the early_stopping parameter. Otherwise it corresponds to max_iter.

  • n_trees_per_iteration_: int The number of tree that are built at each iteration. This is equal to 1 for binary classification, and to n_classes for multiclass classification.

  • train_score_: ndarray, shape (n_iter_+1,) The scores at each iteration on the training data. The first entry is the score of the ensemble before the first iteration. Scores are computed according to the scoring parameter. If scoring is not 'loss', scores are computed on a subset of at most 10 000 samples. Empty if no early stopping.

  • validation_score_: ndarray, shape (n_iter_+1,) The scores at each iteration on the held-out validation data. The first entry is the score of the ensemble before the first iteration. Scores are computed according to the scoring parameter. Empty if no early stopping or if validation_fraction is None.

  • is_categorical_: ndarray, shape (n_features, ) or None Boolean mask for the categorical features. None if there are no categorical features.

  • n_features_in_: int Number of features seen during fit.

    Added in 0.24

  • feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen during fit. Defined only when X has feature names that are all strings.

    Added in 1.0

See Also

  • GradientBoostingClassifier: Exact gradient boosting method that does not scale as well on datasets with a large number of samples.
  • sklearn.tree.DecisionTreeClassifier: A decision tree classifier.
  • RandomForestClassifier: A meta-estimator that fits a number of decision tree classifiers on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting.
  • AdaBoostClassifier: A meta-estimator that begins by fitting a classifier on the original dataset and then fits additional copies of the classifier on the same dataset where the weights of incorrectly classified instances are adjusted such that subsequent classifiers focus more on difficult cases.

Examples

from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.datasets import load_iris
X, y = load_iris(return_X_y=True)
clf = HistGradientBoostingClassifier().fit(X, y)
clf.score(X, y)
1.0
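
Note how the kebab-case parameter names listed in the table above translate back to the Python parameters, e.g. :l-2-regularization for l2_regularization. A minimal Clojure sketch of the model step (assuming the chapter's namespace requires; :min-samples-leaf is lowered here only because toy datasets are tiny):

(ml/model {:model-type :sklearn.classification/hist-gradient-boosting-classifier
           :max-iter 100
           :min-samples-leaf 1
           :l-2-regularization 0.0})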


24.2.15 /k-neighbors-classifier

name type default description
algorithm
leaf-size
metric
metric-params
n-jobs
n-neighbors
p
weights
predict-proba?

Classifier implementing the k-nearest neighbors vote.

Read more in the User Guide: classification.

Parameters

  • n_neighbors: int, default=5 Number of neighbors to use by default for kneighbors queries.

  • weights: {'uniform', 'distance'}, callable or None, default='uniform' Weight function used in prediction. Possible values:

    • 'uniform' : uniform weights. All points in each neighborhood are weighted equally.
    • 'distance' : weight points by the inverse of their distance. In this case, closer neighbors of a query point will have a greater influence than neighbors which are further away.
    • [callable] : a user-defined function which accepts an array of distances, and returns an array of the same shape containing the weights.

    Refer to the example entitled :ref:sphx_glr_auto_examples_neighbors_plot_classification.py showing the impact of the weights parameter on the decision boundary.

  • algorithm: {'auto', 'ball_tree', 'kd_tree', 'brute'}, default='auto' Algorithm used to compute the nearest neighbors:

    • 'ball_tree' will use BallTree
    • 'kd_tree' will use KDTree
    • 'brute' will use a brute-force search.
    • 'auto' will attempt to decide the most appropriate algorithm based on the values passed to fit method.

    Note: fitting on sparse input will override the setting of this parameter, using brute force.

  • leaf_size: int, default=30 Leaf size passed to BallTree or KDTree. This can affect the speed of the construction and query, as well as the memory required to store the tree. The optimal value depends on the nature of the problem.

  • p: float, default=2 Power parameter for the Minkowski metric. When p = 1, this is equivalent to using manhattan_distance (l1), and euclidean_distance (l2) for p = 2. For arbitrary p, minkowski_distance (l_p) is used. This parameter is expected to be positive.

  • metric: str or callable, default='minkowski' Metric to use for distance computation. Default is "minkowski", which results in the standard Euclidean distance when p = 2. See the documentation of scipy.spatial.distance and the metrics listed in ~sklearn.metrics.pairwise.distance_metrics for valid metric values.

    If metric is "precomputed", X is assumed to be a distance matrix and must be square during fit. X may be a sparse graph, in which case only "nonzero" elements may be considered neighbors.

    If metric is a callable function, it takes two arrays representing 1D vectors as inputs and must return one value indicating the distance between those vectors. This works for Scipy's metrics, but is less efficient than passing the metric name as a string.

  • metric_params: dict, default=None Additional keyword arguments for the metric function.

  • n_jobs: int, default=None The number of parallel jobs to run for neighbors search. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details. Doesn't affect fit method.

Attributes

  • classes_: array of shape (n_classes,) Class labels known to the classifier.

  • effective_metric_: str or callable The distance metric used. It will be the same as the metric parameter or a synonym of it, e.g. 'euclidean' if the metric parameter is set to 'minkowski' and the p parameter to 2.

  • effective_metric_params_: dict Additional keyword arguments for the metric function. For most metrics it will be the same as the metric_params parameter, but it may also contain the p parameter value if the effective_metric_ attribute is set to 'minkowski'.

  • n_features_in_: int Number of features seen during fit.

    Added in 0.24

  • feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen during fit. Defined only when X has feature names that are all strings.

    Added in 1.0

  • n_samples_fit_: int Number of samples in the fitted data.

  • outputs_2d_: bool False when y's shape is (n_samples,) or (n_samples, 1) during fit, otherwise True.

See Also

  • RadiusNeighborsClassifier: Classifier based on neighbors within a fixed radius.
  • KNeighborsRegressor: Regression based on k-nearest neighbors.
  • RadiusNeighborsRegressor: Regression based on neighbors within a fixed radius.
  • NearestNeighbors: Unsupervised learner for implementing neighbor searches.

Notes

See Nearest Neighbors: neighbors in the online documentation for a discussion of the choice of algorithm and leaf_size.


⚠️ Warning

Regarding the Nearest Neighbors algorithms, if it is found that two neighbors, neighbor k+1 and k, have identical distances but different labels, the results will depend on the ordering of the training data.


https://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm

Examples

X = [[0], [1], [2], [3]]
y = [0, 0, 1, 1]
from sklearn.neighbors import KNeighborsClassifier
neigh = KNeighborsClassifier(n_neighbors=3)
neigh.fit(X, y)
KNeighborsClassifier(...)
print(neigh.predict([[1.1]]))
[0]
print(neigh.predict_proba([[0.9]]))
[[0.666 0.333]]
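
A Clojure counterpart of this example, as a minimal sketch assuming the chapter's namespace requires (column 0 is the single feature, column 1 the class label):

(def knn-ds
  (dst/tensor->dataset [[0 0] [1 0] [2 1] [3 1]]))

(def knn-pipe
  (mm/pipeline
   (ds-mm/set-inference-target 1)
   {:metamorph/id :model}
   (ml/model {:model-type :sklearn.classification/k-neighbors-classifier
              :n-neighbors 3})))

(def knn-fitted
  (knn-pipe {:metamorph/data knn-ds
             :metamorph/mode :fit}))

;; predict for a new point; the trailing 0 is just a placeholder target value
(-> (mm/transform-pipe
     (dst/tensor->dataset [[1.1 0]])
     knn-pipe
     knn-fitted)
    :metamorph/data)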


24.2.16 /label-propagation

name type default description
gamma
kernel
max-iter
n-jobs
n-neighbors
tol
predict-proba?

Label Propagation classifier.

Read more in the User Guide: label_propagation.

Parameters

  • kernel: {'knn', 'rbf'} or callable, default='rbf' String identifier for kernel function to use or the kernel function itself. Only 'rbf' and 'knn' strings are valid inputs. The function passed should take two inputs, each of shape (n_samples, n_features), and return a (n_samples, n_samples) shaped weight matrix.

  • gamma: float, default=20 Parameter for rbf kernel.

  • n_neighbors: int, default=7 Parameter for knn kernel which needs to be strictly positive.

  • max_iter: int, default=1000 Change maximum number of iterations allowed.

  • tol: float, default=1e-3 Convergence tolerance: threshold to consider the system at steady state.

  • n_jobs: int, default=None The number of parallel jobs to run. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.

Attributes

  • X_: {array-like, sparse matrix} of shape (n_samples, n_features) Input array.

  • classes_: ndarray of shape (n_classes,) The distinct labels used in classifying instances.

  • label_distributions_: ndarray of shape (n_samples, n_classes) Categorical distribution for each item.

  • transduction_: ndarray of shape (n_samples) Label assigned to each item during fit.

  • n_features_in_: int Number of features seen during fit.

    Added in 0.24

  • feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen during fit. Defined only when X has feature names that are all strings.

    Added in 1.0

  • n_iter_: int Number of iterations run.

See Also

  • LabelSpreading: Alternate label propagation strategy more robust to noise.

References

Xiaojin Zhu and Zoubin Ghahramani. Learning from labeled and unlabeled data with label propagation. Technical Report CMU-CALD-02-107, Carnegie Mellon University, 2002 http://pages.cs.wisc.edu/~jerryzhu/pub/CMU-CALD-02-107.pdf

Examples

import numpy as np
from sklearn import datasets
from sklearn.semi_supervised import LabelPropagation
label_prop_model = LabelPropagation()
iris = datasets.load_iris()
rng = np.random.RandomState(42)
random_unlabeled_points = rng.rand(len(iris.target)) < 0.3
labels = np.copy(iris.target)
labels[random_unlabeled_points] = -1
label_prop_model.fit(iris.data, labels)
LabelPropagation(...)
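
From Clojure, the model step follows the same pattern as the other classifiers in this chapter; unlabeled rows carry -1 in the target column, as in the Python example above. A minimal sketch (assuming the chapter's namespace requires; the parameter values shown are the documented defaults and purely illustrative):

(ml/model {:model-type :sklearn.classification/label-propagation
           :gamma 20
           :max-iter 1000})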


24.2.17 /label-spreading

name type default description
alpha
gamma
kernel
max-iter
n-jobs
n-neighbors
tol
predict-proba?

LabelSpreading model for semi-supervised learning.

This model is similar to the basic Label Propagation algorithm, but uses affinity matrix based on the normalized graph Laplacian and soft clamping across the labels.

Read more in the User Guide: label_propagation.

Parameters

  • kernel: {'knn', 'rbf'} or callable, default='rbf' String identifier for kernel function to use or the kernel function itself. Only 'rbf' and 'knn' strings are valid inputs. The function passed should take two inputs, each of shape (n_samples, n_features), and return a (n_samples, n_samples) shaped weight matrix.

  • gamma: float, default=20 Parameter for rbf kernel.

  • n_neighbors: int, default=7 Parameter for knn kernel which is a strictly positive integer.

  • alpha: float, default=0.2 Clamping factor. A value in (0, 1) that specifies the relative amount that an instance should adopt the information from its neighbors as opposed to its initial label. alpha=0 means keeping the initial label information; alpha=1 means replacing all initial information.

  • max_iter: int, default=30 Maximum number of iterations allowed.

  • tol: float, default=1e-3 Convergence tolerance: threshold to consider the system at steady state.

  • n_jobs: int, default=None The number of parallel jobs to run. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.

Attributes

  • X_: ndarray of shape (n_samples, n_features) Input array.

  • classes_: ndarray of shape (n_classes,) The distinct labels used in classifying instances.

  • label_distributions_: ndarray of shape (n_samples, n_classes) Categorical distribution for each item.

  • transduction_: ndarray of shape (n_samples,) Label assigned to each item during fit.

  • n_features_in_: int Number of features seen during fit.

    Added in 0.24

  • feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen during fit. Defined only when X has feature names that are all strings.

    Added in 1.0

  • n_iter_: int Number of iterations run.

See Also

  • LabelPropagation: Unregularized graph based semi-supervised learning.

References

Dengyong Zhou, Olivier Bousquet, Thomas Navin Lal, Jason Weston, Bernhard Schoelkopf. Learning with local and global consistency (2004)

Examples

import numpy as np
from sklearn import datasets
from sklearn.semi_supervised import LabelSpreading
label_prop_model = LabelSpreading()
iris = datasets.load_iris()
rng = np.random.RandomState(42)
random_unlabeled_points = rng.rand(len(iris.target)) < 0.3
labels = np.copy(iris.target)
labels[random_unlabeled_points] = -1
label_prop_model.fit(iris.data, labels)
LabelSpreading(...)
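
The corresponding Clojure model step differs from label-propagation mainly by the additional clamping factor :alpha; a minimal sketch, assuming the chapter's namespace requires:

(ml/model {:model-type :sklearn.classification/label-spreading
           :alpha 0.2
           :max-iter 30})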


24.2.18 /linear-discriminant-analysis

name type default description
covariance-estimator
n-components
priors
shrinkage
solver
store-covariance
tol
predict-proba?

Linear Discriminant Analysis.

A classifier with a linear decision boundary, generated by fitting class conditional densities to the data and using Bayes' rule.

The model fits a Gaussian density to each class, assuming that all classes share the same covariance matrix.

The fitted model can also be used to reduce the dimensionality of the input by projecting it to the most discriminative directions, using the transform method.

Added in 0.17

For a comparison between ~sklearn.discriminant_analysis.LinearDiscriminantAnalysis and ~sklearn.discriminant_analysis.QuadraticDiscriminantAnalysis, see :ref:sphx_glr_auto_examples_classification_plot_lda_qda.py.

Read more in the User Guide: lda_qda.

Parameters

  • solver: {'svd', 'lsqr', 'eigen'}, default='svd' Solver to use, possible values:

    • 'svd': Singular value decomposition (default). Does not compute the covariance matrix, therefore this solver is recommended for data with a large number of features.
    • 'lsqr': Least squares solution. Can be combined with shrinkage or custom covariance estimator.
    • 'eigen': Eigenvalue decomposition. Can be combined with shrinkage or custom covariance estimator.

    Changed in 1.2 solver="svd" now has experimental Array API support. See the Array API User Guide: array_api for more details.

  • shrinkage: 'auto' or float, default=None Shrinkage parameter, possible values:

    • None: no shrinkage (default).
    • 'auto': automatic shrinkage using the Ledoit-Wolf lemma.
    • float between 0 and 1: fixed shrinkage parameter.

    This should be left to None if covariance_estimator is used. Note that shrinkage works only with 'lsqr' and 'eigen' solvers.

    For a usage example, see :ref:sphx_glr_auto_examples_classification_plot_lda.py.

  • priors: array-like of shape (n_classes,), default=None The class prior probabilities. By default, the class proportions are inferred from the training data.

  • n_components: int, default=None Number of components (<= min(n_classes - 1, n_features)) for dimensionality reduction. If None, will be set to min(n_classes - 1, n_features). This parameter only affects the transform method.

    For a usage example, see :ref:sphx_glr_auto_examples_decomposition_plot_pca_vs_lda.py.

  • store_covariance: bool, default=False If True, explicitly compute the weighted within-class covariance matrix when solver is 'svd'. The matrix is always computed and stored for the other solvers.

    Added in 0.17

  • tol: float, default=1.0e-4 Absolute threshold for a singular value of X to be considered significant, used to estimate the rank of X. Dimensions whose singular values are non-significant are discarded. Only used if solver is 'svd'.

    Added in 0.17

  • covariance_estimator: covariance estimator, default=None If not None, covariance_estimator is used to estimate the covariance matrices instead of relying on the empirical covariance estimator (with potential shrinkage). The object should have a fit method and a covariance_ attribute like the estimators in sklearn.covariance. If None, the shrinkage parameter drives the estimate.

    This should be left to None if shrinkage is used. Note that covariance_estimator works only with 'lsqr' and 'eigen' solvers.

    Added in 0.24

Attributes

  • coef_: ndarray of shape (n_features,) or (n_classes, n_features) Weight vector(s).

  • intercept_: ndarray of shape (n_classes,) Intercept term.

  • covariance_: array-like of shape (n_features, n_features) Weighted within-class covariance matrix. It corresponds to sum_k prior_k * C_k where C_k is the covariance matrix of the samples in class k. The C_k are estimated using the (potentially shrunk) biased estimator of covariance. If solver is 'svd', only exists when store_covariance is True.

  • explained_variance_ratio_: ndarray of shape (n_components,) Percentage of variance explained by each of the selected components. If n_components is not set then all components are stored and the sum of explained variances is equal to 1.0. Only available when eigen or svd solver is used.

  • means_: array-like of shape (n_classes, n_features) Class-wise means.

  • priors_: array-like of shape (n_classes,) Class priors (sum to 1).

  • scalings_: array-like of shape (rank, n_classes - 1) Scaling of the features in the space spanned by the class centroids. Only available for 'svd' and 'eigen' solvers.

  • xbar_: array-like of shape (n_features,) Overall mean. Only present if solver is 'svd'.

  • classes_: array-like of shape (n_classes,) Unique class labels.

  • n_features_in_: int Number of features seen during fit.

    Added in 0.24

  • feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen during fit. Defined only when X has feature names that are all strings.

    Added in 1.0

See Also

  • QuadraticDiscriminantAnalysis: Quadratic Discriminant Analysis.

Examples

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
y = np.array([1, 1, 1, 2, 2, 2])
clf = LinearDiscriminantAnalysis()
clf.fit(X, y)
LinearDiscriminantAnalysis()
print(clf.predict([[-0.8, -1]]))
[1]
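
The same kind of fit can also be expressed from Clojure through a metamorph.ml pipeline. The following is a minimal sketch only: it assumes the model is registered under the key :sklearn.classification/linear-discriminant-analysis (following the naming pattern of this chapter) and that the Python parameters translate to kebab-case keys such as :solver.

(def lda-pipe
  (mm/pipeline
   (ds-mm/set-inference-target 2)
   {:metamorph/id :model}
   ;; assumed model key and kebab-case parameter name
   (ml/model {:model-type :sklearn.classification/linear-discriminant-analysis
              :solver "svd"})))

;; Fit on a tiny two-class dataset; the last column is the target.
(def lda-ctx
  (lda-pipe {:metamorph/data (dst/tensor->dataset
                              [[-1 -1 0] [-2 -1 0] [1 1 1] [2 1 1]])
             :metamorph/mode :fit}))

;; Predict for a new observation (the target column is a placeholder).
(-> (mm/transform-pipe (dst/tensor->dataset [[-0.8 -1 0]]) lda-pipe lda-ctx)
    :metamorph/data)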


24.2.19 /linear-svc

name type default description
tol
intercept-scaling
multi-class
penalty
c
max-iter
random-state
dual
fit-intercept
class-weight
loss
verbose
predict-proba?

Linear Support Vector Classification.

Similar to SVC with parameter kernel='linear', but implemented in terms of liblinear rather than libsvm, so it has more flexibility in the choice of penalties and loss functions and should scale better to large numbers of samples.

The main differences between ~sklearn.svm.LinearSVC and ~sklearn.svm.SVC lie in the loss function used by default, and in the handling of intercept regularization between those two implementations.

This class supports both dense and sparse input and the multiclass support is handled according to a one-vs-the-rest scheme.

Read more in the User Guide: svm_classification.

Parameters

  • penalty: {'l1', 'l2'}, default='l2' Specifies the norm used in the penalization. The 'l2' penalty is the standard used in SVC. The 'l1' leads to coef_ vectors that are sparse.

  • loss: {'hinge', 'squared_hinge'}, default='squared_hinge' Specifies the loss function. 'hinge' is the standard SVM loss (used e.g. by the SVC class) while 'squared_hinge' is the square of the hinge loss. The combination of penalty='l1' and loss='hinge' is not supported.

  • dual: "auto" or bool, default="auto" Select the algorithm to either solve the dual or primal optimization problem. Prefer dual=False when n_samples > n_features. dual="auto" will choose the value of the parameter automatically, based on the values of n_samples, n_features, loss, multi_class and penalty. If n_samples < n_features and optimizer supports chosen loss, multi_class and penalty, then dual will be set to True, otherwise it will be set to False.

    Changed in 1.3 The "auto" option is added in version 1.3 and will be the default in version 1.5.

  • tol: float, default=1e-4 Tolerance for stopping criteria.

  • C: float, default=1.0 Regularization parameter. The strength of the regularization is inversely proportional to C. Must be strictly positive. For an intuitive visualization of the effects of scaling the regularization parameter C, see :ref:sphx_glr_auto_examples_svm_plot_svm_scale_c.py.

  • multi_class: {'ovr', 'crammer_singer'}, default='ovr' Determines the multi-class strategy if y contains more than two classes. "ovr" trains n_classes one-vs-rest classifiers, while "crammer_singer" optimizes a joint objective over all classes. While crammer_singer is interesting from a theoretical perspective as it is consistent, it is seldom used in practice as it rarely leads to better accuracy and is more expensive to compute. If "crammer_singer" is chosen, the options loss, penalty and dual will be ignored.

  • fit_intercept: bool, default=True Whether or not to fit an intercept. If set to True, the feature vector is extended to include an intercept term: [x_1, ..., x_n, 1], where 1 corresponds to the intercept. If set to False, no intercept will be used in calculations (i.e. data is expected to be already centered).

  • intercept_scaling: float, default=1.0 When fit_intercept is True, the instance vector x becomes [x_1, ..., x_n, intercept_scaling], i.e. a "synthetic" feature with a constant value equal to intercept_scaling is appended to the instance vector. The intercept becomes intercept_scaling * synthetic feature weight. Note that liblinear internally penalizes the intercept, treating it like any other term in the feature vector. To reduce the impact of the regularization on the intercept, the intercept_scaling parameter can be set to a value greater than 1; the higher the value of intercept_scaling, the lower the impact of regularization on it. Then, the weights become [w_x_1, ..., w_x_n, w_intercept*intercept_scaling], where w_x_1, ..., w_x_n represent the feature weights and the intercept weight is scaled by intercept_scaling. This scaling allows the intercept term to have a different regularization behavior compared to the other features.

  • class_weight: dict or 'balanced', default=None Set the parameter C of class i to class_weight[i]*C for SVC. If not given, all classes are supposed to have weight one. The "balanced" mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y)).

  • verbose: int, default=0 Enable verbose output. Note that this setting takes advantage of a per-process runtime setting in liblinear that, if enabled, may not work properly in a multithreaded context.

  • random_state: int, RandomState instance or None, default=None Controls the pseudo random number generation for shuffling the data for the dual coordinate descent (if dual=True). When dual=False the underlying implementation of LinearSVC is not random and random_state has no effect on the results. Pass an int for reproducible output across multiple function calls. See Glossary .

  • max_iter: int, default=1000 The maximum number of iterations to be run.

Attributes

  • coef_: ndarray of shape (1, n_features) if n_classes == 2 else (n_classes, n_features) Weights assigned to the features (coefficients in the primal problem).

    coef_ is a readonly property derived from raw_coef_ that follows the internal memory layout of liblinear.

  • intercept_: ndarray of shape (1,) if n_classes == 2 else (n_classes,) Constants in decision function.

  • classes_: ndarray of shape (n_classes,) The unique classes labels.

  • n_features_in_: int Number of features seen during fit.

    Added in 0.24

  • feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen during fit. Defined only when X has feature names that are all strings.

    Added in 1.0

  • n_iter_: int Maximum number of iterations run across all classes.

See Also

  • SVC: Implementation of Support Vector Machine classifier using libsvm: the kernel can be non-linear but its SMO algorithm does not scale to large number of samples as LinearSVC does.

    Furthermore SVC multi-class mode is implemented using one vs one scheme while LinearSVC uses one vs the rest. It is possible to implement one vs the rest with SVC by using the ~sklearn.multiclass.OneVsRestClassifier wrapper.

    Finally SVC can fit dense data without memory copy if the input is C-contiguous. Sparse data will still incur memory copy though.

  • sklearn.linear_model.SGDClassifier: SGDClassifier can optimize the same cost function as LinearSVC by adjusting the penalty and loss parameters. In addition it requires less memory, allows incremental (online) learning, and implements various loss functions and regularization regimes.

Notes

The underlying C implementation uses a random number generator to select features when fitting the model. It is thus not uncommon to have slightly different results for the same input data. If that happens, try with a smaller tol parameter.

The underlying implementation, liblinear, uses a sparse internal representation for the data that will incur a memory copy.

Predict output may not match that of standalone liblinear in certain cases. See differences from liblinear: liblinear_differences in the narrative documentation.

References

LIBLINEAR: A Library for Large Linear Classification

Examples

from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import make_classification
X, y = make_classification(n_features=4, random_state=0)
clf = make_pipeline(StandardScaler(),
                    LinearSVC(random_state=0, tol=1e-5))
clf.fit(X, y)
Pipeline(steps=[('standardscaler', StandardScaler()),
                ('linearsvc', LinearSVC(random_state=0, tol=1e-05))])
print(clf.named_steps['linearsvc'].coef_)
[[0.141   0.526 0.679 0.493]]
print(clf.named_steps['linearsvc'].intercept_)
[0.1693]
print(clf.predict([[0, 0, 0, 0]]))
[1]
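
For comparison with the Python example above, here is a minimal metamorph.ml sketch. It assumes the model key :sklearn.classification/linear-svc and the kebab-case parameter keys from the table above (for example :c, :tol and :max-iter).

(def linear-svc-pipe
  (mm/pipeline
   (ds-mm/set-inference-target 2)
   {:metamorph/id :model}
   ;; assumed model key; :c, :tol and :max-iter mirror C, tol and max_iter
   (ml/model {:model-type :sklearn.classification/linear-svc
              :c 1.0
              :tol 1e-4
              :max-iter 1000})))

;; Fit on a tiny two-class dataset; the last column is the target.
(def linear-svc-ctx
  (linear-svc-pipe
   {:metamorph/data (dst/tensor->dataset
                     [[0 0 0] [1 0 0] [4 5 1] [5 5 1]])
    :metamorph/mode :fit}))

(-> (mm/transform-pipe (dst/tensor->dataset [[4 4 0]]) linear-svc-pipe linear-svc-ctx)
    :metamorph/data)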


24.2.20 /logistic-regression

name type default description
tol
intercept-scaling
solver
penalty
c
max-iter
n-jobs
random-state
dual
fit-intercept
warm-start
l-1-ratio
class-weight
verbose
predict-proba?

Logistic Regression (aka logit, MaxEnt) classifier.

This class implements regularized logistic regression using a set of available solvers. Note that regularization is applied by default. It can handle both dense and sparse input X. Use C-ordered arrays or CSR matrices containing 64-bit floats for optimal performance; any other input format will be converted (and copied).

The solvers 'lbfgs', 'newton-cg', 'newton-cholesky' and 'sag' support only L2 regularization with primal formulation, or no regularization. The 'liblinear' solver supports both L1 and L2 regularization (but not both, i.e. elastic-net), with a dual formulation only for the L2 penalty. The Elastic-Net (combination of L1 and L2) regularization is only supported by the 'saga' solver.

For multiclass problems (whenever n_classes >= 3), all solvers except 'liblinear' optimize the (penalized) multinomial loss. 'liblinear' only handles binary classification but can be extended to handle multiclass by using ~sklearn.multiclass.OneVsRestClassifier.

Read more in the User Guide: logistic_regression.

Parameters

  • penalty: {'l1', 'l2', 'elasticnet', None}, default='l2' Specify the norm of the penalty:

    • None: no penalty is added;
    • 'l2': add a L2 penalty term and it is the default choice;
    • 'l1': add a L1 penalty term;
    • 'elasticnet': both L1 and L2 penalty terms are added.

⚠️ Warning

Some penalties may not work with some solvers. See the parameter solver below, to know the compatibility between the penalty and solver.

Added in 0.19 l1 penalty with SAGA solver (allowing 'multinomial' + L1)

Deprecated since 1.8 penalty was deprecated in version 1.8 and will be removed in 1.10. Use l1_ratio instead: l1_ratio=0 for penalty='l2', l1_ratio=1 for penalty='l1', and any float between 0 and 1 for penalty='elasticnet'.


  • C: float, default=1.0 Inverse of regularization strength; must be a positive float. Like in support vector machines, smaller values specify stronger regularization. C=np.inf results in unpenalized logistic regression. For a visual example on the effect of tuning the C parameter with an L1 penalty, see: :ref:sphx_glr_auto_examples_linear_model_plot_logistic_path.py.

  • l1_ratio: float, default=0.0 The Elastic-Net mixing parameter, with 0 <= l1_ratio <= 1. Setting l1_ratio=1 gives a pure L1-penalty, setting l1_ratio=0 a pure L2-penalty. Any value between 0 and 1 gives an Elastic-Net penalty of the form l1_ratio * L1 + (1 - l1_ratio) * L2.


⚠️ Warning

Certain values of l1_ratio, i.e. some penalties, may not work with some solvers. See the parameter solver below, to know the compatibility between the penalty and solver.

Changed in 1.8 Default value changed from None to 0.0.

Deprecated since 1.8 None is deprecated and will be removed in version 1.10. Always use l1_ratio to specify the penalty type.


  • dual: bool, default=False Dual (constrained) or primal (regularized, see also this equation: regularized-logistic-loss) formulation. Dual formulation is only implemented for l2 penalty with liblinear solver. Prefer dual=False when n_samples > n_features.

  • tol: float, default=1e-4 Tolerance for stopping criteria.

  • fit_intercept: bool, default=True Specifies if a constant (a.k.a. bias or intercept) should be added to the decision function.

  • intercept_scaling: float, default=1 Useful only when the solver liblinear is used and self.fit_intercept is set to True. In this case, x becomes [x, self.intercept_scaling], i.e. a "synthetic" feature with constant value equal to intercept_scaling is appended to the instance vector. The intercept becomes intercept_scaling * synthetic_feature_weight.


🛈 Note

The synthetic feature weight is subject to L1 or L2 regularization as all other features. To lessen the effect of regularization on synthetic feature weight (and therefore on the intercept) intercept_scaling has to be increased.


  • class_weight: dict or 'balanced', default=None Weights associated with classes in the form {class_label: weight}. If not given, all classes are supposed to have weight one.

    The "balanced" mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y)).

    Note that these weights will be multiplied with sample_weight (passed through the fit method) if sample_weight is specified.

    Added in 0.17 class_weight='balanced'

  • random_state: int, RandomState instance, default=None Used when solver == 'sag', 'saga' or 'liblinear' to shuffle the data. See Glossary for details.

  • solver: {'lbfgs', 'liblinear', 'newton-cg', 'newton-cholesky', 'sag', 'saga'}, default='lbfgs'

    Algorithm to use in the optimization problem. Default is 'lbfgs'. To choose a solver, you might want to consider the following aspects:

    • 'lbfgs' is a good default solver because it works reasonably well for a wide class of problems.
    • For multiclass problems (n_classes >= 3), all solvers except 'liblinear' minimize the full multinomial loss, 'liblinear' will raise an error.
    • 'newton-cholesky' is a good choice for n_samples >> n_features * n_classes, especially with one-hot encoded categorical features with rare categories. Be aware that the memory usage of this solver has a quadratic dependency on n_features * n_classes because it explicitly computes the full Hessian matrix.
    • For small datasets, 'liblinear' is a good choice, whereas 'sag' and 'saga' are faster for large ones;
    • 'liblinear' can only handle binary classification by default. To apply a one-versus-rest scheme for the multiclass setting one can wrap it with the ~sklearn.multiclass.OneVsRestClassifier.

⚠️ Warning

The choice of the algorithm depends on the penalty chosen (l1_ratio=0 for L2-penalty, l1_ratio=1 for L1-penalty and 0 < l1_ratio < 1 for Elastic-Net) and on (multinomial) multiclass support:

solver              l1_ratio                   multinomial multiclass
'lbfgs'             l1_ratio=0                 yes
'liblinear'         l1_ratio=1 or l1_ratio=0   no
'newton-cg'         l1_ratio=0                 yes
'newton-cholesky'   l1_ratio=0                 yes
'sag'               l1_ratio=0                 yes
'saga'              0 <= l1_ratio <= 1         yes

🛈 Note

'sag' and 'saga' fast convergence is only guaranteed on features with approximately the same scale. You can preprocess the data with a scaler from sklearn.preprocessing.

See also: the User Guide for more information regarding LogisticRegression, and more specifically the table summarizing solver/penalty support.

Added in 0.17 Stochastic Average Gradient (SAG) descent solver. Multinomial support in version 0.18.

Added in 0.19 SAGA solver.

Changed in 0.22 The default solver changed from 'liblinear' to 'lbfgs' in 0.22.

Added in 1.2 newton-cholesky solver. Multinomial support in version 1.6.


  • max_iter: int, default=100 Maximum number of iterations taken for the solvers to converge.

  • verbose: int, default=0 For the liblinear and lbfgs solvers set verbose to any positive number for verbosity.

  • warm_start: bool, default=False When set to True, reuse the solution of the previous call to fit as initialization, otherwise, just erase the previous solution. Useless for liblinear solver. See the Glossary .

    Added in 0.17 warm_start to support lbfgs, newton-cg, sag, saga solvers.

  • n_jobs: int, default=None Does not have any effect.

    Deprecated since 1.8 n_jobs is deprecated in version 1.8 and will be removed in 1.10.

Attributes

  • classes_: ndarray of shape (n_classes, ) A list of class labels known to the classifier.

  • coef_: ndarray of shape (1, n_features) or (n_classes, n_features) Coefficient of the features in the decision function.

    coef_ is of shape (1, n_features) when the given problem is binary.

  • intercept_: ndarray of shape (1,) or (n_classes,) Intercept (a.k.a. bias) added to the decision function.

    If fit_intercept is set to False, the intercept is set to zero. intercept_ is of shape (1,) when the given problem is binary.

  • n_features_in_: int Number of features seen during fit.

    Added in 0.24

  • feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen during fit. Defined only when X has feature names that are all strings.

    Added in 1.0

  • n_iter_: ndarray of shape (1, ) Actual number of iterations for all classes.

    Changed in 0.20 In SciPy <= 1.0.0 the number of lbfgs iterations may exceed max_iter. n_iter_ will now report at most max_iter.

See Also

  • SGDClassifier: Incrementally trained logistic regression (when given the parameter loss="log_loss").
  • LogisticRegressionCV: Logistic regression with built-in cross validation.

Notes

The underlying C implementation uses a random number generator to select features when fitting the model. It is thus not uncommon to have slightly different results for the same input data. If that happens, try with a smaller tol parameter.

Predict output may not match that of standalone liblinear in certain cases. See differences from liblinear: liblinear_differences in the narrative documentation.

References

L-BFGS-B -- Software for Large-scale Bound-constrained Optimization Ciyou Zhu, Richard Byrd, Jorge Nocedal and Jose Luis Morales. http://users.iems.northwestern.edu/~nocedal/lbfgsb.html

LIBLINEAR -- A Library for Large Linear Classification https://www.csie.ntu.edu.tw/~cjlin/liblinear/

SAG -- Mark Schmidt, Nicolas Le Roux, and Francis Bach Minimizing Finite Sums with the Stochastic Average Gradient https://hal.inria.fr/hal-00860051/document

SAGA -- Defazio, A., Bach F. & Lacoste-Julien S. (2014). "SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives" (arXiv:1407.0202)

Hsiang-Fu Yu, Fang-Lan Huang, Chih-Jen Lin (2011). Dual coordinate descent methods for logistic regression and maximum entropy models. Machine Learning 85(1-2):41-75. https://www.csie.ntu.edu.tw/~cjlin/papers/maxent_dual.pdf

Examples

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
X, y = load_iris(return_X_y=True)
clf = LogisticRegression(random_state=0).fit(X, y)
clf.predict(X[:2, :])
array([0, 0])
clf.predict_proba(X[:2, :])
array([[9.82e-01, 1.82e-02, 1.44e-08],
       [9.72e-01, 2.82e-02, 3.02e-08]])
clf.score(X, y)
0.97

For a comparison of the LogisticRegression with other classifiers see: :ref:sphx_glr_auto_examples_classification_plot_classification_probability.py.
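
The solver/penalty constraints discussed above apply equally when the model is used from Clojure. As a hedged sketch of requesting an elastic-net fit, the 'saga' solver can be combined with an :l-1-ratio between 0 and 1 (assuming these kebab-case keys map onto solver and l1_ratio):

(def elastic-lr-pipe
  (mm/pipeline
   (ds-mm/set-inference-target 2)
   {:metamorph/id :model}
   ;; 'saga' is the only solver that supports 0 < l1_ratio < 1 (elastic-net);
   ;; on older scikit-learn versions the deprecated :penalty "elasticnet"
   ;; key may also be required.
   (ml/model {:model-type :sklearn.classification/logistic-regression
              :solver "saga"
              :l-1-ratio 0.5
              :max-iter 1000})))

The resulting pipeline is fitted and applied exactly like the logistic regression pipeline shown earlier in this chapter.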



24.2.21 /logistic-regression-cv

name type default description
refit
scoring
tol
intercept-scaling
solver
penalty
max-iter
n-jobs
random-state
dual
use-legacy-attributes
fit-intercept
cv
cs
class-weight
verbose
l-1-ratios
predict-proba?

Logistic Regression CV (aka logit, MaxEnt) classifier.

See glossary entry for cross-validation estimator.

This class implements regularized logistic regression with implicit cross validation for the penalty parameters C and l1_ratio, see LogisticRegression, using a set of available solvers.

The solvers 'lbfgs', 'newton-cg', 'newton-cholesky' and 'sag' support only L2 regularization with primal formulation. The 'liblinear' solver supports both L1 and L2 regularization (but not both, i.e. elastic-net), with a dual formulation only for the L2 penalty. The Elastic-Net (combination of L1 and L2) regularization is only supported by the 'saga' solver.

For the grid of Cs values and l1_ratios values, the best hyperparameter is selected by the cross-validator ~sklearn.model_selection.StratifiedKFold, but it can be changed using the cv parameter. All solvers except 'liblinear' can warm-start the coefficients (see Glossary).

Read more in the User Guide: logistic_regression.

Parameters

  • Cs: int or list of floats, default=10 Each of the values in Cs describes the inverse of regularization strength. If Cs is an int, then a grid of Cs values is chosen in a logarithmic scale between 1e-4 and 1e4. Like in support vector machines, smaller values specify stronger regularization.

  • l1_ratios: array-like of shape (n_l1_ratios), default=None Floats between 0 and 1 passed as Elastic-Net mixing parameter (scaling between L1 and L2 penalties). For l1_ratio = 0 the penalty is an L2 penalty. For l1_ratio = 1 it is an L1 penalty. For 0 < l1_ratio < 1, the penalty is a combination of L1 and L2. All the values of the given array-like are tested by cross-validation and the one giving the best prediction score is used.


⚠️ Warning

Certain values of l1_ratios, i.e. some penalties, may not work with some solvers. See the parameter solver below, to know the compatibility between the penalty and solver.

Deprecated since 1.8 l1_ratios=None is deprecated in 1.8 and will raise an error in version 1.10. The default value will change from None to (0.0,) in version 1.10.


  • fit_intercept: bool, default=True Specifies if a constant (a.k.a. bias or intercept) should be added to the decision function.

  • cv: int or cross-validation generator, default=None The default cross-validation generator used is Stratified K-Folds. If an integer is provided, it specifies the number of folds, n_folds, used. See the sklearn.model_selection module for the list of possible cross-validation objects.

    Changed in 0.22 cv default value if None changed from 3-fold to 5-fold.

  • dual: bool, default=False Dual (constrained) or primal (regularized, see also this equation: regularized-logistic-loss) formulation. Dual formulation is only implemented for l2 penalty with liblinear solver. Prefer dual=False when n_samples > n_features.

  • penalty: {'l1', 'l2', 'elasticnet'}, default='l2' Specify the norm of the penalty:

    • 'l2': add a L2 penalty term (used by default);
    • 'l1': add a L1 penalty term;
    • 'elasticnet': both L1 and L2 penalty terms are added.

⚠️ Warning

Some penalties may not work with some solvers. See the parameter solver below, to know the compatibility between the penalty and solver.

Deprecated since 1.8 penalty was deprecated in version 1.8 and will be removed in 1.10. Use l1_ratio instead: l1_ratio=0 for penalty='l2', l1_ratio=1 for penalty='l1', and any float between 0 and 1 for penalty='elasticnet'.


  • scoring: str or callable, default=None The scoring method to use for cross-validation. Options:

    • str: see :ref:scoring_string_names for options.
    • callable: a scorer callable object (e.g., function) with signature scorer(estimator, X, y). See :ref:scoring_callable for details.
    • None: accuracy: accuracy_score is used.
  • solver: {'lbfgs', 'liblinear', 'newton-cg', 'newton-cholesky', 'sag', 'saga'}, default='lbfgs'

    Algorithm to use in the optimization problem. Default is 'lbfgs'. To choose a solver, you might want to consider the following aspects:

    • 'lbfgs' is a good default solver because it works reasonably well for a wide class of problems.
    • For multiclass problems (n_classes >= 3), all solvers except 'liblinear' minimize the full multinomial loss, 'liblinear' will raise an error.
    • 'newton-cholesky' is a good choice for n_samples >> n_features * n_classes, especially with one-hot encoded categorical features with rare categories. Be aware that the memory usage of this solver has a quadratic dependency on n_features * n_classes because it explicitly computes the full Hessian matrix.
    • For small datasets, 'liblinear' is a good choice, whereas 'sag' and 'saga' are faster for large ones;
    • 'liblinear' might be slower in LogisticRegressionCV because it does not handle warm-starting.
    • 'liblinear' can only handle binary classification by default. To apply a one-versus-rest scheme for the multiclass setting one can wrap it with the ~sklearn.multiclass.OneVsRestClassifier.

⚠️ Warning

The choice of the algorithm depends on the penalty (l1_ratio=0 for L2-penalty, l1_ratio=1 for L1-penalty and 0 < l1_ratio < 1 for Elastic-Net) chosen and on (multinomial) multiclass support:

solver              l1_ratio                   multinomial multiclass
'lbfgs'             l1_ratio=0                 yes
'liblinear'         l1_ratio=1 or l1_ratio=0   no
'newton-cg'         l1_ratio=0                 yes
'newton-cholesky'   l1_ratio=0                 yes
'sag'               l1_ratio=0                 yes
'saga'              0 <= l1_ratio <= 1         yes

🛈 Note

'sag' and 'saga' fast convergence is only guaranteed on features with approximately the same scale. You can preprocess the data with a scaler from sklearn.preprocessing.

Added in 0.17 Stochastic Average Gradient (SAG) descent solver. Multinomial support in version 0.18.

Added in 0.19 SAGA solver.

Added in 1.2 newton-cholesky solver. Multinomial support in version 1.6.


  • tol: float, default=1e-4 Tolerance for stopping criteria.

  • max_iter: int, default=100 Maximum number of iterations of the optimization algorithm.

  • class_weight: dict or 'balanced', default=None Weights associated with classes in the form {class_label: weight}. If not given, all classes are supposed to have weight one.

    The "balanced" mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y)).

    Note that these weights will be multiplied with sample_weight (passed through the fit method) if sample_weight is specified.

    Added in 0.17 class_weight == 'balanced'

  • n_jobs: int, default=None Number of CPU cores used during the cross-validation loop. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.

  • verbose: int, default=0 For the 'liblinear', 'sag' and 'lbfgs' solvers set verbose to any positive number for verbosity.

  • refit: bool, default=True If set to True, the scores are averaged across all folds, and the coefs and the C that correspond to the best score are taken, and a final refit is done using these parameters. Otherwise the coefs, intercepts and C that correspond to the best scores across folds are averaged.

  • intercept_scaling: float, default=1 Useful only when the solver liblinear is used and self.fit_intercept is set to True. In this case, x becomes [x, self.intercept_scaling], i.e. a "synthetic" feature with constant value equal to intercept_scaling is appended to the instance vector. The intercept becomes intercept_scaling * synthetic_feature_weight.


🛈 Note

The synthetic feature weight is subject to L1 or L2 regularization as all other features. To lessen the effect of regularization on synthetic feature weight (and therefore on the intercept) intercept_scaling has to be increased.


  • random_state: int, RandomState instance, default=None Used when solver='sag', 'saga' or 'liblinear' to shuffle the data. Note that this only applies to the solver and not the cross-validation generator. See Glossary for details.

  • use_legacy_attributes: bool, default=True If True, use legacy values for attributes:

    • C_ is an ndarray of shape (n_classes,) with the same value repeated
    • l1_ratio_ is an ndarray of shape (n_classes,) with the same value repeated
    • coefs_paths_ is a dict with class labels as keys and ndarrays as values
    • scores_ is a dict with class labels as keys and ndarrays as values
    • n_iter_ is an ndarray of shape (1, n_folds, n_cs) or similar

    If False, use new values for attributes:

    • C_ is a float
    • l1_ratio_ is a float
    • coefs_paths_ is an ndarray of shape (n_folds, n_l1_ratios, n_cs, n_classes, n_features) For binary problems (n_classes=2), the 2nd last dimension is 1.
    • scores_ is an ndarray of shape (n_folds, n_l1_ratios, n_cs)
    • n_iter_ is an ndarray of shape (n_folds, n_l1_ratios, n_cs)

    Changed in 1.10 The default will change from True to False in version 1.10. Deprecated since 1.10 use_legacy_attributes will be deprecated in version 1.10 and be removed in 1.12.

Attributes

  • classes_: ndarray of shape (n_classes, ) A list of class labels known to the classifier.

  • coef_: ndarray of shape (1, n_features) or (n_classes, n_features) Coefficient of the features in the decision function.

    coef_ is of shape (1, n_features) when the given problem is binary.

  • intercept_: ndarray of shape (1,) or (n_classes,) Intercept (a.k.a. bias) added to the decision function.

    If fit_intercept is set to False, the intercept is set to zero. intercept_ is of shape (1,) when the problem is binary.

  • Cs_: ndarray of shape (n_cs) Array of C i.e. inverse of regularization parameter values used for cross-validation.

  • l1_ratios_: ndarray of shape (n_l1_ratios) Array of l1_ratios used for cross-validation. If l1_ratios=None is used (i.e. penalty is not 'elasticnet'), this is set to [None].

  • coefs_paths_: dict of ndarray of shape (n_folds, n_cs, n_dof) or (n_folds, n_cs, n_l1_ratios, n_dof) A dict with classes as the keys, and the path of coefficients obtained during cross-validating across each fold (n_folds) and then across each Cs (n_cs). The size of the coefficients is the number of degrees of freedom (n_dof), i.e. without intercept n_dof=n_features and with intercept n_dof=n_features+1. If penalty='elasticnet', there is an additional dimension for the number of l1_ratio values (n_l1_ratios), which gives a shape of (n_folds, n_cs, n_l1_ratios_, n_dof). See also parameter use_legacy_attributes.

  • scores_: dict A dict with classes as the keys, and the values as the grid of scores obtained during cross-validating each fold. The same score is repeated across all classes. Each dict value has shape (n_folds, n_cs) or (n_folds, n_cs, n_l1_ratios) if penalty='elasticnet'. See also parameter use_legacy_attributes.

  • C_: ndarray of shape (n_classes,) or (1,) The value of C that maps to the best score, repeated n_classes times. If refit is set to False, the best C is the average of the C's that correspond to the best score for each fold. C_ is of shape (1,) when the problem is binary. See also parameter use_legacy_attributes.

  • l1_ratio_: ndarray of shape (n_classes,) or (n_classes - 1,) The value of l1_ratio that maps to the best score, repeated n_classes times. If refit is set to False, the best l1_ratio is the average of the l1_ratio's that correspond to the best score for each fold. l1_ratio_ is of shape (1,) when the problem is binary. See also parameter use_legacy_attributes.

  • n_iter_: ndarray of shape (1, n_folds, n_cs) or (1, n_folds, n_cs, n_l1_ratios) Actual number of iterations for all classes, folds and Cs. If penalty='elasticnet', the shape is (1, n_folds, n_cs, n_l1_ratios). See also parameter use_legacy_attributes.

  • n_features_in_: int Number of features seen during fit.

    Added in 0.24

  • feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen during fit. Defined only when X has feature names that are all strings.

    Added in 1.0

See Also

  • LogisticRegression: Logistic regression without tuning the hyperparameter C.

Examples

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegressionCV
X, y = load_iris(return_X_y=True)
clf = LogisticRegressionCV(
    cv=5, random_state=0, use_legacy_attributes=False, l1_ratios=(0,)
).fit(X, y)
clf.predict(X[:2, :])
array([0, 0])
clf.predict_proba(X[:2, :]).shape
(2, 3)
clf.score(X, y)
0.98...
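
A corresponding metamorph.ml sketch, assuming the key :sklearn.classification/logistic-regression-cv and kebab-case versions of the parameters above (for example :cs and :cv):

(def lr-cv-pipe
  (mm/pipeline
   (ds-mm/set-inference-target 2)
   {:metamorph/id :model}
   ;; assumed model key; :cs and :cv mirror Cs and cv
   (ml/model {:model-type :sklearn.classification/logistic-regression-cv
              :cs 5
              :cv 2
              :max-iter 500})))

The pipeline is then fitted and applied like the other pipelines in this chapter; on a real dataset the number of folds given via :cv should of course be compatible with the class sizes.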


24.2.22 /mlp-classifier

name type default description
n-iter-no-change
learning-rate
activation
hidden-layer-sizes
tol
beta-2
early-stopping
nesterovs-momentum
batch-size
solver
shuffle
power-t
max-fun
beta-1
max-iter
random-state
momentum
learning-rate-init
alpha
warm-start
validation-fraction
verbose
epsilon
predict-proba?

Multi-layer Perceptron classifier.

This model optimizes the log-loss function using LBFGS or stochastic gradient descent.

Added in 0.18

Parameters

  • hidden_layer_sizes: array-like of shape (n_layers - 2,), default=(100,) The ith element represents the number of neurons in the ith hidden layer.

  • activation: {'identity', 'logistic', 'tanh', 'relu'}, default='relu' Activation function for the hidden layer.

    • 'identity', no-op activation, useful to implement linear bottleneck, returns f(x) = x

    • 'logistic', the logistic sigmoid function, returns f(x) = 1 / (1 + exp(-x)).

    • 'tanh', the hyperbolic tan function, returns f(x) = tanh(x).

    • 'relu', the rectified linear unit function, returns f(x) = max(0, x)

  • solver: {'lbfgs', 'sgd', 'adam'}, default='adam' The solver for weight optimization.

    • 'lbfgs' is an optimizer in the family of quasi-Newton methods.

    • 'sgd' refers to stochastic gradient descent.

    • 'adam' refers to a stochastic gradient-based optimizer proposed by Kingma, Diederik, and Jimmy Ba

    For a comparison between Adam optimizer and SGD, see :ref:sphx_glr_auto_examples_neural_networks_plot_mlp_training_curves.py.

    Note: The default solver 'adam' works pretty well on relatively large datasets (with thousands of training samples or more) in terms of both training time and validation score. For small datasets, however, 'lbfgs' can converge faster and perform better.

  • alpha: float, default=0.0001 Strength of the L2 regularization term. The L2 regularization term is divided by the sample size when added to the loss.

    For an example usage and visualization of varying regularization, see :ref:sphx_glr_auto_examples_neural_networks_plot_mlp_alpha.py.

  • batch_size: int, default='auto' Size of minibatches for stochastic optimizers. If the solver is 'lbfgs', the classifier will not use minibatch. When set to "auto", batch_size=min(200, n_samples).

  • learning_rate: {'constant', 'invscaling', 'adaptive'}, default='constant' Learning rate schedule for weight updates.

    • 'constant' is a constant learning rate given by 'learning_rate_init'.

    • 'invscaling' gradually decreases the learning rate at each time step 't' using an inverse scaling exponent of 'power_t'. effective_learning_rate = learning_rate_init / pow(t, power_t)

    • 'adaptive' keeps the learning rate constant to 'learning_rate_init' as long as training loss keeps decreasing. Each time two consecutive epochs fail to decrease training loss by at least tol, or fail to increase validation score by at least tol if 'early_stopping' is on, the current learning rate is divided by 5.

    Only used when solver='sgd'.

  • learning_rate_init: float, default=0.001 The initial learning rate used. It controls the step-size in updating the weights. Only used when solver='sgd' or 'adam'.

  • power_t: float, default=0.5 The exponent for inverse scaling learning rate. It is used in updating effective learning rate when the learning_rate is set to 'invscaling'. Only used when solver='sgd'.

  • max_iter: int, default=200 Maximum number of iterations. The solver iterates until convergence (determined by 'tol') or this number of iterations. For stochastic solvers ('sgd', 'adam'), note that this determines the number of epochs (how many times each data point will be used), not the number of gradient steps.

  • shuffle: bool, default=True Whether to shuffle samples in each iteration. Only used when solver='sgd' or 'adam'.

  • random_state: int, RandomState instance, default=None Determines random number generation for weights and bias initialization, train-test split if early stopping is used, and batch sampling when solver='sgd' or 'adam'. Pass an int for reproducible results across multiple function calls. See Glossary .

  • tol: float, default=1e-4 Tolerance for the optimization. When the loss or score is not improving by at least tol for n_iter_no_change consecutive iterations, unless learning_rate is set to 'adaptive', convergence is considered to be reached and training stops.

  • verbose: bool, default=False Whether to print progress messages to stdout.

  • warm_start: bool, default=False When set to True, reuse the solution of the previous call to fit as initialization, otherwise, just erase the previous solution. See the Glossary .

  • momentum: float, default=0.9 Momentum for gradient descent update. Should be between 0 and 1. Only used when solver='sgd'.

  • nesterovs_momentum: bool, default=True Whether to use Nesterov's momentum. Only used when solver='sgd' and momentum > 0.

  • early_stopping: bool, default=False Whether to use early stopping to terminate training when validation score is not improving. If set to True, it will automatically set aside validation_fraction of training data as validation and terminate training when validation score is not improving by at least tol for n_iter_no_change consecutive epochs. The split is stratified, except in a multilabel setting. If early stopping is False, then the training stops when the training loss does not improve by more than tol for n_iter_no_change consecutive passes over the training set. Only effective when solver='sgd' or 'adam'.

  • validation_fraction: float, default=0.1 The proportion of training data to set aside as validation set for early stopping. Must be between 0 and 1. Only used if early_stopping is True.

  • beta_1: float, default=0.9 Exponential decay rate for estimates of first moment vector in adam, should be in [0, 1). Only used when solver='adam'.

  • beta_2: float, default=0.999 Exponential decay rate for estimates of second moment vector in adam, should be in [0, 1). Only used when solver='adam'.

  • epsilon: float, default=1e-8 Value for numerical stability in adam. Only used when solver='adam'.

  • n_iter_no_change: int, default=10 Maximum number of epochs to not meet tol improvement. Only effective when solver='sgd' or 'adam'.

    Added in 0.20

  • max_fun: int, default=15000 Only used when solver='lbfgs'. Maximum number of loss function calls. The solver iterates until convergence (determined by 'tol'), number of iterations reaches max_iter, or this number of loss function calls. Note that number of loss function calls will be greater than or equal to the number of iterations for the MLPClassifier.

    Added in 0.22

Attributes

  • classes_: ndarray or list of ndarray of shape (n_classes,) Class labels for each output.

  • loss_: float The current loss computed with the loss function.

  • best_loss_: float or None The minimum loss reached by the solver throughout fitting. If early_stopping=True, this attribute is set to None. Refer to the best_validation_score_ fitted attribute instead.

  • loss_curve_: list of shape (n_iter_,) The ith element in the list represents the loss at the ith iteration.

  • validation_scores_: list of shape (n_iter_,) or None The score at each iteration on a held-out validation set. The score reported is the accuracy score. Only available if early_stopping=True, otherwise the attribute is set to None.

  • best_validation_score_: float or None The best validation score (i.e. accuracy score) that triggered the early stopping. Only available if early_stopping=True, otherwise the attribute is set to None.

  • t_: int The number of training samples seen by the solver during fitting.

  • coefs_: list of shape (n_layers - 1,) The ith element in the list represents the weight matrix corresponding to layer i.

  • intercepts_: list of shape (n_layers - 1,) The ith element in the list represents the bias vector corresponding to layer i + 1.

  • n_features_in_: int Number of features seen during fit.

    Added in 0.24

  • feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen during fit. Defined only when X has feature names that are all strings.

    Added in 1.0

  • n_iter_: int The number of iterations the solver has run.

  • n_layers_: int Number of layers.

  • n_outputs_: int Number of outputs.

  • out_activation_: str Name of the output activation function.

See Also

  • MLPRegressor: Multi-layer Perceptron regressor.
  • BernoulliRBM: Bernoulli Restricted Boltzmann Machine (RBM).

Notes

MLPClassifier trains iteratively since at each time step the partial derivatives of the loss function with respect to the model parameters are computed to update the parameters.

It can also have a regularization term added to the loss function that shrinks model parameters to prevent overfitting.

This implementation works with data represented as dense numpy arrays or sparse scipy arrays of floating point values.

References

Hinton, Geoffrey E. "Connectionist learning procedures." Artificial intelligence 40.1 (1989): 185-234.

Glorot, Xavier, and Yoshua Bengio. "Understanding the difficulty of training deep feedforward neural networks." International Conference on Artificial Intelligence and Statistics. 2010.

He, Kaiming, et al (2015). "Delving deep into rectifiers: Surpassing human-level performance on imagenet classification." (arXiv:1502.01852)

Kingma, Diederik, and Jimmy Ba (2014). "Adam: A method for stochastic optimization." (arXiv:1412.6980)

Examples

from sklearn.neural_network import MLPClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
X, y = make_classification(n_samples=100, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y,
                                                    random_state=1)
clf = MLPClassifier(random_state=1, max_iter=300).fit(X_train, y_train)
clf.predict_proba(X_test[:1])
array([[0.0383, 0.961]])
clf.predict(X_test[:5, :])
array([1, 0, 1, 0, 1])
clf.score(X_test, y_test)
0.8...
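
A minimal Clojure counterpart, assuming the key :sklearn.classification/mlp-classifier; note that :hidden-layer-sizes is passed as a Clojure vector, which sklearn-clj is expected to hand to Python as a sequence:

(def mlp-pipe
  (mm/pipeline
   (ds-mm/set-inference-target 2)
   {:metamorph/id :model}
   ;; assumed model key; vector value mirrors hidden_layer_sizes
   (ml/model {:model-type :sklearn.classification/mlp-classifier
              :hidden-layer-sizes [8]
              :max-iter 300
              :random-state 1})))

;; Fit on a tiny two-class dataset; the last column is the target.
(def mlp-ctx
  (mlp-pipe {:metamorph/data (dst/tensor->dataset
                              [[0 0 0] [1 0 0] [4 5 1] [5 5 1]])
             :metamorph/mode :fit}))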


24.2.23 /multinomial-nb

name type default description
alpha
class-prior
fit-prior
force-alpha
predict-proba?

Naive Bayes classifier for multinomial models.

The multinomial Naive Bayes classifier is suitable for classification with discrete features (e.g., word counts for text classification). The multinomial distribution normally requires integer feature counts. However, in practice, fractional counts such as tf-idf may also work.

Read more in the User Guide: multinomial_naive_bayes.

Parameters

  • alpha: float or array-like of shape (n_features,), default=1.0 Additive (Laplace/Lidstone) smoothing parameter (set alpha=0 and force_alpha=True, for no smoothing).

  • force_alpha: bool, default=True If False and alpha is less than 1e-10, it will set alpha to 1e-10. If True, alpha will remain unchanged. This may cause numerical errors if alpha is too close to 0.

    Added in 1.2 Changed in 1.4 The default value of force_alpha changed to True.

  • fit_prior: bool, default=True Whether to learn class prior probabilities or not. If false, a uniform prior will be used.

  • class_prior: array-like of shape (n_classes,), default=None Prior probabilities of the classes. If specified, the priors are not adjusted according to the data.

Attributes

  • class_count_: ndarray of shape (n_classes,) Number of samples encountered for each class during fitting. This value is weighted by the sample weight when provided.

  • class_log_prior_: ndarray of shape (n_classes,) Smoothed empirical log probability for each class.

  • classes_: ndarray of shape (n_classes,) Class labels known to the classifier

  • feature_count_: ndarray of shape (n_classes, n_features) Number of samples encountered for each (class, feature) during fitting. This value is weighted by the sample weight when provided.

  • feature_log_prob_: ndarray of shape (n_classes, n_features) Empirical log probability of features given a class, P(x_i|y).

  • n_features_in_: int Number of features seen during fit.

    Added in 0.24

  • feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen during fit. Defined only when X has feature names that are all strings.

    Added in 1.0

See Also

  • BernoulliNB: Naive Bayes classifier for multivariate Bernoulli models.
  • CategoricalNB: Naive Bayes classifier for categorical features.
  • ComplementNB: Complement Naive Bayes classifier.
  • GaussianNB: Gaussian Naive Bayes.

References

C.D. Manning, P. Raghavan and H. Schuetze (2008). Introduction to Information Retrieval. Cambridge University Press, pp. 234-265. https://nlp.stanford.edu/IR-book/html/htmledition/naive-bayes-text-classification-1.html

Examples

import numpy as np
rng = np.random.RandomState(1)
X = rng.randint(5, size=(6, 100))
y = np.array([1, 2, 3, 4, 5, 6])
from sklearn.naive_bayes import MultinomialNB
clf = MultinomialNB()
clf.fit(X, y)
MultinomialNB()
print(clf.predict(X[2:3]))
[3]
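
A hedged Clojure sketch, assuming the key :sklearn.classification/multinomial-nb; the features must be non-negative counts (or count-like values such as tf-idf):

(def mnb-pipe
  (mm/pipeline
   (ds-mm/set-inference-target 2)
   {:metamorph/id :model}
   ;; assumed model key; :alpha mirrors the smoothing parameter alpha
   (ml/model {:model-type :sklearn.classification/multinomial-nb
              :alpha 1.0})))

;; Fit on tiny count data; the last column is the target.
(def mnb-ctx
  (mnb-pipe {:metamorph/data (dst/tensor->dataset
                              [[3 0 0] [4 1 0] [0 5 1] [1 6 1]])
             :metamorph/mode :fit}))

(-> (mm/transform-pipe (dst/tensor->dataset [[0 4 0]]) mnb-pipe mnb-ctx)
    :metamorph/data)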


24.2.24 /nearest-centroid

name type default description
metric
priors
shrink-threshold
predict-proba?

Nearest centroid classifier.

Each class is represented by its centroid, with test samples classified to the class with the nearest centroid.

Read more in the User Guide: nearest_centroid_classifier.

Parameters

  • metric: {"euclidean", "manhattan"}, default="euclidean" Metric to use for distance computation.

    If metric="euclidean", the centroid for the samples corresponding to each class is the arithmetic mean, which minimizes the sum of squared L1 distances. If metric="manhattan", the centroid is the feature-wise median, which minimizes the sum of L1 distances.

    Changed in 1.5 All metrics but "euclidean" and "manhattan" were deprecated and now raise an error.

    Changed in 0.19 metric='precomputed' was deprecated and now raises an error

  • shrink_threshold: float, default=None Threshold for shrinking centroids to remove features.

  • priors: {"uniform", "empirical"} or array-like of shape (n_classes,), default="uniform" The class prior probabilities. By default, the class proportions are inferred from the training data.

    Added in 1.6

Attributes

  • centroids_: array-like of shape (n_classes, n_features) Centroid of each class.

  • classes_: array of shape (n_classes,) The unique classes labels.

  • n_features_in_: int Number of features seen during fit.

    Added in 0.24

  • feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen during fit. Defined only when X has feature names that are all strings.

    Added in 1.0

  • deviations_: ndarray of shape (n_classes, n_features) Deviations (or shrinkages) of the centroids of each class from the overall centroid. Equal to eq. (18.4) if shrink_threshold=None, else (18.5) p. 653 of [2]. Can be used to identify features used for classification.

    Added in 1.6

  • within_class_std_dev_: ndarray of shape (n_features,) Pooled or within-class standard deviation of input data.

    Added in 1.6

  • class_prior_: ndarray of shape (n_classes,) The class prior probabilities.

    Added in 1.6

See Also

  • KNeighborsClassifier: Nearest neighbors classifier.

Notes

When used for text classification with tf-idf vectors, this classifier is also known as the Rocchio classifier.

References

[1] Tibshirani, R., Hastie, T., Narasimhan, B., & Chu, G. (2002). Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proceedings of the National Academy of Sciences of the United States of America, 99(10), 6567-6572. The National Academy of Sciences.

[2] Hastie, T., Tibshirani, R., Friedman, J. (2009). The Elements of Statistical Learning Data Mining, Inference, and Prediction. 2nd Edition. New York, Springer.

Examples

from sklearn.neighbors import NearestCentroid
import numpy as np
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
y = np.array([1, 1, 1, 2, 2, 2])
clf = NearestCentroid()
clf.fit(X, y)
NearestCentroid()
print(clf.predict([[-0.8, -1]]))
[1]
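
The equivalent usage from Clojure, sketched under the assumption that the model key is :sklearn.classification/nearest-centroid and that :metric maps onto metric:

(def centroid-pipe
  (mm/pipeline
   (ds-mm/set-inference-target 2)
   {:metamorph/id :model}
   ;; assumed model key
   (ml/model {:model-type :sklearn.classification/nearest-centroid
              :metric "euclidean"})))

;; Fit on a tiny two-class dataset; the last column is the target.
(def centroid-ctx
  (centroid-pipe {:metamorph/data (dst/tensor->dataset
                                   [[-1 -1 1] [-2 -1 1] [1 1 2] [2 1 2]])
                  :metamorph/mode :fit}))

(-> (mm/transform-pipe (dst/tensor->dataset [[-0.8 -1 1]]) centroid-pipe centroid-ctx)
    :metamorph/data)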


24.2.25 /nu-svc

name type default description
break-ties
kernel
gamma
degree
decision-function-shape
probability
tol
nu
shrinking
max-iter
random-state
coef-0
class-weight
cache-size
verbose
predict-proba?

Nu-Support Vector Classification.

Similar to SVC but uses a parameter to control the number of support vectors.

The implementation is based on libsvm.

Read more in the User Guide: svm_classification.

Parameters

  • nu: float, default=0.5 An upper bound on the fraction of margin errors (see User Guide: nu_svc) and a lower bound of the fraction of support vectors. Should be in the interval (0, 1].

  • kernel: {'linear', 'poly', 'rbf', 'sigmoid', 'precomputed'} or callable, default='rbf' Specifies the kernel type to be used in the algorithm. If none is given, 'rbf' will be used. If a callable is given it is used to precompute the kernel matrix. For an intuitive visualization of different kernel types see :ref:sphx_glr_auto_examples_svm_plot_svm_kernels.py.

  • degree: int, default=3 Degree of the polynomial kernel function ('poly'). Must be non-negative. Ignored by all other kernels.

  • gamma: {'scale', 'auto'} or float, default='scale' Kernel coefficient for 'rbf', 'poly' and 'sigmoid'.

    • if gamma='scale' (default) is passed then it uses 1 / (n_features * X.var()) as value of gamma,
    • if 'auto', uses 1 / n_features
    • if float, must be non-negative.

    Changed in 0.22 The default value of gamma changed from 'auto' to 'scale'.

  • coef0: float, default=0.0 Independent term in kernel function. It is only significant in 'poly' and 'sigmoid'.

  • shrinking: bool, default=True Whether to use the shrinking heuristic. See the User Guide: shrinking_svm.

  • probability: bool, default=False Whether to enable probability estimates. This must be enabled prior to calling fit, will slow down that method as it internally uses 5-fold cross-validation, and predict_proba may be inconsistent with predict. Read more in the User Guide: scores_probabilities.

  • tol: float, default=1e-3 Tolerance for stopping criterion.

  • cache_size: float, default=200 Specify the size of the kernel cache (in MB).

  • class_weight: {dict, 'balanced'}, default=None Set the parameter C of class i to class_weight[i]*C for SVC. If not given, all classes are supposed to have weight one. The "balanced" mode uses the values of y to automatically adjust weights inversely proportional to class frequencies as n_samples / (n_classes * np.bincount(y)).

  • verbose: bool, default=False Enable verbose output. Note that this setting takes advantage of a per-process runtime setting in libsvm that, if enabled, may not work properly in a multithreaded context.

  • max_iter: int, default=-1 Hard limit on iterations within solver, or -1 for no limit.

  • decision_function_shape: {'ovo', 'ovr'}, default='ovr' Whether to return a one-vs-rest ('ovr') decision function of shape (n_samples, n_classes) as all other classifiers, or the original one-vs-one ('ovo') decision function of libsvm which has shape (n_samples, n_classes * (n_classes - 1) / 2). However, one-vs-one ('ovo') is always used as multi-class strategy. The parameter is ignored for binary classification.

    Changed in 0.19 decision_function_shape is 'ovr' by default.

    Added in 0.17 decision_function_shape='ovr' is recommended.

    Changed in 0.17 Deprecated decision_function_shape='ovo' and None.

  • break_ties: bool, default=False If true, decision_function_shape='ovr', and number of classes > 2, predict will break ties according to the confidence values of decision_function; otherwise the first class among the tied classes is returned. Please note that breaking ties comes at a relatively high computational cost compared to a simple predict. See :ref:sphx_glr_auto_examples_svm_plot_svm_tie_breaking.py for an example of its usage with decision_function_shape='ovr'.

    Added in 0.22

  • random_state: int, RandomState instance or None, default=None Controls the pseudo random number generation for shuffling the data for probability estimates. Ignored when probability is False. Pass an int for reproducible output across multiple function calls. See Glossary .

Attributes

  • class_weight_: ndarray of shape (n_classes,) Multipliers of parameter C of each class. Computed based on the class_weight parameter.

  • classes_: ndarray of shape (n_classes,) The unique classes labels.

  • coef_: ndarray of shape (n_classes * (n_classes -1) / 2, n_features) Weights assigned to the features (coefficients in the primal problem). This is only available in the case of a linear kernel.

    coef_ is a readonly property derived from dual_coef_ and support_vectors_.

  • dual_coef_: ndarray of shape (n_classes - 1, n_SV) Dual coefficients of the support vector in the decision function (see :ref:sgd_mathematical_formulation), multiplied by their targets. For multiclass, coefficient for all 1-vs-1 classifiers. The layout of the coefficients in the multiclass case is somewhat non-trivial. See the multi-class section of the User Guide: svm_multi_class for details.

  • fit_status_: int 0 if correctly fitted, 1 if the algorithm did not converge.

  • intercept_: ndarray of shape (n_classes * (n_classes - 1) / 2,) Constants in decision function.

  • n_features_in_: int Number of features seen during fit.

    Added in 0.24

  • feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen during fit. Defined only when X has feature names that are all strings.

    Added in 1.0

  • n_iter_: ndarray of shape (n_classes * (n_classes - 1) // 2,) Number of iterations run by the optimization routine to fit the model. The shape of this attribute depends on the number of models optimized which in turn depends on the number of classes.

    Added in 1.1

  • support_: ndarray of shape (n_SV,) Indices of support vectors.

  • support_vectors_: ndarray of shape (n_SV, n_features) Support vectors.

  • n_support_: ndarray of shape (n_classes,), dtype=int32 Number of support vectors for each class.

  • probA_: ndarray of shape (n_classes * (n_classes - 1) / 2,)

  • probB_: ndarray of shape (n_classes * (n_classes - 1) / 2,) If probability=True, it corresponds to the parameters learned in Platt scaling to produce probability estimates from decision values. If probability=False, it's an empty array. Platt scaling uses the logistic function 1 / (1 + exp(decision_value * probA_ + probB_)) where probA_ and probB_ are learned from the dataset [2]. For more information on the multiclass case and training procedure see section 8 of [1].

  • shape_fit_: tuple of int of shape (n_dimensions_of_X,) Array dimensions of training vector X.

See Also

  • SVC: Support Vector Machine for classification using libsvm.

  • LinearSVC: Scalable linear Support Vector Machine for classification using liblinear.

References

Examples

import numpy as np
X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
y = np.array([1, 1, 2, 2])
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import NuSVC
clf = make_pipeline(StandardScaler(), NuSVC())
clf.fit(X, y)
Pipeline(steps=[('standardscaler', StandardScaler()), ('nusvc', NuSVC())])
print(clf.predict([[-0.8, -1]]))
[1]
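
A minimal metamorph.ml sketch, assuming the key :sklearn.classification/nu-svc and the kebab-case keys :nu and :kernel; note that, unlike the Python pipeline above, no feature-scaling step is included here:

(def nu-svc-pipe
  (mm/pipeline
   (ds-mm/set-inference-target 2)
   {:metamorph/id :model}
   ;; assumed model key; :nu mirrors nu, :kernel mirrors kernel
   (ml/model {:model-type :sklearn.classification/nu-svc
              :nu 0.5
              :kernel "rbf"})))

;; Fit on a tiny two-class dataset; the last column is the target.
(def nu-svc-ctx
  (nu-svc-pipe {:metamorph/data (dst/tensor->dataset
                                 [[-1 -1 1] [-2 -1 1] [1 1 2] [2 1 2]])
                :metamorph/mode :fit}))

(-> (mm/transform-pipe (dst/tensor->dataset [[-0.8 -1 1]]) nu-svc-pipe nu-svc-ctx)
    :metamorph/data)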


24.2.26 /passive-aggressive-classifier

name type default description
n-iter-no-change
average
tol
early-stopping
shuffle
c
max-iter
n-jobs
random-state
fit-intercept
warm-start
validation-fraction
class-weight
loss
verbose
predict-proba?

Passive Aggressive Classifier.

Deprecated since 1.8 The whole class PassiveAggressiveClassifier was deprecated in version 1.8 and will be removed in 1.10. Instead use:

    clf = SGDClassifier(
        loss="hinge",
        penalty=None,
        learning_rate="pa1",  # or "pa2"
        eta0=1.0,  # for parameter C
    )

Read more in the User Guide: passive_aggressive.

Parameters

  • C: float, default=1.0 Aggressiveness parameter for the passive-aggressive algorithm, see [1]. For PA-I it is the maximum step size. For PA-II it regularizes the step size (the smaller C the more it regularizes). As a general rule-of-thumb, C should be small when the data is noisy.

  • fit_intercept: bool, default=True Whether the intercept should be estimated or not. If False, the data is assumed to be already centered.

  • max_iter: int, default=1000 The maximum number of passes over the training data (aka epochs). It only impacts the behavior in the fit method, and not the ~sklearn.linear_model.PassiveAggressiveClassifier.partial_fit method.

    Added in 0.19

  • tol: float or None, default=1e-3 The stopping criterion. If it is not None, the iterations will stop when (loss > previous_loss - tol).

    Added in 0.19

  • early_stopping: bool, default=False Whether to use early stopping to terminate training when validation score is not improving. If set to True, it will automatically set aside a stratified fraction of training data as validation and terminate training when validation score is not improving by at least tol for n_iter_no_change consecutive epochs.

    Added in 0.20

  • validation_fraction: float, default=0.1 The proportion of training data to set aside as validation set for early stopping. Must be between 0 and 1. Only used if early_stopping is True.

    Added in 0.20

  • n_iter_no_change: int, default=5 Number of iterations with no improvement to wait before early stopping.

    Added in 0.20

  • shuffle: bool, default=True Whether or not the training data should be shuffled after each epoch.

  • verbose: int, default=0 The verbosity level.

  • loss: str, default="hinge" The loss function to be used: hinge: equivalent to PA-I in the reference paper. squared_hinge: equivalent to PA-II in the reference paper.

  • n_jobs: int or None, default=None The number of CPUs to use to do the OVA (One Versus All, for multi-class problems) computation. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.

  • random_state: int, RandomState instance, default=None Used to shuffle the training data, when shuffle is set to True. Pass an int for reproducible output across multiple function calls. See Glossary.

  • warm_start: bool, default=False When set to True, reuse the solution of the previous call to fit as initialization, otherwise, just erase the previous solution. See the Glossary.

    Repeatedly calling fit or partial_fit when warm_start is True can result in a different solution than when calling fit a single time because of the way the data is shuffled.

  • class_weight: dict, {class_label: weight} or "balanced" or None, default=None Preset for the class_weight fit parameter.

    Weights associated with classes. If not given, all classes are supposed to have weight one.

    The "balanced" mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y)).

    Added in 0.17 parameter class_weight to automatically weight samples.

  • average: bool or int, default=False When set to True, computes the averaged SGD weights and stores the result in the coef_ attribute. If set to an int greater than 1, averaging will begin once the total number of samples seen reaches average. So average=10 will begin averaging after seeing 10 samples.

    Added in 0.19 parameter average to use weights averaging in SGD.

Attributes

  • coef_: ndarray of shape (1, n_features) if n_classes == 2 else (n_classes, n_features) Weights assigned to the features.

  • intercept_: ndarray of shape (1,) if n_classes == 2 else (n_classes,) Constants in decision function.

  • n_features_in_: int Number of features seen during fit.

    Added in 0.24

  • feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen during fit. Defined only when X has feature names that are all strings.

    Added in 1.0

  • n_iter_: int The actual number of iterations to reach the stopping criterion. For multiclass fits, it is the maximum over every binary fit.

  • classes_: ndarray of shape (n_classes,) The unique classes labels.

  • t_: int Number of weight updates performed during training. Same as (n_iter_ * n_samples + 1).

See Also

  • SGDClassifier: Incrementally trained logistic regression.
  • Perceptron: Linear perceptron classifier.

References

Examples

from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.datasets import make_classification
X, y = make_classification(n_features=4, random_state=0)
clf = PassiveAggressiveClassifier(max_iter=1000, random_state=0,
                                  tol=1e-3)
clf.fit(X, y)
PassiveAggressiveClassifier(random_state=0)
print(clf.coef_)
[[0.26642044 0.45070924 0.67251877 0.64185414]]
print(clf.intercept_)
[1.84127814]
print(clf.predict([[0, 0, 0, 0]]))
[1]
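
As a sketch (assuming the registered key :sklearn.classification/passive-aggressive-classifier), the configuration of the Python example translates to kebab-case options in a metamorph.ml pipeline:

(def pa-pipe
  (mm/pipeline
   (ds-mm/set-inference-target 2)
   {:metamorph/id :model}
   (ml/model {:model-type :sklearn.classification/passive-aggressive-classifier
              :max-iter 1000
              :random-state 0
              :tol 1e-3})))

Given the deprecation note above, new code may prefer :sklearn.classification/sgd-classifier with :learning-rate "pa1" or "pa2" instead.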


24.2.27 /perceptron

name type default description
n-iter-no-change
tol
early-stopping
eta-0
shuffle
penalty
max-iter
n-jobs
random-state
fit-intercept
alpha
warm-start
l-1-ratio
validation-fraction
class-weight
verbose
predict-proba?

Linear perceptron classifier.

The implementation is a wrapper around ~sklearn.linear_model.SGDClassifier by fixing the loss and learning_rate parameters as

SGDClassifier(loss="perceptron", learning_rate="constant")

Other available parameters are described below and are forwarded to ~sklearn.linear_model.SGDClassifier.

Read more in the User Guide: perceptron.

Parameters

  • penalty: {'l2','l1','elasticnet'}, default=None The penalty (aka regularization term) to be used.

  • alpha: float, default=0.0001 Constant that multiplies the regularization term if regularization is used.

  • l1_ratio: float, default=0.15 The Elastic Net mixing parameter, with 0 <= l1_ratio <= 1. l1_ratio=0 corresponds to L2 penalty, l1_ratio=1 to L1. Only used if penalty='elasticnet'.

    Added in 0.24

  • fit_intercept: bool, default=True Whether the intercept should be estimated or not. If False, the data is assumed to be already centered.

  • max_iter: int, default=1000 The maximum number of passes over the training data (aka epochs). It only impacts the behavior in the fit method, and not the partial_fit method.

    Added in 0.19

  • tol: float or None, default=1e-3 The stopping criterion. If it is not None, the iterations will stop when (loss > previous_loss - tol).

    Added in 0.19

  • shuffle: bool, default=True Whether or not the training data should be shuffled after each epoch.

  • verbose: int, default=0 The verbosity level.

  • eta0: float, default=1 Constant by which the updates are multiplied.

  • n_jobs: int, default=None The number of CPUs to use to do the OVA (One Versus All, for multi-class problems) computation. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.

  • random_state: int, RandomState instance or None, default=0 Used to shuffle the training data, when shuffle is set to True. Pass an int for reproducible output across multiple function calls. See Glossary.

  • early_stopping: bool, default=False Whether to use early stopping to terminate training when validation score is not improving. If set to True, it will automatically set aside a stratified fraction of training data as validation and terminate training when validation score is not improving by at least tol for n_iter_no_change consecutive epochs.

    Added in 0.20

  • validation_fraction: float, default=0.1 The proportion of training data to set aside as validation set for early stopping. Must be between 0 and 1. Only used if early_stopping is True.

    Added in 0.20

  • n_iter_no_change: int, default=5 Number of iterations with no improvement to wait before early stopping.

    Added in 0.20

  • class_weight: dict, {class_label: weight} or "balanced", default=None Preset for the class_weight fit parameter.

    Weights associated with classes. If not given, all classes are supposed to have weight one.

    The "balanced" mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y)).

  • warm_start: bool, default=False When set to True, reuse the solution of the previous call to fit as initialization, otherwise, just erase the previous solution. See the Glossary.

Attributes

  • classes_: ndarray of shape (n_classes,) The unique classes labels.

  • coef_: ndarray of shape (1, n_features) if n_classes == 2 else (n_classes, n_features) Weights assigned to the features.

  • intercept_: ndarray of shape (1,) if n_classes == 2 else (n_classes,) Constants in decision function.

  • n_features_in_: int Number of features seen during fit.

    Added in 0.24

  • feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen during fit. Defined only when X has feature names that are all strings.

    Added in 1.0

  • n_iter_: int The actual number of iterations to reach the stopping criterion. For multiclass fits, it is the maximum over every binary fit.

  • t_: int Number of weight updates performed during training. Same as (n_iter_ * n_samples + 1).

See Also

  • sklearn.linear_model.SGDClassifier: Linear classifiers (SVM, logistic regression, etc.) with SGD training.

Notes

Perceptron is a classification algorithm which shares the same underlying implementation with SGDClassifier. In fact, Perceptron() is equivalent to SGDClassifier(loss="perceptron", eta0=1, learning_rate="constant", penalty=None).

References

https://en.wikipedia.org/wiki/Perceptron and references therein.

Examples

from sklearn.datasets import load_digits
from sklearn.linear_model import Perceptron
X, y = load_digits(return_X_y=True)
clf = Perceptron(tol=1e-3, random_state=0)
clf.fit(X, y)
Perceptron()
clf.score(X, y)
0.939...
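
A corresponding model step in metamorph.ml might look like this (a sketch; the key :sklearn.classification/perceptron is assumed, and the options are the kebab-case forms of the parameters used in the Python example plus eta0):

(ml/model {:model-type :sklearn.classification/perceptron
           :tol 1e-3
           :random-state 0
           :eta-0 1.0})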


24.2.28 /quadratic-discriminant-analysis

name type default description
covariance-estimator
priors
reg-param
shrinkage
solver
store-covariance
tol
predict-proba?

Quadratic Discriminant Analysis.

A classifier with a quadratic decision boundary, generated by fitting class conditional densities to the data and using Bayes' rule.

The model fits a Gaussian density to each class.

Added in 0.17

For a comparison between ~sklearn.discriminant_analysis.QuadraticDiscriminantAnalysis and ~sklearn.discriminant_analysis.LinearDiscriminantAnalysis, see :ref:sphx_glr_auto_examples_classification_plot_lda_qda.py.

Read more in the User Guide: lda_qda.

Parameters

  • solver: {'svd', 'eigen'}, default='svd' Solver to use, possible values:

    • 'svd': Singular value decomposition (default). Does not compute the covariance matrix, therefore this solver is recommended for data with a large number of features.
    • 'eigen': Eigenvalue decomposition. Can be combined with shrinkage or custom covariance estimator.

  • shrinkage: 'auto' or float, default=None Shrinkage parameter, possible values:

    • None: no shrinkage (default).
    • 'auto': automatic shrinkage using the Ledoit-Wolf lemma.
    • float between 0 and 1: fixed shrinkage parameter.

    Enabling shrinkage is expected to improve the model when some classes have a relatively small number of training data points compared to the number of features by mitigating overfitting during the covariance estimation step.

    This should be left to None if covariance_estimator is used. Note that shrinkage works only with 'eigen' solver.

  • priors: array-like of shape (n_classes,), default=None Class priors. By default, the class proportions are inferred from the training data.

  • reg_param: float, default=0.0 Regularizes the per-class covariance estimates by transforming S2 as S2 = (1 - reg_param) * S2 + reg_param * np.eye(n_features), where S2 corresponds to the scaling_ attribute of a given class.

  • store_covariance: bool, default=False If True, the class covariance matrices are explicitly computed and stored in the self.covariance_ attribute.

    Added in 0.17

  • tol: float, default=1.0e-4 Absolute threshold for the covariance matrix to be considered rank deficient after applying some regularization (see reg_param) to each Sk where Sk represents covariance matrix for k-th class. This parameter does not affect the predictions. It controls when a warning is raised if the covariance matrix is not full rank.

    Added in 0.17

  • covariance_estimator: covariance estimator, default=None If not None, covariance_estimator is used to estimate the covariance matrices instead of relying on the empirical covariance estimator (with potential shrinkage). The object should have a fit method and a covariance_ attribute like the estimators in sklearn.covariance. If None the shrinkage parameter drives the estimate.

    This should be left to None if shrinkage is used. Note that covariance_estimator works only with the 'eigen' solver.

Attributes

  • covariance_: list of len n_classes of ndarray of shape (n_features, n_features) For each class, gives the covariance matrix estimated using the samples of that class. The estimations are unbiased. Only present if store_covariance is True.

  • means_: array-like of shape (n_classes, n_features) Class-wise means.

  • priors_: array-like of shape (n_classes,) Class priors (sum to 1).

  • rotations_: list of len n_classes of ndarray of shape (n_features, n_k) For each class k an array of shape (n_features, n_k), where n_k = min(n_features, number of elements in class k) It is the rotation of the Gaussian distribution, i.e. its principal axis. It corresponds to V, the matrix of eigenvectors coming from the SVD of Xk = U S Vt where Xk is the centered matrix of samples from class k.

  • scalings_: list of len n_classes of ndarray of shape (n_k,) For each class, contains the scaling of the Gaussian distributions along its principal axes, i.e. the variance in the rotated coordinate system. It corresponds to S^2 / (n_samples - 1), where S is the diagonal matrix of singular values from the SVD of Xk, where Xk is the centered matrix of samples from class k.

  • classes_: ndarray of shape (n_classes,) Unique class labels.

  • n_features_in_: int Number of features seen during fit.

    Added in 0.24

  • feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen during fit. Defined only when X has feature names that are all strings.

    Added in 1.0

See Also

  • LinearDiscriminantAnalysis: Linear Discriminant Analysis.

Examples

from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
import numpy as np
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
y = np.array([1, 1, 1, 2, 2, 2])
clf = QuadraticDiscriminantAnalysis()
clf.fit(X, y)
QuadraticDiscriminantAnalysis()
print(clf.predict([[-0.8, -1]]))
[1]
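
A hedged Clojure sketch of the same example, assuming the key :sklearn.classification/quadratic-discriminant-analysis and the chapter's aliases (store_covariance becomes :store-covariance):

(def qda-pipe
  (mm/pipeline
   (ds-mm/set-inference-target 2)   ;; column 2 holds the class label
   {:metamorph/id :model}
   (ml/model {:model-type :sklearn.classification/quadratic-discriminant-analysis
              :store-covariance true})))

(def qda-ctx
  (qda-pipe {:metamorph/data (dst/tensor->dataset
                              [[-1 -1 1] [-2 -1 1] [-3 -2 1]
                               [1 1 2] [2 1 2] [3 2 2]])
             :metamorph/mode :fit}))

(-> (mm/transform-pipe (dst/tensor->dataset [[-0.8 -1 0]]) qda-pipe qda-ctx)
    :metamorph/data)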


24.2.29 /radius-neighbors-classifier

name type default description
weights
p
leaf-size
metric-params
radius
outlier-label
algorithm
n-jobs
metric
predict-proba?

Classifier implementing a vote among neighbors within a given radius.

Read more in the User Guide: classification.

Parameters

  • radius: float, default=1.0 Range of parameter space to use by default for radius_neighbors queries.

  • weights: {'uniform', 'distance'}, callable or None, default='uniform' Weight function used in prediction. Possible values:

    • 'uniform' : uniform weights. All points in each neighborhood are weighted equally.
    • 'distance' : weight points by the inverse of their distance. In this case, closer neighbors of a query point will have a greater influence than neighbors which are further away.
    • [callable] : a user-defined function which accepts an array of distances, and returns an array of the same shape containing the weights.

    Uniform weights are used by default.

  • algorithm: {'auto', 'ball_tree', 'kd_tree', 'brute'}, default='auto' Algorithm used to compute the nearest neighbors:

    • 'ball_tree' will use BallTree
    • 'kd_tree' will use KDTree
    • 'brute' will use a brute-force search.
    • 'auto' will attempt to decide the most appropriate algorithm based on the values passed to fit method.

    Note: fitting on sparse input will override the setting of this parameter, using brute force.

  • leaf_size: int, default=30 Leaf size passed to BallTree or KDTree. This can affect the speed of the construction and query, as well as the memory required to store the tree. The optimal value depends on the nature of the problem.

  • p: float, default=2 Power parameter for the Minkowski metric. When p = 1, this is equivalent to using manhattan_distance (l1), and euclidean_distance (l2) for p = 2. For arbitrary p, minkowski_distance (l_p) is used. This parameter is expected to be positive.

  • metric: str or callable, default='minkowski' Metric to use for distance computation. Default is "minkowski", which results in the standard Euclidean distance when p = 2. See the documentation of scipy.spatial.distance and the metrics listed in ~sklearn.metrics.pairwise.distance_metrics for valid metric values.

    If metric is "precomputed", X is assumed to be a distance matrix and must be square during fit. X may be a sparse graph, in which case only "nonzero" elements may be considered neighbors.

    If metric is a callable function, it takes two arrays representing 1D vectors as inputs and must return one value indicating the distance between those vectors. This works for Scipy's metrics, but is less efficient than passing the metric name as a string.

  • outlier_label: {manual label, 'most_frequent'}, default=None Label for outlier samples (samples with no neighbors in given radius).

    • manual label: str or int label (should be the same type as y) or list of manual labels if multi-output is used.
    • 'most_frequent' : assign the most frequent label of y to outliers.
    • None : when any outlier is detected, ValueError will be raised.

    The outlier label should be selected from among the unique 'Y' labels. If it is specified with a different value a warning will be raised and all class probabilities of outliers will be assigned to be 0.

  • metric_params: dict, default=None Additional keyword arguments for the metric function.

  • n_jobs: int, default=None The number of parallel jobs to run for neighbors search. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.

Attributes

  • classes_: ndarray of shape (n_classes,) Class labels known to the classifier.

  • effective_metric_: str or callable The distance metric used. It will be same as the metric parameter or a synonym of it, e.g. 'euclidean' if the metric parameter set to 'minkowski' and p parameter set to 2.

  • effective_metric_params_: dict Additional keyword arguments for the metric function. For most metrics will be same with metric_params parameter, but may also contain the p parameter value if the effective_metric_ attribute is set to 'minkowski'.

  • n_features_in_: int Number of features seen during fit.

    Added in 0.24

  • feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen during fit. Defined only when X has feature names that are all strings.

    Added in 1.0

  • n_samples_fit_: int Number of samples in the fitted data.

  • outlier_label_: int or array-like of shape (n_class,) Label which is given for outlier samples (samples with no neighbors on given radius).

  • outputs_2d_: bool False when y's shape is (n_samples, ) or (n_samples, 1) during fit otherwise True.

See Also

  • KNeighborsClassifier: Classifier implementing the k-nearest neighbors vote.
  • RadiusNeighborsRegressor: Regression based on neighbors within a fixed radius.
  • KNeighborsRegressor: Regression based on k-nearest neighbors.
  • NearestNeighbors: Unsupervised learner for implementing neighbor searches.

Notes

See Nearest Neighbors: neighbors in the online documentation for a discussion of the choice of algorithm and leaf_size.

https://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm

Examples

X = [[0], [1], [2], [3]]
y = [0, 0, 1, 1]
from sklearn.neighbors import RadiusNeighborsClassifier
neigh = RadiusNeighborsClassifier(radius=1.0)
neigh.fit(X, y)
RadiusNeighborsClassifier(...)
print(neigh.predict([[1.5]]))
[0]
print(neigh.predict_proba([[1.0]]))
[[0.66666667 0.33333333]]
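
As elsewhere in this chapter, the model plugs into a metamorph.ml pipeline via kebab-case options; a minimal sketch of the model step (registered key assumed):

(ml/model {:model-type :sklearn.classification/radius-neighbors-classifier
           :radius 1.0
           :weights "uniform"})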


24.2.30 /random-forest-classifier

name type default description
min-weight-fraction-leaf
max-leaf-nodes
min-impurity-decrease
min-samples-split
bootstrap
ccp-alpha
n-jobs
random-state
oob-score
min-samples-leaf
max-features
monotonic-cst
warm-start
max-depth
class-weight
n-estimators
max-samples
criterion
verbose
predict-proba?

A random forest classifier.

A random forest is a meta estimator that fits a number of decision tree classifiers on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting. Trees in the forest use the best split strategy, i.e. equivalent to passing splitter="best" to the underlying ~sklearn.tree.DecisionTreeClassifier. The sub-sample size is controlled with the max_samples parameter if bootstrap=True (default), otherwise the whole dataset is used to build each tree.

For a comparison between tree-based ensemble models see the example :ref:sphx_glr_auto_examples_ensemble_plot_forest_hist_grad_boosting_comparison.py.

This estimator has native support for missing values (NaNs). During training, the tree grower learns at each split point whether samples with missing values should go to the left or right child, based on the potential gain. When predicting, samples with missing values are assigned to the left or right child consequently. If no missing values were encountered for a given feature during training, then samples with missing values are mapped to whichever child has the most samples.

Read more in the User Guide: forest.

Parameters

  • n_estimators: int, default=100 The number of trees in the forest.

    Changed in 0.22 The default value of n_estimators changed from 10 to 100 in 0.22.

  • criterion: {"gini", "entropy", "log_loss"}, default="gini" The function to measure the quality of a split. Supported criteria are "gini" for the Gini impurity and "log_loss" and "entropy" both for the Shannon information gain, see :ref:tree_mathematical_formulation. Note: This parameter is tree-specific.

  • max_depth: int, default=None The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.

  • min_samples_split: int or float, default=2 The minimum number of samples required to split an internal node:

    • If int, then consider min_samples_split as the minimum number.
    • If float, then min_samples_split is a fraction and ceil(min_samples_split * n_samples) are the minimum number of samples for each split.

    Changed in 0.18 Added float values for fractions.

  • min_samples_leaf: int or float, default=1 The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at least min_samples_leaf training samples in each of the left and right branches. This may have the effect of smoothing the model, especially in regression.

    • If int, then consider min_samples_leaf as the minimum number.
    • If float, then min_samples_leaf is a fraction and ceil(min_samples_leaf * n_samples) are the minimum number of samples for each node.

    Changed in 0.18 Added float values for fractions.

  • min_weight_fraction_leaf: float, default=0.0 The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sample_weight is not provided.

  • max_features: {"sqrt", "log2", None}, int or float, default="sqrt" The number of features to consider when looking for the best split:

    • If int, then consider max_features features at each split.
    • If float, then max_features is a fraction and max(1, int(max_features * n_features_in_)) features are considered at each split.
    • If "sqrt", then max_features=sqrt(n_features).
    • If "log2", then max_features=log2(n_features).
    • If None, then max_features=n_features.

    Changed in 1.1 The default of max_features changed from "auto" to "sqrt".

    Note: the search for a split does not stop until at least one valid partition of the node samples is found, even if it requires to effectively inspect more than max_features features.

  • max_leaf_nodes: int, default=None Grow trees with max_leaf_nodes in best-first fashion. Best nodes are defined as relative reduction in impurity. If None then unlimited number of leaf nodes.

  • min_impurity_decrease: float, default=0.0 A node will be split if this split induces a decrease of the impurity greater than or equal to this value.

    The weighted impurity decrease equation is the following:

        N_t / N * (impurity - N_t_R / N_t * right_impurity
                            - N_t_L / N_t * left_impurity)

    where N is the total number of samples, N_t is the number of samples at the current node, N_t_L is the number of samples in the left child, and N_t_R is the number of samples in the right child.

    N, N_t, N_t_R and N_t_L all refer to the weighted sum, if sample_weight is passed.

    Added in 0.19
  • bootstrap: bool, default=True Whether bootstrap samples are used when building trees. If False, the whole dataset is used to build each tree.

  • oob_score: bool or callable, default=False Whether to use out-of-bag samples to estimate the generalization score. By default, ~sklearn.metrics.accuracy_score is used. Provide a callable with signature metric(y_true, y_pred) to use a custom metric. Only available if bootstrap=True.

    For an illustration of out-of-bag (OOB) error estimation, see the example :ref:sphx_glr_auto_examples_ensemble_plot_ensemble_oob.py.

  • n_jobs: int, default=None The number of jobs to run in parallel. fit, predict, decision_path and apply are all parallelized over the trees. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.

  • random_state: int, RandomState instance or None, default=None Controls both the randomness of the bootstrapping of the samples used when building trees (if bootstrap=True) and the sampling of the features to consider when looking for the best split at each node (if max_features < n_features). See Glossary for details.

  • verbose: int, default=0 Controls the verbosity when fitting and predicting.

  • warm_start: bool, default=False When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just fit a whole new forest. See Glossary and :ref:tree_ensemble_warm_start for details.

  • class_weight: {"balanced", "balanced_subsample"}, dict or list of dicts, default=None Weights associated with classes in the form {class_label: weight}. If not given, all classes are supposed to have weight one. For multi-output problems, a list of dicts can be provided in the same order as the columns of y.

    Note that for multioutput (including multilabel) weights should be defined for each class of every column in its own dict. For example, for four-class multilabel classification weights should be [{0: 1, 1: 1}, {0: 1, 1: 5}, {0: 1, 1: 1}, {0: 1, 1: 1}] instead of [{1:1}, {2:5}, {3:1}, {4:1}].

    The "balanced" mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y))

    The "balanced_subsample" mode is the same as "balanced" except that weights are computed based on the bootstrap sample for every tree grown.

    For multi-output, the weights of each column of y will be multiplied.

    Note that these weights will be multiplied with sample_weight (passed through the fit method) if sample_weight is specified.

  • ccp_alpha: non-negative float, default=0.0 Complexity parameter used for Minimal Cost-Complexity Pruning. The subtree with the largest cost complexity that is smaller than ccp_alpha will be chosen. By default, no pruning is performed. See :ref:minimal_cost_complexity_pruning for details. See :ref:sphx_glr_auto_examples_tree_plot_cost_complexity_pruning.py for an example of such pruning.

    Added in 0.22

  • max_samples: int or float, default=None If bootstrap is True, the number of samples to draw from X to train each base estimator.

    • If None (default), then draw X.shape[0] samples.
    • If int, then draw max_samples samples.
    • If float, then draw max(round(n_samples * max_samples), 1) samples. Thus, max_samples should be in the interval (0.0, 1.0].

    Added in 0.22

  • monotonic_cst: array-like of int of shape (n_features), default=None Indicates the monotonicity constraint to enforce on each feature.

    • 1: monotonic increase
    • 0: no constraint
    • -1: monotonic decrease

    If monotonic_cst is None, no constraints are applied.

    Monotonicity constraints are not supported for:

    • multiclass classifications (i.e. when n_classes > 2),
    • multioutput classifications (i.e. when n_outputs_ > 1),
    • classifications trained on data with missing values.

    The constraints hold over the probability of the positive class.

    Read more in the User Guide: monotonic_cst_gbdt.

    Added in 1.4

Attributes

  • estimator_: ~sklearn.tree.DecisionTreeClassifier The child estimator template used to create the collection of fitted sub-estimators.

    Added in 1.2 base_estimator_ was renamed to estimator_.

  • estimators_: list of DecisionTreeClassifier The collection of fitted sub-estimators.

  • classes_: ndarray of shape (n_classes,) or a list of such arrays The classes labels (single output problem), or a list of arrays of class labels (multi-output problem).

  • n_classes_: int or list The number of classes (single output problem), or a list containing the number of classes for each output (multi-output problem).

  • n_features_in_: int Number of features seen during fit.

    Added in 0.24

  • feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen during fit. Defined only when X has feature names that are all strings.

    Added in 1.0

  • n_outputs_: int The number of outputs when fit is performed.

  • feature_importances_: ndarray of shape (n_features,) The impurity-based feature importances. The higher, the more important the feature. The importance of a feature is computed as the (normalized) total reduction of the criterion brought by that feature. It is also known as the Gini importance.

    Warning: impurity-based feature importances can be misleading for high cardinality features (many unique values). See sklearn.inspection.permutation_importance as an alternative.

  • oob_score_: float Score of the training dataset obtained using an out-of-bag estimate. This attribute exists only when oob_score is True.

  • oob_decision_function_: ndarray of shape (n_samples, n_classes) or (n_samples, n_classes, n_outputs) Decision function computed with out-of-bag estimate on the training set. If n_estimators is small it might be possible that a data point was never left out during the bootstrap. In this case, oob_decision_function_ might contain NaN. This attribute exists only when oob_score is True.

  • estimators_samples_: list of arrays The subset of drawn samples (i.e., the in-bag samples) for each base estimator. Each subset is defined by an array of the indices selected.

    Added in 1.4

See Also

  • sklearn.tree.DecisionTreeClassifier: A decision tree classifier.
  • sklearn.ensemble.ExtraTreesClassifier: Ensemble of extremely randomized tree classifiers.
  • sklearn.ensemble.HistGradientBoostingClassifier: A Histogram-based Gradient Boosting Classification Tree, very fast for big datasets (n_samples >= 10_000).

Notes

The default values for the parameters controlling the size of the trees (e.g. max_depth, min_samples_leaf, etc.) lead to fully grown and unpruned trees which can potentially be very large on some data sets. To reduce memory consumption, the complexity and size of the trees should be controlled by setting those parameter values.

The features are always randomly permuted at each split. Therefore, the best found split may vary, even with the same training data, max_features=n_features and bootstrap=False, if the improvement of the criterion is identical for several splits enumerated during the search of the best split. To obtain a deterministic behaviour during fitting, random_state has to be fixed.

References

  • [1] :doi:L. Breiman, "Random Forests", Machine Learning, 45(1), 5-32, 2001. <10.1023/A:1010933404324>

Examples

from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
X, y = make_classification(n_samples=1000, n_features=4,
                           n_informative=2, n_redundant=0,
                           random_state=0, shuffle=False)
clf = RandomForestClassifier(max_depth=2, random_state=0)
clf.fit(X, y)
RandomForestClassifier(...)
print(clf.predict([[0, 0, 0, 0]]))
[1]
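
A Clojure sketch with a tiny hand-made dataset (the make_classification data above is not reproduced); the key :sklearn.classification/random-forest-classifier and the kebab-case option names are assumed from the table above:

(def rf-pipe
  (mm/pipeline
   (ds-mm/set-inference-target 2)   ;; column 2 holds the class label
   {:metamorph/id :model}
   (ml/model {:model-type :sklearn.classification/random-forest-classifier
              :n-estimators 100
              :max-depth 2
              :random-state 0})))

(def rf-ctx
  (rf-pipe {:metamorph/data (dst/tensor->dataset
                             [[0 0 0] [0 1 0] [1 0 1] [1 1 1]])
            :metamorph/mode :fit}))

(-> (mm/transform-pipe (dst/tensor->dataset [[1 0 0]]) rf-pipe rf-ctx)
    :metamorph/data)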


24.2.31 /ridge-classifier

name type default description
positive
tol
solver
max-iter
random-state
copy-x
fit-intercept
alpha
class-weight
predict-proba?

Classifier using Ridge regression.

This classifier first converts the target values into {-1, 1} and then treats the problem as a regression task (multi-output regression in the multiclass case).

Read more in the User Guide: ridge_regression.

Parameters

  • alpha: float, default=1.0 Regularization strength; must be a positive float. Regularization improves the conditioning of the problem and reduces the variance of the estimates. Larger values specify stronger regularization. Alpha corresponds to 1 / (2C) in other linear models such as ~sklearn.linear_model.LogisticRegression or ~sklearn.svm.LinearSVC.

  • fit_intercept: bool, default=True Whether to calculate the intercept for this model. If set to false, no intercept will be used in calculations (e.g. data is expected to be already centered).

  • copy_X: bool, default=True If True, X will be copied; else, it may be overwritten.

  • max_iter: int, default=None Maximum number of iterations for conjugate gradient solver. The default value is determined by scipy.sparse.linalg.

  • tol: float, default=1e-4 The precision of the solution (coef_) is determined by tol which specifies a different convergence criterion for each solver:

    • 'svd': tol has no impact.

    • 'cholesky': tol has no impact.

    • 'sparse_cg': norm of residuals smaller than tol.

    • 'lsqr': tol is set as atol and btol of scipy.sparse.linalg.lsqr, which control the norm of the residual vector in terms of the norms of matrix and coefficients.

    • 'sag' and 'saga': relative change of coef smaller than tol.

    • 'lbfgs': maximum of the absolute (projected) gradient=max|residuals| smaller than tol.

    Changed in 1.2 Default value changed from 1e-3 to 1e-4 for consistency with other linear models.

  • class_weight: dict or 'balanced', default=None Weights associated with classes in the form {class_label: weight}. If not given, all classes are supposed to have weight one.

    The "balanced" mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y)).

  • solver: {'auto', 'svd', 'cholesky', 'lsqr', 'sparse_cg', 'sag', 'saga', 'lbfgs'}, default='auto' Solver to use in the computational routines:

    • 'auto' chooses the solver automatically based on the type of data.

    • 'svd' uses a Singular Value Decomposition of X to compute the Ridge coefficients. It is the most stable solver, in particular more stable for singular matrices than 'cholesky' at the cost of being slower.

    • 'cholesky' uses the standard scipy.linalg.solve function to obtain a closed-form solution.

    • 'sparse_cg' uses the conjugate gradient solver as found in scipy.sparse.linalg.cg. As an iterative algorithm, this solver is more appropriate than 'cholesky' for large-scale data (possibility to set tol and max_iter).

    • 'lsqr' uses the dedicated regularized least-squares routine scipy.sparse.linalg.lsqr. It is the fastest and uses an iterative procedure.

    • 'sag' uses a Stochastic Average Gradient descent, and 'saga' uses its unbiased and more flexible version named SAGA. Both methods use an iterative procedure, and are often faster than other solvers when both n_samples and n_features are large. Note that 'sag' and 'saga' fast convergence is only guaranteed on features with approximately the same scale. You can preprocess the data with a scaler from sklearn.preprocessing.

      Added in 0.17 Stochastic Average Gradient descent solver. Added in 0.19 SAGA solver.

    • 'lbfgs' uses L-BFGS-B algorithm implemented in scipy.optimize.minimize. It can be used only when positive is True.

  • positive: bool, default=False When set to True, forces the coefficients to be positive. Only 'lbfgs' solver is supported in this case.

  • random_state: int, RandomState instance, default=None Used when solver == 'sag' or 'saga' to shuffle the data. See Glossary for details.

Attributes

  • coef_: ndarray of shape (1, n_features) or (n_classes, n_features) Coefficient of the features in the decision function.

    coef_ is of shape (1, n_features) when the given problem is binary.

  • intercept_: float or ndarray of shape (n_targets,) Independent term in decision function. Set to 0.0 if fit_intercept = False.

  • n_iter_: None or ndarray of shape (n_targets,) Actual number of iterations for each target. Available only for sag and lsqr solvers. Other solvers will return None.

  • classes_: ndarray of shape (n_classes,) The classes labels.

  • n_features_in_: int Number of features seen during fit.

    Added in 0.24

  • feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen during fit. Defined only when X has feature names that are all strings.

    Added in 1.0

  • solver_: str The solver that was used at fit time by the computational routines.

    Added in 1.5

See Also

  • Ridge: Ridge regression.
  • RidgeClassifierCV: Ridge classifier with built-in cross validation.

Notes

For multi-class classification, n_class classifiers are trained in a one-versus-all approach. Concretely, this is implemented by taking advantage of the multi-variate response support in Ridge.

Examples

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import RidgeClassifier
X, y = load_breast_cancer(return_X_y=True)
clf = RidgeClassifier().fit(X, y)
clf.score(X, y)
0.9595...
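
The corresponding metamorph.ml model step, as a sketch (registered key assumed):

(ml/model {:model-type :sklearn.classification/ridge-classifier
           :alpha 1.0})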


24.2.32 /ridge-classifier-cv

name type default description
alphas
class-weight
cv
fit-intercept
scoring
store-cv-results
predict-proba?

Ridge classifier with built-in cross-validation.

See glossary entry for cross-validation estimator.

By default, it performs Leave-One-Out Cross-Validation. Currently, only the n_features > n_samples case is handled efficiently.

Read more in the User Guide: ridge_regression.

Parameters

  • alphas: array-like of shape (n_alphas,), default=(0.1, 1.0, 10.0) Array of alpha values to try. Regularization strength; must be a positive float. Regularization improves the conditioning of the problem and reduces the variance of the estimates. Larger values specify stronger regularization. Alpha corresponds to 1 / (2C) in other linear models such as ~sklearn.linear_model.LogisticRegression or ~sklearn.svm.LinearSVC. If using Leave-One-Out cross-validation, alphas must be strictly positive.

  • fit_intercept: bool, default=True Whether to calculate the intercept for this model. If set to false, no intercept will be used in calculations (i.e. data is expected to be centered).

  • scoring: str, callable, default=None The scoring method to use for cross-validation. Options:

    • str: see :ref:scoring_string_names for options.
    • callable: a scorer callable object (e.g., function) with signature scorer(estimator, X, y). See :ref:scoring_callable for details.
    • None: negative mean squared error: mean_squared_error if cv is None (i.e. when using leave-one-out cross-validation), or accuracy: accuracy_score otherwise.

  • cv: int, cross-validation generator or an iterable, default=None Determines the cross-validation splitting strategy. Possible inputs for cv are:

    • None, to use the efficient Leave-One-Out cross-validation
    • integer, to specify the number of folds.
    • CV splitter,
    • An iterable yielding (train, test) splits as arrays of indices.

    Refer User Guide: cross_validation for the various cross-validation strategies that can be used here.

  • class_weight: dict or 'balanced', default=None Weights associated with classes in the form {class_label: weight}. If not given, all classes are supposed to have weight one.

    The "balanced" mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y)).

  • store_cv_results: bool, default=False Flag indicating if the cross-validation results corresponding to each alpha should be stored in the cv_results_ attribute (see below). This flag is only compatible with cv=None (i.e. using Leave-One-Out Cross-Validation).

    Changed in 1.5 Parameter name changed from store_cv_values to store_cv_results.

Attributes

  • cv_results_: ndarray of shape (n_samples, n_targets, n_alphas), optional Cross-validation results for each alpha (only if store_cv_results=True and cv=None). After fit() has been called, this attribute will contain the mean squared errors if scoring is None otherwise it will contain standardized per point prediction values.

    Changed in 1.5 cv_values_ changed to cv_results_.

  • coef_: ndarray of shape (1, n_features) or (n_targets, n_features) Coefficient of the features in the decision function.

    coef_ is of shape (1, n_features) when the given problem is binary.

  • intercept_: float or ndarray of shape (n_targets,) Independent term in decision function. Set to 0.0 if fit_intercept = False.

  • alpha_: float Estimated regularization parameter.

  • best_score_: float Score of base estimator with best alpha.

    Added in 0.23

  • classes_: ndarray of shape (n_classes,) The classes labels.

  • n_features_in_: int Number of features seen during fit.

    Added in 0.24

  • feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen during fit. Defined only when X has feature names that are all strings.

    Added in 1.0

See Also

  • Ridge: Ridge regression.
  • RidgeClassifier: Ridge classifier.
  • RidgeCV: Ridge regression with built-in cross validation.

Notes

For multi-class classification, n_class classifiers are trained in a one-versus-all approach. Concretely, this is implemented by taking advantage of the multi-variate response support in Ridge.

Examples

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import RidgeClassifierCV
X, y = load_breast_cancer(return_X_y=True)
clf = RidgeClassifierCV(alphas=[1e-3, 1e-2, 1e-1, 1]).fit(X, y)
clf.score(X, y)
0.9630...
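
A sketch of the model step (registered key assumed); passing the alphas grid as a Clojure vector assumes the usual libpython-clj conversion to a Python sequence:

(ml/model {:model-type :sklearn.classification/ridge-classifier-cv
           :alphas [1e-3 1e-2 1e-1 1]})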


24.2.33 /self-training-classifier

name type default description
criterion
estimator
k-best
max-iter
threshold
verbose
predict-proba?

Self-training classifier.

This metaestimator allows a given supervised classifier to function as a semi-supervised classifier, allowing it to learn from unlabeled data. It does this by iteratively predicting pseudo-labels for the unlabeled data and adding them to the training set.

The classifier will continue iterating until either max_iter is reached, or no pseudo-labels were added to the training set in the previous iteration.

Read more in the User Guide: self_training.

Parameters

  • estimator: estimator object An estimator object implementing fit and predict_proba. Invoking the fit method will fit a clone of the passed estimator, which will be stored in the estimator_ attribute.

    Added in 1.6 estimator was added to replace base_estimator.

  • threshold: float, default=0.75 The decision threshold for use with criterion='threshold'. Should be in [0, 1). When using the 'threshold' criterion, a well calibrated classifier: calibration should be used.

  • criterion: {'threshold', 'k_best'}, default='threshold' The selection criterion used to select which labels to add to the training set. If 'threshold', pseudo-labels with prediction probabilities above threshold are added to the dataset. If 'k_best', the k_best pseudo-labels with highest prediction probabilities are added to the dataset. When using the 'threshold' criterion, a well calibrated classifier: calibration should be used.

  • k_best: int, default=10 The amount of samples to add in each iteration. Only used when criterion='k_best'.

  • max_iter: int or None, default=10 Maximum number of iterations allowed. Should be greater than or equal to 0. If it is None, the classifier will continue to predict labels until no new pseudo-labels are added, or all unlabeled samples have been labeled.

  • verbose: bool, default=False Enable verbose output.

Attributes

  • estimator_: estimator object The fitted estimator.

  • classes_: ndarray or list of ndarray of shape (n_classes,) Class labels for each output. (Taken from the trained estimator_).

  • transduction_: ndarray of shape (n_samples,) The labels used for the final fit of the classifier, including pseudo-labels added during fit.

  • labeled_iter_: ndarray of shape (n_samples,) The iteration in which each sample was labeled. When a sample has iteration 0, the sample was already labeled in the original dataset. When a sample has iteration -1, the sample was not labeled in any iteration.

  • n_features_in_: int Number of features seen during fit.

    Added in 0.24

  • feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen during fit. Defined only when X has feature names that are all strings.

    Added in 1.0

  • n_iter_: int The number of rounds of self-training, that is the number of times the base estimator is fitted on relabeled variants of the training set.

  • termination_condition_: {'max_iter', 'no_change', 'all_labeled'} The reason that fitting was stopped.

    • 'max_iter': n_iter_ reached max_iter.
    • 'no_change': no new labels were predicted.
    • 'all_labeled': all unlabeled samples were labeled before max_iter was reached.

See Also

  • LabelPropagation: Label propagation classifier.
  • LabelSpreading: Label spreading model for semi-supervised learning.

References

:doi:David Yarowsky. 1995. Unsupervised word sense disambiguation rivaling supervised methods. In Proceedings of the 33rd annual meeting on Association for Computational Linguistics (ACL '95). Association for Computational Linguistics, Stroudsburg, PA, USA, 189-196. <10.3115/981658.981684>

Examples

import numpy as np
from sklearn import datasets
from sklearn.semi_supervised import SelfTrainingClassifier
from sklearn.svm import SVC
rng = np.random.RandomState(42)
iris = datasets.load_iris()
random_unlabeled_points = rng.rand(iris.target.shape[0]) < 0.3
iris.target[random_unlabeled_points] = -1
svc = SVC(probability=True, gamma="auto")
self_training_model = SelfTrainingClassifier(svc)
self_training_model.fit(iris.data, iris.target)
SelfTrainingClassifier(...)


24.2.34 /sgd-classifier

name type default description
n-iter-no-change
learning-rate
average
tol
early-stopping
eta-0
shuffle
penalty
power-t
max-iter
n-jobs
random-state
fit-intercept
alpha
warm-start
l-1-ratio
validation-fraction
class-weight
loss
verbose
epsilon
predict-proba?

Linear classifiers (SVM, logistic regression, etc.) with SGD training.

This estimator implements regularized linear models with stochastic gradient descent (SGD) learning: the gradient of the loss is estimated one sample at a time and the model is updated along the way with a decreasing strength schedule (aka learning rate). SGD allows minibatch (online/out-of-core) learning via the partial_fit method. For best results using the default learning rate schedule, the data should have zero mean and unit variance.

This implementation works with data represented as dense or sparse arrays of floating point values for the features. The model it fits can be controlled with the loss parameter; by default, it fits a linear support vector machine (SVM).

The regularizer is a penalty added to the loss function that shrinks model parameters towards the zero vector using either the squared euclidean norm L2 or the absolute norm L1 or a combination of both (Elastic Net). If the parameter update crosses the 0.0 value because of the regularizer, the update is truncated to 0.0 to allow for learning sparse models and achieve online feature selection.

Read more in the User Guide: sgd.

Parameters

  • loss: {'hinge', 'log_loss', 'modified_huber', 'squared_hinge', 'perceptron', 'squared_error', 'huber', 'epsilon_insensitive', 'squared_epsilon_insensitive'}, default='hinge' The loss function to be used.

    • 'hinge' gives a linear SVM.
    • 'log_loss' gives logistic regression, a probabilistic classifier.
    • 'modified_huber' is another smooth loss that brings tolerance to outliers as well as probability estimates.
    • 'squared_hinge' is like hinge but is quadratically penalized.
    • 'perceptron' is the linear loss used by the perceptron algorithm.
    • The other losses, 'squared_error', 'huber', 'epsilon_insensitive' and 'squared_epsilon_insensitive' are designed for regression but can be useful in classification as well; see ~sklearn.linear_model.SGDRegressor for a description.

    More details about the losses formulas can be found in the User Guide: sgd_mathematical_formulation and you can find a visualisation of the loss functions in :ref:sphx_glr_auto_examples_linear_model_plot_sgd_loss_functions.py.

  • penalty: {'l2', 'l1', 'elasticnet', None}, default='l2' The penalty (aka regularization term) to be used. Defaults to 'l2' which is the standard regularizer for linear SVM models. 'l1' and 'elasticnet' might bring sparsity to the model (feature selection) not achievable with 'l2'. No penalty is added when set to None.

    You can see a visualisation of the penalties in :ref:sphx_glr_auto_examples_linear_model_plot_sgd_penalties.py.

  • alpha: float, default=0.0001 Constant that multiplies the regularization term. The higher the value, the stronger the regularization. Also used to compute the learning rate when learning_rate is set to 'optimal'. Values must be in the range [0.0, inf).

  • l1_ratio: float, default=0.15 The Elastic Net mixing parameter, with 0 <= l1_ratio <= 1. l1_ratio=0 corresponds to L2 penalty, l1_ratio=1 to L1. Only used if penalty is 'elasticnet'. Values must be in the range [0.0, 1.0] or can be None if penalty is not elasticnet.

    Changed in 1.7 l1_ratio can be None when penalty is not "elasticnet".

  • fit_intercept: bool, default=True Whether the intercept should be estimated or not. If False, the data is assumed to be already centered.

  • max_iter: int, default=1000 The maximum number of passes over the training data (aka epochs). It only impacts the behavior in the fit method, and not the partial_fit method. Values must be in the range [1, inf).

    Added in 0.19

  • tol: float or None, default=1e-3 The stopping criterion. If it is not None, training will stop when (loss > best_loss - tol) for n_iter_no_change consecutive epochs. Convergence is checked against the training loss or the validation loss depending on the early_stopping parameter. Values must be in the range [0.0, inf).

    Added in 0.19

  • shuffle: bool, default=True Whether or not the training data should be shuffled after each epoch.

  • verbose: int, default=0 The verbosity level. Values must be in the range [0, inf).

  • epsilon: float, default=0.1 Epsilon in the epsilon-insensitive loss functions; only if loss is 'huber', 'epsilon_insensitive', or 'squared_epsilon_insensitive'. For 'huber', determines the threshold at which it becomes less important to get the prediction exactly right. For epsilon-insensitive, any differences between the current prediction and the correct label are ignored if they are less than this threshold. Values must be in the range [0.0, inf).

  • n_jobs: int, default=None The number of CPUs to use to do the OVA (One Versus All, for multi-class problems) computation. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.

  • random_state: int, RandomState instance, default=None Used for shuffling the data, when shuffle is set to True. Pass an int for reproducible output across multiple function calls. See Glossary. Integer values must be in the range [0, 2**32 - 1].

  • learning_rate: str, default='optimal' The learning rate schedule:

    • 'constant': eta = eta0
    • 'optimal': eta = 1.0 / (alpha * (t + t0)) where t0 is chosen by a heuristic proposed by Leon Bottou.
    • 'invscaling': eta = eta0 / pow(t, power_t)
    • 'adaptive': eta = eta0, as long as the training keeps decreasing. Each time n_iter_no_change consecutive epochs fail to decrease the training loss by tol or fail to increase validation score by tol if early_stopping is True, the current learning rate is divided by 5.
    • 'pa1': passive-aggressive algorithm 1, see [1]_. Only with loss='hinge'. Update is w += eta y x with eta = min(eta0, loss/||x||**2).
    • 'pa2': passive-aggressive algorithm 2, see [1]_. Only with loss='hinge'. Update is w += eta y x with eta = hinge_loss / (||x||**2 + 1/(2 eta0)).

    Added in 0.20 Added 'adaptive' option.

    Added in 1.8 Added options 'pa1' and 'pa2'

  • eta0: float, default=0.01 The initial learning rate for the 'constant', 'invscaling' or 'adaptive' schedules. The default value is 0.01, but note that eta0 is not used by the default learning rate 'optimal'. Values must be in the range (0.0, inf).

    For PA-I (learning_rate=pa1) and PA-II (pa2), it specifies the aggressiveness parameter for the passive-aggressive algorithm, see [1] where it is called C:

    • For PA-I it is the maximum step size.
    • For PA-II it regularizes the step size (the smaller eta0 the more it regularizes).

    As a general rule of thumb for PA, eta0 should be small when the data is noisy.

  • power_t: float, default=0.5 The exponent for inverse scaling learning rate. Values must be in the range [0.0, inf).

    Deprecated since 1.8 Negative values for power_t are deprecated in version 1.8 and will raise an error in 1.10. Use values in the range [0.0, inf) instead.

  • early_stopping: bool, default=False Whether to use early stopping to terminate training when validation score is not improving. If set to True, it will automatically set aside a stratified fraction of training data as validation and terminate training when validation score returned by the score method is not improving by at least tol for n_iter_no_change consecutive epochs.

    See :ref:sphx_glr_auto_examples_linear_model_plot_sgd_early_stopping.py for an example of the effects of early stopping.

    Added in 0.20 Added 'early_stopping' option

  • validation_fraction: float, default=0.1 The proportion of training data to set aside as validation set for early stopping. Must be between 0 and 1. Only used if early_stopping is True. Values must be in the range (0.0, 1.0).

    Added in 0.20 Added 'validation_fraction' option

  • n_iter_no_change: int, default=5 Number of iterations with no improvement to wait before stopping fitting. Convergence is checked against the training loss or the validation loss depending on the early_stopping parameter. Integer values must be in the range [1, max_iter).

    Added in 0.20 Added 'n_iter_no_change' option

  • class_weight: dict, {class_label: weight} or "balanced", default=None Preset for the class_weight fit parameter.

    Weights associated with classes. If not given, all classes are supposed to have weight one.

    The "balanced" mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y)).

  • warm_start: bool, default=False When set to True, reuse the solution of the previous call to fit as initialization, otherwise, just erase the previous solution. See the Glossary.

    Repeatedly calling fit or partial_fit when warm_start is True can result in a different solution than when calling fit a single time because of the way the data is shuffled. If a dynamic learning rate is used, the learning rate is adapted depending on the number of samples already seen. Calling fit resets this counter, while partial_fit will result in increasing the existing counter.

  • average: bool or int, default=False When set to True, computes the averaged SGD weights across all updates and stores the result in the coef_ attribute. If set to an int greater than 1, averaging will begin once the total number of samples seen reaches average. So average=10 will begin averaging after seeing 10 samples. Integer values must be in the range [1, n_samples].

Attributes

  • coef_: ndarray of shape (1, n_features) if n_classes == 2 else (n_classes, n_features) Weights assigned to the features.

  • intercept_: ndarray of shape (1,) if n_classes == 2 else (n_classes,) Constants in decision function.

  • n_iter_: int The actual number of iterations before reaching the stopping criterion. For multiclass fits, it is the maximum over every binary fit.

  • classes_: array of shape (n_classes,)

  • t_: int Number of weight updates performed during training. Same as (n_iter_ * n_samples + 1).

  • n_features_in_: int Number of features seen during fit.

    Added in 0.24

  • feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen during fit. Defined only when X has feature names that are all strings.

    Added in 1.0

See Also

  • sklearn.svm.LinearSVC: Linear support vector classification.
  • LogisticRegression: Logistic regression.
  • Perceptron: Inherits from SGDClassifier. Perceptron() is equivalent to SGDClassifier(loss="perceptron", eta0=1, learning_rate="constant", penalty=None).

References

Examples

import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
Y = np.array([1, 1, 2, 2])
# Always scale the input. The most convenient way is to use a pipeline.
clf = make_pipeline(StandardScaler(),
                    SGDClassifier(max_iter=1000, tol=1e-3))
clf.fit(X, Y)
# => Pipeline(steps=[('standardscaler', StandardScaler()),
#                    ('sgdclassifier', SGDClassifier())])
print(clf.predict([[-0.8, -1]]))
# => [1]
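
The same estimator should also be usable from Clojure via the metamorph.ml pipeline pattern used throughout this chapter. The sketch below is hedged: it assumes the model is registered under the key :sklearn.classification/sgd-classifier and that the kebab-case option keys (:max-iter, :tol, ...) translate to the Python parameters documented above; options such as early_stopping, validation_fraction and n_iter_no_change would then be passed as :early-stopping, :validation-fraction and :n-iter-no-change.

(def sgd-pipe
  (mm/pipeline
   (ds-mm/set-inference-target 2)
   {:metamorph/id :model}
   ;; model-type key and option names are assumptions, not verified here
   (ml/model {:model-type :sklearn.classification/sgd-classifier
              :max-iter 1000
              :tol 1e-3})))

;; fit it on the small example dataset ds used earlier in this chapter
(def sgd-fitted
  (sgd-pipe {:metamorph/data ds
             :metamorph/mode :fit}))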


24.2.35 /svc

name type default description
break-ties
kernel
gamma
degree
decision-function-shape
probability
tol
shrinking
c
max-iter
random-state
coef-0
class-weight
cache-size
verbose
predict-proba?

C-Support Vector Classification.

The implementation is based on libsvm. The fit time scales at least quadratically with the number of samples and may be impractical beyond tens of thousands of samples. For large datasets consider using ~sklearn.svm.LinearSVC or ~sklearn.linear_model.SGDClassifier instead, possibly after a ~sklearn.kernel_approximation.Nystroem transformer or another kernel approximation (see the User Guide: kernel_approximation).

The multiclass support is handled according to a one-vs-one scheme.

For details on the precise mathematical formulation of the provided kernel functions and how gamma, coef0 and degree affect each other, see the corresponding section in the narrative documentation: svm_kernels.

To learn how to tune SVC's hyperparameters, see the following example: sphx_glr_auto_examples_model_selection_plot_nested_cross_validation_iris.py.

Read more in the User Guide: svm_classification.

Parameters

  • C: float, default=1.0 Regularization parameter. The strength of the regularization is inversely proportional to C. Must be strictly positive. The penalty is a squared l2 penalty. For an intuitive visualization of the effects of scaling the regularization parameter C, see sphx_glr_auto_examples_svm_plot_svm_scale_c.py.

  • kernel: {'linear', 'poly', 'rbf', 'sigmoid', 'precomputed'} or callable, default='rbf' Specifies the kernel type to be used in the algorithm. If none is given, 'rbf' will be used. If a callable is given it is used to pre-compute the kernel matrix from data matrices; that matrix should be an array of shape (n_samples, n_samples). For an intuitive visualization of different kernel types see sphx_glr_auto_examples_svm_plot_svm_kernels.py.

  • degree: int, default=3 Degree of the polynomial kernel function ('poly'). Must be non-negative. Ignored by all other kernels.

  • gamma: {'scale', 'auto'} or float, default='scale' Kernel coefficient for 'rbf', 'poly' and 'sigmoid'.

    • if gamma='scale' (default) is passed then it uses 1 / (n_features * X.var()) as value of gamma,
    • if 'auto', uses 1 / n_features
    • if float, must be non-negative.

    Changed in 0.22 The default value of gamma changed from 'auto' to 'scale'.

  • coef0: float, default=0.0 Independent term in kernel function. It is only significant in 'poly' and 'sigmoid'.

  • shrinking: bool, default=True Whether to use the shrinking heuristic. See the User Guide: shrinking_svm.

  • probability: bool, default=False Whether to enable probability estimates. This must be enabled prior to calling fit, will slow down that method as it internally uses 5-fold cross-validation, and predict_proba may be inconsistent with predict. Read more in the User Guide: scores_probabilities.

  • tol: float, default=1e-3 Tolerance for stopping criterion.

  • cache_size: float, default=200 Specify the size of the kernel cache (in MB).

  • class_weight: dict or 'balanced', default=None Set the parameter C of class i to class_weight[i]*C for SVC. If not given, all classes are supposed to have weight one. The "balanced" mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y)).

  • verbose: bool, default=False Enable verbose output. Note that this setting takes advantage of a per-process runtime setting in libsvm that, if enabled, may not work properly in a multithreaded context.

  • max_iter: int, default=-1 Hard limit on iterations within solver, or -1 for no limit.

  • decision_function_shape: {'ovo', 'ovr'}, default='ovr' Whether to return a one-vs-rest ('ovr') decision function of shape (n_samples, n_classes) as all other classifiers, or the original one-vs-one ('ovo') decision function of libsvm which has shape (n_samples, n_classes * (n_classes - 1) / 2). However, note that internally, one-vs-one ('ovo') is always used as a multi-class strategy to train models; an ovr matrix is only constructed from the ovo matrix. The parameter is ignored for binary classification.

    Changed in 0.19 decision_function_shape is 'ovr' by default.

    Added in 0.17 decision_function_shape='ovr' is recommended.

    Changed in 0.17 Deprecated decision_function_shape='ovo' and None.

  • break_ties: bool, default=False If true, decision_function_shape='ovr', and number of classes > 2, predict will break ties according to the confidence values of decision_function; otherwise the first class among the tied classes is returned. Please note that breaking ties comes at a relatively high computational cost compared to a simple predict. See sphx_glr_auto_examples_svm_plot_svm_tie_breaking.py for an example of its usage with decision_function_shape='ovr'.

    Added in 0.22

  • random_state: int, RandomState instance or None, default=None Controls the pseudo random number generation for shuffling the data for probability estimates. Ignored when probability is False. Pass an int for reproducible output across multiple function calls. See the Glossary.

Attributes

  • class_weight_: ndarray of shape (n_classes,) Multipliers of parameter C for each class. Computed based on the class_weight parameter.

  • classes_: ndarray of shape (n_classes,) The classes labels.

  • coef_: ndarray of shape (n_classes * (n_classes - 1) / 2, n_features) Weights assigned to the features (coefficients in the primal problem). This is only available in the case of a linear kernel.

    coef_ is a readonly property derived from dual_coef_ and support_vectors_.

  • dual_coef_: ndarray of shape (n_classes - 1, n_SV) Dual coefficients of the support vector in the decision function (see the User Guide: sgd_mathematical_formulation), multiplied by their targets. For multiclass, coefficient for all 1-vs-1 classifiers. The layout of the coefficients in the multiclass case is somewhat non-trivial. See the multi-class section of the User Guide: svm_multi_class for details.

  • fit_status_: int 0 if correctly fitted, 1 otherwise (will raise warning)

  • intercept_: ndarray of shape (n_classes * (n_classes - 1) / 2,) Constants in decision function.

  • n_features_in_: int Number of features seen during fit.

    Added in 0.24

  • feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen during fit. Defined only when X has feature names that are all strings.

    Added in 1.0

  • n_iter_: ndarray of shape (n_classes * (n_classes - 1) // 2,) Number of iterations run by the optimization routine to fit the model. The shape of this attribute depends on the number of models optimized which in turn depends on the number of classes.

    Added in 1.1

  • support_: ndarray of shape (n_SV) Indices of support vectors.

  • support_vectors_: ndarray of shape (n_SV, n_features) Support vectors. An empty array if kernel is precomputed.

  • n_support_: ndarray of shape (n_classes,), dtype=int32 Number of support vectors for each class.

  • probA_: ndarray of shape (n_classes * (n_classes - 1) / 2)

  • probB_: ndarray of shape (n_classes * (n_classes - 1) / 2) If probability=True, it corresponds to the parameters learned in Platt scaling to produce probability estimates from decision values. If probability=False, it's an empty array. Platt scaling uses the logistic function 1 / (1 + exp(decision_value * probA_ + probB_)) where probA_ and probB_ are learned from the dataset [2]. For more information on the multiclass case and training procedure see section 8 of [1].

  • shape_fit_: tuple of int of shape (n_dimensions_of_X,) Array dimensions of training vector X.

See Also

  • SVR: Support Vector Machine for Regression implemented using libsvm.

  • LinearSVC: Scalable Linear Support Vector Machine for classification implemented using liblinear. Check the See Also section of LinearSVC for more comparison elements.

References

Examples

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
y = np.array([1, 1, 2, 2])
clf = make_pipeline(StandardScaler(), SVC(gamma='auto'))
clf.fit(X, y)
# => Pipeline(steps=[('standardscaler', StandardScaler()),
#                    ('svc', SVC(gamma='auto'))])
print(clf.predict([[-0.8, -1]]))
# => [1]

For a comparison of SVC with other classifiers, see the example sphx_glr_auto_examples_classification_plot_classification_probability.py.
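
From Clojure, a minimal sketch of the corresponding pipeline could look as follows. It assumes the model-type key :sklearn.classification/svc and that the kebab-case keys from the table above (:c, :kernel, :gamma) map onto the parameters documented here.

(def svc-pipe
  (mm/pipeline
   (ds-mm/set-inference-target 2)
   {:metamorph/id :model}
   ;; model-type key and option names are assumptions, not verified here
   (ml/model {:model-type :sklearn.classification/svc
              :c 1.0
              :kernel "rbf"
              :gamma "auto"})))

(def svc-fitted
  (svc-pipe {:metamorph/data ds
             :metamorph/mode :fit}))

;; fitted attributes such as support_vectors_ and n_support_ should then
;; appear in the attributes map of the fitted context
(keys (-> svc-fitted :model :model-data :attributes))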



24.3 :sklearn.regression models
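
The regression models listed below are used in the same way as the classification models above, only with a :sklearn.regression/... model type and a numeric inference target. As a minimal, hedged sketch (assuming the model-type key :sklearn.regression/linear-regression and the kebab-case option :fit-intercept from the /linear-regression table further down):

(def reg-pipe
  (mm/pipeline
   (ds-mm/set-inference-target 2)
   {:metamorph/id :model}
   ;; model-type key and option name are assumptions, not verified here
   (ml/model {:model-type :sklearn.regression/linear-regression
              :fit-intercept true})))

;; fit on a small synthetic dataset where column 2 is twice column 0
(def reg-fitted
  (reg-pipe {:metamorph/data (dst/tensor->dataset [[0 0 0] [1 1 2] [2 2 4] [3 3 6]])
             :metamorph/mode :fit}))

;; predict for a new row; column 2 of the result should hold the prediction
(-> (mm/transform-pipe
     (dst/tensor->dataset [[4 4 0]])
     reg-pipe
     reg-fitted)
    :metamorph/data)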

24.3.1 /ada-boost-regressor

name type default description
estimator
learning-rate
loss
n-estimators
random-state
predict-proba?


24.3.2 /ard-regression

name type default description
tol
alpha-2
threshold-lambda
max-iter
lambda-1
copy-x
lambda-2
fit-intercept
alpha-1
verbose
compute-score
predict-proba?


24.3.3 /bagging-regressor

name type default description
bootstrap
bootstrap-features
n-jobs
random-state
estimator
oob-score
max-features
warm-start
n-estimators
max-samples
verbose
predict-proba?


24.3.4 /bayesian-ridge

name type default description
tol
alpha-2
max-iter
lambda-1
copy-x
lambda-2
alpha-init
fit-intercept
alpha-1
lambda-init
verbose
compute-score
predict-proba?


24.3.5 /cca

name type default description
copy
max-iter
n-components
scale
tol
predict-proba?


24.3.6 /decision-tree-regressor

name type default description
min-weight-fraction-leaf
max-leaf-nodes
min-impurity-decrease
min-samples-split
ccp-alpha
splitter
random-state
min-samples-leaf
max-features
monotonic-cst
max-depth
criterion
predict-proba?


24.3.7 /dummy-regressor

name type default description
constant
quantile
strategy
predict-proba?


24.3.8 /elastic-net

name type default description
positive
tol
max-iter
random-state
copy-x
precompute
fit-intercept
alpha
warm-start
selection
l-1-ratio
predict-proba?


24.3.9 /elastic-net-cv

name type default description
positive
tol
n-alphas
eps
alphas
max-iter
n-jobs
random-state
copy-x
precompute
fit-intercept
cv
selection
l-1-ratio
verbose
predict-proba?


24.3.10 /extra-tree-regressor

name type default description
min-weight-fraction-leaf
max-leaf-nodes
min-impurity-decrease
min-samples-split
ccp-alpha
splitter
random-state
min-samples-leaf
max-features
monotonic-cst
max-depth
criterion
predict-proba?


24.3.11 /extra-trees-regressor

name type default description
min-weight-fraction-leaf
max-leaf-nodes
min-impurity-decrease
min-samples-split
bootstrap
ccp-alpha
n-jobs
random-state
oob-score
min-samples-leaf
max-features
monotonic-cst
warm-start
max-depth
n-estimators
max-samples
criterion
verbose
predict-proba?


24.3.12 /gamma-regressor

name type default description
alpha
fit-intercept
max-iter
solver
tol
verbose
warm-start
predict-proba?


24.3.13 /gaussian-process-regressor

name type default description
alpha
copy-x-train
kernel
n-restarts-optimizer
n-targets
normalize-y
optimizer
random-state
predict-proba?


24.3.14 /gradient-boosting-regressor

name type default description
n-iter-no-change
learning-rate
min-weight-fraction-leaf
max-leaf-nodes
min-impurity-decrease
min-samples-split
tol
subsample
ccp-alpha
random-state
min-samples-leaf
max-features
init
alpha
warm-start
max-depth
validation-fraction
n-estimators
criterion
loss
verbose
predict-proba?


24.3.15 /hist-gradient-boosting-regressor

name type default description
n-iter-no-change
learning-rate
max-leaf-nodes
scoring
tol
early-stopping
quantile
max-iter
random-state
max-bins
min-samples-leaf
max-features
monotonic-cst
warm-start
max-depth
validation-fraction
loss
interaction-cst
verbose
categorical-features
l-2-regularization
predict-proba?


24.3.16 /huber-regressor

name type default description
alpha
epsilon
fit-intercept
max-iter
tol
warm-start
predict-proba?


24.3.17 /isotonic-regression

name type default description
increasing
out-of-bounds
y-max
y-min
predict-proba?


24.3.18 /k-neighbors-regressor

name type default description
algorithm
leaf-size
metric
metric-params
n-jobs
n-neighbors
p
weights
predict-proba?


24.3.19 /kernel-ridge

name type default description
alpha
coef-0
degree
gamma
kernel
kernel-params
predict-proba?


24.3.20 /lars

name type default description
fit-path
eps
random-state
jitter
copy-x
precompute
fit-intercept
n-nonzero-coefs
verbose
predict-proba?


24.3.21 /lars-cv

name type default description
eps
max-n-alphas
max-iter
n-jobs
copy-x
precompute
fit-intercept
cv
verbose
predict-proba?


24.3.22 /lasso

name type default description
positive
tol
max-iter
random-state
copy-x
precompute
fit-intercept
alpha
warm-start
selection
predict-proba?


24.3.23 /lasso-cv

name type default description
positive
tol
n-alphas
eps
alphas
max-iter
n-jobs
random-state
copy-x
precompute
fit-intercept
cv
selection
verbose
predict-proba?


24.3.24 /lasso-lars

name type default description
positive
fit-path
eps
max-iter
random-state
jitter
copy-x
precompute
fit-intercept
alpha
verbose
predict-proba?


24.3.25 /lasso-lars-cv

name type default description
positive
eps
max-n-alphas
max-iter
n-jobs
copy-x
precompute
fit-intercept
cv
verbose
predict-proba?


24.3.26 /lasso-lars-ic

name type default description
positive
eps
noise-variance
max-iter
copy-x
precompute
fit-intercept
criterion
verbose
predict-proba?


24.3.27 /linear-regression

name type default description
copy-x
fit-intercept
n-jobs
positive
tol
predict-proba?


24.3.28 /linear-svr

name type default description
tol
intercept-scaling
c
max-iter
random-state
dual
fit-intercept
loss
verbose
epsilon
predict-proba?


24.3.29 /mlp-regressor

name type default description
n-iter-no-change
learning-rate
activation
hidden-layer-sizes
tol
beta-2
early-stopping
nesterovs-momentum
batch-size
solver
shuffle
power-t
max-fun
beta-1
max-iter
random-state
momentum
learning-rate-init
alpha
warm-start
validation-fraction
loss
verbose
epsilon
predict-proba?


24.3.30 /multi-task-elastic-net

name type default description
tol
max-iter
random-state
copy-x
fit-intercept
alpha
warm-start
selection
l-1-ratio
predict-proba?


24.3.31 /multi-task-elastic-net-cv

name type default description
tol
n-alphas
eps
alphas
max-iter
n-jobs
random-state
copy-x
fit-intercept
cv
selection
l-1-ratio
verbose
predict-proba?


24.3.32 /multi-task-lasso

name type default description
alpha
copy-x
fit-intercept
max-iter
random-state
selection
tol
warm-start
predict-proba?


24.3.33 /multi-task-lasso-cv

name type default description
tol
n-alphas
eps
alphas
max-iter
n-jobs
random-state
copy-x
fit-intercept
cv
selection
verbose
predict-proba?


24.3.34 /nu-svr

name type default description
kernel
gamma
degree
tol
nu
shrinking
c
max-iter
coef-0
cache-size
verbose
predict-proba?


24.3.35 /orthogonal-matching-pursuit

name type default description
fit-intercept
n-nonzero-coefs
precompute
tol
predict-proba?


24.3.36 /orthogonal-matching-pursuit-cv

name type default description
copy
cv
fit-intercept
max-iter
n-jobs
verbose
predict-proba?


24.3.37 /passive-aggressive-regressor

name type default description
n-iter-no-change
average
tol
early-stopping
shuffle
c
max-iter
random-state
fit-intercept
warm-start
validation-fraction
loss
verbose
epsilon
predict-proba?


24.3.38 /pls-canonical

name type default description
algorithm
copy
max-iter
n-components
scale
tol
predict-proba?


24.3.39 /pls-regression

name type default description
copy
max-iter
n-components
scale
tol
predict-proba?


24.3.40 /poisson-regressor

name type default description
alpha
fit-intercept
max-iter
solver
tol
verbose
warm-start
predict-proba?


24.3.41 /quantile-regressor

name type default description
alpha
fit-intercept
quantile
solver
solver-options
predict-proba?


24.3.42 /radius-neighbors-regressor

name type default description
algorithm
leaf-size
metric
metric-params
n-jobs
p
radius
weights
predict-proba?


24.3.43 /random-forest-regressor

name type default description
min-weight-fraction-leaf
max-leaf-nodes
min-impurity-decrease
min-samples-split
bootstrap
ccp-alpha
n-jobs
random-state
oob-score
min-samples-leaf
max-features
monotonic-cst
warm-start
max-depth
n-estimators
max-samples
criterion
verbose
predict-proba?


24.3.44 /ransac-regressor

name type default description
is-data-valid
max-skips
random-state
min-samples
stop-probability
estimator
stop-n-inliers
max-trials
residual-threshold
is-model-valid
loss
stop-score
predict-proba?


24.3.45 /ridge

name type default description
alpha
copy-x
fit-intercept
max-iter
positive
random-state
solver
tol
predict-proba?


24.3.46 /ridge-cv

name type default description
alpha-per-target
alphas
cv
fit-intercept
gcv-mode
scoring
store-cv-results
predict-proba?


24.3.47 /sgd-regressor

name type default description
n-iter-no-change
learning-rate
average
tol
early-stopping
eta-0
shuffle
penalty
power-t
max-iter
random-state
fit-intercept
alpha
warm-start
l-1-ratio
validation-fraction
loss
verbose
epsilon
predict-proba?


24.3.48 /svr

name type default description
kernel
gamma
degree
tol
shrinking
c
max-iter
coef-0
cache-size
verbose
epsilon
predict-proba?


24.3.49 /theil-sen-regressor

name type default description
fit-intercept
max-iter
max-subpopulation
n-jobs
n-subsamples
random-state
tol
verbose
predict-proba?


24.3.50 /transformed-target-regressor

name type default description
check-inverse
func
inverse-func
regressor
transformer
predict-proba?


24.3.51 /tweedie-regressor

name type default description
tol
solver
power
max-iter
link
fit-intercept
alpha
warm-start
verbose
predict-proba?


source: notebooks/noj_book/sklearn_reference.clj