24 Sklearn model reference
As discussed in the Machine Learning chapter, this book contains reference chapters for machine learning models that can be registered in metamorph.ml.
This specific chapter focuses on the models of the scikit-learn Python library, which is wrapped by sklearn-clj.
(ns noj-book.sklearn-reference
(:require
[noj-book.utils.render-tools :refer [render-key-info]]
[scicloj.kindly.v4.kind :as kind]
[scicloj.metamorph.core :as mm]
[scicloj.metamorph.ml :as ml]
[tech.v3.dataset.tensor :as dst]
[libpython-clj2.python :refer [py.- ->jvm]]
[tech.v3.dataset.metamorph :as ds-mm]
[noj-book.utils.render-tools-sklearn]
[scicloj.sklearn-clj.ml]))
24.1 Sklearn model reference
Below we find all sklearn models with their parameters and the original documentation.
The parameters are given as Clojure keywords in kebab-case. Since the documentation text is imported from Python, it refers to the Python spelling of each parameter,
but the translation between the two should be obvious.
Example: logistic regression
(def ds (dst/tensor->dataset [[0 0 0] [1 1 1] [2 2 2]]))
Make a pipeline with the sklearn model ‘logistic-regression’:
(def pipe
(mm/pipeline
(ds-mm/set-inference-target 2)
{:metamorph/id :model}
(ml/model {:model-type :sklearn.classification/logistic-regression
:max-iter 100})))
Train the model:
(def fitted-ctx
(pipe {:metamorph/data ds
:metamorph/mode :fit}))
Predict on new data:
(->
(mm/transform-pipe
(dst/tensor->dataset [[3 4 5]])
pipe
fitted-ctx)
:metamorph/data)
_unnamed [1 3]:
| 0 | 1 | 2 |
|---|---|---|
| 0.00725794 | 0.10454345 | 2.0 |
Access model details via Python interop (using libpython-clj):
(-> fitted-ctx :model :model-data :model
(py.- coef_)
(->jvm))
#tech.v3.tensor<float64>[3 2]
[[ -0.4807    -0.4807]
 [-2.061E-05 -2.061E-05]
 [  0.4807     0.4807]]
All model attributes are also included in the context.
(def model-attributes
(-> fitted-ctx :model :model-data :attributes))
(kind/hiccup
[:dl (map
(fn [[k v]]
[:span
(vector :dt k)
(vector :dd (clojure.pprint/write v :stream nil))])
model-attributes)])
- n_features_in_
- 2
- coef_
- [[-4.80679547e-01 -4.80679547e-01] [-2.06085772e-05 -2.06085772e-05] [ 4.80700156e-01 4.80700156e-01]]
- intercept_
- [ 0.87322115 0.17611579 -1.04933694]
- n_iter_
- [11]
- classes_
- [0. 1. 2.]
24.2 :sklearn.classification models
24.2.1 /ada-boost-classifier
| name | type | default | description |
|---|---|---|---|
| algorithm | |||
| estimator | |||
| learning-rate | |||
| n-estimators | |||
| random-state | |||
| predict-proba? |
An AdaBoost classifier.
An AdaBoost [1]_ classifier is a meta-estimator that begins by fitting a
classifier on the original dataset and then fits additional copies of the
classifier on the same dataset but where the weights of incorrectly
classified instances are adjusted such that subsequent classifiers focus
more on difficult cases.
This class implements the algorithm based on [2]_.
Read more in the User Guide: `adaboost`.
*Added in 0.14*
Parameters
----------
- `estimator`: object, default=None
The base estimator from which the boosted ensemble is built.
Support for sample weighting is required, as well as proper
``classes_`` and ``n_classes_`` attributes. If ``None``, then
the base estimator is `~sklearn.tree.DecisionTreeClassifier`
initialized with `max_depth=1`.
*Added in 1.2*
`base_estimator` was renamed to `estimator`.
- `n_estimators`: int, default=50
The maximum number of estimators at which boosting is terminated.
In case of perfect fit, the learning procedure is stopped early.
Values must be in the range `[1, inf)`.
- `learning_rate`: float, default=1.0
Weight applied to each classifier at each boosting iteration. A higher
learning rate increases the contribution of each classifier. There is
a trade-off between the `learning_rate` and `n_estimators` parameters.
Values must be in the range `(0.0, inf)`.
- `algorithm`: {'SAMME', 'SAMME.R'}, default='SAMME.R'
If 'SAMME.R' then use the SAMME.R real boosting algorithm.
``estimator`` must support calculation of class probabilities.
If 'SAMME' then use the SAMME discrete boosting algorithm.
The SAMME.R algorithm typically converges faster than SAMME,
achieving a lower test error with fewer boosting iterations.
*Deprecated since 1.4*
`"SAMME.R"` is deprecated and will be removed in version 1.6.
'"SAMME"' will become the default.
- `random_state`: int, RandomState instance or None, default=None
Controls the random seed given at each `estimator` at each
boosting iteration.
Thus, it is only used when `estimator` exposes a `random_state`.
Pass an int for reproducible output across multiple function calls.
See `Glossary `.
Attributes
----------
- `estimator_`: estimator
The base estimator from which the ensemble is grown.
*Added in 1.2*
`base_estimator_` was renamed to `estimator_`.
- `estimators_`: list of classifiers
The collection of fitted sub-estimators.
- `classes_`: ndarray of shape (n_classes,)
The classes labels.
- `n_classes_`: int
The number of classes.
- `estimator_weights_`: ndarray of floats
Weights for each estimator in the boosted ensemble.
- `estimator_errors_`: ndarray of floats
Classification error for each estimator in the boosted
ensemble.
- `feature_importances_`: ndarray of shape (n_features,)
The impurity-based feature importances if supported by the
``estimator`` (when based on decision trees).
Warning: impurity-based feature importances can be misleading for
high cardinality features (many unique values). See
`sklearn.inspection.permutation_importance` as an alternative.
- `n_features_in_`: int
Number of features seen during `fit`.
*Added in 0.24*
- `feature_names_in_`: ndarray of shape (`n_features_in_`,)
Names of features seen during `fit`. Defined only when `X`
has feature names that are all strings.
*Added in 1.0*
See Also
--------
- `AdaBoostRegressor`: An AdaBoost regressor that begins by fitting a
regressor on the original dataset and then fits additional copies of
the regressor on the same dataset but where the weights of instances
are adjusted according to the error of the current prediction.
- `GradientBoostingClassifier`: GB builds an additive model in a forward
stage-wise fashion. Regression trees are fit on the negative gradient
of the binomial or multinomial deviance loss function. Binary
classification is a special case where only a single regression tree is
induced.
- `sklearn.tree.DecisionTreeClassifier`: A non-parametric supervised learning
method used for classification.
Creates a model that predicts the value of a target variable by
learning simple decision rules inferred from the data features.
References
----------
[1] Y. Freund, R. Schapire, "A Decision-Theoretic Generalization of on-Line Learning and an Application to Boosting", 1995.
[2] J. Zhu, H. Zou, S. Rosset, T. Hastie, "Multi-class AdaBoost", Statistics and Its Interface 2.3 (2009): 349-360. doi:10.4310/SII.2009.v2.n3.a8
Examples
--------
>>> from sklearn.ensemble import AdaBoostClassifier
>>> from sklearn.datasets import make_classification
>>> X, y = make_classification(n_samples=1000, n_features=4,
...                            n_informative=2, n_redundant=0,
...                            random_state=0, shuffle=False)
>>> clf = AdaBoostClassifier(n_estimators=100, algorithm="SAMME", random_state=0)
>>> clf.fit(X, y)
AdaBoostClassifier(algorithm='SAMME', n_estimators=100, random_state=0)
>>> clf.predict([[0, 0, 0, 0]])
array([1])
>>> clf.score(X, y)
0.96...
For a detailed example of using AdaBoost to fit a sequence of DecisionTrees as weak learners, please refer to
:ref:`sphx_glr_auto_examples_ensemble_plot_adaboost_multiclass.py`.
For a detailed example of using AdaBoost to fit a non-linearly separable classification dataset composed of two Gaussian quantiles clusters, please refer to
:ref:`sphx_glr_auto_examples_ensemble_plot_adaboost_twoclass.py`.
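The same pipeline pattern shown for logistic regression above should carry over to this model. The following is a hedged sketch (not executed here) using the requires from the top of this chapter and the toy dataset `ds`; the kebab-case keys follow the parameter table, and the chosen values are purely illustrative:
(def adaboost-pipe
  (mm/pipeline
   (ds-mm/set-inference-target 2)
   {:metamorph/id :model}
   ;; illustrative values; :n-estimators and :learning-rate correspond to
   ;; the n_estimators and learning_rate parameters documented above
   (ml/model {:model-type :sklearn.classification/ada-boost-classifier
              :n-estimators 50
              :learning-rate 1.0})))
(def adaboost-fitted
  (adaboost-pipe {:metamorph/data ds
                  :metamorph/mode :fit}))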
24.2.2 /bagging-classifier
| name | type | default | description |
|---|---|---|---|
| bootstrap | |||
| bootstrap-features | |||
| n-jobs | |||
| random-state | |||
| estimator | |||
| oob-score | |||
| max-features | |||
| warm-start | |||
| n-estimators | |||
| max-samples | |||
| verbose | |||
| predict-proba? |
A Bagging classifier.
A Bagging classifier is an ensemble meta-estimator that fits base
classifiers each on random subsets of the original dataset and then
aggregates their individual predictions (either by voting or by averaging)
to form a final prediction. Such a meta-estimator can typically be used as
a way to reduce the variance of a black-box estimator (e.g., a decision
tree), by introducing randomization into its construction procedure and
then making an ensemble out of it.
This algorithm encompasses several works from the literature. When random
subsets of the dataset are drawn as random subsets of the samples, then
this algorithm is known as Pasting [1]_. If samples are drawn with
replacement, then the method is known as Bagging [2]_. When random subsets
of the dataset are drawn as random subsets of the features, then the method
is known as Random Subspaces [3]_. Finally, when base estimators are built
on subsets of both samples and features, then the method is known as
Random Patches [4]_.
Read more in the User Guide: `bagging`.
*Added in 0.15*
Parameters
----------
- `estimator`: object, default=None
The base estimator to fit on random subsets of the dataset.
If None, then the base estimator is a
`~sklearn.tree.DecisionTreeClassifier`.
*Added in 1.2*
`base_estimator` was renamed to `estimator`.
- `n_estimators`: int, default=10
The number of base estimators in the ensemble.
- `max_samples`: int or float, default=1.0
The number of samples to draw from X to train each base estimator (with
replacement by default, see `bootstrap` for more details).
- If int, then draw `max_samples` samples.
- If float, then draw `max_samples * X.shape[0]` samples.
- `max_features`: int or float, default=1.0
The number of features to draw from X to train each base estimator (
without replacement by default, see `bootstrap_features` for more
details).
- If int, then draw `max_features` features.
- If float, then draw `max(1, int(max_features * n_features_in_))` features.
- `bootstrap`: bool, default=True
Whether samples are drawn with replacement. If False, sampling
without replacement is performed.
- `bootstrap_features`: bool, default=False
Whether features are drawn with replacement.
- `oob_score`: bool, default=False
Whether to use out-of-bag samples to estimate
the generalization error. Only available if bootstrap=True.
- `warm_start`: bool, default=False
When set to True, reuse the solution of the previous call to fit
and add more estimators to the ensemble, otherwise, just fit
a whole new ensemble. See `the Glossary `.
*Added in 0.17*
*warm_start* constructor parameter.
- `n_jobs`: int, default=None
The number of jobs to run in parallel for both `fit` and
`predict`. ``None`` means 1 unless in a
`joblib.parallel_backend` context. ``-1`` means using all
processors. See `Glossary ` for more details.
- `random_state`: int, RandomState instance or None, default=None
Controls the random resampling of the original dataset
(sample wise and feature wise).
If the base estimator accepts a `random_state` attribute, a different
seed is generated for each instance in the ensemble.
Pass an int for reproducible output across multiple function calls.
See `Glossary `.
- `verbose`: int, default=0
Controls the verbosity when fitting and predicting.
Attributes
----------
- `estimator_`: estimator
The base estimator from which the ensemble is grown.
*Added in 1.2*
`base_estimator_` was renamed to `estimator_`.
- `n_features_in_`: int
Number of features seen during `fit`.
*Added in 0.24*
- `feature_names_in_`: ndarray of shape (`n_features_in_`,)
Names of features seen during `fit`. Defined only when `X`
has feature names that are all strings.
*Added in 1.0*
- `estimators_`: list of estimators
The collection of fitted base estimators.
- `estimators_samples_`: list of arrays
The subset of drawn samples (i.e., the in-bag samples) for each base
estimator. Each subset is defined by an array of the indices selected.
- `estimators_features_`: list of arrays
The subset of drawn features for each base estimator.
- `classes_`: ndarray of shape (n_classes,)
The classes labels.
- `n_classes_`: int or list
The number of classes.
- `oob_score_`: float
Score of the training dataset obtained using an out-of-bag estimate.
This attribute exists only when ``oob_score`` is True.
- `oob_decision_function_`: ndarray of shape (n_samples, n_classes)
Decision function computed with out-of-bag estimate on the training
set. If n_estimators is small it might be possible that a data point
was never left out during the bootstrap. In this case,
`oob_decision_function_` might contain NaN. This attribute exists
only when ``oob_score`` is True.
See Also
--------
- `BaggingRegressor`: A Bagging regressor.
References
----------
[1] L. Breiman, "Pasting small votes for classification in large databases and on-line", Machine Learning, 36(1), 85-103, 1999.
[2] L. Breiman, "Bagging predictors", Machine Learning, 24(2), 123-140, 1996.
[3] T. Ho, "The random subspace method for constructing decision forests", Pattern Analysis and Machine Intelligence, 20(8), 832-844, 1998.
[4] G. Louppe and P. Geurts, "Ensembles on Random Patches", Machine Learning and Knowledge Discovery in Databases, 346-361, 2012.
Examples
--------
>>> from sklearn.svm import SVC
>>> from sklearn.ensemble import BaggingClassifier
>>> from sklearn.datasets import make_classification
>>> X, y = make_classification(n_samples=100, n_features=4,
...                            n_informative=2, n_redundant=0,
...                            random_state=0, shuffle=False)
>>> clf = BaggingClassifier(estimator=SVC(),
...                         n_estimators=10, random_state=0).fit(X, y)
>>> clf.predict([[0, 0, 0, 0]])
array([1])
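A similar hedged sketch for the bagging classifier, again reusing `ds` and the chapter's aliases; here we also fit the pipeline with `mm/fit-pipe`, assuming parameter values are passed through unchanged to the Python estimator:
(def bagging-pipe
  (mm/pipeline
   (ds-mm/set-inference-target 2)
   {:metamorph/id :model}
   ;; illustrative values; :n-estimators and :max-samples mirror
   ;; the n_estimators and max_samples parameters above
   (ml/model {:model-type :sklearn.classification/bagging-classifier
              :n-estimators 10
              :max-samples 1.0})))
(def bagging-fitted
  (mm/fit-pipe ds bagging-pipe))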
24.2.3 /bernoulli-nb
| name | type | default | description |
|---|---|---|---|
| alpha | |||
| binarize | |||
| class-prior | |||
| fit-prior | |||
| force-alpha | |||
| predict-proba? |
Naive Bayes classifier for multivariate Bernoulli models.
Like MultinomialNB, this classifier is suitable for discrete data. The
difference is that while MultinomialNB works with occurrence counts,
BernoulliNB is designed for binary/boolean features.
Read more in the User Guide: `bernoulli_naive_bayes`.
Parameters
----------
- `alpha`: float or array-like of shape (n_features,), default=1.0
Additive (Laplace/Lidstone) smoothing parameter
(set alpha=0 and force_alpha=True, for no smoothing).
- `force_alpha`: bool, default=True
If False and alpha is less than 1e-10, it will set alpha to
1e-10. If True, alpha will remain unchanged. This may cause
numerical errors if alpha is too close to 0.
*Added in 1.2*
*Changed in 1.4*
The default value of `force_alpha` changed to `True`.
- `binarize`: float or None, default=0.0
Threshold for binarizing (mapping to booleans) of sample features.
If None, input is presumed to already consist of binary vectors.
- `fit_prior`: bool, default=True
Whether to learn class prior probabilities or not.
If false, a uniform prior will be used.
- `class_prior`: array-like of shape (n_classes,), default=None
Prior probabilities of the classes. If specified, the priors are not
adjusted according to the data.
Attributes
----------
- `class_count_`: ndarray of shape (n_classes,)
Number of samples encountered for each class during fitting. This
value is weighted by the sample weight when provided.
- `class_log_prior_`: ndarray of shape (n_classes,)
Log probability of each class (smoothed).
- `classes_`: ndarray of shape (n_classes,)
Class labels known to the classifier
- `feature_count_`: ndarray of shape (n_classes, n_features)
Number of samples encountered for each (class, feature)
during fitting. This value is weighted by the sample weight when
provided.
- `feature_log_prob_`: ndarray of shape (n_classes, n_features)
Empirical log probability of features given a class, P(x_i|y).
- `n_features_in_`: int
Number of features seen during `fit`.
*Added in 0.24*
- `feature_names_in_`: ndarray of shape (`n_features_in_`,)
Names of features seen during `fit`. Defined only when `X`
has feature names that are all strings.
*Added in 1.0*
See Also
--------
- `CategoricalNB`: Naive Bayes classifier for categorical features.
- `ComplementNB`: The Complement Naive Bayes classifier
described in Rennie et al. (2003).
- `GaussianNB`: Gaussian Naive Bayes (GaussianNB).
- `MultinomialNB`: Naive Bayes classifier for multinomial models.
References
----------
C.D. Manning, P. Raghavan and H. Schuetze (2008). Introduction to
Information Retrieval. Cambridge University Press, pp. 234-265.
https://nlp.stanford.edu/IR-book/html/htmledition/the-bernoulli-model-1.html
A. McCallum and K. Nigam (1998). A comparison of event models for naive
Bayes text classification. Proc. AAAI/ICML-98 Workshop on Learning for
Text Categorization, pp. 41-48.
V. Metsis, I. Androutsopoulos and G. Paliouras (2006). Spam filtering with
naive Bayes -- Which naive Bayes? 3rd Conf. on Email and Anti-Spam (CEAS).
Examples
--------
>>> import numpy as np
>>> rng = np.random.RandomState(1)
>>> X = rng.randint(5, size=(6, 100))
>>> Y = np.array([1, 2, 3, 4, 4, 5])
>>> from sklearn.naive_bayes import BernoulliNB
>>> clf = BernoulliNB()
>>> clf.fit(X, Y)
BernoulliNB()
>>> print(clf.predict(X[2:3]))
[3]
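As a hedged sketch, the same metamorph pipeline approach works for this Naive Bayes variant; here we additionally transform a new row through the fitted pipeline, mirroring the logistic regression example above:
(def bernoulli-pipe
  (mm/pipeline
   (ds-mm/set-inference-target 2)
   {:metamorph/id :model}
   ;; :alpha is the Laplace/Lidstone smoothing parameter described above
   (ml/model {:model-type :sklearn.classification/bernoulli-nb
              :alpha 1.0})))
(-> (mm/transform-pipe
     (dst/tensor->dataset [[3 4 5]])
     bernoulli-pipe
     (mm/fit-pipe ds bernoulli-pipe))
    :metamorph/data)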
24.2.4 /calibrated-classifier-cv
| name | type | default | description |
|---|---|---|---|
| cv | |||
| ensemble | |||
| estimator | |||
| method | |||
| n-jobs | |||
| predict-proba? |
Probability calibration with isotonic regression or logistic regression.
This class uses cross-validation to both estimate the parameters of a
classifier and subsequently calibrate a classifier. With default
`ensemble=True`, for each cv split it
fits a copy of the base estimator to the training subset, and calibrates it
using the testing subset. For prediction, predicted probabilities are
averaged across these individual calibrated classifiers. When
`ensemble=False`, cross-validation is used to obtain unbiased predictions,
via `~sklearn.model_selection.cross_val_predict`, which are then
used for calibration. For prediction, the base estimator, trained using all
the data, is used. This is the prediction method implemented when
`probabilities=True` for `~sklearn.svm.SVC` and `~sklearn.svm.NuSVC`
estimators (see User Guide: `scores_probabilities` for details).
Already fitted classifiers can be calibrated via the parameter
`cv="prefit"`. In this case, no cross-validation is used and all provided
data is used for calibration. The user has to take care manually that data
for model fitting and calibration are disjoint.
The calibration is based on the `decision_function` method of the
`estimator` if it exists, else on `predict_proba`.
Read more in the User Guide: `calibration`.
In order to learn more on the CalibratedClassifierCV class, see the
following calibration examples:
:ref:`sphx_glr_auto_examples_calibration_plot_calibration.py`,
:ref:`sphx_glr_auto_examples_calibration_plot_calibration_curve.py`, and
:ref:`sphx_glr_auto_examples_calibration_plot_calibration_multiclass.py`.
Parameters
----------
- `estimator`: estimator instance, default=None
The classifier whose output need to be calibrated to provide more
accurate `predict_proba` outputs. The default classifier is
a `~sklearn.svm.LinearSVC`.
*Added in 1.2*
- `method`: {'sigmoid', 'isotonic'}, default='sigmoid'
The method to use for calibration. Can be 'sigmoid' which
corresponds to Platt's method (i.e. a logistic regression model) or
'isotonic' which is a non-parametric approach. It is not advised to
use isotonic calibration with too few calibration samples
``(<<1000)`` since it tends to overfit.
- `cv`: int, cross-validation generator, iterable or "prefit", default=None
Determines the cross-validation splitting strategy.
Possible inputs for cv are:
- None, to use the default 5-fold cross-validation,
- integer, to specify the number of folds.
- `CV splitter`,
- An iterable yielding (train, test) splits as arrays of indices.
For integer/None inputs, if ``y`` is binary or multiclass,
`~sklearn.model_selection.StratifiedKFold` is used. If ``y`` is
neither binary nor multiclass, `~sklearn.model_selection.KFold`
is used.
Refer to the User Guide: `cross_validation` for the various
cross-validation strategies that can be used here.
If "prefit" is passed, it is assumed that `estimator` has been
fitted already and all data is used for calibration.
*Changed in 0.22*
``cv`` default value if None changed from 3-fold to 5-fold.
- `n_jobs`: int, default=None
Number of jobs to run in parallel.
``None`` means 1 unless in a `joblib.parallel_backend` context.
``-1`` means using all processors.
Base estimator clones are fitted in parallel across cross-validation
iterations. Therefore parallelism happens only when `cv != "prefit"`.
See `Glossary ` for more details.
*Added in 0.24*
- `ensemble`: bool, default=True
Determines how the calibrator is fitted when `cv` is not `'prefit'`.
Ignored if `cv='prefit'`.
If `True`, the `estimator` is fitted using training data, and
calibrated using testing data, for each `cv` fold. The final estimator
is an ensemble of `n_cv` fitted classifier and calibrator pairs, where
`n_cv` is the number of cross-validation folds. The output is the
average predicted probabilities of all pairs.
If `False`, `cv` is used to compute unbiased predictions, via
`~sklearn.model_selection.cross_val_predict`, which are then
used for calibration. At prediction time, the classifier used is the
`estimator` trained on all the data.
Note that this method is also internally implemented in
`sklearn.svm` estimators with the `probabilities=True` parameter.
*Added in 0.24*
Attributes
----------
- `classes_`: ndarray of shape (n_classes,)
The class labels.
- `n_features_in_`: int
Number of features seen during `fit`. Only defined if the
underlying estimator exposes such an attribute when fit.
*Added in 0.24*
- `feature_names_in_`: ndarray of shape (`n_features_in_`,)
Names of features seen during `fit`. Only defined if the
underlying estimator exposes such an attribute when fit.
*Added in 1.0*
- `calibrated_classifiers_`: list (len() equal to cv or 1 if `cv="prefit"` or `ensemble=False`)
The list of classifier and calibrator pairs.
- When `cv="prefit"`, the fitted `estimator` and fitted
calibrator.
- When `cv` is not "prefit" and `ensemble=True`, `n_cv` fitted
`estimator` and calibrator pairs. `n_cv` is the number of
cross-validation folds.
- When `cv` is not "prefit" and `ensemble=False`, the `estimator`,
fitted on all the data, and fitted calibrator.
*Changed in 0.24*
Single calibrated classifier case when `ensemble=False`.
See Also
--------
- `calibration_curve`: Compute true and predicted probabilities
for a calibration curve.
References
----------
[1] Obtaining calibrated probability estimates from decision trees and naive Bayesian classifiers, B. Zadrozny & C. Elkan, ICML 2001
[2] Transforming Classifier Scores into Accurate Multiclass Probability Estimates, B. Zadrozny & C. Elkan, (KDD 2002)
[3] Probabilistic Outputs for Support Vector Machines and Comparisons to Regularized Likelihood Methods, J. Platt, (1999)
[4] Predicting Good Probabilities with Supervised Learning, A. Niculescu-Mizil & R. Caruana, ICML 2005
Examples
--------
>>> from sklearn.datasets import make_classification
>>> from sklearn.naive_bayes import GaussianNB
>>> from sklearn.calibration import CalibratedClassifierCV
>>> X, y = make_classification(n_samples=100, n_features=2,
...                            n_redundant=0, random_state=42)
>>> base_clf = GaussianNB()
>>> calibrated_clf = CalibratedClassifierCV(base_clf, cv=3)
>>> calibrated_clf.fit(X, y)
CalibratedClassifierCV(...)
>>> len(calibrated_clf.calibrated_classifiers_)
3
>>> calibrated_clf.predict_proba(X)[:5, :]
array([[0.110..., 0.889...],
       [0.072..., 0.927...],
       [0.928..., 0.071...],
       [0.928..., 0.071...],
       [0.071..., 0.928...]])
>>> from sklearn.model_selection import train_test_split
>>> X, y = make_classification(n_samples=100, n_features=2,
...                            n_redundant=0, random_state=42)
>>> X_train, X_calib, y_train, y_calib = train_test_split(
...     X, y, random_state=42
... )
>>> base_clf = GaussianNB()
>>> base_clf.fit(X_train, y_train)
GaussianNB()
>>> calibrated_clf = CalibratedClassifierCV(base_clf, cv="prefit")
>>> calibrated_clf.fit(X_calib, y_calib)
CalibratedClassifierCV(...)
>>> len(calibrated_clf.calibrated_classifiers_)
1
>>> calibrated_clf.predict_proba([[-0.5, 0.5]])
array([[0.936..., 0.063...]])
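A minimal, hedged sketch of the corresponding metamorph.ml pipeline; the string value for `:method` is assumed to be passed through to Python unchanged, and with `:estimator` left unset sklearn falls back to its default base estimator. Note that the 3-row toy dataset `ds` above has too few samples per class for cross-validation, so fitting this pipeline would need a larger dataset:
(def calibrated-pipe
  (mm/pipeline
   (ds-mm/set-inference-target 2)
   {:metamorph/id :model}
   ;; illustrative values; :method and :cv mirror the method and cv parameters above
   (ml/model {:model-type :sklearn.classification/calibrated-classifier-cv
              :method "sigmoid"
              :cv 2})))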
24.2.5 /categorical-nb
| name | type | default | description |
|---|---|---|---|
| alpha | |||
| class-prior | |||
| fit-prior | |||
| force-alpha | |||
| min-categories | |||
| predict-proba? |
Naive Bayes classifier for categorical features.
The categorical Naive Bayes classifier is suitable for classification with
discrete features that are categorically distributed. The categories of
each feature are drawn from a categorical distribution.
Read more in the User Guide: `categorical_naive_bayes`.
Parameters
----------
- `alpha`: float, default=1.0
Additive (Laplace/Lidstone) smoothing parameter
(set alpha=0 and force_alpha=True, for no smoothing).
- `force_alpha`: bool, default=True
If False and alpha is less than 1e-10, it will set alpha to
1e-10. If True, alpha will remain unchanged. This may cause
numerical errors if alpha is too close to 0.
*Added in 1.2*
*Changed in 1.4*
The default value of `force_alpha` changed to `True`.
- `fit_prior`: bool, default=True
Whether to learn class prior probabilities or not.
If false, a uniform prior will be used.
- `class_prior`: array-like of shape (n_classes,), default=None
Prior probabilities of the classes. If specified, the priors are not
adjusted according to the data.
- `min_categories`: int or array-like of shape (n_features,), default=None
Minimum number of categories per feature.
- integer: Sets the minimum number of categories per feature to
`n_categories` for each features.
- array-like: shape (n_features,) where `n_categories[i]` holds the
minimum number of categories for the ith column of the input.
- None (default): Determines the number of categories automatically
from the training data.
*Added in 0.24*
Attributes
----------
- `category_count_`: list of arrays of shape (n_features,)
Holds arrays of shape (n_classes, n_categories of respective feature)
for each feature. Each array provides the number of samples
encountered for each class and category of the specific feature.
- `class_count_`: ndarray of shape (n_classes,)
Number of samples encountered for each class during fitting. This
value is weighted by the sample weight when provided.
- `class_log_prior_`: ndarray of shape (n_classes,)
Smoothed empirical log probability for each class.
- `classes_`: ndarray of shape (n_classes,)
Class labels known to the classifier
- `feature_log_prob_`: list of arrays of shape (n_features,)
Holds arrays of shape (n_classes, n_categories of respective feature)
for each feature. Each array provides the empirical log probability
of categories given the respective feature and class, ``P(x_i|y)``.
- `n_features_in_`: int
Number of features seen during `fit`.
*Added in 0.24*
- `feature_names_in_`: ndarray of shape (`n_features_in_`,)
Names of features seen during `fit`. Defined only when `X`
has feature names that are all strings.
*Added in 1.0*
- `n_categories_`: ndarray of shape (n_features,), dtype=np.int64
Number of categories for each feature. This value is
inferred from the data or set by the minimum number of categories.
*Added in 0.24*
See Also
--------
- `BernoulliNB`: Naive Bayes classifier for multivariate Bernoulli models.
- `ComplementNB`: Complement Naive Bayes classifier.
- `GaussianNB`: Gaussian Naive Bayes.
- `MultinomialNB`: Naive Bayes classifier for multinomial models.
Examples
--------
>>> import numpy as np
>>> rng = np.random.RandomState(1)
>>> X = rng.randint(5, size=(6, 100))
>>> y = np.array([1, 2, 3, 4, 5, 6])
>>> from sklearn.naive_bayes import CategoricalNB
>>> clf = CategoricalNB()
>>> clf.fit(X, y)
CategoricalNB()
>>> print(clf.predict(X[2:3]))
[3]
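For completeness, a hedged sketch of the options map for this model as it would be used with `ml/model`; only `:model-type` is required, and the other keys mirror the parameter table above with illustrative values:
(ml/model {:model-type :sklearn.classification/categorical-nb
           ;; illustrative values; defaults are documented above
           :alpha 1.0
           :fit-prior true})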
24.2.6 /complement-nb
| name | type | default | description |
|---|---|---|---|
| alpha | |||
| class-prior | |||
| fit-prior | |||
| force-alpha | |||
| norm | |||
| predict-proba? |
The Complement Naive Bayes classifier described in Rennie et al. (2003).
The Complement Naive Bayes classifier was designed to correct the "severe
assumptions" made by the standard Multinomial Naive Bayes classifier. It is
particularly suited for imbalanced data sets.
Read more in the User Guide: `complement_naive_bayes`.
*Added in 0.20*
Parameters
----------
- `alpha`: float or array-like of shape (n_features,), default=1.0
Additive (Laplace/Lidstone) smoothing parameter
(set alpha=0 and force_alpha=True, for no smoothing).
- `force_alpha`: bool, default=True
If False and alpha is less than 1e-10, it will set alpha to
1e-10. If True, alpha will remain unchanged. This may cause
numerical errors if alpha is too close to 0.
*Added in 1.2*
*Changed in 1.4*
The default value of `force_alpha` changed to `True`.
- `fit_prior`: bool, default=True
Only used in edge case with a single class in the training set.
- `class_prior`: array-like of shape (n_classes,), default=None
Prior probabilities of the classes. Not used.
- `norm`: bool, default=False
Whether or not a second normalization of the weights is performed. The
default behavior mirrors the implementations found in Mahout and Weka,
which do not follow the full algorithm described in Table 9 of the
paper.
Attributes
----------
- `class_count_`: ndarray of shape (n_classes,)
Number of samples encountered for each class during fitting. This
value is weighted by the sample weight when provided.
- `class_log_prior_`: ndarray of shape (n_classes,)
Smoothed empirical log probability for each class. Only used in edge
case with a single class in the training set.
- `classes_`: ndarray of shape (n_classes,)
Class labels known to the classifier
- `feature_all_`: ndarray of shape (n_features,)
Number of samples encountered for each feature during fitting. This
value is weighted by the sample weight when provided.
- `feature_count_`: ndarray of shape (n_classes, n_features)
Number of samples encountered for each (class, feature) during fitting.
This value is weighted by the sample weight when provided.
- `feature_log_prob_`: ndarray of shape (n_classes, n_features)
Empirical weights for class complements.
- `n_features_in_`: int
Number of features seen during `fit`.
*Added in 0.24*
- `feature_names_in_`: ndarray of shape (`n_features_in_`,)
Names of features seen during `fit`. Defined only when `X`
has feature names that are all strings.
*Added in 1.0*
See Also
--------
- `BernoulliNB`: Naive Bayes classifier for multivariate Bernoulli models.
- `CategoricalNB`: Naive Bayes classifier for categorical features.
- `GaussianNB`: Gaussian Naive Bayes.
- `MultinomialNB`: Naive Bayes classifier for multinomial models.
References
----------
Rennie, J. D., Shih, L., Teevan, J., & Karger, D. R. (2003).
Tackling the poor assumptions of naive bayes text classifiers. In ICML
(Vol. 3, pp. 616-623).
https://people.csail.mit.edu/jrennie/papers/icml03-nb.pdf
Examples
--------
>>> import numpy as np
>>> rng = np.random.RandomState(1)
>>> X = rng.randint(5, size=(6, 100))
>>> y = np.array([1, 2, 3, 4, 5, 6])
>>> from sklearn.naive_bayes import ComplementNB
>>> clf = ComplementNB()
>>> clf.fit(X, y)
ComplementNB()
>>> print(clf.predict(X[2:3]))
[3]
24.2.7 /decision-tree-classifier
| name | type | default | description |
|---|---|---|---|
| min-weight-fraction-leaf | |||
| max-leaf-nodes | |||
| min-impurity-decrease | |||
| min-samples-split | |||
| ccp-alpha | |||
| splitter | |||
| random-state | |||
| min-samples-leaf | |||
| max-features | |||
| monotonic-cst | |||
| max-depth | |||
| class-weight | |||
| criterion | |||
| predict-proba? |
A decision tree classifier.
Read more in the User Guide: `tree`.
Parameters
----------
- `criterion`: {"gini", "entropy", "log_loss"}, default="gini"
The function to measure the quality of a split. Supported criteria are
"gini" for the Gini impurity and "log_loss" and "entropy" both for the
Shannon information gain, see :ref:`tree_mathematical_formulation`.
- `splitter`: {"best", "random"}, default="best"
The strategy used to choose the split at each node. Supported
strategies are "best" to choose the best split and "random" to choose
the best random split.
- `max_depth`: int, default=None
The maximum depth of the tree. If None, then nodes are expanded until
all leaves are pure or until all leaves contain less than
min_samples_split samples.
- `min_samples_split`: int or float, default=2
The minimum number of samples required to split an internal node:
- If int, then consider `min_samples_split` as the minimum number.
- If float, then `min_samples_split` is a fraction and
`ceil(min_samples_split * n_samples)` are the minimum
number of samples for each split.
*Changed in 0.18*
Added float values for fractions.
- `min_samples_leaf`: int or float, default=1
The minimum number of samples required to be at a leaf node.
A split point at any depth will only be considered if it leaves at
least ``min_samples_leaf`` training samples in each of the left and
right branches. This may have the effect of smoothing the model,
especially in regression.
- If int, then consider `min_samples_leaf` as the minimum number.
- If float, then `min_samples_leaf` is a fraction and
`ceil(min_samples_leaf * n_samples)` are the minimum
number of samples for each node.
*Changed in 0.18*
Added float values for fractions.
- `min_weight_fraction_leaf`: float, default=0.0
The minimum weighted fraction of the sum total of weights (of all
the input samples) required to be at a leaf node. Samples have
equal weight when sample_weight is not provided.
- `max_features`: int, float or {"sqrt", "log2"}, default=None
The number of features to consider when looking for the best split:
- If int, then consider `max_features` features at each split.
- If float, then `max_features` is a fraction and
`max(1, int(max_features * n_features_in_))` features are considered at
each split.
- If "sqrt", then `max_features=sqrt(n_features)`.
- If "log2", then `max_features=log2(n_features)`.
- If None, then `max_features=n_features`.
Note: the search for a split does not stop until at least one
valid partition of the node samples is found, even if it requires to
effectively inspect more than ``max_features`` features.
- `random_state`: int, RandomState instance or None, default=None
Controls the randomness of the estimator. The features are always
randomly permuted at each split, even if ``splitter`` is set to
``"best"``. When ``max_features < n_features``, the algorithm will
select ``max_features`` at random at each split before finding the best
split among them. But the best found split may vary across different
runs, even if ``max_features=n_features``. That is the case, if the
improvement of the criterion is identical for several splits and one
split has to be selected at random. To obtain a deterministic behaviour
during fitting, ``random_state`` has to be fixed to an integer.
See `Glossary ` for details.
- `max_leaf_nodes`: int, default=None
Grow a tree with ``max_leaf_nodes`` in best-first fashion.
Best nodes are defined as relative reduction in impurity.
If None then unlimited number of leaf nodes.
- `min_impurity_decrease`: float, default=0.0
A node will be split if this split induces a decrease of the impurity
greater than or equal to this value.
The weighted impurity decrease equation is the following
N_t / N * (impurity - N_t_R / N_t * right_impurity
- N_t_L / N_t * left_impurity)
where ``N`` is the total number of samples, ``N_t`` is the number of
samples at the current node, ``N_t_L`` is the number of samples in the
left child, and ``N_t_R`` is the number of samples in the right child.
``N``, ``N_t``, ``N_t_R`` and ``N_t_L`` all refer to the weighted sum,
if ``sample_weight`` is passed.
*Added in 0.19*
- `class_weight`: dict, list of dict or "balanced", default=None
Weights associated with classes in the form ``{class_label: weight}``.
If None, all classes are supposed to have weight one. For
multi-output problems, a list of dicts can be provided in the same
order as the columns of y.
Note that for multioutput (including multilabel) weights should be
defined for each class of every column in its own dict. For example,
for four-class multilabel classification weights should be
[{0: 1, 1: 1}, {0: 1, 1: 5}, {0: 1, 1: 1}, {0: 1, 1: 1}] instead of
[{1:1}, {2:5}, {3:1}, {4:1}].
The "balanced" mode uses the values of y to automatically adjust
weights inversely proportional to class frequencies in the input data
as ``n_samples / (n_classes * np.bincount(y))``.
For multi-output, the weights of each column of y will be multiplied.
Note that these weights will be multiplied with sample_weight (passed
through the fit method) if sample_weight is specified.
- `ccp_alpha`: non-negative float, default=0.0
Complexity parameter used for Minimal Cost-Complexity Pruning. The
subtree with the largest cost complexity that is smaller than
``ccp_alpha`` will be chosen. By default, no pruning is performed. See
`minimal_cost_complexity_pruning` for details.
*Added in 0.22*
- `monotonic_cst`: array-like of int of shape (n_features), default=None
Indicates the monotonicity constraint to enforce on each feature.
- 1: monotonic increase
- 0: no constraint
- -1: monotonic decrease
If monotonic_cst is None, no constraints are applied.
Monotonicity constraints are not supported for:
- multiclass classifications (i.e. when `n_classes > 2`),
- multioutput classifications (i.e. when `n_outputs_ > 1`),
- classifications trained on data with missing values.
The constraints hold over the probability of the positive class.
Read more in the User Guide.
*Added in 1.4*
Attributes
----------
- `classes_`: ndarray of shape (n_classes,) or list of ndarray
The classes labels (single output problem),
or a list of arrays of class labels (multi-output problem).
- `feature_importances_`: ndarray of shape (n_features,)
The impurity-based feature importances.
The higher, the more important the feature.
The importance of a feature is computed as the (normalized)
total reduction of the criterion brought by that feature. It is also
known as the Gini importance [4]_.
Warning: impurity-based feature importances can be misleading for
high cardinality features (many unique values). See
`sklearn.inspection.permutation_importance` as an alternative.
- `max_features_`: int
The inferred value of max_features.
- `n_classes_`: int or list of int
The number of classes (for single output problems),
or a list containing the number of classes for each
output (for multi-output problems).
- `n_features_in_`: int
Number of features seen during `fit`.
*Added in 0.24*
- `feature_names_in_`: ndarray of shape (`n_features_in_`,)
Names of features seen during `fit`. Defined only when `X`
has feature names that are all strings.
*Added in 1.0*
- `n_outputs_`: int
The number of outputs when ``fit`` is performed.
- `tree_`: Tree instance
The underlying Tree object. Please refer to
``help(sklearn.tree._tree.Tree)`` for attributes of the Tree object and
:ref:`sphx_glr_auto_examples_tree_plot_unveil_tree_structure.py`
for basic usage of these attributes.
See Also
--------
- `DecisionTreeRegressor`: A decision tree regressor.
Notes
-----
The default values for the parameters controlling the size of the trees
(e.g. ``max_depth``, ``min_samples_leaf``, etc.) lead to fully grown and
unpruned trees which can potentially be very large on some data sets. To
reduce memory consumption, the complexity and size of the trees should be
controlled by setting those parameter values.
The `predict` method operates using the `numpy.argmax`
function on the outputs of `predict_proba`. This means that in
case the highest predicted probabilities are tied, the classifier will
predict the tied class with the lowest index in `classes_`.
References
----------
[1] https://en.wikipedia.org/wiki/Decision_tree_learning
[2] L. Breiman, J. Friedman, R. Olshen, and C. Stone, "Classification and Regression Trees", Wadsworth, Belmont, CA, 1984.
[3] T. Hastie, R. Tibshirani and J. Friedman. "Elements of Statistical Learning", Springer, 2009.
[4] L. Breiman, and A. Cutler, "Random Forests", https://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm
Examples
--------
>>> from sklearn.datasets import load_iris
>>> from sklearn.model_selection import cross_val_score
>>> from sklearn.tree import DecisionTreeClassifier
>>> clf = DecisionTreeClassifier(random_state=0)
>>> iris = load_iris()
>>> cross_val_score(clf, iris.data, iris.target, cv=10)
array([1.  , 0.93..., 0.86..., 0.93..., 0.93...,
       0.93..., 0.93..., 1.  , 0.93..., 1.  ])
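A hedged sketch of using the decision tree classifier through the same pipeline pattern as above; `:criterion` and `:max-depth` correspond to the `criterion` and `max_depth` parameters documented above, with illustrative values:
(def tree-pipe
  (mm/pipeline
   (ds-mm/set-inference-target 2)
   {:metamorph/id :model}
   ;; illustrative values; string values are assumed to pass through to Python unchanged
   (ml/model {:model-type :sklearn.classification/decision-tree-classifier
              :criterion "gini"
              :max-depth 3})))
(def tree-fitted
  (mm/fit-pipe ds tree-pipe))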
24.2.8 /dummy-classifier
| name | type | default | description |
|---|---|---|---|
| constant | |||
| random-state | |||
| strategy | |||
| predict-proba? |
DummyClassifier makes predictions that ignore the input features.
This classifier serves as a simple baseline to compare against other more
complex classifiers.
The specific behavior of the baseline is selected with the `strategy`
parameter.
All strategies make predictions that ignore the input feature values passed
as the `X` argument to `fit` and `predict`. The predictions, however,
typically depend on values observed in the `y` parameter passed to `fit`.
Note that the "stratified" and "uniform" strategies lead to
non-deterministic predictions that can be rendered deterministic by setting
the `random_state` parameter if needed. The other strategies are naturally
deterministic and, once fit, always return the same constant prediction
for any value of `X`.
Read more in the User Guide: `dummy_estimators`.
*Added in 0.13*
Parameters
----------
- `strategy`: {"most_frequent", "prior", "stratified", "uniform", "constant"}, default="prior"
Strategy to use to generate predictions.
* "most_frequent": the `predict` method always returns the most
frequent class label in the observed `y` argument passed to `fit`.
The `predict_proba` method returns the matching one-hot encoded
vector.
* "prior": the `predict` method always returns the most frequent
class label in the observed `y` argument passed to `fit` (like
"most_frequent"). ``predict_proba`` always returns the empirical
class distribution of `y` also known as the empirical class prior
distribution.
* "stratified": the `predict_proba` method randomly samples one-hot
vectors from a multinomial distribution parametrized by the empirical
class prior probabilities.
The `predict` method returns the class label which got probability
one in the one-hot vector of `predict_proba`.
Each sampled row of both methods is therefore independent and
identically distributed.
* "uniform": generates predictions uniformly at random from the list
of unique classes observed in `y`, i.e. each class has equal
probability.
* "constant": always predicts a constant label that is provided by
the user. This is useful for metrics that evaluate a non-majority
class.
*Changed in 0.24*
The default value of `strategy` has changed to "prior" in version
0.24.
- `random_state`: int, RandomState instance or None, default=None
Controls the randomness to generate the predictions when
``strategy='stratified'`` or ``strategy='uniform'``.
Pass an int for reproducible output across multiple function calls.
See `Glossary `.
- `constant`: int or str or array-like of shape (n_outputs,), default=None
The explicit constant as predicted by the "constant" strategy. This
parameter is useful only for the "constant" strategy.
Attributes
----------
- `classes_`: ndarray of shape (n_classes,) or list of such arrays
Unique class labels observed in `y`. For multi-output classification
problems, this attribute is a list of arrays as each output has an
independent set of possible classes.
- `n_classes_`: int or list of int
Number of label for each output.
- `class_prior_`: ndarray of shape (n_classes,) or list of such arrays
Frequency of each class observed in `y`. For multioutput classification
problems, this is computed independently for each output.
- `n_features_in_`: int
Number of features seen during `fit`.
- `feature_names_in_`: ndarray of shape (`n_features_in_`,)
Names of features seen during `fit`. Defined only when `X` has
feature names that are all strings.
- `n_outputs_`: int
Number of outputs.
- `sparse_output_`: bool
True if the array returned from predict is to be in sparse CSC format.
Is automatically set to True if the input `y` is passed in sparse
format.
See Also
--------
- `DummyRegressor`: Regressor that makes predictions using simple rules.
Examples
--------
>>> import numpy as np
>>> from sklearn.dummy import DummyClassifier
>>> X = np.array([-1, 1, 1, 1])
>>> y = np.array([0, 1, 1, 1])
>>> dummy_clf = DummyClassifier(strategy="most_frequent")
>>> dummy_clf.fit(X, y)
DummyClassifier(strategy='most_frequent')
>>> dummy_clf.predict(X)
array([1, 1, 1, 1])
>>> dummy_clf.score(X, y)
0.75
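Because the dummy classifier ignores the features, it is handy as a baseline inside a metamorph pipeline. A hedged sketch, reusing the toy dataset `ds` and the chapter's aliases; `"most_frequent"` is one of the strategy values listed above:
(def dummy-pipe
  (mm/pipeline
   (ds-mm/set-inference-target 2)
   {:metamorph/id :model}
   ;; illustrative value for :strategy, assumed to be passed through as a Python string
   (ml/model {:model-type :sklearn.classification/dummy-classifier
              :strategy "most_frequent"})))
(-> (mm/transform-pipe
     (dst/tensor->dataset [[3 4 5]])
     dummy-pipe
     (mm/fit-pipe ds dummy-pipe))
    :metamorph/data)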
24.2.9 /extra-tree-classifier
| name | type | default | description |
|---|---|---|---|
| min-weight-fraction-leaf | |||
| max-leaf-nodes | |||
| min-impurity-decrease | |||
| min-samples-split | |||
| ccp-alpha | |||
| splitter | |||
| random-state | |||
| min-samples-leaf | |||
| max-features | |||
| monotonic-cst | |||
| max-depth | |||
| class-weight | |||
| criterion | |||
| predict-proba? |
An extremely randomized tree classifier.
Extra-trees differ from classic decision trees in the way they are built.
When looking for the best split to separate the samples of a node into two
groups, random splits are drawn for each of the `max_features` randomly
selected features and the best split among those is chosen. When
`max_features` is set to 1, this amounts to building a totally random
decision tree.
Warning: Extra-trees should only be used within ensemble methods.
Read more in the User Guide: `tree`.
Parameters
----------
- `criterion`: {"gini", "entropy", "log_loss"}, default="gini"
The function to measure the quality of a split. Supported criteria are
"gini" for the Gini impurity and "log_loss" and "entropy" both for the
Shannon information gain, see :ref:`tree_mathematical_formulation`.
- `splitter`: {"random", "best"}, default="random"
The strategy used to choose the split at each node. Supported
strategies are "best" to choose the best split and "random" to choose
the best random split.
- `max_depth`: int, default=None
The maximum depth of the tree. If None, then nodes are expanded until
all leaves are pure or until all leaves contain less than
min_samples_split samples.
- `min_samples_split`: int or float, default=2
The minimum number of samples required to split an internal node:
- If int, then consider `min_samples_split` as the minimum number.
- If float, then `min_samples_split` is a fraction and
`ceil(min_samples_split * n_samples)` are the minimum
number of samples for each split.
*Changed in 0.18*
Added float values for fractions.
- `min_samples_leaf`: int or float, default=1
The minimum number of samples required to be at a leaf node.
A split point at any depth will only be considered if it leaves at
least ``min_samples_leaf`` training samples in each of the left and
right branches. This may have the effect of smoothing the model,
especially in regression.
- If int, then consider `min_samples_leaf` as the minimum number.
- If float, then `min_samples_leaf` is a fraction and
`ceil(min_samples_leaf * n_samples)` are the minimum
number of samples for each node.
*Changed in 0.18*
Added float values for fractions.
- `min_weight_fraction_leaf`: float, default=0.0
The minimum weighted fraction of the sum total of weights (of all
the input samples) required to be at a leaf node. Samples have
equal weight when sample_weight is not provided.
- `max_features`: int, float, {"sqrt", "log2"} or None, default="sqrt"
The number of features to consider when looking for the best split:
- If int, then consider `max_features` features at each split.
- If float, then `max_features` is a fraction and
`max(1, int(max_features * n_features_in_))` features are considered at
each split.
- If "sqrt", then `max_features=sqrt(n_features)`.
- If "log2", then `max_features=log2(n_features)`.
- If None, then `max_features=n_features`.
*Changed in 1.1*
The default of `max_features` changed from `"auto"` to `"sqrt"`.
Note: the search for a split does not stop until at least one
valid partition of the node samples is found, even if it requires to
effectively inspect more than ``max_features`` features.
- `random_state`: int, RandomState instance or None, default=None
Used to pick randomly the `max_features` used at each split.
See `Glossary ` for details.
- `max_leaf_nodes`: int, default=None
Grow a tree with ``max_leaf_nodes`` in best-first fashion.
Best nodes are defined as relative reduction in impurity.
If None then unlimited number of leaf nodes.
- `min_impurity_decrease`: float, default=0.0
A node will be split if this split induces a decrease of the impurity
greater than or equal to this value.
The weighted impurity decrease equation is the following
N_t / N * (impurity - N_t_R / N_t * right_impurity
- N_t_L / N_t * left_impurity)
where ``N`` is the total number of samples, ``N_t`` is the number of
samples at the current node, ``N_t_L`` is the number of samples in the
left child, and ``N_t_R`` is the number of samples in the right child.
``N``, ``N_t``, ``N_t_R`` and ``N_t_L`` all refer to the weighted sum,
if ``sample_weight`` is passed.
*Added in 0.19*
- `class_weight`: dict, list of dict or "balanced", default=None
Weights associated with classes in the form ``{class_label: weight}``.
If None, all classes are supposed to have weight one. For
multi-output problems, a list of dicts can be provided in the same
order as the columns of y.
Note that for multioutput (including multilabel) weights should be
defined for each class of every column in its own dict. For example,
for four-class multilabel classification weights should be
[{0: 1, 1: 1}, {0: 1, 1: 5}, {0: 1, 1: 1}, {0: 1, 1: 1}] instead of
[{1:1}, {2:5}, {3:1}, {4:1}].
The "balanced" mode uses the values of y to automatically adjust
weights inversely proportional to class frequencies in the input data
as ``n_samples / (n_classes * np.bincount(y))``.
For multi-output, the weights of each column of y will be multiplied.
Note that these weights will be multiplied with sample_weight (passed
through the fit method) if sample_weight is specified.
- `ccp_alpha`: non-negative float, default=0.0
Complexity parameter used for Minimal Cost-Complexity Pruning. The
subtree with the largest cost complexity that is smaller than
``ccp_alpha`` will be chosen. By default, no pruning is performed. See
`minimal_cost_complexity_pruning` for details.
*Added in 0.22*
- `monotonic_cst`: array-like of int of shape (n_features), default=None
Indicates the monotonicity constraint to enforce on each feature.
- 1: monotonic increase
- 0: no constraint
- -1: monotonic decrease
If monotonic_cst is None, no constraints are applied.
Monotonicity constraints are not supported for:
- multiclass classifications (i.e. when `n_classes > 2`),
- multioutput classifications (i.e. when `n_outputs_ > 1`),
- classifications trained on data with missing values.
The constraints hold over the probability of the positive class.
Read more in the User Guide.
*Added in 1.4*
Attributes
----------
- `classes_`: ndarray of shape (n_classes,) or list of ndarray
The classes labels (single output problem),
or a list of arrays of class labels (multi-output problem).
- `max_features_`: int
The inferred value of max_features.
- `n_classes_`: int or list of int
The number of classes (for single output problems),
or a list containing the number of classes for each
output (for multi-output problems).
- `feature_importances_`: ndarray of shape (n_features,)
The impurity-based feature importances.
The higher, the more important the feature.
The importance of a feature is computed as the (normalized)
total reduction of the criterion brought by that feature. It is also
known as the Gini importance.
Warning: impurity-based feature importances can be misleading for
high cardinality features (many unique values). See
`sklearn.inspection.permutation_importance` as an alternative.
- `n_features_in_`: int
Number of features seen during `fit`.
*Added in 0.24*
- `feature_names_in_`: ndarray of shape (`n_features_in_`,)
Names of features seen during `fit`. Defined only when `X`
has feature names that are all strings.
*Added in 1.0*
- `n_outputs_`: int
The number of outputs when ``fit`` is performed.
- `tree_`: Tree instance
The underlying Tree object. Please refer to
``help(sklearn.tree._tree.Tree)`` for attributes of the Tree object and
:ref:`sphx_glr_auto_examples_tree_plot_unveil_tree_structure.py`
for basic usage of these attributes.
See Also
--------
- `ExtraTreeRegressor`: An extremely randomized tree regressor.
- `sklearn.ensemble.ExtraTreesClassifier`: An extra-trees classifier.
- `sklearn.ensemble.ExtraTreesRegressor`: An extra-trees regressor.
- `sklearn.ensemble.RandomForestClassifier`: A random forest classifier.
- `sklearn.ensemble.RandomForestRegressor`: A random forest regressor.
- `sklearn.ensemble.RandomTreesEmbedding`: An ensemble of totally random trees.
Notes
-----
The default values for the parameters controlling the size of the trees
(e.g. ``max_depth``, ``min_samples_leaf``, etc.) lead to fully grown and
unpruned trees which can potentially be very large on some data sets. To
reduce memory consumption, the complexity and size of the trees should be
controlled by setting those parameter values.
References
----------
[1] P. Geurts, D. Ernst., and L. Wehenkel, "Extremely randomized trees", Machine Learning, 63(1), 3-42, 2006.
Examples
--------
>>> from sklearn.datasets import load_iris
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.ensemble import BaggingClassifier
>>> from sklearn.tree import ExtraTreeClassifier
>>> X, y = load_iris(return_X_y=True)
>>> X_train, X_test, y_train, y_test = train_test_split(
...     X, y, random_state=0)
>>> extra_tree = ExtraTreeClassifier(random_state=0)
>>> cls = BaggingClassifier(extra_tree, random_state=0).fit(
...     X_train, y_train)
>>> cls.score(X_test, y_test)
0.8947...
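As the warning above notes, this tree is normally used inside an ensemble (compare the bagging classifier sketch earlier in this chapter). A hedged, minimal model specification for use with `ml/model` would look like:
(ml/model {:model-type :sklearn.classification/extra-tree-classifier
           ;; illustrative value; :random-state mirrors the random_state parameter above
           :random-state 0})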
24.2.10 /extra-trees-classifier
| name | type | default | description |
|---|---|---|---|
| min-weight-fraction-leaf | |||
| max-leaf-nodes | |||
| min-impurity-decrease | |||
| min-samples-split | |||
| bootstrap | |||
| ccp-alpha | |||
| n-jobs | |||
| random-state | |||
| oob-score | |||
| min-samples-leaf | |||
| max-features | |||
| monotonic-cst | |||
| warm-start | |||
| max-depth | |||
| class-weight | |||
| n-estimators | |||
| max-samples | |||
| criterion | |||
| verbose | |||
| predict-proba? |
An extra-trees classifier.
This class implements a meta estimator that fits a number of
randomized decision trees (a.k.a. extra-trees) on various sub-samples
of the dataset and uses averaging to improve the predictive accuracy
and control over-fitting.
Read more in the User Guide: `forest`.
Parameters
----------
- `n_estimators`: int, default=100
The number of trees in the forest.
*Changed in 0.22*
The default value of ``n_estimators`` changed from 10 to 100
in 0.22.
- `criterion`: {"gini", "entropy", "log_loss"}, default="gini"
The function to measure the quality of a split. Supported criteria are
"gini" for the Gini impurity and "log_loss" and "entropy" both for the
Shannon information gain, see :ref:`tree_mathematical_formulation`.
Note: This parameter is tree-specific.
- `max_depth`: int, default=None
The maximum depth of the tree. If None, then nodes are expanded until
all leaves are pure or until all leaves contain less than
min_samples_split samples.
- `min_samples_split`: int or float, default=2
The minimum number of samples required to split an internal node:
- If int, then consider `min_samples_split` as the minimum number.
- If float, then `min_samples_split` is a fraction and
`ceil(min_samples_split * n_samples)` are the minimum
number of samples for each split.
*Changed in 0.18*
Added float values for fractions.
- `min_samples_leaf`: int or float, default=1
The minimum number of samples required to be at a leaf node.
A split point at any depth will only be considered if it leaves at
least ``min_samples_leaf`` training samples in each of the left and
right branches. This may have the effect of smoothing the model,
especially in regression.
- If int, then consider `min_samples_leaf` as the minimum number.
- If float, then `min_samples_leaf` is a fraction and
`ceil(min_samples_leaf * n_samples)` are the minimum
number of samples for each node.
*Changed in 0.18*
Added float values for fractions.
- `min_weight_fraction_leaf`: float, default=0.0
The minimum weighted fraction of the sum total of weights (of all
the input samples) required to be at a leaf node. Samples have
equal weight when sample_weight is not provided.
- `max_features`: {"sqrt", "log2", None}, int or float, default="sqrt"
The number of features to consider when looking for the best split:
- If int, then consider `max_features` features at each split.
- If float, then `max_features` is a fraction and
`max(1, int(max_features * n_features_in_))` features are considered at each
split.
- If "sqrt", then `max_features=sqrt(n_features)`.
- If "log2", then `max_features=log2(n_features)`.
- If None, then `max_features=n_features`.
*Changed in 1.1*
The default of `max_features` changed from `"auto"` to `"sqrt"`.
Note: the search for a split does not stop until at least one
valid partition of the node samples is found, even if it requires to
effectively inspect more than ``max_features`` features.
- `max_leaf_nodes`: int, default=None
Grow trees with ``max_leaf_nodes`` in best-first fashion.
Best nodes are defined as relative reduction in impurity.
If None then unlimited number of leaf nodes.
- `min_impurity_decrease`: float, default=0.0
A node will be split if this split induces a decrease of the impurity
greater than or equal to this value.
The weighted impurity decrease equation is the following
N_t / N * (impurity - N_t_R / N_t * right_impurity
- N_t_L / N_t * left_impurity)
where ``N`` is the total number of samples, ``N_t`` is the number of
samples at the current node, ``N_t_L`` is the number of samples in the
left child, and ``N_t_R`` is the number of samples in the right child.
``N``, ``N_t``, ``N_t_R`` and ``N_t_L`` all refer to the weighted sum,
if ``sample_weight`` is passed.
*Added in 0.19*
- `bootstrap`: bool, default=False
Whether bootstrap samples are used when building trees. If False, the
whole dataset is used to build each tree.
- `oob_score`: bool or callable, default=False
Whether to use out-of-bag samples to estimate the generalization score.
By default, `~sklearn.metrics.accuracy_score` is used.
Provide a callable with signature `metric(y_true, y_pred)` to use a
custom metric. Only available if `bootstrap=True`.
- `n_jobs`: int, default=None
The number of jobs to run in parallel. `fit`, `predict`,
`decision_path` and `apply` are all parallelized over the
trees. ``None`` means 1 unless in a `joblib.parallel_backend`
context. ``-1`` means using all processors. See `Glossary `
for more details.
- `random_state`: int, RandomState instance or None, default=None
Controls 3 sources of randomness:
- the bootstrapping of the samples used when building trees
(if ``bootstrap=True``)
- the sampling of the features to consider when looking for the best
split at each node (if ``max_features < n_features``)
- the draw of the splits for each of the `max_features`
See `Glossary ` for details.
- `verbose`: int, default=0
Controls the verbosity when fitting and predicting.
- `warm_start`: bool, default=False
When set to ``True``, reuse the solution of the previous call to fit
and add more estimators to the ensemble, otherwise, just fit a whole
new forest. See `Glossary ` and
`tree_ensemble_warm_start` for details.
- `class_weight`: {"balanced", "balanced_subsample"}, dict or list of dicts, default=None
Weights associated with classes in the form ``{class_label: weight}``.
If not given, all classes are supposed to have weight one. For
multi-output problems, a list of dicts can be provided in the same
order as the columns of y.
Note that for multioutput (including multilabel) weights should be
defined for each class of every column in its own dict. For example,
for four-class multilabel classification weights should be
[{0: 1, 1: 1}, {0: 1, 1: 5}, {0: 1, 1: 1}, {0: 1, 1: 1}] instead of
[{1: 1}, {2: 5}, {3: 1}, {4: 1}].
The "balanced" mode uses the values of y to automatically adjust
weights inversely proportional to class frequencies in the input data
as ``n_samples / (n_classes * np.bincount(y))``.
The "balanced_subsample" mode is the same as "balanced" except that
weights are computed based on the bootstrap sample for every tree
grown.
For multi-output, the weights of each column of y will be multiplied.
Note that these weights will be multiplied with sample_weight (passed
through the fit method) if sample_weight is specified.
- `ccp_alpha`: non-negative float, default=0.0
Complexity parameter used for Minimal Cost-Complexity Pruning. The
subtree with the largest cost complexity that is smaller than
``ccp_alpha`` will be chosen. By default, no pruning is performed. See
`minimal_cost_complexity_pruning` for details.
*Added in 0.22*
- `max_samples`: int or float, default=None
If bootstrap is True, the number of samples to draw from X
to train each base estimator.
- If None (default), then draw `X.shape[0]` samples.
- If int, then draw `max_samples` samples.
- If float, then draw `max_samples * X.shape[0]` samples. Thus,
`max_samples` should be in the interval `(0.0, 1.0]`.
*Added in 0.22*
- `monotonic_cst`: array-like of int of shape (n_features), default=None
Indicates the monotonicity constraint to enforce on each feature.
- 1: monotonically increasing
- 0: no constraint
- -1: monotonically decreasing
If monotonic_cst is None, no constraints are applied.
Monotonicity constraints are not supported for:
- multiclass classifications (i.e. when `n_classes > 2`),
- multioutput classifications (i.e. when `n_outputs_ > 1`),
- classifications trained on data with missing values.
The constraints hold over the probability of the positive class.
Read more in the User Guide: `monotonic_cst_gbdt`.
*Added in 1.4*
Attributes
----------
- `estimator_`: `~sklearn.tree.ExtraTreeClassifier`
The child estimator template used to create the collection of fitted
sub-estimators.
*Added in 1.2*
`base_estimator_` was renamed to `estimator_`.
- `estimators_`: list of DecisionTreeClassifier
The collection of fitted sub-estimators.
- `classes_`: ndarray of shape (n_classes,) or a list of such arrays
The classes labels (single output problem), or a list of arrays of
class labels (multi-output problem).
- `n_classes_`: int or list
The number of classes (single output problem), or a list containing the
number of classes for each output (multi-output problem).
- `feature_importances_`: ndarray of shape (n_features,)
The impurity-based feature importances.
The higher, the more important the feature.
The importance of a feature is computed as the (normalized)
total reduction of the criterion brought by that feature. It is also
known as the Gini importance.
Warning: impurity-based feature importances can be misleading for
high cardinality features (many unique values). See
`sklearn.inspection.permutation_importance` as an alternative.
- `n_features_in_`: int
Number of features seen during `fit`.
*Added in 0.24*
- `feature_names_in_`: ndarray of shape (`n_features_in_`,)
Names of features seen during `fit`. Defined only when `X`
has feature names that are all strings.
*Added in 1.0*
- `n_outputs_`: int
The number of outputs when ``fit`` is performed.
- `oob_score_`: float
Score of the training dataset obtained using an out-of-bag estimate.
This attribute exists only when ``oob_score`` is True.
- `oob_decision_function_`: ndarray of shape (n_samples, n_classes) or (n_samples, n_classes, n_outputs)
Decision function computed with out-of-bag estimate on the training
set. If n_estimators is small it might be possible that a data point
was never left out during the bootstrap. In this case,
`oob_decision_function_` might contain NaN. This attribute exists
only when ``oob_score`` is True.
- `estimators_samples_`: list of arrays
The subset of drawn samples (i.e., the in-bag samples) for each base
estimator. Each subset is defined by an array of the indices selected.
*Added in 1.4*
See Also
--------
- `ExtraTreesRegressor`: An extra-trees regressor with random splits.
- `RandomForestClassifier`: A random forest classifier with optimal splits.
- `RandomForestRegressor`: Ensemble regressor using trees with optimal splits.
Notes
-----
The default values for the parameters controlling the size of the trees
(e.g. ``max_depth``, ``min_samples_leaf``, etc.) lead to fully grown and
unpruned trees which can potentially be very large on some data sets. To
reduce memory consumption, the complexity and size of the trees should be
controlled by setting those parameter values.
References
----------
P. Geurts, D. Ernst, and L. Wehenkel, "Extremely randomized
trees", Machine Learning, 63(1), 3-42, 2006.
Examples
--------
>>> from sklearn.ensemble import ExtraTreesClassifier
>>> from sklearn.datasets import make_classification
>>> X, y = make_classification(n_features=4, random_state=0)
>>> clf = ExtraTreesClassifier(n_estimators=100, random_state=0)
>>> clf.fit(X, y)
ExtraTreesClassifier(random_state=0)
>>> clf.predict([[0, 0, 0, 0]])
array([1])
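As a minimal Clojure sketch, the model above can be declared as a metamorph.ml pipeline step in the same way as the other models in this chapter. The model key (assumed here to be :sklearn.classification/extra-trees-classifier, following the naming convention of the subsection headings) and the parameter values are illustrative, not taken from the documentation above; the namespaces required at the top of this chapter are assumed.

```clojure
;; sketch -- model key and parameter values are assumptions for illustration
(ml/model {:model-type :sklearn.classification/extra-trees-classifier
           :n-estimators 100   ; `n_estimators`, see the parameter list above
           :random-state 0})
```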
24.2.11 /gaussian-nb
| name | type | default | description |
|---|---|---|---|
| priors | |||
| var-smoothing | |||
| predict-proba? |
Gaussian Naive Bayes (GaussianNB).
Can perform online updates to model parameters via `partial_fit`.
For details on algorithm used to update feature means and variance online,
see Stanford CS tech report STAN-CS-79-773 by Chan, Golub, and LeVeque:
http://i.stanford.edu/pub/cstr/reports/cs/tr/79/773/CS-TR-79-773.pdf
Read more in the User Guide: `gaussian_naive_bayes`.
Parameters
----------
- `priors`: array-like of shape (n_classes,), default=None
Prior probabilities of the classes. If specified, the priors are not
adjusted according to the data.
- `var_smoothing`: float, default=1e-9
Portion of the largest variance of all features that is added to
variances for calculation stability.
*Added in 0.20*
Attributes
----------
- `class_count_`: ndarray of shape (n_classes,)
number of training samples observed in each class.
- `class_prior_`: ndarray of shape (n_classes,)
probability of each class.
- `classes_`: ndarray of shape (n_classes,)
class labels known to the classifier.
- `epsilon_`: float
absolute additive value to variances.
- `n_features_in_`: int
Number of features seen during `fit`.
*Added in 0.24*
- `feature_names_in_`: ndarray of shape (`n_features_in_`,)
Names of features seen during `fit`. Defined only when `X`
has feature names that are all strings.
*Added in 1.0*
- `var_`: ndarray of shape (n_classes, n_features)
Variance of each feature per class.
*Added in 1.0*
- `theta_`: ndarray of shape (n_classes, n_features)
mean of each feature per class.
See Also
--------
- `BernoulliNB`: Naive Bayes classifier for multivariate Bernoulli models.
- `CategoricalNB`: Naive Bayes classifier for categorical features.
- `ComplementNB`: Complement Naive Bayes classifier.
- `MultinomialNB`: Naive Bayes classifier for multinomial models.
Examples
--------
>>> import numpy as np
>>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
>>> Y = np.array([1, 1, 1, 2, 2, 2])
>>> from sklearn.naive_bayes import GaussianNB
>>> clf = GaussianNB()
>>> clf.fit(X, Y)
GaussianNB()
>>> print(clf.predict([[-0.8, -1]]))
[1]
>>> clf_pf = GaussianNB()
>>> clf_pf.partial_fit(X, Y, np.unique(Y))
GaussianNB()
>>> print(clf_pf.predict([[-0.8, -1]]))
[1]
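For comparison with the Python example above, here is a hedged Clojure sketch of fitting the same kind of model through metamorph.ml. It assumes the namespaces required at the top of this chapter and reuses the small example dataset `ds` defined there; `:var-smoothing` is shown at its documented default purely for illustration.

```clojure
;; sketch: a Gaussian Naive Bayes pipeline on the chapter's example dataset `ds`
(def gaussian-nb-pipe
  (mm/pipeline
   (ds-mm/set-inference-target 2)
   {:metamorph/id :model}
   (ml/model {:model-type :sklearn.classification/gaussian-nb
              :var-smoothing 1e-9})))   ; documented default, shown for illustration

(def gaussian-nb-ctx
  (gaussian-nb-pipe {:metamorph/data ds
                     :metamorph/mode :fit}))
```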
24.2.12 /gaussian-process-classifier
| name | type | default | description |
|---|---|---|---|
| kernel | |||
| optimizer | |||
| multi-class | |||
| n-jobs | |||
| random-state | |||
| max-iter-predict | |||
| copy-x-train | |||
| n-restarts-optimizer | |||
| warm-start | |||
| predict-proba? |
Gaussian process classification (GPC) based on Laplace approximation.
The implementation is based on Algorithm 3.1, 3.2, and 5.1 from [RW2006]_.
Internally, the Laplace approximation is used for approximating the
non-Gaussian posterior by a Gaussian.
Currently, the implementation is restricted to using the logistic link
function. For multi-class classification, several binary one-versus rest
classifiers are fitted. Note that this class thus does not implement
a true multi-class Laplace approximation.
Read more in the User Guide: `gaussian_process`.
*Added in 0.18*
Parameters
----------
- `kernel`: kernel instance, default=None
The kernel specifying the covariance function of the GP. If None is
passed, the kernel "1.0 * RBF(1.0)" is used as default. Note that
the kernel's hyperparameters are optimized during fitting. Also kernel
cannot be a `CompoundKernel`.
- `optimizer`: 'fmin_l_bfgs_b', callable or None, default='fmin_l_bfgs_b'
Can either be one of the internally supported optimizers for optimizing
the kernel's parameters, specified by a string, or an externally
defined optimizer passed as a callable. If a callable is passed, it
must have the signature
def optimizer(obj_func, initial_theta, bounds):
# * 'obj_func' is the objective function to be maximized, which
# takes the hyperparameters theta as parameter and an
# optional flag eval_gradient, which determines if the
# gradient is returned additionally to the function value
# * 'initial_theta': the initial value for theta, which can be
# used by local optimizers
# * 'bounds': the bounds on the values of theta
....
# Returned are the best found hyperparameters theta and
# the corresponding value of the target function.
return theta_opt, func_min
By default, the 'L-BFGS-B' algorithm from scipy.optimize.minimize
is used. If None is passed, the kernel's parameters are kept fixed.
Available internal optimizers are::
'fmin_l_bfgs_b'
- `n_restarts_optimizer`: int, default=0
The number of restarts of the optimizer for finding the kernel's
parameters which maximize the log-marginal likelihood. The first run
of the optimizer is performed from the kernel's initial parameters,
the remaining ones (if any) from thetas sampled log-uniform randomly
from the space of allowed theta-values. If greater than 0, all bounds
must be finite. Note that n_restarts_optimizer=0 implies that one
run is performed.
- `max_iter_predict`: int, default=100
The maximum number of iterations in Newton's method for approximating
the posterior during predict. Smaller values will reduce computation
time at the cost of worse results.
- `warm_start`: bool, default=False
If warm-starts are enabled, the solution of the last Newton iteration
on the Laplace approximation of the posterior mode is used as
initialization for the next call of _posterior_mode(). This can speed
up convergence when _posterior_mode is called several times on similar
problems as in hyperparameter optimization. See `the Glossary `.
- `copy_X_train`: bool, default=True
If True, a persistent copy of the training data is stored in the
object. Otherwise, just a reference to the training data is stored,
which might cause predictions to change if the data is modified
externally.
- `random_state`: int, RandomState instance or None, default=None
Determines random number generation used to initialize the centers.
Pass an int for reproducible results across multiple function calls.
See `Glossary `.
- `multi_class`: {'one_vs_rest', 'one_vs_one'}, default='one_vs_rest'
Specifies how multi-class classification problems are handled.
Supported are 'one_vs_rest' and 'one_vs_one'. In 'one_vs_rest',
one binary Gaussian process classifier is fitted for each class, which
is trained to separate this class from the rest. In 'one_vs_one', one
binary Gaussian process classifier is fitted for each pair of classes,
which is trained to separate these two classes. The predictions of
these binary predictors are combined into multi-class predictions.
Note that 'one_vs_one' does not support predicting probability
estimates.
- `n_jobs`: int, default=None
The number of jobs to use for the computation: the specified
multiclass problems are computed in parallel.
``None`` means 1 unless in a `joblib.parallel_backend` context.
``-1`` means using all processors. See `Glossary `
for more details.
Attributes
----------
- `base_estimator_`: ``Estimator`` instance
The estimator instance that defines the likelihood function
using the observed data.
- `kernel_`: kernel instance
The kernel used for prediction. In case of binary classification,
the structure of the kernel is the same as the one passed as parameter
but with optimized hyperparameters. In case of multi-class
classification, a CompoundKernel is returned which consists of the
different kernels used in the one-versus-rest classifiers.
- `log_marginal_likelihood_value_`: float
The log-marginal-likelihood of ``self.kernel_.theta``
- `classes_`: array-like of shape (n_classes,)
Unique class labels.
- `n_classes_`: int
The number of classes in the training data
- `n_features_in_`: int
Number of features seen during `fit`.
*Added in 0.24*
- `feature_names_in_`: ndarray of shape (`n_features_in_`,)
Names of features seen during `fit`. Defined only when `X`
has feature names that are all strings.
*Added in 1.0*
See Also
--------
- `GaussianProcessRegressor`: Gaussian process regression (GPR).
References
----------
[RW2006] Carl E. Rasmussen and Christopher K.I. Williams,
"Gaussian Processes for Machine Learning",
MIT Press 2006
Examples
--------
>>> from sklearn.datasets import load_iris
>>> from sklearn.gaussian_process import GaussianProcessClassifier
>>> from sklearn.gaussian_process.kernels import RBF
>>> X, y = load_iris(return_X_y=True)
>>> kernel = 1.0 * RBF(1.0)
>>> gpc = GaussianProcessClassifier(kernel=kernel,
...         random_state=0).fit(X, y)
>>> gpc.score(X, y)
0.9866...
>>> gpc.predict_proba(X[:2,:])
array([[0.83548752, 0.03228706, 0.13222543],
       [0.79064206, 0.06525643, 0.14410151]])
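A hedged sketch of the corresponding metamorph.ml model specification (model key assumed to follow the subsection heading). The kernel is left at its default here, i.e. "1.0 * RBF(1.0)" as described above, since passing a kernel instance would require constructing a Python object via libpython-clj; the remaining values are illustrative.

```clojure
;; sketch -- illustrative parameter values only
(ml/model {:model-type :sklearn.classification/gaussian-process-classifier
           :max-iter-predict 100          ; `max_iter_predict`
           :multi-class "one_vs_rest"})   ; `multi_class`
```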
24.2.13 /gradient-boosting-classifier
| name | type | default | description |
|---|---|---|---|
| n-iter-no-change | |||
| learning-rate | |||
| min-weight-fraction-leaf | |||
| max-leaf-nodes | |||
| min-impurity-decrease | |||
| min-samples-split | |||
| tol | |||
| subsample | |||
| ccp-alpha | |||
| random-state | |||
| min-samples-leaf | |||
| max-features | |||
| init | |||
| warm-start | |||
| max-depth | |||
| validation-fraction | |||
| n-estimators | |||
| criterion | |||
| loss | |||
| verbose | |||
| predict-proba? |
Gradient Boosting for classification.
This algorithm builds an additive model in a forward stage-wise fashion; it
allows for the optimization of arbitrary differentiable loss functions. In
each stage ``n_classes_`` regression trees are fit on the negative gradient
of the loss function, e.g. binary or multiclass log loss. Binary
classification is a special case where only a single regression tree is
induced.
`~sklearn.ensemble.HistGradientBoostingClassifier` is a much faster variant
of this algorithm for intermediate and large datasets (`n_samples >= 10_000`) and
supports monotonic constraints.
Read more in the User Guide: `gradient_boosting`.
Parameters
----------
- `loss`: {'log_loss', 'exponential'}, default='log_loss'
The loss function to be optimized. 'log_loss' refers to binomial and
multinomial deviance, the same as used in logistic regression.
It is a good choice for classification with probabilistic outputs.
For loss 'exponential', gradient boosting recovers the AdaBoost algorithm.
- `learning_rate`: float, default=0.1
Learning rate shrinks the contribution of each tree by `learning_rate`.
There is a trade-off between learning_rate and n_estimators.
Values must be in the range `[0.0, inf)`.
- `n_estimators`: int, default=100
The number of boosting stages to perform. Gradient boosting
is fairly robust to over-fitting so a large number usually
results in better performance.
Values must be in the range `[1, inf)`.
- `subsample`: float, default=1.0
The fraction of samples to be used for fitting the individual base
learners. If smaller than 1.0 this results in Stochastic Gradient
Boosting. `subsample` interacts with the parameter `n_estimators`.
Choosing `subsample < 1.0` leads to a reduction of variance
and an increase in bias.
Values must be in the range `(0.0, 1.0]`.
- `criterion`: {'friedman_mse', 'squared_error'}, default='friedman_mse'
The function to measure the quality of a split. Supported criteria are
'friedman_mse' for the mean squared error with improvement score by
Friedman, 'squared_error' for mean squared error. The default value of
'friedman_mse' is generally the best as it can provide a better
approximation in some cases.
*Added in 0.18*
- `min_samples_split`: int or float, default=2
The minimum number of samples required to split an internal node:
- If int, values must be in the range `[2, inf)`.
- If float, values must be in the range `(0.0, 1.0]` and `min_samples_split`
will be `ceil(min_samples_split * n_samples)`.
*Changed in 0.18*
Added float values for fractions.
- `min_samples_leaf`: int or float, default=1
The minimum number of samples required to be at a leaf node.
A split point at any depth will only be considered if it leaves at
least ``min_samples_leaf`` training samples in each of the left and
right branches. This may have the effect of smoothing the model,
especially in regression.
- If int, values must be in the range `[1, inf)`.
- If float, values must be in the range `(0.0, 1.0)` and `min_samples_leaf`
will be `ceil(min_samples_leaf * n_samples)`.
*Changed in 0.18*
Added float values for fractions.
- `min_weight_fraction_leaf`: float, default=0.0
The minimum weighted fraction of the sum total of weights (of all
the input samples) required to be at a leaf node. Samples have
equal weight when sample_weight is not provided.
Values must be in the range `[0.0, 0.5]`.
- `max_depth`: int or None, default=3
Maximum depth of the individual regression estimators. The maximum
depth limits the number of nodes in the tree. Tune this parameter
for best performance; the best value depends on the interaction
of the input variables. If None, then nodes are expanded until
all leaves are pure or until all leaves contain less than
min_samples_split samples.
If int, values must be in the range `[1, inf)`.
- `min_impurity_decrease`: float, default=0.0
A node will be split if this split induces a decrease of the impurity
greater than or equal to this value.
Values must be in the range `[0.0, inf)`.
The weighted impurity decrease equation is the following
N_t / N * (impurity - N_t_R / N_t * right_impurity
- N_t_L / N_t * left_impurity)
where ``N`` is the total number of samples, ``N_t`` is the number of
samples at the current node, ``N_t_L`` is the number of samples in the
left child, and ``N_t_R`` is the number of samples in the right child.
``N``, ``N_t``, ``N_t_R`` and ``N_t_L`` all refer to the weighted sum,
if ``sample_weight`` is passed.
*Added in 0.19*
- `init`: estimator or 'zero', default=None
An estimator object that is used to compute the initial predictions.
``init`` has to provide `fit` and `predict_proba`. If
'zero', the initial raw predictions are set to zero. By default, a
``DummyEstimator`` predicting the classes priors is used.
- `random_state`: int, RandomState instance or None, default=None
Controls the random seed given to each Tree estimator at each
boosting iteration.
In addition, it controls the random permutation of the features at
each split (see Notes for more details).
It also controls the random splitting of the training data to obtain a
validation set if `n_iter_no_change` is not None.
Pass an int for reproducible output across multiple function calls.
See `Glossary `.
- `max_features`: {'sqrt', 'log2'}, int or float, default=None
The number of features to consider when looking for the best split:
- If int, values must be in the range `[1, inf)`.
- If float, values must be in the range `(0.0, 1.0]` and the features
considered at each split will be `max(1, int(max_features * n_features_in_))`.
- If 'sqrt', then `max_features=sqrt(n_features)`.
- If 'log2', then `max_features=log2(n_features)`.
- If None, then `max_features=n_features`.
Choosing `max_features < n_features` leads to a reduction of variance
and an increase in bias.
Note: the search for a split does not stop until at least one
valid partition of the node samples is found, even if it requires to
effectively inspect more than ``max_features`` features.
- `verbose`: int, default=0
Enable verbose output. If 1 then it prints progress and performance
once in a while (the more trees the lower the frequency). If greater
than 1 then it prints progress and performance for every tree.
Values must be in the range `[0, inf)`.
- `max_leaf_nodes`: int, default=None
Grow trees with ``max_leaf_nodes`` in best-first fashion.
Best nodes are defined as relative reduction in impurity.
Values must be in the range `[2, inf)`.
If `None`, then unlimited number of leaf nodes.
- `warm_start`: bool, default=False
When set to ``True``, reuse the solution of the previous call to fit
and add more estimators to the ensemble, otherwise, just erase the
previous solution. See `the Glossary `.
- `validation_fraction`: float, default=0.1
The proportion of training data to set aside as validation set for
early stopping. Values must be in the range `(0.0, 1.0)`.
Only used if ``n_iter_no_change`` is set to an integer.
*Added in 0.20*
- `n_iter_no_change`: int, default=None
``n_iter_no_change`` is used to decide if early stopping will be used
to terminate training when validation score is not improving. By
default it is set to None to disable early stopping. If set to a
number, it will set aside ``validation_fraction`` size of the training
data as validation and terminate training when validation score is not
improving in all of the previous ``n_iter_no_change`` numbers of
iterations. The split is stratified.
Values must be in the range `[1, inf)`.
See `sphx_glr_auto_examples_ensemble_plot_gradient_boosting_early_stopping.py`.
*Added in 0.20*
- `tol`: float, default=1e-4
Tolerance for the early stopping. When the loss is not improving
by at least tol for ``n_iter_no_change`` iterations (if set to a
number), the training stops.
Values must be in the range `[0.0, inf)`.
*Added in 0.20*
- `ccp_alpha`: non-negative float, default=0.0
Complexity parameter used for Minimal Cost-Complexity Pruning. The
subtree with the largest cost complexity that is smaller than
``ccp_alpha`` will be chosen. By default, no pruning is performed.
Values must be in the range `[0.0, inf)`.
See `minimal_cost_complexity_pruning` for details.
*Added in 0.22*
Attributes
----------
- `n_estimators_`: int
The number of estimators as selected by early stopping (if
``n_iter_no_change`` is specified). Otherwise it is set to
``n_estimators``.
*Added in 0.20*
- `n_trees_per_iteration_`: int
The number of trees that are built at each iteration. For binary classifiers,
this is always 1.
*Added in 1.4.0*
- `feature_importances_`: ndarray of shape (n_features,)
The impurity-based feature importances.
The higher, the more important the feature.
The importance of a feature is computed as the (normalized)
total reduction of the criterion brought by that feature. It is also
known as the Gini importance.
Warning: impurity-based feature importances can be misleading for
high cardinality features (many unique values). See
`sklearn.inspection.permutation_importance` as an alternative.
- `oob_improvement_`: ndarray of shape (n_estimators,)
The improvement in loss on the out-of-bag samples
relative to the previous iteration.
``oob_improvement_[0]`` is the improvement in
loss of the first stage over the ``init`` estimator.
Only available if ``subsample < 1.0``.
- `oob_scores_`: ndarray of shape (n_estimators,)
The full history of the loss values on the out-of-bag
samples. Only available if `subsample < 1.0`.
*Added in 1.3*
- `oob_score_`: float
The last value of the loss on the out-of-bag samples. It is
the same as `oob_scores_[-1]`. Only available if `subsample < 1.0`.
*Added in 1.3*
- `train_score_`: ndarray of shape (n_estimators,)
The i-th score ``train_score_[i]`` is the loss of the
model at iteration ``i`` on the in-bag sample.
If ``subsample == 1`` this is the loss on the training data.
- `init_`: estimator
The estimator that provides the initial predictions. Set via the ``init``
argument.
- `estimators_`: ndarray of DecisionTreeRegressor of shape (n_estimators, ``n_trees_per_iteration_``)
The collection of fitted sub-estimators. ``n_trees_per_iteration_`` is 1 for
binary classification, otherwise ``n_classes``.
- `classes_`: ndarray of shape (n_classes,)
The classes labels.
- `n_features_in_`: int
Number of features seen during `fit`.
*Added in 0.24*
- `feature_names_in_`: ndarray of shape (`n_features_in_`,)
Names of features seen during `fit`. Defined only when `X`
has feature names that are all strings.
*Added in 1.0*
- `n_classes_`: int
The number of classes.
- `max_features_`: int
The inferred value of max_features.
See Also
--------
- `HistGradientBoostingClassifier`: Histogram-based Gradient Boosting
Classification Tree.
- `sklearn.tree.DecisionTreeClassifier`: A decision tree classifier.
- `RandomForestClassifier`: A meta-estimator that fits a number of decision
tree classifiers on various sub-samples of the dataset and uses
averaging to improve the predictive accuracy and control over-fitting.
- `AdaBoostClassifier`: A meta-estimator that begins by fitting a classifier
on the original dataset and then fits additional copies of the
classifier on the same dataset where the weights of incorrectly
classified instances are adjusted such that subsequent classifiers
focus more on difficult cases.
Notes
-----
The features are always randomly permuted at each split. Therefore,
the best found split may vary, even with the same training data and
``max_features=n_features``, if the improvement of the criterion is
identical for several splits enumerated during the search of the best
split. To obtain a deterministic behaviour during fitting,
``random_state`` has to be fixed.
References
----------
J. Friedman, Greedy Function Approximation: A Gradient Boosting
Machine, The Annals of Statistics, Vol. 29, No. 5, 2001.
J. Friedman, Stochastic Gradient Boosting, 1999
T. Hastie, R. Tibshirani and J. Friedman.
Elements of Statistical Learning Ed. 2, Springer, 2009.
Examples
--------
The following example shows how to fit a gradient boosting classifier with
100 decision stumps as weak learners.
>>> from sklearn.datasets import make_hastie_10_2
>>> from sklearn.ensemble import GradientBoostingClassifier
>>> X, y = make_hastie_10_2(random_state=0)
>>> X_train, X_test = X[:2000], X[2000:]
>>> y_train, y_test = y[:2000], y[2000:]
>>> clf = GradientBoostingClassifier(n_estimators=100, learning_rate=1.0,
...     max_depth=1, random_state=0).fit(X_train, y_train)
>>> clf.score(X_test, y_test)
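The same configuration as the Python example above can be sketched as a metamorph.ml model specification. The model key is assumed to follow the subsection heading, and the kebab-case keys map to the parameters documented above; the values mirror the example and are illustrative.

```clojure
;; sketch -- decision stumps (max-depth 1) as weak learners, as in the example above
(ml/model {:model-type :sklearn.classification/gradient-boosting-classifier
           :n-estimators 100
           :learning-rate 1.0
           :max-depth 1
           :random-state 0})
```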
24.2.14 /hist-gradient-boosting-classifier
| name | type | default | description |
|---|---|---|---|
| n-iter-no-change | |||
| learning-rate | |||
| max-leaf-nodes | |||
| scoring | |||
| tol | |||
| early-stopping | |||
| max-iter | |||
| random-state | |||
| max-bins | |||
| min-samples-leaf | |||
| max-features | |||
| monotonic-cst | |||
| warm-start | |||
| max-depth | |||
| validation-fraction | |||
| class-weight | |||
| loss | |||
| interaction-cst | |||
| verbose | |||
| categorical-features | |||
| l-2-regularization | |||
| predict-proba? |
Histogram-based Gradient Boosting Classification Tree.
This estimator is much faster than `GradientBoostingClassifier` for big datasets (n_samples >= 10 000).
This estimator has native support for missing values (NaNs). During training, the tree grower learns at each split point whether samples with missing values should go to the left or right child, based on the potential gain. When predicting, samples with missing values are assigned to the left or right child consequently. If no missing values were encountered for a given feature during training, then samples with missing values are mapped to whichever child has the most samples.
This implementation is inspired by [LightGBM](https://github.com/Microsoft/LightGBM).
Read more in the User Guide: `histogram_based_gradient_boosting`.
*Added in 0.21*
Parameters
----------
- `loss`: {'log_loss'}, default='log_loss'
The loss function to use in the boosting process.
For binary classification problems, 'log_loss' is also known as logistic loss, binomial deviance or binary crossentropy. Internally, the model fits one tree per boosting iteration and uses the logistic sigmoid function (expit) as inverse link function to compute the predicted positive class probability.
For multiclass classification problems, 'log_loss' is also known as multinomial deviance or categorical crossentropy. Internally, the model fits one tree per boosting iteration and per class and uses the softmax function as inverse link function to compute the predicted probabilities of the classes.
- `learning_rate`: float, default=0.1
The learning rate, also known as *shrinkage*. This is used as a multiplicative factor for the leaves values. Use ``1`` for no shrinkage.
- `max_iter`: int, default=100
The maximum number of iterations of the boosting process, i.e. the maximum number of trees for binary classification. For multiclass classification, `n_classes` trees per iteration are built.
- `max_leaf_nodes`: int or None, default=31
The maximum number of leaves for each tree. Must be strictly greater than 1. If None, there is no maximum limit.
- `max_depth`: int or None, default=None
The maximum depth of each tree. The depth of a tree is the number of edges to go from the root to the deepest leaf. Depth isn't constrained by default.
- `min_samples_leaf`: int, default=20
The minimum number of samples per leaf. For small datasets with less than a few hundred samples, it is recommended to lower this value since only very shallow trees would be built.
- `l2_regularization`: float, default=0
The L2 regularization parameter penalizing leaves with small hessians. Use ``0`` for no regularization (default).
- `max_features`: float, default=1.0
Proportion of randomly chosen features in each and every node split. This is a form of regularization, smaller values make the trees weaker learners and might prevent overfitting. If interaction constraints from `interaction_cst` are present, only allowed features are taken into account for the subsampling.
*Added in 1.4*
- `max_bins`: int, default=255
The maximum number of bins to use for non-missing values. Before training, each feature of the input array `X` is binned into integer-valued bins, which allows for a much faster training stage. Features with a small number of unique values may use less than ``max_bins`` bins. In addition to the ``max_bins`` bins, one more bin is always reserved for missing values. Must be no larger than 255.
- `categorical_features`: array-like of {bool, int, str} of shape (n_features) or shape (n_categorical_features,), default=None
Indicates the categorical features.
- None : no feature will be considered categorical.
- boolean array-like : boolean mask indicating categorical features.
- integer array-like : integer indices indicating categorical features.
- str array-like: names of categorical features (assuming the training data has feature names).
- `"from_dtype"`: dataframe columns with dtype "category" are considered to be categorical features. The input must be an object exposing a ``__dataframe__`` method such as pandas or polars DataFrames to use this feature.
For each categorical feature, there must be at most `max_bins` unique categories. Negative values for categorical features encoded as numeric dtypes are treated as missing values. All categorical values are converted to floating point numbers. This means that categorical values of 1.0 and 1 are treated as the same category.
Read more in the User Guide: `categorical_support_gbdt`.
*Added in 0.24*
*Changed in 1.2*
Added support for feature names.
*Changed in 1.4*
Added `"from_dtype"` option. The default will change to `"from_dtype"` in v1.6.
- `monotonic_cst`: array-like of int of shape (n_features) or dict, default=None
Monotonic constraint to enforce on each feature are specified using the following integer values:
- 1: monotonic increase
- 0: no constraint
- -1: monotonic decrease
If a dict with str keys, map feature to monotonic constraints by name. If an array, the features are mapped to constraints by position. See :ref:`monotonic_cst_features_names` for a usage example.
The constraints are only valid for binary classifications and hold over the probability of the positive class.
Read more in the User Guide: `monotonic_cst_gbdt`.
*Added in 0.23*
*Changed in 1.2*
Accept dict of constraints with feature names as keys.
- `interaction_cst`: {"pairwise", "no_interactions"} or sequence of lists/tuples/sets of int, default=None
Specify interaction constraints, the sets of features which can interact with each other in child node splits.
Each item specifies the set of feature indices that are allowed to interact with each other. If there are more features than specified in these constraints, they are treated as if they were specified as an additional set.
The strings "pairwise" and "no_interactions" are shorthands for allowing only pairwise or no interactions, respectively.
For instance, with 5 features in total, `interaction_cst=[{0, 1}]` is equivalent to `interaction_cst=[{0, 1}, {2, 3, 4}]`, and specifies that each branch of a tree will either only split on features 0 and 1 or only split on features 2, 3 and 4.
*Added in 1.2*
- `warm_start`: bool, default=False
When set to ``True``, reuse the solution of the previous call to fit and add more estimators to the ensemble. For results to be valid, the estimator should be re-trained on the same data only. See `the Glossary `.
- `early_stopping`: 'auto' or bool, default='auto'
If 'auto', early stopping is enabled if the sample size is larger than 10000. If True, early stopping is enabled, otherwise early stopping is disabled.
*Added in 0.23*
- `scoring`: str or callable or None, default='loss'
Scoring parameter to use for early stopping. It can be a single string (see :ref:`scoring_parameter`) or a callable (see :ref:`scoring`). If None, the estimator's default scorer is used. If ``scoring='loss'``, early stopping is checked w.r.t the loss value. Only used if early stopping is performed.
- `validation_fraction`: int or float or None, default=0.1
Proportion (or absolute size) of training data to set aside as validation data for early stopping. If None, early stopping is done on the training data. Only used if early stopping is performed.
- `n_iter_no_change`: int, default=10
Used to determine when to "early stop". The fitting process is stopped when none of the last ``n_iter_no_change`` scores are better than the ``n_iter_no_change - 1`` -th-to-last one, up to some tolerance. Only used if early stopping is performed.
- `tol`: float, default=1e-7
The absolute tolerance to use when comparing scores. The higher the tolerance, the more likely we are to early stop: higher tolerance means that it will be harder for subsequent iterations to be considered an improvement upon the reference score.
- `verbose`: int, default=0
The verbosity level. If not zero, print some information about the fitting process.
- `random_state`: int, RandomState instance or None, default=None
Pseudo-random number generator to control the subsampling in the binning process, and the train/validation data split if early stopping is enabled. Pass an int for reproducible output across multiple function calls. See `Glossary `.
- `class_weight`: dict or 'balanced', default=None
Weights associated with classes in the form `{class_label: weight}`. If not given, all classes are supposed to have weight one. The "balanced" mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as `n_samples / (n_classes * np.bincount(y))`. Note that these weights will be multiplied with sample_weight (passed through the fit method) if `sample_weight` is specified.
*Added in 1.2*
Attributes
----------
- `classes_`: array, shape = (n_classes,)
Class labels.
- `do_early_stopping_`: bool
Indicates whether early stopping is used during training.
- `n_iter_`: int
The number of iterations as selected by early stopping, depending on the `early_stopping` parameter. Otherwise it corresponds to max_iter.
- `n_trees_per_iteration_`: int
The number of tree that are built at each iteration. This is equal to 1 for binary classification, and to ``n_classes`` for multiclass classification.
- `train_score_`: ndarray, shape (n_iter_+1,)
The scores at each iteration on the training data. The first entry is the score of the ensemble before the first iteration. Scores are computed according to the ``scoring`` parameter. If ``scoring`` is not 'loss', scores are computed on a subset of at most 10 000 samples. Empty if no early stopping.
- `validation_score_`: ndarray, shape (n_iter_+1,)
The scores at each iteration on the held-out validation data. The first entry is the score of the ensemble before the first iteration. Scores are computed according to the ``scoring`` parameter. Empty if no early stopping or if ``validation_fraction`` is None.
- `is_categorical_`: ndarray, shape (n_features, ) or None
Boolean mask for the categorical features. ``None`` if there are no categorical features.
- `n_features_in_`: int
Number of features seen during `fit`.
*Added in 0.24*
- `feature_names_in_`: ndarray of shape (`n_features_in_`,)
Names of features seen during `fit`. Defined only when `X` has feature names that are all strings.
*Added in 1.0*
See Also
--------
- `GradientBoostingClassifier`: Exact gradient boosting method that does not scale as good on datasets with a large number of samples.
- `sklearn.tree.DecisionTreeClassifier`: A decision tree classifier.
- `RandomForestClassifier`: A meta-estimator that fits a number of decision tree classifiers on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting.
- `AdaBoostClassifier`: A meta-estimator that begins by fitting a classifier on the original dataset and then fits additional copies of the classifier on the same dataset where the weights of incorrectly classified instances are adjusted such that subsequent classifiers focus more on difficult cases.
Examples
--------
>>> from sklearn.ensemble import HistGradientBoostingClassifier
>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> clf = HistGradientBoostingClassifier().fit(X, y)
>>> clf.score(X, y)
1.0
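A hedged metamorph.ml sketch for this model (model key assumed from the subsection heading, parameter values illustrative), showing the early-stopping related parameters documented above:

```clojure
;; sketch -- illustrative values; kebab-case keys map to the Python parameters above
(ml/model {:model-type :sklearn.classification/hist-gradient-boosting-classifier
           :max-iter 200
           :learning-rate 0.1
           :early-stopping true       ; `early_stopping`
           :n-iter-no-change 10})     ; `n_iter_no_change`
```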
24.2.15 /k-neighbors-classifier
| name | type | default | description |
|---|---|---|---|
| algorithm | |||
| leaf-size | |||
| metric | |||
| metric-params | |||
| n-jobs | |||
| n-neighbors | |||
| p | |||
| weights | |||
| predict-proba? |
Classifier implementing the k-nearest neighbors vote.
Read more in the User Guide: `classification`.
Parameters
----------
- `n_neighbors`: int, default=5
Number of neighbors to use by default for `kneighbors` queries.
- `weights`: {'uniform', 'distance'}, callable or None, default='uniform'
Weight function used in prediction. Possible values:
- 'uniform' : uniform weights. All points in each neighborhood
are weighted equally.
- 'distance' : weight points by the inverse of their distance.
in this case, closer neighbors of a query point will have a
greater influence than neighbors which are further away.
- [callable] : a user-defined function which accepts an
array of distances, and returns an array of the same shape
containing the weights.
Refer to the example entitled
:ref:`sphx_glr_auto_examples_neighbors_plot_classification.py`
showing the impact of the `weights` parameter on the decision
boundary.
- `algorithm`: {'auto', 'ball_tree', 'kd_tree', 'brute'}, default='auto'
Algorithm used to compute the nearest neighbors:
- 'ball_tree' will use `BallTree`
- 'kd_tree' will use `KDTree`
- 'brute' will use a brute-force search.
- 'auto' will attempt to decide the most appropriate algorithm
based on the values passed to `fit` method.
Note: fitting on sparse input will override the setting of
this parameter, using brute force.
- `leaf_size`: int, default=30
Leaf size passed to BallTree or KDTree. This can affect the
speed of the construction and query, as well as the memory
required to store the tree. The optimal value depends on the
nature of the problem.
- `p`: float, default=2
Power parameter for the Minkowski metric. When p = 1, this is equivalent
to using manhattan_distance (l1), and euclidean_distance (l2) for p = 2.
For arbitrary p, minkowski_distance (l_p) is used. This parameter is expected
to be positive.
- `metric`: str or callable, default='minkowski'
Metric to use for distance computation. Default is "minkowski", which
results in the standard Euclidean distance when p = 2. See the
documentation of [scipy.spatial.distance
](https://docs.scipy.org/doc/scipy/reference/spatial.distance.html) and
the metrics listed in
`~sklearn.metrics.pairwise.distance_metrics` for valid metric
values.
If metric is "precomputed", X is assumed to be a distance matrix and
must be square during fit. X may be a `sparse graph`, in which
case only "nonzero" elements may be considered neighbors.
If metric is a callable function, it takes two arrays representing 1D
vectors as inputs and must return one value indicating the distance
between those vectors. This works for Scipy's metrics, but is less
efficient than passing the metric name as a string.
- `metric_params`: dict, default=None
Additional keyword arguments for the metric function.
- `n_jobs`: int, default=None
The number of parallel jobs to run for neighbors search.
``None`` means 1 unless in a `joblib.parallel_backend` context.
``-1`` means using all processors. See `Glossary `
for more details.
Doesn't affect `fit` method.
Attributes
----------
- `classes_`: array of shape (n_classes,)
Class labels known to the classifier
- `effective_metric_`: str or callable
The distance metric used. It will be same as the `metric` parameter
or a synonym of it, e.g. 'euclidean' if the `metric` parameter set to
'minkowski' and `p` parameter set to 2.
- `effective_metric_params_`: dict
Additional keyword arguments for the metric function. For most metrics
will be same with `metric_params` parameter, but may also contain the
`p` parameter value if the `effective_metric_` attribute is set to
'minkowski'.
- `n_features_in_`: int
Number of features seen during `fit`.
*Added in 0.24*
- `feature_names_in_`: ndarray of shape (`n_features_in_`,)
Names of features seen during `fit`. Defined only when `X`
has feature names that are all strings.
*Added in 1.0*
- `n_samples_fit_`: int
Number of samples in the fitted data.
- `outputs_2d_`: bool
False when `y`'s shape is (n_samples, ) or (n_samples, 1) during fit,
otherwise True.
See Also
--------
RadiusNeighborsClassifier: Classifier based on neighbors within a fixed radius.
KNeighborsRegressor: Regression based on k-nearest neighbors.
RadiusNeighborsRegressor: Regression based on neighbors within a fixed radius.
NearestNeighbors: Unsupervised learner for implementing neighbor searches.
Notes
-----
See Nearest Neighbors: `neighbors` in the online documentation
for a discussion of the choice of ``algorithm`` and ``leaf_size``.
⚠️ Warning
Regarding the Nearest Neighbors algorithms, if it is found that two neighbors, neighbor k+1 and k, have identical distances but different labels, the results will depend on the ordering of the training data.
https://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm
Examples
--------
>>> X = [[0], [1], [2], [3]]
>>> y = [0, 0, 1, 1]
>>> from sklearn.neighbors import KNeighborsClassifier
>>> neigh = KNeighborsClassifier(n_neighbors=3)
>>> neigh.fit(X, y)
KNeighborsClassifier(...)
>>> print(neigh.predict([[1.1]]))
[0]
>>> print(neigh.predict_proba([[0.9]]))
[[0.666... 0.333...]]
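The equivalent metamorph.ml model specification can be sketched as follows (model key assumed from the subsection heading; `:n-neighbors` mirrors the Python example, `:weights` is shown only to illustrate a non-default value):

```clojure
;; sketch -- illustrative values only
(ml/model {:model-type :sklearn.classification/k-neighbors-classifier
           :n-neighbors 3
           :weights "distance"})
```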
24.2.16 /label-propagation
| name | type | default | description |
|---|---|---|---|
| gamma | |||
| kernel | |||
| max-iter | |||
| n-jobs | |||
| n-neighbors | |||
| tol | |||
| predict-proba? |
Label Propagation classifier.
Read more in the User Guide: `label_propagation`.
Parameters
----------
- `kernel`: {'knn', 'rbf'} or callable, default='rbf'
String identifier for kernel function to use or the kernel function
itself. Only 'rbf' and 'knn' strings are valid inputs. The function
passed should take two inputs, each of shape (n_samples, n_features),
and return a (n_samples, n_samples) shaped weight matrix.
- `gamma`: float, default=20
Parameter for rbf kernel.
- `n_neighbors`: int, default=7
Parameter for knn kernel which need to be strictly positive.
- `max_iter`: int, default=1000
Maximum number of iterations allowed.
- `tol`: float, default=1e-3
Convergence tolerance: threshold to consider the system at steady
state.
- `n_jobs`: int, default=None
The number of parallel jobs to run.
``None`` means 1 unless in a `joblib.parallel_backend` context.
``-1`` means using all processors. See `Glossary `
for more details.
Attributes
----------
- `X_`: {array-like, sparse matrix} of shape (n_samples, n_features)
Input array.
- `classes_`: ndarray of shape (n_classes,)
The distinct labels used in classifying instances.
- `label_distributions_`: ndarray of shape (n_samples, n_classes)
Categorical distribution for each item.
- `transduction_`: ndarray of shape (n_samples)
Label assigned to each item during `fit`.
- `n_features_in_`: int
Number of features seen during `fit`.
*Added in 0.24*
- `feature_names_in_`: ndarray of shape (`n_features_in_`,)
Names of features seen during `fit`. Defined only when `X`
has feature names that are all strings.
*Added in 1.0*
- `n_iter_`: int
Number of iterations run.
See Also
--------
- `LabelSpreading`: Alternate label propagation strategy more robust to noise.
References
----------
Xiaojin Zhu and Zoubin Ghahramani. Learning from labeled and unlabeled data
with label propagation. Technical Report CMU-CALD-02-107, Carnegie Mellon
University, 2002 http://pages.cs.wisc.edu/~jerryzhu/pub/CMU-CALD-02-107.pdf
Examples
--------
>>> import numpy as np
>>> from sklearn import datasets
>>> from sklearn.semi_supervised import LabelPropagation
>>> label_prop_model = LabelPropagation()
>>> iris = datasets.load_iris()
>>> rng = np.random.RandomState(42)
>>> random_unlabeled_points = rng.rand(len(iris.target)) < 0.3
>>> labels = np.copy(iris.target)
>>> labels[random_unlabeled_points] = -1
>>> label_prop_model.fit(iris.data, labels)
LabelPropagation(...)
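As a hedged Clojure sketch (model key assumed from the subsection heading, values illustrative), the model can be specified as a metamorph.ml step; as in the Python example above, unlabeled samples are expected to be marked with -1 in the target column:

```clojure
;; sketch -- illustrative values; see the parameter list above
(ml/model {:model-type :sklearn.classification/label-propagation
           :kernel "knn"
           :n-neighbors 7
           :max-iter 1000})
```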
24.2.17 /label-spreading
| name | type | default | description |
|---|---|---|---|
| alpha | |||
| gamma | |||
| kernel | |||
| max-iter | |||
| n-jobs | |||
| n-neighbors | |||
| tol | |||
| predict-proba? |
LabelSpreading model for semi-supervised learning.
This model is similar to the basic Label Propagation algorithm,
but uses affinity matrix based on the normalized graph Laplacian
and soft clamping across the labels.
Read more in the User Guide: `label_propagation`.
Parameters
----------
- `kernel`: {'knn', 'rbf'} or callable, default='rbf'
String identifier for kernel function to use or the kernel function
itself. Only 'rbf' and 'knn' strings are valid inputs. The function
passed should take two inputs, each of shape (n_samples, n_features),
and return a (n_samples, n_samples) shaped weight matrix.
- `gamma`: float, default=20
Parameter for rbf kernel.
- `n_neighbors`: int, default=7
Parameter for knn kernel which is a strictly positive integer.
- `alpha`: float, default=0.2
Clamping factor. A value in (0, 1) that specifies the relative amount
that an instance should adopt the information from its neighbors as
opposed to its initial label.
alpha=0 means keeping the initial label information; alpha=1 means
replacing all initial information.
- `max_iter`: int, default=30
Maximum number of iterations allowed.
- `tol`: float, default=1e-3
Convergence tolerance: threshold to consider the system at steady
state.
- `n_jobs`: int, default=None
The number of parallel jobs to run.
``None`` means 1 unless in a `joblib.parallel_backend` context.
``-1`` means using all processors. See `Glossary `
for more details.
Attributes
----------
- `X_`: ndarray of shape (n_samples, n_features)
Input array.
- `classes_`: ndarray of shape (n_classes,)
The distinct labels used in classifying instances.
- `label_distributions_`: ndarray of shape (n_samples, n_classes)
Categorical distribution for each item.
- `transduction_`: ndarray of shape (n_samples,)
Label assigned to each item during `fit`.
- `n_features_in_`: int
Number of features seen during `fit`.
*Added in 0.24*
- `feature_names_in_`: ndarray of shape (`n_features_in_`,)
Names of features seen during `fit`. Defined only when `X`
has feature names that are all strings.
*Added in 1.0*
- `n_iter_`: int
Number of iterations run.
See Also
--------
- `LabelPropagation`: Unregularized graph based semi-supervised learning.
References
----------
[Dengyong Zhou, Olivier Bousquet, Thomas Navin Lal, Jason Weston,
Bernhard Schoelkopf. Learning with local and global consistency (2004)
](https://citeseerx.ist.psu.edu/doc_view/pid/d74c37aabf2d5cae663007cbd8718175466aea8c)
Examples
--------
>>> import numpy as np
>>> from sklearn import datasets
>>> from sklearn.semi_supervised import LabelSpreading
>>> label_prop_model = LabelSpreading()
>>> iris = datasets.load_iris()
>>> rng = np.random.RandomState(42)
>>> random_unlabeled_points = rng.rand(len(iris.target)) < 0.3
>>> labels = np.copy(iris.target)
>>> labels[random_unlabeled_points] = -1
>>> label_prop_model.fit(iris.data, labels)
LabelSpreading(...)
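A corresponding metamorph.ml sketch (model key assumed from the subsection heading; values illustrative). The `:alpha` clamping factor is the main difference from label propagation, as described above:

```clojure
;; sketch -- illustrative values only
(ml/model {:model-type :sklearn.classification/label-spreading
           :kernel "rbf"
           :gamma 20
           :alpha 0.2})   ; clamping factor, see `alpha` above
```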
24.2.18 /linear-discriminant-analysis
| name | type | default | description |
|---|---|---|---|
| covariance-estimator | |||
| n-components | |||
| priors | |||
| shrinkage | |||
| solver | |||
| store-covariance | |||
| tol | |||
| predict-proba? |
Linear Discriminant Analysis.
A classifier with a linear decision boundary, generated by fitting class
conditional densities to the data and using Bayes' rule.
The model fits a Gaussian density to each class, assuming that all classes
share the same covariance matrix.
The fitted model can also be used to reduce the dimensionality of the input
by projecting it to the most discriminative directions, using the
`transform` method.
*Added in 0.17*
For a comparison between
`~sklearn.discriminant_analysis.LinearDiscriminantAnalysis`
and `~sklearn.discriminant_analysis.QuadraticDiscriminantAnalysis`, see
:ref:`sphx_glr_auto_examples_classification_plot_lda_qda.py`.
Read more in the User Guide: `lda_qda`.
Parameters
----------
- `solver`: {'svd', 'lsqr', 'eigen'}, default='svd'
Solver to use, possible values:
- 'svd': Singular value decomposition (default).
Does not compute the covariance matrix, therefore this solver is
recommended for data with a large number of features.
- 'lsqr': Least squares solution.
Can be combined with shrinkage or custom covariance estimator.
- 'eigen': Eigenvalue decomposition.
Can be combined with shrinkage or custom covariance estimator.
*Changed in 1.2*
`solver="svd"` now has experimental Array API support. See the
Array API User Guide: `array_api` for more details.
- `shrinkage`: 'auto' or float, default=None
Shrinkage parameter, possible values:
- None: no shrinkage (default).
- 'auto': automatic shrinkage using the Ledoit-Wolf lemma.
- float between 0 and 1: fixed shrinkage parameter.
This should be left to None if `covariance_estimator` is used.
Note that shrinkage works only with 'lsqr' and 'eigen' solvers.
For a usage example, see
:ref:`sphx_glr_auto_examples_classification_plot_lda.py`.
- `priors`: array-like of shape (n_classes,), default=None
The class prior probabilities. By default, the class proportions are
inferred from the training data.
- `n_components`: int, default=None
Number of components (<= min(n_classes - 1, n_features)) for
dimensionality reduction. If None, will be set to
min(n_classes - 1, n_features). This parameter only affects the
`transform` method.
For a usage example, see
:ref:`sphx_glr_auto_examples_decomposition_plot_pca_vs_lda.py`.
- `store_covariance`: bool, default=False
If True, explicitly compute the weighted within-class covariance
matrix when solver is 'svd'. The matrix is always computed
and stored for the other solvers.
*Added in 0.17*
- `tol`: float, default=1.0e-4
Absolute threshold for a singular value of X to be considered
significant, used to estimate the rank of X. Dimensions whose
singular values are non-significant are discarded. Only used if
solver is 'svd'.
*Added in 0.17*
- `covariance_estimator`: covariance estimator, default=None
If not None, `covariance_estimator` is used to estimate
the covariance matrices instead of relying on the empirical
covariance estimator (with potential shrinkage).
The object should have a fit method and a ``covariance_`` attribute
like the estimators in `sklearn.covariance`.
If None, the shrinkage parameter drives the estimate.
This should be left to None if `shrinkage` is used.
Note that `covariance_estimator` works only with 'lsqr' and 'eigen'
solvers.
*Added in 0.24*
Attributes
----------
- `coef_`: ndarray of shape (n_features,) or (n_classes, n_features)
Weight vector(s).
- `intercept_`: ndarray of shape (n_classes,)
Intercept term.
- `covariance_`: array-like of shape (n_features, n_features)
Weighted within-class covariance matrix. It corresponds to
`sum_k prior_k * C_k` where `C_k` is the covariance matrix of the
samples in class `k`. The `C_k` are estimated using the (potentially
shrunk) biased estimator of covariance. If solver is 'svd', only
exists when `store_covariance` is True.
- `explained_variance_ratio_`: ndarray of shape (n_components,)
Percentage of variance explained by each of the selected components.
If ``n_components`` is not set then all components are stored and the
sum of explained variances is equal to 1.0. Only available when eigen
or svd solver is used.
- `means_`: array-like of shape (n_classes, n_features)
Class-wise means.
- `priors_`: array-like of shape (n_classes,)
Class priors (sum to 1).
- `scalings_`: array-like of shape (rank, n_classes - 1)
Scaling of the features in the space spanned by the class centroids.
Only available for 'svd' and 'eigen' solvers.
- `xbar_`: array-like of shape (n_features,)
Overall mean. Only present if solver is 'svd'.
- `classes_`: array-like of shape (n_classes,)
Unique class labels.
- `n_features_in_`: int
Number of features seen during `fit`.
*Added in 0.24*
- `feature_names_in_`: ndarray of shape (`n_features_in_`,)
Names of features seen during `fit`. Defined only when `X`
has feature names that are all strings.
*Added in 1.0*
See Also
--------
- `QuadraticDiscriminantAnalysis`: Quadratic Discriminant Analysis.
Examples
--------
>>> import numpy as np
>>> from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
>>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
>>> y = np.array([1, 1, 1, 2, 2, 2])
>>> clf = LinearDiscriminantAnalysis()
>>> clf.fit(X, y)
LinearDiscriminantAnalysis()
>>> print(clf.predict([[-0.8, -1]]))
[1]
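A hedged metamorph.ml sketch for this model (model key assumed from the subsection heading; values illustrative). Since shrinkage works only with the 'lsqr' and 'eigen' solvers (see above), the sketch pairs 'lsqr' with automatic shrinkage:

```clojure
;; sketch -- illustrative values only
(ml/model {:model-type :sklearn.classification/linear-discriminant-analysis
           :solver "lsqr"
           :shrinkage "auto"})
```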
24.2.19 /linear-svc
| name | type | default | description |
|---|---|---|---|
| tol | |||
| intercept-scaling | |||
| multi-class | |||
| penalty | |||
| c | |||
| max-iter | |||
| random-state | |||
| dual | |||
| fit-intercept | |||
| class-weight | |||
| loss | |||
| verbose | |||
| predict-proba? |
Linear Support Vector Classification.
Similar to SVC with parameter kernel='linear', but implemented in terms of
liblinear rather than libsvm, so it has more flexibility in the choice of
penalties and loss functions and should scale better to large numbers of
samples.
The main differences between `~sklearn.svm.LinearSVC` and
`~sklearn.svm.SVC` lie in the loss function used by default, and in
the handling of intercept regularization between those two implementations.
This class supports both dense and sparse input and the multiclass support
is handled according to a one-vs-the-rest scheme.
Read more in the User Guide: `svm_classification`.
Parameters
----------
- `penalty`: {'l1', 'l2'}, default='l2'
Specifies the norm used in the penalization. The 'l2'
penalty is the standard used in SVC. The 'l1' leads to ``coef_``
vectors that are sparse.
- `loss`: {'hinge', 'squared_hinge'}, default='squared_hinge'
Specifies the loss function. 'hinge' is the standard SVM loss
(used e.g. by the SVC class) while 'squared_hinge' is the
square of the hinge loss. The combination of ``penalty='l1'``
and ``loss='hinge'`` is not supported.
- `dual`: "auto" or bool, default="auto"
Select the algorithm to either solve the dual or primal
optimization problem. Prefer dual=False when n_samples > n_features.
`dual="auto"` will choose the value of the parameter automatically,
based on the values of `n_samples`, `n_features`, `loss`, `multi_class`
and `penalty`. If `n_samples` < `n_features` and optimizer supports
chosen `loss`, `multi_class` and `penalty`, then dual will be set to True,
otherwise it will be set to False.
*Changed in 1.3*
The `"auto"` option is added in version 1.3 and will be the default
in version 1.5.
- `tol`: float, default=1e-4
Tolerance for stopping criteria.
- `C`: float, default=1.0
Regularization parameter. The strength of the regularization is
inversely proportional to C. Must be strictly positive.
For an intuitive visualization of the effects of scaling
the regularization parameter C, see
:ref:`sphx_glr_auto_examples_svm_plot_svm_scale_c.py`.
- `multi_class`: {'ovr', 'crammer_singer'}, default='ovr'
Determines the multi-class strategy if `y` contains more than
two classes.
``"ovr"`` trains n_classes one-vs-rest classifiers, while
``"crammer_singer"`` optimizes a joint objective over all classes.
While `crammer_singer` is interesting from a theoretical perspective
as it is consistent, it is seldom used in practice as it rarely leads
to better accuracy and is more expensive to compute.
If ``"crammer_singer"`` is chosen, the options loss, penalty and dual
will be ignored.
- `fit_intercept`: bool, default=True
Whether or not to fit an intercept. If set to True, the feature vector
is extended to include an intercept term: `[x_1, ..., x_n, 1]`, where
1 corresponds to the intercept. If set to False, no intercept will be
used in calculations (i.e. data is expected to be already centered).
- `intercept_scaling`: float, default=1.0
When `fit_intercept` is True, the instance vector x becomes ``[x_1,
..., x_n, intercept_scaling]``, i.e. a "synthetic" feature with a
constant value equal to `intercept_scaling` is appended to the instance
vector. The intercept becomes intercept_scaling * synthetic feature
weight. Note that liblinear internally penalizes the intercept,
treating it like any other term in the feature vector. To reduce the
impact of the regularization on the intercept, the `intercept_scaling`
parameter can be set to a value greater than 1; the higher the value of
`intercept_scaling`, the lower the impact of regularization on it.
Then, the weights become `[w_x_1, ..., w_x_n,
w_intercept*intercept_scaling]`, where `w_x_1, ..., w_x_n` represent
the feature weights and the intercept weight is scaled by
`intercept_scaling`. This scaling allows the intercept term to have a
different regularization behavior compared to the other features.
- `class_weight`: dict or 'balanced', default=None
Set the parameter C of class i to ``class_weight[i]*C`` for
SVC. If not given, all classes are supposed to have
weight one.
The "balanced" mode uses the values of y to automatically adjust
weights inversely proportional to class frequencies in the input data
as ``n_samples / (n_classes * np.bincount(y))``.
- `verbose`: int, default=0
Enable verbose output. Note that this setting takes advantage of a
per-process runtime setting in liblinear that, if enabled, may not work
properly in a multithreaded context.
- `random_state`: int, RandomState instance or None, default=None
Controls the pseudo random number generation for shuffling the data for
the dual coordinate descent (if ``dual=True``). When ``dual=False`` the
underlying implementation of `LinearSVC` is not random and
``random_state`` has no effect on the results.
Pass an int for reproducible output across multiple function calls.
See `Glossary `.
- `max_iter`: int, default=1000
The maximum number of iterations to be run.
Attributes
----------
- `coef_`: ndarray of shape (1, n_features) if n_classes == 2 else (n_classes, n_features)
Weights assigned to the features (coefficients in the primal
problem).
``coef_`` is a readonly property derived from ``raw_coef_`` that
follows the internal memory layout of liblinear.
- `intercept_`: ndarray of shape (1,) if n_classes == 2 else (n_classes,)
Constants in decision function.
- `classes_`: ndarray of shape (n_classes,)
The unique class labels.
- `n_features_in_`: int
Number of features seen during `fit`.
*Added in 0.24*
- `feature_names_in_`: ndarray of shape (`n_features_in_`,)
Names of features seen during `fit`. Defined only when `X`
has feature names that are all strings.
*Added in 1.0*
- `n_iter_`: int
Maximum number of iterations run across all classes.
See Also
--------
- `SVC`: Implementation of Support Vector Machine classifier using libsvm:
the kernel can be non-linear but its SMO algorithm does not
scale to large number of samples as LinearSVC does.
Furthermore SVC multi-class mode is implemented using one
vs one scheme while LinearSVC uses one vs the rest. It is
possible to implement one vs the rest with SVC by using the
`~sklearn.multiclass.OneVsRestClassifier` wrapper.
Finally SVC can fit dense data without memory copy if the input
is C-contiguous. Sparse data will still incur memory copy though.
- `sklearn.linear_model.SGDClassifier`: SGDClassifier can optimize the same
cost function as LinearSVC
by adjusting the penalty and loss parameters. In addition it requires
less memory, allows incremental (online) learning, and implements
various loss functions and regularization regimes.
Notes
-----
The underlying C implementation uses a random number generator to
select features when fitting the model. It is thus not uncommon
to have slightly different results for the same input data. If
that happens, try with a smaller ``tol`` parameter.
The underlying implementation, liblinear, uses a sparse internal
representation for the data that will incur a memory copy.
Predict output may not match that of standalone liblinear in certain
cases. See differences from liblinear: `liblinear_differences`
in the narrative documentation.
References
----------
[LIBLINEAR: A Library for Large Linear Classification
](https://www.csie.ntu.edu.tw/~cjlin/liblinear/)
Examples
--------
>>> from sklearn.svm import LinearSVC
>>> from sklearn.pipeline import make_pipeline
>>> from sklearn.preprocessing import StandardScaler
>>> from sklearn.datasets import make_classification
>>> X, y = make_classification(n_features=4, random_state=0)
>>> clf = make_pipeline(StandardScaler(),
... LinearSVC(random_state=0, tol=1e-5))
>>> clf.fit(X, y)
Pipeline(steps=[('standardscaler', StandardScaler()),
('linearsvc', LinearSVC(random_state=0, tol=1e-05))])
>>> print(clf.named_steps['linearsvc'].coef_)
[[0.141... 0.526... 0.679... 0.493...]]
>>> print(clf.named_steps['linearsvc'].intercept_)
[0.1693...]
>>> print(clf.predict([[0, 0, 0, 0]]))
[1]
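A Clojure sketch of the corresponding metamorph step, assuming the model key :sklearn.classification/linear-svc and the kebab-case keys from the parameter table above (`:c` mapping to `C`):
(def linear-svc-step
  ;; assumed model key and kebab-case option names;
  ;; squared-hinge is the sklearn default loss
  (ml/model {:model-type :sklearn.classification/linear-svc
             :c 1.0
             :loss "squared_hinge"
             :max-iter 2000}))
This step can be placed in a pipeline exactly like the logistic-regression example at the top of the chapter.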
24.2.20 /logistic-regression
| name | type | default | description |
|---|---|---|---|
| tol | |||
| intercept-scaling | |||
| multi-class | |||
| solver | |||
| penalty | |||
| c | |||
| max-iter | |||
| n-jobs | |||
| random-state | |||
| dual | |||
| fit-intercept | |||
| warm-start | |||
| l-1-ratio | |||
| class-weight | |||
| verbose | |||
| predict-proba? |
Logistic Regression (aka logit, MaxEnt) classifier.
In the multiclass case, the training algorithm uses the one-vs-rest (OvR)
scheme if the 'multi_class' option is set to 'ovr', and uses the
cross-entropy loss if the 'multi_class' option is set to 'multinomial'.
(Currently the 'multinomial' option is supported only by the 'lbfgs',
'sag', 'saga' and 'newton-cg' solvers.)
This class implements regularized logistic regression using the
'liblinear' library, 'newton-cg', 'sag', 'saga' and 'lbfgs' solvers. **Note
that regularization is applied by default**. It can handle both dense
and sparse input. Use C-ordered arrays or CSR matrices containing 64-bit
floats for optimal performance; any other input format will be converted
(and copied).
The 'newton-cg', 'sag', and 'lbfgs' solvers support only L2 regularization
with primal formulation, or no regularization. The 'liblinear' solver
supports both L1 and L2 regularization, with a dual formulation only for
the L2 penalty. The Elastic-Net regularization is only supported by the
'saga' solver.
Read more in the User Guide: `logistic_regression`.
Parameters
----------
- `penalty`: {'l1', 'l2', 'elasticnet', None}, default='l2'
Specify the norm of the penalty:
- `None`: no penalty is added;
- `'l2'`: add a L2 penalty term and it is the default choice;
- `'l1'`: add a L1 penalty term;
- `'elasticnet'`: both L1 and L2 penalty terms are added.
⚠️ Warning
Some penalties may not work with some solvers. See the parameter `solver` below to learn about the compatibility between the penalty and the solver.
*Added in 0.19*
l1 penalty with SAGA solver (allowing 'multinomial' + L1)
- `dual`: bool, default=False
Dual (constrained) or primal (regularized, see also this equation: `regularized-logistic-loss`) formulation. Dual formulation is only implemented for l2 penalty with liblinear solver. Prefer dual=False when n_samples > n_features.
- `tol`: float, default=1e-4
Tolerance for stopping criteria.
- `C`: float, default=1.0
Inverse of regularization strength; must be a positive float. Like in support vector machines, smaller values specify stronger regularization.
- `fit_intercept`: bool, default=True
Specifies if a constant (a.k.a. bias or intercept) should be added to the decision function.
- `intercept_scaling`: float, default=1
Useful only when the solver 'liblinear' is used and self.fit_intercept is set to True. In this case, x becomes ``[x, self.intercept_scaling]``, i.e. a "synthetic" feature with constant value equal to intercept_scaling is appended to the instance vector. The intercept becomes intercept_scaling * synthetic_feature_weight.
Note! the synthetic feature weight is subject to l1/l2 regularization as all other features. To lessen the effect of regularization on the synthetic feature weight (and therefore on the intercept) intercept_scaling has to be increased.
- `class_weight`: dict or 'balanced', default=None
Weights associated with classes in the form ``{class_label: weight}``. If not given, all classes are supposed to have weight one.
The "balanced" mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as ``n_samples / (n_classes * np.bincount(y))``.
Note that these weights will be multiplied with sample_weight (passed through the fit method) if sample_weight is specified.
*Added in 0.17*
class_weight='balanced'
- `random_state`: int, RandomState instance, default=None
Used when solver == 'sag', 'saga' or 'liblinear' to shuffle the data. See `Glossary ` for details.
- `solver`: {'lbfgs', 'liblinear', 'newton-cg', 'newton-cholesky', 'sag', 'saga'}, default='lbfgs'
Algorithm to use in the optimization problem. Default is 'lbfgs'. To choose a solver, you might want to consider the following aspects:
- For small datasets, 'liblinear' is a good choice, whereas 'sag' and 'saga' are faster for large ones;
- For multiclass problems, only 'newton-cg', 'sag', 'saga' and 'lbfgs' handle multinomial loss;
- 'liblinear' and 'newton-cholesky' can only handle binary classification by default. To apply a one-versus-rest scheme for the multiclass setting one can wrap it with the OneVsRestClassifier.
- 'newton-cholesky' is a good choice for n_samples >> n_features, especially with one-hot encoded categorical features with rare categories. Be aware that the memory usage of this solver has a quadratic dependency on n_features because it explicitly computes the Hessian matrix.
⚠️ Warning
The choice of the algorithm depends on the penalty chosen and on (multinomial) multiclass support:
| solver | penalty | multinomial multiclass |
|---|---|---|
| 'lbfgs' | 'l2', None | yes |
| 'liblinear' | 'l1', 'l2' | no |
| 'newton-cg' | 'l2', None | yes |
| 'newton-cholesky' | 'l2', None | no |
| 'sag' | 'l2', None | yes |
| 'saga' | 'elasticnet', 'l1', 'l2', None | yes |
Note: 'sag' and 'saga' fast convergence is only guaranteed on features with approximately the same scale. You can preprocess the data with a scaler from sklearn.preprocessing.
See also: refer to the User Guide for more information regarding LogisticRegression and more specifically the table summarizing solver/penalty supports.
*Added in 0.17*
Stochastic Average Gradient descent solver.
*Added in 0.19*
SAGA solver.
*Changed in 0.22*
The default solver changed from 'liblinear' to 'lbfgs' in 0.22.
*Added in 1.2*
newton-cholesky solver.
- `max_iter`: int, default=100
Maximum number of iterations taken for the solvers to converge.
- `multi_class`: {'auto', 'ovr', 'multinomial'}, default='auto'
If the option chosen is 'ovr', then a binary problem is fit for each label. For 'multinomial' the loss minimised is the multinomial loss fit across the entire probability distribution, even when the data is binary. 'multinomial' is unavailable when solver='liblinear'. 'auto' selects 'ovr' if the data is binary, or if solver='liblinear', and otherwise selects 'multinomial'.
*Added in 0.18*
Stochastic Average Gradient descent solver for the 'multinomial' case.
*Changed in 0.22*
Default changed from 'ovr' to 'auto' in 0.22.
*Deprecated since 1.5*
multi_class was deprecated in version 1.5 and will be removed in 1.7. From then on, the recommended 'multinomial' will always be used for n_classes >= 3. Solvers that do not support 'multinomial' will raise an error. Use sklearn.multiclass.OneVsRestClassifier(LogisticRegression()) if you still want to use OvR.
- `verbose`: int, default=0
For the liblinear and lbfgs solvers set verbose to any positive number for verbosity.
- `warm_start`: bool, default=False
When set to True, reuse the solution of the previous call to fit as initialization, otherwise, just erase the previous solution. Useless for liblinear solver. See `the Glossary `.
*Added in 0.17*
warm_start to support lbfgs, newton-cg, sag, saga solvers.
- `n_jobs`: int, default=None
Number of CPU cores used when parallelizing over classes if multi_class='ovr'. This parameter is ignored when the solver is set to 'liblinear' regardless of whether 'multi_class' is specified or not. ``None`` means 1 unless in a `joblib.parallel_backend` context. ``-1`` means using all processors. See `Glossary ` for more details.
- `l1_ratio`: float, default=None
The Elastic-Net mixing parameter, with ``0 <= l1_ratio <= 1``. Only used if penalty='elasticnet'. Setting ``l1_ratio=0`` is equivalent to using penalty='l2', while setting ``l1_ratio=1`` is equivalent to using penalty='l1'. For ``0 < l1_ratio < 1``, the penalty is a combination of L1 and L2.
Attributes
----------
- `classes_`: ndarray of shape (n_classes,)
A list of class labels known to the classifier.
- `coef_`: ndarray of shape (1, n_features) or (n_classes, n_features)
Coefficient of the features in the decision function.
`coef_` is of shape (1, n_features) when the given problem is binary. In particular, when `multi_class='multinomial'`, `coef_` corresponds to outcome 1 (True) and `-coef_` corresponds to outcome 0 (False).
- `intercept_`: ndarray of shape (1,) or (n_classes,)
Intercept (a.k.a. bias) added to the decision function.
If `fit_intercept` is set to False, the intercept is set to zero. `intercept_` is of shape (1,) when the given problem is binary. In particular, when `multi_class='multinomial'`, `intercept_` corresponds to outcome 1 (True) and `-intercept_` corresponds to outcome 0 (False).
- `n_features_in_`: int
Number of features seen during `fit`.
*Added in 0.24*
- `feature_names_in_`: ndarray of shape (`n_features_in_`,)
Names of features seen during `fit`. Defined only when `X`
has feature names that are all strings.
*Added in 1.0*
- `n_iter_`: ndarray of shape (n_classes,) or (1,)
Actual number of iterations for all classes. If binary or multinomial, it returns only 1 element. For liblinear solver, only the maximum number of iterations across all classes is given.
*Changed in 0.20*
In SciPy <= 1.0.0 the number of lbfgs iterations may exceed max_iter. ``n_iter_`` will now report at most max_iter.
See Also
--------
- `SGDClassifier`: Incrementally trained logistic regression (when given the parameter ``loss="log_loss"``).
- `LogisticRegressionCV`: Logistic regression with built-in cross validation.
Notes
-----
The underlying C implementation uses a random number generator to
select features when fitting the model. It is thus not uncommon
to have slightly different results for the same input data. If
that happens, try with a smaller ``tol`` parameter.
Predict output may not match that of standalone liblinear in certain
cases. See differences from liblinear: `liblinear_differences`
in the narrative documentation.
References
----------
L-BFGS-B -- Software for Large-scale Bound-constrained Optimization
Ciyou Zhu, Richard Byrd, Jorge Nocedal and Jose Luis Morales.
http://users.iems.northwestern.edu/~nocedal/lbfgsb.html
LIBLINEAR -- A Library for Large Linear Classification
https://www.csie.ntu.edu.tw/~cjlin/liblinear/
SAG -- Mark Schmidt, Nicolas Le Roux, and Francis Bach
Minimizing Finite Sums with the Stochastic Average Gradient
https://hal.inria.fr/hal-00860051/document
SAGA -- Defazio, A., Bach F. & Lacoste-Julien S. (2014).
:arxiv:`"SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives" <1407.0202>`
Hsiang-Fu Yu, Fang-Lan Huang, Chih-Jen Lin (2011). Dual coordinate descent
methods for logistic regression and maximum entropy models.
Machine Learning 85(1-2):41-75.
https://www.csie.ntu.edu.tw/~cjlin/papers/maxent_dual.pdf
Examples
--------
>>> from sklearn.datasets import load_iris
>>> from sklearn.linear_model import LogisticRegression
>>> X, y = load_iris(return_X_y=True)
>>> clf = LogisticRegression(random_state=0).fit(X, y)
>>> clf.predict(X[:2, :])
array([0, 0])
>>> clf.predict_proba(X[:2, :])
array([[9.8...e-01, 1.8...e-02, 1.4...e-08],
       [9.7...e-01, 2.8...e-02, ...e-08]])
>>> clf.score(X, y)
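In addition to the plain logistic-regression pipeline shown at the beginning of this chapter, here is a sketch of the elastic-net variant described above (only the 'saga' solver supports penalty='elasticnet'). It assumes the kebab-case keys from the parameter table, with :l-1-ratio corresponding to the Python l1_ratio parameter:
(def elasticnet-logreg-step
  (ml/model {:model-type :sklearn.classification/logistic-regression
             :penalty "elasticnet"
             :solver "saga"    ;; the only solver supporting elastic-net (see above)
             :l-1-ratio 0.5    ;; assumed mapping to the Python l1_ratio parameter
             :max-iter 1000}))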
24.2.21 /logistic-regression-cv
| name | type | default | description |
|---|---|---|---|
| refit | |||
| scoring | |||
| tol | |||
| intercept-scaling | |||
| multi-class | |||
| solver | |||
| penalty | |||
| max-iter | |||
| n-jobs | |||
| random-state | |||
| dual | |||
| fit-intercept | |||
| cv | |||
| cs | |||
| class-weight | |||
| verbose | |||
| l-1-ratios | |||
| predict-proba? |
Logistic Regression CV (aka logit, MaxEnt) classifier.
See glossary entry for `cross-validation estimator`.
This class implements logistic regression using liblinear, newton-cg, sag or lbfgs optimizer. The newton-cg, sag and lbfgs solvers support only L2 regularization with primal formulation. The liblinear solver supports both L1 and L2 regularization, with a dual formulation only for the L2 penalty. Elastic-Net penalty is only supported by the saga solver.
For the grid of `Cs` values and `l1_ratios` values, the best hyperparameter is selected by the cross-validator `~sklearn.model_selection.StratifiedKFold`, but it can be changed using the `cv` parameter. The 'newton-cg', 'sag', 'saga' and 'lbfgs' solvers can warm-start the coefficients (see `Glossary`).
Read more in the User Guide: `logistic_regression`.
Parameters
----------
- `Cs`: int or list of floats, default=10
Each of the values in Cs describes the inverse of regularization strength. If Cs is an int, then a grid of Cs values is chosen on a logarithmic scale between 1e-4 and 1e4. Like in support vector machines, smaller values specify stronger regularization.
- `fit_intercept`: bool, default=True
Specifies if a constant (a.k.a. bias or intercept) should be added to the decision function.
- `cv`: int or cross-validation generator, default=None
The default cross-validation generator used is Stratified K-Folds. If an integer is provided, then it is the number of folds used. See the `sklearn.model_selection` module for the list of possible cross-validation objects.
*Changed in 0.22*
``cv`` default value if None changed from 3-fold to 5-fold.
- `dual`: bool, default=False
Dual (constrained) or primal (regularized, see also this equation: `regularized-logistic-loss`) formulation. Dual formulation is only implemented for l2 penalty with liblinear solver. Prefer dual=False when n_samples > n_features.
- `penalty`: {'l1', 'l2', 'elasticnet'}, default='l2'
Specify the norm of the penalty:
- `'l2'`: add a L2 penalty term (used by default);
- `'l1'`: add a L1 penalty term;
- `'elasticnet'`: both L1 and L2 penalty terms are added.
⚠️ Warning
Some penalties may not work with some solvers. See the parameter `solver` below to learn about the compatibility between the penalty and the solver.
- `scoring`: str or callable, default=None
A string (see model evaluation documentation) or a scorer callable object / function with signature ``scorer(estimator, X, y)``. For a list of scoring functions that can be used, look at `sklearn.metrics`. The default scoring option used is 'accuracy'.
- `solver`: {'lbfgs', 'liblinear', 'newton-cg', 'newton-cholesky', 'sag', 'saga'}, default='lbfgs'
Algorithm to use in the optimization problem. Default is 'lbfgs'. To choose a solver, you might want to consider the following aspects:
- For small datasets, 'liblinear' is a good choice, whereas 'sag' and 'saga' are faster for large ones;
- For multiclass problems, only 'newton-cg', 'sag', 'saga' and 'lbfgs' handle multinomial loss;
- 'liblinear' might be slower in LogisticRegressionCV because it does not handle warm-starting;
- 'liblinear' and 'newton-cholesky' can only handle binary classification by default. To apply a one-versus-rest scheme for the multiclass setting one can wrap it with the OneVsRestClassifier.
- 'newton-cholesky' is a good choice for n_samples >> n_features, especially with one-hot encoded categorical features with rare categories. Be aware that the memory usage of this solver has a quadratic dependency on n_features because it explicitly computes the Hessian matrix.
⚠️ Warning
The choice of the algorithm depends on the penalty chosen and on (multinomial) multiclass support:
| solver | penalty | multinomial multiclass |
|---|---|---|
| 'lbfgs' | 'l2' | yes |
| 'liblinear' | 'l1', 'l2' | no |
| 'newton-cg' | 'l2' | yes |
| 'newton-cholesky' | 'l2' | no |
| 'sag' | 'l2' | yes |
| 'saga' | 'elasticnet', 'l1', 'l2' | yes |
Note: 'sag' and 'saga' fast convergence is only guaranteed on features with approximately the same scale. You can preprocess the data with a scaler from sklearn.preprocessing.
*Added in 0.17*
Stochastic Average Gradient descent solver.
*Added in 0.19*
SAGA solver.
*Added in 1.2*
newton-cholesky solver.
- `tol`: float, default=1e-4
Tolerance for stopping criteria.
- `max_iter`: int, default=100
Maximum number of iterations of the optimization algorithm.
- `class_weight`: dict or 'balanced', default=None
Weights associated with classes in the form ``{class_label: weight}``. If not given, all classes are supposed to have weight one.
The "balanced" mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as ``n_samples / (n_classes * np.bincount(y))``.
Note that these weights will be multiplied with sample_weight (passed through the fit method) if sample_weight is specified.
*Added in 0.17*
class_weight == 'balanced'
- `n_jobs`: int, default=None
Number of CPU cores used during the cross-validation loop. ``None`` means 1 unless in a `joblib.parallel_backend` context. ``-1`` means using all processors. See `Glossary ` for more details.
- `verbose`: int, default=0
For the 'liblinear', 'sag' and 'lbfgs' solvers set verbose to any positive number for verbosity.
- `refit`: bool, default=True
If set to True, the scores are averaged across all folds, and the coefs and the C that corresponds to the best score is taken, and a final refit is done using these parameters. Otherwise the coefs, intercepts and C that correspond to the best scores across folds are averaged.
- `intercept_scaling`: float, default=1
Useful only when the solver 'liblinear' is used and self.fit_intercept is set to True. In this case, x becomes ``[x, self.intercept_scaling]``, i.e. a "synthetic" feature with constant value equal to intercept_scaling is appended to the instance vector. The intercept becomes intercept_scaling * synthetic_feature_weight.
Note! the synthetic feature weight is subject to l1/l2 regularization as all other features. To lessen the effect of regularization on the synthetic feature weight (and therefore on the intercept) intercept_scaling has to be increased.
- `multi_class`: {'auto', 'ovr', 'multinomial'}, default='auto'
If the option chosen is 'ovr', then a binary problem is fit for each label. For 'multinomial' the loss minimised is the multinomial loss fit across the entire probability distribution, even when the data is binary. 'multinomial' is unavailable when solver='liblinear'. 'auto' selects 'ovr' if the data is binary, or if solver='liblinear', and otherwise selects 'multinomial'.
*Added in 0.18*
Stochastic Average Gradient descent solver for the 'multinomial' case.
*Changed in 0.22*
Default changed from 'ovr' to 'auto' in 0.22.
*Deprecated since 1.5*
multi_class was deprecated in version 1.5 and will be removed in 1.7. From then on, the recommended 'multinomial' will always be used for n_classes >= 3. Solvers that do not support 'multinomial' will raise an error. Use sklearn.multiclass.OneVsRestClassifier(LogisticRegressionCV()) if you still want to use OvR.
- `random_state`: int, RandomState instance, default=None
Used when solver='sag', 'saga' or 'liblinear' to shuffle the data. Note that this only applies to the solver and not the cross-validation generator. See `Glossary ` for details.
- `l1_ratios`: list of float, default=None
The list of Elastic-Net mixing parameter, with ``0 <= l1_ratio <= 1``. Only used if penalty='elasticnet'. A value of 0 is equivalent to using penalty='l2', while 1 is equivalent to using penalty='l1'. For ``0 < l1_ratio < 1``, the penalty is a combination of L1 and L2.
Attributes
----------
- `classes_`: ndarray of shape (n_classes,)
A list of class labels known to the classifier.
- `coef_`: ndarray of shape (1, n_features) or (n_classes, n_features)
Coefficient of the features in the decision function.
`coef_` is of shape (1, n_features) when the given problem is binary.
- `intercept_`: ndarray of shape (1,) or (n_classes,)
Intercept (a.k.a. bias) added to the decision function.
If `fit_intercept` is set to False, the intercept is set to zero. `intercept_` is of shape (1,) when the problem is binary.
- `Cs_`: ndarray of shape (n_cs)
Array of C i.e. inverse of regularization parameter values used for cross-validation.
- `l1_ratios_`: ndarray of shape (n_l1_ratios)
Array of l1_ratios used for cross-validation. If no l1_ratio is used (i.e. penalty is not 'elasticnet'), this is set to [None].
- `coefs_paths_`: ndarray of shape (n_folds, n_cs, n_features) or (n_folds, n_cs, n_features + 1)
dict with classes as the keys, and the path of coefficients obtained during cross-validating across each fold and then across each Cs after doing an OvR for the corresponding class as values. If the 'multi_class' option is set to 'multinomial', then the coefs_paths are the coefficients corresponding to each class. Each dict value has shape (n_folds, n_cs, n_features) or (n_folds, n_cs, n_features + 1) depending on whether the intercept is fit or not. If penalty='elasticnet', the shape is (n_folds, n_cs, n_l1_ratios_, n_features) or (n_folds, n_cs, n_l1_ratios_, n_features + 1).
- `scores_`: dict
dict with classes as the keys, and the values as the grid of scores obtained during cross-validating each fold, after doing an OvR for the corresponding class. If the 'multi_class' option given is 'multinomial' then the same scores are repeated across all classes, since this is the multinomial class. Each dict value has shape (n_folds, n_cs) or (n_folds, n_cs, n_l1_ratios) if penalty='elasticnet'.
- `C_`: ndarray of shape (n_classes,) or (n_classes - 1,)
Array of C that maps to the best scores across every class. If refit is set to False, then for each class, the best C is the average of the C's that correspond to the best scores for each fold. `C_` is of shape (n_classes,) when the problem is binary.
- `l1_ratio_`: ndarray of shape (n_classes,) or (n_classes - 1,)
Array of l1_ratio that maps to the best scores across every class. If refit is set to False, then for each class, the best l1_ratio is the average of the l1_ratio's that correspond to the best scores for each fold. `l1_ratio_` is of shape (n_classes,) when the problem is binary.
- `n_iter_`: ndarray of shape (n_classes, n_folds, n_cs) or (1, n_folds, n_cs)
Actual number of iterations for all classes, folds and Cs. In the binary or multinomial cases, the first dimension is equal to 1. If penalty='elasticnet', the shape is (n_classes, n_folds, n_cs, n_l1_ratios) or (1, n_folds, n_cs, n_l1_ratios).
- `n_features_in_`: int
Number of features seen during `fit`.
*Added in 0.24*
- `feature_names_in_`: ndarray of shape (`n_features_in_`,)
Names of features seen during `fit`. Defined only when `X`
has feature names that are all strings.
*Added in 1.0*
See Also
--------
- `LogisticRegression`: Logistic regression without tuning the hyperparameter `C`.
Examples
--------
>>> from sklearn.datasets import load_iris
>>> from sklearn.linear_model import LogisticRegressionCV
>>> X, y = load_iris(return_X_y=True)
>>> clf = LogisticRegressionCV(cv=5, random_state=0).fit(X, y)
>>> clf.predict(X[:2, :])
array([0, 0])
>>> clf.predict_proba(X[:2, :]).shape
(2, 3)
>>> clf.score(X, y)
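A sketch of the cross-validated variant as a metamorph pipeline, assuming the model key :sklearn.classification/logistic-regression-cv and the kebab-case keys from the table (:cs for Cs, :cv for the number of folds):
(def logreg-cv-pipe
  (mm/pipeline
   (ds-mm/set-inference-target 2)
   {:metamorph/id :model}
   ;; assumed model key and kebab-case option names
   (ml/model {:model-type :sklearn.classification/logistic-regression-cv
              :cs 10     ;; grid of 10 C values on a log scale (see `Cs` above)
              :cv 5      ;; 5 stratified folds
              :refit true})))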
24.2.22 /mlp-classifier
| name | type | default | description |
|---|---|---|---|
| n-iter-no-change | |||
| learning-rate | |||
| activation | |||
| hidden-layer-sizes | |||
| tol | |||
| beta-2 | |||
| early-stopping | |||
| nesterovs-momentum | |||
| batch-size | |||
| solver | |||
| shuffle | |||
| power-t | |||
| max-fun | |||
| beta-1 | |||
| max-iter | |||
| random-state | |||
| momentum | |||
| learning-rate-init | |||
| alpha | |||
| warm-start | |||
| validation-fraction | |||
| verbose | |||
| epsilon | |||
| predict-proba? |
Multi-layer Perceptron classifier.
This model optimizes the log-loss function using LBFGS or stochastic
gradient descent.
*Added in 0.18*
Parameters
----------
- `hidden_layer_sizes`: array-like of shape (n_layers - 2,), default=(100,)
The ith element represents the number of neurons in the ith
hidden layer.
- `activation`: {'identity', 'logistic', 'tanh', 'relu'}, default='relu'
Activation function for the hidden layer.
- 'identity', no-op activation, useful to implement linear bottleneck,
returns f(x) = x
- 'logistic', the logistic sigmoid function,
returns f(x) = 1 / (1 + exp(-x)).
- 'tanh', the hyperbolic tan function,
returns f(x) = tanh(x).
- 'relu', the rectified linear unit function,
returns f(x) = max(0, x)
- `solver`: {'lbfgs', 'sgd', 'adam'}, default='adam'
The solver for weight optimization.
- 'lbfgs' is an optimizer in the family of quasi-Newton methods.
- 'sgd' refers to stochastic gradient descent.
- 'adam' refers to a stochastic gradient-based optimizer proposed
by Kingma, Diederik, and Jimmy Ba
For a comparison between Adam optimizer and SGD, see
:ref:`sphx_glr_auto_examples_neural_networks_plot_mlp_training_curves.py`.
Note: The default solver 'adam' works pretty well on relatively
large datasets (with thousands of training samples or more) in terms of
both training time and validation score.
For small datasets, however, 'lbfgs' can converge faster and perform
better.
- `alpha`: float, default=0.0001
Strength of the L2 regularization term. The L2 regularization term
is divided by the sample size when added to the loss.
For an example usage and visualization of varying regularization, see
:ref:`sphx_glr_auto_examples_neural_networks_plot_mlp_alpha.py`.
- `batch_size`: int, default='auto'
Size of minibatches for stochastic optimizers.
If the solver is 'lbfgs', the classifier will not use minibatch.
When set to "auto", `batch_size=min(200, n_samples)`.
- `learning_rate`: {'constant', 'invscaling', 'adaptive'}, default='constant'
Learning rate schedule for weight updates.
- 'constant' is a constant learning rate given by
'learning_rate_init'.
- 'invscaling' gradually decreases the learning rate at each
time step 't' using an inverse scaling exponent of 'power_t'.
effective_learning_rate = learning_rate_init / pow(t, power_t)
- 'adaptive' keeps the learning rate constant to
'learning_rate_init' as long as training loss keeps decreasing.
Each time two consecutive epochs fail to decrease training loss by at
least tol, or fail to increase validation score by at least tol if
'early_stopping' is on, the current learning rate is divided by 5.
Only used when ``solver='sgd'``.
- `learning_rate_init`: float, default=0.001
The initial learning rate used. It controls the step-size
in updating the weights. Only used when solver='sgd' or 'adam'.
- `power_t`: float, default=0.5
The exponent for inverse scaling learning rate.
It is used in updating effective learning rate when the learning_rate
is set to 'invscaling'. Only used when solver='sgd'.
- `max_iter`: int, default=200
Maximum number of iterations. The solver iterates until convergence
(determined by 'tol') or this number of iterations. For stochastic
solvers ('sgd', 'adam'), note that this determines the number of epochs
(how many times each data point will be used), not the number of
gradient steps.
- `shuffle`: bool, default=True
Whether to shuffle samples in each iteration. Only used when
solver='sgd' or 'adam'.
- `random_state`: int, RandomState instance, default=None
Determines random number generation for weights and bias
initialization, train-test split if early stopping is used, and batch
sampling when solver='sgd' or 'adam'.
Pass an int for reproducible results across multiple function calls.
See `Glossary `.
- `tol`: float, default=1e-4
Tolerance for the optimization. When the loss or score is not improving
by at least ``tol`` for ``n_iter_no_change`` consecutive iterations,
unless ``learning_rate`` is set to 'adaptive', convergence is
considered to be reached and training stops.
- `verbose`: bool, default=False
Whether to print progress messages to stdout.
- `warm_start`: bool, default=False
When set to True, reuse the solution of the previous
call to fit as initialization, otherwise, just erase the
previous solution. See `the Glossary `.
- `momentum`: float, default=0.9
Momentum for gradient descent update. Should be between 0 and 1. Only
used when solver='sgd'.
- `nesterovs_momentum`: bool, default=True
Whether to use Nesterov's momentum. Only used when solver='sgd' and
momentum > 0.
- `early_stopping`: bool, default=False
Whether to use early stopping to terminate training when validation
score is not improving. If set to true, it will automatically set
aside 10% of training data as validation and terminate training when
validation score is not improving by at least ``tol`` for
``n_iter_no_change`` consecutive epochs. The split is stratified,
except in a multilabel setting.
If early stopping is False, then the training stops when the training
loss does not improve by more than tol for n_iter_no_change consecutive
passes over the training set.
Only effective when solver='sgd' or 'adam'.
- `validation_fraction`: float, default=0.1
The proportion of training data to set aside as validation set for
early stopping. Must be between 0 and 1.
Only used if early_stopping is True.
- `beta_1`: float, default=0.9
Exponential decay rate for estimates of first moment vector in adam,
should be in [0, 1). Only used when solver='adam'.
- `beta_2`: float, default=0.999
Exponential decay rate for estimates of second moment vector in adam,
should be in [0, 1). Only used when solver='adam'.
- `epsilon`: float, default=1e-8
Value for numerical stability in adam. Only used when solver='adam'.
- `n_iter_no_change`: int, default=10
Maximum number of epochs to not meet ``tol`` improvement.
Only effective when solver='sgd' or 'adam'.
*Added in 0.20*
- `max_fun`: int, default=15000
Only used when solver='lbfgs'. Maximum number of loss function calls.
The solver iterates until convergence (determined by 'tol'), number
of iterations reaches max_iter, or this number of loss function calls.
Note that number of loss function calls will be greater than or equal
to the number of iterations for the `MLPClassifier`.
*Added in 0.22*
Attributes
----------
- `classes_`: ndarray or list of ndarray of shape (n_classes,)
Class labels for each output.
- `loss_`: float
The current loss computed with the loss function.
- `best_loss_`: float or None
The minimum loss reached by the solver throughout fitting.
If `early_stopping=True`, this attribute is set to `None`. Refer to
the `best_validation_score_` fitted attribute instead.
- `loss_curve_`: list of shape (`n_iter_`,)
The ith element in the list represents the loss at the ith iteration.
- `validation_scores_`: list of shape (`n_iter_`,) or None
The score at each iteration on a held-out validation set. The score
reported is the accuracy score. Only available if `early_stopping=True`,
otherwise the attribute is set to `None`.
- `best_validation_score_`: float or None
The best validation score (i.e. accuracy score) that triggered the
early stopping. Only available if `early_stopping=True`, otherwise the
attribute is set to `None`.
- `t_`: int
The number of training samples seen by the solver during fitting.
- `coefs_`: list of shape (n_layers - 1,)
The ith element in the list represents the weight matrix corresponding
to layer i.
- `intercepts_`: list of shape (n_layers - 1,)
The ith element in the list represents the bias vector corresponding to
layer i + 1.
- `n_features_in_`: int
Number of features seen during `fit`.
*Added in 0.24*
- `feature_names_in_`: ndarray of shape (`n_features_in_`,)
Names of features seen during `fit`. Defined only when `X`
has feature names that are all strings.
*Added in 1.0*
- `n_iter_`: int
The number of iterations the solver has run.
- `n_layers_`: int
Number of layers.
- `n_outputs_`: int
Number of outputs.
- `out_activation_`: str
Name of the output activation function.
See Also
--------
- `MLPRegressor`: Multi-layer Perceptron regressor.
- `BernoulliRBM`: Bernoulli Restricted Boltzmann Machine (RBM).
Notes
-----
MLPClassifier trains iteratively since at each time step
the partial derivatives of the loss function with respect to the model
parameters are computed to update the parameters.
It can also have a regularization term added to the loss function
that shrinks model parameters to prevent overfitting.
This implementation works with data represented as dense numpy arrays or
sparse scipy arrays of floating point values.
References
----------
Hinton, Geoffrey E. "Connectionist learning procedures."
Artificial intelligence 40.1 (1989): 185-234.
Glorot, Xavier, and Yoshua Bengio.
"Understanding the difficulty of training deep feedforward neural networks."
International Conference on Artificial Intelligence and Statistics. 2010.
:arxiv:`He, Kaiming, et al (2015). "Delving deep into rectifiers:
Surpassing human-level performance on imagenet classification." <1502.01852>`
:arxiv:`Kingma, Diederik, and Jimmy Ba (2014)
"Adam: A method for stochastic optimization." <1412.6980>`
Examples
--------
>>> from sklearn.neural_network import MLPClassifier
>>> from sklearn.datasets import make_classification
>>> from sklearn.model_selection import train_test_split
>>> X, y = make_classification(n_samples=100, random_state=1)
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y,
... random_state=1)
>>> clf = MLPClassifier(random_state=1, max_iter=300).fit(X_train, y_train)
>>> clf.predict_proba(X_test[:1])
array([[0.038..., 0.961...]])
>>> clf.predict(X_test[:5, :])
array([1, 0, 1, 0, 1])
>>> clf.score(X_test, y_test)
0.8...
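A sketch of the corresponding metamorph step, assuming the model key :sklearn.classification/mlp-classifier and that the Clojure vector passed as :hidden-layer-sizes is converted to the Python sequence expected by hidden_layer_sizes:
(def mlp-pipe
  (mm/pipeline
   (ds-mm/set-inference-target 2)
   {:metamorph/id :model}
   ;; assumed model key and kebab-case option names
   (ml/model {:model-type :sklearn.classification/mlp-classifier
              :hidden-layer-sizes [20 10] ;; two hidden layers with 20 and 10 neurons
              :activation "relu"
              :solver "adam"
              :max-iter 300
              :random-state 1})))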
24.2.23 /multinomial-nb
| name | type | default | description |
|---|---|---|---|
| alpha | |||
| class-prior | |||
| fit-prior | |||
| force-alpha | |||
| predict-proba? |
Naive Bayes classifier for multinomial models.
The multinomial Naive Bayes classifier is suitable for classification with
discrete features (e.g., word counts for text classification). The
multinomial distribution normally requires integer feature counts. However,
in practice, fractional counts such as tf-idf may also work.
Read more in the User Guide: `multinomial_naive_bayes`.
Parameters
----------
- `alpha`: float or array-like of shape (n_features,), default=1.0
Additive (Laplace/Lidstone) smoothing parameter
(set alpha=0 and force_alpha=True, for no smoothing).
- `force_alpha`: bool, default=True
If False and alpha is less than 1e-10, it will set alpha to
1e-10. If True, alpha will remain unchanged. This may cause
numerical errors if alpha is too close to 0.
*Added in 1.2*
*Changed in 1.4*
The default value of `force_alpha` changed to `True`.
- `fit_prior`: bool, default=True
Whether to learn class prior probabilities or not.
If false, a uniform prior will be used.
- `class_prior`: array-like of shape (n_classes,), default=None
Prior probabilities of the classes. If specified, the priors are not
adjusted according to the data.
Attributes
----------
- `class_count_`: ndarray of shape (n_classes,)
Number of samples encountered for each class during fitting. This
value is weighted by the sample weight when provided.
- `class_log_prior_`: ndarray of shape (n_classes,)
Smoothed empirical log probability for each class.
- `classes_`: ndarray of shape (n_classes,)
Class labels known to the classifier
- `feature_count_`: ndarray of shape (n_classes, n_features)
Number of samples encountered for each (class, feature)
during fitting. This value is weighted by the sample weight when
provided.
- `feature_log_prob_`: ndarray of shape (n_classes, n_features)
Empirical log probability of features
given a class, ``P(x_i|y)``.
- `n_features_in_`: int
Number of features seen during `fit`.
*Added in 0.24*
- `feature_names_in_`: ndarray of shape (`n_features_in_`,)
Names of features seen during `fit`. Defined only when `X`
has feature names that are all strings.
*Added in 1.0*
See Also
--------
- `BernoulliNB`: Naive Bayes classifier for multivariate Bernoulli models.
- `CategoricalNB`: Naive Bayes classifier for categorical features.
- `ComplementNB`: Complement Naive Bayes classifier.
- `GaussianNB`: Gaussian Naive Bayes.
References
----------
C.D. Manning, P. Raghavan and H. Schuetze (2008). Introduction to
Information Retrieval. Cambridge University Press, pp. 234-265.
https://nlp.stanford.edu/IR-book/html/htmledition/naive-bayes-text-classification-1.html
Examples
--------
>>> import numpy as np
>>> rng = np.random.RandomState(1)
>>> X = rng.randint(5, size=(6, 100))
>>> y = np.array([1, 2, 3, 4, 5, 6])
>>> from sklearn.naive_bayes import MultinomialNB
>>> clf = MultinomialNB()
>>> clf.fit(X, y)
MultinomialNB()
>>> print(clf.predict(X[2:3]))
[3]
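A sketch of the step, assuming the model key :sklearn.classification/multinomial-nb; note that the features should be non-negative, count-like values (e.g. word counts), as described above:
(def multinomial-nb-step
  ;; assumed model key and kebab-case option names
  (ml/model {:model-type :sklearn.classification/multinomial-nb
             :alpha 0.5       ;; Laplace/Lidstone smoothing (see `alpha` above)
             :fit-prior true}))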
24.2.24 /nearest-centroid
| name | type | default | description |
|---|---|---|---|
| metric | |||
| shrink-threshold | |||
| predict-proba? |
Nearest centroid classifier.
Each class is represented by its centroid, with test samples classified to
the class with the nearest centroid.
Read more in the User Guide: `nearest_centroid_classifier`.
Parameters
----------
- `metric`: {"euclidean", "manhattan"}, default="euclidean"
Metric to use for distance computation.
If `metric="euclidean"`, the centroid for the samples corresponding to each
class is the arithmetic mean, which minimizes the sum of squared L2 (Euclidean) distances.
If `metric="manhattan"`, the centroid is the feature-wise median, which
minimizes the sum of L1 distances.
*Changed in 1.5*
All metrics but `"euclidean"` and `"manhattan"` were deprecated and
now raise an error.
*Changed in 0.19*
`metric='precomputed'` was deprecated and now raises an error
- `shrink_threshold`: float, default=None
Threshold for shrinking centroids to remove features.
Attributes
----------
- `centroids_`: array-like of shape (n_classes, n_features)
Centroid of each class.
- `classes_`: array of shape (n_classes,)
The unique class labels.
- `n_features_in_`: int
Number of features seen during `fit`.
*Added in 0.24*
- `feature_names_in_`: ndarray of shape (`n_features_in_`,)
Names of features seen during `fit`. Defined only when `X`
has feature names that are all strings.
*Added in 1.0*
See Also
--------
- `KNeighborsClassifier`: Nearest neighbors classifier.
Notes
-----
When used for text classification with tf-idf vectors, this classifier is
also known as the Rocchio classifier.
References
----------
Tibshirani, R., Hastie, T., Narasimhan, B., & Chu, G. (2002). Diagnosis of
multiple cancer types by shrunken centroids of gene expression. Proceedings
of the National Academy of Sciences of the United States of America,
99(10), 6567-6572. The National Academy of Sciences.
Examples
--------
>>> from sklearn.neighbors import NearestCentroid
>>> import numpy as np
>>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
>>> y = np.array([1, 1, 1, 2, 2, 2])
>>> clf = NearestCentroid()
>>> clf.fit(X, y)
NearestCentroid()
>>> print(clf.predict([[-0.8, -1]]))
[1]
For a more detailed example see:
:ref:`sphx_glr_auto_examples_neighbors_plot_nearest_centroid.py`
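A sketch of the step, assuming the model key :sklearn.classification/nearest-centroid; with :metric "manhattan" the centroids become feature-wise medians, as described above:
(def nearest-centroid-step
  ;; assumed model key and kebab-case option name
  (ml/model {:model-type :sklearn.classification/nearest-centroid
             :metric "manhattan"}))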
24.2.25 /nu-svc
| name | type | default | description |
|---|---|---|---|
| break-ties | |||
| kernel | |||
| gamma | |||
| degree | |||
| decision-function-shape | |||
| probability | |||
| tol | |||
| nu | |||
| shrinking | |||
| max-iter | |||
| random-state | |||
| coef-0 | |||
| class-weight | |||
| cache-size | |||
| verbose | |||
| predict-proba? |
Nu-Support Vector Classification.
Similar to SVC but uses a parameter to control the number of support
vectors.
The implementation is based on libsvm.
Read more in the User Guide: `svm_classification`.
Parameters
----------
- `nu`: float, default=0.5
An upper bound on the fraction of margin errors (see User Guide: `nu_svc`) and a lower bound of the fraction of support vectors.
Should be in the interval (0, 1].
- `kernel`: {'linear', 'poly', 'rbf', 'sigmoid', 'precomputed'} or callable, default='rbf'
Specifies the kernel type to be used in the algorithm.
If none is given, 'rbf' will be used. If a callable is given it is
used to precompute the kernel matrix. For an intuitive
visualization of different kernel types see
:ref:`sphx_glr_auto_examples_svm_plot_svm_kernels.py`.
- `degree`: int, default=3
Degree of the polynomial kernel function ('poly').
Must be non-negative. Ignored by all other kernels.
- `gamma`: {'scale', 'auto'} or float, default='scale'
Kernel coefficient for 'rbf', 'poly' and 'sigmoid'.
- if ``gamma='scale'`` (default) is passed then it uses
1 / (n_features * X.var()) as value of gamma,
- if 'auto', uses 1 / n_features
- if float, must be non-negative.
*Changed in 0.22*
The default value of ``gamma`` changed from 'auto' to 'scale'.
- `coef0`: float, default=0.0
Independent term in kernel function.
It is only significant in 'poly' and 'sigmoid'.
- `shrinking`: bool, default=True
Whether to use the shrinking heuristic.
See the User Guide: `shrinking_svm`.
- `probability`: bool, default=False
Whether to enable probability estimates. This must be enabled prior
to calling `fit`, will slow down that method as it internally uses
5-fold cross-validation, and `predict_proba` may be inconsistent with
`predict`. Read more in the User Guide: `scores_probabilities`.
- `tol`: float, default=1e-3
Tolerance for stopping criterion.
- `cache_size`: float, default=200
Specify the size of the kernel cache (in MB).
- `class_weight`: {dict, 'balanced'}, default=None
Set the parameter C of class i to class_weight[i]*C for
SVC. If not given, all classes are supposed to have
weight one. The "balanced" mode uses the values of y to automatically
adjust weights inversely proportional to class frequencies as
``n_samples / (n_classes * np.bincount(y))``.
- `verbose`: bool, default=False
Enable verbose output. Note that this setting takes advantage of a
per-process runtime setting in libsvm that, if enabled, may not work
properly in a multithreaded context.
- `max_iter`: int, default=-1
Hard limit on iterations within solver, or -1 for no limit.
- `decision_function_shape`: {'ovo', 'ovr'}, default='ovr'
Whether to return a one-vs-rest ('ovr') decision function of shape
(n_samples, n_classes) as all other classifiers, or the original
one-vs-one ('ovo') decision function of libsvm which has shape
(n_samples, n_classes * (n_classes - 1) / 2). However, one-vs-one
('ovo') is always used as multi-class strategy. The parameter is
ignored for binary classification.
*Changed in 0.19*
decision_function_shape is 'ovr' by default.
*Added in 0.17*
*decision_function_shape='ovr'* is recommended.
*Changed in 0.17*
Deprecated *decision_function_shape='ovo' and None*.
- `break_ties`: bool, default=False
If true, ``decision_function_shape='ovr'``, and number of classes > 2,
`predict` will break ties according to the confidence values of
`decision_function`; otherwise the first class among the tied
classes is returned. Please note that breaking ties comes at a
relatively high computational cost compared to a simple predict.
*Added in 0.22*
- `random_state`: int, RandomState instance or None, default=None
Controls the pseudo random number generation for shuffling the data for
probability estimates. Ignored when `probability` is False.
Pass an int for reproducible output across multiple function calls.
See `Glossary `.
Attributes
----------
- `class_weight_`: ndarray of shape (n_classes,)
Multipliers of parameter C of each class.
Computed based on the ``class_weight`` parameter.
- `classes_`: ndarray of shape (n_classes,)
The unique class labels.
- `coef_`: ndarray of shape (n_classes * (n_classes -1) / 2, n_features)
Weights assigned to the features (coefficients in the primal
problem). This is only available in the case of a linear kernel.
`coef_` is readonly property derived from `dual_coef_` and
`support_vectors_`.
- `dual_coef_`: ndarray of shape (n_classes - 1, n_SV)
Dual coefficients of the support vector in the decision
function (see :ref:`sgd_mathematical_formulation`), multiplied by
their targets.
For multiclass, coefficient for all 1-vs-1 classifiers.
The layout of the coefficients in the multiclass case is somewhat
non-trivial. See the multi-class section of the User Guide: `svm_multi_class` for details.
- `fit_status_`: int
0 if correctly fitted, 1 if the algorithm did not converge.
- `intercept_`: ndarray of shape (n_classes * (n_classes - 1) / 2,)
Constants in decision function.
- `n_features_in_`: int
Number of features seen during `fit`.
*Added in 0.24*
- `feature_names_in_`: ndarray of shape (`n_features_in_`,)
Names of features seen during `fit`. Defined only when `X`
has feature names that are all strings.
*Added in 1.0*
- `n_iter_`: ndarray of shape (n_classes * (n_classes - 1) // 2,)
Number of iterations run by the optimization routine to fit the model.
The shape of this attribute depends on the number of models optimized
which in turn depends on the number of classes.
*Added in 1.1*
- `support_`: ndarray of shape (n_SV,)
Indices of support vectors.
- `support_vectors_`: ndarray of shape (n_SV, n_features)
Support vectors.
- `n_support_`: ndarray of shape (n_classes,), dtype=int32
Number of support vectors for each class.
- `probA_`: ndarray of shape (n_classes * (n_classes - 1) / 2,)
- `probB_`: ndarray of shape (n_classes * (n_classes - 1) / 2,)
If `probability=True`, it corresponds to the parameters learned in
Platt scaling to produce probability estimates from decision values.
If `probability=False`, it's an empty array. Platt scaling uses the
logistic function
``1 / (1 + exp(decision_value * probA_ + probB_))``
where ``probA_`` and ``probB_`` are learned from the dataset [2]_. For
more information on the multiclass case and training procedure see
section 8 of [1]_.
- `shape_fit_`: tuple of int of shape (n_dimensions_of_X,)
Array dimensions of training vector ``X``.
See Also
--------
- `SVC`: Support Vector Machine for classification using libsvm.
- `LinearSVC`: Scalable linear Support Vector Machine for classification using
liblinear.
References
----------
Examples
--------
>>> import numpy as np
>>> X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
>>> y = np.array([1, 1, 2, 2])
>>> from sklearn.pipeline import make_pipeline
>>> from sklearn.preprocessing import StandardScaler
>>> from sklearn.svm import NuSVC
>>> clf = make_pipeline(StandardScaler(), NuSVC())
>>> clf.fit(X, y)
Pipeline(steps=[('standardscaler', StandardScaler()), ('nusvc', NuSVC())])
>>> print(clf.predict([[-0.8, -1]]))
[1]
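A sketch of the corresponding pipeline, assuming the model key :sklearn.classification/nu-svc; :nu bounds the fraction of margin errors from above and the fraction of support vectors from below (see `nu` above):
(def nu-svc-pipe
  (mm/pipeline
   (ds-mm/set-inference-target 2)
   {:metamorph/id :model}
   ;; assumed model key and kebab-case option names
   (ml/model {:model-type :sklearn.classification/nu-svc
              :nu 0.3
              :kernel "rbf"
              :gamma "scale"})))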
24.2.26 /passive-aggressive-classifier
| name | type | default | description |
|---|---|---|---|
| n-iter-no-change | |||
| average | |||
| tol | |||
| early-stopping | |||
| shuffle | |||
| c | |||
| max-iter | |||
| n-jobs | |||
| random-state | |||
| fit-intercept | |||
| warm-start | |||
| validation-fraction | |||
| class-weight | |||
| loss | |||
| verbose | |||
| predict-proba? |
Passive Aggressive Classifier.
Read more in the User Guide: `passive_aggressive`.
Parameters
----------
- `C`: float, default=1.0
Maximum step size (regularization). Defaults to 1.0.
- `fit_intercept`: bool, default=True
Whether the intercept should be estimated or not. If False, the
data is assumed to be already centered.
- `max_iter`: int, default=1000
The maximum number of passes over the training data (aka epochs).
It only impacts the behavior in the ``fit`` method, and not the
`~sklearn.linear_model.PassiveAggressiveClassifier.partial_fit` method.
*Added in 0.19*
- `tol`: float or None, default=1e-3
The stopping criterion. If it is not None, the iterations will stop
when (loss > previous_loss - tol).
*Added in 0.19*
- `early_stopping`: bool, default=False
Whether to use early stopping to terminate training when validation
score is not improving. If set to True, it will automatically set aside
a stratified fraction of training data as validation and terminate
training when validation score is not improving by at least `tol` for
`n_iter_no_change` consecutive epochs.
*Added in 0.20*
- `validation_fraction`: float, default=0.1
The proportion of training data to set aside as validation set for
early stopping. Must be between 0 and 1.
Only used if early_stopping is True.
*Added in 0.20*
- `n_iter_no_change`: int, default=5
Number of iterations with no improvement to wait before early stopping.
*Added in 0.20*
- `shuffle`: bool, default=True
Whether or not the training data should be shuffled after each epoch.
- `verbose`: int, default=0
The verbosity level.
- `loss`: str, default="hinge"
The loss function to be used:
hinge: equivalent to PA-I in the reference paper.
squared_hinge: equivalent to PA-II in the reference paper.
- `n_jobs`: int or None, default=None
The number of CPUs to use to do the OVA (One Versus All, for
multi-class problems) computation.
``None`` means 1 unless in a `joblib.parallel_backend` context.
``-1`` means using all processors. See `Glossary `
for more details.
- `random_state`: int, RandomState instance, default=None
Used to shuffle the training data, when ``shuffle`` is set to
``True``. Pass an int for reproducible output across multiple
function calls.
See `Glossary `.
- `warm_start`: bool, default=False
When set to True, reuse the solution of the previous call to fit as
initialization, otherwise, just erase the previous solution.
See `the Glossary `.
Repeatedly calling fit or partial_fit when warm_start is True can
result in a different solution than when calling fit a single time
because of the way the data is shuffled.
- `class_weight`: dict, {class_label: weight} or "balanced" or None, default=None
Preset for the class_weight fit parameter.
Weights associated with classes. If not given, all classes
are supposed to have weight one.
The "balanced" mode uses the values of y to automatically adjust
weights inversely proportional to class frequencies in the input data
as ``n_samples / (n_classes * np.bincount(y))``.
*Added in 0.17*
parameter *class_weight* to automatically weight samples.
- `average`: bool or int, default=False
When set to True, computes the averaged SGD weights and stores the
result in the ``coef_`` attribute. If set to an int greater than 1,
averaging will begin once the total number of samples seen reaches
average. So average=10 will begin averaging after seeing 10 samples.
*Added in 0.19*
parameter *average* to use weights averaging in SGD.
Attributes
----------
- `coef_`: ndarray of shape (1, n_features) if n_classes == 2 else (n_classes, n_features)
Weights assigned to the features.
- `intercept_`: ndarray of shape (1,) if n_classes == 2 else (n_classes,)
Constants in decision function.
- `n_features_in_`: int
Number of features seen during `fit`.
*Added in 0.24*
- `feature_names_in_`: ndarray of shape (`n_features_in_`,)
Names of features seen during `fit`. Defined only when `X`
has feature names that are all strings.
*Added in 1.0*
- `n_iter_`: int
The actual number of iterations to reach the stopping criterion.
For multiclass fits, it is the maximum over every binary fit.
- `classes_`: ndarray of shape (n_classes,)
The unique class labels.
- `t_`: int
Number of weight updates performed during training.
Same as ``(n_iter_ * n_samples + 1)``.
- `loss_function_`: callable
Loss function used by the algorithm.
*Deprecated since 1.4*
Attribute `loss_function_` was deprecated in version 1.4 and will be
removed in 1.6.
See Also
--------
- `SGDClassifier`: Incrementally trained logistic regression.
- `Perceptron`: Linear perceptron classifier.
References
----------
Online Passive-Aggressive Algorithms
K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, Y. Singer - JMLR (2006)
Examples
--------
>>> from sklearn.linear_model import PassiveAggressiveClassifier
>>> from sklearn.datasets import make_classification
>>> X, y = make_classification(n_features=4, random_state=0)
>>> clf = PassiveAggressiveClassifier(max_iter=1000, random_state=0,
... tol=1e-3)
>>> clf.fit(X, y)
PassiveAggressiveClassifier(random_state=0)
>>> print(clf.coef_)
[[0.26642044 0.45070924 0.67251877 0.64185414]]
>>> print(clf.intercept_)
[1.84127814]
>>> print(clf.predict([[0, 0, 0, 0]]))
[1]
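A sketch of the step, assuming the model key :sklearn.classification/passive-aggressive-classifier and the kebab-case keys from the table (:c maps to the maximum step size C):
(def passive-aggressive-step
  ;; assumed model key and kebab-case option names
  (ml/model {:model-type :sklearn.classification/passive-aggressive-classifier
             :c 0.5
             :loss "hinge"    ;; PA-I variant (see `loss` above)
             :max-iter 1000
             :random-state 0}))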
24.2.27 /perceptron
| name | type | default | description |
|---|---|---|---|
| n-iter-no-change | |||
| tol | |||
| early-stopping | |||
| eta-0 | |||
| shuffle | |||
| penalty | |||
| max-iter | |||
| n-jobs | |||
| random-state | |||
| fit-intercept | |||
| alpha | |||
| warm-start | |||
| l-1-ratio | |||
| validation-fraction | |||
| class-weight | |||
| verbose | |||
| predict-proba? |
Linear perceptron classifier.
The implementation is a wrapper around `~sklearn.linear_model.SGDClassifier` by fixing the `loss` and `learning_rate` parameters as
SGDClassifier(loss="perceptron", learning_rate="constant").
Other available parameters are described below and are forwarded to
`~sklearn.linear_model.SGDClassifier`.
Read more in the User Guide: `perceptron`.
Parameters
----------
- `penalty`: {'l2','l1','elasticnet'}, default=None
The penalty (aka regularization term) to be used.
- `alpha`: float, default=0.0001
Constant that multiplies the regularization term if regularization is
used.
- `l1_ratio`: float, default=0.15
The Elastic Net mixing parameter, with `0 <= l1_ratio <= 1`.
`l1_ratio=0` corresponds to L2 penalty, `l1_ratio=1` to L1.
Only used if `penalty='elasticnet'`.
*Added in 0.24*
- `fit_intercept`: bool, default=True
Whether the intercept should be estimated or not. If False, the
data is assumed to be already centered.
- `max_iter`: int, default=1000
The maximum number of passes over the training data (aka epochs).
It only impacts the behavior in the ``fit`` method, and not the
`partial_fit` method.
*Added in 0.19*
- `tol`: float or None, default=1e-3
The stopping criterion. If it is not None, the iterations will stop
when (loss > previous_loss - tol).
*Added in 0.19*
- `shuffle`: bool, default=True
Whether or not the training data should be shuffled after each epoch.
- `verbose`: int, default=0
The verbosity level.
- `eta0`: float, default=1
Constant by which the updates are multiplied.
- `n_jobs`: int, default=None
The number of CPUs to use to do the OVA (One Versus All, for
multi-class problems) computation.
``None`` means 1 unless in a `joblib.parallel_backend` context.
``-1`` means using all processors. See `Glossary `
for more details.
- `random_state`: int, RandomState instance or None, default=0
Used to shuffle the training data, when ``shuffle`` is set to
``True``. Pass an int for reproducible output across multiple
function calls.
See `Glossary `.
- `early_stopping`: bool, default=False
Whether to use early stopping to terminate training when validation
score is not improving. If set to True, it will automatically set aside
a stratified fraction of training data as validation and terminate
training when validation score is not improving by at least `tol` for
`n_iter_no_change` consecutive epochs.
*Added in 0.20*
- `validation_fraction`: float, default=0.1
The proportion of training data to set aside as validation set for
early stopping. Must be between 0 and 1.
Only used if early_stopping is True.
*Added in 0.20*
- `n_iter_no_change`: int, default=5
Number of iterations with no improvement to wait before early
stopping.
*Added in 0.20*
- `class_weight`: dict, {class_label: weight} or "balanced", default=None
Preset for the class_weight fit parameter.
Weights associated with classes. If not given, all classes
are supposed to have weight one.
The "balanced" mode uses the values of y to automatically adjust
weights inversely proportional to class frequencies in the input data
as ``n_samples / (n_classes * np.bincount(y))``.
- `warm_start`: bool, default=False
When set to True, reuse the solution of the previous call to fit as
initialization, otherwise, just erase the previous solution.
See `the Glossary `.
Attributes
----------
- `classes_`: ndarray of shape (n_classes,)
The unique classes labels.
- `coef_`: ndarray of shape (1, n_features) if n_classes == 2 else (n_classes, n_features)
Weights assigned to the features.
- `intercept_`: ndarray of shape (1,) if n_classes == 2 else (n_classes,)
Constants in decision function.
- `loss_function_`: concrete LossFunction
The function that determines the loss, or difference between the
output of the algorithm and the target values.
- `n_features_in_`: int
Number of features seen during `fit`.
*Added in 0.24*
- `feature_names_in_`: ndarray of shape (`n_features_in_`,)
Names of features seen during `fit`. Defined only when `X`
has feature names that are all strings.
*Added in 1.0*
- `n_iter_`: int
The actual number of iterations to reach the stopping criterion.
For multiclass fits, it is the maximum over every binary fit.
- `t_`: int
Number of weight updates performed during training.
Same as ``(n_iter_ * n_samples + 1)``.
See Also
--------
- `sklearn.linear_model.SGDClassifier`: Linear classifiers (SVM, logistic
regression, etc.) with SGD training.
Notes
-----
``Perceptron`` is a classification algorithm which shares the same
underlying implementation with ``SGDClassifier``. In fact,
``Perceptron()`` is equivalent to `SGDClassifier(loss="perceptron",
eta0=1, learning_rate="constant", penalty=None)`.
References
----------
https://en.wikipedia.org/wiki/Perceptron and references therein.
Examples
--------
>>> from sklearn.datasets import load_digits
>>> from sklearn.linear_model import Perceptron
>>> X, y = load_digits(return_X_y=True)
>>> clf = Perceptron(tol=1e-3, random_state=0)
>>> clf.fit(X, y)
Perceptron()
>>> clf.score(X, y)
0.939...
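A hedged sketch of how the perceptron might be used as a metamorph step, assuming the key :sklearn.classification/perceptron, a dataset whose target column is named 2, and the kebab-case parameters from the table above:

;; Sketch only: the model key and the target column name are assumptions.
(def perceptron-pipe
  (mm/pipeline
   (ds-mm/set-inference-target 2)
   {:metamorph/id :model}
   (ml/model {:model-type :sklearn.classification/perceptron
              :tol 1e-3
              :random-state 0})))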
24.2.28 /quadratic-discriminant-analysis
| name | type | default | description |
|---|---|---|---|
| priors | |||
| reg-param | |||
| store-covariance | |||
| tol | |||
| predict-proba? |
Quadratic Discriminant Analysis.
A classifier with a quadratic decision boundary, generated
by fitting class conditional densities to the data
and using Bayes' rule.
The model fits a Gaussian density to each class.
*Added in 0.17*
For a comparison between
`~sklearn.discriminant_analysis.QuadraticDiscriminantAnalysis`
and `~sklearn.discriminant_analysis.LinearDiscriminantAnalysis`, see
:ref:`sphx_glr_auto_examples_classification_plot_lda_qda.py`.
Read more in the User Guide: `lda_qda`.
Parameters
----------
- `priors`: array-like of shape (n_classes,), default=None
Class priors. By default, the class proportions are inferred from the
training data.
- `reg_param`: float, default=0.0
Regularizes the per-class covariance estimates by transforming S2 as
``S2 = (1 - reg_param) * S2 + reg_param * np.eye(n_features)``,
where S2 corresponds to the `scaling_` attribute of a given class.
- `store_covariance`: bool, default=False
If True, the class covariance matrices are explicitly computed and
stored in the `self.covariance_` attribute.
*Added in 0.17*
- `tol`: float, default=1.0e-4
Absolute threshold for a singular value to be considered significant,
used to estimate the rank of `Xk` where `Xk` is the centered matrix
of samples in class k. This parameter does not affect the
predictions. It only controls a warning that is raised when features
are considered to be colinear.
*Added in 0.17*
Attributes
----------
- `covariance_`: list of len n_classes of ndarray of shape (n_features, n_features)
For each class, gives the covariance matrix estimated using the
samples of that class. The estimations are unbiased. Only present if
`store_covariance` is True.
- `means_`: array-like of shape (n_classes, n_features)
Class-wise means.
- `priors_`: array-like of shape (n_classes,)
Class priors (sum to 1).
- `rotations_`: list of len n_classes of ndarray of shape (n_features, n_k)
For each class k an array of shape (n_features, n_k), where
``n_k = min(n_features, number of elements in class k)``
It is the rotation of the Gaussian distribution, i.e. its
principal axis. It corresponds to `V`, the matrix of eigenvectors
coming from the SVD of `Xk = U S Vt` where `Xk` is the centered
matrix of samples from class k.
- `scalings_`: list of len n_classes of ndarray of shape (n_k,)
For each class, contains the scaling of
the Gaussian distributions along its principal axes, i.e. the
variance in the rotated coordinate system. It corresponds to `S^2 /
(n_samples - 1)`, where `S` is the diagonal matrix of singular values
from the SVD of `Xk`, where `Xk` is the centered matrix of samples
from class k.
- `classes_`: ndarray of shape (n_classes,)
Unique class labels.
- `n_features_in_`: int
Number of features seen during `fit`.
*Added in 0.24*
- `feature_names_in_`: ndarray of shape (`n_features_in_`,)
Names of features seen during `fit`. Defined only when `X`
has feature names that are all strings.
*Added in 1.0*
See Also
--------
- `LinearDiscriminantAnalysis`: Linear Discriminant Analysis.
Examples
--------
>>> from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
>>> import numpy as np
>>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
>>> y = np.array([1, 1, 1, 2, 2, 2])
>>> clf = QuadraticDiscriminantAnalysis()
>>> clf.fit(X, y)
QuadraticDiscriminantAnalysis()
>>> print(clf.predict([[-0.8, -1]]))
[1]
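As a rough illustration (not taken from the library's documentation), a QDA step with some covariance regularization might be declared like this, assuming the key :sklearn.classification/quadratic-discriminant-analysis:

;; Sketch: reg-param shrinks each class covariance towards the identity;
;; store-covariance keeps the per-class covariance matrices on the model.
;; The model key is an assumption based on this chapter's naming pattern.
(ml/model {:model-type :sklearn.classification/quadratic-discriminant-analysis
           :reg-param 0.1
           :store-covariance true})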
24.2.29 /radius-neighbors-classifier
| name | type | default | description |
|---|---|---|---|
| weights | |||
| p | |||
| leaf-size | |||
| metric-params | |||
| radius | |||
| outlier-label | |||
| algorithm | |||
| n-jobs | |||
| metric | |||
| predict-proba? |
Classifier implementing a vote among neighbors within a given radius.
Read more in the User Guide: `classification`.
Parameters
----------
- `radius`: float, default=1.0
Range of parameter space to use by default for `radius_neighbors`
queries.
- `weights`: {'uniform', 'distance'}, callable or None, default='uniform'
Weight function used in prediction. Possible values:
- 'uniform' : uniform weights. All points in each neighborhood
are weighted equally.
- 'distance' : weight points by the inverse of their distance.
In this case, closer neighbors of a query point will have a
greater influence than neighbors which are further away.
- [callable] : a user-defined function which accepts an
array of distances, and returns an array of the same shape
containing the weights.
Uniform weights are used by default.
- `algorithm`: {'auto', 'ball_tree', 'kd_tree', 'brute'}, default='auto'
Algorithm used to compute the nearest neighbors:
- 'ball_tree' will use `BallTree`
- 'kd_tree' will use `KDTree`
- 'brute' will use a brute-force search.
- 'auto' will attempt to decide the most appropriate algorithm
based on the values passed to `fit` method.
Note: fitting on sparse input will override the setting of
this parameter, using brute force.
- `leaf_size`: int, default=30
Leaf size passed to BallTree or KDTree. This can affect the
speed of the construction and query, as well as the memory
required to store the tree. The optimal value depends on the
nature of the problem.
- `p`: float, default=2
Power parameter for the Minkowski metric. When p = 1, this is
equivalent to using manhattan_distance (l1), and euclidean_distance
(l2) for p = 2. For arbitrary p, minkowski_distance (l_p) is used.
This parameter is expected to be positive.
- `metric`: str or callable, default='minkowski'
Metric to use for distance computation. Default is "minkowski", which
results in the standard Euclidean distance when p = 2. See the
documentation of [scipy.spatial.distance
](https://docs.scipy.org/doc/scipy/reference/spatial.distance.html) and
the metrics listed in
`~sklearn.metrics.pairwise.distance_metrics` for valid metric
values.
If metric is "precomputed", X is assumed to be a distance matrix and
must be square during fit. X may be a `sparse graph`, in which
case only "nonzero" elements may be considered neighbors.
If metric is a callable function, it takes two arrays representing 1D
vectors as inputs and must return one value indicating the distance
between those vectors. This works for Scipy's metrics, but is less
efficient than passing the metric name as a string.
- `outlier_label`: {manual label, 'most_frequent'}, default=None
Label for outlier samples (samples with no neighbors in given radius).
- manual label: str or int label (should be the same type as y)
or list of manual labels if multi-output is used.
- 'most_frequent' : assign the most frequent label of y to outliers.
- None : when any outlier is detected, ValueError will be raised.
The outlier label should be selected from among the unique 'Y' labels.
If it is specified with a different value a warning will be raised and
all class probabilities of outliers will be assigned to be 0.
- `metric_params`: dict, default=None
Additional keyword arguments for the metric function.
- `n_jobs`: int, default=None
The number of parallel jobs to run for neighbors search.
``None`` means 1 unless in a `joblib.parallel_backend` context.
``-1`` means using all processors. See `Glossary `
for more details.
Attributes
----------
- `classes_`: ndarray of shape (n_classes,)
Class labels known to the classifier.
- `effective_metric_`: str or callable
The distance metric used. It will be same as the `metric` parameter
or a synonym of it, e.g. 'euclidean' if the `metric` parameter set to
'minkowski' and `p` parameter set to 2.
- `effective_metric_params_`: dict
Additional keyword arguments for the metric function. For most metrics
will be same with `metric_params` parameter, but may also contain the
`p` parameter value if the `effective_metric_` attribute is set to
'minkowski'.
- `n_features_in_`: int
Number of features seen during `fit`.
*Added in 0.24*
- `feature_names_in_`: ndarray of shape (`n_features_in_`,)
Names of features seen during `fit`. Defined only when `X`
has feature names that are all strings.
*Added in 1.0*
- `n_samples_fit_`: int
Number of samples in the fitted data.
- `outlier_label_`: int or array-like of shape (n_class,)
Label which is given for outlier samples (samples with no neighbors
on given radius).
- `outputs_2d_`: bool
False when `y`'s shape is (n_samples, ) or (n_samples, 1) during fit
otherwise True.
See Also
--------
- `KNeighborsClassifier`: Classifier implementing the k-nearest neighbors
vote.
- `RadiusNeighborsRegressor`: Regression based on neighbors within a
fixed radius.
- `KNeighborsRegressor`: Regression based on k-nearest neighbors.
- `NearestNeighbors`: Unsupervised learner for implementing neighbor
searches.
Notes
-----
See Nearest Neighbors: `neighbors` in the online documentation
for a discussion of the choice of ``algorithm`` and ``leaf_size``.
https://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm
Examples
--------
>>> X = [[0], [1], [2], [3]]
>>> y = [0, 0, 1, 1]
>>> from sklearn.neighbors import RadiusNeighborsClassifier
>>> neigh = RadiusNeighborsClassifier(radius=1.0)
>>> neigh.fit(X, y)
RadiusNeighborsClassifier(...)
>>> print(neigh.predict([[1.5]]))
[0]
>>> print(neigh.predict_proba([[1.0]]))
[[0.66666667 0.33333333]]
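A possible declaration of this model as a metamorph step, assuming the key :sklearn.classification/radius-neighbors-classifier and that string values such as "distance" are passed through to Python unchanged:

;; Sketch: larger radius plus distance weighting; 'most_frequent' avoids
;; a ValueError for query points that have no neighbors within the radius.
;; The model key and the string-valued parameters are assumptions.
(ml/model {:model-type :sklearn.classification/radius-neighbors-classifier
           :radius 2.0
           :weights "distance"
           :outlier-label "most_frequent"})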
24.2.30 /random-forest-classifier
| name | type | default | description |
|---|---|---|---|
| min-weight-fraction-leaf | |||
| max-leaf-nodes | |||
| min-impurity-decrease | |||
| min-samples-split | |||
| bootstrap | |||
| ccp-alpha | |||
| n-jobs | |||
| random-state | |||
| oob-score | |||
| min-samples-leaf | |||
| max-features | |||
| monotonic-cst | |||
| warm-start | |||
| max-depth | |||
| class-weight | |||
| n-estimators | |||
| max-samples | |||
| criterion | |||
| verbose | |||
| predict-proba? |
A random forest classifier.
A random forest is a meta estimator that fits a number of decision tree
classifiers on various sub-samples of the dataset and uses averaging to
improve the predictive accuracy and control over-fitting.
Trees in the forest use the best split strategy, i.e. equivalent to passing
`splitter="best"` to the underlying `~sklearn.tree.DecisionTreeClassifier`.
The sub-sample size is controlled with the `max_samples` parameter if
`bootstrap=True` (default), otherwise the whole dataset is used to build
each tree.
For a comparison between tree-based ensemble models see the example
:ref:`sphx_glr_auto_examples_ensemble_plot_forest_hist_grad_boosting_comparison.py`.
Read more in the User Guide: `forest`.
Parameters
----------
- `n_estimators`: int, default=100
The number of trees in the forest.
*Changed in 0.22*
The default value of ``n_estimators`` changed from 10 to 100
in 0.22.
- `criterion`: {"gini", "entropy", "log_loss"}, default="gini"
The function to measure the quality of a split. Supported criteria are
"gini" for the Gini impurity and "log_loss" and "entropy" both for the
Shannon information gain, see :ref:`tree_mathematical_formulation`.
Note: This parameter is tree-specific.
- `max_depth`: int, default=None
The maximum depth of the tree. If None, then nodes are expanded until
all leaves are pure or until all leaves contain less than
min_samples_split samples.
- `min_samples_split`: int or float, default=2
The minimum number of samples required to split an internal node:
- If int, then consider `min_samples_split` as the minimum number.
- If float, then `min_samples_split` is a fraction and
`ceil(min_samples_split * n_samples)` are the minimum
number of samples for each split.
*Changed in 0.18*
Added float values for fractions.
- `min_samples_leaf`: int or float, default=1
The minimum number of samples required to be at a leaf node.
A split point at any depth will only be considered if it leaves at
least ``min_samples_leaf`` training samples in each of the left and
right branches. This may have the effect of smoothing the model,
especially in regression.
- If int, then consider `min_samples_leaf` as the minimum number.
- If float, then `min_samples_leaf` is a fraction and
`ceil(min_samples_leaf * n_samples)` are the minimum
number of samples for each node.
*Changed in 0.18*
Added float values for fractions.
- `min_weight_fraction_leaf`: float, default=0.0
The minimum weighted fraction of the sum total of weights (of all
the input samples) required to be at a leaf node. Samples have
equal weight when sample_weight is not provided.
- `max_features`: {"sqrt", "log2", None}, int or float, default="sqrt"
The number of features to consider when looking for the best split:
- If int, then consider `max_features` features at each split.
- If float, then `max_features` is a fraction and
`max(1, int(max_features * n_features_in_))` features are considered at each
split.
- If "sqrt", then `max_features=sqrt(n_features)`.
- If "log2", then `max_features=log2(n_features)`.
- If None, then `max_features=n_features`.
*Changed in 1.1*
The default of `max_features` changed from `"auto"` to `"sqrt"`.
Note: the search for a split does not stop until at least one
valid partition of the node samples is found, even if it requires to
effectively inspect more than ``max_features`` features.
- `max_leaf_nodes`: int, default=None
Grow trees with ``max_leaf_nodes`` in best-first fashion.
Best nodes are defined as relative reduction in impurity.
If None then unlimited number of leaf nodes.
- `min_impurity_decrease`: float, default=0.0
A node will be split if this split induces a decrease of the impurity
greater than or equal to this value.
The weighted impurity decrease equation is the following
N_t / N * (impurity - N_t_R / N_t * right_impurity
- N_t_L / N_t * left_impurity)
where ``N`` is the total number of samples, ``N_t`` is the number of
samples at the current node, ``N_t_L`` is the number of samples in the
left child, and ``N_t_R`` is the number of samples in the right child.
``N``, ``N_t``, ``N_t_R`` and ``N_t_L`` all refer to the weighted sum,
if ``sample_weight`` is passed.
*Added in 0.19*
- `bootstrap`: bool, default=True
Whether bootstrap samples are used when building trees. If False, the
whole dataset is used to build each tree.
- `oob_score`: bool or callable, default=False
Whether to use out-of-bag samples to estimate the generalization score.
By default, `~sklearn.metrics.accuracy_score` is used.
Provide a callable with signature `metric(y_true, y_pred)` to use a
custom metric. Only available if `bootstrap=True`.
- `n_jobs`: int, default=None
The number of jobs to run in parallel. `fit`, `predict`,
`decision_path` and `apply` are all parallelized over the
trees. ``None`` means 1 unless in a `joblib.parallel_backend`
context. ``-1`` means using all processors. See `Glossary `
for more details.
- `random_state`: int, RandomState instance or None, default=None
Controls both the randomness of the bootstrapping of the samples used
when building trees (if ``bootstrap=True``) and the sampling of the
features to consider when looking for the best split at each node
(if ``max_features < n_features``).
See `Glossary ` for details.
- `verbose`: int, default=0
Controls the verbosity when fitting and predicting.
- `warm_start`: bool, default=False
When set to ``True``, reuse the solution of the previous call to fit
and add more estimators to the ensemble, otherwise, just fit a whole
new forest. See `the Glossary ` and
:ref:`tree_ensemble_warm_start` for details.
- `class_weight`: {"balanced", "balanced_subsample"}, dict or list of dicts, default=None
Weights associated with classes in the form ``{class_label: weight}``.
If not given, all classes are supposed to have weight one. For
multi-output problems, a list of dicts can be provided in the same
order as the columns of y.
Note that for multioutput (including multilabel) weights should be
defined for each class of every column in its own dict. For example,
for four-class multilabel classification weights should be
[{0: 1, 1: 1}, {0: 1, 1: 5}, {0: 1, 1: 1}, {0: 1, 1: 1}] instead of
[{1:1}, {2:5}, {3:1}, {4:1}].
The "balanced" mode uses the values of y to automatically adjust
weights inversely proportional to class frequencies in the input data
as ``n_samples / (n_classes * np.bincount(y))``.
The "balanced_subsample" mode is the same as "balanced" except that
weights are computed based on the bootstrap sample for every tree
grown.
For multi-output, the weights of each column of y will be multiplied.
Note that these weights will be multiplied with sample_weight (passed
through the fit method) if sample_weight is specified.
- `ccp_alpha`: non-negative float, default=0.0
Complexity parameter used for Minimal Cost-Complexity Pruning. The
subtree with the largest cost complexity that is smaller than
``ccp_alpha`` will be chosen. By default, no pruning is performed. See
:ref:`minimal_cost_complexity_pruning` for details.
*Added in 0.22*
- `max_samples`: int or float, default=None
If bootstrap is True, the number of samples to draw from X
to train each base estimator.
- If None (default), then draw `X.shape[0]` samples.
- If int, then draw `max_samples` samples.
- If float, then draw `max(round(n_samples * max_samples), 1)` samples. Thus,
`max_samples` should be in the interval `(0.0, 1.0]`.
*Added in 0.22*
- `monotonic_cst`: array-like of int of shape (n_features), default=None
Indicates the monotonicity constraint to enforce on each feature.
- 1: monotonic increase
- 0: no constraint
- -1: monotonic decrease
If monotonic_cst is None, no constraints are applied.
Monotonicity constraints are not supported for:
- multiclass classifications (i.e. when `n_classes > 2`),
- multioutput classifications (i.e. when `n_outputs_ > 1`),
- classifications trained on data with missing values.
The constraints hold over the probability of the positive class.
Read more in the User Guide.
*Added in 1.4*
Attributes
----------
- `estimator_`: `~sklearn.tree.DecisionTreeClassifier`
The child estimator template used to create the collection of fitted
sub-estimators.
*Added in 1.2*
`base_estimator_` was renamed to `estimator_`.
- `estimators_`: list of DecisionTreeClassifier
The collection of fitted sub-estimators.
- `classes_`: ndarray of shape (n_classes,) or a list of such arrays
The classes labels (single output problem), or a list of arrays of
class labels (multi-output problem).
- `n_classes_`: int or list
The number of classes (single output problem), or a list containing the
number of classes for each output (multi-output problem).
- `n_features_in_`: int
Number of features seen during `fit`.
*Added in 0.24*
- `feature_names_in_`: ndarray of shape (`n_features_in_`,)
Names of features seen during `fit`. Defined only when `X`
has feature names that are all strings.
*Added in 1.0*
- `n_outputs_`: int
The number of outputs when ``fit`` is performed.
- `feature_importances_`: ndarray of shape (n_features,)
The impurity-based feature importances.
The higher, the more important the feature.
The importance of a feature is computed as the (normalized)
total reduction of the criterion brought by that feature. It is also
known as the Gini importance.
Warning: impurity-based feature importances can be misleading for
high cardinality features (many unique values). See
`sklearn.inspection.permutation_importance` as an alternative.
- `oob_score_`: float
Score of the training dataset obtained using an out-of-bag estimate.
This attribute exists only when ``oob_score`` is True.
- `oob_decision_function_`: ndarray of shape (n_samples, n_classes) or (n_samples, n_classes, n_outputs)
Decision function computed with out-of-bag estimate on the training
set. If n_estimators is small it might be possible that a data point
was never left out during the bootstrap. In this case,
`oob_decision_function_` might contain NaN. This attribute exists
only when ``oob_score`` is True.
- `estimators_samples_`: list of arrays
The subset of drawn samples (i.e., the in-bag samples) for each base
estimator. Each subset is defined by an array of the indices selected.
*Added in 1.4*
See Also
--------
- `sklearn.tree.DecisionTreeClassifier`: A decision tree classifier.
- `sklearn.ensemble.ExtraTreesClassifier`: Ensemble of extremely randomized
tree classifiers.
- `sklearn.ensemble.HistGradientBoostingClassifier`: A Histogram-based Gradient
Boosting Classification Tree, very fast for big datasets (n_samples >=
10_000).
Notes
-----
The default values for the parameters controlling the size of the trees
(e.g. ``max_depth``, ``min_samples_leaf``, etc.) lead to fully grown and
unpruned trees which can potentially be very large on some data sets. To
reduce memory consumption, the complexity and size of the trees should be
controlled by setting those parameter values.
The features are always randomly permuted at each split. Therefore,
the best found split may vary, even with the same training data,
``max_features=n_features`` and ``bootstrap=False``, if the improvement
of the criterion is identical for several splits enumerated during the
search of the best split. To obtain a deterministic behaviour during
fitting, ``random_state`` has to be fixed.
References
----------
L. Breiman, "Random Forests", Machine Learning, 45(1), 5-32, 2001.
Examples
--------
>>> from sklearn.ensemble import RandomForestClassifier
>>> from sklearn.datasets import make_classification
>>> X, y = make_classification(n_samples=1000, n_features=4,
...                            n_informative=2, n_redundant=0,
...                            random_state=0, shuffle=False)
>>> clf = RandomForestClassifier(max_depth=2, random_state=0)
>>> clf.fit(X, y)
RandomForestClassifier(...)
>>> print(clf.predict([[0, 0, 0, 0]]))
[1]
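To place this model in a full pipeline, a sketch along the lines of the other examples in this chapter could look as follows; the model key, the toy data and the choice of column 2 as target are all illustrative assumptions.

;; Sketch: small random forest on a tiny toy dataset (last column = label).
(def rf-pipe
  (mm/pipeline
   (ds-mm/set-inference-target 2)
   {:metamorph/id :model}
   (ml/model {:model-type :sklearn.classification/random-forest-classifier ;; assumed key
              :n-estimators 100
              :max-depth 2
              :random-state 0})))

(def rf-fitted
  (rf-pipe {:metamorph/data (dst/tensor->dataset [[0 0 0] [1 1 1] [0 1 0] [1 0 1]])
            :metamorph/mode :fit}))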
24.2.31 /ridge-classifier
| name | type | default | description |
|---|---|---|---|
| positive | |||
| tol | |||
| solver | |||
| max-iter | |||
| random-state | |||
| copy-x | |||
| fit-intercept | |||
| alpha | |||
| class-weight | |||
| predict-proba? |
Classifier using Ridge regression.
This classifier first converts the target values into ``{-1, 1}`` and
then treats the problem as a regression task (multi-output regression in
the multiclass case).
Read more in the User Guide: `ridge_regression`.
Parameters
----------
- `alpha`: float, default=1.0
Regularization strength; must be a positive float. Regularization
improves the conditioning of the problem and reduces the variance of
the estimates. Larger values specify stronger regularization.
Alpha corresponds to ``1 / (2C)`` in other linear models such as
`~sklearn.linear_model.LogisticRegression` or
`~sklearn.svm.LinearSVC`.
- `fit_intercept`: bool, default=True
Whether to calculate the intercept for this model. If set to false, no
intercept will be used in calculations (e.g. data is expected to be
already centered).
- `copy_X`: bool, default=True
If True, X will be copied; else, it may be overwritten.
- `max_iter`: int, default=None
Maximum number of iterations for conjugate gradient solver.
The default value is determined by scipy.sparse.linalg.
- `tol`: float, default=1e-4
The precision of the solution (`coef_`) is determined by `tol` which
specifies a different convergence criterion for each solver:
- 'svd': `tol` has no impact.
- 'cholesky': `tol` has no impact.
- 'sparse_cg': norm of residuals smaller than `tol`.
- 'lsqr': `tol` is set as atol and btol of scipy.sparse.linalg.lsqr,
which control the norm of the residual vector in terms of the norms of
matrix and coefficients.
- 'sag' and 'saga': relative change of coef smaller than `tol`.
- 'lbfgs': maximum of the absolute (projected) gradient=max|residuals|
smaller than `tol`.
*Changed in 1.2*
Default value changed from 1e-3 to 1e-4 for consistency with other linear
models.
- `class_weight`: dict or 'balanced', default=None
Weights associated with classes in the form ``{class_label: weight}``.
If not given, all classes are supposed to have weight one.
The "balanced" mode uses the values of y to automatically adjust
weights inversely proportional to class frequencies in the input data
as ``n_samples / (n_classes * np.bincount(y))``.
- `solver`: {'auto', 'svd', 'cholesky', 'lsqr', 'sparse_cg', 'sag', 'saga', 'lbfgs'}, default='auto'
Solver to use in the computational routines:
- 'auto' chooses the solver automatically based on the type of data.
- 'svd' uses a Singular Value Decomposition of X to compute the Ridge
coefficients. It is the most stable solver, in particular more stable
for singular matrices than 'cholesky' at the cost of being slower.
- 'cholesky' uses the standard scipy.linalg.solve function to
obtain a closed-form solution.
- 'sparse_cg' uses the conjugate gradient solver as found in
scipy.sparse.linalg.cg. As an iterative algorithm, this solver is
more appropriate than 'cholesky' for large-scale data
(possibility to set `tol` and `max_iter`).
- 'lsqr' uses the dedicated regularized least-squares routine
scipy.sparse.linalg.lsqr. It is the fastest and uses an iterative
procedure.
- 'sag' uses a Stochastic Average Gradient descent, and 'saga' uses
its unbiased and more flexible version named SAGA. Both methods
use an iterative procedure, and are often faster than other solvers
when both n_samples and n_features are large. Note that 'sag' and
'saga' fast convergence is only guaranteed on features with
approximately the same scale. You can preprocess the data with a
scaler from sklearn.preprocessing.
*Added in 0.17*
Stochastic Average Gradient descent solver.
*Added in 0.19*
SAGA solver.
- 'lbfgs' uses L-BFGS-B algorithm implemented in
`scipy.optimize.minimize`. It can be used only when `positive`
is True.
- `positive`: bool, default=False
When set to ``True``, forces the coefficients to be positive.
Only 'lbfgs' solver is supported in this case.
- `random_state`: int, RandomState instance, default=None
Used when ``solver`` == 'sag' or 'saga' to shuffle the data.
See `Glossary ` for details.
Attributes
----------
- `coef_`: ndarray of shape (1, n_features) or (n_classes, n_features)
Coefficient of the features in the decision function.
``coef_`` is of shape (1, n_features) when the given problem is binary.
- `intercept_`: float or ndarray of shape (n_targets,)
Independent term in decision function. Set to 0.0 if
``fit_intercept = False``.
- `n_iter_`: None or ndarray of shape (n_targets,)
Actual number of iterations for each target. Available only for
sag and lsqr solvers. Other solvers will return None.
- `classes_`: ndarray of shape (n_classes,)
The classes labels.
- `n_features_in_`: int
Number of features seen during `fit`.
*Added in 0.24*
- `feature_names_in_`: ndarray of shape (`n_features_in_`,)
Names of features seen during `fit`. Defined only when `X`
has feature names that are all strings.
*Added in 1.0*
- `solver_`: str
The solver that was used at fit time by the computational
routines.
*Added in 1.5*
See Also
--------
- `Ridge`: Ridge regression.
- `RidgeClassifierCV`: Ridge classifier with built-in cross validation.
Notes
-----
For multi-class classification, n_class classifiers are trained in
a one-versus-all approach. Concretely, this is implemented by taking
advantage of the multi-variate response support in Ridge.
Examples
--------
>>> from sklearn.datasets import load_breast_cancer
>>> from sklearn.linear_model import RidgeClassifier
>>> X, y = load_breast_cancer(return_X_y=True)
>>> clf = RidgeClassifier().fit(X, y)
>>> clf.score(X, y)
0.9595...
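As a minimal sketch, a ridge classifier with a stronger-than-default penalty could be declared like this (the key follows the :sklearn.classification/... pattern of this section and is an assumption):

;; Sketch: alpha controls the strength of the L2 regularization.
(ml/model {:model-type :sklearn.classification/ridge-classifier
           :alpha 10.0})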
24.2.32 /ridge-classifier-cv
| name | type | default | description |
|---|---|---|---|
| alphas | |||
| class-weight | |||
| cv | |||
| fit-intercept | |||
| scoring | |||
| store-cv-results | |||
| store-cv-values | |||
| predict-proba? |
Ridge classifier with built-in cross-validation.
See glossary entry for `cross-validation estimator`.
By default, it performs Leave-One-Out Cross-Validation. Currently,
only the n_features > n_samples case is handled efficiently.
Read more in the User Guide: `ridge_regression`.
Parameters
----------
- `alphas`: array-like of shape (n_alphas,), default=(0.1, 1.0, 10.0)
Array of alpha values to try.
Regularization strength; must be a positive float. Regularization
improves the conditioning of the problem and reduces the variance of
the estimates. Larger values specify stronger regularization.
Alpha corresponds to ``1 / (2C)`` in other linear models such as
`~sklearn.linear_model.LogisticRegression` or
`~sklearn.svm.LinearSVC`.
If using Leave-One-Out cross-validation, alphas must be strictly positive.
- `fit_intercept`: bool, default=True
Whether to calculate the intercept for this model. If set
to false, no intercept will be used in calculations
(i.e. data is expected to be centered).
- `scoring`: str, callable, default=None
A string (see :ref:`scoring_parameter`) or a scorer callable object /
function with signature ``scorer(estimator, X, y)``.
- `cv`: int, cross-validation generator or an iterable, default=None
Determines the cross-validation splitting strategy.
Possible inputs for cv are:
- None, to use the efficient Leave-One-Out cross-validation
- integer, to specify the number of folds.
- `CV splitter`,
- An iterable yielding (train, test) splits as arrays of indices.
Refer User Guide: `cross_validation` for the various
cross-validation strategies that can be used here.
- `class_weight`: dict or 'balanced', default=None
Weights associated with classes in the form ``{class_label: weight}``.
If not given, all classes are supposed to have weight one.
The "balanced" mode uses the values of y to automatically adjust
weights inversely proportional to class frequencies in the input data
as ``n_samples / (n_classes * np.bincount(y))``.
- `store_cv_results`: bool, default=False
Flag indicating if the cross-validation results corresponding to
each alpha should be stored in the ``cv_results_`` attribute (see
below). This flag is only compatible with ``cv=None`` (i.e. using
Leave-One-Out Cross-Validation).
*Changed in 1.5*
Parameter name changed from `store_cv_values` to `store_cv_results`.
- `store_cv_values`: bool
Flag indicating if the cross-validation values corresponding to
each alpha should be stored in the ``cv_values_`` attribute (see
below). This flag is only compatible with ``cv=None`` (i.e. using
Leave-One-Out Cross-Validation).
*Deprecated since 1.5*
`store_cv_values` is deprecated in version 1.5 in favor of
`store_cv_results` and will be removed in version 1.7.
Attributes
----------
- `cv_results_`: ndarray of shape (n_samples, n_targets, n_alphas), optional
Cross-validation results for each alpha (only if ``store_cv_results=True`` and
``cv=None``). After ``fit()`` has been called, this attribute will
contain the mean squared errors if `scoring is None` otherwise it
will contain standardized per point prediction values.
*Changed in 1.5*
`cv_values_` changed to `cv_results_`.
- `coef_`: ndarray of shape (1, n_features) or (n_targets, n_features)
Coefficient of the features in the decision function.
``coef_`` is of shape (1, n_features) when the given problem is binary.
- `intercept_`: float or ndarray of shape (n_targets,)
Independent term in decision function. Set to 0.0 if
``fit_intercept = False``.
- `alpha_`: float
Estimated regularization parameter.
- `best_score_`: float
Score of base estimator with best alpha.
*Added in 0.23*
- `classes_`: ndarray of shape (n_classes,)
The classes labels.
- `n_features_in_`: int
Number of features seen during `fit`.
*Added in 0.24*
- `feature_names_in_`: ndarray of shape (`n_features_in_`,)
Names of features seen during `fit`. Defined only when `X`
has feature names that are all strings.
*Added in 1.0*
See Also
--------
- `Ridge`: Ridge regression.
- `RidgeClassifier`: Ridge classifier.
- `RidgeCV`: Ridge regression with built-in cross validation.
Notes
-----
For multi-class classification, n_class classifiers are trained in
a one-versus-all approach. Concretely, this is implemented by taking
advantage of the multi-variate response support in Ridge.
Examples
--------
>>> from sklearn.datasets import load_breast_cancer
>>> from sklearn.linear_model import RidgeClassifierCV
>>> X, y = load_breast_cancer(return_X_y=True)
>>> clf = RidgeClassifierCV(alphas=[1e-3, 1e-2, 1e-1, 1]).fit(X, y)
>>> clf.score(X, y)
0.9630...
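A sketch of the cross-validated variant, reusing the alpha grid from the Python example above; it assumes the key :sklearn.classification/ridge-classifier-cv and that the Clojure vector is converted to a Python sequence by the wrapper:

;; Sketch: efficient leave-one-out CV (cv=None) over four candidate alphas.
;; The model key and the vector conversion are assumptions.
(ml/model {:model-type :sklearn.classification/ridge-classifier-cv
           :alphas [1e-3 1e-2 1e-1 1]})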
24.2.33 /sgd-classifier
| name | type | default | description |
|---|---|---|---|
| n-iter-no-change | |||
| learning-rate | |||
| average | |||
| tol | |||
| early-stopping | |||
| eta-0 | |||
| shuffle | |||
| penalty | |||
| power-t | |||
| max-iter | |||
| n-jobs | |||
| random-state | |||
| fit-intercept | |||
| alpha | |||
| warm-start | |||
| l-1-ratio | |||
| validation-fraction | |||
| class-weight | |||
| loss | |||
| verbose | |||
| epsilon | |||
| predict-proba? |
Linear classifiers (SVM, logistic regression, etc.) with SGD training.
This estimator implements regularized linear models with stochastic
gradient descent (SGD) learning: the gradient of the loss is estimated
one sample at a time and the model is updated along the way with a
decreasing strength schedule (aka learning rate). SGD allows minibatch
(online/out-of-core) learning via the `partial_fit` method.
For best results using the default learning rate schedule, the data should
have zero mean and unit variance.
This implementation works with data represented as dense or sparse arrays
of floating point values for the features. The model it fits can be
controlled with the loss parameter; by default, it fits a linear support
vector machine (SVM).
The regularizer is a penalty added to the loss function that shrinks model
parameters towards the zero vector using either the squared euclidean norm
L2 or the absolute norm L1 or a combination of both (Elastic Net). If the
parameter update crosses the 0.0 value because of the regularizer, the
update is truncated to 0.0 to allow for learning sparse models and achieve
online feature selection.
Read more in the User Guide: `sgd`.
Parameters
----------
- `loss`: {'hinge', 'log_loss', 'modified_huber', 'squared_hinge', 'perceptron', 'squared_error', 'huber', 'epsilon_insensitive', 'squared_epsilon_insensitive'}, default='hinge'
The loss function to be used.
- 'hinge' gives a linear SVM.
- 'log_loss' gives logistic regression, a probabilistic classifier.
- 'modified_huber' is another smooth loss that brings tolerance to
outliers as well as probability estimates.
- 'squared_hinge' is like hinge but is quadratically penalized.
- 'perceptron' is the linear loss used by the perceptron algorithm.
- The other losses, 'squared_error', 'huber', 'epsilon_insensitive' and
'squared_epsilon_insensitive' are designed for regression but can be useful
in classification as well; see
`~sklearn.linear_model.SGDRegressor` for a description.
More details about the losses formulas can be found in the
User Guide: `sgd_mathematical_formulation`.
- `penalty`: {'l2', 'l1', 'elasticnet', None}, default='l2'
The penalty (aka regularization term) to be used. Defaults to 'l2'
which is the standard regularizer for linear SVM models. 'l1' and
'elasticnet' might bring sparsity to the model (feature selection)
not achievable with 'l2'. No penalty is added when set to `None`.
- `alpha`: float, default=0.0001
Constant that multiplies the regularization term. The higher the
value, the stronger the regularization. Also used to compute the
learning rate when `learning_rate` is set to 'optimal'.
Values must be in the range `[0.0, inf)`.
- `l1_ratio`: float, default=0.15
The Elastic Net mixing parameter, with 0 <= l1_ratio <= 1.
l1_ratio=0 corresponds to L2 penalty, l1_ratio=1 to L1.
Only used if `penalty` is 'elasticnet'.
Values must be in the range `[0.0, 1.0]`.
- `fit_intercept`: bool, default=True
Whether the intercept should be estimated or not. If False, the
data is assumed to be already centered.
- `max_iter`: int, default=1000
The maximum number of passes over the training data (aka epochs).
It only impacts the behavior in the ``fit`` method, and not the
`partial_fit` method.
Values must be in the range `[1, inf)`.
*Added in 0.19*
- `tol`: float or None, default=1e-3
The stopping criterion. If it is not None, training will stop
when (loss > best_loss - tol) for ``n_iter_no_change`` consecutive
epochs.
Convergence is checked against the training loss or the
validation loss depending on the `early_stopping` parameter.
Values must be in the range `[0.0, inf)`.
*Added in 0.19*
- `shuffle`: bool, default=True
Whether or not the training data should be shuffled after each epoch.
- `verbose`: int, default=0
The verbosity level.
Values must be in the range `[0, inf)`.
- `epsilon`: float, default=0.1
Epsilon in the epsilon-insensitive loss functions; only if `loss` is
'huber', 'epsilon_insensitive', or 'squared_epsilon_insensitive'.
For 'huber', determines the threshold at which it becomes less
important to get the prediction exactly right.
For epsilon-insensitive, any differences between the current prediction
and the correct label are ignored if they are less than this threshold.
Values must be in the range `[0.0, inf)`.
- `n_jobs`: int, default=None
The number of CPUs to use to do the OVA (One Versus All, for
multi-class problems) computation.
``None`` means 1 unless in a `joblib.parallel_backend` context.
``-1`` means using all processors. See `Glossary `
for more details.
- `random_state`: int, RandomState instance, default=None
Used for shuffling the data, when ``shuffle`` is set to ``True``.
Pass an int for reproducible output across multiple function calls.
See `Glossary `.
Integer values must be in the range `[0, 2**32 - 1]`.
- `learning_rate`: str, default='optimal'
The learning rate schedule:
- 'constant': `eta = eta0`
- 'optimal': `eta = 1.0 / (alpha * (t + t0))`
where `t0` is chosen by a heuristic proposed by Leon Bottou.
- 'invscaling': `eta = eta0 / pow(t, power_t)`
- 'adaptive': `eta = eta0`, as long as the training keeps decreasing.
Each time n_iter_no_change consecutive epochs fail to decrease the
training loss by tol or fail to increase validation score by tol if
`early_stopping` is `True`, the current learning rate is divided by 5.
*Added in 0.20*
Added 'adaptive' option
- `eta0`: float, default=0.0
The initial learning rate for the 'constant', 'invscaling' or
'adaptive' schedules. The default value is 0.0 as eta0 is not used by
the default schedule 'optimal'.
Values must be in the range `[0.0, inf)`.
- `power_t`: float, default=0.5
The exponent for inverse scaling learning rate.
Values must be in the range `(-inf, inf)`.
- `early_stopping`: bool, default=False
Whether to use early stopping to terminate training when validation
score is not improving. If set to `True`, it will automatically set aside
a stratified fraction of training data as validation and terminate
training when validation score returned by the `score` method is not
improving by at least tol for n_iter_no_change consecutive epochs.
*Added in 0.20*
Added 'early_stopping' option
- `validation_fraction`: float, default=0.1
The proportion of training data to set aside as validation set for
early stopping. Must be between 0 and 1.
Only used if `early_stopping` is True.
Values must be in the range `(0.0, 1.0)`.
*Added in 0.20*
Added 'validation_fraction' option
- `n_iter_no_change`: int, default=5
Number of iterations with no improvement to wait before stopping
fitting.
Convergence is checked against the training loss or the
validation loss depending on the `early_stopping` parameter.
Integer values must be in the range `[1, max_iter)`.
*Added in 0.20*
Added 'n_iter_no_change' option
- `class_weight`: dict, {class_label: weight} or "balanced", default=None
Preset for the class_weight fit parameter.
Weights associated with classes. If not given, all classes
are supposed to have weight one.
The "balanced" mode uses the values of y to automatically adjust
weights inversely proportional to class frequencies in the input data
as ``n_samples / (n_classes * np.bincount(y))``.
- `warm_start`: bool, default=False
When set to True, reuse the solution of the previous call to fit as
initialization, otherwise, just erase the previous solution.
See `the Glossary `.
Repeatedly calling fit or partial_fit when warm_start is True can
result in a different solution than when calling fit a single time
because of the way the data is shuffled.
If a dynamic learning rate is used, the learning rate is adapted
depending on the number of samples already seen. Calling ``fit`` resets
this counter, while ``partial_fit`` will result in increasing the
existing counter.
- `average`: bool or int, default=False
When set to `True`, computes the averaged SGD weights across all
updates and stores the result in the ``coef_`` attribute. If set to
an int greater than 1, averaging will begin once the total number of
samples seen reaches `average`. So ``average=10`` will begin
averaging after seeing 10 samples.
Integer values must be in the range `[1, n_samples]`.
Attributes
----------
- `coef_`: ndarray of shape (1, n_features) if n_classes == 2 else (n_classes, n_features)
Weights assigned to the features.
- `intercept_`: ndarray of shape (1,) if n_classes == 2 else (n_classes,)
Constants in decision function.
- `n_iter_`: int
The actual number of iterations before reaching the stopping criterion.
For multiclass fits, it is the maximum over every binary fit.
- `loss_function_`: concrete ``LossFunction``
*Deprecated since 1.4*
Attribute `loss_function_` was deprecated in version 1.4 and will be
removed in 1.6.
- `classes_`: array of shape (n_classes,)
- `t_`: int
Number of weight updates performed during training.
Same as ``(n_iter_ * n_samples + 1)``.
- `n_features_in_`: int
Number of features seen during `fit`.
*Added in 0.24*
- `feature_names_in_`: ndarray of shape (`n_features_in_`,)
Names of features seen during `fit`. Defined only when `X`
has feature names that are all strings.
*Added in 1.0*
See Also
--------
- `sklearn.svm.LinearSVC`: Linear support vector classification.
- `LogisticRegression`: Logistic regression.
- `Perceptron`: Inherits from SGDClassifier. ``Perceptron()`` is equivalent to
``SGDClassifier(loss="perceptron", eta0=1, learning_rate="constant",
penalty=None)``.
Examples
--------
>>> import numpy as np
>>> from sklearn.linear_model import SGDClassifier
>>> from sklearn.preprocessing import StandardScaler
>>> from sklearn.pipeline import make_pipeline
>>> X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
>>> Y = np.array([1, 1, 2, 2])
>>> # Always scale the input. The most convenient way is to use a pipeline.
>>> clf = make_pipeline(StandardScaler(),
... SGDClassifier(max_iter=1000, tol=1e-3))
>>> clf.fit(X, Y)
Pipeline(steps=[('standardscaler', StandardScaler()),
('sgdclassifier', SGDClassifier())])
>>> print(clf.predict([[-0.8, -1]]))
[1]
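For instance, SGD-trained logistic regression can be requested by choosing the 'log_loss' loss; the following sketch assumes the key :sklearn.classification/sgd-classifier and that string parameter values reach Python unchanged:

;; Sketch: logistic regression fitted by SGD, with the max_iter/tol values
;; used in the Python example above. Key and string passing are assumptions.
(ml/model {:model-type :sklearn.classification/sgd-classifier
           :loss "log_loss"
           :max-iter 1000
           :tol 1e-3})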
24.2.34 /svc
| name | type | default | description |
|---|---|---|---|
| break-ties | |||
| kernel | |||
| gamma | |||
| degree | |||
| decision-function-shape | |||
| probability | |||
| tol | |||
| shrinking | |||
| c | |||
| max-iter | |||
| random-state | |||
| coef-0 | |||
| class-weight | |||
| cache-size | |||
| verbose | |||
| predict-proba? |
C-Support Vector Classification.
The implementation is based on libsvm. The fit time scales at least
quadratically with the number of samples and may be impractical
beyond tens of thousands of samples. For large datasets
consider using `~sklearn.svm.LinearSVC` or
`~sklearn.linear_model.SGDClassifier` instead, possibly after a
`~sklearn.kernel_approximation.Nystroem` transformer or
other :ref:`kernel_approximation`.
The multiclass support is handled according to a one-vs-one scheme.
For details on the precise mathematical formulation of the provided
kernel functions and how `gamma`, `coef0` and `degree` affect each
other, see the corresponding section in the narrative documentation:
:ref:`svm_kernels`.
To learn how to tune SVC's hyperparameters, see the following example:
:ref:`sphx_glr_auto_examples_model_selection_plot_nested_cross_validation_iris.py`
Read more in the User Guide: `svm_classification`.
Parameters
----------
- `C`: float, default=1.0
Regularization parameter. The strength of the regularization is
inversely proportional to C. Must be strictly positive. The penalty
is a squared l2 penalty. For an intuitive visualization of the effects
of scaling the regularization parameter C, see
:ref:`sphx_glr_auto_examples_svm_plot_svm_scale_c.py`.
- `kernel`: {'linear', 'poly', 'rbf', 'sigmoid', 'precomputed'} or callable, default='rbf'
Specifies the kernel type to be used in the algorithm. If
none is given, 'rbf' will be used. If a callable is given it is used to
pre-compute the kernel matrix from data matrices; that matrix should be
an array of shape ``(n_samples, n_samples)``. For an intuitive
visualization of different kernel types see
:ref:`sphx_glr_auto_examples_svm_plot_svm_kernels.py`.
- `degree`: int, default=3
Degree of the polynomial kernel function ('poly').
Must be non-negative. Ignored by all other kernels.
- `gamma`: {'scale', 'auto'} or float, default='scale'
Kernel coefficient for 'rbf', 'poly' and 'sigmoid'.
- if ``gamma='scale'`` (default) is passed then it uses
1 / (n_features * X.var()) as value of gamma,
- if 'auto', uses 1 / n_features
- if float, must be non-negative.
*Changed in 0.22*
The default value of ``gamma`` changed from 'auto' to 'scale'.
- `coef0`: float, default=0.0
Independent term in kernel function.
It is only significant in 'poly' and 'sigmoid'.
- `shrinking`: bool, default=True
Whether to use the shrinking heuristic.
See the User Guide: `shrinking_svm`.
- `probability`: bool, default=False
Whether to enable probability estimates. This must be enabled prior
to calling `fit`, will slow down that method as it internally uses
5-fold cross-validation, and `predict_proba` may be inconsistent with
`predict`. Read more in the User Guide: `scores_probabilities`.
- `tol`: float, default=1e-3
Tolerance for stopping criterion.
- `cache_size`: float, default=200
Specify the size of the kernel cache (in MB).
- `class_weight`: dict or 'balanced', default=None
Set the parameter C of class i to class_weight[i]*C for
SVC. If not given, all classes are supposed to have
weight one.
The "balanced" mode uses the values of y to automatically adjust
weights inversely proportional to class frequencies in the input data
as ``n_samples / (n_classes * np.bincount(y))``.
- `verbose`: bool, default=False
Enable verbose output. Note that this setting takes advantage of a
per-process runtime setting in libsvm that, if enabled, may not work
properly in a multithreaded context.
- `max_iter`: int, default=-1
Hard limit on iterations within solver, or -1 for no limit.
- `decision_function_shape`: {'ovo', 'ovr'}, default='ovr'
Whether to return a one-vs-rest ('ovr') decision function of shape
(n_samples, n_classes) as all other classifiers, or the original
one-vs-one ('ovo') decision function of libsvm which has shape
(n_samples, n_classes * (n_classes - 1) / 2). However, note that
internally, one-vs-one ('ovo') is always used as a multi-class strategy
to train models; an ovr matrix is only constructed from the ovo matrix.
The parameter is ignored for binary classification.
*Changed in 0.19*
decision_function_shape is 'ovr' by default.
*Added in 0.17*
*decision_function_shape='ovr'* is recommended.
*Changed in 0.17*
Deprecated *decision_function_shape='ovo' and None*.
- `break_ties`: bool, default=False
If true, ``decision_function_shape='ovr'``, and number of classes > 2,
`predict` will break ties according to the confidence values of
`decision_function`; otherwise the first class among the tied
classes is returned. Please note that breaking ties comes at a
relatively high computational cost compared to a simple predict.
*Added in 0.22*
- `random_state`: int, RandomState instance or None, default=None
Controls the pseudo random number generation for shuffling the data for
probability estimates. Ignored when `probability` is False.
Pass an int for reproducible output across multiple function calls.
See `Glossary `.
Attributes
----------
- `class_weight_`: ndarray of shape (n_classes,)
Multipliers of parameter C for each class.
Computed based on the ``class_weight`` parameter.
- `classes_`: ndarray of shape (n_classes,)
The classes labels.
- `coef_`: ndarray of shape (n_classes * (n_classes - 1) / 2, n_features)
Weights assigned to the features (coefficients in the primal
problem). This is only available in the case of a linear kernel.
`coef_` is a readonly property derived from `dual_coef_` and
`support_vectors_`.
- `dual_coef_`: ndarray of shape (n_classes -1, n_SV)
Dual coefficients of the support vector in the decision
function (see :ref:`sgd_mathematical_formulation`), multiplied by
their targets.
For multiclass, coefficient for all 1-vs-1 classifiers.
The layout of the coefficients in the multiclass case is somewhat
non-trivial. See the multi-class section of the User Guide: `svm_multi_class` for details.
- `fit_status_`: int
0 if correctly fitted, 1 otherwise (will raise warning)
- `intercept_`: ndarray of shape (n_classes * (n_classes - 1) / 2,)
Constants in decision function.
- `n_features_in_`: int
Number of features seen during `fit`.
*Added in 0.24*
- `feature_names_in_`: ndarray of shape (`n_features_in_`,)
Names of features seen during `fit`. Defined only when `X`
has feature names that are all strings.
*Added in 1.0*
- `n_iter_`: ndarray of shape (n_classes * (n_classes - 1) // 2,)
Number of iterations run by the optimization routine to fit the model.
The shape of this attribute depends on the number of models optimized
which in turn depends on the number of classes.
*Added in 1.1*
- `support_`: ndarray of shape (n_SV)
Indices of support vectors.
- `support_vectors_`: ndarray of shape (n_SV, n_features)
Support vectors. An empty array if kernel is precomputed.
- `n_support_`: ndarray of shape (n_classes,), dtype=int32
Number of support vectors for each class.
- `probA_`: ndarray of shape (n_classes * (n_classes - 1) / 2)
- `probB_`: ndarray of shape (n_classes * (n_classes - 1) / 2)
If `probability=True`, it corresponds to the parameters learned in
Platt scaling to produce probability estimates from decision values.
If `probability=False`, it's an empty array. Platt scaling uses the
logistic function
``1 / (1 + exp(decision_value * probA_ + probB_))``
where ``probA_`` and ``probB_`` are learned from the dataset [2]_. For
more information on the multiclass case and training procedure see
section 8 of [1]_.
- `shape_fit_`: tuple of int of shape (n_dimensions_of_X,)
Array dimensions of training vector ``X``.
See Also
--------
- `SVR`: Support Vector Machine for Regression implemented using libsvm.
- `LinearSVC`: Scalable Linear Support Vector Machine for classification
implemented using liblinear. Check the See Also section of
LinearSVC for more comparison element.
References
----------
Examples
--------
>>> import numpy as np
>>> from sklearn.pipeline import make_pipeline
>>> from sklearn.preprocessing import StandardScaler
>>> X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
>>> y = np.array([1, 1, 2, 2])
>>> from sklearn.svm import SVC
>>> clf = make_pipeline(StandardScaler(), SVC(gamma='auto'))
>>> clf.fit(X, y)
Pipeline(steps=[('standardscaler', StandardScaler()),
                ('svc', SVC(gamma='auto'))])
>>> print(clf.predict([[-0.8, -1]]))
[1]
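A hedged end-to-end sketch for SVC, mirroring the structure of the pipeline examples earlier in this section; the model key, the string-valued kernel/gamma settings, the toy data and the dummy target value in the prediction row are all assumptions.

;; Sketch: RBF-kernel SVC; the last column (named 2) holds the class label.
(def svc-pipe
  (mm/pipeline
   (ds-mm/set-inference-target 2)
   {:metamorph/id :model}
   (ml/model {:model-type :sklearn.classification/svc ;; assumed key
              :c 1.0
              :kernel "rbf"
              :gamma "auto"})))

(def svc-fitted
  (svc-pipe {:metamorph/data (dst/tensor->dataset [[-1 -1 1] [-2 -1 1] [1 1 2] [2 1 2]])
             :metamorph/mode :fit}))

;; Predict for a new point; the trailing 0 is a placeholder for the target column.
(-> (mm/transform-pipe (dst/tensor->dataset [[-0.8 -1 0]]) svc-pipe svc-fitted)
    :metamorph/data)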
24.3 :sklearn.regression models
24.3.1 /ada-boost-regressor
| name | type | default | description |
|---|---|---|---|
| estimator | |||
| learning-rate | |||
| loss | |||
| n-estimators | |||
| random-state | |||
| predict-proba? |
24.3.2 /ard-regression
| name | type | default | description |
|---|---|---|---|
| tol | |||
| alpha-2 | |||
| threshold-lambda | |||
| max-iter | |||
| lambda-1 | |||
| copy-x | |||
| lambda-2 | |||
| fit-intercept | |||
| alpha-1 | |||
| verbose | |||
| compute-score | |||
| predict-proba? |
24.3.3 /bagging-regressor
| name | type | default | description |
|---|---|---|---|
| bootstrap | |||
| bootstrap-features | |||
| n-jobs | |||
| random-state | |||
| estimator | |||
| oob-score | |||
| max-features | |||
| warm-start | |||
| n-estimators | |||
| max-samples | |||
| verbose | |||
| predict-proba? |
24.3.4 /bayesian-ridge
| name | type | default | description |
|---|---|---|---|
| tol | |||
| alpha-2 | |||
| max-iter | |||
| lambda-1 | |||
| copy-x | |||
| lambda-2 | |||
| alpha-init | |||
| fit-intercept | |||
| alpha-1 | |||
| lambda-init | |||
| verbose | |||
| compute-score | |||
| predict-proba? |
24.3.5 /cca
| name | type | default | description |
|---|---|---|---|
| copy | |||
| max-iter | |||
| n-components | |||
| scale | |||
| tol | |||
| predict-proba? |
24.3.6 /decision-tree-regressor
| name | type | default | description |
|---|---|---|---|
| min-weight-fraction-leaf | |||
| max-leaf-nodes | |||
| min-impurity-decrease | |||
| min-samples-split | |||
| ccp-alpha | |||
| splitter | |||
| random-state | |||
| min-samples-leaf | |||
| max-features | |||
| monotonic-cst | |||
| max-depth | |||
| criterion | |||
| predict-proba? |
24.3.7 /dummy-regressor
| name | type | default | description |
|---|---|---|---|
| constant | |||
| quantile | |||
| strategy | |||
| predict-proba? |
24.3.8 /elastic-net
| name | type | default | description |
|---|---|---|---|
| positive | |||
| tol | |||
| max-iter | |||
| random-state | |||
| copy-x | |||
| precompute | |||
| fit-intercept | |||
| alpha | |||
| warm-start | |||
| selection | |||
| l-1-ratio | |||
| predict-proba? |
24.3.9 /elastic-net-cv
| name | type | default | description |
|---|---|---|---|
| positive | |||
| tol | |||
| n-alphas | |||
| eps | |||
| alphas | |||
| max-iter | |||
| n-jobs | |||
| random-state | |||
| copy-x | |||
| precompute | |||
| fit-intercept | |||
| cv | |||
| selection | |||
| l-1-ratio | |||
| verbose | |||
| predict-proba? |
24.3.10 /extra-tree-regressor
| name | type | default | description |
|---|---|---|---|
| min-weight-fraction-leaf | |||
| max-leaf-nodes | |||
| min-impurity-decrease | |||
| min-samples-split | |||
| ccp-alpha | |||
| splitter | |||
| random-state | |||
| min-samples-leaf | |||
| max-features | |||
| monotonic-cst | |||
| max-depth | |||
| criterion | |||
| predict-proba? |
24.3.11 /extra-trees-regressor
| name | type | default | description |
|---|---|---|---|
| min-weight-fraction-leaf | |||
| max-leaf-nodes | |||
| min-impurity-decrease | |||
| min-samples-split | |||
| bootstrap | |||
| ccp-alpha | |||
| n-jobs | |||
| random-state | |||
| oob-score | |||
| min-samples-leaf | |||
| max-features | |||
| monotonic-cst | |||
| warm-start | |||
| max-depth | |||
| n-estimators | |||
| max-samples | |||
| criterion | |||
| verbose | |||
| predict-proba? |
24.3.12 /gamma-regressor
| name | type | default | description |
|---|---|---|---|
| alpha | |||
| fit-intercept | |||
| max-iter | |||
| solver | |||
| tol | |||
| verbose | |||
| warm-start | |||
| predict-proba? |
24.3.13 /gaussian-process-regressor
| name | type | default | description |
|---|---|---|---|
| alpha | |||
| copy-x-train | |||
| kernel | |||
| n-restarts-optimizer | |||
| n-targets | |||
| normalize-y | |||
| optimizer | |||
| random-state | |||
| predict-proba? |
24.3.14 /gradient-boosting-regressor
| name | type | default | description |
|---|---|---|---|
| n-iter-no-change | |||
| learning-rate | |||
| min-weight-fraction-leaf | |||
| max-leaf-nodes | |||
| min-impurity-decrease | |||
| min-samples-split | |||
| tol | |||
| subsample | |||
| ccp-alpha | |||
| random-state | |||
| min-samples-leaf | |||
| max-features | |||
| init | |||
| alpha | |||
| warm-start | |||
| max-depth | |||
| validation-fraction | |||
| n-estimators | |||
| criterion | |||
| loss | |||
| verbose | |||
| predict-proba? |
24.3.15 /hist-gradient-boosting-regressor
| name | type | default | description |
|---|---|---|---|
| n-iter-no-change | |||
| learning-rate | |||
| max-leaf-nodes | |||
| scoring | |||
| tol | |||
| early-stopping | |||
| quantile | |||
| max-iter | |||
| random-state | |||
| max-bins | |||
| min-samples-leaf | |||
| max-features | |||
| monotonic-cst | |||
| warm-start | |||
| max-depth | |||
| validation-fraction | |||
| loss | |||
| interaction-cst | |||
| verbose | |||
| categorical-features | |||
| l-2-regularization | |||
| predict-proba? |
24.3.16 /huber-regressor
| name | type | default | description |
|---|---|---|---|
| alpha | |||
| epsilon | |||
| fit-intercept | |||
| max-iter | |||
| tol | |||
| warm-start | |||
| predict-proba? |
24.3.17 /isotonic-regression
| name | type | default | description |
|---|---|---|---|
| increasing | |||
| out-of-bounds | |||
| y-max | |||
| y-min | |||
| predict-proba? |
24.3.18 /k-neighbors-regressor
| name | type | default | description |
|---|---|---|---|
| algorithm | |||
| leaf-size | |||
| metric | |||
| metric-params | |||
| n-jobs | |||
| n-neighbors | |||
| p | |||
| weights | |||
| predict-proba? |
24.3.19 /kernel-ridge
| name | type | default | description |
|---|---|---|---|
| alpha | |||
| coef-0 | |||
| degree | |||
| gamma | |||
| kernel | |||
| kernel-params | |||
| predict-proba? |
24.3.20 /lars
| name | type | default | description |
|---|---|---|---|
| fit-path | |||
| eps | |||
| random-state | |||
| jitter | |||
| copy-x | |||
| precompute | |||
| fit-intercept | |||
| n-nonzero-coefs | |||
| verbose | |||
| predict-proba? |
24.3.21 /lars-cv
| name | type | default | description |
|---|---|---|---|
| eps | |||
| max-n-alphas | |||
| max-iter | |||
| n-jobs | |||
| copy-x | |||
| precompute | |||
| fit-intercept | |||
| cv | |||
| verbose | |||
| predict-proba? |
24.3.22 /lasso
| name | type | default | description |
|---|---|---|---|
| positive | |||
| tol | |||
| max-iter | |||
| random-state | |||
| copy-x | |||
| precompute | |||
| fit-intercept | |||
| alpha | |||
| warm-start | |||
| selection | |||
| predict-proba? |
24.3.23 /lasso-cv
| name | type | default | description |
|---|---|---|---|
| positive | |||
| tol | |||
| n-alphas | |||
| eps | |||
| alphas | |||
| max-iter | |||
| n-jobs | |||
| random-state | |||
| copy-x | |||
| precompute | |||
| fit-intercept | |||
| cv | |||
| selection | |||
| verbose | |||
| predict-proba? |
24.3.24 /lasso-lars
| name | type | default | description |
|---|---|---|---|
| positive | |||
| fit-path | |||
| eps | |||
| max-iter | |||
| random-state | |||
| jitter | |||
| copy-x | |||
| precompute | |||
| fit-intercept | |||
| alpha | |||
| verbose | |||
| predict-proba? |
24.3.25 /lasso-lars-cv
| name | type | default | description |
|---|---|---|---|
| positive | |||
| eps | |||
| max-n-alphas | |||
| max-iter | |||
| n-jobs | |||
| copy-x | |||
| precompute | |||
| fit-intercept | |||
| cv | |||
| verbose | |||
| predict-proba? |
24.3.26 /lasso-lars-ic
| name | type | default | description |
|---|---|---|---|
| positive | |||
| eps | |||
| noise-variance | |||
| max-iter | |||
| copy-x | |||
| precompute | |||
| fit-intercept | |||
| criterion | |||
| verbose | |||
| predict-proba? |
24.3.27 /linear-regression
| name | type | default | description |
|---|---|---|---|
| copy-x | |||
| fit-intercept | |||
| n-jobs | |||
| positive | |||
| predict-proba? |
24.3.28 /linear-svr
| name | type | default | description |
|---|---|---|---|
| tol | |||
| intercept-scaling | |||
| c | |||
| max-iter | |||
| random-state | |||
| dual | |||
| fit-intercept | |||
| loss | |||
| verbose | |||
| epsilon | |||
| predict-proba? |
24.3.29 /mlp-regressor
| name | type | default | description |
|---|---|---|---|
| n-iter-no-change | |||
| learning-rate | |||
| activation | |||
| hidden-layer-sizes | |||
| tol | |||
| beta-2 | |||
| early-stopping | |||
| nesterovs-momentum | |||
| batch-size | |||
| solver | |||
| shuffle | |||
| power-t | |||
| max-fun | |||
| beta-1 | |||
| max-iter | |||
| random-state | |||
| momentum | |||
| learning-rate-init | |||
| alpha | |||
| warm-start | |||
| validation-fraction | |||
| verbose | |||
| epsilon | |||
| predict-proba? |
24.3.30 /multi-task-elastic-net
| name | type | default | description |
|---|---|---|---|
| tol | |||
| max-iter | |||
| random-state | |||
| copy-x | |||
| fit-intercept | |||
| alpha | |||
| warm-start | |||
| selection | |||
| l-1-ratio | |||
| predict-proba? |
24.3.31 /multi-task-elastic-net-cv
| name | type | default | description |
|---|---|---|---|
| tol | |||
| n-alphas | |||
| eps | |||
| alphas | |||
| max-iter | |||
| n-jobs | |||
| random-state | |||
| copy-x | |||
| fit-intercept | |||
| cv | |||
| selection | |||
| l-1-ratio | |||
| verbose | |||
| predict-proba? |
24.3.32 /multi-task-lasso
| name | type | default | description |
|---|---|---|---|
| alpha | |||
| copy-x | |||
| fit-intercept | |||
| max-iter | |||
| random-state | |||
| selection | |||
| tol | |||
| warm-start | |||
| predict-proba? |
24.3.33 /multi-task-lasso-cv
| name | type | default | description |
|---|---|---|---|
| tol | |||
| n-alphas | |||
| eps | |||
| alphas | |||
| max-iter | |||
| n-jobs | |||
| random-state | |||
| copy-x | |||
| fit-intercept | |||
| cv | |||
| selection | |||
| verbose | |||
| predict-proba? |
24.3.34 /nu-svr
| name | type | default | description |
|---|---|---|---|
| kernel | |||
| gamma | |||
| degree | |||
| tol | |||
| nu | |||
| shrinking | |||
| c | |||
| max-iter | |||
| coef-0 | |||
| cache-size | |||
| verbose | |||
| predict-proba? |
24.3.35 /orthogonal-matching-pursuit
| name | type | default | description |
|---|---|---|---|
| fit-intercept | |||
| n-nonzero-coefs | |||
| precompute | |||
| tol | |||
| predict-proba? |
24.3.36 /orthogonal-matching-pursuit-cv
| name | type | default | description |
|---|---|---|---|
| copy | |||
| cv | |||
| fit-intercept | |||
| max-iter | |||
| n-jobs | |||
| verbose | |||
| predict-proba? |
24.3.37 /passive-aggressive-regressor
| name | type | default | description |
|---|---|---|---|
| n-iter-no-change | |||
| average | |||
| tol | |||
| early-stopping | |||
| shuffle | |||
| c | |||
| max-iter | |||
| random-state | |||
| fit-intercept | |||
| warm-start | |||
| validation-fraction | |||
| loss | |||
| verbose | |||
| epsilon | |||
| predict-proba? |
24.3.38 /pls-canonical
| name | type | default | description |
|---|---|---|---|
| algorithm | |||
| copy | |||
| max-iter | |||
| n-components | |||
| scale | |||
| tol | |||
| predict-proba? |
24.3.39 /pls-regression
| name | type | default | description |
|---|---|---|---|
| copy | |||
| max-iter | |||
| n-components | |||
| scale | |||
| tol | |||
| predict-proba? |
24.3.40 /poisson-regressor
| name | type | default | description |
|---|---|---|---|
| alpha | |||
| fit-intercept | |||
| max-iter | |||
| solver | |||
| tol | |||
| verbose | |||
| warm-start | |||
| predict-proba? |
24.3.41 /quantile-regressor
| name | type | default | description |
|---|---|---|---|
| alpha | |||
| fit-intercept | |||
| quantile | |||
| solver | |||
| solver-options | |||
| predict-proba? |
24.3.42 /radius-neighbors-regressor
| name | type | default | description |
|---|---|---|---|
| algorithm | |||
| leaf-size | |||
| metric | |||
| metric-params | |||
| n-jobs | |||
| p | |||
| radius | |||
| weights | |||
| predict-proba? |
24.3.43 /random-forest-regressor
| name | type | default | description |
|---|---|---|---|
| min-weight-fraction-leaf | |||
| max-leaf-nodes | |||
| min-impurity-decrease | |||
| min-samples-split | |||
| bootstrap | |||
| ccp-alpha | |||
| n-jobs | |||
| random-state | |||
| oob-score | |||
| min-samples-leaf | |||
| max-features | |||
| monotonic-cst | |||
| warm-start | |||
| max-depth | |||
| n-estimators | |||
| max-samples | |||
| criterion | |||
| verbose | |||
| predict-proba? |
24.3.44 /ransac-regressor
| name | type | default | description |
|---|---|---|---|
| is-data-valid | |||
| max-skips | |||
| random-state | |||
| min-samples | |||
| stop-probability | |||
| estimator | |||
| stop-n-inliers | |||
| max-trials | |||
| residual-threshold | |||
| is-model-valid | |||
| loss | |||
| stop-score | |||
| predict-proba? |
24.3.45 /ridge
| name | type | default | description |
|---|---|---|---|
| alpha | |||
| copy-x | |||
| fit-intercept | |||
| max-iter | |||
| positive | |||
| random-state | |||
| solver | |||
| tol | |||
| predict-proba? |
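As a sketch of how the kebab-case keys in the table above map onto model options, a ridge regressor with an explicit `:alpha` and `:max-iter` could be declared like this (the values are illustrative only):

```clojure
;; Illustrative only: kebab-case parameters from the table above are
;; passed alongside :model-type.
(def ridge-model-spec
  (ml/model {:model-type :sklearn.regression/ridge
             :alpha 0.5
             :max-iter 1000}))
```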
24.3.46 /ridge-cv
| name | type | default | description |
|---|---|---|---|
| alpha-per-target | |||
| alphas | |||
| cv | |||
| fit-intercept | |||
| gcv-mode | |||
| scoring | |||
| store-cv-results | |||
| store-cv-values | |||
| predict-proba? |
24.3.47 /sgd-regressor
| name | type | default | description |
|---|---|---|---|
| n-iter-no-change | |||
| learning-rate | |||
| average | |||
| tol | |||
| early-stopping | |||
| eta-0 | |||
| shuffle | |||
| penalty | |||
| power-t | |||
| max-iter | |||
| random-state | |||
| fit-intercept | |||
| alpha | |||
| warm-start | |||
| l-1-ratio | |||
| validation-fraction | |||
| loss | |||
| verbose | |||
| epsilon | |||
| predict-proba? |
24.3.48 /svr
| name | type | default | description |
|---|---|---|---|
| kernel | |||
| gamma | |||
| degree | |||
| tol | |||
| shrinking | |||
| c | |||
| max-iter | |||
| coef-0 | |||
| cache-size | |||
| verbose | |||
| epsilon | |||
| predict-proba? |
24.3.49 /theil-sen-regressor
| name | type | default | description |
|---|---|---|---|
| max-subpopulation | |||
| tol | |||
| n-subsamples | |||
| max-iter | |||
| n-jobs | |||
| random-state | |||
| copy-x | |||
| fit-intercept | |||
| verbose | |||
| predict-proba? |
24.3.50 /transformed-target-regressor
| name | type | default | description |
|---|---|---|---|
| check-inverse | |||
| func | |||
| inverse-func | |||
| regressor | |||
| transformer | |||
| predict-proba? |
24.3.51 /tweedie-regressor
| name | type | default | description |
|---|---|---|---|
| tol | |||
| solver | |||
| power | |||
| max-iter | |||
| link | |||
| fit-intercept | |||
| alpha | |||
| warm-start | |||
| verbose | |||
| predict-proba? |