24 Sklearn model reference
As discussed in the Machine Learning chapter, this book contains reference chapters for machine learning models that can be registered in metamorph.ml.
This specific chapter focuses on the models of the scikit-learn Python library, which is wrapped by sklearn-clj.
(ns noj-book.sklearn-reference
  (:require
   [noj-book.utils.render-tools :refer [render-key-info]]
   [scicloj.kindly.v4.kind :as kind]
   [scicloj.metamorph.core :as mm]
   [scicloj.metamorph.ml :as ml]
   [tech.v3.dataset.tensor :as dst]
   [libpython-clj2.python :refer [py.- ->jvm]]
   [tech.v3.dataset.metamorph :as ds-mm]
   [noj-book.utils.render-tools-sklearn]
   [scicloj.sklearn-clj.ml]))
24.1 Sklearn model reference
Below we find all sklearn models with their parameters and the original documentation.
The parameters are given as Clojure keys in kebab-case. As the documentation texts are imported from Python, they refer to the Python spelling of the parameters, but the translation between the two should be obvious.
Example: logistic regression
(def ds (dst/tensor->dataset [[0 0 0] [1 1 1] [2 2 2]]))
Make a pipeline with the sklearn model ‘logistic-regression’:
(def pipe
  (mm/pipeline
   (ds-mm/set-inference-target 2)
   {:metamorph/id :model}
   (ml/model {:model-type :sklearn.classification/logistic-regression
              :max-iter 100})))
Train the model:
(def fitted-ctx
  (pipe {:metamorph/data ds
         :metamorph/mode :fit}))
Predict on new data:
(->
 (mm/transform-pipe
  (dst/tensor->dataset [[3 4 5]])
  pipe
  fitted-ctx)
 :metamorph/data)
:_unnamed [1 3]:
| 0 | 1 | 2 |
|---|---|---|
| 0.00725794 | 0.10454345 | 2.0 |
Access model details via Python interop (using libpython-clj):
(-> fitted-ctx :model :model-data :model
    (py.- coef_)
    (->jvm))
#tech.v3.tensor<float64>[3 2]
[[ -0.4807    -0.4807]
 [-2.061E-05 -2.061E-05]
 [  0.4807     0.4807]]
All model attributes are also included in the context.
(def model-attributes
  (-> fitted-ctx :model :model-data :attributes))
(kind/hiccup
 [:dl (map
       (fn [[k v]]
         [:span
          (vector :dt k)
          (vector :dd (clojure.pprint/write v :stream nil))])
       model-attributes)])
- n_features_in_
- 2
- coef_
- [[-4.80679547e-01 -4.80679547e-01] [-2.06085772e-05 -2.06085772e-05] [ 4.80700156e-01 4.80700156e-01]]
- intercept_
- [ 0.87322115 0.17611579 -1.04933694]
- n_iter_
- [11]
- classes_
- [0. 1. 2.]
24.2 :sklearn.classification models
24.2.1 /ada-boost-classifier
| name | type | default | description |
|---|---|---|---|
| estimator | |||
| learning-rate | |||
| n-estimators | |||
| random-state | |||
| predict-proba? |
An AdaBoost classifier.
An AdaBoost [1]_ classifier is a meta-estimator that begins by fitting a classifier on the original dataset and then fits additional copies of the classifier on the same dataset but where the weights of incorrectly classified instances are adjusted such that subsequent classifiers focus more on difficult cases.
This class implements the algorithm based on [2]_.
Read more in the User Guide: adaboost.
Added in 0.14
Parameters
estimator: object, default=None The base estimator from which the boosted ensemble is built. Support for sample weighting is required, as well as proper classes_ and n_classes_ attributes. If None, then the base estimator is ~sklearn.tree.DecisionTreeClassifier initialized with max_depth=1. Added in 1.2
base_estimator was renamed to estimator.
n_estimators: int, default=50 The maximum number of estimators at which boosting is terminated. In case of perfect fit, the learning procedure is stopped early. Values must be in the range [1, inf).
learning_rate: float, default=1.0 Weight applied to each classifier at each boosting iteration. A higher learning rate increases the contribution of each classifier. There is a trade-off between the learning_rate and n_estimators parameters. Values must be in the range (0.0, inf).
random_state: int, RandomState instance or None, default=None Controls the random seed given at each estimator at each boosting iteration. Thus, it is only used when estimator exposes a random_state. Pass an int for reproducible output across multiple function calls. See Glossary.
Attributes
estimator_: estimator The base estimator from which the ensemble is grown.Added in 1.2
base_estimator_was renamed toestimator_.estimators_: list of classifiers The collection of fitted sub-estimators.classes_: ndarray of shape (n_classes,) The classes labels.n_classes_: int The number of classes.estimator_weights_: ndarray of floats Weights for each estimator in the boosted ensemble.estimator_errors_: ndarray of floats Classification error for each estimator in the boosted ensemble.feature_importances_: ndarray of shape (n_features,) The impurity-based feature importances if supported by theestimator(when based on decision trees).Warning: impurity-based feature importances can be misleading for high cardinality features (many unique values). See
sklearn.inspection.permutation_importanceas an alternative.n_features_in_: int Number of features seen duringfit.Added in 0.24
feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen duringfit. Defined only whenXhas feature names that are all strings.Added in 1.0
See Also
AdaBoostRegressor: An AdaBoost regressor that begins by fitting a regressor on the original dataset and then fits additional copies of the regressor on the same dataset but where the weights of instances are adjusted according to the error of the current prediction.GradientBoostingClassifier: GB builds an additive model in a forward stage-wise fashion. Regression trees are fit on the negative gradient of the binomial or multinomial deviance loss function. Binary classification is a special case where only a single regression tree is induced.sklearn.tree.DecisionTreeClassifier: A non-parametric supervised learning method used for classification. Creates a model that predicts the value of a target variable by learning simple decision rules inferred from the data features.
References
[1] Y. Freund, R. Schapire, "A Decision-Theoretic Generalization of on-Line Learning and an Application to Boosting", 1995.
[2] J. Zhu, H. Zou, S. Rosset, T. Hastie, "Multi-class AdaBoost." Statistics and Its Interface 2.3 (2009): 349-360. doi:10.4310/SII.2009.v2.n3.a8
Examples
from sklearn.ensemble import AdaBoostClassifier
from sklearn.datasets import make_classification
X, y = make_classification(n_samples=1000, n_features=4,
n_informative=2, n_redundant=0,
random_state=0, shuffle=False)
clf = AdaBoostClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)
AdaBoostClassifier(n_estimators=100, random_state=0)
clf.predict([[0, 0, 0, 0]])
array([1])
clf.score(X, y)
0.96
For a detailed example of using AdaBoost to fit a sequence of DecisionTrees as weak learners, please refer to :ref:sphx_glr_auto_examples_ensemble_plot_adaboost_multiclass.py.
For a detailed example of using AdaBoost to fit a non-linearly separable classification dataset composed of two Gaussian quantiles clusters, please refer to :ref:sphx_glr_auto_examples_ensemble_plot_adaboost_twoclass.py.
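Here is a minimal sketch of how this model could be plugged into a metamorph pipeline, following the logistic-regression example at the beginning of this chapter. The aliases come from the ns form above; the toy dataset and the parameter values are illustrative assumptions, not recommendations:
;; Sketch only: toy 3-column dataset, illustrative parameter values.
(def ada-pipe
  (mm/pipeline
   (ds-mm/set-inference-target 2)
   {:metamorph/id :model}
   (ml/model {:model-type :sklearn.classification/ada-boost-classifier
              :n-estimators 50
              :learning-rate 1.0})))

(def ada-fitted
  (ada-pipe {:metamorph/data (dst/tensor->dataset [[0 0 0] [1 1 1] [2 2 2]])
             :metamorph/mode :fit}))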
24.2.2 /bagging-classifier
| name | type | default | description |
|---|---|---|---|
| bootstrap | |||
| bootstrap-features | |||
| n-jobs | |||
| random-state | |||
| estimator | |||
| oob-score | |||
| max-features | |||
| warm-start | |||
| n-estimators | |||
| max-samples | |||
| verbose | |||
| predict-proba? |
A Bagging classifier.
A Bagging classifier is an ensemble meta-estimator that fits base classifiers each on random subsets of the original dataset and then aggregate their individual predictions (either by voting or by averaging) to form a final prediction. Such a meta-estimator can typically be used as a way to reduce the variance of a black-box estimator (e.g., a decision tree), by introducing randomization into its construction procedure and then making an ensemble out of it.
This algorithm encompasses several works from the literature. When random subsets of the dataset are drawn as random subsets of the samples, then this algorithm is known as Pasting [1]. If samples are drawn with replacement, then the method is known as Bagging [2]. When random subsets of the dataset are drawn as random subsets of the features, then the method is known as Random Subspaces [3]. Finally, when base estimators are built on subsets of both samples and features, then the method is known as Random Patches [4].
Read more in the User Guide: bagging.
Added in 0.15
Parameters
estimator: object, default=None The base estimator to fit on random subsets of the dataset. If None, then the base estimator is a~sklearn.tree.DecisionTreeClassifier.Added in 1.2
base_estimatorwas renamed toestimator.n_estimators: int, default=10 The number of base estimators in the ensemble.max_samples: int or float, default=None The number of samples to draw from X to train each base estimator (with replacement by default, seebootstrapfor more details).- If None, then draw
X.shape[0]samples irrespective ofsample_weight. - If int, then draw
max_samplessamples. - If float, then draw
max_samples * X.shape[0]unweighted samples ormax_samples * sample_weight.sum()weighted samples.
- If None, then draw
max_features: int or float, default=1.0 The number of features to draw from X to train each base estimator ( without replacement by default, seebootstrap_featuresfor more details).- If int, then draw
max_featuresfeatures. - If float, then draw
max(1, int(max_features * n_features_in_))features.
- If int, then draw
bootstrap: bool, default=True Whether samples are drawn with replacement. If False, sampling without replacement is performed. If fitting withsample_weight, it is strongly recommended to choose True, as only drawing with replacement will ensure the expected frequency semantics ofsample_weight.bootstrap_features: bool, default=False Whether features are drawn with replacement.oob_score: bool, default=False Whether to use out-of-bag samples to estimate the generalization error. Only available if bootstrap=True.warm_start: bool, default=False When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just fit a whole new ensemble. Seethe Glossary.Added in 0.17 warm_start constructor parameter.
n_jobs: int, default=None The number of jobs to run in parallel for bothfitandpredict.Nonemeans 1 unless in ajoblib.parallel_backendcontext.-1means using all processors. SeeGlossaryfor more details.random_state: int, RandomState instance or None, default=None Controls the random resampling of the original dataset (sample wise and feature wise). If the base estimator accepts arandom_stateattribute, a different seed is generated for each instance in the ensemble. Pass an int for reproducible output across multiple function calls. SeeGlossary.verbose: int, default=0 Controls the verbosity when fitting and predicting.
Attributes
estimator_: estimator The base estimator from which the ensemble is grown.Added in 1.2
base_estimator_was renamed toestimator_.n_features_in_: int Number of features seen duringfit.Added in 0.24
feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen duringfit. Defined only whenXhas feature names that are all strings.Added in 1.0
estimators_: list of estimators The collection of fitted base estimators.estimators_samples_: list of arrays The subset of drawn samples (i.e., the in-bag samples) for each base estimator. Each subset is defined by an array of the indices selected.estimators_features_: list of arrays The subset of drawn features for each base estimator.classes_: ndarray of shape (n_classes,) The classes labels.n_classes_: int or list The number of classes.oob_score_: float Score of the training dataset obtained using an out-of-bag estimate. This attribute exists only whenoob_scoreis True.oob_decision_function_: ndarray of shape (n_samples, n_classes) Decision function computed with out-of-bag estimate on the training set. If n_estimators is small it might be possible that a data point was never left out during the bootstrap. In this case,oob_decision_function_might contain NaN. This attribute exists only whenoob_scoreis True.
See Also
BaggingRegressor: A Bagging regressor.
References
[1] L. Breiman, "Pasting small votes for classification in large databases and on-line", Machine Learning, 36(1), 85-103, 1999.
[2] L. Breiman, "Bagging predictors", Machine Learning, 24(2), 123-140, 1996.
[3] T. Ho, "The random subspace method for constructing decision forests", Pattern Analysis and Machine Intelligence, 20(8), 832-844, 1998.
[4] G. Louppe and P. Geurts, "Ensembles on Random Patches", Machine Learning and Knowledge Discovery in Databases, 346-361, 2012.
Examples
from sklearn.svm import SVC
from sklearn.ensemble import BaggingClassifier
from sklearn.datasets import make_classification
X, y = make_classification(n_samples=100, n_features=4,
n_informative=2, n_redundant=0,
random_state=0, shuffle=False)
clf = BaggingClassifier(estimator=SVC(),
n_estimators=10, random_state=0).fit(X, y)
clf.predict([[0, 0, 0, 0]])
array([1])
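The same pipeline shape as in the AdaBoost sketch above works here; only the model specification changes. Below is a sketch of the options map, with values chosen purely for illustration (the kebab-case keys correspond to the parameter table above):
;; Sketch: these values mirror the sklearn defaults and are not a recommendation.
(ml/model {:model-type :sklearn.classification/bagging-classifier
           :n-estimators 10
           :bootstrap true
           :random-state 0})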
24.2.3 /bernoulli-nb
| name | type | default | description |
|---|---|---|---|
| alpha | |||
| binarize | |||
| class-prior | |||
| fit-prior | |||
| force-alpha | |||
| predict-proba? |
Naive Bayes classifier for multivariate Bernoulli models.
Like MultinomialNB, this classifier is suitable for discrete data. The difference is that while MultinomialNB works with occurrence counts, BernoulliNB is designed for binary/boolean features.
Read more in the User Guide: bernoulli_naive_bayes.
Parameters
alpha: float or array-like of shape (n_features,), default=1.0 Additive (Laplace/Lidstone) smoothing parameter (set alpha=0 and force_alpha=True, for no smoothing).force_alpha: bool, default=True If False and alpha is less than 1e-10, it will set alpha to 1e-10. If True, alpha will remain unchanged. This may cause numerical errors if alpha is too close to 0.Added in 1.2 Changed in 1.4 The default value of
force_alphachanged toTrue.binarize: float or None, default=0.0 Threshold for binarizing (mapping to booleans) of sample features. If None, input is presumed to already consist of binary vectors.fit_prior: bool, default=True Whether to learn class prior probabilities or not. If false, a uniform prior will be used.class_prior: array-like of shape (n_classes,), default=None Prior probabilities of the classes. If specified, the priors are not adjusted according to the data.
Attributes
class_count_: ndarray of shape (n_classes,) Number of samples encountered for each class during fitting. This value is weighted by the sample weight when provided.class_log_prior_: ndarray of shape (n_classes,) Log probability of each class (smoothed).classes_: ndarray of shape (n_classes,) Class labels known to the classifierfeature_count_: ndarray of shape (n_classes, n_features) Number of samples encountered for each (class, feature) during fitting. This value is weighted by the sample weight when provided.feature_log_prob_: ndarray of shape (n_classes, n_features) Empirical log probability of features given a class, P(x_i|y).n_features_in_: int Number of features seen duringfit.Added in 0.24
feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen duringfit. Defined only whenXhas feature names that are all strings.Added in 1.0
See Also
CategoricalNB: Naive Bayes classifier for categorical features.ComplementNB: The Complement Naive Bayes classifier described in Rennie et al. (2003).GaussianNB: Gaussian Naive Bayes (GaussianNB).MultinomialNB: Naive Bayes classifier for multinomial models.
References
C.D. Manning, P. Raghavan and H. Schuetze (2008). Introduction to Information Retrieval. Cambridge University Press, pp. 234-265. https://nlp.stanford.edu/IR-book/html/htmledition/the-bernoulli-model-1.html
A. McCallum and K. Nigam (1998). A comparison of event models for naive Bayes text classification. Proc. AAAI/ICML-98 Workshop on Learning for Text Categorization, pp. 41-48.
V. Metsis, I. Androutsopoulos and G. Paliouras (2006). Spam filtering with naive Bayes -- Which naive Bayes? 3rd Conf. on Email and Anti-Spam (CEAS).
Examples
import numpy as np
rng = np.random.RandomState(1)
X = rng.randint(5, size=(6, 100))
Y = np.array([1, 2, 3, 4, 4, 5])
from sklearn.naive_bayes import BernoulliNB
clf = BernoulliNB()
clf.fit(X, Y)
BernoulliNB()
print(clf.predict(X[2:3]))
[3]
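A sketch of specifying this model from Clojure, assuming a dataset whose inference target has already been set as in the pipelines above; :alpha and :binarize are illustrative values:
;; Sketch: smoothing and binarization threshold chosen for illustration only.
(ml/model {:model-type :sklearn.classification/bernoulli-nb
           :alpha 1.0
           :binarize 0.0
           :fit-prior true})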
24.2.4 /calibrated-classifier-cv
| name | type | default | description |
|---|---|---|---|
| cv | |||
| ensemble | |||
| estimator | |||
| method | |||
| n-jobs | |||
| predict-proba? |
Calibrate probabilities using isotonic, sigmoid, or temperature scaling.
This class uses cross-validation to both estimate the parameters of a classifier and subsequently calibrate a classifier. With ensemble=True, for each cv split it fits a copy of the base estimator to the training subset, and calibrates it using the testing subset. For prediction, predicted probabilities are averaged across these individual calibrated classifiers. When ensemble=False, cross-validation is used to obtain unbiased predictions, via ~sklearn.model_selection.cross_val_predict, which are then used for calibration. For prediction, the base estimator, trained using all the data, is used. This is the prediction method implemented when probabilities=True for ~sklearn.svm.SVC and ~sklearn.svm.NuSVC estimators (see User Guide: scores_probabilities for details).
Already fitted classifiers can be calibrated by wrapping the model in a ~sklearn.frozen.FrozenEstimator. In this case all provided data is used for calibration. The user has to take care manually that data for model fitting and calibration are disjoint.
The calibration is based on the decision_function method of the estimator if it exists, else on predict_proba.
Read more in the User Guide: calibration. In order to learn more on the CalibratedClassifierCV class, see the following calibration examples: :ref:sphx_glr_auto_examples_calibration_plot_calibration.py, :ref:sphx_glr_auto_examples_calibration_plot_calibration_curve.py, and :ref:sphx_glr_auto_examples_calibration_plot_calibration_multiclass.py.
Parameters
estimator: estimator instance, default=None The classifier whose output need to be calibrated to provide more accuratepredict_probaoutputs. The default classifier is a~sklearn.svm.LinearSVC.Added in 1.2
method: {'sigmoid', 'isotonic', 'temperature'}, default='sigmoid' The method to use for calibration. Can be:- 'sigmoid', which corresponds to Platt's method (i.e. a binary logistic regression model).
- 'isotonic', which is a non-parametric approach.
- 'temperature', temperature scaling.
Sigmoid and isotonic calibration methods natively support only binary classifiers and extend to multi-class classification using a One-vs-Rest (OvR) strategy with post-hoc renormalization, i.e., adjusting the probabilities after calibration to ensure they sum up to 1.
In contrast, temperature scaling naturally supports multi-class calibration by applying
softmax(classifier_logits/T)with a value ofT(temperature) that optimizes the log loss.For very uncalibrated classifiers on very imbalanced datasets, sigmoid calibration might be preferred because it fits an additional intercept parameter. This helps shift decision boundaries appropriately when the classifier being calibrated is biased towards the majority class.
Isotonic calibration is not recommended when the number of calibration samples is too low
(≪1000)since it then tends to overfit.Changed in 1.8 Added option 'temperature'.
cv: int, cross-validation generator, or iterable, default=None Determines the cross-validation splitting strategy. Possible inputs for cv are:- None, to use the default 5-fold cross-validation,
- integer, to specify the number of folds.
CV splitter,- An iterable yielding (train, test) splits as arrays of indices.
For integer/None inputs, if
yis binary or multiclass,~sklearn.model_selection.StratifiedKFoldis used. Ifyis neither binary nor multiclass,~sklearn.model_selection.KFoldis used.Refer to the User Guide:
cross_validationfor the various cross-validation strategies that can be used here.Changed in 0.22
cvdefault value if None changed from 3-fold to 5-fold.n_jobs: int, default=None Number of jobs to run in parallel.Nonemeans 1 unless in ajoblib.parallel_backendcontext.-1means using all processors.Base estimator clones are fitted in parallel across cross-validation iterations.
See
Glossaryfor more details.Added in 0.24
ensemble: bool, or "auto", default="auto" Determines how the calibrator is fitted."auto" will use
Falseif theestimatoris a~sklearn.frozen.FrozenEstimator, andTrueotherwise.If
True, theestimatoris fitted using training data, and calibrated using testing data, for eachcvfold. The final estimator is an ensemble ofn_cvfitted classifier and calibrator pairs, wheren_cvis the number of cross-validation folds. The output is the average predicted probabilities of all pairs.If
False,cvis used to compute unbiased predictions, via~sklearn.model_selection.cross_val_predict, which are then used for calibration. At prediction time, the classifier used is theestimatortrained on all the data. Note that this method is also internally implemented insklearn.svmestimators with theprobabilities=Trueparameter.Added in 0.24
Changed in 1.6
"auto"option is added and is the default.
Attributes
classes_: ndarray of shape (n_classes,) The class labels.n_features_in_: int Number of features seen duringfit. Only defined if the underlying estimator exposes such an attribute when fit.Added in 0.24
feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen duringfit. Only defined if the underlying estimator exposes such an attribute when fit.Added in 1.0
calibrated_classifiers_: list (len() equal to cv or 1 ifensemble=False) The list of classifier and calibrator pairs.- When
ensemble=True,n_cvfittedestimatorand calibrator pairs.n_cvis the number of cross-validation folds. - When
ensemble=False, theestimator, fitted on all the data, and fitted calibrator.
Changed in 0.24 Single calibrated classifier case when
ensemble=False.- When
See Also
calibration_curve: Compute true and predicted probabilities for a calibration curve.
References
[1] B. Zadrozny & C. Elkan. Obtaining calibrated probability estimates from decision trees and naive Bayesian classifiers , ICML 2001.
[2] B. Zadrozny & C. Elkan. Transforming Classifier Scores into Accurate Multiclass Probability Estimates , KDD 2002.
[3] J. Platt. Probabilistic Outputs for Support Vector Machines and Comparisons to Regularized Likelihood Methods , 1999.
[4] A. Niculescu-Mizil & R. Caruana. Predicting Good Probabilities with Supervised Learning , ICML 2005.
[5] Chuan Guo, Geoff Pleiss, Yu Sun, Kilian Q. Weinberger. "On Calibration of Modern Neural Networks." Proceedings of the 34th International Conference on Machine Learning, PMLR 70:1321-1330, 2017. doi:10.48550/arXiv.1706.04599
Examples
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB
from sklearn.calibration import CalibratedClassifierCV
X, y = make_classification(n_samples=100, n_features=2,
n_redundant=0, random_state=42)
base_clf = GaussianNB()
calibrated_clf = CalibratedClassifierCV(base_clf, cv=3)
calibrated_clf.fit(X, y)
CalibratedClassifierCV(...)
len(calibrated_clf.calibrated_classifiers_)
3
calibrated_clf.predict_proba(X)[:5, :]
array([[0.110, 0.889],
[0.072, 0.927],
[0.928, 0.072],
[0.928, 0.072],
[0.072, 0.928]])
from sklearn.model_selection import train_test_split
X, y = make_classification(n_samples=100, n_features=2,
n_redundant=0, random_state=42)
X_train, X_calib, y_train, y_calib = train_test_split(
X, y, random_state=42
)
base_clf = GaussianNB()
base_clf.fit(X_train, y_train)
GaussianNB()
from sklearn.frozen import FrozenEstimator
calibrated_clf = CalibratedClassifierCV(FrozenEstimator(base_clf))
calibrated_clf.fit(X_calib, y_calib)
CalibratedClassifierCV(...)
len(calibrated_clf.calibrated_classifiers_)
1
calibrated_clf.predict_proba([[-0.5, 0.5]])
array([[0.936, 0.063]])
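From Clojure, the calibration wrapper can be specified like any other model. The sketch below leaves the base estimator at its default (estimator=None) and uses an illustrative number of cross-validation folds:
;; Sketch: :cv and :ensemble values are illustrative; the base estimator stays at its default.
(ml/model {:model-type :sklearn.classification/calibrated-classifier-cv
           :cv 3
           :ensemble true})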
24.2.5 /categorical-nb
| name | type | default | description |
|---|---|---|---|
| alpha | |||
| class-prior | |||
| fit-prior | |||
| force-alpha | |||
| min-categories | |||
| predict-proba? |
Naive Bayes classifier for categorical features.
The categorical Naive Bayes classifier is suitable for classification with discrete features that are categorically distributed. The categories of each feature are drawn from a categorical distribution.
Read more in the User Guide: categorical_naive_bayes.
Parameters
alpha: float, default=1.0 Additive (Laplace/Lidstone) smoothing parameter (set alpha=0 and force_alpha=True, for no smoothing).force_alpha: bool, default=True If False and alpha is less than 1e-10, it will set alpha to 1e-10. If True, alpha will remain unchanged. This may cause numerical errors if alpha is too close to 0.Added in 1.2 Changed in 1.4 The default value of
force_alphachanged toTrue.fit_prior: bool, default=True Whether to learn class prior probabilities or not. If false, a uniform prior will be used.class_prior: array-like of shape (n_classes,), default=None Prior probabilities of the classes. If specified, the priors are not adjusted according to the data.min_categories: int or array-like of shape (n_features,), default=None Minimum number of categories per feature.- integer: Sets the minimum number of categories per feature to
n_categoriesfor each features. - array-like: shape (n_features,) where
n_categories[i]holds the minimum number of categories for the ith column of the input. - None (default): Determines the number of categories automatically from the training data.
Added in 0.24
- integer: Sets the minimum number of categories per feature to
Attributes
category_count_: list of arrays of shape (n_features,) Holds arrays of shape (n_classes, n_categories of respective feature) for each feature. Each array provides the number of samples encountered for each class and category of the specific feature.class_count_: ndarray of shape (n_classes,) Number of samples encountered for each class during fitting. This value is weighted by the sample weight when provided.class_log_prior_: ndarray of shape (n_classes,) Smoothed empirical log probability for each class.classes_: ndarray of shape (n_classes,) Class labels known to the classifierfeature_log_prob_: list of arrays of shape (n_features,) Holds arrays of shape (n_classes, n_categories of respective feature) for each feature. Each array provides the empirical log probability of categories given the respective feature and class,P(x_i|y).n_features_in_: int Number of features seen duringfit.Added in 0.24
feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen duringfit. Defined only whenXhas feature names that are all strings.Added in 1.0
n_categories_: ndarray of shape (n_features,), dtype=np.int64 Number of categories for each feature. This value is inferred from the data or set by the minimum number of categories.Added in 0.24
See Also
BernoulliNB: Naive Bayes classifier for multivariate Bernoulli models.ComplementNB: Complement Naive Bayes classifier.GaussianNB: Gaussian Naive Bayes.MultinomialNB: Naive Bayes classifier for multinomial models.
Examples
import numpy as np
rng = np.random.RandomState(1)
X = rng.randint(5, size=(6, 100))
y = np.array([1, 2, 3, 4, 5, 6])
from sklearn.naive_bayes import CategoricalNB
clf = CategoricalNB()
clf.fit(X, y)
CategoricalNB()
print(clf.predict(X[2:3]))
[3]
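A minimal Clojure sketch for this model, assuming integer-encoded categorical features as in the Python example above; the smoothing settings are illustrative:
;; Sketch: assumes integer-encoded categorical features; parameter values are illustrative.
(ml/model {:model-type :sklearn.classification/categorical-nb
           :alpha 1.0
           :fit-prior true})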
24.2.6 /complement-nb
| name | type | default | description |
|---|---|---|---|
| alpha | |||
| class-prior | |||
| fit-prior | |||
| force-alpha | |||
| norm | |||
| predict-proba? |
The Complement Naive Bayes classifier described in Rennie et al. (2003).
The Complement Naive Bayes classifier was designed to correct the "severe assumptions" made by the standard Multinomial Naive Bayes classifier. It is particularly suited for imbalanced data sets.
Read more in the User Guide: complement_naive_bayes.
Added in 0.20
Parameters
alpha: float or array-like of shape (n_features,), default=1.0 Additive (Laplace/Lidstone) smoothing parameter (set alpha=0 and force_alpha=True, for no smoothing).force_alpha: bool, default=True If False and alpha is less than 1e-10, it will set alpha to 1e-10. If True, alpha will remain unchanged. This may cause numerical errors if alpha is too close to 0.Added in 1.2 Changed in 1.4 The default value of
force_alphachanged toTrue.fit_prior: bool, default=True Only used in edge case with a single class in the training set.class_prior: array-like of shape (n_classes,), default=None Prior probabilities of the classes. Not used.norm: bool, default=False Whether or not a second normalization of the weights is performed. The default behavior mirrors the implementations found in Mahout and Weka, which do not follow the full algorithm described in Table 9 of the paper.
Attributes
class_count_: ndarray of shape (n_classes,) Number of samples encountered for each class during fitting. This value is weighted by the sample weight when provided.class_log_prior_: ndarray of shape (n_classes,) Smoothed empirical log probability for each class. Only used in edge case with a single class in the training set.classes_: ndarray of shape (n_classes,) Class labels known to the classifierfeature_all_: ndarray of shape (n_features,) Number of samples encountered for each feature during fitting. This value is weighted by the sample weight when provided.feature_count_: ndarray of shape (n_classes, n_features) Number of samples encountered for each (class, feature) during fitting. This value is weighted by the sample weight when provided.feature_log_prob_: ndarray of shape (n_classes, n_features) Empirical weights for class complements.n_features_in_: int Number of features seen duringfit.Added in 0.24
feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen duringfit. Defined only whenXhas feature names that are all strings.Added in 1.0
See Also
BernoulliNB: Naive Bayes classifier for multivariate Bernoulli models.CategoricalNB: Naive Bayes classifier for categorical features.GaussianNB: Gaussian Naive Bayes.MultinomialNB: Naive Bayes classifier for multinomial models.
References
Rennie, J. D., Shih, L., Teevan, J., & Karger, D. R. (2003). Tackling the poor assumptions of naive bayes text classifiers. In ICML (Vol. 3, pp. 616-623). https://people.csail.mit.edu/jrennie/papers/icml03-nb.pdf
Examples
import numpy as np
rng = np.random.RandomState(1)
X = rng.randint(5, size=(6, 100))
y = np.array([1, 2, 3, 4, 5, 6])
from sklearn.naive_bayes import ComplementNB
clf = ComplementNB()
clf.fit(X, y)
ComplementNB()
print(clf.predict(X[2:3]))
[3]
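And the corresponding Clojure sketch, again with illustrative parameter values:
;; Sketch: :norm false mirrors the sklearn default described above.
(ml/model {:model-type :sklearn.classification/complement-nb
           :alpha 1.0
           :norm false})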
24.2.7 /decision-tree-classifier
| name | type | default | description |
|---|---|---|---|
| min-weight-fraction-leaf | |||
| max-leaf-nodes | |||
| min-impurity-decrease | |||
| min-samples-split | |||
| ccp-alpha | |||
| splitter | |||
| random-state | |||
| min-samples-leaf | |||
| max-features | |||
| monotonic-cst | |||
| max-depth | |||
| class-weight | |||
| criterion | |||
| predict-proba? |
A decision tree classifier.
Read more in the User Guide: tree.
Parameters
criterion: {"gini", "entropy", "log_loss"}, default="gini" The function to measure the quality of a split. Supported criteria are "gini" for the Gini impurity and "log_loss" and "entropy" both for the Shannon information gain, see :ref:tree_mathematical_formulation.splitter: {"best", "random"}, default="best" The strategy used to choose the split at each node. Supported strategies are "best" to choose the best split and "random" to choose the best random split.max_depth: int, default=None The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.min_samples_split: int or float, default=2 The minimum number of samples required to split an internal node:- If int, then consider
min_samples_splitas the minimum number. - If float, then
min_samples_splitis a fraction andceil(min_samples_split * n_samples)are the minimum number of samples for each split.
Changed in 0.18 Added float values for fractions.
- If int, then consider
min_samples_leaf: int or float, default=1 The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at leastmin_samples_leaftraining samples in each of the left and right branches. This may have the effect of smoothing the model, especially in regression.- If int, then consider
min_samples_leafas the minimum number. - If float, then
min_samples_leafis a fraction andceil(min_samples_leaf * n_samples)are the minimum number of samples for each node.
Changed in 0.18 Added float values for fractions.
- If int, then consider
min_weight_fraction_leaf: float, default=0.0 The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sample_weight is not provided.max_features: int, float or {"sqrt", "log2"}, default=None The number of features to consider when looking for the best split:- If int, then consider
max_featuresfeatures at each split. - If float, then
max_featuresis a fraction andmax(1, int(max_features * n_features_in_))features are considered at each split. - If "sqrt", then
max_features=sqrt(n_features). - If "log2", then
max_features=log2(n_features). - If None, then
max_features=n_features.
- If int, then consider
🛈 Note
The search for a split does not stop until at least one valid partition of the node samples is found, even if it requires to effectively inspect more than max_features features.
random_state: int, RandomState instance or None, default=None Controls the randomness of the estimator. The features are always randomly permuted at each split, even ifsplitteris set to"best". Whenmax_features < n_features, the algorithm will selectmax_featuresat random at each split before finding the best split among them. But the best found split may vary across different runs, even ifmax_features=n_features. That is the case, if the improvement of the criterion is identical for several splits and one split has to be selected at random. To obtain a deterministic behaviour during fitting,random_statehas to be fixed to an integer. SeeGlossaryfor details.max_leaf_nodes: int, default=None Grow a tree withmax_leaf_nodesin best-first fashion. Best nodes are defined as relative reduction in impurity. If None then unlimited number of leaf nodes.min_impurity_decrease: float, default=0.0 A node will be split if this split induces a decrease of the impurity greater than or equal to this value.The weighted impurity decrease equation is the following
N_t / N * (impurity - N_t_R / N_t * right_impurity
- N_t_L / N_t * left_impurity)
where N is the total number of samples, N_t is the number of samples at the current node, N_t_L is the number of samples in the left child, and N_t_R is the number of samples in the right child.
N, N_t, N_t_R and N_t_L all refer to the weighted sum, if sample_weight is passed.
Added in 0.19
class_weight: dict, list of dict or "balanced", default=None Weights associated with classes in the form{class_label: weight}. If None, all classes are supposed to have weight one. For multi-output problems, a list of dicts can be provided in the same order as the columns of y.Note that for multioutput (including multilabel) weights should be defined for each class of every column in its own dict. For example, for four-class multilabel classification weights should be [{0: 1, 1: 1}, {0: 1, 1: 5}, {0: 1, 1: 1}, {0: 1, 1: 1}] instead of [{1:1}, {2:5}, {3:1}, {4:1}].
The "balanced" mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as
n_samples / (n_classes * np.bincount(y))For multi-output, the weights of each column of y will be multiplied.
Note that these weights will be multiplied with sample_weight (passed through the fit method) if sample_weight is specified.
ccp_alpha: non-negative float, default=0.0 Complexity parameter used for Minimal Cost-Complexity Pruning. The subtree with the largest cost complexity that is smaller thanccp_alphawill be chosen. By default, no pruning is performed. See :ref:minimal_cost_complexity_pruningfor details. See :ref:sphx_glr_auto_examples_tree_plot_cost_complexity_pruning.pyfor an example of such pruning.Added in 0.22
monotonic_cst: array-like of int of shape (n_features), default=None Indicates the monotonicity constraint to enforce on each feature. - 1: monotonic increase - 0: no constraint - -1: monotonic decreaseIf monotonic_cst is None, no constraints are applied.
Monotonicity constraints are not supported for: - multiclass classifications (i.e. when
n_classes > 2), - multioutput classifications (i.e. whenn_outputs_ > 1), - classifications trained on data with missing values.The constraints hold over the probability of the positive class.
Read more in the User Guide:
monotonic_cst_gbdt.Added in 1.4
Attributes
classes_: ndarray of shape (n_classes,) or list of ndarray The classes labels (single output problem), or a list of arrays of class labels (multi-output problem).feature_importances_: ndarray of shape (n_features,) The impurity-based feature importances. The higher, the more important the feature. The importance of a feature is computed as the (normalized) total reduction of the criterion brought by that feature. It is also known as the Gini importance [4]_.Warning: impurity-based feature importances can be misleading for high cardinality features (many unique values). See
sklearn.inspection.permutation_importanceas an alternative.max_features_: int The inferred value of max_features.n_classes_: int or list of int The number of classes (for single output problems), or a list containing the number of classes for each output (for multi-output problems).n_features_in_: int Number of features seen duringfit.Added in 0.24
feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen duringfit. Defined only whenXhas feature names that are all strings.Added in 1.0
n_outputs_: int The number of outputs whenfitis performed.tree_: Tree instance The underlying Tree object. Please refer tohelp(sklearn.tree._tree.Tree)for attributes of Tree object and :ref:sphx_glr_auto_examples_tree_plot_unveil_tree_structure.pyfor basic usage of these attributes.
See Also
DecisionTreeRegressor: A decision tree regressor.
Notes
The default values for the parameters controlling the size of the trees (e.g. max_depth, min_samples_leaf, etc.) lead to fully grown and unpruned trees which can potentially be very large on some data sets. To reduce memory consumption, the complexity and size of the trees should be controlled by setting those parameter values.
The predict method operates using the numpy.argmax function on the outputs of predict_proba. This means that in case the highest predicted probabilities are tied, the classifier will predict the tied class with the lowest index in classes_.
References
[2] L. Breiman, J. Friedman, R. Olshen, and C. Stone, "Classification and Regression Trees", Wadsworth, Belmont, CA, 1984.
[3] T. Hastie, R. Tibshirani and J. Friedman. "Elements of Statistical Learning", Springer, 2009.
[4] L. Breiman, and A. Cutler, "Random Forests", https://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm
Examples
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
clf = DecisionTreeClassifier(random_state=0)
iris = load_iris()
cross_val_score(clf, iris.data, iris.target, cv=10)
array([ 1. , 0.93, 0.86, 0.93, 0.93,
0.93, 0.93, 1. , 0.93, 1. ])
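As a Clojure sketch, a decision tree can be trained and applied with the same fit/transform pattern shown at the beginning of this chapter; the toy data and the :max-depth value are illustrative only:
;; Sketch: toy dataset and :max-depth chosen for illustration only.
(def tree-pipe
  (mm/pipeline
   (ds-mm/set-inference-target 2)
   {:metamorph/id :model}
   (ml/model {:model-type :sklearn.classification/decision-tree-classifier
              :max-depth 3
              :min-samples-split 2})))

(def tree-fitted
  (tree-pipe {:metamorph/data (dst/tensor->dataset [[0 0 0] [1 1 1] [2 2 2]])
              :metamorph/mode :fit}))

(-> (mm/transform-pipe (dst/tensor->dataset [[3 4 5]]) tree-pipe tree-fitted)
    :metamorph/data)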
24.2.8 /dummy-classifier
| name | type | default | description |
|---|---|---|---|
| constant | |||
| random-state | |||
| strategy | |||
| predict-proba? |
DummyClassifier makes predictions that ignore the input features.
This classifier serves as a simple baseline to compare against other more complex classifiers.
The specific behavior of the baseline is selected with the strategy parameter.
All strategies make predictions that ignore the input feature values passed as the X argument to fit and predict. The predictions, however, typically depend on values observed in the y parameter passed to fit.
Note that the "stratified" and "uniform" strategies lead to non-deterministic predictions that can be rendered deterministic by setting the random_state parameter if needed. The other strategies are naturally deterministic and, once fit, always return the same constant prediction for any value of X.
Read more in the User Guide: dummy_estimators.
Added in 0.13
Parameters
strategy: {"most_frequent", "prior", "stratified", "uniform", "constant"}, default="prior" Strategy to use to generate predictions."most_frequent": the
predictmethod always returns the most frequent class label in the observedyargument passed tofit. Thepredict_probamethod returns the matching one-hot encoded vector."prior": the
predictmethod always returns the most frequent class label in the observedyargument passed tofit(like "most_frequent").predict_probaalways returns the empirical class distribution ofyalso known as the empirical class prior distribution."stratified": the
predict_probamethod randomly samples one-hot vectors from a multinomial distribution parametrized by the empirical class prior probabilities. Thepredictmethod returns the class label which got probability one in the one-hot vector ofpredict_proba. Each sampled row of both methods is therefore independent and identically distributed."uniform": generates predictions uniformly at random from the list of unique classes observed in
y, i.e. each class has equal probability."constant": always predicts a constant label that is provided by the user. This is useful for metrics that evaluate a non-majority class.
Changed in 0.24 The default value of
strategyhas changed to "prior" in version 0.24.
random_state: int, RandomState instance or None, default=None Controls the randomness to generate the predictions whenstrategy='stratified'orstrategy='uniform'. Pass an int for reproducible output across multiple function calls. SeeGlossary.constant: int or str or array-like of shape (n_outputs,), default=None The explicit constant as predicted by the "constant" strategy. This parameter is useful only for the "constant" strategy.
Attributes
classes_: ndarray of shape (n_classes,) or list of such arrays Unique class labels observed iny. For multi-output classification problems, this attribute is a list of arrays as each output has an independent set of possible classes.n_classes_: int or list of int Number of label for each output.class_prior_: ndarray of shape (n_classes,) or list of such arrays Frequency of each class observed iny. For multioutput classification problems, this is computed independently for each output.n_features_in_: int Number of features seen duringfit.feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen duringfit. Defined only whenXhas feature names that are all strings.n_outputs_: int Number of outputs.sparse_output_: bool True if the array returned from predict is to be in sparse CSC format. Is automatically set to True if the inputyis passed in sparse format.
See Also
DummyRegressor: Regressor that makes predictions using simple rules.
Examples
import numpy as np
from sklearn.dummy import DummyClassifier
X = np.array([-1, 1, 1, 1])
y = np.array([0, 1, 1, 1])
dummy_clf = DummyClassifier(strategy="most_frequent")
dummy_clf.fit(X, y)
DummyClassifier(strategy='most_frequent')
dummy_clf.predict(X)
array([1, 1, 1, 1])
dummy_clf.score(X, y)
0.75
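A Clojure sketch of this baseline classifier; the :strategy string is assumed to be passed through unchanged to scikit-learn via the Python interop:
;; Sketch: "most_frequent" is one of the strategies listed above.
(ml/model {:model-type :sklearn.classification/dummy-classifier
           :strategy "most_frequent"})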
24.2.9 /extra-tree-classifier
| name | type | default | description |
|---|---|---|---|
| min-weight-fraction-leaf | |||
| max-leaf-nodes | |||
| min-impurity-decrease | |||
| min-samples-split | |||
| ccp-alpha | |||
| splitter | |||
| random-state | |||
| min-samples-leaf | |||
| max-features | |||
| monotonic-cst | |||
| max-depth | |||
| class-weight | |||
| criterion | |||
| predict-proba? |
An extremely randomized tree classifier.
Extra-trees differ from classic decision trees in the way they are built. When looking for the best split to separate the samples of a node into two groups, random splits are drawn for each of the max_features randomly selected features and the best split among those is chosen. When max_features is set to 1, this amounts to building a totally random decision tree.
Warning: Extra-trees should only be used within ensemble methods.
Read more in the User Guide: tree.
Parameters
criterion: {"gini", "entropy", "log_loss"}, default="gini" The function to measure the quality of a split. Supported criteria are "gini" for the Gini impurity and "log_loss" and "entropy" both for the Shannon information gain, see :ref:tree_mathematical_formulation.splitter: {"random", "best"}, default="random" The strategy used to choose the split at each node. Supported strategies are "best" to choose the best split and "random" to choose the best random split.max_depth: int, default=None The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.min_samples_split: int or float, default=2 The minimum number of samples required to split an internal node:- If int, then consider
min_samples_splitas the minimum number. - If float, then
min_samples_splitis a fraction andceil(min_samples_split * n_samples)are the minimum number of samples for each split.
Changed in 0.18 Added float values for fractions.
- If int, then consider
min_samples_leaf: int or float, default=1 The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at leastmin_samples_leaftraining samples in each of the left and right branches. This may have the effect of smoothing the model, especially in regression.- If int, then consider
min_samples_leafas the minimum number. - If float, then
min_samples_leafis a fraction andceil(min_samples_leaf * n_samples)are the minimum number of samples for each node.
Changed in 0.18 Added float values for fractions.
- If int, then consider
min_weight_fraction_leaf: float, default=0.0 The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sample_weight is not provided.max_features: int, float, {"sqrt", "log2"} or None, default="sqrt" The number of features to consider when looking for the best split:- If int, then consider
max_featuresfeatures at each split. - If float, then
max_featuresis a fraction andmax(1, int(max_features * n_features_in_))features are considered at each split. - If "sqrt", then
max_features=sqrt(n_features). - If "log2", then
max_features=log2(n_features). - If None, then
max_features=n_features.
Changed in 1.1 The default of
max_featureschanged from"auto"to"sqrt".Note: the search for a split does not stop until at least one valid partition of the node samples is found, even if it requires to effectively inspect more than
max_featuresfeatures.- If int, then consider
random_state: int, RandomState instance or None, default=None Used to pick randomly themax_featuresused at each split. SeeGlossaryfor details.max_leaf_nodes: int, default=None Grow a tree withmax_leaf_nodesin best-first fashion. Best nodes are defined as relative reduction in impurity. If None then unlimited number of leaf nodes.min_impurity_decrease: float, default=0.0 A node will be split if this split induces a decrease of the impurity greater than or equal to this value.The weighted impurity decrease equation is the following
N_t / N * (impurity - N_t_R / N_t * right_impurity
- N_t_L / N_t * left_impurity)
where N is the total number of samples, N_t is the number of samples at the current node, N_t_L is the number of samples in the left child, and N_t_R is the number of samples in the right child.
N, N_t, N_t_R and N_t_L all refer to the weighted sum, if sample_weight is passed.
Added in 0.19
class_weight: dict, list of dict or "balanced", default=None Weights associated with classes in the form{class_label: weight}. If None, all classes are supposed to have weight one. For multi-output problems, a list of dicts can be provided in the same order as the columns of y.Note that for multioutput (including multilabel) weights should be defined for each class of every column in its own dict. For example, for four-class multilabel classification weights should be [{0: 1, 1: 1}, {0: 1, 1: 5}, {0: 1, 1: 1}, {0: 1, 1: 1}] instead of [{1:1}, {2:5}, {3:1}, {4:1}].
The "balanced" mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as
n_samples / (n_classes * np.bincount(y))For multi-output, the weights of each column of y will be multiplied.
Note that these weights will be multiplied with sample_weight (passed through the fit method) if sample_weight is specified.
ccp_alpha: non-negative float, default=0.0 Complexity parameter used for Minimal Cost-Complexity Pruning. The subtree with the largest cost complexity that is smaller thanccp_alphawill be chosen. By default, no pruning is performed. See :ref:minimal_cost_complexity_pruningfor details. See :ref:sphx_glr_auto_examples_tree_plot_cost_complexity_pruning.pyfor an example of such pruning.Added in 0.22
monotonic_cst: array-like of int of shape (n_features), default=None Indicates the monotonicity constraint to enforce on each feature. - 1: monotonic increase - 0: no constraint - -1: monotonic decreaseIf monotonic_cst is None, no constraints are applied.
Monotonicity constraints are not supported for: - multiclass classifications (i.e. when
n_classes > 2), - multioutput classifications (i.e. whenn_outputs_ > 1), - classifications trained on data with missing values.The constraints hold over the probability of the positive class.
Read more in the User Guide:
monotonic_cst_gbdt.Added in 1.4
Attributes
classes_: ndarray of shape (n_classes,) or list of ndarray The classes labels (single output problem), or a list of arrays of class labels (multi-output problem).max_features_: int The inferred value of max_features.n_classes_: int or list of int The number of classes (for single output problems), or a list containing the number of classes for each output (for multi-output problems).feature_importances_: ndarray of shape (n_features,) The impurity-based feature importances. The higher, the more important the feature. The importance of a feature is computed as the (normalized) total reduction of the criterion brought by that feature. It is also known as the Gini importance.Warning: impurity-based feature importances can be misleading for high cardinality features (many unique values). See
sklearn.inspection.permutation_importanceas an alternative.n_features_in_: int Number of features seen duringfit.Added in 0.24
feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen duringfit. Defined only whenXhas feature names that are all strings.Added in 1.0
n_outputs_: int The number of outputs whenfitis performed.tree_: Tree instance The underlying Tree object. Please refer tohelp(sklearn.tree._tree.Tree)for attributes of Tree object and :ref:sphx_glr_auto_examples_tree_plot_unveil_tree_structure.pyfor basic usage of these attributes.
See Also
ExtraTreeRegressor: An extremely randomized tree regressor.sklearn.ensemble.ExtraTreesClassifier: An extra-trees classifier.sklearn.ensemble.ExtraTreesRegressor: An extra-trees regressor.sklearn.ensemble.RandomForestClassifier: A random forest classifier.sklearn.ensemble.RandomForestRegressor: A random forest regressor.sklearn.ensemble.RandomTreesEmbedding: An ensemble of totally random trees.
Notes
The default values for the parameters controlling the size of the trees (e.g. max_depth, min_samples_leaf, etc.) lead to fully grown and unpruned trees which can potentially be very large on some data sets. To reduce memory consumption, the complexity and size of the trees should be controlled by setting those parameter values.
References
- [1] P. Geurts, D. Ernst., and L. Wehenkel, "Extremely randomized trees", Machine Learning, 63(1), 3-42, 2006.
Examples
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import ExtraTreeClassifier
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=0)
extra_tree = ExtraTreeClassifier(random_state=0)
cls = BaggingClassifier(extra_tree, random_state=0).fit(
    X_train, y_train)
cls.score(X_test, y_test)
0.8947
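On the Clojure side, a single extra tree can be specified as below. As noted above, extra trees are normally used inside an ensemble, so this is only a sketch of the option keys with illustrative values:
;; Sketch: :random-state fixed only to make the random splits reproducible.
(ml/model {:model-type :sklearn.classification/extra-tree-classifier
           :max-features 1
           :random-state 0})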
24.2.10 /extra-trees-classifier
| name | type | default | description |
|---|---|---|---|
| min-weight-fraction-leaf | |||
| max-leaf-nodes | |||
| min-impurity-decrease | |||
| min-samples-split | |||
| bootstrap | |||
| ccp-alpha | |||
| n-jobs | |||
| random-state | |||
| oob-score | |||
| min-samples-leaf | |||
| max-features | |||
| monotonic-cst | |||
| warm-start | |||
| max-depth | |||
| class-weight | |||
| n-estimators | |||
| max-samples | |||
| criterion | |||
| verbose | |||
| predict-proba? |
An extra-trees classifier.
This class implements a meta estimator that fits a number of randomized decision trees (a.k.a. extra-trees) on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting.
This estimator has native support for missing values (NaNs) for random splits. During training, a random threshold will be chosen to split the non-missing values on. Then the non-missing values will be sent to the left and right child based on the randomly selected threshold, while the missing values will also be randomly sent to the left or right child. This is repeated for every feature considered at each split. The best split among these is chosen.
Read more in the User Guide: forest.
Parameters
n_estimators: int, default=100 The number of trees in the forest.Changed in 0.22 The default value of
n_estimatorschanged from 10 to 100 in 0.22.criterion: {"gini", "entropy", "log_loss"}, default="gini" The function to measure the quality of a split. Supported criteria are "gini" for the Gini impurity and "log_loss" and "entropy" both for the Shannon information gain, see :ref:tree_mathematical_formulation. Note: This parameter is tree-specific.max_depth: int, default=None The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.min_samples_split: int or float, default=2 The minimum number of samples required to split an internal node:- If int, then consider
min_samples_splitas the minimum number. - If float, then
min_samples_splitis a fraction andceil(min_samples_split * n_samples)are the minimum number of samples for each split.
Changed in 0.18 Added float values for fractions.
- If int, then consider
min_samples_leaf: int or float, default=1 The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at leastmin_samples_leaftraining samples in each of the left and right branches. This may have the effect of smoothing the model, especially in regression.- If int, then consider
min_samples_leafas the minimum number. - If float, then
min_samples_leafis a fraction andceil(min_samples_leaf * n_samples)are the minimum number of samples for each node.
Changed in 0.18 Added float values for fractions.
- If int, then consider
min_weight_fraction_leaf: float, default=0.0 The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sample_weight is not provided.max_features: {"sqrt", "log2", None}, int or float, default="sqrt" The number of features to consider when looking for the best split:- If int, then consider
max_featuresfeatures at each split. - If float, then
max_featuresis a fraction andmax(1, int(max_features * n_features_in_))features are considered at each split. - If "sqrt", then
max_features=sqrt(n_features). - If "log2", then
max_features=log2(n_features). - If None, then
max_features=n_features.
Changed in 1.1 The default of
max_featureschanged from"auto"to"sqrt".Note: the search for a split does not stop until at least one valid partition of the node samples is found, even if it requires to effectively inspect more than
max_featuresfeatures.- If int, then consider
max_leaf_nodes: int, default=None Grow trees withmax_leaf_nodesin best-first fashion. Best nodes are defined as relative reduction in impurity. If None then unlimited number of leaf nodes.min_impurity_decrease: float, default=0.0 A node will be split if this split induces a decrease of the impurity greater than or equal to this value.The weighted impurity decrease equation is the following
N_t / N * (impurity - N_t_R / N_t * right_impurity
                    - N_t_L / N_t * left_impurity)
where N is the total number of samples, N_t is the number of samples at the current node, N_t_L is the number of samples in the left child, and N_t_R is the number of samples in the right child.
N, N_t, N_t_R and N_t_L all refer to the weighted sum, if sample_weight is passed.
Added in 0.19
bootstrap: bool, default=False Whether bootstrap samples are used when building trees. If False, the whole dataset is used to build each tree.oob_score: bool or callable, default=False Whether to use out-of-bag samples to estimate the generalization score. By default,~sklearn.metrics.accuracy_scoreis used. Provide a callable with signaturemetric(y_true, y_pred)to use a custom metric. Only available ifbootstrap=True.For an illustration of out-of-bag (OOB) error estimation, see the example :ref:
sphx_glr_auto_examples_ensemble_plot_ensemble_oob.py.n_jobs: int, default=None The number of jobs to run in parallel.fit,predict,decision_pathandapplyare all parallelized over the trees.Nonemeans 1 unless in ajoblib.parallel_backendcontext.-1means using all processors. SeeGlossaryfor more details.random_state: int, RandomState instance or None, default=None Controls 3 sources of randomness:- the bootstrapping of the samples used when building trees (if
bootstrap=True) - the sampling of the features to consider when looking for the best split at each node (if
max_features < n_features) - the draw of the splits for each of the
max_features
See
Glossaryfor details.- the bootstrapping of the samples used when building trees (if
verbose: int, default=0 Controls the verbosity when fitting and predicting.warm_start: bool, default=False When set toTrue, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just fit a whole new forest. SeeGlossaryand :ref:tree_ensemble_warm_startfor details.class_weight: {"balanced", "balanced_subsample"}, dict or list of dicts, default=None Weights associated with classes in the form{class_label: weight}. If not given, all classes are supposed to have weight one. For multi-output problems, a list of dicts can be provided in the same order as the columns of y.Note that for multioutput (including multilabel) weights should be defined for each class of every column in its own dict. For example, for four-class multilabel classification weights should be [{0: 1, 1: 1}, {0: 1, 1: 5}, {0: 1, 1: 1}, {0: 1, 1: 1}] instead of [{1:1}, {2:5}, {3:1}, {4:1}].
The "balanced" mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as
n_samples / (n_classes * np.bincount(y))The "balanced_subsample" mode is the same as "balanced" except that weights are computed based on the bootstrap sample for every tree grown.
For multi-output, the weights of each column of y will be multiplied.
Note that these weights will be multiplied with sample_weight (passed through the fit method) if sample_weight is specified.
ccp_alpha: non-negative float, default=0.0 Complexity parameter used for Minimal Cost-Complexity Pruning. The subtree with the largest cost complexity that is smaller thanccp_alphawill be chosen. By default, no pruning is performed. See :ref:minimal_cost_complexity_pruningfor details. See :ref:sphx_glr_auto_examples_tree_plot_cost_complexity_pruning.pyfor an example of such pruning.Added in 0.22
max_samples: int or float, default=None If bootstrap is True, the number of samples to draw from X to train each base estimator.- If None (default), then draw
X.shape[0]samples. - If int, then draw
max_samplessamples. - If float, then draw
max_samples * X.shape[0]samples. Thus,max_samplesshould be in the interval(0.0, 1.0].
Added in 0.22
- If None (default), then draw
monotonic_cst: array-like of int of shape (n_features), default=None Indicates the monotonicity constraint to enforce on each feature. - 1: monotonically increasing - 0: no constraint - -1: monotonically decreasingIf monotonic_cst is None, no constraints are applied.
Monotonicity constraints are not supported for: - multiclass classifications (i.e. when
n_classes > 2), - multioutput classifications (i.e. whenn_outputs_ > 1), - classifications trained on data with missing values.The constraints hold over the probability of the positive class.
Read more in the User Guide:
monotonic_cst_gbdt.Added in 1.4
Attributes
estimator_:~sklearn.tree.ExtraTreeClassifierThe child estimator template used to create the collection of fitted sub-estimators.Added in 1.2
base_estimator_was renamed toestimator_.estimators_: list of DecisionTreeClassifier The collection of fitted sub-estimators.classes_: ndarray of shape (n_classes,) or a list of such arrays The classes labels (single output problem), or a list of arrays of class labels (multi-output problem).n_classes_: int or list The number of classes (single output problem), or a list containing the number of classes for each output (multi-output problem).feature_importances_: ndarray of shape (n_features,) The impurity-based feature importances. The higher, the more important the feature. The importance of a feature is computed as the (normalized) total reduction of the criterion brought by that feature. It is also known as the Gini importance.Warning: impurity-based feature importances can be misleading for high cardinality features (many unique values). See
sklearn.inspection.permutation_importanceas an alternative.n_features_in_: int Number of features seen duringfit.Added in 0.24
feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen duringfit. Defined only whenXhas feature names that are all strings.Added in 1.0
n_outputs_: int The number of outputs whenfitis performed.oob_score_: float Score of the training dataset obtained using an out-of-bag estimate. This attribute exists only whenoob_scoreis True.oob_decision_function_: ndarray of shape (n_samples, n_classes) or (n_samples, n_classes, n_outputs) Decision function computed with out-of-bag estimate on the training set. If n_estimators is small it might be possible that a data point was never left out during the bootstrap. In this case,oob_decision_function_might contain NaN. This attribute exists only whenoob_scoreis True.estimators_samples_: list of arrays The subset of drawn samples (i.e., the in-bag samples) for each base estimator. Each subset is defined by an array of the indices selected.Added in 1.4
See Also
ExtraTreesRegressor: An extra-trees regressor with random splits.RandomForestClassifier: A random forest classifier with optimal splits.RandomForestRegressor: Ensemble regressor using trees with optimal splits.
Notes
The default values for the parameters controlling the size of the trees (e.g. max_depth, min_samples_leaf, etc.) lead to fully grown and unpruned trees which can potentially be very large on some data sets. To reduce memory consumption, the complexity and size of the trees should be controlled by setting those parameter values.
References
- [1] P. Geurts, D. Ernst., and L. Wehenkel, "Extremely randomized trees", Machine Learning, 63(1), 3-42, 2006.
Examples
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.datasets import make_classification
X, y = make_classification(n_features=4, random_state=0)
clf = ExtraTreesClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)
ExtraTreesClassifier(random_state=0)
clf.predict([[0, 0, 0, 0]])
array([1])
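The same estimator can be used from Clojure through a metamorph pipeline. Below is a minimal sketch, assuming the model-type keyword follows this section's name (:sklearn.classification/extra-trees-classifier) and that the kebab-case keys from the parameter table above map directly onto the Python arguments; the parameter values and the toy dataset are illustrative only.
(def extra-trees-pipe
  (mm/pipeline
   (ds-mm/set-inference-target 2)
   {:metamorph/id :model}
   ;; illustrative hyperparameters; any key from the parameter table above
   ;; can in principle be passed here in kebab-case
   (ml/model {:model-type :sklearn.classification/extra-trees-classifier
              :n-estimators 100
              :max-depth 5})))
(def extra-trees-fitted
  (extra-trees-pipe {:metamorph/data (dst/tensor->dataset [[0 0 1] [1 0 1] [0 1 0] [1 1 0]])
                     :metamorph/mode :fit}))
Predictions on new data are then obtained by running the fitted context through mm/transform-pipe, in the same way as for any other model in this chapter.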
24.2.11 /gaussian-nb
| name | type | default | description |
|---|---|---|---|
| priors | |||
| var-smoothing | |||
| predict-proba? |
Gaussian Naive Bayes (GaussianNB).
Can perform online updates to model parameters via partial_fit. For details on algorithm used to update feature means and variance online, see Stanford CS tech report STAN-CS-79-773 by Chan, Golub, and LeVeque.
Read more in the User Guide: gaussian_naive_bayes.
Parameters
priors: array-like of shape (n_classes,), default=None Prior probabilities of the classes. If specified, the priors are not adjusted according to the data.var_smoothing: float, default=1e-9 Portion of the largest variance of all features that is added to variances for calculation stability.Added in 0.20
Attributes
class_count_: ndarray of shape (n_classes,) number of training samples observed in each class.class_prior_: ndarray of shape (n_classes,) probability of each class.classes_: ndarray of shape (n_classes,) class labels known to the classifier.epsilon_: float absolute additive value to variances.n_features_in_: int Number of features seen duringfit.Added in 0.24
feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen duringfit. Defined only whenXhas feature names that are all strings.Added in 1.0
var_: ndarray of shape (n_classes, n_features) Variance of each feature per class.Added in 1.0
theta_: ndarray of shape (n_classes, n_features) mean of each feature per class.
See Also
BernoulliNB: Naive Bayes classifier for multivariate Bernoulli models.CategoricalNB: Naive Bayes classifier for categorical features.ComplementNB: Complement Naive Bayes classifier.MultinomialNB: Naive Bayes classifier for multinomial models.
Examples
import numpy as np
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
Y = np.array([1, 1, 1, 2, 2, 2])
from sklearn.naive_bayes import GaussianNB
clf = GaussianNB()
clf.fit(X, Y)
GaussianNB()
print(clf.predict([[-0.8, -1]]))
[1]
clf_pf = GaussianNB()
clf_pf.partial_fit(X, Y, np.unique(Y))
GaussianNB()
print(clf_pf.predict([[-0.8, -1]]))
[1]
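A hedged Clojure sketch of the corresponding model step, assuming the model-type keyword is :sklearn.classification/gaussian-nb and that var_smoothing is passed via its kebab-case key (the value shown is simply the documented default):
;; model step only; plug it into a mm/pipeline as in the extra-trees sketch above
(ml/model {:model-type :sklearn.classification/gaussian-nb
           :var-smoothing 1e-9})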
24.2.12 /gaussian-process-classifier
| name | type | default | description |
|---|---|---|---|
| kernel | |||
| optimizer | |||
| multi-class | |||
| n-jobs | |||
| random-state | |||
| max-iter-predict | |||
| copy-x-train | |||
| n-restarts-optimizer | |||
| warm-start | |||
| predict-proba? |
Gaussian process classification (GPC) based on Laplace approximation.
The implementation is based on Algorithm 3.1, 3.2, and 5.1 from [RW2006]_.
Internally, the Laplace approximation is used for approximating the non-Gaussian posterior by a Gaussian.
Currently, the implementation is restricted to using the logistic link function. For multi-class classification, several binary one-versus rest classifiers are fitted. Note that this class thus does not implement a true multi-class Laplace approximation.
Read more in the User Guide: gaussian_process.
Added in 0.18
Parameters
kernel: kernel instance, default=None The kernel specifying the covariance function of the GP. If None is passed, the kernel "1.0 * RBF(1.0)" is used as default. Note that the kernel's hyperparameters are optimized during fitting. Also kernel cannot be aCompoundKernel.optimizer: 'fmin_l_bfgs_b', callable or None, default='fmin_l_bfgs_b' Can either be one of the internally supported optimizers for optimizing the kernel's parameters, specified by a string, or an externally defined optimizer passed as a callable. If a callable is passed, it must have the signature
def optimizer(obj_func, initial_theta, bounds):
# * 'obj_func' is the objective function to be maximized, which
# takes the hyperparameters theta as parameter and an
# optional flag eval_gradient, which determines if the
# gradient is returned additionally to the function value
# * 'initial_theta': the initial value for theta, which can be
# used by local optimizers
# * 'bounds': the bounds on the values of theta
....
# Returned are the best found hyperparameters theta and
# the corresponding value of the target function.
return theta_opt, func_min
Per default, the 'L-BFGS-B' algorithm from scipy.optimize.minimize is used. If None is passed, the kernel's parameters are kept fixed. Available internal optimizers are: 'fmin_l_bfgs_b'.
n_restarts_optimizer: int, default=0 The number of restarts of the optimizer for finding the kernel's parameters which maximize the log-marginal likelihood. The first run of the optimizer is performed from the kernel's initial parameters, the remaining ones (if any) from thetas sampled log-uniform randomly from the space of allowed theta-values. If greater than 0, all bounds must be finite. Note that n_restarts_optimizer=0 implies that one run is performed.max_iter_predict: int, default=100 The maximum number of iterations in Newton's method for approximating the posterior during predict. Smaller values will reduce computation time at the cost of worse results.warm_start: bool, default=False If warm-starts are enabled, the solution of the last Newton iteration on the Laplace approximation of the posterior mode is used as initialization for the next call of _posterior_mode(). This can speed up convergence when _posterior_mode is called several times on similar problems as in hyperparameter optimization. Seethe Glossary.copy_X_train: bool, default=True If True, a persistent copy of the training data is stored in the object. Otherwise, just a reference to the training data is stored, which might cause predictions to change if the data is modified externally.random_state: int, RandomState instance or None, default=None Determines random number generation used to initialize the centers. Pass an int for reproducible results across multiple function calls. SeeGlossary.multi_class: {'one_vs_rest', 'one_vs_one'}, default='one_vs_rest' Specifies how multi-class classification problems are handled. Supported are 'one_vs_rest' and 'one_vs_one'. In 'one_vs_rest', one binary Gaussian process classifier is fitted for each class, which is trained to separate this class from the rest. In 'one_vs_one', one binary Gaussian process classifier is fitted for each pair of classes, which is trained to separate these two classes. The predictions of these binary predictors are combined into multi-class predictions. Note that 'one_vs_one' does not support predicting probability estimates.n_jobs: int, default=None The number of jobs to use for the computation: the specified multiclass problems are computed in parallel.Nonemeans 1 unless in ajoblib.parallel_backendcontext.-1means using all processors. SeeGlossaryfor more details.
Attributes
base_estimator_:Estimatorinstance The estimator instance that defines the likelihood function using the observed data.kernel_: kernel instance The kernel used for prediction. In case of binary classification, the structure of the kernel is the same as the one passed as parameter but with optimized hyperparameters. In case of multi-class classification, a CompoundKernel is returned which consists of the different kernels used in the one-versus-rest classifiers.log_marginal_likelihood_value_: float The log-marginal-likelihood ofself.kernel_.thetaclasses_: array-like of shape (n_classes,) Unique class labels.n_classes_: int The number of classes in the training datan_features_in_: int Number of features seen duringfit.Added in 0.24
feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen duringfit. Defined only whenXhas feature names that are all strings.Added in 1.0
See Also
GaussianProcessRegressor: Gaussian process regression (GPR).
References
.. [RW2006] Carl E. Rasmussen and Christopher K.I. Williams, "Gaussian Processes for Machine Learning", MIT Press 2006
Examples
from sklearn.datasets import load_iris
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF
X, y = load_iris(return_X_y=True)
kernel = 1.0 * RBF(1.0)
gpc = GaussianProcessClassifier(kernel=kernel,
random_state=0).fit(X, y)
gpc.score(X, y)
0.9866...
gpc.predict_proba(X[:2,:])
array([[0.83548752, 0.03228706, 0.13222543],
[0.79064206, 0.06525643, 0.14410151]])
For a comparison of the GaussianProcessClassifier with other classifiers see: :ref:sphx_glr_auto_examples_classification_plot_classification_probability.py.
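In Clojure, the classifier should be reachable via the model-type keyword suggested by this section's name. A minimal, hedged sketch of the model step (the parameter values are illustrative only):
;; assumes :sklearn.classification/gaussian-process-classifier is the registered model type
(ml/model {:model-type :sklearn.classification/gaussian-process-classifier
           :n-restarts-optimizer 2
           :max-iter-predict 100})
The kernel itself defaults to 1.0 * RBF(1.0) as documented above; passing a custom kernel object would require constructing it via Python interop, which is not shown here.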
24.2.13 /gradient-boosting-classifier
| name | type | default | description |
|---|---|---|---|
| n-iter-no-change | |||
| learning-rate | |||
| min-weight-fraction-leaf | |||
| max-leaf-nodes | |||
| min-impurity-decrease | |||
| min-samples-split | |||
| tol | |||
| subsample | |||
| ccp-alpha | |||
| random-state | |||
| min-samples-leaf | |||
| max-features | |||
| init | |||
| warm-start | |||
| max-depth | |||
| validation-fraction | |||
| n-estimators | |||
| criterion | |||
| loss | |||
| verbose | |||
| predict-proba? |
Gradient Boosting for classification.
This algorithm builds an additive model in a forward stage-wise fashion; it allows for the optimization of arbitrary differentiable loss functions. In each stage n_classes_ regression trees are fit on the negative gradient of the loss function, e.g. binary or multiclass log loss. Binary classification is a special case where only a single regression tree is induced.
~sklearn.ensemble.HistGradientBoostingClassifier is a much faster variant of this algorithm for intermediate and large datasets (n_samples >= 10_000) and supports monotonic constraints.
Read more in the User Guide: gradient_boosting.
Parameters
loss: {'log_loss', 'exponential'}, default='log_loss' The loss function to be optimized. 'log_loss' refers to binomial and multinomial deviance, the same as used in logistic regression. It is a good choice for classification with probabilistic outputs. For loss 'exponential', gradient boosting recovers the AdaBoost algorithm.learning_rate: float, default=0.1 Learning rate shrinks the contribution of each tree bylearning_rate. There is a trade-off between learning_rate and n_estimators. Values must be in the range[0.0, inf).For an example of the effects of this parameter and its interaction with
subsample, see :ref:sphx_glr_auto_examples_ensemble_plot_gradient_boosting_regularization.py.n_estimators: int, default=100 The number of boosting stages to perform. Gradient boosting is fairly robust to over-fitting so a large number usually results in better performance. Values must be in the range[1, inf).subsample: float, default=1.0 The fraction of samples to be used for fitting the individual base learners. If smaller than 1.0 this results in Stochastic Gradient Boosting.subsampleinteracts with the parametern_estimators. Choosingsubsample < 1.0leads to a reduction of variance and an increase in bias. Values must be in the range(0.0, 1.0].criterion: {'friedman_mse', 'squared_error'}, default='friedman_mse' The function to measure the quality of a split. Supported criteria are 'friedman_mse' for the mean squared error with improvement score by Friedman, 'squared_error' for mean squared error. The default value of 'friedman_mse' is generally the best as it can provide a better approximation in some cases.Added in 0.18
min_samples_split: int or float, default=2 The minimum number of samples required to split an internal node:- If int, values must be in the range
[2, inf). - If float, values must be in the range
(0.0, 1.0]andmin_samples_splitwill beceil(min_samples_split * n_samples).
Changed in 0.18 Added float values for fractions.
- If int, values must be in the range
min_samples_leaf: int or float, default=1 The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at leastmin_samples_leaftraining samples in each of the left and right branches. This may have the effect of smoothing the model, especially in regression.- If int, values must be in the range
[1, inf). - If float, values must be in the range
(0.0, 1.0)andmin_samples_leafwill beceil(min_samples_leaf * n_samples).
Changed in 0.18 Added float values for fractions.
- If int, values must be in the range
min_weight_fraction_leaf: float, default=0.0 The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sample_weight is not provided. Values must be in the range[0.0, 0.5].max_depth: int or None, default=3 Maximum depth of the individual regression estimators. The maximum depth limits the number of nodes in the tree. Tune this parameter for best performance; the best value depends on the interaction of the input variables. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples. If int, values must be in the range[1, inf).min_impurity_decrease: float, default=0.0 A node will be split if this split induces a decrease of the impurity greater than or equal to this value. Values must be in the range[0.0, inf).The weighted impurity decrease equation is the following
N_t / N * (impurity - N_t_R / N_t * right_impurity
                    - N_t_L / N_t * left_impurity)
where N is the total number of samples, N_t is the number of samples at the current node, N_t_L is the number of samples in the left child, and N_t_R is the number of samples in the right child.
N, N_t, N_t_R and N_t_L all refer to the weighted sum, if sample_weight is passed.
Added in 0.19
init: estimator or 'zero', default=None An estimator object that is used to compute the initial predictions.inithas to providefitandpredict_proba. If 'zero', the initial raw predictions are set to zero. By default, aDummyEstimatorpredicting the classes priors is used.random_state: int, RandomState instance or None, default=None Controls the random seed given to each Tree estimator at each boosting iteration. In addition, it controls the random permutation of the features at each split (see Notes for more details). It also controls the random splitting of the training data to obtain a validation set ifn_iter_no_changeis not None. Pass an int for reproducible output across multiple function calls. SeeGlossary.max_features: {'sqrt', 'log2'}, int or float, default=None The number of features to consider when looking for the best split:- If int, values must be in the range
[1, inf). - If float, values must be in the range
(0.0, 1.0]and the features considered at each split will bemax(1, int(max_features * n_features_in_)). - If 'sqrt', then
max_features=sqrt(n_features). - If 'log2', then
max_features=log2(n_features). - If None, then
max_features=n_features.
Choosing
max_features < n_featuresleads to a reduction of variance and an increase in bias.Note: the search for a split does not stop until at least one valid partition of the node samples is found, even if it requires to effectively inspect more than
max_featuresfeatures.- If int, values must be in the range
verbose: int, default=0 Enable verbose output. If 1 then it prints progress and performance once in a while (the more trees the lower the frequency). If greater than 1 then it prints progress and performance for every tree. Values must be in the range[0, inf).max_leaf_nodes: int, default=None Grow trees withmax_leaf_nodesin best-first fashion. Best nodes are defined as relative reduction in impurity. Values must be in the range[2, inf). IfNone, then unlimited number of leaf nodes.warm_start: bool, default=False When set toTrue, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just erase the previous solution. Seethe Glossary.validation_fraction: float, default=0.1 The proportion of training data to set aside as validation set for early stopping. Values must be in the range(0.0, 1.0). Only used ifn_iter_no_changeis set to an integer.Added in 0.20
n_iter_no_change: int, default=Nonen_iter_no_changeis used to decide if early stopping will be used to terminate training when validation score is not improving. By default it is set to None to disable early stopping. If set to a number, it will set asidevalidation_fractionsize of the training data as validation and terminate training when validation score is not improving in all of the previousn_iter_no_changenumbers of iterations. The split is stratified. Values must be in the range[1, inf). See :ref:sphx_glr_auto_examples_ensemble_plot_gradient_boosting_early_stopping.py.Added in 0.20
tol: float, default=1e-4 Tolerance for the early stopping. When the loss is not improving by at least tol forn_iter_no_changeiterations (if set to a number), the training stops. Values must be in the range[0.0, inf).Added in 0.20
ccp_alpha: non-negative float, default=0.0 Complexity parameter used for Minimal Cost-Complexity Pruning. The subtree with the largest cost complexity that is smaller thanccp_alphawill be chosen. By default, no pruning is performed. Values must be in the range[0.0, inf). See :ref:minimal_cost_complexity_pruningfor details. See :ref:sphx_glr_auto_examples_tree_plot_cost_complexity_pruning.pyfor an example of such pruning.Added in 0.22
Attributes
n_estimators_: int The number of estimators as selected by early stopping (ifn_iter_no_changeis specified). Otherwise it is set ton_estimators.Added in 0.20
n_trees_per_iteration_: int The number of trees that are built at each iteration. For binary classifiers, this is always 1.Added in 1.4.0
feature_importances_: ndarray of shape (n_features,) The impurity-based feature importances. The higher, the more important the feature. The importance of a feature is computed as the (normalized) total reduction of the criterion brought by that feature. It is also known as the Gini importance.Warning: impurity-based feature importances can be misleading for high cardinality features (many unique values). See
sklearn.inspection.permutation_importanceas an alternative.oob_improvement_: ndarray of shape (n_estimators,) The improvement in loss on the out-of-bag samples relative to the previous iteration.oob_improvement_[0]is the improvement in loss of the first stage over theinitestimator. Only available ifsubsample < 1.0.oob_scores_: ndarray of shape (n_estimators,) The full history of the loss values on the out-of-bag samples. Only available ifsubsample < 1.0.Added in 1.3
oob_score_: float The last value of the loss on the out-of-bag samples. It is the same asoob_scores_[-1]. Only available ifsubsample < 1.0.Added in 1.3
train_score_: ndarray of shape (n_estimators,) The i-th scoretrain_score_[i]is the loss of the model at iterationion the in-bag sample. Ifsubsample == 1this is the loss on the training data.init_: estimator The estimator that provides the initial predictions. Set via theinitargument.estimators_: ndarray of DecisionTreeRegressor of shape (n_estimators,n_trees_per_iteration_) The collection of fitted sub-estimators.n_trees_per_iteration_is 1 for binary classification, otherwisen_classes.classes_: ndarray of shape (n_classes,) The classes labels.n_features_in_: int Number of features seen duringfit.Added in 0.24
feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen duringfit. Defined only whenXhas feature names that are all strings.Added in 1.0
n_classes_: int The number of classes.max_features_: int The inferred value of max_features.
See Also
HistGradientBoostingClassifier: Histogram-based Gradient Boosting Classification Tree.sklearn.tree.DecisionTreeClassifier: A decision tree classifier.RandomForestClassifier: A meta-estimator that fits a number of decision tree classifiers on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting.AdaBoostClassifier: A meta-estimator that begins by fitting a classifier on the original dataset and then fits additional copies of the classifier on the same dataset where the weights of incorrectly classified instances are adjusted such that subsequent classifiers focus more on difficult cases.
Notes
The features are always randomly permuted at each split. Therefore, the best found split may vary, even with the same training data and max_features=n_features, if the improvement of the criterion is identical for several splits enumerated during the search of the best split. To obtain a deterministic behaviour during fitting, random_state has to be fixed.
References
J. Friedman, Greedy Function Approximation: A Gradient Boosting Machine, The Annals of Statistics, Vol. 29, No. 5, 2001.
J. Friedman, Stochastic Gradient Boosting, 1999
T. Hastie, R. Tibshirani and J. Friedman. Elements of Statistical Learning Ed. 2, Springer, 2009.
Examples
The following example shows how to fit a gradient boosting classifier with 100 decision stumps as weak learners.
from sklearn.datasets import make_hastie_10_2
from sklearn.ensemble import GradientBoostingClassifier
X, y = make_hastie_10_2(random_state=0)
X_train, X_test = X[:2000], X[2000:]
y_train, y_test = y[:2000], y[2000:]
clf = GradientBoostingClassifier(n_estimators=100, learning_rate=1.0,
max_depth=1, random_state=0).fit(X_train, y_train)
clf.score(X_test, y_test)
0.913
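The hyperparameters of the Python example translate directly into kebab-case keys. A hedged sketch of the equivalent Clojure model step, assuming the model-type keyword is :sklearn.classification/gradient-boosting-classifier:
;; mirrors the n_estimators / learning_rate / max_depth settings of the example above
(ml/model {:model-type :sklearn.classification/gradient-boosting-classifier
           :n-estimators 100
           :learning-rate 1.0
           :max-depth 1
           :random-state 0})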
24.2.14 /hist-gradient-boosting-classifier
| name | type | default | description |
|---|---|---|---|
| n-iter-no-change | |||
| learning-rate | |||
| max-leaf-nodes | |||
| scoring | |||
| tol | |||
| early-stopping | |||
| max-iter | |||
| random-state | |||
| max-bins | |||
| min-samples-leaf | |||
| max-features | |||
| monotonic-cst | |||
| warm-start | |||
| max-depth | |||
| validation-fraction | |||
| class-weight | |||
| loss | |||
| interaction-cst | |||
| verbose | |||
| categorical-features | |||
| l-2-regularization | |||
| predict-proba? |
Histogram-based Gradient Boosting Classification Tree.
This estimator is much faster than GradientBoostingClassifier for big datasets (n_samples >= 10 000).
This estimator has native support for missing values (NaNs). During training, the tree grower learns at each split point whether samples with missing values should go to the left or right child, based on the potential gain. When predicting, samples with missing values are assigned to the left or right child accordingly. If no missing values were encountered for a given feature during training, then samples with missing values are mapped to whichever child has the most samples.
This implementation is inspired by LightGBM .
Read more in the User Guide: histogram_based_gradient_boosting.
Added in 0.21
Parameters
loss: {'log_loss'}, default='log_loss' The loss function to use in the boosting process.For binary classification problems, 'log_loss' is also known as logistic loss, binomial deviance or binary crossentropy. Internally, the model fits one tree per boosting iteration and uses the logistic sigmoid function (expit) as inverse link function to compute the predicted positive class probability.
For multiclass classification problems, 'log_loss' is also known as multinomial deviance or categorical crossentropy. Internally, the model fits one tree per boosting iteration and per class and uses the softmax function as inverse link function to compute the predicted probabilities of the classes.
learning_rate: float, default=0.1 The learning rate, also known as shrinkage. This is used as a multiplicative factor for the leaves values. Use1for no shrinkage.max_iter: int, default=100 The maximum number of iterations of the boosting process, i.e. the maximum number of trees for binary classification. For multiclass classification,n_classestrees per iteration are built.max_leaf_nodes: int or None, default=31 The maximum number of leaves for each tree. Must be strictly greater than 1. If None, there is no maximum limit.max_depth: int or None, default=None The maximum depth of each tree. The depth of a tree is the number of edges to go from the root to the deepest leaf. Depth isn't constrained by default.min_samples_leaf: int, default=20 The minimum number of samples per leaf. For small datasets with less than a few hundred samples, it is recommended to lower this value since only very shallow trees would be built.l2_regularization: float, default=0 The L2 regularization parameter penalizing leaves with small hessians. Use0for no regularization (default).max_features: float, default=1.0 Proportion of randomly chosen features in each and every node split. This is a form of regularization, smaller values make the trees weaker learners and might prevent overfitting. If interaction constraints frominteraction_cstare present, only allowed features are taken into account for the subsampling.Added in 1.4
max_bins: int, default=255 The maximum number of bins to use for non-missing values. Before training, each feature of the input arrayXis binned into integer-valued bins, which allows for a much faster training stage. Features with a small number of unique values may use less thanmax_binsbins. In addition to themax_binsbins, one more bin is always reserved for missing values. Must be no larger than 255.categorical_features: array-like of {bool, int, str} of shape (n_features) or shape (n_categorical_features,), default='from_dtype' Indicates the categorical features.- None : no feature will be considered categorical.
- boolean array-like : boolean mask indicating categorical features.
- integer array-like : integer indices indicating categorical features.
- str array-like: names of categorical features (assuming the training data has feature names).
"from_dtype": dataframe columns with dtype "category" are considered to be categorical features. The input must be an object exposing a__dataframe__method such as pandas or polars DataFrames to use this feature.
For each categorical feature, there must be at most
max_binsunique categories. Negative values for categorical features encoded as numeric dtypes are treated as missing values. All categorical values are converted to floating point numbers. This means that categorical values of 1.0 and 1 are treated as the same category.Read more in the User Guide:
categorical_support_gbdt.Added in 0.24
Changed in 1.2 Added support for feature names.
Changed in 1.4 Added
"from_dtype"option.Changed in 1.6 The default value changed from
Noneto"from_dtype".monotonic_cst: array-like of int of shape (n_features) or dict, default=None Monotonic constraint to enforce on each feature are specified using the following integer values:- 1: monotonic increase
- 0: no constraint
- -1: monotonic decrease
If a dict with str keys, map feature to monotonic constraints by name. If an array, the features are mapped to constraints by position. See :ref:
monotonic_cst_features_namesfor a usage example.The constraints are only valid for binary classifications and hold over the probability of the positive class. Read more in the User Guide:
monotonic_cst_gbdt.Added in 0.23
Changed in 1.2 Accept dict of constraints with feature names as keys.
interaction_cst: {"pairwise", "no_interactions"} or sequence of lists/tuples/sets of int, default=None Specify interaction constraints, the sets of features which can interact with each other in child node splits.Each item specifies the set of feature indices that are allowed to interact with each other. If there are more features than specified in these constraints, they are treated as if they were specified as an additional set.
The strings "pairwise" and "no_interactions" are shorthands for allowing only pairwise or no interactions, respectively.
For instance, with 5 features in total,
interaction_cst=[{0, 1}]is equivalent tointeraction_cst=[{0, 1}, {2, 3, 4}], and specifies that each branch of a tree will either only split on features 0 and 1 or only split on features 2, 3 and 4.See this example:
ice-vs-pdpon how to useinteraction_cst.Added in 1.2
warm_start: bool, default=False When set toTrue, reuse the solution of the previous call to fit and add more estimators to the ensemble. For results to be valid, the estimator should be re-trained on the same data only. Seethe Glossary.early_stopping: 'auto' or bool, default='auto' If 'auto', early stopping is enabled if the sample size is larger than 10000 or ifX_valandy_valare passed tofit. If True, early stopping is enabled, otherwise early stopping is disabled.Added in 0.23
scoring: str or callable or None, default='loss' Scoring method to use for early stopping. Only used ifearly_stoppingis enabled. Options:- str: see :ref:
scoring_string_namesfor options. - callable: a scorer callable object (e.g., function) with signature
scorer(estimator, X, y). See :ref:scoring_callablefor details. None: accuracy:accuracy_scoreis used.- 'loss': early stopping is checked w.r.t the loss value.
- str: see :ref:
validation_fraction: int or float or None, default=0.1 Proportion (or absolute size) of training data to set aside as validation data for early stopping. If None, early stopping is done on the training data. The value is ignored if either early stopping is not performed, e.g.early_stopping=False, or ifX_valandy_valare passed to fit.n_iter_no_change: int, default=10 Used to determine when to "early stop". The fitting process is stopped when none of the lastn_iter_no_changescores are better than then_iter_no_change - 1-th-to-last one, up to some tolerance. Only used if early stopping is performed.tol: float, default=1e-7 The absolute tolerance to use when comparing scores. The higher the tolerance, the more likely we are to early stop: higher tolerance means that it will be harder for subsequent iterations to be considered an improvement upon the reference score.verbose: int, default=0 The verbosity level. If not zero, print some information about the fitting process.1prints only summary info,2prints info per iteration.random_state: int, RandomState instance or None, default=None Pseudo-random number generator to control the subsampling in the binning process, and the train/validation data split if early stopping is enabled. Pass an int for reproducible output across multiple function calls. SeeGlossary.class_weight: dict or 'balanced', default=None Weights associated with classes in the form{class_label: weight}. If not given, all classes are supposed to have weight one. The "balanced" mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data asn_samples / (n_classes * np.bincount(y)). Note that these weights will be multiplied with sample_weight (passed through the fit method) ifsample_weightis specified.Added in 1.2
Attributes
classes_: array, shape = (n_classes,) Class labels.do_early_stopping_: bool Indicates whether early stopping is used during training.n_iter_: int The number of iterations as selected by early stopping, depending on theearly_stoppingparameter. Otherwise it corresponds to max_iter.n_trees_per_iteration_: int The number of tree that are built at each iteration. This is equal to 1 for binary classification, and ton_classesfor multiclass classification.train_score_: ndarray, shape (n_iter_+1,) The scores at each iteration on the training data. The first entry is the score of the ensemble before the first iteration. Scores are computed according to thescoringparameter. Ifscoringis not 'loss', scores are computed on a subset of at most 10 000 samples. Empty if no early stopping.validation_score_: ndarray, shape (n_iter_+1,) The scores at each iteration on the held-out validation data. The first entry is the score of the ensemble before the first iteration. Scores are computed according to thescoringparameter. Empty if no early stopping or ifvalidation_fractionis None.is_categorical_: ndarray, shape (n_features, ) or None Boolean mask for the categorical features.Noneif there are no categorical features.n_features_in_: int Number of features seen duringfit.Added in 0.24
feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen duringfit. Defined only whenXhas feature names that are all strings.Added in 1.0
See Also
GradientBoostingClassifier: Exact gradient boosting method that does not scale as well on datasets with a large number of samples.sklearn.tree.DecisionTreeClassifier: A decision tree classifier.RandomForestClassifier: A meta-estimator that fits a number of decision tree classifiers on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting.AdaBoostClassifier: A meta-estimator that begins by fitting a classifier on the original dataset and then fits additional copies of the classifier on the same dataset where the weights of incorrectly classified instances are adjusted such that subsequent classifiers focus more on difficult cases.
Examples
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.datasets import load_iris
X, y = load_iris(return_X_y=True)
clf = HistGradientBoostingClassifier().fit(X, y)
clf.score(X, y)
1.0
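A hedged Clojure sketch of the model step, assuming the model-type keyword is :sklearn.classification/hist-gradient-boosting-classifier; the values are the documented defaults and only illustrate the kebab-case spelling from the parameter table (note :l-2-regularization for l2_regularization):
(ml/model {:model-type :sklearn.classification/hist-gradient-boosting-classifier
           :max-iter 100
           :learning-rate 0.1
           :l-2-regularization 0.0})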
24.2.15 /k-neighbors-classifier
| name | type | default | description |
|---|---|---|---|
| algorithm | |||
| leaf-size | |||
| metric | |||
| metric-params | |||
| n-jobs | |||
| n-neighbors | |||
| p | |||
| weights | |||
| predict-proba? |
Classifier implementing the k-nearest neighbors vote.
Read more in the User Guide: classification.
Parameters
n_neighbors: int, default=5 Number of neighbors to use by default forkneighborsqueries.weights: {'uniform', 'distance'}, callable or None, default='uniform' Weight function used in prediction. Possible values:- 'uniform' : uniform weights. All points in each neighborhood are weighted equally.
- 'distance' : weight points by the inverse of their distance. In this case, closer neighbors of a query point will have a greater influence than neighbors which are further away.
- [callable] : a user-defined function which accepts an array of distances, and returns an array of the same shape containing the weights.
Refer to the example entitled :ref:
sphx_glr_auto_examples_neighbors_plot_classification.pyshowing the impact of theweightsparameter on the decision boundary.algorithm: {'auto', 'ball_tree', 'kd_tree', 'brute'}, default='auto' Algorithm used to compute the nearest neighbors:- 'ball_tree' will use
BallTree - 'kd_tree' will use
KDTree - 'brute' will use a brute-force search.
- 'auto' will attempt to decide the most appropriate algorithm based on the values passed to
fitmethod.
Note: fitting on sparse input will override the setting of this parameter, using brute force.
- 'ball_tree' will use
leaf_size: int, default=30 Leaf size passed to BallTree or KDTree. This can affect the speed of the construction and query, as well as the memory required to store the tree. The optimal value depends on the nature of the problem.p: float, default=2 Power parameter for the Minkowski metric. When p = 1, this is equivalent to using manhattan_distance (l1), and euclidean_distance (l2) for p = 2. For arbitrary p, minkowski_distance (l_p) is used. This parameter is expected to be positive.metric: str or callable, default='minkowski' Metric to use for distance computation. Default is "minkowski", which results in the standard Euclidean distance when p = 2. See the documentation of scipy.spatial.distance and the metrics listed in~sklearn.metrics.pairwise.distance_metricsfor valid metric values.If metric is "precomputed", X is assumed to be a distance matrix and must be square during fit. X may be a
sparse graph, in which case only "nonzero" elements may be considered neighbors.If metric is a callable function, it takes two arrays representing 1D vectors as inputs and must return one value indicating the distance between those vectors. This works for Scipy's metrics, but is less efficient than passing the metric name as a string.
metric_params: dict, default=None Additional keyword arguments for the metric function.n_jobs: int, default=None The number of parallel jobs to run for neighbors search.Nonemeans 1 unless in ajoblib.parallel_backendcontext.-1means using all processors. SeeGlossaryfor more details. Doesn't affectfitmethod.
Attributes
classes_: array of shape (n_classes,) Class labels known to the classifiereffective_metric_: str or callable The distance metric used. It will be the same as themetricparameter or a synonym of it, e.g. 'euclidean' if themetricparameter set to 'minkowski' andpparameter set to 2.effective_metric_params_: dict Additional keyword arguments for the metric function. For most metrics it will be the same as themetric_paramsparameter, but may also contain thepparameter value if theeffective_metric_attribute is set to 'minkowski'.n_features_in_: int Number of features seen duringfit.Added in 0.24
feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen duringfit. Defined only whenXhas feature names that are all strings.Added in 1.0
n_samples_fit_: int Number of samples in the fitted data.outputs_2d_: bool False wheny's shape is (n_samples, ) or (n_samples, 1) during fit otherwise True.
See Also
RadiusNeighborsClassifier: Classifier based on neighbors within a fixed radius. KNeighborsRegressor: Regression based on k-nearest neighbors. RadiusNeighborsRegressor: Regression based on neighbors within a fixed radius. NearestNeighbors: Unsupervised learner for implementing neighbor searches.
Notes
See Nearest Neighbors: neighbors in the online documentation for a discussion of the choice of algorithm and leaf_size.
⚠️ Warning
Regarding the Nearest Neighbors algorithms, if it is found that two neighbors, neighbor k+1 and k, have identical distances but different labels, the results will depend on the ordering of the training data.
https://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm
Examples
X = [[0], [1], [2], [3]]
y = [0, 0, 1, 1]
from sklearn.neighbors import KNeighborsClassifier
neigh = KNeighborsClassifier(n_neighbors=3)
neigh.fit(X, y)
KNeighborsClassifier(...)
print(neigh.predict([[1.1]]))
[0]
print(neigh.predict_proba([[0.9]]))
[[0.666 0.333]]
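A hedged Clojure sketch of the model step, assuming the model-type keyword is :sklearn.classification/k-neighbors-classifier and that string-valued options such as weights are passed through to Python unchanged:
(ml/model {:model-type :sklearn.classification/k-neighbors-classifier
           :n-neighbors 3
           :weights "distance"})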
24.2.16 /label-propagation
| name | type | default | description |
|---|---|---|---|
| gamma | |||
| kernel | |||
| max-iter | |||
| n-jobs | |||
| n-neighbors | |||
| tol | |||
| predict-proba? |
Label Propagation classifier.
Read more in the User Guide: label_propagation.
Parameters
kernel: {'knn', 'rbf'} or callable, default='rbf' String identifier for kernel function to use or the kernel function itself. Only 'rbf' and 'knn' strings are valid inputs. The function passed should take two inputs, each of shape (n_samples, n_features), and return a (n_samples, n_samples) shaped weight matrix.gamma: float, default=20 Parameter for rbf kernel.n_neighbors: int, default=7 Parameter for knn kernel which needs to be strictly positive.max_iter: int, default=1000 Maximum number of iterations allowed.tol: float, default=1e-3 Convergence tolerance: threshold to consider the system at steady state.n_jobs: int, default=None The number of parallel jobs to run.Nonemeans 1 unless in ajoblib.parallel_backendcontext.-1means using all processors. SeeGlossaryfor more details.
Attributes
X_: {array-like, sparse matrix} of shape (n_samples, n_features) Input array.classes_: ndarray of shape (n_classes,) The distinct labels used in classifying instances.label_distributions_: ndarray of shape (n_samples, n_classes) Categorical distribution for each item.transduction_: ndarray of shape (n_samples) Label assigned to each item duringfit.n_features_in_: int Number of features seen duringfit.Added in 0.24
feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen duringfit. Defined only whenXhas feature names that are all strings.Added in 1.0
n_iter_: int Number of iterations run.
See Also
LabelSpreading: Alternate label propagation strategy more robust to noise.
References
Xiaojin Zhu and Zoubin Ghahramani. Learning from labeled and unlabeled data with label propagation. Technical Report CMU-CALD-02-107, Carnegie Mellon University, 2002 http://pages.cs.wisc.edu/~jerryzhu/pub/CMU-CALD-02-107.pdf
Examples
import numpy as np
from sklearn import datasets
from sklearn.semi_supervised import LabelPropagation
label_prop_model = LabelPropagation()
iris = datasets.load_iris()
rng = np.random.RandomState(42)
random_unlabeled_points = rng.rand(len(iris.target)) < 0.3
labels = np.copy(iris.target)
labels[random_unlabeled_points] = -1
label_prop_model.fit(iris.data, labels)
LabelPropagation(...)
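Label propagation is semi-supervised: as in the Python example, unlabeled rows are marked with -1 in the target column before fitting. A hedged sketch of the Clojure model step, assuming the model-type keyword is :sklearn.classification/label-propagation:
(ml/model {:model-type :sklearn.classification/label-propagation
           :kernel "knn"
           :n-neighbors 7
           :max-iter 1000})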
24.2.17 /label-spreading
| name | type | default | description |
|---|---|---|---|
| alpha | |||
| gamma | |||
| kernel | |||
| max-iter | |||
| n-jobs | |||
| n-neighbors | |||
| tol | |||
| predict-proba? |
LabelSpreading model for semi-supervised learning.
This model is similar to the basic Label Propagation algorithm, but uses affinity matrix based on the normalized graph Laplacian and soft clamping across the labels.
Read more in the User Guide: label_propagation.
Parameters
kernel: {'knn', 'rbf'} or callable, default='rbf' String identifier for kernel function to use or the kernel function itself. Only 'rbf' and 'knn' strings are valid inputs. The function passed should take two inputs, each of shape (n_samples, n_features), and return a (n_samples, n_samples) shaped weight matrix.gamma: float, default=20 Parameter for rbf kernel.n_neighbors: int, default=7 Parameter for knn kernel which is a strictly positive integer.alpha: float, default=0.2 Clamping factor. A value in (0, 1) that specifies the relative amount that an instance should adopt the information from its neighbors as opposed to its initial label. alpha=0 means keeping the initial label information; alpha=1 means replacing all initial information.max_iter: int, default=30 Maximum number of iterations allowed.tol: float, default=1e-3 Convergence tolerance: threshold to consider the system at steady state.n_jobs: int, default=None The number of parallel jobs to run.Nonemeans 1 unless in ajoblib.parallel_backendcontext.-1means using all processors. SeeGlossaryfor more details.
Attributes
X_: ndarray of shape (n_samples, n_features) Input array.classes_: ndarray of shape (n_classes,) The distinct labels used in classifying instances.label_distributions_: ndarray of shape (n_samples, n_classes) Categorical distribution for each item.transduction_: ndarray of shape (n_samples,) Label assigned to each item duringfit.n_features_in_: int Number of features seen duringfit.Added in 0.24
feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen duringfit. Defined only whenXhas feature names that are all strings.Added in 1.0
n_iter_: int Number of iterations run.
See Also
LabelPropagation: Unregularized graph based semi-supervised learning.
Examples
import numpy as np
from sklearn import datasets
from sklearn.semi_supervised import LabelSpreading
label_prop_model = LabelSpreading()
iris = datasets.load_iris()
rng = np.random.RandomState(42)
random_unlabeled_points = rng.rand(len(iris.target)) < 0.3
labels = np.copy(iris.target)
labels[random_unlabeled_points] = -1
label_prop_model.fit(iris.data, labels)
LabelSpreading(...)
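The Clojure model step should look the same as for label propagation, with the additional clamping factor alpha. A hedged sketch, assuming the model-type keyword is :sklearn.classification/label-spreading:
(ml/model {:model-type :sklearn.classification/label-spreading
           :alpha 0.2
           :max-iter 30})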
24.2.18 /linear-discriminant-analysis
| name | type | default | description |
|---|---|---|---|
| covariance-estimator | |||
| n-components | |||
| priors | |||
| shrinkage | |||
| solver | |||
| store-covariance | |||
| tol | |||
| predict-proba? |
Linear Discriminant Analysis.
A classifier with a linear decision boundary, generated by fitting class conditional densities to the data and using Bayes' rule.
The model fits a Gaussian density to each class, assuming that all classes share the same covariance matrix.
The fitted model can also be used to reduce the dimensionality of the input by projecting it to the most discriminative directions, using the transform method.
Added in 0.17
For a comparison between ~sklearn.discriminant_analysis.LinearDiscriminantAnalysis and ~sklearn.discriminant_analysis.QuadraticDiscriminantAnalysis, see :ref:sphx_glr_auto_examples_classification_plot_lda_qda.py.
Read more in the User Guide: lda_qda.
Parameters
solver: {'svd', 'lsqr', 'eigen'}, default='svd' Solver to use, possible values: - 'svd': Singular value decomposition (default). Does not compute the covariance matrix, therefore this solver is recommended for data with a large number of features. - 'lsqr': Least squares solution. Can be combined with shrinkage or custom covariance estimator. - 'eigen': Eigenvalue decomposition. Can be combined with shrinkage or custom covariance estimator.Changed in 1.2
solver="svd"now has experimental Array API support. See the Array API User Guide:array_apifor more details.shrinkage: 'auto' or float, default=None Shrinkage parameter, possible values: - None: no shrinkage (default). - 'auto': automatic shrinkage using the Ledoit-Wolf lemma. - float between 0 and 1: fixed shrinkage parameter.This should be left to None if
covariance_estimatoris used. Note that shrinkage works only with 'lsqr' and 'eigen' solvers.For a usage example, see :ref:
sphx_glr_auto_examples_classification_plot_lda.py.priors: array-like of shape (n_classes,), default=None The class prior probabilities. By default, the class proportions are inferred from the training data.n_components: int, default=None Number of components (<= min(n_classes - 1, n_features)) for dimensionality reduction. If None, will be set to min(n_classes - 1, n_features). This parameter only affects thetransformmethod.For a usage example, see :ref:
sphx_glr_auto_examples_decomposition_plot_pca_vs_lda.py.store_covariance: bool, default=False If True, explicitly compute the weighted within-class covariance matrix when solver is 'svd'. The matrix is always computed and stored for the other solvers.Added in 0.17
tol: float, default=1.0e-4 Absolute threshold for a singular value of X to be considered significant, used to estimate the rank of X. Dimensions whose singular values are non-significant are discarded. Only used if solver is 'svd'.Added in 0.17
covariance_estimator: covariance estimator, default=None If not None,covariance_estimatoris used to estimate the covariance matrices instead of relying on the empirical covariance estimator (with potential shrinkage). The object should have a fit method and acovariance_attribute like the estimators insklearn.covariance. if None the shrinkage parameter drives the estimate.This should be left to None if
shrinkageis used. Note thatcovariance_estimatorworks only with 'lsqr' and 'eigen' solvers.Added in 0.24
Attributes
coef_: ndarray of shape (n_features,) or (n_classes, n_features) Weight vector(s).intercept_: ndarray of shape (n_classes,) Intercept term.covariance_: array-like of shape (n_features, n_features) Weighted within-class covariance matrix. It corresponds tosum_k prior_k * C_kwhereC_kis the covariance matrix of the samples in classk. TheC_kare estimated using the (potentially shrunk) biased estimator of covariance. If solver is 'svd', only exists whenstore_covarianceis True.explained_variance_ratio_: ndarray of shape (n_components,) Percentage of variance explained by each of the selected components. Ifn_componentsis not set then all components are stored and the sum of explained variances is equal to 1.0. Only available when eigen or svd solver is used.means_: array-like of shape (n_classes, n_features) Class-wise means.priors_: array-like of shape (n_classes,) Class priors (sum to 1).scalings_: array-like of shape (rank, n_classes - 1) Scaling of the features in the space spanned by the class centroids. Only available for 'svd' and 'eigen' solvers.xbar_: array-like of shape (n_features,) Overall mean. Only present if solver is 'svd'.classes_: array-like of shape (n_classes,) Unique class labels.n_features_in_: int Number of features seen duringfit.Added in 0.24
feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen duringfit. Defined only whenXhas feature names that are all strings.Added in 1.0
See Also
QuadraticDiscriminantAnalysis: Quadratic Discriminant Analysis.
Examples
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
y = np.array([1, 1, 1, 2, 2, 2])
clf = LinearDiscriminantAnalysis()
clf.fit(X, y)
LinearDiscriminantAnalysis()
print(clf.predict([[-0.8, -1]]))
[1]
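A hedged Clojure sketch of the model step, assuming the model-type keyword is :sklearn.classification/linear-discriminant-analysis; solver and shrinkage are given as strings, which are assumed to pass through to Python unchanged:
(ml/model {:model-type :sklearn.classification/linear-discriminant-analysis
           :solver "lsqr"
           :shrinkage "auto"})
After fitting, attributes listed above such as coef_ and explained_variance_ratio_ should be available from the fitted context under :model-data :attributes.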
24.2.19 /linear-svc
| name | type | default | description |
|---|---|---|---|
| tol | |||
| intercept-scaling | |||
| multi-class | |||
| penalty | |||
| c | |||
| max-iter | |||
| random-state | |||
| dual | |||
| fit-intercept | |||
| class-weight | |||
| loss | |||
| verbose | |||
| predict-proba? |
Linear Support Vector Classification.
Similar to SVC with parameter kernel='linear', but implemented in terms of liblinear rather than libsvm, so it has more flexibility in the choice of penalties and loss functions and should scale better to large numbers of samples.
The main differences between ~sklearn.svm.LinearSVC and ~sklearn.svm.SVC lie in the loss function used by default, and in the handling of intercept regularization between those two implementations.
This class supports both dense and sparse input and the multiclass support is handled according to a one-vs-the-rest scheme.
Read more in the User Guide: svm_classification.
Parameters
penalty: {'l1', 'l2'}, default='l2' Specifies the norm used in the penalization. The 'l2' penalty is the standard used in SVC. The 'l1' leads tocoef_vectors that are sparse.loss: {'hinge', 'squared_hinge'}, default='squared_hinge' Specifies the loss function. 'hinge' is the standard SVM loss (used e.g. by the SVC class) while 'squared_hinge' is the square of the hinge loss. The combination ofpenalty='l1'andloss='hinge'is not supported.dual: "auto" or bool, default="auto" Select the algorithm to either solve the dual or primal optimization problem. Prefer dual=False when n_samples > n_features.dual="auto"will choose the value of the parameter automatically, based on the values ofn_samples,n_features,loss,multi_classandpenalty. Ifn_samples<n_featuresand optimizer supports chosenloss,multi_classandpenalty, then dual will be set to True, otherwise it will be set to False.Changed in 1.3 The
"auto"option is added in version 1.3 and will be the default in version 1.5.tol: float, default=1e-4 Tolerance for stopping criteria.C: float, default=1.0 Regularization parameter. The strength of the regularization is inversely proportional to C. Must be strictly positive. For an intuitive visualization of the effects of scaling the regularization parameter C, see :ref:sphx_glr_auto_examples_svm_plot_svm_scale_c.py.multi_class: {'ovr', 'crammer_singer'}, default='ovr' Determines the multi-class strategy ifycontains more than two classes."ovr"trains n_classes one-vs-rest classifiers, while"crammer_singer"optimizes a joint objective over all classes. Whilecrammer_singeris interesting from a theoretical perspective as it is consistent, it is seldom used in practice as it rarely leads to better accuracy and is more expensive to compute. If"crammer_singer"is chosen, the options loss, penalty and dual will be ignored.fit_intercept: bool, default=True Whether or not to fit an intercept. If set to True, the feature vector is extended to include an intercept term:[x_1, ..., x_n, 1], where 1 corresponds to the intercept. If set to False, no intercept will be used in calculations (i.e. data is expected to be already centered).intercept_scaling: float, default=1.0 Whenfit_interceptis True, the instance vector x becomes[x_1, ..., x_n, intercept_scaling], i.e. a "synthetic" feature with a constant value equal tointercept_scalingis appended to the instance vector. The intercept becomes intercept_scaling * synthetic feature weight. Note that liblinear internally penalizes the intercept, treating it like any other term in the feature vector. To reduce the impact of the regularization on the intercept, theintercept_scalingparameter can be set to a value greater than 1; the higher the value ofintercept_scaling, the lower the impact of regularization on it. Then, the weights become[w_x_1, ..., w_x_n, w_intercept*intercept_scaling], wherew_x_1, ..., w_x_nrepresent the feature weights and the intercept weight is scaled byintercept_scaling. This scaling allows the intercept term to have a different regularization behavior compared to the other features.class_weight: dict or 'balanced', default=None Set the parameter C of class i toclass_weight[i]*Cfor SVC. If not given, all classes are supposed to have weight one. The "balanced" mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data asn_samples / (n_classes * np.bincount(y)).verbose: int, default=0 Enable verbose output. Note that this setting takes advantage of a per-process runtime setting in liblinear that, if enabled, may not work properly in a multithreaded context.random_state: int, RandomState instance or None, default=None Controls the pseudo random number generation for shuffling the data for the dual coordinate descent (ifdual=True). Whendual=Falsethe underlying implementation ofLinearSVCis not random andrandom_statehas no effect on the results. Pass an int for reproducible output across multiple function calls. SeeGlossary.max_iter: int, default=1000 The maximum number of iterations to be run.
Attributes
coef_: ndarray of shape (1, n_features) if n_classes == 2 else (n_classes, n_features) Weights assigned to the features (coefficients in the primal problem).coef_is a readonly property derived fromraw_coef_that follows the internal memory layout of liblinear.intercept_: ndarray of shape (1,) if n_classes == 2 else (n_classes,) Constants in decision function.classes_: ndarray of shape (n_classes,) The unique classes labels.n_features_in_: int Number of features seen duringfit.Added in 0.24
feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen duringfit. Defined only whenXhas feature names that are all strings.Added in 1.0
n_iter_: int Maximum number of iterations run across all classes.
See Also
SVC: Implementation of Support Vector Machine classifier using libsvm: the kernel can be non-linear but its SMO algorithm does not scale to large numbers of samples as LinearSVC does. Furthermore, SVC multi-class mode is implemented using a one-vs-one scheme while LinearSVC uses one-vs-the-rest. It is possible to implement one-vs-the-rest with SVC by using the ~sklearn.multiclass.OneVsRestClassifier wrapper. Finally, SVC can fit dense data without memory copy if the input is C-contiguous. Sparse data will still incur a memory copy, though.
sklearn.linear_model.SGDClassifier: SGDClassifier can optimize the same cost function as LinearSVC by adjusting the penalty and loss parameters. In addition it requires less memory, allows incremental (online) learning, and implements various loss functions and regularization regimes.
Notes
The underlying C implementation uses a random number generator to select features when fitting the model. It is thus not uncommon to have slightly different results for the same input data. If that happens, try with a smaller tol parameter.
The underlying implementation, liblinear, uses a sparse internal representation for the data that will incur a memory copy.
Predict output may not match that of standalone liblinear in certain cases. See differences from liblinear: liblinear_differences in the narrative documentation.
References
LIBLINEAR: A Library for Large Linear Classification
Examples
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import make_classification
X, y = make_classification(n_features=4, random_state=0)
clf = make_pipeline(StandardScaler(),
LinearSVC(random_state=0, tol=1e-5))
clf.fit(X, y)
Pipeline(steps=[('standardscaler', StandardScaler()),
('linearsvc', LinearSVC(random_state=0, tol=1e-05))])
print(clf.named_steps['linearsvc'].coef_)
[[0.141 0.526 0.679 0.493]]
print(clf.named_steps['linearsvc'].intercept_)
[0.1693]
print(clf.predict([[0, 0, 0, 0]]))
[1]
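As a hedged sketch of using this model from Clojure (assuming the namespace aliases defined in this chapter's setup), the parameters from the table above translate to kebab-case keys:
(def linear-svc-pipe
  (mm/pipeline
   ;; use the last column as the prediction target
   (ds-mm/set-inference-target 2)
   {:metamorph/id :model}
   (ml/model {:model-type :sklearn.classification/linear-svc
              ;; C and max_iter from the parameter table, written as kebab-case keys
              :c 1.0
              :max-iter 1000})))
Fitting and predicting then follow the same :fit / :transform flow used for the other models in this chapter.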
24.2.20 /logistic-regression
| name | type | default | description |
|---|---|---|---|
| tol | |||
| intercept-scaling | |||
| solver | |||
| penalty | |||
| c | |||
| max-iter | |||
| n-jobs | |||
| random-state | |||
| dual | |||
| fit-intercept | |||
| warm-start | |||
| l-1-ratio | |||
| class-weight | |||
| verbose | |||
| predict-proba? |
Logistic Regression (aka logit, MaxEnt) classifier.
This class implements regularized logistic regression using a set of available solvers. Note that regularization is applied by default. It can handle both dense and sparse input X. Use C-ordered arrays or CSR matrices containing 64-bit floats for optimal performance; any other input format will be converted (and copied).
The solvers 'lbfgs', 'newton-cg', 'newton-cholesky' and 'sag' support only L2 regularization with primal formulation, or no regularization. The 'liblinear' solver supports both L1 and L2 regularization (but not both, i.e. elastic-net), with a dual formulation only for the L2 penalty. The Elastic-Net (combination of L1 and L2) regularization is only supported by the 'saga' solver.
For multiclass problems (whenever n_classes >= 3), all solvers except 'liblinear' optimize the (penalized) multinomial loss. 'liblinear' only handles binary classification but can be extended to handle multiclass by using ~sklearn.multiclass.OneVsRestClassifier.
Read more in the User Guide: logistic_regression.
Parameters
penalty: {'l1', 'l2', 'elasticnet', None}, default='l2' Specify the norm of the penalty:None: no penalty is added;'l2': add a L2 penalty term and it is the default choice;'l1': add a L1 penalty term;'elasticnet': both L1 and L2 penalty terms are added.
⚠️ Warning
Some penalties may not work with some solvers. See the parameter solver below, to know the compatibility between the penalty and solver.
Added in 0.19 l1 penalty with SAGA solver (allowing 'multinomial' + L1)
Deprecated since 1.8 penalty was deprecated in version 1.8 and will be removed in 1.10. Use l1_ratio instead: l1_ratio=0 for penalty='l2', l1_ratio=1 for penalty='l1' and l1_ratio set to any float between 0 and 1 for penalty='elasticnet'.
C: float, default=1.0 Inverse of regularization strength; must be a positive float. Like in support vector machines, smaller values specify stronger regularization.C=np.infresults in unpenalized logistic regression. For a visual example on the effect of tuning theCparameter with an L1 penalty, see: :ref:sphx_glr_auto_examples_linear_model_plot_logistic_path.py.l1_ratio: float, default=0.0 The Elastic-Net mixing parameter, with0 <= l1_ratio <= 1. Settingl1_ratio=1gives a pure L1-penalty, settingl1_ratio=0a pure L2-penalty. Any value between 0 and 1 gives an Elastic-Net penalty of the forml1_ratio * L1 + (1 - l1_ratio) * L2.
⚠️ Warning
Certain values of l1_ratio, i.e. some penalties, may not work with some solvers. See the parameter solver below, to know the compatibility between the penalty and solver.
Changed in 1.8 Default value changed from None to 0.0.
Deprecated since 1.8 None is deprecated and will be removed in version 1.10. Always use l1_ratio to specify the penalty type.
dual: bool, default=False Dual (constrained) or primal (regularized, see also this equation:regularized-logistic-loss) formulation. Dual formulation is only implemented for l2 penalty with liblinear solver. Preferdual=Falsewhen n_samples > n_features.tol: float, default=1e-4 Tolerance for stopping criteria.fit_intercept: bool, default=True Specifies if a constant (a.k.a. bias or intercept) should be added to the decision function.intercept_scaling: float, default=1 Useful only when the solverliblinearis used andself.fit_interceptis set toTrue. In this case,xbecomes[x, self.intercept_scaling], i.e. a "synthetic" feature with constant value equal tointercept_scalingis appended to the instance vector. The intercept becomesintercept_scaling * synthetic_feature_weight.
🛈 Note
The synthetic feature weight is subject to L1 or L2 regularization as all other features. To lessen the effect of regularization on synthetic feature weight (and therefore on the intercept) intercept_scaling has to be increased.
class_weight: dict or 'balanced', default=None Weights associated with classes in the form{class_label: weight}. If not given, all classes are supposed to have weight one.The "balanced" mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as
n_samples / (n_classes * np.bincount(y)).Note that these weights will be multiplied with sample_weight (passed through the fit method) if sample_weight is specified.
Added in 0.17 class_weight='balanced'
random_state: int, RandomState instance, default=None Used whensolver== 'sag', 'saga' or 'liblinear' to shuffle the data. SeeGlossaryfor details.solver: {'lbfgs', 'liblinear', 'newton-cg', 'newton-cholesky', 'sag', 'saga'}, default='lbfgs'Algorithm to use in the optimization problem. Default is 'lbfgs'. To choose a solver, you might want to consider the following aspects:
- 'lbfgs' is a good default solver because it works reasonably well for a wide class of problems.
- For multiclass problems (n_classes >= 3), all solvers except 'liblinear' minimize the full multinomial loss; 'liblinear' will raise an error.
- 'newton-cholesky' is a good choice for n_samples >> n_features * n_classes, especially with one-hot encoded categorical features with rare categories. Be aware that the memory usage of this solver has a quadratic dependency on n_features * n_classes because it explicitly computes the full Hessian matrix.
- For small datasets, 'liblinear' is a good choice, whereas 'sag' and 'saga' are faster for large ones.
- 'liblinear' can only handle binary classification by default. To apply a one-versus-rest scheme for the multiclass setting one can wrap it with the ~sklearn.multiclass.OneVsRestClassifier.
⚠️ Warning
The choice of the algorithm depends on the penalty chosen (l1_ratio=0 for L2-penalty, l1_ratio=1 for L1-penalty and 0 < l1_ratio < 1 for Elastic-Net) and on (multinomial) multiclass support:
| solver | l1_ratio | multinomial multiclass |
|---|---|---|
| 'lbfgs' | l1_ratio=0 | yes |
| 'liblinear' | l1_ratio=1 or l1_ratio=0 | no |
| 'newton-cg' | l1_ratio=0 | yes |
| 'newton-cholesky' | l1_ratio=0 | yes |
| 'sag' | l1_ratio=0 | yes |
| 'saga' | 0<=l1_ratio<=1 | yes |
🛈 Note
'sag' and 'saga' fast convergence is only guaranteed on features with approximately the same scale. You can preprocess the data with a scaler from :mod:sklearn.preprocessing.
See also: Refer to the :ref:User Guide for more information regarding :class:LogisticRegression and more specifically the :ref:Table summarizing solver/penalty supports.
Added in 0.17 Stochastic Average Gradient (SAG) descent solver; multinomial support in version 0.18. Added in 0.19 SAGA solver. Changed in 0.22 The default solver changed from 'liblinear' to 'lbfgs'. Added in 1.2 newton-cholesky solver; multinomial support in version 1.6.
max_iter: int, default=100 Maximum number of iterations taken for the solvers to converge.verbose: int, default=0 For the liblinear and lbfgs solvers set verbose to any positive number for verbosity.warm_start: bool, default=False When set to True, reuse the solution of the previous call to fit as initialization, otherwise, just erase the previous solution. Useless for liblinear solver. Seethe Glossary.Added in 0.17 warm_start to support lbfgs, newton-cg, sag, saga solvers.
n_jobs: int, default=None Does not have any effect.Deprecated since 1.8
n_jobsis deprecated in version 1.8 and will be removed in 1.10.
Attributes
classes_: ndarray of shape (n_classes, ) A list of class labels known to the classifier.coef_: ndarray of shape (1, n_features) or (n_classes, n_features) Coefficient of the features in the decision function.coef_is of shape (1, n_features) when the given problem is binary.intercept_: ndarray of shape (1,) or (n_classes,) Intercept (a.k.a. bias) added to the decision function.If
fit_interceptis set to False, the intercept is set to zero.intercept_is of shape (1,) when the given problem is binary.n_features_in_: int Number of features seen duringfit.Added in 0.24
feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen duringfit. Defined only whenXhas feature names that are all strings.Added in 1.0
n_iter_: ndarray of shape (1, ) Actual number of iterations for all classes.Changed in 0.20
In SciPy <= 1.0.0 the number of lbfgs iterations may exceed max_iter. n_iter_ will now report at most max_iter.
See Also
SGDClassifier: Incrementally trained logistic regression (when given the parameterloss="log_loss").LogisticRegressionCV: Logistic regression with built-in cross validation.
Notes
The underlying C implementation uses a random number generator to select features when fitting the model. It is thus not uncommon to have slightly different results for the same input data. If that happens, try with a smaller tol parameter.
Predict output may not match that of standalone liblinear in certain cases. See differences from liblinear: liblinear_differences in the narrative documentation.
References
L-BFGS-B -- Software for Large-scale Bound-constrained Optimization Ciyou Zhu, Richard Byrd, Jorge Nocedal and Jose Luis Morales. http://users.iems.northwestern.edu/~nocedal/lbfgsb.html
LIBLINEAR -- A Library for Large Linear Classification https://www.csie.ntu.edu.tw/~cjlin/liblinear/
SAG -- Mark Schmidt, Nicolas Le Roux, and Francis Bach Minimizing Finite Sums with the Stochastic Average Gradient https://hal.inria.fr/hal-00860051/document
SAGA -- Defazio, A., Bach F. & Lacoste-Julien S. (2014). :arxiv:"SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives" <1407.0202>
Hsiang-Fu Yu, Fang-Lan Huang, Chih-Jen Lin (2011). Dual coordinate descent methods for logistic regression and maximum entropy models. Machine Learning 85(1-2):41-75. https://www.csie.ntu.edu.tw/~cjlin/papers/maxent_dual.pdf
Examples
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
X, y = load_iris(return_X_y=True)
clf = LogisticRegression(random_state=0).fit(X, y)
clf.predict(X[:2, :])
array([0, 0])
clf.predict_proba(X[:2, :])
array([[9.82e-01, 1.82e-02, 1.44e-08],
[9.72e-01, 2.82e-02, 3.02e-08]])
clf.score(X, y)
0.97
For a comparison of the LogisticRegression with other classifiers see: :ref:sphx_glr_auto_examples_classification_plot_classification_probability.py.
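As a hedged sketch (assuming the chapter's namespace aliases), selecting a different solver just means passing the corresponding kebab-case keys from the parameter table above, for example:
(ml/model {:model-type :sklearn.classification/logistic-regression
           ;; solver and max_iter from the parameter descriptions above
           :solver "saga"
           :max-iter 1000})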
24.2.21 /logistic-regression-cv
| name | type | default | description |
|---|---|---|---|
| refit | |||
| scoring | |||
| tol | |||
| intercept-scaling | |||
| solver | |||
| penalty | |||
| max-iter | |||
| n-jobs | |||
| random-state | |||
| dual | |||
| use-legacy-attributes | |||
| fit-intercept | |||
| cv | |||
| cs | |||
| class-weight | |||
| verbose | |||
| l-1-ratios | |||
| predict-proba? |
Logistic Regression CV (aka logit, MaxEnt) classifier.
See glossary entry for cross-validation estimator.
This class implements regularized logistic regression with implicit cross validation for the penalty parameters C and l1_ratio, see LogisticRegression, using a set of available solvers.
The solvers 'lbfgs', 'newton-cg', 'newton-cholesky' and 'sag' support only L2 regularization with primal formulation. The 'liblinear' solver supports both L1 and L2 regularization (but not both, i.e. elastic-net), with a dual formulation only for the L2 penalty. The Elastic-Net (combination of L1 and L2) regularization is only supported by the 'saga' solver.
For the grid of Cs values and l1_ratios values, the best hyperparameter is selected by the cross-validator ~sklearn.model_selection.StratifiedKFold, but it can be changed using the cv parameter. All solvers except 'liblinear' can warm-start the coefficients (see Glossary).
Read more in the User Guide: logistic_regression.
Parameters
Cs: int or list of floats, default=10 Each of the values in Cs describes the inverse of regularization strength. If Cs is as an int, then a grid of Cs values are chosen in a logarithmic scale between 1e-4 and 1e4. Like in support vector machines, smaller values specify stronger regularization.l1_ratios: array-like of shape (n_l1_ratios), default=None Floats between 0 and 1 passed as Elastic-Net mixing parameter (scaling between L1 and L2 penalties). Forl1_ratio = 0the penalty is an L2 penalty. Forl1_ratio = 1it is an L1 penalty. For0 < l1_ratio < 1, the penalty is a combination of L1 and L2. All the values of the given array-like are tested by cross-validation and the one giving the best prediction score is used.
⚠️ Warning
Certain values of l1_ratios, i.e. some penalties, may not work with some solvers. See the parameter solver below, to know the compatibility between the penalty and solver.
Deprecated since 1.8 l1_ratios=None is deprecated in 1.8 and will raise an error in version 1.10. The default value will change from None to (0.0,) in version 1.10.
fit_intercept: bool, default=True Specifies if a constant (a.k.a. bias or intercept) should be added to the decision function.cv: int or cross-validation generator, default=None The default cross-validation generator used is Stratified K-Folds. If an integer is provided, it specifies the number of folds,n_folds, used. See the modulesklearn.model_selectionmodule for the list of possible cross-validation objects.Changed in 0.22
cvdefault value if None changed from 3-fold to 5-fold.dual: bool, default=False Dual (constrained) or primal (regularized, see also this equation:regularized-logistic-loss) formulation. Dual formulation is only implemented for l2 penalty with liblinear solver. Prefer dual=False when n_samples > n_features.penalty: {'l1', 'l2', 'elasticnet'}, default='l2' Specify the norm of the penalty:'l2': add a L2 penalty term (used by default);'l1': add a L1 penalty term;'elasticnet': both L1 and L2 penalty terms are added.
⚠️ Warning
Some penalties may not work with some solvers. See the parameter solver below, to know the compatibility between the penalty and solver.
Deprecated since 1.8 penalty was deprecated in version 1.8 and will be removed in 1.10. Use l1_ratio instead: l1_ratio=0 for penalty='l2', l1_ratio=1 for penalty='l1' and l1_ratio set to any float between 0 and 1 for penalty='elasticnet'.
scoring: str or callable, default=None The scoring method to use for cross-validation. Options:
- str: see :ref:scoring_string_names for options.
- callable: a scorer callable object (e.g., function) with signature scorer(estimator, X, y). See :ref:scoring_callable for details.
- None: accuracy (accuracy_score) is used.
solver: {'lbfgs', 'liblinear', 'newton-cg', 'newton-cholesky', 'sag', 'saga'}, default='lbfgs'Algorithm to use in the optimization problem. Default is 'lbfgs'. To choose a solver, you might want to consider the following aspects:
- 'lbfgs' is a good default solver because it works reasonably well for a wide class of problems.
- For multiclass problems (n_classes >= 3), all solvers except 'liblinear' minimize the full multinomial loss; 'liblinear' will raise an error.
- 'newton-cholesky' is a good choice for n_samples >> n_features * n_classes, especially with one-hot encoded categorical features with rare categories. Be aware that the memory usage of this solver has a quadratic dependency on n_features * n_classes because it explicitly computes the full Hessian matrix.
- For small datasets, 'liblinear' is a good choice, whereas 'sag' and 'saga' are faster for large ones.
- 'liblinear' might be slower in LogisticRegressionCV because it does not handle warm-starting.
- 'liblinear' can only handle binary classification by default. To apply a one-versus-rest scheme for the multiclass setting one can wrap it with the ~sklearn.multiclass.OneVsRestClassifier.
⚠️ Warning
The choice of the algorithm depends on the penalty (l1_ratio=0 for L2-penalty, l1_ratio=1 for L1-penalty and 0 < l1_ratio < 1 for Elastic-Net) chosen and on (multinomial) multiclass support:
| solver | l1_ratio | multinomial multiclass |
|---|---|---|
| 'lbfgs' | l1_ratio=0 | yes |
| 'liblinear' | l1_ratio=1 or l1_ratio=0 | no |
| 'newton-cg' | l1_ratio=0 | yes |
| 'newton-cholesky' | l1_ratio=0 | yes |
| 'sag' | l1_ratio=0 | yes |
| 'saga' | 0<=l1_ratio<=1 | yes |
🛈 Note
'sag' and 'saga' fast convergence is only guaranteed on features with approximately the same scale. You can preprocess the data with a scaler from :mod:sklearn.preprocessing.
Added in 0.17 Stochastic Average Gradient (SAG) descent solver; multinomial support in version 0.18. Added in 0.19 SAGA solver. Added in 1.2 newton-cholesky solver; multinomial support in version 1.6.
tol: float, default=1e-4 Tolerance for stopping criteria.max_iter: int, default=100 Maximum number of iterations of the optimization algorithm.class_weight: dict or 'balanced', default=None Weights associated with classes in the form{class_label: weight}. If not given, all classes are supposed to have weight one.The "balanced" mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as
n_samples / (n_classes * np.bincount(y)).Note that these weights will be multiplied with sample_weight (passed through the fit method) if sample_weight is specified.
Added in 0.17 class_weight == 'balanced'
n_jobs: int, default=None Number of CPU cores used during the cross-validation loop.Nonemeans 1 unless in ajoblib.parallel_backendcontext.-1means using all processors. SeeGlossaryfor more details.verbose: int, default=0 For the 'liblinear', 'sag' and 'lbfgs' solvers set verbose to any positive number for verbosity.refit: bool, default=True If set to True, the scores are averaged across all folds, and the coefs and the C that corresponds to the best score is taken, and a final refit is done using these parameters. Otherwise the coefs, intercepts and C that correspond to the best scores across folds are averaged.intercept_scaling: float, default=1 Useful only when the solverliblinearis used andself.fit_interceptis set toTrue. In this case,xbecomes[x, self.intercept_scaling], i.e. a "synthetic" feature with constant value equal tointercept_scalingis appended to the instance vector. The intercept becomesintercept_scaling * synthetic_feature_weight.
🛈 Note
The synthetic feature weight is subject to L1 or L2 regularization as all other features. To lessen the effect of regularization on synthetic feature weight (and therefore on the intercept) intercept_scaling has to be increased.
random_state: int, RandomState instance, default=None Used when solver='sag', 'saga' or 'liblinear' to shuffle the data. Note that this only applies to the solver and not the cross-validation generator. See Glossary for details.use_legacy_attributes: bool, default=True If True, use legacy values for attributes:
- C_ is an ndarray of shape (n_classes,) with the same value repeated
- l1_ratio_ is an ndarray of shape (n_classes,) with the same value repeated
- coefs_paths_ is a dict with class labels as keys and ndarrays as values
- scores_ is a dict with class labels as keys and ndarrays as values
- n_iter_ is an ndarray of shape (1, n_folds, n_cs) or similar
If False, use new values for attributes:
- C_ is a float
- l1_ratio_ is a float
- coefs_paths_ is an ndarray of shape (n_folds, n_l1_ratios, n_cs, n_classes, n_features). For binary problems (n_classes=2), the 2nd last dimension is 1.
- scores_ is an ndarray of shape (n_folds, n_l1_ratios, n_cs)
- n_iter_ is an ndarray of shape (n_folds, n_l1_ratios, n_cs)
Changed in 1.10 The default will change from True to False in version 1.10. Deprecated since 1.10
use_legacy_attributeswill be deprecated in version 1.10 and be removed in 1.12.
Attributes
classes_: ndarray of shape (n_classes, ) A list of class labels known to the classifier.coef_: ndarray of shape (1, n_features) or (n_classes, n_features) Coefficient of the features in the decision function.coef_is of shape (1, n_features) when the given problem is binary.intercept_: ndarray of shape (1,) or (n_classes,) Intercept (a.k.a. bias) added to the decision function.If
fit_interceptis set to False, the intercept is set to zero.intercept_is of shape (1,) when the problem is binary.Cs_: ndarray of shape (n_cs) Array of C i.e. inverse of regularization parameter values used for cross-validation.l1_ratios_: ndarray of shape (n_l1_ratios) Array of l1_ratios used for cross-validation. If l1_ratios=None is used (i.e. penalty is not 'elasticnet'), this is set to[None]coefs_paths_: dict of ndarray of shape (n_folds, n_cs, n_dof) or (n_folds, n_cs, n_l1_ratios, n_dof) A dict with classes as the keys, and the path of coefficients obtained during cross-validating across each fold (n_folds) and then across each Cs (n_cs). The size of the coefficients is the number of degrees of freedom (n_dof), i.e. without interceptn_dof=n_featuresand with interceptn_dof=n_features+1. Ifpenalty='elasticnet', there is an additional dimension for the number of l1_ratio values (n_l1_ratios), which gives a shape of(n_folds, n_cs, n_l1_ratios_, n_dof). See also parameteruse_legacy_attributes.scores_: dict A dict with classes as the keys, and the values as the grid of scores obtained during cross-validating each fold. The same score is repeated across all classes. Each dict value has shape(n_folds, n_cs)or(n_folds, n_cs, n_l1_ratios)ifpenalty='elasticnet'. See also parameteruse_legacy_attributes.C_: ndarray of shape (n_classes,) or (1,) The value of C that maps to the best score, repeated n_classes times. If refit is set to False, the best C is the average of the C's that correspond to the best score for each fold.C_is of shape (1,) when the problem is binary. See also parameteruse_legacy_attributes.l1_ratio_: ndarray of shape (n_classes,) or (n_classes - 1,) The value of l1_ratio that maps to the best score, repeated n_classes times. If refit is set to False, the best l1_ratio is the average of the l1_ratio's that correspond to the best score for each fold.l1_ratio_is of shape (1,) when the problem is binary. See also parameteruse_legacy_attributes.n_iter_: ndarray of shape (1, n_folds, n_cs) or (1, n_folds, n_cs, n_l1_ratios) Actual number of iterations for all classes, folds and Cs. Ifpenalty='elasticnet', the shape is(1, n_folds, n_cs, n_l1_ratios). See also parameteruse_legacy_attributes.n_features_in_: int Number of features seen duringfit.Added in 0.24
feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen duringfit. Defined only whenXhas feature names that are all strings.Added in 1.0
See Also
LogisticRegression: Logistic regression without tuning the hyperparameterC.
Examples
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegressionCV
X, y = load_iris(return_X_y=True)
clf = LogisticRegressionCV(
cv=5, random_state=0, use_legacy_attributes=False, l1_ratios=(0,)
).fit(X, y)
clf.predict(X[:2, :])
array([0, 0])
clf.predict_proba(X[:2, :]).shape
(2, 3)
clf.score(X, y)
0.98...
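A hedged sketch of declaring the cross-validated variant as a metamorph.ml model step (assuming the chapter's namespace aliases); Cs and cv become :cs and :cv per the kebab-case convention of the parameter table above:
(ml/model {:model-type :sklearn.classification/logistic-regression-cv
           ;; grid of 10 C values, 5-fold cross-validation
           :cs 10
           :cv 5})
Note that cross-validation needs enough samples per class, so a three-row toy dataset would be too small here.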
24.2.22 /mlp-classifier
| name | type | default | description |
|---|---|---|---|
| n-iter-no-change | |||
| learning-rate | |||
| activation | |||
| hidden-layer-sizes | |||
| tol | |||
| beta-2 | |||
| early-stopping | |||
| nesterovs-momentum | |||
| batch-size | |||
| solver | |||
| shuffle | |||
| power-t | |||
| max-fun | |||
| beta-1 | |||
| max-iter | |||
| random-state | |||
| momentum | |||
| learning-rate-init | |||
| alpha | |||
| warm-start | |||
| validation-fraction | |||
| verbose | |||
| epsilon | |||
| predict-proba? |
Multi-layer Perceptron classifier.
This model optimizes the log-loss function using LBFGS or stochastic gradient descent.
Added in 0.18
Parameters
hidden_layer_sizes: array-like of shape(n_layers - 2,), default=(100,) The ith element represents the number of neurons in the ith hidden layer.activation: {'identity', 'logistic', 'tanh', 'relu'}, default='relu' Activation function for the hidden layer.'identity', no-op activation, useful to implement linear bottleneck, returns f(x) = x
'logistic', the logistic sigmoid function, returns f(x) = 1 / (1 + exp(-x)).
'tanh', the hyperbolic tan function, returns f(x) = tanh(x).
'relu', the rectified linear unit function, returns f(x) = max(0, x)
solver: {'lbfgs', 'sgd', 'adam'}, default='adam' The solver for weight optimization.'lbfgs' is an optimizer in the family of quasi-Newton methods.
'sgd' refers to stochastic gradient descent.
'adam' refers to a stochastic gradient-based optimizer proposed by Kingma, Diederik, and Jimmy Ba
For a comparison between Adam optimizer and SGD, see :ref:
sphx_glr_auto_examples_neural_networks_plot_mlp_training_curves.py.Note: The default solver 'adam' works pretty well on relatively large datasets (with thousands of training samples or more) in terms of both training time and validation score. For small datasets, however, 'lbfgs' can converge faster and perform better.
alpha: float, default=0.0001 Strength of the L2 regularization term. The L2 regularization term is divided by the sample size when added to the loss.For an example usage and visualization of varying regularization, see :ref:
sphx_glr_auto_examples_neural_networks_plot_mlp_alpha.py.batch_size: int, default='auto' Size of minibatches for stochastic optimizers. If the solver is 'lbfgs', the classifier will not use minibatch. When set to "auto",batch_size=min(200, n_samples).learning_rate: {'constant', 'invscaling', 'adaptive'}, default='constant' Learning rate schedule for weight updates.'constant' is a constant learning rate given by 'learning_rate_init'.
'invscaling' gradually decreases the learning rate at each time step 't' using an inverse scaling exponent of 'power_t'. effective_learning_rate = learning_rate_init / pow(t, power_t)
'adaptive' keeps the learning rate constant to 'learning_rate_init' as long as training loss keeps decreasing. Each time two consecutive epochs fail to decrease training loss by at least tol, or fail to increase validation score by at least tol if 'early_stopping' is on, the current learning rate is divided by 5.
Only used when
solver='sgd'.learning_rate_init: float, default=0.001 The initial learning rate used. It controls the step-size in updating the weights. Only used when solver='sgd' or 'adam'.power_t: float, default=0.5 The exponent for inverse scaling learning rate. It is used in updating effective learning rate when the learning_rate is set to 'invscaling'. Only used when solver='sgd'.max_iter: int, default=200 Maximum number of iterations. The solver iterates until convergence (determined by 'tol') or this number of iterations. For stochastic solvers ('sgd', 'adam'), note that this determines the number of epochs (how many times each data point will be used), not the number of gradient steps.shuffle: bool, default=True Whether to shuffle samples in each iteration. Only used when solver='sgd' or 'adam'.random_state: int, RandomState instance, default=None Determines random number generation for weights and bias initialization, train-test split if early stopping is used, and batch sampling when solver='sgd' or 'adam'. Pass an int for reproducible results across multiple function calls. SeeGlossary.tol: float, default=1e-4 Tolerance for the optimization. When the loss or score is not improving by at leasttolforn_iter_no_changeconsecutive iterations, unlesslearning_rateis set to 'adaptive', convergence is considered to be reached and training stops.verbose: bool, default=False Whether to print progress messages to stdout.warm_start: bool, default=False When set to True, reuse the solution of the previous call to fit as initialization, otherwise, just erase the previous solution. Seethe Glossary.momentum: float, default=0.9 Momentum for gradient descent update. Should be between 0 and 1. Only used when solver='sgd'.nesterovs_momentum: bool, default=True Whether to use Nesterov's momentum. Only used when solver='sgd' and momentum > 0.early_stopping: bool, default=False Whether to use early stopping to terminate training when validation score is not improving. If set to True, it will automatically set asidevalidation_fractionof training data as validation and terminate training when validation score is not improving by at leasttolforn_iter_no_changeconsecutive epochs. The split is stratified, except in a multilabel setting. If early stopping is False, then the training stops when the training loss does not improve by more thantolforn_iter_no_changeconsecutive passes over the training set. Only effective when solver='sgd' or 'adam'.validation_fraction: float, default=0.1 The proportion of training data to set aside as validation set for early stopping. Must be between 0 and 1. Only used if early_stopping is True.beta_1: float, default=0.9 Exponential decay rate for estimates of first moment vector in adam, should be in [0, 1). Only used when solver='adam'.beta_2: float, default=0.999 Exponential decay rate for estimates of second moment vector in adam, should be in [0, 1). Only used when solver='adam'.epsilon: float, default=1e-8 Value for numerical stability in adam. Only used when solver='adam'.n_iter_no_change: int, default=10 Maximum number of epochs to not meettolimprovement. Only effective when solver='sgd' or 'adam'.Added in 0.20
max_fun: int, default=15000 Only used when solver='lbfgs'. Maximum number of loss function calls. The solver iterates until convergence (determined by 'tol'), number of iterations reaches max_iter, or this number of loss function calls. Note that number of loss function calls will be greater than or equal to the number of iterations for theMLPClassifier.Added in 0.22
Attributes
classes_: ndarray or list of ndarray of shape (n_classes,) Class labels for each output.loss_: float The current loss computed with the loss function.best_loss_: float or None The minimum loss reached by the solver throughout fitting. Ifearly_stopping=True, this attribute is set toNone. Refer to thebest_validation_score_fitted attribute instead.loss_curve_: list of shape (n_iter_,) The ith element in the list represents the loss at the ith iteration.validation_scores_: list of shape (n_iter_,) or None The score at each iteration on a held-out validation set. The score reported is the accuracy score. Only available ifearly_stopping=True, otherwise the attribute is set toNone.best_validation_score_: float or None The best validation score (i.e. accuracy score) that triggered the early stopping. Only available ifearly_stopping=True, otherwise the attribute is set toNone.t_: int The number of training samples seen by the solver during fitting.coefs_: list of shape (n_layers - 1,) The ith element in the list represents the weight matrix corresponding to layer i.intercepts_: list of shape (n_layers - 1,) The ith element in the list represents the bias vector corresponding to layer i + 1.n_features_in_: int Number of features seen duringfit.Added in 0.24
feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen duringfit. Defined only whenXhas feature names that are all strings.Added in 1.0
n_iter_: int The number of iterations the solver has run.n_layers_: int Number of layers.n_outputs_: int Number of outputs.out_activation_: str Name of the output activation function.
See Also
MLPRegressor: Multi-layer Perceptron regressor.BernoulliRBM: Bernoulli Restricted Boltzmann Machine (RBM).
Notes
MLPClassifier trains iteratively since at each time step the partial derivatives of the loss function with respect to the model parameters are computed to update the parameters.
It can also have a regularization term added to the loss function that shrinks model parameters to prevent overfitting.
This implementation works with data represented as dense numpy arrays or sparse scipy arrays of floating point values.
References
Hinton, Geoffrey E. "Connectionist learning procedures." Artificial intelligence 40.1 (1989): 185-234.
Glorot, Xavier, and Yoshua Bengio. "Understanding the difficulty of training deep feedforward neural networks." International Conference on Artificial Intelligence and Statistics. 2010.
:arxiv:He, Kaiming, et al (2015). "Delving deep into rectifiers: Surpassing human-level performance on imagenet classification." <1502.01852>
:arxiv:Kingma, Diederik, and Jimmy Ba (2014) "Adam: A method for stochastic optimization." <1412.6980>
Examples
from sklearn.neural_network import MLPClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
X, y = make_classification(n_samples=100, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y,
random_state=1)
clf = MLPClassifier(random_state=1, max_iter=300).fit(X_train, y_train)
clf.predict_proba(X_test[:1])
array([[0.0383, 0.961]])
clf.predict(X_test[:5, :])
array([1, 0, 1, 0, 1])
clf.score(X_test, y_test)
0.8...
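As a hedged sketch (assuming the chapter's namespace aliases), hidden_layer_sizes would presumably be passed as a Clojure vector, which libpython-clj converts to a Python sequence, with the other options in kebab-case:
(ml/model {:model-type :sklearn.classification/mlp-classifier
           ;; two hidden layers; remaining parameters follow the table above
           :hidden-layer-sizes [16 8]
           :max-iter 300
           :random-state 1})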
24.2.23 /multinomial-nb
| name | type | default | description |
|---|---|---|---|
| alpha | |||
| class-prior | |||
| fit-prior | |||
| force-alpha | |||
| predict-proba? |
Naive Bayes classifier for multinomial models.
The multinomial Naive Bayes classifier is suitable for classification with discrete features (e.g., word counts for text classification). The multinomial distribution normally requires integer feature counts. However, in practice, fractional counts such as tf-idf may also work.
Read more in the User Guide: multinomial_naive_bayes.
Parameters
alpha: float or array-like of shape (n_features,), default=1.0 Additive (Laplace/Lidstone) smoothing parameter (set alpha=0 and force_alpha=True, for no smoothing).force_alpha: bool, default=True If False and alpha is less than 1e-10, it will set alpha to 1e-10. If True, alpha will remain unchanged. This may cause numerical errors if alpha is too close to 0.Added in 1.2 Changed in 1.4 The default value of
force_alphachanged toTrue.fit_prior: bool, default=True Whether to learn class prior probabilities or not. If false, a uniform prior will be used.class_prior: array-like of shape (n_classes,), default=None Prior probabilities of the classes. If specified, the priors are not adjusted according to the data.
Attributes
class_count_: ndarray of shape (n_classes,) Number of samples encountered for each class during fitting. This value is weighted by the sample weight when provided.class_log_prior_: ndarray of shape (n_classes,) Smoothed empirical log probability for each class.classes_: ndarray of shape (n_classes,) Class labels known to the classifierfeature_count_: ndarray of shape (n_classes, n_features) Number of samples encountered for each (class, feature) during fitting. This value is weighted by the sample weight when provided.feature_log_prob_: ndarray of shape (n_classes, n_features) Empirical log probability of features given a class,P(x_i|y).n_features_in_: int Number of features seen duringfit.Added in 0.24
feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen duringfit. Defined only whenXhas feature names that are all strings.Added in 1.0
See Also
BernoulliNB: Naive Bayes classifier for multivariate Bernoulli models.CategoricalNB: Naive Bayes classifier for categorical features.ComplementNB: Complement Naive Bayes classifier.GaussianNB: Gaussian Naive Bayes.
References
C.D. Manning, P. Raghavan and H. Schuetze (2008). Introduction to Information Retrieval. Cambridge University Press, pp. 234-265. https://nlp.stanford.edu/IR-book/html/htmledition/naive-bayes-text-classification-1.html
Examples
import numpy as np
rng = np.random.RandomState(1)
X = rng.randint(5, size=(6, 100))
y = np.array([1, 2, 3, 4, 5, 6])
from sklearn.naive_bayes import MultinomialNB
clf = MultinomialNB()
clf.fit(X, y)
MultinomialNB()
print(clf.predict(X[2:3]))
[3]
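A hedged sketch of the corresponding metamorph.ml model step (assuming the chapter's namespace aliases); note that multinomial naive Bayes expects non-negative feature values:
(ml/model {:model-type :sklearn.classification/multinomial-nb
           ;; additive smoothing and prior fitting, as in the parameter table above
           :alpha 1.0
           :fit-prior true})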
24.2.24 /nearest-centroid
| name | type | default | description |
|---|---|---|---|
| metric | |||
| priors | |||
| shrink-threshold | |||
| predict-proba? |
Nearest centroid classifier.
Each class is represented by its centroid, with test samples classified to the class with the nearest centroid.
Read more in the User Guide: nearest_centroid_classifier.
Parameters
metric: {"euclidean", "manhattan"}, default="euclidean" Metric to use for distance computation.If
metric="euclidean", the centroid for the samples corresponding to each class is the arithmetic mean, which minimizes the sum of squared L1 distances. Ifmetric="manhattan", the centroid is the feature-wise median, which minimizes the sum of L1 distances.Changed in 1.5 All metrics but
"euclidean"and"manhattan"were deprecated and now raise an error.Changed in 0.19
metric='precomputed'was deprecated and now raises an errorshrink_threshold: float, default=None Threshold for shrinking centroids to remove features.priors: {"uniform", "empirical"} or array-like of shape (n_classes,), default="uniform" The class prior probabilities. By default, the class proportions are inferred from the training data.Added in 1.6
Attributes
centroids_: array-like of shape (n_classes, n_features) Centroid of each class.classes_: array of shape (n_classes,) The unique classes labels.n_features_in_: int Number of features seen duringfit.Added in 0.24
feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen duringfit. Defined only whenXhas feature names that are all strings.Added in 1.0
deviations_: ndarray of shape (n_classes, n_features) Deviations (or shrinkages) of the centroids of each class from the overall centroid. Equal to eq. (18.4) ifshrink_threshold=None, else (18.5) p. 653 of [2]. Can be used to identify features used for classification.Added in 1.6
within_class_std_dev_: ndarray of shape (n_features,) Pooled or within-class standard deviation of input data.Added in 1.6
class_prior_: ndarray of shape (n_classes,) The class prior probabilities.Added in 1.6
See Also
KNeighborsClassifier: Nearest neighbors classifier.
Notes
When used for text classification with tf-idf vectors, this classifier is also known as the Rocchio classifier.
References
[1] Tibshirani, R., Hastie, T., Narasimhan, B., & Chu, G. (2002). Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proceedings of the National Academy of Sciences of the United States of America, 99(10), 6567-6572. The National Academy of Sciences.
[2] Hastie, T., Tibshirani, R., Friedman, J. (2009). The Elements of Statistical Learning Data Mining, Inference, and Prediction. 2nd Edition. New York, Springer.
Examples
from sklearn.neighbors import NearestCentroid
import numpy as np
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
y = np.array([1, 1, 1, 2, 2, 2])
clf = NearestCentroid()
clf.fit(X, y)
NearestCentroid()
print(clf.predict([[-0.8, -1]]))
[1]
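A hedged sketch of the model step in Clojure (assuming the chapter's namespace aliases); metric and shrink_threshold become kebab-case keys:
(ml/model {:model-type :sklearn.classification/nearest-centroid
           :metric "euclidean"
           ;; shrink centroids slightly, cf. shrink_threshold above
           :shrink-threshold 0.2})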
24.2.25 /nu-svc
| name | type | default | description |
|---|---|---|---|
| break-ties | |||
| kernel | |||
| gamma | |||
| degree | |||
| decision-function-shape | |||
| probability | |||
| tol | |||
| nu | |||
| shrinking | |||
| max-iter | |||
| random-state | |||
| coef-0 | |||
| class-weight | |||
| cache-size | |||
| verbose | |||
| predict-proba? |
Nu-Support Vector Classification.
Similar to SVC but uses a parameter to control the number of support vectors.
The implementation is based on libsvm.
Read more in the User Guide: svm_classification.
Parameters
nu: float, default=0.5 An upper bound on the fraction of margin errors (see User Guide:nu_svc) and a lower bound of the fraction of support vectors. Should be in the interval (0, 1].kernel: {'linear', 'poly', 'rbf', 'sigmoid', 'precomputed'} or callable, default='rbf' Specifies the kernel type to be used in the algorithm. If none is given, 'rbf' will be used. If a callable is given it is used to precompute the kernel matrix. For an intuitive visualization of different kernel types see :ref:sphx_glr_auto_examples_svm_plot_svm_kernels.py.degree: int, default=3 Degree of the polynomial kernel function ('poly'). Must be non-negative. Ignored by all other kernels.gamma: {'scale', 'auto'} or float, default='scale' Kernel coefficient for 'rbf', 'poly' and 'sigmoid'.- if
gamma='scale'(default) is passed then it uses 1 / (n_features * X.var()) as value of gamma, - if 'auto', uses 1 / n_features
- if float, must be non-negative.
Changed in 0.22 The default value of
gammachanged from 'auto' to 'scale'.- if
coef0: float, default=0.0 Independent term in kernel function. It is only significant in 'poly' and 'sigmoid'.shrinking: bool, default=True Whether to use the shrinking heuristic. See the User Guide:shrinking_svm.probability: bool, default=False Whether to enable probability estimates. This must be enabled prior to callingfit, will slow down that method as it internally uses 5-fold cross-validation, andpredict_probamay be inconsistent withpredict. Read more in the User Guide:scores_probabilities.tol: float, default=1e-3 Tolerance for stopping criterion.cache_size: float, default=200 Specify the size of the kernel cache (in MB).class_weight: {dict, 'balanced'}, default=None Set the parameter C of class i to class_weight[i]*C for SVC. If not given, all classes are supposed to have weight one. The "balanced" mode uses the values of y to automatically adjust weights inversely proportional to class frequencies asn_samples / (n_classes * np.bincount(y)).verbose: bool, default=False Enable verbose output. Note that this setting takes advantage of a per-process runtime setting in libsvm that, if enabled, may not work properly in a multithreaded context.max_iter: int, default=-1 Hard limit on iterations within solver, or -1 for no limit.decision_function_shape: {'ovo', 'ovr'}, default='ovr' Whether to return a one-vs-rest ('ovr') decision function of shape (n_samples, n_classes) as all other classifiers, or the original one-vs-one ('ovo') decision function of libsvm which has shape (n_samples, n_classes * (n_classes - 1) / 2). However, one-vs-one ('ovo') is always used as multi-class strategy. The parameter is ignored for binary classification.Changed in 0.19 decision_function_shape is 'ovr' by default.
Added in 0.17 decision_function_shape='ovr' is recommended.
Changed in 0.17 Deprecated decision_function_shape='ovo' and None.
break_ties: bool, default=False If true,decision_function_shape='ovr', and number of classes > 2,predictwill break ties according to the confidence values ofdecision_function; otherwise the first class among the tied classes is returned. Please note that breaking ties comes at a relatively high computational cost compared to a simple predict. See :ref:sphx_glr_auto_examples_svm_plot_svm_tie_breaking.pyfor an example of its usage withdecision_function_shape='ovr'.Added in 0.22
random_state: int, RandomState instance or None, default=None Controls the pseudo random number generation for shuffling the data for probability estimates. Ignored whenprobabilityis False. Pass an int for reproducible output across multiple function calls. SeeGlossary.
Attributes
class_weight_: ndarray of shape (n_classes,) Multipliers of parameter C of each class. Computed based on theclass_weightparameter.classes_: ndarray of shape (n_classes,) The unique classes labels.coef_: ndarray of shape (n_classes * (n_classes -1) / 2, n_features) Weights assigned to the features (coefficients in the primal problem). This is only available in the case of a linear kernel.coef_is readonly property derived fromdual_coef_andsupport_vectors_.dual_coef_: ndarray of shape (n_classes - 1, n_SV) Dual coefficients of the support vector in the decision function (see :ref:sgd_mathematical_formulation), multiplied by their targets. For multiclass, coefficient for all 1-vs-1 classifiers. The layout of the coefficients in the multiclass case is somewhat non-trivial. See the multi-class section of the User Guide:svm_multi_classfor details.fit_status_: int 0 if correctly fitted, 1 if the algorithm did not converge.intercept_: ndarray of shape (n_classes * (n_classes - 1) / 2,) Constants in decision function.n_features_in_: int Number of features seen duringfit.Added in 0.24
feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen duringfit. Defined only whenXhas feature names that are all strings.Added in 1.0
n_iter_: ndarray of shape (n_classes * (n_classes - 1) // 2,) Number of iterations run by the optimization routine to fit the model. The shape of this attribute depends on the number of models optimized which in turn depends on the number of classes.Added in 1.1
support_: ndarray of shape (n_SV,) Indices of support vectors.support_vectors_: ndarray of shape (n_SV, n_features) Support vectors.n_support_: ndarray of shape (n_classes,), dtype=int32 Number of support vectors for each class.fit_status_: int 0 if correctly fitted, 1 if the algorithm did not converge.probA_: ndarray of shape (n_classes * (n_classes - 1) / 2,)probB_: ndarray of shape (n_classes * (n_classes - 1) / 2,) Ifprobability=True, it corresponds to the parameters learned in Platt scaling to produce probability estimates from decision values. Ifprobability=False, it's an empty array. Platt scaling uses the logistic function1 / (1 + exp(decision_value * probA_ + probB_))whereprobA_andprobB_are learned from the dataset [2]. For more information on the multiclass case and training procedure see section 8 of [1].shape_fit_: tuple of int of shape (n_dimensions_of_X,) Array dimensions of training vectorX.
See Also
SVC: Support Vector Machine for classification using libsvm.LinearSVC: Scalable linear Support Vector Machine for classification using liblinear.
References
Examples
import numpy as np
X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
y = np.array([1, 1, 2, 2])
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import NuSVC
clf = make_pipeline(StandardScaler(), NuSVC())
clf.fit(X, y)
Pipeline(steps=[('standardscaler', StandardScaler()), ('nusvc', NuSVC())])
print(clf.predict([[-0.8, -1]]))
[1]
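A hedged sketch of the corresponding model step (assuming the chapter's namespace aliases); enabling probability estimates slows fitting, as noted in the probability parameter description above:
(def nu-svc-pipe
  (mm/pipeline
   (ds-mm/set-inference-target 2)
   {:metamorph/id :model}
   (ml/model {:model-type :sklearn.classification/nu-svc
              :nu 0.5
              :kernel "rbf"
              ;; internally runs 5-fold cross-validation for probability estimates
              :probability true})))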
24.2.26 /passive-aggressive-classifier
| name | type | default | description |
|---|---|---|---|
| n-iter-no-change | |||
| average | |||
| tol | |||
| early-stopping | |||
| shuffle | |||
| c | |||
| max-iter | |||
| n-jobs | |||
| random-state | |||
| fit-intercept | |||
| warm-start | |||
| validation-fraction | |||
| class-weight | |||
| loss | |||
| verbose | |||
| predict-proba? |
Passive Aggressive Classifier.
Deprecated since 1.8 The whole class PassiveAggressiveClassifier was deprecated in version 1.8 and will be removed in 1.10. Instead use:
clf = SGDClassifier(
loss="hinge",
penalty=None,
learning_rate="pa1", # or "pa2"
eta0=1.0, # for parameter C
)
Read more in the User Guide: passive_aggressive.
Parameters
C: float, default=1.0 Aggressiveness parameter for the passive-aggressive algorithm, see [1]. For PA-I it is the maximum step size. For PA-II it regularizes the step size (the smaller C the more it regularizes). As a general rule-of-thumb, C should be small when the data is noisy.fit_intercept: bool, default=True Whether the intercept should be estimated or not. If False, the data is assumed to be already centered.max_iter: int, default=1000 The maximum number of passes over the training data (aka epochs). It only impacts the behavior in thefitmethod, and not the~sklearn.linear_model.PassiveAggressiveClassifier.partial_fitmethod.Added in 0.19
tol: float or None, default=1e-3 The stopping criterion. If it is not None, the iterations will stop when (loss > previous_loss - tol).Added in 0.19
early_stopping: bool, default=False Whether to use early stopping to terminate training when validation score is not improving. If set to True, it will automatically set aside a stratified fraction of training data as validation and terminate training when validation score is not improving by at leasttolforn_iter_no_changeconsecutive epochs.Added in 0.20
validation_fraction: float, default=0.1 The proportion of training data to set aside as validation set for early stopping. Must be between 0 and 1. Only used if early_stopping is True.Added in 0.20
n_iter_no_change: int, default=5 Number of iterations with no improvement to wait before early stopping.Added in 0.20
shuffle: bool, default=True Whether or not the training data should be shuffled after each epoch.verbose: int, default=0 The verbosity level.loss: str, default="hinge" The loss function to be used: hinge: equivalent to PA-I in the reference paper. squared_hinge: equivalent to PA-II in the reference paper.n_jobs: int or None, default=None The number of CPUs to use to do the OVA (One Versus All, for multi-class problems) computation.Nonemeans 1 unless in ajoblib.parallel_backendcontext.-1means using all processors. SeeGlossaryfor more details.random_state: int, RandomState instance, default=None Used to shuffle the training data, whenshuffleis set toTrue. Pass an int for reproducible output across multiple function calls. SeeGlossary.warm_start: bool, default=False When set to True, reuse the solution of the previous call to fit as initialization, otherwise, just erase the previous solution. Seethe Glossary.Repeatedly calling fit or partial_fit when warm_start is True can result in a different solution than when calling fit a single time because of the way the data is shuffled.
class_weight: dict, {class_label: weight} or "balanced" or None, default=None Preset for the class_weight fit parameter.Weights associated with classes. If not given, all classes are supposed to have weight one.
The "balanced" mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as
n_samples / (n_classes * np.bincount(y)).Added in 0.17 parameter class_weight to automatically weight samples.
average: bool or int, default=False When set to True, computes the averaged SGD weights and stores the result in thecoef_attribute. If set to an int greater than 1, averaging will begin once the total number of samples seen reaches average. So average=10 will begin averaging after seeing 10 samples.Added in 0.19 parameter average to use weights averaging in SGD.
Attributes
coef_: ndarray of shape (1, n_features) if n_classes == 2 else (n_classes, n_features) Weights assigned to the features.intercept_: ndarray of shape (1,) if n_classes == 2 else (n_classes,) Constants in decision function.n_features_in_: int Number of features seen duringfit.Added in 0.24
feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen duringfit. Defined only whenXhas feature names that are all strings.Added in 1.0
n_iter_: int The actual number of iterations to reach the stopping criterion. For multiclass fits, it is the maximum over every binary fit.classes_: ndarray of shape (n_classes,) The unique classes labels.t_: int Number of weight updates performed during training. Same as(n_iter_ * n_samples + 1).
See Also
SGDClassifier: Incrementally trained logistic regression.Perceptron: Linear perceptron classifier.
References
- [1] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, Y. Singer, "Online Passive-Aggressive Algorithms", JMLR 7 (2006). http://jmlr.csail.mit.edu/papers/volume7/crammer06a/crammer06a.pdf
Examples
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.datasets import make_classification
X, y = make_classification(n_features=4, random_state=0)
clf = PassiveAggressiveClassifier(max_iter=1000, random_state=0, tol=1e-3)
clf.fit(X, y)
PassiveAggressiveClassifier(random_state=0)
print(clf.coef_)
[[0.26642044 0.45070924 0.67251877 0.64185414]]
print(clf.intercept_)
[1.84127814]
print(clf.predict([[0, 0, 0, 0]]))
[1]
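To use this model from the Clojure side of this chapter, a minimal sketch could look as follows. The model key :sklearn.classification/passive-aggressive-classifier, the toy dataset, and the chosen parameter values are assumptions for illustration; the parameter keys are the kebab-case forms of the Python parameters documented above.
;; toy dataset: columns 0 and 1 are features, column 2 holds the class label
(def pa-ds (dst/tensor->dataset [[0 0 0] [1 1 0] [2 2 1] [3 3 1]]))
;; pipeline: declare column 2 as the inference target, then attach the sklearn model
(def pa-pipe
  (mm/pipeline
   (ds-mm/set-inference-target 2)
   {:metamorph/id :model}
   (ml/model {:model-type :sklearn.classification/passive-aggressive-classifier
              :max-iter 1000
              :tol 1e-3})))
;; fitting the pipeline calls PassiveAggressiveClassifier.fit under the hood
(def pa-fitted
  (pa-pipe {:metamorph/data pa-ds
            :metamorph/mode :fit}))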
24.2.27 /perceptron
| name | type | default | description |
|---|---|---|---|
| n-iter-no-change | |||
| tol | |||
| early-stopping | |||
| eta-0 | |||
| shuffle | |||
| penalty | |||
| max-iter | |||
| n-jobs | |||
| random-state | |||
| fit-intercept | |||
| alpha | |||
| warm-start | |||
| l-1-ratio | |||
| validation-fraction | |||
| class-weight | |||
| verbose | |||
| predict-proba? |
Linear perceptron classifier.
The implementation is a wrapper around ~sklearn.linear_model.SGDClassifier by fixing the loss and learning_rate parameters as
SGDClassifier(loss="perceptron", learning_rate="constant")
Other available parameters are described below and are forwarded to ~sklearn.linear_model.SGDClassifier.
Read more in the User Guide: perceptron.
Parameters
penalty: {'l2','l1','elasticnet'}, default=None The penalty (aka regularization term) to be used.alpha: float, default=0.0001 Constant that multiplies the regularization term if regularization is used.l1_ratio: float, default=0.15 The Elastic Net mixing parameter, with0 <= l1_ratio <= 1.l1_ratio=0corresponds to L2 penalty,l1_ratio=1to L1. Only used ifpenalty='elasticnet'.Added in 0.24
fit_intercept: bool, default=True Whether the intercept should be estimated or not. If False, the data is assumed to be already centered.max_iter: int, default=1000 The maximum number of passes over the training data (aka epochs). It only impacts the behavior in thefitmethod, and not thepartial_fitmethod.Added in 0.19
tol: float or None, default=1e-3 The stopping criterion. If it is not None, the iterations will stop when (loss > previous_loss - tol).Added in 0.19
shuffle: bool, default=True Whether or not the training data should be shuffled after each epoch.verbose: int, default=0 The verbosity level.eta0: float, default=1 Constant by which the updates are multiplied.n_jobs: int, default=None The number of CPUs to use to do the OVA (One Versus All, for multi-class problems) computation.Nonemeans 1 unless in ajoblib.parallel_backendcontext.-1means using all processors. SeeGlossaryfor more details.random_state: int, RandomState instance or None, default=0 Used to shuffle the training data, whenshuffleis set toTrue. Pass an int for reproducible output across multiple function calls. SeeGlossary.early_stopping: bool, default=False Whether to use early stopping to terminate training when validation score is not improving. If set to True, it will automatically set aside a stratified fraction of training data as validation and terminate training when validation score is not improving by at leasttolforn_iter_no_changeconsecutive epochs.Added in 0.20
validation_fraction: float, default=0.1 The proportion of training data to set aside as validation set for early stopping. Must be between 0 and 1. Only used if early_stopping is True.Added in 0.20
n_iter_no_change: int, default=5 Number of iterations with no improvement to wait before early stopping.Added in 0.20
class_weight: dict, {class_label: weight} or "balanced", default=None Preset for the class_weight fit parameter.Weights associated with classes. If not given, all classes are supposed to have weight one.
The "balanced" mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as
n_samples / (n_classes * np.bincount(y)).warm_start: bool, default=False When set to True, reuse the solution of the previous call to fit as initialization, otherwise, just erase the previous solution. Seethe Glossary.
Attributes
classes_: ndarray of shape (n_classes,) The unique classes labels.coef_: ndarray of shape (1, n_features) if n_classes == 2 else (n_classes, n_features) Weights assigned to the features.intercept_: ndarray of shape (1,) if n_classes == 2 else (n_classes,) Constants in decision function.n_features_in_: int Number of features seen duringfit.Added in 0.24
feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen duringfit. Defined only whenXhas feature names that are all strings.Added in 1.0
n_iter_: int The actual number of iterations to reach the stopping criterion. For multiclass fits, it is the maximum over every binary fit.t_: int Number of weight updates performed during training. Same as(n_iter_ * n_samples + 1).
See Also
sklearn.linear_model.SGDClassifier: Linear classifiers (SVM, logistic regression, etc.) with SGD training.
Notes
Perceptron is a classification algorithm which shares the same underlying implementation with SGDClassifier. In fact, Perceptron() is equivalent to SGDClassifier(loss="perceptron", eta0=1, learning_rate="constant", penalty=None).
References
https://en.wikipedia.org/wiki/Perceptron and references therein.
Examples
from sklearn.datasets import load_digits
from sklearn.linear_model import Perceptron
X, y = load_digits(return_X_y=True)
clf = Perceptron(tol=1e-3, random_state=0)
clf.fit(X, y)
Perceptron()
clf.score(X, y)
0.939...
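A hedged Clojure sketch of the corresponding model step (the key :sklearn.classification/perceptron follows this chapter's naming convention; the parameter keys come from the table above, and the values are illustrative):
;; perceptron step with an L2 penalty and a constant update scale (eta0)
(ml/model {:model-type :sklearn.classification/perceptron
           :penalty "l2"
           :alpha 1e-4
           :eta-0 1.0
           :max-iter 1000
           :tol 1e-3})
Such a model step slots into a metamorph pipeline and is fitted in the same way as the passive-aggressive sketch earlier in this section.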
24.2.28 /quadratic-discriminant-analysis
| name | type | default | description |
|---|---|---|---|
| covariance-estimator | |||
| priors | |||
| reg-param | |||
| shrinkage | |||
| solver | |||
| store-covariance | |||
| tol | |||
| predict-proba? |
Quadratic Discriminant Analysis.
A classifier with a quadratic decision boundary, generated by fitting class conditional densities to the data and using Bayes' rule.
The model fits a Gaussian density to each class.
Added in 0.17
For a comparison between ~sklearn.discriminant_analysis.QuadraticDiscriminantAnalysis and ~sklearn.discriminant_analysis.LinearDiscriminantAnalysis, see :ref:sphx_glr_auto_examples_classification_plot_lda_qda.py.
Read more in the User Guide: lda_qda.
Parameters
solver: {'svd', 'eigen'}, default='svd' Solver to use, possible values: - 'svd': Singular value decomposition (default). Does not compute the covariance matrix, therefore this solver is recommended for data with a large number of features. - 'eigen': Eigenvalue decomposition. Can be combined with shrinkage or custom covariance estimator.shrinkage: 'auto' or float, default=None Shrinkage parameter, possible values: - None: no shrinkage (default). - 'auto': automatic shrinkage using the Ledoit-Wolf lemma. - float between 0 and 1: fixed shrinkage parameter.Enabling shrinkage is expected to improve the model when some classes have a relatively small number of training data points compared to the number of features by mitigating overfitting during the covariance estimation step.
This should be left to
Noneifcovariance_estimatoris used. Note that shrinkage works only with 'eigen' solver.priors: array-like of shape (n_classes,), default=None Class priors. By default, the class proportions are inferred from the training data.reg_param: float, default=0.0 Regularizes the per-class covariance estimates by transforming S2 asS2 = (1 - reg_param) * S2 + reg_param * np.eye(n_features), where S2 corresponds to thescaling_attribute of a given class.store_covariance: bool, default=False If True, the class covariance matrices are explicitly computed and stored in theself.covariance_attribute.Added in 0.17
tol: float, default=1.0e-4 Absolute threshold for the covariance matrix to be considered rank deficient after applying some regularization (seereg_param) to eachSkwhereSkrepresents covariance matrix for k-th class. This parameter does not affect the predictions. It controls when a warning is raised if the covariance matrix is not full rank.Added in 0.17
covariance_estimator: covariance estimator, default=None If not None,covariance_estimatoris used to estimate the covariance matrices instead of relying on the empirical covariance estimator (with potential shrinkage). The object should have a fit method and acovariance_attribute like the estimators insklearn.covariance. If None the shrinkage parameter drives the estimate.This should be left to
Noneifshrinkageis used. Note thatcovariance_estimatorworks only with the 'eigen' solver.
Attributes
covariance_: list of len n_classes of ndarray of shape (n_features, n_features) For each class, gives the covariance matrix estimated using the samples of that class. The estimations are unbiased. Only present ifstore_covarianceis True.means_: array-like of shape (n_classes, n_features) Class-wise means.priors_: array-like of shape (n_classes,) Class priors (sum to 1).rotations_: list of len n_classes of ndarray of shape (n_features, n_k) For each class k an array of shape (n_features, n_k), wheren_k = min(n_features, number of elements in class k)It is the rotation of the Gaussian distribution, i.e. its principal axis. It corresponds toV, the matrix of eigenvectors coming from the SVD ofXk = U S VtwhereXkis the centered matrix of samples from class k.scalings_: list of len n_classes of ndarray of shape (n_k,) For each class, contains the scaling of the Gaussian distributions along its principal axes, i.e. the variance in the rotated coordinate system. It corresponds toS^2 / (n_samples - 1), whereSis the diagonal matrix of singular values from the SVD ofXk, whereXkis the centered matrix of samples from class k.classes_: ndarray of shape (n_classes,) Unique class labels.n_features_in_: int Number of features seen duringfit.Added in 0.24
feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen duringfit. Defined only whenXhas feature names that are all strings.Added in 1.0
See Also
LinearDiscriminantAnalysis: Linear Discriminant Analysis.
Examples
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
import numpy as np
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
y = np.array([1, 1, 1, 2, 2, 2])
clf = QuadraticDiscriminantAnalysis()
clf.fit(X, y)
QuadraticDiscriminantAnalysis()
print(clf.predict([[-0.8, -1]]))
[1]
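As a sketch, the same model could be declared from Clojure like this (the key :sklearn.classification/quadratic-discriminant-analysis and the values are illustrative assumptions; the parameter keys are taken from the table above):
;; regularize the per-class covariance estimates and keep them on the fitted model
(ml/model {:model-type :sklearn.classification/quadratic-discriminant-analysis
           :reg-param 0.1
           :store-covariance true})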
24.2.29 /radius-neighbors-classifier
| name | type | default | description |
|---|---|---|---|
| weights | |||
| p | |||
| leaf-size | |||
| metric-params | |||
| radius | |||
| outlier-label | |||
| algorithm | |||
| n-jobs | |||
| metric | |||
| predict-proba? |
Classifier implementing a vote among neighbors within a given radius.
Read more in the User Guide: classification.
Parameters
radius: float, default=1.0 Range of parameter space to use by default forradius_neighborsqueries.weights: {'uniform', 'distance'}, callable or None, default='uniform' Weight function used in prediction. Possible values:- 'uniform' : uniform weights. All points in each neighborhood are weighted equally.
- 'distance' : weight points by the inverse of their distance. In this case, closer neighbors of a query point will have a greater influence than neighbors which are further away.
- [callable] : a user-defined function which accepts an array of distances, and returns an array of the same shape containing the weights.
Uniform weights are used by default.
algorithm: {'auto', 'ball_tree', 'kd_tree', 'brute'}, default='auto' Algorithm used to compute the nearest neighbors:- 'ball_tree' will use
BallTree - 'kd_tree' will use
KDTree - 'brute' will use a brute-force search.
- 'auto' will attempt to decide the most appropriate algorithm based on the values passed to
fitmethod.
Note: fitting on sparse input will override the setting of this parameter, using brute force.
- 'ball_tree' will use
leaf_size: int, default=30 Leaf size passed to BallTree or KDTree. This can affect the speed of the construction and query, as well as the memory required to store the tree. The optimal value depends on the nature of the problem.p: float, default=2 Power parameter for the Minkowski metric. When p = 1, this is equivalent to using manhattan_distance (l1), and euclidean_distance (l2) for p = 2. For arbitrary p, minkowski_distance (l_p) is used. This parameter is expected to be positive.metric: str or callable, default='minkowski' Metric to use for distance computation. Default is "minkowski", which results in the standard Euclidean distance when p = 2. See the documentation of scipy.spatial.distance and the metrics listed in~sklearn.metrics.pairwise.distance_metricsfor valid metric values.If metric is "precomputed", X is assumed to be a distance matrix and must be square during fit. X may be a
sparse graph, in which case only "nonzero" elements may be considered neighbors.If metric is a callable function, it takes two arrays representing 1D vectors as inputs and must return one value indicating the distance between those vectors. This works for Scipy's metrics, but is less efficient than passing the metric name as a string.
outlier_label: {manual label, 'most_frequent'}, default=None Label for outlier samples (samples with no neighbors in given radius).- manual label: str or int label (should be the same type as y) or list of manual labels if multi-output is used.
- 'most_frequent' : assign the most frequent label of y to outliers.
- None : when any outlier is detected, ValueError will be raised.
The outlier label should be selected from among the unique 'Y' labels. If it is specified with a different value a warning will be raised and all class probabilities of outliers will be assigned to be 0.
metric_params: dict, default=None Additional keyword arguments for the metric function.n_jobs: int, default=None The number of parallel jobs to run for neighbors search.Nonemeans 1 unless in ajoblib.parallel_backendcontext.-1means using all processors. SeeGlossaryfor more details.
Attributes
classes_: ndarray of shape (n_classes,) Class labels known to the classifier.effective_metric_: str or callable The distance metric used. It will be same as themetricparameter or a synonym of it, e.g. 'euclidean' if themetricparameter set to 'minkowski' andpparameter set to 2.effective_metric_params_: dict Additional keyword arguments for the metric function. For most metrics will be same withmetric_paramsparameter, but may also contain thepparameter value if theeffective_metric_attribute is set to 'minkowski'.n_features_in_: int Number of features seen duringfit.Added in 0.24
feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen duringfit. Defined only whenXhas feature names that are all strings.Added in 1.0
n_samples_fit_: int Number of samples in the fitted data.outlier_label_: int or array-like of shape (n_class,) Label which is given for outlier samples (samples with no neighbors on given radius).outputs_2d_: bool False wheny's shape is (n_samples, ) or (n_samples, 1) during fit otherwise True.
See Also
KNeighborsClassifier: Classifier implementing the k-nearest neighbors vote.RadiusNeighborsRegressor: Regression based on neighbors within a fixed radius.KNeighborsRegressor: Regression based on k-nearest neighbors.NearestNeighbors: Unsupervised learner for implementing neighbor searches.
Notes
See Nearest Neighbors: neighbors in the online documentation for a discussion of the choice of algorithm and leaf_size.
https://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm
Examples
X = [[0], [1], [2], [3]]
y = [0, 0, 1, 1]
from sklearn.neighbors import RadiusNeighborsClassifier
neigh = RadiusNeighborsClassifier(radius=1.0)
neigh.fit(X, y)
RadiusNeighborsClassifier(...)
print(neigh.predict([[1.5]]))
[0]
print(neigh.predict_proba([[1.0]]))
[[0.66666667 0.33333333]]
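A possible Clojure model step for this classifier (assumed key :sklearn.classification/radius-neighbors-classifier; the string options are the Python values documented above):
;; distance-weighted vote within a radius of 1.0;
;; samples with no neighbors get the most frequent label instead of raising an error
(ml/model {:model-type :sklearn.classification/radius-neighbors-classifier
           :radius 1.0
           :weights "distance"
           :outlier-label "most_frequent"})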
24.2.30 /random-forest-classifier
| name | type | default | description |
|---|---|---|---|
| min-weight-fraction-leaf | |||
| max-leaf-nodes | |||
| min-impurity-decrease | |||
| min-samples-split | |||
| bootstrap | |||
| ccp-alpha | |||
| n-jobs | |||
| random-state | |||
| oob-score | |||
| min-samples-leaf | |||
| max-features | |||
| monotonic-cst | |||
| warm-start | |||
| max-depth | |||
| class-weight | |||
| n-estimators | |||
| max-samples | |||
| criterion | |||
| verbose | |||
| predict-proba? |
A random forest classifier.
A random forest is a meta estimator that fits a number of decision tree classifiers on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting. Trees in the forest use the best split strategy, i.e. equivalent to passing splitter="best" to the underlying ~sklearn.tree.DecisionTreeClassifier. The sub-sample size is controlled with the max_samples parameter if bootstrap=True (default), otherwise the whole dataset is used to build each tree.
For a comparison between tree-based ensemble models see the example :ref:sphx_glr_auto_examples_ensemble_plot_forest_hist_grad_boosting_comparison.py.
This estimator has native support for missing values (NaNs). During training, the tree grower learns at each split point whether samples with missing values should go to the left or right child, based on the potential gain. When predicting, samples with missing values are assigned to the left or right child consequently. If no missing values were encountered for a given feature during training, then samples with missing values are mapped to whichever child has the most samples.
Read more in the User Guide: forest.
Parameters
n_estimators: int, default=100 The number of trees in the forest.Changed in 0.22 The default value of
n_estimatorschanged from 10 to 100 in 0.22.criterion: {"gini", "entropy", "log_loss"}, default="gini" The function to measure the quality of a split. Supported criteria are "gini" for the Gini impurity and "log_loss" and "entropy" both for the Shannon information gain, see :ref:tree_mathematical_formulation. Note: This parameter is tree-specific.max_depth: int, default=None The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.min_samples_split: int or float, default=2 The minimum number of samples required to split an internal node:- If int, then consider
min_samples_splitas the minimum number. - If float, then
min_samples_splitis a fraction andceil(min_samples_split * n_samples)are the minimum number of samples for each split.
Changed in 0.18 Added float values for fractions.
- If int, then consider
min_samples_leaf: int or float, default=1 The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at leastmin_samples_leaftraining samples in each of the left and right branches. This may have the effect of smoothing the model, especially in regression.- If int, then consider
min_samples_leafas the minimum number. - If float, then
min_samples_leafis a fraction andceil(min_samples_leaf * n_samples)are the minimum number of samples for each node.
Changed in 0.18 Added float values for fractions.
- If int, then consider
min_weight_fraction_leaf: float, default=0.0 The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sample_weight is not provided.max_features: {"sqrt", "log2", None}, int or float, default="sqrt" The number of features to consider when looking for the best split:- If int, then consider
max_featuresfeatures at each split. - If float, then
max_featuresis a fraction andmax(1, int(max_features * n_features_in_))features are considered at each split. - If "sqrt", then
max_features=sqrt(n_features). - If "log2", then
max_features=log2(n_features). - If None, then
max_features=n_features.
Changed in 1.1 The default of
max_featureschanged from"auto"to"sqrt".Note: the search for a split does not stop until at least one valid partition of the node samples is found, even if it requires to effectively inspect more than
max_featuresfeatures.- If int, then consider
max_leaf_nodes: int, default=None Grow trees withmax_leaf_nodesin best-first fashion. Best nodes are defined as relative reduction in impurity. If None then unlimited number of leaf nodes.min_impurity_decrease: float, default=0.0 A node will be split if this split induces a decrease of the impurity greater than or equal to this value.The weighted impurity decrease equation is the following
N_t / N * (impurity - N_t_R / N_t * right_impurity
- N_t_L / N_t * left_impurity)
where N is the total number of samples, N_t is the number of samples at the current node, N_t_L is the number of samples in the left child, and N_t_R is the number of samples in the right child.
N, N_t, N_t_R and N_t_L all refer to the weighted sum, if sample_weight is passed.
Added in 0.19
bootstrap: bool, default=True Whether bootstrap samples are used when building trees. If False, the whole dataset is used to build each tree.oob_score: bool or callable, default=False Whether to use out-of-bag samples to estimate the generalization score. By default,~sklearn.metrics.accuracy_scoreis used. Provide a callable with signaturemetric(y_true, y_pred)to use a custom metric. Only available ifbootstrap=True.For an illustration of out-of-bag (OOB) error estimation, see the example :ref:
sphx_glr_auto_examples_ensemble_plot_ensemble_oob.py.n_jobs: int, default=None The number of jobs to run in parallel.fit,predict,decision_pathandapplyare all parallelized over the trees.Nonemeans 1 unless in ajoblib.parallel_backendcontext.-1means using all processors. SeeGlossaryfor more details.random_state: int, RandomState instance or None, default=None Controls both the randomness of the bootstrapping of the samples used when building trees (ifbootstrap=True) and the sampling of the features to consider when looking for the best split at each node (ifmax_features < n_features). SeeGlossaryfor details.verbose: int, default=0 Controls the verbosity when fitting and predicting.warm_start: bool, default=False When set toTrue, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just fit a whole new forest. SeeGlossaryand :ref:tree_ensemble_warm_startfor details.class_weight: {"balanced", "balanced_subsample"}, dict or list of dicts, default=None Weights associated with classes in the form{class_label: weight}. If not given, all classes are supposed to have weight one. For multi-output problems, a list of dicts can be provided in the same order as the columns of y.Note that for multioutput (including multilabel) weights should be defined for each class of every column in its own dict. For example, for four-class multilabel classification weights should be [{0: 1, 1: 1}, {0: 1, 1: 5}, {0: 1, 1: 1}, {0: 1, 1: 1}] instead of [{1:1}, {2:5}, {3:1}, {4:1}].
The "balanced" mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as
n_samples / (n_classes * np.bincount(y))The "balanced_subsample" mode is the same as "balanced" except that weights are computed based on the bootstrap sample for every tree grown.
For multi-output, the weights of each column of y will be multiplied.
Note that these weights will be multiplied with sample_weight (passed through the fit method) if sample_weight is specified.
ccp_alpha: non-negative float, default=0.0 Complexity parameter used for Minimal Cost-Complexity Pruning. The subtree with the largest cost complexity that is smaller thanccp_alphawill be chosen. By default, no pruning is performed. See :ref:minimal_cost_complexity_pruningfor details. See :ref:sphx_glr_auto_examples_tree_plot_cost_complexity_pruning.pyfor an example of such pruning.Added in 0.22
max_samples: int or float, default=None If bootstrap is True, the number of samples to draw from X to train each base estimator.- If None (default), then draw
X.shape[0]samples. - If int, then draw
max_samplessamples. - If float, then draw
max(round(n_samples * max_samples), 1)samples. Thus,max_samplesshould be in the interval(0.0, 1.0].
Added in 0.22
- If None (default), then draw
monotonic_cst: array-like of int of shape (n_features), default=None Indicates the monotonicity constraint to enforce on each feature. - 1: monotonic increase - 0: no constraint - -1: monotonic decreaseIf monotonic_cst is None, no constraints are applied.
Monotonicity constraints are not supported for: - multiclass classifications (i.e. when
n_classes > 2), - multioutput classifications (i.e. whenn_outputs_ > 1), - classifications trained on data with missing values.The constraints hold over the probability of the positive class.
Read more in the User Guide:
monotonic_cst_gbdt.Added in 1.4
Attributes
estimator_:~sklearn.tree.DecisionTreeClassifierThe child estimator template used to create the collection of fitted sub-estimators.Added in 1.2
base_estimator_was renamed toestimator_.estimators_: list of DecisionTreeClassifier The collection of fitted sub-estimators.classes_: ndarray of shape (n_classes,) or a list of such arrays The classes labels (single output problem), or a list of arrays of class labels (multi-output problem).n_classes_: int or list The number of classes (single output problem), or a list containing the number of classes for each output (multi-output problem).n_features_in_: int Number of features seen duringfit.Added in 0.24
feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen duringfit. Defined only whenXhas feature names that are all strings.Added in 1.0
n_outputs_: int The number of outputs whenfitis performed.feature_importances_: ndarray of shape (n_features,) The impurity-based feature importances. The higher, the more important the feature. The importance of a feature is computed as the (normalized) total reduction of the criterion brought by that feature. It is also known as the Gini importance.Warning: impurity-based feature importances can be misleading for high cardinality features (many unique values). See
sklearn.inspection.permutation_importanceas an alternative.oob_score_: float Score of the training dataset obtained using an out-of-bag estimate. This attribute exists only whenoob_scoreis True.oob_decision_function_: ndarray of shape (n_samples, n_classes) or (n_samples, n_classes, n_outputs) Decision function computed with out-of-bag estimate on the training set. If n_estimators is small it might be possible that a data point was never left out during the bootstrap. In this case,oob_decision_function_might contain NaN. This attribute exists only whenoob_scoreis True.estimators_samples_: list of arrays The subset of drawn samples (i.e., the in-bag samples) for each base estimator. Each subset is defined by an array of the indices selected.Added in 1.4
See Also
sklearn.tree.DecisionTreeClassifier: A decision tree classifier.sklearn.ensemble.ExtraTreesClassifier: Ensemble of extremely randomized tree classifiers.sklearn.ensemble.HistGradientBoostingClassifier: A Histogram-based Gradient Boosting Classification Tree, very fast for big datasets (n_samples >= 10_000).
Notes
The default values for the parameters controlling the size of the trees (e.g. max_depth, min_samples_leaf, etc.) lead to fully grown and unpruned trees which can potentially be very large on some data sets. To reduce memory consumption, the complexity and size of the trees should be controlled by setting those parameter values.
The features are always randomly permuted at each split. Therefore, the best found split may vary, even with the same training data, max_features=n_features and bootstrap=False, if the improvement of the criterion is identical for several splits enumerated during the search of the best split. To obtain a deterministic behaviour during fitting, random_state has to be fixed.
References
- [1] L. Breiman, "Random Forests", Machine Learning, 45(1), 5-32, 2001. doi:10.1023/A:1010933404324
Examples
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
X, y = make_classification(n_samples=1000, n_features=4,
n_informative=2, n_redundant=0,
random_state=0, shuffle=False)
clf = RandomForestClassifier(max_depth=2, random_state=0)
clf.fit(X, y)
RandomForestClassifier(...)
print(clf.predict([[0, 0, 0, 0]]))
[1]
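A sketch of declaring this model from Clojure (the key :sklearn.classification/random-forest-classifier and the parameter values are illustrative; the keys follow the table above):
;; a modest forest of shallow trees with reproducible randomness
(ml/model {:model-type :sklearn.classification/random-forest-classifier
           :n-estimators 200
           :max-depth 3
           :random-state 0})
For imbalanced targets, adding :class-weight "balanced" would mirror the class_weight option described above.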
24.2.31 /ridge-classifier
| name | type | default | description |
|---|---|---|---|
| positive | |||
| tol | |||
| solver | |||
| max-iter | |||
| random-state | |||
| copy-x | |||
| fit-intercept | |||
| alpha | |||
| class-weight | |||
| predict-proba? |
Classifier using Ridge regression.
This classifier first converts the target values into {-1, 1} and then treats the problem as a regression task (multi-output regression in the multiclass case).
Read more in the User Guide: ridge_regression.
Parameters
alpha: float, default=1.0 Regularization strength; must be a positive float. Regularization improves the conditioning of the problem and reduces the variance of the estimates. Larger values specify stronger regularization. Alpha corresponds to1 / (2C)in other linear models such as~sklearn.linear_model.LogisticRegressionor~sklearn.svm.LinearSVC.fit_intercept: bool, default=True Whether to calculate the intercept for this model. If set to false, no intercept will be used in calculations (e.g. data is expected to be already centered).copy_X: bool, default=True If True, X will be copied; else, it may be overwritten.max_iter: int, default=None Maximum number of iterations for conjugate gradient solver. The default value is determined by scipy.sparse.linalg.tol: float, default=1e-4 The precision of the solution (coef_) is determined bytolwhich specifies a different convergence criterion for each solver:'svd':
tolhas no impact.'cholesky':
tolhas no impact.'sparse_cg': norm of residuals smaller than
tol.'lsqr':
tolis set as atol and btol of scipy.sparse.linalg.lsqr, which control the norm of the residual vector in terms of the norms of matrix and coefficients.'sag' and 'saga': relative change of coef smaller than
tol.'lbfgs': maximum of the absolute (projected) gradient=max|residuals| smaller than
tol.
Changed in 1.2 Default value changed from 1e-3 to 1e-4 for consistency with other linear models.
class_weight: dict or 'balanced', default=None Weights associated with classes in the form{class_label: weight}. If not given, all classes are supposed to have weight one.The "balanced" mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as
n_samples / (n_classes * np.bincount(y)).solver: {'auto', 'svd', 'cholesky', 'lsqr', 'sparse_cg', 'sag', 'saga', 'lbfgs'}, default='auto' Solver to use in the computational routines:'auto' chooses the solver automatically based on the type of data.
'svd' uses a Singular Value Decomposition of X to compute the Ridge coefficients. It is the most stable solver, in particular more stable for singular matrices than 'cholesky' at the cost of being slower.
'cholesky' uses the standard scipy.linalg.solve function to obtain a closed-form solution.
'sparse_cg' uses the conjugate gradient solver as found in scipy.sparse.linalg.cg. As an iterative algorithm, this solver is more appropriate than 'cholesky' for large-scale data (possibility to set
tolandmax_iter).'lsqr' uses the dedicated regularized least-squares routine scipy.sparse.linalg.lsqr. It is the fastest and uses an iterative procedure.
'sag' uses a Stochastic Average Gradient descent, and 'saga' uses its unbiased and more flexible version named SAGA. Both methods use an iterative procedure, and are often faster than other solvers when both n_samples and n_features are large. Note that 'sag' and 'saga' fast convergence is only guaranteed on features with approximately the same scale. You can preprocess the data with a scaler from sklearn.preprocessing.
Added in 0.17 Stochastic Average Gradient descent solver. Added in 0.19 SAGA solver.
'lbfgs' uses L-BFGS-B algorithm implemented in
scipy.optimize.minimize. It can be used only whenpositiveis True.
positive: bool, default=False When set toTrue, forces the coefficients to be positive. Only 'lbfgs' solver is supported in this case.random_state: int, RandomState instance, default=None Used whensolver== 'sag' or 'saga' to shuffle the data. SeeGlossaryfor details.
Attributes
coef_: ndarray of shape (1, n_features) or (n_classes, n_features) Coefficient of the features in the decision function.coef_is of shape (1, n_features) when the given problem is binary.intercept_: float or ndarray of shape (n_targets,) Independent term in decision function. Set to 0.0 iffit_intercept = False.n_iter_: None or ndarray of shape (n_targets,) Actual number of iterations for each target. Available only for sag and lsqr solvers. Other solvers will return None.classes_: ndarray of shape (n_classes,) The classes labels.n_features_in_: int Number of features seen duringfit.Added in 0.24
feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen duringfit. Defined only whenXhas feature names that are all strings.Added in 1.0
solver_: str The solver that was used at fit time by the computational routines.Added in 1.5
See Also
Ridge: Ridge regression.RidgeClassifierCV: Ridge classifier with built-in cross validation.
Notes
For multi-class classification, n_class classifiers are trained in a one-versus-all approach. Concretely, this is implemented by taking advantage of the multi-variate response support in Ridge.
Examples
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import RidgeClassifier
X, y = load_breast_cancer(return_X_y=True)
clf = RidgeClassifier().fit(X, y)
clf.score(X, y)
0.9595...
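A hedged Clojure sketch (assumed key :sklearn.classification/ridge-classifier; parameter keys from the table above, values chosen only for illustration):
;; stronger regularization plus balanced class weights
(ml/model {:model-type :sklearn.classification/ridge-classifier
           :alpha 10.0
           :class-weight "balanced"})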
24.2.32 /ridge-classifier-cv
| name | type | default | description |
|---|---|---|---|
| alphas | |||
| class-weight | |||
| cv | |||
| fit-intercept | |||
| scoring | |||
| store-cv-results | |||
| predict-proba? |
Ridge classifier with built-in cross-validation.
See glossary entry for cross-validation estimator.
By default, it performs Leave-One-Out Cross-Validation. Currently, only the n_features > n_samples case is handled efficiently.
Read more in the User Guide: ridge_regression.
Parameters
alphas: array-like of shape (n_alphas,), default=(0.1, 1.0, 10.0) Array of alpha values to try. Regularization strength; must be a positive float. Regularization improves the conditioning of the problem and reduces the variance of the estimates. Larger values specify stronger regularization. Alpha corresponds to1 / (2C)in other linear models such as~sklearn.linear_model.LogisticRegressionor~sklearn.svm.LinearSVC. If using Leave-One-Out cross-validation, alphas must be strictly positive.fit_intercept: bool, default=True Whether to calculate the intercept for this model. If set to false, no intercept will be used in calculations (i.e. data is expected to be centered).scoring: str, callable, default=None The scoring method to use for cross-validation. Options:- str: see :ref:
scoring_string_namesfor options. - callable: a scorer callable object (e.g., function) with signature
scorer(estimator, X, y). See :ref:scoring_callablefor details. None: negative mean squared error:mean_squared_errorif cv is None (i.e. when using leave-one-out cross-validation), or accuracy:accuracy_scoreotherwise.
- str: see :ref:
cv: int, cross-validation generator or an iterable, default=None Determines the cross-validation splitting strategy. Possible inputs for cv are:- None, to use the efficient Leave-One-Out cross-validation
- integer, to specify the number of folds.
CV splitter,- An iterable yielding (train, test) splits as arrays of indices.
Refer User Guide:
cross_validationfor the various cross-validation strategies that can be used here.class_weight: dict or 'balanced', default=None Weights associated with classes in the form{class_label: weight}. If not given, all classes are supposed to have weight one.The "balanced" mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as
n_samples / (n_classes * np.bincount(y)).store_cv_results: bool, default=False Flag indicating if the cross-validation results corresponding to each alpha should be stored in thecv_results_attribute (see below). This flag is only compatible withcv=None(i.e. using Leave-One-Out Cross-Validation).Changed in 1.5 Parameter name changed from
store_cv_valuestostore_cv_results.
Attributes
cv_results_: ndarray of shape (n_samples, n_targets, n_alphas), optional Cross-validation results for each alpha (only ifstore_cv_results=Trueandcv=None). Afterfit()has been called, this attribute will contain the mean squared errors ifscoring is Noneotherwise it will contain standardized per point prediction values.Changed in 1.5
cv_values_changed tocv_results_.coef_: ndarray of shape (1, n_features) or (n_targets, n_features) Coefficient of the features in the decision function.coef_is of shape (1, n_features) when the given problem is binary.intercept_: float or ndarray of shape (n_targets,) Independent term in decision function. Set to 0.0 iffit_intercept = False.alpha_: float Estimated regularization parameter.best_score_: float Score of base estimator with best alpha.Added in 0.23
classes_: ndarray of shape (n_classes,) The classes labels.n_features_in_: int Number of features seen duringfit.Added in 0.24
feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen duringfit. Defined only whenXhas feature names that are all strings.Added in 1.0
See Also
Ridge: Ridge regression.RidgeClassifier: Ridge classifier.RidgeCV: Ridge regression with built-in cross validation.
Notes
For multi-class classification, n_class classifiers are trained in a one-versus-all approach. Concretely, this is implemented by taking advantage of the multi-variate response support in Ridge.
Examples
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import RidgeClassifierCV
X, y = load_breast_cancer(return_X_y=True)
clf = RidgeClassifierCV(alphas=[1e-3, 1e-2, 1e-1, 1]).fit(X, y)
clf.score(X, y)
0.9630...
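A sketch of the cross-validated variant from Clojure. The key :sklearn.classification/ridge-classifier-cv is assumed, and it is also an assumption that a Clojure vector of alphas converts to the array-like that sklearn expects:
;; try several regularization strengths via the built-in leave-one-out CV
(ml/model {:model-type :sklearn.classification/ridge-classifier-cv
           :alphas [1e-3 1e-2 1e-1 1.0]})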
24.2.33 /self-training-classifier
| name | type | default | description |
|---|---|---|---|
| criterion | |||
| estimator | |||
| k-best | |||
| max-iter | |||
| threshold | |||
| verbose | |||
| predict-proba? |
Self-training classifier.
This metaestimator allows a given supervised classifier to function as a semi-supervised classifier, allowing it to learn from unlabeled data. It does this by iteratively predicting pseudo-labels for the unlabeled data and adding them to the training set.
The classifier will continue iterating until either max_iter is reached, or no pseudo-labels were added to the training set in the previous iteration.
Read more in the User Guide: self_training.
Parameters
estimator: estimator object An estimator object implementingfitandpredict_proba. Invoking thefitmethod will fit a clone of the passed estimator, which will be stored in theestimator_attribute.Added in 1.6
estimatorwas added to replacebase_estimator.threshold: float, default=0.75 The decision threshold for use withcriterion='threshold'. Should be in [0, 1). When using the'threshold'criterion, a well calibrated classifier:calibrationshould be used.criterion: {'threshold', 'k_best'}, default='threshold' The selection criterion used to select which labels to add to the training set. If'threshold', pseudo-labels with prediction probabilities abovethresholdare added to the dataset. If'k_best', thek_bestpseudo-labels with highest prediction probabilities are added to the dataset. When using the 'threshold' criterion, a well calibrated classifier:calibrationshould be used.k_best: int, default=10 The amount of samples to add in each iteration. Only used whencriterion='k_best'.max_iter: int or None, default=10 Maximum number of iterations allowed. Should be greater than or equal to 0. If it isNone, the classifier will continue to predict labels until no new pseudo-labels are added, or all unlabeled samples have been labeled.verbose: bool, default=False Enable verbose output.
Attributes
estimator_: estimator object The fitted estimator.classes_: ndarray or list of ndarray of shape (n_classes,) Class labels for each output. (Taken from the trainedestimator_).transduction_: ndarray of shape (n_samples,) The labels used for the final fit of the classifier, including pseudo-labels added during fit.labeled_iter_: ndarray of shape (n_samples,) The iteration in which each sample was labeled. When a sample has iteration 0, the sample was already labeled in the original dataset. When a sample has iteration -1, the sample was not labeled in any iteration.n_features_in_: int Number of features seen duringfit.Added in 0.24
feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen duringfit. Defined only whenXhas feature names that are all strings.Added in 1.0
n_iter_: int The number of rounds of self-training, that is the number of times the base estimator is fitted on relabeled variants of the training set.termination_condition_: {'max_iter', 'no_change', 'all_labeled'} The reason that fitting was stopped.'max_iter':n_iter_reachedmax_iter.'no_change': no new labels were predicted.'all_labeled': all unlabeled samples were labeled beforemax_iterwas reached.
See Also
LabelPropagation: Label propagation classifier.LabelSpreading: Label spreading model for semi-supervised learning.
References
David Yarowsky. 1995. Unsupervised word sense disambiguation rivaling supervised methods. In Proceedings of the 33rd Annual Meeting on Association for Computational Linguistics (ACL '95). Association for Computational Linguistics, Stroudsburg, PA, USA, 189-196. doi:10.3115/981658.981684
Examples
import numpy as np
from sklearn import datasets
from sklearn.semi_supervised import SelfTrainingClassifier
from sklearn.svm import SVC
rng = np.random.RandomState(42)
iris = datasets.load_iris()
random_unlabeled_points = rng.rand(iris.target.shape[0]) < 0.3
iris.target[random_unlabeled_points] = -1
svc = SVC(probability=True, gamma="auto")
self_training_model = SelfTrainingClassifier(svc)
self_training_model.fit(iris.data, iris.target)
SelfTrainingClassifier(...)
24.2.34 /sgd-classifier
| name | type | default | description |
|---|---|---|---|
| n-iter-no-change | |||
| learning-rate | |||
| average | |||
| tol | |||
| early-stopping | |||
| eta-0 | |||
| shuffle | |||
| penalty | |||
| power-t | |||
| max-iter | |||
| n-jobs | |||
| random-state | |||
| fit-intercept | |||
| alpha | |||
| warm-start | |||
| l-1-ratio | |||
| validation-fraction | |||
| class-weight | |||
| loss | |||
| verbose | |||
| epsilon | |||
| predict-proba? |
Linear classifiers (SVM, logistic regression, etc.) with SGD training.
This estimator implements regularized linear models with stochastic gradient descent (SGD) learning: the gradient of the loss is estimated each sample at a time and the model is updated along the way with a decreasing strength schedule (aka learning rate). SGD allows minibatch (online/out-of-core) learning via the partial_fit method. For best results using the default learning rate schedule, the data should have zero mean and unit variance.
This implementation works with data represented as dense or sparse arrays of floating point values for the features. The model it fits can be controlled with the loss parameter; by default, it fits a linear support vector machine (SVM).
The regularizer is a penalty added to the loss function that shrinks model parameters towards the zero vector using either the squared euclidean norm L2 or the absolute norm L1 or a combination of both (Elastic Net). If the parameter update crosses the 0.0 value because of the regularizer, the update is truncated to 0.0 to allow for learning sparse models and achieve online feature selection.
Read more in the User Guide: sgd.
Parameters
loss: {'hinge', 'log_loss', 'modified_huber', 'squared_hinge', 'perceptron', 'squared_error', 'huber', 'epsilon_insensitive', 'squared_epsilon_insensitive'}, default='hinge' The loss function to be used.- 'hinge' gives a linear SVM.
- 'log_loss' gives logistic regression, a probabilistic classifier.
- 'modified_huber' is another smooth loss that brings tolerance to outliers as well as probability estimates.
- 'squared_hinge' is like hinge but is quadratically penalized.
- 'perceptron' is the linear loss used by the perceptron algorithm.
- The other losses, 'squared_error', 'huber', 'epsilon_insensitive' and 'squared_epsilon_insensitive' are designed for regression but can be useful in classification as well; see
~sklearn.linear_model.SGDRegressorfor a description.
More details about the losses formulas can be found in the User Guide:
sgd_mathematical_formulationand you can find a visualisation of the loss functions in :ref:sphx_glr_auto_examples_linear_model_plot_sgd_loss_functions.py.penalty: {'l2', 'l1', 'elasticnet', None}, default='l2' The penalty (aka regularization term) to be used. Defaults to 'l2' which is the standard regularizer for linear SVM models. 'l1' and 'elasticnet' might bring sparsity to the model (feature selection) not achievable with 'l2'. No penalty is added when set toNone.You can see a visualisation of the penalties in :ref:
sphx_glr_auto_examples_linear_model_plot_sgd_penalties.py.alpha: float, default=0.0001 Constant that multiplies the regularization term. The higher the value, the stronger the regularization. Also used to compute the learning rate whenlearning_rateis set to 'optimal'. Values must be in the range[0.0, inf).l1_ratio: float, default=0.15 The Elastic Net mixing parameter, with 0 <= l1_ratio <= 1. l1_ratio=0 corresponds to L2 penalty, l1_ratio=1 to L1. Only used ifpenaltyis 'elasticnet'. Values must be in the range[0.0, 1.0]or can beNoneifpenaltyis notelasticnet.Changed in 1.7
l1_ratiocan beNonewhenpenaltyis not "elasticnet".fit_intercept: bool, default=True Whether the intercept should be estimated or not. If False, the data is assumed to be already centered.max_iter: int, default=1000 The maximum number of passes over the training data (aka epochs). It only impacts the behavior in thefitmethod, and not thepartial_fitmethod. Values must be in the range[1, inf).Added in 0.19
tol: float or None, default=1e-3 The stopping criterion. If it is not None, training will stop when (loss > best_loss - tol) forn_iter_no_changeconsecutive epochs. Convergence is checked against the training loss or the validation loss depending on theearly_stoppingparameter. Values must be in the range[0.0, inf).Added in 0.19
shuffle: bool, default=True Whether or not the training data should be shuffled after each epoch.verbose: int, default=0 The verbosity level. Values must be in the range[0, inf).epsilon: float, default=0.1 Epsilon in the epsilon-insensitive loss functions; only iflossis 'huber', 'epsilon_insensitive', or 'squared_epsilon_insensitive'. For 'huber', determines the threshold at which it becomes less important to get the prediction exactly right. For epsilon-insensitive, any differences between the current prediction and the correct label are ignored if they are less than this threshold. Values must be in the range[0.0, inf).n_jobs: int, default=None The number of CPUs to use to do the OVA (One Versus All, for multi-class problems) computation.Nonemeans 1 unless in ajoblib.parallel_backendcontext.-1means using all processors. SeeGlossaryfor more details.random_state: int, RandomState instance, default=None Used for shuffling the data, whenshuffleis set toTrue. Pass an int for reproducible output across multiple function calls. SeeGlossary. Integer values must be in the range[0, 2**32 - 1].learning_rate: str, default='optimal' The learning rate schedule:- 'constant':
eta = eta0 - 'optimal':
eta = 1.0 / (alpha * (t + t0))wheret0is chosen by a heuristic proposed by Leon Bottou. - 'invscaling':
eta = eta0 / pow(t, power_t) - 'adaptive':
eta = eta0, as long as the training keeps decreasing. Each time n_iter_no_change consecutive epochs fail to decrease the training loss by tol or fail to increase validation score by tol ifearly_stoppingisTrue, the current learning rate is divided by 5. - 'pa1': passive-aggressive algorithm 1, see [1]_. Only with
loss='hinge'. Update isw += eta y xwitheta = min(eta0, loss/||x||**2). - 'pa2': passive-aggressive algorithm 2, see [1]_. Only with
loss='hinge'. Update isw += eta y xwitheta = hinge_loss / (||x||**2 + 1/(2 eta0)).
Added in 0.20 Added 'adaptive' option.
Added in 1.8 Added options 'pa1' and 'pa2'
- 'constant':
eta0: float, default=0.01 The initial learning rate for the 'constant', 'invscaling' or 'adaptive' schedules. The default value is 0.01, but note that eta0 is not used by the default learning rate 'optimal'. Values must be in the range(0.0, inf).For PA-1 (
learning_rate=pa1) and PA-II (pa2), it specifies the aggressiveness parameter for the passive-aggressive algorithm, see [1] where it is called C:
- For PA-II it regularizes the step size (the smaller
eta0the more it regularizes).
As a general rule-of-thumb for PA,
eta0should be small when the data is noisy.power_t: float, default=0.5 The exponent for inverse scaling learning rate. Values must be in the range[0.0, inf).Deprecated since 1.8 Negative values for
power_tare deprecated in version 1.8 and will raise an error in 1.10. Use values in the range [0.0, inf) instead.early_stopping: bool, default=False Whether to use early stopping to terminate training when validation score is not improving. If set toTrue, it will automatically set aside a stratified fraction of training data as validation and terminate training when validation score returned by thescoremethod is not improving by at least tol for n_iter_no_change consecutive epochs.See :ref:
sphx_glr_auto_examples_linear_model_plot_sgd_early_stopping.pyfor an example of the effects of early stopping.Added in 0.20 Added 'early_stopping' option
validation_fraction: float, default=0.1 The proportion of training data to set aside as validation set for early stopping. Must be between 0 and 1. Only used ifearly_stoppingis True. Values must be in the range(0.0, 1.0).Added in 0.20 Added 'validation_fraction' option
n_iter_no_change: int, default=5 Number of iterations with no improvement to wait before stopping fitting. Convergence is checked against the training loss or the validation loss depending on theearly_stoppingparameter. Integer values must be in the range[1, max_iter).Added in 0.20 Added 'n_iter_no_change' option
class_weight: dict, {class_label: weight} or "balanced", default=None Preset for the class_weight fit parameter.Weights associated with classes. If not given, all classes are supposed to have weight one.
The "balanced" mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as
n_samples / (n_classes * np.bincount(y)).warm_start: bool, default=False When set to True, reuse the solution of the previous call to fit as initialization, otherwise, just erase the previous solution. Seethe Glossary.Repeatedly calling fit or partial_fit when warm_start is True can result in a different solution than when calling fit a single time because of the way the data is shuffled. If a dynamic learning rate is used, the learning rate is adapted depending on the number of samples already seen. Calling
fitresets this counter, whilepartial_fitwill result in increasing the existing counter.average: bool or int, default=False When set toTrue, computes the averaged SGD weights across all updates and stores the result in thecoef_attribute. If set to an int greater than 1, averaging will begin once the total number of samples seen reachesaverage. Soaverage=10will begin averaging after seeing 10 samples. Integer values must be in the range[1, n_samples].
Attributes
coef_: ndarray of shape (1, n_features) if n_classes == 2 else (n_classes, n_features) Weights assigned to the features.intercept_: ndarray of shape (1,) if n_classes == 2 else (n_classes,) Constants in decision function.n_iter_: int The actual number of iterations before reaching the stopping criterion. For multiclass fits, it is the maximum over every binary fit.classes_: array of shape (n_classes,)t_: int Number of weight updates performed during training. Same as(n_iter_ * n_samples + 1).n_features_in_: int Number of features seen duringfit.Added in 0.24
feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen duringfit. Defined only whenXhas feature names that are all strings.Added in 1.0
See Also
sklearn.svm.LinearSVC: Linear support vector classification.LogisticRegression: Logistic regression.Perceptron: Inherits from SGDClassifier.Perceptron()is equivalent toSGDClassifier(loss="perceptron", eta0=1, learning_rate="constant", penalty=None).
References
- [1] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, Y. Singer, "Online Passive-Aggressive Algorithms", JMLR 7 (2006). http://jmlr.csail.mit.edu/papers/volume7/crammer06a/crammer06a.pdf
Examples
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
Y = np.array([1, 1, 2, 2])
# Always scale the input. The most convenient way is to use a pipeline.
clf = make_pipeline(StandardScaler(),
SGDClassifier(max_iter=1000, tol=1e-3))
clf.fit(X, Y)
Pipeline(steps=[('standardscaler', StandardScaler()),
('sgdclassifier', SGDClassifier())])
print(clf.predict([[-0.8, -1]]))
[1]
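A Clojure sketch of an SGD model step (assumed key :sklearn.classification/sgd-classifier; parameter keys from the table above, values illustrative):
;; logistic-regression-style SGD with an elastic-net penalty and early stopping
(ml/model {:model-type :sklearn.classification/sgd-classifier
           :loss "log_loss"
           :penalty "elasticnet"
           :l-1-ratio 0.15
           :early-stopping true
           :random-state 0})
As noted in the description above, SGD is sensitive to feature scaling, so a scaling step before the model is advisable, just as the Python example wraps SGDClassifier in a StandardScaler pipeline.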
24.2.35 /svc
| name | type | default | description |
|---|---|---|---|
| break-ties | |||
| kernel | |||
| gamma | |||
| degree | |||
| decision-function-shape | |||
| probability | |||
| tol | |||
| shrinking | |||
| c | |||
| max-iter | |||
| random-state | |||
| coef-0 | |||
| class-weight | |||
| cache-size | |||
| verbose | |||
| predict-proba? |
C-Support Vector Classification.
The implementation is based on libsvm. The fit time scales at least quadratically with the number of samples and may be impractical beyond tens of thousands of samples. For large datasets consider using ~sklearn.svm.LinearSVC or ~sklearn.linear_model.SGDClassifier instead, possibly after a ~sklearn.kernel_approximation.Nystroem transformer or other :ref:kernel_approximation.
The multiclass support is handled according to a one-vs-one scheme.
For details on the precise mathematical formulation of the provided kernel functions and how gamma, coef0 and degree affect each other, see the corresponding section in the narrative documentation: :ref:svm_kernels.
To learn how to tune SVC's hyperparameters, see the following example: :ref:sphx_glr_auto_examples_model_selection_plot_nested_cross_validation_iris.py
Read more in the User Guide: svm_classification.
Parameters
C: float, default=1.0 Regularization parameter. The strength of the regularization is inversely proportional to C. Must be strictly positive. The penalty is a squared l2 penalty. For an intuitive visualization of the effects of scaling the regularization parameter C, see :ref:sphx_glr_auto_examples_svm_plot_svm_scale_c.py.kernel: {'linear', 'poly', 'rbf', 'sigmoid', 'precomputed'} or callable, default='rbf' Specifies the kernel type to be used in the algorithm. If none is given, 'rbf' will be used. If a callable is given it is used to pre-compute the kernel matrix from data matrices; that matrix should be an array of shape(n_samples, n_samples). For an intuitive visualization of different kernel types see :ref:sphx_glr_auto_examples_svm_plot_svm_kernels.py.degree: int, default=3 Degree of the polynomial kernel function ('poly'). Must be non-negative. Ignored by all other kernels.gamma: {'scale', 'auto'} or float, default='scale' Kernel coefficient for 'rbf', 'poly' and 'sigmoid'.- if
gamma='scale'(default) is passed then it uses 1 / (n_features * X.var()) as value of gamma, - if 'auto', uses 1 / n_features
- if float, must be non-negative.
Changed in 0.22 The default value of
gammachanged from 'auto' to 'scale'.- if
coef0: float, default=0.0 Independent term in kernel function. It is only significant in 'poly' and 'sigmoid'.
shrinking: bool, default=True Whether to use the shrinking heuristic. See the User Guide: shrinking_svm.
probability: bool, default=False Whether to enable probability estimates. This must be enabled prior to calling fit, will slow down that method as it internally uses 5-fold cross-validation, and predict_proba may be inconsistent with predict. Read more in the User Guide: scores_probabilities.
tol: float, default=1e-3 Tolerance for stopping criterion.
cache_size: float, default=200 Specify the size of the kernel cache (in MB).
class_weight: dict or 'balanced', default=None Set the parameter C of class i to class_weight[i]*C for SVC. If not given, all classes are supposed to have weight one. The "balanced" mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y)).
verbose: bool, default=False Enable verbose output. Note that this setting takes advantage of a per-process runtime setting in libsvm that, if enabled, may not work properly in a multithreaded context.
max_iter: int, default=-1 Hard limit on iterations within solver, or -1 for no limit.
decision_function_shape: {'ovo', 'ovr'}, default='ovr' Whether to return a one-vs-rest ('ovr') decision function of shape (n_samples, n_classes) as all other classifiers, or the original one-vs-one ('ovo') decision function of libsvm which has shape (n_samples, n_classes * (n_classes - 1) / 2). However, note that internally, one-vs-one ('ovo') is always used as a multi-class strategy to train models; an ovr matrix is only constructed from the ovo matrix. The parameter is ignored for binary classification.
Changed in 0.19 decision_function_shape is 'ovr' by default.
Added in 0.17 decision_function_shape='ovr' is recommended.
Changed in 0.17 Deprecated decision_function_shape='ovo' and None.
break_ties: bool, default=False If true, decision_function_shape='ovr', and number of classes > 2, predict will break ties according to the confidence values of decision_function; otherwise the first class among the tied classes is returned. Please note that breaking ties comes at a relatively high computational cost compared to a simple predict. See :ref:sphx_glr_auto_examples_svm_plot_svm_tie_breaking.py for an example of its usage with decision_function_shape='ovr'. Added in 0.22
random_state: int, RandomState instance or None, default=None Controls the pseudo random number generation for shuffling the data for probability estimates. Ignored when probability is False. Pass an int for reproducible output across multiple function calls. See Glossary.
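As a concrete illustration of the kebab-case translation, here is a minimal, hypothetical model specification for this classifier in the style used throughout this chapter. The model key :sklearn.classification/svc and the exact option keys are assumed from the section naming and the parameter names above; they are not verified here.
(require '[scicloj.metamorph.ml :as ml]
         '[scicloj.sklearn-clj.ml])

;; Assumed spec: each Python parameter above becomes a kebab-case keyword,
;; and string-valued options are passed through as strings.
(def svc-spec
  {:model-type :sklearn.classification/svc ; assumed model key
   :c 10.0                                 ; C
   :kernel "rbf"                           ; kernel
   :gamma "scale"                          ; gamma
   :class-weight "balanced"                ; class_weight
   :probability true})                     ; probability

;; ml/model turns such a spec into a metamorph pipeline step.
(def svc-step (ml/model svc-spec))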
Attributes
class_weight_: ndarray of shape (n_classes,) Multipliers of parameter C for each class. Computed based on the class_weight parameter.
classes_: ndarray of shape (n_classes,) The class labels.
coef_: ndarray of shape (n_classes * (n_classes - 1) / 2, n_features) Weights assigned to the features (coefficients in the primal problem). This is only available in the case of a linear kernel. coef_ is a readonly property derived from dual_coef_ and support_vectors_.
dual_coef_: ndarray of shape (n_classes - 1, n_SV) Dual coefficients of the support vector in the decision function (see :ref:sgd_mathematical_formulation), multiplied by their targets. For multiclass, coefficient for all 1-vs-1 classifiers. The layout of the coefficients in the multiclass case is somewhat non-trivial. See the multi-class section of the User Guide: svm_multi_class for details.
fit_status_: int 0 if correctly fitted, 1 otherwise (will raise warning).
intercept_: ndarray of shape (n_classes * (n_classes - 1) / 2,) Constants in decision function.
n_features_in_: int Number of features seen during fit. Added in 0.24
feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen during fit. Defined only when X has feature names that are all strings. Added in 1.0
n_iter_: ndarray of shape (n_classes * (n_classes - 1) // 2,) Number of iterations run by the optimization routine to fit the model. The shape of this attribute depends on the number of models optimized, which in turn depends on the number of classes. Added in 1.1
support_: ndarray of shape (n_SV,) Indices of support vectors.
support_vectors_: ndarray of shape (n_SV, n_features) Support vectors. An empty array if kernel is precomputed.
n_support_: ndarray of shape (n_classes,), dtype=int32 Number of support vectors for each class.
probA_: ndarray of shape (n_classes * (n_classes - 1) / 2,)
probB_: ndarray of shape (n_classes * (n_classes - 1) / 2,) If probability=True, these correspond to the parameters learned in Platt scaling to produce probability estimates from decision values. If probability=False, they are empty arrays. Platt scaling uses the logistic function 1 / (1 + exp(decision_value * probA_ + probB_)) where probA_ and probB_ are learned from the dataset [2]. For more information on the multiclass case and training procedure see section 8 of [1].
shape_fit_: tuple of int of shape (n_dimensions_of_X,) Array dimensions of training vector X.
See Also
SVR: Support Vector Machine for Regression implemented using libsvm.
LinearSVC: Scalable linear Support Vector Machine for classification implemented using liblinear. Check the See Also section of LinearSVC for further comparisons.
References
Examples
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
y = np.array([1, 1, 2, 2])
from sklearn.svm import SVC
clf = make_pipeline(StandardScaler(), SVC(gamma='auto'))
clf.fit(X, y)
Pipeline(steps=[('standardscaler', StandardScaler()),
('svc', SVC(gamma='auto'))])
print(clf.predict([[-0.8, -1]]))
[1]
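For readers following along in Clojure, here is a minimal sketch of the same example in the pipeline style used at the start of this chapter. It assumes the model key :sklearn.classification/svc (following the section naming) and omits the StandardScaler step; column 2 holds the class label.
(require '[scicloj.metamorph.core :as mm]
         '[scicloj.metamorph.ml :as ml]
         '[tech.v3.dataset.tensor :as dst]
         '[tech.v3.dataset.metamorph :as ds-mm]
         '[scicloj.sklearn-clj.ml])

;; The same four training points as above; the label lives in column 2.
(def svc-ds
  (dst/tensor->dataset [[-1 -1 1] [-2 -1 1] [1 1 2] [2 1 2]]))

(def svc-pipe
  (mm/pipeline
   (ds-mm/set-inference-target 2)
   {:metamorph/id :model}
   (ml/model {:model-type :sklearn.classification/svc ; assumed model key
              :gamma "auto"})))

(def svc-fitted
  (svc-pipe {:metamorph/data svc-ds
             :metamorph/mode :fit}))

;; Predict on a new point; column 2 only carries a dummy label value.
(-> (mm/transform-pipe
     (dst/tensor->dataset [[-0.8 -1 1]])
     svc-pipe
     svc-fitted)
    :metamorph/data)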
For a comparison of the SVC with other classifiers see: :ref:sphx_glr_auto_examples_classification_plot_classification_probability.py.
24.3 :sklearn.regression models
24.3.1 /ada-boost-regressor
| name | type | default | description |
|---|---|---|---|
| estimator | |||
| learning-rate | |||
| loss | |||
| n-estimators | |||
| random-state | |||
| predict-proba? |
24.3.2 /ard-regression
| name | type | default | description |
|---|---|---|---|
| tol | |||
| alpha-2 | |||
| threshold-lambda | |||
| max-iter | |||
| lambda-1 | |||
| copy-x | |||
| lambda-2 | |||
| fit-intercept | |||
| alpha-1 | |||
| verbose | |||
| compute-score | |||
| predict-proba? |
24.3.3 /bagging-regressor
| name | type | default | description |
|---|---|---|---|
| bootstrap | |||
| bootstrap-features | |||
| n-jobs | |||
| random-state | |||
| estimator | |||
| oob-score | |||
| max-features | |||
| warm-start | |||
| n-estimators | |||
| max-samples | |||
| verbose | |||
| predict-proba? |
24.3.4 /bayesian-ridge
| name | type | default | description |
|---|---|---|---|
| tol | |||
| alpha-2 | |||
| max-iter | |||
| lambda-1 | |||
| copy-x | |||
| lambda-2 | |||
| alpha-init | |||
| fit-intercept | |||
| alpha-1 | |||
| lambda-init | |||
| verbose | |||
| compute-score | |||
| predict-proba? |
24.3.5 /cca
| name | type | default | description |
|---|---|---|---|
| copy | |||
| max-iter | |||
| n-components | |||
| scale | |||
| tol | |||
| predict-proba? |
24.3.6 /decision-tree-regressor
| name | type | default | description |
|---|---|---|---|
| min-weight-fraction-leaf | |||
| max-leaf-nodes | |||
| min-impurity-decrease | |||
| min-samples-split | |||
| ccp-alpha | |||
| splitter | |||
| random-state | |||
| min-samples-leaf | |||
| max-features | |||
| monotonic-cst | |||
| max-depth | |||
| criterion | |||
| predict-proba? |
24.3.7 /dummy-regressor
| name | type | default | description |
|---|---|---|---|
| constant | |||
| quantile | |||
| strategy | |||
| predict-proba? |
24.3.8 /elastic-net
| name | type | default | description |
|---|---|---|---|
| positive | |||
| tol | |||
| max-iter | |||
| random-state | |||
| copy-x | |||
| precompute | |||
| fit-intercept | |||
| alpha | |||
| warm-start | |||
| selection | |||
| l-1-ratio | |||
| predict-proba? |
24.3.9 /elastic-net-cv
| name | type | default | description |
|---|---|---|---|
| positive | |||
| tol | |||
| n-alphas | |||
| eps | |||
| alphas | |||
| max-iter | |||
| n-jobs | |||
| random-state | |||
| copy-x | |||
| precompute | |||
| fit-intercept | |||
| cv | |||
| selection | |||
| l-1-ratio | |||
| verbose | |||
| predict-proba? |
24.3.10 /extra-tree-regressor
| name | type | default | description |
|---|---|---|---|
| min-weight-fraction-leaf | |||
| max-leaf-nodes | |||
| min-impurity-decrease | |||
| min-samples-split | |||
| ccp-alpha | |||
| splitter | |||
| random-state | |||
| min-samples-leaf | |||
| max-features | |||
| monotonic-cst | |||
| max-depth | |||
| criterion | |||
| predict-proba? |
24.3.11 /extra-trees-regressor
| name | type | default | description |
|---|---|---|---|
| min-weight-fraction-leaf | |||
| max-leaf-nodes | |||
| min-impurity-decrease | |||
| min-samples-split | |||
| bootstrap | |||
| ccp-alpha | |||
| n-jobs | |||
| random-state | |||
| oob-score | |||
| min-samples-leaf | |||
| max-features | |||
| monotonic-cst | |||
| warm-start | |||
| max-depth | |||
| n-estimators | |||
| max-samples | |||
| criterion | |||
| verbose | |||
| predict-proba? |
24.3.12 /gamma-regressor
| name | type | default | description |
|---|---|---|---|
| alpha | |||
| fit-intercept | |||
| max-iter | |||
| solver | |||
| tol | |||
| verbose | |||
| warm-start | |||
| predict-proba? |
24.3.13 /gaussian-process-regressor
| name | type | default | description |
|---|---|---|---|
| alpha | |||
| copy-x-train | |||
| kernel | |||
| n-restarts-optimizer | |||
| n-targets | |||
| normalize-y | |||
| optimizer | |||
| random-state | |||
| predict-proba? |
24.3.14 /gradient-boosting-regressor
| name | type | default | description |
|---|---|---|---|
| n-iter-no-change | |||
| learning-rate | |||
| min-weight-fraction-leaf | |||
| max-leaf-nodes | |||
| min-impurity-decrease | |||
| min-samples-split | |||
| tol | |||
| subsample | |||
| ccp-alpha | |||
| random-state | |||
| min-samples-leaf | |||
| max-features | |||
| init | |||
| alpha | |||
| warm-start | |||
| max-depth | |||
| validation-fraction | |||
| n-estimators | |||
| criterion | |||
| loss | |||
| verbose | |||
| predict-proba? |
24.3.15 /hist-gradient-boosting-regressor
| name | type | default | description |
|---|---|---|---|
| n-iter-no-change | |||
| learning-rate | |||
| max-leaf-nodes | |||
| scoring | |||
| tol | |||
| early-stopping | |||
| quantile | |||
| max-iter | |||
| random-state | |||
| max-bins | |||
| min-samples-leaf | |||
| max-features | |||
| monotonic-cst | |||
| warm-start | |||
| max-depth | |||
| validation-fraction | |||
| loss | |||
| interaction-cst | |||
| verbose | |||
| categorical-features | |||
| l-2-regularization | |||
| predict-proba? |
24.3.16 /huber-regressor
| name | type | default | description |
|---|---|---|---|
| alpha | |||
| epsilon | |||
| fit-intercept | |||
| max-iter | |||
| tol | |||
| warm-start | |||
| predict-proba? |
24.3.17 /isotonic-regression
| name | type | default | description |
|---|---|---|---|
| increasing | |||
| out-of-bounds | |||
| y-max | |||
| y-min | |||
| predict-proba? |
24.3.18 /k-neighbors-regressor
| name | type | default | description |
|---|---|---|---|
| algorithm | |||
| leaf-size | |||
| metric | |||
| metric-params | |||
| n-jobs | |||
| n-neighbors | |||
| p | |||
| weights | |||
| predict-proba? |
24.3.19 /kernel-ridge
| name | type | default | description |
|---|---|---|---|
| alpha | |||
| coef-0 | |||
| degree | |||
| gamma | |||
| kernel | |||
| kernel-params | |||
| predict-proba? |
24.3.20 /lars
| name | type | default | description |
|---|---|---|---|
| fit-path | |||
| eps | |||
| random-state | |||
| jitter | |||
| copy-x | |||
| precompute | |||
| fit-intercept | |||
| n-nonzero-coefs | |||
| verbose | |||
| predict-proba? |
24.3.21 /lars-cv
| name | type | default | description |
|---|---|---|---|
| eps | |||
| max-n-alphas | |||
| max-iter | |||
| n-jobs | |||
| copy-x | |||
| precompute | |||
| fit-intercept | |||
| cv | |||
| verbose | |||
| predict-proba? |
24.3.22 /lasso
| name | type | default | description |
|---|---|---|---|
| positive | |||
| tol | |||
| max-iter | |||
| random-state | |||
| copy-x | |||
| precompute | |||
| fit-intercept | |||
| alpha | |||
| warm-start | |||
| selection | |||
| predict-proba? |
24.3.23 /lasso-cv
| name | type | default | description |
|---|---|---|---|
| positive | |||
| tol | |||
| n-alphas | |||
| eps | |||
| alphas | |||
| max-iter | |||
| n-jobs | |||
| random-state | |||
| copy-x | |||
| precompute | |||
| fit-intercept | |||
| cv | |||
| selection | |||
| verbose | |||
| predict-proba? |
24.3.24 /lasso-lars
| name | type | default | description |
|---|---|---|---|
| positive | |||
| fit-path | |||
| eps | |||
| max-iter | |||
| random-state | |||
| jitter | |||
| copy-x | |||
| precompute | |||
| fit-intercept | |||
| alpha | |||
| verbose | |||
| predict-proba? |
24.3.25 /lasso-lars-cv
| name | type | default | description |
|---|---|---|---|
| positive | |||
| eps | |||
| max-n-alphas | |||
| max-iter | |||
| n-jobs | |||
| copy-x | |||
| precompute | |||
| fit-intercept | |||
| cv | |||
| verbose | |||
| predict-proba? |
24.3.26 /lasso-lars-ic
| name | type | default | description |
|---|---|---|---|
| positive | |||
| eps | |||
| noise-variance | |||
| max-iter | |||
| copy-x | |||
| precompute | |||
| fit-intercept | |||
| criterion | |||
| verbose | |||
| predict-proba? |
24.3.27 /linear-regression
| name | type | default | description |
|---|---|---|---|
| copy-x | |||
| fit-intercept | |||
| n-jobs | |||
| positive | |||
| tol | |||
| predict-proba? |
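As a quick orientation, here is a minimal, hypothetical sketch of fitting this model through metamorph.ml, in the same style as the classification example earlier in the chapter. The model key :sklearn.regression/linear-regression is assumed from the section naming.
(require '[scicloj.metamorph.core :as mm]
         '[scicloj.metamorph.ml :as ml]
         '[tech.v3.dataset.tensor :as dst]
         '[tech.v3.dataset.metamorph :as ds-mm]
         '[scicloj.sklearn-clj.ml])

;; Toy data where column 1 is 2 * column 0; column 1 is the regression target.
(def lr-ds (dst/tensor->dataset [[1 2] [2 4] [3 6]]))

(def lr-pipe
  (mm/pipeline
   (ds-mm/set-inference-target 1)
   {:metamorph/id :model}
   (ml/model {:model-type :sklearn.regression/linear-regression ; assumed model key
              :fit-intercept true})))

(def lr-fitted
  (lr-pipe {:metamorph/data lr-ds
            :metamorph/mode :fit}))

;; Predict for a new x; column 1 only carries a dummy target value.
(-> (mm/transform-pipe
     (dst/tensor->dataset [[4 0]])
     lr-pipe
     lr-fitted)
    :metamorph/data)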
24.3.28 /linear-svr
| name | type | default | description |
|---|---|---|---|
| tol | |||
| intercept-scaling | |||
| c | |||
| max-iter | |||
| random-state | |||
| dual | |||
| fit-intercept | |||
| loss | |||
| verbose | |||
| epsilon | |||
| predict-proba? |
24.3.29 /mlp-regressor
| name | type | default | description |
|---|---|---|---|
| n-iter-no-change | |||
| learning-rate | |||
| activation | |||
| hidden-layer-sizes | |||
| tol | |||
| beta-2 | |||
| early-stopping | |||
| nesterovs-momentum | |||
| batch-size | |||
| solver | |||
| shuffle | |||
| power-t | |||
| max-fun | |||
| beta-1 | |||
| max-iter | |||
| random-state | |||
| momentum | |||
| learning-rate-init | |||
| alpha | |||
| warm-start | |||
| validation-fraction | |||
| loss | |||
| verbose | |||
| epsilon | |||
| predict-proba? |
24.3.30 /multi-task-elastic-net
| name | type | default | description |
|---|---|---|---|
| tol | |||
| max-iter | |||
| random-state | |||
| copy-x | |||
| fit-intercept | |||
| alpha | |||
| warm-start | |||
| selection | |||
| l-1-ratio | |||
| predict-proba? |
24.3.31 /multi-task-elastic-net-cv
| name | type | default | description |
|---|---|---|---|
| tol | |||
| n-alphas | |||
| eps | |||
| alphas | |||
| max-iter | |||
| n-jobs | |||
| random-state | |||
| copy-x | |||
| fit-intercept | |||
| cv | |||
| selection | |||
| l-1-ratio | |||
| verbose | |||
| predict-proba? |
24.3.32 /multi-task-lasso
| name | type | default | description |
|---|---|---|---|
| alpha | |||
| copy-x | |||
| fit-intercept | |||
| max-iter | |||
| random-state | |||
| selection | |||
| tol | |||
| warm-start | |||
| predict-proba? |
24.3.33 /multi-task-lasso-cv
| name | type | default | description |
|---|---|---|---|
| tol | |||
| n-alphas | |||
| eps | |||
| alphas | |||
| max-iter | |||
| n-jobs | |||
| random-state | |||
| copy-x | |||
| fit-intercept | |||
| cv | |||
| selection | |||
| verbose | |||
| predict-proba? |
24.3.34 /nu-svr
| name | type | default | description |
|---|---|---|---|
| kernel | |||
| gamma | |||
| degree | |||
| tol | |||
| nu | |||
| shrinking | |||
| c | |||
| max-iter | |||
| coef-0 | |||
| cache-size | |||
| verbose | |||
| predict-proba? |
24.3.35 /orthogonal-matching-pursuit
| name | type | default | description |
|---|---|---|---|
| fit-intercept | |||
| n-nonzero-coefs | |||
| precompute | |||
| tol | |||
| predict-proba? |
24.3.36 /orthogonal-matching-pursuit-cv
| name | type | default | description |
|---|---|---|---|
| copy | |||
| cv | |||
| fit-intercept | |||
| max-iter | |||
| n-jobs | |||
| verbose | |||
| predict-proba? |
24.3.37 /passive-aggressive-regressor
| name | type | default | description |
|---|---|---|---|
| n-iter-no-change | |||
| average | |||
| tol | |||
| early-stopping | |||
| shuffle | |||
| c | |||
| max-iter | |||
| random-state | |||
| fit-intercept | |||
| warm-start | |||
| validation-fraction | |||
| loss | |||
| verbose | |||
| epsilon | |||
| predict-proba? |
24.3.38 /pls-canonical
| name | type | default | description |
|---|---|---|---|
| algorithm | |||
| copy | |||
| max-iter | |||
| n-components | |||
| scale | |||
| tol | |||
| predict-proba? |
24.3.39 /pls-regression
| name | type | default | description |
|---|---|---|---|
| copy | |||
| max-iter | |||
| n-components | |||
| scale | |||
| tol | |||
| predict-proba? |
24.3.40 /poisson-regressor
| name | type | default | description |
|---|---|---|---|
| alpha | |||
| fit-intercept | |||
| max-iter | |||
| solver | |||
| tol | |||
| verbose | |||
| warm-start | |||
| predict-proba? |
24.3.41 /quantile-regressor
| name | type | default | description |
|---|---|---|---|
| alpha | |||
| fit-intercept | |||
| quantile | |||
| solver | |||
| solver-options | |||
| predict-proba? |
24.3.42 /radius-neighbors-regressor
| name | type | default | description |
|---|---|---|---|
| algorithm | |||
| leaf-size | |||
| metric | |||
| metric-params | |||
| n-jobs | |||
| p | |||
| radius | |||
| weights | |||
| predict-proba? |
24.3.43 /random-forest-regressor
| name | type | default | description |
|---|---|---|---|
| min-weight-fraction-leaf | |||
| max-leaf-nodes | |||
| min-impurity-decrease | |||
| min-samples-split | |||
| bootstrap | |||
| ccp-alpha | |||
| n-jobs | |||
| random-state | |||
| oob-score | |||
| min-samples-leaf | |||
| max-features | |||
| monotonic-cst | |||
| warm-start | |||
| max-depth | |||
| n-estimators | |||
| max-samples | |||
| criterion | |||
| verbose | |||
| predict-proba? |
24.3.44 /ransac-regressor
| name | type | default | description |
|---|---|---|---|
| is-data-valid | |||
| max-skips | |||
| random-state | |||
| min-samples | |||
| stop-probability | |||
| estimator | |||
| stop-n-inliers | |||
| max-trials | |||
| residual-threshold | |||
| is-model-valid | |||
| loss | |||
| stop-score | |||
| predict-proba? |
24.3.45 /ridge
| name | type | default | description |
|---|---|---|---|
| alpha | |||
| copy-x | |||
| fit-intercept | |||
| max-iter | |||
| positive | |||
| random-state | |||
| solver | |||
| tol | |||
| predict-proba? |
24.3.46 /ridge-cv
| name | type | default | description |
|---|---|---|---|
| alpha-per-target | |||
| alphas | |||
| cv | |||
| fit-intercept | |||
| gcv-mode | |||
| scoring | |||
| store-cv-results | |||
| predict-proba? |
24.3.47 /sgd-regressor
| name | type | default | description |
|---|---|---|---|
| n-iter-no-change | |||
| learning-rate | |||
| average | |||
| tol | |||
| early-stopping | |||
| eta-0 | |||
| shuffle | |||
| penalty | |||
| power-t | |||
| max-iter | |||
| random-state | |||
| fit-intercept | |||
| alpha | |||
| warm-start | |||
| l-1-ratio | |||
| validation-fraction | |||
| loss | |||
| verbose | |||
| epsilon | |||
| predict-proba? |
24.3.48 /svr
| name | type | default | description |
|---|---|---|---|
| kernel | |||
| gamma | |||
| degree | |||
| tol | |||
| shrinking | |||
| c | |||
| max-iter | |||
| coef-0 | |||
| cache-size | |||
| verbose | |||
| epsilon | |||
| predict-proba? |
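Analogously to SVC above, the SVR parameters in this table translate to kebab-case keys. A hypothetical pipeline sketch follows; the model key :sklearn.regression/svr and the option keys are assumed from the section and table naming.
(require '[scicloj.metamorph.core :as mm]
         '[scicloj.metamorph.ml :as ml]
         '[tech.v3.dataset.metamorph :as ds-mm]
         '[scicloj.sklearn-clj.ml])

;; Regression pipeline with explicit SVR hyperparameters (assumed keys).
(def svr-pipe
  (mm/pipeline
   (ds-mm/set-inference-target 1)   ; column 1 as the regression target
   {:metamorph/id :model}
   (ml/model {:model-type :sklearn.regression/svr ; assumed model key
              :kernel "rbf"
              :c 1.0
              :epsilon 0.1})))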
24.3.49 /theil-sen-regressor
| name | type | default | description |
|---|---|---|---|
| fit-intercept | |||
| max-iter | |||
| max-subpopulation | |||
| n-jobs | |||
| n-subsamples | |||
| random-state | |||
| tol | |||
| verbose | |||
| predict-proba? |
24.3.50 /transformed-target-regressor
| name | type | default | description |
|---|---|---|---|
| check-inverse | |||
| func | |||
| inverse-func | |||
| regressor | |||
| transformer | |||
| predict-proba? |
24.3.51 /tweedie-regressor
| name | type | default | description |
|---|---|---|---|
| tol | |||
| solver | |||
| power | |||
| max-iter | |||
| link | |||
| fit-intercept | |||
| alpha | |||
| warm-start | |||
| verbose | |||
| predict-proba? |