26 Tribuo reference - DRAFT 🛠
The following is a reference for all Tribuo trainers. They can be used in the model specification passed to ml/train, via the :type of the Tribuo trainer component.
(comment
  (ml/train
   ds
   {:model-type :scicloj.ml.tribuo/classification
    :tribuo-components [{:name "random-forest"
                         :type "org.tribuo.classification.dtree.CARTClassificationTrainer"
                         :properties {:maxDepth "8"
                                      :useRandomSplitPoints "false"
                                      :fractionFeaturesInSplit "0.5"}}]
    :tribuo-trainer-name "random-forest"}))
There is also a reference for all non-trainer components of Tribuo. These could potentially also be used in Tribuo model specs.
26.1 Tribuo trainer reference
26.1.0.1 o..t…classification.baseline.DummyClassifierTrainer
The DummyClassifier predicts a value using a ‘dummy’ algorithm.
It can, for example, always predict a :CONSTANT value.
(def df
  (-> (tc/dataset {:a [1 2], :target [:x :x]})
      (ds-mod/set-inference-target :target)))
(kind/table df)
a | target |
---|---|
1 | x |
2 | x |
(def model
  (ml/train
   df
   {:model-type :scicloj.ml.tribuo/classification,
    :tribuo-components
    [{:name "dummy",
      :type
      "org.tribuo.classification.baseline.DummyClassifierTrainer",
      :properties {:dummyType :CONSTANT, :constantLabel "c"}}],
    :tribuo-trainer-name "dummy"}))
The model always predicts the configured constant label, ‘c’ in this case:
(ml/predict df model)
_unnamed [2 1]:
:target |
---|
:c |
:c |
All configurable options for org.tribuo.classification.baseline.DummyClassifierTrainer:
name | description | type | default |
---|---|---|---|
constantLabel | Label to use for the constant classifier. | class java.lang.String | |
dummyType | Type of dummy classifier. | class org.tribuo.classification.baseline.DummyClassifierTrainer$DummyType | |
seed | Seed for the RNG. | long | 1 |
26.1.0.2 o..t…classification.dtree.CARTClassificationTrainer
All configurable options for org.tribuo.classification.dtree.CARTClassificationTrainer:
name | description | type | default |
---|---|---|---|
fractionFeaturesInSplit | The fraction of features to consider in each split. 1.0f indicates all features are considered. | float | 1.0 |
impurity | The impurity measure used to determine split quality. | interface org.tribuo.classification.dtree.impurity.LabelImpurity | GiniIndex |
maxDepth | The maximum depth of the tree. | int | 2147483647 |
minChildWeight | The minimum weight allowed in a child node. | float | 5.0 |
minImpurityDecrease | The decrease in impurity needed in order to split the node. | float | 0.0 |
seed | The RNG seed to use when sampling features in a split. | long | 12345 |
useRandomSplitPoints | Whether to choose split points for features at random. | boolean | false |
26.1.0.3 o..t…classification.ensemble.AdaBoostTrainer
All configurable options for org.tribuo.classification.ensemble.AdaBoostTrainer:
name | description | type | default |
---|---|---|---|
innerTrainer | The trainer to use to build each weak learner. | org.tribuo.Trainer<org.tribuo.classification.Label> | |
numMembers | The number of ensemble members to train. | int | 0 |
seed | The seed for the RNG. | long | 0 |
26.1.0.4 o..t…classification.liblinear.LibLinearClassificationTrainer
All configurable options for org.tribuo.classification.liblinear.LibLinearClassificationTrainer:
name | description | type | default |
---|---|---|---|
cost | Cost penalty for misclassifications. | double | 1.0 |
epsilon | Epsilon insensitivity in the regression cost function. | double | 0.1 |
labelWeights | Use Label specific weights. | java.util.Map<java.lang.String, java.lang.Float> | {} |
maxIterations | Maximum number of iterations before terminating. | int | 1000 |
seed | RNG seed. | long | 12345 |
terminationCriterion | Stop iterating when the loss score decreases by less than this value. | double | 0.1 |
trainerType | Algorithm to use. | org.tribuo.common.liblinear.LibLinearType | org.tribuo.classification.liblinear.LinearClassificationType@483a6cf1 |
26.1.0.5 o..t…classification.libsvm.LibSVMClassificationTrainer
All configurable options for org.tribuo.classification.libsvm.LibSVMClassificationTrainer:
name | description | type | default |
---|---|---|---|
cache_size | Internal cache size, most of the time should be left at default. | double | 500.0 |
coef0 | Polynomial coefficient or shift in sigmoid kernel. | double | 0.0 |
cost | Cost parameter for incorrect predictions. | double | 1.0 |
degree | Polynomial degree. | int | 3 |
eps | Tolerance of the termination criterion. | double | 0.001 |
gamma | Width of the RBF kernel, or scalar on sigmoid kernel. | double | 0.0 |
kernelType | Type of Kernel. | class org.tribuo.common.libsvm.KernelType | LINEAR |
labelWeights | Use Label specific weights. | java.util.Map<java.lang.String, java.lang.Float> | {} |
nu | nu value in NU SVM. | double | 0.5 |
p | Epsilon in EPSILON_SVR. | double | 0.001 |
probability | Generate probability estimates. | boolean | false |
seed | RNG seed. | long | 12345 |
shrinking | Regularise the weight parameters. | boolean | true |
svmType | Type of SVM algorithm. | org.tribuo.common.libsvm.SVMType | |
26.1.0.6 o..t…classification.sgd.fm.FMClassificationTrainer
All configurable options for org.tribuo.classification.sgd.fm.FMClassificationTrainer:
name | description | type | default |
---|---|---|---|
epochs | The number of gradient descent epochs. | int | 5 |
factorizedDimSize | The size of the factorized feature representation. | int | 0 |
loggingInterval | Log values after this many updates. | int | -1 |
minibatchSize | Minibatch size in SGD. | int | 1 |
objective | The classification objective function to use. | interface org.tribuo.classification.sgd.LabelObjective | LogMulticlass |
optimiser | The gradient optimiser to use. | interface org.tribuo.math.StochasticGradientOptimiser | AdaGrad(initialLearningRate=1.0,epsilon=0.1,initialValue=0.0) |
seed | Seed for the RNG used to shuffle elements. | long | 12345 |
shuffle | Shuffle the data before each epoch. Only turn off for debugging. | boolean | true |
variance | The variance of the initializer. | double | 0.0 |
26.1.0.7 o..t…classification.sgd.kernel.KernelSVMTrainer
All configurable options for org.tribuo.classification.sgd.kernel.KernelSVMTrainer:
name | description | type | default |
---|---|---|---|
epochs | Number of SGD epochs. | int | 5 |
kernel | SVM kernel. | interface org.tribuo.math.kernel.Kernel | |
lambda | Step size. | double | 0.0 |
loggingInterval | Log values after this many updates. | int | -1 |
seed | Seed for the RNG used to shuffle elements. | long | 0 |
shuffle | Shuffle the data before each epoch. Only turn off for debugging. | boolean | true |
26.1.0.8 o..t…classification.sgd.linear.LinearSGDTrainer
All configurable options for org.tribuo.classification.sgd.linear.LinearSGDTrainer:
name | description | type | default |
---|---|---|---|
epochs | The number of gradient descent epochs. | int | 5 |
loggingInterval | Log values after this many updates. | int | -1 |
minibatchSize | Minibatch size in SGD. | int | 1 |
objective | The classification objective function to use. | interface org.tribuo.classification.sgd.LabelObjective | LogMulticlass |
optimiser | The gradient optimiser to use. | interface org.tribuo.math.StochasticGradientOptimiser | AdaGrad(initialLearningRate=1.0,epsilon=0.1,initialValue=0.0) |
seed | Seed for the RNG used to shuffle elements. | long | 12345 |
shuffle | Shuffle the data before each epoch. Only turn off for debugging. | boolean | true |
26.1.0.9 o..t…classification.sgd.linear.LogisticRegressionTrainer
All configurable options for org.tribuo.classification.sgd.linear.LogisticRegressionTrainer:
name | description | type | default |
---|---|---|---|
epochs | The number of gradient descent epochs. | int | 5 |
loggingInterval | Log values after this many updates. | int | 1000 |
minibatchSize | Minibatch size in SGD. | int | 1 |
objective | The classification objective function to use. | interface org.tribuo.classification.sgd.LabelObjective | LogMulticlass |
optimiser | The gradient optimiser to use. | interface org.tribuo.math.StochasticGradientOptimiser | AdaGrad(initialLearningRate=1.0,epsilon=0.1,initialValue=0.0) |
seed | Seed for the RNG used to shuffle elements. | long | 12345 |
shuffle | Shuffle the data before each epoch. Only turn off for debugging. | boolean | true |
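As a minimal sketch, the options in the table above could be set through a model spec like the following. It follows the shape of the CART example at the top of this chapter, where property values are passed as strings. The component name "logistic", the chosen values, and the dataset name `train-ds` are illustrative assumptions, not recommendations.

```clojure
;; Model spec for ml/train. Option names come from the table above;
;; the values are arbitrary illustrations.
(def logistic-regression-spec
  {:model-type :scicloj.ml.tribuo/classification
   :tribuo-components [{:name "logistic"
                        :type "org.tribuo.classification.sgd.linear.LogisticRegressionTrainer"
                        :properties {:epochs "10"
                                     :minibatchSize "1"
                                     :seed "12345"}}]
   :tribuo-trainer-name "logistic"})

(comment
  ;; `train-ds` is a hypothetical dataset with an inference target set.
  (ml/train train-ds logistic-regression-spec))
```

The :tribuo-trainer-name must match the :name of the component that should act as the trainer.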
26.1.0.10 o..t…classification.xgboost.XGBoostClassificationTrainer
All configurable options for org.tribuo.classification.xgboost.XGBoostClassificationTrainer:
name | description | type | default |
---|---|---|---|
alpha | l1 regularisation term on the weights. | double | 1.0 |
booster | Type of the weak learner. | class org.tribuo.common.xgboost.XGBoostTrainer$BoosterType | GBTREE |
eta | The learning rate, shrinks the new tree output to prevent overfitting. | double | 0.3 |
evalMetric | Evaluation metric to use. The default value is set based on the objective function, so this can be usually left blank. | class java.lang.String | |
featureSubsample | Independently subsample the features available for each node of each tree. | double | 1.0 |
gamma | Minimum loss reduction needed to split a tree node. | double | 0.0 |
lambda | l2 regularisation term on the weights. | double | 1.0 |
maxDepth | The maximum depth of any tree. | int | 6 |
minChildWeight | The minimum weight in each child node before a split is valid. | double | 1.0 |
nThread | The number of threads to use at training time. | int | 4 |
numTrees | The number of trees to build. | int | 0 |
overrideParameters | Override for parameters, if used must contain all the relevant parameters, including the objective | java.util.Map<java.lang.String, java.lang.String> | {} |
seed | The RNG seed. | long | 12345 |
silent | Quiesce all the logging output from the XGBoost C library. Deprecated in favour of 'verbosity'. | int | 1 |
subsample | Independently subsample the examples for each tree. | double | 1.0 |
treeMethod | The tree building algorithm to use. | class org.tribuo.common.xgboost.XGBoostTrainer$TreeMethod | AUTO |
verbosity | Logging verbosity, 0 is silent, 3 is debug. | class org.tribuo.common.xgboost.XGBoostTrainer$LoggingVerbosity | SILENT |
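The table above lists a default of 0 for numTrees, so a usable spec presumably needs to set it explicitly. The following is a sketch under that assumption, reusing the spec shape from the CART example at the top of this chapter. The component name "xgboost", the values, and `train-ds` are hypothetical.

```clojure
;; XGBoost classification spec. Only a few options from the table are set;
;; all others keep their listed defaults.
(def xgboost-spec
  {:model-type :scicloj.ml.tribuo/classification
   :tribuo-components [{:name "xgboost"
                        :type "org.tribuo.classification.xgboost.XGBoostClassificationTrainer"
                        :properties {:numTrees "50"   ; default in the table is 0
                                     :maxDepth "4"
                                     :eta "0.1"}}]
   :tribuo-trainer-name "xgboost"})

(comment
  ;; `train-ds` is a hypothetical dataset with an inference target set.
  (ml/train train-ds xgboost-spec))
```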
26.1.0.11 o..t…common.tree.ExtraTreesTrainer
All configurable options for org.tribuo.common.tree.ExtraTreesTrainer:
name | description | type | default |
---|---|---|---|
combiner | The combination function to aggregate each ensemble member's outputs. | org.tribuo.ensemble.EnsembleCombiner | |
innerTrainer | The trainer to use for each ensemble member. | org.tribuo.Trainer | |
numMembers | The number of ensemble members to train. | int | 0 |
seed | The seed for the RNG. | long | 0 |
26.1.0.12 o..t…common.tree.RandomForestTrainer
All configurable options for org.tribuo.common.tree.RandomForestTrainer:
name | description | type | default |
---|---|---|---|
combiner | The combination function to aggregate each ensemble member's outputs. | org.tribuo.ensemble.EnsembleCombiner | |
innerTrainer | The trainer to use for each ensemble member. | org.tribuo.Trainer | |
numMembers | The number of ensemble members to train. | int | 0 |
seed | The seed for the RNG. | long | 0 |
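The innerTrainer and combiner options take other components rather than plain values. The sketch below assumes that such options are filled in by referring to another entry in :tribuo-components by its :name, as in Tribuo's own configuration files. This is an assumption about the spec format, not something this chapter states. The combiner class org.tribuo.classification.ensemble.VotingCombiner is not listed in this chapter, and all component names and values are illustrative.

```clojure
;; Random forest spec built from three components:
;; a combiner, an inner CART trainer, and the ensemble trainer itself.
(def random-forest-spec
  {:model-type :scicloj.ml.tribuo/classification
   :tribuo-components [{:name "voting"
                        :type "org.tribuo.classification.ensemble.VotingCombiner"}
                       {:name "cart"
                        :type "org.tribuo.classification.dtree.CARTClassificationTrainer"
                        :properties {:maxDepth "8"
                                     :fractionFeaturesInSplit "0.5"}}
                       {:name "random-forest"
                        :type "org.tribuo.common.tree.RandomForestTrainer"
                        ;; assumed: components are referenced by their :name
                        :properties {:innerTrainer "cart"
                                     :combiner "voting"
                                     :numMembers "100"}}]
   :tribuo-trainer-name "random-forest"})

(comment
  ;; `train-ds` is a hypothetical dataset with an inference target set.
  (ml/train train-ds random-forest-spec))
```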
26.1.0.13 o..t…ensemble.BaggingTrainer
All configurable options for org.tribuo.ensemble.BaggingTrainer:
name | description | type | default |
---|---|---|---|
combiner | The combination function to aggregate each ensemble member's outputs. | org.tribuo.ensemble.EnsembleCombiner | |
innerTrainer | The trainer to use for each ensemble member. | org.tribuo.Trainer | |
numMembers | The number of ensemble members to train. | int | 0 |
seed | The seed for the RNG. | long | 0 |
26.1.0.14 o..t…hash.HashingTrainer
All configurable options for org.tribuo.hash.HashingTrainer:
name | description | type | default |
---|---|---|---|
hasher | Feature hashing function to use. | class org.tribuo.hash.Hasher | |
innerTrainer | Trainer to use. | org.tribuo.Trainer | |
26.1.0.15 o..t…regression.baseline.DummyRegressionTrainer
All configurable options for org.tribuo.regression.baseline.DummyRegressionTrainer:
name | description | type | default |
---|---|---|---|
constantValue | Constant value to use for the constant regressor. | double | NaN |
dummyType | Type of dummy regressor. | class org.tribuo.regression.baseline.DummyRegressionTrainer$DummyType | |
quartile | Quartile to use. | double | NaN |
seed | The seed for the RNG. | long | 1 |
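By analogy with the DummyClassifierTrainer example earlier in this chapter, a constant regressor might be specified as sketched below. Two things are assumed here rather than taken from the text: that a :scicloj.ml.tribuo/regression model type exists alongside the classification one, and that :CONSTANT is a valid dummyType for the regression trainer. The value 1.5 and the name `train-ds` are arbitrary.

```clojure
;; Dummy regression spec that would always predict the configured constant.
(def dummy-regression-spec
  {:model-type :scicloj.ml.tribuo/regression        ; assumed model type
   :tribuo-components [{:name "dummy"
                        :type "org.tribuo.regression.baseline.DummyRegressionTrainer"
                        :properties {:dummyType :CONSTANT   ; assumed enum value
                                     :constantValue "1.5"}}]
   :tribuo-trainer-name "dummy"})

(comment
  ;; `train-ds` is a hypothetical dataset with a numeric inference target.
  (ml/train train-ds dummy-regression-spec))
```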
26.1.0.16 o..t…regression.liblinear.LibLinearRegressionTrainer
All configurable options for org.tribuo.regression.liblinear.LibLinearRegressionTrainer:
name | description | type | default |
---|---|---|---|
cost | Cost penalty for misclassifications. | double | 1.0 |
epsilon | Epsilon insensitivity in the regression cost function. | double | 0.1 |
maxIterations | Maximum number of iterations before terminating. | int | 1000 |
seed | RNG seed. | long | 12345 |
terminationCriterion | Stop iterating when the loss score decreases by less than this value. | double | 0.1 |
trainerType | Algorithm to use. | org.tribuo.common.liblinear.LibLinearType | org.tribuo.regression.liblinear.LinearRegressionType@197745f4 |
26.1.0.17 o..t…regression.libsvm.LibSVMRegressionTrainer
All configurable options for org.tribuo.regression.libsvm.LibSVMRegressionTrainer:
name | description | type | default |
---|---|---|---|
cache_size | Internal cache size, most of the time should be left at default. | double | 500.0 |
coef0 | Polynomial coefficient or shift in sigmoid kernel. | double | 0.0 |
cost | Cost parameter for incorrect predictions. | double | 1.0 |
degree | Polynomial degree. | int | 3 |
eps | Tolerance of the termination criterion. | double | 0.001 |
gamma | Width of the RBF kernel, or scalar on sigmoid kernel. | double | 0.0 |
kernelType | Type of Kernel. | class org.tribuo.common.libsvm.KernelType | LINEAR |
nu | nu value in NU SVM. | double | 0.5 |
p | Epsilon in EPSILON_SVR. | double | 0.001 |
probability | Generate probability estimates. | boolean | false |
seed | RNG seed. | long | 12345 |
shrinking | Regularise the weight parameters. | boolean | true |
standardize | Standardise the regression outputs before training. | boolean | false |
svmType | Type of SVM algorithm. | org.tribuo.common.libsvm.SVMType | |
26.1.0.18 o..t…regression.rtree.CARTJointRegressionTrainer
All configurable options for org.tribuo.regression.rtree.CARTJointRegressionTrainer:
name | description | type | default |
---|---|---|---|
fractionFeaturesInSplit | The fraction of features to consider in each split. 1.0f indicates all features are considered. | float | 1.0 |
impurity | The regression impurity to use. | interface org.tribuo.regression.rtree.impurity.RegressorImpurity | MeanSquaredError |
maxDepth | The maximum depth of the tree. | int | 2147483647 |
minChildWeight | The minimum weight allowed in a child node. | float | 5.0 |
minImpurityDecrease | The decrease in impurity needed in order to split the node. | float | 0.0 |
normalize | Normalize the output of each leaf so it sums to one. | boolean | false |
seed | The RNG seed to use when sampling features in a split. | long | 12345 |
useRandomSplitPoints | Whether to choose split points for features at random. | boolean | false |
26.1.0.19 o..t…regression.rtree.CARTRegressionTrainer
All configurable options for org.tribuo.regression.rtree.CARTRegressionTrainer:
name | description | type | default |
---|---|---|---|
fractionFeaturesInSplit | The fraction of features to consider in each split. 1.0f indicates all features are considered. | float | 1.0 |
impurity | Regression impurity measure used to determine split quality. | interface org.tribuo.regression.rtree.impurity.RegressorImpurity | MeanSquaredError |
maxDepth | The maximum depth of the tree. | int | 2147483647 |
minChildWeight | The minimum weight allowed in a child node. | float | 5.0 |
minImpurityDecrease | The decrease in impurity needed in order to split the node. | float | 0.0 |
seed | The RNG seed to use when sampling features in a split. | long | 12345 |
useRandomSplitPoints | Whether to choose split points for features at random. | boolean | false |
26.1.0.20 o..t…regression.sgd.fm.FMRegressionTrainer
All configurable options for org.tribuo.regression.sgd.fm.FMRegressionTrainer:
name | description | type | default |
---|---|---|---|
epochs | The number of gradient descent epochs. | int | 5 |
factorizedDimSize | The size of the factorized feature representation. | int | 0 |
loggingInterval | Log values after this many updates. | int | -1 |
minibatchSize | Minibatch size in SGD. | int | 1 |
objective | The regression objective to use. | interface org.tribuo.regression.sgd.RegressionObjective | |
optimiser | The gradient optimiser to use. | interface org.tribuo.math.StochasticGradientOptimiser | AdaGrad(initialLearningRate=1.0,epsilon=0.1,initialValue=0.0) |
seed | Seed for the RNG used to shuffle elements. | long | 12345 |
shuffle | Shuffle the data before each epoch. Only turn off for debugging. | boolean | true |
standardise | Standardise the output variables before fitting the model. | boolean | false |
variance | The variance of the initializer. | double | 0.0 |
26.1.0.21 o..t…regression.sgd.linear.LinearSGDTrainer
All configurable options for org.tribuo.regression.sgd.linear.LinearSGDTrainer:
name | description | type | default |
---|---|---|---|
epochs | The number of gradient descent epochs. | int | 5 |
loggingInterval | Log values after this many updates. | int | -1 |
minibatchSize | Minibatch size in SGD. | int | 1 |
objective | The regression objective to use. | interface org.tribuo.regression.sgd.RegressionObjective | |
optimiser | The gradient optimiser to use. | interface org.tribuo.math.StochasticGradientOptimiser | AdaGrad(initialLearningRate=1.0,epsilon=0.1,initialValue=0.0) |
seed | Seed for the RNG used to shuffle elements. | long | 12345 |
shuffle | Shuffle the data before each epoch. Only turn off for debugging. | boolean | true |
26.1.0.22 o..t…regression.xgboost.XGBoostRegressionTrainer
All configurable options for org.tribuo.regression.xgboost.XGBoostRegressionTrainer:
name | description | type | default |
---|---|---|---|
alpha | l1 regularisation term on the weights. | double | 1.0 |
booster | Type of the weak learner. | class org.tribuo.common.xgboost.XGBoostTrainer$BoosterType | GBTREE |
eta | The learning rate, shrinks the new tree output to prevent overfitting. | double | 0.3 |
featureSubsample | Independently subsample the features available for each node of each tree. | double | 1.0 |
gamma | Minimum loss reduction needed to split a tree node. | double | 0.0 |
lambda | l2 regularisation term on the weights. | double | 1.0 |
maxDepth | The maximum depth of any tree. | int | 6 |
minChildWeight | The minimum weight in each child node before a split is valid. | double | 1.0 |
nThread | The number of threads to use at training time. | int | 4 |
numTrees | The number of trees to build. | int | 0 |
overrideParameters | Override for parameters, if used must contain all the relevant parameters, including the objective | java.util.Map<java.lang.String, java.lang.String> | {} |
rType | The type of regression. | class org.tribuo.regression.xgboost.XGBoostRegressionTrainer$RegressionType | LINEAR |
seed | The RNG seed. | long | 12345 |
silent | Quiesce all the logging output from the XGBoost C library. Deprecated in favour of 'verbosity'. | int | 1 |
subsample | Independently subsample the examples for each tree. | double | 1.0 |
treeMethod | The tree building algorithm to use. | class org.tribuo.common.xgboost.XGBoostTrainer$TreeMethod | AUTO |
verbosity | Logging verbosity, 0 is silent, 3 is debug. | class org.tribuo.common.xgboost.XGBoostTrainer$LoggingVerbosity | SILENT |
26.1.0.23 o..t…transform.TransformTrainer
All configurable options for org.tribuo.transform.TransformTrainer:
name | description | type | default |
---|---|---|---|
densify | Densify all the features before applying transformations. | boolean | false |
includeImplicitZeroFeatures | Include the implicit zeros in the transformation statistics collection | boolean | false |
innerTrainer | Trainer to use. | org.tribuo.Trainer | |
transformations | Transformations to apply. | class org.tribuo.transform.TransformationMap | |
26.2 Tribuo component reference
26.2.0.1 o..t…classification.example.CheckerboardDataSource
All configurable options for org.tribuo.classification.example.CheckerboardDataSource:
name | description | type | default |
---|---|---|---|
max | The maximum feature value. | double | 10.0 |
min | The minimum feature value. | double | 0.0 |
numSamples | Number of samples to generate. | int | 0 |
numSquares | The number of squares on each side. | int | 5 |
seed | RNG seed. | long | 0 |
26.2.0.2 o..t…classification.example.ConcentricCirclesDataSource
All configurable options for org.tribuo.classification.example.ConcentricCirclesDataSource:
name | description | type | default |
---|---|---|---|
classProportion | The proportion of the circle radius that forms class one. | double | 0.5 |
numSamples | Number of samples to generate. | int | 0 |
radius | The radius of the outer circle. | double | 2.0 |
seed | RNG seed. | long | 0 |
26.2.0.3 o..t…classification.example.GaussianLabelDataSource
All configurable options for org.tribuo.classification.example.GaussianLabelDataSource:
name | description | type | default |
---|---|---|---|
firstCovarianceMatrix | 4 element covariance matrix of the first Gaussian. | class [D | |
firstMean | 2d mean of the first Gaussian. | class [D | |
numSamples | Number of samples to generate. | int | 0 |
secondCovarianceMatrix | 4 element covariance matrix of the second Gaussian. | class [D | |
secondMean | 2d mean of the second Gaussian. | class [D | |
seed | RNG seed. | long | 0 |
26.2.0.4 o..t…classification.example.InterlockingCrescentsDataSource
All configurable options for org.tribuo.classification.example.InterlockingCrescentsDataSource:
name | description | type | default |
---|---|---|---|
numSamples | Number of samples to generate. | int | 0 |
seed | RNG seed. | long | 0 |
26.2.0.5 o..t…classification.example.NoisyInterlockingCrescentsDataSource
All configurable options for org.tribuo.classification.example.NoisyInterlockingCrescentsDataSource:
name | description | type | default |
---|---|---|---|
numSamples | Number of samples to generate. | int | 0 |
seed | RNG seed. | long | 0 |
variance | Variance of the Gaussian noise | double | 0.1 |
26.2.0.6 o..t…classification.liblinear.LinearClassificationType
All configurable options for org.tribuo.classification.liblinear.LinearClassificationType:
name | description | type | default |
---|---|---|---|
type | The type of classification model | class org.tribuo.classification.liblinear.LinearClassificationType$LinearType | |
26.2.0.7 o..t…classification.libsvm.SVMClassificationType
All configurable options for org.tribuo.classification.libsvm.SVMClassificationType:
name | description | type | default |
---|---|---|---|
type | The SVM classification algorithm to use. | class org.tribuo.classification.libsvm.SVMClassificationType$SVMMode | |
26.2.0.8 o..t…classification.sequence.viterbi.DefaultFeatureExtractor
All configurable options for org.tribuo.classification.sequence.viterbi.DefaultFeatureExtractor:
name | description | type | default |
---|---|---|---|
leastRecentOutcome | Position of the least recent output to include. | int | 3 |
mostRecentOutcome | Position of the most recent outcome to include. | int | 1 |
use4gram | Use 4-grams of the labels as features. | boolean | false |
useBigram | Use bigrams of the labels as features. | boolean | true |
useTrigram | Use trigrams of the labels as features. | boolean | true |
26.2.0.9 o..t…classification.sgd.objectives.Hinge
All configurable options for org.tribuo.classification.sgd.objectives.Hinge:
name | description | type | default |
---|---|---|---|
margin | The classification margin. | double | 1.0 |
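As a sketch of how a non-trainer component such as Hinge might be used in a model spec, the example below declares it as its own component and points the objective option of the classification LinearSGDTrainer at it. It assumes that component-valued options are set by giving the :name of another component, as in Tribuo's own configuration files. The chapter does not confirm this for ml/train. Component names, values, and `train-ds` are hypothetical.

```clojure
;; Linear SGD classifier using a hinge objective declared as a separate component.
(def hinge-sgd-spec
  {:model-type :scicloj.ml.tribuo/classification
   :tribuo-components [{:name "hinge"
                        :type "org.tribuo.classification.sgd.objectives.Hinge"
                        :properties {:margin "1.0"}}
                       {:name "sgd"
                        :type "org.tribuo.classification.sgd.linear.LinearSGDTrainer"
                        ;; assumed: the objective is referenced by component name
                        :properties {:objective "hinge"
                                     :epochs "10"}}]
   :tribuo-trainer-name "sgd"})

(comment
  ;; `train-ds` is a hypothetical dataset with an inference target set.
  (ml/train train-ds hinge-sgd-spec))
```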
26.2.0.10 o..t…data.columnar.RowProcessor
All configurable options for org.tribuo.data.columnar.RowProcessor:
name | description | type | default |
---|---|---|---|
featureProcessors | A set of feature processors to apply after extraction. | java.util.Set<org.tribuo.data.columnar.FeatureProcessor> | [] |
fieldProcessorList | The list of field processors to use. | java.util.List<org.tribuo.data.columnar.FieldProcessor> | |
metadataExtractors | Extractors for the example metadata. | java.util.List<org.tribuo.data.columnar.FieldExtractor<?>> | [] |
regexMappingProcessors | A map from a regex to field processors to apply to fields matching the regex. | java.util.Map<java.lang.String, org.tribuo.data.columnar.FieldProcessor> | {} |
replaceNewlinesWithSpaces | Replace newlines with spaces in values before passing them to field processors. | boolean | true |
responseProcessor | Processor which extracts the response. | org.tribuo.data.columnar.ResponseProcessor | |
weightExtractor | Extractor for the example weight. | org.tribuo.data.columnar.FieldExtractor<java.lang.Float> | |
26.2.0.11 o..t…data.columnar.extractors.DateExtractor
All configurable options for org.tribuo.data.columnar.extractors.DateExtractor:
name | description | type | default |
---|---|---|---|
dateFormat | The expected date format. | class java.lang.String | |
fieldName | The field name to read. | class java.lang.String | |
localeCountry | Sets the locale country. | class java.lang.String | |
localeLanguage | Sets the locale language. | class java.lang.String | |
metadataName | The metadata key to emit, defaults to field name if unpopulated | class java.lang.String | |
26.2.0.12 o..t…data.columnar.extractors.DoubleExtractor
All configurable options for org.tribuo.data.columnar.extractors.DoubleExtractor:
name | description | type | default |
---|---|---|---|
fieldName | The field name to read. | class java.lang.String | |
metadataName | The metadata key to emit, defaults to field name if unpopulated | class java.lang.String | |
26.2.0.13 o..t…data.columnar.extractors.FloatExtractor
All configurable options for org.tribuo.data.columnar.extractors.FloatExtractor:
name | description | type | default |
---|---|---|---|
fieldName | The field name to read. | class java.lang.String | |
metadataName | The metadata key to emit, defaults to field name if unpopulated | class java.lang.String | |
26.2.0.14 o..t…data.columnar.extractors.IdentityExtractor
All configurable options for org.tribuo.data.columnar.extractors.IdentityExtractor:
name | description | type | default |
---|---|---|---|
fieldName | The field name to read. | class java.lang.String | |
metadataName | The metadata key to emit, defaults to field name if unpopulated | class java.lang.String | |
26.2.0.15 o..t…data.columnar.extractors.IndexExtractor
All configurable options for org.tribuo.data.columnar.extractors.IndexExtractor:
name | description | type | default |
---|---|---|---|
metadataName | The metadata key to emit, defaults to Example.NAME | class java.lang.String | name |
26.2.0.16 o..t…data.columnar.extractors.IntExtractor
All configurable options for org.tribuo.data.columnar.extractors.IntExtractor:
name | description | type | default |
---|---|---|---|
fieldName | The field name to read. | class java.lang.String | |
metadataName | The metadata key to emit, defaults to field name if unpopulated | class java.lang.String | |
26.2.0.17 o..t…data.columnar.extractors.OffsetDateTimeExtractor
All configurable options for org.tribuo.data.columnar.extractors.OffsetDateTimeExtractor:
name | description | type | default |
---|---|---|---|
dateTimeFormat | The expected date format. | class java.lang.String | |
fieldName | The field name to read. | class java.lang.String | |
localeCountry | The locale country. | class java.lang.String | |
localeLanguage | The locale language. | class java.lang.String | |
metadataName | The metadata key to emit, defaults to field name if unpopulated | class java.lang.String | |
26.2.0.18 o..t…data.columnar.processors.feature.UniqueProcessor
All configurable options for org.tribuo.data.columnar.processors.feature.UniqueProcessor:
name | description | type | default |
---|---|---|---|
reductionType | The operation to perform. | class org.tribuo.data.columnar.processors.feature.UniqueProcessor$UniqueType | |
26.2.0.19 o..t…data.columnar.processors.field.DateFieldProcessor
All configurable options for org.tribuo.data.columnar.processors.field.DateFieldProcessor:
name | description | type | default |
---|---|---|---|
dateFormat | The expected date format. | class java.lang.String | |
featureTypes | The date features to extract. | java.util.EnumSet<org.tribuo.data.columnar.processors.field.DateFieldProcessor$DateFeatureType> | |
fieldName | The field name to read. | class java.lang.String | |
localeCountry | Sets the locale country. | class java.lang.String | US |
localeLanguage | Sets the locale language. | class java.lang.String | en |
26.2.0.20 o..t…data.columnar.processors.field.DoubleFieldProcessor
All configurable options for org.tribuo.data.columnar.processors.field.DoubleFieldProcessor:
name | description | type | default |
---|---|---|---|
fieldName | The field name to read. | class java.lang.String | |
onlyFieldName | Emit a feature using just the field name. | boolean | false |
throwOnInvalid | Throw NumberFormatException if the value failed to parse. | boolean | false |
26.2.0.21 o..t…data.columnar.processors.field.IdentityProcessor
All configurable options for org.tribuo.data.columnar.processors.field.IdentityProcessor:
name | description | type | default |
---|---|---|---|
fieldName | The field name to read. | class java.lang.String | |
26.2.0.22 o..t…data.columnar.processors.field.RegexFieldProcessor
All configurable options for org.tribuo.data.columnar.processors.field.RegexFieldProcessor:
name | description | type | default |
---|---|---|---|
fieldName | The field name to read. | class java.lang.String | |
modes | Matching mode. | java.util.EnumSet<org.tribuo.data.columnar.processors.field.RegexFieldProcessor$Mode> | |
regexString | Regex to apply to the field. | class java.lang.String | |
26.2.0.23 o..t…data.columnar.processors.field.TextFieldProcessor
All configurable options for org.tribuo.data.columnar.processors.field.TextFieldProcessor:
name | description | type | default |
---|---|---|---|
fieldName | The field name to read. | class java.lang.String | |
pipeline | Text processing pipeline to use. | interface org.tribuo.data.text.TextPipeline | |
26.2.0.24 o..t…data.columnar.processors.response.BinaryResponseProcessor
All configurable options for org.tribuo.data.columnar.processors.response.BinaryResponseProcessor:
name | description | type | default |
---|---|---|---|
displayField | Whether to display field names as part of the generated output, defaults to false | boolean | false |
fieldName | The field name to read, you should use only one of this or fieldNames | class java.lang.String | |
fieldNames | A list of field names to read, you should use only one of this or fieldName. | java.util.List<java.lang.String> | |
negativeName | The negative response to emit. | class java.lang.String | 0 |
outputFactory | Output factory to use to create the response. | org.tribuo.OutputFactory | |
positiveName | The positive response to emit. | class java.lang.String | 1 |
positiveResponse | The string which triggers a positive response. | class java.lang.String | |
positiveResponses | A list of strings that trigger positive responses; it should be the same length as fieldNames or empty | java.util.List<java.lang.String> | |
26.2.0.25 o..t…data.columnar.processors.response.EmptyResponseProcessor
javadocAll configurable options for org.tribuo.data.columnar.processors.response.EmptyResponseProcessor:
name | description | type | default |
---|---|---|---|
outputFactory | Output factory to type the columnar loader. | org.tribuo.OutputFactory | |
26.2.0.26 o..t…data.columnar.processors.response.FieldResponseProcessor
All configurable options for org.tribuo.data.columnar.processors.response.FieldResponseProcessor:
name | description | type | default |
---|---|---|---|
defaultValue | Default value to return if one isn't found. | class java.lang.String | |
defaultValues | A list of default values to return if one isn't found, one for each field | java.util.List<java.lang.String> | |
displayField | Whether to display field names as part of the generated label, defaults to false | boolean | false |
fieldName | The field name to read. | class java.lang.String | |
fieldNames | A list of field names to read, you should use only one of this or fieldName. | java.util.List<java.lang.String> | |
outputFactory | The output factory to use. | org.tribuo.OutputFactory | |
uppercase | Uppercase the value before converting to output. | boolean | true |
26.2.0.27 o..t…data.columnar.processors.response.Quartile
All configurable options for org.tribuo.data.columnar.processors.response.Quartile:
name | description | type | default |
---|---|---|---|
lowerMedian | The lower quartile value. | double | 0.0 |
median | The median value. | double | 0.0 |
upperMedian | The upper quartile value. | double | 0.0 |
26.2.0.28 o..t…data.columnar.processors.response.QuartileResponseProcessor
All configurable options for org.tribuo.data.columnar.processors.response.QuartileResponseProcessor:
name | description | type | default |
---|---|---|---|
fieldName | The field name to read. | class java.lang.String | |
fieldNames | A list of field names to read, you should use only one of this or fieldName. | java.util.List<java.lang.String> | |
name | The string to emit. | class java.lang.String | |
outputFactory | The output factory to use. | org.tribuo.OutputFactory | |
quartile | The quartile to use. | class org.tribuo.data.columnar.processors.response.Quartile | |
quartiles | A list of quartiles to use, should have the same length as fieldNames | java.util.List<org.tribuo.data.columnar.processors.response.Quartile> | |
26.2.0.29 o..t…data.csv.CSVDataSource
All configurable options for org.tribuo.data.csv.CSVDataSource:
name | description | type | default |
---|---|---|---|
dataPath | Path to the CSV file. | interface java.nio.file.Path | |
headers | The CSV headers. Should only be used if the csv file does not already contain headers. | java.util.List<java.lang.String> | [] |
outputFactory | The output factory to use. | org.tribuo.OutputFactory | |
outputRequired | Is an output required from each row? | boolean | true |
quote | The CSV quote character. | char | " |
rowProcessor | The row processor to use. | org.tribuo.data.columnar.RowProcessor | |
separator | The CSV separator character. | char | , |
26.2.0.30 o..t…data.csv.CSVSaver
All configurable options for org.tribuo.data.csv.CSVSaver:
name | description | type | default |
---|---|---|---|
quote | The quote character. | char | " |
separator | The column separator. | char | , |
26.2.0.31 o..t…data.sql.SQLDBConfig
All configurable options for org.tribuo.data.sql.SQLDBConfig:
name | description | type | default |
---|---|---|---|
connectionString | Connection string, including host, port and db. | class java.lang.String | |
db | Database name. | class java.lang.String | |
fetchSize | Size of batches to fetch from DB for queries | int | 1000 |
host | Hostname of the database machine. | class java.lang.String | |
password | Database password. | class java.lang.String | |
port | Port number. | class java.lang.String | |
propMap | Properties to pass to java.sql.DriverManager, username and password will be removed and populated to their fields. If specified both on the map and in the fields, the fields will be used | java.util.Map<java.lang.String, java.lang.String> | {} |
username | Database username. | class java.lang.String | |
26.2.0.32 o..t…data.sql.SQLDataSource
All configurable options for org.tribuo.data.sql.SQLDataSource:
name | description | type | default |
---|---|---|---|
outputFactory | The output factory to use. | org.tribuo.OutputFactory | |
outputRequired | Is an output required from each row? | boolean | true |
rowProcessor | The row processor to use. | org.tribuo.data.columnar.RowProcessor | |
sqlConfig | Database configuration. | class org.tribuo.data.sql.SQLDBConfig | |
sqlString | SQL query to run. | class java.lang.String | |
26.2.0.33 o..t…data.text.DirectoryFileSource
All configurable options for org.tribuo.data.text.DirectoryFileSource:
name | description | type | default |
---|---|---|---|
dataDir | The top-level directory containing the data set. | interface java.nio.file.Path | . |
extractor | The feature extractor that converts text into examples. | org.tribuo.data.text.TextFeatureExtractor | |
outputFactory | The output factory to use. | org.tribuo.OutputFactory | |
preprocessors | The preprocessors to apply to the input documents. | java.util.List<org.tribuo.data.text.DocumentPreprocessor> | [] |
26.2.0.34 o..t…data.text.impl.BasicPipeline
All configurable options for org.tribuo.data.text.impl.BasicPipeline:
name | description | type | default |
---|---|---|---|
ngram | n in the n-gram to emit. | int | 2 |
tokenizer | Tokenizer to use. | interface org.tribuo.util.tokens.Tokenizer | |
26.2.0.35 o..t…data.text.impl.CasingPreprocessor
All configurable options for org.tribuo.data.text.impl.CasingPreprocessor:
name | description | type | default |
---|---|---|---|
op | Which casing operation to apply. | class org.tribuo.data.text.impl.CasingPreprocessor$CasingOperation | LOWERCASE |
26.2.0.36 o..t…data.text.impl.FeatureHasher
All configurable options for org.tribuo.data.text.impl.FeatureHasher:
name | description | type | default |
---|---|---|---|
dimension | Dimension to map the hash into. | int | 0 |
hashSeed | Seed used in the hash function. | int | 38495 |
preserveValue | Preserve input feature value. | boolean | false |
valueHashSeed | Seed used for value hash function. | int | 77777 |
26.2.0.37 o..t…data.text.impl.NgramProcessor
All configurable options for org.tribuo.data.text.impl.NgramProcessor:
name | description | type | default |
---|---|---|---|
n | n in the n-gram to emit. | int | 2 |
tokenizer | Tokenizer to use. | interface org.tribuo.util.tokens.Tokenizer | |
value | Value to emit for each n-gram. | double | 1.0 |
26.2.0.38 o..t…data.text.impl.RegexPreprocessor
All configurable options for org.tribuo.data.text.impl.RegexPreprocessor:
name | description | type | default |
---|---|---|---|
regexStrings | A list of regular expressions in string format used to match the input | java.util.List<java.lang.String> | |
replacements | A list of replacement strings which are used to replace the matches | java.util.List<java.lang.String> | |
26.2.0.39 o..t…data.text.impl.SimpleStringDataSource
All configurable options for org.tribuo.data.text.impl.SimpleStringDataSource:
name | description | type | default |
---|---|---|---|
extractor | The feature extractor that generates Features from text. | org.tribuo.data.text.TextFeatureExtractor | |
outputFactory | The factory that converts a String into an Output instance. | org.tribuo.OutputFactory | |
path | The path to read the data from. | interface java.nio.file.Path | |
preprocessors | The document preprocessors to run on each document in the data source. | java.util.List<org.tribuo.data.text.DocumentPreprocessor> | [] |
rawLines | The input data lines. | java.util.List<java.lang.String> | |
26.2.0.40 o..t…data.text.impl.SimpleTextDataSource
All configurable options for org.tribuo.data.text.impl.SimpleTextDataSource:
name | description | type | default |
---|---|---|---|
extractor | The feature extractor that generates Features from text. | org.tribuo.data.text.TextFeatureExtractor | |
outputFactory | The factory that converts a String into an Output instance. | org.tribuo.OutputFactory | |
path | The path to read the data from. | interface java.nio.file.Path | |
preprocessors | The document preprocessors to run on each document in the data source. | java.util.List<org.tribuo.data.text.DocumentPreprocessor> | [] |
26.2.0.41 o..t…data.text.impl.TextFeatureExtractorImpl
All configurable options for org.tribuo.data.text.impl.TextFeatureExtractorImpl:
name | description | type | default |
---|---|---|---|
pipeline | The text processing pipeline. | interface org.tribuo.data.text.TextPipeline | |
26.2.0.42 o..t…data.text.impl.TokenPipeline
All configurable options for org.tribuo.data.text.impl.TokenPipeline:
name | description | type | default |
---|---|---|---|
hashDim | Dimension to map the hash into. | int | -1 |
hashPreserveValue | Should feature hashing preserve the value? | boolean | true |
ngram | n in the n-gram to emit. | int | 2 |
termCounting | Use term counting, otherwise emit binary features. | boolean | false |
tokenizer | Tokenizer to use. | interface org.tribuo.util.tokens.Tokenizer | |
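Options whose type is another Tribuo interface, such as tokenizer here, need a second component to point at. The fragment below is a minimal sketch of how that might look in a :tribuo-components vector. The component names "my-tokenizer" and "my-pipeline" are made up for illustration, and referring to a component by its :name string is an assumption carried over from the trainer example at the top of this chapter, not something this table confirms.

```clojure
;; Illustrative fragment only. "my-tokenizer" and "my-pipeline" are
;; hypothetical names; cross-referencing by :name is an assumption.
(def token-pipeline-components
  [{:name "my-tokenizer"
    :type "org.tribuo.util.tokens.impl.SplitPatternTokenizer"
    ;; split on runs of whitespace
    :properties {:splitPatternRegex "\\s+"}}
   {:name "my-pipeline"
    :type "org.tribuo.data.text.impl.TokenPipeline"
    :properties {:tokenizer "my-tokenizer" ; refers to the component above
                 :ngram "2"
                 :termCounting "true"}}])
```

All property values are written as strings, mirroring the random-forest example earlier in the chapter.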
26.2.0.43 o..t…data.text.impl.UniqueAggregator
All configurable options for org.tribuo.data.text.impl.UniqueAggregator:
name | description | type | default |
---|---|---|---|
value | Value to emit, if unset emits the last value observed for that token. | double | NaN |
26.2.0.44 o..t…datasource.IDXDataSource
All configurable options for org.tribuo.datasource.IDXDataSource:
name | description | type | default |
---|---|---|---|
featuresPath | Path to load the features from. | interface java.nio.file.Path | |
outputFactory | The output factory to use. | org.tribuo.OutputFactory | |
outputPath | Path to load the outputs from. | interface java.nio.file.Path | |
26.2.0.45 o..t…datasource.LibSVMDataSource
All configurable options for org.tribuo.datasource.LibSVMDataSource:
name | description | type | default |
---|---|---|---|
maxFeatureID | Sets the maximum feature id to load from the file. | int | -2147483648 |
outputFactory | The output factory to use. | org.tribuo.OutputFactory | |
path | Path to load the data from. Either this or url must be set. | interface java.nio.file.Path | |
url | URL to load the data from. Either this or path must be set. | class java.net.URL | |
zeroIndexed | Set to true if the features are zero indexed. | boolean | false |
26.2.0.46 o..t…hash.HashCodeHasher
All configurable options for org.tribuo.hash.HashCodeHasher:
name | description | type | default |
---|---|---|---|
salt | Salt used in the hash. | class java.lang.String | |
26.2.0.47 o..t…hash.MessageDigestHasher
All configurable options for org.tribuo.hash.MessageDigestHasher:
name | description | type | default |
---|---|---|---|
hashType | MessageDigest hashing function. | class java.lang.String | |
saltStr | Salt used in the hash. | class java.lang.String | |
26.2.0.48 o..t…hash.ModHashCodeHasher
All configurable options for org.tribuo.hash.ModHashCodeHasher:
name | description | type | default |
---|---|---|---|
dimension | Range of the hashing function. | int | 100 |
salt | Salt used in the hash. | class java.lang.String | |
26.2.0.49 o..t…math.kernel.Polynomial
All configurable options for org.tribuo.math.kernel.Polynomial:
name | description | type | default |
---|---|---|---|
degree | Degree of the polynomial. | double | 0.0 |
gamma | Coefficient to multiply the dot product by. | double | 0.0 |
intercept | Scalar to add to the dot product. | double | 0.0 |
26.2.0.50 o..t…math.kernel.RBF
All configurable options for org.tribuo.math.kernel.RBF:
name | description | type | default |
---|---|---|---|
gamma | Kernel output = exp(-gamma*|u-v|^2). | double | 0.0 |
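To make the role of gamma concrete, the formula in the description can be written out directly. This is a plain Clojure sketch of exp(-gamma*|u-v|^2) over dense vectors, not Tribuo's own implementation; the function name rbf-kernel is made up for illustration.

```clojure
;; Sketch of the RBF formula from the table above, for two
;; equal-length numeric vectors. Not Tribuo code.
(defn rbf-kernel
  [gamma u v]
  (let [;; squared Euclidean distance |u - v|^2
        sq-dist (reduce + (map (fn [a b]
                                 (let [d (- a b)] (* d d)))
                               u v))]
    (Math/exp (- (* gamma sq-dist)))))

;; Identical vectors have distance 0, so the kernel value is 1.0.
(rbf-kernel 0.5 [1.0 2.0] [1.0 2.0])
```

Larger gamma values make the kernel fall off faster as the two vectors move apart.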
26.2.0.51 o..t…math.kernel.Sigmoid
All configurable options for org.tribuo.math.kernel.Sigmoid:
name | description | type | default |
---|---|---|---|
gamma | Coefficient to multiply the dot product by. | double | 0.0 |
intercept | Scalar intercept to add to the dot product. | double | 0.0 |
26.2.0.52 o..t…math.neighbour.bruteforce.NeighboursBruteForceFactory
All configurable options for org.tribuo.math.neighbour.bruteforce.NeighboursBruteForceFactory:
name | description | type | default |
---|---|---|---|
distance | The distance function to use. | interface org.tribuo.math.distance.Distance | L2Distance() |
numThreads | The number of threads to use for training. | int | 1 |
26.2.0.53 o..t…math.neighbour.kdtree.KDTreeFactory
All configurable options for org.tribuo.math.neighbour.kdtree.KDTreeFactory:
name | description | type | default |
---|---|---|---|
distance | The distance function to use. | interface org.tribuo.math.distance.Distance | L2Distance() |
numThreads | The number of threads to use for training. | int | 1 |
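A neighbour query factory would typically be declared as its own component and then referenced from a nearest-neighbour trainer. The fragment below sketches only the factory side, swapping the default L2Distance for an L1 distance. The names "l1" and "kdtree" are hypothetical, and the by-name reference in :distance is an assumption based on the trainer example at the top of this chapter.

```clojure
;; Illustrative fragment only. "l1" and "kdtree" are hypothetical names;
;; referencing the distance component by :name is an assumption.
(def neighbour-components
  [{:name "l1"
    :type "org.tribuo.math.distance.L1Distance"}
   {:name "kdtree"
    :type "org.tribuo.math.neighbour.kdtree.KDTreeFactory"
    :properties {:distance "l1"   ; refers to the component above
                 :numThreads "4"}}])
```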
26.2.0.54 o..t…math.optimisers.AdaDelta
All configurable options for org.tribuo.math.optimisers.AdaDelta:
name | description | type | default |
---|---|---|---|
epsilon | Epsilon for numerical stability. | double | 1.0E-6 |
rho | Momentum value. | double | 0.95 |
26.2.0.55 o..t…math.optimisers.AdaGrad
All configurable options for org.tribuo.math.optimisers.AdaGrad:
name | description | type | default |
---|---|---|---|
epsilon | Epsilon for numerical stability around zero. | double | 1.0E-6 |
initialLearningRate | Initial learning rate used to scale the gradients. | double | 0.0 |
initialValue | Initial value for the gradient accumulator. | double | 0.0 |
26.2.0.56 o..t…math.optimisers.AdaGradRDA
All configurable options for org.tribuo.math.optimisers.AdaGradRDA:
name | description | type | default |
---|---|---|---|
epsilon | Epsilon for numerical stability around zero. | double | 1.0E-6 |
initialLearningRate | Initial learning rate used to scale the gradients. | double | 0.0 |
l1 | l1 regularization penalty. | double | 0.0 |
l2 | l2 regularization penalty. | double | 0.0 |
numExamples | Number of examples to scale the l1 and l2 penalties by. | int | 1 |
26.2.0.57 o..t…math.optimisers.Adam
All configurable options for org.tribuo.math.optimisers.Adam:
name | description | type | default |
---|---|---|---|
betaOne | The beta one parameter. | double | 0.9 |
betaTwo | The beta two parameter. | double | 0.999 |
epsilon | Epsilon for numerical stability. | double | 1.0E-6 |
initialLearningRate | Learning rate to scale the gradients by. | double | 0.001 |
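An optimiser is only useful when a trainer picks it up. The sketch below shows how an Adam component might be wired into a linear SGD regression trainer in a model spec. Several things here are assumptions rather than facts from this page: the :scicloj.ml.tribuo/regression model type, the component names "adam" and "trainer", the by-name reference in :optimiser, and that the trainer's remaining options have usable defaults.

```clojure
;; Sketch of a model spec using Adam. Names "adam" and "trainer" are
;; hypothetical; the regression model type and the by-name reference
;; in :optimiser are assumptions.
(def adam-sgd-spec
  {:model-type :scicloj.ml.tribuo/regression
   :tribuo-components [{:name "adam"
                        :type "org.tribuo.math.optimisers.Adam"
                        :properties {:initialLearningRate "0.01"
                                     :betaOne "0.9"
                                     :betaTwo "0.999"
                                     :epsilon "1.0E-6"}}
                       {:name "trainer"
                        :type "org.tribuo.regression.sgd.linear.LinearSGDTrainer"
                        :properties {:optimiser "adam" ; refers to the component above
                                     :epochs "10"}}]
   :tribuo-trainer-name "trainer"})

;; Would be passed to ml/train together with a dataset that has a
;; numeric inference target, e.g. (ml/train ds adam-sgd-spec).
```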
26.2.0.58 o..t…math.optimisers.LinearDecaySGD
All configurable options for org.tribuo.math.optimisers.LinearDecaySGD:
name | description | type | default |
---|---|---|---|
initialLearningRate | Initial learning rate. | double | 0.0 |
rho | Momentum scaling factor. | double | 0.0 |
useMomentum | Momentum type to use. | class org.tribuo.math.optimisers.SGD$Momentum | |
26.2.0.59 o..t…math.optimisers.ParameterAveraging
All configurable options for org.tribuo.math.optimisers.ParameterAveraging:
name | description | type | default |
---|---|---|---|
optimiser | Inner optimiser to average parameters across. | interface org.tribuo.math.StochasticGradientOptimiser | |
26.2.0.60 o..t…math.optimisers.Pegasos
All configurable options for org.tribuo.math.optimisers.Pegasos:
name | description | type | default |
---|---|---|---|
baseRate | Base learning rate. | double | 0.1 |
lambda | Step size shrinkage. | double | 0.01 |
26.2.0.61 o..t…math.optimisers.RMSProp
All configurable options for org.tribuo.math.optimisers.RMSProp:
name | description | type | default |
---|---|---|---|
decay | Decay factor for the momentum. | double | 0.0 |
epsilon | Epsilon for numerical stability. | double | 1.0E-8 |
initialLearningRate | Learning rate to scale the gradients by. | double | 0.0 |
rho | Momentum parameter. | double | 0.9 |
26.2.0.62 o..t…math.optimisers.SimpleSGD
All configurable options for org.tribuo.math.optimisers.SimpleSGD:
name | description | type | default |
---|---|---|---|
initialLearningRate | Initial learning rate. | double | 0.0 |
rho | Momentum scaling factor. | double | 0.0 |
useMomentum | Momentum type to use. | class org.tribuo.math.optimisers.SGD$Momentum | |
26.2.0.63 o..t…math.optimisers.SqrtDecaySGD
All configurable options for org.tribuo.math.optimisers.SqrtDecaySGD:
name | description | type | default |
---|---|---|---|
initialLearningRate | Initial learning rate. | double | 0.0 |
rho | Momentum scaling factor. | double | 0.0 |
useMomentum | Momentum type to use. | class org.tribuo.math.optimisers.SGD$Momentum | |
26.2.0.64 o..t…regression.RegressionFactory
All configurable options for org.tribuo.regression.RegressionFactory:
name | description | type | default |
---|---|---|---|
splitChar | The character to split the dimensions on. | char | , |
26.2.0.65 o..t…regression.example.GaussianDataSource
All configurable options for org.tribuo.regression.example.GaussianDataSource:
name | description | type | default |
---|---|---|---|
intercept | The y-intercept of the line. | float | 0.0 |
numSamples | The number of samples to draw. | int | 0 |
seed | The RNG seed. | long | 12345 |
slope | The slope of the line. | float | 0.0 |
variance | The variance of the gaussian. | float | 1.0 |
xMax | The maximum feature value. | float | 0.0 |
xMin | The minimum feature value. | float | 0.0 |
26.2.0.66 o..t…regression.example.NonlinearGaussianDataSource
All configurable options for org.tribuo.regression.example.NonlinearGaussianDataSource:
name | description | type | default |
---|---|---|---|
intercept | The y-intercept of the line. | float | 0.0 |
numSamples | The number of samples to draw. | int | 0 |
seed | The RNG seed. | long | 12345 |
variance | The variance of the noise gaussian. | float | 1.0 |
weights | The feature weights. Must be a 4 element array. | float[] | (float array; contents not shown) |
xOneMax | The maximum value of x_1. | float | 2.0 |
xOneMin | The minimum value of x_1. | float | -2.0 |
xZeroMax | The maximum value of x_0. | float | 2.0 |
xZeroMin | The minimum value of x_0. | float | -2.0 |
26.2.0.67 o..t…regression.liblinear.LinearRegressionType
All configurable options for org.tribuo.regression.liblinear.LinearRegressionType:
name | description | type | default |
---|---|---|---|
type | The type of regression algorithm. | class org.tribuo.regression.liblinear.LinearRegressionType$LinearType | |
26.2.0.68 o..t…regression.libsvm.SVMRegressionType
All configurable options for org.tribuo.regression.libsvm.SVMRegressionType:
name | description | type | default |
---|---|---|---|
type | The SVM regression algorithm to use. | class org.tribuo.regression.libsvm.SVMRegressionType$SVMMode | |
26.2.0.69 o..t…regression.sgd.objectives.Huber
All configurable options for org.tribuo.regression.sgd.objectives.Huber:
name | description | type | default |
---|---|---|---|
cost | Cost beyond which the loss function is linear. | double | 5.0 |
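The Huber objective is one of the non-trainer components that a trainer can point at. The sketch below shows how it might replace the default objective of a linear SGD regression trainer. The :scicloj.ml.tribuo/regression model type, the component names "huber" and "trainer", and the by-name reference in :objective are assumptions, not taken from this page.

```clojure
;; Sketch of a model spec using the Huber objective. Names "huber" and
;; "trainer" are hypothetical; the regression model type and the
;; by-name reference in :objective are assumptions.
(def huber-sgd-spec
  {:model-type :scicloj.ml.tribuo/regression
   :tribuo-components [{:name "huber"
                        :type "org.tribuo.regression.sgd.objectives.Huber"
                        ;; errors larger than this are penalised linearly
                        :properties {:cost "2.0"}}
                       {:name "trainer"
                        :type "org.tribuo.regression.sgd.linear.LinearSGDTrainer"
                        :properties {:objective "huber" ; refers to the component above
                                     :epochs "10"}}]
   :tribuo-trainer-name "trainer"})
```

A smaller cost makes the loss switch to its linear regime earlier, which reduces the influence of outliers.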
26.2.0.70 o..t…transform.TransformationMap
All configurable options for org.tribuo.transform.TransformationMap:
name | description | type | default |
---|---|---|---|
featureTransformationList | Feature specific transformations. Accepts regexes for feature names. | java.util.Map<java.lang.String, org.tribuo.transform.TransformationMap$TransformationList> | {} |
globalTransformations | Global transformations to apply after the feature specific transforms. | java.util.List<org.tribuo.transform.Transformation> | |
26.2.0.71 o..t…transform.TransformationMap$TransformationList
All configurable options for org.tribuo.transform.TransformationMap$TransformationList:
name | description | type | default |
---|---|---|---|
list | A list of transformations to apply. | java.util.List<org.tribuo.transform.Transformation> | |
26.2.0.72 o..t…transform.transformations.BinningTransformation
All configurable options for org.tribuo.transform.transformations.BinningTransformation:
name | description | type | default |
---|---|---|---|
numBins | Number of bins. | int | 0 |
type | Binning algorithm to use. | class org.tribuo.transform.transformations.BinningTransformation$BinningType | |
26.2.0.73 o..t…transform.transformations.LinearScalingTransformation
All configurable options for org.tribuo.transform.transformations.LinearScalingTransformation:
name | description | type | default |
---|---|---|---|
targetMax | Maximum value after transformation. | double | 1.0 |
targetMin | Minimum value after transformation. | double | 0.0 |
26.2.0.74 o..t…transform.transformations.MeanStdDevTransformation
All configurable options for org.tribuo.transform.transformations.MeanStdDevTransformation:
name | description | type | default |
---|---|---|---|
targetMean | Mean value after transformation. | double | 0.0 |
targetStdDev | Standard deviation after transformation. | double | 1.0 |
26.2.0.75 o..t…transform.transformations.SimpleTransform
All configurable options for org.tribuo.transform.transformations.SimpleTransform:
name | description | type | default |
---|---|---|---|
op | Type of the simple transformation. | class org.tribuo.transform.transformations.SimpleTransform$Operation | |
operand | Operand (if required). | double | NaN |
secondOperand | Second operand (if required). | double | NaN |
26.2.0.76 o..t…util.tokens.impl.BreakIteratorTokenizer
All configurable options for org.tribuo.util.tokens.impl.BreakIteratorTokenizer:
name | description | type | default |
---|---|---|---|
localeStr | The locale language tag string. | class java.lang.String | |
26.2.0.77 o..t…util.tokens.impl.SplitCharactersTokenizer
All configurable options for org.tribuo.util.tokens.impl.SplitCharactersTokenizer:
name | description | type | default |
---|---|---|---|
splitCharacters | The characters to split on. | char[] | (character array; contents not shown) |
splitXDigitsCharacters | The characters to split on unless we're in a number. | char[] | (character array; contents not shown) |
26.2.0.78 o..t…util.tokens.impl.SplitPatternTokenizer
All configurable options for org.tribuo.util.tokens.impl.SplitPatternTokenizer:
name | description | type | default |
---|---|---|---|
splitPatternRegex | The regex to split with. | class java.lang.String | [\.,]?\s+ |
26.2.0.79 o..t…util.tokens.impl.wordpiece.Wordpiece
All configurable options for org.tribuo.util.tokens.impl.wordpiece.Wordpiece:
name | description | type | default |
---|---|---|---|
maxInputCharactersPerWord | the maximum number of characters per word to consider. This helps eliminate doing extra work on pathological cases. | int | 100 |
unknownToken | the value to use for 'UNKNOWN' tokens. Defaults to '[UNK]' which is a common default in BERT-based solutions. | class java.lang.String | [UNK] |
vocabPath | path to a vocabulary data file. | class java.lang.String | |
26.2.0.80 o..t…util.tokens.impl.wordpiece.WordpieceBasicTokenizer
All configurable options for org.tribuo.util.tokens.impl.wordpiece.WordpieceBasicTokenizer:
name | description | type | default |
---|---|---|---|
tokenizeChineseChars | split on Chinese tokens? | boolean | true |
26.2.0.81 o..t…util.tokens.impl.wordpiece.WordpieceTokenizer
All configurable options for org.tribuo.util.tokens.impl.wordpiece.WordpieceTokenizer:
name | description | type | default |
---|---|---|---|
basicTokenizer | performs some tokenization work on the input text before the wordpiece algorithm is applied to each resulting token. | interface org.tribuo.util.tokens.Tokenizer | org.tribuo.util.tokens.impl.wordpiece.WordpieceBasicTokenizer (default instance) |
neverSplitTokens | a set of 'token' strings that should never be split regardless of whether they have e.g., punctuation in the middle. No entries should have whitespace in them. | java.util.Set<java.lang.String> | [] |
stripAccents | determines whether or not to strip accents/diacritics from the input text | boolean | true |
toLowerCase | determines whether or not to lowercase the input text | boolean | true |
whitespaceTokenizer | performs whitespace tokenization before 'basic' tokenizer is applied (see basicTokenizer) | interface org.tribuo.util.tokens.Tokenizer | org.tribuo.util.tokens.impl.WhitespaceTokenizer (default instance) |
wordpiece | an instance of Wordpiece which applies the 'wordpiece' algorithm | class org.tribuo.util.tokens.impl.wordpiece.Wordpiece | |
26.2.0.82 o..t…util.tokens.universal.UniversalTokenizer
All configurable options for org.tribuo.util.tokens.universal.UniversalTokenizer:
name | description | type | default |
---|---|---|---|
sendPunct | Send punctuation through as tokens. | boolean | false |