26  Tribuo reference - DRAFT 🛠

The following is a reference for all Tribuo trainers. Any of them can be used in the model specification passed to ml/train by giving its class name as the :type of a Tribuo component, for example:

(comment
  (ml/train
   ds
   {:model-type :scicloj.ml.tribuo/classification
    :tribuo-components [{:name "random-forest"
                         :type "org.tribuo.classification.dtree.CARTClassificationTrainer"
                         :properties {:maxDepth "8"
                                      :useRandomSplitPoints "false"
                                      :fractionFeaturesInSplit "0.5"}}]
    :tribuo-trainer-name "random-forest"}))

There is also a reference for all non-trainer components of Tribuo. These could potentially also be used in Tribuo model specs.

26.1 Tribuo trainer reference

26.1.0.1 o..t…classification.baseline.DummyClassifierTrainer

javadoc

The DummyClassifier predicts a value using a ‘dummy’ algorithm. It can, for example, always predict a :CONSTANT value.

(def df
 (-> (tc/dataset {:a [1 2], :target [:x :x]})
  (ds-mod/set-inference-target :target)))
(kind/table df)

a target
1 x
2 x

(def model
 (ml/train
   df
   {:model-type :scicloj.ml.tribuo/classification,
    :tribuo-components
    [{:name "dummy",
      :type
      "org.tribuo.classification.baseline.DummyClassifierTrainer",
      :properties {:dummyType :CONSTANT, :constantLabel "c"}}],
    :tribuo-trainer-name "dummy"}))

With :constantLabel set to "c", every prediction is :c:

(ml/predict df model)

_unnamed [2 1]:

:target
:c
:c

All configurable options for org.tribuo.classification.baseline.DummyClassifierTrainer:

name description type default
constantLabel Label to use for the constant classifier. class java.lang.String
dummyType Type of dummy classifier. class org.tribuo.classification.baseline.DummyClassifierTrainer$DummyType
seed Seed for the RNG. long 1

26.1.0.2 o..t…classification.dtree.CARTClassificationTrainer

javadoc

All configurable options for org.tribuo.classification.dtree.CARTClassificationTrainer:

name description type default
fractionFeaturesInSplit The fraction of features to consider in each split. 1.0f indicates all features are considered. float 1.0
impurity The impurity measure used to determine split quality. interface org.tribuo.classification.dtree.impurity.LabelImpurity GiniIndex
maxDepth The maximum depth of the tree. int 2147483647
minChildWeight The minimum weight allowed in a child node. float 5.0
minImpurityDecrease The decrease in impurity needed in order to split the node. float 0.0
seed The RNG seed to use when sampling features in a split. long 12345
useRandomSplitPoints Whether to choose split points for features at random. boolean false

26.1.0.3 o..t…classification.ensemble.AdaBoostTrainer

javadoc

All configurable options for org.tribuo.classification.ensemble.AdaBoostTrainer:

name description type default
innerTrainer The trainer to use to build each weak learner. org.tribuo.Trainer<org.tribuo.classification.Label>
numMembers The number of ensemble members to train. int 0
seed The seed for the RNG. long 0
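
A minimal sketch of an AdaBoost spec. It assumes that a component-typed option such as innerTrainer is set to the :name of another entry in :tribuo-components, which is the usual convention of Tribuo's configuration system but is not confirmed by the text above. The component names "weak-tree" and "adaboost" and all option values are illustrative only. numMembers is set explicitly because its listed default is 0.

```clojure
;; Hypothetical spec: names and values are illustrative, not recommended settings.
(def adaboost-spec
  {:model-type :scicloj.ml.tribuo/classification
   :tribuo-components
   [;; a shallow tree used as the weak learner
    {:name "weak-tree"
     :type "org.tribuo.classification.dtree.CARTClassificationTrainer"
     :properties {:maxDepth "2"}}
    ;; ASSUMPTION: innerTrainer refers to the component above by its :name
    {:name "adaboost"
     :type "org.tribuo.classification.ensemble.AdaBoostTrainer"
     :properties {:innerTrainer "weak-tree"
                  :numMembers "25"
                  :seed "12345"}}]
   :tribuo-trainer-name "adaboost"})

;; It would be passed to ml/train like the spec at the top of this page:
;; (ml/train ds adaboost-spec)
```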

26.1.0.4 o..t…classification.liblinear.LibLinearClassificationTrainer

javadoc

All configurable options for org.tribuo.classification.liblinear.LibLinearClassificationTrainer:

name description type default
cost Cost penalty for misclassifications. double 1.0
epsilon Epsilon insensitivity in the regression cost function. double 0.1
labelWeights Use Label specific weights. java.util.Map<java.lang.String, java.lang.Float> {}
maxIterations Maximum number of iterations before terminating. int 1000
seed RNG seed. long 12345
terminationCriterion Stop iterating when the loss score decreases by less than this value. double 0.1
trainerType Algorithm to use. org.tribuo.common.liblinear.LibLinearType a LinearClassificationType instance (see 26.2.0.6)

26.1.0.5 o..t…classification.libsvm.LibSVMClassificationTrainer

javadoc

All configurable options for org.tribuo.classification.libsvm.LibSVMClassificationTrainer:

name description type default
cache_size Internal cache size, most of the time should be left at default. double 500.0
coef0 Polynomial coefficient or shift in sigmoid kernel. double 0.0
cost Cost parameter for incorrect predictions. double 1.0
degree Polynomial degree. int 3
eps Tolerance of the termination criterion. double 0.001
gamma Width of the RBF kernel, or scalar on sigmoid kernel. double 0.0
kernelType Type of Kernel. class org.tribuo.common.libsvm.KernelType LINEAR
labelWeights Use Label specific weights. java.util.Map<java.lang.String, java.lang.Float> {}
nu nu value in NU SVM. double 0.5
p Epsilon in EPSILON_SVR. double 0.001
probability Generate probability estimates. boolean false
seed RNG seed. long 12345
shrinking Regularise the weight parameters. boolean true
svmType Type of SVM algorithm. org.tribuo.common.libsvm.SVMType
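
svmType has no listed default, so a spec probably has to supply one. The sketch below declares an SVMClassificationType component (see 26.2.0.7) and refers to it by name. Treat the following as assumptions: that options are linked by component :name, and that "C_SVC" and "RBF" are accepted spellings of the SVMMode and KernelType enum values. The numeric values are illustrative.

```clojure
;; Hypothetical spec: component names and values are illustrative.
(def svm-spec
  {:model-type :scicloj.ml.tribuo/classification
   :tribuo-components
   [;; ASSUMPTION: "C_SVC" is a valid SVMClassificationType$SVMMode value
    {:name "svm-type"
     :type "org.tribuo.classification.libsvm.SVMClassificationType"
     :properties {:type "C_SVC"}}
    {:name "svm"
     :type "org.tribuo.classification.libsvm.LibSVMClassificationTrainer"
     :properties {:svmType "svm-type"   ; ASSUMPTION: reference by component name
                  :kernelType "RBF"     ; listed default is LINEAR
                  :gamma "0.5"
                  :cost "1.0"}}]
   :tribuo-trainer-name "svm"})

;; (ml/train ds svm-spec)
```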

26.1.0.6 o..t…classification.sgd.fm.FMClassificationTrainer

javadoc

All configurable options for org.tribuo.classification.sgd.fm.FMClassificationTrainer:

name description type default
epochs The number of gradient descent epochs. int 5
factorizedDimSize The size of the factorized feature representation. int 0
loggingInterval Log values after this many updates. int -1
minibatchSize Minibatch size in SGD. int 1
objective The classification objective function to use. interface org.tribuo.classification.sgd.LabelObjective LogMulticlass
optimiser The gradient optimiser to use. interface org.tribuo.math.StochasticGradientOptimiser AdaGrad(initialLearningRate=1.0,epsilon=0.1,initialValue=0.0)
seed Seed for the RNG used to shuffle elements. long 12345
shuffle Shuffle the data before each epoch. Only turn off for debugging. boolean true
variance The variance of the initializer. double 0.0

26.1.0.7 o..t…classification.sgd.kernel.KernelSVMTrainer

javadoc

All configurable options for org.tribuo.classification.sgd.kernel.KernelSVMTrainer:

name description type default
epochs Number of SGD epochs. int 5
kernel SVM kernel. interface org.tribuo.math.kernel.Kernel
lambda Step size. double 0.0
loggingInterval Log values after this many updates. int -1
seed Seed for the RNG used to shuffle elements. long 0
shuffle Shuffle the data before each epoch. Only turn off for debugging. boolean true

26.1.0.8 o..t…classification.sgd.linear.LinearSGDTrainer

javadoc

All configurable options for org.tribuo.classification.sgd.linear.LinearSGDTrainer:

name description type default
epochs The number of gradient descent epochs. int 5
loggingInterval Log values after this many updates. int -1
minibatchSize Minibatch size in SGD. int 1
objective The classification objective function to use. interface org.tribuo.classification.sgd.LabelObjective LogMulticlass
optimiser The gradient optimiser to use. interface org.tribuo.math.StochasticGradientOptimiser AdaGrad(initialLearningRate=1.0,epsilon=0.1,initialValue=0.0)
seed Seed for the RNG used to shuffle elements. long 12345
shuffle Shuffle the data before each epoch. Only turn off for debugging. boolean true

26.1.0.9 o..t…classification.sgd.linear.LogisticRegressionTrainer

javadoc

All configurable options for org.tribuo.classification.sgd.linear.LogisticRegressionTrainer:

name description type default
epochs The number of gradient descent epochs. int 5
loggingInterval Log values after this many updates. int 1000
minibatchSize Minibatch size in SGD. int 1
objective The classification objective function to use. interface org.tribuo.classification.sgd.LabelObjective LogMulticlass
optimiser The gradient optimiser to use. interface org.tribuo.math.StochasticGradientOptimiser AdaGrad(initialLearningRate=1.0,epsilon=0.1,initialValue=0.0)
seed Seed for the RNG used to shuffle elements. long 12345
shuffle Shuffle the data before each epoch. Only turn off for debugging. boolean true
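
All options of this trainer have defaults, so a spec only needs to override what should differ. A minimal sketch, with illustrative values and option values written as strings as in the first example on this page:

```clojure
;; Hypothetical spec: the component name "logistic" and the values are illustrative.
(def logistic-spec
  {:model-type :scicloj.ml.tribuo/classification
   :tribuo-components
   [{:name "logistic"
     :type "org.tribuo.classification.sgd.linear.LogisticRegressionTrainer"
     :properties {:epochs "10"          ; listed default is 5
                  :minibatchSize "8"    ; listed default is 1
                  :seed "42"}}]
   :tribuo-trainer-name "logistic"})

;; (ml/train ds logistic-spec)
```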

26.1.0.10 o..t…classification.xgboost.XGBoostClassificationTrainer

javadoc

All configurable options for org.tribuo.classification.xgboost.XGBoostClassificationTrainer:

name description type default
alpha l1 regularisation term on the weights. double 1.0
booster Type of the weak learner. class org.tribuo.common.xgboost.XGBoostTrainer$BoosterType GBTREE
eta The learning rate, shrinks the new tree output to prevent overfitting. double 0.3
evalMetric Evaluation metric to use. The default value is set based on the objective function, so this can usually be left blank. class java.lang.String
featureSubsample Independently subsample the features available for each node of each tree. double 1.0
gamma Minimum loss reduction needed to split a tree node. double 0.0
lambda l2 regularisation term on the weights. double 1.0
maxDepth The maximum depth of any tree. int 6
minChildWeight The minimum weight in each child node before a split is valid. double 1.0
nThread The number of threads to use at training time. int 4
numTrees The number of trees to build. int 0
overrideParameters Override for parameters, if used must contain all the relevant parameters, including the objective java.util.Map<java.lang.String, java.lang.String> {}
seed The RNG seed. long 12345
silent Quiesce all the logging output from the XGBoost C library. Deprecated in favour of 'verbosity'. int 1
subsample Independently subsample the examples for each tree. double 1.0
treeMethod The tree building algorithm to use. class org.tribuo.common.xgboost.XGBoostTrainer$TreeMethod AUTO
verbosity Logging verbosity, 0 is silent, 3 is debug. class org.tribuo.common.xgboost.XGBoostTrainer$LoggingVerbosity SILENT
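
The listed default for numTrees is 0, so a usable spec most likely has to set it. A minimal sketch with illustrative values; the component name "xgb" is hypothetical and the settings are not tuned recommendations:

```clojure
;; Hypothetical spec: name and values are illustrative.
(def xgboost-spec
  {:model-type :scicloj.ml.tribuo/classification
   :tribuo-components
   [{:name "xgb"
     :type "org.tribuo.classification.xgboost.XGBoostClassificationTrainer"
     :properties {:numTrees "50"     ; listed default is 0
                  :maxDepth "4"      ; listed default is 6
                  :eta "0.1"         ; listed default is 0.3
                  :subsample "0.8"}}]
   :tribuo-trainer-name "xgb"})

;; (ml/train ds xgboost-spec)
```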

26.1.0.11 o..t…common.tree.ExtraTreesTrainer

javadoc

All configurable options for org.tribuo.common.tree.ExtraTreesTrainer:

name description type default
combiner The combination function to aggregate each ensemble member's outputs. org.tribuo.ensemble.EnsembleCombiner
innerTrainer The trainer to use for each ensemble member. org.tribuo.Trainer
numMembers The number of ensemble members to train. int 0
seed The seed for the RNG. long 0

26.1.0.12 o..t…common.tree.RandomForestTrainer

javadoc

All configurable options for org.tribuo.common.tree.RandomForestTrainer:

name description type default
combiner The combination function to aggregate each ensemble member's outputs. org.tribuo.ensemble.EnsembleCombiner
innerTrainer The trainer to use for each ensemble member. org.tribuo.Trainer
numMembers The number of ensemble members to train. int 0
seed The seed for the RNG. long 0
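
A sketch of how the options above might be combined into a random forest for classification. It rests on several assumptions that the text above does not confirm: that innerTrainer and combiner are set to the :name of other components, and that org.tribuo.classification.ensemble.VotingCombiner is available as a combiner class. Names and values are illustrative.

```clojure
;; Hypothetical spec: component names and values are illustrative.
(def random-forest-spec
  {:model-type :scicloj.ml.tribuo/classification
   :tribuo-components
   [;; each ensemble member is a CART tree that sees half of the features per split
    {:name "cart"
     :type "org.tribuo.classification.dtree.CARTClassificationTrainer"
     :properties {:maxDepth "8"
                  :fractionFeaturesInSplit "0.5"}}
    ;; ASSUMPTION: this combiner class exists and needs no options
    {:name "voter"
     :type "org.tribuo.classification.ensemble.VotingCombiner"}
    ;; ASSUMPTION: innerTrainer and combiner refer to components by :name
    {:name "forest"
     :type "org.tribuo.common.tree.RandomForestTrainer"
     :properties {:innerTrainer "cart"
                  :combiner "voter"
                  :numMembers "50"
                  :seed "12345"}}]
   :tribuo-trainer-name "forest"})

;; (ml/train ds random-forest-spec)
```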

26.1.0.13 o..t…ensemble.BaggingTrainer

javadoc

All configurable options for org.tribuo.ensemble.BaggingTrainer:

name description type default
combiner The combination function to aggregate each ensemble member's outputs. org.tribuo.ensemble.EnsembleCombiner
innerTrainer The trainer to use for each ensemble member. org.tribuo.Trainer
numMembers The number of ensemble members to train. int 0
seed The seed for the RNG. long 0

26.1.0.14 o..t…hash.HashingTrainer

javadoc

All configurable options for org.tribuo.hash.HashingTrainer:

name description type default
hasher Feature hashing function to use. class org.tribuo.hash.Hasher
innerTrainer Trainer to use. org.tribuo.Trainer

26.1.0.15 o..t…regression.baseline.DummyRegressionTrainer

javadoc

All configurable options for org.tribuo.regression.baseline.DummyRegressionTrainer:

name description type default
constantValue Constant value to use for the constant regressor. double NaN
dummyType Type of dummy regressor. class org.tribuo.regression.baseline.DummyRegressionTrainer$DummyType
quartile Quartile to use. double NaN
seed The seed for the RNG. long 1
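
By analogy with the DummyClassifierTrainer example earlier on this page, a constant regressor could be specified roughly as follows. The model type :scicloj.ml.tribuo/regression and the "CONSTANT" value of dummyType are assumptions, since neither appears in the text above. constantValue is set because its listed default is NaN.

```clojure
;; Hypothetical spec: mirrors the classification example, values are illustrative.
(def dummy-regression-spec
  {:model-type :scicloj.ml.tribuo/regression   ; ASSUMPTION: regression model type
   :tribuo-components
   [{:name "dummy"
     :type "org.tribuo.regression.baseline.DummyRegressionTrainer"
     :properties {:dummyType "CONSTANT"        ; ASSUMPTION: enum value name
                  :constantValue "1.5"}}]
   :tribuo-trainer-name "dummy"})

;; (ml/train ds dummy-regression-spec)
```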

26.1.0.16 o..t…regression.liblinear.LibLinearRegressionTrainer

javadoc

All configurable options for org.tribuo.regression.liblinear.LibLinearRegressionTrainer:

name description type default
cost Cost penalty for misclassifications. double 1.0
epsilon Epsilon insensitivity in the regression cost function. double 0.1
maxIterations Maximum number of iterations before terminating. int 1000
seed RNG seed. long 12345
terminationCriterion Stop iterating when the loss score decreases by less than this value. double 0.1
trainerType Algorithm to use. org.tribuo.common.liblinear.LibLinearType a LinearRegressionType instance

26.1.0.17 o..t…regression.libsvm.LibSVMRegressionTrainer

javadoc

All configurable options for org.tribuo.regression.libsvm.LibSVMRegressionTrainer:

name description type default
cache_size Internal cache size, most of the time should be left at default. double 500.0
coef0 Polynomial coefficient or shift in sigmoid kernel. double 0.0
cost Cost parameter for incorrect predictions. double 1.0
degree Polynomial degree. int 3
eps Tolerance of the termination criterion. double 0.001
gamma Width of the RBF kernel, or scalar on sigmoid kernel. double 0.0
kernelType Type of Kernel. class org.tribuo.common.libsvm.KernelType LINEAR
nu nu value in NU SVM. double 0.5
p Epsilon in EPSILON_SVR. double 0.001
probability Generate probability estimates. boolean false
seed RNG seed. long 12345
shrinking Regularise the weight parameters. boolean true
standardize Standardise the regression outputs before training. boolean false
svmType Type of SVM algorithm. org.tribuo.common.libsvm.SVMType

26.1.0.18 o..t…regression.rtree.CARTJointRegressionTrainer

javadoc

All configurable options for org.tribuo.regression.rtree.CARTJointRegressionTrainer:

name description type default
fractionFeaturesInSplit The fraction of features to consider in each split. 1.0f indicates all features are considered. float 1.0
impurity The regression impurity to use. interface org.tribuo.regression.rtree.impurity.RegressorImpurity MeanSquaredError
maxDepth The maximum depth of the tree. int 2147483647
minChildWeight The minimum weight allowed in a child node. float 5.0
minImpurityDecrease The decrease in impurity needed in order to split the node. float 0.0
normalize Normalize the output of each leaf so it sums to one. boolean false
seed The RNG seed to use when sampling features in a split. long 12345
useRandomSplitPoints Whether to choose split points for features at random. boolean false

26.1.0.19 o..t…regression.rtree.CARTRegressionTrainer

javadoc

All configurable options for org.tribuo.regression.rtree.CARTRegressionTrainer:

name description type default
fractionFeaturesInSplit The fraction of features to consider in each split. 1.0f indicates all features are considered. float 1.0
impurity Regression impurity measure used to determine split quality. interface org.tribuo.regression.rtree.impurity.RegressorImpurity MeanSquaredError
maxDepth The maximum depth of the tree. int 2147483647
minChildWeight The minimum weight allowed in a child node. float 5.0
minImpurityDecrease The decrease in impurity needed in order to split the node. float 0.0
seed The RNG seed to use when sampling features in a split. long 12345
useRandomSplitPoints Whether to choose split points for features at random. boolean false
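
A minimal sketch of a regression tree spec built from the options above. The model type :scicloj.ml.tribuo/regression is an assumption (only the classification model type appears on this page), and the component name and values are illustrative.

```clojure
;; Hypothetical spec: name and values are illustrative.
(def cart-regression-spec
  {:model-type :scicloj.ml.tribuo/regression   ; ASSUMPTION: regression model type
   :tribuo-components
   [{:name "cart"
     :type "org.tribuo.regression.rtree.CARTRegressionTrainer"
     :properties {:maxDepth "6"            ; listed default is unbounded (Integer max)
                  :minChildWeight "2.0"    ; listed default is 5.0
                  :seed "12345"}}]
   :tribuo-trainer-name "cart"})

;; (ml/train ds cart-regression-spec)
```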

26.1.0.20 o..t…regression.sgd.fm.FMRegressionTrainer

javadoc

All configurable options for org.tribuo.regression.sgd.fm.FMRegressionTrainer:

name description type default
epochs The number of gradient descent epochs. int 5
factorizedDimSize The size of the factorized feature representation. int 0
loggingInterval Log values after this many updates. int -1
minibatchSize Minibatch size in SGD. int 1
objective The regression objective to use. interface org.tribuo.regression.sgd.RegressionObjective
optimiser The gradient optimiser to use. interface org.tribuo.math.StochasticGradientOptimiser AdaGrad(initialLearningRate=1.0,epsilon=0.1,initialValue=0.0)
seed Seed for the RNG used to shuffle elements. long 12345
shuffle Shuffle the data before each epoch. Only turn off for debugging. boolean true
standardise Standardise the output variables before fitting the model. boolean false
variance The variance of the initializer. double 0.0

26.1.0.21 o..t…regression.sgd.linear.LinearSGDTrainer

javadoc

All configurable options for org.tribuo.regression.sgd.linear.LinearSGDTrainer:

name description type default
epochs The number of gradient descent epochs. int 5
loggingInterval Log values after this many updates. int -1
minibatchSize Minibatch size in SGD. int 1
objective The regression objective to use. interface org.tribuo.regression.sgd.RegressionObjective
optimiser The gradient optimiser to use. interface org.tribuo.math.StochasticGradientOptimiser AdaGrad(initialLearningRate=1.0,epsilon=0.1,initialValue=0.0)
seed Seed for the RNG used to shuffle elements. long 12345
shuffle Shuffle the data before each epoch. Only turn off for debugging. boolean true

26.1.0.22 o..t…regression.xgboost.XGBoostRegressionTrainer

javadoc

All configurable options for org.tribuo.regression.xgboost.XGBoostRegressionTrainer:

name description type default
alpha l1 regularisation term on the weights. double 1.0
booster Type of the weak learner. class org.tribuo.common.xgboost.XGBoostTrainer$BoosterType GBTREE
eta The learning rate, shrinks the new tree output to prevent overfitting. double 0.3
featureSubsample Independently subsample the features available for each node of each tree. double 1.0
gamma Minimum loss reduction needed to split a tree node. double 0.0
lambda l2 regularisation term on the weights. double 1.0
maxDepth The maximum depth of any tree. int 6
minChildWeight The minimum weight in each child node before a split is valid. double 1.0
nThread The number of threads to use at training time. int 4
numTrees The number of trees to build. int 0
overrideParameters Override for parameters, if used must contain all the relevant parameters, including the objective java.util.Map<java.lang.String, java.lang.String> {}
rType The type of regression. class org.tribuo.regression.xgboost.XGBoostRegressionTrainer$RegressionType LINEAR
seed The RNG seed. long 12345
silent Quiesce all the logging output from the XGBoost C library. Deprecated in favour of 'verbosity'. int 1
subsample Independently subsample the examples for each tree. double 1.0
treeMethod The tree building algorithm to use. class org.tribuo.common.xgboost.XGBoostTrainer$TreeMethod AUTO
verbosity Logging verbosity, 0 is silent, 3 is debug. class org.tribuo.common.xgboost.XGBoostTrainer$LoggingVerbosity SILENT

26.1.0.23 o..t…transform.TransformTrainer

javadoc

All configurable options for org.tribuo.transform.TransformTrainer:

name description type default
densify Densify all the features before applying transformations. boolean false
includeImplicitZeroFeatures Include the implicit zeros in the transformation statistics collection boolean false
innerTrainer Trainer to use. org.tribuo.Trainer
transformations Transformations to apply. class org.tribuo.transform.TransformationMap

26.2 Tribuo component reference

26.2.0.1 o..t…classification.example.CheckerboardDataSource

javadoc

All configurable options for org.tribuo.classification.example.CheckerboardDataSource:

name description type default
max The maximum feature value. double 10.0
min The minimum feature value. double 0.0
numSamples Number of samples to generate. int 0
numSquares The number of squares on each side. int 5
seed RNG seed. long 0

26.2.0.2 o..t…classification.example.ConcentricCirclesDataSource

javadoc

All configurable options for org.tribuo.classification.example.ConcentricCirclesDataSource:

name description type default
classProportion The proportion of the circle radius that forms class one. double 0.5
numSamples Number of samples to generate. int 0
radius The radius of the outer circle. double 2.0
seed RNG seed. long 0

26.2.0.3 o..t…classification.example.GaussianLabelDataSource

javadoc

All configurable options for org.tribuo.classification.example.GaussianLabelDataSource:

name description type default
firstCovarianceMatrix 4 element covariance matrix of the first Gaussian. double[]
firstMean 2d mean of the first Gaussian. double[]
numSamples Number of samples to generate. int 0
secondCovarianceMatrix 4 element covariance matrix of the second Gaussian. double[]
secondMean 2d mean of the second Gaussian. double[]
seed RNG seed. long 0

26.2.0.4 o..t…classification.example.InterlockingCrescentsDataSource

javadoc

All configurable options for org.tribuo.classification.example.InterlockingCrescentsDataSource:

name description type default
numSamples Number of samples to generate. int 0
seed RNG seed. long 0

26.2.0.5 o..t…classification.example.NoisyInterlockingCrescentsDataSource

javadoc

All configurable options for org.tribuo.classification.example.NoisyInterlockingCrescentsDataSource:

name description type default
numSamples Number of samples to generate. int 0
seed RNG seed. long 0
variance Variance of the Gaussian noise double 0.1

26.2.0.6 o..t…classification.liblinear.LinearClassificationType

javadoc

All configurable options for org.tribuo.classification.liblinear.LinearClassificationType:

name description type default
type The type of classification model class org.tribuo.classification.liblinear.LinearClassificationType$LinearType

26.2.0.7 o..t…classification.libsvm.SVMClassificationType

javadoc

All configurable options for org.tribuo.classification.libsvm.SVMClassificationType:

name description type default
type The SVM classification algorithm to use. class org.tribuo.classification.libsvm.SVMClassificationType$SVMMode

26.2.0.8 o..t…classification.sequence.viterbi.DefaultFeatureExtractor

javadoc

All configurable options for org.tribuo.classification.sequence.viterbi.DefaultFeatureExtractor:

name description type default
leastRecentOutcome Position of the least recent output to include. int 3
mostRecentOutcome Position of the most recent outcome to include. int 1
use4gram Use 4-grams of the labels as features. boolean false
useBigram Use bigrams of the labels as features. boolean true
useTrigram Use trigrams of the labels as features. boolean true

26.2.0.9 o..t…classification.sgd.objectives.Hinge

javadoc

All configurable options for org.tribuo.classification.sgd.objectives.Hinge:

name description type default
margin The classification margin. double 1.0

26.2.0.10 o..t…data.columnar.RowProcessor

javadoc

All configurable options for org.tribuo.data.columnar.RowProcessor:

name description type default
featureProcessors A set of feature processors to apply after extraction. java.util.Set<org.tribuo.data.columnar.FeatureProcessor> []
fieldProcessorList The list of field processors to use. java.util.List<org.tribuo.data.columnar.FieldProcessor>
metadataExtractors Extractors for the example metadata. java.util.List<org.tribuo.data.columnar.FieldExtractor<?>> []
regexMappingProcessors A map from a regex to field processors to apply to fields matching the regex. java.util.Map<java.lang.String, org.tribuo.data.columnar.FieldProcessor> {}
replaceNewlinesWithSpaces Replace newlines with spaces in values before passing them to field processors. boolean true
responseProcessor Processor which extracts the response. org.tribuo.data.columnar.ResponseProcessor
weightExtractor Extractor for the example weight. org.tribuo.data.columnar.FieldExtractor<java.lang.Float>

26.2.0.11 o..t…data.columnar.extractors.DateExtractor

javadoc

All configurable options for org.tribuo.data.columnar.extractors.DateExtractor:

name description type default
dateFormat The expected date format. class java.lang.String
fieldName The field name to read. class java.lang.String
localeCountry Sets the locale country. class java.lang.String
localeLanguage Sets the locale language. class java.lang.String
metadataName The metadata key to emit, defaults to field name if unpopulated class java.lang.String

26.2.0.12 o..t…data.columnar.extractors.DoubleExtractor

javadoc

All configurable options for org.tribuo.data.columnar.extractors.DoubleExtractor:

name description type default
fieldName The field name to read. class java.lang.String
metadataName The metadata key to emit, defaults to field name if unpopulated class java.lang.String

26.2.0.13 o..t…data.columnar.extractors.FloatExtractor

javadoc

All configurable options for org.tribuo.data.columnar.extractors.FloatExtractor:

name description type default
fieldName The field name to read. class java.lang.String
metadataName The metadata key to emit, defaults to field name if unpopulated class java.lang.String

26.2.0.14 o..t…data.columnar.extractors.IdentityExtractor

javadoc

All configurable options for org.tribuo.data.columnar.extractors.IdentityExtractor:

name description type default
fieldName The field name to read. class java.lang.String
metadataName The metadata key to emit, defaults to field name if unpopulated class java.lang.String

26.2.0.15 o..t…data.columnar.extractors.IndexExtractor

javadoc

All configurable options for org.tribuo.data.columnar.extractors.IndexExtractor:

name description type default
metadataName The metadata key to emit, defaults to Example.NAME class java.lang.String name

26.2.0.16 o..t…data.columnar.extractors.IntExtractor

javadoc

All configurable options for org.tribuo.data.columnar.extractors.IntExtractor:

name description type default
fieldName The field name to read. class java.lang.String
metadataName The metadata key to emit, defaults to field name if unpopulated class java.lang.String

26.2.0.17 o..t…data.columnar.extractors.OffsetDateTimeExtractor

javadoc

All configurable options for org.tribuo.data.columnar.extractors.OffsetDateTimeExtractor:

name description type default
dateTimeFormat The expected date format. class java.lang.String
fieldName The field name to read. class java.lang.String
localeCountry The locale country. class java.lang.String
localeLanguage The locale language. class java.lang.String
metadataName The metadata key to emit, defaults to field name if unpopulated class java.lang.String

26.2.0.18 o..t…data.columnar.processors.feature.UniqueProcessor

javadoc

All configurable options for org.tribuo.data.columnar.processors.feature.UniqueProcessor:

name description type default
reductionType The operation to perform. class org.tribuo.data.columnar.processors.feature.UniqueProcessor$UniqueType

26.2.0.19 o..t…data.columnar.processors.field.DateFieldProcessor

javadoc

All configurable options for org.tribuo.data.columnar.processors.field.DateFieldProcessor:

name description type default
dateFormat The expected date format. class java.lang.String
featureTypes The date features to extract. java.util.EnumSet<org.tribuo.data.columnar.processors.field.DateFieldProcessor$DateFeatureType>
fieldName The field name to read. class java.lang.String
localeCountry Sets the locale country. class java.lang.String US
localeLanguage Sets the locale language. class java.lang.String en

26.2.0.20 o..t…data.columnar.processors.field.DoubleFieldProcessor

javadoc

All configurable options for org.tribuo.data.columnar.processors.field.DoubleFieldProcessor:

name description type default
fieldName The field name to read. class java.lang.String
onlyFieldName Emit a feature using just the field name. boolean false
throwOnInvalid Throw NumberFormatException if the value failed to parse. boolean false

26.2.0.21 o..t…data.columnar.processors.field.IdentityProcessor

javadoc

All configurable options for org.tribuo.data.columnar.processors.field.IdentityProcessor:

name description type default
fieldName The field name to read. class java.lang.String

26.2.0.22 o..t…data.columnar.processors.field.RegexFieldProcessor

javadoc

All configurable options for org.tribuo.data.columnar.processors.field.RegexFieldProcessor:

name description type default
fieldName The field name to read. class java.lang.String
modes Matching mode. java.util.EnumSet<org.tribuo.data.columnar.processors.field.RegexFieldProcessor$Mode>
regexString Regex to apply to the field. class java.lang.String

26.2.0.23 o..t…data.columnar.processors.field.TextFieldProcessor

javadoc

All configurable options for org.tribuo.data.columnar.processors.field.TextFieldProcessor:

name description type default
fieldName The field name to read. class java.lang.String
pipeline Text processing pipeline to use. interface org.tribuo.data.text.TextPipeline

26.2.0.24 o..t…data.columnar.processors.response.BinaryResponseProcessor

javadoc

All configurable options for org.tribuo.data.columnar.processors.response.BinaryResponseProcessor:

name description type default
displayField Whether to display field names as part of the generated output, defaults to false boolean false
fieldName The field name to read, you should use only one of this or fieldNames class java.lang.String
fieldNames A list of field names to read, you should use only one of this or fieldName. java.util.List<java.lang.String>
negativeName The negative response to emit. class java.lang.String 0
outputFactory Output factory to use to create the response. org.tribuo.OutputFactory
positiveName The positive response to emit. class java.lang.String 1
positiveResponse The string which triggers a positive response. class java.lang.String
positiveResponses A list of strings that trigger positive responses; it should be the same length as fieldNames or empty java.util.List<java.lang.String>

26.2.0.25 o..t…data.columnar.processors.response.EmptyResponseProcessor

javadoc

All configurable options for org.tribuo.data.columnar.processors.response.EmptyResponseProcessor:

name description type default
outputFactory Output factory to type the columnar loader. org.tribuo.OutputFactory

26.2.0.26 o..t…data.columnar.processors.response.FieldResponseProcessor

javadoc

All configurable options for org.tribuo.data.columnar.processors.response.FieldResponseProcessor:

name description type default
defaultValue Default value to return if one isn't found. class java.lang.String
defaultValues A list of default values to return if one isn't found, one for each field java.util.List<java.lang.String>
displayField Whether to display field names as part of the generated label, defaults to false boolean false
fieldName The field name to read. class java.lang.String
fieldNames A list of field names to read, you should use only one of this or fieldName. java.util.List<java.lang.String>
outputFactory The output factory to use. org.tribuo.OutputFactory
uppercase Uppercase the value before converting to output. boolean true

26.2.0.27 o..t…data.columnar.processors.response.Quartile

javadoc

All configurable options for org.tribuo.data.columnar.processors.response.Quartile:

name description type default
lowerMedian The lower quartile value. double 0.0
median The median value. double 0.0
upperMedian The upper quartile value. double 0.0

26.2.0.28 o..t…data.columnar.processors.response.QuartileResponseProcessor

javadoc

All configurable options for org.tribuo.data.columnar.processors.response.QuartileResponseProcessor:

name description type default
fieldName The field name to read. class java.lang.String
fieldNames A list of field names to read, you should use only one of this or fieldName. java.util.List<java.lang.String>
name The string to emit. class java.lang.String
outputFactory The output factory to use. org.tribuo.OutputFactory
quartile The quartile to use. class org.tribuo.data.columnar.processors.response.Quartile
quartiles A list of quartiles to use, should have the same length as fieldNames java.util.List<org.tribuo.data.columnar.processors.response.Quartile>

26.2.0.29 o..t…data.csv.CSVDataSource

javadoc

All configurable options for org.tribuo.data.csv.CSVDataSource:

name description type default
dataPath Path to the CSV file. interface java.nio.file.Path
headers The CSV headers. Should only be used if the csv file does not already contain headers. java.util.List<java.lang.String> []
outputFactory The output factory to use. org.tribuo.OutputFactory
outputRequired Is an output required from each row? boolean true
quote The CSV quote character. char "
rowProcessor The row processor to use. org.tribuo.data.columnar.RowProcessor
separator The CSV separator character. char ,

26.2.0.30 o..t…data.csv.CSVSaver

javadoc

All configurable options for org.tribuo.data.csv.CSVSaver:

name description type default
quote The quote character. char "
separator The column separator. char ,

26.2.0.31 o..t…data.sql.SQLDBConfig

javadoc

All configurable options for org.tribuo.data.sql.SQLDBConfig:

name description type default
connectionString Connection string, including host, port and db. class java.lang.String
db Database name. class java.lang.String
fetchSize Size of batches to fetch from DB for queries. int 1000
host Hostname of the database machine. class java.lang.String
password Database password. class java.lang.String
port Port number. class java.lang.String
propMap Properties to pass to java.sql.DriverManager. Username and password will be removed and populated to their fields. If specified both in the map and in the fields, the fields will be used. java.util.Map<java.lang.String, java.lang.String> {}
username Database username. class java.lang.String

26.2.0.32 o..t…data.sql.SQLDataSource

javadoc

All configurable options for org.tribuo.data.sql.SQLDataSource:

name description type default
outputFactory The output factory to use. org.tribuo.OutputFactory
outputRequired Is an output required from each row? boolean true
rowProcessor The row processor to use. org.tribuo.data.columnar.RowProcessor
sqlConfig Database configuration. class org.tribuo.data.sql.SQLDBConfig
sqlString SQL query to run. class java.lang.String

26.2.0.33 o..t…data.text.DirectoryFileSource

javadoc

All configurable options for org.tribuo.data.text.DirectoryFileSource:

name description type default
dataDir The top-level directory containing the data set. interface java.nio.file.Path .
extractor The feature extractor that converts text into examples. org.tribuo.data.text.TextFeatureExtractor
outputFactory The output factory to use. org.tribuo.OutputFactory
preprocessors The preprocessors to apply to the input documents. java.util.List<org.tribuo.data.text.DocumentPreprocessor> []

26.2.0.34 o..t…data.text.impl.BasicPipeline

javadoc

All configurable options for org.tribuo.data.text.impl.BasicPipeline:

name description type default
ngram n in the n-gram to emit. int 2
tokenizer Tokenizer to use. interface org.tribuo.util.tokens.Tokenizer

26.2.0.35 o..t…data.text.impl.CasingPreprocessor

javadoc

All configurable options for org.tribuo.data.text.impl.CasingPreprocessor:

name description type default
op Which casing operation to apply. class org.tribuo.data.text.impl.CasingPreprocessor$CasingOperation LOWERCASE

26.2.0.36 o..t…data.text.impl.FeatureHasher

javadoc

All configurable options for org.tribuo.data.text.impl.FeatureHasher:

name description type default
dimension Dimension to map the hash into. int 0
hashSeed Seed used in the hash function. int 38495
preserveValue Preserve input feature value. boolean false
valueHashSeed Seed used for value hash function. int 77777
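
The `dimension` and `hashSeed` options describe the usual hashing trick: each feature name is mapped to one of `dimension` buckets. A minimal sketch follows. It uses Clojure's built-in `hash` rather than the hash function Tribuo applies, and `hash-feature` is a hypothetical helper, so the bucket numbers will not match Tribuo's.

```clojure
(defn hash-feature
  "Maps a feature name to a bucket index in [0, dimension).
  `dimension` must be positive in this sketch."
  [dimension seed feature-name]
  ;; Hashing the [seed name] pair mixes the seed into the result;
  ;; `mod` keeps the index non-negative for a positive dimension.
  (mod (hash [seed feature-name]) dimension))

(def buckets
  (mapv #(hash-feature 16 38495 %) ["token=a" "token=b" "token=c"]))
```

The listed default for `dimension` is 0, which cannot serve as a bucket count here, so a positive value presumably has to be supplied.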

26.2.0.37 o..t…data.text.impl.NgramProcessor

javadoc

All configurable options for org.tribuo.data.text.impl.NgramProcessor:

name description type default
n n in the n-gram to emit. int 2
tokenizer Tokenizer to use. interface org.tribuo.util.tokens.Tokenizer
value Value to emit for each n-gram. double 1.0
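
As a rough picture of what `n` and `value` control, the sketch below builds every run of `n` consecutive tokens and pairs it with a fixed value. It assumes the text is already tokenized; the helper name `ngrams` is hypothetical, and the feature names Tribuo generates for each n-gram are not reproduced.

```clojure
(defn ngrams
  "Returns every run of `n` consecutive tokens, each paired with `value`."
  [n value tokens]
  (map (fn [gram] [(vec gram) value])
       (partition n 1 tokens)))

(ngrams 2 1.0 ["the" "quick" "brown" "fox"])
;; => ([["the" "quick"] 1.0] [["quick" "brown"] 1.0] [["brown" "fox"] 1.0])
```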

26.2.0.38 o..t…data.text.impl.RegexPreprocessor

javadoc

All configurable options for org.tribuo.data.text.impl.RegexPreprocessor:

name description type default
regexStrings A list of regular expressions in string format used to match the input. java.util.List<java.lang.String>
replacements A list of replacement strings which are used to replace the matches. java.util.List<java.lang.String>
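
A minimal sketch of how two such lists could be applied to a document, written in plain Clojure. It assumes the lists are paired by position and applied one after another; `apply-regexes` is a hypothetical helper, not part of Tribuo.

```clojure
(require '[clojure.string :as str])

(defn apply-regexes
  "Applies each regex/replacement pair to `text`, in list order."
  [regex-strings replacements text]
  (reduce (fn [acc [re replacement]]
            (str/replace acc (re-pattern re) replacement))
          text
          (map vector regex-strings replacements)))

;; First mask digit runs, then collapse repeated whitespace.
(apply-regexes ["\\d+" "\\s+"] ["N" " "] "room  42  floor 7")
;; => "room N floor N"
```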

26.2.0.39 o..t…data.text.impl.SimpleStringDataSource

javadoc

All configurable options for org.tribuo.data.text.impl.SimpleStringDataSource:

name description type default
extractor The feature extractor that generates Features from text. org.tribuo.data.text.TextFeatureExtractor
outputFactory The factory that converts a String into an Output instance. org.tribuo.OutputFactory
path The path to read the data from. interface java.nio.file.Path
preprocessors The document preprocessors to run on each document in the data source. java.util.List<org.tribuo.data.text.DocumentPreprocessor> []
rawLines The input data lines. java.util.List<java.lang.String>

26.2.0.40 o..t…data.text.impl.SimpleTextDataSource

javadoc

All configurable options for org.tribuo.data.text.impl.SimpleTextDataSource:

name description type default
extractor The feature extractor that generates Features from text. org.tribuo.data.text.TextFeatureExtractor
outputFactory The factory that converts a String into an Output instance. org.tribuo.OutputFactory
path The path to read the data from. interface java.nio.file.Path
preprocessors The document preprocessors to run on each document in the data source. java.util.List<org.tribuo.data.text.DocumentPreprocessor> []

26.2.0.41 o..t…data.text.impl.TextFeatureExtractorImpl

javadoc

All configurable options for org.tribuo.data.text.impl.TextFeatureExtractorImpl:

name description type default
pipeline The text processing pipeline. interface org.tribuo.data.text.TextPipeline

26.2.0.42 o..t…data.text.impl.TokenPipeline

javadoc

All configurable options for org.tribuo.data.text.impl.TokenPipeline:

name description type default
hashDim Dimension to map the hash into. int -1
hashPreserveValue Should feature hashing preserve the value? boolean true
ngram n in the n-gram to emit. int 2
termCounting Use term counting, otherwise emit binary features. boolean false
tokenizer Tokenizer to use. interface org.tribuo.util.tokens.Tokenizer

26.2.0.43 o..t…data.text.impl.UniqueAggregator

javadoc

All configurable options for org.tribuo.data.text.impl.UniqueAggregator:

name description type default
value Value to emit, if unset emits the last value observed for that token. double NaN

26.2.0.44 o..t…datasource.IDXDataSource

javadoc

All configurable options for org.tribuo.datasource.IDXDataSource:

name description type default
featuresPath Path to load the features from. interface java.nio.file.Path
outputFactory The output factory to use. org.tribuo.OutputFactory
outputPath Path to load the outputs from. interface java.nio.file.Path

26.2.0.45 o..t…datasource.LibSVMDataSource

javadoc

All configurable options for org.tribuo.datasource.LibSVMDataSource:

name description type default
maxFeatureID Sets the maximum feature id to load from the file. int -2147483648
outputFactory The output factory to use. org.tribuo.OutputFactory
path Path to load the data from. Either this or url must be set. interface java.nio.file.Path
url URL to load the data from. Either this or path must be set. class java.net.URL
zeroIndexed Set to true if the features are zero indexed. boolean false

26.2.0.46 o..t…hash.HashCodeHasher

javadoc

All configurable options for org.tribuo.hash.HashCodeHasher:

name description type default
salt Salt used in the hash. class java.lang.String

26.2.0.47 o..t…hash.MessageDigestHasher

javadoc

All configurable options for org.tribuo.hash.MessageDigestHasher:

name description type default
hashType MessageDigest hashing function. class java.lang.String
saltStr Salt used in the hash. class java.lang.String

26.2.0.48 o..t…hash.ModHashCodeHasher

javadoc

All configurable options for org.tribuo.hash.ModHashCodeHasher:

name description type default
dimension Range of the hashing function. int 100
salt Salt used in the hash. class java.lang.String

26.2.0.49 o..t…math.kernel.Polynomial

javadoc

All configurable options for org.tribuo.math.kernel.Polynomial:

name description type default
degree Degree of the polynomial. double 0.0
gamma Coefficient to multiply the dot product by. double 0.0
intercept Scalar to add to the dot product. double 0.0
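
Read together, the three descriptions suggest the familiar polynomial kernel, `(gamma * <u, v> + intercept) ^ degree`. The sketch below assumes that form; the helper names are hypothetical.

```clojure
(defn dot
  "Dot product of two equal-length numeric vectors."
  [u v]
  (reduce + (map * u v)))

(defn polynomial-kernel
  [{:keys [gamma intercept degree]} u v]
  (Math/pow (+ (* gamma (dot u v)) intercept) degree))

(polynomial-kernel {:gamma 1.0 :intercept 1.0 :degree 2.0}
                   [1.0 2.0] [3.0 4.0])
;; => 144.0   ; (1.0 * 11.0 + 1.0) ^ 2
```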

26.2.0.50 o..t…math.kernel.RBF

javadoc

All configurable options for org.tribuo.math.kernel.RBF:

name description type default
gamma Kernel output = exp(-gamma*|u-v|^2). double 0.0
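
The formula in the description can be evaluated directly. This is a small sketch in plain Clojure, with hypothetical helper names, that computes `exp(-gamma*|u-v|^2)` for two dense vectors.

```clojure
(defn squared-distance
  "Squared Euclidean distance between two equal-length vectors."
  [u v]
  (reduce + (map (fn [a b] (let [d (- a b)] (* d d))) u v)))

(defn rbf-kernel
  [gamma u v]
  (Math/exp (- (* gamma (squared-distance u v)))))

(rbf-kernel 0.5 [1.0 2.0] [1.0 2.0]) ;; identical points => 1.0
(rbf-kernel 0.5 [0.0 0.0] [1.0 1.0]) ;; exp(-0.5 * 2.0)
```

With the listed default of 0.0 the formula evaluates to 1 for every pair of points, so `gamma` presumably has to be set explicitly.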

26.2.0.51 o..t…math.kernel.Sigmoid

javadoc

All configurable options for org.tribuo.math.kernel.Sigmoid:

name description type default
gamma Coefficient to multiply the dot product by. double 0.0
intercept Scalar intercept to add to the dot product. double 0.0

26.2.0.52 o..t…math.neighbour.bruteforce.NeighboursBruteForceFactory

javadoc

All configurable options for org.tribuo.math.neighbour.bruteforce.NeighboursBruteForceFactory:

name description type default
distance The distance function to use. interface org.tribuo.math.distance.Distance L2Distance()
numThreads The number of threads to use for training. int 1

26.2.0.53 o..t…math.neighbour.kdtree.KDTreeFactory

javadoc

All configurable options for org.tribuo.math.neighbour.kdtree.KDTreeFactory:

name description type default
distance The distance function to use. interface org.tribuo.math.distance.Distance L2Distance()
numThreads The number of threads to use for training. int 1

26.2.0.54 o..t…math.optimisers.AdaDelta

javadoc

All configurable options for org.tribuo.math.optimisers.AdaDelta:

name description type default
epsilon Epsilon for numerical stability. double 1.0E-6
rho Momentum value. double 0.95

26.2.0.55 o..t…math.optimisers.AdaGrad

javadoc

All configurable options for org.tribuo.math.optimisers.AdaGrad:

name description type default
epsilon Epsilon for numerical stability around zero. double 1.0E-6
initialLearningRate Initial learning rate used to scale the gradients. double 0.0
initialValue Initial value for the gradient accumulator. double 0.0

26.2.0.56 o..t…math.optimisers.AdaGradRDA

javadoc

All configurable options for org.tribuo.math.optimisers.AdaGradRDA:

name description type default
epsilon Epsilon for numerical stability around zero. double 1.0E-6
initialLearningRate Initial learning rate used to scale the gradients. double 0.0
l1 l1 regularization penalty. double 0.0
l2 l2 regularization penalty. double 0.0
numExamples Number of examples to scale the l1 and l2 penalties by. int 1

26.2.0.57 o..t…math.optimisers.Adam

javadoc

All configurable options for org.tribuo.math.optimisers.Adam:

name description type default
betaOne The beta one parameter. double 0.9
betaTwo The beta two parameter. double 0.999
epsilon Epsilon for numerical stability. double 1.0E-6
initialLearningRate Learning rate to scale the gradients by. double 0.001
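
An optimiser like this is not a trainer by itself, so it would be declared as an extra entry in `:tribuo-components`. The following is a sketch only, under several assumptions: the component names (`"adam"`, `"log-loss"`, `"sgd-trainer"`) are arbitrary, and it assumes a trainer property can refer to another component by its `:name`. The spec is plain data and has not been checked against a training run.

```clojure
(def sgd-spec
  {:model-type :scicloj.ml.tribuo/classification
   :tribuo-components
   [{:name "adam"
     :type "org.tribuo.math.optimisers.Adam"
     ;; Values repeat the defaults listed in the table above.
     :properties {:initialLearningRate "0.001"
                  :betaOne "0.9"
                  :betaTwo "0.999"
                  :epsilon "1.0E-6"}}
    {:name "log-loss"
     :type "org.tribuo.classification.sgd.objectives.LogMulticlass"}
    {:name "sgd-trainer"
     :type "org.tribuo.classification.sgd.linear.LinearSGDTrainer"
     ;; Assumption: these strings name the components declared above.
     :properties {:optimiser "adam"
                  :objective "log-loss"
                  :epochs "10"}}]
   :tribuo-trainer-name "sgd-trainer"})
```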

26.2.0.58 o..t…math.optimisers.LinearDecaySGD

javadoc

All configurable options for org.tribuo.math.optimisers.LinearDecaySGD:

name description type default
initialLearningRate Initial learning rate. double 0.0
rho Momentum scaling factor. double 0.0
useMomentum Momentum type to use. class org.tribuo.math.optimisers.SGD$Momentum

26.2.0.59 o..t…math.optimisers.ParameterAveraging

javadoc

All configurable options for org.tribuo.math.optimisers.ParameterAveraging:

name description type default
optimiser Inner optimiser to average parameters across. interface org.tribuo.math.StochasticGradientOptimiser

26.2.0.60 o..t…math.optimisers.Pegasos

javadoc

All configurable options for org.tribuo.math.optimisers.Pegasos:

name description type default
baseRate Base learning rate. double 0.1
lambda Step size shrinkage. double 0.01

26.2.0.61 o..t…math.optimisers.RMSProp

javadoc

All configurable options for org.tribuo.math.optimisers.RMSProp:

name description type default
decay Decay factor for the momentum. double 0.0
epsilon Epsilon for numerical stability. double 1.0E-8
initialLearningRate Learning rate to scale the gradients by. double 0.0
rho Momentum parameter. double 0.9

26.2.0.62 o..t…math.optimisers.SimpleSGD

javadoc

All configurable options for org.tribuo.math.optimisers.SimpleSGD:

name description type default
initialLearningRate Initial learning rate. double 0.0
rho Momentum scaling factor. double 0.0
useMomentum Momentum type to use. class org.tribuo.math.optimisers.SGD$Momentum

26.2.0.63 o..t…math.optimisers.SqrtDecaySGD

javadoc

All configurable options for org.tribuo.math.optimisers.SqrtDecaySGD:

name description type default
initialLearningRate Initial learning rate. double 0.0
rho Momentum scaling factor. double 0.0
useMomentum Momentum type to use. class org.tribuo.math.optimisers.SGD$Momentum
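
The names LinearDecaySGD and SqrtDecaySGD suggest that the step size shrinks as training proceeds. The sketch below assumes the simplest such schedules, `initial / t` and `initial / sqrt(t)` for iteration `t >= 1`. The exact formulas Tribuo uses are not given in this reference, so treat these as illustrations of the idea only.

```clojure
(defn linear-decay-rate
  "Assumed schedule: the rate falls off as 1/t."
  [initial-rate t]
  (/ initial-rate t))

(defn sqrt-decay-rate
  "Assumed schedule: the rate falls off as 1/sqrt(t)."
  [initial-rate t]
  (/ initial-rate (Math/sqrt t)))

(mapv #(linear-decay-rate 1.0 %) [1 2 4])  ;; => [1.0 0.5 0.25]
(mapv #(sqrt-decay-rate 1.0 %) [1 4 16])   ;; => [1.0 0.5 0.25]
```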

26.2.0.64 o..t…regression.RegressionFactory

javadoc

All configurable options for org.tribuo.regression.RegressionFactory:

name description type default
splitChar The character to split the dimensions on. char ,

26.2.0.65 o..t…regression.example.GaussianDataSource

javadoc

All configurable options for org.tribuo.regression.example.GaussianDataSource:

name description type default
intercept The y-intercept of the line. float 0.0
numSamples The number of samples to draw. int 0
seed The RNG seed. long 12345
slope The slope of the line. float 0.0
variance The variance of the gaussian. float 1.0
xMax The maximum feature value. float 0.0
xMin The minimum feature value. float 0.0
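
To make the options concrete, here is a small sketch of the kind of data they describe: a single feature `x` drawn from `[xMin, xMax]` and a response scattered around the line `slope * x + intercept`. It assumes `x` is drawn uniformly and that the noise is Gaussian with the given variance. It uses `java.util.Random` directly, so it will not reproduce the exact samples Tribuo generates for the same seed.

```clojure
(defn gaussian-samples
  "Generates `numSamples` maps of {:x ... :y ...} around a noisy line."
  [{:keys [numSamples slope intercept variance xMin xMax seed]}]
  (let [rng     (java.util.Random. (long seed))
        std-dev (Math/sqrt variance)]
    (vec
     (repeatedly numSamples
                 (fn []
                   (let [x     (+ xMin (* (.nextDouble rng) (- xMax xMin)))
                         noise (* std-dev (.nextGaussian rng))]
                     {:x x :y (+ (* slope x) intercept noise)}))))))

;; Hypothetical settings, chosen only for the example.
(def gaussian-config
  {:numSamples 5 :slope 2.0 :intercept 1.0 :variance 1.0
   :xMin 0.0 :xMax 10.0 :seed 12345})

(def samples (gaussian-samples gaussian-config))
```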

26.2.0.66 o..t…regression.example.NonlinearGaussianDataSource

javadoc

All configurable options for org.tribuo.regression.example.NonlinearGaussianDataSource:

name description type default
intercept The y-intercept of the line. float 0.0
numSamples The number of samples to draw. int 0
seed The RNG seed. long 12345
variance The variance of the noise gaussian. float 1.0
weights The feature weights. Must be a 4-element array. float[] (array; default values not shown)
xOneMax The maximum value of x_1. float 2.0
xOneMin The minimum value of x_1. float -2.0
xZeroMax The maximum value of x_0. float 2.0
xZeroMin The minimum value of x_0. float -2.0

26.2.0.67 o..t…regression.liblinear.LinearRegressionType

javadoc

All configurable options for org.tribuo.regression.liblinear.LinearRegressionType:

name description type default
type The type of regression algorithm. class org.tribuo.regression.liblinear.LinearRegressionType$LinearType

26.2.0.68 o..t…regression.libsvm.SVMRegressionType

javadoc

All configurable options for org.tribuo.regression.libsvm.SVMRegressionType:

name description type default
type The SVM regression algorithm to use. class org.tribuo.regression.libsvm.SVMRegressionType$SVMMode

26.2.0.69 o..t…regression.sgd.objectives.Huber

javadoc

All configurable options for org.tribuo.regression.sgd.objectives.Huber:

name description type default
cost Cost beyond which the loss function is linear. double 5.0
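
The `cost` option marks where the loss switches from quadratic to linear. A minimal sketch of the textbook Huber loss follows; Tribuo's implementation may scale the terms differently, and `huber-loss` is a hypothetical helper.

```clojure
(defn huber-loss
  "Quadratic for |error| <= cost, linear beyond it."
  [cost error]
  (let [a (Math/abs (double error))]
    (if (<= a cost)
      (* 0.5 a a)
      (* cost (- a (* 0.5 cost))))))

(huber-loss 5.0 2.0)   ;; => 2.0   (quadratic region)
(huber-loss 5.0 10.0)  ;; => 37.5  (linear region)
```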

26.2.0.70 o..t…transform.TransformationMap

javadoc

All configurable options for org.tribuo.transform.TransformationMap:

name description type default
featureTransformationList Feature specific transformations. Accepts regexes for feature names. java.util.Map<java.lang.String, org.tribuo.transform.TransformationMap$TransformationList> {}
globalTransformations Global transformations to apply after the feature specific transforms. java.util.List<org.tribuo.transform.Transformation>

26.2.0.71 o..t…transform.TransformationMap$TransformationList

javadoc

All configurable options for org.tribuo.transform.TransformationMap$TransformationList:

name description type default
list A list of transformations to apply. java.util.List<org.tribuo.transform.Transformation>

26.2.0.72 o..t…transform.transformations.BinningTransformation

javadoc

All configurable options for org.tribuo.transform.transformations.BinningTransformation:

name description type default
numBins Number of bins. int 0
type Binning algorithm to use. class org.tribuo.transform.transformations.BinningTransformation$BinningType

26.2.0.73 o..t…transform.transformations.LinearScalingTransformation

javadoc

All configurable options for org.tribuo.transform.transformations.LinearScalingTransformation:

name description type default
targetMax Maximum value after transformation. double 1.0
targetMin Minimum value after transformation. double 0.0
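
As an illustration of what `targetMin` and `targetMax` mean, the sketch below rescales a feature so that its observed minimum and maximum land on the two targets. It is the standard min-max formula, written as a hypothetical helper, and it does not handle a constant feature (where the observed range is zero).

```clojure
(defn linear-scale
  "Rescales `xs` so that min(xs) maps to targetMin and max(xs) to targetMax."
  [{:keys [targetMin targetMax]} xs]
  (let [lo   (double (apply min xs))
        hi   (double (apply max xs))
        span (- hi lo)]
    (mapv (fn [x]
            (+ targetMin
               (* (- targetMax targetMin) (/ (- x lo) span))))
          xs)))

(linear-scale {:targetMin 0.0 :targetMax 1.0} [2 4 6])
;; => [0.0 0.5 1.0]
```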

26.2.0.74 o..t…transform.transformations.MeanStdDevTransformation

javadoc

All configurable options for org.tribuo.transform.transformations.MeanStdDevTransformation:

name description type default
targetMean Mean value after transformation. double 0.0
targetStdDev Standard deviation after transformation. double 1.0

26.2.0.75 o..t…transform.transformations.SimpleTransform

javadoc

All configurable options for org.tribuo.transform.transformations.SimpleTransform:

name description type default
op Type of the simple transformation. class org.tribuo.transform.transformations.SimpleTransform$Operation
operand Operand (if required). double NaN
secondOperand Second operand (if required). double NaN

26.2.0.76 o..t…util.tokens.impl.BreakIteratorTokenizer

javadoc

All configurable options for org.tribuo.util.tokens.impl.BreakIteratorTokenizer:

name description type default
localeStr The locale language tag string. class java.lang.String

26.2.0.77 o..t…util.tokens.impl.SplitCharactersTokenizer

javadoc

All configurable options for org.tribuo.util.tokens.impl.SplitCharactersTokenizer:

name description type default
splitCharacters The characters to split on. char[] (array; default values not shown)
splitXDigitsCharacters The characters to split on unless we're in a number. char[] (array; default values not shown)

26.2.0.78 o..t…util.tokens.impl.SplitPatternTokenizer

javadoc

All configurable options for org.tribuo.util.tokens.impl.SplitPatternTokenizer:

name description type default
splitPatternRegex The regex to split with. class java.lang.String [\.,]?\s+
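
The default pattern can be tried out with plain Clojure string splitting. This only demonstrates the regex itself, not the tokenizer class, which may differ in details such as how it reports token positions.

```clojure
(require '[clojure.string :as str])

;; The default value listed above: an optional '.' or ',' followed by whitespace.
(def default-split-pattern #"[\.,]?\s+")

(str/split "Hello, world. Tribuo splits text" default-split-pattern)
;; => ["Hello" "world" "Tribuo" "splits" "text"]
```

Punctuation that is not followed by whitespace, such as a full stop at the very end of the input, stays attached to its token.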

26.2.0.79 o..t…util.tokens.impl.wordpiece.Wordpiece

javadoc

All configurable options for org.tribuo.util.tokens.impl.wordpiece.Wordpiece:

name description type default
maxInputCharactersPerWord The maximum number of characters per word to consider. This helps avoid extra work on pathological cases. int 100
unknownToken The value to use for 'UNKNOWN' tokens. Defaults to '[UNK]', which is a common default in BERT-based solutions. class java.lang.String [UNK]
vocabPath Path to a vocabulary data file. class java.lang.String
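
For orientation, the sketch below shows the greedy longest-match-first splitting commonly used for wordpiece vocabularies in BERT-style models. The `##` continuation prefix, the tiny vocabulary and the helper name `wordpiece` are assumptions for the example; Tribuo reads its vocabulary from `vocabPath` and may differ in details.

```clojure
(defn wordpiece
  "Splits `word` into the longest vocabulary pieces, left to right.
  Returns [unknown-token] if the word is too long or cannot be covered."
  [vocab unknown-token max-chars word]
  (if (> (count word) max-chars)
    [unknown-token]
    (loop [start 0, pieces []]
      (if (= start (count word))
        pieces
        ;; Try the longest candidate first, shrinking from the right.
        (let [match (first
                     (for [end (range (count word) start -1)
                           :let [piece (cond->> (subs word start end)
                                         (pos? start) (str "##"))]
                           :when (contains? vocab piece)]
                       [piece end]))]
          (if match
            (recur (second match) (conj pieces (first match)))
            [unknown-token]))))))

;; Hypothetical vocabulary.
(def vocab #{"un" "##aff" "##able"})

(wordpiece vocab "[UNK]" 100 "unaffable") ;; => ["un" "##aff" "##able"]
(wordpiece vocab "[UNK]" 100 "xyz")       ;; => ["[UNK]"]
```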

26.2.0.80 o..t…util.tokens.impl.wordpiece.WordpieceBasicTokenizer

javadoc

All configurable options for org.tribuo.util.tokens.impl.wordpiece.WordpieceBasicTokenizer:

name description type default
tokenizeChineseChars Split on Chinese tokens? boolean true

26.2.0.81 o..t…util.tokens.impl.wordpiece.WordpieceTokenizer

javadoc

All configurable options for org.tribuo.util.tokens.impl.wordpiece.WordpieceTokenizer:

name description type default
basicTokenizer Performs some tokenization work on the input text before the wordpiece algorithm is applied to each resulting token. interface org.tribuo.util.tokens.Tokenizer org.tribuo.util.tokens.impl.wordpiece.WordpieceBasicTokenizer instance
neverSplitTokens A set of 'token' strings that should never be split, regardless of whether they have, e.g., punctuation in the middle. No entries should have whitespace in them. java.util.Set<java.lang.String> []
stripAccents Determines whether or not to strip accents/diacritics from the input text. boolean true
toLowerCase Determines whether or not to lowercase the input text. boolean true
whitespaceTokenizer Performs whitespace tokenization before the 'basic' tokenizer is applied (see basicTokenizer). interface org.tribuo.util.tokens.Tokenizer org.tribuo.util.tokens.impl.WhitespaceTokenizer instance
wordpiece An instance of Wordpiece which applies the 'wordpiece' algorithm. class org.tribuo.util.tokens.impl.wordpiece.Wordpiece

26.2.0.82 o..t…util.tokens.universal.UniversalTokenizer

javadoc

All configurable options for org.tribuo.util.tokens.universal.UniversalTokenizer:

name description type default
sendPunct Send punctuation through as tokens. boolean false
source: notebooks/noj_book/tribuo_reference.clj