25  Tribuo reference

As discussed in the Machine Learning chapter, this book contains reference chapters for machine learning models that can be registered in metamorph.ml.

This specific chapter focuses on the models of the Tribuo Java library, which is wrapped by scicloj.ml.tribuo.

The following is a reference for all Tribuo trainers. Any of them can be used in the model specification passed to ml/train: give the trainer's class name as the :type of a Tribuo component, and name that component in :tribuo-trainer-name.

(comment
  (ml/train
   ds
   {:model-type :scicloj.ml.tribuo/classification
    :tribuo-components [{:name "random-forest"
                         :type "org.tribuo.classification.dtree.CARTClassificationTrainer"
                         :properties {:maxDepth "8"
                                      :useRandomSplitPoints "false"
                                      :fractionFeaturesInSplit "0.5"}}]
    :tribuo-trainer-name "random-forest"}))
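The snippets in this chapter use the aliases ml, tc, ds-mod and kind without showing where they come from. A namespace declaration along the following lines would supply them. The namespace name tribuo-reference-example is hypothetical, and the required namespaces are assumptions based on the usual Scicloj libraries, not something this chapter states.

```clojure
;; A sketch of the requires assumed by the examples in this chapter.
(ns tribuo-reference-example
  (:require
   [scicloj.metamorph.ml :as ml]             ; ml/train, ml/predict
   [scicloj.ml.tribuo]                       ; assumed to register the :scicloj.ml.tribuo/... model types on load
   [scicloj.kindly.v4.kind :as kind]         ; kind/md, kind/table
   [tablecloth.api :as tc]                   ; tc/dataset
   [tech.v3.dataset.modelling :as ds-mod]))  ; ds-mod/set-inference-target
```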

There is also a reference to all non-trainer components of Tribuo. These can potentially be used in Tribuo model specs as well.

25.1 Tribuo trainer reference

25.1.0.1 o..t…classification.baseline.DummyClassifierTrainer

javadoc

The DummyClassifier predicts a value using a ‘dummy’ algorithm.

It can, for example, always predict a :CONSTANT value.

(def df
 (-> (tc/dataset {:a [1 2], :target [:x :x]})
  (ds-mod/set-inference-target :target)))
(kind/table df)

| a | target |
|---|---|
| 1 | x |
| 2 | x |

(def model
 (ml/train
   df
   {:model-type :scicloj.ml.tribuo/classification,
    :tribuo-components
    [{:name "dummy",
      :type
      "org.tribuo.classification.baseline.DummyClassifierTrainer",
      :properties {:dummyType :CONSTANT, :constantLabel "c"}}],
    :tribuo-trainer-name "dummy"}))

In this case the constant label is ‘c’, so the model predicts :c for every row:

(ml/predict df model)

_unnamed [2 1]:

:target
:c
:c

All configurable options for org.tribuo.classification.baseline.DummyClassifierTrainer:

| name | description | type | default |
|---|---|---|---|
| constantLabel | Label to use for the constant classifier. | class java.lang.String | |
| dummyType | Type of dummy classifier. | class org.tribuo.classification.baseline.DummyClassifierTrainer$DummyType | |
| seed | Seed for the RNG. | long | 1 |
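Besides :CONSTANT, the dummyType option accepts the other constants of Tribuo's DummyType enum. The sketch below assumes the MOST_FREQUENT constant, which belongs to the Tribuo library but is not listed in this chapter, and reuses the spec shape from the example above. The name most-frequent-spec is hypothetical.

```clojure
;; Hypothetical spec: a baseline that is expected to always predict
;; the most frequent label seen during training.
(def most-frequent-spec
  {:model-type :scicloj.ml.tribuo/classification
   :tribuo-components
   [{:name "dummy"
     :type "org.tribuo.classification.baseline.DummyClassifierTrainer"
     ;; MOST_FREQUENT is assumed from Tribuo's DummyType enum;
     ;; no :constantLabel is given because it only applies to :CONSTANT.
     :properties {:dummyType :MOST_FREQUENT}}]
   :tribuo-trainer-name "dummy"})

(comment
  ;; df is the dataset defined in the example above.
  (ml/train df most-frequent-spec))
```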

25.1.0.2 o..t…classification.dtree.CARTClassificationTrainer

javadoc

All configurable options for org.tribuo.classification.dtree.CARTClassificationTrainer:

| name | description | type | default |
|---|---|---|---|
| fractionFeaturesInSplit | The fraction of features to consider in each split. 1.0f indicates all features are considered. | float | 1.0 |
| impurity | The impurity measure used to determine split quality. | interface org.tribuo.classification.dtree.impurity.LabelImpurity | GiniIndex |
| maxDepth | The maximum depth of the tree. | int | 2147483647 |
| minChildWeight | The minimum weight allowed in a child node. | float | 5.0 |
| minImpurityDecrease | The decrease in impurity needed in order to split the node. | float | 0.0 |
| seed | The RNG seed to use when sampling features in a split. | long | 12345 |
| useRandomSplitPoints | Whether to choose split points for features at random. | boolean | false |

25.1.0.3 o..t…classification.ensemble.AdaBoostTrainer

javadoc

All configurable options for org.tribuo.classification.ensemble.AdaBoostTrainer:

| name | description | type | default |
|---|---|---|---|
| innerTrainer | The trainer to use to build each weak learner. | org.tribuo.Trainer | |
| numMembers | The number of ensemble members to train. | int | 0 |
| seed | The seed for the RNG. | long | 0 |

25.1.0.4 o..t…classification.liblinear.LibLinearClassificationTrainer

javadoc

All configurable options for org.tribuo.classification.liblinear.LibLinearClassificationTrainer:

| name | description | type | default |
|---|---|---|---|
| cost | Cost penalty for misclassifications. | double | 1.0 |
| epsilon | Epsilon insensitivity in the regression cost function. | double | 0.1 |
| labelWeights | Use Label specific weights. | java.util.Map | {} |
| maxIterations | Maximum number of iterations before terminating. | int | 1000 |
| seed | RNG seed. | long | 12345 |
| terminationCriterion | Stop iterating when the loss score decreases by less than this value. | double | 0.1 |
| trainerType | Algorithm to use. | org.tribuo.common.liblinear.LibLinearType | org.tribuo.classification.liblinear.LinearClassificationType@1355a877 |

25.1.0.5 o..t…classification.libsvm.LibSVMClassificationTrainer

javadoc

All configurable options for org.tribuo.classification.libsvm.LibSVMClassificationTrainer:

| name | description | type | default |
|---|---|---|---|
| cache_size | Internal cache size, most of the time should be left at default. | double | 500.0 |
| coef0 | Polynomial coefficient or shift in sigmoid kernel. | double | 0.0 |
| cost | Cost parameter for incorrect predictions. | double | 1.0 |
| degree | Polynomial degree. | int | 3 |
| eps | Tolerance of the termination criterion. | double | 0.001 |
| gamma | Width of the RBF kernel, or scalar on sigmoid kernel. | double | 0.0 |
| kernelType | Type of Kernel. | class org.tribuo.common.libsvm.KernelType | LINEAR |
| labelWeights | Use Label specific weights. | java.util.Map | {} |
| nu | nu value in NU SVM. | double | 0.5 |
| p | Epsilon in EPSILON_SVR. | double | 0.001 |
| probability | Generate probability estimates. | boolean | false |
| seed | RNG seed. | long | 12345 |
| shrinking | Regularise the weight parameters. | boolean | true |
| svmType | Type of SVM algorithm. | org.tribuo.common.libsvm.SVMType | |

25.1.0.6 o..t…classification.sgd.fm.FMClassificationTrainer

javadoc

All configurable options for org.tribuo.classification.sgd.fm.FMClassificationTrainer:

| name | description | type | default |
|---|---|---|---|
| epochs | The number of gradient descent epochs. | int | 5 |
| factorizedDimSize | The size of the factorized feature representation. | int | 0 |
| loggingInterval | Log values after this many updates. | int | -1 |
| minibatchSize | Minibatch size in SGD. | int | 1 |
| objective | The classification objective function to use. | interface org.tribuo.classification.sgd.LabelObjective | LogMulticlass |
| optimiser | The gradient optimiser to use. | interface org.tribuo.math.StochasticGradientOptimiser | AdaGrad(initialLearningRate=1.0,epsilon=0.1,initialValue=0.0) |
| seed | Seed for the RNG used to shuffle elements. | long | 12345 |
| shuffle | Shuffle the data before each epoch. Only turn off for debugging. | boolean | true |
| variance | The variance of the initializer. | double | 0.0 |

25.1.0.7 o..t…classification.sgd.kernel.KernelSVMTrainer

javadoc

All configurable options for org.tribuo.classification.sgd.kernel.KernelSVMTrainer:

| name | description | type | default |
|---|---|---|---|
| epochs | Number of SGD epochs. | int | 5 |
| kernel | SVM kernel. | interface org.tribuo.math.kernel.Kernel | |
| lambda | Step size. | double | 0.0 |
| loggingInterval | Log values after this many updates. | int | -1 |
| seed | Seed for the RNG used to shuffle elements. | long | 0 |
| shuffle | Shuffle the data before each epoch. Only turn off for debugging. | boolean | true |

25.1.0.8 o..t…classification.sgd.linear.LinearSGDTrainer

javadoc

All configurable options for org.tribuo.classification.sgd.linear.LinearSGDTrainer:

| name | description | type | default |
|---|---|---|---|
| epochs | The number of gradient descent epochs. | int | 5 |
| loggingInterval | Log values after this many updates. | int | -1 |
| minibatchSize | Minibatch size in SGD. | int | 1 |
| objective | The classification objective function to use. | interface org.tribuo.classification.sgd.LabelObjective | LogMulticlass |
| optimiser | The gradient optimiser to use. | interface org.tribuo.math.StochasticGradientOptimiser | AdaGrad(initialLearningRate=1.0,epsilon=0.1,initialValue=0.0) |
| seed | Seed for the RNG used to shuffle elements. | long | 12345 |
| shuffle | Shuffle the data before each epoch. Only turn off for debugging. | boolean | true |

25.1.0.9 o..t…classification.sgd.linear.LogisticRegressionTrainer

javadoc

All configurable options for org.tribuo.classification.sgd.linear.LogisticRegressionTrainer:

| name | description | type | default |
|---|---|---|---|
| epochs | The number of gradient descent epochs. | int | 5 |
| loggingInterval | Log values after this many updates. | int | 1000 |
| minibatchSize | Minibatch size in SGD. | int | 1 |
| objective | The classification objective function to use. | interface org.tribuo.classification.sgd.LabelObjective | LogMulticlass |
| optimiser | The gradient optimiser to use. | interface org.tribuo.math.StochasticGradientOptimiser | AdaGrad(initialLearningRate=1.0,epsilon=0.1,initialValue=0.0) |
| seed | Seed for the RNG used to shuffle elements. | long | 12345 |
| shuffle | Shuffle the data before each epoch. Only turn off for debugging. | boolean | true |

25.1.0.10 o..t…classification.xgboost.XGBoostClassificationTrainer

javadoc

All configurable options for org.tribuo.classification.xgboost.XGBoostClassificationTrainer:

| name | description | type | default |
|---|---|---|---|
| alpha | l1 regularisation term on the weights. | double | 1.0 |
| booster | Type of the weak learner. | class org.tribuo.common.xgboost.XGBoostTrainer$BoosterType | GBTREE |
| eta | The learning rate, shrinks the new tree output to prevent overfitting. | double | 0.3 |
| evalMetric | Evaluation metric to use. The default value is set based on the objective function, so this can be usually left blank. | class java.lang.String | |
| featureSubsample | Independently subsample the features available for each node of each tree. | double | 1.0 |
| gamma | Minimum loss reduction needed to split a tree node. | double | 0.0 |
| lambda | l2 regularisation term on the weights. | double | 1.0 |
| maxDepth | The maximum depth of any tree. | int | 6 |
| minChildWeight | The minimum weight in each child node before a split is valid. | double | 1.0 |
| nThread | The number of threads to use at training time. | int | 4 |
| numTrees | The number of trees to build. | int | 0 |
| overrideParameters | Override for parameters, if used must contain all the relevant parameters, including the objective | java.util.Map | {} |
| seed | The RNG seed. | long | 12345 |
| silent | Quiesce all the logging output from the XGBoost C library. Deprecated in favour of 'verbosity'. | int | 1 |
| subsample | Independently subsample the examples for each tree. | double | 1.0 |
| treeMethod | The tree building algorithm to use. | class org.tribuo.common.xgboost.XGBoostTrainer$TreeMethod | AUTO |
| verbosity | Logging verbosity, 0 is silent, 3 is debug. | class org.tribuo.common.xgboost.XGBoostTrainer$LoggingVerbosity | SILENT |
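As a sketch of how these options could be set, the spec below follows the shape of the example at the start of this chapter. The option values are illustrative only, and the name xgboost-spec is hypothetical. Because the table lists a default of 0 for numTrees, the sketch sets it explicitly, on the assumption that a positive number of trees is needed for a useful model.

```clojure
;; Hypothetical spec for an XGBoost classifier; values are illustrative.
(def xgboost-spec
  {:model-type :scicloj.ml.tribuo/classification
   :tribuo-components
   [{:name "xgboost"
     :type "org.tribuo.classification.xgboost.XGBoostClassificationTrainer"
     ;; Property values are written as strings, as in the chapter's first example.
     :properties {:numTrees "50"   ; default is 0, so set it explicitly
                  :maxDepth "4"
                  :eta "0.1"}}]
   :tribuo-trainer-name "xgboost"})

(comment
  ;; ds stands for any dataset with an inference target set.
  (ml/train ds xgboost-spec))
```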

25.1.0.11 o..t…common.tree.ExtraTreesTrainer

javadoc

All configurable options for org.tribuo.common.tree.ExtraTreesTrainer:

| name | description | type | default |
|---|---|---|---|
| combiner | The combination function to aggregate each ensemble member's outputs. | org.tribuo.ensemble.EnsembleCombiner | |
| innerTrainer | The trainer to use for each ensemble member. | org.tribuo.Trainer | |
| numMembers | The number of ensemble members to train. | int | 0 |
| seed | The seed for the RNG. | long | 0 |

25.1.0.12 o..t…common.tree.RandomForestTrainer

javadoc

All configurable options for org.tribuo.common.tree.RandomForestTrainer:

| name | description | type | default |
|---|---|---|---|
| combiner | The combination function to aggregate each ensemble member's outputs. | org.tribuo.ensemble.EnsembleCombiner | |
| innerTrainer | The trainer to use for each ensemble member. | org.tribuo.Trainer | |
| numMembers | The number of ensemble members to train. | int | 0 |
| seed | The seed for the RNG. | long | 0 |
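The innerTrainer and combiner options take other components rather than plain values. The sketch below assumes that such options are filled in with the :name of another entry in :tribuo-components, following the configuration convention Tribuo itself uses; this chapter does not confirm that behaviour for the wrapper. It also assumes the Tribuo class org.tribuo.classification.ensemble.VotingCombiner, which is not listed in this chapter. The component names and random-forest-spec are hypothetical.

```clojure
;; Hypothetical spec: a random forest built from CART trees.
(def random-forest-spec
  {:model-type :scicloj.ml.tribuo/classification
   :tribuo-components
   [{:name "cart"
     :type "org.tribuo.classification.dtree.CARTClassificationTrainer"
     ;; A fraction below 1.0 gives each split a random subset of features,
     ;; as is usual for random forests.
     :properties {:maxDepth "8"
                  :fractionFeaturesInSplit "0.5"}}
    {:name "voting"
     ;; Assumed class name; it has no options set here.
     :type "org.tribuo.classification.ensemble.VotingCombiner"
     :properties {}}
    {:name "random-forest"
     :type "org.tribuo.common.tree.RandomForestTrainer"
     ;; Assumption: component-typed options refer to other components by name.
     :properties {:innerTrainer "cart"
                  :combiner "voting"
                  :numMembers "50"}}]
   :tribuo-trainer-name "random-forest"})
```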

25.1.0.13 o..t…ensemble.BaggingTrainer

javadoc

All configurable options for org.tribuo.ensemble.BaggingTrainer:

| name | description | type | default |
|---|---|---|---|
| combiner | The combination function to aggregate each ensemble member's outputs. | org.tribuo.ensemble.EnsembleCombiner | |
| innerTrainer | The trainer to use for each ensemble member. | org.tribuo.Trainer | |
| numMembers | The number of ensemble members to train. | int | 0 |
| seed | The seed for the RNG. | long | 0 |

25.1.0.14 o..t…hash.HashingTrainer

javadoc

All configurable options for org.tribuo.hash.HashingTrainer:

| name | description | type | default |
|---|---|---|---|
| hasher | Feature hashing function to use. | class org.tribuo.hash.Hasher | |
| innerTrainer | Trainer to use. | org.tribuo.Trainer | |

25.1.0.15 o..t…regression.baseline.DummyRegressionTrainer

javadoc

All configurable options for org.tribuo.regression.baseline.DummyRegressionTrainer:

| name | description | type | default |
|---|---|---|---|
| constantValue | Constant value to use for the constant regressor. | double | NaN |
| dummyType | Type of dummy regressor. | class org.tribuo.regression.baseline.DummyRegressionTrainer$DummyType | |
| quartile | Quartile to use. | double | NaN |
| seed | The seed for the RNG. | long | 1 |

25.1.0.16 o..t…regression.liblinear.LibLinearRegressionTrainer

javadoc

All configurable options for org.tribuo.regression.liblinear.LibLinearRegressionTrainer:

| name | description | type | default |
|---|---|---|---|
| cost | Cost penalty for misclassifications. | double | 1.0 |
| epsilon | Epsilon insensitivity in the regression cost function. | double | 0.1 |
| maxIterations | Maximum number of iterations before terminating. | int | 1000 |
| seed | RNG seed. | long | 12345 |
| terminationCriterion | Stop iterating when the loss score decreases by less than this value. | double | 0.1 |
| trainerType | Algorithm to use. | org.tribuo.common.liblinear.LibLinearType | org.tribuo.regression.liblinear.LinearRegressionType@50db498d |

25.1.0.17 o..t…regression.libsvm.LibSVMRegressionTrainer

javadoc

All configurable options for org.tribuo.regression.libsvm.LibSVMRegressionTrainer:

| name | description | type | default |
|---|---|---|---|
| cache_size | Internal cache size, most of the time should be left at default. | double | 500.0 |
| coef0 | Polynomial coefficient or shift in sigmoid kernel. | double | 0.0 |
| cost | Cost parameter for incorrect predictions. | double | 1.0 |
| degree | Polynomial degree. | int | 3 |
| eps | Tolerance of the termination criterion. | double | 0.001 |
| gamma | Width of the RBF kernel, or scalar on sigmoid kernel. | double | 0.0 |
| kernelType | Type of Kernel. | class org.tribuo.common.libsvm.KernelType | LINEAR |
| nu | nu value in NU SVM. | double | 0.5 |
| p | Epsilon in EPSILON_SVR. | double | 0.001 |
| probability | Generate probability estimates. | boolean | false |
| seed | RNG seed. | long | 12345 |
| shrinking | Regularise the weight parameters. | boolean | true |
| standardize | Standardise the regression outputs before training. | boolean | false |
| svmType | Type of SVM algorithm. | org.tribuo.common.libsvm.SVMType | |

25.1.0.18 o..t…regression.rtree.CARTJointRegressionTrainer

javadoc

All configurable options for org.tribuo.regression.rtree.CARTJointRegressionTrainer:

| name | description | type | default |
|---|---|---|---|
| fractionFeaturesInSplit | The fraction of features to consider in each split. 1.0f indicates all features are considered. | float | 1.0 |
| impurity | The regression impurity to use. | interface org.tribuo.regression.rtree.impurity.RegressorImpurity | MeanSquaredError |
| maxDepth | The maximum depth of the tree. | int | 2147483647 |
| minChildWeight | The minimum weight allowed in a child node. | float | 5.0 |
| minImpurityDecrease | The decrease in impurity needed in order to split the node. | float | 0.0 |
| normalize | Normalize the output of each leaf so it sums to one. | boolean | false |
| seed | The RNG seed to use when sampling features in a split. | long | 12345 |
| useRandomSplitPoints | Whether to choose split points for features at random. | boolean | false |

25.1.0.19 o..t…regression.rtree.CARTRegressionTrainer

javadoc

All configurable options for org.tribuo.regression.rtree.CARTRegressionTrainer:

| name | description | type | default |
|---|---|---|---|
| fractionFeaturesInSplit | The fraction of features to consider in each split. 1.0f indicates all features are considered. | float | 1.0 |
| impurity | Regression impurity measure used to determine split quality. | interface org.tribuo.regression.rtree.impurity.RegressorImpurity | MeanSquaredError |
| maxDepth | The maximum depth of the tree. | int | 2147483647 |
| minChildWeight | The minimum weight allowed in a child node. | float | 5.0 |
| minImpurityDecrease | The decrease in impurity needed in order to split the node. | float | 0.0 |
| seed | The RNG seed to use when sampling features in a split. | long | 12345 |
| useRandomSplitPoints | Whether to choose split points for features at random. | boolean | false |
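Regression trainers are used in the same way as the classification trainers shown earlier. The sketch below assumes that the wrapper registers a :scicloj.ml.tribuo/regression model type alongside :scicloj.ml.tribuo/classification; that keyword does not appear in this chapter. The option values are illustrative, and cart-regression-spec is a hypothetical name.

```clojure
;; Hypothetical spec for a CART regression tree; values are illustrative.
(def cart-regression-spec
  {;; Assumed model type for regression; not shown elsewhere in this chapter.
   :model-type :scicloj.ml.tribuo/regression
   :tribuo-components
   [{:name "cart"
     :type "org.tribuo.regression.rtree.CARTRegressionTrainer"
     :properties {:maxDepth "6"          ; limit tree depth (default is effectively unlimited)
                  :minChildWeight "2.0"}}]
   :tribuo-trainer-name "cart"})

(comment
  ;; ds stands for a dataset whose inference target is numeric.
  (ml/train ds cart-regression-spec))
```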

25.1.0.20 o..t…regression.sgd.fm.FMRegressionTrainer

javadoc

All configurable options for org.tribuo.regression.sgd.fm.FMRegressionTrainer:

| name | description | type | default |
|---|---|---|---|
| epochs | The number of gradient descent epochs. | int | 5 |
| factorizedDimSize | The size of the factorized feature representation. | int | 0 |
| loggingInterval | Log values after this many updates. | int | -1 |
| minibatchSize | Minibatch size in SGD. | int | 1 |
| objective | The regression objective to use. | interface org.tribuo.regression.sgd.RegressionObjective | |
| optimiser | The gradient optimiser to use. | interface org.tribuo.math.StochasticGradientOptimiser | AdaGrad(initialLearningRate=1.0,epsilon=0.1,initialValue=0.0) |
| seed | Seed for the RNG used to shuffle elements. | long | 12345 |
| shuffle | Shuffle the data before each epoch. Only turn off for debugging. | boolean | true |
| standardise | Standardise the output variables before fitting the model. | boolean | false |
| variance | The variance of the initializer. | double | 0.0 |

25.1.0.21 o..t…regression.sgd.linear.LinearSGDTrainer

javadoc

All configurable options for org.tribuo.regression.sgd.linear.LinearSGDTrainer:

| name | description | type | default |
|---|---|---|---|
| epochs | The number of gradient descent epochs. | int | 5 |
| loggingInterval | Log values after this many updates. | int | -1 |
| minibatchSize | Minibatch size in SGD. | int | 1 |
| objective | The regression objective to use. | interface org.tribuo.regression.sgd.RegressionObjective | |
| optimiser | The gradient optimiser to use. | interface org.tribuo.math.StochasticGradientOptimiser | AdaGrad(initialLearningRate=1.0,epsilon=0.1,initialValue=0.0) |
| seed | Seed for the RNG used to shuffle elements. | long | 12345 |
| shuffle | Shuffle the data before each epoch. Only turn off for debugging. | boolean | true |

25.1.0.22 o..t…regression.xgboost.XGBoostRegressionTrainer

javadoc

All configurable options for org.tribuo.regression.xgboost.XGBoostRegressionTrainer:

| name | description | type | default |
|---|---|---|---|
| alpha | l1 regularisation term on the weights. | double | 1.0 |
| booster | Type of the weak learner. | class org.tribuo.common.xgboost.XGBoostTrainer$BoosterType | GBTREE |
| eta | The learning rate, shrinks the new tree output to prevent overfitting. | double | 0.3 |
| featureSubsample | Independently subsample the features available for each node of each tree. | double | 1.0 |
| gamma | Minimum loss reduction needed to split a tree node. | double | 0.0 |
| lambda | l2 regularisation term on the weights. | double | 1.0 |
| maxDepth | The maximum depth of any tree. | int | 6 |
| minChildWeight | The minimum weight in each child node before a split is valid. | double | 1.0 |
| nThread | The number of threads to use at training time. | int | 4 |
| numTrees | The number of trees to build. | int | 0 |
| overrideParameters | Override for parameters, if used must contain all the relevant parameters, including the objective | java.util.Map | {} |
| rType | The type of regression. | class org.tribuo.regression.xgboost.XGBoostRegressionTrainer$RegressionType | LINEAR |
| seed | The RNG seed. | long | 12345 |
| silent | Quiesce all the logging output from the XGBoost C library. Deprecated in favour of 'verbosity'. | int | 1 |
| subsample | Independently subsample the examples for each tree. | double | 1.0 |
| treeMethod | The tree building algorithm to use. | class org.tribuo.common.xgboost.XGBoostTrainer$TreeMethod | AUTO |
| verbosity | Logging verbosity, 0 is silent, 3 is debug. | class org.tribuo.common.xgboost.XGBoostTrainer$LoggingVerbosity | SILENT |

25.1.0.23 o..t…transform.TransformTrainer

javadoc

All configurable options for org.tribuo.transform.TransformTrainer:

| name | description | type | default |
|---|---|---|---|
| densify | Densify all the features before applying transformations. | boolean | false |
| includeImplicitZeroFeatures | Include the implicit zeros in the transformation statistics collection | boolean | false |
| innerTrainer | Trainer to use. | org.tribuo.Trainer | |
| transformations | Transformations to apply. | class org.tribuo.transform.TransformationMap | |

25.2 Tribuo component reference

25.2.0.1 o..t…classification.example.CheckerboardDataSource

javadoc

All configurable options for org.tribuo.classification.example.CheckerboardDataSource:

| name | description | type | default |
|---|---|---|---|
| max | The maximum feature value. | double | 10.0 |
| min | The minimum feature value. | double | 0.0 |
| numSamples | Number of samples to generate. | int | 0 |
| numSquares | The number of squares on each side. | int | 5 |
| seed | RNG seed. | long | 0 |

25.2.0.2 o..t…classification.example.ConcentricCirclesDataSource

javadoc

All configurable options for org.tribuo.classification.example.ConcentricCirclesDataSource:

| name | description | type | default |
|---|---|---|---|
| classProportion | The proportion of the circle radius that forms class one. | double | 0.5 |
| numSamples | Number of samples to generate. | int | 0 |
| radius | The radius of the outer circle. | double | 2.0 |
| seed | RNG seed. | long | 0 |

25.2.0.3 o..t…classification.example.GaussianLabelDataSource

javadoc

All configurable options for org.tribuo.classification.example.GaussianLabelDataSource:

| name | description | type | default |
|---|---|---|---|
| firstCovarianceMatrix | 4 element covariance matrix of the first Gaussian. | class [D (double[]) | |
| firstMean | 2d mean of the first Gaussian. | class [D (double[]) | |
| numSamples | Number of samples to generate. | int | 0 |
| secondCovarianceMatrix | 4 element covariance matrix of the second Gaussian. | class [D (double[]) | |
| secondMean | 2d mean of the second Gaussian. | class [D (double[]) | |
| seed | RNG seed. | long | 0 |

25.2.0.4 o..t…classification.example.InterlockingCrescentsDataSource

javadoc

All configurable options for org.tribuo.classification.example.InterlockingCrescentsDataSource:

| name | description | type | default |
|---|---|---|---|
| numSamples | Number of samples to generate. | int | 0 |
| seed | RNG seed. | long | 0 |

25.2.0.5 o..t…classification.example.NoisyInterlockingCrescentsDataSource

javadoc

All configurable options for org.tribuo.classification.example.NoisyInterlockingCrescentsDataSource:

| name | description | type | default |
|---|---|---|---|
| numSamples | Number of samples to generate. | int | 0 |
| seed | RNG seed. | long | 0 |
| variance | Variance of the Gaussian noise | double | 0.1 |

25.2.0.6 o..t…classification.liblinear.LinearClassificationType

javadoc

All configurable options for org.tribuo.classification.liblinear.LinearClassificationType:

| name | description | type | default |
|---|---|---|---|
| type | The type of classification model | class org.tribuo.classification.liblinear.LinearClassificationType$LinearType | |

25.2.0.7 o..t…classification.libsvm.SVMClassificationType

javadoc

All configurable options for org.tribuo.classification.libsvm.SVMClassificationType:

| name | description | type | default |
|---|---|---|---|
| type | The SVM classification algorithm to use. | class org.tribuo.classification.libsvm.SVMClassificationType$SVMMode | |

25.2.0.8 o..t…classification.sequence.viterbi.DefaultFeatureExtractor

javadoc

All configurable options for org.tribuo.classification.sequence.viterbi.DefaultFeatureExtractor:

| name | description | type | default |
|---|---|---|---|
| leastRecentOutcome | Position of the least recent output to include. | int | 3 |
| mostRecentOutcome | Position of the most recent outcome to include. | int | 1 |
| use4gram | Use 4-grams of the labels as features. | boolean | false |
| useBigram | Use bigrams of the labels as features. | boolean | true |
| useTrigram | Use trigrams of the labels as features. | boolean | true |

25.2.0.9 o..t…classification.sgd.objectives.Hinge

javadoc

All configurable options for org.tribuo.classification.sgd.objectives.Hinge:

| name | description | type | default |
|---|---|---|---|
| margin | The classification margin. | double | 1.0 |
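A non-trainer component such as Hinge could be combined with a trainer by declaring both in :tribuo-components. The sketch below assumes that the trainer's objective option is filled in with the :name of the Hinge component, following the configuration convention Tribuo itself uses; this chapter does not confirm that behaviour for the wrapper. The component names and hinge-sgd-spec are hypothetical.

```clojure
;; Hypothetical spec: a linear SGD classifier trained with a hinge loss.
(def hinge-sgd-spec
  {:model-type :scicloj.ml.tribuo/classification
   :tribuo-components
   [{:name "hinge"
     :type "org.tribuo.classification.sgd.objectives.Hinge"
     :properties {:margin "1.0"}}
    {:name "linear-sgd"
     :type "org.tribuo.classification.sgd.linear.LinearSGDTrainer"
     ;; Assumption: "hinge" refers to the component declared above.
     :properties {:objective "hinge"
                  :epochs "10"}}]
   :tribuo-trainer-name "linear-sgd"})
```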

25.2.0.10 o..t…data.columnar.RowProcessor

javadoc

All configurable options for org.tribuo.data.columnar.RowProcessor:

| name | description | type | default |
|---|---|---|---|
| featureProcessors | A set of feature processors to apply after extraction. | java.util.Set | [] |
| fieldProcessorList | The list of field processors to use. | java.util.List | |
| metadataExtractors | Extractors for the example metadata. | java.util.List | [] |
| regexMappingProcessors | A map from a regex to field processors to apply to fields matching the regex. | java.util.Map | {} |
| replaceNewlinesWithSpaces | Replace newlines with spaces in values before passing them to field processors. | boolean | true |
| responseProcessor | Processor which extracts the response. | org.tribuo.data.columnar.ResponseProcessor | |
| weightExtractor | Extractor for the example weight. | org.tribuo.data.columnar.FieldExtractor | |

25.2.0.11 o..t…data.columnar.extractors.DateExtractor

javadoc

All configurable options for org.tribuo.data.columnar.extractors.DateExtractor:

| name | description | type | default |
|---|---|---|---|
| dateFormat | The expected date format. | class java.lang.String | |
| fieldName | The field name to read. | class java.lang.String | |
| localeCountry | Sets the locale country. | class java.lang.String | |
| localeLanguage | Sets the locale language. | class java.lang.String | |
| metadataName | The metadata key to emit, defaults to field name if unpopulated | class java.lang.String | |

25.2.0.12 o..t…data.columnar.extractors.DoubleExtractor

javadoc

All configurable options for org.tribuo.data.columnar.extractors.DoubleExtractor:

| name | description | type | default |
|---|---|---|---|
| fieldName | The field name to read. | class java.lang.String | |
| metadataName | The metadata key to emit, defaults to field name if unpopulated | class java.lang.String | |

25.2.0.13 o..t…data.columnar.extractors.FloatExtractor

javadoc

All configurable options for org.tribuo.data.columnar.extractors.FloatExtractor:

| name | description | type | default |
|---|---|---|---|
| fieldName | The field name to read. | class java.lang.String | |
| metadataName | The metadata key to emit, defaults to field name if unpopulated | class java.lang.String | |

25.2.0.14 o..t…data.columnar.extractors.IdentityExtractor

javadoc

All configurable options for org.tribuo.data.columnar.extractors.IdentityExtractor:

| name | description | type | default |
|---|---|---|---|
| fieldName | The field name to read. | class java.lang.String | |
| metadataName | The metadata key to emit, defaults to field name if unpopulated | class java.lang.String | |

25.2.0.15 o..t…data.columnar.extractors.IndexExtractor

javadoc

All configurable options for org.tribuo.data.columnar.extractors.IndexExtractor:

| name | description | type | default |
|---|---|---|---|
| metadataName | The metadata key to emit, defaults to Example.NAME | class java.lang.String | name |

25.2.0.16 o..t…data.columnar.extractors.IntExtractor

javadoc

All configurable options for org.tribuo.data.columnar.extractors.IntExtractor:

| name | description | type | default |
|---|---|---|---|
| fieldName | The field name to read. | class java.lang.String | |
| metadataName | The metadata key to emit, defaults to field name if unpopulated | class java.lang.String | |

25.2.0.17 o..t…data.columnar.extractors.OffsetDateTimeExtractor

javadoc

All configurable options for org.tribuo.data.columnar.extractors.OffsetDateTimeExtractor:

| name | description | type | default |
|---|---|---|---|
| dateTimeFormat | The expected date format. | class java.lang.String | |
| fieldName | The field name to read. | class java.lang.String | |
| localeCountry | The locale country. | class java.lang.String | |
| localeLanguage | The locale language. | class java.lang.String | |
| metadataName | The metadata key to emit, defaults to field name if unpopulated | class java.lang.String | |

25.2.0.18 o..t…data.columnar.processors.feature.UniqueProcessor

javadoc

All configurable options for org.tribuo.data.columnar.processors.feature.UniqueProcessor:

| name | description | type | default |
|---|---|---|---|
| reductionType | The operation to perform. | class org.tribuo.data.columnar.processors.feature.UniqueProcessor$UniqueType | |

25.2.0.19 o..t…data.columnar.processors.field.DateFieldProcessor

javadoc

All configurable options for org.tribuo.data.columnar.processors.field.DateFieldProcessor:

| name | description | type | default |
|---|---|---|---|
| dateFormat | The expected date format. | class java.lang.String | |
| featureTypes | The date features to extract. | java.util.EnumSet | |
| fieldName | The field name to read. | class java.lang.String | |
| localeCountry | Sets the locale country. | class java.lang.String | US |
| localeLanguage | Sets the locale language. | class java.lang.String | en |

25.2.0.20 o..t…data.columnar.processors.field.DoubleFieldProcessor

javadoc

All configurable options for org.tribuo.data.columnar.processors.field.DoubleFieldProcessor:

| name | description | type | default |
|---|---|---|---|
| fieldName | The field name to read. | class java.lang.String | |
| onlyFieldName | Emit a feature using just the field name. | boolean | false |
| throwOnInvalid | Throw NumberFormatException if the value failed to parse. | boolean | false |

25.2.0.21 o..t…data.columnar.processors.field.IdentityProcessor

javadoc

All configurable options for org.tribuo.data.columnar.processors.field.IdentityProcessor:

| name | description | type | default |
|---|---|---|---|
| fieldName | The field name to read. | class java.lang.String | |

25.2.0.22 o..t…data.columnar.processors.field.RegexFieldProcessor

javadoc

All configurable options for org.tribuo.data.columnar.processors.field.RegexFieldProcessor:

| name | description | type | default |
|---|---|---|---|
| fieldName | The field name to read. | class java.lang.String | |
| modes | Matching mode. | java.util.EnumSet | |
| regexString | Regex to apply to the field. | class java.lang.String | |

25.2.0.23 o..t…data.columnar.processors.field.TextFieldProcessor

javadoc

All configurable options for org.tribuo.data.columnar.processors.field.TextFieldProcessor:

| name | description | type | default |
|---|---|---|---|
| fieldName | The field name to read. | class java.lang.String | |
| pipeline | Text processing pipeline to use. | interface org.tribuo.data.text.TextPipeline | |

25.2.0.24 o..t…data.columnar.processors.response.BinaryResponseProcessor

javadoc

All configurable options for org.tribuo.data.columnar.processors.response.BinaryResponseProcessor:

| name | description | type | default |
|---|---|---|---|
| displayField | Whether to display field names as part of the generated output, defaults to false | boolean | false |
| fieldName | The field name to read, you should use only one of this or fieldNames | class java.lang.String | |
| fieldNames | A list of field names to read, you should use only one of this or fieldName. | java.util.List | |
| negativeName | The negative response to emit. | class java.lang.String | 0 |
| outputFactory | Output factory to use to create the response. | org.tribuo.OutputFactory | |
| positiveName | The positive response to emit. | class java.lang.String | 1 |
| positiveResponse | The string which triggers a positive response. | class java.lang.String | |
| positiveResponses | A list of strings that trigger positive responses; it should be the same length as fieldNames or empty | java.util.List | |

25.2.0.25 o..t…data.columnar.processors.response.EmptyResponseProcessor

javadoc

All configurable options for org.tribuo.data.columnar.processors.response.EmptyResponseProcessor:

| name | description | type | default |
|---|---|---|---|
| outputFactory | Output factory to type the columnar loader. | org.tribuo.OutputFactory | |

25.2.0.26 o..t…data.columnar.processors.response.FieldResponseProcessor

javadoc

All configurable options for org.tribuo.data.columnar.processors.response.FieldResponseProcessor:

| name | description | type | default |
|---|---|---|---|
| defaultValue | Default value to return if one isn't found. | class java.lang.String | |
| defaultValues | A list of default values to return if one isn't found, one for each field | java.util.List | |
| displayField | Whether to display field names as part of the generated label, defaults to false | boolean | false |
| fieldName | The field name to read. | class java.lang.String | |
| fieldNames | A list of field names to read, you should use only one of this or fieldName. | java.util.List | |
| outputFactory | The output factory to use. | org.tribuo.OutputFactory | |
| uppercase | Uppercase the value before converting to output. | boolean | true |

25.2.0.27 o..t…data.columnar.processors.response.Quartile

javadoc

All configurable options for org.tribuo.data.columnar.processors.response.Quartile:

| name | description | type | default |
|---|---|---|---|
| lowerMedian | The lower quartile value. | double | 0.0 |
| median | The median value. | double | 0.0 |
| upperMedian | The upper quartile value. | double | 0.0 |

25.2.0.28 o..t…data.columnar.processors.response.QuartileResponseProcessor

javadoc

All configurable options for org.tribuo.data.columnar.processors.response.QuartileResponseProcessor:

| name | description | type | default |
|---|---|---|---|
| fieldName | The field name to read. | class java.lang.String | |
| fieldNames | A list of field names to read, you should use only one of this or fieldName. | java.util.List | |
| name | The string to emit. | class java.lang.String | |
| outputFactory | The output factory to use. | org.tribuo.OutputFactory | |
| quartile | The quartile to use. | class org.tribuo.data.columnar.processors.response.Quartile | |
| quartiles | A list of quartiles to use, should have the same length as fieldNames | java.util.List | |

25.2.0.29 o..t…data.csv.CSVDataSource

javadoc

All configurable options for org.tribuo.data.csv.CSVDataSource:

| name | description | type | default |
|---|---|---|---|
| dataPath | Path to the CSV file. | interface java.nio.file.Path | |
| headers | The CSV headers. Should only be used if the csv file does not already contain headers. | java.util.List | [] |
| outputFactory | The output factory to use. | org.tribuo.OutputFactory | |
| outputRequired | Is an output required from each row? | boolean | true |
| quote | The CSV quote character. | char | " |
| rowProcessor | The row processor to use. | org.tribuo.data.columnar.RowProcessor | |
| separator | The CSV separator character. | char | , |

25.2.0.30 o..t…data.csv.CSVSaver

javadoc

All configurable options for org.tribuo.data.csv.CSVSaver:

name description type default
quote The quote character.

char

"
separator The column separator.

char

,

25.2.0.31 o..t…data.sql.SQLDBConfig

javadoc

All configurable options for org.tribuo.data.sql.SQLDBConfig:

name description type default
connectionString Connection string, including host, port and db.

class java.lang.String

db Database name.

class java.lang.String

fetchSize Size of the batches to fetch from the database for queries.

int

1000
host Hostname of the database machine.

class java.lang.String

password Database password.

class java.lang.String

port Port number.

class java.lang.String

propMap Properties to pass to java.sql.DriverManager. The username and password are removed from the map and populated into their own fields. If they are specified both in the map and in the fields, the field values are used.

java.util.Map

{}
username Database username.

class java.lang.String

25.2.0.32 o..t…data.sql.SQLDataSource

javadoc

All configurable options for org.tribuo.data.sql.SQLDataSource:

name description type default
outputFactory The output factory to use.

org.tribuo.OutputFactory

outputRequired Is an output required from each row?

boolean

true
rowProcessor The row processor to use.

org.tribuo.data.columnar.RowProcessor

sqlConfig Database configuration.

class org.tribuo.data.sql.SQLDBConfig

sqlString SQL query to run.

class java.lang.String

25.2.0.33 o..t…data.text.DirectoryFileSource

javadoc

All configurable options for org.tribuo.data.text.DirectoryFileSource:

name description type default
dataDir The top-level directory containing the data set.

interface java.nio.file.Path

.
extractor The feature extractor that converts text into examples.

org.tribuo.data.text.TextFeatureExtractor

outputFactory The output factory to use.

org.tribuo.OutputFactory

preprocessors The preprocessors to apply to the input documents.

java.util.List

[]

25.2.0.34 o..t…data.text.impl.BasicPipeline

javadoc

All configurable options for org.tribuo.data.text.impl.BasicPipeline:

name description type default
ngram n in the n-gram to emit.

int

2
tokenizer Tokenizer to use.

interface org.tribuo.util.tokens.Tokenizer

25.2.0.35 o..t…data.text.impl.CasingPreprocessor

javadoc

All configurable options for org.tribuo.data.text.impl.CasingPreprocessor:

name description type default
op Which casing operation to apply.

class org.tribuo.data.text.impl.CasingPreprocessor$CasingOperation

LOWERCASE

25.2.0.36 o..t…data.text.impl.FeatureHasher

javadoc

All configurable options for org.tribuo.data.text.impl.FeatureHasher:

name description type default
dimension Dimension to map the hash into.

int

0
hashSeed Seed used in the hash function.

int

38495
preserveValue Preserve input feature value.

boolean

false
valueHashSeed Seed used for value hash function.

int

77777

25.2.0.37 o..t…data.text.impl.NgramProcessor

javadoc

All configurable options for org.tribuo.data.text.impl.NgramProcessor:

name description type default
n n in the n-gram to emit.

int

2
tokenizer Tokenizer to use.

interface org.tribuo.util.tokens.Tokenizer

value Value to emit for each n-gram.

double

1.0
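
To make the `n` and `value` options above concrete, here is a minimal sketch of n-gram extraction over an already tokenized sentence. The helper names and the `/` separator in the feature names are hypothetical; Tribuo's own feature naming may differ.

```clojure
(require '[clojure.string :as str])

;; Contiguous n-grams of a token sequence, joined with "/" (separator is an assumption).
(defn ngrams [n tokens]
  (map #(str/join "/" %) (partition n 1 tokens)))

;; Map each n-gram to the configured value, mirroring the `value` option.
(defn ngram-features [n value tokens]
  (into {} (map (fn [g] [g value]) (ngrams n tokens))))

(ngram-features 2 1.0 ["the" "quick" "fox"])
;; => {"the/quick" 1.0, "quick/fox" 1.0}
```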

25.2.0.38 o..t…data.text.impl.RegexPreprocessor

javadoc

All configurable options for org.tribuo.data.text.impl.RegexPreprocessor:

name description type default
regexStrings A list of regular expressions in string format used to match the input

java.util.List

replacements A list of replacement strings which are used to replace the matches

java.util.List

25.2.0.39 o..t…data.text.impl.SimpleStringDataSource

javadoc

All configurable options for org.tribuo.data.text.impl.SimpleStringDataSource:

name description type default
extractor The feature extractor that generates Features from text.

org.tribuo.data.text.TextFeatureExtractor

outputFactory The factory that converts a String into an Output instance.

org.tribuo.OutputFactory

path The path to read the data from.

interface java.nio.file.Path

preprocessors The document preprocessors to run on each document in the data source.

java.util.List

[]
rawLines The input data lines.

java.util.List

25.2.0.40 o..t…data.text.impl.SimpleTextDataSource

javadoc

All configurable options for org.tribuo.data.text.impl.SimpleTextDataSource:

name description type default
extractor The feature extractor that generates Features from text.

org.tribuo.data.text.TextFeatureExtractor

outputFactory The factory that converts a String into an Output instance.

org.tribuo.OutputFactory

path The path to read the data from.

interface java.nio.file.Path

preprocessors The document preprocessors to run on each document in the data source.

java.util.List

[]

25.2.0.41 o..t…data.text.impl.TextFeatureExtractorImpl

javadoc

All configurable options for org.tribuo.data.text.impl.TextFeatureExtractorImpl:

name description type default
pipeline The text processing pipeline.

interface org.tribuo.data.text.TextPipeline

25.2.0.42 o..t…data.text.impl.TokenPipeline

javadoc

All configurable options for org.tribuo.data.text.impl.TokenPipeline:

name description type default
hashDim Dimension to map the hash into.

int

-1
hashPreserveValue Should feature hashing preserve the value?

boolean

true
ngram n in the n-gram to emit.

int

2
termCounting Use term counting, otherwise emit binary features.

boolean

false
tokenizer Tokenizer to use.

interface org.tribuo.util.tokens.Tokenizer

25.2.0.43 o..t…data.text.impl.UniqueAggregator

javadoc

All configurable options for org.tribuo.data.text.impl.UniqueAggregator:

name description type default
value Value to emit, if unset emits the last value observed for that token.

double

NaN

25.2.0.44 o..t…datasource.IDXDataSource

javadoc

All configurable options for org.tribuo.datasource.IDXDataSource:

name description type default
featuresPath Path to load the features from.

interface java.nio.file.Path

outputFactory The output factory to use.

org.tribuo.OutputFactory

outputPath Path to load the outputs from.

interface java.nio.file.Path

25.2.0.45 o..t…datasource.LibSVMDataSource

javadoc

All configurable options for org.tribuo.datasource.LibSVMDataSource:

name description type default
maxFeatureID Sets the maximum feature id to load from the file.

int

-2147483648
outputFactory The output factory to use.

org.tribuo.OutputFactory

path Path to load the data from. Either this or url must be set.

interface java.nio.file.Path

url URL to load the data from. Either this or path must be set.

class java.net.URL

zeroIndexed Set to true if the features are zero indexed.

boolean

false

25.2.0.46 o..t…hash.HashCodeHasher

javadoc

All configurable options for org.tribuo.hash.HashCodeHasher:

name description type default
salt Salt used in the hash.

class java.lang.String

25.2.0.47 o..t…hash.MessageDigestHasher

javadoc

All configurable options for org.tribuo.hash.MessageDigestHasher:

name description type default
hashType MessageDigest hashing function.

class java.lang.String

saltStr Salt used in the hash.

class java.lang.String

25.2.0.48 o..t…hash.ModHashCodeHasher

javadoc

All configurable options for org.tribuo.hash.ModHashCodeHasher:

name description type default
dimension Range of the hashing function.

int

100
salt Salt used in the hash.

class java.lang.String
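
The `dimension` and `salt` options suggest the usual hashing trick: salt the feature name, hash it, and reduce the hash into a fixed range. A minimal sketch under those assumptions follows. The function name, the order in which salt and name are concatenated, and the non-negative result are illustrative choices, not confirmed details of Tribuo.

```clojure
;; Hypothetical sketch of salted, range-limited hashing of a feature name.
;; Clojure's `mod` returns a non-negative result for a positive divisor.
(defn mod-hash
  [{:keys [salt dimension]} feature-name]
  (mod (.hashCode ^String (str salt feature-name)) dimension))

;; The same name and salt always land in the same bucket in [0, dimension).
(mod-hash {:salt "my-salt" :dimension 100} "word=tribuo")
```

Smaller dimensions save memory at the cost of more collisions between unrelated features.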

25.2.0.49 o..t…math.kernel.Polynomial

javadoc

All configurable options for org.tribuo.math.kernel.Polynomial:

name description type default
degree Degree of the polynomial.

double

0.0
gamma Coefficient to multiply the dot product by.

double

0.0
intercept Scalar to add to the dot product.

double

0.0

25.2.0.50 o..t…math.kernel.RBF

javadoc

All configurable options for org.tribuo.math.kernel.RBF:

name description type default
gamma Kernel output = exp(-gamma*|u-v|^2).

double

0.0
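
Since the default `gamma` is 0.0, a kernel component will normally need an explicit value. The following is a sketch of how such a component might be written as an entry for `:tribuo-components`, mirroring the shape of the trainer example at the start of this chapter. The component name "rbf-kernel" and the value "0.5" are made up, and how a trainer component would refer to this entry is not shown here.

```clojure
;; Hypothetical component entry; name and gamma value are illustrative only.
;; Property values are written as strings, as in the chapter's trainer example.
(def rbf-component
  {:name "rbf-kernel"
   :type "org.tribuo.math.kernel.RBF"
   :properties {:gamma "0.5"}})
```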

25.2.0.51 o..t…math.kernel.Sigmoid

javadoc

All configurable options for org.tribuo.math.kernel.Sigmoid:

name description type default
gamma Coefficient to multiply the dot product by.

double

0.0
intercept Scalar intercept to add to the dot product.

double

0.0
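
The option descriptions of the three kernels above can be read as formulas. The sketch below spells them out in plain Clojure on vectors of numbers. The RBF form is the one given in its option description. The polynomial and sigmoid forms are the standard textbook definitions, assumed here to match the descriptions. All function names are hypothetical.

```clojure
(defn dot [u v] (reduce + (map * u v)))

;; (gamma * <u,v> + intercept) ^ degree
(defn polynomial-kernel [{:keys [gamma intercept degree]} u v]
  (Math/pow (+ (* gamma (dot u v)) intercept) degree))

;; exp(-gamma * |u - v|^2)
(defn rbf-kernel [{:keys [gamma]} u v]
  (let [sq-dist (reduce + (map (fn [a b] (let [d (- a b)] (* d d))) u v))]
    (Math/exp (- (* gamma sq-dist)))))

;; tanh(gamma * <u,v> + intercept)
(defn sigmoid-kernel [{:keys [gamma intercept]} u v]
  (Math/tanh (+ (* gamma (dot u v)) intercept)))

(polynomial-kernel {:gamma 1.0 :intercept 1.0 :degree 2.0} [1.0 1.0] [1.0 1.0])
;; => 9.0
(rbf-kernel {:gamma 0.5} [1.0 2.0] [1.0 2.0])
;; => 1.0
```

Note that with the listed defaults of 0.0 every kernel degenerates to a constant, so these options generally need to be set explicitly.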

25.2.0.52 o..t…math.neighbour.bruteforce.NeighboursBruteForceFactory

javadoc

All configurable options for org.tribuo.math.neighbour.bruteforce.NeighboursBruteForceFactory:

name description type default
distance The distance function to use.

interface org.tribuo.math.distance.Distance

L2Distance()
numThreads The number of threads to use for training.

int

1

25.2.0.53 o..t…math.neighbour.kdtree.KDTreeFactory

javadoc

All configurable options for org.tribuo.math.neighbour.kdtree.KDTreeFactory:

name description type default
distance The distance function to use.

interface org.tribuo.math.distance.Distance

L2Distance()
numThreads The number of threads to use for training.

int

1

25.2.0.54 o..t…math.optimisers.AdaDelta

javadoc

All configurable options for org.tribuo.math.optimisers.AdaDelta:

name description type default
epsilon Epsilon for numerical stability.

double

1.0E-6
rho Momentum value.

double

0.95

25.2.0.55 o..t…math.optimisers.AdaGrad

javadoc

All configurable options for org.tribuo.math.optimisers.AdaGrad:

name description type default
epsilon Epsilon for numerical stability around zero.

double

1.0E-6
initialLearningRate Initial learning rate used to scale the gradients.

double

0.0
initialValue Initial value for the gradient accumulator.

double

0.0

25.2.0.56 o..t…math.optimisers.AdaGradRDA

javadoc

All configurable options for org.tribuo.math.optimisers.AdaGradRDA:

name description type default
epsilon Epsilon for numerical stability around zero.

double

1.0E-6
initialLearningRate Initial learning rate used to scale the gradients.

double

0.0
l1 l1 regularization penalty.

double

0.0
l2 l2 regularization penalty.

double

0.0
numExamples Number of examples to scale the l1 and l2 penalties by.

int

1

25.2.0.57 o..t…math.optimisers.Adam

javadoc

All configurable options for org.tribuo.math.optimisers.Adam:

name description type default
betaOne The beta one parameter.

double

0.9
betaTwo The beta two parameter.

double

0.999
epsilon Epsilon for numerical stability.

double

1.0E-6
initialLearningRate Learning rate to scale the gradients by.

double

0.001
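
To show what the four options above control, here is a sketch of one step of the standard Adam update for a single scalar parameter. It follows the textbook algorithm; Tribuo's implementation works on tensors and may organise the bias correction differently. The function name and the state map keys are hypothetical.

```clojure
;; One textbook Adam step for a scalar parameter.
;; state holds the parameter, first moment m, second moment v and step count t.
(defn adam-step
  [{:keys [betaOne betaTwo epsilon initialLearningRate]}
   {:keys [param m v t]}
   grad]
  (let [t'    (inc t)
        m'    (+ (* betaOne m) (* (- 1.0 betaOne) grad))
        v'    (+ (* betaTwo v) (* (- 1.0 betaTwo) grad grad))
        m-hat (/ m' (- 1.0 (Math/pow betaOne t')))   ; bias-corrected first moment
        v-hat (/ v' (- 1.0 (Math/pow betaTwo t')))]  ; bias-corrected second moment
    {:param (- param (/ (* initialLearningRate m-hat)
                        (+ (Math/sqrt v-hat) epsilon)))
     :m m' :v v' :t t'}))

;; Defaults as listed in the table above.
(def adam-defaults
  {:betaOne 0.9 :betaTwo 0.999 :epsilon 1.0e-6 :initialLearningRate 0.001})

(adam-step adam-defaults {:param 1.0 :m 0.0 :v 0.0 :t 0} 2.0)
```

On the first step the parameter moves by roughly the learning rate regardless of the gradient's magnitude, which is why `initialLearningRate` is the option most worth tuning.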

25.2.0.58 o..t…math.optimisers.LinearDecaySGD

javadoc

All configurable options for org.tribuo.math.optimisers.LinearDecaySGD:

name description type default
initialLearningRate Initial learning rate.

double

0.0
rho Momentum scaling factor.

double

0.0
useMomentum Momentum type to use.

class org.tribuo.math.optimisers.SGD$Momentum

25.2.0.59 o..t…math.optimisers.ParameterAveraging

javadoc

All configurable options for org.tribuo.math.optimisers.ParameterAveraging:

name description type default
optimiser Inner optimiser to average parameters across.

interface org.tribuo.math.StochasticGradientOptimiser

25.2.0.60 o..t…math.optimisers.Pegasos

javadoc

All configurable options for org.tribuo.math.optimisers.Pegasos:

name description type default
baseRate Base learning rate.

double

0.1
lambda Step size shrinkage.

double

0.01

25.2.0.61 o..t…math.optimisers.RMSProp

javadoc

All configurable options for org.tribuo.math.optimisers.RMSProp:

name description type default
decay Decay factor for the momentum.

double

0.0
epsilon Epsilon for numerical stability.

double

1.0E-8
initialLearningRate Learning rate to scale the gradients by.

double

0.0
rho Momentum parameter.

double

0.9

25.2.0.62 o..t…math.optimisers.SimpleSGD

javadoc

All configurable options for org.tribuo.math.optimisers.SimpleSGD:

name description type default
initialLearningRate Initial learning rate.

double

0.0
rho Momentum scaling factor.

double

0.0
useMomentum Momentum type to use.

class org.tribuo.math.optimisers.SGD$Momentum

25.2.0.63 o..t…math.optimisers.SqrtDecaySGD

javadoc

All configurable options for org.tribuo.math.optimisers.SqrtDecaySGD:

name description type default
initialLearningRate Initial learning rate.

double

0.0
rho Momentum scaling factor.

double

0.0
useMomentum Momentum type to use.

class org.tribuo.math.optimisers.SGD$Momentum
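
SimpleSGD, LinearDecaySGD and SqrtDecaySGD expose the same three options, so going by their names they differ only in how the learning rate changes over time. The sketch below assumes a constant rate, a rate divided by the iteration number, and a rate divided by the square root of the iteration number, with iterations counted from 1. These formulas and function names are assumptions for illustration and are not taken from this reference.

```clojure
;; Assumed learning-rate schedules; t is the iteration number, starting at 1.
(defn simple-rate       [initial _t] initial)
(defn linear-decay-rate [initial t]  (/ initial t))
(defn sqrt-decay-rate   [initial t]  (/ initial (Math/sqrt t)))

;; Compare the three schedules at iterations 1, 4 and 16.
(for [t [1 4 16]]
  {:t      t
   :simple (simple-rate 1.0 t)
   :linear (linear-decay-rate 1.0 t)
   :sqrt   (sqrt-decay-rate 1.0 t)})
```

In all three cases the default `initialLearningRate` of 0.0 would produce no updates, so it has to be set.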

25.2.0.64 o..t…regression.RegressionFactory

javadoc

All configurable options for org.tribuo.regression.RegressionFactory:

name description type default
splitChar The character to split the dimensions on.

char

,

25.2.0.65 o..t…regression.example.GaussianDataSource

javadoc

All configurable options for org.tribuo.regression.example.GaussianDataSource:

name description type default
intercept The y-intercept of the line.

float

0.0
numSamples The number of samples to draw.

int

0
seed The RNG seed.

long

12345
slope The slope of the line.

float

0.0
variance The variance of the gaussian.

float

1.0
xMax The maximum feature value.

float

0.0
xMin The minimum feature value.

float

0.0

25.2.0.66 o..t…regression.example.NonlinearGaussianDataSource

javadoc

All configurable options for org.tribuo.regression.example.NonlinearGaussianDataSource:

name description type default
intercept The y-intercept of the line.

float

0.0
numSamples The number of samples to draw.

int

0
seed The RNG seed.

long

12345
variance The variance of the noise gaussian.

float

1.0
weights The feature weights. Must be a 4 element array.

class [F

(a float array; contents not shown)
xOneMax The maximum value of x_1.

float

2.0
xOneMin The minimum value of x_1.

float

-2.0
xZeroMax The maximum value of x_0.

float

2.0
xZeroMin The minimum value of x_0.

float

-2.0

25.2.0.67 o..t…regression.liblinear.LinearRegressionType

javadoc

All configurable options for org.tribuo.regression.liblinear.LinearRegressionType:

name description type default
type The type of regression algorithm.

class org.tribuo.regression.liblinear.LinearRegressionType$LinearType

25.2.0.68 o..t…regression.libsvm.SVMRegressionType

javadoc

All configurable options for org.tribuo.regression.libsvm.SVMRegressionType:

name description type default
type The SVM regression algorithm to use.

class org.tribuo.regression.libsvm.SVMRegressionType$SVMMode

25.2.0.69 o..t…regression.sgd.objectives.Huber

javadoc

All configurable options for org.tribuo.regression.sgd.objectives.Huber:

name description type default
cost Cost beyond which the loss function is linear.

double

5.0
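
The `cost` option is the threshold at which the loss switches from quadratic to linear. A minimal sketch of the standard Huber loss for a single prediction follows. The function name is hypothetical, and Tribuo's implementation may scale the terms differently.

```clojure
;; Standard Huber loss: quadratic for small errors, linear beyond `cost`.
(defn huber-loss [cost prediction target]
  (let [err (Math/abs (double (- prediction target)))]
    (if (<= err cost)
      (* 0.5 err err)
      (- (* cost err) (* 0.5 cost cost)))))

(huber-loss 5.0 1.0 3.0)   ;; error 2 is below the threshold  => 2.0
(huber-loss 5.0 0.0 10.0)  ;; error 10 is above the threshold => 37.5
```

A larger `cost` makes the loss behave more like squared error; a smaller one makes it more robust to outliers.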

25.2.0.70 o..t…transform.TransformationMap

javadoc

All configurable options for org.tribuo.transform.TransformationMap:

name description type default
featureTransformationList Feature specific transformations. Accepts regexes for feature names.

java.util.Map

{}
globalTransformations Global transformations to apply after the feature specific transforms.

java.util.List

25.2.0.71 o..t…transform.TransformationMap$TransformationList

javadoc

All configurable options for org.tribuo.transform.TransformationMap$TransformationList:

name description type default
list A list of transformations to apply.

java.util.List

25.2.0.72 o..t…transform.transformations.BinningTransformation

javadoc

All configurable options for org.tribuo.transform.transformations.BinningTransformation:

name description type default
numBins Number of bins.

int

0
type Binning algorithm to use.

class org.tribuo.transform.transformations.BinningTransformation$BinningType

25.2.0.73 o..t…transform.transformations.LinearScalingTransformation

javadoc

All configurable options for org.tribuo.transform.transformations.LinearScalingTransformation:

name description type default
targetMax Maximum value after transformation.

double

1.0
targetMin Minimum value after transformation.

double

0.0
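
As an illustration of `targetMin` and `targetMax`, the following sketch rescales a sequence of values so that the observed minimum maps to `targetMin` and the observed maximum to `targetMax`. The function name is hypothetical, and mapping a constant feature to `targetMin` is an assumption made to avoid dividing by zero.

```clojure
;; Min-max rescaling of xs into [targetMin, targetMax].
(defn linear-scale
  [{:keys [targetMin targetMax]} xs]
  (let [lo   (apply min xs)
        hi   (apply max xs)
        span (- hi lo)]
    (mapv (fn [x]
            (if (zero? span)
              targetMin ; constant feature: assumption, see lead-in
              (+ targetMin (* (/ (- x lo) span) (- targetMax targetMin)))))
          xs)))

(linear-scale {:targetMin 0.0 :targetMax 1.0} [2.0 4.0 6.0])
;; => [0.0 0.5 1.0]
```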

25.2.0.74 o..t…transform.transformations.MeanStdDevTransformation

javadoc

All configurable options for org.tribuo.transform.transformations.MeanStdDevTransformation:

name description type default
targetMean Mean value after transformation.

double

0.0
targetStdDev Standard deviation after transformation.

double

1.0
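
The two options above describe standardisation to a chosen mean and standard deviation. A minimal sketch follows, assuming the population standard deviation (dividing by n) is used; whether Tribuo divides by n or n - 1 is not stated in this reference. The function name is hypothetical.

```clojure
;; Shift and scale xs so they have the requested mean and standard deviation.
;; Uses the population standard deviation (an assumption); a constant input
;; would divide by zero and is not handled in this sketch.
(defn standardise
  [{:keys [targetMean targetStdDev]} xs]
  (let [n        (count xs)
        mean     (/ (reduce + xs) n)
        variance (/ (reduce + (map (fn [x] (let [d (- x mean)] (* d d))) xs)) n)
        sd       (Math/sqrt variance)]
    (mapv (fn [x] (+ targetMean (* targetStdDev (/ (- x mean) sd)))) xs)))

(standardise {:targetMean 0.0 :targetStdDev 1.0}
             [2.0 4.0 4.0 4.0 5.0 5.0 7.0 9.0])
;; => [-1.5 -0.5 -0.5 -0.5 0.0 0.0 1.0 2.0]
```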

25.2.0.75 o..t…transform.transformations.SimpleTransform

javadoc

All configurable options for org.tribuo.transform.transformations.SimpleTransform:

name description type default
op Type of the simple transformation.

class org.tribuo.transform.transformations.SimpleTransform$Operation

operand Operand (if required).

double

NaN
secondOperand Second operand (if required).

double

NaN

25.2.0.76 o..t…util.tokens.impl.BreakIteratorTokenizer

javadoc

All configurable options for org.tribuo.util.tokens.impl.BreakIteratorTokenizer:

name description type default
localeStr The locale language tag string.

class java.lang.String

25.2.0.77 o..t…util.tokens.impl.SplitCharactersTokenizer

javadoc

All configurable options for org.tribuo.util.tokens.impl.SplitCharactersTokenizer:

name description type default
splitCharacters The characters to split on.

class [C

(a char array; contents not shown)
splitXDigitsCharacters The characters to split on unless we're in a number.

class [C

(a char array; contents not shown)

25.2.0.78 o..t…util.tokens.impl.SplitPatternTokenizer

javadoc

All configurable options for org.tribuo.util.tokens.impl.SplitPatternTokenizer:

name description type default
splitPatternRegex The regex to split with.

class java.lang.String

[\.,]?\s+
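
The default pattern splits on whitespace and swallows a single full stop or comma directly before it. The sketch below applies the same regular expression with `clojure.string/split` to show the effect. It only demonstrates the pattern; the Tribuo tokenizer itself also tracks token positions, which this example does not.

```clojure
(require '[clojure.string :as str])

;; The default splitPatternRegex from the table above.
(def default-split-pattern #"[\.,]?\s+")

(str/split "Hello, world. This is Tribuo" default-split-pattern)
;; => ["Hello" "world" "This" "is" "Tribuo"]
```

Punctuation that is not followed by whitespace, such as the dot in `3.14`, stays inside its token.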

25.2.0.79 o..t…util.tokens.impl.wordpiece.Wordpiece

javadoc

All configurable options for org.tribuo.util.tokens.impl.wordpiece.Wordpiece:

name description type default
maxInputCharactersPerWord The maximum number of characters per word to consider. This helps avoid extra work on pathological cases.

int

100
unknownToken The value to use for 'UNKNOWN' tokens. Defaults to '[UNK]', which is a common default in BERT-based solutions.

class java.lang.String

[UNK]
vocabPath Path to a vocabulary data file.

class java.lang.String

25.2.0.80 o..t…util.tokens.impl.wordpiece.WordpieceBasicTokenizer

javadoc

All configurable options for org.tribuo.util.tokens.impl.wordpiece.WordpieceBasicTokenizer:

name description type default
tokenizeChineseChars Split on Chinese tokens?

boolean

true

25.2.0.81 o..t…util.tokens.impl.wordpiece.WordpieceTokenizer

javadoc

All configurable options for org.tribuo.util.tokens.impl.wordpiece.WordpieceTokenizer:

name description type default
basicTokenizer Performs some tokenization work on the input text before the wordpiece algorithm is applied to each resulting token.

interface org.tribuo.util.tokens.Tokenizer

an org.tribuo.util.tokens.impl.wordpiece.WordpieceBasicTokenizer instance
neverSplitTokens A set of 'token' strings that should never be split, regardless of whether they contain, e.g., punctuation in the middle. No entry should contain whitespace.

java.util.Set

[]
stripAccents Determines whether to strip accents/diacritics from the input text.

boolean

true
toLowerCase Determines whether to lowercase the input text.

boolean

true
whitespaceTokenizer Performs whitespace tokenization before the 'basic' tokenizer is applied (see basicTokenizer).

interface org.tribuo.util.tokens.Tokenizer

an org.tribuo.util.tokens.impl.WhitespaceTokenizer instance
wordpiece An instance of Wordpiece, which applies the 'wordpiece' algorithm.

class org.tribuo.util.tokens.impl.wordpiece.Wordpiece

25.2.0.82 o..t…util.tokens.universal.UniversalTokenizer

javadoc

All configurable options for org.tribuo.util.tokens.universal.UniversalTokenizer:

name description type default
sendPunct Send punctuation through as tokens.

boolean

false
source: notebooks/noj_book/tribuo_reference.clj