25 Tribuo reference

As discussed in the Machine Learning chapter, this book contains reference chapters for machine learning models that can be registered in metamorph.ml.

This specific chapter focuses on the models of the Tribuo Java library, which is wrapped by scicloj.ml.tribuo.

The following is a reference for all Tribuo trainers. To use one, pass ml/train a model specification in which the :type of a Tribuo component is the trainer's fully qualified class name, and set the options listed below under :properties.

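(comment
  ;; ds is a dataset whose inference target has been set.
  ;; The component :name is an arbitrary label; :tribuo-trainer-name
  ;; selects the component to train by that label.
  (ml/train
   ds
   {:model-type :scicloj.ml.tribuo/classification
    :tribuo-components [{:name "random-forest"
                         :type "org.tribuo.classification.dtree.CARTClassificationTrainer"
                         :properties {:maxDepth "8"
                                      :useRandomSplitPoints "false"
                                      :fractionFeaturesInSplit "0.5"}}]
    :tribuo-trainer-name "random-forest"}))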

This chapter also includes a reference for all non-trainer Tribuo components. These can potentially be used in Tribuo model specs as well.
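As a minimal sketch of how a non-trainer component might appear in a model spec, the map below declares the Hinge objective from the component reference and points the SGD trainer's objective option at it. It assumes that an option whose type is another component can be given as that component's :name; the names "hinge" and "linear-sgd" and the option values are illustrative only.

```clojure
;; Sketch only: assumes component-typed options (here :objective)
;; are resolved by the :name of another entry in :tribuo-components.
(def hinge-sgd-spec
  {:model-type :scicloj.ml.tribuo/classification
   :tribuo-components
   [{:name "hinge"                      ; hypothetical label
     :type "org.tribuo.classification.sgd.objectives.Hinge"
     :properties {:margin "1.0"}}
    {:name "linear-sgd"                 ; hypothetical label
     :type "org.tribuo.classification.sgd.linear.LinearSGDTrainer"
     :properties {:objective "hinge"    ; refers to the component above
                  :epochs "10"}}]
   :tribuo-trainer-name "linear-sgd"})

(comment
  ;; ds is a dataset with an inference target, as in the example above.
  (ml/train ds hinge-sgd-spec))
```

The spec is plain data, so it can be built, inspected, and reused before any training happens.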

25.1 Tribuo trainer reference

25.1.0.1 o..t…classification.baseline.DummyClassifierTrainer

javadoc

The DummyClassifierTrainer trains a baseline classifier that predicts labels with a simple ‘dummy’ strategy rather than learning from the features.

It can, for example, always predict a fixed label when :dummyType is :CONSTANT.

(def df
 (-> (tc/dataset {:a [1 2], :target [:x :x]})
  (ds-mod/set-inference-target :target)))

(kind/table df)

a	target
1	x
2	x

(def model
 (ml/train
   df
   {:model-type :scicloj.ml.tribuo/classification,
    :tribuo-components
    [{:name "dummy",
      :type
      "org.tribuo.classification.baseline.DummyClassifierTrainer",
      :properties {:dummyType :CONSTANT, :constantLabel "c"}}],
    :tribuo-trainer-name "dummy"}))

With this configuration, every prediction is the constant label ‘c’:

(ml/predict df model)

_unnamed [2 1]:

:target
:c
:c

All configurable options for org.tribuo.classification.baseline.DummyClassifierTrainer:

name	description	type	default
constantLabel	Label to use for the constant classifier.	class java.lang.String
dummyType	Type of dummy classifier.	class org.tribuo.classification.baseline.DummyClassifierTrainer$DummyType
seed	Seed for the RNG.	long	1

25.1.0.2 o..t…classification.dtree.CARTClassificationTrainer

javadoc

All configurable options for org.tribuo.classification.dtree.CARTClassificationTrainer:

name	description	type	default
fractionFeaturesInSplit	The fraction of features to consider in each split. 1.0f indicates all features are considered.	float	1.0
impurity	The impurity measure used to determine split quality.	interface org.tribuo.classification.dtree.impurity.LabelImpurity	GiniIndex
maxDepth	The maximum depth of the tree.	int	2147483647
minChildWeight	The minimum weight allowed in a child node.	float	5.0
minImpurityDecrease	The decrease in impurity needed in order to split the node.	float	0.0
seed	The RNG seed to use when sampling features in a split.	long	12345
useRandomSplitPoints	Whether to choose split points for features at random.	boolean	false

25.1.0.3 o..t…classification.ensemble.AdaBoostTrainer

javadoc

All configurable options for org.tribuo.classification.ensemble.AdaBoostTrainer:

name	description	type	default
innerTrainer	The trainer to use to build each weak learner.	org.tribuo.Trainer<org.tribuo.classification.Label>
numMembers	The number of ensemble members to train.	int	0
seed	The seed for the RNG.	long	0
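The innerTrainer option expects another trainer, and numMembers defaults to 0, so both have to be supplied. The following is a hedged sketch of such a spec, assuming innerTrainer can be given as the :name of another component; the labels "weak-tree" and "adaboost" and the values chosen are illustrative.

```clojure
;; Sketch only: assumes :innerTrainer is resolved by component name.
(def adaboost-spec
  {:model-type :scicloj.ml.tribuo/classification
   :tribuo-components
   [{:name "weak-tree"                  ; hypothetical label
     :type "org.tribuo.classification.dtree.CARTClassificationTrainer"
     :properties {:maxDepth "2"}}       ; shallow trees as weak learners
    {:name "adaboost"                   ; hypothetical label
     :type "org.tribuo.classification.ensemble.AdaBoostTrainer"
     :properties {:innerTrainer "weak-tree"
                  :numMembers "50"
                  :seed "42"}}]
   :tribuo-trainer-name "adaboost"})

(comment
  ;; ds is a dataset with an inference target set.
  (ml/train ds adaboost-spec))
```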

25.1.0.4 o..t…classification.liblinear.LibLinearClassificationTrainer

javadoc

All configurable options for org.tribuo.classification.liblinear.LibLinearClassificationTrainer:

name	description	type	default
cost	Cost penalty for misclassifications.	double	1.0
epsilon	Epsilon insensitivity in the regression cost function.	double	0.1
labelWeights	Use Label specific weights.	java.util.Map<java.lang.String, java.lang.Float>	{}
maxIterations	Maximum number of iterations before terminating.	int	1000
seed	RNG seed.	long	12345
terminationCriterion	Stop iterating when the loss score decreases by less than this value.	double	0.1
trainerType	Algorithm to use.	org.tribuo.common.liblinear.LibLinearType	(instance of org.tribuo.classification.liblinear.LinearClassificationType)

25.1.0.5 o..t…classification.libsvm.LibSVMClassificationTrainer

javadoc

All configurable options for org.tribuo.classification.libsvm.LibSVMClassificationTrainer:

name	description	type	default
cache_size	Internal cache size, most of the time should be left at default.	double	500.0
coef0	Polynomial coefficient or shift in sigmoid kernel.	double	0.0
cost	Cost parameter for incorrect predictions.	double	1.0
degree	Polynomial degree.	int	3
eps	Tolerance of the termination criterion.	double	0.001
gamma	Width of the RBF kernel, or scalar on sigmoid kernel.	double	0.0
kernelType	Type of Kernel.	class org.tribuo.common.libsvm.KernelType	LINEAR
labelWeights	Use Label specific weights.	java.util.Map<java.lang.String, java.lang.Float>	{}
nu	nu value in NU SVM.	double	0.5
p	Epsilon in EPSILON_SVR.	double	0.001
probability	Generate probability estimates.	boolean	false
seed	RNG seed.	long	12345
shrinking	Regularise the weight parameters.	boolean	true
svmType	Type of SVM algorithm.	org.tribuo.common.libsvm.SVMType

25.1.0.6 o..t…classification.sgd.fm.FMClassificationTrainer

javadoc

All configurable options for org.tribuo.classification.sgd.fm.FMClassificationTrainer:

name	description	type	default
epochs	The number of gradient descent epochs.	int	5
factorizedDimSize	The size of the factorized feature representation.	int	0
loggingInterval	Log values after this many updates.	int	-1
minibatchSize	Minibatch size in SGD.	int	1
objective	The classification objective function to use.	interface org.tribuo.classification.sgd.LabelObjective	LogMulticlass
optimiser	The gradient optimiser to use.	interface org.tribuo.math.StochasticGradientOptimiser	AdaGrad(initialLearningRate=1.0,epsilon=0.1,initialValue=0.0)
seed	Seed for the RNG used to shuffle elements.	long	12345
shuffle	Shuffle the data before each epoch. Only turn off for debugging.	boolean	true
variance	The variance of the initializer.	double	0.0

25.1.0.7 o..t…classification.sgd.kernel.KernelSVMTrainer

javadoc

All configurable options for org.tribuo.classification.sgd.kernel.KernelSVMTrainer:

name	description	type	default
epochs	Number of SGD epochs.	int	5
kernel	SVM kernel.	interface org.tribuo.math.kernel.Kernel
lambda	Step size.	double	0.0
loggingInterval	Log values after this many updates.	int	-1
seed	Seed for the RNG used to shuffle elements.	long	0
shuffle	Shuffle the data before each epoch. Only turn off for debugging.	boolean	true

25.1.0.8 o..t…classification.sgd.linear.LinearSGDTrainer

javadoc

All configurable options for org.tribuo.classification.sgd.linear.LinearSGDTrainer:

name	description	type	default
epochs	The number of gradient descent epochs.	int	5
loggingInterval	Log values after this many updates.	int	-1
minibatchSize	Minibatch size in SGD.	int	1
objective	The classification objective function to use.	interface org.tribuo.classification.sgd.LabelObjective	LogMulticlass
optimiser	The gradient optimiser to use.	interface org.tribuo.math.StochasticGradientOptimiser	AdaGrad(initialLearningRate=1.0,epsilon=0.1,initialValue=0.0)
seed	Seed for the RNG used to shuffle elements.	long	12345
shuffle	Shuffle the data before each epoch. Only turn off for debugging.	boolean	true

25.1.0.9 o..t…classification.sgd.linear.LogisticRegressionTrainer

javadoc

All configurable options for org.tribuo.classification.sgd.linear.LogisticRegressionTrainer:

name	description	type	default
epochs	The number of gradient descent epochs.	int	5
loggingInterval	Log values after this many updates.	int	1000
minibatchSize	Minibatch size in SGD.	int	1
objective	The classification objective function to use.	interface org.tribuo.classification.sgd.LabelObjective	LogMulticlass
optimiser	The gradient optimiser to use.	interface org.tribuo.math.StochasticGradientOptimiser	AdaGrad(initialLearningRate=1.0,epsilon=0.1,initialValue=0.0)
seed	Seed for the RNG used to shuffle elements.	long	12345
shuffle	Shuffle the data before each epoch. Only turn off for debugging.	boolean	true

25.1.0.10 o..t…classification.xgboost.XGBoostClassificationTrainer

javadoc

All configurable options for org.tribuo.classification.xgboost.XGBoostClassificationTrainer:

name	description	type	default
alpha	l1 regularisation term on the weights.	double	1.0
booster	Type of the weak learner.	class org.tribuo.common.xgboost.XGBoostTrainer$BoosterType	GBTREE
eta	The learning rate, shrinks the new tree output to prevent overfitting.	double	0.3
evalMetric	Evaluation metric to use. The default value is set based on the objective function, so this can be usually left blank.	class java.lang.String
featureSubsample	Independently subsample the features available for each node of each tree.	double	1.0
gamma	Minimum loss reduction needed to split a tree node.	double	0.0
lambda	l2 regularisation term on the weights.	double	1.0
maxDepth	The maximum depth of any tree.	int	6
minChildWeight	The minimum weight in each child node before a split is valid.	double	1.0
nThread	The number of threads to use at training time.	int	4
numTrees	The number of trees to build.	int	0
overrideParameters	Override for parameters, if used must contain all the relevant parameters, including the objective	java.util.Map<java.lang.String, java.lang.String>	{}
seed	The RNG seed.	long	12345
silent	Quiesce all the logging output from the XGBoost C library. Deprecated in favour of 'verbosity'.	int	1
subsample	Independently subsample the examples for each tree.	double	1.0
treeMethod	The tree building algorithm to use.	class org.tribuo.common.xgboost.XGBoostTrainer$TreeMethod	AUTO
verbosity	Logging verbosity, 0 is silent, 3 is debug.	class org.tribuo.common.xgboost.XGBoostTrainer$LoggingVerbosity	SILENT
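Since numTrees defaults to 0 in the table above, a usable spec needs to set it explicitly. A minimal sketch, with illustrative values and the hypothetical label "xgboost", and assuming the Tribuo XGBoost module is available on the classpath:

```clojure
;; Sketch only: option values are illustrative, not recommendations.
(def xgboost-spec
  {:model-type :scicloj.ml.tribuo/classification
   :tribuo-components
   [{:name "xgboost"                    ; hypothetical label
     :type "org.tribuo.classification.xgboost.XGBoostClassificationTrainer"
     :properties {:numTrees "100"
                  :maxDepth "4"
                  :eta "0.1"
                  :subsample "0.8"}}]
   :tribuo-trainer-name "xgboost"})

(comment
  ;; ds is a dataset with an inference target set.
  (ml/train ds xgboost-spec))
```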

25.1.0.11 o..t…common.tree.ExtraTreesTrainer

javadoc

All configurable options for org.tribuo.common.tree.ExtraTreesTrainer:

name	description	type	default
combiner	The combination function to aggregate each ensemble member's outputs.	org.tribuo.ensemble.EnsembleCombiner
innerTrainer	The trainer to use for each ensemble member.	org.tribuo.Trainer
numMembers	The number of ensemble members to train.	int	0
seed	The seed for the RNG.	long	0

25.1.0.12 o..t…common.tree.RandomForestTrainer

javadoc

All configurable options for org.tribuo.common.tree.RandomForestTrainer:

name	description	type	default
combiner	The combination function to aggregate each ensemble member's outputs.	org.tribuo.ensemble.EnsembleCombiner
innerTrainer	The trainer to use for each ensemble member.	org.tribuo.Trainer
numMembers	The number of ensemble members to train.	int	0
seed	The seed for the RNG.	long	0
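A random forest combines this ensemble trainer with a decision tree trainer and a combiner. The sketch below is one possible spec, assuming innerTrainer and combiner can be given as component names and that org.tribuo.classification.ensemble.VotingCombiner is the combiner wanted for classification; all labels and values are illustrative. fractionFeaturesInSplit is set below 1.0 so that each split considers a random subset of the features.

```clojure
;; Sketch only: assumes component-typed options are resolved by name.
(def random-forest-spec
  {:model-type :scicloj.ml.tribuo/classification
   :tribuo-components
   [{:name "voting"                     ; hypothetical label
     :type "org.tribuo.classification.ensemble.VotingCombiner"}
    {:name "tree"                       ; hypothetical label
     :type "org.tribuo.classification.dtree.CARTClassificationTrainer"
     :properties {:maxDepth "8"
                  :fractionFeaturesInSplit "0.5"}}
    {:name "forest"                     ; hypothetical label
     :type "org.tribuo.common.tree.RandomForestTrainer"
     :properties {:innerTrainer "tree"
                  :combiner "voting"
                  :numMembers "100"     ; the default of 0 must be overridden
                  :seed "12345"}}]
   :tribuo-trainer-name "forest"})

(comment
  ;; ds is a dataset with an inference target set.
  (ml/train ds random-forest-spec))
```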

25.1.0.13 o..t…ensemble.BaggingTrainer

javadoc

All configurable options for org.tribuo.ensemble.BaggingTrainer:

name	description	type	default
combiner	The combination function to aggregate each ensemble member's outputs.	org.tribuo.ensemble.EnsembleCombiner
innerTrainer	The trainer to use for each ensemble member.	org.tribuo.Trainer
numMembers	The number of ensemble members to train.	int	0
seed	The seed for the RNG.	long	0

25.1.0.14 o..t…hash.HashingTrainer

javadoc

All configurable options for org.tribuo.hash.HashingTrainer:

name	description	type	default
hasher	Feature hashing function to use.	class org.tribuo.hash.Hasher
innerTrainer	Trainer to use.	org.tribuo.Trainer

25.1.0.15 o..t…regression.baseline.DummyRegressionTrainer

javadoc

All configurable options for org.tribuo.regression.baseline.DummyRegressionTrainer:

name	description	type	default
constantValue	Constant value to use for the constant regressor.	double	NaN
dummyType	Type of dummy regressor.	class org.tribuo.regression.baseline.DummyRegressionTrainer$DummyType
quartile	Quartile to use.	double	NaN
seed	The seed for the RNG.	long	1

25.1.0.16 o..t…regression.liblinear.LibLinearRegressionTrainer

javadoc

All configurable options for org.tribuo.regression.liblinear.LibLinearRegressionTrainer:

name	description	type	default
cost	Cost penalty for misclassifications.	double	1.0
epsilon	Epsilon insensitivity in the regression cost function.	double	0.1
maxIterations	Maximum number of iterations before terminating.	int	1000
seed	RNG seed.	long	12345
terminationCriterion	Stop iterating when the loss score decreases by less than this value.	double	0.1
trainerType	Algorithm to use.	org.tribuo.common.liblinear.LibLinearType	(instance of org.tribuo.regression.liblinear.LinearRegressionType)

25.1.0.17 o..t…regression.libsvm.LibSVMRegressionTrainer

javadoc

All configurable options for org.tribuo.regression.libsvm.LibSVMRegressionTrainer:

name	description	type	default
cache_size	Internal cache size, most of the time should be left at default.	double	500.0
coef0	Polynomial coefficient or shift in sigmoid kernel.	double	0.0
cost	Cost parameter for incorrect predictions.	double	1.0
degree	Polynomial degree.	int	3
eps	Tolerance of the termination criterion.	double	0.001
gamma	Width of the RBF kernel, or scalar on sigmoid kernel.	double	0.0
kernelType	Type of Kernel.	class org.tribuo.common.libsvm.KernelType	LINEAR
nu	nu value in NU SVM.	double	0.5
p	Epsilon in EPSILON_SVR.	double	0.001
probability	Generate probability estimates.	boolean	false
seed	RNG seed.	long	12345
shrinking	Regularise the weight parameters.	boolean	true
standardize	Standardise the regression outputs before training.	boolean	false
svmType	Type of SVM algorithm.	org.tribuo.common.libsvm.SVMType

25.1.0.18 o..t…regression.rtree.CARTJointRegressionTrainer

javadoc

All configurable options for org.tribuo.regression.rtree.CARTJointRegressionTrainer:

name	description	type	default
fractionFeaturesInSplit	The fraction of features to consider in each split. 1.0f indicates all features are considered.	float	1.0
impurity	The regression impurity to use.	interface org.tribuo.regression.rtree.impurity.RegressorImpurity	MeanSquaredError
maxDepth	The maximum depth of the tree.	int	2147483647
minChildWeight	The minimum weight allowed in a child node.	float	5.0
minImpurityDecrease	The decrease in impurity needed in order to split the node.	float	0.0
normalize	Normalize the output of each leaf so it sums to one.	boolean	false
seed	The RNG seed to use when sampling features in a split.	long	12345
useRandomSplitPoints	Whether to choose split points for features at random.	boolean	false

25.1.0.19 o..t…regression.rtree.CARTRegressionTrainer

javadoc

All configurable options for org.tribuo.regression.rtree.CARTRegressionTrainer:

name	description	type	default
fractionFeaturesInSplit	The fraction of features to consider in each split. 1.0f indicates all features are considered.	float	1.0
impurity	Regression impurity measure used to determine split quality.	interface org.tribuo.regression.rtree.impurity.RegressorImpurity	MeanSquaredError
maxDepth	The maximum depth of the tree.	int	2147483647
minChildWeight	The minimum weight allowed in a child node.	float	5.0
minImpurityDecrease	The decrease in impurity needed in order to split the node.	float	0.0
seed	The RNG seed to use when sampling features in a split.	long	12345
useRandomSplitPoints	Whether to choose split points for features at random.	boolean	false
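For regression, the same spec shape applies with a regression model type. The following sketch assumes the model type :scicloj.ml.tribuo/regression and that the impurity option can refer to a separately declared component by name; the labels "mse" and "cart" and the option values are illustrative.

```clojure
;; Sketch only: assumes :impurity is resolved by component name.
(def cart-regression-spec
  {:model-type :scicloj.ml.tribuo/regression
   :tribuo-components
   [{:name "mse"                        ; hypothetical label
     :type "org.tribuo.regression.rtree.impurity.MeanSquaredError"}
    {:name "cart"                       ; hypothetical label
     :type "org.tribuo.regression.rtree.CARTRegressionTrainer"
     :properties {:impurity "mse"
                  :maxDepth "8"
                  :minChildWeight "5.0"}}]
   :tribuo-trainer-name "cart"})

(comment
  ;; ds is a dataset with a numeric inference target set.
  (ml/train ds cart-regression-spec))
```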

25.1.0.20 o..t…regression.sgd.fm.FMRegressionTrainer

javadoc

All configurable options for org.tribuo.regression.sgd.fm.FMRegressionTrainer:

name	description	type	default
epochs	The number of gradient descent epochs.	int	5
factorizedDimSize	The size of the factorized feature representation.	int	0
loggingInterval	Log values after this many updates.	int	-1
minibatchSize	Minibatch size in SGD.	int	1
objective	The regression objective to use.	interface org.tribuo.regression.sgd.RegressionObjective
optimiser	The gradient optimiser to use.	interface org.tribuo.math.StochasticGradientOptimiser	AdaGrad(initialLearningRate=1.0,epsilon=0.1,initialValue=0.0)
seed	Seed for the RNG used to shuffle elements.	long	12345
shuffle	Shuffle the data before each epoch. Only turn off for debugging.	boolean	true
standardise	Standardise the output variables before fitting the model.	boolean	false
variance	The variance of the initializer.	double	0.0

25.1.0.21 o..t…regression.sgd.linear.LinearSGDTrainer

javadoc

All configurable options for org.tribuo.regression.sgd.linear.LinearSGDTrainer:

name	description	type	default
epochs	The number of gradient descent epochs.	int	5
loggingInterval	Log values after this many updates.	int	-1
minibatchSize	Minibatch size in SGD.	int	1
objective	The regression objective to use.	interface org.tribuo.regression.sgd.RegressionObjective
optimiser	The gradient optimiser to use.	interface org.tribuo.math.StochasticGradientOptimiser	AdaGrad(initialLearningRate=1.0,epsilon=0.1,initialValue=0.0)
seed	Seed for the RNG used to shuffle elements.	long	12345
shuffle	Shuffle the data before each epoch. Only turn off for debugging.	boolean	true
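The objective option has no default in the table above, so a spec has to provide one. A minimal sketch, assuming the model type :scicloj.ml.tribuo/regression, that org.tribuo.regression.sgd.objectives.SquaredLoss is the objective wanted, and that it can be referenced by component name; labels and values are illustrative.

```clojure
;; Sketch only: assumes :objective is resolved by component name.
(def linear-sgd-regression-spec
  {:model-type :scicloj.ml.tribuo/regression
   :tribuo-components
   [{:name "squared-loss"               ; hypothetical label
     :type "org.tribuo.regression.sgd.objectives.SquaredLoss"}
    {:name "linear-sgd"                 ; hypothetical label
     :type "org.tribuo.regression.sgd.linear.LinearSGDTrainer"
     :properties {:objective "squared-loss"
                  :epochs "50"
                  :minibatchSize "1"}}]
   :tribuo-trainer-name "linear-sgd"})

(comment
  ;; ds is a dataset with a numeric inference target set.
  (ml/train ds linear-sgd-regression-spec))
```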

25.1.0.22 o..t…regression.xgboost.XGBoostRegressionTrainer

javadoc

All configurable options for org.tribuo.regression.xgboost.XGBoostRegressionTrainer:

name	description	type	default
alpha	l1 regularisation term on the weights.	double	1.0
booster	Type of the weak learner.	class org.tribuo.common.xgboost.XGBoostTrainer$BoosterType	GBTREE
eta	The learning rate, shrinks the new tree output to prevent overfitting.	double	0.3
featureSubsample	Independently subsample the features available for each node of each tree.	double	1.0
gamma	Minimum loss reduction needed to split a tree node.	double	0.0
lambda	l2 regularisation term on the weights.	double	1.0
maxDepth	The maximum depth of any tree.	int	6
minChildWeight	The minimum weight in each child node before a split is valid.	double	1.0
nThread	The number of threads to use at training time.	int	4
numTrees	The number of trees to build.	int	0
overrideParameters	Override for parameters, if used must contain all the relevant parameters, including the objective	java.util.Map<java.lang.String, java.lang.String>	{}
rType	The type of regression.	class org.tribuo.regression.xgboost.XGBoostRegressionTrainer$RegressionType	LINEAR
seed	The RNG seed.	long	12345
silent	Quiesce all the logging output from the XGBoost C library. Deprecated in favour of 'verbosity'.	int	1
subsample	Independently subsample the examples for each tree.	double	1.0
treeMethod	The tree building algorithm to use.	class org.tribuo.common.xgboost.XGBoostTrainer$TreeMethod	AUTO
verbosity	Logging verbosity, 0 is silent, 3 is debug.	class org.tribuo.common.xgboost.XGBoostTrainer$LoggingVerbosity	SILENT

25.1.0.23 o..t…transform.TransformTrainer

javadoc

All configurable options for org.tribuo.transform.TransformTrainer:

name	description	type	default
densify	Densify all the features before applying transformations.	boolean	false
includeImplicitZeroFeatures	Include the implicit zeros in the transformation statistics collection	boolean	false
innerTrainer	Trainer to use.	org.tribuo.Trainer
transformations	Transformations to apply.	class org.tribuo.transform.TransformationMap

25.2 Tribuo component reference

25.2.0.1 o..t…classification.example.CheckerboardDataSource

javadoc

All configurable options for org.tribuo.classification.example.CheckerboardDataSource:

name	description	type	default
max	The maximum feature value.	double	10.0
min	The minimum feature value.	double	0.0
numSamples	Number of samples to generate.	int	0
numSquares	The number of squares on each side.	int	5
seed	RNG seed.	long	0

25.2.0.2 o..t…classification.example.ConcentricCirclesDataSource

javadoc

All configurable options for org.tribuo.classification.example.ConcentricCirclesDataSource:

name	description	type	default
classProportion	The proportion of the circle radius that forms class one.	double	0.5
numSamples	Number of samples to generate.	int	0
radius	The radius of the outer circle.	double	2.0
seed	RNG seed.	long	0

25.2.0.3 o..t…classification.example.GaussianLabelDataSource

javadoc

All configurable options for org.tribuo.classification.example.GaussianLabelDataSource:

name	description	type	default
firstCovarianceMatrix	4 element covariance matrix of the first Gaussian.	class [D
firstMean	2d mean of the first Gaussian.	class [D
numSamples	Number of samples to generate.	int	0
secondCovarianceMatrix	4 element covariance matrix of the second Gaussian.	class [D
secondMean	2d mean of the second Gaussian.	class [D
seed	RNG seed.	long	0

25.2.0.4 o..t…classification.example.InterlockingCrescentsDataSource

javadoc

All configurable options for org.tribuo.classification.example.InterlockingCrescentsDataSource:

name	description	type	default
numSamples	Number of samples to generate.	int	0
seed	RNG seed.	long	0

25.2.0.5 o..t…classification.example.NoisyInterlockingCrescentsDataSource

javadoc

All configurable options for org.tribuo.classification.example.NoisyInterlockingCrescentsDataSource:

name	description	type	default
numSamples	Number of samples to generate.	int	0
seed	RNG seed.	long	0
variance	Variance of the Gaussian noise	double	0.1

25.2.0.6 o..t…classification.liblinear.LinearClassificationType

javadoc

All configurable options for org.tribuo.classification.liblinear.LinearClassificationType:

name	description	type	default
type	The type of classification model	class org.tribuo.classification.liblinear.LinearClassificationType$LinearType

25.2.0.7 o..t…classification.libsvm.SVMClassificationType

javadoc

All configurable options for org.tribuo.classification.libsvm.SVMClassificationType:

name	description	type	default
type	The SVM classification algorithm to use.	class org.tribuo.classification.libsvm.SVMClassificationType$SVMMode

25.2.0.8 o..t…classification.sequence.viterbi.DefaultFeatureExtractor

javadoc

All configurable options for org.tribuo.classification.sequence.viterbi.DefaultFeatureExtractor:

name	description	type	default
leastRecentOutcome	Position of the least recent output to include.	int	3
mostRecentOutcome	Position of the most recent outcome to include.	int	1
use4gram	Use 4-grams of the labels as features.	boolean	false
useBigram	Use bigrams of the labels as features.	boolean	true
useTrigram	Use trigrams of the labels as features.	boolean	true

25.2.0.9 o..t…classification.sgd.objectives.Hinge

javadoc

All configurable options for org.tribuo.classification.sgd.objectives.Hinge:

name	description	type	default
margin	The classification margin.	double	1.0

25.2.0.10 o..t…data.columnar.RowProcessor

javadoc

All configurable options for org.tribuo.data.columnar.RowProcessor:

name	description	type	default
featureProcessors	A set of feature processors to apply after extraction.	java.util.Set<org.tribuo.data.columnar.FeatureProcessor>	[]
fieldProcessorList	The list of field processors to use.	java.util.List<org.tribuo.data.columnar.FieldProcessor>
metadataExtractors	Extractors for the example metadata.	java.util.List<org.tribuo.data.columnar.FieldExtractor<?>>	[]
regexMappingProcessors	A map from a regex to field processors to apply to fields matching the regex.	java.util.Map<java.lang.String, org.tribuo.data.columnar.FieldProcessor>	{}
replaceNewlinesWithSpaces	Replace newlines with spaces in values before passing them to field processors.	boolean	true
responseProcessor	Processor which extracts the response.	org.tribuo.data.columnar.ResponseProcessor
weightExtractor	Extractor for the example weight.	org.tribuo.data.columnar.FieldExtractor<java.lang.Float>

25.2.0.11 o..t…data.columnar.extractors.DateExtractor

javadoc

All configurable options for org.tribuo.data.columnar.extractors.DateExtractor:

name	description	type
dateFormat	The expected date format.	class java.lang.String
fieldName	The field name to read.	class java.lang.String
localeCountry	Sets the locale country.	class java.lang.String
localeLanguage	Sets the locale language.	class java.lang.String
metadataName	The metadata key to emit, defaults to field name if unpopulated	class java.lang.String

25.2.0.12 o..t…data.columnar.extractors.DoubleExtractor

javadoc

All configurable options for org.tribuo.data.columnar.extractors.DoubleExtractor:

name	description	type	default
fieldName	The field name to read.	class java.lang.String
metadataName	The metadata key to emit, defaults to field name if unpopulated	class java.lang.String

25.2.0.13 o..t…data.columnar.extractors.FloatExtractor

javadoc

All configurable options for org.tribuo.data.columnar.extractors.FloatExtractor:

name	description	type	default
fieldName	The field name to read.	class java.lang.String
metadataName	The metadata key to emit, defaults to field name if unpopulated	class java.lang.String

25.2.0.14 o..t…data.columnar.extractors.IdentityExtractor

javadoc

All configurable options for org.tribuo.data.columnar.extractors.IdentityExtractor:

name	description	type	default
fieldName	The field name to read.	class java.lang.String
metadataName	The metadata key to emit, defaults to field name if unpopulated	class java.lang.String

25.2.0.15 o..t…data.columnar.extractors.IndexExtractor

javadoc

All configurable options for org.tribuo.data.columnar.extractors.IndexExtractor:

name	description	type	default
metadataName	The metadata key to emit, defaults to Example.NAME	class java.lang.String	name

25.2.0.16 o..t…data.columnar.extractors.IntExtractor

javadoc

All configurable options for org.tribuo.data.columnar.extractors.IntExtractor:

name	description	type	default
fieldName	The field name to read.	class java.lang.String
metadataName	The metadata key to emit, defaults to field name if unpopulated	class java.lang.String

25.2.0.17 o..t…data.columnar.extractors.OffsetDateTimeExtractor

javadoc

All configurable options for org.tribuo.data.columnar.extractors.OffsetDateTimeExtractor:

name	description	type
dateTimeFormat	The expected date format.	class java.lang.String
fieldName	The field name to read.	class java.lang.String
localeCountry	The locale country.	class java.lang.String
localeLanguage	The locale language.	class java.lang.String
metadataName	The metadata key to emit, defaults to field name if unpopulated	class java.lang.String

25.2.0.18 o..t…data.columnar.processors.feature.UniqueProcessor

javadoc

All configurable options for org.tribuo.data.columnar.processors.feature.UniqueProcessor:

name	description	type	default
reductionType	The operation to perform.	class org.tribuo.data.columnar.processors.feature.UniqueProcessor$UniqueType

25.2.0.19 o..t…data.columnar.processors.field.DateFieldProcessor

javadoc

All configurable options for org.tribuo.data.columnar.processors.field.DateFieldProcessor:

name	description	type	default
dateFormat	The expected date format.	class java.lang.String
featureTypes	The date features to extract.	java.util.EnumSet<org.tribuo.data.columnar.processors.field.DateFieldProcessor$DateFeatureType>
fieldName	The field name to read.	class java.lang.String
localeCountry	Sets the locale country.	class java.lang.String	US
localeLanguage	Sets the locale language.	class java.lang.String	en

25.2.0.20 o..t…data.columnar.processors.field.DoubleFieldProcessor

javadoc

All configurable options for org.tribuo.data.columnar.processors.field.DoubleFieldProcessor:

name	description	type	default
fieldName	The field name to read.	class java.lang.String
onlyFieldName	Emit a feature using just the field name.	boolean	false
throwOnInvalid	Throw NumberFormatException if the value failed to parse.	boolean	false

25.2.0.21 o..t…data.columnar.processors.field.IdentityProcessor

javadoc

All configurable options for org.tribuo.data.columnar.processors.field.IdentityProcessor:

name	description	type	default
fieldName	The field name to read.	class java.lang.String

25.2.0.22 o..t…data.columnar.processors.field.RegexFieldProcessor

javadoc

All configurable options for org.tribuo.data.columnar.processors.field.RegexFieldProcessor:

name	description	type
fieldName	The field name to read.	class java.lang.String
modes	Matching mode.	java.util.EnumSet<org.tribuo.data.columnar.processors.field.RegexFieldProcessor$Mode>
regexString	Regex to apply to the field.	class java.lang.String

25.2.0.23 o..t…data.columnar.processors.field.TextFieldProcessor

javadoc

All configurable options for org.tribuo.data.columnar.processors.field.TextFieldProcessor:

name	description	type	default
fieldName	The field name to read.	class java.lang.String
pipeline	Text processing pipeline to use.	interface org.tribuo.data.text.TextPipeline

25.2.0.24 o..t…data.columnar.processors.response.BinaryResponseProcessor

javadoc

All configurable options for org.tribuo.data.columnar.processors.response.BinaryResponseProcessor:

name	description	type	default
displayField	Whether to display field names as part of the generated output, defaults to false	boolean	false
fieldName	The field name to read, you should use only one of this or fieldNames	class java.lang.String
fieldNames	A list of field names to read, you should use only one of this or fieldName.	java.util.List<java.lang.String>
negativeName	The negative response to emit.	class java.lang.String	0
outputFactory	Output factory to use to create the response.	org.tribuo.OutputFactory
positiveName	The positive response to emit.	class java.lang.String	1
positiveResponse	The string which triggers a positive response.	class java.lang.String
positiveResponses	A list of strings that trigger positive responses; it should be the same length as fieldNames or empty	java.util.List<java.lang.String>

25.2.0.25 o..t…data.columnar.processors.response.EmptyResponseProcessor

javadoc

All configurable options for org.tribuo.data.columnar.processors.response.EmptyResponseProcessor:

name	description	type	default
outputFactory	Output factory to type the columnar loader.	org.tribuo.OutputFactory

25.2.0.26 o..t…data.columnar.processors.response.FieldResponseProcessor

javadoc

All configurable options for org.tribuo.data.columnar.processors.response.FieldResponseProcessor:

name	description	type	default
defaultValue	Default value to return if one isn't found.	class java.lang.String
defaultValues	A list of default values to return if one isn't found, one for each field	java.util.List<java.lang.String>
displayField	Whether to display field names as part of the generated label, defaults to false	boolean	false
fieldName	The field name to read.	class java.lang.String
fieldNames	A list of field names to read, you should use only one of this or fieldName.	java.util.List<java.lang.String>
outputFactory	The output factory to use.	org.tribuo.OutputFactory
uppercase	Uppercase the value before converting to output.	boolean	true

25.2.0.27 o..t…data.columnar.processors.response.Quartile

javadoc

All configurable options for org.tribuo.data.columnar.processors.response.Quartile:

name	description	type
lowerMedian	The lower quartile value.	double
median	The median value.	double
upperMedian	The upper quartile value.	double

25.2.0.28 o..t…data.columnar.processors.response.QuartileResponseProcessor

javadoc

All configurable options for org.tribuo.data.columnar.processors.response.QuartileResponseProcessor:

name	description	type
fieldName	The field name to read.	class java.lang.String
fieldNames	A list of field names to read, you should use only one of this or fieldName.	java.util.List<java.lang.String>
name	The string to emit.	class java.lang.String
outputFactory	The output factory to use.	org.tribuo.OutputFactory
quartile	The quartile to use.	class org.tribuo.data.columnar.processors.response.Quartile
quartiles	A list of quartiles to use, should have the same length as fieldNames	java.util.List<org.tribuo.data.columnar.processors.response.Quartile>

25.2.0.29 o..t…data.csv.CSVDataSource

javadoc

All configurable options for org.tribuo.data.csv.CSVDataSource:

name	description	type	default
dataPath	Path to the CSV file.	interface java.nio.file.Path
headers	The CSV headers. Should only be used if the csv file does not already contain headers.	java.util.List<java.lang.String>	[]
outputFactory	The output factory to use.	org.tribuo.OutputFactory
outputRequired	Is an output required from each row?	boolean	true
quote	The CSV quote character.	char	"
rowProcessor	The row processor to use.	org.tribuo.data.columnar.RowProcessor
separator	The CSV separator character.	char	,

25.2.0.30 o..t…data.csv.CSVSaver

javadoc

All configurable options for org.tribuo.data.csv.CSVSaver:

name	description	type	default
quote	The quote character.	char	"
separator	The column separator.	char	,

25.2.0.31 o..t…data.sql.SQLDBConfig

javadoc

All configurable options for org.tribuo.data.sql.SQLDBConfig:

name	description	type	default
connectionString	Connection string, including host, port and db.	class java.lang.String
db	Database name.	class java.lang.String
fetchSize	Size of batches to fetch from the DB for queries.	int	1000
host	Hostname of the database machine.	class java.lang.String
password	Database password.	class java.lang.String
port	Port number.	class java.lang.String
propMap	Properties to pass to java.sql.DriverManager. The username and password entries are removed from the map and moved to their own fields. If a value is specified both in the map and in a field, the field is used.	java.util.Map<java.lang.String, java.lang.String>	{}
username	Database username.	class java.lang.String

25.2.0.32 o..t…data.sql.SQLDataSource

javadoc

All configurable options for org.tribuo.data.sql.SQLDataSource:

name	description	type	default
outputFactory	The output factory to use.	org.tribuo.OutputFactory
outputRequired	Is an output required from each row?	boolean	true
rowProcessor	The row processor to use.	org.tribuo.data.columnar.RowProcessor
sqlConfig	Database configuration.	class org.tribuo.data.sql.SQLDBConfig
sqlString	SQL query to run.	class java.lang.String

25.2.0.33 o..t…data.text.DirectoryFileSource

javadoc

All configurable options for org.tribuo.data.text.DirectoryFileSource:

name	description	type	default
dataDir	The top-level directory containing the data set.	interface java.nio.file.Path	.
extractor	The feature extractor that converts text into examples.	org.tribuo.data.text.TextFeatureExtractor
outputFactory	The output factory to use.	org.tribuo.OutputFactory
preprocessors	The preprocessors to apply to the input documents.	java.util.List<org.tribuo.data.text.DocumentPreprocessor>	[]

25.2.0.34 o..t…data.text.impl.BasicPipeline

javadoc

All configurable options for org.tribuo.data.text.impl.BasicPipeline:

name	description	type	default
ngram	n in the n-gram to emit.	int	2
tokenizer	Tokenizer to use.	interface org.tribuo.util.tokens.Tokenizer

25.2.0.35 o..t…data.text.impl.CasingPreprocessor

javadoc

All configurable options for org.tribuo.data.text.impl.CasingPreprocessor:

name	description	type	default
op	Which casing operation to apply.	class org.tribuo.data.text.impl.CasingPreprocessor$CasingOperation	LOWERCASE

25.2.0.36 o..t…data.text.impl.FeatureHasher

javadoc

All configurable options for org.tribuo.data.text.impl.FeatureHasher:

name	description	type	default
dimension	Dimension to map the hash into.	int	0
hashSeed	Seed used in the hash function.	int	38495
preserveValue	Preserve input feature value.	boolean	false
valueHashSeed	Seed used for value hash function.	int	77777

25.2.0.37 o..t…data.text.impl.NgramProcessor

javadoc

All configurable options for org.tribuo.data.text.impl.NgramProcessor:

name	description	type	default
n	n in the n-gram to emit.	int	2
tokenizer	Tokenizer to use.	interface org.tribuo.util.tokens.Tokenizer
value	Value to emit for each n-gram.	double	1.0
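
As a rough illustration of what `n` controls, the sketch below builds sliding-window n-grams from a token sequence in plain Clojure. The function name `ngrams` is hypothetical, and this is not Tribuo's implementation: Tribuo's feature naming and its use of the `value` option are not reproduced here.

```clojure
;; Hypothetical helper, not part of Tribuo or scicloj.ml.tribuo.
(defn ngrams
  "Returns every contiguous run of n tokens, in order."
  [n tokens]
  (mapv vec (partition n 1 tokens)))

(ngrams 2 ["the" "quick" "brown" "fox"])
;; => [["the" "quick"] ["quick" "brown"] ["brown" "fox"]]
```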

25.2.0.38 o..t…data.text.impl.RegexPreprocessor

javadoc

All configurable options for org.tribuo.data.text.impl.RegexPreprocessor:

name	description	type	default
regexStrings	A list of regular expressions in string format used to match the input.	java.util.List<java.lang.String>
replacements	A list of replacement strings which are used to replace the matches.	java.util.List<java.lang.String>

25.2.0.39 o..t…data.text.impl.SimpleStringDataSource

javadoc

All configurable options for org.tribuo.data.text.impl.SimpleStringDataSource:

name	description	type	default
extractor	The feature extractor that generates Features from text.	org.tribuo.data.text.TextFeatureExtractor
outputFactory	The factory that converts a String into an Output instance.	org.tribuo.OutputFactory
path	The path to read the data from.	interface java.nio.file.Path
preprocessors	The document preprocessors to run on each document in the data source.	java.util.List<org.tribuo.data.text.DocumentPreprocessor>	[]
rawLines	The input data lines.	java.util.List<java.lang.String>

25.2.0.40 o..t…data.text.impl.SimpleTextDataSource

javadoc

All configurable options for org.tribuo.data.text.impl.SimpleTextDataSource:

name	description	type	default
extractor	The feature extractor that generates Features from text.	org.tribuo.data.text.TextFeatureExtractor
outputFactory	The factory that converts a String into an Output instance.	org.tribuo.OutputFactory
path	The path to read the data from.	interface java.nio.file.Path
preprocessors	The document preprocessors to run on each document in the data source.	java.util.List<org.tribuo.data.text.DocumentPreprocessor>	[]

25.2.0.41 o..t…data.text.impl.TextFeatureExtractorImpl

javadoc

All configurable options for org.tribuo.data.text.impl.TextFeatureExtractorImpl:

name	description	type	default
pipeline	The text processing pipeline.	interface org.tribuo.data.text.TextPipeline

25.2.0.42 o..t…data.text.impl.TokenPipeline

javadoc

All configurable options for org.tribuo.data.text.impl.TokenPipeline:

name	description	type	default
hashDim	Dimension to map the hash into.	int	-1
hashPreserveValue	Should feature hashing preserve the value?	boolean	true
ngram	n in the n-gram to emit.	int	2
termCounting	Use term counting, otherwise emit binary features.	boolean	false
tokenizer	Tokenizer to use.	interface org.tribuo.util.tokens.Tokenizer

25.2.0.43 o..t…data.text.impl.UniqueAggregator

javadoc

All configurable options for org.tribuo.data.text.impl.UniqueAggregator:

name	description	type	default
value	Value to emit, if unset emits the last value observed for that token.	double	NaN

25.2.0.44 o..t…datasource.IDXDataSource

javadoc

All configurable options for org.tribuo.datasource.IDXDataSource:

name	description	type
featuresPath	Path to load the features from.	interface java.nio.file.Path
outputFactory	The output factory to use.	org.tribuo.OutputFactory
outputPath	Path to load the outputs from.	interface java.nio.file.Path

25.2.0.45 o..t…datasource.LibSVMDataSource

javadoc

All configurable options for org.tribuo.datasource.LibSVMDataSource:

name	description	type	default
maxFeatureID	Sets the maximum feature id to load from the file.	int	-2147483648
outputFactory	The output factory to use.	org.tribuo.OutputFactory
path	Path to load the data from. Either this or url must be set.	interface java.nio.file.Path
url	URL to load the data from. Either this or path must be set.	class java.net.URL
zeroIndexed	Set to true if the features are zero indexed.	boolean	false

25.2.0.46 o..t…hash.HashCodeHasher

javadoc

All configurable options for org.tribuo.hash.HashCodeHasher:

name	description	type	default
salt	Salt used in the hash.	class java.lang.String

25.2.0.47 o..t…hash.MessageDigestHasher

javadoc

All configurable options for org.tribuo.hash.MessageDigestHasher:

name	description	type	default
hashType	MessageDigest hashing function.	class java.lang.String
saltStr	Salt used in the hash.	class java.lang.String

25.2.0.48 o..t…hash.ModHashCodeHasher

javadoc

All configurable options for org.tribuo.hash.ModHashCodeHasher:

name	description	type	default
dimension	Range of the hashing function.	int	100
salt	Salt used in the hash.	class java.lang.String
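
The idea of a "hash code modulo dimension" hasher can be sketched in a few lines of Clojure. The helper `mod-hash` is hypothetical. Prepending the salt to the feature name and mapping negative hash codes into the non-negative range with `mod` are assumptions of this sketch, not a description of Tribuo's code.

```clojure
;; Hypothetical helper: maps a feature name into the range [0, dimension).
;; Assumption: the salt is prepended to the name before hashing.
(defn mod-hash
  [salt dimension feature-name]
  (mod (.hashCode (str salt feature-name)) dimension))

(mod-hash "" 100 "a")
;; => 97
```

Different feature names can collide in the same bucket, so a larger `dimension` trades memory for fewer collisions.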

25.2.0.49 o..t…math.kernel.Polynomial

javadoc

All configurable options for org.tribuo.math.kernel.Polynomial:

name	description	type
degree	Degree of the polynomial.	double
gamma	Coefficient to multiply the dot product by.	double
intercept	Scalar to add to the dot product.	double

25.2.0.50 o..t…math.kernel.RBF

javadoc

All configurable options for org.tribuo.math.kernel.RBF:

name	description	type	default
gamma	Kernel output = exp(-gamma*|u-v|^2).	double	0.0

25.2.0.51 o..t…math.kernel.Sigmoid

javadoc

All configurable options for org.tribuo.math.kernel.Sigmoid:

name	description	type	default
gamma	Coefficient to multiply the dot product by.	double	0.0
intercept	Scalar intercept to add to the dot product.	double	0.0
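
To make the option descriptions of the three kernel components above concrete, here is a minimal sketch in plain Clojure over vectors of doubles. The function names are hypothetical. The polynomial and RBF forms follow the descriptions in the tables, while the use of `tanh` for the sigmoid kernel is the usual textbook form and is an assumption here rather than something stated in this reference.

```clojure
;; Hypothetical helpers, not part of Tribuo or scicloj.ml.tribuo.
(defn dot [u v]
  (reduce + (map * u v)))

(defn polynomial-kernel
  "(gamma * <u,v> + intercept) ^ degree"
  [gamma intercept degree u v]
  (Math/pow (+ (* gamma (dot u v)) intercept) degree))

(defn rbf-kernel
  "exp(-gamma * |u - v|^2)"
  [gamma u v]
  (let [diff (map - u v)]
    (Math/exp (- (* gamma (dot diff diff))))))

(defn sigmoid-kernel
  "tanh(gamma * <u,v> + intercept); the tanh form is an assumption."
  [gamma intercept u v]
  (Math/tanh (+ (* gamma (dot u v)) intercept)))

(polynomial-kernel 1.0 1.0 2.0 [1.0 2.0] [3.0 4.0])
;; => 144.0
(rbf-kernel 0.5 [1.0 2.0] [1.0 2.0])
;; => 1.0
```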

25.2.0.52 o..t…math.neighbour.bruteforce.NeighboursBruteForceFactory

javadoc

All configurable options for org.tribuo.math.neighbour.bruteforce.NeighboursBruteForceFactory:

name	description	type	default
distance	The distance function to use.	interface org.tribuo.math.distance.Distance	L2Distance()
numThreads	The number of threads to use for training.	int	1

25.2.0.53 o..t…math.neighbour.kdtree.KDTreeFactory

javadoc

All configurable options for org.tribuo.math.neighbour.kdtree.KDTreeFactory:

name	description	type	default
distance	The distance function to use.	interface org.tribuo.math.distance.Distance	L2Distance()
numThreads	The number of threads to use for training.	int	1

25.2.0.54 o..t…math.optimisers.AdaDelta

javadoc

All configurable options for org.tribuo.math.optimisers.AdaDelta:

name	description	type	default
epsilon	Epsilon for numerical stability.	double	1.0E-6
rho	Momentum value.	double	0.95

25.2.0.55 o..t…math.optimisers.AdaGrad

javadoc

All configurable options for org.tribuo.math.optimisers.AdaGrad:

name	description	type	default
epsilon	Epsilon for numerical stability around zero.	double	1.0E-6
initialLearningRate	Initial learning rate used to scale the gradients.	double	0.0
initialValue	Initial value for the gradient accumulator.	double	0.0

25.2.0.56 o..t…math.optimisers.AdaGradRDA

javadoc

All configurable options for org.tribuo.math.optimisers.AdaGradRDA:

name	description	type	default
epsilon	Epsilon for numerical stability around zero.	double	1.0E-6
initialLearningRate	Initial learning rate used to scale the gradients.	double	0.0
l1	l1 regularization penalty.	double	0.0
l2	l2 regularization penalty.	double	0.0
numExamples	Number of examples to scale the l1 and l2 penalties by.	int	1

25.2.0.57 o..t…math.optimisers.Adam

javadoc

All configurable options for org.tribuo.math.optimisers.Adam:

name	description	type	default
betaOne	The beta one parameter.	double	0.9
betaTwo	The beta two parameter.	double	0.999
epsilon	Epsilon for numerical stability.	double	1.0E-6
initialLearningRate	Learning rate to scale the gradients by.	double	0.001
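
Following the shape of the model spec shown at the top of this chapter, an optimiser could be declared as one entry of `:tribuo-components`. The sketch below only builds that entry as data. The component name `"adam"` is arbitrary, the property values are written as strings like in the earlier example, and how a trainer refers to this component is not shown here and would need to be checked against the trainer's own options.

```clojure
;; A candidate entry for the :tribuo-components vector of a model spec.
;; The values simply repeat the defaults from the table above.
(def adam-component
  {:name "adam"
   :type "org.tribuo.math.optimisers.Adam"
   :properties {:initialLearningRate "0.001"
                :betaOne "0.9"
                :betaTwo "0.999"
                :epsilon "1.0E-6"}})
```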

25.2.0.58 o..t…math.optimisers.LinearDecaySGD

javadoc

All configurable options for org.tribuo.math.optimisers.LinearDecaySGD:

name	description	type	default
initialLearningRate	Initial learning rate.	double	0.0
rho	Momentum scaling factor.	double	0.0
useMomentum	Momentum type to use.	class org.tribuo.math.optimisers.SGD$Momentum

25.2.0.59 o..t…math.optimisers.ParameterAveraging

javadoc

All configurable options for org.tribuo.math.optimisers.ParameterAveraging:

name	description	type	default
optimiser	Inner optimiser to average parameters across.	interface org.tribuo.math.StochasticGradientOptimiser

25.2.0.60 o..t…math.optimisers.Pegasos

javadoc

All configurable options for org.tribuo.math.optimisers.Pegasos:

name	description	type	default
baseRate	Base learning rate.	double	0.1
lambda	Step size shrinkage.	double	0.01

25.2.0.61 o..t…math.optimisers.RMSProp

javadoc

All configurable options for org.tribuo.math.optimisers.RMSProp:

name	description	type	default
decay	Decay factor for the momentum.	double	0.0
epsilon	Epsilon for numerical stability.	double	1.0E-8
initialLearningRate	Learning rate to scale the gradients by.	double	0.0
rho	Momentum parameter.	double	0.9

25.2.0.62 o..t…math.optimisers.SimpleSGD

javadoc

All configurable options for org.tribuo.math.optimisers.SimpleSGD:

name	description	type	default
initialLearningRate	Initial learning rate.	double	0.0
rho	Momentum scaling factor.	double	0.0
useMomentum	Momentum type to use.	class org.tribuo.math.optimisers.SGD$Momentum

25.2.0.63 o..t…math.optimisers.SqrtDecaySGD

javadoc

All configurable options for org.tribuo.math.optimisers.SqrtDecaySGD:

name	description	type	default
initialLearningRate	Initial learning rate.	double	0.0
rho	Momentum scaling factor.	double	0.0
useMomentum	Momentum type to use.	class org.tribuo.math.optimisers.SGD$Momentum

25.2.0.64 o..t…regression.RegressionFactory

javadoc

All configurable options for org.tribuo.regression.RegressionFactory:

name	description	type	default
splitChar	The character to split the dimensions on.	char	,

25.2.0.65 o..t…regression.example.GaussianDataSource

javadoc

All configurable options for org.tribuo.regression.example.GaussianDataSource:

name	description	type	default
intercept	The y-intercept of the line.	float	0.0
numSamples	The number of samples to draw.	int	0
seed	The RNG seed.	long	12345
slope	The slope of the line.	float	0.0
variance	The variance of the gaussian.	float	1.0
xMax	The maximum feature value.	float	0.0
xMin	The minimum feature value.	float	0.0

25.2.0.66 o..t…regression.example.NonlinearGaussianDataSource

javadoc

All configurable options for org.tribuo.regression.example.NonlinearGaussianDataSource:

name	description	type	default
intercept	The y-intercept of the line.	float	0.0
numSamples	The number of samples to draw.	int	0
seed	The RNG seed.	long	12345
variance	The variance of the noise gaussian.	float	1.0
weights	The feature weights. Must be a 4-element array.	float[]	float[] instance (values not shown)
xOneMax	The maximum value of x_1.	float	2.0
xOneMin	The minimum value of x_1.	float	-2.0
xZeroMax	The maximum value of x_0.	float	2.0
xZeroMin	The minimum value of x_0.	float	-2.0

25.2.0.67 o..t…regression.liblinear.LinearRegressionType

javadoc

All configurable options for org.tribuo.regression.liblinear.LinearRegressionType:

name	description	type	default
type	The type of regression algorithm.	class org.tribuo.regression.liblinear.LinearRegressionType$LinearType

25.2.0.68 o..t…regression.libsvm.SVMRegressionType

javadoc

All configurable options for org.tribuo.regression.libsvm.SVMRegressionType:

name	description	type	default
type	The SVM regression algorithm to use.	class org.tribuo.regression.libsvm.SVMRegressionType$SVMMode

25.2.0.69 o..t…regression.sgd.objectives.Huber

javadoc

All configurable options for org.tribuo.regression.sgd.objectives.Huber:

name	description	type	default
cost	Cost beyond which the loss function is linear.	double	5.0
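
The role of `cost` can be illustrated with the standard Huber loss: quadratic for small residuals and linear beyond the threshold. The function `huber-loss` is a hypothetical helper written from the textbook definition. It is assumed, not verified, that Tribuo's objective uses the same scaling.

```clojure
;; Hypothetical helper: textbook Huber loss for a single residual.
(defn huber-loss
  [cost residual]
  (let [r (Math/abs (double residual))]
    (if (<= r cost)
      (* 0.5 r r)                       ; quadratic region
      (* cost (- r (* 0.5 cost))))))    ; linear region

(huber-loss 5.0 2.0)
;; => 2.0
(huber-loss 5.0 10.0)
;; => 37.5
```

A smaller `cost` makes the loss behave like absolute error for more of the residuals, which reduces the influence of outliers.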

25.2.0.70 o..t…transform.TransformationMap

javadoc

All configurable options for org.tribuo.transform.TransformationMap:

name	description	type	default
featureTransformationList	Feature specific transformations. Accepts regexes for feature names.	java.util.Map<java.lang.String, org.tribuo.transform.TransformationMap$TransformationList>	{}
globalTransformations	Global transformations to apply after the feature specific transforms.	java.util.List<org.tribuo.transform.Transformation>

25.2.0.71 o..t…transform.TransformationMap$TransformationList

javadoc

All configurable options for org.tribuo.transform.TransformationMap$TransformationList:

name	description	type	default
list	A list of transformations to apply.	java.util.List<org.tribuo.transform.Transformation>

25.2.0.72 o..t…transform.transformations.BinningTransformation

javadoc

All configurable options for org.tribuo.transform.transformations.BinningTransformation:

name	description	type	default
numBins	Number of bins.	int	0
type	Binning algorithm to use.	class org.tribuo.transform.transformations.BinningTransformation$BinningType

25.2.0.73 o..t…transform.transformations.LinearScalingTransformation

javadoc

All configurable options for org.tribuo.transform.transformations.LinearScalingTransformation:

name	description	type	default
targetMax	Maximum value after transformation.	double	1.0
targetMin	Minimum value after transformation.	double	0.0
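
As an illustration of what `targetMin` and `targetMax` mean, the sketch below rescales a sequence of doubles so that its observed minimum and maximum land on the target bounds. The function `linear-scale` is hypothetical, and mapping a constant feature to `target-min` is an assumption of this sketch.

```clojure
;; Hypothetical helper: min-max rescaling into [target-min, target-max].
(defn linear-scale
  [target-min target-max xs]
  (let [lo   (apply min xs)
        hi   (apply max xs)
        span (- hi lo)]
    (mapv (fn [x]
            (if (zero? span)
              target-min  ; assumption: constant features map to target-min
              (+ target-min
                 (* (/ (- x lo) span) (- target-max target-min)))))
          xs)))

(linear-scale 0.0 1.0 [2.0 4.0 6.0])
;; => [0.0 0.5 1.0]
```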

25.2.0.74 o..t…transform.transformations.MeanStdDevTransformation

javadoc

All configurable options for org.tribuo.transform.transformations.MeanStdDevTransformation:

name	description	type	default
targetMean	Mean value after transformation.	double	0.0
targetStdDev	Standard deviation after transformation.	double	1.0
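
The effect of `targetMean` and `targetStdDev` can be sketched as ordinary standardization followed by a shift and scale. The function `standardize` is hypothetical. It uses the population variance (dividing by the number of values), which is an assumption, and it expects a sequence of doubles that is not constant.

```clojure
;; Hypothetical helper: rescale xs to a chosen mean and standard deviation.
;; Assumption: population variance (divide by n) is used.
(defn standardize
  [target-mean target-std-dev xs]
  (let [n        (count xs)
        mean     (/ (reduce + xs) n)
        variance (/ (reduce + (map #(let [d (- % mean)] (* d d)) xs)) n)
        std-dev  (Math/sqrt variance)]
    (mapv #(+ target-mean (* target-std-dev (/ (- % mean) std-dev))) xs)))

(standardize 0.0 1.0 [1.0 3.0])
;; => [-1.0 1.0]
```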

25.2.0.75 o..t…transform.transformations.SimpleTransform

javadoc

All configurable options for org.tribuo.transform.transformations.SimpleTransform:

name	description	type	default
op	Type of the simple transformation.	class org.tribuo.transform.transformations.SimpleTransform$Operation
operand	Operand (if required).	double	NaN
secondOperand	Second operand (if required).	double	NaN

25.2.0.76 o..t…util.tokens.impl.BreakIteratorTokenizer

javadoc

All configurable options for org.tribuo.util.tokens.impl.BreakIteratorTokenizer:

name	description	type	default
localeStr	The locale language tag string.	class java.lang.String

25.2.0.77 o..t…util.tokens.impl.SplitCharactersTokenizer

javadoc

All configurable options for org.tribuo.util.tokens.impl.SplitCharactersTokenizer:

name	description	type	default
splitCharacters	The characters to split on.	char[]	char[] instance (values not shown)
splitXDigitsCharacters	The characters to split on unless we're in a number.	char[]	char[] instance (values not shown)

25.2.0.78 o..t…util.tokens.impl.SplitPatternTokenizer

javadoc

All configurable options for org.tribuo.util.tokens.impl.SplitPatternTokenizer:

name	description	type	default
splitPatternRegex	The regex to split with.	class java.lang.String	[\.,]?\s+
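
To see what the default pattern matches, it can be tried directly with `clojure.string/split`. This only demonstrates the regular expression itself. Tribuo's tokenizer may differ in details such as token offsets or the handling of leading and trailing matches.

```clojure
(require '[clojure.string :as str])

;; The default pattern: an optional period or comma followed by whitespace.
(str/split "Hello, world. Foo bar" #"[\.,]?\s+")
;; => ["Hello" "world" "Foo" "bar"]
```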

25.2.0.79 o..t…util.tokens.impl.wordpiece.Wordpiece

javadoc

All configurable options for org.tribuo.util.tokens.impl.wordpiece.Wordpiece:

name	description	type	default
maxInputCharactersPerWord	The maximum number of characters per word to consider. This helps avoid extra work on pathological cases.	int	100
unknownToken	The value to use for 'UNKNOWN' tokens. Defaults to '[UNK]', which is a common default in BERT-based solutions.	class java.lang.String	[UNK]
vocabPath	Path to a vocabulary data file.	class java.lang.String

25.2.0.80 o..t…util.tokens.impl.wordpiece.WordpieceBasicTokenizer

javadoc

All configurable options for org.tribuo.util.tokens.impl.wordpiece.WordpieceBasicTokenizer:

name	description	type	default
tokenizeChineseChars	Split on Chinese tokens?	boolean	true

25.2.0.81 o..t…util.tokens.impl.wordpiece.WordpieceTokenizer

javadoc

All configurable options for org.tribuo.util.tokens.impl.wordpiece.WordpieceTokenizer:

name	description	type	default
basicTokenizer	Performs some tokenization work on the input text before the wordpiece algorithm is applied to each resulting token.	interface org.tribuo.util.tokens.Tokenizer	a WordpieceBasicTokenizer instance
neverSplitTokens	A set of 'token' strings that should never be split, regardless of whether they have, e.g., punctuation in the middle. No entries should have whitespace in them.	java.util.Set<java.lang.String>	[]
stripAccents	Determines whether or not to strip accents/diacritics from the input text.	boolean	true
toLowerCase	Determines whether or not to lowercase the input text.	boolean	true
whitespaceTokenizer	Performs whitespace tokenization before the 'basic' tokenizer is applied (see basicTokenizer).	interface org.tribuo.util.tokens.Tokenizer	a WhitespaceTokenizer instance
wordpiece	An instance of Wordpiece which applies the 'wordpiece' algorithm.	class org.tribuo.util.tokens.impl.wordpiece.Wordpiece

25.2.0.82 o..t…util.tokens.universal.UniversalTokenizer

javadoc

All configurable options for org.tribuo.util.tokens.universal.UniversalTokenizer:

name	description	type	default
sendPunct	Send punctuation through as tokens.	boolean	false

source: notebooks/noj_book/tribuo_reference.clj