eta |
Step size shrinkage used in the update to prevent overfitting. After each boosting step, the weights of new features can be obtained directly, and eta shrinks these feature weights to make the boosting process more conservative. |
gamma |
Minimum loss reduction required to make a further partition on a leaf node of the tree. The larger gamma is, the more conservative the algorithm will be. |
max_depth |
Maximum depth of a tree. Increasing this value makes the model more complex and more likely to overfit. A value of 0 is accepted only with the lossguide growing policy when tree_method is set to hist or gpu_hist, where it indicates no limit on depth. Beware that XGBoost consumes memory aggressively when training a deep tree. |
min_child_weight |
Minimum sum of instance weight (hessian) needed in a child. If the tree partition step results in a leaf node whose sum of instance weight is less than min_child_weight, the building process gives up further partitioning. In a linear regression task, this simply corresponds to the minimum number of instances needed in each node. The larger min_child_weight is, the more conservative the algorithm will be. |
max_delta_step |
Maximum delta step allowed for each leaf output. A value of 0 means there is no constraint. A positive value can help make the update step more conservative. This parameter is usually not needed, but it might help in logistic regression when the classes are extremely imbalanced. Setting it to a value between 1 and 10 might help control the update. |
subsample |
Subsample ratio of the training instances. Setting it to 0.5 means that XGBoost randomly samples half of the training data prior to growing trees, which helps prevent overfitting. Subsampling occurs once in every boosting iteration. |
sampling_method |
The method used to sample the training instances. uniform: each training instance has an equal probability of being selected; typically set subsample >= 0.5 for good results. gradient_based: the selection probability for each training instance is proportional to the regularized absolute value of gradients (more specifically, $\sqrt{g^2 + \lambda h^2}$); subsample may be set as low as 0.1 without loss of model accuracy. Note that gradient_based sampling is only supported when tree_method is set to gpu_hist; other tree methods only support uniform sampling. |
colsample_bytree |
|
colsample_bylevel |
|
colsample_bynode |
|
lambda |
L2 regularization term on weights. Increasing this value makes the model more conservative. |
alpha |
L1 regularization term on weights. Increasing this value makes the model more conservative. |
tree_method |
|
sketch_eps |
|
scale_pos_weight |
|
updater |
|
refresh_leaf |
|
process_type |
|
grow_policy |
|
max_leaves |
|
max_bin |
|
predictor |
|
num_parallel_tree |
|
monotone_constraints |
|
interaction_constraints |
|
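
To make the interaction between eta, gamma, min_child_weight, max_delta_step, lambda, and alpha more concrete, the following is a minimal pure-Python sketch. It assumes the commonly cited second-order formulas for gradient-boosted trees (leaf weight -G / (H + lambda) and the matching split gain, where G and H are the sums of gradients and hessians in a node). The function names `leaf_output` and `split_gain`, the argument names, and the default values are hypothetical and chosen for illustration; this is a simplification, not XGBoost's actual implementation.

```python
def leaf_output(G, H, eta=0.3, reg_lambda=1.0, reg_alpha=0.0, max_delta_step=0.0):
    """Shrunken output of a leaf with gradient sum G and hessian sum H (illustrative)."""
    # alpha (L1): soft-threshold the gradient sum toward zero.
    if G > reg_alpha:
        g = G - reg_alpha
    elif G < -reg_alpha:
        g = G + reg_alpha
    else:
        g = 0.0
    # lambda (L2): a larger value shrinks the raw leaf weight.
    w = -g / (H + reg_lambda)
    # max_delta_step: 0 means no constraint; a positive value caps |w|.
    if max_delta_step > 0:
        w = max(-max_delta_step, min(max_delta_step, w))
    # eta: step size shrinkage applied to the final leaf output.
    return eta * w


def split_gain(GL, HL, GR, HR, reg_lambda=1.0, gamma=0.0, min_child_weight=1.0):
    """Loss reduction of a candidate split minus gamma, or None if a child is too light."""
    # min_child_weight: reject splits where either child has too small a hessian sum.
    if HL < min_child_weight or HR < min_child_weight:
        return None

    def score(G, H):
        return G * G / (H + reg_lambda)

    # gamma: the split is only worthwhile when the returned value is positive.
    return 0.5 * (score(GL, HL) + score(GR, HR) - score(GL + GR, HL + HR)) - gamma


if __name__ == "__main__":
    # Two children, each with 4 instances and unit hessians (as in squared-error regression).
    print(split_gain(-4.0, 4.0, 4.0, 4.0))                        # no gamma penalty
    print(split_gain(-4.0, 4.0, 4.0, 4.0, gamma=5.0))             # negative: split not worthwhile
    print(split_gain(-4.0, 4.0, 4.0, 4.0, min_child_weight=5.0))  # rejected: children too light
    print(leaf_output(-4.0, 4.0))                                 # shrunken by eta
    print(leaf_output(-4.0, 4.0, max_delta_step=0.5))             # capped, then shrunken
```

In this sketch, raising gamma, lambda, alpha, or min_child_weight, or lowering eta, each makes the resulting tree more conservative in the sense used in the table above: fewer splits are accepted and leaf outputs move less per boosting step.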