diff --git a/doc/parameter.rst b/doc/parameter.rst
index 03951fc34bd2..342d96fcd10a 100644
--- a/doc/parameter.rst
+++ b/doc/parameter.rst
@@ -555,7 +555,8 @@ These are parameters specific to learning to rank task. See :doc:`Learning to Ra
 ***********************
 Command Line Parameters
 ***********************
-The following parameters are only used in the console version of XGBoost
+The following parameters are only used in the console version of XGBoost. The CLI has
+been deprecated and will be removed in a future release.
 
 * ``num_round``
 
diff --git a/doc/tutorials/param_tuning.rst b/doc/tutorials/param_tuning.rst
index 5ef8df003c21..114d118ab373 100644
--- a/doc/tutorials/param_tuning.rst
+++ b/doc/tutorials/param_tuning.rst
@@ -31,12 +31,17 @@ There are in general two ways that you can control overfitting in XGBoost:
 
 * The first way is to directly control model complexity.
 
-  - This includes ``max_depth``, ``min_child_weight`` and ``gamma``.
+  - This includes ``max_depth``, ``min_child_weight``, ``gamma``, ``max_cat_threshold``,
+    and other similar regularization parameters. See :doc:`/parameter` for a
+    comprehensive set of parameters.
+  - Set a constant ``base_score``. See :doc:`/tutorials/intercept` for more info.
 
 * The second way is to add randomness to make training robust to noise.
 
-  - This includes ``subsample`` and ``colsample_bytree``.
-  - You can also reduce stepsize ``eta``. Remember to increase ``num_round`` when you do so.
+  - This includes ``subsample`` and ``colsample_bytree``, which may be combined with
+    boosted random forests via ``num_parallel_tree``.
+  - You can also reduce the step size ``eta``, possibly with a training callback.
+    Remember to increase ``num_round`` when you do so.
 
 
 *************************
@@ -56,6 +61,25 @@ This can affect the training of XGBoost model, and there are two ways to improve
 
   - Set parameter ``max_delta_step`` to a finite number (say 1) to help convergence
 
+*************************************************
+Use Hyper Parameter Optimization (HPO) Frameworks
+*************************************************
+Tuning models is a sophisticated task and there are advanced frameworks to help you. For
+example, some meta-estimators in scikit-learn, like
+:py:class:`sklearn.model_selection.HalvingGridSearchCV`, can help guide the search
+process. Optuna is another great option, and there are many more based on different
+branches of statistics.
+
+**************
+Know Your Data
+**************
+The importance of understanding the data cannot be stressed enough; sometimes that is all
+it takes to get a good model. Many solutions use a simple XGBoost tree model without much
+tuning and emphasize data pre-processing instead. XGBoost can help with feature selection
+by providing both global feature importance scores and per-sample feature importance with
+SHAP values. Also, there are parameters specifically targeting categorical features, and
+tasks like survival analysis and learning to rank. Feel free to explore them.
+
 *********************
 Reducing Memory Usage
 *********************
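
A minimal sketch (outside the diff) of the first overfitting control described above, restricting model complexity with ``max_depth``, ``min_child_weight``, ``gamma``, and a constant ``base_score``; the dataset and parameter values are synthetic placeholders, not recommendations:

.. code-block:: python

    import numpy as np
    import xgboost as xgb

    # Synthetic stand-in for a real dataset.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 10))
    y = (X[:, 0] + rng.normal(scale=0.5, size=1000) > 0).astype(int)

    params = {
        "objective": "binary:logistic",
        "max_depth": 4,          # shallower trees, simpler model
        "min_child_weight": 5,   # more hessian weight required per leaf
        "gamma": 1.0,            # minimum loss reduction to make a split
        "base_score": 0.5,       # constant intercept, see /tutorials/intercept
    }
    booster = xgb.train(params, xgb.DMatrix(X, label=y), num_boost_round=100)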
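
A companion sketch for the second control, adding randomness with ``subsample`` and ``colsample_bytree`` (optionally as a boosted random forest via ``num_parallel_tree``) and lowering ``eta`` over time with the built-in ``LearningRateScheduler`` callback; the decay schedule here is an arbitrary example:

.. code-block:: python

    import numpy as np
    import xgboost as xgb

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 10))
    y = (X[:, 0] > 0).astype(int)

    params = {
        "objective": "binary:logistic",
        "subsample": 0.8,         # row subsampling per tree
        "colsample_bytree": 0.8,  # column subsampling per tree
        "num_parallel_tree": 4,   # grow a small forest per boosting round
        "eta": 0.1,
    }
    # Halve the step size every 100 rounds; a smaller eta generally needs a
    # larger num_boost_round (num_round) to compensate.
    scheduler = xgb.callback.LearningRateScheduler(
        lambda rounds: 0.1 * 0.5 ** (rounds // 100)
    )
    booster = xgb.train(
        params, xgb.DMatrix(X, label=y), num_boost_round=500, callbacks=[scheduler]
    )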
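
For the imbalanced-dataset section, a sketch of the two remedies the tutorial lists: re-weighting with ``scale_pos_weight`` and evaluating with AUC when overall ranking is what matters, or capping leaf updates with ``max_delta_step`` when calibrated probabilities are needed; the skewed labels are a synthetic stand-in:

.. code-block:: python

    import numpy as np
    import xgboost as xgb

    rng = np.random.default_rng(0)
    X = rng.normal(size=(2000, 10))
    y = (rng.random(2000) < 0.05).astype(int)  # roughly 5% positive labels

    # Common heuristic: weight positives by the negative/positive ratio.
    ratio = float(np.sum(y == 0)) / max(np.sum(y == 1), 1)
    params = {
        "objective": "binary:logistic",
        "eval_metric": "auc",       # rank-based metric for skewed labels
        "scale_pos_weight": ratio,  # when overall AUC is what matters
        # "max_delta_step": 1,      # alternative: helps convergence when
        #                           # calibrated probabilities are required
    }
    booster = xgb.train(params, xgb.DMatrix(X, label=y), num_boost_round=100)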
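
The HPO section names :py:class:`sklearn.model_selection.HalvingGridSearchCV`; a sketch of driving it with the scikit-learn wrapper ``xgboost.XGBClassifier`` follows. Successive halving is still experimental in scikit-learn, hence the explicit enabling import, and the grid is illustrative:

.. code-block:: python

    import numpy as np
    import xgboost as xgb
    from sklearn.experimental import enable_halving_search_cv  # noqa: F401
    from sklearn.model_selection import HalvingGridSearchCV

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 10))
    y = (X[:, 0] > 0).astype(int)

    param_grid = {
        "max_depth": [2, 4, 6],
        "min_child_weight": [1, 5],
        "subsample": [0.6, 0.8, 1.0],
    }
    search = HalvingGridSearchCV(
        xgb.XGBClassifier(n_estimators=100, learning_rate=0.1),
        param_grid,
        cv=3,
    ).fit(X, y)
    print(search.best_params_, search.best_score_)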
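
And for "Know Your Data", a sketch of the two importance views that section refers to: a global per-feature score from the booster and per-sample contributions (SHAP values) from ``predict``; the model and data are placeholders:

.. code-block:: python

    import numpy as np
    import xgboost as xgb

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 10))
    y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

    dtrain = xgb.DMatrix(X, label=y)
    booster = xgb.train({"objective": "binary:logistic"}, dtrain, num_boost_round=50)

    # Global importance: total gain contributed by each feature.
    print(booster.get_score(importance_type="gain"))

    # Per-sample importance: one SHAP value per feature, plus a trailing
    # bias column.
    contribs = booster.predict(dtrain, pred_contribs=True)
    print(contribs.shape)  # (1000, 11)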