
How do people decide on the ranges for hyperparameters to tune?

For example, I am tuning an xgboost model and have been following a guide on Kaggle to set the ranges of each hyperparameter, which I then use for a Bayesian optimisation gridsearch. E.g. this guide lists the typical values for max_depth of xgboost as 3-10 - how is this range decided on as typical?

I chose to read about the hyperparameter and set:

xgb_parameters = {
    'max_depth':  (1, 4), ... }

I chose 1-4 as I read that a large depth is more computationally expensive and an overfitting risk, but I have no specific reason for choosing 4 as the upper end of my range.
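
In case it's relevant, here is roughly how I'm plugging that range into the search. I'm sketching it with scikit-optimize's BayesSearchCV and an XGBRegressor; the exact library and the other parameter ranges are just placeholders, not what the guide prescribes:

# Sketch only: BayesSearchCV from scikit-optimize is one way to run the
# Bayesian search; the learning_rate range below is a placeholder.
from skopt import BayesSearchCV
from skopt.space import Integer, Real
from xgboost import XGBRegressor

xgb_parameters = {
    'max_depth': Integer(1, 4),                              # the range I chose
    'learning_rate': Real(0.01, 0.3, prior='log-uniform'),   # placeholder range
}

search = BayesSearchCV(
    estimator=XGBRegressor(n_estimators=200),
    search_spaces=xgb_parameters,
    n_iter=25,       # number of hyperparameter settings to try
    cv=5,
    random_state=0,
)
# search.fit(X_train, y_train)   # X_train / y_train stand in for my data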

Is there a resource or paper I should be referring to in order to understand the ranges of hyperparameters for any models I'm interested in? Or is it generally set by trial and error depending on your prediction problem? Or am I worrying too much about needing exact reasoning for ranges, and as long as I have a reason for my general range it's acceptable?

  • Grid search is not a great way to choose hyperparameters, because the same values are tested again and again, whether or not those values have a large influence on the model's quality. Better alternatives include random search, LIPO and Bayesian optimization. https://stats.stackexchange.com/questions/193306/optimization-when-cost-function-slow-to-evaluate/193310#193310 – Sycorax Jun 06 '22 at 13:54
  • I think the OP is already intending to use bayesian HPO. – gunes Jun 06 '22 at 19:30

1 Answer


this guide lists the typical values for max_depth of xgboost as 3-10 - how is this range decided on as typical?

You guessed correctly. "Typical" means that, in most problems, this parameter is chosen somewhere between 3 and 10. This is based on experimentation and trial and error, and it is just a rough guide. Another guide could have said 3-12, and it wouldn't be a wrong guide. This is problem and data dependent, and there are computational considerations as well. Maybe the business problem is not that sensitive, and you wouldn't want to wait a whole day just to get a 0.5% improvement.

However, although there aren't set-in-stone rules for choosing max depth candidates (at least to the best of my knowledge), you can choose reasonable values/ranges based on your number of features and dataset size. For instance, if your dataset has around 1000 samples, and if each split is assumed to divide the data in half, then after 10 splits your leaves would be left with only one or two samples each, which looks like overfitting (not necessarily, but it's a warning sign). You can come up with some bounds following this kind of logic.
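
As a rough sketch of that back-of-the-envelope argument (just an illustration of the halving assumption, not a formal rule):

import math

n_samples = 1000  # example dataset size from the paragraph above

# If every split roughly halves the data, a tree of depth d leaves about
# n_samples / 2**d samples per leaf. Once that reaches ~1, deeper trees
# are mostly memorising individual points.
depth_to_single_sample = math.ceil(math.log2(n_samples))

print(depth_to_single_sample)                    # 10
print(n_samples / 2 ** depth_to_single_sample)   # ~0.98 samples per leaf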

gunes