
I am fairly new to machine learning and statistics, but I was wondering: why isn't Bayesian optimization mentioned more often in online machine learning material as a way to optimize your algorithm's hyperparameters? For example, using a framework like this one: https://github.com/fmfn/BayesianOptimization

Does Bayesian optimization of your hyperparameters have any limitations or major disadvantages compared to techniques like grid search or random search?
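
For context, here is a minimal sketch of how I understand that library's API (the objective below is a toy stand-in for a real cross-validated model score, and the bounds are made up):

```python
# Minimal sketch of the fmfn/BayesianOptimization API
# (assumes `pip install bayesian-optimization`).
from bayes_opt import BayesianOptimization

def black_box_function(x, y):
    # Hypothetical objective: in practice this would train a model with
    # hyperparameters (x, y) and return a validation score to maximize.
    return -x ** 2 - (y - 1) ** 2 + 1

optimizer = BayesianOptimization(
    f=black_box_function,
    pbounds={"x": (-2.0, 2.0), "y": (-3.0, 3.0)},  # box constraints per hyperparameter
    random_state=1,
)
optimizer.maximize(init_points=5, n_iter=20)  # 5 random probes, then 20 BO steps
print(optimizer.max)  # best parameters and target value found
```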

EtienneT

1 Answer

  1. Results are sensitive to the surrogate model's own hyperparameters, which are typically fixed at some value; this underestimates uncertainty. The alternative is to be fully Bayesian and marginalize over the hyperparameter distributions, which can be expensive and unwieldy.
  2. It takes a dozen or so samples to get a good surrogate surface in 2 or 3 dimensions of search space; increasing the dimensionality of the search space requires still more samples.
  3. Bayesian optimization itself depends on an optimizer to search the surrogate surface, which has its own costs: this inner problem is (hopefully) cheaper to evaluate than the original problem, but it is still a non-convex, box-constrained optimization problem (i.e., difficult!).
  4. Estimating the BO model itself has costs. (The sketch after this list makes all four of these costs concrete.)
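
To make these costs concrete, here is a minimal sketch (assuming scikit-learn and SciPy; the objective, kernel settings, and candidate count are illustrative, not recommendations) of a GP surrogate with fixed kernel hyperparameters and an inner random search over an expected-improvement acquisition:

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)

def objective(x):
    # Hypothetical expensive black box (one search dimension for brevity).
    return -np.sin(3 * x) - x ** 2 + 0.7 * x

bounds = (-1.0, 2.0)
X = rng.uniform(*bounds, size=(5, 1))  # point 2: initial samples to seed the surrogate
y = objective(X).ravel()

for _ in range(15):
    # Point 1: the Matern length scale is fixed (optimizer=None), which
    # understates uncertainty; a fully Bayesian treatment would marginalize
    # over it instead.
    gp = GaussianProcessRegressor(
        kernel=Matern(length_scale=0.3, nu=2.5),
        alpha=1e-6, normalize_y=True, optimizer=None,
    )
    gp.fit(X, y)  # point 4: refitting the surrogate has its own cost

    # Point 3: maximize expected improvement by brute-force random search --
    # the inner problem is cheap relative to `objective` but still non-convex.
    cand = rng.uniform(*bounds, size=(2000, 1))
    mu, sigma = gp.predict(cand, return_std=True)
    improve = mu - y.max()
    z = improve / np.maximum(sigma, 1e-12)
    ei = improve * norm.cdf(z) + sigma * norm.pdf(z)
    x_next = cand[np.argmax(ei)].reshape(1, 1)

    X = np.vstack([X, x_next])
    y = np.append(y, objective(x_next).item())

print("best x:", X[np.argmax(y)].item(), "best y:", y.max())
```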

To state it another way: BO is an attempt to keep the number of function evaluations to a minimum and to get the most "bang for the buck" out of each evaluation. This matters if you're conducting destructive tests, or running a simulation that takes an obscene amount of time to execute. But in all but the most expensive cases, apply pure random search and call it a day! (Or LIPO, if your problem is amenable to its assumptions.) It can save you a number of headaches, such as having to optimize your Bayesian optimization program.
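
For comparison, here is what "call it a day" looks like as a minimal sketch (assuming scikit-learn and its bundled digits dataset; the estimator and sampling distributions are placeholders): random search needs no surrogate, no inner optimizer, and parallelizes trivially.

```python
from scipy.stats import loguniform
from sklearn.datasets import load_digits
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
search = RandomizedSearchCV(
    SVC(),
    # Sample each hyperparameter independently from a log-uniform prior.
    param_distributions={"C": loguniform(1e-2, 1e3), "gamma": loguniform(1e-4, 1e0)},
    n_iter=25,  # fixed evaluation budget
    cv=3,
    random_state=0,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```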

Sycorax