
I am fairly new to machine learning and statistics, but I was wondering: why isn't Bayesian optimization mentioned more often in online machine learning material as a way to optimize your algorithm's hyperparameters? For example, using a framework like this one: https://github.com/fmfn/BayesianOptimization

Does Bayesian optimization of your hyperparameters have any limitations or major disadvantages compared to techniques like grid search or random search?
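
For context, here is a minimal sketch of how I understand that library's API (the objective below is a toy stand-in for a real cross-validated model score, and the bounds are made up):

```python
# Minimal sketch of the fmfn/BayesianOptimization API
# (assumes `pip install bayesian-optimization`).
from bayes_opt import BayesianOptimization

def black_box_function(x, y):
    # Hypothetical objective: in practice this would train a model with
    # hyperparameters (x, y) and return a validation score to maximize.
    return -x ** 2 - (y - 1) ** 2 + 1

optimizer = BayesianOptimization(
    f=black_box_function,
    pbounds={"x": (-2.0, 2.0), "y": (-3.0, 3.0)},  # box constraints per hyperparameter
    random_state=1,
)
optimizer.maximize(init_points=5, n_iter=20)  # 5 random probes, then 20 BO steps
print(optimizer.max)  # best parameters and target value found
```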

EtienneT

1 Answer

  1. Results are sensitive to the surrogate model's own hyperparameters, which are typically fixed at some value; this underestimates uncertainty. The alternative is to be fully Bayesian and marginalize over the hyperparameter distributions, which can be expensive and unwieldy.
  2. It takes a dozen or so samples to get a good surrogate surface in 2 or 3 dimensions of search space; increasing the dimensionality of the search space requires still more samples.
  3. Bayesian optimization itself depends on an optimizer to search the surrogate surface, which has its own costs: this inner problem is (hopefully) cheaper to evaluate than the original problem, but it is still a non-convex, box-constrained optimization problem (i.e., difficult!).
  4. Estimating the BO model itself has costs. (The sketch after this list makes all four of these costs concrete.)
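
To make these costs concrete, here is a minimal sketch (assuming scikit-learn and SciPy; the objective, kernel settings, and candidate count are illustrative, not recommendations) of a GP surrogate with fixed kernel hyperparameters and an inner random search over an expected-improvement acquisition:

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)

def objective(x):
    # Hypothetical expensive black box (one search dimension for brevity).
    return -np.sin(3 * x) - x ** 2 + 0.7 * x

bounds = (-1.0, 2.0)
X = rng.uniform(*bounds, size=(5, 1))  # point 2: initial samples to seed the surrogate
y = objective(X).ravel()

for _ in range(15):
    # Point 1: the Matern length scale is fixed (optimizer=None), which
    # understates uncertainty; a fully Bayesian treatment would marginalize
    # over it instead.
    gp = GaussianProcessRegressor(
        kernel=Matern(length_scale=0.3, nu=2.5),
        alpha=1e-6, normalize_y=True, optimizer=None,
    )
    gp.fit(X, y)  # point 4: refitting the surrogate has its own cost

    # Point 3: maximize expected improvement by brute-force random search --
    # the inner problem is cheap relative to `objective` but still non-convex.
    cand = rng.uniform(*bounds, size=(2000, 1))
    mu, sigma = gp.predict(cand, return_std=True)
    improve = mu - y.max()
    z = improve / np.maximum(sigma, 1e-12)
    ei = improve * norm.cdf(z) + sigma * norm.pdf(z)
    x_next = cand[np.argmax(ei)].reshape(1, 1)

    X = np.vstack([X, x_next])
    y = np.append(y, objective(x_next).item())

print("best x:", X[np.argmax(y)].item(), "best y:", y.max())
```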

To state it another way: BO is an attempt to keep the number of function evaluations to a minimum and to get the most "bang for the buck" out of each evaluation. This matters if you're conducting destructive tests, or running a simulation that takes an obscene amount of time to execute. But in all but the most expensive cases, apply pure random search and call it a day! (Or LIPO, if your problem is amenable to its assumptions.) It can save you a number of headaches, such as having to optimize your Bayesian optimization program.
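
For comparison, here is what "call it a day" looks like as a minimal sketch (assuming scikit-learn and its bundled digits dataset; the estimator and sampling distributions are placeholders): random search needs no surrogate, no inner optimizer, and parallelizes trivially.

```python
from scipy.stats import loguniform
from sklearn.datasets import load_digits
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
search = RandomizedSearchCV(
    SVC(),
    # Sample each hyperparameter independently from a log-uniform prior.
    param_distributions={"C": loguniform(1e-2, 1e3), "gamma": loguniform(1e-4, 1e0)},
    n_iter=25,  # fixed evaluation budget
    cv=3,
    random_state=0,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```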

Sycorax