I have a function on the unit square that's expensive to evaluate,
and want to estimate where it reaches its minimum in at most $N$ evaluations, $N \approx 10$ say.
Assuming a model of the form paraboloid + noise,
$$ f(x;\ x_{\min}, a, c, \sigma) \approx a\,(x - x_{\min})^2 + c + \mathcal{N}(0, \sigma^2), $$
what's an algorithm to generate points $x_0 \dots x_{N-1}$ at which to sample f(), for a best estimate of $x_{\min}$? (My real problem is 2d, but it may be helpful to do 1d first, so $x,\ x_{\min},\ a,\ c,\ \sigma$ are 1d or 2d as appropriate.)
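To make the fit step concrete: in 1d, $a(x - x_{\min})^2 + c = a x^2 - 2 a x_{\min} x + (a x_{\min}^2 + c)$ is just a quadratic in $x$, so ordinary least squares gives the coefficients and the vertex $-b/(2a)$ is the $x_{\min}$ estimate. A minimal sketch in Python/numpy (the function name and the residual-based $\sigma$ estimate are my choices, not a standard recipe):

```python
import numpy as np

def fit_paraboloid_1d(x, y):
    """Least-squares fit of y ~ a*(x - xmin)^2 + c + noise,
    done as an ordinary quadratic fit y ~ p0*x^2 + p1*x + p2."""
    p = np.polyfit(x, y, 2)            # p[0]*x^2 + p[1]*x + p[2]
    a = p[0]
    xmin = -p[1] / (2.0 * a)           # vertex of the fitted parabola
    c = p[2] - a * xmin**2             # constant term of the original model
    resid = y - np.polyval(p, x)
    sigma = resid.std(ddof=3)          # 3 fitted parameters; needs > 3 points
    return xmin, a, c, sigma
```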
After sampling f() at the 4 corners and the middle, one can alternate:
- fit a paraboloid to the data so far $\rightarrow x_{\min}$
- generate the next sample point $x_i$ ... how? (one heuristic sketch below)
Heuristics come to mind, e.g. "near $x_{\min}$ but not too near", but this is surely a well-known problem? (I imagine the cases of $\sigma$ known and $\sigma$ unknown will be different.)
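For concreteness, here is one (heuristic, surely suboptimal) 2d version of that loop: refit after every evaluation and propose the next point near the current $x_{\min}$ estimate, with a jitter that shrinks as the budget is spent. I read $a(x - x_{\min})^2$ as $a\,\lVert x - x_{\min}\rVert^2$ with scalar $a$; with separate curvatures per axis, the 4-corners-plus-middle design is actually rank-deficient ($x_1^2 - x_1 - x_2^2 + x_2$ vanishes at all 5 points). The jitter schedule and the toy f() are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    """Stand-in for the expensive function: true minimum at (0.3, 0.6)."""
    return 3.0 * np.sum((np.asarray(x) - [0.3, 0.6])**2) + 1.0 + rng.normal(0.0, 0.1)

def fit_xmin(X, y):
    """Fit y ~ a*||x||^2 + b1*x1 + b2*x2 + c, i.e. a*||x - xmin||^2 + const,
    by least squares; the vertex is xmin = -b/(2a)."""
    A = np.column_stack([(X**2).sum(axis=1), X[:, 0], X[:, 1], np.ones(len(X))])
    a, b1, b2, _ = np.linalg.lstsq(A, y, rcond=None)[0]
    return np.array([-b1, -b2]) / (2.0 * a)

# initial design: 4 corners + middle, as above
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.], [0.5, 0.5]])
y = np.array([f(x) for x in X])

N = 10
for i in range(len(X), N):
    xmin = fit_xmin(X, y)
    # heuristic: near the current estimate but not too near,
    # with a jitter that shrinks as the budget is spent
    step = 0.5 * (N - i) / N
    x_next = np.clip(xmin + rng.uniform(-step, step, size=2), 0.0, 1.0)
    X = np.vstack([X, x_next])
    y = np.append(y, f(x_next))

print("estimated xmin:", fit_xmin(X, y))
```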
Added 11 Feb: In grid search, one typically evaluates f() 5 times at each point (5-fold cross-validation), on a, say, $10 \times 10$ grid, for 500 evaluations in all. The 5 values would give an estimate of $\sigma$ at each point, but then what: how would a statistician combine the $5 \times 100$ values?
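One answer I can imagine (under the paraboloid model): collapse each grid point's 5 values to a mean and a standard error, then fit the surface to the 100 means by weighted least squares with weights $1/\mathrm{SE}^2$. A sketch under that assumption, reusing the scalar-$a$ parameterization from above:

```python
import numpy as np

def combine_and_fit(X, Y):
    """X: (100, 2) grid points; Y: (100, 5) repeated evaluations.
    Collapse each point's 5 values to a mean and standard error, then
    fit y ~ a*||x||^2 + b1*x1 + b2*x2 + c by weighted least squares."""
    m = Y.mean(axis=1)
    se = Y.std(axis=1, ddof=1) / np.sqrt(Y.shape[1])  # per-point SE of the mean
    sw = 1.0 / se                                     # sqrt of weight 1/SE^2; assumes se > 0
    A = np.column_stack([(X**2).sum(axis=1), X[:, 0], X[:, 1], np.ones(len(X))])
    a, b1, b2, _ = np.linalg.lstsq(A * sw[:, None], m * sw, rcond=None)[0]
    return np.array([-b1, -b2]) / (2.0 * a)           # estimated xmin
```

(If the 5 values are cross-validation folds rather than independent replicates, the SEs are optimistic, since folds share training data, but the weighting still downweights the noisiest grid points.)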
This question on SVM grid search shows 12 different bumpy not-so-paraboloids, but does not show the spread at each point. Also, try searching for grid+search on stats.stackexchange.