I'm giving a presentation on the No-Free-Lunch theorem, since I've found that the best way to learn a topic is to try to teach it. To get a handle on the material, I started reading the Wolpert & Macready paper.
In this paper, they define a trace of length $m$ as a sequence of $(x_i, y_i)$ pairs, where $x_i \in X$ is some setting of your model parameters and $y_i \in Y$ is a value indicating how desirable $x_i$ is according to some objective function $f$.
They then describe a "black-box search algorithm" as a function that maps traces to $X$ (yielding the next set of parameters to try).
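To check my understanding of the definition, here's a minimal sketch of a search algorithm that is *strictly* a function of the trace. The finite search space, the objective, and the random-search strategy are all my own illustrative choices, not from the paper:

```python
import random

X = list(range(10))  # a small finite search space, chosen for illustration

def black_box_search(trace):
    """Map a trace [(x_1, y_1), ..., (x_m, y_m)] to the next x to try.

    Note: this function sees only the trace, never f itself --
    which is what I understand the definition to require.
    """
    visited = {x for x, _ in trace}
    unvisited = [x for x in X if x not in visited]
    return random.choice(unvisited)

def f(x):
    # The objective; the search algorithm never touches it directly,
    # it only sees the y values that get recorded in the trace.
    return (x - 3) ** 2

trace = []
for _ in range(5):
    x_next = black_box_search(trace)
    trace.append((x_next, f(x_next)))

best_x, best_y = min(trace, key=lambda pair: pair[1])
```

Random search (with no revisits) fits the definition cleanly, since its only input is the trace.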
Since their definition of a "black-box search algorithm" makes it strictly a function of the trace, is gradient descent considered a black-box search algorithm? Isn't it cheating to look at the gradient? In that case, aren't you making your algorithm a function of the objective function itself, and not just of the trace?
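To make the worry concrete, here's a sketch of a gradient-descent step. The names (`grad_f`, the learning rate, the quadratic objective) are all mine for illustration; the point is that the update needs `grad_f`, which is information about $f$ beyond the $(x_i, y_i)$ pairs in the trace:

```python
def gradient_descent_step(trace, grad_f, lr=0.1):
    """Propose the next x from the last trace point.

    Unlike a pure black-box search, this also takes grad_f as input --
    extra knowledge of f that isn't recorded in the trace.
    """
    x_last, _ = trace[-1]
    return x_last - lr * grad_f(x_last)

def f(x):
    return (x - 3.0) ** 2

def grad_f(x):
    return 2.0 * (x - 3.0)

trace = [(0.0, f(0.0))]
for _ in range(50):
    x_next = gradient_descent_step(trace, grad_f)
    trace.append((x_next, f(x_next)))
# x converges toward the minimizer x = 3.0
```

If the gradient values were themselves recorded in the trace (i.e. folded into $Y$), the situation might look different, but as written the step function is not purely a function of the trace.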