Please see this question regarding Duda, Hart, and Stork's No Free Lunch Theoremm Discussion
Hi all, I was having trouble understanding the description of the NFL theorem in Duda, Hart, and Stork. My issues are the same as brought up in the reference above. I read the answer too, but I'm still having the following problems.
Equation (1) in the book, makes sense to me. It says that the expected error rate for a learning algorithm given the training set is:
\begin{equation} \mathcal{E}[E|\mathcal{D}] = \sum_{h,F}\sum_{x \notin \mathcal{D}} P(x)[1-\delta(F(x),h(x)]P(h|\mathcal{D})P(F|\mathcal{D}) \end{equation}
(Actually, I believe the $P(x)$ should be $P(x|x \notin \mathcal{D})$, but that won't make a difference in the proof.)
Now, as stated in the text, $P(h|\mathcal{D})$ is the probability that the algorithm will produce the hypothesis function $h$. So with just a slight change of notation, we could write that the expected error for algorithm $k$ is:
\begin{equation} \mathcal{E}_k[E|\mathcal{D}] = \sum_{h,F} \sum_{x \notin \mathcal{D}} P(x)[1-\delta(F(x),h(x)]P_k(h|\mathcal{D})P(F|\mathcal{D}) \end{equation}
So the conditional error for algorithm $k$, given target function $F$ is
\begin{equation} \mathcal{E}_k[E|F, \mathcal{D}] = \sum_{h}\sum_{x \notin \mathcal{D}} P(x)[1-\delta(F(x),h(x)]P_k(h|\mathcal{D}) \end{equation}
Now, the text states "the expected off-training-set classification error when the true function is $F(x)$ and the probability for the kth candidate learning algorithm is $P_k(h(x)|\mathcal{D})$ is given by
\begin{equation} \mathcal{E}_k[E|F,n] = \sum_{x \notin \mathcal{D}} P(x)[1-\delta(F(x),h(x)]P_k(h(x)|\mathcal{D}) \end{equation}
What I don't understand is this: In the case of a stochastic algorithm, there are many hypothesis functions $h$ that could be produced by algorithm $k$. But in the above equation, only one $h(x)$ appears, and there is no summation over $h \in \mathcal{H}$. The penultimate equation above seems a better way to describe the classification error for algorithm $k$ given $F$. That is why I agree with the author of the above referenced question (especially his #1 and #2).
If the equation in the text is correct, then, which $h$ is it? Anybody have any ideas? I must be missing something.