There are already many questions about parameter tuning through cross-validation, and I have read some of them, e.g. this one. However, I still can't understand the details of the process. Here are my questions:
- How do we obtain the space of candidate parameters without a grid search?
- How do we assess the performance of a particular parameter value? For example, with 10 folds, I have seen the cross-validation error of a parameter $\theta$ written as:
$$CV(\theta) = \frac 1 n \sum_{k=1}^{K} \sum_{i \in F_k}\big(y_i - f^k_{\theta}(x_i)\big)^2$$
If $\theta$ is given, then $CV(\theta)$ looks like the average error of $f_{\theta}(x)$ over all 10 folds, i.e. over the whole training data. Then why do we cut the data into 10 folds at all? Why don't we just try all candidate $\theta$s on the whole data and see which one gives the best $f(x)$? (Below my questions I sketch what I think this formula computes.)
- When we evaluate models through CV, there will be an optimal $\theta$ on each set of 9 training folds. So in each round of validation we get a different $\theta$ from the training folds, use it on the 1 held-out fold, and then average the error rates over the 10 folds. My question is: once we know which model is the best, how do we apply it to the test dataset when we have different $\theta$s for different folds? (A second sketch below shows how I currently picture the whole procedure.)
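To make sure I read the formula correctly, here is a small sketch of what I think $CV(\theta)$ computes. The synthetic data, the `Ridge` estimator standing in for $f_\theta$, and the grid of $\theta$ values are just illustrative choices of mine:

```python
# A sketch of what I think the CV(theta) formula above computes.
# Assumptions (mine): synthetic data, Ridge regression standing in for f_theta.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

rng = np.random.RandomState(0)
X = rng.randn(200, 5)
y = X @ rng.randn(5) + 0.1 * rng.randn(200)

def cv_error(theta, X, y, K=10):
    """CV(theta) = (1/n) * sum_k sum_{i in F_k} (y_i - f^k_theta(x_i))^2."""
    n = len(y)
    total = 0.0
    for train_idx, test_idx in KFold(n_splits=K, shuffle=True, random_state=0).split(X):
        # f^k_theta: fit with this theta on everything EXCEPT fold k ...
        model = Ridge(alpha=theta).fit(X[train_idx], y[train_idx])
        # ... then accumulate its squared errors on fold k only.
        total += np.sum((y[test_idx] - model.predict(X[test_idx])) ** 2)
    return total / n

for theta in [0.01, 0.1, 1.0, 10.0]:   # an illustrative grid of candidate thetas
    print(theta, cv_error(theta, X, y))
```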

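And here is how I currently picture the overall tuning workflow, again with made-up data and `Ridge` as a stand-in model; I am not sure the final refit-with-one-$\theta$ step is the right way to resolve my last question, which is exactly what I am asking:

```python
# How I currently picture the full workflow: 10-fold CV on the training set to
# pick one theta, one refit with that theta, one evaluation on the test set.
# (Synthetic data and Ridge are illustrative; I may well be wrong about the steps.)
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score, train_test_split

rng = np.random.RandomState(0)
X = rng.randn(200, 5)
y = X @ rng.randn(5) + 0.1 * rng.randn(200)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

thetas = [0.01, 0.1, 1.0, 10.0]                      # illustrative candidate values
cv = KFold(n_splits=10, shuffle=True, random_state=0)

# Mean CV error of each theta, computed on the training set only.
cv_errors = {
    theta: -cross_val_score(Ridge(alpha=theta), X_train, y_train,
                            scoring="neg_mean_squared_error", cv=cv).mean()
    for theta in thetas
}
best_theta = min(cv_errors, key=cv_errors.get)

# A single refit with the single chosen theta, then one test-set evaluation.
final_model = Ridge(alpha=best_theta).fit(X_train, y_train)
test_mse = np.mean((y_test - final_model.predict(X_test)) ** 2)
print(best_theta, test_mse)
```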
2) Let's say 2-fold CV. In my mind, the evaluation through CV should be: train the model on the first half and find a set of good parameters, then use the trained model on the other half and get a result. Do the same thing with the two halves swapped and get another result. Use the mean of the two results as the performance of this model, and compare different models in the same way. – DukeJun May 27 '15 at 04:05

`GridSearchCV` and `LassoCV` both use a grid search. For 2), I don't quite follow what you're proposing; could you please explain what you mean by "get a result" in more detail? Is the result the prediction itself or the estimated error? – Matthew Drury May 27 '15 at 04:24
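For concreteness, a minimal `GridSearchCV` sketch of the same tune-then-refit pattern mentioned in the comments; the data, the `Ridge` estimator, and the `alpha` grid are illustrative and not taken from the thread:

```python
# A minimal GridSearchCV sketch of the same tune-then-refit pattern.
# (Synthetic data; Ridge and the alpha grid are illustrative choices only.)
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, train_test_split

rng = np.random.RandomState(0)
X = rng.randn(200, 5)
y = X @ rng.randn(5) + 0.1 * rng.randn(200)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

search = GridSearchCV(
    Ridge(),
    param_grid={"alpha": [0.01, 0.1, 1.0, 10.0]},    # the "space of parameters"
    scoring="neg_mean_squared_error",
    cv=10,            # 10-fold CV on the training set only
)
search.fit(X_train, y_train)   # refit=True by default: refits the best alpha on all of X_train

print(search.best_params_)
print(search.score(X_test, y_test))   # negative MSE of the refit model on the test set
```

With `refit=True`, a single `alpha` is chosen from the grid and the estimator is refit on the whole training set, so only one parameter value ever touches the test set.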