I want to build a neural network on a data set. My idea is to use cross-validation on a training set to select the "best" neural network (and then evaluate it on a separate test set), and to use nested cross-validation for the statistical estimates: I would use the nested CV results to plot the bias and variance of my grid search's hyper-parameters. This way I can estimate the performance of my method.
If these assumptions are correct, what should I do first: model selection or performance estimation?
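To make the setup concrete, here is a minimal sketch of the procedure I have in mind, assuming scikit-learn with an `MLPClassifier` standing in for the neural network; the data set, parameter grid, and fold counts are illustrative placeholders, not part of the actual problem:

```python
# Sketch only: nested CV to estimate the performance of the
# "grid search + training" procedure, plus a held-out test set
# for the finally selected model.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score, train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_breast_cancer(return_X_y=True)  # placeholder data set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

param_grid = {"hidden_layer_sizes": [(10,), (50,)], "alpha": [1e-4, 1e-2]}

# Inner loop: grid search (model selection) on the training folds.
inner_cv = KFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(MLPClassifier(max_iter=1000), param_grid, cv=inner_cv)

# Outer loop: nested CV estimates the performance of the whole
# selection procedure, not of one fixed set of hyper-parameters.
outer_cv = KFold(n_splits=5, shuffle=True, random_state=1)
nested_scores = cross_val_score(search, X_train, y_train, cv=outer_cv)
print("Nested-CV estimate: %.3f +/- %.3f" % (nested_scores.mean(), nested_scores.std()))

# Model selection: refit the grid search on the full training set,
# then evaluate the chosen model once on the separate test set.
search.fit(X_train, y_train)
print("Best params:", search.best_params_)
print("Test-set score:", search.score(X_test, y_test))
```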
… `f` (whole data set). `f` is essentially the inner loop. But `f` is just training + validation, without a third set for testing purposes. Isn't not including a test set wrong? – Stefano Nardo Dec 19 '16 at 18:33

… `f` (obviously a low error too) at the end of the nested CV. If this requirement is satisfied I can run `f` on the whole data set. – Stefano Nardo Dec 19 '16 at 19:08

… `f` as a training procedure. You then use the CV results as an approximation to the (unmeasured) performance of `f` (whole data set). This approximation (which is an extrapolation to a slightly larger training set) does not work if already the surrogate models have widely varying performance. – cbeleites unhappy with SX Dec 20 '16 at 10:16