Should cross-validation to compare models be performed with the same partitions?

Asked Apr 10 '15 at 06:10

If I want to compare two regression models using cross-validation, should I use the same partitions of training and test data for both of the models?

For example, suppose I fit a linear model with one predictor and then the same model but with one added predictor. Is it important I use the same split of the data into test and training samples? Or does it not matter? Or is using different splits actually preferred?

asked Apr 10 '15 at 06:10

user7340

I would recommend that you use the same train/test data for both models. This way you can analyze the results using both dependent (paired) tests and independent (unpaired) tests. – alesc Apr 10 '15 at 06:34
Thanks. Can you please give an example of a dependent and independent test? – user7340 Apr 10 '15 at 11:33
It depends on whether your data are normally distributed. If the data follow a normal distribution, you can use a t-test (dependent/paired or independent/unpaired). For non-normal dependent data, you can use the Wilcoxon signed-rank test, and for non-normal independent data you can use the Mann–Whitney U test (a.k.a. the Wilcoxon rank-sum test). Please don't mix up the last two tests: they have similar names but different purposes. – alesc Apr 10 '15 at 11:40
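
To make the comments above concrete, here is a minimal sketch of the setup described in the question: two nested linear models scored on the same cross-validation folds, then compared with the tests alesc mentions. Everything in it is an assumption for illustration, not something given in the thread: the synthetic data, the choice of 10 folds, mean squared error as the score, and the helper name `fold_mse`. Note also that per-fold scores from k-fold cross-validation are not fully independent, because the training sets overlap, so the p-values should be read as rough guides rather than exact.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic data (assumed for illustration): y depends on both x1 and x2.
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 0.5 * x2 + rng.normal(scale=1.0, size=n)

# Design matrices with an intercept column.
X_small = np.column_stack([np.ones(n), x1])      # model with one predictor
X_large = np.column_stack([np.ones(n), x1, x2])  # same model plus one predictor


def fold_mse(X, y, train_idx, test_idx):
    """Fit ordinary least squares on the training rows; return the test MSE."""
    beta, *_ = np.linalg.lstsq(X[train_idx], y[train_idx], rcond=None)
    resid = y[test_idx] - X[test_idx] @ beta
    return float(np.mean(resid ** 2))


# One shared partition: shuffle the row indices once, cut them into k folds,
# and reuse exactly these folds for both models.
k = 10
folds = np.array_split(rng.permutation(n), k)

mse_small, mse_large = [], []
for i, test_idx in enumerate(folds):
    train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
    mse_small.append(fold_mse(X_small, y, train_idx, test_idx))
    mse_large.append(fold_mse(X_large, y, train_idx, test_idx))

mse_small = np.array(mse_small)
mse_large = np.array(mse_large)

# Dependent (paired) tests: valid here because fold i is the same for both models.
paired_t = stats.ttest_rel(mse_small, mse_large)
signed_rank = stats.wilcoxon(mse_small, mse_large)

# Independent (unpaired) tests: these ignore the fold-by-fold pairing.
unpaired_t = stats.ttest_ind(mse_small, mse_large)
rank_sum = stats.mannwhitneyu(mse_small, mse_large, alternative="two-sided")

print("paired t-test p-value:       ", paired_t.pvalue)
print("Wilcoxon signed-rank p-value:", signed_rank.pvalue)
print("unpaired t-test p-value:     ", unpaired_t.pvalue)
print("Mann-Whitney U p-value:      ", rank_sum.pvalue)
```

The paired tests work on the per-fold differences, so variation in difficulty between folds cancels out. That option is only available when both models are evaluated on the same partitions; with different splits per model, only the unpaired tests would apply.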
