
If I want to compare two regression models using cross-validation, should I use the same partitions of training and test data for both of the models?

For example, suppose I fit a linear model with one predictor and then the same model with one additional predictor. Is it important that I use the same split of the data into training and test samples for both? Or does it not matter? Or is using different splits actually preferred?

user7340
  • 1
    I would recommend that you use the same train/test data for both models. This way you can analyze the results using both dependent (paired) and independent (unpaired) tests. – alesc Apr 10 '15 at 06:34
  • Thanks. Can you please give an example of a dependent and independent test? – user7340 Apr 10 '15 at 11:33
  • 1
    It depends on whether your data are normally distributed. If they are, you can use a t-test (dependent/paired or independent/unpaired). For non-normal dependent data, you can use the Wilcoxon signed-rank test, and for non-normal independent data you can use the Mann–Whitney U test (a.k.a. the Wilcoxon rank-sum test). Please don't mix up the last two tests: they have similar names but different purposes. – alesc Apr 10 '15 at 11:40
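
To make the paired/unpaired distinction in the comments above concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the simulated data, the coefficients, the choice of 10 folds, and the helper names `fit_ols`, `kfold_indices`, and `cv_mse` are all hypothetical and not from the question. It scores both models on the same folds, then computes a paired t-statistic on the per-fold differences and an unpaired (Welch) t-statistic that ignores the pairing.

```python
import math
import random
import statistics


def fit_ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y], solved by Gauss-Jordan elimination.
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(X, y))] for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


def kfold_indices(n, k, seed):
    """Shuffle 0..n-1 once and deal the indices into k disjoint test folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cv_mse(X, y, folds):
    """Per-fold test MSE: fit on everything outside the fold, score on the fold."""
    scores = []
    for test in folds:
        held_out = set(test)
        train = [i for i in range(len(y)) if i not in held_out]
        coef = fit_ols([X[i] for i in train], [y[i] for i in train])
        errs = [(y[i] - sum(c * v for c, v in zip(coef, X[i]))) ** 2 for i in test]
        scores.append(statistics.mean(errs))
    return scores


# Simulated data (assumed for illustration): y depends on x1 and, weakly, on x2.
rng = random.Random(0)
n, k = 200, 10
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [rng.gauss(0, 1) for _ in range(n)]
y = [1.0 + 2.0 * a + 0.5 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]
X_small = [[1.0, a] for a in x1]                 # intercept + one predictor
X_big = [[1.0, a, b] for a, b in zip(x1, x2)]    # intercept + two predictors

# The key step: one set of folds, reused for both models.
folds = kfold_indices(n, k, seed=1)
mse_small = cv_mse(X_small, y, folds)
mse_big = cv_mse(X_big, y, folds)

# Paired t-statistic: works on the per-fold differences.
diffs = [s - b for s, b in zip(mse_small, mse_big)]
t_paired = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(k))

# Unpaired (Welch) t-statistic: treats the two score lists as unrelated samples.
se = math.sqrt(statistics.variance(mse_small) / k + statistics.variance(mse_big) / k)
t_unpaired = (statistics.mean(mse_small) - statistics.mean(mse_big)) / se

print("paired t:", t_paired)
print("unpaired t:", t_unpaired)
```

Because both models see the same folds, fold-to-fold variation in difficulty tends to cancel in the differences, which is what makes the paired analysis possible at all; with different splits only the unpaired comparison is available. For p-values, `scipy.stats.ttest_rel` and `scipy.stats.ttest_ind` (or `scipy.stats.wilcoxon` and `scipy.stats.mannwhitneyu` for the non-normal case) could be applied to the same two score lists. Note that cross-validation folds share training data, so the per-fold scores are not fully independent and any such test should be read as approximate.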

0 Answers