
I would like to use the Sequential Sum of Squares test, but the degrees of freedom for the denominator are (n - p - 1), where n = number of samples and p = number of variables in the full model.

What do I do when p > n?

Tyro
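For reference, the sequential (extra) sum of squares test compares two nested OLS fits: $F = \frac{(RSS_r - RSS_f)/q}{RSS_f/(n-p-1)}$, where $RSS_r$ and $RSS_f$ are the residual sums of squares of the reduced and full models and q is the number of predictors being tested. The following is a minimal sketch in Python with NumPy. It assumes both models include an intercept, and the helper names `rss` and `sequential_f` are illustrative, not from any library. It refuses to compute F when n - p - 1 ≤ 0, which is the case the question asks about.

```python
import numpy as np


def rss(X, y):
    """Residual sum of squares of the OLS fit of y on design matrix X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def sequential_f(X_reduced, X_full, y):
    """Extra-sum-of-squares F statistic for the columns of X_full not in X_reduced.

    Both matrices are assumed to include an intercept column, so p = X_full.shape[1] - 1.
    """
    n = len(y)
    p = X_full.shape[1] - 1                    # predictors in the full model
    q = X_full.shape[1] - X_reduced.shape[1]   # predictors being tested
    df_den = n - p - 1
    if df_den <= 0:
        # p >= n - 1: the full model interpolates, RSS_full = 0, F is undefined
        raise ValueError(f"no residual degrees of freedom (n - p - 1 = {df_den})")
    rss_r, rss_f = rss(X_reduced, y), rss(X_full, y)
    f = ((rss_r - rss_f) / q) / (rss_f / df_den)
    return f, (q, df_den)


# Simulated example: does x3 add anything once x1 and x2 are in the model?
rng = np.random.default_rng(0)
n = 20
x = rng.normal(size=(n, 3))
y = 1.0 + 2.0 * x[:, 0] + rng.normal(size=n)
ones = np.ones((n, 1))
f_stat, df = sequential_f(np.hstack([ones, x[:, :2]]), np.hstack([ones, x]), y)
print(f"F = {f_stat:.3f} on {df} degrees of freedom")
```

Here n = 20 and the full model has p = 3, so the test has (1, 16) degrees of freedom. With n = 20 and p = 19 the denominator would have zero degrees of freedom and the function raises instead.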
  • In normal circumstances, as soon as p gets as large as n-1 (and possibly before), there's no remaining residual sum of squares at all. That implies you're not simply using multiple regression (as your tags suggest), so you need to explain clearly what you're doing. – Glen_b Mar 22 '14 at 08:38
  • I'm using stepwise selection, and at each step I'm trying to use the Sequential Sum of Squares test to decide whether adding/removing a variable is an improvement. – Tyro Mar 22 '14 at 08:50
  • You should not use stepwise selection (see here: Algorithms for automatic model selection). You'll need to use LASSO / LARS. – gung - Reinstate Monica Mar 22 '14 at 13:59
  • Thank you for your advice. I want to compare forward selection against several other methods, including elastic net, on simulated data sets. So I still need to address my original question of how to handle p > n in the Sequential Sum of Squares test. Anyone? – Tyro Mar 22 '14 at 18:06
  • @Tyro (Leaving aside any question of how to do the fitting at all) when $p>n-2$, what residual sum of squares do you have? If it's zero, is it not the case that any additional terms will always have sum of squares =0? What use then, to ask about sums of squares in that situation? Isn't the answer simply 0 every time? – Glen_b Mar 22 '14 at 22:21
  • Glen_b: you're right. Your comment reminded me that I'm trying forward selection, not backward, so my question is moot. Thanks for the kick in the head. – Tyro Mar 23 '14 at 02:44
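Glen_b's point, and why it makes the question moot for forward selection, can be checked numerically. The following is a minimal sketch in Python with NumPy. It assumes an intercept in every model, simulated Gaussian data, and a hypothetical F-to-enter threshold of 4.0 (a common textbook default, not something from this thread). At each step `forward_select` (an illustrative name) adds the candidate with the largest sequential F statistic. It stops before the residual degrees of freedom n - p - 1 reach zero, so even with more candidate variables than samples, the F denominator is never undefined.

```python
import numpy as np


def rss(X, y):
    """Residual sum of squares of the OLS fit of y on design matrix X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def forward_select(x, y, f_to_enter=4.0):
    """Forward selection using the sequential F test for each candidate.

    f_to_enter is an assumed threshold. Stops when no candidate reaches it,
    or before the model would have p = n - 1 predictors (no residual df left).
    """
    n, m = x.shape
    chosen = []
    X = np.ones((n, 1))                        # start from the intercept-only model
    while len(chosen) < m:
        df_den = n - (len(chosen) + 1) - 1     # n - p - 1 after adding one predictor
        if df_den <= 0:
            break
        base = rss(X, y)
        best_j, best_f = None, -np.inf
        for j in range(m):
            if j in chosen:
                continue
            new = rss(np.hstack([X, x[:, [j]]]), y)
            f = np.inf if new <= 0 else (base - new) / (new / df_den)
            if f > best_f:
                best_j, best_f = j, f
        if best_f < f_to_enter:
            break
        chosen.append(best_j)
        X = np.hstack([X, x[:, [best_j]]])
    return chosen


# Simulated data with more candidate predictors (50) than samples (30).
rng = np.random.default_rng(1)
n, m = 30, 50
x = rng.normal(size=(n, m))
y = 5.0 * x[:, 0] - 4.0 * x[:, 1] + 0.5 * rng.normal(size=n)
chosen = forward_select(x, y)
print("selected columns:", chosen)

# Glen_b's point: with an intercept and n - 1 predictors the fit interpolates,
# so RSS is effectively zero and any further term adds a sum of squares of zero.
rss_saturated = rss(np.hstack([np.ones((n, 1)), x[:, : n - 1]]), y)
print("RSS with p = n - 1:", rss_saturated)
```

Forward selection starts from a small model and only ever evaluates models with n - p - 1 > 0. The p > n situation therefore never reaches the F test, which is what Tyro concedes in the last comment. Backward elimination would instead have to start from a model with p ≥ n - 1, where the residual sum of squares is already zero and the test is undefined.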

0 Answers