
I have regression results where unconstrained OLS is near optimal: its out-of-sample scores are almost the best compared with several constrained regression models. Although the ratio of observations to features is high, some features are highly correlated and I expect many of them not to be useful.

I want to be able to explain these results intuitively and check that they are correct, as I had expected unconstrained OLS to do poorly.

As a sanity check, I confirmed that PCA regression (PCR) is optimal when the number of components is close to the number of features, and that results are poor when the number of components is small. By contrast, PLS does well with a small number of components.
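For concreteness, here is a minimal sketch of the kind of PCR-vs-PLS comparison I mean. The data are synthetic and built so that the response loads on low-variance directions of X (one way this pattern can arise); none of the parameters reflect my real data:

```python
# Sketch: PCR vs PLS on synthetic collinear data. The response is constructed
# to load on low-variance directions of X, so PCR needs many components while
# PLS needs few. All sizes and noise levels are illustrative assumptions.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
n, p = 2000, 100
f = rng.standard_normal((n, 1))             # common factor -> highly correlated columns
X = f + 0.1 * rng.standard_normal((n, p))
beta = rng.standard_normal(p)
beta -= beta.mean()                         # keep the signal out of the dominant PC
y = X @ beta + 0.3 * rng.standard_normal(n)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for k in (2, 5, 20, 90):
    pcr = make_pipeline(PCA(n_components=k), LinearRegression()).fit(X_tr, y_tr)
    pls = PLSRegression(n_components=k).fit(X_tr, y_tr)
    print(f"k={k:3d}  PCR R2={pcr.score(X_te, y_te):.3f}  "
          f"PLS R2={pls.score(X_te, y_te):.3f}")
```

The common factor makes the columns highly correlated, while centring beta keeps the signal orthogonal to the dominant principal component, so PCR needs many components but PLS finds the predictive direction with very few.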

Are variance inflation factors (VIFs) relevant here? I care more about out-of-sample prediction scores (e.g., R2) than about the estimation error of the parameters (which indeed could be high for unconstrained regression). In principle, if I orthogonalised all my features, the inflation factors would be reduced but the dimension wouldn't change, so I would expect the same prediction score; hence I suspect VIFs are not relevant here.
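To illustrate that last point: OLS fitted values depend only on the column space of X, so regressing on an orthonormal basis of the same columns (e.g. the Q factor of a QR decomposition, for which the VIFs are all 1) gives identical predictions. A minimal sketch with synthetic data:

```python
# Sketch: OLS predictions are invariant to orthogonalising the features,
# because fitted values depend only on the column space of X.
# Synthetic data for illustration only.
import numpy as np

rng = np.random.default_rng(1)
n, p = 200, 10
X = rng.standard_normal((n, p))
X[:, 1] = X[:, 0] + 0.01 * rng.standard_normal(n)   # induce strong collinearity
y = X @ rng.standard_normal(p) + rng.standard_normal(n)

beta = np.linalg.lstsq(X, y, rcond=None)[0]         # OLS on the raw features
Q, _ = np.linalg.qr(X)                              # orthonormal basis of col(X)
gamma = Q.T @ y                                     # OLS on orthogonalised features

print(np.allclose(X @ beta, Q @ gamma))             # True: identical fitted values
```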

I understand how the parameter estimation error blows up with feature correlation, given that the covariance of the coefficient estimates is proportional to the inverse of X^T X, but I care more about prediction scores (the error in estimating Y). Any reference for a discussion of this?
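For what it's worth, here is the short calculation behind my intuition (standard OLS assumptions, averaging over the in-sample design points):

$$\frac{1}{n}\sum_{i=1}^{n}\operatorname{Var}\!\left(x_i^T \hat\beta\right) = \frac{\sigma^2}{n}\operatorname{tr}\!\left(X (X^T X)^{-1} X^T\right) = \frac{\sigma^2 p}{n}$$

The average prediction variance depends only on the number of features p and observations n, not on the correlation structure of X: collinearity inflates the variance of individual coefficients, not of the fitted values on average. (Predicting at new points that extrapolate along low-variance directions of X can still be unstable.)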

jam123
  • This page has extensive discussion: unless the model is overfit, multicollinearity isn't a problem for making predictions. Section 4.6 of Frank Harrell's online notes summarizes the issues, with some references to the literature. – EdM Jan 16 '23 at 18:26
  • Thanks, that's the kind of discussion reference I was after. Regarding "unless the model is overfit": I believe there is at least some overfitting, yes. It's unconstrained regression and some features are probably noise, so some overfitting is inevitable. – jam123 Jan 16 '23 at 18:37
  • If "the ratio of number of observations to features is high" then there might not be substantial overfitting at all. With a high ratio, any "noisy" features are likely to have regression coefficients that are low in magnitude and thus contribute little to predictions (see the simulation sketch after these comments). – EdM Jan 16 '23 at 18:44
  • I see the logic of little harm if the number of observations is large, but a PCA of the features suggests only a few effective dimensions (maybe 5 out of 100 features), so I had expected constrained techniques to do much better than OLS. – jam123 Jan 16 '23 at 18:59
  • I think the confusion is that the textbook theory introducing ridge, LASSO, etc. is for when k < n, so that's where I derive my intuition from. – jam123 Jan 16 '23 at 19:02
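
A quick simulation of EdM's point above (synthetic data; the sizes and noise level are illustrative assumptions, not my real setup): with many observations per feature, OLS assigns near-zero coefficients to pure-noise features, and out-of-sample R2 barely suffers relative to a model fit only on the true features.

```python
# Sketch: with a high observations-to-features ratio, OLS coefficients on
# pure-noise features stay near zero and barely hurt out-of-sample R^2.
# Synthetic data, illustrative parameters only.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
n, p_signal, p_noise = 5000, 5, 95
X = rng.standard_normal((n, p_signal + p_noise))
y = X[:, :p_signal] @ rng.standard_normal(p_signal) + rng.standard_normal(n)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

full = LinearRegression().fit(X_tr, y_tr)                   # all 100 features
oracle = LinearRegression().fit(X_tr[:, :p_signal], y_tr)   # true features only
print(f"full R2:   {full.score(X_te, y_te):.3f}")
print(f"oracle R2: {oracle.score(X_te[:, :p_signal], y_te):.3f}")
print(f"max |coef| on noise features: {np.abs(full.coef_[p_signal:]).max():.3f}")
```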
