I fit a model to my data with the following formula after a stepwise selection (x1 to x5 stand for my variables):

lm(formula = outcome ~ x1 + x3 + x4 + x2 + x5 + poly(x4, 2) + poly(x2, 2) + poly(x5, 2) + poly(x1, 2) + x1:x2 + x1:x4 + x1:x3 + x4:x2 + x4:x5 + x3:x4 + x3:x2 + x2:x5 + x3:x5, data = data)

VIF shows strong multicollinearity in the interaction terms and in one or two of the poly terms. Removing the predictors with very high VIF increases the model error. I have heard that the varpart() function in R can deal with multicollinearity, but it accepts at most 4 terms, which is fewer than the number of high-VIF terms in my model. Is multicollinearity a real problem in this case, or is it a consequence of the polynomial and interaction terms that can be ignored? If it is a problem, how is it best handled? The variables are all related biologically, but the VIF values are around 1 in a simple linear model containing only the main effects. Thanks for any help.
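To make the "polynomial nature" part of the question concrete, here is a minimal base-R sketch on simulated data (not the study data). It computes VIF by hand as 1 / (1 - R^2) for a design with one product term and one squared term, first from raw predictors and then from mean-centered ones. The sample size, the predictor ranges, and the helper name `vif_manual` are assumptions made only for illustration.

```r
set.seed(1)
n <- 90  # assumed sample size, inside the 80-100 range mentioned in the comments

# Hypothetical predictors measured far from zero, independent of each other
x1 <- runif(n, 10, 20)
x2 <- runif(n, 50, 60)

# VIF of each column: 1 / (1 - R^2) from regressing that column on all the others
vif_manual <- function(X) {
  sapply(seq_len(ncol(X)), function(j) {
    r2 <- summary(lm(X[, j] ~ X[, -j]))$r.squared
    1 / (1 - r2)
  })
}

# Design built from the raw predictors
raw <- cbind(x1 = x1, x2 = x2, x1x2 = x1 * x2, x1sq = x1^2)

# Same design built from mean-centered predictors
c1 <- x1 - mean(x1)
c2 <- x2 - mean(x2)
centered <- cbind(x1 = c1, x2 = c2, x1x2 = c1 * c2, x1sq = c1^2)

vif_raw <- setNames(vif_manual(raw), colnames(raw))
vif_centered <- setNames(vif_manual(centered), colnames(centered))

print(round(vif_raw, 1))       # inflated by the product and squared terms
print(round(vif_centered, 1))  # close to 1 for these independent predictors
```

Both designs, together with an intercept, span the same column space, so the fitted values and residual error are identical; only the coefficients of the lower-order terms and their VIFs change. In this simulated case the high VIFs come from how the product and squared terms are coded rather than from any relationship between x1 and x2. Whether the same holds for the real data would have to be checked on that data.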

Rossi
  • Stepwise is not recommended in general. How many samples do you have? What is your data? – Tim Apr 25 '22 at 09:16
  • Sorry, I did forward selection. My sample size is not very large (80-100), and the data come from a DoE study. – Rossi Apr 25 '22 at 09:34
  • Your sample size isn't large enough to include all these polynomials and interactions willy-nilly. When working with small datasets you need to perform a more controlled, disciplined model selection process. One element of that is to introduce interactions suggested by the underlying theory or previous experiment. – whuber Apr 25 '22 at 12:59
  • Thanks @whuber. The goal of my study was to screen the effects of several predictors. Of course, some interactions are already reported in the biology literature. But I was hoping that by incorporating more interaction terms I could propose interactions that haven't been reported before. So do you mean this approach is flawed? – Rossi Apr 25 '22 at 19:24
  • Unless those new interactions are very strong, you won't detect anything new. There's not enough data to do so. – whuber Apr 25 '22 at 19:44
  • Thanks again. As a last question, when you talk about enough data, roughly how many data points do you mean? – Rossi Apr 25 '22 at 21:04
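As a rough way to put numbers on the sample-size comments above, the following base-R sketch builds the design matrix for the formula in the question and counts its columns and its rank. The data are standard-normal placeholders (an assumption, not the study data), and n = 90 is assumed from the 80-100 range given above; only the structure of the formula matters here.

```r
set.seed(42)
n <- 90  # assumed, within the 80-100 range given in the comments

# Placeholder predictors x1..x5, used only to materialise the design matrix
dat <- as.data.frame(matrix(rnorm(n * 5), ncol = 5,
                            dimnames = list(NULL, paste0("x", 1:5))))

# Right-hand side of the formula from the question
f <- ~ x1 + x3 + x4 + x2 + x5 +
  poly(x4, 2) + poly(x2, 2) + poly(x5, 2) + poly(x1, 2) +
  x1:x2 + x1:x4 + x1:x3 + x4:x2 + x4:x5 + x3:x4 + x3:x2 + x2:x5 + x3:x5

X <- model.matrix(f, data = dat)
n_cols <- ncol(X)     # columns requested, including the intercept
n_rank <- qr(X)$rank  # number of linearly independent columns

cat("columns:", n_cols,
    " rank:", n_rank,
    " observations per estimable coefficient:", round(n / n_rank, 1), "\n")
```

The rank should come out below the column count because x1, x2, x4 and x5 each appear both as a plain linear term and inside poly(., 2): the first column of poly(x, 2) is a centered and rescaled copy of x, so each such pair is exactly collinear once an intercept is present. That redundancy is separate from the sample-size concern, which is about how few observations remain per estimated coefficient.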
