Suppose I have a regression model and I want to identify the predictor variables that have a significant effect on my dependent variable (or that improve the fit).
I can fit a model with all parameters and do a stepwise backward elimination based on AIC, BIC, an F-test, or some other criterion, or I can fit a LASSO.
Either way, I obtain a model with fewer parameters, and I consider the remaining terms to significantly influence my response.
To check how robust this approach is, I can bootstrap my data and redo the parameter selection. I do this 1000 times and record the selected model terms each time.
I then have a frequency distribution of how often each term was selected.
To build my final model, I include all terms that are robustly selected, say in 99% of the bootstrap replicates.
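To make the procedure concrete, here is a minimal sketch of the bootstrap selection-frequency idea, using Python with scikit-learn's `Lasso` instead of R (the simulated data, the fixed penalty `alpha=0.1`, and all variable names are my own illustrative assumptions, not part of the question):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 200, 7
X = rng.normal(size=(n, p))
# only the first two predictors truly affect the response
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=n)

n_boot = 1000
counts = np.zeros(p)
for _ in range(n_boot):
    idx = rng.integers(0, n, size=n)      # resample rows with replacement
    fit = Lasso(alpha=0.1).fit(X[idx], y[idx])
    counts += fit.coef_ != 0              # note which terms were selected

freq = counts / n_boot                    # per-term selection frequency
selected = np.flatnonzero(freq >= 0.99)   # keep terms selected >= 99% of the time
print(freq.round(2), selected)
```

In R the same loop would wrap `step()` or `glmnet` inside `boot()`; the key output is the vector of per-term selection frequencies.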
Is this a valid approach, and can I apply corrections for multiple testing here? The 99% threshold is admittedly arbitrary, but can I interpret a 99% selection frequency as p = 0.01, collect such a p-value for each term in the original model, and then apply a Benjamini-Hochberg or Bonferroni correction to keep only the terms with, say, p < 0.05?
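Mechanically, the proposed correction step would look like the following (whether treating 1 minus the selection frequency as a p-value is statistically justified is exactly what I am asking; the selection frequencies below are made-up numbers for illustration):

```python
import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    """Return a boolean mask of hypotheses rejected by the BH procedure."""
    p = np.asarray(pvals)
    m = len(p)
    order = np.argsort(p)
    ranked = p[order]
    # reject the k smallest p-values, where k is the largest index
    # with p_(k) <= (k / m) * alpha
    below = ranked <= (np.arange(1, m + 1) / m) * alpha
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])
        reject[order[: k + 1]] = True
    return reject

# hypothetical bootstrap selection frequencies for 7 terms
freq = np.array([1.00, 0.99, 0.95, 0.60, 0.55, 0.40, 0.30])
pseudo_p = 1 - freq                 # the proposed "p-values"
mask = benjamini_hochberg(pseudo_p)
print(mask)
```

(`statsmodels.stats.multitest.multipletests` implements the same procedure, with `method="fdr_bh"` for BH or `method="bonferroni"`.)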
Update: the model will be used for inference (identifying parameters that have an impact on the response), not for optimal prediction. It should be as parsimonious as possible, so I tend to be conservative in term selection and want to apply the p-value correction mentioned above. Typically the model includes ~7 predictor variables, and first-order interactions may be allowed (if that does not complicate things too much); say the full model is:
lm(y ~ (x1 + x2 + .. + x5 + fac1 + fac2)^2)
Thanks.