
As I haven't found an equivalent of MASS::stepAIC for mixed models (e.g. in lmer), what I intend to do is find the best lm model using stepAIC and then move to lmer and add the random effects.

I'm not confident that this approach is the best, but my question is how bad it could be.

The code looks like this:

library(lme4)

# Set up a big model to consider
lm_big <- lm(res ~ (v1 + v2 + v3 + v4)^2, data = dat)
# stepAIC will choose the "best" model
lm_aic <- MASS::stepAIC(lm_big)

# Then take the selected formula and add the random effects
lmer_formula <- update(formula(lm_aic), . ~ . + (1 | r1) + (1 | r2))
lmer1 <- lmer(lmer_formula, data = dat)

The data come from a DoE studying a bread-making process. The design has 8 points plus 3 repetitions of the centre point. It was run twice over a two-day period (once a day; this is r1) by two different operators (r2), with the runs randomized. So there are 22 runs in total, with 4 factors at 2 levels and two blocks.
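A design like the one described can be sketched in base R to see which terms of the big model are estimable at all. This is only a minimal sketch under assumptions: the 8 factorial points are taken to be a half fraction with the hypothetical generator v4 = v1 * v2 * v3 (the post does not state the generator), the response is simulated noise used purely as a placeholder, and the operator assignment (r2) is left out.

```r
# Hypothetical reconstruction of the 22-run design. The generator
# v4 = v1 * v2 * v3 and the simulated response are assumptions.
half <- expand.grid(v1 = c(-1, 1), v2 = c(-1, 1), v3 = c(-1, 1))
half$v4 <- half$v1 * half$v2 * half$v3                 # 8 factorial points
centre <- data.frame(v1 = rep(0, 3), v2 = 0, v3 = 0, v4 = 0)  # 3 centre points

one_day <- rbind(half, centre)                         # 11 runs per day
dat <- rbind(one_day, one_day)                         # 22 runs over two days
dat$r1 <- factor(rep(c("day1", "day2"), each = 11))    # day (block)

set.seed(1)
dat$res <- rnorm(nrow(dat))                            # placeholder response only

# The big model from the question
lm_big <- lm(res ~ (v1 + v2 + v3 + v4)^2, data = dat)

# Coefficients that lm() could not estimate because they are aliased
not_estimable <- names(coef(lm_big))[is.na(coef(lm_big))]
print(not_estimable)
```

Under this assumed generator the two-way interactions are aliased in pairs, so some of them come back as NA no matter which selection method is used afterwards; `alias(lm_big)` shows the full alias structure.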

Lefty
  • You shouldn't do stepwise regression at all: https://stats.stackexchange.com/a/20856/11849 – Roland Feb 09 '23 at 06:11
  • @Roland, thanks for taking the time to answer. I've seen the post; however, my dataset is not observational data (like the race example) but a DoE with some blocking. From these 4 DoE factors I would expect effects up to two-way interactions (based on field knowledge). I have no expectation about the direction of the factor effects or anything else. So my only alternative is a data-driven model selection, which will lead to a useful and wrong model. I understand Harrell's comments, but I see no alternative. – Lefty Feb 09 '23 at 14:04
  • Please give more details and context: what is the sample size? What do the variables represent in the real world? What is the experimental design? Maybe add the tag [tag:experiment-design] – kjetil b halvorsen Feb 09 '23 at 14:14
  • @Roland I do not think that these ten points are of much significance when the goal is prediction. And points 9 & 10 are outright ridiculous. – cdalitz Feb 09 '23 at 14:37
  • @cdalitz: In light of OPs last update, I don't think the goal here is prediction. – kjetil b halvorsen Feb 09 '23 at 16:07
  • @kjetil b halvorsen, as the DoE is randomized and there are no causal effects acting on the predictors (e.g. confounders), a causal and a predictive model should be the same. – Lefty Feb 09 '23 at 16:22
  • @kjetil-b-halvorsen Thanks for pointing me to the edit. Even if the goal is prediction: with so little data, stepwise AIC (which is a greedy feature selection method) quite likely will result in a "best" model that is too much fitted to the data. I thus agree with Roland that stepwise feature selection is dubious in this case, but for different reasons. – cdalitz Feb 09 '23 at 16:23
  • Looks like a replicated half-fraction design. Just to verify, you did not have Day1 run by Operator1 and Day2 by Operator2, right? I'd recommend fitting the main effects and interaction model (with Day and Operator as well), looking at the confidence intervals, plotting everything, and using your subject matter knowledge to choose which factors to continue forward with in your experimentation. Don't let an automated algorithm choose for you. – MichiganWater Feb 10 '23 at 20:22
  • @MichiganWater, hi, indeed you're right about the DoE type. I like the idea that I should not allow an algorithm to choose the model. However, when an interaction is included in the model and the literature only says that this interaction could be real but is not certain, what should I do? – Lefty Feb 13 '23 at 13:53
  • @Lefty I'm not sure what you're asking about "what should I do?" - if scientific insight says an interaction is reasonable, then I'd communicate that the 'plausible' size for the interaction effect is the range of the confidence interval. Plausible is in quotes because it's technically not quite correct for frequentist confidence intervals vs Bayesian credible intervals, but that's a whole other issue. "The interaction effect could reasonably be as small as LCL or as large as UCL." LCL/UCL = Lower/Upper Confidence Limit. A good conceptual starting point is to learn about equivalence tests. – MichiganWater Feb 13 '23 at 19:49

0 Answers