Suppose I have pseudo-data with three subjects, the painkiller dosage each one took, and the pain each one felt in four trials (the numbers don't make medical sense, but I want the case to be extreme):
library(ggplot2)
library(lme4)
set.seed(1337)
x1 <- 1:4
y1 <- 100 + (rnorm(4) - 8 * x1)
x2 <- 26:29
y2 <- 500 + (rnorm(4) - 7 * x2)
x3 <- 51:54
y3 <- 1000 + (rnorm(4) - 10 * x3)
df_test <- data.frame(subject_id = factor(rep(c(1,2,3), each = 4)),
dosage = c(x1, x2, x3),
pain = round(c(y1, y2, y3), 1))
df_test
## subject_id dosage pain
## 1 1 1 92.2
## 2 1 2 82.6
## 3 1 3 75.7
## 4 1 4 69.6
## 5 2 26 317.3
## 6 2 27 313.0
## 7 2 28 304.9
## 8 2 29 299.1
## 9 3 51 491.9
## 10 3 52 479.6
## 11 3 53 471.0
## 12 3 54 458.3
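For reference, here is a quick sanity check on the exact numbers printed above, done in plain Python with ordinary least squares rather than a mixed model: every subject's own slope of pain vs. dosage is negative, yet the slope of a single line pooled over all 12 points is positive, because the subjects barely overlap in dosage.

```python
# Per-subject vs. pooled OLS slopes on the printed data (plain Python,
# no mixed model). Each subject's trend is negative; the pooled trend
# across subjects is positive.

def ols_slope(x, y):
    """Slope of a simple least-squares fit of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

subjects = {
    1: ([1, 2, 3, 4],     [92.2, 82.6, 75.7, 69.6]),
    2: ([26, 27, 28, 29], [317.3, 313.0, 304.9, 299.1]),
    3: ([51, 52, 53, 54], [491.9, 479.6, 471.0, 458.3]),
}

for sid, (x, y) in subjects.items():
    print(f"subject {sid}: slope = {ols_slope(x, y):.2f}")  # all negative

all_x = [v for x, _ in subjects.values() for v in x]
all_y = [v for _, y in subjects.values() for v in y]
print(f"pooled: slope = {ols_slope(all_x, all_y):.2f}")  # positive, about +7.9
```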
Since each subject was measured repeatedly, I'd like to learn "the relationship between dosage taken and pain felt" by fitting a mixed model with subject as the grouping factor of the random effects.
Two models I tried are listed below:
Model 1:
summary(lmer(pain ~ dosage + (1 | subject_id), data = df_test))
## Linear mixed model fit by REML ['lmerMod']
## Formula: pain ~ dosage + (1 | subject_id)
## Data: df_test
##
## REML criterion at convergence: 77.4
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.60297 -0.19477 0.04491 0.25937 1.53768
##
## Random effects:
## Groups Name Variance Std.Dev.
## subject_id (Intercept) 161876.29 402.339
## Residual 8.45 2.907
## Number of obs: 12, groups: subject_id, 3
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 512.2454 233.2030 2.197
## dosage -8.1568 0.7489 -10.891
##
## Correlation of Fixed Effects:
## (Intr)
## dosage -0.088
Model 2:
summary(lmer(pain ~ dosage + (1+dosage | subject_id), data = df_test))
## Linear mixed model fit by REML ['lmerMod']
## Formula: pain ~ dosage + (1 + dosage | subject_id)
## Data: df_test
##
## REML criterion at convergence: 101.3
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.34815 -0.60873 -0.01502 0.61189 1.36244
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## subject_id (Intercept) 2526.259 50.262
## dosage 1.381 1.175 -1.00
## Residual 402.622 20.065
## Number of obs: 12, groups: subject_id, 3
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 99.4286 34.2614 2.902
## dosage 7.2037 0.7812 9.221
##
## Correlation of Fixed Effects:
## (Intr)
## dosage -0.939
I've read Questions about how random effects are specified in lmer. My understanding is that Model 1 means each subject has his/her own intercept of pain vs. dosage, while Model 2 additionally allows each subject to have his/her own slope of pain vs. dosage. Am I correct?
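If I've read the lmer syntax right, the two models can be written out as follows (indexing subjects by $i$ and trials by $j$):

Model 1: $\text{pain}_{ij} = (\beta_0 + b_{0i}) + \beta_1\,\text{dosage}_{ij} + \varepsilon_{ij}$, with $b_{0i} \sim \mathcal{N}(0, \sigma^2_{b_0})$.

Model 2: $\text{pain}_{ij} = (\beta_0 + b_{0i}) + (\beta_1 + b_{1i})\,\text{dosage}_{ij} + \varepsilon_{ij}$, with $(b_{0i}, b_{1i})$ jointly normal with correlation $\rho$.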
However, I am still surprised by the huge difference between the fixed-effect coefficients of dosage (-8.1568 in Model 1 vs. 7.2037 in Model 2). I thought Model 2 would be more appropriate for my question, since subjects could react to the drug differently, but the positive coefficient contradicts the trend within every single subject, which makes the interpretation rather unreasonable.
How should I choose the "correct" model? And why are the results so different between these two models? What is the mathematical reason?
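One way to see where a +7 slope could come from, sketched in plain Python on the printed data (this does not reproduce the REML fits, it only decomposes an ordinary pooled least-squares slope): the pooled slope is $(S_{xy}^{\text{within}} + S_{xy}^{\text{between}}) / (S_{xx}^{\text{within}} + S_{xx}^{\text{between}})$, and when the groups barely overlap in $x$, the between-subject part dominates and can flip the sign relative to every within-subject slope (Simpson's paradox).

```python
# Within/between decomposition of the pooled least-squares slope.
# pooled = (sxy_within + sxy_between) / (sxx_within + sxx_between);
# here the between-subject sums dominate because the three dosage
# ranges (1-4, 26-29, 51-54) are far apart.

data = {
    1: ([1, 2, 3, 4],     [92.2, 82.6, 75.7, 69.6]),
    2: ([26, 27, 28, 29], [317.3, 313.0, 304.9, 299.1]),
    3: ([51, 52, 53, 54], [491.9, 479.6, 471.0, 458.3]),
}

all_x = [v for x, _ in data.values() for v in x]
all_y = [v for _, y in data.values() for v in y]
gx = sum(all_x) / len(all_x)  # grand mean of dosage
gy = sum(all_y) / len(all_y)  # grand mean of pain

sxx_w = sxy_w = sxx_b = sxy_b = 0.0
for x, y in data.values():
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx_w += sum((a - mx) ** 2 for a in x)
    sxy_w += sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx_b += len(x) * (mx - gx) ** 2
    sxy_b += len(x) * (mx - gx) * (my - gy)

print("within slope: ", sxy_w / sxx_w)                    # about -8.2
print("between slope:", sxy_b / sxx_b)                    # about +7.9
print("pooled slope: ", (sxy_w + sxy_b) / (sxx_w + sxx_b))  # about +7.9
```

Interestingly, Model 1's fixed slope (-8.1568) sits close to the within-subject slope here, while Model 2's (+7.2037) sits close to the between-subject slope, so Model 2's estimate looks like it has drifted toward the between-subject trend rather than the within-subject one.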

Comments:

- A -1 correlation does not suggest a well-behaved model numerically. – usεr11852 Feb 25 '18 at 16:09
- -1 correlations suggest an over-parametrised model. Maybe you should consider treating the random slope and the random intercept as not having an explicit correlation (e.g. (0 + dosage | subject_id)). – usεr11852 Feb 25 '18 at 17:10
- (1 + dosage || subject_id) does not help: the fixed effect of dosage is still estimated as +7, which does not make any sense. Good question (+1). For some reason the random intercept variance in Model 2 is much lower than it should be. I have no idea why this happens. – amoeba Feb 26 '18 at 13:04
- The unceremonious answer is the usual: "optimise harder". model2_nm <- lmer(pain ~ dosage + (1 + dosage | subject_id), data = df_test, control = lmerControl(optimizer = "Nelder_Mead")) works fine. Even a simple AIC comparison shows that model2_nm is much better than model2 (fit by BOBYQA). Too few points, I guess, for the quadratic approximation. – usεr11852 Feb 26 '18 at 21:02
- (dosage is strongly correlated with subject_id, and $n=3$ subjects is arguably too low for a random effect (the usual recommendation says at least 5), so it's no wonder that the optimizer fails. But bobyqa does not give any warning, so it's easy to miss.) – amoeba Feb 26 '18 at 21:13