How to correctly set up my mixed-effect model?

Question

I have data on days in which the greening of trees happen across America in 2015. This includes meteorological and topography data etc. I want to predict the day of greening happens through a linear mixed-effect model by meteorological data and topography data being fixed effects while the states of America is the random effect. I have looked into how to conduct the linear mixed-effect model in R, and I tried to perform this, but the output looks strange. I have looked at many examples and this 'cheat sheet' on how to perform linear mixed-effect models in R.

Right now, when I for example plot the relationship between relative humidity and day of greening I get strong positive relationships in many states which makes a lot of sense in this study.

But, in my overall linear mixed-effect model I get a negative estimate for relative humidity, while a simple linear regression generates a positive estimate for relative humidity as predictor. The models are below:

The number 15 represents the year 2015
Doy: Day of the year, greening occurs
PostC15: Carbon content in the vegetation
PostMC15: Mean carbon content in the state
PostPAR15 photosynthetically active radiation
PostElev15: Elevation
PostPR15: Precipitation
PostRH15: Relative humidity    
PostTemp15: Air temperature
PostAsp15: Aspect
lm.doy <- lm(doy ~ postC15+postMC15+postPAR15+postElev15+postPR15+postRH15+postTEMP15+postAsp15, data = df)
lmm.doy &lt;- lme(doy ~ postC15+postMC15+postPAR15+postElev15+postPR15+postRH15+postTEMP15+postAsp15, data = postSM, random = ~1+postC15+postMC15+postPAR15+postElev15+postPR15+postRH15+postTEMP15+postAsp15|states, method = 'ML', control = lmeControl(opt = &quot;optim&quot;, msMaxIter=1000, maxIter = 1000, msMaxEval = 1000))


Below is the summary of both:
summary(lm.doy)
Coefficients:
              Estimate Std. Error t value Pr(&gt;|t|)    
(Intercept) -2.549e-16  1.286e-02   0.000 1.000000    
postC15     -3.567e-02  1.636e-02  -4.626 3.83e-06 ***
postMC15     4.209e-02  2.225e-02   3.689 0.000228 ***
postPAR15   -5.008e-02  3.016e-02  -1.660 0.096960 .  
postElev15   6.690e-02  1.588e-02   2.323 0.020217 *  
postPR15     7.233e-02  1.871e-02   2.797 0.005178 ** 
postRH15     8.407e-02  2.556e-02   3.289 0.001012 ** 
postTEMP15   3.635e-01  2.043e-02   8.004 1.52e-15 ***
postAsp15   -3.278e-01  1.477e-02  -8.655  &lt; 2e-16 ***

summary(lmm.doy)
                 Value  Std.Error   DF   t-value p-value
(Intercept) -0.2376308 0.10220873 4500 -1.248727  0.2118
postC15     -0.1541045 0.01906032 4500 -2.313944  0.0207
postMC15    -0.1168753 0.02265520 4500 -1.186277  0.2356
postPAR15   -0.2255873 0.03026647 4500 -3.818988  0.0001
postElev15  -0.1322979 0.01637984 4500 -1.361303  0.1735
postPR15    -0.1411053 0.01815212 4500 -1.713592  0.0867
postRH15    -0.1729018 0.03824127 4500 -1.644868  0.1001
postTEMP15   0.1462955 0.04477048 4500  0.810701  0.4176
postAsp15    0.1557790 0.01890807 4500  2.421137  0.0155

I am not sure that I set up my linear mixed-effect model correctly. In simple terms, I thought by defining:

    random = ~1+postC15+postMC15+postPAR15+postElev15+postPR15+postRH15+postTEMP15+postAsp15|states

I more or less compute small linear regression for each state and eventually 'sum' it up to one global estimate of intercept and slope for each individual variable. Or should I basically do the following:

    doy ~ (1|states) + postC15+postMC15+postPAR15+postElev15+postPR15+postRH15+postTEMP15+postAsp15 + (0+postC15+postMC15+postPAR15+postElev15+postPR15+postRH15+postTEMP15+postAsp15|states)

To get what I want...the effect of each and every variable for each and every state?

How do I set up a linear mixed-effects model in R to predict the day of greening for each individual state from meteorological and topographical data?

Can you be more explicit about what each predictor name means? Looking at this I can surmise what part of the problem may be, but the naming of the predictors doesn't help unless you provide some explanation of what they represent. — Shawn Hemelstrand, Mar 19 '23 at 13:13
Hi @ShawnHemelstrand. Thanks for your reply. I have edited the question to included brief descriptions of each variable included. Does it help? — Thomas, Mar 19 '23 at 13:58
Did this model converge? Im less familar with lme but typically that many random effects terms will cause extreme convergence issues with lmer. — Shawn Hemelstrand, Mar 19 '23 at 14:24
Not once I used lmer, but if I use nlme with control = lmeControl(opt = "optim", msMaxIter=1000, maxIter = 1000, msMaxEval = 1000) it does converge. At least I dont get any converge error. — Thomas, Mar 19 '23 at 18:38

score 2 · Accepted Answer · answered Mar 19 '23 at 15:19

As @shawn-hemelstrand commented, I am pretty surprised that this code converged for you. nlmer::lme() will sometimes allow users to fit models that they do not have enough data for. If you are new to mixed models, I would strongly recommend starting with lme4::lmer()(also using the lmerTest library). For example, I might first fit this model as:

lmm.doy.lmer <- lmerTest::lmer(doy ~ 
   postC15+postMC15+postPAR15+postElev15+postPR15+postRH15+postTEMP15+postAsp15 + 
   (1|states), 
   data = postSM)

I think the broader question is, what is the best way to analyze this data? You have non-independent measurements within-state. But there's more! The states themselves are non-independent. Some states are closer than others, and their measurements are therefore going to be more similar. I also imagine that your measurements from within each state relate to different geographic regions - counties maybe? So the same applies. Basically, you have a massive spatial auto-correlation issue (your observations are non-independent in a structured fashion). You'll need to use an analysis that specifically accounts for spatial auto-correlation. See here for a decent primer.

If I do as stated, I still get that postRH15 is of negative influence on doy, also if I set it to (1+postRH15|states). Should I just come to terms with the fact that postRH15 influences the doy negatively? — Thomas, Mar 19 '23 at 18:37
Your situation may be an example of Simpson's paradox . Your lm model gives your the overall relation between postRH15 and doy ignoring the possible differences between states, whereas lme / lmer give you the average of each state's slope (and also each state's separate slopes if you extract them from the model). These two can have opposite signs, as also illustrated for instance here. — Sointu, Mar 19 '23 at 19:11
Agree with @Sointu re simpson's paradox. Again, I do think that a standard mixed effect model (e.g., random intercepts, maybe random slopes) does not sufficiently account for the structure of your data. — David B, Mar 20 '23 at 12:27

How to correctly set up my mixed-effect model?

1 Answers1