1

I am analysing a dataset that has 20 sampling sites, each sampled five times (T1-T5) to measure the continuous response variable ('resp'). There is a continuous predictor ('covar') that was measured in each session, but the value was 0 for every site in sessions T1 and T2.

# Make the dataset
set.seed(345)
mydf = data.frame(resp = runif(100, 0, 50),
                  covar = runif(100, 30, 90),
                  time = as.factor(rep(c("T1", "T2", "T3", "T4", "T5"), each = 20)),
                  site = as.factor(rep(1:20, times = 5)))
mydf[1:40,2] = 0

> str(mydf) 'data.frame': 100 obs. of 4 variables: $ resp : num 10.8 13.7 19.5 32.8 21.8 ... $ covar: num 0 0 0 0 0 0 0 0 0 0 ... $ time : Factor w/ 5 levels "T1","T2","T3",..: 1 1 1 1 1 1 1 1 1 1 ... $ site : Factor w/ 20 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...

I am interested in how the effect of covar on resp varies between sessions, so I fitted the following model in lme4:

> m1 = lmer(resp ~ covar * time + (1|site), data = mydf)
fixed-effect model matrix is rank deficient so dropping 2 columns / coefficients
boundary (singular) fit: see ?isSingular

Two terms are excluded due to rank deficiency (covar:timeT2 and covar:timeT5). I believe this is due to the values of covar all being 0 for T1 and T2, but I don't understand why covar:timeT5 in particular is excluded (rather than covar:T4 for example).

I haven't been able to find a post that matches this situation, so will be grateful if anyone can help explain what is happening here. Many thanks in advance.

> summary(m1)
Linear mixed model fit by REML ['lmerMod']
Formula: resp ~ covar * time + (1 | site)
   Data: mydf

REML criterion at convergence: 796.5

Scaled residuals: Min 1Q Median 3Q Max -1.59299 -0.88562 -0.06054 0.79504 1.92953

Random effects: Groups Name Variance Std.Dev. site (Intercept) 0.0 0.00
Residual 216.4 14.71
Number of obs: 100, groups: site, 20

Fixed effects: Estimate Std. Error t value (Intercept) 26.6204 3.2891 8.094 covar -0.4122 0.1860 -2.216 timeT2 0.8271 4.6515 0.178 timeT3 -9.6063 14.0368 -0.684 timeT4 -8.1798 12.8894 -0.635 timeT5 24.4145 12.1093 2.016 covar:timeT3 0.4942 0.2936 1.683 covar:timeT4 0.4736 0.2656 1.783

(P.S. there is also a boundary (singular) fit warning, which occurs because the random effect accounts for 0 variance)

1 Answers1

0

You can inspect the model matrix using the model.matrix() command, which is often very useful to understand singularities. model.matrix(m1) will give you the model matrix for the fitted model m1, with the singular columns already removed, so that is not what we want. We want

model.matrix(resp ~ covar * time,data=mydf)

Here are the first few lines of the output:

    (Intercept)    covar timeT2 timeT3 timeT4 timeT5 covar:timeT2 covar:timeT3 covar:timeT4 covar:timeT5
1             1  0.00000      0      0      0      0            0      0.00000      0.00000      0.00000
2             1  0.00000      0      0      0      0            0      0.00000      0.00000      0.00000
3             1  0.00000      0      0      0      0            0      0.00000      0.00000      0.00000
4             1  0.00000      0      0      0      0            0      0.00000      0.00000      0.00000
5             1  0.00000      0      0      0      0            0      0.00000      0.00000      0.00000
6             1  0.00000      0      0      0      0            0      0.00000      0.00000      0.00000

Actually, to see the interesting parts, you have to look at the entire matrix. Your columns are linearly dependent, specifically:

covar = covar:timeT3 + covar:timeT4 + covar:timeT5

In cases like these, R will drop columns from the right (and columns are ordered first by interactions, then alphabetically) until the remaining matrix is not singular any more.

You may find it more informative to drop the covar main effect and only keep the interaction terms, by subtracting it from the formula (R will still drop the covar:timeT2 column, which is a constant zero):

summary(lmer(resp ~ covar * time - covar + (1|site), data = mydf))

A completely equivalent specification is to specify the interaction only by using : (* specifies interactions and all lower-order interactions and main effects):

summary(lmer(resp ~ time + covar : time + (1|site), data = mydf))
Stephan Kolassa
  • 123,354