2

I have conducted a randomized 2x2 cross-over trial of 8 participants measuring the effect of a specific diet (intervention) vs normal diet (control) on the number of sleep hours.

The study design includes one week periods: run-in, intervention/control, washout, intervention/control, and then washout. Thus in total 5 weeks. Thus:

Period 1 Period 2
Sequence AB Treatment A Treatment B
Sequence BA Treatment B Treatment A

The data looks like this:

subject sleep_hours sequence period treatment
1 4.3 AB runin 0
2 6.5 AB runin 0
3 5.2 AB runin 0
4 4.4 AB runin 0
5 4.2 BA runin 0
6 6.5 BA runin 0
7 5.2 BA runin 0
8 4.6 BA runin 0
1 5.2 AB 1 A
2 4.1 AB 1 A
3 6.5 AB 1 A
4 4.4 AB 1 A
1 7.1 AB 2 B
2 8.7 AB 2 B
3 6.5 AB 2 B
4 7.4 AB 2 B
5 7.2 BA 1 B
6 8.3 BA 1 B
7 6.9 BA 1 B
8 7.4 BA 1 B
5 4.8 BA 2 A
6 5.1 BA 2 A
7 4.2 BA 2 A
8 6.6 BA 2 A

Here is a reproducible code of the above data:

db <- structure(list(subject = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 
2L, 3L, 4L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 5L, 6L, 7L, 8L), 
sleep_hours = c(4.3, 6.5, 5.2, 4.4, 4.2, 6.5, 5.2, 4.6, 5.2, 
4.1, 6.5, 4.4, 7.1, 8.7, 6.5, 7.4, 7.2, 8.3, 6.9, 7.4, 4.8, 
5.1, 4.2, 6.6), sequence = structure(c(1L, 1L, 1L, 1L, 2L, 
2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L), levels = c("AB", "BA"), class = "factor"), 
period = structure(c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 
1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L
), levels = c("1", "2", "runin"), class = "factor"), 
treatment = structure(c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L), levels = c("0", "A", "B"
), class = "factor")), row.names = c(NA, -24L), 
class = "data.frame")

I used a Linear mixed-effects model like to see if the intervention (treatment A) affects sleep_hours compared with the control (treatment B):

install.packages(lme4)
install.packages(lmerTest)
model <- lmer(sleep_hours ~ treatment * period + sequence + 
              (1|subject), data = db)
summary(model)

This gave these results:

Linear mixed model fit by REML. t-tests use Satterthwaite's method ['lmerModLmerTest']
Formula: sleep_hours ~ treatment * period + sequence + (1 | subject)
   Data: db

REML criterion at convergence: 57.7

Scaled residuals: Min 1Q Median 3Q Max -1.0220 -0.6944 -0.1703 0.3407 1.5199

Random effects: Groups Name Variance Std.Dev. subject (Intercept) 0.0000 0.000
Residual 0.9101 0.954
Number of obs: 24, groups: subject, 8

Fixed effects: Estimate Std. Error df t value Pr(>|t|)
(Intercept) 5.100000000000009415 0.477005998098789574 17.999999999553889296 10.692 0.00000000316 *** treatmentA -0.050000000000007615 0.674588351844622736 17.999999999553857322 -0.074 0.94173
treatmentB 2.325000000000000178 0.674588351844622736 17.999999999553857322 3.447 0.00288 ** period2 -0.000000000000009721 0.954011996197578593 17.999999999553843111 0.000 1.00000
sequenceBA 0.024999999999988885 0.674588351844622736 17.999999999553867980 0.037 0.97085
treatmentA:period2 0.100000000000017311 1.652397248444412492 17.999999999616743906 0.061 0.95241


Signif. codes: 0 ‘*’ 0.001 ‘’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation of Fixed Effects: (Intr) trtmnA trtmnB perid2 sqncBA treatmentA -0.707
treatmentB 0.000 0.000
period2 -0.500 0.354 -0.707
sequenceBA -0.707 0.500 -0.500 0.707
trtmntA:pr2 0.577 -0.612 0.612 -0.866 -0.816 fit warnings: fixed-effect model matrix is rank deficient so dropping 4 columns / coefficients optimizer (nloptwrap) convergence code: 0 (OK) boundary (singular) fit: see help('isSingular')

I also tried to extract the P value and 95% confidence intervals for the effects of treatment A, inspired by the code of @BenBolker here:

model_p <- pvalue(coef(summary(as(model, 
                   "lmerModLmerTest")))[2,5])
model_ci <- paste0(ciformat1(coef(summary(as(model, 
  "lmerModLmerTest")))[2,1]), " (", ciformat1(confint(model, 
  method="Wald")[4,1]), " to ", ciformat1(confint(model, 
  method="Wald")[4,2]), ")")

Which gave these results:

Intervention period P
Sleep hours (hours) -0.1 (-1.4 to 1.3) .942

My interpretation: treatment A did not affect sleep differently than treatment B, mean difference -0.1 hours (95% CI -1.4 to 1.3 hours), P = .942.

Is my code and interpretation correct?

  • Thank you! I thought code related questions, including interpretation of the test statistics (which depends on the coding), were supposed to be asked at stackoverflow? Sorry if I misunderstood this. – Wandering_geek May 27 '23 at 16:57
  • 1
    The statisticians will understand the coding. If you had been trained properly you should be able to understand what the output meant whether done with R, SAS, or Stata. – DWin May 27 '23 at 17:03

1 Answers1

1

Is my code and interpretation correct?

No! Your model has 2 large issues and the parameter you have extracted means something very different then what you think.

The issues with your model:

  1. It is singular, aka the random effect variance is 0, which means you did not adjust for variation between subjects. This is a whole thing (see here: https://bbolker.github.io/mixedmodels-misc/glmmFAQ.html#singular-models-random-effect-variances-estimated-as-zero-or-correlations-estimated-as---1 ) but, in my opinion, with only 8 subjects and 3 observations per subject it is hopeless. Just use an "unmixed" linear model and keep the limitation in mind.

  2. You are requesting redundant and impossible fixed effects. When you put treatment * period your model also wants to fit a parameter for period = runin and treatment = A or B which it can't because at period = runin there is only treatment = 0. Also sequence is the relationship between period and treatment making it redundant. You should try to express your model minimally and understand what the terms in the formula mean. You could start here: https://stats.stackexchange.com/a/122251/341520

Now one way to express your model well is to just look at period*sequence , because if we know which both of those, we also automatically know the treatment.

library(tidyverse)
m2 <- lm(sleep_hours ~  period * sequence, data = db %>% 
           mutate(period = factor(period, levels = c("runin", "1", "2"))) # fixing baseline at runin
         )
> summary(m2)

Call: lm(formula = sleep_hours ~ period * sequence, data = db %>% mutate(period = factor(period, levels = c("runin", "1", "2"))))

Residuals: Min 1Q Median 3Q Max -0.9750 -0.6625 -0.1625 0.3250 1.4500

Coefficients: Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.1000 0.4770 10.692 3.16e-09 *** period1 -0.0500 0.6746 -0.074 0.94173
period2 2.3250 0.6746 3.447 0.00288 ** sequenceBA 0.0250 0.6746 0.037 0.97085
period1:sequenceBA 2.3750 0.9540 2.489 0.02280 *
period2:sequenceBA -2.2750 0.9540 -2.385 0.02830 *


Signif. codes: 0 ‘*’ 0.001 ‘’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.954 on 18 degrees of freedom Multiple R-squared: 0.6379, Adjusted R-squared: 0.5373 F-statistic: 6.343 on 5 and 18 DF, p-value: 0.001462

> confint(m2) 2.5 % 97.5 % (Intercept) 4.0978476 6.1021524 period1 -1.4672575 1.3672575 period2 0.9077425 3.7422575 sequenceBA -1.3922575 1.4422575 period1:sequenceBA 0.3706952 4.3793048 period2:sequenceBA -4.2793048 -0.2706952

So we can see that Treatment B slept 2.3 hours more with differing levels of statistical significance depending on if we compare it to runin or Treatment A. However the standard errors are large it's just that the effect is larger.

I love using predict (see help(predict.lm)) to understand what my model actually means so let's plot that:

predict(m2, interval = "confidence")
db2 <- cbind(db, predict(m2, interval = "confidence"))

ggplot(db2 %>% select(- sleep_hours, -subject) %>% distinct() %>% mutate(period = ordered(period, levels = c("runin", "1", "2"))) %>%
arrange(period), aes(y = fit, x = period, group = factor(sequence))) + geom_path() + geom_errorbar(aes(ymin = lwr, ymax = upr), width = 0.2) + geom_point(aes(color = factor(treatment)), size = 3)

enter image description here

Now let's compare our fit to the data

ggplot(db %>% 
         mutate(period = ordered(period, levels = c("runin", "1", "2"))) %>%   
         arrange(period), 
       aes(y = sleep_hours, x = period, group = factor(subject))) +
  geom_path() +
  geom_point(aes(color = factor(treatment)), size = 3)

enter image description here

Pretty good, Id's say.

Lukas Lohse
  • 2,482
  • Thank you Lukas. I would like to add that my true data has 30 individuals rather than 8. The above was simply a "reproducible example" as required by StackExchange. Could lmer() make more sense now that you know that I have 30 individuals? – Wandering_geek Jun 06 '23 at 16:27
  • First of, thank you for providing the reproducible example. Not everybody does it, but it always helps a lot. Whether your model with 30 subjects will be singular is hard to predict, but if it is you would have good evidence that the subject variance is very small. Sadly you can't use predict to calculate CI for lmer-models. You could however use something less pretty like visreg(m3,xvar = "period" ,by = "sequence", type = "contrast") – Lukas Lohse Jun 06 '23 at 22:07
  • Thank you Lukas. But if I would use lmer for the main result, I could use the equation lmer(sleep_hours ~ period*sequence + (1|subject), data = db)? And in that case, is it the line named "period1:sequenceBA" that gives me the P value for whether or not the treatment had an effect on sleep hours? – Wandering_geek Jun 10 '23 at 09:14
  • It would maybe be easier to interpret with something like m3 <- lmer(sleep_hours ~ treatment * sequence + (1|subject), data = db %>% mutate(treatment = factor(treatment, levels = c("A", "B", "0"))) # fixing baseline at Treatment A ) – Lukas Lohse Jun 16 '23 at 21:25
  • Now your comparisons are against treatment A and you can directly see differnce cused by different sequencing. – Lukas Lohse Jun 16 '23 at 21:30