I am doing a longitudinal repeated measures study looking at the effect of age (maturation) on my test result (Y, a type of hearing test). My outcome, Y, is numeric and is measured at 11 frequencies (pitches) in both ears of each subject at each age (frequency is numeric, but I am treating it as a factor). The main effects are frequency (a factor with 11 levels) and age (a factor with 4 levels), with ear side (right or left) as a covariate. Each subject has an ID (sub.id) and each ear has an ID (ear.id). I am using lmer (from the lme4 package in R) to model the data.
Presently, my model is lmer(Y ~ frequency * age + ear + (1|sub.id/ear.id), long.df), which models the repeated measures of ears nested within subjects (sub.id), as answered here.
I initially thought that since frequency is a fixed factor it wouldn't be part of the random structure, but now I'm thinking that maybe it should be, since multiple measurements are made at each frequency in each ear over time. Should I include frequency in the random structure? If so, how? Should it be lmer(Y ~ frequency * age + ear + (1|sub.id/ear.id/frequency), long.df)? Or something different?
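To see what the nested syntax in the second formula would actually ask lmer to estimate, the random-effects terms can be listed after the "/" shorthand is expanded. This is only a minimal sketch using lme4's findbars() on the formula itself. No data are needed, and the object names f and re_terms are made up for illustration.

```r
library(lme4)

# Formula from the question, with frequency added as the innermost grouping level
f <- Y ~ frequency * age + ear + (1 | sub.id/ear.id/frequency)

# findbars() returns the random-effects terms after expanding the "/" shorthand
re_terms <- findbars(f)
print(re_terms)

# One random intercept per nesting level: subject, ear within subject,
# and frequency within ear within subject
n_terms <- length(re_terms)
```

So adding frequency this way gives each ear-by-frequency combination its own random intercept, on top of the subject and ear intercepts. It does not add random slopes for frequency.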
I have created some simplified toy data with only 2 age groups and 3 frequencies below:
library(tidyr)
set.seed(1) # make the simulated toy data reproducible
sub.id = rep(1:15, each = 2) # 15 subjects
ear = rep(c("left", "right"), 15) # 15 right ears, 15 left
# Y results at 1000, 2000 and 4000 Hz for ages 1 and 2
# age 1 Y results
f1000.1 = rnorm(30, mean = 5, sd = 1)
f2000.1 = rnorm(30, mean = 4, sd = 1)
f4000.1 = rnorm(30, mean = 6, sd = 1)
# age 2 Y results
f1000.2 = rnorm(30, mean = 5, sd = 1)
f2000.2 = rnorm(30, mean = 4, sd = 1)
f4000.2 = rnorm(30, mean = 6, sd = 1)
# create dataframe for Y results from age 1 and 2
age1 = cbind.data.frame(sub.id, ear, f1000.1, f2000.1, f4000.1)
age1$age = "age1"
names(age1) = c("sub.id", "ear", "f1000", "f2000", "f4000", "age")
age2 = cbind.data.frame(sub.id, ear, f1000.2, f2000.2, f4000.2)
age2$age = "age2"
names(age2) = c("sub.id", "ear", "f1000", "f2000", "f4000", "age")
df = rbind(age1, age2)
# make df long form
long.df = gather(df, frequency, Y, f1000:f4000)
long.df$age = as.factor(long.df$age)
long.df$frequency = as.factor(long.df$frequency)
# ear.id is used in the model formula but not created above;
# assume it is one unique id per ear within each subject
long.df$ear.id = interaction(long.df$sub.id, long.df$ear, drop = TRUE)
[...] Y ~ frequency * age + ear + (frequency | sub.id/ear.id). However, if frequency is a factor with 11 levels, then this will attempt to estimate an 11x11 random covariance matrix, which can't be estimated unless you have an enormous dataset. So you need to simplify it. One way to simplify is to remove correlation parameters (e.g. using afex or manually). Another is to assume compound symmetry, which is (1 | sub.id/ear.id/frequency) (so yes, this model does make sense). Another is simply to use (1 | sub.id/ear.id). – amoeba Aug 23 '18 at 09:54
[...] afex for that one, and I don't know how to remove them manually with glmmTMB. The other two options give very similar results with anova(mod1, mod2), so using (1|sub.id/ear.id/frequency) doesn't seem to add much, but it doesn't hurt either. Interestingly, modeling one of my Ys removing correlation parameters with afex resulted in a substantial decrease in AIC. – tauft Aug 24 '18 at 04:54
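The simplifications discussed in these comments could be compared along the following lines. This is a hedged sketch, not the commenters' own code. The data frame toy is a made-up stand-in for long.df with simulated subject, ear and ear-by-frequency effects, and ear.id is assumed to be a unique label per ear. Correlations are removed "manually" with lme4's dummy() helper instead of afex, so the parameterisation may differ slightly from what afex produces.

```r
library(lme4)

set.seed(1)
# Hypothetical stand-in for long.df: 15 subjects x 2 ears x 2 ages x 3 frequencies
toy <- expand.grid(sub.id = factor(1:15), ear = c("left", "right"),
                   age = c("age1", "age2"),
                   frequency = c("f1000", "f2000", "f4000"))
# Assumption: ear.id is a unique label for each physical ear
toy$ear.id <- interaction(toy$sub.id, toy$ear, drop = TRUE)
cell <- interaction(toy$ear.id, toy$frequency, drop = TRUE)

# Made-up effects so that the variance components are not all zero
freq_mean <- unname(c(f1000 = 5, f2000 = 4, f4000 = 6)[as.character(toy$frequency)])
toy$Y <- freq_mean +
  rnorm(nlevels(toy$sub.id), sd = 1)[as.integer(toy$sub.id)] +
  rnorm(nlevels(toy$ear.id), sd = 0.5)[as.integer(toy$ear.id)] +
  rnorm(nlevels(cell), sd = 0.5)[as.integer(cell)] +
  rnorm(nrow(toy), sd = 1)

# Option 1: random intercepts for subject and ear only
m_int <- lmer(Y ~ frequency * age + ear + (1 | sub.id/ear.id), data = toy)

# Option 2: compound symmetry, i.e. an extra intercept per ear-by-frequency cell
m_cs <- lmer(Y ~ frequency * age + ear + (1 | sub.id/ear.id/frequency), data = toy)

# Option 3: random frequency effects with correlations removed by hand,
# one dummy-coded term per non-reference frequency level
m_unc <- lmer(Y ~ frequency * age + ear + (1 | sub.id/ear.id) +
                (0 + dummy(frequency, "f2000") | sub.id/ear.id) +
                (0 + dummy(frequency, "f4000") | sub.id/ear.id),
              data = toy)

# The fixed effects are identical, so the REML fits can be compared directly
print(anova(m_int, m_cs, refit = FALSE))
cmp <- AIC(m_int, m_cs, m_unc)
print(cmp)
```

With 11 frequency levels, Option 3 would need ten dummy() terms per grouping level. On small data sets, some of these variances may be estimated at zero, in which case lme4 reports a singular fit.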