Problem
There are two excellent CV posts on specifying crossed effects models (post 1, post 2).
The issue I'm trying to wrestle with pertains to part of the answer to post 2, in particular how to nest crossed random effects.
In my study, I have:
- About 20 individuals per site
- About 10 sites
- Within each site, there were about 20 samples
The outcome in the example is participant's "interest" (the study is about out-of-school programs).
Because there are dependencies by both participant and sample, I think there are two crossed random effects: one for observations associated with each individual, and one for observations associated with each sample. The hard part for me is that both sets of random effects are nested within one of the 10 sites (programs).
The samples were taken at the same time for all of the individuals within a site, but at different times at different sites, so sample 1 in site A was not necessarily at the same time as sample 1 in another site in any sense (neither the same date / time nor the same interval from the "start" of the site's activities). Therefore, to create the variable identifying the time of the sample, I combined the site variable, the date that the sample was collected, and another variable specifying whether the sample was the 1st, 2nd, 3rd, or 4th sample collected on that date. It's a factor.
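A minimal sketch of how such an identifier could be built. The column names date and sample_number are hypothetical (they do not appear in the data shown in this post), and the rows are toy values:

```r
# Toy data; `date` and `sample_number` are hypothetical column names
raw <- data.frame(
  site = c("1", "1", "1", "2"),
  date = as.Date(c("2015-07-14", "2015-07-14", "2015-07-15", "2015-07-14")),
  sample_number = c(1, 2, 1, 1)
)

# Paste the three pieces together so every label is unique across sites
raw$sample <- factor(paste(raw$site, raw$date, raw$sample_number, sep = "-"))

levels(raw$sample)
```

Because the site is part of the label, the first sample at site 1 and the first sample at site 2 on the same date get different levels.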
The data (in R) are as follows:
# A tibble: 2,970 × 4
    interest participant_ID  site         sample
   <dbl+lbl>          <dbl> <chr>         <fctr>
 1         2           1001     1 1-2015-07-14-1
 2         2           1001     1 1-2015-07-14-2
 3         4           1001     1 1-2015-07-15-1
 4         3           1001     1 1-2015-07-15-2
 5         3           1001     1 1-2015-07-21-1
 6         1           1001     1 1-2015-07-21-2
 7         3           1001     1 1-2015-07-21-4
 8         3           1001     1 1-2015-07-22-1
 9         4           1001     1 1-2015-07-22-4
10         3           1001     1 1-2015-07-28-1
# ... with 2,960 more rows
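Since the sample labels embed the site, each label should occur in exactly one site. A quick way to confirm that, sketched here on a toy data frame that stands in for the real one (the rows are made up; only the column names follow the tibble):

```r
# Toy stand-in for the real data frame (column names as in the tibble)
df <- data.frame(
  participant_ID = c(1001, 1001, 1002, 2001, 2001),
  site   = c("1", "1", "1", "2", "2"),
  sample = factor(c("1-2015-07-14-1", "1-2015-07-14-2", "1-2015-07-14-1",
                    "2-2015-07-20-1", "2-2015-07-20-2"))
)

# Number of distinct sites in which each sample label occurs
sites_per_sample <- tapply(df$site, df$sample, function(s) length(unique(s)))

# TRUE means every sample label belongs to a single site (implicit nesting)
all(sites_per_sample == 1)
```

The same check with participant_ID in place of sample would confirm that participants are uniquely labelled across sites.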
Possible Solution
In the answer to post 2, the author of the selected answer wrote:
Because you do not have unique values of the tow variable (i.e. because as you say below tows are specified as 1, 2, 3 at every station), you do need to specify the nesting, as (1|station:tow:day). If you did have the tows specified uniquely, you could use either (1|tow:day) or (1|station:tow:day) (they should give equivalent answers).
Mapping this to my example: because I do have unique values of the sample (the tow variable), I should not need to specify the nesting. Still, I'm having trouble specifying this model mathematically and, thus, in terms of model syntax. (I am using lme4 in R.)
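One way such a model could be written down, as a sketch only: the notation is an assumption and is not taken from the linked posts. Here y_{ps} is the interest rating of participant p at sample s, and k(p,s) is the site that contains both of them.

```latex
\begin{aligned}
y_{ps} &= \beta_0 + c_{k(p,s)} + a_p + b_s + \varepsilon_{ps}, \\
c_k &\sim N(0, \sigma_c^2), \quad
a_p \sim N(0, \sigma_a^2), \quad
b_s \sim N(0, \sigma_b^2), \quad
\varepsilon_{ps} \sim N(0, \sigma^2)
\end{aligned}
```

Each of c, a and b corresponds to one (1|...) term in lme4 syntax; dropping c gives the model with crossed participant and sample effects and no site intercept.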
But, here seem to be the options:
1. Not nesting the crossed random effects within the site because the sample variable includes a site identifier:
lmer(interest ~ 1 + (1|participant_ID) + (1|sample), data = df)
2. Creating the sample variable without a site identifier but in a way so that samples within each site were still identified uniquely and nesting the crossed random effects within the site:
lmer(interest ~ 1 + (1|site/participant_ID) + (1|site/sample), data = df)
Other examples interact the crossed random effects, via adding a term such as (1|participant_ID:sample).
Would either of these account for dependencies by both participant and sample? Or are there other options or better ways to model this?
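A hedged sketch of how the candidate specifications could be compared. The data are simulated with made-up variance components, not the study data, and the labels are built to be unique across sites as in the question. Written literally, (1|site/participant_ID) + (1|site/sample) expands to a site intercept twice (once from each slash term), so the sketch spells the nesting out with a single site intercept instead.

```r
library(lme4)

set.seed(1)

# Simulated stand-in: 10 sites, 20 participants and 20 samples per site,
# one rating per participant-sample pair. All effect sizes are made up.
grid <- expand.grid(p = 1:20, s = 1:20, site = 1:10)
sim <- data.frame(
  site           = factor(grid$site),
  participant_ID = factor(paste(grid$site, grid$p, sep = "-")),
  sample         = factor(paste(grid$site, grid$s, sep = "-"))
)
sim$interest <- 2.5 +
  rnorm(10,  sd = 0.1)[as.integer(sim$site)] +
  rnorm(200, sd = 0.5)[as.integer(sim$participant_ID)] +
  rnorm(200, sd = 0.2)[as.integer(sim$sample)] +
  rnorm(nrow(sim), sd = 0.7)

# Crossed participant and sample effects, without and with a site intercept
m_no_site <- lmer(interest ~ 1 + (1 | participant_ID) + (1 | sample),
                  data = sim)
m_site    <- lmer(interest ~ 1 + (1 | site) + (1 | participant_ID) +
                    (1 | sample), data = sim)

# Same model with the nesting spelled out explicitly
m_nested  <- lmer(interest ~ 1 + (1 | site) + (1 | site:participant_ID) +
                    (1 | site:sample), data = sim)

# Estimated standard deviations of the random effects
VarCorr(m_site)

# With uniquely coded labels the two log-likelihoods should agree
c(as.numeric(logLik(m_site)), as.numeric(logLik(m_nested)))
```

If the site variance is estimated at or near zero, lme4 may report a singular fit; the fit itself still runs.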
[...] (1|site) into there. Your option #2 is also fine, and you don't need to change anything about how you code your sample variable; (1|site/sample) is equivalent to (1|site)+(1|site:sample) and if your sample is coded like it is then this is further equivalent to (1|site)+(1|sample). The same goes for the participant term. So option #2 will be equivalent to #1 if you add the site term to #1 as I suggested above. – amoeba Apr 13 '17 at 23:14

(1|participant_ID:sample) is another issue. I would try fitting models with and without this term and see what comes out. – amoeba Apr 13 '17 at 23:15

[...] 0 (and hence the standard deviation of the random effects comes out to 0). And as you suggest, the random effects turn out to be identical for both option #1 and option #2. This seems to suggest that there's very little (to no) variance explained by program_ID not explained by sample. Does that sound about right? – Joshua Rosenberg Apr 14 '17 at 17:54

[...] (1|participant_ID:sample) interaction is interesting because the participant_ID and sample random effects remain the same but this term does seem to explain some variance - as its random effects have a standard deviation of about .20 (relative to .50 for participant_ID and .20 for sample), and the residual variance decreases. This is confusing to me because the (1|participant_ID:sample) seems to be redundant with the residual - both would seem to explain variance not explained by participant_ID or sample. – Joshua Rosenberg Apr 14 '17 at 18:00

[...] interest), features of the activity at the sample level, and pre-program intentions to study / pursue a career in STEM fields at participant level. I just didn't include them because I was focused first on the variance components. – Joshua Rosenberg Apr 14 '17 at 18:01

[...] 0 for program_ID with a different outcome (we have outcomes other than interest, and it seems for interest that there's not enough variation at the program level). – Joshua Rosenberg Apr 14 '17 at 19:03

(1|participant_ID:sample) is not redundant with the residual if you have multiple observations for each participant_ID/sample combination. Is this the case? From your question, it's not exactly clear to me. – amoeba Apr 15 '17 at 10:47

[...] program_ID the same as site (I inadvertently changed the name to program_ID in the comment). – Joshua Rosenberg Apr 15 '17 at 12:13

[...] (1|site) + (1|participant_ID) + (1|sample) and if the site term comes out with zero variance then so be it; but you cannot know in advance, so it makes sense to put it in the model anyway. – amoeba Apr 15 '17 at 15:51
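The question raised in the comments, whether there are multiple observations for each participant_ID/sample combination, can be answered by tabulating the data. A small sketch on a toy data frame (made-up rows, column names as in the question):

```r
# Toy stand-in for the real data frame (column names as in the question)
df <- data.frame(
  participant_ID = c(1001, 1001, 1002, 1002),
  sample = factor(c("1-2015-07-14-1", "1-2015-07-14-2",
                    "1-2015-07-14-1", "1-2015-07-14-2")),
  interest = c(2, 2, 4, 3)
)

# Count observations in each participant-by-sample cell
cell_counts <- table(df$participant_ID, df$sample)

# Largest cell size; 1 means at most one observation per combination
max(cell_counts)
```

If the maximum is 1, every participant-sample combination has a single observation, which is the situation in which the comment implies the (1|participant_ID:sample) term would be redundant with the residual.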