Are my random effects nested or crossed?

Question

I am using the R package lme4 to build a mixed-effects model. My data is set up in the following way:

set.seed(1)
df = data.frame(group1 = factor(c(rep(1,36),rep(2,36),rep(3,36),rep(4,36),rep(5,36))),
                group2 = factor(rep(c(rep(1,12),rep(2,12),rep(3,12)),5)),
                uniqueid = seq(from=1,to=180),
                value = rnorm(n = 180, mean = 10, sd = 2))

I have two grouping variables, group1 and group2. group1 consists of 5 different categories, and group2 consists of 3 different categories. This creates a total of 15 unique combinations of group1 and group2, and 12 unique observations within each unique combination, like so:

# xtabs(~group1+group2, df)
      group2
group1  1  2  3
     1 12 12 12
     2 12 12 12
     3 12 12 12
     4 12 12 12
     5 12 12 12

My goal is to build a mixed-effects model to get the fixed-effects parameters of being included in group1 and group2, as the 12 samples within each unique combination are not independent.

Intuitively, I thought to build a model like so:

lme4::lmer(data=df,
           formula=value ~ group1 + group2 + (1|group1) + (1|group1:group2))

where group2 is nested within group1. However, there is nothing inherent about the data structure that suggests it could not also be:

lme4::lmer(data=df,
           formula=value ~ group1 + group2 + (1|group2) + (1|group2:group1))

where group1 is nested within group2. This leads me to believe that I am actually dealing with crossed effects, where the proper model would be built like so:

lme4::lmer(data=df,
           formula=value ~ group1 + group2 + (1|group1) + (1|group2))

Another reason for believing I am dealing with crossed effects is that every group2 category exists within every level of group1, and vice versa. There are no "unique" group2 categories that exist only within certain categories of group1, although the observations within each combination are unique.

EDIT (an analogy): An analogous situation would be if there were 5 unique race categories, say, "White", "Black", "American Indian", "Asian/Pacific Islander", and "Other," along with 3 unique ethnicity categories say, "Hispanic", "Non-Hispanic", and "Other." This would allow for 15 unique combinations. Within each unique combination, there are 12 samples that are dependent within the unique combinations, for a total of 180 samples. This way, there are a total of 60 individuals that are "Hispanic", broken up such that 12 fall into each race category.

The confusion I am running into is related to the answer on a related post: Crossed vs nested random effects: how do they differ and how are they specified correctly in lme4? where it uses the image:

to describe a crossed random-effect scenario. Here, it seems that the observations within each "Class" are being shared by the "Schools", but this situation differs from my scenario. In my scenario (using the race/ethnicity example), there are unique observations for each combination of race and ethnicity (60 observations for each ethnicity, split evenly into 12 observations for each race.)

one · Accepted Answer · 2022-04-19T17:47:12.787

I will focus only on the question: is a nested or a crossed design appropriate for group1 and group2 in the random part of the model?

The short answer: use a crossed design.

I think the answer becomes clear once we understand the difference between crossed and nested designs. I am going to direct you to these two posts on stats.stackexchange that clearly explain the differences: post 1 and post 2.

Essentially, a nested design, as the name suggests, means that one variable sits within another. For example, students are intrinsically nested within schools: each student belongs to exactly one school. A crossed design, on the other hand, means there is no such directional relationship. For example, for two variables such as weight and height, we cannot say that either one is nested within the other, so we treat them as crossed.
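The distinction can be checked directly on the data. Below is a minimal sketch in base R; `is_nested_in` is a hypothetical helper written for this illustration (it is not an lme4 function), and the student/school vectors are made-up toy data. It treats `inner` as nested in `outer` when every level of `inner` occurs with exactly one level of `outer`.

```r
# Hypothetical helper (illustration only): TRUE when every level of `inner`
# occurs together with exactly one level of `outer`.
is_nested_in <- function(inner, outer) {
  tab <- table(inner, outer) > 0   # which level combinations actually occur
  all(rowSums(tab) == 1)
}

# Toy nested data (made up): each student belongs to exactly one school.
school  <- factor(c("A", "A", "A", "B", "B", "B"))
student <- factor(c("s1", "s1", "s2", "s3", "s4", "s4"))
is_nested_in(student, school)       # TRUE: students are nested in schools

# Grouping structure from the question (same layout, written with rep()).
df <- data.frame(
  group1 = factor(rep(1:5, each = 36)),
  group2 = factor(rep(rep(1:3, each = 12), times = 5))
)
is_nested_in(df$group2, df$group1)  # FALSE
is_nested_in(df$group1, df$group2)  # FALSE: neither direction is nested
```

Since neither factor is nested in the other for the question's data, the check points to a crossed structure. lme4 also exports `isNested(f1, f2)`, which should give the same answer here.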

Back to your scenario, you already made it clear that it is a crossed design. Especially in this part:

however, there is nothing inherent about the data structure that suggests it could not also be: lme4::lmer(data=df, formula=value ~ group1 + group2 + (1|group2) + (1|group2:group1))

This suggests there is no directional relationship between group1 and group2, since the nesting could be written either way.

Therefore, the appropriate random-effects specification is (1|group1) + (1|group2).
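As a rough sketch of how that specification could be fitted, assuming lme4 is installed: the data frame is rebuilt as in the question, and the intercept-only fixed part (`value ~ 1`) is an assumption made for illustration, since the answer only specifies the random terms.

```r
# Sketch only. Assumptions: lme4 is installed; the fixed part is intercept-only.
set.seed(1)
df <- data.frame(
  group1 = factor(rep(1:5, each = 36)),
  group2 = factor(rep(rep(1:3, each = 12), times = 5)),
  value  = rnorm(n = 180, mean = 10, sd = 2)
)

# Crossed random intercepts: one term per grouping factor, no interaction term.
crossed_formula <- value ~ 1 + (1 | group1) + (1 | group2)

if (requireNamespace("lme4", quietly = TRUE)) {
  fit <- lme4::lmer(crossed_formula, data = df)
  print(lme4::VarCorr(fit))   # estimated variance components
}
```

Because `value` here is pure simulated noise with no group structure, the group variances may be estimated at or near zero and lme4 may report a singular fit. That reflects the toy data, not the crossed specification itself.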

Here, it seems that the observations within each "Class" are being shared by the "Schools", but this situation differs from my scenario.

It is similar, except that in your analogy the arrows in the figure go both ways. Replace "Schools" with ethnicity and "Class" with race.

In my scenario (using the race/ethnicity example), there are unique observations for each combination of race and ethnicity (60 observations for each ethnicity, split evenly into 12 observations for each race.)

This is true, but remember that it is equally true to state "36 observations for each race, split evenly into 12 observations for each ethnicity." This is exactly what "the arrows go both ways" refers to.
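The symmetry of the two statements can be read off the margins of the cross-tabulation. A minimal sketch in base R, rebuilding only the grouping columns from the question:

```r
# Grouping structure from the question (same layout, written with rep()).
df <- data.frame(
  group1 = factor(rep(1:5, each = 36)),
  group2 = factor(rep(rep(1:3, each = 12), times = 5))
)

tab <- xtabs(~ group1 + group2, data = df)

rowSums(tab)    # observations per group1 level: 36 each (3 cells of 12)
colSums(tab)    # observations per group2 level: 60 each (5 cells of 12)
all(tab == 12)  # TRUE: every combination holds 12 replicates
```

Either margin can be described as "split evenly into cells of 12", so neither factor has a privileged position in the layout.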

edited Apr 19 '22 at 17:47

answered Apr 18 '22 at 19:33

Thanks for the response, and this was my thought as well. My confusion lies in the first post you linked. Using the classes within schools example, for crossed random effects, I do actually have unique observations within group2. That is, there are 3 unique category types within group2, and 5 different observations of each category (one for each of group 1). To me, this seems very similar to the nested design described in post 1. – geoscience123 Apr 18 '22 at 19:58
Are those "5 different observations of each category" shared across group2? If so, this is a crossed design. If not, then use nested design with (1|group2) + (1|group2:group1). – one Apr 19 '22 at 14:36
An analogy would be if there were 5 unique race categories, say, "White", "Black", "American Indian", "Asian/Pacific Islander", or "Other," and 3 unique ethnicity categories say, "Hispanic", "Non-Hispanic", and "Other." This would allow for 15 unique combinations. Within each unique combination, there are 12 samples that are dependent within the unique combinations, for a total of 180 samples. This way, there are a total of 60 individuals that are "Hispanic", broken up such that 12 fall into each race category. (See Edit) – geoscience123 Apr 19 '22 at 15:09
Thanks for adding an example. For this scenario, we would use a crossed design because race and ethnicity are not nested: the same ethnicity categories appear within every race, and the same race categories within every ethnicity. Don't worry too much about the 12 observations in each ethnicity+race combination. It is normal to have replicates, and they are reflected in the residual. (Updating answer.) – one Apr 19 '22 at 17:36
Thanks! I found this paper: https://www.nature.com/articles/nmeth.3137.epdf?shared_access_token=D0d0MK8Gti6Hwao6TU76XtRgN0jAjWel9jnR3ZoTv0N0DeGI7mpe9VDCZJYcOB1gQiRNP5rNycymLbaP0G2LjuiXfuonXncuZhUt5-rVjhIKYJQMwoZ8w1NupHzwmAND that sums it up well and I believe your answer to be correct. Accepting it! – geoscience123 Apr 19 '22 at 17:47
Please see https://stats.meta.stackexchange.com/questions/6304/my-upvoting-policy, when you find a question sufficiently clear to write an answer, consider to upvote the question! – kjetil b halvorsen Apr 22 '22 at 16:31
