I'm trying to figure out how to "best" specify a model in lmer. Any insight is appreciated!
For background, my data are repeated measures of some outcome y across a 10-year period, with each year indexed by the variable year. The units of analysis are census tracts (tract). Each census tract is situated within several higher-order geographic units [i.e., tracts are located in metropolitan areas (cbsa), states (state), and Census regions (region)]. Overall, I'm interested in estimating how y changes across year.
Most of the geographic factors are clearly delineated. E.g., over the course of the study, tract == A123 belongs only to cbsa == 1, state == 2 and region == 3. Given that structure, my first stab at a model was:
```r
lmer(y ~ year + (year | tract) + (1 | cbsa) + (1 | state) + (1 | region), ...)
```
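One way to check the claim that each tract belongs to exactly one cbsa, state and region is to count the distinct higher-level codes per tract. The sketch below uses base R only, on a small made-up data frame `d` with the column names from this question; `d`, its values, and the helper `n_per_tract` are hypothetical stand-ins for the real long-format data.

```r
# Hypothetical long-format data: one row per tract-year (values invented for illustration)
d <- data.frame(
  tract  = rep(c("A123", "B456", "C789"), each = 2),
  year   = rep(1:2, times = 3),
  cbsa   = rep(c("1", "1", "4"), each = 2),
  state  = rep(c("2", "3", "3"), each = 2),
  region = rep(c("3", "3", "3"), each = 2)
)

# Hypothetical helper: number of distinct values of `var` within each tract
n_per_tract <- function(var) {
  tapply(d[[var]], d$tract, function(x) length(unique(x)))
}

# TRUE if every tract sits in exactly one cbsa, one state and one region
tracts_nested <- all(sapply(c("cbsa", "state", "region"),
                            function(v) all(n_per_tract(v) == 1)))
tracts_nested
```

If this returns FALSE on the real data, some tract changes cbsa, state or region during the study period.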
After taking a closer look at the data, I found that a handful of metropolitan areas were split across states. E.g., cbsa == 1 crosses state lines and can be found in state == 2 for some tracts and state == 3 for other tracts.
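The split metropolitan areas can be listed by counting how many distinct states each cbsa appears in. This is a minimal base-R sketch on the same kind of hypothetical data frame `d` (invented values in which cbsa 1 spans states 2 and 3); `split_cbsa` is an illustrative name, not part of the original analysis.

```r
# Hypothetical long-format data: one row per tract-year (values invented for illustration)
d <- data.frame(
  tract = rep(c("A123", "B456", "C789"), each = 2),
  cbsa  = rep(c("1", "1", "4"), each = 2),
  state = rep(c("2", "3", "3"), each = 2)
)

# Distinct (cbsa, state) combinations observed in the data
pairs <- unique(d[, c("cbsa", "state")])

# Number of states each cbsa touches; keep those with more than one
states_per_cbsa <- table(pairs$cbsa)
split_cbsa <- names(states_per_cbsa)[states_per_cbsa > 1]
split_cbsa
# [1] "1"
```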
My question is: is the model presented above still appropriate for how these data are structured?
An alternative specification I can imagine is something like:
```r
lmer(y ~ year + (year | region:state:cbsa:tract) + (1 | region:state:cbsa) + (1 | region:state) + (1 | region), ...)
```
which, to the best of my understanding, estimates a separate intercept for each state:cbsa combination (so a metropolitan area split across states gets a different random-effect estimate in each state). This model and the first one give different results.
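To see which grouping levels the two specifications imply, it can help to build the factors directly. As I understand the lme4 documentation, a grouping term written as a:b in a random effect is the interaction of the factors a and b, which base R's `interaction()` reproduces. The sketch below uses a hypothetical data frame `d` (invented values in which cbsa 1 spans states 2 and 3); `g_cbsa` and `g_nested` are illustrative names.

```r
# Hypothetical data: cbsa "1" appears in state "2" and state "3" (values invented)
d <- data.frame(
  cbsa   = c("1", "1", "4"),
  state  = c("2", "3", "3"),
  region = c("3", "3", "3")
)

# Grouping factor behind (1 | cbsa): one level per cbsa code
g_cbsa <- factor(d$cbsa)

# Grouping factor behind (1 | region:state:cbsa): one level per observed combination
g_nested <- interaction(d$region, d$state, d$cbsa, sep = ":", drop = TRUE)

nlevels(g_cbsa)    # 2 levels: "1", "4"
nlevels(g_nested)  # 3 levels: cbsa "1" is split into "3:2:1" and "3:3:1"
levels(g_nested)
```

Under (1 | cbsa) the split metropolitan area shares a single intercept across both states, whereas under (1 | region:state:cbsa) each state portion gets its own level, which is consistent with the two fits differing.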