
Suppose that data are collected for clinics across the state. The clinics are located in different counties, and some of them are owned by large healthcare systems that operate in several counties. So the data don't fit the typical design of systems nested within counties. Outcomes are collected for each clinic, while sociodemographic information is available only at the county level.

I'm interested in the association between the county-level sociodemographic information and the outcomes collected at each clinic.

I'm thinking that I could create a random intercept for each clinic (ClinicID) so that clinics are nested within counties. In the example table below there are 8 clinics (clusters) nested in 5 counties. But this does not account for any characteristics of the healthcare system. Is there a way I could also account for clustering by health system? Would I add another random effect? I'm still figuring out the random-effects syntax in glmmTMB, but I think it would be (1 | CountyID) + (1 | ClinicID), because each ClinicID is unique.
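Because each ClinicID is unique, the two ways of writing the clinic term should define the same groups. In lme4/glmmTMB formula syntax, (1 | CountyID/ClinicID) is shorthand for (1 | CountyID) + (1 | CountyID:ClinicID). The CountyID:ClinicID interaction has exactly one level per clinic whenever no ClinicID is reused across counties. The minimal Python sketch below checks that condition on the example table. The `ROWS` list and the helper name are illustrative only and are not part of any R package.

```python
# Example rows copied from the table in this post: (SystemID, CountyID, ClinicID).
ROWS = [
    ("A", 1, "A1"), ("A", 2, "A2"), ("A", 3, "A3"), ("A", 4, "A4"),
    ("B", 1, "B1"), ("B", 2, "B2"), ("C", 5, "C5"), ("C", 4, "C4"),
]


def clinic_ids_are_unique_across_counties(rows):
    """Hypothetical helper: True if CountyID:ClinicID has one level per ClinicID.

    Every clinic appears with at least one county, so the number of distinct
    (county, clinic) pairs equals the number of distinct clinics exactly when
    no ClinicID is shared by two different counties.
    """
    clinics = {clinic for _, _, clinic in rows}
    county_clinic_pairs = {(county, clinic) for _, county, clinic in rows}
    return len(county_clinic_pairs) == len(clinics)


print(clinic_ids_are_unique_across_counties(ROWS))  # True for the example table
```

If this returns True, (1 | CountyID) + (1 | ClinicID) and (1 | CountyID/ClinicID) group the rows identically. If it returns False, the nested form is the safer choice.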

Also, where can I find information about population offsets when random effects are nested? I want to be sure I'm using the right population: I think it should be the clinic's, not the county's.

I'm an R user and relatively new to multilevel regression. My apologies for such a basic question, and thank you in advance for any help!

Edit: I think I can also add a random intercept for SystemID, with ClinicID nested in CountyID: (1 | SystemID) + (1 | CountyID/ClinicID). I read somewhere that you can add such terms for a pseudoeffect (for example, gender). I'm not totally sure how this relates to offsets. The problem is that the demographic information is at the county level, so if I do it that way there is no variation within a county cluster.
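Written out in the Gelman & Hill style mentioned in the comments, the specification (1 | SystemID) + (1 | CountyID/ClinicID) with a log link corresponds roughly to the conditional (count) model below. This is a sketch under assumptions:

- $i$ indexes rows, and $s[i]$, $j[i]$, $c[i]$ are the system, county, and clinic of row $i$.
- $\mu_i$ is the conditional mean of the count component.
- $\mathbf{z}_j$ stands for hypothetical county-level sociodemographic covariates.
- $E_i$ is a hypothetical exposure (for example, the population served) attached to row $i$.
- The zero-inflation part is left out.

$$
\log \mu_i = \beta_0 + \mathbf{z}_{j[i]}^{\top}\boldsymbol{\gamma} + a_{s[i]} + u_{j[i]} + v_{c[i]} + \log E_i
$$

$$
a_s \sim \mathcal{N}(0, \sigma^2_{\mathrm{system}}), \qquad u_j \sim \mathcal{N}(0, \sigma^2_{\mathrm{county}}), \qquad v_c \sim \mathcal{N}(0, \sigma^2_{\mathrm{clinic}})
$$

The offset enters with its coefficient fixed at 1 and is defined per row, so $E_i$ should measure the exposure of whatever a row counts. In R formulas it is usually written as `offset(log(E))`, with `E` a placeholder name. Because $\mathbf{z}_j$ is constant within a county, $\boldsymbol{\gamma}$ is informed only by differences between counties.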

There are 15 health systems, 27 counties, and 36 clinics. Thousands of observations are available per clinic.

Most counties (19) have just one health system, but 7 counties have two health systems and 1 county has three. Conversely, ten health systems operate in just one county, while five large health systems operate in 2, 3, 5, 7, and 9 different counties, respectively.
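These counts are internally consistent. The check below assumes that each clinic is one system-county combination, as the example table and the SystemCountyID comment below suggest. Under that assumption, counting system-county pairs from either side should give the number of clinics. A small Python sketch using only the numbers stated above:

```python
# Margins as described in the post.
# Assumption: one clinic per observed system-county pair.
counties_per_system = [1] * 10 + [2, 3, 5, 7, 9]  # 10 single-county + 5 large systems
systems_per_county = [1] * 19 + [2] * 7 + [3]     # 19 + 7 + 1 counties

n_systems = len(counties_per_system)
n_counties = len(systems_per_county)
pairs_from_systems = sum(counties_per_system)     # 10 + 26
pairs_from_counties = sum(systems_per_county)     # 19 + 14 + 3

print(n_systems, n_counties, pairs_from_systems, pairs_from_counties)  # 15 27 36 36
```

Both margins give 36 pairs, matching the 36 clinics. So ClinicID indexes exactly the observed SystemID × CountyID cells.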

Example:

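| SystemID | CountyID | ClinicID |
|----------|----------|----------|
| A        | 1        | A1       |
| A        | 2        | A2       |
| A        | 3        | A3       |
| A        | 4        | A4       |
| B        | 1        | B1       |
| B        | 2        | B2       |
| C        | 5        | C5       |
| C        | 4        | C4       |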

Side note: Unfortunately, I'm modeling zero-inflated data with glmmTMB, and the wrapper mentioned for the multiple-membership specification of random effects works only with lme4. I also don't think this is multiple membership, because according to the answer here: "So to give a definition of multiple membership, I would say this occurs when the lowest level units "belong" to more than one upper-level unit."

In my case, there is simply more than one grouping factor, and each clinic belongs to exactly one level of each of the two factors.
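This distinction can be checked directly from the data: one factor is nested in another if each of its levels occurs with only one level of the other. The minimal Python sketch below runs that check on the example table (helper names are hypothetical). It shows that ClinicID is nested in both CountyID and SystemID, while SystemID and CountyID are crossed. Partially crossed grouping factors with a clinic term nested in both is the structure that (1 | SystemID) + (1 | CountyID) + (1 | ClinicID) describes, rather than multiple membership.

```python
from collections import defaultdict

# Example rows from the table in this post: (SystemID, CountyID, ClinicID).
ROWS = [
    ("A", 1, "A1"), ("A", 2, "A2"), ("A", 3, "A3"), ("A", 4, "A4"),
    ("B", 1, "B1"), ("B", 2, "B2"), ("C", 5, "C5"), ("C", 4, "C4"),
]
SYSTEM, COUNTY, CLINIC = 0, 1, 2  # column positions in each row


def levels_seen_with(rows, inner, outer):
    """Map each level of column `inner` to the set of `outer` levels it occurs with."""
    seen = defaultdict(set)
    for row in rows:
        seen[row[inner]].add(row[outer])
    return dict(seen)


def is_nested(rows, inner, outer):
    """True if every level of `inner` occurs with exactly one level of `outer`."""
    return all(len(levels) == 1 for levels in levels_seen_with(rows, inner, outer).values())


# Expected for the example table: True, True, False, False.
for inner, outer, label in [
    (CLINIC, COUNTY, "ClinicID within CountyID"),
    (CLINIC, SYSTEM, "ClinicID within SystemID"),
    (SYSTEM, COUNTY, "SystemID within CountyID"),
    (COUNTY, SYSTEM, "CountyID within SystemID"),
]:
    print(f"{label}: {is_nested(ROWS, inner, outer)}")
```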

Update: I realized that I need covariates at the lowest level; otherwise there is no variation within the county clusters. So I've been working on collecting those data.

  • How many large healthcare systems are involved? There might be few enough to include them as fixed effects in the model. – EdM Mar 28 '23 at 16:00
  • @EdM, there are 15 healthcare systems. Will edit the post. – Claire Richards Mar 28 '23 at 17:43
  • It might be helpful to document in the question the relative sizes of the healthcare systems. The way to handle them might be different if they are all of similar size versus having 2 or 3 very large systems and 12 or 13 smaller ones, perhaps with each of the smaller ones associated with just a few counties. – EdM Mar 28 '23 at 18:43
  • Ok, I'm editing the post. This is just an example of a problem, but I can provide that. – Claire Richards Mar 28 '23 at 19:12
  • Are there multiple data points (rows) for the same clinic (SystemCountyID)? If you have one data point per clinic, do you need a random clinic effect? It might depend on the outcome. See Mixed model with 1 observation per level. – dipetkov Mar 28 '23 at 19:31
  • There are thousands of data points per clinic. The clinic is the SystemCountyID. Will edit my post for clarity. Thank you – Claire Richards Mar 28 '23 at 19:40
  • Pending an answer to your question, you might consult this answer and this answer, to what seem to be highly related questions. – EdM Mar 28 '23 at 20:47
  • When you say "thousands of observations are available per clinic," do you have any covariate data on those individual observations within each clinic? Or do the "thousands of observations" just represent the total numbers involved per clinic in what seems to be a count-type outcome, one (count?) outcome value for each clinic? Or is the situation something in between those extremes? Please continue to provide such information by editing the question, as you have been doing. That makes it much easier for others to understand your situation. – EdM Mar 30 '23 at 13:55
  • Personally -- except for the simple case of a single random grouping (1 | ID) -- I find it more intuitive to think in terms of hierarchical (Bayesian) models. See for example "Data Analysis Using Regression and Multilevel/Hierarchical Models" by Gelman & Hill, esp. Part 2A. – dipetkov Apr 02 '23 at 10:48
