I have been asked to analyse a stepped-wedge cluster randomised trial that I did not design. Because the trial protocol has already been registered, I have little flexibility in how the analysis is performed.

The trial involved an intervention to increase the detection of a particular medical condition in primary care. (The condition was thought to be underdiagnosed.) About 1,500 people (patients) from 12 health services (clusters) were followed for several years. Hence, these are longitudinal data, not repeated cross-sectional samples.

Here is some example data for person number 1, who belongs to cluster number 10:

--------------------------------------------------------
| person_id | cluster | time | intervention | detected |
--------------------------------------------------------
|         1 |      10 |    1 |            0 |        0 |
|         1 |      10 |    2 |            0 |        0 |
|         1 |      10 |    3 |            0 |        0 |
|         1 |      10 |    4 |            1 |        0 |
|         1 |      10 |    5 |            1 |        0 |
|         1 |      10 |    6 |            1 |        0 |
|         1 |      10 |    7 |            1 |        1 |
|         1 |      10 |    8 |            1 |        1 |
--------------------------------------------------------

At timepoint 4, the cluster that this person belongs to entered the intervention. At timepoint 7, the medical condition was detected in this person.

A few people already have the medical condition diagnosed at baseline; for them, the detected variable is set to 1 in all time periods.

The trial protocol says that the analysis will be performed with a generalised linear mixed model. Variation between clusters will be modelled as a random intercept effect, and nested within these, time will be treated as a random coefficient effect.

Can this be done with something as simple as:

Stata:

melogit detected i.intervention i.time || cluster: i.time || person_id:, or

R:

m <- glmer(detected ~ intervention + time + (1 + time | cluster) + (1 | person_id),
           data = myData, family = binomial,
           control = glmerControl(optimizer = "bobyqa"), nAGQ = 10)

I have been trying this syntax but my models do not converge.

I haven't worked with these models before, and any advice (preferably with Stata syntax) would be greatly appreciated.

Zoë
  • Carrying a baseline status of 1 forward to all follow-up periods seems strange. Usually this would be a baseline exclusion. On your main question, I'd use random intercepts for clusters but handle within-patient correlations with an AR(1) continuous-time correlation structure, separate from any random effects. – Frank Harrell Aug 30 '23 at 11:19
  • When a person is diagnosed with the medical condition, it is permanent. This is why the "detected" variable may be set to 1 at all time-points. Some people are known to have the condition at baseline, and we expect it is present but undiagnosed in some of the other people. The aim of the study is to increase the detection of the condition, and that's why people with the condition at baseline aren't excluded. Given that the aim of the study is to hopefully show an increase in the proportion of people with the condition, would you be able to suggest an alternative structure for the data? – Zoë Sep 03 '23 at 11:40

1 Answer

Assuming that the detection at baseline is a result of the intervention, that may be OK. If it's not part of the intervention, it could be adjusted for as a covariate. But setting the response to 1 at all subsequent periods makes this sound much more like a state transition model with detection as an absorbing state. Random effects models cannot handle the within-patient correlation of 1.0 among all the carried-forward responses. State transition models (e.g. a first-order Markov process) can also handle random effects for clusters, and within-patient correlation is typically handled automatically by conditioning on the previous response.

You might also think of this as a time to first detection outcome using a discrete failure time model from survival analysis, with cluster random effects.
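
As a rough illustration of the discrete failure time idea, the data can be cut down to a person-period "risk set": each person keeps the rows up to and including the period of first detection, and the carried-forward 1s are dropped. The sketch below uses base R and made-up toy values in the question's column layout. The helper name make_risk_set is hypothetical, and dropping people already detected at their first period is an assumption that may not suit this trial.

```r
# Toy data in the same layout as the question (values are made up for illustration).
d <- data.frame(
  person_id    = rep(c(1, 2, 3), each = 4),
  cluster      = rep(c(10, 10, 11), each = 4),
  time         = rep(1:4, times = 3),
  intervention = c(0, 0, 1, 1,  0, 0, 1, 1,  0, 1, 1, 1),
  detected     = c(0, 0, 1, 1,  1, 1, 1, 1,  0, 0, 0, 0)
)

# Hypothetical helper: keep only the periods in which a person is still at risk
# of a first detection (all rows before the first detected == 1, plus that row).
make_risk_set <- function(d) {
  d <- d[order(d$person_id, d$time), ]
  # Number of detected periods strictly before the current row, per person.
  prior <- ave(d$detected, d$person_id, FUN = function(x) cumsum(x) - x)
  # Assumption: people already detected at their first period are never at risk,
  # so all of their rows are dropped.
  prevalent <- ave(d$detected, d$person_id, FUN = function(x) x[1]) == 1
  d[prior == 0 & !prevalent, ]
}

risk <- make_risk_set(d)
print(risk)

# One possible model on the reduced data (requires the lme4 package; not run here):
# library(lme4)
# fit <- glmer(detected ~ intervention + factor(time) + (1 | cluster),
#              data = risk, family = binomial(link = "cloglog"))
```

In this layout each person contributes at most one event, so a person-level random intercept would probably be dropped and only the cluster intercept kept. In Stata, a comparable fit on the reduced data might be mecloglog detected i.intervention i.time || cluster:, though I have not run it against this trial's data.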

Frank Harrell
  • Thank you, Frank. I think you've probably identified why I couldn't get my models to converge. I was already considering survival analysis, so I think that's a good suggestion. I will investigate the state transition model option as well. – Zoë Sep 03 '23 at 12:19
  • State transition models are especially useful when there are > 2 states. See this for a case study. – Frank Harrell Sep 03 '23 at 12:39
  • Thank you, Frank. This is very helpful. – Zoë Sep 06 '23 at 10:27