
I want to evaluate the effect of vaccination on the risk of infection during outbreaks and the change in efficacy due to the time passed from vaccination. I would like to achieve a causal interpretation of the results if possible.

The data reports the number of cases over the total at the end of each outbreak, divided by vaccination status, and the average time from vaccination (in months) for each vaccination group. The setting is nursing homes.

I am trying to imagine the causal flow of these variables using DAGs but I am not sure if the time from vaccination should be considered an independent ancestor of the outcome (fig1) or a descendant of the vaccination (fig2).

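[Fig. 1: DAG with time from vaccination as an independent ancestor of the outcome. Fig. 2: DAG with time from vaccination as a descendant of vaccination.]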

Consequently, I am struggling to model my question in a regression setting, that is: the effect of vaccination (Vax) in determining infection risk (the proportion infected at the end of the outbreak), and how time from vaccination (VaxTime) changes the Vax effect. Should I adjust for both VaxTime and VaxTime * Vax (eq. 1), or for the interaction VaxTime * Vax only (eq. 2)?

$$\text{eq. 1}:\ f(y) = \beta_0 + \beta_1 \text{Vax} + \beta_2 \text{VaxTime} + \beta_3 \text{VaxTime} \times \text{Vax} + \dots$$

$$\text{eq. 2}:\ f(y) = \beta_0 + \beta_1 \text{Vax} + \beta_3 \text{VaxTime} \times \text{Vax} + \dots$$

(here I omit a random intercept for the individual outbreaks and the nursing home characteristics)

Finally, I wonder if I should include time from vaccination at all to consider the full effect of the vaccination.

Bakaburg
  • What exactly is "time from vax"? Is that the time interval from vaccination to the point at which you are considering the risk of infection? Because, if so, then I would reverse the arrow from "Infection" to "Time from vax". – Adrian Keister Jul 18 '22 at 23:24
  • Uhm, I understand what you mean. The data I have is related to outbreaks, so the infection risk is the number of infected per vax status during over the total pop at the end of the outbreak (the setting is long-term care facilities). So, given that we ignore the risk between outbreaks (it's not a survival analysis) I guess that infection | outbreak is taking out the infection -> time from vax relationship. But indeed is also taking out the Vax -> time from vax link isn't it? The main question stays though: should I keep time from vax as a main effect or just as interaction? – Bakaburg Jul 19 '22 at 07:52
  • given that (Vax -> Infection) | Outbreak changes with Time from Vax strata. – Bakaburg Jul 19 '22 at 07:55
  • I'm sorry, but I can't parse your comments. Could you please re-post after correcting your grammar errors? Also, please explain how you're using the pipe symbol '|'. I'm not familiar with your syntax. – Adrian Keister Jul 19 '22 at 14:31
  • the | means "conditional on" while "->" means a causal relationship. So "infection | outbreak" means infection risk during an outbreak that started already. The point of my question is how to evaluate how vaccine protection changes given time from vaccination, given that an outbreak is already underway. – Bakaburg Jul 19 '22 at 15:51
  • Please edit the question to include the important information in your comments. I started writing an answer suggesting survival analysis, then saw the comment "it's not a survival analysis." Comments are easy to overlook. Please also provide more information about the nature of your data: what are the actual observations you are trying to model? In particular, as these seem to be aggregated data, what is the observation that you call "time from vax"? That makes sense on an individual basis, but its application to this context isn't clear. – EdM Jul 20 '22 at 12:12
  • In the meantime, review this page on why it's usually best to model individual coefficients for all predictors involved in an interaction. – EdM Jul 20 '22 at 12:16
  • I updated my question and hopefully made it clearer – Bakaburg Jul 23 '22 at 15:36

1 Answer

If Vax represents a fraction of individuals vaccinated in a facility

The time since vaccination might be considered a moderator of the effect of vaccination, implicit in your including it in an interaction term with the vaccination prevalence. There's some difficulty forcing this scenario into DAGs; see for example Weinberg, Can DAGs Clarify Effect Modification?, Epidemiology 18: 569–572 (2007).

With respect to including time since vaccination as a predictor outside of its interaction term, that's typically the best practice. It seems particularly important here, as it's quite possible that there will be no substantial interaction between time and prevalence of vaccinations but that time on its own is important (on the log-odds scale; presumably you're doing logistic regression for these binomial outcomes), given that there have been vaccinations. You don't want to miss that possibility.

There's a good chance that there won't be a monotonic association between that time and outcome. Vaccinations very close to your evaluation times at the end of outbreaks won't have had enough opportunity to provide immunity; immunity from vaccinations long before the evaluation times might well have waned in the interim. There will need to be some flexible modeling of time.*

A potential difficulty will be that the coefficient(s) reported for Time from Vax in the interaction model of equation 1 will be at a value of Vax = 0, which might seem to make no sense if Vax is a continuous measure. For interpretability of reported coefficients it might help to re-center the Vax values around some typical value, even though predictions from the model should be the same in any case.
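As a rough illustration of that last point, here is a minimal sketch on synthetic data. Everything in it is an assumption for demonstration only: the simulated values, the variable names, and the use of ordinary least squares in place of the logistic model (the algebra of re-centering is the same for any linear predictor). It fits equation 1 with raw and with mean-centered Vax and compares the results.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
vax = rng.uniform(0.2, 0.9, n)        # hypothetical vaccinated fraction per facility
vax_time = rng.uniform(0.0, 12.0, n)  # hypothetical mean months since vaccination
# Synthetic outcome on the linear-predictor scale (made-up coefficients)
y = 1.0 - 2.0 * vax + 0.1 * vax_time + 0.05 * vax * vax_time + rng.normal(0, 0.1, n)

def fit(v):
    """Least-squares fit of b0 + b1*v + b2*time + b3*v*time; returns (coefficients, predictions)."""
    X = np.column_stack([np.ones(n), v, vax_time, v * vax_time])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, X @ beta

c = vax.mean()                      # a "typical" Vax value used for centering
beta_raw, pred_raw = fit(vax)
beta_ctr, pred_ctr = fit(vax - c)

# Predictions do not change; only the meaning of the time coefficient does.
same_predictions = np.allclose(pred_raw, pred_ctr)
# After centering, the time coefficient is the time slope at Vax = c rather than at Vax = 0.
time_slope_shift = np.isclose(beta_ctr[2], beta_raw[2] + beta_raw[3] * c)
print(same_predictions, time_slope_shift)
```

The interaction coefficient itself is unchanged by centering; what moves is the reported "main effect" of time, which becomes the slope at the centering value.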

If Vax is a 0/1 indicator of whether a facility has had vaccinations

This is a much simpler scenario. With Vax = 0 being no vaccinations, recognize that the interaction term is just a product, and specify VaxTime = 0 when Vax = 0. Under your equation 2, for cases with Vax = 0 you have:

$$ f(y) = \beta_0$$

For cases with Vax = 1 you have:

$$ f(y) = \beta_0 + \beta_1 + \beta_3 \text{VaxTime}$$

That is, the interaction term VaxTime*Vax is non-zero only when Vax = 1; it's thus identical to VaxTime. That interaction term can be represented just as VaxTime, covering both Vax = 0 and Vax = 1 situations if you code VaxTime as 0 when Vax = 0. Your two equations then are equivalent, except that $\beta_3$ in your second equation would be numerically equivalent to $\beta_2+\beta_3$ in the first.
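A small numerical check of this, as a sketch with made-up values (the arrays and names below are hypothetical, not your data): once VaxTime is coded as 0 for the unvaccinated, the product column equals the VaxTime column, so the design matrix of equation 1 carries a redundant column.

```python
import numpy as np

vax = np.array([0, 0, 0, 1, 1, 1, 1])
# Hypothetical months since vaccination, coded 0 whenever vax == 0
vax_time = np.array([0.0, 0.0, 0.0, 2.0, 5.0, 8.0, 11.0])

interaction = vax * vax_time

# Design matrices for eq. 1 (main effect plus interaction) and eq. 2 (interaction only)
X_eq1 = np.column_stack([np.ones_like(vax_time), vax, vax_time, interaction])
X_eq2 = np.column_stack([np.ones_like(vax_time), vax, interaction])

columns_identical = np.array_equal(interaction, vax_time)
rank_eq1 = np.linalg.matrix_rank(X_eq1)  # 4 columns, but one is a duplicate
rank_eq2 = np.linalg.matrix_rank(X_eq2)
print(columns_identical, rank_eq1, rank_eq2)
```

Because the eq. 1 matrix has four columns but only three independent ones, software will either drop one of the two time terms or report only their sum, which is why only $\beta_2+\beta_3$ is identifiable there.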

As noted above, you should model VaxTime as some flexible function g(VaxTime); the above simplification of the interaction term to a term g(VaxTime) holds.
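Written out, and assuming g is built from basis functions $h_k$ without a constant term (the symbols $h_k$, $\gamma_k$ and $K$ are introduced here only for illustration), one way to express this is:

$$ f(y) = \beta_0 + \text{Vax}\,\bigl[\beta_1 + g(\text{VaxTime})\bigr], \qquad g(t) = \sum_{k=1}^{K} \gamma_k h_k(t) $$

With Vax = 0 this reduces to $\beta_0$ whatever value VaxTime is coded as, and with Vax = 1 it gives $\beta_0 + \beta_1 + g(\text{VaxTime})$. Keeping the explicit multiplication by Vax is a safeguard for bases where $h_k(0) \neq 0$.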

I'd worry about causal inference if this weren't a randomized trial, as the characteristics of an institution making a choice not to vaccinate might also carry over to other policies that could affect disease outbreaks.


*A similar argument might be made for your Vax predictor if it's continuous, as things like herd immunity can lead to its having non-linear associations with outcome.

EdM
  • Thanks for the answer! I am not sure I follow why VaxTime should be important on its own; as you say afterward, its coefficient would not make sense when Vax == 0, would it? You suggest centering Vax as a solution, but Vax is a categorical variable, so how can I center it? One solution would be to convert it to numeric, but the assumption of a linear effect across the various vax statuses is quite strong, while splines seem overly complex. – Bakaburg Jul 23 '22 at 17:14
  • Regarding the non-monotonicity you are totally right. As a simplifying approach I am downgrading the vaccination status if the outbreak started < 14 days after the vaccination. – Bakaburg Jul 23 '22 at 17:14
  • @Bakaburg sorry, I wasn't aware that you had individual vaccination status data; I thought that Vax was the overall vaccination percentage in each nursing home. With 0/1 for Vax you don't need an interaction term, just additive Vax and Time terms if Vax=0 is unvaccinated, and coding Time=0 for unvaccinated. This is what whuber suggests here for a continuous predictor that only exists for one value of a binary predictor. – EdM Jul 23 '22 at 20:36
  • @Bakaburg the same holds even if the 0/1 Vax coding is per nursing home. – EdM Jul 23 '22 at 20:58
  • No, no! The data is aggregated and divided by vax status; I am sorry I wasn't clear. For each vax status I have the cases and the denominator. By Vax == 0 I meant the control group of unvaccinated. – Bakaburg Jul 25 '22 at 10:18
  • Why wouldn't I need an interaction term if I have info at the vax group level? – Bakaburg Jul 25 '22 at 10:19
  • @Bakaburg I added to the answer to deal with the 0/1 Vax situation, which is much simpler and identical to what whuber suggested in a similar context. You might still think of it conceptually as an interaction term, but the term is 0 when Vax = 0 and is equal to VaxTime (or whatever function you specify for it) when Vax = 1. – EdM Jul 25 '22 at 12:55