Inverse probability weighting for cox proportional hazards model to control for group mismatch vs multivariate cox model with confounder as covariate

Question

I am using a retrospective data set that looks at cancer patients' progression-free survival, and trying to determine if there is a difference between the treatment group and the control group. The grade of cancer though is a confounding variable, for which patients with high grade disease received treatment 80% of the time, and patients with low grade disease received treatment 20% of the time.

I am trying to compare progression-free survival rates using a Cox proportional hazards model between the treatment group and the control group and control for confounding from disease grade (low vs. high).

I am currently using inverse probability weighting to create ATE weights, and using these weights in the cox proportional hazards model (using R). I was wondering if it would be just as effective to create a CPH model with two covariates- both treatment group and disease grade, which I believe should also help control for the confounding for disease grade. Will both of these methods accomplish the same thing, and is one method more accepted than the other?

I have posted some example code below.

library('survival')
Set the seed to ensure reproducibility
set.seed(1234)
Create a vector of time ranging from 1 month to 5 years
time <- sort(c(runif(100, 1, 12), runif(100, 12, 60)))
Create a vector of treatment group
tx <- rbinom(length(time), 1, 0.5)
Create a vector of grade group
grade <- factor(ifelse(tx == 1, sample(c(rep("high", 80), rep("low", 20)), length(time), replace = TRUE), 
                       sample(c("high", "low"), length(time), replace = TRUE)))
Create a vector of binary event representing disease progression
event <- rep(0, length(time))
event[grade == "high"] <- rbinom(sum(grade == "high"), 1, 0.8)
event[grade == "low"] <- rbinom(sum(grade == "low"), 1, 0.2)
Create the dataset by combining the vectors
dat <- data.frame(time, event, tx, grade)
#Generate propensity score weights
mod.glm <- glm(tx ~ grade, data = dat, family = binomial);
dat$ps <- predict(mod.glm, type = "response");
dat$weight <- ifelse(dat$tx ==1, 1/dat$ps, 1/(1-dat$ps) );
#Unadjusted CPH model
mod.cph <- coxph(Surv(time,event)~ tx, data = dat);
#CPH model using IPW
mod.cph.ipw <- coxph(Surv(time,event)~ tx, data = dat, weights = weight);
#Multivariate CPH model
mod.cph.multi <- coxph(Surv(time,event)~ tx + grade, data=dat);
```

score 5 · Accepted Answer · answered Mar 05 '23 at 07:41

They won't accomplish the same thing, but they will accomplish very similar things. The reason they aren't the same is that the Cox model isn't collapsible.

Suppose treatment was actually randomised, so that there was no confounding. The propensity score would then be constant -- there is no varying propensity to be treated -- so the weighted propensity score model would estimate the same target parameter as an unadjusted two-group treatment comparison. The log hazard ratio for treatment in the multivariable model would not estimate this same parameter. It would estimate a parameter that's further from zero, because that's how hazard ratios work (so do odds ratios).

Now consider what happens when there's confounding. The best you could hope for is that the propensity-weighted model estimates the same thing that it would under randomisation (which it does if the models are correctly specified) and the multivariable model estimates the same thing that it would under randomisation (which it also does if the models are correctly specified). So, if everything works perfectly, the two estimates are different, but in the same way that they would be different under randomisation. Either one is ok.

For completeness, there's one additional wrinkle: if grade and treatment both have non-zero effects, it's not possible for the model with just treatment and the model with treatment and grade to both satisfy the proportional hazards assumption exactly. Heuristically, the mix of grades in the two treatment groups will change over time, as the people with high-grade disease die. If the hazard ratio for treatment is constant over time in the multivariable model it will decrease over time in the treatment-only model. The departures may not be large, and you might be happy ignoring this complication, but you did ask.

Inverse probability weighting for cox proportional hazards model to control for group mismatch vs multivariate cox model with confounder as covariate

Set the seed to ensure reproducibility

Create a vector of time ranging from 1 month to 5 years

Create a vector of treatment group

Create a vector of grade group

Create a vector of binary event representing disease progression

Create the dataset by combining the vectors

1 Answers1