
I'm conducting research in which patients underwent surgery; for some the surgery was successful (outcome = 1) and for some it wasn't (outcome = 0). The risk factors were evaluated with a Cox model to get HRs (hazard ratios) for the different features that were collected about the patients. One of those features was the "reason for the surgery", a binary feature with two values; let's call them "reason X" and "reason Y". My problem is that for "reason Y", there was no patient in the data whose surgery was unsuccessful (outcome = 0). This prevents me from using Cox regression to estimate the relative risk of reason X vs. reason Y. It is also worth mentioning that the sample size is small (N = 27). What can I do to illustrate this relative risk (HR) for reason X vs. reason Y in this case?

Thank you all in advance :)

AREEEL
    Why did you use a Cox model? I'm not saying it's wrong, but usually you use a Cox model for survival/time-to-event data. Here it seems you could use logistic regression. – Peter Flom Jan 16 '24 at 14:20

1 Answer


Those aren't "missing data" in the usual sense. It's just that, in your data, there weren't any events associated with one particular value of your binary predictor. Whether you evaluate this with a Cox survival model or with a logistic regression as suggested in the comments, the maximum-likelihood estimate of that predictor's coefficient is infinite, so you won't be able to get a finite point estimate for the hazard ratio or odds ratio unless you use some type of penalized model, e.g. Firth logistic regression.
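
As a minimal sketch of what a penalized fit does here, the plain-Python code below fits Firth logistic regression (intercept plus one binary predictor) with the usual modified-score iteration. The counts are hypothetical toy data, not the asker's, and y = 1 codes an unsuccessful surgery, the reverse of the question's outcome coding. Ordinary maximum likelihood would push the slope toward infinity because "reason Y" has no failures; the Firth penalty keeps it finite. For this saturated two-group model, the Firth estimate equals the log odds ratio after adding 0.5 to every cell of the 2 x 2 table, which gives a convenient check.

```python
import math


def firth_logistic_binary(x, y, max_iter=100, tol=1e-10):
    """Firth-penalized logistic regression of y on an intercept and one binary x.

    Returns (b0, b1); b1 is the penalized log odds ratio for x = 1 vs x = 0.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        w = [pi * (1.0 - pi) for pi in p]
        # Fisher information X'WX for design rows (1, xi), and its inverse.
        i00 = sum(w)
        i01 = sum(wi * xi for wi, xi in zip(w, x))
        i11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = i00 * i11 - i01 * i01
        inv00, inv01, inv11 = i11 / det, -i01 / det, i00 / det
        # Leverages h_i = w_i * x_i' (X'WX)^{-1} x_i (hat-matrix diagonal).
        h = [wi * (inv00 + 2.0 * inv01 * xi + inv11 * xi * xi) for wi, xi in zip(w, x)]
        # Firth-modified score: y_i - p_i + h_i * (0.5 - p_i).
        r = [yi - pi + hi * (0.5 - pi) for yi, pi, hi in zip(y, p, h)]
        u0 = sum(r)
        u1 = sum(ri * xi for ri, xi in zip(r, x))
        # Newton-type step using the unpenalized information matrix.
        d0 = inv00 * u0 + inv01 * u1
        d1 = inv01 * u0 + inv11 * u1
        b0, b1 = b0 + d0, b1 + d1
        if max(abs(d0), abs(d1)) < tol:
            break
    return b0, b1


# Hypothetical toy counts (not the asker's data): 15 "reason X" patients with
# 6 failures, 12 "reason Y" patients with 0 failures. y = 1 means failure.
x = [1] * 15 + [0] * 12
y = [1] * 6 + [0] * 9 + [0] * 12
b0, b1 = firth_logistic_binary(x, y)
print(f"Firth log odds ratio (X vs Y): {b1:.3f}, OR = {math.exp(b1):.2f}")
```

With these toy counts, b1 should agree with log((6.5 x 12.5) / (9.5 x 0.5)). For real analyses a maintained implementation, such as the R packages logistf or coxphf (a Firth-penalized Cox model), would be the safer choice.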

One solution is to get a 95% lower confidence limit for the Cox regression coefficient by evaluating the profile likelihood over a range of postulated coefficient values. Then find the coefficient value that corresponds to the lower 95% confidence limit, and convert it to a hazard ratio if you wish. This page shows how to do that, based on an approach suggested by Therneau and Grambsch in Section 3.5, "Infinite Coefficients."
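
The profile-likelihood idea can be sketched in plain Python. This is a minimal sketch, not the code from the linked page: it assumes a single binary covariate (x = 1 for "reason X", 0 for "reason Y"), treats an unsuccessful surgery as the event, uses the Breslow approximation for tied times (R's coxph defaults to Efron), and runs on hypothetical toy data. Because every event is in the x = 1 group, the partial log-likelihood rises monotonically toward a finite supremum as beta grows, so the sketch computes that supremum directly and bisects for the beta at which 2 x (supremum - l(beta)) equals 3.84, the 95% point of a chi-square distribution with 1 df.

```python
import math

# 95th percentile of the chi-square distribution with 1 degree of freedom.
CHISQ1_95 = 3.841458820694124


def cox_loglik(beta, times, events, x):
    """Cox partial log-likelihood for one covariate (Breslow handling of ties)."""
    ll = 0.0
    for ti, di, xi in zip(times, events, x):
        if not di:
            continue
        # Linear predictors of everyone still at risk at this event time.
        risk = [beta * xj for tj, xj in zip(times, x) if tj >= ti]
        m = max(risk)  # log-sum-exp trick for numerical stability
        ll += beta * xi - (m + math.log(sum(math.exp(r - m) for r in risk)))
    return ll


def cox_loglik_limit(times, events, x):
    """Supremum of the partial log-likelihood as beta -> +infinity.

    Only valid when every event occurs in the x == 1 group: each event term
    then tends to -log(number of x == 1 subjects at risk).
    """
    assert all(xi == 1 for di, xi in zip(events, x) if di)
    total = 0.0
    for ti, di in zip(times, events):
        if di:
            n1_at_risk = sum(1 for tj, xj in zip(times, x) if tj >= ti and xj == 1)
            total -= math.log(n1_at_risk)
    return total


def profile_lower_limit(times, events, x, cutoff=CHISQ1_95, lo=-20.0, hi=50.0, tol=1e-10):
    """Bisect for the beta where 2 * (sup l - l(beta)) equals the cutoff."""
    l_sup = cox_loglik_limit(times, events, x)

    def excess(beta):
        # Decreasing in beta here, because l(beta) rises monotonically.
        return 2.0 * (l_sup - cox_loglik(beta, times, events, x)) - cutoff

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical toy data (not the asker's): follow-up in months,
# event = 1 for an unsuccessful surgery, x = 1 for "reason X", 0 for "reason Y".
times = [2, 3, 5, 6, 8, 11] + [12] * 9 + [12] * 12
events = [1] * 6 + [0] * 9 + [0] * 12
x = [1] * 15 + [0] * 12

beta_lower = profile_lower_limit(times, events, x)
print(f"lower 95% limit: beta = {beta_lower:.3f}, HR = {math.exp(beta_lower):.2f}")
```

exp(beta_lower) is then the lower limit for the hazard ratio of reason X vs. reason Y; the upper limit is infinite. The 3.84 cutoff gives the lower end of a two-sided 95% likelihood-ratio interval; using 2.71 (the 90% point) instead would give a one-sided 95% bound.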

You still face the problem of a very small sample size. If you only have 27 cases total and no events associated with one level of your main binary predictor, you probably only have 10-15 events in total. The number of events, not the number of cases, is the main limit to power in a survival model. That's barely enough to handle even one unpenalized predictor without overfitting. You might consider using a Cox or logistic ridge regression instead (e.g., via the glmnet package), which would reduce the overfitting and, if you included your binary predictor in the penalization, would also give you a penalized estimate for its coefficient.
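
To show why penalization yields a finite coefficient, here is a minimal plain-Python sketch of ridge (L2-penalized) logistic regression for one binary predictor, fitted by damped Newton steps. It is not glmnet: the penalty value lam is a hypothetical fixed choice on a summed log-likelihood scale, which differs from glmnet's per-observation scaling, and in practice the penalty would be chosen by cross-validation (glmnet's alpha = 0 gives ridge, with family = "binomial" or "cox"). As in glmnet, the intercept is left unpenalized. The data are hypothetical, with y = 1 coding an unsuccessful surgery.

```python
import math


def log1pexp(z):
    """Numerically stable log(1 + exp(z))."""
    return z + math.log1p(math.exp(-z)) if z > 0 else math.log1p(math.exp(z))


def ridge_objective(b0, b1, x, y, lam):
    """Bernoulli log-likelihood minus an L2 (ridge) penalty on the slope only."""
    ll = sum(yi * (b0 + b1 * xi) - log1pexp(b0 + b1 * xi) for xi, yi in zip(x, y))
    return ll - 0.5 * lam * b1 * b1


def ridge_logistic_binary(x, y, lam, max_iter=100, tol=1e-10):
    """Maximize the ridge-penalized log-likelihood by damped Newton steps.

    The intercept b0 is left unpenalized; only the slope b1 is shrunk.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        w = [pi * (1.0 - pi) for pi in p]
        # Gradient of the penalized log-likelihood.
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for yi, pi, xi in zip(y, p, x)) - lam * b1
        # Negative Hessian: X'WX plus lam on the slope entry.
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x)) + lam
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        # Halve the step until the objective does not decrease.
        current = ridge_objective(b0, b1, x, y, lam)
        step = 1.0
        while step > 1e-8 and ridge_objective(b0 + step * d0, b1 + step * d1, x, y, lam) < current:
            step /= 2.0
        b0, b1 = b0 + step * d0, b1 + step * d1
        if max(abs(step * d0), abs(step * d1)) < tol:
            break
    return b0, b1


# Hypothetical toy data and penalty (not the asker's): y = 1 means failure.
x = [1] * 15 + [0] * 12
y = [1] * 6 + [0] * 9 + [0] * 12
b0, b1 = ridge_logistic_binary(x, y, lam=1.0)
print(f"ridge log odds ratio (X vs Y), lam=1: {b1:.3f}")
```

Increasing lam shrinks b1 further toward 0; the point is that any positive penalty keeps the estimate finite despite the empty cell.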

EdM