
I have used the Kaplan-Meier method several times before to compare how group $A$ survived relative to group $B$ over a period of (say) 5 years.

Now I face a somewhat different scenario: I have a continuous score variable $v$, and I want to know how it affects survival. Since I know $0 < v < 100$, I simply split it into ten groups and ran a Kaplan-Meier analysis on them.

I wonder whether there is a better way. Maybe Cox regression?

EdM

2 Answers


Your sense that Cox regression is a better solution is correct.

It's generally not a good idea to break up a continuous predictor variable. One useful approach is to model the continuous predictor with a flexible form such as a spline. That lets you discover possible non-linear relations between the predictor and the outcome without using up too many degrees of freedom. Your 10 groups use up 9 degrees of freedom. In contrast, a continuous spline fit with 4 or 5 knots, which is usually sufficient to capture nonlinearities well, would use up fewer than half as many (3 or 4 for a restricted cubic spline).
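To make the degrees-of-freedom comparison concrete, here is a minimal sketch that counts the model columns each coding produces. It assumes a restricted cubic spline in the truncated-power form (linear beyond the outer knots). The function names and the knot locations are hypothetical, chosen only for illustration. In practice knots are often placed at quantiles of the predictor.

```python
def decile_dummies(v):
    """Indicator columns for ten equal-width groups of v in (0, 100).

    Group 0 (v < 10) is the reference level, so nine columns remain.
    """
    group = min(int(v // 10), 9)
    return [1 if group == g else 0 for g in range(1, 10)]


def rcs_basis(v, knots):
    """Restricted cubic spline basis for one value v.

    Returns k - 1 columns for k knots: the linear term plus k - 2
    nonlinear terms that are constrained to be linear in the tails.
    """
    t = sorted(knots)
    if len(t) < 3:
        raise ValueError("need at least 3 knots")

    def cube(u):
        return max(u, 0.0) ** 3

    span = t[-1] - t[-2]
    scale = (t[-1] - t[0]) ** 2  # keeps all columns on a similar scale
    cols = [v]
    for j in range(len(t) - 2):
        term = (cube(v - t[j])
                - cube(v - t[-2]) * (t[-1] - t[j]) / span
                + cube(v - t[-1]) * (t[-2] - t[j]) / span)
        cols.append(term / scale)
    return cols


# Hypothetical knot locations, for illustration only.
KNOTS = [5.0, 27.5, 50.0, 72.5, 95.0]

print(len(decile_dummies(42.0)))    # 9 columns for the ten groups
print(len(rcs_basis(42.0, KNOTS)))  # 4 columns for the 5-knot spline
```

Each column costs one regression coefficient in the Cox model, so the count of columns is the count of degrees of freedom spent on the predictor.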

You can use the spline fit to display the modeled continuous relation of outcome to your predictor. If your audience wants to see full survival curves, then you can illustrate with groups separated by values of the predictor. But that grouping should be limited to display; statistical analysis should be done on the model based on the continuous predictor.
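The idea of fitting on the continuous predictor and grouping only for display can be sketched as follows. This is a bare-bones illustration in plain Python, not a substitute for a proper survival library. The toy data are made up, all function names are hypothetical, the score enters linearly (per 10 points) for brevity where a real analysis would use spline terms, and the code assumes no tied event times.

```python
import math

# Made-up toy data, for illustration only:
# (follow-up time in years, event indicator, score v).
rows = [
    (0.5, 1, 80), (0.9, 1, 65), (1.2, 0, 70), (1.6, 1, 90), (2.0, 1, 40),
    (2.7, 1, 55), (3.1, 0, 30), (3.8, 1, 20), (4.2, 1, 35), (5.0, 0, 10),
]


def score_and_info(beta, rows):
    """Score and observed information of the Cox partial likelihood."""
    score = info = 0.0
    for t_i, event_i, v_i in rows:
        if not event_i:
            continue  # censored subjects contribute only through risk sets
        risk = [v / 10 for t, _, v in rows if t >= t_i]
        w = [math.exp(beta * x) for x in risk]
        s0 = sum(w)
        mean = sum(wj * x for wj, x in zip(w, risk)) / s0
        mean_sq = sum(wj * x * x for wj, x in zip(w, risk)) / s0
        score += v_i / 10 - mean
        info += mean_sq - mean ** 2
    return score, info


def fit_cox(rows, tol=1e-8, max_iter=50):
    """Newton-Raphson estimate of the log hazard ratio per 10 points of v."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = score_and_info(beta, rows)
        if abs(score) < tol:
            break
        beta += score / info
    return beta


def breslow_baseline(beta, rows):
    """Breslow baseline cumulative hazard at each event time."""
    cum, out = 0.0, []
    for t_i, event_i, _ in sorted(rows):
        if event_i:
            s0 = sum(math.exp(beta * v / 10) for t, _, v in rows if t >= t_i)
            cum += 1.0 / s0
            out.append((t_i, cum))
    return out


def survival_curve(beta, baseline, v):
    """Model-based survival probabilities for a subject with score v."""
    return [(t, math.exp(-h0 * math.exp(beta * v / 10))) for t, h0 in baseline]


beta = fit_cox(rows)
baseline = breslow_baseline(beta, rows)
# Display only: curves at a few representative values of the score.
curves = {v: survival_curve(beta, baseline, v) for v in (20, 50, 80)}
print(f"hazard ratio per 10-point increase in v: {math.exp(beta):.2f}")
for v, curve in curves.items():
    print(v, [round(s, 3) for _, s in curve])
```

The inference (the coefficient and any test of it) comes from the single model on the continuous score. The values 20, 50 and 80 are picked afterwards purely to draw curves.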

EdM

As @EdM said, it is generally not a good idea to artificially categorize continuous variables. This also extends to the graphical display of the results. Categorizing the continuous variable to estimate Kaplan-Meier curves does not make a lot of sense for multiple reasons:

1.) If you include confounders, or other covariates in general, in the Cox model, its results and the categorized Kaplan-Meier curves may differ substantially, which confuses the reader.

2.) All the disadvantages of artificial categorization apply to Kaplan-Meier curves as well. You may lose statistical power, create misleading depictions, or even get biased curves.

I have recently done some work on this topic and created the contsurvplot R package (https://cran.r-project.org/package=contsurvplot) to offer a possible solution to this problem. With this package you can create multiple types of plots that depict the effect of the continuous variable on the time-to-event outcome directly using the survival probability. In particular, I would suggest using survival area plots or survival contour plots as described in my paper on this topic: https://arxiv.org/abs/2208.04644

Denzo