Calculate survival function for remaining life prediction using CoxTimeVaryingFitter in Python

Question

Cox’s time-varying hazard model is implemented as "CoxTimeVaryingFitter", which is described here.

The mathematical formulation is represented as:
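Assuming the standard proportional-hazards form with covariates centered at their training means (which matches the "(x − x̄)′β" term quoted in the next paragraph), the model can be written as follows. The symbol b_0(t) for the baseline hazard is a notational assumption.

$$
h(t \mid x(t)) = b_0(t)\,\exp\!\big( (x(t) - \bar{x})' \beta \big)
$$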

The log partial hazards "(x − x̄)′β" are calculated by applying predict_log_partial_hazard(X) to the training set, as can be seen here:

After calculating the log partial hazards, how do I calculate the survival function to predict the remaining life for an unknown/test dataset with time-varying covariate values?
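As a minimal sketch of what that quantity is, the snippet below computes (x − x̄)′β by hand with NumPy. The function name, the coefficient vector `beta`, and the toy covariate matrix are hypothetical; in practice the coefficients and the centering means would come from the fitted model, and the assumption here is that centering uses the training-set means.

```python
import numpy as np

def log_partial_hazard(X, beta, x_mean):
    """Return (x - x_mean)' beta for every row of X (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    return (X - x_mean) @ beta

# Toy covariate rows (two sensors), made up for illustration only.
X_train = np.array([[1.0, 10.0],
                    [2.0, 12.0],
                    [3.0, 14.0]])
beta = np.array([0.5, -0.1])       # assumed fitted coefficients
x_mean = X_train.mean(axis=0)      # centering values: [2.0, 12.0]

lph = log_partial_hazard(X_train, beta, x_mean)
print(lph)
```

A row equal to the training mean gets a log partial hazard of 0, so its hazard equals the baseline hazard.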

Edit 1:

My objective is to predict RUL as shown here. In this article, a regression from log partial hazard to RUL is applied, as shown here:

I am unable to understand whether this logic is correct. Also, I have found no other article that supports it. Hence I wish to calculate the baseline hazard function, which will help me plot the survival function for the test engines and predict the RUL.

You have fit a model with time-varying covariate values. Do you also want to use time-varying covariate values for the new dataset? Please edit the question to provide that information. Also, look over this page for discussion about when (if ever) it makes sense to generate survival predictions from time-varying covariate values. — EdM, Feb 03 '23 at 15:33
Thanks @EdM for your valuable suggestions. Accordingly I have modified my question. My sole objective is to predict RUL of turbofan engines where I can get a survival function plot. — NN_Developer, Feb 03 '23 at 17:12

score 1 · Accepted Answer · answered Feb 03 '23 at 20:37

After calculation of log partial hazards, how do I calculate the survival function for prediction of remaining [useful] life [RUL] for unknown/test dataset with the time-varying covariate values?

The method in the linked post has some problems.

First, it depends on having complete survival times for a set of identical turbofan engines, each run until failure. That way you know the actual RUL for each fan at each time point. That's unlikely in many real-life situations.

Second, one step raises red flags for those experienced with survival analysis:

... we’ll clip any RUL values above 125 ...

That's precisely one type of thing that survival analysis is intended to avoid: arbitrarily cutting off long survival times. You want to use all of the available data.

Third, although the log partial hazards are evaluated from a survival model, the RUL values are not. Instead, a calibration curve is used to estimate the association between log partial hazards and the (known) RUL. Effectively, you calculate a log partial hazard and read the corresponding RUL (determined separately) from a curve based on the data set.
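To make that third point concrete, here is a minimal sketch of such a calibration step. It assumes a straight-line fit, which is not necessarily the functional form used in the linked post, and the numbers are toy values made up for illustration. Note that nothing in it uses the survival model beyond the log partial hazard itself.

```python
import numpy as np

# Toy training pairs (made up): log partial hazard and the known RUL.
lph_train = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
rul_train = np.array([120.0, 90.0, 60.0, 30.0, 0.0])

# Calibration curve: an ordinary least-squares line, RUL ~ slope * lph + intercept.
slope, intercept = np.polyfit(lph_train, rul_train, deg=1)

def rul_from_lph(lph):
    """Read RUL off the fitted calibration line (hypothetical helper)."""
    return slope * np.asarray(lph, dtype=float) + intercept

rul_new = rul_from_lph(0.5)
print(rul_new)
```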

Fourth, you have to be really careful with time-varying covariate values in Cox models. At each event time, a Cox model only evaluates the covariate values that are in place for those at risk at that time. If covariate values are constant over time, it's relatively easy to think about their overall associations with outcome. If they are changing over time (as in the sensor values in the linked post) then there's a risk of circular reasoning: if you have a covariate value at some survival time, then you already know survival has been at least that long. The author of lifelines thus doesn't allow for predictions based on time-varying covariates.

Fifth, the two prior points mean that adjustment of the baseline survival curve (which you should be able to extract from the model) for covariates, which is what you want to accomplish, is tricky in this situation. With time-constant covariates, you specify sets of covariate values, calculate corresponding log partial hazards (aka linear-predictor values), and simply estimate new sets of complete survival curves as in this answer. If covariate values are changing in time you have to re-adjust the survival function for each new set of values/times, conditional on surviving already to each time. The R survival package (unlike lifelines) will allow for such calculations, but you have to make sure that the time trajectories of new covariate values make sense. Furthermore, any estimates of survival beyond the last change of covariate values will be based on that last set of values.
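The adjustment described in the fifth point can be sketched as follows. This is not a lifelines or R API: all function names are hypothetical. It assumes the baseline cumulative hazard H0 is available on a grid of event times (lifelines is believed to expose it as `baseline_cumulative_hazard_`, but treat that as an assumption), and that each set of covariate values applies on an interval (start, stop]. Each baseline hazard increment is multiplied by exp(log partial hazard) for the values in force at that time; the last set is carried forward beyond the last change.

```python
import numpy as np

def cumulative_hazard_path(times, H0, starts, lps):
    """Cumulative hazard for one subject with piecewise-constant covariates.

    times, H0 : event-time grid and baseline cumulative hazard on it
    starts    : start time of each covariate interval (sorted, first is 0)
    lps       : log partial hazard in force on each interval
    """
    times = np.asarray(times, dtype=float)
    dH0 = np.diff(np.asarray(H0, dtype=float), prepend=0.0)
    lps = np.asarray(lps, dtype=float)
    # Interval (start, stop] in force at each time; last one is carried forward.
    idx = np.clip(np.searchsorted(starts, times, side="left") - 1, 0, len(lps) - 1)
    return np.cumsum(dH0 * np.exp(lps[idx]))

def conditional_survival(times, H, s):
    """P(T > t | T > s) on the grid; equals 1 for t <= s."""
    times = np.asarray(times, dtype=float)
    H_s = H[times <= s][-1] if np.any(times <= s) else 0.0
    return np.where(times > s, np.exp(-(H - H_s)), 1.0)

def median_remaining_life(times, S_cond, s):
    """First grid time where conditional survival drops to 0.5, minus s."""
    below = np.nonzero(S_cond <= 0.5)[0]
    return None if below.size == 0 else float(times[below[0]] - s)

# Toy baseline and covariate trajectory, made up for illustration.
times = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
H0 = np.array([0.1, 0.2, 0.4, 0.7, 1.1])
starts = [0.0, 2.0]              # covariates change after t = 2
lps = [0.0, np.log(2.0)]         # hazard doubles on the second interval

H = cumulative_hazard_path(times, H0, starts, lps)
S_cond = conditional_survival(times, H, s=3.0)   # engine known alive at t = 3
rul = median_remaining_life(times, S_cond, s=3.0)
print(H, S_cond, rul)
```

With a single interval (`starts=[0.0]`, one value in `lps`) this reduces to the time-constant case, S(t | x) = exp(−H0(t)·exp(lp)). The median is only one possible summary of remaining life, and it is undefined when the conditional curve never reaches 0.5 within the observed time grid.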

Sixth, what's probably needed here is a joint model of longitudinal covariate values (sensor readings) over time and survival. That's beyond my expertise. The R Survival Task View provides some suggestions.

Thanks a lot @EdM for your reply. Is there any built-in package in Python with which I can predict the RUL for the turbofan engine dataset? — NN_Developer, Feb 04 '23 at 03:27
@NN_Developer I'm not at all familiar with what's available in Python. — EdM, Feb 04 '23 at 03:48
Thanks a lot @EdM for your suggestions. I am now able to understand the pros and cons of the Cox model. — NN_Developer, Feb 06 '23 at 05:09
Hi @EdM, I am a bit confused here. Is the Cox model a machine learning model? Generally, ML models have some metrics like RMSE. However, in the Cox model we only assess the effect of covariates on the log HR. Could you please clarify my doubt? — NN_Developer, Feb 16 '23 at 06:25
@NN_Developer there are many metrics for evaluating a Cox model. See Chapter 20 of Harrell's Regression Modeling Strategies. For things like hyperparameter tuning via cross-validation, the partial likelihood deviance can serve a role like RMSE. The C-index shows the ability of a single model to distinguish between cases. — EdM, Feb 16 '23 at 14:15
@NN_Developer certainly you can say that the methods used to build and evaluate a Cox model can be used in machine learning. See for example the Cox model vignette for the R glmnet package, and the survival:cox objective in xgboost. — EdM, Feb 19 '23 at 15:38
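The C-index mentioned in the comments above can be sketched in plain Python as follows. This is a simplified version of Harrell's concordance index for right-censored data: a pair is comparable when the subject with the shorter time had an observed event, tied risk scores count one half, and pairs with tied times are ignored. The function name and the toy data are hypothetical; library implementations handle ties and efficiency more carefully.

```python
def concordance_index(times, events, risk):
    """Fraction of comparable pairs where the earlier failure has the higher risk."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier failure of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable

# Toy data, made up: times, event indicators (0 = censored), risk scores
# such as log partial hazards (higher means expected to fail sooner).
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risk = [2.0, 1.0, 0.5, 1.5]

c_index = concordance_index(times, events, risk)
print(c_index)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ranking. Unlike RMSE, it measures discrimination only, not how accurate the predicted times are.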
