
In the Cox model, the dependent variable is often stated as time to event, e.g.

https://www.statsdirect.com/help/survival_analysis/cox_regression.htm

even though in the regression formula we model the hazard, h. I am aware of the relationship:

$$h(t) = \frac{f(t)}{S(t)}$$

where f(t) is the pdf of survival time and S(t) the survival function.
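To make that relationship concrete, here is a minimal Python sketch. It assumes a Weibull distribution for event times, which is purely an illustrative choice and not something the Cox model assumes. It numerically checks three identities: h(t) = f(t)/S(t), S(t) = exp(-H(t)) where H is the cumulative hazard, and f(t) = -dS/dt. The shape and scale values are arbitrary.

```python
import math

# Weibull event-time distribution with shape k and scale lam (illustrative assumption).
def weibull_survival(t, k, lam):
    return math.exp(-((t / lam) ** k))

def weibull_pdf(t, k, lam):
    return (k / lam) * (t / lam) ** (k - 1) * math.exp(-((t / lam) ** k))

def weibull_hazard(t, k, lam):
    return (k / lam) * (t / lam) ** (k - 1)

def weibull_cum_hazard(t, k, lam):
    # H(t) is the integral of h(u) du from 0 to t
    return (t / lam) ** k

def check_relationships(t, k=1.5, lam=10.0, eps=1e-6):
    s = weibull_survival(t, k, lam)
    f = weibull_pdf(t, k, lam)
    h = weibull_hazard(t, k, lam)
    H = weibull_cum_hazard(t, k, lam)
    # f(t) = -dS/dt, approximated here by a central finite difference
    dS = (weibull_survival(t + eps, k, lam) - weibull_survival(t - eps, k, lam)) / (2 * eps)
    return {
        "h_equals_f_over_S": math.isclose(h, f / s, rel_tol=1e-12),
        "S_equals_exp_minus_H": math.isclose(s, math.exp(-H), rel_tol=1e-12),
        "f_equals_minus_dS_dt": math.isclose(f, -dS, rel_tol=1e-6),
    }

for t in (1.0, 5.0, 20.0):
    print(t, check_relationships(t))
```

All three checks should hold at every t. The hazard, the pdf and the survival function are different views of one distribution of event times.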

I have seen lots of formulae using terms like those above, but these seem to refer to rates and probabilities, not "time" itself, even though they are functions of time.

I did notice this interesting comment:

This just to clarify that the interpretation will not be different if you look at the same problem in terms of survival time.

From here: What is the dependent variable in a coxph regression in R?

but it does not go into detail. I would be most grateful for some explanation of how modelling the hazard is equivalent to modelling time to event (if I've understood correctly). If I'm totally on the wrong track, I would equally appreciate somebody correcting me! Thank you!

user167591

1 Answer


Fitting a Cox model to get regression coefficients doesn't directly involve time, only the ordering of events in time. The specific value of the hazard at each event time cancels out in the calculation. After the model has been fit, you can then estimate the underlying cumulative baseline hazard around which the covariates and their associated coefficient estimates work. See this page, for example. That re-introduces the original time scale.
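A minimal sketch of the point about ordering follows. The function computes the Cox partial log-likelihood for a single covariate on a hypothetical toy dataset, and it assumes there are no tied event times, so no tie-handling method is needed. If the times are replaced by any strictly increasing transformation of themselves, every risk set stays the same. The partial likelihood, and therefore the fitted coefficient, is unchanged.

```python
import math

def cox_partial_loglik(times, events, x, beta):
    """Cox partial log-likelihood for one covariate; assumes no tied event times."""
    ll = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored cases contribute only through the risk sets
        # Risk set: everyone still under observation just before t_i
        risk = [j for j, t_j in enumerate(times) if t_j >= t_i]
        log_denom = math.log(sum(math.exp(beta * x[j]) for j in risk))
        ll += beta * x[i] - log_denom
    return ll

# Hypothetical toy data: follow-up times, event indicators (1 = event, 0 = censored), covariate
times = [2.0, 3.5, 4.0, 7.0, 9.5, 12.0]
events = [1, 1, 0, 1, 0, 1]
x = [1.2, 0.4, -0.3, 0.8, -1.0, 0.1]

# Strictly increasing transformations preserve the ordering of the times
stretched = [t ** 3 + 100.0 for t in times]
logged = [math.log(t) for t in times]

for beta in (-1.0, 0.0, 0.5, 2.0):
    a = cox_partial_loglik(times, events, x, beta)
    b = cox_partial_loglik(stretched, events, x, beta)
    c = cox_partial_loglik(logged, events, x, beta)
    print(beta, a, b, c)  # the three values agree for each beta
```

The actual time values never enter the calculation. Only comparisons of the form `t_j >= t_i` do.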

The point that I think your quoted comment is making is that a covariate that leads to a higher hazard leads to a shorter estimated survival time. Don't over-interpret that. It's about direction, not specifically about the magnitude of the association. The specific form of the relationship depends on the baseline hazard as a function of time.

That's easiest to think about when the covariate values are constant over time. Drawing on this answer, for time $t$, a baseline cumulative hazard estimate $\hat{H}_0(t)$ and a set of covariate values $x_j$ with a corresponding vector of coefficient estimates $\beta$, the estimated survival as a function of time is:

$$\hat S(t;x_j) = \exp\left(-\hat{H}_0(t) \exp (\beta^T x_j)\right)$$

So covariates that increase the linear predictor $\beta^T x_j$ necessarily decrease the estimated survival as a function of time.
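Here is a sketch of that two-step process on hypothetical toy data. First, the Breslow estimator gives the baseline cumulative hazard as a step function at the event times. Breslow is one common choice and is assumed here, again with no tied event times. The coefficient is simply assumed rather than fitted, and the baseline corresponds to a covariate value of 0. Second, the formula above turns the baseline into survival curves for different covariate values.

```python
import math

def breslow_cum_hazard(times, events, x, beta):
    """Breslow estimate of the baseline cumulative hazard H0 as a step function.
    Returns (event_time, H0 just after that time) pairs; assumes no tied event times."""
    steps, H0 = [], 0.0
    for t_i in sorted(t for t, d in zip(times, events) if d):
        risk_sum = sum(math.exp(beta * x_j) for t_j, x_j in zip(times, x) if t_j >= t_i)
        H0 += 1.0 / risk_sum
        steps.append((t_i, H0))
    return steps

def H0_at(steps, t):
    """Evaluate the right-continuous step function H0 at time t (0 before the first event)."""
    value = 0.0
    for t_i, h in steps:
        if t_i > t:
            break
        value = h
    return value

def survival(steps, t, x_new, beta):
    # S(t; x) = exp(-H0(t) * exp(beta * x))
    return math.exp(-H0_at(steps, t) * math.exp(beta * x_new))

times = [2.0, 3.5, 4.0, 7.0, 9.5, 12.0]
events = [1, 1, 0, 1, 0, 1]
x = [1.2, 0.4, -0.3, 0.8, -1.0, 0.1]
beta = 0.7  # hypothetical coefficient, treated as if already estimated

steps = breslow_cum_hazard(times, events, x, beta)
for t in (1.0, 3.0, 8.0, 12.0):
    print(t, [round(survival(steps, t, xv, beta), 3) for xv in (-1.0, 0.0, 1.0)])
```

With a positive coefficient, each row should decrease from left to right. The spacing between the curves over time is set by the estimated baseline hazard, not by the coefficient alone.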

EdM
  • This is some nice intuition. Is there also an algebraic relationship that would allow deriving the pdf of time to event from the hazard? – Richard Hardy Oct 23 '22 at 18:15
  • @RichardHardy the survival function is the complement of the cumulative distribution function (CDF) of event times, and the pdf is the first derivative of the CDF with respect to time. In a Cox model the baseline cumulative hazard is a step function in time, however, so no differentiable algebraic relationship holds. Also, given your interest in discrete-time models, note that the survival/cumulative-hazard association in this answer only holds for continuous time. See Section 3.3 of Tutz and Schmid. – EdM Oct 23 '22 at 18:30
  • Yes, I have noticed this was a continuous time model, hence my question. (In discrete time the algebraic relationship is conveniently simple.) But if there is no differentiable algebraic relationship, do we still have the equivalence mentioned in the title of the question? And if we do, that means we have a non-differentiable algebraic relationship, does it not? – Richard Hardy Oct 23 '22 at 18:56
  • @RichardHardy I think of a Cox model as a type of discrete-time approximation to a continuous underlying process, with (ideally) no tied event times. That's unlike a fully parametric model in continuous time. The estimate of the cumulative baseline hazard from a Cox model is an approximation to an assumed underlying continuous function. That's why Cox baseline hazard modeling focuses on cumulative rather than instantaneous hazard. If one appreciates the approximate nature of the resulting estimated survival function, then I don't have a problem with the title of the question. – EdM Oct 23 '22 at 19:46
  • Thanks @EdM. This is nicely put and seems to agree with the point I was making. That is, the dependent variable in the Cox model isn't really the time to event, as is often stated. I think such a statement is misleading. – user167591 Oct 30 '22 at 09:49
  • Your expression is for survival as function of time but not time to event. – user167591 Oct 30 '22 at 19:07
  • Also, the concept of deriving extra things after the Cox model fit is a bit alien to me. I'm used to fitting a regression and either interpreting the estimated parameters themselves or using them to make predictions. – user167591 Oct 30 '22 at 19:11
  • @user167591 the survival function represents the distribution of events over time. You need to make a choice of what you mean by "time to event" (for example, median survival or mean survival) and then derive that from the survival function. The Cox model solved the problem of needing to specify a modeled form for the baseline hazard. Provided proportional hazards hold, you don't need one at all. The downside is what you recognize: you only estimate the baseline hazard after you've fit the regression coefficients. It might seem alien, but that's how it's done. – EdM Oct 30 '22 at 19:49
  • @user167591 I suppose that the most precise way to describe the outcome variable in a Cox proportional-hazards model is the order of events in time. That also helps to highlight the connection to ordinal logistic regression with a proportional-odds assumption. See this page, for example. – EdM Oct 31 '22 at 12:34
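The last step described in these comments is deriving a "time to event" summary from an estimated survival curve. The functions below are a sketch of that step, applied to a hypothetical step survival curve. The first returns the median survival time, using the common convention of the first time at which S(t) <= 0.5. The second returns the restricted mean survival time up to a chosen horizon tau, which is the area under S(t) from 0 to tau. The restricted mean is used here instead of the plain mean because the plain mean is undefined when the estimated curve never reaches zero.

```python
def median_survival(steps):
    """First time at which the step survival curve drops to 0.5 or below.
    `steps` holds (time, S just after that time) pairs in increasing time order.
    Returns None if the estimated survival never reaches 0.5."""
    for t, s in steps:
        if s <= 0.5:
            return t
    return None

def restricted_mean_survival(steps, tau):
    """Area under the step survival curve from 0 to tau (S = 1 before the first step)."""
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, s in steps:
        if t >= tau:
            break
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    return area + prev_s * (tau - prev_t)

# Hypothetical estimated survival curve for one covariate pattern
curve = [(2.0, 0.85), (3.5, 0.70), (7.0, 0.45), (12.0, 0.20)]
print(median_survival(curve))  # → 7.0
print(restricted_mean_survival(curve, 10.0))
```

Both summaries depend on the whole estimated curve, and therefore on the baseline hazard. This is why they come after the coefficients have been fit.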