8

Using standard survival models (e.g. Joint Survival Models), I could calculate the hazard and survival functions for individual cohorts at different time points in the future. Thus, I could make the naive argument that a standard Survival Model could show me when the average individual in a given cohort is expected to "pass a certain threshold for the first time".

Recently, I found out about First Passage Regression Models: https://www.jstatsoft.org/v66/i08/. As I understand it, these are based on Brownian motion and explicitly model the first passage time.

To play devil's advocate - if I can do this with a Joint Survival Model, why do I need a First Passage Regression Model? What added advantages do I gain from a First Passage Regression Model instead of the standard Survival Model?

My guess is that with First Passage Regression Models (based on Brownian motion), we end up with a full probability distribution of the first passage time instead of a point estimate, thus allowing for a richer analysis.

Is this interpretation correct?

Richard Hardy
  • 67,272
  • You might be interested in the general idea of "censoring" and models for censored data. https://stats.stackexchange.com/questions/tagged/censoring. Survival modeling is one specific variety of models for censored data, because "time until death" is a censored variable. – shadowtalker Feb 21 '24 at 15:37
  • You are linking to very specific models and it is not directly clear what the package that you are linking to is doing differently (E.g. is it modelling the survival at some specific point or more generally the entire survival curve?). Could you add a concise description how they differ (possibly they might actually turn out the same). – Sextus Empiricus Feb 21 '24 at 16:26
  • Possibly the package relates to cases where the data has only first passage times? And with a Brownian motion this would, for instance, be modeled with an inverse Gaussian distribution. If you have the entire data about the evolution, and not just the first passage times, then sure you should be able to fit a model more precisely. – Sextus Empiricus Feb 21 '24 at 16:34
  • @Sextus: I had posted this as an original question, but it was probably too long and no one read it (https://stats.stackexchange.com/questions/639491/modelling-time-to-events). I am just interested in knowing what advantages/strengths first passage time approaches offer compared to traditional models like Cox-PH. I have tried to read about this for a few days, and it seems to me that classic survival models like Cox-PH CANNOT directly model the first passage time for a medical cohort (i.e. Cox-PH can only provide a point estimate), whereas first passage time models CAN do this. Is that correct? – Uk rain troll Feb 21 '24 at 16:36
  • That's an interesting point you bring up: a situation where we ONLY have the first passage time. In this case, would using a Cox PH model be inappropriate? – Uk rain troll Feb 21 '24 at 16:38
  • What if we DID indeed have the entire data ... even in this case, would a Cox-PH model still not be able to model the "time to first passage"? It seems to me that even with the entire data, Cox-PH can only give us a point estimate of when a cohort will first experience a certain behaviour (e.g. not just the event, but "any event", e.g. passing some arbitrary threshold for the first time) .... whereas the First Passage Regression model would provide us with a distribution of these times? Is my interpretation correct? – Uk rain troll Feb 21 '24 at 16:40
  • I see I hadn't read your post carefully enough. Cox PH is also fitted with data that contains only the single passage times. – Sextus Empiricus Feb 21 '24 at 16:41
  • @ sextus: do you mean my current question or my previous question (https://stats.stackexchange.com/questions/639491/modelling-time-to-events) – Uk rain troll Feb 21 '24 at 17:19
  • @ sextus: you had a wonderful answer which is no longer visible. I was just going to ask - is it possible to include First Passage Time Regression in your answer? I really like seeing basic examples which show the advantages/disadvantages of a specific modelling approach. E.g. simulate correlated data, fit a model with no correlation structure (1st model) vs a model with correlation structure (2nd model) and show the 2nd model performs better in terms of closer estimates, smaller confidence intervals, etc. – Uk rain troll Feb 21 '24 at 18:38
  • @ sextus: Is it possible to create a simulation which shows a situation where First Passage Time Regression models are clearly more suitable compared to classic CoxPH? – Uk rain troll Feb 21 '24 at 18:39
  • 1
    I undeleted my answer now. It had the axes in the image switched, and the alternative to the Cox model was not as powerful as it could be (I used a gamma instead of an exponential distribution, which wasn't as powerful, probably due to the free dispersion parameter). To be honest, I must say that I haven't read the article about the threg package in detail. The principle explained in my answer is just what popped up in my head as a potential improvement to a Cox PH model. – Sextus Empiricus Feb 21 '24 at 21:02
  • thank you so much ... I tried to work through the math myself and logic out how the concept of first passage can be applied in survival analysis. Did you have an opinion on my original question? https://stats.stackexchange.com/questions/639491/modelling-time-to-events – Uk rain troll Feb 21 '24 at 21:05

4 Answers

8

Threshold (first passage) regression is not an alternative to survival analysis, but rather a distributional choice within survival analysis. To quote the abstract of the linked paper:

The threshold regression methodology is well suited to applications involving survival and time-to-event data, and serves as an important alternative to the Cox proportional hazards model.

The most popular survival regression model is the Cox-PH model, which combines a specific link function (proportional hazards) with a non-parametric baseline distribution. The non-parametric baseline buys flexibility, but the proportional hazards link may not always be appropriate.

Anecdotally, I find it very common that a proportional odds link is a better fit: the proportional hazards assumption can have an outsized impact on the tails. Likewise, the accelerated failure time model with a Weibull baseline distribution is a common alternative to the Cox-PH model.
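To make the contrast concrete, here is a minimal sketch (simulated data with made-up parameter values, using the survival package) that fits both a semi-parametric Cox-PH model and a parametric Weibull AFT model to the same right-censored data:

    library(survival)
    set.seed(42)

    ## hypothetical data: Weibull event times whose scale depends on a binary covariate
    n      <- 200
    x      <- rbinom(n, 1, 0.5)
    event  <- rweibull(n, shape = 1.5, scale = exp(1 + 0.5 * x))  # true AFT effect of x
    cens   <- rexp(n, rate = 0.05)                                # independent censoring
    time   <- pmin(event, cens)
    status <- as.numeric(event <= cens)

    ## semi-parametric: proportional hazards link, non-parametric baseline
    fit_cox <- coxph(Surv(time, status) ~ x)

    ## parametric: accelerated failure time with Weibull baseline
    fit_aft <- survreg(Surv(time, status) ~ x, dist = "weibull")

    summary(fit_cox)  # coefficient is a log hazard ratio
    summary(fit_aft)  # coefficient is a log time ratio (acceleration factor)

For Weibull data the two parameterizations describe the same family (the Weibull is the one distribution that is both PH and AFT), but the coefficients answer different questions: a multiplicative effect on the hazard versus a multiplicative effect on the survival time itself.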

Cliff AB
  • 20,980
  • Cliff AB: have you heard of this first passage regression model? in your opinion what are the advantages of this kind of regression model (first passage)? It still seems to me that classic models like Cox-PH can NOT directly model the first passage time ... whereas first passage regression models CAN do this? is this correct? – Uk rain troll Feb 21 '24 at 16:33
  • note: this was my original question : https://stats.stackexchange.com/questions/639491/modelling-time-to-events – Uk rain troll Feb 21 '24 at 16:34
  • I'm not quite sure what you mean by Cox-PH cannot directly model first passage time. From my understanding, taking the distribution of first passage time of various distributions allows us to make a class of survival models/link functions, and in certain cases actually reconstructs the PH link, but not necessarily. So the Cox PH model may not be able to model the same link, but both are used to model survival analysis with covariates. – Cliff AB Feb 21 '24 at 16:41
  • ...but maybe you're interested in a non-standard use of these models. For example, if you knew that the change in some outcome was distributed as Y ~ N(m(t), s(t)) (i.e. a diffusion process), you could then compute the distribution of the first passage time directly from this? This would be very different than survival analysis, since you could actually estimate time to event (i.e. first passage) without ever observing an event, for example. – Cliff AB Feb 21 '24 at 16:43
  • thanks Cliff AB. I am scratching my head trying to understand all of this. I think I understand the 3 standard survival models well enough: Kaplan-Meier (Non Parametric), AFT (Parametric) and CoxPH (Semi-Parametric). I understand that AFT assumes a survival time distribution but that comes with risks ... whereas CoxPH does not require a distribution assumption, but can only model the hazard relative to an unobservable baseline hazard (partial likelihood). As a result, regression coefficients in CoxPH are hazards ratios and are relative. – Uk rain troll Feb 21 '24 at 17:41
  • What I am confused about is how do First Passage Regression Models offer any advantages over these 3 classic survival models? Fundamentally, do First Passage Regression models answer the exact same questions as an AFT model? – Uk rain troll Feb 21 '24 at 17:41
  • If you have time, could you please add some mathematical equations to your answer and show mathematically how First Passage Models answer different questions compared to the 3 classic survival models, and what their advantages are? I tried to perform a similar mathematical analysis over here https://stats.stackexchange.com/questions/639491/modelling-time-to-events .... but I think I was unsuccessful in this regard. – Uk rain troll Feb 21 '24 at 17:42
3

This might be naive, but: Accelerated Failure Time models analyse the time to event. By encoding as 0 the individuals of the cohort who did not pass the threshold (censored), and as 1 the individuals who passed it at least once (event), you should be able to model the mean time to first passage of the threshold in each cohort?
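A minimal sketch of this idea (made-up data, using survreg from the survival package): the 0/1 indicator plays the role of the event/censoring status, and the fitted AFT model then gives a per-cohort estimate of the time to first passage.

    library(survival)
    set.seed(1)

    ## hypothetical data: 'time' is follow-up time, 'passed' is 1 if the threshold
    ## was crossed at least once (event) and 0 if it was never crossed (censored)
    d <- data.frame(
      cohort = factor(rep(c("A", "B"), each = 100)),
      time   = rexp(200, rate = rep(c(0.10, 0.05), each = 100)),
      passed = rbinom(200, 1, 0.8)
    )

    ## Weibull AFT model for time to first passage, censoring handled via 'passed'
    fit <- survreg(Surv(time, passed) ~ cohort, data = d, dist = "weibull")
    summary(fit)

    ## model-based median time to first passage in each cohort
    predict(fit, newdata = data.frame(cohort = factor(c("A", "B"))),
            type = "quantile", p = 0.5)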

CaroZ
  • 755
3

Cox proportional hazards models the probability/odds that an event is of type A rather than of type B, given that an event (A or B) happened. By doing so it avoids the problem of figuring out the absolute probability of any event at all, and it only cares about the relative hazard.

So, that other package ignores this and models the events in some way by fitting a distribution for the passage times?

  • The advantage of Cox-PH is that you do not need to model the total hazard as a function of time; you only look at the relative hazards.
  • The disadvantage of Cox-PH is that you assume a specific model for the relative hazards (that they are independent of time) and that they contain all the relevant information about the distributions.

If the absolute hazards can be modeled, then I imagine that a model that does not ignore them might perform better.


Below I repeatedly simulate (10^4 times) a small data set in which there is an effect (so we expect a distribution of p-values that deviates from a uniform distribution; the stronger the deviation, the higher the power). The Cox model returns low p-values less often than the exponential model (which models the passage times directly), and so has less power.

The effect is very subtle and the difference is not large. When I instead use a glm with a more general gamma distribution (the commented-out lines in the code below) rather than the exponential model, the Cox model performs better and has larger power.

[Figure: power comparison of the exponential and Cox models (cumulative distribution of p-values)]

    library(survival)
    set.seed(1)
    n = 20
    m = 10^4

    pcox = rep(NA, m)
    pexp = rep(NA, m)

    ### a function to
    ### - fit a null model
    ### - fit an alternative model
    ### and compute a p-value based on the likelihood ratio,
    ### assuming a chi-squared distribution for this value
    pval_exp = function(time, x) {
      mu_0 = mean(time)
      mu_a = mean(time[x == 0])
      mu_b = mean(time[x == 1])
      lik_0 = sum(dexp(time, 1/mu_0, log = 1))
      lik_1 = sum(dexp(time[x == 0], 1/mu_a, log = 1)) +
              sum(dexp(time[x == 1], 1/mu_b, log = 1))
      D = 2*(lik_1 - lik_0)
      return(1 - pchisq(D, 1))
    }

    ### repeatedly simulate data with a non-zero effect
    ### and compute p-values according to two models,
    ### one of them Cox proportional hazards
    for (i in 1:m) {
      x = c(rep(1, n/2), rep(0, n/2))
      time = rexp(n, 1 + x*0.3)

      ### the glm model below doesn't have great power
      ### because it has a more flexible dispersion
      # mod = glm(time ~ x, family = Gamma(link = "identity"))
      # pexp[i] = coef(summary(mod))[,4][2]

      ### the manual fitting with the function pval_exp works better
      pexp[i] = pval_exp(time, x)

      mod2 = coxph(Surv(time) ~ x)
      pcox[i] = coef(summary(mod2))[5]
    }

    pexp = pexp[order(pexp)]
    pcox = pcox[order(pcox)]

    plot(pexp, c(1:m)/m, type = "l",
         ylab = "cumulative distribution of p-values",
         xlab = "p value", log = "xy")
    lines(pcox, c(1:m)/m, col = 2)
    lines(10^c(-10, 1), 10^c(-10, 1), lty = 2, col = 1)

    legend(0.005, 0.9, c("exponential model", "cox model"),
           lty = 1, col = c(1, 2))

  • 1
    thanks sextus ... I am scratching my head trying to understand all of this. I think I understand the 3 standard survival models well enough: Kaplan-Meier (Non Parametric), AFT (Parametric) and CoxPH (Semi-Parametric). I understand that AFT assumes a survival time distribution but that comes with risks ... whereas CoxPH does not require a distribution assumption, but can only model the hazard relative to an unobservable baseline hazard (partial likelihood). As a result, regression coefficients in CoxPH are hazards ratios and are relative. – Uk rain troll Feb 21 '24 at 17:37
  • 1
    What I am confused about is how do First Passage Regression Models offer any advantages over these 3 classic survival models? Fundamentally, do First Passage Regression models answer the exact same questions as an AFT model? – Uk rain troll Feb 21 '24 at 17:38
  • If you have time, could you please add some mathematical equations to your answer and show mathematically how First Passage Models answer different questions compared to the 3 classic survival models, and what their advantages are? I tried to perform a similar mathematical analysis over here https://stats.stackexchange.com/questions/639491/modelling-time-to-events .... but I think I was unsuccessful in this regard. – Uk rain troll Feb 21 '24 at 17:40
  • @firstpassage I just added a demonstration for a simple exponential waiting time model. I guess that it will be much the same for more complicated models, and it is just more mathematics to compute everything. The advantage of cox is that it doesn't require the total hazard in time to follow a specific model, and you only look at relative hazard. This allows an application to models where the total hazard (as function of time) might follow complicated patterns. – Sextus Empiricus Feb 21 '24 at 17:47
  • 1
    Wow, this is a great simulation! I will try to unpack what's going on. First you simulate a random covariate and survival times - but the survival times are a function of the covariate (i.e. dependent). Next you fit a Cox-PH and a GLM model to this data and record the p-value for whether the regression coefficient is zero or non-zero: since you deliberately made the times depend on the covariate, ideally it should be non-zero. You repeat this simulation many times and plot the results. Statistical theory tells us that p-values have a uniform distribution, and the uniform CDF is a diagonal line. – Uk rain troll Feb 21 '24 at 18:08
  • (https://statproofbook.github.io/P/pval-h0.html , https://stats.stackexchange.com/questions/10613/why-are-p-values-uniformly-distributed-under-the-null-hypothesis) – Uk rain troll Feb 21 '24 at 18:08
  • 1
    we can see that the p values for the coxph model (black line) are not "hugging" the dotted diagonal line corresponding to theoretical uniform CDF ..... whereas the GLM model is doing this much better. Thus, from the simulation, we conclude that the Cox PH model has less statistical power compared to the GLM. Is my interpretation of your simulation correct? – Uk rain troll Feb 21 '24 at 18:10
  • Huhh I think it would be great to pack all this into the answer. – Ggjj11 Feb 21 '24 at 18:14
  • @ Ggjj11 : thank you! what do you mean by pack all this into the answer? – Uk rain troll Feb 21 '24 at 18:28
  • @firstpassage Your comments made me rethink the graph, and I realize that I have switched the horizontal and vertical labels of the graph. I will have to re-investigate this simulation. Possibly the interpretation is completely reversed. – Sextus Empiricus Feb 21 '24 at 18:33
  • @firstpassage I have corrected the answer. Your unpacking was well done. A critical point is the part "Statistical theory tells us that p-values have a uniform distribution". That is only true when the null hypothesis is true, but the data were generated with an effect being present, so a deviation from the uniform distribution is expected (and also desired). What we want to see is that small p-values have a large probability of occurring. The exponential model does this better than the Cox model (before the edit this was different, because I used a gamma model instead of an exponential model). – Sextus Empiricus Feb 21 '24 at 20:45
  • thank you for your kind words! I wonder what makes the red curve (corresponding to the cox-ph) quickly catch up and seemingly merge with the black line? is that to be expected for the tail end behavior of the (empirical) cumulative distribution? – Uk rain troll Feb 21 '24 at 20:58
  • I hate to bother you so much with my questions ... I still can't seem to figure out what kinds of modelling problems are first passage regression models trying to solve. I had left a comment above: – Uk rain troll Feb 21 '24 at 21:03
  • Is it possible to include First Passage Time Regression in your answer? I really like seeing basic examples which show the advantages/disadvantages of a specific modelling approach. E.g. simulate correlated data, fit a model with no correlation structure (1st model) vs a model with correlation structure (2nd model) and show the 2nd model performs better in terms of closer estimates, smaller confidence intervals, etc Is it possible to create a simulation which shows a situation where First Passage Time Regression models are clearly more suitable compared to classic CoxPH? – Uk rain troll Feb 21 '24 at 21:03
  • I believe that I still can improve the example by using a case where the proportional hazards are actually not constant in time. – Sextus Empiricus Feb 21 '24 at 21:10
3

As Cliff AB's answer noted, we need to distinguish between survival (a.k.a. time-to-event) data and specific models for that data (like Cox-PH, accelerated failure time, the first passage models you mentioned, and many others).

First consider a simple analogy: we have unbounded continuous measurements and we are considering whether to model them with a normal, Student-t or Gumbel distribution. The difference is that each makes different assumptions about the data.

Similarly, different models for time-to-event data make different assumptions about the underlying process. They also provide different interpretations of the data and of the analysis results. As with any other model, they will perform well when their assumptions are met and poorly when they are grossly violated. Additionally, simpler/constrained models will tend to provide more precise estimates than complex/flexible models from the same amount of data.

To be specific, the first passage time model you mention is likely to be appealing if a convincing case can be made that there indeed is an underlying Wiener process. It will be particularly useful if we know the predictors act on the drift of the model, because this will provide a nice interpretation of model coefficients.

So, e.g., modelling the time to a stock hitting a value? Plausible. The time it takes a cell to complete a cell cycle? Maybe. Modelling a neurodegenerative disease? Unlikely, as we know the underlying process is monotone and thus not a Wiener process. More generally, the independent-increments assumption is going to be hard to justify for many real-world processes, as many have some form of momentum/memory/inner state.
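To make the Wiener-process case concrete, here is a small simulation sketch (with arbitrary drift, volatility and threshold values): for a Brownian motion with positive drift, the time of first passage of a fixed threshold follows an inverse Gaussian distribution, and threshold regression essentially puts covariates on the drift (and possibly the initial distance) of this process.

    set.seed(123)

    ## hypothetical parameters: drift mu, volatility sigma, threshold a
    mu <- 0.5; sigma <- 1; a <- 10

    ## simulate first passage times of a Wiener process by Euler discretization
    sim_fpt <- function(mu, sigma, a, dt = 0.01, t_max = 200) {
      t <- 0; w <- 0
      while (w < a && t < t_max) {
        w <- w + mu * dt + sigma * sqrt(dt) * rnorm(1)
        t <- t + dt
      }
      t
    }
    fpt <- replicate(2000, sim_fpt(mu, sigma, a))

    ## theory: first passage time ~ inverse Gaussian with mean a/mu and shape a^2/sigma^2
    dinvgauss_manual <- function(t, mean, shape) {
      sqrt(shape / (2 * pi * t^3)) * exp(-shape * (t - mean)^2 / (2 * mean^2 * t))
    }
    hist(fpt, breaks = 50, freq = FALSE, main = "Simulated first passage times", xlab = "t")
    curve(dinvgauss_manual(x, mean = a / mu, shape = a^2 / sigma^2),
          add = TRUE, col = 2, lwd = 2)

If you are willing to assume such a process, regressing covariates on the drift (as threshold regression does) converts their effects directly into statements about the full first-passage-time distribution.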

The biggest possible advantage is that this model is quite restricted so if it is roughly correct, it will make good use of your data. The biggest disadvantage is that it is quite restricted and will thus be misleading/overconfident when a more flexible model is needed.

By contrast, the Cox-PH model assumes that there is a fixed baseline hazard and that predictors act additively on the log-hazard. I don't think this has a very good direct physical interpretation, but the model is quite flexible, mathematically appealing and computationally tractable. It also naturally allows for predictors that change over time, multiple event types and other extensions that are hard to square with both first passage and accelerated failure time models.

It is true that the classical version of Cox-PH doesn't let you make direct predictions for time to event, but that can be ameliorated. One can restrict the class of baseline hazard functions to a semi-parametric form (e.g. a penalized spline for the log of baseline hazard) and then full predictions are possible. Indeed this is what Bayesian implementations of Cox-PH do.
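As a small sketch of that last point (here with the standard Breslow-type baseline estimate from the survival package rather than a spline, and the built-in lung data): once the baseline is estimated, survfit() turns a fitted Cox model into a full predicted survival curve for a new covariate profile, from which an event-time prediction such as the median can be read off.

    library(survival)

    ## Cox model on the built-in lung data set
    fit <- coxph(Surv(time, status) ~ age + sex, data = lung)

    ## predicted survival curve for a hypothetical new individual
    new <- data.frame(age = 60, sex = 2)
    sf  <- survfit(fit, newdata = new)

    ## read the predicted median survival time off the estimated curve
    sf$time[min(which(sf$surv <= 0.5))]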

One reason Cox-PH is often used is that the loss of power/precision due to its flexibility tends to be small for reasonably big datasets while the bias from an overly restricted model never goes away.

To some extent, you might be able to choose a good model based on data (e.g. via tests rejecting specific distributional forms, residual plots, comparing performance in cross-validation or posterior predictive check in the Bayesian context) but that will always be limited, so understanding how your data was collected and what it represents is crucial.

COOLSerdash
  • 30,198