
I am trying to see if there is a class of statistical regression models which can specifically be used to model the "time to event" (e.g. time at which a certain threshold is expected to be passed for the first time). So far I have seen models that can be used for modelling how different phenomena change with time (e.g. Hazard, Survival), but nothing which directly models the "first passage time to event".

Here is a review of the research I have done:

1) Joint Survival Models

I decided to write a joint survival model based on AFT (Accelerated Failure Time) rather than Cox PH, since it is easier to derive the survival and hazard functions under AFT than under Cox PH. In a joint model, the survival and longitudinal models are linked through shared random effects and/or correlated error terms. This allows the model to account for the correlation between the longitudinal and survival processes.

The common terms in both components are the covariates $X$ and the coefficients $\beta$. These represent the variables of interest and their effects on the response, respectively. In the survival model, they influence the survival time, while in the longitudinal model, they influence the evolution of the response over time.

Survival Model (AFT Model): The AFT model describes the survival time $T$ as a function of covariates $X$ and a random error term $\epsilon$. It can be written as:

$$\log T = -X'\beta + \epsilon$$

where $\beta$ is a vector of coefficients to be estimated, and $\epsilon$ follows some specified distribution, such as a Weibull or log-normal distribution.

Let the survival time $T$ have pdf $f(t)$ and CDF $F(t)$. Writing $Z = -X'\beta$, we have $\log T = Z + \epsilon$, which implies: $T = e^{Z + \epsilon} = e^Z e^\epsilon$

If we denote $Y = e^\epsilon$, then $T = e^Z Y$, where the distribution of $Y$ (with pdf $f_Y$ and CDF $F_Y$) is determined by the assumed distribution of $\epsilon$.

Given the transformation $T = e^Z Y$, we can derive the survival and hazard functions for $T$ as follows:

$$S_T(t) = P(T > t) = P(e^Z Y > t) = P(Y > t / e^Z) = 1 - F_Y(t / e^Z)$$

$$h_T(t) = \frac{f_T(t)}{S_T(t)} = \frac{e^{-Z} f_Y(t/e^Z)}{1 - F_Y(t/e^Z)}$$

(the factor $e^{-Z}$ is the Jacobian from the change of variables, since $f_T(t) = e^{-Z} f_Y(t/e^Z)$)

where $f_Y$ and $F_Y$ are the pdf and CDF of $Y$.
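As a quick numerical check of this transformation (a sketch with illustrative values: a normal error so that $Y = e^\epsilon$ is log-normal, and an arbitrary value $Z = 0.3$ for one covariate profile):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
sigma_eps = 0.5              # SD of the error term epsilon (illustrative)
Z = 0.3                      # Z = -X'beta for one fixed covariate profile (illustrative)

# Simulate T = e^Z * e^eps and compare the empirical survival function
# with the derived form S_T(t) = 1 - F_Y(t / e^Z)
eps = rng.normal(0.0, sigma_eps, size=200_000)
T = np.exp(Z) * np.exp(eps)

def S_T(t):
    # F_Y(y) = Phi(log(y) / sigma_eps) since Y = e^eps is log-normal
    return 1.0 - norm.cdf(np.log(t / np.exp(Z)) / sigma_eps)

t_grid = np.array([0.5, 1.0, 2.0])
emp = np.array([(T > t).mean() for t in t_grid])   # empirical P(T > t)
theo = S_T(t_grid)                                 # derived survival function
```

The empirical and derived survival probabilities agree to within Monte Carlo error, confirming the change-of-variables step.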

Longitudinal Model: The longitudinal model describes the evolution of covariates over time. A common choice is the linear mixed effects model, which can be written as:

$$Y(t) = X(t)'\beta + Z(t)'\gamma + \epsilon(t)$$

where $Y(t)$ is the longitudinal response at time $t$, $X(t)$ is a vector of fixed-effects covariates, $Z(t)$ is a vector of random-effects covariates, $\beta$ and $\gamma$ are the corresponding vectors of fixed-effects and (subject-specific) random-effects coefficients, and $\epsilon(t)$ is a random error term.
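A minimal sketch of fitting such a random-intercept model on simulated data (using statsmodels; all data values and effect sizes below are made up for illustration):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n_groups, n_per = 40, 10
g = np.repeat(np.arange(n_groups), n_per)       # subject identifiers
x = rng.uniform(0, 1, size=n_groups * n_per)    # fixed-effect covariate
b = rng.normal(0, 0.5, size=n_groups)           # subject-level random intercepts
y = 1.0 + 2.0 * x + b[g] + rng.normal(0, 0.3, size=g.size)
df = pd.DataFrame({"y": y, "x": x, "g": g})

# Linear mixed model: fixed effect of x, random intercept per subject
res = smf.mixedlm("y ~ x", df, groups=df["g"]).fit()
fe = res.fe_params   # estimated fixed effects (Intercept, x)
```

The fitted fixed effects recover the simulated values (intercept near 1, slope near 2), while the random-intercept variance absorbs the between-subject heterogeneity.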

2) ARIMA Model with Exogenous Terms (i.e. ARIMAX):

Given a basic ARIMA model describing a stochastic process $y_t$:

$$\Delta^d y_t = c + \phi_1 \Delta^d y_{t-1} + \ldots + \phi_p \Delta^d y_{t-p} + \theta_1 \varepsilon_{t-1} + \ldots + \theta_q \varepsilon_{t-q} + \varepsilon_t$$

We can modify this to turn it into a regression model:

$$\Delta^d y_t = c + \beta X_t + \phi_1 \Delta^d y_{t-1} + \ldots + \phi_p \Delta^d y_{t-p} + \theta_1 \varepsilon_{t-1} + \ldots + \theta_q \varepsilon_{t-q} + \varepsilon_t$$

where:

  • $X_t$ is the exogenous variable.
  • $\beta$ is the coefficient of the exogenous variable.

Problem: It seems to me that neither of these models (i.e. joint survival models, ARIMAX) explicitly models the "time to event". These models describe "what will happen at a certain time", but they do not directly model the distribution of the time at which the event will happen. I think these models can indirectly be used to model "time to event" via simulation and prediction, but this will not be an exact solution.

I tried to find some statistical models that address this specific problem and came across a concept known as "first passage time regression", e.g. https://www.jstatsoft.org/article/download/v066i08/879, https://www.wiley.com/en-br/First+Hitting+Time+Regression+Models:+Lifetime+Data+Analysis+Based+on+Underlying+Stochastic+Processes-p-9781848218895

If I understand correctly, we start by considering a stochastic process $X(t)$, where $t \geq 0$. The first hitting time $\tau_b$ of a level $b$ is defined as:

$$\tau_b = \inf\{t \geq 0: X(t) = b\}$$

This represents the first time $t$ that the process $X(t)$ reaches the level $b$.

Suppose now we decide to model this process using a Brownian motion with constant drift, $X_t = \mu t + \sigma W_t$, where $W_t$ is a standard Brownian motion, $\mu$ is the drift, and $\sigma$ is the diffusion coefficient (the drift needs to be constant in time for the inverse Gaussian result below to hold). We can model the drift term as a function of baseline covariates $Z$:

$$\mu = Z'\beta$$

where $\beta$ is a vector of coefficients to be estimated. For $\mu > 0$, the first passage time $T_a$ of $X_t$ through a fixed level $a > 0$ is then distributed as an inverse Gaussian (this is a classical property of Brownian motion with drift, although to be honest I don't fully understand why):

$$T_a \sim IG\left(\frac{a}{Z'\beta}, \frac{a^2}{\sigma^2}\right)$$
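This distributional claim can be sanity-checked by simulation (a sketch with an arbitrary constant drift $\mu = 0.5$, $\sigma = 1$, barrier $a = 1$; the time discretization introduces a small upward bias in the hitting times):

```python
import numpy as np

rng = np.random.default_rng(3)
mu, sigma, a = 0.5, 1.0, 1.0                 # drift, diffusion, barrier (illustrative)
n_paths, dt, n_steps = 1000, 0.01, 6000

# Simulate X_t = mu*t + sigma*W_t and record each path's first crossing of a
inc = mu * dt + sigma * np.sqrt(dt) * rng.standard_normal((n_paths, n_steps))
paths = np.cumsum(inc, axis=1)
hit = paths >= a
crossed = hit.any(axis=1)
first_idx = hit.argmax(axis=1)               # index of first crossing per path
fpt = (first_idx[crossed] + 1) * dt          # simulated first passage times

# Theory: T_a ~ IG(mean = a/mu, shape = a^2/sigma^2), so E[T_a] = a/mu = 2
sim_mean = fpt.mean()
```

The simulated mean first passage time sits near the inverse Gaussian mean $a/\mu = 2$, up to Monte Carlo and discretization error.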

The likelihood contribution of an observed passage time $T_a$ is the corresponding inverse Gaussian density:

$$L(\beta \mid Z, T_a) = \sqrt{\frac{a^2}{2\pi \sigma^2 T_a^3}} \exp\left(-\frac{(Z'\beta\, T_a - a)^2}{2\sigma^2 T_a}\right)$$

This likelihood function can be maximized to estimate the parameter $\beta$, giving us the maximum likelihood estimate (MLE) of $\beta$.
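A minimal sketch of this MLE on simulated data (with $a = \sigma = 1$ fixed for identifiability and made-up covariates and coefficients; after dropping terms that do not depend on $\beta$, the negative log-likelihood reduces to a weighted least-squares criterion):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import invgauss

rng = np.random.default_rng(4)
n = 3000
a, sigma = 1.0, 1.0
Z = np.column_stack([np.ones(n), rng.uniform(0, 1, n)])   # intercept + one covariate
beta_true = np.array([1.0, 0.5])
mu = Z @ beta_true                                        # per-subject drift (kept > 0)

# Draw T_i ~ IG(mean = a/mu_i, shape = a^2/sigma^2); scipy's invgauss(m, scale=lam)
# has mean m*lam and shape parameter lam
lam = a**2 / sigma**2
T = invgauss.rvs((a / mu) / lam, scale=lam, random_state=rng)

def neg_loglik(beta):
    m = Z @ beta
    # terms of the log-density not involving beta are dropped
    return np.sum((m * T - a) ** 2 / (2.0 * sigma**2 * T))

beta_hat = minimize(neg_loglik, x0=np.array([0.5, 0.5])).x
```

Maximizing this likelihood recovers the simulated coefficients, illustrating that $\beta$ is estimable directly from observed passage times.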

Thus, it appears that we can now exactly model the time at which a certain event might happen, by treating the underlying probability distribution of this event as a (covariate dependent) Stochastic Process and then analyzing this Stochastic Process using first passage times.

Is my understanding of this correct? Have I correctly described the shortcomings of the Joint Survival and ARIMAX models, and shown why an approach based on first passage times can remedy these shortcomings?

  • Discrete-time survival models (e.g. discrete-time logistic regression to mention but one example) provide an estimate of the entire distribution of survival times (every possible time with its probability, with probabilities summing up to one). See e.g. Tutz & Schmid "Modeling Discrete Time-to-Event Data" (2016). (A pdf can be found online.) Might that be what you want? – Richard Hardy Feb 22 '24 at 15:28
  • The connection between first passage times and the inverse Gaussian distribution is discussed here. You can view it as any Gaussian distribution with a drift being the solution to the diffusion equation; a sum of such Gaussian distributions can be used to fix the boundary condition that the concentration is zero at the absorbing boundary. Based on that picture of the diffusion process you can compute the first passage time. – Sextus Empiricus Feb 22 '24 at 15:37
  • The accelerated failure time model is explicitly modeling the distribution of failure (first passage) times. – Sextus Empiricus Feb 22 '24 at 15:51
  • You mentioned ARIMAX, could you explain more thoroughly how this is used in modeling survival, or link to a source that uses it for modeling survival. – Sextus Empiricus Feb 22 '24 at 20:14
  • @Sextus re ARIMAX: this is a big tangent but hear me out: first passage time can be interpreted as a continuous state space stochastic process. An ARIMA model can be used to model a continuous state space stochastic process. An ARIMA model can be created with (exogenous) variables, i.e. covariates. Thus, perhaps we could transform the data in such a way as to model first passage times or survival times indirectly ... but I think this is a long stretch and likely does not make any sense. – Uk rain troll Feb 22 '24 at 20:47
  • "The connection between first passage times and the inverse Gaussian distribution is discussed here." - this is a big ask, but I have always tried to understand why the first passage time follows an inverse Gaussian distribution. I created some math notes on this topic but got stuck. If I post a question on this, is there any chance you can help me with it? – Uk rain troll Feb 22 '24 at 20:49
  • If "The accelerated failure time model is explicitly modeling the distribution of failure (first passage) times" - then what kind of advantages do first hitting time survival models have? – Uk rain troll Feb 22 '24 at 20:49
  • @firstpassage the relationship with the inverse Gaussian is described in the link. The solution to the Brownian motion with drift can be described as a Gaussian function. The solution when there is also an absorbing boundary can be described by a linear combination of two such functions. Then based on that solution you can compute the passage times (you have to compute how much of the distribution of the Gaussian passes). – Sextus Empiricus Feb 22 '24 at 21:16
  • Here is a discrete analogue example that might be easier to understand https://stats.stackexchange.com/a/623371/164061 Also this dead drunk man question relates to it. – Sextus Empiricus Feb 22 '24 at 21:23
  • You can see an accelerated failure time model as a specific case of a first hitting time model. The accelerated failure time model changes only one thing, which is the scale of the distribution (e.g. the entire distribution of failure times is scaled to make everything $x$ times slower or $x$ times faster). – Sextus Empiricus Feb 22 '24 at 21:25
  • Thank you again... I have been studying the AFT model lately. Is there any difference in writing the likelihood function one way or the other (e.g. relative to error distribution vs time distribution)? E.g I wrote here https://stats.stackexchange.com/questions/639606/best-way-to-write-the-likelihood-of-a-model – Uk rain troll Feb 22 '24 at 22:17
  • @firstpassage AFT models are modeling more explicitly $\log(T)$ instead of $T$, but for the likelihood function this doesn't matter. The distribution function (and related likelihood) transforms, but for a fixed observation $T$ it does this in the same way for all values of $\theta$, and for the likelihood function it is only a difference by a constant of proportionality. – Sextus Empiricus Feb 23 '24 at 06:24

2 Answers

5

Thus it appears that we can now exactly model the time at which a certain event might happen, by treating the underlying probability distribution of this event as a (covariate dependent) Stochastic Process and then analyzing this Stochastic Process using first passage times. (Emphasis added.)

Your proposal doesn't give you any more of an "exact" event time than any other survival model. The "first passage times" for your "Stochastic Process" are themselves randomly distributed. As with any survival model, you end up with a distribution of event/first-passage times, often a very broad distribution. All you have done is to base your model on a particular set of underlying assumptions. See, for example, Figure 2 of a document you linked about "first passage time regression" based on a latent Wiener process. You still get a distribution of survival times with that approach, not "exact" times. For any particular application that set of assumptions might work better than another, but it isn't fundamentally different in what it produces: an estimate of a survival-time distribution, not any "exact" time.

In response to comments:

Although a Cox proportional hazards (PH) model only directly evaluates relative log-hazards as a function of covariates, you can use the results of the model to estimate the cumulative baseline hazard and thus the distribution of survival times for any set of covariate values. See this page, for example. In practice, the Cox model thus can give you something as equivalent to a "first hitting time" as an accelerated failure time (AFT) or other parametric model.

If the current behavior of the survival time density is influenced by previous behaviors of itself...

In the context of a survival model in which an individual can experience at most one event, I think what you're getting at is something that distinguishes PH models from other models. A PH model assumes that only the current values of covariates are associated with the current hazard of an event. The past history doesn't matter for fitting the model, even if covariate values have been changing over time. (That might affect the estimate of the baseline hazard, however.)

Beyond a PH model, a survival function can depend on the entire history of covariate values since time = 0. That's why "joint models" try to estimate both the true covariate history (typically from a restricted set of measurements that potentially include errors) and the association of covariate history with event risk. See, for example, the explanation of the R JM package by Dimitris Rizopoulos in Journal of Statistical Software 35: 1–33 (2010).

If an individual can experience multiple events, then intra-individual correlations might need to be taken into account even in the PH setting. A model of first passage time wouldn't, however, capture all that's of interest in that scenario.

EdM
  • thank you for these insights! I think I understand it now? First Hitting Time models offer the following advantage: If the true density of survival times (conditional on the covariates) is a Brownian Motion with Drift (i.e. a stochastic process), First Hitting Time models might be able to offer an advantage. But both AFT and First Hitting Time Models can both explicitly model "first passage time" - the only thing that they differ in, is the underlying distributional assumptions they make about the distribution of survival times. Is this correct? – Uk rain troll Feb 28 '24 at 15:21
  • btw can a Cox PH regression model the distribution of first hitting times? I think the answer is no, because Cox PH analyzes relative hazards whereas AFT and first hitting time models analyze the actual density? – Uk rain troll Feb 28 '24 at 15:22
  • Just thinking it over ... the AFT model assigns a probability distribution to the survival times : yet most common probability distributions are not stochastic processes (i.e. not correlated to themselves, not iid). If the current behavior of the survival time density is influenced by previous behaviors of itself ... I guess this would make a First Passage Time model hugely useful? – Uk rain troll Feb 28 '24 at 15:24
  • thanks for everything! :) – Uk rain troll Feb 28 '24 at 15:24
  • @firstpassage I added a bit to the answer to address your comments. – EdM Feb 28 '24 at 16:07
  • thank you for the updates... I will accept your answer as the answer and award it the bounty – Uk rain troll Feb 28 '24 at 16:38
  • Can I bug you one last time and ask you about this? First Hitting Time models offer the following advantage: If the true density of survival times (conditional on the covariates) is a Brownian Motion with Drift (i.e. a stochastic process), First Hitting Time models might be able to offer an advantage. But both AFT and First Hitting Time Models can both explicitly model "first passage time" - the only thing that they differ in, is the underlying distributional assumptions they make about the distribution of survival times. Is this correct? – Uk rain troll Feb 28 '24 at 16:38
  • @firstpassage yes, that's pretty much how I would state the differences. That can include differences in how the associations between covariates and the survival-time distributions are modeled. For example, the "first passage time regression" could handle survival curves that cross in time for two groups, while a regular AFT model wouldn't. – EdM Feb 28 '24 at 16:50
  • thank you for this clarification! this is something I had never thought about before (re: cross in time). I thought this "crossing in time" referred to the "proportional hazards assumption". That is, the hazards must always be "proportional" and can never cross each other. I thought in an AFT model, there is no such restriction : the hazard and survival functions can freely cross each other? – Uk rain troll Feb 28 '24 at 19:35
  • @firstpassage in an AFT model, you can think about the entire time axis as stretching or shrinking as a function of covariate values. So the survival curves for a 2-group AFT model can't cross, either. – EdM Feb 28 '24 at 20:00
2

The hazard $h(t)$ at time $t$ is the relative rate of decline of the survival function $S(t)$

$$h(t) = -\frac{S'(t)}{S(t)}$$

For certain hitting time models, the hazards for different parameter values are not in a constant ratio over time. This is, for example, the case with the inverse Gaussian distribution.

Below is an example computation of the hazard and the relative hazard for two different hitting time distributions. The relative hazard is not constant in time.

[Figure: relative hazard over time for two inverse Gaussian hitting-time distributions]

library(statmod)  # provides dinvgauss() and pinvgauss()
t = seq(0.1, 10, 0.1)  # start above 0 to avoid 0/0 in the hazard ratio
# hazard h(t) = f(t) / (1 - F(t)); note: no minus sign
h1 = dinvgauss(t, mean = 1) / (1 - pinvgauss(t, mean = 1))
h2 = dinvgauss(t, mean = 2) / (1 - pinvgauss(t, mean = 2))
plot(t, h1/h2, type = "l",
     main = "relative hazard for two inverse Gaussian distributed times to event")

This dependency on time (when present) makes a Cox PH model less easily applicable.

On the other hand, an advantage of the Cox PH model is that you do not need to model the absolute hazard explicitly; you only need to model the relative hazard. In that case you only need to model the probability that an event is of a certain class given that an event happened, and you do not care when the event happened. The advantage is that this requires fewer assumptions about the distribution of the times to events.


Cox proportional hazards models are well suited when the mechanism relates to the relative hazard. An accelerated failure time model is well suited when the mechanism relates to an accelerated time until failure.

  • An example of a relative hazard is a Poisson model where failure happens when a first random hit occurs and the hits arrive at different rates in different groups.
  • An example of an accelerated failure time model is when failure happens due to an accumulation of little steps, as in a Brownian motion, and all steps occur at a faster rate or with larger step sizes.
  • thank you for your answer Sextus! Unfortunately I could only award the bounty to one answer :( – Uk rain troll Mar 01 '24 at 12:43
  • here was my big breakthrough: – Uk rain troll Mar 01 '24 at 12:44
  • First Hitting Time models offer the following advantage: If the true density of survival times (conditional on the covariates) is a Brownian Motion with Drift (i.e. a stochastic process), First Hitting Time models might be able to offer an advantage. But both AFT and First Hitting Time Models can both explicitly model "first passage time" - the only thing that they differ in, is the underlying distributional assumptions they make about the distribution of survival times. – Uk rain troll Mar 01 '24 at 12:44
  • I hope I understood everything correctly :) – Uk rain troll Mar 01 '24 at 12:44
  • @firstpassage I believe that it is good that you gave the other answer the bounty. I haven't correctly addressed this point, but the key difference between a proportional hazards model and a first passage time model is not in the distributional assumptions of the events. The main difference is that the proportional hazards model ignores the past history and considers only 'the type of event given that there has been an event', while a passage time model will consider 'the probability of an event given that it is of a certain type'. The point that I addressed is more the practical application. – Sextus Empiricus Mar 01 '24 at 13:46
  • thank you so much ... this is such a wonderful community, I am slowly starting to get a better understanding of everything: Proportional Hazards ignores previous history whereas First Hitting Time model can use the past history. This is in line with a stochastic process. I actually posted a new question about this - are there some typical situations in which past history would be important for modelling survival times? I posted a question about this here: https://stats.stackexchange.com/questions/641574/is-the-distribution-of-survival-times-always-iid . – Uk rain troll Mar 01 '24 at 14:16
  • It makes me wonder: in such situations where past history is allowed to influence the future - will the survival function still be monotonically decreasing? – Uk rain troll Mar 01 '24 at 14:17