
I am reading this article and trying to reproduce their simulation study. Here is one scenario among several; if I can figure this one out, the rest should follow.

Simulation set-up

We assume the hazard function of subject $i$ is \begin{equation} h_i(t|Z_i(t)) = h_0(t) \exp(\alpha Z_i(t)), \end{equation} where $h_0(t) = \lambda t^{\lambda-1} \exp(\eta)$ is a Weibull baseline hazard function with $\lambda = 2$, $\eta = -5$, and the association parameter is $\alpha = 0.5$.

Consider the linear model.

$$ Z_i(t) = a + bt + b_{i1} + b_{i2}t, $$

The linear longitudinal trajectory uses $a = 1$, $b = -2$, with random effects $b_i = (b_{i1}, b_{i2}) \sim \mathcal{N}(0, D)$, where $D = \begin{pmatrix} 0.4 & 0.1 \\ 0.1 & 0.2 \end{pmatrix}$. For simplicity, we generate longitudinal data at irregular time points $t = 0$ and $t = j + \epsilon_{ij}$, $j = 1, 2, \ldots, 10$, with $\epsilon_{ij} \sim \mathcal{N}(0, 0.1^{2})$ independent across all $i$ and $j$. We simulate the censoring times from a uniform distribution on $(0, t_{\text{max}})$, with $t_{\text{max}}$ set to result in about 25% censoring.
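For concreteness, here is a minimal R sketch of how the longitudinal design above might be generated; the subject count n and the use of MASS::mvrnorm() are my own choices for illustration, not taken from the article:

library(MASS)  # for mvrnorm(); any multivariate-normal sampler would do

set.seed(1)
n <- 500                                   # assumed number of subjects
D <- matrix(c(0.4, 0.1, 0.1, 0.2), 2, 2)   # covariance of the random effects
b <- mvrnorm(n, mu = c(0, 0), Sigma = D)   # b[i, ] = (b_i1, b_i2)

# measurement times: t = 0 plus t = j + eps_ij, j = 1, ..., 10, eps_ij ~ N(0, 0.1^2)
meas_times <- lapply(seq_len(n), function(i) c(0, 1:10 + rnorm(10, mean = 0, sd = 0.1)))

# error-free trajectory Z_i(t) = a + b*t + b_i1 + b_i2*t with a = 1, b = -2
Z <- function(t, i) 1 - 2 * t + b[i, 1] + b[i, 2] * t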

My questions

  1. Eventually, my goal is to generate a survival (follow-up) time and an event status for each subject. Should I integrate \begin{equation} h_i(t|Z_i(t)) = h_0(t) \exp(\alpha Z_i(t)) \end{equation} from $0$ to $t$ to obtain the cumulative hazard $H(t)$, then obtain $S(t) = \exp(-H(t))$ and its inverse $t = S^{-1}(u)$? I would then generate $U \sim \mathrm{Uniform}(0,1)$ and substitute $U$ for $S(t)$ to simulate $t$, the follow-up time.

  2. Assuming what I said in part (1) is correct, I am trying to do it numerically, but it is obviously not working: I get negative times, and I am also not sure how I would use my $t$ as described in the simulation. Would it be possible to do it analytically and work through the steps to obtain the survival time?
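To see whether an analytic route is feasible, here is my own rough working (so it may contain errors). With $Z_i(s) = a + bs + b_{i1} + b_{i2}s$, write $k_i = \lambda e^{\eta + \alpha(a + b_{i1})}$ and $c_i = \alpha(b + b_{i2})$ (assuming $c_i \neq 0$). Then, for $\lambda = 2$,

$$ H_i(t) = \int_0^t \lambda s^{\lambda - 1} e^{\eta} e^{\alpha Z_i(s)} \, ds = k_i \int_0^t s\, e^{c_i s} \, ds = \frac{k_i}{c_i^2}\left[ e^{c_i t}\,(c_i t - 1) + 1 \right], $$

and setting $H_i(t) = -\log U$ does not give $t$ in elementary closed form, which is why I am asking whether a numerical approach is the way to go.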

ADAM
  • I provided an answer that deals with a statistical aspect of the question, but the coding-specific parts are off-topic on this site. For coding problems per se, Stack Overflow is a better site. This search provides a few hundred related questions on this site that might include further hints. The simsurv package probably can do this work for you. – EdM May 27 '23 at 14:16
  • @EdM Thank you for mentioning this package; I am aware of it, but I would rather understand and see what I am coding. – ADAM May 27 '23 at 16:07

1 Answer


The coding-specific part of this question is off-topic on this site, but there is one principle of survival analysis that you should consider implementing.

Simulating event times typically starts as you suggest, sampling from a uniform distribution over (0,1) and then finding the time corresponding to that survival fraction. The way you have this structured makes sense if you want to sample multiple event times from a distribution. In this scenario, however, you only want to take a single event-time sample from each of the randomly generated hazard functions.

Take advantage of the following relationship between $S(t)$ and the cumulative hazard $H(t)$:

$$H(t)=- \log S(t)$$

After you have your sample of the survival fraction from the uniform distribution, start to integrate your (necessarily non-negative) hazard function $h(t)$ from $t=0$ until you reach the value of $H$ corresponding to that survival fraction. Record the upper limit of integration at which that occurs as the event time. If you instead get to some maximum observation time without reaching that value of $H$, record a right-censored observation at that maximum time.

As an example, sample a survival probability and find the corresponding cumulative hazard:

set.seed(2)
(cumHazTarget <- -log(runif(1)))
## [1] 1.688036

Define your hazard function, with random effects of 0 in this instance:

# baseline hazard 2*t*exp(-5) times exp(alpha * Z(t)), with alpha = 0.5 and Z(t) = 1 + 2*t here
hazF <- function(t) 2*t*exp(-5)*exp(0.5*(1+2*t))

Now find the value of $t$ that gives an integrated (i.e., cumulative) hazard equal to the above target. This page shows a simple way to find the upper limit of integration that gives a desired target for a definite integral. More-or-less copying that here (with a restriction of the lower limit of integration to 0) and applying it to this instance:

findprob <- function(f, interval, target) {
    # find the upper limit x at which the integral of f from 0 to x is closest to target
    optimize(function(x) {
        abs(integrate(f, 0, x)$value - target)
    }, interval)$minimum
}
findprob(hazF, interval = c(0, 10), cumHazTarget)
## [1] 3.429483

That gives the time at which the cumulative hazard equals that corresponding to your sampled survival probability. (I suspect that there is a more efficient way to do this that takes advantage of the non-negativity of the hazard function, but this illustrates the principle.)
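One such alternative, as a sketch of my own (the helper findtime() below is not from the linked page): because the hazard is non-negative, the cumulative hazard $\int_0^x h(t)\,dt$ is nondecreasing in $x$, so you can use uniroot() on its difference from the target rather than optimize() on an absolute value:

findtime <- function(f, upper, target) {
    # cumulative hazard minus target; nondecreasing in x because f >= 0
    H <- function(x) integrate(f, 0, x)$value - target
    if (H(upper) < 0) return(Inf)  # target never reached: right-censor later
    uniroot(H, lower = 0, upper = upper)$root
}
findtime(hazF, upper = 10, target = cumHazTarget)

This should agree with the findprob() result above to within numerical tolerance, and returning Inf when the target is never reached makes the subsequent right-censoring step explicit.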

Set the top limit of interval to a value somewhat above your maximum observation time and treat times greater than the maximum observation time as right censored. For example, if your maximum observation time is 3, in this instance the function returns a time value slightly above 3, which you could then right censor at 3:

findprob(hazF, interval = c(0, 3.1), cumHazTarget)
## [1] 3.099922
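Extending this to a full cohort is then mostly bookkeeping. Below is a rough sketch of my own (the subject count n = 200, maxObs = 5, and the use of the same 1 + 2t trend as hazF above, now with subject-specific random effects drawn via MASS::mvrnorm(), are illustrative assumptions, not the paper's settings):

# per-subject hazard: h0(t) * exp(alpha * Z_i(t)), with Z_i(t) = 1 + 2*t + b_i1 + b_i2*t
hazF_i <- function(t, b1, b2) 2 * t * exp(-5) * exp(0.5 * (1 + 2 * t + b1 + b2 * t))

set.seed(3)
n <- 200
D <- matrix(c(0.4, 0.1, 0.1, 0.2), 2, 2)
b <- MASS::mvrnorm(n, mu = c(0, 0), Sigma = D)   # random effects (b_i1, b_i2)

maxObs <- 5                        # assumed maximum observation time
target <- -log(runif(n))           # one cumulative-hazard target per subject

eventTime <- sapply(seq_len(n), function(i) {
    findprob(function(t) hazF_i(t, b[i, 1], b[i, 2]),
             interval = c(0, maxObs + 0.1), target[i])
})

status  <- as.integer(eventTime <= maxObs)   # 1 = event observed, 0 = right-censored at maxObs
obsTime <- pmin(eventTime, maxObs)

The additional uniform censoring with a tuned $t_{\text{max}}$, discussed in the comments below, would then be applied on top of obsTime and status.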
EdM
  • The coding part is not an issue for me. Would you please elaborate a bit more on your answer here? Thank you! – ADAM May 27 '23 at 16:05
  • @ADAM I added an example. – EdM May 27 '23 at 17:23
  • Thank you so much for the update. One last question: where would I be using this piece of information, "We simulate the censoring times from a uniform distribution on $(0, t_{\text{max}})$, with $t_{\text{max}}$ set to result in about 25% censoring"? – ADAM May 27 '23 at 18:48
  • @ADAM As in my answer to your related question, use that after you've estimated all the event times. It's a trial-and-error process. Start by treating all events beyond your maximum observation time as censored. Then sample uniformly distributed individual censoring times on $(0, t_{\text{max}})$ and censor event times greater than the corresponding censoring time. Try different values of $t_{\text{max}}$ to get a total of about 25% of event times censored (a rough sketch of this tuning appears after these comments). Obviously, your maximum observation time needs to be greater than 75% of event times for that to work. – EdM May 27 '23 at 19:41
  • Is it something like this? I do not see where I would be using the maximum observation time: times <- findprob(hazF, interval = c(0, 10), cumHazTarget) # vector of times (n); censor_times <- runif(n, min = 0, max = tmax); ztime <- pmin(times, censor_times); event <- as.integer(times < censor_times) – ADAM May 28 '23 at 18:05
  • @ADAM The paper you cite involves a maximum observation/follow-up time $U$. All simulated event times greater than $U$ thus must be right-censored at $U$. Do that censoring before you do the further censoring with runif(n, min = 0, max = tmax), which might further censor some such cases at times even earlier than $U$. Play with tmax to get the desired total fraction of censored observations. Also, for efficiency in implementation you will want to Vectorize() the findprob() function. – EdM May 28 '23 at 18:54
  • Thank you for your clarification. I have updated my code here. My question is not a coding question; the code is definitely not the most efficient, and I can work on improving that once I have the correct answer. You can see that, before doing any further censoring with runif(n, min = 0, max = tmax), I already have a censoring rate over 96%. So I was wondering what I am doing wrong to make it come out this way. I would greatly appreciate it if you would point that out or walk me through it. Thank you for your time! – ADAM May 30 '23 at 00:31
  • @ADAM Try a much larger value of tmax than the maximum observation time. You need to play with different values of tmax to find one that works. – EdM May 30 '23 at 02:54
  • Would you please take a look at this question: https://stats.stackexchange.com/questions/638069/two-component-cox-proportional-hazards-model-simulation – ADAM Feb 01 '24 at 20:01
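Following up on the censoring discussion in the comments, here is a rough sketch of the trial-and-error tuning of $t_{\text{max}}$ that EdM describes. It assumes the obsTime and status vectors from the cohort sketch after the answer; the helper name addUnifCensoring() and the candidate tmax values are hypothetical:

# further uniform censoring on (0, tmax), applied after censoring at the maximum
# observation time; returns the overall fraction censored (aiming for about 0.25)
addUnifCensoring <- function(obsTime, status, tmax) {
    censTime <- runif(length(obsTime), min = 0, max = tmax)
    # in the final data set the observed time would be pmin(obsTime, censTime);
    # here we only return the censoring fraction, for tuning tmax
    finalEvent <- as.integer(status == 1 & obsTime <= censTime)
    mean(finalEvent == 0)
}

# try a few candidate tmax values (guesses) and pick one giving roughly 25% censoring
set.seed(4)
sapply(c(5, 10, 20, 50), function(tm) addUnifCensoring(obsTime, status, tm))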