How do I estimate survival probabilities using datasets that cover different amounts of time?

Question

I'm looking for help classifying a problem for which I don't yet have the statistical terminology, and for help thinking about possible approaches to it. Below, I give an analogous version of the problem and an example of the kind of data I am trying to apply it to.

Analogous problem

Imagine that we have a population of widgets that stop functioning over time. We can't observe the widgets directly, so instead we run experiments on subsets of them to estimate the rate at which they remain functional. The basic experimental approach is to collect fully functional widgets and divide them into N trials, each with the same sample size (sampleSize). After some time (say 1 year), we collect the widgets from our experiment and count the number that are still functional. We are unable to track the widgets during the year, and the process of recovering/counting them is destructive, so this is the end of this particular 'cohort' of the experiment.

At the same time as we started the first experimental cohort, we started a second cohort with the same number of trials and the same sample sizes. We leave this cohort for some additional time (say 1 more year, for a total of 2 years) before we collect its widgets and count the number that are still functional. Again, we are unable to track the widgets, and the recovery/count is destructive.

Based on these 2 cohorts, we can estimate the proportion of widgets that remain functional after 1 year ($\theta_1$) and the proportion that remain functional after 2 years ($\theta_2$). But we are also interested in the survival of widgets from the end of year 1 to the end of year 2. Given the kinds of data these experiments might produce (described below), how can I estimate this survival rate?

Two possible outcomes

The first case below makes sense to me: if the number of functional widgets declines through time, the survival to year 2 is the product of survival to year 1 and survival from year 1 to year 2.
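Writing $T$ for a widget's unobserved failure time and $\theta_{1\to 2}$ for the probability that a widget still functional at the end of year 1 is also functional at the end of year 2 (both symbols are introduced here for illustration, not taken from the question), the product rule in the previous sentence reads:

$$
\theta_2 = P(T > 2) = P(T > 1)\,P(T > 2 \mid T > 1) = \theta_1\,\theta_{1\to 2}
\quad\Longrightarrow\quad
\theta_{1\to 2} = \frac{\theta_2}{\theta_1}.
$$

Because $\theta_{1\to 2}$ is a probability, it cannot exceed 1, so any valid pair must satisfy $\theta_2 \le \theta_1$. With Case 1's values this gives $0.64 / 0.8 = 0.8$.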

```r
## Case 1
# number of samples
N = 100
# number of trials (widgets) per sample
sampleSize = 1000
# survival rates
theta_1 = 0.8
theta_2 = 0.64
# number still functional in each sample of each cohort
year_1 = rbinom(n = N, size = sampleSize, prob = theta_1)
year_2 = rbinom(n = N, size = sampleSize, prob = theta_2)
plot(c(1,2),c(mean(year_1),mean(year_2))/sampleSize,xlim=c(0,3),ylim=c(0,1),
     xlab='Year',ylab='Mean proportion survived',pch=16,cex=2)
# survival rate from year 1 to year 2: true value, then estimate
theta_2/theta_1
(mean(year_2)/sampleSize)/(mean(year_1)/sampleSize)
```

The second case is what leads to confusion: if the number of functional widgets is higher in year 2 than in year 1, how can I incorporate both datasets to obtain an estimate of the survival rate from year 1 to year 2?

```r
## Case 2
# number of samples
N = 100
# number of trials (widgets) per sample
sampleSize = 1000
# survival rates
theta_1 = 0.2
theta_2 = 0.3
# number still functional in each sample of each cohort
year_1 = rbinom(n = N, size = sampleSize, prob = theta_1)
year_2 = rbinom(n = N, size = sampleSize, prob = theta_2)
plot(c(1,2),c(mean(year_1),mean(year_2))/sampleSize,xlim=c(0,3),ylim=c(0,1),
     xlab='Year',ylab='Mean proportion survived',pch=16,cex=2)
# survival rate from year 1 to year 2 (exceeds 1 here, because theta_2 > theta_1)
theta_2/theta_1
(mean(year_2)/sampleSize)/(mean(year_1)/sampleSize)
```

Questions

1. How can I combine the datasets from experiments covering 1 year, 2 years, etc., when the 2-year dataset has no year-1 data because sampling is destructive?
2. I have not discussed sampling error here, but how do I properly combine the error from these 2 different datasets when estimating survival rates?
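For the second question, one common large-sample approach (not something proposed in this thread) is the delta method on the log scale. Below is a minimal sketch, assuming every widget is an independent Bernoulli draw so that each cohort can be pooled into a single binomial count (`k` functional out of `n`); the function name `ratio_ci` is hypothetical. It treats the two cohorts as independent, which separate destructive samples make plausible, but it does not enforce $\theta_2 \le \theta_1$, so the interval can extend above 1.

```r
# Delta-method confidence interval for the year-1-to-year-2 survival
# theta_2 / theta_1, estimated by p2 / p1 from two independent binomial cohorts.
# Hypothetical helper: assumes each widget is an independent Bernoulli draw.
ratio_ci <- function(k1, n1, k2, n2, level = 0.95) {
  p1 <- k1 / n1                 # estimated survival to year 1
  p2 <- k2 / n2                 # estimated survival to year 2
  ratio <- p2 / p1              # estimated survival from year 1 to year 2
  # Var(log p_hat) is approximately (1 - p) / (n * p); for independent
  # cohorts the two variances add on the log scale
  se_log <- sqrt((1 - p1) / (n1 * p1) + (1 - p2) / (n2 * p2))
  z <- qnorm(1 - (1 - level) / 2)
  c(ratio = ratio,
    lower = exp(log(ratio) - z * se_log),
    upper = exp(log(ratio) + z * se_log))
}

ratio_ci(800, 1000, 640, 1000)
```

For 800 of 1000 and 640 of 1000 this gives a ratio of 0.8 with an interval of roughly 0.757 to 0.846. With the question's simulated data, the pooled call would be `ratio_ci(sum(year_1), N * sampleSize, sum(year_2), N * sampleSize)`. If the N trials vary more than binomial sampling allows, this interval will be too narrow.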

I would say that if your year 2 dataset has a higher survival rate than year 1 when there is no difference in the study population, these two results are likely too far off to be useful. Just my two cents. It's also very hard to get interpretable results from survival analysis with only one time point, let alone from two trials with two different time points. I'll let someone else who may have a suggestion answer. — geoscience123, Nov 19 '20 at 17:19
What you have is an interval-censored survival problem (see Kaplan-Meier plots and Cox proportional hazards models). Not a full answer, but you can leverage later time points for analysis of earlier time points. Suppose you test 100 widgets at Year 1 and find 20 survived (20%), and another 100 at Year 2 and find 30 survived (30%): you can infer that at least 50 of the 200 widgets (25%) survived to Year 1, since the surviving Year 2 widgets must have survived Year 1. The trend reversal by Year 2 ought to be within the range of statistical noise, since widgets can't come back to life. — Nuclear Hoagie, Nov 19 '20 at 17:44
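The pooling argument in the comment above matches a standard result: for two binomial cohorts, the maximum-likelihood estimate under the constraint $\theta_2 \le \theta_1$ is the pair of raw proportions when they already decrease, and otherwise a single pooled proportion for both years (the two-group case of the pool-adjacent-violators algorithm). A minimal sketch in R; the function name `monotone_mle` is hypothetical, and each cohort is assumed to be pooled into one binomial count:

```r
# Maximum-likelihood estimates of theta_1 and theta_2 for two binomial cohorts
# under the constraint theta_2 <= theta_1 (widgets cannot come back to life).
# k = number still functional, n = number of widgets inspected in that cohort.
monotone_mle <- function(k1, n1, k2, n2) {
  p1 <- k1 / n1
  p2 <- k2 / n2
  if (p2 <= p1) {
    # Raw proportions already respect the constraint, so they are the MLE
    return(c(theta_1 = p1, theta_2 = p2, cond_surv = p2 / p1))
  }
  # Constraint is active: the MLE pools both cohorts into one proportion
  pooled <- (k1 + k2) / (n1 + n2)
  c(theta_1 = pooled, theta_2 = pooled, cond_surv = 1)
}

monotone_mle(20, 100, 30, 100)   # comment's example: reversed proportions
monotone_mle(80, 100, 64, 100)   # decreasing proportions are left unchanged
```

For the comment's example this returns 0.25 for both years and a year-1-to-year-2 survival of 1, matching the 25% figure. A conditional survival of exactly 1 is the boundary of what these data can support, not evidence that no widgets fail during year 2.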

Answer 1 · answered Nov 28 '20 at 16:04

As Nuclear Hoagie says in a comment, "what you have is an interval-censored survival problem," with a strong flavor of discrete-time survival analysis if you only have data for a handful of years. For the cases sampled after 1 year, all you know is that the events occurred between 0 and 1 years while the others had no events through 1 year; for the cases sampled after 2 years, all you know is that the events happened between 0 and 2 years while the others had no events through 2 years; and so on. The event times are interval-censored while the non-event times are right-censored.
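The discrete-time version of this idea can be written down directly; data where each unit is inspected only once are often called current status data. Below is a minimal sketch in R, assuming each year's cohort is pooled into one binomial count and widgets fail independently. The function name `fit_current_status` is hypothetical, and base R's `optim` is used as a generic optimizer rather than a dedicated interval-censoring routine. Survival is written as a product of year-to-year conditional survival probabilities, each kept in (0, 1) by a logistic transform, so the fitted curve can never increase and both cohorts inform each year's estimate.

```r
# Discrete-time survival fitted to one-inspection (current status) data.
# Cohort j is inspected once, at the end of year j: alive[j] of n[j] widgets
# are still functional. S(j) = q_1 * ... * q_j, where q_i is the probability of
# surviving year i given survival to the start of year i.
fit_current_status <- function(alive, n) {
  neg_loglik <- function(eta) {
    q <- plogis(eta)        # conditional survival per year, always in (0, 1)
    S <- cumprod(q)         # S(1), S(2), ...; non-increasing by construction
    -sum(dbinom(alive, n, S, log = TRUE))
  }
  fit <- optim(rep(0, length(alive)), neg_loglik, method = "BFGS")
  q <- plogis(fit$par)
  list(cond_surv = q, surv = cumprod(q), converged = fit$convergence == 0)
}

fit_current_status(alive = c(800, 640), n = c(1000, 1000))  # Case 1 pattern
fit_current_status(alive = c(200, 300), n = c(1000, 1000))  # Case 2 pattern
```

In the Case 2 pattern, the year-2 conditional survival is pushed toward 1 and both survival estimates approach the pooled 0.25; because that optimum lies on the boundary, the optimizer stops at a value just below 1 rather than exactly at it.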

A proper interval-censored analysis will pool information from all cases and events. The random sampling variability that would lead to your second case ("widgets can't come back to life," as Nuclear Hoagie says) might mean higher than desired uncertainty in estimates. Nevertheless, you should still get a useful overall analysis of survival, as a function of time and any covariates you include in the model. See this page for an introduction to ways to proceed.
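One way to gauge that uncertainty in the two-cohort case is a parametric bootstrap of the constrained estimate: simulate new cohorts from the fitted survival probabilities, re-estimate, and look at the spread. A minimal sketch in R under the same pooled-binomial assumption; the function names are hypothetical, and the capped-ratio rule inside is the two-cohort constrained maximum-likelihood estimate. Bootstrap intervals are known to behave poorly when the estimate sits on a boundary (a conditional survival of exactly 1), so treat the Case 2 interval as a rough indication only.

```r
# Constrained estimate of year-1-to-year-2 survival for two binomial cohorts:
# the ratio p2 / p1, capped at 1 (when p2 > p1 the constrained MLE pools the
# cohorts, so theta_1 = theta_2 and the ratio is exactly 1). Vectorized.
cond_surv_hat <- function(k1, n1, k2, n2) {
  p1 <- k1 / n1
  p2 <- k2 / n2
  ifelse(p2 <= p1, p2 / p1, 1)
}

# Parametric-bootstrap percentile interval for cond_surv_hat.
boot_cond_surv <- function(k1, n1, k2, n2, B = 2000, level = 0.95) {
  p1 <- k1 / n1
  p2 <- k2 / n2
  # Simulate from the constrained fit, not from the raw proportions
  if (p2 > p1) p1 <- p2 <- (k1 + k2) / (n1 + n2)
  k1_star <- rbinom(B, n1, p1)
  k2_star <- rbinom(B, n2, p2)
  est <- cond_surv_hat(k1_star, n1, k2_star, n2)
  alpha <- (1 - level) / 2
  c(estimate = cond_surv_hat(k1, n1, k2, n2),
    quantile(est, c(alpha, 1 - alpha)))
}

set.seed(1)
boot_cond_surv(800, 1000, 640, 1000)  # Case 1 pattern
boot_cond_surv(200, 1000, 300, 1000)  # Case 2 pattern: estimate is 1
```

In the Case 2 pattern roughly half of the simulated estimates hit the cap of 1, so the upper limit is 1 and only the lower limit is informative.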
