
I have (possibly multivariate?) time series data that I am using to model coral populations over time. Measurements were taken at discrete timepoints for specific individuals within a population, and I am trying to assess survival rates over time, as well as possibly predict future survival rates (~3-5 years ahead).

My issue, however, is that I have a categorical variable representing "transition periods" (e.g., Jan2015-July2016, July2016-July2017, July2017-Jan2018), but each transition period spans a different amount of time in years (e.g., 1.51, 0.99, and 0.52, respectively). Is there a way to pair these two variables, so that, for example, Jan2015-July2016 is paired with 1.51 years and July2016-July2017 with 0.99 years?

I am trying to fit a logistic regression to predict the survival of coral species over time, but the chronological order of the transition periods does not correlate with the time elapsed. Additionally, there were heat waves in certain years that greatly impacted coral survival, and I would like to include that in future predictions, so preserving the chronological order of the timepoints is essential. For now, however, I am mostly interested in finding a way to combine my two time-related variables, especially since I am not allowed to share my real data online.

Can anyone help me with this issue?

  • Can you clarify how these periods play into the data? Is the data given as "This record is for the period from Jan2015 to Jul2016, at the end of the period individual #1 is still alive", next record: "Jul2016 - Jul2017, #1 is dead", next record: "Jan2015 - Jul2016, individual #2 is still alive", next record: "Jul2016 - Jul2017, #2 is still alive", "Jul2017 - Jan2018, #2 is dead, heat wave in period", etc.? If so, the technical term for this would be "interval censored survival data" and heat waves would be a "time-varying covariate". Or did I get the data structure wrong? – Björn Jan 13 '23 at 13:33
  • Hello Bjorn, Thank you so much for helping me with understanding my data. Yes, it does seem that my data follows "Interval censored survival", where between each timepoints (ex: Jan2015 - Jul2016) I have ~200 individuals experiencing a change in area (growth rates), new individuals showing up in the population (recruitment events; binomial), and individuals found in previous timepoints have disappeared and are no longer present (mortality; binomial). For now I am focusing on predicting survival, and I was wondering if you had any ideas that could help me out? Thanks! – Grad Student Jan 15 '23 at 03:09
  • Do you know the individuals & how long exactly each is at risk? Or do you only have "out of x1 individuals in time interval 1 of length t1, y1 survived the time interval" & then "out of x2 individuals in time interval 2 of length t2, y2 survived the time interval", but you don't really know the overlap in individuals? Or is it worse, in the sense that "there are x1 individuals at the start of time interval 1 of length t1 & z1 individuals at the end of the interval" (but we don't know whether the x1 overlap with the z1: some of the x1 died, some survived & then there are new individuals)? – Björn Jan 15 '23 at 13:39

1 Answer


From the description in a comment:

between each timepoints (ex: Jan2015 - Jul2016) I have ~200 individuals experiencing a change in area (growth rates), new individuals showing up in the population (recruitment events; binomial), and individuals found in previous timepoints have disappeared and are no longer present (mortality; binomial)

it seems that you have data on individuals all evaluated at the same points in time while they are alive. That's a type of panel data. (If each individual can have its own observation-time intervals then you might need more sophisticated methods to handle the interval censoring.)

In terms of survival per se, you have a setup for a fairly standard discrete-time survival model. You format the data such that you have, for each time interval, one row of data for each individual who was at risk of death at the start of the interval. That long data format simplifies analysis.

Each row can contain the calendar start date of the interval to handle chronological time as a predictor, the duration of the interval to handle the different lengths of the intervals, and the values during the interval of covariates like "heat wave present" (or maybe even better, some continuous measure of heat stress). I don't know enough about corals to say for certain, but I suspect that you might need to include an individual's age at the start of each time period as a covariate, or maybe its size as a proxy for that.

The event/outcome binary marker for each individual and time interval is set to 1 if the individual died during the time interval, to 0 otherwise. That 0 outcome value also is used for an individual lost to follow up for reasons other than death during an observation period ("right censoring" in survival analysis). This data format also allows you to include an individual newly born or otherwise added to the study, for time intervals starting with the first time the individual is observed.
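As a rough illustration of that long format, here is a minimal Python sketch. The period labels and durations are the ones quoted in the question; everything else (the individuals, the `heat_wave` flags, and all field and function names) is made up for illustration and is not taken from the real data.

```python
# Each transition period is stored once, pairing its label with its duration.
# "order" (the list index) preserves chronological order.
# The heat_wave flags are hypothetical placeholders.
intervals = [
    {"period": "Jan2015-July2016", "years": 1.51, "heat_wave": 1},
    {"period": "July2016-July2017", "years": 0.99, "heat_wave": 0},
    {"period": "July2017-Jan2018", "years": 0.52, "heat_wave": 0},
]

# Hypothetical individuals: index of the first interval in which each was at
# risk, index of the last interval in which it was followed, and whether that
# last interval ended in death (False means alive or censored).
individuals = [
    {"id": "c1", "first": 0, "last": 1, "died": True},
    {"id": "c2", "first": 0, "last": 2, "died": False},
    {"id": "c3", "first": 1, "last": 2, "died": True},  # a recruit
]


def person_period_rows(individuals, intervals):
    """Return one row per individual per interval at risk."""
    rows = []
    for ind in individuals:
        for k in range(ind["first"], ind["last"] + 1):
            iv = intervals[k]
            rows.append({
                "id": ind["id"],
                "period": iv["period"],
                "order": k,
                "years": iv["years"],
                "heat_wave": iv["heat_wave"],
                # 1 only in the interval during which the individual died
                "died": int(ind["died"] and k == ind["last"]),
            })
    return rows


rows = person_period_rows(individuals, intervals)
for row in rows:
    print(row)
```

Because every row carries both the period label and its duration, the two time-related variables stay paired automatically, and each can enter the model in its own right.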

Then you do a binomial regression over the entire data set with appropriately modeled covariates (calendar date, duration of observation interval, age, size, environmental variables...). A logistic regression is one way to do this, but a complementary log-log link instead of the logit link used for logistic regression is more closely related to proportional-hazards survival models.
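To see why the complementary log-log link pairs naturally with intervals of unequal duration, consider the simplifying assumption (mine, for illustration) that the hazard of death is constant within an interval. Then the probability of dying in an interval of length t is 1 - exp(-h t), and the complementary log-log of that probability is log(h) + log(t). A small numerical check in Python, with a made-up hazard value:

```python
import math


def cloglog(p):
    """Complementary log-log transform of a probability."""
    return math.log(-math.log(1.0 - p))


def death_prob(hazard_per_year, years):
    """Probability of death within an interval under a constant hazard."""
    return 1.0 - math.exp(-hazard_per_year * years)


hazard = 0.2  # hypothetical constant hazard per year
durations = [1.51, 0.99, 0.52]  # interval lengths quoted in the question

lhs = [cloglog(death_prob(hazard, t)) for t in durations]
rhs = [math.log(hazard) + math.log(t) for t in durations]

for t, a, b in zip(durations, lhs, rhs):
    print(f"t={t}: cloglog(p)={a:.6f}, log(h)+log(t)={b:.6f}")
```

Under that assumption, one option is to enter log-duration as an offset (or as a covariate whose coefficient can be checked against 1) in a complementary log-log binomial model, so that the remaining coefficients act on the log-hazard scale.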

You might model births with a Poisson or other regression model for counts; with the default log link, you would include the log-duration of each time interval as an offset to account for the differences in duration of observation intervals.
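The role of the offset can be seen in the simplest case, an intercept-only Poisson model with a log link. There, the maximum-likelihood estimate of the recruitment rate per year is just total recruits divided by total years observed. The sketch below uses made-up recruit counts together with the interval durations quoted in the question:

```python
import math

recruits = [30, 18, 11]   # hypothetical new individuals per interval
years = [1.51, 0.99, 0.52]  # interval lengths quoted in the question

# Intercept-only Poisson model: log(E[count]) = b0 + log(years).
# Its maximum-likelihood estimate of exp(b0) is total count / total exposure.
rate_per_year = sum(recruits) / sum(years)
b0 = math.log(rate_per_year)

# Expected counts scale with interval length through the offset.
expected = [math.exp(b0 + math.log(t)) for t in years]

print(f"estimated recruits per year: {rate_per_year:.2f}")
for t, obs, exp_count in zip(years, recruits, expected):
    print(f"t={t}: observed={obs}, expected={exp_count:.2f}")
```

Without the offset, the longer first interval would look like a period of higher recruitment simply because there was more time for recruits to appear.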

EdM