4

I have a data set that is analogous to a survival analysis dataset.

I have experimental animals that are modelled as having two states, where the second state is absorbing, i.e. each individual transitions at most once, and once it reaches state 2 it stays there. An individual may also never enter state 2 during the whole experiment.

I understand this to be a basic setup for a survival analysis using Markov chains.

I'm worried about the effect of individual variance in this setup. For example, individuals with a tendency to transition quickly into state 2 will have a small impact on the transition parameter estimates, while individuals who never transition will have an over-represented effect on those estimates (because they are in state 1 for longer, they are evaluated more times by the likelihood of the state transition).

From my understanding of Markov chains, I'm assuming the parameter estimates will be biased toward representing those individuals who are less prone to transitioning into state 2.

When dealing with individual variance… my intuition is to add a random effect to capture that variance and integrate it out of the likelihood. However, in this context, since each individual can only transition once (or never), that means each random effect level will have a sample size of 1 or 0… which feels problematic.

I was wondering if there are any standard methods to deal with this kind of bias in Markov chain-style survival analysis. Or perhaps a Markov chain is just a bad choice when there is expected individual variance that can't be accounted for with covariate data. Also, please let me know if I'm misunderstanding something about Markov chains here.

Edit: Why not a Cox proportional hazards model?

In this dataset, I have experimental individuals which share the same environment & hazard covariates. Hence, when one individual encounters high-hazard covariates, this means all other individuals are also exposed to the same high-hazard covariates.

My understanding of the Cox proportional hazards model is that covariates are only evaluated at event times (deaths) across individuals. So in this context, the Cox model will always be comparing individuals exposed to the same covariates.
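
To make that concern concrete, here is a sketch using the standard Cox partial likelihood, assuming no tied event times. The symbols are notation introduced only for this illustration: D is the set of individuals with an observed event, t_i is the event time of individual i, R(t_i) is the set of individuals still at risk at t_i, and x_j(t) is the covariate vector of individual j at time t. If every individual at risk shares the same covariate value x(t), the covariate term cancels from each factor, so the partial likelihood no longer depends on beta:

```latex
L(\beta)
  = \prod_{i \in D}
    \frac{\exp\{\beta^\top x_i(t_i)\}}
         {\sum_{j \in R(t_i)} \exp\{\beta^\top x_j(t_i)\}}
  \quad\xrightarrow{\; x_j(t) = x(t)\ \text{for all } j \;}\quad
  \prod_{i \in D} \frac{1}{|R(t_i)|}
```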

My understanding is that a Markov-chain approach will get around this, as Markov-chains will take into account the whole covariate history of all individuals.

Comments

Sorry for describing this setting in rather abstract terms... but this is a field experiment with animals. I'm worried if I start going into the intricacies of my field setup, it will just confuse matters, as it will take a lot of text to describe the experimental setting and expected behaviours.

RTbecard
  • 2
    Your problem setting is not very clear. What are you exactly analysing, measuring or comparing? – Sextus Empiricus May 14 '23 at 15:36
  • If this is about a comparison then why not use a proportional hazards model? – Sextus Empiricus May 14 '23 at 15:37
  • 2
    If all individuals start in state 1 and only a portion of them make the transition to the absorbing state 2 over the period of observation, then what you have is equivalent to a standard survival model. Is there some reason why you can't use that approach? With at most 1 transition per individual, there is no need for a "random effect"; you wouldn't even be able to estimate such a "random effect." The bias that might otherwise arise from individuals who don't make the transition is handled by treating their final observation times as right-censored times. – EdM May 14 '23 at 15:45
  • 2
    Things are different if instead everyone starts in a state 0 with reversible transitions to state 1 and an absorbing transition from one or both of those to state 2. Please edit the question to address the issues raised in comments, as comments are easy to overlook and can be deleted. – EdM May 14 '23 at 15:48
  • Hey @SextusEmpiricus, I'm deliberately trying to keep the setting vague here. I first looked at cox-prop-hazard model... but I didn't like how they only use covariate data @ the event points of individuals. In my dataset, a lot of the covariate data is correlated in time across individuals. I feel this is a problem for cox prop models (i.e. many individuals are exposed to high-hazard covariates at the same times). – RTbecard May 14 '23 at 15:48

3 Answers

5

In the case of a single terminating event, a comparison with the Cox PH model as @sextus-empiricus suggested is a great idea, and you'll find that the two will provide virtually identical standard errors of comparable quantities.

My impression is that you are worrying too much about variances being unrealistic. When the vast majority of subjects stay in one state for a prolonged period, the state transition probability associated with that (start in state A and end in state A in the next time period) is near 1.0 and on the log-likelihood scale this contributes nearly 0 to the log-likelihood, which ultimately is used to estimate the variance. In other words, near-redundancies in the data are already accounted for by Markov modeling. For that reason the likelihood doesn't change whether or not you carry observations forward once a subject reaches an absorbing state. For numerical efficiency, don't carry such records forward.
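
A minimal sketch of the point about absorbed records, assuming a discrete-time two-state chain with one constant per-period transition probability and no covariates. The function names, the toy data (one animal that transitions in period 3, one that never transitions in 10 periods), and the grid search are all hypothetical illustrations, not part of the original analysis:

```python
import math

def log_lik(hazard, records):
    """Log-likelihood of (state_from, state_to) records, one per animal-period.

    Assumes a constant probability `hazard` of moving from state 1 to the
    absorbing state 2 in any period.
    """
    total = 0.0
    for s_from, s_to in records:
        if (s_from, s_to) == (1, 1):
            total += math.log(1.0 - hazard)   # stayed in state 1
        elif (s_from, s_to) == (1, 2):
            total += math.log(hazard)         # transitioned
        elif (s_from, s_to) == (2, 2):
            total += math.log(1.0)            # absorbed: probability 1, adds exactly 0
        else:
            raise ValueError("state 2 is absorbing")
    return total

def at_risk_records(n_periods, event):
    """Records for one animal observed for n_periods while at risk."""
    stays = n_periods - 1 if event else n_periods
    records = [(1, 1)] * stays
    if event:
        records.append((1, 2))
    return records

early = at_risk_records(3, event=True)     # transitions in period 3
never = at_risk_records(10, event=False)   # right-censored after 10 periods
trimmed = early + never                    # absorbed periods dropped
carried = trimmed + [(2, 2)] * 7           # early animal carried forward to period 10

# Closed-form estimate for this constant-hazard sketch: events / periods at risk.
n_events = sum(1 for r in trimmed if r == (1, 2))
n_at_risk = sum(1 for r in trimmed if r[0] == 1)
mle = n_events / n_at_risk

# Crude grid search as a check on the closed form.
grid = [i / 1000 for i in range(1, 1000)]
best = max(grid, key=lambda h: log_lik(h, trimmed))

print(log_lik(0.1, trimmed), log_lik(0.1, carried))  # identical values
print(mle, best)
```

In this sketch the estimate is simply the number of transitions divided by the number of animal-periods at risk, and the records after absorption add exactly zero to the log-likelihood, so dropping them changes nothing.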

Frank Harrell
  • 2
    Okay, I think I get it. For datasets where stationary behaviour is the norm (A -> A), prolonged observations of that behaviour have little sway on the log-likelihood as the stationary probabilities are near 1. Thanks for giving an explanation clear enough for a field-biologist to understand :) – RTbecard May 14 '23 at 16:13
3

Technically, according to Wikipedia,

A Markov chain or Markov process is a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event.

I suppose that you could re-define your data in a way to incorporate a series of "events" that represents covariate exposures or patterns of exposures, but that seems unnecessarily complicated here. There is no reason to go beyond your simple two-state model, provided that you take care in defining your predictor variables.

You are correct that a Cox model only evaluates predictor values that are in place at event times. Nothing, however, prevents you from defining time-varying predictors that incorporate the histories of covariates in some way: cumulative values, running averages, averages weighted toward more recent observations, anything that makes sense based on your understanding of the subject matter. Then, as Frank Harrell points out in another answer (+1), it won't really matter whether you set the problem up as a two-state Markov model or as a Cox model.
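
As a rough illustration of such history-based predictors, here is a sketch that turns one covariate series into a cumulative value, a running average, and an average weighted toward recent observations. The function name, the `window` and `alpha` parameters, and the toy exposure series are hypothetical choices for this example; what makes sense in practice depends on the subject matter:

```python
def history_summaries(x, window=3, alpha=0.5):
    """Summarise the history of covariate series x at each time point.

    Returns three lists of the same length as x:
      cumulative: running total up to and including time t
      running:    mean of the last `window` values (fewer at the start)
      ewma:       exponentially weighted average, weight `alpha` on the newest value
    """
    cumulative, running, ewma = [], [], []
    total = 0.0
    for t, value in enumerate(x):
        total += value
        cumulative.append(total)
        recent = x[max(0, t - window + 1): t + 1]
        running.append(sum(recent) / len(recent))
        previous = ewma[-1] if ewma else value   # start the average at the first value
        ewma.append(alpha * value + (1 - alpha) * previous)
    return cumulative, running, ewma

# Toy exposure series shared by all animals in one environment.
exposure = [0.0, 2.0, 4.0, 0.0]
cumulative, running, ewma = history_summaries(exposure, window=2, alpha=0.5)
print(cumulative)
print(running)
print(ewma)
```

Each of the three series could then be supplied as a time-varying predictor, evaluated at the start of each interval, in either the two-state model or a Cox model.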

EdM
3

I'm worried about the effect of individual variance in this setup. For example, individuals with a tendency to transition quickly into state 2 will have a small impact on the transition parameter estimates, while individuals who never transition will have an over-represented effect on those estimates

The individual variations can be absorbed into a single survival curve. Whether an individual is susceptible to transitioning quickly might be unknown, yet that uncertainty can be expressed in the survival curve. Some individuals transition early, others do not. That is exactly what the survival curve is about: it models the variation between individuals.

A similar situation is described in the question: How to introduce uncertainty in fitting the original data when simulating survival curves?
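
A minimal sketch of how individual variation folds into one group-level curve, assuming (hypothetically) two equally sized subgroups with constant per-period transition probabilities of 0.30 and 0.05. The function names and numbers are illustrative only:

```python
def population_survival(hazards, weights, n_periods):
    """Group-level probability of still being in state 1 after t periods, t = 0..n_periods.

    Each subgroup k has a constant per-period transition probability hazards[k]
    and makes up a fraction weights[k] of the population.
    """
    return [
        sum(w * (1.0 - h) ** t for h, w in zip(hazards, weights))
        for t in range(n_periods + 1)
    ]

def marginal_hazard(survival):
    """Per-period hazard implied by a survival curve: 1 - S(t+1) / S(t)."""
    return [1.0 - survival[t + 1] / survival[t] for t in range(len(survival) - 1)]

surv = population_survival([0.30, 0.05], [0.5, 0.5], n_periods=10)
haz = marginal_hazard(surv)
print([round(s, 3) for s in surv])
print([round(h, 3) for h in haz])
```

Under these assumptions the group-level hazard starts at the average of the two subgroup hazards and then declines toward the lower one, because the fast-transitioning individuals leave the at-risk set first. The single survival curve already reflects that mixture, and it can be translated back into a hazard as a function of time.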

  • 1
    We tend to teach survival curves to the exclusion of state transition models (multistate models). Time to event analysis is more restrictive while being harder to interpret once there is a non-absorbing state or more than one type of event, IMHO. – Frank Harrell May 14 '23 at 22:45
  • 1
    @FrankHarrell I speak about survival curves because of "the effect of individual variance". In the linked question I have an answer that explains how this variance in individual survival curves works out in changing the effective survival curve for the group. Once you have the alternative survival curve that incorporates individual variations, you can translate this back into hazard as a function of time. – Sextus Empiricus May 15 '23 at 06:37