
I am analyzing clinical data and complex microbiome data in a longitudinal study. I already compared different groups at baseline and between baseline and "events" using linear mixed models (LMMs) and permutation-based analyses.

Now, I would like to explore predictive modeling. During the study, some participants experienced "events". I would like to use explainable machine learning to model the risk of these events and to identify clinical and microbiome risk factors. I have no experience with this so far but am willing to learn. Where would be a good starting point?

I am working with R.

Best regards and thank you for your help!

Galen

1 Answer


Modeling longitudinal and time-to-event data together has, oddly enough, an understandable name: "joint modeling." The CRAN Survival Task View has links to several implementations. You might consider Bayesian approaches like those in the rstanarm package, which has a vignette on that topic, or JMbayes. Bayesian approaches have the advantage of forcing you to think through your assumptions carefully before you start and of not tying you to the sometimes arbitrary choices about "significance" typical of frequentist modeling.
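As a rough illustration of what a Bayesian joint model looks like in rstanarm, the sketch below follows the pattern of the package's joint-modeling vignette, using the `pbcLong` and `pbcSurv` example data sets that ship with rstanarm. The choice of covariates, the single chain, and the short run length are assumptions made only to keep the example small; they are not recommendations for a real analysis.

```r
# Sketch only: requires the rstanarm package. Fitting runs MCMC and can
# take a few minutes even with the small settings used here.
fit <- NULL
if (requireNamespace("rstanarm", quietly = TRUE)) {
  library(rstanarm)

  # pbcLong: repeated log-bilirubin measurements (several rows per patient)
  # pbcSurv: one row per patient with follow-up time and death indicator
  fit <- stan_jm(
    # Longitudinal submodel: marker trajectory with random intercept and slope
    formulaLong  = logBili ~ sex + trt + year + (year | id),
    dataLong     = pbcLong,
    # Event submodel: time to death, linked to the current marker value
    formulaEvent = survival::Surv(futimeYears, death) ~ sex + trt,
    dataEvent    = pbcSurv,
    time_var     = "year",
    # Small settings for illustration only (assumption, not advice)
    chains = 1, iter = 1000, seed = 12345, refresh = 0
  )
  print(fit)
}
```

With your own data, `logBili` would be replaced by a clinical or microbiome-derived marker, and `posterior_survfit()` can then be used to look at predicted event probabilities for individuals.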

Predictions from joint models can be difficult for several reasons.

First, for any one individual, the wide underlying distribution of baseline time-to-event values can overwhelm the covariate-associated differences in time-to-event distributions that you model.
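A small simulation can make this first point concrete. It assumes exponential event times and a hypothetical hazard ratio of 2 between a "high-risk" and a "low-risk" group; both the distribution and the numbers are illustrative assumptions, not estimates from any study.

```r
set.seed(1)
n  <- 1e5
hr <- 2          # assumed hazard ratio between groups (hypothetical)
base_rate <- 0.1 # assumed baseline event rate per unit time (hypothetical)

# Simulated event times for random individuals from each group
t_low  <- rexp(n, rate = base_rate)
t_high <- rexp(n, rate = base_rate * hr)

# Fraction of random pairs in which the high-risk individual
# nevertheless has the later event
p_outlast <- mean(t_high > t_low)
p_outlast

# Spread of event times within each group
quantile(t_low,  c(0.1, 0.5, 0.9))
quantile(t_high, c(0.1, 0.5, 0.9))
```

For independent exponential times the analytic value of `p_outlast` is 1 / (1 + hr), so roughly one pair in three goes "the wrong way" even though the hazard is doubled. The quantiles show how much the two distributions overlap, which is why individual-level predictions stay uncertain even when a risk factor is clearly associated with the event.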

Second, time-to-event models typically work with current covariate values, so if the history of a covariate is important then you have to find a way to turn that history into a current value.
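One way to do this, sketched here with `tmerge()` from the survival package, is to compute a summary of the history up to each measurement (a running mean in this example) and carry it forward as a time-varying covariate alongside the current value. The data frames, column names, and the choice of a running mean are all hypothetical and serve only to show the mechanics.

```r
library(survival)

# Hypothetical baseline data: one row per participant
base <- data.frame(id     = 1:3,
                   futime = c(10, 6, 8),  # end of follow-up
                   status = c(1, 0, 1))   # 1 = event, 0 = censored

# Hypothetical repeated measurements of a single marker
long <- data.frame(id    = c(1, 1, 1, 2, 2, 3, 3),
                   time  = c(0, 3, 7, 0, 4, 0, 5),
                   value = c(2.0, 2.5, 4.0, 1.0, 1.2, 3.0, 2.0))

# Turn the history into a "current" summary: running mean per participant
# (rows are already ordered by id and time)
long$run_mean <- ave(long$value, long$id,
                     FUN = function(v) cumsum(v) / seq_along(v))

# Counting-process (start, stop] data set with the event at end of follow-up
td <- tmerge(base, base, id = id, event = event(futime, status))

# Add the latest value and the running mean as time-dependent covariates
td <- tmerge(td, long, id = id,
             current = tdc(time, value),
             history = tdc(time, run_mean))
td

# With a realistically sized data set, a model could then be fit as:
# coxph(Surv(tstart, tstop, event) ~ current + history, data = td)
```

Each row of `td` covers an interval during which both covariates are constant, so the event model only ever sees values that were known at the start of that interval.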

Third, there can be a serious risk of survivorship bias. If you have a longitudinal covariate value for an individual at some time point, that individual is presumably still alive at that time point. That's fine for building a model, but can pose problems for predictions. Even in the simpler scenario of (unmodeled) time-varying covariate values, the author of one widely used survival-analysis package refuses to allow for predictions based on time-varying covariates. See this page for his rationale and some discussion.

EdM
    Sometimes it's better to cast the problem as a Markov process multistate transition model where the states represent continuous or ordinal current status measurements as well as events. As an aside, note that microbiome research is ridiculed in some quarters because of the low quality of the research methods and analyses used. There are many more ways to do it wrong than to do it right. Think of this as a multi-year learning process. – Frank Harrell Dec 15 '23 at 18:24