
I have data from a prospective study with two measurements per participant (baseline and follow-up). I am interested in whether a binary cut-off obtained at baseline predicts disease development at follow-up, taking into account the interval between baseline and follow-up (which differs between participants).

Because some participants missed follow-up (they dropped out or died), my data are right-censored. A Cox regression showed my cut-off to be a significant predictor. However, this analysis was performed only on the non-censored sample, which introduces selection bias.

I have read about inverse probability weighting, e.g., in Hernán and Robins' book. I wonder whether this technique is also applicable in my case, where each censored participant has only one observation.

If not, does anybody have advice on how to account for the selection bias in my sample?

Edit:

The Cox regression I ran used a dataset containing only the non-censored participants, with follow-up status (disease yes/no) and the number of months from baseline to follow-up. Example of my data:

ID   disease   months   censored   cutoff
a    0         66       0          0
c    1         30       0          1
e    0         45       0          0

library(survival)
# "covariates" is a placeholder for the adjustment variables
coxph(Surv(months, disease) ~ cutoff + covariates,
      data = dat)

I am now wondering whether, in this case, I can simply account for right-censoring by including the censored participants:

ID   disease   months   censored   cutoff
a    0         0        0          0
a    0         66       0          0
*b*  0         0        1          0
c    0         0        0          1
c    1         30       0          1
*d*  0         0        0          1
e    0         0        0          0
e    0         45       0          0

coxph(Surv(months, disease) ~ cutoff + covariates,
      data = dat)

However, my goal is to predict whether participants had developed the disease by follow-up, not at baseline. I am therefore unsure whether this analysis answers a different statistical question from mine.
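You can see what the proposed layout does by fitting the same Cox model with and without the extra rows at months = 0. The sketch below is an illustration only and assumes R with the survival package. The data frames `followed` and `baseline_only` and all their values are simulated and hypothetical, not the data from this question.

```r
library(survival)

set.seed(1)

# Hypothetical participants seen at follow-up (one row each).
n <- 100
followed <- data.frame(
  months  = round(runif(n, 12, 72)),
  disease = rbinom(n, 1, 0.3),
  cutoff  = rbinom(n, 1, 0.5)
)

# Hypothetical rows in the proposed layout: censored (disease = 0) at months = 0.
baseline_only <- data.frame(
  months  = 0,
  disease = 0,
  cutoff  = rbinom(50, 1, 0.5)
)

fit_followed <- coxph(Surv(months, disease) ~ cutoff, data = followed)
fit_combined <- coxph(Surv(months, disease) ~ cutoff,
                      data = rbind(followed, baseline_only))

# Both fits contain the same events. The time-0 rows are in no risk set at
# any event time (all events occur at months >= 12), so the partial
# likelihood, and hence the estimate, is unchanged.
all.equal(coef(fit_followed), coef(fit_combined))
```

The final comparison should return TRUE. Adding the time-0 rows increases the row count but leaves the estimated coefficient unchanged.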

I was asked to provide some more information about the data:

  • N total = 300
  • n disease = 25
  • n censored = 150
  • Do you know about times to death so that you could model death as a true competing risk? Was there an association of other censoring with the baseline score? Also, I'd suggest that you replace one of your tags (probably simulation) with the propensity-scores tag, to attract the attention of some experts on that. Although usually thought of in terms of propensity of receiving a treatment, that would typically be involved in accounting for censoring too. – EdM Jun 05 '22 at 17:58
  • Thank you for your comment, @EdM. Times to death are unknown. I performed another Cox regression to assess whether the score predicted censoring, but it did not. – a.henrietty Jun 05 '22 at 18:34
  • As I think I did not formulate my question comprehensively enough, I posted an improved version of the question, @EdM. – a.henrietty Jun 06 '22 at 11:41
  • How many cases total, how many had the disease outcome, and how many were censored? Is there some continuous measure underlying the "binary cutoff"? Please provide the information by editing the question, as comments are easy to overlook and can be deleted. – EdM Jun 06 '22 at 14:42
  • I posted an edit to answer your questions, @EdM. Regarding the cut-off, there is indeed a continuous measure, but using it would miss the point of the question and the theory behind it. – a.henrietty Jun 06 '22 at 14:48
  • Using a cutoff will make the result uninterpretable and non-reproducible as such cutoffs don't exist in nature (true discontinuities don't exist other than ones mandated by law such as getting national health insurance in the US when you finally turn 65). Much has been written about this. – Frank Harrell Sep 03 '23 at 12:00

1 Answer


You seem to be using the word "censored" somewhat differently than is standard. A standard survival model (other than a cure model) assumes that all individuals eventually experience the event. In a standard survival model, biomarkers or covariates are only associated with how quickly the event happens. Thus all individuals with disease = 0 at any time are considered right-censored at that time. You seem to be using that word only for those who were observed for the last time at time = 0.
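R's survival package uses this same convention in `Surv(time, event)`: a row with event = 0 is treated as right-censored at its recorded time. The sketch below applies it to the three example rows from the question and assumes the survival package is installed.

```r
library(survival)

# The three example rows from the question.
dat <- data.frame(
  ID      = c("a", "c", "e"),
  disease = c(0, 1, 0),
  months  = c(66, 30, 45),
  cutoff  = c(0, 1, 0)
)

# disease = 0 means "still disease-free when last seen", so rows a and e are
# right-censored at 66 and 45 months; row c is an event at 30 months.
y <- with(dat, Surv(months, disease))
print(y)  # censored times are shown with a trailing "+"
```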

For inverse probability-of-censoring weighting, you need something to weight. As you recognize, those observed for the last time at time = 0 provide no information about times to events. They tell you nothing about disease development over time, and they make no contribution to the Cox partial likelihood at event times. Thus there is nothing to weight for them in the Cox model. For those individuals, you could instead obtain information about how the biomarker and covariates are associated with failure to be seen for follow-up.
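If one still wanted to try the inverse probability weighting mentioned in the question, the participants lost after baseline would enter through a model for being seen at follow-up. The weights would then go on the participants who were seen. The code below is a swapped-in sketch of inverse probability of selection weighting, not a method proposed in this answer. It assumes that loss to follow-up depends only on measured baseline variables. The simulated data frame `cohort` and its column `age` are hypothetical.

```r
library(survival)

set.seed(42)
n <- 300

# Hypothetical cohort; 'age' stands in for any measured baseline covariate.
cohort <- data.frame(
  cutoff = rbinom(n, 1, 0.4),
  age    = rnorm(n, 60, 10)
)
# seen = 1 if the participant attended follow-up (older ones less often).
cohort$seen <- rbinom(n, 1, plogis(-0.05 * (cohort$age - 60)))
# Outcome columns are simulated for everyone but only used for those seen.
cohort$months  <- round(runif(n, 12, 72))
cohort$disease <- rbinom(n, 1, plogis(-2 + cohort$cutoff))

# Step 1: model the probability of being seen from baseline data only.
p_model <- glm(seen ~ cutoff + age, family = binomial, data = cohort)
cohort$p_seen <- fitted(p_model)

# Step 2: keep those seen and weight each by 1 / P(seen), so that they also
# stand in for similar participants who were lost after baseline.
followed <- subset(cohort, seen == 1)
followed$w <- 1 / followed$p_seen

# Step 3: weighted Cox model with robust (sandwich) standard errors.
fit_ipw <- coxph(Surv(months, disease) ~ cutoff, data = followed,
                 weights = w, robust = TRUE)
summary(fit_ipw)
```

The robust standard errors treat the estimated weights as fixed. A bootstrap that re-estimates the weights in each resample is a more careful alternative.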

There are a few more problems with your approach.

One problem with a Cox model arises if you don't have exact times for the event of developing the disease. That's not completely clear from your description. If all you know is that the event happened at some time between two widely spaced visits, you have what are called "interval-censored" event times. These require special handling if you want to use a Cox model, for example with the icenReg package in R.
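With only a baseline and one follow-up visit, a case found at follow-up is only known to have developed the disease somewhere between the two visits. The sketch below shows one way to encode such data as interval-censored with R's survival package. It uses the three example rows from the question and assumes that cases were disease-free at baseline. The column names `lower` and `upper` are hypothetical. The icenReg fitting call is commented out because it needs that package installed.

```r
library(survival)

# The three example rows from the question.
dat <- data.frame(
  months  = c(66, 30, 45),
  disease = c(0, 1, 0),
  cutoff  = c(0, 1, 0)
)

# Case: onset somewhere in (0, months]. Disease-free: right-censored at months.
dat$lower <- ifelse(dat$disease == 1, 0, dat$months)   # last known disease-free time
dat$upper <- ifelse(dat$disease == 1, dat$months, NA)  # NA = no upper bound

y_ic <- with(dat, Surv(lower, upper, type = "interval2"))
print(y_ic)

# A semiparametric proportional hazards fit could then use icenReg, e.g.:
# icenReg::ic_sp(Surv(lower, upper, type = "interval2") ~ cutoff,
#                data = dat, model = "ph")
```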

The second problem is the use of the cutoff for a continuous predictor. That is almost never a good idea. Even if it seems to work on your data set, such cutoffs seldom extend well to new data sets.

The third problem is the limited number of events, 25. By the usual rules of thumb (10-20 events per predictor in a survival model), with so few events you can only consider 1 or 2 unpenalized predictors without risking overfitting the data. You could consider penalizing the covariates (as in ridge regression) to deal with that problem while keeping your main predictor unpenalized.
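In R's survival package, one way to do this is a `ridge()` term inside `coxph()`. Covariates listed in `ridge()` share an L2 penalty, while terms outside it (here `cutoff`) stay unpenalized. The sketch below uses simulated data with roughly 25 events among 150 participants. The covariates `age`, `bmi` and `sex` and the value `theta = 1` are hypothetical choices for illustration.

```r
library(survival)

set.seed(7)
n <- 150

# Hypothetical data; age, bmi and sex stand in for the adjustment covariates.
dat <- data.frame(
  months  = round(runif(n, 12, 72)),
  disease = rbinom(n, 1, 0.17),   # roughly 25 events expected
  cutoff  = rbinom(n, 1, 0.5),
  age     = rnorm(n, 60, 10),
  bmi     = rnorm(n, 27, 4),
  sex     = rbinom(n, 1, 0.5)
)

# cutoff is unpenalized; age, bmi and sex share one ridge (L2) penalty whose
# strength is set by theta (an arbitrary value here). ridge() rescales its
# covariates by default (scale = TRUE) so the penalty treats them alike.
fit_ridge <- coxph(
  Surv(months, disease) ~ cutoff + ridge(age, bmi, sex, theta = 1),
  data = dat
)
fit_ridge
```

`ridge()` also accepts `df` instead of `theta`, which sets the penalty through the effective degrees of freedom of the penalized terms.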

EdM