
There is a "traditional" biomarker (binary predictor) used in the diagnosis of a disease (binary outcome) that is expensive for clinical labs to perform. I'm studying alternatives, all of which are analyte measurements (continuous predictors).

How can I compare the two? That is, can we replace the binary predictor with one, several, or a combination of the continuous predictors? I know I can evaluate the binary predictor's performance by comparing it directly to the outcome (precision, recall, etc.) and the continuous predictors by AUC, but I can't think of any measure common to both.

Do I fit them both in logistic regression and compare coefficients or odds ratios? Do I need to find a cutoff (Youden's index?) for the continuous variable and treat it as binary?

I'm probably overthinking this, but I appreciate the help.

1 Answer


Instead of dichotomizing the continuous predictor(s), which is generally a bad idea, take advantage of the probability structure provided by the binary predictor. Then evaluate your continuous predictor(s) in a continuous probability model; don't jump to reduce that continuous probability to a yes/no decision. Any comparison between predictors should be done on the same cases.

You can calculate a Brier score, which for a binary outcome is the mean squared difference between predicted outcome probabilities and observed outcomes. Unless the expensive "traditional" binary predictor is itself used as the definition of the disease, the presence or absence of the biomarker only provides a probability of presence or absence of the disease. That lets you calculate a Brier score for the binary-predictor model. The continuous probability model will likewise provide case-by-case probability estimates for a Brier score.
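As a minimal sketch of that calculation, the snippet below scores both kinds of predictor on the same cases. The data, the function names, and the continuous-model probabilities are all made up for illustration; in practice the probabilities would come from your fitted model, preferably evaluated with cross-validation or bootstrapping rather than in-sample as done here.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must have the same length")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(outcomes)


def probs_from_binary_marker(marker, outcomes):
    """Predicted probability for each case = observed disease rate in its marker group."""
    rate = {}
    for level in (0, 1):
        group = [y for m, y in zip(marker, outcomes) if m == level]
        rate[level] = sum(group) / len(group)
    return [rate[m] for m in marker]


# Hypothetical toy data: the same 8 cases for both predictors.
disease = [1, 1, 1, 0, 0, 0, 1, 0]   # observed outcome
marker = [1, 1, 0, 0, 0, 1, 1, 0]    # traditional binary biomarker
# Hypothetical probabilities from a model based on a continuous analyte.
cont_probs = [0.9, 0.8, 0.6, 0.2, 0.1, 0.4, 0.7, 0.3]

brier_binary = brier_score(probs_from_binary_marker(marker, disease), disease)
brier_cont = brier_score(cont_probs, disease)

print(f"Brier score, binary marker:    {brier_binary:.4f}")
print(f"Brier score, continuous model: {brier_cont:.4f}")
```

Lower is better. The point is that the binary marker, once converted to group-wise disease probabilities, is scored on exactly the same scale as the continuous model.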

You can use the Akaike Information Criterion (AIC) to compare models built on the same data set but with the different biomarker choices. That's typically available in the summary of a logistic regression model. Some question whether it's appropriate for comparisons of non-nested models, but the consensus seems to be that it's OK. See this page and its many links.
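To illustrate the AIC comparison, here is a rough sketch on simulated data. Everything in it is an assumption made for the example: the simulated `analyte`, `marker`, and `disease` variables, and the small Newton-Raphson fitter written in NumPy to keep the snippet self-contained. In real work you would normally use your statistical package's logistic regression and read the AIC from its summary.

```python
import numpy as np


def fit_logistic(X, y, n_iter=50, tol=1e-10):
    """Fit a logistic regression by Newton-Raphson; return (coefficients, log-likelihood)."""
    X = np.column_stack([np.ones(len(y)), X])  # prepend an intercept column
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)                           # score vector
        hess = X.T @ (X * (p * (1.0 - p))[:, None])    # observed information
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    loglik = float(np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)))
    return beta, loglik


def aic(loglik, n_params):
    """Akaike Information Criterion: 2k - 2 log L."""
    return 2 * n_params - 2 * loglik


# Simulated (hypothetical) data: disease depends on a continuous analyte,
# and the binary marker is a noisy dichotomized version of the same signal.
rng = np.random.default_rng(0)
n = 500
analyte = rng.normal(size=n)
disease = rng.binomial(1, 1.0 / (1.0 + np.exp(-(-0.5 + 1.5 * analyte))))
marker = (analyte + rng.normal(size=n) > 0).astype(float)

# Both models are fit to the same cases; each has 2 parameters (intercept + slope).
_, ll_marker = fit_logistic(marker, disease)
_, ll_analyte = fit_logistic(analyte, disease)
aic_marker = aic(ll_marker, 2)
aic_analyte = aic(ll_analyte, 2)

print(f"AIC, binary marker model:      {aic_marker:.1f}")
print(f"AIC, continuous analyte model: {aic_analyte:.1f}")
```

The model with the lower AIC is preferred. Note that the two models must be fit to exactly the same cases for the comparison to make sense.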

You say that the biomarkers are "used in the diagnosis of a disease," which implies that there are other clinical variables used in diagnosis. A better approach might be to build logistic regression models that include both the biomarker(s) and the clinical variables, and see how much information the biomarker(s) add. Frank Harrell's post on added value explains how to do that, with an extensively worked-through example of a model for significant coronary artery disease. AIC comparisons could be done here, too.
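One simple way to quantify added information, sketched below, is a likelihood-ratio test between a base model with clinical variables only and a full model that also includes the biomarker. This is only an illustration under stated assumptions, not a reproduction of Harrell's worked example: the clinical variables (`age`, `sex`), the simulated data, and the hand-written NumPy fitter are all hypothetical.

```python
import math

import numpy as np


def fit_logistic(X, y, n_iter=50, tol=1e-10):
    """Fit a logistic regression by Newton-Raphson; return (coefficients, log-likelihood)."""
    X = np.column_stack([np.ones(len(y)), X])  # prepend an intercept column
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    loglik = float(np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)))
    return beta, loglik


# Simulated (hypothetical) cohort: two clinical variables plus one analyte.
rng = np.random.default_rng(1)
n = 500
age = rng.uniform(30, 80, size=n)
age_std = (age - age.mean()) / age.std()   # standardize for numerical stability
sex = rng.integers(0, 2, size=n).astype(float)
analyte = rng.normal(size=n)
logit = -1.0 + 0.5 * age_std + 0.3 * sex + 1.0 * analyte
disease = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))

# Base model: clinical variables only. Full model: clinical variables + biomarker.
_, ll_base = fit_logistic(np.column_stack([age_std, sex]), disease)
_, ll_full = fit_logistic(np.column_stack([age_std, sex, analyte]), disease)

# Likelihood-ratio chi-square for the one added parameter (1 degree of freedom).
lr_chi2 = 2.0 * (ll_full - ll_base)
# Survival function of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(lr_chi2 / 2.0))

print(f"LR chi-square for adding the analyte: {lr_chi2:.1f} (1 df), p = {p_value:.3g}")
```

Because the base model is nested in the full model, the likelihood-ratio statistic is a legitimate test here; the same base model could be extended with the binary marker instead to see how much each biomarker adds.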

Note that there are many pitfalls in developing biomarkers, described in Harrell's post on "How to Do Bad Biomarker Research". His Regression Modeling Strategies contains much useful advice on regression modeling. In particular for your interests, see the parts of Chapter 2 on allowing for non-linearity of outcome associations with continuous predictors, and Chapter 4 on how to build models without overfitting.

EdM
  • Thank you for the very complete answer! Just to answer you, there are other clinical variables but they're in patient records which I cannot access (unlike the biomarkers which are in my lab's database). I'll try using age+gender as the base model and see how it goes from there – Daniel Nunes May 25 '23 at 11:31
  • @DanielNunes it should be possible to get access to the patient records, with permission from your institutional review board for human-subjects research, provided that protections are in place for privacy and confidentiality. You should work with clinicians having experience in the field to see what they consider important. Your study will be much more useful if you incorporate more clinical data and expertise. – EdM May 25 '23 at 14:01