I have a dataset of patients who underwent an operation, and I collected information on post-surgery complications such as necrosis (binary outcome variable). Now, I would like to investigate whether independent variables such as age (continuous), gender (binary), or underlying disease (categorical) influence the likelihood of experiencing complications.

My research suggests using a logistic regression model for binary outcome variables. Running it through R also gave me promising results. It just does not feel right. Is my approach the right one?

This is my code in R:

glm(necrosis_y_n ~ age_op, data = df, family = binomial) 

The result was statistically significant, with an odds ratio of $1.053$ per one-unit increase in age. Still, it does not feel like the right approach statistically.
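For reference, a minimal sketch of how the same call could be extended to the other predictors mentioned above. The real data are not shown here, so the data frame is simulated; the column names `gender` and `disease`, their levels, and all generating coefficients are assumptions made only for illustration.

```r
# Simulated stand-in for the real data; every value here is hypothetical.
set.seed(1)
n <- 500
df <- data.frame(
  age_op  = round(runif(n, 20, 90)),
  gender  = factor(sample(c("f", "m"), n, replace = TRUE)),
  disease = factor(sample(c("tumor", "trauma", "infection"), n, replace = TRUE))
)

# Generating model used only to create a fake binary outcome
lp <- -4 + 0.05 * df$age_op + 0.6 * (df$disease == "infection")
df$necrosis_y_n <- rbinom(n, 1, plogis(lp))

# Logistic regression with several predictors at once
fit <- glm(necrosis_y_n ~ age_op + gender + disease,
           data = df, family = binomial)

# Odds ratios with Wald 95% confidence intervals
or_table <- exp(cbind(OR = coef(fit), confint.default(fit)))
print(round(or_table, 3))
```

Each row of `or_table` is an odds ratio adjusted for the other predictors in the model; for the factor variables it is relative to the reference level (the first level alphabetically, unless releveled).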

  • 8
    Can you tell us what you don't like about this? Logistic regression is absolutely the first thing that comes to mind. – Stephan Kolassa Nov 22 '23 at 21:58
  • 2
    There are other options for the link between the linear predictor and a binary outcome (besides the logit link used for logistic regression, R also allows for probit, cauchit, log, and complementary log-log), but the default logit/logistic regression typically works well. To improve the model, you might consider more flexible fitting of continuous predictors like age, for example with a regression spline. – EdM Nov 22 '23 at 22:08
  • 1
    A logistic regression might not be the best way to go, but it is the first idea that will come to mind for most. I second the inquiry about what feels wrong about running a logistic regression. Until we know what you don’t like about the logistic regression approach, I do not see a way to suggest alternatives. – Dave Nov 23 '23 at 01:23
  • Well, because patients with one complication tend to have other complications too. Not because the complications necessarily influence each other, but because the underlying causative diagnosis is more severe, so these patients are more likely to develop any of the complications. Furthermore, some of the outcome variables are definitely somewhat dependent on each other. – Nicolas Pensel Dec 07 '23 at 21:25
  • So you want to model the presence of multiple types of complications? – Dave Dec 07 '23 at 21:29
  • The surgery is removal of the eye. There are multiple types of post-surgical complications of interest, such as infection, inflammation, necrosis, atrophy, hanging eyelid, growing hair, deep sulcus, and so on. Without looking at the data, one could assume that older patients tend to have more complications. Also, some eyes needed to be removed because of a tumor, others because of accidents or severe infection, and the latter tend to be more prone to complications. So I want to test and combine several predictor and outcome variables to confirm my hypothesis. – Nicolas Pensel Dec 07 '23 at 21:46
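The suggestions in the comments above (a more flexible fit for age, and alternative link functions) could be tried roughly as follows. This is only a sketch on simulated data: the generating coefficients, the choice of 3 spline degrees of freedom, and the probit comparison are assumptions for illustration, not recommendations from the thread.

```r
library(splines)  # ships with base R; provides ns()

# Simulated stand-in for the real data; values are hypothetical.
set.seed(2)
n <- 500
df <- data.frame(age_op = round(runif(n, 20, 90)))
df$necrosis_y_n <- rbinom(n, 1, plogis(-4 + 0.05 * df$age_op))

# Age entered linearly on the log-odds scale
fit_linear <- glm(necrosis_y_n ~ age_op, data = df, family = binomial)

# Age entered as a natural cubic spline (here "df = 3" is the spline's
# degrees of freedom, not the data frame of the same name)
fit_spline <- glm(necrosis_y_n ~ ns(age_op, df = 3),
                  data = df, family = binomial)

# Likelihood-ratio test: does the spline improve on the linear term?
lrt <- anova(fit_linear, fit_spline, test = "Chisq")
print(lrt)

# Same linear predictor with a probit link instead of the default logit
fit_probit <- glm(necrosis_y_n ~ age_op, data = df,
                  family = binomial(link = "probit"))

# Compare the three fits by AIC (lower is better)
aic_table <- AIC(fit_linear, fit_spline, fit_probit)
print(aic_table)
```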

2 Answers

2

This is absolutely the solution to the problem. Logistic regression is set up specifically to model the probability of an event that has only two possible outcomes, so you need a link function that constrains the predicted probabilities to lie between 0 and 1. It takes the log odds and models them as a linear function of the predictors:

$$ \log \left(\frac{P}{1-P}\right) = \beta_0 + \beta_1 x_1, $$

where $P$ is the probability of the event, $\beta_0$ is the intercept (the log odds of the event when $x_1 = 0$), and $\beta_1$ is the slope (the change in log odds per one-unit increase in $x_1$). The model extends directly to additional predictors like yours.
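To make the scale concrete, here is a small arithmetic sketch using the odds ratio of 1.053 reported in the question. It assumes age is recorded in years, and the intercept value is purely hypothetical, since the question does not report one.

```r
# Odds ratio per one-unit increase in age, as reported in the question
or_per_unit <- 1.053

# The slope beta_1 lives on the log-odds scale
beta_1 <- log(or_per_unit)

# Odds ratio for a 10-unit increase in age: exp(10 * beta_1),
# which is the same as or_per_unit^10
or_per_10 <- exp(10 * beta_1)

# Hypothetical intercept, chosen only to illustrate the conversion
beta_0 <- -4

# Inverse logit turns log odds into a probability, here at age 60
p_at_60 <- plogis(beta_0 + beta_1 * 60)

print(c(beta_1 = beta_1, or_per_10 = or_per_10, p_at_60 = p_at_60))
```

The point of the sketch is that odds ratios multiply: an increase of ten units multiplies the odds by the per-unit odds ratio raised to the tenth power, not by ten times that ratio.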

Perhaps you have reticence about using logistic regression in R. I strongly recommend reading through this book, which is free and does a great job of explaining everything both in terms of what logistic regression is and how to use it in R. A shorter but obviously less comprehensive treatment can be found here. And a previous answer I gave here discusses why a normal regression doesn't work for your case.

User1865345
  • 8,202
1

Yes, logistic regression is the correct statistical approach here, because you have a binary outcome variable (complications: yes/no) that you want to predict from your predictors (e.g., age, gender). To obtain the probability of Y = 1 (complications = yes) for an individual, plug that individual's predictor values into the fitted model formula; see, for instance, http://faculty.cas.usf.edu/mbrannick/regression/Logistic.html
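In R, "plugging in the values" might look like the sketch below. The data are simulated, and the column `gender`, its levels, and the example patient are assumptions for illustration only.

```r
# Simulated stand-in for the real data; values are hypothetical.
set.seed(3)
n <- 500
df <- data.frame(
  age_op = round(runif(n, 20, 90)),
  gender = factor(sample(c("f", "m"), n, replace = TRUE))
)
df$necrosis_y_n <- rbinom(n, 1, plogis(-4 + 0.05 * df$age_op))

fit <- glm(necrosis_y_n ~ age_op + gender, data = df, family = binomial)

# A hypothetical 70-year-old female patient
new_patient <- data.frame(
  age_op = 70,
  gender = factor("f", levels = levels(df$gender))
)

# Predicted probability of the complication from the fitted model
p_predict <- predict(fit, newdata = new_patient, type = "response")

# The same number by hand: inverse logit of the linear predictor.
# "f" is the reference level, so the gender coefficient drops out.
b <- coef(fit)
p_manual <- plogis(b["(Intercept)"] + b["age_op"] * 70)

print(c(predict = unname(p_predict), manual = unname(p_manual)))
```

Both routes should give the same probability; `predict()` is less error-prone once the model has several predictors.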

  • 1
    There is a rich literature on the analysis of complications in medical outcomes research, health services research, and clinical epidemiology. It is seldom optimal to analyze complications separately, and they are usually too rare to provide an adequate effective sample size for modeling. Some of the better options include expert clinically derived numeric or ordinal outcome scales that capture the impact/severity of complications, use of patient utilities in scoring, mortality-based scoring, and hierarchical ordinal scales. – Frank Harrell Nov 23 '23 at 11:22