
I'm reporting odds ratios for a logistic regression. I'm including p-values but not an R-squared; elsewhere, a video says that "Odds Ratios and Log(Odds Ratios) are like R-Squared".

Is this correct? Should you add, e.g., McFadden's R^2 to odds ratio results?

pluke

2 Answers


That video did not seem very helpful to me.

I guess it's technically correct that odds ratios and R2s both "explain the relationship between two things," but you could also say that about t values, OLS coefficients, chi-square tests, Kendall's tau, hazard ratios, risk ratios, and a zillion other things in statistics that are very, very different in other important ways. It's like saying "Taylor Swift and sea snails both contain DNA" - technically true, but not very helpful in explaining either of these two things.

Odds ratios are a transformation of the coefficients you get from a logistic regression model (which the video correctly notes are just the log of the odds ratios). So just like in linear regression, you get one odds ratio/coefficient for each independent variable, and each one tells you the relationship between that independent variable and the dependent variable, holding all other variables constant. And you get a p value for each one, which tells you if the coefficient is significantly different from zero (or if the odds ratio is significantly different from one, which is the same thing).
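To make that transformation concrete, here's a minimal Python sketch. The coefficient and standard error are made-up numbers standing in for logistic regression output; the point is only the exp() relationship between coefficients, odds ratios, and their confidence intervals:

```python
import math

# Hypothetical logistic-regression output for one predictor (illustrative numbers,
# not from any real model):
beta = 0.53   # the raw coefficient, i.e. the log odds ratio
se = 0.21     # its standard error

odds_ratio = math.exp(beta)  # ~1.70

# A 95% CI for the odds ratio is the exponentiated CI of the coefficient:
ci_low = math.exp(beta - 1.96 * se)
ci_high = math.exp(beta + 1.96 * se)

# Testing "coefficient = 0" and testing "odds ratio = 1" are the same test,
# because exp(0) = 1 -- which is why the p-value is shared between them.
```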

As a side note - lots of people (like me) think that odds ratios are really confusing because people often misinterpret them as risk ratios. It might be better to just report the coefficients (log odds ratios) because those are obviously uninterpretable, and then use some other approach (like average marginal effects) to produce a measure of effect size in terms of changes in probability.
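A rough sketch of the average-marginal-effects idea, with invented coefficients and a toy sample (real software like Stata's `margins` or R's `marginaleffects` does this properly, with standard errors):

```python
import math

def inv_logit(z):
    # Convert log odds back to a probability.
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical fitted model: logit(p) = b0 + b1*treat + b2*age
b0, b1, b2 = -1.2, 0.53, 0.02
ages = [25, 40, 55, 70]  # toy sample of observed covariate values

# Average marginal effect of 'treat': flip treat from 0 to 1 for every
# observation (holding age at its observed value) and average the change
# in predicted probability.
ame = sum(
    inv_logit(b0 + b1 * 1 + b2 * a) - inv_logit(b0 + b1 * 0 + b2 * a)
    for a in ages
) / len(ages)
# 'ame' is on the probability scale: "treatment raises the predicted
# probability by about ame*100 percentage points on average" -- a statement
# people actually understand, unlike an odds ratio.
```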

Now for R2: in linear regression/OLS, the R2 is something we calculate for the model as a whole - you get one R2 value for the whole model, which tells you the percentage of the variance in the dependent variable explained by all of the independent variables together. Now in a logit model we're not "explaining variance" at all, so the entire concept of an R2 is meaningless. But people have tried to come up with various "pseudo-R2" values that could serve as similar diagnostic tools for the model as a whole. The McFadden R2 is one of these, but it doesn't really have much to do with the R2 from linear regression: it shows how the log likelihood of the observed data changes between a null model and the full model. Personally, I never report any pseudo-R2 from a logit model, but I know others do.
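The McFadden calculation itself is just a ratio of the two log likelihoods. A sketch with illustrative numbers (not from any real dataset):

```python
def mcfadden_r2(ll_full, ll_null):
    # McFadden's pseudo-R^2: 1 - (log likelihood of the full model /
    # log likelihood of an intercept-only "null" model). Both log
    # likelihoods are negative; a better-fitting full model is closer
    # to zero, pushing the ratio down and the pseudo-R^2 up.
    return 1.0 - ll_full / ll_null

# Illustrative log likelihoods:
r2 = mcfadden_r2(ll_full=-120.5, ll_null=-160.2)  # ~0.248
```

Note that, unlike the OLS R2, this is not a share of explained variance - it has no "percentage of variance" interpretation at all.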

So in short - an odds ratio is a (potentially confusing) measure of association for an individual variable, and McFadden's R2 is just one of various "pseudo-R2"s people have come up with to serve as a measure of the explanatory power of the model as a whole, which are kinda sorta like the R2 from an OLS model. You would not calculate or report a pseudo-R2 value for every odds ratio, but you might report one for the model as a whole.

  • While it's certainly confusing to try to explain an odds ratio in terms of a global fit statistic, it's not necessarily true that an odds ratio shouldn't be presented at all because it's prone to being misunderstood. – AdamO May 26 '23 at 13:37
  • I am sure that there is some context in which the odds ratio has value. I've just spent so many hours of my life painstakingly explaining to confused students, authors, reviewers, and even other teachers that odds ratios are not risk ratios, and I've seen this mistake crop up so many times in published peer reviewed papers, that I personally think the world would be a better place if everyone just forgot that odds ratios existed. – Graham Wright May 26 '23 at 14:20
  • Did you not in those cases simply consider fitting the relative risk instead? – AdamO May 26 '23 at 15:39
  • It's not a matter of my own models. I know to calculate predicted probabilities or AMEs if I want to frame my effect size in terms that people actually understand. It's that I see students writing in papers or dissertations, or MDs writing in published, peer reviewed journal articles, that a treatment increased the likelihood of recovery by 70% because the odds ratio from a logit model was 1.7. Someone somewhere has been wrongly prescribed a drug because of this very basic statistical error. It's not out of the realm of possibility that someone somewhere has been actually killed by it. – Graham Wright May 26 '23 at 17:10
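The misinterpretation discussed in these comments is easy to demonstrate with arithmetic. Using a made-up baseline probability of 0.5, an odds ratio of 1.7 corresponds to nowhere near a 70% increase in risk:

```python
# Why an odds ratio of 1.7 is NOT "a 70% higher chance of recovery".
p_control = 0.50                             # hypothetical baseline probability
odds_control = p_control / (1 - p_control)   # 1.0
odds_treat = odds_control * 1.7              # apply the odds ratio
p_treat = odds_treat / (1 + odds_treat)      # ~0.63

risk_ratio = p_treat / p_control             # ~1.26, not 1.7
# Only when the outcome is rare (small p_control) does the odds ratio
# approximate the risk ratio; at common baseline risks they diverge badly.
```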

No, you should not report $R^2$ for a logistic regression analysis. My thought is that this would be misleading because a "perfect fit" is anomalous in logistic regression and inference cannot be performed, plus a linear regression would still give you the "optimal" $R^2$ even when the distribution of the response is binary. While there are versions of $R^2$ statistics meant for analysis of categorical data, I disagree that these are somehow necessary or even useful for understanding the odds ratios themselves.

The video specifically says: "The odds ratio - and the log odds ratio - are like the R^2 [in that] they indicate a relationship between two things". Which... well, so does an OLS regression slope, a Pearson correlation, a Wilcoxon U-statistic, a covariance, a hazard ratio, a risk ratio, a risk difference, a harmonic mean difference, ... the list goes on. In fact, the comparison becomes more tenuous when you consider the problem of multivariate adjustment. In that case, the OR should be compared to a regression slope in a linear model, because it summarizes a bivariate association in a multivariate projection, whereas the $R^2$ is simply a bivariate association (note it relates the fitted values to the actual values irrespective of the individual components).

The point of fitting a logistic regression is precisely to obtain odds ratios. To test a hypothesis of association between an exposure and a response, you can formulate a null hypothesis that the odds ratio is equal to 1. The inference that one performs on the odds ratio is powerful and robust. Just like with linear regression, the best way to understand an odds ratio is simply to present it and its 95% confidence interval. One can go further by simply tabulating the data if the analysis is largely categorical, or by plotting the continuous exposure versus the discrete response and fitting a smoothed curve relating the mean response to the exposure. This "S shaped" curve is precisely what logistic regression estimates. The odds ratio is the slope of that curve, with a value of 1 indicating a completely flat probability response and a value of $\infty$ indicating a step function.
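A small sketch of those two extremes, with hypothetical coefficients (the slope coefficient is the log odds ratio, so OR = 1 means slope 0):

```python
import math

def logit_prob(x, b0, b1):
    # The fitted "S-shaped" probability curve from a logistic regression:
    # P(Y = 1 | x) = 1 / (1 + exp(-(b0 + b1*x))), where b1 = log(OR).
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

# OR = 1 means b1 = log(1) = 0: the probability curve is completely flat.
flat = [logit_prob(x, 0.0, 0.0) for x in (-5, 0, 5)]    # all 0.5

# A huge OR (large b1) approaches a step function around x = -b0/b1.
steep = [logit_prob(x, 0.0, 10.0) for x in (-1, 0, 1)]  # near 0, 0.5, near 1
```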

AdamO