Why does the b-value in logistic regression tell us how much the odds change when X increases by one? For example, suppose we have people who smoke and people who don't, and we are trying to find out how smoking affects coronary artery disease. We have 500 people in total: 210 who smoke and 290 who don't. Of those who smoke, 81 have coronary artery disease, and of those who don't, 62 have coronary artery disease. If I run a logistic regression on this in SPSS, I get a b-value of 0.837, and this value hasn't been explained at all in my materials. Could someone tell me what the b coefficient is and what it has to do with odds?
-
Welcome to CV! You might find this post helpful: Help me understand adjusted odds ratio in logistic regression – medium-dimensional Sep 21 '23 at 11:19
-
By b-value, are you referring to the $\beta$ coefficient? – Shawn Hemelstrand Sep 21 '23 at 11:24
-
Yes, the β coefficient. Sorry, I wrote that wrong. – Topi Sep 21 '23 at 11:25
1 Answer
A logistic regression assumes that the log-odds of $Y_i = 1$ are given by $$\text{logit} P(Y_i=1| \text{covariate vector } \boldsymbol{X}_i \text{ for observation } i) := \beta_0 + \boldsymbol{\beta} \boldsymbol{X}_i,$$ where $\text{logit}(x) = \log(x) - \log(1-x) = \log(x/(1-x))$ (i.e. it turns a probability into the logarithm of the odds, or "log-odds").
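To make the logit transformation concrete, here is a minimal sketch of it and its inverse in plain Python. The function names `logit` and `inv_logit` are illustrative, not from any particular library.

```python
import math

def logit(p: float) -> float:
    """Turn a probability p in (0, 1) into log-odds: log(p) - log(1 - p)."""
    return math.log(p) - math.log(1 - p)

def inv_logit(eta: float) -> float:
    """Turn log-odds back into a probability: exp(eta) / (1 + exp(eta))."""
    return math.exp(eta) / (1 + math.exp(eta))

# A probability of 0.5 means odds of 1, i.e. log-odds of 0.
print(logit(0.5))             # 0.0
# The two functions undo each other (up to floating-point rounding).
print(inv_logit(logit(0.2)))
```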
In your case this simplifies to $$\text{logit} P(Y_i=1| \text{smoking}_i) := \beta_0 + \beta_1\text{smoking}_i,$$ where $\text{smoking}_i = 0$ for non-smokers and $\text{smoking}_i=1$ for smokers.
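One way to check that the SPSS coefficient is simply the maximum-likelihood estimate of this model is to fit it yourself. The sketch below does this with Newton-Raphson on the grouped counts from the question (290 non-smokers with 62 cases, 210 smokers with 81 cases). The function name `fit_logistic` and the fixed iteration count are illustrative assumptions; this is not necessarily how SPSS fits the model internally, but any correct maximum-likelihood routine should land on the same estimates.

```python
import math

# Grouped data from the question: (smoking indicator, people, people with CAD)
groups = [(0, 290, 62), (1, 210, 81)]

def fit_logistic(groups, iters=25):
    """Maximum-likelihood fit of logit P(Y=1|x) = b0 + b1*x by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        # Score vector (g) and information matrix (h = negative Hessian)
        g0 = g1 = 0.0
        h00 = h01 = h11 = 0.0
        for x, n, y in groups:
            p = 1 / (1 + math.exp(-(b0 + b1 * x)))  # fitted probability
            r = y - n * p          # observed minus expected events
            w = n * p * (1 - p)    # binomial variance weight
            g0 += r
            g1 += r * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        # Newton step: solve [h00 h01; h01 h11] @ step = [g0; g1]
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

b0, b1 = fit_logistic(groups)
print(round(b0, 3), round(b1, 3))  # -1.302 0.837
```

With a single binary predictor the model is saturated, so the fitted $\beta_1$ equals the log of the odds ratio computed directly from the 2×2 table.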
Let's assume this is from a trial design (e.g. a randomized trial), where you can unproblematically use such a simple logistic regression to infer causality. In that case,
- $\beta_0$ gives the log-odds of having an event for non-smokers, i.e.
- $\text{logit} P(Y_i=1| \text{smoking}_i=0) := \beta_0$ and
- thus, the odds are $\exp(\beta_0)$ or
- equivalently the probability $P(Y_i=1| \text{smoking}_i=0) = \exp(\beta_0)/(1+\exp(\beta_0))$.
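Plugging in the non-smoker counts from the question gives a quick numerical check of the bullets above: $\beta_0$ is the log of the non-smokers' odds, and back-transforming it recovers their observed proportion. This is a minimal sketch; the variable names are illustrative.

```python
import math

# Non-smokers in the question: 290 people, 62 with coronary artery disease
events, total = 62, 290
odds_nonsmokers = events / (total - events)    # 62 / 228
beta0 = math.log(odds_nonsmokers)              # log-odds for non-smokers
prob = math.exp(beta0) / (1 + math.exp(beta0)) # back to a probability

print(round(beta0, 3))                           # -1.302
print(round(prob, 4), round(events / total, 4))  # 0.2138 0.2138
```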
Similarly,
- $\beta_0 + \beta_1$ gives the log-odds of having an event for smokers, i.e.
- $\text{logit} P(Y_i=1| \text{smoking}_i=1) := \beta_0 + \beta_1$ and
- thus, the odds are $\exp(\beta_0 + \beta_1)$ or
- equivalently the probability $P(Y_i=1| \text{smoking}_i=1) = \exp(\beta_0 + \beta_1)/(1+\exp(\beta_0 + \beta_1))$.
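The same check for smokers, again as a minimal sketch with illustrative names: their log-odds are $\beta_0 + \beta_1$, so subtracting the non-smokers' log-odds isolates $\beta_1$, and back-transforming $\beta_0 + \beta_1$ recovers the smokers' observed proportion.

```python
import math

# Non-smokers: 290 people, 62 with CAD; smokers: 210 people, 81 with CAD
beta0 = math.log(62 / (290 - 62))              # log-odds, non-smokers
log_odds_smokers = math.log(81 / (210 - 81))   # equals beta0 + beta1
beta1 = log_odds_smokers - beta0

eta = beta0 + beta1
prob_smokers = math.exp(eta) / (1 + math.exp(eta))

print(round(log_odds_smokers, 3))                  # -0.465
print(round(prob_smokers, 4), round(81 / 210, 4))  # 0.3857 0.3857
```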
If you form the ratio of the odds, then you have $$\exp(\beta_0 + \beta_1) / \exp(\beta_0) = \exp(\beta_1).$$ This is called the odds ratio (i.e. simply the ratio of the odds) between smokers and non-smokers, and the logarithm of it is the log-odds ratio (in this case $\beta_1$).
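Computing the odds ratio directly from the 2×2 table in the question and taking its logarithm reproduces the $\hat{\beta}_1 = 0.837$ reported by SPSS. A minimal sketch with illustrative variable names:

```python
import math

# 2x2 table from the question
smokers_cad, smokers_total = 81, 210
nonsmokers_cad, nonsmokers_total = 62, 290

odds_smokers = smokers_cad / (smokers_total - smokers_cad)              # 81 / 129
odds_nonsmokers = nonsmokers_cad / (nonsmokers_total - nonsmokers_cad)  # 62 / 228

odds_ratio = odds_smokers / odds_nonsmokers  # exp(beta_1)
log_odds_ratio = math.log(odds_ratio)        # beta_1

print(round(odds_ratio, 2))      # 2.31
print(round(log_odds_ratio, 3))  # 0.837
```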
-
Thank you very much for this, it made logistic regression much easier to understand. However, why do my materials say that smoking increases the odds of coronary artery disease by approximately 84% (the same as β1)? Is this wrong? – Topi Sep 21 '23 at 11:36
-
Shouldn't the odds ratio then tell how much smoking increases the risk of coronary artery disease, i.e. $(\exp(\beta_1) - 1) \times 100\%$? – Topi Sep 21 '23 at 11:38
-
If you have an odds ratio of 1.84 ($\hat{\beta}_1 \approx 0.61$), you could express that as increasing the odds by 84%. However, with the numbers you give, the directly calculated odds ratio is $\frac{81/210}{1-81/210} \times \frac{1-62/290}{62/290} \approx 2.3$, which ties in very nicely with the $\hat{\beta}_1 = 0.837$ you mention, since $\exp(0.837) \approx 2.3$. So the odds are 2.3 times higher (i.e. 130% higher). – Björn Sep 21 '23 at 12:20
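The conversion from a coefficient to a percentage change in the odds is $(\exp(\beta_1) - 1) \times 100\%$, and going the other way an 84% increase corresponds to $\beta_1 = \log(1.84)$. The sketch below (the helper name is hypothetical) shows both directions. A likely source of the "84%" is reading $\beta_1 = 0.837$ itself as a percentage: $\exp(\beta_1) - 1 \approx \beta_1$ only holds when $\beta_1$ is close to 0, and 0.837 is far too large for that approximation.

```python
import math

def percent_change_in_odds(beta1: float) -> float:
    """Percentage change in the odds for a one-unit increase in X."""
    return (math.exp(beta1) - 1) * 100

print(round(percent_change_in_odds(0.837), 1))  # 130.9
print(round(math.log(1.84), 2))                 # 0.61 (the beta1 an 84% increase needs)
```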
-
Yes, I thought so too. So my materials wrongly state that smoking increases the odds by 84%; it should be 130%. This is what confused me when I was studying. Now everything makes sense. Thank you again! – Topi Sep 21 '23 at 12:30