
I have always thought of a regression model as being based on Y occurring given X, i.e., Y always occurs after X is observed.

  1. Linear regression

Like this...

Example 1.

price of egg = b0 + b1*(chicken's age) + b2*(chicken's biological status) + b3*(duration after birth of egg)

  2. Logistic regression

But I'm very confused when thinking about logistic regression (with the logit link). It is based on the odds ratio, which has a symmetry property: OR(Y given X) = OR(X given Y) (https://en.wikipedia.org/wiki/Odds_ratio#Symmetry).
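This symmetry is easy to check numerically. A minimal sketch in Python (the 2x2 counts below are made up purely for illustration):

```python
# Hypothetical 2x2 table of X against Y (counts are invented):
#            Y=1   Y=0
#   X=1      a=40  b=10
#   X=0      c=20  d=30
a, b, c, d = 40, 10, 20, 30

# OR of Y given X: odds of Y=1 when X=1, divided by odds of Y=1 when X=0
or_y_given_x = (a / b) / (c / d)

# OR of X given Y: odds of X=1 when Y=1, divided by odds of X=1 when Y=0
or_x_given_y = (a / c) / (b / d)

# Both reduce algebraically to a*d / (b*c), here 1200/200 = 6
print(or_y_given_x, or_x_given_y)  # both ~ 6.0
```

Either way of slicing the table gives the same cross-product ratio a*d / (b*c), which is exactly the symmetry the Wikipedia link describes.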

So, extending this to a multiple logistic equation... I think it could mean: "Y does not always occur after X is observed; X can occur after Y."(?)

i.e.

Example 1.

"occurrence of lung cancer = b0 + b1*(age) + b2*(number of comorbidities) + b3*(smoking status before lung cancer occurred)

is equal to

smoking status before lung cancer occurred = b0 + b1*(age) + b2*(number of comorbidities) + b3*(occurrence of lung cancer)"

or

Example 2.

"Dead = b0 + b1*(age) + b2*(number of comorbidities) + b3*(lung cancer)

is equal to

lung cancer occurred = b0 + b1*(age) + b2*(number of comorbidities) + b3*(Dead)"

Is this correct?


EDIT

I found a similar question: Relationship between regressing Y on X, and X on Y in logistic regression

But in my new example below, the odds ratio from multiple logistic regression is not the same as the odds ratio in the original question (which used simple logistic regression).

> y = c(0,0,0,1,1,1,1,1,1,1)
> x = c(0,1,1,0,0,0,1,1,1,1)
> z1 = c(0,1,1,1,1,0,0,0,1,1)
> z2 = c(1,1,0,0,1,0,1,1,0,1)
> z3 = c(0,1,0,1,1,0,1,0,1,0)
> 
> fit = glm(y ~ x, family=binomial(link="logit"))
> coef(summary(fit))
              Estimate Std. Error    z value  Pr(>|z|)
(Intercept)  1.0986123   1.154700  0.9514270 0.3413877
x           -0.4054651   1.443375 -0.2809146 0.7787759
> fit = glm(x ~ y, family=binomial(link="logit"))
> coef(summary(fit))
              Estimate Std. Error    z value  Pr(>|z|)
(Intercept)  0.6931472   1.224745  0.5659524 0.5714261
y           -0.4054651   1.443375 -0.2809145 0.7787760
> 
> fit = glm(y~x + z1 + z2 + z3, family=binomial(link="logit"))
> epiDisplay::logistic.display(fit)

Logistic regression predicting y

             crude OR (95%CI)          adj. OR (95%CI)           P(Wald's test)  P(LR-test)
x: 1 vs 0    0.6667 (0.0394,11.2853)   1.0057 (0.0422,23.9878)   0.997           0.997
z1: 1 vs 0   0.67 (0.04,11.29)         0.3 (0.01,11.61)          0.516           0.496
z2: 1 vs 0   0.67 (0.04,11.29)         0.49 (0.02,11.29)         0.659           0.654
z3: 1 vs 0   2.67 (0.16,45.14)         4.47 (0.15,133.82)        0.388           0.357

Log-likelihood = -5.5623   No. of observations = 10   AIC value = 21.1245

> fit = glm(x~y + z1 + z2 + z3, family=binomial(link="logit"))
> epiDisplay::logistic.display(fit)

Logistic regression predicting x

             crude OR (95%CI)     adj. OR (95%CI)     P(Wald's test)  P(LR-test)
y: 1 vs 0    0.67 (0.04,11.29)    0.96 (0.04,23.87)   0.979           0.979
z1: 1 vs 0   2 (0.15,26.73)       3.37 (0.11,99.3)    0.482           0.462
z2: 1 vs 0   2 (0.15,26.73)       2.85 (0.15,55.24)   0.488           0.475
z3: 1 vs 0   1 (0.08,12.56)       0.61 (0.02,15.96)   0.769           0.765

Log-likelihood = -6.2909   No. of observations = 10   AIC value = 22.5819

Why does this phenomenon occur?

Chan

1 Answer


The odds ratio modelled by logistic regression is not between X and Y but between Y and "not Y". Actually it's not an "odds ratio" but just log odds (the odds themselves are a ratio).
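For illustration, here is a minimal sketch in plain Python (not part of the original post): the left-hand side of a logistic model is the log odds of Y=1 versus Y=0 for a fixed covariate pattern, a quantity involving only Y. As a tie-in with the question's own output: in glm(y ~ x), the intercept is the log odds of y=1 among the x=0 rows, where y has three 1s and one 0.

```python
import math

def log_odds(p):
    """What logistic regression models on the left-hand side:
    log(p / (1 - p)), the log odds of Y=1 vs Y=0 -- X does not appear."""
    return math.log(p / (1 - p))

# Among the x=0 rows of the question's data, P(y=1) = 3/4, so the odds
# are 3/1 and the intercept of glm(y ~ x) is log(3):
print(round(log_odds(3 / 4), 7))  # -> 1.0986123, as in the R output
```

The "ratio" in the odds is a ratio of probabilities of the same variable Y, which is why no symmetry argument involving X applies here.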

In fact, mathematically, in the case of a single binary explanatory variable, logistic regression is directly related to the odds ratio, see https://en.wikipedia.org/wiki/Logistic_regression#The_odds_ratio, but this does not hold with multiple explanatory variables.
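For the single-binary-X case this is easy to verify directly from the question's own data: the slope -0.4054651 reported by both glm(y ~ x) and glm(x ~ y) is just the log of the sample odds ratio of the 2x2 table, which is symmetric in x and y. A sketch in plain Python:

```python
import math

# The y and x vectors from the question's R session
y = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1]
x = [0, 1, 1, 0, 0, 0, 1, 1, 1, 1]

# Cell counts of the 2x2 table
a = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 1)  # x=1, y=1
b = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 0)  # x=1, y=0
c = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 1)  # x=0, y=1
d = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 0)  # x=0, y=0

# The sample odds ratio a*d / (b*c) is symmetric in x and y, so its log
# matches the slope of BOTH simple fits:
log_or = math.log(a * d / (b * c))
print(round(log_or, 7))  # -> -0.4054651, the slope in both R fits
```

Once z1, z2, z3 enter the model, the adjusted coefficient is no longer a function of this single 2x2 table, and the symmetry is lost, which is exactly the discrepancy the question observes.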

  • You mean the difference is between adjustment in the multiple model vs. the simple model? – Chan Jul 25 '21 at 13:47
  • I don't understand this comment. I'm just saying that your question seems to be based on the idea that logistic regression models the odds ratio between a binary variable X (or more than one) and another one Y, the response, hence your argument that it should be symmetric in X and Y, but in fact X (explanatory variable) doesn't occur in the odds ratio modelled by logistic regression. – Christian Hennig Jul 25 '21 at 13:49
  • @Chan see my edit. – Christian Hennig Jul 25 '21 at 13:58
  • Thanks to you, I understand the difference now. You are a good man. – Chan Jul 25 '21 at 14:05