
I have always thought of a regression model as being based on Y occurring given X, i.e., Y always occurs after X is observed.

  1. Linear regression

Like this...

Example 1.

price of egg = b0 + b1*(chicken's age) + b2*(chicken's biological status) + b3*(duration after birth of egg)

  2. Logistic regression

But I'm very confused when thinking about logistic regression (with the logit link). It is based on the odds ratio, which has a symmetry property: OR(Y given X) = OR(X given Y) (https://en.wikipedia.org/wiki/Odds_ratio#Symmetry).
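This symmetry is easy to check numerically. A minimal sketch in Python (the 2x2 counts below are made up purely for illustration):

```python
# Hypothetical 2x2 table of X against Y (counts are invented):
#            Y=1   Y=0
#   X=1      a=40  b=10
#   X=0      c=20  d=30
a, b, c, d = 40, 10, 20, 30

# OR of Y given X: odds of Y=1 when X=1, divided by odds of Y=1 when X=0
or_y_given_x = (a / b) / (c / d)

# OR of X given Y: odds of X=1 when Y=1, divided by odds of X=1 when Y=0
or_x_given_y = (a / c) / (b / d)

# Both reduce algebraically to a*d / (b*c), here 1200/200 = 6
print(or_y_given_x, or_x_given_y)  # both ~ 6.0
```

Either way of slicing the table gives the same cross-product ratio a*d / (b*c), which is exactly the symmetry the Wikipedia link describes.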

So, extending this to a multiple logistic equation... I think it could mean: "Y does not always occur after X is observed; X can occur after Y."(?)

i.e.

Example 1.

"occurrence of lung cancer = b0 + b1*(age) + b2*(number of comorbidities) + b3*(smoking status before lung cancer occurred)

is equal to

smoking status before lung cancer occurred = b0 + b1*(age) + b2*(number of comorbidities) + b3*(occurrence of lung cancer)"

or

Example 2.

"Dead = b0 + b1*(age) + b2*(number of comorbidities) + b3*(lung cancer)

is equal to

lung cancer occurred = b0 + b1*(age) + b2*(number of comorbidities) + b3*(Dead)"

Is this correct?


EDIT

I found a similar question: Relationship between regressing Y on X, and X on Y in logistic regression

But in my new example below, the odds ratio from multiple logistic regression is not the same as the odds ratio in the original question (which used simple logistic regression).

> y = c(0,0,0,1,1,1,1,1,1,1)
> x = c(0,1,1,0,0,0,1,1,1,1)
> z1 = c(0,1,1,1,1,0,0,0,1,1)
> z2 = c(1,1,0,0,1,0,1,1,0,1)
> z3 = c(0,1,0,1,1,0,1,0,1,0)
> 
> fit = glm(y ~ x, family=binomial(link="logit"))
> coef(summary(fit))
              Estimate Std. Error    z value  Pr(>|z|)
(Intercept)  1.0986123   1.154700  0.9514270 0.3413877
x           -0.4054651   1.443375 -0.2809146 0.7787759
> fit = glm(x ~ y, family=binomial(link="logit"))
> coef(summary(fit))
              Estimate Std. Error    z value  Pr(>|z|)
(Intercept)  0.6931472   1.224745  0.5659524 0.5714261
y           -0.4054651   1.443375 -0.2809145 0.7787760
> 
> fit = glm(y~x + z1 + z2 + z3, family=binomial(link="logit"))
> epiDisplay::logistic.display(fit)

Logistic regression predicting y

             crude OR (95%CI)          adj. OR (95%CI)           P(Wald's test)  P(LR-test)
x: 1 vs 0    0.6667 (0.0394,11.2853)   1.0057 (0.0422,23.9878)   0.997           0.997
z1: 1 vs 0   0.67 (0.04,11.29)         0.3 (0.01,11.61)          0.516           0.496
z2: 1 vs 0   0.67 (0.04,11.29)         0.49 (0.02,11.29)         0.659           0.654
z3: 1 vs 0   2.67 (0.16,45.14)         4.47 (0.15,133.82)        0.388           0.357

Log-likelihood = -5.5623   No. of observations = 10   AIC value = 21.1245

> fit = glm(x~y + z1 + z2 + z3, family=binomial(link="logit"))
> epiDisplay::logistic.display(fit)

Logistic regression predicting x

             crude OR (95%CI)     adj. OR (95%CI)     P(Wald's test)  P(LR-test)
y: 1 vs 0    0.67 (0.04,11.29)    0.96 (0.04,23.87)   0.979           0.979
z1: 1 vs 0   2 (0.15,26.73)       3.37 (0.11,99.3)    0.482           0.462
z2: 1 vs 0   2 (0.15,26.73)       2.85 (0.15,55.24)   0.488           0.475
z3: 1 vs 0   1 (0.08,12.56)       0.61 (0.02,15.96)   0.769           0.765

Log-likelihood = -6.2909   No. of observations = 10   AIC value = 22.5819

Why does this phenomenon occur?

Chan

1 Answer


The odds ratio modelled by logistic regression is not between X and Y but between Y and "not Y". Actually it's not an "odds ratio" but just log odds (the odds themselves are a ratio).
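For illustration, here is a minimal sketch in plain Python (not part of the original post): the left-hand side of a logistic model is the log odds of Y=1 versus Y=0 for a fixed covariate pattern, a quantity involving only Y. As a tie-in with the question's own output: in glm(y ~ x), the intercept is the log odds of y=1 among the x=0 rows, where y has three 1s and one 0.

```python
import math

def log_odds(p):
    """What logistic regression models on the left-hand side:
    log(p / (1 - p)), the log odds of Y=1 vs Y=0 -- X does not appear."""
    return math.log(p / (1 - p))

# Among the x=0 rows of the question's data, P(y=1) = 3/4, so the odds
# are 3/1 and the intercept of glm(y ~ x) is log(3):
print(round(log_odds(3 / 4), 7))  # -> 1.0986123, as in the R output
```

The "ratio" in the odds is a ratio of probabilities of the same variable Y, which is why no symmetry argument involving X applies here.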

In fact, mathematically, in the case of a single binary explanatory variable, logistic regression is directly related to the odds ratio, see https://en.wikipedia.org/wiki/Logistic_regression#The_odds_ratio, but this does not hold with multiple explanatory variables.
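For the single-binary-X case this is easy to verify directly from the question's own data: the slope -0.4054651 reported by both glm(y ~ x) and glm(x ~ y) is just the log of the sample odds ratio of the 2x2 table, which is symmetric in x and y. A sketch in plain Python:

```python
import math

# The y and x vectors from the question's R session
y = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1]
x = [0, 1, 1, 0, 0, 0, 1, 1, 1, 1]

# Cell counts of the 2x2 table
a = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 1)  # x=1, y=1
b = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 0)  # x=1, y=0
c = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 1)  # x=0, y=1
d = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 0)  # x=0, y=0

# The sample odds ratio a*d / (b*c) is symmetric in x and y, so its log
# matches the slope of BOTH simple fits:
log_or = math.log(a * d / (b * c))
print(round(log_or, 7))  # -> -0.4054651, the slope in both R fits
```

Once z1, z2, z3 enter the model, the adjusted coefficient is no longer a function of this single 2x2 table, and the symmetry is lost, which is exactly the discrepancy the question observes.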

  • You mean the difference is between adjustment in the multiple model vs. the simple model? – Chan Jul 25 '21 at 13:47
  • I don't understand this comment. I'm just saying that your question seems to be based on the idea that logistic regression models the odds ratio between a binary variable X (or more than one) and another one Y, the response, hence your argument that it should be symmetric in X and Y, but in fact X (explanatory variable) doesn't occur in the odds ratio modelled by logistic regression. – Christian Hennig Jul 25 '21 at 13:49
  • @Chan see my edit. – Christian Hennig Jul 25 '21 at 13:58
  • Thanks to you, I understand the difference now. You are a good man. – Chan Jul 25 '21 at 14:05