Further to my previous post, it seems that one can/should use logistic regression to model a proportion.
How do I interpret the coefficients of a logistic regression when the outcome variable is a proportion and not binary? Is it the same interpretation as if the outcome variable was binary?
Using the example below (in R), if the interpretation is the same as if the outcome (prop_survived) were binary, I would say "Controlling for all other variables, being a child nearly triples the odds of survival". If that's not the appropriate interpretation, what is? And how should the inverse logit of the coefficients be interpreted?
Is there a way to get an interpretation on the "original" scale, i.e. the proportion itself? Say, for example, the outcome variable were a pain scale (ranging continuously from 0 (no pain) to 1 (extreme pain)); it wouldn't make much sense to say that a one-unit increase in X multiplies the odds of pain by Z. Instead, I am interested in knowing by how much pain increases (on my 0-to-1 scale) given a one-unit increase in X.
Hope this makes sense. Thanks.
library(tidyverse)
data <- as_tibble(Titanic) %>%
  group_by(Class, Sex, Age) %>%
  mutate(cohort_size = sum(n)) %>%
  ungroup() %>%
  filter(cohort_size > 0 & Survived == "Yes") %>%
  mutate(prop_survived = n / cohort_size)
data
#> # A tibble: 14 x 7
#>    Class Sex    Age   Survived     n cohort_size prop_survived
#>    <chr> <chr>  <chr> <chr>    <dbl>       <dbl>         <dbl>
#>  1 1st   Male   Child Yes          5           5        1
#>  2 2nd   Male   Child Yes         11          11        1
#>  3 3rd   Male   Child Yes         13          48        0.271
#>  4 1st   Female Child Yes          1           1        1
#>  5 2nd   Female Child Yes         13          13        1
#>  6 3rd   Female Child Yes         14          31        0.452
#>  7 1st   Male   Adult Yes         57         175        0.326
#>  8 2nd   Male   Adult Yes         14         168        0.0833
#>  9 3rd   Male   Adult Yes         75         462        0.162
#> 10 Crew  Male   Adult Yes        192         862        0.223
#> 11 1st   Female Adult Yes        140         144        0.972
#> 12 2nd   Female Adult Yes         80          93        0.860
#> 13 3rd   Female Adult Yes         76         165        0.461
#> 14 Crew  Female Adult Yes         20          23        0.870
model <- glm(
  prop_survived ~ Class + Sex + Age,
  weights = cohort_size,
  family = binomial(link = "logit"),
  data = data
)
summary(model)
#>
#> Call:
#> glm(formula = prop_survived ~ Class + Sex + Age, family = binomial(link = "logit"),
#> data = data, weights = cohort_size)
#>
#> Deviance Residuals:
#>     Min       1Q   Median       3Q      Max
#> -4.1356  -1.7126   0.7812   2.6800   4.3833
#>
#> Coefficients:
#>             Estimate Std. Error z value Pr(>|z|)
#> (Intercept)   2.0438     0.1679  12.171  < 2e-16 ***
#> Class2nd     -1.0181     0.1960  -5.194 2.05e-07 ***
#> Class3rd     -1.7778     0.1716 -10.362  < 2e-16 ***
#> ClassCrew    -0.8577     0.1573  -5.451 5.00e-08 ***
#> SexMale      -2.4201     0.1404 -17.236  < 2e-16 ***
#> AgeChild      1.0615     0.2440   4.350 1.36e-05 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> (Dispersion parameter for binomial family taken to be 1)
#>
#>     Null deviance: 671.96  on 13  degrees of freedom
#> Residual deviance: 112.57  on  8  degrees of freedom
#> AIC: 171.19
#>
#> Number of Fisher Scoring iterations: 5
exp(coef(model))
#> (Intercept)    Class2nd    Class3rd   ClassCrew     SexMale    AgeChild
#>  7.72017801  0.36128255  0.16901595  0.42414659  0.08891625  2.89082630
boot::inv.logit(coef(model))
#> (Intercept)    Class2nd    Class3rd   ClassCrew     SexMale    AgeChild
#>  0.88532344  0.26539865  0.14457967  0.29782509  0.08165573  0.74298519
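To make the question concrete, here is what I noticed when playing with the numbers: the inverse logit of a single coefficient doesn't line up with any proportion in the data, but the inverse logit of a full linear predictor does come close to an observed cohort proportion. A quick sketch (in Python rather than R, just to do the arithmetic by hand; the coefficient values are copied from the summary above, and the 3rd-class male child cohort is an arbitrary illustration):

```python
import math

def inv_logit(x):
    """Map a value on the log-odds scale to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

# Fitted coefficients copied from summary(model) above (log-odds scale)
intercept, class_3rd, sex_male, age_child = 2.0438, -1.7778, -2.4201, 1.0615

# exp() of a single coefficient is an odds ratio: being a child
# multiplies the odds of survival by roughly 2.89
print(math.exp(age_child))   # ~2.89, matches exp(coef(model))["AgeChild"]

# inv_logit() of a single coefficient is not itself a proportion, but
# inv_logit() of a full linear predictor is the model's predicted
# proportion for that cohort, e.g. a 3rd-class male child:
eta = intercept + class_3rd + sex_male + age_child
print(inv_logit(eta))        # ~0.25, vs. the observed 0.271 in the data
```

Is that the right way to read the inverse-logit values, i.e. only meaningful once all the relevant coefficients are summed?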
Created on 2024-02-01 with reprex v2.0.2