
If I am doing a binomial GLM with a proportion as an outcome--how do I interpret the model coefficients? I understand how it is done when the outcome is an event (e.g., 0/1), but it is less clear to me how to interpret it in this case. Is it still the log odds that the outcome (i.e., proportion) is 1?

A silly example below predicting population-level happiness based on number of televisions. My question is: how do I understand/interpret the coefficient for n_with_tv (0.003524932)?

```r
set.seed(1)
mod_df = data.frame(
  island = 1:10,
  pop = sample(1000, 10)
)
mod_df$n_happy = round(mod_df$pop * sample(seq(.2, .8, .01), 10))
mod_df$prop_happy = mod_df$n_happy / mod_df$pop

mod_df$n_with_tv = round(mod_df$pop * (mod_df$prop_happy + rnorm(10, sd = .10)))
mod_df$prop_with_tv = mod_df$n_with_tv / mod_df$pop

mod_df
#>    island pop n_happy prop_happy n_with_tv prop_with_tv
#> 1       1 836     585  0.6997608       518    0.6196172
#> 2       2 679     353  0.5198822       275    0.4050074
#> 3       3 129      52  0.4031008        48    0.3720930
#> 4       4 930     725  0.7795699       697    0.7494624
#> 5       5 509     310  0.6090373       289    0.5677800
#> 6       6 471     344  0.7303609       356    0.7558386
#> 7       7 299     194  0.6488294       167    0.5585284
#> 8       8 270      78  0.2888889        90    0.3333333
#> 9       9 978     254  0.2597137       133    0.1359918
#> 10     10 187      52  0.2780749        48    0.2566845

mod_glm = glm(prop_happy ~ n_with_tv, weights = pop, family = "binomial", data = mod_df)

summary(mod_glm)$coef
#>                 Estimate   Std. Error   z value      Pr(>|z|)
#> (Intercept) -0.924823050 0.0551110306 -16.78109  3.356306e-63
#> n_with_tv    0.003524932 0.0001494939  23.57910 6.315129e-123
```


Andrew

1 Answer


This model is called the "fractional logit". You should actually be using robust standard errors and the quasibinomial family for this kind of analysis, since the outcome does not have a binomial distribution.
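A minimal base-R sketch of that recommendation, using the island data exactly as printed in the question. It refits the model with `family = quasibinomial` and computes an HC0 sandwich (robust) covariance by hand so that it has no package dependencies. Keeping `weights = pop` is my assumption, chosen so the estimates match the question's; the answer does not say whether to weight. The `sandwich` package's `vcovHC()` provides this estimator and small-sample variants such as HC3, which may be worth considering with only 10 islands.

```r
# Island data copied from the printed mod_df in the question
mod_df <- data.frame(
  island    = 1:10,
  pop       = c(836, 679, 129, 930, 509, 471, 299, 270, 978, 187),
  n_happy   = c(585, 353, 52, 725, 310, 344, 194, 78, 254, 52),
  n_with_tv = c(518, 275, 48, 697, 289, 356, 167, 90, 133, 48)
)
mod_df$prop_happy <- mod_df$n_happy / mod_df$pop

# Same mean model under both families; weights = pop kept from the question (assumption)
fit_bin   <- glm(prop_happy ~ n_with_tv, weights = pop, family = binomial, data = mod_df)
fit_quasi <- glm(prop_happy ~ n_with_tv, weights = pop, family = quasibinomial, data = mod_df)

# HC0 sandwich covariance written out by hand:
#   bread  = (X'WX)^{-1}, the unscaled covariance from the IRLS fit
#   scores = per-island estimating-equation terms x_i * pop_i * (y_i - mu_i)
X      <- model.matrix(fit_quasi)
bread  <- summary(fit_quasi)$cov.unscaled
scores <- X * (residuals(fit_quasi, type = "working") *
               weights(fit_quasi, type = "working"))
vc_hc0 <- bread %*% crossprod(scores) %*% bread

# Identical estimates, three different standard errors
cbind(
  estimate  = coef(fit_quasi),
  se_binom  = sqrt(diag(vcov(fit_bin))),
  se_quasi  = sqrt(diag(vcov(fit_quasi))),
  se_robust = sqrt(diag(vc_hc0))
)
```

Switching families leaves the point estimates unchanged; only the uncertainty estimates differ.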

The coefficient does not have a particularly useful interpretation. Formally, it is read the same way as in logistic regression on a binary event: a coefficient of $b$ means that a 1-unit change in $x$ is associated with a $b$-unit change in the logit of the outcome, where $\text{logit}(y) = \log\left(\frac{y}{1-y}\right)$. We can't speak about probabilities or odds, and can only interpret the logit as a complicated nonlinear function of the outcome.
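A short sketch of what "a $b$-unit change in the logit" means in practice, again using the question's data with a weighted quasibinomial fit. The TV counts 100 and 600 are arbitrary values chosen for illustration. Adding one TV shifts the linear predictor by exactly $b$ wherever you start, but the implied change in the proportion happy depends on the starting point, which is why the coefficient alone is hard to read on the outcome's own scale.

```r
# Island data copied from the printed mod_df in the question
mod_df <- data.frame(
  island    = 1:10,
  pop       = c(836, 679, 129, 930, 509, 471, 299, 270, 978, 187),
  n_happy   = c(585, 353, 52, 725, 310, 344, 194, 78, 254, 52),
  n_with_tv = c(518, 275, 48, 697, 289, 356, 167, 90, 133, 48)
)
mod_df$prop_happy <- mod_df$n_happy / mod_df$pop

fit <- glm(prop_happy ~ n_with_tv, weights = pop, family = quasibinomial, data = mod_df)
b   <- coef(fit)[["n_with_tv"]]

# Illustrative TV counts: two +1 steps, one starting low and one starting high
new_x <- data.frame(n_with_tv = c(100, 101, 600, 601))
eta   <- predict(fit, newdata = new_x, type = "link")      # logit of the proportion
mu    <- predict(fit, newdata = new_x, type = "response")  # the proportion itself

step_logit <- diff(eta)[c(1, 3)]  # both equal b: constant on the logit scale
step_prop  <- diff(mu)[c(1, 3)]   # differ: the effect on the proportion depends on x
rbind(step_logit, step_prop)
```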

For this reason, fractional logit models are often interpreted using marginal effects. The Stata documentation for `fracreg` explains these models and describes how to use the `margins` command after fitting the model to interpret the results appropriately. When the outcome is a probability, it might be possible to use a log-odds interpretation. In your case, you might be able to interpret the coefficient as the change in the log odds of an individual on an island being happy per additional TV on that island. But this interpretation extrapolates a bit beyond what the model allows, and the marginal effects approach is preferred.
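If you do adopt the individual-level reading, the coefficient converts to an odds ratio in the usual way. The sketch below assumes a weighted quasibinomial fit to the question's data; the TV counts 200 and 300 are arbitrary. With the question's estimate of 0.003524932, $\exp(100b) = \exp(0.3525) \approx 1.42$, so under this reading each additional 100 TVs multiplies an individual's odds of being happy by roughly 1.42. The last lines check that the ratio of fitted odds at two TV counts 100 apart reproduces that number.

```r
# Island data copied from the printed mod_df in the question
mod_df <- data.frame(
  island    = 1:10,
  pop       = c(836, 679, 129, 930, 509, 471, 299, 270, 978, 187),
  n_happy   = c(585, 353, 52, 725, 310, 344, 194, 78, 254, 52),
  n_with_tv = c(518, 275, 48, 697, 289, 356, 167, 90, 133, 48)
)
mod_df$prop_happy <- mod_df$n_happy / mod_df$pop

fit <- glm(prop_happy ~ n_with_tv, weights = pop, family = quasibinomial, data = mod_df)
b   <- coef(fit)[["n_with_tv"]]

# Individual-level reading: exp(k * b) multiplies an individual's odds of being
# happy when the island has k more TVs
or_per_tv  <- exp(b)
or_per_100 <- exp(100 * b)

# Check against fitted values: ratio of fitted odds at two TV counts 100 apart
odds <- function(p) p / (1 - p)
mu   <- predict(fit, newdata = data.frame(n_with_tv = c(200, 300)), type = "response")
c(or_per_tv = or_per_tv, or_per_100 = or_per_100,
  fitted_ratio = unname(odds(mu[2]) / odds(mu[1])))
```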

In R, you can use `marginaleffects::avg_slopes()` to compute the average marginal effect of the predictor, which can be interpreted as the average rate of change in the outcome corresponding to a change in the predictor (or the average, across the sample, of the pointwise derivatives of the average dose-response function). See my answer here for more intuition on interpreting these quantities.
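The average marginal effect for a logit-link model can also be computed directly, which may help demystify what `avg_slopes()` reports. This sketch averages the analytic derivative $d\mu/dx = b\,\mu(1-\mu)$ over the 10 islands and adds a finite-difference check (the step size `h` is arbitrary). Assuming `avg_slopes()` averages unit-level slopes with equal weight by default (check its `wts` argument), its estimate should be close to the unweighted value. The `pop`-weighted average is an alternative per-person rather than per-island summary, included here as an assumption rather than a recommendation.

```r
# Island data copied from the printed mod_df in the question
mod_df <- data.frame(
  island    = 1:10,
  pop       = c(836, 679, 129, 930, 509, 471, 299, 270, 978, 187),
  n_happy   = c(585, 353, 52, 725, 310, 344, 194, 78, 254, 52),
  n_with_tv = c(518, 275, 48, 697, 289, 356, 167, 90, 133, 48)
)
mod_df$prop_happy <- mod_df$n_happy / mod_df$pop

fit <- glm(prop_happy ~ n_with_tv, weights = pop, family = quasibinomial, data = mod_df)
b   <- coef(fit)[["n_with_tv"]]
mu  <- fitted(fit)  # fitted proportion happy on each island

# Logit link: d mu / d x = b * mu * (1 - mu); average over the 10 islands
ame_unweighted <- mean(b * mu * (1 - mu))
# Alternative per-person summary (assumption): weight each island by its population
ame_pop_weighted <- weighted.mean(b * mu * (1 - mu), w = mod_df$pop)

# Finite-difference check of the unweighted AME (step size h is arbitrary)
h  <- 1e-4
p0 <- predict(fit, newdata = mod_df, type = "response")
p1 <- predict(fit, newdata = transform(mod_df, n_with_tv = n_with_tv + h),
              type = "response")
ame_numeric <- mean((p1 - p0) / h)

c(unweighted = ame_unweighted, numeric = ame_numeric, pop_weighted = ame_pop_weighted)
```

Multiplying either average marginal effect by 100 gives the approximate change in the proportion happy per 100 additional TVs.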

Noah