When modelling continuous proportions (e.g., proportional vegetation cover at survey quadrats, or the proportion of time engaged in an activity), logistic regression is considered inappropriate (see, e.g., Warton & Hui (2011), "The arcsine is asinine: the analysis of proportions in ecology"). Instead, OLS regression on logit-transformed proportions, or perhaps beta regression, is more appropriate.
Under what conditions do the coefficient estimates of logit-linear regression and logistic regression differ when using R's lm and glm?
Take the following simulated dataset, where we can assume that p are our raw data (i.e. continuous proportions, rather than representing $\frac{n_{\text{successes}}}{n_{\text{trials}}}$):
set.seed(1)
x <- rnorm(1000)                          # continuous predictor
a <- runif(1)                             # intercept
b <- runif(1)                             # slope
logit.p <- a + b*x + rnorm(1000, 0, 0.2)  # Gaussian noise added on the logit scale
p <- plogis(logit.p)                      # back-transform to proportions in (0, 1)
plot(p ~ x, ylim=c(0, 1))

Fitting a logit-linear model, we obtain:
summary(lm(logit.p ~ x))
##
## Call:
## lm(formula = logit.p ~ x)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.64702 -0.13747 -0.00345 0.15077 0.73148
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.868148 0.006579 131.9 <2e-16 ***
## x 0.967129 0.006360 152.1 <2e-16 ***
## ---
## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
##
## Residual standard error: 0.208 on 998 degrees of freedom
## Multiple R-squared: 0.9586, Adjusted R-squared: 0.9586
## F-statistic: 2.312e+04 on 1 and 998 DF, p-value: < 2.2e-16
Logistic regression yields:
summary(glm(p ~ x, family=binomial))
##
## Call:
## glm(formula = p ~ x, family = binomial)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.32099 -0.05475 0.00066 0.05948 0.36307
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.86242 0.07684 11.22 <2e-16 ***
## x 0.96128 0.08395 11.45 <2e-16 ***
## ---
## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 176.1082 on 999 degrees of freedom
## Residual deviance: 7.9899 on 998 degrees of freedom
## AIC: 701.71
##
## Number of Fisher Scoring iterations: 5
##
## Warning message:
## In eval(expr, envir, enclos) : non-integer #successes in a binomial glm!
Will the logistic regression coefficient estimates always be unbiased with respect to the logit-linear model's estimates?
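
As a quick side-by-side check on the simulated data above, the two sets of point estimates can be compared directly (a minimal sketch using only base R; suppressWarnings() is used solely to silence the non-integer-successes warning shown above):

```r
# Recreate the simulated data from above
set.seed(1)
x <- rnorm(1000)
a <- runif(1)
b <- runif(1)
logit.p <- a + b*x + rnorm(1000, 0, 0.2)
p <- plogis(logit.p)

# Fit the logit-linear and logistic models, then compare point estimates
fit.lm  <- lm(logit.p ~ x)
fit.glm <- suppressWarnings(glm(p ~ x, family = binomial))
cbind(lm = coef(fit.lm), glm = coef(fit.glm))
```

On this dataset the estimates agree to roughly two decimal places, though the standard errors differ by an order of magnitude, as the two summaries above show.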
Comments:

- "… `0.1` there "were", say, 10 independent trials yielding one success. For the linear model, `0.1` is simply a value, some arbitrary measure." – ttnphns Mar 07 '15 at 10:13
- "`family=binomial` implies that the dependent variable represents binomial counts -- not proportions. And how would `glm` know that `0.1` is like "one out of ten" and not "ten out of hundred"? While the proportion itself does not differ, this has major implications for how the standard error is computed." – Wolfgang Mar 07 '15 at 10:24
- "… `weights` arg (though this isn't what I was attempting in my post, where I have intentionally analysed the data incorrectly)." – jbaums Mar 07 '15 at 11:15
- "… what `glm` ends up doing? It is obviously doing something (besides issuing the warning that there are `non-integer #successes in a binomial glm!`). I still have a hard time wrapping my head around what this really means, but this would probably be something for a new question." – Wolfgang Mar 07 '15 at 11:15
- "`glm` assumes the response is the outcome of a single trial. Maybe these are rounded to binary outcomes … I'm not sure (am away from comp and can't compare atm)." – jbaums Mar 07 '15 at 11:19
- "`summary(glm(p ~ x, family=binomial))` and `summary(glm(p ~ x, family=binomial, weights=rep(1,length(p))))` yield the same results." – Wolfgang Mar 07 '15 at 11:21
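
Wolfgang's final observation is easy to verify on the simulated data (a sketch, not a general proof: `glm`'s default prior weights are already all 1, so supplying `weights = rep(1, length(p))` reproduces the unweighted fit exactly):

```r
# Recreate the simulated data from the question
set.seed(1)
x <- rnorm(1000)
a <- runif(1)
b <- runif(1)
logit.p <- a + b*x + rnorm(1000, 0, 0.2)
p <- plogis(logit.p)

# Unweighted fit vs. explicit unit weights
fit1 <- suppressWarnings(glm(p ~ x, family = binomial))
fit2 <- suppressWarnings(glm(p ~ x, family = binomial,
                             weights = rep(1, length(p))))
all.equal(coef(fit1), coef(fit2))  # TRUE
```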