According to http://r-statistics.co/Beta-Regression-With-R.html, the topline remark is:
> Beta regression is used when you want to model Y that are probabilities themselves
Grammar aside, one may assume that, with the default logit link, beta regression yields coefficients that are log odds ratios for the mean response. But why should this method be preferred over quasibinomial regression?
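With a logit link, a slope is a log odds ratio for the mean response. Here is a minimal base-R check of that arithmetic. It uses `plogis` (the inverse logit) and takes the simulation model's coefficients ($-2$ and $0.5$) purely as example values. The starting points `x0` are arbitrary.

```r
# Logit-link mean model: logit(mu) = b0 + b1 * x
# b0 = -2 and b1 = 0.5 are the simulation model's values, used as examples
b0 <- -2
b1 <- 0.5
odds <- function(p) p / (1 - p)

x0  <- c(-3, -1, 0, 2)             # arbitrary starting values of x
mu0 <- plogis(b0 + b1 * x0)        # mean response at x0
mu1 <- plogis(b0 + b1 * (x0 + 1))  # mean response one unit higher

odds_ratio <- odds(mu1) / odds(mu0)
print(odds_ratio)  # identical at every x0 ...
print(exp(b1))     # ... and equal to exp(slope)
```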
Suppose I generate $Y$ according to the following probability model:
$$ \text{logit} (p) = -2 + 0.5 x$$
with the design $x = [-3.0, -2.9, \ldots, 2.9, 3.0]$ (61 equally spaced points), and record $Y$ as the fraction of successes in 10 independent Bernoulli trials.
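Under this design, the first two moments of the response follow directly from the binomial distribution:

$$
\begin{aligned}
Y &= B/10, \qquad B \sim \text{Binomial}(10, p), \qquad \text{logit}(p) = -2 + 0.5x,\\
\mathbb{E}[Y] &= p, \qquad \operatorname{Var}(Y) = \frac{p(1-p)}{10}.
\end{aligned}
$$

Both fits specify this mean model, $\text{logit}\,\mathbb{E}[Y] = -2 + 0.5x$. The quasibinomial model assumes $\operatorname{Var}(Y) = \sigma^2 \mu(1-\mu)$ with a free dispersion $\sigma^2$. Beta regression, in its usual mean/precision form, assumes $\operatorname{Var}(Y) = \mu(1-\mu)/(1+\phi)$. Both variance functions can reproduce the variance above, with $\sigma^2 = 1/10$ and $\phi = 9$ respectively.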
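```r
library(betareg)

## TRUE if x lies strictly inside the interval r = c(lower, upper).
## Named %between% to avoid masking base R's %in%.
`%between%` <- function(x, r) x > r[1] & x < r[2]

set.seed(123)

do.one <- function() {
  x <- seq(-3, 3, by = 0.1)
  ## Y = fraction of successes in 10 Bernoulli trials, logit(p) = -2 + 0.5 x
  y <- rbinom(length(x), size = 10, prob = plogis(-2 + 0.5 * x)) / 10

  ## Quasibinomial GLM on the raw proportions
  f1 <- glm(y ~ x, family = quasibinomial())

  ## betareg requires 0 < y < 1, so nudge exact 0s and 1s inward
  y2 <- y
  y2[y2 == 0] <- 0.0001
  y2[y2 == 1] <- 1 - 0.0001
  f2 <- betareg(y2 ~ x, link = "logit")

  ## Does the 80% Wald interval for the slope cover the true value 0.5?
  c(
    cover1 = 0.5 %between% confint.default(f1, "x", level = 0.8),
    cover2 = 0.5 %between% confint.default(f2, "x", level = 0.8)
  )
}

## Suppress warnings raised during the fits, which don't matter here
out <- suppressWarnings(replicate(1000, do.one()))
rowMeans(out)
```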
Running this gives:

```
> rowMeans(out)
cover1 cover2
0.787 0.699
```

In this simulation, there are only 10 Bernoulli trials per observation. Even so, the 80% Wald interval from the quasibinomial model covers the true slope 78.7% of the time, close to nominal. The beta regression interval, by contrast, is anticonservative, with only about 70% coverage. Is this not the correct way to construct CIs for a `betareg` model? Or is this not the appropriate interpretation of the covariate coefficients?
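The `y2` step in the code matters because `betareg` only accepts responses strictly inside $(0, 1)$. Under this design, a response is exactly 0 whenever all 10 trials fail, which is common when $p$ is small. The base-R sketch below uses `dbinom` to compute how often that is expected to happen. It reuses the post's design and true model; the variable names are illustrative.

```r
# Design and true success probabilities from the simulation in the post
x <- seq(-3, 3, by = 0.1)
p <- plogis(-2 + 0.5 * x)

# P(Y = 0) = P(B = 0) and P(Y = 1) = P(B = 10) for B ~ Binomial(10, p)
p_zero <- dbinom(0, size = 10, prob = p)
p_one  <- dbinom(10, size = 10, prob = p)

# Expected fraction of the 61 responses that sit exactly on the boundary
share_zero <- mean(p_zero)
share_one  <- mean(p_one)
round(c(share_zero = share_zero, share_one = share_one), 4)

# Expected number of exact zeros (later replaced by 0.0001) per data set
sum(p_zero)
```

Every such observation enters the beta fit at the arbitrary value 0.0001, so the choice of that constant is part of the model.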
`y` is generated from the binomial distribution, so why would you expect beta regression to work? You should use logistic regression (binomial family). What happens if you simulate your data from a beta distribution? See also https://stats.stackexchange.com/q/29038/35989 – Tim Nov 01 '22 at 18:32
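The comment's first suggestion can be checked with a base-R sketch. It refits the same kind of simulated data with an ordinary binomial GLM, giving `glm` the success and failure counts through a two-column `cbind()` response. It then records how often the 80% Wald interval for the slope covers 0.5. The seed, the replication count, and the helper name `covers_slope` are arbitrary choices.

```r
set.seed(123)
x <- seq(-3, 3, by = 0.1)
true_slope <- 0.5

# TRUE if the 80% Wald interval for the slope covers the true value
covers_slope <- function() {
  # successes out of 10 trials at each design point
  b <- rbinom(length(x), size = 10, prob = plogis(-2 + true_slope * x))
  # binomial likelihood via (successes, failures) response
  fit <- glm(cbind(b, 10 - b) ~ x, family = binomial())
  ci <- confint.default(fit, "x", level = 0.8)
  ci[1] < true_slope && true_slope < ci[2]
}

coverage <- mean(replicate(1000, covers_slope()))
coverage
```

Because the binomial model is the true data-generating model here, its dispersion is fixed at 1 rather than estimated, which is the main difference from the quasibinomial fit in the question.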