I am running a beta regression using betareg in R (with the default logit link function). My response variable is a proportion, and may include 0 and/or 1. I've transformed the data following the betareg manual: "if y also assumes the extremes 0 and 1, a useful transformation in practice is (y · (n − 1) + 0.5)/n where n is the sample size (Smithson and Verkuilen 2006)."

Normally, I would interpret the regression coefficient in terms of exp(b), e.g. a unit change in x changes the proportion by (exp(b)*100-100)%,
but how do I account for the transformation that was carried out?

An example including y=1

> library(betareg)
> yb<-seq(32)
> a<-c(23,25,36,27,21,18,18,8,15,11,8,14,15,11,6,14,11,14,8,10,9,7,17,14,12,17,8,10,8,12,10,12)
> b<-c(18,19,8,12,8,5,8,7,5,5,4,0,2,0,4,2,1,0,1,5,3,3,1,5,10,6,17,20,18,20,26,28)
> 
> df<-data.frame(x=yb, A=a, B=b)
> 
> ## Calculate total
> df$n<-df$A+df$B
> ## Calculate proportion of A
> df$prop<-df$A/df$n
> ## Transform
> df$prop_trans<-(df$prop*(df$n-1)+0.5)/df$n
> 
> ## Betaregression
> bregt<-betareg(prop_trans ~ x, data=df)
> summary(bregt)

Call:
betareg(formula = prop_trans ~ x, data = df)

Standardized weighted residuals 2:
    Min      1Q  Median      3Q     Max 
-1.2435 -0.8567 -0.2671  0.6495  2.4999 

Coefficients (mean model with logit link):
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  1.39672    0.31237   4.471 7.77e-06 ***
x           -0.04077    0.01581  -2.578  0.00993 ** 

Phi coefficients (precision model with identity link):
      Estimate Std. Error z value Pr(>|z|)    
(phi)    5.486      1.289   4.255 2.09e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Type of estimator: ML (maximum likelihood)
Log-likelihood: 11.44 on 3 Df
Pseudo R-squared: 0.1245
Number of iterations: 12 (BFGS) + 1 (Fisher scoring)
> 
> ## Interpretation - unit change in x changes the proportion by ...%
> b=summary(bregt)$coef$mean[2,1]
> round(exp(b)*100-100,digits=1)
[1] -4

branwen85
  • 111

1 Answer

In the actual reference they say the following, suggesting this is not really applicable for proportions in the 'natural scale':

Our recommendation applies to a sample of scores from a continuous bounded scale that has already been linearly transformed to the [0,1] interval.

Other suggestions from the supplement include moving boundary values by a constant, possibly based on the sample size, which will introduce a small amount of bias. None of these will meaningfully change the way you'd interpret the model parameters, though. They just restrict the data to lie within the support of the beta distribution (and thereby produce small numerical differences in the actual fit).
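To illustrate how small the adjustment is, here is a minimal sketch of the quoted transformation applied to the question's data. `squeeze01` is a hypothetical helper name, not a betareg function, and `n` is taken to be the number of observations, following the manual's "sample size" wording (the question's code uses the per-row totals instead).

```r
# Hypothetical helper (not part of betareg): compress y from [0, 1] into (0, 1)
# using (y * (n - 1) + 0.5) / n, with n assumed to be the number of observations.
squeeze01 <- function(y, n = length(y)) {
  (y * (n - 1) + 0.5) / n
}

# Counts copied from the question
a <- c(23,25,36,27,21,18,18,8,15,11,8,14,15,11,6,14,11,14,8,10,9,7,17,14,12,17,8,10,8,12,10,12)
b <- c(18,19,8,12,8,5,8,7,5,5,4,0,2,0,4,2,1,0,1,5,3,3,1,5,10,6,17,20,18,20,26,28)

prop    <- a / (a + b)      # raw proportions; some are exactly 1
prop_sq <- squeeze01(prop)  # squeezed proportions, strictly inside (0, 1)

print(range(prop))
print(range(prop_sq))

# Each value moves by |0.5 - y| / n, so the shift never exceeds 0.5 / n
print(max(abs(prop_sq - prop)))
```

Because the shift is bounded by 0.5/n, the fitted coefficients keep their usual logit-scale meaning; only the boundary observations are nudged inward.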

I think a better solution here might be to fit a zero- and/or one-inflated beta regression that actually handles the boundary cases.
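As a rough sketch of what that could look like, assuming the gamlss package is an acceptable tool (this answer does not name a specific package): the question's data contain exact ones but no exact zeros, so the example uses the one-inflated beta family `BEOI` on the untransformed proportions. The model formula is an assumption carried over from the question, and the fit only runs if gamlss is installed.

```r
# Counts copied from the question
a <- c(23,25,36,27,21,18,18,8,15,11,8,14,15,11,6,14,11,14,8,10,9,7,17,14,12,17,8,10,8,12,10,12)
b <- c(18,19,8,12,8,5,8,7,5,5,4,0,2,0,4,2,1,0,1,5,3,3,1,5,10,6,17,20,18,20,26,28)
df <- data.frame(x = seq_along(a), prop = a / (a + b))

# Check which boundary values actually occur before choosing a family
n_ones  <- sum(df$prop == 1)
n_zeros <- sum(df$prop == 0)
print(c(ones = n_ones, zeros = n_zeros))

# Assumption: gamlss is installed; BEOI is its one-inflated beta family.
# No transformation of the response is needed, since the point mass at 1
# is modelled explicitly.
if (requireNamespace("gamlss", quietly = TRUE)) {
  library(gamlss)
  fit_oi <- gamlss(prop ~ x, family = BEOI, data = df)
  summary(fit_oi)
}
```

If the response also contained exact zeros, a zero-and-one-inflated family would be the analogous choice.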

As an aside, exponentiating a model coefficient does not give you the ratio of change in the proportion but in the odds. These are not the same thing!
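A small numerical check of this point, using the coefficient estimates printed in the question's summary output (typed in by hand here, so no model fit is needed). The chosen values of x are arbitrary illustration points.

```r
# Estimates copied from the summary output in the question
b0 <- 1.39672   # intercept
b1 <- -0.04077  # slope for x

x    <- c(1, 2, 31, 32)
mu   <- plogis(b0 + b1 * x)  # fitted mean proportion (inverse logit)
odds <- mu / (1 - mu)

# Ratio of odds for a one-unit step in x: constant and equal to exp(b1)
odds_ratio <- c(odds[2] / odds[1], odds[4] / odds[3])

# Ratio of proportions for the same steps: depends on where x is
prop_ratio <- c(mu[2] / mu[1], mu[4] / mu[3])

print(exp(b1))
print(odds_ratio)
print(prop_ratio)
```

The odds ratio is the same at both ends of the x range, while the ratio of fitted proportions differs between them, so a single "percent change in proportion" cannot be read off exp(b).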

PBulls
  • 4,378
  • Thank you, I missed out the "natural scale" fragment in the docs. As for the interpretation, I was going with this thread: https://stats.stackexchange.com/questions/297659/interpretation-of-betareg-coef which actually does say that logit can be interpreted as a change in proportions. I have predicted the values based on the model, and calculated the ratio between the predicted data points - it equaled 1-exp(b).

    Note - I know that this would not equal the probability, but it's not what I am searching for here.

    – branwen85 Oct 26 '23 at 08:50
  • Sorry, should be 1+exp(b) for the ratio between predicted values – branwen85 Oct 26 '23 at 08:58