
I am using the betareg package in R to model a proportional response and would like to incorporate information about the level of confidence in each observation using the weights argument in the betareg() function. The package documentation describes the weights as "case weights", and I have read a bit about how these differ from "proportionality weights". However, I am still uncertain whether I am using the weights argument correctly.

Specifically, each observation in my model is an average of 1-3 (non-independent) measurements. Observations based on 3 measurements are much more reliable than observations based on 1 measurement, both because of the greater precision afforded by additional measurements and because observations based on a single measurement are inherently more likely to be noisy. My current approach to account for this is to fit a model like this,

betareg(y ~ x1, weights = n.obs/3)

where 'n.obs' is a vector giving the number of measurements contributing to each observation. The effect on my model seems reasonable - group means are shifted toward more reliable observations and standard errors increase. However, this weighting scheme seems somewhat arbitrary, because I do not know exactly how the number of measurements should affect the standard errors of the model parameters. I am wondering whether my current approach is defensible, or whether there is a more appropriate way to specify the weights in this circumstance.

EDIT (to clarify my question): My (probably naive) interpretation of the proposed approach is that an observation with only 1 measurement is weighted as 1/3 the "importance" of an observation with all 3 measurements. Is that technically correct? If so, that weighting seems fairly conservative to me. My intuition about this particular data set is that having 1 measurement is worth at least half of a complete set of measurements, and that the loss of precision is greater when dropping from 2 measurements to 1 than from 3 to 2. I considered scaling the weights to reflect this, but I would like to understand the mathematical implications in order to justify any such scaling.
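For concreteness, the kind of non-linear scaling described above could be expressed as a lookup table mapping measurement count to weight (the specific weight values below are hypothetical, chosen only to illustrate the idea):

```r
n.obs <- c(3, 1, 2, 3, 1)   # hypothetical measurement counts per observation
w.map <- c(0.6, 0.85, 1)    # hypothetical weights for 1, 2, 3 measurements
w <- w.map[n.obs]           # index into the lookup table by count
w
```

This keeps the single-measurement weight above 1/2 and makes the 2-to-1 drop larger than the 3-to-2 drop, matching the stated intuition.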

Devin

1 Answer


My impression is that your strategy is reasonable for obtaining the point estimates, but some care is needed to obtain standard errors that are useful.

betareg uses case weights and hence a weight of, say, 2 would be interpreted as two independent observations that have exactly the same y and x values. Thus, these would not just be averages from different observations but exact replicates.

Therefore, if you fit a betareg model in which all observations have weight 2, the coefficients would be unchanged (compared to the default case with weight 1) but all (co-)variances would be halved.
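As a quick check of this behavior (a sketch using the GasolineYield data shipped with betareg; the dataset and formula are chosen only for demonstration):

```r
library(betareg)
data("GasolineYield", package = "betareg")

## Fit once with default unit weights and once with every case weight set to 2
m1 <- betareg(yield ~ batch + temp, data = GasolineYield)
m2 <- betareg(yield ~ batch + temp, data = GasolineYield,
              weights = rep(2, nrow(GasolineYield)))

## Point estimates agree; the covariance matrix is halved (up to numerical tolerance)
all.equal(coef(m1), coef(m2), tolerance = 1e-6)
all.equal(vcov(m1) / 2, vcov(m2), tolerance = 1e-6)
```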

One strategy sometimes seen in practice when emulating proportionality weights via case weights is to scale the weights such that sum(weights) equals the number of independent observations.
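For the setup in the question, that rescaling could be done as follows (n.obs stands in for the vector from the question; the rescaling is something you do yourself, not something betareg does automatically):

```r
## Rescale case weights so they sum to the number of observations,
## preserving the relative weighting of 1-, 2-, and 3-measurement cases
n.obs <- c(3, 1, 2, 3, 3, 1)             # hypothetical measurement counts
w <- n.obs * length(n.obs) / sum(n.obs)  # now sum(w) == length(n.obs)

## Then fit with, e.g.: betareg(y ~ x1, weights = w)
```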

Achim Zeileis