1

I have a dataset where the response variable is the proportion of land irrigated by a canal and the predictors are:

  • the name of the village
  • the total area in which the crop is sown
  • the district name and
  • the sub-district name.

Since a proportion can only take values between 0 and 1, I regress $\log\left(\frac{p}{1-p}\right)$ on the predictors, where $p$ denotes the proportion. The problem is that some of the proportions are exactly 1 or 0, and the corresponding $\log\left(\frac{p}{1-p}\right)$ is then $\infty$ or $-\infty$, respectively.

How can I deal with this problem?

utobi
  • 11,726

1 Answer

3

To model proportions I'd use a beta regression model, in which the expected value of the response is modeled as a function of the covariates. In R you can do this with the betareg package.

If you really want to work with the log-odds $\log\left(\frac{p}{1-p}\right)$, then one way to mitigate the boundary problem is to replace $0$ with $\epsilon$ and $1$ with $1-\epsilon$, for some small $\epsilon>0$, say, 0.001.
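
As a rough illustration of the $\epsilon$ adjustment, here is a minimal Python sketch using only the standard library. The function name `adjusted_logit` and the sample values are made up for the example, and $\epsilon = 0.001$ is just the value suggested above; the fitted coefficients can be sensitive to this choice, so it may be worth trying a few values.

```python
import math

def adjusted_logit(p, eps=1e-3):
    """Return log(p / (1 - p)) after moving exact 0 and 1 off the boundary.

    Hypothetical helper for illustration: 0 is replaced by eps and
    1 by 1 - eps; values strictly inside (0, 1) are left unchanged.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if not 0.0 < eps < 0.5:
        raise ValueError("eps must lie in (0, 0.5)")
    if p == 0.0:
        p = eps          # 0 -> eps
    elif p == 1.0:
        p = 1.0 - eps    # 1 -> 1 - eps
    return math.log(p / (1.0 - p))

# Made-up proportions, including both boundary values.
proportions = [0.0, 0.25, 0.5, 1.0]
logits = [adjusted_logit(p) for p in proportions]
print(logits)  # every entry is finite
```

The transformed values can then be used as the response in an ordinary linear regression on the predictors.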

utobi
  • 11,726
  • I think the same problem will occur with beta regression considering that the density of the beta random variable at 1 or 0 is 0 and the likelihood function will be 0 given any parameter. – Subhra Patra Oct 10 '22 at 17:51
  • Sure but see https://stats.stackexchange.com/questions/48028/beta-regression-of-proportion-data-including-1-and-0 for how to deal with it. – utobi Oct 10 '22 at 18:49
  • @SubhraPatra if you are happy with my answer, please, consider accepting it. – utobi Oct 11 '22 at 19:37