Still get non-positive values for the 'gamma' family

Question

I am posting this here, as suggested on Stack Overflow:

I am analyzing percentage data with glmer, and I have read that Gamma family should be suitable for this kind of data. I have checked my data and there are no values below 0, but I still get an error saying I have non-positive values.

  > summary(total_F$p.prcnt)
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
       0.00   50.00   75.00   68.56  100.00  100.00

Here is the code I used:

F_par1<- glmer(p.prcnt ~ b.element+distance+b.element*distance +year+sampling.round+(1|LS1),  
                   family = Gamma, 
                   data=total_F)

What are my options? I tried to modify my data in Excel to be proportional and use binomial, but I would prefer to use Gamma, if possible.

EDIT: I also can't remove the zeros, as they are meaningful.

$0$ is non-positive; that's why you get the error. See this answer for the most common ways to deal with percentage data. – statmerkur Oct 18 '22 at 11:07
My observation is a percentage based on infested larvae / total number of larvae. So if no larvae in the sample were infested, it is 0. I also had NA values, where there were no larvae at all, but I removed those samples. – Sisi Oct 18 '22 at 11:16
If you also have the total number of larvae, then, as suggested by @Stephan Kolassa below in the comments of his post, I would use binomial logistic regression on the counts. – utobi Oct 18 '22 at 11:40

score 1 · Answer 1 · answered Oct 18 '22 at 10:49

The gamma distribution has its support on the positive axis. Data generated by a gamma will be positive with probability 1. Essentially, a zero value can't come from a gamma. So the gamma is not a good choice for your data. ("Percentage data" sounds like it might be bounded at 1, too, which would be another reason not to use a gamma, because its support is unbounded to the right.)
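
A minimal base-R sketch of where the error comes from, using made-up values rather than the asker's data. It uses plain `glm` instead of `glmer` to stay self-contained; as far as I can tell, the same check inside the `Gamma` family object is what `glmer` trips over.

```r
# Made-up stand-in for the percentage response (NOT the asker's data).
y <- c(0, 50, 75, 100, 100, 25)
x <- seq_along(y)

# The Gamma family refuses any response value <= 0 before fitting starts.
has_nonpositive <- any(y <= 0)

# Capture the error message instead of stopping the script.
gamma_result <- tryCatch(
  glm(y ~ x, family = Gamma(link = "log")),
  error = function(e) conditionMessage(e)
)

print(has_nonpositive)
print(gamma_result)
```

A single zero is enough to trigger the message, which is why a summary showing `Min. 0.00` already explains the failure.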

You might want to look at zero-inflated models. The zero-inflated gamma is sometimes used; I just don't know a way of dealing with it in the context of a mixed model.
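
To make the idea concrete, one common way to write a zero-inflated gamma is as a two-part mixture. The symbols $p$, $\alpha$ and $\beta$ are introduced here only for illustration (they do not appear in the question): $p$ is the probability of a zero, and the positive values follow a gamma with shape $\alpha$ and rate $\beta$.

$$
P(Y = 0) = p, \qquad
f(y \mid Y > 0) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, y^{\alpha - 1} e^{-\beta y}, \quad y > 0 .
$$

The zeros come only from the point mass, never from the gamma part. Note that this still leaves the upper bound of the percentage scale unaddressed.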

Stephan Kolassa

Just a clarification: for continuous distributions such as the Gamma, I think the zero-inflation approach doesn't work since, as you also wrote, we will never observe zeros from the Gamma component. Compare the Poisson case: there we do observe zeros, just not enough of them. Am I right? Indeed, I was going to suggest deterministically inflating the zeros by some small $\epsilon>0$... a quick and dirty fix. – utobi Oct 18 '22 at 11:03
If I change glmer to glmmTMB and try a zero-inflated gamma (with family ziGamma), it still gives an error saying non-positive values are not allowed. – Sisi Oct 18 '22 at 11:24
@utobi: zero-inflated data (or, for that matter, data inflated at any other value) usually arise when there are two data generating processes at work. With a certain probability $p$, DGP1 yields a zero, and with probability $1-p$, DGP2 yields some other observation. The results from DGP2 can be gamma distributed, or anything else. As such, zero inflation is just a special case of a mixture distribution, with one of the components being a degenerate distribution. – Stephan Kolassa Oct 18 '22 at 11:25
Sisi, that sounds interesting. First off, I notice your data are indeed bounded above at 100% or 1, so you shouldn't be using the gamma distribution in the first place. Can glmmTMB deal with a beta distribution that is inflated at both zero and one? Also, since your data are a count out of a total count, it might make more sense to model this using a binomial distribution (with the total number of trials/larvae varying by site). Unfortunately, I'm not an expert there. – Stephan Kolassa Oct 18 '22 at 11:31
@StephanKolassa yes you are right. But perhaps the naming should be different... "Zero Injection". Thanks for the clarification! – utobi Oct 18 '22 at 12:04
Thank you! So would this be the correct way to do it?
F_par<- glmmTMB(Prop_par ~ b.element+distance+b.element*distance +year+sampling.round+(1|LS1), weights=dissected.larvae, family=binomial(link="logit"), data=total_F)
Or should I write it like this? – Sisi Oct 18 '22 at 12:17
F_par<- glmmTMB(cbind(total no. larvae/infested larvae)~ b.element+distance+b.element*distance +year+sampling.round+(1|LS1), weights=dissected.larvae, family=binomial(link="logit"), data=total_F) Meaning that I should use the two counts as separate variables, rather than transforming my values into a proportion (a number 0.xx, which is what Prop_par is in the first script), and then do the analysis? – Sisi Oct 18 '22 at 12:17
As I said, I'm not an expert in this direction. I would recommend you post a new question on how to model binomial data with a mixed model using R, and link to this question from that one. – Stephan Kolassa Oct 18 '22 at 15:16
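
On the two ways of writing the binomial response asked about in the last comments, here is a small base-R sketch with made-up data. The column names `infested`, `dissected` and `prop` are hypothetical, and it uses plain `glm` without the random effect to stay self-contained. To my knowledge `glmer` and `glmmTMB` accept the same two response forms with `(1|LS1)` added to the formula, but treat that as an assumption to check against their documentation.

```r
# Hypothetical toy data: names echo the thread, values are invented.
toy <- data.frame(
  infested  = c(0, 3, 5, 10, 8, 2),    # number of infested larvae
  dissected = c(4, 6, 10, 10, 12, 8),  # total larvae dissected
  distance  = c(1, 2, 3, 4, 5, 6)
)
toy$prop <- toy$infested / toy$dissected

# Form 1: two-column response of (successes, failures); no weights needed.
fit_counts <- glm(cbind(infested, dissected - infested) ~ distance,
                  family = binomial(link = "logit"), data = toy)

# Form 2: proportion as response, number of trials supplied as weights.
fit_prop <- glm(prop ~ distance, weights = dissected,
                family = binomial(link = "logit"), data = toy)

# Both forms encode the same likelihood, so the coefficients should agree.
same_fit <- isTRUE(all.equal(coef(fit_counts), coef(fit_prop),
                             tolerance = 1e-6))
print(same_fit)
```

Two details are worth noting. In the `cbind` form the second column is the number of failures (not infested), not the total. And a zero count is a perfectly valid binomial outcome, so the meaningful zeros stay in the data.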
