How to analyze percent data when most of the data is 0%?

Question

I have a set of percentage data on histological abnormalities in fish gills, and I need to compare the results between two sites, A and B. We analyzed abnormalities in the lamellae of a particular gill arch. For example, for fish 1 we counted 500 lamellae and noted the number that had abnormality Y. The majority of the data is 0%; that is, in most fish, none of the lamellae counted had the abnormality. When I try different transformations, the distribution does not change because of all the 0s (I have attached a graph of the distribution of the data). I do not know how to analyze this data.

[Figure: graph of the distribution of the data]

It sounds like you essentially have scaled count data. It should probably be analyzed as count data, perhaps as a binomial GLM or possibly a zero-inflated binomial — Glen_b, Oct 05 '14 at 04:50
It's not correct to say that transformation has no effect on such data. What is correct is that any transformation you could try will map a spike of zeros in the distribution to a corresponding spike in the transformed distribution. That's less of a problem than you may think, as other methods of analysis are available anyway. I would start with a comparison between sites in terms of the % of fish with abnormalities and the mean abnormality of the latter. — Nick Cox, Oct 05 '14 at 07:54
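
The two comments above can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the per-fish counts are made up, the function names are hypothetical, and the binomial model has site as its only predictor, so its fitted proportions are simply the pooled proportions at each site. The first function gives the two-part comparison (% of fish affected, and mean % abnormal among affected fish); the second gives a likelihood-ratio test for a site effect on the lamella counts.

```python
import math
from statistics import mean

# Hypothetical data: (abnormal lamellae, lamellae counted) for each fish.
site_a = [(0, 500), (0, 500), (3, 500), (0, 480), (12, 500), (0, 510)]
site_b = [(0, 500), (7, 500), (0, 490), (25, 500), (9, 500), (0, 500)]


def two_part_summary(fish):
    """Return (% of fish with any abnormality, mean % abnormal among those fish)."""
    affected = [100 * y / n for y, n in fish if y > 0]
    share = 100 * len(affected) / len(fish)
    return share, (mean(affected) if affected else float("nan"))


def xlogy(x, y):
    """x * log(y), using the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def binom_loglik(y, n):
    """Binomial log-likelihood at the fitted proportion y / n (constant dropped)."""
    p = y / n
    return xlogy(y, p) + xlogy(n - y, 1 - p)


def site_lr_test(fish_a, fish_b):
    """Likelihood-ratio test: one common proportion vs. one proportion per site."""
    ya, na = sum(y for y, _ in fish_a), sum(n for _, n in fish_a)
    yb, nb = sum(y for y, _ in fish_b), sum(n for _, n in fish_b)
    separate = binom_loglik(ya, na) + binom_loglik(yb, nb)
    common = binom_loglik(ya + yb, na + nb)
    lr = max(0.0, 2 * (separate - common))
    # Survival function of a chi-square with 1 degree of freedom.
    return lr, math.erfc(math.sqrt(lr / 2))


summary_a = two_part_summary(site_a)
summary_b = two_part_summary(site_b)
lr_stat, p_value = site_lr_test(site_a, site_b)
print("Site A (% affected, mean % if affected):", summary_a)
print("Site B (% affected, mean % if affected):", summary_b)
print("LR statistic:", lr_stat, "p-value:", p_value)
```

Pooling the counts treats every lamella as independent, which ignores fish-to-fish variation, so the p-value from this sketch is likely too optimistic. That is one reason a zero-inflated or overdispersed model may be preferable.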

score 2 · Answer 1 · answered Oct 05 '14 at 03:50

It seems like you'd want to convert it to a binary variable (abnormality present vs. absent in each fish), then compare frequencies between sites with logistic regression or similar.
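
As one possible reading of "or similar", the binarized comparison can be run as a two-sided Fisher exact test on the 2x2 table of site by affected/unaffected fish. The sketch below uses only the Python standard library; the function name and the fish counts are hypothetical and not taken from the question.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are sites; columns are fish with and without the abnormality.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x affected fish in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are no more probable than the observed one;
    # the small tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))


# Hypothetical counts: 4 of 20 fish affected at site A, 11 of 20 at site B.
p_value = fisher_exact_two_sided(4, 16, 11, 9)
print("Fisher exact p-value:", p_value)
```

Binarizing discards the information on how many lamellae are affected in each fish, so this comparison only addresses whether the abnormality occurs more often at one site, not how severe it is.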

Sean Murphy

score 0 · Answer 2 · answered Aug 22 '18 at 20:30

Fish are the measurement units, and each fish has its own total. Proportions may be modeled by the beta distribution. In particular, this data may be modeled with a beta distribution on [0, 1), where 0 is included and 1 is not. The analysis can proceed in two steps:

1) Estimate the proportions of zeros and of values greater than zero with a finite mixture model. See the SAS manual for the FMM procedure. For a similar case, I used this code:

    proc fmm data=ddd componentinfo technique=trureg;
      class case;
      model percent = case / noint dist=beta k=1;
      model + / dist=constant k=2;
      probmodel;
    run;

Instead of k=2, you might need to use k=1.

2) Using DOI: 10.1037/1082-989X.11.1.54 and these two references, you would be able to ascertain differences among sites:
http://support.sas.com/resources/papers/proceedings11/335-2011.pdf
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.470.7064&rep=rep1&type=pdf

Alternatively (not tested), you could use the 'metamix' module for Stata or the 'zoib' package in R.
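
To make the idea behind step 1 concrete, here is a rough standard-library Python sketch of a zero-inflated beta model for a single group with no covariates. It is not a translation of the PROC FMM call: the data and function names are hypothetical, the zero probability is taken as the observed share of zeros, and the beta parameters come from the method of moments on the positive values rather than from maximum likelihood.

```python
import math
from statistics import mean, variance

# Hypothetical proportions of abnormal lamellae per fish, all in [0, 1).
props = [0.0, 0.0, 0.006, 0.0, 0.024, 0.0, 0.014, 0.05, 0.018, 0.0]


def fit_zero_inflated_beta(values):
    """Return (pi0, alpha, beta): zero probability and moment-based beta shapes."""
    positives = [v for v in values if v > 0]
    if len(positives) < 2:
        raise ValueError("need at least two positive proportions")
    pi0 = 1 - len(positives) / len(values)
    m, v = mean(positives), variance(positives)
    common = m * (1 - m) / v - 1
    if common <= 0:
        raise ValueError("variance too large for a beta distribution")
    return pi0, m * common, (1 - m) * common


def zib_loglik(values, pi0, alpha, beta):
    """Log-likelihood of the mixture: point mass at 0 plus a beta on (0, 1)."""
    log_norm = (math.lgamma(alpha + beta)
                - math.lgamma(alpha) - math.lgamma(beta))
    total = 0.0
    for v in values:
        if v == 0:
            total += math.log(pi0)
        else:
            total += (math.log(1 - pi0) + log_norm
                      + (alpha - 1) * math.log(v)
                      + (beta - 1) * math.log(1 - v))
    return total


pi0, alpha, beta = fit_zero_inflated_beta(props)
loglik = zib_loglik(props, pi0, alpha, beta)
print("P(zero):", pi0, "alpha:", alpha, "beta:", beta, "log-likelihood:", loglik)
```

Fitting each site separately this way gives a quick descriptive comparison of the zero share and the shape of the positive part. A formal test of site differences would need the full mixture model with site as a covariate, as in the SAS code above or the packages mentioned.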
