How can I compare two zero inflated continuous datasets?

Question

I have two zero-inflated datasets such as,

dt1= 0, 0.1, 0.125, 0, 0, 1.25... 
dt2= 1.01, 0, 0, 0.25, 0,...

I want to test for a difference between them, for instance with something like t.test. How can I compare these two datasets?

A minor point about terminology: I would argue that data themselves are not zero-inflated in the same way that data are not "non-parametric". Zero-inflation is a term that has meaning with respect to a certain model such as the Poisson. — COOLSerdash, Jul 29 '20 at 09:26

Stephan Kolassa · Answer 1 · 2020-07-29T09:11:07.860

4

You can probably use the standard $t$ test to compare means of zero inflated datasets. Unless you know what you are doing, I would use a $t$ test that does not assume equal variances.
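A minimal sketch of that test in R, on made-up data rather than the asker's. The helper `r_zero_gamma` and its parameter values are hypothetical and only serve to generate something to test. `t.test` runs the unequal-variance (Welch) version when `var.equal = FALSE`, which is also its default.

```r
# Hypothetical helper: n draws that are 0 with probability p_zero
# and Gamma(shape, rate) otherwise.
r_zero_gamma <- function(n, p_zero = 0.8, shape = 2, rate = 2) {
    ifelse(runif(n) < p_zero, 0, rgamma(n, shape, rate))
}

set.seed(1)
dt1 <- r_zero_gamma(100)                # made-up group 1
dt2 <- r_zero_gamma(100, p_zero = 0.7)  # made-up group 2 with fewer zeros

# var.equal = FALSE requests the Welch test (unequal variances)
welch <- t.test(dt1, dt2, var.equal = FALSE)
print(welch)
```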

As an illustration, let's simulate some zero inflated data, where $X=0$ with probability $0.8$ and $X\sim\Gamma(2,2)$ otherwise. The first histogram produced by the R code below shows one such sample.

Even with such a high amount of zero inflation, the mean of a sample of size $n=100$ is nicely almost-normally distributed (second histogram), which is what the $t$ test requires.

You may want to bootstrap means within each group, plot them, and eyeball them to reassure yourself whether the histogram is nicely normal.
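One way this bootstrap check might look for a single group is sketched below. The data are simulated stand-ins, and the number of resamples (`n_boot`) is an arbitrary choice; repeat the same steps for the second group.

```r
set.seed(42)
# Made-up group: zero with probability 0.8, Gamma(2, 2) otherwise
x <- ifelse(runif(100) < 0.8, 0, rgamma(100, 2, 2))

n_boot <- 5000  # arbitrary number of bootstrap resamples
# Resample with replacement and record the mean of each resample
boot_means <- replicate(n_boot, mean(sample(x, replace = TRUE)))

hist(boot_means, main = "Bootstrapped means", xlab = "")
# Points close to the line suggest the means are roughly normal
qqnorm(boot_means)
qqline(boot_means)
```

A normal Q-Q plot is added here because departures from normality in the tails are often easier to spot there than in a histogram.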

R code for the simulation:

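n_sims <- 1e5       # number of simulated datasets
n_sample <- 100     # observations per dataset
means <- numeric(n_sims)
for ( ii in seq_len(n_sims) ) {
    set.seed(ii)    # for reproducibility
    zeros <- runif(n_sample) < 0.8    # TRUE where the observation is an exact zero
    # zeros first, then Gamma(2,2) draws for the remaining observations
    foo <- c(rep(0, sum(zeros)), rgamma(sum(!zeros), 2, 2))
    means[ii] <- mean(foo)
}
# foo holds the dataset from the last iteration
hist(foo, main="Sample zero inflated dataset", xlab="")
hist(means, xlab="")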

Whether such a comparison of means is useful and informative in the context of zero inflation is a different question. Consider also comparing the proportion of zeros, or fitting an appropriate mixture model and comparing the respective components.
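As a rough sketch of looking at the components separately, the code below splits each group into its zeros and its nonzero values. This is a simple two-part split, not a fitted mixture model, and the groups are simulated under assumed parameters.

```r
set.seed(2)
# Made-up groups: zero with probability 0.8 / 0.7, Gamma(2, 2) otherwise
g1 <- ifelse(runif(200) < 0.8, 0, rgamma(200, 2, 2))
g2 <- ifelse(runif(200) < 0.7, 0, rgamma(200, 2, 2))

# Part 1: share of exact zeros in each group
prop_zero <- c(g1 = mean(g1 == 0), g2 = mean(g2 == 0))
print(prop_zero)

# Part 2: compare only the nonzero values with a Welch test.
# The nonzero part is skewed, so a rank-based test such as
# wilcox.test() is a possible alternative.
nz1 <- g1[g1 > 0]
nz2 <- g2[g2 > 0]
nonzero_test <- t.test(nz1, nz2, var.equal = FALSE)
print(nonzero_test)
```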

edited Jul 29 '20 at 09:11

answered Jul 29 '20 at 07:42


Would the downvoter be so kind to explain what about my answer is not useful? – Stephan Kolassa Jul 29 '20 at 10:09
Hi, Stephan. Could you explain why mean comparison wouldn't be useful and informative in this context? What would you propose instead? – Parseval Dec 02 '21 at 12:12
@Parseval: it depends on what question you are interested in. If your zero inflated dataset is 90% zeros, then the overall mean is very much dominated by this, and you may be more interested in the mean of the nonzero entries. Or in quantiles. For instance, your data may be responses to some marketing campaign, where most targets do not respond at all, and you are more interested in the actual responses (= nonzero purchases, clicks or whatever). – Stephan Kolassa Dec 02 '21 at 12:17
Indeed that is my case. An A/B test between two groups. Group A has been exposed to a marketing campaign and group B has not. I want to find if there is any difference in their total spending as a result of the campaign that lasted a given period of time. Obviously both groups contain lots of zeros since the majority of the customers place zero orders (many are one time purchasers). – Parseval Dec 02 '21 at 12:21
@Parseval: exactly. So you might also be interested in comparing the proportion of zeros (non-responders) between the two groups. – Stephan Kolassa Dec 02 '21 at 12:56
What is the name of such a test? Or do I only do a permutation test and construct a distribution of the proportions of 0 and check where the observed distributions land? – Parseval Dec 02 '21 at 13:39
@Parseval: I don't know of a specific test on the proportions of zeros. You could do a simple $\chi^2$ test on a table of zeros vs. non-zeros: https://en.wikipedia.org/wiki/Pearson's_chi-squared_test. A permutation test would be a reasonable alternative. – Stephan Kolassa Dec 02 '21 at 15:05
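Both suggestions in the last comment can be sketched as follows, on simulated stand-in data for groups A and B. The group sizes, zero probabilities and number of permutations are assumptions made for illustration.

```r
set.seed(3)
a <- ifelse(runif(150) < 0.8, 0, rgamma(150, 2, 2))  # made-up group A
b <- ifelse(runif(150) < 0.7, 0, rgamma(150, 2, 2))  # made-up group B

values <- c(a, b)
group  <- rep(c("A", "B"), times = c(length(a), length(b)))

# 2 x 2 table of group by zero / nonzero, then Pearson's chi-squared test
# (chisq.test applies a continuity correction to 2 x 2 tables by default)
tab <- table(group, is_zero = values == 0)
chi <- chisq.test(tab)
print(chi)

# Permutation test: shuffle the group labels and recompute the
# difference in the share of zeros each time
diff_zero <- function(v, g) mean(v[g == "A"] == 0) - mean(v[g == "B"] == 0)
observed <- diff_zero(values, group)
perm <- replicate(5000, diff_zero(values, sample(group)))
p_perm <- mean(abs(perm) >= abs(observed))  # two-sided p-value
print(p_perm)
```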

score 0 · Answer 2 · answered Jan 25 '24 at 02:36

Instead of looking at the distribution of the mean (from bootstrap samples), would it be more appropriate to treat the skewed distribution in two parts and look at the distribution of P(data > 0) * median(data after removing zeros)?
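For readers who want to see what that statistic looks like under resampling, here is a minimal sketch on simulated data. The function name `two_part_stat` is hypothetical, and returning 0 when a resample has no nonzero values is an assumption, not something specified above.

```r
set.seed(4)
x <- ifelse(runif(100) < 0.8, 0, rgamma(100, 2, 2))  # made-up data

# P(data > 0) * median of the nonzero values
two_part_stat <- function(v) {
    nz <- v[v > 0]
    if (length(nz) == 0) return(0)  # assumption: statistic is 0 with no nonzeros
    mean(v > 0) * median(nz)
}

# Bootstrap distribution of the statistic
boot_stat <- replicate(5000, two_part_stat(sample(x, replace = TRUE)))
hist(boot_stat, main = "Bootstrapped P(x > 0) * median(x | x > 0)", xlab = "")
print(quantile(boot_stat, c(0.025, 0.975)))  # percentile interval
```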

user1753581

This seems like it is better left as a comment than an answer (as the solution to their problem here isn't clear and could perhaps use some elaboration/citations to support your point). Of course you do not have enough reputation yet to do that, but regardless this is not adequate (in my opinion) as an answer to the query. – Shawn Hemelstrand Jan 25 '24 at 03:25
