I have two zero-inflated datasets such as,
dt1= 0, 0.1, 0.125, 0, 0, 1.25...
dt2= 1.01, 0, 0, 0.25, 0,...
I want to test whether the two datasets differ, for instance with something like t.test. How can I compare them?
You can probably use the standard $t$ test to compare the means of zero-inflated datasets. Unless you have a good reason to believe the variances are equal, I would use the version of the $t$ test that does not assume equal variances (Welch's $t$ test).
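As a minimal sketch of that test in R: `t.test` uses the unequal-variance (Welch) version by default, and `var.equal = FALSE` just makes that explicit. The helper `sim_zi` and the simulated values below are placeholders for illustration only; substitute your own `dt1` and `dt2`.

```r
# Hypothetical zero-inflated samples; replace with your own dt1 and dt2.
set.seed(1)
sim_zi <- function(n, p_zero, shape = 2, rate = 2) {
  is_zero <- runif(n) < p_zero
  ifelse(is_zero, 0, rgamma(n, shape, rate))
}
dt1 <- sim_zi(100, 0.8)
dt2 <- sim_zi(100, 0.6)

# var.equal = FALSE (the default) requests Welch's unequal-variance t test
res <- t.test(dt1, dt2, var.equal = FALSE)
print(res)
```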
As an illustration, let's simulate some zero-inflated data, where $X=0$ with probability $0.8$ and $X\sim\Gamma(2,2)$ (shape 2, rate 2) otherwise; the first histogram in the R code below shows one such sample.
Even with this much zero inflation, the mean of a sample of size $n=100$ is approximately normally distributed, which is what the $t$ test relies on; the second histogram in the R code below shows this.
You may want to bootstrap the means within each group, plot them, and check by eye whether the histogram looks approximately normal.
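A rough sketch of that bootstrap check for a single group, assuming a simulated sample `x` in place of your data (the number of resamples, 5000, is an arbitrary choice). Repeat it for each group.

```r
set.seed(1)
# Hypothetical sample: about 80% zeros, Gamma(shape = 2, rate = 2) otherwise.
# Replace x with one of your own groups.
x <- ifelse(runif(100) < 0.8, 0, rgamma(100, 2, 2))

# Resample x with replacement and record the mean of each resample
n_boot <- 5000
boot_means <- replicate(n_boot, mean(sample(x, replace = TRUE)))

# Visual checks: histogram and normal Q-Q plot of the bootstrapped means
hist(boot_means, main = "Bootstrapped means", xlab = "")
qqnorm(boot_means)
qqline(boot_means)
```

If the histogram is clearly skewed or the Q-Q plot bends away from the line, the normal approximation behind the $t$ test is more doubtful for that sample size.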
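R code for the simulation and the two histograms:
n_sims <- 1e5      # number of simulated datasets
n_sample <- 100    # observations per dataset
means <- rep(NA, n_sims)
for (ii in 1:n_sims) {
  set.seed(ii) # for reproducibility
  zeros <- runif(n_sample) < 0.8   # TRUE where the observation is a zero
  foo <- c(rep(0, sum(zeros)), rgamma(sum(!zeros), 2, 2))
  means[ii] <- mean(foo)
}
# foo now holds the last simulated dataset
hist(foo, main = "Sample zero inflated dataset", xlab = "")
hist(means, xlab = "")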
Whether such a comparison of means is useful and informative in the context of zero inflation is a different question. Consider also comparing the proportions of zeros, or fitting an appropriate mixture model to each group and comparing the respective components.
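A sketch of the simpler of these suggestions, comparing the proportions of zeros with `prop.test`. The rank test on the nonzero values is an additional choice made here for illustration, not something prescribed above, and `g1` and `g2` are simulated placeholders for your two groups.

```r
set.seed(2)
# Hypothetical groups with different zero probabilities (0.8 vs 0.6)
g1 <- ifelse(runif(100) < 0.8, 0, rgamma(100, 2, 2))
g2 <- ifelse(runif(100) < 0.6, 0, rgamma(100, 2, 2))

# Part 1: do the proportions of zeros differ?
zero_counts <- c(sum(g1 == 0), sum(g2 == 0))
sizes <- c(length(g1), length(g2))
zero_test <- prop.test(zero_counts, sizes)
print(zero_test)

# Part 2: do the nonzero values differ? (Wilcoxon rank-sum test, an
# assumption of this sketch; a t test on the positives is another option)
pos_test <- wilcox.test(g1[g1 > 0], g2[g2 > 0])
print(pos_test)
```

The two parts answer different questions, so it is worth reporting both rather than combining them into a single p-value.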
Instead of looking at the distribution of the mean (from bootstrap samples), would it be more appropriate to treat the skewed distribution in two parts and look at the distribution of P(data > 0) * median(data after removing zeros)?
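One way to explore this suggestion is to bootstrap that statistic in each group and look at the difference. This is only a sketch: the function name `two_part`, the simulated groups, and the percentile interval are all assumptions made for illustration, and whether this statistic is preferable to the mean is left open.

```r
set.seed(3)
# Hypothetical groups; replace with your own data
g1 <- ifelse(runif(100) < 0.8, 0, rgamma(100, 2, 2))
g2 <- ifelse(runif(100) < 0.6, 0, rgamma(100, 2, 2))

# P(x > 0) times the median of the nonzero values (0 if there are none)
two_part <- function(x) {
  pos <- x[x > 0]
  if (length(pos) == 0) return(0)
  mean(x > 0) * median(pos)
}

# Bootstrap the between-group difference in the statistic
boot_diff <- replicate(5000, {
  two_part(sample(g1, replace = TRUE)) - two_part(sample(g2, replace = TRUE))
})

# Percentile 95% interval for the difference
ci <- quantile(boot_diff, c(0.025, 0.975))
print(ci)
hist(boot_diff, main = "Bootstrapped difference", xlab = "")
```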