Z-test with a lot of null values

Question

I have a question concerning the validity of my email tests.

I'm testing different email variations (usually between 2 and 4) on different groups of members (hundreds of thousands of members per group), and I'm measuring the revenue per member.

For each group I know the average revenue per member and its standard deviation, and I use a Z-test to assess whether the differences are significant.

Between 1% and 2% of the members in each group are buyers, which means that about 98% of the members have a revenue of 0.

My question is: is the Z-test still valid in this case, given that the data take only a few distinct values?
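For reference, here is a minimal sketch of the kind of Z-test the question describes, computed only from per-group summary statistics (mean, standard deviation, group size). It assumes an unpooled standard error, meaning each group keeps its own standard deviation. The function name `two_sample_z` and all the numbers are hypothetical; the question gives no actual figures.

```python
import math

def two_sample_z(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Two-sided two-sample z-test for a difference in means, from summary statistics."""
    # Standard error of the difference in means (unpooled: each group keeps its own SD).
    se = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    z = (mean_a - mean_b) / se
    # Two-sided p-value under a standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# Hypothetical summary statistics (revenue per member), not taken from the question.
z, p = two_sample_z(1.10, 12.0, 200_000, 1.00, 11.5, 200_000)
print(f"z = {z:.2f}, two-sided p = {p:.4f}")
```

The zeros are not treated specially here. They enter through each group's mean and standard deviation like any other value.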

Are you testing the differing proportions between groups or the differing revenue? — gung - Reinstate Monica, Mar 16 '15 at 15:57

score 1 · Answer 1 · answered Mar 16 '15 at 15:54

First define your question. The mean revenue per member (whatever that means) is not the same as the mean revenue per member who contributes positive revenue. Comparing either pair of means could make practical sense.
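The two means are linked. The mean revenue over all members equals the share of buyers times the mean revenue per buyer, so a difference in the first can come from more buyers, bigger purchases, or both. Here is a small sketch of both quantities on a hypothetical group. The helper `revenue_means` is illustrative only.

```python
def revenue_means(revenues):
    """Return (mean over all members, mean over buyers only, share of buyers)."""
    buyers = [r for r in revenues if r > 0]
    mean_all = sum(revenues) / len(revenues)  # includes the zeros
    # Positive revenue only; undefined (nan) if nobody bought anything.
    mean_buyers = sum(buyers) / len(buyers) if buyers else float("nan")
    share_buyers = len(buyers) / len(revenues)
    return mean_all, mean_buyers, share_buyers

# Hypothetical group of 100 members: 98 spend nothing, two buyers spend 40 and 60.
mean_all, mean_buyers, share_buyers = revenue_means([0.0] * 98 + [40.0, 60.0])
print(mean_all, mean_buyers, share_buyers)  # 1.0 50.0 0.02
# Identity: mean_all == share_buyers * mean_buyers
```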

Either way, if your sample sizes run into the thousands, almost any difference will qualify as significant at conventional levels.
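To see why, hold the difference in means and the standard deviation fixed and let the group size grow. With equal groups, the standard error of the difference is `sd * sqrt(2 / n)`, so z grows like the square root of n. The values below (SD of 10 in revenue per member, difference of 0.10) are hypothetical.

```python
import math

def z_for_fixed_difference(diff, sd, n_per_group):
    """z statistic for a fixed difference in means, equal SDs and equal group sizes."""
    se = sd * math.sqrt(2.0 / n_per_group)  # standard error of the difference in means
    return diff / se

# Hypothetical: SD of revenue per member is 10, variants differ by 0.10 in mean revenue.
for n in (1_000, 10_000, 100_000, 1_000_000):
    print(n, round(z_for_fixed_difference(0.10, 10.0, n), 2))
# 1000 0.22
# 10000 0.71
# 100000 2.24
# 1000000 7.07
```

The same small difference goes from clearly non-significant to far past 1.96. So with hundreds of thousands of members per group, it is worth asking whether a significant difference is also large enough to matter.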

The number of distinct values is not really an issue in itself. With only one distinct value there would be no variation to compare with, but that is not your situation. One worry is the possibly extreme skewness of your data, but t-tests usually behave quite well so long as the samples are not small; nevertheless, watch out.
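One rough way to gauge the skewness worry: if single values have skewness γ, the mean of n independent values has skewness γ/√n. The sketch below assumes, hypothetically, that 1.5% of members buy and that every buyer spends the same amount. Under that assumption, revenue is a rescaled 0/1 variable. Real variation in spend among buyers will change the exact figure.

```python
import math

def bernoulli_skewness(p):
    """Skewness of a 0/1 variable with P(1) = p; c * B (c > 0) has the same skewness."""
    return (1 - 2 * p) / math.sqrt(p * (1 - p))

def skewness_of_mean(gamma, n):
    """Skewness of the mean of n i.i.d. draws whose skewness is gamma."""
    return gamma / math.sqrt(n)

# Hypothetical: 1.5% of members buy and every buyer spends the same amount.
gamma = bernoulli_skewness(0.015)  # about 7.98 for a single member's revenue
for n in (100, 10_000, 200_000):
    print(n, round(skewness_of_mean(gamma, n), 3))
```

Individual revenues are very skewed, but with groups in the hundreds of thousands the skewness of the group mean is close to zero.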

I'd always use a t-test rather than a z-test. When they give the same practical answer, there is no disadvantage; when they give different answers, it's because the sample size is small, and then the t-test is preferred.
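The statistic itself need not change. Computed from summary statistics with unpooled variances, Welch's t statistic is the same number as the z statistic. Only the reference distribution differs: t with Welch-Satterthwaite degrees of freedom instead of the standard normal. Here is a sketch with hypothetical numbers, comparing large and small groups.

```python
import math

def welch_t_and_df(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom from summary statistics."""
    va, vb = sd_a ** 2 / n_a, sd_b ** 2 / n_b   # squared standard errors of each mean
    t = (mean_a - mean_b) / math.sqrt(va + vb)  # same number as the unpooled z statistic
    df = (va + vb) ** 2 / (va ** 2 / (n_a - 1) + vb ** 2 / (n_b - 1))
    return t, df

# Hypothetical summary statistics: large groups versus small groups.
for n in (200_000, 20):
    t, df = welch_t_and_df(1.10, 12.0, n, 1.00, 11.5, n)
    print(f"n per group = {n}: t = {t:.3f}, df = {df:.1f}")
```

With hundreds of thousands of members per group, the degrees of freedom are in the hundreds of thousands and the t distribution is practically normal, so the two tests agree. With 20 per group, df is about 38, and the two-sided 5% critical value of t is about 2.02 rather than 1.96. For a p-value, SciPy's `scipy.stats.ttest_ind_from_stats(..., equal_var=False)` performs Welch's test from the same summary statistics.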

Nick Cox

What do you mean by positive revenue? Here the revenue is the amount a customer spends.
The distribution of revenue within a group is indeed skewed. Correct me if I'm wrong, but here (as the Z-test relies on the Central Limit Theorem) what is supposed to follow a normal distribution is the difference in means between groups, not the revenue within the groups, right?

I understand the idea of using the t-test in general (so you don't have to worry about the sample size), but in this case, as I have a big sample, what are the advantages of using a t-test?

Thanks for your reply.
– ClydeX Mar 16 '15 at 16:19
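The point in the comment above can be checked by simulation. The normal approximation is needed for the sampling distribution of the mean (and hence of the difference in means), not for individual revenues. The sketch below uses a hypothetical revenue model: each member buys with probability 1.5%, and a buyer's spend is exponential with mean 50. Groups have only 2,000 members to keep the run short; real groups of hundreds of thousands would make the group means even closer to normal.

```python
import random
import statistics

def sample_skewness(xs):
    """Moment-based sample skewness g1 = m3 / m2**1.5."""
    m = statistics.fmean(xs)
    m2 = sum((x - m) ** 2 for x in xs) / len(xs)
    m3 = sum((x - m) ** 3 for x in xs) / len(xs)
    return m3 / m2 ** 1.5

def simulate(n_members=2_000, reps=500, p_buy=0.015, mean_spend=50.0, seed=1):
    """Hypothetical zero-inflated revenue model; returns (all raw revenues, one mean per group)."""
    rng = random.Random(seed)
    raw, means = [], []
    for _ in range(reps):
        # Each member buys with probability p_buy; a buyer's spend is exponential.
        group = [rng.expovariate(1 / mean_spend) if rng.random() < p_buy else 0.0
                 for _ in range(n_members)]
        raw.extend(group)
        means.append(statistics.fmean(group))
    return raw, means

raw, means = simulate()
print("skewness of individual revenues:", round(sample_skewness(raw), 2))
print("skewness of group means:        ", round(sample_skewness(means), 2))
```

Individual revenues come out heavily skewed, while the group means are much closer to symmetric. This is the sense in which the Central Limit Theorem rescues the test for large groups.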
Revenue is zero or positive, is it not? If it's not zero, then it's positive. I just think that using the t-test covers all the bases; that way you never have to make an arbitrary decision about when a z-test is allowed, valid or acceptable. In fact, I don't think I ever see z-tests in any research literature or statistical software manuals; your literature and software evidently vary. – Nick Cox Mar 16 '15 at 16:26
I see what you mean. Here I compare the mean revenue across all members (not only those with positive revenue). – ClydeX Mar 16 '15 at 16:37
