Some basic overview:

  • Problem Statement: Comparing average transactions per user between two groups
  • Group 1 Size: 6.5M
  • Group 2 Size: 500K

Now, it's been a month, and we've seen roughly the following number of unique users with a transaction in each group:

  • Group 1: 500K
  • Group 2: 25K

Here are our initial results:

Group 1 Stats:

  • Mean: 2.75
  • SD: 4.6

Group 2 Stats:

  • Mean: 2.81
  • SD: 4.9

Now, my initial thought was to run a Z-Test but I had some questions:

  • I'm starting to think it is too soon to analyze results. Only <5% of our users have made a transaction so far. The average time between transactions is 26 days with an SD of 47. It could be too early but I wanted to get your thoughts.
  • Am I going to run into issues comparing vastly different sample sizes in this case?
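
For reference, the z-test mentioned above can be sketched directly from the summary statistics. This assumes the means and SDs were computed over the transacting users only (500K and 25K), which the post does not state explicitly; the helper name `two_sample_z` is hypothetical, and the unpooled (Welch-style) standard error is one reasonable choice rather than the only one.

```python
import math

def two_sample_z(mean1, sd1, n1, mean2, sd2, n2):
    """Unpooled two-sample z-test from summary statistics (hypothetical helper)."""
    # Standard error of the difference in means, without pooling the variances.
    se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    z = (mean2 - mean1) / se
    # Two-sided p-value under the normal approximation: 2 * (1 - Phi(|z|)).
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two_sided

# Assumption: n1 and n2 are the counts of users with at least one transaction.
z, p = two_sample_z(2.75, 4.6, 500_000, 2.81, 4.9, 25_000)
print(f"z = {z:.2f}, two-sided p = {p:.3f}")
```

The standard error here is dominated by the smaller group, so the imbalance mainly limits precision rather than invalidating the test.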

I hope this is appropriate for this site.

Dave
madsthaks
  • One sample size being much bigger than another doesn't affect the validity of the Z/t test so long as both are sufficiently large for the Gaussian approximation to work (and independence within and between groups holds). More likely in your case is that you'll pretty much always reject due to having such large sample sizes, so you may not learn much from the hypothesis tests themselves (see e.g. https://stats.stackexchange.com/questions/2516/are-large-data-sets-inappropriate-for-hypothesis-testing) – jld Aug 24 '20 at 22:35
  • If sample sizes were about the same in each group, then what hypotheses would you want to test and what statistical test would you use? What would the results mean to you? As @jld says, you could use the same tests in spite of the imbalance in sample sizes. – BruceET Aug 24 '20 at 23:57
  • @jld - Thanks for sharing that. So, I wonder, in this case, if I let this experiment run long enough, would you draw conclusions for such small differences in means? – madsthaks Aug 26 '20 at 00:41
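
One way to look at whether such a small difference is worth acting on is to report its size and uncertainty rather than only a p-value. The sketch below computes a 95% confidence interval for the difference in means and a standardized effect size (Cohen's d) from the summary statistics in the question. It assumes the statistics cover the 500K and 25K transacting users, and `diff_summary` is a hypothetical helper, not something from the thread.

```python
import math

def diff_summary(mean1, sd1, n1, mean2, sd2, n2, z_crit=1.96):
    """Difference in means with a normal-approximation CI and Cohen's d."""
    diff = mean2 - mean1
    # Unpooled standard error of the difference in means.
    se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    ci = (diff - z_crit * se, diff + z_crit * se)
    # Pooled SD, used only to standardize the difference.
    pooled_sd = math.sqrt(
        ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    )
    cohens_d = diff / pooled_sd
    return diff, ci, cohens_d

# Assumption: group sizes are the counts of users with a transaction.
diff, ci, d = diff_summary(2.75, 4.6, 500_000, 2.81, 4.9, 25_000)
print(f"diff = {diff:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f}), d = {d:.4f}")
```

Under these assumptions the standardized difference is on the order of 0.01, so whether it matters is a question about practical significance more than statistical significance.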
