
Suppose the application has a click rate of 5% and a new version may improve it. Suppose a frequentist approach is used. To estimate the sample size, the click rate of the new version is estimated at 6%, which means a relative MDE (minimum detectable effect) of 20% (5% -> 6%). Using https://www.evanmiller.org/ab-testing/sample-size.html I get that the A/B test needs 7,663 samples per variation.
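For reference, the sample-size figure can be approximated in a few lines. This is only a sketch: it assumes a two-sided test with a significance level of 5% and a power of 80% (the question does not state the settings used), and the function name `sample_size_per_variation` is made up for illustration. It uses a common normal-approximation formula, which is not necessarily the exact one behind the calculator, although with these assumptions it lands close to 7,663.

```python
from statistics import NormalDist


def sample_size_per_variation(baseline, relative_mde, alpha=0.05, power=0.80):
    """Approximate per-variation sample size for a two-sided two-proportion test.

    The variance under the null hypothesis is based on the baseline rate only;
    the variance under the alternative uses both the baseline and target rates.
    alpha and power are assumed defaults, not values taken from the question.
    """
    delta = baseline * relative_mde          # absolute effect, e.g. 0.01
    target = baseline + delta                # e.g. 0.06
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    sd_null = (2 * baseline * (1 - baseline)) ** 0.5
    sd_alt = (baseline * (1 - baseline) + target * (1 - target)) ** 0.5
    return (z_alpha * sd_null + z_beta * sd_alt) ** 2 / delta ** 2


n_planned = sample_size_per_variation(0.05, 0.20)
n_smaller_effect = sample_size_per_variation(0.05, 0.165)
print(round(n_planned), round(n_smaller_effect))
```

The second call shows how the required sample size grows when the same formula is fed a relative MDE of 16.5% instead of 20%.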

I ran the test and the outcome is

| Variation      | Clicks | Non-clicks | Click rate |
|----------------|--------|------------|------------|
| A (control)    | 402    | 7512       | 5.07%      |
| B (challenger) | 466    | 7412       | 5.91%      |

I entered the numbers into https://www.evanmiller.org/ab-testing/chi-squared.html and got that B is better with p = 0.0146.
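The test can also be checked by hand. The sketch below runs a Pearson chi-squared test without continuity correction on a 2x2 table, using only the standard library; the helper name `chi2_2x2` is hypothetical. Reading the table as clicks versus non-clicks gives a p-value of about 0.021. The quoted 0.0146 appears when the Non-clicks column is instead treated as the number of trials, so it may be worth double-checking which numbers went into which field. Both values are below 0.05.

```python
import math


def chi2_2x2(a_success, a_failure, b_success, b_failure):
    """Pearson chi-squared test (no continuity correction) on a 2x2 table.

    Returns the statistic and the p-value for one degree of freedom.
    """
    n = a_success + a_failure + b_success + b_failure
    det = a_success * b_failure - a_failure * b_success
    row_a = a_success + a_failure
    row_b = b_success + b_failure
    col_success = a_success + b_success
    col_failure = a_failure + b_failure
    chi2 = n * det ** 2 / (row_a * row_b * col_success * col_failure)
    # For 1 degree of freedom: P(X > chi2) = erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Table read as clicks vs. non-clicks
chi2_table, p_table = chi2_2x2(402, 7512, 466, 7412)

# Same numbers, but with 7512 and 7412 treated as total trials
chi2_trials, p_trials = chi2_2x2(402, 7512 - 402, 466, 7412 - 466)

print(f"clicks vs. non-clicks: chi2={chi2_table:.2f}, p={p_table:.4f}")
print(f"second column as trials: chi2={chi2_trials:.2f}, p={p_trials:.4f}")
```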

BUT the observed relative lift is only 16.5% (5.07% -> 5.91%), which is less than the 20% MDE used for the initial estimate. What does this say about the whole A/B test? And why?
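The 16.5% figure can be reproduced from the table, together with a rough measure of how uncertain it is. The sketch below assumes the second column really holds non-clicks and uses a simple Wald (normal-approximation) 95% interval for the absolute difference in click rates; other interval methods would give slightly different bounds.

```python
from statistics import NormalDist

clicks_a, nonclicks_a = 402, 7512
clicks_b, nonclicks_b = 466, 7412

n_a = clicks_a + nonclicks_a
n_b = clicks_b + nonclicks_b
rate_a = clicks_a / n_a
rate_b = clicks_b / n_b

# Observed relative lift of B over A
relative_lift = rate_b / rate_a - 1

# Wald 95% confidence interval for the absolute difference (unpooled variance)
diff = rate_b - rate_a
se = (rate_a * (1 - rate_a) / n_a + rate_b * (1 - rate_b) / n_b) ** 0.5
z = NormalDist().inv_cdf(0.975)
ci_low, ci_high = diff - z * se, diff + z * se

print(f"relative lift: {relative_lift:.1%}")
print(f"absolute difference: {diff:.4f} (95% CI {ci_low:.4f} to {ci_high:.4f})")
```

Under these assumptions the interval for the absolute difference runs from roughly 0.1 to 1.5 percentage points, so it covers both the observed difference and the 1 percentage point used for planning.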

Options I came up with:

  1. The outcome is significant and the new version should be used, since the MDE is only used for the initial sample-size estimate.
  2. The outcome is not significant and the new version should not be used.
    1. The whole test needs to be re-run with the lower MDE (which means a larger sample size).
    2. The existing data can be re-used and extended to the larger sample size (because of the lower MDE).
    3. ???
  3. ???
rfalke