Suppose the application has a click rate of 5% and a new version may improve it. Suppose a frequentist approach is used. To estimate the sample size, the click rate of the new version is assumed to be 6%, which corresponds to a relative MDE (minimum detectable effect) of 20% (5% -> 6%). Using https://www.evanmiller.org/ab-testing/sample-size.html, I get that the A/B test needs 7,663 samples per variation.
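For reference, here is a minimal sketch of how such a sample size can be computed by hand. It uses a common normal-approximation formula for a two-sided two-proportion test and assumes a 5% significance level and 80% power; those two settings, the function name `sample_size_per_variation`, and the formula itself are my assumptions rather than anything confirmed about the calculator, although the result lands close to the 7,663 figure.

```python
from statistics import NormalDist


def sample_size_per_variation(baseline, relative_mde, alpha=0.05, power=0.80):
    """Approximate per-variation sample size for a two-sided two-proportion test.

    alpha and power are assumed defaults, not values stated in the question.
    """
    delta = baseline * relative_mde          # absolute effect, e.g. 0.01
    treated = baseline + delta               # assumed rate of the new version
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    # Standard deviation of the difference under the null and the alternative
    sd_null = (2 * baseline * (1 - baseline)) ** 0.5
    sd_alt = (baseline * (1 - baseline) + treated * (1 - treated)) ** 0.5
    return (z_alpha * sd_null + z_beta * sd_alt) ** 2 / delta ** 2


# 5% baseline with a 20% relative MDE (5% -> 6%)
print(round(sample_size_per_variation(0.05, 0.20)))
```

Calling the same function with a smaller relative MDE gives a larger required sample size, which is what the options further down refer to.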
I ran the test and the outcome is:
| Variation | Clicks | Non-clicks | Click rate |
|---|---|---|---|
| A (control) | 402 | 7512 | 5.07% |
| B (challenger) | 466 | 7412 | 5.91% |
I enter the numbers into https://www.evanmiller.org/ab-testing/chi-squared.html and get that B is better with p = 0.0146.
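As a cross-check, the same kind of test can be sketched in a few lines. This is a plain Pearson chi-squared test on the 2x2 table without continuity correction, which is my assumption about what the calculator does; the function name `chi_squared_2x2` is made up for illustration. With the counts exactly as in the table, I get a two-sided p-value nearer 0.02 than 0.0146. One possible explanation is how the numbers were entered (the calculator may expect total trials rather than non-clicks), so that is worth double-checking. Either way the result is below 0.05.

```python
import math


def chi_squared_2x2(clicks_a, nonclicks_a, clicks_b, nonclicks_b):
    """Pearson chi-squared test for a 2x2 table, no continuity correction."""
    a, b, c, d = clicks_a, nonclicks_a, clicks_b, nonclicks_b
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of a chi-squared distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Counts taken from the table: clicks and non-clicks per variation
chi2, p_value = chi_squared_2x2(402, 7512, 466, 7412)
print(f"chi2 = {chi2:.3f}, p = {p_value:.4f}")
```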
BUT the measured relative effect is only 16.5% (5.07% -> 5.91%), which is less than the 20% MDE used for the initial estimate. What does this say about the whole A/B test? And why?
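The 16.5% figure is just the relative difference between the two measured click rates. A small sketch of the arithmetic, using the counts from the table (the helper name `relative_lift` is hypothetical):

```python
def relative_lift(clicks_a, total_a, clicks_b, total_b):
    """Relative difference of B's click rate over A's click rate."""
    rate_a = clicks_a / total_a
    rate_b = clicks_b / total_b
    return rate_b / rate_a - 1


# Totals are clicks plus non-clicks for each variation
lift = relative_lift(402, 402 + 7512, 466, 466 + 7412)
print(f"{lift:.1%}")
```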
Options I came up with:
- The outcome is significant and the new version should be used since the MDE is only used for the initial estimation.
- The outcome is not significant and the new version should not be used.
- The whole test needs to be re-run with the lower MDE (which means a larger sample size).
- The existing data can be reused and extended to the larger sample size (because of the lower MDE).
- ???
- ???