
In a typical A/B test the randomization unit is the user, but the analysis unit is sometimes the page or visit, as in a cluster-randomized experiment. In this situation the i.i.d. assumption doesn't hold, and methods such as cluster-robust standard errors, the delta method, and the cluster bootstrap have been proposed (A/B testing ratio of sums). However, each user may have multiple visits, and the outcome of a previous visit may determine whether the user visits again, which may cause temporal confounding. So is a cluster-randomized A/B test still the gold standard for causal inference?
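For concreteness, here is a minimal sketch of the delta-method standard error for a ratio of sums, treating each user as one cluster. The function name `ratio_with_delta_se` and the input lists are made up for illustration; it assumes users are independent and that `sales[i]` and `visits[i]` are per-user totals.

```python
import math
from statistics import mean, variance


def ratio_with_delta_se(sales, visits):
    """Ratio of sums (total sales / total visits) with a delta-method SE.

    sales[i] and visits[i] are totals for user i; users are the clusters.
    """
    n = len(sales)
    mean_s, mean_v = mean(sales), mean(visits)
    ratio = mean_s / mean_v  # same as sum(sales) / sum(visits)

    var_s, var_v = variance(sales), variance(visits)
    # Sample covariance between per-user sales and per-user visits.
    cov_sv = sum((s - mean_s) * (v - mean_v)
                 for s, v in zip(sales, visits)) / (n - 1)

    # First-order Taylor expansion of mean_s / mean_v around the true means.
    var_ratio = (var_s - 2 * ratio * cov_sv + ratio ** 2 * var_v) / (n * mean_v ** 2)
    return ratio, math.sqrt(max(var_ratio, 0.0))


# Hypothetical per-user totals for one arm.
ratio, se = ratio_with_delta_se(sales=[1, 0, 2, 1], visits=[3, 2, 5, 2])
```

The standard error here accounts for variation in the number of visits per user, which a naive visit-level binomial standard error would ignore.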

wei
  • Can you describe the experiment in more detail? An example of how exactly the experiment is done and what's the exact end goal (e.g. assessing number of unique visits to a webpage) could be really helpful. – Amin Shn Jun 18 '22 at 11:57

1 Answer


You have measurements on a set of users, and the number of measurements may differ from user to user.

In the simplest approach, you ignore the number of visits per user and treat the experiment as yielding a single measurement per user. For instance, your variable might be the total number of sales per user, which you can consider a random number drawn from a population whose distribution depends on whether the user is in case A or case B.
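A minimal sketch of that per-user approach: collapse visit-level records to one total per user, then compare the two groups with Welch's t statistic. The record layout `(user_id, arm, sales)` and the helper names are assumptions for illustration, and the numbers are invented.

```python
import math
from collections import defaultdict
from statistics import mean, variance


def per_user_totals(records):
    """Collapse visit-level records into one sales total per user, grouped by arm."""
    totals = defaultdict(float)
    arm_of = {}
    for user_id, arm, sales in records:
        totals[user_id] += sales
        arm_of[user_id] = arm  # each user belongs to exactly one arm
    by_arm = defaultdict(list)
    for user_id, total in totals.items():
        by_arm[arm_of[user_id]].append(total)
    return by_arm


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical visit-level data: (user_id, arm, sales in that visit).
records = [
    ("u1", "A", 1), ("u1", "A", 0), ("u2", "A", 2), ("u3", "A", 0),
    ("u4", "B", 0), ("u5", "B", 1), ("u5", "B", 0), ("u6", "B", 0),
]
by_arm = per_user_totals(records)
t_stat, dof = welch_t(by_arm["A"], by_arm["B"])
```

Because the unit of analysis now matches the unit of randomization, the observations are independent across users and no clustering correction is needed.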

Alternatively, to get some more power in the hypothesis test, you can treat this as a mixed effects model, with the individual visits as observations and a random effect per user.


How exactly to fit this mixed effects model will depend on the experiment and its goals. Do you want to improve the ratio of sales or the total number of sales?

A multidimensional view might also be useful. This was mentioned briefly in your linked question, in relation to the following image:

[image: example]

The image above was used to show how a small difference might be observed when only the ratio is considered, but it also shows that depending on how you look at the multidimensional picture you may end up with different conclusions.

The causal effects might be different from what you expect. If A causes an increase in the number of page visits, then it might decrease the ratio of sales per visit. But the total sales may still increase (because of the extra visits). If you look at the ratio of sales, then A performs worse; if you look at the total sales, then A performs better.
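A small arithmetic sketch of the case where A brings in more visits, lowering sales per visit while raising total sales. All numbers are hypothetical and chosen only to make the two metrics disagree.

```python
# Hypothetical arms: A generates more visits per user but converts less per visit.
arms = {
    "A": {"users": 1000, "visits_per_user": 3.0, "sales_per_visit": 0.10},
    "B": {"users": 1000, "visits_per_user": 2.0, "sales_per_visit": 0.12},
}


def summarize(arm):
    """Return the visit-level ratio and the user-level total for one arm."""
    visits = arm["users"] * arm["visits_per_user"]
    sales = visits * arm["sales_per_visit"]
    return {
        "sales_per_visit": sales / visits,        # the ratio metric
        "sales_per_user": sales / arm["users"],   # the per-user total metric
    }


summary = {name: summarize(arm) for name, arm in arms.items()}
for name, metrics in summary.items():
    print(name, metrics)
```

With these numbers A has the lower sales per visit but the higher sales per user, so which arm "wins" depends entirely on which metric matches the goal.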