I have a dataset from an AB test for clickthroughs on a website. We randomly divided users into A and B groups and counted an observation each time a user viewed the webpage. Each observation is one viewing of the webpage with 1 or 0 in the clicked_through column if the user clicked through or didn't, respectively. A sample dataset (for either the A or B group) is given below. Notice that some users viewed the webpage more than once during the test period.
user_id clicked_through
------- ---------------
user_1 0
user_2 1
user_2 0
user_3 0
user_3 1
user_3 0
user_4 0
user_5 0
user_5 1
user_6 1
We want to test whether the clickthrough rate (CTR) for the B group is greater than the CTR for the A group at a given significance level. I don't think we can use an ordinary two-proportion Z-test because observations from the same user are not independent. I've seen similar questions (such as this one) where the Delta Method and bootstrapping were recommended. I'd like to use bootstrapping because I want to learn more about it generally. However, I'm not sure exactly how to apply it to this data. Sampling rows with replacement seems wrong because it treats every page view as an independent draw. In that case the sampling distribution of the CTR is already known, so the confidence intervals could be derived analytically and would probably just replicate the ordinary two-proportion Z-test.
My idea is to randomly choose users from each group (with replacement) to create the bootstrapped datasets. For example, in one iteration we might choose users 1, 2, 3, 2, 6, 3 from the above dataset, giving this bootstrapped dataset:
user_id clicked_through
------- ---------------
user_1 0
user_2 1
user_2 0
user_3 0
user_3 1
user_3 0
user_2 1
user_2 0
user_6 1
user_3 0
user_3 1
user_3 0
I would compute CTR for each bootstrapped dataset to create confidence intervals for the A and B datasets, then compare the overlap of the confidence intervals to gauge significance.
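A minimal Python sketch of the user-level resampling described above, assuming NumPy and pandas are available. The function name `bootstrap_ctr_by_user`, the parameters `n_boot` and `seed`, and the choice of a 95% percentile interval are illustrative assumptions, not part of the original data or setup. The toy data is the sample table from the start of the question.

```python
import numpy as np
import pandas as pd


def bootstrap_ctr_by_user(df, n_boot=2000, seed=0):
    """Return n_boot bootstrap replicates of CTR, resampling whole users.

    Hypothetical helper: assumes columns `user_id` and `clicked_through`.
    """
    rng = np.random.default_rng(seed)
    # Collapse to one row per user: total clicks and total page views.
    per_user = df.groupby("user_id")["clicked_through"].agg(["sum", "count"])
    clicks = per_user["sum"].to_numpy()
    views = per_user["count"].to_numpy()
    n_users = len(per_user)

    ctrs = np.empty(n_boot)
    for b in range(n_boot):
        # Draw users with replacement; a drawn user brings all of their views,
        # and a user drawn twice contributes their views twice.
        idx = rng.integers(0, n_users, size=n_users)
        ctrs[b] = clicks[idx].sum() / views[idx].sum()
    return ctrs


# Toy data copied from the sample table in the question.
sample = pd.DataFrame({
    "user_id": ["user_1", "user_2", "user_2", "user_3", "user_3", "user_3",
                "user_4", "user_5", "user_5", "user_6"],
    "clicked_through": [0, 1, 0, 0, 1, 0, 0, 0, 1, 1],
})

ctrs = bootstrap_ctr_by_user(sample, n_boot=1000, seed=42)
# Percentile interval for this group's CTR (95% is an assumed level).
ci_low, ci_high = np.percentile(ctrs, [2.5, 97.5])
```

Aggregating to per-user click and view totals first means each iteration only indexes two arrays rather than rebuilding a row-level dataset, which should matter at ~12,000 users per group. The same function would be run separately on the A and B datasets.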
Is this approach sound? If not, what's a better approach?
Additional info: Both the A and B datasets have ~12,000 unique users with ~16,000 page views and a ~40% CTR. The max number of views for a single user is 21. Given the relatively low ratio of page views to users, the two-proportion Z-test might not work so badly here, but we have other AB test datasets with far higher ratios so I want to find a bootstrapping approach that will work well generally.
ctr_obs and ctr_var for test and control, say ctr_obs_test, ctr_var_test, ctr_obs_ctrl, ctr_var_ctrl. Then Z = (ctr_obs_test - ctr_obs_ctrl) / sqrt(ctr_var_test + ctr_var_ctrl) where Z is standard normal. Correct? – gasbag_1 Apr 18 '22 at 21:07
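The statistic written out in the comment above can be computed as in the sketch below. It assumes that each `ctr_var_*` is already an estimate of the variance of the corresponding CTR estimator (for example from the Delta Method or from bootstrap replicates), which the comment does not spell out, and that the two groups are independent. The function name `one_sided_z_test` and the input numbers are made up for illustration.

```python
import math


def one_sided_z_test(ctr_obs_test, ctr_var_test, ctr_obs_ctrl, ctr_var_ctrl):
    """Z statistic and upper-tail p-value for H1: test CTR > control CTR."""
    z = (ctr_obs_test - ctr_obs_ctrl) / math.sqrt(ctr_var_test + ctr_var_ctrl)
    # Upper-tail probability of a standard normal: P(Z >= z) = 0.5 * erfc(z / sqrt(2)).
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value


# Illustrative inputs only; these are not results from the datasets in the question.
z, p = one_sided_z_test(0.42, 0.00003, 0.40, 0.00003)
```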