I cannot decide how to analyze an A/B test that can be described as follows:
Let's say Facebook wants to know whether more people post content if the page is blue or purple (yeah, stupid I know) and has one week to decide which color is the right one. Facebook splits the traffic in half and measures how many people post on each color. The result of the first day looks like this:
- 80% of those shown the purple version posted content, and 90% of those shown the blue version did
What is the 'right' way to test which color is better? Here comes the trouble (for me):
(1) I can take the week of data and run a t-test on the 7 daily percentages per color (assuming the percentages come from an approximately normal distribution), which will probably come out insignificant.
(2) I can recode the result as a binary variable, where the 80% of people shown purple who posted get a 1 and the rest get a 0 (the same for blue), and then use a test suitable for a binomial distribution. This test might reach a significant result really fast because Facebook has plenty of users.
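To make option (2) concrete, here is a minimal sketch of a pooled two-proportion z-test, which is one common choice of binomial-type test and not necessarily the only suitable one. The counts (1,000 users per color on day one) and the function name `two_proportion_z_test` are made up for illustration; only the 80% and 90% rates come from the example above.

```python
import math

def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Two-sided pooled z-test for the equality of two proportions."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    # Pooled posting rate under the null hypothesis of no difference
    p_pool = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided tail probability of the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical day-one counts: 800/1000 posted on purple, 900/1000 on blue
z, p_value = two_proportion_z_test(800, 1000, 900, 1000)
print(f"z = {z:.2f}, p = {p_value:.2e}")
```

Because the standard error shrinks with the number of users, the same 10-point gap becomes "significant" at far smaller samples than Facebook would actually collect in a day.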
Based on my knowledge, I would go for option (2), but accumulated over the whole week, to make sure there isn't any bias caused by the day of the week and that there are enough users so I do not fall into the early-significance trap. The answer to "Aggregation-Level in AB-Tests" shows that in theory there is no difference (in the effect) between the two approaches, which makes sense; however, there seems to be a problem in the testing itself.
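The difference between the two levels of aggregation can be sketched as follows. The daily counts are invented purely for illustration (they are not real data), and `pooled_rate` and `welch_t` are hypothetical helper names. The sketch pools the counts over the week, as I would for option (2), and computes a Welch t statistic on the seven daily rates, as in option (1); Welch's variant is my assumption here, since I did not say which t-test I meant.

```python
import math
import statistics

# Hypothetical (posters, visitors) per day for one week; not real data
purple_days = [(800, 1000), (720, 1000), (850, 1000), (780, 1000),
               (880, 1000), (700, 1000), (830, 1000)]
blue_days = [(900, 1000), (800, 1000), (860, 1000), (920, 1000),
             (840, 1000), (790, 1000), (910, 1000)]

def pooled_rate(days):
    """Posting rate after summing counts over all days (user-level view)."""
    posters = sum(p for p, _ in days)
    visitors = sum(n for _, n in days)
    return posters / visitors

def welch_t(sample_a, sample_b):
    """Welch t statistic and degrees of freedom for mean(b) - mean(a)."""
    var_a = statistics.variance(sample_a) / len(sample_a)
    var_b = statistics.variance(sample_b) / len(sample_b)
    t = (statistics.mean(sample_b) - statistics.mean(sample_a)) / math.sqrt(var_a + var_b)
    df = (var_a + var_b) ** 2 / (
        var_a ** 2 / (len(sample_a) - 1) + var_b ** 2 / (len(sample_b) - 1)
    )
    return t, df

# Day-level view: each day contributes exactly one data point per color
purple_rates = [p / n for p, n in purple_days]
blue_rates = [p / n for p, n in blue_days]
t_stat, df = welch_t(purple_rates, blue_rates)

print(f"pooled purple = {pooled_rate(purple_days):.4f}, "
      f"pooled blue = {pooled_rate(blue_days):.4f}")
print(f"Welch t on daily rates = {t_stat:.2f} with about {df:.1f} df")
```

With equal daily traffic the pooled rates equal the means of the daily rates, so the estimated effect is the same; what changes is the unit of analysis (7 days versus thousands of users), and with it the standard error.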
How would you approach that?