Background
Let’s rewind time and say a company like Twitter wanted to introduce a feature that lets each user add a banner to the top of their profile page. After the feature was built, 5k high-profile beta-tester accounts were auto-enabled, and another 5k accounts then enabled it organically.
Experiment
We want to design an experiment that tests whether this banner affects an engagement metric, such as “Total Favorites”, for each user who sees it.
To structure this experiment, we do a visitor-side randomization where visitors are exposed to the experiment when they visit any of the ~10k profiles that have the banner.
Test: Participants can see the banner if they visit a profile with a banner
Control: Participants can’t see the banner if they visit a profile with a banner
Outcome Variable: Total Favorites for each participant
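A minimal sketch of how the visitor-side assignment could work, assuming a deterministic hash of the visitor ID and a 50/50 split. The function names, the salt string, and the split ratio are all hypothetical and not part of the setup described above:

```python
import hashlib

def assign_arm(visitor_id: str, salt: str = "profile-banner-exp") -> str:
    """Deterministically map a visitor to 'test' or 'control' (assumed 50/50)."""
    digest = hashlib.sha256(f"{salt}:{visitor_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) % 100  # pseudo-uniform bucket in 0..99
    return "test" if bucket < 50 else "control"

def banner_visible(visitor_id: str, profile_has_banner: bool) -> bool:
    """Only test-arm visitors see the banner, and only on profiles that have one."""
    return profile_has_banner and assign_arm(visitor_id) == "test"
```

Hashing the visitor ID rather than drawing a fresh random number on each visit keeps a visitor in the same arm across all ~10k profiles, which is what makes a per-visitor outcome such as Total Favorites comparable between arms.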
What I'd like to understand
How the total number of visits to an auto-enabled profile with a banner impacts Total Favorites, i.e. what is the relationship between how many times a visitor saw a banner and the impact on Total Favorites?
How the total number of visits to an organically adopted profile with a banner impacts Total Favorites. I would want to use this to understand the impact on favorites from visitors interacting with the organically adopted profiles. I realize there is a selection bias in who has organically adopted the banner, but I would like to understand the relationship even with that bias.
My initial solution would be a regression model similar to below:
Total Favorites = α + β1 (enabled_high_profile_visits) + β2 (organic_enabled_profile_visits) + β3 (test_treatment) + β4 (test_x_enabled_high_profile_visits) + β5 (test_x_organic_enabled_profile_visits)
Definitions
enabled_high_profile_visits - total visits to profiles that had the unit auto-enabled
organic_enabled_profile_visits - total visits to profiles that organically enabled the unit
test_treatment - binary condition if you are part of the test group, and therefore have been exposed to a profile with the banner
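To make the model concrete, here is a rough sketch that builds the design matrix and fits it by ordinary least squares using only the standard library. It assumes the last term is the interaction of test_treatment with organic_enabled_profile_visits. The data are synthetic, the coefficients in TRUE_BETA are invented for illustration, and visit counts are drawn independently of treatment, which is exactly the assumption that may not hold in the real experiment:

```python
import random

# Invented coefficients: alpha, b1 (high-profile visits), b2 (organic visits),
# b3 (test), b4 (test x high-profile visits), b5 (test x organic visits).
TRUE_BETA = [10.0, 0.5, 0.3, 1.0, 0.8, 0.4]

def design_row(hp_visits, org_visits, test):
    """One row of the design matrix, including both interaction terms."""
    return [1.0, hp_visits, org_visits, test, test * hp_visits, test * org_visits]

def fit_ols(rows, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda idx: abs(a[idx][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for idx in range(k):
            if idx != col:
                factor = a[idx][col] / a[col][col]
                a[idx] = [x - factor * p for x, p in zip(a[idx], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]

def simulate_and_fit(n=5000, seed=0):
    """Generate synthetic visitors and return the fitted coefficients."""
    rng = random.Random(seed)
    rows, y = [], []
    for _ in range(n):
        row = design_row(rng.randint(0, 5), rng.randint(0, 5), rng.randint(0, 1))
        rows.append(row)
        y.append(sum(b * x for b, x in zip(TRUE_BETA, row)) + rng.gauss(0, 1))
    return fit_ols(rows, y)
```

Calling simulate_and_fit() returns six estimates in the order of TRUE_BETA. Because the synthetic visit counts do not depend on treatment, this only shows the mechanics of the model; it says nothing about whether the estimates stay unbiased when visits are themselves affected by seeing the banner.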
My concern is that I will run into post-treatment bias by conditioning the regression on post-treatment variables. Is there any alternative solution, or is this not a problem in this case?
Let's say we are testing a drug which ends up lowering blood pressure. However, one consequence of the treatment is that it also makes users more likely to take vitamins (sorry for the lack of imagination), which lower blood pressure as well.
If you would like to isolate the impact of the treatment vs the impact of the vitamin usage, would it make sense to condition on the vitamin usage?
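A small simulation can show what conditioning on vitamin usage might do. This is only a sketch under invented assumptions: the drug is randomized and has a direct effect of -5 on blood pressure, vitamins have an effect of -3, and an unobserved trait (called u here, think health-consciousness) both raises the chance of taking vitamins and lowers blood pressure. None of these numbers come from real data:

```python
import random
from statistics import fmean

def simulate_drug_trial(n=100_000, seed=0):
    """Compare the unconditional estimate with estimates stratified on vitamins."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        t = rng.randint(0, 1)          # randomized drug assignment
        u = rng.gauss(0, 1)            # unobserved trait (assumed to exist)
        # Vitamin use is post-treatment: it depends on the drug and on u.
        v = 1 if u + 1.0 * t + rng.gauss(0, 1) > 0.5 else 0
        # Blood pressure: direct drug effect -5, vitamin effect -3, u effect -4.
        y = 120 - 5 * t - 3 * v - 4 * u + rng.gauss(0, 1)
        rows.append((t, v, y))

    def diff_in_means(subset):
        treated = [y for t, _, y in subset if t == 1]
        control = [y for t, _, y in subset if t == 0]
        return fmean(treated) - fmean(control)

    return {
        "total_effect": diff_in_means(rows),
        "within_vitamin_takers": diff_in_means([r for r in rows if r[1] == 1]),
        "within_non_takers": diff_in_means([r for r in rows if r[1] == 0]),
    }
```

In this setup the unconditional difference in means recovers the total effect (the direct effect plus the part that runs through vitamins). The within-stratum differences, however, do not recover the direct effect of -5: control users who take vitamins without the drug's nudge tend to have higher u than treated users who take them, so comparing within a vitamin stratum mixes the drug effect with a difference in u. If no such unobserved trait existed, stratifying would be much less problematic, which is why the answer depends on what is assumed about the mediator.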
What are the precautions I need to take here? A lot of the literature I've read warns against controlling for post-treatment variables in regressions:
– Alex Oct 31 '21 at 22:07