I have designed and run an online experiment in which we've slightly changed parts of a web page. Let's say users visit our website to place food orders and the order funnel looks like this: home --> menu --> basket --> checkout --> order. In this experiment we're making changes to the basket.

The success metric of the experiment is the basket-to-checkout click-through rate (CTR). That is, out of all the basket views by users in the experiment, how many lead to the checkout page? I've chosen this metric because it's closest to our intervention and should be sensitive enough for us to detect a meaningful change. Note that this is a pageview-level metric: each time a user views the basket, they may or may not make it to the checkout page.

The experiment is randomised at the user level. This means that a user who is assigned to the treatment condition when they enter the experiment (i.e. the first time they view the basket) will remain in the treatment condition for the duration of the experiment. Thus, if the same user comes back and views the basket a few days later, they will experience the same version of the basket they did the first time. The same applies for users who are assigned to the control condition.

So far so good. The problem I have is that the unit of randomisation (users) is not the same as the unit of analysis (page views). In their book Trustworthy Online Controlled Experiments (2020), Kohavi et al. state:

"We now look at CTR and discuss two common ways to compute it, each with different analysis units. The first is to count the clicks and divide by the number of page views; the second is to average each user's CTR and then average all the CTRs. If randomisation is done by user, then the first mechanism uses a different analysis unit than the randomisation unit, which violates the independence assumption and makes the variance computation more complex."
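
In symbols (my own notation, not the book's): suppose user $i$ has $n_i$ basket page views, $c_i$ of which reach the checkout, and there are $N$ users in a condition. The two ways of computing CTR are then

$$
\text{CTR}_{\text{pageview}} = \frac{\sum_{i=1}^{N} c_i}{\sum_{i=1}^{N} n_i},
\qquad
\text{CTR}_{\text{user}} = \frac{1}{N}\sum_{i=1}^{N} \frac{c_i}{n_i}.
$$

The two coincide when every user has the same number of page views, but in general they differ: the first weights each user by $n_i$, while the second weights all users equally.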

I have a dataset which has one row per (basket) page view, a user identifier, the experimental condition, and an indicator showing whether they made it to the checkout page. Note that users could appear in the dataset multiple times if they viewed the basket page multiple times during the experiment. Following Kohavi et al.'s advice, I have used the second method to calculate CTR:

  1. I first group my data at the user level.
  2. Then I calculate each user's CTR (by taking the mean of the indicator column).
  3. Finally, I group by the treatment condition and take the average of all the user-level CTRs. I can then compare the conditions' CTRs and test whether they differ significantly.
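
The three steps above can be sketched in plain Python as follows. This is only an illustration on made-up rows: the tuple layout `(user_id, condition, reached_checkout)` and all variable names are assumptions, not my real schema. It also computes the pageview-level CTR for comparison.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical page-view rows: (user_id, condition, reached_checkout).
page_views = [
    ("u1", "control", 1),
    ("u1", "control", 0),
    ("u2", "control", 1),
    ("u3", "treatment", 1),
    ("u3", "treatment", 1),
    ("u3", "treatment", 0),
    ("u3", "treatment", 1),
    ("u4", "treatment", 0),
]

# Step 1: group page views by user (the condition is fixed per user).
by_user = defaultdict(list)
condition_of = {}
for user_id, condition, reached in page_views:
    by_user[user_id].append(reached)
    condition_of[user_id] = condition

# Step 2: each user's CTR is the mean of their indicator values.
user_ctr = {user_id: mean(flags) for user_id, flags in by_user.items()}

# Step 3: average the user-level CTRs within each condition.
by_condition = defaultdict(list)
for user_id, ctr in user_ctr.items():
    by_condition[condition_of[user_id]].append(ctr)
condition_ctr = {cond: mean(ctrs) for cond, ctrs in by_condition.items()}

# For comparison: pageview-level CTR (total clicks / total page views).
pooled = defaultdict(list)
for _, condition, reached in page_views:
    pooled[condition].append(reached)
pooled_ctr = {cond: mean(flags) for cond, flags in pooled.items()}

print(condition_ctr)
print(pooled_ctr)
```

In this toy data the heavy user `u3` pulls the pageview-level treatment CTR well above the user-level one, which is exactly the difference in weighting between the two methods.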

Now, say that I stop at step 2 and have a user-level dataset consisting of users, their experimental condition, and their average CTR. I want to fit a model to this data to test whether there's a significant difference in CTR between the experimental conditions.

If I had ignored the fact that the pageview-level data is not independent (because users with multiple page views introduce within-user correlation), I could have modelled the original data with a logistic regression where the response variable was the boolean indicating whether the (basket) page view had made it to the checkout. Instead, I now have a user-level dataset whose rows are independent of one another. However, the response variable is no longer a boolean but a proportion. Most users have a response value of exactly 1 or 0 because they visited just once and either made it to the checkout page or not. But I also have lots of users whose response value is a proportion strictly between 0 and 1 (e.g. 0.84).

My question is: how should I model this data? My understanding is that I can't use a simple binomial GLM because the response variable isn't $\{0,1\}$ but rather $[0,1]$. Similarly, I don't think I can use a binomial GLM using the proportions as the response variable because if I did that I would have to pass each user's number of page views as the weights, which would be equivalent to fitting the previous model. I also tried fitting a binomial GLMM to the pageview-level (non-independent) data, with the user identifier as a random effect – but this was too computationally expensive. My next thought was that the answer was probably a beta regression, since it's normally used to model probabilities/proportions. However, most users in my dataset have a value of exactly 0 or exactly 1, and beta regression works in $(0,1)$. One option could be to transform my response variable by marginally adding or subtracting something when the value is 0 or 1, respectively – but that feels hacky.
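
To spell out why I think the weighted binomial GLM on proportions would just reproduce the pageview-level logistic regression (a sketch in my own notation, assuming the linear predictor depends only on user-level covariates such as the experimental condition): let user $i$ have $n_i$ page views with indicators $y_{ij}$, let $c_i = \sum_j y_{ij}$ be their number of click-throughs, and let $\pi_i$ be their modelled click-through probability. The pageview-level Bernoulli log-likelihood is

$$
\ell_{\text{Bernoulli}}(\beta) = \sum_i \sum_{j=1}^{n_i} \left[ y_{ij}\log\pi_i + (1-y_{ij})\log(1-\pi_i) \right] = \sum_i \left[ c_i\log\pi_i + (n_i-c_i)\log(1-\pi_i) \right],
$$

while the binomial log-likelihood for the proportions $c_i/n_i$ with weights $n_i$ is

$$
\ell_{\text{binomial}}(\beta) = \sum_i \left[ \log\binom{n_i}{c_i} + c_i\log\pi_i + (n_i-c_i)\log(1-\pi_i) \right].
$$

The two differ only by the term $\sum_i \log\binom{n_i}{c_i}$, which does not involve $\beta$, so they give the same estimates and standard errors, and the within-user correlation is ignored in both.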

What is the right way to model this data?

Adrià Luz
  • Even if your data contain exact 1's and 0's, I guess you can assume that those are just imperfect estimates: a CTR of 1 does not mean that the user would always click, regardless of anything. So, as a practical way to use the data with something like beta regression, you can transform the data so that 1's become something like 0.9999, etc. – Tim Jun 20 '23 at 11:03
  • @Tim – I've gone the betareg way, transforming 1s and 0s to 0.9999 and 0.0001, respectively. I have nearly 300k users in my data and when I try to fit the model I get Error: vector memory exhausted (limit reached?). I've got around this by sampling a smaller subset of users before fitting the model. The problem is that the fitted values from the beta regression are too low. For example the observed CTR for the treatment condition is 92% but betareg gives an estimate of 80%. In contrast, the simple logistic regression model gives an estimate that's very close to the observed values. – Adrià Luz Jun 20 '23 at 15:14
  • I'm also confused by the fact that I can fit a binomial GLM on the proportions without specifying any weights and the results seem sensible (certainly more sensible than betareg's). Do you know what goes on when fitting such a model? – Adrià Luz Jun 20 '23 at 15:41
  • But see https://stats.stackexchange.com/questions/216122/what-is-the-difference-between-logistic-regression-and-fractional-response-regre – kjetil b halvorsen Jun 28 '23 at 14:33
