How should I conduct a multiple regression analysis that is A/B tested?

Question

For my master's thesis, I have a study consisting of 1 independent variable (digital media channel; level 1 = email, level 2 = social media), 1 pure moderator variable (client type; level 1 = existing client, level 2 = potential client) and a metric dependent variable (intention to click (ITC); answered on a 5-point Likert scale).

I am now stuck because I do not know how to conduct a multiple regression analysis for an A/B test. I have the following variables in my dataset:

ITC for email (value: 1, 2, 3, 4, 5)
Client type for email (value: 0, 1)
ITC for social media (value: 1, 2, 3, 4, 5)
Client type for social media (value: 0, 1)

I can create a dummy variable for whether a participant is exposed to email or social media. Here’s a sample of my data:

structure(list(ITCEmail = c(4, 2, 5, 2, 4, 2), ClientEmail = c(0, 0, 1, 1, 1, 0)), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"))

I am not sure whether I have to conduct two separate regressions, one for email and one for social media.

model.email <- lm(ITCEmail ~ ClientEmail, data = study1email)

I tried several things, including the model above. However, client type now acts as the IV in that model. My regression equation is the following:

Y = b0 + b1x1 + b2x2 + b3(x1*x2) + e

Can somebody help me with how I correctly formulate the data in the A/B test?

Which code should I put down to compare both digital media channels (as well as the influence of client type) on ITC?

It looks like you need an ordinal logistic regression rather than a regression with normally distributed errors because your data are on a Likert scale. You seem to have more independent variables than you're showing in your equation. This tutorial might help you out — adkane, Dec 08 '22 at 15:07
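The ordinal approach suggested in the comment above could be sketched roughly as follows. This is only an illustration under stated assumptions: it uses MASS::polr() for a proportional-odds model, and the data frame sim is simulated here purely so the example runs (it is not the asker's data). The column names Channel, Client and ITC are chosen to match the long format used in the answer below.

```r
library(MASS)  # provides polr() for proportional-odds (ordinal logistic) models

# Hypothetical simulated data, only so that the sketch is runnable.
set.seed(1)
n <- 200
sim <- data.frame(
  Channel = sample(c("Email", "Social"), n, replace = TRUE),
  Client  = sample(c(0, 1), n, replace = TRUE)
)

# Latent "intention" with main effects and an interaction, plus logistic noise.
is_social <- as.numeric(sim$Channel == "Social")
latent <- 0.5 * is_social + 0.5 * sim$Client -
  1.5 * is_social * sim$Client + rlogis(n)

# Cut the latent score into an ordered 5-point response.
sim$ITC <- cut(latent,
               breaks = c(-Inf, -1.5, -0.5, 0.5, 1.5, Inf),
               labels = 1:5, ordered_result = TRUE)

# Same right-hand side as the linear model, but with an ordered outcome.
ord_model <- polr(ITC ~ Channel * Client, data = sim, Hess = TRUE)
summary(ord_model)
```

The response must be an ordered factor for polr(); the coefficients are then on the log-odds scale rather than in ITC points.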

score 0 · Answer 1 · answered Dec 08 '22 at 15:38

From your dput, it looks like you actually have two dataframes, one with email data and one with social media data. To do your analysis, you'll want to combine these into one long dataset, with columns for channel, ITC, and client type. Using dplyr::bind_rows():

library(dplyr)
study1 <- bind_rows(
  Email = setNames(study1email, c("ITC", "Client")),
  Social = setNames(study1social, c("ITC", "Client")),
  .id = "Channel"
)
study1
#> # A tibble: 12 × 3
#>    Channel   ITC Client
#>    <chr>   <dbl>  <dbl>
#>  1 Email       4      0
#>  2 Email       2      0
#>  3 Email       5      1
#>  4 Email       2      1
#>  5 Email       4      1
#>  6 Email       2      0
#>  7 Social      1      1
#>  8 Social      4      0
#>  9 Social      1      1
#> 10 Social      3      0
#> 11 Social      2      1
#> 12 Social      4      0

You can then specify your model as follows; using * will include main effects of Channel and Client and their interaction:

model1 <- lm(ITC ~ Channel * Client, data = study1)
summary(model1)
#> 
#> Call:
#> lm(formula = ITC ~ Channel * Client, data = study1)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -1.6667 -0.6667  0.0000  0.4167  1.3333 
#> 
#> Coefficients:
#>                      Estimate Std. Error t value Pr(>|t|)
#> (Intercept)            2.6667     0.6009   4.438  0.00217 **
#> ChannelSocial          1.0000     0.8498   1.177  0.27314
#> Client                 1.0000     0.8498   1.177  0.27314
#> ChannelSocial:Client  -3.3333     1.2019  -2.774  0.02417 * 
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 1.041 on 8 degrees of freedom
#> Multiple R-squared:  0.5593, Adjusted R-squared:  0.3941 
#> F-statistic: 3.385 on 3 and 8 DF,  p-value: 0.07457

Created on 2022-12-08 with reprex v2.0.2

Example data:

study1email <- tibble::tibble(
  ITCEmail = c(4, 2, 5, 2, 4, 2), 
  ClientEmail = c(0, 0, 1, 1, 1, 0)
)
study1social <- tibble::tibble(
  ITCSocial = c(1, 4, 1, 3, 2, 4), 
  ClientSocial = c(1, 0, 1, 0, 1, 0)
)
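To connect the fitted coefficients to the equation Y = b0 + b1x1 + b2x2 + b3(x1*x2) from the question, here is a small sketch that rebuilds the predicted ITC for each of the four cells by hand. It assumes x1 is the ChannelSocial dummy (0 = Email, 1 = Social) and x2 is Client as coded in the data (0/1); the helper names cells, x1, x2 and by_hand are only illustrative.

```r
# Rebuild the long dataset from the example data above.
study1 <- data.frame(
  Channel = rep(c("Email", "Social"), each = 6),
  ITC     = c(4, 2, 5, 2, 4, 2, 1, 4, 1, 3, 2, 4),
  Client  = c(0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0)
)

model1 <- lm(ITC ~ Channel * Client, data = study1)
b <- coef(model1)

# All four Channel x Client combinations.
cells <- expand.grid(Channel = c("Email", "Social"), Client = c(0, 1),
                     stringsAsFactors = FALSE)
x1 <- as.numeric(cells$Channel == "Social")  # channel dummy
x2 <- cells$Client                           # client dummy

# Y-hat = b0 + b1*x1 + b2*x2 + b3*(x1*x2)
cells$by_hand <- b[["(Intercept)"]] +
  b[["ChannelSocial"]] * x1 +
  b[["Client"]] * x2 +
  b[["ChannelSocial:Client"]] * x1 * x2

# predict() applies the same equation and should agree with the hand calculation.
cells$predicted <- as.numeric(predict(model1, newdata = cells))
cells
```

With the sample data, the predicted ITC is about 2.67 for Email with Client = 0, 3.67 for Social with Client = 0, 3.67 for Email with Client = 1, and 1.33 for Social with Client = 1. The interaction coefficient is what lets the channel effect change sign between the two client groups.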

Thank you! This is clear. Now I am just a bit struggling with the regression equation. How should I interpret it? If the channel is social media people are more likely to click? And existing clients are more likely to click than non-clients? And the interaction between social media and existing clients causes a decrease in intention to click? — Roger, Dec 09 '22 at 08:52
See this question. Briefly, with the sample data above, the results imply that social media predicts higher ITC than email when Client = 0, but lower ITC than email when Client = 1; which of these corresponds to existing versus potential clients depends on how Client is coded. It can also be helpful to plot the mean ITC by channel and client type to see what’s going on. — zephryl, Dec 09 '22 at 10:12
Thank you so much! It is really helpful for me. I am sorry for all the questions. I managed to plot the line of either the digital channel or the client, but how does that work for the interaction? — Roger, Dec 09 '22 at 11:29
Using sjPlot, you could do library(sjPlot); plot_model(model1, type = "int"). — zephryl, Dec 09 '22 at 13:19
Here is a detailed tutorial on probing interactions in R; see the section on categorical by categorical interactions, including additional plotting options. — zephryl, Dec 09 '22 at 13:21
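As an alternative to sjPlot that needs no extra packages, the plot of mean ITC by channel and client type mentioned in the comments can be sketched with base R's interaction.plot(). This is a minimal example using the sample data from the answer; the object name cell_means is just illustrative.

```r
# Long-format sample data from the answer.
study1 <- data.frame(
  Channel = rep(c("Email", "Social"), each = 6),
  ITC     = c(4, 2, 5, 2, 4, 2, 1, 4, 1, 3, 2, 4),
  Client  = c(0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0)
)

# Mean ITC in each Channel x Client cell (rows: Channel, columns: Client).
cell_means <- with(study1, tapply(ITC, list(Channel, Client), mean))
cell_means

# One line per client type across the two channels.
# Non-parallel lines suggest an interaction between Channel and Client.
with(study1, interaction.plot(
  x.factor     = factor(Channel),
  trace.factor = factor(Client),
  response     = ITC,
  fun          = mean,
  xlab         = "Channel",
  ylab         = "Mean ITC",
  trace.label  = "Client"
))
```

With only a handful of observations per cell the lines are noisy, so the plot is best read alongside the model summary rather than on its own.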
