How to determine effect size confidence interval in a difference-of-differences A/B test

Question

This seems like it should be a really simple thing, so I apologize if this has been answered elsewhere, but I've browsed dozens of existing questions and none seem to fit exactly.

Suppose I have two cohort groups, control and target. I apply a treatment to the target, and want to use DiD to compare conversion deltas between the two groups; pre and post treatment.

I believe the following r code correctly proves that in our instance, the effect is truly non-zero (i.e. we reject the NULL hypothesis). However, because it's a binomial GLM, the confidence intervals are bounding the probability that it's non-zero, not the actual effect itself:

Converted <- c(2526,2500,2818,2268)
Offered <- c(3992,3956,4484,3996)
Cohort <- factor(c("Target","Target","Control","Control"))
Period <- factor(c(0,1,0,1))
ConvRate <- Converted / Offered
fit <- glm(ConvRate~Cohort*Period, family=binomial, weights=Offered)
summary(fit)
confint(fit)

(Summarized) Output:

Coefficients:
                     Estimate Std. Error z value Pr(>|z|)    
(Intercept)           0.52560    0.03090  17.007  < 2e-16 ***
CohortTarget          0.01850    0.04509   0.410 0.681633    
Period1              -0.25367    0.04444  -5.708 1.14e-08 ***
CohortTarget:Period1  0.25017    0.06434   3.888 0.000101 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance:  5.1064e+01  on 3  degrees of freedom

Residual deviance: -7.9270e-13  on 0  degrees of freedom
AIC: 42.851
Number of Fisher Scoring iterations: 2
Waiting for profiling to be done...
                          2.5 %     97.5 %
(Intercept)           0.4651823  0.5863361
CohortTarget         -0.0698583  0.1069002
Period1              -0.3407991 -0.1665982
CohortTarget:Period1  0.1240757  0.3762874

In order to get the actual effect size, I ran this through a LM, but it's refusing to provide me with confidence intervals;

lfit <- lm(ConvRate~Cohort*Period, weights=Offered)
summary(lfit)
confint(lfit)

Output:

Call:
lm(formula = ConvRate ~ Cohort * Period, weights = Offered)
Weighted Residuals:
ALL 4 residuals are 0: no residual degrees of freedom!
Coefficients:
                      Estimate Std. Error t value Pr(>|t|)
(Intercept)           0.628457         NA      NA       NA
CohortTarget          0.004309         NA      NA       NA
Period1              -0.060889         NA      NA       NA
CohortTarget:Period1  0.060075         NA      NA       NA
Residual standard error: NaN on 0 degrees of freedom
Multiple R-squared:      1, Adjusted R-squared:    NaN 
F-statistic:   NaN on 3 and 0 DF,  p-value: NA
                 2.5 % 97.5 %

(Intercept)            NaN    NaN
CohortTarget           NaN    NaN
Period1                NaN    NaN
CohortTarget:Period1   NaN    NaN
Warning message:
In qt(a, object$df.residual) : NaNs produced

A similar issue results with family=gaussian when using GLM again:

Coefficients:
                      Estimate Std. Error t value Pr(>|t|)
(Intercept)           0.628457         NA      NA       NA
CohortTarget          0.004309         NA      NA       NA
Period1              -0.060889         NA      NA       NA
CohortTarget:Period1  0.060075         NA      NA       NA
(Dispersion parameter for gaussian family taken to be NaN)
Null deviance: 1.2194e+01  on 3  degrees of freedom

Residual deviance: 2.0249e-28  on 0  degrees of freedom
AIC: -272.54
Number of Fisher Scoring iterations: 1
Waiting for profiling to be done...
Error in while ((step <- step + 1) < maxsteps && abs(z) < zmax) { : 
  missing value where TRUE/FALSE needed
Calls: confint -> confint.glm -> profile -> profile.glm
In addition: Warning message:
In qf(1 - alpha, 1, n - p) : NaNs produced
Execution halted

I'm wondering how I can get the effect size confidence intervals? The difference in differences is ~6% but I'm looking for statistically significant intervals.

Perhaps I'm misunderstanding your code, but it seems you arranged the data you feed to your regression such that you only have four data points when in reality you have several thousand data points. — num_39, Mar 07 '23 at 21:43
I accidentally posted this under a new account that can't comment, I'll comment from this one. Yes, I pre-aggregated the data as y is just a binary yes/no. — Darrrrrren, Mar 07 '23 at 21:47
That's not going to work b/c you're regression won't be able to calculate the variance. You need a dataset with a row for each observation. Your case is simpler but see how it's done here in this answer: https://stats.stackexchange.com/questions/560795/is-this-the-correct-way-to-estimate-a-difference-in-differences-model-in-r-mult — num_39, Mar 07 '23 at 21:58

score 1 · Answer 1 · answered Mar 07 '23 at 22:43

As I noted in the comments, the problem is that you're not providing the full dataset to the regression function.

For a toy example that involves only 20 observations

period <- c(rep(0, 10), rep(1, 10))
treatment <- c(rep(1, 5), rep(0, 5), rep(1, 5), rep(0, 5))
converted <- c(1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0)
dat <- data.frame(period, treatment, conversion)
head(dat)
summary(lm(converted ~ period + treatment +  period * treatment))

Now, you get a standard error for the interaction because it's possible to estimate the variance.

Thank you! I will try this. – Darrrrrren Mar 08 '23 at 13:32 — Darrrrrren, Mar 08 '23 at 13:32

How to determine effect size confidence interval in a difference-of-differences A/B test

1 Answers1