Regression is usually a good first approach, and I think it addresses all of your questions. In this case we will fit a generalized linear mixed model (GLMM) to account for the repeated measurements on each person.
Here is a simulation to generate some data (I used n = 100, but the same approach works with n = 20000).
library(MASS)
library(tidyverse)
library(lme4)
set.seed(123)
n <- 100
#simulate data
#person effects
p <- mvrnorm(n, mu = c(0, 0), Sigma = matrix(c(1, 0.6, 0.6, 1), 2, 2)) # from MASS package
#population effects
alpha <- -3
trt <- -1
df <- rbind(
  data.frame(person = 1:n, trt = "sober", rep = 1,
             result = rbinom(n, size = 100, prob = plogis(alpha + p[, 1]))),
  data.frame(person = 1:n, trt = "drunk", rep = 1,
             result = rbinom(n, size = 100, prob = plogis(alpha + p[, 1] + trt + p[, 2]))),
  data.frame(person = 1:n, trt = "sober", rep = 2,
             result = rbinom(n, size = 100, prob = plogis(alpha + p[, 1]))),
  data.frame(person = 1:n, trt = "drunk", rep = 2,
             result = rbinom(n, size = 100, prob = plogis(alpha + p[, 1] + trt + p[, 2]))))
df$trials <- 100
df$person <- as.factor(df$person)
df$trt <- factor(df$trt, levels = c("sober", "drunk"))
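As a quick sanity check on the simulation (just a sketch, not needed for the analysis), the raw make proportions should be low overall and lower drunk than sober, given the negative trt effect:
#sanity check: raw make proportions by treatment
df %>% group_by(trt) %>% summarize(prop = sum(result) / sum(trials))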
The above just simulates the data: I set the population intercept alpha to -3 and the population treatment effect trt to -1. We then fit the model
#model
m <- glmer(cbind(result, trials - result) ~ trt + (1 + trt | person), data = df, family = binomial)
Let's take a look at how well it recovered the parameters. Printing the model summary:
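summary(m)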
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
 Family: binomial  ( logit )
Formula: cbind(result, trials - result) ~ trt + (1 + trt | person)
   Data: df

     AIC      BIC   logLik deviance df.resid 
  1917.3   1937.3   -953.6   1907.3      395 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-1.8100 -0.6476 -0.1752  0.4740  1.9488 

Random effects:
 Groups Name        Variance Std.Dev. Corr
 person (Intercept) 0.9582   0.9789       
        trtdrunk    0.7849   0.8860   0.61
Number of obs: 400, groups:  person, 100

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -2.9202     0.1049 -27.832  < 2e-16 ***
trtdrunk     -0.9678     0.1186  -8.163 3.27e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
         (Intr)
trtdrunk 0.321
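We can also pull out the estimated random-effect structure and compare it with the simulation truth (both SDs equal to 1, correlation 0.6); a quick check:
#estimated random-effect SDs and correlation vs. the simulated values
VarCorr(m)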
We can see that the model recovered the population-level effects well. Next, let's rank the players by their estimated probability of making a single shot sober and their estimated probability of making a single shot drunk.
#predictions
preddf <- expand.grid(person = levels(df$person), trt = levels(df$trt))
preddf$pred <- predict(m, newdata = preddf, type = "response")
#top sober players
preddf %>% filter(trt == "sober") %>% slice_max(pred, n = 10)
#top drunk players
preddf %>% filter(trt == "drunk") %>% slice_max(pred, n = 10)
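Note that predict() on a merMod object includes the person-level random effects by default (re.form = NULL), which is what we want for ranking individuals. If you instead wanted population-level predictions for an average person, lme4 lets you drop the random effects:
#population-level predictions, ignoring the person random effects
predict(m, newdata = preddf, type = "response", re.form = NA)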
If you are interested strictly in the players whose skill increases the most, you can look at
preddf %>%
  pivot_wider(names_from = trt, values_from = pred) %>%
  mutate(diff = drunk - sober) %>%
  slice_max(diff, n = 10)
You could probably arrive at a pretty similar ranking just by looking at the raw data
df %>%
  pivot_wider(names_from = trt, values_from = result) %>%
  mutate(diff = drunk - sober) %>%
  group_by(person) %>%
  summarize(diff_mean = mean(diff)) %>%
  slice_max(diff_mean, n = 10)
but in your question you also asked about "significant" increases. Essentially, you want to put some kind of 'confidence' interval on these point estimates. This is a lot easier if we fit the same model in a Bayesian setting (I won't go into the details of prior choices; we'll go with the defaults in brms).
library(brms)
top_persons <- df %>%
  pivot_wider(names_from = trt, values_from = result) %>%
  mutate(diff = drunk - sober) %>%
  group_by(person) %>%
  summarize(diff_mean = mean(diff)) %>%
  slice_max(diff_mean, n = 10) %>%
  dplyr::select(person) %>%
  deframe()
m_bayes <- brm(result | trials(trials) ~ trt + (1 + trt | person), data = df, family = binomial)
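Before using the draws, it is worth a quick look at the convergence diagnostics (Rhat close to 1, adequate effective sample sizes):
#check sampler diagnostics before trusting the posterior draws
summary(m_bayes)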
#per-person posterior draws of the increase in success probability while drunk:
#plogis(drunk linear predictor) - plogis(sober linear predictor), person by person
post_diff <- as.data.frame(
  plogis(fixef(m_bayes, summary = FALSE)[, 1] +
           ranef(m_bayes, summary = FALSE)$person[, , 1] +
           fixef(m_bayes, summary = FALSE)[, 2] +
           ranef(m_bayes, summary = FALSE)$person[, , 2]) -
  plogis(fixef(m_bayes, summary = FALSE)[, 1] +
           ranef(m_bayes, summary = FALSE)$person[, , 1]))
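A tidier way to get the same draws (sketched here as an alternative, not part of the original computation) is brms's posterior_epred(); with trials set to 1 the expected responses are the success probabilities themselves:
#alternative sketch: same per-person differences via posterior_epred()
nd <- expand.grid(person = levels(df$person), trt = levels(df$trt))
nd$trials <- 1 #so expected responses are on the probability scale
eps <- posterior_epred(m_bayes, newdata = nd) #draws x nrow(nd)
post_diff2 <- as.data.frame(eps[, nd$trt == "drunk"] - eps[, nd$trt == "sober"])
names(post_diff2) <- as.character(nd$person[nd$trt == "drunk"])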
Here is the posterior distribution of the increase in success probability while drunk for the 10 players with the largest estimated increases:
library(ggdist) #for stat_halfeye()
post_diff %>%
  mutate(iter = 1:n()) %>%
  pivot_longer(cols = -iter) %>%
  filter(name %in% top_persons) %>%
  ggplot(aes(y = fct_reorder(name, value), x = value)) +
  stat_halfeye()

You can see that for these top 10, the posterior distributions of the skill increase all sit well above zero.
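If you want a single "significance"-style number per player, one simple option (a sketch) is the posterior probability that the increase is positive:
#posterior probability that each player's accuracy improves while drunk
prob_increase <- sort(colMeans(post_diff > 0), decreasing = TRUE)
head(prob_increase, 10)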