Regression is usually a good first approach, and I think it addresses all of your questions. In this case we will fit a generalized linear mixed model (GLMM) to account for the repeated measurements on each person.
Here is a simulation to generate some data (I used n = 100, but the same approach works with n = 20000).
library(MASS)
library(tidyverse)
library(lme4)
set.seed(123)
n <- 100
#simulate data
#person effects
p <- mvrnorm(n, mu = c(0, 0), Sigma = matrix(c(1, 0.6, 0.6, 1), 2, 2)) # from MASS package
#population effects
alpha <- -3
trt <- -1
df <- rbind(
  data.frame(person = 1:n, trt = "sober", rep = 1,
             result = rbinom(n, size = 100, prob = plogis(alpha + p[, 1]))),
  data.frame(person = 1:n, trt = "drunk", rep = 1,
             result = rbinom(n, size = 100, prob = plogis(alpha + p[, 1] + trt + p[, 2]))),
  data.frame(person = 1:n, trt = "sober", rep = 2,
             result = rbinom(n, size = 100, prob = plogis(alpha + p[, 1]))),
  data.frame(person = 1:n, trt = "drunk", rep = 2,
             result = rbinom(n, size = 100, prob = plogis(alpha + p[, 1] + trt + p[, 2]))))
df$trials <- 100
df$person <- as.factor(df$person)
df$trt <- factor(df$trt, levels = c("sober", "drunk"))
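As a quick sanity check on the simulation (just a sketch, not needed for the analysis), the raw make proportions should be low overall and lower drunk than sober, given the negative trt effect:
#sanity check: raw make proportions by treatment
df %>% group_by(trt) %>% summarize(prop = sum(result) / sum(trials))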
The above just simulates the data: I set the population intercept alpha to -3 and the population treatment effect trt to -1. We then fit the model
#model
m <- glmer(cbind(result, trials - result) ~ trt + (1 + trt | person), data = df, family = binomial)
Let's take a look at how well it recovered the parameters. Printing the model summary:
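summary(m)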
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
 Family: binomial  ( logit )
Formula: cbind(result, trials - result) ~ trt + (1 + trt | person)
   Data: df

     AIC      BIC   logLik deviance df.resid 
  1917.3   1937.3   -953.6   1907.3      395 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-1.8100 -0.6476 -0.1752  0.4740  1.9488 

Random effects:
 Groups Name        Variance Std.Dev. Corr
 person (Intercept) 0.9582   0.9789       
        trtdrunk    0.7849   0.8860   0.61
Number of obs: 400, groups:  person, 100

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -2.9202     0.1049 -27.832  < 2e-16 ***
trtdrunk     -0.9678     0.1186  -8.163 3.27e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
         (Intr)
trtdrunk 0.321
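We can also pull out the estimated random-effect structure and compare it with the simulation truth (both SDs equal to 1, correlation 0.6); a quick check:
#estimated random-effect SDs and correlation vs. the simulated values
VarCorr(m)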
We can see that the model recovered the population-level effects well. Next, let's rank the players by their estimated probability of making a single shot sober and their estimated probability of making a single shot drunk.
#predictions
preddf <- expand.grid(person = levels(df$person), trt = levels(df$trt))
preddf$pred <- predict(m, newdata = preddf, type = "response")
#top sober players
preddf %>% filter(trt == "sober") %>% slice_max(pred, n = 10)
#top drunk players
preddf %>% filter(trt == "drunk") %>% slice_max(pred, n = 10)
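Note that predict() on a merMod object includes the person-level random effects by default (re.form = NULL), which is what we want for ranking individuals. If you instead wanted population-level predictions for an average person, lme4 lets you drop the random effects:
#population-level predictions, ignoring the person random effects
predict(m, newdata = preddf, type = "response", re.form = NA)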
If you are interested strictly in the players whose skill increases the most, you can look at
preddf %>%
  pivot_wider(names_from = trt, values_from = pred) %>%
  mutate(diff = drunk - sober) %>%
  slice_max(diff, n = 10)
You could probably arrive at a pretty similar ranking just by looking at the raw data
df %>%
  pivot_wider(names_from = trt, values_from = result) %>%
  mutate(diff = drunk - sober) %>%
  group_by(person) %>%
  summarize(diff_mean = mean(diff)) %>%
  slice_max(diff_mean, n = 10)
but in your question you also asked about "significant" increases. Essentially, you want to put some kind of 'confidence' interval on these point estimates. This is a lot easier if we fit the same model in a Bayesian setting (I won't go into the details of prior choices; we'll go with the defaults in brms).
library(brms)
top_persons <- df %>%
  pivot_wider(names_from = trt, values_from = result) %>%
  mutate(diff = drunk - sober) %>%
  group_by(person) %>%
  summarize(diff_mean = mean(diff)) %>%
  slice_max(diff_mean, n = 10) %>%
  dplyr::select(person) %>%
  deframe()
m_bayes <- brm(result | trials(trials) ~ trt + (1 + trt | person), data = df, family = binomial)
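Before using the draws, it is worth a quick look at the convergence diagnostics (Rhat close to 1, adequate effective sample sizes):
#check sampler diagnostics before trusting the posterior draws
summary(m_bayes)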
#per-person posterior draws of the increase in success probability while drunk:
#plogis(drunk linear predictor) - plogis(sober linear predictor), person by person
post_diff <- as.data.frame(
  plogis(fixef(m_bayes, summary = FALSE)[, 1] +
           ranef(m_bayes, summary = FALSE)$person[, , 1] +
           fixef(m_bayes, summary = FALSE)[, 2] +
           ranef(m_bayes, summary = FALSE)$person[, , 2]) -
  plogis(fixef(m_bayes, summary = FALSE)[, 1] +
           ranef(m_bayes, summary = FALSE)$person[, , 1]))
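A tidier way to get the same draws (sketched here as an alternative, not part of the original computation) is brms's posterior_epred(); with trials set to 1 the expected responses are the success probabilities themselves:
#alternative sketch: same per-person differences via posterior_epred()
nd <- expand.grid(person = levels(df$person), trt = levels(df$trt))
nd$trials <- 1 #so expected responses are on the probability scale
eps <- posterior_epred(m_bayes, newdata = nd) #draws x nrow(nd)
post_diff2 <- as.data.frame(eps[, nd$trt == "drunk"] - eps[, nd$trt == "sober"])
names(post_diff2) <- as.character(nd$person[nd$trt == "drunk"])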
Here is the posterior distribution of the increase in success probability while drunk for the 10 players with the largest estimated increases:
library(ggdist) #for stat_halfeye()
post_diff %>%
  mutate(iter = 1:n()) %>%
  pivot_longer(cols = -iter) %>%
  filter(name %in% top_persons) %>%
  ggplot(aes(y = fct_reorder(name, value), x = value)) +
  stat_halfeye()

You can see that for these top 10, the posterior distributions of the skill increase all sit well above zero.
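If you want a single "significance"-style number per player, one simple option (a sketch) is the posterior probability that the increase is positive:
#posterior probability that each player's accuracy improves while drunk
prob_increase <- sort(colMeans(post_diff > 0), decreasing = TRUE)
head(prob_increase, 10)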