Strange result with GLMM (binomial)

Question

I'm analyzing some data, using GLMM and obtain very strange results. The data is of student passing a test, each group of students belong to a different school. So I analyzed the data using glmer(is_pass ~ (1|school), data, family = 'binomial'). The schools are the random effect.

Now, the proportion of passing is very high. The average across all schools is 0.99. However, the confidence interval obtained from the GLMM is between 0.68 - 1. Furthermore, if I construct the Clopper-Pearson CI for each of the schools individually each CI is actually shorter (with the minimal one, the only school where students failed its 0.75 - 0.99).

  Confidence.Interval Lower.limit Upper.limit alpha
            two.sided   0.8765639   1.0000000  0.05
1           two.sided   0.8765639   1.0000000  0.05
2           two.sided   0.8765639   1.0000000  0.05
3           two.sided   0.8842967   1.0000000  0.05
4           two.sided   0.8765639   1.0000000  0.05
5           two.sided   0.8518149   1.0000000  0.05
6           two.sided   0.8628148   1.0000000  0.05
7           two.sided   0.8575264   1.0000000  0.05
8           two.sided   0.7486971   0.9905446  0.05

The glmer function returns no error or warnings. Why does it happen? and how can I circumvent it?

Data attached in dput structure as well as code for analysis.

library(lme4)
library(tidyverse)
data <- structure(list(
  student = c("1004", "1007", "1008", "1009", "1011", 
              "1012", "1014", "1015", "1016", "1017", "1018", "1020", "1021", 
              "1022", "1023", "1024", "1025", "1026", "1029", "1030", "1031", 
              "1032", "1033", "1034", "1035", "1036", "1037", "1038", "1039", 
              "1040", "1041", "1042", "1043", "1044", "1045", "1046", "1047", 
              "1048", "1049", "1050", "1051", "1052", "1053", "1054", "1055", 
              "1056", "1057", "1058", "1059", "1060", "1061", "1062", "1063", 
              "1064", "1065", "1066", "1067", "1068", "1069", "1070", "1071", 
              "1072", "1073", "1074", "1075", "1076", "1077", "1078", "1080", 
              "1081", "1082", "1083", "1084", "1085", "1086", "1087", "1088", 
              "1089", "1090", "1092", "1093", "1094", "1095", "1096", "2003", 
              "2004", "2006", "2007", "2008", "2009", "2010", "2011", "2012", 
              "2013", "2014", "2015", "2016", "2017", "2018", "2019", "2020", 
              "2021", "2022", "2023", "2024", "2025", "2026", "2027", "2028", 
              "2029", "2030", "2031", "2032", "2033", "2034", "2035", "2036", 
              "2037", "2038", "2039", "2040", "2041", "2042", "2043", "2044", 
              "2045", "2046", "2047", "2048", "2049", "2050", "2052", "2053", 
              "2054", "2055", "2056", "2057", "2058", "2059", "2060", "2061", 
              "2062", "2063", "2064", "2065", "2066", "2067", "2068", "2069", 
              "2071", "2072", "2073", "2075", "2076", "2077", "2078", "2079", 
              "2080", "2081", "2082", "2083", "2084", "2085", "2086", "2087", 
              "5004", "5008", "5009", "5010", "5011", "5012", "5013", "5014", 
              "5015", "5016", "5018", "5019", "5020", "5022", "5024", "5025", 
              "5026", "5027", "5028", "5030", "5031", "5032", "5033", "5034", 
              "5035", "5036", "5037", "5038", "5039", "5040", "5041", "5042", 
              "5043", "5044", "5045", "5046", "5047", "5048", "5049", "5050", 
              "5051", "5052", "5053", "5054", "5055", "5056", "5057", "5058", 
              "5059", "5060", "5061", "5062", "5063", "5064", "5065", "5066", 
              "5067", "5068", "5069", "5071", "5072", "5073", "5074", "5075", 
              "5076", "5077", "5078", "5079", "5080", "5081", "5082", "5083", 
              "5084", "5085", "5086"), 
  school = c("153", "152", "153", "154", 
             "152", "154", "153", "152", "153", "154", "152", "153", "152", 
             "153", "152", "153", "152", "153", "152", "153", "152", "152", 
             "154", "154", "154", "154", "152", "152", "153", "152", "153", 
             "152", "153", "152", "154", "152", "154", "152", "154", "154", 
             "152", "154", "152", "152", "154", "154", "152", "152", "152", 
             "153", "152", "152", "153", "153", "153", "153", "153", "154", 
             "154", "154", "153", "153", "154", "154", "154", "154", "153", 
             "154", "153", "154", "153", "154", "154", "153", "153", "153", 
             "153", "153", "152", "152", "152", "154", "154", "154", "252", 
             "253", "251", "253", "252", "252", "251", "253", "251", "252", 
             "251", "251", "253", "251", "251", "251", "251", "253", "252", 
             "252", "252", "251", "251", "253", "253", "252", "251", "252", 
             "252", "253", "253", "253", "253", "252", "252", "253", "252", 
             "251", "252", "251", "253", "253", "252", "252", "251", "253", 
             "251", "251", "251", "252", "252", "252", "252", "251", "252", 
             "252", "253", "253", "251", "252", "253", "252", "251", "253", 
             "252", "251", "252", "253", "253", "253", "253", "251", "252", 
             "252", "251", "251", "251", "251", "251", "251", "251", "553", 
             "554", "554", "554", "553", "553", "553", "553", "553", "553", 
             "552", "552", "552", "552", "554", "554", "553", "552", "552", 
             "554", "553", "553", "553", "554", "552", "552", "552", "552", 
             "552", "552", "554", "554", "554", "554", "553", "553", "553", 
             "553", "552", "552", "552", "552", "553", "553", "553", "553", 
             "553", "552", "552", "553", "553", "553", "553", "552", "552", 
             "552", "552", "552", "552", "552", "554", "554", "554", "554", 
             "554", "554", "554", "554", "554", "554", "554", "554", "554", 
             "554", "554"), 
  is_pass = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, 
              TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, 
              TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, 
              TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, 
              TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, 
              TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, 
              TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, 
              TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, 
              TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, 
              TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, 
              TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, 
              TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, 
              TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, 
              TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, 
              TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, 
              TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, 
              TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, 
              TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, 
              TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, 
              TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, 
              TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, 
              TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, 
              TRUE, TRUE, TRUE, TRUE, TRUE)), 
  class = "data.frame", 
  row.names = c(NA, -240L))
res        <- glmer(is_pass ~ (1|school), data, family = 'binomial') 
coef_logis <- coef(summary(res))
est_p      <- boot::inv.logit(coef_logis[1])
lb         <- boot::inv.logit(coef_logis[1] - coef_logis[2] * qnorm(1 - 0.05 / 2))
ub         <- boot::inv.logit(coef_logis[1] + coef_logis[2] * qnorm(1 - 0.05 / 2))
mean(data$is_pass)
clopper_ci <- data %>% 
  group_by(school) %>% 
  summarise(mean_pass = mean(is_pass),
            sum_pass  =  sum(is_pass),
            n         = n())
cis <- NULL 
for (i in 1:nrow(clopper_ci)) {
  ci  <- GenBinomApps::clopper.pearson.ci(clopper_ci$sum_pass[i], clopper_ci$n[i], CI ='two.sided', alpha = 0.05)
  cis <- rbind(cis, ci)
}
cis

dipetkov · Accepted Answer · 2022-08-26T07:37:44.977

You construct a Wald confidence interval for the log odds of passing the test: $\hat{\theta} \pm z_{1-\alpha/2}\operatorname{SE}(\hat{\theta})$. This is based on the theory that the maximum likelihood estimator (MLE) is asymptotically Normal. However, since the probability of passing the test is very close to 1 (its upper bound), the distribution of the log odds estimator $\hat{\theta}$ is somewhat asymmetric, so not close to Normal. (Of course the approximation gets better as the sample size increases.)

broom.mixed::tidy(model, "fixed", conf.int = TRUE, conf.method = "Wald") %>%
  mutate(
    across(c(estimate, conf.low, conf.high), plogis)
  )
#> # A tibble: 1 × 5
#>   effect term        estimate conf.low conf.high
#>   <chr>  <chr>          <dbl>    <dbl>     <dbl>
#> 1 fixed  (Intercept)     1.00    0.683      1.00

Instead construct a profile likelihood confidence interval which doesn't assume that the log-likelihood function is Normal at the MLE or even that it is symmetric. So the profile confidence interval has better statistical properties in this case and it is narrow as you expect.

Constructing confidence intervals based on profile likelihood

broom.mixed::tidy(model, "fixed", conf.int = TRUE, conf.method = "profile") %>%
  mutate(
    across(c(estimate, conf.low, conf.high), plogis)
  )
#> Computing profile confidence intervals ...
#> # A tibble: 1 × 5
#>   effect term        estimate conf.low conf.high
#>   <chr>  <chr>          <dbl>    <dbl>     <dbl>
#> 1 fixed  (Intercept)     1.00    0.985         1

Can you explain your first sentence? "However, the probability of passing the test is very close to the boundary". What do you mean by that? I don't necessarily conduct any statistical test. — Kozolovska, Aug 25 '22 at 14:00
It's very close to 1. It would be the same if it was close to 0, the lower bound on probabilities. There is nothing wrong with the regression by the way. It's just that the Wald confidence interval $\hat{\theta} \pm 2\text{se}(\hat{\theta})$ is not a great choice in this case. You see that you get much tighter profile confidence interval from the same model. — dipetkov, Aug 25 '22 at 14:09

Strange result with GLMM (binomial)

1 Answers1