
Suppose I have random variables $X_1,\ldots,X_N$ with $X_j\sim \text{Bernoulli}(p_j)$, and I have a random sample of size $n$ on them: $(X_{1i},\ldots,X_{Ni})_{i=1}^n$. How can I test $H_0: X_1,\ldots,X_N$ are mutually independent (vs. $H_1$: they are not mutually independent)? A candidate test statistic I can think of is constructed as follows.

Because, by the definition of mutual independence, $H_0$ holds if and only if $\Pr((X_1,\ldots,X_N)=(x_1,\ldots,x_N))=\Pr(X_1=x_1)\times\cdots\times \Pr(X_N=x_N)$ for every $(x_1,\ldots,x_N)$, the candidate test statistic just uses the sample analog of this condition:

I estimate $\Pr((X_1,\ldots,X_N)=(x_1,\ldots,x_N))$ using $\hat{p}_{x_1\ldots x_N}=\frac{1}{n}\sum_{i=1}^n\mathbf{1}\big((X_{1i},\ldots,X_{Ni})=(x_1,\ldots,x_N)\big)$, and estimate $\Pr(X_j=x_j)$ using $\hat{p}_{j,x_j}=\frac{1}{n}\sum_{i=1}^n\mathbf{1}(X_{ji}=x_j)$. Then I construct the following test statistic:

$$T=n\left\|\begin{bmatrix}\hat{p}_{11\ldots 1}-\hat{p}_{1,1}\,\hat{p}_{2,1}\cdots\hat{p}_{N,1}\\ \hat{p}_{01\ldots 1}-\hat{p}_{1,0}\,\hat{p}_{2,1}\cdots\hat{p}_{N,1}\\ \vdots\\ \hat{p}_{0\ldots 01}-\hat{p}_{1,0}\,\hat{p}_{2,0}\cdots\hat{p}_{N,1}\end{bmatrix}\right\|_{\hat{V}^{-1}}^2,$$

where $\|v\|_{\hat{V}^{-1}}^2 = v^\top\hat{V}^{-1}v$ and the vector inside the norm has $2^N-1$ rows as I run over all possible $(x_1,\ldots,x_N)$ except $(x_1,\ldots,x_N)=(0,\ldots,0)$. I need to exclude $(0,\ldots,0)$ because its entry is implied by the other $2^N-1$ entries: both the joint probabilities and the products of marginals sum to one, so the differences sum to zero. $\hat{V}$ is the estimated asymptotic variance-covariance matrix of the vector inside the norm. I reject the null for large values of $T$.
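For concreteness, the pieces above could be computed along these lines (a minimal R sketch; `dat` and all other names are purely illustrative, not from any package):

# minimal sketch of the estimators; `dat` is an illustrative n x N 0/1 matrix
dat <- matrix(rbinom(500 * 4, 1, 0.5), nrow = 500)   # toy sample, n = 500, N = 4
n <- nrow(dat); N <- ncol(dat)

cells <- expand.grid(rep(list(c(0, 1)), N))          # all 2^N value combinations
keys  <- apply(cells, 1, paste, collapse = "")

# hat p_{x_1...x_N}: joint cell frequencies, in the fixed order of `keys`
p_joint <- as.numeric(table(factor(apply(dat, 1, paste, collapse = ""),
                                   levels = keys))) / n

# hat p_{j,1} = estimated P(X_j = 1); hat p_{j,0} = 1 - hat p_{j,1}
p_marg1 <- colMeans(dat)

# product of marginals, cell by cell
p_prod <- apply(cells, 1, function(x) prod(ifelse(x == 1, p_marg1, 1 - p_marg1)))

# discrepancy vector without the redundant (0,...,0) cell;
# T = n * t(d) %*% solve(V_hat) %*% d once V_hat is estimated
d <- (p_joint - p_prod)[keys != strrep("0", N)]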

I have three questions:

  1. Could this test do the job? Intuitively I think it can.

  2. Intuitively I think this test statistic should have an asymptotic $\chi^2_{2^N-1}$ distribution, but because the form of the estimated asymptotic variance $\hat{V}$ is very complex, I would prefer to implement the test using the bootstrap. My question is how to use the bootstrap to carry out the test and circumvent the need to compute $\hat{V}$ analytically.

  3. Are there any other, better tests that could also do the job?

Thanks!

ExcitedSnail
  • 2,768

1 Answer


Nothing new under the sun: you are reinventing higher-dimensional contingency tables. See here: https://stats.stackexchange.com/a/148174/341520

Also, you have overlooked subtracting the degrees of freedom for estimating the $P(X_i = 1)$ with the $\hat{p}_{i,1}$; the $\hat{p}_{i,0}$ are redundant, but a nice notational touch.
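For what it's worth, the standard count (assuming the usual multinomial goodness-of-fit theory with estimated parameters carries over to your setup): the saturated model has $2^N - 1$ free cell probabilities and the independence model estimates $N$ marginal probabilities, so

$$\text{df} = (2^N - 1) - N,$$

which is $2^{10} - 1 - 10 = 1013$ for the simulation below.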

Here's some R code that demonstrates how to use log-linear models in your case:

library(tidyverse)
library(MASS)

n <- 5000
N <- 10
p <- rbeta(N, 41, 41) # nice ps around 0.5
hist(p)
res <- t(replicate(n, {rbinom(N, size = 1, prob = p)}))
df <- as.data.frame(res)

# I don't want to use tab, because it doesn't work with formula = ~ . ,
# so I have to make sure all combinations show up, even those that
# aren't in the data
tab <- table(df)
which(tab == 0)
eval_str <- paste("expand.grid(", paste(rep("c(0,1)", N), collapse = ","), ")", sep = "")
combs <- eval(parse(text = eval_str))

# first add on the extra combinations ...
df_counts <- rbind(df, setNames(combs, names(df))) %>%
  group_by_all() %>%
  summarise(COUNT = n() - 1) # ... now subtract the extras again

# all is well
loglm(COUNT ~ ., data = df_counts)

# increase p_{1,1} by 25% depending on V2-V5
# (rbinom(n(), ...) draws per row; the original rbinom(1, ...) drew a
# single value shared by all affected rows)
df2 <- df %>%
  mutate(V1 = if_else(V2 == V3 & V4 == 1 & V5 == 0,
                      rbinom(n(), 1, p[1] * 1.25), V1))
df2_count <- rbind(df2, setNames(combs, names(df))) %>%
  group_by_all() %>%
  summarise(COUNT = n() - 1)

# problem detected
loglm(COUNT ~ ., data = df2_count)

# it's a 5-level interaction; with hundreds of coefficients the model struggles
loglm(COUNT ~ (.)^5, data = df2_count)
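If you want the p-value as a number rather than reading it off the printout, the fitted loglm object stores the likelihood-ratio statistic and its degrees of freedom (component names lrt and df, as I know them from MASS; verify against your version):

# LR (deviance) test of the independence model against the saturated model
fit <- loglm(COUNT ~ ., data = df_counts)
pchisq(fit$lrt, fit$df, lower.tail = FALSE)   # p-value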

What did I do in the code:

I set $n = 5000$, $N = 10$, and I draw the true $(p_j)_{j=1,\ldots,N}$ from a Beta distribution with $\alpha = \beta = 41$. I then put my simulation results into a data frame which I (badly, given that 'df' also means degrees of freedom in this context) call df for short. df is basically a matrix with each realization $(X_{1i}, \ldots, X_{Ni})$ as its $i$th row.

Then I create df_counts, which contains all $2^N$ possible combinations of $(x_1, \ldots, x_N)$, each $x_j$ in its own column V1, ..., VN, and a final column called COUNT showing $\sum_{i=1}^n \mathbf{1}((X_{1i}, \ldots, X_{Ni}) = (x_1, \ldots, x_N))$, including the combinations which didn't show up in the simulation; that is the difficult part in the middle of the code. I then fit a log-linear model with ~ ., which means "use all remaining columns". How log-linear models work you will have to research yourself (https://en.wikipedia.org/wiki/Log-linear_analysis), but the basic idea under independence is $\log(p_{x_1,\ldots,x_N}) = \log(P(X_1 = x_1)) + \ldots + \log(P(X_N = x_N))$.
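Spelled out (standard log-linear theory, nothing specific to my code): with $\mu_{x_1 \ldots x_N}$ the expected count in cell $(x_1, \ldots, x_N)$, the mutual-independence model is

$$\log \mu_{x_1 \ldots x_N} = \lambda + \sum_{j=1}^{N} \lambda_j(x_j),$$

and the deviance against the saturated model,

$$G^2 = 2 \sum_{\text{cells}} O \log\frac{O}{E},$$

with $O$ the observed and $E$ the fitted counts, is asymptotically $\chi^2$ with $2^N - 1 - N$ degrees of freedom under $H_0$.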

The result is just that: the deviance from this model to the saturated one (one parameter per cell count) is chi-squared distributed and sensibly sized. I then create a dependence via $P(X_1 = 1 \mid X_2 = X_3 \wedge X_4 = 1 \wedge X_5 = 0) = 1.25 \cdot P(X_1 = 1 \mid \text{otherwise})$, and the model picks up on the disturbance.

I can then add interactions, which are basically parameters for intersections of events; here I have to model 5 levels deep.
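As for your question 2: one way to sidestep $\hat{V}$ entirely is a permutation scheme rather than a literal bootstrap. Independently permuting each column of the data imposes $H_0$ while keeping every marginal distribution fixed, so the unweighted distance $n\sum_x(\hat{p}_x - \prod_j \hat{p}_{j,x_j})^2$ can be compared against its permutation distribution without ever inverting $\hat{V}$. A sketch, assuming the raw data sit in an n x N 0/1 matrix (function names are my own, purely illustrative):

# sketch: permutation test for mutual independence of binary variables
indep_stat <- function(m) {
  n <- nrow(m); N <- ncol(m)
  cells <- expand.grid(rep(list(c(0, 1)), N))
  keys <- apply(cells, 1, paste, collapse = "")
  # joint cell frequencies, in the fixed order of `keys`
  p_joint <- as.numeric(table(factor(apply(m, 1, paste, collapse = ""),
                                     levels = keys))) / n
  p_marg <- colMeans(m)
  p_prod <- apply(cells, 1, function(x) prod(ifelse(x == 1, p_marg, 1 - p_marg)))
  n * sum((p_joint - p_prod)^2)   # unweighted distance, no V-hat needed
}

perm_test <- function(m, B = 999) {
  t_obs <- indep_stat(m)
  # permuting each column independently imposes H0, marginals unchanged
  t_null <- replicate(B, indep_stat(apply(m, 2, sample)))
  mean(c(t_null, t_obs) >= t_obs)   # permutation p-value
}

perm_test(res[, 1:5])   # on the simulated data from above (first 5 variables)

The unweighted statistic is no longer asymptotically $\chi^2$, but the permutation distribution supplies valid critical values under $H_0$ regardless; a studentized ($\hat{V}$-weighted) version would have to recompute $\hat{V}$ inside every resample, which is exactly what you wanted to avoid.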

Lukas Lohse
  • Thanks, Lukas! This is very helpful! You are right, I need to subtract the dfs for estimating these probabilities; I will try to figure out the exact number. Also, since I'm not an R user, could you also illustrate what you are doing in the R code using math formulas? Thanks again! – ExcitedSnail Nov 17 '22 at 15:30
  • Ok, but only because I'm training my LaTeX. Also I highly recommend you pick a programming language to support your investigations with some simulations. – Lukas Lohse Nov 17 '22 at 20:18
  • Thank you very much! – ExcitedSnail Nov 20 '22 at 03:28