
Let's say $\{x_i\}_{i=1}^m$ and $\{y_j\}_{j=1}^n$ are i.i.d. samples from two independent multivariate normal populations $N_d(\mu_1,\Sigma)$ and $N_d(\mu_2,\Sigma)$. How can I run a hypothesis test of $H_0:\mu_1=\mu_2$ vs. $H_1:\mu_1\neq \mu_2$ when (a) $\Sigma$ is known and (b) $\Sigma$ is unknown?

(a) I would use the Hotelling $T^2$ statistic, $T^2 = n(\bar{x}-\mu_0)^T\Sigma_0^{-1}(\bar{x}-\mu_0)$, which is $\chi^2_p$ distributed.

Is this correct? Then, when $\Sigma$ is unknown, would we just plug in the MLE of $\Sigma$ and use $T^2$ as before? Or would we use an $F$-statistic? I don't really understand.

CCZ23

1 Answer


The independence and multivariate normality assumptions immediately imply $$ \begin{pmatrix} x_1 \\ \vdots \\ x_m \\ y_1 \\ \vdots \\ y_n \end{pmatrix} \sim \mathcal{N}_{\left(m+n\right)d} \left( \begin{pmatrix} \mathbf{1}_m \otimes \mu_1\\ \mathbf{1}_n \otimes \mu_2 \end{pmatrix} , I_{m+n} \otimes \Sigma \right). $$ Hence, $$ \bar{x} - \bar{y} \sim \mathcal{N}_d \left(\mu_1 - \mu_2, \frac{m+n}{mn} \cdot \Sigma \right) $$ and $$ \frac{mn}{m+n} \cdot \left(\bar{x} - \bar{y}\right)^\top \Sigma^{-1} \left(\bar{x} - \bar{y}\right) \mathrel{=:} T_a \overset{H_0}{\sim} \chi^2\left(d\right). $$
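For part (a), here is a minimal numerical sketch (in Python with NumPy/SciPy; the dimension, sample sizes, and the matrix `Sigma` below are made-up illustrative values, not anything from the question) of how $T_a$ and its $\chi^2(d)$ p-value could be computed:

```python
# Sketch of the known-covariance test (part a), assuming Sigma is given.
# d, m, n, Sigma, mu1, mu2 are illustrative placeholders.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
d, m, n = 3, 40, 50
Sigma = np.array([[2.0, 0.5, 0.2],
                  [0.5, 1.0, 0.3],
                  [0.2, 0.3, 1.5]])
mu1 = np.zeros(d)
mu2 = np.zeros(d)                       # H0 is true in this simulated example
x = rng.multivariate_normal(mu1, Sigma, size=m)
y = rng.multivariate_normal(mu2, Sigma, size=n)

diff = x.mean(axis=0) - y.mean(axis=0)
# T_a = mn/(m+n) * (xbar - ybar)' Sigma^{-1} (xbar - ybar)
T_a = m * n / (m + n) * diff @ np.linalg.solve(Sigma, diff)
p_value = stats.chi2.sf(T_a, df=d)      # upper tail of chi^2(d)
print(T_a, p_value)
```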


If $\Sigma$ is unknown we can start from $$ m \cdot S_x + n \cdot S_y \sim W_d\left(\Sigma, m + n -2 \right),\\ S_x=\frac{1}{m}\sum_{i=1}^{m}\left(x_i - \bar x\right)\left(x_i -\bar x\right)^\top, \\ S_y=\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \bar y\right)\left(y_i -\bar y\right)^\top, $$ where $W_d\left(\Sigma, m + n -2\right)$ denotes the Wishart distribution with scale matrix $\Sigma \in \mathbb{R}^{d \times d}$ and $m + n -2$ degrees of freedom.
Since $S = \left(m \cdot S_x + n \cdot S_y\right)/\left(m + n\right)$ is independent of $\left(\bar{x} - \bar{y}\right)$, we get $$ \frac{mn\left(m + n - 2\right)}{\left(m + n \right)^2} \cdot \left(\bar{x} - \bar{y}\right)^\top S^{-1} \left(\bar{x} - \bar{y}\right) \mathrel{=:} T_b \overset{H_0}{\sim} T^2\left(d, m + n - 2\right), $$ where $T^2\left(d, m + n - 2\right)$ denotes the Hotelling $T$-squared distribution with $d$ and $m + n - 2$ degrees of freedom. Note that $S$ is the MLE for $\Sigma$ (see, e.g., this derivation, which can readily be extended to our setting).
The connection to the $F$-distribution is given by $$ T_b \sim T^2\left(d, m + n - 2\right) \iff \frac{m + n -d - 1}{d\left(m + n - 2\right)} \cdot T_b \mathrel{=:} T_c \sim F\left(d, m + n -d -1\right), $$ where $F\left(d, m + n -d -1\right)$ denotes the $F$-distribution with $d$ and $m + n -d -1$ degrees of freedom.
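For part (b), a small sketch along the same lines (again Python/NumPy/SciPy; the function name `two_sample_hotelling` and the simulated data are just placeholders) that forms the MLE $S$, then $T_b$ and its $F$ form $T_c$:

```python
# Sketch of the unknown-covariance test (part b): T_b via the MLE S, then T_c and its F p-value.
import numpy as np
from scipy import stats

def two_sample_hotelling(x, y):
    m, d = x.shape
    n, _ = y.shape
    diff = x.mean(axis=0) - y.mean(axis=0)
    xc = x - x.mean(axis=0)
    yc = y - y.mean(axis=0)
    A = xc.T @ xc + yc.T @ yc             # = m*S_x + n*S_y
    S = A / (m + n)                       # MLE of Sigma
    T_b = m * n * (m + n - 2) / (m + n) ** 2 * diff @ np.linalg.solve(S, diff)
    T_c = (m + n - d - 1) / (d * (m + n - 2)) * T_b
    p_value = stats.f.sf(T_c, d, m + n - d - 1)
    return T_b, T_c, p_value

# Usage with simulated data (placeholder samples of dimension 3):
rng = np.random.default_rng(1)
x = rng.standard_normal((40, 3))
y = rng.standard_normal((50, 3))
print(two_sample_hotelling(x, y))
```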


With that you can proceed as usual and compare the realized value of $T_a$ or $T_c$ to the $\left(1-\alpha\right)$-quantile of the $\chi^2\left(d\right)$ or $F\left(d, m + n -d -1\right)$ distribution, respectively, to carry out the hypothesis test at an $\alpha \cdot 100\%$ significance level.
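As a usage sketch of that last step (with made-up values for $\alpha$, $d$, $m$, $n$ and the realized statistics), the comparison against the critical values could look like:

```python
# Decision rule sketch: compare T_a and T_c to the (1 - alpha)-quantiles.
# alpha, d, m, n and the realized statistic values are illustrative placeholders.
from scipy import stats

alpha, d, m, n = 0.05, 3, 40, 50
T_a, T_c = 6.2, 1.8                                    # pretend realized values

crit_a = stats.chi2.ppf(1 - alpha, df=d)               # chi^2(d) quantile
crit_c = stats.f.ppf(1 - alpha, d, m + n - d - 1)      # F(d, m+n-d-1) quantile
print("reject H0 (known Sigma):  ", T_a > crit_a)
print("reject H0 (unknown Sigma):", T_c > crit_c)
```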

statmerkur