
I have samples from a $d$-dimensional distribution $p$, which is unknown. I want to use the samples to judge whether $p$ is close to a standard Gaussian distribution, $N(0, I_d)$. I believe there should be some standard approaches for this question, but unfortunately I have not found a solution for the high-dimensional case. By high-dimensional, I mean $d$ is several thousand.
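One check that uses the specific target (zero mean, identity covariance) rather than generic multivariate normality: if $x \sim N(0, I_d)$, then $\|x\|^2 \sim \chi^2_d$, so the squared norms of the samples can be compared with that distribution. This is a minimal sketch, assuming the samples are stored as rows of an `(n, d)` NumPy array; the function name `squared_norm_test` is hypothetical.

```python
import numpy as np
from scipy import stats

def squared_norm_test(X):
    """KS test of the squared sample norms against chi-square with d degrees of freedom.

    X: array of shape (n, d), one sample per row.
    Under H0 (every row ~ N(0, I_d)), ||x||^2 ~ chi2(d).
    """
    X = np.asarray(X, dtype=float)
    _, d = X.shape
    sq_norms = np.sum(X**2, axis=1)                  # ||x_i||^2 for each sample
    return stats.kstest(sq_norms, stats.chi2(df=d).cdf)

rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 500))
print(squared_norm_test(X).pvalue)
```

This reduces each sample to a single number, so it is blind to any departure that leaves the distribution of $\|x\|$ unchanged, for example when the direction $x/\|x\|$ is not uniform on the sphere.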

The only approach I can think of is projecting the data to low dimensions and doing one-dimensional tests.
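The projection idea can be sketched as follows: for a unit vector $u$ and $x \sim N(0, I_d)$, the projection $u^\top x$ is exactly $N(0, 1)$, so each projection can be tested with a one-dimensional Kolmogorov-Smirnov test. This is a minimal sketch, assuming random Gaussian directions and a Bonferroni correction over the directions; the function name `random_projection_test` and its parameters are hypothetical.

```python
import numpy as np
from scipy import stats

def random_projection_test(X, n_directions=100, seed=None):
    """KS-test projections of X onto random unit directions against N(0, 1).

    X: array of shape (n, d), one sample per row.
    Returns the Bonferroni-corrected smallest p-value over all directions.
    """
    X = np.asarray(X, dtype=float)
    _, d = X.shape
    rng = np.random.default_rng(seed)
    U = rng.standard_normal((d, n_directions))
    U /= np.linalg.norm(U, axis=0, keepdims=True)   # each column is now a unit vector
    P = X @ U                                        # shape (n, n_directions)
    pvals = np.array([stats.kstest(P[:, j], "norm").pvalue
                      for j in range(n_directions)])
    return min(1.0, pvals.min() * n_directions)      # Bonferroni correction

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 300))
print(random_projection_test(X, n_directions=50, seed=1))
```

By the Cramér-Wold theorem, the distributions of all one-dimensional projections together determine the multivariate distribution, but a finite set of random directions can miss departures concentrated in a few directions. Bonferroni is conservative here because all projections share the same samples.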

  • It depends what exactly you're looking for. If high-dimensional data are distributed multivariate normal, then each variable is normal. So to test for MVN, you would just test each dimension. A second consideration is independence. To be MVN, your variables should be i.i.d. It also matters why you want to know. What kind of analysis are you going to run on the data? – Tanner Phillips Apr 13 '21 at 16:27
  • @Tanner I am transforming a batch of images into noised images, hoping the noised images are pure Gaussian noise. I cannot test one dimension at a time, since the correlation between pixels matters. – Qinsheng Zhang Apr 13 '21 at 17:08
  • Hmm. That's pretty far outside my domain expertise, so I'm not sure how much help I will be. But shouldn't the function you use to noise the data guarantee a certain distribution? Sorry if that is nonsense; I may be out of my league. – Tanner Phillips Apr 13 '21 at 17:13
  • @TannerPhillips Variables should be iid? I disagree. There’s nothing wrong with having a non-diagonal covariance matrix, and there’s nothing wrong with having different means or variances in the marginal distributions. – Dave Apr 14 '21 at 02:17
  • One problem with projecting data is that low-dimensional projections of high-dimensional data tend to be close to Normal even if the high-dimensional data aren't, basically by the Central Limit Theorem. (https://projecteuclid.org/journals/annals-of-statistics/volume-12/issue-3/Asymptotics-of-Graphical-Projection-Pursuit/10.1214/aos/1176346703.full) – Thomas Lumley Apr 14 '21 at 03:20
  • The function I use to noise the data is a neural network. In other words, the noising function is very complex; it has no closed form or any nice properties we can take advantage of. We want to check whether or not the transformed data are MVN. It is easy if $d$ is 1 or 2, but in high dimensions it is hard to check by intuition. – Qinsheng Zhang Apr 14 '21 at 04:39
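The projection caveat raised in the comments can be illustrated with a small simulation. This sketch assumes NumPy and SciPy and uses clearly non-Gaussian data (i.i.d. uniform coordinates scaled to mean 0 and variance 1): a single coordinate is easily detected as non-normal, while a random unit-norm projection, which averages many coordinates, looks close to $N(0, 1)$.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, d = 5000, 1000
# Non-Gaussian data: i.i.d. uniform coordinates on [-sqrt(3), sqrt(3)] (mean 0, variance 1).
X = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(n, d))

# A single coordinate is uniform; a KS test against N(0, 1) detects this easily.
p_coordinate = stats.kstest(X[:, 0], "norm").pvalue

# A random unit-norm projection is a weighted sum of many coordinates,
# so by the CLT its distribution is close to N(0, 1).
u = rng.standard_normal(d)
u /= np.linalg.norm(u)
p_projection = stats.kstest(X @ u, "norm").pvalue

print(f"coordinate p = {p_coordinate:.2g}, projection p = {p_projection:.2g}")
```

With these sizes the coordinate test rejects decisively, while the projection test typically does not, even though the data are far from multivariate normal.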


From a practical standpoint, high-dimensional data will never be multinormal (unless generated by simulation). So "always reject" will be a fairly good test! See the answer at Standard Gaussianity test for high dimensional data.

There is a fairly new literature review, and there are some R packages on CRAN: MVN and mvnormalTest.
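Mardia's multivariate skewness and kurtosis tests are among the classical procedures offered by packages such as MVN. Below is a minimal NumPy/SciPy sketch of the large-sample versions, not the packages' own code; the function name `mardia_test` is hypothetical. Note that these test general multivariate normality (mean and covariance are estimated), not $N(0, I_d)$ specifically, they need $n > d$ for the sample covariance to be invertible, and they build an $n \times n$ matrix.

```python
import numpy as np
from scipy import stats

def mardia_test(X):
    """Mardia's skewness and kurtosis tests (classical large-sample versions).

    X: array of shape (n, d) with n > d.
    Returns (skewness p-value, kurtosis p-value).
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / n                      # maximum-likelihood covariance (divides by n)
    G = Xc @ np.linalg.solve(S, Xc.T)      # G[i, j] = (x_i - xbar)^T S^{-1} (x_j - xbar)
    b1 = np.sum(G**3) / n**2               # multivariate skewness
    b2 = np.mean(np.diag(G)**2)            # multivariate kurtosis
    # n * b1 / 6 is asymptotically chi-square with d(d+1)(d+2)/6 degrees of freedom.
    p_skew = stats.chi2.sf(n * b1 / 6, d * (d + 1) * (d + 2) / 6)
    # b2 is asymptotically normal with mean d(d+2) and variance 8d(d+2)/n.
    z = (b2 - d * (d + 2)) / np.sqrt(8 * d * (d + 2) / n)
    p_kurt = 2 * stats.norm.sf(abs(z))
    return p_skew, p_kurt

rng = np.random.default_rng(0)
print(mardia_test(rng.standard_normal((500, 5))))
```

The asymptotic approximations are unreliable when $d$ is not small relative to $n$, which is the situation in the question.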