I have data that I can assume are multivariate normal with a known mean vector mu and known covariance matrix sigma, and I want to identify the points that fall outside the 95% ellipse. I think the proper way to do this is to transform the data to a standard multivariate normal, compute the Euclidean distances to zero, and compare them against the value returned by mvtnorm::qmvnorm(.95,tail='both',mean=mu,sigma=sigma). If this is correct (and feel free to correct me if there's a simpler way to label the points outside the 95% ellipse), I'm struggling with the transform part. I know I'd subtract the mean vector mu from the observed data matrix, and that I then need to do something involving the resulting matrix and sigma, but I'm lost on what. The code below generates some data and ends where I'm stuck:
mu = c(100,50)
sigma = matrix(c(100,210,210,900),nrow=2)
obs_data = MASS::mvrnorm(1e2,mu=mu,Sigma=sigma)
transformed = t(t(obs_data) - mu)
#what now?
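One way the missing step could be filled in is a whitening transform based on the Cholesky factor of sigma. The following is only a sketch under assumptions that are not in the question: the names `centered`, `R`, `whitened`, `d2`, `cutoff` and `outside` are made up for illustration, the sample size is raised to 1e4, and a seed is set so the result is reproducible. It uses the chi-square quantile rather than qmvnorm as the cutoff, because the squared distance of a standard multivariate normal vector from zero is chi-square distributed with df equal to the number of dimensions.

```r
# Sketch: whiten the data, then compare squared Euclidean distances
# to a chi-square quantile. Names below are illustrative assumptions.
set.seed(1)                                  # arbitrary seed, for reproducibility only
mu <- c(100, 50)
sigma <- matrix(c(100, 210, 210, 900), nrow = 2)
obs_data <- MASS::mvrnorm(1e4, mu = mu, Sigma = sigma)

centered <- sweep(obs_data, 2, mu)           # subtract mu from every row
R <- chol(sigma)                             # upper triangular, t(R) %*% R equals sigma
whitened <- centered %*% solve(R)            # rows now have (approximately) identity covariance

d2 <- rowSums(whitened^2)                    # squared Euclidean distance to zero
cutoff <- qchisq(0.95, df = ncol(obs_data))  # 95% quantile of chi-square with df = 2
outside <- d2 > cutoff                       # TRUE for points outside the 95% ellipse
mean(outside)                                # should be close to 0.05
```

The squared distances computed this way should agree with what stats::mahalanobis(obs_data, mu, sigma) returns, so the explicit transform is mainly useful for seeing what that function does.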
stats::mahalanobis() function. Neat! Quick follow-up @whuber: to compare these distances to the 95% ellipse (or hypersphere), would I be getting the comparison value via mvtnorm::qmvnorm(.95,tail='both',mean=mu,sigma=sigma) or mvtnorm::qmvnorm(.95,tail='both',sigma=diag(2))? (i.e. do I need to tell qmvnorm the original mean & covariance?) I'm getting very different results between these two. – Mike Lawrence Aug 19 '20 at 20:03

stats::mahalanobis() and it shows that the distances should follow the same quantile function as a chi-square distribution with df=ncol(x), so I don't even need qmvnorm. When I use qchisq(.95,df=ncol(dat)), I do indeed observe that 95% of the distances fall below this value. So I wonder what qmvnorm() is doing or is supposed to be used for? – Mike Lawrence Aug 19 '20 at 20:57

qmvnorm gives you the half-width of a mean-centered square (or, generally, hypercube) in the original coordinates. In the transformed coordinates that region would correspond to a generalized rhombus. – whuber Aug 19 '20 at 21:02
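The chi-square comparison discussed in these comments can be sketched in a few lines. This is an illustration only, with assumptions that are not in the thread: a larger simulated sample (1e4 rows), a fixed seed, and the hypothetical names `sim_data`, `maha_d2`, `chi_cutoff` and `inside`. Note that stats::mahalanobis() returns squared distances, which is why they are compared directly to the chi-square quantile.

```r
# Sketch: label points by comparing squared Mahalanobis distances
# to the 95% chi-square quantile. Names below are illustrative assumptions.
set.seed(2)                                      # arbitrary seed, for reproducibility only
mu <- c(100, 50)
sigma <- matrix(c(100, 210, 210, 900), nrow = 2)
sim_data <- MASS::mvrnorm(1e4, mu = mu, Sigma = sigma)

# Squared Mahalanobis distance of every row from mu, using the known sigma
maha_d2 <- stats::mahalanobis(sim_data, center = mu, cov = sigma)

# 95% quantile of the chi-square distribution with df = number of columns
chi_cutoff <- stats::qchisq(0.95, df = ncol(sim_data))

inside <- maha_d2 <= chi_cutoff                  # TRUE for points within the 95% ellipse
mean(inside)                                     # should be close to 0.95
```

In two dimensions the cutoff has a closed form, -2 * log(0.05), since a chi-square distribution with 2 degrees of freedom is an exponential distribution with mean 2.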