
I have some code that calculates deviation from the mean for a series of univariate data. It calculates the mean and standard deviation ($\mu$ and $\sigma$) over a window of data points and compares the latest datapoint against them.

Pretty simple. My question is: how do I extend this to higher dimensions? Basically, if my datapoints are now vectors $(x_1, x_2, x_3)$, say, how can I find how far a datapoint is from the mean?
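
For reference, the univariate version amounts to roughly this (a simplified sketch, not my exact code; the function name is a placeholder):

```python
import numpy as np

def zscore_of_latest(window):
    """How many standard deviations the newest point is from the window mean."""
    window = np.asarray(window, dtype=float)
    mu = window.mean()
    sigma = window.std(ddof=1)   # sample standard deviation of the window
    return abs(window[-1] - mu) / sigma
```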

  • If I understand your problem correctly, you need to calculate the covariance matrix for your vectors. You can then use the Mahalanobis distance; see https://en.wikipedia.org/wiki/Mahalanobis_distance. – GCru Oct 28 '22 at 12:59
  • Ok, so supposing I used scipy for this, https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.mahalanobis.html, then it looks like the scipy implementation compares two vectors, not a vector against a distribution. – pnadeau Oct 28 '22 at 13:58
  • You can select one of the vectors to be your mean $(\bar{x}_1, \bar{x}_2, \bar{x}_3)$. The covariance matrix is calculated from the three vectors $\mathbf{x}_1$, $\mathbf{x}_2$, $\mathbf{x}_3$ containing your three data series. – GCru Oct 28 '22 at 14:37
  • Do I have to find a new basis for the point cloud as illustrated here: https://stats.stackexchange.com/a/62147/350353 or is that what the covariance matrix is doing here? – pnadeau Oct 28 '22 at 17:38
  • You use the covariance matrix as calculated from your data. – GCru Oct 28 '22 at 18:28
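
Putting the comments together, here is a minimal sketch of the multivariate version, assuming the window is stored as an (n, d) NumPy array and that `scipy.spatial.distance.mahalanobis` is used as suggested above (the helper name and the random test data are placeholders, not anyone's actual code):

```python
import numpy as np
from scipy.spatial.distance import mahalanobis

def mahalanobis_from_window(window, point):
    """Distance of `point` from the distribution of the rows in `window`.

    `window` is an (n, d) array of the last n d-dimensional datapoints;
    `point` is a length-d vector, e.g. (x1, x2, x3).
    """
    mu = window.mean(axis=0)              # per-dimension mean vector
    cov = np.cov(window, rowvar=False)    # d x d covariance matrix of the window
    cov_inv = np.linalg.inv(cov)          # scipy expects the *inverse* covariance
    return mahalanobis(point, mu, cov_inv)

# Hypothetical usage: 100 three-dimensional points, test the latest observation
rng = np.random.default_rng(0)
window = rng.normal(size=(100, 3))
latest = np.array([1.5, -0.2, 0.7])
print(mahalanobis_from_window(window, latest))
```

If the window is short or the dimensions are highly correlated, the covariance matrix can be near-singular; swapping `np.linalg.inv` for `np.linalg.pinv` is one common workaround.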
