Multidimensional Euclidean Distance in Python

Question

I want to calculate the Euclidean distance in multiple dimensions (24 dimensions) between two arrays. I'm using NumPy and SciPy.

Here is my code:

import numpy
import scipy.spatial.distance

A = numpy.array([116.629, 7192.6, 4535.66, 279714, 176404, 443608, 295522, 1.18399e+07, 7.74233e+06, 2.85839e+08, 2.30168e+08, 5.6919e+08, 168989, 7.48866e+06, 1.45261e+06, 7.49496e+07, 2.13295e+07, 3.74361e+08, 54.5, 3349.39, 262.614, 16175.8, 3693.79, 205865])

B = numpy.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 151246, 6795630, 4566625, 2.0355328e+08, 1.4250515e+08, 3.2699482e+08, 95635, 4470961, 589043, 29729866, 6124073, 222.3])

I used scipy.spatial.distance.cdist(A[numpy.newaxis,:],B,'euclidean') to calculate the Euclidean distance.

But it gave me an error:

raise ValueError('XB must be a 2-dimensional array.');

I don't seem to understand it.

I looked up scipy.spatial.distance.pdist but don't understand how to use it.

Is there a better way to do this?

Perhaps [`scipy.spatial.distance.euclidean`](http://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.euclidean.html#scipy.spatial.distance.euclidean)? — Michael Mior, Feb 23 '12 at 14:16
So, you have 2, 24-dimensional points? In that case, @Mr.E's answer is the best option. However, when you have more than 2 points, the various `scipy.spatial.distance` functions will be more efficient. — Joe Kington, Feb 23 '12 at 14:26
I thought perhaps I was missing something. Posted as an answer if that solves your problem. — Michael Mior, Feb 23 '12 at 17:24
A note on the error you received a long time ago, in case it helps others. According to the docs, both arrays passed to cdist must be 2-dimensional. Your first argument is 2-dimensional (because of `A[numpy.newaxis,:]`), so the second one needs to be 2-dimensional as well. Writing `B[numpy.newaxis,:]` should therefore solve the error. — Julian Gorfer, Sep 19 '20 at 22:36

Michael Mior · Accepted Answer · 2019-02-04T02:32:06.927

24

Perhaps scipy.spatial.distance.euclidean?

Examples

>>> from scipy.spatial import distance
>>> distance.euclidean([1, 0, 0], [0, 1, 0])
1.4142135623730951
>>> distance.euclidean([1, 1, 0], [0, 1, 0])
1.0

edited Feb 04 '19 at 02:32

answered Feb 23 '12 at 17:24


score 15 · Answer 2 · answered Feb 23 '12 at 14:15

Use either

numpy.sqrt(numpy.sum((A - B)**2))

or more simply

numpy.linalg.norm(A - B)
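
As a quick sanity check, the two expressions can be compared on a small case whose answer is known in advance. This is only a minimal sketch: the points `p` and `q` are made-up values forming a 3-4-5 triangle, not the arrays from the question.

```python
import numpy

# Hypothetical points, chosen so the distance is 5 (3-4-5 triangle)
p = numpy.array([0.0, 0.0, 0.0])
q = numpy.array([3.0, 4.0, 0.0])

d_manual = numpy.sqrt(numpy.sum((p - q) ** 2))  # explicit sum of squares
d_norm = numpy.linalg.norm(p - q)               # L2 norm of the difference vector

print(d_manual, d_norm)
```

Both calls take 1-D arrays of equal length, so the same two lines should apply unchanged to 24-dimensional inputs.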

answered Feb 23 '12 at 14:15

YXD


Xavier Guihot · Answer 3 · 2019-07-28T05:30:02.317

9

Starting with Python 3.8, you can use the standard library's math module and its new dist function, which returns the Euclidean distance between two points (given as lists or tuples of coordinates):

from math import dist

dist([1, 0, 0], [0, 1, 0]) # 1.4142135623730951
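
A slightly longer sketch of the same function, assuming Python 3.8 or later. The points `p` and `q` are invented for illustration. It also shows that `dist` rejects points of different lengths with a `ValueError` rather than silently truncating.

```python
from math import dist

# Hypothetical 4-dimensional points that differ only in the last coordinate
p = (1.0, 2.0, 3.0, 4.0)
q = (1.0, 2.0, 3.0, 9.0)
d = dist(p, q)
print(d)

# Points with different numbers of coordinates are rejected
try:
    dist((0, 0), (0, 0, 0))
    mismatch_raises = False
except ValueError:
    mismatch_raises = True
print(mismatch_raises)
```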

edited Jul 28 '19 at 05:30

answered Jan 15 '19 at 10:46


And it's noticeably faster than scipy's euclidean function! +1 – mauriii Aug 26 '20 at 07:09

score 7 · Answer 4 · answered Feb 23 '12 at 14:25

A and B are two points in 24-D space. You should use scipy.spatial.distance.euclidean.

See the SciPy documentation for scipy.spatial.distance.euclidean.

scipy.spatial.distance.euclidean(A, B)

answered Feb 23 '12 at 14:25

Ade YU


score 4 · Answer 5 · answered Dec 12 '17 at 21:29

Since all of the above answers refer to numpy and/or scipy, I just wanted to point out that something really simple can be done with reduce here:

from functools import reduce
from math import sqrt

def n_dimensional_euclidean_distance(a, b):
    """
    Returns the Euclidean distance between two n-dimensional points
    :param a: sequence of numbers
    :param b: sequence of numbers, at least as long as a
    :return: the Euclidean distance as a float
    """
    dimension = len(a)  # note: raises IndexError if len(b) < len(a); extra entries in b are ignored

    # i is the running sum of squared differences, j is the coordinate index
    return sqrt(reduce(lambda i, j: i + ((a[j] - b[j]) ** 2), range(dimension), 0))

This sums (a[j] - b[j])^2 over every dimension j and takes the square root of the total. It was written with n >= 2 dimensions in mind, although the same expression also evaluates for a single dimension.
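
The same sum can also be written without index lookups. The following is a sketch of an alternative, not the answer's own code: the function name `euclidean_distance` and the explicit length check are assumptions added for illustration.

```python
from math import sqrt

def euclidean_distance(a, b):
    """Euclidean distance between two equal-length sequences of numbers."""
    # Hypothetical safety check: fail loudly instead of ignoring extra coordinates
    if len(a) != len(b):
        raise ValueError("both points must have the same number of dimensions")
    # zip pairs up matching coordinates; sum accumulates the squared differences
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

print(euclidean_distance((0, 0), (3, 4)))
```

Using `zip` with a generator expression avoids the accumulator lambda, and the length check makes a dimension mismatch an explicit error in both directions.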

score 4 · Answer 6 · answered Feb 23 '12 at 14:29

Apart from the already mentioned ways of computing the Euclidean distance, here's one that's close to your original code:

scipy.spatial.distance.cdist([A], [B], 'euclidean')

or

scipy.spatial.distance.cdist(numpy.atleast_2d(A), numpy.atleast_2d(B), 'euclidean')

This returns a 1×1 numpy.ndarray holding the L2 distance.
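
To make the shape requirements concrete, here is a small sketch with two made-up 3-dimensional points standing in for the 24-dimensional arrays. It also covers `pdist`, which the question asked about: `cdist` takes two 2-D arrays with one point per row, while `pdist` takes a single 2-D array containing all the points and returns the pairwise distances in condensed form.

```python
import numpy as np
from scipy.spatial.distance import cdist, pdist

# Hypothetical points, used only to illustrate the array shapes
A = np.array([1.0, 0.0, 0.0])
B = np.array([0.0, 1.0, 0.0])

# cdist: both inputs must be 2-D, here each has shape (1, 3)
d_c = cdist(A[np.newaxis, :], B[np.newaxis, :], 'euclidean')
print(d_c.shape)  # one row per point in the first array, one column per point in the second

# pdist: stack all points into one (2, 3) array; the result holds one distance per pair
d_p = pdist(np.vstack([A, B]), 'euclidean')
print(d_p.shape)
```

For just two points both calls are more machinery than needed, but they scale to many points without a Python loop.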

score 1 · Answer 7 · answered Sep 18 '21 at 10:33

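Writing your own square root of a sum of squares is not always safe

You can use math.hypot, numpy.hypot or a SciPy distance function rather than writing numpy.sqrt(numpy.sum((A - B)**2)) or (i**2 + j**2)**0.5 yourself. The hand-written forms square each term before summing, so in your case they may overflow. Note that math.hypot accepts more than two arguments only from Python 3.8 on, and numpy.hypot works element-wise on exactly two inputs.

Speed-wise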

%%timeit
math.hypot(*(A - B))
# 3 µs ± 64.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%%timeit
numpy.sqrt(numpy.sum((A - B)**2))
# 5.65 µs ± 50.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Safety-wise

Underflow

i, j = 1e-200, 1e-200
np.sqrt(i**2+j**2)
# 0.0

Overflow

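i, j = np.float64(1e+200), np.float64(1e+200)  # NumPy scalars; with plain Python floats, i**2 raises OverflowError instead
np.sqrt(i**2+j**2)
# inf (NumPy also emits an overflow RuntimeWarning)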

No Underflow

i, j = 1e-200, 1e-200
np.hypot(i, j)
# 1.414213562373095e-200

No Overflow

i, j = 1e+200, 1e+200
np.hypot(i, j)
# 1.414213562373095e+200
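
The examples above use two coordinates. For more dimensions, `math.hypot` can take all the coordinate differences at once, assuming Python 3.8 or later. The sketch below uses invented differences in `diffs` (scaled-up multiples of 3, 4 and 12, whose true length is 13 times the scale) and compares it with a hand-written sum of squares.

```python
import math

# Hypothetical coordinate differences, large enough that their squares exceed the float range
diffs = [3e200, 4e200, 12e200]

# Hand-written form: each d * d overflows to inf, so the result is inf
naive = math.sqrt(sum(d * d for d in diffs))

# math.hypot with several arguments (Python 3.8+) avoids the intermediate overflow
safe = math.hypot(*diffs)

print(naive)  # inf
print(safe)
```

With NumPy arrays the same call would be `math.hypot(*(A - B))`, as in the timing example above.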