6

I am currently using SciPy to calculate the Euclidean distance:

dis = scipy.spatial.distance.euclidean(A,B)

where A and B are 5-dimensional bit vectors. This works fine, but if I add a weight for each dimension, is it still possible to use SciPy?

What I have now: sqrt((a1-b1)^2 + (a2-b2)^2 +...+ (a5-b5)^2)

What I want: sqrt(w1*(a1-b1)^2 + w2*(a2-b2)^2 +...+ w5*(a5-b5)^2), using SciPy, NumPy, or any other efficient method.

Thanks

Maggie

3 Answers

9

The suggestion of writing your own weighted L2 norm is a good one, but the calculation in the other answer is incorrect because it squares the weights. If the intention is to calculate

$$\sqrt{\sum_{i} w_i (a_i - b_i)^2}$$

then this should do the job:

import numpy as np

def weightedL2(a, b, w):
    # element-wise difference, then the weighted sum of squares
    q = a - b
    return np.sqrt((w*q*q).sum())
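To make the formula concrete, here is a small worked example. It is only a sketch: the name `weighted_l2` and the vectors and weights are made up for illustration and do not come from the question. The explicit conversion to float is an added precaution, since NumPy does not support subtracting boolean arrays, which matters if the bit vectors are stored with `dtype=bool`.

```python
import numpy as np

def weighted_l2(a, b, w):
    """Weighted Euclidean distance: sqrt(sum_i w_i * (a_i - b_i)**2)."""
    # Convert to float so that boolean "bit vector" inputs can be subtracted.
    a, b, w = (np.asarray(x, dtype=float) for x in (a, b, w))
    q = a - b
    return np.sqrt(np.sum(w * q * q))

# Hypothetical 5-dimensional bit vectors and weights (illustrative values only).
a = np.array([1, 0, 1, 1, 0])
b = np.array([0, 0, 1, 0, 1])
w = np.array([4.0, 1.0, 1.0, 9.0, 3.0])

# The vectors differ in dimensions 1, 4 and 5, so only w1, w4 and w5 contribute.
d_weighted = weighted_l2(a, b, w)        # sqrt(4 + 9 + 3) = sqrt(16)
d_plain = weighted_l2(a, b, np.ones(5))  # all weights 1: the ordinary distance, sqrt(3)
print(d_weighted, d_plain)
```

With all weights set to 1 the function reduces to the ordinary Euclidean distance, which is a quick sanity check for any implementation.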
Community
talonmies
1

If you want to keep using the SciPy function, you could pre-process the vectors like this:

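import numpy as np
import scipy.spatial.distance

def weighted_euclidean(a, b, w):
    # scale both vectors by sqrt(w); requires non-negative weights
    A = a*np.sqrt(w)
    B = b*np.sqrt(w)
    return scipy.spatial.distance.euclidean(A, B)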

However, it appears to be slower than:

def weightedL2(a, b, w):
    q = a-b
    return np.sqrt((w*q*q).sum())
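The two versions agree because sqrt(w)*a - sqrt(w)*b = sqrt(w)*(a - b), and squaring that gives w*(a - b)^2. The check below is a rough sketch of that equivalence. It uses `np.linalg.norm` as a stand-in for the SciPy call so that it depends only on NumPy, and the random test data is made up. The weights are assumed to be non-negative, otherwise `np.sqrt(w)` produces NaN.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative random bit vectors and non-negative weights.
a = rng.integers(0, 2, size=5).astype(float)
b = rng.integers(0, 2, size=5).astype(float)
w = rng.uniform(0.0, 2.0, size=5)

s = np.sqrt(w)
d_prescaled = np.linalg.norm(s * a - s * b)    # plain L2 distance of the rescaled vectors
d_direct = np.sqrt(np.sum(w * (a - b) ** 2))   # weighted formula applied directly

print(np.isclose(d_prescaled, d_direct))
```

The pre-scaling version computes two extra products and a square root per call, which is one plausible reason it would be slower, though that is best confirmed by timing both on your own data.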
ucsky
1

Simply define it yourself. Something like this should do the trick:

def mynorm(A, B, w):
    import numpy as np
    q = np.matrix(w * (A - B))
    return np.sqrt((q * q.T).sum())
wim
  • That isn't the norm contained in the question - you have squared the weights. Also the `.sum()` is completely redundant, `q*q.T` is the inner product of the vector with itself, i.e. it *is* the sum. – talonmies Jan 14 '12 at 12:05
  • You are correct about the weights, I should have been more careful, however your criticism about the `.sum()` being completely redundant is misguided. The result of `q * q.T` would be a 1x1 matrix, which would be an unexpected return type for a norm function, the sum will turn it into a scalar. – wim Jan 14 '12 at 14:02
  • But why use `sum()` to cast to a scalar? `np.asscalar` will be several times faster. – talonmies Jan 14 '12 at 14:14
  • I don't know the reason, but that is how it is implemented in `scipy.spatial.distance.euclidean` .. I just assume the authors of scipy know what's best – wim Jan 14 '12 at 14:52
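For what it's worth, both points raised in these comments can be sidestepped by working with plain 1-D arrays instead of `np.matrix`. The sketch below is one possible variant, not the code from any of the answers, and `weighted_norm` is a hypothetical name. It applies the weight once rather than squared, and `np.dot` on two 1-D arrays already returns a scalar, so no `.sum()` or cast is needed. As far as I know, `np.asscalar` has since been deprecated and removed from recent NumPy releases, with `.item()` as the usual replacement.

```python
import numpy as np

def weighted_norm(a, b, w):
    """sqrt(sum_i w_i * (a_i - b_i)**2) using 1-D arrays only."""
    q = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    w = np.asarray(w, dtype=float)
    # The inner product of (w * q) with q is the weighted sum of squares;
    # for 1-D inputs np.dot returns a 0-dimensional scalar, not a 1x1 matrix.
    return np.sqrt(np.dot(w * q, q))

# Illustrative inputs (not from the question).
result = weighted_norm([1, 0, 1, 1, 0], [0, 0, 1, 0, 1], [4, 1, 1, 9, 3])
print(result, np.ndim(result))
```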