If I have two Series objects, such as [0, 0, 1] and [1, 0, 0], how would I get the intersection and union of the two? They contain only Booleans, so the values are not unique.

I have a large Boolean matrix. I've MinHashed it, and now I'm trying to find the false positives and false negatives, which I think means I have to compute the Jaccard similarity for each original pair.

user3927312

1 Answer

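Since the values are Booleans, you can use NumPy's logical_and and logical_or, or the & and | operators directly on the Series. The Jaccard similarity is then the size of the intersection divided by the size of the union, i.e.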

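import numpy as np
import pandas as pd

y1 = pd.Series([1, 0, 1, 0])
y2 = pd.Series([1, 0, 0, 1])

# NumPy approach: element-wise logical AND / OR on the underlying arrays
intersection = np.logical_and(y1.values, y2.values)
union = np.logical_or(y1.values, y2.values)
intersection.sum() / union.sum()
# 0.3333333333333333

# pandas approach: & and | act element-wise on 0/1 (or Boolean) Series
(y1 & y2).sum() / (y1 | y2).sum()
# 0.3333333333333333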
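
Since the question mentions a large Boolean matrix and a similarity for each pair, the same idea can be extended to all row pairs at once. The following is only a sketch: the helper names `jaccard` and `pairwise_jaccard` are made up for illustration, it assumes rows are the items being compared, and it assumes that two all-False vectors should be treated as identical (similarity 1.0), which you may want to define differently.

```python
import numpy as np
import pandas as pd


def jaccard(a, b):
    """Jaccard similarity of two equal-length 0/1 (or Boolean) sequences."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        # Assumption: two all-False vectors are treated as identical.
        return 1.0
    return np.logical_and(a, b).sum() / union


def pairwise_jaccard(matrix):
    """Jaccard similarity between every pair of rows of a Boolean matrix."""
    m = np.asarray(matrix, dtype=bool).astype(int)
    # Entry (i, j) counts the positions where rows i and j are both True.
    intersection = m @ m.T
    row_sums = m.sum(axis=1)
    # |A or B| = |A| + |B| - |A and B|
    union = row_sums[:, None] + row_sums[None, :] - intersection
    # Assumption: pairs with an empty union keep the default value 1.0.
    out = np.ones_like(union, dtype=float)
    np.divide(intersection, union, out=out, where=union != 0)
    return out


y1 = pd.Series([1, 0, 1, 0])
y2 = pd.Series([1, 0, 0, 1])
single = jaccard(y1, y2)
similarities = pairwise_jaccard([y1, y2, [0, 0, 0, 0]])
```

Here `single` matches the value computed above for `y1` and `y2`, and `similarities[i, j]` holds the similarity between rows `i` and `j`. The matrix product builds an n-by-n result, so for a very large number of rows you would probably need to process it in blocks.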
Bharath