3

I am trying to find the most efficient way to get the indexes of nested arrays in another array.

import numpy as np
#                     0     1      2     3
haystack = np.array([[1,3],[3,4,],[5,6],[7,8]])
needles  = np.array([[3,4],[7,8]])

Given the arrays contained in needles, I want to find their indexes in haystack. In this case, 1 and 3.

I came up with this solution:

 indexes = [idx for idx,elem in enumerate(haystack) if elem in needles ]

This is wrong, because it is enough for a single element of elem to match an element of needles for idx to be returned.
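To make the pitfall concrete, here is a minimal sketch. The row `[3, 9]` is a made-up example that is not part of the data above. For NumPy arrays, `elem in needles` behaves like `(needles == elem).any()`, so one matching position is enough, whereas a full-row match needs `.all(axis=1)`:

```python
import numpy as np

needles = np.array([[3, 4], [7, 8]])
elem = np.array([3, 9])  # hypothetical row: shares only the 3 with [3, 4]

# Membership test on arrays: True as soon as any single position matches
partial = elem in needles

# Full-row test: every column must match within the same row of needles
exact = bool((needles == elem).all(axis=1).any())

print(partial, exact)  # True False
```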

Is there any faster alternative?

G M
  • `indexes = [idx for idx,elem in enumerate(needles) if elem in haystack ]` This gets indexes in needles, not haystack! – h4z3 Jul 02 '19 at 11:33

3 Answers

0

This answer solves a similar problem, "Get intersecting rows across two 2D numpy arrays". It uses the np.in1d function, which is pretty efficient, and passes it a view of both arrays so that each row can be processed as a single element of a 1-D array. In your case, you could do:

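import numpy as np

A = np.array([[1,3],[3,4,],[5,6],[7,8]])
B = np.array([[3,4],[7,8]])
nrows, ncols = A.shape
# Structured dtype with one field per column, so each row becomes one element
dtype={'names':['f{}'.format(i) for i in range(ncols)],
       'formats':ncols * [A.dtype]}
# Compare whole rows as 1-D data and keep the positions of the matches in A
indexes, = np.where(np.in1d(A.view(dtype), B.view(dtype)))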

which outputs:

print(indexes)
> [1 3]
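The same row-view idea can be wrapped in a small helper. This is only a sketch under a few assumptions: the name `find_row_indices` is hypothetical, both inputs are 2-D with the same number of columns, and `needles` can be cast to the dtype of `haystack`. It uses `np.isin` rather than `np.in1d`, since, as far as I know, recent NumPy releases deprecate `np.in1d` in favour of `np.isin`:

```python
import numpy as np

def find_row_indices(haystack, needles):
    """Return the indices of the rows of haystack that also appear in needles."""
    # Views need contiguous memory and identical dtypes on both sides
    haystack = np.ascontiguousarray(haystack)
    needles = np.ascontiguousarray(needles, dtype=haystack.dtype)
    ncols = haystack.shape[1]
    # One field per column: each row is then a single structured element
    row_dtype = np.dtype([(f"f{i}", haystack.dtype) for i in range(ncols)])
    hay_rows = haystack.view(row_dtype).ravel()
    needle_rows = needles.view(row_dtype).ravel()
    return np.flatnonzero(np.isin(hay_rows, needle_rows))

A = np.array([[1, 3], [3, 4], [5, 6], [7, 8]])
B = np.array([[3, 4], [7, 8]])
result = find_row_indices(A, B)
print(result)  # [1 3]
```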
Ayoub ZAROU
0

You can try this

indices = np.apply_along_axis(lambda x: np.where(np.isin(haystack, x).sum(axis=1)==2)[0], 1, needles).flatten()
>>> indices
array([1, 3])
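One caveat with counting `np.isin` hits per row is that it ignores position, so a row such as `[4, 3]` would also count as a match for `[3, 4]`. A position-aware variant can be sketched with broadcasting. The extra row `[4, 3]` below is a made-up addition to show the difference, and the approach builds a boolean array of shape (len(haystack), len(needles), ncols), so it is meant for moderately sized inputs:

```python
import numpy as np

# Same data as in the question, plus a hypothetical reversed row at index 4
haystack = np.array([[1, 3], [3, 4], [5, 6], [7, 8], [4, 3]])
needles = np.array([[3, 4], [7, 8]])

# Compare every haystack row with every needle row, column by column
matches = (haystack[:, None, :] == needles[None, :, :]).all(axis=2)
indices = np.flatnonzero(matches.any(axis=1))
print(indices)  # [1 3]

# Position-agnostic count for the first needle, for comparison
loose = np.flatnonzero(np.isin(haystack, needles[0]).sum(axis=1) == 2)
print(loose)  # [1 4]
```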
Greeser
-1

You can directly use the index function as you are only searching on the topmost level of the nesting.

indexes = [[a,haystack.index(a)] for a in needles.tolist()]

Edit: I am leaving this answer up to provide an alternative, but using core numpy functions, such as those found in the other answers, is probably the better way.

DarkElf73
  • has [3,4] not [3,4]? there must be an error in your answer... – G M Jul 02 '19 at 12:41
  • I meant that there was an extra **','** at the end which I know is supported in numpy but I don't know if that makes any difference when comparing – DarkElf73 Jul 02 '19 at 12:47
  • No, that doesn't make a difference; the problem is that np.array does not have an .index method – G M Jul 02 '19 at 12:51
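Since `np.ndarray` has no `.index` method, the list-based idea only works after converting with `.tolist()`. A minimal sketch of that variant follows. The dictionary `row_to_index` is a hypothetical helper name, and it assumes the rows of `haystack` are unique (with duplicates, the dictionary keeps the last position while `list.index` returns the first):

```python
import numpy as np

haystack = np.array([[1, 3], [3, 4], [5, 6], [7, 8]])
needles = np.array([[3, 4], [7, 8]])

# Plain-list version: list.index scans the haystack once per needle
hay_list = haystack.tolist()
indexes = [hay_list.index(row) for row in needles.tolist()]
print(indexes)  # [1, 3]

# Dictionary version: one pass to build the lookup, then constant-time queries
row_to_index = {tuple(row): i for i, row in enumerate(hay_list)}
indexes_fast = [row_to_index[tuple(row)] for row in needles.tolist()]
print(indexes_fast)  # [1, 3]
```

Note that `list.index` raises `ValueError` and the dictionary lookup raises `KeyError` when a needle is not present in haystack.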