
I need to perform a simple operation a few hundred times on large arrays (thousands of elements), so I need an efficient solution; plain for-loops are too slow.

I have two arrays, e.g.

a = np.array([1,23,25,100,100,101])
b = np.array([1,2,2,3,4,4])

I would now like to sum all elements of a that share the same value in b, i.e.

[1,48,100,201]

I could do:

# first index of each unique entry in b
u = np.unique(b, return_index=True)[1]
# split a at those indices and sum each piece
list(map(sum, np.split(a, u[1:])))

But that's a bit slow, and it only works if the entries in b are sorted. Is there any other way of doing this?

Amirhossein Kiani

1 Answer


Try:

>>> [a[b==n].sum() for n in np.unique(b)]
[1, 48, 100, 201]
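
The list comprehension builds one boolean mask per unique value, so it scans b once per group. A fully vectorized alternative, offered here as a sketch and not as part of the original answer, maps every entry of b to a group index with np.unique(..., return_inverse=True) and lets np.bincount accumulate the weights. The helper name group_sums is hypothetical, and the sketch assumes a is numeric; note that np.bincount with weights returns floats.

```python
import numpy as np

def group_sums(a, b):
    """Sum the entries of a that share a value in b (b need not be sorted).

    Hypothetical helper for illustration; assumes a is a 1-D numeric array
    and b is a 1-D array of the same length.
    """
    # keys: sorted unique values of b; inverse: group index of each element
    keys, inverse = np.unique(b, return_inverse=True)
    # bincount adds a[i] into slot inverse[i]; the result dtype is float64
    sums = np.bincount(inverse.ravel(), weights=a, minlength=len(keys))
    return keys, sums

a = np.array([1, 23, 25, 100, 100, 101])
b = np.array([1, 2, 2, 3, 4, 4])
keys, sums = group_sums(a, b)
print(keys.tolist(), sums.tolist())
```

If integer output is needed, sums.astype(a.dtype) converts it back, assuming the sums are small enough to be represented exactly as floats.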

If you're open to using pandas:

>>> import pandas as pd
>>> pd.DataFrame({"a": a, "b": b}).groupby("b").sum()["a"].tolist()
[1, 48, 100, 201]
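
For a NumPy-only route that keeps the integer dtype and does not require b to be sorted, one possible sketch sorts both arrays by b and then sums each contiguous run with np.add.reduceat. This is an illustration under stated assumptions, not the answerer's method: group_sums_reduceat is a hypothetical name, and a and b are assumed to be 1-D arrays of equal length.

```python
import numpy as np

def group_sums_reduceat(a, b):
    """Group-wise sums of a keyed by b, preserving the dtype of a.

    Hypothetical helper for illustration only.
    """
    # Stable sort so equal keys in b become contiguous runs
    order = np.argsort(b, kind="stable")
    a_sorted, b_sorted = a[order], b[order]
    # starts[i] is the first position of the i-th unique key in b_sorted
    keys, starts = np.unique(b_sorted, return_index=True)
    # reduceat sums a_sorted[starts[i]:starts[i+1]] for each run
    return keys, np.add.reduceat(a_sorted, starts)

a = np.array([1, 23, 25, 100, 100, 101])
b = np.array([1, 2, 2, 3, 4, 4])
keys, sums = group_sums_reduceat(a, b)
print(keys.tolist(), sums.tolist())
```

When b is already sorted, as in the question, the argsort step can be skipped and np.add.reduceat applied directly to a.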
not_speshal