How can I sum specific elements of an array in python

Question

I need to perform a simple operation a few hundred times on large arrays (thousands of elements each), so I am looking for the most efficient solution; for-loops are too slow.

I have two arrays, e.g.:

import numpy as np
a = np.array([1, 23, 25, 100, 100, 101])
b = np.array([1, 2, 2, 3, 4, 4])

I would like to sum all elements of a that share the same value in b, i.e. get:

[1,48,100,201]

I could do:

# index of the first occurrence of each unique value in b (assumes b is sorted)
u = np.unique(b, return_index=True)[1]
# split a at those indices (skipping the leading 0) and sum each chunk
list(map(sum, np.split(a, u[1:])))

But that's a bit slow, and it only works if the entries in b are sorted. Is there any other way of doing this?
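Since the split approach only needs equal labels to be adjacent, one way to lift the sorted-input requirement is to sort by `b` first. The sketch below is an assumption rather than anything from this thread: it swaps the Python-level `map(sum, ...)` for `np.add.reduceat`, which sums every slice in a single vectorised call. The function name `group_sums_via_sort` is hypothetical.

```python
import numpy as np

a = np.array([1, 23, 25, 100, 100, 101])
b = np.array([1, 2, 2, 3, 4, 4])

def group_sums_via_sort(a, b):
    """Hypothetical helper: sum a over equal values of b; b need not be sorted."""
    # A stable sort groups equal labels together while keeping their original order
    order = np.argsort(b, kind="stable")
    a_sorted, b_sorted = a[order], b[order]
    # Start index of each run of equal labels in the sorted copy of b
    labels, starts = np.unique(b_sorted, return_index=True)
    # reduceat sums a_sorted[starts[i]:starts[i+1]] for every i (last run goes to the end)
    return labels, np.add.reduceat(a_sorted, starts)

labels, sums = group_sums_via_sort(a, b)
print(sums.tolist())  # [1, 48, 100, 201]
```

The sort costs O(n log n); if `b` is already sorted, the `argsort` step can be dropped and the rest is unchanged.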

`np.bincount(b-1, a)`, or `np.bincount(b, a)` to get an array whose entry at index `[bin]` is the sum of `a` over that bin. – Michael Szczesny, Feb 28 '22 at 15:18
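The `np.bincount` suggestion can be generalised to labels that are unsorted, non-contiguous, or not small non-negative integers by first mapping each label to a dense index with `np.unique(..., return_inverse=True)`. A minimal sketch, assuming 1-D inputs like the ones in the question:

```python
import numpy as np

a = np.array([1, 23, 25, 100, 100, 101])
b = np.array([1, 2, 2, 3, 4, 4])

# Map each label in b to a dense index 0..k-1; works whether or not b is sorted
labels, inverse = np.unique(b, return_inverse=True)
# bincount adds a[i] into slot inverse[i]; with weights the result is float64
sums = np.bincount(inverse, weights=a)

print(labels.tolist())  # [1, 2, 3, 4]
print(sums.tolist())    # [1.0, 48.0, 100.0, 201.0]
```

When the labels are already small non-negative integers, `np.bincount(b, weights=a)` skips the `np.unique` call but also returns zeros for unused bins (here index 0), which is why the comment subtracts 1. Note the float64 result; cast back with `.astype(a.dtype)` if integers are needed.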

not_speshal · Answer 1 · 2022-02-28T15:21:21.510


Try:

>>> [a[b==n].sum() for n in np.unique(b)]
[1, 48, 100, 201]
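Each `b==n` comparison in the comprehension scans all of `b`, so the work grows with the number of distinct labels. As a loop-free alternative (swapped in here, not part of the original answer), NumPy's unbuffered `np.add.at` can accumulate into one slot per label and, unlike `np.bincount` with weights, keeps `a`'s integer dtype:

```python
import numpy as np

a = np.array([1, 23, 25, 100, 100, 101])
b = np.array([1, 2, 2, 3, 4, 4])

labels, inverse = np.unique(b, return_inverse=True)
# One accumulator per unique label, in the same integer dtype as a
sums = np.zeros(len(labels), dtype=a.dtype)
# Unbuffered scatter-add: repeated indices in inverse accumulate instead of overwriting
np.add.at(sums, inverse, a)

print(sums.tolist())  # [1, 48, 100, 201]
```

`np.add.at` has historically been slower than `np.bincount`, so it is worth timing both on arrays of your real size.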

If you're open to using pandas:

>>> import pandas as pd
>>> pd.DataFrame({"a": a, "b": b}).groupby("b").sum()["a"].tolist()
[1, 48, 100, 201]

edited Feb 28 '22 at 15:21

answered Feb 28 '22 at 14:47


I could also solve the problem like this: `import pandas as pd; df = pd.DataFrame(); df["a"] = a; df["b"] = b; df.groupby(["b"])["a"].agg("sum")`, but I thought there might be a simpler/faster way – MarthaMuller Feb 28 '22 at 15:13
There was no `pandas` tag on your post, hence the solution I suggested. If pandas is an option, you could just do `df.groupby("b")["a"].sum()` – not_speshal Feb 28 '22 at 15:14
Yes, sorry, I don't mind using pandas if that's the most efficient way. – MarthaMuller Feb 28 '22 at 15:17

@MarthaMuller - See the edit! – not_speshal Feb 28 '22 at 15:21

