How to sum a column grouped by other columns in a list?

Question

I have a list as follows.

[['Andrew', '1', '9'], ['Peter', '1', '10'], ['Andrew', '1', '8'], ['Peter', '1', '11'], ['Sam', '4', '9'], ['Andrew', '2', '2']]

I would like sum up the last column grouped by the other columns.The result is like this

[['Andrew', '1', '17'], ['Peter', '1', '21'], ['Sam', '4', '9'], ['Andrew', '2', '2']]

which is still a list.

In real practice, I would always like to sum up the last column grouped by many other columns. Is there a way I can do this in Python? Much appreciated.

MaxU - stop genocide of UA · Accepted Answer · 2018-03-28T13:54:51.240

14

dynamically grouping by all columns except the last one:

In [24]: df = pd.DataFrame(data)

In [25]: df.groupby(df.columns[:-1].tolist(), as_index=False).agg(lambda x: x.astype(int).sum()).values.tolist()
Out[25]: [['Andrew', '1', 17], ['Andrew', '2', 2], ['Peter', '1', 21], ['Sam', '4', 9]]

edited Mar 28 '18 at 13:54

answered Mar 28 '18 at 13:51

MaxU - stop genocide of UA

191,778
30
340
375

Does [0,1] represents the first column and the second column? – Deepleeqe Mar 28 '18 at 13:53
@Deepleeqe, yes, i've also added another version where it's done dynamically for any number of columns... – MaxU - stop genocide of UA Mar 28 '18 at 13:54

jpp · Answer 2 · 2018-07-06T11:03:33.050

This is an O(n) solution via collections.defaultdict, adaptable to any number of keys.

If your desired output is a list, then this may be preferable to a solution via Pandas, which requires conversion to and from a non-standard type.

from collections import defaultdict

lst = [['Andrew', '1', '9'], ['Peter', '1', '10'], ['Andrew', '1', '8'],
       ['Peter', '1', '11'], ['Sam', '4', '9'], ['Andrew', '2', '2']]

d = defaultdict(int)

for *keys, val in lst:
    d[tuple(keys)] += int(val)

res = [[*k, v] for k, v in sorted(d.items())]

Result

[['Andrew', '1', 17], ['Andrew', '2', 2], ['Peter', '1', 21], ['Sam', '4', 9]]

Explanation

Cycle through your list of lists, define keys / value and add to your defaultdict of lists.
Use a list comprehension to convert dictionary to desired output.

BENY · Answer 3 · 2018-03-28T14:13:39.993

Op1

You can pass a index sum and add tolist convert back to list

pd.DataFrame(L).\
   set_index([0,1])[2].astype(int).sum(level=[0,1]).\
        reset_index().values.tolist()
Out[78]: [['Andrew', '1', 17], ['Peter', '1', 21], ['Sam', '4', 9], ['Andrew', '2', 2]]

Op2

For list of list you can using groupby from itertools

from itertools import groupby
[k+[sum(int(v) for _,_, v in g)] for k, g in groupby(sorted(l), key = lambda x: [x[0],x[1]])]
Out[98]: [['Andrew', '1', 17], ['Andrew', '2', 2], ['Peter', '1', 21], ['Sam', '4', 9]]

jezrael · Answer 4 · 2018-03-28T14:24:48.527

Create to DataFrame and aggregate third column converted to integers by first and second columns, last convert back to lists:

df = pd.DataFrame(L)
L = df[2].astype(int).groupby([df[0], df[1]]).sum().reset_index().values.tolist()
print (L)
[['Andrew', '1', 17], ['Andrew', '2', 2], ['Peter', '1', 21], ['Sam', '4', 9]]

And solution with defaultdict, python 3.x only:

from collections import defaultdict

d = defaultdict(int)
#https://stackoverflow.com/a/10532492
for *head, tail in L:
    d[tuple(head)] += int(tail)

d = [[*i, j] for i, j in sorted(d.items())]
print (d)
[['Andrew', '1', 17], ['Andrew', '2', 2], ['Peter', '1', 21], ['Sam', '4', 9]]

score 6 · Answer 5 · answered Mar 28 '18 at 16:19

`pd.factorize` and `np.bincount`

f, u = pd.factorize([tuple(x[:-1]) for x in data])
v = np.array([x[-1] for x in data], int)

[list(k) + [int(v)] for k, v in zip(u, np.bincount(f, v))]

[['Andrew', '1', 17], ['Peter', '1', 21], ['Sam', '4', 9], ['Andrew', '2', 2]]

prosti · Answer 6 · 2019-05-06T23:00:26.687

Added my 2 cents. Both of these use groupby, agg

V1: introduces the new sum function.

sum=lambda x: x.astype(int).sum()
print(df.groupby([0,1], as_index=False).agg({2: sum}).values.tolist())

V2: converts the column to numeric

df[2] = pd.to_numeric(df[2])
print(df.groupby([0,1], as_index=False).agg({2: sum}).values.tolist())

And will return:

[['Andrew', '1', 17], ['Andrew', '2', 2], ['Peter', '1', 21], ['Sam', '4', 9]]

How to sum a column grouped by other columns in a list?

6 Answers6

pd.factorize and np.bincount

`pd.factorize` and `np.bincount`