pandas: aggregate a column of lists into one list

Question

I have the following data frame my_df:

name         numbers
----------------------
A             [4,6]
B             [3,7,1,3]
C             [2,5]
D             [1,2,3]

I want to combine all the numbers into a new list, so the output should be:

 new_numbers
---------------
[4,6,3,7,1,3,2,5,1,2,3]

And here is my code:

def combine_list(my_lists):
    new_list = []
    for x in my_lists:
        new_list.append(x)

    return new_list

new_df = my_df.agg({'numbers': combine_list})

But new_df still looks the same as the original:

              numbers
----------------------
0             [4,6]
1             [3,7,1,3]
2             [2,5]
3             [1,2,3]

What did I do wrong? How do I make new_df look like this:

 new_numbers
---------------
[4,6,3,7,1,3,2,5,1,2,3]

Thanks!

jezrael · Accepted Answer · 2017-10-23T19:43:53.443

4

You need to flatten the values and then create a new DataFrame with the constructor:

flatten = [item for sublist in df['numbers'].values.tolist() for item in sublist]

Or:

import numpy as np

flatten = np.concatenate(df['numbers'].values).tolist()

Or:

from itertools import chain

flatten = list(chain.from_iterable(df['numbers'].values.tolist()))

df1 = pd.DataFrame({'numbers':[flatten]})

print(df1)
                             numbers
0  [4, 6, 3, 7, 1, 3, 2, 5, 1, 2, 3]

Timings are here.
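For anyone who wants to run the three options side by side, here is a minimal self-contained sketch. It assumes the question's data is rebuilt by hand as `df` (the name used in this answer, not the question's `my_df`), and the variable names `flat_comprehension`, `flat_numpy`, and `flat_chain` are only illustrative.

```python
from itertools import chain

import numpy as np
import pandas as pd

# Rebuild the question's data by hand (assumed layout: one list per row).
df = pd.DataFrame({
    'name': ['A', 'B', 'C', 'D'],
    'numbers': [[4, 6], [3, 7, 1, 3], [2, 5], [1, 2, 3]],
})

# Option 1: nested list comprehension over the column.
flat_comprehension = [item for sublist in df['numbers'] for item in sublist]

# Option 2: NumPy concatenates the row lists, then converts back to a list.
flat_numpy = np.concatenate(df['numbers'].values).tolist()

# Option 3: itertools.chain walks the row lists lazily.
flat_chain = list(chain.from_iterable(df['numbers']))

# Wrap the flat list in an outer list so it lands in a single cell.
df1 = pd.DataFrame({'new_numbers': [flat_chain]})
print(df1)
```

The outer brackets in `[flat_chain]` matter: without them the constructor would spread the values over eleven rows instead of storing one list in one cell.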

edited Oct 23 '17 at 19:43

answered Oct 23 '17 at 19:36

jezrael


1

`functools.reduce(lambda x,y: x+y,l)` should be even faster – BENY Oct 23 '17 at 20:01
@Wen - I think it depends on the size of the lists and the length of the df, but [according to this answer](https://stackoverflow.com/a/953097/2901002) the fastest solution is `chain.from_iterable`. – jezrael Oct 23 '17 at 20:07

score 1 · Answer 2 · answered Oct 23 '17 at 19:39

1

You can use df['numbers'].sum(), which returns a combined list, to create the new DataFrame:

new_df = pd.DataFrame({'new_numbers': [df['numbers'].sum()]})

    new_numbers
0   [4, 6, 3, 7, 1, 3, 2, 5, 1, 2, 3]


Vaishali


Unfortunately your solution is [slow](https://stackoverflow.com/a/44753790/2901002) :( – jezrael Oct 23 '17 at 19:42
@jezrael, oh I didn't test the timings – Vaishali Oct 23 '17 at 19:43
@jezrael, yeah I agree, it's surprisingly slow :( – Vaishali Oct 23 '17 at 19:48
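A rough way to check the speed claims from the comments yourself. This is only a sketch in plain Python, assuming the slowness comes from repeated list concatenation: each `+` builds a new list, so earlier items get copied again and again. The test data and the helper names `flatten_reduce`, `flatten_sum`, and `flatten_chain` are made up for illustration, and the timings will vary by machine and data size.

```python
from functools import reduce
from itertools import chain
from timeit import timeit

# Hypothetical test data: 2,000 short lists.
lists = [[i, i + 1, i + 2] for i in range(2000)]

def flatten_reduce(ls):
    # Pairwise concatenation; every step allocates a new, longer list.
    return reduce(lambda x, y: x + y, ls)

def flatten_sum(ls):
    # Built-in sum with a list start value; also repeated concatenation.
    return sum(ls, [])

def flatten_chain(ls):
    # Single pass over all items, no intermediate lists.
    return list(chain.from_iterable(ls))

# All three must agree before the timings mean anything.
assert flatten_reduce(lists) == flatten_sum(lists) == flatten_chain(lists)

for fn in (flatten_reduce, flatten_sum, flatten_chain):
    seconds = timeit(lambda: fn(lists), number=10)
    print(f"{fn.__name__}: {seconds:.4f} s")
```

Try increasing the number of lists to see how the gap changes as the input grows.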

score 0 · Answer 3 · answered Oct 23 '17 at 19:46

0

This should do:

newdf = pd.DataFrame({'numbers':[[x for i in my_df['numbers'] for x in i]]})


Puneet Tripathi


score 0 · Answer 4 · answered Feb 21 '20 at 10:44

0

Check this pandas groupby and join lists

What you are looking for is,

my_df = my_df.groupby(['name']).agg(sum)


Shobha Deepthi


Please don't post link only answers to other stackoverflow questions. Instead, vote/flag to close as duplicate, or, if the question is not a duplicate, *tailor the answer to this specific question.* – Waqar UlHaq Feb 21 '20 at 11:02
