
I have the following DataFrame my_df:

name         numbers
----------------------
A             [4,6]
B             [3,7,1,3]
C             [2,5]
D             [1,2,3]

I want to combine all the numbers into a single new list, so the output should be:

 new_numbers
---------------
[4,6,3,7,1,3,2,5,1,2,3]

And here is my code:

def combine_list(my_lists):
    new_list = []
    for x in my_lists:
        new_list.append(x)

    return new_list

new_df = my_df.agg({'numbers': combine_list})

But new_df still looks the same as the original:

              numbers
----------------------
0             [4,6]
1             [3,7,1,3]
2             [2,5]
3             [1,2,3]

What did I do wrong? How do I make new_df look like this:

 new_numbers
---------------
[4,6,3,7,1,3,2,5,1,2,3]

Thanks!

Edamame
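As for what went wrong in the question's code: `list.append` adds each sublist as a single element, while `list.extend` adds the sublist's items, so the loop as written can only rebuild a list of lists. Separately, as far as I can tell, `Series.agg` given a plain Python function first tries applying it to each cell on its own (falling back to passing the whole column only if that fails), so `combine_list` receives one row's list at a time and returns a copy of it, which matches the unchanged output. The sketch below uses plain lists standing in for the `numbers` column; `combine_list_extend` is a hypothetical fixed version, not code from the question.

```python
# Plain lists standing in for my_df['numbers'] (assumption: same values as the question)
numbers = [[4, 6], [3, 7, 1, 3], [2, 5], [1, 2, 3]]


def combine_list(my_lists):
    """The question's version: append() nests each sublist as one element."""
    new_list = []
    for x in my_lists:
        new_list.append(x)
    return new_list


def combine_list_extend(my_lists):
    """Hypothetical fix: extend() adds the individual items of each sublist."""
    new_list = []
    for x in my_lists:
        new_list.extend(x)
    return new_list


print(combine_list(numbers))         # [[4, 6], [3, 7, 1, 3], [2, 5], [1, 2, 3]]
print(combine_list_extend(numbers))  # [4, 6, 3, 7, 1, 3, 2, 5, 1, 2, 3]

# Fed a single cell (one row's list), the original simply copies it:
print(combine_list([4, 6]))          # [4, 6]
```

With the real DataFrame, calling the function on the column itself, e.g. `combine_list_extend(my_df['numbers'])`, sidesteps `agg` entirely, because iterating over a Series yields its values.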

4 Answers


You need to flatten the values first and then create a new DataFrame with the constructor:

flatten = [item for sublist in df['numbers'].values.tolist() for item in sublist]

Or:

import numpy as np

flatten = np.concatenate(df['numbers'].values).tolist()

Or:

from itertools import chain

flatten = list(chain.from_iterable(df['numbers'].values.tolist()))

df1 = pd.DataFrame({'numbers':[flatten]})

print (df1)
                             numbers
0  [4, 6, 3, 7, 1, 3, 2, 5, 1, 2, 3]
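The extra brackets in `[flatten]` are what make the result a single row: the dict maps the column name to a list of cell values, so a one-element list holding the flattened list yields one cell, while passing `flatten` directly spreads its items over 11 rows. A quick check, with `flatten` hard-coded to the expected result rather than computed from `df`:

```python
import pandas as pd

# The flattened result produced by any of the three approaches above
flatten = [4, 6, 3, 7, 1, 3, 2, 5, 1, 2, 3]

# One-element list of values -> one row whose cell holds the whole list
df1 = pd.DataFrame({'numbers': [flatten]})

# The list itself as the values -> one row per number
df_rows = pd.DataFrame({'numbers': flatten})

print(df1.shape)      # (1, 1)
print(df_rows.shape)  # (11, 1)
```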

Timings are here.
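The linked timings aren't reproduced here. To compare approaches on your own data, a rough `timeit` harness along these lines should work; the input size (`n_rows`, `row_len`) is an arbitrary assumption, and results will vary with the machine and the shape of the lists. It works on plain lists, so NumPy and pandas overhead isn't measured.

```python
import timeit
from itertools import chain

# Synthetic input (assumption): n_rows sublists of row_len ints each
n_rows, row_len = 1000, 4
sublists = [list(range(row_len)) for _ in range(n_rows)]


def flatten_comprehension(lists):
    return [item for sublist in lists for item in sublist]


def flatten_chain(lists):
    return list(chain.from_iterable(lists))


candidates = {
    'comprehension': flatten_comprehension,
    'chain.from_iterable': flatten_chain,
}

# Sanity check: both approaches must agree before timing them
assert flatten_comprehension(sublists) == flatten_chain(sublists)

for name, func in candidates.items():
    seconds = timeit.timeit(lambda: func(sublists), number=200)
    print(f'{name:20s} {seconds:.4f} s for 200 runs')
```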

jezrael
  • `functools.reduce(lambda x,y: x+y,l)` should be even faster – BENY Oct 23 '17 at 20:01
  • @Wen - I think it depends on the size of the lists and the length of the df, but [according to this answer](https://stackoverflow.com/a/953097/2901002) the fastest solution is `chain.from_iterable`. – jezrael Oct 23 '17 at 20:07
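For reference, the `functools.reduce` variant from the comment looks like this on plain lists (`operator.add` is shown as an equivalent spelling of the lambda). One thing to weigh against `chain.from_iterable`: `x + y` builds a brand-new list at every step, so the total copying grows roughly quadratically with the number of rows, whereas `chain.from_iterable` visits each item once. Also, `reduce` without an initial value raises `TypeError` on an empty sequence, so passing `[]` as the third argument is a safer default.

```python
import operator
from functools import reduce
from itertools import chain

numbers = [[4, 6], [3, 7, 1, 3], [2, 5], [1, 2, 3]]

# Comment's version: each step concatenates into a new list
via_lambda = reduce(lambda x, y: x + y, numbers)

# Same thing spelled with operator.add
via_operator = reduce(operator.add, numbers)

# Linear-time alternative from the answer
via_chain = list(chain.from_iterable(numbers))

# With [] as the initializer, an empty column gives [] instead of TypeError
via_empty = reduce(operator.add, [], [])

print(via_lambda)  # [4, 6, 3, 7, 1, 3, 2, 5, 1, 2, 3]
```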

You can use df['numbers'].sum(), which concatenates the lists into one combined list, to create the new DataFrame:

new_df = pd.DataFrame({'new_numbers': [df['numbers'].sum()]})

    new_numbers
0   [4, 6, 3, 7, 1, 3, 2, 5, 1, 2, 3]
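`Series.sum()` works here because `+` on two lists concatenates them. The plain-Python counterpart is the built-in `sum`, but it needs an explicit empty-list start value: its default start is `0`, and `0 + [4, 6]` raises a `TypeError`. A small sketch on plain lists standing in for the column:

```python
numbers = [[4, 6], [3, 7, 1, 3], [2, 5], [1, 2, 3]]

# Start from an empty list so the first step is [] + [4, 6]
combined = sum(numbers, [])
print(combined)  # [4, 6, 3, 7, 1, 3, 2, 5, 1, 2, 3]

# Without a start value, sum begins at 0 and fails on 0 + [4, 6]
try:
    sum(numbers)
except TypeError as exc:
    print('TypeError:', exc)
```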
Vaishali

This should do it:

newdf = pd.DataFrame({'numbers': [[x for i in my_df['numbers'] for x in i]]})
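The double `for` in that comprehension reads left to right like nested loops: the outer loop walks the rows (`i` is one row's list) and the inner loop walks the numbers inside it. Written out longhand on plain lists standing in for the column, it is equivalent to:

```python
numbers = [[4, 6], [3, 7, 1, 3], [2, 5], [1, 2, 3]]

# Comprehension form, as in the answer
flat = [x for i in numbers for x in i]

# Equivalent explicit loops, in the same order
flat_loops = []
for i in numbers:        # outer: one row's list
    for x in i:          # inner: each number in that list
        flat_loops.append(x)

print(flat == flat_loops)  # True
```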

Puneet Tripathi

Check this related question: "pandas groupby and join lists".

What you are looking for is:

my_df = my_df.groupby(['name']).agg(sum)

  • Please don't post link-only answers to other stackoverflow questions. Instead, vote/flag to close as duplicate, or, if the question is not a duplicate, *tailor the answer to this specific question.* – Waqar UlHaq Feb 21 '20 at 11:02