Pandas From list to multiple columns as Multilevel columns

Question

In my dataframe I have a column named authors. Each cell in this column contains a list of elements. What I want to do is split each list into multiple columns.

The reason for this is to make it easy to use groupby() and other pandas analysis methods. In particular, my next goal is to see which author has the most publications in my dataset and which author has published most in which journals.
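For the counting goals just described, a wide layout is not strictly required. The sketch below uses a different technique, `DataFrame.explode` (available since pandas 0.25), which produces one row per author/publication pair that can be grouped directly. The sample frame is rebuilt from the tables shown in this question, and the elided journal name is kept exactly as given.

```python
import pandas as pd

# Sample data rebuilt from the tables in the question
df = pd.DataFrame({
    "authors": [
        ["Savola", "Petri Heinonen", "Miller"],
        ["Mariana Gerber", "Rossouw von Solms"],
        ["Cyril Onwubiko"],
    ],
    "journal": ["2011 Information...", "Some Journal", "Some other Journal"],
})

# explode() turns each list element into its own row; the original row
# index is repeated and the "journal" value is copied to every new row
long_df = df.explode("authors")

# Publications per author, most prolific first
pubs_per_author = long_df["authors"].value_counts()

# Publications per (author, journal) combination
pubs_per_author_journal = long_df.groupby(["authors", "journal"]).size()

print(pubs_per_author)
print(pubs_per_author_journal)
```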

What I have:

    authors                                 journal
0   ['Savola', 'Petri Heinonen', 'Miller']  2011 Information...
1   ['Mariana Gerber', 'Rossouw von Solms'] Some Journal
2   ['Cyril Onwubiko']                      Some other Journal

What I want:

    authors                                          journal
    0                  1                   2
0   'Savola'           'Petri Heinonen'    'Miller'  '2011 Information...'
1   'Mariana Gerber'   'Rossouw von Solms' NaN       'Some Journal'
2   'Cyril Onwubiko'   NaN                 NaN       'Some other Journal'

What I've tried so far is creating a new dataframe from the authors column:

    df2 = df["authors"].apply(pd.Series)
    df2

But I can't figure out how to insert this dataframe into my original dataframe.

How do I get this new df2 as subcolumns into my original dataframe?
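A minimal sketch of one way to do this with the df2 from the attempt above, assuming the same sample data as the tables: pass both pieces to `pd.concat` with `keys`, which adds an outer column level. One quirk of this approach is that the journal column ends up labelled `("journal", "journal")` rather than having a blank second level.

```python
import pandas as pd

df = pd.DataFrame({
    "authors": [
        ["Savola", "Petri Heinonen", "Miller"],
        ["Mariana Gerber", "Rossouw von Solms"],
        ["Cyril Onwubiko"],
    ],
    "journal": ["2011 Information...", "Some Journal", "Some other Journal"],
})

# The attempt from the question: one column per list position, NaN-padded
df2 = df["authors"].apply(pd.Series)

# Place df2 and the remaining column side by side; `keys` becomes the
# outer level of the resulting MultiIndex columns
out = pd.concat([df2, df[["journal"]]], axis=1, keys=["authors", "journal"])

print(out)
# All author columns can then be selected at once via the outer level
print(out["authors"])
```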

`df.join(df.pop('authors').apply(pd.Series))` if you want to keep going with your original approach, although there are faster alternatives — user3483203, Oct 08 '19 at 15:12
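The comment's one-liner, split into steps as a sketch on the same sample data. Note that `pop` mutates `df` in place, and the new columns are flat integers `0, 1, 2` placed after `journal`, with no "authors" label above them.

```python
import pandas as pd

df = pd.DataFrame({
    "authors": [
        ["Savola", "Petri Heinonen", "Miller"],
        ["Mariana Gerber", "Rossouw von Solms"],
        ["Cyril Onwubiko"],
    ],
    "journal": ["2011 Information...", "Some Journal", "Some other Journal"],
})

# pop() removes the "authors" column from df in place and returns it
authors = df.pop("authors")

# Expand each list into columns 0..n-1 and join them back on the row index;
# this is equivalent to df.join(df.pop("authors").apply(pd.Series))
result = df.join(authors.apply(pd.Series))

print(result)
```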
What are those faster alternatives? I am interested :O. Also, is there a way to still put the "authors" label above the 0, 1, 2, etc. column names generated with your approach? — Martin Müsli, Oct 08 '19 at 15:23
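A hedged sketch addressing both questions. A commonly suggested faster alternative (not necessarily the one the commenter had in mind) is to build the author frame directly from the Python lists with `pd.DataFrame(...tolist())`, which avoids creating one `pd.Series` per row. To keep an "authors" label above 0, 1, 2, both pieces get two-level column labels before concatenating; the empty string used as the second level under journal is an arbitrary choice.

```python
import pandas as pd

df = pd.DataFrame({
    "authors": [
        ["Savola", "Petri Heinonen", "Miller"],
        ["Mariana Gerber", "Rossouw von Solms"],
        ["Cyril Onwubiko"],
    ],
    "journal": ["2011 Information...", "Some Journal", "Some other Journal"],
})

# Wide author frame built from the lists directly; shorter lists are
# padded with missing values
authors = pd.DataFrame(df["authors"].tolist(), index=df.index)

# Put "authors" above the 0, 1, 2 ... position labels
authors.columns = pd.MultiIndex.from_product([["authors"], authors.columns])

# Give the remaining columns an empty second level so both frames have two levels
rest = df.drop(columns="authors")
rest.columns = pd.MultiIndex.from_product([rest.columns, [""]])

out = pd.concat([authors, rest], axis=1)
print(out)
```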
You would need to define 'subcolumns', but something like this should do the trick: ```pd.concat([df.apply(lambda x: pd.Series({"author{}".format(i): x.authors[i] for i in range(len(x.authors))}), axis=1), df], axis=1)``` — Grzegorz Skibinski, Oct 08 '19 at 15:43
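If flat names like the comment's `author0`, `author1`, ... are preferred over a second header level, a shorter sketch using `DataFrame.add_prefix` gives the same naming. Unlike the comment's version, it drops the original list column instead of keeping it next to the new ones.

```python
import pandas as pd

df = pd.DataFrame({
    "authors": [
        ["Savola", "Petri Heinonen", "Miller"],
        ["Mariana Gerber", "Rossouw von Solms"],
        ["Cyril Onwubiko"],
    ],
    "journal": ["2011 Information...", "Some Journal", "Some other Journal"],
})

# One column per list position, renamed 0 -> "author0", 1 -> "author1", ...
authors = pd.DataFrame(df["authors"].tolist(), index=df.index).add_prefix("author")

# Put the new columns in front of the remaining original columns
result = pd.concat([authors, df.drop(columns="authors")], axis=1)
print(result)
```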
