78

I've seen a few variations on the theme of exploding a column/series into multiple columns of a Pandas dataframe, but I've been trying to do something and not really succeeding with the existing approaches.

Given a DataFrame like so:

    key       val
id
2   foo   oranges
2   bar   bananas
2   baz    apples
3   foo    grapes
3   bar     kiwis

I want to convert the items in the key series into columns, with the val values serving as the values, like so:

        foo        bar        baz
id
2   oranges    bananas     apples
3    grapes      kiwis        NaN

I feel like this is something that should be relatively straightforward, but I've been bashing my head against this for a few hours now with increasing levels of convolution, and no success.

Idan Gazit
  • 2,280
  • 2
  • 19
  • 24

2 Answers2

122

There are a few ways:

using .pivot_table:

>>> df.pivot_table(values='val', index=df.index, columns='key', aggfunc='first')
key      bar     baz      foo
id                           
2    bananas  apples  oranges
3      kiwis     NaN   grapes

using .pivot:

>>> df.pivot(index=df.index, columns='key')['val']
key      bar     baz      foo
id                           
2    bananas  apples  oranges
3      kiwis     NaN   grapes

using .groupby followed by .unstack:

>>> df.reset_index().groupby(['id', 'key'])['val'].aggregate('first').unstack()
key      bar     baz      foo
id                           
2    bananas  apples  oranges
3      kiwis     NaN   grapes
behzad.nouri
  • 69,003
  • 18
  • 120
  • 118
  • 1
    Wunderbar! I was going down the third path and getting tangled, but I had no idea there was an existing metaphor for this in the whole pivot area. Thank you! – Idan Gazit Oct 08 '14 at 12:03
  • ^Same, I thought this was way more complicated, thanks too! – Servus May 30 '21 at 18:17
  • 1
    @behzad.nouri could you please also extend this where there is an additional column you want to keep around? like lets say there was an additional column C in the original? – Tommy Aug 27 '21 at 01:08
  • @Tommy add the columns you want to keep to the index. For example, df.pivot_table(values='val', index=['colA','colB'], columns='key', aggfunc='first') – Juha Palomäki Jan 16 '22 at 21:46
7

You could use set_index and unstack

In [1923]: df.set_index([df.index, 'key'])['val'].unstack()
Out[1923]:
key      bar     baz      foo
id
2    bananas  apples  oranges
3      kiwis    None   grapes

Or, a simplified groupby

In [1926]: df.groupby([df.index, 'key'])['val'].first().unstack()
Out[1926]:
key      bar     baz      foo
id
2    bananas  apples  oranges
3      kiwis    None   grapes
Zero
  • 66,763
  • 15
  • 141
  • 151