76

Let's say I have a DataFrame that looks like this:

df = pd.DataFrame(index=list('abcde'), data={'A': range(5), 'B': range(5)})
 df
Out[92]: 
   A  B
a  0  0
b  1  1
c  2  2
d  3  3
e  4  4

Assuming that this DataFrame already exists, how can I simply add a level 'C' to the column index so that I get this:

 df
Out[92]: 
   A  B
   C  C
a  0  0
b  1  1
c  2  2
d  3  3
e  4  4

I saw SO answers like "python/pandas: how to combine two dataframes into one with hierarchical column index?", but that one concatenates different DataFrames instead of adding a column level to an already existing DataFrame.

Steven G

6 Answers

118

As suggested by @StevenG himself, a better answer:

df.columns = pd.MultiIndex.from_product([df.columns, ['C']])

print(df)
#    A  B
#    C  C
# a  0  0
# b  1  1
# c  2  2
# d  3  3
# e  4  4
Romain
  • 2
    This is great. I like `pd.MultiIndex.from_product([df.columns, ['C']])`, which is a bit simpler since you don't have to keep track of the `len` of `df.columns`. Would you mind adding it to the answer so I can accept it? – Steven G Oct 24 '16 at 19:31
  • 1
    @StevenG great I did not know this trick. Thanks I have learned something new :-) – Romain Oct 24 '16 at 19:38
  • 22
    Do you have any tips on how to add another level when the original df already has MultiIndex column names? I tried to add a new level with the from_product() method, but I received this error message: 'NotImplementedError: isnull is not defined for MultiIndex'. – Lenka Vraná Sep 15 '17 at 11:39
  • 2
    @LenkaVraná `pd.MultiIndex.from_product(df.columns.levels + [['C']])` – user3556757 Dec 27 '19 at 09:48
  • @user3556757 this unfortunately did not work for me (unhashable type 'index' or 'list') – ElectRocnic Jan 11 '20 at 12:08
  • EDIT: got it with `pd.MultiIndex.from_product([pd.Index(['C'])] + df.columns.levels)` (my order is reversed) (I don't know what went wrong) – ElectRocnic Jan 11 '20 at 12:31
  • 3
    For anyone else: I found that casting the existing columns index to a list before using it in MultiIndex.from_product works around the 'isna not implemented' error. `pd.MultiIndex.from_product([list(df.columns), ['C']])` – Max Jan 20 '20 at 11:15
  • Although you then have to flatten the indices. You could use `pd.concat([df], keys=[], names=[''],axis=1)` for the same result. – Max Jan 20 '20 at 11:41
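For columns that are already a MultiIndex, here is a minimal sketch of one way to append a level, assuming a small two-level example frame. The names `out` and `new_columns` are illustrative only. It rebuilds the column tuples with `from_tuples` rather than `from_product`, because a product over `df.columns.levels` forms every combination of the level values and so only lines up when the existing columns are themselves a full product.

```python
import pandas as pd

# Example frame whose columns are NOT a full Cartesian product of their levels.
df = pd.DataFrame(
    [[0, 1], [2, 3]],
    index=list("ab"),
    columns=pd.MultiIndex.from_tuples([("A", "x"), ("B", "y")]),
)

# Append 'C' to every existing column tuple; existing level names are kept
# and the new level is left unnamed.
new_columns = pd.MultiIndex.from_tuples(
    [(*col, "C") for col in df.columns],
    names=[*df.columns.names, None],
)

out = df.copy()
out.columns = new_columns
print(out)
```

Putting `"C"` first in the tuple instead, as in `("C", *col)`, would add it as the outermost level.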
26

option 1
set_index and T

df.T.set_index(np.repeat('C', df.shape[1]), append=True).T

option 2
pd.concat, keys, and swaplevel

pd.concat([df], axis=1, keys=['C']).swaplevel(0, 1, 1)
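
To make option 2 easier to follow, here is a small sketch on the question's frame that splits it into its two steps. The intermediate names `outer` and `inner` are just for illustration.

```python
import pandas as pd

df = pd.DataFrame(index=list("abcde"), data={"A": range(5), "B": range(5)})

# Step 1: concat with keys adds 'C' as the OUTERMOST column level.
outer = pd.concat([df], axis=1, keys=["C"])

# Step 2: swaplevel moves 'C' below the original column labels.
inner = outer.swaplevel(0, 1, axis=1)

print(inner)
```

Both steps return new objects, so `df` itself keeps its original flat columns.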


piRSquared
  • Thanks, I did not know about swap, and this is convenient. I tested it on a large dataframe to see if it was more efficient than setting `pd.MultiIndex.from_product([df.columns, ['C']])`, and it was about 25% slower. – Steven G Oct 24 '16 at 19:33
  • No surprises! Romain's answer is quicker. I added this because I think it's valuable to know. – piRSquared Oct 24 '16 at 19:34
  • 12
    `pd.concat([df], axis=1, keys=['C'])` worked very well for multilevel columns – Justislav Bogevolnov Mar 05 '18 at 11:25
  • 1
    Option 2 should be the accepted answer for the general case when `df.columns` can be a `pd.MultiIndex`. – Josh Jun 13 '19 at 02:50
  • The `pd.concat` answer is great because it doesn't modify the original df. – BallpointBen Jul 25 '19 at 17:18
  • Always watch out with .T since it can cause some disruption to well-typed columns. In general .T-.T transformations are lossy. Using seaborn, take `df = sns.load_dataset("diamonds")` and compare `df.info()` and `df.T.T.info()`; all columns turn into object and memory usage grows five times! – creanion May 26 '22 at 07:58
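The dtype loss from a `.T`/`.T` round trip can be seen without seaborn. This is a minimal sketch on a hypothetical two-column frame with one integer and one string column, not the diamonds dataset mentioned above.

```python
import pandas as pd

# Hypothetical mixed-dtype frame: one integer column, one string column.
df = pd.DataFrame({"n": [1, 2], "s": ["x", "y"]})

# Transposing mixes ints and strings within each column, so pandas falls
# back to object dtype, and transposing back does not restore the types.
round_trip = df.T.T

print(df.dtypes)
print(round_trip.dtypes)  # 'n' is now object instead of an integer dtype
```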
11

A solution which adds a name to the new level and is easier on the eyes than other answers already presented:

df['newlevel'] = 'C'
df = df.set_index('newlevel', append=True).unstack('newlevel')

print(df)
#           A  B
# newlevel  C  C
# a         0  0
# b         1  1
# c         2  2
# d         3  3
# e         4  4
mbugert
  • 4
    This is short and works also with columns that are already multi-level! As a one liner: `df.assign(newlevel='C').set_index('newlevel', append=True).unstack('newlevel')`. – Michele Piccolini Mar 08 '21 at 14:07
  • If the dataframe has very many rows, this has a per-row cost which is unnecessary – creanion May 26 '22 at 08:07
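If the goal is only a named new level, a possible alternative that avoids touching the rows is to pass `names` to `MultiIndex.from_product`. A rough sketch on the question's frame follows. The variable `named` is illustrative, and the original level is left unnamed here.

```python
import pandas as pd

df = pd.DataFrame(index=list("abcde"), data={"A": range(5), "B": range(5)})

named = df.copy()
# Only the column index is rebuilt; the row data is not reshaped.
named.columns = pd.MultiIndex.from_product(
    [df.columns, ["C"]],
    names=[None, "newlevel"],
)

print(named)
```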
6

Another way, for columns that are already a MultiIndex (here inserting 'E' as a middle level):

df.columns = pd.MultiIndex.from_tuples(map(lambda x: (x[0], 'E', x[1]), df.columns))

   A  B
   E  E
   C  D
a  0  0
b  1  1
c  2  2
d  3  3
e  4  4
Itamar Mushkin
Anton Abrosimov
1

You could just assign the columns like:

>>> df.columns = [df.columns, ['C', 'C']]
>>> df
   A  B
   C  C
a  0  0
b  1  1
c  2  2
d  3  3
e  4  4
>>> 

Or for unknown length of columns:

>>> df.columns = [df.columns.get_level_values(0), np.repeat('C', df.shape[1])]
>>> df
   A  B
   C  C
a  0  0
b  1  1
c  2  2
d  3  3
e  4  4
>>> 
U12-Forward
1

I like it explicit (using MultiIndex) and chain-friendly (.set_axis):

df.set_axis(pd.MultiIndex.from_product([df.columns, ['C']]), axis=1)

This is particularly convenient when merging DataFrames with different column level numbers, where Pandas (1.4.2) raises a FutureWarning (FutureWarning: merging between different levels is deprecated and will be removed ... ):

import pandas as pd

df1 = pd.DataFrame(index=list('abcde'), data={'A': range(5), 'B': range(5)})
df2 = pd.DataFrame(index=list('abcde'), data=range(10, 15), columns=pd.MultiIndex.from_tuples([("C", "x")]))

# df1:
#    A  B
# a  0  0
# b  1  1

# df2:
#     C
#     x
# a  10
# b  11

# merge while giving df1 another column level:
pd.merge(df1.set_axis(pd.MultiIndex.from_product([df1.columns, ['']]), axis=1),
         df2,
         left_index=True, right_index=True)

# result:
#    A  B   C
#           x
# a  0  0  10
# b  1  1  11


mcsoini