How to simply add a column level to a pandas dataframe

Question

Let's say I have a DataFrame that looks like this:

df = pd.DataFrame(index=list('abcde'), data={'A': range(5), 'B': range(5)})
 df
Out[92]: 
   A  B
a  0  0
b  1  1
c  2  2
d  3  3
e  4  4

Assuming that this DataFrame already exists, how can I simply add a level 'C' to the column index so that I get this:

 df
Out[92]: 
   A  B
   C  C
a  0  0
b  1  1
c  2  2
d  3  3
e  4  4

I saw SO answers like this one: "python/pandas: how to combine two dataframes into one with hierarchical column index?", but that concatenates different DataFrames instead of adding a column level to an already existing DataFrame.


Romain · Accepted Answer · 2016-10-24T19:37:03.110

118

As suggested by @StevenG himself, a better answer:

df.columns = pd.MultiIndex.from_product([df.columns, ['C']])

print(df)
#    A  B
#    C  C
# a  0  0
# b  1  1
# c  2  2
# d  3  3
# e  4  4

edited Oct 24 '16 at 19:37

answered Oct 24 '16 at 19:12


This is great. I like `pd.MultiIndex.from_product([df.columns, ['C']])`, which is a bit more trivial since you don't have to keep track of the `len` of `df.columns`. Would you mind adding it to the answer so I can accept it? – Steven G Oct 24 '16 at 19:31

@StevenG Great, I did not know this trick. Thanks, I have learned something new :-) – Romain Oct 24 '16 at 19:38

Do you have any tips on how to add another level when the original df already has MultiIndex columns? I tried to add a new level with the from_product() method, but I received this error message: 'NotImplementedError: isnull is not defined for MultiIndex'. – Lenka Vraná Sep 15 '17 at 11:39

@LenkaVraná `pd.MultiIndex.from_product(df.columns.levels + [['C']])` – user3556757 Dec 27 '19 at 09:48
@user3556757 This unfortunately did not work for me (unhashable type 'index' or 'list') – ElectRocnic Jan 11 '20 at 12:08
EDIT: Got it with `pd.MultiIndex.from_product([pd.Index(['C'])] + df.columns.levels)` (my order is reversed; I don't know what went wrong before) – ElectRocnic Jan 11 '20 at 12:31

For anyone else: I found that casting the existing column index to a list before passing it to MultiIndex.from_product works around the 'isna not implemented' error. `pd.MultiIndex.from_product([list(df.columns), ['C']])` – Max Jan 20 '20 at 11:15
Although you then have to flatten the indices. You could use `pd.concat([df], keys=[], names=[''],axis=1)` for the same result. – Max Jan 20 '20 at 11:41
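
The comments above ask what to do when the columns are already a MultiIndex. Here is a minimal sketch of one way to do it, assuming a made-up two-level frame (the labels 'x' and 'y' and the variable names are illustrative only). It extends each existing column tuple, so only the column combinations that actually exist are kept; a product over `df.columns.levels` would not guarantee that when some combinations are missing.

```python
import pandas as pd

# Hypothetical frame whose columns are already a two-level MultiIndex.
# Note that ('B', 'x') is deliberately absent.
df = pd.DataFrame(
    [[0, 1, 2], [3, 4, 5]],
    index=list("ab"),
    columns=pd.MultiIndex.from_tuples([("A", "x"), ("A", "y"), ("B", "y")]),
)

# Append 'C' as a new innermost level by extending each existing column tuple.
new_columns = pd.MultiIndex.from_tuples([(*col, "C") for col in df.columns])

# set_axis returns a new frame, so the original df keeps its two levels.
result = df.set_axis(new_columns, axis=1)

print(result)
```

Moving `"C"` to the front of the tuple, as in `("C", *col)`, would add it as the outermost level instead.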

score 26 · Answer 2 · answered Oct 24 '16 at 19:19


Option 1
`set_index` and `T`

import numpy as np

df.T.set_index(np.repeat('C', df.shape[1]), append=True).T

Option 2
`pd.concat`, `keys`, and `swaplevel`

pd.concat([df], axis=1, keys=['C']).swaplevel(0, 1, 1)


piRSquared


Thanks, I did not know about swaplevel, and this is convenient. I tested it on a large DataFrame to see if it was more efficient than setting `pd.MultiIndex.from_product([df.columns, ['C']])`, and it was about 25% slower. – Steven G Oct 24 '16 at 19:33
No surprises! Romain's answer is quicker. I added this because I think it's valuable to know. – piRSquared Oct 24 '16 at 19:34
`pd.concat([df], axis=1, keys=['C'])` worked very well for multilevel columns – Justislav Bogevolnov Mar 05 '18 at 11:25
Option 2 should be the accepted answer for the general case when `df.columns` can be a `pd.MultiIndex`. – Josh Jun 13 '19 at 02:50
The `pd.concat` answer is great because it doesn't modify the original df. – BallpointBen Jul 25 '19 at 17:18
Always be careful with .T, since it can disrupt well-typed columns. In general, .T/.T round trips are lossy. Using seaborn, take `df = sns.load_dataset("diamonds")` and compare `df.info()` and `df.T.T.info()`; all columns turn into object and memory usage grows fivefold! – creanion May 26 '22 at 07:58
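
The dtype concern raised in the last comment can be checked on a small frame without downloading a dataset. The following is a rough sketch, assuming a made-up three-column frame with mixed types (the column names `i`, `f`, `s` are illustrative only); it compares the column dtypes produced by the two options from this answer.

```python
import numpy as np
import pandas as pd

# Small hypothetical frame with three differently typed columns.
df = pd.DataFrame({"i": [1, 2], "f": [1.5, 2.5], "s": ["x", "y"]}, index=list("ab"))

# Option 1 style: the round trip through .T pushes every value through one mixed-type block.
via_transpose = df.T.set_index(np.repeat("C", df.shape[1]), append=True).T

# Option 2 style: concat with keys only changes the column labels.
via_concat = pd.concat([df], axis=1, keys=["C"]).swaplevel(0, 1, axis=1)

print(via_transpose.dtypes)  # every column ends up as object
print(via_concat.dtypes)     # the original column dtypes are kept
```

Both results carry the same column labels, so the difference shows up only in the dtypes (and, on larger frames, in memory use).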

score 11 · Answer 3 · answered Sep 10 '20 at 12:48


A solution which adds a name to the new level and is easier on the eyes than other answers already presented:

df['newlevel'] = 'C'
df = df.set_index('newlevel', append=True).unstack('newlevel')

print(df)
#           A  B
# newlevel  C  C
# a         0  0
# b         1  1
# c         2  2
# d         3  3
# e         4  4


mbugert


This is short and works also with columns that are already multi-level! As a one liner: `df.assign(newlevel='C').set_index('newlevel', append=True).unstack('newlevel')`. – Michele Piccolini Mar 08 '21 at 14:07
If the DataFrame has very many rows, this incurs an unnecessary per-row cost. – creanion May 26 '22 at 08:07
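
If the goal is mainly a named new level, one possible alternative that works only on the column labels (and so avoids the per-row cost mentioned above) is to pass `names` to `pd.MultiIndex.from_product`. This is a sketch under the assumption that the columns are a flat index, as in the question; the level name `newlevel` is taken from this answer.

```python
import pandas as pd

df = pd.DataFrame(index=list("abcde"), data={"A": range(5), "B": range(5)})

# Name only the new level; the existing level stays unnamed (None).
named = pd.MultiIndex.from_product([df.columns, ["C"]], names=[None, "newlevel"])

# set_axis returns a new frame and leaves df itself unchanged.
result = df.set_axis(named, axis=1)

print(result)
```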

score 6 · Answer 4 · edited May 24 '20 at 14:08


Another way, for columns that are already a MultiIndex (inserting 'E' as the middle level):

df.columns = pd.MultiIndex.from_tuples(map(lambda x: (x[0], 'E', x[1]), df.columns))

   A  B
   E  E
   C  D
a  0  0
b  1  1
c  2  2
d  3  3
e  4  4


Itamar Mushkin


answered Nov 21 '19 at 08:10

Anton Abrosimov


A shorter version: `df.columns = pd.MultiIndex.from_tuples([(c[0], 'E', c[1]) for c in df.columns])` – Itamar Mushkin May 24 '20 at 14:14
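
The tuple-based approach above hard-codes two existing levels. A possible generalization is sketched below, assuming a hypothetical helper named `insert_column_level` (it is not part of pandas) that splits the columns into one array per level and inserts a constant array at the requested position. It should work for flat and MultiIndex columns alike.

```python
import pandas as pd


def insert_column_level(df, label, position, name=None):
    """Hypothetical helper: return a new frame with `label` inserted as a column level."""
    # One array of labels per existing column level.
    arrays = [df.columns.get_level_values(i) for i in range(df.columns.nlevels)]
    names = list(df.columns.names)
    # Insert a constant array (and its level name) at the requested position.
    arrays.insert(position, [label] * len(df.columns))
    names.insert(position, name)
    return df.set_axis(pd.MultiIndex.from_arrays(arrays, names=names), axis=1)


flat = pd.DataFrame(index=list("abcde"), data={"A": range(5), "B": range(5)})
nested = flat.set_axis(pd.MultiIndex.from_tuples([("A", "C"), ("B", "D")]), axis=1)

print(insert_column_level(flat, "C", 1))    # new bottom level on flat columns
print(insert_column_level(nested, "E", 1))  # 'E' between the two existing levels
```

Because the helper goes through `set_axis`, the input frame keeps its original columns.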

score 1 · Answer 5 · answered Sep 20 '21 at 08:16

You could just assign the columns like:

>>> df.columns = [df.columns, ['C', 'C']]
>>> df
   A  B
   C  C
a  0  0
b  1  1
c  2  2
d  3  3
e  4  4
>>>

Or, for an unknown number of columns:

>>> df.columns = [df.columns.get_level_values(0), np.repeat('C', df.shape[1])]
>>> df
   A  B
   C  C
a  0  0
b  1  1
c  2  2
d  3  3
e  4  4
>>>

mcsoini · Answer 6 · 2022-05-26T07:55:47.860

I like it explicit (using MultiIndex) and chain-friendly (.set_axis):

df.set_axis(pd.MultiIndex.from_product([df.columns, ['C']]), axis=1)

This is particularly convenient when merging DataFrames with different numbers of column levels, where pandas (1.4.2) raises a FutureWarning (FutureWarning: merging between different levels is deprecated and will be removed ... ):

import pandas as pd

df1 = pd.DataFrame(index=list('abcde'), data={'A': range(5), 'B': range(5)})
df2 = pd.DataFrame(index=list('abcde'), data=range(10, 15), columns=pd.MultiIndex.from_tuples([("C", "x")]))

# df1:
   A  B
a  0  0
b  1  1

# df2:
    C
    x
a  10
b  11

# merge while giving df1 another column level:
pd.merge(df1.set_axis(pd.MultiIndex.from_product([df1.columns, ['']]), axis=1),
         df2, 
         left_index=True, right_index=True)

# result:
   A  B   C
          x
a  0  0  10
b  1  1  11
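
When the number of levels on the other frame is not known in advance, the padding step could be wrapped in a small function. This is only a sketch: `pad_column_levels` is a hypothetical helper (not a pandas function) that appends empty-string levels until the requested depth is reached, after which the merge proceeds as in the example above.

```python
import pandas as pd


def pad_column_levels(df, nlevels, fill=""):
    """Hypothetical helper: append `fill` levels until the columns have `nlevels` levels."""
    if df.columns.nlevels >= nlevels:
        return df  # already deep enough, nothing to pad
    arrays = [df.columns.get_level_values(i) for i in range(df.columns.nlevels)]
    while len(arrays) < nlevels:
        arrays.append([fill] * len(df.columns))
    return df.set_axis(pd.MultiIndex.from_arrays(arrays), axis=1)


df1 = pd.DataFrame(index=list("abcde"), data={"A": range(5), "B": range(5)})
df2 = pd.DataFrame(
    index=list("abcde"),
    data=range(10, 15),
    columns=pd.MultiIndex.from_tuples([("C", "x")]),
)

# Bring df1 to the same column depth as df2, then merge on the index.
merged = pd.merge(
    pad_column_levels(df1, df2.columns.nlevels),
    df2,
    left_index=True,
    right_index=True,
)
print(merged)
```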
