Renaming column names in Pandas

Question

How do I change the column labels of a pandas DataFrame from:

['$a', '$b', '$c', '$d', '$e']

to

['a', 'b', 'c', 'd', 'e'].

You might want to go check out the official docs which cover renaming column labels: https://pandas.pydata.org/pandas-docs/stable/user_guide/text.html — ccpizza, Dec 19 '19 at 07:05

score 3898 · Answer 1 · edited Jun 20 '20 at 09:12

RENAME SPECIFIC COLUMNS

Use the df.rename() function and refer to the columns to be renamed. Not all the columns have to be renamed:

df = df.rename(columns={'oldName1': 'newName1', 'oldName2': 'newName2'})
# Or rename the existing DataFrame (rather than creating a copy) 
df.rename(columns={'oldName1': 'newName1', 'oldName2': 'newName2'}, inplace=True)

Minimal Code Example

df = pd.DataFrame('x', index=range(3), columns=list('abcde'))
df

   a  b  c  d  e
0  x  x  x  x  x
1  x  x  x  x  x
2  x  x  x  x  x

The following methods all work and produce the same output:

df2 = df.rename({'a': 'X', 'b': 'Y'}, axis=1)  # new method
df2 = df.rename({'a': 'X', 'b': 'Y'}, axis='columns')
df2 = df.rename(columns={'a': 'X', 'b': 'Y'})  # old method  

df2

   X  Y  c  d  e
0  x  x  x  x  x
1  x  x  x  x  x
2  x  x  x  x  x

Remember to assign the result back, as the modification is not in place. Alternatively, specify inplace=True:

df.rename({'a': 'X', 'b': 'Y'}, axis=1, inplace=True)
df

   X  Y  c  d  e
0  x  x  x  x  x
1  x  x  x  x  x
2  x  x  x  x  x

From v0.25, you can also specify errors='raise' to raise errors if an invalid column-to-rename is specified. See v0.25 rename() docs.
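A minimal sketch of what `errors='raise'` changes, assuming pandas 0.25 or later. By default, a mapping key that is not an existing column is silently ignored; with `errors='raise'` the same mapping raises a `KeyError`. The key `'typo'` is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame('x', index=range(3), columns=list('abcde'))

# Default (errors='ignore'): the unknown key 'typo' is silently skipped.
quiet = df.rename(columns={'a': 'X', 'typo': 'Y'})
print(list(quiet.columns))  # ['X', 'b', 'c', 'd', 'e']

# errors='raise': the same mapping now fails loudly.
raised = False
try:
    df.rename(columns={'a': 'X', 'typo': 'Y'}, errors='raise')
except KeyError as exc:
    raised = True
    print('KeyError:', exc)
```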

REASSIGN COLUMN HEADERS

Use df.set_axis() with axis=1 and inplace=False (to return a copy).

df2 = df.set_axis(['V', 'W', 'X', 'Y', 'Z'], axis=1, inplace=False)
df2

   V  W  X  Y  Z
0  x  x  x  x  x
1  x  x  x  x  x
2  x  x  x  x  x

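This returns a copy. In older versions you could modify the DataFrame in place by setting inplace=True (in-place was originally the default). pandas 1.0 made inplace=False the default, and pandas 2.0 removed the inplace argument from set_axis altogether, so on current versions simply leave it out.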
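A version-neutral sketch, assuming pandas 1.0 or later: `set_axis` returns a relabelled copy by default, so the call below omits `inplace` entirely and reassigns the result.

```python
import pandas as pd

df = pd.DataFrame('x', index=range(3), columns=list('abcde'))

# No inplace argument: set_axis returns a new DataFrame with the new labels.
df2 = df.set_axis(['V', 'W', 'X', 'Y', 'Z'], axis=1)

print(list(df2.columns))  # ['V', 'W', 'X', 'Y', 'Z']
print(list(df.columns))   # ['a', 'b', 'c', 'd', 'e'] (original untouched)
```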

You can also assign headers directly:

df.columns = ['V', 'W', 'X', 'Y', 'Z']
df

   V  W  X  Y  Z
0  x  x  x  x  x
1  x  x  x  x  x
2  x  x  x  x  x

edited by Community · answered Jul 06 '12 at 01:48 by lexual

When I do this with a 6-column DataFrame (dataframe), the abbreviated representation `Int64Index: 1000 entries, 0 to 999 Data columns: BodyMarkdown 1000 non-null` works, but when I do dataframe.head() the old names for the columns reappear. – darKoram Sep 10 '12 at 22:39
I get the dreaded `SettingWithCopyWarning:` when I use the second code snippet in this answer. – Monica Heddneck Aug 18 '16 at 19:47
is there a version of this with regex replacement? – denfromufa Nov 10 '16 at 17:33
@lexual What if two existing columns have the same name ? How do I refer to the old column name? – vagabond Jan 09 '17 at 22:40
The first solution : `df = df.rename(columns={'oldName1': 'newName1', 'oldName2': 'newName2'})` changes the name displayed, but _not_ elements in the underlying data structure. So if you try `df['newName1']` you'll get an error. The `inplace=True` is necessary to avoid that gotcha. – irritable_phd_syndrome Jul 14 '17 at 13:24
Interesting moment: would using `inplace` option be faster than reassigning (`df = df.rename(...)`) ? – Mikhail_Sam Dec 27 '17 at 12:04
@MonicaHeddneck The reason for getting the SettingWithCopyWarning is because you subsetted your DataFrame improperly from a larger DataFrame and then tried to modify it in-place. I've explained the reason in-depth in [my answer here](https://stackoverflow.com/a/53954986/4909087) (specifically, scroll down to the section on XY problems and take a look at "Question 4"). – cs95 May 25 '19 at 03:44
Important note here: the rename solution fails if *for some reason you have multiple columns with the same name*. Ideally you shouldn't ever have that scenario, but I found overwriting the whole list of column names works best there – Vince Sep 07 '20 at 05:29
If columns are not strings, doing `df.columns = ['V', 'W', 'X', 'Y', 'Z']` seems to be the only working approach. Doing `.unstack().reset_index()` will result in columns as tuples. – Jari Turkia Aug 26 '21 at 11:14
The method works, however it shows the warning: how to avoid it? C:\Users\hp\Anaconda3\envs\geocube\lib\site-packages\pandas\core\frame.py:4300: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy return super().rename( – Abhilash Singh Chauhan Sep 13 '21 at 06:14
`df = df.copy().rename(columns={ 'old': 'new_name'})` to avoid the SettingWithCopyWarning: A value is trying to be set on a copy <== odd English. So first make a copy of the entire dataframe, do the rename, then assign it, overwriting the original entirely I presume. – gseattle Jan 07 '22 at 12:16
`inplace` will be deprecated, probably: https://www.dataschool.io/future-of-pandas/#inplace – PatrickT Jan 08 '22 at 07:01

score 2420 · Accepted Answer · edited Dec 12 '20 at 17:30

Just assign it to the .columns attribute:

>>> df = pd.DataFrame({'$a':[1,2], '$b': [10,20]})
>>> df
   $a  $b
0   1  10
1   2  20

>>> df.columns = ['a', 'b']
>>> df
   a   b
0  1  10
1  2  20
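To change just one header with this approach, one safe pattern (a sketch, not the only way) is to copy the labels into a plain list, edit the list, and assign it back. `df.columns` itself is an immutable Index, so item assignment on it raises a `TypeError`, and mutating `df.columns.values` in place is best avoided.

```python
import pandas as pd

df = pd.DataFrame({'$a': [1, 2], '$b': [10, 20]})

# Edit a list copy of the labels, then assign the whole list back.
cols = df.columns.tolist()
cols[0] = 'a'          # change only the first header
df.columns = cols

print(list(df.columns))  # ['a', '$b']
```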

edited by Kostas Minaidis · answered Jul 05 '12 at 14:23 by eumiro

Is it possible to change a single column header name? – ericmjl Jun 26 '13 at 17:55
@ericmjl: suppose you want to change the name of the first variable of df. Then you can do something like: `new_columns = df.columns.values; ` `new_columns[0] = 'XX';` `df.columns = new_columns` – cd98 Nov 20 '13 at 14:18
Looks like you could've simply done df.columns.values[0]='XX' – RAY Mar 10 '14 at 07:22
Just kidding, @RAY - don't do that. Looks like that's a list generated independent of whatever indexing stores the column name. Does a nice job destroying column naming for your df... – Mitch Flax Mar 11 '14 at 18:42
@ericmjl yes `df.rename(columns = {'$b':'B'}, inplace = True)` – nachocab Sep 11 '15 at 22:30
@RAY I don't see it. `df.columns[0]` and `df.columns.values[0]` are the same object. – ilyas patanam Jan 31 '16 at 20:31
Pandas can't be modified so that `columns[1] = 'foo'` works? – endolith Jul 04 '16 at 02:26
This approach requires you to refer to every existing column name. Not very practical when dealing with DataFrames with a lot of columns – alfredocambera Dec 09 '16 at 21:52
This approach is fragile. See instead [@lexual's answer](https://stackoverflow.com/a/11354850/1165940) and others, below. Pandas provides the `rename` method for a reason. – Andrew Oct 01 '17 at 15:59
This only works when you specify all column names after the change, even the ones you did not want to modify. @lexual's answer is better. – Ryszard Cetnarski Sep 18 '18 at 13:10
This is very fragile; it requires specifying all columns even when you only want to change a handful, and also is not tied to the column semantics, but rather to their (immaterial) position. You definitely should use `rename` and provide a mapping for columns you want to change. In this case, `{c: c.lstrip('$') for c in df.columns}` would be much better. – BallpointBen Dec 12 '18 at 01:47
@cd98, why not just `new_columns = df.columns;` instead of `new_columns = df.columns.values;`? – Gathide May 11 '20 at 11:07
@nachocab: `inplace=True` will be deprecated, so we'd better get used to not using it. – PatrickT Jan 08 '22 at 06:58

score 480 · Answer 3 · edited Oct 20 '19 at 22:06

The rename method can take a function, for example:

In [11]: df.columns
Out[11]: Index([u'$a', u'$b', u'$c', u'$d', u'$e'], dtype=object)

In [12]: df.rename(columns=lambda x: x[1:], inplace=True)

In [13]: df.columns
Out[13]: Index([u'a', u'b', u'c', u'd', u'e'], dtype=object)
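One caveat with `x[1:]` is that it drops the first character of every label, including labels that never had a `$`. A slightly more defensive sketch passes a function that strips only leading `$` characters; the sample frame with an unprefixed `'c'` column is made up.

```python
import pandas as pd

df = pd.DataFrame(columns=['$a', '$b', 'c'])

# rename calls the function once per column label;
# lstrip('$') removes leading '$' characters and leaves 'c' alone.
df = df.rename(columns=lambda x: x.lstrip('$'))

print(list(df.columns))  # ['a', 'b', 'c']
```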

edited by smci · answered May 21 '13 at 09:58 by Andy Hayden

Nice. This one saved my day: `df.rename(columns=lambda x: x.lstrip(), inplace=True)` – root-11 Oct 21 '13 at 22:05
Similar to @root-11 -- in my case there was a bullet point character that was not printed in IPython console output, so I needed to remove more than just whitespace (strip), so: `t.columns = t.columns.str.replace(r'[^\x00-\x7F]+','')` – The Red Pea Nov 05 '15 at 06:30
`df.rename(columns=lambda x: x.replace(' ', '_'), inplace=True)` is a gem so that we can write `df.Column_1_Name` instead of writing`df.loc[:, 'Column 1 Name']` . – Little Bobby Tables Dec 16 '16 at 15:40
How is this not the preferred solution? Only this allows processing a large number of feature names, for instance to allow for dot notation by removing/replacing spaces in the labels, as demonstrated by @LittleBobbyTables – error404 Feb 17 '22 at 13:52

score 243 · Answer 4 · edited Feb 13 '21 at 05:21

As documented in Working with text data:

df.columns = df.columns.str.replace('$', '')
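`$` is also the regular-expression end-of-string anchor, and older pandas versions defaulted to `regex=True` for `str.replace` (pandas 2.0 switched the default to `False`). Passing `regex=False` explicitly, as in this sketch, is a cheap way to make the literal intent unambiguous across versions.

```python
import pandas as pd

df = pd.DataFrame({'$a': [1, 2], '$b': [3, 4]})

# regex=False: treat '$' as a literal character, not an end-of-string anchor.
df.columns = df.columns.str.replace('$', '', regex=False)

print(list(df.columns))  # ['a', 'b']
```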

edited by Peter Mortensen · answered May 30 '15 at 13:24 by kadee

score 188 · Answer 5 · edited Nov 17 '17 at 19:31

Pandas 0.21+ Answer

There have been some significant updates to column renaming in version 0.21.

The rename method has added the axis parameter which may be set to columns or 1. This update makes this method match the rest of the pandas API. It still has the index and columns parameters but you are no longer forced to use them.
The set_axis method with the inplace set to False enables you to rename all the index or column labels with a list.

Examples for Pandas 0.21+

Construct sample DataFrame:

df = pd.DataFrame({'$a':[1,2], '$b': [3,4], 
                   '$c':[5,6], '$d':[7,8], 
                   '$e':[9,10]})

   $a  $b  $c  $d  $e
0   1   3   5   7   9
1   2   4   6   8  10

Using `rename` with `axis='columns'` or `axis=1`

df.rename({'$a':'a', '$b':'b', '$c':'c', '$d':'d', '$e':'e'}, axis='columns')

or

df.rename({'$a':'a', '$b':'b', '$c':'c', '$d':'d', '$e':'e'}, axis=1)

Both result in the following:

   a  b  c  d   e
0  1  3  5  7   9
1  2  4  6  8  10

It is still possible to use the old method signature:

df.rename(columns={'$a':'a', '$b':'b', '$c':'c', '$d':'d', '$e':'e'})

The rename function also accepts functions that will be applied to each column name.

df.rename(lambda x: x[1:], axis='columns')

or

df.rename(lambda x: x[1:], axis=1)

Using `set_axis` with a list and `inplace=False`

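You can supply a list to the set_axis method that is equal in length to the number of columns (or index). At the time of pandas 0.21, inplace defaulted to True, with a planned switch to False. That switch happened in pandas 1.0, and pandas 2.0 removed the inplace argument from set_axis entirely, so on current versions just leave it out.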

df.set_axis(['a', 'b', 'c', 'd', 'e'], axis='columns', inplace=False)

or

df.set_axis(['a', 'b', 'c', 'd', 'e'], axis=1, inplace=False)

Why not use `df.columns = ['a', 'b', 'c', 'd', 'e']`?

There is nothing wrong with assigning columns directly like this. It is a perfectly good solution.

The advantage of using set_axis is that it can be used as part of a method chain and that it returns a new copy of the DataFrame. Without it, you would have to store your intermediate steps of the chain to another variable before reassigning the columns.

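# new for pandas 0.21+ (some_method1 etc. are placeholders;
# on pandas 0.21-0.25 pass inplace=False to set_axis as well)
(df.some_method1()
   .some_method2()
   .set_axis(columns, axis=1)
   .some_method3())

# old way
df1 = df.some_method1().some_method2()
df1.columns = columns
df1.some_method3()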
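A concrete, runnable version of the chaining idea, assuming pandas 1.0 or later (where `set_axis` returns a copy without `inplace=False`). The particular steps around it (`dropna`, `assign`) are just illustrative choices.

```python
import pandas as pd

df = pd.DataFrame({'$a': [1, None, 3], '$b': [4, 5, 6]})

result = (
    df.dropna()                                  # any step returning a new frame
      .set_axis(['a', 'b'], axis=1)              # relabel columns mid-chain
      .assign(total=lambda d: d['a'] + d['b'])   # later steps use the new names
)

print(result)
```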

Thank you for the `Pandas 0.21+ answer` - somehow i missed that part in the "what's new" part... — MaxU - stop genocide of UA, Nov 22 '17 at 13:27
The solution does not seem to work for Pandas 3.6: df.rename({'$a':'a', '$b':'b', '$c':'c', '$d':'d', '$e':'e'}, axis='columns'). Gets an unexpected keyword argument "axis" — Arthur D. Howland, Apr 04 '18 at 18:43
df.columns = ['a', 'b', 'c', 'd', 'e'] seems not to work anymore, working with version 0.22 I have a warning saying *Pandas doesn't allow columns to be created via a new attribute name* . how to rename if all my columns are called the same :/ — Nabla, Apr 13 '18 at 02:40
Is there a way to rename one, multiple or all columns, if you don't know the name of the column(s) beforehand but just their index? Thanks! — tommy.carstensen, Aug 17 '18 at 12:19
this was a very helpful comment. for example, the lambda function answered my question of how to do the following: `(df .groupby(['page',pd.Grouper(key='date',freq='MS')])['clicks'].sum() .unstack(1) .rename(lambda x: x.strftime("%Y-%m"), axis='columns') )` — measureallthethings, Dec 07 '18 at 18:32

score 148 · Answer 6 · answered Mar 26 '14 at 10:20

Since you only want to remove the $ sign in all column names, you could just do:

df = df.rename(columns=lambda x: x.replace('$', ''))

OR

df.rename(columns=lambda x: x.replace('$', ''), inplace=True)

answered by paulo.filip3

This one not only helps in OP's case but also in generic requirements. E.g.: to split a column name by a separator and use one part of it. – Deepak Nov 20 '18 at 09:24

score 128 · Answer 7 · edited Feb 13 '21 at 06:01

Renaming columns in Pandas is an easy task.

df.rename(columns={'$a': 'a', '$b': 'b', '$c': 'c', '$d': 'd', '$e': 'e'}, inplace=True)

edited by Peter Mortensen · answered May 08 '20 at 12:34 by Nirali Khoda

I will upvote this since it is naturally supported. – lkahtz Feb 10 '21 at 16:15
much better than approved solution – slisnychyi May 24 '21 at 10:31
The `columns` arg here can also be a function. So if you want to remove the first char from each name you can do `df.rename(columns=lambda name: name[1:], inplace=True)` ([ref](https://pandas.pydata.org/docs/user_guide/basics.html#basics-rename)) – aschmied Sep 06 '21 at 17:50
It's very natural. You can do it for arbitrary columns. It should be an accepted answer. – Shaida Muhammad Nov 04 '21 at 05:33
You can also give labels to unlabelled (integer-named) columns using this method: df.rename(columns={0: "x", 1: "y", 2: "z"}) – ZakS Feb 09 '22 at 12:18

score 90 · Answer 8 · edited Oct 12 '18 at 05:45

df.columns = ['a', 'b', 'c', 'd', 'e']

It will replace the existing names with the names you provide, in the order you provide.

edited by Mike_K · answered Mar 22 '16 at 08:59 by M PAUL

Do not modify `df.columns.values`, that's wrong. https://stackoverflow.com/questions/43291781/after-rename-column-get-keyerror – llllllllll May 17 '18 at 08:53
This is exactly what I was looking for! Thanks! – RAM237 Feb 01 '21 at 11:41

score 76 · Answer 9 · edited Feb 13 '21 at 05:20

Use:

old_names = ['$a', '$b', '$c', '$d', '$e'] 
new_names = ['a', 'b', 'c', 'd', 'e']
df.rename(columns=dict(zip(old_names, new_names)), inplace=True)

This way you can manually edit the new_names as you wish. It works great when you need to rename only a few columns to correct misspellings, accents, remove special characters, etc.

edited by Peter Mortensen · answered May 21 '15 at 17:48 by migloo

I like this approach, but I think `df.columns = ['a', 'b', 'c', 'd', 'e']` is simpler. – Christopher Pearson Jun 22 '15 at 22:05
I like this method of zipping old and new names. We can use `df.columns.values` to get the old names. – bkowshik Jul 20 '15 at 07:18
I display the tabular view and copy the columns to old_names. I copy the requirement array to new_names. Then use dict(zip(old_names, new_names)) Very elegant solution. – mythicalcoder Oct 27 '16 at 13:59
I often use subsets of lists from something like: `myList = list(df) myList[10:20]` , etc - so this is perfect. – Tim Gottgetreu Jul 12 '17 at 23:12
Best to take the old names as @bkowshik suggested, then edit them and re-insert them, i.e. `namez = df.columns.values` followed by some edits, then `df.columns = namez`. – pauljohn32 Jan 17 '20 at 18:27

score 41 · Answer 10 · edited Feb 13 '21 at 05:35

Column names vs Names of Series

I would like to explain a bit what happens behind the scenes.

A DataFrame is a collection of Series, one per column.

Each Series wraps a one-dimensional array (usually a NumPy array) together with an index.

Unlike a plain numpy.ndarray, a Series also has a .name attribute.

This is the name of the series. Pandas seldom relies on this attribute, but it lingers in places and can be used to hack some Pandas behaviors.

Naming the list of columns

A lot of answers here talk about the df.columns attribute as if it were a list, when in fact it is a pandas Index. Like a Series, an Index has a .name attribute (and .names, with one entry per level).

This is what happens if you decide to fill in the name of the columns Index:

df.columns = ['column_one', 'column_two']
df.columns.names = ['name of the list of columns']
df.index.names = ['name of the index']

name of the list of columns     column_one  column_two
name of the index
0                                    4           1
1                                    5           2
2                                    6           3

Note that the name of the index always appears one row lower, below the column names.
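The data shown in the output above isn't built in the snippet, so here is a self-contained sketch that assumes those values. It also shows `rename_axis`, which in recent pandas versions sets both axis names in a chainable way instead of mutating `.name`/`.names`.

```python
import pandas as pd

data = {'column_one': [4, 5, 6], 'column_two': [1, 2, 3]}

# Attribute assignment, as in the answer (.name is the single-level form of .names).
df = pd.DataFrame(data)
df.columns.name = 'name of the list of columns'
df.index.name = 'name of the index'

# Chainable alternative: rename_axis returns a new frame instead of mutating one.
df2 = pd.DataFrame(data).rename_axis(
    index='name of the index', columns='name of the list of columns'
)

print(df2)
```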

Artefacts that linger

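The .name attribute sometimes lingers. If you set df.columns = ['one', 'two'] then df.one.name will be 'one'.

If you set df.one.name = 'three' then df.columns will still give you ['one', 'two'], and df.one.name will give you 'three'.

BUT

pd.DataFrame(df.one) will return a DataFrame whose single column is named 'three', not 'one',

because Pandas reuses the .name of the already defined Series. (This relies on df.one returning the same cached Series object each time, which newer pandas versions may no longer do.)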

Multi-level column names

Pandas has ways of doing multi-layered column names. There is not so much magic involved, but I wanted to cover this in my answer too since I don't see anyone picking up on this here.

    |one            |
    |one      |two  |
0   |  4      |  1  |
1   |  5      |  2  |
2   |  6      |  3  |

This is easily achievable by setting columns to lists, like this:

df.columns = [['one', 'one'], ['one', 'two']]
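Under the hood the list of lists becomes a `pd.MultiIndex` with one array per level. Spelling that out with `pd.MultiIndex.from_arrays` makes the intent explicit; the data values below are assumed from the table above.

```python
import pandas as pd

df = pd.DataFrame([[4, 1], [5, 2], [6, 3]])

# First list = level-0 labels, second list = level-1 labels (one entry per column).
df.columns = pd.MultiIndex.from_arrays([['one', 'one'], ['one', 'two']])

print(df.columns.nlevels)         # 2
print(df['one']['two'].tolist())  # [1, 2, 3]
```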

score 41 · Answer 11 · edited Feb 13 '21 at 05:44

One line or Pipeline solutions

I'll focus on two things:

OP clearly states

I have the edited column names stored it in a list, but I don't know how to replace the column names.

I do not want to solve the problem of how to replace '$' or strip the first character off of each column header. OP has already done this step. Instead I want to focus on replacing the existing columns object with a new one given a list of replacement column names.
df.columns = new, where new is the list of new column names, is as simple as it gets. The drawback of this approach is that it requires editing the existing dataframe's columns attribute and it isn't done inline. I'll show a few ways to perform this via pipelining without editing the existing dataframe.

Setup 1
To focus on the need to rename or replace column names with a pre-existing list, I'll create a new sample dataframe df with initial column names and unrelated new column names.

df = pd.DataFrame({'Jack': [1, 2], 'Mahesh': [3, 4], 'Xin': [5, 6]})
new = ['x098', 'y765', 'z432']

df

   Jack  Mahesh  Xin
0     1       3    5
1     2       4    6

Solution 1
pd.DataFrame.rename

It has been said already that if you had a dictionary mapping the old column names to new column names, you could use pd.DataFrame.rename.

d = {'Jack': 'x098', 'Mahesh': 'y765', 'Xin': 'z432'}
df.rename(columns=d)

   x098  y765  z432
0     1     3     5
1     2     4     6

However, you can easily create that dictionary and include it in the call to rename. The following takes advantage of the fact that when iterating over df, we iterate over each column name.

# Given just a list of new column names
df.rename(columns=dict(zip(df, new)))

   x098  y765  z432
0     1     3     5
1     2     4     6

This works great if your original column names are unique. But if they are not, then this breaks down.

Setup 2
Non-unique columns

df = pd.DataFrame(
    [[1, 3, 5], [2, 4, 6]],
    columns=['Mahesh', 'Mahesh', 'Xin']
)
new = ['x098', 'y765', 'z432']

df

   Mahesh  Mahesh  Xin
0       1       3    5
1       2       4    6

Solution 2
pd.concat using the keys argument

First, notice what happens when we attempt to use solution 1:

df.rename(columns=dict(zip(df, new)))

   y765  y765  z432
0     1     3     5
1     2     4     6

We didn't map the new list as the column names. We ended up repeating y765. Instead, we can use the keys argument of the pd.concat function while iterating through the columns of df.

pd.concat([c for _, c in df.items()], axis=1, keys=new) 

   x098  y765  z432
0     1     3     5
1     2     4     6

Solution 3
Reconstruct. This should only be used if you have a single dtype for all columns. Otherwise, you'll end up with dtype object for all columns and converting them back requires more dictionary work.

Single dtype

pd.DataFrame(df.values, df.index, new)

   x098  y765  z432
0     1     3     5
1     2     4     6

Mixed dtype

pd.DataFrame(df.values, df.index, new).astype(dict(zip(new, df.dtypes)))

   x098  y765  z432
0     1     3     5
1     2     4     6

Solution 4
This is a gimmicky trick with transpose and set_index. pd.DataFrame.set_index allows us to set an index inline, but there is no corresponding set_columns. So we can transpose, then set_index, and transpose back. However, the same single dtype versus mixed dtype caveat from solution 3 applies here.

Single dtype

df.T.set_index(np.asarray(new)).T  # assumes import numpy as np

   x098  y765  z432
0     1     3     5
1     2     4     6

Mixed dtype

df.T.set_index(np.asarray(new)).T.astype(dict(zip(new, df.dtypes)))

   x098  y765  z432
0     1     3     5
1     2     4     6

Solution 5
Use a lambda in pd.DataFrame.rename that cycles through each element of new.
In this solution, we pass a lambda that takes x but then ignores it. It also takes a y but doesn't expect it. Instead, an iterator is given as a default value and I can then use that to cycle through one at a time without regard to what the value of x is.

df.rename(columns=lambda x, y=iter(new): next(y))

   x098  y765  z432
0     1     3     5
1     2     4     6

And as pointed out to me by the folks in sopython chat, if I add a * in between x and y, I can protect my y variable. Though, in this context I don't believe it needs protecting. It is still worth mentioning.

df.rename(columns=lambda x, *, y=iter(new): next(y))

   x098  y765  z432
0     1     3     5
1     2     4     6
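One more pipeline-friendly option, assuming pandas 1.0 or later, is `set_axis`. It relabels purely by position, so it also copes with the non-unique columns from Setup 2 without reconstructing the data or touching dtypes.

```python
import pandas as pd

df = pd.DataFrame([[1, 3, 5], [2, 4, 6]], columns=['Mahesh', 'Mahesh', 'Xin'])
new = ['x098', 'y765', 'z432']

# Positional relabelling: duplicate old labels don't matter.
out = df.set_axis(new, axis=1)

print(out)
```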

Maybe we can add `df.rename(lambda x : x.lstrip('$'),axis=1) ` — BENY, Oct 12 '18 at 15:59
Hi @piRSquared, would you be able to elaborate on how pandas uses the lambda function in Solution 5 please? I don't quite follow what you mean when you say `x` is ignored? — Josmoor98, May 03 '19 at 19:19

score 30 · Answer 12 · edited Feb 13 '21 at 05:58

Let's understand renaming by a small example...

Renaming columns using mapping:

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})  # Creating a df with columns A and B
df.rename({"A": "new_a", "B": "new_b"}, axis='columns', inplace=True)  # Renaming column A to 'new_a' and B to 'new_b'

Output:

   new_a  new_b
0      1      4
1      2      5
2      3      6

Renaming index/Row_Name using mapping:

df.rename({0: "x", 1: "y", 2: "z"}, axis='index', inplace=True)  # Row names are replaced by 'x', 'y', and 'z'.

Output:

   new_a  new_b
x      1      4
y      2      5
z      3      6
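Both mappings can also be passed in a single call with the `index=` and `columns=` keywords. This sketch skips `inplace` and reassigns the result instead, leaving the original frame untouched.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# One call renames both rows and columns and returns a new DataFrame.
out = df.rename(index={0: "x", 1: "y", 2: "z"}, columns={"A": "new_a", "B": "new_b"})

print(out)
```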

edited by Peter Mortensen · answered Mar 08 '20 at 05:35 by Amar Kumar

In my view this is generally the safest method since it reduces the risk of making an error with the order of the column names. – A Rob4 May 12 '21 at 06:49

score 28 · Answer 13 · answered May 10 '21 at 08:17

Suppose your DataFrame is named df and its columns are:

df.columns  # Index(['$a', '$b', '$c', '$d', '$e'], dtype='object')

So, to rename these, we would simply do:

df.columns = ['a','b','c','d','e']

answered by Sushan Bastola

Simple, elegant solution – Tokci Jan 28 '22 at 15:50
This must be the best answer – Orlando May 01 '22 at 20:05

score 25 · Answer 14 · edited Oct 31 '19 at 11:57

Let's say this is your dataframe, with the columns '$a', '$b', '$c', '$d', '$e' from the question.

You can rename the columns using two methods.

Using dataframe.columns=[#list]
```
df.columns=['a','b','c','d','e']
```
The limitation of this method is that even if only one column has to be changed, the full column list has to be passed. Also, this method does not rename index labels. If you pass a list of the wrong length, for example:
```
df.columns = ['a','b','c','d']
```
This raises ValueError: Length mismatch: Expected axis has 5 elements, new values have 4 elements.
Another method is the Pandas rename() method, which can rename any index (row) or column labels:
```
df = df.rename(columns={'$a':'a'})
```

Similarly, you can change any rows or columns.
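A short sketch of the length check described above: assigning a list with the wrong number of names raises a `ValueError` (caught here to inspect the message), while `rename` with a partial mapping has no such restriction.

```python
import pandas as pd

df = pd.DataFrame({'$a': [1], '$b': [1], '$c': [1], '$d': [1], '$e': [1]})

try:
    df.columns = ['a', 'b', 'c', 'd']  # one name short
except ValueError as exc:
    message = str(exc)
    print(message)

# A partial mapping is fine with rename.
df = df.rename(columns={'$a': 'a'})
print(list(df.columns))  # ['a', '$b', '$c', '$d', '$e']
```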

score 21 · Answer 15 · edited Feb 13 '21 at 05:52

df.rename(index=str, columns={'A':'a', 'B':'b'})

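The columns mapping renames 'A' to 'a' and 'B' to 'b'; index=str additionally converts every index label to a string, so drop it if you only want to rename columns. See pandas.DataFrame.rename.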

edited by Peter Mortensen · answered Jul 19 '18 at 04:50 by Yog

An explanation would be in order. – Peter Mortensen Feb 13 '21 at 05:53

score 20 · Answer 16 · edited Feb 13 '21 at 05:27

If you've got the dataframe, df.columns gives you the labels, which you can turn into a list, manipulate, and then reassign to your dataframe as the column names...

columns = df.columns
columns = [row.replace("$", "") for row in columns]
df.rename(columns=dict(zip(df.columns, columns)), inplace=True)
df.head() # To validate the output

Best way? I don't know. A way - yes.

A better way of evaluating all the main techniques put forward in the answers to the question is below, using cProfile to gauge execution time. @kadee, @kaitlyn, and @eumiro had the functions with the fastest execution times - though these functions are so fast we're comparing the rounding of 0.000 and 0.001 seconds for all the answers. Moral: my answer above likely isn't the 'best' way.

import pandas as pd
import cProfile

old_names = ['$a', '$b', '$c', '$d', '$e']
new_names = ['a', 'b', 'c', 'd', 'e']
col_dict = {'$a': 'a', '$b': 'b', '$c': 'c', '$d': 'd', '$e': 'e'}

df = pd.DataFrame({'$a':[1, 2], '$b': [10, 20], '$c': ['bleep', 'blorp'], '$d': [1, 2], '$e': ['texa$', '']})

df.head()

def eumiro(df, nn):
    df.columns = nn
    # This direct renaming approach is duplicated in methodology in several other answers:
    return df

def lexual1(df):
    return df.rename(columns=col_dict)

def lexual2(df, col_dict):
    return df.rename(columns=col_dict, inplace=True)

def Panda_Master_Hayden(df):
    return df.rename(columns=lambda x: x[1:], inplace=True)

def paulo1(df):
    return df.rename(columns=lambda x: x.replace('$', ''))

def paulo2(df):
    return df.rename(columns=lambda x: x.replace('$', ''), inplace=True)

def migloo(df, on, nn):
    return df.rename(columns=dict(zip(on, nn)), inplace=True)

def kadee(df):
    return df.columns.str.replace('$', '')

def awo(df):
    columns = df.columns
    columns = [row.replace("$", "") for row in columns]
    # Map each old label to its cleaned-up version
    return df.rename(columns=dict(zip(df.columns, columns)), inplace=True)

def kaitlyn(df):
    df.columns = [col.strip('$') for col in df.columns]
    return df

# Several functions modify their argument in place, so each run gets a fresh
# copy and every method starts from the original '$'-prefixed names.
print('eumiro')
cProfile.run('eumiro(df.copy(), new_names)')
print('lexual1')
cProfile.run('lexual1(df.copy())')
print('lexual2')
cProfile.run('lexual2(df.copy(), col_dict)')
print('andy hayden')
cProfile.run('Panda_Master_Hayden(df.copy())')
print('paulo1')
cProfile.run('paulo1(df.copy())')
print('paulo2')
cProfile.run('paulo2(df.copy())')
print('migloo')
cProfile.run('migloo(df.copy(), old_names, new_names)')
print('kadee')
cProfile.run('kadee(df.copy())')
print('awo')
cProfile.run('awo(df.copy())')
print('kaitlyn')
cProfile.run('kaitlyn(df.copy())')

Why do you need rename method? Something like this worked for me # df.columns = [row.replace('$', '') for row in df.columns] — shantanuo, Sep 05 '15 at 13:19
I don't understand the 'things' part. What do I have to substitute? The old columns? — Andrea Ianni ௫, Jun 27 '16 at 11:05

Answer 17 · answered Sep 13 '17 at 12:24 by Alexander

df = pd.DataFrame({'$a': [1], '$b': [1], '$c': [1], '$d': [1], '$e': [1]})

If your new list of columns is in the same order as the existing columns, the assignment is simple:

new_cols = ['a', 'b', 'c', 'd', 'e']
df.columns = new_cols
>>> df
   a  b  c  d  e
0  1  1  1  1  1

If you had a dictionary keyed on old column names to new column names, you could do the following:

d = {'$a': 'a', '$b': 'b', '$c': 'c', '$d': 'd', '$e': 'e'}
df.columns = df.columns.map(lambda col: d[col])  # Or `.map(d.get)` as pointed out by @PiRSquared.
>>> df
   a  b  c  d  e
0  1  1  1  1  1

If you don't have a list or dictionary mapping, you could strip the leading $ symbol via a list comprehension:

df.columns = [col[1:] if col[0] == '$' else col for col in df]
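One difference from `rename` worth knowing: `Index.map(d.get)` returns `None` for any label missing from `d`, whereas `rename` leaves unmapped labels alone. A sketch with a deliberately incomplete mapping, using `d.get(col, col)` to fall back to the original label:

```python
import pandas as pd

df = pd.DataFrame({'$a': [1], '$b': [1]})
d = {'$a': 'a'}  # deliberately incomplete mapping

# Fall back to the original label when it is not in the mapping.
df.columns = df.columns.map(lambda col: d.get(col, col))

print(list(df.columns))  # ['a', '$b']
```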

Instead of `lambda col: d[col]` you could pass `d.get`... so it would look like `df.columns.map(d.get)` — piRSquared, Sep 13 '17 at 08:48

score 20 · Answer 18 · answered Jun 15 '21 at 00:38

Many of pandas functions have an inplace parameter. When setting it True, the transformation applies directly to the dataframe that you are calling it on. For example:

df = pd.DataFrame({'$a':[1,2], '$b': [3,4]})
df.rename(columns={'$a': 'a'}, inplace=True)
df.columns

>>> Index(['a', '$b'], dtype='object')

Alternatively, there are cases where you want to preserve the original dataframe. I have often seen people fall into this case if creating the dataframe is an expensive task, for example if creating it required querying a Snowflake database. In this case, just make sure the inplace parameter is set to False (the default).

df = pd.DataFrame({'$a':[1,2], '$b': [3,4]})
df2 = df.rename(columns={'$a': 'a'}, inplace=False)
df.columns
    
>>> Index(['$a', '$b'], dtype='object')

df2.columns

>>> Index(['a', '$b'], dtype='object')
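A common pitfall is combining `inplace=True` with reassignment: the in-place call returns `None`, so `df = df.rename(..., inplace=True)` would replace the DataFrame with `None`. A small sketch of the return value:

```python
import pandas as pd

df = pd.DataFrame({'$a': [1, 2], '$b': [3, 4]})

result = df.rename(columns={'$a': 'a'}, inplace=True)
print(result)            # None: the change was applied to df itself
print(list(df.columns))  # ['a', '$b']
```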

If these types of transformations are something that you do often, you could also look into a number of different pandas GUI tools. I'm the creator of one called Mito. It's a spreadsheet that automatically converts your edits to Python code.

score 18 · Answer 19 · edited Feb 13 '21 at 05:28

Another way we could replace the original column labels is by stripping the unwanted characters (here '$') from the original column labels.

This could have been done by running a for loop over df.columns, appending each stripped name to a new list, and then assigning that list to df.columns.

Instead, we can do this neatly in a single statement by using list comprehension like below:

df.columns = [col.strip('$') for col in df.columns]

(The str.strip method removes the given characters from the beginning and end of the string.)
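Because `strip('$')` works on both ends, a label such as `'price$'` (a hypothetical name) would lose its trailing `$` too. If only a leading `$` should go, `lstrip` is the narrower tool, as this sketch compares:

```python
import pandas as pd

df = pd.DataFrame(columns=['$a', 'price$'])

both_ends = [col.strip('$') for col in df.columns]
leading_only = [col.lstrip('$') for col in df.columns]

print(both_ends)     # ['a', 'price']
print(leading_only)  # ['a', 'price$']
```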

edited by Peter Mortensen · answered Nov 23 '15 at 13:56 by kait

Can you explain how/why this works? That will make the answer more valuable for future readers. – Dan Lowe Nov 23 '15 at 14:43

score 18 · Answer 20 · edited Oct 27 '21 at 22:29

If you already have a list for the new column names, you can try this:

new_cols = ['a', 'b', 'c', 'd', 'e']
new_names_map = {df.columns[i]:new_cols[i] for i in range(len(new_cols))}

df.rename(new_names_map, axis=1, inplace=True)

edited by Faiz Kidwai · answered Jun 10 '21 at 03:46 by Crystal L

This is useful in a case where you don't want to specify the existing column names. I have such a case where they are annoyingly long, so I just want to pass the new names. – Chuck Jan 13 '22 at 16:04

score 17 · Answer 21 · edited Feb 13 '21 at 05:29

It is really simple. Just use:

df.columns = ['Name1', 'Name2', 'Name3']  # one new name per existing column

And it will assign the column names in the order you put them in.

edited by Peter Mortensen · answered Nov 29 '15 at 19:22 by Thodoris P

score 13 · Answer 22 · edited Sep 13 '21 at 19:45

# This way it will work
import pandas as pd

# Define a dictionary 
rankings = {'test': ['a'],
        'odi': ['E'],
        't20': ['P']}

# Convert the dictionary into DataFrame
rankings_pd = pd.DataFrame(rankings)

# Before renaming the columns
print(rankings_pd)

rankings_pd.rename(columns = {'test':'TEST'}, inplace = True)

# After renaming the columns
print(rankings_pd)

edited Sep 13 '21 at 19:45 by cheevahagadog

answered Jul 14 '21 at 02:09 by Ankit Rai

score 12 · Answer 23 · answered Jan 28 '16 at 17:31


You could use str.slice to drop the first character of every column label:

df.columns = df.columns.str.slice(1)

answered Jan 28 '16 at 17:31 by Anton Protopopov


PS: This is a more verbose equivalent to `df.columns.str[1:]`... probably better to use that, it's shorter and more obvious. – cs95 May 25 '19 at 04:00
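A short sketch confirming that str.slice(1) and str[1:] give the same result. The made-up label 'c' also shows the caveat: both drop the first character unconditionally, whether or not it is a '$'.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3]], columns=['$a', '$b', 'c'])

sliced = df.columns.str.slice(1)
indexed = df.columns.str[1:]

# Both remove the first character of every label, even when it is not '$'
print(list(sliced))            # ['a', 'b', '']
print(sliced.equals(indexed))  # True
```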

score 12 · Answer 24 · answered Jul 07 '18 at 02:07


Another option is to rename using a regular expression:

import pandas as pd
import re

df = pd.DataFrame({'$a':[1,2], '$b':[3,4], '$c':[5,6]})

df = df.rename(columns=lambda x: re.sub(r'\$', '', x))  # raw string avoids an invalid-escape warning
>>> df
   a  b  c
0  1  3  5
1  2  4  6
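A related sketch, assuming you only want to remove a leading '$': anchoring the pattern with ^ leaves any other '$' in a label alone, and Index.str.replace applies the regex without a lambda. regex=True is passed explicitly because recent pandas versions (2.0 and later) treat the pattern literally by default.

```python
import pandas as pd

df = pd.DataFrame({'$a': [1, 2], '$b': [3, 4], 'c$': [5, 6]})

# '^\$' matches a '$' only at the start of a label
df.columns = df.columns.str.replace(r'^\$', '', regex=True)

print(list(df.columns))  # ['a', 'b', 'c$']
```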

answered Jul 07 '18 at 02:07 by sbha

score 11 · Answer 25 · edited Feb 13 '21 at 05:30

My method is generic: you can handle additional delimiters by adding them to the delimiters variable, which future-proofs it.

Working Code:

import pandas as pd
import re


df = pd.DataFrame({'$a':[1,2], '$b': [3,4],'$c':[5,6], '$d': [7,8], '$e': [9,10]})

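# Each entry is one delimiter; re.escape makes characters such as '$' match literally
delimiters = ['$']
matchPattern = '|'.join(map(re.escape, delimiters))
# Keep the text after the first delimiter (assumes every label contains one)
df.columns = [re.split(matchPattern, i)[1] for i in df.columns]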

Output:

Before renaming:

>>> df
   $a  $b  $c  $d  $e
0   1   3   5   7   9
1   2   4   6   8  10

After renaming:

>>> df
   a  b  c  d   e
0  1  3  5  7   9
1  2  4  6  8  10
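A sketch of the same approach with several delimiters; the labels and the extra delimiters '#' and '@' are made up for illustration. Keeping the delimiters in a list means each entry is escaped and joined into one alternation pattern.

```python
import re
import pandas as pd

df = pd.DataFrame({'$a': [1, 2], '#b': [3, 4], '@c': [5, 6]})

# Each list entry is one delimiter; re.escape makes them match literally
delimiters = ['$', '#', '@']
match_pattern = '|'.join(map(re.escape, delimiters))

# split(...)[1] keeps the text after the first delimiter, so this assumes
# every label contains at least one of the delimiters
df.columns = [re.split(match_pattern, col)[1] for col in df.columns]

print(list(df.columns))  # ['a', 'b', 'c']
```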

score 10 · Answer 26 · edited Feb 13 '21 at 05:31

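Note that several of the approaches in previous answers (for example, assigning a flat list of strings to df.columns) do not carry over to a MultiIndex, where each column label is a tuple. For a MultiIndex, you can do something like the following: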

>>> df = pd.DataFrame({('$a','$x'):[1,2], ('$b','$y'): [3,4], ('e','f'):[5,6]})
>>> df
   $a $b  e
   $x $y  f
0  1  3  5
1  2  4  6
>>> rename = {('$a','$x'):('a','x'), ('$b','$y'):('b','y')}
>>> df.columns = pd.MultiIndex.from_tuples([
...     rename.get(item, item) for item in df.columns.tolist()])
>>> df
   a  b  e
   x  y  f
0  1  3  5
1  2  4  6
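An alternative sketch using rename directly on the same example data: with MultiIndex columns, rename applies a function or dict to the individual labels of each level, and the level parameter restricts it to one level.

```python
import pandas as pd

df = pd.DataFrame({('$a', '$x'): [1, 2], ('$b', '$y'): [3, 4], ('e', 'f'): [5, 6]})

# The function is applied to each label string on every level
stripped = df.rename(columns=lambda s: s.lstrip('$'))
print(stripped.columns.tolist())  # [('a', 'x'), ('b', 'y'), ('e', 'f')]

# level=0 limits the renaming to the top level only
top_only = df.rename(columns={'$a': 'a', '$b': 'b'}, level=0)
print(top_only.columns.tolist())  # [('a', '$x'), ('b', '$y'), ('e', 'f')]
```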

score 9 · Answer 27 · edited Feb 13 '21 at 05:39

If you have to deal with lots of columns whose names are set by a source system outside your control, the following approach combines a general rule with specific replacements in one go.

First, build a dictionary from the DataFrame's column names, using a regular expression to strip certain suffixes; then add specific replacements to the dictionary so that the core columns get the names the receiving database expects.

This is then applied to the dataframe in one go.

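# regex=True is needed in pandas >= 2.0, where str.replace matches literally by default;
# naming the mapping rename_map avoids shadowing the built-in dict
rename_map = dict(zip(df.columns, df.columns.str.replace(r'(:S$|:C1$|:L$|:D$|\.Serial:L$)', '', regex=True)))
rename_map['brand_timeseries:C1'] = 'BTS'
rename_map['respid:L'] = 'RespID'
rename_map['country:C1'] = 'CountryID'
rename_map['pim1:D'] = 'pim_actual'
df.rename(columns=rename_map, inplace=True)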
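A self-contained sketch of the same two-step idea on a handful of hypothetical column names (the real names come from the source system): the regex handles the general case, and the explicit entries override it for the core columns.

```python
import pandas as pd

# Hypothetical column names in the style described above
df = pd.DataFrame(columns=['brand_timeseries:C1', 'respid:L', 'age:S',
                           'income:D', 'device.Serial:L'])

# General rule: strip the known suffixes
suffixes = r'(:S$|:C1$|:L$|:D$|\.Serial:L$)'
rename_map = dict(zip(df.columns, df.columns.str.replace(suffixes, '', regex=True)))

# Specific overrides for the core columns
rename_map['brand_timeseries:C1'] = 'BTS'
rename_map['respid:L'] = 'RespID'

df = df.rename(columns=rename_map)
print(list(df.columns))  # ['BTS', 'RespID', 'age', 'income', 'device']
```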

score 8 · Answer 28 · edited Apr 27 '20 at 17:25

In addition to the solutions already provided, you can replace all the column names while reading the file. Passing names together with header=0 does that.

First, we create a list of the names that we would like to use as our column names:

import pandas as pd

ufo_cols = ['city', 'color reported', 'shape reported', 'state', 'time']

# header=0 treats the file's first row as the header that names replaces
ufo = pd.read_csv('link to the file you are using', names=ufo_cols, header=0)

In this case, all the column names will be replaced with the names you have in your list.
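A runnable sketch of the read_csv approach, using a small made-up CSV held in memory via io.StringIO instead of a real file. With header=0 the original header line is consumed rather than read as a data row, and names supplies the replacement labels.

```python
import io
import pandas as pd

# Hypothetical CSV text whose header row we want to replace
csv_text = ('City,Colors Reported,Shape Reported,State,Time\n'
            'Ithaca,,TRIANGLE,NY,6/1/1930 22:00\n')

ufo_cols = ['city', 'color reported', 'shape reported', 'state', 'time']

# header=0: line 0 is the existing header; names=: the labels to use instead
ufo = pd.read_csv(io.StringIO(csv_text), names=ufo_cols, header=0)

print(list(ufo.columns))
print(len(ufo))  # 1 data row; the old header was not kept as data
```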

score 8 · Answer 29 · answered Mar 19 '21 at 10:29


If you just want to remove the '$' sign, use the code below (regex=False makes '$' a literal character rather than the regex end-of-string anchor):

df.columns = df.columns.str.replace("$", "", regex=False)
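A short sketch of why the regex flag matters for '$' (made-up labels): treated literally it is just a character, while as a regular expression it has to be escaped to match a dollar sign.

```python
import pandas as pd

df = pd.DataFrame([[1, 2]], columns=['$a', '$b'])

# regex=False: '$' is matched as a literal character
literal = df.columns.str.replace('$', '', regex=False)

# regex=True: '$' means end of string, so it must be escaped as '\$'
escaped = df.columns.str.replace(r'\$', '', regex=True)

print(list(literal))            # ['a', 'b']
print(literal.equals(escaped))  # True
```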

answered Mar 19 '21 at 10:29 by Omkar Darves

score 6 · Answer 30 · edited Feb 13 '21 at 05:51

Here's a nifty little function I like to use to cut down on typing:

def rename(data, oldnames, newname):
    # Input can be a string or a list of strings; when renaming multiple
    # columns, pass the corresponding list of new names
    if isinstance(oldnames, str):
        oldnames = [oldnames]
        newname = [newname]
    for name, new in zip(oldnames, newname):
        # Substring match: the name doesn't have to match a column exactly
        oldvar = [c for c in data.columns if name in c]
        if len(oldvar) == 0:
            raise ValueError("Sorry, couldn't find that column in the dataset")
        if len(oldvar) > 1:
            print("Found multiple columns that matched " + str(name) + ": ")
            for idx, c in enumerate(oldvar):
                print(str(idx) + ": " + str(c))
            ind = input('Please enter the index of the column you would like to rename: ')
            oldvar = oldvar[int(ind)]
        else:
            oldvar = oldvar[0]
        data = data.rename(columns={oldvar: new})
    return data

Here is an example of how it works:

In [2]: df = pd.DataFrame(np.random.randint(0, 10, size=(10, 4)), columns = ['col1', 'col2', 'omg', 'idk'])
# First list = substrings of existing column names
# Second list = new names for those columns
In [3]: df = rename(df, ['col', 'omg'],['first', 'ohmy'])
Found multiple columns that matched col:
0: col1
1: col2

Please enter the index of the column you would like to rename: 0

In [4]: df.columns
Out[4]: Index(['first', 'col2', 'ohmy', 'idk'], dtype='object')

The use case for a function like this is extremely rare. In most cases, I know what I'm looking for and what I want to rename it to, I'd just assign/modify it myself. — cs95, May 25 '19 at 05:29
@cs95 I tend to work with large national or international surveys where variables will have coded variable names that begin with prefixes depending on answer options, likert scales, and branching (such as EDU_2913.443, EDU_2913.421,...). This function has been very useful for me in working with those types of sets, I understand if it's not for you though :) — seeiespi, May 29 '19 at 19:41
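If the interactive prompt gets in the way (for example in a script), a non-interactive variant is possible. This is a minimal sketch; rename_by_substring is a hypothetical name, and unlike the function above it matches every substring against the original labels and raises instead of asking when a match is missing or ambiguous.

```python
import pandas as pd


def rename_by_substring(data, old_substrings, new_names):
    """Hypothetical helper: rename the single column whose label contains
    each substring; raise ValueError if there are zero or several matches."""
    if isinstance(old_substrings, str):
        old_substrings, new_names = [old_substrings], [new_names]
    mapping = {}
    for sub, new in zip(old_substrings, new_names):
        matches = [c for c in data.columns if sub in c]
        if len(matches) != 1:
            raise ValueError(f"{sub!r} matched {len(matches)} columns: {matches}")
        mapping[matches[0]] = new
    return data.rename(columns=mapping)


df = pd.DataFrame([[0, 1, 2, 3]], columns=['col1', 'col2', 'omg', 'idk'])
df = rename_by_substring(df, ['col1', 'omg'], ['first', 'ohmy'])
print(list(df.columns))  # ['first', 'col2', 'ohmy', 'idk']
```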

score 6 · Answer 31 · edited Feb 13 '21 at 05:54


If you can use a regular expression, this solution avoids hard-coding the new names: it keeps the first run of word characters (\w+) in each column label.

import pandas as pd
import re

srch = re.compile(r"\w+")

data = pd.read_csv("CSV_FILE.csv")
cols = data.columns
new_cols = [srch.search(col).group() for col in cols]  # fails if a label has no word characters
data.columns = new_cols
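One concern raised below is what happens when the pattern matches nothing: search() returns None and .group() fails. A minimal sketch that keeps the original label in that case (the labels and the first_word helper are made up for illustration):

```python
import re
import pandas as pd

srch = re.compile(r"\w+")

df = pd.DataFrame([[1, 2, 3]], columns=['$a', 'b (cm)', '$$$'])


def first_word(label):
    # Keep the first run of word characters; fall back to the original
    # label when there is none (search() returns None in that case)
    m = srch.search(label)
    return m.group() if m else label


df.columns = [first_word(c) for c in df.columns]
print(list(df.columns))  # ['a', 'b', '$$$']
```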

edited Feb 13 '21 at 05:54 by Peter Mortensen

answered Apr 11 '19 at 15:08 by Kaustubh J


It's good practice on Stack Overflow to add an explanation as to why your solution should work or is better than the existing solutions. For more information read [How To Answer](//stackoverflow.com/help/how-to-answer). – Samuel Liew Apr 11 '19 at 23:49
Notice how the best-rated answer requires some form of hard coding and the worst rated answer requires only descriptive and procedural approach? – Kaustubh J Apr 13 '19 at 13:13
There are better (more readable) solutions that also utilise regex than this. This is doing way more than it should for a simple renaming operation. There's also the danger of the pattern not matching anything in which case you've not done anything to handle errors. – cs95 May 25 '19 at 03:48
Re *"Assuming you can use a regular expression"*: Do you mean *"Assuming you can't use a regular expression"* (the opposite)? – Peter Mortensen Feb 13 '21 at 05:55

score 6 · Answer 32 · edited Feb 13 '21 at 06:01


I needed to rename features for XGBoost, and it didn't like any of these characters in feature names, so I replaced each run of them with an underscore:

# X_trn and X_tst are the training and test DataFrames
regex = r"[!\"#$%&'()*+,\-.\/:;<=>?@[\\\]^_`{|}~ ]+"
X_trn.columns = X_trn.columns.str.replace(regex, '_', regex=True)
X_tst.columns = X_tst.columns.str.replace(regex, '_', regex=True)
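To see what the pattern does, here is a sketch applying it to a few made-up feature names: every run of punctuation or spaces collapses into a single underscore.

```python
import pandas as pd

# Same pattern as above: any run of ASCII punctuation or spaces
regex = r"[!\"#$%&'()*+,\-.\/:;<=>?@[\\\]^_`{|}~ ]+"

cols = pd.Index(['a b', 'f[0]', 'x<1', 'price$'])
cleaned = cols.str.replace(regex, '_', regex=True)

print(list(cleaned))  # ['a_b', 'f_0_', 'x_1', 'price_']
```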

edited Feb 13 '21 at 06:01 by Peter Mortensen

answered Jun 24 '20 at 02:42 by Igor Ostaptchenko


FWIW, you could just keep track of the columns in a separate *n*-dimensional list and pass to XGBoost only the underlying NumPy array/matrix, which doesn't have any headers. In this way, you could name your columns whatever you wanted without having to conform to what XGBoost desires – blacksite Dec 18 '20 at 19:21

What *did* it like? – Peter Mortensen Feb 13 '21 at 06:02
