Get a list from Pandas DataFrame column headers

Question

I want to get a list of the column headers from a Pandas DataFrame. The DataFrame will come from user input, so I won't know how many columns there will be or what they will be called.

For example, if I'm given a DataFrame like this:

>>> my_dataframe
    y  gdp  cap
0   1    2    5
1   2    3    9
2   8    7    2
3   3    4    7
4   6    7    7
5   4    8    3
6   8    2    8
7   9    9   10
8   6    6    4
9  10   10    7

I would get a list like this:

>>> header_list
['y', 'gdp', 'cap']

From Python 3.5 onwards you can use `[*df]` instead of `list(df)` or `df.columns.tolist()`; this is thanks to [Unpacking generalizations (PEP 448)](https://www.python.org/dev/peps/pep-0448/). – cs95, Jun 07 '20 at 22:13

score 1920 · Accepted Answer · edited Oct 22 '21 at 12:16

You can get the values as a list by doing:

list(my_dataframe.columns.values)

Also you can simply use (as shown in Ed Chum's answer):

list(my_dataframe)
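
As a quick check of the two forms above, here is a minimal sketch using a cut-down version of the question's DataFrame (the three rows are made up for illustration). Both calls should give a plain Python list in the original column order:

```python
import pandas as pd

# Illustrative data only: a smaller version of the frame from the question.
my_dataframe = pd.DataFrame({"y": [1, 2, 8], "gdp": [2, 3, 7], "cap": [5, 9, 2]})

# Option 1: go through the underlying array of column labels.
via_values = list(my_dataframe.columns.values)

# Option 2: iterate the DataFrame itself, which yields its column labels.
via_iteration = list(my_dataframe)

print(via_values)
print(via_iteration)
```

Note that it is the `list(...)` call that produces the list; `my_dataframe.columns.values` on its own is an array, not a list.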

answered Oct 20 '13 at 21:23 by Simeon Visser · edited Oct 22 '21 at 12:16 by Peter Mortensen

Why does [this doc](http://pandas.pydata.org/pandas-docs/dev/generated/pandas.DataFrame.html) not have `columns` as an attribute? – Tjorriemorrie Nov 21 '14 at 08:30
@Tjorriemorrie: I'm not sure, it may have to do with the way they automatically generate their documentation. It is mentioned in other places though: http://pandas.pydata.org/pandas-docs/stable/basics.html#attributes-and-the-raw-ndarray-s – Simeon Visser Nov 21 '14 at 10:18
I would have expected something like `df.column_names()`. Is this answer still right or is it outdated? – alvas Jan 13 '16 at 06:48
@alvas there are various other ways to do it (see other answers on this page) but as far as I know there isn't a method on the dataframe directly to produce the list. – Simeon Visser Jan 13 '16 at 09:30
Importantly, this preserves the column order. – WindChimes Jan 25 '16 at 13:07
I tried using this with unittest's assertListEqual to check that the headers in a df matched an expected list, and it tells me it's not a list but rather a sequence; it looks like `array(['colBoolean','colTinyint', 'colSmallnt', ...], dtype=object)` – Davos May 02 '18 at 07:20
`df.keys().tolist()` is more universal, because it works also for older versions of pandas than 0.16.0 – StefanK May 09 '18 at 08:22
Even though the solution that was provided above is nice, I would also expect something like frame.column_names() to be a function in pandas. Since it is not, maybe it would be nice to use the following syntax. It somehow preserves the feeling that you are using pandas in a proper way by calling the "tolist" function: frame.columns.tolist() – Igor Jakovljevic Nov 23 '18 at 09:53
Note that dataframe[column_name].to_numpy() is the suggested method to get the values of a column as of pandas 0.24.1 – Timbus Calin Mar 16 '19 at 07:35
This first option is terrible (as of the current version of pandas - v0.24) because it is [mixing idioms](https://stackoverflow.com/questions/19482970/get-list-from-pandas-dataframe-column-headers/19483602?noredirect=1#comment97691231_48832928). If you are going through the trouble to access the numpy array, please use the `.tolist()` method instead, it is faster and more idiomatic. – cs95 Apr 03 '19 at 09:50
When I used `list(my_df)`, it gave me `[u'Col_Name1', u'Col_Name2']`. Please specify: **what is the meaning of the 'u' in the column list?** – Jayank Aug 21 '20 at 17:05
The approach `headers = list(df.columns.values)` doesn't work in all cases. It gives me `TypeError: 'list' object is not callable` with Python 3.9.1. Instead, `headers = [*df]` works just fine. Also, `headers = df.columns.values` gives a NumPy array rather than a list, but seems to work too. – msoutopico Feb 27 '21 at 13:12

score 484 · Answer 2 · edited Oct 22 '21 at 12:24

There is a built-in method which is the most performant:

my_dataframe.columns.values.tolist()

.columns returns an Index, .columns.values returns an array and this has a helper function .tolist to return a list.

If performance is not as important to you, Index objects define a .tolist() method that you can call directly:

my_dataframe.columns.tolist()

The difference in performance is obvious:

%timeit df.columns.tolist()
16.7 µs ± 317 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit df.columns.values.tolist()
1.24 µs ± 12.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

For those who hate typing, you can just call list on df, like so:

list(df)
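
The `%timeit` comparison above only works inside IPython. Outside it, a rough equivalent can be sketched with the standard-library `timeit` module. The test frame and the repeat count here are assumptions chosen for illustration, and the absolute numbers will vary with your machine and your pandas and NumPy versions:

```python
import timeit

import pandas as pd

# Hypothetical test frame; real timings depend on your data and versions.
df = pd.DataFrame(0, index=range(10), columns=["y", "gdp", "cap"])

candidates = {
    "df.columns.values.tolist()": lambda: df.columns.values.tolist(),
    "df.columns.tolist()": lambda: df.columns.tolist(),
    "list(df)": lambda: list(df),
}

calls = 10_000
timings = {}
for label, func in candidates.items():
    # Total seconds for all calls, converted to microseconds per call.
    total = timeit.timeit(func, number=calls)
    timings[label] = total / calls * 1e6
    print(f"{label:<30} {timings[label]:.2f} us per call")
```

For a column list this small, all three are fast enough that readability is usually the better reason to pick one.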

score 103 · Answer 3 · edited Oct 22 '21 at 12:31

I did some quick tests, and perhaps unsurprisingly the built-in version using dataframe.columns.values.tolist() is the fastest:

In [1]: %timeit [column for column in df]
1000 loops, best of 3: 81.6 µs per loop

In [2]: %timeit df.columns.values.tolist()
10000 loops, best of 3: 16.1 µs per loop

In [3]: %timeit list(df)
10000 loops, best of 3: 44.9 µs per loop

In [4]: %timeit list(df.columns.values)
10000 loops, best of 3: 38.4 µs per loop

(I still really like the list(dataframe) though, so thanks EdChum!)

score 60 · Answer 4 · edited Oct 22 '21 at 12:28

It gets even simpler (as of Pandas 0.16.0):

df.columns.tolist()

will give you the column names in a nice list.

answered Apr 07 '15 at 14:50 by fixxxer · edited Oct 22 '21 at 12:28 by Peter Mortensen

cs95 · Answer 5 · 2021-10-23T19:33:40.797

Extended Iterable Unpacking (Python 3.5+): `[*df]` and Friends

Unpacking generalizations (PEP 448) were introduced in Python 3.5, so the following operations are all possible.

import pandas as pd

df = pd.DataFrame('x', columns=['A', 'B', 'C'], index=range(5))
df

   A  B  C
0  x  x  x
1  x  x  x
2  x  x  x
3  x  x  x
4  x  x  x

If you want a list....

[*df]
# ['A', 'B', 'C']

Or, if you want a set,

{*df}
# {'A', 'B', 'C'}

Or, if you want a tuple,

*df,  # Please note the trailing comma
# ('A', 'B', 'C')

Or, if you want to store the result somewhere,

*cols, = df  # A wild comma appears, again
cols
# ['A', 'B', 'C']

... if you're the kind of person who converts coffee to typing sounds, well, this is going to consume your coffee more efficiently ;)

P.S.: if performance is important, you will want to ditch the solutions above in favour of

df.columns.to_numpy().tolist()
# ['A', 'B', 'C']

This is similar to Ed Chum's answer, but updated for v0.24 where .to_numpy() is preferred to the use of .values. See this answer (by me) for more information.

Visual Check

Since I've seen this discussed in other answers, you can use iterable unpacking (no need for explicit loops).

print(*df)
A B C

print(*df, sep='\n')
A
B
C

Critique of Other Methods

Don't use an explicit for loop for an operation that can be done in a single line (list comprehensions are okay).

Next, using sorted(df) does not preserve the original order of the columns. For that, you should use list(df) instead.

Next, list(df.columns) and list(df.columns.values) are poor suggestions (as of the current version, v0.24). Both Index (returned from df.columns) and NumPy arrays (returned by df.columns.values) define a .tolist() method, which is faster and more idiomatic.

Lastly, listification i.e., list(df) should only be used as a concise alternative to the aforementioned methods for Python 3.4 or earlier where extended unpacking is not available.
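
The point about column order can be checked with a small sketch. The frame below is invented for illustration, with columns deliberately not in alphabetical order:

```python
import pandas as pd

# Illustrative frame whose columns are deliberately not in alphabetical order.
df = pd.DataFrame({"y": [1, 2], "gdp": [2, 3], "cap": [5, 9]})

in_order = [*df]                  # unpacking keeps the DataFrame's column order
via_tolist = df.columns.tolist()  # so does Index.tolist()
alphabetical = sorted(df)         # sorting reorders the labels

print(in_order)      # ['y', 'gdp', 'cap']
print(via_tolist)    # ['y', 'gdp', 'cap']
print(alphabetical)  # ['cap', 'gdp', 'y']
```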

Alexander · Answer 6 · 2018-01-26T22:25:24.990

score 40

>>> list(my_dataframe)
['y', 'gdp', 'cap']

To list the columns of a dataframe while in debugger mode, use a list comprehension:

>>> [c for c in my_dataframe]
['y', 'gdp', 'cap']

By the way, you can get a sorted list simply by using sorted:

>>> sorted(my_dataframe)
['cap', 'gdp', 'y']

answered May 28 '15 at 15:58 by Alexander · edited Jan 26 '18 at 22:25

Would that `list(df)` work only with autoincrement dataframes? Or does it work for all dataframes? – alvas Jan 13 '16 at 06:49
Should work for all. When you are in the debugger, however, you need to use a list comprehension `[c for c in df]`. – Alexander Jan 13 '16 at 07:28

BrenBarn · Answer 7 · 2014-01-23T18:50:27.977

score 26

That's available as my_dataframe.columns.

answered Oct 20 '13 at 21:20 by BrenBarn · edited Jan 23 '14 at 18:50

And explicitly as a list by `header_list = list(my_dataframe.columns)` – yeliabsalohcin Sep 05 '17 at 12:59
^ Or better still: `df.columns.tolist()`. – cs95 Apr 03 '19 at 09:52
`my_dataframe.columns` will return an index, `Index(['A', 'B', 'C'], dtype='object')`, not a list. You can check that using `type(df.columns)` -> `pandas.core.indexes.base.Index`. Use the .tolist() suggested by @cs95. – rubengavidia0x Jan 26 '22 at 21:31

score 19 · Answer 8 · edited Oct 22 '21 at 12:22

A DataFrame follows the dict-like convention of iterating over the “keys” of the objects.

my_dataframe.keys()

Create a list of keys/columns, using either the object method to_list() or the Pythonic way:

my_dataframe.keys().to_list()
list(my_dataframe.keys())

Basic iteration on a DataFrame returns column labels:

[column for column in my_dataframe]

Do not convert a DataFrame into a list, just to get the column labels. Do not stop thinking while looking for convenient code samples.

import numpy as np
import pandas as pd

xlarge = pd.DataFrame(np.arange(100000000).reshape(10000,10000))
list(xlarge) # Compute time and memory consumption depend on dataframe size - O(N)
list(xlarge.keys()) # Constant time operation - O(1)
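
To see how `keys()` relates to `columns`, here is a minimal sketch on a small made-up frame. As far as I can tell, both return the same column labels, which is what the comparison below checks:

```python
import pandas as pd

# Small illustrative frame; the column names are made up for this example.
df = pd.DataFrame({"y": [1, 2], "gdp": [2, 3], "cap": [5, 9]})

keys_index = df.keys()      # method call
columns_index = df.columns  # attribute lookup

# Index.equals compares the labels element by element.
same_labels = keys_index.equals(columns_index)
labels = keys_index.tolist()

print(same_labels)
print(labels)
```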

answered Jan 23 '14 at 17:23 by Sascha Gottfried · edited Oct 22 '21 at 12:22 by Peter Mortensen

My tests show `df.columns` is a lot faster than `df.keys()`. Not sure why they have both a function and attribute for the same thing (well, it isn't the first time I've seen 10 different ways to do something in pandas). – cs95 Apr 03 '19 at 09:45
The intention of my answer was to show a couple of ways to query column labels from a DataFrame and highlight a performance anti-pattern. Nevertheless I like your comments and upvoted your recent answer - since they provide value from a software engineering point of view. – Sascha Gottfried Apr 09 '19 at 10:05

score 19 · Answer 9 · edited Oct 22 '21 at 12:37

Interestingly, df.columns.values.tolist() is almost three times faster than df.columns.tolist(), although I thought that they were the same:

In [97]: %timeit df.columns.values.tolist()
100000 loops, best of 3: 2.97 µs per loop

In [98]: %timeit df.columns.tolist()
10000 loops, best of 3: 9.67 µs per loop

answered Dec 04 '15 at 21:41 by Anton Protopopov · edited Oct 22 '21 at 12:37 by Peter Mortensen

Timings have already been covered in [this answer](https://stackoverflow.com/a/27236748/4909087). The reason for the discrepancy is because `.values` returns the underlying numpy array, and doing something with numpy is almost always faster than doing the same thing with pandas directly. – cs95 Apr 03 '19 at 09:48

score 14 · Answer 10 · edited Oct 22 '21 at 12:32

In the Notebook

For data exploration in the IPython notebook, my preferred way is this:

sorted(df)

Which will produce an easy to read alphabetically ordered list.

In a code repository

In code I find it more explicit to do

df.columns

Because it tells others reading your code what you are doing.

answered Mar 30 '16 at 07:19 by firelynx · edited Oct 22 '21 at 12:32 by Peter Mortensen

`sorted(df)` changes order. Use with caution. – cs95 Apr 03 '19 at 09:45
@coldspeed I do mention this though "Which will produce an easy to read alphabetically ordered list." – firelynx Apr 03 '19 at 11:48
`type(df.columns)` -> `pandas.core.indexes.base.Index`. Use `list(df.columns)` or `df.columns.values.tolist()`. – rubengavidia0x Jan 26 '22 at 21:34

score 9 · Answer 11 · edited Oct 22 '21 at 12:49

%%timeit
final_df.columns.values.tolist()
948 ns ± 19.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

%%timeit
list(final_df.columns)
14.2 µs ± 79.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%%timeit
list(final_df.columns.values)
1.88 µs ± 11.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

%%timeit
final_df.columns.tolist()
12.3 µs ± 27.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%%timeit
list(final_df.head(1).columns)
163 µs ± 20.6 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

answered Apr 16 '19 at 06:32 by rohit singh · edited Oct 22 '21 at 12:49 by Peter Mortensen

An explanation would be in order. E.g., what is the summary and conclusion? Please respond by [editing (changing) your answer](https://stackoverflow.com/posts/55701903/edit), not here in comments (***without*** "Edit:", "Update:", or similar - the answer should appear as if it was written today). – Peter Mortensen Oct 22 '21 at 12:48

score 3 · Answer 12 · edited Oct 22 '21 at 12:39

As answered by Simeon Visser, you could do

list(my_dataframe.columns.values)

or

list(my_dataframe) # For less typing.

But I think the sweet spot is:

list(my_dataframe.columns)

It is explicit and at the same time not unnecessarily long.

answered Feb 16 '18 at 18:36 by Vivek Puurkayastha · edited Oct 22 '21 at 12:39 by Peter Mortensen

"It is explicit, at the same time not unnecessarily long." I disagree. Calling `list` has no merit unless you are calling it on `df` directly (for example, for conciseness). Accessing the `.columns` attribute returns an `Index` object that has a `tolist()` method defined on it, and calling that is more idiomatic than listifying the `Index`. Mixing idioms just for the sake of completeness is not a great idea. Same goes for listifying the array you get from `.values`. – cs95 Apr 03 '19 at 09:42

score 3 · Answer 13 · answered Aug 22 '18 at 16:17

For a quick, neat, visual check, try this:

for col in df.columns:
    print(col)

answered Aug 22 '18 at 16:17 by Joseph True

score 2 · Answer 14 · edited Oct 22 '21 at 12:36

I feel the question deserves an additional explanation.

As fixxxer noted, the answer depends on the Pandas version you are using in your project, which you can get with pd.__version__.

If, like me, you are for some reason using a version of Pandas older than 0.16.0 (on Debian 8 (Jessie) I use 0.14.1), then you need to use:

df.keys().tolist()

because the df.columns.tolist() method isn't implemented there yet.

The advantage of this keys method is that it also works in newer versions of Pandas, so it's more universal.

answered Dec 13 '17 at 14:47 by StefanK · edited Oct 22 '21 at 12:36 by Peter Mortensen

The con of keys() is that it is a function call rather than an attribute lookup, so it's always going to be slower. Of course, with constant time accesses, no one really cares about differences like these, but I think it's worth mentioning anyway; df.columns is now a more universally accepted idiom for accessing headers. – cs95 Apr 04 '19 at 21:00

score 2 · Answer 15 · answered Jan 19 '22 at 01:02

The simplest option would be: list(my_dataframe.columns) or my_dataframe.columns.tolist()

No need for the complex stuff above :)

answered Jan 19 '22 at 01:02 by Grégoire

score 1 · Answer 16 · answered Oct 20 '13 at 21:43

n = []
for i in my_dataframe.columns:
    n.append(i)
print(n)

answered Oct 20 '13 at 21:43 by user21988

please replace it with a list comprehension. – Sascha Gottfried Jan 23 '14 at 16:22
change your first 3 lines to `[n for n in dataframe.columns]` – Anton Protopopov Dec 04 '15 at 21:31
Why would you want to go through all this trouble for an operation you can easily do in a single line? – cs95 Apr 03 '19 at 09:36
@cs95 I think the problem is C/C++/Python people here trying to answer in pandas. That happened to me when I was learning and solving problems in Python and R. Programmers trying to do pandas. – rubengavidia0x Jan 26 '22 at 21:38

score 1 · Answer 17 · answered Apr 02 '22 at 11:49

import pandas as pd

# create test dataframe
df = pd.DataFrame('x', columns=['A', 'B', 'C'], index=range(2))

list(df.columns)

Returns

['A', 'B', 'C']

answered Apr 02 '22 at 11:49 by gremur

score 0 · Answer 18 · answered Jan 16 '20 at 05:24

If the DataFrame happens to have an Index or MultiIndex and you want those included as column names too:

names = list(filter(None, df.index.names + df.columns.values.tolist()))

It avoids calling reset_index() which has an unnecessary performance hit for such a simple operation.
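
A minimal sketch of the idea, on a made-up frame with a named index. One deliberate deviation from the one-liner above: this variant converts `df.index.names` to a plain list before concatenating, which is an assumption made here to keep the addition independent of how pandas implements `+` on that object:

```python
import pandas as pd

# Hypothetical frame with a named index, standing in for a table keyed by "id".
df = pd.DataFrame(
    {"A": [1, 2], "B": [3, 4]},
    index=pd.Index([10, 20], name="id"),
)

# Index names come first, then column labels; filter(None, ...) drops
# unnamed index levels (None) along with any other falsy label.
names = list(filter(None, list(df.index.names) + df.columns.tolist()))
print(names)  # ['id', 'A', 'B']

# With a default, unnamed index only the column labels remain.
plain = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
plain_names = list(filter(None, list(plain.index.names) + plain.columns.tolist()))
print(plain_names)  # ['A', 'B']
```

One caveat with `filter(None, ...)`: it also drops falsy column labels such as `0` or an empty string, so a frame with integer column labels would need an explicit `is not None` check instead.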

I've run into needing this more often because I'm shuttling data from databases where the dataframe index maps to a primary/unique key, but is really just another "column" to me. It would probably make sense for pandas to have a built-in method for something like this (totally possible I've missed it).

score 0 · Answer 19 · answered Jun 03 '22 at 20:41

This is the easiest way to reach your goal.

my_dataframe.columns.values.tolist()

And if you are lazy, try this:

list(my_dataframe)

answered Jun 03 '22 at 20:41 by Nayem Jaman Tusher

score -1 · Answer 20 · edited Oct 22 '21 at 12:41

Even though the solution that was provided previously is nice, I would also expect something like frame.column_names() to be a function in Pandas. Since it is not, maybe it would be nice to use the following syntax. It somehow preserves the feeling that you are using pandas in a proper way by calling the "tolist" function:

frame.columns.tolist()

answered Feb 14 '19 at 10:58 by Igor Jakovljevic · edited Oct 22 '21 at 12:41 by Peter Mortensen

Re *"the solution"*: Which one are you referring to? Or do you refer to several solutions? – Peter Mortensen Oct 22 '21 at 12:41

score -1 · Answer 21 · answered Oct 27 '21 at 22:35

listHeaders = [colName for colName in my_dataframe]

answered Oct 27 '21 at 22:35 by Spesh

Your answer could be improved with additional supporting information. Please [edit] to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Community Oct 28 '21 at 02:28
