How to check if a column exists in Pandas

Question

Is there a way to check if a column exists in a Pandas DataFrame?

Suppose that I have the following DataFrame:

>>> import pandas as pd
>>> from random import randint
>>> df = pd.DataFrame({'A': [randint(1, 9) for x in range(10)],
...                    'B': [randint(1, 9)*10 for x in range(10)],
...                    'C': [randint(1, 9)*100 for x in range(10)]})
>>> df
   A   B    C
0  3  40  100
1  6  30  200
2  7  70  800
3  3  50  200
4  7  50  400
5  4  10  400
6  3  70  500
7  8  30  200
8  3  40  800
9  6  60  200

and I want to calculate df['sum'] = df['A'] + df['C']

But first I want to check if df['A'] exists, and if not, I want to calculate df['sum'] = df['B'] + df['C'] instead.

score 993 · Accepted Answer · answered Jul 21 '14 at 16:48


This will work:

if 'A' in df:

But for clarity, I'd probably write it as:

if 'A' in df.columns:
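Applied to the question, a minimal sketch of the full check with a fallback to column B might look like this. The small DataFrame is hypothetical sample data standing in for the randomly generated one above, with column A deliberately left out. `'A' in df` and `'A' in df.columns` give the same answer because a membership test on a DataFrame looks at its column labels, not at its values.

```python
import pandas as pd

# Hypothetical sample data; column 'A' is deliberately absent
df = pd.DataFrame({'B': [40, 30, 70], 'C': [100, 200, 800]})

# Membership on a DataFrame checks column labels, so both forms agree
assert ('A' in df) == ('A' in df.columns)

if 'A' in df.columns:
    df['sum'] = df['A'] + df['C']
else:
    # Fall back to column B when A is missing
    df['sum'] = df['B'] + df['C']

print(df['sum'].tolist())  # [140, 230, 870]
```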

chrisb


The other way around, one could use `if not 'A' in df.columns:` to execute an operation if `A` is not present in `df`. – Robvh Feb 05 '20 at 10:59

Additionally, you can check multiple columns with `if all(header in df.columns for header in ('A', 'B')):` – Joe Sadoski May 28 '21 at 14:11
@Robvh I think it is better to use `if 'A' not in df.columns:` rather than `if not 'A' in df.columns:`, because `not in` is a single Python operator, whereas `not A in B` evaluates `A in B` first and then applies the `not` operator. – Ramesh-X May 28 '22 at 03:19

C8H10N4O2 · Answer 2 · 2021-11-15T14:47:01.820

To check if one or more columns all exist, you can use set.issubset, as in:

if set(['A','C']).issubset(df.columns):
    df['sum'] = df['A'] + df['C']

As @brianpck points out in a comment, set(['A','C']) can alternatively be written as a set literal with curly braces:

if {'A', 'C'}.issubset(df.columns):

See this question for a discussion of the curly-braces syntax.

Or, you can use a generator expression with all(), as in:

if all(item in df.columns for item in ['A','C']):
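The three multi-column checks above are interchangeable. As a quick sanity check, here is a sketch on a small hypothetical DataFrame showing that they agree, and that each one turns False as soon as a single required column (here a hypothetical `'D'`) is missing.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({'A': [3, 6], 'B': [40, 30], 'C': [100, 200]})
required = ['A', 'C']

# Three equivalent ways to ask "do all required columns exist?"
by_issubset = set(required).issubset(df.columns)
by_literal = {'A', 'C'}.issubset(df.columns)
by_all = all(item in df.columns for item in required)

if by_issubset:
    df['sum'] = df['A'] + df['C']

# One missing column is enough to make the check fail
missing_check = set(['A', 'D']).issubset(df.columns)
```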

score 18 · Answer 3 · answered May 22 '17 at 18:28


Just to suggest another way without using if statements, you can use the get() method of DataFrames. To perform the sum from the question:

df['sum'] = df.get('A', df['B']) + df['C']

The DataFrame get method behaves like dict.get in Python: it returns the requested column if it exists and the default value otherwise.
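A sketch on hypothetical data showing the fallback in action, plus one caveat worth knowing: Python evaluates the default argument `df['B']` before `get()` is called, so the fallback column must exist even when `A` is present.

```python
import pandas as pd

# Hypothetical data: column 'A' is missing, so get() falls back to 'B'
df = pd.DataFrame({'B': [40, 30], 'C': [100, 200]})
df['sum'] = df.get('A', df['B']) + df['C']
print(df['sum'].tolist())  # [140, 230]

# Caveat: the default argument is evaluated before get() runs,
# so a missing fallback column raises even though 'A' exists.
df2 = pd.DataFrame({'A': [1, 2], 'C': [10, 20]})
raised = False
try:
    df2.get('A', df2['B'])
except KeyError:
    raised = True  # df2['B'] failed before get() could return column A
print(raised)  # True
```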

Gerges


Thank you, this works: `df['sum'] = df.get('A') + df['B'] + df['C']`. Alternatively, to avoid a column error if a column does not exist, use get() for all the terms, e.g. `df['sum'] = df.get('A') + df.get('B') + df.get('C')` – Santosh K Apr 05 '21 at 07:36
`df.get("A") + df.get("B")` still gives you an error if those don't exist, just the more confusing `TypeError: unsupported operand type(s) for +: 'NoneType' and 'NoneType'` rather than the easier-to-debug `KeyError`. `.get()` should only be used if you're actually planning on using the default value, otherwise it just pushes the error away from the point of failure and makes the state contract more confusing to intuit. The whole point of Gerges' answer is to use the second parameter to `.get()` to specify a column you know will exist as a fallback, not to let a bunch of Nones crash the code. – ggorlen Nov 11 '21 at 00:23
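To make the difference in error types concrete, here is a small sketch on hypothetical data containing neither `A` nor `B`. Without a default, `get()` returns `None`, so the failure surfaces as a `TypeError` on `None + None`, while plain indexing fails right at the missing label with a `KeyError`.

```python
import pandas as pd

# Hypothetical data with neither 'A' nor 'B'
df = pd.DataFrame({'C': [100, 200]})

# Without a default, get() returns None for a missing column
print(df.get('A'))  # None

get_error = None
try:
    df.get('A') + df.get('B')  # evaluates to None + None
except TypeError as exc:
    get_error = type(exc).__name__

index_error = None
try:
    df['A'] + df['B']  # fails on the first missing label
except KeyError as exc:
    index_error = type(exc).__name__

print(get_error, index_error)  # TypeError KeyError
```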

Mykola Zotko · Answer 4 · 2022-02-21T15:39:49.693


You can use the set method issuperset:

set(df).issuperset(['A', 'B'])
# set(df.columns).issuperset(['A', 'B'])
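Both lines work because iterating over a DataFrame yields its column labels, so `set(df)` and `set(df.columns)` build the same set. A sketch on hypothetical data, also showing that `issuperset` is simply the mirror image of the `issubset` check from the previous answer:

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({'A': [3, 6], 'B': [40, 30], 'C': [100, 200]})

# Iterating over a DataFrame yields column labels
assert set(df) == set(df.columns)

has_a_and_b = set(df).issuperset(['A', 'B'])
has_a_and_d = set(df).issuperset(['A', 'D'])

# issuperset on the columns equals issubset on the required labels
assert has_a_and_b == set(['A', 'B']).issubset(df.columns)
```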

answered Jan 11 '22 at 09:02

