Skip specific set of columns when reading excel frame - pandas

Question

I know beforehand which columns I don't need from an Excel file, and I'd like to skip them when reading the file to improve performance. Something like this:

import pandas as pd
df = pd.read_excel('large_excel_file.xlsx', skip_cols=['col_a', 'col_b',...,'col_zz'])

There is nothing related to this in the documentation. Is there any workaround for this?

@Aran-Fey It is possible but the list of columns to use would be significantly large compared with the unused columns list (160 vs 30) — Juan David, Apr 05 '18 at 16:37
do you know the indices (positions) of columns that you want to skip? — MaxU - stop genocide of UA, Apr 05 '18 at 16:44
@MaxU Yes. I can determine the indices of the columns to skip — Juan David, Apr 05 '18 at 17:00
duplicate of https://stackoverflow.com/questions/24366449/python-pandas-how-to-skip-columns-when-reading-a-file — MarMat, May 22 '19 at 06:55

score 19 · Answer 1 · answered May 22 '19 at 08:28

If your version of pandas allows (check first if you can pass a function to usecols), I would try something like:

import pandas as pd
df = pd.read_excel('large_excel_file.xlsx', usecols=lambda x: 'Unnamed' not in x)

This should skip all columns without header names, which pandas labels 'Unnamed: N'. To skip specific columns instead, replace the 'Unnamed' test with a membership check against a list of the column names you do not want.
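To skip a known list of column names rather than the unnamed ones, the callable can test membership in a set. A minimal sketch, assuming your pandas version accepts a callable for usecols; the helper name skip_columns and the column names are made up for illustration:

```python
def skip_columns(names_to_skip):
    """Build a predicate for usecols that rejects the given header names."""
    skip = set(names_to_skip)  # set lookup stays cheap even with many names
    # pandas calls the predicate once per column name and keeps the
    # column when it returns True
    return lambda name: name not in skip

keep = skip_columns(['col_a', 'col_b'])  # hypothetical column names

# Usage (needs pandas and an Excel engine installed):
# df = pd.read_excel('large_excel_file.xlsx', usecols=keep)
```

This way only the short list of unwanted names has to be written out, not the long list of columns to keep.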

MarMat

Note that `usecols` also accepts column letters as a string: usecols = "A,C:AA" – neves Jun 11 '20 at 00:16

score 14 · Accepted Answer · edited Aug 09 '20 at 09:00

You can use the following technique. Suppose the columns we want to skip are at positions 2, 5 and 8, and the sheet has 10 columns. Build the list of remaining columns, the ones we DO WANT TO KEEP, as cols:

In [7]: cols2skip = [2,5,8]  
In [8]: cols = [i for i in range(10) if i not in cols2skip]

In [9]: cols
Out[9]: [0, 1, 3, 4, 6, 7, 9]

and then we can pass those remaining columns (which we DO WANT TO KEEP) to usecols:

df = pd.read_excel(filename, usecols=cols)
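The range(10) above hard-codes the number of columns in the sheet. A small sketch that takes the column count as a parameter; the helper name positions_to_keep is hypothetical, and reading only the header with nrows=0 is just one possible way to get the count:

```python
def positions_to_keep(n_cols, cols_to_skip):
    """Return the 0-based column positions that are not in cols_to_skip."""
    skip = set(cols_to_skip)
    return [i for i in range(n_cols) if i not in skip]

cols = positions_to_keep(10, [2, 5, 8])
print(cols)

# With pandas, the count could come from the header row alone (assumption:
# the first row of the sheet is the header):
# n_cols = pd.read_excel(filename, nrows=0).shape[1]
# df = pd.read_excel(filename, usecols=positions_to_keep(n_cols, [2, 5, 8]))
```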

edited Aug 09 '20 at 09:00

Anurag Dhadse

answered Apr 05 '18 at 17:14

MaxU - stop genocide of UA

I think this is more 'Pythonic' than @MarMat's answer, as this uses a readable list comprehension in 2 lines, and the other uses a lambda. My understanding is to always avoid lambda in Python if you can use a list comprehension, and lambda is rarely much faster. If you want someone else to understand your code quicker, this will be easier imho. If you are processing Excel and you find one of the columns is a binary image string (I get that surprisingly often), this is quite useful! – Will Croxford Feb 15 '21 at 17:19
