243

I am trying to read a macro-enabled Excel worksheet using pandas.read_excel with the xlrd library. It's running fine in local, but when I try to push the same into PCF, I am getting this error:

2020-12-11T21:09:53.441+05:30 [APP/PROC/WEB/0] [ERR] df1=pd.read_excel(os.path.join(APP_PATH, os.path.join("Data", "aug_latest.xlsm")),sheet_name=None)

2020-12-11T21:09:53.441+05:30 [APP/PROC/WEB/0] [ERR] return open_workbook(filepath_or_buffer)
2020-12-11T21:09:53.441+05:30 [APP/PROC/WEB/0] [ERR] File "/home/vcap/deps/0/python/lib/python3.8/site-packages/xlrd/__init__.py", line 170, in open_workbook
2020-12-11T21:09:53.441+05:30 [APP/PROC/WEB/0] [ERR] raise XLRDError(FILE_FORMAT_DESCRIPTIONS[file_format]+'; not supported')
2020-12-11T21:09:53.441+05:30 [APP/PROC/WEB/0] [ERR] xlrd.biffh.XLRDError: Excel xlsx file; not supported

How can I resolve this error?

Peter Mortensen
  • 30,030
  • 21
  • 100
  • 124
Vignesh K
  • 2,449
  • 2
  • 4
  • 6
  • 2
    Does this answer your question? [pandas cannot open xlsx file](https://stackoverflow.com/questions/65250207/pandas-cannot-open-xlsx-file) – Chris Withers Dec 13 '20 at 18:29

2 Answers2

448

As noted in the release email, linked to from the release tweet and noted in large orange warning that appears on the front page of the documentation, and less orange, but still present, in the readme on the repository and the release on pypi:

xlrd has explicitly removed support for anything other than xls files.

In your case, the solution is to:

  • make sure you are on a recent version of Pandas, at least 1.0.1, and preferably the latest release. 1.2 will make his even clearer.
  • install openpyxl: https://openpyxl.readthedocs.io/en/stable/
  • change your Pandas code to be:
    df1 = pd.read_excel(
         os.path.join(APP_PATH, "Data", "aug_latest.xlsm"),
         engine='openpyxl',
    )
    
Peter Mortensen
  • 30,030
  • 21
  • 100
  • 124
Chris Withers
  • 9,825
  • 4
  • 30
  • 47
  • what if you don't know the sheet name? can you pass this to pd.ExcelFile? – Christopher Turnbull Dec 13 '20 at 18:10
  • 11
    Chris, thanks for the xlrd update to support Python 3.9. However, this is a major change in the package with no deprecation warning, so I would suggest a more informative error message, e.g. clarifying when (date and version) xlrd dropped support for non-xls files. – khyox Dec 14 '20 at 02:05
  • 1
    @kyox - there was a notice on the repo for over a year and various announcements on the mailing list and elsewhere going back over four years. – Chris Withers Dec 14 '20 at 07:17
  • 2
    @ChristopherTurnbull specifying the sheet name is optional. If you omit it, the first sheet in the file will be opened. – data.dude Dec 14 '20 at 11:36
  • 1
    As per this pandas developer and the discussion above it [link](https://github.com/pandas-dev/pandas/issues/28547#issuecomment-743125066) apparently xlrd no longer supports .xlsx files. One should wait for the newest pandas version 1.2.0 or put the parameter read_excel(engine='openpyxl') – BrunoSE Dec 22 '20 at 15:27
  • 2
    I install pandas==1.1.4 and xlrd==1.2.0 – Kairat Koibagarov Jan 12 '21 at 08:08
  • 3
    @KairatKoibagarov - as I've said many times now, in my capacity as the maintainer of xlrd: that is an extremely poor choice. – Chris Withers Jan 12 '21 at 13:00
  • 20
    Installing the module `pip install openpyxl` and including in all my read_excel functions the openpyxl engine `read_excel("my.xlsx",engine='openpyxl') ` saved my code and my time! Thank you so much @ChrisWithers! – Corina Roca Jan 13 '21 at 13:21
  • 3
    As a user who didn't actually KNOW pandas was using xlrd to open xlsx files, a deprecation warning coming from the code would have been REALLY useful... I can't read all of the mailing lists of all of the libraries that I might POSSIBLY be using, somewhere 3 layers deep in my code... – Brian Postow Jun 10 '21 at 18:29
  • 19
    Good answer, but the passive aggressive, condescending tone isn't helpful to the numerous less technical users of pandas. Like a grumpy TSA screener, you're assuming that every member of the public is as deeply familiar as you are with a piece of software. – JPKab Jun 23 '21 at 21:20
  • 1
    As someone who came here from google, the condescending tone is not warranted. – CrazyPenguin Nov 02 '21 at 13:56
  • 1
    Oh, it absolutely is. The sense of entitlement around a freely provided piece of code and plaintiff whining of people who couldn't be bothered to read release notes is what's not warranted. – Chris Withers Nov 02 '21 at 15:11
  • 2
    There's no point in mentioning all the places where the warning is other than trying to shame OP. – gmelodie Jan 08 '22 at 15:08
  • What speaks again making openpyxl the standard reader in Pandas? De facto all my excel are xlxs and require openpyxl. – n3rd Feb 07 '22 at 19:55
223

The previous version, xlrd 1.2.0, may appear to work, but it could also expose you to potential security vulnerabilities. With that warning out of the way, if you still want to give it a go, type the following command:

pip install xlrd==1.2.0
Peter Mortensen
  • 30,030
  • 21
  • 100
  • 124
tryhard
  • 2,335
  • 1
  • 2
  • 3