38

A simple program for reading a CSV file inside a zip file works in Python 2.7, but not in Python 3.2:

$ cat test_zip_file_py3k.py 
import csv, sys, zipfile

zip_file    = zipfile.ZipFile(sys.argv[1])
items_file  = zip_file.open('items.csv', 'rU')

for row in csv.DictReader(items_file):
    pass

$ python2.7 test_zip_file_py3k.py ~/data.zip

$ python3.2 test_zip_file_py3k.py ~/data.zip
Traceback (most recent call last):
  File "test_zip_file_py3k.py", line 8, in <module>
    for row in csv.DictReader(items_file):
  File "/home/msabramo/run/lib/python3.2/csv.py", line 109, in __next__
    self.fieldnames
  File "/home/msabramo/run/lib/python3.2/csv.py", line 96, in fieldnames
    self._fieldnames = next(self.reader)
_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)

So the csv module in Python 3 wants to see a text file, but zipfile.ZipFile.open returns a zipfile.ZipExtFile that is always treated as binary data.

How does one make this work in Python 3?

isherwood
Marc Abramowitz

5 Answers

33

I just noticed that Lennart's answer didn't work with Python 3.1, but it does work with Python 3.2. They've enhanced zipfile.ZipExtFile in Python 3.2 (see release notes). These changes appear to make zipfile.ZipExtFile work nicely with io.TextIOWrapper.

Incidentally, it also works in Python 3.1 if you uncomment the hacky lines below to monkey-patch zipfile.ZipExtFile. I don't recommend this sort of hackery; I include it only to illustrate the essence of what was done in Python 3.2 to make things work nicely.

$ cat test_zip_file_py3k.py 
import csv, io, sys, zipfile

zip_file    = zipfile.ZipFile(sys.argv[1])
# Note: ZipFile.open rejects 'rU' on Python 3.6+; use 'r' (the default) there.
items_file  = zip_file.open('items.csv', 'rU')
# items_file.readable = lambda: True
# items_file.writable = lambda: False
# items_file.seekable = lambda: False
# items_file.read1 = items_file.read
items_file  = io.TextIOWrapper(items_file)

for idx, row in enumerate(csv.DictReader(items_file)):
    print('Processing row {0} -- row = {1}'.format(idx, row))

If I had to support py3k < 3.2, then I would go with the solution in my other answer.
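The commented-out lines above fake the methods that io.TextIOWrapper expects from its buffer. As a rough check, assuming a current Python 3, the sketch below opens a member of a small in-memory archive and reports whether those methods are present. The archive contents and the `interface` dictionary are made up for illustration.

```python
import io
import zipfile

# Build a tiny archive in memory so the sketch needs no files on disk.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("items.csv", "name,qty\nwidget,3\n")

with zipfile.ZipFile(buffer) as archive:
    with archive.open("items.csv") as member:
        # These are the methods the monkey-patch supplies by hand on 3.1.
        interface = {
            "readable": member.readable(),
            "writable": member.writable(),
            "has_read1": hasattr(member, "read1"),
        }

print(interface)
```

If all of these are in place, wrapping the member in io.TextIOWrapper should need no patching.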

Marc Abramowitz
14

You can wrap it in an io.TextIOWrapper.

items_file  = io.TextIOWrapper(items_file, encoding='your-encoding', newline='')

Should work.
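For a self-contained illustration, here is a minimal sketch of that wrapping on a current Python 3. The helper name `read_csv_rows`, the in-memory demo archive and the UTF-8 encoding are assumptions made for the example; substitute your own file and encoding.

```python
import csv
import io
import zipfile


def read_csv_rows(zip_source, member_name, encoding="utf-8"):
    """Return the rows of a CSV member of a zip archive as a list of dicts."""
    with zipfile.ZipFile(zip_source) as archive:
        with archive.open(member_name) as raw:
            # newline='' lets the csv module handle line endings itself.
            text = io.TextIOWrapper(raw, encoding=encoding, newline="")
            return list(csv.DictReader(text))


# Demo archive built in memory; a path such as sys.argv[1] works the same way.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as archive:
    archive.writestr("items.csv", "name,qty\r\nwidget,3\r\ngadget,5\r\n")

rows = read_csv_rows(demo, "items.csv")
print(rows)
```

The wrapper decodes lazily, so the member is streamed rather than read into memory in one go. The helper collects rows into a list only to keep the demo short.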

Marc Abramowitz
Lennart Regebro
9

And if you just want to read a file into a string:

from zipfile import ZipFile

with ZipFile('spam.zip') as myzip:
    with myzip.open('eggs.txt') as myfile:
        # read() returns bytes; decode them to get a str
        eggs = myfile.read().decode('UTF-8')
Arigion
  • Note this is not the equivalent of opening the file in text mode and I discovered this to my cost today, although according to https://docs.python.org/3/library/io.html this could change in the future! – Keeely May 17 '22 at 11:57
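One difference between the two approaches, which may or may not be the one the comment has in mind, is newline handling: decoding the raw bytes keeps `\r\n` as is, while io.TextIOWrapper with its default settings translates line endings to `\n`. A small sketch, using a made-up in-memory archive:

```python
import io
import zipfile

# Demo archive with Windows-style line endings.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as archive:
    archive.writestr("eggs.txt", b"spam\r\neggs\r\n")

with zipfile.ZipFile(demo) as archive:
    # Approach 1: read the bytes and decode them by hand.
    with archive.open("eggs.txt") as member:
        decoded = member.read().decode("utf-8")
    # Approach 2: let TextIOWrapper decode (default newline translation).
    with archive.open("eggs.txt") as member:
        wrapped = io.TextIOWrapper(member, encoding="utf-8").read()

print(repr(decoded))  # 'spam\r\neggs\r\n'
print(repr(wrapped))  # 'spam\neggs\n'
```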
3

Lennart's answer is on the right track (Thanks, Lennart, I voted up your answer) and it almost works:

$ cat test_zip_file_py3k.py 
import csv, io, sys, zipfile

zip_file    = zipfile.ZipFile(sys.argv[1])
items_file  = zip_file.open('items.csv', 'rU')
items_file  = io.TextIOWrapper(items_file, encoding='iso-8859-1', newline='')

for idx, row in enumerate(csv.DictReader(items_file)):
    print('Processing row {0}'.format(idx))

$ python3.1 test_zip_file_py3k.py ~/data.zip
Traceback (most recent call last):
  File "test_zip_file_py3k.py", line 7, in <module>
    items_file  = io.TextIOWrapper(items_file, 
                                   encoding='iso-8859-1', 
                                   newline='')
AttributeError: readable

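The problem appears to be that io.TextIOWrapper's first required parameter is a buffer, not a plain file object, and the ZipExtFile returned in Python 3.1 lacks the readable() method that the wrapper calls.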

This appears to work:

items_file  = io.TextIOWrapper(io.BytesIO(items_file.read()))

This seems a little complex, and it is annoying to have to read an entire (perhaps huge) archive member into memory. Is there a better way?

Here it is in action:

$ cat test_zip_file_py3k.py 
import csv, io, sys, zipfile

zip_file    = zipfile.ZipFile(sys.argv[1])
items_file  = zip_file.open('items.csv', 'rU')
items_file  = io.TextIOWrapper(io.BytesIO(items_file.read()))

for idx, row in enumerate(csv.DictReader(items_file)):
    print('Processing row {0}'.format(idx))

$ python3.1 test_zip_file_py3k.py ~/data.zip
Processing row 0
Processing row 1
Processing row 2
...
Processing row 250
Marc Abramowitz
  • [Another answer from me](http://stackoverflow.com/questions/5627954/py3k-how-do-you-read-a-file-inside-a-zip-file-as-text-not-bytes/5639960#5639960) describes a better way that works in Python 3.2. – Marc Abramowitz Apr 12 '11 at 22:21
0

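Starting from Python 3.8, a tidier approach using a zipfile.Path object can be used:

import csv, io, sys, zipfile

# Avoid rebinding the name `zipfile`, which would shadow the module.
items_path = zipfile.Path(sys.argv[1], at='items.csv')
# read_text() returns a str; wrap it so csv iterates over lines, not characters.
items_file = io.StringIO(items_path.read_text())

for idx, row in enumerate(csv.DictReader(items_file)):
    print('Processing row {0}'.format(idx))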
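On Python 3.9 and later, zipfile.Path.open defaults to text mode, so the member can be handed to csv directly without reading it all into a string first. The following is a sketch under that version assumption; the in-memory archive and the UTF-8 encoding are made up for the example.

```python
import csv
import io
import zipfile

# Demo archive built in memory; a real path works the same way.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as archive:
    archive.writestr("items.csv", "name,qty\nwidget,3\n")

with zipfile.ZipFile(demo) as archive:
    items_path = zipfile.Path(archive, at="items.csv")
    # Python 3.9+: open() returns a text stream; extra keyword arguments
    # are passed on to io.TextIOWrapper.
    with items_path.open(encoding="utf-8", newline="") as items_file:
        rows = list(csv.DictReader(items_file))

print(rows)
```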
Yury