I have a large zip file that I would like to unzip without loading all of its bytes into memory, ideally while the zipped bytes are still being fetched via an HTTP request.

How can this be done from Python?

Note: I am specifically asking about the zip format, not gzip. Questions such as "Python unzipping stream of bytes?" often use the word "zip", but appear to be about gzip.

Michal Charemza
  • 24,475
  • 11
  • 89
  • 143
  • Does this answer your question? [Python unzipping stream of bytes?](https://stackoverflow.com/questions/12571913/python-unzipping-stream-of-bytes) – Anton Curmanschii May 16 '21 at 08:32
  • @AntonCurmanschii I don't think so: although that question's title says "zip", I think the contents are more about gzip? – Michal Charemza May 16 '21 at 08:34

2 Answers

By calling funzip from within Python, which can be done using iterable-subprocess [disclaimer: written by me], you can unzip the first file in a ZIP archive as the bytes arrive:

from iterable_subprocess import iterable_subprocess
import httpx

def zipped_chunks():
    with httpx.stream('GET', 'https://www.example.com/my.zip') as r:
        yield from r.iter_bytes()

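# Recent releases of iterable-subprocess expose a context manager,
# so the process is cleaned up when the block exits
with iterable_subprocess(['funzip'], zipped_chunks()) as unzipped_chunks:
    for chunk in unzipped_chunks:
        print(chunk)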
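
For readers wondering what piping an iterable through a process involves, here is a rough standard-library sketch of the general idea. It is not necessarily how iterable-subprocess works internally. The name `pipe_through` is hypothetical, and the `echo` command is only a stand-in so the example runs without funzip installed.

```python
import subprocess
import sys
import threading


def pipe_through(args, chunks, read_size=65536):
    """Yield the stdout of the process `args` while feeding `chunks` to its stdin."""
    proc = subprocess.Popen(args, stdin=subprocess.PIPE, stdout=subprocess.PIPE)

    def feed_stdin():
        # Runs in a thread so writing stdin and reading stdout cannot deadlock
        try:
            for chunk in chunks:
                proc.stdin.write(chunk)
        except BrokenPipeError:
            pass  # the process stopped reading early
        finally:
            try:
                proc.stdin.close()
            except OSError:
                pass

    feeder = threading.Thread(target=feed_stdin, daemon=True)
    feeder.start()
    try:
        while True:
            output = proc.stdout.read(read_size)
            if not output:
                break
            yield output
    finally:
        proc.stdout.close()
        feeder.join()
        proc.wait()


# Stand-in command (assumption): a Python one-liner that copies stdin to stdout
echo = [
    sys.executable, '-c',
    'import shutil, sys; shutil.copyfileobj(sys.stdin.buffer, sys.stdout.buffer)',
]
payload_chunks = [bytes([i]) * 100_000 for i in range(10)]
echoed = b''.join(pipe_through(echo, iter(payload_chunks)))
```

Replacing `echo` with `['funzip']` and `payload_chunks` with the zipped chunks would give roughly the behaviour described above, though this sketch does not check the exit code or report errors from the process.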
Michal Charemza

It is possible to do this from within Python without calling out to an external process, and in a way that handles every file in the zip, not just the first.

This can be done by using stream-unzip [disclaimer: written by me].

from stream_unzip import stream_unzip
import httpx

def zipped_chunks():
    with httpx.stream('GET', 'https://www.example.com/my.zip') as r:
        yield from r.iter_bytes()

for file_name, file_size, file_chunks in stream_unzip(zipped_chunks()):
    for chunk in file_chunks:
        print(chunk)
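
To see why streaming is possible at all: each member in a ZIP file is preceded by a local file header, and deflate data marks its own end, so it can be decompressed incrementally. Below is a minimal standard-library sketch of that idea. It is not a description of how stream-unzip works internally, and `unzip_first_member` is a hypothetical name. It assumes the first member uses deflate, and it skips CRC checks, encryption, and every member after the first.

```python
import io
import struct
import zipfile
import zlib

# Fixed 30-byte part of a ZIP local file header (little-endian)
LOCAL_HEADER = struct.Struct('<4sHHHHHIIIHH')


def unzip_first_member(chunks):
    """Yield decompressed bytes of the first member of a streamed ZIP file."""
    chunks = iter(chunks)
    buffer = b''

    def read_exactly(n):
        # Buffer only as many input chunks as are needed to return n bytes
        nonlocal buffer
        while len(buffer) < n:
            try:
                buffer += next(chunks)
            except StopIteration:
                raise ValueError('ZIP stream ended unexpectedly') from None
        data, buffer = buffer[:n], buffer[n:]
        return data

    fields = LOCAL_HEADER.unpack(read_exactly(LOCAL_HEADER.size))
    signature, method, name_len, extra_len = fields[0], fields[3], fields[9], fields[10]
    if signature != b'PK\x03\x04':
        raise ValueError('not a ZIP local file header')
    if method != 8:
        raise ValueError('only deflate is handled in this sketch')
    read_exactly(name_len + extra_len)  # skip the file name and extra field

    # Negative wbits selects raw deflate, which is what ZIP members contain
    decompressor = zlib.decompressobj(-zlib.MAX_WBITS)
    pending = buffer
    while True:
        if pending:
            output = decompressor.decompress(pending)
            if output:
                yield output
        if decompressor.eof:
            return  # end of the first member; the rest of the stream is ignored
        try:
            pending = next(chunks)
        except StopIteration:
            raise ValueError('ZIP stream ended before the member was complete') from None


# Build a small archive in memory and feed it through in 1 KiB chunks
payload = b'streaming unzip ' * 50_000
archive = io.BytesIO()
with zipfile.ZipFile(archive, 'w', zipfile.ZIP_DEFLATED) as zf:
    zf.writestr('first.txt', payload)
    zf.writestr('second.txt', b'ignored by this sketch')
raw = archive.getvalue()
zipped = (raw[i:i + 1024] for i in range(0, len(raw), 1024))
result = b''.join(unzip_first_member(zipped))
```

Handling several members, data descriptors, ZIP64 and integrity checks takes considerably more care, which is the reason to prefer a dedicated library for real use.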
Michal Charemza