Currently I am trying to write a Node-flavored Lambda function that will download, compress and upload a file to S3 "on the fly", without storing it on disk. Some files can be several GB in size, Lambda has a disk limit of 512 MB, and upgrading to more memory is costly.

At first I was able to achieve .gz compression successfully, but sadly the requirement is to produce a .zip, which I thought would be trivial but, to my surprise, wasn't. The following code worked for gzip:

const zlib = require('zlib');
// Prepare a read stream from a file in S3
const fileStream = s3.getObject(objectDescriptor).createReadStream();
fileStream
  .pipe(zlib.createGzip()) // compress on the fly
  .pipe(new MultiPartUploadS3(createMultipartUploadRequest, fileStream));

In case you wonder, MultiPartUploadS3 is a write stream class that takes care of managing memory usage by pausing fileStream if necessary, opening the S3 connection and uploading parts.

With the above code I can create a compressed version of a single file, data.dat -> data.gz, keeping memory usage low even for large files.
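As an aside, the stock AWS SDK uploader can play the same role as my class: s3.upload() accepts a stream as Body and manages the multipart upload internally, buffering only a few parts in memory at a time. A minimal sketch of the same gzip pipeline (bucket and key names are made up):

const AWS = require('aws-sdk');
const zlib = require('zlib');

const s3 = new AWS.S3();

// Hypothetical bucket/key names, for illustration only
const source = s3
  .getObject({ Bucket: 'my-bucket', Key: 'big/data.dat' })
  .createReadStream();

// s3.upload() takes a stream Body and performs a managed
// multipart upload, so nothing ever touches the local disk
s3.upload(
  { Bucket: 'my-bucket', Key: 'big/data.gz', Body: source.pipe(zlib.createGzip()) },
  (err, data) => {
    if (err) return console.error('upload failed', err);
    console.log('uploaded to', data.Location);
  }
);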

However, I have not been able to write an equivalent that performs a valid data.dat -> data.zip compression without storing the file locally, which would not scale. zlib is not my friend here, since it cannot produce a zip file as easily as it produces a gz file. I have checked several libraries, but none of them seem to accept read streams as input: they read a file from disk, and then some can write the resulting archive to a write stream, something like the example below:

const ZipFancyLib = require('some-fancy-zipping-lib');

const zip = new ZipFancyLib();
zip.addFile('./some/local/file.dat'); // Not good: reads from disk
zip.stream().pipe(new MultiPartUploadS3(...));

I understand there is a limitation in the zlib library: it provides the compression, but not the archiving layer (local file headers, central directory) that zip needs, and writing my own code for that might be too complex, so I am open to using libs. But why do none of them accept streams as input? Is it not viable to read, compress and write the result sequentially into a zip? Is it a constraint of Node? It seems possible in Python and Java, though.
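To make the question concrete, the stream-in/stream-out shape I am looking for would be something like the sketch below. StreamingZip, addEntry() and finalize() are made-up names, not a real library; since the zip format allows sizes and CRCs to be written in data descriptors after each entry, an API of this shape should in principle be possible:

// Hypothetical API -- StreamingZip, addEntry() and finalize()
// do not exist; this is only the shape of what I am looking for
const zip = new StreamingZip();

// Entries are read streams, not paths on disk
zip.addEntry('data.dat', s3.getObject(objectDescriptor).createReadStream());

// The archive itself is a read stream, so it can be piped
// straight into the multipart uploader without touching disk
zip.stream().pipe(new MultiPartUploadS3(createMultipartUploadRequest));
zip.finalize(); // writes the central directory at the end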

Ivancho