
I'm playing with Node.js streams to get more experience with them and as a way of handling extremely large amounts of generated data (see this answer as to how I am generating it). For testing/experimentation purposes I've been doing something like this:

const { Readable } = require('stream');

// The generator will produce in excess of 1,897,280,473
// objects, and probably even more in the future.
const iterStream = Readable.from(myPermutationsGenerator);
let chunkCount = 0;
let streamByteCount = 0;
let streamStart = new Date();
iterStream.on('data', chunk => {
  console.log(chunk);
  chunkCount++;
  streamByteCount += chunk.length;
  // The file would be much larger, but I don't want to
  // write everything while experimenting.
  if (streamByteCount > 1e9) iterStream.destroy();
});
iterStream.on('close', () => console.log(
  getClosingLog(streamByteCount, chunkCount, streamStart)
));

And then I redirect the output to a file in WSL1 with npm run noodle > output.log.
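(For context, noodle is just the npm script that runs the snippet above; the entry-point file name in this sketch is an assumption, not the real one:)

{
  "scripts": {
    "noodle": "node permutation-stream.js"
  }
}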

And I get something like:

[
  'black,green,black,orange,green',
  'black,green,black,orange,green',
  'orange,green,black,orange,green',
  'black,green,black,orange,green',
  'black,green,black,orange,green',
  'black,green,black,green,green',
]

for each chunk made by the generator I linked above, when I feed in an array of 200-300 strings that look like 'black,green,black,orange,green'.

I put in that guard on bytes written because I almost filled up my disk (it's an old machine without much free space).

This has been fine so far, but I wanted to experiment with a compression library to see if I can get the file smaller while still keeping each individual chunk searchable in the future (i.e. compress individual chunks so I can still search within them later).
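What I have in mind for per-chunk compression is roughly this sketch, using LZUTF8's synchronous compress/decompress API (the chunk contents here are just illustrative):

const LZUTF8 = require('lzutf8');

// Compress one chunk's worth of strings on its own, so it can later be
// decompressed and searched independently of the rest of the file.
const chunk = [
  'black,green,black,orange,green',
  'orange,green,black,orange,green',
];
const compressed = LZUTF8.compress(chunk.join('\n'));
const restored = LZUTF8.decompress(compressed);
console.log(restored.includes('orange,green,black,orange,green')); // true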

However, when I'm using LZUTF8.js like so:

const LZUTF8 = require('lzutf8');

const compressionStream = LZUTF8.createCompressionStream();
iterStream.pipe(compressionStream).on('data', chunk => {
  console.log(chunk);
  chunkCount++;
  streamByteCount += chunk.length;
  // The file would be much larger, but I don't want to
  // write everything while experimenting.
  if (streamByteCount > 1e9) iterStream.destroy();
});

I now get the error:

NodeError [TypeError]: The "chunk" argument must be one of type string or Buffer. Received Object. ...stackTrace

So firstly, I thought .from(iterable) and using .on('data') put the stream into a mode that would work with objects. Secondly, LZUTF8.js says:

Creates a compression stream. The stream will accept both Buffers and Strings in any encoding supported by Node.js (e.g. utf8, utf16, ucs2, base64, hex, binary etc.) and return Buffers.

So while I don't know why the data event listener is complaining, the Node.js docs do say to choose one API style:

Specifically, using a combination of on('data'), on('readable'), pipe(), or async iterators could lead to unintuitive behavior.

Which I'm guessing is what I'm seeing?
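(A quick check along these lines would back up the "mode" guess: Readable.from() defaults to objectMode, so the chunks coming out of the stream are the raw arrays the generator yields rather than strings or Buffers.)

// Sanity check on the stream from the first snippet:
console.log(iterStream.readableObjectMode); // true - chunks are plain arrays, not strings/Buffers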

It makes sense to switch to a full .pipe(...) implementation using createWriteStream instead of console.log and shell redirection, but then I lose the ability to abort if the file gets too big.
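For clarity, this is roughly what I mean by the full pipe version (the output file name is just an example, and it sets aside the objectMode error above):

const { createWriteStream } = require('fs');

iterStream
  .pipe(compressionStream)
  .pipe(createWriteStream('output.lz'));
// ...but now there is no obvious place to check streamByteCount
// and call destroy() once too much has been written.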

So I guess this is all a big preamble to asking: how do I prevent a write stream from writing too much data when I can't listen for 'data' events on the read stream?
