I'm playing with Node.js streams to get more experience with them and as a way of handling extremely large amounts of generated data (see this answer for how I am producing it); for testing/experimentation purposes I have been doing something like:
const { Readable } = require('stream');

// generator will make in excess of 1,897,280,473
// objects, but probably even more in the future
const iterStream = Readable.from(myPermutationsGenerator);

let chunkCount = 0;
let streamByteCount = 0;
let streamStart = new Date();

iterStream.on('data', chunk => {
  console.log(chunk);
  chunkCount++;
  streamByteCount += chunk.length;
  // File would be much larger, but I don't want to
  // write everything while experimenting
  if (streamByteCount > 1e9) iterStream.destroy();
});

iterStream.on('close', () => console.log(
  getClosingLog(streamByteCount, chunkCount, streamStart)
));
And then I redirect it to a file in WSL1 with npm run noodle > output.log.
And I get something like:
[
'black,green,black,orange,green',
'black,green,black,orange,green',
'orange,green,black,orange,green',
'black,green,black,orange,green',
'black,green,black,orange,green',
'black,green,black,green,green',
]
for each chunk made by the generator I linked above, when I feed in an array of 200-300 strings that look like 'black,green,black,orange,green'.
I put in that guard on bytes written because I almost filled up my disk (it's an old machine without much free space).
This has been fine so far, but I wanted to experiment with a compression library to see if I can get the file smaller while still keeping each individual chunk searchable in the future (i.e. compress individual chunks and search within them later).
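Roughly what I have in mind for the searchable-chunk idea is something like this (untested sketch; I'm assuming LZUTF8's synchronous compress function and its outputEncoding: 'Buffer' option behave as its docs describe, and the storage/offset bookkeeping is hand-waved):

const LZUTF8 = require('lzutf8');

// sketch: compress each chunk on its own, so a single chunk can later be
// decompressed and searched without touching the rest of the file
iterStream.on('data', chunk => {
  // chunk is an array of strings, so join it before compressing
  const compressed = LZUTF8.compress(chunk.join('\n'), { outputEncoding: 'Buffer' });
  // ...write `compressed` somewhere, keeping track of offsets/lengths
});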
However, when I'm using LZUTF8.js like so:
const LZUTF8 = require('lzutf8');

const compressionStream = LZUTF8.createCompressionStream();

iterStream.pipe(compressionStream).on('data', chunk => {
  console.log(chunk);
  chunkCount++;
  streamByteCount += chunk.length;
  // File would be much larger, but I don't want to write everything while experimenting
  if (streamByteCount > 1e9) iterStream.destroy();
});
I now get the error:
NodeError [TypeError]: The "chunk" argument must be one of type string or Buffer. Received Object. ...stackTrace
So firstly, I thought .from(iterable) and using .on('data') put the stream into a mode that would work with objects. Secondly, the LZUTF8.js docs say:
Creates a compression stream. The stream will accept both Buffers and Strings in any encoding supported by Node.js (e.g. utf8, utf16, ucs2, base64, hex, binary etc.) and return Buffers.
So while I don't know why the data event listener is complaining, the Node.js docs do say to choose one API style:
Specifically, using a combination of on('data'), on('readable'), pipe(), or async iterators could lead to unintuitive behavior.
Which I'm guessing is what I'm seeing?
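If the underlying issue is that Readable.from(generator) produces object-mode chunks (arrays) while the compression stream only accepts strings/Buffers, then I assume something like this would be needed in between (untested sketch; stringify is just my name for it):

const { Transform } = require('stream');

// sketch: accept the object-mode array chunks and emit plain strings so the
// compression stream (which wants string/Buffer) doesn't throw
const stringify = new Transform({
  writableObjectMode: true, // upstream chunks are arrays, not strings/Buffers
  transform(chunk, _encoding, callback) {
    callback(null, chunk.join('\n') + '\n');
  },
});

iterStream.pipe(stringify).pipe(compressionStream);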
It makes sense to switch to a full .pipe(...) implementation, using createWriteStream instead of console.log and shell redirection, but then I lose the ability to abort if the file gets too big.
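In other words, something like this (sketch only; output.log.lz is a placeholder filename and stringify is the Transform sketched above), where there is no data listener left to count bytes on:

const fs = require('fs');

// sketch of the full-pipe version: nothing here counts bytes, so there is no
// obvious place to call destroy() once the output passes ~1e9 bytes
iterStream
  .pipe(stringify)
  .pipe(compressionStream)
  .pipe(fs.createWriteStream('output.log.lz'));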
So I guess this is all a big preamble to asking: how do I prevent a writeStream from writing too much data when I can't listen for data events on the readStream?