I have a JSON file that is 6 GB. When I try to read it with the following code,

var fs = require('fs');
var contents = fs.readFileSync('large_file.txt').toString();

I get the following error:

buffer.js:182
    throw err;
    ^

RangeError: "size" argument must not be larger than 2147483647
    at Function.Buffer.allocUnsafe (buffer.js:209:3)
    at tryCreateBuffer (fs.js:530:21)
    at Object.fs.readFileSync (fs.js:569:14)
    at Object.<anonymous> (/home/readHugeFile.js:4:19)
    at Module._compile (module.js:569:30)
    at Object.Module._extensions..js (module.js:580:10)
    at Module.load (module.js:503:32)
    at tryModuleLoad (module.js:466:12)
    at Function.Module._load (module.js:458:3)
    at Function.Module.runMain (module.js:605:10)

Could somebody help, please?

Alessio Cantarella
  • Possible duplicate of [Node.js read big file with fs.readFileSync](https://stackoverflow.com/questions/29766868/node-js-read-big-file-with-fs-readfilesync) – Kukic Vladimir Jul 09 '17 at 09:20
  • Possible duplicate of [What's the maximum size of a Node.js Buffer](https://stackoverflow.com/questions/8974375/whats-the-maximum-size-of-a-node-js-buffer) – Evan Carroll Feb 24 '19 at 23:27

1 Answer

The maximum size of a Buffer, which is what readFileSync() uses internally to hold the file data, is about 2 GB, i.e. 2^31 - 1 bytes (source: https://nodejs.org/api/buffer.html#buffer_buffer_kmaxlength).
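
For reference, you can check the exact limit of your own Node.js version at runtime via buffer.kMaxLength, the property that page documents; a minimal sketch:

const buffer = require('buffer');

// Largest allocation a single Buffer can hold; on 64-bit Node 8.x this
// prints 2147483647 (2^31 - 1), which matches the RangeError above.
console.log(buffer.kMaxLength);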

You probably need a streaming JSON parser, like JSONStream, to process your file:

const JSONStream = require('JSONStream');
const fs         = require('fs');

fs.createReadStream('large_file.json')
  .pipe(JSONStream.parse('*'))
  .on('data', entry => {
    console.log('entry', entry);
  });
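
Note that JSONStream is a third-party module, so it has to be installed first (npm install JSONStream). With parse('*'), every top-level element of the JSON is emitted as a separate data event, so the whole file never has to fit into a single Buffer.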
robertklep
  • Hi @robertklep, I am writing a CLI app, where I need to parse a big JSON file, and respond to the user. Your code does the job, but it's asynchronous. Is there a recommended way of working with streams synchronously? Thanks – mils Jun 14 '18 at 05:35
  • @mils the main feature that streams provide in this example is being able to parse the files incrementally. In your case, it sounds like you want to read/parse the file in one go, which will require multi-GB of RAM. Are you sure that's what you want? AFAIK, there aren't any synchronous stream implementation (and besides that, streams are inherently event-based). – robertklep Jun 14 '18 at 06:33
  • Hi @robertklep, I'm basically doing multiple passes over the file. The first pass gets a unique set of ids from the JSON file, then for each id I perform another pass, retrieving deeper data. This is how I keep memory use low. But I need to do step 2 after step 1 is completely finished (i.e. `for(id in ids)`), and return results to the user. I guess maybe I'm just a n00b with javascript. Is there a "classic" way of solving the sync/async problem? Thanks again – mils Jun 14 '18 at 07:20
  • @mils what might help is the observation that readable streams emit an `end` event when all data has been read. In your case, you can start step 2 once the `end` event of step 1 has fired. That way, you can chain the stream operations (see the sketch after these comments). – robertklep Jun 14 '18 at 07:37
  • 2GB max for a Buffer seems too small for today's standard. Does anyone know how to increase this? Why does it have to be so small? Is this a fundamental limitation with Javascript (v8) itself? – Soichi Hayashi Sep 12 '18 at 21:12
  • @SoichiHayashi I'm not sure if typed arrays have a default maximum size. – robertklep Sep 13 '18 at 06:38
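
A minimal sketch of the `end`-event chaining described in the comments above, assuming each parsed entry carries an id field (the helper name collectIds is hypothetical):

const JSONStream = require('JSONStream');
const fs         = require('fs');

// Pass 1: stream the file once, collect the unique ids, and resolve
// the promise when the parser stream emits 'end'.
function collectIds(path) {
  return new Promise((resolve, reject) => {
    const ids = new Set();
    fs.createReadStream(path)
      .pipe(JSONStream.parse('*'))
      .on('data', entry => ids.add(entry.id)) // assumes each entry has an `id`
      .on('end', () => resolve(ids))
      .on('error', reject);
  });
}

// Pass 2 only starts after pass 1 has completely finished.
collectIds('large_file.json').then(ids => {
  for (const id of ids) {
    console.log('starting second pass for id', id);
  }
});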