
I have a long file I need to parse. Because it's very long, I need to do it chunk by chunk. I tried this:

function parseFile(file){
    var chunkSize = 2000;
    var fileSize = (file.size - 1);

    var foo = function(e){
        console.log(e.target.result);
    };

    for(var i =0; i < fileSize; i += chunkSize)
    {
        (function( fil, start ) {
            var reader = new FileReader();
            var blob = fil.slice(start, chunkSize + 1);
            reader.onload = foo;
            reader.readAsText(blob);
        })( file, i );
    }
}

After running it I see only the first chunk in the console. If I change `console.log` to a jQuery append to some div, I see only the first chunk in that div. What about the other chunks? How can I make it work?

mnowotka

5 Answers


The FileReader API is asynchronous, so you should handle it with callbacks. A plain for loop won't do the trick, since it doesn't wait for each read to complete before kicking off the read of the next chunk. Here's a working approach:

function parseFile(file, callback) {
    var fileSize   = file.size;
    var chunkSize  = 64 * 1024; // bytes
    var offset     = 0;
    var self       = this; // we need a reference to the current object
    var chunkReaderBlock = null;

    var readEventHandler = function(evt) {
        if (evt.target.error == null) {
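            // note: result.length counts characters (UTF-16 code units), not bytes,
            // so advancing the byte offset by it can drift on multi-byte content;
            // see the comments below about advancing by chunkSize instead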
            offset += evt.target.result.length;
            callback(evt.target.result); // callback for handling read chunk
        } else {
            console.log("Read error: " + evt.target.error);
            return;
        }
        if (offset >= fileSize) {
            console.log("Done reading file");
            return;
        }

        // off to the next chunk
        chunkReaderBlock(offset, chunkSize, file);
    }

    chunkReaderBlock = function(_offset, length, _file) {
        var r = new FileReader();
        var blob = _file.slice(_offset, length + _offset);
        r.onload = readEventHandler;
        r.readAsText(blob);
    }

    // now let's start the read with the first block
    chunkReaderBlock(offset, chunkSize, file);
}
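
A minimal usage sketch (the `fileInput` element id and the chunk handler below are hypothetical, not part of the original code):

// Hypothetical wiring: read the selected file and handle each chunk as it arrives
document.getElementById('fileInput').addEventListener('change', function (evt) {
    parseFile(evt.target.files[0], function (chunkText) {
        console.log('got a chunk of', chunkText.length, 'characters');
    });
});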
alediaferia
    This is brilliant. Reading huge 3GB+ files without issue. The small chunk size makes it a bit slow though. – bryc Feb 15 '15 at 05:54
  • Wrote a CRC32 calculator using this for fun using web workers/dragndrop. http://jsfiddle.net/9xzf8qqj/ – bryc Feb 15 '15 at 06:23
    Worked for me as well for large files. However, for larger files (>9GB), I found out incrementing `offset` by `evt.target.result.length` was **corrupting** my file! My quick solution was to increment it by `chunkSize` instead. I'm not sure if it's a FS issue (I'm on Ubuntu) or something else, but it works just fine for any filesize if you `offset += chunkSize`. – user40171 May 11 '15 at 05:41
    I kind of improved it here: https://gist.github.com/alediaferia/cfb3a7503039f9278381 I didn't test it though, so if you notice glitches please let me know. – alediaferia Jun 22 '15 at 10:52
  • I was just thinking... Wouldn't it be better to call next `block()` before invoking callback, so that the async IO is already going on while callback is executing? Because the callback is very likely gonna use some CPU for parsing and it may take a while. – Tomáš Zato - Reinstate Monica Oct 29 '15 at 15:20
  • @TomášZato actually it wouldn't matter much. I'm not a Javascript expert at all, actually I don't really code with Javascript but I think that the actual `async IO` won't start anyway until the `readEventHandler` stack is popped. – alediaferia Mar 15 '16 at 20:18
  • @alediaferia Why do you think so? I guess in that case you would need add some asyncness in the code. In that case it wouldn't be worth it, even though it's just 2 lines. – Tomáš Zato - Reinstate Monica Mar 15 '16 at 20:22
  • FileReader `onload` callback is invoked when the data has been read from the file. This happens "asynchronously" in the sense that it is handled by the JavaScript Event Loop as soon as it is possible to process it. This means that even if I call `chunkReaderBlock` earlier, no IO would occur as long as the stack is still busy with the `readEventHandler` call. For reference: https://developer.mozilla.org/en/docs/Web/JavaScript/EventLoop – alediaferia Mar 15 '16 at 22:43
  • Do you have any recommendation on how I could determine which is the last chunk? I need to make a different REST call for the last chunk – Batman Sep 03 '16 at 18:24
  • @user40171 Thanks for `offset += chunkSize` : my file wasn't corrupted, but I couldn't get the right number of chunks. +1 – Ontokrat May 03 '17 at 19:01
according to the [docs](https://developer.mozilla.org/en-US/docs/Web/API/FileReader), ```onload``` is only called if there is no error. Use ```onloadend``` otherwise. I would however recommend using ```onload``` and ```onerror```. In short: the code above never catches any error. – Flavien Volken Apr 14 '18 at 15:00
  • Hi sir, I'm trying to use your code for uploading big files. I retrieve my file from input with `var file = document.getElementById("file").files` and pass it to parseFile function. I met this error `Uncaught TypeError: _file.slice is not a function`. I realized that file is not a string. Should I read it as a text? How can I do it? Thanks – Andrea Martinelli Aug 03 '18 at 11:09
  • So that chunk that gets passed to the callback, what format is that in? I need to convert it to a byte format so that I can pass it up to the server and process it properly. – Marcel Marino Sep 19 '19 at 15:19
    `var self = this; // we need a reference to the current object` where exactly is this used? – SOFe Mar 05 '20 at 03:46
    I managed to get this code to work for me! Thanks! However, on Opera 68 (WebKit engine, MacOS X Catalina) the `evt.target.result.length` is undefined and breaks everything. However, there is `.byteLength` which does what is needed and used it instead. I haven't checked the `offset += chunkSize` solution but I fear it might break in certain edge (no browser pun intended) cases. – Andrei Rînea Jun 12 '20 at 21:46
    Wow, this answer is so old and still keeps getting attention. I should probably stop and rewrite this messy code. Glad it's useful to some. – alediaferia Jun 12 '20 at 22:57
  • Funny thing is, I tried the same as OP but with `await`, and still only the first chunk was read. Here: https://stackoverflow.com/q/62346764/1796 – Andrei Rînea Jun 14 '20 at 14:51

You can take advantage of Response (part of fetch) to convert most things to anything else (blob, text, JSON), and also to get a ReadableStream that helps you read the blob in chunks:

var dest = new WritableStream({
  write (str) {
    console.log(str)
  }
})

var blob = new Blob(['bloby']);

(blob.stream ? blob.stream() : new Response(blob).body)
  // Decode the binary-encoded response to string
  .pipeThrough(new TextDecoderStream())
  .pipeTo(dest)
  .then(() => {
    console.log('done')
  })

Old answer (WritableStream's pipeTo and pipeThrough were not implemented back then)

I came up with an interesting idea that is probably very fast, since it converts the blob to a ReadableByteStreamReader. It's probably much easier too, since you don't need to handle things like chunk size and offset and then do it all recursively in a loop:

function streamBlob(blob) {
  const reader = new Response(blob).body.getReader()
  const pump = reader => reader.read()
  .then(({ value, done }) => {
    if (done) return
    // uint8array chunk (use TextDecoder to read as text)
    console.log(value)
    return pump(reader)
  })
  return pump(reader)
}

streamBlob(new Blob(['bloby'])).then(() => {
  console.log('done')
})
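
To get text out of those Uint8Array chunks, here is a sketch of the same pump with the standard TextDecoder, UTF-8 assumed (`streamBlobAsText` is just an illustrative name):

function streamBlobAsText(blob) {
  const decoder = new TextDecoder('utf-8')
  const reader = new Response(blob).body.getReader()
  const pump = () => reader.read().then(({ value, done }) => {
    if (done) return
    // stream: true keeps multi-byte characters intact across chunk boundaries
    console.log(decoder.decode(value, { stream: true }))
    return pump()
  })
  return pump()
}

streamBlobAsText(new Blob(['bloby'])).then(() => console.log('done'))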
Endless
  • This is much better than slicing, although you don't get to control the chunk size. (on Chrome, it was 64KiB) – corwin.amber Dec 14 '19 at 16:52
    try using the new `blob.stream()` and see what chunk size you get, probably better than wrapping blob in a Response and get a stream directly instead – Endless Dec 14 '19 at 21:52
  • @Endless how can we preview a large image file chunk by chunk, so that the DOM doesn't hang? – GaurangDhorda Jul 15 '20 at 17:21

The second argument of slice is actually the end byte. Your code should look something like:

function parseFile(file){
    var chunkSize = 2000;
    var fileSize = (file.size - 1);

    var foo = function(e){
        console.log(e.target.result);
    };

    for(var i = 0; i < fileSize; i += chunkSize) {
        (function( fil, start ) {
            var reader = new FileReader();
            var blob = fil.slice(start, chunkSize + start);
            reader.onload = foo;
            reader.readAsText(blob);
        })(file, i);
    }
}

Or you can use this BlobReader for an easier interface:

BlobReader(blob)
.readText(function (text) {
  console.log('The text in the blob is', text);
});


Minko Gechev
    Is the loop reliable? I'm rather new to `FileReader` API but I see it is asynchronous. How can we make sure the whole file has been processed completely once the `for loop` ends? – alediaferia Jan 31 '15 at 19:22
  • How can we preview large images using FileReader? With multiple image files totalling around 800MB, the DOM hangs. – GaurangDhorda Jul 16 '20 at 19:15

Revamped @alediaferia's answer as a class (TypeScript version here), returning the result in a promise. Brave coders would even have wrapped it into an async iterator (a sketch of one appears after the example below).

class FileStreamer {
    constructor(file) {
        this.file = file;
        this.offset = 0;
        this.defaultChunkSize = 64 * 1024; // bytes
        this.rewind();
    }
    rewind() {
        this.offset = 0;
    }
    isEndOfFile() {
        return this.offset >= this.getFileSize();
    }
    readBlockAsText(length = this.defaultChunkSize) {
        const fileReader = new FileReader();
        const blob = this.file.slice(this.offset, this.offset + length);
        return new Promise((resolve, reject) => {
            fileReader.onloadend = (event) => {
                const target = event.target;
                if (target.error == null) {
                    const result = target.result;
                    this.offset += result.length;
                    this.testEndOfFile();
                    resolve(result);
                }
                else {
                    reject(target.error);
                }
            };
            fileReader.readAsText(blob);
        });
    }
    testEndOfFile() {
        if (this.isEndOfFile()) {
            console.log('Done reading file');
        }
    }
    getFileSize() {
        return this.file.size;
    }
}

Example printing a whole file in the console (within an async context):

const fileStreamer = new FileStreamer(aFile);
while (!fileStreamer.isEndOfFile()) {
  const data = await fileStreamer.readBlockAsText();
  console.log(data);
}
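
And a sketch of that async-iterator wrapping, assuming the FileStreamer class above (`streamChunks` is an illustrative name):

// Hypothetical wrapper: expose FileStreamer as an async iterable
async function* streamChunks(file) {
    const fileStreamer = new FileStreamer(file);
    while (!fileStreamer.isEndOfFile()) {
        yield await fileStreamer.readBlockAsText();
    }
}

// usage, within an async context:
// for await (const chunk of streamChunks(aFile)) {
//     console.log(chunk);
// }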
Flavien Volken
  • Thanks, very handy. Did you test it? Any corrections? – Leo Apr 30 '18 at 15:09
    @Leo I am using it in one of my projects and yes it's working fine. Note that all those answer might be deprecated sooner or later by [Streams API](https://developer.mozilla.org/en-US/docs/Web/API/Streams_API). One thing I could improve would be to add the ability to pass an optional encoding parameter to the [fileReader.readAsText function](https://developer.mozilla.org/en-US/docs/Web/API/FileReader/readAsText) – Flavien Volken May 01 '18 at 14:47
  • Hm, I am going to use it for binary files. Can I just replace `readAsText` with `readAsArrayBuffer`? Or is it safe to use UTF-8 for reading (and output)? – Leo May 01 '18 at 23:40
    Yes you can use readAsArrayBuffer, or just take my ts version [here](https://gist.github.com/Xample/c1b7664ba33e09335b94379e48a00c8e) – Flavien Volken May 02 '18 at 06:01
  • @Flavienvolken how can we preview large image files chunk by chunk, so that the DOM doesn't hang? E.g. each image is 25MB in size, with about 600MB of images to preview at a time? – GaurangDhorda Jul 15 '20 at 17:29
  • If your image is not compressed (for instance a BMP or another uncompressed format), then you might create a tile by picking only the chunk of data you need. If your image is compressed, this is a completely different problem… for instance a codec like JPEG 2000 relies on the entire image data to build a 1/1 ratio (full quality) tile. – Flavien Volken Jul 21 '20 at 14:46

Parse the large file into small chunks using this simple method:

//Parse large file into small chunks
var parseFile = function (file) {

    var chunkSize = 1024 * 1024 * 16; //16MB chunk size
    var fileSize = file.size;
    var currentChunk = 1;
    var totalChunks = Math.ceil(fileSize / chunkSize);

    while (currentChunk <= totalChunks) {

        var offset = (currentChunk - 1) * chunkSize;
        var currentFilePart = file.slice(offset, offset + chunkSize);

        console.log('Current chunk number is ', currentChunk);
        console.log('Current chunk data', currentFilePart);

        currentChunk++;
    }
};
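
`File.slice` is synchronous, so this loop runs fine, but it only produces Blob slices without reading their contents. Here is a sketch that also reads each chunk's text, using the standard `Blob.text()` promise API inside an async function (`parseFileAsync` is an illustrative name):

//Read each chunk's text sequentially
var parseFileAsync = async function (file) {

    var chunkSize = 1024 * 1024 * 16; //16MB chunk size

    for (var offset = 0; offset < file.size; offset += chunkSize) {
        var text = await file.slice(offset, offset + chunkSize).text();
        console.log('Chunk at offset', offset, 'has', text.length, 'characters');
    }
};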
Radadiya Nikunj