Using python ijson to read a large json file with multiple json objects

Question

I'm trying to parse a large (~100MB) JSON file using the ijson package, which lets me work through the file incrementally instead of loading it all at once. However, after writing code like this,

import ijson

with open(filename, 'r') as f:
    parser = ijson.parse(f)
    for prefix, event, value in parser:
        if prefix == "name":
            print(value)

I found that the code parses only the first line of the file and ignores the rest.

Here is what a portion of my JSON file looks like:

{"name":"accelerator_pedal_position","value":0,"timestamp":1364323939.012000}
{"name":"engine_speed","value":772,"timestamp":1364323939.027000}
{"name":"vehicle_speed","value":0,"timestamp":1364323939.029000}
{"name":"accelerator_pedal_position","value":0,"timestamp":1364323939.035000}

I suspect ijson parses only one JSON object.

Can someone please suggest how to work around this?

Possible duplicate of http://stackoverflow.com/questions/10715628/opening-a-large-json-file-in-python — ErikR, May 13 '16 at 02:38
Well, the chunk you provided looks like a _set_ of JSONs. That is you should read lines one by one and separately parse it. — user3159253, May 13 '16 at 02:51
BTW since each line is short, you don't need ijson, you can use `json.loads()`. — user3159253, May 13 '16 at 02:52

score 12 · Accepted Answer · answered May 13 '16 at 03:08


Since the provided sample looks like a series of lines, each holding an independent JSON document, it should be parsed line by line:

# Each JSON document is small, so there is no need for incremental parsing.
import json

with open(filename, 'r') as f:
    for line in f:
        data = json.loads(line)
        # data['name'], data['value'] and data['timestamp'] now
        # contain the corresponding values
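
The same loop can be wrapped in a generator so that callers consume one record at a time. This is only a minimal sketch, assuming one JSON object per line; the helper name `iter_json_lines` is hypothetical, and skipping blank lines and reporting the line number on failure are additions not present in the code above.

```python
import io
import json


def iter_json_lines(lines):
    """Yield one parsed object per non-blank line (hypothetical helper)."""
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # assumption: blank lines are separators, not errors
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            # Re-raise with the line number so a bad record is easy to locate
            raise ValueError(f"line {line_number}: {exc}") from exc


# Sample lines taken from the question; a file object from open() works the same way.
sample = io.StringIO(
    '{"name":"engine_speed","value":772,"timestamp":1364323939.027000}\n'
    '\n'
    '{"name":"vehicle_speed","value":0,"timestamp":1364323939.029000}\n'
)
records = list(iter_json_lines(sample))
for record in records:
    print(record["name"], record["value"])
```

Because the generator pulls lines from the file object lazily, only one line and its parsed object need to be held at a time; the `list(...)` call here is just for the demonstration.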


user3159253


Thanks for answering. Won't this load the whole file into RAM? If it loads only one line at a time, this is awesome. – Boubouh Karim May 13 '16 at 03:13
Certainly, `for line in f:` reads one line at a time. Check http://stackoverflow.com/questions/17246260/python-readlines-usage-and-efficient-practice-for-reading – user3159253 May 13 '16 at 03:24
How can I handle custom en- and decoding in ijson? I can do this rather easily with json and the cls= argument, how is it done in ijson? Any links? Thanks! – gilgamash Oct 28 '20 at 09:13

Mr-IDE · Answer 2 · 2019-03-21T12:32:09.180

Unfortunately, the ijson library (v2.3 as of March 2018) does not handle parsing multiple JSON objects. It can only handle one top-level object, and if you attempt to parse a second one, you will get an error: "ijson.common.JSONError: Additional data". See bug reports here:

It's a big limitation. However, as long as there is a line break (newline character) after each JSON object, you can parse the objects independently, one line at a time, like this:

import io
import ijson

with open(filename, encoding="UTF-8") as json_file:
    cursor = 0  # running character offset of the current line
    for line_number, line in enumerate(json_file):
        print("Processing line", line_number + 1, "at cursor index:", cursor)
        line_as_file = io.StringIO(line)
        # Use a new parser for each line
        json_parser = ijson.parse(line_as_file)
        for prefix, event, value in json_parser:
            print("prefix=", prefix, "type=", event, "value=", value)
        cursor += len(line)

You are still streaming the file rather than loading it entirely into memory, so this works on large JSON files. It uses the line-streaming technique from "How to jump to a particular line in a huge text file?" and enumerate() from "Accessing the index in 'for' loops?".
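
The line-by-line approach depends on a newline after every object. If the objects are merely concatenated, the standard library's `json.JSONDecoder.raw_decode` can peel them off a buffer one at a time. The following is a rough sketch under stated assumptions, not an ijson feature: the name `iter_concatenated_json` and the `chunk_size` parameter are hypothetical, every top-level value is assumed to be an object or array, and each individual value is still fully materialized in memory.

```python
import io
import json


def iter_concatenated_json(stream, chunk_size=65536):
    """Yield top-level JSON objects/arrays from a text stream (hypothetical helper).

    Assumption: every top-level value is an object or array, so a value
    that decodes successfully is known to be complete.
    """
    decoder = json.JSONDecoder()
    buffer = ""
    while True:
        chunk = stream.read(chunk_size)
        at_eof = not chunk
        buffer += chunk
        while True:
            buffer = buffer.lstrip()  # raw_decode does not skip leading whitespace
            if not buffer:
                break
            try:
                value, end = decoder.raw_decode(buffer)
            except json.JSONDecodeError:
                if at_eof:
                    raise  # leftover text that never becomes valid JSON
                break  # value is probably split across chunks; read more
            yield value
            buffer = buffer[end:]
        if at_eof:
            return


# Objects with and without a separating newline; the tiny chunk size
# forces values to be split across reads.
sample = io.StringIO(
    '{"name":"engine_speed","value":772}{"name":"vehicle_speed","value":0}\n'
    '{"name":"accelerator_pedal_position","value":0}'
)
values = list(iter_concatenated_json(sample, chunk_size=8))
for value in values:
    print(value["name"], value["value"])
```

Retrying the decode after every chunk is simple but wasteful when a single object is much larger than `chunk_size`; for the short records shown in the question that cost should be negligible.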
