I am trying to load and parse a JSON file in Python. But I'm stuck trying to load the file:

import json
json_data = open('file')
data = json.load(json_data)

Yields:

ValueError: Extra data: line 2 column 1 - line 225116 column 1 (char 232 - 160128774)

I looked at 18.2. json — JSON encoder and decoder in the Python documentation, but it's pretty discouraging to read through this horrible-looking documentation.

First few lines (anonymized with randomized entries):

{"votes": {"funny": 2, "useful": 5, "cool": 1}, "user_id": "harveydennis", "name": "Jasmine Graham", "url": "http://example.org/user_details?userid=harveydennis", "average_stars": 3.5, "review_count": 12, "type": "user"}
{"votes": {"funny": 1, "useful": 2, "cool": 4}, "user_id": "njohnson", "name": "Zachary Ballard", "url": "https://www.example.com/user_details?userid=njohnson", "average_stars": 3.5, "review_count": 12, "type": "user"}
{"votes": {"funny": 1, "useful": 0, "cool": 4}, "user_id": "david06", "name": "Jonathan George", "url": "https://example.com/user_details?userid=david06", "average_stars": 3.5, "review_count": 12, "type": "user"}
{"votes": {"funny": 6, "useful": 5, "cool": 0}, "user_id": "santiagoerika", "name": "Amanda Taylor", "url": "https://www.example.com/user_details?userid=santiagoerika", "average_stars": 3.5, "review_count": 12, "type": "user"}
{"votes": {"funny": 1, "useful": 8, "cool": 2}, "user_id": "rodriguezdennis", "name": "Jennifer Roach", "url": "http://www.example.com/user_details?userid=rodriguezdennis", "average_stars": 3.5, "review_count": 12, "type": "user"}
martineau
Pi_

4 Answers

276

You have a JSON Lines format text file. You need to parse your file line by line:

import json

data = []
with open('file') as f:
    for line in f:
        data.append(json.loads(line))

Each line contains valid JSON, but as a whole, it is not a valid JSON value as there is no top-level list or object definition.

Note that because the file contains JSON per line, you are saved the headaches of trying to parse it all in one go or to figure out a streaming JSON parser. You can now opt to process each line separately before moving on to the next, saving memory in the process. You probably don't want to append each result to one list and then process everything if your file is really big.
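As a rough sketch of that line-at-a-time approach, a generator can yield one parsed object at a time so that only the current record is held in memory. The names `iter_records` and `total_useful_votes` are made up for illustration, and the `votes`/`useful` keys are taken from the sample lines in the question; adjust them to your own data.

```python
import json


def iter_records(path):
    """Yield one parsed JSON object per non-empty line of the file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # tolerate blank lines, e.g. a trailing newline
                yield json.loads(line)


def total_useful_votes(path):
    """Sum the nested 'useful' vote counts without building a list of records."""
    total = 0
    for record in iter_records(path):
        # Each record is a plain dict, so fields are accessed by key.
        total += record["votes"]["useful"]
    return total
```

Calling `total_useful_votes('file')` would then walk the file once, touching each record only as it is read.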

If you have a file containing individual JSON objects with delimiters in-between, see "How do I use the 'json' module to read in one JSON object at a time?" to parse out individual objects using a buffered method.
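For the whitespace-delimited case, one possible sketch (not the buffered method from the linked answer) uses `json.JSONDecoder.raw_decode`, which parses one value starting at a given index and reports where it stopped. This version assumes the whole text fits in memory and that objects are separated only by whitespace; `iter_json_objects` is a hypothetical helper name.

```python
import json


def iter_json_objects(text):
    """Yield consecutive JSON values from a string, separated by whitespace."""
    decoder = json.JSONDecoder()
    pos = 0
    length = len(text)
    while pos < length:
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < length and text[pos].isspace():
            pos += 1
        if pos >= length:
            break
        # Returns the parsed value and the index just past its end.
        obj, pos = decoder.raw_decode(text, pos)
        yield obj
```

Because the decoder tracks its own end position, this also handles objects whose values span several lines.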

Martijn Pieters
  • 3
    +1 Maybe it is worth noting, that if you do not need all objects at once, processing them one-by-one may be more efficient approach. This way you will not need to store whole data in the memory, but a single piece of it. – Tadeck Sep 16 '12 at 23:13
  • Thanks for the clarification, my assumption about the file not being ill-formatted was wrong. – Pi_ Sep 16 '12 at 23:14
  • @MartijnPieters Knowing the structure of the JSONs is there anyway I could parse the data calling each field by their names? – Pi_ Sep 16 '12 at 23:22
  • 1
    @Pi_: you'll have a dictionary, so just access the fields as keys: `data = json.loads(line); print data[u'votes']` – Martijn Pieters Sep 16 '12 at 23:26
  • @MartijnPieters that gives me a KeyError – Pi_ Sep 16 '12 at 23:29
  • 1
    @Pi_: print the result of json.loads() then or use the debugger to inspect. – Martijn Pieters Sep 16 '12 at 23:31
  • @MartijnPieters All it seems to be doing is copying the lines from the file and appending them to data[] while adding 'u' in front of each field of each json-object. – Pi_ Sep 16 '12 at 23:35
  • 1
    @Pi_: no; don't confuse the JSON format with the python dict representation. You are seeing python dictionaries with strings now. – Martijn Pieters Sep 16 '12 at 23:37
  • How to parse JSON though when the value part has multiple lines? – user2441441 Mar 09 '15 at 18:15
  • 1
    @user2441441: see the [linked answer](http://stackoverflow.com/questions/21708192/how-do-i-use-the-json-module-to-read-in-one-json-object-at-a-time/21709058#21709058) from the post here. – Martijn Pieters Mar 09 '15 at 18:16
  • I was writing a json object per line anyways, thanks for pointing to the "official specification" (I wanted to know a good suffix for the filetype) – Nathan Chappell Jul 01 '20 at 06:34
16

For those stumbling upon this question: the Python jsonlines library (much younger than this question) elegantly handles files with one JSON document per line. See https://jsonlines.readthedocs.io/

wouter bolsterlee
12

If you are using pandas and want to load the JSON file as a DataFrame, you can use:

import pandas as pd
df = pd.read_json('file.json', lines=True)

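To write it back out as a single JSON array of records, pass orient='records' (the default orient for a DataFrame is 'columns', which produces a nested object rather than an array):

df.to_json('new_file.json', orient='records')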
arunppsg
2

That file is not a single valid JSON document. You have one JSON object per line, but they are not contained in a larger data structure (i.e. an array). You'll either need to reformat it so that it begins with [ and ends with ], with a comma at the end of each line except the last, or parse it line by line as separate dictionaries.
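If you do want the reformatted version, a minimal sketch is to parse each line and let `json.dump` write the enclosing array, rather than editing brackets and commas by hand. The function name `jsonl_to_array` and both paths are placeholders, and this assumes the file is small enough to hold in memory.

```python
import json


def jsonl_to_array(src_path, dst_path):
    """Rewrite a one-object-per-line file as a single JSON array."""
    with open(src_path, encoding="utf-8") as src:
        # Parse every non-blank line into its own Python object.
        records = [json.loads(line) for line in src if line.strip()]
    with open(dst_path, "w", encoding="utf-8") as dst:
        # Dumping the list produces the [ ... ] wrapper and the commas.
        json.dump(records, dst)
    return len(records)
```

The resulting file can then be read in one go with `json.load`.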

Daniel Roseman