1512

I have this JSON in a file:

{
    "maps": [
        {
            "id": "blabla",
            "iscategorical": "0"
        },
        {
            "id": "blabla",
            "iscategorical": "0"
        }
    ],
    "masks": [
        "id": "valore"
    ],
    "om_points": "value",
    "parameters": [
        "id": "valore"
    ]
}

I wrote this script to print all of the JSON data:

import json
from pprint import pprint

with open('data.json') as f:
    data = json.load(f)

pprint(data)

This program raises an exception, though:

Traceback (most recent call last):
  File "<pyshell#1>", line 5, in <module>
    data = json.load(f)
  File "/usr/lib/python3.5/json/__init__.py", line 319, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.5/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.5/json/decoder.py", line 355, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Expecting ',' delimiter: line 13 column 13 (char 213)

How can I parse the JSON and extract its values?

richardec
michele
  • @kederrac For the reason given: "This question was caused by a typo or a problem that can no longer be reproduced." The json is invalid. – Rob Mar 12 '20 at 13:36
  • @kederrac The issue is caused by an error in usage not because it can be reproduced. – Rob Mar 12 '20 at 13:40
  • The issue with the input is simply that "masks" and "parameters" have [] lists(/arrays) instead of {} dicts(/objects). – smci Mar 11 '22 at 20:25
  • This question's status was discussed [here](https://meta.stackoverflow.com/q/381492/1394393). Community consensus was that this question was "good enough" to be left open after substantial edits. Please open a new discussion if you feel something has changed since that discussion. – jpmc26 May 19 '22 at 03:44

9 Answers

2193

Your data is not valid JSON. You have [] where you should have {}:

  • [] are for JSON arrays, which become list objects in Python
  • {} are for JSON objects, which become dict objects in Python
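To see the difference without touching the file, here is a minimal sketch that parses two small strings. Both strings are trimmed-down stand-ins for the "masks" entry (they are assumptions for illustration, not the real file), and the variable names are made up:

```python
import json

# Trimmed-down stand-ins for the "masks" entry (not the real file contents).
invalid_text = '{"masks": ["id": "valore"]}'   # key/value pair inside an array
valid_text = '{"masks": {"id": "valore"}}'     # key/value pair inside an object

error = None
try:
    json.loads(invalid_text)
except json.JSONDecodeError as exc:
    # exc.msg, exc.lineno, exc.colno and exc.pos describe where parsing stopped.
    error = exc
    print(f"invalid: {exc.msg} (line {exc.lineno}, column {exc.colno})")

parsed = json.loads(valid_text)
print(type(parsed["masks"]).__name__, parsed["masks"]["id"])
```

The position reported by the exception is usually the quickest way to find the offending character in a larger file.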

Here's how your JSON file should look:

{
    "maps": [
        {
            "id": "blabla",
            "iscategorical": "0"
        },
        {
            "id": "blabla",
            "iscategorical": "0"
        }
    ],
    "masks": {
        "id": "valore"
    },
    "om_points": "value",
    "parameters": {
        "id": "valore"
    }
}

Then you can use your code:

import json
from pprint import pprint

with open('data.json') as f:
    data = json.load(f)

pprint(data)

With data, you can now also find values like so:

data["maps"][0]["id"]
data["masks"]["id"]
data["om_points"]

Try those out and see if it starts to make sense.
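If you don't want to hard-code the index 0, a short sketch along these lines may help. It parses the corrected JSON from an inline string so it runs on its own; "no_such_key" is a hypothetical key used only to show dict.get with a default:

```python
import json

text = '''
{
    "maps": [
        {"id": "blabla", "iscategorical": "0"},
        {"id": "blabla", "iscategorical": "0"}
    ],
    "masks": {"id": "valore"},
    "om_points": "value",
    "parameters": {"id": "valore"}
}
'''
data = json.loads(text)

# "maps" is a plain Python list, so len() and iteration work as usual.
map_count = len(data["maps"])
map_ids = [entry["id"] for entry in data["maps"]]

# dict.get returns a default instead of raising KeyError for a missing key.
missing = data.get("no_such_key", "fallback")

print(map_count, map_ids, missing)
```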

np_6
Justin Peel
  • 2
    Ok so I have to control my code because this json file is generated from a java object. Thanks. – michele May 14 '10 at 16:26
  • serialized data is wrapped with [] , and when you read it in you need f.read(), that is if you use the standard. – radtek Dec 23 '14 at 18:43
  • 5
    Thanks for the solution. i'm getting a unicode symbol while printing it. (eg u'valore' ). How to prevent it? – diaryfolio Jan 30 '15 at 15:36
  • 6
    Nice but python adds a `u'` before each key. Any idea why? – CodyBugstein Jul 05 '15 at 07:14
  • 7
    That is why your text is type unicode not string. Most time it is better to have text in unicode for german umlauts and for sharing text results with other modules/programs etc. . So you're good! – Michael P Aug 29 '15 at 11:56
  • How to know size of the maps array to control index in this example?data["maps"][0]["id"] - Here 0 hard coded. – Karthi Apr 26 '17 at 02:34
  • isn't there a resource leak because the handle to `data.json` is never closed? – Max Heiber Jan 16 '18 at 19:37
  • In python 3, my json file is an array [] of jsons, it's called valid json by online checkers, and with these commands it loaded perfectly. Perhaps the definitions have changed circa 2018? – Nikhil VJ Feb 19 '18 at 15:19
  • @nikhilvj json doesn't need to have `{}` at the root level. It can start with an array at the root level (`[]`) – Justin Peel Feb 22 '18 at 18:08
  • https://stackoverflow.com/a/27415238/3299397 need this piece for it to work. – Kyle Bridenstine Jul 13 '18 at 19:27
  • What exception will be thrown if the with call fails? Should this be wrapped in a try catch? – Kyle Bridenstine Aug 21 '18 at 19:00
  • Nice explanation @JustinPeel. – AbhinavVaidya8 Sep 28 '18 at 09:29
  • 3
    I'd like to make an observation that is hopefully helpful, and definitely ironic. I find the pprint module to be inferior to the json module for pretty-printing json. If you try them both, I think you'll agree. To display and debug my json data structures, I've been doing: output = json.dumps(data_structure, indent=2, sort_keys=True) print(output) I think you'll find the indent-control, sorting, and intelligent line-wrapping in the dumps() method to be quite to your liking. If my thinking is wrong, someone please let me know. – Larold Oct 12 '18 at 19:12
  • @JustinPeel is there any chance that the order of the elements that are in the array (value of maps) will be changed. For example storing it into a data store (elastic search or any database) and then getting it back from there. – viveksinghggits Apr 22 '19 at 09:17
322

Your data.json should look like this:

{
 "maps":[
         {"id":"blabla","iscategorical":"0"},
         {"id":"blabla","iscategorical":"0"}
        ],
"masks":
         {"id":"valore"},
"om_points":"value",
"parameters":
         {"id":"valore"}
}

Your code should be:

import json
from pprint import pprint

with open('data.json') as data_file:    
    data = json.load(data_file)
pprint(data)

Note that this only works in Python 2.6 and up, as it depends on the with statement. In Python 2.5, use from __future__ import with_statement; in Python <= 2.4, see Justin Peel's answer, which this answer is based upon.

You can now also access single values like this:

data["maps"][0]["id"]  # will return 'blabla'
data["masks"]["id"]    # will return 'valore'
data["om_points"]      # will return 'value'
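If you would rather write data.om_points or data.masks.id, one possible approach (a sketch, not the only way) is to pass an object_hook to the parser so that each JSON object becomes a types.SimpleNamespace. This assumes every key is a valid Python identifier; the inline string is a shortened copy of the data above:

```python
import json
from types import SimpleNamespace

text = (
    '{"maps": [{"id": "blabla", "iscategorical": "0"}],'
    ' "masks": {"id": "valore"}, "om_points": "value"}'
)

# object_hook is called with every decoded JSON object (a dict);
# whatever it returns is used in place of that dict.
data = json.loads(text, object_hook=lambda d: SimpleNamespace(**d))

print(data.om_points)    # attribute access instead of data["om_points"]
print(data.masks.id)
print(data.maps[0].id)   # JSON arrays are still ordinary lists
```

The trade-off is that the result is no longer a dict, so methods such as .get() and .items() are not available on it.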
Community
Bengt
  • 7
    I got a downvote on this. Maybe it was not clear, why I thought another answer was necessary. Added note on compatibility of the with-statement. – Bengt Feb 26 '13 at 19:57
  • Sorry for the roll back, but the suggested code would keep `data_file` `open`ed longer than necessary. – Bengt May 25 '13 at 12:10
  • Referring to 2.6 documentation (https://docs.python.org/2.6/library/io.html), opening a file in the "with" context will automatically close the file. – Steve S. Jun 16 '15 at 01:54
  • 1
    @SteveS. Yes, but not before the context is left. `pprint`ing in the `with`-context keeps the `data_file` open longer. – Bengt Jun 16 '15 at 17:45
  • Is there a way to access like data.om_points or data.masks.id? – Gayan Pathirage Mar 15 '17 at 10:16
  • This works except when I try to use a numbered index like `data["maps"][0]["id"]` I see error: `KeyError: 0` – Patrick Schaefer Apr 03 '17 at 19:25
  • 1
    @GayanPathirage you access it like `data["om_points"]` , `data["masks"]["id"]`. The idea is you can reach any level in a dictionary by specifying the 'key paths'. If you get a `KeyError` exception it means the key doesn't exist in the path. Look out for typos or check the structure of your dictionary. – Nuhman May 25 '18 at 04:55
76

Justin Peel's answer is really helpful. If you are using Python 3, you can also read the file into a string and parse that:

import json

with open('data.json', encoding='utf-8') as data_file:
    data = json.loads(data_file.read())

Note: this uses json.loads instead of json.load. json.loads takes a string parameter (a str, and since Python 3.6 also bytes or bytearray), while json.load takes a file-like object. data_file.read() returns a str, so it has to go to json.loads. Calling json.load(data_file) directly gives the same result.

To be honest, I don't think loading all of the JSON data into memory is a problem in most cases. I see this done in JS, Java, Kotlin, C++, Rust, and almost every other language I use, so I don't take the memory concern very seriously :)

On the other hand, I don't think you can parse JSON with the standard json module without reading all of it.
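A small sketch of the relationship between the two functions: io.StringIO stands in for an open text file here (an assumption made so the example runs without data.json on disk), and the sample text is shortened.

```python
import io
import json

text = '{"om_points": "value", "masks": {"id": "valore"}}'

# json.load takes a file-like object with a .read() method.
from_file_object = json.load(io.StringIO(text))

# json.loads takes the text itself.
from_string = json.loads(io.StringIO(text).read())

print(from_file_object == from_string)
```

Both calls produce the same dict, so which one to use mostly depends on whether you already hold a file object or a string (for example, the body of a network response).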

Geng Jiawen
  • 12
    Why should `json.load` be avoided in favor of `.loads` in Python 3? – Zearin Jul 16 '15 at 14:55
  • @Zearin pls check the [official doc](https://docs.python.org/3/library/json.html#json.loads). – Geng Jiawen Jul 17 '15 at 02:13
  • 11
    The page you linked doesn't say anything about avoiding `load`. – Dan Hulme Mar 19 '16 at 17:58
  • @Dan Hulme, Because data_file.read() return a string.The `json.loads` take string parameters. The `json.load` take file-like object. – Geng Jiawen Jul 22 '16 at 08:38
  • 32
    This answer read whole file to memory when is does not have to and suggests that in Python 3 JSON files cannot be read lazily, which is untrue. I'm sorry, but it's clear downvote. – Łukasz Rogalski Aug 02 '16 at 09:41
  • 12
    This answer isn't accurate. There's no reason not to use json.load with an open file handler in python3. Sorry for the downvote, but it doesn't seem like you read the above comments very carefully. – dusktreader Sep 30 '16 at 21:21
  • 2
    this answer works, but there is no need to convert to a string by reading file contents explicitly just so you can use json.loads. using json.load and a file-like object does this for you and is a better solution. – Corey Goldberg Jun 15 '17 at 15:28
  • 5
    +1 This answer is great! Thank you for that and pulled me from going far for looking for a function that can use strings cause I only work with strings and network request that are not file! – newpeople Jul 28 '17 at 14:42
  • @ŁukaszRogalski I know, but for most file, the memory is not an issue. – Geng Jiawen Mar 23 '18 at 03:42
  • Since python 3.6, `json.loads` now supports bytes, not just string. – Escher Apr 12 '18 at 11:11
58
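import json

data = []
# One JSON document per line; the raw string keeps the backslash in the Windows path literal.
with open(r'd:\output.txt', encoding='utf-8') as f:
    for line in f:
        data.append(json.loads(line))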
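This line-by-line approach fits files that hold one JSON document per line (the "JSON Lines" layout mentioned in the comments), not a single multi-line document like the one in the question. A slightly more defensive sketch is below; read_json_lines is a hypothetical helper name, and io.StringIO stands in for the open file:

```python
import io
import json


def read_json_lines(lines):
    """Parse an iterable of text lines holding one JSON document each."""
    records = []
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines rather than failing on them
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as exc:
            # Re-raise with the line number so the bad record is easy to find.
            raise ValueError(f"line {line_number}: {exc}") from exc
    return records


sample = io.StringIO('{"id": 1}\n\n{"id": 2}\n')
records = read_json_lines(sample)
print(records)
```

Passing the whole text to json.loads in one go would fail instead, because the parser expects exactly one document.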
smbanaei
  • 9
    this is the correct solution if you have multiple json objects in a file. `json.loads` does not decode multiple json objects. Otherwise, you get 'Extra Data' error. – yasin_alm Mar 21 '16 at 21:43
  • This is the best answer. Otherwise, it gives 'Extra Data' error. – Earthx9 Jun 11 '16 at 12:05
  • 40
    Having mutliple json objects in a file means that the file itself is not actually valid json. If you have multiple objects to include in a json file, they should be contained in an array at the top level of the file. – dusktreader Sep 30 '16 at 21:23
  • Having multiple json objects in a file means the file is not a single json object. That's sort of obvious. Making a single array out of the objects is an obvious workaround. But JSON is by design explicitly terminated, at almost every level (by `}`, `]` or `"`). Hence you can indeed concatenate multiple objects in a single string or single file, without ambiguity. The problem here is that a parser expecting a single object fails when it's passed more than one object. – MSalters May 02 '19 at 13:35
  • 1
    Ad storing multiple JSON objects in a single file: there is a "standard" for that - http://jsonlines.org/examples/ in `.jsonl` (json lines), the objects are separated by a newline character which makes the pre-processing for parsing trivial, and allows to easily split/batch files without worrying about start/end markers. – Sebi May 10 '19 at 14:34
14

"Ultra JSON", or simply "ujson", can handle having [] in your JSON file input. If you're reading a JSON input file into your program as a list of JSON elements, such as [{[{}]}, {}, [], etc...], ujson can handle any arbitrary nesting of lists of dictionaries and dictionaries of lists.

You can find ujson in the Python package index and the API is almost identical to Python's built-in json library.

ujson is also much faster if you're loading larger JSON files. You can see the performance details in comparison to other Python JSON libraries in the same link provided.
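A hedged sketch of how ujson is commonly swapped in: it assumes the third-party ujson package may or may not be installed and falls back to the standard library otherwise. The alias json_impl and the sample text are made up for illustration. Note that the standard json module also accepts an array at the top level, so the fallback parses the same input.

```python
try:
    import ujson as json_impl   # third-party package; assumed installed via pip
except ImportError:
    import json as json_impl    # standard-library fallback with the same loads() call

# A top-level array mixing objects and arrays.
text = '[{"a": [{"b": 1}]}, {}, []]'
data = json_impl.loads(text)

print(data)
```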

codeforester
moeabdol
9

If you're using Python 3, you can try changing the JSON in your connection.json file to:

{
  "connection1": {
    "DSN": "con1",
    "UID": "abc",
    "PWD": "1234",
    "connection_string_python":"test1"
  },
  "connection2": {
    "DSN": "con2",
    "UID": "def",
    "PWD": "1234"
  }
}

Then using the following code:

import json

# The with block closes the file even if parsing fails.
with open('connection.json', 'r') as connection_file:
    conn_string = json.load(connection_file)

print(conn_string['connection1']['connection_string_python'])
# Output: test1
LogicalBranch
sushmit
6

Here is a modified data.json file:

{
    "maps": [
        {
            "id": "blabla",
            "iscategorical": "0"
        },
        {
            "id": "blabla",
            "iscategorical": "0"
        }
    ],
    "masks": [{
        "id": "valore"
    }],
    "om_points": "value",
    "parameters": [{
        "id": "valore"
    }]
}

You can load the data and print it to the console with the lines below:

import json
from pprint import pprint
with open('data.json') as data_file:
    data_item = json.load(data_file)
pprint(data_item)

Expected output for pprint(data_item):

{'maps': [{'id': 'blabla', 'iscategorical': '0'},
          {'id': 'blabla', 'iscategorical': '0'}],
 'masks': [{'id': 'valore'}],
 'om_points': 'value',
 'parameters': [{'id': 'valore'}]}

Expected output for print(data_item['parameters'][0]['id']):

valore
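As one of the comments above suggests, json.dumps with indent and sort_keys is an alternative to pprint for displaying parsed data. A minimal sketch, using a shortened copy of the structure above built inline rather than read from data.json:

```python
import json

data_item = {
    "parameters": [{"id": "valore"}],
    "om_points": "value",
    "masks": [{"id": "valore"}],
}

# indent controls nesting depth per level; sort_keys orders keys alphabetically.
pretty = json.dumps(data_item, indent=2, sort_keys=True)
print(pretty)
```

Unlike pprint, the result is itself valid JSON, so it can be written back to a file or pasted into a validator.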
np_6
Ramapati Maurya
  • If we would like add a column to count how many observations does "maps" have, how could we write this function? – Chenxi Jun 07 '18 at 17:24
6

There are two scenarios for this kind of parsing:

  1. Parsing JSON from a file at a path on the local system.
  2. Parsing JSON from a remote URL.

For a file, you can use the following:

import json

# Keep the parsed result in its own name so the json module is not shadowed.
with open('/path/to/file.json') as f:
    data = json.load(f)
value = data['key']
print(value)

This article explains the full parsing and how to get values in both scenarios: Parsing JSON using Python
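The second scenario (a remote URL) is not shown above. A minimal sketch using only the standard library might look like this; load_json_from_url is a hypothetical helper name, the response is assumed to be UTF-8 encoded JSON, and the URL in the usage comment is a placeholder:

```python
import json
from urllib.request import urlopen


def load_json_from_url(url, timeout=10):
    """Fetch `url` and decode the response body as JSON (assumes UTF-8)."""
    with urlopen(url, timeout=timeout) as response:
        body = response.read().decode("utf-8")
    return json.loads(body)


# Usage (placeholder URL, not a real endpoint):
# data = load_json_from_url("https://example.com/data.json")
# print(data["key"])
```

Network calls can fail or time out, so real code would typically wrap the call in a try/except for urllib.error.URLError.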

Pikamander2
Bibin Wilson
3

As a Python 3 user, I find the difference between the load and loads methods important, especially when reading JSON data from a file.

As stated in the docs:

json.load:

Deserialize fp (a .read()-supporting text file or binary file containing a JSON document) to a Python object using this conversion table.

json.loads:

Deserialize s (a str, bytes or bytearray instance containing a JSON document) to a Python object using this conversion table.

The json.load method can read an opened JSON document directly, whether the file was opened in text or binary mode.

with open('./recipes.json') as data:
  all_recipes = json.load(data)

As a result, your JSON data is available as Python objects, converted according to this conversion table:

https://docs.python.org/3.7/library/json.html#json-to-py-table
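To illustrate the binary case described in the quoted documentation, here is a small sketch. It assumes Python 3.6 or later; io.BytesIO stands in for a file opened with mode 'rb', and the recipe content is made up:

```python
import io
import json

# UTF-8 encoded bytes, as you would get from a file opened in binary mode.
raw = '{"name": "crème brûlée"}'.encode("utf-8")

from_bytes = json.loads(raw)                   # bytes passed directly to loads
from_binary_file = json.load(io.BytesIO(raw))  # binary file-like object passed to load

print(from_bytes == from_binary_file)
```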

muratgozel
  • How is this an answer to the question asked? The user was using the right method to load json file. – Raj006 Nov 04 '19 at 07:20