8

This JSON output is from a MongoDB aggregate query. I essentially need to parse the nested data JSON down to the following to the 'total' and '_id' values.

{
'ok': 1.0, 
'result': [
            {
                'total': 142250.0, 
                '_id': 'BC'
            }, 
            {
                'total': 210.88999999999996,
                 '_id': 'USD'
            }, 

            {
                'total': 1065600.0, 
                '_id': 'TK'
            }
            ]
}

I've tried 5 different techniques to get what I need from it, however I've run into issues using the json and simplejson modules.

Ideally, the output will be something like this:

142250.0, BC
210.88999999999996, USD
1065600.0, TK
martineau
  • 112,593
  • 23
  • 157
  • 280
unique_beast
  • 1,229
  • 2
  • 11
  • 23
  • Can you post the code from your attempts with the two modules? – ely Nov 01 '13 at 15:25
  • As mentioned in cpburnz's answer: the problem is the single vs. double quote characters. You need double-quotes, and then a simple call to `json.loads` will work (your JSON string loads fine for me after that switch and is easy to parse). – ely Nov 01 '13 at 15:42

5 Answers5

11

NOTE: Your JSON response from MongoDB is not actually valid. JSON requires double-quotes ("), not single-quotes (').

I'm not sure why your response has single-quotes instead of double-quotes but from the looks of it you can replace them and then just use the built-in json module:

from __future__ import print_function
import json

response = """{
    'ok': 1.0, 
    'result': [
        {
            'total': 142250.0, 
            '_id': 'BC'
        }, 
        {
            'total': 210.88999999999996,
             '_id': 'USD'
        }, 

        {
            'total': 1065600.0, 
            '_id': 'TK'
        }
        ]
}"""

# JSON requires double-quotes, not single-quotes.
response = response.replace("'", '"')
response = json.loads(response)
for doc in response['result']:
    print(doc['_id'], doc['total'])
Uyghur Lives Matter
  • 17,261
  • 40
  • 105
  • 135
  • I think you mean `json` requires double-quotes in the comment line. – ely Nov 01 '13 at 15:34
  • Yes, I wrote that backwards. Fixed. – Uyghur Lives Matter Nov 01 '13 at 15:35
  • Another good idea is to make unicode strings. It's not always needed, but I find it relieves a lot of headaches with JSON. – ely Nov 01 '13 at 15:36
  • I'm not sure why you are rolling back edits. It's OK to let other people edit your answer. In fact, it's encouraged. – ely Nov 01 '13 at 15:45
  • I didn't realize you were making edits. I noticed that some things disappeared a couple times after I fixed various typos so I re-edited to put back what disappeared (probably due to conflicting edits?) – Uyghur Lives Matter Nov 01 '13 at 15:46
  • It's fine to add additional edits, but it's good to check whether the most recent edit by someone else had made a good change, and then try to keep that change while adding back your disappeared change. It's not a big deal though. The only time it matters is when a moderator makes gross, sweeping changes to a question and then uses moderator status to rally support from others on the meta site to ensure the original text or edits desired by the original author are unable to come back. That can be frustrating, but it's uncommon. – ely Nov 01 '13 at 15:49
  • don't use an unconditional replace on a string with a structured data (a literal for Python dictionary in this case). It might work for some time until it *silently* corrupts the data within. Use an appropriate parser instead e.g., [`ast.literal_eval()`](http://stackoverflow.com/a/19730573/4279) – jfs Nov 01 '13 at 16:09
  • @J.F. Fair enough, `ast.iteral_eval()` would be safer solution (really getting a proper response from MongoDB would be best). Given the data which only contains currency code strings and numeric values, a search and replace is sufficient. This would incorrectly convert an embedded `\'` into a `\"` (e.g., `that\'s` would become `that\"s`. Being pedantic, if the response contained a `Date` or `ObjectId`, `literal_eval()` would fail. But still that's probably better if they're not needed. – Uyghur Lives Matter Nov 01 '13 at 16:38
  • Getting this response from trying the above: >>> for doc in response['result']: print doc['_id'], doc['total'] SyntaxError: invalid syntax – unique_beast Nov 01 '13 at 18:14
  • @Andrew In python 3 print is a function instead of a statement. It's fixed to support both now. – Uyghur Lives Matter Nov 01 '13 at 23:32
1

The response you are getting from the mongodb seems to be the compatible to put for the dictionary type object. as

{
    'ok': 1.0,  'result': [
        {
            'total': 142250.0, 
            '_id': 'BC'
        }, 
        {
            'total': 210.88999999999996,
             '_id': 'USD'
        }, 
        {
            'total': 1065600.0, 
            '_id': 'TK'
        }
    ]
}

Instead of putting it into multiline string and replacing single quotes in double quotes, can't we directly assign it to the dict type object. and perform further operation on it like:

json_data = {
    'ok': 1.0,
    'result':
        [
            {
                'total': 142250.0,
                '_id': 'BC'
            },
            {
                'total': 210.88999999999996,
                '_id': 'USD'
            },
            {
                'total': 1065600.0,
                '_id': 'TK'
            }
    ]
}

And:

for data in json_data['result']:
    print(data['total'], data['_id'])
sɐunıɔןɐqɐp
  • 2,877
  • 15
  • 33
  • 38
-1
import json

data = json.loads(mongo_db_json)
result = data['result']
for value_dict in result:
    print '{0}, {1}'.format(value['total'], value['_id'])

This should work

elssar
  • 5,301
  • 6
  • 45
  • 70
  • The OP alluded to having already tried this (I assume this was similar to the code that was mentioned but not posted). The real issue was the use of single-quotes. – ely Nov 01 '13 at 15:43
  • Actually the OP said that he tried 5 different ways, didn't mention any specific method. But fair enough, couldn't have been such a simple thing. – elssar Nov 01 '13 at 16:12
  • Yeah, this didn't work for the reason mentioned in the first answer. PyMongo is only giving singular quotes in its output. – unique_beast Nov 01 '13 at 17:49
-1

Your example text is not valid JSON text. JSON string must start with a " quotation mark, not '; but it seems a valid Python literal that you can parse with ast.literal_eval() function:

import ast

data = ast.literal_eval(input_string)
for item in data["result"]:
    print("{total}, {_id}".format(**item))

Output

142250.0, BC
210.89, USD
1065600.0, TK

A better way might be to fix the querying process to get valid JSON and use json module to parse it.

jfs
  • 374,366
  • 172
  • 933
  • 1,594
  • Way bad solution if you're building code that is part of a system that relies on JSON. I can't emphasize enough how unbelievably bad it would be to find code that attempted to get around a JSON parsing error by finding a *different* parsing tool that would parse the ill-formed JSON string. The number of situations where that is an acceptable way to address this is so small that it merits downvoting. – ely Nov 01 '13 at 17:58
  • All of the values are singular quote responses. – unique_beast Nov 01 '13 at 18:17
  • >>> data = ast.literal_eval(str(response)) >>> for item in data["result"]: print("{total}, {_id}".format(**item)) – unique_beast Nov 01 '13 at 18:17
  • That will have to do for now, though I would like to actually get JSON out of my Mongo instance... – unique_beast Nov 01 '13 at 18:18
  • 1
    @EMS: I don't see json data in the question. Therefore I don't use json parser to parse it. I've edited the answer to mention explicitly that the input should be changed to json – jfs Nov 01 '13 at 23:03
  • The string literal is intended to be well-formed JSON as the OP says. Reading that string literally and coming up with a way to parse it outside of the other constraints that may be present in the system is unhelpful. Since other answers already addressed this, your answer is better deleted than edited. – ely Nov 03 '13 at 00:29
  • @EMS: if input is JSON then `json` module can be used. End of story. But the input is not JSON. if OP can't/won't fix the input that looks like a Python literal then `ast.literal_eval` is a safer alternative than a blind `.replace` as [I've commented already](http://stackoverflow.com/questions/19729710/parsing-nested-json-using-python/19730573?noredirect=1#comment29314372_19729976). – jfs Nov 03 '13 at 06:57
  • I understand. Your first two sentences are the whole key. Input is *supposed to be JSON* so use the `json` module. End of story. Hence this answer is not needed. Simply adding a comment to the OP that linked to the other question where you provided the `ast` solution is enough. Duplicating that effort into an answer here that is unrelated to the JSON double-quote issue is unhelpful. – ely Nov 03 '13 at 15:45
  • @EMS: ok, I've got your point: you think that actual data presented in the question being non-JSON is a non-issue that deserves at most a comment and could be fixed by such sloppy methods as global `.replace` on the data. – jfs Nov 05 '13 at 13:28
  • No. You haven't got my point. The fact that the actual data is non-JSON *when it's supposed to be JSON* **is** the problem. Finding a way to make that string be JSON *must* be part of any solution, either at the database level (the best way) or with a call to `replace` (another way, less robust). But regardless, the point is that the system (with possibly other components that need to print out valid JSON or write valid JSON to disk...) has a standard (JSON) that shouldn't be ignored just because there's an easy way to parse some badly formed literal in some one-off place in the code. – ely Nov 05 '13 at 14:31
  • @EMS: I agree, that is why I wrote: "if OP **can't/won't** fix the input that looks like a Python literal then ast.literal_eval is a safer alternative than a blind .replace". – jfs Nov 05 '13 at 14:32
  • Right, and that part can just be a comment. Not at all needed as an answer. – ely Nov 05 '13 at 14:33
  • @EMS: and here we disagree. I consider traversing a dictionary to be a trivial matter but a more robust handling of the input data to be essential. – jfs Nov 05 '13 at 14:37
  • That's great and the mention of `ast.literal_eval` is helpful. Just not as a separate answer (since it doesn't answer the question, only offers a work-around if one is willing to ignore the central constraint of the question). Your point is a valid one, it just should be present only in the initial comment you made. This answer is practically a straight copy/paste from the other question where you also answered about using `ast.literal_eval` so, in many ways, it's wasted space sitting as an answer here. That's why I think it should just be present on this page as a comment to the OP only. – ely Nov 05 '13 at 14:41
-2

This should do.

import json

def parse_json(your_json):
    to_dict = json.loads(your_json)
    for item in to_dict['results']:
        print item['total']
shshank
  • 2,521
  • 1
  • 16
  • 27
  • The OP alluded to having already tried this (I assume this was similar to the code that was mentioned but not posted). The real issue was the use of single-quotes. – ely Nov 01 '13 at 15:44