43

i need to format the contents of a Json file in a certain format in a pandas DataFrame so that i can run pandassql to transform the data and run it through a scoring model.

file = C:\scoring_model\json.js (contents of 'file' are below)

{
"response":{
  "version":"1.1",
  "token":"dsfgf",
   "body":{
     "customer":{
         "customer_id":"1234567",
         "verified":"true"
       },
     "contact":{
         "email":"mr@abc.com",
         "mobile_number":"0123456789"
      },
     "personal":{
         "gender": "m",
         "title":"Dr.",
         "last_name":"Muster",
         "first_name":"Max",
         "family_status":"single",
         "dob":"1985-12-23",
     }
   }
 }

I need the dataframe to look like this (obviously all values on same row, tried to format it best as possible for this question):

version | token | customer_id | verified | email      | mobile_number | gender |
1.1     | dsfgf | 1234567     | true     | mr@abc.com | 0123456789    | m      |

title | last_name | first_name |family_status | dob
Dr.   | Muster    | Max        | single       | 23.12.1985

I have looked at all the other questions on this topic, have tried various ways to load Json file into pandas

`with open(r'C:\scoring_model\json.js', 'r') as f:`
    c = pd.read_json(f.read())

 `with open(r'C:\scoring_model\json.js', 'r') as f:`
    c = f.readlines()

tried pd.Panel() in this solution Python Pandas: How to split a sorted dictionary in a column of a dataframe

with dataframe results from [yo = f.readlines()] thought about trying to split contents of each cell based on ("") and find a way to put the split contents into different columns but no luck so far. Your expertise is greatly appreciated. Thank you in advance.

Community
  • 1
  • 1
figgy
  • 545
  • 2
  • 5
  • 11

1 Answers1

64

If you load in the entire json as a dict (or list) e.g. using json.load, you can use json_normalize:

In [11]: d = {"response": {"body": {"contact": {"email": "mr@abc.com", "mobile_number": "0123456789"}, "personal": {"last_name": "Muster", "gender": "m", "first_name": "Max", "dob": "1985-12-23", "family_status": "single", "title": "Dr."}, "customer": {"verified": "true", "customer_id": "1234567"}}, "token": "dsfgf", "version": "1.1"}}

In [12]: df = pd.json_normalize(d)

In [13]: df.columns = df.columns.map(lambda x: x.split(".")[-1])

In [14]: df
Out[14]:
        email mobile_number customer_id verified         dob family_status first_name gender last_name title  token version
0  mr@abc.com    0123456789     1234567     true  1985-12-23        single        Max      m    Muster   Dr.  dsfgf     1.1
Curious Watcher
  • 411
  • 4
  • 10
Andy Hayden
  • 328,850
  • 93
  • 598
  • 514
  • 3
    Brilliant! Thank you very much Andy – figgy Dec 18 '15 at 10:26
  • 1
    The json data gets loaded as dtype 'object'. In order to transform the data with pandasql queries, the data in the columns need to be in various data types (i.e 'int', 'str', 'timestamp', etc.), what is best way to do this? do you recommend the approach found here? [link](http://stackoverflow.com/questions/15891038/pandas-change-data-type-of-columns) – figgy Dec 18 '15 at 18:07
  • following up on my comment above, ended up using cast(variable_x as integer) to transform the data in pandasql – figgy Dec 28 '15 at 16:28
  • For pandas 1.0.0: pd.json_normalize(d) – Mega Feb 01 '20 at 20:25
  • Can we append tag name to columns to keep track. Example: contact_email, contact_mobileNum so on. In case of big json that would be very handy. – MAC Feb 14 '20 at 18:50