0

I am trying to convert multi nest JSON to flat, by using the built-in function in panda's, and a few solutions available on stack, was able to normalize up till the first level or the flattening creating unwanted result or modifying them not giving desired output, any insights, welcomed.

Sample JSON:


{
  "Records": [
    {
      "Name": "Student1",
      "Result": "Pass",
      "Marks": [
        {
          "Sub1": "50",
          "Sub2": "40",
          "YOP": [
            {
              "prim": "2010",
              "sch": "abc"
            },
            {
              "prim": "2010",
              "sch": "abc"
            }
          ]
        }
      ]
    },
    {
      "Name": "Stu2",
      "Result": "Pass",
      "Marks": [
        {
          "Sub1": "33",
          "Sub2": "33",
          "YOP": [
            {
              "prim": "2010",
              "sch": "def"
            },
            {
              "high": "2010",
              "sch": "abc"
            }
          ]
        }
      ]
    }
  ]
}

Current Code

from itertools import chain, starmap
import json
from itertools import islice
from pandas.io.json import json_normalize



from collections import MutableMapping
crumbs = True

def flatten(dictionary, parent_key=False, separator='.'):
    """
    Turn a nested dictionary into a flattened dictionary
    :param dictionary: The dictionary to flatten
    :param parent_key: The string to prepend to dictionary's keys
    :param separator: The string used to separate flattened keys
    :return: A flattened dictionary
    """

    items = []
    for key, value in dictionary.items():
        if crumbs: print('checking:',key)
        new_key = str(parent_key) + separator + key if parent_key else key
        if isinstance(value, MutableMapping):
            if crumbs: print(new_key,': dict found')
            if not value.items():
                if crumbs: print('Adding key-value pair:',new_key,None)
                items.append((new_key,None))
            else:
                items.extend(flatten(value, new_key, separator).items())
        elif isinstance(value, list):
            if crumbs: print(new_key,': list found')
            if len(value):
                for k, v in enumerate(value):
                    items.extend(flatten({str(k): v}, new_key).items())
            else:
                if crumbs: print('Adding key-value pair:',new_key,None)
                items.append((new_key,None))
        else:
            if crumbs: print('Adding key-value pair:',new_key,value)
            items.append((new_key, value))
    return dict(items)

    
def main():
  with open("aaa.json", "r") as f:
    data = json.loads(f.read())
  print(type(data))
  flat = flatten(data)
  print(flat)


if __name__ == '__main__':
    main()

Output

{
  "Records.0.Name": "Student1",
  "Records.0.Result": "Pass",
  "Records.0.Marks.0.Sub1": "50",
  "Records.0.Marks.0.Sub2": "40",
  "Records.0.Marks.0.YOP.0.prim": "2010",
  "Records.0.Marks.0.YOP.0.sch": "abc",
  "Records.0.Marks.0.YOP.1.high": "2012",
  "Records.0.Marks.0.YOP.1.sch": "abc",
  "Records.1.Name": "Stu2",
  "Records.1.Result": "Pass",
  "Records.1.Marks.0.Sub1": "33",
  "Records.1.Marks.0.Sub2": "33",
  "Records.1.Marks.0.YOP.0.prim": "210",
  "Records.1.Marks.0.YOP.0.sch": "def",
  "Records.1.Marks.0.YOP.1.high": "999",
  "Records.1.Marks.0.YOP.1.sch": "abc"
}

With this code, any idea how can to remove numbers, and separate it

End Expectation

{
  "Records.Name": "Student1",
  "Records.Result": "Pass",
  "Records.Marks.Sub1": "50",
  "Records.Marks.Sub2": "40",
  "Records.Marks.YOP.prim": "2010",
  "Records.Marks.YOP.sch": "abc",
  "Records.Marks.YOP.high": "2012",
  "Records.Marks.YOP.sch": "abc",
},

{
  "Records.Name": "Stu2",
  "Records.Result": "Pass",
  "Records.Marks.Sub1": "33",
  "Records.Marks.Sub2": "33",
  "Records.Marks.YOP.prim": "210",
  "Records.Marks.YOP.sch": "def",
  "Records.Marks.YOP.high": "999",
  "Records.Marks.YOP.sch": "abc"
}

a-y-u-1
  • 3
  • 2
  • Not exactly match expected output but this post may help .https://stackoverflow.com/questions/51359783/python-flatten-multilevel-nested-json#answer-51379007 – Kaz Jan 15 '22 at 14:33
  • I tried using flatten_json, its still not giving the expected result, also its making n number of coloumns – a-y-u-1 Jan 15 '22 at 15:02

1 Answers1

0
from flatten_json import flatten

records = flatten(json[0])

Input:

json = [{
  "Records": [
    {
      "Name": "Student1",
      "Result": "Pass",
      "Marks": [
        {
          "Sub1": "50",
          "Sub2": "40",
          "YOP": [
            {
              "prim": "2010",
              "sch": "abc"
            },
            {
              "prim": "2010",
              "sch": "abc"
            }
          ]
        }
      ]
    },
    {
      "Name": "Stu2",
      "Result": "Pass",
      "Marks": [
        {
          "Sub1": "33",
          "Sub2": "33",
          "YOP": [
            {
              "prim": "2010",
              "sch": "def"
            },
            {
              "high": "2010",
              "sch": "abc"
            }
          ]
        }
      ]
    }
  ]
}]

Output:

{'Records_0_Name': 'Student1',
 'Records_0_Result': 'Pass',
 'Records_0_Marks_0_Sub1': '50',
 'Records_0_Marks_0_Sub2': '40',
 'Records_0_Marks_0_YOP_0_prim': '2010',
 'Records_0_Marks_0_YOP_0_sch': 'abc',
 'Records_0_Marks_0_YOP_1_prim': '2010',
 'Records_0_Marks_0_YOP_1_sch': 'abc',
 'Records_1_Name': 'Stu2',
 'Records_1_Result': 'Pass',
 'Records_1_Marks_0_Sub1': '33',
 'Records_1_Marks_0_Sub2': '33',
 'Records_1_Marks_0_YOP_0_prim': '2010',
 'Records_1_Marks_0_YOP_0_sch': 'def',
 'Records_1_Marks_0_YOP_1_high': '2010',
 'Records_1_Marks_0_YOP_1_sch': 'abc'}

Update:

The result you were looking for:

records = [flatten(record, “.”) for record in json[0]['Records']]

Output:

[{'Name': 'Student1',
  'Result': 'Pass',
  'Marks_0_Sub1': '50',
  'Marks_0_Sub2': '40',
  'Marks_0_YOP_0_prim': '2010',
  'Marks_0_YOP_0_sch': 'abc',
  'Marks_0_YOP_1_prim': '2010',
  'Marks_0_YOP_1_sch': 'abc'},
 {'Name': 'Stu2',
  'Result': 'Pass',
  'Marks_0_Sub1': '33',
  'Marks_0_Sub2': '33',
  'Marks_0_YOP_0_prim': '2010',
  'Marks_0_YOP_0_sch': 'def',
  'Marks_0_YOP_1_high': '2010',
  'Marks_0_YOP_1_sch': 'abc'}]
shullaw
  • 101
  • 1
  • 5
  • superb, exactly what I was looking out for, is there any way I could remove the number coming in between the key .. like key name like Marks_YOP_high ? – a-y-u-1 Jan 16 '22 at 07:03
  • :param separator: string to separate dictionary keys by :param root_keys_to_ignore: set of root keys to ignore from flattening :param str replace_separators: Replace separators within keys – shullaw Jan 16 '22 at 07:21
  • sorry didn't get you, I tried modifying the function definition code, it's breaking – a-y-u-1 Jan 16 '22 at 07:24
  • Sorry for the formatting, but those are parameters to flatten. One of those parameters should help you with naming. Either that or renaming by Regex/string manipulation. – shullaw Jan 16 '22 at 07:26
  • I’ve updated the code to give output you required – shullaw Jan 16 '22 at 07:38
  • 1
    Thanks a lot, i guess need to do regex manipulate to remove numbers coming in between, don't see any other way ;) – a-y-u-1 Jan 16 '22 at 07:46