2

I am having some trouble getting the following dataframe to a JSON structure. I've tried a few things but can't quite get to the last bit. So I have a data frame with the following

  serialNumber |    date    | part  | value | name
 --------------|------------|-------|-------|---------------- 
  ABC0001      | 01/10/2019 | Part1 | ABC1  | ABC            
  ABC0001      | 01/10/2019 | Part1 | ABC2  | XYZ            
  ABC0001      | 02/10/2019 | Part2 | ABC3  | ASF            
  ABC0001      | 02/10/2019 | Part2 | ABC4  | TSR    

And need it in the format of

  { "SerialNumber": "ABC001",
    "detail": [  { "part": "Part1",
                   "date":"01/10/2019",
                   "extras": [  { "value": "ABC1",
                                  "name": "ABC"
                                },
                                { "value": "ABC2",
                                  "name": "XYZ"
                                }]
                 },
                 { "part": "Part2",
                   "date":"02/10/2019",
                   "extras": [   { "value": "ABC3",
                                  "name": "ASF"
                                },
                                { "value": "ABC4",
                                  "name": "TSR"
                                }]
              ]
     }  

So grouping serialnumber, then data and part, then value and name. I've had a look at some answers here and here, the last one helped a lot

df.groupby(['serialNumber', 'Part']).apply(
        lambda r: r[['Value', 'identifierName']].to_dict(orient='records')
    ).unstack('serialNumber').apply(lambda s: [
        {s.index.name: idx, 'detail=': value}
        for idx, value in s.items()]
    ).to_json(orient='records')

which gives me

[
   {
      "ABC0001":{
         "Part":"Part1",
         "detail=":[
            {
               "Value":"ABC1",
               "identifierName":"ABC"
            },
            {
               "Value":"ABC2",
               "identifierName":"XYZ"
            }
         ]
      }
   },
   {
      "ABC0001":{
         "Part":"Part2",
         "detail=":[
            {
               "Value":"ABC3",
               "identifierName":"ASF"
            },
            {
               "Value":"ABC4",
               "identifierName":"TSR"
            }
         ]
      }
   }
]

but breaks down when I add Date, and doesn't show the label of serial number Suggestions?? tips?

Jon
  • 3,827
  • 3
  • 11
  • 28

1 Answers1

0

There is no default function in pandas to solve this.

This nesting code iterates through each level of the MultIndex, adding layers to the dictionary until the deepest layer is assigned to the Series value.

This will work for any number of nesting folds:

grouped = df.set_index(['serialNumber', 'Part'])

import json

levels = grouped.ndim
dicts = [{} for i in range(levels)]
last_index = None

for index,value in enumerate(grouped.itertuples(), 1):

    if not last_index:
        last_index = index

    for (ii,(i,j)) in enumerate(zip(index, last_index)):
        if not i == j:
            ii = levels - ii -1
            dicts[:ii] =  [{} for _ in dicts[:ii]]
            break

    for i, key in enumerate(reversed(index)):
        dicts[i][key] = value
        value = dicts[i]

    last_index = index


result = json.dumps(dicts[-1])
pissall
  • 6,460
  • 1
  • 20
  • 39
  • Thanks for the update, I've just ran a test and got 'Cannot access callable attribute 'itertuples' of 'DataFrameGroupBy' objects, try using the 'apply' method' I'm just reading the csv file into a dataframe, then running the above code? Suggestions? – Jon Oct 11 '19 at 11:31
  • Yes I understood the problem and fixed it – pissall Oct 11 '19 at 11:35
  • Sorry, one more, I;ve tried to debug it, now getting ValueError: too many values to unpack (expected 2) – Jon Oct 11 '19 at 12:02
  • @Jonee Which line? – pissall Oct 11 '19 at 12:20
  • for index,value in grouped.itertuples() – Jon Oct 11 '19 at 12:27
  • @Jonee Sorry I missed an `enumerate` – pissall Oct 11 '19 at 16:16
  • sorry I'm getting a error - zip argument #1 must support iteration on the line ---> 12 for (ii,(i,j)) in enumerate(zip(index, last_index)): – Jon Oct 12 '19 at 20:38