1

I am a bit new to Python programming. I have a small requirement where in I need to list down all customers and their amounts for a given fortnight in a JSON format.

Currently, I have a dataframe this way:

  FortNight      Amount     Customer    Parameter
  Apr-2FN-2018   339632.00    10992     CustomerSales
  Apr-2FN-2018   27282.00     10994     CustomerSales 
  Apr-2FN-2018   26353.00     10995     CustomerSales 
  Apr-2FN-2018   24797.00     11000     CustomerSales
  Apr-2FN-2018   21093.00     10990     CustomerSales

Expected JSON:

"CustomerSales" : [                                                                
    {"FortNight" : "Apr-2FN-2018",                                                                                      
         "Details" :[
             {"Customer":  "10992","Amount" : 339632.00},                                                                                                                                
             {"Customer":  "10994","Amount" : 27282.00},
             {"Customer":  "10995","Amount" : 26353.00},  
             {"Customer":  "11000","Amount" : 24797.00},
             {"Customer":  "10990","Amount" : 21093.00}
           ]
    }
]

I tried:

dict(df.set_index('Parameter').groupby(level=0).apply(lambda  x : ast.literal_eval(x.to_json(orient = 'records', date_format = "iso"))))

It retrieves this:

 [{'CustomerSales': 
[{'Customer': '10992', 'Amount': 339632.00, 'FortNight': 'Apr-2FN-2018'}, {'Customer': '10994', 'Amount': 27282.00, 'FortNight': 'Apr-2FN-2018'},{'Customer': '10995', 'Amount': 26353.00, 'FortNight': 'Apr-2FN-2018'},
{'Customer': '11000', 'Amount': 24797.00, 'FortNight': 'Apr-2FN-2018'},
{'Customer': '10990', 'Amount': 21093.00, 'FortNight': 'Apr-2FN-2018'}]}]

I tried other ways too but in vain. Any help is welcome. Thanks in advance!

Nazim Kerimbekov
  • 4,497
  • 8
  • 31
  • 54
  • 1
    **Never** use `ast.literal_eval()` to parse JSON. It's slower, and will produce errors the moment you have boolean or null values. Besides, converting to JSON then parsing back to a dictionary means you really just wanted to produce a dictionary to begin with by a more direct path. – Martijn Pieters Jul 12 '18 at 11:47
  • 1
    You also don't need to set the index to `Parameter` when grouping by that column. Just use `df.groupby('Parameter')`. – Martijn Pieters Jul 12 '18 at 11:58

1 Answers1

3

Start by grouping on both the Parameter and FortNight columns, and using .to_dict() on the resulting grouped rows to produce the inner-most dictionaries:

details = df.groupby(['Parameter', 'FortNight']).apply(
    lambda r: r[['Customer', 'Amount']].to_dict(orient='records'))

This gives you a series with a multi-index over Parameter and FortNight, and the values are all the lists in the correct format, each entry a dictionary with Customer and Amount columns. If you need to convert the value types, do so on the r[['Customer', 'Amount']] dataframe result before calling to_dict() on it.

You can then unstack the series into a dataframe, giving you a nested Parameter -> FortNight -> details structure; the Parameter values become columns, each list of Customer / Amount dictionaries indexed by FortNight:

nested = details.unstack('Parameter')

If you turn this into a dictionary, you'd get a dictionary that's mostly correct already:

>>> pprint(grouped.unstack('Parameter').to_dict())
{'CustomerSales': {'Apr-2FN-2018': [{'Amount': 339632.0, 'Customer': '10992'},
                                    {'Amount': 27282.0, 'Customer': '10994'},
                                    {'Amount': 26353.0, 'Customer': '10995'},
                                    {'Amount': 24797.0, 'Customer': '11000'},
                                    {'Amount': 21093.0, 'Customer': '10990'}]}}

but for your format, you'd convert the values in each column to a list of {'FortNight': indexvalue, 'Details': value} mappings, then converting the whole structure to a dictionary:

output = nested.apply(lambda s: [
    {s.index.name: idx, 'Details': value}
    for idx, value in s.items()
]).to_dict('records')

This gives you your final output:

>>> pprint(output)
[{'CustomerSales': {'Details': [{'Amount': 339632.0, 'Customer': '10992'},
                                {'Amount': 27282.0, 'Customer': '10994'},
                                {'Amount': 26353.0, 'Customer': '10995'},
                                {'Amount': 24797.0, 'Customer': '11000'},
                                {'Amount': 21093.0, 'Customer': '10990'}],
                    'FortNight': 'Apr-2FN-2018'}}]

If you need a JSON document, use .to_json(orient='records') rather than .to_dict('records').

Put together as one expression:

df.groupby(['Parameter', 'FortNight']).apply(
        lambda r: r[['Customer', 'Amount']].to_dict(orient='records')
    ).unstack('Parameter').apply(lambda s: [
        {s.index.name: idx, 'Details': value}
        for idx, value in s.items()]
    ).to_json(orient='records')
Martijn Pieters
  • 963,270
  • 265
  • 3,804
  • 3,187