17

I am running Python 3.6 and Pandas 0.19.2 and have a DataFrame which looks as follows:

Name      Chain        Food       Healthy  

George    McDonalds    burger     False
George    KFC          chicken    False
John      Wendys       burger     False
John      McDonalds    salad      True

I want to transform this dataframe into a dict which looks as follows:

health_data = {'George': {'McDonalds': {'Food': 'burger', 'Healthy':False},
                          'KFC':       {'Food': 'chicken', 'Healthy':False}},
               'John':   {'Wendys':    {'Food': 'burger', 'Healthy':False},
                          'McDonalds': {'Food': 'salad', 'Healthy': True}}}

My thoughts so far are:

  1. Use df.groupby to group the names column
  2. Use df.to_dict() to transform the dataframe into a dictionary along the lines of: health_data = input_data.set_index('Chain').T.to_dict()

Thoughts? Thanks up front for the help.

piRSquared
  • 265,629
  • 48
  • 427
  • 571
MRR
  • 559
  • 2
  • 5
  • 26

2 Answers2

18

I think you were very close.

Use groupby and to_dict:

df = df.groupby('Name')[['Chain','Food','Healthy']]
       .apply(lambda x: x.set_index('Chain').to_dict(orient='index'))
       .to_dict()

print (df)
{'George': {'KFC': {'Healthy': False, 'Food': 'chicken'}, 
           'McDonalds': {'Healthy': False, 'Food': 'burger'}}, 
'John': {'McDonalds': {'Healthy': True, 'Food': 'salad'},
         'Wendys': {'Healthy': False, 'Food': 'burger'}}}
richardec
  • 14,202
  • 6
  • 23
  • 49
jezrael
  • 729,927
  • 78
  • 1,141
  • 1,090
  • Thanks so much! This worked perfectly. One small question: what does the `[['Chain','Food','Healthy']]` part of the answer do? – MRR Feb 02 '17 at 09:51
  • It is filter columns, but if no other columns in `df`, it can be simplified like `df.groupby('Name').apply(lambda x: x.set_index('Chain').to_dict(orient='index')).to_dict()` – jezrael Feb 02 '17 at 11:22
  • 1
    I was trying to do this for so long, didn't think to put the `.to_dict` inside the lambda, thanks as always Jozi :) – Umar.H Jan 03 '20 at 17:30
  • What if I want to set multiple index in `apply`? Do you have any idea? – E. Zeytinci Feb 24 '22 at 20:07
9

Solution using dictionary comprehension and groupby:

{n: grp.loc[n].to_dict('index')
 for n, grp in df.set_index(['Name', 'Chain']).groupby(level='Name')}

{'George': {'KFC': {'Food': 'chicken', 'Healthy': False},
  'McDonalds': {'Food': 'burger', 'Healthy': False}},
 'John': {'McDonalds': {'Food': 'salad', 'Healthy': True},
  'Wendys': {'Food': 'burger', 'Healthy': False}}}

Solution using defaultdict:

from collections import defaultdict

d = defaultdict(dict)

for i, row in df.iterrows():
    d[row.Name][row.Chain] = row.drop(['Name', 'Chain']).to_dict()

dict(d)

{'George': {'KFC': {'Food': 'chicken', 'Healthy': False},
  'McDonalds': {'Food': 'burger', 'Healthy': False}},
 'John': {'McDonalds': {'Food': 'salad', 'Healthy': True},
  'Wendys': {'Food': 'burger', 'Healthy': False}}}
tommy.carstensen
  • 7,952
  • 13
  • 59
  • 99
piRSquared
  • 265,629
  • 48
  • 427
  • 571
  • 2
    love the use of iterrows and default dict though it is slower than the group by for a little bit. This would allow you to chain multiple loops together. Another way would be to use a multi-index (but not appropriate for this example) – Jon May 03 '21 at 19:59