50

I have an excel sheet that looks like so:

Column1 Column2 Column3
0       23      1
1       5       2
1       2       3
1       19      5
2       56      1
2       22      2
3       2       4
3       14      5
4       59      1
5       44      1
5       1       2
5       87      3

And I'm looking to extract that data, group it by column 1, and add it to a dictionary so it appears like this:

{0: [1],
1: [2,3,5],
2: [1,2],
3: [4,5],
4: [1],
5: [1,2,3]}

This is my code so far

excel = pandas.read_excel(r"e:\test_data.xlsx", sheetname='mySheet', parse_cols'A,C')
myTable = excel.groupby("Column1").groups
print myTable

However, my output looks like this:

{0: [0L], 1: [1L, 2L, 3L], 2: [4L, 5L], 3: [6L, 7L], 4: [8L], 5: [9L, 10L, 11L]}

Thanks!

LondonRob
  • 63,346
  • 33
  • 126
  • 176
SuperDougDougy
  • 561
  • 1
  • 4
  • 9

2 Answers2

80

You could groupby on Column1 and then take Column3 to apply(list) and call to_dict?

In [81]: df.groupby('Column1')['Column3'].apply(list).to_dict()
Out[81]: {0: [1], 1: [2, 3, 5], 2: [1, 2], 3: [4, 5], 4: [1], 5: [1, 2, 3]}

Or, do

In [433]: {k: list(v) for k, v in df.groupby('Column1')['Column3']}
Out[433]: {0: [1], 1: [2, 3, 5], 2: [1, 2], 3: [4, 5], 4: [1], 5: [1, 2, 3]}
Zero
  • 66,763
  • 15
  • 141
  • 151
  • 15
    Bit mean to accept this answer when @EdChum's is identical and was posted 3 minutes earlier. – LondonRob Jun 05 '15 at 19:07
  • What is the optimal way to do this for multiple features ie: Column1 is the consistent key but inplace of Column3, it can have Column4, Column5 and then merge so that Column1 still remains as a key to the dict. – Sade Mar 04 '21 at 09:05
  • 1
    wow man, your name should be Hero not Zero – Andrea Russett Mar 12 '21 at 20:25
37

According to the docs, the GroupBy.groups:

is a dict whose keys are the computed unique groups and corresponding values being the axis labels belonging to each group.

If you want the values themselves, you can groupby 'Column1' and then call apply and pass the list method to apply to each group.

You can then convert it to a dict as desired:

In [5]:

dict(df.groupby('Column1')['Column3'].apply(list))
Out[5]:
{0: [1], 1: [2, 3, 5], 2: [1, 2], 3: [4, 5], 4: [1], 5: [1, 2, 3]}

(Note: have a look at this SO question for why the numbers are followed by L)

Community
  • 1
  • 1
EdChum
  • 339,461
  • 188
  • 752
  • 538