0

I have the following problem: I have a pandas DataFrame with a column 'features' and another column 'VOTES'. 'VOTES' is numeric, and 'features' is a string whose values repeat across rows. I want to group by 'features' and sum the values of 'VOTES' within each group.

Initial dataframe:

+----------+---------+
| features | VOTES   |
+----------+---------+
| A        | 4       |
+----------+---------+
| V        | 3       |
+----------+---------+
| A        | 2       |
+----------+---------+
| C        | 9       |
+----------+---------+

I tried the following, but I got NaN values in the VOTES column:

dataframe_clusters['VOTES'] = dataframe_clusters.groupby('features')['VOTES'].sum()

I want to get the following result:

+----------+---------+
| features | VOTES   |
+----------+---------+
| A        | 6       |
+----------+---------+
| V        | 3       |
+----------+---------+
| C        | 9       |
+----------+---------+
jartymcfly

3 Answers

1

You can do it this way:

dataframe_clusters.groupby('features').sum().reset_index()

Output:

  features  VOTES
0        A      6
1        C      9
2        V      3
Joe
0

You can add reset_index or pass the parameter as_index=False. To keep the values of features in their order of first appearance instead of sorting them, you can also pass sort=False:

df = dataframe_clusters.groupby('features', sort=False)['VOTES'].sum().reset_index()

df = dataframe_clusters.groupby('features', as_index=False, sort=False)['VOTES'].sum()

print (df)
  features  VOTES
0        A      6
1        V      3
2        C      9

If you want to assign the result back to a column, you can use GroupBy.transform, which returns a Series of aggregated values with the same size as the original DataFrame:

dataframe_clusters['VOTES'] = dataframe_clusters.groupby('features')['VOTES'].transform('sum')
print (dataframe_clusters)

  features  VOTES
0        A      6
1        V      3
2        A      6
3        C      9
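
As a rough sketch of why the assignment in the question produced NaN: the aggregated Series is indexed by the group labels, while the DataFrame is indexed by row numbers, and pandas aligns on index labels when assigning a Series to a column. The names broken, sums and per_row below are only illustrative, and the data is rebuilt from the table in the question.

```python
import pandas as pd

# Same data as the table in the question.
dataframe_clusters = pd.DataFrame({'features': ['A', 'V', 'A', 'C'],
                                   'VOTES': [4, 3, 2, 9]})

# The aggregated Series is indexed by the group labels 'A', 'C', 'V'.
sums = dataframe_clusters.groupby('features')['VOTES'].sum()

# Assigning it to a column aligns on the index; the row labels 0..3
# match none of the group labels, so every value becomes NaN.
broken = dataframe_clusters.copy()
broken['VOTES'] = sums
print(broken['VOTES'].isna().all())  # True

# transform keeps the original row index, so the values line up row by row.
per_row = dataframe_clusters.groupby('features')['VOTES'].transform('sum')
print(per_row.tolist())  # [6, 3, 6, 9]
```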
jezrael
0

It's not really clear from your question what you need in the end. The grouping you're doing is OK, but for some reason you're assigning the result to a column of the same DataFrame. I'm guessing that you need a join in the end. Check this:

import pandas as pd
df = pd.DataFrame(data={'features':['A','V','A','C'], 'VOTES':[4,3,2,9]})
totals = df.groupby('features').sum()
print(df)
print(totals)
joined = df.join(totals, on='features', rsuffix='_total')
print(joined)

It will give you this:

   VOTES features
0      4        A
1      3        V
2      2        A
3      9        C
          VOTES
features       
A             6
C             9
V             3
   VOTES features  VOTES_total
0      4        A            6
1      3        V            3
2      2        A            6
3      9        C            9
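
If only the single totals column is needed, a possible alternative to the join is to look up each row's group total with Series.map. This is just a sketch under the same assumptions as above (same sample data, and VOTES_total as the name of the new column), not the only way to do it.

```python
import pandas as pd

df = pd.DataFrame(data={'features': ['A', 'V', 'A', 'C'], 'VOTES': [4, 3, 2, 9]})

# Series of per-group totals, indexed by the values of 'features'.
totals = df.groupby('features')['VOTES'].sum()

# For every row, look up the total that belongs to its 'features' value.
df['VOTES_total'] = df['features'].map(totals)
print(df['VOTES_total'].tolist())  # [6, 3, 6, 9]
```

The original VOTES column is left untouched here, so both the per-row values and the per-group totals stay available.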