
I am computing a set of aggregate stats on a groupby DataFrame. For one column in particular, ios_id, I would like both a count and a distinct count. I'm not sure how to output these to two separate columns with different names. As it stands, the distinct count just overwrites the count.

How do I output both the distinct count and the count for the ios_id column to two separate columns?

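import numpy as np
import pandas as pd

# The second "ios_id" key replaces the first, so only nunique is applied.
df_new = df.groupby('video_id').agg({"ios_id": np.count_nonzero,
                                     "ios_id": pd.Series.nunique,
                                     "feed_position": np.average,
                                     "time_watched": np.sum,
                                     "video_length": np.sum}).sort_values('ios_id', ascending=False)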
metersk
  • `ios_id` is a reference to the column on which to perform the statistic. If I change the names, then there is nothing to reference. – metersk May 30 '15 at 16:12
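The overwrite described in the question comes from Python itself rather than from pandas: a dict literal with a repeated key keeps only the last value, so `agg` never receives the first `ios_id` entry. A minimal illustration, using string aggregation names in place of the NumPy/pandas functions:

```python
# A dict literal with a repeated key: Python keeps only the last value.
spec = {"ios_id": "count", "ios_id": "nunique", "time_watched": "sum"}
print(spec)  # {'ios_id': 'nunique', 'time_watched': 'sum'}
```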


Something like this should work. Note the nested dictionary structure for ios_id: the inner keys become the names of the output columns.

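import numpy as np
import pandas as pd

# Inner keys name the output columns; "nunique" counts distinct values.
# (Nested renaming was removed in pandas 1.0, which raises SpecificationError.)
df_new = df.groupby('video_id').agg({"ios_id": {"count": "count",
                                                "distinct": "nunique"},
                                     "feed_position": np.average,
                                     "time_watched": np.sum,
                                     "video_length": np.sum})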

For more details, please refer to "Naming returned columns in Pandas aggregate function".
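On current pandas, where the nested-dict form above raises an error (nested renaming was removed in pandas 1.0), two supported ways to get both counts are named aggregation (pandas 0.25 and later) and a per-column list of aggregations. The sketch below assumes a small hypothetical DataFrame with the question's column names; the output names `ios_id_count` and `ios_id_distinct` are illustrative, and `"mean"` stands in for `np.average`, which is the same thing when no weights are passed.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "video_id": ["a", "a", "a", "b", "b"],
    "ios_id": [1, 1, 2, 3, None],
    "feed_position": [1, 3, 5, 2, 4],
    "time_watched": [10, 20, 30, 5, 15],
    "video_length": [60, 60, 60, 90, 90],
})

# Option 1: named aggregation. Each keyword is an output column and each
# tuple is (source column, aggregation), so ios_id can be used twice.
named = (
    df.groupby("video_id")
    .agg(
        ios_id_count=("ios_id", "count"),       # non-null values per group
        ios_id_distinct=("ios_id", "nunique"),  # distinct non-null values
        feed_position=("feed_position", "mean"),
        time_watched=("time_watched", "sum"),
        video_length=("video_length", "sum"),
    )
    .sort_values("ios_id_count", ascending=False)
)

# Option 2: keep ios_id as the dict key and pass a list of aggregations.
# This yields two-level columns such as ("ios_id", "count"), which are
# then flattened into single names like "ios_id_count".
listed = df.groupby("video_id").agg({
    "ios_id": ["count", "nunique"],
    "feed_position": "mean",
    "time_watched": "sum",
    "video_length": "sum",
})
listed.columns = ["_".join(col) for col in listed.columns]

print(named)
print(listed)
```

The list form also addresses the concern in the comment on the question: `ios_id` stays the dictionary key, and the two results are distinguished by the second column level rather than by renaming the key. The `"_".join` naming scheme is just one possible choice.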

Alexander