1

I am working on dataframe like this in Python Pandas:

import pandas as pd
df = pd.DataFrame({'date_time':['2019-03-29 16:04','2019-03-20 20:31' , '2019-03-26 21:02' , '2019-03-26 21:05' , '2019-03-29 16:04','2019-04-01 19:40'  , '2019-04-01 19:41'],
                   
             'item':['bed',  'lamp', 'candle', 'chair', 'bed',  'candle', 'lamp'],
             'location' :['home', 'home', 'home',   'home',  'home', 'home',   'home' ],
             'status'   :['new',  'used', 'used',   'new',   'new',  'used',   'new' ]})

df

enter image description here

and I am trying to create a summary like this dataframe:

# Expected output
# The number of records for each date_time and item, sorted by date_time

# date_time        item   count
# 2019-03-20 20:31 lamp   1
# 2019-03-26 21:02 candle 1
# 2019-03-26 21:05 chair  1
# 2019-03-29 16:04 bed    2
# 2019-04-01 19:40 candle 1
# 2019-04-01 19:41 lamp   1

I tried different aggregation methods but I could not create this dataframe.

Hamideh
  • 651
  • 1
  • 6
  • 17
  • 1
    IIUC, you need `df.groupby(['date_time','item']).agg(count=('item','count')).sort_index()` – Umar.H Oct 07 '20 at 09:53
  • 1
    @Manakin It looks good. The only problem is the column names. I want to have the column names to be date_time, item and count. – Hamideh Oct 07 '20 at 10:00
  • 1
    @Hamideh just do a reset_index() in Manakin's line of code. – LazyCoder Oct 07 '20 at 10:04
  • 1
    Second answer of dupe Or `df.groupby(['date_time','item'])['item'].size().reset_index(name='count')` – jezrael Oct 07 '20 at 10:11
  • @jezrael thanks I got distracted but was coming back to mark as dupe – Umar.H Oct 07 '20 at 11:08
  • 1
    @Manakin - No problem, too much question in this tag, sometimes not easy closing it. So if help with it is is really good. – jezrael Oct 07 '20 at 11:09

0 Answers0