0

I have the following Pandas dataframe:

pd.DataFrame({'ID': {0: 40606,
  1: 19433,
  2: 34933,
  3: 18297,
  4: 46228,
  5: 36038,
  6: 16143,
  7: 12308,
  8: 22986,
  9: 28871,
  10: 36818,
  11: 31890,
  12: 42947,
  13: 10110,
  14: 25186,
  15: 43999,
  16: 28460,
  17: 48862,
  18: 24836,
  19: 31806,
  20: 34473,
  21: 38459,
  22: 36586,
  23: 14140,
  24: 49353,
  25: 44097,
  26: 14852,
  27: 36944,
  28: 44621,
  29: 40944,
  30: 10477,
  31: 22217,
  32: 34648,
  33: 27399,
  34: 14917,
  35: 26678,
  36: 49312,
  37: 44874,
  38: 21136,
  39: 40321,
  40: 49900,
  41: 11644,
  42: 37934,
  43: 24800,
  44: 22028,
  45: 13390,
  46: 34589,
  47: 12490,
  48: 33239,
  49: 35289,
  50: 28609,
  51: 37062,
  52: 45509,
  53: 23642,
  54: 12108,
  55: 21251,
  56: 36965,
  57: 21814,
  58: 11152,
  59: 30479,
  60: 16660,
  61: 31351,
  62: 47235,
  63: 48265,
  64: 46502,
  65: 27535,
  66: 17125,
  67: 30240,
  68: 26522,
  69: 21117,
  70: 14616,
  71: 45137,
  72: 31875,
  73: 17964,
  74: 41279,
  75: 25983,
  76: 37298,
  77: 40478,
  78: 21939,
  79: 25310,
  80: 24907,
  81: 24884,
  82: 31009,
  83: 25270,
  84: 48393,
  85: 33631,
  86: 25470,
  87: 18581,
  88: 39614,
  89: 24353,
  90: 38089,
  91: 30369,
  92: 18381,
  93: 48787,
  94: 40100,
  95: 19084,
  96: 14243,
  97: 40782,
  98: 23796,
  99: 34305},
 'Date': {0: '02,01,2021',
  1: '02,01,2021',
  2: '02,01,2021',
  3: '02,01,2021',
  4: '02,01,2021',
  5: '02,01,2021',
  6: '02,01,2021',
  7: '02,01,2021',
  8: '02,01,2021',
  9: '03,01,2021',
  10: '03,01,2021',
  11: '03,01,2021',
  12: '03,01,2021',
  13: '03,01,2021',
  14: '03,01,2021',
  15: '03,01,2021',
  16: '03,01,2021',
  17: '03,01,2021',
  18: '03,01,2021',
  19: '03,01,2021',
  20: '04,01,2021',
  21: '04,01,2021',
  22: '04,01,2021',
  23: '04,01,2021',
  24: '04,01,2021',
  25: '04,01,2021',
  26: '04,01,2021',
  27: '04,01,2021',
  28: '04,01,2021',
  29: '04,01,2021',
  30: '05,01,2021',
  31: '05,01,2021',
  32: '05,01,2021',
  33: '05,01,2021',
  34: '05,01,2021',
  35: '05,01,2021',
  36: '05,01,2021',
  37: '05,01,2021',
  38: '05,01,2021',
  39: '05,01,2021',
  40: '05,01,2021',
  41: '05,01,2021',
  42: '05,01,2021',
  43: '05,01,2021',
  44: '05,01,2021',
  45: '05,01,2021',
  46: '06,01,2021',
  47: '06,01,2021',
  48: '06,01,2021',
  49: '06,01,2021',
  50: '06,01,2021',
  51: '06,01,2021',
  52: '06,01,2021',
  53: '06,01,2021',
  54: '06,01,2021',
  55: '06,01,2021',
  56: '06,01,2021',
  57: '06,01,2021',
  58: '06,01,2021',
  59: '06,01,2021',
  60: '06,01,2021',
  61: '07,01,2021',
  62: '07,01,2021',
  63: '07,01,2021',
  64: '07,01,2021',
  65: '07,01,2021',
  66: '07,01,2021',
  67: '07,01,2021',
  68: '07,01,2021',
  69: '07,01,2021',
  70: '07,01,2021',
  71: '07,01,2021',
  72: '07,01,2021',
  73: '07,01,2021',
  74: '07,01,2021',
  75: '07,01,2021',
  76: '07,01,2021',
  77: '07,01,2021',
  78: '07,01,2021',
  79: '07,01,2021',
  80: '07,01,2021',
  81: '07,01,2021',
  82: '07,01,2021',
  83: '08,01,2021',
  84: '08,01,2021',
  85: '08,01,2021',
  86: '08,01,2021',
  87: '08,01,2021',
  88: '08,01,2021',
  89: '08,01,2021',
  90: '08,01,2021',
  91: '08,01,2021',
  92: '08,01,2021',
  93: '08,01,2021',
  94: '08,01,2021',
  95: '08,01,2021',
  96: '08,01,2021',
  97: '08,01,2021',
  98: '08,01,2021',
  99: '08,01,2021'}})

You will notice that the dates are repeated for several rows, and then continues to other dates (also repeated). I would like to 'distinct count' the occurence of ID values, for the repeated dates, and sum them up into a new column.

The final columns should have non-repeated dates and only counts of IDs found for those dates.

For example, how many IDs have occured for the date of 3rd Jan 2021 ? I have tried the following code, but it doesn't result in a correct set of values:

df.ID.rolling.sum().shift()
Dinesh
  • 590
  • 1
  • 7
  • 27

0 Answers0