I have the following Pandas dataframe:
pd.DataFrame({'ID': {0: 40606,
1: 19433,
2: 34933,
3: 18297,
4: 46228,
5: 36038,
6: 16143,
7: 12308,
8: 22986,
9: 28871,
10: 36818,
11: 31890,
12: 42947,
13: 10110,
14: 25186,
15: 43999,
16: 28460,
17: 48862,
18: 24836,
19: 31806,
20: 34473,
21: 38459,
22: 36586,
23: 14140,
24: 49353,
25: 44097,
26: 14852,
27: 36944,
28: 44621,
29: 40944,
30: 10477,
31: 22217,
32: 34648,
33: 27399,
34: 14917,
35: 26678,
36: 49312,
37: 44874,
38: 21136,
39: 40321,
40: 49900,
41: 11644,
42: 37934,
43: 24800,
44: 22028,
45: 13390,
46: 34589,
47: 12490,
48: 33239,
49: 35289,
50: 28609,
51: 37062,
52: 45509,
53: 23642,
54: 12108,
55: 21251,
56: 36965,
57: 21814,
58: 11152,
59: 30479,
60: 16660,
61: 31351,
62: 47235,
63: 48265,
64: 46502,
65: 27535,
66: 17125,
67: 30240,
68: 26522,
69: 21117,
70: 14616,
71: 45137,
72: 31875,
73: 17964,
74: 41279,
75: 25983,
76: 37298,
77: 40478,
78: 21939,
79: 25310,
80: 24907,
81: 24884,
82: 31009,
83: 25270,
84: 48393,
85: 33631,
86: 25470,
87: 18581,
88: 39614,
89: 24353,
90: 38089,
91: 30369,
92: 18381,
93: 48787,
94: 40100,
95: 19084,
96: 14243,
97: 40782,
98: 23796,
99: 34305},
'Date': {0: '02,01,2021',
1: '02,01,2021',
2: '02,01,2021',
3: '02,01,2021',
4: '02,01,2021',
5: '02,01,2021',
6: '02,01,2021',
7: '02,01,2021',
8: '02,01,2021',
9: '03,01,2021',
10: '03,01,2021',
11: '03,01,2021',
12: '03,01,2021',
13: '03,01,2021',
14: '03,01,2021',
15: '03,01,2021',
16: '03,01,2021',
17: '03,01,2021',
18: '03,01,2021',
19: '03,01,2021',
20: '04,01,2021',
21: '04,01,2021',
22: '04,01,2021',
23: '04,01,2021',
24: '04,01,2021',
25: '04,01,2021',
26: '04,01,2021',
27: '04,01,2021',
28: '04,01,2021',
29: '04,01,2021',
30: '05,01,2021',
31: '05,01,2021',
32: '05,01,2021',
33: '05,01,2021',
34: '05,01,2021',
35: '05,01,2021',
36: '05,01,2021',
37: '05,01,2021',
38: '05,01,2021',
39: '05,01,2021',
40: '05,01,2021',
41: '05,01,2021',
42: '05,01,2021',
43: '05,01,2021',
44: '05,01,2021',
45: '05,01,2021',
46: '06,01,2021',
47: '06,01,2021',
48: '06,01,2021',
49: '06,01,2021',
50: '06,01,2021',
51: '06,01,2021',
52: '06,01,2021',
53: '06,01,2021',
54: '06,01,2021',
55: '06,01,2021',
56: '06,01,2021',
57: '06,01,2021',
58: '06,01,2021',
59: '06,01,2021',
60: '06,01,2021',
61: '07,01,2021',
62: '07,01,2021',
63: '07,01,2021',
64: '07,01,2021',
65: '07,01,2021',
66: '07,01,2021',
67: '07,01,2021',
68: '07,01,2021',
69: '07,01,2021',
70: '07,01,2021',
71: '07,01,2021',
72: '07,01,2021',
73: '07,01,2021',
74: '07,01,2021',
75: '07,01,2021',
76: '07,01,2021',
77: '07,01,2021',
78: '07,01,2021',
79: '07,01,2021',
80: '07,01,2021',
81: '07,01,2021',
82: '07,01,2021',
83: '08,01,2021',
84: '08,01,2021',
85: '08,01,2021',
86: '08,01,2021',
87: '08,01,2021',
88: '08,01,2021',
89: '08,01,2021',
90: '08,01,2021',
91: '08,01,2021',
92: '08,01,2021',
93: '08,01,2021',
94: '08,01,2021',
95: '08,01,2021',
96: '08,01,2021',
97: '08,01,2021',
98: '08,01,2021',
99: '08,01,2021'}})
You will notice that the dates are repeated for several rows, and then continues to other dates (also repeated). I would like to 'distinct count' the occurence of ID values, for the repeated dates, and sum them up into a new column.
The final columns should have non-repeated dates and only counts of IDs found for those dates.
For example, how many IDs have occured for the date of 3rd Jan 2021 ? I have tried the following code, but it doesn't result in a correct set of values:
df.ID.rolling.sum().shift()