I have data from a ferry crossing a strait. I have selected one year and filtered it to show only the time spent in harbor. I need to figure out how long the ferry spent in harbor in each hour of each day; in other words, I need to end up with 365 × 24 values. Because the data is filtered to harbor time only, many timestamps are missing: the ferry usually spends only around 10 to 15 minutes in harbor each hour.
                         timestamp  crossing
1585695  2019-12-31 23:00:00+00:00      8940
1585696  2019-12-31 23:00:30+00:00      8940
1585697  2019-12-31 23:01:00+00:00      8940
1585698  2019-12-31 23:01:30+00:00      8940
1585699  2019-12-31 23:02:00+00:00      8940
...                            ...       ...
2636037  2020-12-30 22:57:30+00:00     14423
2636038  2020-12-30 22:58:00+00:00     14423
2636039  2020-12-30 22:58:30+00:00     14423
2636040  2020-12-30 22:59:00+00:00     14423
2636041  2020-12-30 22:59:30+00:00     14423
Here is one of my many attempts:
df_new.crossing.groupby(pd.to_datetime(df_new.timestamp).dt.hour).count()
which gives the following:
timestamp
0 43800
1 43749
2 43679
3 31218
4 18411
5 12193
6 9344
7 8773
8 8500
9 9177
10 9000
11 8637
12 8543
13 8397
14 8306
15 9038
16 9119
17 9946
18 10810
19 12901
20 14304
21 30206
22 42618
23 43800
Since the rows are 30 seconds apart, dividing these counts by two seems to give the right number of minutes, except that I need the count per hour for each day, not per hour over the whole year. Any ideas?
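To make the target concrete, here is a minimal sketch of the kind of result I am after, on a tiny made-up stand-in for df_new (the sample rows, the two-day range, and the names hourly_counts, full_hours and minutes_in_harbor are all hypothetical). It assumes every row represents 30 seconds in harbor. The idea is to group by the timestamp floored to the hour, which keeps the date, instead of by .dt.hour, and then reindex onto a complete hourly range so that hours with no rows show up as 0. It may well not be the best way to do this on the real data.

```python
import pandas as pd

# Made-up stand-in for df_new: 30-second samples, harbor rows only.
df_new = pd.DataFrame({
    "timestamp": [
        "2020-01-01 00:00:00+00:00",
        "2020-01-01 00:00:30+00:00",
        "2020-01-01 00:01:00+00:00",
        "2020-01-01 02:10:00+00:00",
        "2020-01-01 02:10:30+00:00",
        "2020-01-02 00:05:00+00:00",
    ],
    "crossing": [1, 1, 1, 2, 2, 3],
})

ts = pd.to_datetime(df_new["timestamp"], utc=True)

# Flooring to the hour keeps the date, so each (day, hour) gets its own group.
hourly_counts = df_new.groupby(ts.dt.floor("h"))["crossing"].count()

# Complete hourly range for the period of interest (two days in this toy case).
full_hours = pd.date_range("2020-01-01 00:00", "2020-01-02 23:00", freq="h", tz="UTC")

# Hours with no harbor rows become 0; each row is assumed to cover 30 s = 0.5 min.
minutes_in_harbor = hourly_counts.reindex(full_hours, fill_value=0) * 0.5

# Optional: one row per day, one column per hour of the day.
per_day_table = minutes_in_harbor.to_numpy().reshape(-1, 24)
```

On the real data the date range would span the whole year, and per_day_table would then have one row per day and 24 columns.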