I have data from a ferry crossing a strait. I have selected one year and filtered it to keep only the rows recorded while the ferry was in harbor. I need to work out how long the ferry spent in harbor in each hour of each day; in other words, I need to end up with 365 × 24 values. Because the data is filtered to harbor time, many timestamps are missing: the ferry usually spends only around 10 to 15 minutes in harbor each hour.

         timestamp                  crossing
1585695  2019-12-31 23:00:00+00:00  8940
1585696  2019-12-31 23:00:30+00:00  8940
1585697  2019-12-31 23:01:00+00:00  8940
1585698  2019-12-31 23:01:30+00:00  8940
1585699  2019-12-31 23:02:00+00:00  8940
...      ...                        ...
2636037  2020-12-30 22:57:30+00:00  14423
2636038  2020-12-30 22:58:00+00:00  14423
2636039  2020-12-30 22:58:30+00:00  14423
2636040  2020-12-30 22:59:00+00:00  14423
2636041  2020-12-30 22:59:30+00:00  14423

Here is one of my many attempts:

df_new.crossing.groupby(pd.to_datetime(df_new.timestamp).dt.hour).count()

which gives the following:

timestamp
0     43800
1     43749
2     43679
3     31218
4     18411
5     12193
6      9344
7      8773
8      8500
9      9177
10     9000
11     8637
12     8543
13     8397
14     8306
15     9038
16     9119
17     9946
18    10810
19    12901
20    14304
21    30206
22    42618
23    43800

Divided by two (the rows are 30 seconds apart, so two rows make one minute), this seems to be the right answer, except that I need the count per hour for each day, not per hour summed over the whole year. Any ideas?

  • Need `df_new.timestamp = pd.to_datetime(df_new.timestamp)` and then `df.resample('H', on='timestamp')['crossing'].count()` ? – jezrael Sep 03 '21 at 12:30
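A minimal sketch of the resampling idea from the comment, under a few assumptions: the small `df_new` built here is a hypothetical stand-in for the real data, rows are taken to be exactly 30 seconds apart (as in the excerpt above), and the frequency is written as `"60min"` rather than `'H'` because newer pandas versions deprecate the uppercase alias. Resampling bins by calendar hour, so each day keeps its own 24 values, and hours with no rows inside the covered range come back with a count of 0 instead of being dropped.

```python
import pandas as pd

# Hypothetical stand-in for df_new: one row per 30-second sample in harbor.
df_new = pd.DataFrame({
    "timestamp": [
        "2019-12-31 23:00:00+00:00",
        "2019-12-31 23:00:30+00:00",
        "2019-12-31 23:01:00+00:00",
        "2020-01-01 01:15:00+00:00",
        "2020-01-01 01:15:30+00:00",
    ],
    "crossing": [8940, 8940, 8940, 8941, 8941],
})

# Parse the strings into timezone-aware timestamps.
df_new["timestamp"] = pd.to_datetime(df_new["timestamp"])

# Count the rows falling into each calendar hour (one bin per day and hour).
# Hours between the first and last timestamp that have no rows get 0.
samples_per_hour = df_new.resample("60min", on="timestamp")["crossing"].count()

# Assumption: each row stands for 30 seconds, so two rows make one minute.
minutes_per_hour = samples_per_hour * 0.5

print(minutes_per_hour)
```

The result is a Series indexed by the start of each hour. Note that the bins only span from the first to the last timestamp in the data, so a year that starts or ends with empty hours would need an explicit `reindex` onto the full hourly range to reach exactly 365 × 24 values.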
