I have a dataframe maint_comp4 with the following values:
comp4       datetime_maint  machineID
    1  2020-06-01 06:00:00         24
    1  2020-06-16 06:00:00          4
    1  2020-06-16 06:00:00         15
    1  2020-06-16 06:00:00         25
    1  2020-07-01 06:00:00          1
And another dataframe tel_times with the following values:
   machineID         datetime_tel
0          1  2021-01-01 06:00:00
1          1  2021-01-01 07:00:00
2          1  2021-01-01 08:00:00
3          1  2021-01-01 09:00:00
4          1  2021-01-01 10:00:00
I am merging these two dataframes on machineID and then keeping only the rows where datetime_tel is greater than datetime_maint. After that, I take the difference of the two datetime columns to get the number of days between them.
maint_tel_comp4 = tel_times.merge(maint_comp4, on='machineID', how='left')
component4 = maint_tel_comp4[maint_tel_comp4['datetime_tel'].gt(maint_tel_comp4['datetime_maint'])].copy()
component4['sincelastComp4'] = (component4['datetime_tel']-component4['datetime_maint']).dt.days
The problem is that the greater-than condition returns multiple rows for each telemetry reading (one per earlier maintenance date), but I want only the row with the smallest day difference.
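One way to avoid the extra rows in the first place might be pandas.merge_asof, which matches each left row to the nearest earlier right row instead of to every right row. The sketch below runs on a few made-up rows for a single machine (hypothetical sample data, not the real frames) and assumes both datetime columns are already datetime64.

```python
import pandas as pd

# Hypothetical sample data: one machine, two maintenance dates, two telemetry readings.
maint_comp4 = pd.DataFrame({
    "comp4": [1, 1],
    "datetime_maint": pd.to_datetime(["2020-08-15 06:00:00", "2021-01-16 06:00:00"]),
    "machineID": [30, 30],
})
tel_times = pd.DataFrame({
    "machineID": [30, 30],
    "datetime_tel": pd.to_datetime(["2021-01-01 06:00:00", "2021-01-31 06:00:00"]),
})

# merge_asof needs both frames sorted by their time key.
# direction="backward" picks the latest maintenance at or before each reading;
# allow_exact_matches=False makes it strictly before, like the .gt() filter.
nearest = pd.merge_asof(
    tel_times.sort_values("datetime_tel"),
    maint_comp4.sort_values("datetime_maint"),
    left_on="datetime_tel",
    right_on="datetime_maint",
    by="machineID",
    direction="backward",
    allow_exact_matches=False,
)
nearest["sincelastComp4"] = (nearest["datetime_tel"] - nearest["datetime_maint"]).dt.days
print(nearest)
```

Unlike the merge-and-filter approach, readings with no earlier maintenance stay in the result with NaT in datetime_maint, so they may need a dropna afterwards.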
For example, this is the filter for one particular machineID and date:
component4[(component4['machineID']==30) & (component4['datetime_tel']=='2021-01-31 06:00:00')]
And it gives the following output:
         machineID         datetime_tel  comp4       datetime_maint  sincelastComp4
1978014         30  2021-01-31 06:00:00      1  2020-08-15 06:00:00             169
1978015         30  2021-01-31 06:00:00      1  2021-01-16 06:00:00              15
This output has two rows, but I want only the one row whose sincelastComp4 value is 15 (the minimum). The same thing happens for other dates: multiple rows come back, but I want only the row with the minimum sincelastComp4 value for each machineID and datetime_tel.
I want the minimum because sincelastComp4 is the number of days that have passed since component 4 was last repaired, so only the most recent maintenance before each telemetry reading is relevant.
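The requirement above (one row per machineID and datetime_tel, the one with the smallest sincelastComp4) can be sketched with groupby and idxmin. The rows below are hypothetical: the two rows for machine 30 mirror the output shown above, and machine 31 is made up to show that each machine is handled separately.

```python
import pandas as pd

# Hypothetical sample: two candidate rows for machine 30, one for machine 31.
component4 = pd.DataFrame({
    "machineID": [30, 30, 31],
    "datetime_tel": pd.to_datetime(["2021-01-31 06:00:00"] * 3),
    "comp4": [1, 1, 1],
    "datetime_maint": pd.to_datetime(
        ["2020-08-15 06:00:00", "2021-01-16 06:00:00", "2020-12-01 06:00:00"]
    ),
})
component4["sincelastComp4"] = (
    component4["datetime_tel"] - component4["datetime_maint"]
).dt.days

# For every (machineID, datetime_tel) pair, get the index label of the smallest gap,
# then select exactly those rows.
idx = component4.groupby(["machineID", "datetime_tel"])["sincelastComp4"].idxmin()
latest = component4.loc[idx].reset_index(drop=True)
print(latest)
```

Grouping on both keys matters here: grouping on datetime_tel alone would compare rows from different machines against each other.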
Edit 1: As suggested in a comment, I tried sort_values followed by drop_duplicates, but it gives output for only one machineID and not for all of them.
component4 = component4.sort_values('sincelastComp4').drop_duplicates('datetime_tel', keep='last')
Output-
         machineID         datetime_tel  comp4       datetime_maint  sincelastComp4
1554168         24  2021-01-01 16:00:00      1  2020-06-01 06:00:00             214
1554159         24  2021-01-01 15:00:00      1  2020-06-01 06:00:00             214
1554150         24  2021-01-01 14:00:00      1  2020-06-01 06:00:00             214
1554141         24  2021-01-01 13:00:00      1  2020-06-01 06:00:00             214
1554132         24  2021-01-01 12:00:00      1  2020-06-01 06:00:00             214
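A likely explanation, offered as an assumption since the full data is not shown: drop_duplicates('datetime_tel') treats rows from different machines with the same timestamp as duplicates of each other, and keep='last' after an ascending sort keeps the largest gap rather than the smallest. The sketch below reproduces that on a few hypothetical rows and compares it with deduplicating on both machineID and datetime_tel with keep='first'.

```python
import pandas as pd

# Hypothetical rows: two machines share one telemetry timestamp.
component4 = pd.DataFrame({
    "machineID": [24, 30, 30],
    "datetime_tel": pd.to_datetime(["2021-01-31 06:00:00"] * 3),
    "sincelastComp4": [244, 169, 15],
})

# The attempt from the edit: one row per timestamp across ALL machines,
# and the last row after an ascending sort is the largest gap.
attempt = component4.sort_values("sincelastComp4").drop_duplicates(
    "datetime_tel", keep="last"
)

# Variant: deduplicate per machine and timestamp, keeping the smallest gap.
fixed = (
    component4.sort_values("sincelastComp4")
    .drop_duplicates(["machineID", "datetime_tel"], keep="first")
    .sort_values(["machineID", "datetime_tel"])
)
print(attempt)
print(fixed)
```

On these rows, attempt keeps only machine 24, while fixed keeps one row per machine with the smallest sincelastComp4.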