1

I have a table like so:

ID  Timestamp   Status
A   5/30/2022 2:29  Run Ended
A   5/30/2022 0:23  In Progress
A   5/30/2022 0:22  Prepared
B   5/30/2022 11:15 Run Ended
B   5/30/2022 9:18  In Progress
B   5/30/2022 0:55  Prepared

I want to compute the duration between each status grouped by the ID. So the resulting output table would be:

ID  Duration(min)   Status change
A   0.40    In Progress-Prepared
A   125.82  Run Ended - In Progress
B   502.78  In Progress-Prepared
B   117.34  Run Ended - In Progress

How do I order it by descending timestamp (grouped by ID) and then subtract the last row from the previous row all the way upto the top for each ID group?

thentangler
  • 797
  • 9
  • 23

2 Answers2

2

You can use a groupby.diff and groupby.shift:

out = (df
 .assign(**{'Duration(min)': pd.to_datetime(df['Timestamp'], dayfirst=False)
            .groupby(df['ID'])
            .diff(-1).dt.total_seconds() # diff in seconds to next time in group
            .div(60), # convert to minutes
           'Status change': df.groupby('ID')['Status'].shift(-1)+'-'+df['Status']
           })
 .dropna(subset='Duration(min)') # get rid of empty rows
 [['ID', 'Duration(min)', 'Status change']]
 )

Output:

  ID  Duration(min)          Status change
0  A          126.0  In Progress-Run Ended
1  A            1.0   Prepared-In Progress
3  B          117.0  In Progress-Run Ended
4  B          503.0   Prepared-In Progress
mozway
  • 81,317
  • 8
  • 19
  • 49
1

You can use groupby('ID')[value].shift(1) to access the previous value in the same ID group.

import pandas as pd

df = pd.DataFrame({
    'ID': ['a','a','a','b','b','b'],
    'time': [1,2,3,1,4,5],
    'status': ['x','y','z','xx','yy','zz']
})
df['previous_time'] = df.groupby('ID')['time'].shift(1)
df['previous_status'] = df.groupby('ID')['status'].shift(1)
df = df.dropna()
df['duration'] = df['time'] - df['previous_time'] # change this line to calculate duration between time instead
df['status_change'] = df['previous_status'] + '-' + df['status']
print (df[['ID','duration','status_change']].to_markdown(index=False))

Output:

ID duration status_change
a 1 x-y
a 1 y-z
b 3 xx-yy
b 1 yy-zz

PS. you can subtract time and previous_time with answer in this thread

ThSorn
  • 502
  • 3
  • 7