Currently, one row describes one event, and a case consists of several events. I need a dataframe with a whole case per row, so I tried to encode the "activities" and the "time", numbered 1 to 6, for each case and write them into new columns.
My question is: how can I collapse these rows so that I have one complete case, with all its information, in a single row? (Please have a look at the attached picture.)
I will delete the redundant rows in the next step, so I do not care whether the values are in all rows of a case or just in its first row.
Here is my attempt. I am struggling with groupby, and I don't even know if this is the right approach.
Thanks for any help! :)
for i in df.index:
    # Build the per-event column names, e.g. 'activity_3' and 'time_3'
    event_no = str(df.loc[i, 'case_event'])
    act_col = 'activity_' + event_no
    time_col = 'time_' + event_no
    # Copy this event's values into its numbered columns
    df.loc[i, act_col] = df.loc[i, 'activity']
    df.loc[i, time_col] = df.loc[i, 'rel_time']
df.head(6)
Here is where I start to struggle:
df['activity_6'] = (df.groupby(['case_id'], sort=False)['activity_6']
                    .sum()
                    .reset_index())
df.head(6)
Picture of Output:
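
For reference, here is a minimal sketch of one possible way to get one row per case without the loop, using `DataFrame.pivot`. The column names (`case_id`, `case_event`, `activity`, `rel_time`) are taken from the code above; the sample values are made up for illustration, and the sketch assumes each `case_id` / `case_event` pair occurs only once.

```python
import pandas as pd

# Hypothetical sample data: two cases with three events each.
# Column names follow the question; the values are invented.
df = pd.DataFrame({
    "case_id":    [1, 1, 1, 2, 2, 2],
    "case_event": [1, 2, 3, 1, 2, 3],
    "activity":   ["A", "B", "C", "A", "C", "D"],
    "rel_time":   [0, 5, 9, 0, 2, 7],
})

# Reshape long -> wide: one row per case, one column per (field, event number).
# pivot raises an error if a (case_id, case_event) pair is duplicated.
wide = df.pivot(index="case_id", columns="case_event",
                values=["activity", "rel_time"])

# Flatten the two-level column index into names like 'activity_1' and 'time_1'.
prefix = {"activity": "activity", "rel_time": "time"}
wide.columns = [f"{prefix[field]}_{event}" for field, event in wide.columns]

# Turn case_id back into an ordinary column.
wide = wide.reset_index()

print(wide)
```

If the `activity_N` and `time_N` columns from the loop already exist, `df.groupby('case_id', sort=False).first()` may also do the job, since `first` takes the first non-null value of each column within a group. I have not checked that against the real data, so treat it as a suggestion.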