I have a dataframe that looks like this:
data = {'key_part_1': ["Jack", "Jack","Lisa"],
'key_part_2': ["Mayer", "Mayer","Osten"],
'subscription_date': ['2021-05-01', '2021-08-01','2019-08-01'],
'cancelation_date': ['2021-08-01', '2021-11-01', '2019-09-025'],
}
As you can see, the group ['Jack','Mayer'] appears twice, and the cancelation date of the first row coincides with the subscription date of the second row. I want to solve that by creating this:
data = {'key_part_1': ["Jack","Lisa"],
'key_part_2': ["Mayer","Osten"],
'subscription_date': ['2021-05-01','2019-08-01'],
'cancelation_date': ['2021-11-01', '2019-09-025'],
}
Essentially, I want to keep only one row for the group ['Jack','Mayer'] in my dataframe, taking the earliest subscription date and the latest cancelation date. There is no need to check that the cancelation date of one row equals the subscription date of the next; that is always the case for repeated keys.
I want to do this for every person who is repeated in the dataframe. I know I should probably sort the dates within each group first, but I couldn't find how.
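For reference, here is a minimal sketch of the result described above, using `groupby` with named aggregation. Taking the minimum and maximum per group gives the earliest and latest dates directly, so no explicit sort is needed. It assumes the date strings can be parsed by `pd.to_datetime`; the sample uses '2019-09-25' in place of '2019-09-025', on the assumption that the latter is a typo. The names `df`, `date_cols` and `merged` are only illustrative.

```python
import pandas as pd

data = {
    'key_part_1': ["Jack", "Jack", "Lisa"],
    'key_part_2': ["Mayer", "Mayer", "Osten"],
    'subscription_date': ['2021-05-01', '2021-08-01', '2019-08-01'],
    # Assumption: '2019-09-025' in the sample above is a typo for '2019-09-25'
    'cancelation_date': ['2021-08-01', '2021-11-01', '2019-09-25'],
}
df = pd.DataFrame(data)

# Parse the strings as datetimes so min/max compare real dates
date_cols = ['subscription_date', 'cancelation_date']
df[date_cols] = df[date_cols].apply(pd.to_datetime)

# One row per (key_part_1, key_part_2):
# earliest subscription date, latest cancelation date
merged = (
    df.groupby(['key_part_1', 'key_part_2'], as_index=False)
      .agg(subscription_date=('subscription_date', 'min'),
           cancelation_date=('cancelation_date', 'max'))
)

print(merged)
```

This relies on the guarantee that repeated keys always describe contiguous periods; if there could be gaps between periods, collapsing to min/max would hide them.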
Thanks!