After I have been searching and studying some previous solutions in here:
- Pandas Dataframe - for each row, return count of other rows with overlapping dates
- Find date range overlap in python
Since those solutions cannot provide what I'm looking for when I want to find the overlapping date range with specific date for rows in a dataframe where it can be overlapped if active_date current is > effective_date previous
| group_id | active_date | effective_date | from_date | to_date |
|----------|-------------|----------------|-------------|-----------|
| A | 2020-01-13 | NaN | 2020-01-14 | 2020-03-03|
| A | 2020-01-13 | 2020-02-19 | 2020-03-14 | 2020-04-13| //overlap1
| A | 2020-02-20 | NaN | 2020-03-01 | 2020-03-15| //overlap1
| C | 2020-04-20 | 2020-05-19 | 2020-05-19 | 2020-06-02| //overlap2
| C | 2020-04-01 | NaN | 2020-04-12 | 2020-05-20| // not overlap: active < effective
| C | 2020-06-01 | NaN | 2020-06-02 | 2020-06-15| //overlap2
My current solution is kinda tedious:
- Trying to iterate every row in the dataframe
- If found the date that is overlapped, exclude it.
which is its runtime is pretty bad since the size of data is too big. So I need help to guide me if I could use something more Pythonic or Pandas or NumPy.
Thank you!