0

After I have been searching and studying some previous solutions in here:

Since those solutions cannot provide what I'm looking for when I want to find the overlapping date range with specific date for rows in a dataframe where it can be overlapped if active_date current is > effective_date previous

| group_id | active_date | effective_date |  from_date  |  to_date  |
|----------|-------------|----------------|-------------|-----------|
|    A     | 2020-01-13  |      NaN       | 2020-01-14  | 2020-03-03|
|    A     | 2020-01-13  |   2020-02-19   | 2020-03-14  | 2020-04-13|  //overlap1
|    A     | 2020-02-20  |      NaN       | 2020-03-01  | 2020-03-15|  //overlap1 
|    C     | 2020-04-20  |   2020-05-19   | 2020-05-19  | 2020-06-02|  //overlap2
|    C     | 2020-04-01  |      NaN       | 2020-04-12  | 2020-05-20|  // not overlap: active < effective
|    C     | 2020-06-01  |      NaN       | 2020-06-02  | 2020-06-15|  //overlap2

My current solution is kinda tedious:

  1. Trying to iterate every row in the dataframe
  2. If found the date that is overlapped, exclude it.

which is its runtime is pretty bad since the size of data is too big. So I need help to guide me if I could use something more Pythonic or Pandas or NumPy.

Thank you!

benji
  • 198
  • 1
  • 10

0 Answers0