0

I want to append a new column with items from another dataframe based on multiple conditions.

This is an extract of my two dataframes.

DF1:

ID Time                 Latitude  Longitude 
1  01-03-2015 05:51:00  49.00972  2.54778
1  01-03-2015 06:07:00  49.00972  2.54778
1  01-03-2015 06:10:16  48.98417  2.34639   
1  01-03-2015 06:12:44  48.96445  1.99806
1  01-03-2015 06:15:51  48.65389  1.99417
1  01-03-2015 06:16:27  48.58722  1.99611
1  01-03-2015 06:19:08  48.29500  2.00361
    
2  01-03-2015 06:54:00  49.00972  2.54778   
2  01-03-2015 06:57:12  48.98417  2.34639   
2  01-03-2015 06:59:36  48.96445  1.99806   
2  01-03-2015 07:01:46  48.65389  1.99417   
2  01-03-2015 07:02:20  48.59195  2.03111   
2  01-03-2015 07:05:06  48.29083  2.20945   
2  01-03-2015 07:06:51  48.08333  2.25972       
DF2:

ID Zone      Entry Time           Exit Time
1  TAXI_OUT  01-03-2015 05:51:00  01-03-2015 06:07:00
1  EDUEAAA   01-03-2015 06:07:00  01-03-2015 06:15:06
1  EDUERRR   01-03-2015 06:15:06  01-03-2015 07:06:01

2  TAXI_OUT  01-03-2015 06:41:00  01-03-2015 06:54:00
2  LFFFAAA   01-03-2015 06:54:00  01-03-2015 07:02:59
2  LFFFRRR   01-03-2015 07:02:59  01-03-2015 07:47:09

I want to merge them so that every row in DF1 gets a corresponding zone from DF2. I want to append a new column to DF1. Conditions are:

  • The corresponding ID should match (df1.id == df2.id) and
  • the time of df1 has to be between the Entry- and Exit-Time of df2. (df2.EntryTime <= df1.Time < df2.ExitTime)

The resulting dataframe should look like this:

ID Time                 Lat       Lon      Zone
1  01-03-2015 05:51:00  49.00972  2.54778  TAXI_OUT  
1  01-03-2015 06:07:00  49.00972  2.54778  EDUEAAA   
1  01-03-2015 06:10:16  48.98417  2.34639  EDUEAAA      
1  01-03-2015 06:12:44  48.96445  1.99806  EDUEAAA   
1  01-03-2015 06:15:51  48.65389  1.99417  EDUERRR   
1  01-03-2015 06:16:27  48.58722  1.99611  EDUERRR   
1  01-03-2015 06:19:08  48.29500  2.00361  EDUERRR   
    
2  01-03-2015 06:54:00  49.00972  2.54778  TAXI_OUT  
2  01-03-2015 06:57:12  48.98417  2.34639  LFFFAAA   
2  01-03-2015 06:59:36  48.96445  1.99806  LFFFAAA   
2  01-03-2015 07:01:46  48.65389  1.99417  LFFFAAA   
2  01-03-2015 07:02:20  48.59195  2.03111  LFFFAAA   
2  01-03-2015 07:05:06  48.29083  2.20945  LFFFRRR   
2  01-03-2015 07:06:51  48.08333  2.25972  LFFFRRR   

Is there a way to achieve this without nested loops? The dataframes are quite big, it would be convenient if there is a solution with fast computing time.

0 Answers0