I want to append a new column with items from another dataframe based on multiple conditions.
This is an extract of my two dataframes.
DF1:
ID Time Latitude Longitude
1 01-03-2015 05:51:00 49.00972 2.54778
1 01-03-2015 06:07:00 49.00972 2.54778
1 01-03-2015 06:10:16 48.98417 2.34639
1 01-03-2015 06:12:44 48.96445 1.99806
1 01-03-2015 06:15:51 48.65389 1.99417
1 01-03-2015 06:16:27 48.58722 1.99611
1 01-03-2015 06:19:08 48.29500 2.00361
2 01-03-2015 06:54:00 49.00972 2.54778
2 01-03-2015 06:57:12 48.98417 2.34639
2 01-03-2015 06:59:36 48.96445 1.99806
2 01-03-2015 07:01:46 48.65389 1.99417
2 01-03-2015 07:02:20 48.59195 2.03111
2 01-03-2015 07:05:06 48.29083 2.20945
2 01-03-2015 07:06:51 48.08333 2.25972
DF2:
ID Zone Entry Time Exit Time
1 TAXI_OUT 01-03-2015 05:51:00 01-03-2015 06:07:00
1 EDUEAAA 01-03-2015 06:07:00 01-03-2015 06:15:06
1 EDUERRR 01-03-2015 06:15:06 01-03-2015 07:06:01
2 TAXI_OUT 01-03-2015 06:41:00 01-03-2015 06:54:00
2 LFFFAAA 01-03-2015 06:54:00 01-03-2015 07:02:59
2 LFFFRRR 01-03-2015 07:02:59 01-03-2015 07:47:09
I want to merge them so that every row in DF1 gets a corresponding zone from DF2. I want to append a new column to DF1. Conditions are:
- The corresponding ID should match (df1.id == df2.id) and
- the time of df1 has to be between the Entry- and Exit-Time of df2. (df2.EntryTime <= df1.Time < df2.ExitTime)
The resulting dataframe should look like this:
ID Time Lat Lon Zone
1 01-03-2015 05:51:00 49.00972 2.54778 TAXI_OUT
1 01-03-2015 06:07:00 49.00972 2.54778 EDUEAAA
1 01-03-2015 06:10:16 48.98417 2.34639 EDUEAAA
1 01-03-2015 06:12:44 48.96445 1.99806 EDUEAAA
1 01-03-2015 06:15:51 48.65389 1.99417 EDUERRR
1 01-03-2015 06:16:27 48.58722 1.99611 EDUERRR
1 01-03-2015 06:19:08 48.29500 2.00361 EDUERRR
2 01-03-2015 06:54:00 49.00972 2.54778 TAXI_OUT
2 01-03-2015 06:57:12 48.98417 2.34639 LFFFAAA
2 01-03-2015 06:59:36 48.96445 1.99806 LFFFAAA
2 01-03-2015 07:01:46 48.65389 1.99417 LFFFAAA
2 01-03-2015 07:02:20 48.59195 2.03111 LFFFAAA
2 01-03-2015 07:05:06 48.29083 2.20945 LFFFRRR
2 01-03-2015 07:06:51 48.08333 2.25972 LFFFRRR
Is there a way to achieve this without nested loops? The dataframes are quite big, it would be convenient if there is a solution with fast computing time.