I want to merge two data frames on a Date_Time column of date-time dtype. The Date_Time columns contain both shared and differing values, but I am unable to merge them so that every unique date-time row ends up in the result, with NA in the columns that are not common. Instead, I get NAs in the Date_Time column for the second data frame. I have tried this in both R and Python.

Python code:

df=pd.merge(df_met, df_so2, how='left', on='Date_Time')

In R, where the column is converted to date-time using as.POSIXct:

df_2<-join(so2, met_km, type="inner")
df3 <- merge(so2, met_km, all = TRUE)
df_4 <- merge(so2, met_km, by.x = "Date_Time", by.y = "Date_Time")

df_so2:

 X  POC  Datum        Date_Time          Date_GMT  Sample.Measurement  MDL
 1    2  WGS84  2015-01-01 3:00  01/01/2015 09:00                 2.3  0.2
 2    2  WGS84  2015-01-01 4:00  01/01/2015 10:00                 2.5  0.2
 3    2  WGS84  2015-01-01 5:00  01/01/2015 11:00                 2.1  0.2
 4    2  WGS84  2015-01-01 6:00  01/01/2015 12:00                 2.3  0.2
 5    2  WGS84  2015-01-01 7:00  01/01/2015 13:00                 1.1  0.2

df_met:

 X        Date_Time  air_temp_set_1  dew_point_temperature_set_1
 1  2015-01-01 1:00            35.6                         35.6
 2  2015-01-01 2:00            35.6                         35.6
 3  2015-01-01 3:00            35.6                         35.6
 4  2015-01-01 4:00            33.8                         33.8
 5  2015-01-01 5:00            33.2                         33.2
 6  2015-01-01 6:00            33.8                         33.8
 7  2015-01-01 7:00            33.8                         33.8

Expected Output:

 X  POC    Datum        Date_Time          Date_GMT  Sample.Measurement  MDL
 1  1.0  2 WGS84  2015-01-01 3:00  01/01/2015 09:00                 2.3  0.2
 2  2.0  2 WGS84  2015-01-01 4:00  01/01/2015 10:00                 2.5  0.2
 3  NaN      NaN  2015-01-01 1:00               NaN                 NaN  NaN
 4  NaN      NaN  2015-01-01 2:00               NaN                 NaN  NaN
Trenton McKinney

3 Answers

# all = TRUE keeps unmatched rows from both data frames (a full outer join)
merge(df_so2, df_met, by = "Date_Time", all = TRUE)

        Date_Time X.x POC Datum         Date_GMT Sample.Measurement MDL X.y air_temp_set_1 dew_point_temperature_set_1
1 2015-01-01 1:00  NA  NA  <NA>             <NA>                 NA  NA   1           35.6                        35.6
2 2015-01-01 2:00  NA  NA  <NA>             <NA>                 NA  NA   2           35.6                        35.6
3 2015-01-01 3:00   1   2 WGS84 01/01/2015 09:00                2.3 0.2   3           35.6                        35.6
4 2015-01-01 4:00   2   2 WGS84 01/01/2015 10:00                2.5 0.2   4           33.8                        33.8
5 2015-01-01 5:00   3   2 WGS84 01/01/2015 11:00                2.1 0.2   5           33.2                        33.2
6 2015-01-01 6:00   4   2 WGS84 01/01/2015 12:00                2.3 0.2   6           33.8                        33.8
7 2015-01-01 7:00   5   2 WGS84 01/01/2015 13:00                1.1 0.2   7           33.8                        33.8
Jon Spring

Merging with how='outer' should get them all:

  • pandas.DataFrame.merge
  • outer: use union of keys from both frames, similar to a SQL full outer join; sort keys lexicographically.
  • Based on your comment, you want all the dates, not just those shown in Expected Output.
  • Add the parameter sort=True if you want the rows sorted by date.
df_exp = pd.merge(df_so2, df_met, on='Date_Time', how='outer')

 X_x  POC  Datum        Date_Time          Date_GMT  Sample.Measurement  MDL  X_y  air_temp_set_1  dew_point_temperature_set_1
 1.0  2.0  WGS84  2015-01-01 3:00  01/01/2015 09:00                 2.3  0.2    3            35.6                         35.6
 2.0  2.0  WGS84  2015-01-01 4:00  01/01/2015 10:00                 2.5  0.2    4            33.8                         33.8
 3.0  2.0  WGS84  2015-01-01 5:00  01/01/2015 11:00                 2.1  0.2    5            33.2                         33.2
 4.0  2.0  WGS84  2015-01-01 6:00  01/01/2015 12:00                 2.3  0.2    6            33.8                         33.8
 5.0  2.0  WGS84  2015-01-01 7:00  01/01/2015 13:00                 1.1  0.2    7            33.8                         33.8
 NaN  NaN    NaN  2015-01-01 1:00               NaN                 NaN  NaN    1            35.6                         35.6
 NaN  NaN    NaN  2015-01-01 2:00               NaN                 NaN  NaN    2            35.6                         35.6
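
To see which frame each row of the outer merge came from, the indicator=True option of pd.merge adds a _merge column. The following is only a minimal sketch: the two frames are cut-down copies of the sample data, and parsing Date_Time with pd.to_datetime before merging is an assumption about how the real data should be prepared.

```python
import pandas as pd

# Cut-down copies of the sample frames from the question (illustrative only).
df_so2 = pd.DataFrame({
    "Date_Time": ["2015-01-01 3:00", "2015-01-01 4:00", "2015-01-01 5:00"],
    "Sample.Measurement": [2.3, 2.5, 2.1],
})
df_met = pd.DataFrame({
    "Date_Time": ["2015-01-01 1:00", "2015-01-01 2:00", "2015-01-01 3:00",
                  "2015-01-01 4:00", "2015-01-01 5:00"],
    "air_temp_set_1": [35.6, 35.6, 35.6, 33.8, 33.2],
})

# Parse the key on both sides so the join compares timestamps, not strings.
df_so2["Date_Time"] = pd.to_datetime(df_so2["Date_Time"])
df_met["Date_Time"] = pd.to_datetime(df_met["Date_Time"])

# indicator=True adds a "_merge" column: left_only, right_only, or both.
merged = pd.merge(df_so2, df_met, on="Date_Time", how="outer",
                  sort=True, indicator=True)
print(merged)
```

Rows marked right_only exist only in df_met, so their df_so2 columns are NaN; rows marked both matched on Date_Time.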

Without the columns from df_met:

df_exp.drop(columns=['X_y', 'air_temp_set_1', 'dew_point_temperature_set_1'], inplace=True)
df_exp.rename(columns={'X_x': 'X'}, inplace=True)

   X  POC  Datum        Date_Time          Date_GMT  Sample.Measurement  MDL
 1.0  2.0  WGS84  2015-01-01 3:00  01/01/2015 09:00                 2.3  0.2
 2.0  2.0  WGS84  2015-01-01 4:00  01/01/2015 10:00                 2.5  0.2
 3.0  2.0  WGS84  2015-01-01 5:00  01/01/2015 11:00                 2.1  0.2
 4.0  2.0  WGS84  2015-01-01 6:00  01/01/2015 12:00                 2.3  0.2
 5.0  2.0  WGS84  2015-01-01 7:00  01/01/2015 13:00                 1.1  0.2
 NaN  NaN    NaN  2015-01-01 1:00               NaN                 NaN  NaN
 NaN  NaN    NaN  2015-01-01 2:00               NaN                 NaN  NaN
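
As a possible alternative to dropping and renaming afterwards, only the key column of df_met can be passed to the merge. This is a sketch on abbreviated toy frames (not the full sample data), and it assumes that nothing from df_met other than Date_Time is wanted in the result.

```python
import pandas as pd

# Abbreviated toy frames; both have an "X" column, as in the question.
df_so2 = pd.DataFrame({
    "X": [1, 2],
    "Date_Time": ["2015-01-01 3:00", "2015-01-01 4:00"],
    "Sample.Measurement": [2.3, 2.5],
})
df_met = pd.DataFrame({
    "X": [1, 2, 3],
    "Date_Time": ["2015-01-01 1:00", "2015-01-01 2:00", "2015-01-01 3:00"],
    "air_temp_set_1": [35.6, 35.6, 35.6],
})

# Only df_met's key column takes part in the merge, so there are no
# X_x / X_y suffixes and no df_met columns to drop afterwards.
df_exp = pd.merge(df_so2, df_met[["Date_Time"]], on="Date_Time", how="outer")
print(df_exp)
```

The result keeps the df_so2 column names unchanged, with NaN in the df_so2 columns for dates that appear only in df_met.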
Trenton McKinney

df_exp = pd.merge(df_so2, df_met, on='Date_Time', how='outer')

I got:

 POC   Datum        Date_Time           Date_GMT   Sample.Measurement   MDL   air_temp_set_1   dew_point_temperature_set_1   relative_humidity_set_1   wind_speed_set_1   cloud_layer_1_code_set_1   wind_direction_set_1   pressure_set_1d   weather_cond_code_set_1   visibility_set_1  wind_cardinal_direction_set_1d  weather_condition_set_1d
    2  WGS84   2015-01-01 3:00  01/01/2015 09:00                   2.3   0.2             35.6                          35.6                     100.0                0.0                       14.0                    0.0         29.943333                       9.0               0.25                              N                       Fog
    1  WGS84   2015-01-01 3:00  01/01/2015 09:00                   0.6   2.0             35.6                          35.6                     100.0                0.0                       14.0                    0.0         29.943333                       9.0               0.25                              N                       Fog
    1  WGS84   2015-01-01 3:00  01/01/2015 12:00                   7.4   0.2             35.6                          35.6                     100.0                0.0                       14.0                    0.0         29.943333                       9.0               0.25                              N                       Fog
    1  WGS84   2015-01-01 3:00  01/01/2015 10:00                   1.0   0.2             35.6                           NaN                       NaN                NaN                        NaN                    NaN               NaN                       NaN                NaN                             NaN                      NaN

Notes:

  • Check df_met.info() and df_so2.info() and verify Date_Time is non-null datetime64[ns]
  • If not, try the following:
  • df_so2.Date_Time = pd.to_datetime(df_so2.Date_Time)
  • df_met.Date_Time = pd.to_datetime(df_met.Date_Time)
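
The checks in these notes can be sketched as below. The result posted above also shows several rows sharing the same Date_Time with different POC values, so the sketch additionally tests for repeated keys. The helper name prepare_key and the toy frames are hypothetical, introduced only for illustration.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype


def prepare_key(df, key="Date_Time"):
    """Return a copy of df whose key column is datetime64 (hypothetical helper)."""
    out = df.copy()
    if not is_datetime64_any_dtype(out[key]):
        # errors="coerce" turns unparseable values into NaT instead of raising.
        out[key] = pd.to_datetime(out[key], errors="coerce")
    return out


# Toy data: df_so2 has two rows for the same timestamp (different POC).
df_so2 = pd.DataFrame({
    "POC": [2, 1, 1],
    "Date_Time": ["2015-01-01 3:00", "2015-01-01 3:00", "2015-01-01 4:00"],
    "Sample.Measurement": [2.3, 0.6, 1.0],
})
df_met = pd.DataFrame({
    "Date_Time": ["2015-01-01 3:00"],
    "air_temp_set_1": [35.6],
})

df_so2 = prepare_key(df_so2)
df_met = prepare_key(df_met)

# NaT keys cannot match real timestamps, and repeated keys multiply rows.
n_missing = int(df_so2["Date_Time"].isna().sum())
has_dupes = bool(df_so2["Date_Time"].duplicated().any())
print(f"NaT keys: {n_missing}, repeated keys: {has_dupes}")

merged = pd.merge(df_so2, df_met, on="Date_Time", how="outer")
print(merged)
```

If repeated keys are unexpected, passing validate="one_to_one" to pd.merge makes pandas raise an error instead of silently producing the extra rows.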
Nimantha
  • I added notes to your answer that you can try. Make sure to delete this answer once we've finished. If this doesn't resolve the issue and you can share the actual data files you're using, I might have better luck figuring it out. – Trenton McKinney Sep 14 '19 at 13:42