3

I have 2 sets of data:

df1  
  'Y'  
A a  
B b  
C c  
D d  
E e  

df2  
  'Y'  
D d  
E e  
F f  

I need to transform the first df1 to remove any data appearing in df2 I can easily do it using SQL, but that's not an option for me The logic is 'select * from df1 where df1.Y not in (select Y from df2)' I can't seem to get the implementation right using merge. Any ideas? Desired result:

df3  
  'Y'  
A a  
B b  
C c  

with df2 unchanged

Psidom
  • 195,464
  • 25
  • 298
  • 322
  • related: http://stackoverflow.com/questions/28901683/pandas-get-rows-which-are-not-in-other-dataframe and dupe: http://stackoverflow.com/questions/32651860/python-pandas-how-to-find-rows-in-one-dataframe-but-not-in-another – EdChum Dec 24 '16 at 22:09

2 Answers2

4

You can subset df1 with the .isin() method:

df1[~df1.Y.isin(df2.Y)]

#   Y
#A  a
#B  b
#C  c
Psidom
  • 195,464
  • 25
  • 298
  • 322
2

I do like @Psidom's Pandas solution.

Here is a NumPy alternative:

In [26]: d1[~np.in1d(d1.Y, d2.Y)]
Out[26]:
   Y
A  a
B  b
C  c

and yet another Pandas solution, which uses SQL-like query() method:

In [30]: d1.query('Y not in @d2.Y')
Out[30]:
   Y
A  a
B  b
C  c
Community
  • 1
  • 1
MaxU - stop genocide of UA
  • 191,778
  • 30
  • 340
  • 375