1

I want to remove rows in df1 which have common numbers in the "0" columns of both df's

df1 = pd.read_csv('Displacement.txt', sep="\s", header=None)
df2= pd.read_csv('overconstraint.txt', header=None)

df1 (>6million rows) looks like

    0   1   2   3   4   5   6
0   1   ,   1   ,   1.0 ,   1.8800E-01
1   2   ,   1   ,   1.0 ,   1.8491E-01

df2 (>1000 rows) looks like

    0
0   1119933
1   1119933
2   1122062

How to do this?

user9106985
  • 435
  • 3
  • 8

2 Answers2

1

You need specify column instead DataFrame by Series.isin:

test=df1[~df1[0].isin(df2[0])]

#in some pandas versions is necessary convert Series to list
test=df1[~df1[0].isin(df2[0].astype(int).tolist())]

Solution with casting to integers:

test=df1[~df1[0].astype(int).isin(df2[0].astype(int))]

#in some pandas versions is necessary convert Series to list
test=df1[~df1[0].astype(int).isin(df2[0].astype(int).tolist())]
jezrael
  • 729,927
  • 78
  • 1,141
  • 1,090
1

Make the difference between ref columns:

>>> df1[~df1['ref'].isin(df2['ref'])]
   ref  value1
0    2      76
1    4      23
5   12      78
6   14      34
Corralien
  • 70,617
  • 7
  • 16
  • 36