
I have 2 DataFrames, df0 and df1, with df1.shape[0] > df0.shape[0].

df0 and df1 have the exact same columns. Most of the rows of df0 are in df1.

The indices of df0 and df1 are

df0.index = range(df0.shape[0])
df1.index = range(df1.shape[0])

I then created dft

dft = pd.concat([df0, df1], axis=0, sort=False)

and removed duplicated rows with

dft.drop_duplicates(subset='this_col_is_not_index', keep='first', inplace=True)

I have some duplicates on the index of dft. For example:

dft.loc[3].shape

returns

(2, 38)

My aim is to change the index of the second returned row so that the index 3 becomes unique. This second row should be reindexed as dft.index.sort_values()[-1] + 1.

I would like to apply this operation on all duplicates.
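One way to sketch the renumbering described above (a minimal illustration with a toy frame, not taken from the original post) is to find the duplicated index labels and assign them fresh labels starting from max(index) + 1:

```python
import pandas as pd

# Toy frame standing in for dft, with a duplicated index label 3.
dft = pd.DataFrame({'a': [10, 20, 30, 40]}, index=[0, 1, 3, 3])

dup_mask = dft.index.duplicated(keep='first')   # True for 2nd+ occurrences
start = dft.index.sort_values()[-1] + 1         # first free label, as described
new_index = dft.index.to_numpy().copy()
new_index[dup_mask] = range(start, start + dup_mask.sum())
dft.index = new_index
```

After this, `dft.loc[3]` returns a single row and every index label is unique.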

References :

Python Pandas: Get index of rows which column matches certain value

Pandas: Get duplicated indexes

Redefining the Index in a Pandas DataFrame object

Basile

2 Answers


Add the parameter ignore_index=True to concat to avoid duplicated index values:

dft = pd.concat([df0, df1], axis=0, sort=False, ignore_index=True)
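A minimal demonstration with hypothetical small frames standing in for df0 and df1: with ignore_index=True, the result gets a fresh 0..n-1 index, so no duplicate labels remain.

```python
import pandas as pd

# Hypothetical stand-ins for df0 and df1 with the same columns.
df0 = pd.DataFrame({'x': [1, 2]})
df1 = pd.DataFrame({'x': [2, 3, 4]})

# ignore_index=True discards the original indices and builds a new RangeIndex.
dft = pd.concat([df0, df1], axis=0, sort=False, ignore_index=True)
```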
jezrael

Use reset_index(drop=True). Note that it returns a new DataFrame by default, so assign the result back:

dft = dft.reset_index(drop=True)
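A quick sketch of the full pipeline with hypothetical small frames: concat, drop duplicates on a column, then rebuild a unique 0..n-1 index with reset_index(drop=True).

```python
import pandas as pd

# Hypothetical stand-ins for df0 and df1; column 'x' plays the role of
# 'this_col_is_not_index' from the question.
df0 = pd.DataFrame({'x': [1, 2]})
df1 = pd.DataFrame({'x': [2, 3, 4]})

dft = pd.concat([df0, df1], axis=0, sort=False)     # index: 0, 1, 0, 1, 2
dft = dft.drop_duplicates(subset='x', keep='first') # index still has a duplicate
dft = dft.reset_index(drop=True)                    # fresh unique RangeIndex
```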
Bharath_Raja