I have the data below, with Date as the index. I want to remove the rows that have a duplicated Date.

           Stock      Open      High       Low     Close Adj Close  Volume
Date                                                                      
2016-05-13   AAD  5.230000  5.260000  5.200000  5.260000  5.260000    5000
2016-05-16   AAD  5.220000  5.260000  5.220000  5.260000  5.260000    6000
2016-05-17   AAD  5.210000  5.260000  5.210000  5.260000  5.260000    2000
2016-05-17   AAD  5.210000  5.260000  5.210000  5.260000  5.260000    2000
2016-05-18   AAD  5.200000  5.250000  5.200000  5.250000  5.250000    3000 

The output I need:

           Stock      Open      High       Low     Close Adj Close  Volume
Date                                                                      
2016-05-13   AAD  5.230000  5.260000  5.200000  5.260000  5.260000    5000
2016-05-16   AAD  5.220000  5.260000  5.220000  5.260000  5.260000    6000
2016-05-17   AAD  5.210000  5.260000  5.210000  5.260000  5.260000    2000
2016-05-18   AAD  5.200000  5.250000  5.200000  5.250000  5.250000    3000  

I tried df.drop_duplicates(), but the output also drops the rows after the duplicated date:

           Stock      Open      High       Low     Close Adj Close  Volume
Date                                                                      
2016-05-13   AAD  5.230000  5.260000  5.200000  5.260000  5.260000    5000
2016-05-16   AAD  5.220000  5.260000  5.220000  5.260000  5.260000    6000
2016-05-17   AAD  5.210000  5.260000  5.210000  5.260000  5.260000    2000
  • I think you need [this](https://stackoverflow.com/a/34297689/2901002) – jezrael Jun 06 '17 at 14:23
  • @jezrael I tried that, but it does not work with duplicated. It only removes the index; I want to drop the whole row, as shown above –  Jun 06 '17 at 14:26
  • It uses [`boolean indexing`](http://pandas.pydata.org/pandas-docs/stable/indexing.html#boolean-indexing), so it should work: `df = df3[~df3.index.duplicated()]` – jezrael Jun 06 '17 at 14:27
  • @jezrael what does ~df3 mean? –  Jun 06 '17 at 14:34
  • ~ is the 'not' operator: it selects the rows where df3.index is not a duplicate. – Scott Boston Jun 06 '17 at 14:44
  • Does this answer your question? [Remove pandas rows with duplicate indices](https://stackoverflow.com/questions/13035764/remove-pandas-rows-with-duplicate-indices) – tyersome Oct 13 '21 at 01:04
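
To make the `~` in the comments above concrete, here is a minimal sketch of the mask that `df3.index.duplicated()` produces and what negating it does. The small frame is a cut-down stand-in for the data in the question (only two columns), and the names `is_dup`, `keep_mask` and `deduped` are made up for illustration.

```python
import pandas as pd

# Cut-down stand-in for the frame in the question: Date is the index,
# and 2016-05-17 appears twice.
df3 = pd.DataFrame(
    {"Stock": ["AAD"] * 5, "Volume": [5000, 6000, 2000, 2000, 3000]},
    index=pd.to_datetime(
        ["2016-05-13", "2016-05-16", "2016-05-17", "2016-05-17", "2016-05-18"]
    ),
)
df3.index.name = "Date"

# Boolean array: True where an index label has already appeared earlier
# (the default is keep='first', so the first occurrence is not flagged).
is_dup = df3.index.duplicated()

# ~ flips every boolean, so True now means "keep this row".
keep_mask = ~is_dup

# Boolean indexing keeps whole rows where the mask is True.
deduped = df3[keep_mask]
print(deduped)
```

Because the mask is applied to the frame itself, the whole row is dropped, not just the index label.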

1 Answer

Let's use the information Jezrael provided.

Input Dataframe:

print(df)
           Stock  Open  High   Low  Close  Adj Close  Volume
2016-05-13   AAD  5.23  5.26  5.20   5.26       5.26    5000
2016-05-16   AAD  5.22  5.26  5.22   5.26       5.26    6000
2016-05-17   AAD  5.21  5.26  5.21   5.26       5.26    2000
2016-05-17   AAD  5.21  5.26  5.21   5.26       5.26    2000
2016-05-18   AAD  5.20  5.25  5.20   5.25       5.25    3000

df1 = df[~df.index.duplicated(keep='last')]
print(df1)

Output:

           Stock  Open  High   Low  Close  Adj Close  Volume
2016-05-13   AAD  5.23  5.26  5.20   5.26       5.26    5000
2016-05-16   AAD  5.22  5.26  5.22   5.26       5.26    6000
2016-05-17   AAD  5.21  5.26  5.21   5.26       5.26    2000
2016-05-18   AAD  5.20  5.25  5.20   5.25       5.25    3000
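
For anyone who wants to run this end to end, the following is a self-contained sketch that rebuilds the frame from the values shown in the question and applies the same filter. The construction code and the names `df_first` and `alt` are assumptions added for illustration; the `reset_index` / `drop_duplicates(subset=...)` route is an alternative, not what the answer above uses.

```python
import pandas as pd

# Rebuild the question's data by hand, with Date as the index.
dates = pd.to_datetime(
    ["2016-05-13", "2016-05-16", "2016-05-17", "2016-05-17", "2016-05-18"]
)
df = pd.DataFrame(
    {
        "Stock": ["AAD"] * 5,
        "Open": [5.23, 5.22, 5.21, 5.21, 5.20],
        "High": [5.26, 5.26, 5.26, 5.26, 5.25],
        "Low": [5.20, 5.22, 5.21, 5.21, 5.20],
        "Close": [5.26, 5.26, 5.26, 5.26, 5.25],
        "Adj Close": [5.26, 5.26, 5.26, 5.26, 5.25],
        "Volume": [5000, 6000, 2000, 2000, 3000],
    },
    index=pd.Index(dates, name="Date"),
)

# Same filter as in the answer: keep the last row for each repeated date.
df1 = df[~df.index.duplicated(keep="last")]

# keep='first' keeps the first row instead; with identical duplicate rows,
# as here, both choices give the same frame.
df_first = df[~df.index.duplicated(keep="first")]

# Alternative: turn the index into a column, drop duplicates on that
# column only, then restore it as the index.
alt = df.reset_index().drop_duplicates(subset="Date", keep="last").set_index("Date")

print(df1)
```

The `keep` argument only matters when the duplicated dates carry different values; in that case pick `'first'` or `'last'` depending on which row should survive.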
Scott Boston