
Given a Pandas Series (or numpy array) like this:

import pandas as pd
myseries = pd.Series([1, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 3, 3, 2, 2, 3, 3, 1])

Is there a good way to remove sequential duplicates, much like the unix uniq tool does? The numpy/pandas unique() and pandas drop_duplicates functions remove all duplicates (like unix's | sort | uniq), but I don't want this:

>>> print(myseries.unique())
[1 2 3 4]

I want this:

>>> print(myseries.my_mystery_function())
[1, 2, 3, 4, 3, 2, 3, 1]
DrAl

3 Answers


Compare the Series with a shifted copy of itself using ne (!=) and filter by boolean indexing. The first element is always kept, because shift() leaves a NaN in that position and nothing compares equal to NaN:

myseries = myseries[myseries.ne(myseries.shift())].tolist()
print(myseries)
[1, 2, 3, 4, 3, 2, 3, 1]

If performance is important, use Divakar's solution.
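If you need the result as a Series that keeps its original index rather than a plain list, the same mask works without the tolist() call. A minimal sketch, with the wrapper name drop_consecutive chosen here purely for illustration:

import pandas as pd

def drop_consecutive(s):
    # Keep the first element and every element that differs from the one before it
    return s[s.ne(s.shift())]

myseries = pd.Series([1, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 3, 3, 2, 2, 3, 3, 1])
print(drop_consecutive(myseries).tolist())
# [1, 2, 3, 4, 3, 2, 3, 1]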

jezrael

We can use slicing to compare each element with the one before it -

In [61]: import numpy as np

In [62]: a = myseries.values

In [63]: a[np.r_[True, a[:-1] != a[1:]]]
Out[63]: array([1, 2, 3, 4, 3, 2, 3, 1])
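Here np.r_ just concatenates: np.r_[True, a[:-1] != a[1:]] prepends True, so the first element is always kept, and the rest of the mask is True wherever a value differs from its predecessor. A minimal sketch wrapping this up, with the function name uniq_consecutive chosen purely for illustration:

import numpy as np
import pandas as pd

def uniq_consecutive(values):
    a = np.asarray(values)
    # True at index 0 and at every position whose value differs from the one before it
    mask = np.r_[True, a[:-1] != a[1:]]
    return a[mask]

myseries = pd.Series([1, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 3, 3, 2, 2, 3, 3, 1])
print(uniq_consecutive(myseries))
# [1 2 3 4 3 2 3 1]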
Divakar

A version of jezrael's answer using the != operator instead of ne:

print(myseries[myseries!=myseries.shift()].tolist())

Output:

[1, 2, 3, 4, 3, 2, 3, 1]
U12-Forward