Remove adjacent duplicate words in a string with Python?

Question

How would I remove adjacent duplicate words in a string? For example, 'Hey there There' -> 'Hey there'.

https://stackoverflow.com/questions/7794208/how-can-i-remove-duplicate-words-in-a-string-with-python if you want no duplicate words at all... Or do you only want to remove adjacent duplicates? — ChrisOram, Jul 22 '21 at 07:57

Tim Biegeleisen · Accepted Answer · 2021-07-22T08:01:07.460

Using re.sub with a backreference we can try:

import re

inp = 'Hey there There'
output = re.sub(r'(\w+) \1', r'\1', inp, flags=re.IGNORECASE)
print(output)  # Hey there

The regex pattern used here says to:

(\w+)  match and capture a word
[ ]    followed by a space
\1     then followed by the same word (ignoring case)

Then we replace the whole match with just the first captured word (\1), which drops the repeat.
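
To make the breakdown above concrete, here is a minimal sketch that inspects what the pattern matches before any replacement happens. It uses only re.search with the same pattern and flag; the variable name m is just for illustration.

```python
import re

inp = 'Hey there There'

# Same pattern as in the answer, but searched rather than substituted,
# so the match object can be inspected.
m = re.search(r'(\w+) \1', inp, flags=re.IGNORECASE)

print(m.group(0))  # there There   (the whole match that gets replaced)
print(m.group(1))  # there         (the captured word that is kept)
print(m.span())    # (4, 15)
```

Because re.sub replaces group 0 with group 1, 'there There' becomes 'there'.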

edited Jul 22 '21 at 08:01

answered Jul 22 '21 at 07:58

Tim Biegeleisen

What does r mean above? – user1655130 Jul 22 '21 at 08:06
@user1655130 An `r` preceding a Python string indicates that it is a _raw_ string. We use raw strings because they make regex easier to write: backslashes do not need to be escaped. – Tim Biegeleisen Jul 22 '21 at 08:06
From a learning perspective, how would you do this with recursion? – user1655130 Jul 22 '21 at 10:00
I suggest opening a new question, as using some kind of recursive approach is very different from my current answer (but maybe I can post _another_ answer). – Tim Biegeleisen Jul 22 '21 at 10:01
Unfortunately, it won't let me ask a similar question. Thanks for your help. – user1655130 Jul 22 '21 at 14:06
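
To illustrate the raw-string point from the comments above, here is a minimal sketch of what happens to '\1' with and without the r prefix. The variable names plain and raw are only for illustration.

```python
import re

# Without the r prefix, Python interprets the escape before re ever sees it.
plain = '\1'   # octal escape: a single control character, chr(1)
raw = r'\1'    # two characters: a backslash followed by the digit 1

print(len(plain), len(raw))  # 1 2

# Only the raw form reaches re.sub as a backreference to group 1.
good = re.sub(r'(\w+) \1', raw, 'Hey there There', flags=re.IGNORECASE)
bad = re.sub(r'(\w+) \1', plain, 'Hey there There', flags=re.IGNORECASE)

print(good)       # Hey there
print(repr(bad))  # 'Hey \x01'
```

The non-raw replacement silently inserts a control character instead of the captured word, which is why the raw form is the safer habit for regex strings.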

ROHIT SHARMA 16110141 · Answer 2 · 2021-09-08T11:05:25.803

import re

inp = 'Hey there There'
output = re.sub(r'\b(\w+) \1\b', r'\1', inp, flags=re.IGNORECASE)
print(output)  # Hey there

inp = 'Hey there eating?'
output = re.sub(r'\b(\w+) \1\b', r'\1', inp, flags=re.IGNORECASE)
print(output)  # Hey there eating?

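\b asserts a word boundary, so the pattern matches whole words rather than fragments of words. The second test case ("Hey there eating?") does not work with the answer given by Tim Biegeleisen (https://stackoverflow.com/a/68481181/8439676): without \b, the trailing "e" of "there" and the leading "e" of "eating" match as "e e" and are collapsed, giving "Hey thereating?".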
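
As a possible extension of the \b pattern above (not part of either answer), the sketch below also collapses runs of three or more repeats and tolerates more than one whitespace character between words. The helper name collapse_adjacent_duplicates and the use of \s+ are assumptions for illustration.

```python
import re

# Hypothetical helper; the pattern extends the \b version shown above.
# (?:\s+\1\b)+ matches one or more further copies of the captured word,
# each preceded by whitespace.
_ADJACENT_DUPES = re.compile(r'\b(\w+)(?:\s+\1\b)+', flags=re.IGNORECASE)

def collapse_adjacent_duplicates(text):
    """Keep the first word of each run of adjacent, case-insensitive repeats."""
    return _ADJACENT_DUPES.sub(r'\1', text)

print(collapse_adjacent_duplicates('Hey there There'))         # Hey there
print(collapse_adjacent_duplicates('Hey there there  THERE'))  # Hey there
print(collapse_adjacent_duplicates('Hey there eating?'))       # Hey there eating?
```

With the two-word pattern, a single pass over 'Hey there there there' leaves 'Hey there there', because the scan resumes after the first replaced pair; the repeated group handles the whole run in one match.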
