0

How would I remove adjacent duplicate words in a string? For example, 'Hey there There' -> 'Hey there'.

Right leg
user1655130
  • See https://stackoverflow.com/questions/7794208/how-can-i-remove-duplicate-words-in-a-string-with-python if you want no duplicate words at all. Or do you only want to remove adjacent duplicates? – ChrisOram Jul 22 '21 at 07:57
  • These words are not adjacent though – user1655130 Jul 22 '21 at 07:58

2 Answers

8

Using re.sub with a backreference we can try:

import re

inp = 'Hey there There'
output = re.sub(r'(\w+) \1', r'\1', inp, flags=re.IGNORECASE)
print(output)  # Hey there

The regex pattern used here says to:

(\w+)  match and capture a word
[ ]    followed by a space
\1     then followed by the same word (ignoring case)

Then we replace the whole match with the first captured word, which drops the repeated one.
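
To make the capture and the replacement concrete, here is a minimal sketch that inspects what the pattern matches on the example input. The variable names are illustrative only.

```python
import re

# Same pattern as in the answer above; IGNORECASE lets "there" match "There".
pattern = re.compile(r'(\w+) \1', flags=re.IGNORECASE)

match = pattern.search('Hey there There')
whole = match.group(0)   # the full matched text: 'there There'
first = match.group(1)   # the captured first word: 'there'

# Replacing the whole match with group 1 drops the repeated word.
result = pattern.sub(r'\1', 'Hey there There')
print(whole, '|', first, '|', result)
```

Because group 1 holds the first occurrence, the output keeps the casing of the first word ('there'), not the second ('There').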

Tim Biegeleisen
2
import re

inp = 'Hey there There'
output = re.sub(r'\b(\w+) \1\b', r'\1', inp, flags=re.IGNORECASE)
print(output)  # Hey there

inp = 'Hey there eating?'
output = re.sub(r'\b(\w+) \1\b', r'\1', inp, flags=re.IGNORECASE)
print(output)  # Hey there eating?

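\b asserts a word boundary, so the pattern matches whole words rather than fragments of words. The second test case ("Hey there eating?") does not work with the answer given by Tim Biegeleisen (https://stackoverflow.com/a/68481181/8439676): without \b, the final "e" of "there" and the initial "e" of "eating" are treated as a repeated word.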
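
As a rough sketch of the difference, the snippet below runs the pattern without \b on the second test case and compares it with a boundary-aware helper. The helper name dedupe_adjacent is hypothetical, and it goes slightly beyond both answers by assuming that runs of more than two repeats and any amount of whitespace between them should also be collapsed.

```python
import re

def dedupe_adjacent(text):
    """Collapse runs of the same word (case-insensitive) into its first occurrence."""
    # \b(\w+)\b        capture one whole word
    # (?:\s+\1\b)+     one or more repeats of it, separated by whitespace
    return re.sub(r'\b(\w+)\b(?:\s+\1\b)+', r'\1', text, flags=re.IGNORECASE)

# Without \b, the last "e" of "there" pairs with the first "e" of "eating".
no_boundary = re.sub(r'(\w+) \1', r'\1', 'Hey there eating?', flags=re.IGNORECASE)
print(no_boundary)                            # Hey thereating?
print(dedupe_adjacent('Hey there eating?'))   # Hey there eating?
print(dedupe_adjacent('Hey there There'))     # Hey there
print(dedupe_adjacent('go go  GO now'))       # go now
```

If only a single space between the words should count, as in the answers above, replace \s+ with a literal space.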