How would I remove adjacent duplicate words in a string? For example, 'Hey there There' -> 'Hey there'.
Viewed 597 times
0
-
See https://stackoverflow.com/questions/7794208/how-can-i-remove-duplicate-words-in-a-string-with-python if you want no duplicate words at all. Or do you only want to remove adjacent duplicates? – ChrisOram Jul 22 '21 at 07:57
-
These words are not adjacent though – user1655130 Jul 22 '21 at 07:58
2 Answers
8
Using re.sub with a backreference, we can try:
import re

inp = 'Hey there There'
output = re.sub(r'(\w+) \1', r'\1', inp, flags=re.IGNORECASE)
print(output) # Hey there
The regex pattern used here says to:
(\w+) match and capture a word
[ ] followed by a space
\1 then followed by the same word (ignoring case)
Then we replace the whole match with the first captured word (\1), which drops the adjacent duplicate.
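As an illustration of the breakdown above, here is a minimal sketch that writes the same three-part pattern with `re.VERBOSE` so each piece carries its own comment. The names `ADJACENT_DUPLICATE` and `collapse_adjacent` are hypothetical, introduced only for this example. In verbose mode a bare space in the pattern is ignored, which is why the space is written as `[ ]`.

```python
import re

# Hypothetical names for illustration; the pattern is the one explained above.
ADJACENT_DUPLICATE = re.compile(
    r"""
    (\w+)   # match and capture a word
    [ ]     # followed by a single space
    \1      # then the same text again (case is ignored via the flag below)
    """,
    flags=re.IGNORECASE | re.VERBOSE,
)


def collapse_adjacent(text):
    """Replace each 'word word' match with the first captured word."""
    return ADJACENT_DUPLICATE.sub(r"\1", text)


print(collapse_adjacent("Hey there There"))  # Hey there
```

Compiling the pattern once is a stylistic choice here; calling `re.sub` directly, as in the answer, behaves the same way.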
Tim Biegeleisen
-
@user1655130 An `r` preceding a Python string indicates that it is a _raw_ string. We use raw strings because they make regex patterns easier to write: backslashes do not need to be escaped. – Tim Biegeleisen Jul 22 '21 at 08:06
-
from a learning perspective - how would you do this with recursion? – user1655130 Jul 22 '21 at 10:00
-
I suggest opening a new question, as using some kind of recursive approach is very different from my current answer (but maybe I can post _another_ answer). – Tim Biegeleisen Jul 22 '21 at 10:01
-
Unfortunately, it won't let me ask a similar question. Thanks for your help – user1655130 Jul 22 '21 at 14:06
2
import re

inp = 'Hey there There'
output = re.sub(r'\b(\w+) \1\b', r'\1', inp, flags=re.IGNORECASE)
print(output) # Hey there
inp = 'Hey there eating?'
output = re.sub(r'\b(\w+) \1\b', r'\1', inp, flags=re.IGNORECASE)
print(output) # Hey there eating?
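\b asserts a word boundary, so the pattern matches only whole words rather than fragments of words. The second test case ("Hey there eating?") does not work with the answer given by Tim Biegeleisen (https://stackoverflow.com/a/68481181/8439676): without \b, the "e" that ends "there" and the "e" that starts "eating" are treated as a repeated word, and the result is "Hey thereating?".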
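To make the difference concrete, the following sketch runs both patterns on the same input side by side. The variable names are chosen only for this illustration.

```python
import re

text = "Hey there eating?"

# Without \b: the final "e" of "there" and the first "e" of "eating"
# satisfy (\w+) \1, so "e e" is collapsed to "e".
without_boundary = re.sub(r"(\w+) \1", r"\1", text, flags=re.IGNORECASE)

# With \b on both sides: the capture must start and end at word boundaries,
# so only whole repeated words can match.
with_boundary = re.sub(r"\b(\w+) \1\b", r"\1", text, flags=re.IGNORECASE)

print(without_boundary)  # Hey thereating?
print(with_boundary)     # Hey there eating?
```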