Remove first occurrence of word in string

Question

test = 'User Key Account Department Account Start Date'

I want to remove duplicate words from strings. The solution from this question functions well...

def unique_list(l):
    ulist = []
    for x in l:
        if x not in ulist:
            ulist.append(x)
    return ulist

test = ' '.join(unique_list(test.split()))

But it keeps the first occurrence of each word and drops the later duplicates. I want the opposite: remove the earlier occurrence and keep the last one, so that the test string reads "User Key Department Account Start Date".

emremrah · Accepted Answer · 2021-07-21T20:09:01.607

3

This should do the job:

test = 'User Key Account Department Account Start Date'

words = test.split()

# if word doesn't exist in the rest of the word list, add it
test = ' '.join([word for i, word in enumerate(words) if word not in words[i+1:]])

print(test)  # User Key Department Account Start Date
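The slice `words[i+1:]` is rebuilt and searched for every word, which is fine for short strings but grows quadratically with the number of words. For longer inputs, a set-based variant along these lines may be worth considering. This is only a sketch; the function name `drop_earlier_duplicates` is made up for illustration and is not part of the answer above.

```python
def drop_earlier_duplicates(text):
    """Keep only the last occurrence of each whitespace-separated word."""
    seen = set()
    kept = []
    # Walk right to left, so the first time a word is met is its last occurrence.
    for word in reversed(text.split()):
        if word not in seen:
            seen.add(word)
            kept.append(word)
    # kept is in reverse order, so flip it back before joining.
    return ' '.join(reversed(kept))


print(drop_earlier_duplicates('User Key Account Department Account Start Date'))
# User Key Department Account Start Date
```

Each word is looked up in a set instead of a list slice, so the whole pass is roughly linear in the number of words.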

edited Jul 21 '21 at 20:09

answered Jul 21 '21 at 20:03


Yes! Functional and easily modified to work on my DataFrame column of strings. Thanks! – Michael Kessler Jul 21 '21 at 20:12

score 1 · Answer 2 · answered Jul 21 '21 at 20:10

If you want to keep just the last occurrence of each word then just start from the back and work your way forward.

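test = 'User Key Account Department Account Start Date'

tokens = test.split()
final = []

# walk the words from last to first, keeping each word the first time it is seen
for word in tokens[::-1]:
    if word not in final:
        final.append(word)

# final is in reverse order, so flip it back before joining
print(" ".join(final[::-1]))  # User Key Department Account Start Date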

score 1 · Answer 3 · answered Jul 21 '21 at 20:11


Here is one way to do it:

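test = 'User Key Account Department Account Start Date'

words = test.split()
# count whole words (words.count), not substrings (test.count)
dupes = {w for w in words if words.count(w) > 1}

for w in dupes:
    # list.remove drops only the first match, so repeat until one copy is left
    while words.count(w) > 1:
        words.remove(w)

res = ' '.join(words)

print(res)  # User Key Department Account Start Date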
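One thing to watch with count-based approaches: `str.count` counts substring matches, while `list.count` compares whole elements, so the two can disagree when one word is contained in another. A small illustration, using a made-up input string:

```python
text = 'Accounts Key Account'

# str.count looks for substrings, so 'Account' is also found inside 'Accounts'.
substring_hits = text.count('Account')

# list.count compares whole elements, so only the exact word matches.
word_hits = text.split().count('Account')

print(substring_hits, word_hits)  # 2 1
```

Counting on the split list avoids treating 'Account' as a duplicate when it only appears once as a word.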

answered Jul 21 '21 at 20:11

IoaTzimas


score 1 · Answer 4 · answered Jul 21 '21 at 20:12

You can split the source string into a list, reverse it, pass it through the unique_list function, and then reverse the result again before joining it back into a string.

def unique_list(l):
    ulist = []
    for x in l:
        if x not in ulist:
            ulist.append(x)
    return ulist


orig = "User Key Account Department Account Start Date"
orig_list = orig.split()
orig_list.reverse()

uniq_rev = unique_list(orig_list)
uniq_rev.reverse()

print(orig)
print(' '.join(uniq_rev))

Example:

$ python rev.py 
User Key Account Department Account Start Date
User Key Department Account Start Date
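The same reverse, deduplicate, reverse idea can probably be written without a helper function, assuming Python 3.7+ where dictionaries keep insertion order. The variable names `last_seen` and `result` below are just for illustration.

```python
orig = 'User Key Account Department Account Start Date'

# dict.fromkeys keeps the first time each key is seen; on the reversed
# word list that is the last occurrence in the original string.
last_seen = dict.fromkeys(reversed(orig.split()))

# Restore the original left-to-right order before joining.
result = ' '.join(reversed(list(last_seen)))
print(result)  # User Key Department Account Start Date
```

Dictionary keys are hashed, so this also avoids the repeated list scans done by `x not in ulist`.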

Kfcaio · Answer 5 · 2021-07-22T18:05:41.850

If you like it functional:

from functools import reduce
from collections import Counter

import re


if __name__ == '__main__':
    sentence = 'User Key Account Department Account Start Date'

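    result = reduce(
        # escape the word and require whitespace (or a string end) on both sides,
        # so that only whole words match, never substrings of longer words
        lambda sentence, word: re.sub(rf'(?<!\S){re.escape(word)}(?!\S)\s*', '', sentence, count=1),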
        map(
            lambda item: item[0],
            filter(
                lambda item: item[1] > 1,
                Counter(sentence.split()).items()
            )
        ),
        sentence
    )

    print(result)
    # User Key Department Account Start Date

score -1 · Answer 6 · answered Jul 21 '21 at 19:56


Put all the elements into a set.

Tokenize your sentence into strings and insert them into a set.

std::set<std::string> s;

s.insert("aa");
s.insert("bb");
s.insert("cc");
s.insert("cc");
s.insert("dd");

answered Jul 21 '21 at 19:56

amit gupta


Splitting the string and then using the set function removes duplicates, but it also loses the order. I need to keep the words in their original order. – Michael Kessler Jul 21 '21 at 20:01
