python remove duplicates from 2 lists

Question

I am trying to remove duplicates from 2 lists, so I wrote this code:

a = ["abc", "def", "ijk", "lmn", "opq", "rst", "xyz"]

b = ["ijk", "lmn", "opq", "rst", "123", "456", ]

for i in b:
    if i in a:
        print "found " + i
        b.remove(i)

print b

But I find that a matching item that directly follows another matched item does not get removed.

I get result like this:

found ijk
found opq
['lmn', 'rst', '123', '456']

But I expect a result like this:

['123', '456']

How can I fix my function to do what I want?

Thank you.

7stud · Answer 1 · 2017-05-03T14:39:49.543

Here is what's going on. Suppose you have this list:

['a', 'b', 'c', 'd']

and you are looping over every element in the list. Suppose you are currently at index position 1:

['a', 'b', 'c', 'd']
       ^
       |
   index = 1

...and you remove the element at index position 1, giving you this:

['a',      'c', 'd']
       ^
       |
    index 1

After removing the item, the other items slide to the left, giving you this:

['a', 'c', 'd']
       ^
       |
    index 1

Then when the loop runs again, the loop increments the index to 2, giving you this:

['a', 'c', 'd']
            ^ 
            |
         index = 2

See how you skipped over 'c'? The lesson is: never delete an element from a list that you are looping over.
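The skipping can be reproduced with a short trace. This is only a sketch, written in Python 3 syntax (the question uses Python 2 print statements); the names `items` and `visited` are made up here for illustration.

```python
# Record which elements the loop actually visits while the list shrinks.
items = ['a', 'b', 'c', 'd']
visited = []

for index, value in enumerate(items):
    visited.append((index, value))
    if value == 'b':
        # Removing 'b' slides 'c' into index 1, which the loop has already passed.
        items.remove(value)

print(visited)  # [(0, 'a'), (1, 'b'), (2, 'd')]
print(items)    # ['a', 'c', 'd']
```

'c' is still in the list, but the loop never looked at it.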

score 30 · Answer 2 · answered Aug 12 '13 at 19:20

Your problem seems to be that you're changing the list you're iterating over. Iterate over a copy of the list instead.

for i in b[:]:
    if i in a:
        b.remove(i)


>>> b
['123', '456']

But how about using a list comprehension instead?

>>> a = ["abc", "def", "ijk", "lmn", "opq", "rst", "xyz"]
>>> b = ["ijk", "lmn", "opq", "rst", "123", "456", ]
>>> [elem for elem in b if elem not in a ]
['123', '456']
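Note that the comprehension builds a new list. If other names refer to the same list object and it has to be changed in place, the result can be assigned back through a slice. A minimal sketch in Python 3 syntax; `alias` is a hypothetical second reference added only to show the effect.

```python
a = ["abc", "def", "ijk", "lmn", "opq", "rst", "xyz"]
b = ["ijk", "lmn", "opq", "rst", "123", "456"]
alias = b  # hypothetical second reference to the same list object

# Slice assignment replaces the contents of the existing list
# instead of rebinding the name b to a new list.
b[:] = [elem for elem in b if elem not in a]

print(b)           # ['123', '456']
print(alias is b)  # True
```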

Sukrit Kalra

If the `a` list grows longer it may turn out that turning it into a `set` is a lot more efficient (`x in s` is O(1) for sets, O(n) for lists) according to http://wiki.python.org/moin/TimeComplexity – Frerich Raabe Aug 12 '13 at 19:40
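A sketch of what that comment suggests, assuming the elements are hashable (strings are). The name `a_lookup` is made up here; the order of `b` is still preserved because only the membership test uses the set.

```python
a = ["abc", "def", "ijk", "lmn", "opq", "rst", "xyz"]
b = ["ijk", "lmn", "opq", "rst", "123", "456"]

# Build the set once; each membership test then avoids scanning the whole list.
a_lookup = set(a)
result = [elem for elem in b if elem not in a_lookup]

print(result)  # ['123', '456']
```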

Mario Rossi · Answer 3 · 2013-08-12T21:57:20.763

26

What about

b= set(b) - set(a)

If you need possible repetitions in b to also appear repeated in the result and/or order to be preserved, then

b= [ x for x in b if not x in a ]

would do.
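To make the difference between the two options concrete, here is a small sketch in Python 3 syntax with a `b` that contains a repeated item. The shortened lists and the names `as_set` and `as_list` are made up for illustration.

```python
a = ["ijk", "lmn"]
b = ["456", "ijk", "123", "456", "lmn"]

as_set = set(b) - set(a)                 # repeats collapse, order is not guaranteed
as_list = [x for x in b if x not in a]   # repeats and order are kept

print(as_set == {"123", "456"})  # True
print(as_list)                   # ['456', '123', '456']
```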

edited Aug 12 '13 at 21:57

answered Aug 12 '13 at 19:24


This answer was downvoted once. Can anybody tell why? Any grave syntax/conceptual error? Not contributing to the question asked (and considering that sometimes it is extremely difficult to understand what is being asked)? Bad English to the point of unintelligibility? – Mario Rossi Aug 12 '13 at 21:59
I saw `if not x in a`, which looks a bit strange to me. It works, but I think changing it to `if x not in a` would make the code clearer. My personal opinion. – Lê Tư Thành Jun 28 '19 at 09:59
Note that the list comprehension option here must be written: `[x for x in d if not (x in o )]` to pass pep8. – Rob May 04 '20 at 22:04

score 4 · Answer 4 · answered Aug 12 '13 at 19:26

You asked to remove the duplicates from both lists; here's my solution:

from collections import OrderedDict
a = ["abc", "def", "ijk", "lmn", "opq", "rst", "xyz"]
b = ["ijk", "lmn", "opq", "rst", "123", "456", ]

x = OrderedDict.fromkeys(a)
y = OrderedDict.fromkeys(b)

for k in list(x):  # iterate over a snapshot of the keys, since x is modified in the loop
    if k in y:
        x.pop(k)
        y.pop(k)


print x.keys()
print y.keys()

Result:

['abc', 'def', 'xyz']
['123', '456']

The nice thing here is that you keep the order of the items in both lists.
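On Python 3.7 and later the built-in dict also keeps insertion order, so the same idea can be sketched without OrderedDict and without popping keys inside a loop. This is a variation, not the code from the answer; `shared`, `unique_a` and `unique_b` are names made up for illustration.

```python
a = ["abc", "def", "ijk", "lmn", "opq", "rst", "xyz"]
b = ["ijk", "lmn", "opq", "rst", "123", "456"]

# Items that appear in both lists.
shared = set(a) & set(b)

# dict.fromkeys drops repeats within each list while keeping first-seen order.
unique_a = [k for k in dict.fromkeys(a) if k not in shared]
unique_b = [k for k in dict.fromkeys(b) if k not in shared]

print(unique_a)  # ['abc', 'def', 'xyz']
print(unique_b)  # ['123', '456']
```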

score 3 · Answer 5 · answered Aug 12 '13 at 19:22

Or use a set:

set(b).difference(a)

Be forewarned: sets will not preserve order, if that is important.

Joran Beasley

score 3 · Answer 6 · edited Dec 15 '20 at 16:21

You can use lambda functions.

f = lambda list1, list2: list(filter(lambda element: element not in list2, list1))

The result contains the elements of list1 that do not appear in list2.

>>> a = ["abc", "def", "ijk", "lmn", "opq", "rst", "xyz"]
>>> b = ["ijk", "lmn", "opq", "rst", "123", "456"]
>>> f(a, b)
['abc', 'def', 'xyz']
>>> f(b, a)
['123', '456']

BcK

answered Dec 14 '20 at 12:31

Golvin

score 2 · Answer 7 · answered Aug 12 '13 at 20:24

One way of avoiding the problem of editing a list while you iterate over it is to use a comprehension:

a = ["abc", "def", "ijk", "lmn", "opq", "rst", "xyz"]
b = ["ijk", "lmn", "opq", "rst", "123", "456", ]
b = [x for x in b if not x in a]

Mayur Patel

Same solution posted 1h ago by Mario Rossi and Sukrit Kalra. – DevLounge Aug 12 '13 at 20:29
Perhaps @Mayur Patel started writing it at the same time as me. This is a topic for meta (I guess): either blocking questions when 1 (or perhaps 2) people are answering them (for a certain amount of time?), or at least an indication of how many other people are answering them. I mean before the answers are **posted**. I'm a newbie, though. If something like this is already there, please let me know. – Mario Rossi Aug 12 '13 at 22:04

Vincenzo Pii · Answer 8 · 2013-08-12T20:35:22.653

There are already many answers on "how can you fix it?", so this is a "how can you improve it and be more pythonic?": since what you want is the difference between list b and list a, you should use the difference operation on sets. Note that the result is a set, so the original order and any repeated items are lost:

>>> a = ["abc", "def", "ijk", "lmn", "opq", "rst", "xyz"]
>>> b = ["ijk", "lmn", "opq", "rst", "123", "456", ]
>>> s1 = set(a)
>>> s2 = set(b)
>>> s2 - s1
set(['123', '456'])

score 0 · Answer 9 · answered Dec 14 '20 at 13:09

Along the lines of 7stud, if you go through the list in reversed order, you don't have the problem you encountered:

a = ["abc", "def", "ijk", "lmn", "opq", "rst", "xyz"]

b = ["ijk", "lmn", "opq", "rst", "123", "456", ]

for i in reversed(b):
    if i in a:
        print "found " + i
        b.remove(i)

print b

Output:
found rst
found opq
found lmn
found ijk
['123', '456']
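One way to see why the reverse direction is safe: deleting the element at position i only shifts the elements after i, and those have already been visited. A sketch that makes the index explicit, in Python 3 syntax (an illustration, not the code from the answer):

```python
a = ["abc", "def", "ijk", "lmn", "opq", "rst", "xyz"]
b = ["ijk", "lmn", "opq", "rst", "123", "456"]

# Walk the indexes from the last one down to 0.
for i in range(len(b) - 1, -1, -1):
    if b[i] in a:
        # Only elements at higher indexes move, and those are already done.
        del b[i]

print(b)  # ['123', '456']
```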

score 0 · Answer 10 · answered Sep 08 '21 at 11:44

You can use a list comprehension.

a = ["abc", "def", "ijk", "lmn", "opq", "rst", "xyz"]
b = ["ijk", "lmn", "opq", "rst", "123", "456", ]

Duplicate values removed from a:

c=[value for value in a if value not in b]

Duplicate values removed from b:

c=[value for value in b if value not in a]

Baka_coder · Answer 11 · 2022-05-01T13:42:37.297

0

a = ["abc", "def", "ijk", "lmn", "opq", "rst", "xyz"]

b = ["ijk", "lmn", "opq", "rst", "123", "456","abc"]

for i in a:
    if i in b:
        print("found", i)
        b.remove(i)
print(b)

output:
found abc
found ijk
found lmn
found opq
found rst
['123', '456']

edited May 01 '22 at 13:42

answered May 01 '22 at 13:36


TheIrishPizzaGuy · Answer 12 · 2021-08-27T10:35:41.553

-1

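A simple fix would be to iterate by index instead: look at the element at the index, delete that element if it is in a, then decrement the counter by 1. This needs a while loop, because reassigning the loop variable of a for loop over range() has no effect on the next iteration, and the range would also run past the end of the shrunken list.

i = 0
while i < len(b):  # len(b) is re-evaluated on every pass
    if b[i] in a:
        del b[i]
        i -= 1  # step back so the element that slid into this slot is checked
    i += 1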

edited Aug 27 '21 at 10:35

answered Aug 27 '21 at 10:14

Deleting elements of an iterable that you are iterating over is typically going to result in problems. What happens when you do run this code with the example lists provided? – Frodnar Aug 27 '21 at 10:58
