35

Suppose I have two lists, L and M. Now I want to know if they share an element. Which would be the fastest way of asking (in python) if they share an element? I don't care which elements they share, or how many, just if they share or not.

For example, in this case

L = [1,2,3,4,5,6]
M = [8,9,10]

I should get False, and here:

L = [1,2,3,4,5,6]
M = [5,6,7]

I should get True.

I hope the question's clear. Thanks!

Manuel

Manuel Araoz
  • 15,465
  • 22
  • 68
  • 94
  • 4
    See http://stackoverflow.com/questions/3170055/test-if-lists-share-any-items-in-python for a more thorough analysis of this problem. `not frozenset(L).isdisjoint(M)` seems to be the optimal solution. – Edward Falk Apr 04 '15 at 19:16

4 Answers4

54

Or more concisely

if set(L) & set(M):
    # there is an intersection
else:
    # no intersection

If you really need True or False

bool(set(L) & set(M))

After running some timings, this seems to be a good option to try too

m_set=set(M)
any(x in m_set  for x in L)

If the items in M or L are not hashable you have to use a less efficient approach like this

any(x in M for x in L)

Here are some timings for 100 item lists. Using sets is considerably faster when there is no intersection, and a bit slower when there is a considerable intersection.

M=range(100)
L=range(100,200)

timeit set(L) & set(M)
10000 loops, best of 3: 32.3 µs per loop

timeit any(x in M for x in L)
1000 loops, best of 3: 374 µs per loop

timeit m_set=frozenset(M);any(x in m_set  for x in L)
10000 loops, best of 3: 31 µs per loop

L=range(50,150)

timeit set(L) & set(M)
10000 loops, best of 3: 18 µs per loop

timeit any(x in M for x in L)
100000 loops, best of 3: 4.88 µs per loop

timeit m_set=frozenset(M);any(x in m_set  for x in L)
100000 loops, best of 3: 9.39 µs per loop


# Now for some random lists
import random
L=[random.randrange(200000) for x in xrange(1000)]
M=[random.randrange(200000) for x in xrange(1000)]

timeit set(L) & set(M)
1000 loops, best of 3: 420 µs per loop

timeit any(x in M for x in L)
10 loops, best of 3: 21.2 ms per loop

timeit m_set=set(M);any(x in m_set  for x in L)
1000 loops, best of 3: 168 µs per loop

timeit m_set=frozenset(M);any(x in m_set  for x in L)
1000 loops, best of 3: 371 µs per loop
John La Rooy
  • 281,034
  • 50
  • 354
  • 495
  • 1
    @gnibbler - Is it provable that the `any()` version is less efficient? It seems like it would go through `M` only until it found an element in `L`, at which point `any` would return `True` and be done. This sounds more efficient than converting both `L` and `M` to sets beforehand. At least, on paper. – Chris Lutz Feb 04 '10 at 05:40
  • This here, this is the answer. – jathanism Feb 04 '10 at 05:42
  • 2
    @Chris, worst case is when when there is no intersection - O(l*m). With sets i believe it is O(l+m) – John La Rooy Feb 04 '10 at 05:43
  • WOW! so bool(set(L) & set(M)) is faster than any(x in M for x in L)... Who would think? :) Thank you. – Manuel Araoz Feb 04 '10 at 05:54
  • 1
    @Manuel - The best, it seems, is to convert _one_ list to a `set` to allow for faster membership testing (`in`), then to filter based on this membership test (`x in m_set for x in L`). @gnibbler, can we get some tests that utilize two randomly constructed lists just for completeness? (and also +1 for a fine job) – Chris Lutz Feb 04 '10 at 06:07
  • Do you see any difference between frozenset and set in these tests? I just picked frozenset by default because this didn't happen to need mutability. – Darius Bacon Feb 04 '10 at 06:15
  • @Darius, see the final test. set was considerably faster than frozenset. – John La Rooy Feb 04 '10 at 06:26
5

To avoid the work of constructing the intersection, and produce an answer as soon as we know that they intersect:

m_set = frozenset(M)
return any(x in m_set for x in L)

Update: gnibbler tried this out and found it to run faster with set() in place of frozenset(). Whaddayaknow.

Darius Bacon
  • 14,703
  • 5
  • 51
  • 53
3

First of all, if you do not need them ordered, then switch to the set type.

If you still need the list type, then do it this way: 0 == False

len(set.intersection(set(L), set(M)))
gahooa
  • 122,825
  • 12
  • 91
  • 98
  • This doesn't seem very efficient. I mean, the whole intersection is been calculated, isn't it!? Or is it lazily evaluated? Thanks! – Manuel Araoz Feb 04 '10 at 05:41
  • @Manuel, when i tested it, the intersection took less time to calculate than the time converting the lists to sets, so less than 1/3 of the total time – John La Rooy Feb 04 '10 at 06:06
-1

That's the most generic and efficient in a balanced way I could come up with (comments should make the code easy to understand):

import itertools, operator

def _compare_product(list1, list2):
    "Return if any item in list1 equals any item in list2 exhaustively"
    return any(
        itertools.starmap(
            operator.eq,
            itertools.product(list1, list2)))

def do_they_intersect(list1, list2):
    "Return if any item is common between list1 and list2"

    # do not try to optimize for small list sizes
    if len(list1) * len(list2) <= 100: # pick a small number
        return _compare_product(list1, list2)

    # first try to make a set from one of the lists
    try: a_set= set(list1)
    except TypeError:
        try: a_set= set(list2)
        except TypeError:
            a_set= None
        else:
            a_list= list1
    else:
        a_list= list2

    # here either a_set is None, or we have a_set and a_list

    if a_set:
        return any(itertools.imap(a_set.__contains__, a_list))

    # try to sort the lists
    try:
        a_list1= sorted(list1)
        a_list2= sorted(list2)
    except TypeError: # sorry, not sortable
        return _compare_product(list1, list2)

    # they could be sorted, so let's take the N+M road,
    # not the N*M

    iter1= iter(a_list1)
    iter2= iter(a_list2)
    try:
        item1= next(iter1)
        item2= next(iter2)
    except StopIteration: # one of the lists is empty
        return False # ie no common items

    while 1:
        if item1 == item2:
            return True
        while item1 < item2:
            try: item1= next(iter1)
            except StopIteration: return False
        while item2 < item1:
            try: item2= next(iter2)
            except StopIteration: return False

HTH.

tzot
  • 87,612
  • 28
  • 135
  • 198