I'm writing Python courseware and am teaching about nested for loops. I have four homegrown versions of itertools.combinations(iterable, 2) to compare, as follows:
combos.py:
import itertools

# baseline to compare against
def combos_v1(iterable):
    return itertools.combinations(iterable, 2)

def combos_v2(iterable):
    a = []
    for elem1 in iterable[:-1]:
        # pop the front element so the inner loop only sees the later elements
        iterable.pop(0)
        for elem2 in iterable:
            a.append((elem1, elem2))
    return a

def combos_v3(iterable):
    a = []
    for idx1, elem1 in enumerate(iterable[:-1]):
        # slice a copy of everything after elem1 for the inner loop
        for elem2 in iterable[idx1 + 1:]:
            a.append((elem1, elem2))
    return a

def combos_v4(iterable):
    a = []
    length = len(iterable)
    for idx1, elem1 in enumerate(iterable[:-1]):
        # index into the original list instead of slicing
        for idx2 in range(idx1 + 1, length):
            a.append((elem1, iterable[idx2]))
    return a

def combos_v5(iterable):
    a = []
    length = len(iterable)
    # pure index arithmetic in both loops
    for idx1 in range(length - 1):
        for idx2 in range(idx1 + 1, length):
            a.append((iterable[idx1], iterable[idx2]))
    return a
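For reference, a quick check along these lines confirms that all five versions produce the same pairs (combos_v1 returns an iterator, so I wrap it in list(), and combos_v2 empties the list passed to it, so each version gets its own fresh copy):

from combos import combos_v1, combos_v2, combos_v3, combos_v4, combos_v5

data = list(range(10))
# combos_v1 returns a lazy iterator, so materialize it for comparison
expected = list(combos_v1(list(data)))
for func in (combos_v2, combos_v3, combos_v4, combos_v5):
    # give each version its own copy, since combos_v2 mutates its argument
    assert list(func(list(data))) == expected, func.__name__
print("all versions agree")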
I was prepared for the last three versions to be slower than the first two, but not over 1,000 times slower:
time_combos.py:
from timeit import timeit

times = []
for ver in range(1, 6):
    times.append(timeit(
        f"combos_v{ver}(it)",
        f"from combos import combos_v{ver}; it = [42]*100",
        number=10000))

print("Results:")
for ver, time in enumerate(times, start=1):
    print(f"v{ver}: {time} secs")
Output:
Results:
v1: 0.011828199999999999 secs
v2: 0.003176400000000003 secs
v3: 4.159025300000001 secs
v4: 4.5762011000000005 secs
v5: 5.252337500000001 secs
What's happening under the hood that makes the index-based versions so much slower than combos_v2? I actually expected combos_v3 to be the slowest, because it copies part of the list on every iteration of the outer loop, but it isn't significantly different from the non-copying versions.
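To get a feel for whether the copying alone could explain the gap, a rough probe is to time just the tail slices that combos_v3 makes (plus the range loop driving them), on the same 100-element list the benchmark uses; I haven't drawn firm conclusions from it:

from timeit import timeit

# time only the outer loop's tail-slice copies (and the range loop driving them),
# using the same 100-element setup list as the benchmark
slice_only = timeit(
    "for idx1 in range(len(it) - 1): it[idx1 + 1:]",
    "it = [42]*100",
    number=10000)
print(f"tail slices alone: {slice_only} secs")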
EDIT
I suspect the answer is not indexing itself. Instead, I think the slowdown comes from Python's unlimited-precision integers. combos_v3, combos_v4, and combos_v5 all do integer math, either explicitly or implicitly (inside enumerate()), and I believe that math is more expensive with unlimited-precision integers than with native CPU integers. I don't have a good way of testing this, though. I can verify that iterating over a range is slower than iterating over a list:
>>> timeit("for __ in range(10000): pass", number=10000)
1.565418199999982
>>> timeit("for __ in it: pass", "it = list(range(10000))", number=10000)
0.6037281999999777
but it's not thousands of times slower, so I'm not sure whether that accounts for what I'm seeing. I can compare iterating over a range vs. indexing without integer math:
>>> timeit("for el in range(10000): el", number=10000)
1.8875144000000432
>>> timeit("for __ in it: it[42]", "it = list(range(10000))", number=10000)
2.2635267
but I think this is an apples-to-oranges comparison. I tried rewriting some of the slow versions using numpy.uint64 integers and they were marginally faster, but I don't know how much overhead I'm incurring from numpy.
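For what it's worth, the numpy.uint64 rewrite I tried looked roughly like this (a sketch of the idea rather than the exact code I ran; the name combos_v5_np is just for illustration, and building the index ranges with numpy.arange adds overhead of its own):

import numpy as np

def combos_v5_np(iterable):
    # same structure as combos_v5, but the loop indices are numpy.uint64
    # values, so the index arithmetic happens on fixed-width integers
    a = []
    length = len(iterable)
    one = np.uint64(1)  # uint64 literal so idx1 + one stays uint64
    for idx1 in np.arange(length - 1, dtype=np.uint64):
        for idx2 in np.arange(idx1 + one, length, dtype=np.uint64):
            a.append((iterable[idx1], iterable[idx2]))
    return a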
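Another comparison I considered, to keep the iteration identical and add only a single integer addition per step (I'm not certain it isolates exactly what I think it does):

from timeit import timeit

# identical list iteration in both statements; the second adds one int addition per step
baseline = timeit("for el in it: el", "it = list(range(10000))", number=10000)
with_add = timeit("for el in it: el + 1", "it = list(range(10000))", number=10000)
print(f"no arithmetic: {baseline} secs")
print(f"with el + 1:   {with_add} secs")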