I'm writing Python courseware and am teaching about nested for loops. I have four homegrown versions of itertools.combinations(iterable, 2) to compare, as follows:
combos.py:
import itertools

# baseline to compare against
def combos_v1(iterable):
    return itertools.combinations(iterable, 2)

def combos_v2(iterable):
    a = []
    for elem1 in iterable[:-1]:
        # pop the front element so the inner loop only sees the later elements
        iterable.pop(0)
        for elem2 in iterable:
            a.append((elem1, elem2))
    return a

def combos_v3(iterable):
    a = []
    for idx1, elem1 in enumerate(iterable[:-1]):
        # slice a copy of everything after elem1 for the inner loop
        for elem2 in iterable[idx1 + 1:]:
            a.append((elem1, elem2))
    return a

def combos_v4(iterable):
    a = []
    length = len(iterable)
    for idx1, elem1 in enumerate(iterable[:-1]):
        # index into the original list instead of slicing
        for idx2 in range(idx1 + 1, length):
            a.append((elem1, iterable[idx2]))
    return a

def combos_v5(iterable):
    a = []
    length = len(iterable)
    # pure index arithmetic in both loops
    for idx1 in range(length - 1):
        for idx2 in range(idx1 + 1, length):
            a.append((iterable[idx1], iterable[idx2]))
    return a
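For reference, a quick check along these lines confirms that all five versions produce the same pairs (combos_v1 returns an iterator, so I wrap it in list(), and combos_v2 empties the list passed to it, so each version gets its own fresh copy):

from combos import combos_v1, combos_v2, combos_v3, combos_v4, combos_v5

data = list(range(10))
# combos_v1 returns a lazy iterator, so materialize it for comparison
expected = list(combos_v1(list(data)))
for func in (combos_v2, combos_v3, combos_v4, combos_v5):
    # give each version its own copy, since combos_v2 mutates its argument
    assert list(func(list(data))) == expected, func.__name__
print("all versions agree")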
I was prepared for the last three versions to be slower than the first two, but not over 1,000 times slower:
time_combos.py:
from timeit import timeit

times = []
for ver in range(1, 6):
    times.append(timeit(
        f"combos_v{ver}(it)",
        f"from combos import combos_v{ver}; it = [42]*100",
        number=10000))

print("Results:")
for ver, time in enumerate(times, start=1):
    print(f"v{ver}: {time} secs")
Output:
Results:
v1: 0.011828199999999999 secs
v2: 0.003176400000000003 secs
v3: 4.159025300000001 secs
v4: 4.5762011000000005 secs
v5: 5.252337500000001 secs
What's happening under the hood that makes the index-based versions so much slower than combos_v2? I actually expected combos_v3 to be the slowest, because it copies part of the list on every iteration of the outer loop, but it isn't significantly different from the non-copying versions.
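To get a feel for whether the copying alone could explain the gap, a rough probe is to time just the tail slices that combos_v3 makes (plus the range loop driving them), on the same 100-element list the benchmark uses; I haven't drawn firm conclusions from it:

from timeit import timeit

# time only the outer loop's tail-slice copies (and the range loop driving them),
# using the same 100-element setup list as the benchmark
slice_only = timeit(
    "for idx1 in range(len(it) - 1): it[idx1 + 1:]",
    "it = [42]*100",
    number=10000)
print(f"tail slices alone: {slice_only} secs")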
EDIT
I suspect the answer is not indexing itself. Instead, I think the slowdown comes from Python's unlimited-precision integers. combos_v3, combos_v4, and combos_v5 all do integer math, either explicitly or implicitly (inside enumerate()), and I believe that math is more expensive with unlimited-precision integers than with native CPU integers. I don't have a good way of testing this, though. I can verify that iterating over a range is slower than iterating over a list:
>>> timeit("for __ in range(10000): pass", number=10000)
1.565418199999982
>>> timeit("for __ in it: pass", "it = list(range(10000))", number=10000)
0.6037281999999777
but it's not thousands of times slower, so I'm not sure whether that accounts for what I'm seeing. I can compare iterating over a range vs. indexing without integer math:
>>> timeit("for el in range(10000): el", number=10000)
1.8875144000000432
>>> timeit("for __ in it: it[42]", "it = list(range(10000))", number=10000)
2.2635267
but I think this is an apples-to-oranges comparison. I tried rewriting some of the slow versions using numpy.uint64 integers and they were marginally faster, but I don't know how much overhead I'm incurring from numpy.
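For what it's worth, the numpy.uint64 rewrite I tried looked roughly like this (a sketch of the idea rather than the exact code I ran; the name combos_v5_np is just for illustration, and building the index ranges with numpy.arange adds overhead of its own):

import numpy as np

def combos_v5_np(iterable):
    # same structure as combos_v5, but the loop indices are numpy.uint64
    # values, so the index arithmetic happens on fixed-width integers
    a = []
    length = len(iterable)
    one = np.uint64(1)  # uint64 literal so idx1 + one stays uint64
    for idx1 in np.arange(length - 1, dtype=np.uint64):
        for idx2 in np.arange(idx1 + one, length, dtype=np.uint64):
            a.append((iterable[idx1], iterable[idx2]))
    return a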
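Another comparison I considered, to keep the iteration identical and add only a single integer addition per step (I'm not certain it isolates exactly what I think it does):

from timeit import timeit

# identical list iteration in both statements; the second adds one int addition per step
baseline = timeit("for el in it: el", "it = list(range(10000))", number=10000)
with_add = timeit("for el in it: el + 1", "it = list(range(10000))", number=10000)
print(f"no arithmetic: {baseline} secs")
print(f"with el + 1:   {with_add} secs")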