How to loop over an iterator twice?
It is impossible! (Explained later.) Instead, do one of the following:
Collect the iterator into something that can be looped over multiple times.
items = list(iterator)
for item in items:
    ...
Downside: This costs memory.
Create a new iterator. Constructing the iterator object itself is usually very cheap.
for item in create_iterator():
    ...
for item in create_iterator():
    ...
Downside: Iteration itself may be expensive (e.g. reading from disk or network).
Reset the "iterator". For example, with file iterators:
with open(...) as f:
    for item in f:
        ...
    f.seek(0)
    for item in f:
        ...
Downside: Most iterators cannot be "reset".
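The three options above can be sketched side by side. This is only an illustration under stated assumptions: create_iterator is a hypothetical factory written here as a generator function, and io.StringIO stands in for a real file so the example runs without touching the disk.

```python
import io


def create_iterator():
    """Hypothetical factory: every call returns a fresh generator."""
    yield from ("a", "b", "c")


# Option 1: collect once, then loop as often as needed (costs memory).
items = list(create_iterator())
first = [item for item in items]
second = [item for item in items]

# Option 2: build a new iterator for each pass (repeats the iteration work).
fresh_first = list(create_iterator())
fresh_second = list(create_iterator())

# Option 3: rewind a seekable stream. io.StringIO is an assumed stand-in
# for a file opened with open(...).
f = io.StringIO("x\ny\n")
pass_one = [line for line in f]
f.seek(0)  # Move the read position back to the start.
pass_two = [line for line in f]
```

In each case both passes see the same items. Which option fits best depends on whether memory, repeated work, or the ability to rewind is the limiting factor.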
Philosophy of an Iterator
The world is divided into two categories:
- Iterable: A for-loopable data structure that holds data. Examples:
list, tuple, str.
- Iterator: A pointer to some element of an iterable, which advances each time the next element is requested.
If we were to define a sequence iterator, it might look something like this:
from collections.abc import Sequence

class SequenceIterator:
    index: int
    items: Sequence  # Sequences can be randomly indexed via items[index].

    def __next__(self):
        """Increment index, and return the latest item."""
The important thing here is that typically, an iterator does not store any actual data inside itself.
Iterators usually model a temporary "stream" of data. That data source is consumed by the process of iteration. This is a good hint as to why one cannot loop over an arbitrary source of data more than once. We need to open a new temporary stream of data (i.e. create a new iterator) to do that.
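A small sketch of that distinction, using only the built-in iter() and next(). The variable names are made up for illustration. It shows that each iterator is an independent cursor and that consuming one leaves the underlying data untouched.

```python
letters = ["A", "B", "C"]    # Iterable: holds the data.

it1 = iter(letters)          # Iterator: a cursor into letters.
it2 = iter(letters)          # A second, independent cursor.

a = next(it1)                # "A"
b = next(it1)                # "B"
c = next(it2)                # "A": it2 is unaffected by it1.

rest = list(it1)             # Consumes what is left of it1: ["C"].
still_there = list(letters)  # The list itself is unchanged.
```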
Exhausting an Iterator
What happens when we extract items from an iterator, starting with the current element of the iterator, and continuing until it is entirely exhausted? That's what a for loop does:
iterable = "ABC"
iterator = iter(iterable)
for item in iterator:
    print(item)
Let's support this functionality in SequenceIterator by telling the for loop how to extract the next item:
class SequenceIterator:
    def __next__(self):
        item = self.items[self.index]
        self.index += 1
        return item
Hold on. What if index goes past the last element of items? We should raise a safe exception for that:
class SequenceIterator:
    def __next__(self):
        try:
            item = self.items[self.index]
        except IndexError:
            raise StopIteration  # Safely says, "no more items in iterator!"
        self.index += 1
        return item
Now, the for loop knows when to stop extracting items from the iterator.
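Putting the pieces together, a runnable version might look like the sketch below. The __init__ and __iter__ methods are additions that the snippets above leave out: the constructor is assumed, and __iter__ is needed because a for loop calls iter() on its target before asking for items. The while loop is a rough hand-written equivalent of what the for loop does.

```python
class SequenceIterator:
    def __init__(self, items):
        # Assumed constructor: remember the sequence and start at the front.
        self.items = items
        self.index = 0

    def __iter__(self):
        # A for loop calls iter() on its target; an iterator returns itself.
        return self

    def __next__(self):
        try:
            item = self.items[self.index]
        except IndexError:
            raise StopIteration from None  # No more items.
        self.index += 1
        return item


# Using the iterator in a for loop.
collected = []
for item in SequenceIterator("ABC"):
    collected.append(item)

# Roughly what the for loop does under the hood.
manual = []
iterator = SequenceIterator("ABC")
while True:
    try:
        item = next(iterator)
    except StopIteration:
        break  # The loop ends quietly instead of propagating the exception.
    manual.append(item)
```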
What happens if we now try to loop over the iterator again?
iterable = "ABC"
iterator = iter(iterable)
# iterator.index == 0
for item in iterator:
    print(item)
# iterator.index == 3
for item in iterator:
    print(item)
# iterator.index == 3
The second loop starts from the current iterator.index, which is 3. There is nothing left to print, so iterator.__next__ raises StopIteration on the very first call and the loop ends immediately.
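The same behaviour can be observed with the built-in string iterator. Note that the real iterator returned by iter("ABC") does not expose an index attribute, so this sketch checks what each pass yields instead. The variable names are illustrative.

```python
iterator = iter("ABC")

first_pass = [item for item in iterator]   # Consumes every item.
second_pass = [item for item in iterator]  # Nothing left to yield.

# next() with a default returns it instead of raising StopIteration.
sentinel = next(iterator, None)
```

Here first_pass holds all three letters, second_pass is empty, and sentinel is None because the iterator stays exhausted.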