63

Why can't I iterate twice over the same iterator?

# data is an iterator.

for row in data:
    print("doing this one time")

for row in data:
    print("doing this two times")

This prints "doing this one time" a few times, since data is non-empty. However, it does not print "doing this two times". Why does iterating over data work the first time, but not the second time?

Trilarion
JSchwartz
  • 11
    Iterable vs. iterator. – Ignacio Vazquez-Abrams Aug 16 '14 at 03:45
  • I'm not saying that this is a duplicate, but you might also want to refer to https://stackoverflow.com/questions/9884132/understanding-pythons-iterator-iterable-and-iteration-protocols-what-exact for some more context / explanation – Nick Meyer Aug 16 '14 at 04:07
  • Related: [Resetting an iterator object](https://stackoverflow.com/questions/1271320/resetting-generator-object-in-python) – Aran-Fey Jun 13 '18 at 16:28
  • 2
    The code presented in this question is not the shortest possible to recreate the problem. The question could be improved by presenting a better code example. – Trilarion Feb 13 '22 at 12:01
  • @Trilarion Yes, I think the `def _view(self,dbName): db = self.dictDatabases[dbName] data = db[3]` can be removed safely since no other answer discusses that portion of the code. – Mateen Ulhaq Feb 15 '22 at 00:59
  • @MateenUlhaq Thanks for the improvement. I despair a bit at the question because as a debugging question it never showed runnable code and as a knowledge question (already knowing that it's an iterator) it doesn't show any research, yet it got so many upvotes. Added a bit of research because that is what a good question would have done. – Trilarion Feb 15 '22 at 05:03
  • I think there's an unanswered question here, one that can trip up novices: "How can I tell if my data is an iterator or just iterable?" For example, why can I go through this list twice, but not through this file twice? – AShelly Feb 15 '22 at 22:42

4 Answers

48

It's because data is an iterator, and you can consume an iterator only once. For example:

lst = [1, 2, 3]
it = iter(lst)

next(it)  # => 1
next(it)  # => 2
next(it)  # => 3
next(it)  # raises StopIteration

A for loop calls next() on the iterator behind the scenes and exits when StopIteration is raised, which is how the first loop ends. If we try to iterate over it again, next() raises StopIteration right away because the iterator has already been consumed, so the loop body never runs.
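
As a rough sketch of what a for loop does with an iterator, the loop can be written out by hand. The helper name manual_for is made up for this illustration and is not part of Python:

```python
def manual_for(iterable, body):
    """Hypothetical helper: roughly what `for item in iterable: body(item)` does."""
    iterator = iter(iterable)  # for an iterator, iter() returns the same object
    while True:
        try:
            item = next(iterator)
        except StopIteration:
            break  # the loop ends quietly when the iterator is exhausted
        body(item)


data = iter([1, 2, 3])
first, second = [], []

manual_for(data, first.append)   # consumes every element
manual_for(data, second.append)  # StopIteration on the first next() call

print(first)   # [1, 2, 3]
print(second)  # []
```

Because iter(data) hands back the same, already exhausted object, the second pass has nothing left to visit.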


Now for the second question: What if we do need to traverse the iterator more than once? A simple solution would be to save all the elements to a list, which can be traversed as many times as needed. For instance, if data is an iterator:

data = list(data)

That is alright as long as there are few elements in the list. However, if there are many elements, it's a better idea to create independent iterators using tee():

import itertools
it1, it2 = itertools.tee(data, 2) # create as many as needed

Now we can loop over each one in turn:

for e in it1:
    print("doing this one time")

for e in it2:
    print("doing this two times")
davidsbro
Óscar López
  • 22
    @ÓscarLópez Note from the documentation on `tee`: "This itertool may require significant auxiliary storage (depending on how much temporary data needs to be stored). In general, if one iterator uses most or all of the data before another iterator starts, it is faster to use list() instead of tee()." So if you're using `it1` and `it2` like you are in the example, you might not be getting any real benefit out of `tee` (while probably taking some extra overhead). – svk Aug 16 '14 at 19:39
  • 11
    I support @svk - in this case `tee` will create a full copy of the iterator's values in a slightly less efficient way than a single `list` call. One should use `tee` not because there are a lot of elements in the iterable (this is not relevant), but when there is locality of usage; in that case `tee`'s cache can be smaller than the whole list. For example, if two iterators go neck and neck, like in a `zip(a, islice(b, 1))` call. – shitpoet Sep 08 '20 at 11:32
  • 12
    @user2357112supportsMonica Your edits to this answer are being discussed on [meta](https://meta.stackoverflow.com/questions/416012). – cigien Feb 13 '22 at 02:21
26

Iterators (e.g. from calling iter, from generator expressions, or from generator functions which yield) are stateful and can only be consumed once, as explained in Óscar López's answer. However, that answer's recommendation to use itertools.tee(data) instead of list(data) for performance reasons is misleading.

In most cases, where you want to iterate through the whole of data and then iterate through the whole of it again, tee takes more time and uses more memory than simply consuming the whole iterator into a list and then iterating over it twice. tee may be preferred if you will only consume the first few elements of each iterator, or if you will alternate between consuming a few elements from one iterator and then a few from the other.
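
A small sketch of the two situations, assuming a made-up generator function numbers() as the data source. For two full passes, a list is the straightforward choice; tee fits when the two iterators advance in lockstep, so that its internal buffer stays small:

```python
import itertools


def numbers():
    """Hypothetical data source for this example."""
    yield from range(5)


# Two full passes over the data: collect once, then reuse the list.
items = list(numbers())
total = sum(items)
largest = max(items)

# Lockstep consumption: b stays exactly one element ahead of a,
# so tee only has to buffer a single element at a time.
a, b = itertools.tee(numbers())
next(b, None)  # advance b by one element
pairs = list(zip(a, b))

print(total, largest)  # 10 4
print(pairs)           # [(0, 1), (1, 2), (2, 3), (3, 4)]
```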

kaya3
  • 1
    should probably link to just the hash (just `#25336738`) since requiring a reload of the entire page is awful – somebody Feb 14 '22 at 04:18
  • @somebody I don't think that's possible - at least, I tried, and Stack Overflow's editor didn't make it into a link. If you know how to do it then I suggest you make the edit. – kaya3 Feb 14 '22 at 06:33
  • :( unfortunately it doesn't seem to be possible (see [this feature request](https://meta.stackexchange.com/questions/37894/support-anchor-names-in-posts)) - seems like the next best thing is to assume users clicked on the question (and not one of the answers): [link](https://stackoverflow.com/questions/25336726/why-cant-i-iterate-twice-over-the-same-data#25336738) – somebody Feb 14 '22 at 08:32
13

Once an iterator is exhausted, it will not yield any more items.

>>> it = iter([3, 1, 2])
>>> for x in it: print(x)
...
3
1
2
>>> for x in it: print(x)
...
>>>
falsetru
  • 3
    that makes sense, but how do I get around it? – JSchwartz Aug 16 '14 at 03:55
  • @JSchwartz, Convert the iterator into a sequence object (`list`, `tuple`), then iterate over the sequence object. (Only if the size of the csv is not huge.) – falsetru Aug 16 '14 at 03:56
  • 3
    @JSchwartz, Alternatively, if you can access the underlying file object and it is seekable, you can change the file position before the second loop: `csv_file_object.seek(0)` – falsetru Aug 16 '14 at 03:56
3

How to loop over an iterator twice?

It is impossible! (Explained later.) Instead, do one of the following:

  • Collect the iterator into something that can be looped over multiple times.

    items = list(iterator)
    
    for item in items:
        ...
    

    Downside: This costs memory.

  • Create a new iterator. It usually takes only a microsecond to make a new iterator.

    for item in create_iterator():
        ...
    
    for item in create_iterator():
        ...
    

    Downside: Iteration itself may be expensive (e.g. reading from disk or network).

  • Reset the "iterator". For example, with file iterators:

    with open(...) as f:
        for item in f:
            ...
    
        f.seek(0)
    
        for item in f:
            ...
    

    Downside: Most iterators cannot be "reset".
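
The three options can be tried side by side in a runnable sketch. Here create_iterator is a hypothetical factory, and io.StringIO stands in for a real seekable file; both are assumptions made for illustration:

```python
import io


def create_iterator():
    """Hypothetical factory that returns a fresh iterator on every call."""
    return (n * n for n in range(3))


# Option 1: collect into a list, then loop as often as needed.
items = list(create_iterator())
list_pass_1 = [item for item in items]
list_pass_2 = [item for item in items]

# Option 2: create a new iterator for each loop.
fresh_pass_1 = [item for item in create_iterator()]
fresh_pass_2 = [item for item in create_iterator()]

# Option 3: rewind a seekable, file-like object between loops.
f = io.StringIO("a\nb\n")
file_pass_1 = [line.strip() for line in f]
f.seek(0)  # move back to the start of the stream
file_pass_2 = [line.strip() for line in f]

print(list_pass_2)   # [0, 1, 4]
print(fresh_pass_2)  # [0, 1, 4]
print(file_pass_2)   # ['a', 'b']
```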


Philosophy of an Iterator

The world is divided into two categories:

  • Iterable: A for-loopable data structure that holds data. Examples: list, tuple, str.
  • Iterator: A pointer to some element of an iterable.
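
One way to tell the two apart at runtime is to check against the abstract base classes in collections.abc. This is a sketch; the describe helper is made up for the example, and the check misses objects that are iterable only through __getitem__:

```python
from collections.abc import Iterable, Iterator


def describe(obj):
    """Hypothetical helper: classify obj as 'iterator', 'iterable' or 'neither'."""
    if isinstance(obj, Iterator):  # has both __iter__ and __next__
        return "iterator"
    if isinstance(obj, Iterable):  # has __iter__ only
        return "iterable"
    return "neither"


print(describe([1, 2, 3]))          # iterable
print(describe(iter([1, 2, 3])))    # iterator
print(describe(x for x in "ab"))    # iterator
print(describe(42))                 # neither

# An iterator returns itself from iter(); a list returns a new iterator each time.
it = iter([1, 2, 3])
print(iter(it) is it)               # True
```

Note that every iterator is technically also iterable (it has __iter__), which is why the Iterator check has to come first.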

If we were to define a sequence iterator, it might look something like this:

from collections.abc import Sequence

class SequenceIterator:
    index: int
    items: Sequence  # Sequences can be randomly indexed via items[index].

    def __next__(self):
        """Increment index, and return the latest item."""

The important thing here is that typically, an iterator does not store any actual data inside itself.

Iterators usually model a temporary "stream" of data. That data source is consumed by the process of iteration. This is a good hint as to why one cannot loop over an arbitrary source of data more than once. We need to open a new temporary stream of data (i.e. create a new iterator) to do that.

Exhausting an Iterator

What happens when we extract items from an iterator, starting with the current element of the iterator, and continuing until it is entirely exhausted? That's what a for loop does:

iterable = "ABC"
iterator = iter(iterable)

for item in iterator:
    print(item)

Let's support this functionality in SequenceIterator by telling the for loop how to extract the next item:

class SequenceIterator:
    def __next__(self):
        item = self.items[self.index]
        self.index += 1
        return item

Hold on. What if index goes past the last element of items? We should raise a safe exception for that:

class SequenceIterator:
    def __next__(self):
        try:
            item = self.items[self.index]
        except IndexError:
            raise StopIteration  # Safely says, "no more items in iterator!"
        self.index += 1
        return item

Now, the for loop knows when to stop extracting items from the iterator.

What happens if we now try to loop over the iterator again?

iterable = "ABC"
iterator = iter(iterable)

# iterator.index == 0

for item in iterator:
    print(item)

# iterator.index == 3

for item in iterator:
    print(item)

# iterator.index == 3

Since the second loop starts from the current iterator.index, which is 3, it does not have anything else to print and so iterator.__next__ raises the StopIteration exception, causing the loop to end immediately.
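
Putting the pieces together, here is a complete, runnable version of the SequenceIterator sketch. The __init__ and __iter__ methods are additions needed to make it work in a for loop; Python's built-in iterators are implemented differently and do not expose an index attribute:

```python
class SequenceIterator:
    """Illustrative iterator over any object that supports items[index]."""

    def __init__(self, items):
        self.items = items
        self.index = 0

    def __iter__(self):
        # An iterator returns itself, so a for loop can use it directly.
        return self

    def __next__(self):
        try:
            item = self.items[self.index]
        except IndexError:
            raise StopIteration from None  # no more items
        self.index += 1
        return item


iterator = SequenceIterator("ABC")

first = list(iterator)   # consumes the iterator: index goes from 0 to 3
second = list(iterator)  # index is still 3, so nothing is produced

print(first)           # ['A', 'B', 'C']
print(second)          # []
print(iterator.index)  # 3

# A new iterator over the same data starts again from index 0.
print(list(SequenceIterator("ABC")))  # ['A', 'B', 'C']
```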

Mateen Ulhaq