What's the difference between iterator syntax in range-based loops for STL containers

Question

Although I'm more than knowledgeable about C and OOP in Java, I'm starting to dive into C++ and its particularities. I've read all the basic things about C++, but I'm still confused by some C++11-specific things, both syntax- and performance-wise. One such thing is container iterators, which I have found used in a myriad of syntax forms (e.g. range-based loops).

I was wondering which of these are completely equivalent, why would one want to use one or another and what are the effects in performance.

a) auto vs explicit declaration:

Is the auto always supported? Aside from code readability issues, why would a programmer prefer the explicit declaration?

list<int>::const_iterator i = myIntList.begin();    /* Option a1 */
auto i = myIntList.begin();                         /* Option a2 */

for(auto i : myIntList) { ... }                     /* Option a3 */
for(int i : myIntList) { ... }                      /* Option a4 */

b) Compact form vs extended loop form

list<int> l = {1, 2, 3, ...};
for(auto i : l) { ... }                             /* Option b1 */
for(auto i = l.begin(); i != l.end(); ++i) { ... }  /* Option b2 */

c) Constant, non-constant / Access type

Why/when would one prefer to have a reference or a constant in the loop body?

/* Constant/non-constant: */
for(list<int>::iterator i = l.begin(); ...) { ... }   /* Option c1 */
for(list<int>::const_iterator i = l.begin(); ...) { } /* Option c2 */

for(const int& i : list) { ... }                    /* Option c3 */
for(int& i : list) { ... }                          /* Option c4 */

/* Access by reference/by value: */
for(auto&& i : list) { ... }                        /* Option c5 */
for(auto i : list) { ... }                          /* Option c6 */

d) Loop's exit condition:

/* Option d1: end is defined within the start condition or outside the loop. */
for(auto i = l.begin(), end = l.end(); i != end; ++i) { ... }

/* Option d2: end is defined in the continue condition. */
for(auto i = l.begin(); i != l.end(); ++i) { ... }

Perhaps most of them are identical, and perhaps the choice of one option or another only makes sense for a given loop body, but I wonder what's the purpose of allowing so many possible ways of programming the same behaviour.

I'd separate constant/non-constant from reference/value cases. — Stan, Jul 13 '16 at 10:57
Also iterating over elements vs iterators. I think you are asking too many things here. — juanchopanza, Jul 13 '16 at 10:58
`I wonder what's the purpose of allowing so many possible ways of programming the same behaviour.` because the language has to remain compatible with older versions while adding improved syntax. it's that simple. everything else you asked has been discussed to death elsewhere. flagging as too broad. — underscore_d, Jul 13 '16 at 11:03
@underscore_d Actually, most of the "possible ways" actually do different stuff. — juanchopanza, Jul 13 '16 at 11:23
@juanchopanza true, I meant that range-`for` is often a shorthand for various things that can be specified using explicitly iterator-based syntax, probably oversimplified though. I still maintain that this 'question' is actually many questions, all of which are amply answered elsewhere, whether that's on SO or simply in a good C++ reference. — underscore_d, Jul 13 '16 at 11:25

eerorika · Answer 1 · 2016-07-14T09:36:51.627

Is the auto always supported?

Only since C++11, but the same is true of range-based loops, so if you can rely on one, then you can rely on the other.

Aside from code readability issues, why would a programmer prefer the explicit declaration?

With an explicit type, you can have the initializer implicitly converted to another type.
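A minimal sketch of that point, assuming two made-up helper functions (`sum_of_halves` and `sum_of_halves_auto` are not from the question): naming the loop variable's type converts each element on the way in, while `auto` keeps the element type as it is.

```cpp
#include <list>

// Hypothetical helper: adds up half of each element.
// Declaring the loop variable as double converts every int implicitly.
double sum_of_halves(const std::list<int>& values) {
    double total = 0.0;
    for (double v : values) {   // int -> double conversion per element
        total += v / 2;         // floating-point division
    }
    return total;
}

// Same loop with auto: v is deduced as int, so v / 2 truncates.
double sum_of_halves_auto(const std::list<int>& values) {
    double total = 0.0;
    for (auto v : values) {
        total += v / 2;         // integer division
    }
    return total;
}
```

For the list {1, 3}, the first function adds 0.5 and 1.5, the second adds 0 and 1.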

Compact form vs extended loop form

You can do more with explicit use of iterators (what you call "extended"), than you can with a range based loop (what you call "compact"). But, if you simply want to iterate the range of elements once, then range based loop has much simpler syntax. This is why it was introduced to the language.
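As one illustration of "more", here is a sketch of a loop that needs two positions at once. The helper name `count_adjacent_equal` is invented for this example; a range-based loop only hands you the current element, so it cannot express this directly.

```cpp
#include <iterator>
#include <list>

// Hypothetical helper: counts positions where an element equals its successor.
// The loop tracks both the current and the following iterator.
int count_adjacent_equal(const std::list<int>& values) {
    int count = 0;
    if (values.empty()) {
        return count;
    }
    for (auto it = values.begin(), next = std::next(it);
         next != values.end(); ++it, ++next) {
        if (*it == *next) {
            ++count;
        }
    }
    return count;
}
```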

Why/when would one prefer to have a reference [...] in the loop body?

When one cannot copy, or wants to avoid copying of the iterated elements.
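A small sketch of the difference, with made-up function names: a reference loop variable aliases the stored element, while a by-value loop variable is a fresh copy each iteration.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: the reference aliases each stored string,
// so the container itself is modified and no copy is made.
void append_bang(std::vector<std::string>& words) {
    for (std::string& w : words) {
        w += '!';
    }
}

// Hypothetical helper: w is a copy of each element,
// so the change is lost at the end of every iteration.
void append_bang_to_copies(std::vector<std::string>& words) {
    for (std::string w : words) {
        w += '!';   // modifies only the local copy
    }
}
```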

Why/when would one prefer to have a [...] a constant in the loop body?

When one only has const access to the range, or one wants to express that they do not intend to modify the object.

Loop's exit condition

If the end iterator is invalidated within the loop, then only d2 is correct.

If the end iterator is invariant, then d1 can be slightly more efficient due to bringing a function call out of the loop.

If the compiler can see the definition of T::end(), then it may optimize by transforming d2 as written, into d1.

In any case, the overhead of a single function call is often negligible, unless the loop body itself is trivial.
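To make the two cases concrete, here is a sketch using `std::vector` (the question uses `std::list`; the vector and both function names are assumptions made for the example). `vector::erase` invalidates the end iterator, so the first loop re-reads `end()` on every pass as in d2. The second loop never modifies the container, so caching the end as in d1 is safe.

```cpp
#include <vector>

// Hypothetical helper: erases negative values in place.
// end() must be re-evaluated because erase() invalidates the old end iterator.
void drop_negative(std::vector<int>& values) {
    for (auto it = values.begin(); it != values.end(); /* advanced in body */) {
        if (*it < 0) {
            it = values.erase(it);  // erase returns the iterator after the removed element
        } else {
            ++it;
        }
    }
}

// Hypothetical helper: read-only loop, so the cached end stays valid.
long sum_cached_end(const std::vector<int>& values) {
    long total = 0;
    for (auto it = values.begin(), end = values.end(); it != end; ++it) {
        total += *it;
    }
    return total;
}
```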

I wonder what's the purpose of allowing so many possible ways of programming the same behaviour.

It is true that all loop structures could be implemented using goto.

So, why would C++ allow any other loop structures? for, while etc. were introduced to make programs easier to understand and more readable. OK, then why not get rid of goto, when for is easier to understand? That is because for cannot do all that goto can. goto is more general.

Same reasoning holds for this new range based loop. It is easier to reason about than the more general loop structures, and therefore a useful addition. But the general structures still have their place for uses that are not possible with the range based loop. Besides, removing the general for structure would break backwards compatibility of the language, which would be undesirable.

score 3 · Accepted Answer · answered Jul 13 '16 at 11:18

All of the forms are alternatives, and each has advantages and disadvantages compared with others.

1) Explicit declaration of loop variables is preferred when you want a type different than auto might deduce. It is also mandatory if it is necessary to maintain compatibility with C++ implementations predating C++11 (yes, there are practical real-world cases where that is necessary - there is a cost in changing compiler, just as there is a cost in maintaining an old one).

2) The "compact" form (more correctly, a range based loop) is unsuitable if the requirement is anything other than iterating sequentially over all elements in a range. For example, if looping over every second element, or if the loop body resizes the container for some reason (which can invalidate iterators).
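For instance, visiting every second element is straightforward with an index-based loop. This is only a sketch: the helper name is invented, and a `std::vector` is assumed so that indexing is available.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: sums the elements at indices 0, 2, 4, ...
// The step of 2 has no counterpart in a range-based loop.
long sum_every_second(const std::vector<int>& values) {
    long total = 0;
    for (std::size_t i = 0; i < values.size(); i += 2) {
        total += values[i];
    }
    return total;
}
```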

3) A const qualifier signals the intent that the loop will not change elements of the container. This can be very useful for getting the compiler to diagnose issues where the loop (potentially) makes an unintended change to elements, e.g. by calling a non-const member. Without the const qualifier, such problems can become bugs that are very difficult to track down.
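A sketch of that diagnosis, using a made-up `Counter` type: with a const reference as the loop variable, the commented-out call is a compile error rather than a silent modification.

```cpp
#include <vector>

// Hypothetical type for illustration only.
struct Counter {
    int hits = 0;
    int read() const { return hits; }   // const member: callable through const&
    void bump() { ++hits; }             // non-const member: rejected through const&
};

int total_hits(const std::vector<Counter>& counters) {
    int total = 0;
    for (const Counter& c : counters) {
        total += c.read();
        // c.bump();   // would not compile: bump() is not a const member
    }
    return total;
}
```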

4) Defining the end condition within the start condition can result in undefined behaviour if the loop body resizes the container (since that may invalidate the cached end iterator). Recalculating the end condition on every loop iteration can prevent such issues.

The purpose of having so many distinct ways of writing loops is programmer convenience. Depending on what the programmer is attempting, different techniques may be appropriate.

The trade-off is that sometimes it is difficult to decide on the most "suitable" form of loop.

score 2 · Answer 3 · answered Jul 13 '16 at 11:07

Range based for(:) loops are, unusually in C++, defined in terms of for(;;) loops. Prior to C++17 it was basically identical to option d1, with the note that the iterator variables are invisible in client code.
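Roughly, the C++11 expansion looks like the second function below. This is a sketch: the names `range`, `first`, and `last` are placeholders for the hidden variables, which user code cannot actually name.

```cpp
#include <list>

// Range-based form.
int sum_range_for(const std::list<int>& values) {
    int total = 0;
    for (int v : values) {
        total += v;
    }
    return total;
}

// Approximately what the loop above is defined to mean before C++17.
int sum_expanded(const std::list<int>& values) {
    int total = 0;
    {
        auto&& range = values;                        // the range expression is bound once
        for (auto first = range.begin(), last = range.end();
             first != last; ++first) {                // end is cached, as in option d1
            int v = *first;                           // the loop variable is initialized from *first
            total += v;
        }
    }
    return total;
}
```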

Note that range-based loops iterate over elements, not over valid iterators. You can make a range-based loop over iterators, but it requires some glue code.
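One possible shape for such glue code is sketched below. `IteratorRange`, `Cursor`, and `first_negative` are all invented names, and this is only one of several ways to write the adaptor: its iterator dereferences to the underlying iterator instead of to the element.

```cpp
#include <list>

// Hypothetical adaptor: looping over it yields iterators, not elements.
template <class Iterator>
class IteratorRange {
public:
    class Cursor {
    public:
        explicit Cursor(Iterator pos) : pos_(pos) {}
        Iterator operator*() const { return pos_; }   // yield the iterator itself
        Cursor& operator++() { ++pos_; return *this; }
        bool operator!=(const Cursor& other) const { return pos_ != other.pos_; }
    private:
        Iterator pos_;
    };

    IteratorRange(Iterator first, Iterator last) : first_(first), last_(last) {}
    Cursor begin() const { return Cursor(first_); }
    Cursor end() const { return Cursor(last_); }

private:
    Iterator first_;
    Iterator last_;
};

// Hypothetical helper: returns an iterator to the first negative value, or end().
std::list<int>::iterator first_negative(std::list<int>& values) {
    for (auto it : IteratorRange<std::list<int>::iterator>(values.begin(), values.end())) {
        if (*it < 0) {
            return it;
        }
    }
    return values.end();
}
```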

Since C++17, range based for(:) permits end to have a different type than begin. This is useful for the sentinel technique, but is not that important a difference at this point.

auto always works like it does. Barring expression template fanciness, it always works in ways one can easily understand. Using it can make types a touch harder to work out, but sometimes you do not care, and sometimes you are just needlessly repeating the type. Not using it can lead to surprising type mismatches if you screw up.

A reference is an alias. A value is a copy. If you want to iterate over copies, you can. If you want to iterate over aliases to the content of the container, you can.

Similarly, you can have a const or non-const view of the contents of the container. You get to pick, depending on what you need or want.

Biasing towards const and values can make code a touch easier to decode for programmers and compilers. However, needless copies of complex structures can be expensive, and const can inhibit move and implicit move and other mutation-for-efficiency tricks.

If you cache end (d1 vs d2) it can sometimes be slightly faster. But the difference is usually not noticeable, and it is noise. In theory a non-cached end could work better if the container changes, but changing the container while iterating is usually madness and requires modification of the advance clause as well as the termination. The for(:) loop caches end, because the noise argument is gone.

auto&& deduces a forwarding reference for the variable: it can be an lvalue reference, or an rvalue reference-to-temporary. It means "I don't care, don't copy anything but let me work with it". Reference lifetime extension will usually make dangling references not a problem, so long as the source of the data you are binding does not screw up.
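A sketch of where this matters, with made-up function names: `std::vector<bool>` hands out proxy objects rather than real references, so `auto&` does not compile there, while `auto&&` binds to the proxy and still lets you write through it.

```cpp
#include <vector>

// Hypothetical helper: flips every flag in place.
// Dereferencing a vector<bool> iterator yields a temporary proxy object;
// auto&& binds to it as an rvalue reference.
void flip_all(std::vector<bool>& flags) {
    for (auto&& f : flags) {
        f = !f;
    }
}

// Hypothetical helper: with an ordinary container, auto&& deduces int& here.
void double_all(std::vector<int>& values) {
    for (auto&& v : values) {
        v *= 2;
    }
}
```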

Another big thing you are missing is non-member begin. That permits access to C-style arrays as if they were ranges, and if done in the right way lets you easily add range support to third party range-likes so for(:) works on them too.
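Both uses can be sketched briefly. The `vendor::Buffer` type below is invented to stand in for a third-party class that has no begin()/end() members; the free functions placed in its namespace are found by argument-dependent lookup.

```cpp
#include <cstddef>
#include <iterator>
#include <numeric>

// Hypothetical helper: std::begin/std::end work on C-style arrays.
template <std::size_t N>
int sum_array(const int (&values)[N]) {
    return std::accumulate(std::begin(values), std::end(values), 0);
}

namespace vendor {
// Hypothetical third-party type without begin()/end() members.
struct Buffer {
    int data[4];
    int used;   // number of valid entries in data
};

// Free functions in the type's namespace give it range support.
inline const int* begin(const Buffer& b) { return b.data; }
inline const int* end(const Buffer& b) { return b.data + b.used; }
}  // namespace vendor

// The range-based loop picks up vendor::begin and vendor::end.
int sum_buffer(const vendor::Buffer& b) {
    int total = 0;
    for (int v : b) {
        total += v;
    }
    return total;
}
```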

underscore_d · Answer 4 · 2016-07-13T11:14:18.277

[a] Is the auto always supported? Aside from code readability issues, why would a programmer prefer the explicit declaration?

Almost always yes. A programmer would prefer explicit typing if they wanted to perform a conversion from the RHS type to a non-identical type on the LHS... or if they just don't like implicit stuff.

b) Compact form vs extended loop form

Is there a question here? The difference is, of course, that you don't have to dereference an iterator in range-for.

If your algorithm requires you to do stuff with iterators and/or operations that invalidate them (e.g. insertion, erasure), then you need iterators, so use the 'old'/'manual' syntax.

But if you don't, then range-for avoids the hassle of having to dereference the iterator by doing that for you.

[c] Why/when would one prefer to have a reference or a constant in the loop body?

The choice between a reference or constant, like anywhere else - notably including the almost identical choice for function arguments - depends on:

- whether you want to alter the element, hence &
- even if not, whether it would be expensive to copy by value into the loop variable on each iteration, hence const &

d) Loop's exit condition:

The 1st variant (d1) caches the end() iterator, which is:

- possibly faster, but check compiled output to verify
- no use if any operation within the loop invalidates the end() iterator

score 1 · Answer 5 · answered Jul 13 '16 at 11:21

a) auto vs explicit declaration:

auto is always supported because all types are known at compile time.

The most important reason to declare the type explicitly is code readability, as it makes it easier to figure out the type when reading the code later. In addition, explicitly declaring the type allows you to convert the item to another type if there is an implicit conversion.

b) Compact form vs extended loop form:

Both are completely equivalent. You should use the compact form when possible.

c) Constant, non-constant / Access type:

The reason why you would want to use const references instead of mutable references is const correctness, a protection mechanism that keeps you from mutating objects when you shouldn't. In particular, if you only have a const reference to the container itself, you can only use const references, or copies of the values, in the loop body.

A common piece of advice is to use const whenever possible, at least with method parameters. If a method receives a const reference to an object, you can rest assured that the method won't mutate the object (unless there is a const_cast somewhere).

Accessing the items by value is the shortest option, works even with const containers, and doesn't have any performance penalty if the item type is a small scalar type (bool, int, double etc.) It's what you'd want to do by default.

You'll need to access the items by reference if they are of a noncopyable type or if you want to mutate them. In addition, if they're of a large class type, accessing the items by reference is faster.
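A sketch of the noncopyable case, assuming a vector of `std::unique_ptr<int>` and an invented helper name: the by-value form is rejected by the compiler, so a reference is the only option.

```cpp
#include <memory>
#include <vector>

// Hypothetical helper: sums the ints owned by the pointers.
int sum_owned(const std::vector<std::unique_ptr<int>>& items) {
    int total = 0;
    // for (auto p : items) { ... }   // would not compile: unique_ptr cannot be copied
    for (const auto& p : items) {
        total += *p;
    }
    return total;
}
```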

d) Loop's exit condition:

Option d1 only calls l.end() once, which is slightly faster in theory. However, in practice end() is an extremely fast method. Option d2 is almost always preferred because it's shorter.

The purpose of allowing many ways

Consider loops, for example. C++ evolved from C, and it supports C-like loops so that it is familiar to C programmers and backwards compatible with existing C code.

The range-based for loop is just syntactic sugar over the old looping method. There is no reason to disallow the old looping syntax: besides, disallowing it would be quite problematic, because the language constructs used in pre-C++11 loops are still useful elsewhere. The old for syntax is useful for things like looping over numbers:

for (int i = 0; i < 5; ++i)
    std::cout << i;

Likewise, the begin() and end() methods return iterators which can be e.g. passed to standard algorithms in <algorithm>.
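For example, a membership test can hand the iterator pair straight to `std::find`. This is a minimal sketch; the wrapper name `contains` is made up.

```cpp
#include <algorithm>
#include <list>

// Hypothetical helper: true if wanted occurs anywhere in values.
bool contains(const std::list<int>& values, int wanted) {
    return std::find(values.begin(), values.end(), wanted) != values.end();
}
```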

Really, the C++ committee doesn't have any reasonable way to prevent old-style loops without also removing useful features in the language or making really weird special cases (like disallowing begin() or end() from being called in a for statement). They don't have any reason to do that, either.

The other cases are similar: there are multiple ways to do the same thing because that's just how language features interact, and there is no reason to attempt to prevent that situation.

Nice answer, but I would add that if iterators are required for manipulation or can be invalidated by operations in the loop body, a range-`for` almost certainly won't cut it. This is probably the key reason that, as you briefly indicated, the 'old' method of looping still has very valid use-cases for less simplistic loops. — underscore_d, Jul 13 '16 at 11:36
