This is a great article on low-level optimization techniques; it shows an example where the author converts expensive divisions into cheap comparisons: https://www.facebook.com/notes/facebook-engineering/three-optimization-tips-for-c/10151361643253920
For those who don't want to click, essentially he converted this:
uint32_t digits10(uint64_t v) {
    uint32_t result = 0;
    do {
        ++result;
        v /= 10;
    } while (v);
    return result;
}
Into this:
uint32_t digits10(uint64_t v) {
    uint32_t result = 1;
    for (;;) {
        if (v < 10) return result;
        if (v < 100) return result + 1;
        if (v < 1000) return result + 2;
        if (v < 10000) return result + 3;
        // Skip ahead by 4 orders of magnitude
        v /= 10000U;
        result += 4;
    }
}
This resulted in a speedup of up to 6x.
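Since the golden rule is to measure, here is the kind of minimal timing harness I would use to check that claim myself. This is my own sketch, not the author's benchmark: the names digits10_div and digits10_cmp are mine, and the xorshift generator is only there to keep the inputs from being compile-time constants. Compile with optimizations (e.g. g++ -O2) to get representative numbers.

#include <chrono>
#include <cstdint>
#include <cstdio>

// Loop-with-division version (from the article).
uint32_t digits10_div(uint64_t v) {
    uint32_t result = 0;
    do {
        ++result;
        v /= 10;
    } while (v);
    return result;
}

// Comparison-based version (from the article).
uint32_t digits10_cmp(uint64_t v) {
    uint32_t result = 1;
    for (;;) {
        if (v < 10) return result;
        if (v < 100) return result + 1;
        if (v < 1000) return result + 2;
        if (v < 10000) return result + 3;
        v /= 10000U;
        result += 4;
    }
}

int main() {
    using clock = std::chrono::steady_clock;
    using ms = std::chrono::milliseconds;
    const int iterations = 10000000;

    // xorshift64 pseudo-random state, so inputs are not constants.
    uint64_t x = 88172645463325252ULL;
    uint64_t sink = 0;  // Accumulate results so the calls are not optimized away.

    auto t0 = clock::now();
    for (int i = 0; i < iterations; ++i) {
        x ^= x << 13; x ^= x >> 7; x ^= x << 17;
        sink += digits10_div(x);
    }
    auto t1 = clock::now();
    for (int i = 0; i < iterations; ++i) {
        x ^= x << 13; x ^= x >> 7; x ^= x << 17;
        sink += digits10_cmp(x);
    }
    auto t2 = clock::now();

    std::printf("div: %lld ms, cmp: %lld ms (sink=%llu)\n",
                (long long)std::chrono::duration_cast<ms>(t1 - t0).count(),
                (long long)std::chrono::duration_cast<ms>(t2 - t1).count(),
                (unsigned long long)sink);
    return 0;
}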
While comparisons themselves are very cheap, I've always heard that branches are expensive because they can cause pipeline stalls. Because of that conventional wisdom about branching, I never would have considered an approach like this.
Why is branching not a bottleneck in this case? Is it because we return right after each of the comparisons? Is it because the code here is small, so there is not much for the processor to mispredict? In what cases would branching become a bottleneck and start to dominate the cost of the divisions? The author never addresses this.
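For context, the kind of case I have in mind where branching clearly does dominate is a data-dependent branch on random input, as in the classic sorted-vs-unsorted benchmark. This is my own contrived sketch, not from the article, and a compiler may turn the branch into a conditional move at high optimization levels, which would hide the effect:

#include <algorithm>
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <random>
#include <vector>

// Sum only the elements >= 128; the `if` is a data-dependent branch.
static long long sum_large(const std::vector<int>& data) {
    long long s = 0;
    for (int x : data)
        if (x >= 128) s += x;
    return s;
}

int main() {
    std::vector<int> data(1 << 24);
    std::mt19937 rng(42);
    for (int& x : data) x = static_cast<int>(rng() & 255u);

    using clock = std::chrono::steady_clock;
    using ms = std::chrono::milliseconds;

    auto t0 = clock::now();
    long long a = sum_large(data);        // Random order: branch mispredicts ~half the time.
    auto t1 = clock::now();

    std::sort(data.begin(), data.end());  // Sorted order: the branch becomes predictable.

    auto t2 = clock::now();
    long long b = sum_large(data);
    auto t3 = clock::now();

    std::printf("random: %lld ms, sorted: %lld ms (sums %lld %lld)\n",
                (long long)std::chrono::duration_cast<ms>(t1 - t0).count(),
                (long long)std::chrono::duration_cast<ms>(t3 - t2).count(),
                a, b);
    return 0;
}

If branches were uniformly expensive, the two runs should take the same time; the sorted run is typically much faster only because its branch becomes predictable. So my tentative intuition is that predictability, not the branch itself, is what matters, and I'd like to confirm that.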
Can anyone resolve the apparent tension between cheap comparisons and expensive branches? Of course, the golden rule of optimization is that one must always measure. Still, it would be good to have some intuition about this issue, so that one could use comparisons intelligently when devising new approaches to making code faster.
Thanks!