3

Raymond Chen has this to say on his recent post on code optimizations... Obvious optimizations - one that begs to be optimized - tend to be "de-optimizations" if you consider all that need to be considered...

I'm sure you must have come across / even coded optimizations you were embarrassed about after you learnt more...

Care to share?

Vyas Bharghava
  • 6,258
  • 5
  • 37
  • 55

3 Answers3

10

Duff's Device, which is so twisted that it looks like it shouldn't even compile in ISO C:

int n = (count + 7) / 8;
switch (count % 8) {
case 0: do { *to = *from++;
case 7:      *to = *from++;
case 6:      *to = *from++;
case 5:      *to = *from++;
case 4:      *to = *from++;
case 3:      *to = *from++;
case 2:      *to = *from++;
case 1:      *to = *from++;
           } while (--n > 0);
}
Matt Campbell
  • 1,142
  • 9
  • 15
6

My favorite example would be the XOR swap algorithm:

// swap these two values:
int x = 4;
int y = 2;
// original:
int temp = x;
x = y;
y = temp;
// optimized version:
x ^= y;
y ^= x;
x ^= y;

Yes, it uses no temporary variable, and can usually be done in three processor cycles, but it sure isn't obvious what it does!

e.James
  • 113,058
  • 40
  • 174
  • 210
  • 2
    This is a particular annoyance to me - people making their code less readable because, OMG, they don't want to add an extra 4 bytes to the stack. What, are these people running on Intel 4004 or RCA 1802 CPUs? – paxdiablo Nov 28 '08 at 01:20
  • I think some compilers nowadays can detect and optimize the first case. – strager Nov 28 '08 at 01:36
  • @strager: and so they should! – e.James Nov 28 '08 at 01:52
  • Doesn't this cause an overflow with some values, too? – Matthew Schinckel Nov 28 '08 at 02:33
  • 5
    This "optimization" adds data dependencies where the original can be dealt with by the hardware via register renaming. In fact, the original can quite possibly take *zero* cycles. – CesarB Nov 29 '08 at 16:56
  • 1
    @Luke: it works with any value. Of course XOR has to treat the values as integers. – Bastien Léonard Apr 16 '09 at 16:17
  • This is now an unoptimization. Causes CPU stalls, and is often slower than the normal solution. – GManNickG Apr 13 '10 at 23:48
  • Why would it cause CPU stalls? I'm not disagreeing with you, I'm genuinely curious about the failure mechanism. At any rate, I would definitely rank this one in the "not so hilarious" category. – e.James Apr 13 '10 at 23:59
  • 4
    Because line 2 cannot execute until line 1 is done, and line 3 cannot execute until line 2 is done. So the CPU will be blasting long then stop and go "1, 2, 3" then keep going. Contrarily, `x = y` and `y = temp` can execute at the same time. – GManNickG Apr 14 '10 at 01:17
  • @GManNickG: I wonder if `x = y` and `y = temp` really can execute at the same time... obviously doing `x = y` before `y = temp` or `y = temp` then `x = y` will yield different results. At the same time ? This implies the underlying hardware would also be able to perform some kind of swap at assembly level. – kriss Jan 24 '13 at 09:02
5

My favourite is

// original code
int a[10];
a[5] = 3;

// optimized code
int a[10];
*(a + 5) = 3;

Yes, all of a sudden, that's magically faster!!</sarcasm>

Johannes Schaub - litb
  • 481,675
  • 123
  • 870
  • 1,191
  • Actually, it should be "int a[10]; a += 5; *(a+N) = 3;" since, on average, it's faster for the CPU to reach a given element from the middle of the array. If you wanted a[9], it would have to count all the way up from 0 with the original solution. In my optimization, it has to count 5 at most :-) – paxdiablo Nov 28 '08 at 02:29
  • 3
    Oh yes, Pointer Math. Yet another reason I'm glad I switched to managed languages. – Matt Campbell Nov 29 '08 at 17:13
  • 1
    @pax, `a+=5` won't work because the pointer to an array can't be changed. – Blindy Oct 19 '09 at 05:22