
Say I have the following code, which contains three examples of what I believe are unnecessary copies of values.

mov    QWORD PTR [rbp-0x18],rdi
mov    rdx,QWORD PTR [rbp-0x18]
lea    rax,[rbp-0x10]
mov    rsi,rdx
mov    rdi,rax
call   4003e0 <strcpy@plt>

Why is the value in rdi stored to memory at rbp-0x18 and then loaded back into rdx? It's then copied again into rsi (two extra copies).

Finally, why compute rbp-0x10 into rax with lea and then move it to rdi? Is there any reason the following code wasn't generated?

mov    rsi,rdi
lea    rdi,[rbp-0x10]
call   4003e0 <strcpy@plt>

(My guess is that this is just an artifact of the compiler's code generation, but I want to make sure there isn't some x86-64 rule I'm missing.)
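The question doesn't include the C source. As a hedged illustration, the snippet is consistent with a function like the hypothetical copy_arg below; its name, the 16-byte size of buf (only its offset, rbp-0x10, is known) and the puts call after the copy are assumptions. The comments pair each instruction with the statement it appears to come from when compiled without optimization.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical source for the question's snippet. The name, the buffer
   size and the puts call are assumptions, not taken from the question. */
int copy_arg(const char *src)
{
    /* mov QWORD PTR [rbp-0x18],rdi : src is stored to its stack slot on entry */
    char buf[16];                /* lives at rbp-0x10; 16 bytes is assumed */

    /* mov rdx,QWORD PTR [rbp-0x18] : reload src from its stack slot
       lea rax,[rbp-0x10]           : compute the address of buf
       mov rsi,rdx / mov rdi,rax    : move both into the argument registers
       call strcpy
       Precondition: src must be shorter than 16 characters, or buf overflows. */
    strcpy(buf, src);

    return puts(buf);            /* nonnegative on success, EOF on error */
}
```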

perror
David

1 Answer


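These copies aren't artifacts, and no x86-64 rule requires them: the compiler (GCC, in this case) can generate better and faster code if you ask it to. The code you posted is simply unoptimized. Why? Either the -O0 flag (optimization level 0, i.e. no optimization) was specified, or no optimization flag was specified at all, in which case GCC defaults to -O0. At that level GCC keeps every variable in its stack slot, storing it after each assignment and reloading it before each use, so that each statement is compiled independently and a debugger sees the values it expects. The store/reload pairs and the register shuffling you noticed are the result.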

Below are two builds of the same program: version 1 compiled with -O0, version 2 with -O2.

  • Version 1:

     55                      push   rbp
     48 89 e5                mov    rbp,rsp
     48 81 ec 10 04 00 00    sub    rsp,0x410
     89 bd fc fb ff ff       mov    DWORD PTR [rbp-0x404],edi
     48 89 b5 f0 fb ff ff    mov    QWORD PTR [rbp-0x410],rsi
     48 8b 85 f0 fb ff ff    mov    rax,QWORD PTR [rbp-0x410]
     48 83 c0 08             add    rax,0x8
     48 8b 10                mov    rdx,QWORD PTR [rax]
     48 8d 85 00 fc ff ff    lea    rax,[rbp-0x400]
     48 89 d6                mov    rsi,rdx
     48 89 c7                mov    rdi,rax
     e8 40 fe ff ff          call   400400 <strcpy@plt>
     48 8d 85 00 fc ff ff    lea    rax,[rbp-0x400]
     48 89 c7                mov    rdi,rax
     e8 41 fe ff ff          call   400410 <puts@plt>
     b8 00 00 00 00          mov    eax,0x0
     c9                      leave
     c3                      ret
     66 2e 0f 1f 84 00 00    nop    WORD PTR cs:[rax+rax*1+0x0]
     00 00 00  
    
  • Version 2:

     48 81 ec 08 04 00 00    sub    rsp,0x408
     48 8b 76 08             mov    rsi,QWORD PTR [rsi+0x8]
     48 89 e7                mov    rdi,rsp
     e8 ad ff ff ff          call   400400 <strcpy@plt>
     48 89 e7                mov    rdi,rsp
     e8 b5 ff ff ff          call   400410 <puts@plt>
     31 c0                   xor    eax,eax
     48 81 c4 08 04 00 00    add    rsp,0x408
     c3                      ret
     0f 1f 00                nop    DWORD PTR [rax]
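
The answer doesn't show the C source, but both listings are consistent with a program along the lines of the sketch below. This is a reconstruction under assumptions: the 0x400-byte buffer size is read off lea rax,[rbp-0x400], the load from [rsi+0x8] is taken to be argv[1], and the function name echo_first_arg is hypothetical (the real code is presumably main). The comments pair each statement with the instructions it appears to produce at -O0 and at -O2.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical reconstruction of the program behind both listings.
   Called echo_first_arg (an assumed name) instead of main so it can be
   invoked directly; the body does what the disassembly does.
   At -O0 both parameters are stored to the frame on entry:
     mov DWORD PTR [rbp-0x404],edi   ; argc (never used afterwards)
     mov QWORD PTR [rbp-0x410],rsi   ; argv
   At -O2 neither is stored; argv stays in rsi. */
int echo_first_arg(int argc, char **argv)
{
    char buf[0x400];         /* -O0: at [rbp-0x400]; -O2: at [rsp] */

    /* -O0: reload argv, add 0x8, load argv[1] into rdx, lea &buf into rax,
            then mov rsi,rdx / mov rdi,rax before the call.
       -O2: mov rsi,QWORD PTR [rsi+0x8] / mov rdi,rsp
       Unchecked, as in the original: strlen(argv[1]) must be below 0x400
       or buf overflows. */
    strcpy(buf, argv[1]);

    puts(buf);               /* -O0 recomputes &buf with lea; -O2 reloads rsp */
    return 0;                /* -O0: mov eax,0x0 ; -O2: xor eax,eax */
}
```

Building a main with this body using gcc -O0 and gcc -O2, then disassembling with objdump -d -M intel, should give listings of the same shape, though the exact offsets depend on the GCC version and on distribution defaults such as the stack protector.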
    

If you're interested in the optimizations GCC performs, you should read this link, and this one too. You can also check the GCC Summit publications.

yaspr
  • Ok, seems to be what I was assuming. I was posting code from a wargame challenge, and I suppose no optimization makes sense there. – David Jun 05 '14 at 17:32