Identifying variable args function

Question

What would a C variable-argument function such as printf(char* format, ...) look like when disassembled?

Is it always identified by calling convention, or are there more ways to identify it?

score 18 · Accepted Answer · answered Mar 21 '13 at 00:14

It is very simple on some architectures, and not very obvious on others. I'll describe a few I'm familiar with.

SystemV x86_64 (Linux, OS X, BSD)

Probably the easiest to recognize. Because of the boneheaded decision to have the caller pass the number of XMM registers used (strictly, an upper bound on it) in al, most vararg functions begin like this:

    push    rbp
    mov     rbp, rsp
    sub     rsp, 0E0h
    mov     [rbp+var_A8], rsi
    mov     [rbp+var_A0], rdx
    mov     [rbp+var_98], rcx
    mov     [rbp+var_90], r8
    mov     [rbp+var_88], r9
    movzx   eax, al
    lea     rdx, ds:0[rax*4]
    lea     rax, loc_402DA1
    sub     rax, rdx
    lea     rdx, [rbp+var_1]
    jmp     rax
    movaps  xmmword ptr [rdx-0Fh], xmm7
    movaps  xmmword ptr [rdx-1Fh], xmm6
    movaps  xmmword ptr [rdx-2Fh], xmm5
    movaps  xmmword ptr [rdx-3Fh], xmm4
    movaps  xmmword ptr [rdx-4Fh], xmm3
    movaps  xmmword ptr [rdx-5Fh], xmm2
    movaps  xmmword ptr [rdx-6Fh], xmm1
    movaps  xmmword ptr [rdx-7Fh], xmm0
loc_402DA1:

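Note how it uses al to decide how many XMM registers to spill onto the stack: each movaps here is 4 bytes long, so the code subtracts al*4 from the address of loc_402DA1 and jumps into the middle of the movaps sequence, skipping the stores for registers that were not used.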
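For reference, here is a minimal C sketch of the kind of source that may compile to such a prologue. The function name average and its parameters are hypothetical and are not taken from the disassembled binary. Because the function can receive floating-point arguments, a System V x86-64 caller sets al before the call, and the callee has to be prepared to save the XMM registers.

```c
#include <stdarg.h>

/* Hypothetical example: returns the mean of `count` double arguments. */
double average(int count, ...)
{
    va_list args;
    double sum = 0.0;

    va_start(args, count);
    for (int i = 0; i < count; i++) {
        /* On System V x86-64, each double comes either from the saved
           XMM registers or from the caller's stack area. */
        sum += va_arg(args, double);
    }
    va_end(args);

    return count > 0 ? sum / count : 0.0;
}
```

For a call such as average(2, 1.0, 3.0), the two doubles travel in xmm0 and xmm1, so the caller would be expected to load 2 into al before the call.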

Windows x64 aka AMD64

In Win64 it's less obvious, but here's one sign: the registers that correspond to the variadic (ellipsis) parameters are always spilled onto the stack, at positions that line up with the rest of the arguments passed on the stack. E.g. here's printf's prologue:

  mov     rax, rsp
  mov     [rax+8], rcx
  mov     [rax+10h], rdx
  mov     [rax+18h], r8
  mov     [rax+20h], r9

Here, rcx contains the fixed format argument, and the variadic arguments are passed in rdx, r8 and r9 and then on the stack. We can observe that rdx, r8 and r9 are stored exactly one after another, and just below the rest of the arguments, which begin at rsp+0x28. The area [rsp+8..rsp+0x28] (the so-called home or shadow space) is reserved exactly for this purpose, but non-vararg functions often don't store all register arguments there, or reuse that area for local variables. For example, here's a non-vararg function prologue:

  mov     [rsp+10h], rbx
  mov     [rsp+18h], rbp
  mov     [rsp+20h], rsi

You can see that it's using the reserved area for saving non-volatile registers, and not spilling the register arguments.

ARM

The ARM calling convention uses R0-R3 for the first arguments, so vararg functions need to spill them onto the stack to line up with the rest of the parameters passed on the stack. Thus you will see R0-R3 (or R1-R3, or R2-R3, or just R3) being pushed onto the stack, which usually does not happen in non-vararg functions. It's not a 100% foolproof indicator: e.g. Microsoft's compiler sometimes pushes R0-R1 onto the stack and accesses them via SP instead of moving them to other registers and using those. But I think it's a pretty reliable sign for GCC. Here's an example of a GCC-compiled function:

    STMFD   SP!, {R0-R3}
    LDR     R3, =dword_86090
    STR     LR, [SP,#0x10+var_14]!
    LDR     R1, [SP,#0x14+varg_r0] ; format
    LDR     R0, [R3]        ; s
    ADD     R2, SP, #0x14+varg_r1 ; arg
    BL      vsprintf
    LDR     R3, =dword_86094
    MOV     R2, #1
    STR     R2, [R3]
    LDR     LR, [SP+0x14+var_14],#4
    ADD     SP, SP, #0x10
    RET

It's obviously a vararg function because it's calling vsprintf, and we can see R0-R3 being pushed right at the start. Nothing else can be pushed before that, because any stack arguments are already present at SP on entry, so R0-R3 have to be stored immediately below them.
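A C sketch of what the source of such a function might look like is given here. All names (message_storage, message_buffer, message_ready, format_message) are hypothetical stand-ins for the unnamed dword_86090 and dword_86094 and for the function itself, and the void return type is an assumption. The sketch also uses vsnprintf with an explicit size, whereas the disassembly calls vsprintf.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical stand-ins for the globals referenced in the listing. */
static char message_storage[256];
static char *message_buffer = message_storage; /* plays the role of dword_86090 */
static int message_ready;                      /* plays the role of dword_86094 */

/* Formats into the global buffer by forwarding the variadic
   arguments to a v* function, then sets a flag. */
void format_message(const char *format, ...)
{
    va_list args;

    va_start(args, format);   /* args now refers to the arguments after format */
    vsnprintf(message_buffer, sizeof message_storage, format, args);
    va_end(args);

    message_ready = 1;
}
```

In the listing, the va_start step corresponds to ADD R2, SP, #0x14+varg_r1: the va_list is simply the address of the spilled R1, which is followed in memory by R2, R3 and any stack arguments.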

Awesome, thanks for breaking down the different scenarios with examples! — Glides, Mar 21 '13 at 17:39

score 10 · Answer 2 · edited Apr 17 '13 at 05:11

(My answer is x86-specific).

Internally, it looks just like any other function. The only difference is that, at some point, the function takes the (stack) address of the last non-variable argument and advances it past that argument (by the word size on the platform, for a word-sized argument); the result is then used as a pointer to the base of the variable arguments.

Externally, you will observe that different call sites pass different numbers of arguments to the function. Typically one of the non-variable arguments is an obvious hint that the function takes variable arguments, such as a hard-coded format string or something similar.

Variable argument functions cannot be __stdcall, since a __stdcall callee cleans up the stack with a ret XXh instruction whose operand is fixed at compile time, whereas the point of a variable argument function is that an unknown number of arguments can be passed. Hence, these functions must be __cdecl, i.e. the caller must correct the stack to remove all pushed arguments.
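To make this concrete, here is a minimal C sketch of a variadic function; the name sum_ints and its count parameter are hypothetical. On 32-bit x86, va_start is typically where the pointer computation described in this answer shows up, as the address of the stack slot just past count.

```c
#include <stdarg.h>

/* Hypothetical example: sums `count` int arguments. */
int sum_ints(int count, ...)
{
    va_list args;
    int total = 0;

    va_start(args, count);            /* points just past the last fixed argument */
    for (int i = 0; i < count; i++)
        total += va_arg(args, int);   /* read one int and advance the pointer */
    va_end(args);

    return total;
}
```

Call sites such as sum_ints(2, 10, 20) and sum_ints(4, 1, 2, 3, 4) push different numbers of arguments, and with __cdecl each call would usually be followed by its own stack adjustment in the caller.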
