What would a C variable-argument function such as printf(char* format, ...) look like when disassembled?
Is it always identified by calling convention, or are there more ways to identify it?
It is very simple on some architectures and not very obvious on others. I'll describe a few I'm familiar with.
The System V x86-64 ABI (Linux and other Unix-like systems) is probably the easiest to recognize. Because of the boneheaded decision to pass the number of XMM registers used in al, most vararg functions begin like this:
push rbp
mov rbp, rsp
sub rsp, 0E0h
mov [rbp+var_A8], rsi
mov [rbp+var_A0], rdx
mov [rbp+var_98], rcx
mov [rbp+var_90], r8
mov [rbp+var_88], r9
movzx eax, al
lea rdx, ds:0[rax*4]
lea rax, loc_402DA1
sub rax, rdx
lea rdx, [rbp+var_1]
jmp rax
movaps xmmword ptr [rdx-0Fh], xmm7
movaps xmmword ptr [rdx-1Fh], xmm6
movaps xmmword ptr [rdx-2Fh], xmm5
movaps xmmword ptr [rdx-3Fh], xmm4
movaps xmmword ptr [rdx-4Fh], xmm3
movaps xmmword ptr [rdx-5Fh], xmm2
movaps xmmword ptr [rdx-6Fh], xmm1
movaps xmmword ptr [rdx-7Fh], xmm0
loc_402DA1:
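Note how it uses al to determine how many XMM registers to spill onto the stack: each movaps here is 4 bytes long, so the code jumps to loc_402DA1 minus al*4, which executes only the stores for xmm0 up to the al-th register.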
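For orientation, here is a minimal C sketch of the kind of source that compiles to a prologue like the one above under the System V x86-64 ABI. The function name `average` and its signature are hypothetical and not taken from the disassembled binary. Because the variadic arguments are doubles, the caller passes them in XMM registers and sets al accordingly, and the callee has to spill those registers so that va_arg can find them. The exact prologue depends on the compiler and version; some compilers may simply test al against zero rather than use a computed jump.

```c
#include <stdarg.h>

/* Hypothetical example: average `count` doubles passed as variadic arguments. */
double average(int count, ...)
{
    va_list ap;
    double total = 0.0;
    int i;

    if (count <= 0)
        return 0.0;

    va_start(ap, count);              /* ap now refers to the register save area / stack */
    for (i = 0; i < count; i++)
        total += va_arg(ap, double);  /* doubles are fetched from the spilled XMM slots first */
    va_end(ap);

    return total / count;
}
```

A call such as average(3, 1.0, 2.0, 3.0) places the three doubles in xmm0-xmm2, and the caller loads al with the number of vector registers used before the call.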
In Win64 it's less obvious, but here's one sign: the registers that correspond to the variadic (ellipsis) parameters are always spilled onto the stack, at positions that line up with the rest of the arguments passed on the stack. E.g. here's printf's prologue:
mov rax, rsp
mov [rax+8], rcx
mov [rax+10h], rdx
mov [rax+18h], r8
mov [rax+20h], r9
Here, rcx contains the fixed format argument, and the variadic arguments are passed in rdx, r8 and r9 and then on the stack. We can observe that rdx, r8 and r9 are stored exactly one after another, just below the rest of the arguments, which begin at rsp+0x28. The area [rsp+8..rsp+0x28] is reserved exactly for this purpose, but non-vararg functions often don't store all register arguments there, or they reuse that area for local variables. For example, here's a non-vararg function prologue:
mov [rsp+10h], rbx
mov [rsp+18h], rbp
mov [rsp+20h], rsi
You can see that it's using the reserved area for saving non-volatile registers, and not spilling the register arguments.
The ARM calling convention uses R0-R3 for the first arguments, so vararg functions need to spill them onto the stack to line up with the rest of the parameters passed on the stack. Thus you will see R0-R3 (or R1-R3, or R2-R3, or just R3) being pushed onto the stack, which usually does not happen in non-vararg functions. It's not a 100% foolproof indicator: e.g. Microsoft's compiler sometimes pushes R0-R1 onto the stack and accesses them via SP instead of moving them to other registers and using those. But I think it's a pretty reliable sign for GCC. Here's an example of a GCC-compiled function:
STMFD SP!, {R0-R3}
LDR R3, =dword_86090
STR LR, [SP,#0x10+var_14]!
LDR R1, [SP,#0x14+varg_r0] ; format
LDR R0, [R3] ; s
ADD R2, SP, #0x14+varg_r1 ; arg
BL vsprintf
LDR R3, =dword_86094
MOV R2, #1
STR R2, [R3]
LDR LR, [SP+0x14+var_14],#4
ADD SP, SP, #0x10
RET
It's obviously a vararg function because it's calling vsprintf, and we can see R0-R3 being pushed right at the start (you can't push anything else before that, because any stack arguments are already present at SP, so R0-R3 have to be stored directly below them).
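As a rough guess at what the source of the ARM function above might have looked like, here is a minimal C sketch. All names (`format_message`, `g_buf`, `g_flag`, `g_storage`) are hypothetical; `g_buf` stands in for the pointer at dword_86090 and `g_flag` for dword_86094. The buffer size is also an assumption, added only so the sketch is self-contained.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical globals standing in for dword_86090 and dword_86094. */
static char g_storage[256];          /* assumed backing buffer, not in the original */
static char *g_buf = g_storage;      /* pointer loaded into R0 before the call */
static int g_flag;                   /* set to 1 after formatting */

/* Hypothetical wrapper: format into the global buffer, then raise a flag. */
void format_message(const char *format, ...)
{
    va_list ap;

    va_start(ap, format);            /* corresponds to ADD R2, SP, #0x14+varg_r1 */
    vsprintf(g_buf, format, ap);     /* unbounded, like the disassembly; vsnprintf would be safer */
    va_end(ap);

    g_flag = 1;
}
```

The va_start here is what forces the compiler to spill R0-R3 at entry: the resulting va_list must point at a contiguous block holding both the register-passed and the stack-passed variadic arguments.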
(My answer is x86-specific).
Internally, it looks just like any other function. The only difference is that at some point it takes the (stack) address of the last non-variable argument and increments it by the word size on the platform; the result is then used as a pointer to the base of the variable arguments.
Externally, you will observe that different numbers of arguments are passed to the function at different call sites (and typically one of the non-variable arguments will be an obvious indicator of a variable argument function, such as a hard-coded format string or something similar).
Variable argument functions cannot be __stdcall, since __stdcall relies on a fixed ret XXh instruction in the callee, whereas the point of a variable argument function is that an unknown number of parameters can be passed. Hence, these functions must be __cdecl, i.e. the caller must correct the stack to remove all pushed arguments.
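The pointer arithmetic described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: it models a 32-bit cdecl argument area as an array of 4-byte slots rather than touching the real stack, and the names `slot_t`, `sum_from_frame` and `demo_sum` are hypothetical. Real code would use va_start and va_arg, which on 32-bit x86 typically boil down to the same arithmetic.

```c
#include <stdint.h>

/* One stack slot on the simulated 32-bit machine. */
typedef uint32_t slot_t;

/*
 * Simulated callee. `last_named` points at the slot holding the last fixed
 * argument (here a count). The variadic arguments follow it at higher
 * addresses, one slot each, as pushed by a cdecl caller.
 */
int32_t sum_from_frame(const slot_t *last_named)
{
    uint32_t count = *last_named;
    const slot_t *ap = last_named + 1;   /* "va_start": step one word past the named argument */
    int32_t total = 0;
    uint32_t i;

    for (i = 0; i < count; i++)
        total += (int32_t)*ap++;         /* "va_arg(ap, int)": read a slot, advance one word */

    return total;
}

/* Simulated caller: lays out count = 3 followed by three variadic ints. */
int32_t demo_sum(void)
{
    slot_t frame[] = { 3, 10, 20, 30 };
    return sum_from_frame(frame);
}
```

In a disassembly, the "step one word past the named argument" typically shows up as a lea of the slot just above the last fixed argument (for example lea eax, [ebp+0Ch] when the only fixed argument is at [ebp+8]).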