How did varargs in C develop?

Question

C has a feature for variadic functions. My understanding is that this feature was originally a hack relying on the simple stack-based parameter passing used by early C implementations, and that some time later it became a compiler feature, allowing it to support more complex ABIs. I'd like to know more about the history of the feature, but my searching isn't turning up much.

Some particular points:

When and by whom was the feature first introduced?
When and by whom was the feature first implemented for ABIs that pass some parameters in registers, requiring the compiler to save those registers on function entry so the varargs code can iterate over them?
When and by whom was the feature first implemented for ABIs that reorder parameters and/or use multiple types of register, requiring va_list to be more than a simple pointer?

Probably something to do with how int fn() means a function taking any number of arguments. — Omar and Lorraine, Jul 28 '21 at 09:54
What do you mean by "varargs", specifically? The existence of at least one function with a variable number of arguments, however implemented? A generalized facility for any C programmer? A specific header file? — dave, Jul 28 '21 at 11:02
This is surprisingly hard to research. The earliest Unix in which the compiler supported variadic functions written in C was probably 4.1c2BSD. But the header file includes one particularly puzzling definition: the va_end macro, which expands to nothing. Was the author of the header that forward-looking, or was the header adapted from somewhere else? — user3840170, Jul 28 '21 at 11:19
The ANSI C committee specifically declined to standardize varargs.h and instead invented stdarg.h. I believe the reason was the unimplementability on non-stack machines. I have the Rationale on paper somewhere... — dave, Jul 28 '21 at 11:20
@another-dave I would assume the reason was an intent to eventually deprecate K&R-style function declarations, which varargs.h relied on. — user3840170, Jul 28 '21 at 11:21
Ah, so there are two parts to this: (1) the language syntax to declare a function with a variable number of args, and (2) the macros to access those args. I had been thinking solely of the second part. — dave, Jul 28 '21 at 11:24
I'd bet you that the first shot at any implementation of a variadic function was "simple stack-based parameter passing". With enough time on their hands, compiler builders may turn it into intrinsics with arbitrary optimizations; but the incentive for optimization is small because 99% of variadic functions are wrappers for printf anyway, and I/O time dominates all optimizations elsewhere. — Peter - Reinstate Monica, Jul 29 '21 at 13:47
In oldschool C a function caller doesn't know if the called function is variadic or not. So if your ABI for regular functions passes parameters in registers your ABI for variadic functions needs to do the same. — Peter Green, Jul 29 '21 at 13:58
It would be interesting to know what the first compiler to support variadics on an ABI that did not pass parameters in registers was. — Peter Green, Jul 29 '21 at 14:00
@PeterGreen: You mean an ABI that does pass parameters in registers, I assume? No later than 1985. MIPS, ARM, SPARC are all using register-based calling conventions. — DevSolar, Jul 29 '21 at 15:16
@PeterGreen: It might have been helpful for the Standard to offer two different syntactic forms for variadic function prototypes--one of which implementations would be allowed (but not required) to process in a manner consistent with existing practice, and one of which would have allowed a called function to know the types of arguments. An implementation which sought to minimize the code size at call sites in exchange for execution speed and the need to include a few "variable args management" library functions in an executable could on many platforms have produced smaller call-site code... — supercat, Jul 29 '21 at 20:20
...for common use cases using the latter approach than the former (generate a call instruction to a "call variadic function" library routine, which would be followed in code space by the address of the function to call and a list of descriptors of the arguments, indicating for each the type and either a value, a frame-relative address, a global address, or a frame-relative address of a pointer to the value). The library routine would then create an object on the stack containing a pointer to a "process arguments" function and a pointer to the descriptor, and invoke the desired function. — supercat, Jul 29 '21 at 20:25
I have a memory that dmr said that the invention of <stdarg.h> was among the pieces of X3J11's work he was most happy with. He was on record as saying that C's biggest flaw was that "varargs functions are impossible, yet printf exists." — Steve Summit, Jul 29 '21 at 21:50
Varargs was written by Andrew Koenig, later of C++ fame. Might try contacting him for the history. I think it appeared in Unix/TS or whatever USG called their Unix just before that. — Mark Plotnick, Jul 30 '21 at 06:41
@SteveSummit: Of course, C without varargs wasn't unique in that regard. Pascal's writeln function can't be written in (standard) Pascal. — dan04, Jul 30 '21 at 22:21
Note that the "simple argument passing convention" was the one advocated by the designers of the VAX-11 (I still have a text on assembly language programming for it somewhere) and also still on intel's x86 much later. — vonbrand, Aug 01 '21 at 23:51
@supercat, and end up with two types of variadic functions, which the programmer will inevitably mix up to get extremely entertaining fireworks? No, thank you so very much. If anything, the way GCC does it (via __builtin_... compiler internals) should allow to do it automatically if warranted. — vonbrand, Aug 01 '21 at 23:55
@vonbrand: Some platforms that have multiple calling conventions have different linker-symbol naming conventions to go with them. For example, IIRC, compilers targeting both 68000-based Macintosh and 8086-based DOS prepend C-calling-convention functions with an underscore, but store Pascal-calling-convention functions' names in allcaps with no underscore. — supercat, Aug 02 '21 at 13:14
@vonbrand: At present, there are already separate functions that accept a va_list and those that use ... notation. My change would essentially be to have a syntax that would allow a programmer to write one function and have caller-side code build and pass a va_list. — supercat, Aug 02 '21 at 13:16

user3840170 · Accepted Answer · 2022-11-13T17:05:39.877

It was a pointer arithmetic hack, later abstracted away into a more portable form in some version of Unix; even later, it was adapted into ANSI C.

In many languages (like Pascal for example), variadic functions, if they were included at all, had to be handled as special cases. B, which was the predecessor to C, did not have to, because B did not require functions to be declared in advance at all. Because the caller did not have the information how many arguments a function accepted, the language implementation had to accept function calls with any number of arguments. This trait was inherited by early C, in which function declarations were optional, and even if present, they did not have to state how many arguments the function accepts. Implicit function declarations have been removed in C99, but only recent standardisation efforts are set to remove declarations that do not specify the function signature fully.

Early implementations of Unix took advantage of this freedom in defining printf, by having it read a variable number of arguments depending on the format string received. Walking the argument list was done by pointer arithmetic on parameter pointers. V6 Unix for the PDP-11 did so in C⁰, and likewise did its Interdata 7/32 port¹:

printf(afmt, args)
char *afmt;
{
    register char *fmt;
    register int *argp, left, c, n;

    /*
     * argp is used to step along list of arguments, since
     * the number of args is not known in advance.
     */
    argp = &args;

    for (fmt = afmt; c = *fmt; fmt++) {
        /* ... */

        if (prf1(c, *argp, left) >= 0)
            argp++;
    }
}

V7 Unix added sprintf and fprintf and switched the implementation to PDP-11 assembly. You will notice this version also added support for the long data type, which was previously absent. (Older versions seem to have alternated between C and assembly implementations of printf.)

At some point, the <varargs.h> header was created. Though I can say with some confidence it came from Unix, I am having a hard time definitively establishing in which version it appeared first. If The Unix Heritage Society site is to be believed, <varargs.h> was there as early as in V7 Unix, dating January 1979. The contents of the header are pretty trivial:

typedef char *va_list;
# define va_dcl int va_alist;
# define va_start(list) list = (char *) &va_alist
# define va_end(list)
# define va_arg(list,mode) ((mode *)(list += sizeof(mode)))[-1]

This is basically encapsulating the same hack as above in just a little more abstract interface. The user of the header was supposed to declare a dummy parameter named va_alist, set a dummy type for it using the va_dcl macro, and then iterate over the arguments using a va_list variable:

sum(va_alist)
    va_dcl
{
    va_list ap;
    int result;
    int x;

    va_start(ap);
    result = 0;
    while ((x = va_arg(ap, int)) != -1) {
        result += x;
    }
    va_end(ap);

    return result;
}

I would assume that the invention of <varargs.h> was part of an effort to make Unix more portable to different architectures. Of particular note is the va_end macro: it seems to have been added purely as a forward-compatibility measure. It expands to nothing at all in the V7 Unix version, but allows other compilers to choose a wildly different implementation strategy.²

The ANSI C committee adapted the <varargs.h> header into a slightly different <stdarg.h> header that was more compatible with new-style, prototype-based function declarations (and as such, with non-traditional calling conventions that required prototypes to be present) while doing away with dummy parameter declarations. Two things changed: va_dcl was removed, and va_start was modified to additionally require the name of the last non-variadic parameter. The latter allowed existing implementations to keep using the same pointer arithmetic trick to access variadic arguments as before. This is explicitly mentioned in the same rationale document (§4.8.1.1) that @another-dave linked in his answer:

The parmN argument to va_start is an aid to writing conforming ANSI C code for existing C implementations. Many implementations can use the second parameter within the structure of existing C language constructs to derive the address of the first variable argument. (Declaring parmN to be of storage class register would interfere with use of these constructs; hence the effect of such a declaration is undefined behavior. Other restrictions on the type of parmN are imposed for the same reason.) New implementations may choose to use hidden machinery that ignores the second argument to va_start, possibly even hiding a function call inside the macro.
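For comparison with the <varargs.h> example earlier, here is a minimal sketch of the same sentinel-terminated sum written against <stdarg.h>. The function name sum_ansi and the named parameter first are my own illustrative choices, not taken from any historical source; the named parameter is there because the ANSI interface needs one to hand to va_start.

```c
#include <stdarg.h>

/* Sums its int arguments up to, but not including, a -1 sentinel.
 * sum_ansi and `first` are illustrative names (assumptions). */
int sum_ansi(int first, ...)
{
    va_list ap;
    int result = 0;
    int x;

    va_start(ap, first);    /* second argument: the last named parameter */
    for (x = first; x != -1; x = va_arg(ap, int)) {
        result += x;        /* add every value seen before the sentinel */
    }
    va_end(ap);             /* always paired with va_start */

    return result;
}
```

A call such as sum_ansi(1, 2, 3, -1) adds the three values before the sentinel. Note that va_dcl and the dummy va_alist parameter are gone, and the function can now be declared with a prototype.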

Modern compilers indeed seem to do the latter: in GCC and Clang, va_start is defined in terms of an opaque compiler intrinsic __builtin_va_start, and although it still requires the last non-variadic parameter to be passed, it is only for the sake of being able to warn when it is not what it ought to be.

⁰ Note the source uses =- and =+ for the operators that are now spelt -= and +=.

¹ Linking to the .a file, which fortunately displays well enough. The link to the tarball from the TUHS page for this release is now dead; save it while you can.

² Although in practice I am having a hard time finding an implementation of va_end that does anything nontrivial. Other than the completely opaque definition in terms of __builtin_va_end used by GCC and Clang, most expand to nothing at all, or to (void) 0; some reset any pointers within the passed va_list (or va_list itself) to null. The most interesting version of va_end is the one I found in the Acorn C/C++ manual (p. 103):

#define va_end(ap) ((void)(*(ap) = (char *)-256))

In principle, though, it would be valid to have an implementation that performs memory allocation in va_start, or even opens a block, intending it to be matched by a counterpart in va_end. The ANSI C design rationale document remarks on va_end (§4.8.1.3):

In many implementations, this is a do-nothing operation; but those implementations that need it probably need it badly.

The ‘probably’ seems telling: it’s as if the committee wasn’t sure either whether anyone actually found va_end useful.
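To make the "opens a block" idea concrete, here is a toy sketch. MY_VA_START and MY_VA_END are hypothetical macros invented for illustration; I am not aware of a real implementation defined this way. They simply wrap the standard macros so that the start macro contributes an opening brace and the end macro the matching closing brace.

```c
#include <stdarg.h>

/* Hypothetical macros (assumptions, not from any real header):
 * the start macro opens a block, the end macro closes it. */
#define MY_VA_START(ap, last) { va_start(ap, last)
#define MY_VA_END(ap)           va_end(ap); }

/* Counts int arguments up to, but not including, a 0 sentinel. */
int count_until_zero(int first, ...)
{
    va_list ap;
    int n = 0;
    int x = first;

    MY_VA_START(ap, first);     /* expands to: { va_start(ap, first); */
    while (x != 0) {
        n++;
        x = va_arg(ap, int);
    }
    MY_VA_END(ap);              /* expands to: va_end(ap); }; */

    return n;
}
```

With macros like these, leaving out the end macro, or placing it in a different function, leaves the braces unbalanced and the code fails to compile. That fits the standard's requirement that va_start and va_end be invoked in the same function.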

Your link for "the contents" seems to have a history button which seems to indicate that the file was present in "BSD 3", do you have a reason for saying 4.1c2 was the earliest? — Peter Green, Jul 28 '21 at 17:14
@PeterGreen You mean, because the file is available on the branch named BSD-3-Snapshot-Development? I noticed that too at some point, but then I also noticed the commit message names 4.1c2BSD as the version it comes from, and the commit date is 1982, postdating 3BSD by four years. The file is also absent from earlier 4.1BSD branches. So I would assume that being on BSD-3-Snapshot-Development is an error in the repo’s (re)construction. Whether that casts doubt on its veracity as a whole, I leave up to you. — user3840170, Jul 28 '21 at 17:39
I was referring to https://github.com/dspinellis/unix-history-repo/commit/a717a93cc2f0c2faced6e8581bbe184eb38a05f6#diff-64a5c371b0fddcc6692443b6f599c09cdd52b07e4cb2cab0ef497a33c3ca1b9f which has a date of 1979 and describes itself as "BSD 3 development" — Peter Green, Jul 28 '21 at 17:41
@PeterGreen Okay, I have no idea what to think now… at least about that Unix history repo. The other source does suggest the focus on portability started with 4.2BSD (thus indirectly corroborating the origin of <varargs.h>), but I am less sure now. — user3840170, Jul 28 '21 at 17:48
I think the 4.2BSD theory is falling apart. https://www.tuhs.org/cgi-bin/utree.pl?file=V7/usr/include/varargs.h shows varargs.h was present as early as in V7 Unix. But I don’t know how credible either source is… — user3840170, Jul 28 '21 at 22:18
Isn't the use of variable length argument lists even older than printf? Though my personal experience doesn't go back to the beginning, AFAIK there's always been "int main (int argc, char **argv)", and a lot of basic *nix utilities depend on being able to accept an arbitrary number of arguments. And if main can do it, extending the capability to other functions seems obvious. — jamesqf, Jul 29 '21 at 04:55
The "hidden machinery" (__builtin_va_start) becomes necessary as soon as the calling conventions on the machine in question don't use the stack for parameters. That would be actually most machines these days, including but not limited to x86_64. That is why you don't see the old pointer hack anymore -- it wouldn't work. — DevSolar, Jul 29 '21 at 15:04
@DevSolar Not really. An unusual calling convention could also be taken care of in user code. TinyCC’s implementation does so. The more important reason is that pointer tricks don’t play well with provenance- and type-based aliasing optimisations. — user3840170, Jul 29 '21 at 15:23
@jamesqf: But "int main (int argc, char **argv)" takes only two arguments. The shell handles bundling up the arguments into an array, and the executable ultimately gets a pointer to an array. What makes varargs distinctive is that it supports a variable number of arguments within C: the calling function passes the arguments directly, exactly as for a non-varargs function, and the varargs sorts it out (nowadays by using standard mechanisms for that, but originally by "breaking through" the usual C abstractions and directly understanding the underlying calling convention). — ruakh, Jul 29 '21 at 17:14
Did Acorn have anything special at the address -256, or is the macro just intended to be an “invalid” pointer? Why not use NULL? — dan04, Jul 29 '21 at 20:14
@user3840170: Compilers that wanted to efficiently process code that does weird things with pointers could do so by treating certain kinds of suspicious actions as indications that certain objects' addresses had been leaked outside the compiler's control and, at least within the context of such leakage, a compiler should allow for the possibility that pointers whose provenance can't be traced back to a time before the leakage might identify the potentially-leaked objects. Handling that would be far easier than dealing with questions of how arguments are represented. — supercat, Jul 29 '21 at 20:32

DrSheldon · Answer 2 · 2021-07-28T20:07:31.803

To add to the other answers, the func(arg, ...) syntax was first developed in C++ and then incorporated into the ANSI C standard, sometime between 1984 and 1988.

My copy of The C Programmers Handbook, AT&T Bell Laboratories, February 1984 is based on K&R 1st edition (published in 1978). It discusses printf and scanf in detail, and states

The printf functions can have a variable number of arguments. The number and type of arguments should match the conversion controls in the format string.

yet makes no mention of variable-argument lists elsewhere (despite an entire page devoted to the details of function calls, and documentation of all of the then-standard libraries).

However, it makes an appearance in The C Programming Language, 2nd edition, by Kernighan and Ritchie, 1988. Appendix A8.6.3 states

If the parameter type list ends with an ellipsis ", ...", then the function may accept more arguments than the parameters explicitly described; see A7.3.2.

and

The ellipsis notation ", ..." for variadic functions is also new, and, together with the macros in the standard header library <stdarg.h>, formalizes a mechanism that was officially forbidden but unofficially condoned in the first edition.

These notations were adapted from the C++ language.

Appendix C also states

The Standard introduces (borrowing from C++) the notion of a function prototype declaration that incorporates the types of parameters, and includes an explicit recognition of variadic functions together with an approved way of dealing with them.
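As an illustration of the notation those passages describe, here is a minimal sketch of a prototype ending in ", ..." together with the <stdarg.h> macros. The function format_tagged and its parameters are made up for this example. It is a printf-style wrapper, which, as a comment on the question notes, is what most variadic functions end up being. It uses snprintf and vsnprintf, so it assumes a C99 library.

```c
#include <stdarg.h>
#include <stdio.h>

/* Prototype using the ", ..." notation; format_tagged is a hypothetical name. */
int format_tagged(char *buf, size_t size, const char *tag, const char *fmt, ...);

/* Writes "[tag] " followed by the formatted message into buf.
 * Returns the total length on success; if the prefix alone fails or
 * does not fit, returns whatever snprintf reported for the prefix. */
int format_tagged(char *buf, size_t size, const char *tag, const char *fmt, ...)
{
    va_list ap;
    int prefix, body;

    prefix = snprintf(buf, size, "[%s] ", tag);
    if (prefix < 0 || (size_t)prefix >= size)
        return prefix;                  /* error, or no room left for the body */

    va_start(ap, fmt);                  /* fmt is the last named parameter */
    body = vsnprintf(buf + prefix, size - (size_t)prefix, fmt, ap);
    va_end(ap);

    return body < 0 ? body : prefix + body;
}
```

Because the prototype is visible at the call site, the compiler knows the function is variadic and can check the named arguments, which was not possible with a K&R-style declaration.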

K&R 2nd ed. is, I recall, the post-ANSI edition? +1 for the "adopted from C++" contribution to the discussion. — dave, Jul 28 '21 at 21:03
@another-dave: Correct, it says "ANSI C" right on the cover. — DrSheldon, Jul 28 '21 at 21:07
Note that C++ used func(arg...) with no comma. When the ANSI standardization process introduced "prototypes" they used C++ declaration syntax except for inserting this comma. I recall a time, before Cfront was updated, it was problematic to read the "new" C compiler's headers. — JDługosz, Jul 29 '21 at 15:38

score 13 · Answer 3 · answered Jul 28 '21 at 11:39

For the second part of the question - the first standardized support was in ANSI C, which explicitly considered issues of portability and implementability with non-stack calling conventions.

This appears to me to be a pure invention of the standardization committee, i.e., adding something to the language that did not exist before, but of course there could have been existing implementations that guided them.

From the Rationale for American National Standard for Information Systems -- Programming Language -- C.

3.7.1 The Committee considered it important that a function taking a variable number of arguments, such as printf, be expressible portably in C. ... Several diverse implementations, however, can implement argument passing more efficiently if the arguments are not required to be contiguous. Thus the Committee decided to hide the implementation details of determining the location of successive elements of an argument list behind a standard set of macros (see 4.8)

and

4.8 Variable Arguments <stdarg.h>
... These macros, modelled after the UNIX <varargs.h> macros, have been added to enable the portable implementation in C of library functions such as printf and scanf (4.9.6). Such implementation could otherwise be difficult, considering newer machines that may pass arguments in machine registers rather than using the more traditional stack-oriented methods. ...
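To illustrate why hiding the details behind macros matters, here is a small sketch that mixes int and double arguments. The function sum_mixed and its one-letter-per-argument convention are my own invention for this example. On ABIs such as x86-64 System V, integers and floating-point values travel in different register classes, so the type passed to va_arg is what tells the hidden machinery where to find the next argument; a bare pointer walk over memory could not express that.

```c
#include <stdarg.h>

/* Hypothetical helper: `kinds` holds one letter per variadic argument,
 * 'i' for int and 'd' for double. Returns the total as a double. */
double sum_mixed(const char *kinds, ...)
{
    va_list ap;
    double total = 0.0;

    va_start(ap, kinds);
    for (; *kinds != '\0'; kinds++) {
        if (*kinds == 'i')
            total += va_arg(ap, int);       /* fetched as an integer argument */
        else if (*kinds == 'd')
            total += va_arg(ap, double);    /* fetched as a floating-point argument */
        else
            break;                          /* unknown letter: stop rather than guess */
    }
    va_end(ap);

    return total;
}
```

For example, sum_mixed("idi", 1, 2.5, 3) reads an int, a double, and another int in that order. The caller and the callee must agree on the types, exactly as with printf's format string.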

That may have been the first support by an "official" standards body, but compilers that sought to be compatible with each other generally processed arguments the same way as the original PDP-11 implementation had done, even in cases where such handling would differ from the target platform's normal argument-passing conventions. — supercat, Jul 28 '21 at 19:25
