

Using Variable-Length Argument Lists in C

John Kodis


John Kodis has developed a wide variety of computer applications and has programmed in nearly all of the major languages which have come into fashion throughout the past decade. He is currently using C to develop supercomputer fileserver systems while

C has always supported functions that take a variable number of arguments, but the feature was rarely used because the method for accessing the optional parameters wasn't documented until recently. For example, the first edition of Kernighan and Ritchie's The C Programming Language, one of the standard references on the language, makes no mention of it. The method also relied on a certain amount of machine-dependent code, so such functions could not be written as portably as other C code. As a result, most programmers saw functions accepting a variable number of arguments only in the C runtime library, in printf, scanf, and related functions. But user-defined functions that accept variable arguments can benefit the programmer as well.

According to the ANSI C standard, the ellipsis (...) provides a controlled way to circumvent the type-checking of function calls against function prototypes. The compiler will not report a call to the same function with a different length argument list as an error. The stdarg.h header file supplied with ANSI-compliant C compilers includes macros that let you access the optional arguments in a variable-length parameter list in a portable manner. These macros are va_start, used to set up variable argument list processing, va_arg, which retrieves the next argument from the argument list, and va_end, which performs any cleanup necessary before leaving the function. stdarg.h also provides a type, va_list, with which the three argument-list macros keep track of the position of the next argument in the list.

A Sample Implementation

To show how a simple routine uses the stdarg.h macros to access a variable number of parameters, the program in Listing 1 implements a function called maxn. The maxn function takes one mandatory argument: the count of optional arguments that follow. maxn examines that many optional arguments, determines the largest, and returns this value to the calling function. Listing 2 shows examples of valid calls. As shown in Listing 1, the ellipsis after n tells the compiler that n may be followed by any number of unnamed arguments of any type.
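The listing itself is not reproduced here. As a rough guide, a function matching this description might look like the following minimal sketch. It assumes that at least one optional argument is supplied and that every optional argument really is an int; the assert and the exact loop structure are illustrative choices, not necessarily those of Listing 1.

```c
#include <assert.h>
#include <stdarg.h>

/* Return the largest of the n int arguments that follow n.
 * Assumes n >= 1 and that every optional argument is an int;
 * the compiler cannot check either condition. */
int maxn(int n, ...)
{
    va_list argp;   /* tracks the position in the optional arguments */
    int max_val;
    int val;

    assert(n >= 1);

    va_start(argp, n);              /* n is the last named parameter */
    max_val = va_arg(argp, int);    /* first optional argument */
    while (--n > 0) {
        val = va_arg(argp, int);    /* fetch each argument exactly once */
        if (val > max_val)
            max_val = val;
    }
    va_end(argp);                   /* required before returning */

    return max_val;
}
```

A call such as maxn(3, 1, 7, 4) examines three optional arguments and returns 7.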

Unfortunately, in such a declaration the compiler has no way to check the types of the optional parameters. Even though the maxn function is clearly intended to handle only integer arguments, a compiler will readily process a statement such as

/* an erroneous call */
maxn(2, 3.14159, acos(-1.0));
or even

/* equally erroneous */
maxn(3, NULL, "This is a string", '!');
since only the first parameter's type can be checked. The programmer must verify the validity of the other parameters. Even so, this declaration is an improvement over the obsolete form in the original K&R parameter definition style

int maxn (n)
which doesn't even allow type-checking the first parameter.

The maxn function body declares a variable named argp (for argument pointer) with a type of va_list. stdarg.h defines va_list as a type for use with the other variable-length argument-list macros. A variable of this type must be declared and supplied as the first parameter to all of the argument-list access macros.

The next line in maxn uses the va_start macro to initialize the argument pointer argp. va_start is called with two parameters: the name of the va_list variable (argp, in this example), and the name of the function's last named argument. One implication of this scheme is that a function that accepts a variable number of arguments must have at least one named parameter. An ANSI C compiler will not accept attempts to write a function such as

int func0(...) { ... }
which is just as well since there is no named argument to use with the va_start macro or to use in determining the number of parameters that follow.

After initializing the argp variable, maxn fetches each of the n arguments, using the macro calls to va_arg. The va_arg macro accepts two parameters: the name of the argument list pointer (as always) and the type of the parameter to be retrieved from the argument list. va_arg returns the value of the next argument being passed to the function and updates the va_list variable so that the next va_arg macro call will retrieve the following argument. Supplying va_arg with the type of the argument to be retrieved serves two purposes. First, it allows casting the va_arg macro to the type of the argument being fetched. Second, it tells the va_arg macro by how much the va_list variable must be adjusted to point to the next argument. The va_arg macro can be a bit confusing since it can return a different type each time that it is called. However, as long as there is agreement between the type specified in the macro call, the type of the variable receiving the value, and the parameter passed in the argument list, no problems will result.
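To make the point about va_arg yielding a different type on each call concrete, here is a small sketch. The function describe_args and its one-letter type codes are hypothetical, invented for this illustration. It uses snprintf, which arrived with C99 rather than the original ANSI standard, to keep the buffer writes bounded. Note that arguments passed through the ellipsis undergo the default promotions, so a float arrives as a double and a char as an int; the types named in va_arg must be the promoted ones.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: 'types' holds one letter per optional argument
 * ('i' = int, 'd' = double, 's' = char *). Each value is appended to
 * out, followed by a space. Returns the number of arguments formatted;
 * stops early on an unknown letter or when out is full. */
int describe_args(char *out, size_t out_size, const char *types, ...)
{
    va_list argp;
    size_t used = 0;
    int count = 0;
    int ok = 1;

    assert(out != NULL && out_size > 0 && types != NULL);
    out[0] = '\0';

    va_start(argp, types);
    while (ok && *types != '\0') {
        size_t room = out_size - used;
        int written;

        /* The type given to va_arg must match what the caller passed. */
        switch (*types++) {
        case 'i':
            written = snprintf(out + used, room, "%d ", va_arg(argp, int));
            break;
        case 'd':
            written = snprintf(out + used, room, "%.2f ", va_arg(argp, double));
            break;
        case 's':
            written = snprintf(out + used, room, "%s ", va_arg(argp, char *));
            break;
        default:
            written = -1;   /* unknown code: the argument's type is unknown */
            break;
        }

        if (written < 0 || (size_t)written >= room) {
            ok = 0;
        } else {
            used += (size_t)written;
            ++count;
        }
    }
    va_end(argp);

    return count;
}
```

With a 64-byte buffer buf, the call describe_args(buf, sizeof buf, "ids", 42, 2.5, "end") formats three arguments and leaves "42 2.50 end " in buf.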

Each call of the va_arg macro updates the va_list variable so that it is ready to retrieve the next argument from the function's argument list. This side effect can occasionally produce unexpected results. For example, some programmers define a max macro that looks something like

#define max(a, b) ( (a)>(b) ? (a) : (b) )
Now suppose the body of the while loop in the maxn function is changed from

val = va_arg(argp, int);
if (val > max_val)
   max_val = val;
to incorporate the max definition

max_val = max( max_val, va_arg(argp, int) );
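Because the preprocessor substitutes the macro's arguments textually, the va_arg call appears twice in the expansion: once in the comparison and once as a possible result. Whenever the fetched argument is not smaller than max_val, va_arg is evaluated a second time, so the value stored in max_val is the following argument rather than the one that was compared, and one argument is skipped. Using the local variable val to receive the value of the current argument prevents these excess calls to the va_arg macro.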
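The double evaluation can be observed without any variable-argument machinery. In the sketch below, next_value is a hypothetical stand-in for va_arg: each call hands out the next element of a fixed array, so the number of fetches is easy to count. The array contents and the function names are invented for the illustration.

```c
#include <assert.h>
#include <stddef.h>

#define max(a, b) ( (a) > (b) ? (a) : (b) )

/* Hypothetical stand-in for va_arg: every call consumes one value. */
static const int values[] = { 3, 9, 4, 7 };
static size_t next_index = 0;

static int next_value(void)
{
    assert(next_index < sizeof values / sizeof values[0]);
    return values[next_index++];
}

/* Mimics the rewritten loop body: max_val = max(max_val, va_arg(...)).
 * The macro expands to
 *     (max_val) > (next_value()) ? (max_val) : (next_value())
 * so next_value() runs a second time whenever the comparison fails. */
int update_max(int max_val)
{
    return max(max_val, next_value());
}
```

Starting from next_index equal to 0, update_max(5) fetches 3 once and returns 5. A second update_max(5) fetches 9 for the comparison, then fetches again for the result and returns 4: the true maximum is lost and two values are consumed in a single step.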

After the while loop terminates, a call to the va_end macro will perform whatever cleanup is required after argument processing is complete. This call can come anywhere between the end of argument processing and the return of the function. Finally, the maxn function returns the value of max_val as its result.

Implementation Details

While the details of the argument-list access macros will vary somewhat from one CPU type to another and even from one compiler to another, some common threads run through all implementations. To make this particular discussion more concrete, I'll assume a CPU architecture in which parameters are passed by pushing them onto a stack in memory, and that the stack on this machine grows from high-memory addresses toward low-memory addresses. This description fits the Motorola 68000 series of microprocessors, the DEC VAX computers, the Intel 8086 and 8088 through the 80486 microprocessors, as well as most other CPUs. I'll assume further that all parameters on this machine are aligned on a four-byte boundary. While such an alignment narrows the field a bit, some type of alignment is a fairly common requirement.

When a C compiler encounters a call to a function, it evaluates the expressions that make up the function's argument list. It writes the resulting values to the stack in such a way that when the called function is entered, the left-most argument in the caller's parameter list is located nearest the top of the stack. As a practical matter, most compilers for the type of machine under discussion here generate code that evaluates arguments and pushes parameter values on the stack starting with the rightmost parameter and proceeding to the left. This order produces the correct parameter list layout while allowing values to be pushed on the stack immediately after being calculated. Just don't count on it: the C standard states that the order in which a function's parameters are evaluated is unspecified, so programs that depend on a compiler's behavior in this area are flawed.

As an example, the statement

printf("Five=%d. \n", 5);
will generate code to push the value 5 onto the stack, to push the address of the character string "Five=%d. \n" onto the stack, and to call the printf function. On entering printf, the stack will resemble the diagram in Figure 1.

Because the left-most parameters are pushed onto the stack last, they will always be located a fixed distance from the address that the stack pointer points to on entry to the function. You can therefore locate the left-most arguments in the same way as if the called function had a fixed number of parameters. In the case of printf, the left-most argument is a format string that is scanned to determine the number and type of arguments that follow. Furthermore, the address of the last required parameter can be used to set up the va_list variable so that the subsequent va_arg macro calls can step through the argument list starting with the first of the optional parameters.

Typically, the va_start macro sets the va_list argument to the next location beyond the address of the last required argument by adding the argument's size to its own location and adjusting the sum to account for any alignment requirements. In the example above, the first parameter (0x1234) is located eight bytes above the location pointed to by the stack pointer on entering the called routine. Adding the size of this parameter to its address obtains the address of the next parameter (5) at SP+12.

This pattern is continued within the called routine. Each call to the va_arg macro retrieves the argument pointed to by the va_list variable, casts it into proper form for assignment to a variable of the specified type, and adjusts the va_list variable to allow the next macro call to repeat this pattern. For successful repetitions, you must correctly specify the type of the variable to be retrieved in the va_arg macro call. The va_arg macro applies the sizeof operator to this type specification to determine the amount to increment the va_list variable. Responsibility for getting the type correct lies entirely with the programmer; the compiler cannot perform any kind of validation on the type of parameter being fetched.

Finally, after all of the parameters in the argument list have been retrieved, the va_end macro is called to perform any cleanup that might be required. On most systems, no cleanup is required, so the macro does nothing but evaluate to ((void)0). However, the standard states that you must call va_end after all parameters have been retrieved but before the function returns. To be safe, always include the va_end macro as the standard requires, even if an application seems to run correctly without it. The code you write today may some day be ported to a machine that requires you to clean up the stack.
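The pointer arithmetic described in this section can be imitated in portable C by treating a byte array as the argument area. The sketch below is a simulation only, under the four-byte alignment assumed above; it is not how any particular compiler implements stdarg.h. All sim_ names, SLOT, ROUND_UP, and push are hypothetical. Unlike the real va_arg, which takes a type and yields a value, sim_va_arg copies into a destination variable with memcpy so that the simulation avoids misaligned reads.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SLOT ((size_t)4)    /* assumed parameter alignment */
#define ROUND_UP(n) (((n) + SLOT - 1) & ~(SLOT - 1))

typedef const unsigned char *sim_va_list;   /* hypothetical va_list */

/* Point ap just past the last named argument. */
#define sim_va_start(ap, last_addr, last_size) \
    ((ap) = (const unsigned char *)(last_addr) + ROUND_UP(last_size))

/* Copy out the next argument, then step over its aligned size. */
#define sim_va_arg(ap, dest) \
    (memcpy(&(dest), (ap), sizeof (dest)), (ap) += ROUND_UP(sizeof (dest)))

/* No cleanup is needed in this simulation. */
#define sim_va_end(ap) ((void)0)

/* Store one value in the simulated argument area; return the next offset. */
static size_t push(unsigned char *frame, size_t cap, size_t off,
                   const void *src, size_t size)
{
    assert(off + ROUND_UP(size) <= cap);
    memcpy(frame + off, src, size);
    return off + ROUND_UP(size);
}

/* Lay out (count, int, double) as consecutive aligned slots, then read
 * the two optional values back. Returns 1 if the walk ends exactly at
 * the end of the stored data. */
int sim_round_trip(int i_in, double d_in, int *i_out, double *d_out)
{
    unsigned char frame[32] = { 0 };
    int count = 2;              /* plays the role of the named argument */
    size_t off = 0;
    sim_va_list ap;

    off = push(frame, sizeof frame, off, &count, sizeof count);
    off = push(frame, sizeof frame, off, &i_in, sizeof i_in);
    off = push(frame, sizeof frame, off, &d_in, sizeof d_in);

    sim_va_start(ap, frame, sizeof count);
    sim_va_arg(ap, *i_out);     /* the type determines the step size */
    sim_va_arg(ap, *d_out);
    sim_va_end(ap);

    return (size_t)(ap - frame) == off;
}
```

If the wrong destination type were used in the first sim_va_arg, the pointer would advance by the wrong amount and every later value would be read from the wrong place, which is the failure mode the text warns about.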

The printf Functions

The C runtime library has always provided a variety of formatted input and output routines for variable-length argument lists, the most basic of which are the scanf and printf functions. These two functions access the stdin and stdout files, which are normally associated with the keyboard and terminal. The fscanf and fprintf functions provide formatted input and output from or to a file. These functions are the same as scanf and printf, except for a file pointer as an additional leading argument. For formatted data transfers from or to character strings in memory, the sscanf and sprintf functions are available. They are similar to the fscanf and fprintf functions, except that instead of accepting a file pointer, the sscanf/sprintf functions accept a pointer to a character string. This string plays the role of data source or sink that the file fulfills in the fscanf/fprintf routines.

Recently, three more formatted I/O functions have been added to this lineup. These are vprintf, for output to the standard output file; vfprintf, for output to other files; and vsprintf, for output to character strings in memory. The only difference between this v series of printf functions and their earlier counterparts is that where the printf functions expect a variable-length list of parameters to be formatted, the v series of functions expects a single va_list argument. For example, the function prototype for the sprintf function looks something like

int sprintf(char *str, const char *fmt, ...);
while the function prototype for the corresponding v function will be something like

int vsprintf(char *str, const char *fmt, va_list args);
Whereas the sprintf function is passed (in order) a pointer to the destination string, a pointer to the format-specification string, and a variable number of arguments, the vsprintf function takes a pointer to the destination string, a pointer to the format-specification string, and an argument list of type va_list. This last argument is simply the va_list variable that the va_start macro initialized.

These functions are useful for developing other functions that provide customized versions of the services that the printf family of functions provides. One application of this type of routine is in graphics. The Turbo C graphics library provides a routine called outtextxy used to write a string of characters to the screen when in graphics mode. While the function itself is sufficient to write a simple string constant, more frequently the string requires some formatting. This involves allocating buffer space, formatting the string, and finally calling outtextxy to write the formatted string to the display. It would be much easier if a function accepted the x and y coordinates for the output, a printf-style format specification string, and the corresponding argument list, and would output the string to the screen as desired. The gprintf (for "graphics printf") function in Listing 3 does exactly that.

The va_start macro sets the va_list variable args to point to the start of the variable portion of gprintf's argument list. args is then passed to the vsprintf function to perform the requested formatting, and the outtextxy routine is called to write the formatted string to the display. The va_end macro is called to tidy up, and the length of the formatted string is returned. The main routine initializes the graphics routines, calls the gprintf function to display the driver and mode numbers at coordinates (100, 200), and waits for a keystroke before shutting the graphics system down and exiting.
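The same wrapper pattern can be sketched without a graphics library. In the version below, draw_text is a hypothetical stand-in for a routine such as outtextxy; it only records what it was asked to draw, so the example runs anywhere. The sketch also swaps vsprintf for vsnprintf, a C99 function, so that a long result cannot overrun the buffer. The buffer size and all names are assumptions made for the illustration.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define GPRINTF_BUF 256

/* Hypothetical stand-in for a graphics text routine: remembers the
 * last string and position instead of drawing anything. */
static char last_text[GPRINTF_BUF];
static int last_x, last_y;

static void draw_text(int x, int y, const char *s)
{
    last_x = x;
    last_y = y;
    strncpy(last_text, s, sizeof last_text - 1);
    last_text[sizeof last_text - 1] = '\0';
}

/* Format the arguments as printf would, then hand the finished string
 * to draw_text. Returns the value reported by vsnprintf: the length of
 * the fully formatted text, or a negative value on error. */
int gprintf_sketch(int x, int y, const char *fmt, ...)
{
    char buf[GPRINTF_BUF];
    va_list args;
    int len;

    assert(fmt != NULL);

    va_start(args, fmt);        /* fmt is the last named parameter */
    len = vsnprintf(buf, sizeof buf, fmt, args);
    va_end(args);

    if (len < 0)
        return len;

    draw_text(x, y, buf);       /* buf is truncated if the text was too long */
    return len;
}
```

A call such as gprintf_sketch(100, 200, "driver=%d mode=%d", 1, 2) passes the string "driver=1 mode=2" to draw_text and returns 15.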

You can use this same technique for many other purposes. For example, an application may need an emergency exit routine that generates a formatted explanation of what caused the emergency, and then shuts the application down in a controlled fashion.

Another possibility is to augment the standard printf processing routines by scanning the format list and performing additional processing when a specific character pattern is recognized in the input. For example, programmers accustomed to FORTRAN bemoan the lack of a repetition specifier in format lists. They're used to being able to write something like

write(*, '(3i)') i, j, k
to write three integers out, instead of the C equivalent

printf("%i%i%i\n", i, j, k);
While repeating a format specifier three times is hardly a problem, it can become inconvenient when the number of variables grows. One way to alleviate this printf restriction is to write a printf preprocessor function that accepts repeat counts in format lists.

The rprintf function in Listing 4 does just that. Calling the rprintf function with a format list of "%3r(%i )" does the same job as the two statements above: rprintf prints out three integers without requiring that the integer format specifier %i be repeated three times. rprintf scans the format string for occurrences of the %r format specification. On encountering a %r repeat specifier, it expands the format specifier enclosed in parentheses into the requested number of repetitions. Everything else in the string is passed through as is. The resulting format string is then passed to the vprintf function, along with the original argument list. vprintf then performs its usual job of formatting and printing.
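One way such a preprocessor could be structured is sketched below. It is not Listing 4: the names expand_repeats and rsnprintf_sketch, the fixed 256-byte buffer, and the choice to write into a caller-supplied string through vsnprintf (a C99 function) instead of printing with vprintf are all assumptions made so that the result is easy to inspect. Nested parentheses inside a repeat group are not handled.

```c
#include <assert.h>
#include <ctype.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Expand every "%<count>r(<text>)" in fmt into <count> copies of
 * <text>; copy everything else unchanged. Returns 0 on success or -1
 * if the expanded format does not fit in out. */
int expand_repeats(char *out, size_t out_size, const char *fmt)
{
    size_t used = 0;

    assert(out != NULL && out_size > 0 && fmt != NULL);

    while (*fmt != '\0') {
        if (fmt[0] == '%' && fmt[1] == '%') {
            /* literal percent sign: copy both characters as they are */
            if (used + 2 >= out_size)
                return -1;
            out[used++] = *fmt++;
            out[used++] = *fmt++;
            continue;
        }
        if (fmt[0] == '%' && isdigit((unsigned char)fmt[1])) {
            char *end;
            long count = strtol(fmt + 1, &end, 10);
            const char *rparen;

            if (end[0] == 'r' && end[1] == '(' &&
                (rparen = strchr(end + 2, ')')) != NULL) {
                size_t len = (size_t)(rparen - (end + 2));

                while (len > 0 && count-- > 0) {
                    if (used + len >= out_size)
                        return -1;
                    memcpy(out + used, end + 2, len);
                    used += len;
                }
                fmt = rparen + 1;   /* resume after the closing parenthesis */
                continue;
            }
        }
        /* ordinary character, or a conversion such as %3d */
        if (used + 1 >= out_size)
            return -1;
        out[used++] = *fmt++;
    }
    out[used] = '\0';
    return 0;
}

/* Expand repeat groups, then let vsnprintf do the real formatting. */
int rsnprintf_sketch(char *dst, size_t dst_size, const char *fmt, ...)
{
    char expanded[256];
    va_list args;
    int len;

    if (expand_repeats(expanded, sizeof expanded, fmt) != 0)
        return -1;

    va_start(args, fmt);
    len = vsnprintf(dst, dst_size, expanded, args);
    va_end(args);
    return len;
}
```

Given the format "%3r(%i )", expand_repeats produces "%i %i %i ", and rsnprintf_sketch called with the arguments 1, 2, 3 writes "1 2 3 " to the destination string. Because the argument list is passed through untouched, the expansion step never needs to know the types of the values being printed.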

Other, more application-specific enhancements can also be easily added to the printf family of functions using the approach used for repeat specifiers. For example, a financial application might need to display numbers in accounting style: prefaced by a dollar sign, with two digits after the decimal point, and the minus sign after the number rather than before it. In other application areas, it could prove helpful to be able to conveniently include time and date information in an easy-to-read format in a printf statement. The vprintf family of functions is just the ticket for any situation where formatted output must be processed in a slightly non-standard manner.