December 1997/Questions and Answers

Columns

Questions & Answers

Pete Becker

The Ins & Outs of Variable Argument Lists

Pete explains how to scan an argument list, and how to push debugging output out the door.

To ask Pete a question about C or C++, send e-mail to petebecker@acm.org, use subject line: Questions and Answers, or write to Pete Becker, C/C++ Users Journal, 1601 W. 23rd St., Ste. 200, Lawrence, KS 66046

Q

On occasion I get the privilege of actually receiving a copy of CUJ from a friend of mine. Currently, I've run into a little problem which nobody seems to be able to solve, so I thought I'd drop you a line.

You know the prototype for printf()? Well it looks a little something like this ????
printf(const char *, ...)
The thing that puzzles me is how the ... part works. I've coded a couple of functions that accept an unlimited amount of mixed parameters, but I can't access the parameters not defined in the prototypes. The ... allows me to pass all the parameters I need, but I have no way of using their values. I know it is possible to do, since printf() makes use of this, but neither Borland nor Microsoft could help me. Could you please take a peek at this and try to give me a little idea of what I should try?

— Ewald Horn

A

Being able to use variable length argument lists is one of the things that separates real programmers from Pascal-using wannabes[1] . I commend you on your decision to throw away the training wheels and to dig in and do some serious programming. But be warned: there's no turning back.

When you write a function that takes a variable number of arguments there are two things you need to know: how many arguments are actually passed and what types they are. C doesn't give you any help here — you have to establish your own coding conventions for this. But once you've figured out how to transmit this information to your function, using the extra parameters is straightforward. So let's begin by simplifying things, and writing a function that takes only arguments of type int. Its prototype looks like this:
void show_values( int, ... );
In general, that's the way to prototype a function that takes a variable number of arguments. It must always have at least one argument whose type is specified. That argument provides the anchor for the compiler magic that comes later. In this case, let's impose a requirement on users of this function that the first argument must be the number of additional integer arguments in this particular call. For example:
/* two additional arguments:*/
show_values( 2, 1, 3 );
/* four additional arguments:*/
show_values( 4, 1, 3, 2, 5 );
In our implementation of show_values we use that first argument to control what the rest of the function does:
void show_values( int count, ... )
{
    while( count-- > 0 )
        /* do something with the next argument */
}
So far, it's pretty easy. Now comes the magic. We begin by pulling in the header stdarg.h, which defines the type name va_list and a set of macros: va_start, va_arg, and va_end. We have to create a variable of type va_list, which the compiler uses to keep track of where we are in the argument list. Then we use va_start to tell the compiler where the argument list begins. We use va_arg to pull off the individual arguments, and when we're done we use va_end to tell the compiler that we're done [2] . Like this:
#include <stdarg.h>
#include <stdio.h>
void show_values( int count, ... )
{
va_list ap;
va_start( ap, count );
    while( count-- > 0 )
        {
        int i = va_arg( ap, int );
        printf( "%d\n", i );
        }
va_end(ap);
}
We've used the va_start macro to set ap to point to the first argument, which follows count in the argument list. You should always use the argument that comes just before the ellipsis [3] when you invoke va_start. Each time the code executes the va_arg macro it gets the next argument from the argument list and copies its value into i. Try this with the calling sequences I mentioned above.

Now let's try a different way of getting the argument count into the function. Let's write another function, one that takes only char * arguments after the first. We want it to look like this:
void show_strings( const char *, ... );
This time we're going to pass in a series of pointers to char, with a final null pointer to indicate the end of the list. Like this:
char *msg1 = "Hello, world!";
char *msg2 = "Goodbye, cruel world!";
char *msg3 = "Just kidding...";
show_strings( "Comments:", msg1, msg2, msg3, (char *)0 );
Notice that the last argument here uses an explicit cast to char *. Remember, the compiler doesn't know what types of parameters we're expecting inside show_strings, so we have to make sure that everything we pass in is of the right type. You can't safely use NULL here, because it may be just a macro that expands to 0.

Writing this function is straightforward:
#include <stdarg.h>
#include <stdio.h>
void show_strings( const char *header, ... )
{
va_list ap;
char *arg;
printf( "%s\n", header );
va_start( ap, header );
arg = va_arg( ap, char * );
while( arg != NULL )
    {
    printf( "\t%s\n", arg );
    arg = va_arg( ap, char * );
    }
va_end(ap);
}
Once again, we used va_start to point ap to the first argument in the variable part of the argument list, the part that comes right after header. We use va_arg to pull off the arguments, all of type char *, and as long as we get a non-null pointer, we print out the char array that it points to and move to the next one.

Working with mixed types in the variable part of the argument list isn't much harder, as long as you know what type is coming next. To see how this works, let's simplify things again by just alternating ints and doubles. We'll use the first argument to pass the total number of arguments. The prototype for our function looks like this:
void show_pairs( int, ... );
It's easy to write:
void show_pairs( int count, ... )
{
va_list ap;
va_start( ap, count );
while( count > 0 )
    {
    int first = va_arg( ap, int );
    double second = va_arg( ap, double );
    printf( "%2d. %f\n", first, second );
    count -= 2;
    }
va_end(ap);
}
We can call it like this:
show_pairs( 4, 2, sqrt(2.0), 3, sqrt(3.0) );
That's really all there is to it. Of course, if you want to use more complex conventions for the types of parameters you pass, you may want to pass a parameter that indicates the types of the variables that follow. Hmm, maybe you could pass a string, consisting mostly of text to be copied to the console, but with % signs every now and then to indicate that the function should insert the next argument. The next letter after the % sign could indicate the type of the argument. Then all you'd have to do is write some code to interpret the type indicators and convert the arguments to text. Imagine what you could do with a function like that.

Q

I have been trying to write a log class that would replicate the behavior of ostreams but would send messages both to the screen and/or to a log file, at the user's option. My first try was rather cumbersome, but it was working (minus a couple of features I wanted to add). Then, I saw your September '97 column which described how I should have implemented my class. However, there is one requirement I have that I cannot implement using your technique. When a message is sent, my class must open the log file, write the message into it, and then close the file. (This project began after a hellish month of using all my spare time to track down a problem that was crashing my program, so any messages that I had sent to a file or to cout would get lost since they were never flushed.) Since my class will open and close the file, it doesn't seem to me to make sense to pass an ofstream object into it. I should be able to just give it the name of the log file I want, and the ofstream should be maintained within the log class. But when I use a constructor that takes only a char * as an argument, my program compiles and links fine but crashes with a GPF when I run it! Is it possible to use your teebuf class this way?

— Rob Richardson

A

This is a fairly common problem when you're dealing with code that isn't working right, or when you're writing an application that must take pains not to lose data. You've identified the fundamental problem: in most cases stream output gets stored in internal buffers without being written to the disk, and if the program crashes, the data is lost. There are several techniques you can use for improving data integrity, and they offer varying degrees of improvement at varying run-time costs.

The simplest technique is to flush the stream regularly. In C you do this by calling the fflush() function:
FILE *fptr;
fptr = fopen( "data.log", "a+" );
if( fptr == NULL )
    exit(EXIT_FAILURE);
fprintf( fptr, "Here's some data\n" );
fflush( fptr );
To do the same thing in C++, use the manipulator endl or the member function flush():
std::ofstream out( "data.log",
    ios::out | ios::ate );
out << "Here's some data" << endl;
The endl manipulator inserts a newline into the text stream and flushes the internal buffers. You'd use the member function flush() like this:
out.flush();
The obvious drawback with this approach is that you have to remember to flush the buffers regularly. If you forget, you lose the benefit of this technique.

You can avoid this problem, at the cost of slowing down your output significantly, by eliminating the buffering. Instead of gathering data in memory and writing out big chunks, you can set up your stream to send each byte of data directly to the underlying file system as soon as you send it to the stream. In C you do this with setvbuf():
FILE *fptr;
fptr = fopen( "data.log", "a+" );
if( fptr == NULL )
    exit(EXIT_FAILURE);
setvbuf( fptr, NULL, _IONBF, 0 );
fprintf( fptr, "Here's some data\n" );
The _IONBF parameter sets the stream to be unbuffered. You'd do the same thing in C++ with ios::unitbuf, which comes in two forms: a manipulator and a format flag. Use the manipulator this way:
std::ofstream out( "data.log",
    std::ios::out | std::ios::out );
out << std::unitbuf;
out << "Here's some data" << endl;
Now the stream will be flushed to the underlying file system after each insertion [4] .

In C++, the standard streams cerr and clog are by default unit-buffered, so you don't have to remember to set unitbuf.The stream cerr writes to stderr, and clog writes to stdout.

You may have noticed that I've avoided saying that your data will be flushed to the disk. That's because the techniques we've looked at so far don't promise to do that. What they do promise is that your data will be flushed out of your application's internal buffers. This makes it more likely that your data will actually end up on disk if your application crashes, but that depends on how your operating system handles its own buffering. Yes, there's another layer in there, and that's why we often end up actually closing a file after writing to it. Closing the file gives the operating system a strong hint that we actually want the data to go out to the disk. It's not a guarantee, though: some file systems use a technique known as write caching to increase the apparent speed of writing to a disk. They save the data in memory, even when you've closed the file, and wait until the file system isn't too busy to actually copy the data to the disk. You can shut off write caching, but the technique is system dependent. Consult your operating system documentation if this is important to you.

Closing the file after writing significant data will often improve the safety of your data. Back in the olden days, we did this in C++ by creating a temporary stream object:
std::ofstream( "data.log",
    std::ios::out | std::ios::out )
    << "Here's some data\n";
By invoking the ofstream constructor directly, rather than creating a named object as we did in the earlier examples, we create a temporary object of type ofstream. We insert our data into this object, and then at the end of the statement the compiler invokes the destructor for this temporary object, which closes the underlying file [5] . This technique doesn't work any more, because the language definition now says that this temporary object can only be modified by its own member functions [6] . As long as you stick to the built-in inserters you'll be okay — they're supposed to be members of ostream. But you won't be able to use any of the inserters that you've written for your own types. That may be an acceptable limitation, depending on the sort of data that you're trying to preserve.

If you don't want to have to remember to open and close the file every time you want to write some data there are a couple of approaches you can take. First, you can simply create your ofstream object when you need it, and let its destructor take care of closing the file at the end of the block:
void f()
{
std::ofstream out( "data.log",
    std::ios::out | std::ios::out );
// write some data
// write some more data
}    // file closed before return
In fact, you can add otherwise extraneous blocks to your program to ensure that the file gets closed:
void f()(
{
// do some computations
    {
    std::ofstream out( "data.log",
        std::ios::out | std::ios::out );
    // write some data
    // file closed on exit from block
    }
// do some more computations
}
Adding blocks can get a bit tedious, but it might be worthwhile. Of course, in an actual application you would probably want to write a function to give you the name of the file to write to, rather than hardcoding it.

That's about all I can think of. In particular, I don't see a portable way of using the stream classes to guarantee that the data will be flushed to the disk when you write to the stream. You have to remember to flush the data yourself. Careful coding can get you to the point where your data won't be lost if your application crashes. If the crash is bad enough that it takes down the operating system as well, you can still lose data.

Notes

[1] Now, don't get me wrong. Some of my best friends use Pascal. I think it's a fine language. But let's face it, you wouldn't want your sister to program in it, would you?

[2] If you want to look up the definitions of these macros, I can't stop you, but I don't recommend it. You'll find strange things; for instance, in many cases va_end() is empty. That doesn't mean that you can leave it out. Everything that these macros do is highly system dependent, and if you don't go through all the required steps, someday you'll run into a compiler that doesn't like what you've done. That's why I keep calling these things magic. There's grave danger here for the inexperienced — don't mess with the macros.

[3] Yes, ellipsis, not ellipses. There isn't even one ellipse here.

[4] The flush is performed at the end of each of the built-in inserters. If you've written your own inserters for your own types you don't need to worry about flushing your own data as long as your inserters use the built-in inserters.

[5] There used to be some variation among compilers in when the destructor would actually be invoked; in some cases it wouldn't happen until the end of the block in which the temporary object was created. We don't need to concern ourselves with that here, because the language definition now requires that temporary objects be destroyed at the end of the full expression in which they are created.

[6] The explanation for this is that it is, after all, a temporary object, and it's probably a mistake to modify a temporary object, since those modifications will disappear at the end of the expression that created the temporary object. In most cases that's probably right, but there are cases like this where the side effects of modifying the object are important.

Pete Becker is Technical Project Leader for Dinkumware, Ltd. He spent eight years in the C++ group at Borland International, both as a developer and manager. He is a member of the ANSI/ISO C++ standardization commmittee. He can be reached by email at petebecker@acm.org.