printf Revisited

Dr. Dobb's Journal January, 2005

Picking up where IOStreams leaves off

By Walter Bright

Walter is the author of the D language and the Digital Mars C/C++ compiler (among others). He can be reached at http://www.walterbright.com.

I love C's printf. It's one of the original reasons why I switched from Pascal to C 20-plus years ago. It's quick and easy to bang out. It never complains, it's always faithful, it just does its job. It's just so darned useful and convenient.

But I have to admit it, printf has well-known faults:

The most noticeable attempt to fix these faults was C++'s IOStreams, which itself has undergone two major overhauls, and credibly solves some of printf's problems. But in the process, we lost what was so cool about printf. What was short and sweet turned long and clumsy. For instance, "Hello world" looks fine:

printf("Hello world!\n");
std::cout << "Hello world!\n";

but when there are more arguments, it starts looking dodgy:

printf("Elapsed time for %s is %d %s\n", foo, etime, units);
std::cout << "Elapsed time for " << foo << " is "
<< etime << " " << units << std::endl;

But IOStreams really gets beaten with the ugly stick when you try to use something other than the default formatting:

printf("%d|%-8d|%d\n", 123, 456, 789);
std::ios_base::fmtflags flags_save = std::cout.flags();
std::cout << 123 << '|' << std::left
<< std::setw(8) << 456
<< "|" << 789 << std::endl;
std::cout.flags(flags_save);

Even worse, formatting with IOStreams introduces problems with exception safety and thread safety with respect to stream state, since std::left relies on sticky state held in the stream, and std::cout is a single global object shared by the whole program.
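The stickiness is easy to demonstrate. The following is a minimal sketch, not anything from the IOStreams specification itself: sticky_demo is a made-up helper name, and it writes to a std::ostringstream rather than std::cout so the result can be inspected. It shows that std::setw applies to one item only, while std::left stays in effect for later, unrelated writes unless the flags are restored. If an exception were thrown between the formatted write and the restore, the stream would be left in the altered state.

```cpp
#include <iomanip>
#include <ios>
#include <sstream>
#include <string>

// Hypothetical helper: formats three values with the middle one
// left-justified in 8 columns, then performs a later, unrelated write.
std::string sticky_demo(bool restore_flags)
{
    std::ostringstream out;
    const std::ios_base::fmtflags saved = out.flags();

    out << 123 << '|' << std::left << std::setw(8) << 456 << '|' << 789 << '\n';
    if (restore_flags)
        out.flags(saved);   // skipped if an exception escapes before this line

    // setw(6) applies to 42 only, but the justification used here is
    // whatever the stream currently remembers.
    out << std::setw(6) << 42 << '|';
    return out.str();
}
```

Calling sticky_demo(false) leaves the second line left-justified because std::left leaked out of the first statement; sticky_demo(true) gets the default right justification back.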

And finally, IOStreams inherits all of C's problems with UTF characters.

The D programming language (http://www.digitalmars.com/d/index.html) attempts to retain the great things about C and C++ while fixing the faults. Therefore, printf needs to be fixed to resolve the acknowledged problems while retaining its handy utility. Few other issues with D generated more debate, heat, ideas, and volume on the D newsgroup (news.digitalmars.com).

The obvious root of printf's problems is the lack of type information corresponding to the variadic function arguments. In D, variadic functions are passed a hidden argument that is an array of the types of all the variadic arguments. This gives a free hand in designing a replacement, writef, along with related functions that correspond roughly to printf, fprintf, and sprintf:

void writef(...);        // write arguments to stdout
void fwritef(fp, ...);   // write arguments to fp
char[] format(...);      // create a string and write the arguments to it
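To make the idea of a per-call array of argument types concrete, here is a rough C++17 analogy. It is only a sketch: D builds this array at run time from its own type information objects, whereas this version uses a variadic template, and the names Tag, tag_of, and type_tags are invented for illustration.

```cpp
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical type tags standing in for D's run-time type information.
enum class Tag { Integer, Floating, Character, String, Other };

// Maps a static type to its tag at compile time.
template <typename T>
constexpr Tag tag_of()
{
    using U = std::decay_t<T>;
    if constexpr (std::is_same_v<U, char>)
        return Tag::Character;
    else if constexpr (std::is_integral_v<U>)
        return Tag::Integer;
    else if constexpr (std::is_floating_point_v<U>)
        return Tag::Floating;
    else if constexpr (std::is_convertible_v<U, std::string>)
        return Tag::String;
    else
        return Tag::Other;
}

// Returns one tag per argument, in order, without looking at any values.
template <typename... Args>
std::vector<Tag> type_tags(const Args&...)
{
    return { tag_of<Args>()... };
}
```

A formatter that receives such an array alongside the values no longer needs the format string to say what type each argument has.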

What happened to the format string as the first argument? It isn't necessarily needed. The argument scanner in writef's implementation scans each argument in order. If it is a char[], it is interpreted as a format string; otherwise, it is printed in the default manner appropriate to its type:

writef();                      // prints no characters
writef("hello", " world\n");   // prints 'hello world\n'
writef(123, 8.60, '@');        // prints '1238.6@'

If an argument is an instance of a class derived from Object, the class's toString() method is called to get the corresponding string to print:

class Man
{
    char[] toString() { return "I am a Man, not an ape"; }
}
void foo(Man a)
{
    writef(a); // prints 'I am a Man, not an ape'
}

writef's format strings are greatly simplified because they no longer have to do double duty as type specifications. Any string not under the control of a previous format is interpreted as being a format string. It is parsed and interpreted, and subsequent arguments are consumed and formatted in an analogous manner to printf. %s takes the next argument and prints it in its default format. %d takes the next argument, which must be an integral type, and prints it in decimal. (Whether it is signed is determined by its type, not its format.) If the next argument is not an integral type, an exception is thrown.

Similarly, %b, %o, %x, and %X format integral types in binary, octal, lowercase hex, and uppercase hex, respectively. The %e, %E, %f, %F, %g, %G, %a, and %A formats take the next argument, which must be a floating-point type, and print it in the corresponding floating format (they work the same as C99's floating-point formats).

And that's it for basic formats. No need for l, ll, L, or hh modifiers, or for %u and the like:

writef("%x %s", 123, 456L, 789);
// prints '7b 456789'
writef("%o ", 56, "%b", 22);
// prints '70 10110'
writef("%d %d %d", -1, -1, cast(uint)-1);
// prints '-1 -1 4294967295'
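The scanning rules above can be approximated in C++17. This is a sketch under stated assumptions, not how writef is implemented: the names Arg, to_arg, default_text, in_base, format_args, and mini_format are all invented here, only %s, %d, %x, %o, and %b are handled, and flags, width, and precision are left out. One artifact of the sketch is that every signed integer is widened to long long, so a negative value printed with %x, %o, or %b shows its 64-bit pattern.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <type_traits>
#include <variant>
#include <vector>

// One slot per argument; the active alternative plays the role of type info.
using Arg = std::variant<long long, unsigned long long, double, char, std::string>;

template <typename T>
Arg to_arg(const T& v)
{
    using U = std::decay_t<T>;
    if constexpr (std::is_same_v<U, char>)
        return Arg(std::in_place_type<char>, v);
    else if constexpr (std::is_integral_v<U> && std::is_signed_v<U>)
        return Arg(std::in_place_type<long long>, v);
    else if constexpr (std::is_integral_v<U>)
        return Arg(std::in_place_type<unsigned long long>, v);
    else if constexpr (std::is_floating_point_v<U>)
        return Arg(std::in_place_type<double>, v);
    else
        return Arg(std::in_place_type<std::string>, v);  // literals, std::string
}

// Default rendering for an argument, chosen by its type alone.
std::string default_text(const Arg& a)
{
    std::ostringstream out;
    std::visit([&out](const auto& v) { out << v; }, a);
    return out.str();
}

// Renders an integral argument in the given base; throws for other types.
std::string in_base(const Arg& a, unsigned base)
{
    unsigned long long n = 0;
    if (const long long* s = std::get_if<long long>(&a))
        n = static_cast<unsigned long long>(*s);
    else if (const unsigned long long* u = std::get_if<unsigned long long>(&a))
        n = *u;
    else
        throw std::invalid_argument("integral argument expected");
    std::string digits;
    do {
        digits.insert(digits.begin(), "0123456789abcdef"[n % base]);
        n /= base;
    } while (n != 0);
    return digits;
}

std::string format_args(const std::vector<Arg>& args)
{
    std::string out;
    std::size_t i = 0;
    while (i < args.size()) {
        const Arg& current = args[i++];
        const std::string* fmt = std::get_if<std::string>(&current);
        if (!fmt) {                       // not a string: print by type
            out += default_text(current);
            continue;
        }
        // A string not consumed by an earlier format is itself a format.
        for (std::size_t k = 0; k < fmt->size(); ++k) {
            const char c = (*fmt)[k];
            if (c != '%' || k + 1 == fmt->size()) { out += c; continue; }
            const char spec = (*fmt)[++k];
            if (spec == '%') { out += '%'; continue; }
            if (i >= args.size())
                throw std::invalid_argument("too few arguments");
            const Arg& a = args[i++];
            switch (spec) {
            case 's': out += default_text(a); break;
            case 'd':                     // signedness comes from the type
                if (!std::holds_alternative<long long>(a) &&
                    !std::holds_alternative<unsigned long long>(a))
                    throw std::invalid_argument("integral argument expected");
                out += default_text(a);
                break;
            case 'x': out += in_base(a, 16); break;
            case 'o': out += in_base(a, 8);  break;
            case 'b': out += in_base(a, 2);  break;
            default:  throw std::invalid_argument("unsupported format");
            }
        }
    }
    return out;
}

template <typename... Args>
std::string mini_format(const Args&... args)
{
    return format_args({ to_arg(args)... });
}
```

With this sketch, mini_format("%x %s", 123, 456L, 789) yields the same text as the first writef example, and mini_format("%d", 8.6) throws instead of printing garbage.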

To the basic formats, writef adds in the flags, field width, and precision specifications wholesale from C99.

writef("%d|%-8d|%d", 123, 456, 789);
// prints '123|456     |789'

So how does writef stack up against the complaints about printf? Well, writef:

Recall the complaint that C++ IOStreams was neither exception safe nor thread safe with respect to the stream state. writef is exception safe because its formatting state is local on the stack, not globally set. It's thread safe when the implementation of writef synchronizes itself with stdout across the entire invocation of writef.
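The same two properties can be sketched in C++. This is an illustration of the principle rather than writef's actual mechanism: LockedWriter and write_field are hypothetical names, and instead of locking stdout for the whole call, the sketch formats into a local buffer first and holds a mutex only while the finished text is written to the shared sink.

```cpp
#include <iomanip>
#include <mutex>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical wrapper around a shared output stream.
class LockedWriter {
public:
    explicit LockedWriter(std::ostream& sink) : sink_(sink) {}

    // Writes `value` left-justified in `width` columns, followed by '|'.
    void write_field(int value, int width)
    {
        // Formatting state lives in this local stream. If anything here
        // throws, the local is destroyed and the sink's flags are untouched.
        std::ostringstream local;
        local << std::left << std::setw(width) << value << '|';

        // Hold the lock for the whole write so that text from concurrent
        // callers cannot interleave inside a field.
        std::lock_guard<std::mutex> guard(mutex_);
        sink_ << local.str();
    }

private:
    std::ostream& sink_;
    std::mutex mutex_;
};
```

After a call to write_field, later writes to the sink still use its default right justification, because std::left was only ever applied to the local stream.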

DDJ