Columns


Standard C

Numerical C Extensions Group

P.J. Plauger


P.J. Plauger is senior editor of The C Users Journal. He is convenor of the ISO C standards committee, WG14, and active on the C++ committee, WG21. His latest books are The Standard C Library, published by Prentice-Hall, and ANSI and ISO Standard C (with Jim Brodie), published by Microsoft Press. You can reach him at pj@plauger.com.

Introduction

I have mentioned the Numerical C Extensions Group (NCEG) on more than one occasion in the past. It began as a series of evening meetings called by Rex Jaeschke at past X3J11 C standards meetings. Rex offered to serve as a clearing house for common extensions to C — features not likely to make it into the rapidly congealing language standard. Over time, it grew as more people saw the obvious benefit in it. Why implement the same sort of extensions several different ways if there was no compelling reason to do so? Better to standardize such things as boring syntactic details, if only on an informal basis. That way, when the C Standard comes up for review in a few years, the good ideas that survive will have a wider basis in prior art. Prior art is much easier to sell than on-the-fly invention as a basis for inclusion in a revised standard.

The "numerical" part of NCEG is a slight misnomer. It is true that many of the extensions the group has considered are of interest to numerical programmers. They always seem to be pushing the state of the art to pull off ever more ambitious calculations with ever more complex computers. But some of the extensions are arguably of general interest. Still, a group has to have a name, and NCEG is arguably more mellifluous than, say, X3J11.

Thus, it was a sad day when NCEG decided to get official. You should know that companies do not idly send representatives to meetings like these. Some bystanders are quick to shout collusion and restraint of trade. Meeting under the auspices of an authorized standards group, and following scrupulously their rules for open dealings, is one way to head off such nonsense. So NCEG in the fullness of time became X3J11.1, the one and only authorized subcommittee of X3J11.

What X3J11.1 is chartered to do is produce a Technical Report. By ANSI rules, this is a document that does not have the force of a standard. It is merely intended to inform, to provide guidance. And that, of course, is entirely in keeping with the original goals set by Rex Jaeschke at his evening meetings, long before he became Chair of X3J11.1.

That Technical Report will represent the contributions of a number of different subcommittees. Three of those subcommittees have recently produced drafts that they consider fairly complete: one on compound literals and designated initializers, one on aliasing (in the form of restricted pointers), and one on floating-point extensions.

My purpose in this column is to give you an overview of the first two of those three documents. The floating-point extensions involve considerable detail, so I will deal with them in a separate installment.

I remind you that this stuff may not be Standard C, at least not yet, but it can be a portent of things to come. Some compiler vendors are already supplying the extensions described in these draft reports. You can be sure that more and more companies will follow suit over time.

Compound Literals

More than one eleventh-hour proposal got left out of the C Standard by X3J11. One of the better ones was put forth, albeit tentatively, by no less than Ken Thompson, the co-developer of UNIX. He reported success in implementing compound literals.

Standard C has lots of flavors of literals for the arithmetic types. You can write integer literals such as 10, 0x3ful, and 'x'. You can write floating-point literals such as 3.2 and .32e+1f. Each of these stands for a value of a given type, independent of any object that might hold it. (They're what C programmers like to call rvalues.)

You can even write literals that have associated objects (what C programmers call lvalues). These are string literals, such as "", "Hi!\n", and "Hello, world". All have type array of char (not const array, sadly). They differ only in their dimensionality and content.

You already know how handy it is to write string literals. No need to declare an object or make up a funny name. Just describe the contents of the string in the one place where you need to refer to it. Let the translator take care of the rest.

A compound literal extends this convenience to all the other data types in your program. You write the general form:

(T) { ..... }
wherever you need an lvalue of type T. The translator effectively replaces the expression with secret_name from the declaration:

static T secret_name = { ..... };
Need an int with the value 3 in it, for a function f that expects an argument of type pointer to int? Then you can write:

f(&(int){3})
to get the function call you need. (Yes, the function can even scribble on it, for good or for ill.)
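To see that the callee really can scribble on the literal, here is a minimal sketch under the proposed semantics. (The function names bump and demo are mine, not part of the proposal.)

```c
/* The compound literal is a true lvalue, so the callee can
   modify the object behind the pointer. */
static void bump(int *p)
   {
   *p += 1;
   }

static int demo(void)
   {
   int *p = &(int){3};   /* points at a nameless int holding 3 */

   bump(p);
   return *p;            /* the scribble is visible: now 4 */
   }
```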

Or maybe you've declared type complex to be the usual form:

typedef struct {
   double re, im;
   } complex;
Then you can write the pure imaginary i as the compound literal (complex) {0, 1}.
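Compound literals make such a type pleasant to compute with as well. Here is a sketch of a complex adder that builds its result as a compound literal; the function name cadd is my own invention.

```c
typedef struct {
   double re, im;
   } complex;

/* add two complex values, returning the sum as a compound literal */
static complex cadd(complex a, complex b)
   {
   return (complex){a.re + b.re, a.im + b.im};
   }
```

A caller can feed it literals directly, as in cadd((complex){1, 2}, (complex){0, 1}).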

For declaring arrays, you can even indulge the common practice of letting the brace-enclosed initializer determine the actual size of the array, as in:

double *pten2n = (double []) {1, 10,
   100, 1e4, 1e8, 1e16, 1e32,
#ifdef IEEE754
   1e64, 1e128, 1e256,
#endif
};
No need to make the code even uglier by putting the array size inside still more #ifdef logic.
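And since the brace-enclosed list completes the array type, you can ask sizeof how big the array came out, with no #ifdef bookkeeping at all. A small sketch (the function name nelems is mine):

```c
#include <stddef.h>

/* sizeof sees the completed array type, so the translator
   reports back however many elements the braces supplied */
static size_t nelems(void)
   {
   return sizeof (double []){1, 10, 100, 1e4} / sizeof (double);
   }
```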

Obviously, most of us have survived to date without compound literals, but they sure can come in handy if you've got 'em. Here is an extension to C that is obviously of general interest.

Designated Initializers

A related extension solves still other problems with writing initializers. The only way you can initialize a union, for example, is with a value suitable for its first member. The only way to initialize the thirteenth member of a struct, or the thirteenth element of an array, is to provide initializers for all the earlier stuff as place holders. Wouldn't it be nice if you could do more, and make your code more readable in the bargain?

Thus the call for designated initializers. These are members of an initializer list that name the thing they intend to initialize. Here are some examples:

complex imag = {.im = 1};
union {
   float fl; int in;
   } int_one = {.in = 1};
char line[80] = {
   '<', '<', [78] = '>', '>'};
The leading dot and member name designate a member to initialize in a struct or union. The leading integer value in brackets designates an element to initialize in an array. As with enumeration constants, you can mix positional and designated initializers. The initializer for line, for example, provides nonzero values for elements 0, 1, 78, and 79.

An array element designator can be an arbitrary constant expression, not just an integer value. Hence, you can even use enumeration constants to make more readable tables, as in:

enum {red = 1, green = 2, blue = 4};
const char *names[] = {
   [red] = "red",
   [green] = "green",
   [blue] = "blue",
   [red+green] = "yellow",
   [blue+green] = "cyan",
   [red+blue] = "magenta",
   [0] = "black",
   [red+blue+green] = "white"};
So you can use designated initializers when you don't know (or care about) the actual position within an aggregate object. (Type div_t, declared in <stdlib.h>, doesn't guarantee the ordering of its members quot and rem, for instance. Standard C offers no portable way to initialize such a type to anything other than all zeros.) You can use them to skip long, uninteresting stretches where zero initializers will serve quite adequately. And, of course, you can even use them in compound literals, as in:

   return (complex) {.im = 1};
(not that this particular example is very compelling).
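The div_t case does make for a more compelling compound literal, though. Here is a sketch that builds a div_t by member name, so it works no matter how the implementation orders quot and rem. (The function name one_third is mine.)

```c
#include <stdlib.h>

/* div_t's member order is unspecified, but a designated
   initializer names its targets and so doesn't care */
static div_t one_third(void)
   {
   return (div_t){.quot = 0, .rem = 1};   /* 1/3: quotient 0, remainder 1 */
   }
```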

The one problem this set of extensions does not address is writing repeated (nonzero) initializers. What if you wanted line, declared above, to be filled with spaces as well as having angle brackets on both ends? You're back to writing long string literals and counting characters, or writing lots of comma-separated place holders.
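(Or, for the character-array case at least, you can fall back on run-time initialization. Here is a sketch of that workaround; the function name make_line is mine.)

```c
#include <string.h>

/* Standard C workaround for repeated initializers:
   fill the whole array, then patch the ends by hand */
static void make_line(char line[80])
   {
   memset(line, ' ', 80);
   line[0] = line[1] = '<';
   line[78] = line[79] = '>';
   }
```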

Years ago, I added an extension to the Whitesmiths compilers that proved popular. It also used a constant expression in leading square brackets, but without the following equal sign. The constant expression served as a repetition count, as in:

char line[80] = {
    '<', '<', [76] ' ', '>', '>'};
Please note that this extension is not proposed by X3J11.1. It happens to be compatible with it, strictly speaking at least. (I suspect the double use of square brackets would lead to some confusion.) But in my experience, it is of comparable value to the extensions actually proposed by X3J11.1.

With or without repetition counts, compound literals and designated initializers certainly add to the expressiveness of Standard C. Even better, the complexity they add to the language and to translators is fairly localized. (They do cause problems for single-pass translators, to be sure, but not insurmountable ones.) On balance, I rather like them.

The Aliasing Problem

One of the darker moments in the development of the C Standard occurred near the end. It began, simply enough, with a request from Tom MacDonald of Cray Research. He wanted some way to assist C compilers in discovering code that can be "parallelized," or converted to an equivalent code sequence that executes many operations in parallel on a computer that supports such speedups. (It is far from coincidence that Cray is a leading supplier of such computers.)

This is a difficult problem which we foolishly endeavored to solve in just one or two meetings. The result was the eleventh-hour addition of a new type qualifier to Standard C. The keyword noalias, with accompanying semantics, went into the draft at one meeting and got yanked out at the next. Dennis Ritchie, the developer of C, attended that second meeting to ensure that noalias got yanked. It was the only meeting of X3J11 he saw fit to attend. That should give you a feeling of the magnitude of our mistake.

But the underlying problem did not go away with the keyword. Many numerical programmers stick with FORTRAN for one simple, practical reason. Compilers can optimize FORTRAN programs for parallel execution much better than can C compilers of comparable sophistication. Any number of serious numerical applications just plain run faster when written in FORTRAN for highly parallel computers.

MacDonald and others remain determined to lure more numerical programmers away from FORTRAN and into C. With decreasing hardware costs, parallelism is becoming more commonplace in new computer architectures. Thus, helping C compilers parallelize code is a topic of growing concern to the entire community.

Here is the archetypical problem to be solved:

void f(float a[], float b[], float c[], size_t n)
   { /* form vector sum a = b + c */
   size_t i;
   
   for (i = 0; i < n; ++i)
      a[i] = b[i] + c[i];
   }
I have intentionally cast this in a form that parallels FORTRAN closely, but it should still compile into fairly efficient C.

FORTRAN, as a call-by-reference language, also passes pointers to the various arguments (even n, not that it matters here). But that language imposes a special constraint on function or procedure arguments — they are not supposed to overlap. Put another way, the compiler is free to rearrange the actual loads and stores for arguments on the assumption that no harm will ensue. If your arguments really overlap and you don't get the answer you expect, that's your tough luck.

Thus, FORTRAN can take advantage of things like vector arithmetic units in parallel computers. The generated code might pick up a whole bunch of b[i]s, add in the corresponding c[i]s, then store them all in the proper a[i]s. That's fine, given that none of the values stored are also read during the vector calculation. Overlap causes all sorts of interesting possibilities. Depending on how big the chunks are, and in what order they're accessed, you can end up with all sorts of different results. You can see why FORTRAN is happy to ban such overlap.

We C programmers have it much worse. C imposes no restrictions on the overlap of objects designated by pointers, whether or not the pointers are arguments. The translator cannot, in general, determine whether *p and *q, or even *p and x, are aliases for the same region of storage. So long as all the lvalues have essentially the same type, aliasing is always a possibility. (I won't discuss what I mean by "essentially" here. That's another long story.)

Even worse, in this context at least, Standard C makes some pretty strong promises about sequencing of code execution. Each of the assignments in that for loop is bracketed by sequence points. Any earlier stores must have occurred before the current assignment expression starts being evaluated. The store called for by the current assignment must occur before the next expression gets evaluated. Thus, you can habitually write code such as:

for (c[0] = 'x', i = 1; i < n; ++i)
     c[i] = c[i - 1];
and know that it will propagate the value 'x' throughout the array.
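To make the hazard concrete, here is the propagating loop next to a hand-written caricature of what a vectorizer might do: gather a chunk of reads first, then scatter all the stores. The function names and the gather/scatter shape are my own illustration, not anything a real compiler emits. When source and destination alias, the two orders give different answers.

```c
#include <stddef.h>

/* what Standard C promises: each store completes
   before the next element is read */
static void propagate(char c[], size_t n)
   {
   size_t i;

   for (c[0] = 'x', i = 1; i < n; ++i)
      c[i] = c[i - 1];
   }

/* a caricature of vectorized order: read a whole chunk,
   then store it, moving loads past sequence points */
static void chunked(char c[], size_t n)   /* assumes n <= 9 */
   {
   char tmp[8];
   size_t i;

   c[0] = 'x';
   for (i = 1; i < n; ++i)
      tmp[i - 1] = c[i - 1];   /* gather first ... */
   for (i = 1; i < n; ++i)
      c[i] = tmp[i - 1];       /* ... then scatter */
   }
```

Run both on the array "abcde" and propagate leaves "xxxxx", while chunked leaves "xxbcd" — the gathered reads never see the later stores.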

Aliasing Control

So here is the classic dilemma of optimization. You really want your C translator to aggressively rearrange your program to speed it up a lot, except when you don't. Put another way, you only want the translator to move stores past sequence points when the resulting code behaves as if the sequence points were properly honored. That's a tall order for a translator that has to deal with arbitrary aliasing through pointers.

There's precedent for helping out C translators. The storage class register, for example, appeared early in C purely as an optimization crutch. Its only real semantics is that you can't take the address of a register object. But that helps the translator squirrel such objects in special places, such as fast machine registers. If you've guessed right, and if the translator can take advantage of your hint, your register declaration will make your code smaller and faster.

Modern C translators generally do a better job of using registers than you ever will. Often, your suggestions are simply ignored. But global flow analysis is still not up to solving the aliasing problem. C translators can match FORTRAN optimizations only with a bit of help.

The help proposed by X3J11.1 takes the form of a new keyword called restrict. You use it as a type qualifier (like const or volatile), but for pointer lvalues only. Such a "restricted pointer" promises exclusive access to the object it currently designates. For example, our archetypical vector sum can now be written:

void f(float * restrict a, float b[], float c[],
   size_t n)
   { /* form vector sum a = b + c */
   size_t i;

   for (i = 0; i < n; ++i)
      a[i] = b[i] + c[i];
   }
The translator is promised that the object dynamically designated by a (in this case an array of n elements of type float) overlaps no other objects in the scope of the pointer a. Thus it is safe to push around stores into this array rather aggressively within the body of the function. That's just the kind of promise a smart vectorizer needs to do its thing.

Note that there is no harm in having the objects designated by b and c overlap, since both are accessed but never modified in this function. You could also declare these arguments as restricted pointers, but a smart translator would neither benefit nor suffer from the added information.
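Here is the same function as a compilable sketch, with b and c marked const to record that they are only read. (The const decoration and the name vsum are mine; only the restrict on a is what X3J11.1 proposes, and it requires a translator that accepts the extension.)

```c
#include <stddef.h>

/* a promises to overlap nothing else visible here,
   so the loop's stores may be reordered aggressively */
static void vsum(float * restrict a, const float b[],
   const float c[], size_t n)
   {
   size_t i;

   for (i = 0; i < n; ++i)
      a[i] = b[i] + c[i];
   }
```

A call such as vsum(x, y, z, 3) with three distinct arrays keeps the promise; passing x as both destination and source would break it, and the translator need not warn you.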

Restricted pointers can occur outside function arguments. They make sense as static objects, in which case they strongly resemble C++ references. (But be warned, they are not quite the same thing.) They even make sense as members of a structure or union.

In all cases, however, restricted pointers represent a fairly subtle compact between a sophisticated programmer and an equally sophisticated C translator. Even if you have access to a translator with this extension, I recommend that you resist the urge to use it. It's the kind of decoration that can make a critical piece of code run substantially faster, but at a cost. The code will be less portable and harder to understand. In other words, kids, don't try this at home.

From what I understand of restricted pointers, they're somewhat better thought out than our abortive noalias proposal. That's mostly because they're designed to closely model the proven machinery found in FORTRAN optimizers. By the same token, restricted pointers are rather ad hoc. They solve a specific problem in a specific way. We tried to generalize the issue with noalias and succeeded in some important ways. But we failed in the most important way of all — making the machinery clear and believable. What the heck.