C/C++ Contributing Editors


The Learning C/C++urve: C9X VLAs

Bobby Schmidt

Many languages support variable-length arrays. Soon Standard C will too.


Copyright © 1998 Robert H. Schmidt

I had hoped to cover the new C variable-length arrays (a.k.a. VLAs) plus a few other topics this month. But as I dug into VLAs, I found enough material to occupy more than a single column. As a result, I'll devote this month entirely to VLAs; next month will feature whatever I can't fit this time, plus the aforementioned other topics. Scott Meyers' ennui will have to wait.

Traditional Arrays, Briefly

In Standard C, arrays are defined as having a constant integral length, either explicitly:

int a[10];    /* length 10 */

or implicitly:

char const s[] = "G'day"; /* length 6 */

Arrays can also be declared (but not defined) with an unspecified length:

extern int x[]; /* unspecified length */

Here x is of an incomplete type. Wherever x's storage is actually defined, x's type must be made complete:

extern int x[5]; /* defines x with length 5 */

Regardless of how an array's length is defined, that length is typically lost once the array is passed to a function:

void f(int a[])
    {
    /* a is a pointer, not an array. */
    /* Original length is lost. */
    }

As I wrote last month, adding a dimension doesn't help:

void f(int a[10])
    {
    /* a is still a pointer, not an array. */
    }

And of course, solutions like references to array objects, and array templates, are available in C++ only.

C programmers typically get around this problem by passing the array length as an extra argument:

void f(int a[], size_t n)
    {
    /* Assume that a is of length n. */
    }
     
int a[10];
f(a, 10);

From f's perspective, a's length is known at run time. However, from the perspective of f's caller, a's length is fixed at translation time. To make a's length fixed at run time instead, we must use dynamic allocation:

int *a = malloc(sizeof(int) * 10);
f(a, 10);

But this means we are no longer really manipulating arrays; rather, f is passed in a pointer that it treats as if it were really an array. The language does not enforce this convention - only programmer discipline does.

VLAs to the Rescue

Variable-length arrays offer another approach: real arrays (not pointers) with lengths set at run time [1]. VLAs are new to C9X, the next proposed version of Standard C. They do not exist in the original C Standard, nor are they part of the C++ Standard (although C++ container classes can mimic VLAs). Your C compiler most likely does not yet support them.

In their simplest form, variable-length arrays look like normal arrays:

int a[n]

The difference is that the array length n is evaluated at run time instead of compile time:

extern size_t n; /* n's value is unknown at compile time. */
int a[n]; /* OK, n is evaluated at run time. */

Because their sizes are not reckoned at compile time, VLAs cannot have static duration:

extern size_t n;
static int a[n]; /* Error, n must be known at compile time. */

This implies that VLAs can never appear at file scope. In fact, they can appear only at block scope:

void f(size_t n)
    {
    int a[n]; /* OK, at block scope. */
    }

or at prototype scope:

extern size_t n;
     
void f(int a[n]) /* OK, at prototype scope. */
    {
    /* ... */
    }

(Yes, you masochists, this means VLAs cannot be members of structures or unions.)

Within the block or prototype declaring them, variable-length arrays act like normal arrays:

size_t n;
     
void f(char const param[n])
    {
    char const local[n];
    printf("n == %lu\n",
  (unsigned long) n);
    printf("sizeof(param) == %lu\n",
        (unsigned long) sizeof(param));
    printf("sizeof(local) == %lu\n",
        (unsigned long) sizeof(local));
    }
     
int main(void)
    {
    char const message[] = "VLAs Rule!";
    n = sizeof(message);
    f(message);
    return 0;
    }

When run with Tenon MachTen and GNU C++ 2.7.2.F.1 on my Mac, the above produces

n == 11
sizeof(param) == 4
sizeof(local) == 11

On your system, sizeof(param) may be different. Because of typical array-to-pointer decay, param is really seen by f as a pointer, and its size is therefore that of a pointer. local, on the other hand, is known to f as an array, so its size stays n.

Unspecified Lengths

In the above example, n must be declared before f, so that the prototype

void f(char const param[n])

knows what n to reference. But what if you want to forward declare f first, then declare n later:

void f(char const param[n]);
     
/* ... */
     
size_t n;
     
/* ... */
     
void f(char const param[n])
    {
    /* ... */
    }

As written, this won't compile. The first f declaration fails, since its parameter n is unknown. And if you leave n out altogether:

void f(char const param[]);

the compiler considers param a pointer instead of a VLA.

To get around this problem, the Standard C committee invented new usage for everyone's favorite token [2]:

void f(char const param);

The * is a place holder, implying the parameter has no compile-time length. Within the definition (body) of f, the compiler substitutes an actual length expression for *:

void f(char const [*]);
     
/* ... */
     
void f(char const param[n])
    {
    /* ... */
    }

You may wonder why the committee chose *, which is already overused in declaration syntax. From what I can tell, the answer is "prior art." While VLAs are new to Standard C, they have been an extension to conforming C compilers for years. Different vendors have come up with different ways of implementing them. While there was sentiment for alternatives like [?] and [:], the weight of existing code using the [*] syntax seems to have been the trump card [3].

Length Parameters

f references an array length n declared at file scope. You can also provide the length as a parameter:

void f(size_t n)
    {
    char const local[n]; /* OK */
    /* ... */
    }

The quirk comes when using n to define another parameter's length. The obvious solution

/* Error */
void f(char const param[n], size_t n)
    {
    /* ... */
    }

fails to compile! According to the C lookup rules, the declaration of n must come before any reference to n:

/* OK */
void f(size_t n, char const param[n])
    {
    /* ... */
    }

even though this ordering (first length, then array) is counter-intuitive to many C programmers. Certainly the Standard C library has ample precedent for array-first ordering:

fwrite(void const *, size_t, ...);
memcpy(void *, void const *, size_t);
strftime(char *, size_t, ...);
strncat(char *, char const *, size_t);

Nonetheless, if you want to use VLA parameters and pass in their lengths, adopt the idiom of length first, array second. I guess this is a case where prior art - so compelling for [*] - yields to other considerations.

Actually, to be fair, I can't imagine doing it any other way. If the array could come first, how would the compiler interpret

size_t n;
     
void f(int a[n], size_t n)
    {
    /* ... */
    }

Would the n in a[n] be the n in the prototype list, even though it appears after a[n]? Or the n from global scope, even though the other n is more local? Thanks to the above-cited rule, we are spared such wonderment. (However, if the ambiguity silently thrills you, please consider moving to C++, a veritable smorgasbord of name-lookup delights.)

Bad Karma

Joyfully, this length-first ordering leads to an Unintended Consequence. Remember back to the Good Ol' Days of K&R C, which had no true function protoypes. Instead, function definitions listed out their parameters as quasi-local variables:

void f(n, a)
    unsigned n;
    int a[n];
    {
    /* ... */
    }

To maintain compatibility with older K&R code, the ISO C committee allowed these sorts of parameter definitions. According to the Standard C rules, the ordering of these definitions doesn't matter. The above should be equivalent to

void f(n, a)  /* Same calling sequence... */
    int a[n]; /* ...but different definition order. */
    unsigned n;
    {
    /* ... */
    }

For K&R C and Standard C, this equivalence is true - but in those languages, the parameters don't reference each other this way. With variable-length arrays, the ordering of these definitions does matter, and the latter example will fail to compile.

While considering VLA parameters, committee members explored the notion of relaxing the scoping rules so that a[n] could come before n. However, the members finally decided such relaxation was not consistent with the venerated "Spirit of C." In actual practice, I'm hard-pressed to believe anyone would add such new features to K&R-style code, so I think the committee made the right choice.

But I Want the Array First!

If you absolutely must have your calling sequence be array first, then length, you can use an intermediate explicit pointer:

void f(int *a, size_t n)
    {
    int (*p)[n] = (void *) a;
    }

Salient points:

Here p points to a variable-length array. Compare this to a pointer to a normal array, as shown last month:

/* compile-time constant length */
size_t const n1 = 10; 
/* run-time variable length */
extern size_t n2;     
     
/* ptr to array of constant length */
char const (*p1)[n1]; 
/* ptr to array of variable length */
char const (*p2)[n2]; 

The syntax is identical; the difference lies in the semantics. In Standerdese, pointers to variable-length arrays have variably modified or VM types. At the run-time point of their definition, VM pointers evaluate the length argument (n2 here), just as VLAs do. Thus, the actual dynamic type of a given VM pointer can vary over the program's lifetime.

Like the VLAs they reference, VM pointers must appear at block or prototype scope, and cannot be struct/union members. However, unlike VLAs, such pointers can have static duration (since the pointer's size is the same regardless of what it references, and thus can be reckoned at compile time).

sizeof VLAs

In the case of a normal array:

void f(void)
    {
    char a[10];
    size_t n = sizeof(a); /* OK, n = 10 */
    }

the array size exists in the compiler's internal symbol table, and is substituted for the expression sizeof(a) at compile time. The sizeof operator does not actually evaluate its operand (a) at run time. Even though sizeof(a) may look like a function call, it really acts more like a symbolic constant.

If we change the array to be variable-length:

void f(size_t x)
    {
    char a[x];
    size_t n = sizeof(a); /* OK, n = x */
    }

the array size is completely unknown to the compiler. sizeof(a) must actually evaluate a to discover its length. Further, each time f is called, x may be different, implying a may have a different length. Thus, f cannot cache away sizeof(a), but must reevaluate it each time.

Here comes the interesting fallout:

int main(void)
    {
    size_t n = 10;
    int a[n];
    printf("sizeof(a) == %lu\n",
      (unsigned long) sizeof(a));
    ++n;
    printf("sizeof(a) == %lu\n",
      (unsigned long) sizeof(a));
    return 0;
    }

What do you expect the result of this to be? Clearly the first printf should show sizeof(a) as 10. But what about the second printf? I'm guessing you'll interpret the code one of two ways:

The correct answer depends on the semantics of n, and how it relates to a.

If you consider a and n to be somehow linked or bundled together, the first interpretation makes sense. Following a C++ analogy, a could be a container object, with n a property (data member) of that object; by changing n, you change the value and behavior of a.

However, there is no such linkage between a and n. The expression defining the array's length is evaluated exactly once, at the point of a's definition. Once a is defined, it acts just like a normal array. Thus, the second bullet point above is correct.

Finally, I offer the perverse variation

int main()
    {
    size_t n = 10;
    int a[++n];
    /* What is n?  And what is sizeof(a)? */
    return 0;
    }

which I leave as a thought problem for the student (to be answered next month).

Shades of C++

A VLA's length may be evaluated every time it's encountered, even within declarations. If yours is a resource-constrained system where every cycle counts, VLAs can have hidden costs:

extern size_t n;
int i;
for (i = 0; i < 10000; ++i)
    {
    int a[n];
    /* ... */
    }

In this example, the expression n in the declaration int a[n] may be evaluated every time through the loop. The compiler's optimizer may be able to figure out that n is unchanged during the loop, and thus may act as if the code were really

/* evaluated only once */
size_t const _constant_n = n; 
for (...)
    {
    /* a is no longer a VLA */
    int a[_constant_n]; 
    /* ... */
    }

but you cannot rely on this optimization. Certainly if the loop is too complex, or references functions outside the current translation unit, the compiler cannot make such an optimization.

As we've seen earlier, this silent run-time evaluation is not limited just to the VLAs themselves. The modified example

for (i = 0; i < 10000; ++i)
    {
    int a[n];
    int (*p)[n] = a;
    size_t x = sizeof(a);
    /* ... */
    }

may end up evaluating a and/or n 30,000 times! (This problem is exacerbated by C9X's support for mixed code and declarations, a topic I'll cover next month.)

So what looks like a static compile-time evaluation with no run-time overhead could in fact incur significant run-time cost. Novice C++ programmers learn this the hard way with constructor calls, where a class object definition may call many functions behind the scenes. Until now, C programmers have been spared such stealth cost - but that innocence is now lost.

Nonetheless, I think variable-length arrays are an attractive alternative to more prosaic pointer-based solutions. But to use them effectively, you must stay aware of their underlying run-time properties. I've shown you only some of those properties here. Next month I'll show the rest, including a couple of very subtle surprises. o

Notes

[1] The concept of variable-length arrays is not new to programming. From the little I recall of ISO Pascal, VLAs match some properties of Pascal's conformant arrays. And from comments on the ISO C committee's email reflector, I've deduced that VLAs mimic behavior of languages like FORTRAN and Algol. They certainly honor the intent of Standard C++ containers like vector and list.

[2] For you language lawyers out there: none of my compilers supports this syntax yet, so these examples are guesswork, based on my reading of the C9X Draft and its Rationale. I freely admit these examples might not work exactly as advertised.

[3] I'm guessing that the [*] is intentionally reminiscent of the Unix regular expression wildcard character *, which matches any number of characters. This suggests that an array declared [*] can match any number of elements.

Bobby Schmidt is a freelance writer, teacher, consultant, and programmer. He is also a member of the ANSI/ISO C standards committee, an alumnus of Microsoft, and an original "associate" of (Dan) Saks & Associates. In other career incarnations, Bobby has been a pool hall operator, radio DJ, private investigator, and astronomer. You may summon him at 14518 104th Ave NE, Bothell, WA 98011; by phone at +1-425-488-7696, or via Internet e-mail as rschmidt@netcom.com.