January 1994/Standard C


Standard C

Technical Corrigendum 1

P.J. Plauger

P.J. Plauger is senior editor of The C Users Journal. He is convenor of the ISO C standards committee, WG14, and active on the C++ committee, WG21. His latest books are The Standard C Library, and Programming on Purpose (three volumes), all published by Prentice-Hall. You can reach him at pjp@plauger.com.

Introduction
No sooner had the ANSI C Standard hit the streets in 1989 than people started asking questions about it. Some questions were simple requests for enlightenment, such as how to interpret an apparent ambiguity or oversight. Others were direct challenges to the correctness or completeness of the C Standard itself. In each case, ANSI rules required that committee X3J11 respond to the query as a Request For Interpretation, or RFI.
X3J11 did its best to respond to four dozen such RFIs. Unfortunately, the rules of the game did not permit us to easily change the wording of the C Standard. We often had to excuse what we wrote instead of simply making it a bit clearer. Equally unfortunately, those dozens of RFIs and the committee responses were slow to see the light of day. The first batch was only recently approved by ANSI for publication.
Meanwhile, ISO committee JTC1/SC22/WG14 assumed more and more responsibility for the C Standard. It didn't help that the ISO rules for interpreting and fixing language standards were, in their own way, as obscure and inappropriate as ANSI's. It was not until the August 1992 plenary meeting of SC22 that we laid down sensible procedures for responding to queries and challenges from the public.
WG14, with lots of cooperation from X3J11, has worked hard for the past year or so to catch up. We've now processed nearly five dozen Defect Reports (or DRs), the ISO analog to RFIs. All of the responses to date have been gathered into a Record of Responses (or RR) that is now being balloted within SC22. By the time you read these words, the ballot should be closed. I'll be astonished if the ballot yields any serious opposition. Too many experts from too many countries have labored for too many years for any major objections to remain hidden.
A companion document to the Record of Responses, called a Technical Corrigendum (or TC), summarizes all the changes that WG14 now recommends to the ISO C Standard. The RR is not normative but the TC is. Most of the changes are designed to clarify wording that can be misread. A few resolve ambiguities or patch holes that are hard to argue away. Just one or two definitely change the rules of C — to make the language more like what X3J11 meant instead of what we ended up saying. None of the changes add significant new features to Standard C, or take any away.
What I present here are the actual instructions for changing the ISO C Standard (ISO/IEC 9899:1990). Be warned that they are in draft form — they can still change in response to comments from the balloting. I expect such changes to be small, however. I have extracted changes to the appendixes and put them at the end. Otherwise, the changes occur in no particular page order. That's the way we responded to the questions as they came in.
If you have the ANSI version instead (ANSI X3.159:1989), you'll find the leading digit of the subclause number differs. Usually, the ISO number is three higher. You'll also find that page numbers tend to be off by one, mostly. But even if you lack a copy of the C Standard in any form, you should be able to make sense of what follows.
I've added my own commentary in italics to explain the reason for each change. The actual TC reproduces the Defect Report that led to the change, but I lack the space to do the same here. The words in boldface are the meta-instructions from the TC describing where each change should occur.

The Changes
Some implementors wanted to avoid copying structures in a function returning a structure, even if that meant the return value might overlap a structure argument value. We wanted to clarify that this is not permissible:
Add to subclause 6.6.6.4, page 80:
The overlap restriction in subclause 6.3.16.1 does not apply to the case of function return.
Example
In:

struct s {double i;} f(void);
union {
    struct {int f1; struct s f2;} u1;
    struct {struct s f3; int f4;} u2;
} g;
struct s f(void)
{
    return g.u1.f2;
}
/* ... */
g.u2.f3 = f();
the behavior is defined.
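To see the example run, here is a minimal sketch that wraps it in a hypothetical helper, copy_through_return, which stores a value through g.u1 and reads it back through g.u2. It assumes a compiler that honors the rule above, so the value returned by f behaves as a separate copy even though g.u1.f2 and g.u2.f3 may overlap in storage.

```c
struct s { double i; };

/* The union object g from the example: f2 and f3 may share storage. */
union {
    struct { int f1; struct s f2; } u1;
    struct { struct s f3; int f4; } u2;
} g;

struct s f(void)
{
    return g.u1.f2;
}

/* Hypothetical helper: store v via u1, copy it into u2 through f(). */
double copy_through_return(double v)
{
    g.u1.f2.i = v;
    g.u2.f3 = f();     /* defined: the overlap restriction does not apply */
    return g.u2.f3.i;  /* read back through the member just assigned */
}
```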
We missed one or two places where the C grammar is ambiguous. Sometimes it's hard to tell from context whether a type definition is being used a different way in a nested scope. We generalized the guideline originally laid down by Dennis Ritchie:
In subclause 6.5.4.3, page 68, change:
In a parameter declaration, a single typedef name in parentheses is taken to be an abstract declarator that specifies a function with a single parameter, not as redundant parentheses around the identifier for a declarator.
to:
If, in a parameter declaration, an identifier can be treated as a typedef name or as a parameter name, it shall be taken as a typedef name.
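A small sketch of the rule, with hypothetical names apply and identity. Because T is a typedef name, the parameter declaration int (T) declares an unnamed parameter of type "function taking a T and returning int," which is adjusted to a pointer to function. The second declaration of apply is compatible with the first only under that reading, so a compiler that follows the rule accepts the pair.

```c
typedef int T;

/* "(T)" could be redundant parentheses around a parameter named T,
   or a function declarator with one parameter of type T.
   The typedef reading wins, so this parameter is int (*)(int). */
int apply(int (T));

/* Compatible with the declaration above only under the typedef reading. */
int apply(int (*fp)(int))
{
    return fp(42);
}

int identity(int v)
{
    return v;
}
```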
We got the words wrong regarding two declarations for the same name. We meant to have the later declaration assume the composite type of the two, even if the earlier declaration was in an outer scope. This is a substantive change to make Standard C behave as we intended:
In subclause 6.1.2.6, page 25, change:
For an identifier with external or internal linkage declared in the same scope as another declaration for that identifier, the type of the identifier becomes the composite type.
to:
For an identifier with internal or external linkage declared in a scope in which a prior declaration of that identifier is visible*, if the prior declaration specifies internal or external linkage, the type of the identifier at the latter declaration becomes the composite type. [*Footnote: As specified in 6.1.2.1, the latter declaration might hide the prior declaration.]
Here is a similar error regarding the determination of storage class. We meant the rule to apply across any two scopes, not just file scope and another one, so we fixed it:
In subclause 6.1.2.2, page 21, change:
If the declaration of an identifier for an object or a function contains the storage-class specifier extern, the identifier has the same linkage as any visible declaration of the identifier with file scope. If there is no visible declaration with file scope, the identifier has external linkage.
to:
For an identifier declared with the storage-class specifier extern in a scope in which a prior declaration of that identifier is visible*, if the prior declaration specifies internal or external linkage, the linkage of the identifier at the latter declaration becomes the linkage specified at the prior declaration. If no prior declaration is visible, or if the prior declaration specifies no linkage, then the identifier has external linkage. [*Footnote: As specified in 6.1.2.1, the latter declaration might hide the prior declaration.]
We wanted to clarify how a tentative array definition with unknown size gets completed. Adding an example changes no normative wording, but provides a useful hint to the reader:
Add to subclause 6.7.2, page 84:
Example
If at the end of the translation unit containing

int i[];
the array i still has incomplete type, the array is assumed to have one element. This element is initialized to zero on program startup.
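A minimal sketch of the same situation, using a hypothetical name counts in place of the example's i. Most compilers issue a warning that the array is assumed to have one element, which is exactly the behavior the example describes.

```c
/* Tentative definition with unknown size; nothing later completes it. */
int counts[];

/* Hypothetical accessor: element 0 exists and starts out as zero. */
int first_count(void)
{
    return counts[0];
}
```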
We wanted to clarify that array arguments become pointer arguments rather early in the life of a function prototype. You can treat arrays as pointers both for determining type compatibility and for forming a composite type:
In subclause 6.5.4.3, page 68, lines 23-25, change the two occurrences of:
its type for these comparisons
to:
its type for compatibility comparisons, and for determining a composite type.
A similar confusion recurs on just when a structure type becomes complete. We clarified that completion occurs at the closing brace in the structure definition:
In subclause 6.5.2.3, page 62, line 27, change:
occurs prior to the declaration that defines the content
to:
occurs prior to the } following the struct-declaration-list that defines the content
Yet another confusion recurs about when the size of an enumeration is known:
Add to subclause 6.5.2.3, page 63:
Example
An enumeration type is compatible with some integral type. An implementation may delay the choice of which integral type until all enumeration constants have been seen. Thus in:

enum f { c = sizeof(enum f)};
the behavior is undefined since the size of the respective enumeration type is not known when sizeof is encountered.
Some people read the description of fscanf as requiring a conversion failure on %n when the input is exhausted. That was not our intent:
Add to subclause 7.9.6.2, page 138:
Example
In:

#include <stdio.h>
/* ... */
int d1, d2, n1, n2, i;
i = sscanf("123", "%d%n%n%d", &d1, &n1, &n2, &d2);
the value 123 is assigned to d1 and the value 3 to n1. Because %n can never get an input failure, the value of 3 is also assigned to n2. The value of d2 is not affected. The value 1 is assigned to i.
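The example runs as is. This sketch wraps it in a hypothetical function, run_percent_n, that presets d2 to -1 so you can see that it is left alone. The return value counts only d1, since executing a %n directive does not add to the assignment count that sscanf returns.

```c
#include <stdio.h>

/* Hypothetical wrapper around the example; returns what sscanf returns. */
int run_percent_n(int *d1, int *n1, int *n2, int *d2)
{
    *d2 = -1;   /* sentinel: the final %d hits end of input and never stores */
    return sscanf("123", "%d%n%n%d", d1, n1, n2, d2);
}
```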
We made clearer just what is meant by the implicit initialization of static objects to zero:
In subclause 6.5.7, pages 71-72, change:
If an object that has static storage duration is not initialized explicitly, it is initialized implicitly as if every member that has arithmetic type were assigned 0 and every member that has pointer type were assigned a null pointer constant.
to:
If an object that has static storage duration is not initialized explicitly, it is initialized implicitly according to these rules:

if it has pointer type, it is initialized implicitly to a null pointer constant;

if it has arithmetic type, it is initialized implicitly to zero;

if it is an aggregate, every member is initialized (recursively) according to these rules;

if it is a union, the first named member is initialized (recursively) according to these rules.
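A sketch of the recursive rules, with made-up types: a static object with no initializer, holding arithmetic members, pointers, a nested array of structures, and a union whose first named member is a pointer. The helper obj_is_zeroed is hypothetical; it checks each piece against zero or a null pointer.

```c
#include <stddef.h>

struct inner { int count; double ratio; char *name; };
union either { char *text; int code; };   /* first named member: a pointer */
struct outer { struct inner in[2]; union either u; int *ptr; };

static struct outer obj;   /* static storage duration, no initializer */

/* Hypothetical check of the implicit initialization rules. */
int obj_is_zeroed(void)
{
    return obj.in[0].count == 0        /* arithmetic: zero */
        && obj.in[1].ratio == 0.0      /* recursively, inside the array */
        && obj.in[1].name == NULL      /* pointer: null pointer */
        && obj.u.text == NULL          /* union: first named member */
        && obj.ptr == NULL;
}
```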
It was not completely clear that a newline always ends a preprocessing directive:
Add to subclause 6.8, page 86, Description:
A new-line character ends the preprocessing directive even if it occurs within what would otherwise be an invocation of a function-like macro.
Some situations in Standard C are described as both constraint violations and undefined or implementation-defined behavior. We decided to clarify the precedence of errors:
Add to subclause 5.1.1.3, page 6:
If a construct violates a constraint and is also specified as having undefined or implementation-defined behavior the constraint takes precedence.
Example
An implementation shall issue a diagnostic for the translation unit:

char i;
int i;
because in those cases where wording in this International Standard describes the behavior for a construct as being both a constraint error and resulting in undefined behavior, the constraint error shall be diagnosed.
Some people felt it was not obvious enough that the members of a structure or union inherit its storage class:
Add to subclause 6.5.1, page 58:
A declaration of an aggregate or union with a storage-class specifier other than typedef implicitly causes all of its members to be given the storage-class specifier.
We wanted to clarify that assignment to a narrower type does indeed effectively stuff the value through a knothole, scraping off high-order bits:
Add to subclause 6.3.16.1, page 54:
Example
In the fragment:

char c;
int i;
long l;
l = ( c = i );
the value of i is converted to the type of the assignment-expression c = i, that is, char type. The value of the expression enclosed in parentheses is converted to the type of the outer assignment-expression, that is, long type.
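A sketch of the knothole at work, with hypothetical function names. It uses unsigned char rather than the example's plain char, because converting an out-of-range int to a signed char gives an implementation-defined result, while the unsigned conversion is fully defined: the value is reduced modulo UCHAR_MAX + 1. With the usual 8-bit char, 300 comes out the other side as 44.

```c
#include <limits.h>

/* The fragment from the example, with unsigned char for a defined result. */
long through_knothole(int i)
{
    unsigned char c;
    long l;

    l = (c = i);   /* c = i drops the high-order bits; the char value widens to long */
    return l;
}

/* What survives the knothole, for a nonnegative int. */
long expected_knothole(int i)
{
    return (long)(i % (UCHAR_MAX + 1));
}
```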
Some people were confused about the meaning of "ignored" when talking about unnamed structure or union members during initialization:
In subclause 6.5.7, page 71, line 39, change:
All unnamed structure or union members are ignored during initialization.
to:
Except where explicitly stated otherwise, for the purposes of this subclause unnamed members of objects of struct and union type do not participate in initialization. Unnamed members of struct objects have indeterminate value even after initialization. A union containing only unnamed members has indeterminate value even after initialization.
In subclause 6.5.7, page 72, lines 4-5, change:
The initial value of the object is that of the expression:
to:
The initial value of the object, including unnamed members, is that of the expression:
How macros get expanded is a source of confusion to many. We added yet another example to help clarify this difficult topic:
Add to subclause 6.8.3.3, page 90:
Example

#define hash_hash # ## #
#define mkstr(a) # a
#define in_between(a) mkstr(a)
#define join(c, d) in_between(c hash_hash d)
char p[] = join(x, y); /* equivalent to char p[] = "x ## y"; */
The expansion produces, at various stages:

join(x, y)
in_between(x hash_hash y)
in_between(x ## y)
mkstr(x ## y)
"x ## y"
In other words, expanding hash_hash produces a new token, consisting of two adjacent sharp-signs, but this new token is not the catenation operator.
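The example compiles as shown. This sketch adds a hypothetical check, join_matches, that compares p with the expected string, so you can confirm that your preprocessor treats the manufactured ## as an ordinary token.

```c
#include <string.h>

#define hash_hash # ## #
#define mkstr(a) # a
#define in_between(a) mkstr(a)
#define join(c, d) in_between(c hash_hash d)

char p[] = join(x, y);   /* should be equivalent to char p[] = "x ## y"; */

/* Hypothetical check: nonzero if the expansion produced "x ## y". */
int join_matches(void)
{
    return strcmp(p, "x ## y") == 0;
}
```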
Here's a one-word change, to clarify that we are talking about identifiers in general and not some (unspecified) one in particular:
In subclause 7.1.2, page 96, lines 34-35, change:
However, if the identifier is declared or defined in more than one header,
to:
However, if an identifier is declared or defined in more than one header,
The functions ftell and fgetpos can fail, for instance on a stream that cannot be positioned. Only values returned by successful calls are permitted in certain contexts:
In subclause 7.9.9.2, page 145, lines 39-40, change:
a value returned by an earlier call to the ftell function
to:
a value returned by an earlier successful call to the ftell function
In subclause 7.9.9.3, page 146, lines 10-11, change:
a value obtained from an earlier call to the fgetpos function
to:
a value obtained from an earlier successful call to the fgetpos function
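A sketch of the safe pattern, written as a hypothetical function reread_after_ftell. It checks the value ftell returns (-1L on failure, with a positive value stored in errno) before handing it to fseek. It assumes tmpfile can create a scratch file; the same discipline applies to fgetpos, which returns nonzero on failure.

```c
#include <stdio.h>

/* Hypothetical: returns the character at offset 1 of a scratch file,
   rereading it via a saved position, or -1 on any failure. */
int reread_after_ftell(void)
{
    FILE *fp = tmpfile();
    long pos;
    int ch = -1;

    if (fp == NULL)
        return -1;
    fputs("abc", fp);
    rewind(fp);
    getc(fp);                    /* consume 'a' */
    pos = ftell(fp);             /* may fail: returns -1L */
    if (pos != -1L) {            /* use the value only if ftell succeeded */
        getc(fp);                /* move past 'b' */
        if (fseek(fp, pos, SEEK_SET) == 0)
            ch = getc(fp);       /* reread 'b' */
    }
    fclose(fp);
    return ch;
}
```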
We really didn't say clearly what is the type of a function call expression:
In subclause 6.3.2.2, page 40, line 35, change:
The value of the function call expression is specified in 6.6.6.4.
to:
If the expression that denotes the called function has type pointer to function returning an object type, that object type is the type of the result of the function call. The value of the function call is determined by the return statement that executes within the called function, as specified in 6.6.6.4. Otherwise, the function call has type void.
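We used two different terms for the same things: "iteration control structures" and "selection control structures" in one place, "iteration statements" and "selection statements" elsewhere. This change eliminates the form we used only once: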
In subclause 5.2.4.1, page 13, lines 1-2, change:
— 15 nested levels of compound statements, iteration control structures, and selection control structures
to:
— 15 nested levels of compound statements, iteration statements, and selection statements
Some readers insisted on believing that an expression such as x<3&&0>x must be parsed to include the header-name token <3&&0>, and hence requires a diagnostic. It was easier to add a sentence to the C Standard than to continue to fight such perversity:
Add to subclause 6.1, page 18:
There is one exception to this rule: a header-name preprocessing token is only recognized within a #include preprocessing directive, and within such a directive, a sequence of characters that could be either a header-name or a string-literal is recognized as the former.
Here is a similar, but milder, form of the same perversity:
Add to subclause 6.1.2, page 20:
When preprocessing tokens are converted to tokens during translation phase 7, if a preprocessing token could be converted to either a keyword or an identifier, it is converted to a keyword.
More cleanup of header-name parsing:
In subclause 6.1.7, page 32, delete:
Constraint
Header name preprocessing tokens shall only appear within a #include preprocessing directive.
Add to subclause 6.1.7, page 32:
The header-name preprocessing token is recognized only within a #include preprocessing directive.
The alternate form (the # flag) for o conversion in fprintf, as in %#o, has some subtle implications. It is not the same as forcing a zero fill. Nor is it the same as forcing increased precision:
In subclause 7.9.6.1, page 132, lines 37-38, change:
For o conversion, it increases the precision to force the first digit of the result to be a zero.
to:
For o conversion, it increases the precision, if and only if necessary, to force the first digit of the result to be a zero.
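A few cases make the distinction concrete. The helper octal_matches is hypothetical; it formats one value with sprintf and compares the result. With %#.3o, the precision already supplies a leading zero, so the revised wording calls for no further increase. The last two cases contrast the alternate form with a zero fill.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical: format value with spec, compare with expected text. */
int octal_matches(const char *spec, unsigned value, const char *expected)
{
    char buf[32];

    sprintf(buf, spec, value);
    return strcmp(buf, expected) == 0;
}
```

For example, octal_matches("%#o", 8u, "010") and octal_matches("%#.3o", 8u, "010") both hold, while "%#5o" pads 8 to "  010" and "%05o" zero-fills it to "00010".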
Similarly, the matching rules for fscanf seem to need no end of clarification:
In subclause 7.9.6.2, page 135, change:
An input item is defined as the longest matching sequence of input characters, unless that exceeds a specified field width, in which case it is the initial subsequence of that length in the sequence.
to:
An input item is defined as the longest sequence of input characters which does not exceed any specified field width and which is, or is a prefix of, a matching input sequence.
In subclause 7.9.6.2, page 137, delete:
If conversion terminates on a conflicting input character, the offending input character is left unread in the input stream.
Add to subclause 7.9.6.2, page 137:
fscanf pushes back at most one input character onto the input stream.* Therefore, some sequences that are acceptable to strtod, strtol, or strtoul are unacceptable to fscanf. [*Footnote: If conversion terminates on a conflicting input character, the offending input character is left unread in the input stream.]
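A sketch of the strtod side, using a hypothetical helper strtod_consumed. Given "100ergs", strtod converts "100" and leaves its end pointer at the e, because it can back off as far as it needs. A conforming fscanf with %f reads "100e", which is a prefix of a valid floating constant, and then meets the r. Since it can push back only that one character, it cannot return to "100", so the conversion ends in a matching failure. The sketch checks only strtod, whose behavior here is unambiguous.

```c
#include <stdlib.h>

/* Hypothetical: convert s with strtod, report how many characters it used. */
long strtod_consumed(const char *s, double *out)
{
    char *end;

    *out = strtod(s, &end);
    return (long)(end - s);
}
```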
The following change started out in an entirely different arena. We wanted to clarify that an implementation can add extra identifier characters, such as $, provided that it issues a diagnostic when they're used. But we discovered an ambiguity in how such extra characters would parse in a macro definition. So we decided to resolve the ambiguity and make the extension more usable:
Add to subclause 6.8, page 86, Constraints:
If the first character of a replacement-list is not a member of the minimal basic source character set*, there shall be white-space separation between the identifier and the replacement-list. [*Footnote: "Minimal basic source character set" refers to the 90-odd basic source characters listed in subclause 5.2.1.]
We thought it was clear enough that library macros should be written sensibly, but not everyone seemed to agree:
Add to subclause 7.1.2, page 96:
Any definition of a macro described in this clause shall expand to code that is fully protected by parentheses where necessary, so that it groups in an arbitrary expression as if it were a single identifier.
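Two hypothetical macros show why the requirement matters. BAD_SUM expands to bare tokens, so a surrounding multiplication binds to its first operand. GOOD_SUM is fully parenthesized, so it groups like a single identifier, as the new wording demands of library macros.

```c
/* Hypothetical macros, for contrast only. */
#define BAD_SUM(a, b)  a + b          /* unprotected expansion */
#define GOOD_SUM(a, b) ((a) + (b))    /* groups as a single operand */

int bad_twice(void)  { return 2 * BAD_SUM(3, 4); }   /* 2 * 3 + 4 */
int good_twice(void) { return 2 * GOOD_SUM(3, 4); }  /* 2 * (3 + 4) */
```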
Here's a small but potentially misleading gaffe in an example:
Change subclause 7.12.2.3, page 172, line 16, from:

if (mktime(&time_str) == -1)
to:

if (mktime(&time_str) == (time_t)-1)
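The cast matters because time_t is only required to be an arithmetic type. If it were, say, an unsigned type narrower than int, comparing the result against a bare -1 could never succeed. A minimal sketch of the corrected idiom, with a hypothetical wrapper mktime_succeeds:

```c
#include <time.h>

/* Hypothetical: nonzero if mktime accepts the given local date at noon. */
int mktime_succeeds(int year, int mon, int mday)
{
    struct tm t = {0};

    t.tm_year = year - 1900;
    t.tm_mon = mon - 1;
    t.tm_mday = mday;
    t.tm_hour = 12;
    t.tm_isdst = -1;                  /* let mktime determine DST */
    return mktime(&t) != (time_t)-1;  /* compare against the cast value */
}
```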
And a similar error in the index:
In the index, page 217, change:
static storage-class specifier, 3.1.2.2, 6.1.2.4, 6.5.1, 6.7
to:
static storage-class specifier, 6.1.2.2, 6.1.2.4, 6.5.1, 6.7
When we listed the rules for aliasing (accessing the same object by lvalues with different types), we were overly restrictive in describing the kinds of qualified types that are valid:
In subclause 6.3, page 38, lines 18-21, change:
An object shall have its stored value accessed only by an lvalue expression that has one of the following types:[36]
— the declared type of the object,
— a qualified version of the declared type of the object,
to:
An object shall have its stored value accessed only by an lvalue expression that has one of the following types:[36]
— a type compatible with the declared type of the object,
— a qualified version of a type compatible with the declared type of the object,
Some of the functions declared in <string.h> take a length argument, which can be zero. We spelled out what happens when that argument is zero:
Add to subclause 7.11.1, page 162:
Where an argument declared as size_t n determines the length of the array for a function, n can have the value zero on a call to that function. Unless explicitly stated otherwise in the description of a particular function, pointer arguments on such a call must still have valid values, as described in subclause 7.1.7 Use of library functions. On such a call, a function that copies characters shall copy zero characters, while a function that compares two character sequences shall return zero.
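A sketch of the zero-length rules, using a hypothetical function zero_length_ok. The pointers passed are still valid arrays; the wording does not bless a null pointer even when n is zero.

```c
#include <string.h>

/* Hypothetical: exercise copy and compare functions with n == 0. */
int zero_length_ok(void)
{
    char src[] = "abc";
    char dst[] = "xyz";

    memcpy(dst, src, 0);                 /* copies zero characters */
    return strcmp(dst, "xyz") == 0       /* dst is unchanged */
        && memcmp(src, dst, 0) == 0      /* zero-length compare yields zero */
        && strncmp(src, dst, 0) == 0;
}
```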
We made clear that the macros for signal numbers defined in <signal.h> must have distinct values:
In subclause 7.7, page 120, lines 14-16, change:
and the following, each of which expands to a positive integral constant expression that is the signal number corresponding to the specified condition:
to:
and the following, which expand to positive integral constant expressions with distinct values that are the signal numbers, each corresponding to the specified condition:
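One practical payoff of distinct values is that a signal handler can switch on its argument without risking duplicate case labels. This sketch, with a hypothetical function signals_distinct, checks the six standard signal macros for the property the revised wording guarantees.

```c
#include <signal.h>

/* Hypothetical: nonzero if the standard signal numbers are positive
   and pairwise distinct. */
int signals_distinct(void)
{
    const int sig[] = { SIGABRT, SIGFPE, SIGILL, SIGINT, SIGSEGV, SIGTERM };
    const int n = (int)(sizeof sig / sizeof sig[0]);
    int a, b;

    for (a = 0; a < n; a++) {
        if (sig[a] <= 0)
            return 0;
        for (b = a + 1; b < n; b++)
            if (sig[a] == sig[b])
                return 0;
    }
    return 1;
}
```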

Listing Undefined Behavior
The UK delegation to WG14 wants a complete list of undefined behaviors in Appendix G.2. This is part of an ongoing effort to round out that list:
Add to subclause G.2, page 204:
— A program contains no function called main.
Add to subclause G.2, page 204:
— A storage-class specifier or type-qualifier modifies the keyword void as a function parameter-type-list.
Add to subclause G.2, page 204:
— For an array of arrays, the permitted pointer arithmetic in subclause 6.3.6, page 47, lines 12-40 is to be understood by interpreting the use of the word "object" as denoting the specific object determined directly by the pointer's type and value, not other objects related to that one by contiguity. Therefore, if an expression exceeds these permissions, the behavior is undefined. For example, the following code has undefined behavior:

int a[4][5];
a[1][7] = 0; /* undefined */
Some conforming implementations may choose to diagnose an "array bounds violation," while others may choose to interpret such attempted accesses successfully with the "obvious" extended semantics.
Add to subclause G.2, page 204:
— If a fully expanded macro replacement list contains a function-like macro name as its last preprocessing token, it is unspecified whether this macro name may be subsequently replaced. If the behavior of the program depends upon this unspecified behavior, then the behavior is undefined.
Example
Given the definitions:

#define f(a) a*g
#define g(a) f(a)
the invocation:

f(2)(9)
results in undefined behavior. Among the possible behaviors are the generation of the preprocessing tokens:

2*f(9)
and

2*9*g
Add to subclause G.2, page 204:
— A call to a library function exceeds an Environmental limit.

Conclusion
I believe these changes are reasonably minor. The C Standard is honored by dozens of vendors, required by hundreds of customers, and validated by several agencies around the world. Yet it has seen remarkably few challenges for all that. Put another way, the C Standard has held up pretty well these past five years.
Still, it doesn't hurt to fix the obvious flaws. The fewer ambiguities in a document, the fewer misunderstandings result. And, of course, the fewer questions get directed to those of us who generate answers.