Rex Jaeschke is an independent computer consultant, author and seminar leader. He participates in both ANSI and ISO C Standards meetings and is the editor of The Journal of C Language Translation, a quarterly publication aimed at implementors of C language translation tools. Readers are encouraged to submit column topics and suggestions to Rex at 2051 Swans Neck Way, Reston, VA 22091 or via UUCP at uunet!aussie!rex or aussie!rex@uunet.uu.net.
This month we'll complete our quality-of-implementation set of puzzles. Try to debug them yourself first, and see what messages your compiler produces.
The Puzzles
1. I just know the @#$% file exists. See Listing 1.
2. When is a string literal not literal?
/* 1*/#include <stdio.h>
/* 3*/main()
/* 4*/{
/* 5*/    printf("??(Error: message)\n");
/* 6*/}
3. Side-effects in unused arguments.
/* 1*/#include <stdio.h>
/* 3*/main()
/* 4*/{
/* 5*/    int i = 5, j = 6;
/* 7*/    printf("Hello\n", ++i, j--);
/* 8*/}
Output:
Hello
Are i and j incremented and decremented, respectively?
4. It compiles, it works, so what's the problem?
/* 1*/#include <stdio.h>
/* 3*/main()
/* 4*/{
/* 5*/    void f();
/* 7*/    f(-32768);
/* 8*/}
/*10*/void f(i)
/*11*/int i;
/*12*/{
/*13*/    printf("i = %d\n", i);
/*14*/}
5. Hey, someone stole my loop!
/* 1*/#include <stdio.h>
/* 3*/main()
/* 4*/{
/* 5*/    int i;
/* 7*/    for (i = 0; i < 5; ++i);
/* 8*/    printf("i = %d\n", i);
/* 9*/}
Output:
i = 5
6. I forgot to declare the formal parameter list but it doesn't seem to mind!
/* 1*/double f(i, j)
/* 2*/{
/* 3*/    return i * j;
/* 4*/}
The Solutions
1. As indicated in the comment in Listing 1, this is a DOS-specific problem. A file named test.dat does exist in the root directory of the default disk, but you can't seem to get at it. The message produced does, however, indicate some kind of problem: the leading part of the filename is missing. On closer inspection you realize that not only is the t missing but so too is the \, and together the two mean something special in C. Of course: \t is the escape sequence for a horizontal tab, so the compiler sees the filename as a tab character followed by est.dat. Why didn't the tab show up in the output? It did, but the terminal's tab stops were set such that the tab was not obvious.
The solution? Use "\\test.dat" instead. The escape sequence \\ stands for a single backslash, so the filename really is \test.dat and no \t sequence remains.
Note that according to ANSI C, a different rule is applied with the #include preprocessor directive. For example,
#include "\test.h"
causes the preprocessor to search for the file test.h in the root directory; the \t is not interpreted as a tab. The grammar of this directive is specific to the preprocessor: the construct "..." must not be treated as a string literal. It is simply a string of arbitrary characters delimited by double quote characters. (ANSI C defines header names as tokens quite separate from string literals. The actual grammar used here is "xxx", where xxx is called a q-char-sequence. Strictly speaking, the Standard leaves the behavior undefined if a \ appears in a q-char-sequence, so treating it as a path separator is a convention of DOS implementations rather than a guarantee.)
2. It is not unreasonable to expect the following output:
??(Error: message)
However, ANSI C requires:
[Error: message)
According to PC-Lint, you have stumbled on one of ANSI C's "quiet changes," an instance where a correct program's behavior has been changed by the Standard:
line 5 - Trigraph Sequence '??(' in literal (Quiet Change)
I will not discuss trigraphs in detail except to say that a trigraph is a three-character sequence beginning with ?? that gives certain punctuation characters an alternative representation; ??( stands for [. ANSI C invented trigraphs to allow C source to be mechanically converted for machines supporting the ISO 646 character set (which can have alternate graphics for #, [, ], {, }, |, and \, for example). Trigraphs are replaced before any tokens (such as string literals) are recognized, which is why the substitution happens even inside a string literal.
3. Yes. Before a function can be called, each of its arguments must be evaluated, and the side effects of ++i and j-- are complete before the call takes place. The argument values are then placed in the function's call frame. A compiler is not required to know anything about any of the standard library functions (although it is permitted to), so provided the actual argument list is compatible with the function's prototype, the call is OK. In this case, printf's prototype uses the ellipsis notation, so there's no conflict. (The Standard also says that if the format is exhausted while arguments remain, the excess arguments are evaluated but otherwise ignored.)
Interestingly, PC-Lint had the following to say:
line 7 - number of arguments inconsistent with format
That is, it really did check the number and type of trailing arguments against the format string. This can be a very useful check, but it can only be performed if the format argument is a string literal. The inconsistency would not be detectable if the format argument were the name of a char array or a pointer initialized at runtime.
4. On the 16-bit compilers I tested, by far the most common output was
i = -32768
However, another result is possible:
i = -1
[You can even get 0 on some machines. pjp]
One compiler gave the following hints as to why:
line 7 - Function f has no prototype
line 7 - Constant has long type
line 10 - Parameter list for f is inconsistent with previous call
We know that on a 16-bit two's-complement machine, the smallest int value is -32768. As such, many people expect that to be what we have passed to f. Certainly that's what f is expecting. Note, however, that the compiler warned that a long int was actually passed. Let's accept that for now and see what follows. We pass a 32-bit long, yet f expects a 16-bit int. The long can be passed in one of two ways: low word first or high word first. As a 32-bit value, -32768 is 0xFFFF8000, so its low word is 0x8000 (-32768 as an int) and its high word is 0xFFFF (-1 as an int). Depending on which word the implementation places where f looks for its int, f sees either -32768 or -1.
By adding a prototype in main for f, of the form
/* 5*/    void f(int);
the long int -32768 would be silently converted to an int as part of the call. Interestingly enough, this really would have the value -32768, since that value is representable in an int.
Now, back to the type of -32768. In the April 1990 issue of CUJ (Volume 8, Number 8), I made the following statement: "An expression such as -32768 consists of two source tokens; the unary minus operator and the integer constant 32768. Note there is no such thing as a negative constant in C. The constant is non-negative and it is preceded by a unary minus operator. An interesting situation exists on 16-bit twos-complement machines where -32768 is the smallest value that can be stored in an int. It so happens that the type of -32768 when written in this form is not int; it's long int, but that's another story."
This statement resulted in reader mail (and a subsequent reply), so here is the explanation.
According to the ANSI Standard (page 28, lines 37-41), "The type of an integer constant is the first of the corresponding list in which its value can be represented. Unsuffixed decimal: int, long int, unsigned long int; unsuffixed octal or hexadecimal: int, unsigned int, long int, unsigned long int; suffixed by the letter u or U: unsigned int, unsigned long int; suffixed by the letter l or L: long int, unsigned long int; suffixed by both the letters u or U and l or L: unsigned long int."
On a 16-bit machine, 32768 is an unsuffixed decimal constant that won't fit into an int. The compiler tries long int, and the value fits, so long int is its type. The unary minus is then applied to that long int, giving a long result. If you look carefully, you will see that the rules are different for decimal and for octal/hex constants. The following program demonstrates this:
#include <stdio.h>
main()
{
    printf("sizeof(-32768) = %lu\n", (unsigned long)sizeof(-32768));
    printf("sizeof(0x8000) = %lu\n", (unsigned long)sizeof(0x8000));
    printf("sizeof(0100000) = %lu\n", (unsigned long)sizeof(0100000));
    return 0;
}
The values -32768, 0x8000, and 0100000 have exactly the same bit pattern when stored in 16 bits. However, the type of the first expression is long, while that of the second and third is unsigned int: 0x8000 and 0100000 don't fit in an int, but they do fit in an unsigned int, which comes next in the octal/hex list. The output produced on a 16-bit implementation with 32-bit longs is
sizeof(-32768) = 4
sizeof(0x8000) = 2
sizeof(0100000) = 2
5. I see this kind of problem all the time, mostly with programmers new to C. When starting C, you learn that all statements must be terminated with a semicolon. However, if a while or for statement really is a statement, where does its semicolon go? The answer is "They don't have one, but each of their subordinate primitive statements does need one." In my introductory C textbook, I call for, while, if, etc., constructs rather than statements, to keep students from putting in unneeded semicolons.
The problem is that the language supports a null statement, represented simply by a semicolon. As a result, spurious semicolons may be hazardous to your program, as in puzzle 5. The trailing semicolon on line 7 represents a null statement that is taken as the body of the loop. The call to printf therefore occurs only once, after the loop terminates, when i is 5. In this case, the output indicates the problem, but where the intended loop body produces no visibly strange behavior, the problem can be difficult to find.
PC-Lint did issue the following, very useful message:
line 7 - Suspicious use of ;
I didn't experiment to see just which uses of ; are not considered suspicious, but even if all null statements were flagged, that would be useful, since null statements are not very commonly needed.
6. We all know that you must explicitly declare an identifier before using it. Of course, many rules have exceptions, and the obvious one here is for functions. If the compiler comes across a call to an undeclared function, it presumes that the function returns an int. It cannot check the argument list, because no prototype is available.
There is one other, very obscure exception. If you omit the type of any formal parameter in an old-style function definition, the compiler assumes that the parameter has type int. The following old-style definitions are equivalent:
double f(i, j) int i, j; {}
double f(i, j) int i; int j; {}
double f(i, j) int i; {}
double f(i, j) int j; {}
double f(i, j) {}
The equivalent prototype version is:
double f(int i, int j) {}
However, you are not permitted to mix the old and new styles within a single parameter list. For example, the following are invalid:
double f(i, int j) {}
double f(int i, j) {}
Having said all this, I strongly recommend you not use such default typing.