Questions & Answers

More On Passing Arrays And Precedence Rules

Ken Pugh


Kenneth Pugh, a principal in Pugh-Killeen Associates, teaches C language courses for corporations. He is the author of C Language for Programmers and All On C, and is a member of the ANSI C committee. He also does custom C programming for communications, graphics, and image databases. His address is 4201 University Dr., Suite 102, Durham, NC 27707.

You may fax questions for Ken to (919) 493-4390. When you hear the answering message, press the * button on your telephone. Ken also receives email at kpugh@dukeac.ac.duke.edu (Internet) or dukeac!kpugh (UUCP).

Announcing The Great Name/Obscure Code Contest

Based on a reader's response later on in this column, it appears reasonable to launch a new contest.

Send examples of the worst names or abbreviations that you have seen in other people's programs (or even your own). Include both the name and a description of what it is supposed to represent. The best (or worst) examples will be published here, with credit for your submission. The name of the programmer who actually wrote the code in which the name is used will not be mentioned without his/her express permission.

Q

When we met at the Triangle C Users' Group meeting, you invited questions at any difficulty level on 'C'. So, here is one:

I try to pass a char array to a function. In the function, I use sizeof to get the array's allocated size. It doesn't work. The function is returning the sizeof passed_array as if it were a pointer, two bytes long. It must be a pointer with the location of the beginning of real_array in it. Right?

I asked our instructor about this, and he said the code in Listing 1 would work. At least, that's how I understood what he said. It produces the results shown below it when compiled with Power C from MIX or Instant C 3.0 from Rational Systems, and fails with these compiler messages under Turbo-C 1.5:

(marker between char and passed_array[] in getarray()'s formal parameter list)
"Error 10: Type mismatch in redeclaration of 'getarray'"

(then, at the end of function getarray(), it complains:)
"Warning 13: Parameter 'passed_array' is never used in function"

How can I get an array's allocated size within a function it has been passed to?

What is bothering Turbo-C, and why don't the other compilers complain similarly?

If an array's name is really a pointer to the array's beginning, why doesn't sizeof(real_array) also return a 2 when called in main()? (Not that I WANT it to ... :-)

Glenn Jordan
RTP, NC

A

The declaration of an array as a local variable in a function actually sets aside storage for the array. In this sense sizeof(real_array) is 20, because that is how much storage is set aside for it. The name of an array (whether declared as a local variable or as a static/external) represents the address of its first element when passed to a function.

When you declare a parameter to be an array (e.g. passed_array), you are not really declaring an array at all. You are really declaring that the parameter is a pointer. You pass an address in the call (i.e. real_array), and the function receives that address in passed_array.

The sizeof passed_array is the size of a pointer (two or four bytes, depending on the memory model). Alternatively, you could have declared it as char *passed_array;. For parameter declarations, char *passed_array; and char passed_array[]; are equivalent. The compiler interprets both as meaning that the parameter is a pointer. Your instructor may not have mentioned that you can reference an individual element through a pointer by using either:

passed_array[i]
or

*(passed_array + i)
The compiler treats both expressions identically.
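
The difference between the two uses of sizeof is easy to observe. The following is only a minimal sketch, not the reader's Listing 1, and the function names are made up for illustration. The prototype is written with the array form of the parameter and the definition with the pointer form; a compiler accepts the pair because both declare the same parameter type.

```c
#include <stddef.h>

/* Prototype written with the array form of the parameter... */
size_t size_in_callee(char passed_array[]);

/* ...and the definition written with the pointer form. The two
   declarations are compatible: both make the parameter a char *. */
size_t size_in_callee(char *passed_array)
{
    return sizeof passed_array;      /* size of a pointer, not of the array */
}

/* In the scope where the array is defined, sizeof sees all the storage. */
size_t size_in_caller(void)
{
    char real_array[20];

    return sizeof real_array;        /* 20: the storage set aside */
}

/* Subscripting and pointer arithmetic reach the same element. */
int same_element(char *passed_array, int i)
{
    return &passed_array[i] == passed_array + i;
}
```

On a current compiler size_in_callee() will most likely return 4 or 8 rather than the 2 reported in the question, but in every case it is the size of a pointer and tells you nothing about the array.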

Instead of using sizeof, you could pass both a pointer to the array and its size in either bytes or in elements. Usually the element count is more useful than the byte count:

function(real_array, 20);
.....

function(passed_array, size)
char passed_array[];
int size;
    {
    int i;

    for (i = 0; i < size; i++)
        {
        ......
        }
    }

You can avoid passing the size by designating a unique value for the end of the valid elements in the array — just as strings (character arrays) are terminated with the NUL (zero or all bits off) character. Remember, the terminator must be some unique value that will never appear as a valid value for the type you are manipulating.
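
As a rough sketch of the terminator approach, assume an array of non-negative int readings so that -1 can never be a valid element. The names END_OF_READINGS, count_readings, and sum_readings are hypothetical and chosen only for this illustration.

```c
#include <stddef.h>

#define END_OF_READINGS (-1)   /* hypothetical terminator; never a valid reading */

/* Count the valid elements by scanning for the terminator,
   much as strlen() scans for the NUL character. */
size_t count_readings(const int *readings)
{
    size_t count = 0;

    while (readings[count] != END_OF_READINGS)
        count++;
    return count;
}

/* The same kind of loop as the explicit-size version, with the
   bound replaced by a test for the terminator. */
long sum_readings(const int *readings)
{
    long total = 0;

    while (*readings != END_OF_READINGS)
        total += *readings++;
    return total;
}
```

The caller must remember to store the terminator after the last valid element, so the array needs room for one more element than the data it holds.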

Q

My instructor says that (*ch)++ evaluates as: increment by one type-size-length the value found in address ch. No quarrel there.

Being curious, I asked him how the expression would evaluate without the parentheses. He said that since ++ and * are unary operators, and * has higher precedence than ++, the following was true:

*ch++ is evaluated identically to just ch++ without the *
and that in both cases, you would just increment the address ch by 1.

I disagreed (never disagree with your instructor). I said that if * had higher precedence than ++, like he insisted, that

*ch++ should be exactly the same as (*ch)++
He said no, since ++ is a unary operator, it could only see the ch, not *ch, even if * had already operated on ch. I told him I thought that was really crazy, and he got upset...

Now, actually, as you experienced programmers know, * and ++ have the exact same precedence, and are evaluated right-to-left when there is an associativity question, as above. So, he was right, the parentheses are required to make (*ch)++ increase the value held in address ch. But the stuff about ++ acting only on the adjacent operand must be wrong, right ???? I mean, wouldn't:

++*ch do exactly the same as (*ch)++
He claimed that in the case, ++*ch, the ++ would not know what to do with the operand *, while I claimed that *ch would already be evaluated to the single-value contents of address ch when ++ attacked. He responded that he had been programming in C for years, and knew what he was talking about.

Comments? Perhaps I am the one who is misunderstanding...

Glenn Jordan
RTP, NC

A

You are correct in your interpretation. By associativity and precedence rules:

*ch++ equals *(ch++)
Both use the address contained in ch as a pointer and then increment the contents of ch using pointer arithmetic.

++*ch equals ++(*ch)
These forms use the address in ch as a pointer and increment the contents at that address.

*++ch equals *(++ch)
These forms increment the contents of ch using pointer arithmetic and then use that new value as a pointer.

In order to post-increment the contents of a target location, you need to use explicit parentheses to overcome the precedence, yielding:

(*ch)++
This combination uses the address in ch as a pointer and increments the contents at that address.

For example, if we assume that doubles are eight bytes long, then incrementing a pointer to a double increases that pointer by eight. The comments in Listing 2 detail the behavior of various pointer/increment combinations.
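
As a compact illustration separate from Listing 2, the sketch below applies each of the four forms to a pointer into an array of doubles and records what happened. The names struct outcome, enum form, and apply are hypothetical and exist only for this example.

```c
#include <stddef.h>

struct outcome {
    double    value;     /* what the expression evaluated to */
    ptrdiff_t advance;   /* how many elements the pointer moved */
    double    first;     /* data[0] after the expression */
};

enum form { POST_POINTER, PRE_VALUE, PRE_POINTER, POST_VALUE };

struct outcome apply(enum form which, double *data)
{
    double *ch = data;
    struct outcome r;

    switch (which) {
    case POST_POINTER: r.value = *ch++;   break;  /* *(ch++): fetch, then advance ch */
    case PRE_VALUE:    r.value = ++*ch;   break;  /* ++(*ch): add one to the double, yield the new value */
    case PRE_POINTER:  r.value = *++ch;   break;  /* *(++ch): advance ch, then fetch */
    default:           r.value = (*ch)++; break;  /* add one to the double, yield the old value */
    }
    r.advance = ch - data;   /* counted in elements, each sizeof(double) bytes */
    r.first = data[0];
    return r;
}
```

With data holding 1.5 and 2.5, POST_POINTER yields 1.5 and moves the pointer one element while leaving the data alone; PRE_VALUE yields 2.5 and changes data[0] to 2.5 without moving the pointer; PRE_POINTER yields the second element, 2.5, after moving the pointer; POST_VALUE yields 1.5 and then changes data[0] to 2.5.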

Reader Responses:

Character Constants

This letter is in response to your discussion of character constants on pages 113 and 114 of The C Users Journal, January 1990. I believe that your discussion is flawed and that the Microsoft C and Quick C implementations do not comply with the draft standard.

You say that the character è (where e is replaced by the accented e, code 138 decimal) is not part of ASCII and so the compiler could do with it what it wants. I refer you to the following items in the December 7, 1988 draft C standard: Section 2.2.1, Page 11, Line 12:

Both the basic source and the basic execution character sets shall have at least [emphasis added by RHG] the following members ...

Section 3.1.3.4, Page 29, Line 16:

c-char:
    any member of the source character set except the single quote ', backslash \, or new-line character
    escape-sequence

Section 3.1.3.4, Page 30, Line 33:

If an integer character constant contains a single character or escape sequence, its value is the one that results when an object with type char whose value is that of the single character or escape sequence is converted to type int.

Given that Microsoft's C and Quick C compilers accept character 138 without any diagnostic, I think it is safe to assume that they consider the character to be part of the source character set. Therefore, the literal is indeed a legal character constant and so should be treated like (int) (char) è which has the value -118 if characters are treated as signed. Hence, your example demonstrates a bug in the compilers, not an implementation dependency.

I have copied this note to the postmaster at Microsoft in the hope that he will forward it to the appropriate person in the C compilers development group for comment. You may also use this letter in a future column if you see fit.

Richard H. Gumpertz
Leawood, Kansas

A

You (Mr. Gumpertz) are correct, I believe. The sample program in the article (pg. 114) shows a bug that has actually been in MS C since well before the ANSI standard (I duplicated it all the way back to C v4.0).

We do appreciate you bringing this to our attention. This bug will be fixed in our upcoming version of MS C.

Thanks.

Dave Weil
Group Development Mgr.,
System Languages
Microsoft Corp.

Thank you. I stand corrected on this technicality. You are correct: if a character representation is accepted in a character constant, then it should act according to the rules for characters. Non-ASCII characters are not in the ANSI standard source character set that must be supported by a conforming compiler.

I strongly urge against using non-ASCII characters as character values. You can always use a #define in their place. Not only do you avoid the inherent non-portability of such a program, you also avoid word processing problems.

For example, I was porting a program somebody had written with a word processor that accepted non-ASCII characters. My word processor does not accept them. It uses the high order bit as an internal designation of the end of a word. It read the program, but the non-ASCII characters appeared as the ASCII value with the high-order bit off.

If you do use characters with the high-order bit on, then you could declare the variables that use them as unsigned chars preventing sign extension when the char is expanded to an integer. —KP
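
The two suggestions can be sketched together. The macro name E_GRAVE and both function names below are hypothetical; the sketch only shows how the same bit pattern (code 138, or 0x8A) widens to an int under each signedness.

```c
#define E_GRAVE 0x8A    /* hypothetical name for character code 138 */

/* A signed char with the high-order bit on is sign-extended. */
int widen_signed(signed char c)
{
    return c;
}

/* An unsigned char keeps its value when expanded to an int. */
int widen_unsigned(unsigned char c)
{
    return c;
}
```

Here widen_unsigned(E_GRAVE) returns 138. On the usual two's complement machines, widen_signed((signed char)E_GRAVE) returns -118, the value cited in the letter above, although the result of converting 138 to a signed char is left to the implementation.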

Naming Conventions And Indentation

Everyone knows the "CMP" stands for corrugated metal pipe, not compare or compute. —Marcus Russell, West Berlin, NJ

This comment refers to a previous response I had given to a question regarding naming conventions. I suggested that one should adopt some standard abbreviations, if one did not spell out names in full.

Comments from other people have suggested that there is a widespread distinction between the vowel droppers and the first-few-letters users. "compute" could be abbreviated as "cmpt" or "comp", depending on your preference. In my earlier days I used "cmp" as a shortening of "cmpt". This always caused conflict when "compare" got shortened to "cmpr" and then to "cmp" also.

I find it interesting reading listings in this and other magazines. I believe that a program should be almost as readable as a book. Using fully spelled out variable names contributes as much as any other factor to easier understanding of a program.

This leads me on to another topic of readability — the great brace debate. Brace alignment of compound statements seems to be a topic that provokes a variety of opinions. I think that, like taste in art, each person's view is different and sufficient justification can be developed to support any particular stand.

A recent article in the C Gazette had some words to say about indentation styles. I recommend the magazine for those who like reading C code in order to learn about it. There are a lot of source listings in that magazine.

There are many possibilities for brace alignment. If braces are placed on lines by themselves, then either or both can be aligned with the enclosed statements or one tab stop to the left of the statements. Alternatively, the opening brace may be on the same line as the controlling statement. The closing brace might be on the same line as the last enclosed statement. This yields a number of possibilities. In Chapter 14 of All on C, I listed four common formats. Here are those with a few more. I've left out several variations that appear rather ugly and of no use.

Braces on separate lines and aligned with enclosed statements.

if (x)
    {
    ...
    }
Braces on separate lines and aligned with controlling statement.

if (x)
{
    ...
}
Opening brace on same line as controlling statement, closing brace aligned with enclosed statements [my preference rlw].

if (x) {
    ...
    }
Opening brace on same line as controlling statement, closing brace aligned with controlling statement (Kernighan and Ritchie style).

if (x) {
    ...
}
Opening brace on same line as controlling statement and closing brace on same line as last enclosed statement.

if (x) {
    ...}
Luckily there are "pretty-print" programs that you can use to alter the style of the indentations, for programs you have written or that you have received and are trying to alter or maintain. However, it's usually wise to adopt one style and use it faithfully. I originally adopted the style:

if (x) {
    ...
    }
The initial choice was arbitrary. Later I reviewed my usage and found a few compelling reasons to switch to:

if (x)
    {
    ...
    }
This appearance is more consistent with the use of indentation for non-compound statements. Those look like:

if (x)
    statement
It also makes it easy to match up braces. The other styles which have unaligned braces make it more difficult to do this.

Those of you who submit code for this column will find that I have reformatted the listing for the sake of consistency within the column. —KP