Columns


Questions & Answers

Readers' Replies

Ken Pugh


Kenneth Pugh, a principal in Pugh-Killeen Associates, teaches C language courses for corporations. He is the author of C Language for Programmers and All On C, and was a member of the ANSI C committee. He also does custom C programming for communications, graphics, image databases, and hypertext. His address is 4201 University Dr., Suite 102, Durham, NC 27707. You may fax questions for Ken to (919) 493-4390. When you hear the answering message, press the * button on your telephone. Ken also receives email at kpugh@dukemvs.ac.duke.edu (Internet).

Q

Why do computer programmers confuse Halloween and Christmas?

Thor Parrish
Cambridge, MA

A

Let me think about that for a while. Check the end of the column. (KP)

Readers' Replies

Matching Braces

In response to Kim Tsang about the closing braces in C programs, you can improve the readability of the code by defining the braces with #define at the top of the file to be BEGIN and END.

Then they are easily found when scanning a listing and the comments at the end will help match them up. I also put the END in the same column as the BEGIN, then indent everything in between so that nothing should show up in that same column between the BEGIN and END, as shown in Listing 1.

I can keep better track of my blocks and cut down on searching. (It also saves on colored pens.)

I find that a four-character indent works best for me. Any more and you crowd the right-hand side of the screen; any less reduces readability.

Garrett J. Boni
Baton Rouge, LA

Thanks for another way of handling the blocking problem. I prefer braces, as they are only a keystroke apiece. (KP)

Token Pasting

Thank you for printing Matthias Hansen's reply to my problem from the July issue. His trick of using the token-pasting operator with the angle brackets was something I just didn't think of, and it should be useful in the future.

Unfortunately, I can't get Herr Hansen's example to work. I tried Turbo C 2.0, Turbo C++ 1.0, and Microsoft C 6.0, and all report that they can't find the file MH_PRG.A. His variation for Quick C does work with all three and is similar to the solution I finally used.

I've come up with some new solutions to the problem. First, if the name-dependent code has some regularity you can use the token-pasting and stringizing operators in a block of preprocessor code to produce what you need. This can easily become unreadable, though, and so is hard to maintain.

Another solution would be to produce the code through an awk program, though it, too, requires some regularity. It will be easier to maintain. I did something like this years ago, and am still kicking myself for not remembering it.

There is yet a third way to do this, one that can handle any arbitrary differences. In our main source file, we put the line

#include "progxh.h"
and in that file we put lines like

#include "prog1_a.h"
#include "prog1_b.h"
The trick is to create this file only when needed. In your make file, before the compiler is called, add a line to create progxh.h. You can use the ECHO command in a simple batch file, or maybe an awk program if the file is complicated. A clever programmer could use a feature in some make implementations (like PolyMake) to create temporary files with the appropriate contents.

None of the solutions we've discussed are clean; they all have the potential for developing portability and/or maintenance problems. Which one you use will depend upon the circumstances and whatever pay-now/pay-later philosophy you have.

Jim Howell
Lafayette, Colorado

Jim's original question revolved around using the preprocessor to force a #include to use different source files, based on the value of a #define. (KP)

qsort Function Prototypes

This letter is in response to Firdaus Irani's question concerning calls to qsort (CUJ Dec. 1990, p. 92). I, too, had similar problems with qsort. Unlike Firdaus, I had better luck with Borland's technical support. I was told that the function was strictly prototyped to only accept const void * as arguments to _fcmp(). I don't remember exactly why he said it was this way. He did tell me how to use it correctly.

Using the example Firdaus supplied, I changed the prototype for comp to comp(const void *, const void *) and then changed the comp routine to look like Listing 2.

This compiles under Turbo C++. I tried the Microsoft C compiler and it also compiled fine.

This answers Firdaus' question of multiple calls to qsort with different compare functions. You prototype the functions with const void * and then cast the pointers to whatever type of pointer you need. Hopefully this will help others who have had the same problem.

Mike Beard
Abilene, Texas

Thanks for your answer. For the efficiency-minded, you might put the function that calls qsort in its own source file. Do not include <string.h>, but simply have a prototype:

int strcmp(const void *a, const void *b);
Then you could call qsort with strcmp. This bypasses the prototype mechanism for the sake of efficiency. (KP)

[This declaration violates the C standard, but you can probably get away with it on most implementations. PJP]

Large Arrays In Turbo C

I'm writing in reference to a question in your column in the November 1990 issue of The C Users Journal. The question was by Abdel Hindi from Montreal, Canada concerning large static structures in Turbo C v2.0.

I just recently had occasion to write a program that used a large static three-dimensional array of characters. When I calculated the number of bytes needed to hold the array, it came out to more than 150Kb. I tried to compile the program with the compact memory model and received the same error message as was mentioned, "Too much data...."

Please see Listing 3 for my solution to this problem. I declared a two-dimensional array of huge pointers as global data. In the program I use the function farcalloc to add the third dimension to the array. After the memory data area has been allocated, all accesses to the data appear just like a three-dimensional array.

I hope this helps.

D. F. Spencer
San Jose, CA

Externs In Headers

In reference to your column in the October 1990 issue (page 83), the following is an approach that I use to solve the question posed by Andreas Lang.

#ifdef MAIN

#define EX
#define EQ(i) = i

#else

#define EX extern
#define EQ(i)

#endif

EX int i EQ(5);
This code excerpt allows a single header file to provide both the declarations and the external references.

I always #undef EX and EQ at the end of the header, but this is not required.

Gary Hoskins
Castroville, CA

Pointers

Rex Jaeschke's illuminations about pointers and your recent column got me to do some experimenting on my own as follows:

int p[10];
int (*q)[10] = (int (*)[10])
               malloc( 2 * 10 * sizeof(int));
I thought I could assign *q, q[0], or q[1] another address just like I can any other pointer. I understand that p is a constant pointer and can't be reassigned another address, but what should stop me from treating any single dereferencing of q as just another assignable address? After all, with my Turbo C compiler I can do q = &p, but why can't I do *q = p? What I got was a "need lvalue" error in compilation.

I was mystified at first and tried a parallel-to-q creation of r. r and q were now of the same type. I tried to assign q[1] = r[1], bonnnkkkk!, but q = r and q[1][1] = r[1][1] worked fine. I concluded that the compiler must simply treat int[10] like it does p. So q[0] now references a declared array just like p and can't be assigned another value as far as my compiler is concerned.

I hope my analysis of this is correct and I've not slipped a gear this evening. Thank you for the hearing.

Mike Hanna
San Francisco, CA

You are right on. q is a pointer to objects that are int[10], that is, arrays of 10 integers. *q represents that array of integers. Note that some pre-ANSI compilers will give a warning of "& operator on array name ignored" for q = &p and want just q = p; under ANSI C, q = &p is the correct form, since &p has type int (*)[10], exactly the type of q.

To follow along with your example, suppose you had:

int array_of_arrays[3][10] = {
{1,2,3,4,5,6,7,8,9,10},
{11,12,13,14,15,16,17,18,19,20},
{21,22,23,24,25,26,27,28,29,30},
};
int (*q)[10];
q = array_of_arrays;
q contains the address of array_of_arrays. q[0] is the address of the first element in array_of_arrays, i.e., the value of array_of_arrays[0]. Since this happens to be an array, it is an address, which of course has the same value (but not the same type) as array_of_arrays. q[1] is the address of the second element in array_of_arrays, i.e., the value of array_of_arrays[1], and so forth.

Let's put some numbers to this example. ints are two bytes and array_of_arrays is located at address 100. See Listing 4. (KP)

[Confusion between pointers and arrays is endemic among even the brightest C programmers. They are two quite different creatures, but array names get converted to pointers so effortlessly and so often that they often appear interchangeable. When they are not, as in the case reported by Mr. Hanna, look out. PJP]

Structure Access

I'm writing in response to the letter from R. Palmer Benedict (The C Users Journal, November 1990, p. 104). I wanted to address some of the issues raised by that letter, which you didn't get into. First let me note that my investigations (see Listing 5) support Benedict's statement that accessing a structure member directly is the fastest method of access. It was Benedict's letter, by the way, that induced me to run timing tests, since I prefer to access data by means of arrays whenever possible.

In re-reading the letter, I ran across some rather strange logic. No code accompanied the letter, so I may be misinterpreting Benedict's remarks. However, there seems to be some incongruity in his method. Benedict states, "Pointers or indices can access the data in the fields and subfields, which is much slower than accessing elements of a structure." Three different methods of access are represented in Benedict's statement. The fastest is direct access to a member of a structure variable via the dot operator:

i = structure.member;
The slower methods include access via a structure pointer,

i = s_pointer->member;
and access via a properly initialized array index:

i = array[fieldnum];
Perhaps Benedict's somewhat convoluted method of creating a structure variable (by using an array to set aside memory, and aiming a structure pointer at that memory) has led to some confusion on his part. He remarks that he can "access the elements of the structure through the structure pointer." Certainly he can, but by doing so he loses the speed advantage of direct access to a structure variable! He winds up defeating his own apparent purpose of using the fastest access method. As it turns out, he's using the slowest of the three methods of access (slowest on my machine, at least).

I was disappointed to find I had to agree with Benedict that access via an array is slower than some other methods. I really prefer keeping the fields of a database record in an array as opposed to a structure, because I can access any field simply by changing an array index value. Or I can print all the fields in a concise loop containing only one print statement, while the loop increments the array index. (For relevant code examples see the function VerifyContents() in Listing 5.) On the other hand, direct access to a structure variable requires that I specifically enter the ID of the structure member that I want. To print them all, I have to request each one individually. The same is true for access via the structure pointer. I was relieved to observe that access via an array is faster than access by structure pointer.

I couldn't understand why there would be any speed variation involved in this. It seemed to me that the same thing was happening in each case. The parts of a structure are accessed by means of offsets, which is what array indexing is all about. So when I found that there was a difference in speed, I decided to delve a little deeper. I switched to the tiny memory model, and told the compiler to generate a map file. By using the tiny model, I could ignore memory segments and find locations in the .EXE file by offsets alone. Then I used DEBUG to view the code. It turns out that the speed of the routine depends heavily upon when the address calculation occurs — at compile time or runtime. Machine constraints also affect performance.

The fastest access that I came upon was the direct access to a member of a structure variable via the dot operator. Using the map file, I found the location of the UsingS_Object function in the .EXE file. Stripping away the loop code to leave just the access statements produced

MOV WORD PTR [1E4A],1D7A
MOV WORD PTR [1E4A],1DAC
MOV WORD PTR [1E4A],1DDE
MOV WORD PTR [1E4A],1E10
In assembly language, the assignment is from right to left (as it is in BASIC and C). Each of the four hex constants 1D7A...1E10 in turn is assigned to the destination variable [1E4A]. This is the pointer variable named access in Listing 5. The four hex constants are hard-coded addresses of the structure members being accessed. There is in fact no indexing, no offset calculation at runtime. That's why this is the fastest method.

The second fastest access that I measured was access via array. The following code is the machine representation of array access, extracted from the function UsingArray().

MOV AX,[1E42]
MOV [1E4A],AX
;
MOV AX,[1E44]
MOV [1E4A],AX
;
MOV AX,[1E46]
MOV [1E4A],AX
;
MOV AX,[1E48]
MOV [1E4A],AX
Each of the four accesses requires two machine instructions. The bracketed hex value [1E42] is the address of array element 0. The value at that address is moved to the machine register AX and from there to the same destination that was used in the faster code above. The machine cannot transfer data from one memory location to another without the intermediate step of storage in a register. If I had declared access to be a register variable, this routine might have been as fast as the previous method.

Since each access by array requires two steps, it isn't quite as fast as the direct access approach. Yet it is faster than accessing structure members via a structure pointer:

MOV AX,[1786]
MOV [1E4A],AX
;
MOV AX,[1786]
ADD AX,0032
MOV [1E4A],AX
;
MOV AX,[1786]
ADD AX,0064
MOV [1E4A],AX
;
MOV AX,[1786]
ADD AX,0096
MOV [1E4A],AX
These lines are from the UsingStructure() function. Here, an offset is added to the contents of the structure pointer at runtime to locate the desired member. Now three steps are required for each access (except the first). That slows this routine down a bit more than the access via array.

It would be interesting to see how the offsetof macro translates into machine code. I suspect that the code would look like the final example above. offsetof would be figured at compile time, to calculate the values which in the above example are the constants that get added to the AX register at run-time.

It would also be interesting to see how close Mr. Benedict is to the target with another remark: To "read each field or subfield individually into its place in the structure... can be dreadfully slow." I wonder how much slower it is to assign one field at a time, rather than all at once. We're not talking interpreted BASIC here. In C, using the <stdio.h> FILE functions, a disk access fills a buffer in RAM. Then individual fields and subfields would be read from this buffer. Shouldn't be all that slow...

I'm not sure that the search for the fastest algorithm is as important as it used to be. There have been great advances in hardware speed over the last several years. The fine-tuning speed adjustments that software allows are small by comparison. They can be sacrificed in a tradeoff. This permits selection of algorithms and writing of code that is easier to read and maintain. One can let the hardware take care of the speed factor. In just the same way, high-level languages now do what used to be done in assembly language. The more competent machines can handle these larger, less efficient programs with ease. Ultimate machine code efficiency is no longer a primary consideration.

Some comments on Listing 5:

As written, a routine is iterated 70,000 times, and then the elapsed time is measured. The routines should probably be set up to run for a given period of time while the repetitions are counted. At the very least, they should be clocked by a timer with better resolution than the BIOS clock. Some of these functions perform 70,000 iterations of a test in about 18 clock ticks, which is about one second. That's almost 4,000 iterations per tick. If one routine is on the quick side of 17 ticks, and another is on the slow side of 18 ticks, the difference could be almost 8,000 iterations, or well over 10 percent. That's a wide error margin. Nonetheless, I'm using the BIOS clock for a quick benchmark timer.

One of the test functions in Listing 5 is an empty loop. This is included as a sort of test of my benchmark routine logic. The analyze routine compares the time to execute two separate loops against the time to execute a single loop which does the work of the other two. The time difference between these should be close to the time to execute an empty loop that does no work. It is.

The .EXE file analyzed above was produced by Turbo C v1.5 and compiled (for this analysis) in the tiny memory model. The benchmark times in Listing 6 were produced on a 10 MHz AT clone.

Art Shipman
Westbrookville, NY 12785

Thanks for your tests and comments. I agree with your comment that ultimate machine efficiency is not a primary consideration these days — unless the time required by inefficient code makes the user go out for coffee.

If I wanted to access the individual fields in the structures by using a loop, I would declare

char *field_addresses[] = {
   S_object.field1,
   S_object.field2,
   S_object.field3,
   S_object.field4};
and then use a loop as

for (i = 0; i < 4; i++)
{
printf("\nField %d is %50.50s",
  i, field_addresses[i]);
}
That would probably be almost as fast as using an array access, since the offset addresses are precomputed. (KP)

Casts And ANSI Standard

Your column in the November 1990 issue included a letter from Hans-Gabriel Ridder, concerning the construct

 char *ptr;
 ((long *) ptr)++;
His ANSI-conformant Lattice C compiler gave him an "lvalue required" error on this. Though he had used constructs like this for years, Lattice told him that "casts not being lvalues" was required by ANSI, in the name of portability.

As far as I can tell, a cast has never yielded an lvalue, even in "classic" K&R C. And ++ has always required an lvalue. I'm not surprised that many early compilers would accept such a construct without complaint and even give the expected results. I think early compilers often accepted syntax that was not really legal.

I'm also pretty sure that some pre-ANSI compilers would (correctly) reject such an expression. Allowing casts (even to pointer values) to be lvalues would permit such strange things as

 (int *)f() = g();
when what was probably intended was

 *(int *)f() = g();
Mr. Ridder was especially concerned about the case where the pointer is dereferenced along with the post-increment, in a manner similar to

 char *cptr;
 f(*cptr++);
but treating cptr as a pointer to long. You suggested using

 *(cptr += sizeof(long))
instead of

 *(char *) (((long *) ptr)++)
but this does not have the same effect, since cptr would be incremented before dereferencing, and I think Mr. Ridder wanted the dereferenced value to be a long instead of a char anyway. (If not, the solution below won't help him).

Your editor claimed parenthetically that "you can also get the old behavior by writing: ((long*)&ptr)++," but this is also clearly wrong. This still tries to increment the result of a cast.

I think the editor was trying to say the following:

 *(*(long **)&cptr)++
This will treat cptr as a pointer to long, dereference it, and post-increment it. This is legal C, both classic and ANSI, but it is not guaranteed to work. It assumes that you represent and manipulate pointers to char and pointers to long in the same way, and that there is no problem with the alignment of the objects being pointed to. You can think of it as a tricky way of getting the effect of a union without actually using one.

As a test, I tried the following functions on Zortech C 2.12:

char *g1(char *cptr)
{
f(*(*(long **)&cptr)++);
return cptr;
}

long *g2(long *cptr)
{
f(*cptr++);
return cptr;
}
With no optimization, g1 generated shorter code than g2! With optimization, the code generated was identical for these functions.

Raymond Gardner
Englewood, CO

Thanks for your tests. As you say, mixing pointers to chars and longs, or any other data objects for that matter, is a tricky business. I prefer more straightforward methods, such as the unions you mentioned.

When I see expressions like

 *(*(long **)&cptr)++
it reminds me why some novices wonder how they ever got into this language. I prefer breaking a tricky expression into two parts, with the increment being put before or after the use of cptr, as desired.

*cptr = 'a';
cptr += sizeof(long);
Okay, it'll take a microsecond or so more. It took me a few seconds to figure out that *(*(long **)&cptr)++ = 'a' actually transfers four bytes (the size of a long), even though cptr is a pointer to char. That's a tradeoff of more than a million to one. Of course, if you really wanted to transfer just a character, it would be *(char *)(*(long **)&cptr)++ = 'a'. Say that fast 10 times. (KP)

Answer to Halloween/Christmas

Because OCT 31 equals DEC 25. (Octal 31 equals decimal 25.)