November 1991/Questions & Answers

Questions & Answers

Portability And Standards

Ken Pugh

Kenneth Pugh, a principal in Pugh-Killeen Associates, teaches C language courses for corporations. He is the author of C Language for Programmers and All On C, and was a member of the ANSI C committee. He also does custom C programming for communications, graphics, image databases, and hypertext. His address is 4201 University Dr., Suite 102, Durham, NC 27707. You may fax questions for Ken to (919) 489-5239. Ken also receives email at kpugh@dukemvs.ac.duke.edu (Internet).
I'll be giving sessions on "Function and Data Abstraction" and "Managing the Transition to C" at the C-Forum in Boston this November. As technical chairman, I recommend that you register for the conference. It will have a hot lineup including Brian Kernighan, Robert Ward, Dan Saks, and a number of other authors in the C world. See you there.
Q
In the July 91 issue of The C Users Journal, you condone the following practice:
...
int int_compare(const int *, const int *);
typedef int (* fcmp_t)(const void *, const void *);
...
qsort (int_array, num_elements, sizeof (int), (fcmp_t) int_compare);
...
This practice is unportable because an int * and a void * need not be the same size, which means int_compare()'s parameters could be garbage. If you are doing this out of ignorance, then I am shocked at the level of understanding you and the magazine's proofreaders have of C.
If you are doing this because "you don't have to worry about being that portable" or "it works on my PC," then I suggest changing the name of the magazine to The MS-DOS C User's (who also don't care about porting or standards) Journal. What is so terrible about the following?

...
int int_compare(const void * p1, const void * p2)
{
    int i1 = * (const int *) p1, i2 = * (const int *) p2;
}
...
qsort (int_array, num_elements, sizeof(int), int_compare);
There are far too many nonportable and buggy C programs and confused C programmers (whether they know it or not). This is because C is a dangerous language and the overall quality of literature related to C is terrible (incorrect information, bad advice). ANSI went a long way towards making C less dangerous and more defined (e.g., prototypes, conversion rules), but I have not seen much improvement in the quality of C literature.
After rereading this letter, I feel I should make it clear that I am not as mad or disgusted as it may seem. I enjoy reading your magazine every month (the University library subscribes), although I wish you would adhere more to the ANSI standard when answering questions.
Jamshid Afshar
Austin, TX 78751
A
You are right that void * pointers may not have the same size as int * pointers. (If you know of a system where these pointers don't have the same size, I would like to hear about it.) On many systems, including the IBM-PC, int (*)() (function pointers) may not be the same size as int * or void *.
I can easily defend my usage for just the reason you mention: "There are too many ... buggy C programs." With your approach, it would be easy to use the comparison function inappropriately outside of the context of the qsort. For example:

int int_compare(const void * p1, const void * p2);
double d;
char c;
if (int_compare(&d, &c))
    ...
is a perfectly legal construction, given your prototype, that will pass every ANSI compiler's test with flying colors. However, in the words of Richard M. Nixon, "It would be wrong." If you used the "stricter" version of the prototype and function

int int_compare(const int *, const int *);
double d;
char c;
if (int_compare(&d, &c))
then all ANSI compilers will flag this as a flagrant violation. One needs the void * parameter only for purposes of qsort and similar functions. Even if one passed a ridiculous function to qsort with this typedefing, say:

int silly_int_compare(double *);
qsort (buffer, number, size, (fcmp_t) silly_int_compare);
it will be fairly apparent that the sort did not work. qsort itself will not modify areas outside the space designated by the other three parameters. The only way you can shoot yourself in the foot is if you make silly_int_compare() modify the data at the address that is passed. (KP)
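A middle course between the two positions can be sketched as follows. This is an illustration only, taken from neither the July column nor the letter: the comparison proper keeps its strict int * prototype, so misuse is still diagnosed, and a thin adapter with exactly the signature qsort expects does the pointer conversion, so no function pointer is ever cast. The names int_compare_qsort and sort_ints are hypothetical.

```c
#include <stdlib.h>

/* Strictly typed comparison: a call such as int_compare(&d, &c)
   with a double and a char is diagnosed by the compiler. */
int int_compare(const int *a, const int *b)
{
    /* Avoids the overflow that "return *a - *b;" could cause. */
    return (*a > *b) - (*a < *b);
}

/* Hypothetical adapter with the exact signature qsort requires.
   The void * arguments are converted back to the int * they
   started as, which is a well-defined conversion. */
int int_compare_qsort(const void *p1, const void *p2)
{
    return int_compare((const int *) p1, (const int *) p2);
}

/* Hypothetical convenience wrapper: sorts count ints in place. */
void sort_ints(int *array, size_t count)
{
    qsort(array, count, sizeof array[0], int_compare_qsort);
}
```

Calling sort_ints(int_array, num_elements) then sorts the array, and only the adapter ever sees a void *. The cost is one extra function call per comparison.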
Q
Using curses, I coded something like the following:

while (!done) {
    move (10,10);
    printw("Default value");
    move (10,10);
    refresh ();
    gets (char_array);
    /* More code */
}
It worked fine the first time around, but the second time around the default value was not printed out. The value that was input from gets() stayed there. Why?
Larry Meyer
Bahama, NC
A
Originally, the curses library was intended to provide terminal-independent screen positioning routines for use over low-speed lines.
The package keeps an internal representation of how it thinks the terminal screen appears, as well as a representation of the modifications you want to make. When you make curses calls, such as printw, this internal representation is updated. No characters are sent to the terminal screen until you call refresh.
When you call refresh, curses determines the minimum number of characters to send to update the real terminal screen so that it agrees with the representation of the modified screen. This so-called cursor optimization gives rise to the name curses.
For example, if you write

move (10,10);
printw("Default value");
move(10,24);
printw("Another value");
refresh ();
curses does not send two cursor positioning commands. Instead, it determines which characters are on the internal screen and sends something like

CURSOR TO 10, 10 Default value Another value
where CURSOR TO 10, 10 is replaced by the terminal control sequence that performs that operation.
When you called refresh the second time, the internal representation had not been modified, since you used the standard I/O function gets (). You must use the curses function getstr(char_array) to update the internal representation.
Note that running under most MS-DOS-based packages, cursoring or writing strings updates the real screen directly. There is no refresh () function. (KP)
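The behavior in the question can be reproduced with a toy model of the bookkeeping described above. This is a sketch under stated assumptions, not the curses source: it models a single 16-column row, and every name in it (toy_screen, toy_print, toy_raw_echo, toy_refresh) is hypothetical. It shows why a second refresh sends nothing when the only change to the terminal came from outside the package.

```c
#include <string.h>

#define TOY_COLS 16   /* one short row is enough for the illustration */

struct toy_screen {
    char wanted[TOY_COLS];    /* what the program has asked for        */
    char believed[TOY_COLS];  /* what the package thinks is displayed  */
    char terminal[TOY_COLS];  /* what the terminal really displays     */
};

void toy_init(struct toy_screen *s)
{
    memset(s->wanted, ' ', TOY_COLS);
    memset(s->believed, ' ', TOY_COLS);
    memset(s->terminal, ' ', TOY_COLS);
}

/* Plays the role of printw(): only the package's record changes. */
void toy_print(struct toy_screen *s, int col, const char *text)
{
    while (*text != '\0' && col >= 0 && col < TOY_COLS)
        s->wanted[col++] = *text++;
}

/* Plays the role of the echo produced by gets(): the characters go
   straight to the terminal and the package never hears about them. */
void toy_raw_echo(struct toy_screen *s, int col, const char *text)
{
    while (*text != '\0' && col >= 0 && col < TOY_COLS)
        s->terminal[col++] = *text++;
}

/* Plays the role of refresh(): sends only the cells whose wanted
   value differs from the believed value; returns how many were sent. */
int toy_refresh(struct toy_screen *s)
{
    int col, sent = 0;

    for (col = 0; col < TOY_COLS; col++) {
        if (s->wanted[col] != s->believed[col]) {
            s->terminal[col] = s->wanted[col];
            s->believed[col] = s->wanted[col];
            sent++;
        }
    }
    return sent;
}
```

In this model, printing "Default" and refreshing sends seven characters. If toy_raw_echo then overwrites part of the row and the program prints "Default" again, the wanted and believed rows still agree, so the next toy_refresh sends nothing and the echoed characters stay on the terminal, just as in the question.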
Q
As I understood, one of the great advantages of Object Orientation is the ability to encapsulate and hide implementation. I have now struck this problem.
When creating my own classes, a structure from another library may appear privately in a class. So to define the class, I would use the code in Listing 1.
Any user of the InfoWindow object must now include the grafix.h header file in addition to the header for the class. This means coupling between the InfoWindow object module and its caller appears unnecessarily tightened. It seems to show the implementation and break the cohesion of the object module.
I could use byte arrays the size of the graphics structures within the class, then cast pointers to them inside the member functions. However, this depends on remembering to change the size of the arrays in the class if the size of the structures changes. It all seems a bit of a quick and nasty fix anyway.
How can I prevent users of my objects having to include headers of structures (and even lower level classes) when they don't need to know?
Glen Watson
Glendalough, Western Australia
A
Why don't you put the
#include <grafix.h>
statement inside your header file? You can protect it from multiple inclusion by adding a conditional compilation test.
File grafix.h:

#ifndef GRAFIX_H_INCLUDED
#define GRAFIX_H_INCLUDED
/* remainder of grafix.h */
#endif

File infwndo.h:

#include <grafix.h>
/* remainder of infwndo.h */

File infwndo.cpp:

#include <infwndo.h>
/* remainder of infwndo.cpp */
Note, if you are concerned about the name mangling that C++ performs on the function names, you can tell the compiler not to mangle names by using the statement
extern "C" {
#include "grafix.h"
}
Coding Style
In the classes that I teach to corporations, there are always several questions about code style. I stress that readability and consistency are good goals to strive for in writing C. The beginning C programmer is cautioned against writing code that does not resemble the language he/she has been using.
For example, a common C idiom is:

while ( (c = getchar()) != EOF) {
    /* Do something */
}
The use of an assignment statement within the test expression can be confusing. Intermediate and advanced C programmers may easily decipher the statement for its intended meaning. I suggest that it may be just as clear to write:

for (;;) {
    c = getchar();
    if (c == EOF)
        break;
    else
        /* Do something */
}
or better yet:

eof = FALSE;
while (!eof) {
    c = getchar();
    if (c == EOF)
        eof = TRUE;
    else
        /* Do something */
}
I tend to prefer the latter, as it is more extensible, even at the cost of an extra comparison on every loop. For example, you might see code such as the following:

while ((c = getchar()) && c != EOF && c != '\032' && c != '\177') {
    if (c == 'A' || c == 'B' || c == 'C')
        /* Do something */
    else if (c == 'D' || c == 'E')
        /* Do something else */
}
Using the second form above, this would be coded as:

done = FALSE;
while (!done) {
    c = getchar();
    switch (c) {
    case EOF:
    case '\032':
    case '\177':
        done = TRUE;
        break;
    case 'A':
    case 'B':
    case 'C':
        /* Do something */
        break;
    case 'D':
    case 'E':
        /* Do something else */
        break;
    }
}
If you used this form consistently through your code, then your code may be more readable. I prefer the look of the switch over the nested if statement. If I need to find whether a particular value has been accounted for, it is easy to spot it in a list of case values.
Of course, if you are after maximum speed, then the test expression with the embedded assignment statement may compile to faster code. However, if you used the first form like this:

for (;;) {
    c = getchar();
    switch (c) {
    case EOF:
    case '\032':
    case '\177':
        goto end;
        break;
    case 'A':
    case 'B':
    case 'C':
        /* Do something */
        break;
    case 'D':
    case 'E':
        /* Do something else */
        break;
    }
}
end:
then your code may be just as fast as the C idiom, since the loop comparison is eliminated and replaced with a goto. You should consider a maximum-speed version only after you have checked whether your code is fast enough using a consistent C style.
You may notice that there is an unreachable statement in this code. The break following the goto is not needed. I have a tendency to leave it in for consistency's sake. It makes that set of statements for the case appear like all other sets. Likewise it is needed in the form with the done flag.
Which approach is best? Whichever one works for you consistently.
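For readers who want to compile the done-flag form, here is a minimal complete sketch of it. Several things in it are assumptions made for the illustration and are not part of the fragments above: the input comes from a FILE * parameter instead of standard input, the two "Do something" placeholders are replaced by simple counters, and the names tally and scan_stream are hypothetical.

```c
#include <stdio.h>

#define FALSE 0
#define TRUE  1

/* Hypothetical counters standing in for the two placeholder actions. */
struct tally {
    int abc;   /* number of 'A', 'B', or 'C' characters seen */
    int de;    /* number of 'D' or 'E' characters seen       */
};

/* Reads characters until EOF, Ctrl-Z ('\032'), or DEL ('\177'). */
struct tally scan_stream(FILE *in)
{
    struct tally t = {0, 0};
    int c;              /* int, not char, so EOF stays distinguishable */
    int done = FALSE;

    while (!done) {
        c = getc(in);
        switch (c) {
        case EOF:
        case '\032':
        case '\177':
            done = TRUE;
            break;
        case 'A':
        case 'B':
        case 'C':
            t.abc++;        /* "Do something" */
            break;
        case 'D':
        case 'E':
            t.de++;         /* "Do something else" */
            break;
        }
    }
    return t;
}
```

Calling scan_stream(stdin) gives the behavior of the fragments above. Adding another terminating character or another group of letters means adding one case label, which is the extensibility argued for earlier.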
Names are another area of readability that I have harped on in the past. I have just started teaching OSF UNIX internals, since it is now becoming available. Much of the kernel has been adopted from BSD. It is quite clear from the code which parts have been added. The BSD code (which was written with an eight character name limit) and the OSF code (with a 32 character limit) stand distinctly apart.
New OSF system calls include task_set_special_port() and thread_suspend(). Members of the kernel structure vm_object (a virtual memory object) include paging_in_progress, resident_page_count, and con_persist. Code can be read without resorting to an internal associative table in your mind.
As to whether the mixed-case style (aVeryLongName, sometimes loosely called Hungarian) or the underscore style (a_very_long_name) is better, I offer the following advice.
Use whichever one you (and your colleagues) agree on. You may note my preference from the examples I use in this column. Schoolday admonitions about following capitalization rules may have something to do with this approach.

Readers' Replies

Word Counting
Re the timing programs in the August CUJ (Listing 1): there's a bug in your wordcount routine. Suppose it's called with wordcount("one"). It will segfault on the line after the strchr() call: when strchr() returns NULL, *s references an invalid address.
Rich Salz
BBN
The routine, printed here in Listing 2, is not mine, but you are right. It may work on some machines (if there is a 0 at address zero), but it is wrong in its C. strchr() returns NULL when it does not find the character. In a less concise form, the code might look like Listing 3. A little less conciseness may prevent concision of your program. (KP)
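The point about strchr() can be shown with a short stand-in routine. This is a sketch only and is neither Listing 2 nor Listing 3: it assumes that words are separated by spaces, and it reuses the name wordcount purely for illustration. What matters is that the return value of strchr() is tested for NULL before it is dereferenced.

```c
#include <string.h>

/* Hypothetical stand-in: counts space-separated words in s. */
int wordcount(const char *s)
{
    int count = 0;

    while (*s != '\0') {
        while (*s == ' ')       /* skip any run of separators */
            s++;
        if (*s == '\0')         /* trailing spaces only */
            break;
        count++;                /* s now points at the start of a word */
        s = strchr(s, ' ');     /* find the end of this word */
        if (s == NULL)          /* no more spaces: that was the last word */
            break;
    }
    return count;
}
```

With this check in place, wordcount("one") returns 1 without ever looking through a null pointer.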

Order of Functions
In the July 1991 issue (Vol. 9, No. 7), Lyle O. Haga asked about the position of main(). I would like to add the following. I discovered while experimenting with device drivers for MS-DOS that the order of functions within a module, and the order of object files in a link command, were very useful. A device driver needs to know where it ends, so placing your last function (usually the initializing function) as the last in a module, and the module as the last in the list of objects for linking, enabled me to point to this function, and thus mark the end of the driver.
Another important use of this was writing firmware in parallel C for transputers. With 3L's parallel C, positioning time-critical functions first places them in the transputer's local memory, which has a much faster access time than the external memory. Note that the configuration file must also specify that the code, and not the data, should be placed in the local memory.
Adrian Kemp
Bucks, England
You have some good points. If I need to know the length of a function, I usually put some sort of dummy function after it.
In addition to the hardware you mentioned, placement of functions may affect execution time in paged machines. If functions that constantly call each other are spread over a large address space, the number of pages required for efficient program execution (called the "working set") might be significant. On a single-task system, placement has no effect unless the real memory is significantly limited.
On a multi-tasking system, however, a memory hogging program can affect its own execution, either by being swapped out or by requiring other programs to be swapped out. If time and/or space is critical in your program, you should worry about function order. (KP)