June 1992/Questions & Answers

Columns

Questions & Answers

Initializing Arrays

Ken Pugh

Kenneth Pugh, a principal in Pugh-Killeen Associates, teaches C language courses for corporations. He is the author of C Language for Programmers and All On C, and was a member of the ANSI C committee. He also does custom C programming for communications, graphics, image databases, and hypertext. His address is 4201 University Dr., Suite 102, Durham, NC 27707. You may fax questions for Ken to (919) 489-5239. Ken also receives email at kpugh@dukemvs.ac.duke.edu (Internet).
Q
I have been a subscriber to The C Users Journal for a few months. I like the magazine, and of course your "Questions & Answers" column. I would like to submit a little programming question to you.
Presently I am working with some C modules containing array definitions and initializations like these:
#define N 2
static int num [N] = { 1, 2 };
static char * nam[N] = { "first", "second" };
int fun1(int);
int fun2(int);
static int (* fun[N]) (int) = { fun1, fun2  };
The modules have to be customized frequently, varying the value of N, and the initializations. Each time I must guarantee that all the elements of the arrays have been initialized (to non-zero values). For example, if N is changed from 2 to 3, I have to guarantee that a new element has been added to the initialization of fun. One of the reasons for this is that in my modules fun[N-1] will be called without checks.
Of course, it is easy (and desirable) to introduce some runtime checks, but I am also looking for a way to check at compile time whether my arrays have been completely initialized. (This would be more useful than runtime checks.)
I tried the following solution:

static int num[] = { 1, 2 };
#if sizeof(num) != N * sizeof(num[0])
#error wrong number of initializers for num
#endif
But this solution is not accepted by Standard C because the use of the sizeof operator within #if is explicitly prohibited. I found that some compilers accept this construction, but most generate an error.
I tried this second way:

static int num[] = { 1, 2 };
int num[N];
If the number of initializers for num is different from N, the compiler produces an error (something like "redefinition of num"). If the number is exactly N, the duplicate declaration of num is accepted. But this works only for global variables. For other variables, the redefinition of num always produces an error.
In my real case, many of the arrays I want to check are global, so this second solution works fine. My problem really stems from the fact that I have many of these arrays, and their declarations are scattered and interspersed with other declarations. I would like to know whether the duplicate declaration is really Standard C or not.
But I also have some local variables, and for these I did not find a clean solution to my problem. I found only some tricky solutions, such as the following one:

static int num[] = { 1, 2 };
int dummy[ sizeof(num)/sizeof(num[0]) == N ];
If the number of initializers for num is different from N, the number of elements of the array dummy evaluates to zero, so the compiler produces an error. But I do not like this solution (I was never serious about using it).
I would be very grateful if you would tell me your opinion on this question, although I understand that the problem is not very important.
Bruno Fassino
Torino, Italy
A
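The solution that you used for global variables appears to be ANSI standard. Section 3.5.4.2 states that for two array types to be compatible, their size specifiers, if both are present, must have the same value. An initialization list that sets the size implicitly would seem to count. So if the sizes differ, the compiler should produce an error message (provided both declarations are in the same source file). One caveat: the second declaration should also be static. A file-scope object declared without a storage-class specifier has external linkage, and giving the same identifier both internal and external linkage in one translation unit produces undefined behavior.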
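Here is a minimal sketch of that file-scope check using the letter's num, nam, and fun arrays. The function bodies for fun1 and fun2, and the helpers call_last and name_of, are hypothetical additions that exist only so the fragment compiles and can be exercised. Each size-checking redeclaration repeats static, so both declarations give the array internal linkage. If an initializer list has more or fewer entries than N, the two array types are no longer compatible, and the compiler should reject the redeclaration.

```c
#define N 2

/* Hypothetical stand-ins for the letter's fun1 and fun2. */
int fun1(int x) { return x + 1; }
int fun2(int x) { return x * 2; }

static int num[] = { 1, 2 };
static char *nam[] = { "first", "second" };
static int (*fun[])(int) = { fun1, fun2 };

/* Size checks: each redeclaration is compatible with the definition
   above only if that initializer list has exactly N entries.
   Repeating "static" keeps the linkage consistent. */
static int num[N];
static char *nam[N];
static int (*fun[N])(int);

/* Calls the last function in the table without a runtime check,
   as the letter's modules do. */
int call_last(int arg)
{
    return fun[N - 1](arg);
}

/* Hypothetical accessor so nam is used. */
const char *name_of(int k)
{
    return nam[k];
}
```

Suppose N is changed to 3 and no third initializer is added to fun. The line static int (*fun[N])(int); should then fail to compile with a conflicting-types diagnostic.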
Your requirement for a compile-time error makes matters slightly more complicated. The assert macro aborts the program if the expression passed to it evaluates to false, but that check happens only at run time:

assert( sizeof(num)/sizeof(num[0]) == N);
sizeof is an operator that the compiler proper evaluates. The preprocessor does not understand it. #if directives are evaluated before any declarations or types exist, so your #if approach using sizeof is not ANSI standard.
When multiple data types must be coordinated, my standard approach is to use a structure, as in Listing 1. This keeps all the initialization values for each group together on a single line, which makes it easier to confirm that everything has been initialized properly. You would then refer to group[i].num, group[i].nam, and so forth.
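Listing 1 is not reproduced here. The following sketches the same idea under assumed names: struct group_entry, group, GROUP_COUNT, and call_group are all hypothetical. The entry count comes from the initializer list itself, so no separate N can disagree with it.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the letter's fun1 and fun2. */
static int fun1(int x) { return x + 1; }
static int fun2(int x) { return x * 2; }

/* One entry holds every value that belongs to a single index. */
struct group_entry {
    int num;
    const char *nam;
    int (*fun)(int);
};

/* Each initializer line supplies all three members together, so a
   new entry cannot be added to one "array" and forgotten in another. */
static const struct group_entry group[] = {
    { 1, "first",  fun1 },
    { 2, "second", fun2 },
};

/* Element count derived from the initializer list. */
#define GROUP_COUNT (sizeof group / sizeof group[0])

/* Calls the function stored in entry i. */
int call_group(size_t i, int arg)
{
    return group[i].fun(arg);
}
```

A line with a missing trailing member, such as { 3, "third" }, still compiles, and the missing member is set to zero (here, a null function pointer). The benefit is therefore visual rather than enforced by the compiler.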
I like your creative approach to the local variable problem. I cannot think of another way to create a compile time error. [But it is also not Standard C — pjp] Perhaps one of our readers has an answer.
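A widely used variant of the local-variable trick is offered here as a suggestion, not as the column's answer. It requests a size of -1 rather than 0 when the count is wrong. Some compilers (GCC, for one) accept zero-length arrays as an extension, but a negative size must be diagnosed. Putting the check in a typedef means no storage is allocated. When the counts match, the declaration is ordinary Standard C. The function check_local_table and its contents are hypothetical. Later revisions of the standard added _Static_assert (C11) for exactly this purpose.

```c
#define N 2

/* Returns the sum of a local table whose length is checked at
   compile time against N. */
int check_local_table(void)
{
    static int num[] = { 1, 2 };
    int total = 0;
    int k;

    /* Array size is 1 when the count matches N and -1 otherwise;
       a negative size must be diagnosed, so a mismatch fails to compile. */
    typedef char num_count_ok[(sizeof num / sizeof num[0] == N) ? 1 : -1];

    (void) sizeof (num_count_ok);   /* silences unused-typedef warnings */

    for (k = 0; k < N; k++)
        total += num[k];
    return total;
}
```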
Q
I'm writing to you because I'm baffled. I have just bought Turbo Assembler with the purpose of using inline assembly in my C programs. I typed in the plusone.c routine (Listing 2) taken from the Turbo Assembler User's Guide and checked that the procedure
tcc -S plusone.c
tcc plusone.asm
produces working code (though my plusone.asm looks a little different from the plusone.asm on pp. 262-264 of the above-mentioned manual), while the following sequence
tcc -S plusone.c
tasm plusone.asm
tlink plusone
produces something discouraging, and the resulting plusone.exe doesn't work!!! So the question is: should I care about it, consider it a bug, or what?
Vladimir Laptev
Los Angeles, CA
A
It is not a bug. You simply did not give tlink the additional files it needs to link the program completely: the startup routine and the appropriate C library. Like many other compilers, tcc automatically invokes the linker (tlink) unless you tell it not to. A compile-only or produce-assembly-output flag suppresses that step. When tcc runs tlink, it passes two additional filenames, C0x.OBJ and Cx.LIB, where x is a letter for the memory model used in the compilation (for example, C0S.OBJ and CS.LIB for the small model). When linking by hand, you must list them yourself, with something along the lines of tlink c0s plusone, plusone, , cs (adding directory paths as needed).
The source for C0x.OBJ is C0.ASM, which is in the Turbo C .\EXAMPLES\STARTUP directory. It may be educational to look at the setup it performs. You can modify it if you wish, or substitute your own if you are operating on a system other than MS-DOS. Some of its comments have been extracted and placed in Figure 1. This startup routine is where the stack segment is created.
Q
I find increasingly that my C programs do not produce good output on newer printers, particularly laser printers. For example, the printer may ignore linefeeds or carriage returns, or it may not print the characters I sent. I suspect that a printer driver has to be added to the program to use these printers. If so, this raises some questions:
1. Where can these drivers be obtained?
2. How can they be incorporated into the program?
3. Is it possible to obtain a preprogrammed file with printer drivers for the most used printers that can be linked into the program or, perhaps, be used as a header file?
Possibly the unsatisfactory output is not caused by missing printer drivers. If so, what could be the cause?
Thanking you in advance.
Gerrit M. de Wit
Boston, VA
A
There are a couple of reasons why the laser printer may not behave the way you expect. First, many laser printers do not support the extended character set (character values greater than 127) that many dot-matrix printers supported. You may need to switch to a different font and use different character values to get the same result you see on the screen. Second, if you are driving the laser printer through a serial port, the port may be set up for only seven bits. In that case the printer never receives the eighth bit, and every character above 127 arrives as a different character.
I have not come across a laser printer that ignores carriage return/line-feed combinations or form-feeds. All the printers I have used handle plain ASCII characters the same way. Problems arise only when you try to do underlining or other "fancy stuff." The Slate package from the Symmetry Group may solve many of your problems. I haven't used it myself, but I've heard of people who have.
Q
There must be a good reason (but I've never seen it) as to why there is no Standard C function for clearing a terminal screen. It seems absurd since every terminal has some way to do it and on something like a teletype it could be a line feed or a NOP.
There's not even good agreement among compiler makers for the name or location of this function. I came up with the list shown in Figure 2 for a C class I was teaching.
I would guess that a lot of programmers would be curious about the background on this matter if you would care to discuss it in your column.
Bob Patton
Arlington, TX
A
Since C and UNIX are related in their creation, the C output functions regard all output as a simple stream of bytes. There is no conception of a terminal screen in the standard language library.
The curses package, standard on UNIX and available on several other systems, comes closest to a standard way of controlling the terminal screen. Its clear function clears the screen in an essentially terminal-independent way. Curses reads an environment variable (TERM) that gives the terminal type, then looks up the appropriate commands for that terminal in a data file. The curses library also provides for positioning the cursor on the screen, setting attributes (e.g., highlighting), and other functions.
If the curses library had been ported to the IBM-PC at the very beginning of the PC's existence, most compilers would probably have supported it. However, to my knowledge the port came several years later. By then, compiler manufacturers had come out with their own functions and put them in their own unique places.
The clrscr routines that you listed usually call the BIOS (on IBM-PCs) to clear the screen. These are the least portable of all the methods. The ANSI method sends the ANSI escape sequence for clearing the screen to the terminal. If the terminal is ANSI-compliant (and most terminals have such a mode), it clears its screen when it receives that sequence. If ANSI.SYS is loaded on an IBM-PC, the method works on that machine as well. This is the most portable approach.
Note that this lack of uniformity extends to almost any function that refers to MS-DOS. My approach to the problem has been to define my own set of functions, which then call the appropriate compiler functions. For example, I use a routine that looks like this:
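#include <stdio.h>

void clear_screen(void)
   {
   printf("\033[2J");   /* ANSI "erase display" sequence */
   fflush(stdout);      /* send it to the terminal now */
   }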
All my code uses the clear_screen function, so moving to another compiler is simply a matter of changing it to
void clear_screen(void)
   {
   clrscr();
   }
or whatever may be required.
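The same wrapper can also choose its implementation at compile time, so it does not have to be edited by hand. This sketch assumes that Turbo C predefines __TURBOC__ and declares clrscr in <conio.h>. Every other compiler gets the ANSI sequence. To keep the output easy to redirect or check, it also writes to a caller-supplied stream through a hypothetical clear_screen_on function. clear_screen simply passes stdout.

```c
#include <stdio.h>

#ifdef __TURBOC__
#include <conio.h>      /* Turbo C's clrscr() is declared here */
#endif

/* Clears the screen. With Turbo C the console routine clrscr() is used
   and the stream is ignored; elsewhere the ANSI "erase display"
   sequence is written to the given stream. */
void clear_screen_on(FILE *fp)
{
#ifdef __TURBOC__
    (void) fp;
    clrscr();
#else
    fputs("\033[2J", fp);
    fflush(fp);
#endif
}

/* The routine the rest of the program calls. */
void clear_screen(void)
{
    clear_screen_on(stdout);
}
```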
[X3J11 considered defining a set of screen-control functions, but rejected the idea in the end. At the time, we felt that we didn't know enough to choose a good set. I think we were wise not to try then. — pjp]

Readers' Replies

Conditional Expression Operators
I read your comments in The C Users Journal on the conditional expression operator, and I wanted to comment on behalf of us embedded-systems programmers (who are always concerned about space).
The conditional operator uses less code space! Although many purists out there claim that compilers should take care of optimizations, this claim is idealistic and not realistic.
One example is

strcpy(to_string, flag ? from_string_true : from_string_false);
versus:

if (flag)
   strcpy(to_string, from_string_true);
else
   strcpy(to_string, from_string_false);
where there are two calls to strcpy, and the pointer to to_string is passed in two places instead of one.
As your comments suggested, this is not always the best way to save space, and judgment by knowledge or trial and error should always be employed when attempting to write tight code.
Stephan Warre
Santa Cruz, CA
I agree with your suggestion to use the conditional operator to save space. There are two reasons for using everything that C has to offer: space and time. To make a Z-80-based CP/M system (with a maximum of 64KB) work reasonably well without resorting to assembly language, you had to use every trick in the book at some point in the code. Embedded systems with limited speed and memory size require the same care.
With more recent systems, there is usually enough memory and speed to write code in whatever style you like. With today's more complex user interfaces, the speed you gain by making your code perfectly efficient may not be very noticeable. Suppose half the execution time is spent in window functions that you do not control. Even if you cut your own code's execution time by a factor of ten, you would cut the total program time by only about half.
As a caveat, I am not suggesting that you ignore code optimization entirely. Suppose sloppy programming makes your own code take ten times as long. The total time for a windowed program might then grow by about four and a half times its original length, to roughly five and a half times in all. Using the proper algorithm is usually much more important than worrying about whether two statements are slower than a single statement. (KP)
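Here is the arithmetic behind those two estimates. Like the text, it assumes that exactly half of the original run time T is spent in window functions you do not control:

```latex
\begin{aligned}
T_{\text{faster}} &= 0.5\,T + \frac{0.5\,T}{10} = 0.55\,T
  &&\text{(a 45\% reduction, roughly half)}\\
T_{\text{sloppy}} &= 0.5\,T + 10 \times 0.5\,T = 5.5\,T
  &&\text{(an increase of } 4.5\,T\text{)}
\end{aligned}
```

This is the reasoning behind Amdahl's law: the part of the run you cannot change limits how much any improvement to the rest can help.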

Conditional Operator
Your Q&A had a good discussion of the ternary operator. I tend to shy away from casual use of it in ordinary code, but there is one place where it really pays its way: in macros. Typically, when you're writing a macro, you need an expression, not a statement. That's where the ternary operator shines. For a trivial example:

#define SAFESTRING(s) ( ((s) != NULL) ? (s) : "Error: Unsafe string")
You can certainly do a

printf("%s\n", SAFESTRING(s));
but try to accomplish the same thing with if/then and you're at a dead end. Note that here we're using the ternary operator to simplify program structure through a macro, not scattering it all over the code. If the macro is written correctly and properly protected, no maintainer should have to figure out the logic.
Thanks for the great column!
Randy Fay
Evolving Systems, Inc.
Hiding a conditional operator in a macro is perfectly fine by me. It is practically the same as hiding the details of a function. Your example macro could have been coded as

char *SAFESTRING(char *s)
   {
   char *ret;
   if (s == NULL)
      ret = "Error: Unsafe string";
   else
      ret = s;
   return ret;
   }
Using a macro may save a bit of overhead in calling the function. (KP)
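One practical difference between the two forms is worth a sketch. When its argument is non-NULL, the macro evaluates that argument twice (once in the test and once as the result). The function evaluates it only once. An argument with side effects therefore behaves differently. The helper next_name and the counter calls are hypothetical. The function is renamed safe_string here only so it can coexist with the parenthesized macro.

```c
#include <stddef.h>

#define SAFESTRING(s) (((s) != NULL) ? (s) : "Error: Unsafe string")

/* Function version: evaluates its argument exactly once. */
const char *safe_string(const char *s)
{
    return (s != NULL) ? s : "Error: Unsafe string";
}

/* Hypothetical argument with a side effect: counts how often it is called. */
static int calls = 0;

static const char *next_name(void)
{
    calls++;
    return "widget";
}
```

With SAFESTRING(next_name()), next_name runs twice. With safe_string(next_name()), it runs once. The macro saves the call overhead, but its argument must be free of side effects.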

Conditional operator
Instead of

i = (7 > 5 ? (10 > 9 ? 4 : 7) : (3 > 1 ? 6 : 2));
or

if (7 > 5)
   if (10 > 9)
      i = 4;
   else
      i = 7;
else
   if (3 > 1)
      i = 6;
   else
      i = 2;
how about

i = (7 > 5
        ? (10 > 9 ? 4 : 7)
        : (3 > 1 ? 6 : 2));
Not only is it clearer than the first two, but because the redundancy has been removed, you can also do interesting things on the left side of the "=" sign that would look far more complex in the second form above. E.g.

*(4 > 5 ? &x[f(z)] : &i) = (7 > 5 ? (10 > 9 ? 4 : 7) : (3 > 1 ? 6 : 2));
The key issue is factoring vs. redundancy. The use of the if statement to simplify is largely a red herring since the main contribution was to graphically structure the conditional. As I have shown, this is trivial to accomplish with conditional expressions. On the other hand, comparing a conditional expression to a case statement is not quite fair since you are simply swapping one factoring (the printf statement) for another (the comparison). The correct solution would be a case expression which, of course, does not exist in C.
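In C, unlike C++, a conditional expression does not yield an lvalue. A conditional target for an assignment therefore has to choose between addresses and then dereference the result. Here is a minimal sketch. The function assign_selected and the stand-in f are hypothetical, and the code assumes z is non-negative and x has at least four elements.

```c
/* Hypothetical stand-in for the letter's f(); assumes z >= 0. */
static int f(int z)
{
    return z % 4;
}

/* Stores the nested-conditional value either in x[f(z)] or in *i.
   The conditional picks an address; the assignment goes through it. */
void assign_selected(int x[], int z, int *i, int select_array)
{
    *(select_array ? &x[f(z)] : i) =
        (7 > 5 ? (10 > 9 ? 4 : 7)
               : (3 > 1 ? 6 : 2));
}
```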
Ed Ipser Solomon
I agree with you that it is partly a matter of redundancy versus factoring. I tend to stay with nested if logic, since that is how it would be done in most other languages.
I prefer simple expressions — as the advertisement goes, I'm a simple kind of guy. My rule of thumb is that an expression should fit on a single line (or two lines if you use long names). Anything longer should be broken up into multiple statements. This is just a guideline. It is not meant as an absolute rule. (KP)

void * and int *
I was just reading your column in the November 1991 issue of The C Users Journal, where you responded that you didn't know of any systems where void * pointers and int * pointers were different sizes. I develop software using C on an HP1000 system (Hewlett Packard's real-time system). The two pointers are not different sizes there, but they do have different representations. An int * pointer holds a word address, and a void * pointer holds a byte address. So, although the pointers are the same size, I believe they still will not work with the typedef that you provide.
Greg Long
General Motors Research Laboratories
On most systems, void pointers can point to any address in the system; in fact, ANSI requires them to have the same representation as a char pointer. Other pointers, such as int * and double *, need only be able to point to addresses that meet the alignment requirements of the processor. Under ANSI C, you could write the following without a compiler error (but perhaps with a warning):

void *void_pointer;
int  *int_pointer;
int  *another_int_pointer;
char *char_pointer;
char *another_char_pointer;
char c;
char another_c;
int  i;
int  another_i;

char_pointer = &c;
void_pointer = char_pointer;
int_pointer = void_pointer;
int_pointer will contain an invalid address for an int if the variable c is not stored on a word boundary. Everything is legal (although immoral) so far. But suppose you tried to use the invalid address, as in

i = *int_pointer;
then the program may abort on a memory access violation. The ANSI rules state that a pointer to a particular type can be converted to a void pointer and back to the original type pointer without any problem. So you could code

another_char_pointer = void_pointer;
another_c = *another_char_pointer;
You can also code in ANSI

int_pointer = &i;
void_pointer = int_pointer;
another_int_pointer = void_pointer;
another_i = *another_int_pointer;
If your HP system does not allow this, then it is not ANSI-compliant. The C standard section 3.2.2.3 states that "a pointer to any incomplete or object type may be converted to a pointer to void and back again; the result shall compare equal to the original pointer."
This does not require void and int pointers to have the same representation. void pointers cannot be dereferenced, so their representation is immaterial. It simply means that converting a pointer to void * and back again must not lose any information in the address. (KP)
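The guarantee from section 3.2.2.3 can be written as a small check. On a machine like the HP1000, the bits stored in the void pointer may differ from those in the original (a byte address versus a word address). A conforming implementation must still make the comparison succeed. This is a sketch that has not been tried on an HP1000, and the function names are hypothetical.

```c
/* Converts an int pointer to void * and back, returning nonzero if
   the result compares equal to the original pointer. */
int int_roundtrip_ok(int *p)
{
    void *v = p;      /* int *  -> void * */
    int *back = v;    /* void * -> int *  */
    return back == p;
}

/* The same round trip for a char pointer. */
int char_roundtrip_ok(char *p)
{
    void *v = p;      /* char * -> void * */
    char *back = v;   /* void * -> char * */
    return back == p;
}
```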
