November 1990/Questions & Answers

Columns

Questions & Answers

Pointers, Padding, And Poking

Ken Pugh

Kenneth Pugh, a principal in Pugh-Killeen Associates, teaches C language courses for corporations. He is the author of C Language for Programmers and All On C, and is a member of the ANSI C committee. He also does custom C programming for communications, graphics, and image databases. His address is 4201 University Dr., Suite 102, Durham, NC 27707. You may fax questions for Ken to (919) 493-4390. When you hear the answering message, press the * button on your telephone. Ken also receives email at kpugh@dukeac.ac.duke.edu (Internet) or dukeac!kpugh (UUCP).
Q
I am a new and very impressed subscriber to CUG, and a C programmer of roughly three years.
My question here deals with the acquisition of the size of a data object: ye olde array of pointers to type char.
To my question, I find no decent explanation, all the way from the second edition of K&R to my TURBO C User's Guide. None of my C programming friends or associates have given me an answer worth sleeping over, hence, this letter to you!
Of course, I've enclosed code (see Listing 1). With the array of character pointers called message, I've done a bunch of printf's to spew out information regarding its location and such. To no surprise, all of the information is just as expected, according to every description of array and pointer interchangeability I've read. All except one, that is.
Since the name of the array (message) is a pointer containing the address where the array starts (&message[0]), why can't I get the actual size of the array in bytes by doing a sizeof(message) in the final printf statement? If the variable name message is a pointer to the start of the array, it should be only two bytes long (on 16-bit machines anyway). Normally, an array is sized by multiplying the number of elements by the size of the data type. If I copy the array name to another suitable variable of type (char *[]), say for example
char *array_copy[]
then a sizeof(array_copy) returns a size of 2, as expected. Maybe I need a doctor to cure a blind eye, because I just don't see the logic. Please help.
Thanks so much for listening!
Peter Upczak
Santee, CA
A
One statement in your question needs clarification. The name of the array (message) is not a pointer containing the address of where the array starts. (That implies that the name has its own memory location.) The name of the array represents two items. When used in an expression, it is a constant value of type pointer, which is the address. When used as the operand of sizeof, it represents the entire object.
In your example, the array message is a two-element array of pointers. Thus, its size is two times the size of a pointer (two bytes) or four bytes.
Confusion about arrays/pointers exists because the same notation is used in two different places and has two different meanings. It can also be used with different effect, both to declare a variable and to declare a parameter. I'll use your example with a bit more description for the rest of the readers.
The array subscript notation does not declare an array when used to declare a parameter. The array subscript notation states that the parameter it is receiving will be a pointer that holds the starting address of an array. The parameter acts like a local variable and is truly a pointer.
In your function new_array_name, char *array_copy[] declares that array_copy will receive an address which it will perceive as the starting address of an array of pointers to char. You could have declared it as char **array_copy. In a parameter declaration, these two mean exactly the same thing. When you pass an array name to this function, the address value is passed and initializes the parameter value.
The compiler allows you to declare the parameter as char *array_copy[2]. It ignores the size declaration (since an array is not being declared in the parameter list) and simply treats it as if you had written char *array_copy[];
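The difference between an array and a parameter declared with array notation can be checked directly with sizeof. The following is only a sketch: the function names are made up for illustration and are not taken from Listing 1. Some modern compilers warn about applying sizeof to an array-style parameter, which is exactly the situation being demonstrated.

```c
#include <stddef.h>

/* Hypothetical helper: the parameter uses array notation, but inside
   the function it is really a variable of type char **. */
size_t size_seen_by_callee(char *array_copy[])
{
    return sizeof(array_copy);      /* size of one pointer, not of an array */
}

/* Hypothetical helper: here message is a genuine array object. */
size_t size_seen_by_owner(void)
{
    char *message[2] = { "aaaa", "bbbbb" };

    return sizeof(message);         /* two pointers' worth of bytes */
}

/* The element count can only be computed where the array itself,
   rather than a pointer to its first element, is visible. */
size_t element_count(void)
{
    char *message[2] = { "aaaa", "bbbbb" };

    return sizeof(message) / sizeof(message[0]);
}
```

On a 16-bit small-model compiler the first function would return 2 and the second 4; on other machines the values scale with the size of a pointer.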
Suppose you had the variables declared all in one function as:
function ()
 {
 char *message[2] =
   {"aaaa", "bbbbb" };
 char **array_copy;
 array_copy = message;
 }
The size of message is four bytes (two pointers of two bytes each) and the size of array_copy is two bytes (size of a pointer). The value of message in the assignment is simply the address of the array (two bytes long), and it is assigned to array_copy.
Note that in declarations of local variables, you cannot declare an array without explicitly or implicitly stating its size.
auto char *array_copy[];
is an illegal declaration. Therefore you cannot copy the array name message to this variable in a function.
To qualify the preceding statement, let me say that there are some conditions under which this can be done. You can make a reference to a global array using this syntax. For example:
extern char *message[];
is a valid reference to an external array of char pointers which is defined elsewhere in the program. However, you cannot apply sizeof to message, unless the size of the array is declared elsewhere within the file.
If I still haven't made myself perfectly clear, Steve Clamage of TauMetric offers another explanation:
I just read your dissertation on array vs. pointer in the August C Users Journal. This seems to be the most misunderstood subject in C.
I have found a simple way to explain the difference in the declaration of an array vs. a pointer:
The declaration
char data[10];
means that data is the address of an array of 10 chars. So when you later use data[i], the meaning is to add i to the address of the data array and retrieve the char at that address.
The declaration
char *data;
means that data is the address of a pointer which contains the address of a char (or the first address of an array of chars). So when you later use data[i], the meaning is to add i to the contents of the pointer stored in the variable data and retrieve the char at that address. (-SC)
In short, if T is a type, then pointer to T is not the same as array of T. When you declare an object in more than one place, use the same declaration for it each time. (-KP)
Q
Perhaps you can give me an answer to a question I have regarding user-defined data types.
In Pascal (at least in Apple Pascal) you can use the following statement to create a data type of LONG:
TYPE LONG : INTEGER[36];
Variables of type LONG can represent integers up to 36 digits in length and can be operated on by all the arithmetic operators with the exception of MOD.
Is there any possible way to achieve this in C?
Another question I have, has to do with necessity of casting when assigning the value of a pointer to one data type to a pointer to a different data type.
Both of the enclosed programs (Listing 2 and Listing 3) run perfectly on my system (Aztec C running on an Apple IIc). Is this just a non-standard quirk of my system or is it that the assignment operator, = , transfers only the value of a pointer and not the attributes (scalar multiple)?
Also note in June issue of CUJ, (page 98, Listing 3, line 14), the return of the function, malloc, supposedly a pointer to char is assigned without a cast to variable which has been defined to be a pointer to a struct.
Stanley Cohen
N. Valley Stream, NY
A
You could write your own functions to perform arithmetic of the type you suggest. Robert Ward used to sell a package that performed something like what you suggest.
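As a rough idea of what such functions might look like, here is a minimal sketch that stores a 36-digit value as an array of decimal digits and adds two values the pencil-and-paper way. The type BigInt and the function names are invented for this example; it handles only non-negative values and addition, and it is not the package mentioned above.

```c
#include <string.h>

#define LONG_DIGITS 36              /* mirrors INTEGER[36]; name is made up */

typedef struct {
    unsigned char digit[LONG_DIGITS];   /* digit[0] is least significant */
} BigInt;

/* Convert a string of decimal digits such as "12345" into a BigInt.
   Returns 0 on success, -1 if the string is empty, too long,
   or contains a non-digit. */
int big_from_string(BigInt *n, const char *s)
{
    size_t len = strlen(s);
    size_t i;

    if (len == 0 || len > LONG_DIGITS)
        return -1;
    memset(n->digit, 0, sizeof n->digit);
    for (i = 0; i < len; i++) {
        char c = s[len - 1 - i];

        if (c < '0' || c > '9')
            return -1;
        n->digit[i] = (unsigned char) (c - '0');
    }
    return 0;
}

/* sum = a + b, one decimal digit at a time.
   Returns the final carry: 1 means the result did not fit. */
int big_add(BigInt *sum, const BigInt *a, const BigInt *b)
{
    int carry = 0;
    int i;

    for (i = 0; i < LONG_DIGITS; i++) {
        int t = a->digit[i] + b->digit[i] + carry;

        sum->digit[i] = (unsigned char) (t % 10);
        carry = t / 10;
    }
    return carry;
}

/* Write the value into out (at least LONG_DIGITS + 1 bytes)
   without leading zeros. */
void big_to_string(const BigInt *n, char *out)
{
    int i = LONG_DIGITS - 1;
    int k = 0;

    while (i > 0 && n->digit[i] == 0)
        i--;                        /* skip leading zeros, keep one digit */
    while (i >= 0)
        out[k++] = (char) ('0' + n->digit[i--]);
    out[k] = '\0';
}
```

Subtraction, multiplication, and division would follow the same digit-by-digit pattern. Unlike Pascal, C gives you no way to make the built-in + operator apply to such a type, so each operation has to be a function call.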
The K&R versions of C, such as Aztec on the Apple, did not distinguish much between types of pointers. You can freely assign a pointer to one data type to a pointer to another data type without a single complaint from the compiler. Only the value of the pointer (the address) is transferred.
Freely passing around addresses without some control can lead to particularly horrendous debugging problems. Thus the ANSI standard tightened up the assignment of pointer values. You can only assign a pointer to a particular data type to a pointer to the same type. You may do the assignment with a cast, as:
cptr = (char *) matA;
cptr = (char *) recptr;
ANSI C now has a pointer to void type. This is like "type O" blood — the universal donor. A pointer to any type can be assigned to a pointer to void. Likewise a pointer to void can be assigned to a pointer to any type. You could declare and assign without a cast as:
void *void_pointer;
void_pointer = matA;
void_pointer = recptr;
You cannot increment a void pointer, since there is no memory size associated with void, so void_pointer++ is illegal.
The malloc function is now declared as returning a void pointer:
void *malloc();
Therefore, the return from malloc can be assigned without a cast. I tend to use the cast anyway out of habit, but it is not necessary under ANSI C.
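The rules for void pointers can be seen together in a short sketch. The structure struct record and both function names are stand-ins invented for this example, since Listing 2 and Listing 3 are not reproduced here. Note that this is C behavior; C++ is stricter about converting from void *.

```c
#include <stdlib.h>
#include <string.h>

struct record {                 /* stand-in for the reader's record type */
    int  id;
    char name[20];
};

/* Allocate and fill one record. Under ANSI C the void * returned by
   malloc converts to struct record * without a cast. */
struct record *make_record(int id, const char *name)
{
    struct record *recptr = malloc(sizeof *recptr);

    if (recptr == NULL)
        return NULL;
    recptr->id = id;
    strncpy(recptr->name, name, sizeof recptr->name - 1);
    recptr->name[sizeof recptr->name - 1] = '\0';
    return recptr;
}

/* Round trip through void *: an object pointer converts to void * and
   back to its original type without a cast. */
int record_id_via_void(struct record *recptr)
{
    void *void_pointer = recptr;            /* no cast needed */
    struct record *again = void_pointer;    /* no cast needed here either */
    char *cptr = (char *) recptr;           /* other pointer types need one */

    (void) cptr;                            /* unused; shown for contrast */
    return again->id;
}
```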
There is one instance where this tightening of pointer types appears to be a slight bother. Plain char, unsigned char and signed char pointers are all treated as different types. Thus you need to use a cast to make an assignment. This makes initializing unsigned char pointers a little messy. A string literal is of type pointer to plain char. So the declaration must read:
unsigned char *p = (unsigned char *) "abc";
Note that you could have written out the matrix in Listing 2 as:
write(fd, matA, sizeof(MAT));
Even though the old description of write shows that it requires a char pointer, it will accept matA as the address. Under ANSI C, the second argument to write is described as a pointer to void. Thus, it is perfectly legal to pass it matA.
Q
I have a question that I hope you can help me with. I recently started working with a new compiler from Lattice which claims ANSI conformance. In the following program fragment:
char *ptr;
((long *) ptr)++;
it complains about requiring an lvalue for ++. I've used constructs like this for years on various compilers, but ANSI seems to have changed a lot of what I "knew." In speaking with Lattice about the problem, they said that this particular treatment of "casts not being lvalues" was required by ANSI, in the name of "portability."
I realize that what I want is not completely portable, however it is more portable than the alternative (assembly language). I see no reason to restrict such useful behavior for all machines — it could be an error on machines that cannot perform the required conversions, and perhaps a "portability warning" on machines that can.
My current workaround is to use:
char *cptr;
long *lptr;
lptr = (long *) cptr;
lptr++;
cptr = (char *) lptr;
which seems like longhand for the previous fragment, and it is no more portable. Also, it produces worse code, especially when dereferencing (*) the pointer along with ++ (even after optimization.)
If ANSI has removed or standardized this functionality, then it seems we have lost a valuable feature of the language. I hope you can shed some light on this situation.
Hans-Gabriel Ridder
Colorado Springs, CO
A
As I mentioned in a previous column, the standard states that "a cast converts the value of the expression to the named type." It also states that a "cast that specifies an implicit conversion or no conversion has no effect on the type or value of an expression."
The ++ operator can only be applied to an lvalue. An lvalue represents a memory address where a value can be stored. (It is that which can be on the lefthand side of an assignment statement.) The result of a cast is a value, not an lvalue, so the ++ operator cannot be applied to it, and Lattice is correct in its statement.
Alternatively you could use:
lptr = (long *) cptr;
cptr = (char *) ++lptr;
which would cut down one statement. I prefer:
cptr += sizeof(long);
This should not generate much more code (if any) than ((long *) ptr)++. I prefer maintaining this construct in code, since it more clearly shows what the address will be incremented by. If you want, you can dereference it to a single char as
*(cptr += sizeof(long))
instead of using
* (char *) (((long *) ptr)++)
I don't think ANSI tried to standardize this out of the language. It was simply a loose end that got tightened up. [You can also get the old behavior by writing:
(*(long **) &ptr)++
Requiring the address-of operator and an explicit dereference is wordier, but makes for a more consistent language. -Ed.]
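The preferred form is easy to wrap up and check. This is only a sketch; the two function names are invented for illustration, and neither one treats a cast as an lvalue.

```c
/* Step a char pointer forward by the size of one long. */
char *skip_long(char *cptr)
{
    cptr += sizeof(long);
    return cptr;
}

/* Advance the caller's pointer past one long and fetch the char found
   there; this has the effect of *(cptr += sizeof(long)). */
char next_char_after_long(char **cpp)
{
    *cpp += sizeof(long);
    return **cpp;
}
```

Because the arithmetic is done on a char pointer, no long is ever accessed through a possibly misaligned address.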
Q
I have been writing C programs for several years and there is a point I find puzzling. Most of the programs involved some sort of manipulation of data bases. Structures are handy tools for this, but I understand C does not guarantee that the elements of structure will be contiguous in memory. One should therefore not make a declared structure the recipient of a record read from a disk, but should read each field or subfield individually into its place in the structure. This can be dreadfully slow. If one declares an array big enough for a record, then the record can be read in one operation. Pointers or indices can access the data in the fields and subfields, which is much slower than accessing elements of a structure. The same situation exists in writing from memory to a file.
I have worked out a method that has produced code that runs day after day without problems. (If it works, it must be legal C?) I declare a structure and a pointer to this structure. Then I set up an array big enough to hold the structure and initialize the structure pointer to point to the array. Now, I've indirectly created a structure that is in contiguous memory. I can read or write with the array as the target, and access the elements of the structure through the structure pointers. Both of these are fast operations. Is this a proper use of C capabilities?
R. Palmer Benedict
Wellesley, MA
A
Structure members are not guaranteed to be contiguous because of access considerations. On some machines, you can only access numerical values on word or double word boundaries. If characters are intermixed with numerical values, then packing bytes are inserted after some characters to round out the number to a word or double word boundary. These bytes are normally invisible to the programmer. However for a particular machine/compiler combination, they will always occur in the same place. The example program in Listing 4 will write out two structures and read them back in. This will work regardless of the packing arrangement.
If you write the structure to a disk file and read it back in on the same machine, everything will work fine. But, if you write it to a disk file on one machine (say an MS-DOS 8086) and read it back in on another type of machine (say a Macintosh 68000), it won't read correctly. The packing bytes and the internal representation of the values could be different.
Even with the same compiler, the structure may be packed differently. The Microsoft compiler has a -Zp option, which packs structures and eliminates packing bytes. This is possible since there are no word alignment requirements on the 8086 line. Without this option, packing bytes are added to align integers to even byte boundaries.
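To see how much packing a particular compiler inserts, and to confirm that a whole structure survives a block write and read on the same machine, something along these lines can be used. It is a sketch in the spirit of Listing 4, not a copy of it; struct sample and the function names are made up for the example.

```c
#include <stddef.h>
#include <stdio.h>

struct sample {             /* hypothetical record mixing member sizes */
    char   tag;
    long   count;
    char   flag;
    double value;
};

/* Number of bytes in the structure that belong to no member. */
size_t padding_bytes(void)
{
    return sizeof(struct sample)
         - (sizeof(char) + sizeof(long) + sizeof(char) + sizeof(double));
}

/* Write one structure to fp as a single block, then read it back
   into *out. Returns 1 on success, 0 on any I/O failure. */
int round_trip(FILE *fp, const struct sample *in, struct sample *out)
{
    if (fwrite(in, sizeof *in, 1, fp) != 1)
        return 0;
    rewind(fp);
    return fread(out, sizeof *out, 1, fp) == 1;
}
```

The offsetof macro from <stddef.h> reports where each member actually starts, for example offsetof(struct sample, count), which is a convenient way to compare the layouts produced by two compilers or by two settings of a packing option.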
Q
Looking through some old listings, I noticed that some C programs can only be run in Turbo C because of the Peek and Poke functions. Other DOS and BIOS functions are in Microsoft C under different names, but not Peek and Poke. Could you please make some Microsoft C equivalents?
Anthony Whitford
Sidney, BC CANADA
A
Although I will present these functions for you, I do not suggest writing a C program that peeks and pokes (other than for poking around in screen memory). You didn't say what memory model you want them for, so let me give you some simple equivalents. Note that you will have to pass each function a far pointer (a four-byte pointer). In a constant such as 0xB8000000, the high-order two bytes are the segment value and the low-order two bytes are the offset. See Listing 5.
For example:
poke(0xB8000000, 'A');
puts an 'A' into screen memory. You can always directly access memory without using a function. If you are using the large or compact memory model, you could use a char pointer. For example:
char *pc = (char *) 0xB8000000;
*pc = 'A';
or even:
*((char *) 0xB8000000) = 'A';
Q
I'm an amateur programmer working with Turbo-C V2.0. I have a problem solving a program that holds a large static array of structures. When compiled under Compact or Huge Memory model, the same error appears:
"Too much global data defined in file"
Could you help me solve this problem? (The program is shown in Listing 6.)
Thank you.
Abdel Hindi
Montreal, Canada
A
I'm not sure how big the n is in your structure template, or how big the strings were that you were using. I presume either or both must be fairly large. With a compact model, the size of an array is limited to 64K bytes. A value of n greater than 64K bytes / 4 bytes / 200 elements, or about 80, would be too big.
If n was smaller than this, then the error is due to the total length of the character strings. I can only find implicit references to an assumption that constant data in a module is placed in one segment and that there is no way to specify this to be otherwise. Thus, the total length of the character strings in a single source file cannot exceed 64K. With n equal to 10 and an average string length of 35 bytes, you would exceed this figure (10 times 200 times 35 is 70,000). If a reader has any other ideas on this, please let me know.
To overcome this limitation, you could read a file to initialize the character data. You could read it a string at a time into a buffer, determine the length, perform a malloc() to get space to put it, copy the buffer into that space, and assign the pointer to each member of each element in OBJECT. Since fgets returns the string with a new-line character, you need to take it out.
The code would look something like the code in Listing 7, which does not include error checking.
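The steps described above can be sketched as follows. This is not Listing 7: the names read_string, load_strings, and MAX_LINE are invented for the example, and a flat table of char pointers stands in for the members of OBJECT, whose layout is defined in Listing 6.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LINE 256            /* assumed upper bound on one string */

/* Read the next line from fp into freshly allocated memory, without
   the trailing new-line that fgets keeps. Returns NULL at end of file
   or if malloc fails; the caller frees the result. */
char *read_string(FILE *fp)
{
    char buffer[MAX_LINE];
    char *copy;
    size_t len;

    if (fgets(buffer, sizeof buffer, fp) == NULL)
        return NULL;
    len = strlen(buffer);
    if (len > 0 && buffer[len - 1] == '\n')
        buffer[--len] = '\0';   /* take out the new-line */
    copy = malloc(len + 1);
    if (copy != NULL)
        strcpy(copy, buffer);
    return copy;
}

/* Fill table[0..count-1] with strings read from fp. Returns how many
   entries were actually filled. */
size_t load_strings(FILE *fp, char *table[], size_t count)
{
    size_t i;

    for (i = 0; i < count; i++) {
        table[i] = read_string(fp);
        if (table[i] == NULL)
            break;
    }
    return i;
}
```

Since each string now lives in memory obtained from malloc rather than in the module's constant data, the 64K limit on one source file's strings no longer applies.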
Q
I've been stung by a canned software package and I can't stop the burning! While I was tuning an application consisting of a window package, an expert system, a mountain of application code, and an embedded database, I discovered that the application wasn't using the system's free() and malloc().
After much research I found that the database package that we integrated with had developed its own free and malloc routines which performed more slowly than the system free and malloc for my window package.
I tried relinking with the C library listed first in the list of libraries, but that gave me a double declaration error. Next I tried extracting free and malloc from their libraries; however, when I tried relinking, I had unresolved externals!
Any suggestion?
Robert Schweiss
College Park, Maryland
A
The solution depends on the compiler/linker you are using. Some linkers allow you to ignore double-declaration errors and continue with the link. For example, Microsoft's linker has a /NOE option. Be careful if you use it, for it also allows the link to complete if there are missing externals.
When you remove a module from a library, all functions in that module are removed. The module that contained free and malloc probably also contained other functions that were required. You need to include a file with equivalent functions and the same names in your link before you access that library. For example, if the original source file for the package contains code as in Listing 8, you will need to create a file with some_other function in it that does the same thing as the one in the library.

Obscure Name Contest
See Table 1 for some unusual names that I have authored or run across in my years of programming.

Readers' Replies

More on LPT1, etc.
In the recent issue of TCJ (July '90) you responded to Chaiyos Gosolsatit of Lewiston, NY about how to print the output to printer 2 (LPT2) instead of LPT1 (stdprn). You suggested that he use the MS-DOS MODE command on the DOS command line before executing the program. This is a roundabout way of doing what could be done in a much more practical manner (I think).
The following program will do what he wants in a more straightforward manner. I've used fopen ("LPT2", "mode") on several occasions with both LPT1 and LPT2 (and I would guess that LPT3 would work as well) to allow the user to select which printer, local or remote, on a network he wishes to print to. See Listing 9.
Mike Fox
Ohio State
A similar reply was received from Ian Cargill of Surrey, England.

Include Filenames
I am replying to Mr. Jim Howell's question in the C Users Journal July 1990, p. 92. On the attached sheet you'll find INCTEST.C (Listing 10) which solves the problem for Turbo C 2.0 and QuickC 2.0. In Turbo C it is sufficient to simply define MH_PRG to the corresponding application name (progl in the example). Doing the same in QuickC produces the include directive #include <MH_PRG_A.h> which is not what we want. Although your suggestion to use #ifdefs is certainly the most portable way to solve this problem, I definitely prefer the more elegant Turbo C 2.0 behavior. I use a similar construct to include user-specific source files in my applications. Because each user has his own serial number I only need to write a file named r_usrXXX.c where XXX is the user's serial number. The application then automatically includes this file. This saves me the time of expanding all the #ifdefs for (hopefully) lots of new users. See Listing 10. Thank you for your great column.
Matthias Hansen
Rendsburg, West Germany
