October 1992/Q & A

Q & A

Linked-List Functions

Ken Pugh

Kenneth Pugh, a principal in Pugh-Killeen Associates, teaches C language courses for corporations. He is the author of C Language for Programmers and All On C, and was a member of the ANSI C committee. He also does custom C programming for communications, graphics, image databases, and hypertext. His address is 4201 University Dr., Suite 102, Durham, NC 27707. You may fax questions for Ken to (919) 489-5239. Ken also receives email at kpugh@dukemvs.ac.duke.edu (Internet).
Q
At the moment I am writing a fairly large program using more than one linked list. What I would like to know is whether it is possible to write just one single Store-Function for the different linked lists using a void pointer. I use Microsoft's QuickC, and I know that the realloc function for example takes a void pointer as an argument just like I would like my Store-Function to, but I just can't get it to work.
Thomas Dalgaard
Dumfries, VA
A
It is possible to write a single set of linked-list functions that work on multiple data types. There are two ways to do this in Standard C, plus an additional way that can be coded in Microsoft C. Each link can either point to the data for that link or contain the data itself.
In the first method, each link looks like
struct s_link
    {
    struct s_link * next_link;
    void * pointer_to_data;
    };
To store a value in the list, you pass a pointer to the data. The function would appear something like Listing 1.
Notice that the size of the data being pointed to is not passed. It is the responsibility of the calling program to keep the address of the data valid until the link is removed from the list or the list destroyed. To concentrate on the area of interest, error handling and setting the link address are not shown.
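A minimal sketch of this first method might look like the following. The names store_pointer and destroy_links are assumptions made for illustration, not the names used in Listing 1, and the new link is simply pushed onto the front of the list.

```c
#include <assert.h>
#include <stdlib.h>

struct s_link
    {
    struct s_link * next_link;
    void * pointer_to_data;
    };

/* Hypothetical function: pushes a new link onto the front of the list.
   Returns the new head, or NULL if the allocation fails (the existing
   list is left untouched in that case). */
struct s_link * store_pointer(struct s_link * head, void * pointer_to_data)
    {
    struct s_link * new_link;

    assert(pointer_to_data != NULL);
    new_link = malloc(sizeof *new_link);
    if (new_link == NULL)
        return NULL;
    new_link->next_link = head;
    /* Only the address is stored; no copy of the data is made. */
    new_link->pointer_to_data = pointer_to_data;
    return new_link;
    }

/* Frees the links only. The caller still owns the data. */
void destroy_links(struct s_link * head)
    {
    while (head != NULL)
        {
        struct s_link * next = head->next_link;
        free(head);
        head = next;
        }
    }
```

Because the link holds only an address, the same function can store an int in one list and a struct in another. The caller casts pointer_to_data back to the proper type when retrieving it.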
With the second alternative you pass the size of the data as well, and the function allocates enough storage to hold a copy of it. The function could look something like Listing 2.
With this version, the calling program can then destroy the item whose address was passed.
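As a rough sketch of this second alternative, the function below copies the caller's data into storage that the list owns. The names store_copy and destroy_list, and the choice to insert at the front, are assumptions for illustration rather than the contents of Listing 2.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct s_link
    {
    struct s_link * next_link;
    void * pointer_to_data;
    };

/* Hypothetical function: copies size bytes from data into newly
   allocated storage and pushes a link to that copy onto the list.
   Returns the new head, or NULL if either allocation fails. */
struct s_link * store_copy(struct s_link * head, const void * data,
    size_t size)
    {
    struct s_link * new_link;

    assert(data != NULL && size > 0);
    new_link = malloc(sizeof *new_link);
    if (new_link == NULL)
        return NULL;
    new_link->pointer_to_data = malloc(size);
    if (new_link->pointer_to_data == NULL)
        {
        free(new_link);
        return NULL;
        }
    memcpy(new_link->pointer_to_data, data, size);
    new_link->next_link = head;
    return new_link;
    }

/* Frees both the copied data and the links. */
void destroy_list(struct s_link * head)
    {
    while (head != NULL)
        {
        struct s_link * next = head->next_link;
        free(head->pointer_to_data);
        free(head);
        head = next;
        }
    }
```

A call such as store_copy(head, &item, sizeof item) leaves the caller free to reuse or discard item afterward, at the cost of a second allocation per link.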
Microsoft allows an unsized or zero-sized array as the last member of a structure. The link could look like
struct s_link
    {
    struct s_link *next_link;
    char data[];
    };
The corresponding function would appear as in Listing 3.
The allocation for the link includes the size of the item. This Microsoft feature eliminates one level of indirection when accessing the data. In this context, the accessing will be hidden from the user. However, in other applications that have a number of records that contain variable length data, it can simplify the appearance of your code.
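The following is a sketch of how the single-allocation version might be written. It relies on the unsized trailing array described above, which C99 later standardized as the flexible array member, so a C99 or later compiler should accept it as well. The function names are hypothetical. The data is copied in and out with memcpy because a char array is not guaranteed to be suitably aligned for every type.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct s_link
    {
    struct s_link * next_link;
    char data[];    /* unsized array: must be the last member */
    };

/* Hypothetical function: allocates the link and room for the data in
   one block, copies the data in, and pushes the link onto the list.
   Returns the new head, or NULL if the allocation fails. */
struct s_link * store_inline(struct s_link * head, const void * data,
    size_t size)
    {
    struct s_link * new_link;

    assert(data != NULL);
    /* sizeof *new_link does not count the unsized array. */
    new_link = malloc(sizeof *new_link + size);
    if (new_link == NULL)
        return NULL;
    memcpy(new_link->data, data, size);
    new_link->next_link = head;
    return new_link;
    }

/* One free per link releases the data too. */
void destroy_list(struct s_link * head)
    {
    while (head != NULL)
        {
        struct s_link * next = head->next_link;
        free(head);
        head = next;
        }
    }
```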
Q
I just started reading about C++. I noticed a function prototype that read int foo(int &);. Is the function expecting an address?
Leslie Schroft
New York, NY
A
The prototype specifies that the parameter will be passed by reference. This reference declaration is a useful one that might be added one day to ANSI C. A reference variable works like a pointer with some additional considerations. It contains an address, but it does not need the dereferencing operator (*) to access the value at that address. It must be initialized when it is declared, and it cannot later be made to refer to a different variable.
Let's take a simple case
int i, j;
int * pointer_to_int;
int & reference_to_i = i;
With pointer_to_int, you can set it either in the declaration or in an assignment
pointer_to_int = &i;
/* Sets i to 0 */
* pointer_to_int = 0;
pointer_to_int = &j;
/* Sets j to 0 */
* pointer_to_int = 0;
The reference variable must be initialized. In fact many compilers will complain if it is not. If it is the parameter for a function, it will be initialized when the function is called by the address of the argument (or a copy of its value) in the calling program.
When using the reference variable in the executable part, any assignments to it alter the value of the referred variable.
/* Assigns value of j to i */
reference_to_i = j;
/* Sets i to 0 */
reference_to_i = 0;
Notice that the initialization syntax for reference variables does not include the address symbol (&). The compiler implicitly uses the address. When the reference variable is used in an expression, the compiler implicitly uses the indirection operator.
The power of the reference declaration comes into play when it is the parameter of a function. The argument which is passed is not passed by value, but by reference. The user of the function does not have to make any changes in the calling program to use a call by reference. The compiler, seeing a function prototype which declares the parameter with the & symbol, passes the address rather than the value.
Let's take a look at the alternatives for passing parameters. Listing 4 shows the code for passing the address. Listing 5 is an example of passing by reference.
You can change the value of a variable passed by reference simply by assigning it a value. For Listing 5, you would simply write input_value = 5;. If you do not change the variable in the function, it is considered proper to declare the parameter as const int & input_value.
There is an interesting stylistic dispute as to whether you should declare a parameter whose value is going to be changed as a reference or as a pointer. In the former case, the users do not have to put the address operator on the arguments. In the latter case, they do.
One advantage of the address operator is that it is more apparent to the user that the argument is going to be altered. My altered rule is that if the function name suggests that the parameter is to be changed, a reference parameter may be preferable. If it does not, then a pointer parameter should be used.
One advantage of using reference parameters over pointer parameters comes with structs and classes. If the parameter is a reference, then the plain member-of (.) operator is used to access the members. This can make the code appear simpler and still be faster than passing the structure by value. All it takes is a change in the function header and the prototype. (See Listing 6.)

Fixed Field Files
It's been a while since I've written, but I can't resist tossing out a few comments in response to Tom Crosman's letter and your reply in the April 1992 CUJ. Tom's letter concerned the processing of mostly fixed-field files. His question concerned loading a structure with data from the file.
His own solution read a record to an input buffer, then counted and copied bytes into a structure. Your example was more complete. You used fgets to read data into a char buffer, and then followed Tom's example, counting and copying bytes. I was surprised that you didn't use fread to load the data directly into the structure!
There is a little monkey wrench here, perhaps. Tom's existing data was created using PL/I and (judging by the code he provides) the data is stored without a null-terminating byte. Adding a null-terminating byte to each member of the C structure makes the structure grow larger than a record in the file. Destination addresses no longer line up properly with source addresses, and the data does not fall into place. However, your example offers a simple solution to this problem. You use the precision specifier with the printf function to extract just the right number of bytes from the buffer. You demonstrate that null-terminating bytes are not needed in the structure, if absent from the datafile.
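To make the point concrete, here is a small sketch that combines the two ideas: fread loads a record straight into a structure with no terminators, and the precision specifier limits how many bytes are printed from each field. The record layout and the function names are invented for illustration, and the sketch assumes the compiler adds no padding to a structure made up only of char arrays.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical layout: two fixed-width fields, no null terminators. */
struct record
    {
    char name[8];
    char city[6];
    };

/* Reads one record directly into the structure.
   Returns 1 on success, 0 on end of file or error. */
int read_record(FILE * fp, struct record * r)
    {
    assert(fp != NULL && r != NULL);
    return fread(r, sizeof *r, 1, fp) == 1;
    }

/* Formats the record as "name|city". The precision given by ".*"
   stops the conversion at the end of each field even though the
   fields are not terminated. Returns the value from sprintf.
   The caller must supply at least sizeof(struct record) + 2 bytes. */
int format_record(const struct record * r, char * out)
    {
    assert(r != NULL && out != NULL);
    return sprintf(out, "%.*s|%.*s",
        (int) sizeof r->name, r->name,
        (int) sizeof r->city, r->city);
    }
```

The structure stays exactly as long as a record in the file, so source and destination addresses line up.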
Your example also uses a flag to indicate whether or not the datafile contains null-terminators. Perhaps you provided this flag variable in response to Tom's need for a "very generic" solution. However, I feel that the null-terminator flag added complexity to an otherwise simple solution. Because of this flag, an additional parameter must be passed from main to each of your other functions. Moreover, each of these other functions must now handle two cases rather than one. To my mind, a function that does two things is more than twice as complex as a single-purpose function. As you note, "At some point the work of providing and using a generic interface exceeds the benefit."
I'd like to go into this trade-off a little more. Given a datafile to be accessed, you declare a structure template that corresponds to the file format. Then you create a corresponding array of field sizes. You keep the declaration of this array of sizes "close to the structure template. Any changes in the order or size of the fields can be simply coordinated." So far, so good. Then you note, "you could add an array of field addresses to the calls. If that is necessary, I might suggest not using a generic function." So the system that supports field names and sizes breaks down when a third element (addresses) is added.
The enclosed code (Listing 7) presents a solution that avoids such a trade-off. My code uses a technique which permits all of the datafile-specific elements to be moved out of the source code, and into a separate include file that I call the FILE_FORMAT. This new include file contains (only) all of the information specific to the particular datafile that is to be accessed. This makes it easy to assure that the file structure is correctly described. Furthermore, different FILE_FORMATs may be created to describe different datafiles. To access a particular datafile, simply specify the desired FILE_FORMAT in the C source.
My code is based on Robert White's technique for list alignment, presented in the "C Programming" column of Dr. Dobb's Journal, October 1990. The technique was also discussed in your column in the C Users Journal, February 1991.
The FILE_FORMATs I present as examples (Listing 8) share a standard pattern. Each defines the name of the datafile. It indicates whether or not the datafile is null-terminated. And it lists the fields of the datafile by name. Alongside each field name is the size of the field in bytes. Each fieldname/fieldlength pair is punctuated by a comma, surrounded by parentheses, and prefixed by the letters ff for File-Format. This arrangement produces a set of macro invocations, one for each field of the datafile. The technique permits the C source to #define a macro, and then invoke the macro for each field of the datafile simply by including the FILE_FORMAT file.
The include file ends by #undefining ff. It undefines a macro #defined within the C source, and thereby permits the C source to define another macro. In the C source I present here, the include file FILE_FORMAT is brought in three times. First, it is used to create a structure fitted to the datafile format. Second, it is used to create an array of field lengths. As long as the FILE_FORMAT include file lists field name and field length side by side, the file format will always be properly created. Third, I include the FILE_FORMAT within my print function to create a custom report that perfectly (and automatically) matches the datafile format. Other macros, such as one to generate an array of field addresses, could be added to the C source should the need arise.
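The pattern can be sketched in a single file as follows. This is not Listing 7: to keep the example self-contained, the field list lives in a macro (here called FILE_FORMAT_FIELDS, a made-up name) instead of a separate include file, so the source #undefs ff itself after each use. The field names and lengths are likewise invented.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical field list standing in for the FILE_FORMAT include
   file: one ff(name, length) invocation per field. */
#define FILE_FORMAT_FIELDS \
    ff(name, 8) \
    ff(city, 6) \
    ff(zip, 5)

/* First use: a structure fitted to the record layout. */
#define ff(field, length) char field[length];
struct record { FILE_FORMAT_FIELDS };
#undef ff

/* Second use: an array of field lengths, in the same order. */
#define ff(field, length) length,
static const int field_lengths[] = { FILE_FORMAT_FIELDS };
#undef ff

/* An extra use: an array of field names. */
#define ff(field, length) #field,
static const char * const field_names[] = { FILE_FORMAT_FIELDS };
#undef ff

enum { FIELD_COUNT = sizeof field_lengths / sizeof field_lengths[0] };

/* Third use: a report with one line per field. The precision keeps
   printf from running past the end of an unterminated field. */
void print_record(FILE * out, const struct record * r)
    {
    assert(out != NULL && r != NULL);
#define ff(field, length) \
    fprintf(out, "%-6s %.*s\n", #field, length, r->field);
    FILE_FORMAT_FIELDS
#undef ff
    }
```

Adding, removing, or resizing a field means editing one line of the field list. The structure, the length array, and the report all follow on the next compile.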
The generation of a program suited to access a different file format requires only one change in the C file. The definition of FILE_FORMAT in the C source must be set to identify the include file in which you describe the datafile. Then this source file is compiled and linked to produce an executable that can read and write the file format of the desired datafile.
If this ain't generic, I don't know what is!
Art Shipman
Westbrookville, NY
Thanks for your submission. It is definitely generic. I like your textual file description for each file. It permits general file manipulations. That leads me to suggest that one could write a package (or create a C++ object) that reads a textual description file similar to yours. It would have a set of functions to access the fields in the file. The calls might look something like
char value_returned[MAX_SIZE];
file_id = open_fixed_record_file("arts.ff");
get_number_fields(file_id);
get_field_value(file_id, 2, value_returned);
get_field_name(file_id, 2, value_returned);
This package could be expanded to read any type of file (fixed record, dBase, Paradox, etc.) without change in the user interface. (KP)