Columns


Questions & Answers

Linked Lists, Strings, and Internationalization

Ken Pugh


Kenneth Pugh, a principal in Pugh-Killeen Associates, teaches C and C++ language courses for corporations. He is the author of C Language for Programmers and All On C, and was a member on the ANSI C committee. He also does custom C programming for communications, graphics, image databases, and hypertext. His address is 4201 University Dr., Suite 102, Durham, NC 27707. You may fax questions for Ken to (919) 489-5239. Ken also receives email at kpugh@dukemvs.ac.duke.edu (Internet).

Q

Just got the August CUJ, so please excuse the response delay. To sort a linked list quickly, count the elements, allocate an array to hold them all, then sort the array. Pick off the elements to rebuild the list. Or else use a radix ("Postman's") sort, if possible. Or use a merge sort.

Any sort not relying on element exchange will serve to sort a linked list, though it will of course need minor modifications. Quicksort becomes a merge sort, except that partitioning is based on a pivot rather than being blind (half the elements go in each sublist).

Next, in your "array of strings" answer you have the statement:

printf(get_string(1));
SHAME! If there are any % characters in the string, nasty things may happen. Use:

puts(get_string(i)); /*adds \n */
or

printf ("%s", get_string(1));
Keep up the (otherwise) good work.

David Chapman
San Jose, CA

A

Thanks for your reply. As I suggested in my original article, if one needs to sort a linked list, it might be just as simple to keep an array of pointers to the data items and sort that instead.

Thanks also for pointing out the potential problem with the format specifier. Most likely, the percent character (%) will not appear in text followed by any character other than a space. This will be an invalid conversion specifier whose behavior is undefined by the ANSI standard. This is another "little gotcha" of the type described in the answer to the next question.

On some machine/compiler combinations, the percent sign will show up on the output. On others, it will disappear. Of course if % is followed by a legitimate conversion character, garbage will appear on the output or the program may bomb.

If the specifier was for a float number (e.g. %f), then it is possible that the value picked up will be an invalid floating-point number and a floating-point exception will occur.

In practice, the return value of get_string will not usually be passed to printf. Instead, it will be a parameter to some display function that will display it with an attribute and/or in some window.

For those further interested in making programs portable, you might want to investigate the internationalization functions that are specified under the Xopen Portability Guidelines (XPG). These functions are available under OSF UNIX, and probably are available on several other systems. Xopen is a consortium of European computer vendors.

A brief description of the design of the functions may help you develop your own if you are on a system that does not support them.

The functions use a message catalog file that is specific to a single language. The catalog file contains all strings that need to be translated to another language. The files for each language are kept in a separate directory. The strings in the catalog are arranged in sets. Each set has an identifying number and each string within the set has a unique number within that set.

The initial function, catopen, opens the appropriate file corresponding to the filename passed to it. The directory for the file (i.e. the language of the file) is derived using the environment variables NLSPATH and LANG. catopen returns a catalog identifier used by the remainder of the functions.

The function corresponding to the get_string example is catgets. Its four parameters are the catalog identifier, a set number within the catalog, a string number with the set, and a default string. It returns a pointer to the corresponding string. If the routine is unable to access the particular string, it returns a pointer to the default string.

Listing 1 and Listing 2 give an example of the use of the file. The example code produces:

   String to output
on standard output if the file is accessible and

   I can't find it
if the file is not accessible.

The message catalog is actually composed of two files. The first file is an ASCII file. A compilation program uses the ASCII file as input and creates the actual file read by the program. This makes the lookup quicker as either a single or double seek is all that is required. The compiled file can be read in as a whole if it is not too large.

The catalog can use symbolic names, rather than absolute numbers. The compilation program generates a header file for use by the calling functions. Obviously this is a far superior method, as it is easier to match the strings with their intended usage. Listing 3 and Listing 4 give an example of the use of symbols.

Variable Argument Lists, Portability, and Gotchas

Q

Regarding the answer you gave to reader Willi Fleischer of Moerfelden, Germany (CUJ, September 1992, page 114) on the use of variable parameter list, I have tried to solve a problem of the same kind: to pass the variable arguments from a function to a function (which can be nested). The solution I found, though it needs to be verified on different platforms (I tested on SGI and IBM UNIX workstations), is to call the function that needs the variable argument list and to prevent it from adding any element to the calling function's stack.

For example, in Listing 5, Msg will call PrintString in order to convert the printf argument list into a static string. As you can see, PrintString is the first function/statement to be executed in Msg after declaring line. If not, you will fail (try if(1) line = PrintString;).

I understand this is somewhat restrictive, but it can be handled easily in many cases. Its advantages are: making PrintString syntax simple (no need for _direct and _va_list functions of the same kind), and making the functions that call it not have to deal with va_start and so on.

Joseph Z. Wang
White Plains, NY

A

I consider this in the land of serendipitous trouble. A particular set of operations works on a couple of computers, but is not really part of ANSI standard. The dead giveaway is the fact that it does not work if another statement is inserted into the code.

PrintString is only being passed one parameter (fmt). The way the parameters are passed on this particular set of machines, the va_start macro coincidentally sets up args to point to the first parameter after fmt. If the stack pointer is altered by the calling function (such as you suggested), the coincidence disappears.

Unfortunately (or fortunately, depending on how you look at it), the only portable ways to pass the variable parameter list is by the method suggested or one that follows along a similar line.

I teach a class in how to make C portable. It turns out there are a few "gotchas" that can appear while using the language. Code appears portable, as it works on a few machines, but in reality, it is not. Your code is one example. The previous question regarding the % in a string passed to printf is another example of non-portable behavior.

The C standard lists nine pages of unspecified, undefined, or implementation-defined behavior. Many of the items concern situations that a good C programmer would avoid anyway. Some others are of the "gotcha" variety. For example, the order of two equal members in an array sorted by qsort is unspecified. Some implementations may leave equal members in the original order. Going to an implementation that does not may produce some unexpected surprises if your programming was counting on the original behavior.

Many of the portability problems can be pointed out by a good lint. For those new to C, this program is a source-code analysis program that comes with many UNIX systems and is available separately on PC systems. I have used PC-Lint by Gimpel and found it very satisfactory, in many cases superior to UNIX lints I have tried.

For example, the following loop may set each element of the array to the value of its index:

int array[10];
int i;
for (i = 0; i < 10;)
    {
    array[i] = i++;
    }
Then again, it may not. The value of i may be incremented before or after it is used as the index value for array. The order of evaluation of the assignment operator is not specified by the language. Nor is the time specified that the increment will take place except that it is after the value of i on the right hand side is obtained and before the semicolon is reached. Any of the lint programs should point out this error of dependence on order of evaluation.

My suggestion for writing portable code is to keep it simple. Another author suggests that "well written code is portable." In a paper by A. Dolenc, A. Lemmke, D. Keepl and G.V. Reilly, "Notes on Writing Portable Programs in C," the authors state that the expression f(p=&a, p=&b, p=&c) is ambiguous. The standard permits it to be interpreted as f(&a, &b, &c) or f(&c, &c, &c) [or many others — pjp]. My immediate response is that one should have coded it in whichever of the forms was really required, followed by the appropriate assignment for p.

By the way, the paper includes some excellent suggestions for making your program portable. They would permit your program to be backfitted to old K&R compilers, if you need to do that.

I will admit that I do not code in an absolute portable fashion. It takes a lot of hard work. The one facet which I ignore entirely is the limitation of six characters of significance in an external name. Although the ANSI C library must adhere to this limitation, I find it difficult to obey it while writing a maintainable program. If I come across that odd system that requires it, I will simply make up some header files that look like:

#define my_reasonably_named_function(a,b,c) IHE345(a,b,c)
#define another_useful_function(d,e) IHE346(d,e)

Truncating a File in Place and Portability

Is it possible to truncate a file in place using ANSI C? Currently, I am copying the truncated data to a temporary file, removing the original file, and renaming the temporary file to the original file's name. Is there a more sophisticated method? BSD4.3 UNIX provides the (non-portable) system function calls truncate and ftruncate. Do these implement the method described above?

Jeff Dragovich
Urbana, Illinois

A

The answer is as you have just described it. The BSD functions can play games with the inode (which contains the length of the file and data block-allocation information). Obviously you can run out of space when you truncate the file, as you will be making a copy of it.

Since our theme is portability this month, let me suggest how to make functions like these portable. You can create a compatibility library of functions that you commonly use. For example, instead of using ftruncate, you might make up functions as:

int file_truncate_by_fd(int file_descriptor, unsigned int size);
int file_truncate_by_name(char *file_name, unsigned int size);
These functions would be kept in a package of file functions whose implementation might vary from machine to machine. A header file, say file.h would be used for the function prototypes and any other #defines or typedefs required by your functions.

In your BSD version of the library, these functions could call truncate and ftruncate. Alternatively, you could #define them in the header file to be the appropriate call, so the functions will be called inline, without an extra function call layer. For the MS-DOS version, the functions would perform the operations you listed.

For all versions, the functions should return an error indication if the truncation was unable to be performed. Note that it is logically impossible for the BSD version to fail to truncate, while the MS-DOS version may fail if there is insufficient disk space.

scanf and ANSI

I have some comments about the sscanf question in the September 1992 CUJ. I realize this is probably the thousandth letter you've received about this, but I've run into similar sscanf problems in the past and have a vested interest in these questions.

According to my reading of Harbison and Steele, (I don't have a copy of the ANSI Standard document), sscanf should return 4 on string[0], and 5 on string[1]. I suspect the return of 4 may be the result of an error in the code as published (should string[0] b e garbageR A 3 0 4 ?), but, as you pointed out, string[1] scanning using the [^PR] conversion operator will stop immediately upon seeing the R; no conversion will take place. Since the assignment-suppression operator is involved, this won't affect the rest of the conversion. No pointers would be consumed in either case. Conversion will then pick up at the first %c, and continue with the other char and the three ints, all of which are present in string[1]. Am I correct in this, or am I missing yet one more subtlety in sscanf formatting?

By the way, sscanf behaves this way in Borland C++ version 3.1; previous versions of Turbo C/C++ would, believe it or not, cause a null-pointer error with this usage of sscanf.

Thanks for your attention in this.

Dan Rempel
Victoria, B.C. Canada

A

Thank you for your sharp eye in catching the code error. I'll repeat the relevant code here, so our readers won't have to go back. The strings (with your correction) are:

char *string[] = {
   "some garbageR A 3 0 4",
   "R A 3 0 4"
   };
The sscanf with its format looked like:

const char *format = "%*[^PR]%c %c %d %d %d";
ret = sscanf (string[i], format, &l, &q, &k, &m, &n);
When scanning string[0], the * suppresses the assignment of any set of characters which match the [^PR] specifier. That specifier matches any nonempty sequence of characters that are not P or R. This directive eats up the characters "some garbage" in string[0]. The value R is then assigned to 1, and the remaining assignments take place normally.

For string[1], there is no sequence of characters that match the initial [^PR] specifier. The directive fails with a matching failure (inappropriate input, according to the standard). When that directive fails, sscanf returns with a zero value as no assignments have taken place.

As I read the C Standard, if a match to a directive fails, even if there is assignment suppression, the function terminates at that point. The order of the operations as specified by the C Standard is to match the specifier and convert it to the corresponding type. It then checks for assignment suppression.