Columns


Questions & Answers

Message Catalogs

Kenneth Pugh


Kenneth Pugh, a principal in Pugh-Killeen Associates, teaches C and C++ language courses for corporations. He is the author of C Language for Programmers and All On C, and was a member on the ANSI C committee. He also does custom C programming for communications, graphics, image databases, and hypertext. His address is 4201 University Dr., Suite 102, Durham, NC 27707. You may fax questions for Ken to (919) 489-5239. Ken also receives email at kpugh@dukemvs.ac.duke.edu (Internet) and on Compuserve 70125,1142.

Q

I was reading your column entitled, "Linked Lists, Strings, and Internationalization," in the January 1993 issue of The C Users Journal, and was particularly interested in the section about the X/Open message catalog. Yours is the first description of the layout of the file I've seen.

We have Sun workstations here and are planning on using a message file scheme, but Sun has no documentation about the format of a message file, despite supporting gencat, catopen, etc. Where can I find information about the format of message files (the text file that we'd create, not the .cat file), preferably with examples?

I enjoyed the article — it was very interesting and well written. I also agree with your comments about Gimpel's PC-lint/Flexelint — I swear by it. Many thanks.

Alec Sharp
Louisville, CO

A

The information I have on message catalogs comes from my teaching OSF UNIX. The DEC documentation for this system explains precisely the organization of these message catalogs. I think similar information should be available from X/Open, since that is where the interface originated. I think it would be ironic if you had to purchase the documentation for a Sun UNIX feature from OSF.
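For readers without access to that documentation, the gencat source format is simple enough to sketch here. Directives begin with a dollar sign in column one; this example is based on the X/Open definition, and vendor implementations may differ in detail:

```
$ Comments begin with "$ " (dollar sign, space).
$set 1
$quote "
1 "File %s not found\n"
2 "Permission denied\n"
$set 2
1 "Initialization complete\n"
```

The source is compiled with gencat (for example, gencat prog.cat prog.msg), and a message is retrieved at run time with catopen("prog.cat", 0) followed by catgets(catd, set_number, message_number, "default text").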

Strings Again

Q

I am writing in reference to your comments on how to handle strings in the August 1992 issue of The C Users Journal (page 132). The presented approach is quite acceptable, but there is something I do not like. Whenever the programmer introduces a new string, he or she must do three things: add the string to the array (char *string[] = ...), define the corresponding constant (#define INIT_STR 1), and use it (printf(get_string(INIT_STR));). I think the programmer should only have to do two things: define the string and use it. For example:

    defstr ( InitStr, "Init String" )
    ...
    printf (InitStr);           /* or */
    printf (getstr(InitStr));
Of course, having multiple identical strings in code is not acceptable.

One possible solution (tested with the Borland C++ compiler) is shown in Listing 1. The file STRDEF.H contains the string definitions. The file STRDEMO.C illustrates how the string definitions are used. There are two additional files, STRHNDL.H and STRHNDL.C, but once written, they never need to change. All the strings appear as global data in STRHNDL.C. The file STRHNDL.H provides the necessary extern declarations, making the strings accessible. The same approach could easily be adapted to handle multilingual messages, as illustrated in Listing 2.

This solution is far from perfect, if only because it uses a lot of global data. I am sure there are better solutions; perhaps you could suggest one. I am also interested in your opinion on using an object-oriented approach to the problem.

Stefan Ganev
Bourgas, Bulgaria

A

I agree with you that having to create a #define INIT_STR and to put the string into an array seems like double work. There is always the potential for a mismatch between the value of INIT_STR and its position in the array.

Your macro approach helps eliminate this problem. I would suggest a minor change in your second example. If you concatenate the name of the array in your #define defstr(x,y,z) macro, the names of the character arrays would be practically guaranteed not to conflict with any other external names. The replacement might look something like:

       char *STR__##x[] = { y, z };
The same change would be made in the other places that you refer to the array.

I explained the X/Open alternative in the January issue. In that system, you can use identifying names with the strings and it will produce a header file with those names #defined. That eliminates the mismatch and extra coding.
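On the OSF and DEC systems mentioned earlier, a catalog preprocessor (mkcatdefs, on some systems) accepts symbolic identifiers in place of bare message numbers and emits a header with those names #defined. The details vary by vendor, but the idea looks roughly like this:

```
$set MSG_SET
INIT_MSG "This is the first string\n"
OTHER_MSG "This is another string\n"
```

from which a header along these lines is generated:

```
#define MSG_SET   1
#define INIT_MSG  1
#define OTHER_MSG 2
```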

You asked about an object-oriented approach to this problem. A goal of my design would be to make the calling program not aware of where or how the strings are actually stored. That is, the calling module should not have to be recompiled if just the strings are changed. Preferably it should not have to be relinked, as only resource values would have changed. Your approach meets most of these criteria, except for the necessity to relink.

I would define a StringResource object as:

    class StringResource
        {
    public:
        StringResource(Language_Identifier lid);
        String get_string(String_Identifier sid);
        //...
        };
A header file would contain the values for your particular application. This could be generated automatically from a text file that looked like:

INITIAL_STRING "This is the first string"
ANOTHER_STRING "This is another string"
As an example, the header file my_prog.h might look like:

    enum Language_Identifier {ENGLISH};

    enum String_Identifier {
        INITIAL_STRING,
        ANOTHER_STRING,
        //...
        };
A calling program would look something like Listing 3. I used the stream class for input and output and assumed that the operator << was overloaded for Strings. If your String class had a conversion to char *, you could use the Standard C library functions, such as puts(sr.get_string(INITIAL_STRING)). I would not define such a conversion for the StringResource class.

Whether you decide to compile the string file into the code or keep it in some external file is completely transparent to the calling program. The get_string function returns a String object that contains the requested characters in either case.

The StringResource example here assumes that you only require one set of strings per program. That is the typical case. Going to multiple sets complicates the picture. Readers might send in their suggestions for what a multi-set design would look like.

Truncating a File

Q

In the January 1993 issue of The C Users Journal in the question titled "Truncating a File in Place and Portability," on page 106, you say "For the MS-DOS version, the functions would perform the operations you indicate... while the MS-DOS version may fail if there is insufficient disk space." The operations indicated were copying the file and deleting and renaming.

I thought you should know that both the Microsoft and Borland compilers offer a file-truncate function, chsize (or _chsize), which can truncate a file in place, just as the UNIX approach does. Further, this is a single DOS call. It's quick and has no disk free-space problem. For those without an equivalent function in their runtime libraries, the trick in MS-DOS is simply to write to an open file handle with a count of zero.

Anyway, your main point (about writing a single portable function whose implementation uses OS-dependent code) is right on. I just wanted to point out something many long-time DOS programmers don't realize: a file may easily be truncated or extended, without copying. Regards, and keep up the good work.

Dave Angel

A

Thanks for your comments. Truncating files is not something I do often. In fact, the only time I can remember doing it is when demonstrating how to use the function on UNIX.

I'd just like to expand a bit on the main area of discussion. It all boils down to what you consider your standard interface to the environment in which you program. This standard interface consists of the set of functions that you can expect to have on any system. The Standard C library is a good starting place for such a set. Actually, when using C++, I hide even the Standard C functions from use in my programs. I also hide the iostream class, even though its interface is becoming more standard.

I have basically created a set of classes, with member functions for each class that are as primitive as possible. The member functions are easily written in terms of Standard C and operating-system functions.

This permits me to port my code easily to any environment, even one that does not support the Standard C library or the C++ streams library. The functions are simple enough to implement with whatever the system supports.

The class for File looks like Listing 4. I have not reproduced all the classes and their implementation in their entirety for space and copyright reasons.

The File::open, File::close, File::read, and File::write member functions call their respective operating system functions. On UNIX and MS-DOS, these are the open, close, read, and write functions with the appropriate parameters supplied.

The Standard C library does support unbuffered reading and writing, using fopen and setvbuf calls. To run on a system that supports only those calls, one need only change the way Internal_File::open works and alter the corresponding calls in each of the other member functions.

This class can also eliminate the problems of the mode (text or binary) of the file opened by open. One vendor's use of a global variable for the default file type has caused me grief in the past.

The testing program for this class has code that looks like Listing 5. You may wonder where the number of bytes read or written is being kept. It is part of Internal_ByteArray.

Now stout-hearted C programmers will probably be upset at my use of a separate function to return the error code for an operation. It actually gets returned from Internal_File members. The Internal_File::error is an inline function that simply returns the value of the last error set. I could have set up a variable as:

       Internal_File::Error error;
and programmed as:

       if ( (error = new_file.write(buffer) )
           != Internal_File::No_error)
           {
           // Do something about it
           }
But I didn't.