February 1994/Questions & Answers

Questions & Answers

A Tricky (But Important) Type Distinction

Kenneth Pugh

Kenneth Pugh, a principal in Pugh-Killeen Associates, teaches C and C++ language courses for corporations. He is the author of C Language for Programmers and All On C, and was a member of the ANSI C committee. He also does custom C programming for communications, graphics, image databases, and hypertext. His address is 4201 University Dr., Suite 102, Durham, NC 27707. You may fax questions for Ken to (919) 489-5239. Ken also receives email at kpugh@allen.com (Internet) and on Compuserve 70125,1142.

Heap or Stack
Regarding your Q?/A! column "Heap or Stack — Which Should You Use" in the October 1993 CUJ: On page 122, col. 1, bottom, you state "The type of char_matrix[4] is pointer to char...". Actually, it is "array of 10 char." You make this distinction clear elsewhere in the article. Like all such array pointer differences, this one really only becomes visible if you do sizeof(char_matrix[4]), which reports 10, not sizeof(char*). I enjoy your articles — keep up the good work. This is a mere quibble, but I thought you'd like to get things exactly right.
Craig Berry
Thanks for the sharp-eyed correction. For readers who have lost their October issue, the declaration was:

#define NUMBER_OF_CHARS 10
#define NUMBER_OF_STRINGS 5
char char_matrix[NUMBER_OF_STRINGS][NUMBER_OF_CHARS];
char_matrix[4] is an array of 10 chars, as Mr. Berry notes. In most expressions, however, it is converted to a char * that points to its first element, so you can code the following without a warning:

char *p_char = char_matrix[4];
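
The distinction Mr. Berry describes can be checked directly. The following is a minimal sketch, not part of the October listing: the function names are hypothetical and exist only for illustration. It compares the size of the row itself with the size of the pointer the row is converted to.

```cpp
#include <cassert>
#include <cstddef>

#define NUMBER_OF_CHARS 10
#define NUMBER_OF_STRINGS 5

char char_matrix[NUMBER_OF_STRINGS][NUMBER_OF_CHARS];

// sizeof does not convert its operand to a pointer, so this is the size
// of one whole row: an array of 10 char.
std::size_t size_of_row() { return sizeof(char_matrix[4]); }

// Here the row has already been converted to a pointer to its first char.
std::size_t size_of_converted_row()
{
    char *p_char = char_matrix[4];
    return sizeof(p_char);
}

// Hypothetical check function, named only for this sketch.
void check_row_type()
{
    char *p_char = char_matrix[4];
    assert(size_of_row() == NUMBER_OF_CHARS);
    assert(size_of_converted_row() == sizeof(char *));
    assert(p_char == &char_matrix[4][0]);
}
```

Calling check_row_type() passes on any conforming compiler, since sizeof(char) is 1 by definition; only the pointer size varies from system to system.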

Char Pointers
I really enjoyed your article in the October issue of CUJ. The examples that you presented are very practical examples of allocating memory for arrays. Most books on C fail to provide such practical examples and assume that everyone simply defines strings inside code hardwire style such as:

char char_array[3] = "123";
I wish I had had a copy of your article four years ago when I had to figure out how to allocate strings through experimentation. Keep up the good work!
Russell Thrasher
Austin, TX
Thanks for your feedback. If you object to the literal "123" appearing in the source code, you can bet that C programming books define strings this way for the sake of simplicity. As a general rule, it's best to avoid scattering literals, strings, or numeric constant definitions throughout the body of a program. Usually, these constants need only appear in #defines at the beginning of the source file where they're easy to find. In some cases you can eliminate constants from the source file entirely by reading the constants into the program from a separate file at run time.
Keeping string constant definitions out of a program's source files allows you to alter the user interface appearance without changing the executable program. This capability is especially important in the international market, where you may need to change the (human) language of a user interface. The example in the October issue demonstrated one way of eliminating constant definitions from the source code entirely, but there are lots of other methods.
For example, the native language support (NLS) features of UNIX, such as the message catalog, provide a standard method for storing strings. You can use the resource files for the Macintosh and for Microsoft Windows to store strings in a program. You can configure the .Xdefaults file for the X Window System (Motif and OPEN LOOK) to store all strings for an application. Finally, you can find commercial products that will extract strings from a code file and place them into files similar to that referenced in the October example.

A Class for String Storage
Following along the lines of the previous letter, I will show how C++ classes can eliminate most char * pointers and the problems they create, as well as cut down on portability problems between various implementations of strings.
To start, I define a String class as:

class String {
public:
    String(const char * string);
    operator const char *();
    ...
private:
    char * data;
    ...
};
At this point, I don't specify much of an implementation, as that should not be of concern to the user. I include the cast operator to a const char * only for back-fitting to functions that require char * pointers. In a more abstract interface, every function in the program would take either a String or a String & (reference to a String) parameter.
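For readers who want something concrete to compile, here is one possible implementation of that interface. It is only a sketch: the default argument, the copy constructor, the assignment operator, the destructor, the private duplicate helper, and the const qualifier on the conversion operator are all assumptions of mine, added so that a String can be copied and destroyed safely.

```cpp
#include <cassert>
#include <cstring>

class String {
public:
    // The default argument (an assumption) lets "String a_string;" compile.
    String(const char * string = "") : data(duplicate(string)) {}
    String(const String & other) : data(duplicate(other.data)) {}
    String & operator=(const String & other)
    {
        if (this != &other) {
            char * copy = duplicate(other.data);  // copy first, then release
            delete [] data;
            data = copy;
        }
        return *this;
    }
    ~String() { delete [] data; }

    // Back-fitting conversion for functions that expect a const char *.
    operator const char *() const { return data; }

private:
    // Hypothetical helper: returns a newly allocated copy of source.
    static char * duplicate(const char * source)
    {
        assert(source != 0);
        char * copy = new char[std::strlen(source) + 1];
        std::strcpy(copy, source);
        return copy;
    }
    char * data;
};
```

Each String owns its own buffer, so copying one never leaves two objects pointing at the same memory. That ownership question is exactly the kind of problem raw char * pointers push onto every caller.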
A Collection_of_strings class could have the following interface:

class Collection_of_strings {
public:
    Collection_of_strings(const char * identifier);
    ~Collection_of_strings();
    String get_string_by_index(unsigned int index);
    ...
private:
    String * strings;
    unsigned int number_of_strings;
};
The user of this class might code a header file as follows:

#define MY_STRINGS_IDENTIFIER "my_string.dat"
// These are the names for each string which represents its purpose
#define ERROR_STRING 0
#define TITLE_STRING 1
...
And the user's program might look like this:

int main()
{
    Collection_of_strings strings(MY_STRINGS_IDENTIFIER);
    String a_string;
    ...
    a_string = strings.get_string_by_index(ERROR_STRING);
    ...
    printf("%s", (const char *) a_string);  // or whatever else you need to do with it
    ...
}
Note that this program does not need to concern itself with how the strings are stored in memory or on disk. The only information this program requires is the identifier of the set of strings and names or identifiers for all the strings.
Let's look at possible implementations of Collection_of_strings. The MY_STRINGS_IDENTIFIER might identify either an actual data file or a subsection of a standard file. In one implementation, the constructor would open the appropriate file, read all the strings into a memory array, and close the file. get_string_by_index(unsigned int index) would then use the index to locate the appropriate String and return it.
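A minimal sketch of this read-everything-up-front approach follows. Several things here are assumptions rather than part of the October example: the data file holds one string per line, a string's index is its line position, a missing file produces an empty collection, and std::string stands in for the String class to keep the listing short.

```cpp
#include <cassert>
#include <fstream>
#include <string>

typedef std::string String;   // stand-in (assumption) for the String class

class Collection_of_strings {
public:
    // Reads every line of the file named by identifier into memory.
    // Both streams close automatically when the constructor returns.
    Collection_of_strings(const char * identifier)
        : strings(0), number_of_strings(0)
    {
        assert(identifier != 0);
        std::string line;

        // First pass: count the lines so the array can be sized.
        std::ifstream counter(identifier);
        while (std::getline(counter, line))
            ++number_of_strings;

        // Second pass: store each line at the index matching its position.
        strings = new String[number_of_strings];
        std::ifstream reader(identifier);
        for (unsigned int i = 0; i < number_of_strings; ++i)
            std::getline(reader, strings[i]);
    }
    ~Collection_of_strings() { delete [] strings; }

    // Returns the string at index, or an empty String if out of range.
    String get_string_by_index(unsigned int index) const
    {
        return index < number_of_strings ? strings[index] : String();
    }

private:
    // Declared but not defined: the collection is not copyable.
    Collection_of_strings(const Collection_of_strings &);
    Collection_of_strings & operator=(const Collection_of_strings &);

    String * strings;
    unsigned int number_of_strings;
};
```

After construction, every call to get_string_by_index() is a simple array lookup. The price is that all the strings occupy memory for the life of the object.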
Alternatively, the constructor would perform an open and the destructor a close. get_string_by_index() would use the index to read the corresponding String off the disk. Depending on the system and the number of strings read, the I/O delay might not be noticeable. With a disk cache, the delay might only be apparent the first time a String was read.
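The read-on-demand variant might be sketched as below. As before, the one-string-per-line file layout and the use of std::string as a stand-in for String are assumptions. Rewinding and skipping lines is the simplest possible lookup, not necessarily the fastest one; a table of file offsets would be an obvious refinement.

```cpp
#include <cassert>
#include <fstream>
#include <string>

typedef std::string String;   // stand-in (assumption) for the String class

class Collection_of_strings {
public:
    // The constructor only opens the file; nothing is read yet.
    Collection_of_strings(const char * identifier)
    {
        assert(identifier != 0);
        file.open(identifier);
    }
    // The destructor performs the close.
    ~Collection_of_strings() { file.close(); }

    // Reads the requested line from disk each time it is called.
    // Returns an empty String if the index is past the end of the file.
    String get_string_by_index(unsigned int index)
    {
        file.clear();    // forget any end-of-file state from an earlier read
        file.seekg(0);   // rewind to the start of the file
        String line;
        for (unsigned int i = 0; i <= index; ++i)
            if (!std::getline(file, line))
                return String();
        return line;
    }

private:
    std::ifstream file;
};
```

Only one string is held in memory at a time, at the cost of a disk read (or a cache hit) for every request.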
If you are using NLS, Microsoft Windows, or the Macintosh, the implementation of this class could load the string via calls to the appropriate functions. You would need to ensure that the values of the string identifiers matched on all these systems, but that is a bookkeeping problem, not a coding problem.
Let's make a few changes and add a few features to this class:

class Collection_of_strings {
public:
    enum Retrieval_type {FAST_BUT_NEEDS_LOTS_OF_MEMORY,
        SLOW_BUT_LITTLE_MEMORY};
    enum Error_code {OK, IDENTIFIER_NOT_AVAILABLE,
        LANGUAGE_NOT_AVAILABLE};
    Collection_of_strings();
    Error_code load(const char * identifier,
        Retrieval_type retrieval_type, Language language);
    ~Collection_of_strings();
    String get_string_by_index(unsigned int index);
    ...
private:
    String * strings;
    unsigned int number_of_strings;
};
This class has a language selection capability, as well as the ability to utilize alternative retrieval methods.
When you design a set of classes, you face a tradeoff between individual class complexity and how many classes you must design. Sometimes using another parameter in the constructor decreases the number of classes and simplifies the class interface, although it may make the implementation slightly more complex.
Instead of adding parameters to the constructor, I created a separate load function. A constructor can only report problems via exceptions or by setting a flag that can be tested later; having a separate load function allows an error code to be returned. Since the likelihood is high that a requested language won't be available, a return code is a better choice than an exception as a reporting method.
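To make the error reporting concrete, here is one way load might be written. This is a sketch under several assumptions of my own: each language's strings live in a file named by the identifier plus a suffix such as ".fr", the Language enum has exactly three members plus a count, only the in-memory retrieval type is implemented, and std::string and std::vector stand in for the String class and the String array.

```cpp
#include <cassert>
#include <fstream>
#include <string>
#include <vector>

typedef std::string String;   // stand-in (assumption) for the String class
enum Language {ENGLISH, SPANISH, FRENCH, NUMBER_OF_LANGUAGES};

class Collection_of_strings {
public:
    enum Retrieval_type {FAST_BUT_NEEDS_LOTS_OF_MEMORY, SLOW_BUT_LITTLE_MEMORY};
    enum Error_code {OK, IDENTIFIER_NOT_AVAILABLE, LANGUAGE_NOT_AVAILABLE};

    Collection_of_strings() {}

    // The retrieval type is accepted but ignored in this sketch.
    Error_code load(const char * identifier, Retrieval_type, Language language)
    {
        assert(identifier != 0);
        std::ifstream file(file_name(identifier, language).c_str());
        if (!file) {
            // If any other language exists, only the language is missing.
            for (int other = 0; other < NUMBER_OF_LANGUAGES; ++other)
                if (exists(file_name(identifier, Language(other))))
                    return LANGUAGE_NOT_AVAILABLE;
            return IDENTIFIER_NOT_AVAILABLE;
        }
        strings.clear();
        String line;
        while (std::getline(file, line))
            strings.push_back(line);
        return OK;
    }

    String get_string_by_index(unsigned int index) const
    {
        return index < strings.size() ? strings[index] : String();
    }

private:
    // Hypothetical naming scheme: identifier plus a language suffix.
    static std::string file_name(const char * identifier, Language language)
    {
        static const char * const suffixes[NUMBER_OF_LANGUAGES] =
            {".en", ".es", ".fr"};
        return std::string(identifier) + suffixes[language];
    }
    static bool exists(const std::string & name)
    {
        std::ifstream probe(name.c_str());
        return probe.is_open();
    }
    std::vector<String> strings;
};
```

Because load returns its result, the caller can retry with another language on the same object, which a throwing constructor would make clumsier.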
The program source should contain a general language header file, say "language.h", that contains:

enum Language {ENGLISH, SPANISH, FRENCH, ....};
The program may also require a file that contains strings with standard or alternative spellings of languages for user interface programs. For example:

struct Language_equivalent {
    enum Language code;
    char * string;
};
Language_equivalent language_equivalents[] = {
    {ENGLISH, "English"},
    {ENGLISH, "American English"},
    {GERMAN, "German"},
    {GERMAN, "Deutsch"},
    ...
};
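A user interface would search this table to turn whatever the user typed into a Language code. The sketch below shows one way to do so. The find_language function is hypothetical, the enum is an assumed four-member version of the one in "language.h", and the table is cut down to the four entries shown above. The match is exact and case-sensitive; a real program would probably be more forgiving.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

enum Language {ENGLISH, SPANISH, FRENCH, GERMAN};   // assumed members

struct Language_equivalent {
    Language code;
    const char * string;
};

const Language_equivalent language_equivalents[] = {
    {ENGLISH, "English"},
    {ENGLISH, "American English"},
    {GERMAN, "German"},
    {GERMAN, "Deutsch"},
};

// Hypothetical helper: looks up name in the table. On success it stores
// the matching code in *language and returns true; otherwise it leaves
// *language unchanged and returns false.
bool find_language(const char * name, Language * language)
{
    assert(name != 0 && language != 0);
    const std::size_t count =
        sizeof language_equivalents / sizeof language_equivalents[0];
    for (std::size_t i = 0; i < count; ++i) {
        if (std::strcmp(name, language_equivalents[i].string) == 0) {
            *language = language_equivalents[i].code;
            return true;
        }
    }
    return false;
}
```

Returning a bool and writing the code through a pointer keeps "not found" separate from every valid Language value, so no enum member has to double as an error marker.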
With this Collection_of_strings class, the calling program might appear as follows. (This example does not handle every possible error, but it demonstrates the general approach.)

int main()
{
    Collection_of_strings strings;
    Collection_of_strings::Error_code error_code;
    error_code = strings.load(MY_STRINGS_IDENTIFIER,
        Collection_of_strings::FAST_BUT_NEEDS_LOTS_OF_MEMORY, FRENCH);
    switch (error_code)
    {
    case Collection_of_strings::OK:
        break;
    case Collection_of_strings::IDENTIFIER_NOT_AVAILABLE:
        cerr << "String data not available " << MY_STRINGS_IDENTIFIER;
        exit(1);
        break;
    case Collection_of_strings::LANGUAGE_NOT_AVAILABLE:
        cerr << "Language not available " << "English selected";
        error_code = strings.load(MY_STRINGS_IDENTIFIER,
            Collection_of_strings::FAST_BUT_NEEDS_LOTS_OF_MEMORY, ENGLISH);
        break;
    }
    ...
    String a_string;
    ...
    a_string = strings.get_string_by_index(ERROR_STRING);
    ...
    cout << a_string;  // or whatever else you need to do with it
    ...
}
You could implement this class with NLS fairly easily. Implementing it with Microsoft Windows or the Macintosh would be a bit more difficult, since the resources, although logically separate from the code, are bound to the executable file. With the X Window System, you could load an .Xdefaults file based on the particular human language being handled by the user interface.
A particular implementation might not be able to efficiently support both Retrieval_types. You could add Error_codes to report that the desired type was unavailable and an alternative type was employed. You might even try out intermediate Retrieval_types, which could attempt to trade off memory space for speed. However, that might unduly complicate the implementation.
Q
I have just read your comments on lint for C++ in C Users Journal with the greatest interest as I was about to contact Gimpel with the same question that Sue Lindsey posed to you.
I have found lint (with strong type checking) to be the single most valuable tool and source of education in using C, and rely on it heavily. However, I've been following the C++ and object-oriented programming literature and with a current software project exceeding 10,000 lines of source code (for the first time for me) I can readily appreciate some of the advantages C++ potentially offers.
I don't have the luxury of much "dabbling time," and have been nervous about making a difficult move which would also deprive me of my trusted lint! If your statement that C++ implicitly takes care of most of the problem areas in C that lint covers is really true, that would be one of the most powerful reasons I have heard for moving from C to C++. I have never seen an equivalent view expressed before, and I just want to hear you say it again!
Could I suggest that you expand on this a bit in a future column, perhaps even seeking some input from Gimpel? I'm thinking of the number of times lint has saved me from what would have been a hard bug to find by a "conceivably un-initialized" warning in a rarely-taken program branch (for example). I don't see anything in C++ that would help there; perhaps I'm wrong!
Finally, this gives me a welcome opportunity to thank you for your excellent columns in CUJ; I always read them first and, as a fairly isolated practitioner, find most of them informative and entertaining.
Rob Sherlock
I described Gimpel's PC-lint for C++ in last month's column. If you were hesitant about moving to C++ due to the lack of a lint for the language, you need wait no longer.
As I noted in my answer to the original lint question, C++ compilers (as well as many C compilers) have become more lint-like in their warning messages. However, compilers still vary in recognizing potential errors. lint reports conditions that are not detected by all compilers. For example, some compilers may not recognize

if (a_variable=0) another_variable=5;
as a potential error, but lint will.
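
To see why that line deserves a warning, compare it with what was almost certainly intended. The following sketch wraps both forms in functions; the function names and the starting value of 1 are hypothetical, chosen only for illustration. Whether your compiler reports the first form depends on the compiler and its warning settings.

```cpp
#include <cassert>

// The form from the text: a_variable is ASSIGNED 0, and the value of the
// assignment (0, that is, false) becomes the condition. The body never runs.
int with_assignment(int a_variable)
{
    int another_variable = 1;
    if (a_variable = 0)
        another_variable = 5;
    return another_variable;
}

// The form that was probably intended: a comparison against 0.
int with_comparison(int a_variable)
{
    int another_variable = 1;
    if (a_variable == 0)
        another_variable = 5;
    return another_variable;
}

// Hypothetical check function, named only for this sketch.
void check_both_forms()
{
    assert(with_assignment(0) == 1);   // body skipped even for 0
    assert(with_assignment(7) == 1);
    assert(with_comparison(0) == 5);   // body runs only for 0
    assert(with_comparison(7) == 1);
}
```

Both functions compile as legal code, which is why this slip is so easy to miss without a tool that flags it.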