Columns


Questions and Answers

Lint for C++

Kenneth Pugh


Kenneth Pugh, a principal in Pugh-Killeen Associates, teaches C and C++ language courses for corporations. He is the author of C Language for Programmers and All On C, and was a member of the ANSI C committee. He also does custom C programming for communications, graphics, image databases, and hypertext. His address is 4201 University Dr., Suite 102, Durham, NC 27707. You may fax questions for Ken to (919) 489-5239. Ken also receives email at kpugh@allen.com (Internet) and on Compuserve 70125,1142.

A few months ago, a reader asked a question regarding the availability of Lint for C++. I replied that there was no product available at that time. In response to the column, Gimpel Software, makers of PC-lint, sent me a beta copy of their new PC-lint for C/C++. I've used their C product on a number of programs with great success in finding obscure bugs — especially on programs I inherited from other people. Their new version provides the same types of error analysis for C++ programs as the older versions did for C programs.

The new version of PC-lint adds a number of C++ specific error messages. PC-lint analyzes the relationship of class data and class member functions and provides warnings about some common errors in class design. No existing compiler that I know about flags these errors. For example, PC-lint can report when a destructor for a base class is not virtual or does not exist. In this situation, using a collection of base class pointers that contain pointers to the derived classes will result in a call to the wrong destructor.
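A minimal sketch shows why the warning matters; the class names and the instrumentation counter are my own, not anything PC-lint supplies:

```cpp
#include <cassert>

static int derived_destructor_runs = 0;  // instrumentation for the demo

class Base
    {
public:
    virtual ~Base() { }      // without "virtual" here, a delete through
                             // a Base * would never reach ~Derived()
    };

class Derived : public Base
    {
public:
    ~Derived() { ++derived_destructor_runs; }
    };

void destroy(Base *p)        // typical use: a collection of Base pointers
    {
    delete p;                // reaches ~Derived() because ~Base() is virtual
    }
```

If the virtual keyword on ~Base() were removed, destroy(new Derived()) would run only the base destructor, and any resources owned by Derived would leak.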

Another common mistake that PC-lint catches is the use of a constructor for a class that calls new, without having a copy constructor or assignment operator declared for the class. Normally a class that allocates memory requires the declaration of an explicit copy constructor and an explicit assignment operator. These member functions usually need to allocate memory, unless the class implementation uses some form of reference counting.
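To illustrate, here is a sketch of such a class; String, its members, and the character buffer are my own example:

```cpp
#include <cstring>
#include <cassert>

class String
    {
public:
    String(const char *s)
        : text(new char[std::strlen(s) + 1])
        {
        std::strcpy(text, s);
        }
    // Because the constructor calls new, the class also declares a copy
    // constructor and an assignment operator. The compiler-generated
    // versions would copy only the pointer, and two objects would then
    // delete the same memory.
    String(const String &other)
        : text(new char[std::strlen(other.text) + 1])
        {
        std::strcpy(text, other.text);
        }
    String &operator=(const String &other)
        {
        if (this != &other)          // guard against self-assignment
            {
            delete [] text;
            text = new char[std::strlen(other.text) + 1];
            std::strcpy(text, other.text);
            }
        return *this;
        }
    ~String() { delete [] text; }
    const char *c_str() const { return text; }
private:
    char *text;
    };
```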

The new version of PC-lint also adds a few more C warnings, such as the warning issued for a missing semicolon at the end of a structure definition. Heeding this warning would eliminate the relatively obscure compiler messages that result from code such as the following:

struct my_struct
    {
    int member;
    }
function(int x)
    {
    ...
    }
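With the semicolon restored (and an explicit return type, which C++ requires anyway), both declarations parse as intended:

```cpp
#include <cassert>

struct my_struct
    {
    int member;
    };                       // this semicolon ends the definition

int function(int x)          // now clearly a separate declaration
    {
    return x;
    }
```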
An important purpose of the original lint program was to catch parameter type mismatches. C++ has eliminated the need for this check by making function prototypes mandatory, but it has added the potential for a number of problems resulting from the misuse of classes. According to Gimpel Software, the list of possible C++ warnings that are analyzed will expand in the next version.

A Discussion of Object Orientation

I just spent a week teaching an advanced C++ course. Most of the prepared material covered syntax and the construction of complex objects, but all of the students were more interested in the design of objects.

Designing good classes is one of the most rewarding aspects of C++. Using bad classes can be one of the most frustrating. Therefore, it's useful to develop criteria for determining how well a class is designed.

The subjective measurements of a class's quality include coupling, cohesiveness, sufficiency, primitiveness, and completeness. Coupling is the degree of interdependence between classes. Classes that are friends of other classes are tightly coupled; classes that merely depend on the existence of other classes are loosely coupled.

In a highly cohesive class, the data and function members work together as a single abstraction.

The last three criteria are sufficiency, primitiveness, and completeness. A class has a sufficient interface if it provides all required operations. The class needs to contain all primitive operations, which are those requiring access to the hidden implementation. Finally, a class is complete if it provides all possible operations that a user might want to perform on it or with it. A class that is merely primitive is easier to port to another system or to modify, as it has fewer member functions than a complete class. On the other hand, a complete class can be easier for users to employ in their code.

An example is in order here. Suppose you created a file class called File. The most primitive interface for this class might look like:

class File
    {
public:
    File(char * name, int mode);
    ~File();
    int read(char * buffer, int length);
    int write(char * buffer, int length);
    };
With this interface, the constructor might throw an exception if it could not open the file. Alternatively, read and write might return an appropriate error indication if the file could not be opened. The destructor will close the file if the constructor was able to open it. For simplicity's sake, I used an int for the constructor's mode parameter. You could use an enumerated parameter (e.g., enum FILE_ACCESS_TYPE {READ_ONLY, WRITE_ONLY, ...}) instead for a clearer functional interface.
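Here is a sketch of the enumerated-mode idea; the implementation on top of the C stdio library, and the is_open member, are my own assumptions:

```cpp
#include <cstdio>
#include <cassert>

enum FILE_ACCESS_TYPE { READ_ONLY, WRITE_ONLY };

class File
    {
public:
    File(const char *name, FILE_ACCESS_TYPE mode)
        : fp(std::fopen(name, mode == READ_ONLY ? "rb" : "wb"))
        {
        }
    ~File()
        {
        if (fp)                 // close only if the open succeeded
            std::fclose(fp);
        }
    int is_open() const { return fp != 0; }
    int read(char *buffer, int length)
        {
        return fp ? (int) std::fread(buffer, 1, length, fp) : -1;
        }
    int write(const char *buffer, int length)
        {
        return fp ? (int) std::fwrite(buffer, 1, length, fp) : -1;
        }
private:
    std::FILE *fp;
    };
```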

The following primitive interface is similar:

class File
    {
public:
    File();
    ~File();
    int open(char * name, int mode);
    int read(char * buffer, int length);
    int write(char * buffer, int length);
    int close();
    };
Notice, however, that the meaning of class File has changed. File no longer represents a particular file. Now, an object of class File can be reused in the same scope with a different file. The interface is still primitive. In particular, there is no seek function to go to a particular byte in the file. A user who needs to perform a seek can use read and throw away the intervening data. To get to a previous position in the file, close the file, reopen it, and read forward again. Since seek can be implemented as a combination of the primitive operations, it is not itself a primitive operation.

Since many operating systems provide a seek operation, it would be more efficient to include that as part of the interface. On systems that did not have seek, the implementation could fake it with discardable reads. So a slightly less primitive, but more efficient interface would look like:

class File
    {
public:
    File();
    ...
    long seek(long position, int direction);
    };
As a side note, you could replace the long parameter with a typedef, much as the standard C library uses fpos_t with fgetpos and fsetpos. You could implement the direction parameter as an enumeration.
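Faking a forward seek with discardable reads might look like the following sketch; the function name and the chunk size are my own:

```cpp
#include <cstdio>
#include <cassert>

// Emulate a forward seek on a system with no native seek by reading
// and discarding the intervening bytes.
long seek_forward_by_reading(std::FILE *fp, long distance)
    {
    char scratch[256];
    long remaining = distance;
    while (remaining > 0)
        {
        int chunk = remaining < (long) sizeof(scratch)
                        ? (int) remaining : (int) sizeof(scratch);
        int got = (int) std::fread(scratch, 1, chunk, fp);
        if (got == 0)            // hit end of file early
            break;
        remaining -= got;
        }
    return distance - remaining; // bytes actually skipped
    }
```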

Sufficiency

A typical user might want to find out the current file position. The primitive interface does not provide an operation for determining file position. The user could determine it by keeping track of the reads, writes, and seeks. Since this facility is commonly needed, a sufficient interface could be written as follows:

class File
    {
public:
    File();
    ...
    long seek(long position, int direction);
    long tell();
    };

Toward Completeness

Users may occasionally reposition files to the first byte, using seek(0, 0). You might code a convenient macro for this operation as follows:

#define rewind() seek(0, 0)
If rewind were a common operation, you might want to add the function to the interface. The function is not necessary, but may be generally useful. Adding a rewind function would make the interface more complete. The question is when to stop adding functions. Should you also include a

search_for_a_byte_value(int byte_value)
function? Or should you let the user write a private version? The more member functions a class contains, the more overwhelming it can be. The fewer functions a class contains, the more code the user may have to write.
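Returning to rewind for a moment: in C++ an inline member function is a safer spelling than a macro, which can pick up a stray semicolon and collides with the standard rewind in stdio.h. A sketch, using a trimmed-down File that tracks only a position:

```cpp
#include <cassert>

// A stand-in for File that records only a position, enough to show
// rewind() as an inline member function rather than a macro.
class File
    {
public:
    File() : position(0) { }
    long seek(long pos, int direction)   // direction 0 means "from start"
        {
        position = direction == 0 ? pos : position + pos;
        return position;
        }
    long tell() { return position; }
    void rewind() { seek(0, 0); }        // no macro, no name clash
private:
    long position;
    };
```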

Objective Criteria for Class Design

A few implicit objective criteria for adding functions exist. First, an object should perform the operations requested of it by the user. If the object cannot perform an operation, it should notify the user through an error return, an exception, or by some other means. Second, an object should do no harm. Using an object should not cause memory to be overwritten, or cause changes to other objects, unless those actions were truly intended. (Science fiction buffs may notice that these criteria are similar to Isaac Asimov's rules for robots.)

The following example demonstrates the need for these rules. I spent a few days trying to use an object-oriented user interface generator for Microsoft Windows, DOS, and other systems. The documentation for the system assumes that you are new to C++, as it covers the language in some detail. On the other hand, the system uses pointers and pointers to tables of pointers to objects as part of the programmer interface.

One of the objects the system supports represents a vertical list. One of the member functions for the vertical list object loads the object from persistent storage. The member function must open a corresponding file to get the data for the object. Unfortunately, the function failed to close that file. Not even the destructor for the object closed the file.

In my application, this error did not manifest itself until the fourth time through a particular series of operations. The program ran fine until it suddenly was unable to open a data file. I spent a number of hours trying to determine what hidden error on my part had caused such a problem. The answer: my application had exceeded the open-file limit; the object's storage file had been repeatedly opened without ever being closed.

The failure of this object to obey one of the two "objective" rules (to not cause harm) definitely caused a lot of human grief.

Questions and Answers

Pointer Types

Q

The C-program in Listing 1 generates an unexpected warning when compiled using version 2.3.3 of gcc (no options specified). The diagnostic is:

test.c: In function 'main':
test.c:12: warning: passing arg 1 of 'sub2' from incompatible
pointer type
The diagnostic is tied to my prototype on line 7 where I declare a single argument to be a pointer to an array of const int dimensioned DIM2. This should prevent any obvious assignments to the parameter a within sub2, which was my intention.

I've run this by several people (including gcc tech support) and some agree with the compiler and others do not. I would appreciate your opinion.

Michael G. Soyka
Warren, RI

A

This is a great follow-up to a question from last month. In that issue, I discussed the datatypes of variables such as:

int int_array[10];
int int_array_2d[1][10];
int *pointer_to_int;
int (*pointer_to_int_array)[10];
The type of int_array_2d usually reduces to (int (*)[10]) and int_array usually reduces to (int *), so:

pointer_to_int_array = int_array_2d;
pointer_to_int = int_array;
are compatible assignments. I use the phrase "usually reduces to" to emphasize that the type of an array is not the same as a pointer type. Let's add a const term to one set of these variables, as in the following:

const int int_array_of_const[1];
const int *pointer_to_int_const;
The type of int_array_of_const usually reduces to (const int *), so the assignment of

pointer_to_int_const = int_array_of_const;
is proper. When using two-dimensional arrays and pointers, typing becomes somewhat more complex. For the declarations

const int int_array_2d_of_const[1][10];
const int (*pointer_to_array_of_const_int)[10];
the type of int_array_2d_of_const usually reduces to (const int (*)[10]), so the assignment is compatible. Now, the reduced type of int_array_2d from the first example was (int (*)[10]). That reduced type is not the same as (const int (*)[10]), which is the type of pointer_to_array_of_const_int. So

pointer_to_array_of_const_int = int_array_2d;
yields a compiler warning, since these two expressions represent pointers to incompatible types. One expression represents a pointer to a const array, the other to a non-const array.

A pointer to an int can be assigned to a pointer to a const int. This assignment simply adds "constness" to the object originally pointed at by the pointer to int. The C Standard does not include an example of this operation, but its effect seems to follow from the definition of const. The reverse operation is not allowed without a cast, as that would be taking away "constness."
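A small sketch of both directions (the function name is my own):

```cpp
#include <cassert>

// Adding "constness" takes a plain assignment; removing it takes a cast.
int demonstrate_constness()
    {
    int value = 7;
    int *p = &value;
    const int *pc = p;        // allowed: adds const to the pointed-at type
    // p = pc;                // not allowed: would strip const silently
    p = (int *) pc;           // stripping const requires an explicit cast
    return *p;
    }
```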

It seems appropriate to me to allow a programmer to add "constness" to an object at any level. I discussed this concept in a conversation with Chris Skelly, who has written for The C User's Journal and has written a book (soon to be published) on pointers. We agreed that it seems proper to be able to add "constness" to a data type with an assignment.

However, according to P.J. Plauger, the C Standard does not require that a pointer to array of const T be assignment-compatible with a pointer to array of T. Some vendors may provide this latitude while others may not.

I tried your program with the Borland C++ 3.1 and Microsoft C++ 7.0 compilers. Borland accepts it without complaint. Microsoft generates a message similar to gcc. I altered your original example as shown in Listing 2 to better illustrate the problem.

This program yielded an error reading:

error C2440: 'initializing' : cannot convert
from 'int [1][1]' to 'const int (_near *)[1]'
The preceding diagnostic is valid for the line

const int (*pointer_2d_to_const_2)[DIM2] = array_2d;
For reasons I explained before, the diagnostic may not be clearly worded, although it is valid. Note that the compiler had no problem with

const int (*pointer_2d_to_const_1)[DIM2] = array_2d_of_const;
which implies that the conversion from const int [1][1] to const int (_near *)[1] is acceptable.

Interestingly enough, the diagnostic does not appear when the program is compiled as a C program with Microsoft. This discrepancy is probably due to the more extensive type-checking performed by C++ compilers.

One way to solve your problem is shown in Listing 3. I use a typedef to eliminate one dimension from the other declarations. The use of typedef makes the array and pointers equivalent to the following declarations:

int int_array[1];
const int *pointer_to_const_int;
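Listing 3 is not reproduced here, but the typedef idea can be sketched as follows; the names, dimensions, and initializer values are my own illustration:

```cpp
#include <cassert>

#define DIM1 1
#define DIM2 4

typedef int ROW[DIM2];        // the typedef hides the inner dimension

// With ROW, the declarations read like the one-dimensional case.
const ROW array_2d_of_const[DIM1] = { { 1, 2, 3, 4 } };

long sum_row(const ROW *a)    // matches the array's type exactly
    {
    long total = 0;
    for (int i = 0; i < DIM2; i++)
        total += a[0][i];
    return total;
    }
```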
Along these same lines, I ran into an "interesting" problem when I needed to convert some old C code from the Microsoft large model to C++ under the medium model. Nobody who wanted to maintain their sanity would do this. However, the requirement existed, so I plunged ahead. I did the conversion in two steps — first to large model C++, then to medium model C++.

The first step went relatively smoothly. I had already determined the interface to the C++ objects and created test implementations of the objects. I linked the overall test program using the test implementation and the results came out correct. Then I began the second step. My addition of __far to every pointer seemed to work fine. The compiler warned me about the few places that I missed, except... And it was a big exception:

One of the original function prototypes was defined as follows:

function (char *array[])
array was an array of pointers to character strings. Since I could use either pointer or array syntax for an argument, I would expect to be able to convert the prototype to:

function (char __far * __far array[]);
I used the first __far because the array was going to contain elements which were far pointers to char. The second __far was for the array itself. This declaration should be logically equivalent to

function (char __far * __far * array);
In fact, Microsoft's compiler accepts the former declaration form without a hint of a warning — but the code compiles incorrectly. Garbage addresses are passed. Memory is overwritten. Windows crashes. Beeps go off. For the first time in a long while, I had to invoke a debugger. With the switch to the second form of the declaration, all the problems went away.

This address problem occurs only with multiply-dimensioned arrays; it's similar to your const problem. There appears to be a fundamental disagreement in how to bind __far to arrays versus how to bind it to pointers. Since __far is not in the C Standard, compiler vendors can do it any way they wish. I just wish they would give me a warning.