September 1993/Questions & Answers


Questions & Answers

Lint for C++?

Kenneth Pugh

Kenneth Pugh, a principal in Pugh-Killeen Associates, teaches C and C++ language courses for corporations. He is the author of C Language for Programmers and All on C, and was a member of the ANSI C committee. He also does custom C programming for communications, graphics, image databases, and hypertext. His address is 4201 University Dr., Suite 102, Durham, NC 27707. You may fax questions for Ken to (919) 489-5239. Ken also receives email at kpugh@allen.com (Internet) and on Compuserve 70125,1142.
Q
I have been using lint for a number of years with C. Now that I am moving to C++, I am wondering if such a product is available for the language. Do you know of any vendors? What types of features should I look for?
Sue Lindsey
La Mesa, CA
A
Like you, I had been using lint (in my case, Gimpel's PC-Lint) with C, but since I've begun using C++, I haven't even noticed its absence. I do not know of any C++ lint, and I'm not sure how useful such a tool would be.
Let me review briefly what lint does. The program examines the multiple source files that together make up an executable. One of its original purposes (predating the introduction of prototypes in ANSI C) was to make sure that the arguments used in a function call matched the parameters in the function definition. lint also caught suspicious usage of expressions and variables, such as using a variable before it was initialized or not using a variable at all. The compilers available at that time did not flag these potential errors.
Today's compilers implement many of these bug-checking capabilities, but PC-Lint can still find many more potential errors (its features include format checking on printf and scanf calls). Particularly useful is the option of strong type checking, whereby typedefs are treated as distinct types. For example, given
typedef int COUNT;
typedef int INDEX;
void function(COUNT count, INDEX index);
then

COUNT c;
INDEX i;
function(i, c);
will produce an error. Even though both i and c are ints, they are declared using typedefs. The parameter mismatch will be reported, as you are calling with function(INDEX, COUNT).
Contemporary C compilers do not check typedefs this way. Thus, the previous error would go undetected by the compiler and could cause a bit of grief for the burner of midnight oil.
How does this relate to C++? Prototypes are required, as opposed to being optional in ANSI C, so that parameter checking is automatic. Strong type checking is built into the language. A class can be defined so that it would be hard to abuse. The code

class COUNT {...};
class INDEX {...};
void function(COUNT count, INDEX index);
...
COUNT c;
INDEX i;
function(i, c);
would generate an error with a C++ compiler unless conversions between COUNT and INDEX had been explicitly defined.
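In C itself you can get part of the way there without lint. The sketch below is not from the column; it assumes you are willing to wrap each int in its own one-member struct. Distinct struct types are incompatible in C, so a C compiler should reject function(i, c) much as a C++ compiler rejects the class version above. The struct layout and the body of function are hypothetical placeholders.

```c
#include <assert.h>

/* Hypothetical stand-ins for the COUNT and INDEX typedefs: wrapping each
   int in its own struct makes the two types incompatible, so the compiler
   itself rejects swapped arguments. */
typedef struct { int value; } COUNT;
typedef struct { int value; } INDEX;

/* Illustrative body only: it combines the two values so a caller can
   tell which argument landed in which parameter. */
int function(COUNT count, INDEX index)
{
    assert(index.value >= 0);   /* an index should never be negative */
    return count.value * 100 + index.value;
}
```

With COUNT c = { 3 }; and INDEX i = { 7 };, the call function(c, i) returns 307, while function(i, c) should fail to compile. The cost is writing c.value instead of c in arithmetic, which is the trade-off C++ classes with operators are meant to remove.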
If you use the stream classes (e.g., cin >> i and cout << "abc"), the compiler selects the appropriate operator from each operand's type, which gives you the equivalent of type checking on printf and scanf format strings.
C++ compilers have built upon their C predecessors' warning features and can issue many of the same warnings. C++ can create problems of its own, such as failing to use delete in the destructor of a class whose constructor calls new, but those are design flaws rather than coding flaws. If a "super-lint" could catch those, then it might be useful.
It's true that even the best C++ compilers do not report all of the potential problems that PC-Lint can find in C code, such as unusual shift values. However, the number of errors that a C++ lint might find beyond those the C++ compiler reports is dramatically smaller.

Structures and Files
Q
As a subscriber to The C Users Journal, I have found your column very helpful, and hope that you may be able to help solve the following problem. The problem arose nearly a year ago, and the Borland analyst I spoke to confirmed it and said he was turning it over to their Quality Assurance people. I have heard nothing from them since, and long ago adopted a workaround solution. Since I am not entirely happy with the workaround, I am hoping that you may be able to identify the problem and suggest a better solution.
The objective of the code (enclosed with this letter [see Listing 1]) is to store the PayRecord structure in a file for each month using fseek and fwrite. I found that the EOF pointer seems to vary with the numerical results being stored, and contrived the simple example in the enclosed code to demonstrate the problem. Using my example code makes the data unretrievable as successive months are stored. Consider the numerical example shown on the second page of the enclosure. When the double variable pay.last_payment is a repeated decimal, the EOF indicates a file of 65 bytes rather than the correct 64 bytes, which is the length of the structure PayRecord.
My workaround solution was to store each element of the structure as a character string, and perform the arithmetic operations with variables not used in the structure.
Is there really a problem here, or is my code in error? Is there a standard method for storing instances of a structure that will prevent the file from becoming unreadable?
Bert Hall
Newport Beach, CA 92660
A
Your problem is one that used to creep up on me, especially when I moved to a new compiler. It has to do with the way the file is opened. You opened it with the statement
if((fptr = fopen(payfile,"w")) == NULL)
In Standard C, this means you've opened the file in text mode. When a record containing a newline character, i.e., '\n', is written, that newline may be converted to a carriage-return/line-feed combination. This does not occur under UNIX, but does with MS-DOS. If the record you are writing to the file with
fwrite (&pay,sizeof(pay),1,fptr);
contains the value of '\n', usually 0x0A, then that byte will be converted to two bytes on output. This value appears to occur in one of the bytes in the floating-point representation of the value 135.420000 in pay.last_payment. When you read the file back in, the two bytes will be converted back to a single newline value.
As you noticed, you will not be able to use fseek to get to the next record, since the size of the output record will depend on its contents.
The solution is to open the file in binary mode, which will preclude the conversion of newline values. The proper fopen call is:
if((fptr = fopen(payfile,
              "wb")) == NULL)
You will need the same "b" in the mode string when you open the file for reading ("rb").
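Here is a minimal sketch of the fixed-offset approach in binary mode. It is not Mr. Hall's code: Listing 1 isn't reproduced here, so the PayRecord fields below are hypothetical placeholders, as are the helper names put_month and get_month. What it illustrates is that once newline translation is off, record n always starts at n * sizeof(PayRecord), so fseek can find any month no matter which bytes a double happens to contain.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical record layout; only its fixed size matters for fseek. */
typedef struct {
    char   name[24];
    double last_payment;
    double total_paid;
} PayRecord;

/* Write (or overwrite) the record for month 0..11 at a fixed offset.
   Binary modes suppress newline translation, so each record occupies
   exactly sizeof(PayRecord) bytes in the file. */
int put_month(const char *path, int month, const PayRecord *rec)
{
    FILE *fp;

    assert(month >= 0 && month < 12);
    fp = fopen(path, "r+b");            /* update an existing file */
    if (fp == NULL)
        fp = fopen(path, "w+b");        /* or create a new one */
    if (fp == NULL)
        return -1;
    if (fseek(fp, (long)month * (long)sizeof(PayRecord), SEEK_SET) != 0
        || fwrite(rec, sizeof *rec, 1, fp) != 1) {
        fclose(fp);
        return -1;
    }
    return fclose(fp) == 0 ? 0 : -1;
}

/* Read the record for month 0..11 back from its fixed offset. */
int get_month(const char *path, int month, PayRecord *rec)
{
    FILE *fp;
    int ok;

    assert(month >= 0 && month < 12);
    fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;
    ok = fseek(fp, (long)month * (long)sizeof(PayRecord), SEEK_SET) == 0
         && fread(rec, sizeof *rec, 1, fp) == 1;
    fclose(fp);
    return ok ? 0 : -1;
}
```

For example, put_month("PAY.DAT", 2, &pay) writes March's record and get_month("PAY.DAT", 2, &pay) reads it back (the file name is arbitrary). Opening and closing the file on each call keeps the sketch short; a real program would more likely keep one stream open in "r+b" mode.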
The Manx Aztec compiler, which I used extensively in the early 1980s because it ran on a number of different platforms, treated newline characters according to the calling function. If you read from a file using line-oriented function calls, such as fgets or getchar, it performed the carriage-return/line-feed translation. If you used record-oriented function calls, such as fwrite and fread, there was no conversion. In the latter case, it made the reasonably safe assumption that you wanted to read in a binary fashion.
As a side note, you could have worked around this in an entirely different fashion. Though not appropriate for this particular application, the alternate workaround illustrates the use of ftell. You could keep track of the position at which each record started in a separate file. The code for this would be something like:
long record_offset;
FILE *index_file;
index_file = fopen("INDEX.IDX", "wb");
...
record_offset = ftell(fptr);
fwrite(&pay, sizeof(pay), 1, fptr);
fwrite(&record_offset, sizeof(long), 1, index_file);
ftell returns a value that will take you back to that record, regardless of the mode of the fopen. The ftell/fwrite statements assume that you are writing the records in order from the beginning of the file. To retrieve a record, you would code:
long record_offset;
FILE *index_file;
index_file = fopen("INDEX.IDX", "rb");
...
fseek(index_file, record_number * sizeof(long),
     SEEK_SET);
fread(&record_offset, sizeof(long), 1, index_file);
fseek(fptr, record_offset, SEEK_SET);
fread (&pay, sizeof(pay), 1, fptr);
The first fseek/fread gets the record_offset from the index file. The second fseek/fread gets the actual record. This technique is useful if you are using variable-length records. As an extension, you could store a negative value as a record's offset in the index file to mark that record as deleted. You might also write a file-packing routine to eliminate deleted-record space in the data file.
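The index-file technique extends naturally to variable-length records. The following sketch is one way to do it under assumptions of my own: each record is a length-prefixed string, both streams are opened for update in binary mode ("w+b", or tmpfile()), and a negative index entry marks a deleted record. The function names append_record, read_record, and delete_record are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Append a length-prefixed string to the data file and store its
   starting offset, obtained from ftell, in the index file. */
int append_record(FILE *data, FILE *index, const char *text)
{
    long offset;
    size_t len = strlen(text);

    if (fseek(data, 0L, SEEK_END) != 0)
        return -1;
    offset = ftell(data);                 /* where this record starts */
    if (offset < 0)
        return -1;
    if (fwrite(&len, sizeof len, 1, data) != 1
        || fwrite(text, 1, len, data) != len)
        return -1;
    if (fseek(index, 0L, SEEK_END) != 0)
        return -1;
    return fwrite(&offset, sizeof offset, 1, index) == 1 ? 0 : -1;
}

/* Read record n into buf. Returns -1 on I/O error, if the record does
   not fit in buf, or if its index entry is negative (deleted). */
int read_record(FILE *data, FILE *index, long n, char *buf, size_t bufsize)
{
    long offset;
    size_t len;

    assert(bufsize > 0);
    if (fseek(index, n * (long)sizeof offset, SEEK_SET) != 0
        || fread(&offset, sizeof offset, 1, index) != 1
        || offset < 0)
        return -1;
    if (fseek(data, offset, SEEK_SET) != 0
        || fread(&len, sizeof len, 1, data) != 1
        || len >= bufsize
        || fread(buf, 1, len, data) != len)
        return -1;
    buf[len] = '\0';
    return 0;
}

/* Mark record n deleted by overwriting its index entry with -1. */
int delete_record(FILE *index, long n)
{
    long deleted = -1L;

    if (fseek(index, n * (long)sizeof deleted, SEEK_SET) != 0)
        return -1;
    return fwrite(&deleted, sizeof deleted, 1, index) == 1 ? 0 : -1;
}
```

Each function calls fseek before it reads or writes. That also satisfies the C rule that an update stream must be repositioned (or flushed) when you switch between input and output, so the three functions can be called in any order on the same pair of streams.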

Quick Quiz
You'll find the answers to this quick quiz in your compiler manual or the ANSI standard. You may save a few minutes or hours of programming time if you memorize the answers.
You wish to set every byte of a block of memory to the same value. Is the ANSI function called memset or setmem?
So as not to give away the answer, suppose the function were named memory and you had the following code:

#define BYTE_VALUE 0xFF
#define SIZE_MEMORY 10
char block_of_memory[SIZE_MEMORY];
Which is the correct call?

memory(block_of_memory, SIZE_MEMORY, BYTE_VALUE);
or

memory(block_of_memory, BYTE_VALUE, SIZE_MEMORY);
If you use the wrong call, will you get a compiler warning? If not, what could possibly go wrong? Hint: this is why strong type checking is desirable, as described in the first question.

Version Control
In a recent UNIX class, I was asked about the importance of version control and how it operates. First, I want to distinguish between version control and project control, since good software control requires both. Version control works on individual files; project control works on an entire program or set of programs. Both have the same underlying purpose, but the units of control are different.
There are a number of standard packages for version control, including RCS and SCCS on UNIX and TLIB, SourceSafe, and others for MS-DOS. Though the packages work differently, the basic features are the same. You check a source file into the system, which stores it in a reference file kept in a proprietary format. To change the source file, you check out an editable copy from the reference file, make the changes with the editor of your choice, save the file, and then check the new version back into the system. The system asks for the reasons for the changes and records the date and time the new file was checked in.
The reference file keeps a record of both the old and the new source. To save space, it stores either the entire old file plus the changes (the delta) needed to create the new file, or the entire new file plus the changes needed to recreate the old one. The system can report the change history for each file: basically, who did what to which (Colonel Mustard, lead pipe, conservatory).
Most systems allow you to use identifiers in your source code that get replaced when a file is checked out for compilation. For example, with SCCS, the identifier %Z% is replaced by "@(#)". Other identifiers are replaced by values such as release, version numbers, and module names. Under UNIX, the what program looks through an executable file to pick out strings that contain "@(#)". These are placed in source files with a line such as:

static char version[] = "%A%";
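To see how what finds these strings, here is a rough sketch of its search in C. It is an approximation, not the utility's actual source: it looks for the "@(#)" marker in a buffer and copies what follows until a NUL, newline, double quote, '>' or backslash, which is how the what manual page describes the cutoff. The version string is a made-up example of what SCCS might leave after expanding keywords, and find_sccs_id is a hypothetical name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Made-up example of an expanded SCCS identification string. */
static const char version[] = "@(#)payroll.c 1.4";

/* Copy into out the text following the first "@(#)" in buf[0..len),
   stopping at NUL, newline, '"', '>' or backslash. Returns 0 if a
   marker was found, -1 otherwise. */
int find_sccs_id(const char *buf, size_t len, char *out, size_t outsize)
{
    size_t i, j;

    assert(outsize > 0);
    for (i = 0; i + 4 <= len; i++) {
        if (memcmp(buf + i, "@(#)", 4) != 0)
            continue;
        for (j = 0, i += 4; i < len && j + 1 < outsize; i++, j++) {
            if (buf[i] == '\0' || strchr("\n\">\\", buf[i]) != NULL)
                break;                  /* end of the identification text */
            out[j] = buf[i];
        }
        out[j] = '\0';
        return 0;
    }
    return -1;
}
```

Run over the bytes of an executable instead of a string, the same loop finds every version string compiled into it, which is why the "@(#)" prefix is worth the few bytes it costs.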
Version control works with individual files; project control, by contrast, might involve a single program or a set of programs. Under version control, to track down a bug in a new revision of your program, you would have to check each reference file for each source module that went into the program to see if a change had been made in that source.
With a project control system, you could simply query the project system for all changes in files related to a program. SourceSafe, for example, consolidates records of all of the changes into a single database. The database does not track individual changes to each file, but keeps an overall record of when a file has been updated. You check files in and out through the centralized database. The database can reconstruct the versions for each module used to create a particular system.
A centralized project control system can handle libraries used by multiple projects in the same way it handles programs. Knowing which programs use which libraries (a sort of high-level cross-reference) can be very helpful in assessing the cost or complexity of a functional library change.