Kenneth Pugh, a principal in Pugh-Killeen Associates, teaches C and C++ language courses for corporations. He is the author of C Language for Programmers and All On C, and was a member of the ANSI C committee. He also does custom C programming for communications, graphics, image databases, and hypertext. His address is 4201 University Dr., Suite 102, Durham, NC 27707. You may fax questions for Ken to (919) 489-5239. Ken also receives email at kpugh@dukemvs.ac.duke.edu (Internet) and on Compuserve 70125,1142.
Q
It seems sort of ironic that the issue devoted to debugging contains an example of using macros to solve the "Fixed Field Files." (This arrangement produces a set of macro invocations.) The article "Glass-Box Testing" hints at how to use debugging to break code and verify coverage. Unfortunately, macros can't be debugged. After reviewing, understanding, and maintaining the code in various products, I've come to the conclusion that C programmers abuse macros and the preprocessor in general.
I am a big convert to C++ because the macro preprocessor hardly needs to be used. With true consts and inline functions there is no need to write nasty macros. The following code is written in C-style

    #define MAX_LINE 80
    getline( char ** line )
    {
        static char myline[MAX_LINE];
    }

Compare it to

    const int MAX_LINE=80;
    getline( char ** line )
    {
        static char myline[MAX_LINE];
    }

which is written in C++-style.

Now with a debugger you can print MAX_LINE without saying

    find /usr/project/x400 -name *.h -name *.c \
        -exec grep '#define MAX_LINE'

to search through 200 header files to find the correct macro definition.

As for C++, I'm just praying that programmers will not go overboard and use overloaded functions too much. Because of inheritance and dynamic binding we can have multiple overloaded functions in each class, and the superclass (base class) can have those same overloaded functions defined. I guess it would look something like Listing 1.
What's SIZE? The problem gets worse as the program gets larger. I was working on debugging my project and reading the C Users Journal in between the compiles. I hope that in the future programmers will recognize that, in the world of "reuse," readability must come before performance and space considerations.
I know a programmer who does the following because he's too lazy to write the whole line:

    #define ptr a.get.line.alpha.beta
    ptr->data

Now when I try to print ptr, the debugger says, "ptr not defined." I guess you probably get the message. By the way, David Straker's book, C Style: Standards and Guidelines, is quite informative.

David A. Dennerline
A
I definitely agree with you that C++ can help a programmer write better code, if it is utilized properly. But basic design rules need to be followed, especially in using meaningful names.
I cannot emphasize too much that when one writes a program, it should be written in English or in whatever native language code set is supported by one's compiler. Meaningful variable, member, parameter, and function names go a long way to writing maintainable code. Making too many assumptions about what the reader knows can result in programs that are difficult to understand. The member function names are particularly important in C++ as one is not supposed to look at the implementation to determine what the function is doing.
Let me take your example functions, which were created for example purposes, rather than as actual code. Looking at the parameter names gives no clue as to what is really required.

    virtual int get_size( char * s );
    virtual int get_size( int idx );

In the first case, the question is whether s is the name of the car or the model of the car or a combination of both. Whichever it is should be demonstrated by the name, as

    virtual int get_size( char * model_name);

In the second case, the parameter name is not an English word. It ought to be replaced by a real word, such as "index." I do not think that even that is enough, as some indication should be given as to what the index is used for. The question is whether the index is based on passenger compartment size, horsepower, consumer rating, or something else.

    virtual int get_size( int index_of_consumer_rating);

We are still left with two overloaded functions. I would eliminate the overloading by modifying the name of each member function. In addition, there is no indication in the name of what size is really being gotten. So the functions might really be called

    virtual int get_engine_size_by_model_name ( char * model_name);
    virtual int get_engine_size_by_consumer_rating (int index_of_consumer_rating);

Then the code might read

    get_engine_size_by_consumer_rating(SIZE);

Now the name SIZE looks rather funny here. That's because it is not long enough to describe what it really is. If you are just the class provider, you do not have any control over the names by which your classes will be invoked. At least you have done your best to provide the means to write readable code.

One objection to requiring meaningful names is that the source code is longer. Sometimes it will take two lines to write an expression that could have fit onto one. My response is that statements are like sentences. Look at the last letter or article that you wrote. See how many sentences fit onto a single line.
Another objection is that the names can be too long and can exceed the 31-character limitation of significance. The words in the name do not have to all be in the dictionary. Abbreviations are okay, especially if they appear in an industry dictionary or glossary. Proprietary abbreviations are okay, if the company has a standard glossary. Abbreviations that are local to your programs should be avoided, unless they are consistently applied and a glossary is included. Every time you use your own abbreviation, you add an information element that has to be absorbed by the reader.
I remember reading David Copperfield in my youth. It was a bit advanced for me at the time, so I had it in one hand with a dictionary in the other. Every time I came across an unknown word, I looked it up in the dictionary. You can imagine how that affected following the plot. Using too many of your own abbreviations can make the reader miss the flow of the code.
Let me point out that overloading constructors is almost unavoidable, since there is only one name for the constructor. However, as pointed out in Tom Cargill's C++ Programming Style, many overloaded constructors can be eliminated by specifying default arguments in the parameter list for the constructor.
Your other instance of bad programming shows that the designer did not properly modularize the code in C. One has to learn to do this in C, as that is a design technique, not a feature of the language. C++ makes it easier to design modular code as many of the bookkeeping aspects are taken care of, but it does not demand it.
My basic rule of thumb for structure access is that anytime you use more than one member operator, you should examine how to redesign your code. If you use more than two, you should start the redesign immediately. A program that requires a reference such as

    a.get.line.alpha.beta

is going to be difficult to maintain.
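One way to act on this rule in C is to hide the chain of member operators behind a small function, so that callers use a single meaningful name and the layout can change in one place. What follows is only a sketch under stated assumptions: the structures engine, drivetrain, and car and the function get_engine_displacement_cc are hypothetical names invented for illustration, not taken from the reader's code.

```c
#include <stddef.h>

/* Hypothetical nested structures, invented for illustration only. */
struct engine {
    int displacement_cc;
};

struct drivetrain {
    struct engine engine;
};

struct car {
    struct drivetrain drivetrain;
};

/* Callers ask for the value by name instead of repeating
   car.drivetrain.engine.displacement_cc at every point of use.
   Returns -1 if no car is supplied. */
int get_engine_displacement_cc(const struct car *car)
{
    if (car == NULL)
        return -1;
    return car->drivetrain.engine.displacement_cc;
}
```

Unlike a macro that abbreviates the access path, a function like this has a name the debugger knows about, and it can be stepped into.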
External Declarations
Q
I had a long debugging session recently when I ran across the following problem (see Listing 2). I've cut it down to the essentials. The code appears all in the same source file. The value of cnt is incremented by both functions. The correct code for the second function should have been:

    static int cnt1;
    ...
    cnt1++;
    printf("cnt is %d", cnt1);

This code was only a small portion of a large file with lots of small functions, so it was not readily apparent that it was incorrect. However, I would have thought the compiler would have told me that cnt was declared twice. No error message was generated, so the bad code was hard to find. What gives?

Sam Mellon
Boston, MA
A
You have been exposed to the new semantics of external declarations in ANSI C. This is a major change from K&R C, but it is not noted as such in many texts. Let me review how external variables are linked together, as well as the differences between K&R and ANSI.
In this explanation of externals, K&R refers to Appendix A of Kernighan & Ritchie's book The C Programming Language. Some compilers implemented externals slightly differently.
The linker matches external references (REFs) to external definitions (DEFs). There are REFs and DEFs for both functions and data, though they are generated slightly differently. Take functions first. Their operation is the same in both K&R and ANSI.
Each call to a function in your code generates a REF for that function. The code for the function itself is the DEF for the function. For example, this code segment:

    void calling_function(void)
    {
        called_function();
        ...
    }

generates a DEF for calling_function and a REF for called_function. You can have one and only one DEF for a function in all the files that are linked together. Of course, you can and probably will have lots of REFs for a function, but you don't even have to have any.

You may have gotten messages from the linker regarding duplicate definitions or the failure to provide a definition. Depending on the vendor, the linker may or may not allow you to create an executable file if those errors occur.
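As a small sketch of the function case, the listing below contains two DEFs and one REF. The function names follow the fragment above; the prototype, the int return type, and the returned values are assumptions added only so that the example is complete and its effect can be observed.

```c
/* Prototype only: it describes the function to the compiler
   and generates no DEF. */
int called_function(void);

/* DEF for calling_function. The call inside it is a REF
   to called_function. */
int calling_function(void)
{
    return called_function() + 1;
}

/* The one and only DEF for called_function. It could equally
   live in another source file that is linked with this one. */
int called_function(void)
{
    return 41;
}
```

If the last function were removed and not supplied by any other linked file, the REF would have nothing to match, and the linker would typically report an undefined symbol.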
External REFs and DEFs for data work slightly differently. Under K&R C (Appendix A), the same rules applied as to having one and only one DEF and allowing zero or more REFs. A DEF was generated by a variable declaration outside of any function, such as:

    int global_i;
    calling_function ()
    {
        global_i = 5;
        ...
    }

This declaration could appear in one and only one source file.

Optionally, you could initialize the variable as:

    int global_i = 5;

and the initial value would be set to 5. If you did not specify an initializer, its value would default to 0. To generate a REF, the declaration with the keyword extern was used:

    extern int global_i;
    another_calling_function()
    {
        global_i = 7;
        ...
    }

It was permissible to have both a DEF and a REF in the same source file, such as:

    int global_i = 5;
    extern int global_i;

This would not normally be done directly, but indirectly, as the REF or a set of REFs would be placed into a header file, such as:

    "external.h"
    extern int global_i;
    /* Other extern declarations */

The DEFs could be in individual files or a single file. The header file would be included in this file, such as:

    #include "external.h"
    int global_i = 5;

If the declaration of the DEF does not match the declaration of the REF, the compiler generates an error.

Although K&R was pretty straightforward on how to handle DEFs and REFs, vendors developed their own styles, some of which were adapted from other languages. I won't go into all the possibilities. ANSI C adopted an amalgamation of the styles. It also added the concept of tentative DEF. The first declaration of an external without an explicit initializer is considered to be a tentative DEF. All similar declarations that follow in the same file are also tentative DEFs.
For example, suppose your file contained:

    int global_i;    /* Tentative DEF */
    ...
    int global_i;    /* Tentative DEF */

These are not double definitions of global_i, but simply a set of tentative definitions of the same variable. One tentative DEF will be turned into a true DEF by the compiler.

Now if you declare an initializer in the declaration, it becomes a true DEF. The compiler will accept one true DEF and ignore all the other tentative DEFs.

    int global_i = 5;    /* True DEF */
    ...
    int global_i;        /* Tentative DEF - refers to previous */

The compiler will complain if two true DEFs appear in the same source file, as:

    int global_i = 5;    /* True DEF */
    int global_i = 6;    /* Compiler error - double DEF */

Just as with K&R, you can only have a single true DEF in all the linked source files. The linker should complain if two true DEFs appear in multiple linked source files. However, the way many compilers/linkers work extends the concept of tentative DEFs to multiple source files. A tentative DEF in a source file continues to act as a tentative DEF, rather than being turned into a true DEF. If there is no true DEF in any source file, then the linker turns a tentative DEF into a true DEF with an initializer of zero. The linker will accept one true DEF and ignore all the other tentative DEFs.

If you include the keyword static, the scope of the variable is only within the source file. However, the rules for tentative DEFs still apply:

    static int cnt;    /* Tentative DEF */
    static int cnt;    /* Tentative DEF */

Both tentative DEFs refer to the same variable. This is why your code had a problem. However, it is acceptable ANSI C, so the compiler did not complain. If you used initializations, you should get an error, such as:

    static int cnt = 0;    /* True DEF */
    static int cnt = 0;    /* 2nd True DEF - compiler error */

If you had simply initialized one, the other would have been treated as a tentative DEF for the same variable:

    static int cnt = 0;    /* True DEF */
    static int cnt;        /* Tentative DEF */

To complete the external puzzle, the keyword extern has been altered from K&R. It now can be used with an initializer, as:

    extern int global_i = 5;    /* True DEF */
    extern int global_i;        /* REF */

If used without an initializer, it still means REF.

I should note that the concept of tentative DEFs does not exist in C++. A set of declarations as:

    int global_i;
    int global_i;

is a compiler error due to the double definition.

I covered my guidelines for using global variables long ago. They can be summed up in a word: Don't. However, if you must, I suggest following the style shown for K&R above. Use extern declarations (REFs) in a header file that gets included in every source file. Then in a single file initialize all the externals. For static externals, place all declarations together near the top of the source file.
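To see why scattered static declarations are risky, here is a minimal sketch of the kind of file described in the question. It is not the reader's actual listing: the function names count_first and count_second and their return values are hypothetical. Both declarations of cnt are tentative DEFs of the same file-scope variable, so an ANSI C compiler accepts the file and the two functions quietly share one counter.

```c
static int cnt;        /* Tentative DEF */

/* Hypothetical first function: increments what it believes
   is its own counter. */
int count_first(void)
{
    cnt++;
    return cnt;
}

static int cnt;        /* Tentative DEF of the SAME variable, not a new one */

/* Hypothetical second function: the author meant a separate
   counter here, but this is the cnt declared above. */
int count_second(void)
{
    cnt++;
    return cnt;
}
```

Calling count_first once and then count_second once leaves cnt at 2, not 1 in each. With all the static declarations grouped at the top of the file, the duplicate would sit right next to the original and be much easier to spot.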
P.J. Plauger long ago pioneered a similar style with Whitesmiths C, using the new ANSI standard for externals. Initialized declarations (extern int i = 1;) in one file were marked as true DEFs. Uninitialized declarations (extern int i;) in a header file were used as REFs.
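The header-file style can be sketched in a single listing. In practice the first part would live in a header such as external.h and the second part in exactly one source file; they are shown together here only so that the sketch is self-contained. The function add_to_global_i is hypothetical and stands in for code in any other source file that includes the header.

```c
/* --- would be in "external.h", included by every source file --- */
extern int global_i;          /* REF */

/* --- would be in the single source file that defines the externals --- */
int global_i = 5;             /* True DEF, checked against the REF above */

/* --- would be in any other source file that includes the header --- */
int add_to_global_i(int amount)
{
    global_i += amount;       /* uses the variable through the REF */
    return global_i;
}
```

Because the defining file also sees the extern declaration, the compiler can report a mismatch between the REF and the DEF, for example if one were changed to long and the other were left as int.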
Social Security Feedback
I received a note from Bruce Bogert regarding the conventions on Social Security numbers. The first three digits are assigned on a state-by-state basis. For example, 212 through 220 are for Maryland. The next two digits may be used for individual divisions within a state and the last four digits are sequentially assigned. If you do need to make up a Social Security number for some outfit that requires it but is not legally obligated to have it, you might want to know that numbers beginning with 900 to 999 are not used for the most part. (KP)