const promises a lot in a parameter list, but sometimes it appears to promise too much.
Copyright © 1998 by Dan Saks
In my recently completed series on namespaces, I tried to see how far I could get using namespaces, in combination with separate compilation and incomplete types, to decompose a C++ program into simpler discrete components. I came to the not-so-surprising conclusion that, although namespaces may help eliminate name conflicts among libraries, namespaces are no substitute for classes when it comes to packaging abstractions. (See "C++ Theory and Practice: Classes vs. Namespaces, CUJ, July 1998.)
Throughout that series on namespaces, I experimented with different uses for namespaces by applying them in a sample program called xr, a cross-reference generator. Of all the versions of xr that I presented, none used the const qualifier. Before I do any more work with the program (such as rewrite it using classes), I probably should correct that oversight by adding the const qualifier where I think it's appropriate. That's what I'll do right now.
Once More, From the Top
I described the xr program in detail some months ago. (See C++ Theory and Practice: Partitioning with Namespaces, Part 1, CUJ, April 1998.) Here's the executive summary.
xr reads text from standard input and writes a cross-reference to standard output. The cross-reference output is an alphabetized list of the words (identifiers as in C++) that appeared in the input. Each line in the output contains one word followed by the sequence of unique line numbers on which that word appeared in the input.
For example, if you use the source code for xr (Listing 1) as input to itself, the cross-reference output should contain the following line for the identifier token:
token: 32 34 35 36 37token actually appears twice on line 35, but the output only mentions 35 once.
Listing 1 is nearly identical to the program I started with several months ago. The original program was a single source file written in Typesafe C, C that also compiles as C++ [1]. Since I will eventually rewrite the program using classes, I've made a few minor changes so that the program compiles only as C++. In particular:
- It uses // (C++-style) comments.
- It assumes that struct tag names are type names.
- It uses new-expressions instead of
malloc.The program contains just four functions:
- main
- get_token: get a token from standard input
- add_tree: add an entry to the cross-reference for a word and its corresponding line number
- put_tree: write the alphabetized cross-reference listing to standard output
xr uses a binary tree to maintain each word in alphabetical order. Each tree_node holds a word and its associated set of line numbers. Each set of line numbers is implemented as a singly linked list with separate pointers to the first and last list_node in the list. Each list_node contains one line number.
When add_tree adds a cross-reference entry for a word that is already in the tree, it simply adds the line number to that word's existing list (if the number isn't already there). Maintaining a pointer to the last list_node in each list makes adding a new line number very fast and easy.
Using const to Specify Interface
Listing 1 differs from the original Typesafe C version of xr in one other way it uses the const qualifier in the appropriate places. My concept of the appropriate places is based on a general approach I presented almost two years ago. (See C++ Theory and Practice: const as a Promise, CUJ, November 1996.) I don't believe I have taken the time to show you how I'd apply that approach in practice, so I'll use this as my opportunity.
The const qualifier appears in five places in Listing 1. However, not all those appearances are logically distinct. Listing 1 declares add_tree early, and defines it much later. The const qualifier appears in the declaration and in the same place in the later definition. Similar duplication occurs in the declaration and later definition of put_tree.
The first const appears in the declaration of add_tree's second parameter:
tree_node *add_tree ( tree_node *t, char const *w, unsigned n )add_tree makes some notation in the cross-reference table that word w appeared on line n. w is a pointer to the first character in a null-terminated array. For example, in a call such as
xr = add_tree(xr, token, ln);w points to the first character of token.
The const qualifier in w's declaration indicates that add_tree won't try to change the contents of the array passed as the second argument, which in this example is token. In my book, declaring w as char const * places a meaningful constraint on add_tree's behavior as seen by callers.
Both parameter t and the return type have type tree_node *. Neither should have type tree_node const *. That's because add_tree can't do its job unless it retains the right to modify the nodes in the tree.
Even though add_tree doesn't alter the value of its third parameter, n, I'm not inclined to declare it as const. Declaring n as const has no impact on add_tree's outward behavior. As it is, n is passed by value. Even if add_tree did alter n, a call such as:
xr = add_tree(xr, token, ln);wouldn't modify actual argument ln because parameter n holds a completely separate copy of ln's value. Adding const to n's declaration only reaffirms that add_tree won't alter its own copy.
What's the harm in declaring n as const? Admittedly, not much. For the most part, it's just low-level noise, but it can be distracting. For example, suppose you declared n as const, as in:
tree_node *add_tree ( tree_node *t, char const *w, unsigned const n )It appears just above the declaration of get_token:
void get_token ( char *s, size_t n )get_token gets the next token from standard input into an array of n characters starting at s. Obviously, you can't declare s as a char const * because get_token must be able to write to *s. But, if you declare add_tree's parameter n as const, why not declare get_token's parameter n as const? Because get_token's current implementation alters n's value.
Although xr is currently a single source file, the function declarations appearing before main correspond to declarations that would appear in a header if the program spanned multiple source files. When I read one of these function declarations, especially one that others have written, I'm always looking for information about the calling interface. That is, I want to know what to pass to the function, and what it will pass back. In the case of add_tree, declaring w as a char const * gives me information I can use. Declaring n as an unsigned const does not.
Moreover, if you insist on declaring n as const, then you should also declare w as const, as in:
tree_node *add_tree ( tree_node *t, char const *const w, unsigned const n )After all, add_tree doesn't need to alter w any more than it needs to alter n. Now only one of the three occurrences of const in this function declaration conveys information about the interface. The low-level noise is getting louder. My urge is to squelch it.
Using const As If It Were Deep
The const qualifier appears in the declaration of put_tree's lone parameter:
void put_tree(tree_node const *t)put_tree writes the tree to standard output, and shouldn't alter the tree in the process. Again, this is a useful constraint on the function's behavior.
I also used the const qualifier in this declaration in the body of put_tree:
list_node const *p;p points to list_nodes in a list emanating from a tree_node. Since the tree_node is const, it seems reasonable that a list within that tree_node should also be const. Interestingly, if I had declared p without the const qualifier, the code would still compile. Here's why.
A tree_node is defined as:
struct tree_node { char *word; list_node *first; list_node *last; tree_node *left; tree_node *right; };A const tree_node object behaves as if it were defined as:
struct tree_node { char *const word; list_node *const first; list_node *const last; tree_node *const left; tree_node *const right; };That is, it behaves as if the members are const pointers that point to non-const nodes.
When I declared put_tree's parameter t as a tree_node const *, I wanted to assert that put_tree wouldn't change any part of the tree whose root is t. Unfortunately, this use of const only asserts that it won't change the members of the tree_node itself. put_tree can still change the characters in the array that word points to, and the nodes in the list that first and last point to.
In a sense, const is shallow. It applies only to the pointer members themselves. In this case, const should be deep. It should also apply to what the pointers point to. That is, a const tree_node should behave as if it were defined as:
struct tree_node { char *const word; list_node const *const first; list_node const *const last; tree_node const *const left; tree_node const *const right; };Unfortunately, it doesn't. Thus, if I had declared p as:
list_node *p;then the assignment:
p = t->first;in the initialization of the for-statement would copy a value from a const object to a non-const object of the same type. Such assignments are perfectly valid.
Even though I didn't need to declare p as a list_node const *, I did anyway. Using const helps ensure that put_tree won't alter any list_nodes. In other words, I'm trying to program as if const were deep.
I suspect that some of you might think I'm being inconsistent here. I used const in declaring put_tree's local variable p to prevent certain logic errors, yet I left the const off the declaration of add_tree's parameter n even though that const might prevent a similar class of logic errors. Why should I treat them differently?
The difference is that users of put_tree don't see p and therefore don't care about the const in p's declaration, but users of add_tree see the declaration of parameter n. In both cases, const is an implementation detail, but in the case of a parameter declaration, it appears in the user interface. If I have to make a choice, I think keeping the implementation separate from the interface is more important than enlisting help from const. Fortunately, C++ no longer forces you to make the choice. You can use the following technique.
Using const to Specify Implementation
As I explained earlier, you can declare add_tree as either:
tree_node *add_tree ( tree_node *t, char const *w, unsigned n )or:
tree_node *add_tree ( tree_node *t, char const *const w, unsigned const n )The first form uses const only where it affects the function's interface. The second uses const where it affects the function's implementation as well. When I thought I had to choose, I chose the former as the clearer depiction of the function's outward behavior.
Fortunately, you don't have to choose. You can use both. You can use the former when you declare add_tree (before main or in a header), and use the latter when you define add_tree (after main or in a separate file). Here's how it works.
According to the C++ standard, declarations such as:
void f(T v); void f(T const v);do not overload f. These are two declarations for the same function named f. C++ ignores the const qualifier when it applies to the entire type of the parameter.
In the declarations just above, if you replace type T with X *, you get that:
void f(X *v); void f(X *const v);are also two declarations for the same function. Please don't confuse these declarations with declarations such as:
void f(X *v); void f(X const *v);which declare two different functions. The exact placement of const makes all the difference. In the last declaration, const applies, not to the type of v, but to the type of *v.
All of this means you can declare a function f as:
void f(T v);in a header, yet define the same function as:
void f(T const v) { ... }in a source file. Then, when users look in the header, they only see uses for const that affect the interface. But, when you implement the function, you can still protect yourself against accidentally changing the value of the formal parameter.
You can use this style with put_tree as well. The declaration for put_tree (before main) can be:
void put_tree(tree_node const *t);The heading of the corresponding definition later in the source file can be:
void put_tree(tree_node const *const t)I compiled the program with three different compilers Borland C++ 5.0, Edison Design 2.38, and Microsoft Visual C++ 5.0 and it worked with every one.
My reading of the C standard, as well as the draft for the new C9X standard, is that C does not sanction this programming idiom.
The Great Tabs vs. Spaces Debate
A few months ago, in a reply to mail from a reader, I wrote that I preferred to indent with tabs rather than spaces. (See C++ Theory and Practice: Partitioning with Namespaces, Part 3, CUJ, June 1998.) This prompted a small flurry of mail from readers who disagreed. Although I can't say it changed my mind, I still appreciated reading different viewpoints.
For those of you who care to read any of these opinions, you can find them in the Code Review forum on CUJ's website (www.cuj.com/forum). You can chime in by sending mail to mbriand@mfi.com.
Time for a Break
I've been writing regular columns for the CUJ for nearly eight years, and I'm starting to burn out. I'm going to take a sabbatical to recharge my batteries. Please don't fret. I'll be back in a few months with more pearls of wisdom.
Reference
[1] Thomas Plum and Dan Saks. C++ Programming Guidelines (Plum Hall, 1992).
Dan Saks is the president of Saks & Associates, which offers training and consulting in C++ and C. He is active in C++ standards, having served nearly seven years as secretary of the ANSI and ISO C++ standards committees. Dan is coauthor of C++ Programming Guidelines, and codeveloper of the Plum Hall Validation Suite for C++ (both with Thomas Plum). You can reach him at 393 Leander Dr., Springfield, OH 45504-4906 USA, by phone at +1-937-324-3601, or electronically at dsaks@wittenberg.edu.