Dan pushes his design using namespaces to its logical conclusion and logically concludes he was wrong to design that way.
Copyright © 1998 by Dan Saks
For the past three months I've been experimenting with different ways to partition a C++ program into simpler components using separate compilation, incomplete types, and namespaces. (See ''C++ Theory and Practice: Partitioning with Namespaces,'' Parts 1 through 3, CUJ, April through June 1998.) Now that I've explored several variations on this theme, it's time to reflect on just how effective the techniques are.
I've been demonstrating the techniques by applying them to xr, a program that reads text from standard input and writes a cross-reference to standard output. The cross-reference output is an alphabetized list of the words (identifiers as in C++) that appeared in the input. Each line in the output contains one word followed by the sequence of unique line numbers on which that word appeared in the input.
xr implements the cross-reference table as a binary tree. Although the program started out as a single source file, I soon split it into two source files:
- xr.cpp: the main part of the application, including the input processing
- cross_reference.cpp: the function and data definitions that constitute the cross-reference table implementation
and one header file:
- cross_reference.h: the declarations that constitute the cross-reference table interface
cross_reference.h (shown in Listing 1) declares several names, all as members of namespace cross_reference. Of those names, only add and put are truly parts of the cross-reference interface. The other names are merely details of the cross-reference implementation. I concluded last month by saying that I'm still bothered by those details in the header. Now I'll explain exactly what's bothering me.
The Illusion of Simplicity
My objective in this, and every other software design, is to produce a program that's as correct and maintainable as possible, without undue sacrifices in speed and space efficiency. As I explained a few months ago, I believe the best way to ensure correctness and maintainability is to organize the program so that it's as simple as possible. (See ''C++ Theory and Practice: Basing Style on Design Principles,'' CUJ, March 1998.)
You can rarely turn a large, complicated program into a truly simple one, but you can create the illusion that the program is simpler than it really is. That is, you can decompose a large program into simpler abstractions so that you can work effectively with just one or two parts of the program while ignoring most of the others.
Granted, xr is not a very large or complicated program. However, it's large enough to exhibit, at least in a small way, the problems that plague larger programs. I think attacking xr as if it were a much larger program is still a reasonable way to demonstrate abstraction techniques in C++.
I split xr into separate source files to simplify the main function. main needs to know what the cross-reference table does, but not how it's implemented. Thus, I moved the implementation details to cross_reference.cpp in an attempt to hide them from main.
Ideally, cross_reference.h should provide just enough information so that main can use the full functionality of the cross-reference table without any concern for the inner workings of that table. Unfortunately, cross_reference.h provides more than that. It contains declarations that reveal implementation details.
Why is that a problem? After all, the main function (in Listing 2) can, and does, call the cross-reference add and put functions without any apparent knowledge that the cross-reference table is a tree. Unfortunately, the majority of the declarations in cross_reference.h specify implementation details. It's easy to refer to them accidentally. Sure, you can add comments to cross_reference.h exclaiming:
// don't declare variables of this type!
or
// don't call this function!
but I have it on good authority from this guy named Murphy that accidents will happen nonetheless.
The only way to prevent such mishaps is to make the declarations inaccessible so that compilers can enforce the abstractions. One sure way to make those declarations inaccessible is to make them invisible to main by removing them from the header. I tried that in the first part of this series, but it introduced speed and space penalties which I chose not to accept. That's why implementation details remain in the header.
When designing separately-compiled abstractions, you may find other legitimate reasons to leave implementation details in the header. One of the most common reasons is to allow multiple instances of the abstraction.
Allowing Multiple Instances
As it stands, xr places all the identifiers into one cross-reference table. When I moved the code for the cross-reference table to separate source and header files, I didn't anticipate supporting more than one table. The calls
cross_reference::add(token, ln);
and
cross_reference::put();
in main don't specify the cross-reference as a parameter; they act upon the cross-reference, whatever and wherever it is. In the current implementation, the cross-reference table is the statically-allocated variable cross_reference::xr. This design just doesn't allow more than one cross-reference table.
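The shape of that single-table design can be sketched as follows. This is only an illustration, not the article's listing: the std::map stand-in for the binary tree, the typedef name table_rep, the stream parameter on put, and the assumption that line numbers start at 1 are all inventions of this sketch. Only the names cross_reference, xr, add, and put come from the text.

```cpp
#include <cassert>
#include <iostream>
#include <map>
#include <set>
#include <string>

namespace cross_reference
{
    // Hypothetical stand-in for the article's binary tree:
    // each word maps to the set of line numbers it appeared on.
    typedef std::map<std::string, std::set<unsigned> > table_rep;

    // The one and only table, analogous to cross_reference::xr in the text.
    table_rep xr;

    // Record that word appeared on line ln. No parameter names the table,
    // so every call necessarily acts on xr.
    void add(const std::string &word, unsigned ln)
    {
        assert(ln > 0);   // assumption: line numbers start at 1
        xr[word].insert(ln);
    }

    // Write the table in alphabetical order. The stream parameter is an
    // assumption of this sketch; it defaults to standard output so that
    // put() can still be called with no arguments, as in the text.
    void put(std::ostream &os = std::cout)
    {
        for (table_rep::const_iterator w = xr.begin(); w != xr.end(); ++w)
        {
            os << w->first;
            for (std::set<unsigned>::const_iterator n = w->second.begin();
                 n != w->second.end(); ++n)
                os << ' ' << *n;
            os << '\n';
        }
    }
}
```

A caller writes cross_reference::add(token, ln) and cross_reference::put(), and there is nowhere in either call to say which table is meant.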
What happens if you want to allow for more than one cross-reference table in the application? Admittedly, this particular application doesn't need more than one table. But it's not hard to imagine applications that use more than one instance of some kind of symbol table similar to the cross-reference table. Rather than write a new application, let's just rewrite xr to allow for more than one cross-reference table, even if we don't take advantage of it right away.
To allow more than one table, you must pass some additional information to the cross_reference add and put functions to tell them which table to act upon. For example, if t designates a table, then
cross_reference::put(t);
displays the contents of table t. How do you declare t? You could declare it as
cross_reference::tree_node *t;
but that would expose the tree implementation to main. You should use a type name that hides the implementation, something abstract like:
cross_reference::table t;
table could be just a typedef defined in cross_reference.h as a member of namespace cross_reference:
typedef tree_node *table;
You must then add a parameter of type table to the add and put functions, and change add's return type to table as well.
The revised interface for the cross-reference table appears as cross_reference.h in Listing 3. The corresponding revisions to function main appear in Listing 4. Those revisions include a statement that initializes cross-reference table t:
t = NULL;
If you omit this step, t's value remains undefined. The program might work anyway, but don't count on it.
Here we go again. This initializer exposes implementation details to main. The inappropriate detail is not that the table needs to be initialized; it's that the table needs to be initialized with a null pointer. If the cross-reference table were implemented as a different data structure, this assignment statement might not be the proper way to initialize it. In fact, it might not even compile.
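To make the problem concrete, here is a minimal sketch of an interface that takes the table as a parameter. It is not the article's Listing 3 or 4. The node layout, the stream parameter on put, the clear function, and the helper two_tables are assumptions made for illustration. The sketch also assumes that line numbers arrive in nondecreasing order, as they would when reading input sequentially.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

namespace cross_reference
{
    // Hypothetical node layout, assumed for this sketch.
    struct tree_node
    {
        std::string word;
        std::vector<unsigned> lines;   // unique line numbers, in order seen
        tree_node *left;
        tree_node *right;
    };

    typedef tree_node *table;

    // Add (word, ln) to table t and return the possibly new root.
    table add(table t, const std::string &word, unsigned ln)
    {
        assert(ln > 0);   // assumption: line numbers start at 1
        if (t == NULL)
        {
            t = new tree_node;
            t->word = word;
            t->lines.push_back(ln);
            t->left = t->right = NULL;
        }
        else if (word < t->word)
            t->left = add(t->left, word, ln);
        else if (t->word < word)
            t->right = add(t->right, word, ln);
        else if (t->lines.back() != ln)   // skip a repeated line number
            t->lines.push_back(ln);
        return t;
    }

    // Write table t in alphabetical order (in-order traversal).
    void put(table t, std::ostream &os)
    {
        if (t == NULL)
            return;
        put(t->left, os);
        os << t->word;
        for (std::size_t i = 0; i < t->lines.size(); ++i)
            os << ' ' << t->lines[i];
        os << '\n';
        put(t->right, os);
    }

    // Release every node of t (an addition of this sketch).
    void clear(table t)
    {
        if (t == NULL)
            return;
        clear(t->left);
        clear(t->right);
        delete t;
    }
}

// Two independent tables in one program.
std::string two_tables()
{
    cross_reference::table a = NULL;   // the caller must know a table is a pointer
    cross_reference::table b = NULL;
    a = cross_reference::add(a, "beta", 2);
    a = cross_reference::add(a, "alpha", 1);
    a = cross_reference::add(a, "alpha", 3);
    a = cross_reference::add(a, "alpha", 3);   // duplicate is ignored
    b = cross_reference::add(b, "alpha", 7);

    std::ostringstream os;
    cross_reference::put(a, os);
    os << "--\n";
    cross_reference::put(b, os);
    cross_reference::clear(a);
    cross_reference::clear(b);
    return os.str();
}
```

The two initializers in two_tables are the weak spot: they compile only because the caller knows, and depends on, the fact that a table is a pointer.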
Again, you should use a more abstract mechanism, such as an initialization function. The call might look like:
t = cross_reference::init();
or
cross_reference::init(&t);
This, in turn, raises even more questions:
- How do you ensure that a program using the cross-reference does not forget to initialize t?
- How do you prevent the application from bypassing the call to init and assigning NULL to t instead?
Remember, cross_reference::table is just a typedef. A typedef is not a distinct type; it's an alias for some other type. In this case, table is an alias for a pointer type. The typedef lets you write code that treats the table as an abstraction, but doesn't prevent you from still treating it like it's a pointer.
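A small sketch shows how little the typedef protects. The init function below follows the first form suggested in the text; its body and the helper typedef_hides_nothing are assumptions of this sketch.

```cpp
#include <cassert>
#include <cstddef>

namespace cross_reference
{
    struct tree_node;            // an incomplete type is enough for this sketch
    typedef tree_node *table;    // an alias, not a distinct type

    // Hypothetical init function; with a tree, an empty table is a null pointer.
    inline table init()
    {
        return NULL;
    }
}

// Returns true. The point is that every statement in it compiles.
bool typedef_hides_nothing()
{
    cross_reference::table t1 = cross_reference::init();   // intended use
    cross_reference::table t2 = NULL;                      // bypasses init entirely
    cross_reference::tree_node *raw = t1;                  // same type, no conversion
    assert(raw == NULL);
    return raw == t2;
}
```

Nothing in the language distinguishes the second and third declarations from the first, so a compiler cannot flag them.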
By the way, with the current implementation, you can get away without an initialization function if you define t outside main (in global scope). Pointer objects defined at global scope (in fact, at any namespace scope) have static storage duration and are zero-initialized, which gives them a null pointer value. This masks the need for an initialization function, but doesn't make it go away.
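The difference between the two storage durations can be sketched like this. The variable and function names are hypothetical; only cross_reference::table and tree_node come from the text.

```cpp
#include <cassert>
#include <cstddef>

namespace cross_reference
{
    struct tree_node;            // incomplete type; only a pointer is needed
    typedef tree_node *table;
}

// Namespace scope: static storage duration, so the pointer is
// zero-initialized to a null pointer value before main starts.
cross_reference::table global_t;

bool global_t_starts_null()
{
    return global_t == NULL;
}

void use_local_table()
{
    // Block scope: without "= NULL" this pointer would hold an
    // indeterminate value, and reading it would be undefined behavior.
    cross_reference::table local_t = NULL;
    assert(local_t == NULL);
}
```

Relying on global_t this way works only while an empty table happens to be a null pointer, which is exactly the implementation detail the abstraction is supposed to hide.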
The Right Tool for the Job?
Until now, I've been focusing on using a namespace to package the cross-reference table as an abstraction. I've been gradually reducing the implementation details in the header. I almost got to where it looked like the namespace would actually do the job. But that was only as long as I was willing to settle for at most one cross-reference table per application. Even then, the namespace couldn't hide all the implementation details.
Now that I'd like to allow for more than one table in the application, it's apparent that I've been using the wrong tool. The thing that needs to be an abstraction is the type of the cross-reference table. A namespace is not a type; it's just a cluster of names. The principal mechanism for packaging abstractions in C++ always was, and still is, a class. The need for an init function strongly suggests that the table type should be a class with a constructor.
This is an important point, so I'll dwell on it for a while longer. Namespaces and classes have many similarities. It's easy to be confused about when to use one as opposed to the other. This is especially so because namespaces are new and there aren't yet many published examples that use them.
You can use namespaces along with separate compilation, as I did, to implement an abstraction such as a cross-reference table. However, you might have to accept that you can't have more than one instance of the abstraction. You might also have to accept performance penalties introduced by extra layers of function calls. Even then, you will probably still wind up with implementation details lurking in the header.
No matter how you shuffle declarations among namespaces, implementation details in header files weaken abstractions. There's just no way to make namespace members in a header inaccessible to code that includes the header. That's because namespaces aren't for building abstractions; they're for avoiding name conflicts. Classes are for building abstractions.
Rewriting the cross-reference table as a class won't eliminate all the implementation details from the header, but it will make those details inaccessible to all but the implementation. Class members are subject to access control. A class can make its members inaccessible, even if it can't make them invisible. Namespace members are not subject to access control; they are always public. The only way to make them inaccessible is to make them invisible.
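As a rough illustration of that difference, here is a minimal sketch of a table class. It is not the class the article goes on to develop: the name xref_table and all of its members are hypothetical, and it records only distinct words, without line numbers, to stay short.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical class; its name and members are assumptions of this sketch.
class xref_table
{
public:
    // The constructor does the job of init, and it cannot be forgotten.
    xref_table() : root(NULL), distinct(0) { }
    ~xref_table() { destroy(root); }

    void add(const std::string &word)
    {
        assert(!word.empty());   // assumption: words are never empty
        root = insert(root, word);
    }

    std::size_t words() const { return distinct; }

private:
    // Visible in the header, but inaccessible outside the class.
    struct tree_node
    {
        std::string word;
        tree_node *left;
        tree_node *right;
    };

    tree_node *root;
    std::size_t distinct;

    tree_node *insert(tree_node *t, const std::string &word)
    {
        if (t == NULL)
        {
            t = new tree_node;
            t->word = word;
            t->left = t->right = NULL;
            ++distinct;
        }
        else if (word < t->word)
            t->left = insert(t->left, word);
        else if (t->word < word)
            t->right = insert(t->right, word);
        return t;
    }

    static void destroy(tree_node *t)
    {
        if (t != NULL)
        {
            destroy(t->left);
            destroy(t->right);
            delete t;
        }
    }

    // Copying is declared but not defined, so it is unsupported in this sketch.
    xref_table(const xref_table &);
    xref_table &operator=(const xref_table &);
};

// Outside the class, each of these lines would fail to compile:
//     xref_table::tree_node *p;   // tree_node is private
//     xref_table t; t.root = NULL;   // root is private
```

Two objects of this type are two independent tables, and a program that includes the header can see tree_node and root but cannot name or touch them.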
Some Thoughts on Pedagogy
You may be wondering why I chose to work xr through so many revisions and in such detail only to conclude that I was going about it wrong. I don't think it was a waste of time. Far from it. I wasn't going about it all wrong. Most of what I wrote about designing abstractions is still valid. It's just that I should have been using classes instead of namespaces to render parts of the design.
I must admit that I suspected this outcome before I began. After all, I'd done this exercise before using classes, and I knew that they worked fine. On the other hand, I hadn't used namespaces all that much, so I wasn't really sure where, or even if, they'd come up short.
Even though I anticipated the result, I still think it's important to work through examples like this in detail. They often raise subtle points that help you discover which language features are best for certain tasks. Moreover, these little experiments don't always come out as expected. In the years that I've been writing for CUJ it has happened more than once that the point I thought I was going to make when I started an article was not the one that I ultimately made. (See ''C++ Theory and Practice: Mixing const with Type Names,'' CUJ, December 1996 for a case in point.)
I don't think it would have even occurred to me to try to use namespaces as I did were it not for an example that I read in chapters 8 and 9 of Stroustrup [1]. Stroustrup's example uses namespaces and separate compilation to partition a program into several modules, each with an interface that is an abstraction of its implementation.
Stroustrup's example program is a simple desk calculator, which he reworks a few times to increase abstraction and reduce coupling between modules. The techniques seem to work fairly well. The surrounding text explains many features of namespaces, using directives, and using declarations, including some I have not covered in these articles. Many of you should find it worth reading.
On the other hand, Stroustrup introduces his example with this disclaimer:
The calculator is a tiny program, so in ''real life'' I wouldn't use namespaces and separate compilation to the extent I do here. It is simply used to present techniques useful for larger programs without our drowning in code. In real programs, each ''module'' represented by a separate namespace will often have hundreds of functions, classes, templates, etc.
The disclaimer raises questions about how he would write the calculator in ''real life.'' Should some of the modules in the calculator be classes rather than namespaces? If so, which ones? On what basis do you distribute the classes among the remaining namespaces?
The calculator example appears before the chapter on classes, so it doesn't use classes to implement any of the abstractions. The chapter on classes does no further work with the calculator. I think it would be better if it did. Although there are exercises at the end of the chapter that suggest rewriting parts of the calculator using classes, the book leaves you pretty much on your own to figure out whether to introduce new classes into the existing namespace structure or to turn some of the namespaces into classes.
My current thinking is that, in ''real life,'' most of the namespaces in the calculator should be classes. For example, the calculator includes a lexer module which pieces input characters into tokens such as numbers and operators. The lexer is implemented as a namespace. As such, it doesn't provide any data abstraction. It declares data representing the internal state of the lexer as part of its interface. In addition, the namespace allows only one instance of the lexer. These are the same problems that haunt the namespace implementation of the cross-reference table in my xr program.
I think the lexer should be a class. If you want to see what a lexer class looks like, look at my decl program. decl converts C++ declarations into English. (See ''The Column That Needs a Name: Parsing C++ Declarations, Part 2,'' CUJ, March 1996.) In decl, the lexer class is called scanner.
My point is that, while it's not too hard to show what you can do with namespaces, it's much harder to show what you should do with namespaces. Namespaces are supposed to help manage names in large programs and libraries. It's hard to demonstrate large-scale programming techniques using programs that are small enough to publish.
Thus far, I have only shown you what you can do with namespaces. I've also suggested something you shouldn't do with namespaces, namely, use them instead of classes to package abstractions. I guess I'm still on the hook to suggest ways that you really should use namespaces.
I admit that I'm still just feeling my way through this stuff. I don't think I have any real answers, at least not yet. However, I think this discussion has helped lay some important groundwork by distinguishing the capabilities of classes and namespaces.
In the coming months, I will rework the cross-reference program using classes. We'll see what role namespaces continue to play.
Reference
[1] Bjarne Stroustrup. The C++ Programming Language, 3rd ed. (Addison-Wesley, 1997).
Dan Saks is the president of Saks & Associates, which offers training and consulting in C++ and C. He is active in C++ standards, having served nearly seven years as secretary of the ANSI and ISO C++ standards committees. Dan is coauthor of C++ Programming Guidelines, and codeveloper of the Plum Hall Validation Suite for C++ (both with Thomas Plum). You can reach him at 393 Leander Dr., Springfield, OH 45504-4906 USA, by phone at +1-937-324-3601, or electronically at dsaks@wittenberg.edu.