March 1999/C++ Theory and Practice

C++ Theory and Practice: Trimming Excess Fat

Dan Saks

Static members combine the lifetime of global objects and functions with the privacy of member names.

Copyright © 1999 by Dan Saks

I continue this month with further refinements to the programming example that I began last month. The example provides a concrete basis for discussing alternative C++ programming techniques. (See "C++ Theory and Practice: Partitioning with Classes," CUJ, February 1999.)

My example program is a cross-reference generator called xr. xr reads text from standard input and writes a cross-reference to standard output. The cross-reference output is an alphabetized list of the words (identifiers as in C++) that appeared in the input. Each line in the output contains one word followed by the sequence of unique line numbers on which that word appeared in the input.

xr uses a binary tree to maintain the words in alphabetical order. Each node in the tree holds a word and its corresponding sequence of line numbers. More precisely, each node in the tree contains a pointer to the null-terminated character-array representation of a word, and pointers to the head and tail of a linked list representing the sequence of line numbers for that word.

Last month, I created a class called cross_reference_table, which encapsulates that tree. As it stands now, the program consists of two source files and one header file:

xr.cpp: the main part of the application, including the input processing

table.h: the cross_reference_table class and inline member function definitions

table.cpp: the remaining definitions for the cross_reference_table class

The tree implementation employs a structure type, tree_node, and two functions, add_tree and put_tree. The first version of the program declared tree_node, add_tree, and put_tree at global scope. Toward the end of last month's column, I observed that all three names represent implementation details that should be inaccessible outside the cross_reference_table class, and so I transformed them into private class members. The resulting class definition appears in Listing 1. The corresponding member definitions appear in Listing 2.

I concluded last month by noting that declaring add_tree and put_tree as class members introduces a bit of unnecessary overhead into the program. You might want to take a moment and look at the listings to see if you can spot it, but you'd better hurry because I'm going to tell you what it is right now.

Doing Without this

The declaration for add_tree is:
tree_node *add_tree
    (
    tree_node *t,
    char const *w,
    unsigned n
    );
That is, add_tree has a parameter list with three parameters. When it was declared globally, what you saw was what you got: add_tree really had three parameters. However, as an ordinary (non-static) class member, add_tree actually has four parameters: the three you see, plus this.

Normally, I would consider passing this a necessity. After all, the purpose of a member function is to apply some operation to a class object, the so-called receiver of the member function call. However, in this case, add_tree doesn't actually do anything with its receiver object.

A little terminology will help me elaborate. An explicit member access is an expression of the form x.m, where x is an object of class T and m is a member of T. (m could be a data member, a member function, or even a member enumeration constant.) An explicit member access can also take the form p->m, where p is an expression of type T *.

The definition for a member function T::f can refer to member m either by an explicit member access such as x.m or p->m, or by an implicit member access that's just plain m. The compiler interprets an implicit member access to member m as this->m.

For example, the body of cross_reference_table::add in Listing 1 contains a single statement with no explicit member accesses:
root = add_tree(root, w, n);
However, it contains three implicit accesses. The equivalent statement with explicit accesses looks like:
this->root = this->add_tree
        (this->root, w, n);
Now I can elaborate my point that add_tree doesn't actually do anything with its receiver object. Although add_tree never explicitly mentions this, it does contain implicit member accesses. However, the only implicit accesses involve recursive calls such as:
t->left = add_tree(t->left, w, n);
which is equivalent to:
t->left = this->add_tree(t->left, w, n);
In other words, add_tree never does anything with this other than pass it to another call to add_tree. It never uses this to access the data member root nor to call any other member functions.

The same can be said for put_tree: it makes no use of its this parameter other than to pass it to other calls to put_tree. Thus, the extra code and execution time associated with passing the extra parameter is a waste.
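The following is a minimal sketch of the same situation, not the code from Listing 1 or 2. The names node_sketch, table_sketch, count_tree, and count_three_nodes are hypothetical, invented for illustration. It shows an implicit member access next to its explicit equivalent, and a recursive member function whose only use of this is to forward it to the next call.

```cpp
#include <cassert>

// Hypothetical stand-ins for the column's tree_node and
// cross_reference_table; these are not the code from Listing 1 or 2.
struct node_sketch
    {
    node_sketch *left;
    node_sketch *right;
    };

class table_sketch
    {
public:
    explicit table_sketch(node_sketch *r) : root(r) { }
    // Two implicit member accesses: count_tree and root.
    unsigned size_implicit() const
        {
        return count_tree(root);
        }
    // The same statement with the accesses spelled out.
    unsigned size_explicit() const
        {
        return this->count_tree(this->root);
        }
private:
    // An ordinary member function, so it receives a hidden this
    // pointer in addition to t.
    unsigned count_tree(node_sketch const *t) const
        {
        if (t == nullptr)
            return 0;
        // The only use of this is to forward it to the recursive calls.
        return 1 + count_tree(t->left) + this->count_tree(t->right);
        }
    node_sketch *root;
    };

// Builds a three-node tree on the stack and counts it both ways.
unsigned count_three_nodes()
    {
    node_sketch l = { nullptr, nullptr };
    node_sketch r = { nullptr, nullptr };
    node_sketch top = { &l, &r };
    table_sketch t(&top);
    assert(t.size_implicit() == t.size_explicit());
    return t.size_implicit();
    }
```

Nothing in count_tree reads root or calls any other member, so the receiver is carried along on every recursive call without ever being used.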

If add_tree and put_tree don't operate on cross_reference_table objects, then why declare them as class members? As I mentioned earlier, tree_node, add_tree, and put_tree are implementation details that shouldn't be available to code outside the cross_reference_table class. The most straightforward way to confine them to the class is to declare them as private members. Unfortunately, declaring add_tree and put_tree as member functions incurs the slight performance penalty I just described. Fortunately, it's easy to eliminate that penalty by declaring them as static member functions.

Static Members

It's been a long time since I did anything in this column with static members, so here's a brief explanation of what they are.

Suppose your application uses a class called widget, and you want to track the number of widget objects in existence at any given time in the execution of your program. Simply define a counter, initialized to zero, that counts the number of objects. Then, add a statement to each widget constructor to increment the counter, as in:
widget::widget()
    {
    ++counter;
    // initialize the widget
    }
and add a statement to the widget destructor to decrement the counter, as in:
widget::~widget()
    {
    // discard the widget's resources
    --counter;
    }
Now the question is: Where do you declare the counter? Clearly, you can't declare the counter as an ordinary data member of the widget class, as in:
class widget
    {
public:
    widget();
    ~widget();
    // ...
private:
    int counter;
    // ...
    };
because then you'd get a separate counter for each widget object rather than one counter for all the widgets.

The counter variable must be statically allocated and separate from every widget object so there's one and only one counter for all widget objects. Declaring the counter as a global variable will work (in a sense), but global variables increase the potential for name conflicts and subtle coupling among components. So let's forget that.

Rather, declare the counter as a static data member, as in:
class widget
    {
public:
    widget();
    ~widget();
    // ...
private:
    static int counter;
    // ...
    };
A static data member is in the scope of its class and is subject to access control (it can be public, private, or protected). Unlike an ordinary data member, a static data member is not a part of each class object; there's only one copy of the static member, separate from every object. That one copy has static storage duration and external linkage, so that all objects of the class type share the same static member.

The declaration of a static data member inside a class is only a declaration. The definition (and initialization) of the static member appears elsewhere, typically in a source file along with the definitions of the other members of the class. For widget::counter, that definition looks like:
int widget::counter = 0;
If the counter were public, code outside the widget class could query the counter by using its fully-qualified name, widget::counter, as in:
if (widget::counter > 0)
    // some widgets are lying around
But then code outside the widget class could also modify widget::counter and invalidate the count. That's why the counter should be private.
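As a rough illustration of "one copy, separate from every object," the sketch below uses two hypothetical structs, per_object_sketch and shared_sketch, that are not part of the column's widget example. The static member is public here only so the example can observe it from outside, which is exactly what the paragraph above advises against in real code.

```cpp
#include <cassert>

// Hypothetical types for illustration only.
struct per_object_sketch
    {
    int data;
    };

struct shared_sketch
    {
    int data;
    static int counter;    // declaration only; public just for this demo
    };

// The one definition of the static data member.
int shared_sketch::counter = 0;

// Returns true if both objects see the same single counter.
bool one_copy_for_all_objects()
    {
    shared_sketch a = { 1 };
    shared_sketch b = { 2 };
    a.counter = 7;    // written through one object...
    bool same_value = (b.counter == 7);    // ...seen through the other
    bool same_address = (&a.counter == &b.counter);
    // The static member takes up no room inside each object.
    bool same_size = (sizeof(shared_sketch) == sizeof(per_object_sketch));
    return same_value && same_address && same_size;
    }
```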

If you want to allow code outside the widget class to query the private widget counter, you must provide a public member function that returns the current count:
int widget::how_many()
    {
    return counter;
    }
However, this function has a problem: it has a this pointer that it doesn't use and therefore doesn't need. Sound familiar? how_many doesn't need this to locate widget::counter because widget::counter is not in a widget object. Passing the address of a receiver object wastes code and execution time.

This problem goes away if you declare how_many as a static member function:
class widget
    {
public:
    widget();
    ~widget();
    static int how_many();
    // ...
private:
    static int counter;
    // ...
    };
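A static member function does not have a this pointer parameter, so it cannot use an implicit member access to reach ordinary (non-static) data members or member functions; it has no receiver object to supply them. It can access static data members and call other static member functions. Because it has no receiver, you don't need a widget object to call how_many. You simply call it by its fully qualified name, as in: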
if (widget::how_many() > 0)
    // some widgets are lying around
If you wish, you can still use an explicit member access expression to call how_many, as in:
if (w.how_many() > 0)
    // some widgets are lying around
where w is a widget object, or in:
if (p->how_many() > 0)
    // some widgets are lying around
where p is a pointer to a widget object. In either case, the translator uses w or p only to determine the class type of the static member; it does not bind a this pointer to the object as part of the call.
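Putting the pieces together, here is one way the widget fragments above might be assembled into a complete example. The inline function bodies, the copy constructor, and the helper function count_while_two_exist are assumptions added for this sketch. The copy constructor is there because a compiler-generated one would not increment the counter.

```cpp
#include <cassert>

class widget
    {
public:
    widget() { ++counter; }
    // Assumption added for this sketch: copies count as widgets too.
    widget(widget const &) { ++counter; }
    ~widget() { --counter; }
    static int how_many() { return counter; }    // no this pointer
    // ...
private:
    static int counter;    // one counter shared by all widgets
    // ...
    };

int widget::counter = 0;

// Hypothetical helper: creates two widgets and queries the count
// using all three call forms.
int count_while_two_exist()
    {
    widget w;
    widget x;
    widget *p = &x;
    int by_name = widget::how_many();    // no object needed
    assert(w.how_many() == by_name);     // w only selects the class
    assert(p->how_many() == by_name);    // so does p
    return by_name;
    }
```

Once count_while_two_exist returns, both local widgets have been destroyed, so widget::how_many() is back to the value it had before the call.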

Static Members vs. const Members

Listing 3 shows the cross_reference_table class definition with add_tree and put_tree as static member functions. When I added the keyword static at the beginning of the function declarations, I made one other change: I removed the keyword const at the end of the declaration of put_tree. Here's why.

In Listings 1 and 2, put is a const member function because writing a cross-reference table to cout shouldn't change the table's contents. A const member function must treat its receiver object as a const object.

Inside a const member function, the compiler must reject any attempt to modify the receiver object. Specifically, it must reject any attempt to alter a member of the receiver object. It must also reject any attempt to pass the receiver to another function as a non-const object.
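These rules can be sketched with a small hypothetical class, meter_sketch, which is unrelated to the listings. The two statements a compiler must reject are left as comments so that the example still compiles.

```cpp
#include <cassert>

// Hypothetical class for illustration only.
class meter_sketch
    {
public:
    meter_sketch() : reading(0) { }
    // Non-const member function: may modify its receiver.
    void set(int r) { reading = r; }
    // const member function: must treat its receiver as const.
    int get() const
        {
        // reading = 0;    // error: alters a member of the receiver
        // set(0);         // error: passes the const receiver to a
        //                 // non-const member function
        return reading;
        }
private:
    int reading;
    };

// Reads a value back through a reference to const, which permits
// calls to const member functions only.
int read_through_const_reference()
    {
    meter_sketch m;
    m.set(5);
    meter_sketch const &c = m;
    return c.get();
    }
```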

The only statement in the body of put is:
put_tree(root);
which, after converting the implicit member accesses to explicit member accesses, is equivalent to:
this->put_tree(this->root);
This call passes put's receiver object as put_tree's receiver object. If put_tree were a non-const member function, the compiler would reject the call as an attempt to pass a const object as a non-const object. Therefore, put_tree had to be a const member function.

But that was then. put_tree is now a static member function, so it doesn't have a receiver object. The const in the heading of a const member function modifies the type of the receiver, and a static member function has no receiver to modify. Therefore a member function cannot be both static and const. Now that put_tree is static, it cannot be const.
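The sketch below illustrates the same arrangement on a reduced scale. It is not Listing 3. The names value_node_sketch, sum_table_sketch, sum_tree, and sum_of_three_nodes are hypothetical. A const member function calls a private static helper, and the helper's declaration carries no trailing const.

```cpp
#include <cassert>

// Hypothetical stand-ins; not the column's tree_node or Listing 3.
struct value_node_sketch
    {
    int value;
    value_node_sketch *left;
    value_node_sketch *right;
    };

class sum_table_sketch
    {
public:
    explicit sum_table_sketch(value_node_sketch *r) : root(r) { }
    // A const member function may call a static member function:
    // no receiver is passed, so there is nothing to protect.
    int total() const { return sum_tree(root); }
private:
    // Static: no this pointer, and therefore no trailing const.
    static int sum_tree(value_node_sketch const *t);
    // static int bad(value_node_sketch const *t) const;
    //     error: a static member function cannot be const
    value_node_sketch *root;
    };

int sum_table_sketch::sum_tree(value_node_sketch const *t)
    {
    if (t == nullptr)
        return 0;
    // The recursive calls pass only the node pointer.
    return sum_tree(t->left) + t->value + sum_tree(t->right);
    }

// Builds a three-node tree on the stack and sums its values.
int sum_of_three_nodes()
    {
    value_node_sketch l = { 1, nullptr, nullptr };
    value_node_sketch r = { 3, nullptr, nullptr };
    value_node_sketch top = { 2, &l, &r };
    sum_table_sketch t(&top);
    return t.total();
    }
```

Declaring the parameter as a pointer to const keeps the read-only promise that the trailing const used to express for the receiver.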

Miles to Go

Using static members offers just one way to organize this program so that all details of the tree structure are inaccessible outside the cross_reference_table class. I'll consider alternatives at some time in the future.

Dan Saks is the president of Saks & Associates, which offers training and consulting in C++ and C. He is active in C++ standards, having served nearly seven years as secretary of the ANSI and ISO C++ standards committees. Dan is coauthor of C++ Programming Guidelines, and codeveloper of the Plum Hall Validation Suite for C++ (both with Thomas Plum). You can reach him at 393 Leander Dr., Springfield, OH 45504-4906 USA, by phone at +1-937-324-3601, or electronically at dsaks@wittenberg.edu.