May 1991/Stepping Up To C++

Columns

Stepping Up To C++

Your First Class

Dan Saks

Dan Saks is the owner of Saks & Associates, which offers consulting and training in C, C++ and Pascal. He is also a contributing editor of TECH Specialist. He serves as secretary of the ANSI C++ committee and is a member of the ANSI C committee. Readers can write to him at 287 W. McCreight Ave., Springfield, OH 45504 or by email at dsaks@wittenberg.edu.
In my last column, I demonstrated a strategy for easing into C++. You transform an existing C program with poor encapsulation into a C++ program with better encapsulation. (See "Writing Your first Class," CUJ, March 1991). This month, I will explain the resulting C++ program in more detail and present further improvements.

A Brief Recap
xr is a simple cross-reference generator that prints an alphabetized list of words appearing in a document. Each word in the output is followed by the sequence of line numbers in the document on which that word appears.
The cross-reference generator uses two kinds of data structures — a binary tree and linked lists. The program contains only one binary tree. Each node in the tree stores a word along with that word's sequence of line numbers. The binary tree is not implemented as a class. The cross-reference table module xrt.cpp uses static data and functions to hide the tree structure from main.main accesses the table through a narrow interface of external functions declared in xrt.h and implemented in xrt.cpp.
The line numbers for each word are actually stored in a singly-linked list rooted in the tree node for that word. The line number sequences are implemented as a class called ln_seq. All the implementation details (including the fact that line number sequences are lists) are hidden inside the class. I presented two slightly different implementations of ln_seq — one using a single pointer to the first node in the list, and another with pointers to both the first and last nodes in the list. The header that declares the second version of ln_seq appears in Listing 1. The code is in Listing 2.

Basic Anatomy
The syntax for class declarations is an extension of the syntax for structs. In its simplest form, a class declaration looks like:

class class_name { member_declaration; ... };
Class members can be data just like in a struct. first and last are the data members of ln_seq. Class members can also be functions. ln_seq, add, and print are the function members of ln_seq. Class members can even be other types, such as classes, structs, and enums.listnode is a type member of ln_seq.
The notation X::m identifies member m of class X as distinct from member m of any other class. X::m is subtly different from y.m (where y is an object of class X). X::m refers to an element of a class type, but y.m refers to a member of the specific object y. For example, the declaration:

class ln_seq { ... void print(); ... };
declares (but does not define) a member function print of class ln_seq. The function definition appears later as:

void ln_seq::print( ) { ... }
If you declare an object:

ln_seq seq;
then seq.print( ) calls ln_seq::print applied to object seq. If you declare:

ln_seq *p;
then p->print( ) applies ln_seq::print to *p.

Access Specifiers
The access specifiers public and private control a client's access to class members. (A client is a program component that accesses a class from outside that class.) A client has complete access to public members — it can call public member functions, inspect public data members, and modify any non-const public data members. A client has no access to private members, except via public member functions.
An access specifier can appear before any member declaration in the class declaration:

class class_name { access_specifier: member_declaration; ... access_specifier: member_declaration; ... };
Each access specifier determines the accessibility of the members that follow it, until the next specifier. In Listing 1, the public specifier applies to the functions ln_seq, add, and print. The private specifier applies to struct listnode and data members first and last.
Access specifiers are optional. By default, class members are private. Thus, you can move the private members above the public members and omit the private specifier from the declaration of ln_seq, as shown in Listing 3. The choice between the class layouts in Listing 1 and Listing 3 is a matter of personal style. I prefer using explicit access specifiers and placing the public members first, as in Listing 1. (It's a habit I acquired from Lippman[1].)
Unfortunately, I've had difficulty formatting my classes consistently because of minor glitches in the current crop of C++ compilers. I've found that occasional mysterious compilation errors disappear when I juggle the declaration order of class members.
The default access specifier for a struct is public. Thus:
struct X
   {
   T a;
   };
is in most ways equivalent to:
class X
   {
public:
   T a;
   };
Constructors
A constructor is a member function that initializes an object. A constructor has the same name as its class. Although it has no return type, it may have parameters. ln_seq has a constructor with no arguments that sets the first and last pointers to null (see Listing 2) .
A constructor is called whenever an auto (or register) object is declared. For example:
int foo()
   {
   ln_seq seq;
   ...
calls the constructor to initialize the data members of auto object seq. The new operator also calls a constructor when it creates an object. (In C++, the dynamic memory area managed by the new and delete operators is called the "free store". It is often the same memory area as the heap used by malloc and free.) For example:
ln_seq *p;
// ...
p = new ln_seq;
allocates storage to hold an object of class ln_seq, and applies the constructor to that storage before copying the object's address to p.
No constructor is invoked when creating an object using malloc. For example:
p = (ln_seq *)malloc(sizeof(ln_seq));
doesn't call a constructor, and leaves *p uninitialized. Thus, as you migrate from C to C++, you should replace calls to malloc (and its cousins calloc and realloc) with calls to new, which invoke a constructor.
If you don't declare constructors for class X, C++ creates X::X( ), a default constructor with no arguments. The generated default constructor for an ordinary C struct, such as listnode in Listing 1, is just an empty function that does nothing. If the struct has members with non-empty constructors, such as struct treenode in Listing 4, the generated default constructor invokes the members' constructors.
At a glance, a treenode of the binary tree looks like an ordinary C struct — it has neither function members nor access specifiers. However, treenode has a member:
ln_seq lines;
with a non-empty constructor. Thus, the generated constructor for treenode applies ln_seq's constructor to initialize lines. treenode's constructor is invoked by:
t = new treenode;
in addtree (see Listing 4) .
A class can have more than one constructor. For example, Listing 5 adds a second constructor ln_seq(unsigned) to the declaration of class ln_seq. This additional constructor initializes a line number sequence with a first line number. A declaration such as:
ln_seq seq(n);
initializes seq to contain line number n. It offers a compact and efficient alternative to using:
ln_seq seq;
...
seq.add (n);
Listing 6 shows an implementation for ln_seq::ln_seq (unsigned).
Using this constructor in the cross-reference generator requires further changes. All of the ln_seq objects in the program are data members of treenode. The generated default constructor for treenode invokes the ln_seq default constructor. To use the alternate ln_seq constructor, you must create another treenode constructor that invokes ln_seq(unsigned) instead of ln_seq( ). You must also modify addtree to use this new treenode constructor. These changes appear in Listing 7.
I changed treenode from a struct to a class. All the members of treenode are still public, so I could have left the keyword struct in the declaration for treenode. However, I used the keyword class to emphasize that treenode is no longer just a collection of data.
The treenode constructor is defined as:
treenode::treenode(unsigned n)
   : lines(n) { }
The construct lines(n) between the : and the { is an initializer. It initializes treenode:: lines using the ln_seq(unsigned) constructor. Without this initializer, lines will be initialized with the default constructor ln_seq( ).
Initializers can initialize class members that have primitive types. For example, you can embellish the treenode constructor to initialize the left and right pointers using either:
treenode::treenode(unsigned n)
   : lines(n)
   {
   left = right = 0;
   }
or:
treenode::treenode(unsigned n)
   : lines(n), left(0), right(0)
   { }
I don't believe that one notation has an advantage over the other for members with primitive types.
Using the explicit treenode constructor, I replaced the lines:
t = new treenode;
...
t->lines.add(n);
in addtree with:
t = new treenode(n);
I also replaced the last lingering call to malloc with a call to the new operator. The expression new char[n] returns a pointer to an array of n chars allocated from the free store.

Destructors
The cross-reference program allocates memory from the free store to build the table, but it never releases that memory. This program can afford to be cavalier with the free store because it builds just one table that exists for the duration of the program. The program terminates immediately after printing the table, effectively discarding the table. If the cross-referencer were part of a larger application with other uses for the free store, then the table's storage should be recycled when it is no longer needed.
The destructor for a class is a special member function that destroys the value of an object of that class. That is, a destructor typically releases resources, such as memory or files, used by an object. The destructor is called implicitly just before the object itself is destroyed. The destructor for class X is denoted X::~X(). A class can have only one destructor. That destructor cannot have any arguments.
Listing 8 shows an implementation of In_seq::~ln_seq( ) that frees the storage used by a line number sequence. The destructor uses the C++ operator delete, instead of the Standard C function free, to return the storage to the free store.
The destructor is called implicitly when an object goes out of scope. For example, in:
void foo()
   {
   ln_seq seq;
   ...
   seq.add(n);
   ...
   seq.print();
   }
the scope of seq extends to the end of foo. The compiler generates a call to ~ln_seq just prior to the return from foo. At runtime, the destructor frees the listnodes referenced by seq, and the function return discards the storage for seq itself.
Just as the new operator invokes a constructor when allocating an object, the delete operator invokes the destructor when freeing that object. delete calls the destructor just before it returns storage to the free store. For example, if you declare p as:
ln_seq *p = new ln_seq;
then:
delete p;
applies ~ln_seq to *p and then frees the storage for *p.
If you don't define a destructor for a class with members that have destructors, the compiler generates a destructor for you. For example, now that class ln_seq has a destructor, treenode (Listing 7) has an implicit destructor that applies ~ln_seq to lines.
treenode's pointer members, word, left, and right, do not have destructors. However, they reference storage that should be deleted when a treenode is destroyed. Therefore, the generated destructor won't discard an entire subtree. If you want to discard a subtree, you must write an explicit destructor that deletes the storage referenced by the pointer members.
Listing 9 shows a new implementation of xrt.cpp, the cross-reference table. class treenode has an explicit destructor that uses the delete operator to destroy objects referenced by the pointer members. The line number sequence lines is destroyed by an implicit call to ~ln_seq.
The member:
char *word;
references an object with a primitive type. Primitive types don't have destructors, so:
delete word;
merely returns storage to the free store. However, the members:
treenode *left, *right;
reference other treenodes. Because treenode has a destructor (you're looking at it), calling:
delete left;
applies the destructor to *left recursively before freeing the storage for *p. The recursion appears infinite, but in fact it stops on null pointers. The preceding delete statement behaves as if it were:
if (left != 0)
   delete left;
Now that treenode has a destructor that destroys a subtree, all you need is:
delete root;
to discard the entire cross-reference table. root is a static data object hidden in xrt.cpp and main (in xr.cpp) cannot access root directly. Thus xrt.cpp has a new external function xrt_destroy that deletes root. You must add xrt_destroy to xrt.h (see Listing 10) and call it from main after printing the table (see Listing 11) .

More Good Things To Come
The cross-reference table is looking more and more like a class. The interface functions in xrt.h will be public member functions of class xrt — xrt_add, xrt_print, and xrt_destroy will become xrt::add, xrt::print and xrt::~xrt, respectively. The static data object root in xrt.cpp will be a private data member xrt::root, initialized to null by the constructor xrt::xrt. I'll transform the cross-reference table into a class in my next column.

A Change In Style
In some of the listings in my last column, I inadvertantly used both 0 and NULL to represent the null pointer constant. I meant to always use 0. The listings in this article use 0 consistently.
In days gone by, I was fairly adamant about using NULL to represent the null pointer constant. I was distressed to find that nearly all C++ literature uses 0 instead of NULL. For some C++ implementations, using NULL creates some problems.
According to the C standard, a null pointer constant is either an integral constant expression with value 0, or such an expression cast to void*. Thus some C implementations, such as Microsoft C 6.0, define NULL as:

#define NULL ((void *)0)
Many C++ implementations use AT&T's cfront translator to compile C++ into C. These use a separate C compiler as a back-end to compile the C into object code. For example, Glockenspiel C++ 2.0 for MS-DOS and OS/2 is a cfront translator that uses Microsoft C 6.0 as its back-end.
Glockenspiel C++ uses the standard C headers supplied by Microsoft C. Thus, Glockenspiel C++ compiles:

char *p = NULL;
as:

char *p = ((void *)0);
Although assignment of a void* to a char* is valid in C, it is not valid in C++. (See my column on "Paving the Migration Path", CUJ, January 1991.) If you insist on using NULL, you must write the initializer as:

char *p = (char *)NULL;
Because the integer constant 0 is compatible with any pointer type, you might as well write the initializer as:

char *p = 0;
In fact, there's no real advantage to using NULL instead of 0 in either C++ or Standard C. NULL is just an artifact of traditional C. Before C had function prototypes, programmers used NULL to ensure that null pointer arguments were the correct size.
With the advent of prototypes, the only remaining use for NULL is to provide the correct size for a null pointer passed as an unchecked argument to a function with a variable-length argument list. For example, in:

printf("null looks like %p\n", NULL);
NULL is an unchecked argument (corresponding to the ... in the function prototype). Using 0 instead of NULL might pass the wrong sized object. I now prefer to write the call as:

printf("null looks like %p\n", (void *)0);
The current draft of the C++ standard (based on [2]) doesn't include (void *)0 as a null pointer constant, and makes no statement regarding the symbol NULL. I hope that, for compatibility with C, the eventual C++ standard stipulates that NULL must be defined so that:
char *p = NULL;
is valid.

References
[1] Lippman, Stanley B., C++ Primer. Addison-Wesley, Reading, MA, 1989.
[2] Ellis, Margaret A. and Stroustrup, Bjarne, The Annotated C++ Reference Manual. Addison-Wesley, Reading, MA, 1990.