December 1993/Stepping Up To C++

Columns

Stepping Up To C++

Virtual Functions

Dan Saks

Dan Saks is the founder and principal of Saks & Associates, which offers consulting and training in C++ and C. He is secretary of the ANSI and ISO C++ committees. Dan is coauthor of C++ Programming Guidelines, and codeveloper of the Plum Hall Validation Suite for C++ (both with Thomas Plum). You can reach him at 393 Leander Dr., Springfield OH, 45504-4906, by phone at (513)324-3601, or electronically at dsaks@wittenberg.edu.
Object-oriented programming is a collection of programming techniques:

data abstraction

inheritance

polymorphism
C++ is an object-oriented language in the sense that it provides specific language features that support each of these techniques. (However, it's not a pure object-oriented language in that it doesn't insist that you use any of these techniques.)
C++ supports data abstraction (also called encapsulation) by providing classes with public and private access specifiers. I first introduced classes and access specifiers way back in "Writing Your First Class," CUJ, March, 1991, and I've been using them ever since. C++ supports inheritance through class derivation, which I introduced in my column starting with "Inheritance, Part 1," CUJ, March, 1993. C++ supports polymorphism by providing virtual functions, which is my topic for this month.
Just as you can't use inheritance unless you're already using classes, you can't use polymorphism unless you already understand inheritance. Thus I'll start with a brief review of inheritance and type hierarchies before introducing new terminology.

Type Hierarchies
Data abstraction is a technique for grouping into a single program unit all the data that represents a single variable of an abstract type along with all the functions that can operate on that data. In C++, that unit is called a class. The data and functions declared inside the class are called its members.
A class controls access to its members by using access specifiers (the keywords public and private). The public members in a class define the class interface, that is, the set of outward functional characteristics that the class provides to the rest of the program. The private members specify implementation details that the class chooses to hide from the rest of the program.
Inheritance creates a new class (the derived class) from some existing class (the base class). The derived class inherits all the members (both data and functions) of the base class. The derived class may add more members, but it cannot take any away. However, a derived class can override the definition of an inherited member function with a new definition.
Inheritance can also be public or private; public is the more useful form. Public inheritance defines an Is-A relationship between a derived class and its base class. That is, an object of a (publicly) derived class is an object of its base class (and probably more). A function that expects an object (or a pointer or reference to an object) of some base class B will readily accept an object (or a pointer or reference to an object) of any class publicly derived from B.
You can use inheritance to define hierachies of related classes. That is, you can derive several classes from a single base class, and then derive additional classes from each of those derived classes. Class hierarchies let you propogate a host of common attributes from a base class to many derived classes, and provide a range of behaviorial variations with a minimum of effort.
For example, you can design a class hierarchy to implement a device-independent file system, such as the ones in Unix or MS-DOS. In these systems, each device type — disk, tape, serial port, parallel port — usually has a distinct low-level I/O protocol for sending (flushing) data to the device, receiving (filling) data from the device, or signalling I/0 status. But the file system provides a common set of high-level operations, like read and write, so that all these different device types look essentially alike to application programmers.
At some low level, programs reading from or writing to a device must behave differently according to the true device type. But most of the high-level algorithms and data structures, such as those for buffering and unbuffering data, are the same for all devices. Furthermore, random-access devices like disks and tapes share some characteristics with each other, while sequential devices like ports share a slightly different set of common characteristics.
You can model a device-independent file system with a multi-level class hierarchy as outlined in Listing 1 and graphically represented in Figure 1. The root of the hierarchy is the base class, file, which declares functions, like read, write and (test for) eof, that are common to all files. file also defines the data structures for buffering that every derived class needs.
Class random_file derives from file and overrides the inherited functions with revised definitions that work properly for random access. In many cases, an overriding function will be very simple because it can call the function it overrides to do most of the work, as in
int random_file::read(char *buf, size_t len)
   {
   // do something specific for random access
   // call file::read(buf, len)
   // do something else specific for random access
   }
Class random_file defines additional members unique to random access devices. In particular, it provides additional operations like seek and tell, along with a nested type pos_t for representing a file position indicator. Classes disk_file and tape_file derive from random_file, overriding and augmenting inherited characteristics to match the needs of each device exactly.
On the other side of the hierarchy, class sequential_file derives from file, overriding inherited functions as needed and defining additional members to support sequential devices. Classes parallel_port and serial_port derive from sequential_file, and in turn, add the appropriate embellishments.
Type hierarchies like these arise in many other libraries and applications. Of particular interest these days are user interface libraries containing all sorts of graphical objects that share common properties. For example, an interface library might support different kinds of menus: drop-down menus, pop-up menus, and menu bars. All of these different menu types can be classes derived from a single menu base class.

Polymorphism
Inheritance simplifies the task of implementing families of related class types. But by itself, inheritance doesn't necessarily make life any easier for users of these types.
Consider the file system hierarchy. As a user, I want to be able to write a function that operates on a file and to pass it an object of any file type. That is, I want to be able to pass a disk_file, a tape_file, or any other kind of file to the function and have it work correctly in all cases. Furthermore, I don't want to add code to my function to ask the file "What are you, really?" before it reads or writes. The function should simply apply read or write to its file parameter, with the expectation that the call will do the right thing for whatever type of file that parameter is.
That's what I want. But that's not quite what I get.
Using the file hierarchy from Listing 1, I can indeed create a disk_file object and pass it to a function that expects a file (or a file & or file *). I can pass a disk_file object as a file object because a disk_file Is-A file. Unfortunately, when the disk_file object arrives in the function, the function only knows it has base class file object. It can't tell what kind of file it really has.
Consider the example outlined in Listing 2. The filter function accepts two parameters, fin and fout, that are both of type file &. main calls filter, passing disk_files din and dout as the actual arguments. I would like the call fin.read(buf, BUFSIZ) inside filter to invoke the appropriate function for reading a disk_file, namely disk_file::read. Unfortunately, as the hierarchy is currently written, fin.read always calls file::read regardless of the file's type. The same problem occurs with the calls to fin.eof and fout. write.
What we have here is a failure to communicate. Communication breaks down because C++ uses static binding by default. That is, when translating a call like r.f, where r is a reference, C++ normally selects the member function based on the static type of r. If r is declared in the current scope with type B &, then the call r.f invokes B::f. Even if r is actually bound to a D object, where D is derived from B and overrides B::f, calling r.f still calls B::f, applying it to the object referenced by r. Similarly, when translating a call like p->f where p is a pointer, C++ selects the member function based on the static type of *p.
With static binding, if you want an object to exhibit the derived class's behavior, you have to refer to the object with an object expression of the derived type. (The object expression is the expression in a member function call that appears to the left of the dot or arrow.) So, to work properly, the filter function in Listing 2 must somehow determine that fin refers to a disk_file, and then cast fin to a disk_file & before calling read, as in
n = ((disk_file &)fin).read(buf, BUFSIZ);
Such conversions are inconvenient and generally unsafe.
I won't get the kind of behavior I want unless the file system hierarchy uses dynamic binding. That is, each file object should remember what type it really is (its dynamic type) so that member function calls applied to that object invoke the overriding functions for the object's true type. For example, with dynamic binding, the call to fin.read in the filter function of Listing 2 would call disk_file::read whenever fin is actually bound to a diskfile. It would call tape_file::read whenever fin really refers to a tape_file object. And so on.
This combination of inheritance and dynamic binding is called polymorphism (from the Greek for "manyformism"). Polymorphism is a programming technique which lets you pass an object of a derived class type (the dynamic type) to functions that know the object only by its base class type (the static type). Yet, the object retains its dynamic type so that member function calls applied to that object invoke the derived class behavior. With polymorphism, the calling function need not ask for the object's dynamic type. In effect, the the caller just says "Whatever you are, do this operation in whatever way is right for you."
Calls to dynamically-bound functions are typically a little slower than calls to statically-bound functions. Since C++ tries to avoid making you pay for a feature unless you use it, you can't get dynamic binding unless you ask for it. You ask for dynamic binding one function at a time, by adding the keyword virtual to the beginning of the function declaration. Only member functions can be virtual. A class with at least one virtual function is called a polymorphic class.

A Hierarchy of Shapes
I prefer to demonstrate virtual functions with a complete working example. Unfortunately, a file system hierarchy is a bit too large to present in its entirety. So I'll use a simpler example involving a collection of shapes, like circles, rectangles and triangles, all derived from a common base class.
A graphical application might employ a basic set of shapes for composing complex diagrams. For example, CASE (computer-aided software engineering) tools use different shapes to represent classes, objects, and processes. A drafting tool might use shapes as the basic building blocks for representing complex 3-D entities.
Of course, my shapes are much simpler. Listing 3 shows the class definition for the base class shape. The public member functions define the basic operations common to all shapes. The class provides a constructor, shape(palette) that creates a shape with a given color, along with the following member functions:
color — return the color of a shape
name — return the name of a shape
area — return the area of a shape
put(ostream &) — display a shape on a stream
All of the member functions except the constructor are virtual. The compiler will use dynamic binding when calling any of these virtual functions.
Constructors cannot be virtual. Virtual functions make an object act according to the type it was given when it was created. In a sense, the constructor "gives" an object its type. You can't ask an object to construct itself in a way appropriate to its type because, prior to the constructor call, the object doesn't even exist. The constructor for a polymorphic object (of a class with at least one virtual function) puts something in the object to describe its dynamic type. Subsequent virtual function calls applied to that object use that something to route the calls to the right place.
Destructors can be virtual. Each object in the hierarchy may have different resource requirements, and thus different procedures for discarding those resources. Using virtual destructors, you can delete an object via a base class pointer and rest easy that the appropriate destructor will be called.
Shapes come in a variety of decorator colors. The nested type palette defines the available colors. By nesting the type inside the class, the names of the type and the enumeration constants are all in the scope of the class, and not global. This eliminates the possibility of global name conflicts that might occur if another library defines a different color scheme but uses the same names.
You specify a shape's color as an argument to the shape constructor. The constructor stores that color into the data member _color, as shown in Listing 4. The notation : _color(c) is a member initializer that initializes member _color with value c. The data member _color is private; non-members can inspect its value by calling the public member function color.
The area member function returns the area of a shape. Derived classes like circle have some dimension, like radius. But the base class shape has no dimension, so its area is just zero. The name member function returns a null-terminated string representing a shape's name. A shape with zero area has the name "point."
shape::put(ostream &os) writes a shape's attributes to ostream os in text form. The static member color_image is an array of strings that are printable spellings for the colors. color_image contains one string for each selection from palette. For example, color_image[GREEN] selects the string "green." (The declaration for color_image appears in the class definition in Listing 3. The definition for color_image appears in Listing 4. ) Thus, shape::put displays a shape's color and its name.
So much for the base class. Listing 5 shows the class definition for circle, derived from shape. circle redeclares the functions area, name, and put, each with the same signature and return type that it has in the base class. Thus, these functions override the inherited definitions.
Even though the keyword virtual does not appear in circle's declarations for area, name, and put, these functions are virtual nonetheless. A function in a derived class that overrides a virtual function in a base class is itself a virtual function. Adding virtual to the overriding function's declaration in the derived class doesn't hurt, but isn't necessary.
Listing 5 also contains circle's member function definitions. The constructor creates a circle with a specified color and radius. The constructor initializer list
: shape(c), radius(r)
passes c to thenstructor when it initializes the base class part of a circle, and then initializes the radius with r.
circle::area computes a circle's area using the familiar pr2 formula. circle::name simply returns the string "circle."
circle::put does something very interesting. It calls the put function from the base class using the explicitly-qualified name shape::put. Even though shape::put is virtual, this call with explicit qualifier turns off the dynamic binding. A call to shape::put puts the color and name of the shape, and returns to circle::put, which displays the radius as well. The output looks something like:
blue circle with radius 3
Dynamic Binding in Action
Consider the consequences if the circle::put called shape::put(os) using dynamic binding. Remember, inside circle::put, the compiler translates the call shape::put(os) into this->shape::put(os). That is, the call applies shape::put to the object addressed by this. If the call binds dynamically, then it calls the put function for the dynamic type of this, which in this case is circle. Thus, if this call were dynamically bound, it would call itself recursively until the call stack overflowed.
Notice that circle does not override the inherited color function. The color of a circle is the value stored in its inherited _color data member. The rule for inspecting _color is the same for all shapes, so derived classes need not override it. But if derived classes never have a reason to override the color function, then it need not be virtual. Just because some functions in a base class are virtual doesn't mean all of them must be.
For good measure, Listing 6 shows a rectangle class derived from the shape class. Each rectangle stores its height and width (along with its inherited color). The overriding member functions fall in line accordingly.
Listing 7 illustrates the real power of dynamic binding using virtual functions. The function largest selects the shape with the largest area from a collection of shapes. In this example, the collection is simply an array, sa, of pointers to (constant) shapes. Thus, sa[i] is the ith shape in the collection, and sa[i]->area returns the area of the ith shape. Since area is a virtual function, largest never needs to know the dynamic type of a shape. Dynamic binding automatically selects the appropriate area function for each shape.
Even better, largest works not only for collections of circles and rectangles, but for collections of any shapes derived from the base class shape. If you derive triangle from shape, and add a triangle to the collection, largest can handle it. If you put largest in a separate source file, you don't even have to recompile largest each time you add a new shape to the hierarchy. largest only depends on the definition for the base class, which doesn't change. So the derived classes for circle, rectangle, triangle, and so on, need not be in scope when compiling largest.
Collections of polymorphic objects are a staple in object-oriented applications. For example, in a windowing environment, the desktop can simply be a collection of window objects ("views"). These objects might be message boxes, dialog boxes, or edit windows. Closing a message box might simply erase the box, while closing an edit window might close a file before erasing the box. But an operation like close_all (clear the desktop) need not concern itself with these subtleties. It simply visits each desktop object and invokes its virtual close function.
main in Listing 7 writes shapes to cout using the << operator. For example, the statement
cout << *ls << ".\n";
puts the shape addressed by ls to cout. The definition for the overloaded operator<< appears in Listing 8. It's trivial, but dynamic binding makes it quite powerful.
The call s.put(os) in Listing 8 uses dynamic binding. That is, calling os << s to write shape s to ostream os simply calls the put function defined in the class of the dynamic type of s. As with largest, operator<< never asks what that dynamic type is. It justs makes the call, and the virtual call mechanism does the rest. Thus, you don't need to define operator<< for circle, rectangle, or any other derived type. This one operator works with any type derived from shape.

The File Hierarchy, Again
Adding the right virtual functions to the base class file should give my File objects the behavior I want. As with the shape hierarchy, not all the base class functions need to be virtual — only the ones that I override in derived classes.
There are more refinements I can make to these hierarchies that require additional languages features. Stay tuned.