Kenneth Van Camp is a Vice President at Bear Stearns Securities Corporation. He has been programming in C++ for about four years, and in C for about twelve years. His applications experience includes securities, databases, graphics, product planning, contact management, engineering simulations, real-time control, and robotics. He may be reached via e-mail at camp@bear.com.
Introduction
Of all the techniques offered by object-oriented programming (OOP) languages, inheritance is probably the most powerful. And of all the OOP languages, C++ is generally regarded as the most robust and flexible. Yet despite all its flexibility, C++ still lacks some basic features found in other OOP languages. One of these is dynamic inheritance.Most C++ programmers are familiar with the technique of multiple inheritance. There are times, however, when the multiple inheritance approach breaks down. A class may need to inherit properties from several different base classes in different combinations, resulting in an unwieldy number of derived classes.
Consider a common business problem, in which a table of data may require several different representations. Table 1 shows a typical spreadsheet interface to display a stock market portfolio.
Several other views of this data may be required. For instance, the user may need to see a summary of the portfolio by type (stocks, bonds, futures, etc.). It may be necessary to show data as a percentage of the total, or normalized relative to a baseline data point. The application may need to show aggregates by week, month or year, averages, percentages of the total, or the results of other formulas specific to the data being presented.
All of these represent different views of the same data, and their display can be thought of as a specialization of the basic data display. This is an obvious application of inheritance, but the number of possible combinations is large. The program may need to show monthly aggregates in either standard or normalized form. Type summaries may be required by the percentage of the total. The list goes on and on.
For example, Figure 1 shows a multiple-inheritance approach to one of these views using the Booch notation [1]. The arrows joining the different class "clouds" show the inheritance relationships between them (e.g., TableDataAverage is inherited from the TableData and Average classes).
In this diagram, TableData is the basic class to hold a table of data. By deriving TableDataAverage from both TableData and an Average class, the user of this class could show an average of the table of data. Likewise, by deriving TableDataNormal from both TableData and a Normalize class, a normalized view of the data might be created. And by deriving a class from all three of these base classes, an average of the normalized data (TableDataNormalAvg) could be created.
Now consider what would happen if you added a fourth base class to this hierarchy say a percentage mode class. Here is a list of all the new classes you would have to define to allow the user to access the possible views:
Percent TableDataPercent TableDataPercentAvg TableDataPercentNormal TableDataPercentNormalAvgIn fact, there may be other classes if the attributes are order-dependent. For instance, is the average of the percents the same as the percents of the averages?By now you have seen the problem. As new attributes are added, the number of classes in the hierarchy grows quickly. In fact, the growth is exponential. The inheritance tree will become a tangled web, and maintenance will be a nightmare.
Dynamic Inheritance
What the program needs is a class whose full inheritance is established at run time. Although C++ does not directly support dynamic inheritance, it can be simulated with a technique known as the envelope-letter idiom. Using this approach, each instance of a class maintains a pointer to another object of the same (or similar) type. In this way, multiple objects can be connected to form a "chain." Each object specializes, or filters, the data it receives from the previous object in the chain. Since links in the chain can be dynamically assigned at run time, and even modified on the fly, the effect is similar to that of dynamic inheritance. (For more details, see the sidebar, "The Envelope-Letter Class Idiom.")The approach I have described is analagous to the "filter" concept that was championed by the designers of the UNIX operating system (and later emulated in MS-DOS and others). In UNIX, a user can "pipe" the output of one command to the input of another command. Generic filter commands (like cut, paste, sort, etc.) can be linked up to form complex commands whose inner workings are invisible to the user.
Consider the class hierarchy diagram shown in Figure 2. I will use this class hierarchy as an example, throughout the rest of the article, to illustrate a useful application of filter classes. The example will allow a table of data values to be displayed in four different display modes: the original data, the average of the original data, the normal of the original data, and the average of the normal of the original data. Adding a new option (percentage mode, for example) would only add one additional class to the hierarchy, yet would create four additional display modes (the same as the original four, but shown in percentage mode).
In Figure 2, the line circling from TableData back to itself, with a circle on one end and a box on the other, shows that each TableData "uses" another instance of itself. This shows the envelope/letter relationship. It is the link between objects in the chain. The 'A' in the triangle denotes that TableData is an abstract base class.
Notice that all of these filter classes are descended from a common base class, TableData. This is important, because the link that joins objects in the chain is a pointer to an object of the base class type. Polymorphism in C++ will be used to select the proper filtering methods to invoke for each object.
Next consider Figure 3, which shows one possible way to arrange filters in a chain. This is an object scenario diagram (again using the Booch notation), and shows how the objects may be linked at run time (the "uses" relationship from Figure 2) . In this case an instance of the TableDataMarketBuf class (called aTableDataMarketBuf) holds the original data to be presented. It is used by an instance of the TableDataNormalize class, which in turn is used by an instance of the TableDataAverage class.
The 'F' in the box on aTableDataMarketBuf merely notes that this object is a field in aTableDataNormalize, and the arrow pointing toward this object shows that aTableDataNormalize invokes methods from aTableDataMarketBuf. It does not indicate the direction of data flow. In fact, if anything data tends to flow more heavily in the direction against the arrow. This may be more clear if you think in terms of the pipe analogy made earlier:
aTableDataMarketBuf | aTableDataNormalize | aTableDataAverageBy accessing methods at run time from aTableDataAverage, the user would see the average of all the normalized values. A second view of the data could be gained simply by accessing methods from aTableDataNormalize. The result would be the normalized values without averaging. Yet a third view of the data could be presented by omitting the middle object in the chain (aTableDataNormalize), which would show the average of the original values (unnormalized).
Implementation
Listing 1 shows the base class, TableData, from which all others are derived. Its lone constructor accepts a pointer to the previous TableData object in the chain as its argument. Referring back to Figure 3, when instantiating a TableDataNormalize object, this pointer would reference the TableDataMarketBuf object. When instantiating TableDataAverage, it would reference the TabledataNormalize object.The methods in the TableData class are all quite simple. Any requests for information are simply passed up the chain to the previous object. In this sense, TableData is a "do-nothing" filter that never alters any data. Its main purpose is to maintain PrevTD, the pointer to the letter object in the envelope.
Notice that all of TableData's methods are virtual. This is very important for dynamic inheritance to work. Polymorphic behavior will cause the proper virtual method to be invoked for each object.
In my implementation, TableData is actually a template class. This makes TableData a generic container class that can hold any type of tabular data, but it is not important for the filtering behavior we have been discussing.
To make TableData a useful class, you must add some processing in a derived class. Listing 2 shows the TableDataNormalize class, which overrides the GetCell and PutCell methods. Instead of merely passing values on from the previous filter in the chain, TableDataNormalize::GetCell performs some calculations to normalize each data value (i.e., divide each cell by the value in the first row).
In order to write TableDataNormalize, it was necessary to make some assumptions about the cellType class that the template is parameterizing. Since the normalizing process requires calculations, the cellType class must contain some type of numeric data. For that reason, TableDataNormalize assumes that the cellType class has GetValue and SetValue methods to operate on its numeric data.
Listing 3 shows the TableDataAverage class. This filter differs from both the TableData and TableDataNormalize classes in that it also changes the size of the table. For the previous two classes, the GetNumRows method always returned the value of the previous filter. The TableDataAverage class, however, collapses an entire table into a single row. The GetCell method, which is only valid for row zero, returns the average of all cells in the specified column.
Data Buffers
None of the classes presented so far actually contain any data they merely filter the data from a previous filter in the chain. In order to be useful, however, it is necessary to put some real data in the chain. This is where the TableDataBuffer class comes in (Listing 4) . In addition to accepting a pointer to the previous filter in the chain, the constructor accepts a count of the number of rows and columns in the table. Then it allocates a buffer large enough to hold all the cells in the table. The RefCell method and the three validation methods provide a convenient interface to the buffer. The remaining methods are for compatibility with the base class.In order to load the table with real data, it is necessary to derive an application-specific class from TableDataBuffer. Listing 5 shows the TableDataMarketBuf class, which is an example of one such class. In a real application, this class would probably contain a reference to a database interface object. For this simple example, however, I have merely inserted some dummy data in the buffer. In addition to containing the main buffer inherited from TableDataBuffer, the TableDataMarketBuf class also contains two arrays to hold the column and row headings.
The second class in Listing 5, ValAttrCell, is an example class to be used as the cellType template parameter in any of the TableData classes. One instance of this class will be created by TableDataBuffer for each cell in the table, so this class should be relatively simple and efficient. Note the GetValue and SetValue methods as previously noted these are required for compatibility with the TableDataNormalize and TableDataAverage classes.
The ValAttrCell class holds two pieces of data for each cell: a floating-point value and an enumerated attribute that can be used to hold cell-specific display options. It may make sense to add other data to this class, for instance a date field or other fields to hold application-specific data.
Finally, Listing 6 shows a simple demonstration of the use of these classes. TableDataMarketBuf must be instantiated first. This will be the first object in the chain of filters (which you can tell because its PrevTD pointer is null). It's important for the beginning link in the chain to be a buffer class (i.e., descended from TableDataBuffer). If a non-buffer object started the chain, there would be no data to filter. (This is the reason why many methods in the non-buffer classes contain the assertion PrevTD != 0.)
Next, the TableDataNormalize class is instantiated, with its PrevTD referencing the TableDataMarketBuf object. Simply by instantiating these two classes, we have created two different views of the data. By using methods from the buffer object, we see the original data. By using methods from the normalizing class we see the normalized view.
This example program also demonstrates the TableDataAverage class, by creating two additional objects: one pointing to the original data buffer (the average of the original data) and another pointing to the normalized view (the average of the normalized values). Finally, the example program prints the two sets of data. The original table of data is printed, followed immediately by the average of all the rows. Then the normalized table is printed, followed by its average. The output is shown in Listing 7.
Other Applications
The techniques shown in this article are not restricted to displaying tables of data. Any data-centered application that requires multiple views of the data can use this approach. It is particularly useful when an application must show multiple views simultaneously, because a single data buffer can be routed through multiple display filters to present two or more simultaneous views. Even if your application shows only one view at a time, however, the application of filter classes can greatly enhance the flexibility of your programs.A word processor, for instance, might use the filter approach to separate character data into words, phrases, sentences, pages, paragraphs, or chapters. Such filters might be used for something as complex as a language-recognition engine, or as simple as a continuously updated word count.
In some applications, it may be necessary to extend some of the concepts presented in this article. For instance, there is nothing that says each filter must have a pointer to only a single "previous object" in the chain. A concatenation class might reference multiple objects to join two or more tables together. This could have been used effectively in the example presented, where an average row was displayed after the full table of data.
My simple example makes no allowance for dynamically relinking objects after their instantiation, although this is easily done. The programmer must just make sure that any buffers are refreshed from the underlying filters.
It may also make sense to make more of the objects be buffer-holding objects. The TableDataAverage class, for instance, could be modified to calculate the average when it is first instantiated and hold it in a buffer, instead of always recalculating the average when the value is requested for an individual cell. A tradeoff exists here in terms of space versus time, and the correct mix depends upon the application requirements. Maintaining multiple buffers may also make it more difficult to show multiple views simultaneously, unless a method is implemented to force changes to be refreshed in the different views.
Maintaining multiple buffers in the chain can be especially useful in database applications. The first buffer can represent the original state of the database when the data was loaded and a second buffer can hold user updates. To commit the user's changes to the database, you can write a method to compare the first buffer to the second buffer and only update the values that changed.
Filter classes are not appropriate for all applications, but they are a useful addition to any programmer's repertoire. You might want to think about using them when you design your next program.
References
1. Booch, Grady, Object Oriented Design With Applications, 2nd ed. (Benjamin/Cummings, 1993).2. Coplien, James O., Advanced C++ Programming Styles and Idioms (Addison-Wesley, 1992).