July 1994/Structure Mapping Techniques in C++

Features

Structure Mapping Techniques in C++

Adam Greissman

Adam Greissman is the New York I.T. Manager for Capital Markets and Treasury at Price Waterhouse. Currently he is implementing systems in C++ to perform Credit Risk Managemnt and Market Analysis. Direct queries to: (212) 596-7000. Compuserve ID 70762,2017.
Structure Mapping is a technique that is used to map the contents of C structures by member name and member data type. Structure mapping can also be a useful technique for C++ classes. As I will demonstrate, structure mapping eliminates duplication of coding effort in creating certain kinds of descendant classes. This technique enables users to create code at the base-class level that implements persistent storage, assignment, and comparison of descendant class objects. This is not the same idea as Run-Time Type Identification (RTTI). RTTI attempts to solve problems related to downcasting, when a object is held by reference to a base class. Structure Mapping solves a different problem. Consider the following code:
class Point {
   int x, y;
public:
   Point( int x, int y ) : x( x ), y( y ) {}
   friend ostream &operator << ( ostream &o, Point &point );
};
In C++ there is no reasonable way to write a generic output routine for this class without naming the class members x and y directly. For example:

ostream &operator << ( ostream &o, Point &point ) { 0 << "x = " << x << " y = " << y; return 0; }
Each time the program creates an object having this type of persistence, it must repeat the following algorithms:
1. State the member name
2. State the member value
Common techniques cannot eliminate this duplication effort, because a base class has no knowledge of descendant class members. Encapsulating a ToString and FromString algorithm in virtual functions might seem to solve the problem, because it makes these functions accessible from the base class. However, using virtual functions in this case will not decrease coding effort. For each addition or modification to Point the programmer will have to make at least three modifications to the source code. And this is for only one basic operator. What's worse, we have yet to define the copy constructor or a corresponding input operator.
Structure Mapping techniques address this problem by creating a data map of the structure at declaration time. The Structure Map is a list of:
1. The class member name
2. The class member data type
3. The offset of the class member within the class instance
This list allows access to class data members, even in the absence of the descendant class declaration. This technique enables construction of base-class methods for streams, deep copy, assignment, and comparison without having to write any code at the descendant class level. In the first part of this article, I cover the fundamentals of implementing Structure Maps. In the second part, I discuss a preprocessor technique that allows creation of both the class declaration and the map definition in single pass.

The Structure Map
The structure map contains three essential elements. The Field object declares the type and name of the class data member. The Map object is a collection of fields, which describe a class. The MappedObject owns a Map, and thus has access to certain guaranteed methods.

The Field Class
The Field class is declared in Listing 1. The concrete attributes of the class are the data member name and the offset of the data member in the instance object. The virtual attributes concern the implementation of the field data type. For example, the Field::toString function takes a destination string and the base address of the class instance that contains the field. Classes that descend from Field, for example IntField, are responsible for dereferencing the data and converting it to string form (Listing 2) .
Note that the IntField begs to be implemented using a template, since many numeric types will have a similar implementation. Since the common data types usually resolve to numeric types, strings, and dates, templates would make a very useful optimization here. I also want to draw attention to my use of a macro to dereference the data. In this case, I use a macro because DEREF operates across all data types, not just numerics. I know this sort of thing makes some programmers queasy; it might seem that I could accomplish the same trick with a parameterized function, but not in this case. C and C++ sometimes require this sort of preprocessor trick, especially when portability across environments is desired, since the behavior of the preprocessor is much more consistent than the current range of template implementations. Macros are useful when they encapsulate an idea that cannot be encapsulated any other way. In the case given above, a parameterized function would require that an instance of type be given as a function parameter. No such instance is available at the point of invocation. Therefore, I use a macro instead.

The Map Class
The Map class declares the collection of fields, and provides convenient tools for iteration through that set (Listing 3) .
Note that the map contains the name of the object being mapped. The object name will be useful when it's necessary to verify that objects of like type are being compared or assigned.

The MappedObject Class
The MappedObject class declares the methods that will be inherited by all classes having a map (Listing 4) . Now things really get interesting! All mapped objects will be streamable, comparable, assignable, and printable. Best of all, I don't need to write any code to support these behaviors in descendant classes. To understand how this is accomplished, review the code for the stream operators (Listing 5) .
The output operator outputs the object name, followed by the field contents in string form. (Using ASCII as a persistence medium has many wonderful side effects, not the least of which is that it provides the basis for command-file implementation. Other significant problems can benefit from the same code structure, such as binary transfer or SQL column binding.) The input operator verifies that the object in the stream is the object that is expected, relative to the map. (This in itself is interesting, because it suggests how collections of maps can be used to provide persistence mechanisms for streams that pass heterogeneous class types.) The operator inputs each of the fields, and converts the data from string form using fromString.
The code for this transformation is simple and elegant. More importantly, I can write this routine at the lowest baseclass level. Every class that descends from MappedObject will have this form of persistence, by virtue of its map definition alone.

Implementing a Mapped Object
In the preceding paragraphs I have discussed the basic properties of maps, and have shown how they can be used to implement powerful base-class methods. Now I can review the code required to implement a MappedObject, assuming the existence of IntField and CharField classes (Listing 6) .
The code required to implement Point is not overly burdensome. Point::fieldv does all of the work. Nonetheless, this technique requires two additional uses of each data member to support the map. (In the second portion of this article I demonstrate how to use the preprocessor to factor out the map definition.)
Function main below shows the benefit of this approach. I can stream Point, print it, and utilize the relational operators — all without having to write code to implement that behavior.
void main()
{
   char buf[ 256 ];
   ostrstream os( buf, sizeof(buf) );
   istrstream is( buf, sizeof(buf) );
   
   Point point( 3, 6, "Hello_world" );
   
   cout << point << endl;
   // Demonstrate stream output
   
   Point p2;
   
   os << point << ends;   // In one stream...
   is >> p2;              // and back out again!
   
   cout << p2 << endl;        // Dump p2 to verify
   
   p2.print();                // Pretty print p2
   point.print();             // and the original point
   if (point != p2) {         // test a logical operator
      printf( "Error in comapre\n" );
      exit( -1 );
   }
}
Running the program produces the following output:
Point 3 6 Hello_world Point 3 6
Hello_world Begin Point {
    int    x                     <3>
    int    y                     <6>
    char*  text                  <Hello_world>
} End Point;
Begin Point {
    int    x                     <3>
    int    y                     <6>
    char*  text
<Hello_world>
} End Point;
The point and p2 instance of the Point class demonstrate the streamability and useful printing behavior that it inherits by being a mapped object.

Plastic List Manipulation
Plastic List Manipulation (PLM) is a preprocessor technique which I use with the structured mapping technique just presented. PLM gives a list of expressions different syntactic form depending upon the context of their usage. (Because the meaning of the list has become context sensitive, I say that it has plasticity.) The key to PLM is that it surrounds items in a list with macro wrappers, which describe their function at a meta level. For example, the following is a PLM of the Point class from the previous section of this article:

#undef MapName #define MapName Point MapBegin( Point ) MapInt( x ) MapInt( y ) MapChr( text ) MapCpp public: MapCpp Point(); MapCpp Point( int x, int y ); MapEnd( Point )
I would typically keep this declaration in a source file called an Object Definition Table. An ODT is a special type of header file that contains the MappedObjects existing within an application. This is ugly, I admit. Not only that, but the preprocessor tricks required to turn this into the Point class declaration's header file pass and the structure map definition's source file pass are even uglier still. However, what makes PLM beautiful is that from this single declaration I can derive both the class declaration and the map definition. Advanced uses of these Object Definition Tables will leverage this meta construct even further. From this same declaration I can auto-generate functions to:

Create an object given its name

Return the map of an object given its name

Exercise SQL-based objects in a system test harness.
I can do all of this without having to write code at the application level.

PLM Mechanics
A plastic list contains at least three components:
1. A list of items in a macro wrapper
2. A header file to define the meaning of the wrappers
3. A header file to undefine the meaning of the wrappers
Typical usage in a source file would look like this:

#include "mapdef.h" // Define the wrappers #include "mylist.dec" // Include the list #include "mapclr.h" // Clear the wrappers
For example, the macro MapInt() declares a class member, which is an integer. In the declare phase, I want to express:

#define MapInt( x ) int x; // Resides in mapdec.h MapInt( myVar ) // Resides in an application header file #undef MapInt // Resides in mapclr.h
Three files have come into play. Only one of them, in this case the Object Definition Table, contains the application-specific information. In the define phase, all I need to change is the first file, the one that contains the macro wrappers:

#define MapInt( x ) IntField( #x, offsetof( MapName, x ) ), // Taken from mapdef.h

Using PLM With Structure Maps
Statements in the ODT fall into one of four categories:

MapName/MapBegin Begin Object MapInt/MapChr Object Data Members MapCpp C++ Declaration (irrelevant to the data map) MapEnd End Object
This system will allow me to map objects of arbitrary complexity. I can extend the MapInt and MapChr data-member macros, both at the library and the application level. The following is the ODT version of the Point class:

#undef MapName #define MapName Point MapBegin( Point ) MapInt( x ) MapInt( y ) MapChr( text ) MapCpp public: MapCpp Point(); MapCpp Point( int x, int y ); MapEnd( Point )
This technique is also ugly, but again, justified by its usefulness. The MapBegin macro must accomplish several goals:
1. It must undefine the previous MapName token.
2. It must define a new token for MapName.
3. The token it defines must be stringizable in MapBegin and MapEnd.
Unfortunately, points 1-3 cannot be combined into a single macro. Point 3 is more subtle than 1 and 2, but also more distressing. Considering the following:

#define MapBegin( MapName ) char Class##MapName[] = #MapName; #define MapBegin char Class##MapName[] = #MapName;
These lines resolve to:

char ClassPoint[]= "Point"; char ClassMapName[]= "MapName";
The first line works, the second line does not. Neither stringizing nor token concatenation are permitted with visible tokens, just macro arguments. Another approach that won't work is to try to define macros to begin and end C++ blocks, as opposed to the MapCPP macro, which is effective for the current line only. At least, I haven't found a way to do it. I won't describe the roadblocks to this approach here, except to say that they also involve characteristics of the C++ preprocessor.

Conclusions
This article has demonstrated how Structure Mapping can be used to factor out code for class methods involving persistence, assignment, and comparison. There are many other uses for this technique. Price Waterhouse is currently using a system similar to the one described in this article for a risk-management project at Chemical Bank in New York. On a project-wide basis, we have seen code reductions in the neighborhood of 20 to 30-1. Another leverageable aspect of the Map is the ability to refer to classes and class members by name. In the case of the class name, once a mechanism is developed for containing sets of maps, objects can be created and utilized via that name.
Plastic List Manipulation allows creation of a single list which will serve double duty. The principle drawback of the system is that the application-level information needs to be stored in a separate declaration file, apart from the normal header and source files. PLM is typically used to solve variants of the define/declare problem, which generally use a list of items at least twice, once to declare that list in a header file, and once to define it in a source file. PLM allows the programmer to eliminate that duplication, and maintain a single list.
Structure Mapping benefits from this approach, because it allows the programmer to consolidate the declaration of mapped objects into a single file. Structure Mapping and Plastic List Manipulation are not "pretty" techniques, but they are very practical. Together they can achieve code reductions that are truly staggering. Not every system will benefit from this approach, because it carries a fair amount of baggage and learning curve. However, in systems that implement complex data models, the benefits make the system worthwhile.