Reflecting Attributes and Base Classes

C/C++ Users Journal September, 2004

Giving reflected member variables user-defined names

By Arne Adams

Arne Adams is a professional programmer who works with C++, Java, and Smalltalk. He can be contacted at inbox@arneadams.com.

Reflection is the ability to access information about a program in the language the program is written in. This includes (but is not confined to) being able to answer questions such as:

Which classes are contained in the program?
For each class, which member variables does this class have?
For each class, which member functions does this class have?
What is the purpose of each class?
What is the purpose of each member variable?
What is the purpose of each member function?

Some programming languages—Smalltalk and Java, for instance—have built-in reflection. However, using reflection in these languages comes with a significant performance penalty. When member variables are reflected, it is possible to define algorithms more generally so that they are applicable to a variety of classes. The usual suspects for these algorithms are stream operators (database or file, read or write), and nontrivial assignment operators and copy constructors.

In a recent data-centric project, I needed to write parsers/printers for a proprietary, length-encoded record file format and store some of these records in a database. None of the approaches to reflection in C++ [1,2,3] was applicable because:

I needed to support an unbounded number of member variables (the bound in Boost.Tuple of no more than 10 member variables of different types was too tight).
Users of reflected classes should not have to write angle brackets to simply call get/set functions of a member variable. (Some programmers tend to have strong opinions about angle brackets in C++; I don't mind them, but implementing reflection through tuples, typelists, and the like is still an implementation detail and subject to change for its own reasons.)
It should be as easy to debug classes with reflected attributes as it is to debug hand-written classes.
Reflecting attributes should be possible for classes that inherit from user-defined classes without having to use multiple inheritance (partly because of strong opinions some programmers have on that issue).

In this article, I'll address these issues by presenting a library that reflects attributes and base classes. Among other things, this library lets the compiler generate code to read and write the attributes of a reflected class and all of its reflected base classes to files or databases. Put more generally: With the traversions in this library, it is possible to generate anything that can be achieved by applying functions to subsets of member variables. For each traversion class, the library contains a sample attribute function that shows the usage of that traversion. RecordInitializer is an attribute function that shows the usage of a flat attribute traversion, PrintRecord is an attribute function that shows the usage of a deep unary attribute traversion, and RecordIsLess is an attribute function that shows the usage of a deep binary attribute traversion.

User API

In a nutshell, I needed to be able to enumerate all (or some) of the attributes of a class and access the member variables through integral constant indices. On the other hand, member-access-related identifiers should follow the corporate standards for function names and member variables (CamelHumpsNotation, or underscore_separated_identifiers, member variables prefixed with m_, or suffixed with underscore_, ...). Mixing templates and macros can then solve both problems (although there are programmers who have strong opinions on macros as well).

Any unbounded increasing sequence of numbers can be used to enumerate a finite set of values. Thus, the output of the __LINE__ macro is perfectly suited to tackle this issue (although chances are that this macro was not exactly invented to solve this problem). Using this approach, classes with reflected attributes look like Listing 1. With these definitions, Listing 2 works without any additional coding.

makeLexicographicalRecordSort returns a function object that is a binary predicate. The arguments of makeLexicographicalRecordSort are some tag classes associated with attributes of the Address class. Among other things, a reflected attribute has a unique tag class associated with it. This tag class is an empty struct defined in the class scope where the attribute lives. The type of the member variable is the first parameter of the DEF_REFLECTED_ATTRIBUTE macro. The second parameter is the name of the tag class. Figure 1 is the output of Listing 2.

Streaming Credibility uses an appropriate mapping from CredibilityType to a string (where PaysAlwaysInTime maps to the string "fine"). The declaration of sortedClients in Listing 2 shows the use of the compiler-generated compare function (RecordIsLess) that does a lexicographical comparison on the set of all reflected attributes.

Inheritance Support

This library supports inheriting from reflected classes. The directed acyclic graph of all reflected base classes of a given class can be traversed (depth-first) at compile time with a user-supplied function object as parameter. This function object is invoked on each reflected member variable of each reflected base class. The compiler then generates empty functions for each virtual base class that was already traversed as a virtual base class in that graph.

The call to getRecordDesc in Listing 3 lets the compiler generate a traversion of each base class with PrintRecord as a user-defined function object; Figure 2 depicts the class structure of Listing 3 as a UML diagram. PrintRecord uses depth-first traversion on the graph spanned (recursively) by the set of reflected attributes and prints their values on a stream. Since the classes here do not have reflected attributes at all, only the names of the classes are displayed:

B:      X:      Y:      B:      Z:      AA:

Here, class B is contained twice as a subobject of AA: once through the virtual inheritance X and Y contribute, and again through Z (nonvirtual).
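
The shape of that hierarchy can be sketched without the library. The code below is only an illustration under stated assumptions: Figure 2 is not reproduced here, so the base order X, Y, Z for AA is assumed, and the helpers trace and constructionTrace are hypothetical names that do not come from the library. Instead of DEF_REFLECTED_BASECLASS, the sketch records constructor calls; construction order happens to visit the subobjects in the same sequence, with the shared virtual B first and the second, nonvirtual B just before Z.

```cpp
#include <string>

// Hypothetical helper: collects the names of constructed subobjects.
inline std::string& trace() {
    static std::string log;
    return log;
}

struct B { B() { trace() += "B: "; } };
struct X : virtual B { X() { trace() += "X: "; } };  // shares one virtual B
struct Y : virtual B { Y() { trace() += "Y: "; } };  // shares the same virtual B
struct Z : B { Z() { trace() += "Z: "; } };          // owns a second, nonvirtual B

// Assumption: AA lists its bases in the order X, Y, Z.
struct AA : X, Y, Z { AA() { trace() += "AA: "; } };

// Builds one AA and returns the order in which its subobjects were constructed.
inline std::string constructionTrace() {
    trace().clear();
    AA object;
    (void)object;  // only the side effects of construction matter here
    return trace();
}
```

Calling constructionTrace() lists B twice, which is the duplication the traversion has to handle: the virtual B must be visited once, the nonvirtual B once more.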

I don't know of a portable way to detect virtual inheritance of an empty base class at compile time. That is why there are two macros to reflect a base class—DEF_REFLECTED_BASECLASS and DEF_REFLECTED_VIRTUAL_BASECLASS.

Implementation

The generated getRecordDesc function displays the attributes of a member variable that contained reflected attributes (the Addresses). For this to work, you must detect whether a given class contains reflected attributes or not. Because this library does not require specific base classes, I inject the code for that through the macros. It turns out that the declaration of the conversion operator CompileTIterableObjectTag()const in Listing 4 does the job.

You may have noticed that the credibility was fine by default, in contrast to an uninitialized enum member variable, which, as the C++ Standard puts it, has an "indeterminate initial value" [5]. My credibility was not "indeterminate" because the default constructor defined in BEGIN_REFLECTION calls initReflectedMembers, which in turn iterates over all members and default initializes the plain old data members.

When the __LINE__ macro enumerates the attributes, either users of the macro have to be careful not to leave blank lines between attribute definitions, or the implementer of the library has to be careful to skip blank lines. The Gap_Num tag class serves to detect blank lines. (The library cannot support more than one attribute definition per line.)

boost::is_same is a compile-time function from the Boost libraries (http://www.boost.org/) that yields 1 if both template arguments are the same class, and 0 otherwise.

boost::ct_if is a compile-time if/else function that works even without partial specialization of class templates [4]. Since the C++ Standard forbids full specializations in class scope (although Microsoft Visual C++ 6.0-7.1 let you get away with it), all member template classes have an additional dummy parameter. Attributes are enumerated 0-based, which is why the recursion end specialization AttributeNum<0, Dummy> has -1 as its position value.
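
A compile-time if/else that avoids partial specialization can be sketched as follows. This is a minimal illustration of the general technique, not the Boost source: Select, CompileTimeIf, and IsSame are hypothetical names, and IsSame is written with partial specialization purely to keep the example short.

```cpp
// The selector is fully specialized on the condition; the branch types are
// parameters of a nested member template, so no partial specialization is needed.
template <bool Condition>
struct Select {
    template <class Then, class Else>
    struct Result { typedef Then type; };
};

template <>
struct Select<false> {
    template <class Then, class Else>
    struct Result { typedef Else type; };
};

// Hypothetical wrapper that gives the construct an if/else shape.
template <bool Condition, class Then, class Else>
struct CompileTimeIf {
    typedef typename Select<Condition>::template Result<Then, Else>::type type;
};

// Stand-in for a type-equality test, yielding 1 or 0.
template <class A, class B> struct IsSame { enum { value = 0 }; };
template <class A> struct IsSame<A, A> { enum { value = 1 }; };

struct Gap_Num {};    // tag for a line without an attribute
struct StreetTag {};  // tag for a line with an attribute

// Pick a type depending on whether a tag marks a gap.
typedef CompileTimeIf<IsSame<Gap_Num, Gap_Num>::value != 0, int, long>::type ForGap;
typedef CompileTimeIf<IsSame<StreetTag, Gap_Num>::value != 0, int, long>::type ForTag;
```

Here ForGap names the first branch and ForTag the second, which is the kind of decision the line-number recursion has to make for every line.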

Listing 5 shows what goes on behind the scenes when you define a reflected member variable. (The actual code uses macros to let users configure the notation of get/set functions.) The following line defines an integral constant that holds an enumerator of the current attribute:

enum {TagName##GappedNum = __LINE__ - beginLineNo};

and this line would expand to something similar to:

enum {StreetGappedNum = 27 - beginLineNo};
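
The effect of subtracting beginLineNo can be seen in a hand-expanded sketch. The struct below is hypothetical and stands in for what the macros might generate; only the enumerator naming pattern is taken from the lines above. Each enumerator must sit on its own line, and the blank line is deliberate.

```cpp
// Hand-written stand-in for macro output; the layout of the lines matters.
struct Address {
    enum { beginLineNo = __LINE__ };
    enum { StreetGappedNum = __LINE__ - beginLineNo };  // one line below: 1
    enum { CityGappedNum = __LINE__ - beginLineNo };    // two lines below: 2

    enum { ZipGappedNum = __LINE__ - beginLineNo };     // after a blank line: 4
};
```

The numbers are increasing and unique, but the blank line leaves a gap between 2 and 4.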

Given an enumeration that may have gaps, you need to calculate the number the attribute would have if there were no gaps. Therefore, you have to recursively check all preceding line numbers until you reach a line that contains an attribute-defining macro or until the number to check is 0. The latter case indicates that there was no preceding attribute and the specialization with the position value (-1) is chosen. In any case, the number of the current attribute is the preceding position +1. That is what both the specialized AttributeNum template and the general AttributeNum template do.
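
The recursion can be sketched in a simplified form. Several things here are assumptions: the library defines these templates as members in class scope with a dummy parameter, whereas this sketch uses namespace scope and full specializations; TagAtLine and IsSame are hypothetical names; and the relative line numbers 1 and 4 are hard-coded instead of coming from __LINE__. The sketch also carries the last position across gap lines rather than searching backward, which yields the same number for every attribute line.

```cpp
struct Gap_Num {};    // tag for a line that defines no attribute
struct StreetTag {};
struct CityTag {};

// Maps a relative line number to the tag defined there; a gap by default.
template <int Line> struct TagAtLine { typedef Gap_Num type; };
template <> struct TagAtLine<1> { typedef StreetTag type; };
template <> struct TagAtLine<4> { typedef CityTag type; };  // lines 2 and 3 are gaps

template <class A, class B> struct IsSame { enum { value = 0 }; };
template <class A> struct IsSame<A, A> { enum { value = 1 }; };

// Gapless, 0-based position of the last attribute at or before Line.
template <int Line>
struct AttributeNum {
    enum {
        isGap = IsSame<typename TagAtLine<Line>::type, Gap_Num>::value,
        value = AttributeNum<Line - 1>::value + (isGap ? 0 : 1)
    };
};

// Recursion end: nothing precedes line 0, so the position is -1.
template <>
struct AttributeNum<0> {
    enum { value = -1 };
};

enum { StreetNum = AttributeNum<1>::value };  // first attribute: 0
enum { CityNum = AttributeNum<4>::value };    // second attribute: 1
```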

With the following line, I store the 0-based, gapless number as an integral constant in the scope of the enclosing class:

enum {TagName##Num = AttributeNum<TagName##GappedNum>::value};

This line expands to something similar to:

enum {StreetNum = AttributeNum< StreetGappedNum >::value};

where the value would be 0 in my example.

Once you have the number of the attribute (in an enumeration without gaps), you can define other functions for that attribute through specialization of member templates. There is a function that maps the tag of the attribute to its number (AttributePos), a function that maps the number of the attribute to its tag (GetNameTag), a function that maps the number of the attribute to its ValueType (GetValueType), and a function that maps the tag of the attribute to its ValueType (GetValueTypeByTag). Listing 6 summarizes the contents of the preceding attribute definitions.
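
These four mappings can be sketched as ordinary class templates. The names AttributePos, GetNameTag, GetValueType, and GetValueTypeByTag come from the text; everything else is assumed for illustration, including the signatures, the two sample attributes, and the use of namespace scope (the library defines such templates inside the reflected class).

```cpp
#include <string>

struct StreetTag {};
struct ZipTag {};

template <class A, class B> struct IsSame { enum { value = 0 }; };
template <class A> struct IsSame<A, A> { enum { value = 1 }; };

// Tag -> number.
template <class Tag> struct AttributePos;
template <> struct AttributePos<StreetTag> { enum { value = 0 }; };
template <> struct AttributePos<ZipTag> { enum { value = 1 }; };

// Number -> tag.
template <int Num> struct GetNameTag;
template <> struct GetNameTag<0> { typedef StreetTag type; };
template <> struct GetNameTag<1> { typedef ZipTag type; };

// Number -> value type.
template <int Num> struct GetValueType;
template <> struct GetValueType<0> { typedef std::string type; };
template <> struct GetValueType<1> { typedef int type; };

// Tag -> value type, composed from the two maps above.
template <class Tag>
struct GetValueTypeByTag {
    typedef typename GetValueType<AttributePos<Tag>::value>::type type;
};
```

With the gapless number as the common key, each further mapping is one more specialization per attribute.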

For each reflected class, the number of reflected attributes is a compile-time constant in the scope of that class. Reflecting base classes of a class uses the same trick, only tags are not needed to "name" a base class.

Iterating Over Attributes

I borrowed from the STL the idea of separating code that iterates over a set from the code that does something to an item in that set. Iterations can be:

For each direct member variable.
Depth-first traversion on each member variable (for instance, a standard library container will be processed elementwise in this traversion).
The preceding traversions combined with depth-first traversion on the graph of all base classes of a class.

These traversions apply a user-defined function object to each attribute that is visited by the traversion.

Listing 7 shows how the iteration over direct member variables is implemented. processSingleAttribute has to call the user-defined function on the current attribute, and as long as there is more than one attribute left to treat, I have to process the current and the next attribute (okay, in C++ these two lines take more space than two lines).

The functions I did need were happy to get the AttributeTag as information (that is, as a type parameter for the function call operator()). At least one function did take advantage of the context information (the class that holds the current attribute). That is why I pass the attribute, the attribute tag, and a pointer to the enclosing class to the function.
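
A flat iteration of this kind might look like the following sketch. It is not Listing 7: the Address struct is hand-written where the library would use macros, and Index, ForEachAttribute, NameValuePrinter, and describe are hypothetical names. What it keeps from the description is the recursion over attribute numbers and the three arguments passed to the function object: the attribute, its tag, and a pointer to the enclosing object.

```cpp
#include <sstream>
#include <string>

struct StreetTag { static const char* name() { return "Street"; } };
struct ZipTag { static const char* name() { return "Zip"; } };

template <int N> struct Index {};  // turns a number into a type for overloading

// Hand-written stand-in for a class the macros would generate.
struct Address {
    std::string street;
    int zip;
    enum { attributeCount = 2 };

    const std::string& get(Index<0>) const { return street; }
    int get(Index<1>) const { return zip; }
    static StreetTag tag(Index<0>) { return StreetTag(); }
    static ZipTag tag(Index<1>) { return ZipTag(); }
};

// Processes attribute N, then recurses to N + 1 until Count is reached.
template <int N, int Count>
struct ForEachAttribute {
    template <class Record, class Function>
    static void apply(const Record& record, Function& function) {
        function(record.get(Index<N>()), Record::tag(Index<N>()), &record);
        ForEachAttribute<N + 1, Count>::apply(record, function);
    }
};

// Recursion end: no attribute left.
template <int Count>
struct ForEachAttribute<Count, Count> {
    template <class Record, class Function>
    static void apply(const Record&, Function&) {}
};

// Example function object: receives value, tag, and enclosing object.
struct NameValuePrinter {
    std::ostringstream out;
    template <class Value, class Tag, class Scope>
    void operator()(const Value& value, Tag, const Scope*) {
        out << Tag::name() << '=' << value << ';';
    }
};

inline std::string describe(const Address& address) {
    NameValuePrinter printer;
    ForEachAttribute<0, Address::attributeCount>::apply(address, printer);
    return printer.out.str();
}
```

The printer ignores the scope pointer; a function that needs context, such as one that asks the enclosing class for more information, would use it.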

A Concrete Example

I initialize plain old data (pod) members in the constructor of each reflected class because I have run into trouble with indeterminate values. Chances are (although admittedly, infinitesimal) that other programmers have run into similar problems. Chances are as well (even smaller) that other programmers will continue to run into this kind of problem. Chances are (unfortunately quite considerable) that I will continue to run into this kind of problem every now and then. The remedy is straightforward, as in Listing 8.

The solution consists of three statements and a function that does nothing. There is nothing more to do because for member variables that are instances of std::containers, the C++ Standard guarantees that all elements of that container are default initialized. Each member variable that is an instance of a reflected class has its plain old data members default initialized because (or if) all constructors of reflected classes use the RecordInitializer.

boost::is_pod is a compile-time function that maps all classes that are not plain old data to the integral constant 0, classes and structs (even if they are pods) to 0 as well, and the rest to the integral constant 1. Classes with reflected members have a user-declared constructor and are thus not aggregates in the sense of the C++ Standard [5]. In particular, these classes are not pods [5]. Hence, member variables that are instances of reflected classes will be default initialized. (The actual implementation checks for reference and const qualification as well.)
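
The dispatch idea can be sketched with the standard library. This is not Listing 8: std::is_scalar stands in for the boost::is_pod test, ScalarInitializer and the tag structs are hypothetical names, PaysLate is an invented second enumerator, PaysAlwaysInTime is assumed to be the first enumerator, and the constructor calls the initializer by hand where the library would iterate over all reflected members.

```cpp
#include <string>
#include <type_traits>

// Scalars get a zero value; everything else is left to its own constructor.
struct ScalarInitializer {
    template <class Value, class Tag, class Scope>
    void operator()(Value& value, Tag, const Scope*) const {
        initialize(value, std::is_scalar<Value>());
    }

private:
    template <class Value>
    static void initialize(Value& value, std::true_type) { value = Value(); }

    template <class Value>
    static void initialize(Value&, std::false_type) {}  // deliberately does nothing
};

// Assumption: PaysAlwaysInTime is the first enumerator and thus has value 0.
enum CredibilityType { PaysAlwaysInTime, PaysLate };
struct CredibilityTag {};
struct NameTag {};

struct Client {
    CredibilityType credibility;
    std::string name;

    Client() {
        ScalarInitializer init;
        init(credibility, CredibilityTag(), this);  // scalar: set to zero
        init(name, NameTag(), this);                // class type: untouched
    }
};
```

A default-constructed Client then starts with a determinate credibility instead of whatever happened to be in memory.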

Iterating Over Attributes

Some algorithms might need to traverse the complete graph spanned (recursively) by the reflected attributes of a class. For instance, the type of a reflected member variable might be std::vector<std::vector<Address> >.

My library currently supports depth-first traversion of object graphs. Depth-first traversion was used to print the state of a reflected object on a stream. The following graph events turned out to be useful for that:

Enter attribute.
Leave attribute.
Enter reflected class.
Leave reflected class.
Enter collection.
Leave collection.
Enter item of a collection.
Leave item of a collection.

A user-defined function object must supply suitable overloads of the function-call operator for all of these events. The enter-event functions are called before the corresponding node is processed, and the leave-event functions are called after the corresponding node is processed. Consequently, a user-defined function object that can be used with the depth-first algorithm has to define the function-call operator overloads specified in Listing 9. These overloads can also be implemented for all tags (Listing 10). Here, a leaf is simply something that is neither an instance of a reflected class nor an instance of a standard library container.
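
The enter/leave protocol can be sketched for the collection events alone. This is an illustration, not Listing 9 or Listing 10: the event tag structs, depthFirst, BracketPrinter, and render are hypothetical names, only std::vector is treated as a collection, and the attribute and reflected-class events are left out.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical event tags; the library's listings define their own overloads.
struct EnterCollection {};
struct LeaveCollection {};
struct EnterItem {};
struct LeaveItem {};

// Leaf: anything that is not a collection is handed to the visitor directly.
template <class Value, class Visitor>
void depthFirst(const Value& value, Visitor& visitor) {
    visitor(value);
}

// Collection: announce it, then descend into every item.
template <class Item, class Visitor>
void depthFirst(const std::vector<Item>& items, Visitor& visitor) {
    visitor(EnterCollection());
    typedef typename std::vector<Item>::const_iterator Iterator;
    for (Iterator it = items.begin(); it != items.end(); ++it) {
        visitor(EnterItem());
        depthFirst(*it, visitor);  // nested vectors pick this overload again
        visitor(LeaveItem());
    }
    visitor(LeaveCollection());
}

// Visitor with one overload per event plus a generic leaf overload.
struct BracketPrinter {
    std::ostringstream out;
    void operator()(EnterCollection) { out << '['; }
    void operator()(LeaveCollection) { out << ']'; }
    void operator()(EnterItem) {}
    void operator()(LeaveItem) { out << ';'; }
    template <class Leaf>
    void operator()(const Leaf& leaf) { out << leaf; }
};

inline std::string render(const std::vector<std::vector<int> >& rows) {
    BracketPrinter printer;
    depthFirst(rows, printer);
    return printer.out.str();
}
```

Because the event overloads are non-template functions, they win over the generic leaf overload whenever an event tag is passed.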

Another Example

To illustrate, in Listing 11 records can be printed on an ostream. (Since I only need this for debugging and tracing, I didn't worry too much about formatting the output.) print calls the depthFirstForeach algorithm that traverses the reflected base classes of the record along with each attribute. The parameters for that algorithm are the record to print as a dataTuple and the printer as a function object. The depthFirstForeach algorithm then iterates over all attributes of the record and calls the printer's operator() overloads at the corresponding graph events, and it also calls the leaf-overload:

operator()(const ValueType& value,NameTag,const Scope*)

with each value in the object graph that is iterable neither at compile time nor at runtime (a value is runtime iterable if it is an instance of a standard library container).

ObjectPath is basically a stack of pairs of position numbers and attribute names. When you enter an attribute of a reflected class, you need the scope of the attribute because you can only get the name of the attribute from the scope. You can't ask a Client for the name of the Address's Street attribute (that won't even compile).

Other Uses

A simple persistence mapping (one table per class) could be defined like Listing 12. Moreover, it has been argued that template code is hard to write because the compile process cannot be debugged. This holds only partially. You can always debug the template generation process with printfs, as in Listing 13. The call to functionThatIsNotEvenDeclared issues a compiler error that contains (hopefully) the value of the template argument.
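
The compile-time printf idea can be sketched as follows. This is not Listing 13: Factorial, Int, and the SHOW_TEMPLATE_ARGUMENTS switch are hypothetical and chosen only for illustration. With the switch undefined, the code compiles normally; defining it makes every instantiation call the undeclared function, and the resulting diagnostic should (depending on the compiler) mention the instantiation and therefore the value of N.

```cpp
template <int N> struct Int {};  // wraps a number so that it appears in diagnostics

template <int N>
struct Factorial {
#ifdef SHOW_TEMPLATE_ARGUMENTS
    // Deliberate error: nothing named functionThatIsNotEvenDeclared exists,
    // so the compiler reports each instantiation that reaches this line.
    enum { trace = sizeof(functionThatIsNotEvenDeclared(Int<N>())) };
#endif
    enum { value = N * Factorial<N - 1>::value };
};

// Recursion end.
template <>
struct Factorial<0> {
    enum { value = 1 };
};
```

Because the call depends on the template parameter, the error is raised when the template is instantiated rather than when it is defined, which is what ties the message to a concrete argument.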

Conclusion

With this library, reflected member variables have a user-defined name, and thus do not differ from handwritten code in that respect. In particular, debugging reflected classes is not different from debugging nonreflected classes. Defining reflected members is easier than defining nonreflected members because you get the get/set functions for free. I've tested this library with Visual C++ 6.0, Visual C++ 7.1, and GCC 3.2. On each compiler, I was able to generate a class with 129 member variables. This is more than I ever needed and I assume that this is not an upper bound for any of the tested compilers.

Acknowledgments

Thanks to Blanca Martinez for correcting what I thought was English, Dietmar Leibecke (from applied technologies GmbH, Germany) for reviewing this paper, and Vodafone Information Systems, for providing a productive environment that led me to the __LINE__ trick.

References

[1] Winch, Emily. "Heterogenous Lists of Named Objects" (http://www.oonumerics.org/tmpw01/winch.pdf).
[2] Weiss, Roland J. and Volker Simonis. "Storing Properties in Grouped Tagged Tuples" (http://www-ti.informatik.uni-tuebingen.de/~weissr/doc/Props.pdf).
[3] Alexandrescu, Andrei. Modern C++ Design, "Class Generation with Typelists," Addison-Wesley, 2001.
[4] Czarnecki, Krzysztof and Ulrich W. Eisenecker. Generative Programming, "Explicit Selection Constructs," Addison-Wesley, 2000.
[5] International Standard ISO/IEC 14882 Programming Languages C++.
[6] http://sourceforge.net/projects/loki-lib/.