C/C++ Users Journal February 2004
Graphs are collections of related nodes and the arcs among them. One way to represent graphs is with data structures, where each record represents one node, and pointers to other records represent arcs. I call these types of graphs "passive graphs," because outside algorithms act upon the graph.
Another way to represent graphs is with interfaces. In this approach, each implementation of an interface represents one node, and pointers to other interfaces represent arcs. I call these types of graphs "active graphs," because they perform their own actions. (A third way to represent graphs is as a set of values implicit or explicit to be acted upon by generic algorithms, as in the Boost Graph Library, http://www.boost.org/. These "functional graphs" have many of the same advantages as active graphs, but are more applicable to graph-specific problems such as shortest path, minimum spanning tree, or closure. Active graphs lend themselves to solutions that involve graphs, even when the problem does not.)
While passive graphs are still used to solve many problems, active graphs provide additional benefits. Passive graphs are inherently procedural, while active graphs are inherently object oriented. In particular, active graphs provide data hiding and polymorphism in ways that passive graphs cannot.
Since a passive graph is a data structure, it exposes all of its state to the procedures that act upon it. Once procedures become sensitive to data structures, the structures can't be easily changed. When you add a new feature requiring more information from the graph, all procedures that access that structure are potentially affected. Active graphs, however, hide their state and implementation behind interfaces. New features are added as classes implementing the common interfaces and encapsulating their own data and procedures. Software organized in this way tends to be less rigid and less fragile.
Similarly, a passive graph, being a static data structure, cannot change its behavior polymorphically. The procedures acting upon the data structure may react to the data by taking different branches. But for this to work, all possible behaviors are programmed into one monolithic procedure (think "case statement"). An active graph, however, is a collection of implementations that can change behavior from node to node. Each node encapsulates its own behavior, and isolates it from the behavior of other nodes.
Active graphs appear in many object-oriented software systems. For example, Java's Swing UI library (http://java.sun.com/j2se/1.4.2/docs/api/javax/swing/package-summary.html) is comprised of many different classes implementing a small set of interfaces (JComponent, LayoutManager, ActionListener, and so on). You construct a UI by creating several Swing objects and connecting them through their interface pointers. The graph thus created is traversed as the nodes themselves call methods through the interfaces to other nodes. Furthermore, you can add new components to Swing simply by implementing the expected interfaces in new ways. Graphs created with these new components adopt their behavior.
The most famous example of an active graph is the State pattern [1], an object-oriented technique for implementing state machines. In this pattern, an interface defines the type of all states in a machine. Each method of this interface represents behavior that is dependent upon the state of that machine. One or more of these interface methods advances the machine to the next state, either by returning a pointer to the next state interface or by passing this pointer into a callback function.
A couple of issues arise in the implementation of the State pattern. First, the lifetime of state objects must be managed. State objects might be created as needed and destroyed in a transition, or they might all be created ahead of time. Furthermore, they might be shared among several state machines as flyweights or Singletons [1]. Second, a graph should usually behave as one unit, acting within one context. As documented in [1], the State pattern places the context in an extrinsic object, and passes a pointer to this object into every state method. Indeed, these implementation issues arise in most active graphs. Fortunately, one technique solves the lifetime issues and removes extrinsic context information from the parameter lists.
Both of these problems are solved if the graph is itself one instance instead of a collection of related objects. Each node in the graph is no longer its own object, but rather a facet of the one graph object, with one uniform context and one lifetime. A facet is "an interface pointer that limits a client to a subset of the full interface" of an object [2] [3]. To design with facets is to recognize that one class can implement several interfaces. To implement facets, however, requires knowing how to accomplish this when all of the interfaces are of the same type.
The term "facet" correctly connotes the relationship between the interface and the object, bringing to mind a gemstone whose many faces tessellate its surface. The faces are the boundaries between the stone and the outside world they are its interfaces. Imagine an icosahedron with each of its faces numbered 1 through 20. (If you are a fan of role-playing games, you probably have such a device.) All of the faces are congruent they have the same type; namely, an equilateral triangle. However, the faces are not all the same; they each have a different identity as evidenced by the different numbers. Furthermore, these facets are Liskov substitutable if you roll this object, any one of the facets can appear on top when it comes to rest. But do not confuse these facets for distinct objects. They are all part of the same 20-sided die.
In object-oriented programs, facets are all part of one object, even though they have different identities. Upon first consideration, this sounds like a problem to be solved via multiple inheritance (MI). Through MI, a single C++ class can implement many different interfaces. The C++ compiler creates a new virtual table for each interface, and populates it with pointers to member functions matching the interface method names.
But MI runs into a snag. As in the icosahedron example and State pattern, you often want one class to implement many interfaces of the same type. But because method names are the means of interface resolution, a single C++ class cannot directly implement several instances of the same interface type. You have to get a little fancy.
Suppose you want to implement multiple facets for this interface:
class State
{
protected:
~State() {}
public:
virtual State *accept( char c ) = 0;
};
You have to find a way to trick the compiler into generating a different virtual table for each facet. In an article on interface programming, I demonstrated how to accomplish this with intermediate classes that I called "interface interpreters" (see "Program for Change: Use Interfaces in Your C++ Programs to Simplify Maintenance," Visual C++ Developers Journal, November/December, 1999). These intermediate classes essentially rename the interface methods.
class StateHead : public State
{
protected:
virtual State *head_accept( char c ) = 0;
public:
State *accept( char c ) {
return head_accept( c );
}
};
Now, the multifaceted class that inherits all of these intermediate classes can implement all of the renamed methods simultaneously. To select among the facets, the pointer to the multifaceted class is cast to a pointer to an intermediate class.
class Parser : private StateHead, StateTail
{
private:
State *head_accept( char c );
State *tail_accept( char c );
public:
State *start() { return (StateHead *)this; }
};
State *Parser::head_accept( char c )
{
if ( isalpha(c) )
return (StateTail *)this;
else
return NULL;
}
State *Parser::tail_accept( char c )
{
if ( isalnum(c) )
return (StateTail *)this;
else
return NULL;
}
The problem with this technique is that each intermediate class must be a different type, even though they all do the same thing: rename interface methods. Writing them all by hand is tedious. In my aforementioned article, I showed how to use macros to define intermediate classes (not much better). More recently, I've discovered how to write a class template that generates the family of intermediate classes, using overloading instead of renaming.
template< int S >
class FState : public State
{
protected:
virtual State *accept( FState<S> *, char c ) = 0;
public:
State *accept( char c ) {
return accept( this, c );
}
};
But even this technique leaves something to be desired, as the facet methods must take an additional parameter that will be ignored. There has to be a better way than multiple inheritance.
In Secrets of the C++ Masters [2], Jeff Alger proposes an implementation of facets, though not for the purpose of implementing the same interface type in multiple ways. Again, he declares an intermediate class, but instead of MI, he uses PIMPL [3]. The resulting code, when used to implement the State interface, looks something like this:
class Parser;
class StateHead: public State
{
private:
Parser *m_p;
public:
StateHead( Parser *p ) : m_p(p) {}
State *accept( char c );
};
/* StateTail omitted. */
class Parser
{
private:
StateHead head;
StateTail tail;
State *head_accept( char c );
State *tail_accept( char c );
friend class StateHead;
friend class StateTail;
public:
Parser() :
head( this ),
tail( this ) {}
State *start() { return &head; }
};
State *StateHead::accept( char c )
{
return m_p->head_accept(c);
}
This code can be generalized in two ways. First, replace the hard-coded member function to which the facet delegates with a pointer to a member function. This lets all facets of Parser share one type:
class Parser;
class FState : public State
{
public:
typedef State *(Parser::*Function)( char c );
private:
Parser *m_p;
Function m_f;
public:
FState( Parser *p, Function f ) : m_p(p), m_f(f) {}
State *accept( char c ) { return (m_p->*m_f)(c); }
};
class Parser
{
private:
FState head;
FState tail;
State *head_accept( char c );
State *tail_accept( char c );
public:
Parser() :
head( this, head_accept ),
tail( this, tail_accept ) {}
State *start() { return &head; }
};
The second generalization is to decouple the intermediate class from the Parser class, which lets all graph implementers reuse one facet type. You simply replace the Parser class name with a template parameter, and FState can live in a library header file:
template< class IMPL >
class FState : public State
{
public:
typedef State *(IMPL::*Function)( char c );
private:
IMPL *m_p;
Function m_f;
public:
FState( IMPL *p, Function f ) : m_p(p), m_f(f) {}
State *accept( char c ) { return (m_p->*m_f)(c); }
};
class Parser
{
private:
FState< Parser > head;
FState< Parser > tail;
State *head_accept( char c );
State *tail_accept( char c );
public:
Parser() :
head( this, head_accept ),
tail( this, tail_accept ) {}
State *start() { return &head; }
};
To demonstrate the technique of facets via delegation, as well as the power of facets for implementing active graphs, I have created an interpreter for the Xerces XML parser. Xerces implements the SAX protocol for responding to parts of an XML document as events. I demonstrate the use of this interpreter on a modification of the sample schema found in the Xerces-C download package [4]. I tested the source code with Version 2.3.0.
The schema from Apache presents a collection of people, each with their name, e-mail address, URI, manager, and subordinates. The Apache sample source code implements a DocumentHandler that interprets this schema. But if we make one small modification to the schema, we find a flaw in the method that the sample uses.
The schema calls for an element, called name, containing two elements: given and family. As it is, the family element only appears as a subelement of name. If, however, we replace the uri element with one called web, having subelements work and family, a problem arises. To distinguish between the family name and the family web page, the DocumentHandler implementation must now keep track of context. The flaw in the original implementation is not that it behaved incorrectly, but that it is too rigid; a large change in source code is required to accommodate a small change in the problem.
The problem with SAX is that one DocumentHander handles all events in the document, irrespective of context. So an element named family results in the same callback whether it appears as a subelement of name or of web. If, instead of one single document handler, SAX fired events into a graph of element handlers, the element events would be isolated from one another. A change in one element of the schema would have no impact on the implementation of other elements.
The downloadable source code (FacetDemo.zip) adapts the SAX DocumentHandler interface to an active graph of element handlers. Download both the FaceDemo source code and the Xerces-C library. Install the library and add the include and library paths to the appropriate settings in your development tool. The project file for Visual C++ 6.0 provided links to xerces-c_2.lib. The resulting executable must find xerces-c_2_3_0.dll in the path.
The XMLInterpreter class implements the DocumentHandler interface and keeps a stack of element handlers (see Listing 1). On startElement, it pushes a new element handler to the stack, and on endElement it pops the stack. Each document event is sent to the element handler at the top of the stack.
The XMLInterpreter constructor takes an XML document handler interface as a parameter. This is the starting point of the graph. The document handler and element handler interfaces share a common base, which determines the next element handler for a given child. This forms an active graph because the nodes themselves act to determine the next node to visit.
The application class PersonnelHandler implements the active graph for the personnel schema (Listing 2). It defines a facet for each element type appearing in the schema. The element-handler interface defines four methods, so each facet takes four function pointers in its constructor. Most elements only need to implement one or two of these functions because they have only subelements, characters, or attributes. For the rest, they delegate to default functions.
The SAX example demonstrates that active graphs can be powerful tools for solving difficult problems. Active graphs change their behavior as the state of the system changes. Furthermore, the nodes in the graph determine which transitions are taken, not an external procedure.
Facets are a natural way to implement active graphs. When facets are employed, the graph acts like one uniform object. In addition, since the facets are not separate objects, their lifetimes do not need to be individually managed.
There are many ways to implement facets in C++, the most versatile is through delegation. An intermediate class delegates interface methods to pointers to member functions of one implementation class. This technique enjoys three degrees of code reuse. First, the facet template can be instantiated for any implementation class. Second, that instantiation can be used for all facets of that class. And third, default functions can be shared among many facets. The resulting code is neither rigid nor fragile, and can accommodate many new features to come.
[1] Gamma, Erich et al. Design Patterns: Elements of Reusable Object-Oriented Software, Addison-Wesley, 1995.
[2] Alger, Jeff. Secrets of the C++ Masters, Academic Press, 1995.
[3] Sutter, Herb. Exceptional C++: 47 Engineering Puzzles, Programming Problems, and Solutions, Addison-Wesley, 2000. Also see The PIMPL idiom, Wiki. http://c2.com/cgi-bin/wiki?PimplIdiom.
[4] Xerces-C, Apache Software Foundation. http://xml.apache.org/xerces-c/index.html.