Designing an Application Framework

Reusability is what's important

Grady Booch

Grady is chief scientist at Rational and author of Object-oriented Analysis and Design (Benjamin/Cummings, 1994), on which this article is based. Grady can be contacted at egb@rational.com.

A major benefit of C++, Smalltalk, and similar object-oriented programming languages is the degree of reuse they make possible in well-engineered systems. Achieving a high level of reuse means that you write and maintain less code for each new application.

Reusing individual lines of code is the simplest form of reuse (all of us have copied and pasted code from one application into another), but it offers the fewest benefits (that code must be replicated across applications). You can do better by specializing existing classes through inheritance. Better yet, you can reuse whole groups of classes organized into a "framework"--a collection of classes that provide a set of services for a particular domain.

Domain-neutral frameworks, such as general foundation-class libraries, math libraries, GUI libraries, and the like, apply to a wide variety of applications. Vertical application frameworks--hospital patient records, securities and bonds trading, general business management, telephone switching systems, and so on--are specific to a vertical domain. Wherever there's a family of programs that solve substantially similar problems, there's an opportunity for an application framework. In this article, I'll present a foundation-class library design with an adaptable architecture. A number of requirements must be met when writing such a foundation-class library. In general, the library must provide a collection of domain-independent data structures and algorithms sufficient to cover the needs of most production-quality C++ applications. Additionally, it must be:

Complete. The library must provide a family of classes, united by a shared interface, but each employing a different representation so that developers can select the ones with the time and space semantics most appropriate to their given application.

Adaptable. All platform-specific aspects must be clearly identified and isolated so that local substitutions may be made. In particular, developers must have control over storage-management policies, as well as the semantics of process synchronization.

Efficient. Components must be easily assembled (efficient in terms of compilation resources), must impose minimal run-time and memory overhead (efficient in execution resources), and must be more reliable than hand-built mechanisms (efficient in developer resources).

Safe. Each abstraction must be type-safe, so that static assumptions about the behavior of a class may be enforced by the compilation system. Exceptions should be used to identify conditions under which a class's dynamic semantics are violated; raising an exception must not corrupt the state of the object that threw the exception.

Simple. The library must use a clear and consistent organization that makes it easy to identify and select appropriate concrete classes.

Extensible. Developers must be able to add new classes independently, while at the same time preserving the architectural integrity of the framework.

Unfortunately, these requirements are open-ended: A library that provides abstractions for all the foundation classes required by all possible applications would be huge. Since problems like this could easily suffer from analysis paralysis, you should focus on providing the most generally useflibrary abstractions and services.

Domain Analysis

Your first step is domain analysis: Survey the theory of data structures and algorithms, then harvest abstractions found in production programs. For example, start the analysis session by organizing abstractions into structures, which contain all structural abstractions, or into tools, which contain all algorithmic abstractions. There is a "using" relationship between these two categories: Certain tools build upon the more primitive services provided by some structures.

In the second phase of the domain analysis, study the foundation classes used in a variety of application areas (the wider the spectrum, the better). Along the way, you may discover common abstractions that overlap with what you encountered in the first phase: This indicates that you've discovered truly general abstractions, so you'll keep these within the boundary of your problem. You may also find certain domain-biased abstractions--currency, astronomical coordinates, measures of mass and size, and the like. You can reject these abstractions because they are either difficult to generalize (such as currency), highly domain specific (such as astronomical coordinates), or so primitive that it is hard to find compelling reasons to turn them into first-class citizens (measures of mass and size, for example). On the basis of this analysis, you can settle on the kinds of structures in Table 1.

Organizing the abstractions represented by this list is a problem of classification. I chose this particular organization because it clearly separates behavior among each category of abstractions.

You might settle upon the kinds of tools shown in Table 2, which are also based upon the domain analysis. Many of these abstractions have obvious functional variations. For example, you may distinguish among many different kinds of sorting agents (for quick sorting, bubble sorting, heap sorting, and so on), as well as among different kinds of searching agents (those responsible for sequential searching; binary searching; and pre-, in-, and post-order tree searching).

Design

Coggins's Law of Software Engineering states that, "pragmatics must take precedence over elegance, for Nature cannot be impressed." A corollary is that design can never be entirely independent of language--the particular features and semantics of a language influence architectural decisions. Ignoring these influences leaves you with abstractions that do not take advantage of the language's unique facilities, or with mechanisms that cannot be efficiently implemented in any language.

Object-oriented languages offer three basic facilities for organizing a rich collection of classes: inheritance, aggregation, and parameterization. Inheritance is the most visible (and most popular) aspect of object-oriented technology; however, it is not the only structuring principle to consider. Indeed, parameterization combined with inheritance and aggregation can lead you to a very powerful, yet small, architecture.

The elided declaration of a C++ domain-specific queue class in Example 1(a) is a concrete realization of the abstraction of a queue of events: a structure in which you can add event objects to the tail of the queue, and remove them from the front. C++ encourages abstraction by letting you state the intended public behavior of a queue (expressed via the operations clear, add, pop, and front), while hiding its exact representation. Certain uses of this abstraction may demand slightly different semantics--you may need a priority queue, in which events are added to the queue in order of their urgency. You can take advantage of the work you've already done by subclassing the base queue class and specializing its behavior, as in Example 1(b).

Virtual functions encourage abstraction by allowing you to redefine the semantics of concrete operations (such as add) from a more generalized abstraction. In combination with parameterized classes, you can craft even more general abstractions. The semantics of queues are the same, no matter if it's a queue of cabbages or a queue of kings. With template classes, you can restate the original base class, as in Example 1(c). This is a common strategy when applying parameterized classes: Take an existing concrete class, identify the ways in which its semantics are invariant according to the items it manipulates, and extract these items as template arguments.

You can combine inheritance and parameterization in very powerful ways. For example, you can restate the original subclass, as in Example 1(d). Type safety is the key advantage of this approach. You can instantiate any number of concrete queue classes, as in Example 2. The language will enforce abstractions, so that you can't add events to the character queue, nor floating-point values to the event queue.

Figure 1 illustrates this design by showing the relationships among a parameterized class (Queue), its subclass (PriorityQueue), one of its instantiations (PriorityEventQueue), and one of its instances (mailQueue). This example leads to this library's first architectural principle: Except for a few cases, the classes you provide should be parameterized. This decision supports the library's requirement for safety.

Macro Organization

Classes are a necessary but insufficient vehicle for decomposition. This certainly applies to the class library I'm designing here. One of the worst organizations you could devise would be a flat collection of classes, through which developers would have to navigate to find the classes needed. It's far better to place each cluster of classes into its own category, as in Figure 2. This helps satisfy the library's requirement for simplicity. A quick domain analysis suggests the opportunity for exploiting the representations common among the classes in this library. Consequently, you assert the existence of the globally accessible category named Support to organize lower-level abstractions. You'll also use this category to collect the classes needed in support of the library's common mechanisms.

This leads to the library's second architectural principle: Distinguish clearly between policy and implementation. In a sense, abstractions such as queues, sets, and rings represent particular policies for using lower-level structures such as linked lists or arrays. For example, a queue defines the policy whereby items can only be added to one end of a structure and removed from the other. A set, on the other hand, enforces no such policy. A ring does enforce an ordering, but sets the policy that the front and the back of its items are connected. I'll therefore use the support category for those more primitive abstractions, upon which I can formulate different policies.

By exposing this category to library builders, you support the library's requirement for extensibility. In general, application developers need only worry about the classes found in the categories for structures and tools. Library developers and power users, however, may wish to use the more primitive abstractions found in Support, from which new classes may be constructed, or through which the behavior of existing classes may be modified.

As Figure 2 suggests, you organize this library as a forest of classes, rather than as a tree, since there's no single base class, as with languages such as Small-talk. Although not shown in the figure, the classes in the Graphs, Lists, and Trees categories are subtly different from the other structural classes. Abstractions such as deques and stacks are monolithic in that they're treated as a single unit: There are no identifiable, distinct components. Referential integrity is therefore guaranteed. Alternatively, a polylithic structure (such as a graph) permits structural sharing. For example, you may have objects that denote a sublist of a longer list, a branch of a larger tree, or individual vertices and arcs of a graph. The fundamental distinction between monolithic and polylithic structures is that, in monolithic structures, the semantics of copying, assignment, and equality are deep; in polylithic structures, copying, assignment, and equality are shallow operations (aliases may share a reference to a part of a larger structure).

Class Families

A third principle central to the design of this library is the concept of building families of classes, related by lines of inheritance. For each kind of structure, I'll provide several different classes, united by a shared interface (such as the abstract base class Queue). Each class has several concrete subclasses, each having a slightly different representation, and therefore having different time and space semantics. In this manner, the library's requirement for completeness is supported. You can select the one concrete class whose time and space semantics best fit the needs of a given application, yet still be confident that, no matter which concrete class is selected, it will be functionally the same as any other concrete class in the family. This intentional and clear separation of concerns between an abstract base class and its concrete classes allows you to initially select one concrete class and later, as the application is being tuned, replace it with a sibling concrete class with minimal effort. (The only real cost is the recompilation of all uses of the new class.) You can be confident that the application will still work because all sibling concrete classes share the same interface and the same central behavior. Another implication of this organization is that it makes it possible to copy, assign, and test for equality among objects of the same family of classes, even if each object has a radically different representation.

In a simple sense, an abstract base class captures the relevant public design decisions about the abstraction. Another important use of abstract base classes is to cache a common state that might otherwise be expensive to compute. This can convert an O(n) computation to an O(1) retrieval. The cost is the cooperation required between the abstract base class and its subclasses, to keep the cached result up to date.

The various concrete members of a family of classes represent the forms of an abstraction. You must consider two fundamental forms of most abstractions when building a serious application: the form of representation (which establishes the concrete implementation of an abstract base class) and the choice of in-memory structures (is the structure stored on the stack or on the heap?). In the "bounded" form of an abstraction, the structure is stored on the stack and thus has a static size at the time the object is constructed; in the "unbounded" form, the structure is stored on the heap and thus may grow to the limits of available memory.

Because both bounded and unbounded forms share a common interface and behavior, you can make them direct subclasses of the abstract base class for each structure. The second important variation concerns synchronization. Many useful applications are "sequential systems"--they involve only a single thread of control. Other applications, especially those involving real-time control, are "systems concurrent"--they require the synchronization of several simultaneous threads of control within the same system. The synchronization of multiple threads of control is important because of mutual exclusion. Simply stated, it is improper to allow two or more threads of control to directly act upon the same object at the same time, because they may interfere with the state of the object, and ultimately corrupt its state. For example, consider two active agents that both try to add an item to the same Queue object. The first agent might start to add the new item and be preempted, leaving the object in an inconsistent state for the second agent.

There are three possible design alternatives: sequential, guarded, and synchronous. Each requires different degrees of cooperation among the agents that interact with a shared object. The interactions among the abstract base class, the representation forms, and the synchronization forms yield the same family of classes for every structure, as shown in Figure 3. This architecture explains why I've chosen to organize this library as a family of classes rather than having a singly rooted tree:

It accurately reflects the regular structure of the various component forms.
It involves less complexity and overhead when selecting one component from the library.
It avoids the endless ontological debates engendered by a "pure object-oriented" approach.
It simplifies integrating the library with other libraries.

Conclusion

Building frameworks is hard. In crafting general class libraries, you must balance the needs for functionality, flexibility, and simplicity. Strive to build flexible libraries, because you can never know exactly how programmers will use your abstractions. Furthermore, it is wise to build libraries that make as few assumptions about their environment as possible so that programmers can easily combine them with other class libraries. You must also devise simple, efficient abstractions that programmers can understand. The most profoundly elegant framework will never be reused, unless the cost of understanding it and then using its abstractions is lower than the programmer's perceived cost of writing them from scratch. The real payoff comes when these classes and mechanisms get used over and over again, indicating that others are gaining leverage from the developers' hard work, allowing them to focus on the unique parts of their own particular problem.

Table 1: Structures for the class library.

Structure     Description
-------------------------
Bag           Collection of (possibly duplicate) items.
Collection    Indexable collection of items.
Deque         Sequence of items in which items may be added and
               removed from either end.
Graph         Unrooted collection of nodes and arcs, which may contain
               cycles and cross-references; structural
               sharing is permitted.
List          Rooted sequence of items; structural sharing is permitted.
Map           Dictionary of item/value pairs.
Queue         Sequence of items in which items may be added from one
               end and removed from the opposite end.
Ring          Sequence of items in which items may be added and
               removed from the top of a circular structure.
Set           Collection of (unduplicated) items.
Stack         Sequence of items in which items may be added and
               removed from the same end.
String        Indexable sequence of items, with behaviors involving
               the manipulation of substrings.
Tree          Rooted collection of nodes and arcs, which may not
               contain cycles or cross-references; structural
               sharing is permitted.

Table 2: Tools determined by domain analysis.

Tool               Description
------------------------------
Date/Time          Operations for manipulating date and time.
Filters            Input, process, and output transformations.
Pattern matching   Operations for searching for sequences within
                    other sequences.
Searching          Operations for searching for items within
                    structures.
Sorting            Operations for ordering structures.
Utilities          Common composite operations that build upon more
                    primitive structural operations.

Example 1: (a) Abstraction of an event-queue class; (b) subclassing from EventQueue; (c) restating EventQueue using a template class; (d) PriorityQueue as a template class.

(a)
     class NetworkEvent...

     class EventQueue {
     public:

       EventQueue();
       virtual ~EventQueue ();

       virtual void clear();
       virtual void add(const NetworkEvent&);
       virtual void pop();

       virtual const NetworkEvent& front() const;
       ...
     };

(b)
     class PriorityEventQueue : public EventQueue {
     public:

       PriorityEventQueue ();
       virtual ~PriorityEventQueue ();

       virtual void add(const NetworkEvent&);
       ...
     };

(c)
     template<class Item>
     class Queue {
     public:

       Queue();
       virtual ~Queue();

       virtual void clear();
       virtual void add(const Item&);
       virtual void pop();

       virtual const Item& front() const;

       ...
     };

(d)
     template<class Item>
     class PriorityQueue : public Queue<Item> {
     public:

       PriorityQueue ();
       virtual ~PriorityQueue ();

       virtual void add(const Item&);
       ...
     };

Example 2: Instantiating concrete queue classes.

Queue<char> characterQueue;
typedef Queue<NetworkEvent> EventQueue;
typedef PriorityQueue<NetworkEvent> PriorityEventQueue;