C/C++ Users Journal March 2004

Database Portable Software

Design patterns yield an architecture for database portable information systems

By Michael H. Lutz, Colin J. Neill, and Phillip A. Laplante

Portability is an important software quality attribute. We say a software system is "database portable" when it can optimally run against different database management systems. Database portable software exploits vendor-specific features when necessary, while at the same time facilitating code reuse. Database portable software should support functional extension following Meyer's Open-Closed Principle (open for extension, closed for modification [7]), thus minimizing support costs.

In this article, we describe a pattern-based architecture for achieving database portability in an optimal way, letting you use database native features when necessary. In the process, we present DbKit, a toolkit that supports the architecture.

The Problem

Three main factors cause variation in C++ database access code:

The API itself (for instance, ODBC, OCI, or Pro*C, which are entirely different).
SQL differences (PL/SQL versus Transact-SQL, for example) and version differences (Oracle 8i versus 9i).
Driver differences and levels of conformance (e.g., to the ODBC specification).

The obvious question regarding database portable software in C++ is: "Why not use ODBC?" While ODBC is certainly a viable strategy, it doesn't entirely resolve the issues [2]. First, some vendors (Oracle, for instance) do not provide ODBC drivers outside the Microsoft platform; this incurs a dependency on potentially costly third-party drivers if Linux or UNIX support is a requirement. Second, driver conformance and subtleties can be problematic. Finally, ODBC itself does not process SQL; the database engine does. So even with ODBC in the picture, SQL differences remain. The SQL code in Listing 1 illustrates an insert into a relatively simple table. We use each database manager's native mechanism for generating primary keys (a common practice) and for retrieving (or retaining) them to resolve foreign key constraints for subsequent inserts. The SQL represents the logic required if coded in ODBC. Notice that both the order and the syntax differ: (a) is for SQL Server, and (b) is for Oracle. Performance may also be a concern with ODBC, although vendors such as DataDirect Technologies have generally matched native performance numbers [1].

Strategies

Organizations can attack this problem with one of three strategies:

"Free for all" (that is, unmanaged).
A generic data access library.
A problem domain database abstraction layer.

In the "free for all" approach, the system is designed without regard to database portability, and developers are "forced to cope and get it to work." We do not support this approach; it is unmanageable and will fail as portability requirements change during a product's lifetime.

The second approach is to buy or build a generic data access library with the intent of entirely shielding developers from the aforementioned issues. We did not select this approach for various reasons, including the high costs associated with maintaining such a library and the mediocre performance that usually results. On paper, this approach sounds great, but in practice it locks developers into proprietary APIs and least-common-denominator database features.

The third approach, a problem domain database abstraction layer, implies a set of database-independent interfaces at the problem domain level. This is similar to option two except the level of abstraction is higher. These abstractions must then be implemented as needed to meet a system's database portability requirements.

To achieve portability, you need an architecture that lets you exploit vendor-specific features without delving into onerous if/else or conditional compilation blocks, which are the classic conditional-logic code smell. The traditional refactoring replaces the conditionals with polymorphism, but this may result in a high degree of redundant low-level code depending on the nature of the application. To reduce redundancy, we developed a set of Wrapper Facade [3] classes that plug into the class hierarchies as needed. The Wrapper Facade classes solve many fundamental data access problems. Combining polymorphic refactoring with Wrapper Facade classes removes if/else blocks and reduces redundant low-level code.

Pattern Selection

We found it necessary to combine several design patterns to achieve our architectural goals. Three problems in particular required examination and resulted in design pattern application.

Problem 1: How do we structure Model classes and their supporting persistence implementation classes? This is the crux of the problem.

To achieve database portability in a general sense, we first need to abstract database operations into database-neutral interfaces. We then need to implement these interfaces with an architecture allowing developers to exploit vendor-specific features and syntax when necessary. We also want to promote code reuse and maintainability.

This is captured in the intent of the Bridge pattern. Bridge [5] allows abstractions and their implementations to vary independently. The goal of Bridge is often platform independence.

Looking deeper, we need to map our Model classes to their supporting persistence implementation classes. We've assumed in the reference implementation that each Model class aggregates exactly one interface defining the database operations required for its persistence needs. So we've got three kinds of classes: the Model classes, the vendor-neutral database interfaces (one interface per Model class), and the implementations of those interfaces.

The standard reification of the Bridge pattern refers to a number of participants. In our case, Abstraction maps to the Model classes (classes that capture domain abstractions: the entities we wish to persist), Implementor maps to the database-neutral interfaces, and Concrete Implementors map to the persistence implementation classes. See Figure 1 for an example class diagram.
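The mapping of Bridge participants can be sketched in a few lines of C++. This is a minimal illustration only, not code from DbKit or the reference implementation: the names Order, IOrderDB, OrderOdbcSqlServer, and OrderOcciOracle are hypothetical, the "database" is a counter, and the implementor is passed in through the constructor purely to keep the example short.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Implementor: a database-neutral interface (hypothetical name).
class IOrderDB {
public:
    virtual ~IOrderDB() = default;
    // Persists an order and returns the generated primary key.
    virtual long InsertOrder(const std::string& customer) = 0;
    virtual std::string Describe() const = 0;
};

// Concrete Implementors: one per technology and database manager.
// A counter stands in for the key the database would generate.
class OrderOdbcSqlServer : public IOrderDB {
public:
    long InsertOrder(const std::string&) override { return ++nextKey_; }
    std::string Describe() const override { return "ODBC/SQL Server"; }
private:
    long nextKey_ = 0;
};

class OrderOcciOracle : public IOrderDB {
public:
    long InsertOrder(const std::string&) override { return ++nextKey_; }
    std::string Describe() const override { return "OCCI/Oracle"; }
private:
    long nextKey_ = 0;
};

// Abstraction: the Model class owns the business rules and
// delegates persistence to whichever implementor it holds.
class Order {
public:
    explicit Order(std::unique_ptr<IOrderDB> db) : db_(std::move(db)) {
        assert(db_ != nullptr);
    }
    long Place(const std::string& customer) {
        if (customer.empty()) throw std::invalid_argument("customer required");
        return db_->InsertOrder(customer);  // database-specific part
    }
    std::string Backend() const { return db_->Describe(); }
private:
    std::unique_ptr<IOrderDB> db_;
};
```

The validation in Place is written once in the Model, while only InsertOrder would need rewriting for each database manager.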

Since we are largely rewriting our persistence logic per database manager, we want to minimize code in the database tier (persistence implementation classes) and maximize code in the business logic tier (the Model classes) so as to promote code reuse.

Problem 2: How do we instantiate the correct persistence implementation classes at runtime?

The software needs to know how to instantiate persistence implementation classes at runtime. For Bridge, GoF [5] mentions two approaches for instantiating concrete implementors: an object-composition-based approach that abstracts away object creation in an Abstract Factory, or a derivation-based approach where intelligence is built into the abstractions themselves.

In our case, each implementation class is mapped to exactly one Abstraction class (Model class). Any code requiring access to data should use the appropriate Model class, not the underlying persistence implementation(s). Given this fact, we did not see a need to abstract away object creation into Abstract Factory. Thus, each Model understands how to select and instantiate its persistence implementations (via Factory Method [5]).
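As a rough sketch of this idea, a Model class might carry a parameterized Factory Method like the one below. The class names and the "ODBC"/"OCCI" configuration strings are assumptions made for illustration. The sketch also links both implementations directly into one binary, which is exactly what a production design would avoid once runtime library dependencies are taken into account.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>

// Database-neutral interface for the Customer Model (hypothetical).
class ICustomerDB {
public:
    virtual ~ICustomerDB() = default;
    virtual std::string Technology() const = 0;
};

class CustomerOdbc : public ICustomerDB {
public:
    std::string Technology() const override { return "ODBC"; }
};

class CustomerOcci : public ICustomerDB {
public:
    std::string Technology() const override { return "OCCI"; }
};

// The Model selects and creates its own persistence implementation,
// so callers never name a concrete implementor.
class Customer {
public:
    explicit Customer(const std::string& technology)
        : db_(CreateDB(technology)) {}

    std::string Technology() const { return db_->Technology(); }

private:
    // Factory Method: the single place where the choice is made.
    static std::unique_ptr<ICustomerDB> CreateDB(const std::string& technology) {
        if (technology == "ODBC")
            return std::unique_ptr<ICustomerDB>(new CustomerOdbc);
        if (technology == "OCCI")
            return std::unique_ptr<ICustomerDB>(new CustomerOcci);
        throw std::invalid_argument("unsupported technology: " + technology);
    }

    std::unique_ptr<ICustomerDB> db_;
};
```

Client code simply writes Customer c("ODBC"); and works with the Model from then on.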

Now if you're thinking through all of this carefully, you may have stumbled on a problem. Note that the nature of these Factory Methods is important: We do not want to incur runtime dependencies on libraries residing on a given machine that are not being used. So for instance, if a system is configured to use OCI or OCCI, the system should not have a runtime dependency on ODBC (particularly important for Linux or UNIX). To resolve this problem we implemented a portable, dynamic library-loading mechanism and partitioned our components (binaries) by technology. This level of detail is generally not addressed directly by GoF.

Problem 3: How can we best promote code reuse and reduce tedious, repetitive data-access code?

APIs such as ODBC, OCI, and OCCI are complicated and difficult to master. For example, binding a single parameter to a SQL statement in ODBC requires 10 arguments. Developers invariably end up with a lot of tedious code to write because of the nature of most data-access APIs.

The Template Method [5] pattern can reduce much of this redundancy. The main idea is to implement tedious code such as parameter binding in parent classes, and generate SQL in subclasses. Many times parameter binds are identical across database managers for a given query, but the SQL itself varies. Consider the complexity of parameter binding. Listing 3(a) shows binding a parameter in ODBC, while (b) illustrates binding a parameter in OCI. To avoid endless repetition of binding code in such cases, use the Template Method pattern. Place the binding code in a parent class and define a virtual function to construct the SQL. In subclasses, only implement the function to create the SQL, and you'll bypass repeating the bind calls.
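The shape of this arrangement can be sketched as follows. This is an illustration under stated assumptions rather than DbKit code: FakeStatement stands in for a real statement handle, the binding is simulated by storing values in a vector, and the table, column, and sequence names are invented.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-in for a prepared statement; a real one would wrap an
// ODBC or OCI statement handle (assumption for illustration).
struct FakeStatement {
    std::string sql;
    std::vector<std::string> boundValues;
};

// Parent class: owns the tedious, shared binding logic.
class InsertCustomerQuery {
public:
    virtual ~InsertCustomerQuery() = default;

    // Template Method: the skeleton is fixed, only BuildSql varies.
    FakeStatement Prepare(const std::string& name,
                          const std::string& city) const {
        FakeStatement stmt;
        stmt.sql = BuildSql();               // per database manager
        stmt.boundValues.push_back(name);    // binding written once
        stmt.boundValues.push_back(city);
        return stmt;
    }

protected:
    virtual std::string BuildSql() const = 0;
};

// Subclasses supply only the SQL text.
class InsertCustomerSqlServer : public InsertCustomerQuery {
protected:
    std::string BuildSql() const override {
        // Key assumed to come from an identity column.
        return "INSERT INTO customer (name, city) VALUES (?, ?)";
    }
};

class InsertCustomerOracle : public InsertCustomerQuery {
protected:
    std::string BuildSql() const override {
        // Key assumed to come from a sequence.
        return "INSERT INTO customer (id, name, city) "
               "VALUES (customer_seq.NEXTVAL, ?, ?)";
    }
};
```

Both subclasses end up with identical bound values; only the SQL string differs, which is the redundancy the pattern is meant to remove.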

Other patterns can also help to decrease tedious, redundant code. The POSA Wrapper Facade pattern [3] can simplify complex APIs by wrapping them in a class or class library. The DbKit toolkit provides a reference implementation that includes Wrapper Facade classes for ODBC and OCCI (Oracle C++ Call Interface).
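To make the Wrapper Facade idea concrete, here is a small sketch. The raw_* functions are a made-up C-style API defined inside the example so that it is self-contained; they are not ODBC, OCI, or OCCI calls, and the Connection class is not DbKitOdbc or DbKitOcci. The point is only the shape: handle management and error checking are bundled into one class.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// --- Stand-in for a low-level C data-access API (hypothetical) ---
struct raw_conn {
    bool open;
    int statements;
};

inline int raw_connect(raw_conn* c, const char* dsn) {
    if (dsn == nullptr || *dsn == '\0') return -1;  // reject empty name
    c->open = true;
    c->statements = 0;
    return 0;
}

inline int raw_exec(raw_conn* c, const char* sql) {
    if (!c->open || sql == nullptr) return -1;
    ++c->statements;
    return 0;
}

inline void raw_disconnect(raw_conn* c) { c->open = false; }

// --- Wrapper Facade: one class hides the handle and return codes ---
class Connection {
public:
    explicit Connection(const std::string& dsn) : conn_() {
        if (raw_connect(&conn_, dsn.c_str()) != 0)
            throw std::runtime_error("connect failed: " + dsn);
    }
    ~Connection() { raw_disconnect(&conn_); }  // always disconnects

    Connection(const Connection&) = delete;
    Connection& operator=(const Connection&) = delete;

    void Execute(const std::string& sql) {
        if (raw_exec(&conn_, sql.c_str()) != 0)
            throw std::runtime_error("execute failed: " + sql);
    }
    int StatementsRun() const { return conn_.statements; }

private:
    raw_conn conn_;
};
```

Callers deal with exceptions and a single object lifetime instead of checking a return code after every call.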

DbKit

The purpose of DbKit is to reduce development costs. We achieve this goal by placing highly repeated code in an easy-to-use toolkit. The toolkit (available at http://www.cuj.com/code/) also helps to ensure that common persistence problems are solved consistently across an application.

Any class in the namespace DbKit is part of the toolkit; everything else is part of the reference implementation. All DbKit files begin with DbKit.

In the toolkit, one Wrapper Facade class is provided for each data-access technology (currently ODBC and OCCI are supported). These classes, DbKitOdbc and DbKitOcci, bundle low-level code to reduce redundancy. They are not intended to entirely shield you from the technologies. The Wrapper Facade classes ease the burden of dealing with the following issues: connecting to the database; connection pooling; environment handle management; statement handle creation; execution, transactions, diagnostics, and logging; and disconnecting from the database. Listing 2 is an example of such code. In (a), the ODBC call turns on connection pooling, setting up one pool per driver. Because the pool's scope spans data sources, only one call is needed per process. Listing 2(b) is the logically equivalent code in OCCI (the Oracle C++ Call Interface, a JDBC-like API for C++ introduced with Oracle 9i). OCCI connection pools are specific to a database connection string. Additionally, OCCI needs to be told how long to wait before freeing connections that have been inactive for long periods (to free system resources). ODBC driver managers handle these details automatically.

By design, the Wrapper Facade classes do not present a consistent interface. They are employed in technology-specific persistence implementation classes, so making their interfaces consistent was unnecessary.

The toolkit also contains classes for handling DbKit-thrown exceptions, a Value Object [4] used to maintain connection configuration state, and a portable library loader and thread mutex class.

Additionally, the toolkit contains an interface, DbKit::IModelDB, which all database implementation interfaces should subclass. Subclassing this interface provides a consistent means of configuring connections for all persistence implementation classes (Concrete Implementor Bridge participants). Finally, the toolkit provides an interface for a custom logger to be plugged into the solution to log database error messages (generally following Visitor [5]).

Reference Implementation

The example reference implementation is a simple order entry system written in C++ using the Model View Controller pattern [6]. The UI is text based (not very exciting), but the UI exists solely to drive the backend.

There is one View class, and it knows about the Controller. To send messages to Model classes, Value Objects are instantiated by the View. These Value Objects also implement a common Command Pattern [5] interface (DbKit::ICmd), thereby doubling as Command objects. After being populated with the necessary state, these objects are passed to the Controller, which in turn invokes their Execute method (see DbKit::ICmd). The Execute method implementations are hard-wired with the intelligence to instantiate the correct Model classes and invoke the correct method(s) based on their state. A given Value Object may handle Execute requests differently depending on its state. Generally there is a one-to-one relationship between the Value Object and Model classes. Herein lies the basic structure of the reference implementation. All of these classes may be found in the ReferenceImp project.
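The flow from View to Controller to Model might look roughly like this. Only the Execute method name is taken from the description above; the ICmd declaration here is a stand-in whose exact signature is assumed, and CustomerModel, CustomerRequest, and Controller::Handle are hypothetical names.

```cpp
#include <cassert>
#include <string>

// Stand-in for the Command interface; modeled on DbKit::ICmd,
// but the exact signature is an assumption.
class ICmd {
public:
    virtual ~ICmd() = default;
    virtual void Execute() = 0;
};

// Hypothetical Model class; a real one would delegate to its
// persistence implementation instead of returning a string.
class CustomerModel {
public:
    std::string Add(const std::string& name) { return "added " + name; }
    std::string Remove(const std::string& name) { return "removed " + name; }
};

// Value Object that doubles as a Command. The View fills in the
// state; Execute picks the Model and the method based on it.
class CustomerRequest : public ICmd {
public:
    enum class Action { Add, Remove };

    Action action = Action::Add;
    std::string name;
    std::string result;  // filled in by Execute

    void Execute() override {
        CustomerModel model;  // instantiate the matching Model
        result = (action == Action::Add) ? model.Add(name)
                                         : model.Remove(name);
    }
};

// The Controller knows only the Command interface.
class Controller {
public:
    void Handle(ICmd& cmd) { cmd.Execute(); }
};
```

Because the Controller sees only ICmd, new request types can be added without changing it.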

Things get a little more interesting in the Model classes. In terms of the Bridge implementation, we want to instantiate the database implementations in a consistent manner so we can minimize redundant Factory Method code. The reusable portion of the Factory Methods is contained in the DbKit utility class DbKit::Loader. This mechanism is platform independent, and most importantly it loads libraries and returns interfaces on demand.

It is here that some magic is required. Based on a naming convention hard-wired into the Controller (Listing 4) and stored in the Value Objects, the utility loader (DbKit::Loader) knows how to dynamically load the correct library and instantiate the correct implementation object. The naming convention is based on the class name, the data-access technology (ODBC, OCCI, and so on), the database manager name, and finally the library name. In a production system these values would be configured as opposed to being hard-wired. DbKit ensures that requested libraries are loaded exactly once per process and are never unloaded.
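The two ideas in this paragraph, a naming convention and load-once semantics, can be sketched independently of any platform API. Everything below is an assumption made for illustration: the ImplName fields, the "Create..." symbol format, and the LibraryCache class are invented and do not reflect DbKit::Loader's actual interface. The opener callback stands in for a call such as dlopen or LoadLibrary, and a production version would also guard the map with a mutex.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical naming convention built from the four parts named
// in the text: class, technology, database manager, and library.
struct ImplName {
    std::string className;      // e.g. "Order"
    std::string technology;     // e.g. "Odbc"
    std::string dbManager;      // e.g. "SqlServer"
    std::string libraryPrefix;  // e.g. "RefImp"

    std::string Library() const { return libraryPrefix + technology; }
    std::string FactorySymbol() const {
        return "Create" + className + technology + dbManager;
    }
};

// Opens each library at most once and never closes it.
class LibraryCache {
public:
    using Handle = void*;
    using Opener = std::function<Handle(const std::string&)>;

    explicit LibraryCache(Opener opener) : opener_(std::move(opener)) {}

    Handle Get(const std::string& library) {
        auto it = handles_.find(library);
        if (it != handles_.end()) return it->second;  // already loaded

        Handle h = opener_(library);  // platform call in a real build
        if (h == nullptr)
            throw std::runtime_error("cannot load " + library);
        handles_.emplace(library, h);
        return h;
    }

private:
    Opener opener_;
    std::map<std::string, Handle> handles_;
};
```

Keeping the technology name in the library name means that an OCCI-only configuration never asks the loader for the ODBC library at all.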

An important design consideration for this solution is mapping classes to technology-specific components (binaries). Notice that DbKit consists of three shared libraries: one specific to ODBC, one for OCCI, and a third that is database independent. The same holds true for the reference implementation, with the addition of a main executable that is database independent (rather obviously). This breakdown is necessary to avoid unwanted runtime dependencies. In cases where classes or interfaces are required by multiple components, we placed them in one of the technology-independent shared libraries.

Final Remarks

Database development in C++ is hard work; there's just no easy way around it. Expectations are constantly on the rise in the software business, and database portability offers no safe haven from this trend.

Design patterns provide an effective solution to these and other real-world programming problems. In this article, we showed how to use design patterns to design and build one solution to achieve database portability.

Acknowledgments

Many thanks to Brian Oberholtzer and Daniel Hannum of Siemens Health Services for their ideas and techniques for obtaining interfaces from dynamically loaded libraries in a portable manner.

Bibliography

[1] http://www.datadirect-technologies.com/techres/doc-wp/odbc/WP_ODBCvsOCI.PDF.

[2] http://www.firstsql.com/ioodbc4.htm.

[3] Buschmann, Frank et al. Pattern-Oriented Software Architecture: A System of Patterns. John Wiley and Sons Ltd., 1996.

[4] Fowler, Martin. Patterns of Enterprise Application Architecture. Pearson Education Inc., 2003.

[5] Gamma, Erich et al. Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley, 1995.

[6] Krasner, Glenn E. and Stephen T. Pope. "A Cookbook for Using the Model View Controller User Interface Paradigm in Smalltalk-80." Journal of Object-Oriented Programming, August/September 1988.

[7] Martin, Robert C. Agile Software Development, Principles, Patterns, and Practices. Prentice Hall, 2003.

Michael Lutz is a software engineer for Siemens Health Services, Colin Neill is an assistant professor of software engineering at Pennsylvania State University, and Phillip Laplante is an associate professor of software engineering, also at Pennsylvania State University. They can be contacted at michael.h.lutz@siemens.com, cjn6@psu.edu, and plaplante@psu.edu, respectively.