C/C++ Users Journal April, 2005
Any programmer who has used a database interface such as the Oracle native API, ODBC, OLE DB, or ADO, knows that application code can get polluted by the types and structures these database interfaces supply. In other words, any technology you use tries to dominate your program! In this article, I present a library that provides a straightforward interface to any SQL database, thereby decoupling you from tedious database structures. Mixing the handle-body idiom [1] with templates creates the decoupling without losing type safety. All you have to do is decide which database you want to use, then add/manipulate/extract data from the database by executing SQL statements, simple transactions, queries, and autocreation of UPDATE, INSERT, and DELETE statements.
In 1994, I worked for a Danish company that developed systems for public libraries. The system ran on top of an Oracle 7 database on a Supermax multiprocessor UNIX computer (also IBM AIX and HP). The programming interface to the Oracle database was a flat C interface, where SQL statements could be executed and result sets retrieved. The problem with this architecture was that my employer was stuck with the Oracle database. It was impossible to change the underlying database without rewriting the whole programand the library system was then hard-coupled to the Oracle database.
During the 1990s, ODBC, ADO, and OLE DB came along to make the software loose-coupled to the database. The good thing about ODBC is that there are drivers for most databases on both Windows and UNIX [2]. The bad thing is that the interface is a flat C interface that has no understanding of C++ types and exceptions. Microsoft tried to fix this by wrapping some of the ODBC API in MFC, but it is old-fashioned and had no understanding of types such as std::string or boost::posix_time::ptime. In the late 1990s, Microsoft introduced COM interfaces and object models for database access by means of ADO and OLE DB. The two models are object models and give a uniform interface to different kinds of databases for Win32 clients. Alas, everything is soiled with the tedious COM types. Of course, a Visual Basic 6 (VB6) programmer will never notice this because of the close coupling between COM and VB6, but this is a nightmare for C++ programmers. Moreover, the client applications are no longer portable as with ODBC code.
Listing 1 assumes a table "Persons" in a SQL server has been created, and an ODBC DSN called "MyLife" has been made. As seen, you can easily write data to a database. But what about database queries? This is done through a template class called search. The idea behind the search class is to make it work like a Standard Library iterator. A search needs a C++ record type for storing each row fetched. In Listing 2, a record type called Person is created. It needs to implement the static template method create_bindings. This method creates a mapping between the data field in the record and data column in the fetched row by calling a bind method for each variable. The method definition and bind methods are encapsulated into easy-to-use macrosBEGIN_DB_BINDING, DB_BIND_COLUMN_NAME, and END_DB_BINDING, among others (Listing 2).
The search object is instantiated using the database object and a SQL select statement. The end_s object is a "past-the-end" iterator analogous to the use of std::istream_iterator from the Standard Library. The search class conforms to an std::input_iterator concept and can be used any place an std::input_iterator is permitted. An std::input_iterator has only pre- and postincrement operators, so it is only possible to move forward in the result set.
The database class handles the connection to the database and takes care of transactions. The implementation of the database class is straightforward. As Figure 1 illustrates, the implementation is divided into two using the counted-handle-body idiom [1]. The use of the counted-handle-body decouples the client from the implementation and makes lifetime control easy. The handle class gives a database-independent interface to users, while the body class contains knowledge about the database technology. When using ODBC, the body class controls an ODBC environment handle, a connection handle, and the ability to create new statement handles. The reference count ensures that the noncopyable body class can be sent as arguments to methods without forcing the client to use references. The state of a database connection cannot be copied without representing exactly the same connection, so the internal reference count eases the client programmer task of controlling the lifetime of a specific database object.
This class can perform almost any task that can be made through SQL statements by using the execute command. The only task it cannot do is retrieve records from the database when making a database search. This task is left to the search class.
Our initial design had methods directly on the database object that could control transactions (begin, commit, and rollback). The problem with this approach is that the mix of directly called transaction lifetime methods (begin, commit, and rollback) and exceptions is not a good cocktail because it contradicts the RAII idiom. Consequently, we created a transaction class that handles the database transaction as a resource (see Figure 1), similar to the boost::mutex class. The inspiration for the approach is from the Microsoft Transaction Server (MTS). Listing 3 uses this class and creates more than one transaction object on the same database transaction.
In Listing 3, the first transaction object is created and sets a reference count on the database transaction to 1. When the second transaction object is created, the reference count is incremented. When any transaction object runs out of scope, it votes set_complete (meaning "yes") or puts a veto on the transaction and decrements the reference count. When all transaction objects run out of scope, all votes are counted. If any transaction has put a veto on the transaction by calling transaction::set_abort or has forgotten to vote, the database transaction rolls back. If all transaction objects called transaction::set_complete, the database transaction is committed. If exceptions are thrown, any transaction object that runs out of scope calls its set_abort method.
The search class handles SQL statements that return record sets, and provides a convenient wrapper for reading these result sets from the database. The search template just needs to know the type of the object into which each row should be stored. It then binds each database-specific variable in the record type to its specific address in the record-type layout. This is done by a user-defined template method called create_bindings, which links the record variable to the columns in the result set. Figure 4 shows the result of such a binding.
It took many iterations before the design of the search class felt right. Implementing the search class as an std::input_iterator gives the best result because a search object needs to be copied; for example, when using the std::copy algorithm. It is nearly impossible to copy the state of a database cursor, so instead a copy of a search object shares the state with the original search object. It might have been a more natural choice to implement the search class as an std::forward_iterator, but this iterator type cannot handle shared statehence, the choice of the std::input_iterator in accordance with ¤24.1.1 (Iterator library) [4].
Listing 4 demonstrates alternatives to Listing 2 for iteration of data using the search class. As you can see in Listing 4, the end() method is used as a past-the-end iterator. It doesn't contradict the concept of an std::input_iterator to have a method such as end(). For our purposes, we found the use of an end method more convenient.
The design and implementation of the search class gave us headaches because we wanted to use the counted-body idiom so that the user code did not get polluted by database noise. However, we also used templates. Consequently, we cheated by injecting a generic layer between the search template and the actual binding code (Figure 1). In this layer, we have the search_binder and the search_binder_impl, which use the counted-handle-body idiom and remove the type information about the user-defined record type by converting it into void*. Admittedly, this sounds like "turning to the dark side" of C++ programming, but we made a promise to separate the client and the implementation, and to fulfill this we use void*. The Boost library [3] also uses void* in bind, function, and signal, so the idea is not new. Listing 5 shows the search class.
The actual binding is done through a cascade of templated bind member methods. Each layer has its own set of bind methods. The separation between the client and the implementation means that the actual implementations need to be defined in the .cpp files and put into the library; see Figure 2. This is possible due to explicit template instantiation (see ¤14.7.2 in [4]) and is implemented in most compilers. The drawback of this implementation technique is that some errors aren't caught until link time and they can be hard to figure out.
Each call to the bind method creates an instance of a subclass of column_binder. The purpose of the column_binder is to bind the data field in the C++ record to the column in the database. Saving the offset of the C++ member variable in the column_binder does this. The concrete class of the column_binder decides how data is moved from database to C++ variable. For the simple types, such as bool, int, unsigned int, short, double, and so on, this is a simple task because there is a one-to-one correspondence to the db representation and the general col_binder_impl<T> is used (see Figure 3). For more special tasks, such as std::string and boost::posix_time::ptime, template specialization is used because special conversion from the db type to the C++ type representation is required.
When the actual search is executed, each fetch of a row iterates through all the column_binders and calls column_binder::update_field, which does the actual conversion from the database type to the C++ type. Listing 6 is the general case, while Listing 7 presents the update_field for template specialization col_binder_impl< boost::posix_time::ptime>. The template method get_memvar_ref<T>(void * record) uses the saved offset to find the specific entry in the C++ record and returns it as a strong-typed C++ variable.
The binding of a variable to a column happens after the SQL statement has been executed. It can either be bound to an increasing column index, specific column index, or column name. This means that the same field from the database record set can be stored into one or more different variables in the user-defined record class.
When a search object is instantiated, it first makes the search_binder create the search_binder_impl. The constructor for search_binder_impl retrieves a database statement and returns to search_binder, which then calls open on the search_binder_impl that executes the SQL statement (Figure 3). The search object then makes a callback to the user-defined create_bindings method (defined by the BEGIN_DB_BINDING macro). The used mechanism has similarities to the double-dispatch mechanism as used in the Visitor Pattern [5], except that template methods are used instead of virtual methods. The create_bindings method binds each database-persistent member referenced in the DB_BINDING map to an increasing index, specific index, or specific column name. Each call to bind goes from search to search_binder, and then to search_binder_impl. The search_binder_impl holds a metadata_buffer that holds an entry for each variable that is bound.
Each entry in the metadata_buffer keeps the offset to the user-defined record type where the variable is defined and has an object that can convert from the database type to the C++ type. This converter class is called a column_binder (see Figure 4).
After the binding process has finished and metadata has been set up, the data is retrieved from the database. The search object eventually calls search_binder_impl::bind_ all_values_to_db, which makes the actual ODBC binding. If the index is already known, it binds using this index. If only the column name is known, search_ binder_impl uses the ODBC metadata to get the index from the name.
The search object then calls get_next_value. This method calls the ODBC ::SQLFetch method. After the fetch method is called, each column_binder is asked to do its magic and convert and/or copy the database values into its C++ companion (see column_binder::update). In the simple case, this is just a simple copy. In the case of complex variables such as std::string, a conversion from a char* to an std::string is performed. The size of the local char* buffer is either fixed (set by users) or set when metadata from the database is accessed in the case of a column name.
Besides normal values, a database can hold null values. This extra value has no direct resemblance in the C++ type system, so when bound to a normal plain C++ variable, an exception is thrown if the database value is NULL. To handle the null values, use the boost::optional template class. If you know your database can return a null value for an integer field, use a boost::optional<int> instead of a normal C++ int.
There are two different kinds of types that are supported by the system. The first kind is the plain-old-data-type that has a one-to-one mapping into the database type system. The second kind is the type that needs to be converted after the values are retrieved from the database, such as the std::string type. The first can be handled using a general-case template, while the second can be handled using template specialization. Table 1 lists the types that are supported by the database library.
A more complex example that also uses boost::posix_time::ptime when reading data can be found in Listing 8.
When working with objects, it is not enough just to be able to manipulate and do queries through the database and the search objects. The ability to store, update, or delete from the database is also needed. All this is supported through the persist template. An example that inserts/updates/deletes an object might look like:
database db("MyLife", "sa", "");
std::string tableName = "Persons";
persist<Person> p(db, tableName);
Person value("Asger, my only son", 2);
p.insert(value);
value.name_= "Asger, my first son out of two";
// setting a nonunique field
p.update(value);
Person value2("Sigurd, my second son", 0);
p.erase(value); // delete is a keyword // in C++, so "erase" is used instead
There is a specific need for more information than the search object neededthe data direction (is the data coming or going to the database?) and whether the field is a part of a subset of the fields that can uniquely identify the row. This information can be added as field attributes in the create_bindings method, as in Listing 9.
The implementation is straightforward. The persist_binder_impl (similar to the search_ binder_impl) just iterates over the fields and generates the appropriate SQL statements (INSERT INTO, UPDATE, DELETE FROM). The update and delete statements only work for records that are bound to column names because we need unique identification of the specific row. During the generation of the SQL statement, each kind of statement has its own set of dynamic checks to see if the record class conforms to the statement. One of these checks is that the update statement needs to have a least one field that is marked as unique and to_db.
There are still tasks that need to be looked at in the next version of the librarybulk fetch, BLOBs (and lazy fetching of them), stored procedures as function objects, autocreation of select statements, and a metadata class. We also need to look at exception guarantees and performance and concurrency issues. We are planning to support new types, such as MFC or Boost, on a library basis. You just include the right file and voilàyou have support for the new types (in VC++ 7.1, we use autolinking).
As long as any database supports the concept of database connections and cursors, the methods presented here work, and the database-specific code won't pollute the user code because of the compilation firewalls. As recommended in [9], I talked to database users, database experts, and C++ gurus before creating the library. I also looked into already existing libraries [6, 7, 8] for a domain scan. Many of the libraries I looked at could do more than this small lightweight library, but they all had issues that I disliked. This approach let me create a framework that supports the most common tasks, while maintaining the slickness of a Standard-friendly library.
Boost is currently working on a database library that uses templates and metaprogramming to generate SQL statements so they can be checked at compile time. It will be interesting to see the outcome!