Dr. Dobb's Sourcebook September/October 1997
Before the Object Database Management Group (ODMG), the lack of a standard for object databases was a major limitation to their more widespread use. The success of relational database systems did not result simply from a higher level of data independence and a simpler data model than previous systems. Much of the success came from standardization. The acceptance of the SQL standard allows a high degree of portability and interoperability between systems, simplifies learning new relational DBMSs (RDBMSs), and represents a wide endorsement of the relational approach.
All of these factors are important for ODBMSs, as well. The scope of ODBMSs is more far reaching than that of RDBMSs, integrating the programming language and database system, and encompassing all of an application's operations and data. A standard is critical to making such applications practical.
The ODMG's primary goal is to put forward a set of standards allowing ODBMS programmers to write portable applications; that is, applications that run on more than one ODBMS product. The data schema, programming language binding, data manipulation, and query languages must be portable.
It is important to define the scope of our efforts, since ODBMSs provide an architecture that is significantly different than other DBMSs -- they are a revolutionary rather than evolutionary development. Rather than providing only a high-level language such as SQL for data manipulation, an ODBMS transparently integrates database capability with the application programming language. This transparency makes it unnecessary to learn a separate DML, obviates the need to explicitly copy and translate data between database and programming language representations, and supports substantial performance advantages through data caching in applications. The ODBMS includes the query language capability of relational systems as well, and the query language model is more powerful -- it incorporates lists, arrays, and results of any type.
In short, we define an ODBMS to be a DBMS that integrates database capabilities with object-oriented programming language capabilities. An ODBMS makes database objects appear as programming language objects, in one or more existing programming languages. The ODBMS extends the language with transparently persistent data, concurrency control, data recovery, associative queries, and other database capabilities.
An in-depth discussion of the complete ODMG 2.0 standard is clearly beyond the scope of this article. For definitive information on ODMG 2.0, see The Object Database Standard: ODMG 2.0 (Morgan Kaufmann Publishers, 1997).
Release 2.0 of the ODMG Standard differs from Release 1.2 in a number of ways. With the wide acceptance of Java, we added a Java Persistence Standard in addition to the existing Smalltalk and C++ ones. We made the ODMG object model much more comprehensive, added a metaobject interface, defined an object interchange format, and worked to make the programming language bindings consistent with the common model. We made changes throughout the specification based on several years of experience implementing the standard in object database products.
As with Release 1.2, we expect future work to be backward compatible with Release 2.0. Although we expect a few changes to come, for example to the Java binding, the Standard should now be reasonable stable.
The major components of ODMG 2.0 are:
It is possible to read and write the same database from C++, Smalltalk, and Java, as long as you stay within the common subset of supported data types. Unlike SQL in relational systems, ODBMS Data Manipulation Languages are tailored to specific application-programming languages, in order to provide a single, integrated environment for programming and data manipulation. We don't believe exclusively in a universal DML syntax. We go further than relational systems, as we support a unified object model for sharing data across programming languages, as well as a common query language.
A better understanding of the architecture of an ODBMS will help put the components we have discussed into perspective. Figure 1 illustrates the use of the typical ODBMS application we are trying to standardize.
The programmer writes declarations for the application schema (both data and interfaces), plus a source program for the application implementation. The source program is written in a programming language (PL) such as C++, using a class library that provides full database OML including transactions and object query. The schema declarations may be written in an extension of the programming language syntax, labeled PL ODL in the figure, or in a programming-language-independent ODL. The latter could be used as a higher level design language, or to allow schema definition independent of programming language.
The declarations and source program are then compiled and linked with the ODBMS to produce the running application. The application accesses a new or existing database, with types that must conform to the declarations. Databases may be shared with other applications on a network; the ODBMS provides a shared service for transaction and lock management, allowing data to be cached in the application.
The Object Definition Language (ODL) is the declarative portion of C++ ODL/OML. The C++ binding of ODL is expressed as a library that provides classes and functions to implement the concepts defined in the ODMG object model. OML is a language used for retrieving objects from the database and modifying them. The C++ OML syntax and semantics are those of standard C++ in the context of the standard class library.
ODL/OML specifies only the logical characteristics of objects and the operations used to manipulate them. It does not discuss the physical storage of objects. It does not address the clustering or memory management issues associated with the stored physical representation of objects or access structures (like indices used to accelerate object retrieval). In an ideal world, these would be transparent to the programmer. In the real world, they are not. An additional set of constructs called "physical pragmas" is defined to give the programmer some direct control over these issues, or at least to enable a programmer to provide "hints" to the storage management subsystem provided as part of the ODBMS run time. Physical pragmas exist within the ODL and OML. They are added to object type definitions specified in ODL, expressed as OML operations, or shown as optional arguments to operations defined within OML. These pragmas are not in any sense stand-alone languages, but rather a set of constructs added to ODL/OML to address implementation issues.
The programming-language-specific bindings for ODL/OML are based on one basic principle -- that the programmer feels that there is one language, not two separate languages with arbitrary boundaries between them.
While no Smalltalk language standard exists at this time, ODMG member organizations participate in the X3J20 ANSI Smalltalk standards committee. We expect that (as standards are agreed upon by that committee and commercial implementations become available) the ODMG Smalltalk binding will evolve to accommodate them. In the interests of consistency and until an official Smalltalk standard exists, we will map many ODL concepts to class descriptions as specified by Smalltalk-80.
The ODMG Smalltalk binding is based upon two principles -- it should bind to Smalltalk in a natural way that is consistent with the principles of the language, and it should support language interoperability consistent with ODL specification and semantics. We believe that organizations specifying their objects in ODL will insist that the Smalltalk binding honor those specifications. These principles have several implications that are evident in the design of the binding:
As with other language bindings, ODMG Java binding is based on one fundamental principle -- the programmer should perceive the binding as a single language for expressing both database and programming operations, not two separate languages with arbitrary boundaries between them. This principle has several corollaries:
The Java binding provides persistence by reachability, like the ODMG Smalltalk binding (this has also been called "transitive persistence"). On database commit, all objects reachable from database root objects are stored in the database.
The Java binding provides two ways to declare persistence-capable Java classes:
One possible ODMG implementation that supports these capabilities would be a postprocessor that takes as input the Java .class file (bytecodes) produced by the Java compiler, then produces new modified bytecodes that support persistence. Another implementation would be a preprocessor that modifies Java source before it goes to the Java compiler. Another implementation would be a modified Java interpreter.
We want a binding that allows all of these possible implementations. Because Java does not have all the hooks we might desire, and the Java binding must use standard Java syntax, it is necessary to distinguish special classes understood by the database system. These classes are called persistence-capable classes. They can have both persistent and transient instances. Only instances of these classes can be made persistent. The current version of the standard does not define how a Java class becomes a persistence-capable class.
DDJ