ODMG 2.0: An Overview

Dr. Dobb's Sourcebook September/October 1997

Release 2.0 adds Java binding and more

By Douglas K. Barry

Rick Cattell is chair and Douglas Barry executive director of the Object Database Management Group. They can be contacted at http://www .odbms.org/. This article is adapted from The Object Database Standard: ODMG 2.0 (Morgan Kaufmann Publishers, © 1997).

Sidebar: The Online Database Derby

Before the Object Database Management Group (ODMG), the lack of a standard for object databases was a major limitation to their more widespread use. The success of relational database systems did not result simply from a higher level of data independence and a simpler data model than previous systems. Much of the success came from standardization. The acceptance of the SQL standard allows a high degree of portability and interoperability between systems, simplifies learning new relational DBMSs (RDBMSs), and represents a wide endorsement of the relational approach.

All of these factors are important for ODBMSs, as well. The scope of ODBMSs is more far reaching than that of RDBMSs, integrating the programming language and database system, and encompassing all of an application's operations and data. A standard is critical to making such applications practical.

The ODMG's primary goal is to put forward a set of standards allowing ODBMS programmers to write portable applications; that is, applications that run on more than one ODBMS product. The data schema, programming language binding, data manipulation, and query languages must be portable.

It is important to define the scope of our efforts, since ODBMSs provide an architecture that is significantly different than other DBMSs -- they are a revolutionary rather than evolutionary development. Rather than providing only a high-level language such as SQL for data manipulation, an ODBMS transparently integrates database capability with the application programming language. This transparency makes it unnecessary to learn a separate DML, obviates the need to explicitly copy and translate data between database and programming language representations, and supports substantial performance advantages through data caching in applications. The ODBMS includes the query language capability of relational systems as well, and the query language model is more powerful -- it incorporates lists, arrays, and results of any type.

In short, we define an ODBMS to be a DBMS that integrates database capabilities with object-oriented programming language capabilities. An ODBMS makes database objects appear as programming language objects, in one or more existing programming languages. The ODBMS extends the language with transparently persistent data, concurrency control, data recovery, associative queries, and other database capabilities.

An in-depth discussion of the complete ODMG 2.0 standard is clearly beyond the scope of this article. For definitive information on ODMG 2.0, see The Object Database Standard: ODMG 2.0 (Morgan Kaufmann Publishers, 1997).

ODMG 2.0

Release 2.0 of the ODMG Standard differs from Release 1.2 in a number of ways. With the wide acceptance of Java, we added a Java Persistence Standard in addition to the existing Smalltalk and C++ ones. We made the ODMG object model much more comprehensive, added a metaobject interface, defined an object interchange format, and worked to make the programming language bindings consistent with the common model. We made changes throughout the specification based on several years of experience implementing the standard in object database products.

As with Release 1.2, we expect future work to be backward compatible with Release 2.0. Although we expect a few changes to come, for example to the Java binding, the Standard should now be reasonable stable.

The major components of ODMG 2.0 are:

Object Model. We have used the OMG Object Model as the basis for our model. The OMG core model was designed to be a common denominator for object request brokers, object database systems, object programming languages, and other applications. In keeping with the OMG Architecture, we have designed an ODBMS profile for the model, adding components (relationships) to the OMG core object model to support our needs. Release 2.0 introduces a meta model.
Object Specification Languages. One of the specification languages for ODBMSs is the Object Definition Language (or ODL), which is distinguished from traditional database data definition languages (DDLs). We use the OMG Interface Definition Language (IDL) as the basis for ODL syntax. Release 2.0 adds another language, the Object Interchange Format (OIF), which can be used to exchange objects between databases, provide database documentation, or drive database test suites.
Object Query Language. We define a declarative (nonprocedural) language for querying and updating database objects. We have used relational SQL as the basis for the Object Query Language (OQL) where possible, though OQL supports more powerful capabilities.
C++ Language Binding. The C++ binding to ODBMSs includes a version of the ODL that uses C++ syntax, a mechanism to invoke OQL, and procedures for operations on databases and transactions.
Smalltalk Language Binding. The Smalltalk binding also includes a mechanism to invoke OQL, and procedures for operations on databases and transactions.
Java Language Binding. The binding between the ODMG Object Model (ODL and OML, which stands for Object Manipulation Language) and the Java programming language as defined by Version 1.1 of the Java Language Specification. The Java language binding also includes a mechanism to invoke OQL, and procedures for operations on databases and transactions.

It is possible to read and write the same database from C++, Smalltalk, and Java, as long as you stay within the common subset of supported data types. Unlike SQL in relational systems, ODBMS Data Manipulation Languages are tailored to specific application-programming languages, in order to provide a single, integrated environment for programming and data manipulation. We don't believe exclusively in a universal DML syntax. We go further than relational systems, as we support a unified object model for sharing data across programming languages, as well as a common query language.

A better understanding of the architecture of an ODBMS will help put the components we have discussed into perspective. Figure 1 illustrates the use of the typical ODBMS application we are trying to standardize.

The programmer writes declarations for the application schema (both data and interfaces), plus a source program for the application implementation. The source program is written in a programming language (PL) such as C++, using a class library that provides full database OML including transactions and object query. The schema declarations may be written in an extension of the programming language syntax, labeled PL ODL in the figure, or in a programming-language-independent ODL. The latter could be used as a higher level design language, or to allow schema definition independent of programming language.

The declarations and source program are then compiled and linked with the ODBMS to produce the running application. The application accesses a new or existing database, with types that must conform to the declarations. Databases may be shared with other applications on a network; the ODBMS provides a shared service for transaction and lock management, allowing data to be cached in the application.

Programming Language Bindings

The Object Definition Language (ODL) is the declarative portion of C++ ODL/OML. The C++ binding of ODL is expressed as a library that provides classes and functions to implement the concepts defined in the ODMG object model. OML is a language used for retrieving objects from the database and modifying them. The C++ OML syntax and semantics are those of standard C++ in the context of the standard class library.

ODL/OML specifies only the logical characteristics of objects and the operations used to manipulate them. It does not discuss the physical storage of objects. It does not address the clustering or memory management issues associated with the stored physical representation of objects or access structures (like indices used to accelerate object retrieval). In an ideal world, these would be transparent to the programmer. In the real world, they are not. An additional set of constructs called "physical pragmas" is defined to give the programmer some direct control over these issues, or at least to enable a programmer to provide "hints" to the storage management subsystem provided as part of the ODBMS run time. Physical pragmas exist within the ODL and OML. They are added to object type definitions specified in ODL, expressed as OML operations, or shown as optional arguments to operations defined within OML. These pragmas are not in any sense stand-alone languages, but rather a set of constructs added to ODL/OML to address implementation issues.

The programming-language-specific bindings for ODL/OML are based on one basic principle -- that the programmer feels that there is one language, not two separate languages with arbitrary boundaries between them.

While no Smalltalk language standard exists at this time, ODMG member organizations participate in the X3J20 ANSI Smalltalk standards committee. We expect that (as standards are agreed upon by that committee and commercial implementations become available) the ODMG Smalltalk binding will evolve to accommodate them. In the interests of consistency and until an official Smalltalk standard exists, we will map many ODL concepts to class descriptions as specified by Smalltalk-80.

The ODMG Smalltalk binding is based upon two principles -- it should bind to Smalltalk in a natural way that is consistent with the principles of the language, and it should support language interoperability consistent with ODL specification and semantics. We believe that organizations specifying their objects in ODL will insist that the Smalltalk binding honor those specifications. These principles have several implications that are evident in the design of the binding:

There is a unified type system that is shared by Smalltalk and the ODBMS. This type system is ODL as mapped into Smalltalk by the Smalltalk binding.
The binding respects the Smalltalk syntax, meaning the Smalltalk language will not have to be modified to accommodate this binding. ODL concepts will be represented using normal Smalltalk coding conventions.
The binding respects the fact that Smalltalk is dynamically typed. Arbitrary Smalltalk objects may be stored persistently, including ODL-specified objects, which will obey the ODL typing semantics.
The binding respects the dynamic memory-management semantics of Smalltalk. Objects will become persistent when they are referenced by other persistent objects in the database, and will be removed when they are no longer reachable in this manner.

As with other language bindings, ODMG Java binding is based on one fundamental principle -- the programmer should perceive the binding as a single language for expressing both database and programming operations, not two separate languages with arbitrary boundaries between them. This principle has several corollaries:

There is a single, unified type system shared by the Java language and the object database; individual instances of these common types can be persistent or transient.
The binding respects the Java language syntax, meaning that the Java language will not have to be modified to accommodate this binding.
The binding respects the automatic storage management semantics of Java. Objects will become persistent when they are referenced by other persistent objects in the database, and will be removed when they are no longer reachable in this manner.

The Java binding provides persistence by reachability, like the ODMG Smalltalk binding (this has also been called "transitive persistence"). On database commit, all objects reachable from database root objects are stored in the database.

The Java binding provides two ways to declare persistence-capable Java classes:

Existing Java classes can be made persistence capable.
Java class declarations (as well as a database schema) may automatically be generated by a preprocessor for ODMG ODL.

One possible ODMG implementation that supports these capabilities would be a postprocessor that takes as input the Java .class file (bytecodes) produced by the Java compiler, then produces new modified bytecodes that support persistence. Another implementation would be a preprocessor that modifies Java source before it goes to the Java compiler. Another implementation would be a modified Java interpreter.

We want a binding that allows all of these possible implementations. Because Java does not have all the hooks we might desire, and the Java binding must use standard Java syntax, it is necessary to distinguish special classes understood by the database system. These classes are called persistence-capable classes. They can have both persistent and transient instances. Only instances of these classes can be made persistent. The current version of the standard does not define how a Java class becomes a persistence-capable class.

For More Information

Object Database Management Group
14041 Burnhaven Drive, Suite 105
Burnsville, MN 55337
612-953-7250
http://www.odmg.org/