Ed is an independent database developer and can be reached at P.O. Box 390, Effort, PA 18330, or on MCI Mail at 590-6310.
In the database world, "interoperability" refers to the ability to build applications that simultaneously access data in databases from different vendors. And since most databases supposedly comply with ANSI/ISO/FIPS SQL standards, you'd expect smooth interoperability between them. After all, with relatively little trouble, you can compile C programs using different ANSI C compilers. But the database world is different.
Transportability, on the other hand, is the ability to move an application from one network/hardware combination to another while continuing to use the same database. Being able to do this lets you scale hardware to match changes in the size of the user organization and the volume of application traffic. In another sense, transportability is the ability to write an application and move it from one vendor's database to another.
Interoperability and transportability are achieved using various strategies, among them: single vendor, 4GL, middleware, gateway, 3GL, and CASE. Each strategy is most appropriate in certain situations, and every major database provider supports two or three--even though none are 100 percent effective.
In the single-vendor approach to an open-systems environment, you achieve a heterogeneous hardware environment by standardizing on the database vendor. By leveraging the UNIX operating system, most of the database vendors will claim to support upwards of 100 operating environments. The vendor supplies you with the database engine, application-development tools, and (for client/server application architectures) a networking layer that makes the particular network you're using transparent to the vendor's engine and tools. If you adopt this strategy, you can scale your database server with little impact on application code. The cost is that you're locked into a single vendor.
This strategy has its complications. Vendors gear their development efforts toward the systems most of their customers use. Thus, new versions are slow to appear on otherwise popular operating systems not traditionally used to host database servers; SCO UNIX is one example.
Networking is not always well implemented because, again, vendors put their efforts where customers are, not where they might be going. Thus, a typical vendor will support TCP/IP for its UNIX servers, SNA protocols for IBM servers, and SPX/IPX for PC-based servers. But what if you'd like a PC server to participate in a distributed application with a UNIX or IBM server?
Many companies with PC support under Oracle v6 discovered that they would have to implement TCP/IP on all their PCs to support Oracle's SQL*Net to standard UNIX servers. Even under Oracle v7, instead of committing to implementing SPX/IPX listeners for the various UNIX servers, Oracle rewrote SQL*Net to permit the equivalent of a SQL*Net router which converts packets from one protocol to another.
Support is also an issue in the single-vendor strategy. Of the numerous platforms that a database might claim to support, many are not directly supported by the vendor. If you're buying Oracle for a Silicon Graphics computer, for instance, the port to the Silicon Graphics version of UNIX was done by Silicon Graphics. That machine doesn't have enough database customers to warrant Oracle investing in a port; Silicon Graphics wants Oracle on its box, so it works with Oracle to do the port. If you run into real problems, you'll receive support from both companies, but neither will handle your problem well: Oracle lacks experience with the native environment, and the hardware vendor typically doesn't have much database experience. Similar situations exist for Ingres, Sybase, Informix, and others.
Finally, the single vendor you select may be slow to adopt new technologies. Most database vendors were slow to move to PCs and even slower in supporting GUI environments. Complete toolsets under Windows, Presentation Manager, and so on still aren't available for most of the major vendors.
4GL vendors base their strategy on the fact that the database vendors tend to leapfrog each other in engine technology. Why standardize on a database engine when the engine you choose will be surpassed by a competitor within a year? The benchmark competition among the database vendors is legendary, with vendors routinely announcing new performance records on a hardware platform different from last week's record breaker.
Instead of standardizing on the database, the 4GL strategy is to standardize on the application-building tool. After all, you want all your applications to have a common look-and-feel. Therefore, you should let the 4GL tool hide the complex details of the underlying database so that you can write applications in a standard environment and treat the database as a commodity item.
However, most 4GL tools require the networking components provided by the underlying database. If you're using Oracle, the tool will probably need SQL*Net; for Sybase, it will need DB-Library. Under DOS and Windows, the number of drivers you must load into real memory can then leave too little room to run much of anything else.
When you need to do distributed queries across databases from more than one vendor, the data is typically joined by the tool in your PC. This bypasses the global-query optimization algorithms the database vendors implemented precisely to avoid the horrendous network traffic such joins generate. Both Paradox and Advanced Revelation take this approach, as do most 4GL vendors that support multivendor joins.
Even if you just want to write applications against one database at a time, you may incur suboptimal performance. Many PC development tools have added SQL support by providing a driver in the tool that treats the SQL database like a flat-file manager. Anytime data is needed, a SELECT statement is issued against a single table. The performance of this type of implementation strategy can be devastating. Similarly, many tools simply turn on auto-commit and have no real transaction support. If your tool supports SQL, ASCII, dBase, and Paradox data formats, ask the vendors about the sophistication of their drivers.
Uniface, on the other hand, has a component called a Polyserver, which resides on the database server and implements Uniface's own data dictionary. Features are used in the underlying database when available, and simulated when not. Uniface thus guarantees a feature set much richer than a least-common-denominator approach--there's an intelligent driver for each database supported. The downside is that you're committed to using Uniface's networking and application-development components, which have trailed their competitors in supporting GUI features.
Middleware vendors include companies such as Information Builders with its EDA/SQL product. This strategy provides a networking interface that's standardized on both sides. This supports an open-system strategy in that the database networking infrastructure is independent of front-end and back-end vendors. This enables you to potentially change vendors on either side because the change doesn't involve altering the infrastructure. If nothing else, this gives you greater clout when negotiating contracts with database and front-end tool vendors.
At the programming level, the benefit is that the client software uses exactly the same call-level interface, regardless of the database on the back end. This makes it easy for a tool vendor to support numerous databases. Simply writing one driver for the tool that supports a popular middleware product makes all the database engines that have a driver for that middleware product "supported" by the tool.
However, because the tool vendor is writing to a generic interface and can't be sure which database engine is on the other side, the SQL supported is typically generic ANSI SQL. This virtually guarantees a least-common-denominator level of support--SQL features that ensure performance, data integrity, and location transparency probably won't be available to your application, which nullifies most of your reasons for switching to SQL in the first place. This will change as the interface standards being promulgated by the SQL Access Group become more feature rich.
A variation of the middleware strategy is the RPC approach, which database vendors use to implement their own networking components. I avoid this unless I'm developing an application with very special requirements because RPCs are a major complication. They bypass existing technology, increase the skill needed by maintenance personnel, and require you to rewrite existing applications to take advantage of RPC.
The gateway strategy provides a co-existence stage for converting from one database vendor to another. The purpose of a database gateway is to make a "foreign" database behave like the database provided by the gateway vendor. For instance, the Oracle gateway for DB2 allows DB2 to be accessed by tools that can interface with Oracle's SQL*Net. If you're familiar with Oracle and want to access DB2 data from your local workstation, this is a viable approach: You can move data very easily from DB2 into Oracle, run SQL*Plus scripts, generate ad hoc reports, and the like.
Unfortunately, any tool that depends on the Oracle system tables won't work, since the Oracle gateway doesn't simulate these tables in DB2. That excludes SQL*Forms and most other development tools. SQL extensions such as Oracle's hierarchical queries (tree traversals in an inventory parts explosion, for example) aren't supported either, since DB2 has no equivalent facility. You end up with the intersection of the features supported by the two database engines, usable only from tools that don't rely on the system tables of either database.
This isn't to pick on Oracle. Most database gateways are simply dynamic SQL programs with little specific knowledge about the database they interface with. Typically, such programs don't do much beyond data-type translation between the two database systems, but they aren't expensive to develop, and there are marketing reasons to do so.
It's more expensive to develop a higher-quality gateway that really masks the differences between the two systems. Oracle is planning to redo their DB2 gateway to make it more functional and competitive with similar gateways from Ingres and Sybase, but differences between databases will always be visible through the gateway.
The gateway strategy is inappropriate as a general-purpose infrastructure component. Information Builders has many database gateways available for FOC-Net and EDA/SQL, but it's unwise to route transaction-oriented production traffic through such gateways as part of a middleware strategy.
Gateways do have good specific uses, though. It's reasonable to make a gateway part of a decision-support infrastructure, as long as the traffic is relatively low volume. It's also reasonable to make a gateway part of a data-replication or data-warehouse strategy. The key is that transaction traffic should never run directly against the gateway.
3GL support for SQL generally comes as precompilers and call interfaces. Of the two, precompilers are the easier way to write transportable code. The typical precompiler supports either SAA or ANSI SQL syntax, and usually both static and dynamic SQL. Truly static SQL requires that the engine support some form of stored execution plan, usually implemented as a stored procedure or access plan.
If you want to support a specific group of databases when using a precompiler, the differences can be handled via the conditional-compile facility. Oddly, connection management lacks standardization, so no matter how plain-vanilla your SQL, you'll still have to conditionally compile at least the database logon and logoff. Most shops will find the standardized SQL error-checking syntax inadequate and will want to use the lower-level facility; one or more communication-area variables are usually available. It's best to write a subroutine that is called after each statement, checks the return codes, issues a message when appropriate, and returns a standardized code telling your application whether to continue, do something over, or abort.
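Such a subroutine might look like the following C sketch. The SQLCODE conventions shown are illustrative assumptions, not a portable standard: 0 for success and +100 for not-found are common, the deadlock/timeout codes are DB2-style -911/-913, and other engines use different numbers behind the same wrapper.

```c
#include <stdio.h>

/* Standardized result the application acts on, independent of engine. */
enum sql_action { SQL_CONTINUE, SQL_RETRY, SQL_ABORT };

/* Called after every SQL statement: inspects the communication-area
   return code, issues a message when appropriate, and returns a
   standardized action code. */
enum sql_action check_sql(long sqlcode)
{
    if (sqlcode == 0 || sqlcode == 100)
        return SQL_CONTINUE;            /* success, or end of fetch */

    if (sqlcode == -911 || sqlcode == -913) {
        fprintf(stderr, "deadlock/timeout (SQLCODE %ld); retry\n",
                sqlcode);
        return SQL_RETRY;               /* transaction can be redone */
    }

    fprintf(stderr, "SQL error (SQLCODE %ld); aborting\n", sqlcode);
    return SQL_ABORT;                   /* unrecoverable */
}
```

Only this one routine needs to know each engine's codes; application logic branches on the three standardized actions.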
If any of the databases you want to write for supports only a call interface, the problem becomes more complex. Most call interfaces have subroutines that perform identical functions, although their names and parameters differ. The basic steps in executing any SQL statement are to place the statement in a command buffer, bind program variables, execute the statement, get a completion code, and retrieve any returned data. Between a specific pair of databases, it's usually possible to come up with a set of macros that mask all these differences--but this isn't a trivial task.

You get a break at the call-interface level in that the differences between static- and dynamic-SQL syntax disappear: all SQL statements are strings passed to the interface. The more dynamically the string is assembled at run time, the more complicated the code, but the basic steps remain identical. Static SQL is usually available only via a stored-procedure mechanism when using a call interface; conversely, all engines that support static-SQL stored access plans use a precompiler interface. This strategy transfers the burden of knowledge to the developer--if your developers aren't versed in several engines, you can't achieve transportability between them.
CASE vendors are starting to provide serious support for SQL. Tools such as IEF from Texas Instruments, Telon from Computer Associates, and APS from Sage Software are viable for organizations doing in-house development of major systems. For transportability and interoperability, products that generate code can go a long way toward making the differences between database products transparent. A CASE tool's degree of success in doing this depends on how much custom knowledge about the target database has been built into its code generator. If the degree of knowledge is high, you can build an application with the CASE tool and migrate it to different operating environments just by changing a few check boxes on the code-generator screen.
Unfortunately, while these tools are well developed for generating COBOL in an MVS environment using various IBM front ends (CICS, IMS/DC, and so on) and back ends (VSAM, IMS/DB, DB2, and the like), support for C, non-MVS environments, and UNIX-based SQL database engines is only beginning to appear. Take care to choose a tool with adequate support for the environments you'll be developing for. Consider, also, the degree to which particular databases are supported, as these environments are still new to CASE developers. Don't be surprised if the generated code resembles plain ANSI SQL and ignores all the high-performance, nonstandard features of the target database engine.
Copyright © 1993, Dr. Dobb's Journal