Looking at virtually any data from virtually anywhere takes some doing. The authors have designed a framework for doing just that.
Introduction
Information on the Internet is frequently accompanied by numerous pictures and graphics. The graphical presentation depends on compression formats such as TIFF and JPEG, which reduce the amount of data that must be transmitted for each image. Web browsers and languages such as Java contain software or components for automatically decompressing images in these formats and displaying them on the monitor. Unfortunately, it is not possible to take advantage of this infrastructure when the data of interest does not have a readily available visual representation. This is not to say that the data cannot be visualized or that no appropriate graphic exists, but rather, by its very nature, no immediate visual representation is inherent within the data. Examples of such data include airflow over the surfaces of automobiles and airplanes, temperature variations of the ocean, wind direction and speed over the earth, and terrain elevation.
Such data is usually numeric, often large, and stored in databases or binary files; it is visualized by converting the magnitude and direction of the measured or sampled points into contour maps, shaded relief images, or some other graphic. Often color is employed in the image to portray key components of the data. Thus, for example, in the display of ocean temperature data, hot colors such as red, yellow, and orange are used to indicate regions of relatively high temperature while cooler colors like green and blue show areas of lower temperature. The formation of images from this type of data requires a procedure to map the numerical information into an image that provides a useful visual portrayal, since no such image is inherent in the data itself.
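The mapping from magnitude to color described above can be sketched in a few lines of C++. This is only an illustration, not the rendering code used by the browser: the names Rgb, toChannel, and valueToColor are hypothetical, and the blue-to-green-to-red ramp is one simple choice among many.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// One 24-bit color. The struct and function names are illustrative only.
struct Rgb {
    std::uint8_t r, g, b;
};

// Scale a fraction in [0, 1] to a color channel in [0, 255], rounding to nearest.
std::uint8_t toChannel(double fraction) {
    return static_cast<std::uint8_t>(fraction * 255.0 + 0.5);
}

// Map a sample onto a blue -> green -> red ramp.
// Samples at or below lo come out pure blue; samples at or above hi, pure red.
Rgb valueToColor(double value, double lo, double hi) {
    assert(hi > lo);
    double t = (value - lo) / (hi - lo);
    t = std::min(1.0, std::max(0.0, t));   // clamp out-of-range samples
    if (t < 0.5) {
        double u = 2.0 * t;                // 0 at blue, approaching 1 at green
        return Rgb{0, toChannel(u), toChannel(1.0 - u)};
    }
    double u = 2.0 * t - 1.0;              // 0 at green, 1 at red
    return Rgb{toChannel(u), toChannel(1.0 - u), 0};
}
```

For ocean temperatures between 0 and 30 degrees, valueToColor(30.0, 0.0, 30.0) yields pure red and valueToColor(0.0, 0.0, 30.0) yields pure blue.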
Large non-visual binary datasets are continually being generated and updated by scientists, researchers, and others in both the public and private sectors. Many of these datasets are publicly available via the Internet and other electronic media. An excellent example is the DTED (Digital Terrain Elevation Data) product of the USGS (United States Geological Survey), which covers the entire United States and is freely and publicly available on the USGS website at http://edc.usgs.gov/pub/data/DEM/250. The data consists of terrain elevation postings spaced at approximately 100-meter intervals and arranged into one-square-degree files.
Researchers, investigators, and others interested in using this and similar datasets often have a general idea of the characteristics they need from the data, but are uncertain of its exact location. In many cases, a visual representation of the data greatly facilitates identification of the desired dataset. Downloading, imaging, and viewing large non-visual datasets for identification purposes, however, is both time consuming and impractical. A better approach is to permit visual browsing of the data online, as demonstrated in [1]. For datasets like the USGS elevation data, this requires either generating and storing images for each dataset or creating them on demand as the data is browsed. Creating and storing a visual representation of each dataset increases both the storage requirements and the maintenance burden. Creating images on demand, although technically more sophisticated, is made practical by combining technologies like CORBA (Common Object Request Broker Architecture) with existing Internet infrastructure and languages, such as C++ and Java.
This article presents a client-server solution for visually browsing non-visual datasets. Both client and server are connected via a CORBA backbone and are configurable to almost any dataset. An example configuration for visually browsing the USGS DTED dataset is given. This example was selected due to the availability of the data, the inherent non-visual nature of the dataset, and the ease of interpreting the resulting images by a casual (non-technical) audience.
Since CORBA permits linking objects written in different languages, the language most appropriate to the particular application is selected for writing both the client and server. Because the client or browser portion of the application is where the image display resides, and since this display can reside on potentially any platform, Java is (arguably) the preferred language for use in writing the browser. Similarly, since the server or reader portion of the application is primarily concerned with quickly accessing the data (rather than its display), C++ is the preferred language due to its (again, arguably) superior performance in this area.
The CORBA architecture provides a message bus that conveys invocation requests and results to and from resident objects anywhere on a network. For example, a Java class within the browser client discussed above can invoke a method written in C++, which resides within the data-reader server. The browser and reader are connected through the CORBA bus via the Internet or some other network, but need not reside on the same system. Programmatically, this is accomplished using IDL (Interface Definition Language), which permits specification of the interface in a language-neutral format. IDL compilers (available for both C++ and Java) convert this specification into a language-specific interface. On the client side, this interface is referred to as stub code; on the server side, it is referred to as skeleton code.
The stub code is complete and provides the client with the necessary interface to the required server methods. The skeleton code must be augmented to provide the necessary functionality. This is usually accomplished by writing a class that inherits from a base class contained in the IDL-generated skeleton code. The derived class provides the required functionality via the appropriate member functions. These functions are unimplemented and declared as virtual in the skeleton code base class.
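The relationship between the skeleton base class and the hand-written derived class can be sketched in plain C++. The sketch below stands in for IDL-generated code and omits all ORB plumbing; the class names DataInfoSkeleton and DataReaderSketch and the function getFileCount are hypothetical, chosen only to keep the example small.

```cpp
#include <cassert>
#include <string>

// Stand-in for the base class an IDL compiler would generate as skeleton code.
// A real generated class also carries ORB dispatch logic, omitted here.
class DataInfoSkeleton {
public:
    virtual ~DataInfoSkeleton() = default;
    // Declared virtual and left unimplemented, as in generated skeleton code.
    virtual std::string getPathDelimiter() const = 0;
    virtual long getFileCount() const = 0;   // hypothetical, for illustration
};

// Hand-written class that supplies the missing functionality.
class DataReaderSketch : public DataInfoSkeleton {
public:
    explicit DataReaderSketch(long fileCount) : mFileCount(fileCount) {
        assert(fileCount >= 0);
    }
    std::string getPathDelimiter() const override { return "/"; }
    long getFileCount() const override { return mFileCount; }

private:
    long mFileCount;
};

// Code that knows only the base type, much as ORB dispatch code would.
std::string describe(const DataInfoSkeleton& servant) {
    return std::to_string(servant.getFileCount()) + " files, delimiter '" +
           servant.getPathDelimiter() + "'";
}
```

Because describe sees only the base class, the derived class can be replaced without touching the calling code, which is the property the skeleton mechanism relies on.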
Application Design
The distributed data visualization application consists of three components: a visual browser for selecting datasets and displaying their associated images, a data reader for finding and reading the data, and a CORBA backbone for connecting the browser (client) with the reader (server). Both browser and reader are required to work with (almost) any non-visual dataset. In addition, these datasets can be in any format and visualized using any number of techniques. Also, the browser should permit writing of the data in a user-chosen (proprietary) format and allow for custom data visualization. This should all be achieved without recompiling either the reader or the browser. Since datasets can be stored in any format, visualized using any number of methods, and written in any format, both reader and browser are designed to be configurable by their respective users.
The reader and browser software is made configurable by separating it into modules; the configurable portion of the reader is placed into a shared library (or DLL under Windows), while the configurable portion of the browser is placed into a Java package. On the server side, the configurable portion of the reader consists of the C++ code responsible for reading the datasets and providing information used during initialization. On the client side, the configurable portion of the browser is the Java package used for rendering images and writing data in proprietary formats. The configurable portion of the software is provided by the end users of the client and/or the server. The browser includes a default image-rendering package that provides three-bit color contour visualization for the data. The browser software can therefore be utilized as is, but requires additional customization for writing and imaging. The reader must be augmented to provide the necessary data access. The example presented here provides the required augmentation for reading the USGS terrain elevation dataset. In addition, the browser is customized by adding a proprietary writer and shaded-relief visualization [2]. Figure 1 shows the major components for both the reader and browser, including the CORBA interfaces.
The browser is implemented as a Java application, using the Java Development Kit v1.2. Java is freely available via the Sun Microsystems website. The reader was written in C++ and tested on the following platforms: Linux, Unix (SGI), Windows 98, and Windows NT. The GNU C++ compiler was used to compile the server code under Unix, and Microsoft Visual C++ was used under Windows. omniORB (http://www.uk.research.att.com/omniORB/index.html) was utilized for the CORBA backbone. omniORB v3.0 was selected because it is freely available (see product licensing agreement at website), runs on a variety of platforms, and conforms to the latest CORBA standards. This product comes pre-built for Windows, Linux, SGI, SUN, and other platforms, and the source is available for building the ORB, if desired (or required).
Implementation
The first step in the application implementation is to specify the client-server interface in IDL. In order to browse the data, the server must send the client a list of available files for viewing and downloading. In addition, because large datasets are often involved, it is also convenient to send information about file size. Finally, a method is required for passing the data from the server to the client. Since the data can exist in byte, integer, or floating-point format, all data is sent as bytes. A header containing data format and size precedes the byte stream (see below) and is used to assemble the bytes into the correct format.
Listing 1 defines the interface in IDL. It consists of five function specifications for requesting the data (getData), file location (getLocation), available data files (getFiles), file resolution or size (getResolution), and path delimiter (getPathDelimiter). Input parameters to these functions are indicated by the IDL keyword in, and output parameters are indicated by the keyword out. The statement:
typedef sequence<octet> UnboundedData;
is used to specify the byte stream for the output data. The keyword sequence specifies an array, and the keyword octet specifies the type of the array as byte (in Java) or unsigned char (in C++).
The keyword module is mapped to a package under Java and a namespace under C++. Likewise, the keyword interface is mapped to a class under both Java and C++. The type keyword string is mapped to String (under Java) or const char* (under C++). The keyword long is mapped to int under Java and long under C++. The calling arguments are detailed below.
Listing 1 is available in the online source code (see www.cuj.com) as the file Data.idl. This file is compiled using both the C++ and Java IDL compilers. The IDL-C++ compiler is supplied with the omniORB package; it uses Data.idl to generate the server skeleton code. Usage is as follows:
>>omniidl -v -bcxx -Wbh=.h -Wbs=SK.cpp Data.idl
This produces two files, Data.h and DataSK.cpp, which contain the skeleton code. DataSK.cpp must then be compiled in C++ and included when making the server. The header file, Data.h, contains the definition of the DataInfo base class, which is used to build the server interface.
On the Java side, the IDL-Java compiler is used to compile the IDL to generate the stub code. Sun Microsystems supplies this compiler at its Java website. Usage is as follows:
>>idltojava -fno-cpp Data.idl
The Java-IDL compiler produces a directory (folder) named Data containing the stub code in the following files: DataInfo.java, _DataInfoImplBase.java, _DataInfoStub.java, DataInfoHelper.java, DataInfoHolder.java, UnboundedDataHelper.java, and UnboundedDataHolder.java. These files are compiled into the Data package using the Java compiler, and imported into the client. This gives the client access to the server functions specified in Data.idl through the class DataInfo.
The Data Reader
After generating the skeleton code, the server is built around the CORBA interface. As shown in Figure 1, the data reader consists of three components: the skeleton code, an initialization module that contains the program main, and the configurable reader. The skeleton code defines the DataInfo base class that contains the unimplemented virtual member functions defined in the IDL interface. A new class, DataReader, is defined that inherits the DataInfo base class and implements these functions. However, since these functions are part of the configurable portion of the server, it is necessary to implement them in a shared library and call them from within the corresponding member function of the DataReader class. Listing 2 shows the class definition.
In addition to the interface functions, the DataReader class contains five configuration functions. Four of them supply the names used with the CORBA Naming Service. The Naming Service allows the client to identify the requested objects (functions) by name instead of the internal numerical codes used within the object broker. At initialization, the server identifies the objects it provides to the object broker via these names. The naming convention is similar to the standard file/directory structure used under Unix and Windows and employs four strings for object identification. These names are arbitrary and supplied by the server; as such, they are configurable and are placed in this portion of the software. For more details, see [3]. The fifth function controls the amount of data transferred per request. For omniORB, the default limit is 2 MB. For DTED datasets, this limit must be raised to 3 MB to permit transfer of a complete dataset. The member function drSetServerMessageSize allows this limit to be increased.
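The need for the larger limit follows from simple arithmetic on the file dimensions given later in this article: a one-square-degree file holds 1,201 by 1,201 elevation values stored as two-byte short integers. The sketch below checks those numbers at compile time. The constant and function names are hypothetical, and the 64-byte header is an assumption based on a header of sixteen four-byte words.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kPostingsPerSide = 1201;                  // from the DTED file layout
constexpr std::size_t kBytesPerPosting = sizeof(std::int16_t);  // elevations are shorts
constexpr std::size_t kMegabyte = 1024 * 1024;

// Raw payload for one complete one-square-degree file.
constexpr std::size_t kDtedPayloadBytes =
    kPostingsPerSide * kPostingsPerSide * kBytesPerPosting;

static_assert(kDtedPayloadBytes == 2884802, "1201 * 1201 * 2 bytes");
static_assert(kDtedPayloadBytes > 2 * kMegabyte, "too large for the 2 MB default");
static_assert(kDtedPayloadBytes < 3 * kMegabyte, "fits once the limit is 3 MB");

// Payload size for a square grid of the given side length.
std::size_t payloadBytes(std::size_t side, std::size_t bytesPerPosting) {
    assert(side > 0 && bytesPerPosting > 0);
    return side * side * bytesPerPosting;
}

// True when the header plus payload fits under the per-request limit.
bool fitsInMessage(std::size_t payload, std::size_t header, std::size_t limit) {
    return payload + header <= limit;
}
```

The conclusion is the same whether a megabyte is taken as 1,048,576 or 1,000,000 bytes: roughly 2.9 million bytes exceeds the first limit and fits within the second.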
At startup, an instance of the DataReader class is created and used to register itself with CORBA. Listing 3 shows the sequence of events. First, the ORB (Object Request Broker) is initialized, and a reference to it is obtained by calling the function ORB_init (line 1). Second, the ORB reference is used to obtain a reference to a root or base object and to activate this object so that it is available to clients requesting services (lines 2-3). Third, an instance of the DataReader class is created and activated (lines 4-5). Fourth, the instance of the DataReader class is registered with the naming service (lines 6-8), and the maximum message size increased (line 9). Fifth, the server is then activated (lines 10-11) and run (line 12).
Once running, the server waits for client requests and processes these requests via the five member functions specified in the IDL interface. The DataReader class members mTheDir, mTheFile, and mPathDel are used for holding and passing the file location, the filenames, and the path delimiter, respectively. The member class mDataInterfacePtr encapsulates the interface functions in the configurable portion of the server software. Listing 4 shows the definition of this class.
The DataInterface class isolates the configurable portion of the server from the CORBA interface and encapsulates the differences between calling functions in shared libraries under Unix and DLLs under Windows. This class contains functions that mirror those in the DataReader class. In addition, two private functions are used by the diGetData member function to request data in raw or image format. This allows the server to supply data to the client as images, if desired. The single data member, mLib, in the class is used under Windows for calling functions in the DLL.
Note that the functions getLocation and getFiles are not mirrored in the DataInterface class. These functions are implemented directly using the fsGetAbsDir and fsGetFileList functions contained in the FileService namespace. The member functions in the DataInterface class call the functions in the configurable portion of the server to satisfy browser requests. These functions are placed in a shared library or DLL called DataHandler and mirror the member functions in the DataInterface class. Table 1 lists these functions along with a brief description of the services they provide.
The ten functions listed in Table 1 constitute the configurable interface and must be supplied for each unique reader application. These functions are implemented for the USGS digital terrain elevation dataset. Table 2 shows the information returned by each of these functions for DTED datasets.
With the exception of readFileAsData, the functions in Table 2 consist of two to three lines of source code and are relatively simple and self-explanatory. readFileAsData must read the selected dataset file and place this information into the return byte stream. The return byte stream contains a header followed by the actual data in the file. The header can be of any length but must be formatted as four-byte words that contain the information listed in Table 3.
The first 16 words of the header are used by the browser to interpret the byte stream and provide information about the data to the user on the display. Word zero is used to determine the start of the actual data, and words four and five show its dimensions. Words two and three are used to reconstruct the data from the byte stream. The data can be formatted as byte (usually for images), short or long integers, or float or double values. For the USGS terrain elevation datasets, elevation values are stored as short integers. The elevation postings are spaced three arc-seconds apart (in latitude or longitude), and a one-square-degree file contains 1,201 by 1,201 values.
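A header-plus-data byte stream of this kind might be packed and unpacked as sketched below. Only the roles of words zero, two through three, and four through five come from the description above; the rest is assumed for illustration. In particular, the type code kTypeShort, the use of word three for the element size, the big-endian byte order, and all function names are hypothetical and are not taken from Table 3.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kHeaderWords = 16;   // words the browser interprets
constexpr std::size_t kWordBytes = 4;
constexpr std::uint32_t kTypeShort = 2;    // hypothetical type code

// Append one four-byte word, most significant byte first (assumed order).
void putWord(std::vector<std::uint8_t>& out, std::uint32_t word) {
    for (int shift = 24; shift >= 0; shift -= 8)
        out.push_back(static_cast<std::uint8_t>((word >> shift) & 0xFF));
}

// Read header word number wordIndex from the stream.
std::uint32_t getWord(const std::vector<std::uint8_t>& in, std::size_t wordIndex) {
    const std::size_t i = wordIndex * kWordBytes;
    assert(i + kWordBytes <= in.size());
    return (static_cast<std::uint32_t>(in[i]) << 24) |
           (static_cast<std::uint32_t>(in[i + 1]) << 16) |
           (static_cast<std::uint32_t>(in[i + 2]) << 8) |
           static_cast<std::uint32_t>(in[i + 3]);
}

// Build header + data for a grid of short integers.
std::vector<std::uint8_t> packShorts(const std::vector<std::int16_t>& samples,
                                     std::uint32_t cols, std::uint32_t rows) {
    assert(samples.size() == static_cast<std::size_t>(cols) * rows);
    std::uint32_t header[kHeaderWords] = {};
    header[0] = static_cast<std::uint32_t>(kHeaderWords * kWordBytes);  // data start
    header[2] = kTypeShort;                                             // element type
    header[3] = static_cast<std::uint32_t>(sizeof(std::int16_t));       // element size
    header[4] = cols;                                                   // dimensions
    header[5] = rows;
    std::vector<std::uint8_t> out;
    out.reserve(kHeaderWords * kWordBytes + samples.size() * 2);
    for (std::uint32_t word : header) putWord(out, word);
    for (std::int16_t s : samples) {
        const std::uint16_t u = static_cast<std::uint16_t>(s);
        out.push_back(static_cast<std::uint8_t>(u >> 8));
        out.push_back(static_cast<std::uint8_t>(u & 0xFF));
    }
    return out;
}

// Reassemble the short integers using only information found in the header.
std::vector<std::int16_t> unpackShorts(const std::vector<std::uint8_t>& stream) {
    const std::size_t offset = getWord(stream, 0);
    assert(getWord(stream, 2) == kTypeShort && getWord(stream, 3) == 2);
    const std::size_t count =
        static_cast<std::size_t>(getWord(stream, 4)) * getWord(stream, 5);
    assert(offset + count * 2 <= stream.size());
    std::vector<std::int16_t> samples;
    samples.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        const std::uint16_t u = static_cast<std::uint16_t>(
            (stream[offset + 2 * i] << 8) | stream[offset + 2 * i + 1]);
        samples.push_back(static_cast<std::int16_t>(u));
    }
    return samples;
}
```

Fixing the byte order in the stream, rather than copying memory directly, keeps the format independent of the processor on either end, which matters when a C++ reader on one platform feeds a Java browser on another.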
The member function getData in the class DataReader is called from the browser using the CORBA infrastructure. This function calls the diGetData member function in the DataInterface class through the member class instance mDataInterfacePtr. After bookkeeping, the byte data stream returned by this function is passed to the browser.
diGetData examines the type input argument and calls either the member function diReadFileAsData or diReadFileAsImage depending on whether data or an image is requested by the browser. Finally, the diReadFileAsData calls the readFileAsData function that is contained in the configurable portion of the code. For Unix implementations, this function can be called directly; for Windows, a function pointer must first be obtained by calling the utility function GetProcAddress. The details of actually reading the USGS data are contained in the shared library function readFileAsData. This function can be modified to handle different datasets and a new shared library or DLL constructed and used with the server to supply this new functionality.
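The dispatch performed by diGetData can be pictured as a choice between two function pointers. The sketch below is a simplified stand-in, not the code from Listing 4: the ReadFn signature, the HandlerTable struct, and the two stub readers are hypothetical. In a real build, the pointers would be obtained from the DataHandler library, for example through GetProcAddress under Windows or dlsym (or direct linking) under Unix.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

using ByteStream = std::vector<std::uint8_t>;

// Assumed signature for a reader entry point; the real ones are listed in Table 1.
using ReadFn = ByteStream (*)(const std::string& file);

enum class RequestType { Data, Image };

// Stand-ins for functions that would live in the configurable library.
ByteStream readFileAsDataStub(const std::string&) { return {0x01, 0x02}; }
ByteStream readFileAsImageStub(const std::string&) { return {0xFF}; }

// Entry points resolved once at startup and then reused for every request.
struct HandlerTable {
    ReadFn readAsData = nullptr;
    ReadFn readAsImage = nullptr;
};

// Pick the reader that matches the request type and call it.
// An unresolved entry point yields an empty stream instead of a crash.
ByteStream getData(const HandlerTable& handlers, RequestType type,
                   const std::string& file) {
    const ReadFn fn = (type == RequestType::Image) ? handlers.readAsImage
                                                   : handlers.readAsData;
    if (fn == nullptr) return {};
    return fn(file);
}
```

Swapping in a different dataset then amounts to filling the table from a different library; the dispatch code itself does not change.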
The Data Browser
On the browser side, the situation is similar programmatically to that on the server side. Requests for data and file information are handled through CORBA via the IDL-generated stub code. Unlike the server skeleton code, the stub code is complete and can be used as is. In addition, the browser is primarily concerned with the display of the data instead of its retrieval. Most of the effort on the browser side therefore resides in the user interface. Listing 5 shows the Java code used to request the data byte stream from the server.
The Java source begins by importing the IDL-generated Data package for handling requests through CORBA and the CORBA packages supplied with the JDK (Java Development Kit). As in the server, the init function is called to obtain a reference to the ORB. Next, an object reference is obtained and used to obtain a reference to the DataInfo class via the naming service. Finally, an UnboundedData container is initialized and the getData function is called through the DataInfo reference to obtain the data from the server. Requests for file information, location, etc. are handled almost identically.
Figure 2 shows the browser display with the shaded relief image generated from the USGS terrain elevation data. The menu at the top of the display contains four items: a File menu with a Save As item for writing data in a user-defined format, a File Type menu for setting the data request to either Image or Data, a Data menu for displaying a dialog box containing basic information about the data such as location and resolution, and a Help menu. Below the menu bar is a text box for inputting the IP address of the machine where the server resides. The address can be entered as either a host name (text string) or in the standard four-field numerical format. The default is localhost, indicating that the server resides on the browser's local machine. The Connect button next to the address text field is used to establish contact between the browser and the server. When this button is pressed, the server returns a list of available files that can be viewed using the browser.
The list of available files and any subdirectories containing similar files are displayed in the left portion of the browser's main window, below the address text field. Clicking on a file in this portion of the display causes the browser to request the dimension and size of the file. This information is shown on the four buttons directly below the file list in the left portion of the display. The first button contains the full dimensions and size of the selected file. The remaining three buttons contain the same information at reduced resolutions. Clicking any of these buttons sends a request to the reader to return the data at that resolution. In all cases, data covering the entire area (one square degree for the USGS dataset) is sent. This option is provided to permit quicker browsing of the data by allowing the user to view complete but lower-resolution versions of the dataset. The right portion of the display shows the chosen visualization of the data. In the case of Figure 2, a shaded relief image of Death Valley is displayed at the 400 x 400 resolution (third button). The blue area indicates the portion of the terrain below sea level.
As shown in Figure 1, the browser contains a configurable module for writing the data received from the server and for generating custom visualizations. Unlike the server, it is not necessary to provide these augmentations; the browser functions without them, generating color contour visualizations of the data without a write (Save As) capability. In the example shown in Figure 2, these augmentations are provided in the form of a shaded-relief visualization and a writer that dumps the byte stream to a file on the browser's local disk.
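One way to produce a reduced-resolution grid that still covers the entire area is to pick evenly spaced postings from the full grid, always including the first and last row and column. The sketch below shows that idea only; it is an assumption about how such a reduction could be done, not a description of the reader's actual resampling code, and the function name resampleGrid is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reduce a square grid from srcSide x srcSide to dstSide x dstSide postings.
// The first and last rows and columns of the source are always kept, so the
// result spans the same area at coarser spacing.
std::vector<std::int16_t> resampleGrid(const std::vector<std::int16_t>& src,
                                       std::size_t srcSide, std::size_t dstSide) {
    assert(src.size() == srcSide * srcSide);
    assert(dstSide >= 2 && dstSide <= srcSide);
    std::vector<std::int16_t> dst(dstSide * dstSide);
    for (std::size_t r = 0; r < dstSide; ++r) {
        // Integer scaling maps 0 to 0 and dstSide - 1 to srcSide - 1.
        const std::size_t sr = r * (srcSide - 1) / (dstSide - 1);
        for (std::size_t c = 0; c < dstSide; ++c) {
            const std::size_t sc = c * (srcSide - 1) / (dstSide - 1);
            dst[r * dstSide + c] = src[sr * srcSide + sc];
        }
    }
    return dst;
}
```

Reducing a 1,201 by 1,201 file to 400 by 400 in this way cuts the payload from about 2.9 million bytes to 320,000 bytes, which is why the lower-resolution buttons make browsing noticeably quicker.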
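For readers unfamiliar with shaded relief, the core of Lambert shading is a dot product between the terrain's surface normal and the direction to the light. The sketch below is a generic textbook formulation using central differences, offered as an illustration only; it is not the algorithm of [2], and the function name, argument order, and axis conventions (columns along x, rows along y) are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Lambert intensity in [0, 1] at an interior posting of a square elevation grid.
// spacing is the ground distance between postings, in the same units as the
// elevations; (lx, ly, lz) points from the terrain toward the light.
double lambertIntensity(const std::vector<std::int16_t>& elev, std::size_t side,
                        std::size_t row, std::size_t col, double spacing,
                        double lx, double ly, double lz) {
    assert(elev.size() == side * side);
    assert(row >= 1 && row + 1 < side && col >= 1 && col + 1 < side);
    assert(spacing > 0.0);
    const double lightLen = std::sqrt(lx * lx + ly * ly + lz * lz);
    assert(lightLen > 0.0);

    // Slopes from central differences.
    const double dzdx =
        (elev[row * side + col + 1] - elev[row * side + col - 1]) / (2.0 * spacing);
    const double dzdy =
        (elev[(row + 1) * side + col] - elev[(row - 1) * side + col]) / (2.0 * spacing);

    // The unnormalized surface normal is (-dzdx, -dzdy, 1).
    const double normalLen = std::sqrt(dzdx * dzdx + dzdy * dzdy + 1.0);
    const double dot = (-dzdx * lx - dzdy * ly + lz) / (normalLen * lightLen);

    // Faces turned away from the light are simply dark.
    return std::max(0.0, dot);
}
```

Flat ground lit from directly overhead receives full intensity, while a 45-degree slope facing away from a light at 45 degrees elevation receives none; scaling the result to a gray or color value gives the shaded image.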
The configurable portion of the browser consists of a package called DataSet that contains a single class of the same name. This class contains methods for generating the color contour visualization and handling the data. It serves as a base class to the DataView class contained in the package of the same name. It is the DataView class that constitutes the configurable portion of the browser. The methods dataSetImageCreate and dataSetWrite are re-implemented from the DataSet base class thereby providing the desired custom functionality. In C++ terms, these methods are virtual.
Operations
Using the browser and reader requires installing and configuring the omniORB software before running either application. We have provided instructions for doing this along with the source code on the CUJ website (www.cuj.com/java).
Conclusion
Several considerations have been omitted from both the server and client portions of the data browser application. First are security issues. No attempt has been made to address network and computer security here, and both the data reader and browser software have ignored this issue. Clearly, security needs to be considered in any actual application. Second, no attempt has been made to measure or optimize performance of either the browser or reader. For the datasets described herein on the machines and networks utilized, performance was satisfactory, with only a few seconds or tens of seconds required to transmit and image the data. For larger datasets, optimization should be examined in more detail. Third, a substantial increase in performance can be achieved by compressing and decompressing the data byte stream before and after transmission. At a minimum, the amount of data transmitted could be reduced by a factor of two using simple compression techniques. Fourth, although the server portion of the application was tested with multiple users (fewer than ten), scalability has not been investigated. In particular, server performance with hundreds (or thousands) of users is unknown. Finally, the browser portion of the application could be converted to an applet, embedded in a web page, and accessed using standard Internet web browsers.
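As one illustration of a simple compression technique, neighboring elevation postings usually differ by small amounts, so each value can often be sent as a one-byte difference from its predecessor instead of a two-byte short. The sketch below is an assumption about what such a scheme could look like, not something implemented in the application; the escape value and function names are hypothetical, and the actual savings depend on how rugged the terrain is.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Byte value reserved to mean "a full two-byte sample follows".
// It corresponds to a delta of -128, which the encoder never emits.
constexpr std::uint8_t kEscape = 0x80;

// Encode each sample as a one-byte delta from the previous sample when the
// delta fits in [-127, 127]; otherwise emit kEscape plus the sample itself.
std::vector<std::uint8_t> deltaEncode(const std::vector<std::int16_t>& samples) {
    std::vector<std::uint8_t> out;
    std::int16_t prev = 0;
    for (std::int16_t s : samples) {
        const int delta = static_cast<int>(s) - static_cast<int>(prev);
        if (delta >= -127 && delta <= 127) {
            out.push_back(static_cast<std::uint8_t>(static_cast<std::int8_t>(delta)));
        } else {
            const std::uint16_t u = static_cast<std::uint16_t>(s);
            out.push_back(kEscape);
            out.push_back(static_cast<std::uint8_t>(u >> 8));
            out.push_back(static_cast<std::uint8_t>(u & 0xFF));
        }
        prev = s;
    }
    return out;
}

// Reverse of deltaEncode.
std::vector<std::int16_t> deltaDecode(const std::vector<std::uint8_t>& bytes) {
    std::vector<std::int16_t> out;
    std::int16_t prev = 0;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const std::uint8_t b = bytes[i++];
        if (b == kEscape) {
            assert(i + 2 <= bytes.size());
            const std::uint16_t u =
                static_cast<std::uint16_t>((bytes[i] << 8) | bytes[i + 1]);
            i += 2;
            prev = static_cast<std::int16_t>(u);
        } else {
            prev = static_cast<std::int16_t>(prev + static_cast<std::int8_t>(b));
        }
        out.push_back(prev);
    }
    return out;
}
```

On smoothly varying terrain nearly every posting costs one byte, which approaches the factor-of-two reduction; on very rough terrain the escapes can erase the gain, so a general-purpose compressor may be the better choice there.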
References
[1] A.T. Diba, E.C. Kraemer, and J.L. Oravitz. Bay-Delta Data Browser, Proceedings of the 1997 ESRI Users Conference, 1997, http://www.esri.com/library/userconf/PROC97/PROC97/TO600/PAP583/P583.htm.
[2] R.E. Huss and M.A. Pumar. Fast Rendering of Irregular Surfaces Using Lambert Shading, C/C++ Users Journal, January 1998.
[3] S. Lo, D. Riddoch, and D. Grisby. The omniORB Version 3.0 User's Guide (AT&T Laboratories Cambridge, May 2000).
Deanna K. Evans is a senior software engineer for Raytheon ITSS in Pasadena, CA. She holds a B.S. in Mechanical Engineering from Washington University in St. Louis, Mo. Deanna is currently working on the Ground Data Processing System for the Shuttle Radar Topography Mission that flew aboard the space shuttle Endeavour in February 2000. She can be reached at Deanna_K_Evans@raytheon.com.
Ed C. Kraemer is a senior software engineer at Raytheon ITSS in Pasadena, CA. He holds a B.S. degree in Computer Science from UCI. Ed is currently working on the Ground Data Processing System for the Shuttle Radar Topography Mission and the Next Generation Interplanetary Navigation software development effort. He can be reached at Edwin_Kraemer@raytheon.com or ed@Kraemer.net.
Mark A. Pumar is a senior software engineer at Raytheon ITSS in Pasadena, CA. He holds M.S. degrees in Physics and Electrical Engineering from UCLA. Mark is currently working on the Ground Data Processing System for the Shuttle Radar Topography Mission and the Next Generation Interplanetary Navigation software development effort. He can be reached at Mark_Pumar@raytheon.com.