Using Locators to Manipulate Large Objects

Dr. Dobb's Sourcebook March/April 1997

Handling multimedia data without a penalty

By Gary Bist

Gary is a DB2 course developer at IBM's Toronto Lab. He can be contacted at bist@torolab4.vnet.ibm.com.

Today's relational database has had to adapt to the multimedia age. It has done so by creating a new set of data types (called large objects) capable of storing pictures, sound, video, and text. The data types are either binary large objects (BLOBs) or character large objects (CLOBs). Generically, they are called large objects or LOBs for short. Data assigned these data types can be stored in a database just like data with traditional SQL data types (integer, double, character, long varchar, and so on). That's the good news. The bad news is that these large, multimedia objects are, well, large -- very large. DB2 for common IBM servers, for instance, can store objects with sizes up to 2 GB! That would solve your problem if you only had to store large objects. However, applications also have to manipulate them. Moving megabytes and even gigabytes of multimedia data in and out of a database, and then manipulating this data in the application, can cause performance degradation. The problem worsens in a client/server configuration.

Though the client/server model is often presented as a model where everyone gets what he or she wants, that wonderful promise creates complexity. To write this article on locators, for example, I created an application to run on an OS/2 client connected to an AIX server through a LAN with a TCP/IP communication protocol. A client/server environment allows people to use a mixture of hardware, operating systems, and network protocols. But when you are developing an application, you have to overcome the resulting rat's nest of different architectures and technical characteristics. When large objects are involved, a network can be jammed by the data coming off a server, with a subsequent reduction in response time to any application using that network. After the large objects pass through the network, skinny clients (those with minimal memory, CPU speed, and disk space compared to the server) must still struggle to process them. Processing can involve more calls to the server for more large objects, further compounding the performance problem. One technique to avoid this performance penalty in the client/server environment is to use locators.

What Is a Locator?

A locator is a representation of an entire large object or part of a large object. It can also represent a large-object expression consisting of pieces of large objects glued together by the concatenation operator. It is called a locator because it specifies the location either of the entire object or the place inside the large object that is to be manipulated by the application. Conceptually, a locator is similar to a pointer: It points to a value stored elsewhere. Like a pointer, it allows you to manipulate a specific value without the cost of moving that value in and out of memory. Regardless of the size of the object it represents, a locator itself is only 4 bytes.

Don Chamberlin, author of Using the New DB2: IBM's Object-Relational Database System (Morgan Kaufmann Publishers, 1996) defines a locator this way:

A very powerful feature of the new LOB data types, which distinguishes them from long varchars and other implementations of large objects, is the concept of a locator. Locators arose out of the observation that it is quite expensive to move large objects back and forth between the database and an application program...A locator is a value that can be used in an application program to represent the value of a large object without actually containing the bytes of the large object. By manipulating locators, a program can perform operations on large objects while these objects remain inside the database system. In this way, the program can often avoid allocating the storage to hold a large object and paying the cost of moving the large object between the database and the application.

Coding Applications with Locators

Let's look at a typical challenge that can be addressed with locators. Boris Coder works tirelessly in a tiny cubicle of a large computer corporation. Except for occasionally buying some coffee from a machine down the hall, Boris churns out code hour after hour, six or seven days a week. He does have one distraction, however. He likes walking further down the hall to flip through the latest PC magazines piled on a coffee table. In particular, Boris likes to follow the columnists who appear to have opinions about everything in the computer industry. They even have opinions about life outside the computer industry (something Boris himself can only speculate about). In a flash of inspiration one day, Boris realized that he could put all his favorite columnists' opinions online by using large-object data types. He scanned the hundreds of columns he had read and stored them in CLOB files. Each CLOB file represented a collection of opinions by one PC magazine columnist. Boris then created a table for storing the CLOB files in his database, creating a column for the columnist's name and another for his opinion. The opinion column allowed for a CLOB file of up to 5 MB. The columnists were very opinionated. Table 1 illustrates Boris' table.

Selecting a CLOB file from this table does not result in a value such as "manor.txt" returning to the screen. Instead, you get all 5 MB of text characters returning to the screen! Boris was often amused watching first-time users of CLOB files make this common error.

Figure 1, from the dboreyak.txt CLOB file, is typical of the content and structure inside the 5-MB files. Each opinion starts with a title, followed by the columnist's opinion, and concludes with an "End of my humble opinion" indicator.

The Opinionated_People table was an excellent idea, but it did have the drawback that if Boris wanted to see a particular opinion from a columnist he still needed to get all 5 MB of the columnist's opinions out of the database and print the entire file. This annoyed other users in other cubicles who would sometimes say, "Why is the system running so slow today? I wonder if that CODER userid is playing around with his CLOB files again." You see, the database that all the cubicle people shared was far from their location in New York City. Rising crime rates and vandalism in the city had caused senior executives to protect their equipment and valuable corporate data by moving it all to Tasmania. There, they didn't have any problems with vandals breaking their server. However, some of the cubicle people did grumble about response time. (None of the senior executives grumbled, but then they never used the system themselves.) You can see the problem Boris Coder caused whenever he accessed his large object files: Everyone else using the LAN and the Tasmanian database saw their response time suddenly increase by a minute or so.

In frustration, Boris actually opened a DB2 manual, discovered locators, and wrote the program in Listing One It requests the name of a columnist and the subject that a user wants an opinion about, then finds that opinion in the appropriate large object file, and places just that opinion in a small text file for the user. Using locators, Boris' application never manipulates the entire large object. Only two 4-byte locators are used. Let's look at how the application uses locators to eliminate manipulating the entire large object.

Declaring Locators

Listing Two shows how the application declares a locator. The first locator(all_opinions_locator) represents all of the columnist's opinions; that is, the entire large object. Like a pointer, the locator is pointing to the beginning of the large object. The second locator (starting_ with_subject_locator) is the point in the large object where the opinion wanted by the user starts. Note that the size of the locator does not need to be specified. The system allocates 4 bytes for each locator.

Using Locators

The Create_All_Opinions_Loc function assigns the all_opinions_locator to the set of opinions selected by the user (based on the name of a PC magazine columnist) by using a SELECT statement; see Listing Three. This locator is, in turn, used to isolate only the opinion wanted by using a second locator (starting_with_subject_locator) in the Isolate_Opinion_Wanted function. Using the POSSTR (for locating the position of a character in a string) and SUBSTR (for creating a substring from a string) functions, the application operates on the locators as if the locators were the actual large object itself. The POSSTR and SUBSTR functions are capable of operating on the full 5 MB of text in the large object, and in this application, they do so through the 4-byte locators representing the 5 MB. The application continues by isolating only the opinion wanted and placing it in a small (5 KB) buffer; see Listing Four.

The buffer is written to a file outside the database in the Put_Opinion_Wanted_to_File function. Note that the opinion_file variable in this function is a file-reference variable. File-reference variables, like locators, represent something larger than themselves. In particular, they represent large files outside the database used for direct input and output of large-object data. The opinion_file file-reference variable is declared in a similar way to a locator (SQL TYPE IS CLOB_FILE) and assigned a file outside the database (opinion.txt) in the Provide_Opinion_File function. Using file-reference variables along with locators can further reduce the overhead of manipulating large-object data; see Listing Five.

The application concludes by freeing the locators in the Free_Locators function. The values associated with the locator are destroyed and the memory associated with them is returned to the system; see Listing Six.

Conclusion

The PC_Mag_Columnists application uses locators to manipulate a 5-MB large object with only two 4-byte locators (all_opinions_locator and starting_with_subject_locator). This minimal cost to manipulate a very large text object results in quickly processing the request from a user and placing the opinion wanted into a file (opinion.txt). If the user specified "John_Deboreyak" as his columnist and "Net-centric Computing" as his subject, Listing One would place Figure 2 into the file.

Using locators (and file-reference variables) can dramatically improve the performance of applications that handle large objects. The example I've described here manipulated only one large object, but in a real application where many large objects might be handled, the performance benefit is compounded significantly. Typically, these applications are "multimedia applications." You can read more about locators in the SQL Reference, which comes with the DB2 for common servers product. DB2 for common servers follows IBM's open system design principle, which means that users can put DB2 on their operating system of choice. It is therefore available on many platforms, including OS/2 and AIX where the code shown in this article was tested.

Acknowledgments

I'd like to thank Toby Lehman, Doug Free, Patti Cartwright, and Patricia Acton for reviewing this article; and also Dennis Bockus, who provided editorial guidance. All are employees of IBM.

DDJ

Listing One

// A program to get an opinion from a large object text file - Gary Bist#include <iostream.h>
#include <string.h>
#include <sqlenv.h>
EXEC SQL INCLUDE SQLCA;
 
class Any_Opinion_Any_Time
{
public:
    Any_Opinion_Any_Time();              // Constructor
                                         // Public member functions
    void Connect_to_Database();          //  Connects app to database
    void Provide_Opinion_File();         //  File for opinion wanted
    void Get_Name();                     //  Requests columnist's name
    void Get_Subject();                  //  Gets subject for opinion
    void Create_All_Opinions_Loc();      //  Locator for all opinions
    void Isolate_Opinion_Wanted();       //  Isolates opinion wanted
    void Put_Opinion_Wanted_to_File();   //  Outputs opinion to file
    void Free_Locators();                //  Frees memory
    void Disconnect_from_Database();     //  Disconnects app
    void Tell_Where_to_Find_It();        //  Points to file with opinion
    ~Any_Opinion_Any_Time();             // Destructor
private:                                 // Private data members
    EXEC SQL BEGIN DECLARE SECTION;
       char columnist_name[25];          //   Stores columnist's name
       char subject[25];                 //   Stores subject for opinion
       SQL TYPE IS CLOB_FILE      opinion_file; // Stores the opinion
       SQL TYPE IS CLOB_LOCATOR   all_opinions_locator; // All opinions
       SQL TYPE IS CLOB_LOCATOR   starting_with_subject_locator;
       SQL TYPE IS CLOB( 5 K )    opinion_wanted_buffer; // Buffer
       long                       opinion_wanted_position; // Position
       long                       end_of_opinion_wanted; // Position
    EXEC SQL END DECLARE SECTION;
};
// ------- Any_Opinion_Any_Time constructor
Any_Opinion_Any_Time::Any_Opinion_Any_Time()
{
}
// ------- Member function to connect application to database
void Any_Opinion_Any_Time::Connect_to_Database()
{

EXEC SQL CONNECT TO tasmania; if ( SQLCODE != 0 ) { cout << "\nConnect to database error: SQLCODE = " << SQLCODE; } } // ------- Member function to provide a file for the opinion wanted void Any_Opinion_Any_Time::Provide_Opinion_File() { strcpy( opinion_file.name, "e:\\locators\\opinion.txt" ); opinion_file.name_length = strlen( "e:\\locators\\opinion.txt" ); opinion_file.file_options = SQL_FILE_OVERWRITE; } // ------- Member function to get columnist's name from user void Any_Opinion_Any_Time::Get_Name() { cout << "Type the name of a columnist and press Enter\n"; cin >> columnist_name; } // ------- Member function to get subject for opinion void Any_Opinion_Any_Time::Get_Subject() { cout << "\nType the subject you wish an opinion about and press Enter\n"; cin >> subject; } // ------- Member function to create a locator for all columnist's opinions void Any_Opinion_Any_Time::Create_All_Opinions_Loc() { EXEC SQL SELECT opinion INTO :all_opinions_locator FROM opinionated_people WHERE columnist = :columnist_name; if ( SQLCODE != 0 ) { cout << "\nColumnist name not found error: SQLCODE = " << SQLCODE; } } // ------- Member function to isolate opinion wanted to buffer void Any_Opinion_Any_Time::Isolate_Opinion_Wanted() { // First find where opinion wanted starts using all_opinions_locator EXEC SQL VALUES( POSSTR( :all_opinions_locator, :subject ) ) INTO :opinion_wanted_position; if ( SQLCODE != 0 ) { cout << "\nStart of opinion wanted position error: SQLCODE = " << SQLCODE; } // Next put all opinions starting with opinion wanted in another locator EXEC SQL VALUES( SUBSTR( :all_opinions_locator, :opinion_wanted_position ) ) INTO :starting_with_subject_locator; if ( SQLCODE != 0 ) { cout << "\nOpinions starting with subject locator error: SQLCODE = " << SQLCODE; } // Next find where opinion wanted ends using starting_with_subject_locator EXEC SQL VALUES( POSSTR( :starting_with_subject_locator,

'End of my humble opinion' ) ) INTO :end_of_opinion_wanted; if ( SQLCODE != 0 ) { cout << "\nEnd of opinion wanted position error: SQLCODE = " << SQLCODE; } // Finally, output opinion wanted to a buffer EXEC SQL VALUES( SUBSTR( :starting_with_subject_locator, 1, :end_of_opinion_wanted - 1 ) ) INTO :opinion_wanted_buffer; if ( SQLCODE != 0 ) { cout << "\nWriting opinion to buffer error: SQLCODE = " << SQLCODE; } } // ------- Member function to write opinion wanted by user to a file void Any_Opinion_Any_Time::Put_Opinion_Wanted_to_File() { EXEC SQL VALUES( :opinion_wanted_buffer ) INTO :opinion_file; if ( SQLCODE != 0 ) { cout << "\nWriting opinion to file error: SQLCODE = " << SQLCODE; } } // ------- Member function to free memory of locator values void Any_Opinion_Any_Time::Free_Locators() { EXEC SQL FREE LOCATOR :all_opinions_locator; EXEC SQL FREE LOCATOR :starting_with_subject_locator; if ( SQLCODE != 0 ) { cout << "\nFreeing locators error: SQLCODE = " << SQLCODE; } } // ------- Member function to disconnect application from database void Any_Opinion_Any_Time::Disconnect_from_Database() { EXEC SQL COMMIT; // Commit any open transaction EXEC SQL DISCONNECT tasmania; if ( SQLCODE != 0 ) { cout << "\nDisconnect from database error: SQLCODE = " << SQLCODE; } } // ------- Member function to tell user where to find opinion void Any_Opinion_Any_Time::Tell_Where_to_Find_It() { cout << "\nThe opinion is in the opinion.txt file"; } // ------- Any_Opinion_Any_Time destructor Any_Opinion_Any_Time::~Any_Opinion_Any_Time() { } // ======= Program to get an opinion from a large object text file void main() { Any_Opinion_Any_Time PC_Mag_Columnists; // PC mag columnists' opinions

PC_Mag_Columnists.Connect_to_Database(); // Connect app to database PC_Mag_Columnists.Provide_Opinion_File(); // File for opinion wanted PC_Mag_Columnists.Get_Name(); // Get name of columnist PC_Mag_Columnists.Get_Subject(); // Get subject for an opinion PC_Mag_Columnists.Create_All_Opinions_Loc(); // Locator for all opinions PC_Mag_Columnists.Isolate_Opinion_Wanted(); // Isolate only opinion wanted PC_Mag_Columnists.Put_Opinion_Wanted_to_File(); // Output opinion wanted PC_Mag_Columnists.Free_Locators(); // Free up memory PC_Mag_Columnists.Disconnect_from_Database(); // Disconnect app from db PC_Mag_Columnists.Tell_Where_to_Find_It(); // Tell where to find opinion }

Back to Article

Listing Two

SQL TYPE IS CLOB_LOCATOR   all_opinions_locator; // All opinions
SQL TYPE IS CLOB_LOCATOR   starting_with_subject_locator;
 

Back to Article

Listing Three

// ------- Member function to create a locator for all columnist's opinions
void Any_Opinion_Any_Time::Create_All_Opinions_Loc()
{
    EXEC SQL SELECT opinion INTO :all_opinions_locator
       FROM opinionated_people WHERE columnist = :columnist_name;
    if ( SQLCODE != 0 )
    {
       cout << "\nColumnist name not found error: SQLCODE = " << SQLCODE;
    }
}
 

Back to Article

Listing Four

// ------- Member function to isolate opinion wanted to buffer
void Any_Opinion_Any_Time::Isolate_Opinion_Wanted()
{
   // First find where opinion wanted starts using all_opinions_locator
   EXEC SQL VALUES( POSSTR( :all_opinions_locator, :subject ) )
      INTO :opinion_wanted_position;
   if ( SQLCODE != 0 )
   {
      cout << "\nStart of opinion wanted position 
                                               error: SQLCODE = " << SQLCODE;
   }
   // Next put all opinions starting with opinion wanted in another locator
   EXEC SQL VALUES( SUBSTR( :all_opinions_locator, :opinion_wanted_position ) )
      INTO :starting_with_subject_locator;
   if ( SQLCODE != 0 )
   {
    cout << "\nOpinions starting with subject locator 
                                                 error: SQLCODE = "<< SQLCODE;
   }
  // Next find where opinion wanted ends using starting_with_subject_locator
   EXEC SQL VALUES( POSSTR( :starting_with_subject_locator,
      'End of my humble opinion' ) ) INTO :end_of_opinion_wanted;
   if ( SQLCODE != 0 )
   {
      cout << "\nEnd of opinion wanted position error: SQLCODE = " << SQLCODE;
   }

// Finally, output opinion wanted to a buffer EXEC SQL VALUES( SUBSTR( :starting_with_subject_locator, 1, :end_of_opinion_wanted - 1 ) ) INTO :opinion_wanted_buffer; if ( SQLCODE != 0 ) { cout << "\nWriting opinion to buffer error: SQLCODE = " << SQLCODE; } }

Back to Article

Listing Five

// ------- Member function to write opinion wanted by user to a file
void Any_Opinion_Any_Time::Put_Opinion_Wanted_to_File()
{
   EXEC SQL VALUES( :opinion_wanted_buffer ) INTO :opinion_file;
   if ( SQLCODE != 0 )
   {
      cout << "\nWriting opinion to file error: SQLCODE = " << SQLCODE;
   }
}
 

Back to Article

Listing Six

// ------- Member function to free memory of locator values
void Any_Opinion_Any_Time::Free_Locators()
{
   EXEC SQL FREE LOCATOR :all_opinions_locator;
   EXEC SQL FREE LOCATOR :starting_with_subject_locator;
   if ( SQLCODE != 0 )
   {
      cout << "\nFreeing locators error: SQLCODE = " << SQLCODE;
   }
}

Back to Article