September 2000/Processing Variant Records with STL

STL

Processing Variant Records with STL

Linda Kasparek

When you need a map class, you need a map class, and STL provides a good one for many applications.

Introduction

Processing variant records, which is often carried out via the brute force approach (a la the big switch statement), can lead to cumbersome code. This code becomes difficult to develop and maintain when the switch statement gets very large. Also, since the switch statement is essentially a "hard-wired" reflection of the record formats to be processed, it provides little flexibility for modifications or additions. Fortunately, C++ gives us a way to handle variant records a bit more gracefully. By storing information about the different record formats and their fields in an STL map, the mechanics can be reduced to a few lines of code. In this article, I show how to use STL to tame the unwieldy switch statement, and how to concentrate data format definitions in one convenient location within the source code.

The Problem with Variant Records

A variant record, as represented in C/C++, is a struct with a union of the variants. The fixed part of the record usually contains a field that specifies the format of the variant part.

A typical variant record structure would look something like this:
typedef struct {
    char desc[31];
    char fmt;    // 'A', 'B', or 'C'
    union {
        struct {
            // pseudo code
            field1
            field2
            field3
        // to be used if fmt = 'A'
        } A;
        struct {
            // pseudo code
            field1
            field2
            field3
            field4
        // to be used if fmt = 'B'
        } B;
        struct {
            // pseudo code
            field1
            field2
        // to be used if fmt = 'C'
        } C;
    } Variant;
} Record;
Note that field1 in A, B, and C may not necessarily be the same field.

Variant records in C/C++ are often used to store information from the outside world -- commonly a database or configuration file. As an example, I show a simple data file with three different record formats (see Figure 1) . In this file, a record spans multiple lines, with one line per field; each record is delimited by the '^' character. Each field is prepended by a field identifier. The file uses these identifiers to allow fields in the variant part to occur in any order, or to be omitted if the variant part is a default value. The first two fields are fixed:

N - name
F - record format

The 'F' prefix in front of the second field indicates that this is a format ID, which identifies the variant portion of the record. Table 1 shows the various field identifiers used within each format.

The structure that will hold these records is shown in Listing 1, as is the brute force approach to reading these records into the structure.

As you can see in Listing 1, the brute force approach requires a pretty ugly switch statement -- the code has to branch on each record format and then branch on each field within the record format. This approach would become too unwieldy to maintain if it got any more complicated -- and in real life it almost always does! Every time a record format changed, you'd have to muck around with this section of code and you'd probably want to regression test the parts that didn't change. It's easy to introduce new bugs with forgotten break statements, for example. So, you say, there's got to be a better way!

A Solution

There is a more elegant way to handle variant records, using STL's map container class and the C/C++ macro offsetof. The map container provides storage and retrieval of objects via keys. It holds data in the form of Key/Value pairs, where the Value is some object to be stored in the map, and Key is another object used to identify a particular Value object. offsetof returns the offset in bytes of a member relative to the base address of its parent structure. The basic idea here is to store the offset as the Value and to combine the record format and field prefix to make the Key. When the program hands a Key to the map, the map hands back an offset. The offset will be used to point to the corresponding structure member into which the data will be copied. This scheme is illustrated in Figure 2.

To instantiate a map container class you need to define a pair of classes -- a class to represent keys and a class to represent their corresponding values. You also need to provide for a comparison function object. STL uses the comparison function object, specified by providing a comparison class as the third parameter in the template instantiation, for find and insert operations. The predefined comparison class less<T>, found in STL's function.h, uses operator< to make comparisons. For keys of a built-in type, you get this for free. For user-defined types, there are two alternatives -- either write a custom comparison function class, or override operator< in the key and use less<T> where T refers to the key class. In this article, I show how to devise a comparison function class.

Implementing the Variant Map

First, I'll implement the key class. This class is pretty straightforward; it has two attributes, the record format and the field prefix. I also define a copy and default constructor, an assignment operator, a default destructor, and a constructor that takes arguments used to initialize the two attributes. The STL map class requires that the first four functions be explicitly defined. The code for this class (CKey) is in the globals.h header file, shown in Listing 2.

The comparison function class is also simple if you examine the key's attributes separately. The code for this class is also in globals.h, just below the implementation of the CKey class. This class has just one member: operator(). Defining operator() enables STL to use this class as a function object. This implementation of operator() works as follows: first, it compares the record format attribute for each key and returns TRUE or FALSE, depending on which is the greater. If the values for the record formats are equal, then operator() examines the prefix and returns TRUE or FALSE depending on which is the greater.

The value class CValue is straightforward as well. It has two attributes, the offset and the type indicator. The type indicator will come in handy when it is time to copy the value into the pointer location determined by the offset. If the value is a string, the program uses strcpy. Otherwise, it converts the value into a number using atoi, atof, or some other appropriate conversion and then copies it to the target field. This class also meets the prerequisite type requirements by defining the default and copy constructors, default destructor, and assignment operator.

To populate the map, I declare an array of CKey/CValue pair objects and then instantiate the map like this:
mymap m(array, array + 7);
mymap is a typedef for map<CKey, CValue, less_CKey>. The above statement calls a map constructor that takes two pointers to designate an initialization range. This invokes a map initializer that inserts each pair element in the array (array) into the map. Another alternative would be to declare and define each pair and insert it into the map one at a time.

I created the File class to encapsulate the reading and managing of variant record files; see file.h and file.cpp in Listings 3 and 4, respectively. File is responsible for getting each complete record from the file. To do this, File has to read each line in the record, look up the offset from the map based on the record format and field prefix, copy the value into the corresponding structure member to which the offset points, and return the populated structure. To look up the offset in the map, File uses the iterator j, declared in mapdef.cpp (Listing 5) , and calls the map's find() function with the key j = m.find(key);. find returns the Key/Value pair object. The pair_type template class is a typedef for pair<CKey, CValue>, which defines the members first and second, that contain the key and value objects, respectively.

To see all this in action, take a look at the little driver program (Listing 6) I implemented in Visual C++ 4.1. It instantiates a File object and calls its GetRecord member function until all records have been read. The driver prints out each populated record structure that GetRecord returns. The output appears at the bottom of Listing 6.

Other Uses for this Technique

I've had other occasions to use this scheme -- if only I'd been able to get my hands on STL. For instance, I had a task to populate a flat structure with inputs from a front-end that could only pass it one field at a time -- it did not have the capability to pass an entire populated structure. I then used this flat structure to instantiate objects in the back-end for processing. At the time, STL was not available. The field IDs were constant, consecutive integers, so I created an array of structures and could access the array using the field ID as the index. (We avoided using compiler-specific map classes, to maintain portability.) Anyway, having the STL support would have been nice.

Conclusion

As you may have noticed, using STL didn't get me completely away from the dreaded switch statement. Function File::GetRecord (Listing 4) still needs to do some processing on format types to populate a variant record. The difference between this switch statement and the "brute force" switch is subtle, but important. The brute force switch requires a separate case clause for every format type, and a separate case under each format for every field type. The improved switch requires a separate case clause only for each field type. This technique basically splits the handling of formats and the handling of field types into two separate, manageable chunks. The format types are more or less automatically handled by modifying the struct sVariantRecord, and perhaps adding a few lines to mapdef.cpp. (Adding a format requires adding four lines to mapdef.cpp; modifying a format requires no changes to this file.) Field types, which are far less likely to change over the life of an application, are handled in the switch statement.

I suppose I could have pulled this off in C using a couple of associative arrays. But STL let me program intuitively, by defining Key/Value pair objects and comparison objects. It was just too portable and elegant a solution to resist.

Reference

Mark Nelson, C++ Programmer's Guide to the Standard Template Library. (IDG Books Worldwide, Inc., 1995).

Linda Kaparek has what she considers the ultimate job -- developing a Windows-based mass-market software package for a large midwestern financial corporation while working from home. She is pursuing an M.S. in Computer Science at UMass-Lowell, having obtained an M.S. in Statistics in 1993. She has been in software development for 11 years, the most recent three years in MS Windows/C++.