November 2001 C++ Experts Forum/The (B)Leading Edge

The (B)Leading Edge: Using the XDR_Stream Class, Part II

Jack W. Reeves

Backtracking

In this installment of "The Bleading Edge," I had intended to continue the examination of my new XDR_Stream class and look at some of the other enhancements I found useful to add to the XDR_Stream library. Before I can do that, I have to make a confession. The code in the previous column is wrong. It is not just a little wrong either — it is so wrong that absolutely nothing should work. Unfortunately (and I use that term advisedly), it does work — and not just on one platform, but on all the platforms where I could get it to compile [1]. In fact, I was growing quite pleased with how well it seemed to be working. At this point, a little explanation is probably in order, especially since I think this is a fairly important point.

When I first created XDR_Stream, my XDR_Char was a POD (plain old data) [2] struct. This meant it had no constructors, destructors, virtual functions, etc. As such, it met the requirements for a "character" type that could be used to specialize the templates in the IOStreams library. When I first tried to compile a test program of my XDR_Stream class, I got a compile error deep inside the IOStreams library. The error was located in a function named widen. I checked, and widen is a member function of basic_ios. It was being used to initialize the "fill" character that is maintained by basic_ios. This initialization was part of the init function invoked by an XDR_Stream constructor. The compiler was complaining that there was no way to convert the regular default fill character (a blank) into an XDR_Char.

Since XDR_Stream I/O operations never make any use of the fill character, I didn't really care how it got initialized (or even whether it was initialized or not), so I just looked for a quick way to get rid of the compile error. There was a bold hint in the library code where the error occurred that my XDR_Char type needed a converting constructor that would take a regular char. With the hint staring me in the face, I added two constructors to XDR_Char, the necessary converting constructor and a default constructor. I was not too happy about doing this since it meant that my XDR_Char type no longer qualified as a POD struct, which in turn meant that my basic_ios specialization was now into the realm of "undefined behavior." On the other hand, my code now compiled, and I figured I knew enough about the real requirements for a character type used to specialize an IOStreams template that I could get away with a little undefined behavior. Soon, I was too busy with other issues to worry about it.

About the time I finished the last column, release 3.0 of the GNU Compiler Collection became available. I was really looking forward to this since release 3.0 has the new Standard C++ library. (The previous release had still been using a non-templatized version of IOStreams, so I had not been able to try my XDR_Stream library with it.) Shortly after I sent in the column, I sat down to try my XDR_Stream library out with the new G++ compiler. First off, my code compiled without even a warning. I was quite pleased by this. Then I tried to run my first test program, and it blew up in my face — segmentation fault. After spending some time with the debugger, I determined that the fault was occurring in the basic_ios::init function when it was trying to initialize the fill data member.

Now, I am much too paranoid to assume it was just coincidence that two different Standard C++ libraries would both give me problems at basically the same place. I figured there was something wrong with my code, no matter how well it seemed to work on two other platforms — two platforms that I now realized both used the same standard library. So I first went to the source, the ISO C++ Standard, to see if I could figure out what exactly should be happening. The Standard clearly states that init will initialize the fill member to widen(' '). I knew that much already. So I looked at the definition of widen. Unfortunately, it says:

Returns: use_facet< ctype<char_type> >(getloc()).widen(c);

I was disgusted. I didn't have to check (although I did) to know that use_facet should throw an exception if the requested facet is not in the specified locale. Since I had figured that XDR_Stream did not need any locale-specific information in order to do its formatting, I had not bothered to create a ctype facet for XDR_Char. Much to my chagrin, I was just now discovering that I needed one.
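The required behavior is easy to check in isolation. The following is a minimal sketch, not taken from the XDR_Stream sources: DemoFacet is a hypothetical facet that stands in for ctype<XDR_Char>, and the helper name missing_facet_throws is likewise made up for illustration. On a conforming library, asking use_facet for a facet that the locale does not contain should throw std::bad_cast.

```cpp
#include <cassert>
#include <cstddef>
#include <locale>
#include <typeinfo>

// Hypothetical facet that nobody has installed; it plays the role that
// ctype<XDR_Char> plays in the text.
struct DemoFacet : std::locale::facet {
    static std::locale::id id;
    explicit DemoFacet(std::size_t refs = 0) : std::locale::facet(refs) {}
    char blank() const { return ' '; }
};
std::locale::id DemoFacet::id;

// Returns true if use_facet reports the missing facet by throwing bad_cast.
bool missing_facet_throws(const std::locale& loc) {
    assert(!std::has_facet<DemoFacet>(loc));  // precondition for this demo
    try {
        std::use_facet<DemoFacet>(loc);       // should throw: facet is absent
    } catch (const std::bad_cast&) {
        return true;
    }
    return false;
}
```

Calling missing_facet_throws(std::locale::classic()) is expected to return true on a library that follows the Standard here.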

My first reaction was to be triply upset. First, I was upset with the C++ Standard for requiring something like this special facet just to initialize the basic_ios class. If it needed to be something that could be customized, why not have the widen (and narrow) functions be part of the character traits type that has to be provided for any character type used to specialize basic_ios anyway? I quickly got over this pique: how you convert one character type to another depends not only on the data types involved, but also on the actual character set in use. Providing this information is what locales are for.

Next, I was upset with both libraries. Both of them are incorrect. The first actually works with incorrect code, while the second gets a segmentation fault instead of throwing the required exception. Armed with the knowledge of what should be happening, I was quickly able to find the bugs in both libraries. I couldn't get too upset about the first library, however. After all, its bug allowed my code to work. I suspect that the library's implementers may have felt the same way I did about the uselessness of having to create and install a special facet just to be able to initialize basic_ios. Whatever the reason, they just bypassed the call to use_facet in the basic_ios::init function and invoked the default behavior of the ctype template's widen function directly.

The second library was slightly more annoying. My incorrect code would not run using that library, which overall is more correct, but getting a segmentation fault instead of the correct exception is still annoying.

In any case, I couldn't get too upset at either library. This is not in the "lots of people use it every day, so why wasn't it tested properly" category. In fact, it is pretty far out on the bleeding edge. (There is a reason this column has the name that it does.) Besides, my own code was also wrong. This left me primarily upset at myself for not having thoroughly investigated the requirements for specializing basic_ios. I was especially annoyed that there had been at least one compiler-generated hint, which I had ignored.

I have gone into this explanation because it provides a good object lesson about something that I (and many other writers) harp about — the need to ensure your code does not exhibit undefined behavior. In this case, I think the code should have had well-defined behavior — an exception — but that just makes the point more valid. It is easy to assume that undefined behavior will always lead to something like a segmentation fault, but it doesn't have to. As this example clearly shows, undefined behavior can interact with other bugs or other code that also has undefined behavior to result in something that appears to work just fine. Unfortunately, as is also shown by this example, it may work just fine on only one platform or with only one particular version of the library, or even with only a specific set of compiler options. When the code gets ported later, its behavior may change. Since ports are often done by people other than the original developer, figuring out what is wrong can turn into a major debugging effort.

You may be thinking "If Jack can make this kind of mistake, and both libraries also got it wrong, what hope do I have?" First, let's admit that the ordinary programmer probably isn't going to be doing stuff like this. But that is beside the point. People like me spend time creating libraries like XDR_Stream in an attempt to make other people's programming easier. If we do our jobs right, then ordinary programmers do not have to worry about such details, but there will always be details that have to be worried about. The real point of my baring my soul like this is to emphasize the need to be aware of the problem. Programmers dance with undefined behavior every day. Practically every API in existence has some requirements that have to be met but are not checked. (The STL has a lot of examples.) The only real defense is to actually know what you are doing, and that means study and practice. Obviously, your risks go up when you are doing something new. This is the time to take it slow and make sure you understand the details. A little extra effort in getting it right can pay big dividends down the road.

In order to get my code to work correctly, I had to move it back into the realm of defined behavior and work around the bugs in the two libraries. To do this, I first made my XDR_Char a synonym for a built-in type — uint32. This makes it a POD type as required by the Standard. It also works around the bug in the first library since the compiler will now accept a statement of the form:
XDR_Char(c)
as a request to cast c to be an XDR_Char [3]. Bypassing the bug in the second library just involved fixing the code in general. This meant specializing the ctype facet for XDR_Char and installing it into the global locale. The latter is handled by the XDR_Stream constructor. It uses has_facet< ctype<XDR_Char> >(getloc()) to see if the facet has already been installed. If not, it creates and installs the facet. After this, the basic_ios init function should now be able to correctly initialize its fill data element.
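The check-then-install step can be sketched as follows. This is only an illustration under stated assumptions: FillFacet is a hypothetical facet standing in for the ctype<XDR_Char> specialization, and with_fill_facet is an invented helper, not a function from the XDR_Stream library. It uses has_facet to test for the facet and the std::locale constructor that adds a facet to a copy of an existing locale.

```cpp
#include <cassert>
#include <cstddef>
#include <locale>

// Hypothetical facet standing in for ctype<XDR_Char>.
struct FillFacet : std::locale::facet {
    static std::locale::id id;
    explicit FillFacet(std::size_t refs = 0) : std::locale::facet(refs) {}
    char widen(char c) const { return c; }  // trivial "conversion" for the demo
};
std::locale::id FillFacet::id;

// Returns a locale that is guaranteed to contain FillFacet.
// If loc already has the facet it is returned unchanged.
std::locale with_fill_facet(const std::locale& loc) {
    if (std::has_facet<FillFacet>(loc)) {
        return loc;
    }
    // The new locale takes ownership of the facet (refs == 0).
    std::locale result(loc, new FillFacet);
    assert(std::has_facet<FillFacet>(result));
    return result;
}
```

A stream constructor could then pass the result to std::locale::global or imbue it before init runs; which of the two is appropriate depends on the library being targeted.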

Reader feedback

This leads me to a similar topic. One reader emailed me a question about why I decided to create XDR_Stream as a class directly instead of explicitly specializing basic_iostream and its siblings. His reasoning was that if I did the latter, then such things as file-based and memory-based XDR streams would come along for free by just instantiating the templates for basic_fstream, basic_stringstream, etc. The simple answer is that while I originally considered specializing basic_istream, basic_ostream, and basic_iostream, I did not think it through. This is another example of the need to explore and understand all the options when venturing into new territory.

Having now explored it more thoroughly, I am not sure it will work. Consider char_traits. According to the Standard, the declaration:
template<class charT> struct char_traits;
has to be available to support explicit specialization. In addition, the two explicit specializations:
template<> struct char_traits<char>;
template<> struct char_traits<wchar_t>;
have to be provided. Beyond that, the Standard only specifies the requirements for the members of struct char_traits. My interpretation of this has always been that an implementation is not required to provide an actual definition of struct char_traits that can be instantiated. This means that every other character type that needs to specialize char_traits has to have an explicit specialization provided by the user. Nevertheless, every standard library implementation that I currently use does provide a definition of template struct char_traits. This means that such things as basic_string<unsigned char> will actually compile and work on these platforms without having to provide an explicit specialization of char_traits<unsigned char>, but I think this falls into the undefined behavior category.

Library vendors can and do provide facilities in their libraries that go beyond what the Standard requires. Furthermore, the Standard is clear that library vendors are free to provide an implementation in any manner that meets the required behavior. The Standard says nothing about any protected interface in the definition of basic_iostream<> and its relatives; it only defines the public interface. Nor does it place any restrictions on how a class specified as a derived class in the Standard can use facilities provided by its base class. An implementation can add protected member functions to a base class such as basic_iostream. Naturally, the implementation of derived classes such as basic_fstream can use the implementation-defined extensions. Since basic_iostream<XDR_Char> would be an explicit specialization, it would not even be providing all of the same public member functions of template class basic_iostream, let alone support any implementation-defined extensions.

Therefore, I am forced to reluctantly conclude that if I provide XDR_Stream as an explicit specialization of basic_iostream<XDR_Char>, then behavior of any Standard classes derived from that explicit specialization would be undefined. If basic_fstream<XDR_Char> and basic_stringstream<XDR_Char> happen to work, that's nice, but not something I can depend upon being portable.

My latest definition of the XDR_Stream classes is shown in Listing 1 (XDRStream.h). The implementations of these classes are contained in several different files not shown here. All source code is available for download from the CUJ website. The source code also contains a file XIOStream.h that defines the classes XIOStreambuf, iXIOStream, oXIOStream, and XIOStream. These are concrete classes derived from the appropriate classes in XDRStream.h. The XIOStreambuf class provides a wrapper around an existing Streambuf. This way, you can create an XIOStream out of any other existing stream.

To summarize the developments up to this point:

I wanted to create a facility that would encode and decode data using the XDR protocol. Specifically, I wanted to create an object-oriented interface that could be used in place of the standard, low-level xdr_foo library functions.

I decided that I wanted this new XDR protocol class to be modeled on the IOStreams library.

I wanted to use as much functionality from the IOStreams library as I could. I hoped that the fact that the Standard IOStreams was template based and could be specialized on the type of character the stream was supposed to read/write would help.

I created an XDR_Char type and proceeded to specialize the basic_streambuf template using that new character type.

I created iXDR_Stream, oXDR_Stream, and XDR_Stream, which provide the functions that would encode/decode the basic data types specified by the XDR protocol.

All of this was described in [4].

Next I started using my new XDR_Stream class and promptly started refining it in several ways.

I cleaned up the interface and renamed some of the member functions to make their usage more intuitive.

I added several new functions to the interface to provide better support for data that was accessed via pointers. In particular, I added better support for arrays that were variable length and needed to be allocated from the free store, and for cases where pointers were used to indicate that the corresponding data elements were optional.

This was described in [5].

Finally, I mentioned that one of my goals with XDR_Stream was to create not only a mechanism that made it easier to use XDR as a communications protocol, but also to provide some facilities to support a simple object persistence mechanism. Up to this point, everything I have described, and all the refinements that I have made, have been equally applicable to both the communications and the persistence domains. What I am going to present in the rest of this column are a couple of enhancements that are primarily added to support the object persistence role.

To repeat my summary from the last column, pointers are often used for four reasons:

when array data has variable lengths that can only be determined at run time

when data is optional

when reference semantics (instead of value semantics) are required

when polymorphism is required

In the last column I discussed enhancements to XDR_Stream for the first two cases. The rest of this column discusses the last two cases. I will continue with the same example that I was using in the previous column: a simple contact database. Please see [5] for more details.

Reference Semantics

Sometimes pointers are used because reference semantics are required. This is true for a lot of complex data structures such as trees. It can also be the case that it is simply desirable that all references to a value be to a single copy of that value. In our example, let us suppose that many of the Contacts have the same City and State as part of their address. To save space, that part of Address is changed to use reference semantics:
struct Address {
    std::string    street1;    
    std::string    street2;    
    CityState*     cityState;    
    char           zip[5];    
};
Now things start to get a little interesting. There are lots of different ways that reference semantics can be encoded. A linked list can just serialize the value portion of the data in order and reconstruct the links when the data is read back. Arbitrarily complex networks are only slightly more difficult.

Conversely, for something like the above, it might be reasonable just to treat the reference as a value and serialize the CityState object as part of every Address. When the objects are decoded, the CityState pair can be checked to see if it is a duplicate and eliminated if so.

There is obviously no single solution for dealing with reference semantics. Nevertheless, I decided that it might be useful for XDR_Stream to provide some basic capabilities to support common situations. The most common situation (that I could think of) involves substituting a unique object ID for the pointer. Somewhere else, an association between the object ID and its value has to be established.

I decided to add a dictionary to the oXDR_Stream. This is class ObjToIdMap. This map can be accessed via the member function:
obj2id
It provides three operations:
long find(const void* obj);
takes a pointer to an object, looks it up in the map, and returns the ID for that object. Zero is an invalid ID and is returned if the object pointer is not in the map.
pair<long, bool> insert(const void* obj);
inserts an object pointer in the map, assigning it the next sequential ID. The return value is the ID assigned to that object and a flag to indicate whether the ID was newly assigned or not (duplicate object pointers are not allowed).
long operator[](const void* obj);
The index operator will return the object ID for the pointer, inserting a new one into the map if necessary.
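To make the three operations concrete, here is a minimal sketch of a map with the same interface. It is not the implementation shipped with XDR_Stream; the use of std::map and the member names ids_ and last_id_ are assumptions made for illustration.

```cpp
#include <cassert>
#include <map>
#include <utility>

// Sketch of an object-pointer-to-ID dictionary with sequential IDs.
class ObjToIdMap {
public:
    // Returns the ID for obj, or 0 (the invalid ID) if obj is unknown.
    long find(const void* obj) const {
        std::map<const void*, long>::const_iterator it = ids_.find(obj);
        return it == ids_.end() ? 0 : it->second;
    }

    // Assigns the next sequential ID to obj unless it already has one.
    // The bool is true only when a new ID was assigned.
    std::pair<long, bool> insert(const void* obj) {
        long existing = find(obj);
        if (existing != 0) {
            return std::make_pair(existing, false);
        }
        ids_[obj] = ++last_id_;
        assert(last_id_ > 0);  // IDs start at 1; 0 stays reserved as invalid
        return std::make_pair(last_id_, true);
    }

    // Returns the ID for obj, inserting it first if necessary.
    long operator[](const void* obj) { return insert(obj).first; }

private:
    std::map<const void*, long> ids_;
    long last_id_ = 0;
};
```

Because the keys are addresses, the IDs are only meaningful for as long as the registered objects stay alive and are not moved.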

This map provides a way of associating objects with unique IDs on a per stream basis. In a simple case, when a new object ID is created, both the object ID and the object value can be encoded into the stream. Thereafter, when a reference to the same object is seen again, only its ID is encoded into the stream. Using our example, I might have:
oXDR_Stream& operator<<(oXDR_Stream& oxs, const Address& addr) {
  oxs << addr.street1 << addr.street2;
  pair<long,bool> rtn = oxs.obj2id().insert(addr.cityState);
  oxs << rtn.first;
  if (rtn.second) oxs << *addr.cityState;
  oxs.vput_string(addr.zip, sizeof(addr.zip));
  return oxs;
}
Obviously, I must have an operator<< function that can encode the contents of a CityState object.

On the input side, I have a corresponding IdToObjMap (and the id2obj accessor function). It provides similar functions.
void* find(long id);    
takes an object ID and returns the pointer to that object, if one exists. A null pointer is returned if the object ID has no corresponding object.
pair<void*, bool> insert(long id, void* obj);
will insert the object ID into the map along with the pointer. The pointer can be null at this point. The return value is the pointer associated with the object ID and a flag that indicates whether the object ID was newly inserted (the standard STL convention is followed: true indicates that insert succeeded in adding a new ID to the map).
void*& operator[](long id);
The index operator will look up the ID, inserting it if necessary, and return a reference to the pointer that is associated with the object ID. Since a reference is returned, this function can be used to change the pointer associated with an object ID.
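The input-side map can be sketched in the same spirit. Again, this is an illustrative guess at one possible implementation built on std::map, not the code from the library; the member name objs_ is invented.

```cpp
#include <cassert>
#include <map>
#include <utility>

// Sketch of an ID-to-object-pointer dictionary.
class IdToObjMap {
public:
    // Returns the pointer registered for id, or null if there is none.
    void* find(long id) const {
        std::map<long, void*>::const_iterator it = objs_.find(id);
        return it == objs_.end() ? nullptr : it->second;
    }

    // Adds (id, obj) if id is new. The returned pointer is whatever is
    // associated with id afterwards; the bool is true only for a new ID.
    std::pair<void*, bool> insert(long id, void* obj) {
        assert(id != 0);  // zero is the invalid ID
        std::pair<std::map<long, void*>::iterator, bool> r =
            objs_.insert(std::make_pair(id, obj));
        return std::make_pair(r.first->second, r.second);
    }

    // Looks up id, inserting a null pointer if needed, and returns a
    // reference so the caller can rebind the pointer.
    void*& operator[](long id) { return objs_[id]; }

private:
    std::map<long, void*> objs_;
};
```

Note that insert leaves an existing entry untouched, which is what lets a decoder distinguish "already seen" from "first occurrence".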

If I have the encode function above, my corresponding decode function might look something like the following:
iXDR_Stream& operator>>(iXDR_Stream& ixs, Address& addr) {
  ixs >> addr.street1 >> addr.street2;
  long id;
  ixs >> id;
  pair<void*,bool> rtn = ixs.id2obj().insert(id, 0);
  if (not rtn.second) {
    addr.cityState = static_cast<CityState*>(rtn.first);
  } else {
    addr.cityState = new CityState;
    ixs >> *addr.cityState;
    ixs.id2obj()[id] = addr.cityState;
  }
  ixs.vget_string(addr.zip, sizeof(addr.zip));
  return ixs;
}
Again, after some contemplation, I decided that I could simplify things by adding a little bit of capability to the XDR_Stream itself. In particular, I wanted to hide the use of the ObjToIdMap and IdToObjMap classes. So, on the oXDR_Stream side I added the function:
bool put_objId(oXDR_Stream&, const void* obj);
This looks up the pointer, gets the object ID, and encodes it in the stream. The return value indicates whether the object ID is new. With this function, my first example can be reduced to:
oXDR_Stream& operator<<(oXDR_Stream& oxs, const Address& addr) {
  oxs << addr.street1 << addr.street2;
  bool isNew = put_objId(oxs, addr.cityState);
  if (isNew) oxs << *addr.cityState;
  oxs.vput_string(addr.zip, sizeof(addr.zip));
  return oxs;
}
On the input side, I had to add two functions:
void* get_objId(iXDR_Stream&, long& id);    
iXDR_Stream& map_objId(iXDR_Stream&, long id, void* obj);
The first reads the object ID from the stream, looks it up in the id2obj map (inserting it if necessary), and returns the resulting object pointer. It also returns the object ID through its second argument for additional use. The second function will assign the object pointer to the ID. With these two functions, I can now write my Address decoder as follows:
iXDR_Stream& operator>>(iXDR_Stream& ixs, Address& addr) {
  ixs >> addr.street1 >> addr.street2;  
  long id;
  addr.cityState = static_cast<CityState*>(get_objId(ixs,id));
  if (not addr.cityState) {    
    addr.cityState = new CityState;    
    ixs >> *addr.cityState;    
    map_objId(ixs, id, addr.cityState);    
  }
  ixs.vget_string(addr.zip, sizeof(addr.zip));
  return ixs;    
}
I am still a little concerned about the fact that the user has to remember to register the object and its ID back with the stream. I toyed briefly with the idea of using a version of the iXDR_Stream::Sentry to hold a reference to the object pointer and automatically update the id2obj map upon destruction, but I decided that was too obscure.
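The whole round trip can be tried out without the real stream classes. The sketch below is a toy model under loud assumptions: ToyOut and ToyIn are hypothetical stand-ins for oXDR_Stream and iXDR_Stream that simply hold a list of string tokens, and the three helper functions only mimic the behavior described above rather than reproduce the library's code.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

struct CityState { std::string city, state; };

// Toy stand-ins for the output and input streams (hypothetical types).
struct ToyOut {
    std::vector<std::string> tokens;
    std::map<const void*, long> obj2id;
};
struct ToyIn {
    std::vector<std::string> tokens;
    std::size_t pos;
    std::map<long, void*> id2obj;
    explicit ToyIn(const std::vector<std::string>& t) : tokens(t), pos(0) {}
    std::string next() { assert(pos < tokens.size()); return tokens[pos++]; }
};

// Writes the ID for obj; returns true if the ID was newly assigned.
bool put_objId(ToyOut& out, const void* obj) {
    long next_id = static_cast<long>(out.obj2id.size() + 1);
    std::pair<std::map<const void*, long>::iterator, bool> r =
        out.obj2id.insert(std::make_pair(obj, next_id));
    out.tokens.push_back(std::to_string(r.first->second));
    return r.second;
}
// Reads an ID; returns the mapped pointer (null for an ID not seen before).
void* get_objId(ToyIn& in, long& id) {
    id = std::stol(in.next());
    return in.id2obj[id];
}
void map_objId(ToyIn& in, long id, void* obj) { in.id2obj[id] = obj; }

// The value is written only the first time the object is seen.
void encode(ToyOut& out, const CityState* cs) {
    if (put_objId(out, cs)) {
        out.tokens.push_back(cs->city);
        out.tokens.push_back(cs->state);
    }
}
// The caller owns the returned object; repeated IDs yield the same pointer.
CityState* decode(ToyIn& in) {
    long id;
    CityState* cs = static_cast<CityState*>(get_objId(in, id));
    if (!cs) {
        cs = new CityState;
        cs->city = in.next();
        cs->state = in.next();
        map_objId(in, id, cs);  // the step the user must not forget
    }
    return cs;
}
```

Encoding the same CityState twice produces the ID and value once and then only the ID; decoding both entries yields the same pointer, which is the reference semantics the scheme is meant to preserve.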

One thing I have to note is that by adding this capability I stepped outside the realm of XDR's notation, and hence the intended use of XDR as a communications protocol. The encoding of Address has now become:
struct Address {    
  string    street1;    
  string    street2;    
  int       cityStateId;    
            ????    
  char      zip[5];    
};
???? indicates that I do not know how to describe the optionally encoded CityState object. In some sense, this is a discriminated union, where the object ID is the discriminant. This might be shown as:
union switch (int objId) {    
case ????:    
  CityState    cityState;    
default:    
  void;    
};
But, as you can see, there is still the question of what discriminant value results in the CityState object being encoded in the stream. The answer is, "a new object ID (one that has not been used in the stream before this)," but that requires programming logic and cannot be expressed in a simple data description notation.

At this point I stopped and thought about what I had so far. My intent was to use XDR as a binary representation for building a simple object persistence mechanism. If I couldn't describe what I was doing using XDR notation, then I wondered if things were getting too complicated. In a lot of cases where reference semantics are used, it is usually possible to simply substitute the object ID for the pointer. In fact, it probably makes more sense (most of the time) to encode all the objects like CityState together elsewhere in the stream (or in another stream) and not encode them inline at all.

It is also possible to describe (and encode) recursive data structures, such as linked lists, using the XDR notation. Unfortunately, in the real world, data references can be circular. When this happens, you have to break the chain somewhere, or you cannot encode the structure. Besides, if reference semantics take us outside the normal XDR notation, then polymorphic objects really mess things up.

Polymorphic Objects

The final use of pointers in C++ is to refer to polymorphic objects. In other words, we have a pointer to a base class, but the actual object probably is of a derived type. This is where it really gets interesting.

Encoding a polymorphic object into an XDR stream is not difficult. It is straightforward to write a stream insertion operator that acts like a virtual function — it just has to dispatch to a virtual encode function implemented in the derived class. Likewise, a pseudo virtual stream extraction function can be written. Of course, the problem with the latter is that you have to have created an object of the correct derived type (i.e., the same actual type as the object that encoded itself into the stream) before you can extract the value of the object from the stream. In some cases this may be possible, but in general the user of a base class does not have any idea what the actual type of the object is. Therefore, solving the general case means that the object has to somehow encode its type into the stream along with its state. The extraction function has to be able to decode the type, create the correct type of object, and finally read its state from the stream.
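The "virtual" insertion operator mentioned above is a small idiom worth seeing on its own. In this sketch std::ostream stands in for oXDR_Stream, and the Address hierarchy, its members, and the to_text helper are hypothetical; only the dispatch pattern is the point.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Base class exposes a virtual encode; derived classes write their state.
struct Address {
    virtual ~Address() {}
    virtual void encode(std::ostream& os) const = 0;
};
struct USAddress : Address {
    std::string zip;
    explicit USAddress(const std::string& z) : zip(z) {}
    void encode(std::ostream& os) const override { os << "US:" << zip; }
};
struct UKAddress : Address {
    std::string postcode;
    explicit UKAddress(const std::string& p) : postcode(p) {}
    void encode(std::ostream& os) const override { os << "UK:" << postcode; }
};

// One non-member inserter for the whole hierarchy; it behaves as if it
// were virtual because it forwards to the virtual member function.
std::ostream& operator<<(std::ostream& os, const Address& a) {
    a.encode(os);
    return os;
}

// Convenience helper for the demo: encode through a base-class reference.
std::string to_text(const Address& a) {
    std::ostringstream os;
    os << a;
    assert(!os.fail());
    return os.str();
}
```

The extraction side cannot be handled this simply, for the reason given above: there is no object to dispatch on until the right derived type has been created.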

Again, one size does not fit all, so I did not try to come up with a totally general solution. As above, I added some support for the general case to the XDR_Stream, but I leave it up to individual clients to use the support if it makes sense, or to solve the problem in some other way if that is more appropriate.

On the output side, the need is to encode the object type into the stream. My approach is very similar to what I used above. I added a TypeToId map to the oXDR_Stream. This has a find function:
long find(const string& type);
which takes a string representing a type name and returns a unique ID for it. The insert and index functions look as you have come to expect:
pair<long, bool> insert(const string& type);
long operator[](const string& type);
On the input side, there is the corresponding IdToTypename map.

As with the reference semantics, I added three functions to the XDR_Stream interface to make things easier on the user:
bool   put_typeId(oXDR_Stream&, const string& type_name);
string get_typeId(iXDR_Stream&, long& id);
void   map_typeId(iXDR_Stream&, long id, const string& type);
The first takes a string representing a type name and looks up its ID in the type2id map. It inserts the string in the map with a new ID if necessary. It then encodes the ID into the stream.

The other functions perform the opposite operations on an input stream. The second reads a type ID from the stream, looks up the corresponding string in the id2type map, and returns it. If the ID is not in the map, a new one is inserted along with an empty string. The third function creates an association between a type name and its ID.
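The protocol these three functions imply (write the ID every time, write the name only on first use) can be modeled with a toy stream. As before, ToyOut and ToyIn are hypothetical token-list stand-ins for the real stream classes, write_type and read_type are invented wrappers, and the function bodies are an assumption about behavior, not the library source.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy stand-ins for the output and input streams (hypothetical types).
struct ToyOut {
    std::vector<std::string> tokens;
    std::map<std::string, long> type2id;
};
struct ToyIn {
    std::vector<std::string> tokens;
    std::size_t pos;
    std::map<long, std::string> id2type;
    explicit ToyIn(const std::vector<std::string>& t) : tokens(t), pos(0) {}
    std::string next() { assert(pos < tokens.size()); return tokens[pos++]; }
};

// Writes the ID for type_name; returns true if the ID is new.
bool put_typeId(ToyOut& out, const std::string& type_name) {
    long next_id = static_cast<long>(out.type2id.size() + 1);
    std::pair<std::map<std::string, long>::iterator, bool> r =
        out.type2id.insert(std::make_pair(type_name, next_id));
    out.tokens.push_back(std::to_string(r.first->second));
    return r.second;
}
// Reads an ID; returns the mapped name (empty for an ID not seen before).
std::string get_typeId(ToyIn& in, long& id) {
    id = std::stol(in.next());
    return in.id2type[id];
}
void map_typeId(ToyIn& in, long id, const std::string& type) {
    in.id2type[id] = type;
}

// Writer: the name follows the ID only the first time.
void write_type(ToyOut& out, const std::string& type) {
    if (put_typeId(out, type)) out.tokens.push_back(type);
}
// Reader: an empty result means the name is next in the stream.
std::string read_type(ToyIn& in) {
    long id;
    std::string type = get_typeId(in, id);
    if (type.empty()) {
        type = in.next();
        map_typeId(in, id, type);
    }
    return type;
}
```

Writing "USAddress", "UKAddress", "USAddress" in that order yields five tokens, since the second occurrence of "USAddress" costs only its ID.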

For an example of how this gets used, let's assume that our Contact list can have different types of Addresses. Perhaps there is a USAddress, a UKAddress, and so on. Our inserter for Contact has not changed much at all:
oXDR_Stream& operator<<(oXDR_Stream& oxs, const Contact& obj) {
  return oxs << obj.name << obj.addr << obj.phone;    
}    
Note however that even though addr is a pointer to Address, I do not dereference it. In this case, the inserter function for Address takes the pointer instead of a reference to the object.
oXDR_Stream& operator<<(oXDR_Stream& oxs, const Address* obj) {    
  string type = FactoryCollection<Address>::find(typeid(*obj));    
  bool isNew = put_typeId(oxs, type);    
  if (isNew) oxs << type;
  obj->encode(oxs);    
  return oxs;    
}
I will explain the FactoryCollection template in a moment. For now, just accept that its find function takes a reference to a type_info object and returns a string that identifies that type. In the above, the put_typeId call converts that string to an ID unique to the stream and encodes that ID in the stream. If the ID was new, the type name is also encoded into the stream. The next line dispatches to the derived class the responsibility for encoding its state into the stream.

On input, things are almost the direct inverse:
iXDR_Stream& operator>>(iXDR_Stream& ixs, Contact& obj) {    
  return ixs >> obj.name >> obj.addr >> obj.phone;    
}
And the real interesting stuff is here:
iXDR_Stream& operator>>(iXDR_Stream& ixs, Address*& obj) {
  long id;
  string type = get_typeId(ixs, id);
  if (type.empty()) {
    ixs >> type;
    map_typeId(ixs, id, type);
  }
  obj = FactoryCollection<Address>::find(type).make();
  obj->decode(ixs);
  return ixs;
}
Here, the get_typeId function reads the ID from the stream and looks up the corresponding string. If the ID is a new one, the string is also read from the stream and both are inserted into the id2type map. Next, the string is passed to the find function of the FactoryCollection. This returns a reference to a factory object. That object in turn has a make function that creates an object of an Address-derived class, the one corresponding to the type-name string. Once an object of the correct type has been created, the code dispatches to that object to extract its state from the stream.

All in all, this is not exactly trivial, but it is not too complicated either. It does help to understand some basic software patterns [7] such as Factory, Abstract Factory, and Prototype, however.

Factory Collection

The heart of the functionality above is the FactoryCollection template, and it is not really specific to XDR_Stream. The problem of how to turn a type name into an object is a general one, so I came up with a general solution some time back. Most such solutions are variants of the Prototype pattern, and so is this — I just find it more general purpose than the basic Prototype pattern.

First off, FactoryCollection is itself a mono-state [7] object. This means that all of its data and members are static. This is a variation on the Singleton pattern [6], but again simpler. FactoryCollection is a template that has to be instantiated with the type of a base class. As you would deduce from the name, it then is expected to contain factory methods for the derived classes of the base. There are three member functions of FactoryCollection:
string find(const type_info&);    
FactoryCollection::Factory& find(const string& type);    
template<typename Derived> void register_type(const string& type);
The first function you saw used above in operator<<. It looks up a type and returns a string. At this point you might wonder: if I want a string from a type, why not just use the name member function of the Standard class type_info? The simple reason is that the string that is returned by type_info::name is implementation defined. This means that it is not portable across platforms. Furthermore, that name reflects the compiler's view of the type. It includes all the namespaces, template arguments, etc. that make up the full type name. If you change something in the slightest, then that string will change. I have found it is much more useful for the application to assign appropriate names to the types.

The find function throws an exception (invalid_argument) if the type_info object has not been registered. (There is a version find(const type_info&, nothrow) that just returns an empty string, for use when exceptions are not appropriate.)

The second function you also saw above. It takes the string representing the derived type name and looks up and returns a reference to the appropriate factory object. This function will likewise throw an exception if the argument has not been registered. (There is a nothrow version of this that returns a null pointer if the factory object does not exist.)
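The throwing and nothrow flavors of the string lookup can be sketched as a pair of overloads distinguished by a std::nothrow_t tag. The Registry class and its add function below are hypothetical, non-template stand-ins invented for this illustration, and the empty Factory struct is only a placeholder for the real nested type; the actual signatures in FactoryCollection may well differ.

```cpp
#include <cassert>
#include <map>
#include <new>        // std::nothrow_t, std::nothrow
#include <stdexcept>
#include <string>

struct Factory { virtual ~Factory() {} };   // placeholder for the real factory base

class Registry {                            // hypothetical stand-in
public:
    // The registry does not own the factories in this sketch.
    static void add(const std::string& type, Factory* f) { table()[type] = f; }

    // Nothrow flavor: a null pointer signals "not registered".
    static Factory* find(const std::string& type, const std::nothrow_t&) {
        auto it = table().find(type);
        return it == table().end() ? nullptr : it->second;
    }

    // Throwing flavor: built on top of the nothrow one.
    static Factory& find(const std::string& type) {
        Factory* f = find(type, std::nothrow);
        if (!f) throw std::invalid_argument("unregistered type name: " + type);
        return *f;
    }

private:
    static std::map<std::string, Factory*>& table() {
        static std::map<std::string, Factory*> t;
        return t;
    }
};

// Returns true if looking up an unknown name throws invalid_argument.
inline bool unknown_name_throws() {
    try { Registry::find("NoSuchType"); }
    catch (const std::invalid_argument&) { return true; }
    return false;
}
```

A caller that treats an unknown name as an ordinary condition writes Registry::find(name, std::nothrow) and tests the pointer; a caller that treats it as a programming error uses the one-argument form and lets the exception propagate.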

The final function puts strings and the appropriate factory functions into the collection. Because it is a template that does not have the template argument as one of the function arguments, it has to be invoked with the derived type explicitly specified. In our Address example, we might expect to find code like the following somewhere at the beginning of the application:
typedef FactoryCollection<Address> Addresses;    
Addresses::register_type<USAddress>("USAddress");    
Addresses::register_type<UKAddress>("UKAddress");    
// and so on
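For readers who want to see how the three member functions might fit together, here is a minimal sketch of a mono-state collection. It is an assumption-laden reconstruction written for this discussion, not the code from the XDR_Stream library: the private names() and factories() helpers, the use of std::type_index and std::unique_ptr (C++11), and the country() member of the sample Address classes are all invented for the example.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <typeindex>
#include <typeinfo>

template <typename Base>
class FactoryCollection {
public:
    // Base class for every factory object stored in the collection.
    struct Factory {
        virtual ~Factory() {}
        virtual Base* make() = 0;
    };

    // Default concrete factory: Derived must be default constructible.
    template <typename Derived>
    struct TFactory : Factory {
        Base* make() override { return new Derived; }
    };

    // type_info -> application-assigned name
    static std::string find(const std::type_info& ti) {
        auto it = names().find(std::type_index(ti));
        if (it == names().end())
            throw std::invalid_argument("unregistered type");
        return it->second;
    }

    // application-assigned name -> factory object
    static Factory& find(const std::string& type) {
        auto it = factories().find(type);
        if (it == factories().end())
            throw std::invalid_argument("unregistered type name: " + type);
        return *it->second;
    }

    template <typename Derived>
    static void register_type(const std::string& type) {
        names()[std::type_index(typeid(Derived))] = type;
        factories()[type].reset(new TFactory<Derived>);
    }

private:
    // Mono-state: all data is static, held in function-local statics
    // so that it is initialized on first use.
    static std::map<std::type_index, std::string>& names() {
        static std::map<std::type_index, std::string> m;
        return m;
    }
    static std::map<std::string, std::unique_ptr<Factory>>& factories() {
        static std::map<std::string, std::unique_ptr<Factory>> m;
        return m;
    }
};

// Hypothetical hierarchy used only to exercise the sketch.
struct Address { virtual ~Address() {} virtual std::string country() const = 0; };
struct USAddress : Address { std::string country() const override { return "US"; } };
struct UKAddress : Address { std::string country() const override { return "UK"; } };

inline std::string round_trip_demo() {
    typedef FactoryCollection<Address> Addresses;
    Addresses::register_type<USAddress>("USAddress");
    Addresses::register_type<UKAddress>("UKAddress");

    UKAddress original;
    const Address& ref = original;
    std::string name = Addresses::find(typeid(ref));   // looks up the dynamic type
    std::unique_ptr<Address> copy(Addresses::find(name).make());
    return name + ":" + copy->country();
}
```

Note that the lookup by type_info has to be given typeid of a reference (or dereferenced pointer) to the base class, so that the dynamic type rather than the static type is what gets looked up.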
Those are the basics, and I hope they make reasonable sense (or that the code makes things clearer). I also want to point out that FactoryCollection can be specialized in some useful ways.

FactoryCollection contains two nested types: Factory and TFactory. The first is the base class for all the factory objects that are put into a collection. It provides a virtual make function that has to be overridden by derived classes. The TFactory class is a template that derives from Factory. It provides a concrete make method that allocates an object of the type it is instantiated with. This is fine as far as it goes. The problem is that the default version of TFactory must be used with classes that have a default constructor. This is usually true, but not always. As one final example, suppose that our Address class hierarchy had derived classes with a constructor that took an iXDR_Stream object as an argument. This way we could construct and load the object in one statement. In order to use this constructor, we have to explicitly specialize FactoryCollection::Factory and FactoryCollection::TFactory like so:
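template<>
struct FactoryCollection<Address>::Factory {
  virtual ~Factory();                        // defined out of line
  virtual Address* make(iXDR_Stream&) = 0;   // construct from the stream
};

template<>
template<typename T>
struct FactoryCollection<Address>::TFactory
  : FactoryCollection<Address>::Factory {
  virtual Address* make(iXDR_Stream&);
};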
And we provide the following implementations:
// No template<> here: Factory is an explicitly specialized member class.
FactoryCollection<Address>::Factory::~Factory()
{}

template<>
template<typename T>
Address* FactoryCollection<Address>::TFactory<T>::make(iXDR_Stream& ixs)
{ return new T(ixs); }
With this specialization, we can write our Address extractor as:
   
iXDR_Stream& operator>>(iXDR_Stream& ixs, Address*& obj) {    
  string type = get_typeId(ixs);    
  obj = FactoryCollection<Address>::find(type).make(ixs);    
  return ixs;    
}
In the simple case of Address above, this is probably unnecessary. When an object contains pointers to other polymorphic objects that it must create as part of its own construction, however, this technique can be essential.
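The case of an object that must create other polymorphic objects while it is itself being constructed can be sketched as follows. To keep the example short and self-contained, it deliberately swaps in simpler machinery: a plain map of function pointers instead of a specialized FactoryCollection, and a text std::istream instead of iXDR_Stream. The Shape, Square, and Pair classes and the makers, make_from_stream, and read_shape functions are all hypothetical names invented for this illustration.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <memory>
#include <sstream>
#include <stdexcept>
#include <string>

struct Shape {
    virtual ~Shape() {}
    virtual int area() const = 0;
};

// Hypothetical registry: type name -> function that constructs from a stream.
typedef Shape* (*Maker)(std::istream&);
inline std::map<std::string, Maker>& makers() {
    static std::map<std::string, Maker> m;
    return m;
}

// Plays the role of TFactory<T>::make: construct T directly from the stream.
template <typename T>
Shape* make_from_stream(std::istream& in) { return new T(in); }

// Read a type name, then let the matching factory build the object.
inline Shape* read_shape(std::istream& in) {
    std::string type;
    if (!(in >> type)) throw std::runtime_error("missing type name");
    auto it = makers().find(type);
    if (it == makers().end()) throw std::invalid_argument("unknown type: " + type);
    return it->second(in);
}

struct Square : Shape {
    int side;
    explicit Square(std::istream& in) : side(0) { in >> side; }
    int area() const override { return side * side; }
};

// Owns two polymorphic children. Its constructor has to go back through
// the factory lookup to create them, which is why it needs the stream.
struct Pair : Shape {
    std::unique_ptr<Shape> first, second;   // initialized in this order
    explicit Pair(std::istream& in)
        : first(read_shape(in)), second(read_shape(in)) {}
    int area() const override { return first->area() + second->area(); }
};

inline int demo_area() {
    makers()["Square"] = &make_from_stream<Square>;
    makers()["Pair"]   = &make_from_stream<Pair>;
    std::istringstream in("Pair Square 2 Pair Square 3 Square 1");
    std::unique_ptr<Shape> s(read_shape(in));
    return s->area();   // 4 + (9 + 1)
}
```

With a default-constructing factory, Pair would have to exist in a half-built state between construction and loading; passing the stream to the constructor lets it come into existence with its children already in place.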

Well, there are the basics. I have them in place, and I am starting to use them. I will let you know how things work out in practice. In particular, in the next column, I hope to show a fairly major application of XDR_Stream. I feel that I must add the warning that although I am pleased with how XDR_Stream is working in my testing so far, I do not think that it is "ready for prime time" yet. In particular, I am concerned about the error handling — or the lack thereof — in the implementation. I shall also have more to say about that in the next column.

Notes and References

[1] Of the four platforms that I currently have access to, two still have old IOStream library implementations that are not template based. As a result, I cannot even attempt to compile the XDR_Stream on those platforms. The other two have more Standard-compliant libraries.

[2] The official Standard term for C-compatible data types.

[3] This is still not correct, since the first library ignores any custom ctype facet that might actually be provided. It does initialize the basic_ios<>::fill member to some value, however, and that is sufficient for XDR_Stream.

[4] Jack Reeves. "The (B)Leading Edge: Using IOStreams — Creating a Whole New Stream Class," C/C++ Users Journal Experts Forum, July 2001, <www.cuj.com/experts/1907/reeves.htm>.

[5] Jack Reeves. "The (B)Leading Edge: Using the XDR_Stream Class," C/C++ Users Journal Experts Forum, September 2001, <www.cuj.com/experts/1909/reeves.htm>.

[6] Erich Gamma, et al. Design Patterns (Addison Wesley Longman, 1995).

[7] Neil Harrison, et al. Pattern Languages of Program Design 4 (Addison-Wesley, 1999).

Jack W. Reeves is an engineer and consultant specializing in object-oriented software design and implementation. His background includes Space Shuttle simulators, military CCCI systems, medical imaging systems, financial data systems, and numerous middleware and low-level libraries. He currently is living and working in Europe and can be contacted via jack_reeves@bleading-edge.com.