July 2001 C++ Experts Forum/The (B)Leading Edge

The (B)Leading Edge: Using IOStreams — Creating a Whole New Stream Class

Jack W. Reeves

In this installment, I am going to continue my exploration of the Standard C++ library IOStreams class, only I am going to get a little further out in left field. (In fact, this is far enough out that it requires a very good version of the standard library — I have had trouble making this work on more than one platform.) First let me take a slight sidetrack. When I was first learning the new standard library, it rapidly became apparent that just about everything is a template. (You have probably noticed this also.) This made perfectly good sense for the containers, the algorithms, and by extension the iterators. After all, these were the components of the original Standard Template Library. It even seemed to make sense that the string class was a template that had to be specialized by its actual character type. After all, Standard C++ includes several character types.

But originally I could not really understand the value of having the IOStreams library be template based. It seemed to me that ultimately you were going to generate a byte stream, so what was the point? Obviously, it made sense to have output operations for both regular character strings and wide character strings, but that could be accomplished with member template functions without parameterizing the entire library.

When I thought about it a little more, I realized that I was thinking at too low a level. I was also forgetting some details about how templates work. IOStreams are not really about generating or reading a byte stream, but instead are an abstraction that handles formatting data to/from character representations. Since C++ has different types of characters, it makes perfectly good sense that there should be I/O mechanisms that can read/write a wchar_t stream, as well as one that can handle ordinary characters. Templates in C++ provide a way of expressing this characteristic at the abstract level.

Once I got used to the idea of an IOStream being a parameterized type, I started wondering what other character types might make sense. One of the things I have been interested in from the first time I learned IOStreams was the possibility of using the same abstraction to handle network connections. This is obviously nothing new. Lots of different programmers and different libraries have created things like SocketStreams, PipeStreams, etc. Unfortunately, there are a couple of problems with using the IOStreams abstraction for network programming. The first is that the IOStreams abstraction works pretty well for the client side of a connection, but it is missing some fundamental operations necessary for the server side. This can be handled by adding the operations to the derived IOStreams class, but most designers quickly find that it makes more sense to have something like a Socket, Listener, or Acceptor class that handles the details of receiving a connection request. Once a connection is established, then it can be associated with some related type of IOStream in any of several different ways.

There is a more fundamental problem with using IOStreams as an abstraction for network communications however. As I noted above, the basic abstraction of IOStreams is that of formatting data to/from a character stream. Unfortunately, most of the network programming I have done has involved some type of application-level protocol layered on top of the connection. Few of these protocols have been as simple as the formatting provided by IOStreams. In fact, many of the protocols that I have used have been binary protocols. As a result, I have usually found myself using a network stream basically in unformatted mode. This pretty much defeats the purpose of having an IOStream tied to a connection — if I just want to do reads and writes, an ordinary Socket class can usually handle that just fine.

One of the binary protocols that I have used is the XDR protocol. If you are not familiar with this protocol, it is the presentation layer used by the RPC mechanism. One of the characteristics of this protocol is that all data types are represented by multiples of four-byte "words." It occurred to me that these four-byte chunks of data could be thought of as the "characters" of the XDR protocol. From there I wondered if it would be possible to use the template mechanisms in the IOStreams library to create an IOStream abstraction that could handle reading and writing the XDR stream. This column describes what I have come up with so far. I have to note that I am still tweaking the code as I try this out on different platforms, and with different possible applications.

XDR Basics

The XDR protocol is defined by RFC 1832 [4] and is really pretty simple. All data comes in four-byte chunks. XDR defines encoding schemes for the following basic types:
int                             encoded in 32-bit two's complement, big endian
hyper int                       encoded in 64 bits
unsigned int
unsigned hyper int
float                           encoded in 32 bits, IEEE format, big endian
double                          encoded in 64 bits, IEEE format
quadruple                       encoded in 128 bits
enum                            encoded the same as int
bool                            encoded as 1 or 0 in int format
opaque data (fixed length)      encoded as a stream of octets, padded with nulls
opaque data (variable length)   encoded as <length><data>
array (fixed length)
array (variable length)         has <length> preceding the data
string                          encoded as <length> followed by ASCII data

XDR also defines a discriminated union, which is encoded as you would expect:

<discriminant><appropriate data>

Note that XDR does not contain a char type. I will have more to say about this below.
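
To make the integer encodings above concrete, here is a minimal sketch. It is not taken from XdrStream.cpp; the Bytes typedef and the encode_/decode_ function names are hypothetical, and it assumes a compiler that provides <cstdint> (including a 64-bit integer type, which my own platform lacks).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

typedef std::vector<unsigned char> Bytes;  // hypothetical output buffer

// Append a 32-bit value as one big-endian XDR word.
void encode_uint(Bytes& out, std::uint32_t v) {
    for (int shift = 24; shift >= 0; shift -= 8)
        out.push_back(static_cast<unsigned char>((v >> shift) & 0xFF));
}

// A signed int travels as its 32-bit two's-complement bit pattern.
void encode_int(Bytes& out, std::int32_t v) {
    encode_uint(out, static_cast<std::uint32_t>(v));
}

// A hyper is two words, most significant word first.
void encode_hyper(Bytes& out, std::int64_t v) {
    std::uint64_t u = static_cast<std::uint64_t>(v);
    encode_uint(out, static_cast<std::uint32_t>(u >> 32));
    encode_uint(out, static_cast<std::uint32_t>(u & 0xFFFFFFFFu));
}

// bool (like enum) reuses the int encoding.
void encode_bool(Bytes& out, bool b) { encode_int(out, b ? 1 : 0); }

// Read back one big-endian word starting at offset pos.
std::uint32_t decode_uint(const Bytes& in, std::size_t pos) {
    assert(pos + 4 <= in.size());
    return (std::uint32_t(in[pos]) << 24) | (std::uint32_t(in[pos + 1]) << 16) |
           (std::uint32_t(in[pos + 2]) << 8) | std::uint32_t(in[pos + 3]);
}
```

Because the bytes are produced with shifts rather than by copying memory, this sketch gives the same output on little-endian and big-endian hosts.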

XDR_Stream

Listing 1 is the header file XdrStream.h. XdrStream.cpp is the implementation of the basic data types. XdrStream_reader.cpp contains the implementation of the iXDR_Stream class. XdrStream_writer.cpp is the implementation of the oXDR_Stream class, and the bi-directional XDR_Stream class. XdrStream.cpp, XdrStream_reader.cpp, and XdrStream_writer.cpp are available for download from reeves.zip.

First let me describe the interfaces. The first thing I needed was a class to represent an XDR character.
struct XDR_Char {
    unsigned char _data[4];
    void swap_bytes();
    void pad(int);
};
Since chars on my platform are eight bits long, this struct has the correct size for an XDR character. If your platform uses something other than eight-bit characters, you will have to adjust the definition of an XDR_Char accordingly.
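
The two member functions might be implemented along the following lines. This is only a sketch: the meaning I give to the argument of pad (the number of leading bytes that hold real data) is an assumption made for illustration, and the static_assert simply states the eight-bit requirement in code.

```cpp
#include <cassert>
#include <climits>
#include <utility>

struct XDR_Char {
    unsigned char _data[4];
    void swap_bytes();
    void pad(int valid);
};

// The struct is only a correct XDR word where a char is exactly eight bits.
static_assert(CHAR_BIT == 8 && sizeof(XDR_Char) == 4,
              "XDR_Char must be exactly four octets");

// Reverse the byte order (little endian <-> big endian).
void XDR_Char::swap_bytes() {
    std::swap(_data[0], _data[3]);
    std::swap(_data[1], _data[2]);
}

// Assumed meaning: keep the first `valid` bytes and null out the rest.
void XDR_Char::pad(int valid) {
    assert(valid >= 0 && valid <= 4);
    for (int i = valid; i < 4; ++i)
        _data[i] = 0;
}
```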

In order to use this class to specialize any of the templates in the IOStreams library, it should meet certain requirements. The Standard requires that it be a POD (Plain Old Data) class. What I have shown here is a POD struct. While the actual definition of a POD struct in the C++ Standard is pretty formal, the basic idea is that it has to act like a built-in type when it comes to construction, copying, and destruction. That means no private or protected data members; no user-defined constructors, destructor, or copy assignment operator; no data members or base classes that are not POD types; and no virtual functions. It is still possible to have member functions, however. Note: while the Standard requires a POD class, a strict POD may not work, depending upon how your version of the standard library is implemented. My implementation requires a way to construct an XDR_Char from a regular char. To do this, I had to add a default constructor (XDR_Char()) and a converting constructor (XDR_Char(char x)) to my class. While this technically means that it is no longer a POD class (I call it a lightweight class [5]), it should behave like a POD for all practical purposes.

I have defined the XDR_Char class inside of namespace Util. This is my general-purpose utilities namespace. I may move this later.

The other characteristic of a type used to specialize an IOStreams template is the requirement that a traits class exists for that type. I decided to just specialize the char_traits class in the standard library. This way, I don't have to keep specifying the traits class all the time. Since I am providing a definition of a specialization of a template defined in namespace std, I have to put that definition in namespace std also.

Once I had my character type and its corresponding traits type, I could then use it to create some new IOStreams types. First, I created an XDR_Streambuf class by specializing the std::basic_streambuf template. This creates a base streambuf class that can manage writing to and reading from a buffer of XDR_Chars. As with the regular streambufs, a concrete derived XDR_Streambuf class will have to be created to actually connect an XDR_Streambuf to a source/sink of XDR_Chars.
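
The pieces described so far (character type, char_traits specialization, and a concrete streambuf) can be sketched together as follows. This is not the code from Listing 1. The choice of a 64-bit int_type, the VectorXdrBuf class, and its words member are all assumptions made for illustration, and whether it compiles cleanly will depend on your standard library.

```cpp
#include <algorithm>   // std::fill_n
#include <cstddef>
#include <cstdint>
#include <cstring>     // std::memcpy, std::memmove
#include <cwchar>      // std::mbstate_t
#include <ios>         // std::streampos, std::streamoff
#include <streambuf>
#include <string>      // primary std::char_traits template
#include <vector>

namespace Util {
struct XDR_Char { unsigned char _data[4]; };
}

namespace std {
template <>
struct char_traits<Util::XDR_Char> {
    typedef Util::XDR_Char char_type;
    typedef std::int64_t   int_type;   // wide enough for any word plus eof()
    typedef std::streampos pos_type;
    typedef std::streamoff off_type;
    typedef std::mbstate_t state_type;

    static int_type to_int_type(const char_type& c) {
        return (int_type(c._data[0]) << 24) | (int_type(c._data[1]) << 16) |
               (int_type(c._data[2]) << 8) | int_type(c._data[3]);
    }
    static char_type to_char_type(int_type i) {
        char_type c;
        for (int k = 0; k < 4; ++k)
            c._data[k] = static_cast<unsigned char>((i >> (24 - 8 * k)) & 0xFF);
        return c;
    }
    static int_type eof() { return -1; }
    static int_type not_eof(int_type i) { return i == eof() ? 0 : i; }
    static bool eq_int_type(int_type a, int_type b) { return a == b; }
    static void assign(char_type& a, const char_type& b) { a = b; }
    static bool eq(const char_type& a, const char_type& b) { return to_int_type(a) == to_int_type(b); }
    static bool lt(const char_type& a, const char_type& b) { return to_int_type(a) < to_int_type(b); }
    static int compare(const char_type* a, const char_type* b, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) {
            if (lt(a[i], b[i])) return -1;
            if (lt(b[i], a[i])) return 1;
        }
        return 0;
    }
    static std::size_t length(const char_type* s) {   // counts up to an all-zero word
        std::size_t n = 0;
        while (to_int_type(s[n]) != 0) ++n;
        return n;
    }
    static const char_type* find(const char_type* s, std::size_t n, const char_type& c) {
        for (std::size_t i = 0; i < n; ++i)
            if (eq(s[i], c)) return s + i;
        return 0;
    }
    static char_type* move(char_type* d, const char_type* s, std::size_t n) {
        if (n) std::memmove(d, s, n * sizeof(char_type));
        return d;
    }
    static char_type* copy(char_type* d, const char_type* s, std::size_t n) {
        if (n) std::memcpy(d, s, n * sizeof(char_type));
        return d;
    }
    static char_type* assign(char_type* d, std::size_t n, char_type c) {
        std::fill_n(d, n, c);
        return d;
    }
};
}  // namespace std

namespace Util {
typedef std::basic_streambuf<XDR_Char> XDR_Streambuf;

// Hypothetical concrete sink: an unbuffered streambuf that appends to a vector.
class VectorXdrBuf : public XDR_Streambuf {
public:
    std::vector<XDR_Char> words;
protected:
    virtual int_type overflow(int_type c) {
        if (traits_type::eq_int_type(c, traits_type::eof()))
            return traits_type::not_eof(c);
        words.push_back(traits_type::to_char_type(c));
        return c;
    }
};
}  // namespace Util
```

Since no put area is ever set up, every sputc or sputn call on a VectorXdrBuf ends up in overflow, which keeps the sketch short at the cost of one virtual call per word.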

Next, I had a decision to make. I needed iXDR_Stream/oXDR_Stream/XDR_Stream classes to act as base classes for any derived concrete XDR stream classes. In the IOStreams model, these classes provide the formatting necessary to insert or extract data objects from the stream. Unfortunately, the type of formatting needed by an XDR stream is totally different from that provided by the other IOStream classes. I could create my own specialization for basic_istream<XDR_Char>, basic_ostream<XDR_Char>, and basic_iostream<XDR_Char>, but in the end I decided it wasn't worth the effort. Users were going to work with an iXDR_Stream class, etc., and not care whether it was a typedef of a template specialization or a class in its own right. So I decided to just define my own XDR_Stream classes from scratch. Note, they still derive from basic_ios<>, which in turn derives from ios_base. This way I still get a lot of the functionality of an IOStream class without having to write it myself. I also get some stuff that I cannot use — such as a lot of field formatting options that don't apply — but that is of minor importance. If I was concerned about the latter, I might be tempted to make basic_ios<XDR_Char> a protected base class instead of a public one. (As I said, I am still tweaking this code.)

As you can see, the interface for iXDR_Stream, oXDR_Stream, and XDR_Stream looks pretty conventional. I will use oXDR_Stream as my example, but most of what I say applies equally well to iXDR_Stream.

There are inserter (operator<<) functions for all the basic types in C++. I noted above that XDR does not define a char type (or a wchar_t). Therefore, the inserters for these types actually are implemented using the inserter for int. Note: C++ leaves it up to the implementation whether the built-in type char is signed or unsigned. XDR will move the bit pattern correctly from one machine to another, but if you interpret that bit pattern as an integer, you might get some unexpected results if one platform treats a char as signed and the other doesn't.
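
The signedness issue is easy to demonstrate. The helper names below are hypothetical; they only show what value would reach the int inserter for the same eight-bit pattern. The last function is one possible convention for sidestepping the problem and is my assumption here, not something the XDR_Stream code does.

```cpp
#include <cstdint>

// The value a char inserter would hand to the int inserter.
std::int32_t widen_signed(signed char c)     { return c; }  // sign-extends
std::int32_t widen_unsigned(unsigned char c) { return c; }  // zero-extends

// Possible convention: always go through unsigned char, so the value on the
// wire lies in 0..255 whatever the signedness of plain char.
std::int32_t widen_portable(char c) { return static_cast<unsigned char>(c); }
```

For the bit pattern 0xE9, a platform with signed chars would encode the int -23 (FF FF FF E9 on the wire), while a platform with unsigned chars would encode 233 (00 00 00 E9).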

Since my platform is an Intel-based one, my ints are already in two's-complement form, but they are encoded internally as little endian. The implementation of the int inserters has to swap the bytes into big-endian format before the XDR_Char can be written to the XDR_Streambuf. Since my platform does not provide a long long int, I do not have an inserter that will directly encode an XDR hyper integer type.

The inserters for the real types are similar. Again, my platform already stores floating data in IEEE format, so it was just a matter of getting the endians correct in order to encode them. On my platform, a long double is actually implemented as a double (boo, hiss, Microsoft). Because of this, I did not bother to implement the long double inserter, which should encode as an XDR quadruple type.
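
A sketch of the idea for float and double follows. It is not the code from XdrStream.cpp: the function names and the Bytes typedef are hypothetical, and it assumes that the platform uses IEEE formats (checked by the static_asserts) and that floating-point values share the byte order of integers, which is true of the common platforms I know of.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>
#include <vector>

typedef std::vector<unsigned char> Bytes;  // hypothetical output buffer

static_assert(std::numeric_limits<float>::is_iec559 && sizeof(float) == 4,
              "sketch assumes a 32-bit IEEE float");
static_assert(std::numeric_limits<double>::is_iec559 && sizeof(double) == 8,
              "sketch assumes a 64-bit IEEE double");

// Copy the bit pattern into an integer, then emit it most significant byte first.
void encode_float(Bytes& out, float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    for (int shift = 24; shift >= 0; shift -= 8)
        out.push_back(static_cast<unsigned char>((bits >> shift) & 0xFF));
}

void encode_double(Bytes& out, double d) {
    std::uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);
    for (int shift = 56; shift >= 0; shift -= 8)
        out.push_back(static_cast<unsigned char>((bits >> shift) & 0xFF));
}
```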

The implementation of the opaque data functions is also straightforward. The only requirement is to make sure that the final XDR_Char in the encoding is padded with nulls as necessary. The function names indicate the type of underlying XDR encoding that is to be accomplished: put does a fixed-length encoding, while vput does a variable-length encoding. In the latter, the length of the data is encoded before the actual data. Note that on the input side, the corresponding functions exist: get and vget. The vget function expects to extract the length of data from the XDR stream. For this function, the second argument is the maximum length data that can be extracted. If the actual length in the stream exceeds the specified maximum, the read fails and the stream is marked as "bad." To make things easier to use, I have included both vput and vget functions that take an argument of a vector. This is primarily of importance on the input side: the vector will grow dynamically to hold the actual amount of data extracted from the stream.
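
The padding and maximum-length rules can be sketched with free functions over a byte vector. The names put, vput, and vget are borrowed from the interface described above, but these signatures are hypothetical stand-ins, and a false return takes the place of marking the stream as "bad."

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

typedef std::vector<unsigned char> Bytes;  // hypothetical wire buffer

// Fixed-length opaque data: the octets, then nulls up to a four-byte boundary.
void put(Bytes& out, const unsigned char* data, std::size_t len) {
    out.insert(out.end(), data, data + len);
    out.resize(out.size() + (4 - len % 4) % 4, 0);
}

// Variable-length opaque data: a big-endian length word, then the padded data.
void vput(Bytes& out, const unsigned char* data, std::size_t len) {
    for (int shift = 24; shift >= 0; shift -= 8)
        out.push_back(static_cast<unsigned char>((len >> shift) & 0xFF));
    put(out, data, len);
}

// Extract variable-length data into a vector that grows as needed.
// Fails, leaving pos untouched, if the encoded length exceeds max_len
// or the input is truncated.
bool vget(const Bytes& in, std::size_t& pos, std::vector<unsigned char>& data,
          std::size_t max_len) {
    if (pos > in.size() || in.size() - pos < 4) return false;
    std::uint32_t len = 0;
    for (int i = 0; i < 4; ++i) len = (len << 8) | in[pos + i];
    const std::size_t remaining = in.size() - pos - 4;
    const std::size_t padding = (4 - len % 4) % 4;
    if (len > max_len || len > remaining || padding > remaining - len) return false;
    const unsigned char* first = &in[0] + pos + 4;
    data.assign(first, first + len);
    pos += 4 + len + padding;
    return true;
}
```

Encoding the five octets of "hello" with vput yields twelve bytes: the length word, the five data bytes, and three nulls of padding.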

There are template functions that handle the encoding of arrays of data. I probably should have named these put and vput also, but decided not to hassle with compilers that don't like this type of overloading. The vget_arr extractor is overloaded with another one that takes an argument of a std::vector<T>. This can be used when the user does not want to allocate an array of the maximum possible size. Alternatively, the user can simply extract the length separately and then allocate the correct size array and read it directly with a get_arr call. Using the vget_arr call with a vector is simpler, which is why it is there.

The last things to note are the free operators that handle strings. XDR has no provision for a fixed-length character string, so strings are always encoded with a length followed by the data. The data itself has to be in ASCII format, so people on EBCDIC machines will have to convert. Lastly, the final XDR_Char of a string encoding has to be padded with nulls. On the input side of things, an attempt to extract to a char* will assume that there is enough space to hold the actual data. The maximum allowed size can be specified by setting the field width. (This is one of those things that gets added for free because an iXDR_Stream is derived from basic_ios<>.) Since the actual data length is always encoded in the stream, setting the field width is equivalent to specifying the maximum length when using a vget function: if the length is greater than the maximum, an error occurs.

For those users who are familiar with the low-level xdr_foo functions, you will note that there is no provision to pass in a null char* and let the iXDR_Stream library allocate the required amount of buffer. If you need a dynamically sized buffer, read the data into a string instead. That is why it is part of the interface.

That is all there is to it. Naturally, in actual practice that is just the beginning. Any class type T that needs to be encoded in XDR format will have to provide its own versions of:
oXDR_Stream& operator<<(oXDR_Stream&, const T& obj);
iXDR_Stream& operator>>(iXDR_Stream&, T& obj);
Naturally, these will be implemented using the existing inserters and extractors.
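
As an illustration of that pattern, here is a sketch for a simple Point struct. Since the real classes live in the downloadable sources, MiniXdrOut and MiniXdrIn are hypothetical, int-only stand-ins for oXDR_Stream and iXDR_Stream; only the shape of the two user-written operators at the bottom is the point.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for oXDR_Stream: knows only how to insert an int.
class MiniXdrOut {
public:
    std::vector<unsigned char> bytes;
    MiniXdrOut& operator<<(std::int32_t v) {
        const std::uint32_t u = static_cast<std::uint32_t>(v);
        for (int shift = 24; shift >= 0; shift -= 8)
            bytes.push_back(static_cast<unsigned char>((u >> shift) & 0xFF));
        return *this;
    }
};

// Hypothetical stand-in for iXDR_Stream: knows only how to extract an int.
class MiniXdrIn {
public:
    explicit MiniXdrIn(const std::vector<unsigned char>& b)
        : bytes_(b), pos_(0), ok_(true) {}
    bool good() const { return ok_; }
    MiniXdrIn& operator>>(std::int32_t& v) {
        if (!ok_ || bytes_.size() - pos_ < 4) { ok_ = false; return *this; }
        std::uint32_t u = 0;
        for (int i = 0; i < 4; ++i) u = (u << 8) | bytes_[pos_++];
        v = static_cast<std::int32_t>(u);
        return *this;
    }
private:
    std::vector<unsigned char> bytes_;
    std::size_t pos_;
    bool ok_;
};

struct Point { std::int32_t x, y; };

// The user-supplied operators are built from the existing inserters/extractors.
MiniXdrOut& operator<<(MiniXdrOut& os, const Point& p) { return os << p.x << p.y; }
MiniXdrIn&  operator>>(MiniXdrIn& is, Point& p)        { return is >> p.x >> p.y; }
```

A Point written this way occupies two XDR words, and extracting past the end of the data leaves the reader in a failed state, much as a real stream would.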

Wrapping Up

At this point you might be wondering "What is the point?" After all, the existing library of xdr_foo functions exists, it works, it is pretty much standard, and XDR encoding/decoding is a pretty low-level activity that very few programmers ever get involved with. Those that do usually don't have any problem working with the low-level API. (I was probably one of a very few that hated working with that API.) As one friend asked me — "Where's the payback?"

The simple answer is that now things are easier, so those people that have to do this sort of encoding/decoding should be able to do it quicker and with fewer errors. Unfortunately, while that argument works for me, I have found that it doesn't seem to work for most people.

So how about this argument: once I had this code in place, I realized that I had the beginning of a pretty basic, but nevertheless real, object persistence mechanism. Think about it. I think XDR makes a very nice standard for use along those lines. Once we start talking about that instead of network data programming, then obviously a lot more people are going to be involved in writing XDR encode/decode functions than otherwise would have been. And now the benefit of XDR_Stream abstraction becomes apparent. By making it easy enough, it enables possibilities that most people would ordinarily dismiss as being not worth the time and effort.

Next time, I will show you what I am doing along the lines of building a persistent object store using XDR streams.

References

[1] Jack Reeves. "The (B)Leading Edge: Using IOStreams, Part I," C/C++ Users Journal C++ Experts Forum, January 2001, www.cuj.com/experts/1901/reeves.htm.

[2] Jack Reeves. "The (B)Leading Edge: Using IOStreams, Part II," C/C++ Users Journal C++ Experts Forum, March 2001, www.cuj.com/experts/1903/reeves.htm.

[3] Jack Reeves. "The (B)Leading Edge: Using IOStreams, Locales and Facets," C/C++ Users Journal C++ Experts Forum, May 2001, www.cuj.com/experts/1905/reeves.htm.

[4] RFC 1832 — eXternal Data Representation, http://www.cis.ohio-state.edu/cgi-bin/rfc/rfc1832.html.

[5] Jack Reeves. "The (B)Leading Edge: Low Overhead Class Design," The C++ Report, February 1998.

Jack W. Reeves is an engineer and consultant specializing in object-oriented software design and implementation. His background includes Space Shuttle simulators, military CCCI systems, medical imaging systems, financial data systems, and numerous middleware and low-level libraries. He currently is living and working in Europe and can be contacted via jack_reeves@bleading-edge.com.