October 2001/Sending Objects across Platforms

Objects and Components

Sending Objects across Platforms

Fabio Lopiano

Herewith are some very handy classes for moving your objects around a network.

The number of applications written for communicating across multiple computing platforms and environments on a network continues to grow. A common problem in developing client/server, e-commerce, distributed, and other networked applications is the conversion of data from host to network format and vice versa. The non-standard (but commonly available) functions ntoh* and hton* help to deal with this marshalling problem. Unfortunately, these functions can be tedious to invoke for every data member of every object (and its subobjects) that needs to be transmitted or received.

This article demonstrates a technique to bridge the host/network/host chasm, which uses hton* and ntoh* in a more elegant way. It also shows how this technique can be extended to develop a framework for the rapid implementation of complex communication protocols.

Introduction

When you write a program that will exchange binary data with other programs over a network, you may need the data exchange to work even when the programs are running on different kinds of machines. If so, the data’s representation must be converted twice: on the sending side, from the internal format used by the sending host into a common network format, and on the receiving side, from the common network format into the internal format used by the receiving host.

Some architectures (e.g., Intel, Alpha) store internal data in “little-endian” format, while others (e.g., Sparc, PowerPC) store data in “big-endian” format. hton* and ntoh* address endianness: for example, htons and ntohs convert 16-bit (short) integers from host to network format and back, respectively, and htonl and ntohl do the same for 32-bit integers. (Network format is big-endian.) So, when sending data over a network, you must invoke the hton* functions to convert all your data into network format, and, when receiving the data, you must invoke the ntoh* functions to convert it back from network to host format.
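To make the manual approach concrete, here is a minimal sketch of converting a small record field by field. The Sample struct, the 6-byte wire layout, and the encode/decode names are assumptions made for illustration; only htonl, htons, ntohl, and ntohs (declared in the POSIX header <arpa/inet.h>) come from the system.

```cpp
#include <arpa/inet.h>  // POSIX header declaring htons, htonl, ntohs, ntohl
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical two-field record, used only for illustration.
struct Sample {
    std::uint32_t id;
    std::uint16_t port;
};

// Convert each field by hand and copy it into a 6-byte wire image.
std::vector<unsigned char> encode(const Sample& s) {
    std::vector<unsigned char> out(6);
    const std::uint32_t id = htonl(s.id);      // 32-bit field: htonl
    const std::uint16_t port = htons(s.port);  // 16-bit field: htons
    std::memcpy(&out[0], &id, sizeof id);
    std::memcpy(&out[4], &port, sizeof port);
    return out;
}

// Reverse the process; returns a zeroed Sample if the input is too short.
Sample decode(const std::vector<unsigned char>& in) {
    Sample s = {0, 0};
    if (in.size() < 6) return s;
    std::uint32_t id;
    std::uint16_t port;
    std::memcpy(&id, &in[0], sizeof id);
    std::memcpy(&port, &in[4], sizeof port);
    s.id = ntohl(id);      // must match the htonl on the sending side
    s.port = ntohs(port);  // must match the htons on the sending side
    return s;
}

void roundTripExample() {
    const Sample sent = {0x01020304u, 0x0506u};
    const std::vector<unsigned char> wire = encode(sent);
    assert(wire[0] == 0x01 && wire[5] == 0x06);  // most significant byte first
    const Sample got = decode(wire);
    assert(got.id == sent.id && got.port == sent.port);
}
```

Every field needs its own matching pair of calls, and the offsets and sizes are repeated on both sides, which is exactly the bookkeeping the rest of the article tries to hide.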

Unfortunately, invoking these conversions manually and getting the types right is usually an error-prone task, and it is common to encounter some type mismatches in the sending and receiving routines. Further, when the protocol itself changes in a way that affects the types being used (for example a short is changed into an int), both sides’ routines must be changed accordingly.

Instead, it would be nice to send whole objects (even complex ones) over the network without having to translate every individual field manually and without worrying excessively about the exact size of each data member.

The idea is to use a mechanism similar to that used with the standard I/O streams: the insertion and extraction operators << and >> insert/extract data into/from a buffer, thus providing an easy hook where hton* and ntoh* can be wrapped and their use hidden. Moreover, by taking advantage of the compiler’s own knowledge of types, you can avoid worrying about type safety in terms of invoking the right conversion function (short or long) because the compiler can already choose the appropriate operator in a type-safe way.

NetBuffer and NetObject

First, you need to define a NetBuffer class, which manages a dynamically allocated binary buffer and provides shift operators << and >> to insert and extract data of specific types. NetBuffer will principally convert integral data from and to network format. Having such a NetBuffer allows you to write something like this [1]:
NetBuffer nbuf;

int   x;
short y;
char  z;

nbuf << x << y << z ;

socket.send( nbuf.buffer() );
and on the server side:
NetBuffer nbuf;

// Socket::recv(Buffer &) resizes 
// the buffer
socket.recv( nbuf.buffer() );

int x;
short y;
char z;

nbuf >> x >> y >> z;
So far, NetBuffer is just a simple shortcut to hton* and ntoh*. But the objective is to send complex objects without having to cope with their inner structures.

What you need is something like a NetObject class, from which you derive your own classes and which internally uses a NetBuffer to convert its members from one format to the other, without the clients of the inherited classes noticing.

This would enable you to write code like this:
class MixedData : public NetObject
{
    int x;
    short y;
    char z;
public:
    // other members not shown
};


MixedData msg;

socket.send( msg.toBuffer() );
and, on the other side:
NetBuffer buf;

socket.recv( buf.buffer() );
MixedData msg = buf;
Once you have this, you can go even further, composing objects like this:
class Header : public NetObject {...};
class Body   : public NetObject {...};
class Message  : public NetObject
{
    Header header_;
    Body   body_;
public:
    ...
};
You can put a Message into a Buffer and get it out on the other side without having to know that all the bytes of the inner objects were converted as needed during the transfer.

Implementation

Listing 1 shows an implementation of NetBuffer. It composes a Buffer [2] and keeps two counters for its current writing and reading position.

This code assumes that CHAR_BIT==8 and that the sizes of char, short, and int are respectively one, two, and four bytes. For platforms with different sizes, it is possible to further generalize the code, keeping in mind that the network format of the data should still have the above-mentioned sizes.
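One way to make these assumptions explicit is to check them at compile time. The fragment below is a sketch, not part of the listings; it uses static_assert and constexpr, which require C++11 and are therefore newer than the compilers the code was tested with. The name kSizesMatch is hypothetical.

```cpp
#include <climits>

// Compile-time guards for the size assumptions stated above. A platform
// that violates one of them fails to build instead of producing bad packets.
static_assert(CHAR_BIT == 8, "NetBuffer assumes 8-bit chars");
static_assert(sizeof(short) == 2, "NetBuffer assumes 2-byte short");
static_assert(sizeof(int) == 4, "NetBuffer assumes 4-byte int");

// The same conditions as a named constant, for reuse in other checks.
constexpr bool kSizesMatch =
    CHAR_BIT == 8 && sizeof(char) == 1 && sizeof(short) == 2 && sizeof(int) == 4;
```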

Two methods, put and get, allow inserting/extracting of data into/from the buffer. These methods are used by the operators << and >> defined for each type you want to manage.

The shift operators for unsigned values have been omitted, but they could be useful [3].

A method buffer provides access to the underlying Buffer, and the methods size and resize allow for accessing and modifying its size.

When implementing NetBuffer, you must also consider its internal memory management.

Assume that you will know in advance how much data you are going to write into the buffer. This lets you avoid a resize for each insert. This assumption simplifies the class and improves performance. (You will see shortly how to automatically calculate the correct size for a given NetObject.)

You must also decide what to do when you try to write (or read) more data than will fit (or is available) in the buffer. As it turns out, a useful strategy is to silently ignore any write that runs past the end of the internal buffer and to return zero values for out-of-bounds reads, while still updating the respective internal write or read counter in each case.
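The behavior described above can be sketched as follows. This is not Listing 1: the accessor names writeCount and readCount are assumptions, Buffer is taken to be a vector of unsigned char, and the byte order is produced with explicit shifts instead of hton*/ntoh* so that the sketch compiles without platform headers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector<unsigned char> Buffer;  // assumption: stand-in Buffer type

class NetBuffer {
    Buffer buf_;
    std::size_t wpos_;  // current writing position
    std::size_t rpos_;  // current reading position

public:
    explicit NetBuffer(std::size_t size = 0) : buf_(size), wpos_(0), rpos_(0) {}

    // Write the low n bytes of v, most significant byte first.
    // Bytes past the end of the buffer are dropped, but still counted.
    void put(unsigned long v, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i, ++wpos_) {
            if (wpos_ < buf_.size())
                buf_[wpos_] = static_cast<unsigned char>((v >> (8 * (n - 1 - i))) & 0xFF);
        }
    }

    // Read n bytes, most significant first. Missing bytes read as zero,
    // and the read counter still advances.
    unsigned long get(std::size_t n) {
        unsigned long v = 0;
        for (std::size_t i = 0; i < n; ++i, ++rpos_)
            v = (v << 8) | (rpos_ < buf_.size() ? buf_[rpos_] : 0u);
        return v;
    }

    NetBuffer& operator<<(int x)   { put(static_cast<unsigned int>(x), 4); return *this; }
    NetBuffer& operator<<(short x) { put(static_cast<unsigned short>(x), 2); return *this; }
    NetBuffer& operator<<(char x)  { put(static_cast<unsigned char>(x), 1); return *this; }

    // Converting back to a signed type relies on two's-complement wraparound.
    NetBuffer& operator>>(int& x)   { x = static_cast<int>(static_cast<unsigned int>(get(4))); return *this; }
    NetBuffer& operator>>(short& x) { x = static_cast<short>(static_cast<unsigned short>(get(2))); return *this; }
    NetBuffer& operator>>(char& x)  { x = static_cast<char>(static_cast<unsigned char>(get(1))); return *this; }

    std::size_t writeCount() const { return wpos_; }
    std::size_t readCount() const { return rpos_; }
    const Buffer& buffer() const { return buf_; }
};

void overflowExample() {
    NetBuffer tiny(2);                  // room for two bytes only
    tiny << 0x01020304;                 // four bytes requested
    assert(tiny.writeCount() == 4);     // the counter reflects the full write
    assert(tiny.buffer().size() == 2);  // the buffer itself did not grow
    int x = 99;
    tiny >> x;                          // two stored bytes, then two zero bytes
    assert(x == 0x01020000);
}
```

Because the counters keep advancing past the end, writing into an empty NetBuffer measures how much space an object needs, and reading from an empty one yields all zeros.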

This brings us to our main class where the streaming work is done, NetObject (shown in Listing 2). NetObject will use the NetBuffer for its own implementation details, and it will make use of both the above “unintuitive” features of NetBuffer.

NetObject has only two virtual functions, netWrite and netRead, both of which take a NetBuffer as argument. All that the derived classes must do is implement these two methods to insert and extract their data members. With these two functions, you can give a concrete implementation to all of NetObject’s other methods without needing to know the details of the concrete classes to be derived later.

The netCopy method allows derived classes to be filled with the contents of a NetBuffer or of a NetObject using netRead.

The netClear method creates an empty NetBuffer and invokes a netRead on it, which has the effect of setting to zero all the data members of the class (or, more precisely, all the data members involved in the network conversion).

The netSize method calculates the size of the object by invoking netWrite on an empty NetBuffer and returning its write counter. This returns the amount of space needed to store the object in a NetBuffer. Note that this size may be different from the actual size of the object as reported by sizeof.

The toBuffer method uses netWrite to fill a NetBuffer containing the whole object in network format. Note that before writing the buffer, its size is calculated using the netSize.

Operator == compares two objects by converting both to Buffers and then comparing the buffers.

In this way, all derived classes inherit working netCopy, netClear, netSize, and toBuffer methods and comparison operators, regardless of which kind of data members they contain.
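A compact sketch of how these inherited methods can be built from netWrite and netRead alone is given below. It is not Listing 2: the NetBuffer here is a reduced stand-in handling only int and short, the accessor written is a hypothetical name for the write counter, and the virtual destructor is an addition made for safety.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector<unsigned char> Buffer;  // assumption: stand-in Buffer type

// Reduced stand-in for NetBuffer (int and short only, big-endian on the wire).
class NetBuffer {
    Buffer buf_;
    std::size_t w_, r_;
    void put(unsigned long v, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i, ++w_)
            if (w_ < buf_.size())
                buf_[w_] = static_cast<unsigned char>((v >> (8 * (n - 1 - i))) & 0xFF);
    }
    unsigned long get(std::size_t n) {
        unsigned long v = 0;
        for (std::size_t i = 0; i < n; ++i, ++r_)
            v = (v << 8) | (r_ < buf_.size() ? buf_[r_] : 0u);
        return v;
    }
public:
    explicit NetBuffer(std::size_t n = 0) : buf_(n), w_(0), r_(0) {}
    explicit NetBuffer(const Buffer& b) : buf_(b), w_(0), r_(0) {}
    NetBuffer& operator<<(int v) { put(static_cast<unsigned int>(v), 4); return *this; }
    NetBuffer& operator<<(short v) { put(static_cast<unsigned short>(v), 2); return *this; }
    NetBuffer& operator>>(int& v) { v = static_cast<int>(static_cast<unsigned int>(get(4))); return *this; }
    NetBuffer& operator>>(short& v) { v = static_cast<short>(static_cast<unsigned short>(get(2))); return *this; }
    std::size_t written() const { return w_; }
    const Buffer& buffer() const { return buf_; }
};

class NetObject {
public:
    virtual ~NetObject() {}
    // Write into an empty buffer: nothing is stored, but the counter advances.
    std::size_t netSize() const { NetBuffer nb; netWrite(nb); return nb.written(); }
    // Size the buffer first, then write for real.
    Buffer toBuffer() const { NetBuffer nb(netSize()); netWrite(nb); return nb.buffer(); }
    void netCopy(const Buffer& b) { NetBuffer nb(b); netRead(nb); }
    void netCopy(const NetObject& o) { netCopy(o.toBuffer()); }
    // Read from an empty buffer: every extracted member becomes zero.
    void netClear() { NetBuffer nb; netRead(nb); }
    bool operator==(const NetObject& o) const { return toBuffer() == o.toBuffer(); }
protected:
    virtual void netWrite(NetBuffer& nb) const = 0;
    virtual void netRead(NetBuffer& nb) = 0;
};

// Hypothetical derived class with two members.
class MixedData : public NetObject {
    int x_;
    short y_;
public:
    explicit MixedData(int x = 0, short y = 0) : x_(x), y_(y) {}
    int x() const { return x_; }
    short y() const { return y_; }
protected:
    void netWrite(NetBuffer& nb) const { nb << x_ << y_; }
    void netRead(NetBuffer& nb) { nb >> x_ >> y_; }
};

void netObjectExample() {
    const MixedData a(7, -3);
    assert(a.netSize() == 6);  // 4 + 2 bytes on the wire, not sizeof(MixedData)
    MixedData b;
    b.netCopy(a.toBuffer());
    assert(b == a);
    b.netClear();
    assert(b.x() == 0 && b.y() == 0);
}
```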

Finally the shift operators << and >> are defined to insert and extract whole NetObjects into and from a NetBuffer.

The derived classes must simply implement the two virtual methods in the obvious way: using the operators << and >> on all their data members.

For example, the class MixedData, shown above, will have the following implementation:
void MixedData::netWrite
    (NetBuffer& nb) const
{
    nb << x << y << z ;
}

void MixedData::netRead(NetBuffer& nb)
{
    nb >> x >> y >> z ;
}
It is also possible to extend an existing class, giving it a network conversion capability using multiple inheritance.

For example, given a concrete class A, you can write a class like this:
class NetA :
   public A,
   public NetObject
{
   protected:
      void netWrite(NetBuffer &) const;
      void netRead(NetBuffer &);
   public:
      // add constructors if needed
};
You only need to implement netWrite and netRead, and then you may use an object of this new class as though it were an object of class A. You can also send it into a socket without additional effort. A common use of this kind of class is as a data member of a composite class derived from NetObject, whose public interface allows clients to access only the base class A (shown in the following example):
class BigObject : public NetObject
{
   NetA theFirst;
   NetA theSecond;
   // other members omitted
 public:
   A & first() { return theFirst; }
   A & second(){ return theSecond; }
};
Even complex objects such as arrays or vectors of objects are easy to implement. Listing 3 shows two template classes for vectors of fixed or variable length. In particular, note how the NetVector template class inherits from std::vector and so carries over its interface [4].

The NetArray<> class has a fixed length, and the conversion operators allow you to use it as a normal array.

The NetVector<> class has a variable length, so netWrite first puts the number of elements into the stream, followed by the elements themselves, while netRead reads the count before reading the elements. This is a common way to send lists of data. What varies from protocol to protocol is the size of the counter: usually, for short lists, one byte is used, but in some cases a short or even an int is needed. For this reason, a template parameter (defaulted to char) is used to represent the type of the vector’s length.
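The length-prefixed layout can be illustrated without the full class hierarchy. The sketch below is not Listing 3: it uses free functions over a plain byte vector, the names putBE, getBE, writeVector, and readVector are hypothetical, and the counter type defaults to unsigned char (rather than char) so that counts up to 255 are representable. Default template arguments on functions require C++11.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

typedef std::vector<unsigned char> Buffer;  // assumption: stand-in Buffer type

// Append an unsigned value of at most 4 bytes, most significant byte first.
template <typename T>
void putBE(Buffer& out, T v) {
    for (std::size_t i = sizeof(T); i-- > 0;)
        out.push_back(static_cast<unsigned char>((static_cast<unsigned long>(v) >> (8 * i)) & 0xFF));
}

// Read an unsigned value of at most 4 bytes; missing bytes read as zero.
template <typename T>
T getBE(const Buffer& in, std::size_t& pos) {
    unsigned long v = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i, ++pos)
        v = (v << 8) | (pos < in.size() ? in[pos] : 0u);
    return static_cast<T>(v);
}

// Write the element count as a Count, then each 16-bit element.
// Assumes items.size() fits in Count.
template <typename Count = unsigned char>
void writeVector(Buffer& out, const std::vector<std::uint16_t>& items) {
    putBE<Count>(out, static_cast<Count>(items.size()));
    for (std::size_t i = 0; i < items.size(); ++i)
        putBE<std::uint16_t>(out, items[i]);
}

// Read the count first, then that many elements (stopping early if the
// buffer runs out).
template <typename Count = unsigned char>
std::vector<std::uint16_t> readVector(const Buffer& in, std::size_t& pos) {
    const Count n = getBE<Count>(in, pos);
    std::vector<std::uint16_t> items;
    for (std::size_t i = 0; i < n && pos < in.size(); ++i)
        items.push_back(getBE<std::uint16_t>(in, pos));
    return items;
}

void vectorExample() {
    std::vector<std::uint16_t> items;
    items.push_back(1);
    items.push_back(0x0203);
    Buffer wire;
    writeVector(wire, items);  // one-byte count, then two 16-bit elements
    assert(wire.size() == 5 && wire[0] == 2);
    std::size_t pos = 0;
    const std::vector<std::uint16_t> back = readVector(wire, pos);
    assert(back == items && pos == 5);
}
```

Choosing a wider counter only changes the template argument, for example writeVector<std::uint16_t>(wire, items), provided the receiver reads with the same counter type.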

Now it’s easy to implement a complex protocol — you merely need to compose little NetObject classes to obtain complex classes, which seamlessly can be converted to and from network format.

Sending and Receiving NetObjects

So far, I have shown how to design classes whose objects can easily be converted to and from network format. Now I will show how to use them.

For the sending side, it is quite simple: once you have a NetObject ready to be sent, simply convert it to a Buffer (using toBuffer) and send it.

For example, if the class Message inherits from NetObject, you can send the following message:
Socket  socket;
Message message;

// code to fill the message and to set 
// up the socket

socket.send( message.toBuffer() );

// error checking omitted
The receiving side, especially for complex protocols, is not so straightforward. The problem is that, while the sender knows what he or she is sending, the receiver (usually) does not know what is coming from the network.

For simplicity, assume that you are receiving from a datagram socket, like UDP, and that for each invocation of recv, you will read exactly one message [5].

Once you have read the data into a Buffer, you assign it to an object of your class derived from NetObject. It is enough to define in your class a constructor (and/or an assignment operator) from Buffer, which simply invokes netCopy on its argument.

But wait. It is likely that your application-level protocol has different kinds of messages. How can you know which class to choose? Usually these kinds of protocols have a fixed header and a variable body (and often a list of other optional items appended to it); the header carries all the information needed to parse the remainder of the message. But what if you want to separate the handling of the socket from the semantics of the message? You want to handle a generic message at this low level (so close to the network) and examine its details far up at a more appropriate application level. This can be achieved using a “lazy-evaluation” technique.

You can use a generic class Message made up of a Header (whose structure is fixed) and a Body, which at this level is simply a NetBuffer. You can create an instance of this class from the data you read from the socket and propagate it to the higher levels of your application, which can then assign the Body to a more specific class depending on the information stored in the Header. This operation can be performed in a seamless way by providing the specific class with a constructor from NetObject that simply invokes netCopy on it.

Consider for example the Message class shown in Listing 4. (Note that before extracting the body from the buffer, you need to resize it to the size specified in the header.) You can have a piece of code that reads a message from the socket as follows:
   // object socket like in the previous example

   Buffer buf;

   socket.recv(buf);

   // error checking omitted

   Message message(buf);
At a higher level, you can have more specific classes for the various kinds of bodies. You can decide which class to use depending on the type stored in the header, and you can simply assign the body of the message to it.
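The split between the socket level and the application level can be sketched as follows. This is not Listing 4: the two-byte header layout, the type codes, and the Login and Ping classes are hypothetical, and plain structs over a byte vector stand in for the NetObject hierarchy. std::to_string requires C++11.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

typedef std::vector<unsigned char> Buffer;  // assumption: stand-in Buffer type

// Hypothetical fixed header: one byte of message type, one byte of body length.
struct Header { unsigned char type; unsigned char bodyLen; };

// Generic message: the body stays an uninterpreted byte buffer at this level.
struct Message { Header header; Buffer body; };

// Socket level: split header from body without looking inside the body.
Message parseMessage(const Buffer& raw) {
    Message m = Message();  // zeroed header, empty body
    if (raw.size() < 2) return m;
    m.header.type = raw[0];
    m.header.bodyLen = raw[1];
    std::size_t n = raw.size() - 2;                  // bytes actually available
    if (m.header.bodyLen < n) n = m.header.bodyLen;  // size given by the header
    m.body.assign(raw.begin() + 2, raw.begin() + 2 + static_cast<std::ptrdiff_t>(n));
    return m;
}

// Out-of-range reads yield zero, mirroring NetBuffer's behavior.
unsigned byteAt(const Buffer& b, std::size_t i) { return i < b.size() ? b[i] : 0u; }

// Hypothetical concrete bodies, each constructible from the raw body.
struct Login {
    unsigned userId;  // 16-bit value, most significant byte first
    explicit Login(const Buffer& body) : userId((byteAt(body, 0) << 8) | byteAt(body, 1)) {}
};
struct Ping {
    unsigned seq;
    explicit Ping(const Buffer& body) : seq(byteAt(body, 0)) {}
};

enum { kLogin = 1, kPing = 2 };  // hypothetical type codes

// Application level: only here is the body given a concrete type.
std::string describe(const Message& m) {
    switch (m.header.type) {
    case kLogin: return "login " + std::to_string(Login(m.body).userId);
    case kPing:  return "ping " + std::to_string(Ping(m.body).seq);
    default:     return "unknown";
    }
}

void lazyExample() {
    const unsigned char bytes[] = {1, 2, 0x01, 0x02};
    const Buffer raw(bytes, bytes + 4);
    const Message m = parseMessage(raw);  // no knowledge of Login needed here
    assert(m.body.size() == 2);
    assert(describe(m) == "login 258");   // body decoded only at this point
}
```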

The same technique can be reused inside the specific bodies. In fact, the body of a message often contains other objects (or lists of objects) whose structure is not fixed but depends on other data stored in the container itself. Again, you can assign general objects to more specific objects later on, when their types are known.

A general way to do this is with a constructor (and an assignment operator) from NetObject. Their implementations simply invoke netCopy.

With these constructors and operators added, all objects derived from NetObject can be assigned to each other. For example, an object of a class containing a Buffer of twelve chars can be assigned to another object of a different class containing three integers. The comparison operator == will return true if applied to them: those objects are not identical in a byte-by-byte comparison, but their “network format” versions are equal.

This allows you to assign a class containing a list of generic objects to a class with a list of specific objects. This is useful when some general operation must be performed on a message before its full structure is known, while still treating it as something more specific than a simple buffer (examples could be integrity checks, like CRCs or MD5 digests).

By using these techniques, implementing a complex communication protocol becomes a simpler job.

In Listing 4, the class Message manages a list of Attributes without caring about the exact kind of information stored in them. At a higher level, each item of this list can be assigned to an instance of the appropriate class. For example, a certain kind of Attribute could store a time stamp, so a class, again derived from NetObject, storing ID and time stamp, can be assigned from the Attribute class like this:
TimeStamp ts = message.findAttribute(TimeStamp::Id);
provided that findAttribute searches the vector for an item with the given Id.

Of course, since all these classes may be assigned to each other, you must use them with care. For example, two objects of different classes could end up being used interchangeably without a warning from the compiler [6].

The effort of writing the same constructors and operators for all derived classes is a bit tedious and can be avoided by using a simple template class (shown in Listing 5) that provides operator= and constructors.

Note that the sender and receiver usually use complementary constructors and methods. The sender usually puts information into the object, so it uses a constructor or some methods to set the values of the members. On the other side, the receiver must extract the information from the object and often needs the constructors provided by this class. Even if sender and receiver are usually implemented in two different programs, it is a good idea to share the definition of the classes (when possible) so that they are always in sync.

A general way to do this is to write a class A derived from NetObject, with specific constructors and set methods to be used by the sender. Then use the class NetObjectT<A> in the receiver, which does not need the specific constructors, but does usually need the Buffer and NetObject constructors provided by this template class.
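A wrapper of this kind might look like the sketch below. It is not Listing 5: the NetObject here is reduced to the two operations the template needs (toBuffer and a netCopy overridden directly by the derived class, instead of netWrite and netRead), and the Point class is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector<unsigned char> Buffer;  // assumption: stand-in Buffer type

// Reduced stand-in for NetObject: just the two operations the template needs.
class NetObject {
public:
    virtual ~NetObject() {}
    virtual Buffer toBuffer() const = 0;
    virtual void netCopy(const Buffer& b) = 0;
};

// Adds construction and assignment from Buffer and from any NetObject to T.
// Assumes T derives from NetObject and is default-constructible.
template <class T>
class NetObjectT : public T {
public:
    NetObjectT() : T() {}
    NetObjectT(const Buffer& b) : T() { this->netCopy(b); }
    NetObjectT(const NetObject& o) : T() { this->netCopy(o.toBuffer()); }
    NetObjectT& operator=(const Buffer& b) { this->netCopy(b); return *this; }
    NetObjectT& operator=(const NetObject& o) { this->netCopy(o.toBuffer()); return *this; }
};

// Hypothetical sender-side class with its own specific constructor.
class Point : public NetObject {
    unsigned char x_, y_;
public:
    explicit Point(unsigned char x = 0, unsigned char y = 0) : x_(x), y_(y) {}
    unsigned x() const { return x_; }
    unsigned y() const { return y_; }
    Buffer toBuffer() const {
        Buffer b;
        b.push_back(x_);
        b.push_back(y_);
        return b;
    }
    // Missing bytes are treated as zero.
    void netCopy(const Buffer& b) {
        x_ = b.size() > 0 ? b[0] : static_cast<unsigned char>(0);
        y_ = b.size() > 1 ? b[1] : static_cast<unsigned char>(0);
    }
};

void receiverExample() {
    const Point sent(3, 4);            // sender uses the specific constructor
    const Buffer wire = sent.toBuffer();
    NetObjectT<Point> received(wire);  // receiver constructs from a Buffer
    assert(received.x() == 3 && received.y() == 4);
    NetObjectT<Point> copy;
    copy = sent;                       // assignment from any NetObject
    assert(copy.toBuffer() == wire);
}
```

Point itself stays free of the generic constructors, so the sender cannot accidentally build one from an unrelated object, while the receiver opts in by naming NetObjectT<Point>.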

Source Code

On the website (<www.cuj.com/code>), you can find the complete source code for the classes Buffer, NetBuffer, and NetObject along with an example program that uses most of the features illustrated in this article.

The code has been tested both with the GNU compiler 3.0 and with Microsoft Visual C++ 6.0.

Notes

[1] In the examples, I assume a class Socket that allows sending and receiving of an object of type Buffer, where Buffer is the class chosen for the implementation of NetBuffer (e.g., std::vector<char> in Listing 1).

[2] The examples in this article use a vector of char as the underlying Buffer; the full code (available at <www.cuj.com/code>) provides a smarter Buffer class. Readers who already have their own Buffer class can readily implement NetBuffer in terms of that.

[3] It is possible to have a smarter implementation of this class using template methods with the help of a partially specialized template class, but this is outside of the scope of this article.

[4] This inheritance could be problematic because std::vector is not designed for inheritance and its destructor is not virtual, so the user should not use a NetVector<T> polymorphically as a std::vector<T>. In practice, this is unlikely to happen because the purpose of this class is to be a data member of other classes derived from NetObject.

[5] This is not true for stream sockets, like TCP, where you have a continuous stream of bytes and need to manually separate the messages, but these details can be hidden in the implementation of the Socket class.

[6] A safer way is to allow construction/assignment only from specific classes, but this is not always possible and causes stricter coupling between NetObject classes (for example TimeStamp could have a constructor/assignment only for Attribute).

References

[1] Scott Meyers. More Effective C++ (Addison-Wesley, 1996).

[2] Andrei Alexandrescu. Modern C++ Design (Addison-Wesley, 2001).

Fabio Lopiano received a degree in Computer Science from the University of Pisa (Italy) in 1995. He currently works at the Eutelsat Multimedia Labs in Paris, where he is busy with Internet services via satellite. Some of the ideas explained in this article were developed during his work at Teseo srl (Rome), where he designed a framework for online sports betting, which is currently deployed in Italy and will be used in South Korea for the 2001 Soccer World Championship.