May 1997/Speaking in Iostreams-ese

Features

Speaking in Iostreams-ese

Oleg Kiselyov

And you thought that iostreams was just a cute way to avoid calling printf and scanf. That's only the beginning, if you take a broad enough view of what constitutes a stream.

The C++ iostreams library was designed to provide a convenient way to perform mostly text-based sequential I/O. It popularized the notation of streaming: writing a chain of data items and manipulators connected with operator<< or operator>> to form a compound transaction. But streaming can be a far more general concept. Indeed, the need to marshal data in a compound transaction goes far beyond merely reading and writing a text file or terminal.

The iostreams library provides a convenient interface for the tasks of arranging data into a collection, taking data from it, and assembling a transaction. The version specified by the ANSI C++ standard hints at this generality. The classes defined in the header <strstream>, for example, look and feel like any other iostreams, yet they do not interact with physical devices at all. The strstream classes merely rearrange data within a program, typically changing the data's representation. Similar tasks arise in many other areas, which may actually have little, if anything at all, to do with textual data or files.

In this article I show a few examples of these tasks and how iostreams — as defined in the C++ iostreams library, or implemented otherwise but in the same spirit — can help in expressing the solution. I demonstrate several typical phrases (idioms) in the iostreams parlance, which turn up in a variety of situations. I also present a few verbs (stream applicators and manipulators) that are useful in expressing data marshaling in a concise and safe way. I draw illustrative examples from the areas of object communication (externalizing/internalizing objects to/from a communication packet), assembling a dictionary of a TIFF file, and compiling graphics primitives to transfer in blocks of bits (blit).

Object Communication

Storing and communicating objects is known to be a tough problem. The most difficult part is deciding how to represent the layout of each object (the schema) and how to reconstruct an object based on the schema. Communicating an object's data is a challenge as well, since data representation (byte ordering, padding, etc.) varies among platforms.

A binary representation of an object reference may vary from one invocation of a program to another even on the same computer. Furthermore, communicating the content of an object safely may require that extra information, such as a checksum or authentication code, be computed, transmitted, and validated. Streams notation can provide a convenient language for marshaling the data that constitutes an object and defining a transaction that communicates the object.

The iostreams library obviously cannot, in and of itself, solve the problems of "persistent" references, representation of meta-information, etc. It can, however, help in structuring a solution and expressing it clearly. As with other programming languages, expressiveness does matter. The right set of words and connectives helps a compiler to catch more errors, makes it more difficult to compose dubious phrases, and spares the programmer mundane chores. Besides, a good language makes code look aesthetically pleasing, and programming more fun.

For example, the whole process of assembling a transmission packet can be neatly and concisely expressed as:
void Segment0::send(LAPOut& out_link) const
{
    SegmentOutStream(out_link)
        << (SegmentHeader)(*this)
        << file_size << MTU
        << max_seq_no
        << byte_array(file_name, sizeof(file_name));
}
This one single statement in the body of a method allocates a packet, packs it with meta-information and payload data, computes a CRC, and sends the packet out (through out_link). The chores of data marshaling and conversion, keeping track of the CRC, etc. are carried out behind the scenes by a segment stream object, an instance of the SegmentOutStream class. The instance itself is transient. It is created on the stack and is not even named. Nevertheless, the segment stream object looks and feels like a regular ostream, with SegmentHeader and byte_array acting as inserters.
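The mechanism behind such a one-statement transaction can be sketched as follows. This is only an illustration under stated assumptions, not the SegmentOutStream of Listing 1: the names DemoLink, DemoPacketStream, and send_demo are hypothetical, and a simple additive checksum stands in for a real CRC. The point it shows is that an unnamed temporary is destroyed at the end of the full expression, so its destructor is a natural place to finalize and ship the packet.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an output link: it merely records packets.
struct DemoLink {
    std::vector<std::vector<std::uint8_t>> sent;
};

// Hypothetical transient packet stream. Inserters append bytes in
// big-endian order; the destructor appends a checksum and "sends".
class DemoPacketStream {
    DemoLink& link_;
    std::vector<std::uint8_t> buf_;
public:
    explicit DemoPacketStream(DemoLink& link) : link_(link) {}
    DemoPacketStream(const DemoPacketStream&) = delete;
    DemoPacketStream& operator=(const DemoPacketStream&) = delete;

    DemoPacketStream& operator<<(std::uint8_t v) {
        buf_.push_back(v);
        return *this;
    }
    DemoPacketStream& operator<<(std::uint16_t v) {
        buf_.push_back(static_cast<std::uint8_t>(v >> 8));
        buf_.push_back(static_cast<std::uint8_t>(v & 0xFF));
        return *this;
    }
    DemoPacketStream& operator<<(std::uint32_t v) {
        for (int shift = 24; shift >= 0; shift -= 8)
            buf_.push_back(static_cast<std::uint8_t>((v >> shift) & 0xFF));
        return *this;
    }

    // Runs at the end of the full expression for an unnamed temporary.
    ~DemoPacketStream() {
        std::uint8_t sum = 0;  // additive checksum, a placeholder for a CRC
        for (std::uint8_t b : buf_) sum = static_cast<std::uint8_t>(sum + b);
        buf_.push_back(sum);
        link_.sent.push_back(buf_);
    }
};

// One statement builds the packet, checksums it, and hands it to the link.
void send_demo(DemoLink& link) {
    DemoPacketStream(link) << std::uint32_t{0x01020304}
                           << std::uint16_t{0x0506}
                           << std::uint8_t{7};
    // 4 + 2 + 1 payload bytes plus 1 checksum byte
    assert(link.sent.back().size() == 8);
}
```

A production version would have to decide what to do if sending fails, since reporting errors from a destructor is awkward; the sketch ignores that question.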

The reverse operation of digesting a received packet and recovering Segment0, a session control object, can be expressed in the stream language just as well:
void Segment0::complete(SegmentInStream& packet)
    throw (rc_bad_packet)
{
    .....
    packet >> const_cast<unsigned long&>(file_size)
           >> const_cast<unsigned char&>(MTU)
           >> max_seq_no
           >> byte_array(file_name, sizeof(file_name));
    if( MTU >= max_MTU || MTU == 0 || file_size == 0 || max_seq_no == 0 )
        throw rc_bad_packet(rc_bad_packet::bad_segment0);
}
These segment streams can certainly be derived from the istream and ostream classes of the C++ iostreams library. However, this seemed to be overkill in the context of the project from which the examples are taken — a system which I completed recently that broadcasts meteorological products over unreliable low-bandwidth, no-feedback satellite channels. Listing 1 shows how simple the SegmentInStream and SegmentOutStream classes actually are. Listing 2 shows the implementation of a few typical methods.

A segment-stream manipulator byte_array is the only nontrivial verb in the stream expressions above. It looks like a familiar C++ stream manipulator, yet it has two user arguments. Indeed, any specification of an arbitrary block of memory requires two pieces of information, a pointer to the beginning of the block and the size of the block. The syntax of operator<<, however, allows only one argument on its right-hand side.

The obvious solution is to group the two parameters into a structure, a SegmentOutStream::Array object, and pass it as a single argument to operator <<. The structure thus contains an embedded foreign pointer, which acts as a reference to a memory block. A reference object like this requires special care. You have to make sure the pointer is at all times valid — the object must not outlive the part of memory it refers to.

To safeguard against such violations, the Array class is not given any public constructors. The class does not have any publicly accessible methods either. This all makes it quite difficult to use Array instances in any non-transient fashion, letting them linger past the time the embedded pointer becomes invalid. Incidentally, Array instances never become garbage — allocated storage that is no longer accessible.

Although these syntactic constraints are rather strict, the implementation is trivial. In fact, constructing, passing, and disposing of Array objects is often optimized away in the object code by a compiler, resulting therefore in no run-time overhead. Thus the byte_array manipulator is relatively safe to use, lets the code run fast, and makes it look good.
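A two-argument manipulator along these lines might be sketched as shown here. The class DemoOutStream and the function demo_bytes are hypothetical, and the layout is my assumption of how such an Array could be arranged, not a copy of the one in Listing 1. The private constructor and the friend declaration are what make byte_array the only way to produce an Array.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical output stream that collects bytes into a string.
class DemoOutStream {
    std::string buf_;
public:
    // Reference object: a pointer plus a length.
    // It has a private constructor and no public methods.
    class Array {
        friend class DemoOutStream;
        friend Array byte_array(const char* ptr, std::size_t size);
        const char* ptr_;
        std::size_t size_;
        Array(const char* ptr, std::size_t size) : ptr_(ptr), size_(size) {}
    };

    // The inserter is the only consumer of an Array.
    DemoOutStream& operator<<(const Array& a) {
        buf_.append(a.ptr_, a.size_);
        return *this;
    }
    const std::string& bytes() const { return buf_; }
};

// The only generator of Array objects; it bundles the two arguments.
DemoOutStream::Array byte_array(const char* ptr, std::size_t size) {
    assert(ptr != nullptr || size == 0);
    return DemoOutStream::Array(ptr, size);
}

// Usage: the Array temporary lives only for the duration of the statement.
std::string demo_bytes() {
    char file_name[8] = "a.dat";
    DemoOutStream out;
    out << byte_array(file_name, sizeof(file_name));
    // DemoOutStream::Array a(file_name, 8);  // would not compile: private ctor
    return out.bytes();
}
```

Note that the implicit copy constructor remains public in this sketch, so a determined programmer could still copy an Array returned by byte_array; the constraints discourage misuse rather than rule it out entirely.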

To demonstrate the expressiveness of iostream phrases, it is helpful to juxtapose them with more traditional code. An example of the latter is the following snippet, taken from a BeOS sample mail client application:
void  register(const char * user, const char * id)
{
    BMailMessage    *mail;

    mail = new BMailMessage();  // construct mail object
    mail->AddField(B_MAIL_TO, company, strlen(company));
    mail->AddField(B_MAIL_SUBJECT, subject, strlen(subject));
    mail->AddField(B_MAIL_CONTENT, user, strlen(user));
    mail->AddField(B_MAIL_CONTENT, id, strlen(id), TRUE);
    mail->Send();
    delete mail;
}
In iostreams parlance, the whole procedure body (transaction) can be expressed in one single statement:
NBMailMessage()
    << BMail::Header(B_MAIL_TO,company)
    << BMail::Header(B_MAIL_SUBJECT,subject)
    << BMail::Body(user) << BMail::Body(id)
    << endl;
Note that neither the heap nor the namespaces are polluted with short-lived junk. There are no pointers to watch over. The statement works just as fast as the original code. As a matter of fact, it runs faster due to fewer heap allocations/deallocations. From a grander point of view, the one-liner above is better because it shows and emphasizes what BMailMessage really is: a container, or a stream.

Writing a TIFF File

Iostreams can be used to express more sophisticated packing techniques than merely dumping data at the end of a container. As an example, we will consider writing a TIFF file.

A TIFF file is a database, a dictionary of items: a pixel matrix, information related to the pixel matrix (image width, image height, compression mode), and other meta-information (author's name, picture's title, etc). Each dictionary item thus consists of an identifying tag, fields describing an item's data (type, quantity, size), and the data themselves, as an immediate value or a "pointer," an offset to the data's location elsewhere in the file.

Writing a TIFF file is therefore tantamount to assembling its dictionary. One has to keep in mind a stipulation that all items in the dictionary must appear in the increasing order of the values of their tags.

From the applications programmer's point of view, the requirement to know tag values and to keep an eye on the order of adding items is a great imposition. It is far more convenient to allow the programmer to refer to tags by symbolic labels only, and to add items in any order. The dictionary stream can take the responsibility of maintaining the proper order of tags, letting the programmer worry only about the content of the items.

The following code snippet illustrates how these smart TIFF directory streams make creating a TIFF file much easier and clearer. The example is taken almost literally from my image processing library:
http://pobox.com/~oleg/ftp/README.html#cpp.improc
This library also contains the complete implementation.
void IMAGE::write_tiff(
    const char * file_name, const char * title,
    const TIFFUserAction& user_adding_tags) const
{
    is_valid();

    message("\nPreparing a TIFF file with name '%s'\n",
        file_name);

    EndianOut file(file_name);
    TIFFBeingMadeDirectory directory;

    directory << ScalarTIFFDE::New(TIFFTAG_IMAGEWIDTH,
        (unsigned)q_ncols());
    directory << ScalarTIFFDE::New(TIFFTAG_IMAGELENGTH,
        (unsigned)q_nrows());
    directory << ScalarTIFFDE::New(TIFFTAG_COMPRESSION,
        (unsigned short)COMPRESSION_NONE);
    directory << RationalTIFFDE::New(TIFFTAG_XRESOLUTION,
        72, 1);
    if( name != 0 && name[0] != '\0' )
        directory << StringTIFFDE::New(
            TIFFTAG_IMAGEDESCRIPTION, name );
    // Give the user a chance to add his own tags
    user_adding_tags(directory);
    directory.write(file);
    file.close();
}
Note the directory stream accepts items of various kinds (Scalar, Rational, String, etc), taking care to figure out if an item should be stored within the dictionary or separately, and making sure the required order of tag values is maintained.
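The ordering responsibility described above can be illustrated with a small sketch. It is not the TIFFBeingMadeDirectory class from my library: DemoEntry, DemoDirectory, and demo_tag_order are hypothetical names, every entry is reduced to a tag and one scalar value, and a std::map is assumed as the ordered container. The numeric tag values are the ones the TIFF specification assigns to ImageWidth (256), ImageLength (257), ImageDescription (270), and XResolution (282).

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical directory entry: a tag and a single scalar value.
struct DemoEntry {
    std::uint16_t tag;
    std::uint32_t value;
};

// Hypothetical directory stream: entries may arrive in any order,
// but they are always kept, and later written, in increasing tag order.
class DemoDirectory {
    std::map<std::uint16_t, std::uint32_t> entries_;  // sorted by tag
public:
    DemoDirectory& operator<<(const DemoEntry& e) {
        entries_[e.tag] = e.value;  // a repeated tag replaces the earlier entry
        return *this;
    }
    std::vector<std::uint16_t> tags_in_file_order() const {
        std::vector<std::uint16_t> tags;
        for (const auto& kv : entries_) tags.push_back(kv.first);
        return tags;
    }
};

// Usage: the caller adds entries in whatever order is convenient.
std::vector<std::uint16_t> demo_tag_order() {
    DemoDirectory dir;
    dir << DemoEntry{282, 72}     // XResolution
        << DemoEntry{256, 640}    // ImageWidth
        << DemoEntry{270, 0}      // ImageDescription
        << DemoEntry{257, 480};   // ImageLength
    std::vector<std::uint16_t> tags = dir.tags_in_file_order();
    assert(tags.front() == 256 && tags.back() == 282);
    return tags;
}
```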

All items being added to a TIFF dictionary are specialized instances of an abstract class TIFFNewDirItem. This class, like the Array class in the previous section, has no public constructors. All its instances are built only on the heap, through a specialized (overloaded) static method New.

In contrast, all SegmentOutStream::Array instances can be made only on the stack. A friend function byte_array (the only available instance generator) will see to it. Unlike the Array objects however, ScalarTIFFDE, StringTIFFDE, and other TIFFNewDirItem instances are supposed to be "permanent," since they are to be chained in an ordered list maintained by the directory stream. That is why these objects must be constructed on the heap.

Note that the programmer actually never sees these heap pointers. All the static member New makes is a wrapper object (of class TIFFNewDirItem::ItemRef), which has no publicly accessible methods. The wrapper stashes the pointer and passes it from the TIFFNewDirItem hierarchy onto the directory stream.

Thus the present example (the complete implementation is available at the URL above) demonstrates how to safely and reliably handle pointers to heap objects. These class hierarchies make it syntactically impossible:

to allocate an object other than on the heap;

to change, cast, or delete its pointer;

to use the pointer in any way other than intended (which is to pass it to a designated recipient, a TIFF directory stream).

The mere possibility of misuse, miscasting, or accidental disposing of an object is thus eliminated. It may surprise you to know that all this syntactic sugar carries zero run-time overhead (as the object code generated by gcc 2.7.2 and Metrowerks CodeWarrior versions 7 through 9 testifies).
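The heap-only discipline can be sketched in a few lines. The sketch below is an assumption-laden illustration, not the TIFFNewDirItem hierarchy itself: DemoItem, DemoDir, and demo_items are hypothetical names, and std::unique_ptr (which postdates the library) is used so that an unconsumed handle does not leak. What it shows is the division of labor: a private constructor, a static New as the sole generator, and an opaque ItemRef that only the directory can open.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

class DemoDir;

// Hypothetical directory item that can exist only on the heap.
class DemoItem {
    std::uint16_t tag_;
    std::uint32_t value_;
    DemoItem(std::uint16_t tag, std::uint32_t value) : tag_(tag), value_(value) {}
    friend class DemoDir;
public:
    // Opaque handle: created only by New, consumed only by DemoDir.
    class ItemRef {
        friend class DemoItem;
        friend class DemoDir;
        std::unique_ptr<DemoItem> item_;
        explicit ItemRef(DemoItem* p) : item_(p) {}
    };
    static ItemRef New(std::uint16_t tag, std::uint32_t value) {
        return ItemRef(new DemoItem(tag, value));
    }
};

// Hypothetical directory stream: the designated recipient of the handles.
class DemoDir {
    std::vector<std::unique_ptr<DemoItem>> items_;
public:
    DemoDir& operator<<(DemoItem::ItemRef&& ref) {
        assert(ref.item_ && "ItemRef already consumed");
        items_.push_back(std::move(ref.item_));  // the directory takes ownership
        return *this;
    }
    std::size_t size() const { return items_.size(); }
    std::uint16_t tag_at(std::size_t i) const { return items_.at(i)->tag_; }
    std::uint32_t value_at(std::size_t i) const { return items_.at(i)->value_; }
};

// Usage: the caller never sees, casts, or deletes a raw pointer.
std::size_t demo_items(DemoDir& dir) {
    dir << DemoItem::New(257, 480) << DemoItem::New(256, 640);
    // DemoItem on_stack(256, 640);  // would not compile: private constructor
    return dir.size();
}
```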

Efficient Blitting

The iostream phrases presented in this section are even further removed from file I/O, dealing specifically with graphical hardware (accelerator cards), such as the one found in a BeBox. An accelerator card receives a graphics command — for example, a line drawing primitive — from an application thread, rasterizes the line in a buffer, and blits it into screen memory.

The card operates at its fastest when it is given a block of several graphics primitives to execute in a single transaction. To arrange such a block, a low-level BeOS graphics API provides several tools, which are typically used as follows:
BPoint pt(viewer_pos.xe,viewer_pos.ye);
BeginLineArray(2);
AddLine(pt-BPoint(viewer_pos.gx,viewer_pos.gy),
    pt+BPoint(4*viewer_pos.gx,4*viewer_pos.gy),color);
AddLine(pt-BPoint(viewer_pos.gy,viewer_pos.gx),
    pt+BPoint(2*viewer_pos.gy,2*viewer_pos.gx),color);
EndLineArray();
The AddLine function, which adds a graphical primitive, must be used within the transaction boundaries set by calling BeginLineArray and EndLineArray. Furthermore, BeginLineArray must be told in advance how many calls to AddLine are to follow.

Thus to use this accelerator card API, you must abide by several semantic constraints, which however are not enforced syntactically. The API burdens the programmer with extra chores (like counting calls to AddLine). It also makes the code difficult to maintain. For example, one always has to remember to update the count in BeginLineArray when adding/deleting calls to line-drawing primitives. A few classes, defined and implemented in Listing 3, do away with these headaches, making the job of arranging a graphical transaction much more pleasant, not to mention safer. With these classes, one can write:
LineArray lines(view);
lines << BPoint(20,20)
      << LineArray::rline_to(-10,-10);
lines << LineArray::line_from_to(
             BPoint(10,20), BPoint(20,10));
lines.offset_by(BPoint(100,100));
lines.stroke();
or even
LineArray(view)
    << LineArray::line_from_to(BPoint(0,0),
        BPoint(20,10))
    << LineArray::rline_to(30,30) << endl;
The classes enforce the Begin/EndLineArray policy, and make it simply impossible to violate the API's usage patterns. An applications programmer no longer needs to count line segments in a line path. The LineArray class takes care of all that. The run-time overhead is near zero, as most of the class methods are inlined (a few of them are merely syntactic sugar anyway), and temporary buffers are almost always allocated on the stack.
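How a class can take over the counting and the begin/end bracketing may be sketched as follows. This is not Listing 3: DemoView is a hypothetical stand-in that merely logs the calls it receives (the real BeOS calls take different argument types), and DemoLineArray, DemoPoint, and demo_stroke are likewise invented for the illustration. Unlike the real class, the sketch buffers lines in a std::vector rather than on the stack.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct DemoPoint { float x, y; };
struct DemoLine { DemoPoint from, to; };

// Hypothetical stand-in for a view: it logs the calls it receives.
struct DemoView {
    std::vector<std::string> log;
    void BeginLineArray(std::size_t count) {
        log.push_back("begin " + std::to_string(count));
    }
    void AddLine(DemoPoint, DemoPoint) { log.push_back("line"); }
    void EndLineArray() { log.push_back("end"); }
};

// Hypothetical line stream: it collects segments first and emits the
// whole transaction, with the correct count, in stroke().
class DemoLineArray {
    DemoView& view_;
    std::vector<DemoLine> lines_;
    DemoPoint current_{0.0f, 0.0f};
public:
    struct line_from_to { DemoPoint from, to; };
    struct rline_to { float dx, dy; };  // relative to the current point

    explicit DemoLineArray(DemoView& view) : view_(view) {}

    DemoLineArray& operator<<(DemoPoint p) { current_ = p; return *this; }
    DemoLineArray& operator<<(line_from_to l) {
        lines_.push_back({l.from, l.to});
        current_ = l.to;
        return *this;
    }
    DemoLineArray& operator<<(rline_to r) {
        DemoPoint next{current_.x + r.dx, current_.y + r.dy};
        lines_.push_back({current_, next});
        current_ = next;
        return *this;
    }
    void stroke() {
        view_.BeginLineArray(lines_.size());  // the count is never hand-written
        for (const DemoLine& l : lines_) view_.AddLine(l.from, l.to);
        view_.EndLineArray();
        lines_.clear();
    }
};

// Usage: two segments are added, and the count passed on is 2.
std::vector<std::string> demo_stroke() {
    DemoView view;
    DemoLineArray lines(view);
    lines << DemoPoint{20.0f, 20.0f} << DemoLineArray::rline_to{-10.0f, -10.0f};
    lines << DemoLineArray::line_from_to{{10.0f, 20.0f}, {20.0f, 10.0f}};
    lines.stroke();
    assert(view.log.front() == "begin 2");
    return view.log;
}
```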

A Simple C++ Stream

I finish where I started, with the C++ iostreams library. The iostreams library is designed to provide convenient access to (mostly text-based) sequential I/O. Its standard, rather voluminous implementation, the one that usually comes with the compiler, is not the only implementation possible. I show instead a very simple alternative.

The need for it arose when I was porting to BeOS my linear-algebra and image-processing code, which uses the standard iostreams library a great deal. It quickly became obvious that the Metrowerks C++ development environment for BeOS DR8, alas, comes without the C++ iostreams library. I could not afford to implement the full hierarchy, but the prospect of going through all my code and replacing all occurrences of operator<< with corresponding printf calls was daunting.

As it turned out, however, this task can be easily accomplished without any modification to the code, with the right definitions of operator<< and cout. Listing 4 shows this simple implementation. It totals 60 lines, including empty lines. This single file, included as iostream.h, provides the bulk of commonly used C++ standard iostreams functionality. With this .h file, much source code written to use the traditional streams cin and cout compiles as is, and runs correctly.
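The general idea of layering operator<< over the C stdio functions can be sketched as follows. This is not Listing 4, only a guess at its flavor under stated assumptions: the names mini_ostream, mini_endl, and demo_format are hypothetical (chosen so that they do not clash with a real iostreams library), only a handful of types are covered, and no formatting state is kept.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical minimal output stream: each inserter forwards to stdio.
class mini_ostream {
    std::FILE* fp_;
public:
    explicit mini_ostream(std::FILE* fp) : fp_(fp) { assert(fp_ != nullptr); }

    mini_ostream& operator<<(const char* s) { std::fputs(s, fp_); return *this; }
    mini_ostream& operator<<(char c)        { std::fputc(c, fp_); return *this; }
    mini_ostream& operator<<(int v)         { std::fprintf(fp_, "%d", v); return *this; }
    mini_ostream& operator<<(unsigned v)    { std::fprintf(fp_, "%u", v); return *this; }
    mini_ostream& operator<<(long v)        { std::fprintf(fp_, "%ld", v); return *this; }
    mini_ostream& operator<<(double v)      { std::fprintf(fp_, "%g", v); return *this; }

    // A manipulator is a plain function applied to the stream.
    mini_ostream& operator<<(mini_ostream& (*manip)(mini_ostream&)) {
        return manip(*this);
    }
    void flush() { std::fflush(fp_); }
};

// Counterpart of endl: write a newline, then flush.
inline mini_ostream& mini_endl(mini_ostream& os) {
    os << '\n';
    os.flush();
    return os;
}

// Usage: write to a temporary file and read the text back.
std::string demo_format() {
    std::FILE* fp = std::tmpfile();
    if (!fp) return "";
    mini_ostream out(fp);
    out << "n = " << 42 << ", x = " << 1.5 << mini_endl;
    std::rewind(fp);
    char buf[64] = {0};
    std::size_t n = std::fread(buf, 1, sizeof(buf) - 1, fp);
    std::fclose(fp);
    return std::string(buf, n);
}
```

A drop-in replacement would go one step further and define a global object over stdout under the name cout, which is what lets existing code compile unchanged.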

Oleg Kiselyov is a computer scientist/software developer with Computer Sciences Corp. (CSC) in Monterey, CA. He's been programming professionally for 17 years. He can be reached at oleg@pobox.com.