Columns


Standard C

The Header <streambuf>

P.J. Plauger


P.J. Plauger is senior editor of The C Users Journal. He is convenor of the ISO C standards committee, WG14, and active on the C++ committee, WG21. His latest books are The Standard C Library and Programming on Purpose (three volumes), all published by Prentice-Hall. You can reach him at pjp@plauger.com.

For the past two months, I have been discussing that part of the Standard C++ library known collectively as iostreams. (See "Standard C: Introduction to Iostreams," CUJ, April 1994, and "Standard C: The Header <ios>," CUJ, May 1994.) The topic this month is the header <streambuf>. Its primary focus is the class streambuf, which serves as the driving engine for all iostreams operations. But there are one or two other critters of interest in this header that I will discuss first.

Listing 1 shows one way to write the header <streambuf>. As with the header <ios> I discussed last month, I feel obliged to warn you of some potential surprises:

For now, just take a pass over Listing 1. We'll be revisiting it in detail.

Living with <stdio.h>

The first thing you might notice is an old friend from the Standard C library. Macro EOF has exactly the same definition, and meaning, as in the header <stdio.h>. It is an integer value distinguishable from any value representable as type unsigned char. And, as in the Standard C library, it is used here to signal either end-of-file when reading or some other kind of input/output failure.
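The need for a value outside the unsigned char range can be illustrated with a short sketch. It uses the stream buffers of a present-day C++ library rather than the header in Listing 1, and the helper names count_until_eof and count_bytes are made up for the example. The point is only that the result of a read has to be held in an int, so that the byte 0xFF is not mistaken for EOF.

```cpp
#include <cstdio>    // EOF
#include <sstream>   // std::stringbuf
#include <string>

// Hypothetical helper: count the characters in a buffer by reading until
// sbumpc reports end-of-sequence. The result is kept in an int, not a
// char, so a byte with value 0xFF stays distinct from EOF.
int count_until_eof(std::stringbuf& sb)
{
    int n = 0;
    for (int c = sb.sbumpc(); c != EOF; c = sb.sbumpc())
        ++n;
    return n;
}

// Hypothetical wrapper: count the bytes of a string through a stringbuf.
int count_bytes(const std::string& s)
{
    std::stringbuf sb(s);
    return count_until_eof(sb);
}
```

Had the loop variable been declared as a plain char, the comparison against EOF could end the loop early on a 0xFF byte, or never end it at all, depending on whether char is signed.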

Another borrowing from Standard C is the type streamoff. It is used to represent signed offsets for positioning operations within a stream controlled by a streambuf object. That, of course, includes C-style streams controlled by objects of type FILE. A streamoff object must thus be able to represent the offsets used by the functions fseek and ftell, both declared in <stdio.h>. Hence, the type is pretty much constrained to have the same representation as type long.

Sadly, type streamoff shares the same double meaning with its Standard C counterpart:

It is not always clear from context which meaning is intended.

I have found it convenient in this implementation to define the secret constant _BADOFF. As in Standard C, it is the standard way to indicate an invalid absolute position. Unfortunately, it sometimes also is used to indicate an invalid relative position, for which the value -1 is a less compelling choice.

The Standard C I/O model also influences the iostreams machinery in more subtle ways. In particular, the remainder of header <streambuf> defines two classes:

Both strive in many ways to be abstract data types. The "streams" they control can be files in the Standard C sense, or text strings that grow and shrink dynamically in memory, or an open-ended set of user-defined sources and sinks.

Nevertheless, an important use for both these classes is to interface with their Standard C counterparts. Hence, class streampos is shaped internally by the needs of the Standard C functions fgetpos and fsetpos, declared in <stdio.h>. And class streambuf is obliged to work well with the streams of Standard C. Like it or not (and many C++ purists definitely do not like it), you will find artifacts of Standard C popping up in both these classes.

The Class streampos

The next major item in Listing 1 is the definition of class streampos. As you might guess by now, it is an attempt to improve upon the type fpos_t, defined in <stdio.h>. And in many ways it succeeds:

By comparison, an fpos_t value is just a magic cookie. All you can do is obtain a current file position by calling fgetpos, copy the value about, and use that value to restore the same file position by calling fsetpos (for the same open stream, of course).
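The magic-cookie style of use can be sketched as follows. The function name revisit_position is hypothetical, and the sketch assumes that tmpfile can create a scratch file on the host system.

```cpp
#include <cstdio>

// Write a short text to a temporary file, remember the position of the
// fourth character with fgetpos, read on, then return there with fsetpos.
// Returns the character read after restoring, or EOF on any failure.
int revisit_position()
{
    std::FILE* fp = std::tmpfile();   // assumption: a scratch file is available
    if (fp == nullptr)
        return EOF;

    int result = EOF;
    std::fpos_t mark;                 // opaque: only fgetpos/fsetpos understand it
    if (std::fputs("abcdefgh", fp) != EOF
        && std::fseek(fp, 3L, SEEK_SET) == 0
        && std::fgetpos(fp, &mark) == 0) {
        std::fgetc(fp);               // consume 'd'
        std::fgetc(fp);               // consume 'e'
        if (std::fsetpos(fp, &mark) == 0)
            result = std::fgetc(fp);  // back at the remembered position
    }
    std::fclose(fp);
    return result;
}
```

Nothing in the sketch inspects or alters mark. It is obtained, stored, and handed back to the same stream, which is all that Standard C promises you can do with it.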

For an in-memory stream, a streamoff value stored inside a streampos object probably suffices to represent all sensible stream positions. All this arithmetic makes sense and is easy to perform. But for a C-style stream, you have to rely in the end on the Standard C file-positioning functions. If they can't do what you ask, all this flexibility is for naught.
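For the in-memory case, the arithmetic can be tried out with a present-day library, as in the sketch below. It relies on std::streampos and std::streamoff as they ended up in the final C++ Standard, not on the class in Listing 1, and the function name arithmetic_demo is made up for the example.

```cpp
#include <ios>
#include <sstream>
#include <string>

// Remember a position, move on, measure the distance covered, and then
// return to a position computed from the two.
std::string arithmetic_demo()
{
    std::istringstream in("0123456789");
    in.seekg(2);
    std::streampos mark = in.tellg();        // absolute position 2
    in.seekg(5, std::ios::cur);              // now at position 7
    std::streamoff gap = in.tellg() - mark;  // position - position = offset (5)
    in.seekg(mark + (gap - 1));              // position + offset = position (6)

    std::string rest;
    in >> rest;                              // read from position 6 to the end
    return rest;
}
```

With a string-based stream every one of these steps is a matter of pointer arithmetic, so none of them has any reason to fail.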

I encourage you to indulge in as little file positioning as possible. When you must, try to confine yourself to revisiting positions that you've visited earlier. If you're more ambitious, expect occasional surprises, and positioning failures.

The implementation I chose endeavors to succeed as often as possible, even in the teeth of possible failure. It stores within each streampos object two private member objects:

The actual declared type of the latter is _Fpost because the name fpos_t is not necessarily defined at this point. (The draft C++ Standard pays lip service to the need for this component of class streampos, but is intentionally mealy mouthed about whether it is actually present.)

This implementation declares the structure fpos_t (_Fpost) with the members:

The state memory, in turn, has the elements:

For all but wide-oriented streams, both these components are always zero.

I don't have the space here to describe all the subtleties of class streampos. For now, I'll just show the code that goes with the class definition in Listing 1. Listing 2 shows the class constructor. I've augmented the required definition with an optional second argument, to ease interfacing with the Standard C library.

Listing 3 shows the function streampos::offset(). Note that it ignores the algebraic offset _Pos if the absolute position _Fp._Off is invalid. (Yes, the names are backwards.)

Similarly, Listing 4 shows the function streampos::operator-(const streampos&). It too yields _BADOFF if either file position is undefined. Sadly, there is no easy way to distinguish an invalid return from the very sensible difference -1.

Finally, Listing 5 shows the function streampos::operator==(const streampos&). Note that an invalid file position always compares equal to another invalid file position, and never compares equal to a valid one. The equality check for valid file positions may be too exacting, but I don't think so.
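The behavior described for these three functions can be approximated in a few lines. What follows is not the code from Listings 3 through 5. It is a simplified stand-in that assumes the invalid marker is -1 and that two valid positions are equal when their combined offsets match. The names Offset, BAD_OFFSET, and Position are hypothetical.

```cpp
typedef long Offset;            // stands in for streamoff
const Offset BAD_OFFSET = -1;   // stands in for _BADOFF (assumed value)

class Position {                // simplified stand-in for streampos
public:
    explicit Position(Offset rel = 0, Offset abs = 0)
        : rel_(rel), abs_(abs) {}

    bool valid() const { return abs_ != BAD_OFFSET; }

    // The relative part is ignored when the absolute part is invalid.
    Offset offset() const { return valid() ? abs_ + rel_ : BAD_OFFSET; }

    // The difference is the invalid marker if either operand is invalid.
    Offset operator-(const Position& rhs) const
    {
        return valid() && rhs.valid() ? offset() - rhs.offset() : BAD_OFFSET;
    }

    // Invalid equals invalid, and never equals valid.
    bool operator==(const Position& rhs) const
    {
        if (!valid() || !rhs.valid())
            return valid() == rhs.valid();
        return offset() == rhs.offset();   // assumption: offsets alone decide
    }

private:
    Offset rel_;   // algebraic (relative) offset
    Offset abs_;   // absolute position, or BAD_OFFSET
};
```

Subtracting Position(0, 5) from Position(0, 4) yields -1, and so does subtracting anything from an invalid Position. A caller cannot tell the two results apart without testing validity separately.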

The Class streambuf

I could devote multiple columns to class streambuf, and I probably will. But I intend to save many of the details for columns that describe classes derived from streambuf. For now, I focus on what you need to know to understand how this class functions as a base class for more interesting offspring. That's a hard enough tale to tell in this limited space.

Conceptually, every streambuf object controls an input stream and an output stream. Often, one of the two streams doesn't really exist. All operations on the nonexistent stream simply fail. If there are indeed two streams, there need be no connection between them, but usually there is. Each stream can maintain a separate position indicator, but they can also be tied together and move in concert.
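A small experiment shows the two position indicators moving independently. It uses std::stringstream from a present-day library, where the string-based buffer keeps separate input and output positions. (File-based buffers, by contrast, maintain a single joint position.) The names Positions and independent_positions are made up for the example.

```cpp
#include <ios>
#include <sstream>

struct Positions {
    std::streamoff get;   // input position
    std::streamoff put;   // output position
};

// Write five characters, read one, and report both positions.
Positions independent_positions()
{
    std::stringstream ss;
    ss << "hello";        // advances only the output position
    char c;
    ss.get(c);            // advances only the input position
    Positions p = { std::streamoff(ss.tellg()), std::streamoff(ss.tellp()) };
    return p;
}
```

After the call, the input position is 1 and the output position is 5.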

Class streambuf describes a very general engine for managing input and output character sequences. The class offers a public interface for civilian use. Typically, the inserters associated with an ostream object, such as cout, call on parts of this public interface to "insert characters into an output sequence." And the extractors associated with an istream object, such as cin, call on other parts of this public interface to "extract characters from an input sequence."
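To give a feel for that public interface, here is a minimal sketch that copies characters from one stream buffer to another by calling the public members directly. It is written against std::streambuf as found in a present-day library, so the spellings may differ from those in Listing 1, and the function name copy_through is hypothetical.

```cpp
#include <ios>
#include <sstream>
#include <string>

// Copy every character from one stream buffer to another using only
// public streambuf members, much as an extractor and an inserter would.
std::string copy_through(const std::string& text)
{
    std::stringbuf source(text, std::ios::in);
    std::stringbuf sink(std::ios::out);
    typedef std::streambuf::traits_type traits;

    for (std::streambuf::int_type c = source.sgetc();   // peek at current
         !traits::eq_int_type(c, traits::eof());
         c = source.snextc())                           // advance, then peek
        sink.sputc(traits::to_char_type(c));            // insert one character

    return sink.str();
}
```

A real extractor also has to set the stream's status bits and honor formatting flags, which is much of the reason for leaving the low-level calls to existing inserters and extractors.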

You may never have occasion to use this interface yourself. Even if you write your own inserters and extractors, you often do the low-level work by calling on existing inserters and extractors. In fact, I'd discourage you from making direct calls on a streambuf object, at least until you've had time to study some existing inserters and extractors and really understand the protocol. Any abstract description you read will never compare to first-hand practical experience. For now, I'll characterize the public interface to class streambuf in the briefest of terms.

When extracting from a sequence:

When inserting into a sequence:

To position within a sequence, you use the functions pubseekoff and pubsetpos. An argument of type ios::seekdir (or ios::seek_dir, as I discussed last month) can take the values ios::beg, ios::cur, or ios::end. If you know how to call fseek, you can quickly guess what these mean. (And if you don't, you probably shouldn't try to guess.) An argument of type ios::openmode (yet another misnomer) has ios::in set to affect the input sequence, and ios::out set to affect the output sequence. The effect of various combinations of arguments depends very strongly on the actual type derived from the base streambuf.
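The positioning calls can be sketched with a string-based buffer from a present-day library. Note that the final C++ Standard spells the second function pubseekpos. The function name seek_demo is made up, and the results shown hold for std::stringbuf; other derived classes are free to behave differently.

```cpp
#include <ios>
#include <sstream>
#include <string>

// Exercise pubseekoff and pubseekpos on a buffer open for both reading
// and writing. Returns the characters seen, a colon, and the final text.
std::string seek_demo()
{
    std::stringbuf sb("abcdef", std::ios::in | std::ios::out);
    std::string seen;

    // Move the input position 2 characters past the beginning.
    sb.pubseekoff(2, std::ios::beg, std::ios::in);
    seen += static_cast<char>(sb.sbumpc());

    // Move the input position to 1 character before the end.
    sb.pubseekoff(-1, std::ios::end, std::ios::in);
    seen += static_cast<char>(sb.sgetc());

    // Move only the output position, then overwrite one character.
    std::streampos start = sb.pubseekoff(0, std::ios::beg, std::ios::out);
    sb.pubseekpos(start + std::streamoff(1), std::ios::out);
    sb.sputc('X');

    return seen + ":" + sb.str();
}
```

The last argument in each call selects which of the two sequences is repositioned, which is why the output seek leaves the input position alone.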

Similarly, the functions pubsetbuf and pubsync are really just hooks into virtual functions that are specialized for derived classes. The default behavior for the base class is to do nothing.
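The hook arrangement can be seen in a pair of tiny derived classes. The sketch assumes a present-day std::streambuf, where the protected virtual functions behind pubsetbuf and pubsync are named setbuf and sync. The class names CountingBuf and PlainBuf are hypothetical.

```cpp
#include <streambuf>

// Hypothetical derived class that counts how often it is asked to sync.
class CountingBuf : public std::streambuf {
public:
    CountingBuf() : syncs_(0) {}
    int syncs() const { return syncs_; }

protected:
    // Called by the public, nonvirtual pubsync().
    virtual int sync()
    {
        ++syncs_;
        return 0;   // zero reports success
    }

private:
    int syncs_;
};

// A derived class that overrides nothing inherits the do-nothing hooks.
class PlainBuf : public std::streambuf {};
```

Calling pubsync on a CountingBuf object bumps the counter. Calling it on a PlainBuf object just returns zero, and pubsetbuf on that object returns the object's own address without doing anything else.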

Underground streambuf

So much for the public interface. Class streambuf also offers a protected interface which is even more sophisticated. You care about this interface, naturally enough, only if you derive your own class from a streambuf base. (That's the only way you can even access this interface.)

The first part of the protected interface you can, and should, pretty much ignore. It is used by the public members to conduct their business. Here again you will find a number of funny names and irregularities. But that should be of no concern to the typical user of a class derived from streambuf. Look at the public member functions, nearly all of which are inlined, to get a feel for what they do, if you care.

You might note, in passing, that the constructors for class streambuf are protected. That keeps you from constructing an object of this class outside captivity. The idea is to derive from the base class and specialize it to perform useful operations. (I discussed the special magic about constructors with argument type _Uninitialized last month.)

The second part of the protected interface is where the action is. Class streambuf is built around a set of virtual member functions that allow you to tailor behavior over a very wide range. Here is where insertions get turned into actual writes to some physical sink for characters. Here is where extractions get turned into actual reads from some physical source of characters. Here is where positioning operations get real, or those other hooks get something hung upon them. In short, the art of specializing streambuf is the art of writing virtual member functions that honor the basic protocol of the base class.
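As a rough illustration of turning insertions into writes, here is about the smallest useful specialization one can write against a present-day std::streambuf. It sets up no output buffer at all, so every inserted character is handed to the virtual function overflow. The class name UpperCaseBuf and the function shout are hypothetical, and the "physical sink" is just a string.

```cpp
#include <cctype>
#include <ostream>
#include <streambuf>
#include <string>

// Hypothetical unbuffered sink: with no put area set up, each inserted
// character arrives at overflow().
class UpperCaseBuf : public std::streambuf {
public:
    const std::string& text() const { return text_; }

protected:
    virtual int_type overflow(int_type c)
    {
        if (!traits_type::eq_int_type(c, traits_type::eof()))
            text_ += static_cast<char>(std::toupper(c));
        return traits_type::not_eof(c);   // any value but eof means success
    }

private:
    std::string text_;   // stands in for a real sink
};

// Drive the buffer through an ordinary ostream and its inserters.
std::string shout(const std::string& s)
{
    UpperCaseBuf buf;
    std::ostream out(&buf);
    out << s << ' ' << 42;
    return buf.text();
}
```

The inserters know nothing about UpperCaseBuf. They call the public interface of the base class, and the virtual function does the rest.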

Until the draft C++ Standard came along, there was only one way to specialize a streambuf. Find some code for another specialization that more or less worked, and that you could more or less understand, and crib like crazy. Having access to the source for streambuf proper also helped.

Now you have a little help. You can read the draft C++ Standard to learn the protocol that must be honored by these virtual member functions. Then you can go look at some specialization that works and that you can more or less understand, and crib like crazy. I can't honestly recommend any other way to proceed.

Why is this so? Because describing the protocol for the streambuf virtual member functions is the hardest piece of standardese I've ever essayed. Jerry Schwarz, the original author of iostreams, is still not happy with what I wrote. I think it's getting reasonably accurate and precise, but I'd never accuse this part of the draft of being a decent tutorial. Learning by imitation — and by doing — is still the safest route.

Having said that, I'll give you just the briefest of hints about what the four most critical virtual member functions do:

Implementing streambuf

In principle, an object of class streambuf stores six pointers. These point at the beginning, current character, and just past the end of the input and output buffers. In practice, I found it desirable to add a level of indirection. You will notice in Listing 1 that the class indeed has six private pointer members. It also has six private pointers to pointers to char. And it is these indirect pointers that the protected member functions use to access characters in in-memory buffers.
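The six conceptual pointers are visible to any derived class. The sketch below uses the accessor names of a present-day std::streambuf (eback, gptr, egptr for input; pbase, pptr, epptr for output) and says nothing about the extra level of indirection in Listing 1. The class name ArrayBuf is hypothetical.

```cpp
#include <cstddef>
#include <streambuf>

// Hypothetical buffer over two caller-supplied arrays. It only sets up
// the six pointers and reports how far the "current" ones have moved.
class ArrayBuf : public std::streambuf {
public:
    ArrayBuf(char* in, std::size_t in_len, char* out, std::size_t out_len)
    {
        setg(in, in, in + in_len);   // beginning, current, end of input
        setp(out, out + out_len);    // beginning (= current), end of output
    }

    std::ptrdiff_t consumed() const  { return gptr() - eback(); }
    std::ptrdiff_t remaining() const { return egptr() - gptr(); }
    std::ptrdiff_t written() const   { return pptr() - pbase(); }
    std::ptrdiff_t room() const      { return epptr() - pptr(); }
};
```

Extracting one character advances only the current input pointer, and inserting one advances only the current output pointer. When either reaches its end pointer, the base class turns to the virtual functions, which in this sketch are left at their failing defaults.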

I chose this more ornate implementation to permit a very important optimization. Remember, I said earlier that a principal use for streambuf objects is to work in concert with C-style streams controlled by objects of type FILE. The obvious way to do so is to derive a class whose virtual member functions call fgetc, fputc, etc. for the corresponding FILE. But that is terribly inefficient. A better way, when possible, is to have both the streambuf and FILE objects control exactly the same set of pointers into exactly the same buffers.

This I have been able to do with my implementation of the Standard C library. The cost is this extra level of indirection within class streambuf. The payoff is substantially improved performance for C++ iostreams when reading and writing files. I consider it a good tradeoff.

So the six indirect pointers point directly inside a FILE object, whenever possible, as I will show at a later date. In this case, the six character pointers are ignored. Otherwise, the six indirect pointers point at the six character pointers.
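The idea of the indirection can be sketched for the input side alone. This is not the code from Listing 1 or Listing 6. FakeFile is a made-up stand-in for the buffer bookkeeping inside a FILE object, and IndirectGetArea with its members is equally hypothetical.

```cpp
// Stand-in for the part of a FILE object that tracks its buffer.
struct FakeFile {
    char* buf_begin;
    char* next;
    char* buf_end;
};

class IndirectGetArea {
public:
    // Stand-alone flavor: the indirect pointers refer to our own members.
    IndirectGetArea(char* first, char* last)
        : gbeg_(first), gnext_(first), gend_(last),
          pgbeg_(&gbeg_), pgnext_(&gnext_), pgend_(&gend_) {}

    // Shared flavor: the indirect pointers refer into the FakeFile, so
    // both objects see every update. Our own three pointers go unused.
    explicit IndirectGetArea(FakeFile& f)
        : gbeg_(nullptr), gnext_(nullptr), gend_(nullptr),
          pgbeg_(&f.buf_begin), pgnext_(&f.next), pgend_(&f.buf_end) {}

    // Copying would leave indirect pointers aimed at the source object.
    IndirectGetArea(const IndirectGetArea&) = delete;
    IndirectGetArea& operator=(const IndirectGetArea&) = delete;

    bool at_start() const { return *pgnext_ == *pgbeg_; }

    // Return the next character, or -1 if the get area is exhausted.
    int bump()
    {
        if (*pgnext_ == *pgend_)
            return -1;
        return static_cast<unsigned char>(*(*pgnext_)++);
    }

private:
    char* gbeg_;     // own pointers, used only in the stand-alone flavor
    char* gnext_;
    char* gend_;
    char** pgbeg_;   // indirect pointers, always used for access
    char** pgnext_;
    char** pgend_;
};
```

In the shared flavor, a call to bump advances the next pointer inside the FakeFile itself, so code that reads through the FakeFile afterward picks up exactly where the class left off. The price is one extra dereference on every access.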

Listing 6 shows the file streambu.c. It contains all the functions needed to implement the base class streambuf. Most are dummy functions that either fail or do something innocuous. The two flavors of the secret function _Init give you some hint of the two ways to use the six pointers that I just described.

If you don't understand streambuf yet, don't worry. Its use will become clearer as you see some of the ways it is specialized later in the Standard C++ library.