P.J. Plauger is senior editor of The C Users Journal. He is convenor of the ISO C standards committee, WG14, and active on the C++ committee, WG21. His latest books are The Standard C Library, and Programming on Purpose (three volumes), all published by Prentice-Hall. You can reach him at pjp@plauger.com.
For the past two months, I have been discussing that part of the Standard C++ library known collectively as iostreams. (See "Standard C: Introduction to Iostreams," CUJ, April 1994, and "Standard C: The Header <ios>," CUJ, May 1994.) The topic this month is the header <streambuf>. Its primary focus is the class streambuf, which serves as the driving engine for all iostreams operations. But there are one or two other critters of interest in this header that I will discuss first.
Listing 1 shows one way to write the header <streambuf>. As with the header <ios> I discussed last month, I feel obliged to warn you of some potential surprises:
- If you think you know iostreams as it is commonly practiced, be prepared for a few changes. These are a result of the standardization process.
- If you know next to nothing about iostreams, be prepared for more confusing detail. This package is huge, and not easily grasped a bit at a time.
- Note that the implementation presented here doesn't always follow traditional lines. As usual, I implement the draft standard as it currently reads, not as it evolved historically.
For now, just take a pass over Listing 1. We'll be revisiting it in detail.
Living with <stdio.h>
The first thing you might notice is an old friend from the Standard C library. Macro EOF has exactly the same definition, and meaning, as in the header <stdio.h>. It is an integer value distinguishable from any value representable as type unsigned char. And, as in the Standard C library, it is used here to signal either end-of-file when reading or some other kind of input/output failure.
Another borrowing from Standard C is the type streamoff. It is used to represent signed offsets for positioning operations within a stream controlled by a streambuf object. That, of course, includes C-style streams controlled by objects of type FILE. A streamoff object must thus be able to represent the offsets used by the functions fseek and ftell, both declared in <stdio.h>. Hence, the type is pretty much constrained to have the same representation as type long.
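To see why EOF must be distinguishable from every unsigned char value, here is a minimal sketch. It assumes today's standard library, where std::stringbuf is one of the classes derived from streambuf, rather than the draft implementation in Listing 1. The helper names drain and drain_sample are made up for illustration.

```cpp
#include <cstdio>    // EOF
#include <sstream>   // std::stringbuf
#include <string>
#include <vector>

// Extract characters one at a time, collecting the int values that
// sbumpc() returns, until it reports EOF.
std::vector<int> drain(std::streambuf& sb) {
    std::vector<int> values;
    for (int c = sb.sbumpc(); c != EOF; c = sb.sbumpc()) {
        values.push_back(c);
    }
    return values;
}

// The byte 0xFF comes back as 255, not as EOF, because each character
// is converted through unsigned char before it is returned as an int.
std::vector<int> drain_sample() {
    std::stringbuf sb(std::string("A\xFF", 2));
    return drain(sb);
}
```

If the return value were stored in a plain char before the comparison, a 0xFF byte could be mistaken for end-of-file on systems where char is signed. That is the same trap as with fgetc in Standard C.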
Sadly, type streamoff shares the same double meaning with its Standard C counterpart:
- It can be a signed displacement relative to some position within a file. All values are meaningful.
- It can be an absolute position within some file. Negative values are nonsensical. (And a file may be so large that it has positions not representable as values of type streamoff.)
It is not always clear from context which meaning is intended.
I have found it convenient in this implementation to define the secret constant _BADOFF. As in Standard C, it is the standard way to indicate an invalid absolute position. Unfortunately, it sometimes also is used to indicate an invalid relative position, for which the value -1 is a less compelling choice.
The Standard C I/O model also influences the iostreams machinery in more subtle ways. In particular, the remainder of header <streambuf> defines two classes:
- streampos, for describing arbitrary positions within a stream
- streambuf, for controlling input and output to a stream
Both strive in many ways to be abstract data types. The "streams" they control can be files in the Standard C sense, or text strings that grow and shrink dynamically in memory, or an open-ended set of user-defined sources and sinks.
Nevertheless, an important use for both these classes is to interface with their Standard C counterparts. Hence, class streampos is shaped internally by the needs of the Standard C functions fgetpos and fsetpos, declared in <stdio.h>. And class streambuf is obliged to work well with the streams of Standard C. Like it or not (and many C++ purists definitely do not like it), you will find artifacts of Standard C popping up in both these classes.
The Class streampos
The next major item in Listing 1 is the definition of class streampos. As you might guess by now, it is an attempt to improve upon the type fpos_t, defined in <stdio.h>. And in many ways it succeeds:
- You can add a streamoff value to a streampos object, or subtract a streamoff value from a streampos object to determine a new stream position.
- You can subtract a streampos object from another streampos object to obtain a streamoff difference.
- You can compare two streampos objects for equality or inequality.
By comparison, an fpos_t value is just a magic cookie. All you can do is obtain a current file position by calling fgetpos, copy the value about, and use that value to restore the same file position by calling fsetpos (for the same open stream, of course).
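The three operations on streampos can be sketched with the types std::streampos and std::streamoff as they exist in today's standard library. These may differ in detail from the draft class in Listing 1. The function names are hypothetical wrappers, present only to give each operation a label.

```cpp
#include <ios>   // std::streampos, std::streamoff

// Add (or, with a negative offset, subtract) a streamoff to get a new position.
std::streampos advance(std::streampos pos, std::streamoff off) {
    return pos + off;
}

// Subtract one position from another to get a signed streamoff difference.
std::streamoff distance_between(std::streampos from, std::streampos to) {
    return to - from;
}

// Compare two positions for equality.
bool same_place(std::streampos a, std::streampos b) {
    return a == b;
}
```

None of this is possible with an fpos_t value, which supports only copying and handing back to fsetpos.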
For an in-memory stream, a streamoff value stored inside a streampos object probably suffices to represent all sensible stream positions. All this arithmetic makes sense and is easy to perform. But for a C-style stream, you have to rely in the end on the Standard C file-positioning functions. If they can't do what you ask, all this flexibility is for naught.
I encourage you to indulge in as little file positioning as possible. When you must, try to confine yourself to revisiting positions that you've visited earlier. If you're more ambitious, expect occasional surprises, and positioning failures.
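The conservative style of positioning, revisiting only a position you have already visited, might look like the sketch below. It assumes today's standard library, in which the member that restores a saved position is spelled pubseekpos, and it uses an in-memory std::stringbuf so that the seek is known to succeed. The helpers peek_ahead and peek_twice are hypothetical.

```cpp
#include <ios>
#include <sstream>
#include <string>

// Read up to n characters (n must not be negative), then return to the
// position where reading began. The read position ends up unchanged
// when the stream supports positioning.
std::string peek_ahead(std::streambuf& sb, std::streamsize n) {
    // A zero-offset seek relative to the current position just reports it.
    const std::streampos mark =
        sb.pubseekoff(0, std::ios_base::cur, std::ios_base::in);

    std::string text(static_cast<std::string::size_type>(n), '\0');
    const std::streamsize got = sb.sgetn(&text[0], n);
    text.resize(static_cast<std::string::size_type>(got));

    // Restore only a position that was actually obtained.
    if (mark != std::streampos(std::streamoff(-1))) {
        sb.pubseekpos(mark, std::ios_base::in);
    }
    return text;
}

// Peeking twice yields the same text, since each call restores the position.
std::string peek_twice() {
    std::stringbuf sb("hello world");
    const std::string first = peek_ahead(sb, 5);
    const std::string second = peek_ahead(sb, 5);
    return first + "|" + second;
}
```

On a stream that cannot seek, the first call to pubseekoff reports failure and the characters are simply consumed.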
The implementation I chose endeavors to succeed as often as possible, even in the teeth of possible failure. It stores within each streampos object two private member objects:
- _Pos, of type streamoff, to keep track of the simple offset arithmetic described above
- _Fp, of type fpos_t, to keep track of a C-style file position when working with actual files
The actual declared type of the latter is _Fpost because the name fpos_t is not necessarily defined at this point. (The draft C++ Standard pays lip service to the need for this component of class streampos, but is intentionally mealy mouthed about whether it is actually present.)
This implementation declares the structure fpos_t (_Fpost) with the members:
- _Off, for the absolute position within a stream (_BADOFF if undefined)
- _Wstate, for the state memory when parsing a wide-oriented stream (See "Standard C: Wide Character Streams," CUJ, July 1993.)
The state memory, in turn, has the elements:
- _State, for the current parse and shift state within a multibyte stream
- _Wchar, for the current wide-character accumulator used by the multibyte stream parser
For all but wide-oriented streams, both these components are always zero.
I don't have the space here to describe all the subtleties of class streampos. For now, I'll just show the code that goes with the class definition in Listing 1. Listing 2 shows the class constructor. I've augmented the required definition with an optional second argument, to ease interfacing with the Standard C library.
Listing 3 shows the function streampos::offset(). Note that it ignores the algebraic offset _Pos if the absolute position _Fp._Off is invalid. (Yes, the names are backwards.)
Similarly, Listing 4 shows the function streampos::operator-(const streampos&). It too yields _BADOFF if either file position is undefined. Sadly, there is no easy way to distinguish an invalid return from the very sensible difference -1.
Finally, Listing 5 shows the function streampos::operator==(const streampos&). Note that an invalid file position always compares equal to another invalid file position, and never compares equal to a valid one. The equality check for valid file positions may be too exacting, but I don't think so.
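The behavior described for Listings 3 through 5 can be sketched as follows. This is not the code from those listings. Every name here (Offset, kBadOff, ParseState, FilePos, Position) is a hypothetical stand-in, the constructor arguments are invented for convenience, and the sum returned by offset() for a valid position is an assumption about how the two members combine.

```cpp
typedef long Offset;          // stands in for streamoff
const Offset kBadOff = -1;    // stands in for _BADOFF

struct ParseState {           // stands in for the _Wstate member
    int state;                // parse and shift state
    int wchar;                // wide-character accumulator
};

struct FilePos {              // stands in for fpos_t (_Fpost)
    Offset off;               // absolute position, or kBadOff if undefined
    ParseState wstate;        // all zero except for wide-oriented streams
};

class Position {              // stands in for streampos
public:
    explicit Position(Offset pos = 0, Offset file_off = 0) : pos_(pos) {
        fp_.off = file_off;
        fp_.wstate.state = 0;
        fp_.wstate.wchar = 0;
    }

    // The algebraic offset is ignored when the file position is invalid.
    Offset offset() const {
        return fp_.off == kBadOff ? kBadOff : fp_.off + pos_;
    }

    // Yields kBadOff if either file position is undefined.
    Offset operator-(const Position& rhs) const {
        if (fp_.off == kBadOff || rhs.fp_.off == kBadOff) return kBadOff;
        return offset() - rhs.offset();
    }

    // Invalid equals invalid and never equals valid. Valid positions
    // must match in every component, including the parse state.
    bool operator==(const Position& rhs) const {
        const bool bad = fp_.off == kBadOff;
        const bool rhs_bad = rhs.fp_.off == kBadOff;
        if (bad || rhs_bad) return bad && rhs_bad;
        return offset() == rhs.offset()
            && fp_.wstate.state == rhs.fp_.wstate.state
            && fp_.wstate.wchar == rhs.fp_.wstate.wchar;
    }

private:
    Offset pos_;   // plays the role of _Pos
    FilePos fp_;   // plays the role of _Fp
};
```

With these definitions, Position(4) - Position(5) is a perfectly valid difference that happens to equal kBadOff, which is the ambiguity noted for Listing 4.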
The Class streambuf
I could devote multiple columns to class streambuf, and I probably will. But I intend to save many of the details for columns that describe classes derived from streambuf. For now, I focus on what you need to know to understand how this class functions as a base class for more interesting offspring. That's a hard enough tale to tell in this limited space.
Conceptually, every streambuf object controls an input stream and an output stream. Often, one of the two streams doesn't really exist. All operations on the nonexistent stream simply fail. If there are indeed two streams, there need be no connection between them, but usually there is. Each stream can maintain a separate position indicator, but they can also be tied together and move in concert.
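Separate position indicators are easy to observe with an in-memory stream. The sketch below assumes today's std::stringbuf, which shares one character array between the input and output sequences but tracks a position for each. The struct Positions and the helpers where and after_mixed_io are hypothetical.

```cpp
#include <ios>
#include <sstream>

struct Positions {
    std::streamoff get;   // current offset in the input sequence
    std::streamoff put;   // current offset in the output sequence
};

// Report both positions without moving either one.
Positions where(std::streambuf& sb) {
    Positions p;
    p.get = sb.pubseekoff(0, std::ios_base::cur, std::ios_base::in);
    p.put = sb.pubseekoff(0, std::ios_base::cur, std::ios_base::out);
    return p;
}

// Extract three characters and insert one, then see where each side stands.
Positions after_mixed_io() {
    std::stringbuf sb("hello world");   // readable and writable by default
    sb.sbumpc();
    sb.sbumpc();
    sb.sbumpc();
    sb.sputc('J');                       // overwrites the 'h'
    return where(sb);
}
```

A buffer that fronts a file would more likely tie the two positions together, since the underlying file has only one.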
Class streambuf describes a very general engine for managing input and output character sequences. The class offers a public interface for civilian use. Typically, the inserters associated with an ostream object, such as cout, call on parts of this public interface to "insert characters into an output sequence." And the extractors associated with an istream object, such as cin, call on other parts of this public interface to "extract characters from an input sequence."
You may never have occasion to use this interface yourself. Even if you write your own inserters and extractors, you often do the low-level work by calling on existing inserters and extractors. In fact, I'd discourage you from making direct calls on a streambuf object, at least until you've had time to study some existing inserters and extractors and really understand the protocol. Any abstract description you read will never compare to first-hand practical experience. For now, I'll characterize the public interface to class streambuf in the briefest of terms.
When extracting from a sequence:
- Peek at the next character by calling sgetc(), extract (and point past) it by calling sbumpc(), or put back the last character ch you extracted by calling sputbackc(ch).
- Read up to n characters into a buffer buf by calling sgetn(buf, n). The return value tells you how many characters you actually read.
- For various reasons of performance or reliability, avoid calling snextc() or sungetc(). Also avoid calling sputbackc(ch) except as I described above.
When inserting into a sequence:
- Write a character ch by calling sputc(ch).
- Write up to n characters from a buffer buf by calling sputn(buf, n). The return value tells you how many characters you actually wrote, which should always be n.
To position within a sequence, you use the functions pubseekoff and pubsetpos. An argument of type ios::seekdir (or ios::seek_dir, as I discussed last month) can take the values ios::beg, ios::cur, or ios::end. If you know how to call fseek, you can quickly guess what these mean. (And if you don't, you probably shouldn't try to guess.) An argument of type ios::openmode (yet another misnomer) has ios::in set to affect the input sequence, and ios::out set to affect the output sequence. The effect of various combinations of arguments depends very strongly on the actual type derived from the base streambuf.
Similarly, the functions pubsetbuf and pubsync are really just hooks into virtual functions that are specialized for derived classes. The default behavior for the base class is to do nothing.
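As a rough illustration of the public interface, here are the extraction, insertion, and positioning calls exercised against std::stringbuf from today's standard library. The spellings std::ios_base::end and std::ios_base::in are the modern ones and may not match the draft. The three function names are hypothetical.

```cpp
#include <cstdio>    // EOF
#include <ios>
#include <sstream>
#include <string>

// Extraction: peek, extract, put back the character just extracted,
// then read a block.
std::string extract_demo() {
    std::stringbuf sb("abcdef");
    std::string log;
    log += static_cast<char>(sb.sgetc());     // peeks 'a', position unchanged
    log += static_cast<char>(sb.sbumpc());    // extracts 'a', points past it
    const int ch = sb.sbumpc();               // extracts 'b'
    sb.sputbackc(static_cast<char>(ch));      // puts 'b' back
    char buf[8];
    const std::streamsize n = sb.sgetn(buf, 3);   // reads 'b', 'c', 'd'
    log.append(buf, static_cast<std::string::size_type>(n));
    return log;
}

// Insertion: write one character, then a block of three.
std::string insert_demo() {
    std::stringbuf sb;
    sb.sputc('x');
    const std::streamsize n = sb.sputn("yz!", 3);
    return n == 3 ? sb.str() : std::string();
}

// Positioning: seek relative to the end, much as fseek does with SEEK_END,
// then peek at the character found there.
int char_at_offset_from_end(std::streambuf& sb, std::streamoff back) {
    const std::streampos pos =
        sb.pubseekoff(-back, std::ios_base::end, std::ios_base::in);
    if (pos == std::streampos(std::streamoff(-1))) return EOF;   // seek failed
    return sb.sgetc();
}
```

The same calls on a different derived class can behave quite differently, particularly the positioning request, so treat the results here as specific to an in-memory buffer.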
Underground streambuf
So much for the public interface. Class streambuf also offers a protected interface which is even more sophisticated. You care about this interface, naturally enough, only if you derive your own class from a streambuf base. (That's the only way you can even access this interface.)
The first part of the protected interface you can, and should, pretty much ignore. It is used by the public members to conduct their business. Here again you will find a number of funny names and irregularities. But that should be of no concern to the typical user of a class derived from streambuf. Look at the public member functions, nearly all of which are inlined, to get a feel for what they do, if you care.
You might note, in passing, that the constructors for class streambuf are protected. That keeps you from constructing an object of this class outside captivity. The idea is to derive from the base class and specialize it to perform useful operations. (I discussed the special magic about constructors with argument type _Uninitialized last month.)
The second part of the protected interface is where the action is. Class streambuf is built around a set of virtual member functions that allow you to tailor behavior over a very wide range. Here is where insertions get turned into actual writes to some physical sink for characters. Here is where extractions get turned into actual reads from some physical source of characters. Here is where positioning operations get real, or those other hooks get something hung upon them. In short, the art of specializing streambuf is the art of writing virtual member functions that honor the basic protocol of the base class.
Until the draft C++ Standard came along, there was only one way to specialize a streambuf. Find some code for another specialization that more or less worked, and that you could more or less understand, and crib like crazy. Having access to the source for streambuf proper also helped.
Now you have a little help. You can read the draft C++ Standard to learn the protocol that must be honored by these virtual member functions. Then you can go look at some specialization that works and that you can more or less understand, and crib like crazy. I can't honestly recommend any other way to proceed.
Why is this so? Because describing the protocol for the streambuf virtual member functions is the hardest piece of standardese I've ever essayed. Jerry Schwarz, the original author of iostreams, is still not happy with what I wrote. I think it's getting reasonably accurate and precise, but I'd never accuse this part of the draft of being a decent tutorial. Learning by imitation and by doing is still the safest route.
Having said that, I'll give you just the briefest of hints about what the four most critical virtual member functions do:
- overflow(ch) either writes the character ch to the physical sink of characters, somehow, or it "makes a write position available" (creates an in-memory buffer with a space available for writing a character) and puts the character in it.
- pbackfail(ch) either writes the character ch back to the physical source of characters, somehow, or it "makes a putback position available" (creates an in-memory buffer with a space available for putting back a character) and puts the character in it.
- underflow() either reads a character from the physical source of characters without consuming it, somehow, or it "makes a read position available" (creates an in-memory buffer with a character available for reading) and reads the character from it without consuming it.
- uflow() either reads a character from the physical source of characters and consumes it, somehow, or it "makes a read position available" (creates an in-memory buffer with a character available for reading) and reads and consumes a character from it.
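Two of these virtual member functions can be seen at work in the small sketch below, written against std::streambuf in today's standard library rather than the draft class in Listing 1. UpperSink takes the first branch of the overflow description (write the character straight to a sink), while ChunkSource takes the second branch of the underflow description (make a read position available). Both class names and the helper shout are hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <streambuf>
#include <string>

// Output side: no buffer at all, so every sputc() ends up in overflow(),
// which writes the character to the "physical sink" (a std::string here).
class UpperSink : public std::streambuf {
public:
    const std::string& text() const { return sink_; }
protected:
    int_type overflow(int_type ch) override {
        if (traits_type::eq_int_type(ch, traits_type::eof())) {
            return traits_type::not_eof(ch);   // nothing to write; report success
        }
        const unsigned char uc =
            static_cast<unsigned char>(traits_type::to_char_type(ch));
        sink_ += static_cast<char>(std::toupper(uc));
        return ch;
    }
private:
    std::string sink_;
};

// Input side: underflow() refills a small buffer from the "physical source"
// and returns the next character without consuming it.
class ChunkSource : public std::streambuf {
public:
    explicit ChunkSource(const std::string& data) : data_(data), next_(0) {}
protected:
    int_type underflow() override {
        if (gptr() < egptr()) return traits_type::to_int_type(*gptr());
        if (next_ >= data_.size()) return traits_type::eof();
        const std::size_t n = data_.copy(buf_, sizeof buf_, next_);
        next_ += n;
        setg(buf_, buf_, buf_ + n);   // a read position is now available
        return traits_type::to_int_type(buf_[0]);
    }
private:
    std::string data_;
    std::size_t next_;
    char buf_[4];
};

// Pump characters from the source to the sink through the public interface.
std::string shout(const std::string& input) {
    ChunkSource in(input);
    UpperSink out;
    for (int c = in.sbumpc(); c != std::char_traits<char>::eof(); c = in.sbumpc()) {
        out.sputc(static_cast<char>(c));
    }
    return out.text();
}
```

ChunkSource does not override uflow(). In today's library the default uflow() calls underflow() and then consumes the character it made available, which is enough for a buffered source like this one.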
Implementing streambuf
In principle, an object of class streambuf stores six pointers. These point at the beginning, current character, and just past the end of the input and output buffers. In practice, I found it desirable to add a level of indirection. You will notice in Listing 1 that the class indeed has six private pointer members. It also has six private pointers to pointers to char. And it is these indirect pointers that the protected member functions use to access characters in in-memory buffers.
I chose this more ornate implementation to permit a very important optimization. Remember, I said earlier that a principal use for streambuf objects is to work in concert with C-style streams controlled by objects of type FILE. The obvious way to do so is to derive a class whose virtual member functions call fgetc, fputc, etc. for the corresponding FILE. But that is terribly inefficient. A better way, when possible, is to have both the streambuf and FILE objects control exactly the same set of pointers into exactly the same buffers.
This I have been able to do with my implementation of the Standard C library. The cost is this extra level of indirection within class streambuf. The payoff is substantially improved performance for C++ iostreams when reading and writing files. I consider it a good tradeoff.
So the six indirect pointers point directly inside a FILE object, whenever possible, as I will show at a later date. In this case, the six character pointers are ignored. Otherwise, the six indirect pointers point at the six character pointers.
Listing 6 shows the file streambu.c. It contains all the functions needed to implement the base class streambuf. Most are dummy functions that either fail or do something innocuous. The two flavors of the secret function _Init give you some hint of the two ways to use the six pointers that I just described.
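The extra level of indirection can be pictured with a much reduced model. This is not the code from Listing 1 or Listing 6. It shows only the three input pointers rather than all six, FakeFile is a hypothetical stand-in for the buffer pointers inside a real FILE, and the two constructors only loosely imitate the two flavors of _Init.

```cpp
#include <cstddef>

// Hypothetical stand-in for the read-buffer pointers kept inside a FILE.
struct FakeFile {
    char* rbeg;
    char* rnext;
    char* rend;
};

class IndirectBuf {
public:
    // Flavor 1: the indirect pointers refer to this object's own members.
    IndirectBuf(char* buf, std::size_t n)
        : beg_(buf), next_(buf), end_(buf + n),
          pbeg_(&beg_), pnext_(&next_), pend_(&end_) {}

    // Flavor 2: the indirect pointers refer to pointers owned by a FakeFile.
    // The object's own character pointers are then ignored.
    explicit IndirectBuf(FakeFile& f)
        : beg_(0), next_(0), end_(0),
          pbeg_(&f.rbeg), pnext_(&f.rnext), pend_(&f.rend) {}

    // Copying would leave the indirect pointers aimed at the wrong object.
    IndirectBuf(const IndirectBuf&) = delete;
    IndirectBuf& operator=(const IndirectBuf&) = delete;

    // Every access goes through the extra level of indirection.
    int get() {
        if (*pnext_ >= *pend_) return -1;
        return static_cast<unsigned char>(*(*pnext_)++);
    }

    std::ptrdiff_t consumed() const { return *pnext_ - *pbeg_; }

private:
    char* beg_;
    char* next_;
    char* end_;
    char** pbeg_;
    char** pnext_;
    char** pend_;
};

// Reading through the IndirectBuf advances the FakeFile's own pointer,
// so both objects agree on how much of the buffer has been consumed.
std::ptrdiff_t shared_progress() {
    char data[] = "abc";
    FakeFile f = { data, data, data + 3 };
    IndirectBuf sb(f);
    sb.get();
    sb.get();
    return f.rnext - f.rbeg;
}
```

The cost is one more pointer dereference on every access. The benefit is that no characters need to be copied or handed off between the two objects.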
If you don't understand streambuf yet, don't worry. Its use will become clearer as you see some of the ways it is specialized later in the Standard C++ library.