Standard C/C++

The Header <istream>

P.J. Plauger


P.J. Plauger is senior editor of C/C++ Users Journal. He is convenor of the ISO C standards committee, WG14, and active on the C++ committee, WG21. His latest books are The Standard C Library, and Programming on Purpose (three volumes), all published by Prentice-Hall. You can reach him at pjp@plauger.com.

I continue my detailed review of iostreams. (See "Standard C: Introduction to Iostreams," CUJ, April 1994, "Standard C: The Header <ios>," CUJ, May 1994, and "Standard C: The Header <streambuf>," CUJ, June 1994.) This month, we finally get to one of the more visible classes, class istream. It is the principal inhabitant of the header <istream>. Listing 1 shows one way to implement this header.

Class istream is derived from the virtual public base class ios to help you read from a stream. Actually, in the jargon of iostreams, you extract characters from a stream, since the stream is not necessarily associated with an external file that you have to read. The best known object of this class is cin, which controls input from the standard input stream, also controlled by stdin. You can also create istream objects that control files you open by name, or strings of text stored in memory, as we shall see in future installments.

Most of the istream member functions fall into one of two broad categories: unformatted input functions and formatted input functions, or extractors.

The former group mostly overloads the names get and read. It is analogous to the Standard C library's fgetc and fread, but a bit easier to use. For example, you can extract a line of text with:

if (cin.getline(buf, sizeof (buf)))
   <input succeeded>
The test is true (nonzero) only if the function extracts a non-empty line that can fit in buf from the stream controlled by the istream object cin. A "line" is delimited (terminated) by an optional third argument of type char, which defaults to '\n'. The function stores the null-terminated line in buf, discarding the delimiter.
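
Here is a minimal sketch of that test in use. It assumes a present-day standard library rather than the implementation in Listing 1, and the helper name first_line and the 16-character buffer are hypothetical:

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: extracts one line into a small fixed buffer and
// reports whether the istream test succeeded.
bool first_line(std::istream& is, std::string& out) {
    char buf[16];
    if (is.getline(buf, sizeof (buf))) {   // delimiter defaults to '\n'
        out = buf;                         // the delimiter is not stored
        return true;
    }
    return false;                          // e.g. the line did not fit in buf
}
```

A line longer than the buffer leaves the stream in a failed state, so the caller has to clear it before trying again.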

The latter group of functions overloads operator>> to make a family of extractors. It is analogous to the Standard C library's fscanf and friends, but with a variety of often touted advantages. For example, you can extract an octal integer with:

int n;
if (cin >> oct >> n)
   <input succeeded>
The test is true only if the extractor extracts a non-empty field that matches the pattern for octal integers, and the converted value can be properly represented in an object of type int.
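
The same test can be wrapped in a function that works on any istream. This is only a sketch, assuming a present-day standard library; the name read_octal is hypothetical:

```cpp
#include <ios>
#include <sstream>

// Hypothetical helper: extracts one octal integer field into n.
// Returns false if no valid field was found or the value does not fit.
bool read_octal(std::istream& is, int& n) {
    if (is >> std::oct >> n)
        return true;
    return false;
}
```

Given the field 17, the function stores decimal 15 in n. Given text that contains no octal digits, it reports failure.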

Extractors are sexier than unformatted input functions. They are second in popularity only to inserters (formatted output functions) as a selling point for iostreams over more conventional Standard C input and output. But they also take a deal of explaining. I therefore confine my attention this month to describing the unformatted input functions in some detail. Next month, I can then cover extractors with a better foundation.

Input Discipline

Remember that all input and output in iostreams is mediated by objects of class streambuf. (See last month's column.) An object of class istream finds its related streambuf object via a pointer in its base class ios. A member function of class istream can extract and consume the next available input character by calling rdbuf()->sbumpc(). It can put back a character by calling rdbuf()->sputbackc(ch). (It can perform several other related functions as well, but these are the most robust, given the underlying I/O machinery in many implementations.)
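
To make the two streambuf calls concrete, here is a small sketch that goes straight to the stream buffer. It assumes a present-day standard library, it does none of the checking discussed below, and the function name bump_and_putback is hypothetical:

```cpp
#include <sstream>
#include <streambuf>
#include <string>

// Hypothetical demonstration: consume one character through the streambuf,
// put it back, and consume it again. Returns the two characters seen.
std::string bump_and_putback(std::istream& is) {
    typedef std::char_traits<char> traits;
    std::string seen;
    std::streambuf* sb = is.rdbuf();
    if (sb == 0)
        return seen;                        // no stream buffer at all
    int ch = sb->sbumpc();                  // extract and consume
    if (traits::eq_int_type(ch, traits::eof()))
        return seen;                        // nothing available
    seen += traits::to_char_type(ch);
    sb->sputbackc(traits::to_char_type(ch));    // back up one position
    int again = sb->sbumpc();               // the same character reappears
    if (!traits::eq_int_type(again, traits::eof()))
        seen += traits::to_char_type(again);
    return seen;
}
```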

There is one small problem, however. It is perfectly permissible for the pointer to streambuf to be null. It is also quite possible that the pointer is non-null, but the stream is in some error state that should discourage extracting. Thus, it behooves any input function to look before it leaps.

The canonical way to play safe is to wrap every input function with two calls to istream member functions:

if (ipfx(noskip))
   <perform any input>
isfx();
The "prefix" function ipfx(int) verifies that the stream is both ready and willing to supply input, or at least to support calls on the streambuf member functions described above. It also performs other initialization operations and, if you so request, will skip any white space in the input stream. As a rule, formatted input functions skip leading white space while unformatted input functions do not.

The "suffix" function isfx() performs any necessary wrapup operations after each input member function does its work. As you can see from Listing 1, this implementation defines isfx() as an inline empty function; it has nothing to do. But that is not always the case for an arbitrary implementation. If you write an input function that uses the istream object is, always call is.ipfx(noskip) and is.isfx() as shown above.
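
The functions ipfx and isfx belong to the draft described here. A present-day standard library offers the nested class istream::sentry in the same role, so a runnable sketch of a user-written input function has to use that instead. The function name read_until is hypothetical, and this is one possible shape, not the code from the listings:

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical user-written unformatted input function: collects characters
// up to, but not including, delim. The delimiter is left unread.
std::string read_until(std::istream& is, char delim) {
    typedef std::char_traits<char> traits;
    std::string out;
    std::istream::sentry ok(is, true);      // true: do not skip white space
    if (ok) {                               // stream is ready and willing
        std::streambuf* sb = is.rdbuf();
        for (;;) {
            int ch = sb->sgetc();           // look without consuming
            if (traits::eq_int_type(ch, traits::eof())) {
                is.setstate(std::ios_base::eofbit);
                break;
            }
            if (traits::to_char_type(ch) == delim)
                break;
            out += traits::to_char_type(ch);
            sb->sbumpc();                   // now consume it
        }
    }
    return out;
}
```

The sentry plays the part of the prefix call when it is constructed and the part of the suffix call when it is destroyed, so the look-before-you-leap test cannot be forgotten.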

Listing 2 shows the file istream.c. It defines the two functions you are likely to need any time you declare an object of class istream: its destructor and ipfx(int). Note the use of the pointer tie to an object of class ostream. You "tie" an output stream to another stream to ensure that it gets flushed at appropriate times. cout, for example, is conventionally tied to cin, and possibly cerr as well. You want to flush out any prompts to the console display before you solicit input from the console keyboard.
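
You can watch a tie at work with a stream buffer that counts how often it is asked to synchronize. The sketch below assumes a present-day standard library. The class CountingBuf and the function syncs_after_tied_input are hypothetical names:

```cpp
#include <istream>
#include <ostream>
#include <sstream>

// Hypothetical stream buffer that counts how often it is asked to sync.
struct CountingBuf : std::stringbuf {
    int syncs;
    CountingBuf() : syncs(0) {}
    int sync() override { ++syncs; return 0; }
};

// Ties an output stream to an input stream, writes a prompt without
// flushing it, extracts one character, and returns the number of sync
// requests the output buffer received.
int syncs_after_tied_input() {
    CountingBuf sink;
    std::ostream out(&sink);
    std::istringstream in("y\n");
    in.tie(&out);               // flush out before extracting from in
    out << "Continue? ";        // no explicit flush here
    in.get();                   // the tie flushes the prompt first
    return sink.syncs;
}
```

An implementation is allowed to skip the flush when it knows there is nothing to write, so treat the count as an illustration rather than a guarantee.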

Exception Handling

The prefix and suffix functions have been part of iostreams for many a year. You will find similar creatures when we discuss class ostream and its inserters, later on. But the draft C++ Standard introduces yet another source of surprises, which calls for even more machinery. All sorts of functions can throw exceptions. The input functions have clearly defined responsibility when exceptions occur during their execution.

If the Standard C++ library had complete control over matters, it would not have to worry so much. Input functions could simply avoid calling functions that might throw exceptions. But that is not the case. Any call to rdbuf()->sbumpc(), for example, can result in a call to a streambuf virtual member function. And any object of class streambuf can actually represent a derived class, with programmer-supplied virtual member functions. The library can thus never know when a call on a streambuf member function might throw an exception.

The draft C++ Standard is clear about what happens when an exception occurs during execution of an input (or output) member function. The function must call setstate(failbit), then rethrow the exception. The structure of an arbitrary input function must now look like:

try {
   if (ipfx(noskip))
      <perform any input>
   isfx();
   }
catch (...) {
   setstate(failbit);
   throw;
   }
Note that the act of setting failbit can also raise an exception, of class ios::failure. Should that happen, the original exception never gets rethrown. (See "Standard C: The Header <ios>," CUJ, May 1994.)
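
A programmer-supplied streambuf that throws makes the problem easy to reproduce. The sketch below runs against a present-day standard library, which follows a later revision of these rules: the input function records badbit rather than failbit, and it rethrows only if the exceptions() mask asks for it. The names ThrowingBuf, Outcome, and get_from_throwing_buf are hypothetical:

```cpp
#include <istream>
#include <stdexcept>
#include <streambuf>

// Hypothetical streambuf whose virtual underflow() always throws.
struct ThrowingBuf : std::streambuf {
    int_type underflow() override {
        throw std::runtime_error("device failure");
    }
};

struct Outcome {
    bool escaped;   // did the exception reach the caller?
    bool failed;    // did the stream record an error?
};

// Calls get() on a stream whose buffer throws, optionally asking the
// stream to rethrow, and reports what the caller observes.
Outcome get_from_throwing_buf(bool rethrow) {
    ThrowingBuf sb;
    std::istream is(&sb);
    if (rethrow)
        is.exceptions(std::ios_base::badbit);
    Outcome r = { false, false };
    try {
        is.get();               // ends up calling ThrowingBuf::underflow()
    } catch (const std::runtime_error&) {
        r.escaped = true;
    }
    r.failed = is.fail();       // true if failbit or badbit is set
    return r;
}
```

Either way, the stream is left in an error state, so a caller who tests the stream after each input operation still learns that something went wrong.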

One problem remains. Not all implementations of C++ support exceptions yet. And not all people who use C++ are comfortable with the use of exceptions when they are provided — for a variety of reasons I won't detail here. So I've chosen to implement exception handling in input and output functions with macros. The above code actually looks like:

_TRY_IO_BEGIN
   if (ipfx(noskip))
      <perform any input>
   isfx();
_CATCH_IO_END
The obvious expansion replaces the exception-handling code. I also have definitions for these macros that invoke the Microsoft exception-handling macros. Still another set effectively turns off exception handling, for implementations that can do nothing better. The choice is determined in a header that gets included indirectly by <istream>.
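
The header that defines these macros is not shown, so what follows is only a guess at the general shape, with hypothetical names throughout (SKETCH_TRY_IO_BEGIN, SKETCH_CATCH_IO_END, SKETCH_NO_EXCEPTIONS, count_chars). It is written as a free function over a present-day istream so that it can be compiled on its own:

```cpp
#include <istream>
#include <sstream>

// Hypothetical macros, not the ones from the column's internal header.
#ifndef SKETCH_NO_EXCEPTIONS
  // "Obvious" expansion: set failbit, then rethrow.
  #define SKETCH_TRY_IO_BEGIN      try {
  #define SKETCH_CATCH_IO_END(is)  } catch (...) { \
                                       (is).setstate(std::ios_base::failbit); \
                                       throw; }
#else
  // Fallback for implementations that cannot handle exceptions.
  #define SKETCH_TRY_IO_BEGIN      {
  #define SKETCH_CATCH_IO_END(is)  }
#endif

// Counts the characters remaining in a stream, guarded by the macros.
int count_chars(std::istream& is) {
    typedef std::char_traits<char> traits;
    int n = 0;
    SKETCH_TRY_IO_BEGIN
        while (!traits::eq_int_type(is.get(), traits::eof()))
            ++n;
    SKETCH_CATCH_IO_END(is)
    return n;
}
```

The body between the two macros reads the same whichever expansion is in force, which is the point of the exercise.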

Unformatted Input

Now you have enough background to understand how the unformatted input functions work. Listing 3 shows the simplest of these. Member function get() extracts a single character and delivers it as the value of the function. A failure to extract the requested character, for any reason, is reflected in a return value of EOF.

The only thing new here is the setting of the private member object _Chcount. Each of the unformatted input functions stores in this object a count of the number of characters it extracts. You can access this stored value by calling the member function gcount(). Obviously, a subsequent call to another unformatted input function overwrites the stored value.
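
A short sketch shows the stored count being set and then overwritten. It assumes a present-day standard library, where gcount() returns a streamsize, and the names Counts and show_gcount are hypothetical:

```cpp
#include <ios>
#include <sstream>

struct Counts {
    std::streamsize after_read;
    std::streamsize after_get;
};

// Extracts up to five characters with read(), then one more with get(),
// recording what gcount() reports after each call.
Counts show_gcount(std::istream& is) {
    char buf[8];
    Counts c;
    is.read(buf, 5);
    c.after_read = is.gcount();     // characters read() actually extracted
    is.get();
    c.after_get = is.gcount();      // overwritten by the later call
    return c;
}
```

With at least six characters available, the two counts are 5 and 1. If the stream runs dry during the read, the second count is zero because get() extracts nothing.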

Equally obviously, the overhead in obtaining a single character via get() is substantially higher than for the Standard C library function getchar(). The latter is almost invariably implemented as a macro that fetches a character straight from an input buffer more often than not. The former is almost impossible to treat similarly, given the prefix/suffix and exception handling required by the draft C++ Standard. Thus, you should favor methods that extract many more characters for each call, whenever possible, if performance is an issue.

Listing 4 shows the function get(char&). You use it instead of get() when you want to chain operations, as in:

is.get(c1).get(c2);
(a practice not universally admired). The variants get(unsigned char&) and get(signed char&) simply call this function. get(char&), in turn, calls get() to do the serious work. You can guess what this implies about the performance considerations I raised above.

Listing 5 shows the function get(char *, int, char). It too has its variants for unsigned char and signed char. You use it to extract a sequence of characters up to, but not including, a delimiter character. If that sounds much like getline(char *, int, char), which I described earlier, you're right. Listing 6 shows the plain char version of this threesome of unformatted input functions. Compare the two functions and you will find only small differences in the handling of the delimiter and the reporting of errors.

In a nutshell, both deliver null-terminated strings guaranteed not to contain an instance of the delimiter. Both complain if end-of-file is encountered while extracting characters, or if the buffer ends up holding an empty string. But get doesn't actually consume the delimiter, nor does it get upset if it fills the buffer before it sees a delimiter. (This behavior of getline is not entirely consistent with current practice, a topic of ongoing discussion within the C++ standards committees.)
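
The difference in delimiter handling is easy to see side by side. This sketch assumes a present-day standard library. The struct Field and the two function names are hypothetical:

```cpp
#include <sstream>
#include <string>

struct Field {
    std::string text;   // what was stored in the buffer
    int next;           // the character left waiting in the stream
};

// Extracts a comma-delimited field with get(); the comma stays in the stream.
Field field_with_get(std::istream& is) {
    char buf[32];
    Field f;
    is.get(buf, sizeof (buf), ',');
    f.text = buf;
    f.next = is.peek();
    return f;
}

// Extracts a comma-delimited field with getline(); the comma is consumed.
Field field_with_getline(std::istream& is) {
    char buf[32];
    Field f;
    is.getline(buf, sizeof (buf), ',');
    f.text = buf;
    f.next = is.peek();
    return f;
}
```

Given the input ab,cd both functions store ab, but only get() leaves the comma as the next character to extract.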

If you want rather less logic, consider function read(char *, int), shown in Listing 11. It simply reads until a count is exhausted, or until no more characters can be extracted.
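
Because read() stores no terminating null and reports no count directly, you pair it with gcount(). Here is a sketch under the assumption of a present-day standard library, with the hypothetical name read_block:

```cpp
#include <cstddef>
#include <ios>
#include <sstream>
#include <string>

// Reads up to want characters, delimiters and all, and returns exactly
// the characters that were extracted.
std::string read_block(std::istream& is, std::size_t want) {
    std::string buf(want, '\0');
    is.read(&buf[0], static_cast<std::streamsize>(want));
    buf.resize(static_cast<std::size_t>(is.gcount()));
    return buf;
}
```

A short read leaves the stream in a failed state, so test the stream as well as the length of the result.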

Lest you think that iostreams are always at a disadvantage compared to Standard C input/output, take a close look at Listing 7. It shows the function get(streambuf&, char), which extracts characters and writes them directly to a streambuf up to, but not including, a delimiter. Thus, you can write:

cin.get(*cout.rdbuf(), '\n');
to copy the next line from cin directly to cout. In this case, copying can be quite fast. (A related extractor is even faster, since it doesn't have to check for delimiters.) Note the use of still more macros for exception handling. I leave it to your imagination to guess what they do, in general. In this context, they deal specifically with exceptions raised while writing (inserting, actually) characters to the stream controlled by sb. Such exceptions are caught, but not rethrown.
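
To try the same call without involving the console, you can copy into the stream buffer of a string-based stream. This is a sketch assuming a present-day standard library; the name copy_line is hypothetical:

```cpp
#include <sstream>
#include <string>

// Copies characters from in directly into another stream buffer, up to
// but not including the next newline, and returns what was copied.
std::string copy_line(std::istream& in) {
    std::ostringstream out;
    in.get(*out.rdbuf(), '\n');
    return out.str();
}
```

The newline stays in the input stream. A second call made right away copies nothing and reports failure, so you have to extract the delimiter yourself between calls.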

Related Functions

A handful of additional functions sometimes proves handy in conjunction with the unformatted input functions. Listing 8 shows the function ignore(int, int). It simply extracts and discards characters up to a specified count, or until a specified delimiter is extracted and discarded.
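
The most common use is to throw away the rest of a line. The sketch below assumes a present-day standard library, where the count has type streamsize rather than int. The name skip_line is hypothetical:

```cpp
#include <ios>
#include <limits>
#include <sstream>

// Discards the rest of the current line, including the newline.
void skip_line(std::istream& is) {
    is.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
}
```

Passing the largest possible count means that only the delimiter, or end-of-file, stops the function.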

Listing 9 shows the function peek(), which lets you see the next character to be extracted without actually consuming it. Listing 10 shows the function putback(char), which lets you "put back" an extracted character. (It doesn't really modify the input stream, but the next extraction will deliver up the character.) And Listing 13 shows the related function unget(). It backs up a character position without altering the apparent contents of the input stream. With either of these two functions, you shouldn't count on backing up more than one character position between extractions. Portable support for such antics is not guaranteed.
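
Here is a sketch of one-character lookahead and one-character backup, using peek(), putback(), and unget() as a present-day standard library names them. The helper names are hypothetical, and neither round trip backs up more than one position:

```cpp
#include <cctype>
#include <sstream>
#include <string>

// Reads a run of decimal digits, leaving the first non-digit unread.
std::string read_digits(std::istream& is) {
    typedef std::char_traits<char> traits;
    std::string digits;
    for (;;) {
        int ch = is.peek();                 // look, but do not consume
        if (traits::eq_int_type(ch, traits::eof()) || !std::isdigit(ch))
            break;
        digits += static_cast<char>(is.get());
    }
    return digits;
}

// Extracts one character, backs up over it with unget(), and checks
// that the next extraction delivers the same character.
bool unget_round_trip(std::istream& is) {
    int first = is.get();
    is.unget();
    return is.good() && is.get() == first;
}

// Same idea with putback(), which names the character to put back.
bool putback_round_trip(std::istream& is) {
    int first = is.get();
    is.putback(static_cast<char>(first));
    return is.good() && is.get() == first;
}
```

Both round-trip functions assume that at least one character is available to extract.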

Finally, Listing 12 shows the function sync(). It simply calls the public "synchronizing" function pubsync() in the associated streambuf object. Typically, such an operation flushes an output stream to the associated external file. Some systems use calls like this to discard pending characters from an interactive input stream, but that is not mandated by the draft C++ Standard. Don't count on it. In fact, I'm not sure what portable use you can make of this particular member function.

In any event, that ends the list of istream member functions not associated with formatted input. We'll visit the remainder in detail next month.