Standard C

Implementing <stdio.h>

P.J. Plauger


P.J. Plauger is senior editor of The C Users Journal. He is secretary of the ANSI C standards committee, X3J11, and convenor of the ISO C standards committee, WG14. His latest book is The Standard C Library, published by Prentice-Hall. You can reach him at PJP@wsa.oz; or uunet!munnari!wsa.oz!pjp.

Introduction

The header <stdio.h> is far and away the largest one in the Standard C library. Only <stdlib.h> comes close in size, and that one is a collection of several unrelated groups of functions. By contrast, the header <stdio.h> focuses exclusively on one topic — performing input and output.

I have discussed many aspects of this header in the past:

If you don't have access to back issues of CUJ, take heart. You will find most of these words recycled in Chapter 12: <stdio.h> of The Standard C Library. I am not about to repeat them yet again here.

I am also not going to follow my usual practice of quoting the relevant portion of the C Standard. That would take a whole column in its own right. I have no qualms about getting paid in part for quoting from the standard — I feel I contributed significantly to developing those words. I just don't believe that such an extensive quote best serves the goal of this column — to broaden your understanding of Standard C.

Instead I will go right to the really new stuff. I describe how I implemented the functions in <stdio.h>. The challenges are:

Remember, what I show here is just one possible implementation. Different approaches can be better, depending on circumstances. My purpose in showing a particular implementation is to illustrate how <stdio.h> can work, not how it must work.

Two design decisions are critical to the implementation of <stdio.h>:

I begin by discussing the first of these two topics in detail. You can then appreciate how the portable low-level I/O functions work. I save the primitives for later.

Data Structures

Listing 1 shows the file stdio.h. By now you should be familiar with my use of the internal header <yvals.h> to supply implementation-dependent parameters. Here are the parameters defined in <yvals.h> that affect <stdio.h>, with some reasonable values for them:

#define _NULL   (void *)0  /* value for NULL */
#define _FNAMAX 64         /* value for FILENAME_MAX */
#define _FOPMAX 32         /* value for FOPEN_MAX */
#define _TNAMAX 16         /* value for L_tmpnam */

The file stdio.h contains a few other mysteries that will become clear in time. For now, I concentrate on the type definition FILE. Its members are:

The design of the FILE data structure is driven by the needs of the macros getc and putc (and their companions getchar and putchar). Each of these expands to a conditional expression that either accesses the stream buffer directly or calls the underlying function. The predicate (test expression) part of the conditional expression must be simple and always safe to execute. Thus, str->_Next < str->_Rend is true whenever characters that can be read are in the buffer for the stream pointed at by str. And str->_Next < str->_Wend is true whenever space is available in the buffer to write characters to the stream. An assignment such as str->_Wend = str->_Buf, for example, disallows writes to the buffer from these macros.
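The buffer tests described above can be sketched in a few lines. This is only an illustration, not the code from Listing 1. The type Stream, the macros SK_GETC and SK_PUTC, and every function name here are hypothetical, and the members buf, next, rend, and wend merely mirror _Buf, _Next, _Rend, and _Wend. The slow-path functions are stand-ins that report failure where a real library would refill or flush the buffer.

```c
#include <stddef.h>

/* Hypothetical, pared-down stream holding only the buffer pointers. */
typedef struct {
    unsigned char *buf;   /* start of the buffer (mirrors _Buf) */
    unsigned char *next;  /* next position to read or write (mirrors _Next) */
    unsigned char *rend;  /* end of readable data (mirrors _Rend) */
    unsigned char *wend;  /* end of writable space (mirrors _Wend) */
    int slow_calls;       /* counts slow-path calls, for illustration only */
} Stream;

/* Stand-ins for the underlying functions: count the call, report failure. */
static int sk_slow_get(Stream *s)
{
    ++s->slow_calls;
    return -1;
}

static int sk_slow_put(int c, Stream *s)
{
    (void)c;
    ++s->slow_calls;
    return -1;
}

/* Fast path when the simple pointer test succeeds, function call otherwise. */
#define SK_GETC(s) \
    ((s)->next < (s)->rend ? (int)*(s)->next++ : sk_slow_get(s))
#define SK_PUTC(c, s) \
    ((s)->next < (s)->wend \
        ? (int)(*(s)->next++ = (unsigned char)(c)) \
        : sk_slow_put((c), (s)))

/* Offer n readable characters; wend == buf disallows macro writes. */
static void sk_set_read(Stream *s, unsigned char *data, size_t n)
{
    s->buf = s->next = data;
    s->rend = data + n;
    s->wend = data;
    s->slow_calls = 0;
}

/* Offer n characters of writable space; rend == buf disallows macro reads. */
static void sk_set_write(Stream *s, unsigned char *space, size_t n)
{
    s->buf = s->next = space;
    s->rend = space;
    s->wend = space + n;
    s->slow_calls = 0;
}
```

With a stream set up by sk_set_read, SK_GETC hands back buffered characters without a function call and drops to sk_slow_get only when the buffer runs dry. SK_PUTC on the same stream always takes the slow path, because next can never be less than wend.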

The functions that you call to read and write streams make more extensive tests. A read function, for example, distinguishes a variety of conditions such as: characters are available, buffer currently exhausted, end-of-file encountered, buffer not yet allocated, reading currently disallowed, and reading never allowed. The functions rely heavily on the various indicators in the member _Mode to make those distinctions.

Only functions within the Standard C library need be privy to the meaning of these indicators. For that reason, and others, I created the internal header "xstdio.h". All the functions described in this column include "xstdio.h". It defines macros for the stream-mode indicators. It includes <stdio.h> and declares all the internal functions used to implement the capabilities of <stdio.h>. It also defines a number of macros and types of interest only to the formatted input and output functions.

Unlike <stdio.h>, the header "xstdio.h" contains too many distractions to present at this point. I show you what goes into it only as the need arises. Here, for example, are the macro names for the various indicators in the member _Mode. Each is defined as a value with a different bit set, as in 0x1, 0x2, 0x4, 0x8, and so on. The actual values are unimportant, so I omit them here:

These macros have private names, beginning with an underscore and an uppercase letter, even though they don't have to. As I developed the library, I found myself moving them in and out of <stdio.h>. Some versions of the macros visible to user programs used these macro names; later versions did not. In the end, I left the names in this form as insurance. You may find occasion to introduce macros that manipulate the indicators in the member _Mode.

The indicators are actually the union of two sets. One is the set of indicators that determines how to open a file. The other is the set of indicators that helps record the state of the stream. Since the two sets partially overlap, I chose to keep them all in one "space" of bit encodings. A tidier implementation might well choose to separate the two uses. You might also want to define two sets of values if you are starved for bits in _Mode. In either case, you must add code to translate between the two representations.
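To make the single "space" of bit encodings concrete, here is a minimal sketch. The indicator names and values are invented for illustration and are not the ones in "xstdio.h". Only the idea of one bit per indicator, with open-time and state indicators sharing a single word, comes from the discussion above.

```c
/* Hypothetical indicators, one bit each, sharing a single word. */
enum {
    SK_OPENR = 0x01,  /* reading allowed: open request and stream state */
    SK_OPENW = 0x02,  /* writing allowed: open request and stream state */
    SK_OPENA = 0x04,  /* append on write: open request only */
    SK_ALLOC = 0x08,  /* stream object was allocated: state only */
    SK_EOF   = 0x10,  /* end-of-file seen: state only */
    SK_ERR   = 0x20   /* error seen: state only */
};

/* The subset of bits that describes how to open a file. */
#define SK_OPEN_MASK (SK_OPENR | SK_OPENW | SK_OPENA)

/* Return mode with the given bits set. */
static unsigned short sk_set(unsigned short mode, unsigned short bits)
{
    return (unsigned short)(mode | bits);
}

/* Return mode with the given bits cleared. */
static unsigned short sk_clear(unsigned short mode, unsigned short bits)
{
    return (unsigned short)(mode & ~bits);
}

/* Nonzero if any of the given bits is set in mode. */
static int sk_test(unsigned short mode, unsigned short bits)
{
    return (mode & bits) != 0;
}
```

Because both sets live in one word, no translation step is needed. Masking with SK_OPEN_MASK recovers the open request from the stream state.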

Opening And Closing Files

The best way to see how the library uses a FILE data object is to track one through its lifetime. Listing 2 shows the file fopen.c. It defines the function fopen that you call to open a file by name. That function first looks for an idle entry in the static array of FILE pointers called _Files. The array contains FOPEN_MAX elements. If all of these point to FILE data objects for open files, all subsequent open requests fail.

Listing 3 shows the file xfiles.c that defines the _Files data object. It defines static instances of FILE data objects for the three standard streams. Each is initialized to be open with appropriate parameters. I have wired in the handles 0 for standard input, 1 for standard output, and 2 for standard error. This is a widely used convention, inherited from UNIX. You may have to alter or map these values.

Elements beyond the first three in _Files are initialized to null pointers. Should fopen discover one of these, the function allocates a FILE data object and marks it to be freed on close. fopen discovers a closed standard stream by observing a non-null element of _Files that points at a FILE data object whose member _Mode is zero.
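The search for an idle entry can be sketched as follows. This is a simplified stand-in, not the code from Listing 2. The names Stream, sk_files, sk_find_slot, and SK_FOPEN_MAX, along with the bit values, are all hypothetical. The sketch assumes only what the text states: three statically initialized standard streams, null pointers beyond them, and a zero mode marking a closed stream.

```c
#include <stdlib.h>

#define SK_FOPEN_MAX 8     /* hypothetical stand-in for FOPEN_MAX */
#define SK_READ      0x01  /* hypothetical indicator: open for reading */
#define SK_WRITE     0x02  /* hypothetical indicator: open for writing */
#define SK_MALFIL    0x40  /* hypothetical value: free object on close */

typedef struct {
    unsigned short mode;   /* zero means the stream is closed */
} Stream;

/* The three standard streams start out open. */
static Stream sk_in  = { SK_READ };
static Stream sk_out = { SK_WRITE };
static Stream sk_err = { SK_WRITE };

/* Remaining elements are implicitly initialized to null pointers. */
static Stream *sk_files[SK_FOPEN_MAX] = { &sk_in, &sk_out, &sk_err };

/* Return an idle stream, or NULL if the table is full or malloc fails. */
static Stream *sk_find_slot(void)
{
    size_t i;

    for (i = 0; i < SK_FOPEN_MAX; ++i) {
        if (sk_files[i] == NULL) {
            /* Unused entry: allocate an object, mark it to be freed. */
            Stream *s = malloc(sizeof *s);

            if (s == NULL)
                return NULL;
            s->mode = SK_MALFIL;
            sk_files[i] = s;
            return s;
        }
        if (sk_files[i]->mode == 0)
            return sk_files[i];  /* closed stream: reuse its object */
    }
    return NULL;
}
```

A caller that receives a reused object with a zero mode must set some indicator before searching again. Otherwise the next search returns the same entry.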

fopen calls on the internal function _Foprep to complete the process of opening a file. Listing 4 shows the file freopen.c. The function freopen also calls this internal function. Note how it sets aside the state of the indicator _MALFIL until after fclose has closed the file currently associated with the stream. The one operation that freopen does not want fclose to perform is to free the FILE data object.
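One way to keep the close operation from freeing the object during a reopen is sketched below. It is an assumption about the general shape of the technique, not a copy of Listing 4. The names Stream, sk_close, sk_reopen, and sk_freed are hypothetical, as is the bit value chosen for SK_MALFIL.

```c
#include <stdlib.h>

#define SK_READ   0x01  /* hypothetical indicator: open for reading */
#define SK_MALFIL 0x40  /* hypothetical value: free object on close */

typedef struct {
    unsigned short mode;  /* zero means closed */
    int handle;           /* stand-in for the underlying file handle */
} Stream;

static int sk_freed;      /* counts frees, for illustration only */

/* Close the stream; free the object only if it is marked SK_MALFIL. */
static void sk_close(Stream *s)
{
    unsigned short mode = s->mode;

    s->mode = 0;
    s->handle = -1;
    if (mode & SK_MALFIL) {
        free(s);
        ++sk_freed;
    }
}

/* Reopen on a new handle, hiding SK_MALFIL from sk_close meanwhile. */
static Stream *sk_reopen(Stream *s, int new_handle, unsigned short new_mode)
{
    unsigned short keep = (unsigned short)(s->mode & SK_MALFIL);

    s->mode = (unsigned short)(s->mode & ~SK_MALFIL);
    sk_close(s);                  /* closes the file, leaves the object */
    s->mode = (unsigned short)(keep | new_mode);
    s->handle = new_handle;
    return s;
}
```

After sk_reopen returns, the indicator is back in place, so a later sk_close still frees an allocated object exactly once.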

You may as well see fclose too, at this point. Listing 5 shows the file fclose.c. It undoes the work of the file-opening functions in a fairly obvious fashion. The one bit of magic is where it calls the function _Fclose to close the file associated with the stream.

Listing 6 shows the file xfoprep.c that defines the function _Foprep. It parses the mods (second) argument to fopen or freopen, at least as much as it can understand, and initializes members of the FILE data object accordingly. In the end, however, it must call on some outside agency to finish the job of opening the file. _Foprep passes on the file name, the encoded indicators, and whatever is left of mods to a function called _Fopen.
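Parsing the mods argument might look something like the sketch below. It is not the code from Listing 6. The function sk_parse_mods and the SK_ indicator names and values are hypothetical. The sketch accepts the Standard C forms (r, w, or a, optionally followed by b and + in either order) and reports where parsing stopped, so that any leftover characters can be passed along to a lower-level open function.

```c
#include <stddef.h>

/* Hypothetical indicator values, one bit each. */
#define SK_OPENR 0x01  /* open for reading */
#define SK_OPENW 0x02  /* open for writing */
#define SK_OPENA 0x04  /* writes append to end of file */
#define SK_TRUNC 0x08  /* truncate an existing file */
#define SK_CREAT 0x10  /* create the file if missing */
#define SK_BIN   0x20  /* binary stream */

/*
 * Encode as much of mods as is understood. Returns the indicators,
 * or 0 if mods does not begin with r, w, or a. On return, *rest
 * points at the first character that was not consumed.
 */
static unsigned sk_parse_mods(const char *mods, const char **rest)
{
    const unsigned rw = SK_OPENR | SK_OPENW;
    unsigned mode;

    switch (*mods) {
    case 'r':
        mode = SK_OPENR;
        break;
    case 'w':
        mode = SK_OPENW | SK_TRUNC | SK_CREAT;
        break;
    case 'a':
        mode = SK_OPENW | SK_OPENA | SK_CREAT;
        break;
    default:
        *rest = mods;
        return 0;
    }
    /* Accept 'b' and '+' in either order, at most once each. */
    for (++mods; ; ++mods) {
        if (*mods == 'b' && !(mode & SK_BIN))
            mode |= SK_BIN;
        else if (*mods == '+' && (mode & rw) != rw)
            mode |= rw;
        else
            break;
    }
    *rest = mods;
    return mode;
}
```

For a string such as "ab,x", the call consumes "ab" and leaves rest pointing at the comma. What follows is system-specific, and the sketch makes no attempt to interpret it.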

Primitives

_Fclose and _Fopen are two of several low-level primitives that stand between <stdio.h> and the outside world. Each must perform a standardized function for the Standard C library. Each must also be reasonably easy to tailor for the divergent needs of different operating systems. This implementation has nine functions in <stdio.h> that must be tailored to each operating system.

By implementing these interface primitives, you can use this library in conjunction with several popular operating systems. I have cobbled up versions that work with:

I say "cobbled" because my versions cut an occasional corner. They may, for example, call functions that violate the name-space caveats of the C Standard. (I may call unlink instead of writing an assembly-language equivalent called _Unlink.) Or they may not deal with all the nonstandard ways that a carriage return can appear within an MS-DOS file.

Nevertheless, I'm comfortable that these primitives are reasonable and workable. Next month, I'll discuss the I/O primitives in detail. I'll also show you an example of one way to float <stdio.h> atop an operating system.

This article is excerpted in part from P.J. Plauger, The Standard C Library, (Englewood Cliffs, N.J.: Prentice-Hall, 1992).