Standard C

Primitives For <stdio.h>

P.J. Plauger


P.J. Plauger is senior editor of The C Users Journal. He is secretary of the ANSI C standards committee, X3J11, and convenor of the ISO C standards committee, WG14. His latest book is The Standard C Library, published by Prentice-Hall. You can reach him at PJP@wsa.oz; or uunet!munnari!wsa.oz!pjp.

Introduction

Last month, I began discussing how to implement the functions declared in the header <stdio.h>. I showed my version of that header file and described the FILE data structure. I also walked through the code needed to open and close files. (See "Implementing <stdio.h>," Standard C, CUJ January, 1992.)

This month, I present the low-level "primitive" functions in <stdio.h> that must be tailored to each operating system. A primitive is a function that performs some essential operation. No other combination of library functions can substitute. You can keep most of the library portable if you isolate system dependencies in a minimum set of primitives.

As I mentioned last month, this implementation has nine such functions. Each primitive function must perform a standardized operation for the Standard C library. Each must also be reasonably easy to tailor for the divergent needs of different operating systems.

The Primitives

Three of the primitives are also standard functions: remove, rename, and tmpnam.

Each of these functions is small and very dependent on the peculiarities of the underlying operating system. It is not worth writing any of them in terms of lower-level primitives. You can often find versions in an existing C library that do the job nicely.

Three of the primitives are macros defined in the internal header "yfuns.h". It defines macros and declares functions needed only within the Standard C library to interface to the outside world. Only certain functions written for this implementation need include "yfuns.h". (The internal header <yvals.h>, by contrast, must be included in several standard headers.)

The three macros look like internal functions with the declarations

int _Fclose(FILE *str);
int _Fread(FILE *str, char *buf, int size);
int _Fwrite(FILE *str, const char *buf, int size);
Their semantics are:

Many operating systems support functions that have declarations very similar to these. You can often find existing functions that the macro expansions can call directly.

The last three primitives are internal functions. One function is declared in "xstdio.h". Two are used in masking macros, and hence are declared in <stdio.h>. (See last month's presentation for a listing of this standard header file.) Their declarations are:

short _Fopen(const char *name, unsigned short mode, const char *mods);
long _Fgpos(FILE *str, fpos_t *fpos);
int _Fspos(FILE *str, const fpos_t *fpos, long offset, int way);
Their semantics are:

You are less likely to find existing functions that you can commandeer to implement part or all of these three functions. Each involves data representations that are probably peculiar to this implementation.

File Positioning Functions

Old C hands are probably comfortable with most of these primitives. They bear a strong resemblance to a fistful of functions not included in the C Standard. These have names such as close, open, read, and write. They are based on system calls in the UNIX operating system. Early ports of C to other operating systems retained this heritage as much as possible. The old primitives were omitted from the C Standard for good reasons. Nevertheless, many of them still provide a good foundation for the rest of the library.

The primitives you are likely to find strangest are _Fgpos and _Fspos. They bear only a loose resemblance to their ancestor, called lseek. Most of the difference is designed to accommodate two new file-positioning functions, fgetpos and fsetpos. These have no exact analog in UNIX.

To show you why these new primitives take the form they do, I simply show how the library uses them. The function fseek, for example, is simply

int fseek(FILE *str, long off, int smode)
   {
   return (_Fspos(str, NULL, off, smode));
   }
Its brother ftell is

long ftell(FILE *str)
   {
   return (_Fgpos(str, NULL));
   }
And rewind is

void rewind(FILE *str)
   {
   _Fspos(str, NULL, 0L, SEEK_SET);
   str->_Mode &= ~_MERR;
   }
Similarly, the new function fgetpos is simply

int fgetpos(FILE *str, fpos_t *p)
   {
   return (_Fgpos(str, p));
   }
And fsetpos is

int fsetpos(FILE *str, const fpos_t *p)
   {
   return (_Fspos(str, p, 0L, SEEK_SET));
   }
The results speak for themselves. These two primitives make the five file-positioning functions trivial.
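What a caller gains from fgetpos and fsetpos can be sketched with standard functions alone. The helper below, whose name peek_line is hypothetical, remembers the file position, reads a line, then puts the stream back where it was. It never looks inside the fpos_t object, which is the point of the newer interface.

```c
#include <stdio.h>

/* Hypothetical helper: copy the next line into buf without consuming it.
   size must be at least 1. Returns 0 on success, or -1 if the file
   position cannot be saved or restored. */
int peek_line(FILE *str, char *buf, int size)
   {
   fpos_t pos;

   if (fgetpos(str, &pos) != 0)
      return (-1);
   if (fgets(buf, size, str) == NULL)
      buf[0] = '\0';      /* nothing left to read */
   return (fsetpos(str, &pos) != 0 ? -1 : 0);
   }
```

A later call to fgets on the same stream delivers the same line again, because a successful fsetpos restores the position and clears any end-of-file indication.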

UNIX Primitives

By implementing these interface primitives, you can use this library in conjunction with any reasonable operating system. As I mentioned last month, I have written sets of primitives for several popular operating systems. For completeness, I show here primitives for just one environment. Please remember, however, that these represent but one of many possibilities.

For simplicity, I sketch here primitives that interface to many versions of the UNIX operating system. That is often the easiest system to use as a host for the Standard C library. The C language has moved to many other environments. Still, as I indicated above, much of the library design was shaped by the needs and capabilities of UNIX. The files I show are only sketches because they often can be augmented to advantage.

In all cases, I assume the existence of C-callable functions that perform UNIX system calls without violating the name-space restrictions of Standard C. I take the conventional UNIX name, make the first letter uppercase and prepend an underscore. Thus, unlink becomes _Unlink. You may have to write these functions in assembly language if your UNIX system supplies no adequate substitutes.

For example, Listing 1 shows the file remove.c that defines the function remove. This version simply invokes the UNIX system call _Unlink. A more careful version would verify that a program with super-user permissions is not doing something rash.
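The shape of such a function can be sketched as follows. This is not Listing 1. It assumes a POSIX host, supplies a stand-in for _Unlink by calling unlink directly (which a real library would avoid for name-space reasons), and uses the hypothetical name my_remove so that it can coexist with the host library's own remove.

```c
#include <unistd.h>     /* POSIX unlink, assumed available */

/* Stand-in for the assumed system-call interface. A real library
   would supply _Unlink without dragging in the name unlink. */
static int _Unlink(const char *fname)
   {
   return (unlink(fname));
   }

/* Hypothetical name: remove the file fname, returning 0 on success
   and -1 on failure. */
int my_remove(const char *fname)
   {
   return (_Unlink(fname) != 0 ? -1 : 0);
   }
```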

Listing 2 shows the file rename.c. It defines a simple version of rename that merely manipulates links to the file. That typically works only if both the new and old file names are within the same filesystem (on the same logical disk partition). A more aggressive version might choose to copy a file when the link system service fails.
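The link-then-unlink approach can be sketched as follows. This is not Listing 2. It assumes a POSIX host, calls link and unlink under their ordinary names rather than the underscore-prefixed forms, and uses the hypothetical name my_rename.

```c
#include <unistd.h>     /* POSIX link and unlink, assumed available */

/* Hypothetical name: give the file a second name, then drop the first.
   Returns 0 on success, -1 on failure. */
int my_rename(const char *oldname, const char *newname)
   {
   if (link(oldname, newname) != 0)
      return (-1);      /* for example, names on different filesystems */
   if (unlink(oldname) != 0)
      {                 /* undo, so the file does not keep two names */
      unlink(newname);
      return (-1);
      }
   return (0);
   }
```

Because link refuses to overwrite an existing name, this sketch fails when newname already exists. Copying the file, as the more aggressive version would, is left out.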

Listing 3 shows the file tmpnam.c. It defines a simple version of tmpnam that concocts a temporary file name in the directory /tmp, the customary place for parking temporary files. It encodes the current process-id to make a family of names that should be unique to each thread of control.
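One way to build such names can be sketched as follows. This is not Listing 3. The name my_tmpnam, the buffer size MY_L_TMPNAM, and the exact name format are all assumptions made for illustration, and getpid is taken from a POSIX host.

```c
#include <stdio.h>
#include <unistd.h>     /* POSIX getpid, assumed available */

#define MY_L_TMPNAM 40  /* hypothetical; large enough for the format below */

/* Hypothetical name: build /tmp/t<pid>.<count> in the caller's buffer,
   or in a static buffer if s is a null pointer. The caller's buffer
   must hold at least MY_L_TMPNAM characters. */
char *my_tmpnam(char *s)
   {
   static char buf[MY_L_TMPNAM];
   static unsigned int count = 0;

   if (s == NULL)
      s = buf;
   snprintf(s, MY_L_TMPNAM, "/tmp/t%ld.%u", (long)getpid(), ++count);
   return (s);
   }
```

The process-id keeps names from colliding between processes, and the counter keeps successive names within one process distinct. The sketch makes no check that a file of that name does not already exist.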

Listing 4 shows the file xfopen.c that defines the function _Fopen. It maps the codes I chose for the mode indicators to the codes used by the UNIX system service that opens a file. A proper version of this program should not include all these magic numbers. Rather, it should include the appropriate header that UNIX provides to define the relevant parameters.
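A version of the mapping that avoids magic numbers can be sketched as follows, assuming a POSIX host that supplies <fcntl.h>. The MODE_ macros are hypothetical stand-ins, since the library's actual mode codes are not reproduced here, and the function name unix_open_flags is likewise invented for illustration.

```c
#include <fcntl.h>      /* POSIX O_ flags, assumed available */

/* Hypothetical mode bits standing in for the library's own codes. */
#define MODE_READ    0x01
#define MODE_WRITE   0x02
#define MODE_APPEND  0x04
#define MODE_TRUNC   0x08
#define MODE_CREATE  0x10

/* Translate library mode bits to flags for the open system call. */
int unix_open_flags(unsigned short mode)
   {
   int flags;

   if ((mode & MODE_READ) && (mode & MODE_WRITE))
      flags = O_RDWR;
   else if (mode & MODE_WRITE)
      flags = O_WRONLY;
   else
      flags = O_RDONLY;
   if (mode & MODE_APPEND)
      flags |= O_APPEND;
   if (mode & MODE_TRUNC)
      flags |= O_TRUNC;
   if (mode & MODE_CREATE)
      flags |= O_CREAT;
   return (flags);
   }
```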

UNIX makes no distinction between binary and text files. Other operating systems may have to worry about such distinctions at the time the program opens a file. Similarly, UNIX has no use for any additional mode information. (_Fopen could insist that the mods argument be an empty string here. This version is not so particular.)

Listing 5 shows the file xfgpos.c that defines the function _Fgpos. It asks the system to deliver the file-position indicator for the file, then corrects for any data buffered on behalf of the stream. A file-position indicator under UNIX can be represented in a long. Hence, type fpos_t, defined in <stdio.h>, is a structure that contains only one long member. (I could have defined fpos_t as type long directly, but I wanted to keep the type as restrictive as possible.) In this case, the functions fgetpos and fsetpos offer no advantage over the older file-positioning functions. The difference can be important for other systems, however.

_Fgpos is simpler under UNIX in another way. No mapping occurs between the internal and external forms of text streams. Hence, the correction for characters in internal buffers is simple. Consider, by comparison, a system that maps text streams. Say it terminates each text line with a carriage return plus line feed instead of just a line feed. That means that _Fread must discard certain carriage returns and _Fwrite must insert them. It also means that _Fgpos must correct for any alterations when it corrects the file-position indicator. The problem is manageable, but it leads to messy logic that I choose not to show at this point.
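A small sketch shows where the mess comes from. The hypothetical helper strip_cr below does the kind of discarding that _Fread would have to do on such a system. It is an illustration only, not code from this library.

```c
#include <stddef.h>

/* Hypothetical helper: copy n raw characters from src to dst, dropping
   each carriage return that is followed by a line feed. Returns the
   number of characters stored in dst. A carriage return that ends the
   block is kept, though a real reader would have to look at the start
   of the next block before deciding. */
size_t strip_cr(char *dst, const char *src, size_t n)
   {
   size_t i, j = 0;

   for (i = 0; i < n; ++i)
      if (src[i] != '\r' || i + 1 == n || src[i + 1] != '\n')
         dst[j++] = src[i];
   return (j);
   }
```

Given the eight raw characters of "ab\r\ncd\r\n", the helper stores six. The count of characters in the buffer no longer equals the count of characters consumed from the file, so the simple subtraction that works under UNIX gives the wrong answer.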

Listing 6 shows the file xfspos.c that defines the function _Fspos. It too benefits from the simple UNIX I/O model in the same ways as _Fgpos. Output causes no problems, since the function flushes any unwritten characters before it alters the file-position indicator.

The remaining three primitives are macros. All expand to calls on functions that perform UNIX system services directly. The UNIX version of "yfuns.h" contains the lines:

#define _Fclose(str) \
   _Close((str)->_Handle)
#define _Fread(str, buf, cnt) \
   _Read((str)->_Handle, buf, cnt)
#define _Fwrite(str, buf, cnt) \
   _Write((str)->_Handle, buf, cnt)
int _Close(int);
int _Read(int, unsigned char *, int);
int _Write(int, const unsigned char *, int);

Summary

Those are the underpinnings of this implementation of the header <stdio.h>. I can now go on to show you how to perform various unformatted reads and writes using this machinery. And as a grand finale, you can see at least some of the code needed to perform formatted I/O. The story continues.

This article is excerpted in part from P.J. Plauger, The Standard C Library, (Englewood Cliffs, N.J.: Prentice-Hall, 1992).