Low-Level I/O for Embedded Systems

By Randy Meyers and Thomas Plum

It is too bad that this isn't the September installment of this column, because it has a lot in common with those "What I Did On My Summer Vacation" essays that vex returning school children.

In theory, if not always in practice, Standards have a lifecycle made up of two five-year periods. The first is five years of development of a new Standard, during which the committee considers all sorts of extensions and actively solicits suggestions for features from the public. At the end of this phase, the Standard is approved and frozen with respect to new features. The second five-year period then begins, and the committee answers questions about the correct interpretation of the Standard and fixes, via technical corrigenda, whatever problems may surface. The process then repeats.

Under this schedule, a new Standard is approved about every 10 years, and the industry gets to enjoy a period of stability in between.

Not surprisingly, most of the work in the maintenance phase occurs in the first year or two after a Standard is approved. So, what has the C committee been doing since 1999?

The C committee has been repeating a successful exercise from the maintenance period of the 1989 Standard — it has been working on Technical Reports (TRs). As development of the C89 Standard was wrapping up, two related groups felt like they had not been well served. The high-performance computing community found the aliasing permitted by C's pointers to block vectorization and parallelization. The numerical programming community wished for better integration with IEEE floating point and better arrays and other Fortran-like features.

During the maintenance phase, the C committee attempted to address these problems by writing a TR entitled "Numerical C Extensions Group (NCEG) Technical Report." By publishing a TR, the committee struck a balance between two conflicting goals. First, the C committee is reluctant to standardize features that have no real-world testing. Since a TR is not a Standard, it permits implementations to experiment with the features described, and to alter (or ignore) them as issues, costs, and market-acceptance dictate. Second, a TR does give some direction to implementations wishing to experiment with novel features.

The NCEG TR was pretty successful. Most of the described features were implemented in at least one compiler, and the committee got the real-world feedback it wanted. Based on the experience gained, many of the features were incorporated in C99. A few were not. For example, syntax to support explicitly parallel programming never caught on.

Embedded Systems TR

During the maintenance period for C99, the C committee has been working on TRs. The first of these, "Extensions for the Programming Language C to Support Embedded Processors" [1], is nearing completion. We call it the "Embedded TR." It addresses the rather specialized needs of the embedded-systems market, and so might be of little or no interest to the general-purpose computing market. If these features catch on and are standardized at a later date, they are likely to appear in a section of the Standard specific to embedded systems. The C Standard has always drawn this distinction: it describes two categories of conformance, hosted implementations and freestanding implementations.

Freestanding implementations currently do not have to provide most of the standard library since, for example, the freestanding implementation running your microwave oven probably cannot open a file and read it.

The Embedded TR contains extensions for fixed-point arithmetic (a common DSP hardware datatype), multiple address spaces, and low-level I/O.

Low-Level I/O

Low-level I/O is also known as "physical I/O" or "hardware I/O." While all computers do low-level I/O, most general-purpose computer programmers never access it directly, while many embedded-systems programmers have no access to anything else.

In a general-purpose computer, device drivers, file systems, and I/O libraries present programmers with convenient and uniform access to I/O devices. If you want to create files on floppies or write to printers, you call fopen() with the name of the file or the printer, call printf() to write the data, and then call fclose() to finish. In contrast, depending on the hardware involved, when writing data to floppies using low-level I/O, you would access a device register to turn on the floppy drive motor. You would access a device register to position the read-write head. You would send a stream of data to the device interface, and then give a command to perform the write. Doing low-level output to printers requires different steps. You write one data byte to the parallel-port device register. You check a different port register to see if the printer is ready to read a byte. If so, you flip a bit in a device register to command the printer to fetch the byte from the port.

At some point, low-level I/O involves the CPU communicating with the device using device registers. The meaning of values read and written to device registers is all device dependent, and may trigger all sorts of complex interactions. For example, hard-disk controllers might access main system memory directly to store blocks of information read from a disk.

One problem embedded-systems programmers face is that the method of accessing device registers can vary from system to system. Some systems treat device registers as if they were normal memory locations in the address space. In such memory-mapped systems, any machine instruction that accesses a memory location can, in principle, access the device register merely by referring to its address. Other systems treat device registers as I/O ports. On such systems, only specialized IN/OUT instructions may access the device register. You might think of IN/OUT instructions as LOAD/STORE instructions that access a different address space than normal. Some systems even take the memory-mapped approach for some devices and the port-based approach for others.
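
The two access styles can be put side by side in a small sketch. Everything here is simulated so that it runs on any hosted compiler: the arrays sim_memory and sim_ports and the four helper functions are hypothetical names, not part of the Embedded TR or of any vendor's library. On real hardware, the port helpers would wrap the IN/OUT instructions.

```c
#include <stdint.h>

/* Simulated hardware so the sketch runs anywhere: two separate "spaces". */
static uint8_t sim_memory[16];  /* stands in for a memory-mapped register window */
static uint8_t sim_ports[16];   /* stands in for a separate I/O port space */

/* Memory-mapped style: an ordinary load or store through a volatile pointer. */
static void mm_write8(volatile uint8_t *reg, uint8_t value) { *reg = value; }
static uint8_t mm_read8(const volatile uint8_t *reg) { return *reg; }

/* Port style: on real hardware these would wrap the IN/OUT instructions.
   The index is masked only to keep the simulation inside its array. */
static void port_out8(unsigned port, uint8_t value) { sim_ports[port & 15u] = value; }
static uint8_t port_in8(unsigned port) { return sim_ports[port & 15u]; }
```

The same number (3, say) names two unrelated registers depending on which pair of helpers is used, which is exactly the difference that device code would prefer not to care about.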

The problem for embedded programmers is that they might have a particular interface chip that has certain device registers and a protocol for using them. Code can be written to manipulate the device registers and accomplish specific tasks using the protocol, but the device registers might be I/O ports in one embedded system or memory-mapped addresses in another. A standardized way to hide this difference would help. A related problem arises when different compiler vendors provide different names for the low-level intrinsic functions that produce hardware IN/OUT instructions, or different ways to designate the device registers. Standardized names would reduce the complexity of the application programs. These two problems are addressed by the low-level I/O section of the Embedded TR.

As you can see, the Embedded TR ignores many of the issues with low-level device access that occur in general-purpose computers where I/O devices might be plugged in and unplugged at will, and where an operating system may prohibit direct access to device registers to achieve both sharing and security. The Uniform Driver Interface Project [2] addresses those more general issues.

Low-Level I/O API

The Embedded TR describes an interface that consists mainly of two things — I/O register designators, which represent hardware device registers; and functions, which take an I/O register designator as an argument and perform some operation on the specified device register (like reading or writing the register).

The functions are provided in a new header, iohw.h. The functions might not be true functions — they might be macros that expand to compiler-intrinsic functions that produce specific machine instructions at the point of call for purposes of efficiency.

The I/O register designators are provided in one or more headers with implementation-defined names. The headers define names for all of the device registers that you can access using the iohw.h interface. For example, LPT1ctrl might be the name (I/O register designator) for the control register of the first parallel port. The I/O register designators contain all of the information needed for the iohw.h functions to access the port. Given the I/O register designator, the functions know the size of the device register, whether the device register is memory-mapped or a port, and even the endianness of the register (whether bits or bytes need to be swapped on the way in or out of the register). All of the system-specific details of accessing the device register can be determined from the I/O register designator.

You might think of I/O register designators as named constants or const objects. However, just as the functions in iohw.h might not be functions, the I/O register designators might not be objects at all. You are never told the type of I/O register designators, and they might not even have a type. They might be macros that expand into new compiler keywords or special syntax. The only thing that you can do with I/O register designators is use their names as arguments to the functions in iohw.h. This restriction exists to aid compilers that wish to generate special machine instructions inline for calls to iohw.h functions.
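
One way to picture a designator that is not an object is a macro that expands to several tokens. The sketch below is only an illustration under that assumption: DEMO_STATUS, demo_iord, demo_iowr, and the simulated port array are hypothetical names, and a real implementation would more likely expand to compiler intrinsics than to ordinary function calls.

```c
#include <stdint.h>

static uint8_t demo_ports[8];   /* simulated port space (hypothetical) */

static unsigned demo_port_read8(unsigned port) { return demo_ports[port & 7u]; }
static void demo_port_write8(unsigned port, unsigned v) { demo_ports[port & 7u] = (uint8_t)v; }

/* A designator that is neither an object nor a typed value: it expands to
   the access routines to use plus the port number. */
#define DEMO_STATUS  demo_port_read8, demo_port_write8, 2

/* Two-level macros so the designator is expanded before it is split up. */
#define demo_iord_(rd, wr, port)        rd(port)
#define demo_iowr_(rd, wr, port, value) wr((port), (value))
#define demo_iord(designator)           demo_iord_(designator)
#define demo_iowr(designator, value)    demo_iowr_(designator, value)
```

With these definitions, demo_iowr(DEMO_STATUS, 0x7F) turns into a direct call to demo_port_write8 at the point of use. Nothing can be done with DEMO_STATUS except pass it to the access macros, which mirrors the restriction described above.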

We wanted to present a working example that does low-level I/O, but such programs are usually implemented using specialized compilers for embedded systems. There does remain one arena for examples — accessing a PC's parallel port. Please understand, there are dozens of complicating details underneath the simple examples we present here. Our purpose is only to illustrate the motivation and use of the low-level I/O API in the Embedded TR.

Windows, like most other multitasking operating systems, uses hardware features to prevent (or at least control) direct access to device registers by user programs. To avoid this problem, we wrote our program to run under MS-DOS. We were able to use two compilers (both free downloads) to do this: the old Borland Turbo compiler from http://community.borland.com/museum/ and the modern Digital Mars C compiler from http://www.digitalmars.com/.

Our example program prints "Hello, World," carriage-return, line-feed, and form-feed onto the first parallel port. Full program sources and an executable are available on the CUJ web site (http://www.cuj.com/code/). See the readme.txt file included with the sources for information on running the program under Windows and some restrictions with inexpensive printers.

From the programmer's point of view, the parallel port is a set of at least three 8-bit registers. Newer parallel ports, such as the extended parallel port (EPP) or the extended capabilities port (ECP), have additional registers associated with them. Here, the file iolpt.h defines four I/O register designators for the four device registers we need:

extern _IOreg lpt1_data;
extern _IOreg lpt1_status;
extern _IOreg lpt1_control;
extern _IOreg lpt1_ecp_control;

In our example, I/O register designators are going to be represented by objects of type _IOreg, and the functions in iohw.h will be true functions. All identifiers beginning with an underscore are private implementation details only to be used in iolpt.h or iohw.h, or their corresponding .c files. The clients of iohw.h and iolpt.h should never use such identifiers.

Listing 1 shows the declarations from iohw.h of the _IOreg type. We support both 8-bit and 16-bit I/O registers, which can be either ports or memory-mapped locations. The _IOreg type consists of an enum that tells us the type of register, and a union for the register address or port number.
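
Listing 1 is not reproduced in this text, so the following is only a guess at the general shape such a type could take, based on the description above. The names sk_regkind, sk_ioreg, and the two helper functions are hypothetical and are not the identifiers used in the actual iohw.h.

```c
/* Hypothetical names; the column's real type is _IOreg in Listing 1. */
typedef enum { SK_PORT8, SK_PORT16, SK_MEM8, SK_MEM16 } sk_regkind;

typedef struct {
    sk_regkind kind;             /* how the register is reached, and its width */
    union {
        unsigned port;           /* port number, for SK_PORT8 and SK_PORT16 */
        volatile void *addr;     /* address, for SK_MEM8 and SK_MEM16 */
    } where;
} sk_ioreg;

/* Nonzero if the register is reached with IN/OUT rather than load/store. */
static int sk_is_port(const sk_ioreg *r)
{
    return r->kind == SK_PORT8 || r->kind == SK_PORT16;
}

/* Width of the register in bits. */
static unsigned sk_width_bits(const sk_ioreg *r)
{
    return (r->kind == SK_PORT8 || r->kind == SK_MEM8) ? 8u : 16u;
}
```

Keeping the access method and width in one tag means a single switch statement can later select the right machine operation.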

In the terminology of the Embedded TR, the parallel port is an "I/O group" (whose "I/O group designator" is LPT1_GROUP) consisting of the aforementioned four I/O register designators. The program announces its intention to use these registers by invoking iogroup_acquire(LPT1_GROUP). I/O group designators, like I/O register designators, may or may not be objects with types. In this example, we just make I/O group designators integer codes that indicate the registers being referenced.

In addition to granting ownership of the registers in the I/O group, the Embedded TR states that iogroup_acquire might also initialize or finish initializing the I/O register designators. Listing 2 presents our iogroup_acquire(), which uses the PC BIOS table to determine the port number of the first parallel port. The type member was initialized at compile time in iolpt.c.
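
Listing 2 is likewise not reproduced here. As a rough sketch of the idea, the code below looks up the LPT1 base port in a simulated copy of the BIOS data area, where the word at offset 8 of segment 0x40 conventionally holds that port number. The array sim_bda, the value 0x378, the group code, and every sk_ name are assumptions made for illustration. Unlike the TR's iogroup_acquire, this sketch returns a status code so that failure can be observed.

```c
#include <stdint.h>

/* Simulated copy of the first bytes of the BIOS data area (segment 0x40).
   The word at offset 8 conventionally holds the LPT1 base port; the value
   0x378 used here is only a typical one. */
static const uint8_t sim_bda[16] = { [8] = 0x78, [9] = 0x03 };

enum { SK_LPT1_GROUP = 1 };   /* hypothetical group code */

static unsigned sk_lpt1_data, sk_lpt1_status, sk_lpt1_control, sk_lpt1_ecp_control;

/* Fill in the port numbers for the group; returns 0 on success, -1 if the
   group is unknown or the table reports no port. */
static int sk_iogroup_acquire(int group)
{
    unsigned base;

    if (group != SK_LPT1_GROUP)
        return -1;
    base = sim_bda[8] | ((unsigned)sim_bda[9] << 8);   /* little-endian word */
    if (base == 0)
        return -1;
    sk_lpt1_data        = base;           /* data register */
    sk_lpt1_status      = base + 1;       /* status register */
    sk_lpt1_control     = base + 2;       /* control register */
    sk_lpt1_ecp_control = base + 0x402;   /* ECP extended control register */
    return 0;
}
```

On a real MS-DOS system the table would be read through a far pointer rather than from an array, which is the kind of detail a compiler-specific iohw.c would hide.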

We send a byte of data (named c) to lpt1_data by invoking iowr(lpt1_data, c). We read a byte of data from lpt1_status by invoking iord(lpt1_status). We can turn ON the 0x01 bit in the lpt1_control output register by invoking ioor(lpt1_control, 0x01). Similarly, we can turn OFF the 0x01 bit by invoking ioand(lpt1_control, ~0x01). There are also iogroup_release to relinquish an I/O group, and ioxor for EXCLUSIVE-OR. Here are the prototypes for the functions mentioned so far:

#include <iohw.h>
void iogroup_acquire( iogroup_designator );       // acquire
void iogroup_release( iogroup_designator );       // release
unsigned int iord( ioreg_designator );            // read
void iowr( ioreg_designator, unsigned int a );    // write
void ioand( ioreg_designator, unsigned int a );   // and
void ioor( ioreg_designator, unsigned int a );    // or
void ioxor( ioreg_designator, unsigned int a );   // exclusive-or
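
The bitwise functions behave like read-modify-write sequences on the register. The sketch below shows that relationship against a simulated 8-bit register. The sim_ names are hypothetical stand-ins, and a real implementation is free to use a single machine instruction instead of a separate read and write.

```c
#include <stdint.h>

typedef struct { uint8_t value; } sim_reg8;   /* simulated 8-bit device register */

static unsigned sim_iord(const sim_reg8 *r)   { return r->value; }
static void sim_iowr(sim_reg8 *r, unsigned a) { r->value = (uint8_t)a; }

/* The bitwise forms are read-modify-write sequences on the register. */
static void sim_ioand(sim_reg8 *r, unsigned a) { sim_iowr(r, sim_iord(r) & a); }
static void sim_ioor(sim_reg8 *r, unsigned a)  { sim_iowr(r, sim_iord(r) | a); }
static void sim_ioxor(sim_reg8 *r, unsigned a) { sim_iowr(r, sim_iord(r) ^ a); }
```

Starting from a register holding 0x10, sim_ioor with 0x01 leaves 0x11, and a following sim_ioand with ~0x01u restores 0x10.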

The same functions are available in unsigned long versions, with "l" appended to the names (iordl, iowrl, ioandl, ioorl, and ioxorl). The same functionality is also available for an array of I/O registers, in which each invocation reads or writes the ith register in the series. The names are formed by appending buf to the names, and the index has a type named ioindex_t:

unsigned int iordbuf( ioreg_designator, ioindex_t ix );            // read
void iowrbuf( ioreg_designator, ioindex_t ix, unsigned int a );    // write
void ioandbuf( ioreg_designator, ioindex_t ix, unsigned int a );   // and
void ioorbuf( ioreg_designator, ioindex_t ix, unsigned int a );    // or
void ioxorbuf( ioreg_designator, ioindex_t ix, unsigned int a );   // exclusive-or
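
As a small illustration of the indexed forms, the sketch below treats four consecutive simulated 8-bit registers as one register array. The type sim_regbuf, the index type sim_ioindex_t, and the functions are hypothetical stand-ins for a real designator and ioindex_t.

```c
#include <stdint.h>

typedef unsigned sim_ioindex_t;                   /* stand-in for ioindex_t */
typedef struct { uint8_t cell[4]; } sim_regbuf;   /* four consecutive 8-bit registers */

/* The index is masked only to keep the simulation inside its array. */
static unsigned sim_iordbuf(const sim_regbuf *b, sim_ioindex_t ix)
{
    return b->cell[ix & 3u];
}

static void sim_iowrbuf(sim_regbuf *b, sim_ioindex_t ix, unsigned a)
{
    b->cell[ix & 3u] = (uint8_t)a;
}

/* OR a mask into the register selected by ix, leaving the others alone. */
static void sim_ioorbuf(sim_regbuf *b, sim_ioindex_t ix, unsigned a)
{
    sim_iowrbuf(b, ix, sim_iordbuf(b, ix) | a);
}
```

Each call touches only the register selected by the index, so neighboring registers in the array keep their values.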

The "buf" versions are also available in unsigned long versions, again with "l" appended to the names (iordbufl, iowrbufl, ioandbufl, ioorbufl, and ioxorbufl).

Listing 3, which is our implementation of the iord() function, switches on the I/O register's type to do a byte IN, a word IN, a byte fetch, or a word fetch. The byte and word IN instructions are accessed using the Turbo C inportb() and inport() extensions.
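
The dispatch just described might look roughly like the sketch below. It is not the column's Listing 3: the register type is a hypothetical one, and sim_inportb() and sim_inport() are simulated stand-ins for the compiler extensions so that the code runs on a hosted system.

```c
#include <stdint.h>

static uint8_t sim_port_space[8];   /* simulated I/O ports */

/* Stand-ins for compiler extensions such as Turbo C's inportb() and inport(). */
static unsigned sim_inportb(unsigned port) { return sim_port_space[port & 7u]; }
static unsigned sim_inport(unsigned port)
{
    /* 16-bit read: low byte at port, high byte at port + 1 */
    return sim_inportb(port) | (sim_inportb(port + 1) << 8);
}

/* Hypothetical register description: access method, width, and location. */
typedef enum { SK_PORT8, SK_PORT16, SK_MEM8, SK_MEM16 } sk_regkind;
typedef struct {
    sk_regkind kind;
    union { unsigned port; volatile void *addr; } where;
} sk_ioreg;

/* Read a register by whichever method and width its description calls for. */
static unsigned sk_iord(const sk_ioreg *r)
{
    switch (r->kind) {
    case SK_PORT8:  return sim_inportb(r->where.port);
    case SK_PORT16: return sim_inport(r->where.port);
    case SK_MEM8:   return *(volatile uint8_t *)r->where.addr;
    case SK_MEM16:  return *(volatile uint16_t *)r->where.addr;
    }
    return 0;   /* not reached for valid kinds */
}
```

The caller never sees which branch runs, so the same client code works whether the register turns out to be a port or a memory location.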

Listing 4 shows the function in example.c that uses the iohw.h interface to send a character to the printer. Various bits in the parallel-port device registers are directly wired to pins on the parallel-port connector. Sending a value to the data register turns on/off the signals in the pins on the connector. After the data pins have the correct configuration, turning on the strobe pin in the connector causes the printer to read in the data pins.

When doing low-level I/O, all sorts of practical considerations have to be honored. One example concerns the strobe pin: The strobe pin has to be turned on for long enough to allow the printer to copy the data bits. If the pin is turned on and off too fast, the printer might not see it.
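
The handshake can be sketched against a simulated printer. This is not the column's Listing 4. The bit assignments (0x80 in the status register meaning "ready," 0x01 in the control register as the strobe) are assumptions chosen to match the description above, the simulated printer latches data when the strobe bit rises, and the delay loop is only a placeholder for a properly timed wait.

```c
#include <stddef.h>
#include <stdint.h>

enum { STATUS_READY = 0x80, CONTROL_STROBE = 0x01 };   /* assumed bit layout */

static uint8_t sim_data, sim_control;
static uint8_t sim_status = STATUS_READY;   /* simulated printer starts out ready */
static char sim_paper[32];                  /* what the simulated printer received */
static size_t sim_count;

/* Simulated control-register write: the printer latches the data pins
   when the strobe bit goes from 0 to 1. */
static void sim_write_control(uint8_t v)
{
    int rising = !(sim_control & CONTROL_STROBE) && (v & CONTROL_STROBE);
    if (rising && sim_count < sizeof sim_paper - 1)
        sim_paper[sim_count++] = (char)sim_data;
    sim_control = v;
}

static void strobe_delay(void)
{
    volatile int i;
    for (i = 0; i < 100; i++) { }   /* placeholder; real code needs a timed delay */
}

/* Send one character; returns 0 on success, -1 if the printer never became ready. */
static int sk_send_char(char c)
{
    long tries = 100000;

    while (!(sim_status & STATUS_READY))    /* like iord(lpt1_status) */
        if (--tries == 0)
            return -1;
    sim_data = (uint8_t)c;                                         /* like iowr(lpt1_data, c) */
    sim_write_control((uint8_t)(sim_control | CONTROL_STROBE));    /* like ioor(lpt1_control, 0x01) */
    strobe_delay();                                                /* hold the strobe long enough */
    sim_write_control((uint8_t)(sim_control & ~CONTROL_STROBE));   /* like ioand(lpt1_control, ~0x01) */
    return 0;
}
```

The bounded retry count is a design choice of this sketch: an unbounded wait would hang the program if the printer were off line or out of paper.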

Conclusion

Low-level I/O is far more complicated than regular I/O. Every device has different device registers, with different meanings and different protocols for programming. Even the simplest high-level I/O operation might take several steps. With so much complexity, it is no wonder that embedded programmers and device-driver programmers desire at least some standardized methods of access for device registers. The Embedded TR (or, alternatively, the UDI Specification) provides that.

References

[1] ISO/IEC DTR 18037, C Extensions to Support Embedded Processors. http://std.dkuug.dk/jtc1/sc22/wg14/www/docs/n1005.pdf.

[2] The UDI specifications are available at http://www.projectudi.org/.

[3] Kyle A. York. "High-Speed Transfers on a PC Parallel Port," C/C++ Users Journal, November 1996.

[4] http://www.fapo.com/ieee1284.htm.

[5] http://www.repairfaq.org/filipg/LINK/PORTS/F_Parallel.html.

[6] http://www.beyondlogic.org/spp/parallel.htm.


Randy Meyers is a consultant providing training and mentoring in C, C++, and Java. He is the current chair of J11, the ANSI C committee, and previously was a member of J16 (ANSI C++) and the ISO Java Study Group. He worked on compilers for Digital Equipment Corporation for 16 years and was Project Architect for DEC C and C++. He can be reached at rmeyers@ix.netcom.com.

Dr. Thomas Plum has authored four books on C, and coauthored Efficient C (with Jim Brodie) and C++ Programming Guidelines (with Daniel Saks). He has been an officer of the United States and International C and C++ Standards committees. His company Plum Hall Inc. provides test suites for C, C++, Java, and C#. He can be reached at tplum@plumhall.com.