Register Access in C++

C/C++ Users Journal, May 2005

Exploiting C++'s features for efficient and safe hardware register access

By Pete Goodliffe

Pete Goodliffe is a senior C++ programmer and columnist for the ACCU. He can be contacted at pete@cthree.org.

Embedded programmers traditionally use C as their language of choice. And why not? It's lean and efficient, and lets you get as close to the metal as you want. Of course C++, used properly, provides the same level of efficiency as the best C code. Moreover, you can also leverage powerful C++ features to write cleaner, safer, more elegant low-level code. In this article, I present a C++ scheme for accessing hardware registers in an optimal way.

Most embedded code needs to service hardware directly. This seemingly magical act is not that hard. Some kinds of registers take a little more fiddling to reach than others, but you certainly don't need eye of newt or any voodoo dances. The exact mechanism depends on how your circuit board is wired up. The common types of register access are:

Memory-mapped I/O. The hardware lets you communicate with a device using the same instructions as memory access. The device is wired up to live at memory address n; register 1 is mapped at address n, register 2 is at n+1, register 3 at n+2, and so on.
Port-mapped I/O. Certain devices present pages of registers that you have to map into memory by selecting the correct device "port." You might use specific input/output CPU instructions to talk to these devices, although more often the port and its selector are mapped directly into the memory address space.
Bus separated. Devices connected over a bus that is not memory mapped are harder to control; I²C and I²S are common examples of such peripheral buses. In this scenario, you must either talk to a dedicated I²C controller chip (whose registers are memory mapped), telling it what to send to the device, or drive the I²C control lines yourself using GPIO ports (General Purpose Input/Output: assignable control lines not dedicated to any particular data bus) on some other memory-mapped device.

Each device has a data sheet that describes (among other things) the registers it contains, what they do, and how to use them. Registers are a fixed number of bits wide, usually determined by the type of device you are using. This is an important fact to know: some devices will lock up if you write data of the wrong width to them. With fixed-width registers, many devices cram several bits of functionality into one register as a "bitset." The data sheet typically describes this diagrammatically, similar to Figure 1.

So what does hardware access code look like? Using the example of a hypothetical UART line driver device presented in Figure 1, the traditional C-style schemes are:

Direct memory pointer access. It's not unheard of to see register access code similar to Listing 1, but we all know that the perpetrators of this kind of monstrosity should be punished. It's neither readable nor maintainable.
Pointer usage is usually made bearable by defining a macro name for each register location. There are two distinct macro flavors. The first macro style defines bare memory addresses (as in Listing 2). The only real advantage of this is that you can share the definition with assembly code parsed using the C preprocessor. As you can see, its use is long-winded in normal C code and prone to error; you have to get the cast right each time. The alternative (see Listing 3) is to include the cast in the macro itself, which is far nicer in C. Unless there's a lot of assembly code, this latter approach is preferable.
Macros have no overhead in terms of code speed or size. The alternative, creating a physical pointer variable to describe each register location, would have a negative impact on both code performance and executable size. However, macros are gross and C++ programmers already smell a rat here. There are plenty of problems with this fragile scheme. It's programming at a very low level, and the code's real intent is not clear—it's hard to spot all register accesses as you browse a function.
Deferred assignment is a technique that lets you write code like Listing 4, defining the register location values at link time. This is not commonly used; it's cumbersome when you have a number of large devices, and not all compilers provide this functionality. It requires you to run a flat (nonvirtual) memory model.
Use a struct to describe the register layout in memory, as in Listing 5. There's a lot to be said for this approach: it's logical and reasonably readable. However, it has one big drawback: it is not portable. Neither the C nor the C++ Standard specifies exactly how the contents of a struct are laid out in memory. The members are guaranteed to appear in declaration order, but the compiler is free to pad out nonaligned items as it sees fit. Indeed, some compilers have proprietary extensions or switches to control this behavior. Your code might work fine with one compiler and produce startling results on another.
Create a function to access the registers and hide all the gross stuff in there. On slower processors the function-call overhead might be prohibitive, but for most applications it is perfectly adequate, especially for registers that are accessed infrequently. For port-mapped registers this makes a lot of sense: their access requires complex logic, and writing all this out longhand is tortuous and easy to get wrong.
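Listings 2 and 3 are not reproduced here, so the following is only a rough sketch of the two macro flavors described above. The fakeUart array, the UART_* names, and the 0x04 transmit-buffer offset are all hypothetical; fakeUart stands in for the memory-mapped device so the code can run on a host machine, whereas on a real board the base would be a fixed physical address taken from the memory map.

```cpp
#include <cstdint>

// Host-side stand-in for the UART's register window (hypothetical).
// On a real board the macros would use the fixed physical address instead.
static volatile std::uint32_t fakeUart[4];

#define UART_BASE ((std::uintptr_t)fakeUart)

// Flavor 1: a bare address. Every use site must repeat the cast.
#define UART_TXBUF_ADDR (UART_BASE + 0x04)

// Flavor 2: the cast lives inside the macro, so uses read like a variable.
#define UART_TXBUF (*(volatile std::uint32_t *)(UART_BASE + 0x04))

void sendFlavor1(std::uint8_t c) {
    *(volatile std::uint32_t *)UART_TXBUF_ADDR = c;  // cast written by hand
}

void sendFlavor2(std::uint8_t c) {
    UART_TXBUF = c;                                  // no cast at the call site
}
```

Both functions write the same register; the difference is only where the cast (and therefore the opportunity to get it wrong) lives.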

It remains to be seen how to manipulate registers containing a bitset. Conventionally, you write such code by hand, something like Listing 6. This is a sure-fire way to cause yourself untold grief, tracking down odd device behavior. It's easy to manipulate the wrong bit and get very confusing results.
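Listing 6 is not shown here; as an illustration of the hand-written style, here is a minimal sketch assuming a hypothetical UART control register with an enable bit at bit 0 and a 3-bit baud divisor in bits 4 to 6. The fakeUartCtrl variable stands in for the hardware register.

```cpp
#include <cstdint>

// Host stand-in for a hypothetical UART control register.
static volatile std::uint32_t fakeUartCtrl;

// Hand-rolled field update. Every magic number here must be kept in step
// with the data sheet by hand, which is exactly where the bugs creep in.
void setBaudDivisor(std::uint32_t divisor) {
    std::uint32_t v = fakeUartCtrl;   // read
    v &= ~(0x7u << 4);                // clear bits 4..6
    v |= (divisor & 0x7u) << 4;       // insert the new field value
    fakeUartCtrl = v;                 // write back
}

// Set bit 0 (hypothetically "UART enable") without touching the other bits.
void enableUart() {
    fakeUartCtrl = fakeUartCtrl | 0x1u;
}
```

Shift by 5 instead of 4, or mask with 0x3 instead of 0x7, and the code still compiles and silently corrupts a neighboring field.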

Does all this sound messy and error prone? Welcome to the world of hardware devices. And this is just addressing the device: what you write into the registers is your own business, and part of what makes device control so painful. Data sheets are often ambiguous or miss essential information, and devices magically require registers to be accessed in a certain order. There will never be a silver bullet, and you'll always have to wrestle these demons. All I can promise is to make the fight less one-sided in the hardware's favor.

A More Modern Solution

So, having seen the state of the art, at least in the C world, how can you move into the 21st century? Being a good C++ citizen, you'd ideally avoid all that nasty preprocessor use and find a way to insulate yourself from your own stupidity. By the end of the article, you'll have seen how to do all this and more. The real beauty of the following scheme is its simplicity. It's a solid, proven approach that has been used for the last five years in production code deployed in tens of thousands of units across three continents. Here's the recipe:

The first step is to junk the whole preprocessor macro scheme and define the device's registers in a good old-fashioned enumeration. For the moment, I'll call this enumeration Register. Although you immediately lose the ability to share definitions with assembly code, this was never a compelling benefit anyway. The enumeration values are specified as offsets from the device's base memory address. This is how the data sheet presents them, which makes the values easy to check against it. Some data sheets show byte offsets from the base address (so 32-bit register offsets increment by 4 each time), while others show "word" offsets (so 32-bit register offsets increment by 1 each time). For simplicity, I write the enumeration values in whatever form the data sheet uses.

The next step is to write an inline regAddress function that converts an enumeration value to a physical address. It is a simple calculation whose exact form depends on whether the enumeration holds byte or word offsets. For the moment, presume that the device is memory mapped at a known fixed address. This implies the simplest MMU configuration, with no virtual memory address space in operation, a mode that is not at all uncommon in embedded devices. Putting all this together results in Listing 7.
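Listing 7 is not reproduced here; what follows is a minimal sketch of the idea, assuming a hypothetical UART whose data sheet gives byte offsets. Because a host program cannot touch a fixed hardware address, the hypothetical fakeDevice array plays the part of the device, and baseAddress points at it.

```cpp
#include <cstdint>

// Hypothetical UART register map: byte offsets from the device base,
// written exactly as the (imaginary) data sheet lists them.
enum Register {
    RXBUF  = 0x00,
    TXBUF  = 0x04,
    STATUS = 0x08,
    CTRL   = 0x0c
};

// Host stand-in for the device. On hardware, baseAddress would be a
// static const pointer to the fixed physical address from the memory map.
static volatile std::uint32_t fakeDevice[4];
static volatile std::uint8_t *const baseAddress =
    reinterpret_cast<volatile std::uint8_t *>(fakeDevice);

// Convert a register enumeration value into that register's address.
// Byte offsets are added to a byte pointer; with word offsets you would
// add them to a uint32_t pointer instead.
inline volatile std::uint32_t *regAddress(Register reg) {
    return reinterpret_cast<volatile std::uint32_t *>(baseAddress + reg);
}
```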

The missing piece of this puzzle is the means of reading and writing registers. I do this with two simple inline functions, regRead and regWrite (Listing 8). Because they are inline, these functions combine to give neat, readable register access code that an optimizing compiler reduces to a direct memory access, with no runtime overhead whatsoever. That's mildly impressive, but you can do so much more.
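As a sketch of what regRead and regWrite might look like (not the article's Listing 8), here they are layered on the same hypothetical register map and host-side fakeDevice stand-in. The putChar function and the "transmitter empty" meaning of STATUS bit 0 are invented purely for illustration.

```cpp
#include <cstdint>

enum Register { RXBUF = 0x00, TXBUF = 0x04, STATUS = 0x08, CTRL = 0x0c };

static volatile std::uint32_t fakeDevice[4];   // host stand-in (hypothetical)

inline volatile std::uint32_t *regAddress(Register reg) {
    return reinterpret_cast<volatile std::uint32_t *>(
        reinterpret_cast<volatile std::uint8_t *>(fakeDevice) + reg);
}

// Read a register. The volatile access stops the compiler from caching
// the value or eliding the read.
inline std::uint32_t regRead(Register reg) {
    return *regAddress(reg);
}

// Write a register.
inline void regWrite(Register reg, std::uint32_t value) {
    *regAddress(reg) = value;
}

// Usage: wait for the (hypothetical) TX-empty bit, then send a character.
inline void putChar(char c) {
    while ((regRead(STATUS) & 0x1u) == 0) { /* spin */ }
    regWrite(TXBUF, static_cast<std::uint8_t>(c));
}
```

The call sites name registers, not addresses, and every access goes through one place.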

Different Width Registers

Up to now, you could achieve the same effect in C with judicious use of macros. I've not yet presented anything groundbreaking. But if your device has some 8-bit registers and some 32-bit registers, you can describe each set in a different enumeration. Let's imaginatively call these Register8 and Register32. Thanks to C++'s strong typing of enums, you can now overload the register access functions, as in Listing 9.

Now things are getting interesting: you still only need to type regRead to access a register, but the compiler automatically ensures that you get the correct width of register access. The only way to do this in C is manually, by defining multiple read/write macros and selecting the correct one by hand each time. This overloading shifts the onus of knowing which registers require 8- or 32-bit accesses from the programmers using the device to the compiler. A whole class of errors silently disappears.
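Listing 9 is not reproduced here, but the overloading idea can be sketched as follows, assuming a hypothetical device with two 8-bit configuration registers and two 32-bit registers, all given as byte offsets. The fakeDevice array again stands in for the real hardware.

```cpp
#include <cstdint>

// Hypothetical device with a mix of register widths (byte offsets).
enum Register8  { CFG0 = 0x00, CFG1 = 0x01 };
enum Register32 { DATA = 0x04, COUNT = 0x08 };

static volatile std::uint32_t fakeDevice[4];   // host stand-in
static volatile std::uint8_t *const baseAddress =
    reinterpret_cast<volatile std::uint8_t *>(fakeDevice);

// 8-bit accessors: the Register8 type selects a byte-wide access.
inline std::uint8_t regRead(Register8 reg) {
    return *(baseAddress + reg);
}
inline void regWrite(Register8 reg, std::uint8_t value) {
    *(baseAddress + reg) = value;
}

// 32-bit accessors: the Register32 type selects a word-wide access.
inline std::uint32_t regRead(Register32 reg) {
    return *reinterpret_cast<volatile std::uint32_t *>(baseAddress + reg);
}
inline void regWrite(Register32 reg, std::uint32_t value) {
    *reinterpret_cast<volatile std::uint32_t *>(baseAddress + reg) = value;
}
```

Because an unscoped enumeration does not convert implicitly to another enumeration type, a Register8 value can only ever match the byte-wide overloads, and a Register32 value only the word-wide ones.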

Accessing Bitsets

You can make it easier and safer to write values into register bitsets. Usually you only ever need to read or set a few bits at a time. Manually crafting the bit-twiddling logic leads to hard-to-follow code and is also very error prone. Ideally, you'd provide a set of simple functions for this operation, akin to regRead and regWrite, that can manipulate the individual register bits. The only question is how to specify which bits to read/write.

You construct each bitset definition in an enumeration by encoding the starting bit position and the number of relevant bits in the enumeration values. Unfortunately, you have to use a macro to do this (there's no reasonable alternative; enumeration values can only be integer constants so you can't use a helper function here).

The inline functions bitRead and bitWrite are defined to decode the enumeration values and act on them accordingly; see Listing 10. The resulting code is as good as any hand-written alternative, and again you have improved code readability and safety.
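Listing 10 is not reproduced here. The sketch below shows one possible encoding, assuming a hypothetical BITFIELD macro that packs the first bit into the low byte and the field width into the next byte. The CtrlBits fields, the fieldStart/fieldMask helpers, and the fakeDevice stand-in are all illustrative inventions, not the article's code.

```cpp
#include <cstdint>

enum Register { STATUS = 0x00, CTRL = 0x04 };

static volatile std::uint32_t fakeDevice[2];   // host stand-in
inline volatile std::uint32_t *regAddress(Register reg) {
    return reinterpret_cast<volatile std::uint32_t *>(
        reinterpret_cast<volatile std::uint8_t *>(fakeDevice) + reg);
}
inline std::uint32_t regRead(Register reg)              { return *regAddress(reg); }
inline void regWrite(Register reg, std::uint32_t value) { *regAddress(reg) = value; }

// Hypothetical encoding: low byte = first bit, next byte = field width.
#define BITFIELD(start, width) (((width) << 8) | (start))

enum CtrlBits {
    CTRL_ENABLE = BITFIELD(0, 1),   // bit 0
    CTRL_PARITY = BITFIELD(1, 2),   // bits 1..2
    CTRL_BAUD   = BITFIELD(4, 3)    // bits 4..6
};

// Decode the position and in-place mask of a field.
inline unsigned fieldStart(CtrlBits bits) { return bits & 0xffu; }
inline std::uint32_t fieldMask(CtrlBits bits) {
    unsigned width = (bits >> 8) & 0xffu;
    std::uint32_t ones = (width >= 32) ? 0xffffffffu : ((1u << width) - 1u);
    return ones << fieldStart(bits);
}

// Read one field, shifted down to bit 0.
inline std::uint32_t bitRead(Register reg, CtrlBits bits) {
    return (regRead(reg) & fieldMask(bits)) >> fieldStart(bits);
}

// Read-modify-write one field, leaving the other bits untouched.
// Bits of value that do not fit in the field are discarded.
inline void bitWrite(Register reg, CtrlBits bits, std::uint32_t value) {
    const std::uint32_t mask = fieldMask(bits);
    regWrite(reg, (regRead(reg) & ~mask) | ((value << fieldStart(bits)) & mask));
}
```

A call such as bitWrite(CTRL, CTRL_BAUD, 5) now replaces the hand-written mask-and-shift sequence, and the field's position is stated once, in the enumeration.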

Extending to Multiple Devices

An embedded system is composed of many separate devices, each performing its allotted task. Perhaps you have a UART for control, a network chip for communication, a sound device for audible warnings, and more. You need to define multiple register sets with different base addresses and associated bitset definitions. Some large devices (like super I/O chips) consist of several subsystems that work independently of one another; you'd also like to keep the register definitions for these parts distinct.

The classic C technique is to augment each block of register definition names with a logical prefix. For example, you'd define the UART transmit buffer like this:

#define MYDEVICE_UART_TXBUF ((volatile uint32_t *)0xffe0004)

C++ provides an ideal replacement mechanism that solves more than just this aesthetic blight: you can group register definitions within namespaces. The nest of underscored names is replaced by "::" qualifications, a better, syntactic indication of relationship. Because each namespace defines its own register enumeration types, and overload resolution is driven by those types, you can never write a register value to the wrong device block: it's a compile-time error. This is a simple trick, but it makes the scheme incredibly usable and powerful.

Namespacing also lets you write more readable code with a judicious sprinkling of using declarations inside device setup functions. Koenig lookup (argument-dependent lookup) cuts the verbiage further. If you have register sets in two namespaces, DevA and DevB, you needn't qualify a regRead call, just the register name: the compiler finds the correct regRead overload in the correct namespace from its argument type. You only have to write:

uint32_t value = regRead(DevA::MYREGISTER); 
// note: not DevA::regRead(...)
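To make the namespace and lookup behavior concrete, here is a small sketch with two hypothetical devices, DevA and DevB. For brevity, the accessors simply index a host-side window array instead of computing a physical address as in the earlier sketches; the setupDevices function is likewise invented for illustration.

```cpp
#include <cstdint>

// Two hypothetical devices, each with its own register enum and accessors.
namespace DevA {
    enum Register { CTRL = 0x00, MYREGISTER = 0x04 };
    static volatile std::uint32_t window[2];        // host stand-in
    inline std::uint32_t regRead(Register reg) { return window[reg / 4]; }
    inline void regWrite(Register reg, std::uint32_t value) {
        window[reg / 4] = value;
    }
}

namespace DevB {
    enum Register { CTRL = 0x00, STATUS = 0x04 };
    static volatile std::uint32_t window[2];        // host stand-in
    inline std::uint32_t regRead(Register reg) { return window[reg / 4]; }
    inline void regWrite(Register reg, std::uint32_t value) {
        window[reg / 4] = value;
    }
}

void setupDevices() {
    // Argument-dependent lookup finds DevA::regWrite or DevB::regWrite from
    // the type of the first argument; only the register name is qualified.
    regWrite(DevA::CTRL, 1);
    regWrite(DevB::CTRL, 2);
    // DevB::regWrite(DevA::CTRL, 3);  // would not compile: wrong enum type
}
```

Both devices have a register called CTRL, yet there is no ambiguity: the argument's type decides which namespace's regWrite is called.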

Variable Base Addresses

Not every operating environment is as simple as the one discussed so far. If a virtual memory system is in use, you can't directly access the physical memory-mapped locations; they are hidden behind the virtual address space. Fortunately, such operating systems provide a mechanism to map known physical memory locations into the current process's virtual address space.

A simple modification lets you accommodate this memory indirection. You change the baseAddress variable from a simple static const pointer to a real variable. The header file declares it extern, and before any register accesses, you must define and assign it in your code. How baseAddress is assigned is necessarily system specific.
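Here is a minimal sketch of that change, with the "header" and "source file" parts shown together in one block. The namespace Uart, the file names in the comments, and the initRegisterAccess function are hypothetical. A real initRegisterAccess would call the operating system's own mapping facility; here a static buffer stands in for the mapped window so the sketch runs on a host.

```cpp
#include <cstdint>

// ---- uart_regs.h (hypothetical header) ----
namespace Uart {
    enum Register { RXBUF = 0x00, TXBUF = 0x04, STATUS = 0x08, CTRL = 0x0c };

    // Declared here; defined and assigned by system-specific start-up code.
    extern volatile std::uint8_t *baseAddress;

    inline volatile std::uint32_t *regAddress(Register reg) {
        return reinterpret_cast<volatile std::uint32_t *>(baseAddress + reg);
    }
    inline std::uint32_t regRead(Register reg) { return *regAddress(reg); }
    inline void regWrite(Register reg, std::uint32_t value) {
        *regAddress(reg) = value;
    }
}

// ---- uart_init.cpp (hypothetical, system specific) ----
namespace Uart {
    volatile std::uint8_t *baseAddress = nullptr;   // set before any access

    // Stand-in for the device window that the OS would map into the
    // process's virtual address space.
    static volatile std::uint32_t hostWindow[4];

    void initRegisterAccess() {
        baseAddress = reinterpret_cast<volatile std::uint8_t *>(hostWindow);
    }
}
```

The register access code itself is unchanged; the only cost is one extra load of baseAddress per access, since it is no longer a compile-time constant.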

Other Uses

Here are a few extra considerations for the use of this register access scheme:

Just as you use namespaces to separate device definitions, it's a good idea to choose header filenames that reflect the logical device relationships. It's best to nest the headers in directories corresponding to the namespace names.
A real bonus of this register access scheme is that you can easily substitute alternative regRead/regWrite implementations. It's easy to extend your code to add register access logging, for example; I have used this technique to debug hardware problems successfully. Alternatively, you can set a breakpoint on register access, or introduce a brief delay after each write (this quick change shows whether a device needs a pause to act on each register assignment).
It's important to understand that this scheme leads to larger unoptimized builds. Although it's remarkably rare not to optimize your code, without optimization the inline functions are not expanded in place, and your code will grow.
There are still ways to abuse this scheme. You can pass the wrong bitset to the wrong register, for example. But it's an order of magnitude harder to get anything wrong.
A small sprinkling of template code lets you avoid repeated definition of bitRead/bitWrite; see Listing 11.
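Listing 11 is not reproduced here. One way such a template might look is sketched below, reusing the hypothetical BITFIELD encoding from the earlier sketch. The templates rely on each device namespace providing its own regRead/regWrite overloads, which argument-dependent lookup finds from the register's enum type when the template is instantiated. The DevA and DevB contents are invented for illustration.

```cpp
#include <cstdint>

// Hypothetical field encoding shared by all devices:
// low byte = first bit, next byte = field width.
#define BITFIELD(start, width) (((width) << 8) | (start))

namespace DevA {
    enum Register { CTRL = 0x00 };
    enum CtrlBits { ENABLE = BITFIELD(0, 1), MODE = BITFIELD(2, 2) };
    static volatile std::uint32_t window[1];      // host stand-in
    inline std::uint32_t regRead(Register reg) { return window[reg / 4]; }
    inline void regWrite(Register reg, std::uint32_t v) { window[reg / 4] = v; }
}

namespace DevB {
    enum Register { CFG = 0x00 };
    enum CfgBits { LEVEL = BITFIELD(4, 4) };
    static volatile std::uint32_t window[1];      // host stand-in
    inline std::uint32_t regRead(Register reg) { return window[reg / 4]; }
    inline void regWrite(Register reg, std::uint32_t v) { window[reg / 4] = v; }
}

// One definition serves every device: regRead/regWrite are dependent calls,
// resolved by argument-dependent lookup on Reg at instantiation time.
template <typename Reg, typename Bits>
std::uint32_t bitRead(Reg reg, Bits bits) {
    const unsigned start = bits & 0xffu;
    const unsigned width = (bits >> 8) & 0xffu;
    const std::uint32_t ones = (width >= 32) ? 0xffffffffu : ((1u << width) - 1u);
    return (regRead(reg) >> start) & ones;
}

template <typename Reg, typename Bits>
void bitWrite(Reg reg, Bits bits, std::uint32_t value) {
    const unsigned start = bits & 0xffu;
    const unsigned width = (bits >> 8) & 0xffu;
    const std::uint32_t mask =
        ((width >= 32) ? 0xffffffffu : ((1u << width) - 1u)) << start;
    regWrite(reg, (regRead(reg) & ~mask) | ((value << start) & mask));
}
```

As noted above, nothing in this sketch stops you pairing DevB::LEVEL with DevA::CTRL; the templates remove the duplication, not that class of mistake.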

Proof of Efficiency

Perhaps you think that this is an obviously good solution, or you're just presuming that I'm right. However, a lot of old-school embedded programmers are not so easily persuaded. When I introduced this scheme in one company, I met a lot of resistance from C programmers who just could not believe that the inline functions resulted in code as efficient as the proven macro technique.

The only way to persuade them was with hard data—I compiled equivalent code using both techniques for the target platform (GCC targeting a MIPS device). Table 1 lists the results. An inspection of the machine code generated for each kind of register access showed that the code was identical. You can't argue with that!

It's particularly interesting to note that the object code produced by the C #define method is slightly larger than the C++ equivalent. This is a peculiarity of the GCC toolchain: the assembly listing for the two main functions is identical, and the difference in file size is down to the glue around the function code.

Conclusion

Okay, this isn't rocket science, and there's no scary template metaprogramming in sight (which, if you've seen the average embedded programmer, is no bad thing!). But this is a robust technique that exploits a number of C++ features to provide safe and efficient hardware register access. Not only is it supremely readable and natural in the C++ idiom, it prevents many common register access bugs and provides extreme flexibility for hardware access tracing and debugging.

The source code accompanying this article (available at http://www.cuj.com/code/) contains a full example of the scheme in action, along with some other extensions.