Endian-Neutral Software, Part 1

System concepts and implications

James R. Gillig

Jim is a software engineer on OS/2 and IBM Workplace technologies in Boca Raton, Florida. He can be reached through the DDJ offices.

Endian is a processor-addressing model that affects the byte ordering of data and instructions stored in computer memory, and the data's representation provided by a programming language. Endian concepts can be confusing since there are different Endian types, different ways to represent these types, and intertwined considerations for both code and data portability between opposite-endian hardware platforms. Historically, the term "Endian" comes from Gulliver's Travels, by Jonathan Swift:

It is computed that eleven Thousand Persons have, at several Times, suffered Death rather than submit to break their Eggs at the smaller End.

In the first installment of this two-part article, I will lay the groundwork by examining what Endian means from the programmer's perspective. In next month's article, I'll discuss how you can write portable software by applying Endian-neutral design and programming principles.

The most common addressing models are Big-endian, derived from the left-to-right order of writing in western-culture languages, and Little-endian, stemming from the right-to-left order of arithmetic operations in hardware processors. As Figure 1 illustrates, the Big-endian (BE) addressing model assigns or maps the lowest address to the highest-order (that is, the most significant or leftmost) data byte of a multibyte-scalar data item. The Little-endian (LE) addressing model assigns or maps the lowest address to the lowest-order (least significant or rightmost) data byte of a multibyte-scalar data item.

The "Endianness" of a multibyte-scalar data type such as an integer halfword or word is BE or LE. When compiled for a LE processor, its byte order is the reverse of the byte order compiled for a BE processor. The simplest way to think about Endian is that a LE scalar data item is equivalent to a byte-reversed BE scalar data item. Such a scalar should be treated as a single, indivisible data item although it has more than one byte and is composed of smaller addressable units of storage. Aggregate data such as files, data structures, and arrays are composed of multiple data elements; each element that is a multibyte scalar has Endianness. Byte values or single-byte character data do not have Endianness because the smallest addressable unit of memory is one byte; consequently, byte order is not an issue.

Some processors are Little-endian (Intel x86), others are Big-endian (IBM AS/400, System/370, Macintosh), and some are bi-endian (PowerPC) and can run in either BE or LE mode. In turn, the Endianness of software (code and data) is determined by the processor for which it is written.

The data structure in Figure 2 shows how Endianness can affect addressability and byte order. When a data structure containing different data types is compiled for a BE processor and again separately for a LE processor, note the following about the compiled data structure:

Each data item is at the same address location, whether BE or LE (see variable b at address 0x08, Figure 2).
The LE byte order within a scalar data item is equivalent to byte-reversed BE (see the byte addresses of variable b, Figure 2).
Single-byte characters lack Endianness and are at the same byte address in BE or LE mode (see array d[7], Figure 2).

Endian Maps and Forms

An Endian model maps addresses to the bytes of a multibyte scalar. There are different ways to illustrate Endian maps and forms of data for human viewing. The byte addresses of a LE data item are shown in either left-to-right or right-to-left order, with byte values appearing in the opposite order. For a BE data item, both addresses and bytes are shown in the same left-to-right order. The relationship between BE and LE mappings and their forms of representation is shown in Figure 3. Figure 4 is based on the sample data structure in Figure 2 but is illustrated in the alternate left-to-right addressing form for LE. A disadvantage of this form is that the scalar data items do not appear in the more readable (to western cultures) left-to-right order.

In addition to BE and LE, other related Endian maps and forms may exist as part of a processor's addressing architecture or its implementation. Some special forms may be internal to a processor and transparent to software; they should not be confused with BE and LE, which are visible to software. BE and LE are most common, but you should not categorically assume that they are the only addressing models in existence and that all data in the world is only BE or LE.

Finally, it is interesting to compare how halfword, word, and doubleword integers can appear as members of a data structure in BE and LE form.

The data structure in Figure 5(a) has its BE/LE byte-address mappings shown next to it. Figure 5(b) shows a different mapping for LE than before. Finally, Figure 5(c) shows yet a different byte-address mapping for LE. For BE, the byte address of each byte value is the same in (a), (b), and (c) of Figure 5; for LE, the byte addresses are all different for the same byte value.

Multibyte-scalar data should be treated by software as a single, indivisible entity, such as an integer, pointer, or float. You can write code that treats a scalar as aggregate data by addressing a specific byte location or byte subfield internal to the scalar, but such code is not readily portable between Endians. In Figure 5, the short-integer s3.k data item is at address 04 for both Endian types, but its two component bytes are at different addresses! A program accessing data at location (char*) &s3.k+1 would find 0x16 when running in BE mode and 0x15 in LE mode. In short, when twiddling the internal bits and bytes of scalar data, do not assume they are stored at a particular address; otherwise, the program may break when ported to a different Endian. Bits can be selected more portably, in either BE or LE, with bitwise operations such as n & 0x03FC0000, which are independent of byte address. The important principle is not to rely on those bits being stored at a particular byte address.

PowerPC Bi-endian Capabilities

The PowerPC is a bi-endian RISC processor that supports both Big- and Little-endian addressing models. The bi-endian architecture provides hardware and software developers with the flexibility to choose either mode when migrating operating systems and applications from their current BE or LE platforms to the PowerPC. Figure 6 shows the address mapping of its 32-bit executable instructions when running in BE mode and LE mode. These examples illustrate how program instructions are like multibyte-scalar data and are subject to the byte-order effect of Endian.

Each individual PowerPC machine instruction occupies an aligned word in storage as a 32-bit integer containing that instruction's value. In general, the appearance of instructions in memory is of no concern to the programmer. Program code in memory is inherently either a LE or BE sequence of instructions even if it is an Endian-neutral implementation of an algorithm.

How does the PowerPC handle both LE and BE addressing models? The processor calculates the effective address of data and instructions in the same manner whether in BE mode or LE mode; when in LE mode only, the PowerPC implementation further modifies the effective address to provide the appearance of LE memory to the program for loads and stores.

The operating system is responsible for establishing the Endian mode in which processes execute. Once a mode is selected, all subsequent memory loads and stores will be affected by the memory-addressing model defined for that mode. Byte-alignment and performance issues need to be understood before using an Endian mode for a given application. Alignment interrupts may occur in LE mode for the following load and store instructions:

Fixed-point load instructions.
Fixed-point store instructions.
Load-and-store with byte reversal instructions.
Fixed-point load-and-store multiple instructions.
Fixed-point move-assist instructions.
Storage-synchronization instructions.
Floating-point load instructions.
Floating-point store instructions.

For multibyte-scalar operations in LE mode, current PowerPC processors take an alignment interrupt whenever a load or store instruction is issued with a misaligned effective address, even if the same access would be handled without an interrupt in BE mode. For code compiled to execute on the PowerPC in LE mode, the compiler should keep data and instructions aligned wherever possible to minimize alignment interrupts. Generally, more alignment interrupts will occur in LE mode than in BE mode. When an alignment interrupt occurs, the operating system should handle it by emulating the load or store in software.

A very powerful feature of the PowerPC architecture is the set of integer load-and-store instructions with byte reversal, which allow applications to interchange or convert data between the two Endian types without performance penalty. These instructions are lhbrx/sthbrx (load/store halfword byte-reverse indexed) and lwbrx/stwbrx (load/store word byte-reverse indexed). They are ideal for emulation programs that handle LE-type instructions and data, such as emulators of the Intel instruction set. These instructions significantly improve performance in loading and storing LE data while executing PowerPC instructions in BE mode and emulating Intel instruction behavior; they eliminate the byte-alignment and data-conversion overhead found in architectures that lack byte-reversal instructions. Currently, these instructions can be accessed only through assembly language. Until C compilers can automatically generate the right load and store instructions for this type of data, C programs can rely on masking and concatenating operations or embed the byte-reversal instructions in assembly language.

--J.R.G.

Distributed Environments and Endianness

A distributed application running between client desktops, servers, midframes, and mainframes depends on the communications model and its API for resolving Endian differences. In a mixed, distributed environment, applications must be able to compensate for differences in data representation between the systems that participate in the application.

Specific implementations for handling Endian and other conversions exist within applications written to lower-layer communications APIs. Higher-level application-development models like the Remote Procedure Call (RPC) of the Distributed Computing Environment (DCE) provide more general and robust support that isolates applications from these differences.

Most existing distributed software is written directly to a communications API. Typical communication interfaces are TCP/IP with a sockets or streams interface, NetBIOS with its own control block-based interface, or various SNA or ISO OSI interfaces.

Although communications APIs guarantee that data will be transmitted/received between network nodes, they do not understand the data types being transmitted and cannot convert data or data attributes, including Endian type, between clients and servers that have dissimilar data representations. This forces a distributed application to compensate for any differences.

DCE RPC allows an application to be developed as if it were nondistributed. At the same time, it allows any of the application's subroutines to be executed on a remote system. The RPC application-development model divides the local (client) and remote (server) parts of a program along an application's internal procedural interfaces.

Since the remote procedures are application defined, they must be able to support a variety of high-level language data types, including int, char, and struct. RPC hides the fact that data communications take place between client and server subroutines, and one of its functions is to interpret and convert native data-representation differences that may exist between the communicating systems. These differences include the addressing model (Endianness), alignment rules, character-set encoding, floating-point conventions, and numerical data formats.

Unlike writing directly to a communications API, writing to the DCE RPC interface allows you to ignore data representation and Endian conversion. DCE RPC can convert a well-defined, broad set of data types, including most C scalar and vector types as well as some extended types for use in a distributed environment. Examples of the latter include a byte data type to protect data from any conversion and a pipe data type to transfer large blocks of data.

The RPC data marshaling and unmarshaling routines handle the bulk of the data-conversion responsibility. Marshaling converts typed data into an encoded, linear buffer suitable for data communications. Unmarshaling recreates the typed data by interpreting the encoded data in the buffer. The marshaling/unmarshaling process takes, for example, a struct data type, decomposes it into its elements, and writes the data and a description of the struct into a single logical buffer. Unmarshaling rebuilds the struct by reading the data and description contained in the buffer.

A typical client/server call has at least two data transfers: The first is from client to server, and the second is the return flow back from server to client. The RPC subsystem takes the arguments from the procedural interfaces and assembles them into buffers using the Network Data Representation (NDR) encoding rules. The buffers constructed by the RPC marshaling routines include the data itself, as well as descriptors defining the type, size, and relative location of the data and its elements. Additional protocol information includes a field describing the native data representation of the transmitting system.

Embedded in the buffers containing the transmitted data is a variable that classifies the data as Big- or Little-endian. The algorithm used to properly decode or unmarshal the data buffers uses the principle of receiver-makes-right; see Figure 7. The receiver determines from the protocol information whether the transmitter's data representation is the same as its own. If so, no conversion is necessary. If not, a specific, standard conversion routine is called for each data type unmarshaled from the received packet(s). The data can then be presented to the application in the native-machine format.

In summary, a distributed application either compensates for any Endian differences itself when using lower-layer communications APIs or uses a higher-level model such as DCE RPC that supports automatic conversions.