Extensions


Bit Arrays with C++

A. Giancola and L. Baker


Anthony Giancola has an M.S. in Computer Science from the University of New Mexico. He currently works on scientific visualization and parallel processing.

Louis Baker has a Ph.D. in astronomy and works on numerical modeling. He has written four books, three on C and C++.

Both may be reached at Mission Research Corp., 1720 Randolph Rd. SE, Albuquerque, NM, 87106.

Introduction

The oriental boardgame of Go is played on a rectangular grid of 19 horizontal and vertical lines, the players alternately placing stones of black and white at the intersections of the lines. To write a Go-playing program, we wanted to set up an array corresponding to the board, with two bits devoted to encode for each location the conditions: unoccupied, black stone, white stone, and don't care. We also wanted to do so using C+ + in the natural manner, i.e. referring to each bit pair by syntax such as

board(1, 19) = 1;
or something similar.

It's easy to overload operators such as () and [] to return the bits, but how do you make this work on the left-hand side of an assignment? As bits are not directly addressable, getting an lvalue is not an option. And because C++ wants operator= to return a reference to an object or the object itself, it seemed that overloading would not give the form desired. Examples in books and magazines, such as N. Wilt's [5], simply use a Get function.

This article is about the implementation of such a bit array. It supports data elements of 1, 2, or 4 bits through the use of templates to determine the number of bits in the fundamental object. This limitation to a number of bits that is a power of two prevents the data elements from spanning bytes, simplifying the access. The technique presented here can be used for other data types as well, and can be generalized to arbitrary bit-collection sizes.

Assigning to Bits

The assignment operator in C++ (operator=) is normally overloaded only to return a pointer (reference) to a data value or the data value itself. Obviously, as bytes (type char) are the smallest directly addressable data elements in the C memory model, there are no pointers to bits. The crucial observation is that it is possible to overload the assignment operator to return type void, i.e., nothing.

The user can then freely do what he wishes as a side effect of the assignment function being invoked. Returning void prevents overwriting the desired byte by some otherwise undesired value, thereby insuring that the desired side effect is the only thing that, in fact, happens. There is no reason to return anything, and tests with the Cameau C++ compiler showed cfront generated the same C code when an object of type BIT_TMP was returned by the assignment operator.

Listing 1 gives code to illustrate the bit array. A sample main test driver is given in Listing 2, with the expected output in Listing 3 and Listing 4 for the cases of bit objects of two and four bits. Two classes are defined. The main class is Bit_2D_Array, which contains the bit data in an array of characters, and an element of the other class, BIT_TMP. Operators are defined for operator=, which copies one bit array to another, and for operator==, which compares two bit arrays. Output is with a dump function rather than ostream due to compiler limitations connected with templates.

These operators work byte-by-byte. BIT_TMP is nested within Bit_2D-Array to provide information hiding and prevent inadvertent misuse of objects. Masking is used to insure that assignments to bit type fit within the assigned bits, as the assignment of integer types to the bit elements is permitted. Array bounds are also checked.

The bit array is accessed by overloading the function call operator: operator(). Thus, the bit-array syntax is similar to that of Fortran, with an element identified as b(1,4), rather than b[1,4] or b[1][4]. This is due to the limitations of C+ + the first syntax is invalid and the latter would be cumbersome to implement for bit arrays.

operator() computes the location of the bit element specified, i.e., which byte it resides in and which bit(s). It then uses the other class, BIT_TEMP, to save these results. The value field of this class retains the value of the specified bit field, and the other fields are used as effective lvalues.

Class BIT_TMP does the actual bit field manipulation. It again provides operator= and operator==, this time for these fields. operator== merely compares the value fields. We need to allow statements of the form bfield(1,3) = 1 as well as bfield(1,3) = bfield(3,1). So two versions of operator= are provided. Both take the result that the right-hand side saves and stores it in the value field of the left-hand data item. We have made BIT_TMP local_rec private to prevent unauthorized access.

Note that the usual bitwise operators |, &, ~, and ^ are defined for elements without further effort on the part of the programmer. If fields of a single bit are being used, the logical operators | |, &&, etc. may be used interchangably with the bitwise operators. They may easily be overloaded to operate on entire bit arrays if desired.

Example

This code was tested on the Comeau C+ + 3.0 compiler on the Amiga, and Borland C++ 3.0 on the PC. The former uses cfront and is a licensed derivative of the AT&T C++; the latter is not. The most significant and interesting difference between the two was that the latter warns that the use of a for statement prevents in-line expansion of a function, and continues, while the former considers the same thing to be an error and terminates the compilation.

The use of cout in BIT_TMP() is a work-around required by cfront. A nested class within a template must have all functions in-line. The for loop in the constructor prevents this code from being compiled in-line. Borland did not mind, but cfront objected. By inserting the cout, cfront accepted the code.

Listing 2 contains a test driver main that exercises these routines. Listing 3 and Listing 4 shows the results of running this test code for two- and four-bit objects, respectively. Assignments such as B(1,3) = B(3,1) work on both compilers.

Interestingly, print statements inserted in the BIT_TMP operator() show that Borland C++ would call that operator first for the (3,1) case and then for the (1,3), while Comeau evaluates the locations in the other order. Why do both orders of evaluation work, when there is only one temporary? In the Comeau case, temporary variables are used to store the results of both evaluations. Note that if the definition of the operator were changed to return a pointer, the temporaries would not be created and the transposition example would fail. There has been some criticism of C++'s use of temporaries as producing excessive overhead in time and storage. For our example, an aggressive optimizer or a change in C++ that would eliminate the use of temporary storage here might break the code.

Coplien [3] considers in his book (p.49) an example in which operator[] is overloaded. He returns a pointer, but forces the use of an intermediate temporary by making a type conversion. It has been pointed out in comments on the USENET network that, in doing so, he subverts the type checking inherent in C++. Rather than rely on type conversion to produce an rvalue, we explicitly overload the assignment operator to move the data from one temporary to the destination. However, it is not absolutely guaranteed that another implementation of C++ will not employ temporaries and order evaluations such that statements of the form B(i,j) = B(k,l) will fail to perform correctly.

Conclusions

This example illustrates the flexibility that can be obtained in C++ by overloading operator= to return a void type and relying upon side effects to perform the desired actions. It also illustrates a useful overloading of operator(). The result is a bit array that behaves as if every element is directly addressable. Other examples of overloading the assignment operator to return void are given in the references. Finally, the use of templates is illustrated; they add flexibility in the number of bits per data object, and the size of the array.

The class discussed here can be generalized to add higher dimensions, bit fields of arbitrary length, set operations, etc. This arises from the possible confusion as to whether a sequence of items separated by commas within parentheses is an argument list with one or more default arguments or an overloaded operator ( ).

We make no claim to having produced the most efficient bit array package possible. Obviously, a good assembly language programmer could develop a more efficient implementation. As C+ + matures, it will become more efficient in principle, and implementations will become more efficient in practice.

References

Baker, L. 1991. "Complex Arithmetic and Matrices in C++." C Users Journal (Jan.): 123.

Baker, L. 1992. C Mathematical Function Handbook. N.Y.: McGraw Hill.

Coplien, J. O. 1992. Advanced C++, Reading, Mass: Addison Wesley.

Dreitlein, J. F. and J. R. Sauer. 1990 "Spinor Software Tools in C++." Computers in Physics (Jan./Feb.): 64.

Wilt, N. "A Bit Vector Class in C++." PC Techniques (Feb./March): 55.