P.J. Plauger is senior editor of C/C++ Users Journal. He is Convener of ISO C standards committee, WG14, and active on the C++ committee, WG21. His latest books are The Draft Standard C++ Library, and Programming on Purpose (three volumes), all published by Prentice-Hall. You can reach him at pjp@plauger.com.
Introduction
A common practice among programmers has long been to use flag and mask words. Both of these are objects typically of some integer type. But they do not store a range of integer values. Rather:
I group these two uses together because they often overlap. Consider the bitmask type ios::fmtflags. It has both bit fields, such as skipws, and mask fields, such as floatfield.
- Each bit field of a flag word represents an independent Boolean value.
- Groups of one or more bits in a mask are set to select, by a bitwise AND operation, corresponding mask fields in integer values of the same size.
The idiom in C has long been to define macros that specify each bit and mask field, as in:
#define SKIPWS 0x80 #define FLOATFIELD 0x70 .....You can then define flag and mask words of some integer type and perform bitwise operations between such objects and these macros.The limitations of this approach are obvious:
Enumerations help some with the first limitation. The draft C++ Standard improves matters further by tightening the type checking on enumerations. Indeed, the bitmask types are used throughout the Standard C++ library to replace old-fashioned flag words. But bitmask types still suffer from the remaining limitations listed above.
- The translator can perform no type checking, to ensure that only the relevant macros mix with flag or mask words.
- You have to contrive non-overlapping bit and mask fields by hand. Adding or changing a field is both tiresome and error prone.
- When you exhaust the bits in the largest available integer type, you must indulge in significant rewrites to add more bit or mask fields.
X3J16 and WG21, the committees standardizing C++, adopted the header <bitset> as a way to overcome all three limitations. It defines the template class bitset<N>, and a handful of related functions. The parameter N, of type size_t, specifies the number of elements in an object of this class. You can perform bitwise operations on this class by much the same rules as for the basic integer types, and with the same notation, regardless of the number of bits in a particular instantiation. You can also designate an element by its position (counting from zero), rather than writing explicit shifting and masking code.
A good implementation of this template class is presumably economical in its use of storage. In particular, the idea is that:
You still need to contrive names for the bits in a bitset<N> object, but a conventional enumeration can now define positions for the distinct bit fields in the object. And, unlike older implementations of flag words, you can more easily loop over subranges of bits, just by varying a position value.
- for small enough values of N, a single integer stores all the bits and the generated code as efficiently as traditional C code
- for larger values of N, multiple integers store all the bits and the generated code loops over these objects as needed
I've already shown two headers that defines templates. The simpler one is <iomanip>, which defines the iostreams manipulators that take an argument. (See "Standard C/C++: The Header<iomanip>," CUJ, December 1994.) The more elaborate one is <string>, which defines a whole family of string classes. (See "Standard C/C++: The Header <string>," CUJ, July 1995, and "Standard C/C++: Implementing <string>", CUJ, August 1995.) I've also discussed the current limitations of template technology, in conjunction with <iomanip>.
The template class bitset<N> is rather more ambitious than any of the manipulators, and rather less complex than the templatized strings. It is also an invention of the Committee. Some implementations, particularly those that endeavor to be portable, are still hampered by available technology. And some would-be users are still hampered by lack of implementations, and sheer lack of time to build experience. It is thus too soon to tell whether programmers will widely abandon conventional flag and mask words in favor of this newer and more disciplined technology.
Using <bitset>
You include the header <bitset> to make use of the template class bitset<N>. Here, N is an integer constant expression that specifies the number of bits in an instantiation of the template class. Objects of such a template class let you represent and manipulate a fixed-length sequence of bits. Earlier, I dubbed such objects flag and mask words. The advantage here is that you can manipulate all objects of template class bitset<N> the same way irrespective of the number of words required for their representations.To construct a bitset<N> object x0, say of size 20 bits, with all bits initially zero, you can simply write:
bitset<20> x0;Each object x reports the length of its bit sequence with the call x.length(). That way, you don't have to reconstruct the value N, or store it separately, from when you first instantiate the class until you later need it again.One style for determining the size of a particular instantiation is in conjunction with the enumeration constants that designate its elements, as in:
enum color (red, green, blue, ....., N); typedef bitset<N> colorset;You append N to the list of enumeration constants you define to access different elements (bits) of colorset objects. That way, the translator determines the number of bits for you.You can also construct a bitset<N> object and initialize its elements from an unsigned integer value, as in:
// set every other bit bitset<32> x1(0x55555555);In this case, the least-significant bit in the integer argument determines element zero. More generally, the initializer stores the value one in element j if, for the argument val, the expression val & (1 << j) is nonzero.Finally, you can construct a bitset<N> object and initialize its elements from corresponding elements of a character sequence controlled by string object. The character value '0' initializes the element to zero, and the character value '1' initializes the element to one. No other character values are permitted. As usual with string arguments, you can specify one or two additional arguments to select a substring. In the examples that follow, the comment after each constructor indicates the constructor with equivalent effect:
string alt ("10101"); // bitset<10> x2(0x15) bitset<10> x2(alt); // bitset<10> x3(0x0a) bitset<10> x3(alt, 1); // bitset<10> x4(0x02) bitset<10> x4(alt, 1, 3);An unsigned integer or string object that specifies more than N bits causes no problems. The excess bits are simply ignored.Given two objects x0 and x1 of class bitset<N> (for the same value of N, of course), you can perform the usual bitwise operations:
x0 = x1 x0 &= x1 x0 & x1 x0 |= x1 x0 | x1 x0 ^= x1 x0 ^ x1 x0 <<= m x0 << m x0 >>= m x0 >> m x0 == x1 x0 != x1Here, m is an expression with some integer type. The semantics of all these operations involving x0 and x1 are a natural extension of those for unsigned integer values.You can deal with all bits at once, by writing:
x0.set() // set all bit x0.reset() // clear all bits x0.flip() // flip (invert) all bits ~x0 // flip all bits x0.count() // count all set bits x0.any() // test if any bit set x0.none() // test if no bit setOr, given a valid bit position n, you can affect just that bit, by writing:
x0.set(n) // set bit n x0.set(n, b) // set bit n to // (b != 0) x0.reset(n) // clear bit n x0.flip(n) // flip bit n x0.test(n) // test bit nThe constructors, shown above, define mappings from unsigned integers or string objects to bitset<N> objects. You can perform the inverse mappings as well:
// return unsigned short equivalent x0.to_ushort() // return unsigned long equivalent x0.to_ulong() // return string equivalent x0.to_string()The first two member functions can report an overflow error (by throwing an exception) if the corresponding value is too large to represent as the return type. Absent any overflow, such a return value should serve as a constructor argument that generates a bitset<N> object equal to x0.Finally, you can insert and extract objects of class bitset<N>. For example:
cin >> x0effectively extracts a string object from the standard input stream and assigns it to some temporary. Unlike the usual string extractor, however, the function:
The function then converts the controlled character sequence, by the same rules as for constructing a bitset<N> object from a string argument, and assigns it to x0.
- does not use the width field stored in cin, nor does it set the field to zero
- extracts at most N characters, after skipping any white space
- otherwise extracts up to, but not including, the first character that is not 0 or 1
Last of all, you can insert the character sequence into, say, the standard output stream, by writing:
cout << x0As you might expect, this is equivalent to inserting x0.to_string().
Implementing <bitset>
Listing 1 shows the file bitset, which implements the standard header <bitset>. All template functions appear within the header as inline definitions, as I discussed much earlier. Nevertheless, you will find that not all translators will accept this header unmodified.In particular, those based on the cfront preprocessor may refuse to translate inline functions that contain for statements. You have to reorganize the code for cfront. I present the header in this form because I believe it is clear and fairly simple. Hence, it should serve as a good starting point even if it is not completely acceptable to all currently available translators.
A handful of functions are nominally declared outside class bitset<N>. But, function declarations of the form:
bitset<_N> operator&(const bitset<_N>& _L, const bitset<_N>& _R) {return (bitset<_N>(_L) &= _R); }are a bit equivocal. For N to be a true parameter, the declarations should properly be written as template functions. But few, if any, commercially available implementations currently let you write a template function with no type parameters. In this particular case, an integer parameter does indeed provide the information needed to enable overload resolution all the template functions have at least one argument of class bitset<N>. That's a hard point to get across to a translator, however, and the form shown here is not a valid way to do the job.One way to do so with current technology is to define the template functions as inline friend functions within the template class bitset<N>. The parameter N is then available to express all the necessary types.
Within the template class definition, I introduced the type definition _T. An object of class bitset<N> stores an array of one or more unsigned integer member objects. _T is the type of the array elements. I have chosen unsigned long for this type, but you might prefer unsigned short or unsigned char. Some implementations may manipulate the smaller element types more efficiently. Or you may declare large arrays of bitset<N> elements and want to keep object sizes smaller.
Private to the class, I define two useful constants. They masquerade as elements in an enumeration a common trick but they are used independently as numeric values:
I won't recite all the reasons for choosing these particular derived parameter values. All I can say is, they were the result of several rewrites to make the code at once compact, robust, and readable.
- _Nb, the number of bits in an array element
- _NW, one less than the number of elements in the array
The template class bitset<N> also defines five secret protected member functions:
I discussed the exceptions associated with invalid-argument, overflow, and range errors many moons ago. (See "Standard C/C++: The Header <exception>," February 1994.) As you can see, all of these functions are used throughout the member function definitions.
- _Tidy, which initializes the array elements all to the same value (usually zero)
- _Trim, which sets to zero any unused bits in the last element of the array, both for good hygiene and to simplify several bitwise operations
- _Xinv, which reports an invalid-argument error
- _Xoflo, which reports an overflow error
- _Xran, which reports a range error
None of the member function definitions is particularly complex. Nevertheless, several of the larger ones are poor candidates for making inline, as they are declared here. As I have discussed earlier, some translators refuse to make inline any definition that includes a looping statement. (Some simply refuse to translate the code at all, even though they are supposed to.) Others may generate inline code, but slavishly loop even over a single word.
Obviously, this code works best in conjunction with a smart template optimizer. Such creatures are currently in short supply, but that should not always be the case. The draft C++ Standard is banking ever more heavily on template technology, and not just within the Standard C++ library. If such pressures continue to drive the commercial compiler industry, then the kind of code I show here will eventually come into its own.
Meanwhile, I can report that this code does translate successfully under at least some current implementations. To make more than nontrivial use of it, however, you should check carefully to see the kind of code it generates. You may well want to modify it.
This article is excerpted in part from P.J. Plauger, The Draft Standard C++ Library, (Englewood Cliffs, N.J.: Prentice-Hall, 1995).