Features


A Fixed-Point Numeric Class for C++

Ron Fosner


Ron Fosner is a Principal at Lotus Development Corporation, where he designs and writes graphics and analytical software. His current interests are the application of visualization techniques such as 3-D, stereo, and interactivity to the problems of reducing raw data to usable information. While he has degrees in Chemistry and Chemical Engineering, he has always been looking for ways to apply computers to solve problems. He can be reached at Lotus Development Corporation, 1 Rogers St., Cambridge, MA 02142, or through e-mail at ron@lotus.com.

Three-dimensional graphics typically involve lots and lots of floating-point calculations, making it hard to get the program to respond quickly. And when the program responsiveness depends on how fast you can perform these calculations, you quickly search for alternatives.

Since I was writing in C++, I decided to design a class that would provide the resolution the application required without resorting to floating-point calculations. I created the fixed-point class (Listing 1, Listing 2, and Listing 3) . Unlike floating-point notation, which encompasses a large range of values by converting everything into an exponent and a mantissa, the fixed-point class can only encompass a relatively small range of values using a fixed number of decimal places.

The design of the class centers around the use of a double word of storage, using the first word (16 bits) for the whole number part and the remaining word for the fractional part. The most-significant bit of the whole number word is reserved as the sign bit. The entire fractional word is used to store the fractional values. This gives you a number that can represent values from approximately -32,768 to 32,767, in increments of 1/65,535th, or about 0.00001526. The design constrains us to projecting points onto a space only +/-32K in any direction. The resolution of each number is such that even a few trips through a transformation matrix will still provide reasonable accuracy.

Designing a Concrete Data Type

A concrete data type in C++ behaves like a built-in type. Since the behavior of the built-in types, such as int, is well known, a class with an intuitive interface and a consistent behavior might be easier to use. James Coplien, in his book Advanced C++, (Addison-Wesley, 1992) describes a form for creating concrete types by using what he terms the orthodox canonical class form. Basically a template, this form helps you design a concrete data type with known and predictable behavior from its interface and its data-handling processes. The form requires that the actual storage of the data be completely hidden from the users of the class, and that the class contain a default constructor, copy constructor, assignment operator, and destructor. After creating a class skeleton using the form, you create the public interface.

If your class will mimic built-in types, base its interface on the interface provided by the complex class, which comes with most C++ compilers. After all, you're probably using C++ for code reuse, so why derive a new interface when one with years of debugging is available? (There are a few twists to the implementation of the interface that I'll discuss in detail later.)

Once you've developed the basics of the class, you're nearly done. Since built-in types such as ints and floats behave in a well-documented manner, make sure your class behaves as closely to a built-in type as possible to minimize surprises. So in addition to the default constructor, destructor, copy constructor, and assignment operator, you'll need the math operators, comparison operators, and also some conversion operators that allow a fixed-point number to interact with built-in types.

Class Data Storage

Since I decided to use the same storage as a long (32 bits on a PC), it was convenient to be able to refer to the entire 32 bits, or either 16-bit part. Referring to the data as a long enabled me to use the Fixed class's data as a long for addition, subtraction, equivalence operators, and most other operators that don't care that there was some hidden decimal point. (A 32-bit addition is a 32-bit addition.) You can think of the numbers being left-shifted 16 bits before being stored to push the decimal point off the end. The real challenge comes with input, multiplication, and division, where you are forced to acknowledge the storage format.

The actual storage is a union between a long and a struct of two integers (one signed for the whole numbers, and the other unsigned for the fraction). The typedef for the union is:

typedef union
{
signed long long_rep;
struct {
   unsigned int fractional_part;
   signed int whole_part;
   } half_rep;
} FixedData;
FixedData.long_rep refers to the entire 32 bits, while FixedData.half_rep.whole_part refers to the whole number part of the data. This data layout is Intel 80x86 specific.

Fixed Class Constructors and Destructors

In order to make the Fixed class as useful as possible, I designed it to handle the following cases for instantiation and initialization:

Fixed A,B;    // default constructor
A=10; B=10.1; // assignment from ints or floats
A = B;       // assignments from other Fixed point numbers
Fixed C=B;    // copy constructor
Fixed D(20),E(10.1),F(B); // ctors with arguments
You'll typically be initializing Fixed values with either integers, floating-point values, or other Fixed values, so you must have constructors accepting these types. You can add other initializing constructor as needed. For example, if you wanted to read in Fixed values from a file or the keyboard, then you'd probably want to create a constructor that could create a Fixed from a text string.

You should be aware that C++ will generate a default constructor, copy constructor, assignment operator, and address-of operators if you don't provide them. Thus, if you've been using one of these implicit functions unawares, you may have let yourself in for some unexpected behavior such as memory leaks or improperly initialized classes. Program defensively by not depending too much on the compiler to provide these functions for you. If necessary, you can turn any of these defaults off by simply declaring a private prototype for them in your header, but never actually implementing the function. This effectively prevents the compiler from providing implicit functions. This way either the compiler or the linker will catch the attempted use.

The compiler will provide the address-of operator (operator&), a function frequently forgotten, usually until it's too late, if you don't provide it. But if you are implementing a class where you want to perform reference counting, that is, having a pointer to the data with a "use count" recording the number of different classes sharing the data, then you'll need to implement your own operator& to track the data usage.

For a class to mimic a built-in type, you need a default constructor, because when you create an array the default constructor will be called for each element. If you create a default constructor, you should create a destructor, even if the destructor (as in this case) is a null function.

A copy constructor is used for creating a new Fixed object from a preexisting Fixed object. Constructors that take arguments, particularly arguments that are built in types, will provide an easy method of implementing some of the math operators. (More on this later.)

You should initialize all the data in the constructor. This way you always leave a freshly-instantiated variable in a known state. For more complicated classes with more internal data, you should create one private member function to take care of initializing all of the variables in a standard order. This makes it easier to write the destructor, since no matter which constructor you call, there will be only one destructor function. If you allocate memory in the constructors, the destructor is the place you would free it.

Fixed Class Operators

It's easy to provide the relational and equality operators, since you can use the long representation for a Fixed, then use the built-in capabilities. This follows from the fact that if X <= Y, then (X<<16) <= (Y<<16). If you keep the internal format of the signed or unsigned integer representation of the whole number part in sync with the long number representation as signed or unsigned, then it's a simple matter to use the long representation for the operation.

The math operators needed for Fixed point are addition, subtraction, multiplication, and division. The same associative rule used for the equality operators will work for addition and subtraction. You simply pretend that you have multiplied each Fixed point value by 64K, added or subtracted them, then divided by 64K. This won't work with higher-order operators such as multiplication or division, since, for example, multiplying two numbers by 64K before you multiply them together results in the original two numbers multiplied by 64K squared, which quickly reduces the applicable range. So, for multiplication and division (recognizing that multiplication consumes the most time) you probably will have to break down and use assembly language to properly perform the operations.

For purposes of brevity, I'll restrict myself to an 80386+ discussion of how to perform these operations. The 80386 chip has a 32-bit multiply and a 32-bit divide instruction, which greatly speeds these operations when used in the Fixed class. They can also be applied to any chip capable of a native 32-bit or greater operation. For a chip like the 8086, you will have to break the operation into steps tedious in their bookkeeping.

Implicit Conversions

The class would be pretty useless if it prevented the use of floats or integers in an equation involving Fixed numbers. However, until you've played around with a class mimicking a built-in type, there are some subtleties that you should ponder concerning mixed type operations. Typically you might define the addition operator for the Fixed class as follows:

Class Fixed
   {
   public:
   Fixed();// default constructor
   Fixed( int );
      // constructor taking an int
   Fixed operator+( const Fixed& );
   .. other implementation details...
   }
Consider the following operations that use the addition operator:

Fixed fixed_var, result1, result2;
result1 = fixed_var + 1;
                 // compiles OK
result2 = 1 + fixed_var; // error!
Should there be any difference between the line for result1 and the line for result2? You bet! They are just hidden by the compiler. These lines can be rewritten (and are equivalent to):

result1 = fixed_var.operator+(1);
result2 = 1.operator+(fixed_var);
Where you have (or might have) defined an operator+ taking an int argument in class Fixed, there's no complimentary call in the built-in type int. Actually, the first line only works because you have a constructor for a Fixed that takes an int as an argument. The compiler initially searches for Fixed Fixed::operator+( Fixed& ). When it can't find it, it tries to match what you requested with what it can convert. Since, via the constructor taking an int, it knows how to convert an int into a Fixed, it does this conversion, allowing the first line to compile by silently calling the constructor for you. However, the second line can't be solved so easily, since the compiler will not apply user-defined conversions to the first operand of a member operator function.

What you need is a global function. You remove the operator+ member function from class Fixed and replaced it with a global function, like so:

Fixed operator+( const Fixed&, const Fixed& );
Now both the lines of code will compile successfully, each having silently called the constructor to convert an int into a Fixed. Typically you'll find this is a friend function of the class in most text books, but remember, you need only make it a friend if it will access the private data members of the class. In this case, you need to grant extra access privileges to the global operator+, since it can't do what it needs entirely through the public interface of the Fixed class.

Making the Class Efficient

You can reduce overhead by being aware of when the compiler will create a temporary Fixed value. If you program the operator+ in a straightforward manner, such as this:

Fixed Fixed::operator+ (Fixed rhs1, Fixed rhs2 )
{
   Fixed return_value;
   return_value.long_rep = rhs1.long_rep + rhs2.long_rep;
   // add the long numbers
   return return_value;
}
you create a lot of overhead. The temporary value, return_value involves a default constructor call, then the assignment of the long, then a destructor call. A better implementation might be to try to wrap the constructor call with assignment:

Fixed Fixed::operator+( Fixed rhs1, Fixed rhs2 )
{
   return Fixed( rhs1.long_rep,rhs2.long_rep);
}
In this case, you would make this constructor (taking two long arguments) a private member function. This would wrap the instantiation code in with the initialization code and result fit in a faster implementation.

As a final touch, fine tuning the class by using a profiler is not only a great way of identifying the hot-spots in the program, but it'll let you know when you'll need to do some optimization of the class interface. In the example program, I'm performing a matrix multiplication — this is an operation that involves a repetitive sequence of multiplications and additions. Since our interest is to eke out as much performance from the class as possible, I've created a member function, addProduct, that replaces the operation of a multiplication of two fixed point numbers and then the summing of the result into a third fixed point. The function takes two references to fixed point numbers, performs the multiplication of the two and then sums the result into the current object (pointed to by the this pointer), all in assembly language. For a complex operation that is going to be performed a number of times, it can be a significant speed increase to hand-code some assembly instructions of maximize the benefit of using the Fixed point class. Using this optimized function significantly increased the execution speed, mostly through the simple elimination of Fixed point temporaries created by the compiler during the math operation. Testing performed on an 80386 / 80387 @ 20Mhz for a simple 3 by 3 matrix multiplication showed that the optimized C++ code was 86% faster than using floats and 165% faster than using doubles. This illustrates that while C++ compilers are getting better at optimizing away inefficiencies, you're currently better off examining the operations that you need to perform repeatedly or that take up the most time (for which you need the profiler) and hand-coding some assembly code. Aside from using a profiler to figure out how to make your program run faster, it's also a great way to gain insight into the way that the compiler (and by inference, C++) operates. Once you understand how the compiler does things, you've acquired a considerable body of useful knowledge that will prove invaluable in that quest for tight, optimized code.

Listing 4