January 1992/Stepping Up To C++

Columns

Stepping Up To C++

Operator Overloading, Part 1

Dan Saks

Dan Saks is the owner of Saks & Associates, which offers consulting and training in C, C++ and Pascal. He is also a contributing editor of Windows/DOS. He serves as secretary of the ANSI C++ committee and is a member of the ANSI C committee. Readers can write to him at 287 W. McCreight Ave., Springfield, OH 45504 or by email at dsaks@wittenberg.edu.
No programming language can be all things to all programers. No matter how many features a language has, there's always someone who wants just one more. A language like FORTRAN that caters to numerical applications provides integer and real numbers in a variety of precisions, and even throws in complex numbers, but does nothing for programmers who want rational (exact fractional) numbers or set algebras. The list of wants is never ending.
Although you can't create a language with everything that every programmer could ever want, you can design a language that gives programmers the tools to create the data types they need [1]. This is the approach taken by C++. As explained by Bjarne Stroustrup, the inventor of C++:
"C++ has no high-level data types and no high-level primitive operations. For example, there is no matrix type with an inversion operator or a string type with a concatenation operator. If a user wants such a type, it can be defined in the language itself. In fact, defining a new general-purpose type or application-specific type is the most fundamental programming activity in C++. A well designed user-defined type differs from a built-in type only in the way it is defined and not in the way it is used" [2].
C++ programmers create new types by defining classes. A class defines both the representation of class objects and the operations that can be performed on those objects. C++, like C, has many operators that apply to built-in types. For user-defined types to appear as if they were built-in, programmers need the ability to define new meanings for operators when they are applied to user-defined types.
For example, C++ has no built-in support for complex numbers, but you can define complex numbers as a class:
class complex
   {
public:
   complex ();
   ...
private:
   double r, i; // real and imaginary parts
   };
Class names are type names in C++, so
complex c1, c2;
is all your users need do to declare c1 and c2 as complex variables. But, if you implement complex addition as a member function called add, then users must write
c1.add(c2);
to add c2 to c1, and they will know that complex numbers are not built in. For complex numbers to look built-in, users must be able to write
c1 = c1 + c2;
or
c1 += c2;
In fact, you can make complex numbers look built-in using a feature known as operator overloading. This month's column shows how to define and use overloaded operators.

Rational Numbers
I'll begin my explanation of operator overloading by implementing a class for rational numbers (fractions), with values such as 1/2 or -3/4. Whereas floating point numbers represent numbers such as 1/3 and 4/9 only approximately, rational numbers represent them exactly.
My rational number class uses long integers to store the numerator and denominator of the fraction. The class defines four overloaded binary operators: + (addition), -- (subtraction), * (multiplication), and / (division), along with constructors and an output function. Listing 1 shows the header rational.h that defines the class.
The declaration of an overloaded operator is just like the declaration of a function, except that the name of the function is the keyword operator followed by the operator symbol. For example, the member function declaration
rational operator+(rational r);
in Listing 1 declares a member function called operator+ that accepts one argument of type rational and returns a rational.
For the most part, there's nothing magical about a function name like operator+. Had I named the function add, the function declaration would look like just another ordinary member function declaration:
rational add(rational r);
You can call operator+ just as you would call any other member function. That is, if r1, r2 and r3 are rationals, then
r3 = r1.operator+(r2);
adds r1 and r2 using rational::operator+, and then copies the result to r3. Obviously, there must be some other advantage to overloading operators, because this syntax for calling operator+ is no more readable than
r3 = r1.add(r2);
If there is any magic here, it's that for any object x of a class type, the expression x+y means x.operator+(y). In other words, for rationals r1 and r2, the expression r1+r2 is a familiar shorthand syntax for calling r1.operator+(r2).
Listing 2 shows a small test program that uses rationals, and Listing 3 shows the output from that program. Note that this implementation of rational numbers doesn't reduce fractions to their simplest form. That is a detail I will address later.
In Listing 2, a rational declaration such as
rational r2 (3, 5);
uses the constructor
rational::rational(long n, long d);
to initialize r2 with the value 3/5 (three-fifths). Although 3 and 5 are int (not long int) constants, a C++ compiler applies integral promotions to the arguments if necessary to find a constructor with a signature that matches the call. (See my earlier column, "Function Name Overloading," CUJ, Nov. 1991, for an explanation of function signatures.)
Notice that you can call that constructor to create rational numbers on the fly for use in expressions. For example, in the statement
r2 = r2 * rational(2, 3) + r1;
the subexpression rational(2, 3) calls the constructor to create a temporary rational object whose value is 2/3. Aside from the fact that rational constants must be written as rational (x, y) instead of x/y, expressions involving rationals look like expressions involving primitive types.
Nearly all the built-in unary and binary operators can be overloaded. Only
. . * : : ? : sizeof # # #
cannot. (.* is the dereferencing operator for pointers to members, which I have not yet covered in this column. # and ## are preprocessing operators for stringizing and pasting.) The ANSI C++ committee is even considering allowing overloaded. (dot). The operators that are both unary and binary,
+ - * &
can be overloaded both ways.
Overloading operators does not change their precedence or associativity. For example,
r2 = r2 * rational(2, 3) + r1;
is evaluated as
r2 = ((r2 * rational(2, 3)) + r1);
because * has higher precedence than + and + has higher precedence than =. You cannot create new operator tokens by overloading. For example, you cannot create ** as an exponentiation operator.
I will present an implementation of the overloaded rational operators shortly. But first, I must explain a little about the underlying mechanism of member function calls.

Using The Keyword this
A class member function operates implicitly on the class object to which the function is applied. For example, suppose class T looks like

class T { public: void f(const T &r); void g(const char *s); ... private: int m, n; ... };
Then the call

x.f(y);
means "apply member function f to object x using y as an argument." Inside the body of f, an unqualified reference to a class member, say m, is a reference to the m member in the T object for which the member was called. For instance, given the previous function call, the assignment in

void T::f(const T &r) { ... m = r.m * n; ... }
multiplies y's m (referring to y through r) times x's n and stores the result in x's m. The compiler knows how to find y, because it was passed explicitly as the argument r. But how does f know where to find x?
The member function f actually has a hidden extra argument that's the address of the object to which the function applies. A call such as

x.f(y);
translates into the C-like function call

f(&x, y);
Inside the member function, you can refer to the hidden argument by the keyword this. In every member function of class T, the compiler implicitly declares this as a pointer whose type is T *const. In other words, the member function declaration

void T::f(int i);
translates into the C-like declaration

void f(T *const this, int i);
Note that the this pointer itself is const, but *this is not const. This means you can't alter the pointer, but you can alter the object it addresses.
Inside the body of a member function, every unqualified reference to a data member of T is implicitly prefixed with this->. For example, inside f, the statement

m = r.m * n;
compiles as if it were written as

this->m = r.m * this->n;
Also, if f calls another member function from its own class, such as

g("hello");
the call compiles as if it were written as

g(this, "hello");

Implementing rational's Operators
Listing 4 shows the source file rational.cpp that implements the member functions for the rational class. All of the operator functions use the same implementation strategy:
1) create a local rational object called result;
2) compute result's numerator;
3) compute result's denominator;
4) return (a copy of) result.
The four operator functions apply the formulae shown in Table 1.
As I explained earlier, a class member function operates implicitly on the class object to which the member is applied. The expression
r1 + r2
compiles as
r1.operator+(r2)
which means "apply member function operator+ to r1 using r2 as the argument." In the body of operator+, num and denom not prefixed with r. refer to the num and denom of the left operand, and r.num and r.denom refer to the num and denom of the right operand.
Class rational has two constructors: a default (parameter-less) constructor and another two-argument constructor that initializes the numerator and denominator explicitly. I needed the two-argument constructor to create rational number objects with specific values. I used the default constructor to declare rational variables when I didn't care about the initial value, as in the first line inside the body of each operator function in Listing 4. The compiler generates a default constructor only if the class has no other constructors. Had I not declared the default constructor explicitly, then the class would not have a default constructor, and a declaration like
rational result;
would be illegal.
It turns out that, at least for this application, you don't really need a default constructor. By using the expressions for the numerator and denominator as arguments to the two-argument constructor, you can compute the results in a temporary object and return the temporary. That is, for any expressions n and d that are convertible to long int, you can rewrite
rational result;
result.num = n;
result.denom = d;
return result;
as simply
return rational(n, d);
The simplified implementations for the rational operators appear in Listing 5.

Assignment Operators
In addition to using the four overloaded rational operators, the program in Listing 2 also applies the assignment operator = to rationals. In fact there are two assignments:
r1 = (r1 + r2) / rational(1, 2);
r2 = r2 * rational(2, 3) + r1;
which work as you'd expect, even though the rational class doesn't define operator=.
If a class doesn't define operator=, the compiler generates a default version. The generated operator= performs what is called a memberwise copy — it copies each member of the right-hand operand to the corresponding member of the left-hand operand. For each member of the class that is itself a class object, memberwise copy uses that member's operator=.
For a class T, the generated assignment operator is declared as either
T &T:: operator=(const T &);
or
T &T::operator=(T &);
If all of T's members are primitive types or class types with assignment operators of the first form, then T's assignment operator also uses that form. Otherwise, T's assignment operator uses the second form.
Notice that both forms of the overloaded assignment operator return a reference to a class object. At first, you might expect that assignment has a void return. The most common use for assignment is as a single statement like
r1 = r2;
But in C++, as in C, assignments are not just statements; they are expressions that can be used in other expressions, such as
if ((r1 = r2) != 0)
or
r1 = r2 = r3;
= is right-associative (grouped from right to left), so the latter statement is interpreted as
r1 = (r2 = r3);
When applied to built-in types, the = operator yields the value of its left operand. The default operator= for a class type preserves this behavior for class types.
Whenever you explicitly overload an assignment operator, you should have the function return a reference to its left operand. That is, the return statement in an overloaded assignment should be something resembling
return *this;
Note that a function returning a reference cannot use
return this;
When a function returns a reference, the return statement binds the return expression value to the returned reference using the same semantic rules as a reference declaration. When you declare a reference such as
rational &r1 = r2;
r2 must be a rational or a rational &; it cannot be a rational *. You cannot initialize a reference with the address of another rational:
rational &r1 = &r2;
because the type of &r2 is rational *. By this rule, the return expression of a function that returns a reference must be an object and not a pointer.
return *this;
binds the reference to an object, but
return this;
is an error because it attempts to bind a reference to a pointer. Listing 6 shows an implementation for rational::operator= that has the same functionality as the default memberwise assignment generated by the compiler.
Overloaded definitions for the other assignment operators, like += and *=, should also return a reference to the left operand. Listing 7 shows the definition of class rational extended to include definitions for +=, -=, *=, and /=. The implementation of each assignment is very similiar to its corresponding binary operator.
Listing 8 shows an implementation of rational::operator+= and another version of rational::operator+ rewritten in terms of rational::operator+=. operator+= computes its result directly in the left operand and then returns a reference to that operand (by referring to this). operator+ creates a local rational called result, initialized by copying this using the (default memberwise) copy contructor. operator+ computes result using operator+=, and returns a copy of (not a reference to) that result.
Listing 9 shows an implementation of rational::operator-= written in terms of rational::operator-. The operator- in Listing 9 is the same as the one in Listing 5. operator-= computes its result in its left operand (*this) using both operator- and operator=, and then returns a reference to that left operand.

Reducing Fractions To Simplest Form
The output from my little test program in Listing 3 shows that the rational operators don't reduce fractions to their simplest form. For example the fraction 22/10 equals 11/5, and -270/150 equals -9/5. Listing 3 shows that each arithmetic operation on a rational makes both the numerator and denominator grow. Unless you eliminate the common multiples in the numerator and denominator, they will overflow after only a few more operations.
I solved this problem by adding a private member function rational::simplify. The revised class definition appears in Listing 10 and the member function definitions appear in Listing 11. simplify divides both the numerator and denominator by the greatest common divisor, which is computed by calling gcd. gcd is a non-member function because I thinks it's a general-purpose function that should eventually go into an application-independent library of math functions. (This version of gcd is based on one by Niklaus Wirth [3]. I don't know if it's optimal, but it seems to work well for this application.)
I rewrote all the rational binary operators in terms of their corresponding assignment operators. Every assignment operator calls gcd. This assures that each arithmetic operation on rationals always leaves the resulting fraction in its simplest form. Listing 12 shows the revised output of the test program with simplified fractions.
My technique is an improvement, but it doesn't prevent every avoidable overflow. Consider this example:
(999999/1000000) * (1000000/3)
The result of this multiply is 333333/1, but the numerator overflows before the simplify function reduces the fraction to its simplest form. A more robust implementation would eliminate the greatest common divisor before multiplying, not after.
I will continue this discussion of overloaded operators in my next column.

Inquiring Minds Want To Know...
I recently received this question via electronic mail:
Your article on reference types in the C Users Journal (September 1991) struck a cord with me, but not as you might expect. I should add that I am a spotty reader of CUJ, so I hope I am not dealing with an old issue.
In your opening remarks, you say "Just as many programmers often pronounce int *as "int star," I often pronounce int & as "int ref." I have been programming in C++ for about a year now, and have been following it with interest longer than that. And, I must admit, with not a little embarrassment, that I DON'T KNOW HOW TO PRONOUNCE IT! Is it "C Plus Plus" (which I always use), or "C Incremented"? My "intellectually minded" colleagues and I have been arguing this for at least a year. I have never seen a definitive statement about this very important topic.
Could you please answer this, and put my mind to ease? Maybe even publish the answer. Thanks in advance.
David D. Hathaway
762 N. Ripley St.
Alexandria, VA 22304
76104.1042@CompuServe.COM
It's pronounced "C plus plus." That's the way I've heard the inventor Bjarne Stroustrup pronounce it, and he says so in his latest book [2].

References
[1] Jagger, Mick and Keith Richard, "You Can't Always Get What You Want," Let It Bleed. Gideon Music, 1968.
[2] Stroustrup, Bjarne, The C++ Programming Language, 2nd. ed. Addison-Wesley, 1991.
[3] Wirth, Niklaus, Algorithms + Data Structures = Programs. Prentice-Hall, 1976.