November 1991/Stepping Up To C++

Function Name Overloading

Dan Saks

Dan Saks is the owner of Saks & Associates, which offers consulting and training in C, C++ and Pascal. He is also a contributing editor of TECH. Specialist. He serves as secretary of the ANSI C++ committee and is a member of the ANSI C committee. Readers can write to him at 287 W. McCreight Ave., Springfield, OH 45504 or by email at dsaks@wittenberg.edu.
Many C libraries contain groups of closely related functions that perform essentially the same operation but on operands of different types. For example, the Standard C library functions include a family of "put" functions declared in <stdio.h>:
int putchar(int c);
int fputc(int c, FILE *stream);
int puts(const char *s);
int fputs(const char *s, FILE *stream);
All of these functions write something to a stream; however, they differ in the type of thing they write and how they determine the stream. putchar and fputc write a single character (passed as an int), while puts and fputs write each character of a null-terminated string. fputc and fputs require that the caller specify the output stream explicitly; putchar and puts assume the stream is stdout.
Even though functions in a closely related family have different names and implementations, programmers often refer to the functions as if they all had the same name. In fact, belaboring the physical distinction between logically similar operations introduces unnecessary detail into most discussions. Often, you simply refer to the operation by its logical name, e.g., put, and disambiguate the operation (if necessary) by specifying the type(s) of the operand(s), e.g., character or string.
For example, you rarely hear C programmers say "I putc-ed the character" or "I puts-ed the string". Rather, they say "I put the character" or "I put the string". C programmers understand that putting a character and putting a string are performed by physically distinct functions, and they have no reason to keep reminding each other that this is so.
Nonetheless, the C programming language insists that each function have a unique name. More precisely, neither two functions declared in the same scope nor two functions with external linkage in the same program can have the same name. These restrictions add to the burden of both developers and users of C language libraries. Developers must think up function names with sometimes arbitrary variations in their spellings, while users must learn these various names or keep looking them up.
C++ alleviates this burden by letting you "overload" function names, that is, declare more than one function with the same name in the same scope. Function name overloading lets you write programs and libraries whose function names reflect the way you naturally refer to the operations. This month's column explains how to declare and use overloaded functions.

Overloading Basics
Overloading function names in C++ requires no special effort. You declare overloaded functions as you would any other functions. (Early editions of C++ required a separate declaration using the keyword overload. Many C++ compilers support, but no longer require, this anachronism.)
For example, you can implement the four stdio.h output functions listed above as four overloaded functions named put:

int put(int c);                          // a character to stdout
int put(int c, FILE *stream);            // a character to stream
int put(const char *s);                  // a string to stdout
int put(const char *s, FILE *stream);    // a string to stream
When the compiler encounters a call on a function named put, it selects the appropriate function to call by matching the type(s) of the actual argument(s) in the call against the type(s) of the formal parameter(s) in one of the declarations. For example,

put('a', stdout);
calls the function declared as

int put(int c, FILE *stream);
and

put("Hello, World\n");
calls

int put(const char *s);
A call such as

put("Fatal Error #", n, stderr);
produces a compile error, because none of the declarations for put accepts three arguments. However, the actual argument(s) need not match the formal parameter(s) exactly; the compiler applies some promotions and conversions in an attempt to find a match. For example,

char c;
. . .
put(c);
calls

int put(int c);
because char promotes to int. The rules for argument matching are quite detailed, and have changed over the years. For the moment, I'll stay with simpler examples where the matching is straightforward.
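The matching described above can be tried with a minimal sketch. It assumes each put simply forwards to a stdio function; those bodies and the helper put_demo are my own illustration, not code from this column.

```cpp
#include <cstdio>

// Four overloads of put. Each body forwards to a stdio function
// (an assumption made for this sketch).
int put(int c)                            { return std::putchar(c); }
int put(int c, std::FILE *stream)         { return std::fputc(c, stream); }
int put(const char *s)                    { return std::puts(s); }   // puts appends '\n'
int put(const char *s, std::FILE *stream) { return std::fputs(s, stream); }

// Hypothetical helper: calls each overload once and counts the successes.
int put_demo()
{
    char c = 'a';
    int n = 0;
    if (put(c) == 'a') ++n;                 // char promotes to int: put(int)
    if (put('b', stdout) == 'b') ++n;       // put(int, FILE *)
    if (put("Hello, World") >= 0) ++n;      // put(const char *)
    if (put("done\n", stdout) >= 0) ++n;    // put(const char *, FILE *)
    return n;
}
```

Adding a call with three arguments, such as put("Fatal Error #", 3, stderr), should make the sketch fail to compile, since no declaration of put accepts three arguments.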
The Standard C library provides another example where overloading is useful. <stdlib.h> defines two functions that compute absolute values:

int abs(int j);
long labs(long j);
and <math.h> defines another:

double fabs(double x);
You might reasonably expect that fabs operates on floats rather than doubles, and that the absolute value function for doubles would be called dabs. However, for historical reasons, fabs operates on arguments of type double. Traditional (K & R) C performs all float arithmetic in double precision, and promotes float function arguments to double, so all the <math.h> functions accept and return arguments of type double.
A Standard C implementation may compute float expressions in float precision, so some applications may need a version of fabs that accepts and returns a float. Also, Standard C supports long double operands, so applications using long double might need a long double version of abs. In anticipation of these needs, the C standard reserves all the function names declared in <math.h> suffixed with either f or l for naming the corresponding math functions with float and long double operands and return values. In the case of fabs, the reserved names are fabsf and fabsl.
Using C++, you can avoid these naming complexities by simply overloading the desired forms of abs:

int abs(int j);
long abs(long j);
float abs(float x);
double abs(double x);
long double abs(long double x);
To some extent, a macro such as

#define abs(i) ((i) >= 0 ? (i) : -(i))
also solves the naming problem. This single macro definition computes the absolute value for any arithmetic type. But, as I mentioned when I first introduced inlines ("Rewriting Modules as Classes," CUJ, July 1991), macro calls can have unwanted side-effects. For example, the call

y = abs(*p++)
expands to

y = ((*p++) >= 0 ? (*p++) : -(*p++))
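which increments p twice. Standard C doesn't permit this implementation for abs, labs and fabs, because a library function implemented as a macro must evaluate each of its arguments exactly once.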
In C++, declare your overloaded abs functions as inlines, as in

inline float abs(float x) { return x >= 0 ? x : -x; }
Since an inline function evaluates its actual argument(s) only once (at the call), a full set of overloaded inline abs functions provides the naming simplicity and efficiency of a macro, without the unwanted side-effects.
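A short sketch makes the difference concrete. Two things in it are assumptions of mine: the macro is renamed ABS_MACRO, and the overloads sit in a namespace called demo so they cannot collide with the library's own abs (namespaces are a later addition to C++). The helpers macro_steps and inline_steps are hypothetical too.

```cpp
// Same macro as in the text, renamed so it cannot clash with the library.
#define ABS_MACRO(i) ((i) >= 0 ? (i) : -(i))

namespace demo {   // hypothetical namespace for this sketch
    inline int         abs(int j)         { return j >= 0 ? j : -j; }
    inline long        abs(long j)        { return j >= 0 ? j : -j; }
    inline float       abs(float x)       { return x >= 0 ? x : -x; }
    inline double      abs(double x)      { return x >= 0 ? x : -x; }
    inline long double abs(long double x) { return x >= 0 ? x : -x; }
}

// Returns how far p has advanced after one use of the macro.
int macro_steps()
{
    int a[] = { -3, 5, 0 };
    int *p = a;
    int y = ABS_MACRO(*p++);   // *p++ is evaluated twice
    (void)y;
    return static_cast<int>(p - a);
}

// Returns how far p has advanced after one call of the inline function.
int inline_steps()
{
    int a[] = { -3, 5, 0 };
    int *p = a;
    int y = demo::abs(*p++);   // *p++ is evaluated once, at the call
    (void)y;
    return static_cast<int>(p - a);
}
```

In this sketch the macro version not only advances p twice, it also computes its result from the second element rather than the first.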

Function Signatures
C distinguishes functions by their names only. Obviously, since C++ allows overloading, names no longer uniquely identify functions. C++ distinguishes functions by their names and signatures.
The signature of a function is the sequence of types in its formal parameter list. For example, the signature of

int fputs(const char *s, FILE *stream);
is

(const char *, FILE *)
and the signature of

float abs(float);
is just

(float)
The formal parameter names aren't part of the signature. Thus

int put(int);
int put(int i);
int put(int n);
all have the same signature. As in C, a function declaration that has the same name and signature as a previously declared function is simply a redeclaration, not an overloading, of that function.
Note that

int put();
int put(int);
refer to the same function in C, but to distinct overloaded functions in C++. In C, a function with an empty argument list has an unspecified argument list. Such a function is compatible with any other declaration for a function with the same name and return type, even a declaration with a non-empty argument list. However, in C++,

int put();
is equivalent to

int put(void);
which clearly has a different signature than

int put(int);
typedef names do not create distinct types; they are aliases for existing types. C++ turns typedef names into their equivalent types when computing signatures. Thus, for example, in

typedef int INT;
. . .
int put(int);
int put(INT);
both functions have the same signature.
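The signature rules in this section can be checked with a few declarations. This is only a sketch; the function bodies are placeholders I chose so that the two overloads can be told apart.

```cpp
typedef int INT;

int put();         // in C++, the same as int put(void)
int put(int);      // a distinct overload: signature (int)
int put(int n);    // redeclaration: parameter names are not part of the signature
int put(INT);      // redeclaration: INT is only another name for int

// Placeholder bodies (assumptions for this sketch).
int put()      { return 0; }
int put(int n) { return n; }
```

If INT really named a distinct type, the last declaration would introduce a third function, and the sketch would need a third definition to link.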
Class member functions can be overloaded just like non-member functions. In my last column ("Reference Types," CUJ, September, 1991), I introduced a rudimentary string class that used overloaded constructors (although I never called them "overloaded"). I will examine those constructors more closely, and extend the class with additional overloaded member functions.

A String Class
When I introduced the string class last time, I didn't provide any rationale for it. Since string classes are among the most popular C++ programming examples, and one I will use in future columns, I should back up a little and explain them more fully.
C provides extremely limited support for strings. The language recognizes string literals, and the library provides a modest set of string operations, and that's about it. Each C application must adopt its own policy for managing string memory and keeping string operations from writing beyond that memory.
The library string functions are fairly lax about safety. For example, strcat and strcpy will write past the end of the storage actually allocated for a string with nary a peep. strncat and strncpy are safer, but a little less convenient. An application that uses strncat and strncpy must cart around maximum string lengths with each string. None of the C library functions extends the destination string if it's too short to hold the result, nor do they provide any direct indication that some characters at the end of the result were discarded.
A well-written C++ string class provides a safe, flexible, and space-efficient alternative to null-terminated strings and the C library string functions. Each string object manages its own memory, and guarantees that string operations don't corrupt data outside that memory. String operations that increase a string's length, like concatenation, automatically allocate more memory if necessary. There are many ways to implement a string class. Stroustrup [1] and Hansen [2] present a sampling of the alternatives.
My String class stores variable-length strings in character arrays allocated from the free store. Each String has a data member str that stores the pointer to the first character in the array, and another member len that stores the array length (plus one for a \0 at the end of the string). Each concatenation operation allocates a new, larger character array from the free store, and deletes the old array. The class definition and a small test program appear in Listing 1.

Overloaded Constructors
Class String has two (overloaded) constructors:

String::String(const char *s);
String::String(const String &s);
The first constructor initializes a String with a copy of a null-terminated string. The second, a copy constructor, initializes a String with a copy of the text from another String. The type(s) of the argument(s) in an object declaration determine the signature of the constructor, so that

String s1("hello");
calls the first constructor to initialize s1 with a copy of hello, and

String s2(s1);
invokes the second constructor to initialize s2 with a copy of s1.
The String class in Listing 1 doesn't have a default constructor (a class constructor with an empty signature). C++ creates a default constructor for a class only when that class has no explicitly declared constructors. Without a default constructor, a String declaration with no actual arguments, such as
String s3;
is an error.
Listing 2 adds a default constructor to the String class. This constructor initializes a String as an empty string by pointing the str member to an empty string allocated from the free store. Alternatively, the constructor could simply set str to 0 (a null pointer), but this would complicate the concatenation functions described later.
Please note that there's a big difference between
String s3;
and
String s3();
The former declares a String object s3 using the default constructor. The latter declares a function s3 that accepts no arguments and returns a String.
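The three constructors can be sketched as follows. This is not the code from the listings; it assumes that len holds the array length including the trailing \0, and the members text and length and the helper constructors_agree are hypothetical names added for illustration.

```cpp
#include <cstddef>
#include <cstring>

class String {
public:
    String();                    // default constructor: an empty string
    String(const char *s);       // copy of a null-terminated string
    String(const String &s);     // copy constructor
    ~String() { delete [] str; }
    const char *text() const { return str; }          // hypothetical accessor
    std::size_t length() const { return len - 1; }    // hypothetical accessor
private:
    String &operator=(const String &);   // assignment is not used here; left undefined
    std::size_t len;   // array length, including the trailing '\0' (assumed)
    char *str;         // first character of the array
};

String::String() : len(1), str(new char[1])
{
    str[0] = '\0';
}

String::String(const char *s) : len(std::strlen(s) + 1), str(new char[len])
{
    std::strcpy(str, s);
}

String::String(const String &s) : len(s.len), str(new char[len])
{
    std::strcpy(str, s.str);
}

// Hypothetical helper: one declaration per constructor.
bool constructors_agree()
{
    String s1("hello");   // String(const char *)
    String s2(s1);        // String(const String &)
    String s3;            // String()
    // String s4();       // would declare a function, not an object
    return std::strcmp(s2.text(), "hello") == 0 && s3.length() == 0;
}
```

The member initializers rely on len being declared before str, so len already has its value when new char[len] runs.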

Overloaded Member Functions
The String class in Listing 1 has only a single concatenation function,
void String::cat(char c);
that appends a single character to the end of a String. A useful String class should provide for concatenation of at least two other types of operands:
void String::cat(const char *s);
which appends a null-terminated string to a String, and
void String::cat(const String &s);
which appends one String to another. Listing 3 shows an implementation of the String class with overloaded cat functions, along with a simple test program that invokes each one.
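One way the three cat functions might share their work is sketched below. It is not the code from Listing 3: the private helper append and the accessor text are hypothetical, and len is assumed to count the trailing \0.

```cpp
#include <cstddef>
#include <cstring>

class String {
public:
    String(const char *s) : len(std::strlen(s) + 1), str(new char[len])
        { std::strcpy(str, s); }
    String(const String &s) : len(s.len), str(new char[len])
        { std::strcpy(str, s.str); }
    ~String() { delete [] str; }

    void cat(char c)          { append(&c, 1); }               // one character
    void cat(const char *s)   { append(s, std::strlen(s)); }   // null-terminated string
    void cat(const String &s) { append(s.str, s.len - 1); }    // another String

    const char *text() const { return str; }   // hypothetical accessor
private:
    String &operator=(const String &);           // not used here; left undefined
    void append(const char *s, std::size_t n);   // hypothetical helper
    std::size_t len;   // array length, including the trailing '\0' (assumed)
    char *str;
};

// Allocate a larger array, copy the old text and n new characters into it,
// then delete the old array.
void String::append(const char *s, std::size_t n)
{
    char *p = new char[len + n];
    std::memcpy(p, str, len - 1);
    std::memcpy(p + len - 1, s, n);
    p[len - 1 + n] = '\0';
    delete [] str;
    str = p;
    len += n;
}
```

A call such as s.cat('!') selects cat(char), s.cat("abc") selects cat(const char *), and s.cat(t) for a String t selects cat(const String &).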
You can declare the cat function that appends one String to another either with or without a reference parameter. That is, instead of
void String::cat(const String &s);
which passes the String by reference, you can use
void String::cat(String s);
which passes the String by value. In this case, the choice of signature changes neither the body of the function nor the form of a call to it. However, using a reference argument produces more efficient calls.
In C++, the semantics of argument passing are the same as the semantics of initialization. Each function call creates a new object for each formal parameter by initialization. This means that if the type of a formal parameter is a class with constructor(s), then the function call initializes that parameter by invoking one of those constructors. If the class also has a destructor, then the function return destroys the formal parameter object by invoking that destructor.
Thus, if you defined the cat function as
void String::cat(String s);
then each call of the form
s1.cat(s2);
uses the String copy constructor to initialize formal parameter s with a copy of s2. The return from cat discards s using the String destructor. These constructor and destructor calls add considerable overhead to each cat call. In contrast, passing a const reference argument simply binds the reference to the actual argument, avoiding both the constructor and destructor calls.
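The extra constructor call can be observed with a counter. In this sketch the static member copies and the functions length_by_value, length_by_reference and copies_made are hypothetical names, and free functions stand in for the two forms of cat to keep the example short.

```cpp
#include <cstddef>
#include <cstring>

class String {
public:
    static int copies;   // hypothetical counter of copy-constructor calls
    String(const char *s) : len(std::strlen(s) + 1), str(new char[len])
        { std::strcpy(str, s); }
    String(const String &s) : len(s.len), str(new char[len])
        { std::strcpy(str, s.str); ++copies; }
    ~String() { delete [] str; }
    std::size_t length() const { return len - 1; }
private:
    String &operator=(const String &);   // not used here; left undefined
    std::size_t len;   // array length, including the trailing '\0' (assumed)
    char *str;
};

int String::copies = 0;

// Pass by value: the parameter is initialized by the copy constructor
// and destroyed when the function returns.
std::size_t length_by_value(String s) { return s.length(); }

// Pass by const reference: the parameter is bound to the caller's object.
std::size_t length_by_reference(const String &s) { return s.length(); }

// Returns the number of copy-constructor calls made by one call.
int copies_made(bool by_value)
{
    String s("hello");
    int before = String::copies;
    if (by_value)
        length_by_value(s);
    else
        length_by_reference(s);
    return String::copies - before;
}
```

Each copy here also allocates and later frees an array from the free store, which is where most of the overhead comes from.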

Overloading And Return Types
The return type of a function is not part of its signature. Thus you cannot declare

double f(int);
char *f(int);
in the same scope. Overloading on return types places an undue burden on the compiler to determine the programmer's intent. Consider this example:

int n;
. . .
if (f(n))
    . . .
Since both double and char * expressions are meaningful as the controlling expression of an if, the compiler must make a difficult choice. While it's not impossible to define rules to disambiguate function calls based on return type, I suspect the rules would be either very complicated or unintuitive.

Overloading And Scope
All of the overloaded instances of a function with a given name must be declared at the same scope level. Declaring a function with that same name in an inner scope hides the overloaded declarations from the outer scope. The example in Listing 4 illustrates this behavior.
Listing 4 defines two functions named put at file scope, one that puts a character and another that puts a null-terminated string. However, main contains a declaration for only one of the put functions - the one that puts a character. The other put function is invisible inside main and the call
put("hello, world", stdout);
causes a compile-time error.
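The hiding at block scope can be reproduced in a few lines. This sketch is not Listing 4; it assumes both put functions take a stream argument and forward to stdio, and block_scope_demo is a hypothetical stand-in for main.

```cpp
#include <cstdio>

// Two overloads at file scope (bodies are assumptions for this sketch).
int put(int c, std::FILE *stream)         { return std::fputc(c, stream); }
int put(const char *s, std::FILE *stream) { return std::fputs(s, stream); }

int block_scope_demo()
{
    int put(int c, std::FILE *stream);   // redeclares only the character version
    // put("hello, world", stdout);      // error here: the string version is hidden
    return put('h', stdout);             // OK: the character version is visible
}
```

Removing the declaration inside the function makes both file-scope overloads visible again, and the commented-out call then compiles.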
C++ programs rarely redeclare functions at block scope, so the circumstances in Listing 4 seldom occur in practice. Listing 5 shows a more practical situation involving class scopes.
A class defines a new scope. The members of a class are in the scope of that class. If a class member function has the same name as a family of overloaded functions declared in the scope enclosing the class, then all of the overloaded functions are invisible in the class scope.
Listing 5 contains the definition of class File, with a member function put that writes a null-terminated string to a File. File::put uses one of the overloaded put functions declared at file scope to actually put the string. Unfortunately, all of those put functions at file scope are invisible inside the body of File::put. The C++ compiler thinks the call to put(s, f) inside File::put is a (recursive) call to File::put, but with the wrong number of arguments.
To avoid this error, use the scope resolution operator ::, as shown in Listing 6. When the compiler encounters a name prefixed with ::, it searches for that name in file scope, rather than in the current scope.
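A minimal sketch of the class-scope case follows. It is not the code from Listings 5 and 6: the constructor, the data member f, and the bodies of the file-scope put functions are assumptions of mine.

```cpp
#include <cstdio>

// Overloaded put functions at file scope (bodies are assumptions).
int put(int c, std::FILE *stream)         { return std::fputc(c, stream); }
int put(const char *s, std::FILE *stream) { return std::fputs(s, stream); }

class File {
public:
    explicit File(std::FILE *fp) : f(fp) { }   // hypothetical constructor
    int put(const char *s)
    {
        // return put(s, f);   // error: only File::put is visible, and it takes one argument
        return ::put(s, f);    // OK: :: makes the compiler look in file scope
    }
private:
    std::FILE *f;   // hypothetical member holding the stream
};
```

With this class, File out(stdout); out.put("hello, world\n"); writes the string through the file-scope put(const char *, FILE *).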

Type-Safe Linkage
In C++, overloaded functions defined in one compilation can be declared and referenced in another compilation. Therefore, the linker, as well as the compiler, must have the ability to resolve function references based on function names and signatures, and not just names. This means that all external function declarations in a program, not just the overloaded ones, must agree in name and signature with a function definition somewhere in the program.
Declaration matching across compilation units is called "type-safe linkage." If you inadvertently write the wrong declaration for a function defined in another module, you get a link error, because the linker won't find a function definition with the same signature as the declaration.
Most linkers still can't handle data typing information directly. Therefore, C++ often uses a technique called "name-mangling" to encode the function name and signature into a single long name. Ellis and Stroustrup [3] describe one such encoding scheme. Their scheme encodes the signature of
int put(const char *s, FILE *stream);
as something like
put__FPCcP4FILE
This is the name used by the linker to resolve external references.
Early C++ implementations actually displayed the mangled names in error messages for unresolved externals, forcing you to demangle the names yourself. Most current implementations automatically demangle the name, or provide a separate tool that does it for you.
Since return types aren't part of signatures, type-safe linkage isn't 100 percent safe. Nonetheless, type-safe linkage is an improvement over C, and another reason to consider stepping up to C++.

A Small Correction
In "Your First Class" (CUJ, May 1991), Listing 2 contains a definition for ln_seq::ln_seq(unsigned n) that shouldn't be there. Later in the article, I added this constructor to the class, so it should appear in later listings. However, since the constructor is not declared in Listing 1, Listing 2 won't compile unless you remove the definition.

References
[1] Stroustrup, Bjarne, The C++ Programming Language, 2nd ed., Addison-Wesley, 1986.
[2] Hansen, Tony, The C++ Answer Book, Addison-Wesley, 1989.
[3] Ellis, Margaret A. and Bjarne Stroustrup, The Annotated C++ Reference Manual, Addison-Wesley, 1990.