November 1992/Stepping Up To C++

Columns

Stepping Up To C++

Dynamic Arrays

Dan Saks

Dan Saks is the owner of Saks & Associates, which offers consulting and training in C and C++. He is secretary of the ANSI C++ standards committee, and also contributing editor for Windows/DOS Developer's Journal. Dan recently finished his first book, C++ Programming Guidelines, written with Thomas Plum. You can write to him at 393 Leander Dr., Springfield, OH 45504, or dsaks@wittenberg.edu (Internet), or call (513)324-3601.
In C++, as well as in C, array dimensions in array declarations must be constant integral expressions with positive values. A constant expression is an expression whose value can be determined at compile time, rather than at runtime.
A constant integral expression can only contain integral constants, enumerators, floating expressions cast to integral types, or sizeof expressions. A constant integral expression may include operators, but all the operands must themselves be constant. For example,
#define M 10
#define N 15
...
int *b [M];
// OK, 10 elements
float c[2 * N - 1];
// 0K, 29 elements
Sometimes I wish I could defer choosing a particular array dimension until runtime. However, C++ (like C) does not let me write

void g(int n) { double a[n]; // error ... }
because n is not a constant expression.
C++ extends the definition of constant expression to include const integral objects initialized with constant expressions. For instance, C++ allows

const int len = 100; char name[len]; // OK in C++ but not C
However, C++ still does not allow

void g(int m) { const int n = m; double a[m]; // error ... }
because, even though n is const, its initializer is not a (compile-time) constant expression.
Often when I need an array whose size may vary each time I run the program, I set the array size to be as large as I'll ever need. For example, the standard header stdio.h defines FILENAME_MAX as the size of an array large enough to hold the longest file name string used by the host system. Whenever I manipulate file names, I use arrays declared like

char fn[FILENAME_MAX];
I can get away with this because FILENAME_MAX is rarely very big, so I need not worry about wasting a lot of memory. But, this is not a general solution to the problem of allocating space for arrays of widely varying sizes. Often, there just isn't enough room to give every array its largest possible size.

Explicitly-Allocated Dynamic Arrays
If you really need an array whose dimension is determined at runtime, you can declare a pointer (to the element type of the array) and allocate the array using new. The size of an array allocated by new need not be constant. For example,

void g(int n) { float *a = new float[n]; assert(a != 0); ... }
allocates an array of n floats addressed through pointer a. The call to assert verifies that the allocation succeeded.
In some ways, explicit allocation is almost as convenient as declaring the array as an auto variable. Inside g, you can still refer to the ith element of a as a[i]. However, when you're done with the array, you must remember to call

delete [] a;
to recycle the array. But this delete expression can't recover the storage if it's skipped by executing longjmp. The program in Listing 1 illustrates the problem.
This program uses the Standard C setjmp and longjmp functions to recover from an error. The setjmp call in main marks the recovery point and returns zero. main calls g, which dynamically allocates array a, and then calls f. If f detects an error of some sort, it calls longjmp. longjmp does not return to f; it discards the stack frames for f and g and returns control to main. Inside main, the return from longjmp appears like a return from setjmp with a non-zero return value.
By skipping the normal return from g, the longjmp also skips the delete expression
delete [] a;
at the end of g. Since the longjmp discards a without deleting the storage it points to, that storage is lost forever — you get a memory leak. Someday, C++ programmers will be able to use the new exception-handling features (try, throw, and catch) instead of setjmp and longjmp. throw improves on longjmp by calling destructors for local objects as it discards stack frames, but even a throw won't invoke that delete expression to free a's storage.
Manually allocating and deallocating the storage for dynamic arrays is tedious and error-prone. C++ programmers typically solve this problem using dynamic array objects with destructors.

A Dynamic Array of float
Listing 2 shows the class definition for float_array, a minimal class for an array of floating-point values, that lets you set and change the number of elements at runtime. The data structure is similar to the one I used for variable-length strings in my last column (see "Stepping Up to C++: Initialization vs. Assignment", CUJ, September 1992). Each float_array has two data members: array is a pointer to the first element of the actual array allocated from the free store, and len is the number of elements in that array. Listing 3 shows the implementation for the non-inline float_array member functions. Listing 4 contains a small test program that does some simple float_array operations. Figure 1 is a sample output from that program.
The constructor
float_array(size_t n = 0)
constructs a float_array with a given number of elements. (Remember, size_t is an unsigned integral type, so the number of elements must be non-negative.) Since parameter n has a default value (= 0), this constructor can be called with no arguments, so it's also the default constructor. If n is zero, the constructor simply sets the length to zero and the pointer to null. Otherwise, it allocates an array and initializes all the elements with zero. The call to assert catches allocation failures. Figure 2 shows the representation of fa and fb declared as
float_array fa;
              // default n = 0
float_array fa(4);
             // explicit n = 4
The copy constructor
float_array(const float_array &fa)
constructs a float_array by copying fa. This constructor replaces the shallow copy that would have been generated by the compiler with a deep copy that completely replicates the structure of fa.
The destructor
~ float_array ()
simply deletes the allocated array. Although array may be null, you need not check because deleting null is legal and has no effect.
The assignment operator
float_array &operator=
   (const float_array &fa)
replaces the shallow copy that the compiler would have generated with a proper deep copy. I added the if statement
if (this == &fa)
   return *this;
at the beginning of the function (in Listing 2) to avoid copying an object on top of itself. fa is a reference, but & applied to the reference applies to the referenced object, not to the underlying pointer that implements the reference. So &fa is an expression of type "pointer to const float array" whose value is the address of the float_array object to be copied (the source object). this is the pointer to the destination of the copy. Hence, that conditional expression determines if the source and destination are the same object.
Strictly speaking, I don't need that first if statement in my operator=. If I omitted that if, the remaining function might spin its wheels for a while copying an object over itself, but it would still produce a valid result. However, that first if statement illustrates a useful technique for detecting when two objects are the same.
I'll skip operator[] for the moment. It's the subject of the next section.
float_array::length ()
simply returns the number of elements currently in the array. The class definition declares length as an inline function, but doesn't include the function body. The function body appears later in the header file. I prefer ot to clutter class definitions with the bodies of inline function members. However, inline member function definitions must appear later in the same header so the compiler can see the function body and substitute it at each call. Had I put the definition of float_array::length in fa1.cpp (Listing 3) instead of in fa1.h (Listing 2) , then the calls to float_array::length (Listing 4) would compile as non-inline calls.
The header also contains a non-member output operator
ostream &operator<<
   (ostream &os, const float_array &fa)
that writes a float_array to an ostream as a sequence of floating-point values separated by spaces and enclosed in braces. Note that operator<< is not a friend of float_array because it does not need access to any of float_array's private members.

Subscripting
The float_array member function

operator[] (size_t i)
overloads the [] operator so that subscripting on float_arrays looks very much like subscripting on other arrays. C++ translates the expression fa[i] into fa.operator[](i).
Unlike the built-in [] operator, float_array::operator [] is not commutative. For array name (or pointer) a, the expression a[i] is equivalent to *(a + i), *(i + a) and even i[a]. However, since float_array::operator[] is a member function (it cannot be a non-member), the left operand must be a float_array object. Thus, you cannot rewrite fa[i] as i[fa].
Operator overloading does not preserve the usual relationships between operators applied to non-class types. Thus you cannot write fa[i] as *(fa + i) unless you also overload * and + appropriately (if it's even possible).
Overloading operator [] gives the class designer the ability to detect and respond to subscripting errors. You can choose to ignore the error or you can catch it by calling

assert(i < len);
as in Listing 3. You can even choose to extend the array so that the errant subscript is no longer out of bounds.
operator[] returns a float & that refers to the selected float_array element. Why a reference? Why not a float or a pointer to a float? Well, suppose operotor[] returned just a float. A function call expression returning a float is an rvalue. This means you can use it on the right side of an assignment, as in
x = fa[i];
but not on the left, as in
fa[i] = y;
That's not acceptable, so try defining the function as
float *float_array::operator[] (size_t i)
   {
   assert(i < len);
   return &array[i];
   }
so it returns a float * pointing to the selected element. The returned pointer value is still an rvalue. However, you can dereference the pointer to get an lvalue for the selected element. You can use that lvalue on either side of assignment.
x = *fa[i];
copies the ith element of fa to x, and
*fa[i] = y;
stores y into the ith element of fa. But the extra * spoils the look and feel of the subscripting expressions.
A reference behaves like a pointer that's automatically dereferenced each time you use it. A function call returning a reference is an lvalue. When operator[] returns a reference, you can write
x = fa[i];
fa[i] = y;
and even
fa[i] = fa[j];
as if you were subscripting any other array type.

The const Qualifier
C++, like C, is designed for programming a very wide range of applications, from very large end-user applications to very small embedded control programs. C++ improves support at the high end with features like classes and operator overloading. It retains C's power at the low end with features like the const and volatile qualifiers.
The const qualifier indicates that the qualified object can't be changed, so the translator is free to place that object in read-only memory (ROM) or maybe even optimize the object out of existence. For example, given

const char name[] = "Ben";
a C++ (or C) translator may place name in ROM. Once you declare name as const, the translator must diagnose any attempt to modify it, as in

name[0] = 'b'; // error
C++ won't let you strip off the const qualifier without at least a little fuss. For example,

char *p: name; // error
is an error because p drops the const qualifier. Failure to catch this error would open the door to

*p = 'b';
which writes over the const object. Of course, you can strip the const qualifier using a cast, as in

char *p = (char *)name;
if you think you know what you're doing.
When you write a call like

strcpy(fullname, name);
how does the compiler know that strcpy won't change name? The second parameter in strcpy's declaration (in string.h) contains the appropriate const qualfier

char *strcpy(char *s1, const char *s2);
In effect, the const qualifier is a promise by strcpy that it won't change the characters addressed by s2. Obviously, strcpy can't make the same promise for s1, because strcpy's purpose in life is to write into the characters addressed by s1.
If you look through the Standard C headers, you'll see that it uses const qualifiers wherever possible to maximize library support for ROM-based applications. But most C++ programmers don't write ROM-based applications. Is there another reason to use const?
I've said before, and I'll say again, I don't enjoy debugging. I want my language translator (compiler and linker) to catch my errors for me. That's why I like the stricter type checking of C++. I prefer a "declarative" style of programming where I tell the translator as much as I can about my intent, so it can tell me when I violate that intent.
I view const as a statement of intent. Basically, I adopt the view that "everything that can be const should be const." I put a const qualifier on every pointer or reference parameter refering to an object that I don't intend to change. I want the compiler to poke me if I accidently try to change one of those objects.

const Member Functions
Let's apply this policy to the float_array class in Listing 2 and Listing 3 and the test program in Listing 4. The display function in Listing 3 is an non- member function that displays the name and value of a float_array in the form
name = { v1 v2 ... }
display changes neither of its parameters, so both should have a const qualifier. But, when you change the function heading to
void display(const char *s,
     const float_array &fa)
you get a diagnostic in the function body, saying something like "cannot bind non-const reference to const object" or "temporary used for parameter fa in call to operator<<(ostream &, float_array &)".
The problem is that display calls operator<< (declared in Listing 2 and defined in Listing 3) , passing fa (now a const float_array &) as the second argument. But the second parameter of operator<< is still a (non-const) float_array &. In other words, display promised not to change its float_array argument, but operator<< hasn't made the same promise. But, since operator<< doesn't change it's float_array argument either, it can easily keep the promise. Simply add a const qualifier.
You change the function heading to
ostream &operator<<(ostream &os,
         const float_array &fa)
in both the header and the source file. But, when you recompile Listing 3, you get at least two more diagnostics, this time in the body of operator<<. One diagnostic occurs at the call to fa.length, and the other occurs at the expression fa[i]. Both messages probably say something about calling a non-const member function for a const object.
Now that operator<< has promised it won't change fa, how does it know that fa.length and fa[i] won't change fa either. operator<< can't apply a float_array member function to a const float_array object unless that member function also promises it won't change the object. How does a member function state that promise?
The problem is that the float_array operand doesn't appear explicitly in a member function's parameter list. Every (non-static) member function has an extra parameter, known as this, that's a pointer to the object to which the function applies. Ordinarily, the type of this in a float_array member function is float_array *const. That is, a member function declaration like
size_t float_array::length()
translates into something more like
size_t float_array::length
   (float_array *const this)
As such, you cannot change this itself, but you can change the object it points to.
But operator<< wants an assurance that float_array::length won't change the float_array object. It needs the declaration for this to be
const float_array *const this
But, since you never declare this explicitly, where can you place the const? After the parameter list, as in
size_t length() const;
length is now a const member function. It promises not to alter the float_array object to which it applies. The compiler will flag any attempt to modify either data member array or len inside length.
You should also change operator[] to a const member function. You must be careful to change it both in the class definition and in the member function definition. Listing 5 shows the float_array class definition with length and operator[] as const member functions. Listing 6 shows the function definition for operator[] as a const member function.

Is const Worth the Trouble?
As you may have noticed, adding const qualifiers to arguments in existing C++ functions causes the need for consts to ripple through the entire program. Adding const in one function usually forces you to add const in another. And another. Many good C++ programmers just don't think it's worth the bother retrofitting const to existing C++ code. It's a hard to argue with them. It's much easier to apply const properly if you plan to use it from the start.

More to Come
If you rebuild the test program using the const member functions, everything compiles and links with no complaints. The resulting program produces the same output as before. But the operator[] in Listing 6 is an accident waiting to happen. It let's you write an expression, without a cast, that inadvertantly modifies a const float_array. And the compiler can't detect it.
I'll explain the cause and cure in my next column.