April 1994/Questions & Answers

Columns

Questions & Answers

Pointers and Arrays

Kenneth Pugh

Kenneth Pugh, a principal in Pugh-Killeen Associates, teaches C and C++ language courses for corporations. He is the author of All On C, C for COBOL Programmers, and UNIX for MS-DOS Users, and was a member of the ANSI C committee. He also does custom C/C++ programming and provides SystemArchitectonicssm services. His address is 4201 University Dr., Suite 102, Durham, NC 27707. You may fax questions for Ken to (919) 489-5239. Ken also receives email at kpugh@allen.com (Internet) and on Compuserve 70125,1142.
Q
I keep referring back to your article in CUJ, October '93, p.123 about "array of pointers to char" vs. "pointer to array of n chars." On page 124 near the bottom you declare
char *array_p_char[NUMBER_OF_STRINGS];
Now, I've had a very hard time allocating this array of pointers dynamically. Here's what I tried:

char * pc1 = new char; // returns char * char * pc2 = new char[n]; // returns char * char ** ppc3 = new char*; // returns char * * char ** ppc4 = new (char(*)); // returns char * * char * pc5 = new (char*)[n]; // returns char * (what's char * pc6 = new (char(*))[n]; // returns char * this?) char ** ppc7 = new char*[n]; // returns char * * (hurray!) char (** ppc8)[1] = new (char(*)[n]); // returns char[1] * * (huh?)
I can understand pc1, pc2, and ppc3 very well, but I begin to wonder at ppc4. Why does it need parentheses? Is it the same as ppc3?
What kind of data structure(s) are returned for pc5 and pc6? Are they the same?
ppc7 seems to be what I'm looking for? Is it?
What kind of data structure is ppc8? What is returned by new? If ppc3 and ppc4 are the same, why are ppc7 and ppc8 different?
Thank you for a very useful article and for any answers you can provide for this one obscure area.
For the sake of completeness, how do you allocate the array_p_char using malloc()?
Hans Salvisberg
Berne, Switzerland
A
My column on pointers seems to have generated a good deal of feedback. Let me review a couple of points before answering your specific questions.
If you wish to allocate memory for a set of characters, you can do it one of two ways — statically or dynamically. Suppose you need a set of SIZE_OF_STRING characters. Then you can use either of the following:

#define SIZE_OF_STRING 10 char char_array[SIZE_OF_STRING];
or

char *p_char_array; p_char_array = mal- loc(SIZE_OF_STRING * sizeof(char));
The sizeof(char) function is not absolutely necessary, since in all installations sizeof(char)==1 but I include it for completeness. Now if you want to allocate a number of these strings (say NUMBER_OF_STRINGS), you have three choices. First, you can statically allocate storage for everything. The code will look like this:

#define SIZE_OF_STRING 10 #define NUMBER_OF_STRINGS 5 char char_array[NUMBER_OF_STRINGS] [SIZE_OF_STRING];
Second, you can statically declare an array of pointers, and dynamically allocate the strings. For example:

char *array_p_char[NUMBER_OF_STRINGS]; for (i = 0; i < NUMBER_OF_STRINGS]; i++) { array_p_char[i] = malloc(SIZE_OF_STRING * sizeof(char)); }
(This code supplies the answer to your last question.) Third, you can dynamically allocate everything, which includes the array of pointers and all the strings targeted by these pointers. You might implement this approach as follows:

char **p_p_char; p_p_char = malloc(NUMBER_OF_STRINGS * sizeof(char *)); for (i = 0; i < NUMBER_OF_STRINGS]; i++) { p_p_char[i] = malloc(SIZE_OF_STRING * sizeof(char)); }
For the C++ crowd, let me paraphrase the prior malloc calls using new. The code to allocate a single array of chars becomes:

char *p_char_array; p_char_array = new char[SIZE_OF_STRING];
To create an array of pointers to chars, the code becomes:

char *array_p_char[NUMBER_OF_STRINGS]; for (i = 0; i < NUMBER_OF_STRINGS]; i++) { array_p_char[i] = new char [SIZE_OF_STRING]; }
To create a similar data structure bound to a double pointer, you can use the following:

char **p_p_char; p_p_char = new char * [NUMBER_OF_STRINGS]; for (i = 0; i < NUMBER_OF_STRINGS]; i++) { p_p_char[i] = new char [SIZE_OF_STRING]; }
Now I will return to your examples. Taking them in order:

char * pc1 = new char;
This statement dynamically allocates memory for a single char and stores a pointer to the allocated memeory in pc1. (See Figure 1a. )
char   * pc2      = new char[n];
This statement dynamically allocates memory for n chars and stores a pointer to the allocated memory in pc2 (Figure 1b) . This statement works just like the combined statements
char *p_char_array;
p_char_array = new char
            [SIZE_OF_STRING];
which I provided in a previous example.
char  ** ppc3     = new  char*;
char  ** ppc4     = new (char(*));
These statements perform identical allocation operations. The additional parentheses in the second statement do nothing. Both statements allocate memory for a char *, that is, for a pointer to char (Figure 1c) .
char   * pc5      = new (char*)[n];
char   * pc6      = new (char(*))[n];
These statements compile without errors — that sort of mystifies me; The new returns char *, but I expected it to return a char **, which would point to a memory block big enough for n pointers to char.
What is actually being allocated is unclear to me. I asked P.J. Plauger about this mystery. He suggested that there are some subtleties in the C++ standard that dictate when you need to write parentheses and when sometimes even parentheses won't do. Different implementors probably interpret the rules differently and may even vary in how well they get their own rules right. Perhaps a reader can come up with an explanation of what these allocations really do.
char  ** ppc7     = new  char*[n];
This statement allocates memory for n pointers to char (Figure 1d) . For each pointer to char, you should allocate memory in the following manner:
ppc7[i] = new char[m];
Your final example,
char (** ppc8)[1] = new (char(*)[n]);
throws us into an entirely different ballgame. (I could probably devote an entire book to all the possible combinations of *, (), and [].) You have declared ppc8 to be a pointer to a pointer. What is the second pointer pointing at? It is pointing at objects of type char[1], that is, array of 1 char. The new operator is going to allocate a pointer, pointing at char[n], that is, array of n char. new returns the address of a memory block long enough to hold a single pointer to something of type char[n].
I'll suggest a rule of thumb for declaring and allocating complex types. Make up a variable of the type you are trying to allocate. Assign the address of that variable to the type of pointer you are using. Compile it. If it works, then you have allocated the right type.
When you use [] and a size specifier with new, you are allocating an array of objects of the type you specified, not just a single object of that type. This size does not enter into the type specification. As a substitute for your example code, you could try:
char c;
char *p3;
char (*p4);
char (*pc8)[n];
char   * pc1      = &c;   // Okay
char   * pc2      = &c;   // Okay
char  ** ppc3     = &p3;  // Okay
char  ** ppc4     = &p4;  // Okay
char   * pc5      = &p4;  // n of these
                      // Compile error
char   * pc6      = &p4;  // n of these
                      // Compile error
char  ** ppc7     = &p3   // n of these
                      // Compile error
char (** ppc8)[1] = &pc8; // does not make sense
For those who might have been confused by the description of ppc8, let me provide a few simpler examples.
char *array_p_char[NUMBER_OF_STRINGS];
In this statement array_p_char is an array containing NUMBER_OF_STRINGS elements. What is each element in array_p_char? It is a pointer. What does the pointer point to? It points to a char.
If you add parentheses in the right place, as in:
char *(array_p_char[NUMBER_OF_STRINGS]);
then the meaning stays the same. (The resulting data structure is shown in Figure 2a. ) If you add them in another place, then the meaning changes (see Figure 2b) .
char (*p_array_char)[SIZE_OF_STRING];
I've changed the variable names here to reflect the new meaning. First and foremost p_array_char is a pointer. What does it point to? It points to arrays of length SIZE_OF_STRING elements. What is each element in such an array? Each element is a char. What would such a pointer be used to point to? Data objects which are compatible. For example:
char arrays_of_chars[NUMBER_OF_STRINGS][SIZE_OF_STRING];
p_array_char = &arrays_of_chars[0];
I use the address operator on the first element, rather than the name of the array to emphasize the correspondence. If you increment p_array_char, you will point to the next element in the array, i.e. arrays_of_chars[1].
If you need a variable that points to p_array_char, you would declare and initialize it as:
char (**p_p_array_char)[SIZE_OF_STRING] =
   &p_p_array_char;
Now you can see why managers get gray hairs when they try to read C code. I strongly recommend avoiding variables with more than one level of indirection. You rarely need those kind of variables anyway. Normally you can eliminate them by using inline functions or macros.

Templates
In a recent C++ class, my students asked a number of questions regarding templates. Although templates are part of the C++ draft standard, not all vendors support them. Templates are useful for creating packages of functions that support multiple data types; with templates, you don't have to resort to void pointers or similar items. To demonstrate the value of templates, I first show how you use a common C library function, qsort, without the benefit of templates. Then I show how to use templates to get an equivalent function but with simpler code.
qsort is declared as follows:

qsort(void * array, size_t number_elements, size_t size_of_element, int (*compare_function)(const void *,const void *));
To use qsort with an integer array, you need to supply a comparison function, such as:

int int_compare(const void * first, const void * second) { if ( * (int *) first < * (int *) second) return -1; else ( * (int *) first == * (int *) second) return 0; else return 1; }
You call the function within the argument list to qsort as in:

#define SIZE_OF_ARRAY 10; int int_array[SIZE_OF_ARRAY]; qsort(int_array, SIZE_OF_ARRAY, sizeof(int), int_compare);
(In the next column, I'll examine some of alternative function declarations. for the comparison function.)
Moving to C++, you can declare a class such as Date, for example:

class Date { public: compare(Date &); ... };
and a compare function such as:

Date_compare(const void *, const void *) { return (Date *) first -> compare( * (Date *) second); }
You could then call the qsort function as follows:

#define SIZE_OF_ARRAY 10; Date date_array[SIZE_OF_ARRAY]; qsort(date_array, SIZE_OF_ARRAY, sizeof(Date), Date_compare);
Now to simplify things you can declare the sort function as a template, as in:

template <class Type> sort(Type *array, size_t size) { // Definition of function // comparisons using Type::compare(Type &); }
You can now call this function with the following:

sort(date_array, SIZE_OF_ARRAY);
The template must appear in the source file in which this statement is used. The compiler will generate a corresponding function definition using Date as the value of Type.
Inside sort, the sizeof an element is not needed, because it is compiled for the type Date. To use this sort function, you must provide a compare member function for every type of class you attempt to sort.
What happens if you use this template in two source files that get linked together into a program? It is possible that the same function (e.g. sort<Date>) has been generated in both files. In this case, the linker better be smart. The linker must detect that both functions were generated from templates, use only one function, and not produce an error message. Alternatively, keep all template definitions in one and only one source file. Then the linker does not have to be so smart.

Using Macros for Practice
Using macros can help you understand templates, since they are somewhat similar in concept. For example, suppose you have two source files, sort.htp, and sort.ctp. The sort.htp contains

#define glue(a,b) a##b #define sort_function_name(class_name) \ glue(class_name,_sort) sort_function_name(class_name * array, unsigned_int size);
The sort.ctp file contains

#include "sort.htp" sort_function_name(class_name * array, unsigned_int size) { // Definition for sort }
To use the sort function for Dates, you can create two source files, datesort.h and datesort.cpp. Let datesort.h contain

#undef class_name #define class_name Date #include "date.h" #include sort.htp
and let datesort.cpp contain

#include "datesort.h" #include sort.ctp
You will eventually compile datesort.cpp and link it into your program. In these source files that call one of the sort functions, you simply include the datesort.h at the top, as in:

#include "datesort.h" #define SIZE_OF_ARRAY 10; Date date_array[SIZE_OF_ARRAY]; Date_sort(date_array, SIZE_OF_ARRAY);

Templates vs. Macros
You can use this macro technique with classes, such as containers. For example, instead of defining a template, such as

// File array.h template <class Type> class Array { // Definition of member functions Type get_member (int index); ... }; // All class definitions
and including it into another file, as in:

#include "array.h" Array<int> my_array;
You could set up the macro files as previously shown and use:

#include "intarray.h" int_Array my_array;
What are the advantages of macros over templates, if your current compiler supports templates? Not many. The major advantage is portability to compilers that don't support templates. The macro version of sort hides the implementation from the user. If you decide to change how sort works, you only need to compile a single file. Depending on how you design the template version of sort, you may have to compile all files which use the template. The major disadvantage to macros is having to create the header and implementation files for each class that uses a sort function.
What are the advantages of using a sort template over the generic qsort function? Type checking. And that's a big plus for C++.