July 1994/Questions & Answers

Columns

Questions & Answers

When to Use Pointers

Kenneth Pugh

Kenneth Pugh, a principal in Pugh-Killeen Associates, teaches C and C++ language courses for corporations. He is the author of All On C, C for COBOL Programmers, and UNIX for MS-DOS Users, and was a member of the ANSI C committee. He also does custom C/C++ programming and provides SystemArchitectonicssm services. His address is 4201 University Dr., Suite 102, Durham, NC 27707. You may fax questions for Ken to (919) 489-5239. Ken also receives email at kpugh@allen.com (Internet) and on Compuserve 70125,1142.
Last month, a reader asked a question concerning when to use and when not to use pointers. In that issue, I listed three operations that call for use of pointers: calling by reference (calling by address), increasing speed of execution, and accessing allocated memory. I presented several examples of using pointers as function parameters to implement call by reference. This month, I'll continue with my answer, by covering the other two operations requiring pointers, increasing execution speed and accessing allocated memory.

Increasing Execution Speed
Last month I suggested using array notation for parameters which are arrays.
For example:

void zero_an_array(int array[], int size_of_array);
Note that the declaration of array with array notation does not prevent me from accessing array with pointer notation in the body of the function. In my initial design I normally code the body of a function with an array parameter and an index. For example:

void zero_an_array(int array[], int size_of_array) { register int i; for (i = 0; i < size_of_array; i++) { array[i] = 0; } return; }
Only if the function is too slow do I consider rewriting it using pointers. Since a parameter which uses array syntax is the same as a pointer, I don't need to change the function's header. For example, I could rewrite the function as follows.

void zero_an_array(int array[], int size_of_array) { register int i; for (i = 0; i < size_of_array; i++) { *array = 0; array++; } return; }
This form eliminates the index computation and replaces it with pointer arithmetic, which is usually faster. To make this function even faster, I might write:

void zero_an_array(register int array[], register int size_of_array) { while (size_of_array--) { *array++ = 0; } return; }
The register keyword indicates that you want the parameter to be retained in fast register which the funciton executes, if possible.
If I am still concerned with performance, I can eliminate the overhead of calling a function by turning this operation into a macro, as in the following:

#define zero_an_array(array, size_of_array) \ { \ register int i = size_of_array; \ register int * p_array = array; \ while (i--) \ { \ *p_array++ = 0; \ } \ }
As a general rule, I do not code specifically for speed. Careful program design can affect execution time as much as playing games with pointers. With the use of modern computers, the speed advantage of using pointers over using array indices is decreasing. In many algorithms, the time spent in other forms of calculations overwhelms the time spent in index calculations.
In a recent CUJ article Dwayne Phillips reviewed the book C Elements of Style by Steve Qualline (CUJ, March 1994, p. 115). This book suggested that the following form:

*destination = *source; destination++; source++;
is more readable than this form:

*destination++ = *source++;
I feel that this style issue is becoming obsolete. You can replace the above forms with strcpy(), or memcpy(), or other equivalent functions, which are probably as fast, if not faster. For example, the function above could be coded as:

#define zero_an_array(array, size_of_array) \ memset(array, 0, size_of_array)

Accessing Allocated Memory
You must use pointers to access allocated memory. That's because the allocation functions return only the address of a requested block of memory. If I need to allocate only a single structure, then I use a pointer to reference it, as in the following example:

struct s_something { int a_member; ... }; struct s_something *p_something; p_something = calloc(sizeof(struct s_something), 1); ... p_something->a_member = 0;
Note that with structure pointers, you can use the pointer to member operator (->), so the indirection operator does not appear in the code. (*p_something).a_member would look a little clumsy anyway.
If my design calls for an array of data objects, I typically start with a declared array. For example:

#define SIZE_ARRAY 100 int an_array[SIZE_ARRAY]; //... int i; for (i = 0; i < SIZE_ARRAY; i++) { an_array = i; }
To transform the array from automatic or global memory to allocated memory, I just change the way an_array is declared, and perform an allocation and deallocation. None of the references to the variable have to change. For example:

#define SIZE_ARRAY 100 int *an_array; int i; an_array = calloc(sizeof(int), SIZE_ARRAY); //... for (i = 0; i < SIZE_ARRAY; i++) { an_array[i] = i; } ... free(an_array);
For some C wizards, this intentional avoidance of * in code may seem un-C-like. I have found in teaching and using the language that using syntax more closely equivalent to other languages cuts down on coding mistakes and maintenance time. As I tell my classes, pointers are like fire. If you know what you're doing with them, you can really cook. If you don't, you're gonna get burned.

Floating-Point Precision Problems
Q
Hi Ken. You may or may not remember me from a few years back. You were teaching a private C class for Chubb Life in Chattanooga, TN.
I have had some programmers (my users) complain that float/double math causes them some headaches when working with dollars and cents calculations. I seem to remember your writing an article in CUJ about this problem. You know, the one where a calculation that should render an answer of, say, 1.342989 comes out to be something like 1.342988993. These guys come from the mainframe world where packed decimal arithmetic doesn't give them this problem. You can imagine the flack I take about this. I would be eternally grateful if you could reveal the secret or at least point me to a book/magazine/scroll/etc. that has the answer.
Many thanks.
John Lee
Chattanooga TN
A
Thank you for remembering me. For the benefit of readers who've never worked on mainframes, I should probably explain that the packed decimal representation is more commonly available on these machines mainly because of the languages they support, particularly COBOL. Your problem comes from differences in the way fractions are represented in binary and decimal systems. If all you dealt with were whole numbers (that are not too big), the problem would not arise. Using fractions in programs presents two difficulties: The first difficulty is that no representation can store all possible fractions with complete accuracy; a particular representation will round or truncate some fractions and store others without error. To make matters worse, the set of "accurately representable" fractions will be different for two different systems of representation. The second, related, difficulty is that calculations performed in each representation system will give different results due to each system's unique method of rounding or truncation.
COBOL typically stores numbers and performs computations in PIC format, which is a form of decimal or packed decimal. Each byte or half-byte stores a decimal digit. Floating-point (COMP-2) numbers store values using binary representation. A decimal fraction such as 0.123 cannot be represented exactly in binary floating point.
If you performed all your calculations with COMP-2 variables rather than PIC variables, you would get the same end-of-fraction differences that you get using double variables in C.
In complex calculations the final result of using the above two forms may differ in the last digit. But even two different decimal arithmetic systems can produce inconsistencies for the same calculations if you used different rounding techniques in each or different numbers of fractional digits for intermediate results.
There are a number of ways of dealing with these problems. One approach is to write a set of floating-point routines which perform rounding of numbers at a given decimal digit. An example of a routine that converts a double to a string is shown in Listing 1.
Alternatively, you could buy or create a set of routines that operated on numbers represented by BCD or ASCII strings. You could do this in C, but the functions might be easier to use if they were part of a C++ class. Here's a sample header for such a class:
class Rounded_number
   {
public:
   // Constructors
   enum Round_type { ROUND_UP, TRUNCATE};
   Rounded_number(double value, int decimal_digits, Round_type round_type);
   Rounder_number(String value, int decimal_digits, Round_type round_type);
   // Operators
   Rounded_number operator + ( Rounded_number & one, Rounded_number & two);
   Rounded_number operator = ( Rounded_number & other);
   // Conversions
   to_char_array(char output[]);
   //...
   };
The usage of this class might look like:
a_function()
   {
   Rounded_number n1(3.2, 1, ROUND_UP);
   Rounded_number n2("6.43", 2, TRUNCATE);
   Rounded_number n3(6.4, 1 ROUND_UP);
   n3 = n1 + n2;
   char out[100];
   n3.to_char_array(out);
   }
The enum Round_type would determine how to store a value into the variable when the assignment statement was employed. Using intermediate variables with the same number of fractional digits as used with COBOL PIC variables should help match the COBOL computations.

Feedback on Pointers
In a previous column, I presented a question from Hans Salvisberg regarding pointers and the new operator in C++. In particular, two examples he asked about compiled without error. These were:

char * pc5 = new (char*)[n]; /* Case 5 */ char * pc6 = new (char(*))[n]; /* Case 6 */
The compiler's lack of complaint seems to imply that new returns char *. I expected it to return a char **, which would have pointed to memory big enough for n pointers to char. P.J. Plauger suggested that the semantics of parentheses had not been completely determined by the standards committee.
Some readers have sent in comments. Aarne Y Rotiala of Espoo, Finland writes:
"Well, I'll say that the compiler is simply broken. I tried to repeat the (very suspicious) allocation. Here's what I tried on a Sun Sparc 10, running SunOS4.1:

#include <stdio.h> int main( int argc, char * argv[]) { char *p0, * p1, * p2, * p3, * p4; p0 = new char [10]; p1 = new char * [10]; p2 = new (char (*)) [10]; p3 = new (char *) [10]; p4 = new (char (*)) [10]; // empty body }
It returned these messages:

alloc.cc: In function 'int main(int, char **)': alloc.cc:7: assignment between incompatible pointer types alloc.cc:8: warning: ANSI C++ forbids array dimensions with parenthesized type alloc.cc:8: assignment between incompatible pointer types alloc.cc:9: warning: ANSI C++ forbids array dimensions with parenthesized type alloc.cc:9: assignment between incompatible pointer types alloc.cc:10: warning: ANSI C++ forbids array dimensions with parenthesized type alloc.cc:10: assignment between incompatible pointer types
So, g++, in which I usually trust, especially when splitting hairs about the ANSI standard, agrees with you — the text should not compile."
Burt Smith of Jeffersonville, PA writes:
"I'm working with Sun C++ 2.1 (old stuff, I know), and case #6 won't compile for me — not that it is particularly relevant, since it is essentially the same as #5.
Anyway, what might clear this up is (ulp!) another set of parentheses:

char* pc5 = (new (char*))[n]; char* pc6 = (new (char(*))[n];
What appears to be happening is that the compiler is allocating a single pointer to char, but then indexing into the one-deep array of pointers to the nth element. This may compile, but it sure wouldn't run too well.
This is how I think the compilers are interpreting it. Arguably, this may be an incorrect interpretation on the compiler's part; I would think that the following would be more appropriate:

char* pc5a = new ((char*)[n]); // Error: should be char** pc5a. char* pc6a = new ((char(*))[n]); //Error: should be char** pc5b.
(Incidentally, char* pc5a = new ((char*)[n]); will not compile for me either, but char* pc5a = new (char*[n]); will.)
Without any parentheses the following is incorrect:

char* pc5b = new char*[n]; // Error! char* pc6b = new char(*)[n]; // Error? I can't test this one.
Basically, this is dependent upon how the compiler deals with the interaction of the vector new vs. regular new, and what the precedence is. Obviously my first example caused the compiler to see the new operator as having higher precedence than the [] operator, while without parentheses, the [] had a higher precedence, causing vector new to be called. Go figure."
I thank Aarne and Burt for their replies. I would agree with g++ and designate such constructions as invalid. A technical explanation for accepting the construction in the standard may exist. But if several programmers cannot determine what the declaration really means without a lot of deep thought, then I would prefer disallowing it in a company standard.