June 1994/Questions & Answers

Columns

Questions & Answers

Moving from FORTRAN to C

Kenneth Pugh

Kenneth Pugh, a principal in Pugh-Killeen Associates; teaches C and C++ language courses for corporations. He is the author of All On C, C for COBOL Programmers, and UNIX for MS-DOS Users, and was a member of the ANSI C committee. He also does custom C/C++ programming and provides SystemArchitectonicssm services. His address is 4201 University Dr., Suite 102, Durham, NC 27707. You may fax questions for Ken to (919) 489-5239. Ken also receives email at kpugh@allen.com (Internet) and on Compuserve 70125,1142.
Q
I am new to C programming after being a Fortran programmer for 25 years.
I am interested in writing nested for loops in C, since DO loops can be nested in Fortran. I have been successful as the attached example (Listing 1) seems to indicate. My major problem, the example not withstanding, is that I am not certain of the scope of the braces. I have written the loop several ways with different results. I have checked the Annotated ANSI C Standard by H. Schildt and find no discussion of the action of the scope of braces on the for statement's execution.
I would appreciate any help. I enjoy reading your column.
Jason Sachs
Lakewood, NJ
A
I used to program a fair bit in FORTRAN. I always liked it better than BASIC. If FORTRAN had had structures when I was using it (in the 70s), I might never have switched to C. Before I answer your question, I will first give an example of loops written in both FORTRAN and C. Suppose you want the sum of integers from 1 to 50. In FORTRAN, you could write this as either:
   ISUM = 0
   DO 10 I = 1, 50
   ISUM = ISUM + I
10 CONTINUE
or
   DO 10 I = 1, 50
10 ISUM = ISUM + I
In the second example, the statement label which terminates the DO loop appears on an executable line of code, so that line of code is included in the loop. I always preferred the style shown in the first example. The appearance of the statement label after the loop (with CONTINUE) acted as an "ENDDO" in my mind. The corresponding loop in C could look either like the following:
int i, sum;
sum = 0;
for (i = 1; i <= 50; i++)
   {
   sum= sum + i;
   }
or like this:
int i, sum;
sum = 0;
for (i = 1; i <= 50; i++)
   sum = sum + i;
Now to answer your question about the scope of for loops: The for statement always controls a single statement. The controlled statement can either be a compound statement, as in the first example, or a simple statement, such as in the second example. You can use a compound statement (a set of statements enclosed in braces) anywhere you use a simple statement; thus a compound statement will also serve as the body of the while, if, and do-while statements. I strongly prefer the compound statement for both appearance and consistency. You can also rearrange the for loop and replace the controlled statement with a null statement. A null statement, as the name implies, contains no code. In the following for loop, the semicolon immediately following the right parenthesis terminates a null statement.
int i, sum;
sum = 0;
for (i = 1, sum = 0; i <= 50; sum = sum + i, i++);
Note that you can combine expressions in the for control portion with the comma operator, as shown by the sequences i=1, sum = 0, and sum = sum+i, i+t. I find this version potentially confusing, especially for someone coming from FORTRAN, but you will see it in some textbooks. If you write a for loop that controls a null statement, I suggest using extra braces, as in:
for (i = 1, sum = 0; i <= 50; sum = sum + i, i++)
   {
   ;
   }
In this instance, the for statement controls a compound statement which consists of the null statement. This form typographically emphasizes that the real work is being performed in the control portion of the for statement.

Nesting Loops
A nested loop in FORTRAN might look like either

ISUM = 0 DO 20 J = 1, 2 DO 10 I = 1, 50 ISUM = ISUM + I 10 CONTINUE 20 CONTINUE
or

ISUM = 0 DO 10 J = 1, 2 DO 10 I = 1, 50 ISUM = ISUM + I 10 CONTINUE
or

DO 10 J = 1, 2 DO 10 I = 1, 50 10 ISUM = ISUM + I
Each of these fragments executes the inner loop 50 times and the outer loop twice. For the same reasons as before, I prefer the first version. In C, you could code these nested loops as either:

int i, j, sum; sum = 0; for (j = 1; j <= 2; j++) { for (i=1; i <= 50; i++) { sum = sum + i; } }
or

int i, j, sum; sum = 0; for (j = 1; j <= 2; j++) { for (i=1; i <= 50; i++) sum = sum + i; }
or

int i, j, sum; sum = 0; for (j = 1; j <= 2; j++) for (i=1; i <= 50; i++) { sum = sum + i; }
or

int i, j, sum; sum = 0; for (j = 1; j <= 2; j++) for (i=1; i <= 50; i++) sum = sum + i;
In the first two examples, the initial (outer) for statement controls a compound statment. In the last two examples, the initial for statement controls another for statement. I prefer the first example's style for clarity and consistency.
Your listing does not need that last set of braces (which enclose the second printf). In keeping with my previous suggestions, I would rewrite your example as shown in Listing 2. Note that the bracing and indentation more clearly reflect the flow of the loops.
You may see code with pairs of braces that are aligned underneath the controlling statement, as opposed to their being aligned with the controlled statements. According to my informal surveys of C programmers, there is an approximately equal split between the "innies" and the "outies." However, I've encountered only a small minority of programmers who prefer the Kernighan & Ritchie style of placing the opening brace on the same line as the controlling statement.

Pointers
Q
In the C class I took from you, you emphasized not using pointers. Every C book I read seems to spend a good deal of time on pointers. Can you give me some examples of when I should use pointers and when I should not?
Dan Williams
New York, NY
A
I estimate that pointer misuse and abuse is the source of well over half of the tough bugs in C programs. Therefore, I suggest that programmers use pointers as little as possible.
Let me review three occasions when it is either desirable and/or necessary to use pointers or pointer notation. The first occasion is when you implement a call by reference. The second is when you access allocated memory. The third is when you want to speed up your code. This month, I'll describe the first occasion.

Call by Reference
Suppose you write a C program which declares and initializes a certain variable v. Later, that program calls a function f, with v as a parameter. If f needs to change the contents of v (and not just change its local copy of v) you must use a makeshift form of call by reference. In other languages, such as Pascal, calling by reference means that the compiler implicitly passes an argument's address to the function. The function can "get at" the argument because it knows the argument's address. On the other hand, C employs only call by value. C functions get only a copy of the argument's value — the compiler does not provide the address. So to simulate a call by reference in C, you must explicitly pass the address of a variable. (To avoid confusing this concept with the C++ reference type, I will refer to this as call by address.) The following code uses call by address:

function_changing_parameter(int * p_change); /* prototype */ int variable_to_change; int i; i = function_changing_parameter(&variable_to_change);
When a function uses an address (i.e., pointer) parameter, I usually like to create a local variable within the function to associate with that parameter. If you use a number of pointer parameters, local variables can make expressions appear simpler by eliminating extra indirection (dereferencing) operations. At the beginning of the function I dereference the passed address and store a copy of the dereferenced value in the local variable. Within the function I manipulate the contents of the local variable, rather than repeatedly dereference the funcion's parameter. At the end of the function I store the contents of the local variable back into the location specified by the parameter. To illustrate, let me first show how the above function might typically be coded:

int function_changing_parameter(int * p_change) { int return_value; int i, j; i = (*p_change * 10) / (*p_change + 3); /* Other manipulations with *p_change */ ... *p_change = j * 10; return return_value; }
Using a local variable for the parameter would make the function look like:

int function_changing_parameter(int * p_change) { int return_value; int i, j; int local_change = *p_change; i = (local_change * 10) / (local_change + 3); /* Other manipulations with local_change */ ... *p_change = local_change; return return_value; }

Caveats
Be careful when you declare function parameters as pointers. Pointer parameters allow you to do things inside of functions that you shouldn't. For example, consider the following function:

void a_function(int * p_first, const int * p_second) { /* What you can do */ *p_first = *p_second; /* What you cannot do */ p_second = p_first; /* What you should not do */ p_first = p_second; }
As the comments note, you can accidentally assign the value of one pointer to another pointer. To avoid this potential error, make the pointers themselves const. For example, you could rewrite the function as follows:

void a_function(int * const p_first, const int * const p_second) { /* What you can do */ *p_first = *p_second; /* What you cannot do */ p_second = p_first; p_first = p_second; }

Hiding Ugly Pointers
You can eliminate the appearance of pointer parameters in many cases by using array notation. The array notation for a parameter is a pointer in disguise. I have found in teaching C that students easily understand the array notation for parameters, but have difficulties comprehending the corresponding pointer notation. An array parameter can be naturally interpreted as a call by reference without involving any additional operators. I contend that given a choice of equivalent forms, the one that is easier to understand will cause fewer errors.
For example, suppose you had a function that zeroed the elements in an array. It might be called as in the following:

#define SIZE_MY_ARRAY 10 int my_array[SIZE_MY_ARRAY]; zero_an_array(my_array, SIZE_MY_ARRAY);
The corresponding function prototype could look like either:

void zero_an_array(int array[], int size_of_array);
or

void zero_an_array(int * array, int size_of_array);
In the ANSI standards process some participants expressed opinions that array notation should be reserved for passing an array by value. I can sympathize with this idea. It is consistent with the way other data types are passed. If C ever adopted passing of arrays by value, a tremendous amount of code would have to be changed — someone would have to do a global search and replace of all those square brackets. I do not believe this is going to happen. Therefore I prefer to use array notation when the parameter is really a reference to an array and pointer notation only if the parameter represents the address of a single data element.

What About C++ References?
When we switched from older languages to C, we lost something: a simple, clean way to do call by reference. C++ restores this capability, although not in exactly the same way as say, FORTRAN. To pass parameters by reference in C++, you can use a bona fide reference parameter, rather than a pointer. For example, the calling program might include the following:

int function_changing_parameter(int & change); int variable_to_change; int i; i = function_changing__parameter(variable_to_change);
The C++function would look like:

int function_changing_parameter(int & change) { int return_value; int i; i = (change * 10) / (change + 3); /* Other manipulations with change */ change = i; return return_value; }
This function parallels the previous C function:

void a_function(int & first, const int & second) { /* What you can do */ first = second; /* What you cannot do */ second = first; }
C++ reference parameters provide fewer ways for you to go wrong since you don't use the indirection operator in accessing the parameters.