October 1995/Stepping Up To C++

Stepping Up To C++

Style and Syntax

Dan Saks

Dan Saks is the president of Saks & Associates, which offers consulting and training in C++ and C. He is secretary of the ANSI and ISO C++ committees. Dan is coauthor of C++ Programming Guidelines, and codeveloper of the Plum Hall Validation Suite for C++ (both with Thomas Plum). You can reach him at 393 Leander Dr., Springfield OH, 45504-4906, by phone at (513)324-3601, or electronically at dsaks@wittenberg.edu.
Programming style is a very broad topic. You can approach it from many different angles. For many of us, our initial forays into programming style tended to focus on its more superficial aspects, such as indenting and spacing between tokens. But as we matured, most of us came to realize that the surface appearance of a piece of code is not necessarily an accurate indicator of its overall quality. The component-level architecture of a piece of code, particularly its use of types and functional abstractions, probably has a much greater impact on its overall quality. The best works on C++ programming style, such as [1] and [2], dwell primarily on these architectural issues.
Nonetheless, even as we acquire a more comprehensive view of style, those of us who like to think we have a sense of aesthetics still have trouble ignoring code layout (indenting and spacing) as a first-order measure of code quality. Obviously, if the code is indented the way I think it should be, then it must be pretty good, right? And if it's indented in some other way then it's probably guilty of a host of other less obvious programming sins. We rarely say such things aloud, but we can't help but think them, if only occasionally.
How is it that we latch on to our preferred layout style? For some programmers, it's simply a matter of imprinting. Just as a baby duck fixes its attention on the first thing it sees when it emerges from its shell, some programmers become attached to the first indenting style they see. Others with more inquiring minds experiment with different indenting styles until they finally hit upon the "right" way to lay out code. However we acquire them, most of us settle into habits that are hard to break.
Why should you want to break your habits? If you work alone all the time, then you have no need. But if you work with others, you'll probably find it in your own interest to subordinate some of your aesthetic senses for the greater good of the team. In a team project, choosing some consistent layout style is much more important than choosing any one particular layout style. The team members should try to agree on a sufficiently common set of layout rules so that each member can comfortably read and maintain other team members' code without feeling compelled to reformat it.
So, if layout style is not all that important, why bother writing about it? Well, let's face it, it's fun. After writing about standards deliberations for the last six or seven months, I'm in the mood for lighter fare. Also, recent conversations with some of my students and colleagues suggest to me that I might actually have something insightful to say about the subject.
My concern is that, even if the programmers on a given project think they agree on a particular set of layout rules, deep down, they probably don't. The programmers may follow the letter of the law, but they often fail to grasp its spirit, because no one ever states what that spirit is.
Every specification I've seen for a set of indenting rules (including the ones in my own book [3]) simply presents the rules by example. If the examples spell out enough common cases, the programmers on a team should be able to apply these examples to write code in a reasonably common format. However, I suspect many such programmers still view each layout example as a distinct rule, rather than as an example of a unified view of language structure.
What's missing from indenting rules by example is a description of how the rules derive from the syntactic structure of the programming language. What I want to show is that any consistent set of indenting rules can be expressed as a short set of underlying rules rooted in the language syntax. Moreover, I believe that each different set of indenting conventions stems from a fundamentally different view of that syntax.
In discussing the different indenting styles, I'll try to be as even-handed as I can, but some of my biases will no doubt show through. Obviously, I have a preferred indenting style, and I think I have a rational basis for my preference. I find other styles acceptable, but the arguments in their favor are less compelling to me. Many of my preferences stem from my strong preference for simple, straightforward rules over more complicated rules. I dislike special cases, and I try to avoid them.

The Structure of Flow Control Statements
With respect to flow control statements, mainstream programming languages employ one of two basic syntactic forms. Languages such as C and C++ use the compound statement form, in which each flow control statement controls what appears in the grammar as only a single statement. However, you can replace any single statement with a bracketed sequence of statements called a compound-statement.
For example, the grammar rule for an if-else statement in both C and C++ is

if-else-statement:
   if ( expression ) statement else statement
Here, statement is a grammatical construct representing any of a variety of statements, including expression-statements (assignments and function calls), iteration-statements (for and while statements) and selection-statements (if and switch statements). You can place more than one simple statement under the control of the if-part or the else-part by simply wrapping those statements inside a pair of curly braces to make a single compound statement. The C++ grammar defines a compound-statement as:

compound-statement:
   { statement-seq_opt }

statement-seq:
   statement
   statement-seq statement
The C standard uses slightly different names for the grammar symbols, but the rules are effectively the same.
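The grammar above can be seen at work in a minimal sketch. The function names here (order and clamp_low) are hypothetical, invented only for illustration, and the layout is just one of the styles discussed later in this article. In clamp_low the if controls a single expression-statement; in order it controls a single compound-statement that happens to contain three statements.

```cpp
// Hypothetical helpers; only the statement structure matters here.

// The if controls exactly one statement: an expression-statement.
int clamp_low(int n, int floor)
    {
    if (n < floor)
        n = floor;
    return n;
    }

// The if still controls exactly one statement, but that statement is
// a compound-statement: { statement-seq }.
void order(int &lo, int &hi)
    {
    if (lo > hi)
        {
        int tmp = lo;
        lo = hi;
        hi = tmp;
        }
    }
```

As far as the grammar is concerned, both if statements have the same shape. The braces in order belong to the nested statement, not to the if.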
The compound statement form originated with Algol 60. Algol 60 uses the keywords begin and end instead of C's braces. Pascal also uses begin and end; PL/I uses DO and END.
Languages such as Ada, Fortran, and Modula-2 use self-delimiting control flow statements rather than compound statements. Using this approach, each flow control statement controls not just one statement but a sequence of statements delimited by a symbol (or symbols) belonging to the flow control statement itself.
For example, an Ada if-else statement has the general form:

if expression then
   statement-seq
else
   statement-seq
end if ;
Each Ada flow control statement has its own unique delimiter. For example, loop statements end with end loop and case statements end with end case. Algol 68 uses the opening delimiter spelled backwards as the closing delimiter. For example, Algol 68 if statements end with fi, do statements end with od, and case statements end with esac. Some of you may recognize these cutesy spellings from the UNIX Bourne and Korn shells, which use fi and esac (though they close their loops with done rather than od).
In contrast to the languages mentioned above, Modula-2 uses a single keyword, END, to terminate its CASE, IF, FOR, WHILE, and WITH statements. (Modula-2 keywords must be in uppercase. That's one of the reasons Modula-2 is so popular.)
No matter how you spell them, in these languages each closing delimiter is part of the enclosing flow control statement. You must write the closing delimiter even if the enclosed statement sequence has only one statement.
It's not my intent to debate the relative merits of languages designed using compound-statements compared with languages that use self-delimiting control statements. You can find such discussions in [4] as well as other books on programming language design. Nevertheless, some comparisons will be inevitable in the following discussion.

Indenting Flow Control Statements
The compound statement forms seem to invite a wider variation in indenting styles than do the self-delimiting forms. People just can't seem to agree on where to place the compound-statement delimiters (curly braces or begin-end pairs). The self-delimiting forms don't use compound-statement delimiters, so the problem just never arises.
For example, most Ada programmers seem to follow the pattern

if expression then
   statement;
   statement;
   ...
else
   statement;
   statement;
   ...
end if;
for if-else statements, and

while expression loop
   statement;
   statement;
   ...
end loop;
for while-loop statements. The Ada syntax also allows infinite loops of the form:

loop
   statement;
   statement;
   ...
end loop;
Thus, some programmers prefer to pair end loop with loop rather than with while, so they lay out their while-loops as

while expression
loop
   statement;
   statement;
   ...
end loop;
If you use this style, I believe it follows that you should also place the keyword then on a separate line, as in

if expression
then
   statement;
   statement;
   ...
else
   statement;
   statement;
   ...
end if;
Listing 1 and Listing 2 summarize these two alternative indenting styles for self-delimiting flow structures. Now, with this background in mind, let's look at some alternative indenting styles for C and C++.

The K & R Style
One popular indenting style comes from Kernighan and Ritchie (K & R), and was used in their best-selling description of C [5]. Listing 3 shows examples illustrating the K & R style. Straker [6] also calls this the "Kernel" style.
I believe the underlying philosophy of this style is to "get the braces out of the way" and let the indenting speak for itself. I must confess I have a little trouble with this view, because it runs contrary to the true syntactic structure of C. This style suggests that C should have been designed using self-delimiting flow structures rather than compound statements. If you really believe this, then why not go all the way and define macros to make C look like it has self-delimiting flow structures?
For example, if you define
#define IF if (
#define THEN ) {
#define ELSE } else {
#define ENDIF }
then you can write C code that looks like
IF expression THEN
   statement;
   statement;
   ...
ELSE
   statement;
   statement;
   ...
ENDIF
I'm not suggesting that you use macros to reinvent the language. On the contrary, it's a very bad idea. It invites even more diversity in layout styles and more disputes over aesthetics. However, I think such macros are a short logical step from the K & R style, and that's why I have misgivings about this style.
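To make the point concrete, here is a minimal sketch of a complete function written with those four macros. The function name larger is hypothetical, and the example is an illustration of the hazard rather than a recommendation. After preprocessing, the body of the function is an ordinary if-else with braces around both branches.

```cpp
// For demonstration only; the article advises against such macros.
#define IF if (
#define THEN ) {
#define ELSE } else {
#define ENDIF }

// Hypothetical function: returns the larger of a and b.
int larger(int a, int b)
{
    int result;
    IF a > b THEN       // expands to: if ( a > b ) {
        result = a;
    ELSE                // expands to: } else {
        result = b;
    ENDIF               // expands to: }
    return result;
}
```

Note that the macros force braces around every branch, which is exactly the rewritten grammar that the K & R layout implies.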
If you really do favor this style, you should at least acknowledge that it treats the curly braces as if they were part of the enclosing flow structure, and not part of the enclosed compound statement. Thus, it follows that you should always use curly braces with every flow control statement, even if the enclosed statement sequence has only one statement in it.

The Whitesmiths Style
In contrast to the K & R style, the Whitesmiths style treats the curly braces as part of the compound statement (which indeed they are). This style gets its name from a company that pioneered the C compiler business. Plum [7] dubbed it the "indented" style to contrast it with the "exdented" style (described later). Although Plum and I [3] both referred to it as the indented style, I now prefer calling it the Whitesmiths style.
The essence of the Whitesmiths style is to place the opening and closing braces, as well as each statement of a compound statement, on separate lines, at the same indentation level a single statement would receive in the same context. Thus, the layout for an if statement enclosing a single statement is:
if ( expression )
   statement;
and the layout for an if statement enclosing a compound statement is simply:
if ( expression )
   {
   statement;
   statement;
   ...
   }
Listing 4 shows other examples of this style.
I must admit that the Whitesmiths style is my favorite. It's conceptually simple and meshes well with the grammatical structure of the language. Best of all, you can describe it pretty rigorously with only a few rules:
1. Put the opening and closing brace and each statement of a compound statement on separate lines and at the same indentation. For example,
{
statement;
statement;
...
}
2. In each flow control statement, begin each consecutive non-statement phrase on a new line, at the same indentation level. For example, in the grammar for an if-else statement
if ( expr ) stmt else stmt
place
if ( expr )
and
else
each on its own line, at the same indentation level. (I use expr and stmt here as abbreviations for expression and statement.)
3. Indent the nested statement(s) with respect to the non-statement phrases of the enclosing flow control statement.
That's it.
In all fairness, a formal description of the K & R style (with explicit braces everywhere) requires only a few changes and one additional rule (rule 0):
0. Rewrite the grammar rules for every statement containing a nested statement by replacing each nested statement with the right-hand side of the rule for a compound statement. For example, rewrite
if ( expr ) stmt else stmt
as
if ( expr ) { stmt-seq } else { stmt-seq }
1. Put each statement of a statement sequence on a separate line and at the same indentation level. For example,
statement ;
statement ;
...
2. In each flow control statement, begin each consecutive non-statement phrase on a new line at the same indentation level. For example, in the (augmented) grammar for an if-else statement
if ( expr ) { stmt-seq } else { stmt-seq }
place
if ( expr ) {
and
} else {
and
}
each on its own line, at the same indentation level.
3. Indent the statement(s) in each statement sequence with respect to the non-statement phrases of the enclosing flow control statement.
When you uniformly insert curly braces around every nested statement, the K & R style rules are not much more complicated than the Whitesmiths style rules. However, many K & R adherents (including K & R) omit the braces around single nested statements. This just complicates the rules.
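As a rough illustration of the two rule sets, the sketch below lays out the same hypothetical function twice: once following the K & R rules with braces around every nested statement (rule 0), and once following the Whitesmiths rules, where single nested statements need no braces. The function names and the use of a pointer-plus-length parameter are assumptions made for the example, not code from the listings.

```cpp
// K & R style with explicit braces everywhere: each brace is treated
// as part of the enclosing flow control statement.
int count_negatives_kr(const int *a, int n)
{
    int count = 0;
    for (int i = 0; i < n; ++i) {
        if (a[i] < 0) {
            ++count;
        }
    }
    return count;
}

// Whitesmiths style: braces belong to the compound statement and sit
// at the same indentation as the statements they enclose. The for and
// the if each control one statement, so no braces are needed there.
int count_negatives_ws(const int *a, int n)
    {
    int count = 0;
    for (int i = 0; i < n; ++i)
        if (a[i] < 0)
            ++count;
    return count;
    }
```

Both functions compile to the same behavior; only the view of the syntax behind the layout differs.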

The do-while
One advantage of the Whitesmiths style over the K & R style is that it lines up matching curly braces at the same indentation level. For many programmers, this makes visual inspection for mismatching braces much easier.
On the other hand, the Whitesmiths style may confuse readers looking at do-while statements. If you rigorously apply the Whitesmiths style to the do-while statement, you get the following layout:

do
   {
   statement ;
   statement ;
   ...
   }
while ( expression ) ;
Someone reading this code may mistake the while at the end of the statement for a separate while statement with an empty body. Some adherents to the Whitesmiths style simply treat the do-while as a special case and write

do
   {
   statement ;
   statement ;
   ...
   } while ( expression ) ;
This is what Plum and I [3] suggested, but I don't like it. (Tom wrote that part.)
This is arguably a flaw in the C language. Had C used a distinct keyword, such as until, the Whitesmiths style would not have this problem. Of course, you can correct that flaw with a macro:
#define until(e) while (!(e))
but again, I don't recommend opening that can of worms.
The peculiar problem of the do-while statement is actually an argument in favor of the K & R style, which lays out the statement as
do {
   statement ;
   statement ;
   ...
} while ( expression ) ;
The fact is, do-while statements are relatively rare. I haven't run into any problems applying the usual Whitesmiths style to the do-while, so I don't treat it specially.
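For a concrete case, here is a minimal sketch of a do-while laid out by the ordinary Whitesmiths rules, with no special treatment. The function digit_count is hypothetical, chosen because counting digits is a natural fit for a loop that must run at least once. The comment on the while line is one possible way to head off the misreading described above.

```cpp
// Hypothetical function: counts the decimal digits of n.
// The body must run at least once so that 0 counts as one digit.
int digit_count(unsigned n)
    {
    int digits = 0;
    do
        {
        ++digits;
        n /= 10;
        }
    while (n != 0);     // closes the do; not a separate while loop
    return digits;
    }
```

Calling digit_count(0) yields 1 and digit_count(12345) yields 5.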

The Allman Style
Yet another popular style is essentially the same as the Whitesmiths style, but with the braces at the indentation level of the enclosing flow control statement, rather than at the level of the enclosed statement sequence. For instance, this style applied to an if statement looks like

if ( expression )
{
   statement ;
   statement ;
   ...
}
Plum [7] called this the "exdented" style, and so did Plum and I [3]. Straker [6] calls this the Allman style. Even though I haven't the foggiest notion where that name comes from, I'll use it anyway. Listing 5 shows some other examples indented according to the Allman style.
On the surface, the Allman style has essentially the same strengths and weaknesses as the Whitesmiths style. The Allman style lines up the braces just like the Whitesmiths style, and it has the same trouble coping with the do-while statement.
For most people, the difference between these two styles is purely a matter of aesthetics. Personally, I prefer the freedom to omit curly braces around single statements, but this gives Allman style indenting an irregular appearance. For example, adding a second statement between the if and else alters the appearance of an if-statement rather markedly, from
if ( expression )
   statement;
else
   statement;
to
if ( expression )
{
   statement;
   statement;
}
else
   statement;
I suspect that the Allman style works best if you always include the curly braces, even when they enclose only one statement. But, if that's the case, then here again we have an indenting style predicated on the belief that the syntactic structure of the C language is wrong. Sure, C has its flaws (and indeed, so does C++), but I don't think this is one of them.
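A minimal sketch of the always-braced variant follows. The function name absolute_difference is hypothetical, and the choice to brace single statements is the assumption being illustrated, not something the grammar requires.

```cpp
// Allman style with braces around every nested statement, even when
// the compound statement encloses only one statement.
int absolute_difference(int a, int b)
{
    int d;
    if (a > b)
    {
        d = a - b;
    }
    else
    {
        d = b - a;
    }
    return d;
}
```

With the braces always present, adding a second statement to either branch leaves the surrounding lines untouched, so the layout stays regular at the cost of extra lines.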

Labeled Statements
None of the examples I've shown so far illustrate how any of the styles, K & R, Whitesmiths, or Allman, handle switch statements. None of these styles requires additional rules to handle the outermost level of a switch statement.
The grammar rule for a switch statement is simply:

switch-statement: switch ( expression ) statement
Applying the usual rules for the K & R style, the layout is

switch ( expression ) {
   statement;
   statement;
   ...
}
With the Whitesmiths style, the layout becomes

switch ( expression )
   {
   statement;
   statement;
   ...
   }
What's missing is a rule for where to place the labels that appear inside the switch statement. All three styles employ the same rule, which is pretty simple once you understand the grammar for labeled statements:

labeled-statement:
   identifier : statement
   case constant-expression : statement
   default : statement
The grammar shows that the case labels (including default) are just statement labels. C and C++ impose a separate constraint that these labels can only appear inside switch statements, but they're still just labels. Thus, all you need to lay out labels is the following rule:
4. Place the label (everything up to the colon) of a labeled statement on a separate line at one indent level less than the statement it labels.
Listing 6 shows the consequence of incorporating this rule into each of the popular styles.
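As one possible reading of rule 4, the sketch below applies it to a switch statement laid out in the Whitesmiths style. This is not a reproduction of Listing 6; the function days_in_month is hypothetical and ignores leap years to keep the example short. Each label sits one indent level to the left of the statement it labels, which places the case labels in the same column as the switch keyword.

```cpp
// Hypothetical function: days in a month of a non-leap year,
// or 0 if month is not in the range 1 through 12.
int days_in_month(int month)
    {
    int days;
    switch (month)
        {
    case 4:
    case 6:
    case 9:
    case 11:
        days = 30;
        break;
    case 2:
        days = 28;
        break;
    default:
        days = (month >= 1 && month <= 12) ? 31 : 0;
        break;
        }
    return days;
    }
```

Because the labels are just labels, consecutive ones such as case 4 and case 6 stack up at the same level without any extra rule.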

Indenting Function Bodies
The K & R style has a curious inconsistency, namely, the placement of the left brace that opens a function body. The K & R style places the left brace of a function body on a line by itself at the same indentation as the corresponding right brace (as in the Allman style), which is one indentation level less than the statements of the function body itself. That is, the usual K & R style lays out a function as

int f(int i)
{
   statement;
   statement;
}
rather than as

int f(int i) {
   statement;
   statement;
}
I believe this apparent inconsistency is an anachronism, dating back to the days before C had function prototype headings. Old-style C function headings list the formal parameter types between the right parenthesis of the parameter list and the opening curly brace of the function body, as in

int f(i)
int i;
{
   statement;
   statement;
}
In the face of this syntax, I guess this is a pretty reasonable exception to the usual K & R rule (that the opening brace goes at the end of the line before the first statement in the statement sequence). But I think it's really just another strike against the K & R style. Neither the Whitesmiths nor the Allman style suffers from this inconsistency.

In Summary
If you work alone and you can do as you please, then none of this should really matter to you. You can make up whatever style rules you want, and you don't have to justify them to anybody, least of all me.
However, if you work as a member of a programming team, it's probably in your and the team's interest to try to agree on a reasonable set of layout rules that will promote harmony among the team. Agreeable layout rules encourage team members to read, critique, and share each other's code.
I don't believe most programmers want to be bothered consulting their project style manual when they are unsure about how to indent a particular construct. Most would rather go with their gut reaction. Layout rules specified by examples are better than nothing, but they're a poor substitute for a shared understanding of layout principles. The best way to describe those principles is in terms of the syntactic structure of the programming language.
Of course, you may argue that I espouse this syntactic view of indenting rules because it makes my preference (the Whitesmiths style) look pretty good. On the other hand, I like to think it works the other way around. That is, my preference is based on the fact that the Whitesmiths style has an underlying syntactic elegance and simplicity.
I trust you'll let me know if I've done your favorite indenting style an injustice.

References
[1] Tom Cargill. C++ Programming Style (Addison-Wesley, 1992).
[2] Scott Meyers. Effective C++ (Addison-Wesley, 1992).
[3] Thomas Plum and Dan Saks. C++ Programming Guidelines (Plum Hall, 1991).
[4] Bruce J. MacLennan. Principles of Programming Languages (Holt, Rinehart, and Winston, 1983).
[5] Brian Kernighan and Dennis Ritchie. The C Programming Language (Prentice-Hall, 1978 (1st. ed.) and 1988 (2nd. ed.)).
[6] David Straker. C Style: Standards and Guidelines (Prentice-Hall, 1992).
[7] Thomas Plum. C Programming Guidelines, 2nd. ed. (Plum Hall, 1989).