Karanjit S. Siyan is a consultant with Freedom Technologies and Integrated Computer Systems. He be reached at Box A, Corwin Springs, MT 59021.
At one time or another, almost every software developer has had to face the beast of software complexity. One of the most common methods of measuring software complexity is counting the number of lines of code in the program. This is an example of a "software metric." Generally speaking, a software metric measures the characteristics of a program in quantitative terms. Experience reveals that the number of lines of code is a simplistic and unrealistic way of measuring the complexity of a program. A better software metric is needed --one that can measure the "content" of a program. Another metric is one that can provide an estimate of the difficulty in understanding the program, writing a similar program, and estimating how many errors you can expect to see in this program.
Freedom Technologies has been using several software metrics for managing development efforts. Many of these measures are based on intuitive understanding of how complex a program is. For example, a program that exhibits excessive decision making can be somewhat difficult to follow. A program that makes use of many global variables can be extremely difficult to modify and maintain because of the tight degree of unwanted coupling and "side-effects," and the dreaded "ripple-effect" where one innocuous change can result in the program not working.
Different approaches have been suggested to measure those characteristics that make up complex programs. Broadly speaking, these approaches can be categorized into two groups: size metrics and control-flow metrics.
Size metrics are based on the premise that the "larger" the program, the more complex it is. The size of a program is not necessarily based on the number of lines of code. A more meaningful measure of this size would be based on the number of procedures, number of operators, number of variables, and so forth.
Control-flow metrics attempt to measure complexity based on the decisions -- statements such as if, for, while, case, etc. --used in a program, and the levels of nesting. Some measures use graph theory by viewing the abstract flow of program control as a directed graph.
The two most popular families of metrics that fall into this category are the Software Science (size metric) and the Cyclomatic Complexity (control-flow metric). At least one tool from SET Labs Inc. (Portland, Ore.), called PC-METRIC, computes these metrics and runs on MS-DOS-based systems for languages such as C, Modula-2, and Pascal.
To determine if measurement of a certain characteristic of a program contributes to the complexity of a program, the following two facts must be established: 1. Does the characteristic computed by the metric actually contribute to complexity? and 2. Does the computation of a metric reflect how much complexity is being contributed?
The answers to these questions are based on a statistical analysis of the programs. Some aspect of the programming process --development time, number of errors during development, and so forth --is compared with the computed metric. As an example, if, for a large sample of programs, the time to develop a program increases with a larger value of the computed metric, then the computed metric indicates more complex programs. The degree to which a computed metric is related to the programming process is called correlation. A well-developed theory exists in statistics for measuring the correlation between variables. The better the correlation between the software metric (product metric) and an attribute of the programming process (process metric), the greater the effectiveness of the software metric.
Another approach is to partition the set of sample programs into groups based on some characteristic (such as the number of decisions). The average number of errors can be measured in these groups. If the number of errors is found to increase for groups that have a larger number of decisions, the software metric of "number of decisions" does contribute to program complexity.
The Software Science metrics were developed in the early 1970s by Maurice Halstead of Purdue University. Halstead observed that all programs are made up of operators and operands. By coming up with a measure based on operators and operands, you can get a better idea of how complex the program really is. The following four parameters can be counted for any program:
n1 = number of unique operators N1 = total number of operators n2 = number of unique operands N2 = total number of operands
Halstead's entire theoretical framework is based on these four parameters. For example, the "length" of a program (N) is defined as the total number of operators and operands used. Also, the vocabulary (n) of a program is the total number of unique operators and operands.
N (Length) = N1 + N2 an (Vocabulary) = n1 + n2
What is an "operator," and what is an "operand" in a program? How do you compute the values of n1, N1, n2, N2 for a program? The best way to illustrate this is through an example. Figure 1 is a program fragment and Figure 2 shows a tally of the operators and operands. Notice that paired items such as {}, ( ) are counted as operators. In some situations, there may be differences in opinion as to what is an operator and what is an operand. The statement goto Label can be legitimately viewed as consisting of the operator goto and the operand Label. You could also treat the whole statement as an operator. Several researchers have shown that minor differences in counting rules have little impact on the effectiveness of Software Science as long as the rules are applied consistently.
Several interesting relations are derived based on the count of operators and operands. The first one is the predicted "length" of a program. Remember, that the actual "length" (N) is defined as N1 + N2 or simply the count of all the operators and operands in a program. Halstead theorized that a well-written program should have a predicted length, N^ (pronounced N-hat) as shown here:
int atoi(NumString)
char NumString[I]
{
int Number;
int i;
i = 1;
Number = O;
while(NumString[I] !='\0')
{
Number = Number + NumString[I] - '0';
i ++;
}
return Number;
}/* end
atoi */
Operator Count Operand Count
___________________ ______________________________
= 3 i 4
; 5 Number 4
() 1 NumString 2
[] 2 '/O' 1
{} 2 'O' 1
!= 1
+ 1 N2 = 12
) 1 n2 = 5
++ 1 n1 = 9
n = 14
N1 = 17 N = 29
N^ = [n1 X log[2](n1)] + [n2 x log[2](n2)]
This equation is called the length equation. If the actual length, N, is different from the predicted length, the program could have any of the following six classes of impurities: canceling of operators, ambiguous operands, synonymous operands, common subexpressions, unnecessary replacements, or unfactored expressions.
Based on N and N^, the purity ratio is defined as N^/N. A purity ratio of 1 would imply few impurities. C programs usually have a purity ratio greater than 1. This means that the actual length, N, is less than the predicted length, N^. This could be because logic can be expressed very succinctly in C.
What is interesting about the length equation is that the first term [n1 x log [2] (n1)] can be treated as a constant. The rationale for this is that the number of unique operators in a programming language is finite, and most programmers have a small subset of operators they regularly use. The second term [n2 x log[2] (n2)] is the one that has a greater impact on the predicted length. If you can compute an accurate estimate of how many operands will be used in a program, you can predict how large a program is going to be.
Halstead also theorized that if a program has "n" (n = n1 + n2), unique operators and operands, the number of bits used to represent or encode these as a binary number would be log[2](n). Since there are "N" (N = N1 + N2) such usages of these operators and operands, the total number of bits used to represent the program would be N x log[2](n). This expression is termed the volume (V) of a program.
V (Volume) = N x log[2](n)
Consider two programs, each with 200 uses of operators and operands (N = 200). The first program uses 16 unique operators and operands, while the second uses 64 unique operators and operands. The volume for the first program is 200 x log[2] (16) = 800, while the volume for the second program is 200 X log[2] (64) = 1200. The volume is a better estimate of program size than a simple count of operators and operands (N), since the value is larger with a larger number of unique operators and operands.
Another way of looking at the program volume is that N total operators and operands will require log[2](n) mental lookups using the best search algorithm. The total number of mental lookups is N x log[2](n), same as the program volume. On the average, people tend to make mistakes every E0 mental comparisons. Work done by psychologists shows that the value of E0 is about 3,000 to 3,200. The number of estimated errors, B^ is
B^ = V/E0 = [N x log[2](n)]/E0
A programmer with superior ability will have a larger value of E0, and someone with an inferior ability will have a lower value of E0. The programmer with superior ability will make fewer errors. If the actual number of errors is recorded for a number of programs, a project manager can determine the E0 values for the team members.
You can expect that the effort required to develop a program will be greater for a program with larger program volume. A parameter called "effort" (or E) is defined as
E = V/L
where V is the volume of a program and L is an estimate of the level of abstraction in a program. The level of abstraction is defined as
L = V*/V
where V is the volume measure (as previously defined) and the new V* is the volume a program would have if it could be represented as a procedure call within the language. An example will clarify this.
Consider a program to print a report from a data file. A procedure call to this program could be written as
PrintReport(DataFile)
The volume (V*) of this statement would be 3 x log[2](3) = 4.74. The volume (V*) is called the potential volume. An estimate for L (L^) is often used instead of the equation L = V*/V, which is based on the number of operators and operands. This estimate is given by the following empirical formula:
L^ = 2/n1 x n2/N2
A good thing about this estimate for the level of abstraction is that it is easily computed by a simple analysis without knowing too much about the internal structure of a program.
What do all the given equations mean? Halstead suggested that the effort (E) represents the number of "mental discriminations" a programmer would have to perform in order to write the program. A psychologist, Stroud, found that humans are capable of making five to twenty mental discriminations per second. This number is called the Stroud Number, S. You now have the following equation:
Estimated time to develop a program,
T^ = E/S
A common value of "S" for programmers is 18.
Tom McCabe suggested that the cyclomatic number of a program's control flow graph would be an accurate estimate of how complex a program's flow of control is. The control flow graph is similar to a flowchart. Each node in this graph consists of a basic block. A basic block is a segment of code that has a single entry point and a single exit point, and there is no transfer of control within the basic block. Figure 3 is a fragment of code and Figure 4 is its control graph. The cyclomatic number of a flow graph is defined as
Cyclomatic Number of graph "G," V(G) = e - n + 2
where n is the number of nodes in the graph, and e is the number of edges or lines connecting each node. In Figure 4, n is 4, and e is 4, giving a cyclomatic number of 2.
A simpler way of calculating the cyclomatic number is to add 1 to the sum of all the decision-making statements (if, while, for, and so on). In the example in Figure 3, there is only one if statement, and adding 1 gives 2.
Extended Cyclomatic Complexity is defined as including not just decision-making statements, but also decision-making predicates. Examples of decision-making predicates are the logical expressions in an if statement that are connected by logical and, or. A shortcut method of calculating the Extended Cyclomatic Complexity is
Init( ): /* Node A */
if (ReportOption)
{
GenReport(ReportOption); /* Node B */
}
else
{
ReportError(ReportOption); /* Node C */
}
Finish( ); /* Node D */
Sum of the number of decisions, ANDs, ORs + 1.
The following if statement has an Extended Cyclomatic Complexity of 5 because there is 1 if statement, 2 AND (&&), 1 OR (| |).
if (_dbStatus == 0 && IsValid( ) | |
_dbStatus > 10 && IsExpr())
{
return 0;
}
The useful thing about all these metrics is that they actually help develop better code. Here are some examples.
After writing a piece of code, a software developer can compute the software metrics (using a tool, of course,) to provide feedback about potentially troublesome aspects of a program. The software metrics can be compared against a norm that is user-definable. Remedial action, if necessary, can then be taken. This could be a different approach or algorithm that will produce less complex code, breaking the program down to simpler modules. There are applications where even the best solution may be very complex. In this case, software metrics can identify the modules that need better and extra documentation.
A small percentage of modules in a software system has an inordinate amount of errors. Software metrics can identify the more error-prone modules, and greater resources can be allocated to testing them. During the maintenance phase, software metrics can be used as one of the tools to measure the amount of effort and time required to make changes to existing modules.
Software metrics should be viewed as one of the tools (among many) to help manage the process of developing software. However, this tool is not a replacement for common sense.
In some programs, the metrics may not accurately capture complexity and flag the programs as being overly complex or less complex than they actually are. Common sense will aid in determining if the metric results are reasonable. Applying the metrics to one or two individual programs may be disappointing. Software metrics work best when applied to a large group of programs.
While much of the literature in software metrics reports studies supporting the software metrics described in this article, several researchers have expressed doubts about its validity. Shen, Conte, and Dunsmore voice most of the objections raised by researchers. Much of the criticism is directed at certain assumptions made by Halstead and the methodology used by those studies that support Software Science. These objections drive home the point that software metrics are effective in a statistical sense when applied consistently to a large number of programs.
Much of the information for this article was taken from Reference 1. References 2 and 3 are classics by the founders of the metrics examined in this article. Reference 4 is a critique on Software Science and gives examples of situations where it does not work.
Copyright © 1989, Dr. Dobb's Journal