Examining C++ Program Analyzers

Dr. Dobb's Journal February 1997

Finding out how programs really behave

Scott, a software-development consultant and author of Effective C++ and More Effective C++, can be contacted at smeyers@ netcom.com. Martin holds a degree in computer science from the Johannes Kepler University, in Linz, Austria, and can be contacted at mklaus@swe.uni-linz.ac.at.

Sidebar: "Constraint Expression Languages"

C++ has a well-deserved reputation for power and flexibility. It has an equally well-deserved reputation for complexity -- its "gotchas" are legion. For example, omitting a virtual destructor in a base class typically leads to incomplete destruction of derived class objects when they are deleted through base-class pointers.

Experienced C++ programmers learn to avoid these kind of problematic constructs, but experience should not be necessary: Troublesome C++ can often be detected by static analysis, using tools that parse and analyze C++ source code. Such tools are becoming available, and during the summer and fall of 1996, we undertook an investigation to identify these tools and to assess their capabilities. In this article, we summarize the initial results of our investigation.

We were interested in answering three questions.

First, what tools statically analyze C++ programs and issue warnings about likely trouble spots? By focusing on static analysis, we limited our research to tools spiritually akin to lint. We explicitly ignored tools designed to detect dynamic (run-time) errors, such as programs that monitor memory usage and report on leaks. Such tools are important, but they offer functionality that complements -- not replaces -- static analysis. We also ignored tools that focus on lexical issues (identifier names, indentation style); our interest was in tools that identify constructs that affect program behavior.

Second, how comprehensive are the tools in identifying suspect C++ constructs? C++ has many facets, including data abstraction, inheritance, templates, and exception handling, and we wanted to find tools that checked for likely errors in many of these areas. A few tools checked only the C subset of C++; we ignored those offerings. Our interest was in tools for C++ programmers, and C++ programmers have different needs than C programmers.

Third, how well do the tools work on real programs? Can they parse real source code? Do they scale well when run on large projects? Are they robust enough to handle complex template instantiations, including those generated by the Standard Template Library?

In this article, we will address only the first two questions.

Identifying Tools

When we began this project, we were aware of several static-analysis tools for C++, but we suspected there were others we didn't know about. Consequently, we posted a request for information to several USENET newsgroups, including groups devoted to C++ programming, OOP, and programming on various platforms. Based on the responses, we ultimately identified the tools discussed here.

CodeCheck from Abraxas Software. CodeCheck is a stand-alone tool for DOS, Windows, and UNIX that lets you use a C-like language to specify what kinds of program analysis to perform. It comes with several predefined analysis programs, including some for computing program-complexity metrics and identifying non-portable code.
C++Expert from CenterLine Software. Also a stand-alone tool, C++Expert performs static and dynamic analyses of C and C++ programs. Its static checks are drawn from Scott Meyers' Effective C++ and More Effective C++, and its diagnostics contain hypertext links to online versions of those books. At this time, it supports only UNIX.
FlexeLint/PC-Lint from Gimpel Software. Another stand-alone tool (the name is FlexeLint for UNIX, and PC-Lint for DOS, Windows, and OS/2), FlexeLint/PC-Lint is perhaps truest to the classic lint tradition. It can check for over 600 potential error conditions in C and C++ source code, including conditions that affect more than one translation unit or that require detailed dataflow analysis.
CodeAdvisor from Hewlett-Packard. CodeAdvisor is a part of HP's SoftBench development environment for UNIX. It enforces 23 predefined rules, and you can extend its capabilities by coding new analyses in C++, then linking them in. Source-code information is stored in a database, so it is possible to perform checks that involve multiple translation units.
CodeWizard from ParaSoft. CodeWizard is a stand-alone UNIX tool designed to enforce a set of 24 rules selected from Effective C++.
QA/C++ from Programming Research. Another stand-alone tool for UNIX, QA/C++ works in two phases. First, it examines C or C++ source code and stores the results in a database. Different Programming Research analysis tools may then be run against the database; these tools generate warning messages. There is no database API that lets programmers develop their own analyses.
The Apex C/C++ Development Environment from Rational Software. Among other capabilities, the Apex environment enforces 22 predefined rules for C and C++ programming under UNIX.

To these choices, we added our noncommercial program, CCEL, purely for purposes of comparison. CCEL began as a research project on static analysis of C++ programs under the direction of one of us (Meyers) and was eventually fully implemented through independent work by the other (Klaus). We added CCEL to our investigation because we were familiar with its capabilities and limitations, and we felt it would be interesting to compare commercial approaches to our research-based initiative.

Our Approach

There were three phases in our testing process.

1.We developed a set of benchmark rules constraining the structure of C++ programs. For example, one rule is that all base classes must have virtual destructors. We tried to develop a set of rules that was representative of the kinds of rules that real programmers would find useful.

2.We contacted vendors and asked which rules their tool could enforce. This information proved useful during our empirical tests, because discrepancies between vendor claims and our findings often identified subtle differences between our rules and those enforced by vendors.

3.We developed of a set of sample source files seeded with rule violations. We ran each tool on each source file to see whether the seeded rule violation was correctly identified.

Our results yielded Table 2, which shows how well each tool enforced our benchmark rules on our benchmark programs.

Choosing Rules

There are many ways to compose a set of benchmark rules for C++ programs, but it is difficult to argue that one set is "better" than another. As a result, we made no attempt to develop the "best" set of rules. Instead, we fell back on the fact that one of us (Meyers) has authored two books containing guidelines for C++ programming and we chose nearly all our rules from those books.

This approach is not as gratuitous as it might appear. Meyers' Effective C++ and More Effective C++ have been well-received in the C++ programming community, and one or both form the basis for many sets of corporate-coding guidelines. In addition, these books form the basis for at least two of the static-analysis tools in our investigation. Finally, by drawing our rules from well-known and easily accessible sources, we avoided the need to explicitly justify individual rules in our benchmark set. Instead, the justification for nearly every rule is available in the books, and we simply refer to the appropriate book location as the rationale for each rule.

We chose 36 rules divided into eight categories; see Table 1. Each rule begins with its "Rule" number, followed by a reference to either Effective C++ (E) or More Effective C++ (M). Next is a reference to the book "Item" number from which the rule is derived. The text of the rule is often different from the text of the book Item, because the book Items tend to be worded too generally to be checked.

Some of the rules may seem controversial, especially in light of the C++ found in many popular class libraries. Rule 15 (no public data members) is widely violated in the MFC, for example, while almost no library adheres to Rule 19 (make all nonleaf classes abstract). With the exceptions of Rules 13 and 23 (which we hope are self explanatory), Effective C++ and More Effective C++ offer firm technical foundations for each rule. We believe it is therefore important that programmers be able to enforce those constraints, even if the majority of programmers choose not to. Furthermore, our decision to include rules that are commonly violated helps us evaluate the effectiveness of the tools' filtering capabilities. (We do not report on this aspect of the tools in this article, but it is an important consideration in the practical application of any tool.)

Benchmark Programs

For each of our 36 rules, we developed a source file seeded with a violation of the rule. We then executed each tool on each source file to see if the tools correctly identified the seeded errors. These source files were truly trivial -- many were under ten lines long. Our goal was not to provide a realistic test of the tools -- just to see whether or not the tools could identify rule violations in the simplest of cases. (Sometimes, this backfired and yielded misleading results.) Listing One s the source code for the file used to test Rule 20.

Compilers versus Special Tools

Several people responded to our request for information on static-analysis tools by remarking that they found little need for such tools. Instead, they relied on compilers to flag conditions that were likely to lead to trouble ("I find GNU G++ with -ansi -pedantic -Wall -O flags useful," was a typical comment).

In fact, the GNU compiler was singled out as being especially good at warning about troublesome C++. This piqued our curiosity about compiler warnings. How many of our candidate rules would compilers identify?

To find out, we submitted our benchmark programs to five compilers, in each case enabling as many warnings as possible. As Table 2 shows, the results were disappointing. Even G++ identified, at most, 2 of the 36 rule violations, and three of the compilers identified none. This confirmed our impression (based on our experience as C++ programmers) that while compilers -- at least the compilers with which we have had experience -- are good at many things, identifying legal, but potentially troublesome, C++ source code is not one of them.

Specifying Constraints

The tools in our study let you specify what conditions to check for in one of two ways. Most tools follow the lint model, whereby the tool is created with the ability to enforce some set of predefined constraints, and you turn these constraints on or off. There is no way to extend the capabilities of such tools. For example, a tool is either capable of detecting that an exception may leave a destructor (Rule 31) or it's not. If it's not, there is no way for a tool user to add that capability.

A different approach -- employed by Abraxas' CodeCheck, HP's CodeAdvisor, and our CCEL -- is to provide tool users with a language in which to express constraints of their own. Such tools may or not be useful "out of the box" (it depends on the existence and utility of predefined rule libraries), but can be extended to check for new, user-defined conditions. This approach is more powerful, but, as in the case of C++ itself, complexity often accompanies power; the power is inaccessible until you have mastered the constraint-expression language. Furthermore, the addition of user-defined constraints may affect an analysis tool's performance, because enforcement of such constraints may require arbitrary amounts of time, memory, or other resources.

We made no attempt to master the various constraint-expression languages used by the different tools, but the examples we saw (see the accompanying text box entitled "Constraint Expression Languages") reinforced the lessons we learned during the design and implementation of CCEL -- it's hard to design a language for expressing constraints on a language as feature-filled as C++, and such a constraint language is nontrivial to learn. Abraxas, for example, reports that it takes between three and six months to become proficient in the CodeCheck constraint language. Most Abraxas customers want to hire specialists to compose rules instead of having to learn to write the rules themselves.

Most programmable tools attempt to offer the best of both worlds by shipping a set of predefined rule libraries that check for commonly desired constraints. This eliminates the need to write rules to cover common constraints.

Results and Discussion

Table 2 presents the results of running the various tools on the collection of benchmark programs. Several features are of interest. First, no tool was able to enforce all of our 36 benchmark rules, not even the tools supporting user-defined constraints. Thus, even the best of tools currently available offers only partial coverage of C++. This is especially noteworthy because our benchmark rules themselves failed to exercise all major language features; templates are a particularly obvious omission.

Second, the number of benchmark rules that can be enforced without programming (out of the box) is, at most, 17 of 36. (CCEL supports 19, but CCEL is a research project, not a commercial tool.) If we speculate that our set of benchmark rules is somehow representative of the kinds of constraints real programmers might want to enforce, this suggests that current tools cover, at best, only about half of those constraints. Of course, automatic enforcement of half a set of requirements is better than no enforcement at all, but the data in Table 2 suggest that there is much room for increased language coverage by static-analysis tools for C++.

Third, it is not uncommon to have subtle mismatches between a benchmark rule and the conditions detected by the analysis tools. In most cases, this is an outgrowth of the vendors' attempts to avoid generating warning messages when no truly harmful condition exists. For example, consider Rule 10: "Make destructors virtual in base classes." Many programmers consider this rule too aggressive, and a common alternative form of the same rule is: "Make destructors virtual in classes containing virtual functions." This form has the advantage that no virtual table pointer is added to a class simply to satisfy the rule. (This is the rule variant that's employed by the GNU C++ compiler, HP's CodeAdvisor, and Programming Research's QA/C++.)

The motivation for this rule (in any form) is that Listing Two is generally harmful if the base class lacks a virtual destructor. In truth, Listing Two is only harmful if one or more of the following conditions holds:

D has a destructor.
D has data members that have destructors.
D has data members that contain data members (that contain data members, and so on) with destructors.

At least one tool vendor attempts to issue a diagnostic only if these more stringent conditions exist, and the conditions do not exist in our test program (Listing Three). The tool in question thus issues no diagnostic on our sample program, but if class Derived were nontrivial, the tool might issue a warning.

This more precise analysis should be beneficial for users, because a diagnostic should be issued only if a problem truly exists. However, the rules of C++ can be both complicated and unintuitive, and their subtlety can cut both ways. In the case of the vendor attempting to check for the more detailed conditions outlined earlier, the test for data members with destructors in the derived class was omitted. Hence, though the tool avoids issuing warnings in harmless cases, it also avoids issuing warnings in some harmful, but rare cases. These are precisely the cases in which static-analysis tools that correctly understand the detailed rules of C++ are most useful!

Another tool had trouble issuing correct diagnostics when compiler-generated functions -- default constructors, copy constructors, assignment operators, and destructors (especially derived-class destructors) -- were involved. Because of the minimalist nature of our test cases, our programs had many instances of such functions; this led to incorrect results from some tools.

Whether such shortcomings would cause problems when the tools are applied to real programs is unknown, but it hints at a deeper problem we found: Vendors don't seem to understand the subtleties of C++ as well as they should. We believe that vendors of C++ analysis tools must understand C++ as well as compiler vendors, but based on our experience with the tools in this study, we must report that such expertise cannot yet be taken for granted.

Caveats

While Table 2 provides insight into the state of existing lint-like tools for C++, it is important to recognize what it does not show. We were interested only in the capability of such tools to handle the "++" part of C++, but most of the tools also provide significant other capabilities.

Most tools also check the "C" part of C++, some quite extensively. This can be useful. By limiting our tests specifically to C++ capabilities, we were able to sharpen our focus, but we also screened out the majority of some tools' functionality.

Many tools offer stylistic and lexical checks in addition to the semantic issues we looked at. For example, if you wish to ensure that classes never use the default access level of private, but instead declare it explicitly, at least one tool will note violations of that constraint.

Some tools offer complementary analyses in addition to checking coding "style." For example, Programming Research's QA/C++ can calculate various program-complexity metrics.

In addition, our set of benchmark rules was far from exhaustive. Some vendors check for C++-specific conditions we didn't consider; Table 2 says nothing about such capabilities.

All this is to say that Table 2 is anything but a buyer's guide. Furthermore, there are many nontechnical characteristics of analysis tools you should consider before deciding which, if any, is suitable for your circumstances. The following questions come to mind:

How easy is it to install, configure, and use the tool? These factors are especially important for tools with equivalent (or close to equivalent) capabilities. For example, Gimpel Software's FlexeLint and Productivity Through Software's ProLint use the same underlying analysis engine, but offer quite different user interfaces.
How easy is it to filter out unwanted diagnostics? The traditional Achilles Heel of lint-like tools is an unacceptable signal-to-noise ratio, so it's important that users be given fine-grained control over what code is analyzed and which diagnostics appear. In fact, some vendors deliberately avoided offering checks for some conditions (for example, the use of preprocessor macros to define constants -- our Rule 1) because they felt it would be more bothersome than useful to their customers. (Respondents to our newsgroup postings indicated that a bad signal-to-noise ratio is a continuing problem, even with some of the tools considered here.)
How robust and up-to-date is the C++ parser? To be maximally useful, a C++ analyzer must parse exactly the same language as the compiler(s) you use. It's particularly frustrating if the analyzer rejects code your compiler accepts.
Can the tool handle large projects -- those with multiple libraries and hundreds or thousands of source files? Based on the responses we got from our USENET postings, the answer too often is that it cannot.
Is the documentation complete, accurate, accessible, and comprehensible?
Does the vendor offer adequate customer service, including technical support?
How well established is the vendor? Is the vendor likely to continue to support the tool for years to come?

Our study considered none of these issues.

Finally, it is important to remember that Table 2 is based on tests we performed in August/September 1996. Virtually all of the tools we examined are under active development, so it's likely that new versions exist even as you read this report. For example, we know that Abraxas is currently beta-testing a set of predefined constraints derived from material in Meyers' books, and CenterLine and Rational are planning upgrades to C++Expert and Apex, respectively, that will allow users to define new constraints. Other vendors are similarly active. Table 2 represents a mere snapshot of the commercial state of the art in September 1996.

Summary

A number of analysis tools are now available that read C++ source code and warn about possible behavioral problems. They cover varying aspects of C++, though none offers truly comprehensive coverage of the language. Based on simple tests, we believe that many dangerous C++ constructs can be detected, though the complexity of C++ leads to incorrect behavior on the part of some tools, especially where compiler-generated functions are concerned. C++ analysis tools are under active development, and it is likely that the data in this article fails to accurately reflect the current capabilities of the tools we examined. If you are interested in static-analysis tools for C++, we encourage you to contact the vendors, conduct your own tests, come to your own conclusions -- then share them with us.

Acknowledgment

We are grateful to Jill Huchital for her comments on a draft of this article.

References

Meyers, Scott. Effective C++, Reading, MA: Addison-Wesley, 1992.

-- -- -- More Effective C++, Reading, MA: Addison-Wesley, 1996.

Meyers, Scott, Carolyn K. Duby, and Steven P. Reiss. "Constraining the Structure and Style of Object-Oriented Programs." Principles and Practice of Constraint Programming. Cambridge, MA: MIT Press, 1995.

Musser, David R. and Atul Saini. STL Tutorial and Reference Guide, Reading, MA: Addison-Wesley, 1996.

For More Information

Abraxas Software
5530 SW Kelly Avenue
Portland, OR 97201
503-244-5253
http://www.abxsoft.com/

Centerline Software
10 Fawcett Street
Cambridge, MA 02138-1110
617-498-3000
http://www.centerline.com/

Gimpel Software
3207 Hogarth Lane
Collegeville, PA 19426
610-584-4261
http://www.gimpel.com/

Hewlett-Packard
19410 Homestead Road
Cupertino, CA 95014-0604
408-725-8900
http://www.hp.com/sesd/CA/
ParaSoft Corp.
2031 South Myrtle Avenue
Monrovia, CA 91016
818-305-0041
http://www.parasoft.com/

Productivity Through Software Inc.
555 Bryant, Suite 555
Palo Alto, CA 94301
415-934-3200
http://www.pts.co.uk/

Programming Research Ltd.
1/11 Molesey Road, Hersham
Surrey KT12 4RH, UK
+44-1932-88 80 80
http://www.prqa.co.uk/

Rational Software Corp.
2800 San Tomas Expressway
Santa Clara, CA 95051-0951
408-496-3600
http://www.rational.com/

DDJ

Listing One

//  20  M24  S   Avoid gratuitious use of virtual inheritance, i.e., make//  sure there are at least two inheritance paths to each virtual base class.
class Base { int x; };
class Derived: virtual public Base {};
Derived d;

Back to Article

Listing Two

class B { ... };             // base class;  assume no virtual dtorclass D: public B { ... };   // derived class
void f(B *p);                // f is some function taking a B*
D *pd = new D;               // pd points to a D
f(pd);                       // pass pd to f, binding pd to p in f
void f(B *p)
  {
    delete p;                // this calls only B's dtor, not D's!
  }

Back to Article

Listing Three

// test program for rule 10
class Base {};
class Derived: public Base {};


int main()
{
  Base *pb = new Derived;
  delete pb;
  return 0;
}


Back to Article


Copyright © 1997, Dr. Dobb's Journal