Debugging Standard Containers with GDB

C/C++ Users Journal June, 2004

Customizing your debugger can pay off

By Herman Pijl

Herman Pijl is a software engineer at Eurocontrol where he develops software that provides airspace data to flight-processing clients. He can be contacted at herman.pijl eurocontrol.int.

Whenever you need to collect data in your code, it's a good idea to use one of the Standard Library's containers. Why? Not just because containers are free, but also because they're generally considered bug free, having been based on publicly known source files [1] reviewed by experts. That said, it is still a good idea to run code through debuggers such as GDB, the GNU Project Debugger (http://www.gnu.org/ software/GDB/GDB.html) that, among other things, lets you look inside a program while it executes. GDB 6.0 is the current version of the debugger, which can debug programs written in C, C++, and Objective-C, among other languages running on Windows and most UNIX implementations. Even when code is correct, you need to inspect variables declared as Standard Library containers. Unfortunately, this isn't a straightforward process.

For example, Listing 1 is a program that could have been taken from a C++ tutorial. Each time a container is filled, I insert a breakpoint and examine the corresponding variable. Since I am following the coding guidelines of my current project, the local variables have the prefix l_.

I walk through the code with GDB and stop after adding the items to l_vec. Looking at the content of l_vec in Figure 1, I see the internal structure of the vector. Of particular interest is the number of items and their values, or the value at a particular index. Listing 2 shows the typical member variables of a vector. The template class vector can be considered a wrapper around the classic C array. You can index a vector like you can index an array, or add an integral value to a pointer. With iterators, you iterate over the vector's content using the operators operator++() and operator--() like you can in C with a pointer. The member variable start refers to the start of the allocated array and the member variable end_of_storage points to the position just after the allocated array. The member variable finish points to the position just after the last filled position in the array. That means finish - start is the actual (logical) size of the vector. This leads to the user-defined commands in Listing 3. Calling the user-defined command veclen in GDB with a variable name sets the user variable $veclen. I use a convention that, when a user-defined command mycommand returns a value, sets the user variable $mycommand. When a user-defined command prints something on the screen, it has the prefix p. To look at the content at a certain index, I define pveci, and to display the whole vector, I define pvec (Listing 4). Notice that pveci requires two arguments—the variable name and requested index.

Most (if not all) vendors implement the template class set with red-black trees. Red-black trees are balanced binary trees where the nodes have an extra attribute—a color that is either red or black [2,3]. In Listing 5, each node has a color and you can traverse the tree bidirectionally with the pointer's parent, left and right. A red-black tree keeps track of its root and/or header node (member variable header) as well as a node count (member variable node_count). The template class set has a private member variable that represents the tree and to which everything is delegated. Listing 6 shows the initial implementation of the user-defined commands psetlen and pset.

A straightforward implementation of pset would be to call begin() and end() on the set variable and iterate over the container's contents. However, this is not possible because the functions are inlined, thus invisible to GDB. A workaround would be to not inline these functions and use explicit template instantiations [4], as in Listing 7. But this would not only require a change of the standard headers, but every container type would need an explicit template instantiation—both unacceptable options. So setbegin, setend, and setnext are reverse-engineered using support functions setleft, setright, setparent, and setminimum.

Another problem with the initial implementation is the inelegant passing of the type and the corresponding typecast. I tried several approaches to get rid of the type argument. The first approach was to try some pointer arithmetic expressions to make the pointer type of $setloop the same as l_set.t.header.value_field. This approach was doomed because pointer arithmetic is only Standard compliant and thus guaranteed for pointers in the same array [5,6]. A second approach was an attempt to make use of the many typedefs in the templates, but this also turned out to be a dead-end street. An elegant solution finally emerged by using an undocumented GDB feature I ran across.

In Figure 2, what I want to do is cast a void pointer to another type that I do not know explicitly; for instance, I have an example variable of the type, but I don't know the type itself. The trick is to assign the address of the example variable to a user-defined variable $myvar. Then I assign the address of the user-defined variable to another user-defined variable $myvar2, which makes that a pointer to a pointer. Then I replace the value *$myvar2 with the void pointer I want to cast. This way, I can output the value of **$myvar2 and GDB knows about the type. A peculiar thing is that $myvar keeps its initial value. Applying the pointer type trick to the initial implementation of pset turns Listing 6 into Listing 8. Now it no longer needs to pass the type of the value.

A map and a set are the same thing. You can store a map as a set of pair<const Key, Value>. If your vendor used the same member variable names, you can just call pset mapvar; if not, you'll need some cut-and-paste to define a user-defined command pmap.

Figure 3 shows a summary of the GDB session. As you can see, the user-defined commands of GDB offer you flexibility. Spending some effort in reverse-engineering the standard containers gives you enormous benefits at debug time. You can then put your user-defined commands under source control and promote them to the organizational level for everyone's use.

References

[1] http://stlport.org or http://www.sgi.com/tech/stl.
[2] Sedgewick, Robert. Algorithms in C++, Addison-Wesley, 1992.
[3] Kruse, Robert L. and Alexander J. Ruba. Data Structures and Program Design In C++, Prentice Hall, 1999.
[4] ISO/IEC 14882, clause 14.7.2 Explicit instantiation.
[5] ISO/IEC 14882, clause 5.7 Additive operators.
[6] Stroustup, Bjarne. The C++ Programming Language, Addison-Wesley, 1997.