The Perl Journal, October 2004
At my company, we recently faced a dilemma: we had a Perl process that ran along, consumed 2 GB of RAM, then died without a peep (not even a whimper). This process required about an hour on a quad i686 Linux box to do its magic. But in this instance, we got nothing; there was no result, no return code, no reports; not even a message in the error log.
What to do? At this point, the conversation between the programming team and our boss went something like this:
The answer in a case like this is a special kind of code test called a "harness." Like profiling, it gives you intimate information about what functions are being invoked, how often, and for how long. It goes beyond profiling in that any kind of analysis can be performed at the moment of each function's invocation (and/or return). It's kind of a cross between profiling and tracing.
Most languages perform this type of analysis by subclassing, code templates, applying test interfaces to the classes under test, or by time-consuming (and expensive) execution using special test environments. This usually means recompiling the program, redeploying the application, perhaps deploying an expensive test tool, and then rerunning the test.
Perl is a special kind of creature, though (as if you didn't know). What many people do not realize (or fail to grasp the significance of) is that Perl is compiled each time it is run, and classes and methods can be redefined after they have been initially defined, at runtime. This feature can be used to implement a powerful, easily used, generalized benchmark harness.
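The mechanism is plain Perl. Here is a minimal sketch (not Benchmark::Harness's actual code) of hooking a subroutine at runtime: save the original code ref, then install a wrapper by assigning into the typeglob. The `Fubar::process` target is an illustrative stand-in:

```perl
use strict;
use warnings;

package Fubar;
sub process { return 2 * shift }   # the target we want to watch

package main;

# Save the original code ref, then install a wrapper in its typeglob slot.
my $original = \&Fubar::process;
{
    no warnings 'redefine';
    *Fubar::process = sub {
        print STDERR "Entry(Fubar::process)\n";   # entry hook
        my @result = $original->(@_);             # call the original
        print STDERR "Exit(Fubar::process)\n";    # exit hook
        return wantarray ? @result : $result[0];
    };
}

print Fubar::process(21), "\n";   # still returns 42, now with hooks around it
```

Callers are none the wiser: the wrapped subroutine is invoked by the same name, and the hooks run at entry and exit.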
Benchmark::Harness (available on CPAN) is a module that does most of this work for you. You give it a list of modules/subroutines you want it to investigate (wildcards are accepted), and it interrupts the process at the entry and/or exit of these subroutines to perform whatever operations you want to perform at that time. It then hands control back to the subroutine or caller as if nothing had happened. The CPAN distribution already contains a few sets of such operations you can apply simply by naming one in your parameters to the harness. You can create new sets of operations by using one of these samples as a template.
The graph in Figure 1 was created by a laborious process of sitting and watching the process run, writing down statistics from top, entering those in a table, and graphing them. Let's look at an easier way to do that with Benchmark::Harness.
At the beginning of the process, we insert a single line to activate the harness:
new Benchmark::Harness('MemoryTrace', 1, 'Fubar::.*');
MemoryTrace names the module that will handle the interrupts. The second parameter, "1", controls the disposition of the report. The final parameter, which may be repeated as often as necessary, names the subroutines of the modules you want to harness. That's all there is to it!
Benchmark::Harness and MemoryTrace will create an XML file containing statistics about the runtime environment at the entry and exit of each subroutine in the Fubar module (Fubar::.*). The XML file will contain a simple list of elements, one for each event that is harnessed, containing this data. For instance:
<T t="60" c="5.0" n="Entry(Fubar::processItem)" m="245"/>
This is very compressed. The report may contain many thousands of events, so the names of elements and attributes in the XML are kept short: t is the elapsed time of the event, c the CPU usage, n the name of the event, and m the memory in use.
This element is created by MemoryTrace, so if you were to install your own set of harness operations instead of MemoryTrace, you could print any kind of information you like. For our purposes, c and m are the data we want.
Since it's in XML, it is incredibly easy to use this data. We ran it through a Perl script to convert it to a graph using GD::Graph::lines. Don't be intimidated by GD; it is very easy to use. Most of the folderol is for setting up special graphical characteristics, and that is standard stuff for any graph:
use GD::Graph::lines;
use XML::DOM;
# Parse the T elements from the report
my $doc = XML::DOM::Parser->new->parsefile('Fubar.harness.xml');
my @T = $doc->getElementsByTagName('T');
# Translate the data to GD::Graph format
my $data =
[
[ map { $_->getAttribute('t') } @T ],
[ map { $_->getAttribute('m') } @T ],
[ map { $_->getAttribute('c') } @T ]
];
# Plot the graph
my $graph = new GD::Graph::lines(500,250);
$graph->set(
# numerous GD::Graph::lines parameters go here . . .
);
$graph->plot($data);
# Write the graph to a file
my $ext = $graph->export_format;
open(IMG, ">Fubar.graph.$ext") or die "Cannot open: $!";
binmode IMG;
print IMG $graph->gd->$ext();
close IMG;
The result is shown in Figure 2.
OK, fine, but that's only what we knew when we gathered this data by hand earlier. What we want to know is which subroutine is eating all the memory, and where it died. So we'll analyze the report a little more.
XSL is extremely valuable in analyzing XML documents and it shouldn't be gratuitously overlooked, but it doesn't create pretty pictures. We like pretty pictures, so we decided to extend the graphical representation we began previously.
Through a process of legerdemain (which we will not go into here), we transformed the data into arrays marking the entry and exit point of each subroutine, and then graphed those as lines alongside each other on the memory/CPU graph. Some minor trickery in GD::Graph::lines was required, but it is simple:
# Draw the Memory and CPU data
$graph->plot($data);
# Set parameters to draw the per-function lines
$graph->set(
# a new set of GD::Graph::lines
# parameters goes here
);
# Print a graphic line for each function
# @functionData was generated via legerdemain
for ( @functionData ) {
my $data = [
[ map { $_->getAttribute('t') } @T ],
[ @$_ ],
];
$graph->plot($data);
}
By the way, a SAX parser can parse XML even if the document is incomplete (as it would be in this case of catastrophic failure). This is an important consideration in selecting which XML parser you use to analyze the report.
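A stream parser's handlers fire as each element arrives, so everything read before the truncation point survives. A sketch with XML::Parser (an expat-based stream parser); the document here is an illustrative fragment cut off mid-stream, as a crashed process would leave it:

```perl
use strict;
use warnings;
use XML::Parser;

my @events;
my $parser = XML::Parser->new(
    Handlers => {
        # Collect each <T> element's attributes as it is seen
        Start => sub {
            my ($expat, $tag, %attr) = @_;
            push @events, {%attr} if $tag eq 'T';
        },
    },
);

# A report truncated mid-element by a catastrophic failure
my $truncated = q{<MemoryUsage><T t="60" c="5.0" m="245"/><T t="90" c="7.1"};

eval { $parser->parse($truncated) };   # dies at the truncation point...
print scalar(@events), " complete event(s) recovered\n";
```

The eval absorbs the parse error, and every complete element seen before the failure is already in @events.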
Figure 3 is a graph of subroutine activity. There is a legend (not shown here) that identifies what subroutine each horizontal line represents. We see the last subroutine that is called that is still running at the end (top-right gray line), but we've also identified which subroutines participate in the ramp up.
We now have a better view of what is going on in our errant process, but we'd like to know more. It's easy, with a harness, to find out more about what's happening.
We'll introduce some terminology here, and then extend the example. The moment at which each method is invoked is called an "event." The operation that is performed by the harness at that time is, therefore, an "event handler." To those familiar with GUI applications, or with real-time applications, these are familiar terms, and they mean the same thing in this context, with one exception. In a GUI or real-time API, the API determines what events there are. In the harness model, we determine what events are fired. We declare new types of events simply by naming methods of our program in the new Benchmark::Harness statement.
Once the event is captured in the event handler, we can do anything we want. It is Perl, after all.
When the process enters a subroutine, Harness calls the event handler named OnSubEntry(). When it exits the subroutine, OnSubExit() is called. These are defined in MemoryTrace, which in turn is named by the new Benchmark::Harness statement. We can subclass MemoryTrace, or we could simply write our own event handlers module. For simplicity, let's take the second route, and for laziness, use the original MemoryTrace.pm as our initial template.
When the harness calls OnSubEntry() or OnSubExit(), it hands over the parameters of the target subroutine or that subroutine's returned result, respectively. Thus:
package MyHarness; use base qw(Benchmark::Harness);
sub OnSubEntry { # (self, funcName, funcParms)
my $self = shift; # MyHarness object instantiated
# by new Benchmark::Harness()
my $funcName = shift; # Name of harnessed function
printf "%s has %d parameters\n", $funcName, scalar(@_); # Our 'trace'
return @_;
}
sub OnSubExit { # (self, funcName, funcReturn)
my $self = shift;
my $funcName = shift;
if (wantarray) { # wantarray context is preserved
print "$funcName returns an array - ", scalar(@_), " items\n";
return @_;
} else {
print "$funcName returns a scalar - $_[0]\n";
return $_[0];
}
}
You can print anything you like in your handler, but Benchmark::Harness provides a context to easily generate the kind of XML formatted report we illustrated above. You do this with one or more calls to $self->harnessReport().
harnessReport() takes any number of parameters, in any order, to specify contents of the <T> element that will be generated. The <T> element will actually be generated after you return from OnSubEntry() or OnSubExit(), so all parameters you supply via harnessReport() will be included at that time.
The parameters of harnessReport() are distinguished by their reference type:
Back to our example. We had now determined that the sudden rise in RAM use occurred in a subroutine (we'll call it subroutine Y()), but more interestingly, it locked into high RAM use, without changing, in another subroutine (which we'll call subroutine Z()). It looked like an infinite loop in Z().
A major benefit of writing a program as smaller functions is to ease analysis and debugging. Unfortunately, the coder (who shall remain nameless) who wrote this program didn't follow this dictum, and wrote Z() as a single 400-line subroutine. Where to start?
We knew which subroutine was causing the problem, but there was a lot going on inside it. (This isn't really surprising. After all, if the solution were simple, we probably would have found it earlier.)
What we needed was a mechanism to analyze variable use within the subroutine, and also possibly a line-by-line trace. This could be done by inserting traces into Z(). This would have been a perfectly adequate approach, since we had localized the problem in one place. We wouldn't have had to put traces everywhere. However, this violates the purity of the harness model by requiring changes in the target code, so there should be a better way. Those of you familiar with Perl's tie operation and the Tie::* modules can probably see the obvious application of these facilities here. We didn't go this direction in solving our problem, but I invite someone out there to perform this extension to Benchmark::Harness (if I don't get to it first!).
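For the curious, a minimal sketch of the Tie::Scalar route: a class inheriting from Tie::StdScalar logs every write to a tied variable, with no change to the code that uses the variable. The variable name and values are illustrative:

```perl
package Trace::Scalar;
use strict;
use warnings;
use Tie::Scalar;
our @ISA = ('Tie::StdScalar');   # inherit default scalar behavior

# Log every write to the tied variable, then defer to the real store
sub STORE {
    my ($self, $value) = @_;
    print STDERR "STORE: ", defined $value ? $value : 'undef', "\n";
    return $self->SUPER::STORE($value);
}

package main;
tie my $rowCount, 'Trace::Scalar';
$rowCount = 10_000;     # logged to STDERR
$rowCount += 5_000;     # a FETCH, then another logged STORE
print "final: $rowCount\n";
```

The target code assigns to $rowCount exactly as before; only the tie statement is new, which keeps the harness spirit of leaving the code under test untouched.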
Browsing through the raw benchmark report, I noticed a strange phenomenon. We had just enhanced the harness to discover memory usage with Proc::ProcessTable (which is fast) instead of "ps -r" (which is slow). Just where memory use peaked and flattened out, the number recorded in the report flipped from a large positive to a large negative number. What a coincidence, I thought: Proc::ProcessTable is having a problem rendering this number at exactly the same place our process stalls.
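That flip is what signed 32-bit overflow looks like: 2 GB is exactly 2^31 bytes, the ceiling of a signed 32-bit integer, so a byte count stored in one wraps negative right there. A short demonstration (the byte count is illustrative):

```perl
use strict;
use warnings;

# Force a byte count through a signed 32-bit slot ('l' = signed 32-bit long)
my $bytes = 2_200_000_000;                      # a bit past 2 GB
my $as_32 = unpack('l', pack('l', $bytes));
printf "%d stored in 32 signed bits reads back as %d\n", $bytes, $as_32;
```

Any value past 2,147,483,647 wraps around to a large negative number, which matches the sign flip we saw in the report.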
We began to wonder: was there something magical about the 2-GB threshold? It's a nice round number, after all. We found a few spots in the ramping-up subroutine to reduce memory use by 5 or 10 percent, and ran it again.
This time, the process went further. Past 30 minutes, past 2 GB, right up to 2.6 GB in two hours. It was still running 36 hours later; still with no output.
That meant there was no magical 2-GB ceiling, and that the process was crawling to a near halt during its HTML rendering. We use a popular HTML rendering module (whose name is withheld to protect the innocent). It is being swamped by our input data. There isn't much hope of improving things by reducing memory use or making our preprocessing more efficient; the same number of data points will still go to rendering. The problem is the entire process itself, rather than any one component of it. There is just too much data for HTML rendering.
So in this case, Benchmark::Harness has helped us uncover a fundamental flaw in our design. We have not decided what to do about this problem yet, but thanks to Benchmark::Harness, we know that no matter how much code tweaking we do, it won't do any good. That realization has saved us a lot of time and trouble. Now that we know more about what we are up against, we can make a wiser choice. Maybe this report, in its current form, isn't really required at all. Perhaps we can just save the intermediate result and take slices of that to create subsidiary reports that may be as useful (or more so) than our original version. Regardless, the harnessing process has taken us out of the woods and the path ahead is clearer than it was before.
As the legendary Sherlock Holmes said, once you have eliminated all other possibilities, whatever is left (no matter how improbable) must be the case. What better way to eliminate all other possibilities than with a tool that searches for problems everywhere at once, as the harness model does?
The harness model addresses many of the traditional methods for solving problems in your code, including testing, analysis, profiling, and tracing. What is especially nice is that it addresses all of them at once, with an easy-to-implement method for activating the analyses.
But what about code review: learning how your target program is structured and how it actually works? The harness model has another powerful capability, one that is especially relevant to Perl.
One of the frustrations of working with Perl is that it can create new methods and properties for any class/module on the fly (which is both good and bad; few swords are single-edged). This means that it can be difficult to know what methods and properties are actually incorporated in a program, even through intensive code review. Even here, the harness model can come to the rescue.
Since a harness can peek into every object at runtime, it could be enhanced to accumulate all the methods and properties it discovers in each object it encounters, as the program is run and as Perl instantiates new methods and properties along the way. Transform the results via XSL or Perl, and voilà: instant class diagram!
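The discovery itself means walking a package's symbol table (its "stash") and keeping every name with a code slot. A minimal sketch, using a throwaway package as a stand-in for the real modules:

```perl
use strict;
use warnings;

package Fubar;                      # illustrative stand-in target package
sub new         { bless {}, shift }
sub processItem { return 1 }

package main;

# Walk the package's stash and return every name that has a code slot
sub discover_subs {
    my $pkg = shift;
    no strict 'refs';
    return sort grep { defined &{"${pkg}::$_"} } keys %{"${pkg}::"};
}

print join(", ", discover_subs('Fubar')), "\n";   # new, processItem
```

Run at intervals during execution, a walk like this also catches methods that Perl creates on the fly after startup.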
I leave this as an exercise for the reader. At my company, this may be our next step: marrying Benchmark::Harness with Devel::Diagram to create beautiful class diagrams of Perl modules.
For more information about Benchmark::Harness and to participate in its continued development, visit Benchmark-Harness.org/.
TPJ