TIME DILATION AND RELATIVISTIC DEBUGGING

Looking for patterns in the real-time behavior of code

Edward N. Adams III

Ed is an independent contractor specializing in analysis and synthesis of microprocessor software. He can be contacted at 2782 Waverley Street, Palo Alto, CA 94306.


Time is Nature's way of keeping everything from happening at once.

-- Woody Allen

The fundamental question related to debugging is, "What happened, in what order, and when?" To answer this, you need to represent the time relations among events. Breakpoints and debug print statements suffice for a short problem, but they demand a lot of your time.

Some bugs, however, are manifest in a deterministic manner after thousands of events. Others are part of a nondeterministic trajectory through the code. In either case, you can debug much quicker using compact graphic displays. In a fraction of a second, you can read hundreds of events from such a display. Thus you can see both patterns and exceptions, and in many cases keep up with the real-time behavior of the code. Time dilates when you're debugging this fast, and you are doing relativistic debugging.

This article addresses the challenge of portraying time, both in the large and in the small. Several methods are presented for representing a single stream and multiple streams of events. The examples come from two different real-world applications. The first, represented in Figure 1 through Figure 6, is an embedded controller for a logic-speed data-acquisition device. It runs under pSOS, a real-time, multitasking operating system from the Software Components Group (San Jose, California), on a Motorola 68000 processor. It communicates with a host computer via SCSI. Although data is transmitted DMA, a few of the protocol bytes are transmitted one at a time, using the _scsi_send_char routine.

The second system, illustrated in Figure 7-Figure 9, is an avionics display subsystem running on a Performance Semiconductor 1750. The only task consists of a top-level dispatcher and 12 major subroutines. The scheduling algorithm is based on time and availability of inputs. Thus, the system can be viewed as a multitasking system with round-robin scheduling. In this system, the variable Process_phase acts as the current task index. The Process_6 routine is expected to have work to do about every 30-33 milliseconds.

All figures were captured and displayed on Biomation's (Milpitas, California) Variable Value Monitor (VVM-1), a logic analyzer-like instrument designed to track and debug code.

One Variable vs. Time

Sometimes you can see both the erroneous and the normal behavior of a software system in a single stream of data. Examples are:

Ordinal Time. Figure 1-Figure 3 represent ordinal time. This means that each represents the order of events, without showing the amount of time between them. So, in Figure 1, the variable Current_task points to one of three tasks: RAM01, CHILD, or SYSMGR. From left to right, the values are CHILD, SYSMGR, RAM01, CHILD, RAM01, and CHILD. This chart shows the pattern of task-switching behavior during a fraction of a second of execution.

Figure 2 shows an Address Scope monitoring the instructions of the _scsi_send_char subroutine. The y coordinate is the relative address in the subroutine; the x coordinate is ordinal time. A descending diagonal line represents one linear execution of the subroutine. Horizontally arrayed line segments indicate separate iterations of one loop. This picture shows eight calls to the subroutine, with two loops. The first loop iterates a variable a number of times. The C code for this subroutine (see Example 1) clarifies the reason for the looping.

Example 1: Code for subroutine _scsi_send_char.

  scsi_send_char(data)
  byte    data;
  {
      int     tries;

     scsi->write.data = data;

      /* Wait for ACKNOWLEDGE to go away. check for timeout */
      while (scsi->read.bus_and_status & BIT (0))
      ;

      /* Assert REQ so that MAC can accept data. */
      scsi->write.target_command |= BIT(3);

      /* Wait till ACK is asserted. */
      while(!(scsi->read.bus_and_status & BIT (0)))
          ;

      /* DeAssert REQ now that ACK has been asserted. */
      scsi->write.target_command &= ~BIT(3);

      /* Wait for ACKNOWLEDGE to go away. check for timeout */
     while(scsi->read.bus_and_status & BIT (0))
          ;

      return 0;
  }

Both in debugging and in performance analysis, measurement of time difference between two related events can be informative. In Figure 3, a Point A to Point B Strip Chart plots the execution time of subroutine _scsi_send_char. When the program gets to A, we note the time. When it gets to B, we note the time, subtract the two, and plot a point logarithmically on the y axis. Once again, the subroutine calls are displayed from left to right. This chart compactly represents over 70 calls to the subroutine. It's easy to see the usual behavior (deltat=2000 ticks) and a few outliers (deltat>5000 ticks).

The major strength of an ordinal time display is that it portrays the fine detail of the pattern of behavior.

Cardinal Time. Many people request to see the events in Figure 1-Figure 3 spaced out in time. When deltax on the display is proportional to deltat, the display is said to have a cardinal time scale. Figure 4(a) shows, in cardinal time, the data for the first part of Figure 2. In cardinal time, you can see the large gaps in time between a small number of clusters of events. Changing the time scale, as in Figure 4(b), uncovers the fine detail.

ILTI Time. How do you show both the size of the time differences and the fine structure of individual bursts in one display? One method is the Integrated Log Time Interval, or ILTI (rhymes with "guilty") scale. It is similar to ordinal time, except that adjacent events are separated by a distance proportional to log (deltat). Compare Figure 5, an ILTI Address Scope, with Figure 2 and Figure 4. The fine structure is as visible as in the ordinal time display, but high-order time differences are also noticeable. For example, you can see the time gaps between calls (the largest one being before the fifth call). Furthermore, the lack of horizontal gaps elsewhere indicates that no subroutine calls or interrupts intruded.

Plus deltaT. An alternative method of representing both the fine and global structures of time is to add a parallel deltaT Scope, as in Figure 6. The lower chart plots the time differences between successive events logarithmically on the y axis. Each event on the Address Scope corresponds to the time difference directly below it on the deltaT Scope. In this chart, most intervals are near one micros, corresponding to uninterrupted instructions. The seven much longer intervals correspond exactly to the gaps between the finish and start of the routine. Most of these gaps are 100micros, but a couple are much longer (2 ms and 100 ms).

This chart allows more horizontal compactness, and thus more events per horizontal inch, than the ILTI display. Furthermore, it shows the time intervals in much more detail.

Time Correlation

Debugging frequently requires investigating the correlations between hypothesized causes (or enabling conditions) and observed effects, both good and bad. Such investigations are aided by viewing parallel streams of events, and clear representation of time becomes even more important.

Figure 7-Figure 9 correlate an Address Scope and a Data Scope, but the same principles apply to any combination of events.

Ordinal Time. The Address Scope in Figure 7 shows the beginning of the Process_6 subroutine. The Data Scope shows the value of the Process_phase variable, similar to a current task id. Both use the ordinal time scale. In this case, I wanted to see the behavior before and after Process_phase takes on the value 4. In particular, is Process_6 called exactly whenever Process_phase takes on value 6?

In the real world, the events of the two streams occur in interleaved order. We need to preserve this order, while separating the two streams. Whenever the subroutine is executing, we display the Process_phase variable value as constant. Conversely, when the variable value is changing, we display the subroutine address off the screen (-1).

The ordinal display shows the fine structure, both within and between the streams.

Cardinal Time. Once again, when you want to see one level of a time interval and don't care about individual events, use a cardinal time scale. Compare Figure 8 with Figure 7. Now it becomes perfectly clear that an event in the Address Scope coincides with the data variable taking on the value 6. If the data variable were merely suspected of having an effect on the subroutine, this chart would make the coincidence obvious. Then you could investigate further, perhaps with the ordinal time chart.

Plus deltaT. In order to see both the fine and the coarse structure, add a deltaT Scope, as shown in Figure 9. Here the Data Scope, Address Scope, and deltaT Scope charts are all displayed in ordinal time. This allows immediate correlation between the streams, and between the streams and the time differences down below.

The sub-microsecond intervals correspond to instructions of the subroutine. The multi-millisecond intervals correspond to changes to Process_phase. The 1-millisecond peak at address 11 of the Process_6 indicates that Process_6 called a subroutine. If the peak weren't always at address 11, it would indicate an interrupt instead.

Conclusion

Relativistic debugging requires efficient display of events, their relative orders, and their absolute times. Different graphic representations of event streams give these relationships different emphases. But they all share the property of compressing hundreds or thousands of events into a comprehensible whole.

Ordinal time emphasizes the fine structure to the total exclusion of time intervals. Cardinal time gives a rapid view of one order of magnitude of time interval. The ILTI time scale retains the fine structure, while adding a flavor of the largest of time intervals. For display of all orders of magnitude of time, the correlated ordinal deltaT graph is most effective.

In short, the ideal method of dilating time depends on the observer's state of mind.


Copyright © 1992, Dr. Dobb's Journal