LETTERS

Rhealstone Recommendations

Dear DDJ,

This letter is in response to the article "Implementing the Rhealstone Real-Time Benchmark," by Rabindra P. Kar, which appeared in the April 1990 issue of Dr. Dobb's Journal. There are several areas of the benchmark that could be improved, which we discuss below:

    1. Synthetic benchmarks and kernels are generally held in lower regard now than in the past by experts in the field of performance evaluation. The detailed reasons are too long to include here, but for an excellent explanation please read Chapter 2, "Performance and Cost," of Computer Architecture: A Quantitative Approach, by Hennessy and Patterson (Morgan Kaufmann, 1990). Their recommendation for evaluating performance is to use a workload consisting of a mixture of real programs. Hennessy and Patterson say that synthetic benchmarks, such as Whetstone and Dhrystone, are the least accurate of all predictors of performance. We would hope that Rhealstone can avoid some of the limitations of such benchmarks.

    2. Rhealstone uses average times rather than worst-case times. In many real-time applications, worst-case time is more important than average-case. In fact, a prime characteristic that distinguishes real-time systems from data processing systems is that a real-time system has deadlines that must be met in order for the system to work properly. Average throughput is of some interest but not the critical factor. In order to determine whether the deadlines will be met, the worst-case response times must be measured.

    3. The article states that the set of six time values will typically be in the tens of microseconds to milliseconds range. Note that this is a range of two orders of magnitude. However, to compute the "real-time figure of merit" the equation takes all six values and finds the arithmetic mean of them. The result is that those components whose values are small, such as interrupt latency, will contribute negligibly to the total. Similarly, large values (such as infinity for deadlock-break time on many kernels) will contribute enormously, to the point that the number of Rhealstones per second is zero. The article does allow an application-specific measurement to assign unequal weights to the components, but for the standard measurement, the weights are all equal. Hennessy and Patterson present arguments for using geometric means, rather than arithmetic means, to avoid some of these problems.
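The point about means can be made concrete with a small calculation. The sketch below uses six hypothetical component times (not taken from the article) spanning the two-orders-of-magnitude range the letter mentions, and compares the arithmetic and geometric means:

```python
import math

# Hypothetical component times in microseconds (illustrative values only,
# not from the article); note the spread of several orders of magnitude.
components = [15.0, 40.0, 60.0, 120.0, 800.0, 5000.0]

arithmetic_mean = sum(components) / len(components)

# Geometric mean computed as exp of the mean of logs, which avoids
# overflow for large products.
geometric_mean = math.exp(sum(math.log(t) for t in components) / len(components))

# The single 5000-microsecond value dominates the arithmetic mean almost
# entirely, while the geometric mean weighs each component's *ratio* equally.
print("arithmetic mean: %.1f us" % arithmetic_mean)
print("geometric mean:  %.1f us" % geometric_mean)
```

The smallest component (15 microseconds) could double without visibly moving the arithmetic mean, which is exactly the letter's objection to equal additive weights.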

    4. Regarding the definition and benchmark for measuring interrupt latency, there are really three aspects of kernels that affect interrupt response: interrupt disable time, interrupt latency, and interrupt service time.

The interrupt disable time is the worst-case time during which the kernel keeps interrupts disabled (at task level) in order to manipulate critical data structures. The processor cannot respond to interrupts at all during this time. Unfortunately, it is extremely difficult to measure this value, since there is little or no external indication from the processor that interrupts are disabled. You can measure it statistically using sophisticated hardware (emulators, oscilloscopes, and so on), but this measurement may not yield the true worst case. The only reliable way to determine it is to identify the instruction stream in the kernel that disables interrupts for the longest period, and then calculate the timing of those instructions using worst-case figures for the processor. This may include cache misses, prefetch-queue fills, memory wait states, and so on.

Interrupt latency is the time it takes from the point an interrupt is acknowledged by the processor, up to the first application instruction in the interrupt service routine. The article describes one way to measure this value. For some kernels, this value is simply the time taken by the processor to respond to the interrupt, since they require no preamble or vectoring code through the kernel, but instead let the interrupt vector point directly to the interrupt service routine.

Interrupt service time is the time it takes, at interrupt level, to do whatever work is required by the kernel and application before returning to task level. Since nested interrupts at the same priority generally are not allowed during this period (although in some systems they can be), this timing can significantly affect interrupt response to bursts. This timing can be measured, given the appropriate test case. It should be recognized that different kernels have different mechanisms for servicing interrupts, and some have more alternatives than others. We suppose a particular kernel should be able to put forward its best case for this measurement.

Any benchmark attempting to define interrupt response characteristics should address all three of these factors. Rhealstone currently measures only interrupt latency.

    5. The Rhealstone benchmark as described measures a small set of cases that actually measure more or less the same thing: the time it takes to switch from one task to another in a system with two or three tasks. This approach fails to address an important aspect of kernel performance: the kernel's behavior when large numbers of tasks are present. Some kernel designs use linear linked lists for handling ready tasks. This approach can be fast with small numbers of tasks, but degrades when large numbers of tasks are present and are made ready in a particular order. Other designs have flat response time relative to the total number of tasks. Any benchmark should address this issue, for example by measuring task-switch time in a system with a hundred tasks.
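A toy model can illustrate why a priority-ordered linear list degrades with task count while a per-priority-queue design does not. The sketch below is illustrative Python (real kernels do this in C or assembly) and counts comparisons rather than measuring time; the data structures and names are hypothetical, not any real kernel's:

```python
def insert_sorted_list(queue, task):
    """Priority-ordered list: an O(n) scan finds the insertion point."""
    comparisons = 0
    i = 0
    # Each task is a (priority, name) tuple; higher number = higher priority.
    while i < len(queue) and queue[i][0] >= task[0]:
        comparisons += 1
        i += 1
    queue.insert(i, task)
    return comparisons

def make_ready_bitmap(queues, bitmap, task):
    """Per-priority FIFO queues plus a ready bitmap: constant work
    no matter how many tasks are already ready."""
    prio = task[0]
    queues.setdefault(prio, []).append(task)
    return bitmap | (1 << prio)

# Making a low-priority task ready degrades with the number of ready tasks:
queue = [(p, "task%d" % p) for p in range(100, 0, -1)]  # 100 ready tasks
cost = insert_sorted_list(queue, (0, "background"))     # scans all 100 entries

# The bitmap design touches one queue and sets one bit regardless of load:
queues, bitmap = {}, 0
bitmap = make_ready_bitmap(queues, bitmap, (0, "background"))
```

With three ready tasks the linear list costs three comparisons; with a hundred, a hundred — which is exactly the degradation the letter asks a benchmark to expose.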

    6. The standard definition of deadlock is a situation in which each member of a set of tasks is waiting for a resource that is owned by another task in the set, in such a way that none of the tasks (regardless of priority) are able to proceed. What the article refers to as a "deadlock" is called priority inversion in standard terminology.

    7. Since many real-time operating systems do not implement priority-inheritance or priority-ceiling protocols, which automatically solve the priority inversion problem, the "deadlock-break time" will be infinity for such systems. Although it is legitimate to ask whether or not a kernel has a particular feature, it should be expressed as such, rather than masking it as a "performance measurement."

There are other techniques available for avoiding priority inversion in these systems through explicit control by the application. The simplest, if the application allows for it, is to avoid having tasks of unequal priority compete for the same resources. If this is not possible, the following method can be used. Assume the classic example of three tasks H, M, and L, with respective priorities high, medium, and low. Tasks H and L share a common resource; task M is unrelated to H or L and does not compete for the resource. When L wants to acquire the resource, it should explicitly raise its priority to that of the highest-priority task that also uses the resource (H) before taking it. This requires a few extra system calls, but achieves the purpose. It behaves a little differently from the iRMX/iRMK style, which lets M run briefly until H waits on the resource; the former behavior may be more efficient, since it avoids two context switches into and out of M. On the other hand, it blocks M temporarily even if H never wants the resource during the time L holds it.
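The raise-before-acquire technique can be sketched in a few lines. The task class, priority values, and function names below are purely illustrative, not any real kernel's API:

```python
HIGH, MEDIUM, LOW = 3, 2, 1  # higher number = higher priority

class Task:
    def __init__(self, name, priority):
        self.name = name
        self.base_priority = priority
        self.priority = priority  # effective (possibly raised) priority

def acquire_with_ceiling(task, ceiling):
    # Raise priority to that of the highest-priority user of the
    # resource *before* taking it, as the letter describes.
    task.priority = max(task.priority, ceiling)

def release(task):
    task.priority = task.base_priority  # drop back after releasing

L = Task("L", LOW)
M = Task("M", MEDIUM)

acquire_with_ceiling(L, HIGH)   # L now runs at H's priority
# While L holds the resource, M cannot preempt it, so the classic
# inversion -- M running indefinitely while H waits on L -- is avoided.
assert L.priority > M.priority

release(L)
assert L.priority == LOW
```

Note the trade-off the letter identifies: M is locked out for L's entire critical section, even on runs where H never asks for the resource.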

    8. Rhealstone does not include a benchmark which measures "broadcast" wake-up of several tasks, using event flags. Since this feature is commonly used in real-time systems, we suggest that this be added.

    9. Since Rhealstone is described in English, each implementation of it may be written in a different programming language, make different operating system calls, and make various assumptions about what the English text really means. This will make it impossible to compare benchmark results across different systems. To have validity, the benchmark must be written in one programming language, call one set of operating system services, and have one master implementation that everyone uses.

To summarize, we feel that these issues should be addressed to improve the validity of the proposed Rhealstone benchmark. However, even with these corrections, the predictive accuracy of a synthetic benchmark such as Rhealstone can never come close to that of an actual real-time program. We feel that benchmarks for real-time should describe or incorporate actual real-time programs. These programs will presumably call some proprietary real-time operating system services, so the operating system calls should be converted to a standard, portable operating system interface such as POSIX, with its Real-Time Extensions. Only then can meaningful performance measurements be obtained.

Glenn Kasten, Ready Systems
David Howard, Ready Systems
Bob Walsh, Ready Systems

Robin responds: The authors of this letter raise some interesting and valid issues about the Rhealstone benchmark. Responding to each issue in-depth would make this almost a full-length article, so I will address some of the more important objections that they have raised.

One major criticism of Rhealstone is that it specifies the measurement of average times rather than worst-case times. This criticism stems from the notion that real-time systems (hardware + real-time OS or kernel) are benchmarked primarily to validate their critical response-time capability. The expectation here is that the Rhealstone number achieved by the system should indicate if it will meet interrupt response or other deadlines with 100 percent certainty. The reality is that benchmarks are used, by the computer industry, to evaluate and compare average, long-run performance of "typical system operations," after determining that the system can meet the job's minimum requirements. There is little doubt that any widely used real-time benchmark will be put to similar use. Consequently, Rhealstones measure average performance of "typical real-time operations." Running Rhealstones will not lift the burden of determining worst-case response times from the application designer's shoulders; it was never intended for that purpose.

Section 4 in the letter states that interrupt disable time, interrupt latency, and interrupt servicing together affect interrupt response, and further claims that Rhealstones measure only latency. My article defines interrupt latency as the delay between the CPU's receipt of an interrupt request and the execution of the first application-specific instruction. Their letter defines latency from the point that an interrupt is acknowledged by the CPU. In effect, Rhealstones measure both interrupt disable time and "latency" as defined in the letter. They do not measure interrupt service time, because that is entirely a function of the application, NOT the system.

Section 5 points out that real-time system performance may be impacted if the system is running a large number of concurrent tasks. It suggests that the benchmark should measure task-switch time, for example, with a hundred active tasks. Their point is well taken but, I believe, it is inappropriate for the benchmark to specify what the "background load" should be. Why a hundred tasks? Why not five tasks or five hundred? And what should each task be doing? Background loading is a very application-specific issue. The only way to obtain a generic benchmark number is to use an unloaded system (no background application tasks).

The most important objection raised in this letter is that we should not be trying to devise or use synthetic benchmarks like Rhealstones (or Whetstones or Dhrystones) at all. A book on computer architectures is cited to support the assertion that "a mixture of real programs" can be a better vehicle for performance evaluation than a synthetic benchmark. We could have a long and vigorous debate on that point. But even if it were true, Kasten, Howard, and Walsh seem to have overlooked some major drawbacks of using a suite of actual real-time programs as a benchmark.

Finally I'd like to thank Kasten, Howard, and Walsh for their thoughtful response to my benchmarking proposal. The absence of a widely used real-time benchmark standard often leads to the use of inappropriate general-purpose benchmarks (Whetstone, Dhrystone) by real-time application designers. The Rhealstone benchmark was proposed to help fill this void, though neither Rhealstones nor any other benchmark will do justice to every real-time situation. If it seeds and motivates further creative effort and discussion in the real-time software community, the Rhealstone proposal will have more than served its purpose.

SEGTABLE and Windows 3.0

Dear DDJ,

Facing the start of a Windows development, I recently reread the article which Tim Paterson and Steve Flenniken contributed to the March 1990 issue of DDJ, "Managing Multiple Data Segments Under Microsoft Windows." It was a good article. But I have a few questions about its applicability to Windows 3.0.

The obvious first question is: Does Windows 3.0 support the undocumented call they use to register the local segment table? Since recent Microsoft apps that ran under Win 2.0, and that they say use these techniques, run unchanged under Win 3.0, I would guess that the answer is yes. The world does look a bit different in Win 3.0, though, since the segment table no longer holds segment numbers but rather protected-mode segment descriptors. Again, I would guess that this shouldn't make too much difference.

In the second article in the series, Paterson and Flenniken spend time discussing how the segment table technique fits in with EMS. Since Win 3.0 manages all memory on a machine, I would think that one need no longer worry about EMS, but should write an application as if one had much more global memory available for allocation. One might still be memory constrained on a machine with 1 Meg of RAM running Win 3.0 in real mode, but now it seems that the solutions are to write the application to use less RAM (which might include roll-your-own virtual memory) or to require more RAM installed as extended memory and use Win 3.0's protected mode.

I actually find it a bit disturbing that Microsoft uses these techniques in their products but does not document them. It seems to give the lie to their assertion that Microsoft apps writers have no secret information about Windows that gives them a competitive advantage.

In addition to commenting on the above, I would appreciate any other comments Tim Paterson might have with regard to using memory in Win 3.0.

Steve Williams

3Com Corporation

Santa Clara, California

Tim responds: Steve asked some very good questions about the relationship between Windows 3.0 and the segment table I described in my February/March article. Fortunately, the answers are surprisingly simple, once you know a little bit about the new Windows.

Windows 3.0 has three operating modes, selected when it starts up. Real mode is identical to Windows 2.x, so it can run all Windows 2.x applications unchanged. Real mode even works on 8088/8086 processors, just as Windows 2.x did. Real mode also supports the segment table exactly as I described in my article. In fact, the new Windows Software Development Kit (SDK) includes a description of the DefineHandleTable() function that kicks in the segment table, although the one-paragraph description is a little thin to convey a full understanding of its application.

The other modes for Windows 3.0 are Standard mode (or 286 mode) and 386 Enhanced mode. From the standpoint of a Windows application, these modes are the same; their main difference is in how they handle non-Windows programs. These modes run the processor in protected mode, where the values loaded into the segment registers are "selectors," not paragraph addresses. Even when the Windows memory manager moves or discards a segment in protected mode, the selector never changes; as far as the application is concerned, there is no memory movement. The concept of the segment table is meaningless, and the DefineHandleTable() function is ignored.

In other words, if you're willing to limit your applications to 286 or better processors running Windows 3.0, ignore my article. Not only will you get to pass far pointers around willy-nilly with no concern for memory movement, you will also get transparent access to extended memory. But if you want to run on the older machines (8088/8086) or with Windows 2.x, Windows 3.0 Real mode changes nothing. It appears that most companies, including Microsoft, will be taking the first approach.

I, too, found it disturbing that Microsoft applications were using a technique that was not documented -- that's why Steve Flenniken and I wrote the article. However, people at Microsoft left the impression that this was more of an oversight (or maybe just too much trouble to document) than intentionally holding back. I find myself working for Microsoft once again, and no one has complained to me that I told a secret.

Hypertext Caveats

Dear DDJ,

Regarding your June 1990 hypertext issue: How do I usually read DDJ? I usually read it on the train or as bedside reading. I also usually mark up the listings. Hypertext has its place, but not in the quiet contemplation that must accompany learning.

John O. Goyo

Port Credit, Ontario, Canada

Patents Cont.

Dear DDJ,

Lacking Barr Bauer's special qualifications and insights into the patent ethos, I would nonetheless like to reply to his letter which appeared in the July 1990 edition of DDJ. Mr. Bauer makes a convincing case that an inventor or investor should be encouraged to advance the cause of technological innovation. Whether patent laws do this or not is not clear.

It appears that patent laws can benefit large, well-financed organizations, but this benefit comes more from the ability to defend themselves in court than from any innate protection afforded by the laws themselves. Many innovations of the twentieth century (button-release socket wrenches, intermittent-control windshield wipers, FM radio, television, to name only a few) have been appropriated from their original inventors and exploited by large organizations. And, ironically, a significant part of the success story of American industry in the late 1800s and early 1900s and beyond is the story of industrial espionage and the infringement of some key European patents.

Whether or not investors and inventors benefit in the long run from patent laws, it seems to me that a larger question when the laws are applied to software is whether the craft or society itself benefits. As Mr. Bauer points out, changes in software come so rapidly that patents may outlive their usefulness well before they expire. If this is so, how can developers ever hope to build on existing software to further the state of the art?

I think patenting will tend to further fragment the development of software, adding unnecessary costs and delaying true innovation. Isaac Newton said "I have stood on the shoulders of giants," and by this he meant: No programmer is ever going to get anywhere if he has to go back to square one every time he boots his system.

Phil Wettersten

Chillicothe, Ohio

We welcome your comments (and suggestions). Mail your letters (include disk if your letter is lengthy or contains code) to DDJ, 501 Galveston Dr., Redwood City, CA 94063, or send them electronically to CompuServe 76704,50 or via MCI Mail, c/o DDJ. Please include your name, city, and state. We reserve the right to edit letters.