Dr. Dobb's Journal March 2001
Demand for highly integrated wireless systems is increasing as silicon providers respond with a broad array of intellectual property, increasingly dense technologies, and an industry-wide focus on System-On-Chip (SOC) design tools and integration methodologies. The trend toward reducing the number of components in these systems clearly has the advantage of reducing cost, overall power consumption, and manufacturing complexity.
On the other hand, developers are left with the daunting task of realizing complex devices with increasingly reduced visibility of subsystem interaction. These devices use programmable microcontroller (MCU) and digital signal processor (DSP) cores coupled to embedded memories and a myriad of peripheral modules on a single chip. This trend toward more highly integrated systems is clearly increasing the need for improved methods of system validation.
SOC design methodologies for programmable cores now include static debug blocks, which may be used during the early stages of product development. By including additional debug-related capability, on-chip suppliers are offering designers the ability to fully understand the behavior of a given system, including validation of both hardware and software architectures and their interdependence. This is essential for evaluating real-time power consumption in Internet-ready handheld devices.
In this article, we'll examine the issues associated with developing power-efficient handheld wireless devices and the necessary on-chip debug capability needed for rapid product development. To illustrate what's involved in debugging highly integrated multiple core systems on a single chip, we'll use Motorola's M-CORE M341 micro-RISC processor as an example. We'll also discuss an implementation of a real-time debug port based on the IEEE Industry Standards and Technology Organization (ISTO) Nexus 5001 Forum specification.
To better appreciate the problems in developing low-power/high-performance systems, a cellular handset is a good place to start. A digital cellular handset can be partitioned into three main sections; see Figure 1. The RF section receives and transmits analog and/or digital information; the Analog Baseband and Control section handles intermediate frequency conversion, user interaction, and power control; and the Power Management section distributes and manages power to all elements of the handset.
In first and second generation digital cellular devices, overall baseband power consumption is derived from the combination of:
The relative periods of standby and active power can be calculated with decent accuracy based on knowledge of the wireless protocol. Standby power consumption can hence be estimated via leakage of current information for a given technology, and knowledge of the amount of time the chip stays in this inactive mode.
Active power consumption is a bit more difficult to estimate, but for repetitive software stacks performing known protocol functions, it also can be relatively accurately determined. Consider the problem of estimating and optimizing on-chip power consumption during user-induced events such as Wireless Application Protocol (WAP) browsing, high-speed down/up-link transactions, Motion Picture and Entertainment Group (MPEG4) structured audio activity, and the like.
Although the embedded system contains all the necessary capability to perform these functions (even in parallel with other events), their behavior is much less deterministic. Software written to handle this multitude of system activity must be carefully optimized to improve overall battery life for a particular application. Prior studies show that the three main blocks of the cellular handset each consume from 15 to 50 milliamps (ma) of current depending on its state of activity; see Table 1.
Lab bench analysis of prototype systems permits conventional methods of evaluation such as circuit boards with logic analyzer interfaces. Typically, these boards provide a means for initial power up and integration of software and hardware modules. Each core in the baseband processor chip is evaluated individually in a static debug form where they are each put in a special mode of operation. Their programmer model registers and memory are checked while single stepping a test program that was downloaded from a host computer. Once the system passes the smoke test, where each processor exits reset and performs initialization functions correctly, the task of debugging real-time kernels and interrupt structures begins.
The art of debugging a real-time wireless device has classically required the use of a logic analyzer monitoring an external bus interface, where at each clock cycle, a sample of bus activity is recorded. This is becoming extremely expensive and physically impossible as microcontrollers and DSPs operate above 100 MHz. When bus interfaces are not available, developers embed printf statements in their code at strategic points so data needed is sent to a peripheral port and retrieved by a host processor. The information is minimal and presents intrusive delays in the application. This has become unacceptably time consuming as system software layers become more complex. This being a common problem, microcontroller developers are now demanding reliable and cost-effective solutions from semiconductor suppliers and development tool vendors.
Over the past couple of years, a consortium of companies have worked to address the issue of real-time debugging of highly embedded systems. The automotive, telecommunications, and network-appliance industries have driven this effort in order to reduce time to market of new products. The consortium began with five companies and has grown to more than 25 companies participating in the definition of a specification, which is now governed by the IEEE-ISTO Nexus 5001 Forum (http://www.ieee-isto.org/Nexus 5001/index.html).
The objective of the IEEE-ISTO Nexus 5001 Forum is to define a common set of microcontroller on-chip debug features, protocols, pins, and interfaces to external tools that may be used by real-time embedded application developers. Revision 1.0 of the specification is currently available for review and serves as the model for future on-chip debug resources implemented by silicon vendors.
One major objective of the forum is to help development tool vendors easily provide a standard set of tools that may be used on a number of embedded microcontrollers. In the spirit of reusability, it has been recognized that many semiconductor vendors currently have debug ports and toolsets that address the static debug requirements of their architectures. Providing a cost-effective yet powerful migration path to a standard set of dynamic debug features is a goal of the forum. More than 70 percent of leading embedded microcontroller vendors have dedicated circuits and pins which assist in new product development based on the IEEE 1149.1 Joint Test Action Group (JTAG) 4-wire serial interface.
The JTAG pins and protocol help developers with static debug methodologies in a master-slave mode, but there is no means for the embedded microcontroller to initiate real-time information transfers to host computers. The Nexus standard addresses this need with a scalable set of features, whereby existing debug blocks may be used with an extensible auxiliary port. The features associated with this new auxiliary port focus on real-time transfer of information to and from the embedded microcontroller.
To address various levels of development needs, the Nexus 5001 Forum has categorized static and dynamic debug features according to class levels. These classes provide a means for implementing a scalable debug architecture that can address different market segment requirements. Also, it should be noted that when a product is in development, it is desirable to have as many debug features available as possible because of time to market constraints. Once a device is put into production, however, it may not be necessary or desirable to have all the development features and pins. Cost savings may be realized by implementing a scalable debug port, which meets only the requirements needed for specific stages of the products life cycle.
Again, the heart of the cellular handset is the baseband transceiver which performs all computations relative to call service, Internet web interaction, and handset control. Figure 3 is a block diagram of a Motorola wireless baseband processor (including separate MCU and DSP core complexes) interfaced to separate on-chip RAM and ROM memories and core-specific peripheral and I/O functions.
To fully understand each core's operation separately as well as its interaction with the other, it would be desirable to pin out the internal core buses to external pads, thus achieving good visibility of core bus cycles. However, due to the desire to reduce I/O and package costs, this becomes prohibitive. Nonetheless, system hardware and software architects still desire a method of understanding standalone and integrated core behavior.
Rather than elaborate on the details of the IEEE-ISTO Nexus 5001 specification, we'll examine some of the needs of debugging an Internet-ready handset. The WAP architectural specification focuses on optimizing for efficient use of device resources. But the task of providing a communications protocol as well as an Internet protocol layer dictates that the RAM required will be 1-4 MB and the Flash ROM, which will hold the kernel, will be on the order of 256-512 KB.
Since the number of external accesses to RAM directly affects power consumption, the microcontroller engine must have an efficient instruction set and resident cache and Memory Management Unit (MMU) to reduce external bus transactions as in Figure 4. A key power consumption goal is to write the handset code so that it efficiently uses the cache. Once the cache and MMU are enabled, the interaction of the core with the cache is no longer visible to the user unless there is a cacheable instruction or data miss resulting in an external access to fill the cache. This problem is aggravated when you must debug code that exhibits abnormal behavior in real time or there is a need to capture power measurements when running specific code.
The M-CORE M341 architecture implements a Nexus 5001 port for accessing user resources using a high-speed output port to transmit real-time program and data information. The feature set of the Nexus 5001 port is of class 3, providing static debug capability and real-time process identification, program trace, data trace, and read/write access to M-CORE Local Bus (MLB) resources. The task of reporting real-time 32-bit address and data values through a 2- to 8-bit output port requires that an efficient method of transmission be utilized.
A set of data packets, commonly referred to as "Public Messages" in the Nexus 5001 specification, has been defined for efficient transfer of debug information between the embedded processor and a development system. Public Messages consist of a transfer code (TCODE), source processor identification number, and the data associated with the particular feature being accomplished. A key requirement in the definition of the Public Messages is efficiency; thus packets may be variable in length depending on the TCODE. Messaging capability is controlled by a JTAG serial interface. The JTAG interface couples to a OnCE static debug block and provides access to all Nexus 5001 registers on the M-CORE M341 processor. Messaging capability is enabled prior to desertion of the reset pin, so exit from reset may be monitored.
Following program behavior can be reduced to the changes in the program counter due to branching, jumping to subroutines, and servicing interrupts and exceptions. Analysis shows that on average, 12-13 percent of instructions executed in a program are of change of flow nature. Therefore, it is not necessary to report every instruction's address but rather report only the change of flow. What is needed to follow the source listing is where you are relative to a reference start address, and where you are going when you change program flow.
Three types of public messages provide program flow behavior. Real-time operating system (OS) debug must have a means for reporting a process ownership identifier. The objective of the Ownership Trace Message is to give the most current value of the data bus when a process writes to a special address. This address is called the "User Base Address," where comparators on the Nexus 5001 snoop logic triggers a capture of the data bus. Thus, whenever a context switch of the OS occurs, a process identifier may be transmitted using Ownership Trace message. This may be key for correlating virtual to physical address maps of the MMU when sending messages to the source-level debugger.
Branch Trace Messages report when direct branch or indirect branch instructions are executed. The difference in the messages is that in a direct branch occurrence, the only information needed is the number of instructions executed since the last change of flow. A reference address using a Sync Message is normally transmitted to establish where the program counter currently is. After that, all references are made to that address until an indirect change of flow occurs. This reduces the number of bits transmitted in a message. Indirect branch messages report the number of instructions executed since the last change of flow and the address where the program counter is jumping to, thus establishing a new reference address.
If you need to report specific memory accesses, the Watchpoint Message does the job. This message triggers off hardware comparators and complex access qualifiers which monitor the M-CORE virtual bus. The idea is to set a watchpoint trigger where a signal, as well as a message may be transmitted. The message tells which of the watchpoint triggers occurred. This is especially valuable for debugging variable writes. For example, if you have a global variable, which is being modified by a number of processes, and you want to pinpoint which of those processes is accessing that variable, the watchpoint message is the tool to use. This feature also asserts an event-out pin that may trigger a logic analyzer to capture specific public messages and/or peripheral signals. For power analysis, the trigger may be used to capture current measurements at specific points in code or data accesses, which may be useful for pinpointing power-consuming hot spots.
Data Trace messages provide a means for reporting real-time data accesses to memory locations. Reporting data loads and stores has a much higher instance than reporting program flow changes. Analysis has shown that as much as 25 percent of instructions executed in a program are data accesses. Data Messages may be used to report stack contents, global and local variables, as well as peripheral port accesses. To control the number of Data Messages transmitted data trace qualifiers, include the access type (that is, read, write, or either) as well as a start and stop address range. If the data address and access type qualifiers are met, data messages are generated and sent to the debug port. This narrows the window of memory locations that may incur a Data Message.
Only sending the unique portion of the data address instead of the complete address reduces output bandwidth requirements for the debug port. Consequently, a data trace message is reconstructed relative to each prior message using a synchronization message as a reference address to begin with.
The M-CORE M341 Nexus port provides access to the MLB mapped resources via the JTAG port. A Ready for Transfer pin (RDY) increases the transfer rate. Calculations show that accesses to the Read/Write Data Register allow for a throughput of 1 MBps on an M-CORE M341 microcontroller operating at a 50-MHz system clock. Block transfers are possible with a single setup of Read/Write control and address registers. This permits 32-bit transfers in 38 JTAG clocks where each JTAG clock is one-half of the system clock.
This capability reduces program and data load times as well as lets you examine arrays of memory without stopping the application. Data Trace Messages only report data movement within a well-defined data window while Read/Write Access permits accessing values asynchronously. This feature is useful for downloading new filter coefficients or encryption keys when testing communication protocols. Another important use is the retrieval of power data values that may be built into the power management unit of the handset.
Two key ingredients to successfully reducing development time is on-chip circuits such as the Nexus 5001 port just described and the development tools that support the Nexus interface. The Nexus 5001 specification defines pins, connectors, and the protocol for transferring messages to/from host computers. However, it is difficult to define stringent rules for Nexus register sizes, bit positions, and other implementation-specific details that may not suit particular semiconductor vendors' architectures. An application protocol interface (API), which abstracts implementation details is the ideal solution for tool vendors.
Figure 5 illustrates a lab debug environment that uses an integrated software tool set coupled to a logic analyzer and power source/monitor for complete handset development. The emulation controller provides the abstraction layer so that an API (which provides the details at the emulation controller to Nexus interface without burdening the tool vender) may be defined. An FPGA was added to the emulation controller, which would reconstruct the full message from the two Nexus port output pins to a 40-bit wide message with message trigger signal. This improves utilization of the logic analyzer's trace buffer. Classical debuggers use a load/arm/go scenario, where once the debugger has sent the target processor(s) to work, the debug environment is frozen until the target processor reenters a debug or interrogation mode. To fully exploit the real-time debug capabilities of the M341 Nexus port, the debugger must permit interrogation of target resources as it executes code in real time.
During initial development of the M341 Nexus port, a Hiware HI-WAVE (Highwave) debugger was interfaced to a Tektronix TLA-714 logic analyzer. The Hiwave debugger interacts with the Tektronix logic analyzer to arm its trace buffer for message captures and later displays the trace buffer contents within the Hiwave environment (see Figure 6).
Because the M341 processor has a different instruction set and programmer's model than the DSP56600 architecture, a dual integrated environment with split windows, one each for the respective processor, is utilized for debugging the baseband transceiver. A single emulation controller is used, which can communicate with each processor using the JTAG protocol. A semaphore configuration in the dual debugger's control module regulates traffic to the emulation controller so that there are no message collisions when communicating with either processor.
The additional feature set of the Nexus port doesn't come without some die area and power penalty. Therefore, during its implementation, all sub module clocks were gated off for inactive circuitry, and the message decode state machine and logic were enabled/disabled via Nexus control. Special consideration was given to the message queues to reduce power, and the output port was made a variable width to accommodate a 2- or 8-bit width. This is important from a development perspective. During lab analysis, the 8-bit port would be used because there was room to add larger connectors on the evaluation cards. But once the handset ergonomics had been finalized and the high-density, double-sided surface-mount board was used, it was decided that a reduced bandwidth over the output port could be feasible at that point in system development. Overall, the Nexus Class 3 implementation was 7.5 percent of the M341 processor area as shown in Figure 7. But considering the size of the complete baseband transceiver, it becomes small, relative to the addition of on-chip memories and the DSP.
Cellular-based products that can interact with the Internet are growing at a phenomenal rate. This increase in features will lead to more sophisticated portable systems and will challenge designers to provide more features at less power consumption. Therefore, system validation will play a more important role in SOC design methodologies to meet short time to market cycles. Significant effort is underway throughout the electronics industry to improve tools and methods for designing complex embedded systems. The IEEE-ISTO Nexus 5001 Forum is a testament to this and demonstrates that there is a dire need to standardize on a set of features, protocols, pins, interfaces, and tools for rapid development of real-time microcontroller-based products.
Originally targeted for automotive applications, the Nexus 5001 Forum has extended the scope of this effort to encompass telecommunications, industrial, and portable hand-held products. The problems of real-time visibility to deeply embedded microcontrollers are very similar, if not identical, to most product types. Each will have special cases for solving specific design issues, but the proposed global standard has addressed this by allowing for vendor-defined blocks for special features, all addressed by a common protocol.
DDJ