DSP and Audio Compression

Enhancing audio compression performance

Jay B. Reimer

Jay, a senior member of the technical staff at Texas Instruments, has been involved in every generation of TMS320 digital-signal processors. He has also been involved in the development of a number of audio and communications algorithms and systems. Jay can be contacted at Texas Instruments, P.O. Box 1443, M/S 701, Houston, TX 77251-1443.


By its very nature, audio compression is a real-time process. You simply cannot afford gaps in data processing that lead to dropouts in the output audio. Historically, the architecture of the PC's operating environment--that is, its operating system and hardware--wasn't designed to optimally support real-time operation. With increasing demands placed on the PC's host processor to support graphics and provide access to file systems, audio processing is relegated to whatever MIPS are left over. This means that as the system designer, you have two choices: shoehorn the audio processing into the leftover MIPS, or have a dedicated coprocessor that can meet the real-time requirements of audio processing.

This is where the benefits of digital-signal processors (DSPs) become apparent. The architecture of DSPs has, from its inception, been optimized to support the needs of real-time systems. The DSP is designed to do a great deal of work in each clock cycle. This unit of work is consistent and includes the ability to perform a multiply-accumulate calculation (MAC) in the same time the processor can perform an add calculation or other ALU operation. Many microprocessors still require multiple cycles for add and ALU operations, although RISCs only require one. Since the essence of many of the audio-compression techniques (LPC, ADPCM, MPEG) is a "sum-of-products" type calculation, DSPs are well suited to them. This article explores the various audio-compression techniques and how DSPs can be used to improve performance.

Audio-compression Techniques

To reconstruct an analog signal, basic sampling theory dictates that the sample rate be based on the signal's bandwidth. For audio signals, the sampling frequency for selected inputs is shown in Figure 1. The second major attribute of a sampled signal is the resolution, or number of bits, that are used to digitize the input waveform. Most PC sound systems offer 8 bits of resolution, while music or CD-quality systems require 16 bits of resolution. These two attributes--sampling rate and resolution--determine the overall data rate of the system and the total amount of memory required to store a digitized signal.

A basic technique to capture digitized audio and preserve it is called linear pulse code modulation (PCM). PCM samples an audio signal at a given frequency with a selected resolution to generate a digital representation of an audio input. PCM requires a large amount of memory to store a digital representation of a sampled signal. One minute of uncompressed, stereo, CD-quality audio can consume close to ten Mbytes of storage.
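The storage figure above follows directly from the sample rate, the resolution, and the number of channels. The short C sketch below works through that arithmetic. The function name pcm_bytes is hypothetical, and the 44.1-kHz rate used in the note that follows is the usual CD sample rate rather than a value taken from Figure 1.

```c
#include <stdint.h>

/* Bytes needed to hold uncompressed linear PCM.
 * Hypothetical helper: the name and parameter order are illustrative only. */
uint64_t pcm_bytes(uint32_t sample_rate_hz, uint32_t bits_per_sample,
                   uint32_t channels, uint32_t seconds)
{
    /* Work in bits first so that resolutions that are not a multiple
     * of 8 are still handled, then convert to whole bytes. */
    uint64_t bits = (uint64_t)sample_rate_hz * bits_per_sample
                  * channels * seconds;
    return bits / 8;
}
```

Under these assumptions, pcm_bytes(44100, 16, 2, 60) gives 10,584,000 bytes for one minute of stereo, CD-quality audio, while pcm_bytes(8000, 8, 1, 60) gives 480,000 bytes for one minute of 8-bit, 8-kHz speech.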

Therefore, one of the biggest challenges associated with the use of audio information is the implementation of efficient compression algorithms that minimize the amount of storage space required for a sampled signal. By selecting the most effective audio-compression technique, you can minimize the amount of memory required to store a sampled audio signal while producing an output audio signal of acceptable quality. Some of the more popular audio-compression techniques are shown in Figure 2.

One popular type of audio compression, pioneered by the telecommunications industry, is LogPCM. This technique compresses the signal logarithmically instead of linearly. One of two logarithmic functions is generally used: the µLaw (mu-law) function in Example 1(a), or the aLaw function in Example 1(b). The logarithmic nature of the compression algorithm provides nearly 2:1 compression by allowing the system to have 14 bits of dynamic range contained in a sample that only requires 8 bits to physically store. There are many devices currently available to implement this compression technique, and for the storage of voice-only signals, LogPCM is adequate to the task.
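To make the logarithmic mapping concrete, the following C sketch shows one commonly used, table-free way to compand a 16-bit sample into an 8-bit mu-law code and back. It follows the segment-based approach associated with ITU-T G.711 rather than the formula in Example 1. The function names are hypothetical, and the bias (0x84) and clip (32635) constants are the conventional values for the 16-bit variant, assumed here rather than taken from the article.

```c
#include <stdint.h>

#define ULAW_BIAS 0x84    /* conventional bias for 16-bit input (assumed) */
#define ULAW_CLIP 32635   /* largest magnitude that survives the bias */

/* Compress one 16-bit linear sample to an 8-bit mu-law code. */
uint8_t linear_to_ulaw(int16_t pcm)
{
    int sign = (pcm < 0) ? 0x80 : 0x00;
    int mag  = (pcm < 0) ? -(int)pcm : pcm;
    if (mag > ULAW_CLIP)
        mag = ULAW_CLIP;
    mag += ULAW_BIAS;

    /* The position of the highest set bit selects one of 8 segments;
     * each higher segment doubles the quantization step. */
    int exponent = 7;
    for (int mask = 0x4000; (mag & mask) == 0 && exponent > 0; mask >>= 1)
        exponent--;
    int mantissa = (mag >> (exponent + 3)) & 0x0F;

    /* Codes are stored bit-inverted by convention. */
    return (uint8_t)~(sign | (exponent << 4) | mantissa);
}

/* Expand an 8-bit mu-law code back to a 16-bit linear sample. */
int16_t ulaw_to_linear(uint8_t code)
{
    int v        = (uint8_t)~code;
    int sign     = v & 0x80;
    int exponent = (v >> 4) & 0x07;
    int mantissa = v & 0x0F;
    int mag = (((mantissa << 3) + ULAW_BIAS) << exponent) - ULAW_BIAS;
    return (int16_t)(sign ? -mag : mag);
}
```

Small samples are reproduced almost exactly, while large samples are quantized in coarser steps, which is where the 8-bit code gets its extended dynamic range.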

Another variation of PCM encoding, called "adaptive differential PCM" (ADPCM), gives a 4:1 compression ratio. It uses the history of the input audio signal as a method of predicting the changes in that signal. Instead of encoding the actual input audio signal's value, the ADPCM technique encodes the difference between the value of the input audio sample and the same signal's predicted value. If the prediction, which is typically done with a filter process through a feedback loop, is good and the difference is small, the difference can easily be encoded with fewer bits. The quality of audio compressed with the ADPCM technique is largely determined by the amount by which the signal is oversampled and the length of the signal's stored history.
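The idea of coding a prediction error with an adaptive step size can be sketched in a few lines of C. The example below is a deliberately simplified illustration and is not the standardized G.726 or IMA ADPCM algorithm: the predictor is simply the previously reconstructed sample, and the step-adaptation rule, the limits, and all names are assumptions chosen for clarity.

```c
/* Simplified ADPCM-style coder: 16-bit samples in, 4-bit codes out.
 * All names and the adaptation rule are hypothetical. */
typedef struct {
    int predicted;  /* last reconstructed sample, used as the prediction */
    int step;       /* current quantizer step size */
} adpcm_state;

static int clamp_int(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Shared by encoder and decoder so both track the same prediction. */
static int adpcm_update(adpcm_state *st, int code)
{
    int mag = code < 0 ? -code : code;

    st->predicted = clamp_int(st->predicted + code * st->step, -32768, 32767);

    /* Large codes mean the step was too small; small codes mean too large. */
    if (mag >= 6)
        st->step = clamp_int(st->step * 2, 4, 4096);
    else if (mag <= 1)
        st->step = clamp_int(st->step / 2, 4, 4096);
    return st->predicted;
}

/* Returns a code in the range -8..7, which fits in 4 bits. */
int adpcm_encode(adpcm_state *st, short sample)
{
    int diff = sample - st->predicted;
    int code = clamp_int(diff / st->step, -8, 7);
    adpcm_update(st, code);
    return code;
}

short adpcm_decode(adpcm_state *st, int code)
{
    return (short)adpcm_update(st, code);
}
```

Because the decoder rebuilds its state only from the codes it receives, the encoder and decoder must start from the same state (for example, predicted = 0 and step = 16). Sending 4 bits for each 16-bit sample gives the 4:1 ratio mentioned above.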

The subband coder (SBC) can provide additional compression by combining elements of ADPCM with a technique for segmenting the signal into frequency bands. The significance of each frequency band is either predetermined for the coder or determined by additional analysis at the time of compression. Those frequency bands deemed to be more important to the overall quality of the signal can then be allocated a proportionally larger number of the bits used in coding. This process does not waste bandwidth on unused or less-significant frequency bands.
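One simple way to picture the bit-allocation step is a greedy loop that hands out bits one at a time to whichever band currently needs them most. The C sketch below assumes that each band's importance is given as a signal level in dB and uses the standard approximation that each added bit buys about 6 dB of signal-to-noise ratio. The function name, the parameters, and the greedy strategy itself are illustrative assumptions, not the allocation rule of any particular coder.

```c
/* Greedy bit allocation across subbands (illustrative sketch).
 * level_db[i] : importance of band i, expressed as a level in dB
 * bits[i]     : output, number of bits assigned to band i
 * total_bits  : bit budget to distribute
 * max_bits    : upper limit for any single band */
void allocate_bits(const int *level_db, int *bits, int nbands,
                   int total_bits, int max_bits)
{
    for (int i = 0; i < nbands; i++)
        bits[i] = 0;

    while (total_bits > 0) {
        int best = -1;
        int best_need = 0;

        /* Find the band whose level is furthest above the noise floor
         * implied by the bits it already has (about 6 dB per bit). */
        for (int i = 0; i < nbands; i++) {
            int need;
            if (bits[i] >= max_bits)
                continue;
            need = level_db[i] - 6 * bits[i];
            if (best < 0 || need > best_need) {
                best = i;
                best_need = need;
            }
        }
        if (best < 0 || best_need <= 0)
            break;              /* nothing left that is worth a bit */

        bits[best]++;
        total_bits--;
    }
}
```

With band levels of 60, 30, 0, and 12 dB and a budget of ten bits, this sketch assigns eight bits to the first band and two to the second, leaving the two quietest bands with none.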

MPEG audio, a method of compression which is being widely considered for a large number of high-quality audio applications, uses a subband coder technique. Figure 3 provides a block diagram of the MPEG encoder and decoder. To maintain quality with increased compression, MPEG relies on a masking model. In this model, processing noise for one frequency band is covered up or masked by the signal in an adjacent frequency band. The use of multiple frequency bands and the masking model enables MPEG to be used for effective audio compression of signals with a 20-kHz audio bandwidth. Compression ratios as high as 24:1 can be realized with MPEG compression.

Linear predictive coding (LPC) is a technique which provides even greater levels of compression, but is applicable to voice rather than general audio. LPC was designed for human speech and uses a production model that takes advantage of speech-production characteristics--that is, the human vocal tract and the assumption of a single sound source. Using this model, compression ratios of 32:1 and greater are possible. One of the more recent variations of LPC, codebook-excited linear prediction (CELP), offers superior quality over previous versions. A block diagram of CELP is shown in Figure 4.

Audio-compression Quality

While the quality of each compression technique shown in Figure 2 is subjective, there's general agreement that as the compression ratio increases, the quality decreases, at a given sample frequency. While this statement is generally true, there are certain exceptions based on the compression technique. For example, linear PCM only maintains its superiority over LogPCM up to a point. Below that point (at fewer bits per sample), the noise introduced by reducing the number of bits in the PCM data is more noticeable than that introduced by the compression technique used by LogPCM.

Basically, every compression technique has a limit on quality for different levels of compression. This means no single compression technique is a panacea for all potential applications. The selection of the proper audio-compression technique is a key element in the design of any enhanced audio application.

Selecting an Audio-compression Technique

The selection of the appropriate audio-compression model for a particular application is a complex process. None of the compression techniques is perfect for all applications, and the designer or system user must always make trade-offs when selecting an audio-compression algorithm. Figure 5 provides a means of visualizing the trade-offs between audio-compression techniques. The three-dimensional space shown is defined to be the complexity of the compression algorithm, the bit rate of the compressed data, and the audio, or acoustic, quality of the signal. In all three cases, the further the attribute is from the origin, the better it is. Although other attributes can be assigned to each of the axes, most are dependently linked to each other.

When a point on each axis is chosen, a plane in the three-dimensional space is defined. This plane remains relatively constant within the constraints of system cost, where system cost is defined in terms of processing power (MIPS) and memory requirements, both for processing and for storage or transmission. Therefore, to maintain a given level of audio quality, the system has to bear the additional cost of a higher bit rate or a more-complex algorithm. If the system cost cannot be increased, some trade-offs will have to be made between the algorithm's complexity and bit rate.

There are also several other dependent factors that you must consider when selecting a compression technique. The additional factors that affect the axes attributes are listed under their respective axes.

DSP-based Audio Compression

Although DSP techniques have been used in many fields for decades, it has only been since the early '80s, with the introduction of TI's TMS320 series of single-chip DSP devices, that these devices have had enough power to perform the calculations required to implement real-time audio-compression techniques for compression ratios greater than 2:1, while maintaining high levels of quality. Much of the early demand for DSP audio compression was driven by the telecommunications industry, where various techniques such as ADPCM have been in use for the last ten years.

A compelling example of just why DSP-based, or for that matter, any other type of audio compression, is important, can be made using this article as a test case. For the sake of argument, let's make the following assumptions: This article contains about 3000 words. On average, a word consists of about six ASCII characters, including spaces. A typical speaking rate is 100 words per minute. In order to retain good voice quality, I'll use a sample rate of 8000 samples per second and a resolution of 16 bits per sample.

Making these assumptions means the ASCII representation of this article requires nearly 20 Kbytes to store. If it takes 30 minutes to orate this article, a quick calculation shows that using basic PCM encoding means the audio representation of this article would require about 28.8 Mbytes to store! If I were to use the ADPCM compression technique, the storage requirements for this same example would shrink to 7.2 Mbytes. If the more-advanced CELP audio-compression technique were used, the audio representation of this article would become 900 Kbytes.
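The figures in this example are easy to check. The C sketch below reproduces them from the stated assumptions (3000 words, about six characters per word, 100 words per minute, 8000 samples per second, 16 bits per sample) and the 4:1 and 32:1 ratios quoted earlier for ADPCM and CELP. The struct and function names are hypothetical.

```c
#include <stdint.h>

/* Storage estimates for a spoken article (names are illustrative). */
typedef struct {
    uint32_t ascii_bytes;  /* plain text */
    uint32_t pcm_bytes;    /* uncompressed linear PCM */
    uint32_t adpcm_bytes;  /* assumes 4:1 compression */
    uint32_t celp_bytes;   /* assumes 32:1 compression */
} article_storage;

article_storage estimate_article_storage(uint32_t words,
                                         uint32_t chars_per_word,
                                         uint32_t words_per_minute,
                                         uint32_t sample_rate_hz,
                                         uint32_t bytes_per_sample)
{
    article_storage s;
    uint32_t seconds = words * 60u / words_per_minute;

    s.ascii_bytes = words * chars_per_word;
    s.pcm_bytes   = sample_rate_hz * bytes_per_sample * seconds;
    s.adpcm_bytes = s.pcm_bytes / 4u;
    s.celp_bytes  = s.pcm_bytes / 32u;
    return s;
}
```

Calling estimate_article_storage(3000, 6, 100, 8000, 2) yields 18,000 bytes of ASCII, 28,800,000 bytes of PCM, 7,200,000 bytes of ADPCM, and 900,000 bytes of CELP, matching the values in the text.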

As Figure 6 demonstrates, while audio-compression techniques can vastly reduce the storage needed for a given sample, they can't reach the density of pure ASCII. The question you must ask is whether the advantages of audio outweigh the disadvantages, such as the associated overhead. When the advantages of incorporating audio into an application are clear, choosing the proper audio-compression technique makes this decision much easier.

DSP and Audio Compression

An important aspect of the DSP-based system, especially as it relates to the various requirements of multimedia, is its ability to process and schedule multiple tasks simultaneously. The design of a flexible DSP-based system guarantees that each and every real-time task will have the processing resources it needs. As compression algorithms become more complex and the host processor becomes more overloaded, the need for a dedicated multitasking DSP-based solution becomes more pronounced.

Another important aspect of a DSP-based approach is the efficiency of its I/O system. By incorporating a powerful, direct memory access (DMA) capability, a DSP-based system is able to use most of its processing power for actual computation, not for moving data in and out of the system.

As previously mentioned, virtually all audio-compression techniques are typified by the MAC cycle. Since the architecture of the DSP has been defined to allow MAC cycles to occur efficiently while doing other tasks in parallel--such as fetching data from memory or storing results back to memory--audio-compression techniques map well onto DSP-based hardware. This "hand-in-glove" fit of a DSP-based architecture to implement audio-compression algorithms is certainly more than coincidence. The latest techniques for audio compression, including MPEG and LPC, all require some sort of input filtering, such as infinite-impulse response (IIR), or finite-impulse response (FIR) filtering, in order to operate. This filtering process, which is very MAC intensive, is a perfect application for the DSP.
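The MAC-intensive nature of filtering is easiest to see in code. The following C sketch computes one output sample of an FIR filter in Q15 fixed point, where each pass through the loop is one multiply-accumulate. It is written in portable C for illustration and makes no claim about how a TMS320 compiler or hand-written assembly would schedule it. The function name and the Q15 format are assumptions.

```c
#include <stdint.h>

/* One output sample of an N-tap FIR filter (illustrative sketch).
 * coeff[]   : filter coefficients in Q15 (32767 is just under 1.0)
 * history[] : the most recent input samples, newest first
 * taps      : number of coefficients */
int16_t fir_q15(const int16_t *coeff, const int16_t *history, int taps)
{
    /* A wide accumulator plays the role of the guard bits that DSP
     * accumulators provide, so the sum cannot overflow mid-loop. */
    int64_t acc = 0;

    for (int k = 0; k < taps; k++)
        acc += (int32_t)coeff[k] * history[k];   /* the MAC */

    /* Scale the Q30 sum back to Q15. Division is used instead of a
     * right shift so the result is well defined for negative sums. */
    acc /= 32768;

    /* Saturate to the 16-bit output range. */
    if (acc > 32767)
        acc = 32767;
    if (acc < -32768)
        acc = -32768;
    return (int16_t)acc;
}
```

For example, four coefficients of 8192 (0.25 in Q15) applied to the samples 1000, 2000, 3000, and 4000 produce their average, 2500. On a general-purpose processor the loop body costs several instructions per tap; the point made above is that a DSP can retire each tap, including its memory accesses, in a single cycle.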

MPEG uses another key DSP technique, the Fast Fourier Transform (FFT), as an important part of its audio-compression algorithm. MPEG uses the FFT to determine where most of the energy in the input signal is and can therefore allocate the distribution of masking bits appropriately. Since the DSP supports real-time FFT processing, this distribution of masking bits can occur in real time, thus improving the overall compression algorithm and the output signal.

Software to Simplify System Development

The use of audio compression in a software application can seem like an overwhelming task, but in reality, a flexible, DSP-based solution simplifies this task significantly. Choosing the right system allows you to include audio-compression techniques without a corresponding increase in overall code complexity. This means you will be able to focus on the high-level metrics associated with audio compression; for example, the number of disk-storage bytes one minute of recorded speech or audio will require, or which sound quality is appropriate for a particular application. In order for this to take place, a simple interface to the DSP is a must.

A typical DSP-based system block diagram is shown in Figure 7. This block diagram, which is based on TI's TMS320 system, is typical of the standard DSP-based, multimedia solutions available today. A system such as this allows you to support many different types of audio compression without the overhead associated with the development of a new software driver for each one. Since the audio files are stored with information about the sample rate, the number of bits per sample, and the encoding algorithm, a DSP-based system allows a high-level API to be used. As new compression methods become available, the flexible nature of this type of system will accommodate them with a minimal impact on the system design.

As shown in Figure 7, there are two main software interfaces in a DSP-based system that are of interest to a system designer: the applications interface and the task developer's interface. The software support required for each of these, while very different, must be comprehensive to provide the system designer with the tools needed for a complete software-development environment.

Applications Interface

For most software developers, the applications interface--the interface between user applications, the operating system, and the DSP-based system--is most important. The key attribute for the applications interface is to provide a simple, standardized interface between the application and the DSP-based system. This allows you to easily incorporate advanced audio-compression techniques in new software applications and to make the use of different types of audio compression transparent to the application itself.

One method of accomplishing this is through the use of APIs to extend the base-level Windows Media Control Interface (MCI) device drivers. These high-level APIs would, for example, allow the application to simply call a driver that would play a compressed audio file. The driver itself recognizes the type of compression algorithm used and plays the file back accordingly. All of these operations would take place transparently to the user because of the advanced programmable nature of the DSP-based system. This means an application can easily incorporate audio compression to enhance its functionality and marketability.

Task Developer's Interface

The task developer's interface is the low-level software interface to the DSP-based system that allows you to produce new applications-interface drivers and write new tasks (such as compression algorithms) for the DSP device itself. This level of software interface requires a full set of software-development tools, including debuggers, compilers, assemblers, and linkers as well as access to the DSP operating system.

The key to accessing this level of the system is the ability to incorporate the latest advancements in audio compression in your application without changing the hardware. By writing the underlying code associated with a new audio-compression algorithm, you can be assured that upgradability and functionality will follow your systems, increasing their value to the end user.

One example for isolating the functionality of a new process from the underlying hardware is shown in the code segment in Listing One, page 69. This task developer's model is relatively simple and provides a standard methodology for handling data I/O and process control. Data I/O handled in this manner minimizes the amount of overhead associated with the task of interfacing the system to data from a continuous source, or sink, such as A/D or D/A converters, by automatically keeping track of where the data is at all times.

Listing One contains two basic, key forms of task communications that are important to the task developer. The first involves the concept of interfacing a stream of continuous data, in either a compressed or uncompressed form, to the application via general-purpose connectors (GPCs), which are circular buffers that serve as the interface to the audio-compression algorithm (one GPC for the input, and one GPC for the output) and the I/O data streams. These circular buffers allow the algorithm to have free access to the data on an "as needed" basis without being burdened with the additional overhead and peculiarities associated with controlling the data transfers in and out of physical I/O channels.
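For readers unfamiliar with circular buffers, the C sketch below shows the general idea behind a connector of this kind: a producer writes words at one position, a consumer reads them at another, and both positions wrap around a fixed block of memory. This is a generic illustration, not the GPC implementation used in Listing One. The type, the function names, and the 128-word size are all hypothetical.

```c
#include <stdint.h>

#define RING_WORDS 128u  /* power of two, so indexes wrap with a mask */

typedef struct {
    int16_t  data[RING_WORDS];
    uint32_t put;  /* total words written by the producer */
    uint32_t get;  /* total words read by the consumer */
} ring_t;

/* Number of words currently waiting to be read. */
static uint32_t ring_count(const ring_t *r)
{
    return r->put - r->get;
}

/* Returns 1 on success, 0 if the buffer is full. */
int ring_write(ring_t *r, int16_t word)
{
    if (ring_count(r) == RING_WORDS)
        return 0;
    r->data[r->put & (RING_WORDS - 1u)] = word;
    r->put++;
    return 1;
}

/* Returns 1 on success, 0 if the buffer is empty. */
int ring_read(ring_t *r, int16_t *word)
{
    if (ring_count(r) == 0u)
        return 0;
    *word = r->data[r->get & (RING_WORDS - 1u)];
    r->get++;
    return 1;
}
```

A compression task built on this sketch would call ring_read on its input buffer and ring_write on its output buffer without knowing whether the other end is a converter, a file, or another task.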

The GPCs also provide an excellent means for inserting or removing additional tasks from a processing flow without requiring changes in each task to account for the presence or absence of the additional tasks. As an example, a µLaw compression task could be added following an LPC synthesis task if a µLaw CODEC is being used to convert the digital signals to analog in one system. In another system, where a linear CODEC is used, the µLaw compression task would be removed. In either case, the LPC synthesis task would process data in exactly the same way.

The second key concept involves the use of an intertask communications block (ITCB), which is a form of state control for system communications. The purpose of the ITCB is to pass pertinent information, such as task status and process commands, between the host processor and the DSP or between two DSP tasks. A typical application for an ITCB would be to store the volume-control information associated with the compressed-audio data. In this application, the ITCB would hold the audio-gain information for a particular data sample, which both the host processor and the DSP subsystem have direct, shared access to.
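The volume-control case can be sketched as a small shared structure that the host writes and the DSP task reads once per frame. The C example below is only an illustration of that pattern: the structure layout, the field names, and the Q15 gain format are assumptions and do not describe the ITCB used in Listing One.

```c
#include <stdint.h>

/* Hypothetical control block shared by the host and a DSP task. */
typedef struct {
    volatile int16_t gain_q15;  /* written by the host; 32767 is about 1.0 */
    volatile int16_t mute;      /* nonzero silences the output */
} volume_itcb;

/* Apply the current volume setting to one frame of samples in place. */
void apply_volume(const volume_itcb *ctl, int16_t *samples, int n)
{
    /* Read the shared fields once so the whole frame uses one setting,
     * even if the host changes the gain while the frame is processed. */
    int16_t gain = ctl->mute ? 0 : ctl->gain_q15;

    if (gain < 0)
        gain = 0;               /* treat negative gains as silence */

    for (int i = 0; i < n; i++)
        samples[i] = (int16_t)(((int32_t)samples[i] * gain) / 32768);
}
```

With gain_q15 set to 16384 (0.5), the samples 1000 and -2000 become 500 and -1000. The host changes the volume simply by writing a new value into the block; no message passing or change to the compression task is required.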

Listing One includes two functions, Echo and ModifySample, which demonstrate the processing that may be done on typical sample data. The Echo function simply demonstrates how to read data from the input GPC and write data to the output GPC. The purpose of ModifySample is to show where the data-compression algorithm would be placed in the code to implement the selected compression functionality.

Conclusion

The potential for audio compression in today's multimedia market is practically limitless. As more applications demand audio, practical system limitations, such as cost and system size, will dictate the need for audio compression. And while the current usage of enhanced audio in multimedia systems is impressive, it pales in comparison to the potential for future applications. Although the applications for enhanced audio compression continue to evolve, compression techniques are all based on fundamental concepts that will remain applicable well into the future. The selection of a programmable DSP-based system can help you implement the next generation of multimedia applications.

References

Lynch, Thomas J. Data Compression: Techniques and Applications. New York, N.Y.: Van Nostrand Reinhold, 1985.

Figure 1: Sampling frequency. The amount of data that will need to be stored for an audio signal is directly related to the quality of the output you wish to produce. Higher-quality outputs require more data to be stored and directly affect the type of compression technique that you should use.

Example 1: Logarithmic functions used in LogPCM compression: (a) the µLaw function; (b) the aLaw function.

Figure 2: Common algorithms and compression ratios.

Figure 3: A single MPEG audio channel. MPEG audio is a subband algorithm with adaptive quantization.

Figure 4: CELP is a relatively new, enhanced version of LPC. This algorithm uses the specific qualities of human speech to provide maximum audio compression.

Figure 5: The trade-offs between the basic elements of audio-compression algorithms modeled in three dimensions. If more than one of the dimensions is fixed at some upper bound, the attributes of the other dimensions must be varied and a compromise made.

Figure 6: Results of compressing a 3000-word article using various audio-compression techniques. PCM, ADPCM, and LPC assume 8K samples per second, 16 bits per sample.

Figure 7: Block diagram for a fully programmable, DSP-based audio-compression system based on Texas Instruments' TMS320 system.

[LISTING ONE]


typedef struct gpc_t {
   void        *gpc_putp;       /* Get/Put Pointer */
   unsigned short    gpc_size;  /* Size of GPC, in bytes */
   unsigned short    gpc_mwpf;  /* Max. words to be written in 1 frame */
   void             **gpc_aput; /* Address of owner's put/user's get pointer */
   unsigned short    gpc_prot;  /* protocol to be used */
} gpc_t;

#define gpc_aget gpc_aput          /* Aliases for owner/user */
#define gpc_getp gpc_putp
  .
  .
  .
extern gpc_t vio_input, vio_output;
  .
  .
  .
extern struct vioitcb_t {
       int     hioctl;      /* 0=>tel, -1=>handset connected to processor */
       int     inputctl;    /* 0=>handset, -1=>microphone input */
       int     shactive;    /* 0=>on-hook, -1=>off-hook */
} *vioitcb2;

 /*
  * prototypes for routines declared later
  */
void Echo(gpc_t * src, gpc_t * dst, short n);
int ModifySample (short s);
  .
  .
  .
short oh;
void main(void)
{
        vioitcb2->hioctl   = actl;
        vioitcb2->inputctl = amic;
        oh =  vioitcb2->shactive;
        if (oh) {                        /* Are we off-hook? */
                Echo(&vio_output, &vio_input, VIO_SPF);
        } else {
                GPCFillM128(&vio_input, VIO_SPF, 0);
                GPCAdvance32M128(&vio_output);
        }
}
/*
 * Echo
 *
 *   src is a pointer to the input GPC
 *   dst is a pointer to the output GPC
 *   n is the number of words (16-bit) to move
 */

void Echo(gpc_t * src, gpc_t * dst, short n)
{
        short *s;
        short *d;
        short sample;
        s = src->gpc_getp;              /* current read position in the input GPC */
        d = dst->gpc_putp;              /* current write position in the output GPC */
        while (n-- > 0) {
                s = GPCIncM128(s);      /* step to the next input word */
                sample = *s;
                /* NOW MODIFY THE SAMPLE */
                sample = ModifySample (sample);
                d = GPCIncM128(d);      /* step to the next output word */
                *d = sample;
        }
        src->gpc_getp = s;              /* store the updated positions back in the GPCs */
        dst->gpc_putp = d;
}
  .
  .
  .

End Listing


Copyright © 1994, Dr. Dobb's Journal