Examining Audio DSP Algorithms

Algorithms for DSP-based audio effects

Dennis Cronin

Dennis is an engineer specializing in UNIX driver development for Solaris and HP-UX operating systems. He can be contacted at denny@cd.com.

Digital-signal processing (DSP) configurations span the range of complexity and cost from complete solutions (such as attached array processors) to add-in PC cards with embedded DSP controllers. In this article, I will demonstrate some basic audio DSP algorithms for creating real-time audio effects--pitch change, echo, flanging, and phase shifting--for the Microsoft Windows Sound System (WSS) which sells for well under $150.00. The only development tools you'll need to write software for the WSS are your regular C compiler and linker. The WSS sound card is comprised of an Analog Devices' AD1848 analog-to-digital converter (ADC) and digital-to-analog (DAC) chip, plus a Yamaha FM synthesis chip and some glue logic. Since this card does not have a dedicated DSP onboard, the host CPU does all the processing. It turns out that a 486 can do quite a respectable job running some of the DSP algorithms, even in C.

I compiled and tested FX.C with Borland Turbo C 2.0 and Borland C++ 2.0 on a 33-MHz 486. You'll need a 486 or a fast 386 with a math coprocessor to run this program, as it uses floating point extensively to avoid obscuring the algorithms with the fussy bit shiftings and normalization characteristic of integer DSP.

All patches work at the default sample rate of 16K. The version compiled under Turbo 2.0 will, in fact, run most patches at a sample rate of 27K on a 33-MHz 486; the Borland C++ 2.0 version is somewhat slower. Your mileage may vary with other compilers, of course. I also developed many of the same algorithms for the $99.00 Texas Instruments DSP Starter Kit (DSK). These are in its native 320C20 assembly language and use the .ASM suffix. (This code is available electronically; see "Availability," page 3.) The TI SDK comes complete with PC-based assembler, debugger, and manuals and can be ordered from any TI distributor.

Echo

The first audio effect I'll look at is an "echo," which is achieved using a single, fixed-delay element and produces the well-known Hello hello hello hello_. An echo is simply an identical copy of the original audio signal, but it is delayed by a fixed amount of time. It is easy to digitally create this fixed delay.

As you read samples from the analog-to-digital converter, you store them in a circular buffer. When the buffer is filled, the store pointer wraps back around to the beginning of the buffer.

The delay comes from a single read pointer, which is placed N slots "behind" the store pointer and marched along in step with the store pointer. As each new sample is stored, a sample is read from N samples behind it, thus creating a static delay of N*1/Fs, where Fs is the sampling rate.

This delayed signal is then mixed in with the original signal, usually at a somewhat reduced volume level. While this gives you a nice echo, so far it's only Hello hello.

To get the decaying repeats alluded to earlier, we need to supply a feedback path around the delay element. Figure 1 shows the block diagram of an echo effect that can provide decaying repeats. The more feedback, the longer the repeats take to fade away.

The first patch provided by FX.C (available electronically) for WSS demonstrates this decaying-echo effect. No similar effect is provided for the TI module since it doesn't provide quite enough memory to get suitable delays for echo-type effects.

Flanging

Flanging is a simple, delay-based effect that produces the whooshing sound heard on numerous rock records. (A classic example of this is in the break-down section of "Life in the Fast Lane" by the Eagles.) Flanging uses extremely short delay lengths which are not discernible to the ear as discrete echoes.

The origins of the term "flanging" are somewhat uncertain. Some credit George Martin, the producer for the Beatles, with coining the term in jest. Others suggest a practical origin. In any case, the effect was originally produced by running two tape machines with identical tapes closely in sync. Then the speed of one machine was slightly varied, possibly by a recording engineer's thumb on the flange of the tape reel. The resultant short varying delay creates the characteristic striking whooshing sound.

Why the whoosh? When a signal is mixed with a very short delay of itself, there will be certain frequencies at which the signal is 180 degrees out of phase with itself and near-total cancellation will occur. For instance, with a delay of one millisecond, dips (or notches) will occur at 500, 1500, 2500, and 3500 Hz, and so on.

This frequency-response shape is commonly called a "comb" filter since the notches resemble the teeth of a comb. As the delay is varied from a fraction of a millisecond to 5 milliseconds or so, the notches sweep dramatically up and down in frequency. Your ears hear this as sounding "whooshy."

The digital implementation of flanging is similar to that of echo except that the delay time must be very short and continuously variable. Figure 2(a) shows a block diagram of the signal path for flanging, while Figure 2(b) shows the "shape" of delay variation we use to create flanging. The key element is the implementation of the varying delay element.

Now, it might seem like you could vary the delay by taking the fixed-delay element implementation described previously for echo and simply moving the read pointer in relation to the store pointer by a notch every now and then. Unfortunately, this approach creates a little click every time the delay tap is changed. Any steady movement of the delay tap results in "zipper noise," an objectionable, gritty modulation noise mixed with the varying delay signal.

To avoid this, you need to implement a method of achieving noninteger delays, thereby sweeping the delay more smoothly, varying it by just a little bit with each sample. This problem is akin to the general problem of sample-rate conversion, which is discussed at length in many texts on DSP. Unfortunately, most proper methods for noninteger ratio sample-rate conversion tend to be a bit computationally intensive and not always well suited for real-time work.

Luckily, there's a simple but inexact method that yields subjectively low audible distortion, yet is computationally efficient. An averaged linear interpolation between two sample points gives good bang-for-the-buck; see Figure 3. The file FLANGE.ASM (available electronically) and the module flange_chorus() (Listing One excerpted from FX.C) provide implementation details of this linear-interpolation technique.

To create the basic flanging sound, this fine-grained variable-delay element is cyclically "swept" between a very short delay value of less than a millisecond to a longer delay value of 5--10 milliseconds. The rate and range of this sweep can be adjusted to achieve radically different characters of the basic flange.

Variations on the basic flanging effect can be achieved by providing a feedback path (like that used to create decaying echoes) and recirculating some of the delayed signal. This can dramatically intensify the flanging effect and impart a strong harmonic nature to the sound as the feedback creates a more-resonant filtering action.

Note also that the delayed signal and the feedback can be inverted before being summed by simply reversing the sign of the gain stages. This creates some interesting variants often overlooked in commercial effects devices. Figure 2(a) shows the paths for feedback and the gain stages providing for inversion prior to summing.

For another variant, the sweep is disabled entirely, reverting back to the basic fixed-length delay effect but using short delay times. The robot-voice patch in FX.C uses a fixed short delay with a lot of feedback. This creates the static, metallic, resonant filter sound used to make mechanical voices in old sci-fi movies.

It's also a short hop from flanging to "chorusing," an effect that uses the exact same processing as flanging except that the delay value is increased to somewhere around 20--40 milliseconds. This delay is long enough that the exaggerated comb-filtering action decreases but short enough that the delayed signal is not quite heard as a distinct echo. Instead, the gently undulating pitch change resulting from the varying delay just adds a subjective richness to the sound, much like a second voice singing unison--hence the term "chorusing."

Pitch Change

As you play around with flanging and chorusing effects, you'll probably notice how faster rates of delay change result in funny, warbling pitch variations in the signal. As the delay goes from longer to shorter, you'll hear a sort of Doppler shift up in pitch. When the sweep reverses, you'll hear the reverse Doppler shift as the delay "moves away from you." It proves to be fairly easy to harness this effect of a varying pitch from a varying delay into a decent, real-time, pitch change algorithm.

Pitch changers allow the pitch of a sound to be dramatically altered in real time. A downward pitch change can make your voice sound like Darth Vader from Star Wars (although the actual Darth effect is a little more complex). An upward pitch change will make you sound like you've been inhaling helium to get ready for your Alvin and the Chipmunks tryouts.

To create an upward pitch change, you could start with a 30-millisecond delay and steadily decrease it at a rate that yields the desired pitch change. As the delay approaches 0, you would start a second delay channel at 30 milliseconds and sweep it as well. Then you do a quick crossfade from the first to the second channel, making sure the first channel is completely faded out before its delay reaches 0. Repeat this process, going back and forth between the delay channels. Figure 4 diagrams the changing delays and the crossfading pattern.

A downward pitch change is achieved in a similar manner, only the delay channels are started with a near-zero initial delay, and the delay is increased out to around 30 milliseconds, at which time the alternate channel is started and the crossfade performed.

The subjective result of this approach is good for small to medium amounts of pitch change. As the interval becomes greater, however, the frequency of the splicing (or crossfading) increases until a "singing through the fan" effect is created in the pitch-changed signal. In spite of these limitations, however, this approach is effective and many commercial units are based along these lines.

PITCH.ASM (available electronically) and the pitch_change module of FX.C illustrate implementation details. A smooth crossfade is essential to good-quality blending of the two delay channels. The FX.C version use sine and cosine lookup tables to generate ideal crossfade blends, whereas the memory-limited PITCH.ASM version uses a two-piece, linear approximation of the sine and cosine functions to perform the crossfade. Note that the crossfade time can be tinkered with to provide less-noticeable splicing at certain pitch-change rates and for different signal types.

You will definitely want to plug in a microphone and talk through some of these pitch-change effects. (And you'll probably want to change your answering-machine message once you hear how cool you sound, talking through a serious downward pitch change!)

Some really wild effects can be created by combining pitch change with echo-length delays and/or providing feedback paths around the pitch-change element. Some of the patches provided with the FX.C program demonstrate these tricks.

Phase Shifting

Phase shifting is not unlike flanging in that its frequency-response characteristic is one or more notches sweeping up and down. Like flanging, the notches in the frequency spectrum result from phase cancellation between the unaffected signal and the processed signal. You can hear phase shifting all over Pink Floyd's Dark Side of the Moon and numerous other records from the early '70s. This effect is based on a curious type of filter called "all-pass." As the name implies, this type of filter passes all frequencies, but "filters" the phase of the signal. While its frequency response is a straight line, its phase response varies by 180 degrees, with a 90-degree phase shift at what would traditionally be considered the cutoff frequency of a normal filter.

Example 1(a) is the normalized transfer function of a first-order all-pass filter. Using a bilinear z-transform (BZT) method, you arrive at the (difference) equation in Example 1(b), where the coefficient A is described by Examples 1(c) and 1(d).

The phase-shifting effect is then implemented by cascading several such all-pass filter sections and sweeping their cutoff frequencies in unison. Mixing this processed signal with the original signal results in the notching effect, as the total phase delay through the filter sections causes certain frequencies to cancel. Like flanging, the effect can be varied by providing a feedback path around the filter sections, and by providing for inversion of the processed signal and the feedback.

A smooth sweep function is important; the frequencies of the filters should be changed exponentially over time. This is easily accomplished using floating point in C, but the assembly version uses another linear approximation of the desired function. See PHASER.ASM and the phase_shift module of FX.C for implementation details.

Figure 1 Echo effect. Figure 2 (a) Flanging effect; (b) sweep for flanging effect. Figure 3 Linear interpolation. Figure 4 Delay/crossfade scheme for pitch change.

Example 1 Phase shifting: (a) Normalized transfer function of a first order all-pass filter; (b) using a bilinear z-transform (BZT) method, you arrive at this equation where the co-efficient A is described by (c) and (d).

             s-1
(a)   H(s) = ---
             s+1

(b)   y(n)=A*x(n)+A*y(n-1)-x(n-1)

          1-wp
(c)   A = ----
          1+wp

           (PI*freq)
(d)   wp = ----------
              Fs
      Fs = sampling rate

Listing One


/* flange_chorus. Does flanging/chorusing family of effects based on a single
    varying delay.
    
    dry_mix     mix of unaffected signal (-0.999 to 0.999)
    wet_mix     mix of affected signal (-0.999 - 0.999)
    feedback    amount of recirculation (-0.9 - 0.9)
    rate        rate of delay change in millisecs per sec
    sweep       sweep range in millisecs
    delay       fixed additional delay in millisecs
*/
void
flange_chorus(struct program *p)
{
    int fp,ep1,ep2;
    int step,depth,delay,min_sweep,max_sweep;
    double inval,outval,ifac = 65536.0;
    long scan = 0;
    bw data;
    wl sweep;
    /* fetch params */
    step = (int)(p->rate * 65.536);
    depth = (int)(p->depth * (double)SampleRate / 1000.0);
    delay = (int)(p->delay * (double)SampleRate / 1000.0);
    /* init/calc some stuff */
    max_sweep = BFSZ - 2 - delay;
    min_sweep = max_sweep - depth;
    if(min_sweep < 0) {
        printf("Can't do that much delay or depth at this sample rate.\n");
        exit(1);
    }
    sweep.w[1] = (min_sweep + max_sweep) / 2;
    sweep.w[0] = 0;
    /* init store and read ptrs to known value */
    fp = ep1 = ep2 = 0;
    /* disable interrupts, go to it */
    disable();
    while(1) {
        while((inp(SR) & 0x20) == 0);       /* wait for input ready */
        data.b[0] = inp(PDR);               /* read input from chip */
        data.b[1] = inp(PDR);
        /* interpolate from the 2 read values */
        outval =
         (Buf[ep1] * sweep.w[0] + Buf[ep2] * (ifac - sweep.w[0])) / ifac;
        /* store finished input plus feedback */
        Buf[fp] = (inval = (double)data.w) + outval * p->feedback;
        /* develop final output mix */
        outval = outval * p->wet_mix + inval * p->dry_mix;
        if(outval > 32767.0)
            data.w = 32767;
        else if(outval < -32768.0)
            data.w = -32768;
        else
            data.w = (int)outval;
        while((inp(SR) & 0x2) == 0);        /* wait for output ready */
        outp(PDR,data.b[0]);                /* write output to chip */
        outp(PDR,data.b[1]);
        /* update ptrs */
        fp = (fp + 1) & (BFSZ - 1);
        sweep.l += step;
        ep1 = (fp + sweep.w[1]) & (BFSZ - 1);
        ep2 = (ep1 - 1) & (BFSZ - 1);
        /* check for sweep reversal */      
        if(sweep.w[1] > max_sweep)          /* see if we hit top of sweep */
            step = -step;                   /* reverse */
        else if(sweep.w[1] < min_sweep)     /* or if we hit bottom of sweep */
            step = -step;                   /* reverse */
            
        check_human();                      /* check on human every so often */
    }
}