Virtual Audio Through Ray Tracing

Calculating early reflections through ray tracing of simple rooms

Tom Zudock

Tom is an engineer with Motorola's Digital Signal Processing Division. He can be contacted at tomz@dsp.sps.mot.com.


Three-dimensional sound recordings are often referred to as "virtual audio" because of their use in virtual-reality applications. When I heard my first virtual-audio recording, I was amazed that a pair of headphones with transducers just an inch from my ear could generate a sound source I perceived to be three feet over my head. Perception is the key. By understanding how we hear and what cues we use to identify the location of sounds, we can digitally process sound so that it is perceived as virtual audio.

Surprisingly, it is not difficult to have crude control over virtual-audio sounds via signal processing. If your goal is simply to have general control over distance and position, simple models of the system can be used with respectable results. However, accurate placement of virtual-sound sources in 3-D space requires careful system design and implementation.

When I began developing the virtual-sound project presented in this article, I focused on maintaining a simple model. Still, a full description of the virtual-audio model and the programming involved is beyond the scope of a single article. I'll limit my discussion to the early reflections generated by a sound source reaching listeners in a simple room.

Figure 1 models a virtual-audio signal-processing system. The shaded region indicates the portion of the model I'll discuss here. The inputs to the model are the sound-source position, listening position, dimensions of the room, and monophonic sound source to be processed. The output is stereo audio presented to listeners over headphones. In the first stage, the impulse response of the room, based on early reflections, is determined. To use this information in a binaural model, the angle of incidence of each reflection must be known. From these two pieces of information and a model of the human head, time delays and attenuations of the sound reaching each ear can be imposed upon the monophonic audio to generate two channels of audio. Believe it or not, it really is that simple.

In this article, I will calculate the early reflections used to create an impulse response. The impulse response is used to process monophonic audio data and generate a monophonic output. Although not discussed here, further incorporating a simple model of the human head provides the means to generate a full 3-D virtual-audio image.

The Room Model

Figure 2 illustrates a rectangular room with a sound source and a listener located at specified coordinates. By restricting the room shape to rectangular, you can simplify the early reflection calculations.

In Figure 2, I modeled sound as linear rays ("sound rays") that travel from the sound source to the listener. Clearly, the behavior of sound is more complex than this computational simplification. Figure 2 depicts two sound rays--one traveling directly from the sound source to the listener, and another reflecting off one wall before reaching the listener. The latter path is referred to as a "first-order reflection" since it made contact with one surface before reaching the listener. (Figure 2 is two-dimensional, although my final solution covers three dimensions.)

To generate the room response based on the early reflections, relative arrival times and attenuations are needed. Both of these quantities are derived from the distance traveled by a sound ray. Hence, the first step in generating the room response is to determine the distance traveled by each sound ray. Consider the lower-left corner of the room to be the origin of the Cartesian plane. Calculating the distance traveled by the reflections off the surfaces of the room may be facilitated by using the image-model method (see Durand Begault's 3-D Sound For Virtual Reality and Multimedia, Academic Press, 1994, ISBN 0-12-084735-3). A mirror image of the room and the sound source is created. A direct path from the mirrored sound source in the mirrored room represents the full distance traveled by the reflection; see Figure 3.

The importance of this technique is that the total distance traveled by the reflected sound ray is simply the distance from the mirrored source's position, MS, to the listener's position, L.

There are, of course, many more sound rays that reach the listener by reflecting off multiple surfaces of the room. By repeating the mirroring technique in both the x- and y-dimensions, direct rays from the mirrored sound sources may be drawn. In Figure 4, each mirrored room contains a number representing the order of the reflections that emanate from it; the figure also shows the direct paths for two second-order reflections from mirrored sound sources.

By extending this concept in the z-dimension, reflections off the ceiling and floor can be determined. In this scenario, the model becomes a cube subdivided into a collection of mirrored rooms. To determine the distance traveled by a reflected ray, the coordinates of the mirrored sound source must first be determined.

To find the general solution for computing the position of mirrored sources, I'll reexamine a single dimension and calculate the mirrored source coordinates by hand. Figure 5(a) shows a one-dimensional string of rooms, and Figure 5(b) shows the x-coordinates of several mirrored sound sources. The calculation of the x-coordinate of a mirrored source can be generalized into one formula, as in Figure 5(c). The first term is self-evident. The second term is XR for N odd, and zero for N even. The third term is -XS for N odd, and +XS for N even. This formula applies to the y- and z-dimensions as well; simply replace x with y or z.
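The formula in Figure 5(c) maps directly into code. Here is a minimal sketch; the function name is mine, and the same expression appears in Example 3 for all three dimensions:

```cpp
#include <cassert>
#include <cmath>

// Mirrored-source x-coordinate per Figure 5(c). N is the mirrored-room
// index (0 = the real room), RoomX the room's x-dimension, and SourceX
// the source's x-coordinate. (The function name is mine, for illustration.)
double MirroredSourceX(int N, double RoomX, double SourceX)
{
    return N * RoomX                      // offset to the Nth mirrored room
         + std::fabs((N % 2) * RoomX)     // RoomX when N is odd, 0 when even
         + std::pow(-1.0, N) * SourceX;   // -SourceX when odd, +SourceX when even
}
```

For a 6-meter room with the source at x = 2, rooms N = 1 and N = -1 place the mirrored sources at x = 10 and x = -2, respectively.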

With these equations, the coordinates of mirrored sounds in three dimensions can be calculated. The total distance traveled by each sound ray (see Figure 3) is simply the distance from the mirrored sound source (XMS, YMS, ZMS) to the listener's position.

Generating the Room Response

Generating the room response requires knowledge of the travel times and attenuations of each ray at the listening position. Since the result will be a digital filter, the arrival time will ultimately be calculated in terms of the number of samples delayed; hence, it will be quantized to a discrete value. The delay, in samples, is the distance traveled divided by the velocity of sound, multiplied by the sampling rate; see Figure 6(a).
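Concretely, the sample delay for one ray can be sketched as below. The helper name is mine, and the constant values for Fs (44,100 Hz) and Vs (343 m/s) are assumptions; the listings simply treat Fs and Vs as constants:

```cpp
// Convert a ray's travel distance to a delay in whole samples, as in the
// TapDelay calculation of Listing Three. Fs and Vs values are assumed.
const double Vs = 343.0;    // velocity of sound, meters/second (assumed)
const double Fs = 44100.0;  // sampling frequency, Hz (assumed)

long DelayInSamples(double Dist)
{
    return (long)(Dist / Vs * Fs);  // cast to long truncates: quantized delay
}
```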

When determining the attenuation for a particular sound ray, it is the level of that sound ray relative to other sounds that is of interest. First, choose a particular distance to use as a reference (say, one meter). The relative sound level is then the square of the ratio of the reference distance to the distance traveled.

Finally, each time a sound ray contacts a wall, it is attenuated. In reality, the attenuation is frequency dependent and varies based upon angle of incidence and the characteristics of the surface. To simplify matters, assume the walls simply scale the sound ray by a fraction. Thus, the sound level is determined by Figure 6(b).
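A minimal sketch of Figure 6(b) in code: inverse-square spreading loss relative to the reference distance, times one reflectivity factor per wall contact. The helper name and the one-meter reference distance are my assumptions; compare the TapGain calculation in Listing Three:

```cpp
#include <cmath>

// Figure 6(b) as code: relative level falls off as the square of
// (reference distance / distance traveled), and each wall contact
// scales the ray by ReflCoef. (Helper name and RefDist are assumed.)
double TapGainFor(double Dist, double ReflCoef, int ReflOrd)
{
    const double RefDist = 1.0;             // reference distance, meters (assumed)
    return std::pow(RefDist / Dist, 2.0)    // inverse-square level falloff
         * std::pow(ReflCoef, ReflOrd);     // one attenuation per wall contact
}
```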

The Room Response

Using the knowledge of the arrival times and the relative levels of the sound rays, the room response (based upon the early reflections) can be generated. The room response is the sum of the delayed and attenuated versions of the sound emitted by the source. Figure 6(c) shows how the Gain and Delay information can be used to compute the final sound. This equation is simple to implement. Only three arrays are needed: one each for gain values, delay values, and samples. The array of samples must be at least as large as required by the greatest delay time. The amount of computation required is low, but the memory required is relatively high.
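The three-array scheme just described can be sketched as a tiny tapped-delay-line FIR filter. All the names here are mine for illustration; Example 2 implements the same idea with the TapDelay, TapGain, and Samples arrays:

```cpp
#include <vector>

// One step of a tapped-delay-line FIR: store the input sample in a
// circular buffer, then sum delayed, attenuated taps. The buffer must be
// at least as large as the greatest delay (names are mine, not the article's).
double ProcessSample(std::vector<double>& buf, long& idx,
                     const std::vector<long>& delay,
                     const std::vector<double>& gain,
                     double in)
{
    buf[idx] = in;
    double out = 0.0;
    for (long t = 0; t < (long)delay.size(); ++t) {
        long j = idx - delay[t];
        if (j < 0) j += (long)buf.size();   // wrap modulo the buffer size
        out += buf[j] * gain[t];
    }
    idx = (idx + 1) % (long)buf.size();     // advance the modulo index
    return out;
}
```

Feeding in an impulse produces the impulse response itself: each tap appears at its delay, scaled by its gain.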

Implementation

Since I wanted the program to be versatile enough to change many of the key parameters without having to modify the source code and rebuild it, I decided to use a parameter file; see Example 1. All times are in seconds and all distances are in meters. Many of the parameters are obvious, such as the input gain, the room dimensions, the wall reflectivity, and the listener coordinates. The flag on the second line indicates an open or closed sound-source path. For an open path, when the sound source reaches its last coordinate, it stays there. For a closed path, the source continually repeats the cycle. The number of mirrored rooms per dimension is also indicated. For a value of 5, there are 125 rooms total, 124 of which are mirrored rooms. The remainder of the parameter file provides the sound-source coordinates. Each coordinate is paired with the time the source spends at that position; in this sample file, the source moves to a new position every second.

To facilitate 3-D space positions, I used two data structures: point and position (see Listing One). The point structure contains 3-D coordinates. The position structure pairs a point with a time value and a pointer to the next position. In this manner, it is easy to create and traverse a list of positions, staying at one position until its time has elapsed.
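Using the structures from Listing One, the position-list check can be sketched as below. The Advance helper and its ElapsedSecs parameter are my names for illustration; Example 2 performs the equivalent test inline:

```cpp
#include <cstddef>

// Structures reproduced from Listing One.
struct point { double X; double Y; double Z; };
struct position { point Coord; double Time; position *NextSource; };

// Move to the next position once the current one's time has elapsed
// (a sketch of the check in Example 2; names here are assumed).
position *Advance(position *cur, double ElapsedSecs)
{
    if (cur->NextSource != NULL && cur->Time <= ElapsedSecs)
        return cur->NextSource;   // time is up: advance down the list
    return cur;                   // otherwise stay at this position
}
```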

Only a few parts of the entire program are associated with the signal processing. The signal processing is performed in the main program, with two supporting functions. Listing One includes their function prototypes, while Listing Two presents the variable declarations. The complete source code is available electronically.

Figure 7 is a flow chart of the core program, while Example 2 is its code. Before entering the steady-state operation of the program, the linked list of positions is created, memory is allocated for buffers and coefficients, and the impulse response for the first position in the list is calculated.

The processing loop executes until the entire input-data file has been processed. Once inside the loop, the first order of business is to examine the sound-source position list. The list is first checked to see if the end has been reached (for a closed path the list is circular, so this never occurs). Then, it's just a matter of time! When the specified time has expired, the pointer in the position list advances, and a new room response is calculated. No problem. Processing thereafter ensues, using the new room response.

The remainder of Example 2 essentially implements Figure 6(c) to produce the processed output. The TapDelay and TapGain arrays contain values for every reflection. The OutputIndex is calculated as a delay from the current index into the sample array using the values in the TapDelay array. Before OutputIndex is used, it is tested and adjusted for modulo behavior to stay within the bounds of the sample array. Finally, the Output is the summation of delayed and attenuated input samples. This is a finite impulse-response filter. The remainder of the main program is little more than file I/O and modulo index incrementing.

Calculating the impulse response of the room is really the central focus of this article. Example 3 is the core of this calculation. To see the function in its entirety, refer to Listing Three. The TapDelay and TapGain arrays will be filled with the delays and gains for each reflection. As shown earlier, these are used in the main program to calculate the output.

Each loop represents one dimension; hence, three of them cover three dimensions. Each combination of i, j, and k represents a mirrored room in 3-D space. Figure 8 shows how the looping values i, j, and k correspond to mirrored rooms.

The steps to calculate the room-response parameters are simple. You first mirror the sound source into a mirrored room using the looping variables. This is done by implementing the equation in Figure 5(c), ultimately providing the mirrored sound-source coordinates. The TapDelay for this sound ray is then determined using Figure 6(a). The values for Fs (sampling frequency) and Vs (velocity of sound) are constants.

All that remains is initializing the TapGain array. The distance is known, but the reflection order must also be incorporated to account for attenuation that occurs at each room surface. In the two-dimensional mirrored room model in Figure 4, the order can be determined graphically by counting the number of line segments the sound ray crossed moving from the mirrored sound source to the listening position. In three dimensions, this concept expands to be the number of planes crossed. The reflection order is computed and then used in the calculation for TapGain, which is an implementation of Figure 6(b).

With the TapDelay and TapGain arrays fully initialized, the function is complete and processing of audio using this newly generated room response is handled by the main program discussed earlier.

Conclusion

A simple model of the human head, combined with the early-reflections model presented here, will yield extracranial sound images that are really quite astounding. I encourage you to read Durand Begault's book for more on this topic. Beyond that, it is up to you to decide on the level of control needed to yield the desired result. But be careful: once you get started, this kind of programming is addictive.

References

Begault, Durand R. 3-D Sound For Virtual Reality and Multimedia. San Diego, CA: Academic Press, 1994.

Handel, Stephen. Listening. Cambridge, MA: MIT Press, 1989.

Woram, John. Sound Recording Handbook. Valley Forge, PA: Howard W. Sams & Company, 1988.

Figure 5: (a) Mirrored rooms in the x-dimension only; (b) calculating the positions of the mirrored sound sources; (c) single equation for the position of a mirrored sound source.

Example 1: File of input parameters.

0.5               # Input gain for data file (float)
1                 # Open Path = 0 Closed Path = 1 (int)
6    5    4       # Room Dimensions (X,Y,Z) (float)
0.95              # Reflectivity of walls (float)
11                # Number of rooms per dimension, 1 or greater
2    3    2       # Listener coord (X,Y,Z) (float)
2    2    2    1  # Source coord and time at coord (X,Y,Z,t) (float)
3    2    2    1  # Source coord and time at coord (X,Y,Z,t) (float)
4    3    2    1  # Source coord and time at coord (X,Y,Z,t) (float)
3    4    2    1  # Source coord and time at coord (X,Y,Z,t) (float)
2    4    2    1  # Source coord and time at coord (X,Y,Z,t) (float)

Example 2: Processing portion of main program.

// get params from file, init variables, make position list
CurrentSource = InputParameters(PositionFile,Room,Source,
                               Listener,ReflCoef,InGain,NR);
AllocateMemory(Fs,Room,NR,Samples,TapDelay,TapGain,BufSize);

// calculate the response for the first position
CalcImpResp(TapDelay,TapGain,Room,CurrentSource->Coord,
            Listener,ReflCoef,Fs,NR);
        
while (feof(InFileBin)==0) {

// test for final position, test elapsed time...
// ... advance position and calc new response if time
  if (CurrentSource->NextSource!=NULL) {
    if ((CurrentSource->Time) <= (ElapsedSamps/Fs)){
      CurrentSource = CurrentSource->NextSource;
      CalcImpResp(TapDelay,TapGain,Room,CurrentSource->Coord,
                  Listener,ReflCoef, Fs, NR);
      ElapsedSamps=0;
    }
  }
  fread(&Sample,sizeof(Sample),1,InFileBin);
  Samples[SamplesIndex] = (float) InGain*Sample;
  // calculate output using all taps
  Output = 0;
  for (int i = 0;i < NR*NR*NR;i++) {
    OutputIndex = SamplesIndex-TapDelay[i];
    if (OutputIndex < 0) OutputIndex += BufSize;
    Output += Samples[OutputIndex]*TapGain[i];
  }
  Sample = (short) Output;
  fwrite(&Sample,sizeof(Sample),1,OutFileBin);
  SamplesIndex++;                // increment modulo index
  if (SamplesIndex >= BufSize) SamplesIndex = 0;
  ElapsedSamps++;
}                                // end while loop

Example 3: Subroutine to calculate the room's impulse response.

for (int i=-NR/2;i<=NR/2;i++) {   // loop through all 3 dimensions
   for (int j=-NR/2;j<=NR/2;j++) {
      for (int k=-NR/2;k<=NR/2;k++) {
        // calc x,y,z sound source coords in mirrored room
        MirrSource.X = (i)*Room.X 
                      +fabs(((i)%2)*Room.X)
                      +pow(-1,(i))*Source.X;
        MirrSource.Y = (j)*Room.Y 
                      +fabs(((j)%2)*Room.Y)
                      +pow(-1,(j))*Source.Y;
        MirrSource.Z = (k)*Room.Z 
                      +fabs(((k)%2)*Room.Z)
                      +pow(-1,(k))*Source.Z;
        // calculate distance to listener
        Dist = sqrt(pow(MirrSource.X-Listener.X,2)
                   +pow(MirrSource.Y-Listener.Y,2)
                   +pow(MirrSource.Z-Listener.Z,2));
        // calculate delayed arrival time of reflection
        TapDelay[(i+NR/2)*NR*NR+(j+NR/2)*NR+(k+NR/2)]=Dist/Vs*Fs;
        ReflOrd = abs(i)+abs(j)+abs(k);
        // calculate attenuation for the reflection
        TapGain[(i+NR/2)*NR*NR+(j+NR/2)*NR+(k+NR/2)]=
                pow(RefDist/Dist,2.0)*pow(ReflCoef,ReflOrd);
      }
   }
}

Listing One

// structure for a point in 3 dimensional space
struct point {
  double X;
  double Y;
  double Z;
};
// structure for a point paired with time duration
struct position {
  point Coord;
  double Time;
  position *NextSource;
};
void CalcImpResp(long *, double *, point &, point &, point &,
                 double, double, int);
position * InputParameters(FILE *, point &, point &, point &,
                           double &, double &, int &);

Listing Two

short          Sample;         // variable for reading from file
int            NR;             // num of mirrored rooms per dimension
long           FileSize;       // number of samples to process
long           BufSize;        // size of sample array
long           *TapDelay;      // pointer to array of tap delays
long           SamplesIndex;   // current index into sample array
long           OutputIndex;    // index into sample array for output
unsigned long  ElapsedSamps=0; // elapsed time since new position 
float          *Samples;       // pointer to array of input samples
double         Output;         // the value of the current output
double         Fs;             // the sampling frequency
double         *TapGain;       // pointer to array of tap gains
double         ReflCoef;       // the reflectivity of the walls
double         InGain;         // a fixed gain applied to the input
position       *CurrentSource; // current source in positions list 
point          Room;           // coords for the room size
point          Source;         // coords of current sound source
point          Listener;       // coords of the listener

Listing Three

/******************************************************************************
CalcImpResp

This subroutine calculates the time delays (in samples) and attenuations for the early reflections in the room.
******************************************************************************/

void
CalcImpResp
  (long * TapDelay,
  double * TapGain,
  point & Room,
  point & Source,
  point & Listener,
  double ReflCoef,
  double Fs,
  int NR)
{
  double Dist;                     // distance travelled by sound ray
  double ReflOrd;                  // reflection order of sound ray
  point MirrSource;                // mirrored source x,y,z coords
  for (int i=-NR/2;i<=NR/2;i++) {  // loop through all 3 dimensions
     for (int j=-NR/2;j<=NR/2;j++) {
        for (int k=-NR/2;k<=NR/2;k++) {
          // calc x,y,z sound source coords in mirrored room
          MirrSource.X = (i)*Room.X
                         +fabs(((i)%2)*Room.X)
                         +pow(-1,(i))*Source.X;
          MirrSource.Y = (j)*Room.Y
                         +fabs(((j)%2)*Room.Y)
                         +pow(-1,(j))*Source.Y;
          MirrSource.Z = (k)*Room.Z 
                         +fabs(((k)%2)*Room.Z) 
                         +pow(-1,(k))*Source.Z;
          // calculate distance to listener
          Dist = sqrt (pow(MirrSource.X-Listener.X,2) 
                       +pow(MirrSource.Y-Listener.Y,2)
                       +pow(MirrSource.Z-Listener.Z,2));
          // calculate delayed arrival time of reflection
          TapDelay[(i+NR/2)*NR*NR+(j+NR/2)*NR+(k+NR/2)]=Dist/Vs*Fs;
          ReflOrd = abs(i)+abs(j)+abs(k);
          // calculate attenuation for the reflection
          TapGain[(i+NR/2)*NR*NR+(j+NR/2)*NR+(k+NR/2)]=
                  pow(RefDist/Dist,2.0)*pow(ReflCoef,ReflOrd);
        }
     }
  }
}