The VESA BIOS Extension/Audio Interface

A standard software interface for audio

Doug Cody

Doug is a software engineer at MediaVision and member of the VBE/AI committee. He can be contacted at MediaVision, 47300 Bayside Parkway, Fremont, CA 94538.


The Video Electronics Standards Association (VESA, for short) is an international, nonprofit standards organization responsible for creating the Super VGA BIOS standards, that has developed and published the VESA BIOS Extension/Audio Interface (VBE/AI) for AT-class PCs. The main goal of VBE/AI is to provide a standard software interface for audio, similar to the ROM BIOS for video. Just as the lack of Super-VGA standards has hindered application development in the video world, the lack of sound standards has limited the sound-board industry. At the current time, support for individual audio boards falls on the shoulders of third-party developers. In addressing this void, VBE/AI has three main goals:

The following article briefly describes various aspects of the VBE Audio Interface.

VESA Audio Services

The video BIOS in PCs provides a set of functions to set video modes, read/write characters to the screen, scroll the screen, and so on. Figure 1 shows the original services. The VESA Super-VGA BIOS extensions started higher numerically to coexist with the standard BIOS. In the first release, the extensions appeared similar to the standard set; see Figure 2. In the next couple of releases, a provision was made to allow extensions for other types of devices, starting at function number 0x4F10. All new VESA standard devices will be allocated a particular ID, as in the case of the Audio Interface, which was assigned 0x13. Figure 3 lists the complete set of VBE/AI Int 10H functions. The primary purpose of the Int 10 services is to allow the application to query, open, and close the device. Access to the device API is performed through 32-bit far calls using Pascal calling convention.

Since there can be multiple audio devices in one system, VBE/AI allows for multiple software APIs, one per device. A handle gives the application access to one device API. To use a device, the application locates a device handle, queries the specific device via the handle, then opens and closes the device via the handle.

Before jumping into specific device services, I just want to skim over the query process. Within VBE/AI, each device provides two data structures. One of these contains information about the device, and the other contains the specific entry points of the device services. The query process allows the application to retrieve a copy of the information structure. This structure contains information such as hardware ID, configuration, and capabilities. The format of this structure is not yet complete at the time of this writing, but it is worth mentioning because it is an integral part of the VBE/AI interface.

WAVE Audio Interface

The Wave device class supports two common metaphors for handling the data: single-block and continuous streaming. The single-block operation performs the operation on the data block, issues an interrupt when done, then returns to an idle state. The streaming mode attempts to keep data moving to and from the hardware and doesn't stop until explicitly told to by the application. In this mode, completion interrupts are presented to the application at fixed points during the streams I/O. These approaches model the real-world hardware operation of the 8237 DMA controller, of which there are two in every AT class machine. Most audio cards make use of a DMA channel; some use two. Other devices, such as the Turtle Beach Multisound card, make use of memory mapping with on-board memory, using the CPU to move the data instead of a DMA channel. Since only the metaphor of single-block and continuous modes are characterized in the interface, these non-DMA devices can easily be supported.

When a VBE/AI Wave device is opened, the application is given a copy of it's services structure. Listing One, page 62 illustrates the Wave Services structure. The data fields in the structure are for identification purposes or reserved for future updates of the VBE/AI release. The remaining portion of the services structure presents subroutine calls for the application and two 32-bit far callbacks for the application to fill out. These callbacks allow the VBE/AI code to call the application on certain events.

The services can be lumped into three groups: housekeeping, process control, and block I/O. Each of these services is described in Figure 4. Note wsTimerTick. In the VBE/AI definition, the application "owns" the system-clock timer. The VBE/AI device specifies how many callbacks per second are needed to keep running. The application is responsible for calling this subroutine at the requested rate.

The two callbacks from the device to the application provide the application with real-time notification that certain events have occurred. These two callbacks indicate the completion of block I/O performed by the device. There is one callback for completion of PCM playback, and one for PCM record; see Figure 5. The two-callback approach was adopted so that full duplex (that is, simultaneous play and record) could be performed using the VBE Audio Interface.

MIDI Interface

The MIDI device class in its simplest form uses the metaphor of a MIDI transmitter for playing notes. This is similar to the Windows approach that views all musical devices, such as an FM chip, wave-synthesis card, or MIDI port, as a common device type. The VBE/AI MIDI device is defined as a general-MIDI device, so its channel and patch (instrument) assignments are consistent with the International MIDI Association's definition of general MIDI. Unlike Windows, though, there is no differentiation between low- and high-end synthesizers. This scheme, within Windows, was implemented by allocating channels 1-10 for the high-end synthesizer, and channels 11-16 for a low-end synthesizer. In the general-MIDI specification (and therefore in the VBE/AI interface), only channel 10 is recognized as a special channel. All others are free to be used any way the application desires. (Incidentally, channel 10 in the general-MIDI specification is the percussive instrument channel.)

When a VBE/AI MIDI device is opened, a copy of its services structure is given to the application; Listing Two, page 62 illustrates this structure. The structure, like the wave-services structure, contains few data variable, several subroutine calls, and a couple of callbacks to the application.

The only data variable presently worth looking at is the mspatches bit field. This array of 256 bits indicates which general-MIDI patches are currently preloaded, and thus available for use. For simple MIDI transmitters and receivers, this bit field could have all bits set, since it can only "assume" that the external device contains the required patches. For devices like an FM synthesizer that have no memory, this table may have all bits set to 0, thus indicating it will need patch data from the application before the instrument is played. Later on, I'll discuss the VBE/AI unique approach to loading patch data into the device.

Again, the services can be lumped into three groups: housekeeping, patch handling, and MIDI I/O; see Figure 6. Additionally, there are two callbacks from the device to the application to provide the application with real-time notification that certain events have occurred. The first callback allows downloaded patch memory blocks to be returned to the application. The second callback lets VBE/AI MIDI receivers send inbound MIDI data to the application in real time; see Figure 7.

One issue addressed by the VBE/AI interface is that most PC MIDI hardware doesn't contain full general-MIDI patch sets. For devices, such as the OPL2 FM chip (as found on the original Adlib FM card), there's no memory within the chip to hold all the general-MIDI instruments. So, either the driver takes up an exorbitant amount of memory to cache the instrument bank, or the application will have to provide the patches from an external source, such as a disk-resident library. The later approach of storing the patches to reside on disk has been adopted for the VBE audio interface. To do this, the architecture has been divided in two parts. The first part is the hardware-specific driver to translate the MIDI stream into audible sound, as described earlier. The second part is an optional disk-resident, vendor-specific, library that holds the individual instrument settings, known as patches.

The role of the patch library is to allow the application to present a given patch to the hardware before the instrument is played. To integrate hardware dependencies in this approach, the application need only "understand" the skeletal structure of the library. In other words, the app must know how to get to the patch data, but doesn't have to understand the contents of the patch. When a program change is required, the application will search the library, extract the patch, and pass this to the device using the msPreLoadPatch subroutine. For patches larger than available memory, a protocol has been defined for sending patch data in smaller chunks.

One benefit to this architecture is that applications can substitute their own patches for the ones in the vendor's library. There's no rule saying the patches must come from the vendor's own library. Of course, this does mean the application is delving into hardware dependencies.

Volume Control

The VBE/AI Volume services is an API for controlling both volume and mixer channels. Within the API are setting controls for stereo volume, sound positioning within a 3-D space using a spherical-coordinate system, parametric equalizer, and filter control. The 3-D sound positioning, which is derived from the virtual-reality field, is the most exciting aspect of this interface. In new products coming to market this year, 3-D audio will become an important selling feature. Two-dimensional schemes such as Qsound and Surround Sound can also work with this interface.

The approach to the volume/mixer class is different than the MIDI and Wave device classes. When opened, the MIDI and Wave devices are considered owned until closed, whereas a volume device can be acquired by any number of applications, and is thus not exclusively owned. This way, the user may have volume control through such things as a TSR, even if the application does not provide one.

An interesting aspect is that volume service APIs are acquired without performing an Int 10 Device Open function these services are acquired through the functions rather than the device open functions.

When a VBE/AI Volume device is acquired, a copy of its services structure is given to the application; see Listing Three, page 62. The structure (like the wave-services structure) contains few data variables, and several subroutine calls. The services can be lumped into two groups: housekeeping, and device control; see Figure 8.

Next Up

In future releases of the VBE/AI specification, you can expect to see a 32-bit audio interface and sound-effects processing control. In addition, VESA has active working committees developing open standards in the areas of video interfacing (VAVI), feature connector (VAFC), and multimedia data channels (VM-Channel).

DOS and Software Standards for Audio

At the Game Developer's Conference held earlier this year in Santa Clara, California, Tom Rettig of Broderbund Software held a round-table discussion on the sound issues facing software developers. A list of topics was fielded from the audience and rated by importance. By an overwhelming majority, the number-one topic was the need for a DOS audio software standard.

You may be wondering why there is such concern when the world seems to be moving to platforms like Windows and OS/2 that already support an audio interface. Indications are that the games industry is leaning towards staying in DOS via 32-bit DOS extenders. DOS is still the optimum game environment due to having direct control over the machine. So, if the games don't move soon to these new OSs, PC-audio industry revenue will be tied to DOS for quite a while. There's still no compelling business audio application that creates revenue like the DOS entertainment market.

It may seem ironic that games developers are asking for a software interface, yet want to have direct control over the hardware. In fact, most DOS software developers demand, as a inalienable right, the knowledge to directly program the hardware. Up until 1990, software vendors had few audio-hardware selections, so support was a relatively easy endeavor. For music, there was the Adlib card and Roland MPU-401. For voice, there was Creative Labs' SoundBlaster and Covox Speech Thing. But when all else failed, the lowly PC speaker could still be used to play tones, and rudimentary sounds.

Since 1990, an explosion of audio hardware, with varying architectures has entered the PC sound market. Each software company that has maintained in-house support for sound has had to carry the burden of adding support for these new cards. As a result, third-party software libraries for audio became very popular in 1991 as the number of cards grew.

But due to the plethora of architectures, the industry is swaying under the burden of trying to support too much with too little resources. There are well over a dozen hardware interfaces a developer will need to support, in order support all PC audio-board products. The showing of hands at the Game Developer's Conference suggests that software developers are ready to take one step back off the hardware, but first need a uniform approach to programming audio. The trade-off of using a hardware-independent software interface, vs. creating, maintaining, and growing a complete library, is now a prudent choice of engineering time.

Not everyone has come to this conclusion, however. A few die-hard game developers have retreated to a position that they will only support a few audio cards. Unfortunately, this doesn't serve the customer, who might have an unsupported audio board, nor does it foster development of advanced technologies. From a competitive standpoint, these developers will ultimately suffer, since competitors will gain a technical one up by taking advantage of the new whiz-bang features found on the newer cards. The old lesson is, standing still only allows your competitors to pass you by.

--D.C.

Figure 1: Standard Int 10H functions

Figure 2: VESA Int 10h extensions

Figure 3: VESA Int 10h extensions for the audio interface

Figure 4: Wave-device services (a) housekeeping, (b) block I/O, (c) process control

Figure 5: Callback functions

Figure 6: MIDI services: (a) Housekeeping, (b) for MIDI I/O; (c) patch handling

Figure 7: MIDI callback functions

Figure 8: Volume and mixer services: (a) housekeeping; (b) device control

[LISTING ONE]


typedef struct {

    // housekeeping

    char wsname[4];     // name of the structure
    long wslength;      // structure length

    char wsfuture[16];      // 16 bytes for future expansion

    // device driver functions

    long (pascal far *wsDeviceCheck ) (int, long );
    int  (pascal far *wsPCMInfo     ) (int, long, int, int );
    int  (pascal far *wsPlayBlock   ) (void far *, long );
    int  (pascal far *wsPlayCont    ) (void far *, long, long);
    int  (pascal far *wsRecordBlock ) ();
    int  (pascal far *wsRecordCont  ) (void far *, long, long);
    int  (pascal far *wsPauseIO     ) ();
    int  (pascal far *wsResumeIO    ) ();
    int  (pascal far *wsStopIO      ) ();
    void (pascal far *wsTimerTick   ) ();
    int  (pascal far *wsGetLastError) ();

    // callback filled in by the application

    void (pascal far *wsApplPSyncCB ) (int, void far *, long ); //play
    void (pascal far *wsApplRSyncCB ) (int, void far *, long ); //rec

} WAVEService, far *fpWAVServ;


[LISTING TWO]


typedef struct {

     // housekeeping

     char msname[4];        // name of the structure
     long mslength;     // structure length

     // runtime data

     int  mspatches[16];    // patches loaded table bit field
     char msfuture[16];     // 16 bytes for future expansion

     // device driver functions

     long ( pascal far *msDeviceCheck ) (int, long );
     int  ( pascal far *msGlobalReset ) ();
     int  ( pascal far *msMIDImsg     ) (char far *, int );
     int  ( pascal far *msPollMIDI    ) (int );
     int  ( pascal far *msPreLoadPatch) (int, int, void far *, long );
     int  ( pascal far *msUnloadPatch ) (int, int );
     void ( pascal far *msTimerTick   ) ();
     int  ( pascal far *msGetLastError) ();

     // callbacks filled in by the application

     void ( pascal far *msApplFreeCB ) ( int, int, void far * );
     void ( pascal far *msApplMIDIIn ) ( int, int, char );

} MIDIServ, far *fpMIDServ;


[LISTING THREE]


typedef struct {

    // housekeeping

    char vsname[4];     // name of the structure
    long vslength;      // structure length

    char vsfuture[16];      // 16 bytes for future expansion

    long (pascal far *vsDeviceCheck  ) (int , long );
    void (pascal far *vsSetVolume    ) (int , int , int );
    long (pascal far *vsSetFieldVol  ) (int , int , int );
    int  (pascal far *vsToneControl  ) (int , int , int );
    long (pascal far *vsFilterControl) (long );
    void (pascal far *vsOutputPath   ) (int );
    void (pascal far *vsResetChannel ) ();
    int  (pascal far *vsGetLastError ) ();

} VolumeService, far *fpVolServ;

Copyright © 1994, Dr. Dobb's Journal