Mike Cohn works as Software Team Leader for Telephone Response Technologies. He has an M.A. in economics from the Claremont Graduate School and is pursuing an M.S. in computer science at the University of Idaho. His interests include software engineering, object-oriented design, and anything remotely related to C or C++. He can be reached at 70324,3535 on Compuserve.
By now we've all experienced it. In fact, hardly a week goes by when we don't at some point deal with an Interactive Voice Response (IVR) system. IVR systems are those that allow us to determine our bank account balances, confirm airline reservations, order retail merchandise, or request product information from companies solely by using our telephone keypads. It may come as a surprise to many programmers that a great many of these IVR systems run on standard MS-DOS machines equipped with add-on cards.
Despite the power inherent in these applications, programming them can be made quite simple given the proper basic framework. Because one of the basic requirements of an IVR system is that it handle multiple incoming phone lines, the code presented here will be of interest to anyone who could make use of a multitasking state machine in C++.
Necessary Hardware
To set up an IVR system, you will need two pieces of special hardware: an add-on card to interface with an incoming telephone line, and a record adapter. The record adapter is a small box that connects in-line between your speech card and your telephone. It allows you to simulate an incoming call. You could program a system without the record adapter by actually calling into your system over the local phone lines. However, this is not only inconvenient; it also reduces the quality of your recordings because of the inherent line noise.
Voice cards are available from a variety of manufacturers, ranging from Talking Technologies' BigMouth on the low end to Dialogic on the upper end. Because of their superior features and applicability to multi-line systems, the example developed in this article uses the Dialogic cards. However, the basic concepts presented here are portable among most card manufacturers. The use of C++ allows for the encapsulation of any card-specific details.
Basic Concepts
Channel Event Queue
The Dialogic speech cards maintain an internal queue of events which occur on all phone channels within the system. Events are signalled when certain multitasking functions of the hardware are completed. These functions include playing and recording messages, getting touch tone input from a caller, and performing outdialing operations. As these events occur, the speech card will store the information. It may then be retrieved through a call to the Dialogic library function gtevtblk.
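The retrieve-one-event-at-a-time pattern described above can be sketched without the hardware. This is only an illustration: Event, EventQueue, post, and next are hypothetical stand-ins, the event codes are placeholders, and the real gtevtblk call and its signature are not reproduced here.

```cpp
#include <cassert>
#include <queue>

// Hypothetical stand-in for one entry in the card's event queue.
struct Event {
    int channel;  // phone channel the event belongs to
    int code;     // what finished (placeholder values, not Dialogic's)
};

// Simulated queue. On real hardware the card stores the events and the
// program retrieves them through the vendor library.
class EventQueue {
public:
    void post(int channel, int code) {
        assert(channel >= 0);
        pending_.push(Event{channel, code});
    }

    // Copies the oldest event into 'out' and returns true, or returns
    // false when nothing is waiting.
    bool next(Event& out) {
        if (pending_.empty()) return false;
        out = pending_.front();
        pending_.pop();
        return true;
    }

private:
    std::queue<Event> pending_;
};
```

Events come back in the order they were posted, so a caller can keep asking for the next one until the queue reports that it is empty.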
Read/Write Block
When messages are recorded or played on the speech cards, information is passed between the calling program and the speech card through a read/write block. This C structure is defined in the Dialogic library. It is used for passing information such as the file handle to use, the number of bytes read or written, and the maximum number of seconds of silence before cancelling a record.
The Sample Application
Because our system will support multiple phone channels in a single machine, it will be important to maintain information about the current state of each channel. For example, channel 1 may be playing the "hello" message, while channel 2 is getting input from a caller, and channels 3 and 4 are idle. Maintaining this state information can most easily be accomplished by using state machines that control the flow of the individual phone lines through the system. A State Transition Diagram for this program is shown in Figure 1.
As you can see, this example program involves ten states, with all phone channels beginning in the Wait For Ring state. After a ring is detected, the channel will transition to the Offhook state, which is the computer's equivalent of lifting the receiver. If that operation is successful, it will be followed by the Hello state. If the offhook operation is not successful, the channel will be placed back onhook and return to Wait For Ring. At the Hello state, a simple message informing the user that he has reached our IVR system is played.
In the Get Digits state, the user is asked to press 11 to hear a message about current products, to press 22 to record a message, or to press 33 to hear the most recently recorded message. After either playing or recording the desired message, the channel progresses to the Goodbye state. At this time a "Thank you for calling" message is played, and the channel is placed back on hook and into the Wait For Ring state.
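The flow just described can be written down as a small transition function. This is a sketch of the path given in the text, not a copy of Figure 1: it names only nine states, the State and Choice enumerations and next_state are hypothetical, and returning to Get Digits when no valid choice is made is an assumption.

```cpp
#include <cassert>

enum class State { WaitForRing, Offhook, Hello, GetDigits, ProductInfo,
                   RecMsg, Playback, Goodbye, Onhook };
enum class Choice { None, Products, Record, Playback };

// Returns the state that follows 's'. 'ok' reports whether the operation
// that just finished succeeded; 'choice' matters only when leaving GetDigits.
State next_state(State s, bool ok, Choice choice) {
    switch (s) {
    case State::WaitForRing: return State::Offhook;   // ring detected
    case State::Offhook:     return ok ? State::Hello : State::Onhook;
    case State::Hello:       return State::GetDigits;
    case State::GetDigits:
        switch (choice) {
        case Choice::Products: return State::ProductInfo;
        case Choice::Record:   return State::RecMsg;
        case Choice::Playback: return State::Playback;
        case Choice::None:     return State::GetDigits;  // assumption: ask again
        }
        break;
    case State::ProductInfo:
    case State::RecMsg:
    case State::Playback:    return State::Goodbye;
    case State::Goodbye:     return State::Onhook;
    case State::Onhook:      return State::WaitForRing;
    }
    assert(false && "unhandled state");
    return State::WaitForRing;
}
```

Keeping every transition in one function makes the diagram easy to check, but it does not by itself let several lines progress at once.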
Multitasking with State Machines
Two steps will be involved in achieving the level of multitasking necessary in creating our IVR application. The first step will be to partition each state into two functions which will begin and end the state. The second step will be to write a main processing function which will repeatedly check on the status of each channel.
States are broken into "begin" and "end" functions so that state transitions can be separated from state processing. For example, in the Offhook state's begin function, the Dialogic card is instructed to take the line offhook. The Offhook state's end function is called the next time a Dialogic event is received for that line. The end function will control the state transition by moving the line to either the Hello or the Onhook state based on whether the current Offhook attempt was successful.
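One pass of such a main processing function might look like the sketch below. It assumes the events have already been collected into a vector; Event, Line, and process_events are hypothetical names, and the state numbers and the rule inside cmplt_state are placeholders rather than the article's states.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Event { int channel; int code; };  // placeholder event record

// Hypothetical per-line object. cmplt_state() ends the current state and
// picks the next one; begin_state() starts whatever state is now current.
struct Line {
    int state = 0;  // placeholder state number
    int begun = 0;  // how many states have been started on this line

    void cmplt_state(int code) { state = (code == 0) ? state + 1 : 0; }
    int begin_state() { ++begun; return 0; }
};

// For every pending event, end the state on that line, then begin the
// state the line moved to. Lines with no event are left alone.
void process_events(std::vector<Line>& lines, std::vector<Event>& pending) {
    for (const Event& e : pending) {
        assert(e.channel >= 0 &&
               static_cast<std::size_t>(e.channel) < lines.size());
        Line& line = lines[e.channel];
        line.cmplt_state(e.code);
        line.begin_state();
    }
    pending.clear();
}
```

Because no call waits for the hardware, a slow operation on one line never holds up the others; each line advances only when its own event arrives.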
The CHANNEL Class
The only class necessary for a simple IVR application will be CHANNEL. It contains all information specific to a given phone line. Defining just the data members of the CHANNEL class gives the following:

    class CHANNEL {
    private:
        int lineno;
        char msgname;
        char digits[MAXDIGITS+1];
        RWB rwb;
        int (CHANNEL::*begin_func)();
        void (CHANNEL::*end_func)(int evtcode);
    };

Each instantiation of the CHANNEL class contains a character array to hold the name of any speech file currently being played. It also contains a second array which will hold digits that are entered on the caller's phone keypad. The lineno member stores the channel number and the rwb member is the channel's read/write block.
The most interesting parts of the CHANNEL class are the two function pointers, begin_func and end_func. These two function pointers are the means for initializing and terminating each state. Each state will consist of two functions, each of which will be pointed to by one of these class members. Because the function pointers are declared as private, we will need to add public member functions to our class to access these function pointers:

    public:
        int begin_state();
        void cmplt_state(int evtcode);

The biggest advantage of accessing our begin and end function pointers through separate class members is not the extra level of abstraction but the ability to hide one of C++'s stranger syntactical features. The begin_state function is written as follows:

    typedef int (CHANNEL::*INTPROC)();

    int CHANNEL::begin_state()
    {
        INTPROC fp = this->begin_func;
        return (this->*fp)();
    }

This code sets a pointer to a function which is itself a member of the CHANNEL class. This function is then executed through the implicit this object. Because this aspect of C++ takes some getting used to, it is beneficial to hide it in a separate member function. Similar pains are taken with the end function, as can be seen in Listing 2.
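The end function can be handled the same way. The following is a guess at its shape, not a copy of Listing 2: the class is cut down to what the example needs, and VOIDPROC, last_event, and the placeholder body of offhook_cmplt are hypothetical.

```cpp
#include <cassert>

class CHANNEL {
public:
    CHANNEL() : end_func(&CHANNEL::offhook_cmplt), last_evt(-1) {}
    void cmplt_state(int evtcode);
    int last_event() const { return last_evt; }

private:
    // Placeholder end function: it only remembers the event code.
    void offhook_cmplt(int evtcode) { last_evt = evtcode; }

    void (CHANNEL::*end_func)(int evtcode);
    int last_evt;
};

typedef void (CHANNEL::*VOIDPROC)(int);

// Same pattern as begin_state: copy the member pointer into a local and
// call it through the this pointer, passing the event code along.
void CHANNEL::cmplt_state(int evtcode) {
    assert(end_func != nullptr);
    VOIDPROC fp = this->end_func;
    (this->*fp)(evtcode);
}
```

A caller simply writes ch.cmplt_state(code) and never sees the ->* operator.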
Member State Functions
Once designed, the actual implementation of the individual begin and end functions is almost trivial. All code for processing the states used in this sample application is contained in Listing 3. For example, consider the Offhook state, which consists of offhook and offhook_cmplt.

    int CHANNEL::offhook()
    {
        return(sethook(lineno, H_OFFH));
    }

    void CHANNEL::offhook_cmplt(int evtcode)
    {
        if (evtcode == T_OFFH) {
            begin_func = &CHANNEL::hello;
            end_func = &CHANNEL::hello_cmplt;
        }
        else {
            // otherwise go back on hook
            begin_func = &CHANNEL::onhk;
            end_func = &CHANNEL::onhk_cmplt;
        }
    }

The begin function, CHANNEL::offhook, simply instructs the Dialogic hardware to set the status of the line to offhook. Because this task cannot be performed instantly, the hardware will at some point signal an event for this line. At that point in time the end function, CHANNEL::offhook_cmplt, will be called and passed the event code. If that event code represents a successful offhook operation, the state function pointers will be set to the Hello state; otherwise they will be set to the Onhook state.
Playing Messages
Probably the most fundamental concept in any IVR application is playing messages. Because a typical IVR application may play hundreds or even thousands of different messages, the actual implementation of playing a message has been hidden a level lower. States that play messages do not need to do anything other than store the name of the file to play. For example, the Goodbye state is written as:

    int CHANNEL::goodbye()
    {
        strcpy(msgname, "goodbye");
        return play();
    }

    void CHANNEL::goodbye_cmplt(int evtcode)
    {
        close(rwb.filehndl);
        begin_func = &CHANNEL::onhk;
        end_func = &CHANNEL::onhk_cmplt;
    }

It is in the play function, which is called by CHANNEL::goodbye, that the true play begins. CHANNEL::play, shown in Listing 2, creates the full name of the speech file to play. A handle to this file is stored in the channel's read/write block. Also stored in the read/write block is information instructing the hardware to terminate the play if the user presses a touch tone or if a loop signal is detected. A loop signal is essentially a change in the current on the line. This frequently indicates that a caller has hung up. The play is initiated through the call to xplayf, which is in the Dialogic library.
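Turning a short message name into a full file name could be done along these lines. This is only a sketch of the name-building step, not the code in Listing 2: speech_path is a hypothetical helper, the directory argument and the backslash separator are assumptions, and the .VOX extension is borrowed from the recording file named later in this article.

```cpp
#include <cassert>
#include <string>

// Joins an optional directory, a message name, and the .VOX extension.
std::string speech_path(const std::string& dir, const std::string& msgname) {
    assert(!msgname.empty());
    std::string path = dir;
    // Add a separator only when a directory was given and lacks one.
    if (!path.empty() && path.back() != '\\') path += '\\';
    path += msgname;
    path += ".VOX";
    return path;
}
```

With this helper a state only needs to know the short name, for example "goodbye", and where the files live can change in one place.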
Gathering Touch Tones
The ability to gather and process touch tones from a caller is what distinguishes an IVR application from a glorified answering machine. Touch tones gathered by our demo application will be stored in the digits element of the CHANNEL class. The first step in gathering digits is to clear the channel's read/write block by using the Dialogic library function clrrwb. Next, a pointer to the location of the character array used for storing digits is set in the read/write block. Finally, options for the number of touch tone digits to get, the maximum number of seconds to allow for input, and whether to terminate on loop signal are set.
Because touch tone input is stored in the CHANNEL class as a null-terminated string, it is possible to perform standard string manipulation functions with it. In the get_digits_cmplt function in our example, a simple menu is created by performing a series of string comparisons. Based on the results of the string comparison, the application can then progress to the Product_info, Playback, or Rec_msg states.
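A menu built from string comparisons might be sketched as follows. This is not the get_digits_cmplt from Listing 3: choose_state and the NextState enumeration are hypothetical, the digit strings are the ones given in the menu prompt described earlier, and going back to Get Digits on unrecognized input is an assumption.

```cpp
#include <cassert>
#include <cstring>

enum NextState { PRODUCT_INFO, REC_MSG, PLAYBACK, GET_DIGITS };

// Maps the caller's null-terminated digit string to the state to enter next.
NextState choose_state(const char* digits) {
    assert(digits != nullptr);
    if (std::strcmp(digits, "11") == 0) return PRODUCT_INFO;
    if (std::strcmp(digits, "22") == 0) return REC_MSG;
    if (std::strcmp(digits, "33") == 0) return PLAYBACK;
    return GET_DIGITS;  // assumption: anything else asks the caller again
}
```

In the article's design, the end function would then set begin_func and end_func according to the chosen state.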
Recording a Message
Some IVR applications can be quite functional without allowing callers to record messages. For example, many vendors now have technical support hotlines that play back answers to frequently asked questions. But it is certainly an easy and valuable feature to implement. As shown in CHANNEL::rec_msg in Listing 3, the steps involved are to clear the channel's read/write block and then get a handle to the file into which you want to record the caller's message. In this application, all caller recordings will be stored in OUTPUT.VOX. After opening the file, a number of options are set in the read/write block. These control the maximum recording length, which digits will terminate the recording, the amount of silence before timing out, whether to terminate on a loop signal, and the duration of the beep used to notify the caller to begin speaking. The actual record process then begins with a call to the recfile function in the Dialogic library.
All that is necessary in the end function for recording a message is to close the open file handle and clear the channel of any digits that the caller may have pressed. If the channel is not cleared of these digits, they will still be waiting the next time the Dialogic card is called.
Demonstration Program
The demonstration program presented here implements the features of playing messages, recording messages, and gathering and acting upon touch tone input from a caller. It is thus a good foundation for building more useful applications. It can easily be enhanced so that callers can be presented a list of topics and then hear a message about each topic. Similarly, the system could be enhanced by removing the limitation that allows only a single user-recorded message. Instead of always recording to the filename OUTPUT.VOX, the system could record to a filename based on the value of an integer that is incremented each time a record is requested.
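The counter-based filename suggested above could be generated as in the following sketch. The MSG prefix, the five-digit width, and the function name next_recording_name are all hypothetical; they were chosen so that the result fits an MS-DOS 8.3 filename.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Increments 'counter' and returns a name such as MSG00001.VOX.
std::string next_recording_name(unsigned& counter) {
    assert(counter < 99999u);  // keep the number within five digits
    ++counter;
    char name[13];             // 8 + '.' + 3 characters, plus the terminator
    std::snprintf(name, sizeof name, "MSG%05u.VOX", counter);
    return std::string(name);
}
```

A real system would also need to keep the counter somewhere that survives a restart, so that earlier recordings are not overwritten.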
Practical Applications of IVR
One of the most basic applications you can create using the foundation presented here is the "Automated Attendant." This is the computerized receptionist that can be programmed to answer your regular phone lines. Automated attendant systems can be full-time systems that route calls through your regular phone systems (including PBXs). They can also be off-hours systems that can present callers with a menu of likely options. ("Press 1 to leave a message for Customer Service, press 2 to leave a message for Marketing.") Systems such as these are extremely useful at playing back any standard set of information a user may request. ("Press 1 for upgrade information, press 2 for a list of new features.")
Many companies have made use of IVR technology to provide order hotlines through which customers may track the progress of their orders. Others have used IVR to automate order processing, especially for customers upgrading to a new software version. Such systems can use the caller's serial number to determine the caller's name and address, and can then request a credit card number. Credit card validation can be accomplished by dialing out on a different line to a credit validation company.
One of the most exciting prospects for the use of IVR to arise over the past few years has been "Demand Publishing." This is the integration of an IVR system with one or more fax cards installed in the same machine. The caller can instruct the system to fax him company information or product literature. Most of the large software resellers have added systems like this over the past year. This allows you to call them and, by pressing product access codes, have information on available products faxed back to you instantly.
I have shown how easy it can be to create IVR applications using C++. But there may be even easier methods based on your needs. There are now commercial products available that serve as Fourth-Generation Languages for creating voice applications. Naturally, there will be a tradeoff between flexibility and development effort when using one of these 4GL products. However, in most cases the saved development time is worth it.
Listing 1
Listing 4