Dr. Dobb's Journal July 1998
Remember the movie 2001: A Space Odyssey? One of the most memorable characters in that movie was, of course, "Hal" the computer. Like most movie computers (the exception being the Macintosh in Star Trek IV), Hal was smart and spoke with humans. Well, it isn't quite 2001, but it's getting pretty darned close, and still, real computers don't speak and understand as well as their movie counterparts. However, recent developments have brought a plethora of voice-recognition tools for Windows. Though these tools are mainly for dictating into word processors, as a developer you can certainly think of other applications. Wouldn't it be nice to piggyback your program on an existing voice recognition system? You can, using toolkits such as Dragon Systems' DragonXTools.
DragonXTools visual controls make it possible for you to speech-enable Windows applications with up to 60,000-word dictation using Visual Basic, Visual C++, and other development tools that support VBX/OCX controls or C-callable libraries. Users then use DragonDictate for Windows to enter text, data, and commands into Windows apps simply by speaking. You can distribute the DragonXTools custom controls royalty free. However, you'll need run-time licenses to distribute DragonDictate for Windows.
In this article, I'll show how to use DragonXTools custom controls to add speech recognition to programs. In doing so, I'll use Visual Basic 5 to write a voice-activated autodialer (available electronically; see "Resource Center," page 3). Since the controls are ActiveX controls, however, you can use most any language with them.
Even if you do not take special action, DragonDictate still works with your program, although it may not work well, depending on how your program is written. When DragonDictate sees your program running, it scans your menu items and control captions. Since DragonDictate generates its own speech models, it can respond to users speaking your menu items and control captions. Of course, some captions work better than others. Sometimes DragonDictate can't determine an appropriate speech model. Also, if you use custom controls for graphical buttons, DragonDictate has no caption to scan, so it can't tell what to call them.
DragonDictate operates in either command mode or dictation mode. Usually, DragonDictate is in command mode, which lets you speak menu commands or key names. In command mode, you can operate the computer without using a mouse or a keyboard. However, entering text in command mode is a chore, because you have to speak each letter individually. Instead, when you want to enter text, you say "dictate mode." From then on, DragonDictate interprets speech as words until you say "command mode." (One problem is that users must remember to enter dictation mode. A speech-aware program might handle this automatically.)
In short, it isn't strictly necessary to alter your programs in any way to make them work with DragonDictate. However, with DragonXTools, you can get significant benefits. For example, you can set your own pronunciations for words that DragonDictate might misinterpret (or graphics that DragonDictate can't read at all). You can also respond to words with your own actions, use DragonDictate's macro language, and share the sound system with DragonDictate (many sound cards can only listen or talk at one time).
DragonXTools includes a control that lets you recognize speech and interact with the Dragon engine, and another that converts plain text into speech. The text-to-speech control works about as you would expect; the speech sounds computer-generated, which is not always pleasing.
Using the components isn't difficult and the manual provides examples to help you get started. The examples are for VB, but the manual includes advice on how to use the components in C++, Delphi, and Java as well.
Dragon separates words it will recognize into vocabularies and groups. It scans different vocabularies depending on the current situation. The Dragon speech control lets you manage vocabularies and groups. You can create new words, and control which groups Dragon examines for speech recognition.
Like all ActiveX controls, the Dragon control has properties, methods, and events. Table 1 is a list of the members used in the VB program I present here. Many functions that you'll need to use require you to access Dragon's scripting language (via the Script property).
Using the controls is straightforward once you get the hang of it. You first have to make sure DragonDictate is running. If it isn't, you can start it before your program proceeds using something like Example 1. Once DragonDictate is running, you have to attach your speech control to the speech recognition engine. You can do this with the Attach property. Then you are ready to begin creating your own vocabularies. You can set the currently active vocabulary and group using the Vocabulary and Group properties. When Dragon recognizes any of the words in your control's group, it sends an event so that you can act on the word.
Although the tools are generally easy to use, I did find a few things I wish had been different. First, DragonDictate doesn't work well under Windows NT. The Dragon web site (http://www.dragonsystems.com/) has a FAQ about this. You can use Dragon and the tools under NT, but the behavior is quirky. Occasionally, you'll be thrown into another window, for example. Worse, during development, the system frequently crashed and hung VB. This seemed to have something to do with breakpoints, so I don't expect it would be a problem in a shipped program, but it sure made writing software a chore.
Another thing I thought odd is the way Dragon handles sleeping. If you are like me, you can't really leave your microphone on all the time for DragonDictate to listen. There are phones ringing, dogs barking, and all manner of other noises in my office. Suppose you are dictating text into a program and the phone rings. You can say "go to sleep," which puts Dragon in a dormant state. It still listens, but it doesn't do anything until you say "wake up." However, DragonDictate still notifies your program when it recognizes any of your words. That means your program has to know whether Dragon is asleep, yet there is no way to ask Dragon. If you want to handle this problem, you have to define your own "go to sleep" and "wake up" commands and do the work yourself. Then you'll still get the events for other words, but your program will know it should be asleep. The Dragon manual has examples of several ways to do this.
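The do-it-yourself sleep tracking described above can be sketched in a few lines of Visual Basic. This is only one possible arrangement, not one of the manual's examples. It assumes that "go to sleep" and "wake up" have been added to the program's own group as plain words, so that they arrive through the same recognition event as everything else. The names mAsleep and ShouldHandleWord are hypothetical.

```vb
' Module-level flag: True while the program considers itself asleep.
Private mAsleep As Boolean

' Returns True if the caller should act on the recognized word.
' Assumption: "go to sleep" and "wake up" were added to the program's
' own group, so they show up here like any other recognized word.
Public Function ShouldHandleWord(ByVal Word As String) As Boolean
    Select Case LCase(Trim(Word))
        Case "go to sleep"
            mAsleep = True
            ShouldHandleWord = False    ' A state change, not a command
        Case "wake up"
            mAsleep = False
            ShouldHandleWord = False
        Case Else
            ShouldHandleWord = Not mAsleep
    End Select
End Function
```

A recognition handler could then start with a line such as If Not ShouldHandleWord(word) Then Exit Sub, so that names spoken while the program is asleep are ignored.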
I've always thought the voice-activated autodialers on some high-end telephones are a great idea. You just speak a name into the phone and it dials away. (And before you ask, yes, you can get a headset that works with Dragon and your phone.) For the first cut, I tried to make the program understand the name I was saying and I figured I would store the phone numbers in a flat file or database.
As I got the speech part working, I realized I really didn't need a database, since I could store the phone number and name along with the speech model. Of course, you don't want to have to say the name and the number to dial on the phone, right? That defeats the whole purpose. However, Dragon lets you specify alternate pronunciations for words, as I'll show you in a bit. Since each user has a separate vocabulary, each user also gets a private phone book.
Figure 1 shows a completed dialer application. You don't need any special code or controls to handle the three buttons -- Dragon takes care of them automatically. But you do need some special work to make the phone automatically dial when you speak a name.
When you say "add" or click the Add button, the program brings up a simple form that lets you make a new entry. The form has two fields, Name and Number, that you can jump to by simply saying the appropriate word. Also, when the Name field has the focus, the program automatically places Dragon in dictation mode. It is impressive how many common names Dragon correctly interprets.
Saying "number" (or clicking the button with that name) brings up an input box. From here you can just say numbers aloud to dial them. Say "okay" when you are done, or "cancel" to abort. Dialing the phone is easy with an MSCOMM ActiveX control. The dialer assumes you have a Hayes-compatible modem on COM1 (although that's easy to change).
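As a rough sketch of the dialing step, the functions below build the command string that is sent to the modem. The "ATV1E0DT" prefix and the trailing carriage return come from Listing One. The filtering step is my own assumption: a dictated or typed number may contain spaces, hyphens, or parentheses, so the sketch keeps only digits, "*", "#", and the comma that Hayes-style modems treat as a pause. The names CleanDialString and BuildDialCommand are hypothetical.

```vb
' Keep only the characters this sketch allows in a dial string:
' digits, * and #, and the comma used as a pause.
Public Function CleanDialString(ByVal Raw As String) As String
    Dim i As Long
    Dim ch As String, result As String
    For i = 1 To Len(Raw)
        ch = Mid(Raw, i, 1)
        If InStr("0123456789*#,", ch) > 0 Then result = result & ch
    Next i
    CleanDialString = result
End Function

' Build the full tone-dial command, ending in a carriage return.
Public Function BuildDialCommand(ByVal Number As String) As String
    BuildDialCommand = "ATV1E0DT" & CleanDialString(Number) & Chr(13)
End Function
```

With these in place, the dialer could assign BuildDialCommand(dn) to MSComm1.Output rather than concatenating the string inline.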
The trick to this program is setting up word recognition. When you add a new name, the program constructs a string to "teach" Dragon. The string consists of the name, a tab character, and the phone number. However, this would be awkward to pronounce, so the program also adds an opening square bracket, the name alone, and a closing bracket. By placing alternate text in brackets, you are telling Dragon that the text between the brackets is the correct pronunciation for the preceding word. The program then feeds this string to the AddWord method of the speech control. If Dragon can't deduce a speech model for the word, AddWord reports the failure (Listing One tests for EXP_ERR_WORD_HAS_NO_MODEL) and the program lets the user train the word in question. Listing One includes this logic (see the Add_Click subroutine).
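The string layout just described is easy to isolate in a helper. The sketch below builds the same name, tab, number, bracketed-pronunciation string that Add_Click assembles inline. MakeDialerWord and IsSafeField are hypothetical names, and the safety check is my own addition: a tab or bracket typed into either field would confuse the format, so the sketch rejects it.

```vb
' True if the field is non-empty and contains none of the characters
' the entry format reserves (tab and square brackets).
Public Function IsSafeField(ByVal Field As String) As Boolean
    IsSafeField = (Len(Field) > 0) And (InStr(Field, Chr(9)) = 0) _
        And (InStr(Field, "[") = 0) And (InStr(Field, "]") = 0)
End Function

' Build "name<Tab>number[name]". The bracketed text is the
' pronunciation Dragon should use for the text that precedes it.
Public Function MakeDialerWord(ByVal PersonName As String, _
                               ByVal Number As String) As String
    MakeDialerWord = PersonName & Chr(9) & Number & _
                     "[" & PersonName & "]"
End Function
```

For example, MakeDialerWord("Pat Smith", "555-0100") produces the text Pat Smith, a tab, and then 555-0100[Pat Smith], so the user only has to say "Pat Smith" to select the entry.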
Interestingly, the listbox holds names in the same format (but no pronunciation in square brackets). This makes it easy to create a single Dial routine that handles a string from the voice recognition or listbox. It also makes it easy to reconstruct the listbox from the vocabulary data on startup (see the Form_Load subroutine in Listing One).
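Because both sources share one format, a single parser can serve the listbox and the recognizer. The sketch below is a more defensive variant of what the Dial routine in Listing One does inline: it returns False instead of raising a run-time error when the tab is missing. ParseEntry is a hypothetical name, and the code sticks to InStr, Left, and Mid.

```vb
' Split "name<Tab>number" or "name<Tab>number[pronunciation]" into
' its parts. Returns False if the tab is missing or the name is empty.
Public Function ParseEntry(ByVal Entry As String, _
                           ByRef PersonName As String, _
                           ByRef Number As String) As Boolean
    Dim tabPos As Long, bracketPos As Long
    ParseEntry = False
    tabPos = InStr(Entry, Chr(9))
    If tabPos < 2 Then Exit Function
    PersonName = Left(Entry, tabPos - 1)
    Number = Mid(Entry, tabPos + 1)
    ' Drop the bracketed pronunciation, if there is one
    bracketPos = InStr(Number, "[")
    If bracketPos > 0 Then Number = Left(Number, bracketPos - 1)
    ParseEntry = True
End Function
```

A listbox entry and a recognized word for the same person yield the same name and number, which is what lets one Dial routine handle both.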
The DDSpeech1_SpeechRecognized routine handles the voice dialing. The only reason there is more than one line of code in this routine is that I wanted to change the listbox selection to reflect the dialed number. Visual feedback is important when you are dealing with voice commands, because voice recognition is not 100 percent accurate. When you delete a name, for example, the program is careful to prompt you before taking action. It might be a good idea to add a similar safeguard to the dialing routine, too.
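One way to add that safeguard is a small confirmation helper, sketched here with the standard MsgBox function in the same style as the delete prompt. ConfirmDial is a hypothetical name and is not part of Listing One.

```vb
' Ask the user to confirm before the modem dials.
' Returns True only if the user chooses Yes.
Public Function ConfirmDial(ByVal PersonName As String, _
                            ByVal Number As String) As Boolean
    ConfirmDial = (MsgBox("Dial " & PersonName & " at " & Number & "?", _
                          vbYesNo + vbQuestion, "Voice Dialer") = vbYes)
End Function
```

The Dial routine could call this once it has separated the name from the number, and exit without opening the port when the answer is No.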
The form used to add names is available online. There, you can find the code that sets Dragon's mode when each text box receives the focus. This allows users to dictate names without having to explicitly set the dictation mode.
Once you have the ability to work with voice commands and dictation, there are many other ways you can make your application more voice friendly. For example, by using the SetHomeGroup script command, you could restrict the phone-number fields to accept only words that make sense for phone numbers. You can also use the DgnTTS control to convert words back to voice (although for simple uses, you might be better off just playing prerecorded wave files).
Although DragonXTools has some problems (poor NT compatibility and a difficult-to-manage sleep mode), it is exciting to watch a program respond to spoken words. Probably the biggest disadvantage is that users must already have one of the Dragon products that provide the actual speech processing. Of course, if you are building a dedicated system, or you are willing to license the product from Dragon, this may not be a problem. Just try to resist the urge to speak into your mouse.
DDJ
Listing One
VERSION 5.00
Object = "{C9F1DD69-49F9-11D0-B5C5-444553540000}#1.0#0"; "dd32.ocx"
Object = "{648A5603-2C6E-101B-82B6-000000000014}#1.1#0"; "MSCOMM32.OCX"
Begin VB.Form MainForm
   Caption = "Voice Dialer"
   ClientHeight = 3195
   ClientLeft = 60
   ClientTop = 345
   ClientWidth = 4680
   LinkTopic = "Form1"
   ScaleHeight = 3195
   ScaleWidth = 4680
   StartUpPosition = 3 'Windows Default
   Begin VB.CommandButton ManDial
      Caption = "Number"
      Height = 495
      Left = 120
      TabIndex = 3
      Top = 1560
      Width = 975
   End
   Begin MSCommLib.MSComm MSComm1
      Left = 720
      Top = 2520
      _ExtentX = 1005
      _ExtentY = 1005
      _Version = 327680
      DTREnable = 0 'False
   End
   Begin VB.CommandButton Delete
      Caption = "Remove"
      Height = 495
      Left = 120
      TabIndex = 2
      Top = 840
      Width = 975
   End
   Begin VB.CommandButton Add
      Caption = "Add"
      Height = 495
      Left = 120
      TabIndex = 1
      Top = 120
      Width = 975
   End
   Begin VB.ListBox List1
      Height = 2790
      Left = 1320
      Sorted = -1 'True
      TabIndex = 0
      Top = 120
      Width = 3135
   End
   Begin DDSpeechLib.DDSpeech DDSpeech1
      Left = 120
      Top = 2640
      _Version = 65536
      _ExtentX = 741
      _ExtentY = 741
      _StockProps = 0
   End
End
Attribute VB_Name = "MainForm"
Attribute VB_GlobalNameSpace = False
Attribute VB_Creatable = False
Attribute VB_PredeclaredId = True
Attribute VB_Exposed = False
Option Explicit
Private Sub Add_Click()
    ' Name, number, list entry, and vocabulary word
    Dim n As String, nm As String, s As String, word As String
    AddForm.Show vbModal
    If AddForm.Cancelled <> True Then
        n = AddForm.NewName
        nm = AddForm.NewNumber
        s = n & Chr(9) & nm             ' List entry: name, tab, number
        List1.AddItem s
        word = s & "[" & n & "]"        ' Bracketed text is the pronunciation
        ' If Dragon cannot build a speech model, let the user train the word
        If DDSpeech1.AddWord("PhBook", "TelNum", word, "'") _
            = EXP_ERR_WORD_HAS_NO_MODEL Then
            DDSpeech1.TrainWord = word
        End If
        Unload AddForm
    End If
End Sub
' Dial a number in the format of name (tab) number [xxx]
' The brackets, if present at all, are ignored
Sub Dial(ByVal word As String)
    Dim n As Integer
    Dim t0 As Date
    Dim dn As String, nam As String ' Dial number, name
    n = InStr(word, Chr(9))
    dn = Right(word, Len(word) - n)
    nam = Left(word, n - 1)
    n = InStr(dn, "[")
    If n <> 0 Then dn = Left(dn, n - 1)
    MSComm1.PortOpen = True
    MSComm1.Output = "ATV1E0DT" & dn & Chr(13)
    t0 = DateAdd("s", 5, Now)
    Do
        DoEvents
    Loop Until Now > t0 ' Wait 5 seconds
    MSComm1.PortOpen = False
    MsgBox dn, vbOKOnly, "Dialed " & nam
End Sub
' Delete entry
Private Sub Delete_Click()
    Dim n As Integer
    Dim word As String, nam As String
    n = List1.ListIndex
    If n <> -1 Then
        If MsgBox("Delete this entry?", vbYesNo) = vbNo Then Exit Sub
        ' Rebuild the vocabulary word: list entry plus [pronunciation]
        word = List1.Text
        nam = Left(word, InStr(word, Chr(9)) - 1)
        word = word & "[" & nam & "]"
        ' Delete word from Dragon dictionary
        If DDSpeech1.DeleteWord("PhBook", "TelNum", word) Then
            List1.RemoveItem n
        Else
            MsgBox "Can't remove name"
        End If
    Else
        MsgBox "Please select a name first"
    End If
End Sub
' Manually dial a number
Private Sub ManDial_Click()
    Dim nr As String
    nr = InputBox("Enter or say the number to dial")
    If nr <> "" Then Dial ("Manual Dial" & Chr(9) & nr)
End Sub
Private Sub DDSpeech1_SpeechRecognized(word As String, WordValue As String)
    Dim SearchWord As String
    Dim i As Integer
    ' Find string in listbox so we can highlight it
    SearchWord = Left(word, InStr(word, "[") - 1)
    List1.ListIndex = -1
    For i = 0 To List1.ListCount - 1
        If SearchWord = List1.List(i) Then
            List1.ListIndex = i
            Exit For
        End If
    Next i
    Dial word ' Do it
End Sub
Private Sub Form_Load()
    Dim s As String
    Dim n As Integer
    ' Start Dragon if not already started
    If Not IsDDWinRunning() Then
        If Not StartDDWin() Then
            MsgBox "Can't start Dragon Dictate", vbExclamation
            End
        End If
    End If
    ' Attach to the engine and select the phone book vocabulary and group
    DDSpeech1.Attach = True
    DDSpeech1.AddVocabulary "PhBook"
    DDSpeech1.AddGroup "PhBook", "TelNum"
    DDSpeech1.Vocabulary = "PhBook"
    DDSpeech1.Group = "TelNum"
    ' Load phone numbers already in vocabulary
    s = DDSpeech1.WordFirst
    Do While s <> ""
        n = InStr(s, "[")
        List1.AddItem (Left(s, n - 1))
        s = DDSpeech1.WordNext
    Loop
End Sub
' Double click for those who are speechless!
Private Sub List1_DblClick()
    Dial List1.Text
End Sub