Review of L&H Voice Xpress
Dateline: May 17, 1998
I'm back! Did you miss me, or shouldn't I ask?
Fact is, I suffered a two-week bout of what afflicts all writers (I believe) from time to time: burnout. Got fed up trying to find something new and useful to say or report, so instead went out and chopped a couple of half-dead trees, built a brick mailbox, and suchlike soul-salving stuff. Turned out to be good for my frozen shoulders, too, but that's another story.
Halfway through last week, the UPS truck (successfully avoiding my brick mailbox, which that august and imperious body, the Clinton County Road Commission, has since ordered me to remove because there's an ordinance prohibiting brick mailboxes, for Heaven's sake) trundled up my drive and deposited a package from Lernout & Hauspie (L&H). It was a review copy of L&H's just-released automatic speech recognition (ASR) program called Voice Xpress Plus, US$99 and available all over the place.
L&H, you may recall, is the Belgian ASR company of which Microsoft bought a slice (8 percent, if memory serves) for about US$45 million, or the small change in Bill's pocket, a year or so ago. L&H had just purchased Kurzweil, one of the originators and true innovators in the ASR field, and continues to sell specialized dictation products under the Kurzweil name.
I've previously expressed the fear that Microsoft will see a voice-driven computer interface as a threat to Windows' dominance of my desktop and yours, just as it saw a similar threat from Netscape's browser. Microsoft Research's alleged foray into ASR evidently led to nothing innovative, just a decision that they'd be better off buying than building the stuff themselves. By buying into L&H, not to mention by just being Microsoft, Microsoft will essentially control that company, and in my view will at some point seek to incorporate L&H's ASR algorithms directly into Microsoft's operating systems. This is the one fundamental reason why Microsoft so adamantly and vehemently opposes the U.S. federal and state justice system, which would stop Microsoft from integrating products into its operating system.
Integrating ASR with Windows is what I would also want to do if I were adamant about controlling every desktop in the world and did not, at heart, give a fig for innovation. But that would make it harder for Dragon Systems and IBM, L&H's two major competitors in the ASR arena, to compete. And with Dragon and IBM out of the ASR picture, Microsoft would no longer have any motivation to spend heavily on improving the ASR. Indeed, it would have more incentive to curtail innovation, thereby cutting costs, thereby increasing profits.
But all this is speculation, and for the present it does not detract from the excellent work L&H and their Kurzweil colleagues have done with Voice Xpress and other products. From my limited experience with it so far, Voice Xpress promises to be at least as good as Dragon's NaturallySpeaking and IBM's ViaVoice. Unless and until those two programs encounter unfair competition, the end user can expect to benefit as all three go head to head to continually make their products better than the others.
Having said this much, I am going to have to leave you dangling for another week for a review of the program's performance under fire. It took me a couple of days to install the program properly, so I just didn't have time to give it the workout it needs for a thorough and valid review. What I will do in the remainder of this "Review Part 1" is list the program's features as claimed by L&H and describe my experiences during installation.
Features
Below is what L&H claims for its new product. My comments follow each claim.
You can create text, format, and edit documents all by voice and all directly within Microsoft Word – no need to cut and paste. If you don't use Microsoft Word, you can use the L&H Voice Xpress word processor to create documents and then simply copy and paste the text into your favorite Windows application.
This is all accurate.
Natural Language Technology – Our unique Natural Language Technology lets you "Say It Your Way," enabling L&H Voice Xpress Plus to interpret your navigation, formatting and editing commands … making L&H Voice Xpress Plus easy to learn and more powerful than other voice programs.
I have indeed found it easier to issue commands in Voice Xpress than I did in Dragon NaturallySpeaking, which has a more rigid and more limited set of commands.
Continuous Speech Technology – Lets you create text by dictating in a natural, conversational manner. No need to pause between words, so you can "type" up to 140 words per minute.
This is the same as NaturallySpeaking and ViaVoice.
Create Entire Documents By Voice – When using L&H Voice Xpress Plus in Microsoft Word or in the L&H Voice Xpress word processor, you don't have to use your hands at all. Use your voice to quickly navigate the application menus and dialog boxes, or integrate keyboard and mouse with verbal control to maximize your efficiency.
This is accurate, and darned nice. NaturallySpeaking's latest version (2.02) also has this capability; ViaVoice does not.
Outstanding Accuracy – L&H Voice Xpress Plus understands you without any training, and over time, L&H Voice Xpress Plus can automatically adapt to your voice, boosting ongoing accuracy up to 95% or higher. L&H Voice Xpress Plus even offers unique speech profiles developed to boost the accuracy of teen-agers and children.
The competition claims identical accuracy, and they are probably right. From my limited experience so far, it seems as though Voice Xpress is quicker to learn my voice patterns than NaturallySpeaking was, so it should get to the 95 percent level a bit faster.
Large, Customizable Vocabulary – L&H Voice Xpress Plus will understand you because it has a 30,000-word vocabulary that contains the words you use every day. Additionally you can add up to 30,000 words or phrases that are specific to your work, such as people's names, acronyms, and industry-specific terms for a total vocabulary of 60,000 words. You can even use L&H Voice Xpress Plus to scan documents on your PC for words you want to add to the L&H Voice Xpress Plus vocabulary. So easy!
Same as the competition.
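For the curious, here is roughly what "scanning a document for words to add" could look like. This is a minimal sketch of the general idea only, not L&H's actual method: the function name `find_new_words`, the tiny base vocabulary, and the way words are split are all my own assumptions. The 30,000 cap simply mirrors the custom-vocabulary limit L&H quotes.

```python
import re

def find_new_words(document_text, known_vocabulary, max_custom=30_000):
    """Return words in the text that are not already in the vocabulary.

    Hypothetical helper for illustration; a real ASR product would also
    need pronunciations for each new word, which this sketch ignores.
    """
    known = {w.lower() for w in known_vocabulary}
    seen = set()
    candidates = []
    # Treat runs of letters (plus ', & and -) as words, so "L&H" stays whole.
    for word in re.findall(r"[A-Za-z][A-Za-z'&-]*", document_text):
        key = word.lower()
        if key not in known and key not in seen:
            seen.add(key)
            candidates.append(word)
    # Respect the cap on user-added entries.
    return candidates[:max_custom]

# A toy base vocabulary standing in for the 30,000 everyday words.
base_vocab = ["the", "sent", "a", "review", "copy", "of"]
text = "L&H sent a review copy of Voice Xpress. The Kurzweil name lives on."
print(find_new_words(text, base_vocab))
```

The point of the sketch is that names and jargon ("Kurzweil," "Xpress") are exactly the words an everyday vocabulary lacks, which is why a scan of your own documents is a sensible way to find them.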
Ability to Add Dictation SmartText – You can automate common tasks by creating a voice macro that inserts a complete block of often-used text.
Not tried it yet, but it sure sounds useful.
Text-To-Speech – You can hear your documents read back to you, making them easier to edit.
This works, and it's useful. It's like having a stenographer read back her (come on, most stenographers are women) shorthand of your dictation so you can check she got it right.
Network Support – Install L&H Voice Xpress Plus on a network server and you can use L&H Voice Xpress Plus to create documents on any network client. If you are a systems administrator and you need to back up files, you need back up only the server.
Not tested, but again this would be a very good feature to have if you have a network.
Support for Multiple Users – If several people share the same PC, they can all use L&H Voice Xpress Plus to improve their productivity.
Yep, this is true. They can't all use it at the same time, of course, and each user must train Voice Xpress to recognize his or her unique speech pattern.
Natural Speech for Numbers, Dates, Dollar Amounts – With L&H Voice Xpress Plus, you not only dictate words in a natural manner, but you can also enter numbers, dates, and dollar amounts in your natural speech. For example, you say, "three thousand and four dollars" and L&H Voice Xpress Plus types "$3,004."
This is true and useful also. NaturallySpeaking and ViaVoice are less flexible.
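To give a feel for what turning "three thousand and four dollars" into "$3,004" involves, here is a toy sketch. It is my own illustration and certainly not how Voice Xpress does it internally: the function names are hypothetical, and it handles only simple whole-dollar English phrases up to the millions.

```python
# Toy converter from spoken number words to a formatted dollar amount.
# Illustrative assumption only; not L&H's algorithm.

UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}
SCALES = {"thousand": 1_000, "million": 1_000_000}

def words_to_number(phrase):
    """Convert e.g. 'three thousand and four' to 3004."""
    total = 0    # sum of completed groups (thousands, millions)
    current = 0  # the group currently being built
    for word in phrase.lower().replace("-", " ").split():
        if word == "and":
            continue  # "and" carries no value in spoken numbers
        if word in UNITS:
            current += UNITS[word]
        elif word in TENS:
            current += TENS[word]
        elif word == "hundred":
            current = max(current, 1) * 100  # bare "hundred" means 100
        elif word in SCALES:
            total += max(current, 1) * SCALES[word]
            current = 0
        else:
            raise ValueError(f"unrecognized number word: {word!r}")
    return total + current

def spoken_dollars_to_text(phrase):
    """Convert 'three thousand and four dollars' to '$3,004'."""
    words = phrase.lower().split()
    if words and words[-1] in ("dollar", "dollars"):
        words = words[:-1]
    return f"${words_to_number(' '.join(words)):,}"

print(spoken_dollars_to_text("three thousand and four dollars"))  # $3,004
```

The hard part in a real product is upstream of this: deciding from the audio that you said a number at all, rather than the words "for" or "to."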
No Initial Training Required – You can boost your productivity immediately by using L&H Voice Xpress Plus right out of the box.
Given the context (ASR), this is ambiguous. Does it mean the user does not need training in the use of the program, or that the program does not require training to the user's voice? If the former, it is correct; if the latter, it's true that initial training is not "required" in the sense of being obligatory, but it IS required in the sense of being necessary for meaningful work. Without "enrollment" the system will not correctly interpret much of what you say.
Installation Blues
I said earlier it took me a couple of days to install. Don't be alarmed. I had a system setup situation which most of you probably won't have, and this is what caused me grief. In the first place, I run Windows NT Workstation 4.0, not the crash-prone Windows 95 most of you will be afflicted with. (If you're thinking of upgrading to Windows 98 if and when it ever gets released, take my advice: don't! Switch to Windows NT Workstation instead.) Running Voice Xpress on NT vs. 95 should not make a difference, since Voice Xpress is designed for both operating systems, but who knows . . . .
In the second place—and this is where L&H's support staff suspect my problems lay—I had installed both Microsoft Office 95 and Office 97 on my machine. I can't remember if, when upgrading to Office 97, I uninstalled Office 95. I don't think I did—I think I assumed the upgrade would take care of eliminating the old stuff; but anyway, on going through the contents of my hard disk and the Registry there were still bits of the old version of Office lying around.
But none of this was apparent or suspected to start with. I inserted the Voice Xpress CD-ROM and installed the program with ease. A warning message recommended downloading and installing a set of (free) bug fixes known as Service Release 1 for Microsoft Word 97 from Microsoft's Web site, saying that it would improve Voice Xpress's performance. Since it was a recommendation and not an imperative, I ignored it, and ran the program as soon as the installation was finished and I had rebooted the machine.
(Later on, in trying to solve the problem I'm about to describe, I went ahead and downloaded Service Release 1 for Office 97 (of which Word is a component)—about 2 megs or 20 minutes over the miserably slow and archaic GTE phone line and switch in my area, only to find that in order to install it, I also had to install a 20 meg bug patch for NT 4.0 called Service Patch 3. Sigh. I left the modem to download overnight and went to bed. The next day I had some serious spring cleaning to do on my C drive partition (which IBM in its peculiar notion of wisdom had set at a measly 500 megs out of 4 gigabytes available on the disk) in order just to be able to install these patches, but my machine is now bang up to date as far as NT and Office are concerned. But back to our story . . . )
The Voice Xpress Plus program group contains two ways of invoking the program: either with Microsoft Word or with the built-in XpressPad mini word processor. I went straight for the MS Word option, and while MS Word loaded OK, there was no sign of Voice Xpress. So I shut down Word and tried the XpressPad option. This worked fine: the mini word processor loaded, then Voice Xpress loaded on top of it, leaving its own menu bar at the top of the screen with controls for turning the microphone on and off, etc.
I went through a short and easy routine for calibrating my sound system and the (very light and easy to wear) headset microphone bundled in with Voice Xpress. I then accepted the program's offer to take me on an excellently done five-minute tour of Voice Xpress's features, followed by an interactive training session that quickly gets one up to speed on using commands to format text (bold this, italicize that, delete the previous paragraph, move the first paragraph to the end of the document, etc.).
Even without training to my voice, Voice Xpress was very good at recognizing formatting commands, though plain text dictation was messy, as one would expect before specific voice training (or what L&H call "enrollment"). One of the neat things about Voice Xpress versus the version (1.0) of Dragon NaturallySpeaking I reviewed a few months ago is that Voice Xpress lets multiple users use it (one at a time, of course; you can't just switch speakers in mid-session.) Each user has to go through the "enrollment" (training) process, and a separate profile of speech patterns is stored on disk for each named user. So after Mary has finished using it and shuts down the program, John can fire it up, select his own profile, and dictate to his heart's desire. (The latest versions of NaturallySpeaking and ViaVoice also have multi-user capability.)
The Voice Xpress enrollment process is very similar to that of NaturallySpeaking. It's easy to use, and takes about an hour. Whereas Dragon gives you the choice of a couple of long passages from books, L&H supply a whole bunch of witticisms, proverbs, and aphorisms that are actually quite fun to read.
So far, so good. The next step was to figure out how to make Voice Xpress work with Microsoft Word, as advertised. I tried loading Word after loading XpressPad, but Voice Xpress would still not work with Word, only with XpressPad. I called the L&H Help Desk. It took several longish (toll free) phone calls, with only one longish (maybe two minutes) wait, to get to the root of the problem.
The result of having bits of the old Microsoft Office 95 lying around on my hard disk was that Voice Xpress apparently got confused about where to copy the files needed to make it work together with Microsoft Word 97 (part of Office 97), and it simply did not copy them, nor did it give me any error message.
Before discovering this problem and its fix, however, we went through all sorts of gyrations. First, it was thought the problem might have been caused by my installing the program to the D partition on my hard drive, while the MS Word files were on the C partition. I had put Voice Xpress on D because it needs about 130 megs of disk space and my C partition only had about 50 megs free. Even clearing out old, unused files left only 90 megs free; still not enough.
L&H suggested moving Word over to D, so it would share the same partition with Voice Xpress. Tried that, and uninstalled/re-installed Voice Xpress from scratch—which meant I lost the file containing my voice patterns, painstakingly built through an hour of enrollment, and would have to go through it again.
But it still didn't work, and it was only when Paul and Jason, the two L&H support technicians I talked with, asked me to check for certain files that we discovered them missing from the crucial directory. Suddenly the problem became very simple, and the fix very fast: copy three small files from one directory to another. Bingo. Everything worked great.
L&H's director of software development is onto the case, and you can be sure that: 1. The product really does work as advertised; 2. The installation problem I had was relatively obscure and is unlikely to affect most people; 3. L&H will fix the installation routine so people who've upgraded their versions of MS Word or MS Office won't encounter a problem at all in future releases; and 4. Until that fix is out, L&H technical support will know exactly what to tell you to do if you happen to call with this particular problem, and it'll only take two minutes to fix, not two days.
The only problem I have now is that the microphone recording volume seems overly sensitive, even though I have calibrated the volume almost to zero. The mic picks up all sorts of extraneous noises, like my dog padding along the wood and tile floor above me (I work in a basement home office, and my pitbull has long nails she won't let us clip—you don't argue with a pitbull—so she sounds a bit like a shoed horse walking on cobblestones.) The mic hears this clip-clopping-along and Voice Xpress tries to interpret it, and while it is interpreting I cannot interrupt it, not even to turn the microphone off. So sometimes, when the dog is restless (i.e., it's suppertime), I seem to get stuck in the program, and have to invoke CTRL-ALT-DEL to shut it down—in which case I lose any training that has taken place since I started the session.
Like all ASR programs, Voice Xpress has to be finicky about microphone and sound card. When I reviewed Dragon NaturallySpeaking, I had to go and buy a Sound Blaster audio card because the el-cheapo card that came bundled with my old Compaq Presario gave lousy results. The Crystal sound card that came bundled with my IBM 300XL worked OK with NaturallySpeaking, but not quite so well with Voice Xpress. I have calibrated the mic several times, and each time I get a "better than average" rating from the Voice Xpress utility program, so it's hard to figure out what to do.
It's neither surprising nor unreasonable that ASR is finicky about audio recording capability. ASR is a very demanding task, and ASR programs need sharp ears: sharp in the sense of being able to distinguish between sounds intended for them and sounds that are purely extraneous. The microphone and sound card constitute the program's ears. One can only hope that the major computer manufacturers (and they certainly include Compaq and IBM, whose machines I own) will soon get the message that programs like Voice Xpress require and create a demand for good audio recording circuits, not just good playback circuits. All sound cards today have pretty good playback circuitry. Playing back sounds and music has been all the typical PC user wanted out of their sound system, and few people have used their PC's recording capabilities. That's about to change, and equipment manufacturers need to get the message.
I'll be talking more with the good folks at L&H about my sound problem, and I'll doubtless be doing more tweaking, so I hope to report progress next week, along with my detailed report of the results of using the program: its accuracy, speed of learning, and so on. Perhaps we'll have to buy slippers for the dog.
In the meantime, I hope I have whetted your appetite with my first impressions, after only a few hours working with Voice Xpress. In terms of recognizing my speech, the program is as good as Dragon NaturallySpeaking was when I first ran it, and it may actually be a little better and a little faster to learn and adapt to my speech pattern.
Where it shines right out of the box is in formatting and editing text. An AI sub-program is trained to recognize a variety of natural language commands that let you control Word (and XpressPad), and it works remarkably well. Editing is inevitable with all ASR programs, because none can achieve 100 percent accuracy in recognizing what you say, so the easier it is to edit the better. I found it a good deal easier and more natural to move around my documents and make edits with Voice Xpress than with NaturallySpeaking.
Not only is Voice Xpress more tolerant of different ways I might say the same thing, but it can handle all the commands available as menu options in Word. So instead of having to go through several mouse clicks to create a table in a document, I simply say "Insert a 3 by 4 table here," and . . .
Voilà!
Don't miss next week's gripping conclusion.
Until next week,
NEXT WEEK: Part 2 of the above review.