Review of L&H Voice Xpress (Part 2)

Dateline: May 24, 1998

In my elation at finally getting Voice Xpress to work (see last week's article), I paid little attention to the fact that when I rebooted my PC after finally getting through setup successfully, Microsoft Word and Voice Xpress loaded automatically. This would be OK if Word were the main program I use, but it isn't. Netscape Communicator is, and I also use other programs quite a bit. So when my machine fires up, I want to see my highly customized and personalized desktop and Office toolbar, not the Microsoft Word screen.

I initially paid little heed because I thought it would be easy to fix. Must be in the Startup folder, I thought, but no: I looked and it's not there. Maybe Office 97 is doing it? If so, I can't see where. I've looked everywhere (in Control Panel/System, in Startup, in Office, and in Voice Xpress and Word themselves), and darned if I can figure out how Word and Voice Xpress come to be autorun on bootup or, more important, how to turn the autorun off.

Anyway, on with . . .

The Review

To put Voice Xpress through its paces, I began by dictating from a handwritten draft synopsis of my book manuscript, without pausing to edit mistakes. Next, I went back through it to correct misinterpretations. Then I dictated the piece again, corrected that, and repeated the process two more times, before the process got totally tedious.

At each iteration, there was a noticeable improvement in recognition accuracy, and by the fourth and final iteration (and having overcome or sidestepped some of the problems I encountered in using the program, which I discuss below), the program was performing well enough to be a faster alternative—even with the time needed to edit misrecognitions—to typing text at the keyboard. And that, essentially, is the real benefit of ASR (automatic speech recognition) programs for those who can type. For those who can't type at all, the benefit is even greater.

I have captured the difference between the first and fourth iterations and highlighted the errors and their corrections, and I urge you to look at these as they are really the meat of this review. I have made them separate documents rather than inserting them here simply to keep the size of this file down, so it will load faster for you. You should bear in mind also that the improvement in recognition took place after a relatively short learning period, leading me to conclude that if I continue using the program regularly it will become very good indeed in terms of recognition accuracy within a month or so.

Recognition accuracy is the most important aspect of any ASR program, but it's also important that the program be easy to use and free of bugs. The following sections describe problems I encountered in using the software. If I appear to be tough on Voice Xpress, it's only because I think it's a good program with the potential to be even better.

Using Voice Xpress

During dictation or subsequent editing, I had some difficulty selecting text to edit. For example, saying "Select into!" ("Into" was a misinterpretation for "In" as the first word on the first line) sent the cursor halfway down the document to highlight another instance of "into," even though my cursor was set at the very beginning of the document. I tried it three times with the same result, and finally gave up, deciding instead to use the mouse to select text—it was quicker, but it defeated the object. Most of the time, the program does select the text you want, but it's not 100 percent.

At one point when I went to correct a misinterpreted word (I had said "further" and it typed "for the") I noticed an option called Pin on the corrections menu. So I selected the Help icon (a question mark) on that menu to find out what Pin was. I got an error message: "corrections.HLP" could not be found, and did I want to look for it? Yes, I did, and found it in the "Speechcenter" subdirectory within the Voice Xpress program directory. Seems like another instance of files not being copied to where they are supposed to be copied during installation and setup (see last week's article).

After all that, the Pin option turned out to be cosmetic and quite unimportant anyway.

The correction method used by Voice Xpress is either not working as it should, or it is not as clever as Dragon NaturallySpeaking's correction method. In the latter, selecting a misinterpreted word and saying "Correct that!" instantly produces a list of similar-sounding words likely to contain the word you want. You can then select the word from the list by saying "Select [the number next to the appropriate word]!" If the word you want is not in the list, as you begin to type it in, NaturallySpeaking updates the word list with words beginning with the letters you type, so you may only have to type a few letters before seeing the word you want.
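The narrowing behavior described above can be sketched in a few lines of Python. This is only an illustration of prefix filtering over a numbered candidate list, not Dragon's actual implementation; the function name `narrow_candidates`, the sample vocabulary, and the numbering scheme are all assumptions made for the example.

```python
def narrow_candidates(vocabulary, typed_prefix, limit=10):
    """Return up to `limit` (number, word) pairs whose word starts with typed_prefix.

    Hypothetical helper: it only illustrates prefix filtering and says nothing
    about how NaturallySpeaking ranks or stores its word list.
    """
    prefix = typed_prefix.lower()
    matches = [word for word in vocabulary if word.lower().startswith(prefix)]
    # Number the surviving words from 1, as a spoken "Select 2!" would expect.
    return list(enumerate(matches[:limit], start=1))


# Made-up vocabulary, for illustration only.
SAMPLE_VOCABULARY = ["for the", "further", "farther", "father", "furthest", "furnace"]

# Each extra letter typed shrinks the list of numbered choices.
for typed in ("f", "fu", "fur", "furt"):
    print(typed, narrow_candidates(SAMPLE_VOCABULARY, typed))
```

With this made-up vocabulary, typing "furt" leaves only "further" and "furthest" on the list, which is the effect that saves typing the whole word.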

Voice Xpress only gave me the option of typing in the word I wanted to substitute for the mis-heard word. It did not make suggestions.

In both programs, after you make a correction, a Train option lets you teach the program how you pronounce the corrected word or phrase. NaturallySpeaking is smarter: it asks you to speak both the misinterpreted word and the correct word, so it gets better at recognizing the difference in the way you say them. Voice Xpress only asks you to speak the correct word, so (unless I am missing something here, or unless this is another bug in Voice Xpress) it is not going to learn the nuances in your pronunciation as well or as quickly.

The Em Dash Debacle

This is not a big deal unless—like me—you like to use the Em dash in your writing. If you do, it turns out to be quite a big deal with Voice Xpress; so big that I have written a separate report about it (to save space and to avoid boring those who don't care a fig about Em dashes).

A Memory Hog

When (and only when) running Voice Xpress I keep getting messages from Windows NT, some critical, some non-critical, that my system memory is running low. I already had 125MB of disk space reserved for virtual memory, but ended up adding another 200MB (on the D drive, since my C drive has only about 50MB free). This seems to have reduced the frequency of the memory messages, but has not eliminated them entirely. They re-surfaced when I tried to open and edit my book manuscript, a 722KB (and growing) Word file.

I have now determined that the best way to use Voice Xpress on my machine, a 266MHz Pentium II with 64MB of RAM, is to shut down every single program including the Microsoft Office toolbar and memory-resident stuff such as the AltaVista personal indexer/search engine and RealAudio, so there is nothing running except Windows NT, Microsoft Word, and Voice Xpress.

Word itself is often slow to respond to voice and keystroke commands with Voice Xpress loaded, but it returns to its usual quick responses after Voice Xpress is closed. All of this suggests that either L&H programmers have some work to do to tighten up the program code and manage memory better, or a 400MHz Pentium II with several hundred megabytes of RAM would be nice.

This is not really a criticism of Voice Xpress. All ASR programs are memory hogs. They need to keep tens of thousands of words in quick-access memory at all times, plus your speech patterns for those words. The processor, too, has its work cut out in continually scanning for voice or keyboard input and then matching the input against the words and speech patterns in memory. It's just a huge computing task, and I for one am grateful that the programs work as well as they do on the current generation of standard PCs.

(It's about time our desktop machines came with multiple processors. Pentiums can be joined to work in parallel, and parallel processing could give substantially better performance on a task like this. I'm not sure what the holdup is, but I suspect it's that most programmers don't know how to write code to take advantage of parallel architectures. Offer them more pizza.)
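How much extra processors would actually help depends on how much of the work can run in parallel. A rough way to estimate it is Amdahl's law, speedup = 1 / ((1 - p) + p / n), where p is the parallel fraction of the work and n is the number of processors. The sketch below is only a back-of-envelope illustration: the 90 percent parallel fraction is an assumption, not a measurement of Voice Xpress or any other ASR program, and the function name is made up for the example.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Theoretical speedup from Amdahl's law: 1 / ((1 - p) + p / n)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


# Assumed, not measured: 90 percent of the recognition work can run in parallel.
ASSUMED_PARALLEL_FRACTION = 0.9

# Print the estimated speedup for a few processor counts.
for n in (1, 2, 4, 8):
    print(n, round(amdahl_speedup(ASSUMED_PARALLEL_FRACTION, n), 2))
```

Under that assumption the gain flattens out below 10x however many processors are added, so the serial part of the program matters as much as the processor count.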

The Bottom Line

Voice Xpress is a good program. It works pretty much as advertised, apart from some installation and usage bugs. These are annoying and time-consuming at first, but until L&H fixes them they can be overcome or avoided if you know about them, which is why I have documented them here and in the previous article. The program is as good as NaturallySpeaking v. 1.0 at recognizing my speech and is reasonably quick to learn the quirks of my much-adulterated northern English accent. Its ability to control Microsoft Word is really neat; the latest version of NaturallySpeaking can do that too, but I have not tested that version.

If you are interested in buying automatic speech recognition software and do not already own any, you will not be disappointed by either Voice Xpress or Dragon's NaturallySpeaking. (IBM's ViaVoice may be another contender, but I have been unable to get a review copy so cannot comment.) The first version of NaturallySpeaking had a recommended selling price of $699, but the street price was half that or less. The current "DeLuxe" version is still listed at that ridiculous price (see Dragon's Web site). If Dragon doesn't do something about that, it will lose a lot of market share to the $99 Voice Xpress Plus, which is just about as good a program.

And market share may prove to be very important for ASR products, at least until genuinely speaker-independent software—that can recognize anyone's speech accurately, without training—arrives, in a few years' time. (Dragon needs to take note that Microsoft, which owns a slice of L&H, takes a fanatical interest in market share.) In the meantime, those who have already purchased one of the three competing packages (Voice Xpress, NaturallySpeaking, ViaVoice) are unlikely to switch for the simple reason that, more than most other kinds of software, ASR is by nature intensely personal.

It takes time for you and the program to get to know one another. More time than the hour of initial training or "enrollment." Having spent time reviewing and getting to know NaturallySpeaking several months ago, I found it a wrench to switch over to Voice Xpress, simply because after a few months' use NaturallySpeaking and I were starting to get along pretty well.

One thing this present review has proved to my satisfaction, however, is that Voice Xpress and I could have at least as good a relationship within a fairly short time.
 
 

Until next week,

 

NEXT WEEK: Analog computing. What is it, and will it be necessary for the creation or emergence of a truly intelligent machine?
