[Disclaimer: In the interest of full disclosure, I work in speech recognition for SpeechWorks International. You may enjoy testing the SpeechWorks Demo Line at 1.888.SAY.DEMO (1.888.729.3366).]
A number of users have posted comments questioning the benefit of speech recognition operating on a PDA. Before examining this topic, I would like to quickly review the technical state of the art (in an effort to satisfy your inner geek).
The Technology
Speech recognition operates by performing statistical matches of incoming sound against familiar words or phonemes (i.e., individual sounds; a word is composed of one or more phonemes). Traditionally, embedded speech recognition systems have featured small vocabularies (i.e., a limited set of recognizable phrases), but advances in processor speed are allowing larger and more complex vocabularies.
For applications in the telephony industry, a system running on a 500 MHz Pentium may support up to 50 lines and ten languages. These systems use a combination of directed dialog (asking for specific pieces of information - "what is your account number?") and limited natural language ("I'd like to fly from Boston to San Jose").
Returning to the PDA market, the task is to recognize a single user operating in a single language. This greatly reduces the memory and processor requirements. Further tradeoffs are possible by adapting to the speech patterns of a single user, as frequently occurs in dictation systems. But, as we will discuss below, the vocabularies are much more complex and word-spotting becomes vital.
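To make "statistical matches" a little more concrete, here is a toy sketch in Python. It assumes each phoneme is modeled by a single one-dimensional Gaussian over some acoustic measurement; the model values and the names `PHONEME_MODELS`, `log_likelihood` and `best_phoneme` are made up for illustration. Real recognizers use far richer features and models, so treat this only as the general idea of picking the most likely match.

```python
import math

# Hypothetical acoustic models: (mean, standard deviation) of a single
# made-up feature value for each phoneme. Not taken from any real system.
PHONEME_MODELS = {"ah": (1.0, 0.5), "ee": (3.0, 0.5), "s": (6.0, 1.0)}


def log_likelihood(x, mean, std):
    """Log of the Gaussian density N(mean, std^2) evaluated at x."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)


def best_phoneme(x, models=PHONEME_MODELS):
    """Return the phoneme whose model gives the observation x the highest likelihood."""
    return max(models, key=lambda p: log_likelihood(x, *models[p]))


if __name__ == "__main__":
    for observation in (1.2, 2.9, 5.5):
        print(observation, "->", best_phoneme(observation))
```

A larger vocabulary means more models to score for every slice of sound, which is one reason processor speed limits vocabulary size on embedded devices.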
Speech on a PDA
For simple tasks like navigation, the point-and-click interface works great. You get immediate feedback from the screen and you may peruse a page of information at a time. A speech-based interface, in contrast, is more serial than parallel. If you are walking through a list of 50 items, your eyes will locate the correct item far faster than if the list is being read aloud. Likewise, speech recognition will not replace the keyboard for data entry. It is, however, a valuable supplement which allows the user to jump to information not readily visible. While you're composing an email on the Palm Pilot, for instance, saying "Tell me the birthday of Jim Bob Jones" may be faster than navigating there yourself. Likewise, if you're navigating through a database of 20k companies, it may be easier to just say "Yoyodyne Propulsion Systems".
To make speech recognition useful on a PDA, the vocabularies must directly relate to the installed applications and information. Complex navigation using true natural language is a difficult and very much unsolved recognition task. But speech recognition on a PDA is even harder. Why?
Imagine that you're sitting in a cave and you hear "Dave, I'm sure that I've got it.. umm... that's not... no... Boston Sand & Gravel... come on...". You're the PDA. What does the user want? If you understood the context of the situation, you might recall the above example of company names in a database. You might say "I've got that installed" and locate the entry for the 'Boston Sand & Gravel Company' for the user. But a PDA is not that smart. It needs to first pick out the allowed phrases from the noise and surrounding conversation. This is called 'word spotting'. Then it needs to decide how to interpret the phrase. Without a restricted application, the PDA must understand the context, frequently in human terms, of the speech.
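As a rough illustration of word spotting, the Python sketch below scans an utterance for phrases from a restricted vocabulary and ignores everything else. It works on already-transcribed text, which is a big simplifying assumption: a real word spotter has to do this on the audio itself. The vocabulary, the `company_lookup` action name and the function names are hypothetical.

```python
import re

# Hypothetical vocabulary tied to the installed applications:
# allowed phrase -> action the PDA would take.
ALLOWED_PHRASES = {
    "boston sand & gravel": "company_lookup",
    "yoyodyne propulsion systems": "company_lookup",
}


def normalize(text):
    """Lowercase the text and keep only runs of letters, digits, '&' and apostrophes."""
    return " ".join(re.findall(r"[a-z0-9&']+", text.lower()))


def spot_phrases(utterance, phrases=ALLOWED_PHRASES):
    """Return (phrase, action) pairs for every allowed phrase found in the utterance."""
    padded = f" {normalize(utterance)} "
    # Pad with spaces so phrases only match on whole-word boundaries.
    return [(p, a) for p, a in phrases.items() if f" {p} " in padded]


if __name__ == "__main__":
    heard = ("Dave, I'm sure that I've got it.. umm... that's not... no... "
             "Boston Sand & Gravel... come on...")
    print(spot_phrases(heard))
```

Note that spotting the phrase only solves the first half of the problem. Deciding that the speaker wants the company record, rather than just mentioning the name in conversation, is the context question raised above.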
If this seems hopeless with today's technology, you are correct. We will see speech applied first to limited interactions and simple applications. Over time, the domain will grow. Think back to handwriting recognition on the early Newtons. We've come a long way in a few years. On the PDA, the same will be true for speech.
-- Given one hour to live, the student replied: "I'd spend it with professor FP who can make an hour seem like a lifetime."
Anyone else see a problem with this?
by zyqqh · Score: 4
Those of you who own a PDA right now -- try to think of the last 5 places where you've used it. Thought of them? Good. Now think to yourself, in exactly how many of those places would talking aloud (especially to a little black box) be regularly tolerated? Maybe I'm seeing things from a distorted viewpoint, but I'd primarily have to use it in class, and, well, you can probably see what can come of that. I somehow doubt that modern speech recognition technology is sufficient to recognize instructions at a quiet-whisper level.
Yeah, this has its applications for accessibility to people who can't use the standard stylus, but, as a mainstream item, I don't see this getting too far.