Using PDAs for Dictation?
SunPin asks: "I'm a writer who is 99% dependent, due to fine-motor disabilities, on voice dictation. I've been a dictation user since 1990. My preference is 'discrete' speech because of very low resource consumption and its effectively infinite flexibility. Over the years, my computer use has de-evolved to programming, FTP, email (Mozilla), word processing (OpenOffice) and Ricochet. Drop the game and there's nothing that I shouldn't be allowed to do on the go. The problem is that I can't. Back in 1990, the requirements for IBM VoiceType were: DOS, 8MB RAM, 10MB of drive space with one of those new-fangled scorching 386-16MHz processors... not exactly demanding by today's standards and, unless I'm outright wrong, not demanding by today's PDA standards. Why hasn't it happened yet?"
"In the disability offices of the hundreds of universities across the US, such software would be a major money saver because not all students need a high-powered laptop. While natural speech is great from a marketing perspective, it is simply impractical for general use and cannot adapt to mildly noisy environments. IBM, L & H and Microsoft have all given me the run-around. IBM refused to entertain the possibility. L & H is on life support, in a deep coma. Only Microsoft had a remotely positive response saying that they were testing natural recognition in Mandarin Chinese in their Beijing research office. Does anyone believe in keeping it simple, anymore?"
I'm guessing the biggest stumbling block would be storage space, specifically for the data files the program uses to map vocalizations to meaning... Most mainstream PDAs only have 8MB of RAM/storage combined, and Palm is still shipping devices with as little as 2MB. Your best bet might be one of the StrongARM-based handhelds combined with a reasonably large CompactFlash/Secure Digital card... (E.g. Sharp Zaurus, Hewlett-ComPackard's iPaq, etc.) Of course, that's probably $300-500, but that's still less than a new laptop...
News for Geeks in Austin, TX
Yeah, but the author claims he was happy with discrete speech processing on a 386-16 that we had back in the day. He doesn't want continuous speech that doesn't have to be trained and all that jazz - just simple old-school voice recognition. Is it so much to ask that someone port the old algorithms to the Palm?
11*43+456^2
It's not just the phonetic sounds, but the multitude of inflections and emphases that are lacking and are pretty hard to reproduce unless the TTS engine can interpret the meaning of the text.
Raising the voice at the end of a question may be easy enough. But how much? When? This is a question too, is it not?
A good orator would read a more 'exciting' passage more quickly and with more enthusiasm, punctuating key verbs and nouns. How is software to know which passages are more exciting, and which aren't?
It's not just a hard task for computers, but for people too.
Computers read aloud at about the same level as a poor orator. Pho-net-i-call-y, in a dull, drab monotone. Drop by the local high school and listen to the students reading Shakespeare.
Reading aloud may be simple; reading well and naturally is a skill.
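To make the "But how much? When?" problem concrete, here is a minimal sketch of the easy rule: raise the pitch on the last word if the sentence ends in a question mark. Everything in it is an assumption for illustration (the function name `naive_pitch_contour`, the 120 Hz base pitch, the 1.25 rise factor); it is not how any real TTS engine does it.

```python
def naive_pitch_contour(sentence, base_hz=120.0, rise=1.25):
    """Assign one pitch (Hz) per word; hypothetical rule, not a real TTS API.

    Every word gets the flat base pitch. If the sentence ends with '?',
    only the final word is raised by the (arbitrary) `rise` factor.
    """
    words = sentence.split()
    contour = [base_hz] * len(words)
    if words and sentence.rstrip().endswith("?"):
        contour[-1] = base_hz * rise
    return list(zip(words, contour))


if __name__ == "__main__":
    for word, hz in naive_pitch_contour("This is a question too, is it not?"):
        print(f"{word:10s} {hz:6.1f} Hz")
```

The rule fires on the question mark, but it knows nothing about meaning: it treats "is it not?" the same as any other question, leaves every other word in a monotone, and the size and timing of the rise are just numbers somebody picked.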
I don't need no instructions to know how to rock!!!!
Yeah but the author claims he was happy with discrete speech processing on a 386-16 that we had back in the day.
The author might be happy with what he had in those days. The rest of the market would not be. In fact, the market is not happy with what we have now, as witnessed by the very low penetration of voice-recognition software. So why would we expect companies to spend the resources porting the old stuff when the new stuff won't even sell?
Palm applications, in particular, are designed around the idea of "forms" -- you put a form up on the screen, and then you sit there waiting for the user to do something. You don't run a constant loop listening to a microphone every minute, because that sucks up the battery like crazy. The Palm programming philosophy says that 99% of the machine's time should be, essentially, idle. Voice recognition, on the other hand, is very processor-intensive -- probably too much so for a pair of AAAs.
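One way to square dictation with the mostly-idle model would be push-to-talk: the recognizer only runs on audio that arrives while a button is held. The sketch below is a toy simulation in Python, not Palm OS code; the event names (`button_down`, `button_up`, `audio`), the `run_event_loop` function, and the stand-in recognizer are all made up for illustration.

```python
def run_event_loop(events, recognize):
    """Handle queued (kind, payload) events in order.

    Hypothetical event-driven loop: audio is passed to `recognize` only
    between a 'button_down' and the matching 'button_up'. All other
    events are ignored, standing in for the device going back to idle.
    Returns the recognized results and how many audio chunks were processed.
    """
    listening = False
    results = []
    busy_chunks = 0
    for kind, payload in events:
        if kind == "button_down":
            listening = True
        elif kind == "button_up":
            listening = False
        elif kind == "audio" and listening:
            busy_chunks += 1
            results.append(recognize(payload))
    return results, busy_chunks


if __name__ == "__main__":
    events = [
        ("audio", "background chatter"),  # ignored: button not held
        ("button_down", None),
        ("audio", "hello"),
        ("button_up", None),
        ("audio", "more chatter"),        # ignored again
    ]
    # str.upper stands in for a real (and far more expensive) recognizer.
    print(run_event_loop(events, str.upper))
```

This only shows where the processing would be gated. Whether the recognition itself fits in a handheld's CPU and battery budget is a separate question that the sketch does not answer.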
Breakfast served all day!
Voice being the natural way to interact with devices? Think it through: an entire office trying to dictate to their word processing programs all at once, with people popping in on each other to talk about work; an airplane full of road warriors all trying to dictate stuff to their respective laptops at once (without saying anything confidential); support departments trying to make dictation work with fifty other people speaking commands to their respective clients; or programmers trying to spell their way through their creations.
And have you ever actually tried speaking for eight to ten hours at a stretch? I'm not talking about random, occasional speech acts, but sustained, focused speech. You'd have about three weeks until laryngitis became an occupational hazard among white-collar workers.
Speech is nice, but it is very much a niche application. Not only now, but forever. A keyboard is faster than speech, and does not contribute to noise level or occupational damage nearly as much as sustained speech would. It's a nice, even essential, mode of operation for those cases where a keyboard just won't do; the disabled, firemen, surgeons and so on will rightly love the interface. For mainstream use, however, it's just not good enough even when it's perfect.
It could become an accessory input, along the lines of replacing menu commands for an app: mark text, say "cut", mark a place, say "paste" and so on, but it would never replace keyboard input in any mainstream application.
Trust the Computer. The Computer is your friend.