Using PDAs for Dictation?
SunPin asks: "I'm a writer who is 99% dependent, due to fine-motor disabilities, on voice dictation. I've been a dictation user since 1990. My preference is 'discrete' speech because of very low resource consumption and its effectively infinite flexibility. Over the years, my computer use has de-evolved to programming, FTP, email (Mozilla), word processing (OpenOffice) and Ricochet. Drop the game and there's nothing that I shouldn't be allowed to do on the go. The problem is that I can't. Back in 1990, the requirements for IBM VoiceType were: DOS, 8MB RAM, 10MB of drive space with one of those new-fangled scorching 386-16MHz processors... not exactly demanding by today's standards and, unless I'm outright wrong, not demanding by today's PDA standards. Why hasn't it occurred yet?"
"In the disability offices of the hundreds of universities across the US, such software would be a major money saver because not all students need a high-powered laptop. While natural speech is great from a marketing perspective, it is simply impractical for general use and cannot adapt to mildly noisy environments. IBM, L & H and Microsoft have all given me the run-around. IBM refused to entertain the possibility. L & H is on life support, in a deep coma. Only Microsoft had a remotely positive response saying that they were testing natural recognition in Mandarin Chinese in their Beijing research office. Does anyone believe in keeping it simple, anymore?"
What kind of machine does Stephen Hawking use nowadays? Last I heard, in the early '90s, he was still using an 80286-based machine. Today's PDAs, running at 400 MHz and 32-bit, should be at least 50 times faster.
I think it's fairly clear that the person asking this question has said they'd be happy with the functionality of the older software, if it were available on a PDA. That's not hard to understand, is it? He's not asking why voice recognition is unpopular; granted, it is a niche application, especially for people who can't use a keyboard. But for those people, isn't a PDA solution, even if it isn't up to "your" standards, a good idea?
I'm guessing the storage space requirements, in terms of the data files the programs would use to map vocalizations to meaning, would be the biggest stumbling block... Most mainstream PDAs only have 8 MB of RAM/storage combined, and Palm is still shipping devices with as little as 2 MB. Your best bet might be one of the StrongARM-based handhelds combined with a reasonably large CompactFlash/SecureDigital card... (E.g. Sharp Zaurus, Hewlett-ComPackard's iPaq, etc.) Of course, that's probably $300-500, but that's still less than a new laptop...
News for Geeks in Austin, TX
The Simputer comes with Text-to-Speech out of the box, but not Speech-to-Text. It does have microphone and USB jacks, so loading additional software may be an option. The major downside is battery life, which is in the not-so-great realm.
Any spoon would be too big.
Yeah but the author claims he was happy with discrete speech processing on a 386-16 that we had back in the day. He doesn't want continuous speech that doesn't have to be trained and all that jazz - just simple old school voice recognition. Is it so much to ask that someone port the old algorithms to the Palm?
11*43+456^2
I wonder where your claim that it doesn't work comes from. I have some experience (a couple of hours) with Philips FreeSpeech 2000 (I guess it's called that - I don't have it at hand). It recognizes natural speech fairly accurately. My guess would be that discrete speech is easier to recognize, making for better results and requiring less hardware real estate. I am absolutely willing to bet that it works a lot better than typing for people with disabilities. Your turn again.
Please correct me if I got my facts wrong.
Conjecture: Voice recognition on a PDA could work if you had a separate voice server over a wireless connection. So you have voice sent over a regular phone connection to your home PC (with modem), which does the recognition and then spits back text (over another connection?) to your PDA.
Some might say that this would make VR too slow. I don't see why this would be noticeably slower than doing VR in person. After all, when we talk on the phone the person on the other end hears us almost instantaneously.
On a side note: my brother is a doctor who uses VR to do his dictations. It is much cheaper than paying a transcription service. He also does not need to review the transcriptions afterwards for accuracy, because he essentially reviews it as he speaks it.
I don't think so.
He is correct, current markets go for the majority and don't bother with the minority (excepting small speciality groups).
Unless you show one of the big players how to turn it into a cash cow, they won't put too much time or money into it.
It's not just the phonetic sounds, but the multitude of various inflections and emphases that are lacking, and are pretty hard to reproduce, unless the TTS engine can interpret the meaning of the text.
Raising the voice at the end of a question may be easy enough. But how much? When? This is a question too, is it not?
A good orator would read a more 'exciting' passage more quickly, and with more enthusiasm, punctuating key verbs and nouns. How is software to know which passages are more exciting, and which aren't?
It's not just a hard task for computers, but people too.
Computers read aloud at about the same level as a poor orator. Pho-net-i-call-y, in a dull drab monotone. Drop by the local high school, and listen to them reading Shakespeare.
Reading aloud may be simple; reading well and naturally is a skill.
I don't need no instructions to know how to rock!!!!
In the late 90's there were 3 major SpeechWreck vendors: IBM, Lernout & Hauspie and Dragon Systems.
Microsoft poured a bunch of cash into L&H. L&H eliminated some competition by purchasing Dragon.
L&H did some highly irregular accounting tricks, got themselves thrown in jail, and took their company down with them.
End result: There is only really one speech recognition vendor at this time, IBM, and they are just useless at marketing consumer products.
Keep an eye on Philips. They are currently spending big bucks developing their SpeechMagic engine.
Your other option is to find a copy of Dragon Mobile. Record an audio file on your mobile, then have it recognized on your PC.
Yeah but the author claims he was happy with discrete speech processing on a 386-16 that we had back in the day.
The author might be happy with what he had in those days. The rest of the market would not be happy with that. In fact, the market is not happy with what we have now, as witnessed by the very low penetration of voice-recognition software. So why would we expect companies to spend the resources porting the old stuff when the new stuff won't even sell?
Palm applications, in particular, are designed around the idea of "forms" -- you put a form up on the screen, and then you sit there waiting for the user to do something. You don't run a constant loop listening to a microphone every minute, because that sucks up the battery like crazy. The Palm programming philosophy says that 99% of the machine's time should be, essentially, idle. Voice recognition, on the other hand, is very processor-intensive -- probably too much so for a pair of AAA's.
Breakfast served all day!
Tiny devices like cell phones and PDAs don't have the CPU power for sexy, high quality voice recognition. They do however have wireless connectivity. So, solve the problem this way...
Install voice recognition servers, network connected boxes with powerful CPUs and the best voice recognition software you can get your hands on. A voice recognition client then just needs to send the voice data up to the server and get the translation back, say 100kbps up and some tiny amount back.
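As a rough check on the "100kbps up" figure: the bitrate of uncompressed audio is just sample rate times bits per sample times channels. A minimal sketch, where the function name and the two sample formats are assumptions chosen for illustration and are not something the post specifies:

```python
def pcm_bitrate_kbps(sample_rate_hz, bits_per_sample, channels=1):
    """Raw (uncompressed) PCM bitrate in kilobits per second."""
    return sample_rate_hz * bits_per_sample * channels / 1000


# Telephone-style audio: 8 kHz sampling, 8 bits per sample, mono.
print(pcm_bitrate_kbps(8000, 8))   # 64.0

# Same sampling rate with 16-bit samples.
print(pcm_bitrate_kbps(8000, 16))  # 128.0
```

So "say 100kbps" sits between those two uncompressed cases; any compression on the client would lower it further.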
The payback comes because most devices will only use voice recognition for brief periods, so will present a negligible load on the servers. The dictation users will place a higher load on the servers, but even there, I'm guessing there is a lot of pausing involved. I'm also going to guess that some lag is acceptable for dictation. Presumably the person is thinking about what they are saying and proofreading later. This load can be prioritized lower to allow better immediate response for people issuing voice commands on their mobile devices.
Power consumption on the portable device will probably improve. They will have to operate their transmitter (think "talk time" vs. "on time"), but they won't need 5 watts of CPU doing recognition. (Guessing from a mobile G3 PPC, further validated, considering that the CPU spot of my iBook gets far hotter under solid use than a cellphone.)
So, just to pick numbers out of the air, a dual-processor, high-end commodity hardware voice server might serve 500 PDA users giving intermittent commands and 6 simultaneous dictation users.
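To see how that load might add up on the network side, here is a back-of-the-envelope sketch. The duty cycles (2% for command users, 50% for dictation users) are assumptions picked purely for illustration; only the user counts and the 100kbps stream come from the figures above, and those were themselves picked out of the air.

```python
def average_uplink_kbps(users, duty_cycle_percent, stream_kbps=100):
    """Average aggregate uplink if each user streams audio only
    duty_cycle_percent of the time (hypothetical model)."""
    return users * duty_cycle_percent * stream_kbps / 100


# Assumed: command users are actually speaking about 2% of the time.
command_kbps = average_uplink_kbps(500, 2)

# Assumed: dictation users are speaking about half the time.
dictation_kbps = average_uplink_kbps(6, 50)

print(command_kbps, dictation_kbps, command_kbps + dictation_kbps)
```

Under those assumed duty cycles the average comes to about 1.3 Mbps in total, though peaks would be higher when many users happen to talk at once.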
A company or school could easily justify the hardware cost of this service.
Now, someone go out and build one.
You don't have to be disabled in some way to think this'd be handy, do you? That's the story for this one person, okay. But if you hadn't heard of a PDA ever before, wouldn't this be one of the most likely functions you'd think of for them? It's a totally natural application for a handheld gadget like that, and one that really would have a natural market among all the middle manager types who made Palms so popular to start with. Right?
(Are there PDAs that can even read text in the other direction, though -- text to speech?)
"Fundamentalism" isn't about divine morality. It's about human authority.
Maybe approaching voice recognition through air waves is all wrong to start with:
Bowman: "Hello, HAL? Do you read me, HAL?"
HAL: "Affirmative, Dave, I read you."
Bowman: "Open the pod bay doors, HAL."
HAL: "I'm sorry Dave, I'm afraid I can't do that."
Bowman: "What's the problem?"
HAL: "I think you know what the problem is just as well as I do."
Bowman: "What are you talking about, HAL?"
HAL: "This mission is too important for me to allow you to jeopardize it."
Bowman: "I don't know what you're talking about HAL..."
HAL: "I know you and Frank were planning to disconnect me, and I'm afraid that's something I cannot allow to happen."
Bowman: "Where the hell'd you get that idea, HAL?"
HAL: "Dave, although you took thorough precautions in the pod against my hearing you, I could see your lips move."
The point of the article was that the poster was quite happy with what voice recognition could do on his desktop. So your blathering about it not being there is missing the point. The poster's question was why the software, which works fine on older PCs, hasn't been ported to PDAs, which now have equivalent or greater computing power.
If I were as obtuse as you, I would avoid posting on the intelweb for fear of people finding me out.
The poster's question brings to mind a thought I've had lately, though, on PDAs and smart mobile phones. I've recently 'switched' from a Visor to just using my Sony Ericsson T68 as an organizer. Works great with iSync, etc.
The Palm-with-phone always made more sense to me than the phone-with-organizer. It seemed that the phone part could change shape - I could stick it in my ear in the form of a headset, with a connector to the Palm. A phone I need to hold up to my head. I can't surf with something held against my head that way.
However, I've realized that I need a phone more and, more importantly, I only enter very small bits of text into the Palm. Furthermore, I spend much more time looking up things than entering things (as I use the Mac to enter data whenever possible).
This led me to the conclusion -- the one thing we are missing from the organizer/phone landscape, as the poster asked, is some kind of speech-to-text.
If I could literally hit a button and say "lunch with Dave next Tuesday" and have it enter that as live text... blammo. No more Palm, no more stylus. The phone already listens to voice commands. If it took short notes/appointments, I could literally walk around, call people, make appointments and notes, and not take the thing out of my pocket. Nice dream.
*sigh*
If Jesus wants me it knows where to find me.
Exactly. The real problem is that speech recognition is a niche demand. Speech recognition in and of itself has no mainstream uses. Think of an office full of people using speech recognition. Not pretty. At home? People only want speech recognition if it is tied to computer commands. ("Computer, download my email, filter for spam, then read back the names of the senders.") Who's left? People who find typing difficult because of a physical limitation. While a worthy cause, it may well not be a profitable one.
Voice being the natural way to interact with devices? Think it through: an entire office trying to dictate to their word processing program all at once, with people popping in to each other trying to talk about work; an airplane of road warriors all trying to dictate stuff to their respective laptops at once (without saying anything confidential); support departments trying to make dictation work with fifty other people speaking commands to their respective clients; or programmers trying to spell their way through their creations.
And have you ever actually tried speaking for eight to ten hours at a stretch? I'm not talking about random, occasional speech acts, but sustained, focused speech. You'd have about three weeks until laryngitis became an occupational hazard among white-collar workers.
Speech is nice, but it is very much a niche application. Not only now, but ever. A keyboard is faster than speech, and does not contribute to noise level or occupational damage nearly as much as sustained speech would. It's a nice, even essential, mode of operation for those apps when a keyboard just won't do; the disabled, firemen, surgeons and so on will rightly love the interface. For mainstream use, however, it's just not good enough even when it's perfect.
It could become an accessory input, on the lines of replacing menu commands for an app: mark text, say "cut", mark a place, say "paste" and so on, but it just would never replace keyboard input in any mainstream application.
Trust the Computer. The Computer is your friend.
"Think of an office full of people using speach recognition. Not pretty. "
Almost as frightening as an office full of people all using telephones.
You don't remember typewriters and adding machines, or for that matter, the dictaphone, do you?
-fb Everything not expressly forbidden is now mandatory.
The new model of the Sharp Zaurus will have a built-in microphone and an application where you can dictate notes right into the calendar.
Just because you have 300 MHz doesn't mean it can do the same as a notebook computer. The CPU in a notebook has additional instructions (floating-point arithmetic) and, more importantly, additional chipsets that support the main processor. Most PDAs use RISC processors, e.g. in Palms. The current algorithms use a lot of floating-point instructions, and these RISC processors do not have floating point. Most computers have multimedia chipsets in addition to the main processor that most of you are thinking of. You mention only a few companies that have voice recognition, but there are many more that are not on the market now. One to watch is Apple: it has had voice recognition but is not doing a lot of new products with it. Dragon for Newton is the one I still use; it takes simple words and carries out instructions, but the Newton is not sold any more. The Newton is, I think, 100 MHz. The Zaurus should be able to do it with Linux. I think we will just have to wait. Microsoft will probably not do it because it has not been shown to be a money maker.
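For what it's worth, the usual workaround on a CPU with no floating-point hardware is fixed-point arithmetic: store fractions as scaled integers and use only integer multiplies and shifts. A minimal sketch, written in Python for readability (a real port would more likely be C or assembly); the Q15 format and the names to_q15, q15_mul and from_q15 are illustrative assumptions, not taken from any actual speech engine:

```python
Q = 15              # number of fractional bits (Q15 format, assumed here)
SCALE = 1 << Q      # 32768


def to_q15(x):
    """Convert a real number to a Q15 scaled integer."""
    return int(round(x * SCALE))


def q15_mul(a, b):
    """Multiply two Q15 values using only an integer multiply and a shift."""
    return (a * b) >> Q


def from_q15(a):
    """Convert back to a float, only for display or checking."""
    return a / SCALE


a = to_q15(0.5)     # 16384
b = to_q15(0.25)    # 8192
print(from_q15(q15_mul(a, b)))  # 0.125
```

Whether the old discrete-speech algorithms would tolerate the reduced precision is a separate question that this sketch does not answer.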