Using PDAs for Dictation?
SunPin asks: "I'm a writer that is 99% dependent, due to fine-motor disabilities, on voice dictation. I've been a dictation user since 1990. My preference is 'discrete' speech because of very low resource consumption and its effectively infinite flexibility. Over the years, my computer use has de-evolved to programming, FTP, email (Mozilla), word processing (OpenOffice) and Ricochet. Drop the game and there's nothing that I shouldn't be allowed to do on the go. The problem is that I can't. Back in 1990, the requirements for IBM VoiceType were: DOS, 8MB RAM, 10MB of drive space with one of those new-fangled scorching 386-16MHz processors... not exactly demanding by today's standards and, unless I'm outright wrong, not demanding by today's PDA standards. Why hasn't it occurred yet?"
"In the disability offices of the hundreds of universities across the US, such software would be a major money saver because not all students need a high-powered laptop. While natural speech is great from a marketing perspective, it is simply impractical for general use and cannot adapt to mildly noisy environments. IBM, L & H and Microsoft have all given me the run-around. IBM refused to entertain the possibility. L & H is on life support, in a deep coma. Only Microsoft had a remotely positive response saying that they were testing natural recognition in Mandarin Chinese in their Beijing research office. Does anyone believe in keeping it simple, anymore?"
Next thing, you'll be wanting a machine to wash your dishes and clothing, or, heck, let's be crazy, and send moving pictures around the world!
I think it has more to do with the perception of voice dictation as unreliable and resource-intensive than with any actual fact. As the poster points out, it can be done fairly cheaply.
I have not had much experience, but I think the other thing is that people are averse to any sort of training or teaching required, no matter the long-term dividends.
Like most things, it comes down not to fact, but to perception and prejudice. Most people base their buying decisions on 30-second spots, not informed research, so the cost of educating people is too high for producers to incur.
It's the other, most overlooked piece of hardware used in speech recognition: the microphone. The junky headset given away with ViaVoice or the el cheapo unit sold in Radio Shack for under $10 makes most people's experiences with voice recognition software less than favorable. Invest in a $50-$60 professional headset and the ability of the software to accurately detect your speech patterns improves dramatically. How are they going to shoehorn a high-fidelity audio processor in there? Maybe a USB headset might be the answer, assuming the device can accept USB devices.
I'm also going to assume that the current line of speech recognition products are MUCH better than what ran on your old 386.
IBM can't even manage to do this on, for example, a P3 733EB. How they're going to do it on a 300MHz XScale or SH chip or similar (let alone a Motorola Dragonball) is beyond me. I think your head is in the clouds.
With that said, voice recognition is very much on everyone's minds and it is coming. The limiting factor in handhelds right now is battery technology, which seems to be advancing more rapidly now than it has in the last decade or so. With more power density come faster processors and more RAM, and the ability to perform these kinds of operations on smaller computers.
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
Dragon has a portable product that you dock to your PC to do the voice to text. You can bring it with you, then connect it when you're home. A digital recorder is available bundled with the software, or you can use any micro cassette recorder and a Norcom playback and interface device. Search Google for info!
I'm guessing the storage space requirements for that, in terms of the data files the programs would use to map vocalizations to meaning, would be the biggest stumbling block... Most mainstream PDAs only have 8MB of RAM/storage combined, and Palm is still shipping devices with as little as 2MB. Your best bet might be one of the StrongARM-based handhelds combined with a reasonably large CompactFlash/SecureDigital card... (E.g. Sharp Zaurus, Hewlett-ComPackard's iPaq, etc.) Of course, that's probably $300-$500, but that's still less than a new laptop...
News for Geeks in Austin, TX
With a simple search for dictaphone I was able to find a product called EXSpeech. I think this is what you are looking for.
>> First off, buying a dictaphone ...
DICTAPHONE? DICTAPHONE?
re-vulcanize my tires, post-haste. And make sure this post is on the next auto-gyro to Prussia.
I don't need no instructions to know how to rock!!!!
It's not just the phonetic sounds, but the multitude of inflections and emphases that are lacking and are pretty hard to reproduce unless the TTS engine can interpret the meaning of the text.
Raising the voice at the end of a question may be easy enough. But how much? When? This is a question too, is it not?
A good orator would read a more 'exciting' passage more quickly, and with more enthusiasm, punctuating key verbs and nouns. How is software to know which passages are more exciting, and which aren't?
It's not just a hard task for computers, but people too.
Computers read aloud at about the same level as a poor orator: pho-net-i-call-y, in a dull, drab monotone. Drop by the local high school and listen to the students reading Shakespeare.
Reading aloud may be simple; reading well and naturally is a skill.
I don't need no instructions to know how to rock!!!!
You can get a version of ViaVoice for the PocketPC. However, it sucks. It's not a real dictation system, though; it only allows you to use a pretty small pre-defined group of commands, not general English word dictation. I was pretty disappointed. However, I wouldn't be surprised if eventually there is a full-blown ViaVoice Embedded version for the PocketPC.
As usual, there are some results that come up with a simple Google search.
There was a Dragon Naturally Speaking beta for the Newton OS 2.1, and it works OK. But it's still a beta and is far from perfect.
If you're looking for voice recognition for other PDAs, including PalmOS or Linux devices, you'll probably have much less luck.
Working toward a usable PDA environment in the spirit of Newton OS: Dynapad
Basically, they are working to analyze speech in slices (phonemes) instead of the more computationally intensive task of the whole word. This would lead to a higher success rate and could be easily used across multiple accents of the same language (English, engrish, etc).
I'm excited about what they could accomplish there.
-Cyc
/.'s 10 Millionth
Only recently have PDAs been shipping with anything approaching a good DAC, and many PDAs still lack any ADC support. Without a good analogue-to-digital converter built into the PDA you won't be able to do voice recognition. Remember that your 386 still required a soundcard to work properly. The same is true for PDAs today.
Since the asker wanted to know WHY nobody has done this yet, I'll spell it out:
Basically the major pitfalls to developing this are: :)
1) Crappy algorithms that mangle what you really said into something unrelated
2) Power Consumption
3) Interfacing to the PDA (not hard to do, but non-trivial)
4) Limited PDA capabilities (Remember that Palm's DragonBall is a 68000-family CPU with no floating-point unit, and things like speech recognition NEED floating-point math, which must be emulated)
The solutions:
1) Somebody (not unlike me...) has to code the already existing better algorithms (check the literature - speech recognition is a mature technology, and publications abound) into a usable chunk of code, instead of simply recycling ViaVoice or NaturallySpeaking's libraries.
2) Add more battery storage.
3) Use another processor to do the conversion, then simply write it to the Palm in a serial stream.
I would just wait about a year, then ask that question again to your physician friends, and see what they whip out of their pockets... :)
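On pitfall 4 above (floating-point math that must be emulated): one common workaround on CPUs without a floating-point unit is fixed-point arithmetic, where fractions are stored as scaled integers. The following is only a minimal sketch of the Q15 format in Python, not code from ViaVoice, NaturallySpeaking, or any Palm library; the helper names (`to_q15`, `q15_mul`, `q15_dot`) are hypothetical and chosen for illustration.

```python
# Q15 fixed point: a real number x in [-1, 1) is stored as the
# integer round(x * 2**15). All arithmetic below is integer-only.
Q = 15
ONE = 1 << Q  # 32768, the integer that represents 1.0


def to_q15(x: float) -> int:
    """Convert a float to Q15, saturating at the ends of the range."""
    v = int(round(x * ONE))
    return max(-ONE, min(ONE - 1, v))


def from_q15(v: int) -> float:
    """Convert a Q15 integer back to a float (for inspection only)."""
    return v / ONE


def q15_mul(a: int, b: int) -> int:
    """Multiply two Q15 values: integer multiply, add half an LSB, shift."""
    return (a * b + (1 << (Q - 1))) >> Q


def q15_dot(xs, ys) -> int:
    """Dot product: accumulate raw products in a wide integer and
    shift once at the end, so rounding error is not added per term."""
    acc = 0
    for a, b in zip(xs, ys):
        acc += a * b
    return (acc + (1 << (Q - 1))) >> Q


if __name__ == "__main__":
    half = to_q15(0.5)
    print(from_q15(q15_mul(half, half)))
```

Dot products like `q15_dot` are the inner loop of filters and distance measures, so a recognizer written this way would need only integer multiplies and shifts. Whether the accuracy is good enough for dictation is a separate question that this sketch does not answer.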
Palm applications, in particular, are designed around the idea of "forms" -- you put a form up on the screen, and then you sit there waiting for the user to do something. You don't run a constant loop listening to a microphone every minute, because that sucks up the battery like crazy. The Palm programming philosophy says that 99% of the machine's time should be, essentially, idle. Voice recognition, on the other hand, is very processor-intensive -- probably too much so for a pair of AAA's.
Breakfast served all day!
Lo, many years ago I had a lot of luck with EARS on my 66MHz 486. It's a very simple discrete trainable recogniser; you have to teach it every word before it will recognise it. But it was fast then, it should be really fast now, and it was pretty decent for recognising simple commands.
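To illustrate what "discrete trainable" can mean in practice, here is a minimal sketch of template matching: each trained word is stored as a sequence of feature vectors, and a new utterance is labelled with the word whose template is closest. The distance used here is dynamic time warping (DTW), which is an assumption made for illustration; nothing above says EARS works this way. The class name `DiscreteRecognizer` and the toy one-dimensional features are hypothetical, and a real system would first extract acoustic features from audio.

```python
def dtw_distance(a, b):
    """DTW distance between two sequences of feature vectors.

    Each sequence is a list of equal-length lists of floats. Frames
    may be stretched or repeated, so speaking rate matters less.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    prev = [inf] * (m + 1)
    prev[0] = 0.0  # cost of aligning two empty prefixes
    for i in range(1, n + 1):
        cur = [inf] * (m + 1)
        for j in range(1, m + 1):
            # Euclidean distance between frame i of a and frame j of b
            cost = sum((x - y) ** 2 for x, y in zip(a[i - 1], b[j - 1])) ** 0.5
            cur[j] = cost + min(prev[j], cur[j - 1], prev[j - 1])
        prev = cur
    return prev[m]


class DiscreteRecognizer:
    """Stores one or more templates per word; picks the nearest one."""

    def __init__(self):
        self.templates = {}  # word -> list of feature sequences

    def train(self, word, features):
        self.templates.setdefault(word, []).append(features)

    def recognize(self, features):
        best_word, best_dist = None, float("inf")
        for word, examples in self.templates.items():
            for example in examples:
                d = dtw_distance(features, example)
                if d < best_dist:
                    best_word, best_dist = word, d
        return best_word  # None if nothing has been trained


if __name__ == "__main__":
    rec = DiscreteRecognizer()
    rec.train("yes", [[0.0], [1.0], [2.0], [1.0]])
    rec.train("no", [[2.0], [2.0], [0.0], [0.0]])
    # A slower "yes": same shape, some frames repeated.
    print(rec.recognize([[0.0], [0.0], [1.0], [2.0], [2.0], [1.0]]))
```

The cost of recognition grows with the number of trained words, since every template is compared in turn. That fits the trade-off described above: every word has to be taught, but a small command vocabulary stays cheap.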
"IBM: Where software goes to die."
and it also has 8k of storage! You could store all of "Hel" on that thing
If I can find a machine to wash my dishes AND clothing, I'd say that'd be pretty cool!
Some mics do this mechanically also - They have a port on the reverse side of the mic element so it only detects pressure differences between the two sides of the mic, i.e. only nearby sounds coming from one side of the mic (your mouth). Plantronics has plenty of these - Such NC headsets are common thanks to cellular telephone handsfree kits being required by law in some states, and they are quite good. (I love my Plantronics headset.)
retrorocket.o not found, launch anyway?
I worked on dictation and dialogue on a PDA prototype at MS several years ago. It was called MiPad and was pretty cool. Well, except that it really had to use a wireless network link to a computer to get the recognition done.
There are a couple of reasons why this hasn't hit the market yet:
1) The PDAs really are not powerful enough to do decent recognition. Mainly, they don't have good enough audio input systems for reasonable speech quality. There is also not enough storage space for the dictionary, the CPUs are slow, and there is too little RAM.
2) At least at MS, it is not a top priority to make speech work for disabled users. Outrageous, you say? Not so! It turns out that when the speech guys approached the accessibility guys on the subject, they learned that speech recognition is not workable in most cases where accessibility is needed; that is to say, the market of disabled people who cannot use the keyboard but who CAN use speech input is actually quite small. Most people who don't have the motor function to type (or to use some sort of keyed input like Stephen Hawking has) don't have the motor function to speak clearly enough for speech recognition to work. Bottom line: other solutions work better.