Using PDAs for Dictation?
SunPin asks: "I'm a writer who, due to fine-motor disabilities, is 99% dependent on voice dictation. I've been a dictation user since 1990. My preference is 'discrete' speech because of its very low resource consumption and its effectively infinite flexibility. Over the years, my computer use has de-evolved to programming, FTP, email (Mozilla), word processing (OpenOffice) and Ricochet. Drop the game and there's nothing I shouldn't be able to do on the go. The problem is that I can't. Back in 1990, the requirements for IBM VoiceType were: DOS, 8MB RAM, 10MB of drive space and one of those new-fangled, scorching 386-16MHz processors... not exactly demanding by today's standards and, unless I'm outright wrong, not demanding by today's PDA standards either. Why hasn't it happened yet?"
"In the disability offices of the hundreds of universities across the US, such software would be a major money saver, because not every student would then need a high-powered laptop. While natural (continuous) speech is great from a marketing perspective, it is simply impractical for general use and cannot adapt to even mildly noisy environments. IBM, L & H and Microsoft have all given me the run-around. IBM refused to entertain the possibility. L & H is on life support, in a deep coma. Only Microsoft gave a remotely positive response, saying that they were testing natural recognition in Mandarin Chinese at their Beijing research office. Does anyone believe in keeping it simple anymore?"
http://slashdot.org/article.pl?sid=02/11/19/234216&mode=thread&tid=100
IBM can't even manage to do this on, for example, a P3 733EB. How they're going to do it on a 300MHz XScale or SH chip or similar (let alone a Motorola Dragonball) is beyond me. I think your head is in the clouds.
With that said, voice recognition is very much on everyone's mind, and it is coming. The limiting factor in handhelds right now is battery technology, which seems to be advancing more rapidly now than it has in the last decade or so. More power density brings faster processors, more RAM, and the ability to perform these kinds of operations on smaller computers.
Dragon has a portable product that you dock with your PC to do the voice-to-text conversion. You can bring it with you, then connect it when you're home. A digital recorder is available bundled with the software, or you can use any microcassette recorder plus a Norcom playback and interface device. Search Google for info!
With a simple search for "dictaphone" I was able to find a product called EXSpeech. I think this is what you're looking for.
Is the poster just dissatisfied with existing software, or pissed because he wants to compute Star Trek style and never will?
Stephen Hawking uses text --> speech, not speech --> text (since he can't speak). Text --> speech is easy; speech --> text is not.
You can get a version of ViaVoice for the PocketPC. However, it sucks. It's not a real dictation system: it only lets you use a fairly small, predefined set of commands, not general English dictation. I was pretty disappointed. Still, I wouldn't be surprised if a full-blown ViaVoice Embedded version for the PocketPC eventually appears.
As usual, there are some results that come up with a simple Google search.
There was a Dragon NaturallySpeaking beta for Newton OS 2.1, and it works OK. But it's still a beta, and it's far from perfect.
If you're looking for voice recognition for other PDAs, including PalmOS or Linux devices, you'll probably have much less luck.
Only recently have PDAs been shipping with anything approaching a good DAC, and many PDAs still lack ADC support entirely. Without a good analog-to-digital converter (ADC) built into the PDA, you won't be able to do voice recognition. Remember that your 386 still needed a sound card to work properly; the same is true for PDAs today.
Since the asker wanted to know WHY nobody has done this yet, I'll spell it out:
Basically, the major pitfalls in developing this are: :)
1) Crappy algorithms that mangle what you really said into something unrelated
2) Power Consumption
3) Interfacing to the PDA (not hard to do, but non-trivial)
4) Limited PDA capabilities (Remember that Palm's DragonBall is a RISC architecture, and things like speech recognition NEED floating point math which must be emulated)
The solutions:
1) Somebody (not unlike me...) has to turn the better algorithms that already exist (check the literature: speech recognition is a mature technology, and publications abound) into a usable chunk of code, instead of simply recycling ViaVoice's or NaturallySpeaking's libraries.
2) Add more battery storage.
3) Use another processor to do the conversion, then simply write it to the Palm in a serial stream.
I would just wait about a year, then ask your physician friends that question again and see what they whip out of their pockets... :)
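Solution 3 leaves the serial format unspecified. Below is a minimal sketch assuming a hypothetical framing of our own design, not any real Palm or HotSync protocol: a 0x7E start byte, a 2-byte big-endian length, the UTF-8 text, and a 1-byte XOR checksum. An io.BytesIO object stands in for the serial port, and the names encode_frame and read_frame are illustrative.

```python
import io
import struct
from functools import reduce

START = 0x7E  # hypothetical start-of-frame marker, not a real Palm protocol byte

def checksum(payload: bytes) -> int:
    """XOR of all payload bytes: cheap enough for a slow handheld CPU."""
    return reduce(lambda acc, b: acc ^ b, payload, 0)

def encode_frame(text: str) -> bytes:
    """Frame recognized text: start byte, 2-byte length, UTF-8 payload, checksum."""
    payload = text.encode("utf-8")
    if len(payload) > 0xFFFF:
        raise ValueError("payload too long for a 16-bit length field")
    return struct.pack(">BH", START, len(payload)) + payload + bytes([checksum(payload)])

def read_frame(stream) -> str:
    """Read one frame from a byte stream (a serial port in practice)."""
    header = stream.read(3)
    if len(header) < 3:
        raise EOFError("incomplete header")
    start, length = struct.unpack(">BH", header)
    if start != START:
        raise ValueError("bad start byte")
    payload = stream.read(length)
    check = stream.read(1)
    if len(payload) != length or len(check) != 1:
        raise EOFError("incomplete frame")
    if check[0] != checksum(payload):
        raise ValueError("checksum mismatch")
    return payload.decode("utf-8")

# The recognizer box writes frames; the PDA side reads them back in order.
link = io.BytesIO(encode_frame("Dear editor,") + encode_frame("thanks for the proofs."))
print(read_frame(link))  # Dear editor,
print(read_frame(link))  # thanks for the proofs.
```

A length prefix lets the PDA read exactly one utterance at a time without scanning for delimiters, and the XOR checksum is about the cheapest integrity check a slow handheld could run; a CRC would catch more errors.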
The headphone isn't an issue; like you said, make it accept USB, get a good headset-type mic, and you're good.
The problem is in recognizing what you said: the best software out there still sucks, and you have to train it forever. No matter what, you will have to train it to recognize your voice. Me saying "car" and someone from Boston saying "car" sound drastically different, but they are the same word. Given a lot of training, you can get something halfway decent, but it still requires corrections. This is especially true if you have a cold, have just woken up, or are sleepy.
It's a very complex problem, and I don't see any significant breakthroughs anytime soon. I've used quite a lot of programs (with a good microphone), and you can get OK results, especially for simple commands like "Open" and "Close", but I think we're a long way from really good dictation software.
-Chris
http://www-3.ibm.com/software/speech/handheld/ipaq_fam.html
Lo, many years ago I had a lot of luck with EARS on my 66MHz 486. It's a very simple discrete, trainable recogniser; you have to teach it every word before it will recognise it. It was fast then, so it should be really fast now, and it was pretty decent at recognising simple commands.
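EARS's internals aren't described in the thread, so the following is only a sketch of one classic way to build a discrete, trainable recogniser: store one user-recorded template per word and pick the template with the lowest dynamic time warping (DTW) cost, which tolerates words spoken faster or slower than during training. The class name DiscreteRecognizer and the toy 1-D feature frames are assumptions; a real system would compare acoustic feature vectors (e.g. MFCCs) computed from the audio.

```python
import math
from typing import Dict, List, Sequence

Frame = Sequence[float]

def frame_dist(a: Frame, b: Frame) -> float:
    """Euclidean distance between two feature frames."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw(a: List[Frame], b: List[Frame]) -> float:
    """Dynamic time warping cost, letting either sequence stretch in time."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b repeats
                                 cost[i][j - 1],      # b advances, a repeats
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]

class DiscreteRecognizer:
    """Hypothetical speaker-trained isolated-word recognizer: one template per word."""

    def __init__(self) -> None:
        self.templates: Dict[str, List[Frame]] = {}

    def train(self, word: str, features: List[Frame]) -> None:
        # "You have to teach it every word": store the user's own example.
        self.templates[word] = features

    def recognize(self, features: List[Frame]) -> str:
        if not self.templates:
            raise ValueError("no words trained yet")
        # Nearest template under DTW cost wins.
        return min(self.templates, key=lambda w: dtw(features, self.templates[w]))

# Toy 1-D "features"; a real system would use e.g. MFCC frames.
rec = DiscreteRecognizer()
rec.train("open", [[0.0], [1.0], [2.0], [1.0]])
rec.train("close", [[3.0], [3.0], [0.0], [0.0]])
print(rec.recognize([[0.0], [0.9], [1.1], [2.0], [2.1], [1.0]]))  # open
```

Because the input is compared against every stored template, recognition cost grows with vocabulary size, which is one reason this approach suits small command sets better than general dictation.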
Unfortunately for you, discrete speech is seen as passé by the major players (IBM, L&H, MS). For a long time, recognizing continuous speech was seen as the major barrier to widespread acceptance of general-purpose dictation software (another barrier was support for large vocabularies). Eventually, processor power and algorithms evolved to the point that both barriers were overcome, and discrete speech (and small vocabularies) were left by the wayside.
One byproduct of this was worse performance on voice error correction. Most verbal corrections are single words (e.g., the user selects the misrecognized word "foo" and repeats the intended word "bar" without any of the coarticulation cues that the continuous recognition engine relies on). A continuous speech recognizer recognizes isolated words less accurately than a discrete system does, yet the major software companies removed the discrete recognition engines from their products. (For more on speech errors, see this or this PDF.)
Anyway, discrete recognition engines have essentially been abandoned by the major players and seem to have been relegated to the specialty shops that cater to disabled users. One outcome is that there is very little innovation in discrete speech, because it was one of the (many) historical barriers to the use of desktop speech reco. I can certainly understand the big companies' resistance to going back to an "inferior" recognition engine for handheld devices. Most likely, speech reco on the handheld will emerge in a client-server environment, with the speech signal (maybe somewhat processed) sent from the handheld to a server for recognition and the text returned to the handheld. We probably won't see a general-purpose speech recognition application (as opposed to a limited-vocabulary application) that runs solely on a handheld until continuous processing can be done entirely on the device.
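The client-server split described above can be sketched as follows, under stated assumptions: 8 kHz, 16-bit mono audio on the handheld, a per-frame log energy as a deliberately crude stand-in for real front-end features, and a server-side recognizer stubbed out as a callable. The function names client_features and server_handle are hypothetical, and the network transport itself is omitted.

```python
import math
import struct
from typing import Callable, List, Sequence

SAMPLE_RATE = 8000               # assumed: 8 kHz, 16-bit mono audio
FRAME_LEN = SAMPLE_RATE // 100   # 10 ms frames -> 80 samples each

def client_features(samples: Sequence[int]) -> bytes:
    """Handheld side: reduce raw audio to one log-energy value per frame.

    Log energy is a crude stand-in for a real recognizer front end.
    The payload is a 4-byte frame count followed by little-endian float32s.
    """
    feats = []
    for start in range(0, len(samples) - FRAME_LEN + 1, FRAME_LEN):
        frame = samples[start:start + FRAME_LEN]
        energy = sum(s * s for s in frame) / FRAME_LEN
        feats.append(math.log(energy + 1.0))   # +1 avoids log(0) on silence
    return struct.pack(f"<I{len(feats)}f", len(feats), *feats)

def server_handle(payload: bytes, recognize: Callable[[List[float]], str]) -> bytes:
    """Server side: unpack the features, run the heavy recognizer, return UTF-8 text."""
    (count,) = struct.unpack_from("<I", payload)
    feats = list(struct.unpack_from(f"<{count}f", payload, 4))
    return recognize(feats).encode("utf-8")

# One second of a synthetic 440 Hz tone: 16,000 bytes as raw 16-bit samples.
audio = [int(8000 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE)) for n in range(SAMPLE_RATE)]
payload = client_features(audio)
print(len(payload))  # 404: a 4-byte count plus 100 float32 values
# Hypothetical stand-in recognizer; a real server would run a full decoder here.
text = server_handle(payload, lambda feats: f"{len(feats)} frames received")
print(text.decode("utf-8"))  # 100 frames received
```

Even this crude front end shrinks one second of audio from 16,000 bytes of raw samples to a few hundred bytes, which is the appeal of sending a "somewhat processed" signal over a slow wireless link.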
and it also has 8k of storage! You could store all of "Hel" on that thing
See http://www-3.ibm.com/software/speech/handheld/ipaq_fam.html
Plantronics makes several headsets with microphones that need only a USB connection and no sound card. They work quite well, and this should lower the hardware requirements for a small, low-powered device.
http://www.plantronics.com
and search for their DSP-*00 series. I picked up their DSP-500 (normally $110) for $40 on a deal.
Everyone has gotten so used to the idea that computers will do exactly what we tell them. SR will never be 100% reliable (or even 99%) because of the noisy communication medium: air. Therefore you will always need some handy error-correction protocol (commonly called dialogue).
Have you ever wondered how well people recognize speech? If something is blurted out at random, we rarely catch the meaning the first time: "What?" If humans have a lot of trouble understanding each other (about a 20% error rate), then computers have no chance when it comes to out-of-the-box, out-of-the-blue dictation. And computers don't have the benefit of a decade of childhood, not to mention millions of years of evolution.
What I'm getting at is that computers need a great deal of context to succeed (to reduce the number of possible interpretations, and therefore the number of ways of getting it wrong).
(I'm a speech recognition engineer; our company went bust last year, another dot-bomb.)
1) the algorithms are good (trust me, I've seen them)
2) the training takes bloody ages: it takes weeks (and terabytes of data) to get good results across most of the speaking population.
3) dialogue is very hard.
4) actual recognition is fast (we had dozens of simultaneous recognitions on 600MHz machines).
The take home message: Train the users. Manage expectations. Say bye bye to HAL.
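The point about context and dialogue can be sketched as follows. This is a hypothetical illustration, not the engineer's actual system: it assumes some recognizer already returns an n-best list of (text, confidence) pairs, keeps only the hypotheses the current dialogue state allows, and asks for confirmation when the best remaining one falls below an assumed threshold of 0.6. The state names and phrase sets are made up.

```python
from typing import Dict, List, Optional, Set, Tuple

Hypothesis = Tuple[str, float]  # (text, confidence in 0..1) from some recognizer

# Hypothetical dialogue states and the phrases each one allows.
CONTEXT: Dict[str, Set[str]] = {
    "file_menu": {"open", "close", "save"},
    "yes_no": {"yes", "no"},
}

def interpret(nbest: List[Hypothesis], state: str,
              threshold: float = 0.6) -> Tuple[Optional[str], str]:
    """Pick the best hypothesis that the current context allows.

    Returns (accepted_text_or_None, prompt). Low confidence triggers a
    confirmation question instead of acting blindly.
    """
    allowed = [(t, c) for t, c in nbest if t in CONTEXT[state]]
    if not allowed:
        return None, "Sorry, what?"          # nothing fits: ask the user to repeat
    text, conf = max(allowed, key=lambda h: h[1])
    if conf < threshold:
        return None, f"Did you say '{text}'?"  # dialogue-based error correction
    return text, ""

# "clothes" scores highest acoustically, but the file menu only allows commands.
print(interpret([("clothes", 0.85), ("close", 0.70), ("nose", 0.10)], "file_menu"))  # ('close', '')
print(interpret([("no", 0.4)], "yes_no"))  # (None, "Did you say 'no'?")
```

Restricting candidates to what the context allows is what turns an acoustically likely but meaningless "clothes" into the intended "close"; the confirmation prompt is the simplest form of the dialogue described above.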
Wrong.
I work at a retail store that sells both PDAs and Dictaphones. A basic Palm PDA (the Zire or the M105, for example) runs $169.99 CDN...
A Dictaphone is $300.
Do the math: a basic Palm would be cheaper and more cost-effective...
There may be a better solution though...
Olympus, on the other hand, manufactures a very cool little digital voice recorder that has a USB docking station and uses MP3 compression to boot. Dictate to the recorder, download the MP3 to your desktop, run it through ViaVoice or Dragon NaturallySpeaking, and you get a Word document.
The DragonBall is Motorola's, not Palm's. It is CISC, not RISC; more specifically, an M68K. RISC is usually better than CISC at floating point, but both architectures can go without a floating-point unit, and that's what the DragonBall does.
Leandro Guimarães Faria Corcete DUTRA
DA, DBA, SysAdmin, Data Modeller
GNU Project, Debian GNU/Lin
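On chips without a floating-point unit, a common workaround is to avoid floats altogether and use fixed-point arithmetic. A minimal sketch of Q15 fixed point (16-bit values with 15 fractional bits), as might be used in a filter or feature-extraction step; the helper names are illustrative, and Python's unbounded integers stand in for the wide accumulator a real DSP routine would use.

```python
from typing import List

Q = 15            # Q15 format: sign bit plus 15 fractional bits in a 16-bit word
ONE = 1 << Q      # 1.0 in Q15 (not itself representable; the max is ONE - 1)

def to_q15(x: float) -> int:
    """Quantize x in [-1, 1) to a Q15 integer, saturating out-of-range values."""
    return max(-ONE, min(ONE - 1, round(x * ONE)))

def from_q15(q: int) -> float:
    """Convert a Q15 integer back to a float (for display only)."""
    return q / ONE

def q15_dot(a: List[int], b: List[int]) -> int:
    """Integer-only dot product of two Q15 vectors, e.g. one FIR filter output.

    Each Q15 x Q15 product is a Q30 value. Products are summed in Q30 and
    rounded back to Q15 once at the end to limit rounding error.
    """
    acc = 0
    for x, y in zip(a, b):
        acc += x * y                        # integer multiply only
    return (acc + (1 << (Q - 1))) >> Q      # round half up, Q30 -> Q15

signal = [0.5, -0.25, 0.125, 0.75]
taps = [0.1, 0.2, 0.3, 0.4]
exact = sum(s * t for s, t in zip(signal, taps))
approx = from_q15(q15_dot([to_q15(s) for s in signal], [to_q15(t) for t in taps]))
print(exact, approx)  # the two agree to within about 1e-5
```

The inner loop uses only integer multiplies, adds, and shifts, all of which an FPU-less M68K core handles natively.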
Some mics also do this mechanically: they have a port on the reverse side of the mic element, so it only detects pressure differences between the two sides of the mic, i.e. only nearby sounds coming from one side of the mic (your mouth). Plantronics has plenty of these. Such noise-cancelling (NC) headsets are common thanks to cellular telephone hands-free kits being required by law in some states, and they are quite good. (I love my Plantronics headset.)
Absolutely.
Any PDA dictation system would need at least 1000 triphones, which in total would take around 20MB.
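Taking the commenter's figures at face value, the implied average size of each triphone model is a simple division (decimal units):

```latex
\frac{20\ \text{MB}}{1000\ \text{triphones}} = 20\ \text{KB per triphone}
```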
Ack, it filtered out my URL.
http://www.dictaphone.com
Since it doesn't look too promising, you may want to expand your search beyond PDAs. I saw several references to the Linux-based Simputer; maybe one of those running Linux-based speech-to-text software is the way to go?
The guy doing the demo was probably giving a dumbed-down description of a basic microphone technique that's been in use for decades.
There are not two microphones in that headset; that would just make it worse, since no PC it would run on is real-time enough to match the sound samples together, etc., etc., etc.
Instead, they use a dual-port microphone. The element lies between the front of the mic (toward the speaker) and the back (toward ambient noise). Sound pressure from ambient noise tends to hit the front and back simultaneously, while sound pressure from the speaker hits only the front. The difference leaves mainly the speaker's voice, with external sound muted.
Even cheap mics have that now. The main difference between a good mic and a bad one is its construction and materials, which affect its response characteristics.
-Adam
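The dual-port principle can be checked with an idealized numerical model. This is a sketch under simplifying assumptions: ambient noise reaches both ports identically and at the same instant, and an assumed 20% of the talker's sound leaks to the back port. Real elements have port delays and frequency-dependent behavior that this ignores, and the function name dual_port_output is illustrative.

```python
import math
import random
from typing import List

def dual_port_output(speech: List[float], noise: List[float],
                     speech_leak: float = 0.2) -> List[float]:
    """Idealized pressure-difference (dual-port) mic element.

    Ambient noise reaches the front and back of the element equally, so it
    cancels in the difference. The nearby talker mostly hits the front port;
    `speech_leak` is the assumed fraction of speech that reaches the back.
    """
    front = [s + n for s, n in zip(speech, noise)]
    back = [speech_leak * s + n for s, n in zip(speech, noise)]
    return [f - b for f, b in zip(front, back)]  # the diaphragm sees the difference

def rms(xs: List[float]) -> float:
    """Root-mean-square level of a signal."""
    return math.sqrt(sum(x * x for x in xs) / len(xs))

random.seed(0)
speech = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(800)]
noise = [random.uniform(-1.0, 1.0) for _ in range(800)]
out = dual_port_output(speech, noise)
residual = [o - 0.8 * s for o, s in zip(out, speech)]
print(rms(residual))  # near zero: only floating-point rounding remains
```

In this model the output keeps 80% of the talker's signal (1 minus the assumed leak) while the noise term cancels exactly; in real hardware the cancellation is only partial.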