Are You Talking to Your PC Yet?
An anonymous reader writes: "If you have ever asked, 'Do those speech-to-text apps like Dragon NaturallySpeaking and IBM ViaVoice really work?', Pocket PC Addict has posted a detailed review of Dragon NaturallySpeaking for Pocket PC and desktop machines. It is written from the perspective of someone who has been burned by speech-to-text software in the past and had vowed never to try one of these apps again. It is encouraging for slow typists who would like to use their voice to write. Plus, it details some valuable tips for using it with Pocket PCs."
OS X has really good system-wide integration for voice commands, and the voice interpreter is pretty good for one that comes with the OS, but I could not get it to work consistently...
:-) (when it worked)
Other than that, I thought it was cool to say "Computer, give me Brad's number" and have it display my buddy Brad's phone number on the screen.
My accent screws up everything. I hate my accent.
Is anyone out there giving any thought to how a programming language should be structured to make it easy to code using a speech recognition engine?
If not, why not?
Sphinx is a great project. I definitely recommend it.
1. It's awkward to talk when you're composing something that requires a lot of thought first. I usually like to talk to myself (either out loud or in my head) and then type out what I'm thinking in a more formal fashion.
2. It is very tedious to go back and edit or make corrections. If I make an error while typing, I'm aware of it very soon after it happens. With voice recognition, technically "someone else" is typing, and it takes more time to see where the mistakes were made.
3. I deal with lots of boilerplate text with original content intermingled. Working on such a text often becomes an editing process, where the keyboard and mouse are more efficient.
4. My voice doesn't last much longer than 30 minutes of non-stop speaking, and that's with short breaks for water.
Conclusion: Just hire a hot secretary that can type.
...it worked OK as long as you trained it properly and had a nice quiet room and a good mic. However, there are issues with "voice typing" that can't be overlooked. The primary one is security. If you want to type a document or e-mail that contains sensitive data, make damn sure that no one can hear you. My bank recently moved to a voice-activated system. I'm surprised they haven't gotten a ton of complaints, since it REQUIRES you to say your SS# and PIN out loud. This means I can no longer check my account from my cell phone or at work. If you sit down and think about how many things you type that you would never want to say out loud, you can see why voice typing hasn't taken off. Imagine this emanating from your cubicle in a monotone:
;P
"http://www.goat.cx/ Take that you bukkake loving lunixtards."
Your co-workers would think you were a nutjob if they saw half of what you posted as AC to Slashdot.
I don't know if they have improved, but I am not too keen to try again.
Dragon comes in accent variants for particular countries, but unfortunately, I have a slightly mixed accent.
This means I once spent an entire evening "training" the software, repeating specific words over and over. It drives a person over the edge very quickly.
I am sure that if I ever need to type something that is all obscenities, it will be really well trained. If I need to use any words with an "oo" sound in them, I think I will have to type.
The Macintosh has had this since the Quadra, and it has the same problem. You drop aliases (or whatever) into your Speakable Items folder and it matches them to your speech. The problem is, once you get fifteen or twenty things in there (maybe more on a Power Mac, but I'm talking about the Quadra experience here), it would frequently match things that made no sense whatsoever.
I use text-to-speech in several crontab entries. Chip (yes, that's the computer's name) announces basic daily schedule items, such as the date in the morning, the kid's bedtime, and a final sign-off at 11pm. I added some checks so it won't talk whenever iDVD or iTunes is running. I used to have it read news headlines too, but it talked too often and we tuned it out.
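For anyone wanting to copy the idea, here is a minimal sketch of such an announcer, assuming macOS's built-in `say` command and `ps` are available. It is not Chip's actual script: the script name, the quiet-app list, and the helper names are hypothetical. It checks the running processes and stays silent if any "quiet" app is up.

```python
import shutil
import subprocess
import sys

# Hypothetical list: apps during which announcements are suppressed.
QUIET_APPS = {"iDVD", "iTunes"}


def parse_ps_output(out: str) -> set:
    """Turn `ps -A -o comm=` output into bare process names.

    On macOS, comm is often a full path such as
    /Applications/iTunes.app/Contents/MacOS/iTunes, so keep the last component.
    """
    return {line.strip().rsplit("/", 1)[-1] for line in out.splitlines() if line.strip()}


def should_speak(running: set) -> bool:
    """Stay silent if any quiet app is currently running."""
    return not (QUIET_APPS & running)


def announce(text: str) -> None:
    if shutil.which("say") is None:
        # No `say` (not macOS): just print, so the script still runs elsewhere.
        print(text)
        return
    ps = subprocess.run(["ps", "-A", "-o", "comm="], capture_output=True, text=True)
    if should_speak(parse_ps_output(ps.stdout)):
        subprocess.run(["say", text], check=False)


if __name__ == "__main__":
    # Example: announce.py "It is bedtime"
    announce(" ".join(sys.argv[1:]) or "Hello")
```

A crontab line for the 11pm sign-off might then look like `0 23 * * * /usr/bin/python3 /Users/you/bin/announce.py "Good night"`, where the script path is a placeholder.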
I also tried some "Speakable Items" for basic tasks. Essentially, there is a special folder containing a number of AppleScript files, and the filenames are their voice triggers: if the computer hears you say one of those filenames, it runs the AppleScript. Nested directories hold items for specific applications, so you can speak either the global commands or the active app's specific commands. Well thought out.
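The lookup described above (global items, plus items for the active application, keyed by filename) can be sketched roughly as follows. This is only an illustration under stated assumptions: the tables and paths are hypothetical, and matching transcribed text with `difflib` stands in for what the real speech engine does acoustically. The cutoff shows one way to avoid the Quadra-era complaint of nonsense matches: reject weak matches instead of running the least-bad item.

```python
import difflib
from typing import Optional

# Hypothetical item tables: spoken phrase (the filename) -> script path.
GLOBAL_ITEMS = {
    "what day is it": "items/what day is it.scpt",
    "open my mail": "items/open my mail.scpt",
    "tell me the time": "items/tell me the time.scpt",
}
APP_ITEMS = {
    "Safari": {"reload this page": "items/Safari/reload this page.scpt"},
}


def resolve(heard: str, active_app: str, cutoff: float = 0.8) -> Optional[str]:
    """Return the script for a heard phrase, or None if nothing is close enough."""
    # Active app's items are merged over the global ones, so they win on identical names.
    candidates = {**GLOBAL_ITEMS, **APP_ITEMS.get(active_app, {})}
    best = difflib.get_close_matches(heard.lower().strip(), list(candidates), n=1, cutoff=cutoff)
    # Below the cutoff, do nothing rather than run an unrelated script.
    return candidates[best[0]] if best else None
```

With these tables, "what day is this" still resolves to the "what day is it" item, while "reload this page" only resolves when Safari is the active app.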
Some Speakable Items could come in handy, but the eMac microphone is too limited to command the machine from across the room. You also cannot have a set of Speakable Items that stays active when nobody is logged in, so I need to keep a user logged in (and then switch away with user switching). Lastly, for most of the automation tasks I'd like to run, Perl or Bash is a better choice than AppleScript, but Speakable Items must be special text-command files or AppleScripts, and I can't imagine writing an AppleScript stub for every Unix-style script I would write. Each of these limits the usefulness of the voice-commandable appliance I was hoping for.
On the utility side, speech command would be great for specific queries, "Chip, what day is it?" and generic countdowns: "Chip, give me ten!" and he'll tell you when ten minutes have elapsed.
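The countdown command is simple enough to sketch. Assuming the recognizer has already turned speech into text, a minimal version parses "give me N" and starts a timer. The word list, the regular expression, and the message wording are all hypothetical; `speak` defaults to `print` but could be the `say`-based announcer from a cron setup.

```python
import re
import threading
from typing import Callable, Optional

# Hypothetical word list; extend as needed.
WORD_NUMBERS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "fifteen": 15, "twenty": 20, "thirty": 30,
}

# Optional "chip," prefix, then "give me <number>" with an optional "!".
PATTERN = re.compile(r"(?:chip,?\s+)?give me (\w+)!?")


def parse_countdown(phrase: str) -> Optional[int]:
    """Return the number of minutes for a 'give me N' command, or None."""
    m = PATTERN.fullmatch(phrase.strip().lower())
    if not m:
        return None
    token = m.group(1)
    return int(token) if token.isdigit() else WORD_NUMBERS.get(token)


def start_countdown(minutes: int, speak: Callable[[str], None] = print) -> threading.Timer:
    """Call speak() once `minutes` minutes have elapsed (runs in a background thread)."""
    timer = threading.Timer(minutes * 60, speak, args=(f"{minutes} minutes are up.",))
    timer.daemon = True
    timer.start()
    return timer
```

For example, `start_countdown(parse_countdown("Chip, give me ten!"))` announces after ten minutes.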
IMHO, the problem with these engines is that they don't separate speech-to-phoneme recognition from phoneme-to-text conversion. :-/
If someone designs a good open-source speech-to-phoneme architecture, I'm sure people would start working on phoneme-to-text AI algorithms.
They say: "Open source? Death!!! Where will our revenues for research go?"
But... what use is patenting/selling something that doesn't work in the first place?
Again, this is only my personal opinion. (I couldn't RTFA because... *slashdotted*.)
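To make the proposed split concrete, here is a toy sketch of the second stage only, assuming stage one already emits phones in an ARPAbet-style notation like CMUdict's (stress marks dropped). The four-word lexicon is made up for illustration. It returns every way to spell the phone string with lexicon words, which shows why phoneme-to-text needs its own "AI" (some language model) to pick among candidates.

```python
from functools import lru_cache
from typing import List, Sequence

# Toy stage-2 lexicon (hypothetical): word -> ARPAbet-style phones, stress dropped.
LEXICON = {
    "i": ("AY",),
    "ice": ("AY", "S"),
    "scream": ("S", "K", "R", "IY", "M"),
    "cream": ("K", "R", "IY", "M"),
}


def phonemes_to_text(phones: Sequence[str]) -> List[str]:
    """Stage 2: return every segmentation of a phone string into lexicon words."""
    phones = tuple(phones)

    @lru_cache(maxsize=None)
    def parses_from(i: int) -> List[List[str]]:
        if i == len(phones):
            return [[]]  # exactly one parse of the empty remainder
        results = []
        for word, pron in LEXICON.items():
            if phones[i:i + len(pron)] == pron:
                for rest in parses_from(i + len(pron)):
                    results.append([word] + rest)
        return results

    return [" ".join(words) for words in parses_from(0)]
```

`phonemes_to_text(["AY", "S", "K", "R", "IY", "M"])` yields both "i scream" and "ice cream"; choosing between them is exactly the job a phoneme-to-text model would take on.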
Some of my classmates at Rice University and I are in the process of finishing up some Digital Signals projects.
One group is trying to do voice conversion: one person says something and you change it to another person's voice. They're having fair success... for the phrases they've already recorded.
My group recognizes an instrument playing within a larger context; it works as long as the context isn't _too_ large (e.g., picking out a clarinet in a recording of an orchestra).
What we've learned from this: if you matched-filter your voice against the samples you saved on your computer, the matching works _extremely_ well (with well-chosen threshold values). We test with on the order of 40 notes, and it picks each one out very, very cleanly.
You might be able to do something similar with your voice and the commands on the computer.
Michael Lawrence
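The group's own code isn't shown, so here is a minimal sketch of the general idea: normalized cross-correlation, i.e. a matched filter whose output is divided by the energies of the window and the template, so that one threshold works regardless of how loud the input is. The template values, the 0.9 threshold, and the function names are assumptions for illustration.

```python
import math
from typing import List, Sequence


def normalized_xcorr(signal: Sequence[float], template: Sequence[float]) -> List[float]:
    """Slide the template over the signal; return a score in [-1, 1] at each offset."""
    n = len(template)
    t_mean = sum(template) / n
    t_c = [x - t_mean for x in template]
    t_norm = math.sqrt(sum(x * x for x in t_c))
    scores = []
    for k in range(len(signal) - n + 1):
        window = signal[k:k + n]
        w_mean = sum(window) / n
        w_c = [x - w_mean for x in window]
        w_norm = math.sqrt(sum(x * x for x in w_c))
        if t_norm == 0 or w_norm == 0:
            scores.append(0.0)  # flat window (e.g. silence): no match
        else:
            scores.append(sum(a * b for a, b in zip(w_c, t_c)) / (w_norm * t_norm))
    return scores


def detect(signal: Sequence[float], template: Sequence[float], threshold: float = 0.9) -> List[int]:
    """Offsets where the stored sample matches well enough."""
    return [k for k, s in enumerate(normalized_xcorr(signal, template)) if s >= threshold]


template = [1.0, 3.0, -2.0, 0.5]
signal = [0.0] * 5 + [2 * x for x in template] + [0.0] * 5  # louder copy at offset 5
print(detect(signal, template))
```

Because of the normalization, the copy played at twice the volume still scores 1.0, which is why threshold choice becomes manageable. A real command recognizer would correlate feature frames rather than raw samples, since spoken words vary in timing far more than recorded notes.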
I tried iListen when my wife was having difficulty typing. I ran through the training and played with it a bit to get familiar with it prior to having her use it. The accuracy rate was very high for me.
Then she tried to use it. Even the training procedure was difficult for her. She grew up in the Midwest and had no discernible accent, so that wasn't the problem. As near as I could determine, she didn't always use the same inflection when saying many words. Without consistent pronunciation, the software couldn't learn correctly. She became very frustrated, which led to her over-enunciating the word in question, which just confused the software even more. It became shelfware.
I dragged it out a month ago and started using it again. I've gotta say, the response time on a Dual-G5 is pretty impressive. And for the smartasses out there: no, I'm not using it for this post; I'm at work. Isn't that where everyone reads Slashdot?
I've been working recently on a language I call 'verbal'. My goal initially was a language I could use in the car, while driving. (I love to code.)
I realized that such a language would be useful for blind people and anyone who couldn't type.
The target is a language that will mimic a subset of English, so that a program might be:
I've written a compiler that translates that kind of thing into C, but I'm not releasing it just yet. It only has the type int, and no functions or objects. As soon as it can handle objects, I'll post it quietly.
(I got stuck for a day doing an elegant itoa.c, but that's done now. All I needed it for was to generate good labels for constants on the symbol table, and sprintf didn't fit right. Of course I found a slightly simpler one after I got it done.)
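The author's itoa.c isn't posted, so this is just the textbook algorithm for comparison, written in Python rather than C: repeatedly divide by the base, collect remainders as digits, then reverse. In C the same loop would fill a char buffer from the end. The `const_label` helper and its `const_`/`m` naming scheme are hypothetical, chosen so negative constants still produce valid C identifiers.

```python
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def itoa(value: int, base: int = 10) -> str:
    """Integer to string by repeated division (digits come out least significant first)."""
    if not 2 <= base <= 36:
        raise ValueError("base must be in 2..36")
    if value == 0:
        return "0"
    negative = value < 0
    n = -value if negative else value
    out = []
    while n:
        n, r = divmod(n, base)
        out.append(DIGITS[r])
    if negative:
        out.append("-")
    return "".join(reversed(out))


def const_label(value: int) -> str:
    """Hypothetical symbol-table label: no '-' allowed, so negatives get an 'm'."""
    return "const_" + ("m" + itoa(-value) if value < 0 else itoa(value))
```

For example, `itoa(255, 16)` gives "ff", and `const_label(-7)` gives "const_m7".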
Exactly. Also, when will these programs start filtering out what the computer itself is playing through the speakers? Is there a sensible way to do this that compensates for manual volume controls the PC doesn't know about or control? I nearly always have some music on in the background...
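The usual technique for this is acoustic echo cancellation. Here is a minimal sketch using a normalized LMS (NLMS) adaptive filter, assuming the recognizer can tap the exact samples being sent to the speakers (the "reference"). The filter learns the speaker-to-microphone path, including whatever gain the external volume knob adds, and subtracts its estimate from the mic signal. Tap count and step size are illustrative assumptions; a practical canceller would also need double-talk detection so it doesn't adapt on your voice.

```python
from typing import List, Sequence


def nlms_echo_cancel(reference: Sequence[float], mic: Sequence[float],
                     taps: int = 8, mu: float = 0.5, eps: float = 1e-6) -> List[float]:
    """Remove the part of `mic` that is a filtered copy of `reference`.

    reference: samples the computer sent to the speakers.
    mic: what the microphone picked up (echo of the reference plus the user's voice).
    Returns the residual, which is what a recognizer would be fed.
    """
    w = [0.0] * taps        # adaptive estimate of the speaker-to-mic path (incl. volume)
    history = [0.0] * taps  # most recent reference samples, newest first
    residual = []
    for x, d in zip(reference, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(wi * hi for wi, hi in zip(w, history))
        e = d - echo_estimate
        # NLMS update: dividing by input power keeps the step size independent of playback level.
        power = eps + sum(h * h for h in history)
        w = [wi + mu * e * hi / power for wi, hi in zip(w, history)]
        residual.append(e)
    return residual
```

If someone turns the volume knob, the path gain changes and the filter simply re-converges, which is why the PC doesn't need to know the knob's setting.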