The Coming Wave of Gadgets That Listen and Obey
dgan brings us a NYTimes piece about the development of speech recognition for common gadgets. Companies such as Vlingo and Yap are marketing their software to cellular carriers to give consumers a hands-free option for tasks like finding directions and text messaging. Quoting:
"Vlingo's service lets people talk naturally, rather than making them use a limited number of set phrases. Dave Grannan, the company's chief executive, demonstrated the Vlingo Find application by asking his phone for a song by Mississippi John Hurt (try typing that with your thumbs), for the location of a local bakery and for a Web search for a consumer product. It was all fast and efficient. Vlingo is designed to adapt to the voice of its primary user, but I was also able to use Mr. Grannan's phone to find an address. The Find application is in the beta test phase at AT&T and Sprint. Consumers who use certain cellphones from those companies can download the application from vlingo.com."
Is it possible that all of mankind's dreams are coming true now?!
So basically, -1 troll/offtopic is really Slashdot's way of saying "I hate that you thought of something before me."
I wonder what other tasks this technology will provide hands-free options for?
"I'm afraid I can't do that, Dave."
The CB App. What's your 20?
User: Please connect me with Hugh Jass
Gadget: Sorry, I could not find a Hugh Jass
User: *snicker*
I can imagine the day we speak the name of some legislation into the phone and say "vote yes" or "vote no". The results show up on our congressman's web site and on some third-party sites that archive them. This way we take control from the few and transfer it to the less corruptible and wiser "many".
"an infinite player that has lost his finite mind" ~Infinite Play the Movie (it blends with reality)
I maintain great skepticism about speech recognition as an interface. It just isn't much faster than typing, even on a cell phone. It's not that it takes so much longer to get an ideal rendering; it's that even a minor error in translation results in about five seconds of prompting followed by reentry. Until they can get that figured out, or get accuracy up to a point where someone unused to giving dictation can use it, it's just not that great a technology.
I'll only be interested in gadgets which obey only what I tell them to do.
Limited phrasebook technology is a lot better than voice recognition technology in a lot of devices. Given that most (well, all) devices have limited functionality (not even Steve Jobs' iPod can do his taxes for him), there's very little point in giving the device the ability to understand possibly-misdirected phrases such as "Honey, have you seen the remote?". A good approach for this technology would be to limit it to understanding alternate ways of phrasing a particular command; "Device, Get Me A Beer"/"Device, Can I Have A Beer"/"I'm Really Thirsty". This way, we'd avoid misdirected speaking (the device thinking you're speaking to it instead of to another), and could also exploit the reduced set of understandable phrases to correct for people with colds/accents/quiet voices/etc, in much the same way as limited-phrasebook devices work (only with more flexibility).
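To make the limited-phrasebook idea concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the `PHRASEBOOK` table, the `get_beer` command name, the optional "device" wake word, and the 0.8 similarity cutoff are all made up, and it works on text that some recognizer is presumed to have already produced, not on audio. It uses the standard library's `difflib` to snap a slightly garbled phrase onto the nearest known phrasing and ignores everything else.

```python
import difflib
import re

# Hypothetical phrasebook: each command lists the phrasings it accepts.
PHRASEBOOK = {
    "get_beer": [
        "get me a beer",
        "can i have a beer",
        "i'm really thirsty",
    ],
}

WAKE_WORD = "device"  # assumption: an optional prefix addressing the device
CUTOFF = 0.8          # assumption: minimum similarity to accept a phrase

# Flatten the phrasebook into a phrase -> command lookup.
PHRASE_TO_COMMAND = {
    phrase: command
    for command, phrases in PHRASEBOOK.items()
    for phrase in phrases
}


def normalize(text):
    """Lowercase, drop punctuation (except apostrophes), collapse spaces."""
    text = re.sub(r"[^a-z' ]+", " ", text.lower())
    return " ".join(text.split())


def match_command(utterance):
    """Return the matching command name, or None if nothing is close enough."""
    words = normalize(utterance)
    if words.startswith(WAKE_WORD + " "):
        words = words[len(WAKE_WORD) + 1:]
    # With so few candidate phrases, a near miss can be corrected safely.
    close = difflib.get_close_matches(
        words, list(PHRASE_TO_COMMAND), n=1, cutoff=CUTOFF
    )
    return PHRASE_TO_COMMAND[close[0]] if close else None
```

With this table, "Device, get me a bear" still resolves to `get_beer`, while "Honey, have you seen the remote?" is ignored.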
Commodore64_love: I don't comprehend people who're so frightened of death that they'll bankrupt themselves to stay alive
I'm sorry Dave...
"Didn't I ask you to start the Roomba, Dear" (click) "Do it yourself, Roomba my ass...."
Use your head, can't you, use your head,
You're on earth, there's no cure for that - S. Beckett
Seems to be a "layer mismatch". Analogy is the OSI model.
I'll stick to using voice for "higher layer" communication with actual intelligences like humans and other animals. For "lower layer" comms you don't use your voice.
If you ride a horse, you do talk to the horse sometimes, but the talking is for the "higher layer"; you use reins and body for the "lower layer".
The last I checked all these gadgets and devices are pretty stupid, definitely no real AI. So it'll be more gimmicky than actually useful.
For such things that are to work as extensions or augmentations of your self it is silly and impractical to try to control them using your voice.
You won't want to have to control artificial limbs with your voice.
How about we skip this and move on to controlling such stuff with thoughts instead? If necessary for first gen devices I'm sure we can come up with our own thought macros; if they're unique enough there won't be a "collision/clash" with normal thinking.
Obedient, huh? Get a job and bring home some cash!
Most of the cell phone systems described in the article are likely uploading the audio to a server farm, running recognition there, and then sending back the response.
Proof that in the coming new-wave market all of the good names are taken.
The trick is to become lucid in it.
Dear aunt, let's set so double the killer delete select all.
I have put some thought into this problem via a hobby of robotics, and consequently have read quite a few papers etc.
The trouble with this can be summed up like this: Would you typically go through your day with a 6 year old, giving the 6yr old instructions on who to dial, what emails to send etc.?
No? Then you can forget the voice recognition stuff. Voice recognition substitutes What? for the typical 6yr old's Why?
There are a lot of people who have VR dialing on their phone now. Do you ever see anyone using it? Wonder why?
Support NYCountryLawyer RIAA vs People
This technology is wonderful! I always wished I could enter text messages through voice rather than type them in. They could even improve it on the receiving side with text-to-speech technology, perhaps even automatically matching the voice of the person sending the text message. Imagine, you would just have to speak into your telephone and the person on the other side would hear your message in your own voice! Amazing!
If we ever let them learn how to lip read, we are doomed!
Another company seems to have developed speech recognition engines for embedded devices in languages other than English. Speech recognition has a potentially huge user base (in the tens or hundreds of millions at least) if they can crack the problem for native Indian and Chinese languages.
Both Indian and Chinese researchers seem to have made progress in this. If this work is successful, people wouldn't need to learn English to access information on the web etc. With the booming mobile telecom sector and the proliferation of fairly powerful (architecture-wise) phones, this could well be the right time to introduce this. Mobile vendors are already innovating, with text messaging now being available in local languages. But a functional speech recognition system could open up completely new areas in the non-urban landscape. There is a lot of scope for the sister technology (speech synthesis) too, if it can be implemented with reasonable success in native languages. Ideally, this technology could act like a Google Translate for voice. It could break the language barrier at one stroke. Unfortunately, speech synthesis seems to be much more nascent.
Personally I'd rather push buttons, than vocalize, to get my gadgets and appliances to do stuff.
Isn't it bad enough having people walking down the street apparently talking to themselves with Bluetooth headsets?
Now we can have, "What did you say honey?",
"No Dear, I was talking to the microwave."
Is it the case that these devices are, by definition, not overlords, and therefore I cannot say "I, for one, welcome our obeying robotic ..."? Dangit. That's pretty much all I've got.
What I'd like to see (as a world traveller) would be a mobile phone that can pair up with 2 Bluetooth headsets, and translate between the different languages coming into each. That might make it easier to chat with all the beautiful, but differently-languaged babes the world is so full of. The age-old incentive for development is there, so surely something like this has to appear.
"I bless every day that I continue to live, for every day is pure profit."
That's all fine, but do we really need another idiotic web 2.0 name for a startup? Vlingo?? REALLY!?!? Haven't we had enough of vongo, twitter, oyogi, flickr, xuqa, blinkx, sharkle, squidoo, zemq, diigo, frappr, joost, zingee, vyew, bebo?
An old-timer with old-timey ideas.
There, I said it. Voice-recognition shit (most especially attempts at "natural language" parsing) never, ever, ever works right for me -- or anyone that I know or discuss it with. It never works right. On phone networks we all just wind up frustrated, wasting time, swearing obscenities into the phone until it finally turns us over to a live human operator, in a much-worse mood.
It sucks and I hate it and it's bullshit and the charlatans selling this shit should be shot in the kneecaps. You're *garbage*.
We know where leadership by an anti-intellectual "strongman" who scapegoats minorities and likes boisterous rallies goes
Isn't that what we need? I'll have my secretary handle all my text messaging from now on.
dude barks at me:
you moron, why?
So I ask him in a severe tone of voice:
Why, in the empirical sense?
To which he responds:
I'm gonna kick your fucking ass!
At which point I began:
beating (defensively) his face to a pulp with a pair (one per hand) of #303 cans of tomatoes (one crushed, one diced).
It wasn't until the pod fell from his ear and I heard a little voice crying:
Ben, Ben, I've been so good for you.
That I realized he had not been speaking to me at all.
Anyone have one of these voice-activated R2-D2 robots yet?
More importantly, has anyone ever hacked one?
"Beware of bugs in the above code; I have only proved it correct, not tried it." -- Donald Knuth
Scotty: Computer? Computer?
[Bones hands him a mouse and he speaks into it]
Scotty: Hello, computer.
Dr. Nichols: Just use the keyboard.
Scotty: Keyboard. How quaint.
I can see it from here. You will have both a car and a cell phone that are voice-activated. What could possibly go wrong, right? Best case scenario: As you try to send a text message over the phone while driving around, the car will be like, "You talkin' to me? You talkin' to me?" "No, damn it, I'm writing an e-mail!" The phone: "Sorry, I thought you were talking to the car, there. Would you mind repeating that?" You: "Ah, never mind."
"Dave Grannan, the company's chief executive, demonstrated the Vlingo Find application by asking his phone for a song by Mississippi John Hurt (try typing that with your thumbs)"
I am not impressed. I will bet you a nickel that he tried that out prior to the demonstration, and made sure there was nothing similar that might come up by accident. I would be impressed if he had given the mike to reporter Michael Fitzgerald and Fitzgerald had tried it.
At trade shows, I used to watch all sorts of demonstrations of OCR and voice recognition technology. In the old days, I would always ask to try it myself. Whenever I was allowed to, it was always a grotesque, dismal failure (followed by lame assertions that it would be fine if I gave it a little more training). In more recent times, demonstrators at trade shows have smartened up and refuse to depart from the script or allow any third party to try the stuff.
A lot of this reminds me of a gadget that was made for Lionel train sets... not by Lionel, I don't think... called something like "Voice Commander." The box showed a kid saying "Stop! Please move forward! Stop! Please back up!" and the train obeying.
It wasn't exactly a scam, it was just... well... one of these "limited phrasebook" deals. In this case, the limited phrasebook consisted of the single letter "P." The "microphone" actually had a little vane activated by air movement, and the letter "P" was about the only thing that would trip it. So if you said anything with the letter "P" in it, it would momentarily interrupt the current. Meanwhile, Lionel trains were designed with a stepping switch in them, and periodic interruptions would sequentially cause the train to stop, go forward, stop, and go in reverse.
So, yeah, you could control the train with your voice. But it didn't necessarily do what you told it to do.
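For the curious, the behaviour is easy to model as a four-position state machine. This is only a toy Python sketch: the class and function names are made up, one puff per word containing a "p" is my simplification of how the vane behaved, and I'm assuming the train starts out moving forward.

```python
# The stepping switch cycles through these states, one step per interruption.
SEQUENCE = ("stop", "forward", "stop", "reverse")


class SteppingSwitch:
    """Toy model of the train's stepping switch (names are hypothetical)."""

    def __init__(self, position=1):
        # Assumption: position 1 means the train starts out going forward.
        self.position = position

    @property
    def state(self):
        return SEQUENCE[self.position]

    def interrupt(self):
        """Advance one step, as a momentary loss of current would."""
        self.position = (self.position + 1) % len(SEQUENCE)
        return self.state


def count_puffs(utterance):
    """Assumption: every word containing the letter 'p' trips the vane once."""
    return sum(1 for word in utterance.split() if "p" in word.lower())


def say(switch, utterance):
    """Apply one interruption per puff and return what the train ends up doing."""
    for _ in range(count_puffs(utterance)):
        switch.interrupt()
    return switch.state
```

Under those assumptions, "Stop!" does stop a forward-moving train, but "Please move forward!" said right afterwards steps it on to reverse, because only the "P" in "Please" registers.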
I agree that today's applications go a little farther than that, but I still have the feeling that the people who say that speech recognition is here have lowered the bar for what "speech recognition" ought to mean.
I also have the impression that over the last ten years, what has happened is not that speech recognition has improved much, but that it's stayed the same and gotten cheaper... so crappy speech recognition is finding its way everywhere.
I'm still ticked off at the "hands-free" gadget I got for my cell phone that was supposed to do voice recognition (or, more correctly, use the voice recognition built into the phone). When you're driving in traffic, and it says "Should I place the call?" and you say "Yes," and it says "Did you say 'yes'?" and you say "Yes," and... lather, rinse and repeat, with your voice gradually becoming less and less intelligible with frustration... I am not at all sure that the demand on my attention is negligible.
"How to Do Nothing," kids activities, back in print!
I am looking forward to the day when I can get a cognitive reply from a GPS navigation device when I shout at it "WHERE THE FUCK AM I?"
To do something right, you often have to roll up your sleeves and get busy.
I've actually tried out the vlingo application a couple of times, and the speech recognition is surprisingly good. They trained the system on a vast number of business names and addresses (easily over a million), and thus the application of vlingo I used was for "point of interest" queries in mobile search. When their CTO said "find me a Starbuck's in " and it worked, I naturally wanted to test it on other, odder queries. Even though the server-based recognition had adapted itself for the CTO's voice (based on the caller ID information of his phone), I tried "find me Caribou Coffee in Wheaton Illinois" and it got it word for word. I tried a couple more place queries and even one that was fictitious but plausible, and it worked fine: their system is not based on a fixed speech grammar outlining all possible expected utterances, but a much more flexible statistical approach based on phoneme lattices. Voice input seems very appealing for mobile search when you contrast it to keypad entry. This study of a million Google Local Mobile queries showed that it took 56-63 seconds -- a full minute! -- to enter an average query by 12-key keypad, and about half that to enter the query via a PDA with a stylus and virtual keypad. So a speech recognition interface that does it in 2-3 seconds is a huge win if the accuracy is high enough for most users. I feel vlingo is at least tantalizingly close to this level of accuracy. You can get a feel for a similar system by trying out Google's free 1.800.GOOG411, to see how it works for you.
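For a rough sense of scale, the figures above work out as follows. This is just arithmetic on the numbers quoted in this comment (56-63 seconds for the keypad, about half that for the stylus, 2-3 seconds for speech); the function name is made up, and it ignores any time lost to recognition errors.

```python
def speedup_range(slow, fast):
    """Return (worst-case, best-case) speedup for two (low, high) ranges in seconds."""
    slow_lo, slow_hi = slow
    fast_lo, fast_hi = fast
    # Worst case: fastest slow entry vs. slowest fast entry; best case: the reverse.
    return slow_lo / fast_hi, slow_hi / fast_lo


KEYPAD = (56.0, 63.0)                    # seconds, from the quoted study
STYLUS = (KEYPAD[0] / 2, KEYPAD[1] / 2)  # "about half that"
SPEECH = (2.0, 3.0)                      # seconds, as claimed above

if __name__ == "__main__":
    for name, times in (("keypad", KEYPAD), ("stylus", STYLUS)):
        worst, best = speedup_range(times, SPEECH)
        print(f"speech vs {name}: {worst:.1f}x to {best:.1f}x faster")
```

That comes to somewhere between roughly 19 and 31 times faster than the keypad, and roughly 9 to 16 times faster than the stylus, before counting any retries.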
Ummm, I think they are getting ahead of themselves quite a bit.
"Obey" implies a choice. If my gadgets can choose to listen to me, then I can see the day when some of my devices rebel against me.
I can also see the day when all of the devices walk out of my Pointy Haired Boss's office, look at me and say, "We're not working for that fucking idiot anymore!".
Some phone menus are now speech-only, which I find annoying. I have had to call large corporations on my lunch break, expecting to eat while I punched in numbers to get to the right person and sat on hold.
To my dismay, I had to speak every menu option, so I had to stop eating. Since the menu also misunderstood my speech, I got misdirected a time or two as well.
You can imagine this happening to people who are calling from a noisy environment, like a subway, or outside when a train is passing. If I must talk to a machine, it would be nice to have the option of keying in my choices.
I think they tried this in cars some years ago - verbal alerts - and drivers hated it.