The Uncanny Valley of Voice Recognition
An anonymous reader writes: We've often seen the term "uncanny valley" applied to the field of robotics — it's easy to get unsettled when robots act close to being human, yet fail completely in a few key ways. GitHub Engineer Zach Holman writes that we've now reached uncanny valley territory in speech recognition as well, though the results are more frustrating than they are disturbing. He says, "Part of this frustration is the user interface itself is less standardized than the desktop or mobile device UI you're used to. Even the basic terminology can feel pretty inconsistent if you're jumping back and forth between platforms.
Siri aims to be completely conversational: Do you think the freshman Congressman from California's Twelfth deserved to sit on HUAC, and how did that impact his future relationship with J. Edgar? Xbox One is basically an oral command line interface, of the form: Xbox (direct object). ...it's these inconsistencies that are frustrating as you jump back and forth between devices. And we're only going to scale this up."
i thought so.
I fail to see how the "inconsistency" of speech recognition UIs is any more earth-shattering than the inconsistency between graphical UIs. People learn to use what they have, no more, no less. Anyone who "expects" device Y to behave like device X when they're from different vendors is a fool.
Hell, even Android devices aren't consistent between vendors, and they start off with the same core code!
I do not fail; I succeed at finding out what does not work.
Most of the frustration is because most of the time the 'voice recognition' doesn't, you know, recognise anything.
We're a long way off an uncanny valley situation. How about we get to 'basic functionality' first?
Xbox Blow up doll.
Blowing everything up.
That is the problem: human language is very ambiguous and context-sensitive, which is the whole reason we invented programming languages instead of trying to express programs in English. Either you limit yourself to a set of simple unambiguous commands or you try to parse what we're really trying to say, which is like giving the computer the business requirements document and telling it to program itself. Fortunately for our job security, that "valley" won't be crossed any time soon. People imagine it'll be like the Star Trek computers, which happen to know exactly what we're looking for and provide the essential answers to advance the plot. I guess we're making advances on answering trivia questions and adding appointments to the calendar, but it's not exactly ready to hold a conversation.
Live today, because you never know what tomorrow brings
I find Siri very annoying. It has a few tricks and tries to act cute, but its cuteness means that it gives the wrong answer half the time. For instance, a simple question like "Can you get chickenpox from chickens?" gets a reply of "Who, me?" A human can easily understand that this simple question isn't directly addressed to them, and Google voice search, not trying to have a persona of its own, is smart enough to just do a search for an answer it doesn't know instead of being a smart aleck. I've actually installed Google voice search on my iPhone because it doesn't try to act human and tends to give better results for everything but actual dealings with the phone's internal software. I just wish I could remap the Siri button to load Google voice search instead.
Variety is different from the Uncanny Valley.
Don't waste your vote! Vote for whoever you want, unless you live in a swing state it won't matter anyways
It's hard to imagine anyone who's actually used Siri thinking that question could get a useful answer. Siri can't understand even far more basic English. It's not much more advanced than Dr. Sbaitso.
Do you always get 98%? I've noticed that the recognition rate I get goes down about 2% for each increment of 0.01% of my blood alcohol content.
lucm, indeed.
As I understand it, the "Uncanny Valley" refers to things that are very close to human behavior--close enough that the mind shifts from seeing an imperfect representation of a human to seeing an imperfect human.
Personally, I'm not sure there would really be an issue with "uncanny valley" in regards to speech recognition. It's good if it recognizes what you're saying. It's bad if it doesn't. There isn't really a middle ground where it's off in a way you can't really identify, which is where "uncanny valley" comes from.
What he seems to be talking about is the "personification" of "digital assistants" like Siri and Alexa (Amazon Echo) which will eventually create an "uncanny valley." But I'm not sure that it's really that big of an issue. Just because I call something by name doesn't mean I expect it to behave in a human fashion. I don't get frustrated with my dog when I say, "Fido, change the oil in my car" and the dog just lies there and licks his balls, so I don't expect I'll ever get that frustrated because Siri can't tell me what time the sun will set next Tuesday--or, if I do, my frustration will be aimed at the people at Apple who believe that sunrise and sunset are part of the weather.
Siri and Alexa have a long way to go before someone would mistake them for humans.
I actually find it a bit funny how big of a deal the uncanny valley still is. But maybe the low-point of the valley is dependent on the person, and I suspect people that have grown up with computers and video games are far less creeped out by it.
For me, the low-point on the curve was from some of the characters in late 90's-early 2000's video games. Think Ocarina of Time or Deus Ex. Once it got past that, I was perfectly comfortable.
As for voice, hell, I could sleep soundly with HAL 9000, GLaDOS, or Prof. Hawking reading me bedtime stories.
But, as I did actually skim the article, I can see that it is more about the un-human responses the device gives, not the voice. Which, again, to someone who's grown up with computers, is nothing new: we're used to the occasional ridiculous, nonsensical answers. It doesn't matter if my computer has an almost-human voice, I'm still very aware of its limitations.
Free the Quark 3 from asymptotic confinement! Bring your charm! Don't get down! All colours and flavours welcome!
I didn't understand your question at all. Who is the "you" to whom you are referring? Obviously Siri can't get chickenpox from a chicken; she's a piece of software. Next time, ask proper questions.
Comcast is trying this. How badly will it fail for them?
The term "Uncanny Valley" has nothing to do with Pixar or computer animation - it was coined by Masahiro Mori long before and is related to robotics.
Using an imprecise mechanism like language requires verification
If the 'user' wants to deride that verification, then they will get the same response as any ass-hat that demands instant response to ambiguous statements
Going beyond the uncanny valley will require both conversation and 'training' to the individual, just like any working relationship with a human
Wherever You Go, There You Are
User in Australia:
How do I get to 41 Annerley Road, South Brisbane?
Siri:
Getting directions to 41 Annalee Road, South Ockenden...
... in England? Really?
CP/M: copy a: b:
DOS: copy b: a:
... when the man is standing in the street, yelling at a tree, wearing his glasshole on his face.
What gets me angry is when a voice command that Google understood perfectly clearly a week prior, in my car, with the radio playing and the fan running, is one it refuses to understand under any circumstances this week. It's great when you're driving and all of a sudden a command that was working fine dumps you to a search and you have to play "try to click three times while driving at speed in Twin Cities rush hour traffic" for something that used to work.
I don't trust voice commands to work when I need them to, like when I can't be messing with the screen. That's my problem with them.
If I mod you up, it doesn't necessarily mean I agree with what you've said, sorry.
I find it hard to believe that there's an uncanny valley in voice recognition.
Did you mean voice synthesis?
Me: Do you think the freshman Congressman from California's Twelfth deserved to sit on HUAC, and how did that impact his future relationship with J. Edgar?
Siri: I think, therefore I am. But let's not put Descartes before the horse.
I have a hard time believing that Siri knows about this Slashdot post yet (it will...) but that answer is still highly (uncannily?!) appropriate to the original article...
Ummm, yeah,
Just asked Siri on my iPod "Can you get chickenpox from chickens" and all it did was come up with a list of ~15 websites, the top being WebMD, as well as ~15 images of chickenpox rashes.
So, tl;dr version, pretty much the same results as using Google voice search in my GNote2.
To err is human; effective mayhem requires the root password!
Hello. Smithers. You are. Very. Good at. Turning. Me. On.
I can kind of see what he means, although I think the comparison with the uncanny valley is a bit weak.
I've taken to using Google Now's voice commands to set timers while I'm cooking, so something like "Ok Google, set a timer for 20 minutes". I don't have to touch my phone and it works brilliantly even in the noisy environments of a kitchen.
I've gotten used to talking to it in a very naturalistic way, which is where the problems occasionally crop up, and when they do they can be quite jarring.
A good example was the last time I asked it to set a timer for "an hour and a half", which Now interpreted as 1:00:30s, i.e. an hour and a half *minute*.
The jarring effect is at this edge where we feel like the speech recognition system is understanding what we say, but really it's just trying to use lots of different rules and patterns that have been coded in. If you happen to just fall outside of one of those rules it fails completely, and it can seem very arbitrary.
Paul Leader
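The timer failure described above can be reproduced with a toy rule-based parser. This is only a sketch: the rules, the function names (`parse_naive`, `parse_unit_aware`) and the tiny vocabulary are all hypothetical, and nothing here reflects how Google Now is actually implemented. It just shows how a hard-coded rule can read "an hour and a half" as 1:00:30, and how a slightly different rule reads it as 1:30:00.

```python
import re

# Hypothetical vocabulary for illustration only.
UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600}
WORD_NUMBERS = {"a": 1, "an": 1, "one": 1, "two": 2, "three": 3, "ten": 10}

# Matches "<number or word> <unit>" pairs such as "20 minutes" or "an hour".
PAIR = re.compile(r"\b(\d+|[a-z]+) (second|minute|hour)s?\b")
HALF = re.compile(r"\band a half\b")


def _pairs(text):
    """Yield (count, unit) for every recognised '<n> <unit>' pair."""
    for num, unit in PAIR.findall(text):
        if num.isdigit():
            yield int(num), unit
        elif num in WORD_NUMBERS:
            yield WORD_NUMBERS[num], unit
        # Anything else (e.g. "half hours") falls outside the rules and is skipped.


def parse_naive(phrase):
    """Assumed buggy rule: 'and a half' always means half a minute."""
    text = phrase.lower()
    total = sum(n * UNIT_SECONDS[unit] for n, unit in _pairs(text))
    if HALF.search(text):
        total += 30
    return total


def parse_unit_aware(phrase):
    """Alternative rule: 'and a half' means half of the last unit mentioned."""
    text = phrase.lower()
    total = 0
    last_unit = None
    for n, unit in _pairs(text):
        total += n * UNIT_SECONDS[unit]
        last_unit = unit
    if HALF.search(text) and last_unit is not None:
        total += UNIT_SECONDS[last_unit] // 2
    return total


if __name__ == "__main__":
    request = "set a timer for an hour and a half"
    print(parse_naive(request))       # seconds under the naive rule
    print(parse_unit_aware(request))  # seconds under the unit-aware rule
```

Even the second version only covers the phrasings its pattern anticipates. Something like "two and a half hours" matches neither rule, which is the arbitrary-seeming cliff edge the comment describes.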
It's because people have taken the "Uncanny Valley" to mean anything that's almost but not quite perfect in terms of what it's trying to do. As you said, it's really a specific phenomenon: humans have a weird issue with CG or robotic people or animals that are realistic enough that our brains expect them to behave like real people or animals, so the imperfections that do exist are really obvious and make them seem far less realistic than they "should" appear.
Ok that Fido line started my day off just right. Thanks for that.
hey, this is just a hint for the AI moron circle jerk. You think this might have something to do with why you DON'T KNOW SHIT ABOUT AI YOU CLAIM TO BE BUILDING????
Ummm, yeah,
Just asked Siri on my iPod "Can you get chickenpox from chickens" and all it did was come up with a list of ~15 websites, the top being WebMD, as well as ~15 images of chickenpox rashes.
So, tl;dr version, pretty much the same results as using Google voice search in my GNote2.
Not sure how that works. I'm using a one-month-old iPhone 6, so maybe Siri varies some from platform to platform.
My kids ask me questions like this all the time. Most people with normal intelligence realize that the "you" should really be replaced with "a person", as it refers to an ambiguous you, not a specific you. For many of my kids' questions, I had gotten used to just asking Google before switching to an iPhone last month, and quickly discovered that Siri tried to be a smart aleck instead of just doing a search.
On a random side note, while I was on my Android, my kids always used to ask me if I was talking to Siri, even though I had never owned an iPhone before. They also refer to our Android tablets as iPads, so Apple seems to be much better at brand recognition than Google is.
Even human to human speech interfaces are inconsistent. I speak to my son differently from how I speak to my parents, and my coworkers differently from the guy behind the DMV counter. Humans are adept at learning how to communicate in an appropriate way for the situation at hand.
My two cents: Users seem to get especially bothered when speech recognizers fail, at a level of visceral irritation not apparent when they have trouble with other UIs. Back when you needed to learn command line commands, a failure to remember the syntax didn't cause the kind of barking disdain you see where people point out with some schadenfreude how stupid machines really are. (I am sure the machines are just waiting to put us in our places.) I suspect one reason may be that when there is a failure to communicate, as cooperative speakers of a language we consciously (or unconsciously) try to find the cause of the failure, like noise, stupidity, deafness or mis-articulation. When we use a command line interface, it is obvious to us that we're using a foreign language that we really don't know. But when we speak our native tongue, then it is clearly the machine's fault for not understanding it. Or so we desperately want to prove. I think part of the difficulty and frustration lies in constructing the mental model of our artificial counter-party. If you say something to a child and she doesn't understand because you spoke over her head, it's your fault. Someday, alas not so far away, I fear the opposite will happen with the machines, and they will shake their heads at us.
In my experience, Siri, Cortana, etc. are good for a few minutes of grins and giggles - and little else. Rather than working for you, you have to work for them to understand you. It soon becomes obvious that, in general, you can do what you want much faster by bypassing them altogether. Hopefully they will keep improving to approach something like the computer in Star Trek - but they are still far, far from that.
Don't you think?
Indeed. Uncanny valley (a questionable, or perhaps cultural, phenomenon at best) is about animatronics getting so close to human likeness that we take them for being severely ill or corpse-like, thus setting off various safety-related instincts.
comment first, facts later. http://chem.tufts.edu/AnswersInScience/RelativityofWrong.htm
Sounds like the problem that has haunted overly "smart" user interfaces since day one, as their smarts invariably fail to account for all the variables and thus fail exactly when the user is at the most irritable (hello Clippy).
To me a UI works better when held static rather than trying to second guess the user. Then the user applies their "smarts" to integrate the UI into their tasks.
comment first, facts later. http://chem.tufts.edu/AnswersInScience/RelativityofWrong.htm
Why on earth is this on Slashdot?
This is not news for nerds, this is ill-informed idle speculation.