Slashdot Mirror


The Uncanny Valley of Voice Recognition

An anonymous reader writes: We've often seen the term "uncanny valley" applied to the field of robotics — it's easy to get unsettled when robots act close to being human, yet fail completely in a few key ways. GitHub Engineer Zach Holman writes that we've now reached uncanny valley territory in speech recognition as well, though the results are more frustrating than they are disturbing. He says, "Part of this frustration is the user interface itself is less standardized than the desktop or mobile device UI you're used to. Even the basic terminology can feel pretty inconsistent if you're jumping back and forth between platforms.

Siri aims to be completely conversational: Do you think the freshman Congressman from California's Twelfth deserved to sit on HUAC, and how did that impact his future relationship with J. Edgar? Xbox One is basically an oral command line interface, of the form: Xbox (direct object). ...it's these inconsistencies that are frustrating as you jump back and forth between devices. And we're only going to scale this up."

20 of 83 comments

  1. y'all fixing to handle this problem? by turkeydance · · Score: 2

    i thought so.

  2. I fail to see how it's any worse than other UIs by msobkow · · Score: 4, Insightful

    I fail to see how the "inconsistency" of speech recognition UIs is any more earth-shattering than the inconsistency between graphical UIs. People learn to use what they have, no more, no less. Anyone who "expects" device Y to behave like device X when they're from different vendors is a fool.

    Hell, even Android devices aren't consistent between vendors, and they start off with the same core code!

    --
    I do not fail; I succeed at finding out what does not work.
    1. Re:I fail to see how it's any worse than other UIs by Shadow+of+Eternity · · Score: 4, Insightful

      Exactly. The problem isn't frustration with different interface schemes, the problem is that they don't fucking work. I use several different programs with buttons and menus in different arrangements, but when I click a button the button is bloody well clicked regardless of where exactly it is. Voice recognition on the other hand is simply too unreliable.

      --
      A bullet may have your name on it but splash damage is addressed "To whom it may concern."
    2. Re:I fail to see how it's any worse than other UIs by Dutch+Gun · · Score: 2

      There's a standard for speech recognition already, as long as you're talking about "intelligent agents", which the Xbox One is certainly not: Natural English (or insert your language here) conversation. The gold standard, no pun intended, should be to phrase queries or commands in such a manner that any reasonably intelligent native speaker could easily understand your intent, and the computer should perform those tasks or retrieve that information for you.

      At this point, the only reason there's jarring inconsistencies is because these systems are still very primitive. In essence, for Android, I more or less can expect my statements to be translated into the equivalent of a Google search. As long as the keywords are there, it can more or less pull up any information I need within some reasonable limits. I know Siri has some specific queries that the system recognizes, but I haven't played with it or Cortana yet to know how complex those can get.

      The Xbox One and Siri have two different "interfaces" because they have completely different goals. This is natural. The Xbox One has no need for natural language processing - it just needs to recognize a specific and limited set of commands. It would be more accurate to compare Siri to Cortana. As with all new technologies, it will take a while of experimenting to find something that works well, and does so naturally and seamlessly. Then everyone will copy it, corporations will try to patent it, and sue the ever-living crap out of the copiers, etc, etc. Life and business go on...

      --
      Irony: Agile development has too much inertia to be abandoned now.
    3. Re:I fail to see how it's any worse than other UIs by someoneOtherThanMe · · Score: 2

      but when I click a button the button is bloody well clicked

      Looks like you don't have much experience with cheap touch screens.

    4. Re:I fail to see how it's any worse than other UIs by Rockoon · · Score: 5, Funny

      Picard: "Computer, Fire at will!"

      -> Commander Riker is suddenly shot down by an automated defense phaser mounted on a security turret.

      Picard: "What the..."

      -> Computer: "Please restate your question"

      Worf: "His death was without honor"

      --
      "His name was James Damore."
    5. Re:I fail to see how it's any worse than other UIs by jc42 · · Score: 4, Interesting

      but when I click a button the button is bloody well clicked

      Looks like you don't have much experience with cheap touch screens.

      Heh. You obviously haven't worked with any of the more expensive ones. I have a small collection of different portable gadgets for web testing, and that statement about buttons definitely isn't true for the various Apple tablets or phones. Thus, there's a little "x" icon whose function is to close the tab/window. I've learned to just start tapping it about twice per second, and maybe by the 3rd or 4th or 6th or 10th tap, it'll close.

      Of course, the little monster might know very well that I'm tapping it, but wants to see how serious I am about it.

      Of course, Apple's gadgets aren't the only ones like this. They're just one of the worst of a bad lot. And often it's a good idea to not tap too fast, because when the window finally closes, it usually gets replaced with another that'll do something totally unexpected when you tap it in that newly-exposed spot.

      --
      Those who do study history are doomed to stand helplessly by while everyone else repeats it.
    6. Re:I fail to see how it's any worse than other UIs by jez9999 · · Score: 2

      I just do not think you have had enough experiment with. delete that, delete that, dear mom let's set so double the killer delete select all

  3. Re:It's because they don't work... by AuMatar · · Score: 2

    Depends on your accent. I get about 98% recognition. I still don't use it because it's easier to type/swype.

    --
    I still have more fans than freaks. WTF is wrong with you people?
  4. Variety by penguinoid · · Score: 2

    Variety is different from the Uncanny Valley.

    --
    Don't waste your vote! Vote for whoever you want, unless you live in a swing state it won't matter anyways
    1. Re:Variety by penguinoid · · Score: 2

      Specifically, if this was an Uncanny Valley then people would prefer lower quality voice recognition.

      --
      Don't waste your vote! Vote for whoever you want, unless you live in a swing state it won't matter anyways
    2. Re:Variety by Half-pint+HAL · · Score: 2

      The concept of the uncanny valley is pseudoscience anyway.

      Then think of it as a theoretical paradigm that gives us a useful conceptualisation until we have a better understanding.

      My only beef with the uncanny valley is that too many analyses stop there. People investigating cartoons saw that there was a subtle interplay between drawing quality and animation quality: if the drawing quality is better than the animation quality, it looks fake, but the opposite is not true -- even simple stick men can look real when well animated. This was known decades ago, yet was completely ignored by many sections of the video game world. The effects in FPSes from Quake into the mid-2000s were noticeable. Even if you had a detailed texture and a good walk cycle, the stretch and distortion of the textures during animation didn't mesh, and the character ceased to look real. And yet I could quite readily relate to the cartoony characters with their stylised movements in Final Fantasy VII.

      --
      Got them moderator blues I blieve I walk out the do', With these mod-points I been gettin', I 'most never post no mo'
  5. Ridiculous by bipbop · · Score: 3

    Do you think the freshman Congressman from California's Twelfth deserved to sit on HUAC, and how did that impact his future relationship with J. Edgar?

    It's hard to imagine anyone who's actually used Siri thinking that question could get a useful answer. Siri can't understand even far more basic English. It's not much more advanced than Dr. Sbaitso.

  6. More than the accent by lucm · · Score: 2

    Do you always get 98%? I've noticed that the recognition rate I get goes down about 2% for each increment of 0.01% of my blood alcohol content.

    --
    lucm, indeed.
  7. Re:That's not the uncanny valley by lucm · · Score: 2

    I guess we're making advances on answering trivia questions and adding appointments to the calendar, but it's not exactly ready to hold a conversation.

    It's a good thing. If I have to start holding a conversation with my computer to get it to manage my calendar it will become higher maintenance than my secretary, who only needs a cheap gift basket on Secretary Day and a small smack on the butt when she remembers the extra espresso shot in my latte.

    --
    lucm, indeed.
  8. I don't think that means what you think it means by R3d+M3rcury · · Score: 5, Informative

    As I understand it, the "Uncanny Valley" refers to things that are very close to human behavior--close enough that the mind shifts from this being an imperfect representation of a human to being an imperfect human.

    Personally, I'm not sure there would really be an issue with "uncanny valley" in regards to speech recognition. It's good if it recognizes what you're saying. It's bad if it doesn't. There isn't really a middle ground where it's off in a way you can't really identify, which is where "uncanny valley" comes from.

    What he seems to be talking about is the "personification" of "digital assistants" like Siri and Alexa (Amazon Echo) which will eventually create an "uncanny valley." But I'm not sure that it's really that big of an issue. Just because I call something by name doesn't mean I expect it to behave in a human fashion. I don't get frustrated with my dog when I say, "Fido, change the oil in my car" and the dog just lies there and licks his balls, so I don't expect I'll ever get that frustrated because Siri can't tell me what time the sun will set next Tuesday--or, if I do, my frustration will be aimed at the people at Apple who believe that sunrise and sunset is part of the weather.

    Siri and Alexa have a long way to go before someone would mistake them for humans.

  9. Re:Uncanny valley in recognition? by Half-pint+HAL · · Score: 2

    No, it's a fair view. The uncanny valley is all about intolerance of diversion from the expected norm. When VR was all stilted commands, all users quickly became accustomed to it (even if they didn't like it). The problem with Siri is that at first use, we're not expected to treat it like a formal system -- we're encouraged to interact with it in an unconstrained way... yet it doesn't respond to that. It is "broken human" rather than "stupid machine".

    --
    Got them moderator blues I blieve I walk out the do', With these mod-points I been gettin', I 'most never post no mo'
  10. Re:I don't think that means what you think it mean by NoNeeeed · · Score: 4, Interesting

    I can kind of see what he means, although I think the comparison with the uncanny valley is a bit weak.

    I've taken to using Google Now's voice commands to set timers while I'm cooking, so something like "Ok Google, set a timer for 20 minutes". I don't have to touch my phone and it works brilliantly even in the noisy environments of a kitchen.

    I've gotten used to talking to it in a very naturalistic way, which is where the problems occasionally crop up, and when they do they can be quite jarring.

    A good example was the last time I asked it to set a timer for "an hour and a half", which Now interpreted as 1:00:30s, i.e. an hour and a half *minute*.

    The jarring effect is at this edge where we feel like the speech recognition system is understanding what we say, but really it's just trying to use lots of different rules and patterns that have been coded in. If you happen to just fall outside of one of those rules it fails completely, and it can seem very arbitrary.
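
To make the "hour and a half" failure concrete, here is a minimal Python sketch of how a hand-coded duration rule could produce 1:00:30. Nothing here reflects how Google Now actually works: the function names, the word list, and the "bare half means half a minute" rule are all hypothetical, chosen only to reproduce the behaviour described above.

```python
import re

# Hypothetical vocabulary for a toy timer parser (not any real assistant's).
UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600}
NUMBER_WORDS = {"a": 1, "an": 1, "one": 1, "two": 2, "three": 3,
                "five": 5, "ten": 10, "twenty": 20}

# Matches "<number> <unit>" pairs such as "20 minutes" or "an hour".
PAIR = re.compile(
    r"\b(\d+|" + "|".join(NUMBER_WORDS) + r") (second|minute|hour)s?\b"
)


def to_number(token):
    """Turn '20' or 'an' into an integer."""
    return int(token) if token.isdigit() else NUMBER_WORDS[token]


def parse_naive(phrase):
    """Assumed brittle rule: a trailing 'and a half' always adds half a minute."""
    total = sum(to_number(n) * UNIT_SECONDS[u] for n, u in PAIR.findall(phrase))
    if phrase.endswith("and a half"):
        total += 30  # ignores which unit came before
    return total


def parse_context_aware(phrase):
    """Alternative rule: 'and a half' means half of the last unit mentioned."""
    pairs = PAIR.findall(phrase)
    total = sum(to_number(n) * UNIT_SECONDS[u] for n, u in pairs)
    if phrase.endswith("and a half") and pairs:
        total += UNIT_SECONDS[pairs[-1][1]] // 2
    return total


def fmt(seconds):
    """Format seconds as h:mm:ss."""
    return f"{seconds // 3600}:{seconds % 3600 // 60:02d}:{seconds % 60:02d}"


if __name__ == "__main__":
    for phrase in ("set a timer for 20 minutes", "an hour and a half"):
        print(phrase, "->", fmt(parse_naive(phrase)), "vs",
              fmt(parse_context_aware(phrase)))
```

Both rules agree on "set a timer for 20 minutes", which is why the naive one feels like it understands you right up until the phrase lands just outside the pattern it was written for.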

  11. Re:I find siri's lame attempts to be human annoyin by Wycliffe · · Score: 2

    My kids ask me questions like this all the time. Most people with normal intelligence
    realize that the "you" should really be replaced with the word "a person" as it refers
    to an ambiguous you not a specific you. For many of my kids' questions, I had
    gotten used to just asking google before switching to an iphone last month and
    quickly discovered that siri tried to be a smart aleck instead of just doing a search.
    On a random side note, while on my android, my kids always used to ask me if I
    was talking to siri even though previously I had never owned an iphone. They
    also refer to our android tablets as ipads so apple seems to be much better at
    brand recognition than google is.

  12. Re:It's because they don't work... by jc42 · · Score: 3, Interesting

    I speak standard BBC English, and I have often been described by people as "the easiest person to understand in the company" in many different companies.

    In my experience, the recognition rate appears to be about 2%.

    Not surprising; your "BBC English" and our "media English" over here in North America are basically artificial dialects developed by the broadcast industries starting back in the 1940s. They even managed to do some fairly scientific testing, assembling listeners with different native dialects, and counting their mistakes when listening to different proposed pronunciations of various words and phrases. Their intent was to develop dialects that were easily understood by most of their target audiences, and they did a reasonable job of it.

    This doesn't help the computers' voice recognition software very much, though, because few customers speak these "standard" artificial dialects well. The software people aren't working on making the customers understand the computer's speech; they're trying to get the computers to understand untrained humans speaking their native dialects. This requires rather different processing than what the broadcasters were trying to do, and is a much more difficult task for us humans, too. It doesn't help that the computers are often listening to humans who aren't totally awake and sober ...

    --
    Those who do study history are doomed to stand helplessly by while everyone else repeats it.