Slashdot Mirror


Microsoft Speech Recognition Now As Accurate As Professional Transcribers (techcrunch.com)

An anonymous reader quotes TechCrunch: Microsoft announced today that its conversational speech recognition system has reached a 5.1% error rate, its lowest so far. This surpasses the 5.9% error rate reached last year by a group of researchers from Microsoft Artificial Intelligence and Research and puts its accuracy on par with professional human transcribers who have advantages like the ability to listen to text several times. Both studies transcribed recordings from the Switchboard corpus, a collection of about 2,400 telephone conversations that have been used by researchers to test speech recognition systems since the early 1990s. The new study was performed by a group of researchers at Microsoft AI and Research with the goal of achieving the same level of accuracy as a group of human transcribers who were able to listen to what they were transcribing several times, access its conversational context and work with other transcribers.
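The 5.1% and 5.9% figures are word error rates (WER), the usual metric for the Switchboard task. WER counts the minimum number of word substitutions, deletions and insertions needed to turn the system's transcript into the reference, then divides by the number of reference words. Below is a minimal Python sketch of that calculation. It assumes plain whitespace tokenization; official scoring tools also normalize casing, spelling variants and filler words, which this skips. The function name `word_error_rate` is illustrative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # d[i][j] = word-level edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match when cost == 0)
            )
    return d[len(ref)][len(hyp)] / len(ref)


# One substitution in a four-word reference gives a WER of 0.25 (25%).
print(word_error_rate("an inorganic kitty litter", "an organic kitty litter"))
```

So 5.1% works out to roughly one wrong word in every 20 spoken. Because insertions count too, WER can exceed 100% on badly garbled output.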

10 of 176 comments

  1. Laughable Hype by bwanagary · · Score: 5, Interesting

    On a daily basis in my work environment Microsoft technology is used to a) record voicemail and b) generate text from the speech.  Never, ever, have I received any converted voicemail that wasn't completely unintelligible gibberish.  Seriously.  This is utter nonsense.

    1. Re:Laughable Hype by avandesande · · Score: 4, Funny

      You should start talking with people who don't speak gibberish.

      --
      love is just extroverted narcissism
  2. Errors are not Errors by idji · · Score: 5, Insightful

    When a human transcriptionist makes a mistake you can usually work out what they meant. When Speech-to-text (STT) makes a mistake it is often gibberish. So objectively it is "better" at transcribing, but subjectively much worse.

    1. Re:Errors are not Errors by AmiMoJo · · Score: 4, Interesting

      Not any more. One of the ways that they got the accuracy up so high is by giving the machine an understanding of English and common phrases, similar to what a human has. It's been used for input correction on smartphones for a while too, e.g. with the Google keyboard it can correct the previous word based on the next one you type if it realizes that they don't make sense together.

      --
      const int one = 65536; (Silvermoon, Texture.cs)
      SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
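The next-word correction AmiMoJo describes can be illustrated with a toy bigram language model. Among candidate spellings for the previous word, it keeps the one most often seen before the word typed next. This is a simplified sketch, not how Gboard or Microsoft's recognizer actually works. The counts in `BIGRAM_COUNTS` and the function `pick_previous_word` are hypothetical stand-ins for statistics a real system would mine from large amounts of text.

```python
from collections import Counter

# Hypothetical bigram counts; a real system would estimate these from a large corpus.
BIGRAM_COUNTS = Counter({
    ("right", "now"): 900,
    ("write", "now"): 15,
    ("write", "code"): 400,
    ("right", "code"): 5,
})


def pick_previous_word(candidates: list[str], next_word: str) -> str:
    """Choose the candidate for the previous word that best fits the word typed after it."""
    # Counter returns 0 for unseen pairs; on ties, max() keeps the first candidate.
    return max(candidates, key=lambda word: BIGRAM_COUNTS[(word, next_word)])


print(pick_previous_word(["write", "right"], "now"))   # right
print(pick_previous_word(["write", "right"], "code"))  # write
```

Real systems typically smooth such counts and combine them with acoustic or keystroke evidence. The principle is the same, though: context picks between similar-sounding or similar-looking words.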
    2. Re:Errors are not Errors by jellomizer · · Score: 4, Informative

      Normally we have transcriptionists who are trained in a particular area so they understand the context of the message. A legal transcriptionist requires different training than a medical transcriptionist.

      --
      If something is so important that you feel the need to post it on the internet... It probably isn't that important.
    3. Re:Errors are not Errors by AmiMoJo · · Score: 4, Interesting

      It's more than just syntax and grammar rules. For example, Google has been mining the web for that kind of knowledge. You can see it in Google Translate sometimes: it generates suggestions for your input, and sometimes screws up, e.g. by treating "alot" as a word. It also uses colloquialisms in its output, which again it gathered from analyzing the web and which don't fit standard grammar or syntax rules.

    4. Re:Errors are not Errors by gnick · · Score: 4, Insightful

      A legal transcriptionist requires different training than a medical transcriptionist.

      And sometimes even that training falls short. Does anyone remember the explosion at WIPP when the tech transcribed "an organic kitty litter" instead of "inorganic kitty litter"?
      Kitty litter explosion.

      --
      He's getting rather old, but he's a good mouse.
  3. Using it to post on slashdot by Harald Paulsen · · Score: 4, Funny

    holyfield is these all of this was made worse by the fact that i had these birds skilled estimate uh... supplying itself what's your special prom to prevent fraud reform
    thoughtfulness julia roberts police comments entry drug connections predicting that nighttime beating

    --
    Harald
  4. Bad experiences on this front by CustomSolvers2 · · Score: 4, Interesting

    Some months ago, I did some tests with speech recognition software and concluded that it is still too unreliable. My intention was to develop an application that would let me write moderately complex code by voice (creating files and folders, applying proper indentation, recognising functions, variables and other basic elements, etc.; basically, writing and editing the main parts of an arbitrary algorithm in a given language without touching the keyboard). I tested Microsoft's built-in functionality (using one of Microsoft's .NET programming languages), and it wasn't even close to what a "5.9% error rate" seems to indicate (almost perfect?).

    In defence of the software, I have to say that my English accent isn't exactly excellent (some people say that it is "too thick" and other people just say "what?". LOL) and, honestly, I make very little effort to pronounce properly. But this is also the problem with speech recognition: it is mostly focused on a specific language/accent/intonation. I was doing my tests on an English version of Windows, where English was the default speech recognition language (and adding a different one wasn't exactly straightforward).

    I fully understand the complexity of developing software reliable enough to deliver what I was expecting; that is precisely why I looked for existing solutions rather than developing everything myself (which I do pretty often). In any case, my impression is that you still cannot expect good enough reliability from (Microsoft's) speech recognition software, much less when mixing languages/accents (a particularly problematic situation: including Spanish words when talking in English). I might give all this another shot next year, though.

    --
    Custom Solvers 2.0 = Alvaro Carballo Garcia = varocarbas.
  5. "As Accurate As Professional Transcribers" by Anonymous Coward · · Score: 5, Funny

    "As Accurate As Professional Transcribers..."

    They left out "from Uzbekistan transcribing Navajo - underwater".

    Never trust anything Clippy says.