Slashdot Mirror


Microsoft Claims Its Speech Transcription AI is Now Better Than Human Professionals (qz.com)

Microsoft announced today a system that can transcribe the content of a phone call with "the same or fewer errors" than human professionals trained in transcription -- even when the human transcript is double-checked by a second human for accuracy. As you can imagine, this is a huge milestone for speech recognition. From a Quartz report: The team doesn't attribute this achievement to any breakthrough in algorithm or data, but the careful tuning of existing AI architectures. To test how their algorithm stacked up against humans, first researchers had to get a baseline. Microsoft hired a third-party service to tackle a piece of audio for which they had a confirmed 100 percent accurate transcription. The service worked in two stages: one person types up the audio, and then a second person listens to the audio and corrects any errors on the transcript. Based on the correct transcript for the standardized tests, the professionals had 5.9 percent and 11.3 percent error rates. After learning from 2,000 hours of human speech, Microsoft's system went after the same audio file -- and scored 5.9 percent and 11.1 percent error rates. That minute difference ends up being about a dozen fewer errors. Microsoft's next challenge is making this level of speech recognition work in noisier environments, like in a car or at a party. This implementation is crucial for Microsoft, and goes well beyond just transcription.
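
For readers wondering how an "error rate" like the ones above is typically scored: the summary does not name the metric, so the sketch below assumes it is the usual word error rate (WER), i.e. the word-level edit distance between a transcript and the known-correct reference, divided by the number of words in the reference. The function name `word_error_rate` and the sample sentences are illustrative only and are not taken from Microsoft's or Quartz's material.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by reference length (assumed metric)."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    reference = "he defused the tense situation"
    hypothesis = "he diffused the tense situation"
    # One substituted word out of five reference words.
    print(f"{word_error_rate(reference, hypothesis):.1%}")
```

Under this assumed definition, a gap of 0.2 percentage points (11.3 versus 11.1) only translates into "about a dozen fewer errors" if the test set contains several thousand words, which is consistent with the summary's description of the difference as minute.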

9 of 98 comments

  1. Right ... by scunc · · Score: 4, Funny

    I'll believe that when I ducking see it.
    --
    This comment was transcribed by Microsoft's new AI transcription software.

  2. Voice Control by Rockoon · · Score: 5, Insightful

    If you want voice input to be more than just a toy, then getting near flawless accuracy here seems to be a required first step.

    If your mouse occasionally sent an erroneous input to the computer no matter how careful you were, you wouldn't use it so much.

    --
    "His name was James Damore."
    1. Re:Voice Control by TFlan91 · · Score: 2

      Agreed, however people down south don't move their mouse with "the typical hospitallllity of us folk 'round here" as opposed to the people up north who couldn't give a rat's ass.

      Speech is incredibly dense to parse. Where a near-perfect operation is required for a mouse, voice control can have a couple bumps in its road before (and while) being highly adopted.

    2. Re:Voice Control by stephanruby · · Score: 2

      If your mouse occasionally sent an erroneous input to the computer no matter how careful you were, you wouldn't use it so much.

      Wrong example. Mouse usability requires constant visual feedback and almost constant human correction. That is the reason why we can't really use a mouse without looking directly at the screen.

      In any case, flawless transcription accuracy of one single human voice out of 7.5 billion voices already happens with Google Voice. The problem occurs when Google Voice is not tuned to the voices of the other 7.49999 billion people. Do you think that's what Microsoft is using in the backend this second time around?

  3. any better than "Show me to buy milk"? by itsme1234 · · Score: 2

    Like any human would think about milk or "open reminders" when hearing "Show me my most at-risk opportunities".

  4. Now put it to good use! by cmiller173 · · Score: 4, Informative

    Automated closed captioning for the hearing impaired would be one. I'm not hearing impaired, but I use the CC system with the volume low when I am watching TV while everyone else in the house is sleeping. I also use it when everyone is awake and noisy. It is amazing how awful some CC can be.

  5. Govt Surveillance by mcolgin · · Score: 3, Insightful

    I assume this is so the Govt agencies can transcribe cell-phone communications to text and then perform analysis to find all the "bad guys"?

    --
    I made this: http://www.bpftpserver.com
  6. Re:Microsoft? by Opportunist · · Score: 3, Funny

    Hush! As long as MS exists, I have total job security!

    --
    We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.
  7. Defused by John+Jorsett · · Score: 3, Interesting

    The acid test for transcription for me is if the transcriptionist gets the word "defuse" right, as in "He defused the tense situation." Every, and I mean EVERY, closed caption I've seen transcribes it as, "He diffused the tense situation." It seems to be the universal mistake.