Slashdot Mirror


Baidu's Voice Recognition Software Is More Accurate Than Typing (thestack.com)

The massive Chinese web services company Baidu has launched their sophisticated new TalkType 'keyboard' app, which defaults to voice recognition. An anonymous reader quotes The Stack: Baidu claims that the app's speech recognition is more accurate than actual typing, having developed and tested the technology alongside speech software experts at Stanford University... The researchers concluded that Baidu's technology was three times faster than a typical user typing in English. The results showed that the TalkType error rate was 20.4% lower than an English texter hunting and tapping for letters. The accuracy was even greater for those typing in Mandarin, with the error rate dropping 63.4% when using TalkType.
Of course, last year Baidu was also accused of gaming the testing for their image-recognition software.

8 of 55 comments

  1. Better than hunt and peck? by Carewolf · · Score: 2

    That is like the test where someone claimed they defeated the Turing test by pretending to be a retarded foreign boy that didn't speak English.

    I guess they also only picked people who had never before typed anything in their life as well.

  2. Texting isn't typing by localroger · · Score: 2

    On a full English language keboard there is no way speech is faster if you know how to type. Now if you don't know how to type or you're using a touch screen, then yeah. Maybe if you're using Mandarin because it's not as straightforward as the Roman alphabet. But no, I can type considerably faster than I can talk and almost as fast as I can read, which is well over 100 wpm, and with a display and backspace key (since I'm human) my ultimate accuracy is 100%.

    --
    Brackets contain world's first nanosig, highly magnified:[.]
    1. Re:Texting isn't typing by wonkey_monkey · · Score: 2

      On a full English language keboard

      Heh.

      there is no way speech is faster if you know how to type.

      Nah, that's just not true. Most professional typists don't exceed 100wpm, while the average person talks at 130-150wpm.

      If typing was so much faster than speaking, they wouldn't do live subtitles by having someone repeat the words into a mic for speech recognition. Which is what they do, with occasionally hilarious results.

      --
      systemd is Roko's Basilisk.
  3. I'm not the average typist by Snotnose · · Score: 3, Interesting

    When I got my Trash 80 back in the 70s the first program I bought was a typing tutor. I've been touch typing for some 40 years and my fingers don't have any trouble keeping up with my brain. My mouth, not so much.

  4. Oh Good... by skam240 · · Score: 2

    Oh good, more assholes yelling into their phones while in public spaces. That's exactly what we need.

    --
    I ignore Anonymous Coward posts. If you want to discuss something, that's awesome. Log in.
  5. Leads to hunt-n-smash by fyngyrz · · Score: 4, Insightful

    Another thing -- when I'm typing, and there is an error, I'm right there to correct it.

    With voice recog, at least right now, editing it after it's been screwed up by Google or whatever is more of a PITA than just typing it out in the first place.

    Trying to actually do decent editing (at least on my S7) is seriously annoying. Cursor positioning is flaky as hell, parts of messages disappear above and below the edit point, I try to drag the edit point and it scrolls up or down so fast there's no chance of actually getting where I meant to go...

    I grant you that this kind of thing is the result of bad design at some level in Android or some library most everyone is using, and could be corrected... but right now, it's SN/AFU. That's a big factor in why editing as I go, rather than trying to get "somewhere" in something already containing lots of text, is much easier on my temper.

    That said, I would welcome 99.99999% accurate voice recog. Not holding my breath, though.

    --
    I've fallen off your lawn, and I can't get up.
  6. Re:Voice recognition? by the_povinator · · Score: 2

    The article talks about speech recognition, not voice recognition. EditorDavid has the two concepts mixed up: speech recognition is all about trying to recognize what you are saying, whereas voice recognition is all about recognizing a specific voice, e.g. for identifying who is speaking.

    [actual expert here]

    Not exactly: "speech recognition" means taking in speech and putting out some kind of text; "speaker recognition" is a general term for identifying speakers or verifying speaker identity. "Voice recognition" is a term that is not used in the field (but is sometimes used in the media) which generally means the same thing as "speech recognition".

    --
    The .sig is dead, and I believe I had a hand in killing it.
  7. Suspicious... by myowntrueself · · Score: 2

    One of the things that characterises modern Chinese language is the proliferation of homophones (words that sound alike).

    The way that Chinese people cope with this is extreme use of context and of spelling; the homophones don't have the same character. Sometimes Chinese people will clarify meaning by sketching a character in the air, often unconsciously.

    If the error rate reduction is so huge based on speech recognition this would suggest that pinyin can replace characters for writing Chinese. And this has been disproved on many occasions; you can literally write an entire story using only the syllable 'ma'. In pinyin it all comes out as 'ma' with the 4 tones. In characters it's actually readable. Same with the story of the lion-eating poet in the stone den, which is all 'shi'.

    So a great test of this Baidu software would be to get someone to read this to it and see what it comes up with:

    https://chinesepod.com/blog/ho...

    https://en.wikipedia.org/wiki/...

    and see if it gets it right:

    Shī Shì shí shī shǐ

    Shíshì shīshì Shī Shì, shì shī, shì shí shí shī.

    Shì shíshí shì shì shì shī.

    Shí shí, shì shí shī shì shì.

    Shì shí, shì Shī Shì shì shì.

    Shì shì shì shí shī, shì shǐ shì, shǐ shì shí shī shìshì.

    Shì shí shì shí shī shī, shì shíshì.

    Shíshì shī, Shì shǐ shì shì shíshì.

    Shíshì shì, Shì shǐ shì shí shì shí shī.

    Shí shí, shǐ shí shì shí shī shī, shí shí shí shī shī.

    Shì shì shì shì.

    --
    In the free world the media isn't government run; the government is media run.