Slashdot Mirror


Baidu's Voice Recognition Software Is More Accurate Than Typing (thestack.com)

The massive Chinese web services company Baidu has launched its sophisticated new TalkType 'keyboard' app, which defaults to voice recognition. An anonymous reader quotes The Stack: Baidu claims that the app's speech recognition is more accurate than actual typing, having developed and tested the technology alongside speech software experts at Stanford University... The researchers concluded that Baidu's technology was three times faster than a typical user typing in English. The results showed that the TalkType error rate was 20.4% lower than an English texter hunting and tapping for letters. The accuracy was even greater for those typing in Mandarin, with the error rate dropping 63.4% when using TalkType.
Of course, last year Baidu was also accused of gaming the testing for their image-recognition software.
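
A note on reading those percentages: "20.4% lower" and "63.4% lower" are relative reductions, not percentage-point drops, and the summary does not quote the underlying error rates. A minimal sketch of the arithmetic, where the baseline typing error rates (10% and 20%) are made-up placeholders and not figures from the study:

```python
def reduced_error_rate(baseline_rate, relative_reduction):
    """Error rate left after a relative reduction (both given as fractions)."""
    return baseline_rate * (1.0 - relative_reduction)

# Hypothetical baselines for illustration only; the real rates are not quoted above.
english = reduced_error_rate(0.10, 0.204)   # assumed 10% typing error rate, 20.4% lower
mandarin = reduced_error_rate(0.20, 0.634)  # assumed 20% typing error rate, 63.4% lower

print(f"English:  {english:.4f}")
print(f"Mandarin: {mandarin:.4f}")
```

With these assumed baselines, the speech error rates would still be several percent, so a large relative drop does not by itself mean the output is close to error-free.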

55 comments

  1. ten finger typing is faster by Anonymous Coward · · Score: 0

    texter hunting and tapping for letters

    your doing it wrong m8

    1. Re:ten finger typing is faster by Anonymous Coward · · Score: 0

      ur doing it wrng 2 m8

    2. Re:ten finger typing is faster by Anonymous Coward · · Score: 0

      No, I am pretty sure Buddha's voice is more accurate than typing.

  2. Better than hunt and peck? by Carewolf · · Score: 2

    That is like the test where someone claimed they defeated the Turing test by pretending to a retarded foreign boy that didn't speak English.

    I guess they also only picked people who had never before typed anything in their life as well.

    1. Re:Better than hunt and peck? by Anonymous Coward · · Score: 0

      Indeed I've seen chatbots outfox dimwitted people who think the bots are engaging in genuine conversation. Turing test is easily passed by dumb bot against dumber human.

    2. Re:Better than hunt and peck? by F.Ultra · · Score: 1

      I just assume that they typed on a phone and not on a proper keyboard. Otherwise it does not make any sense.

    3. Re:Better than hunt and peck? by Mondor · · Score: 1

      Not really. The key is - (faster than) "English texter hunting and tapping for Mandarin letters".

  3. Hunt & peck by Anonymous Coward · · Score: 0

    "20.4% lower than an English texter hunting and tapping"
    What about touch typists? How does that compare?
    I type at about 40wpm but there are others who can type
    above 100wpm (words per minute).

    1. Re:Hunt & peck by Anonymous Coward · · Score: 0

      I type at about 40wpm but there are others who can type above 100wpm

      And the rest of the girls are hunt & peckers, right?

  4. OH DONALD! by Anonymous Coward · · Score: 0



    Not at all unusual 'banter' among dudes that I have known.

    Now, Billy Bush.  I would just never be around that kind.

    But for the recording.  Whooop! Deee!  Doooo!  Daaaaa!  The wife.  Yeah, she can act all pissed off.  Imagine the shoe on the other foot.  I have.  That would be unusual, but funny, and frankly, cool.  And you know women do that all the time, too, you just don't get to hear it.

    The condemnation, though.  That is crocodile tears: makes for good short-time ratings.

    And Baidu.  Echo Me-too.

  5. Texting isn't typing by localroger · · Score: 2

    On a full English language keboard there is no way speech is faster if you know how to type. Now if you don't know how to type or you're using a touch screen, then yeah. Maybe if you're using Mandarin because it's not as straightforward as the Roman alphabet. But no, I can type considerably faster than I can talk and almost as fast as I can read, which is well over 100 wpm, and with a display and backspace key (since I'm human) my ultimate accuracy is 100%.

    --
    Brackets contain world's first nanosig, highly magnified:[.]
    1. Re:Texting isn't typing by Anna+Merikin · · Score: 1

      You're right, texting isn't as fast or accurate as typing, but I think you got the numbers wrong.

      Near the turn of the millennium, speech recognition software (ViaVoice, etc.) achieved a claimed 99% accuracy. So I tried it out. After training, I got over 95% by speaking carefully (and slowly). The problem was that finding and fixing that 5% of mistakes took longer than retyping the whole document would have.
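
      The trade-off described above can be put into a rough model: every misrecognized word costs some fixed time to find and fix, and that time is added to the dictation time. This is only a sketch; the 95% accuracy comes from the comment, while the 100 wpm dictation speed and 30 seconds per fix are assumptions picked for illustration.

```python
def effective_wpm(raw_wpm, accuracy, seconds_per_fix):
    """Words per minute once every misrecognized word has been corrected."""
    # Time to speak one word plus the expected correction time for that word.
    minutes_per_word = 1.0 / raw_wpm + (1.0 - accuracy) * seconds_per_fix / 60.0
    return 1.0 / minutes_per_word

# Assumed inputs: 100 wpm dictation and 30 s to locate and fix each wrong word.
dictation = effective_wpm(raw_wpm=100, accuracy=0.95, seconds_per_fix=30)
print(round(dictation, 1))
```

      Under those assumptions the net rate falls to roughly 29 wpm, below a modest typing speed, which matches the experience described. At 99% accuracy the same model gives a much better figure, so the result is very sensitive to the error rate and to how painful each correction is.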

      And yeah, most touch typists can't get more than 35 wpm and touch screens are worse, so the deck is stacked to an extent.

    2. Re:Texting isn't typing by wonkey_monkey · · Score: 2

      On a full English language keboard

      Heh.

      there is no way speech is faster if you know how to type.

      Nah, that's just not true. Most professional typists don't exceed 100wpm, while the average person talks at 130-150wpm.

      If typing was so much faster than speaking, they wouldn't do live subtitles by having someone repeat the words into a mic for speech recognition. Which is what they do, with occasionally hilarious results.

      --
      systemd is Roko's Basilisk.
    3. Re:Texting isn't typing by Anonymous Coward · · Score: 0

      Yes, Via Voice wasn't that great...was good for the era. I can speak much faster than I can type, as I can only type around 50 wpm. The only time I can type faster is if I have to pause to think about what I'm going to say.

      Try using Google's voice dictation in English, then maybe compare that to Baidu...that's a more fair comparison.

    4. Re:Texting isn't typing by Anonymous Coward · · Score: 0

      It's only faster than typing for slow typists or when writing in diacritic-heavy or non-latin languages where every character is an adventure in alt-keycodes or IME methods.

    5. Re:Texting isn't typing by maynard · · Score: 1

      Speaking is not writing. It impacts quality of prose. Though it might be a good way to bang out an initial rough, I'd still want to fine tune phrasing and word choice by keyboard.

    6. Re:Texting isn't typing by xvan · · Score: 1

      As far as I know, live activities use stenotypers. Maybe they started using speech recognition for live captioning because it's cheaper, but it's the first time I've heard about it.

    7. Re:Texting isn't typing by gordguide · · Score: 1

      The keyboard layout of a modern computer / laptop is based on the typewriter key layout. The interesting thing is that the layout was deliberately crafted to *slow down* type speed, as the typists of the day (a hundred years ago) would type faster than the machines could render the text, leading to jamming. So, not only do I not believe that any voice recognition software can keep up with a touch typist, it's probable that with different key layouts (which exist, I know, but no-one actually uses them) a touch typist could type well beyond the common 40~60 wpm many of us can manage easily, and the somewhat less common 80~100 wpm some are capable of, and kick this Voice Recognition "breakthrough" all over the block. And how much do you want to bet that a new keyboard, throw in some training hours even, would cost less than whatever hardware and software and training hours this is going to require?

      Because a touch typist never looks at the keyboard but always at the output (versus a touchscreen virtual keyboard user, who almost always will be continuously switching his or her gaze from the keyboard to the output, back to the keyboard, etc) corrections take as little as a fraction of a second and even amongst those whose typing speed is less than 40 wpm, not much longer. The old "thousand monkeys"(1) could probably beat someone making a correction on a virtual keyboard because the software interface, as clever as it may be, is clumsy as hell and that is probably as advanced as it's going to get without live editing with a stylus and text recognition that actually works.

      Which brings us to ...

      I first used text recognition software twenty years ago. It worked about 95% of the time, and you had to make tedious corrections on the rest. Two decades later, we have computers that are maybe a thousand times faster, and who knows how many hours of software development invested, and it works about 95% of the time, and you have to make tedious corrections on the rest. To me this pretty much establishes this is an application of technology that is un-solvable, or at least un-solvable until we are past the point where we have moved on and no-one needs it to work.

      Voice recognition seems to me to be just a fancy variation of the same thing, with the same fundamental flaws, designed to fleece people out of the funds in the software budget.

      (1) "If you gave a thousand monkeys a thousand keyboards and waited long enough, they eventually would produce the complete works of Shakespeare."

    8. Re:Texting isn't typing by Anonymous Coward · · Score: 0

      35 wpm? There must be way more arthritics in the world than I thought.

    9. Re:Texting isn't typing by thegarbz · · Score: 1

      and with a display and backspace key (since I'm human) my ultimate accuracy is 100%.

      That applies to every form of input. Texting only has a low accuracy rate because people need to make corrections, kind of like you are doing which makes you 100%.

      Also you typing 100wpm is atypical. Most people don't type that fast. No actually that's not right. I'll wager that very very few people are able to type that fast.

    10. Re:Texting isn't typing by chill · · Score: 1

      Then you have a major speech impediment and should probably see a therapist for it.

      Using your post as a sample, I am able to read it aloud in 22 seconds at a conversational rate. This is the same rate I use reading stories aloud to my children. Using my slower, more enunciated "speech recognition" voice, usually reserved for Google input, it takes me 37 seconds and the only thing I had to correct afterwards was the ( and ) you used. That includes all of your punctuation and the automatic correction of "keboard" to "keyboard" that you missed. With Dragon Dictate, I'm closer to the 22 seconds than the 37, but I only use that at work.

      Typing it out -- and I can touch type at quite a good clip -- gives me 75 seconds. That included two uses of the backspace key to correct my typing errors.

      Yes, I greatly prefer to edit text using a keyboard. It allows for deliberation of thought as well as a much greater precision in actual editing. But, for just getting words to the page, even with basic formatting in place, speech recognition is by far faster.
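
      For anyone who wants the ratios behind those timings, here is a small sketch using only the three numbers reported above (22, 37 and 75 seconds for the same post). The labels are just names chosen for this example.

```python
# Seconds taken to produce the same post, as reported in the comment above.
timings_s = {
    "conversational reading": 22,
    "dictation voice": 37,
    "touch typing": 75,
}

typing_s = timings_s["touch typing"]
# How many times faster each spoken method was than typing the same text.
speedup = {
    name: typing_s / seconds
    for name, seconds in timings_s.items()
    if name != "touch typing"
}

for name, factor in speedup.items():
    print(f"{name}: {factor:.2f}x faster than typing")
```

      This is one person's single trial, so the factors say nothing about typists or speakers in general.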

      --
      Learning HOW to think is more important than learning WHAT to think.
    11. Re:Texting isn't typing by yes-but-no · · Score: 1

      For that 5%, I think it would be good to use the best of both worlds. That is, the software produces the written text while allowing you to edit manually using the keyboard. You may mark some words using a command-word like 'fix-it'. Example: you say 'The thickness needs to be reduced' and the s/w produces 'The sickness needs to be reduced'. You are visually seeing it, so you then say 'fix-it thickness', and it will highlight the word 'sickness', which you can come back to later and manually change to 'thickness'. [The software may also record your audio clip so when you come back it can remind you of what you said before.]

    12. Re:Texting isn't typing by TheSync · · Score: 1

      "Revoicing" is becoming more popular for live TV captioning. Revoicers, also known as respeakers, repeat clearly what is being said during unscripted events using special software that's trained to recognise their voice. Their speech is then converted into text which appears on a caption unit, an LED or large screen. Revoicers also need to pare down (edit) the live dialogue or conversation, which means the text that appears isn't verbatim, although it will always give a good idea of what's being said.

    13. Re:Texting isn't typing by Anonymous Coward · · Score: 0

      I type (on a standard qwerty keyboard) at 120+ wpm.

      I can't keep up with a normal speaking pace. Sure, if there are enough pauses I can just barely do it, but, even being in the top 1% of touch typists, I can't keep up with a normal person speaking.

      Qwerty wasn't built to "slow down" so much as to "prevent jamming" - it was made to slow certain things down a bit.
      Could I increase my speed a bit if I switched to Dvorak or whatever? Probably, but it's going to be a fractional improvement, not a big jump.

      Good speech recognition will beat my typing speed for regular English text input any day.
       

    14. Re:Texting isn't typing by nbauman · · Score: 1

      On a full English language keboard there is no way speech is faster if you know how to type.

      How fast do you type?

      I've transcribed hundreds of hours of tapes, mostly lectures and panel discussions. I tested ~72 wpm. I spent a lot of time perfecting my typing methods and speed.

      I estimated that most lectures were about 120 wpm. Some people talk much faster, particularly in bursts. I think certified courtroom stenographers have to pass a test at 210 wpm.

      I could never keep up with continuous speech. I used a transcribing machine, and played it back at a slower speed, and/or backpedaled. I could usually keep up with normal lectures without pausing when I reduced the speed to 50%.

      I know a lot of people who transcribe lectures and interviews, and there is a general consensus that it takes about 3 hours to transcribe 1 hour of speech. That's with good accuracy and corrections. It probably takes about 2 hours for a rough draft. I could never do it in 1 hour.
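
      The numbers above hang together if you work them through. A minimal sketch, using the 72 wpm typing speed, the 120 wpm lecture estimate and the 3:1 and 2:1 time ratios from the comment (the function names are made up for this example):

```python
def playback_fraction(typing_wpm, speech_wpm):
    """Fastest playback speed (as a fraction of real time) a typist can keep up with."""
    return min(1.0, typing_wpm / speech_wpm)

def net_wpm(speech_wpm, work_hours_per_audio_hour):
    """Net transcription speed when one hour of audio takes the given hours of work."""
    return speech_wpm / work_hours_per_audio_hour

print(playback_fraction(72, 120))  # theoretical ceiling for a 72 wpm typist
print(net_wpm(120, 3))             # finished, corrected transcript
print(net_wpm(120, 2))             # rough draft
```

      A 72 wpm typist tops out at 60% playback in theory, so needing 50% in practice leaves a little headroom for bursts. The 3:1 rule of thumb works out to about 40 wpm net, well below raw typing speed once rewinding and corrections are counted.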

    15. Re: Texting isn't typing by Anonymous Coward · · Score: 0

      And after your ultra fast speech to text is done, you have to spend double the time fixing the errors. But hey, progress.

  6. I'm not the average typist by Snotnose · · Score: 3, Interesting

    When I got my Trash 80 back in the 70s the first program I bought was a typing tutor. I've been touch typing for some 40 years and my fingers don't have any trouble keeping up with my brain. My mouth, not so much.

    1. Re:I'm not the average typist by Anonymous Coward · · Score: 0

      When I got my TRS 80 back in the 70s

      There, I autocorrected that for you.

  7. Voice recognition? by Gaygirlie · · Score: 1

    The article talks about speech recognition, not voice recognition. EditorDavid has the two concepts mixed up: speech recognition is all about trying to recognize what you are saying, whereas voice recognition is all about recognizing a specific voice, e.g. for identifying who is speaking.

    1. Re:Voice recognition? by the_povinator · · Score: 2

      The article talks about speech recognition, not voice recognition. EditorDavid has the two concepts mixed up: speech recognition is all about trying to recognize what you are saying, whereas voice recognition is all about recognizing a specific voice, e.g. for identifying who is speaking.

      [actual expert here]

      Not exactly: "speech recognition" means taking in speech and putting out some kind of text; "speaker recognition" is a general term for identifying speakers or verifying speaker identity. "Voice recognition" is a term that is not used in the field (but is sometimes used in the media) which generally means the same thing as "speech recognition".

      --
      The .sig is dead, and I believe I had a hand in killing it.
  8. What about autocorrect? by wonkey_monkey · · Score: 1

    The results showed that the TalkType error rate was 20.4% lower than an English texter hunting and tapping for letters.

    How many of those errors could have been reliably corrected by some form of autocorrect, or was such already included in the tests?

    If I try and type "thw quick rbown fox jump sover the lazy dog" as fast as I can... well, that's the result. Autocorrect could have fixed most of those problems.
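
    As a rough illustration of how a dictionary-based corrector might handle that sentence, here is a sketch using Python's standard difflib module. The eight-word vocabulary is a toy assumption; a real keyboard would use a large dictionary plus context, and this is not how any particular phone does it.

```python
import difflib

# Toy vocabulary (assumption): only the words of the intended sentence.
VOCABULARY = ["the", "quick", "brown", "fox", "jumps", "over", "lazy", "dog"]

def autocorrect(text, vocabulary=VOCABULARY, cutoff=0.6):
    """Replace each unknown word with its closest vocabulary word, if any is close enough."""
    fixed = []
    for word in text.split():
        if word in vocabulary:
            fixed.append(word)
            continue
        # get_close_matches ranks candidates by similarity ratio, best first.
        matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
        fixed.append(matches[0] if matches else word)
    return " ".join(fixed)

print(autocorrect("thw quick rbown fox jump sover the lazy dog"))
```

    With a vocabulary this small, even the misplaced space in "jump sover" gets repaired, because each fragment happens to sit close to exactly one known word. With a full dictionary there would be many more near misses to choose from, which is where context has to come in.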

    --
    systemd is Roko's Basilisk.
    1. Re:What about autocorrect? by xvan · · Score: 1

      The issue with autocorrect is that it takes damn long to stop it from correcting words it doesn't recognize.

    2. Re:What about autocorrect? by gordguide · · Score: 1

      The FIRST thing you do, when you get a new "smart" phone, is turn auto-correction off.

      About a month later, you will discover you never needed it in the first place. Plus, you will never have to deal with people who mis-interpret your meaning in your text communication, as the improperly spelled uncorrected version of whatever you were trying to say will be instantly recognizable by whomever is reading it for what it was supposed to be, because as humans we are very, very good at that.

      Why would anyone want to substitute a perfectly spelled and completely out-of-context substitution, unless, of course, you consider communication to be a skill valued lower than dirt?

  9. Oh Good... by skam240 · · Score: 2

    Oh good, more assholes yelling into their phones while in public spaces. That's exactly what we need.

    --
    I ignore Anonymous Coward posts. If you want to discuss something, that's awesome. Log in.
  10. Leads to hunt-n-smash by fyngyrz · · Score: 4, Insightful

    Another thing -- when I'm typing, and there is an error, I'm right there to correct it.

    With voice recog, at least right now, editing it after it's been screwed up by Google or whatever is more of a PITA than just typing it out in the first place.

    Trying to actually do decent editing (at least on my S7) is seriously annoying. Cursor positioning is flaky as hell, parts of messages disappear above and below the edit point, I try to drag the edit point and it scrolls up or down so fast there's no chance of actually getting where I meant to go...

    I grant you that this kind of thing is the result of bad design at some level in Android or some library most everyone is using, and could be corrected... but right now, it's SN/AFU. That's a big factor in why editing as I go, rather than trying to get "somewhere" in something already containing lots of text, is much easier on my temper.

    That said, I would welcome 99.99999% accurate voice recog. Not holding my breath, though.

    --
    I've fallen off your lawn, and I can't get up.
    1. Re: Leads to hunt-n-smash by gweilo8888 · · Score: 1

      Not just that, but it will very frequently give you a bunch of different choices, all of which change a dozen-word phrase instead of the single word you're trying to correct. It's absolutely and totally unusable, and a moronic design that's the result of trying to be far too clever.

    2. Re:Leads to hunt-n-smash by Flavianoep · · Score: 1

      Have you tried Gravity Box? It can activate a pair of arrow keys. Needs root.
      That's it, unfortunately: if you want to have some arrow keys to position the edit point more conveniently, you have to root your phone!

      --
      Linux is for people who don't mind RTFM.
  11. Are those... clutches? by fyngyrz · · Score: 0
    --
    I've fallen off your lawn, and I can't get up.
  12. Better than typing? by Anonymous Coward · · Score: 0

    No, it's really not.

  13. Suspicious... by myowntrueself · · Score: 2

    One of the things that characterises modern Chinese language is the proliferation of homophones (words that sound alike).

    The way that Chinese people cope with this is extreme use of context and of spelling; the homophones don't have the same character. Sometimes Chinese people will clarify meaning by sketching a character in the air, often unconsciously.

    If the error rate reduction is so huge based on speech recognition, this would suggest that pinyin can replace characters for writing Chinese. And this has been disproved on many occasions; you can literally write an entire story using only the syllable 'ma'. In pinyin it all comes out as 'ma' with the 4 tones. In characters it's actually readable. Same with the story of the lion-eating poet in the stone den, which is all 'shi'.

    So a great test of this Baidu software would be to get someone to read this to it and see what it comes up with:

    https://chinesepod.com/blog/ho...

    https://en.wikipedia.org/wiki/...

    and see if it gets it right:

    Shī Shì shí shī shǐ

    Shíshì shīshì Shī Shì, shì shī, shì shí shí shī.

    Shì shíshí shì shì shì shī.

    Shí shí, shì shí shī shì shì.

    Shì shí, shì Shī Shì shì shì.

    Shì shì shì shí shī, shì shǐ shì, shǐ shì shí shī shìshì.

    Shì shí shì shí shī shī, shì shíshì.

    Shíshì shī, Shì shǐ shì shì shíshì.

    Shíshì shì, Shì shǐ shì shí shì shí shī.

    Shí shí, shǐ shí shì shí shī shī, shí shí shí shī shī.

    Shì shì shì shì.
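
    To make the information loss concrete, here is a small sketch that takes just the title of the poem and counts distinct units at three levels: characters, pinyin with tones, and pinyin with the tone marks stripped. The tone stripping via Unicode decomposition is simply one convenient way to do it, not anything Baidu is known to use.

```python
import unicodedata

def strip_tones(pinyin):
    """Remove tone marks by decomposing letters and dropping combining accents."""
    decomposed = unicodedata.normalize("NFD", pinyin)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

title_characters = "施氏食狮史"            # title of the poem in simplified characters
title_pinyin = "Shī Shì shí shī shǐ"       # the same title in toned pinyin

characters = set(title_characters)
toned = {syllable.lower() for syllable in title_pinyin.split()}
toneless = {strip_tones(syllable).lower() for syllable in title_pinyin.split()}

print(len(characters), len(toned), len(toneless))
```

    Five distinct characters collapse to four toned syllables and then to a single toneless one, so anything working from sound alone has to recover the rest from context.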

    --
    In the free world the media isn't government run; the government is media run.
    1. Re: Suspicious... by stevedog · · Score: 1

      That would seem to actually favor an engine like this. A good autocorrect does use context, but generally it only has access to what you have said before, not after, the current word (at least during the initial input). In such a context-dependent environment as you describe, being able to retroactively go back and change earlier text based on closely subsequent input (as speech recognition software often does, but keyboards generally don't) would seem especially valuable.

      In fact, Google's voice accuracy is often due to exactly that: at least for me, it will often initially have something very wrong, but then end up with the right recognition result based on the rest of the input ("navigate to Pete's" -> "navigate to pizza restaurant").
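
      A toy version of that "fix it up once more input arrives" behaviour, as a sketch only: each slot in the utterance has one or more candidate words, and the decoder picks the sequence with the best product of word-pair scores. The candidates and scores below are invented for this example and have nothing to do with Google's or Baidu's actual models.

```python
from itertools import product
from math import prod

# Hypothetical word-pair scores; unseen pairs fall back to a small default.
BIGRAM = {
    ("navigate", "to"): 1.0,
    ("to", "Pete's"): 0.6,
    ("to", "pizza"): 0.4,
    ("Pete's", "restaurant"): 0.05,
    ("pizza", "restaurant"): 0.9,
}

def decode(slots):
    """Pick the best-scoring word sequence, trying every combination of candidates."""
    def score(words):
        return prod(BIGRAM.get(pair, 0.01) for pair in zip(words, words[1:]))
    return max(product(*slots), key=score)

heard = [["navigate"], ["to"], ["Pete's", "pizza"], ["restaurant"]]
print(decode(heard[:3]))  # before the last word has been heard
print(decode(heard))      # after the whole phrase has been heard
```

      With only the first three slots the decoder prefers "Pete's"; once "restaurant" arrives, the whole-sequence score flips the earlier word to "pizza". Brute force is fine for four slots, but a real recognizer would need something like dynamic programming or beam search.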

    2. Re: Suspicious... by Anonymous Coward · · Score: 0

      This is not modern Mandarin. It's old Chinese. Modern Mandarin words on average are two syllables long. There are lots of homophones but it's nowhere as bad as it seems.

    3. Re: Suspicious... by Dayze!Confused · · Score: 1

      Chinese input using New Phonetic Method actually does this too. Basically after I type the sounds then tone and move on to the next character it will change the first character based on the sounds and tones of the following characters and continue to do so until I press enter. Often I will type out an entire sentence before pressing enter, though sometimes it starts to spit out bad results if you go for too long. It also does some recognition based off previous character choice such as when using gendered pronouns, which I would assume the speech recognition they are claiming would have a lot of trouble with as the pronoun in speech is the same for all genders, such as the word you, or s/he/it.

      On the iPhone they go one step further with a slightly different form of input where even the tone is not required unless you are having trouble finding the correct character. I can start typing just the sounds of multiple characters strung together and the suggestions bar will show possibilities. Going a step even further, my Taiwanese friend showed me that you don't even have to type out all the sounds, just the beginning sound is enough, so I can type out an idiom by merely pressing the first phonetic symbol of each of the typically four characters. The problem with these latter methods is training to know which words can easily be found by omitting the ending phonetics and which cannot.
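
      The first-sound shortcut can be sketched as a simple index from initials to phrases. Note the assumptions: this uses the first letter of each pinyin syllable as a stand-in for the first phonetic symbol, and the three-idiom dictionary is a toy; a real input method has a far larger lexicon and ranks the candidates.

```python
# Toy idiom dictionary (assumption): idiom -> toneless pinyin reading.
IDIOMS = {
    "一石二鸟": "yi shi er niao",
    "马马虎虎": "ma ma hu hu",
    "守株待兔": "shou zhu dai tu",
}

def initials(reading):
    """First letter of each syllable, e.g. 'ma ma hu hu' -> 'mmhh'."""
    return "".join(syllable[0] for syllable in reading.split())

# Build the lookup table: initials -> all idioms sharing those initials.
INDEX = {}
for idiom, reading in IDIOMS.items():
    INDEX.setdefault(initials(reading), []).append(idiom)

def lookup(keys):
    """Return the idioms whose syllable initials match the typed keys."""
    return INDEX.get(keys, [])

print(lookup("mmhh"))
print(lookup("ysen"))
```

      The training problem mentioned above shows up as collisions: the more phrases share the same initials, the longer the candidate list and the less useful the shortcut.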

      I'm assuming Baidu is probably doing this for Simplified Chinese rather than Traditional, though it would be great if it could do it for both.

      The largest problem still remains, most likely, with bilingual input. When typing, I can easily switch between Chinese and English, but speech recognition has so far not offered this option. At my in-laws' house, conversation can easily go from Mandarin to Taiwanese, with random English words thrown in within the same sentence and not missing a beat. Humans have no problem recognizing the meaning of these kinds of sentences if they speak, or at least understand, all three languages. Even a basic understanding isn't necessarily required: as is the case with English in this situation, there tend to be certain words that everyone understands without being able to speak any more of the language.

      --
      "All tyranny needs to gain a foothold is for people of good conscience to remain silent." [Thomas Jefferson]
    4. Re:Suspicious... by Anonymous Coward · · Score: 0

      That poem is actually the typical (and very good) example used by Cantonese advocates to highlight one of the problems with Mandarin: with only 4 tones, it's no wonder homophones are a big issue. The Cantonese language, on the other hand, has 9 tones, which means that homophones are less of a problem. This, of course, has a big effect on how the use of the two languages has evolved over time. Mandarin users tend to use different words (characters, more accurately) to describe the same thing compared to Cantonese users, and this difference is in turn reflected in written material as well (in addition to the fact that the Chinese now use simplified written characters while Hong Kongers and Taiwanese use traditional characters). Thus in many ways the Hong Kong/Cantonese culture has more of a traditional link to ancient Chinese culture than modern day China.

  14. 'hunting' and typing? by phantomfive · · Score: 1

    I don't think the typical person 'hunts' and types anymore. Maybe 30 years ago......

    --
    "First they came for the slanderers and i said nothing."
    1. Re: 'hunting' and typing? by Anonymous Coward · · Score: 0

      If most people touch-typed, backlit keyboards wouldn't be a thing.

  15. No big brother by zaphirplane · · Score: 1

    What, not a single comment about the Big Brother implications... especially for a Chinese company, when each of them is accused of being an extension of the party?
    And I don't care about "look what Yahoo did" or whatever; being an extension of the party is different from complying with the law in a democracy.

  16. Icy by Anonymous Coward · · Score: 0

    I see

  17. NSA announces a competing product by Cyberax · · Score: 1

    NSA is going to announce a competing keyboard. They're already digitizing everything but now they'll be sending you a copy as a courtesy and for proofreading and correction. Win-win!

  18. The problem isn't the low error rate... by MMC+Monster · · Score: 1

    The problem is the false sense of security and subsequent lack of proofreading and error correction.

    Try voice recognition software for a week. You'll likely find that you will read over something that you dictated and not realize that there are errors in it. People are less likely to find errors in something they dictated than in something they typed.

    --
    Help! I'm a slashdot refugee.
  19. Current Keyboard Was Designed For Speed by HannethCom · · Score: 1

    Not sure where you got your incorrect information from, but the current keyboard was designed for maximum speed of the day, not to slow down typing speeds.
    While it is true that there are keyboard layouts that can make typing faster on a computer, the current keyboard was designed to space out the hammers so they would not jam on typewriters thus increasing the speed people could type.

    --
    Microsoft, Apple, Google, Amazon what's the difference? All steal money from devs and control with walled gardens.
  20. Character versus word errors by goombah99 · · Score: 1

    A character error is easy to read past be a word error changes the meeting.

    --
    Some drink at the fountain of knowledge. Others just gargle.
    1. Re:Character versus word errors by fyngyrz · · Score: 1

      a word error changes the meeting.

      I complete agree: a turd error changes the meating. [send] (goddammit)

      --
      I've fallen off your lawn, and I can't get up.
  21. Huh? by John+Allsup · · Score: 1

    I can type more precisely and quickly and confidently than I can talk. Deciding which words to type is faster and easier. How is voice recognition going to improve on the speed by which I can speak, which is inferior to my typing ability whenever precision is required?

    --
    John_Chalisque
  22. Meanwhile back at the ranch... by BrianMahoney1357 · · Score: 1

    ...the app is probably keeping track of everything a user speaks/types and sending it back home to China.