Slashdot Mirror


Google's New Voice Recognition System Works Instantly and Offline (If You Have a Pixel) (techcrunch.com)

Google's latest speech recognition works entirely offline, eliminating the delay that many other voice assistants incur before answering your query. "The delay occurs because your voice, or some data derived from it anyway, has to travel from your phone to the servers of whoever operates the service, where it is analyzed and sent back a short time later," reports TechCrunch. "This can take anywhere from a handful of milliseconds to multiple entire seconds (what a nightmare!), or longer if your packets get lost in the ether." The only major downside of Google's new system is its limited availability: for now, it is available only to people with a Pixel smartphone. From the report:

Why not just do the voice recognition on the device? There's nothing these companies would like more, but turning voice into text on the order of milliseconds takes quite a bit of computing power. It's not just about hearing a sound and writing a word -- understanding what someone is saying word by word involves a whole lot of context about language and intention. Your phone could do it, for sure, but it wouldn't be much faster than sending it off to the cloud, and it would eat up your battery. But steady advancements in the field have made it plausible to do so, and Google's latest product makes it available to anyone with a Pixel.

Google's work on the topic, documented in a paper here, built on previous advances to create a model small and efficient enough to fit on a phone (it's 80 megabytes, if you're curious), but capable of hearing and transcribing speech as you say it. No need to wait until you've finished a sentence to think whether you meant "their" or "there" -- it figures it out on the fly. So what's the catch? Well, it only works in Gboard, Google's keyboard app, and it only works on Pixels, and it only works in American English. So in a way this is just kind of a stress test for the real thing.
"Given the trends in the industry, with the convergence of specialized hardware and algorithmic improvements, we are hopeful that the techniques presented here can soon be adopted in more languages and across broader domains of application," writes Google in their blog post.

3 of 41 comments

  1. the reason offline function is available.. by Anonymous Coward · · Score: 2, Insightful

    is simply because pixel is google, and the spy shit will still end up getting transmitted later when a connection is available. it has nothing to do with 'computing power' of the device. early dragon naturallyspeaking worked on lowly 486dx and pentiums running windows 95 and nt 4. all it takes is a little 'training' of the user's voice, and a trained dragon 1.0 did just as well back then, as current shit does today. current iterations of 'voice assistants' still do the 'training' for voices.. just 'in the cloud'.. cuz spying is good for profits and it allows untrained voices to be mostly recognized most of the time.

    1. Re:the reason offline function is available.. by epine · · Score: 3, Insightful

      and a trained dragon 1.0 did just as well back then, as current shit does today

      You're completely nuts.

      Dragon did okay back in the day if you bought exactly the right condenser microphone, positioned it exactly right on your headband (about 2" away from your lips, just off to the side of your mouth), trained it properly in exactly that configuration, and used it in a quiet environment with no dogs barking, doors slamming down the hall, traffic noise through the open window, etc. It was also good to avoid getting allergies, coming down with a cold, or starting or stopping smoking, unless you wanted to train your model again with your "new" voice.

      It's the same deal with squash rackets. The original graphite rackets from the early 1980s had a powerful sweet spot, but it wasn't very big. They also shattered every tenth time you scuffed the wall hard by accident. Then they started to monkey with the head shape, and the sweet spot expanded to the size of a cantaloupe. The graphite eventually became less brittle, too.

      But that old sweet spot the size of a mandarin orange sure was just as good as the modern shit today.

  2. Yea, lots of power by Khyber · · Score: 3, Insightful

    "but turning voice into text on the order of milliseconds takes quite a bit of computing power."

    Uhh, Dragon Naturally Speaking worked on fucking Pentium II processors. It only takes a lot of computing power today because nobody knows how to fucking code.
