Google's New Voice Recognition System Works Instantly and Offline (If You Have a Pixel) (techcrunch.com)

Posted by BeauHD on Tuesday March 12, 2019 @01:30PM from the eliminate-the-middleman dept.

Google's latest speech recognition works entirely offline, eliminating the delay many other voice assistants incur before responding to a query. "The delay occurs because your voice, or some data derived from it anyway, has to travel from your phone to the servers of whoever operates the service, where it is analyzed and sent back a short time later," reports TechCrunch. "This can take anywhere from a handful of milliseconds to multiple entire seconds (what a nightmare!), or longer if your packets get lost in the ether." The only major downside of Google's new system is its limited availability: as of right now, it's only available to people with a Pixel smartphone. From the report:

Why not just do the voice recognition on the device? There's nothing these companies would like more, but turning voice into text on the order of milliseconds takes quite a bit of computing power. It's not just about hearing a sound and writing a word -- understanding what someone is saying word by word involves a whole lot of context about language and intention. Your phone could do it, for sure, but it wouldn't be much faster than sending it off to the cloud, and it would eat up your battery. But steady advancements in the field have made it plausible to do so, and Google's latest product makes it available to anyone with a Pixel.

Google's work on the topic, documented in a paper here, built on previous advances to create a model small and efficient enough to fit on a phone (it's 80 megabytes, if you're curious), but capable of hearing and transcribing speech as you say it. No need to wait until you've finished a sentence to think whether you meant "their" or "there" -- it figures it out on the fly. So what's the catch? Well, it only works in Gboard, Google's keyboard app, and it only works on Pixels, and it only works in American English. So in a way this is just kind of a stress test for the real thing. "Given the trends in the industry, with the convergence of specialized hardware and algorithmic improvements, we are hopeful that the techniques presented here can soon be adopted in more languages and across broader domains of application," writes Google in their blog post.

41 comments

I'd make that by Anonymous Coward · 2019-03-12 13:43 · Score: 0

What's the audio input range and rate?
How many microphones made out of what?
1. Re:I'd make that by Anonymous Coward · 2019-03-12 13:57 · Score: 0
  
  Presumably enough to log all conversation when in flight mode for later uploading.
the reason offline function is available.. by Anonymous Coward · 2019-03-12 14:11 · Score: 2, Insightful

is simply because pixel is google, and the spy shit will still end up getting transmitted later when a connection is available. it has nothing to do with 'computing power' of the device. early dragon naturallyspeaking worked on lowly 486dx and pentiums running windows 95 and nt 4. all it takes is a little 'training' of the user's voice, and a trained dragon 1.0 did just as well back then as current shit does today. current iterations of 'voice assistants' still do the 'training' for voices.. just 'in the cloud'.. cuz spying is good for profits and it allows untrained voices to be mostly recognized most of the time.
1. Re:the reason offline function is available.. by Solandri · 2019-03-12 16:23 · Score: 3, Informative
  
  Most of the software functionality of the Pixel 3 has been hacked and extracted. You can install it on your Android device running Nougat or later if you're rooted with Magisk. If this offline voice recognition is done in software instead of dedicated hardware (like the original Moto X), expect it to be made available for other rooted devices as well.
2. Re:the reason offline function is available.. by ShanghaiBill · 2019-03-12 17:18 · Score: 3, Interesting
  
  early dragon naturallyspeaking worked on lowly 486dx and pentiums running windows 95 and nt 4
  That was just speech-to-text. Google is going much further than that, with semantic understanding of what you are saying. That requires way more compute power. On a cell phone, this has only been viable with sub-second response times since mobile GPUs got decent support for CUDA and OpenCL.
3. Re:the reason offline function is available.. by AmiMoJo · 2019-03-12 20:55 · Score: 1
  
  I was wondering how the conspiracy would evolve in light of this news.
  
  --
  const int one = 65536; (Silvermoon, Texture.cs)
  SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
4. Re:the reason offline function is available.. by squiggleslash · 2019-03-13 00:35 · Score: 1
  
  This is all true, but actually what most of us want is something simple, not the all singing all dancing "Show me some pictures of ShanghaiBill in the nude" thing that Hey Google can do.
  Specifically 90% of us would prefer it if CALL <contact-name> AT <location-type-eg-cell-or-home-or-work> was one of a small number of commands pressing the button on your bluetooth actually recognized.
  Unfortunately, Google's insistence on making it more complex than that means the ability to voice dial has been completely fucked up. Bugs in it, such as it redirecting to the wrong audio device for the call itself, have never been fixed, presumably because the resources are elsewhere. Every few months the entire thing stops working because you're supposed to accept some new terms and conditions that it won't speak to you. At one point it would try to match your contact name with any vaguely related contact name and then give you options: "Call bill" would result in "Do you mean? Call Bill at home, Call William Defoe at home. Call William Defoe at work. Call William Tell at work. Call Will Smith on mobile. BEEEEEEEEEEEEEEEEEEEEEP!" (Session ends, no way to actually select one of these options, and it was fucking obvious which option was the right one in the first place.)
  And of course, it stops working altogether when you have no data connection available, or if the data available is congested or just unreliable.
  No-one, other than the marketing team at Google, has ever asked for an AI to handle basic voice commands on phones. Google tries to do too much, which makes their service objectively less useful than the voice dialing feature on flip phones around 2004.
  
  --
  You are not alone. This is not normal. None of this is normal.
5. Re:the reason offline function is available.. by epine · 2019-03-13 01:37 · Score: 3, Insightful
  
  and a trained dragon 1.0 did just as well back then, as current shit does today
  You're completely nuts.
  Dragon did okay back in the day if you bought exactly the right condenser microphone, positioned it exactly right on your headband (about 2" away from your lips just off to the side of your mouth), trained it properly in exactly that configuration, and used it in a quiet environment with no dogs barking, slamming doors down the hall, traffic noises through the open window, etc. Also, it was good to avoid getting allergies, coming down with a cold, or starting/stopping smoking, unless you wanted to train your model again with your "new" voice.
  It's the same deal with squash rackets. The original graphite rackets from the early 1980s had a powerful sweet spot, but it wasn't very big. They also shattered every tenth time you scuffed the wall hard by accident. Then they started to monkey with the head shape, and the sweet spot expanded to the size of a cantaloupe. The graphite eventually became less brittle, too.
  But that old sweet spot the size of a mandarin orange sure was just as good as the modern shit today.
6. Re:the reason offline function is available.. by AmiMoJo · 2019-03-13 01:40 · Score: 1
  
  Their stated aim is the computer on Star Trek TNG, i.e. you can have a natural language conversation with it. They are actually getting there too - it understands context and follow-up questions in many cases.
  The computer on Star Trek is actually kinda great. You can ask it for a cup of tea and it will ask what kind, what temperature etc. But you can also use shortcuts, like "tea, Earl Grey, hot". It teaches you how to use it just by talking to it, because next time you remember the follow-up questions it had and state the info up front.
  You can do that with Google Assistant already for some stuff, e.g. setting reminders.
  
  --
  const int one = 65536; (Silvermoon, Texture.cs)
  SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
7. Re:the reason offline function is available.. by rahvin112 · 2019-03-13 05:40 · Score: 2
  
  The reason it's just the Pixels is that the Pixel phones contain a special piece of Google-produced silicon (an ASIC) that can dramatically accelerate speech to text. They are some of the only phones with this extra AI chip. It was also one of the huge selling points of the Pixel, as they use this same AI chip to do their photo magic.
8. Re:the reason offline function is available.. by doconnor · 2019-03-13 08:13 · Score: 1
  
  You mean like a text adventure that ran on an 8088?
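The fixed command squiggleslash asks for above, CALL <contact-name> AT <location-type>, is close to what a text-adventure parser does. A minimal sketch in Python, assuming a hypothetical `parse_call` helper, a made-up contact list, and "cell" as the default location; none of this is how Google's voice dialing actually works:

```python
import re

# Hypothetical fixed grammar: CALL <contact-name> [AT <location-type>]
LOCATION_TYPES = {"cell", "home", "work"}
COMMAND = re.compile(
    r"^call\s+(?P<name>.+?)(?:\s+at\s+(?P<where>\w+))?$",
    re.IGNORECASE,
)

def parse_call(utterance, contacts):
    """Return (contact, location) for an exact match, else None."""
    match = COMMAND.match(utterance.strip())
    if match is None:
        return None  # not a CALL command at all
    name = match.group("name").lower()
    # Assumption: a missing location falls back to "cell".
    where = (match.group("where") or "cell").lower()
    if where not in LOCATION_TYPES:
        return None
    # Exact, case-insensitive name match only: no fuzzy list of candidates.
    for contact in contacts:
        if contact.lower() == name:
            return contact, where
    return None
```

With contacts "Bill" and "William Defoe", "Call Bill at home" resolves to Bill at home, while "Call William at work" returns None instead of reading out a list of near matches.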
What will an ad company do? by AHuxley · 2019-03-12 14:29 · Score: 1

Voice recognition for more ads.

--
Domestic spying is now "Benign Information Gathering"
1. Re: What will an ad company do? by Anonymous Coward · 2019-03-12 17:02 · Score: 0
  
  I wish they'd do it like Roku. You have to hold the mic button down while speaking.
Battery by darkain · 2019-03-12 14:41 · Score: 2

I love how their reasoning is battery life... The mic is already turned on 24/7 to listen to the "OK Google" command, so that doesn't change. And then the actual audio is only about 1-5 seconds in length that takes about the same amount of time to process. Having the CPU at max for such a short period of time does absolutely nothing to significantly drain the battery. Do they think that having the radio turned on to transmit/receive data from the cloud magically uses less power?
1. Re:Battery by alvinrod · 2019-03-12 16:15 · Score: 1
  
  I'm assuming that until recently the processors weren't able to handle the processing quickly enough. I wouldn't be surprised to learn if there's some dedicated hardware that's been added to the SoCs in the latest phones that enable doing this on the device itself. Apple put in some dedicated neural network hardware in their latest SoC to help with photography and other workloads. I believe that the Qualcomm SoC in the Pixel has similar hardware that might be useful for doing speech to text.
2. Re: Battery by darkain · 2019-03-12 17:48 · Score: 1, Interesting
  
  For reference, Microsoft's Speech API launched in 1995 - Note that the fastest consumer processor in the world at the time was the Intel Pentium (original) in the 200MHz range. https://en.wikipedia.org/wiki/...
3. Re: Battery by Anonymous Coward · 2019-03-12 20:59 · Score: 2, Informative
  
  And its quality was so poor you never hear of anyone actually using it. So what's your point?
4. Re:Battery by rahvin112 · 2019-03-13 05:45 · Score: 1
  
  The OK Google listening never leaves the phone. It's processed locally to save bandwidth and battery, and nothing is sent off the phone until it has recognized the OK Google.
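The gating rahvin112 describes, where audio stays on the device until the hotword is recognized, can be sketched roughly as follows. This is an illustration only: `gate_on_hotword`, the `is_hotword` detector, and the fixed `query_frames` window are all hypothetical, and plain strings stand in for audio frames.

```python
def gate_on_hotword(frames, is_hotword, query_frames=3):
    """Yield only the frames that follow a detected hotword.

    frames: iterable of audio frames (plain strings as stand-ins here).
    is_hotword: hypothetical local detector, frame -> bool.
    query_frames: frames to forward after each detection (assumed value).
    """
    remaining = 0
    for frame in frames:
        if remaining > 0:
            remaining -= 1
            yield frame  # gate is open: forward to the recognizer
        elif is_hotword(frame):
            # Open the gate; the hotword frame itself is not forwarded.
            remaining = query_frames
        # Otherwise the frame is dropped without leaving the device.
```

Anything said before the hotword, or after the query window closes, is never yielded, which is the property the comment is pointing at.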
This would be great on a Pixel 3... by Anonymous Coward · 2019-03-12 16:23 · Score: 0

If the mic was cranky due to bad QA...
TechCrunch neophytes? by Etcetera · 2019-03-12 16:55 · Score: 2

Whoever wrote this story speaks with the voice of someone who seems like they couldn't possibly understand why *anyone* would prioritize data-stays-on-device, non-cloud, privacy-related living.
Is this what the new generation of tech journalists is like? With no conception of out-dated functions like data locality and operational independence? Someone who couldn't imagine why someone would download local audio instead of streaming it from their cloud service?

--
Hire a Linux system administrator, systems engineer,
1. Re:TechCrunch neophytes? by iampiti · 2019-03-12 21:02 · Score: 1
  
  I guess they're young people for whom being spied on 24/7 by Google and Facebook is a normal thing. They may not even have known anything else, so they're not used to concepts like privacy and having data locally on your devices, or they may even think those things are outdated concepts only missed by "old" people.
2. Re:TechCrunch neophytes? by Anonymous Coward · 2019-03-13 05:17 · Score: 0
  
  Modern journalism is the equivalent of being a real estate agent or a sales person.
I guess they've finished... by Anonymous Coward · 2019-03-12 17:35 · Score: 1

... snarfing up enough voice samples to last them a few decades ...
In Korea... by Anonymous Coward · 2019-03-12 23:20 · Score: 1

In Korea, only old people have digital privacy.
On device voice recognition by doconnor · 2019-03-13 01:37 · Score: 1

15 to 20 years ago there was voice recognition software that ran on PCs without using the cloud. The average smartphone has more computing power than a 20-year-old PC and should be able to do it easily.
Yea, lots of power by Khyber · 2019-03-13 01:50 · Score: 3, Insightful

"but turning voice into text on the order of milliseconds takes quite a bit of computing power."
Uhh, Dragon Naturally Speaking worked on fucking Pentium II processors. It only takes a lot of computing power today because nobody knows how to fucking code.

--
Still waiting on Serviscope_minor to wake up to fucking reality and realize that Jessica Price isn't going to fuck him.
1. Re:Yea, lots of power by Anonymous Coward · 2019-03-13 04:09 · Score: 1
  
  Dragon Naturally Speaking never worked. It made letters appear in your word processor, sure, but it never worked.
2. Re:Yea, lots of power by Anonymous Coward · 2019-03-13 04:22 · Score: 0
  
  This post is emblematic of the quality of posters and general level of intelligence on Slashdot.
3. Re:Yea, lots of power by radarskiy · 2019-03-13 05:22 · Score: 1
  
  A Pentium II alone drew more power than the charger for a Pixel can even theoretically supply, and you'd still need more power for the rest of the computer around that Pentium II.
I may actually use it by Anonymous Coward · 2019-03-13 01:53 · Score: 0

I may actually use voice commands if it's done completely offline. I'll wait for it to come to LineageOS though because I still don't want Play Store and associated services on my device.
Language by Shotgun · 2019-03-13 02:20 · Score: 1

"and it only works in American English"
Correction. It only works in white, upper middle-class Californian English.
It will not work for black men from Atlanta, Bostonian housewives, anyone from Appalachia, Valley Girls, anyone from "New Yawk", anyone that knows the words to Blake Shelton's "Boys Round Here", etc.
Having a deep voice, and born and raised in central North Carolina, my experience with all voice recognition I've ever encountered has been "less than pleasurable". Getting it to work generally involves trying to imitate the voices of people I work with from out west.

--
Aah, change is good. -- Rafiki
Yeah, but it ain't easy. -- Simba
There is dedicated hardware for neural networks by SuperKendall · 2019-03-13 02:59 · Score: 3, Informative

I wouldn't be surprised to learn if there's some dedicated hardware that's been added to the SoCs in the latest phones that enable doing this on the device itself.
Yes, just like Apple has the Neural Engine, Google has the Pixel Visual Core.
The name is misleading because from what I can tell (and what the article says) it is like the Apple chip, and can help with arbitrary neural network processing.
What I'm not sure of is the speed of the iPhone chip compared to the Pixel one, the iPhone chip took quite a leap in speed this year...

--
"There is more worth loving than we have strength to love." - Brian Jay Stanley
Is there source code for this by Anonymous Coward · 2019-03-13 05:31 · Score: 0

If they are releasing for the phone itself does that mean the source code will be available?
Hot damn by dcooper_db9 · 2019-03-13 12:01 · Score: 1

I've got like, 480,000 pixels!

--
I do not block ads. I do block third party scripts.
what a nightmare! by DeVilla · 2019-03-14 17:02 · Score: 1

This can take anywhere from a handful of milliseconds to multiple entire seconds (what a nightmare!), or longer if your packets get lost in the ether.
Dunno about a nightmare, but it can suck.

User: Wiretap. How late is the hardware store open?
...
User: Hello? Wiretap?
User: HELLO COMPUT...
Wiretap: The hardware store is open until 6pm.
User: ... ok.
Wiretap: I'm sorry. I don't understand compute.
...