Google Launches More Realistic Text-To-Speech Service Powered By DeepMind's AI (theverge.com)


Posted by BeauHD on Tuesday March 27, 2018 @01:25PM from the better-than-ever dept.

Google is launching a new AI voice synthesizer, named Cloud Text-to-Speech, that will be available for any developer or business that needs voice synthesis on tap, whether that's for an app, website, or virtual assistant. The Cloud Text-to-Speech service is being powered by WaveNet, software created by Google's UK-based AI subsidiary DeepMind. The Verge explains why this is significant: First, ever since Google bought DeepMind in 2014, it's been exploring ways to turn the company's AI talent into tangible products. So far, this has meant using DeepMind's algorithms to reduce electricity costs in Google's data centers by 40 percent and DeepMind's forays into health care. But, directly integrating WaveNet into its cloud service is arguably more significant, especially as Google tries to win cloud business away from Amazon and Microsoft, presenting its AI skills as its differentiating factor. Second, DeepMind's AI voice synthesis tech is some of the most advanced and realistic in the business. Most voice synthesizers (including Apple's Siri) use what's called concatenative synthesis, in which a program stores individual syllables -- sounds such as "ba," "sht," and "oo" -- and pieces them together on the fly to form words and sentences. This method has gotten pretty good over the years, but it still sounds stilted.

WaveNet, by comparison, uses machine learning to generate audio from scratch. It actually analyzes the waveforms from a huge database of human speech and re-creates them at a rate of 24,000 samples per second. The end result includes voices with subtleties like lip smacks and accents. When Google first unveiled WaveNet in 2016, it was far too computationally intensive to work outside of research environments, but it's since been slimmed down significantly, showing a clear pipeline from research to product. The Verge has embedded some samples in their report to see how WaveNet sounds.

34 comments


No thanks. by Anonymous Coward · 2018-03-27 13:36 · Score: 0

No one with good sense is going to use ( and become dependent on ) a service Google provides, because Google has a long and dishonorable history of abruptly killing off products.
Add to the above that Google will be recording and mining everything you say, and if you still think it's a good idea to use their stuff, you deserve no more sympathy than the sheep who used Facebook.
1. Re:No thanks. by Anonymous Coward · 2018-03-27 13:53 · Score: 0
  
  Right, typical Google service.
  Google will keep it running as long as they can collect new info about this market and its users. Then they will kill the service. You can be sure the info will be used -- somewhere.
2. Re: No thanks. by Anonymous Coward · 2018-03-27 14:38 · Score: 0
  
  That info will be used to replace americans with shitty smelly parasitic hindu-chimps.
3. Re:No thanks. by Anonymous Coward · 2018-03-27 16:13 · Score: 0
  
  I like how they blatantly lie about this using AI. AI doesn't exist no matter how many things marketing label as such.
Nice try by duke_cheetah2003 · 2018-03-27 13:44 · Score: 2

Given Google's history of taking things away, I would not build anything that depends on this. It will probably disappear in a year.
1. Re:Nice try by rtb61 · 2018-03-27 15:55 · Score: 0
  
  Nope, you might not be able to access it in a year, but you can bet three-letter agencies will want to use it to convert all your, and I do mean all of 'YOU', spoken words into data-minable transcripts. Welcome to the panopticon https://en.wikipedia.org/wiki/... brought to you by the let's-be-evil company, watching 'YOU' all of the time, trying to control 'YOU' all of the time.
  How many hours or days in the week should you be spending disconnected to remain free?
  
  --
  Chaos - everything, everywhere, everywhen
2. Re:Nice try by Anonymous Coward · 2018-03-27 16:18 · Score: 0
  
  depends on how google can monetize all the data contained within the documents they transcribe or associate the text to google|gmail|youtube accounts.
  this is some scary stuff... google cannot be trusted with even the data they have now.. add this to it? fuck. that is, we're fucked.
3. Re:Nice try by ShanghaiBill · 2018-03-27 20:44 · Score: 1
  
  you can bet three-letter agencies will want to use it to convert all your, and I do mean all of 'YOU', spoken words into data-minable transcripts
  I'll take that bet, and win. RTFA, or RTFS, or even RTFH. This is text-to-speech, not speech-to-text.
4. Re:Nice try by Anonymous Coward · 2018-03-27 21:10 · Score: 0
  
  Don't waste your time. rtb61 typically reads about half the words, gets triggered, then rattles off some barely on-topic "thoughts".
5. Re:Nice try by caseih · 2018-03-28 02:37 · Score: 1
  
  It's not just Google that does this. Amazon bought the Ivona TTS system and completely killed it for anyone other than users of the Kindle Fire who want to use the Ivona voices that ship with it.
  I have to admit Salli is an amazing TTS voice on the Fire; sure wish I could use it on my Android phone. I've tried different ways of extracting it but haven't had any luck.
6. Re:Nice try by rtb61 · 2018-03-28 15:14 · Score: 1
  
  What goes in one direction, goes in the other, like duhh, 1+1=2, 2-1=1.
  
  --
  Chaos - everything, everywhere, everywhen
BA-OO-SHT by Anonymous Coward · 2018-03-27 13:45 · Score: 0

Too expensive for me.
They sound pretty much the same to me by Anonymous Coward · 2018-03-27 13:52 · Score: 0

Am I missing something?
Features not available by Anonymous Coward · 2018-03-27 13:55 · Score: 0

Has anyone played around with this?
The demo takes SSML, but it does not appear to support <prosody> functionality :-(
Kill the Cloud by Etcetera · 2018-03-27 14:08 · Score: 2

Hopefully, the Tech Awakening we're experiencing in the US at a consumer level might trickle upwards into actual products as well.
No way in hell I'm going to rely on something I have to use a remote service for, which is no doubt collecting and storing as many bits of data as possible. I don't need a human-sounding voice *that* badly that I can't wait for someone to figure out how to get 95% of what this does running on a few cores, or perhaps spare GPU capacity.

--
Hire a Linux system administrator, systems engineer,
1. Re:Kill the Cloud by pthisis · 2018-03-27 14:47 · Score: 2
  
  I mostly agree, though if the license on the generated audio is liberal enough I could see using this to create audio books of public domain texts in a crowd-sourced project. Feed the texts through (which, if distributed reasonably, shouldn't really be a significant privacy intrusion; the information's all out there already) and then save it for future use so it's still available even if the cloud service goes down.
  
  --
  rage, rage against the dying of the light
Poorer Quality by DavenH · 2018-03-27 14:10 · Score: 2

These voices are quite a far cry from the results of the original wavenet paper. I suppose a lot of computational tradeoffs happened, but these are Siri-level, not human level.
1. Re:Poorer Quality by JohnStock · 2018-03-30 12:37 · Score: 1
  
  It's way beyond Siri. In fact, the non-WaveNet-powered Google Assistant is slightly better and less robotic than Siri.
This will be very useful by Anonymous Coward · 2018-03-27 14:18 · Score: 0

This will be very useful, to telemarketers.
1. Re:This will be very useful by ShanghaiBill · 2018-03-27 21:01 · Score: 1
  
  This will be very useful, to telemarketers.
  T2V is already good enough for telemarketers. Their problem is not generating the voice, but semantic analysis of the replies.
  I get an occasional spam call where I am not 100% sure if it is a human or a robot. So I try to immediately force it off script by asking something like "What color underwear are you wearing?" Sometimes the call is disconnected, sometimes it is forwarded to a human, and sometimes the robot tries to get back on script. But best of all, sometimes it is an actual human, who will sometimes hang up, sometimes give a flustered response, and sometimes say something creative like "I'm not wearing any underwear".
Next up: Celebrity voices by Anonymous Coward · 2018-03-27 14:19 · Score: 0

Personally, I'd go for the Terminator
1. Re:Next up: Celebrity voices by PPH · 2018-03-27 14:49 · Score: 1
  
  the Terminator
  Are you sure you know what you want?
  
  --
  Have gnu, will travel.
I only need it to say one thing by darthsilun · 2018-03-27 14:21 · Score: 1

I'm sorry Dave. I'm afraid I can't do that.
Re: Lip smacks and accents? Impressive by Anonymous Coward · 2018-03-27 14:44 · Score: 0

Lips smack when cocks get suck
Great! by PPH · 2018-03-27 14:51 · Score: 3, Funny

I'll finally figure out how to pronounce 'doge'.

--
Have gnu, will travel.
Intelligence services boasted 10 years ago about by Anonymous Coward · 2018-03-27 16:32 · Score: 0

having the ability to do this.
So what took Google so long?
how about translating chinese? by Anonymous Coward · 2018-03-27 21:06 · Score: 0

Why doesn't Google use all that fancy AI talent for translating text numbers in Chinese into text numbers in English? I can't believe it's all that hard to do, yet they fail all the time.
I'm unimpressed by Anonymous Coward · 2018-03-27 21:51 · Score: 0

If it ain't Majel Barrett, it ain't shit.
FTFY by neo-mkrey · 2018-03-28 01:01 · Score: 1

The Verge has embedded some samples in their report to hear how WaveNet sounds.
When voice works locally by Anonymous Coward · 2018-03-28 01:04 · Score: 0

and doesn't have to rely on the cloud to work, then, maybe, I might work with it. Otherwise my voice patterns are in the cloud and subject to data breaches. That's as bad as actually putting your signature on an electronic signature pad.
Re:Intelligence services boasted 10 years ago abou by tinkerton · 2018-03-28 02:18 · Score: 1

The NSA does speech-to-text so they can collect all voice calls and index them for text search and trigger words. This is about text-to-speech.
Things sure got better by tinkerton · 2018-03-28 02:20 · Score: 1

If you check the competitor voice generation in the article, it's also pretty good. Things have improved since Radiohead's depressing song 'Fitter Happier'.
https://www.youtube.com/watch?...
pico tts by kbahey · 2018-03-28 04:45 · Score: 1

Why 'cloud' when local works well?
$ sudo aptitude install libttspico-utils
$ pico2wave -w h.wav "Hello World"
$ aplay h.wav
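
The same three commands can be wrapped to read a whole text file aloud, one line at a time. This is only a rough sketch: `speak_file`, the `line_N.wav` naming, and the `DRY_RUN` switch are made-up names for illustration, and it assumes `pico2wave` and `aplay` are installed as above. Only the `-w` flag already shown is used.

```shell
# Hypothetical helper: synthesize and play each non-empty line of a text file.
# Usage: speak_file input.txt [output_dir]
# With DRY_RUN=1 it only prints the pico2wave commands it would run.
speak_file() {
    infile=$1
    outdir=${2:-.}
    n=0
    # The "|| [ -n ... ]" part also handles a last line without a trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        # Skip blank lines so we do not create empty wav files.
        [ -z "$line" ] && continue
        n=$((n + 1))
        wav="$outdir/line_$n.wav"
        if [ "${DRY_RUN:-0}" = 1 ]; then
            printf 'pico2wave -w %s "%s"\n' "$wav" "$line"
        else
            # Keep the tools from consuming the loop's stdin.
            pico2wave -w "$wav" "$line" </dev/null && aplay "$wav" </dev/null
        fi
    done < "$infile"
}
```

Running `DRY_RUN=1 speak_file notes.txt /tmp` first shows what would be synthesized without touching the sound card; the wav files stay on disk afterwards, so nothing depends on a remote service.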

--
2bits.com, Inc: Drupal, WordPress, and LAMP performance tuning.