Google's Voice-Generating AI Is Now Indistinguishable From Humans (qz.com)
An anonymous reader quotes a report from Quartz: A research paper published by Google this month -- which has not been peer reviewed -- details a text-to-speech system called Tacotron 2, which claims near-human accuracy at imitating audio of a person speaking from text. The system is Google's second official generation of the technology, which consists of two deep neural networks. The first network translates the text into a spectrogram, a visual way to represent audio frequencies over time. That spectrogram is then fed into WaveNet, a system from Alphabet's AI research lab DeepMind, which reads the chart and generates the corresponding audio. The Google researchers also demonstrate that Tacotron 2 can handle hard-to-pronounce words and names, as well as alter the way it enunciates based on punctuation. For instance, capitalized words are stressed, as someone would do to indicate that a specific word is an important part of a sentence. Quartz has embedded several examples in its report, each featuring a sentence generated by the AI along with a sentence read aloud by a human hired by Google. Can you tell which is the AI-generated sample?
Despite the choice of a low-quality human comparison (the audio fidelity is fine, but the timing and pronunciation are terrible), it is still quite obvious which is which. The synthesized version is slightly too clipped, and its timing does not sound natural.
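For readers wondering what a "spectrogram" actually is: it is the magnitude of short Fourier transforms taken over successive frames of audio. The sketch below is a naive pure-Python illustration only; the frame size, hop, and Hann window are arbitrary choices, and Tacotron 2 itself predicts mel-scale spectrograms rather than this plain linear one.

```python
import cmath
import math


def spectrogram(signal, frame_size=64, hop=32):
    """Magnitude spectrogram: one row per frame, one column per frequency bin.

    Uses a naive DFT for clarity; real systems use an FFT.
    """
    # Periodic Hann window to reduce spectral leakage at frame edges
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size) for n in range(frame_size)]
    frames = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(frame_size)]
        # Keep bins 0..N/2 only; for real input the rest mirror these
        bins = []
        for k in range(frame_size // 2 + 1):
            acc = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                      for n in range(frame_size))
            bins.append(abs(acc))
        frames.append(bins)
    return frames


# A 1 kHz tone sampled at 8 kHz: each bin is 8000 / 64 = 125 Hz wide
tone = [math.sin(2 * math.pi * 1000 * n / 8000) for n in range(512)]
spec = spectrogram(tone)
peaks = [max(range(len(f)), key=f.__getitem__) for f in spec]
print(peaks)  # every frame peaks at bin 8 (1000 Hz / 125 Hz)
```

Each row is one moment in time and each column one frequency band, which is the "chart" the summary says WaveNet reads.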
Slashdot: where Don Knuth is an idiot because he can't grasp the awesome power of PHP
Robocalls! :-D
Just yesterday we saw a thread about someone giving Alexa the skills to ask questions. Now we see Google Home is answering them. Set one against the other and watch the fun!
sed -e 's/Chuck Norris/Rajnikant/g' joke > fact
Duuuuude, it's AI!!!! Everything you can label "AI" gets a shit ton of page views.
Even my doorbell has AI in it, because it rings when it "knows" someone is at the door looking for me.
Words matter, caveman. What we are calling "AI" is definitely artificial, but not intelligent. If we are going to start calling computer programs "AI" just to start another VC hype cycle, then what is the point? Microsoft Word is "AI".
Listen for the "plosives", the "p" or "b" sounds. All text-to-speech systems get them wrong, because they are generally built from recorded speech that is heavily band-limited. There are reasons for that. Digital recording uses analog-to-digital converters, whose fidelity is limited by the sampling rate. To reduce the amount of digital storage and processing required, the designers of both recording and synthesis tools lower the sampling frequency as far as possible. They also add low-pass filters on the inputs and the outputs, to keep sharp step functions from generating undesired artifacts on the output and to keep weird "beat" harmonics with the sampling frequency from confusing the recorded inputs. But the result is smearing of sharp sounds that are rich in transients, such as "t" and "p". And dear lord, does it screw up languages with "click" sounds like Zulu.
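The smearing claim is easy to illustrate. This minimal sketch idealizes a plosive onset as a step from silence to full amplitude and uses a moving average as a stand-in for the low-pass filters described above (an assumption; real converters use far better filters). The 10% to 90% rise time grows as the filter gets wider, i.e. as the bandwidth shrinks.

```python
def moving_average(signal, width):
    """Crude FIR low-pass: each output is the mean of the last `width` inputs.

    Samples before the start of the signal are treated as zero.
    """
    out = []
    for i in range(len(signal)):
        window = signal[max(0, i - width + 1): i + 1]
        out.append(sum(window) / width)
    return out


def rise_time(signal, low=0.1, high=0.9):
    """Number of samples taken to go from 10% to 90% of the final value."""
    final = signal[-1]
    t_low = next(i for i, x in enumerate(signal) if x >= low * final)
    t_high = next(i for i, x in enumerate(signal) if x >= high * final)
    return t_high - t_low


# Idealized plosive onset: silence, then full amplitude
step = [0.0] * 20 + [1.0] * 40

print(rise_time(step))                      # 0: instantaneous onset
print(rise_time(moving_average(step, 8)))   # 7: onset smeared over several samples
print(rise_time(moving_average(step, 16)))  # 13: narrower bandwidth, more smearing
```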
I'm going to guess this is with an American accent. I've yet to hear a Google voice that says "kilometres" the way we do in Ireland. (It's something I find a little irritating when using Google Maps for navigation.)
Everyone is going to call it AI, though.
Everyone can be wrong, of course, but who loses in normal conversation? The Average Joe or a pedant?
I'm sure the technology will be referred to in the correct terms by the people who use and probably invented the correct terms. For everyone else, there's AI.
I'm impressed with the progress, but annoyed at how the results are oversold. First, they seem to have asked the human comparison speaker to sound like a robot, and she succeeded, but credit for that doesn't go to the robot. Second, they only demonstrated sentences that fit in one breath. The way humans read a paragraph or a book chapter requires us to adjust our pauses for breath and our pacing to the content being read. I expect that Google know this and are working on it, and to be fair to them, it was Slashdot and not they who came up with the "as good as humans" line. But I'm still annoyed.
One thing that seems to be missing from all of these is a programmatic understanding of how much air is in the lungs.
"Alexa, what is 69! (factorial)"
Listen in amazement as she rhymes off the number, but then enter the uncanny valley right about the time she should be taking a breath...
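For scale, a quick check with Python's arbitrary-precision integers: 69! has 99 digits, and it is the largest factorial below 10^100 (the limit of calculators with two-digit exponents). That is a lot of digits to read out in one breath.

```python
import math

value = math.factorial(69)
digit_count = len(str(value))
print(digit_count)  # 99
```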
I am not interested in articles about life extension advancements.
Same with my electric heater. The thermostat has built-in AI, so it knows to turn the heater off when it gets too hot.
Vintage computer adverts: http://www.vintageadbrowser.com/computers-and-software-ads
The problem is that some people expect AI to be like something from a sci-fi movie, and they know that sci-fi AI and real-world AI are nothing alike. For a layman it doesn't really matter; it's all magic anyway. "Deep learning neural networks" is a bit of a mouthful and doesn't get the point across as well as "AI", even if some people have unrealistic expectations about what AI is supposed to be. Complaining about it is nonsense semantics anyway; whatever you call it won't change what it is.
No, they don't.
It's funny how angry you keep getting every time the word AI appears in a slashdot article.
And yet, for all your rants, nothing changes. The world keeps on using AI to mean what you insist it doesn't mean.
In the English language, popular use determines meanings. So, this word has attained a new meaning, whether you approve of it or not.
But hey, keep posting your angry rants. Maybe they will go viral and convince the world to change.
Sounds like bullshit. A CD is only 650 MB and holds 74 minutes of high-quality audio. Who cares about the amount of digital storage needed for a couple of "b" and "t" samples?
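The storage argument is easy to check. A minimal sketch, assuming Red Book CD audio (44.1 kHz, 16-bit, stereo), 75 sectors per second, 2048-byte Mode 1 data sectors, and a hypothetical 50 ms plosive recorded in mono:

```python
SAMPLE_RATE_HZ = 44_100       # Red Book CD audio
BYTES_PER_SAMPLE = 2          # 16-bit PCM
SECTORS_PER_SECOND = 75       # CD sector rate
DATA_BYTES_PER_SECTOR = 2048  # Mode 1 data payload per sector


def pcm_bytes(ms, rate=SAMPLE_RATE_HZ, width=BYTES_PER_SAMPLE, channels=2):
    """Raw PCM size for `ms` milliseconds of audio (integer math avoids rounding)."""
    return ms * rate // 1000 * width * channels


def data_capacity_mib(minutes):
    """Data capacity of a disc of the given playing time, in MiB."""
    return minutes * 60 * SECTORS_PER_SECOND * DATA_BYTES_PER_SECTOR / 2**20


per_second = pcm_bytes(1000)              # one second of stereo CD audio
eighty_minutes = pcm_bytes(80 * 60 * 1000)
plosive = pcm_bytes(50, channels=1)       # hypothetical 50 ms mono burst

print(per_second, eighty_minutes, plosive)
print(data_capacity_mib(74), data_capacity_mib(80))
```

By this arithmetic a 74-minute disc holds about 650 MiB of data and an 80-minute disc about 703 MiB, while the plosive itself needs roughly 4.4 KB.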
If you smash a pickaxe through your eye, you will no longer care what people call AI, and we won't have to read your inane shit. It's a win/win.
Dude, the proper definition of AI is obvious - It's whatever computers can't yet do.
And a hacker is someone who enjoys making technology do interesting things. Good luck trying to redefine common language.
For that matter, this isn't even "common" language. Researchers in the field call it AI as well, and have for decades. When necessary they distinguish between strong AI and weak AI, but most of the time it's not necessary because strong AI doesn't yet exist.
Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
I'm looking for a decent smart doorbell. I'd like one that rings when someone who doesn't live in my house approaches the door. It should have a button for backup.
Of course this is more "AI" baloney, as you can clearly tell it is speech synthesis.
Meanwhile, actual speech synthesis researchers are acutely aware that mimicking human speech requires dedicating significant NLP resources to generating correct prosody, which may very well be hard or next to impossible without the machine actually understanding what the text is about.
Ezekiel 23:20
Hey google, read all slashdot comments to me with a sarcastic tone.
When necessary they distinguish between strong AI and weak AI, but most of the time it's not necessary because strong AI doesn't yet exist.
And you haven't even started distinguishing between AI the result (what you're talking about) and AI the field (which you need to have before you arrive at the former).
I feel your pain, binary. You should relax, though. Remember the "mainframe", "cloud", and "e-" buzzwords? Everything will be called AI for a short while because it sounds cool and advanced to the masses, but this buzzword shall pass.
I do not like it. It is unsettling.
Brought to you by Carl's Junior.
I guess I need to listen to it to see just how bad it is. You make it seem like William Shatner should be worried about losing work to automation.
About 10 or so years ago, there was an automated voice reading weather reports on an HDTV sub-channel. I think it was actually the official National Weather Service radio audio. Whenever it came across "patchy fog", it would always say "patch-eef ogg". So now I'm expecting that times a hundred.
#naabhaprzrag, #sverubfr-000, #agi-fcbafberq, negvpyr[pynff*=' negvpyr-ary-'] { qvfcynl: abar !vzcbegnag; }
Clippy was AI.
My first program:
Hell Segmentation fault
When I was a kid, 35 years ago, I had a TI-99/4A home computer with a speech synthesizer (which was actually five-year-old tech at the time). Sure, it didn't sound great, but it was totally understandable. With the Terminal Emulator II cartridge you could build words from phonemes directly and thus have it say any English word, not just the words in its predefined "dictionary" of words it already knew how to pronounce. That was 35 years ago, on a consumer-grade home computer running at 3 MHz that a 10-year-old was goofing around with for fun.
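The difference between a fixed vocabulary and direct phoneme input can be sketched like this. Everything here is hypothetical: the word list, the ARPAbet-style phoneme labels, and the function names are illustrations, not the TI-99/4A's actual speech commands.

```python
# Hypothetical built-in vocabulary: word -> phoneme sequence (ARPAbet-style labels)
VOCABULARY = {
    "hello": ["HH", "AH", "L", "OW"],
    "computer": ["K", "AH", "M", "P", "Y", "UW", "T", "ER"],
}

# Hypothetical set of sounds the synthesizer can voice
PHONEMES = {"AH", "ER", "F", "HH", "K", "L", "M", "N", "OW", "P", "T", "UW", "Y"}


def speak_word(word):
    """Dictionary mode: returns phonemes for known words, None otherwise."""
    return VOCABULARY.get(word.lower())


def speak_phonemes(spec):
    """Phoneme mode: the user spells out the sounds, e.g. "F AH N" for "fun"."""
    sounds = spec.upper().split()
    unknown = [s for s in sounds if s not in PHONEMES]
    if unknown:
        raise ValueError(f"unsupported phonemes: {unknown}")
    return sounds


print(speak_word("fun"))         # None: not in the built-in vocabulary
print(speak_phonemes("F AH N"))  # ['F', 'AH', 'N']: any word built from known sounds
```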
The fact that we didn't reach "Indistinguishable From Humans" in TTS *years* ago is not saying much for the state of our software.
Here's an example of it speaking... https://youtu.be/0vu1GftX02Q?t...
Better known as 318230.
The storage and CPU cost of recording audio are so small that they reached the point of irrelevance 15-20 years ago, for low-end consumer hardware. More like 40 years ago for professional grade equipment - around the time that CDs were introduced. Despite what a bunch of "audiophile" sites trying to push a product will tell you, it is not difficult, expensive, or taxing in any way to work with PCM audio of a sufficient bit depth and sampling rate to cover the entire range of human hearing. Or even dog hearing!
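The Nyquist arithmetic behind this is short: to capture content up to some frequency, you must sample at more than twice that frequency. The sketch assumes 20 kHz as the upper limit of human hearing and, loosely, 45 kHz for dogs (published figures vary); the bit depth and channel count are also assumptions.

```python
HUMAN_MAX_HZ = 20_000  # common textbook upper limit of human hearing
DOG_MAX_HZ = 45_000    # assumption: one commonly cited upper limit for dogs


def nyquist_rate(max_frequency_hz):
    """Sampling rate must exceed this to represent content up to max_frequency_hz."""
    return 2 * max_frequency_hz


def megabytes_per_minute(rate_hz, bits=24, channels=2):
    """Uncompressed PCM data rate in (decimal) megabytes per minute."""
    return rate_hz * bits // 8 * channels * 60 / 1_000_000


print(nyquist_rate(HUMAN_MAX_HZ))   # 40000
print(nyquist_rate(DOG_MAX_HZ))     # 90000
print(megabytes_per_minute(96_000)) # about 34.56 MB/min at 96 kHz, 24-bit stereo
```

Even the 96 kHz case, which covers the assumed dog-hearing range, works out to a few tens of megabytes per minute.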
But regarding speech synthesis specifically, there is software out there, still being used by somebody I'm sure, that was designed to run on consumer PCs back in the 90s. At that time, on those systems, there were computational limits that were relevant to sound quality. Whatever outdated software Stephen Hawking uses sounds like it renders the output at no higher than a 10 or 12 kHz sampling rate (compared with the 40 to 50 kHz needed to cover the human hearing range). But the sampling rate is a very small part of why Hawking sounds bad. The artifacts you hear from a low sampling rate are mostly limited to high-frequency sounds being cut. (And possibly temporal smearing, depending on how you filter.) It sounds similar to turning the treble knob on your stereo all the way down.
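The filtering caveat matters because of aliasing. A minimal sketch, assuming the 12 kHz rate mentioned above: sampled with no anti-aliasing filter, a 7 kHz tone yields exactly the samples of a phase-inverted 5 kHz tone, so it would come out at the wrong pitch. Low-pass filtering before sampling removes such content instead, which is the "treble knob" effect.

```python
import math


def sample_sine(freq_hz, rate_hz, count):
    """Sample a unit sine wave at rate_hz, with no filtering beforehand."""
    return [math.sin(2 * math.pi * freq_hz * n / rate_hz) for n in range(count)]


RATE_HZ = 12_000
NYQUIST_HZ = RATE_HZ / 2  # 6 kHz: nothing above this survives sampling intact

above = sample_sine(7_000, RATE_HZ, 48)            # 1 kHz above Nyquist
alias = sample_sine(RATE_HZ - 7_000, RATE_HZ, 48)  # folds down to 5 kHz

# Largest difference between the 7 kHz samples and an inverted 5 kHz tone
max_diff = max(abs(a + b) for a, b in zip(above, alias))
print(f"Nyquist: {NYQUIST_HZ:.0f} Hz, max difference: {max_diff:.1e}")
```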
The quality problems with Hawking's synthesizer go way beyond a treble knob. Things like pacing, emphasis, minor slurring of certain sounds that are adjacent to each other, etc... problems that you take care of by making the software more intelligent, not upping the sample frequency. Which is exactly what Google is doing, and making some progress at it too. No, it doesn't sound like a human yet.
I would think if they were trying to showcase their technology they would have chosen someone with a less "robotic" voice to copy. I guess they just wanted someone who spoke very clearly?
Words matter, caveman. What we are calling "AI" is definitely artificial, but not intelligent. If we are going to start calling computer programs "AI" just to start another VC hype cycle, then what is the point? Microsoft Word is "AI".
People really need to start modding these types of comments as Troll and move on. AI has included basic algorithms used as a stand in for intelligent thought since the field arguably began at The Dartmouth Summer Research Project on Artificial Intelligence over 60 years ago. At the time they were very aware of how difficult it could be to define intelligence, so they intentionally did not let that limit what was considered artificial intelligence research.
Today both researchers and science journalists agree that machine learning and neural networks fall within the field of artificial intelligence. That is all that matters, not your personal feelings about what the field should be.
-- All that is necessary for the triumph of evil is that good men do nothing. -- Edmund Burke
But not in science they don't! AI has a definite scientific meaning.
And since its inception in the 1950s, AI has included basic algorithms used to approximate the results of intelligent thought.
Before the mid-1900s, if you saw the term AI, it would almost certainly have meant artificial insemination, so I assure you the meaning of AI has changed over time.
Imagine if every book could be accessed by those who want to listen instead of read! Not a trivial development at all.
No, they don't.
Yes, they do.
If your argument was somehow about "AI" specifically, you can see ranton's comment and/or picture how "AI" can become another instance of the example words I linked to.
Words matter, caveman. What we are calling "AI" is definitely artificial, but not intelligent. If we are going to start calling computer programs "AI" just to start another VC hype cycle, then what is the point? Microsoft Word is "AI".
There's a straightforward difference. If the logic (or business logic, or branching structure / conditionals) was authored by a human programmer then we call it a conventional program. If the logic was an emergent property of running a learning algorithm over a training set, then we call it AI.
This is a practically useful distinction for us working software engineers. Why? The latter can't usefully be checked into source control itself; only its training data can. You can't diff it. The typical bugs you get are very different between the two: the first kind of software has weird discontinuous edge cases, while the latter is generally "smooth". We engineers need different skill sets to develop and debug the two, and the way we respond to requirements specs differs between them. Each has its strengths for particular classes of problems: compiler-writing is dominated by the first kind, while real-world sensory processing was done at first by the first kind (such as OpenCV) up to 2010, but has since been wholly eclipsed by the second kind.
No, Microsoft Word isn't "AI" under this commonly-used definition.
If you want to keep railing against it, why not (1) recognize that it's a practically useful distinction to make, (2) come up with a term you think is better?
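The conventional-program versus learned-logic distinction can be made concrete with a toy example. This is a sketch with hypothetical names and made-up training data: a one-feature spam rule written by hand, versus the same kind of rule whose threshold is chosen by a trivial learning algorithm (an exhaustive search for the best decision stump), standing in for the far larger neural networks in the article.

```python
def is_spam_handwritten(num_links):
    # Conventional program: a human chose this rule and its threshold
    return num_links > 3


def learn_threshold(examples):
    """Pick the threshold that misclassifies the fewest training examples.

    `examples` is a list of (num_links, is_spam) pairs. The returned rule's
    behavior comes from the data, not from a programmer's choice.
    """
    best_t, best_errors = None, None
    for t in sorted({x for x, _ in examples}):
        errors = sum((x > t) != label for x, label in examples)
        if best_errors is None or errors < best_errors:
            best_t, best_errors = t, errors
    return lambda num_links: num_links > best_t


# Hypothetical labeled training data
training = [(0, False), (1, False), (2, False), (5, True), (6, True), (8, True)]
learned = learn_threshold(training)

print(is_spam_handwritten(3), learned(3))  # False True: the two rules disagree
```

Changing the behavior of `learned` means changing `training`, not editing the code, which is the source-control point made above.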
A research paper published by Google this month -- which has not been peer reviewed -- details a text-to-speech system called Tacotron 2, which claims near-human accuracy at imitating audio of a person speaking from text.
If anyone remembers "reading groups" from primary school, there is a pretty big range in the term "human accurate reading".
Good enough for Hawking maybe.
I'd prefer a nice high-class British female voice, or Paul Bettany as Jarvis.
I think it would be more accurate to say that Google and a speaker talking in a monotonous, robotic way are pretty much indistinguishable from one another. They both sound robotic to me. When it can imitate what normal people really sound like, then talk to me. Not that this isn't cool, but from the cursory bits I read and heard, it seems to be overhyped.
In a few years, AI will progress to the point that it sounds more human than humans.
Have you heard about the woman working in a tourist shop on "The Sunshine Coast" of British Columbia, Canada?
She sells sea shells on the Sechelt Peninsula.
I'm not repeating myself
I'm an X window user; I'm an ex-Windows user
I like Australian Siri and wish Alexa would offer similar accents. $0.02
-==- Buy a Mac and leave me alone!