Slashdot Mirror


Google's Voice-Generating AI Is Now Indistinguishable From Humans (qz.com)

An anonymous reader quotes a report from Quartz: A research paper published by Google this month -- which has not been peer reviewed -- details a text-to-speech system called Tacotron 2, which claims near-human accuracy at imitating audio of a person speaking from text. The system is Google's second official generation of the technology, which consists of two deep neural networks. The first network translates the text into a spectrogram, a visual way to represent audio frequencies over time. That spectrogram is then fed into WaveNet, a system from Alphabet's AI research lab DeepMind, which reads the chart and generates the corresponding audio elements accordingly. The Google researchers also demonstrate that Tacotron 2 can handle hard-to-pronounce words and names, as well as alter the way it enunciates based on punctuation. For instance, capitalized words are stressed, as someone would do when indicating that specific word is an important part of a sentence. Quartz has embedded several different examples in their report that feature a sentence generated by AI along with a sentence read aloud from a human hired by Google. Can you tell which is the AI generated sample?
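
For readers unfamiliar with the term, a spectrogram can be sketched in a few lines. This is only an illustration of the general idea (short overlapping frames, one frequency analysis per frame), not the paper's method: the function names, the 8 kHz sample rate, and the frame sizes are all assumptions chosen for the example, and it produces a plain magnitude spectrogram rather than the mel-scale features the paper describes.

```python
import cmath
import math


def dft_magnitudes(frame):
    """Magnitudes of the non-negative-frequency DFT bins of one frame."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def spectrogram(samples, frame_len=64, hop=32):
    """Slide a Hann-windowed frame over the signal; one magnitude column per frame."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame_len) for i in range(frame_len)]
    columns = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [s * w for s, w in zip(samples[start:start + frame_len], window)]
        columns.append(dft_magnitudes(frame))
    return columns


SAMPLE_RATE = 8000  # Hz, an arbitrary value for the demo
TONE_HZ = 1000      # a pure test tone, standing in for real speech

signal = [math.sin(2 * math.pi * TONE_HZ * t / SAMPLE_RATE) for t in range(512)]
spec = spectrogram(signal)

# The strongest bin of the first column should sit at the tone's frequency.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * SAMPLE_RATE / 64
```

Each column of `spec` is one time slice and each row one frequency band, which is the "chart" the second network would turn back into audio.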

101 comments

  1. Not so much by smallfries · · Score: 4, Informative

    Despite choosing a low-quality human comparison (the audio fidelity is fine, but the timing and pronunciation is terrible), it is still quite obvious which is which. The synth version is slightly too clipped and the timing does not sound natural.

    --
    Slashdot: where don knuth is an idiot because he cant grasp the awesome power of php
    1. Re:Not so much by Anonymous Coward · · Score: 0

      Agreed. It's better than before but still not there.

    2. Re:Not so much by Anonymous Coward · · Score: 1

      Heck, a good number of the ads I hear on radio have unnatural timing. Even a politician on a teleprompter sounds unnatural to me. Lots of people are bad (or untrained) at sounding natural as they read from copy.

    3. Re:Not so much by Oswald+McWeany · · Score: 1

      Despite choosing a low-quality human comparison (the audio fidelity is fine, but the timing and pronunciation is terrible), it is still quite obvious which is which. The synth version is slightly too clipped and the timing does not sound natural.

      Funny thing is, I thought both samples sounded more like a computer than a human.

      --
      "That's the way to do it" - Punch
    4. Re:Not so much by Anonymous Coward · · Score: 0

      "unnatural" gets noticed and is not as easy to automatically tune out. They do it on purpose.

    5. Re: Not so much by megamind · · Score: 1

      Still easy to distinguish. Just wait a few seconds and then try to interrupt and see if it stops talking.

    6. Re: Not so much by Anonymous Coward · · Score: 0

      And cannot pronounce my last name or my wife's online name.

    7. Re:Not so much by jellomizer · · Score: 2

      I remember a claim that the CGI characters in the Final Fantasy movie were indistinguishable from real people. Instead, they hit the Uncanny Valley very hard.
      The problem I expect in the audio is the same as with CGI: it is a bit too perfect and misses human imperfections. A computer doing a voice will do what it is supposed to do, while a narrator, even an expert at the craft, is affected by emotion. What they are reading moves them, and that response shows in their voice.
      Much like how CGI characters, even perfectly rendered ones, just don't show the details of the emotions.

      --
      If something is so important that you feel the need to post it on the internet... It probably isn't that important.
    8. Re:Not so much by kwoff · · Score: 2

      The voice reminded me of the narrator for "Physics Videos by Eugene Khutoryansky". Several people have asked in that channel's comments section if it is computer-generated, but it's claimed to be a woman named Kira. AFAICT, it's a voice actor, Kira Vincent. It makes me wonder if Google had her pronounce things, and her pronunciation just happens to be somewhat synthetic-sounding :) (though I looked quickly at the research paper and didn't find a mention of "Kira" or a name for the voice).

    9. Re:Not so much by cascadingstylesheet · · Score: 1

      I remember a claim that the CGI characters in the Final Fantasy movie were indistinguishable from real people. Instead, they hit the Uncanny Valley very hard. The problem I expect in the audio is the same as with CGI: it is a bit too perfect and misses human imperfections. A computer doing a voice will do what it is supposed to do, while a narrator, even an expert at the craft, is affected by emotion. What they are reading moves them, and that response shows in their voice. Much like how CGI characters, even perfectly rendered ones, just don't show the details of the emotions.

      Still ... "it took over a hundred questions with Rachel, didn't it??"

    10. Re:Not so much by chispito · · Score: 1

      Heck, a good number of the ads I hear on radio have unnatural timing.

      Part of that is because audio can now be digitally sped up without a corresponding pitch change, which precludes the need to hire actors like John Moschitta Jr. to read the terms, conditions, warnings, etc., at the end of an ad. I'm starting to suspect some agencies compress the entire ad in this manner to try to fit in more content without their actors sounding out of breath.

      --
      The Daddy casts sleep on the Baby. The Baby resists!
    11. Re:Not so much by Anonymous Coward · · Score: 0

      Having tried some audiobooks recently I think it would still beat current meatspace voice generation in many cases. At least it's more likely to read a sentence honoring punctuation.

    12. Re:Not so much by nospam007 · · Score: 1

      "Even a politician on a teleprompter sounds unnatural to me."

      But some of them 'have the best words', or so they say.

    13. Re: Not so much by Anonymous Coward · · Score: 0

      I think the voice sounded rather good. Much better than anything I have heard before.

      I am pretty sure that voice would fool anyone. If that voice read the news and nobody was told it was computer generated, nobody would notice...

      I am actually pretty sure a new group of people should find a new line of work, soon.

    14. Re:Not so much by Anonymous Coward · · Score: 0

      The uncanny valley (UV) is the result of something near-perfect being rejected by the subconscious. The unsettling, unpleasant sensation is the mental gap, the internal disagreement.

      Perfect is, by definition, indistinguishable. A form of distinction would be a flaw.

      Whatever missing element you were trying to eventually describe, is then the next step in progress, and in all likelihood quite soluble to some mixture of time and money.

    15. Re:Not so much by iMadeGhostzilla · · Score: 1

      That makes sense. Our speaking apparatus, the muscles and nerves and whatnot are modulated by the emotions running through us at the moment. At the same time our own listening apparatus is trained through endless repetition to catch many of those modulations and identify them, consciously or not. For AI speech to be "indistinguishable from humans" it would need to simulate modulation by emotions which depend on the person and the context.

    16. Re:Not so much by MichaelSmith · · Score: 1

      Which never made sense to me. All through the movies, artificial organisms have serial numbers, as did the Nexus 8 in 2049. Couldn't Deckard just sample Rachel's DNA? Probably do it with a hand held reader by that time.

    17. Re:Not so much by Anonymous Coward · · Score: 0

      Yeah, it's bullshit. I checked out the sample clips and could easily distinguish between the human and the text to speech.

      Try again Google, you fail.

    18. Re:Not so much by Anonymous Coward · · Score: 0

      Updated: This story has been updated to reflect that two of the audio clips are humans speaking, not AI-generated voices.

      lol burn dude. Both clips were human. +5 informative. Slashturd strikes again..

    19. Re:Not so much by smallfries · · Score: 1

      Yeah... it said that when I commented. Hence my claim that it is not indistinguishable. Do you understand?

      --
      Slashdot: where don knuth is an idiot because he cant grasp the awesome power of php
  2. Baloney by 110010001000 · · Score: 0, Troll

    Of course this is more "AI" baloney as you can clearly tell it is speech synthesis. Even if it were indistinguishable, this is NOT AI. A "neural network" is nothing like a human brain. It is a weasel term to fool laypeople into thinking it is some sort of magic. Nice try Google. Keep pushing your Google Home gadgets.

    1. Re:Baloney by Anonymous Coward · · Score: 0

      Stop parsing the vernacular. I suppose you are going to go on a crusade next about how smartphones are not really 'smart.' It's assholes like you who make communicating way, way more complicated than it needs to be.

    2. Re:Baloney by rodrigoandrade · · Score: 3, Insightful

      Duuuuude, it's AI!!!! Everything you can label "AI" gets a shit ton of page views.

      Even my doorbell has AI in it, because it rings when it "knows" someone is at the door looking for me.

    3. Re:Baloney by 110010001000 · · Score: 3, Insightful

      Words matter, caveman. What we are calling "AI" is definitely artificial, but not intelligent. If we are going to start calling computer programs "AI" just to start another VC hype cycle, then what is the point? Microsoft Word is "AI".

    4. Re:Baloney by Anonymous Coward · · Score: 5, Informative

      Listen for the "plosives", the "p" or "b" sounds. All text-to-speech systems get them wrong, because they are generally programmed from recorded speech that is very frequency limited. There are reasons for that. Full digital sampling of sound uses analog-to-digital converters, limited by the digital sampling. To reduce the amount of digital storage and processing required, the designers of both recording and synthesis tools lower the sampling frequency as far as possible. They also add low-bandwidth filters on the inputs and the outputs, to keep sharp step functions from generating undesired artifacts on the output, and to keep weird "beat" harmonics with the sampling frequency from confusing the recorded inputs. But the result is smearing of sharp sounds that are richer in transients, such as "t" and "p". And dear lord, does it screw up languages with "click" sounds like Zulu.
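
      The "beat" effect described above is usually called aliasing, and the arithmetic behind it is short enough to sketch. The function names and the 8 kHz example rate below are assumptions made for illustration; the folding rule itself is the standard sampling-theorem result for a pure tone sampled with no input filter.

```python
def nyquist_limit(sample_rate_hz):
    """Highest frequency a given sampling rate can represent unambiguously."""
    return sample_rate_hz / 2


def aliased_frequency(tone_hz, sample_rate_hz):
    """Frequency at which a pure tone shows up after unfiltered sampling."""
    folded = tone_hz % sample_rate_hz
    # Anything above the Nyquist limit folds back into the lower half of the band.
    return min(folded, sample_rate_hz - folded)


# A 7 kHz component in a sharp transient, sampled at 8 kHz, lands at 1 kHz.
example = aliased_frequency(7000, 8000)
```

      A tone already below `nyquist_limit(8000)` passes through unchanged, which is why the input filter only needs to remove what lies above that limit.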

    5. Re:Baloney by Merk42 · · Score: 0

      Words matter, caveman.

      and those words' meanings change all the time.

    6. Re:Baloney by Anonymous Coward · · Score: 2, Insightful

      Everyone is going to call it AI, though.

      Everyone can be wrong, of course, but who loses in normal conversation? The Average Joe or a pedant?

      I'm sure the technology will be referred to in the correct terms by the people who use and probably invented the correct terms. For everyone else, there's AI.

    7. Re:Baloney by mikael · · Score: 4, Funny

      Same with electric heater. The thermostat has built in AI so that it knows when to turn the heater off when it is too hot.

      --
      Vintage computer adverts: http://www.vintageadbrowser.com/computers-and-software-ads
    8. Re: Baloney by Anonymous Coward · · Score: 0

      But not in science they don't! AI has a definite scientific meaning.

    9. Re:Baloney by rkordmaa · · Score: 1

      The problem is that some people expect AI to be like something from a sci-fi movie and happen to know that sci-fi AI and real world AI are nothing alike. For a layman it doesn't really matter, it's all magic anyway. "Deep learning neural networks" is a bit of a mouthful and doesn't get the point across as well as "AI", even if some people have unrealistic expectations about what AI is supposed to be. Complaining about it is nonsense semantics anyway, whatever you call it won't change what it is.

    10. Re:Baloney by 110010001000 · · Score: 0, Troll

      Bullshit. It isn't semantics. This isn't AI. "deep learning neural networks" are nothing like real neural networks (brains) and don't learn either. This is just another hype cycle, like 3DTV and VR.

    11. Re:Baloney by 110010001000 · · Score: 2

      No, they don't.

    12. Re:Baloney by Anonymous Coward · · Score: 1

      It's funny how angry you keep getting every time the word AI appears in a slashdot article.

      And yet, for all your rants, nothing changes. The world keeps on using AI to mean what you insist it doesn't mean.

      In the English language, popular use determines meanings. So, this word has attained a new meaning, whether you approve of it or not.

      But hey, keep posting your angry rants. Maybe they will go viral and convince the world to change.

    13. Re:Baloney by religionofpeas · · Score: 1

      Sounds like bullshit. A CD is only 650 MB, and holds 80 minutes of high quality audio. Who cares about the amount of digital storage for a couple of "b" and "t" samples ?
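
      As a rough check on the storage point, the data rate of CD-quality PCM is easy to work out. The sketch assumes the standard CD audio parameters (44.1 kHz, 16-bit, stereo); the helper name and the 50 ms clip length are made up for the example.

```python
SAMPLE_RATE_HZ = 44_100   # CD audio sampling rate
BYTES_PER_SAMPLE = 2      # 16-bit samples
CHANNELS = 2              # stereo


def pcm_bytes(milliseconds, channels=CHANNELS):
    """Uncompressed PCM size in bytes for a clip of the given length."""
    return milliseconds * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * channels // 1000


bytes_per_second = pcm_bytes(1000)        # 176,400 bytes
eighty_minutes = pcm_bytes(80 * 60 * 1000)  # 846,720,000 bytes, about 847 MB
short_plosive = pcm_bytes(50, channels=1)   # a 50 ms mono clip: 4,410 bytes
```

      Eighty minutes of raw audio comes out larger than a disc's nominal data capacity because audio sectors carry more payload than data sectors, but either way a single "b" or "t" sample is a few kilobytes.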

    14. Re:Baloney by Dog-Cow · · Score: 1

      If you smash a pickaxe through your eye, you will no longer care what people call AI, and we won't have to read your inane shit. It's a win/win.

    15. Re:Baloney by Anonymous Coward · · Score: 2, Insightful

      Dude, the proper definition of AI is obvious - It's whatever computers can't yet do.

    16. Re:Baloney by swillden · · Score: 1

      And a hacker is someone who enjoys making technology do interesting things. Good luck trying to redefine common language.

      For that matter, this isn't even "common" language. Researchers in the field call it AI as well, and have for decades. When necessary they distinguish between strong AI and weak AI, but most of the time it's not necessary because strong AI doesn't yet exist.

      --
      Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
    17. Re:Baloney by swillden · · Score: 1

      I'm looking for a decent smart doorbell. I'd like one that rings when someone who doesn't live in my house approaches the door. It should have a button for backup.

      --
      Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
    18. Re:Baloney by K.+S.+Kyosuke · · Score: 1

      Of course this is more "AI" baloney as you can clearly tell it is speech synthesis.

      Meanwhile, actual speech synthesis researchers are acutely aware that mimicking human speech requires dedicating significant NLP resources to generating correct prosody, which may very well be hard or next to impossible without the machine actually understanding what the text is about.

      --
      Ezekiel 23:20
    19. Re:Baloney by K.+S.+Kyosuke · · Score: 1

      When necessary they distinguish between strong AI and weak AI, but most of the time it's not necessary because strong AI doesn't yet exist.

      And you haven't even started distinguishing between AI the result (what you're talking about) and AI the field (which you need to have before you arrive at the former).

      --
      Ezekiel 23:20
    20. Re: Baloney by Anonymous Coward · · Score: 0

      You have no fixed, coherent, usable definition of intelligence yourself. I guarantee it. No one has, unless they're willing to use the word in some very different sense than normal.

    21. Re: Baloney by Anonymous Coward · · Score: 1

      I feel your pain binary. You should relax though, can you remember the mainframe, cloud, and e buzzwords? Everything will be called AI for a short while because it sounds cool and advanced to the masses, but this buzzword shall pass.

    22. Re:Baloney by fph+il+quozientatore · · Score: 1

      Clippy was AI.

      --
      My first program:

      Hell Segmentation fault

    23. Re:Baloney by sound+vision · · Score: 3, Informative

      The storage and CPU cost of recording audio are so small that they reached the point of irrelevance 15-20 years ago, for low-end consumer hardware. More like 40 years ago for professional grade equipment - around the time that CDs were introduced. Despite what a bunch of "audiophile" sites trying to push a product will tell you, it is not difficult, expensive, or taxing in any way to work with PCM audio of a sufficient bit depth and sampling rate to cover the entire range of human hearing. Or even dog hearing!

      But regarding speech synthesis specifically - there is software out there, still being used by somebody I'm sure, that was designed to be run on consumer PCs back in the 90s. At that time, on those systems, there were computational limits that were relevant to sound quality. Whatever outdated software Stephen Hawking uses, sounds like it renders the output at no higher than 10 or 12 kHz sampling rate (compared to 40 - 50 kHz to cover the human hearing range.) But the sampling rate is a very small part of why Hawking sounds bad. The artifacts you hear from a low sampling rate are mostly limited to high-frequency sounds being cut. (And possibly temporal smearing, depending on how you filter.) It sounds similar to turning the treble knob on your stereo all the way down.

      The quality problems with Hawking's synthesizer go way beyond a treble knob. Things like pacing, emphasis, minor slurring of certain sounds that are adjacent to each other, etc... problems that you take care of by making the software more intelligent, not upping the sample frequency. Which is exactly what Google is doing, and making some progress at it too. No, it doesn't sound like a human yet.

    24. Re:Baloney by ranton · · Score: 2

      Words matter, caveman. What we are calling "AI" is definitely artificial, but not intelligent. If we are going to start calling computer programs "AI" just to start another VC hype cycle, then what is the point? Microsoft Word is "AI".

      People really need to start modding these types of comments as Troll and move on. AI has included basic algorithms used as a stand in for intelligent thought since the field arguably began at The Dartmouth Summer Research Project on Artificial Intelligence over 60 years ago. At the time they were very aware of how difficult it could be to define intelligence, so they intentionally did not let that limit what was considered artificial intelligence research.

      Today the researchers and field of scientific journalism both agree that machine learning and neural networks fit within the field of artificial intelligence. That is all that matters, not your personal feelings about what the field should be.

      --
      -- All that is necessary for the triumph of evil is that good men do nothing. -- Edmund Burke
    25. Re: Baloney by ranton · · Score: 2

      But not in science they don't! AI has a definite scientific meaning.

      And since its inception in the 1960's, AI has included basic algorithms used to approximate the results of intelligent thought.

      --
      -- All that is necessary for the triumph of evil is that good men do nothing. -- Edmund Burke
    26. Re:Baloney by ranton · · Score: 1

      Before the mid 1900's if you saw the term AI it would have almost certainly meant artificial insemination, so I assure you the meaning of AI has changed over time.

      --
      -- All that is necessary for the triumph of evil is that good men do nothing. -- Edmund Burke
    27. Re:Baloney by Anonymous Coward · · Score: 0

      Most of these things they're calling "AI" use neural networks now, meaning that they really do think similarly to a real brain. It just isn't a particularly good brain, nor do they tend to be trained very well.

      Most humans are dangerously stupid too. Are they intelligent?

    28. Re:Baloney by Merk42 · · Score: 1

      No, they don't.

      Yes, they do

      If your argument was somehow about "AI" specifically, you can see ranton's comment and/or picture how "AI" can become another instance of the example words I linked to.

    29. Re:Baloney by ljw1004 · · Score: 2

      Words matter, caveman. What we are calling "AI" is definitely artificial, but not intelligent. If we are going to start calling computer programs "AI" just to start another VC hype cycle, then what is the point? Microsoft Word is "AI".

      There's a straightforward difference. If the logic (or business logic, or branching structure / conditionals) was authored by a human programmer then we call it a conventional program. If the logic was an emergent property of running a learning algorithm over a training set, then we call it AI.

      This is a practically useful distinction for us working software engineers. (Why? The latter can't usefully be checked into source control itself; only its training data can. You can't diff it. The typical bugs you get are very different between the two - the first kind of software has weird discontinuous edge cases, and the latter is generally "smooth". We engineers need different skillsets to develop and debug the two. The way we respond to requirements specs is different between the two. Each of them has its strengths at particular classes of problems - compiler-writing is dominated by the first kind; real-world sensory processing was done at first by the first kind like OpenCV up to 2010, but has been wholly eclipsed by the second kind).

      No, Microsoft Word isn't "AI" under this commonly-used definition.

      If you want to keep railing against it, why not (1) recognize that it's a practically useful distinction to make, (2) come up with a term you think is better?
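
      A toy contrast may make the distinction concrete. Everything here is hypothetical: the "loudness" task, the 70 dB cut-off, the training set, and the function names were invented for the sketch, and the learner is about the simplest one possible (pick the threshold with the fewest errors), not anything resembling a neural network.

```python
def authored_is_loud(level_db):
    """Conventional program: a human chose and typed in the threshold."""
    return level_db > 70


def learn_threshold(examples):
    """Pick the cut-off that misclassifies the fewest labelled examples."""
    candidates = sorted(level for level, _ in examples)

    def errors(threshold):
        return sum((level > threshold) != label for level, label in examples)

    return min(candidates, key=errors)


# Hypothetical labelled data: (level in dB, "is it loud?")
TRAINING = [(40, False), (55, False), (62, False), (68, True), (75, True), (90, True)]

learned = learn_threshold(TRAINING)


def learned_is_loud(level_db):
    """Learned program: the threshold emerged from the training set."""
    return level_db > learned
```

      The two classifiers disagree at 65 dB, and changing the second one's behaviour means editing `TRAINING` rather than the code, which is the source-control point made above.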

    30. Re:Baloney by Anonymous Coward · · Score: 0

      Bullshit. It isn't semantics.

      Says the person who just stated "It isn't blah, it's blah"

      This isn't AI. "deep learning neural networks" are nothing like real neural networks (brains) and don't learn either.

      "Deep learning neural networks" are AI. That's all there is to it.

      Why bring up that "deep learning neural networks" are not like real neural networks in the brain?
      Not a single person (except you) has said they were the same.
      Not a single person (except you) even thinks they are the same.

      They are very different things! Yes, you made the same claim, but I don't think you actually understand this fact since you label them both as AI to imply they are the same.

      By strict definition the brain is not AI, it is NI (natural intelligence)
      Any neural network type of thing that has been created by chance from nature doing its thing is not AI either.

      AI must, again by its definition, be artificially created. This is understood to be created by man.
      Mankind has not designed let alone built a brain from scratch. We haven't created NI.
      AI however we have, and you nearly defined it yourself, other than claiming our software is somehow identical to a human brain of course.

      This stuff isn't just semantics. Words have meaning. I'd suggest you learn them.

    31. Re:Baloney by Anonymous Coward · · Score: 0

      And not ring when someone who lives there approaches? Can't think of anything out of the box offering that, though you could possibly make an IFTTT rule with geotagging based on your phone's location.

  3. Welcome to the wide world of.... by Zurkeyon3733 · · Score: 5, Insightful

    Robocalls! :-D

    1. Re:Welcome to the wide world of.... by Anonymous Coward · · Score: 0

      "You won a FRRRrrreeeeeee......" *click*.

      At some point, we will just white-list numbers with the exception of those registered as emergency contacts. We need a system in place to make this both effective and safe.

    2. Re:Welcome to the wide world of.... by Megane · · Score: 1

      Wake me up when they can answer out-of-band questions like "What is today?", or respond in a human way to talking over their script with "Hello? Hello? Hello? Hello?" I'm not saying it won't happen, but for now, those are the fastest ways to fail them on a Turing test. When they figure those out, I'll move up to a next level of ez-fail questions.

      --
      #naabhaprzrag, #sverubfr-000, #agi-fcbafberq, negvpyr[pynff*=' negvpyr-ary-'] { qvfcynl: abar !vzcbegnag; }
    3. Re:Welcome to the wide world of.... by sound+vision · · Score: 1

      Welcome? I've been in that world for years. Anyway, most robocalls play a recording of an actual human voice, so I fail to see what they'd gain by using a synthesizer. I doubt that *recording the message* is the thing that limits their profits.

  4. Ha! Sabash!! Great competition. by 140Mandak262Jamuna · · Score: 1

    Just yesterday we saw a thread about someone giving Alexa the skills to ask questions. Now we see Google home is answering them. Set one against another and watch the fun!

    --
    sed -e 's/Chuck Norris/Rajnikant/g' joke > fact
  5. To.Tall.E. by Anonymous Coward · · Score: 0

    I. Think. This. Google. A. I. Sounds. A. Maze. ING.

    1. Re:To.Tall.E. by Megane · · Score: 1

      I guess I need to listen to it to see just how bad it is. You make it seem like William Shatner should be worried about losing work to automation.

      About 10 or so years ago, there was an automated voice reading weather reports on an HDTV sub-channel. I think it was actually the official National Weather Service radio audio. Whenever it came across "patchy fog", it would always say "patch-eef ogg". So now I'm expecting that times a hundred.

      --
      #naabhaprzrag, #sverubfr-000, #agi-fcbafberq, negvpyr[pynff*=' negvpyr-ary-'] { qvfcynl: abar !vzcbegnag; }
  6. No. by Anonymous Coward · · Score: 0

    "Can you tell which is the AI generated sample?"

    So you can use me as a Turing test guinea pig? For free? My answer is "no". Or... rather "show me the money".

  7. What about accents? by Tomahawk · · Score: 1

    I'm going to guess that this is with an American accent. I've yet to hear a Google voice that says "kilometres" in the same way we do in Ireland. (It's something I find a little irritating when using Google Maps for navigation).

    1. Re: What about accents? by Anonymous Coward · · Score: 2, Interesting

      As speech synthesis rises in usage, my guess is evolution will eliminate harder accents like the Irish, Jamaican, Cuban, etc. It will also eventually eliminate plosive sounds, etc. The language we speak will end up leaning towards how these systems speak because they'll be more ubiquitous.

    2. Re:What about accents? by jrumney · · Score: 1

      Have you tried setting your default language to English (Ireland) or English (UK)? (they seem to both be the same South-East England accent) The way they pronounce kilometers is definitely different than the US English voice.

    3. Re:What about accents? by chill · · Score: 1

      No need to guess, it says so right in the last paragraph of the article.

      However, the system is only trained to mimic the one female voice; to speak like a male or different female, Google would need to train the system again.

      Training against different accents is something that would easily be within Google's reach, once they're satisfied with the main product.

      --
      Learning HOW to think is more important than learning WHAT to think.
    4. Re: What about accents? by jabuzz · · Score: 1

      I would add that the volume of training material is huge and varied. Though one imagines that Amazon have easier access to the material through their Audible subsidiary. Audiobooks with Whispersync being especially useful.

    5. Re: What about accents? by chill · · Score: 1

      I read some time back, that when first working on their Translate application, Google contracted with the United Nations for access to their professional translation archive. Thousands of samples of source material and professional translations in dozens of different languages.

      If that included voice recordings as well as written translations, it could be the solution to the problem of training material. Not regional accents, of course, but still, a big leg up.

      --
      Learning HOW to think is more important than learning WHAT to think.
    6. Re: What about accents? by EvilSS · · Score: 1

      I would add that the volume of training material is huge and varied. Though one imagines that Amazon have easier access to the material through their Audible subsidiary. Audiobooks with Whispersync being especially useful.

      The problem is neural networks can be unpredictable in their response to training. Start feeding it different voices and it might just start averaging them out, or start doing the voice equivalent of code switching. That would be really weird to listen to.

      Also don't go getting the author's guilds and voice actors all riled up. They'll be suing preemptively.

      --
      I browse on +1 so AC's need not respond, I won't see it.
    7. Re:What about accents? by Paradise+Pete · · Score: 2

      I've yet to hear a Google voice that says "kilometres" in the same way we do in Ireland.

      Nobody else says anything the same way you do in Ireland.

    8. Re:What about accents? by Anonymous Coward · · Score: 0

      (Speak) kilometer
      (if ccTLD = ie then append)
      (Speak) ya right bastard

    9. Re: What about accents? by Tomahawk · · Score: 1

      True.
      Specifically for this, most say Keelow-meters or Killow-meters, while we say kill-Om-eters. Emphasis is on the Om.

  8. Terrible comparisons by Anonymous Coward · · Score: 1

    I'm impressed with the progress, but annoyed at how the results are oversold. First, they seemed to have asked that human comparison voice to sound like a robot and she succeeded, but credit for that doesn't go to the robot. Second, they only demonstrated sentences that fit in one breath. The way humans read a paragraph or a book chapter requires us to adjust our pauses for breath and our pacing to the content being read. I expect that Google know this and are working on it, and to be fair to them, it was slashdot and not they who came up with the "as good as humans" line. But I'm still annoyed.

  9. Breath by lazarus · · Score: 4, Insightful

    One thing that seems to be missing from all of these is a programmatic understanding of how much air is in the lungs.

    "Alexa, what is 69! (factorial)"

    Listen in amazement as she rhymes off the number but then enter the uncanny valley about the time she should be taking a breath...
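
    To get a sense of how long that recitation is, the number can be computed directly with the standard library. The variable names are just for this sketch.

```python
import math

value = math.factorial(69)
digits = len(str(value))  # how many digits the assistant has to read out

# 69! is the last factorial that stays below 10**100.
fits_under_googol = value < 10**100
next_one_overflows = math.factorial(70) > 10**100
```

    That is 99 digits to read aloud, with no place where a human would naturally stop for air.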

    --
    I am not interested in articles about life extension advancements.
    1. Re:Breath by DigiShaman · · Score: 1

      The ever-lasting wind bag. Oh, what bagpipes she could be!

      --
      Life is not for the lazy.
    2. Re:Breath by Anonymous Coward · · Score: 0

      You should have just asked "Alexa, what is 69 ?"

  10. Romeo to Cinderella. by Anonymous Coward · · Score: 0

    > claims near-human accuracy at imitating audio of a person speaking from text

    If you believe this, I have a Japanese hologram teenage pop idol to sell you. No kidding, one can buy the "Vocaloid CV-01 V4x" singing synthesis software, boxed or online for about 150USD. It comes with a clumsy manga-girl mascot design, who became a full-blown celebrity in her own right.

    Have you heard Hatsune Miku perform in concert? She's the Number One Princess in the singing synthesis world exactly because she sounds so robotic and emotionless, which attracts weeaboos like a Magnet.

    Why does she sound robotic? Because it hasn't been possible to refine singing synthesis for fluent pro-musician use, despite 15 years of best efforts by Yamaha Music Corp. in Japan and the Pompeu Fabra Research University lab in Spain. Thus everybody has lost interest in procedural song generation except the otaku subculture, who even want their own all-singing all-dancing Miku "waifu" in a jar called Gatebox.

  11. This will be great! by burhop · · Score: 4, Funny

    Hey google, read all slashdot comments to me with a sarcastic tone.

    1. Re:This will be great! by Anonymous Coward · · Score: 0

      That sounds like a really good idea. You should definitely try that.

  12. A ways to go yet. by Anonymous Coward · · Score: 0

    I'll be impressed when it (or any other text-to-speech bot) can read a novel aloud even half as well as a human narrator.

    This would include things like subtle voice changes for different characters, (and yet another change for narrative voice), changing the reading pace according to the mood of the scene (eg fast-paced for action, slower for deliberation or melancholy), and handling punctuation properly. (The latter isn't that hard, but the Kindle reader fails miserably at it, running chapter titles into the text because typically a chapter title has no period at the end.)

    Bonus points for auto-correcting typos and inserting dramatic pauses where appropriate.

    Extra bonus points for not screwing up a sentence like "Polish the silverware." and pronouncing the first word as the verb polish rather than Polish as in the language or someone from Poland.
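
    The chapter-title problem mentioned above could in principle be handled by a preprocessing pass before the text reaches the synthesizer. A minimal sketch, assuming the engine pauses on sentence-ending punctuation; the function name add_pauses and the punctuation set are hypothetical, not anything the Kindle reader is known to use:

```python
# Characters assumed to already signal a pause to a TTS engine.
TERMINAL = (".", "!", "?", ":", ";")

# Closing quotes and brackets that may follow the real terminal mark.
CLOSERS = "\"')]\u201d\u2019"


def add_pauses(lines):
    """Append a period to non-empty lines that lack terminal punctuation."""
    out = []
    for line in lines:
        stripped = line.rstrip()
        # Look past closing quotes/brackets to find the last real character.
        core = stripped.rstrip(CLOSERS)
        if stripped and not core.endswith(TERMINAL):
            stripped += "."
        out.append(stripped)
    return out


text = ["Chapter One", "It was a dark night.", "", "The End"]
print(add_pauses(text))
```

    This only covers the easy part; pacing, character voices, and telling "Polish" from "polish" need actual language understanding.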

  13. I noticed this after the last upgrade. by wjcofkc · · Score: 1

    I do not like it. It is unsettling.

    --
    Brought to you by Carl's Junior.
  14. That's not saying much. by Dan+East · · Score: 1

    When I was a kid, 35 years ago, I had a TI-99/4A home computer with a speech synthesizer (which was actually 5-year-old tech at the time). Sure, it didn't sound great, but it was totally understandable. With the Terminal Emulator II cartridge you could build from phonemes directly and thus have it say any English word, and not just words from its predefined "dictionary" of words it knew how to pronounce already. That was 35 years ago, with a consumer-grade home computer running at 3 MHz, that a 10-year-old was goofing around with for fun.

    The fact that we didn't reach "Indistinguishable From Humans" in TTS *years* ago is not saying much for the state of our software.

    Here's an example of it speaking... https://youtu.be/0vu1GftX02Q?t...

    --
    Better known as 318230.
    1. Re:That's not saying much. by religionofpeas · · Score: 1

      Replaying pre-recorded phonemes is an adequate solution for poor quality speech, but you can't extend that method to reach high quality. In order to do that, you have to start over from scratch, using a much more difficult method.

  15. why ? by Anonymous Coward · · Score: 0

    Seriously. What's the problem we need solved here? The Google voice in maps is fine - even the robotic one when maps cannot connect to the mother ship.

    Focus on "The AI" understanding what I'm asking, please.

    1. Re: why ? by Anonymous Coward · · Score: 0

      So they can have a computer sell you shit you don't need while having you think it's a real person selling you shit you don't need.

  16. Maybe not the best test subject by Headw1nd · · Score: 1

    I would think if they were trying to showcase their technology they would have chosen someone with a less "robotic" voice to copy. I guess they just wanted someone who spoke very clearly?

  17. This is huge for the audio book market by Botched · · Score: 1

    If every book can be accessed by those who want to listen instead of read! Not a trivial development at all.

  18. That's how they faked those "cellphone" calls... by Anonymous Coward · · Score: 0

    ...from United Airlines Flight 93 on 9/11/2001. So, that tech is declassified now.

  19. Compared to what humans? by SeaFox · · Score: 1

    A research paper published by Google this month -- which has not been peer reviewed -- details a text-to-speech system called Tacotron 2, which claims near-human accuracy at imitating audio of a person speaking from text.

    If anyone remembers "reading groups" from primary school, there is a pretty big range in the term "human accurate reading".

  20. Still sounds choppy by bobstreo · · Score: 1

    Good enough for Hawking maybe.

    I'd prefer a nice high-class British female voice, or Paul Bettany as Jarvis.

  21. Not peer reviewed by Anonymous Coward · · Score: 0

    Of course it's "not peer reviewed". Google doesn't want their stupid hype train derailed. They've been doing this shit for years. Same reason they never bothered to play their sixty thousand dollar chess machine against anything at least resembling a half decent laptop.

  22. Eh, I think the title might be better worded... by itwasgreektome · · Score: 1

    I think it might be more realistic to say that Google and a speaker speaking in a monotonous, robotic way are pretty much indistinguishable from one another. They both sound robotic to me. When it can imitate what people really sound like, normal people, then talk to me. Not that this isn't cool, but from the cursory bits I read and heard it seems to over-hype itself.

  23. Just a start by BradMajors · · Score: 2

    In a few years, AI will progress so that AI will sound more human than humans.

    1. Re:Just a start by Anonymous Coward · · Score: 0

      And just like Rob Zombie?

  24. Bad source by Anonymous Coward · · Score: 0

    Quartz mangled the article; this source is better in every way:
    https://research.googleblog.com/2017/12/tacotron-2-generating-human-like-speech.html

  25. Study English pronunciation by Anonymous Coward · · Score: 0

    I'd like to see (ehm, hear) it do this little poem: https://www.cs.cmu.edu/~clamen/misc/humour/TheChaos.html and teach me a few things in the process.

    1. Re:Study English pronunciation by knorthern+knight · · Score: 1

      Have you heard about the woman working in a tourist shop on "The Sunshine Coast" of British Columbia, Canada?

      She sells sea shells on the Sechelt Peninsula.

      --

      I'm not repeating myself
      I'm an X window user; I'm an ex-Windows user
  26. So how does it sound by Anonymous Coward · · Score: 0

    When I make it say words like "shit" "fuck" "cunt" "penis"....

  27. More voices please by Not-a-Neg · · Score: 1

    I like Australian Siri and wish Alexa would offer similar accents. $0.02

    --
    -==- Buy a Mac and leave me alone!
  28. Which are illegal in at least Germany ... by Anonymous Coward · · Score: 0

    Why exactly is this legal in the US?

    Also, why don't we have communication whitelist firewalls?

    I started to, when I had a stalker in 2004.
    I made an answering machine that only allowed people in my address book through.
    Then I configured my e-mail client, and then server, to use the same logic.
    And my Jabber instant messenger too.
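
    The shared logic across phone, e-mail, and Jabber amounts to a single address-book lookup. A minimal sketch, assuming senders can be reduced to a normalized string; the names normalize, is_allowed, and route, and the example.com addresses, are all hypothetical:

```python
from typing import Iterable


def normalize(address: str) -> str:
    """Lowercase and trim an address so lookups are case-insensitive."""
    return address.strip().lower()


def is_allowed(sender: str, address_book: Iterable[str]) -> bool:
    """Return True only if the sender appears in the address book."""
    allowed = {normalize(entry) for entry in address_book}
    return normalize(sender) in allowed


def route(sender: str, address_book: Iterable[str]) -> str:
    """Decide what to do with an incoming call or message."""
    return "deliver" if is_allowed(sender, address_book) else "reject"


book = ["alice@example.com", "Bob@Example.com"]
print(route("bob@example.com", book))
print(route("stranger@example.com", book))
```

    The same function can sit in front of each channel, so the address book is the only thing that has to be kept up to date.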

    I currently still have a mailbox, due to living in a shared apartment, but we have legally valid digital signatures in our passports, which can be used for e-mail and everything, so there really is zero reason to send information-only letters. Hence, I plan to reject all letters, and exclusively accept parcels (including letters that aren't just information) in person, as soon as I move. (Around here, if you're not home, you can tell them a time or go to their branch to get it.)

    I also still have a doorbell, but have plans to disable it, add a note, and just have people call me if they are at the door, which in the last three apartment buildings was more convenient anyway. I just have to think about how to handle e.g. emergency services or cops ringing. Maybe the doorbell and intercom will be connected to a small single-board computer, which then routes it as a SIP call, but that would defeat the purpose of a whitelist. Hmm ...

  29. P.S.: Don’t block callers. Better idea: by Anonymous Coward · · Score: 0

    Make your "answering machine" instantly take the call (before the first ring), and play your local "There is no such number".
    Otherwise, they will just keep calling.