Lyrebird Claims It Can Recreate Anyone's Voice Based On Just a 1 Minute Sample (theverge.com)
Artem Tashkinov writes: Today, a Canadian artificial intelligence startup named Lyrebird unveiled its voice imitation deep learning algorithm that can mimic a person's voice and have it read any text with a given emotion, based on the analysis of just a few dozen seconds of audio recording. The website features samples using the recreated voices of Donald Trump, Barack Obama and Hillary Clinton. A similar technology was created by Adobe around a year ago but it requires over 20 minutes of recorded speech. The company plans to open its APIs to the public, while the computing for the task will be performed in the cloud.
Goodbye, voice actors.
Film actors, you're next.
we'll hear Donald Trump wax poetic about the taste of Putin's cock and Barack Obama gloat about fooling all the crackas with his secret Muslim agenda. I can't wait.
Didn't know anybody still used that. Hosers!
Confucius say, "Find worm in apple - bad. Find half a worm - worse."
Perhaps now we'll need more verification and proof before information is accepted, leading to more accountability
If this is true, I imagine Hollywood would jump on this -- they now have one less reason to be inconvenienced when a (popular) actor dies.
Someone uses a reconstruction of someone else's popular but now-dead voice as a marketing ploy -- much like Natalie Cole hijacked her father's song -- are we going to have lawsuits over unauthorized sound-alikes now?
I also imagine the music industry would go crazy over it as well. First came their Auto-Tune shenanigans; now I'm waiting for the inevitable "Auto-Sing" -- "we can recreate the voice of any dead singer!"
So far every sample (including the headline one with Robo Donald Trump) sounds like a mangled Stephen Hawking voice-bot :(
If I heard that voice from behind the door asking if I were John Connor, I'd say I'm a meat popsicle.
Hyperom.com
gets their hands on this. With photorealistic CGI and manufactured voices, they can manufacture any recorded situation and evidence they want, and pass it off as real.
I think we will eventually reach a point in the world where every person of notability has a private encryption key, and any statement or appearance they make will be signed so people know what is real and what is not.
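For anyone wondering what "signed so people know what is real" looks like mechanically, here is a rough sketch. It uses a Lamport one-time signature only because that can be built from Python's standard library hashes; that choice is mine, not the poster's. A real deployment would more likely use an established scheme such as Ed25519 from a vetted crypto library, and a Lamport key pair must never sign more than one message. The function names (`keygen`, `sign`, `verify`) are made up for illustration.

```python
import hashlib
import secrets


def _h(data: bytes) -> bytes:
    """SHA-256 digest of data."""
    return hashlib.sha256(data).digest()


def _bits(message: bytes):
    """The 256 bits of SHA-256(message), most significant bit first."""
    digest = _h(message)
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(256)]


def keygen():
    """Return (private, public). One pair of secrets per digest bit."""
    private = [(secrets.token_bytes(32), secrets.token_bytes(32))
               for _ in range(256)]
    public = [(_h(zero), _h(one)) for zero, one in private]
    return private, public


def sign(private, message: bytes):
    """Reveal one secret per digest bit. The key must not be reused."""
    return [private[i][bit] for i, bit in enumerate(_bits(message))]


def verify(public, message: bytes, signature) -> bool:
    """Check each revealed secret hashes to the published value."""
    if len(signature) != 256:
        return False
    return all(_h(revealed) == public[i][bit]
               for i, (revealed, bit) in enumerate(zip(signature, _bits(message))))


# The notable person publishes `public` once and keeps `private` secret.
private, public = keygen()
statement = b"I never said that."
signature = sign(private, statement)
genuine = verify(public, statement, signature)
forged = verify(public, b"I totally said that.", signature)
```

Here `genuine` comes out true and `forged` false, since a different statement hashes to different bits and needs secrets that were never revealed. The hard part in practice is not the math but getting everyone to agree on which public key really belongs to whom.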
I would love for a "personal digital assistant" to have Majel Barrett's voice or John Forsythe's voice. Hell, if nothing else we could continue to produce TV programs or movies where their voices are important.
Do not look into laser with remaining eye.
I guess it's better than Festival but it's proprietary technology while Festival is free.
If you're a zombie and you know it, bite your friend!
This is true in the same way that auto-tune removes the need for musical singing ability. Sure, you can force a certain note, but it sounds artificial. Similarly this tool can replicate a voice at standard timbres and emotions well enough to be recognizable, but not well enough to be undetectable as a digital emulation.
It's not until it's undetectable (such as some of the best modern CGI) that we'll actually have made actors obsolete. Except... amazingly, CGI costs more than the actors, it's less flexible, and slower. I think it will be quite a while before we have something that is both on-par for quality and cheaper than a skilled live human.
"I will trust Google to 'do no evil' until the founders no longer run it." Hello Alphabet.
I wouldn't class myself as a technophobe but this leaves us all open to the creation of a "confession" for something we have not done. Scary shit in my opinion. And no, I don't trust some law enforcement agencies or in fact some government agencies not to do just that. (I'll put on my tinfoil hat)
It's already happened. Here's another one.
"First they came for the slanderers and I said nothing."
>Sure, you can force a certain note, but it sounds artificial.
But it doesn't need to. They don't have to do auto-tune in discrete steps following a set scale, it could be (as far as the human ear is concerned) done in an analog fashion.
The technology will improve until you don't even notice it. It may already have done so, with the only auto-tune you notice today being deliberately worse than necessary for effect or simply the result of cut-rate sound engineering.
Which makes me wonder... can you get a mic with built-in auto-tune for home karaoke yet? I sing like a cat being strangled, I could use one.
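To make the "it doesn't have to be discrete steps" point concrete, here is a minimal sketch of hard versus partial pitch correction. It assumes an equal-tempered scale tuned to A4 = 440 Hz and a made-up `strength` knob; it is not a description of how any commercial auto-tune product works, and it skips the hard parts (detecting the pitch and resynthesizing the audio).

```python
import math

A4_HZ = 440.0  # assumed tuning reference


def nearest_semitone_hz(freq_hz: float) -> float:
    """Frequency of the equal-tempered note closest to freq_hz."""
    semitones_from_a4 = 12 * math.log2(freq_hz / A4_HZ)
    return A4_HZ * 2 ** (round(semitones_from_a4) / 12)


def correct_pitch(freq_hz: float, strength: float = 1.0) -> float:
    """Move freq_hz toward the nearest note.

    strength=1.0 snaps fully to the note (the robotic effect);
    strength=0.0 leaves the singer alone; values in between nudge
    the pitch part of the way, measured on a log-frequency scale.
    """
    target = nearest_semitone_hz(freq_hz)
    return freq_hz * (target / freq_hz) ** strength


sung = 450.0                       # a slightly sharp A4
snapped = correct_pitch(sung, 1.0)
nudged = correct_pitch(sung, 0.5)
```

With these numbers `snapped` lands on 440 Hz, while `nudged` ends up between 440 and 450 Hz. Applying a small nudge per audio frame instead of a full snap is one way the correction could glide rather than jump.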
Is it just me that still hears Microsoft Sam under all of this? While the likeness is there, it's still pretty obvious it's generated.
This will be great! Now I'll be able to order stuff with anyone's Alexa!
The folks at University of Montréal aren't to be sneezed at. https://lyrebird.ai/ethics makes a nice bilingual joke.
davecb@spamcop.net
I give it about a month before there will be a decent open source clone. Progress in AI is crazy fast.
They're not going anywhere. The point is that they're 'real' people. I suppose it might cost second stringers their jobs, but then who'll rise through the ranks? It takes time to build star power.
Hi! I make Firefox Plug-ins. Check 'em out @ https://addons.mozilla.org/en-US/firefox/addon/youtube-mp3-podcaster/
Have gnu, will travel.
It's not going to take nearly as long for Mr. T and Denis Leary to provide samples to tell me what to do the next time I have to drive through Chicago.
There is, though, a hell of a lot more to a "voice" than tone and timbre. Christopher Walken is going to be safe for a long time.
While this technology does a decent job capturing some of the voice characteristics, it still sounds like a damn generated voice. I'm no sound expert, but it's the reverb or something like that in the generated voice that makes it sound just like every other generated voice. Hell, if you didn't tell me that was Obama I might not even have put 2 and 2 together - sounds like a drunk (lacking enunciation) Obama I suppose. The Hillary is barely even recognizable as her. Sorry, but I can't hear past the "robot" voice attenuation, which is what plagues all generated voices.
I'm with you man. These sure do have a long way to go! Call me when there is actually something worth listening to.
Yeah, it's impressive for what it is, but they don't sound human. The Trump voice was the closest, but then Trump doesn't sound like any other human I've ever heard.
Never let a lack of data get in the way of a good rant.
LOL @ Johnny Knoxville. Just realized that's him in that commercial. Before his Jackass days.
Hello, I am the system administrator. My voice is my password, verify me.
The point of this isn't that they can recreate 100% believable audio yet, but that they can get really close, and that it's going to happen relatively soon, so we should stop relying on audio recording as authentic.
There's really no logical reason that famous film stars are also billed prominently for animation, and yet that's what we have.
The vocal performance and personality of the actor shapes and defines the animation of the character.
Disney understood that from the beginning, which is why three generations of stars from film, radio, television and theater have recorded for Disney. Try imagining the animated Aladdin without the manic improvisation of Robin Williams.
For bonus points, try re-casting the voice of Rocket Raccoon and see if you still have a CGI and motion capture character that audiences will actually give a damn about, one that can help anchor a new franchise and deliver a billion-dollar pay-off at the box office.
No, estates will abuse it till they can get all the money there is. Expect actors who would never have lowered themselves to a certain level to be featured in ads - b/c family gets an extra buck!
http://fortune.com/2017/03/28/...
But it's NOT "really close". It's not even REMOTELY close. How the hell did this comment get modded "Insightful"?
Queen to knights level 3
Computer, verified
>but that they can get really close
I'm not so sure about that. Those samples, if they're the best we can manage, seem to indicate that we're a long way off from 'really close'.
>it's going to happen relatively soon
In the geologic sense, I suppose.
>so we should stop relying on audio recording as authentic
That's a bit premature. Synthesized voice isn't even tolerable yet; listening to it is almost painful. I don't think we'll need to worry about computer-generated impersonations ruining our lives for a long, long time.
Required reading for internet skeptics
The true goal of AI is to destroy encryption while digitally fingerprinting all of us for those that use SSL and VPN, or whatever comes next. If an AI can recreate your voice, then it can definitely know who is typing what on the Internet. Uploading biometric data to social networks isn't helping much either. Cloud computing was designed from open source software at the start to make better use of mobile devices. But now it is used by corporations to destroy the freedoms of the desktop and the privacy of software users, and to remove control. This does not sit well with most Linux people and the irony is that most cloud servers are running Linux. This allows companies to "love" open source and actually mean it, but it's really a kick in the nuts for anyone that loves FOSS and a huge financial advantage for not paying for licenses, ergo using server-based open source to destroy its desktop competition. I can get access to your API? O'lordy sir. Thankya fors ya scraps. Fuck APIs. Cloud computing is just an excuse to get people who will buy mobile devices but not new laptops stuck into something they have to pay for and have no control over. They could try to standardize a new architecture like they did in the late 2000s to get people to buy tech, but the cloud way is cheaper and they make more money and save more by not having the demand to improve hardware. I saw a new laptop the other day for $400 and it only sports 1.2 GHz and 4 GB of RAM. WTF is this shit? Y'all need to wake up because the millennial "It's 1984, oh well" syndrome is going to put us into something we average consumers can't get out of.
Give it a sample of your mother's voice and get it to make a sick note for your work or school.
You may fire when ready.
...then Queen can start touring again!
I wonder if you could use this for some sort of ultra-low-bandwidth voice compression? Imagine if your phone/chat program learned the voice of your contacts when you had bandwidth to spare, and then used that model to enhance the sound quality if you're ever severely bandwidth-constrained. Hell, it could theoretically do voice recognition and send text plus some intonation hints (or perhaps some sort of phoneme encoding), relying entirely on the remote model to turn it back into voice.
Of course, this would be a lot of work for very modest gains; voice is hardly a bandwidth hog as is.
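A back-of-the-envelope version of that trade-off, as a sketch. Every number here is an assumption for illustration: about 150 spoken words per minute, about 6 characters per word including the space, 8 bits per uncompressed character, a hypothetical 40 bit/s budget for intonation hints, and an 8 kbit/s conventional voice codec (the G.729 rate) as the baseline.

```python
def text_bitrate_bps(words_per_minute: float = 150,
                     chars_per_word: float = 6,
                     bits_per_char: float = 8) -> float:
    """Bits per second needed to send speech as plain text."""
    return words_per_minute * chars_per_word * bits_per_char / 60


def savings_ratio(codec_bps: float, text_bps: float,
                  hint_bps: float = 0) -> float:
    """How many times smaller text-plus-hints is than the codec stream."""
    return codec_bps / (text_bps + hint_bps)


text_bps = text_bitrate_bps()                        # 120.0 bit/s
ratio = savings_ratio(8000, text_bps, hint_bps=40)   # 50.0
saved_bps = 8000 - (text_bps + 40)                   # 7840.0 bit/s
```

Under those assumptions the ratio looks impressive (about 50x), but the absolute saving is under 8 kbit/s, which fits the point that voice is hardly a bandwidth hog to begin with.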
"I can here him barking." Seems like that's not too far away. Should make some interesting robocalls!
Check out Two Minute Papers on YouTube. The advance of real-time rendering is going majorly exponential. 5-10 years from now you will be able to type in a screenplay with a few additional parameters, and the movie will pop out, and it will look like live action, and the effects might even look PRACTICAL!
Also, I think people are underestimating the creative input that a performer puts into a voice performance. They can put in a lot of subtle emphasis and emotion into speech. Even if AI can perfectly replicate someone's voice, will it know when to emphasize a word, when to change the pitch of its voice, and when to insert a dramatic pause?
This is a fun thing, but the voices still sound very very artificial.
Although the samples sounded robotic, ZRTP's known attack vector is the "Rich Little" MITM attack. It's not hard to imagine this getting to the point where this is a real concern for ZRTP users.
>O'lordy sir. Thankya fors ya scraps.
AHHaaaHaaa! Yes, you are spot on.
>..because the millennial "It's 1984, oh well" syndrome
Oh hell, you ARE spot on.