Google's DeepMind Made an AI Watch Close To 5000 Videos So That It Surpasses Humans in Lip-Reading (thetechportal.com)
A new AI tool created by Google and Oxford University researchers could significantly improve the success of lip-reading and understanding for the hearing impaired. In a recently released paper on the work, the pair explained how the Google DeepMind-powered system was able to correctly interpret more words than a trained human expert. From a report: To accomplish the task, a cohort of scientists fed thousands of hours of TV footage -- 5000 to be precise -- from the BBC to a neural network. It was made to watch six different TV shows, which aired between January 2010 and December 2015. This included 118,000 different sentences and some 17,500 unique words. To understand the progress, it successfully deciphered words with 46.8 percent accuracy. The neural network had to recognize the same based on mouth movement analysis. The under 50 percent accuracy might seem laughable to you but let me put things in perspective for you. When the same set of TV shows was shown to a professional lip-reader, they were able to decipher only 12.4 percent of words without error. Thus, one can understand the great difference in the capability of the AI as compared to a human expert in that particular field.
Is 15 years late.
Are we going to have to put black tape over phone cameras now?
Al who? Al Gore? I keep hearing about this dude named Al, and all I know from TFA is that he works at Google and watches a lot of tv.
....how HAL 9000 did it!
2001: A Space Odyssey belongs to viewing selection of any discerning AI wanting to learn lip-reading.
My beloved grandmother went deaf after years working in a factory (in those days, especially during WW2 when she helped build tanks, HSE did not exist).
It was really painful to see how it penalised her in daily life, family gatherings etc.
She ended up talking all the time, and then getting paranoid about "what people were saying about her".
So, if this can be used with some kind of (better-resolved) implementation of Google Glass to help the hard of hearing, then great!
As a person with hearing difficulty, realtime captioning of live conversation would be an awesome use of this technology.
Add to that an app that identifies the people I'm talking to, and I'm your next customer.
"Never attribute to malice that which is adequately explained by stupidity." - Hanlon's Razor
Armed and Dangerous (1986)
https://www.youtube.com/watch?...
As I was reading TFA, it occurred to me that the ability of a machine to lip-read does indeed qualify as artificial intelligence. I then thought about all the posts I expect to read that say "No, this isn't AI". So maybe it's time to create a new term, "Artificial Sentience". This would distinguish between machines simply doing very complex tasks that used to be exclusively human endeavours (such as lip reading), and machines that have self-awareness and can independently, and with purpose, initiate actions toward goals defined entirely by and within the machine. I know that this rather goes against Turing's definition of AI, but I think it would add both clarity and granularity to the discussion.
Further, I would add that Artificial Intelligence is a necessary-but-not-sufficient condition for Artificial Sentience. I don't know that Artificial Sentience will ever exist, but I'm pretty sure in my own mind that Artificial Intelligence has already arrived.
Then there's the matter of whether anything truly sentient can be regarded as 'artificial' - but that's a whole 'nother question.
'The Economy' is a giant Ponzi scheme whose most pitiable suckers are the youngest among us and the yet-unborn.
Sounds about right, for the circumstances.
...lots of words end up word salad with any tools, even custom-trained, but the tools are nice for being able to at least have the words show up on beat once they are human-corrected.
I'm working on a project right now using CMU Sphinx (because it's free/open source) to identify word starts/ends for the sake of syncing word display to audio. All the tools available for speech-to-text are going to require human editing:
Comparison of commonly used speech-to-text tools
Syncing video frames of talking without the audio has got to be even more ambiguous, with more reliance on context.
Sounds like a good challenge for a learning system to pick up on. The 5000 hour mark seems almost analogous to what a human child might pick up raised watching TV in a language different from their family.
Ryan Fenton
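For anyone curious what the "words show up on beat" step above could look like once word starts/ends have been human-corrected, here is a minimal sketch. It does not call CMU Sphinx at all: the `WordTiming` records, the `word_at` helper, and the sample timings are hypothetical, hand-written stand-ins for whatever an aligner would actually emit.

```python
from bisect import bisect_right
from typing import NamedTuple, Optional, Sequence


class WordTiming(NamedTuple):
    """One aligned word (hypothetical record shape, not Sphinx output)."""
    word: str
    start: float  # seconds from the start of the audio
    end: float    # seconds from the start of the audio


def word_at(timings: Sequence[WordTiming], t: float) -> Optional[str]:
    """Return the word being spoken at time t, or None during silence.

    Assumes timings are sorted by start time and do not overlap.
    """
    starts = [w.start for w in timings]
    # Index of the last word whose start is <= t.
    i = bisect_right(starts, t) - 1
    if i >= 0 and t < timings[i].end:
        return timings[i].word
    return None


# Made-up timings, purely for illustration.
timings = [
    WordTiming("open", 0.10, 0.45),
    WordTiming("the", 0.50, 0.62),
    WordTiming("pod", 0.70, 1.05),
]

if __name__ == "__main__":
    for t in (0.0, 0.2, 0.47, 0.5, 0.9, 1.2):
        print(f"{t:.2f}s -> {word_at(timings, t)}")
```

A player would call something like `word_at` with the current playback position on every frame. The binary search only matters for long transcripts; for a short clip a linear scan would do just as well.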
Ugh. Followed closely by, 'whatever', with a dash of, 'meh.'.
I highly doubt they had a license to show the footage to an AI, since the copyright of those TV shows is for human consumption. SUE THEM! Ask for 10 million per word!
To accomplish the task, a cohort of scientists fed thousands of hours of TV footage -- 5000 to be precise -- from the BBC to a neural network.
Accuracy is therefore greatly increased on the words "tea," "Doctor," and "wanker."
systemd is Roko's Basilisk.
If it can get half the words right, it can probably figure out what the wrong ones are.
so it can only read words that are approved by Downing Street?
I'm having a hard time coming up with a scenario in which the video is available but not the audio (otherwise just process the audio for much better accuracy), but the speaker has consented to being recorded/transcribed. This seems equivalent to hiding in the bushes with a parabolic microphone. Of what practical use is this other than eavesdropping? Or maybe it doesn't matter if those other uses exist; it will nonetheless be put to the former.
...And now it desperately wants cake.
A great cake! So delicious and moist!
Whew! This water sure is cold!
This pleases the police state
English is relatively easy to lip read.
I'll be impressed when the AI can do this with Japanese, which is practically impossible for humans to lip read.
In the free world the media isn't government run; the government is media run.
People normally watch the person who is talking because we actually use both sight and sound to understand what the other person is saying. The sound is more important, for most people, but we augment the sound by lip reading a little bit.
In an environment with many people talking such as a bar or a party, our ears may hear six different people talking. Since we can focus our eyes on just one person, it helps us pick out their words from the other noise. To start with, you can see when they start and stop talking, meaning you can ignore any words you hear when their lips aren't moving; those words would be part of the conversation between other people near you n
A lip reading model is probably too transformative to qualify as a derivative work of the TV shows. And if that argument fails, Google had a license not from the copyright owners but from the federal government pursuant to 17 USC 107. This is the same license that Google used when reusing method signatures from the standard class library of Oracle's Java platform, and this case should be even clearer because the TV shows aren't reproduced verbatim in the model.
FTFY
(Score: -1, Stupid)
Why would the AI even WANT to know what you're screaming as you drift away from the airlock?
This is astounding given how little data it was given... only 100,000 sentences!
My ass.
It's for government and businesses.
Bullshit and wild honey are not the same thing.
It little behooves the best of us to comment on the rest of us.
If you don't want to be eavesdropped on, learn to speak without moving your lips. Problem solved.
I taught myself in high school without really thinking about it. I like listening to music and following along with the lyrics but I didn't want to be seen silently mouthing the words, so I did it with my tongue with my mouth closed. I could "hear" (predict) what sound I'd be making without actually making it. Doing the same movements while vocalizing when I was alone resulted in the right sounds. Some were tricky to master but I found the right substitutions.
As long as my jaws are together or close, I can do it. Doesn't matter what position my lips are in. I'd do very well in that "Speak Out" game.
I was very interested in dating this woman.
https://m.youtube.com/watch?v=...
Re: Culling the herd...
"Send out an email warning users never to click on a link embedded within an email, with an embedded link saying "Click here for more information..." and then sack everyone who does."
did it have to listen to Beethoven's 9th symphony while doing it
I'm sure Stanley Kubrick would approve
It ain't what they call you. It's what you answer to. http://mylyceum.us/
People have been looking for non-human sentience for a long time: ghosts, gods, aliens, smart animals and machines.
yeah we pick our hotel
5000 hours of video != 5000 videos.
Requiem for the American Dream