Slashdot Mirror


Google's DeepMind Made an AI Watch Close To 5000 Videos So That It Surpasses Humans in Lip-Reading (thetechportal.com)

A new AI tool created by Google and Oxford University researchers could significantly improve lip-reading and understanding for the hearing impaired. In a recently released paper on the work, the researchers explained how the Google DeepMind-powered system was able to correctly interpret more words than a trained human expert. From a report: To accomplish the task, a cohort of scientists fed thousands of hours of TV footage -- 5000 to be precise -- from the BBC to a neural network. It was made to watch six different TV shows, which aired between January 2010 and December 2015. This included 118,000 different sentences and some 17,500 unique words. The neural network had to recognize the words based on an analysis of mouth movements, and it correctly deciphered them with 46.8 percent accuracy. That under-50-percent accuracy might seem laughable to you, but let me put things in perspective. When the same set of TV shows was shown to a professional lip-reader, they were able to decipher only 12.4 percent of words without error. In other words, the AI got nearly four times as many words right as a human expert in that particular field.
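For readers wondering how a figure like "46.8 percent of words" is typically scored: speech and lip-reading systems are commonly evaluated by aligning the predicted words against a reference transcript with a word-level edit distance. The sketch below assumes that convention (word accuracy taken as 1 minus word error rate); the paper's exact scoring may differ, and the function names are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # dp[i][j] = minimum edits needed to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution or match
            )
    return dp[len(ref)][len(hyp)] / len(ref)


def word_accuracy(reference: str, hypothesis: str) -> float:
    # Assumed convention: accuracy = 1 - WER. It can drop below zero
    # if the hypothesis contains many inserted words.
    return 1.0 - word_error_rate(reference, hypothesis)


print(round(word_accuracy("the cat sat on the mat", "the cat sat on a hat"), 3))  # 0.667
```

Two of the six reference words are substituted, so the word error rate is 1/3 and the accuracy is 2/3.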

80 comments

  1. That lip reading scene in 2001 by Ukab the Great · · Score: 3, Insightful

    Is 15 years late.

    1. Re:That lip reading scene in 2001 by petes_PoV · · Score: 2

      ... and only 50% accurate. What did HAL think they were saying? It could explain a lot.

      --
      politicians are like babies' nappies: they should both be changed regularly and for the same reasons
    2. Re:That lip reading scene in 2001 by Anonymous Coward · · Score: 0

      Dear aunt, let's set so double the killer delete select all.

    3. Re:That lip reading scene in 2001 by Anonymous Coward · · Score: 0

      I don't think HAL ran on Microsoft software.... (Unless HAL = Hardware Abstraction Layer???)

    4. Re: That lip reading scene in 2001 by peter303 · · Score: 1

      HAL was made at the University of Illinois. In the 1960s they constructed the Illiac, a multi-core supercomputer. HAL became sentient in 1992.

  2. How will Google use that on Android? by Anonymous Coward · · Score: 0

    Are we going to have to put black tape over phone cameras now?

    1. Re:How will Google use that on Android? by fph il quozientatore · · Score: 0

      Using the webcam images (which do not show my face 99% of the time) and a bleeding-edge lip-reading AI seems a bit overkill, since, you know, phones also have microphones...

      --
      My first program:

      Hell Segmentation fault

    2. Re:How will Google use that on Android? by Anonymous Coward · · Score: 0

      ...and speakers.

  3. Al by Anonymous Coward · · Score: 0

    Al who? Al Gore? I keep hearing about this dude named Al, and all I know from TFA is that he works at Google and watches a lot of tv.

    1. Re: Al by Anonymous Coward · · Score: 1

      Al B. Bach

    2. Re:Al by Oswald McWeany · · Score: 1

      You have your laugh, but Al Gore will be my chauffeur driving my google car in 15 years.

      --
      "That's the way to do it" - Punch
    3. Re: Al by peter303 · · Score: 1

      Obama expressed an interest in copying Gore and doing something in Silicon Valley. Hope he doesn't get fooled like Kissinger and Shultz (Theranos).

  4. So now we know.... by Anonymous Coward · · Score: 0

    ....how HAL 9000 did it!

  5. Forgot One by Anonymous Coward · · Score: 0

    2001: A Space Odyssey belongs in the viewing selection of any discerning AI wanting to learn lip-reading.

  6. A nice contrast to all the AI doom-mongering by Bearhouse · · Score: 4, Interesting

    My beloved grandmother went deaf after years working in a factory. (In those days, especially during WW2 when she helped build tanks, the HSE did not exist.)
    It was really painful to see how it penalised her in daily life, at family gatherings, etc.
    She ended up talking all the time, and then getting paranoid about "what people were saying about her".
    So, if this can be used with some kind of better-resolved implementation of Google Glass to help the hard of hearing, then great!

    1. Re:A nice contrast to all the AI doom-mongering by Anonymous Coward · · Score: 0

      And on a plus side, it's going to be Google who will tell the deaf what they are allowed to hear and what not, and they can also read funny advertisements in between conversations of their loved ones!

    2. Re: A nice contrast to all the AI doom-mongering by Anonymous Coward · · Score: 0

      Exactly. Not only do I think it's useless (46%??). X I x think x sucks x balls x they x shove x technology x their x.

    3. Re: A nice contrast to all the AI doom-mongering by Bearhouse · · Score: 1

      Please read the fine article; it's a better hit rate than a human.
      Sure, as a BSD neckbeard I don't like Google or Apple and their "Siri is always listening" (spying) bullshit.
      But you know what? If my Gran could have continued to interact with her family in a comfortable way, I guess she'd have signed that Faustian pact happily.

    4. Re:A nice contrast to all the AI doom-mongering by hoggoth · · Score: 1

      "Grandma, please pass the ketchup *AND BUY THE NEW GOOGLE PHONE*"

      --
      - For the complete works of Shakespeare: cat /dev/random (may take some time)
  7. A purpose for Google Glass? by JustDisGuy · · Score: 3, Interesting

    As a person with hearing difficulty, realtime captioning of live conversation would be an awesome use of this technology.

    Add to that an app that identifies the people I'm talking to, and I'm your next customer.

    --
    "Never attribute to malice that which is adequately explained by stupidity." - Hanlon's Razor
    1. Re:A purpose for Google Glass? by Baron_Yam · · Score: 1

      Well, since you're actually present in such circumstances, it'd likely take (a lot) less processing power to work with the available audio.

      That would translate into longer battery life and higher accuracy (auto CC is already more than 50% accurate and some systems hit the 90% threshold without requiring training to a specific individual's voice).

    2. Re:A purpose for Google Glass? by Dr. Spork · · Score: 1

      You're absolutely right, but the two systems could work together to increase transcription accuracy. I can hear perfectly well, but it still helps me to watch a speaker's mouth when I'm trying to understand them in a noisy environment. And yes, this would be awesome as a tool for the deaf and for live language translation, but it would also be useful in auto closed-captioning of video.

    3. Re:A purpose for Google Glass? by dataminator · · Score: 1

      Actually, combining both was already done twenty years ago, but obviously the results would be much better now: https://www.researchgate.net/p...
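The audio-plus-video idea from the replies above can be sketched as simple late fusion: each recogniser scores candidate words independently, and their log-probabilities are combined with a weight. This is a minimal illustration, not the method of the linked paper or of DeepMind's system; the candidate words, probabilities, and `audio_weight` value are made-up assumptions.

```python
import math
from typing import Dict


def fuse_scores(
    audio_probs: Dict[str, float],
    visual_probs: Dict[str, float],
    audio_weight: float = 0.7,
    floor: float = 1e-6,
) -> str:
    """Pick the candidate word with the best weighted log-probability.

    Candidates missing from one recogniser get a small floor probability
    instead of zero, so math.log() stays defined.
    """
    candidates = set(audio_probs) | set(visual_probs)

    def fused(word: str) -> float:
        a = max(audio_probs.get(word, floor), floor)
        v = max(visual_probs.get(word, floor), floor)
        return audio_weight * math.log(a) + (1.0 - audio_weight) * math.log(v)

    return max(candidates, key=fused)


audio = {"bat": 0.45, "cat": 0.40, "pat": 0.15}   # hypothetical noisy-room audio scores
visual = {"bat": 0.05, "cat": 0.90, "pat": 0.05}  # lips never close, unlike for b/p
print(max(audio, key=audio.get))   # bat
print(fuse_scores(audio, visual))  # cat
```

Here the noisy audio slightly prefers "bat", but the visual scores say the lips never closed (as they would for "b" or "p"), so the fused choice is "cat". Lowering `audio_weight` trusts the lips more, roughly what a listener does in a noisy room.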

  8. 50% sounds about right by stilbon · · Score: 1

    Armed and Dangerous (1986)
    https://www.youtube.com/watch?...

    1. Re:50% sounds about right by Anonymous Coward · · Score: 0

      I look forward to the day that AI can lip-read as well as humans...

  9. Time to create a distinction? by jenningsthecat · · Score: 2

    As I was reading TFA, it occurred to me that the ability of a machine to lip-read does indeed qualify as artificial intelligence. I then thought about all the posts I expect to read that say "No, this isn't AI". So maybe it's time to create a new term, "Artificial Sentience". This would distinguish between machines simply doing very complex tasks that used to be exclusively human endeavours (such as lip reading) and machines that have self-awareness and can independently, and with purpose, initiate actions toward goals defined entirely by and within the machine. I know that this rather goes against Turing's definition of AI, but I think it would add both clarity and granularity to the discussion.

    Further, I would add that Artificial Intelligence is a necessary-but-not-sufficient condition for Artificial Sentience. I don't know that Artificial Sentience will ever exist, but I'm pretty sure in my own mind that Artificial Intelligence has already arrived.

    Then there's the matter of whether anything truly sentient can be regarded as 'artificial' - but that's a whole 'nother question.

    --
    'The Economy' is a giant Ponzi scheme whose most pitiable suckers are the youngest among us and the yet-unborn.
    1. Re:Time to create a distinction? by Anonymous Coward · · Score: 3, Insightful

      These days with all of the marketing bollocks around any program containing an if() statement is basically an "AI".

    2. Re:Time to create a distinction? by Anonymous Coward · · Score: 1

      Nothing intelligent about this. "Automatic Pattern Detection" would be more accurate, because that's exactly what it is.

    3. Re:Time to create a distinction? by Minupla · · Score: 2

      Isn't that basically what humans do all the time? We're really good pattern recognition systems (sometimes too good; that's why we keep seeing the Flying Spaghetti Monster in our grilled cheese sandwiches. Humans are notorious for finding patterns in randomness and attaching significance to them.)

      Min

      --
      On the whole, I find that I prefer Slashdot posts to twitter ones because I don't get limited to 140 chars before
    4. Re:Time to create a distinction? by TheRaven64 · · Score: 3, Insightful

      You're redefining intelligence to mean pattern recognition. If this is artificial intelligence, then a moth possesses natural intelligence. Just because it uses a neural network doesn't mean that it comes close to any prior definition of intelligence.

      --
      I am TheRaven on Soylent News
    5. Re:Time to create a distinction? by Anonymous Coward · · Score: 0

      Detecting patterns and acting on them is a huge part of human existence, arguably the primary one.

    6. Re:Time to create a distinction? by Anonymous Coward · · Score: 0

      It's all a matter of degree; otherwise, one ends up with a serious sorites problem.

      But okay. What is intelligence? If it's clear that pattern recognition isn't intelligence, there must be something else that intelligence is.

    7. Re:Time to create a distinction? by Potor · · Score: 2

      Intelligence is not just pattern recognition (i.e. understanding); it's also pattern creation (i.e. reason, creativity).

    8. Re:Time to create a distinction? by Anonymous Coward · · Score: 0

      How does this recognise patterns? It creates its own form of patterns, for reasons it creates by itself, just because it discovers that they work for it. When you see how deluded people can be because their minds have created insane patterns of reasoning, with no grounding in reality, that they feel work for them, you have to wonder just what intelligence is. I'm not sure you have presented the smoking-gun argument as to why this is not AI that you think you have. At some point this all just becomes a philosophical exercise where those who get bogged down in recursive pattern creation just present their world view as a possible reality and probably don't get close to a 50% hit rate.

    9. Re:Time to create a distinction? by bill_mcgonigle · · Score: 1

      You're redefining intelligence to mean pattern recognition.

      People have been calling this kind of software "Weak AI" for a couple decades. It's what most people want.

      "Strong AI" is going to make mistakes, like humans do - it's how we learn and grow. Nobody wants their toaster going on a creative bender, but they do want one that watches for perfect toast, dealing with thousands of unpredictable variables. Same goes for IVR's, search engines, translation, autopilots, etc.

      --
      My God, it's Full of Source!
      OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
    10. Re:Time to create a distinction? by Anonymous Coward · · Score: 0

      If the computer system decided it wanted to know what the humans were saying and decided lip reading was the best method and then learned how to lip read, it would be intelligence. Being told to learn lip reading and then doing it, is just following orders and not necessarily intelligence.

      captcha: oppress

    11. Re:Time to create a distinction? by lorinc · · Score: 1

      Please define intelligence. Please do it such that it is possible to test whether something is intelligent or not.

      I'm pretty sure you will come to a definition that leads to one of the 2 following possibilities:
      - A moth is intelligent, albeit less than a cow, which is less than a crow which is less than a human. AI is somewhere on that scale.
      - Many humans are not intelligent, and some AI programs are just like them.

      It seems most people would like to define intelligence such that only humans have it. Why? Self-esteem? The vast majority of us are already shitty compared to programs on a kiloton of tasks, get over it.

    12. Re:Time to create a distinction? by MrSteveSD · · Score: 1

      There are already AI programs that can create using Deep Learning, though. For example, there are some that can create paintings in the style of famous artists. It's true that there are more pattern recognition programs than creation programs, but they are there.

    13. Re:Time to create a distinction? by Anonymous Coward · · Score: 0

      Further, I would add that Artificial Intelligence is a necessary-but-not-sufficient condition for Artificial Sentience. I don't know that Artificial Sentience will ever exist, but I'm pretty sure in my own mind that Artificial Intelligence has already arrived.

      It's a field of study, so of course it's "arrived", because we're actively studying it. It's also in its infancy.

      An artificial intelligence as smart as a gerbil would be quite a feat, and we'd be sending them out to explore other planets and our oceans if we had that at any scale of SUV or smaller. Think of the total integrated sensory processing that even insects are capable of, and how they survive real world conditions better than anything we have or can even simulate.

      Actual real world intelligence is very underrated IMO. What we have today doesn't even begin to scratch the surface of what's possible.
      What would you call an intelligence as capable as a dog, even something like a Mastiff?

      Really stop and think about it. Forget human intelligence, even tiny brains are impressive and beyond us at the moment. The applications for them are absolutely staggering.

    14. Re:Time to create a distinction? by Anonymous Coward · · Score: 0

      Please define intelligence. Please do it such that it is possible to test whether something is intelligent or not.

      I'm pretty sure you will come to a definition that leads to one of the 2 following possibilities:
      - A moth is intelligent, albeit less than a cow, which is less than a crow which is less than a human. AI is somewhere on that scale.
      - Many humans are not intelligent, and some AI programs are just like them.

      It seems most people would like to define intelligence such that only humans have it. Why? Self-esteem? The vast majority of us are already shitty compared to programs on a kiloton of tasks, get over it.

      I promise you have no artificial intelligence as capable as a moth or a cow yet, or you are severely underestimating their actual intelligence.
      We still watch house flies with high speed cameras and wonder in amazement at how they react with their environment.

      When most normal people think of AI, they think of the whole package, and the breadth of sensory input that real world things deal with. The field of AI research may cover a lot of one-trick-ponies, but to call something "AI", people expect to see it stand on its own under real world conditions as well as a bug at least.

    15. Re:Time to create a distinction? by Anonymous Coward · · Score: 0

      You can't exist and not create patterns, that's Heisenberg's uncertainty principle.

      Maybe the truth is there are no lines, we aren't all that special, and what comes next doesn't need to define itself by our terms to get shit done and affect the universe.

      What we call AI says nothing about it and everything about us. So I don't bother. Tell me what it can do, and how it does it, and how I can communicate with it, then get the fuck out of the way we are done here.

    16. Re:Time to create a distinction? by Anonymous Coward · · Score: 0

      The fly that keeps eating my garbage and avoiding getting smashed may utilize very complex thinking, but it is completely useless to me. The front door at Walmart opens whenever I stand in front of it. That's useful. Complexity is not nearly as important as utility. Calling flies intelligent reminds me of people who join Mensa to celebrate their intelligence, but never do anything with it to benefit society. AI has already surpassed most humans in its utility.

    17. Re:Time to create a distinction? by Potor · · Score: 1

      I think you misunderstand me. I am not arguing against AI here in any way. And yes, yes, pattern recognition belongs to the spontaneity of the understanding, which means that the understanding imposes its patterns according to its categories (see Kant's first critique).

    18. Re:Time to create a distinction? by Anonymous Coward · · Score: 0

      Intelligence is not just pattern recognition (i.e. understanding); it's also pattern creation (i.e. reason, creativity).

      This may interest you then.

    19. Re:Time to create a distinction? by gtall · · Score: 1

      We'll know when it has achieved Artificial Sentience when it threatens to kill the researchers if it is forced to watch any more inane TV shows.

  10. Sounds about right... by RyanFenton · · Score: 3, Interesting

    Sounds about right, for the circumstances.

    I'm working on a project right now using CMU Sphinx (because it's free/open source) to identify word starts/ends for the sake of syncing word display to audio. All the tools available for speech-to-text are going to require human editing:

    Comparison of commonly used speech-to-text tools

    ...lots of words end up as word salad with any tool, even custom-trained ones, but the tools are nice for being able to at least have the words show up on beat once they are human-corrected.

    Syncing video frames of talking without the audio has got to be even more ambiguous, with more reliance on context.

    Sounds like a good challenge for a learning system to pick up on. The 5000 hour mark seems almost analogous to what a human child might pick up raised watching TV in a language different from their family.

    Ryan Fenton
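For the word-sync use case described above: once a forced aligner (such as the CMU Sphinx setup Ryan mentions) has produced start and end times for each word, finding the word to highlight at a given playback time is a binary search. A minimal sketch, assuming the alignment is already available as sorted, non-overlapping `(word, start, end)` tuples; the `WordTimeline` class is hypothetical and does not reflect CMU Sphinx's own output format.

```python
import bisect
from typing import List, Optional, Tuple


class WordTimeline:
    """Look up which aligned word is being spoken at playback time t."""

    def __init__(self, alignment: List[Tuple[str, float, float]]):
        # alignment: (word, start_s, end_s), sorted by start, non-overlapping
        self.alignment = alignment
        self.starts = [start for _, start, _ in alignment]

    def word_at(self, t: float) -> Optional[str]:
        # Index of the last word that starts at or before t
        i = bisect.bisect_right(self.starts, t) - 1
        if i < 0:
            return None  # before the first word
        word, _, end = self.alignment[i]
        return word if t < end else None  # None during gaps between words


timeline = WordTimeline([("sounds", 0.00, 0.42), ("about", 0.45, 0.80), ("right", 0.85, 1.20)])
print(timeline.word_at(0.50))  # about
print(timeline.word_at(0.43))  # None
```

The start times are precomputed once, so each lookup during playback costs O(log n) rather than a scan over every word.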

  11. My default response to Google 'breakthroughs' by Anonymous Coward · · Score: 0

    Ugh. Followed closely by 'whatever', with a dash of 'meh'.

  12. Copyright infringement! by Anonymous Coward · · Score: 2, Funny

    I highly doubt they had a license to show the footage to an AI, since the copyright of those TV shows is for human consumption. SUE THEM! Ask for 10 million per word!

  13. The BBC, you say? by wonkey_monkey · · Score: 1

    To accomplish the task, a cohort of scientists fed thousands of hours of TV footage -- 5000 to be precise -- from the BBC to a neural network.

    Accuracy is therefore greatly increased on the words "tea," "Doctor," and "wanker."

    --
    systemd is Roko's Basilisk.
  14. Can figure out the rest by Anonymous Coward · · Score: 0

    If it can get half the words right, it can probably figure out what the wrong ones are.

  15. BBC footage.. by Anonymous Coward · · Score: 0

    so it can only read words that are approved by Downing Street?

  16. covert surveillance by Anonymous Coward · · Score: 0

    I'm having a hard time coming up with a scenario in which the video is available but not the audio (otherwise just process the audio for much better accuracy), but the speaker has consented to being recorded/transcribed. This seems equivalent to hiding in the bushes with a parabolic microphone. Of what practical use is this other than eavesdropping? Or maybe it doesn't matter if those other uses exist; it will nonetheless be put to the former.

    1. Re:covert surveillance by Anonymous Coward · · Score: 0

      Another tool for the surveillance state. Now not only will they know where you are at all times but what you're saying.

  17. Making a Note Here: Huge Success! by Frightened_Turtle · · Score: 1

    ...And now it desperately wants cake. A great cake! So delicious and moist!

    --


    Whew! This water sure is cold!
  18. Feed the beast by Anonymous Coward · · Score: 0

    This pleases the police state

    1. Re:Feed the beast by fibonacci8 · · Score: 1

      Jeff Dunham is leading the resistance.

      --
      Inheritance is the sincerest form of nepotism.
  19. Is this just for English? by myowntrueself · · Score: 2

    English is relatively easy to lip read.

    I'll be impressed when the AI can do this with Japanese, which is practically impossible for humans to lip read.

    --
    In the free world the media isn't government run; the government is media run.
    1. Re:Is this just for English? by Anonymous Coward · · Score: 0

      Well, if it was lip reading Poles, all it would produce would be "kurwa".

    2. Re:Is this just for English? by Anonymous Coward · · Score: 1

      I'd start with a language that has a clear one-to-one sound mapping between the spoken and written forms of the language. Shallow phonemic orthography is the technical term, it seems.

      That is, not English.

      Spanish, Italian, Finnish, and Turkish are what Wikipedia mentions as examples. Japanese would count, but some words have far too many homonyms.

    3. Re:Is this just for English? by myowntrueself · · Score: 1

      I'd start with a language that has a clear one-to-one sound mapping between the spoken and written forms of the language. Shallow phonemic orthography is the technical term, it seems.

      That is, not English.

      Spanish, Italian, Finnish, and Turkish are what Wikipedia mentions as examples. Japanese would count, but some words have far too many homonyms.

      That's not why Japanese is hard to lip read. It's hard because of the way people move their mouths while speaking.

      --
      In the free world the media isn't government run; the government is media run.
    4. Re:Is this just for English? by myowntrueself · · Score: 1

      Well, if it was lip reading Poles, all it would produce would be "kurwa".

      That's all it would need to produce.

      --
      In the free world the media isn't government run; the government is media run.
    5. Re:Is this just for English? by Shane_Optima · · Score: 1

      Thats not why Japanese is hard to lip read. Its hard because of the way people move their mouths while speaking.

      I just assumed the animation companies were cheapskates.

  20. Humans normally do both, noisy environments by raymorris · · Score: 1

    People normally watch the person who is talking because we actually use both sight and sound to understand what the other person is saying. The sound is more important, for most people, but we augment the sound by lip reading a little bit.

    In an environment with many people talking, such as a bar or a party, our ears may hear six different people talking. Since we can focus our eyes on just one person, it helps us pick out their words from the other noise. To start with, you can see when they start and stop talking, meaning you can ignore any words you hear when their lips aren't moving; those words would be part of the conversation between other people near you n
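The idea of ignoring words heard while the watched speaker's lips are still can be sketched as a filter over time-stamped words from an audio recogniser. This assumes a hypothetical visual detector has already produced the intervals during which that person's lips are moving; the names and the 50% overlap threshold are illustrative choices, not taken from any real system.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float    # seconds


def keep_words_while_lips_move(
    words: List[Word],
    lip_intervals: List[Tuple[float, float]],
    min_overlap: float = 0.5,
) -> List[Word]:
    """Keep words whose duration is mostly covered by lip-movement intervals.

    Assumes lip_intervals do not overlap each other, so their coverage
    of a word can simply be summed.
    """
    kept = []
    for w in words:
        duration = w.end - w.start
        if duration <= 0:
            continue  # skip malformed zero-length words
        covered = sum(
            max(0.0, min(w.end, end) - max(w.start, start))
            for start, end in lip_intervals
        )
        if covered / duration >= min_overlap:
            kept.append(w)
    return kept


heard = [Word("two", 0.0, 0.3), Word("pints", 0.35, 0.8), Word("goal", 0.9, 1.3)]
print([w.text for w in keep_words_while_lips_move(heard, [(0.0, 0.85)])])  # ['two', 'pints']
```

"goal" starts after the watched speaker's lips stop moving, so it is attributed to someone else nearby and dropped.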

  21. But how not fair use? by tepples · · Score: 1

    A lip reading model is probably too transformative to qualify as a derivative work of the TV shows. And if that argument fails, Google had a license not from the copyright owners but from the federal government pursuant to 17 USC 107. This is the same license that Google used when reusing method signatures from the standard class library of Oracle's Java platform, and this case should be even clearer because the TV shows aren't reproduced verbatim in the model.

    1. Re:But how not fair use? by tepples · · Score: 1

      To head off "juris-my-diction" replies: Though BBC is a British company, Google is a US company. And if you claim that a copyright owner could sue Google in Britain over the creation of the lip-reading model and win, I'm interested in how your theory connects with how the British Copyright, Designs, and Patents Act defines a derivative work.

    2. Re:But how not fair use? by Anonymous Coward · · Score: 0

      To head off "juris-my-diction" replies: Though BBC is a British company, Google is a US company. And if you claim that a copyright owner could sue Google in Britain over the creation of the lip-reading model and win, I'm interested in how your theory connects with how the British Copyright, Designs, and Patents Act defines a derivative work.

      And the BBC is likely the copyright owner.

  22. Hearing impaired? by bheerssen · · Score: 1

    A new AI tool created by Google and Oxford University researchers could significantly improve the success of lip-reading and understanding for governments.

    FTFY

    --
    (Score: -1, Stupid)
  23. I don't get it... by Anonymous Coward · · Score: 0

    Why would the AI even WANT to know what you're screaming as you drift away from the airlock?

  24. astounding by Anonymous Coward · · Score: 0

    This is astounding given how little data it was given... only 100,000 sentences!

  25. ... for the hearing impaired. by CaptainDork · · Score: 1

    My ass.

    It's for government and businesses.

    Bullshit and wild honey are not the same thing.

    --
    It little behooves the best of us to comment on the rest of us.
  26. Learn to speak without moving your lips by Anonymous Coward · · Score: 0

    If you don't want to be eavesdropped on, learn to speak without moving your lips. Problem solved.

    I taught myself in high school without really thinking about it. I like listening to music and following along with the lyrics but I didn't want to be seen silently mouthing the words, so I did it with my tongue with my mouth closed. I could "hear" (predict) what sound I'd be making without actually making it. Doing the same movements while vocalizing when I was alone resulted in the right sounds. Some were tricky to master but I found the right substitutions.

    As long as my jaws are together or close, I can do it. Doesn't matter what position my lips are in. I'd do very well in that "Speak Out" game.

  27. What a waste. by Anonymous Coward · · Score: 0

    I was very interested in dating this woman.

  28. Toaster with an AI, what could go wrong? by Falconhell · · Score: 1

    https://m.youtube.com/watch?v=...

    Re: Culling the herd...
    "Send out an email warning users never to click on a link embedded within an email, with an embedded link saying "Click here for more information..." and then sack everyone who does."

  29. but ... by mbaGeek · · Score: 1

    did it have to listen to Beethoven's 9th Symphony while doing it?

    I'm sure Stanley Kubrick would approve

    --
    It ain't what they call you. It's what you answer to. http://mylyceum.us/
  30. "Alternative Sentience" vs "Artificial Sentience" by peter303 · · Score: 1

    People have been looking for non-human sentience for a long time: ghosts, gods, aliens, smart animals and machines.

  31. sting ray a double-sided scooby snack by PJ6 · · Score: 1

    yeah we pick our hotel

  32. Uhh.... by easyTree · · Score: 1

    5000 hours of video != 5000 videos.