Researchers Build An AI That's Better At Reading Lips Than Humans (bbc.com)
An anonymous reader quotes the BBC:
Scientists at Oxford say they've invented an artificial intelligence system that can lip-read better than humans. The system, which has been trained on thousands of hours of BBC News programs, has been developed in collaboration with Google's DeepMind AI division. "Watch, Attend and Spell", as the system has been called, can now watch silent speech and get about 50% of the words correct. That may not sound too impressive - but when the researchers supplied the same clips to professional lip-readers, they got only 12% of words right...
The system now recognizes 17,500 words, and one of the researchers says, "As it keeps watching TV, it will learn."
I'm sorry Dave, I'm afraid I can't do that.
Seeing as there's so much closed-captioning going on, they've got an enormous volume of material to train their neural network on.
I've done this sort of thing before, and often finding a large set of quality training material is a significant challenge.
Getting half the words correct, then feeding that into a grammar / context engine should yield very close to 100% accuracy. That's what deaf (and hearing-impaired) lip readers have to do, since the stated 12% initial recognition is about right. They have to stay very focused on the speaker and make heavy use of context to work out what's being said. And that's a perfect job for a computer.
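To make the "grammar / context engine" idea concrete, here is a toy sketch in Python. It is not how "Watch, Attend and Spell" works; it only illustrates rescoring ambiguous lip-reading guesses with word-pair context. The candidate lists, the probabilities, and the names `slots`, `bigram`, `visual_only`, and `decode` are all invented for illustration.

```python
import math

# Hypothetical lip-reader output: for each word slot, candidate words with a
# made-up "visual" probability. Sounds like b/m/p look alike on the lips.
slots = [
    [("the", 1.0)],
    [("ban", 0.40), ("man", 0.35), ("pan", 0.25)],
    [("spoke", 1.0)],
]

# Hypothetical word-pair (bigram) probabilities standing in for "context".
bigram = {
    ("<s>", "the"): 0.5,
    ("the", "man"): 0.5, ("the", "ban"): 0.1, ("the", "pan"): 0.2,
    ("man", "spoke"): 0.4, ("ban", "spoke"): 0.01, ("pan", "spoke"): 0.01,
}

def visual_only(slots):
    """Pick the visually most likely word in each slot, ignoring context."""
    return [max(slot, key=lambda c: c[1])[0] for slot in slots]

def decode(slots, bigram, unseen=1e-4):
    """Viterbi-style search: best word sequence under visual * bigram scores."""
    # best maps a last word to (log score, path) of the best sequence ending there.
    best = {"<s>": (0.0, [])}
    for slot in slots:
        nxt = {}
        for word, p_visual in slot:
            for prev, (score, path) in best.items():
                s = (score + math.log(p_visual)
                     + math.log(bigram.get((prev, word), unseen)))
                if word not in nxt or s > nxt[word][0]:
                    nxt[word] = (s, path + [word])
        best = nxt
    return max(best.values(), key=lambda t: t[0])[1]

print(visual_only(slots))     # context-free guess
print(decode(slots, bigram))  # guess after rescoring with context
```

With these invented numbers the context-free guess picks "ban" in the middle slot, while the rescored sequence picks "man". How close a real system could get to 100% this way depends entirely on the quality of the visual model and the language model.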
I work for the Department of Redundancy Department.
Well, there is "Bad Lip Reading" - their videos are usually pretty funny.
"the same clips to professional lip-readers"
ok, who else didn't know that there are "professional" lip readers?
The police use them from time to time (on surveillance videos). I imagine there are other uses as well.
Just cruising through this digital world at 33 1/3 rpm...
The surveillance state is coming in its pants thinking about all the additional conversations they'll be able to monitor now.
Time to break out the bandannas and cough-masks... soon it'll be fashionable to wear them in public!
Go compare this to a deaf person that reads lips. I know of literally thousands that never miss a single spoken word as long as they're looking at the speaker's mouth.
Source: Camfrog, where there are fucktons of deaf people communicating with those with hearing. We speak after getting their attention with a hand signal, they read our lips and reply with zero issues.
Still waiting on Serviscope_minor to wake up to fucking reality and realize that Jessica Price isn't going to fuck him.
Why don't they offer to run this against the thousands of hours of course videos that Berkeley just pulled due to the ADA? Google gets massive training material, Berkeley gets free transcripts, and the material stays online. Everyone wins...