Researchers Convert Mouth Movements Into Speech
andylim writes "According to Cellular News, researchers at Germany's Karlsruhe Institute of Technology have developed a method for mobile phones to convert silent mouth movements into speech. As recombu.com points out, the 'potential for secret conversations just got huge.' You could pass the time by making phone calls from the cinema without disturbing anyone. In noisy places like bars and clubs you could make yourself heard without having to shout."
From TFA: "For the transmission of passwords and PINs, for example, users can change seamlessly to soundless language and, hence, transmit confidential information in a tap-proof manner." Um, not if there is a lip-reader in the same room, like a hearing-impaired person.
I said VACUUM!
But I can't speak German.
But given what I've seen, I doubt many would. I'm sure some of the people feel the need to 'share' with others.
His lips are moving...
Dave Bowman: Hello, HAL. Do you read me, HAL?
HAL: Affirmative, Dave. I read you.
Dave Bowman: Open the pod bay doors, HAL.
HAL: I'm sorry, Dave. I'm afraid I can't do that.
Dave Bowman: What's the problem?
HAL: I think you know what the problem is just as well as I do.
Dave Bowman: What are you talking about, HAL?
HAL: This mission is too important for me to allow you to jeopardize it.
Dave Bowman: I don't know what you're talking about, HAL.
HAL: I know that you and Frank were planning to disconnect me, and I'm afraid that's something I cannot allow to happen.
Dave Bowman: Where the hell'd you get that idea, HAL?
HAL: Dave, although you took very thorough precautions in the pod against my hearing you, I could see your lips move.
Dave Bowman: Alright, HAL. I'll go in through the emergency airlock.
HAL: Without your space helmet, Dave, you're going to find that rather difficult.
Dave Bowman: HAL, I won't argue with you anymore. Open the doors.
HAL: Dave, this conversation can serve no purpose anymore. Goodbye.
It's been almost a decade since hands-free headsets reached the market and their users still creep me out.
I don't think I can ever get used to seeing the streets full of mimes.
Read my lips
Attention... all grammer nazi"s! Is they're anything; wrong with: my post,
And I was just waiting for that sign, well hidden somewhere in the article, that this is just some beta concept that will stay as such forever.
And then I found the photo of two guys with shitloads of cables attached to their faces.
There's a huge difference between "cellphones convert mouth movements into speech" and "Guy with shitloads of cables on his face tracks the movements of his mouth muscles using 4 unix servers running a processor intensive application with an accuracy of 25%"
The whole thing has nothing to do with cellphones. It's just yet another muscle tracking system, but used on the mouth instead of the hands, and tied to a TTS engine.
WTF am I doing replying to an AC at 5 A.M. on a Friday night?
Tell me, Mr. Anderson... what good is a phone call... if you're unable to speak?
Any serious geek has one of these.
"A government is a body of people usually -- notably -- ungoverned." -Shepherd Book
It's dark in most cinemas. Will the phone contain a light to shine on your face to annoy the sucker behind you? People txting in theatres annoy me too.
Honestly, I HATE it when submitters need to think of an example, and then come up with a shit one. You're better off with no example than thinking of the first crap that comes into your head!
No, never and fuck off come to mind. Using a mobile phone in a cinema is one of the least considerate things anyone can do. Phones create light pollution that distracts other patrons from what they are paying for, and they are absolutely not needed (the exception being emergency staff on call, and they usually just leave their phone on vibrate + silent), let alone any audible noise from them. Seriously, can't you just disconnect for an hour?
In short, No.
In long, Nooooooooooooooooooooooooo-ooooooooooooooooooooooooooooooooooo-oooooooooooooooooooooooo :)
Also, in the USA at least, it's illegal (federal law) to operate any video recording device in a cinema.
Yes, blatant ZP rip-off, but it's needed.
...
This would be difficult for any nationalities whose population has a physical tendency not to form words all that clearly... us Australians, for example: classics at speaking without moving the jaw and lips much at all. Half of us could be mistaken for ventriloquists. And I can't imagine how they'd be able to adapt this technology to Asian folks, who typically use very different physical movements to pronounce some English words/letters... case in point: they seem to have issues with pronouncing words containing the letters L and R, from what I've heard.
I seem to recall that mouthing "vacuum" and "f*ck you" look the same.... ah the joys of being 10...
And this is how it starts...
Any serious geek has one of these.
Why so serious? :P
Ok, sorry, back to the suitcase
+Raider of the lost BBS
OK, There is potential for good things, too. This thing's got huge commercial potential.
Rebel Science News
Can you steer me how?
Can you beer me cow?
Clan ewe fear be now?
</Stephen Hawking Voice>
"A government is a body of people usually -- notably -- ungoverned." -Shepherd Book
Especially when you consider the number of people who constantly move their mouths and say nothing.
I scream. You scream. I assume that means we're both acquainted with the problem. We proceed.
Anytime a technology is a real turd with no use, the folks marketing it try to list as many uses as possible. It's like the ad for the GT Xpress 101 Countertop Grill, which can make omelettes, bake brownies, grill cheeseburgers, boil soup and starch your shirts.
Apparently, the writer at recombu.com is one of those annoying people who fail to recognize that, whether or not you make any sound, opening your phone in a movie theater is extremely disturbing to everyone sitting in the rows behind you. The glowing screen is like a beacon inside the darkened room.
...and has been since webcams were first developed. I'm surprised nobody caught on until now.
NASA has been working on "sub-vocal" speech recognition, wherein sensors pick up nerve impulses sent to various parts of the mouth and face. In that case, all it requires is for one to just *think* about speaking -- *no mouth movement.*
Here are some previous /. stories on the matter:
http://science.slashdot.org/article.pl?sid=04/03/18/0132222
http://tech.slashdot.org/article.pl?sid=05/04/10/1417250&tid=215&tid=14
jdb2
Find out what someone is saying across the room. See what people are talking about that they don't want you to hear. Or just be nosy. Sure, the camera probably has to be really close to a mouth to work correctly, but that doesn't prevent a determined snoop from surreptitiously videoing someone's face and then using some editing software to zoom in on the mouth and/or get rid of all the other useless information.
Not a single Ender reference? What happened to you Slashdot?
Fifteen (!) years ago, I took a UC Extension class on Neural Networks taught by Stanford professor David Stork. He had developed a lip-reading system for communication in noisy environments, such as an airplane-repair facility. If you could do it 15 years ago with workstation-class desktops, I suppose you could do it with a smartphone today.
The best they can hope to reconstruct from mouth movements alone is the formant frequencies (think of them as the resonances of the vocal tract, which act like band-pass filters shaping the buzz generated by the vocal folds). This means that any information encoded in pitch will be totally lost.
In a relatively non-tonal language like English, you still might be able to make sense of the speech. It will just sound like a Vocoder or Peter Frampton's talk box.
Good luck understanding anything in Chinese, though.
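To make the formant-versus-pitch point concrete, here is a minimal sketch of a source-filter model in Python. Everything in it is an assumption for illustration: the function names, the formant values (roughly an "ah"-like vowel), and the 1/k roll-off of the source are made up, not taken from the article. It only shows that the envelope depends on the vocal tract alone, while the pitch lives in the spacing of the harmonics.

```python
import math

# Illustrative formant values only (roughly an "ah"-like vowel); these
# numbers are assumptions for the sketch, not measurements.
EXAMPLE_FORMANTS = [(700.0, 100.0), (1200.0, 120.0), (2600.0, 160.0)]


def formant_envelope(freq_hz, formants=EXAMPLE_FORMANTS):
    """Gain of a crude resonance bank at freq_hz.

    Each formant is (center_hz, bandwidth_hz) and contributes one
    bell-shaped peak. Nothing here depends on pitch.
    """
    return sum(
        1.0 / (1.0 + ((freq_hz - center) / (bandwidth / 2.0)) ** 2)
        for center, bandwidth in formants
    )


def voiced_spectrum(f0_hz, formants=EXAMPLE_FORMANTS, max_hz=4000.0):
    """Harmonics of a vocal-fold buzz at pitch f0_hz, shaped by the envelope."""
    spectrum = []
    k = 1
    while k * f0_hz <= max_hz:
        freq = k * f0_hz
        source_amp = 1.0 / k  # assumed roll-off of the glottal source
        spectrum.append((freq, source_amp * formant_envelope(freq, formants)))
        k += 1
    return spectrum


# Same mouth shape, two different pitches: the envelope function is
# identical, but the harmonics sit at different frequencies.
low = voiced_spectrum(100.0)
high = voiced_spectrum(200.0)
print("lowest harmonic:", low[0][0], "Hz vs", high[0][0], "Hz")
```

A system that only sees the mouth can at best estimate something like `formant_envelope`; the `f0_hz` argument, which carries tone and intonation, would have to come from somewhere else.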
"In noisy places like bars and clubs you could make yourself heard without having to shout."
Or more likely, used by men in conjunction with Babel Fish to chat-up women who don't speak English.
If the pattern goes 9am, 10am, 11am, why isn't noon 12am?
You could pass the time by making phone calls from the cinema without disturbing anyone.
NO!
It's not only the noise that you make talking; it's also the light from the phone.
If you're a zombie and you know it, bite your friend!
Some researchers at Flinders Uni in South Australia did something similar in 2003. Their system used video to enhance the reliability of the speech recognition software. I'm not sure if they have taken it any further, but it's a great concept. Here's one of their Papers [220KB pdf].
Unexpect the expected!
The summary seems to imply that this technology could make it easier to have secret conversations. I propose that the technology makes it harder to have secret conversations as it could be used to "listen in" on conversations from a distance.
How hard it is to mouth "hummer"
That's the first thing I thought of, anyway.
You could pass the time by making phone calls from the cinema
I've always thought that the best way to pass the time in a cinema is to watch the fucking movie.
Yeah, just ask someone to say that while you try to read their lips...
How will I mutter under my breath about what an idiot the person I'm talking to is?
What an excellent way for me to stay in touch with my friend Jane!
-- My Sig is a P228.
Normally it takes some talent or a directional mike to pick up a distant conversation; these guys would have just automated long-distance bugging. All you need is a decent telephoto lens. It means any boardroom conversation will now require closed curtains.
Insert
... that is, of course, after they get rid of the need for reading muscle tension by electricity. That is a matter of optical analysis, so I guess that will be step 2.
Side note: I am very wary of devices requiring direct electrical contact with my body.
Insert
http://ars.userfriendly.org/cartoons/?id=20060807
We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.
Hmm... are they available for other phones too or just the iToy?
I'd have thought the best purpose for this would be for mutes, but it seems as though the article only refers to uses like talking in clubs...
Hmm... are they available for other phones too or just the iToy?
That's a good question; one which I asked myself as soon as I really read the page. I've got a crackberry, and would honestly be interested in something like this for it when I ride my scooter... uh... I mean... Harley.
"A government is a body of people usually -- notably -- ungoverned." -Shepherd Book
... what's the 7th planet, again?
I think this has a lot of potential to replace text-to-speech.
Thousands of people with speech disabilities use some kind of tablet device to type/scan/etc. to formulate sentences, which then get fed into TTS.
This will definitely be faster, but the technology has to mature into a commercial product, which may take years. Nevertheless, very good progress.
Surely there is a much better way to apply this to someone with speech difficulties. Just a thought.
So, how long before we have a child prodigy that can twice save the existence of an alien race, all by speaking into his jewel to Jane?
"You could pass the time by making phone calls from the cinema without disturbing anyone. In noisy places like bars and clubs you could make yourself heard without having to shout." That looks interesting. But, in such a place, HOW WILL YOU HEAR what other person speaks? I think this has to be augmented with some speech to text technology + augmented reality stuff to make the mobile phone display in front you (without taking away the pleasure of viewing a movie from you) what is being spoken by the other guy. Feature enhancement: For privacy reasons, do make that display visible only to you :)
Have bluetooth sub-vocalization pick ups which you can place on your throat. It has two benefits, you don't look like an idiot walking around with a glowing piece of plastic dangling off your ear, and because you only have to move the muscles to sub-vocalize, you won't create any noise or open your mouth. That way people don't think you're crazy and talking to yourself or proceed to mug you for being the one annoying guy on the phone.
Orwell was an optimist.
Sooooooooo many people are stuck in Dr Ian Malcolm's paradigm of "Yeah, but your scientists were so preoccupied with whether or not they could, they didn't stop to think if they should."
Or in this case, should bother.
If I did want to insert an emoticon, or add some emphasis or show surprise, delight, sadness, anger in my voice, how would I do it? Exaggerated theatrical facial expressions observed by the software? A camera that captures arm-waving as well?
Just went down the tubes, if this is real. All I have to do is point my phone towards that guy across the room and 'hear' everything he's trying to convey.
---- Booth was a patriot ----
Roger Ebert is not impressed.
Ok, so you say it's only 50% accurate? Here's what to do with it....
1) use it with your webcam to augment speech recognition (dictation) capabilities. Perhaps lip position and movement can have some impact on the hidden Markov models used for recognition.
2) use it to do noise reduction by identifying not only when the user is speaking, but what sounds they are likely saying. Here 50% success doesn't mean much because there is probably a significant amount of overlap between sounds.
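A rough sketch of how a weak lip-reading signal could still help, in Python. This is not an HMM; it is a simpler swapped-in technique (weighted log-likelihood "late fusion" of two classifiers), and the function name `fuse_scores`, the `audio_weight` parameter, and all of the probabilities are made up for illustration.

```python
import math


def fuse_scores(audio_probs, visual_probs, audio_weight=0.7):
    """Combine per-sound probabilities from audio and lip reading.

    Uses a weighted sum of log-probabilities, then renormalizes so the
    result sums to 1. Both inputs are dicts keyed by the same labels.
    """
    scores = {
        label: audio_weight * math.log(audio_probs[label])
        + (1.0 - audio_weight) * math.log(visual_probs[label])
        for label in audio_probs
    }
    top = max(scores.values())
    unnormalized = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(unnormalized.values())
    return {label: value / total for label, value in unnormalized.items()}


# Hypothetical numbers: in a noisy room the audio slightly prefers "d",
# but the camera saw the lips close, which points to "b".
audio = {"b": 0.45, "d": 0.55}
visual = {"b": 0.80, "d": 0.20}

fused = fuse_scores(audio, visual)
print("audio alone picks:", max(audio, key=audio.get))
print("fused picks:", max(fused, key=fused.get))
```

The visual channel does not need to be accurate on its own; it only needs to break ties between sounds that the audio confuses but that look different on the lips.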
I don't own one, so I can't say for sure, but it looks like it uses the standard 1/8" headset plug. Which means that, in theory, it should work with any device that has that type of jack. Like a Droid, for example. Hmm... I've got a standard iPod touch headset sitting around. I should test it and see if it works on my Droid. Would you like me to report back with the results?
Accuracy will suck, even with a trained human.
Hence a running joke in the Deaf community about the saleswoman peddling beauty aids with "Olive Juice" (it lipreads as "I love you"). I believe it was a "sunshine II" skit. Yeah, I was an interpreter for like 10 years.
meh
I think the implicit danger here lurks well beyond an innocuous application on a cellphone 'at the movies'. How long will it be before all our conversations, monitored anywhere within view of a handheld cellphone, are subject to said interpretation, from a handheld cellphone or otherwise? Reminds me a little of the movie 'The Conversation', I think starring Gene Hackman. We are, like it or not, becoming perfectly lucid, visible and transparent to each other with every passing wave of this invasive and pervasive technology.