Digital Mouths, Synthetic Faces at MIT and Lucasfilm
jfengel writes "Two separate articles about generating faces automatically. From the Boston Globe, there is a story about MIT scientists putting words into somebody's mouth by splicing together footage. In the samples, I couldn't tell the difference between the synthetic footage and the same person really saying the same thing (though it's a little hard to tell at only 81kbps video). And Wired has a lengthy article about generating purely synthetic faces at Lucasfilm. It discusses some of the difficulties in getting it right."
I mean, this is pretty cool and all, but there's no reason to start worrying if someone's gonna put words in your mouth anytime soon. First they'd need:
1. a few minutes of footage of you saying stuff that has the full range of mouth movements directly into a camera.
2. an audio recording of you actually saying what it is that they want you to say. It's possible to cut and splice separate recordings together, but 99% of the time, differences in the sound space would make it obvious that the recording was spliced together.
And then after that, all they'd have is a video of you saying the thing and staring like a zombie into the camera.
It's cool in theory, but I think Hollywood has already achieved better results.
Mmm, Gummi Venus De Milo...
... This could hardly be done discreetly, at least with today's technology...
I don't think it would be that hard to gather data on someone's face, especially if your program was merely plotting facial movements as data - as one side of the face is more or less the same as the other, you can interpolate and mirror one side of the face if that's all you have. So you video tape from a concealed location, and play back that tape, with enhancements, etc, for your "thief" program. Plus, if you really needed a range of facial expressions, you simply set up the person you are "stealing" from:
A very gorgeous woman walks next to a guy (the mark) and screeches suddenly - he cringes at first, then his face turns to a smile as the woman begins to rapidly utter things like "Oh my God! It's YOU! Do you remember me? It's been so long!" He utters a few words of "maybe" and "I'm not sure" until some big brute comes out and sees "his woman" hanging all over the guy, and roars this I'm-gonna-kill-you roar, at which point the mark cowers in fear for his life, and the brute and the girl just go away...
All of which is recorded on-the-sly. The actors involved don't even need to know why they are doing what they are doing, as long as they do it well - you've got your data, and you secret away back to your nefarious face-stealing lab, to sample this guy's facial expressions, and create an image of him doing whatever... And that's only if he's NOT famous. Most famous people these days are all OVER video, thus more sampling material for you.
When I first read this I thought he was joking. As I read further, I realized he was dead serious. Does anyone else find this highly ridiculous? I'm not suggesting that the concept of people having souls is ridiculous; I just think the idea of the presence or absence of one giving away a computer rendering is absurd.
For anyone who feels the same way as the Wired author, I propose the following hypothetical question: If some rendering was constructed (that is, produced algorithmically with the help of an artist) that was a truly perfect copy of a view of an actual person (i.e., every photon given off by either was matched), would a viewer be able to visually distinguish the two?
If someone answers "Yes", then this becomes a matter of belief in supernatural powers and will not benefit from further discussion.
If someone answers "No, but any rendering that could actually be created would be distinguishable from the human", I would give the following argument.
First of all, I don't think the rendering of the actual surfaces involved is a point of contention. If believable "bellies or thighs" can be done, then we can adequately render the surfaces of the face as well. The issue is positioning those surfaces to create a convincingly human expression. What if the artists were to take photographs of the actual person and use points of reference on the person's face as control points to position the artificial model? (Of course, they already do this.) As more control points are used, the model will become increasingly like the original. The Wired author essentially addresses this very point with his analogy of approximating a circle using many-sided polygons:
This concept falls apart when you consider the content of the final phrase (in parentheses (heh-self describing)). While a face can be considered continuous, human vision is just as discrete as computer graphics. We have a finite number of rods and cones in our retina. The number of possible responses of those rods and cones to different intensities of light may be harder to quantify, but it is certainly true that given two light sources of increasingly similar brightness there exists a point at which they will be humanly indistinguishable. A rendering does not have to be actually perfect to be perfect as far as human vision is concerned.
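To put a rough number on the polygon analogy, here is a minimal sketch (mine, not anything from the Wired piece). It assumes a regular polygon inscribed in a circle and a made-up visibility threshold standing in for the limit of what an eye or a display can resolve; the function names and the half-unit threshold are hypothetical. The largest gap between the polygon's edge and the circle is r * (1 - cos(pi / n)), so you only need finitely many sides before the gap drops under the threshold.

```python
import math


def max_gap(radius, sides):
    """Largest distance between a circle and a regular polygon inscribed in it."""
    # The gap is widest at the midpoint of each edge (the sagitta of the arc).
    return radius * (1.0 - math.cos(math.pi / sides))


def sides_needed(radius, threshold):
    """Smallest number of sides whose worst-case gap falls below the threshold.

    The threshold is an assumed stand-in for the smallest difference a viewer
    could notice; it is not a measured property of human vision.
    """
    sides = 3
    while max_gap(radius, sides) >= threshold:
        sides += 1
    return sides


if __name__ == "__main__":
    # Hypothetical numbers: a circle of radius 100 units, with anything under
    # half a unit treated as invisible.
    n = sides_needed(100.0, 0.5)
    print(n, max_gap(100.0, n))
```

The point is only that the side count is finite for any nonzero threshold. The polygon never becomes a circle, but it stops being distinguishable from one.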
Anyway, my point is that the problem of creating believable computer representations of humans is a matter of engineering. It certainly is a very difficult problem, but I don't think you can reasonably claim it is insurmountable due to a computer's lack of a soul unless your argument is based on something like telepathy.
Do you wonder if there's a soul behind those synthetic faces? There sure is; it's Japanese. The Japanese are the most likely to perfect synthespians. They already got off to a rocky start with Final Fantasy.
All the pieces are in place: their economy is terrible, they take cartoons seriously, and they envy Americans.
A holy grail of Japanese animation is to look and sound exactly like an American live action movie. They could save their economy by replacing Hollywood actors with Tokyo animators. They could make movies their next great export, after cars and electronics. I think Americans won't lead the synthespian wave: We love our actors too much and we have little to gain. The Japanese don't love American actors (economically) and they have everything to gain.
Final Fantasy's failure to profit has scared them, but they're already improving. They're learning how to write and act like Americans from Americans. That's what Square has done with Kingdom Hearts, translated by Disney and starring Haley Joel Osment. And the Metal Gear games, made in Japan and voice acted in the USA, also sell well in the USA.
So I think the Japanese will do it. They need to.
If it looked real, it would probably cease to be funny.
You know, like the man who's not afraid of 3 inch bees.