Digital Mouths, Synthetic Faces at MIT and Lucasfilm
jfengel writes "Two separate articles about generating faces automatically. From the Boston Globe, there is a story about MIT scientists putting words into somebody's mouth by splicing together footage. In the samples, I couldn't tell the difference between the synthetic footage and the same person really saying the same thing. (Though it's a little hard to tell at only 81kbps video). And Wired has a lengthy article about generating purely synthetic faces at Lucasfilm. It discusses some of the difficulties in getting it right."
I spent a year in Iraq looking for WMD and all I found was this lousy sig.
Sounds difficult. I guess it's a bit like Photoshopping video, rather than a still image. Kudos!
I've been waiting for the ability to put together new movies by stars long since dead, possibly stars who weren't even contemporaries. I'm sure it will soon be possible, and this looks like they're heading in the right direction.
The biggest hurdle I can see isn't technological, it'll be legal. Who really owns the rights to use the films made by famous people? It might be interesting to see just which ??AA lays claim to it first.
Lemon curry?
Using Poser, I found natural body movements fairly hard to create. The main difficulty I can see in getting facial expressions correct is simple: they have to be 'real'. Because everyone's face is different, the most accurate way to do faces is to 'sample' a real face. Purely computer generated faces are not hard. The hard things are the TRANSITIONS between the expressions. These are extremely hard. Just ask the Disney artists who did Snow White. Moving from story-board to story-board is the hardest part. Computers have done a lot to help the transition problem. But sampling a real face is the best way to get things accurate so far.
this is great. Maybe the lip-syncing in Britney Spears' videos won't be so obvious.
Do you even lift?
These aren't the 'roids you're looking for.
The difference is that they take a video of a human rather than build the image up from wireframe. Basically, they can take a video of say President Bush and have him say stuff that he didn't say. In FF, it's obviously a cartoon image talking. It's easy to build a cartoon of a human but it is difficult to animate a real person that you can compare videos with.
_______________________________
"I'm not Conceited...I'm just a realist..."
This about sums it up for me....
Although imagining Ted Koppel speaking in Spanish is a riot.
I remember being somewhere in Europe, listening to the BBC for ten minutes on a shortwave radio, desperately trying to understand what the guy was saying through all of the static. It then occurred to me that the announcer was speaking in Spanish in a really thick and proper British accent. Between the static and everything else, the accent was so strong it threw me off.
So I wonder if Koppel would even be understandable.
"It is a greater offense to steal men's labor, than their clothes"
I spent a year in Iraq looking for WMD and all I found was this lousy sig.
Read my lips: Strategerie means no new taxes. P-o-t-a-t-o-e.
I'm the Devil the Windows users warned you about.
Henrik Wann Jensen is developing some of the most usable algorithms for skin and other translucent materials. He gave a talk last month at Cal as a prospective faculty member. It was fairly impressive.
his home page
rendering skin
rendering smoke
replace 'berserkeley' with 'berkeley' to respond via email.
Haven't you all seen the Malibu Stacy Simpsons episode?...
"Hello Smithers, You're Quite Good At Turning Me On."
sig - .
Perhaps this will lead to greater adoption of digital signing?
Not sure whether the President's speech is real or fake? Just see if he signed the authorised transmissions with his PGP key.
I work with 3D design, and can certainly attest to the difficulty in mimicking people. The huge numbers of muscles and tiny details of morphology that make up a human face are a tremendously important part of achieving realism. However, ultimately a surface is needed, as it is, in the end, the light that is reflected back to our eyes. How real the surface looks is a required part of the equation, and some of the new advancements being made in rendering are quite exciting to me. For instance, many older raytracers only handle how light directly reflects off the surface of a texture. But in reality, things like human skin are not opaque, but are slightly translucent. The light passes into the skin, reflects off things like blood vessels, and exits again. Light also behaves in other interesting ways in certain situations. And some effects are simply dependent on computational power. Radiosity, for instance, can make scenes look much more realistic, but is too cycle-hungry to be used all the time in full-screen video. Being able to set these sorts of properties without having to program complex custom render modules for each movie will go a long way towards making artificial people more common.
we'll soon see a video of Dan Rather singing Rocked by Rape?
The shareholder is always right.
I don't think I'm special in this respect, but I didn't find the example clips that were given too hard to discern.
Look for enunciation of certain letters such as P and M, and you should be able to tell the difference. The generated image gives a sense of moving the mouth but not enunciating the words clearly. Almost as if she is gliding over the words. With the real movie, however, you can see the woman completely changing her mouth formation to form the sounds required to pronounce the words.
Another, more benign use of the tech could be in entertainment. There was that episode of Star Trek: Deep Space Nine where they integrated the actors in with footage from the classic ep, Trouble With Tribbles. Great fun, but they were limited to using footage that existed from the original series for interacting with Kirk, Spock et al. Imagine being able to track Shatner's '60s face onto an actor and use this tech to lipsync 21st century Shatner's dialog. Best. Time Travel. Episode. Ever.
And I don't even like Trek that much :-)
banning probably wouldn't work. When ArtificialLipManipulation is outlawed, only outlaws will have ArtificialLipManipulation.
One futuristic countermeasure I can imagine would be for concerned citizens (e.g., politicians, dissidents) to have some type of device that cryptographically signs some aspect of their speech along with a trusted indicator of time. This thing would have to transmit a signal that would be embedded in any recorded media. Thus, verification of the digital signature of the audio and time hash would indicate whether the original recording was fabricated or tampered with. Doesn't really get around this technology (i.e., they only make it look like you're mouthing the audio, they don't deal with the audio), but it would prevent others from splicing or generating fake audio to accompany these phony video clips...
Of course, there's only like, 50 zillion reasons this would be difficult/impossible to implement. But hey, I'm just the idea man...
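For what it's worth, the "sign the audio plus a trusted time" idea above can be sketched in a few lines. This is only an illustration under loud assumptions: it uses an HMAC (a shared-secret tag from Python's standard library) as a stand-in for a real public-key signature, and the key, the sample bytes, and the function names `sign_recording` and `verify_recording` are all made up for the example. A real device would need an asymmetric scheme so that verifiers never hold the signing secret, plus a timestamp source people actually trust.

```python
import hashlib
import hmac
import struct


def sign_recording(audio: bytes, timestamp: int, key: bytes) -> bytes:
    """Return a tag that binds the audio content to a timestamp.

    HMAC-SHA256 is a stand-in here for a real digital signature.
    """
    audio_digest = hashlib.sha256(audio).digest()
    # Append the timestamp as an 8-byte big-endian integer so the tag
    # covers both what was said and when it was said.
    message = audio_digest + struct.pack(">Q", timestamp)
    return hmac.new(key, message, hashlib.sha256).digest()


def verify_recording(audio: bytes, timestamp: int, key: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    expected = sign_recording(audio, timestamp, key)
    return hmac.compare_digest(expected, tag)


if __name__ == "__main__":
    key = b"hypothetical-device-secret"          # assumption: illustrative key
    audio = b"raw audio samples would go here"   # assumption: placeholder data
    tag = sign_recording(audio, 1_000_000_000, key)
    print(verify_recording(audio, 1_000_000_000, key, tag))
    print(verify_recording(audio + b"spliced", 1_000_000_000, key, tag))
```

Any change to the audio bytes or to the claimed time produces a different tag, so spliced or re-dated audio fails verification. As noted above, this says nothing about the video itself.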
Of course, if they could extend this work beyond the lips and face, imagine what the porn industry could do...
"History doesn't repeat itself, but it does rhyme." Mark Twain
Speaking of Trek, there was a fairly recent episode of Enterprise where some odd aliens used spliced images of the captain to send a threatening message, because, I guess, they could not speak human themselves. (I don't even think they had mouths.)
It was kind of a freaky effect.
Table-ized A.I.
Should photocopiers be banned because they make it easier to forge written documents? No, and neither should this. People are just going to have to get used to the fact that forging video is possible.
I don't care if it's 90,000 hectares. That lake was not my doing.
Hah... these MIT gurus think they have originality, huh? Well, I'll have you know that the guys behind South Park mastered the skill of matching voices to moving mouths long ago.
Damn Canadians and their flapping heads... and Saddam Hussein, too!
A new wave of those Elian Gonzalez doing WASSSSUPPPP videos, oh joy!
Remember this?
Moderation Totals: Flamebait=2, Troll=1, Redundant=1, Insightful=6, Overrated=1, Underrated=1, Total=12. (not mine)
the year is 2095. the reviewer speaks:
...[and they] show a wide range of competence. Some scenes, such as //this//, are nothing short of brilliant. However, I can't agree with those who believe that a large quantity of sublime art was lost. OSc was in its infancy, and the original consensualists tended to be technical personnel with vivid but unsophisticated imaginations. I have seen all 18 remaining snaps of OS-LOTR, and am convinced that nothing of value was lost to the Tolkienist or to the viewing public.
//graph// of the isologs: precipitous in the higher dimensions, almost flat in D1 through D5. Midlands is universally available and is the vehicle through which most young people first meet Tolkien. It is still maintained, although the classic version stabilized in 2072.
....
Let me begin by once again repeating the truism: no video whatsoever can match the scenes as they appear to your imagination during a simple, unaided reading of the three volumes of Tolkien's original text.
With that out of the way, I will say that my own favorite among the video versions is the recent blockbuster edition, followed by the "Midlands" OSc 2072 dist (tuned 2,-1,4,0); and after that, the 2001-2003 movies using the Gibson/Taylor overlay. This review concentrates on videos; I will leave VRs for another day.
There is no need, at this remove, to cite the failings of the Bakshi anime (1978) or Jackson's groundbreaking 2001-2003 live action movie.... However, when WWM re-released the "long" version on tab with a selection of overlays, including Mercer/Tran/Lopez and Gibson/Taylor, the movie was transformed from a mere classic to a paradigm of style. Its effect on a generation resembled the effect of the original books on the "Sixties Era" (roughly 1964-1972). The wildly popular M/T/L overlay, its unearthly beauty toning down the somewhat brutal original video, went straight to the heart of the virals.
At the same time, the first underground OSc version, "OS-LOTR", was in process. Remember that this was before the Hurst case and copyright law was still in the postmillennial phase. Nevertheless, thousands of people participated. By any standard, the first version was pretty primitive. The base disappeared during Hurst. Only 18 snaps survive;
The first legal OSc version ("OurRing") is also available at universities, but is not worth the casual viewer's time. The maintainers provided no guidance. Story elements of an unsavory nature, having nothing to do with the original books, found their way into the base. Tuning was in its infancy: OurRing provides only five settings in each of three dimensions. The project became overlarge, and never gained popularity outside a hobbyist community. It is of historical interest only, as is the short-lived "Bakshi", based on the anime, begun and closed within a year after OurRing.
"Midlands", on the other hand, became a classic within weeks of startup. It derives most of its visual imagery and pacing from the centennial remake, but retains none of the bizarrer elements. A comparison of snaps is extremely revealing. The earliest still archived (two days in) is almost an exact copy of LOTR-100. In one week more, participation skyrocketed by 6000 percent, and the nine-day snap contains none at all of the odd politico-academic coloration. Note the gradients in this
Midlands is far more tunable than OurRing. The original tuner, which is part of the OSc v. 5.4 kernel, allowed for 15 dimensions. Addicts and purists apply the 500-dimension Gordon tuner. I have viewed several allegedly "perfectly" Gordon-tuned versions and could see no difference at all. These decimal-place variations invisible to anyone else fuel quite vitriolic disputes in the hobbyist community.
"Zealand" and "Hildebrandt", Midlands' two nearest competitors, have a much smaller following. Zealand is of course based on the 2003 video. Hildebrandt is experimental; it combines OSc and overlay technologies. There is no dist--as the maintainer states in true twentieth-century fashion, it is intended to be a "work in progress", to be "as dynamic as the events it portrays". This can lead to surprises if you view over a period of days instead of capturing the whole thing at once. Its consos also tend to be outside the standard demo.
Last year's remake is, in my opinion, the best of all. Yes, it condenses the story, but this is not a bad thing, as anyone will agree who has played one of the realtime VRs. Stern's directorial imagination could not possibly be closer to Tolkien's original vision. There is, of course, no truth to the rumor that he is a clone of Tolkien made for the purpose.
Read the EFF's Fair Use FAQ
It's easy to build a cartoon of a human but it is difficult to animate a real person that you can compare videos with.
Huh?! I work as a Sr. VFX guy, and CGI (Computer Generated Imaging) for facial animation is one of the most complex things to do!
Basically, there are so many muscles in the face and so many nuances that it is very difficult to emulate a realistic face. Chris Landreth is a director at Alias|Wavefront with whom I had the "pleasure" of working. His entire focus has mainly been on facial animation. And even with his talent, facial animation still doesn't look 100% realistic.
Check out the book: Computer Facial Animation to get a glimpse at the mathematics, anatomy, and other technical hurdles being overcome in this arena.
Vital Idea
After all, with sufficient CPU power, anybody could make anybody talk about anything!
This will also mean that the court system will then ask for eyewitnesses since videos will not be admissible.
I'm not sure whether this is good or bad.
The statement below is true.
The statement above is false.
I can really really see this being used in war. Yes I know, hand it to us humans to take something like this and make it a weapon for war, but anyways...
Imagine Osama broadcasting on Afghan television to his troops to surrender to the nearest US platoon. I'm probably overestimating the stupidity of your average Afghan al Qaeda member, but chances are you might get a good number of them to actually buy it and surrender.
Going even further, we could fake Osama's capture, have him broadcast to the country that America is a nice place and to quit being player haters. Yeah, I know this all sounds far fetched, but I'm sure the military would already be looking into this.
I mean, this is pretty cool and all, but there's no reason to start worrying if someone's gonna put words in your mouth anytime soon. First they'd need:
1. a few minutes of footage of you saying stuff that has the full range of mouth movements directly into a camera.
2. an audio recording of you actually saying what it is that they want you to say. It's possible to cut and splice separate recordings together, but 99% of the time, differences in the sound space would make it obvious that the recording was spliced together.
And then after that, all they'd have is a video of you saying the thing and staring like a zombie into the camera.
It's cool in theory, but I think Hollywood has done a much better job at achieving convincing results.
Mmm, Gummi Venus De Milo...
c-hack.com |
That's my point. It is easy to make a cartoon look human-like (such as in FF). But you still know it's an animation. That's what I meant when I said that it is difficult to animate a real person (ie non-cartoon character). However, what they did was different. They took the original video and modified it.
_______________________________
"I'm not Conceited...I'm just a realist..."
I actually got it right (before looking at the answers, even). :)
What do the synthetic pictures have in common? Well, in both cases the woman moves her lips a bit less (the second) or shows slightly fewer facial expressions (the first one).
With this movie at low-quality postage-stamp size, I have my doubts regarding a full-size TV newsreader. But I guess the technology is still in prototype stages and in a few years, we'll likely have synthetic newsreaders indin.. indisti... indistinguishable from the real thing. But still probably far away from the same synthetic person actually performing some action more than talking.
Beware: In C++, your friends can see your privates!
When I first read this I thought he was joking. As I read further, I realized he was dead serious. Does anyone else find this highly ridiculous? I'm not suggesting that the concept of people having souls is ridiculous; I just think the idea of the presence or absence of one giving away a computer rendering is absurd.
For anyone who feels the same way as the Wired author, I propose the following hypothetical question: If some rendering was constructed (that is, produced algorithmically with the help of an artist) that was a truly perfect copy of a view of an actual person (i.e., every photon given off by either was matched), would a viewer be able to visually distinguish the two?
If someone answers "Yes", then this becomes a matter of belief in supernatural powers and will not benefit from further discussion.
If someone answers "No, but any rendering that could actually be created would be distinguishable from the human", I would give the following argument.
First of all, I don't think the rendering of the actual surfaces involved is a point of contention. If believable "bellies or thighs" can be done, then we can adequately render the surfaces of the face as well. The issue is positioning those surfaces to create a convincingly human expression. What if the artists were to take photographs of the actual person and use points of reference on the person's face as control points to position the artificial model? (Of course, they already do this.) As more control points are used the model will become increasingly like the original. The Wired author essentially addresses this very point with his analogy of approximating a circle using many-sided polygons:
This concept falls apart when you consider the content of the final phrase (in parentheses (heh-self describing)). While a face can be considered continuous, human vision is just as discrete as computer graphics. We have a finite number of rods and cones in our retina. The number of possible responses of those rods and cones to different intensities of light may be harder to quantify, but it is certainly true that given two light sources of increasingly similar brightness there exists a point at which they will be humanly indistinguishable. A rendering does not have to be actually perfect to be perfect as far as human vision is concerned.
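To put a rough number on the polygon analogy: for a regular n-sided polygon inscribed in a circle of radius r, the largest gap between an edge and the circle is r(1 - cos(pi/n)). The sketch below finds how many sides are needed before that gap drops under some threshold of visibility. The radius and threshold in the usage lines are purely hypothetical values chosen for illustration, not measurements of human vision, and the function names are my own.

```python
import math


def max_gap(radius: float, sides: int) -> float:
    """Largest distance between the circle and an inscribed regular polygon.

    The gap is widest at the midpoint of each edge, where the polygon
    sits at distance radius * cos(pi / sides) from the center.
    """
    return radius * (1.0 - math.cos(math.pi / sides))


def sides_needed(radius: float, threshold: float) -> int:
    """Smallest number of sides whose maximum gap is below the threshold."""
    sides = 3  # a polygon needs at least three sides
    while max_gap(radius, sides) >= threshold:
        sides += 1
    return sides


if __name__ == "__main__":
    # Hypothetical numbers: a circle 100 units in radius, and a viewer
    # who cannot notice deviations smaller than half a unit.
    print(sides_needed(100.0, 0.5))
    # A viewer ten times sharper needs more sides, but still finitely many.
    print(sides_needed(100.0, 0.05))
```

The count grows as the threshold shrinks, but for any fixed threshold above zero it stays finite, which is the whole point: the approximation only has to beat the observer, not reach the limit.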
Anyway, my point is that the problem of creating believable computer representations of humans is a matter of engineering. It certainly is a very difficult problem, but I don't think you can reasonably claim it is insurmountable due to a computer's lack of a soul unless your argument is based on something like telepathy.
Wow, did you write that? That was an impressive bit of sci-fi prose. I found it fascinating and believable. I think the 'review' seemed very real to me largely because of the unfamiliar jargon and details interspersed in it. Reminds me of the entire slang language Anthony Burgess made up for A Clockwork Orange.
Do you wonder if there's a soul behind those synthetic faces? There sure is; it's Japanese. The Japanese are the most likely to perfect synthespians. They already got off to a rocky start with Final Fantasy.
All the pieces are in place: their economy is terrible, they take cartoons seriously, and they envy Americans.
A holy grail of Japanese animation is to look and sound exactly like an American live action movie. They could save their economy by replacing Hollywood actors with Tokyo animators. They could make movies their next great export, after cars and electronics. I think Americans won't lead the synthespian wave: We love our actors too much and we have little to gain. The Japanese don't love American actors (economically) and they have everything to gain.
Final Fantasy's failure to profit has scared them, but they're already improving. They're learning how to write and act like Americans from Americans. That's what Square has done with Kingdom Hearts, translated by Disney and starring Haley Joel Osment. And the Metal Gear games, made in Japan and voice acted in USA, also sell well in USA.
So I think the Japanese will do it. They need to.
Seem almost prescient considering what happened in Florida in 2000 :-)
Sheesh, when will you democrats stop whining? It's like losing on penalties. If you can't score one more goal in over two hours of football, then you really can't complain about losing on penalties (even if it was a duff decision). BIG :-)
"The first thing to do when you find yourself in a hole is stop digging."
You're assuming I'm an American - hell, I'm not even in the northern hemisphere... :-)
And considering the state of Australia's government, I really shouldn't be making fun of yours :-)
Just think of the film possibilities in the future!
:) We could have Brad Pitt as the main bad guy (we all know he's crazy), and Sean Connery as the local sheriff... oh, and then pick any half dozen supermodels/really effing hot chicks for the town whores/barmaids.
:)
When we consider Final Fantasy: The Movie, and contrast it to what should be viable within just 5 years from now, it boggles the mind.
I, for one, would love to see a digital-quality old western film - but with both the Duke and Eastwood, not just one. Oh, and while we're at it, why not have Arnold Swartsenager (spelled wrong, I'm sure) be a henchman. And hell, throw "Han Solo" (Harrison Ford) in there as a local traveling trader, but in some western chaps.
That'd be a really fun movie to watch.
~/ssh slashdot.org ssh: connect to host slashdot.org port 22: too many beers
If it looked real, it would probably cease to be funny.
You know, like the man who's not afraid of 3 inch bees.
Great - so what we're saying now is that Bush's malapropisms and obvious discomfort on camera are sure signs that he's actually a human? Vote for me, I sweat and stammer on camera!
I thought the synthetic woman's delivery was very Gorelike, i.e. too wooden and perfect to be human.
-Styopa
The Digital Animations Group (http://www.digital-animations.com/) have been doing computer generated characters very well for a couple of years. They are responsible for Ananova, the Talking Head and their latest creation the singing and dancing virtual pop star Tmmy (http://www.tmmy.co.uk), which BTW I submitted to slashdot but it was refused.
Thanks! Yeah, I wrote it after I saw the movie a few months ago and wished I could make a few slight changes.
For Martyn S., here's the key--
- Overlays: Computer-generated actors, or sets of actors, replacing the originals.
- Tuners: Some kind of technology that allows you to set the amount of romance, scenery, violence, history, magic, humor, or other features (up to 500 with the Gordon tuner software) to your personal preference. Sort of like adjusting brightness/contrast/colors in an image file, on a conceptual level.
- OSc is "open source creativity." It means that a lot of people modify the "base" video, under control of maintainers. These people are called consensualists or consos.
- Snaps = snapshots of what the video looks like at one point in time, because with OSc it's changing all the time.
- Virals = nickname for a generation, like "flappers" or "hippies" is to us.
Check the reply to the next message below.
People will just have to cease believing everything they watch on TV. Just as we've all learned to not believe everything we read. (yeah right)
These are my friends, See how they glisten. See this one shine, how he smiles in the light.