Digital Mouths, Synthetic Faces at MIT and Lucasfilm
jfengel writes "Two separate articles about generating faces automatically. From the Boston Globe, there is a story about MIT scientists putting words into somebody's mouth by splicing together footage. In the samples, I couldn't tell the difference between the synthetic footage and the same person really saying the same thing (though it's a little hard to tell at only 81kbps video). And Wired has a lengthy article about generating purely synthetic faces at Lucasfilm. It discusses some of the difficulties in getting it right."
I spent a year in Iraq looking for WMD and all I found was this lousy sig.
this is great. Maybe the lip-syncing in Britney Spears' videos won't be so obvious.
Do you even lift?
These aren't the 'roids you're looking for.
Read my lips: Strategerie means no new taxes. P-o-t-a-t-o-e.
I'm the Devil the Windows users warned you about.
Henrik Wann Jensen is developing some of the most usable algorithms for skin and other translucent materials. He gave a talk last month at Cal as a prospective faculty member. It was fairly impressive.
his home page
rendering skin
rendering smoke
replace 'berserkeley' with 'berkeley' to respond via email.
I work with 3D design, and can certainly attest to the difficulty in mimicking people. The huge numbers of muscles and tiny details of morphology that make up a human face are a tremendously important part of achieving realism. However, ultimately a surface is needed, as it is, in the end, the light that is reflected back to our eyes. How real the surface looks is a required part of the equation, and some of the new advancements being made in rendering are quite exciting to me. For instance, many older raytracers only handle how light directly reflects off the surface of a texture. But in reality, things like human skin are not opaque, but are slightly translucent. The light passes into the skin, reflects off things like blood vessels, and exits again. Light also behaves in other interesting ways in certain situations. And some effects are simply dependent on computational power. Radiosity, for instance, can make scenes look much more realistic, but is too cycle-hungry to be used all the time in full-screen video. Being able to set these sorts of properties without having to program complex custom render modules for each movie will go a long way towards making artificial people more common.
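To make the opaque-versus-translucent point concrete, here is a rough sketch in Python. It compares a plain Lambert diffuse term (light counts only where the surface faces the light) with "wrap lighting", a cheap trick sometimes used to fake light bleeding through skin. This is not Jensen's algorithm or anything a production renderer necessarily uses; the function names and the `wrap` parameter are made up for illustration.

```python
import math


def lambert(cos_theta: float) -> float:
    """Opaque diffuse term: no light once the surface turns away from the lamp."""
    return max(0.0, cos_theta)


def wrapped_diffuse(cos_theta: float, wrap: float = 0.5) -> float:
    """Wrap lighting: lets light 'wrap' past the terminator.

    A crude stand-in for subsurface scattering (an assumption for this
    sketch, not a physically based model). wrap=0 reduces to Lambert.
    """
    return max(0.0, (cos_theta + wrap) / (1.0 + wrap))


if __name__ == "__main__":
    # Angle between the surface normal and the light direction, in degrees.
    for angle_deg in (0, 60, 90, 110):
        c = math.cos(math.radians(angle_deg))
        print(angle_deg, round(lambert(c), 3), round(wrapped_diffuse(c), 3))
```

With the opaque term, anything at or past 90 degrees is black. With the wrapped term the falloff continues a bit past the terminator, which is roughly the softness you see on an ear or a nostril lit from the side.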
we'll soon see a video of Dan Rather singing Rocked by Rape?
The shareholder is always right.
I don't think I'm special in this respect, but I didn't find the example clips that were given too hard to discern.
Look for enunciation of certain letters such as P and M, and you should be able to tell the difference. The generated image gives a sense of moving the mouth but not enunciating the words clearly. Almost as if she is gliding over the words. With the real movie, however, you can see the woman completely changing her mouth formation to form the sounds required to pronounce the words.
Another, more benign use of the tech could be in entertainment. There was that episode of Star Trek: Deep Space Nine where they integrated the actors in with footage from the classic ep, Trouble With Tribbles. Great fun, but they were limited to using footage that existed from the original series for interacting with Kirk, Spock et al. Imagine being able to track Shatner's '60s face onto an actor and use this tech to lipsync 21st century Shatner's dialog. Best. Time Travel. Episode. Ever.
And I don't even like Trek that much :-)
the year is 2095. the reviewer speaks:
...[and they] show a wide range of competence. Some scenes, such as //this//, are nothing short of brilliant. However, I can't agree with those who believe that a large quantity of sublime art was lost. OSc was in its infancy, and the original consensualists tended to be technical personnel with vivid but unsophisticated imaginations. I have seen all 18 remaining snaps of OS-LOTR, and am convinced that nothing of value was lost to the Tolkienist or to the viewing public.
//graph// of the isologs: precipitous in the higher dimensions, almost flat in D1 through D5. Midlands is universally available and is the vehicle through which most young people first meet Tolkien. It is still maintained, although the classic version stabilized in 2072.
....
Let me begin by once again repeating the truism: no video whatsoever can match the scenes as they appear to your imagination during a simple, unaided reading of the three volumes of Tolkien's original text.
With that out of the way, I will say that my own favorite among the video versions is the recent blockbuster edition, followed by the "Midlands" OSc 2072 dist (tuned 2,-1,4,0); and after that, the 2001-2003 movies using the Gibson/Taylor overlay. This review concentrates on videos; I will leave VRs for another day.
There is no need, at this remove, to cite the failings of the Bakshi anime (1978) or Jackson's groundbreaking 2001-2003 live action movie.... However, when WWM re-released the "long" version on tab with a selection of overlays, including Mercer/Tran/Lopez and Gibson/Taylor, the movie was transformed from a mere classic to a paradigm of style. Its effect on a generation resembled the effect of the original books on the "Sixties Era" (roughly 1964-1972). The wildly popular M/T/L overlay, its unearthly beauty toning down the somewhat brutal original video, went straight to the heart of the virals.
At the same time, the first underground OSc version, "OS-LOTR", was in process. Remember that this was before the Hurst case and copyright law was still in the postmillennial phase. Nevertheless, thousands of people participated. By any standard, the first version was pretty primitive. The base disappeared during Hurst. Only 18 snaps survive;
The first legal OSc version ("OurRing") is also available at universities, but is not worth the casual viewer's time. The maintainers provided no guidance. Story elements of an unsavory nature, having nothing to do with the original books, found their way into the base. Tuning was in its infancy: OurRing provides only five settings in each of three dimensions. The project became overlarge, and never gained popularity outside a hobbyist community. It is of historical interest only, as is the short-lived "Bakshi", based on the anime, begun and closed within a year after OurRing.
"Midlands", on the other hand, became a classic within weeks of startup. It derives most of its visual imagery and pacing from the centennial remake, but retains none of the bizarrer elements. A comparison of snaps is extremely revealing. The earliest still archived (two days in) is almost an exact copy of LOTR-100. In one week more, participation skyrocketed by 6000 percent, and the nine-day snap contains none at all of the odd politico-academic coloration. Note the gradients in this
Midlands is far more tunable than OurRing. The original tuner, which is part of the OSc v. 5.4 kernel, allowed for 15 dimensions. Addicts and purists apply the 500-dimension Gordon tuner. I have viewed several allegedly "perfectly" Gordon-tuned versions and could see no difference at all. These decimal-place variations, invisible to anyone else, fuel quite vitriolic disputes in the hobbyist community.
"Zealand" and "Hildebrandt", Midlands' two nearest competitors, have a much smaller following. Zealand is of course based on the 2003 video. Hildebrandt is experimental; it combines OSc and overlay technologies. There is no dist--as the maintainer states in true twentieth-century fashion, it is intended to be a "work in progress", to be "as dynamic as the events it portrays". This can lead to surprises if you view over a period of days instead of capturing the whole thing at once. Its consos also tend to be outside the standard demo.
Last year's remake is, in my opinion, the best of all. Yes, it condenses the story, but this is not a bad thing, as anyone will agree who has played one of the realtime VRs. Stern's directorial imagination could not possibly be closer to Tolkien's original vision. There is, of course, no truth to the rumor that he is a clone of Tolkien made for the purpose.
It's easy to build a cartoon of a human but it is difficult to animate a real person that you can compare videos with.
Huh?! I work as a Sr. VFX guy, and CGI (Computer-Generated Imagery) for facial animation is one of the most complex things to do!
Basically, there are so many muscles in the face and so many nuances that it is very difficult to emulate a realistic face. Chris Landreth is a director at Alias|wavefront with whom I had the "pleasure" of working. His entire focus has mainly been on facial animation. And even with his talent, facial animation still doesn't look 100% realistic.
Check out the book: Computer Facial Animation to get a glimpse at the mathematics, anatomy, and other technical hurdles being overcome in this arena.
Vital Idea
I mean, this is pretty cool and all, but there's no reason to start worrying if someone's gonna put words in your mouth anytime soon. First they'd need:
1. a few minutes of footage of you saying stuff that has the full range of mouth movements directly into a camera.
2. an audio recording of you actually saying what it is that they want you to say. It's possible to cut and splice separate recordings together, but 99% of the time, differences in the sound space would make it obvious that the recording was spliced together.
And then after that, all they'd have is a video of you saying the thing and staring like a zombie into the camera.
It's cool in theory, but I think Hollywood has already achieved much better results.
Mmm, Gummi Venus De Milo...
c-hack.com