Leaping the Uncanny Valley
reachums submits this glance at "the newest level of computer animation," intended to get past the paradoxical "uncanny valley" — that is, the way animated humans actually can appear jarring as the animation gets hyper-realistic. "This short video gives us a glimpse of what we can hope to see in the future of computer games and movies. Emily is not a real actress, but she looks like a real person, something we haven't truly seen before in computer animation."
There was much talk about the uncanny valley when Final Fantasy: The Spirits Within came out after Square had promised for years that it would have realistic humans. A common criticism was that the human characters were real enough to inspire comfort just long enough that one would then be shaken by their lack of a certain flexibility and the bloodlessness of their faces. Dr Aki was more creepy than sexy.
From what I understood, this is simply an easier kind of motion capture that works straight from video without the need for sensors etc. That's not the same as creative animation; you still need a real person talking and moving.
My appreciation of Douglas Adams is far deeper than yours.
Not quite 100%, though. It still has the same problem as almost all previous attempts - the eyeblinks don't look right.
I don't know quite what it is - too slow? The eyelids always meet in the same place? - but it's the one thing that screams "fake" to me.
Sammy Davis Jr?
...many flesh-and-blood actors I've seen.
In a discussion elsewhere, someone stated that the facial animation was good, but the body movement was unrealistic. Since the body movement was actually a live actor, I'd say that this was analogous to a passed Turing test -- an observer couldn't tell which parts were animated and which parts were human. (It's a weak analogy, of course, since there was no interaction.)
Just as synthesizers were the end of "real" musicians, photography was the end of "real" paintings, etc.
Slashdot Burying Stories About Slashdot Media Owned
First off, they failed at getting past the "uncanny valley". That video is still creepy looking.
Second, this isn't computer animation. It's just video processing. If you still need to do high resolution motion capture to produce your images, you haven't replaced the actor. You've merely edited their appearance in the performance. They didn't even bother to go so far as to take the captured motion and paste key bits of it together into the speech. They just had her sit there and say the whole thing, then "rendered" it.
Lame.
Motion capture a face and rerender it from the same viewpoint as a camera used to capture the texture and you'll trivially get something almost indistinguishable from the original. It's only a valid test if you change something significant: move the camera, change the lighting, change the facial features or change the performance.
-- SIGFPE
I am amazed at the quality of this animation. Still, I could see there was -something- wrong with her, but could not put my finger on it. (This was of course also influenced by the fact that I -knew- she was fake before watching the vid.)
Btw, here's a direct link to the video: http://www.youtube.com/watch?v=bLiX5d3rC6o
Be sure to tick 'Watch in high quality' when the video opens (does anyone know a way to do that automatically in a link?)
When you shoot a mime, do you use a silencer?
So gluing a weird uncanny mask on an actor's face will be the future of animation?
Considering the quality of acting these days by Hollywood, anything that obstructs their faces would be an improvement.
"I am the king of the Romans, and am superior to rules of grammar!"
-Sigismund, Holy Roman Emperor (1368-1437)
I'd say it's past the uncanny valley. That's not to say that I can't tell it's fake. She looks a little fake. Something is wrong-- her face is too still or something. But she doesn't look like a zombie. She's not distractingly creepy. That's all they're really shooting for at the moment, right?
I wonder if certain faces work better with this technology than others. Perhaps younger, smoother faces (like "Emily") work better than old, wrinkly faces, since they can get an accurate representation of skin texture without as much complexity.
Do stuffed animals instantly create a sense of revulsion? Not really, else they wouldn't have been around for so long, yet this is the ultimate uncanny valley item. As close to the living thing as you can get, fully posed as if it is alive, yet a rotting corpse nonetheless.
If you ever dealt with real corpses you would know that they really aren't all that disgusting. It is so easy to get used to them that you might be tempted to think that the so-called natural revulsion is just a media-installed reaction.
If the uncanny valley really exists, then please explain realistic paintings, which have been around for ages. Artists have tried for hundreds of years to create realistic images of human beings, and we admire their efforts without any sense of revulsion. Same with statues. Do we feel uneasy at Madame Tussauds?
Yes, we do NOTICE it when a seemingly realistic thing behaves unrealistically, but I have the same sense when I see a car in a computer cut scene that doesn't obey the laws of physics and, for instance, slides.
It has nothing to do with the uncanny valley, if a real human being was holding a glass of water that didn't spill when tipped over you would get the same feeling.
We know how things work, and when they don't we get upset. The trick that cartoons and such pull is that they say right up front by their looks that they are not real and therefore things don't have to work as we expect.
That was the problem with Final Fantasy: it tried to be a human drama and then didn't use human emotions on the faces of the actors. If it had been a pure action flick with no close-ups there wouldn't have been a problem. It wasn't the uncanny valley, it was just bad acting; if it had been done by humans who could act we would have felt the same.
MMO Quests are like orgasms:
You may solo them, I prefer them in a group.
Uncanny valley in a nutshell: Is it a "Good Robot" or a "Bad Human"?
But, there is an assumption about what is acceptable... what is the norm? At the moment, we're in a rapid transition phase. There are relatively few human-enough-like examples within our day-to-day existence. I would suggest that as these emulants (to coin a term) become more prevalent and pervasive, their familiarity will reduce the perception of their being bad.
We've come a long ways in the 35+ years since I used an ASR-33 Teletype over a 110-baud modem to a time-shared 8KB minicomputer. That sounds like a long time, and in some respects, it is. Today's generation has seen rapid advancements in game consoles, and even now, the best still appear really good, but still unreal. My guess is that in 5-10-20 years, when the visuals become even better, AND THERE HAS BEEN AN EXTENDED PERIOD OF FAMILIARITY, there will be less of a gap to leap. Not just because the visuals got better, but because we have become more familiar with them.
An aside: Look into the eyes of a young baby. Watch how they make eye contact, and don't let go. Watch how intently they examine you. That's setting up neurons and patterns of what is safe, good, bad, and everything else.
P.S. I wonder if the transition from the old black and white TVs to today's HDTV sets has run through a similar perception challenge?
They have another demo on their Front Page
And while it's extremely impressive, sadly it's definitely in the valley for me.
Yay ! Wonderful low-bandwidth youtube streaming video in all its glorious crap-quality !
The best way to show technical demos about photo-realism !
I can't wait to see the thumbnail sized 60%-quality jpeg screen caps, too !
I feel as well informed about the quality as when watching all those wonderful ads for hi-def screens on the TV.
---
Come on, Image Metrics, can't you just post a decent hi-quality video file, so we can actually see what your technology looks like ?
"Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
Heck, she doesn't even look as real as Celine Dion, let alone a real person.
Help! I'm a slashdot refugee.
I wish I'd somehow had a chance to view this before knowing that it was a computer animation... say, a side-by-side comparison of a real and an animated person and a challenge to guess which was animated.
To me, "Emily" did not look real and did look uncanny. Actually, it reminded me of nothing so much as one of those videos where they replace a baby's mouth with animation so that it appears to be talking like an adult. It seemed to me that the animation's "mouth" was not stably positioned on its "face;" when the head turned, I perceived a change in the position of the mouth relative to the face. Something about the skin didn't look right, either.
Would I have accepted it as real if I were expecting "real?" Yes. But that's not the same thing.
Some years back I took part in an experiment to gauge something about necessary bit rates and algorithms to make synthesized speech sound real. What struck me forcibly was that, in this experiment, when you were listening to the best synthesized speech, if I'd had no standard of comparison I'd have said it was real. But when they switched to a real voice saying the same thing, there was the most amazing sensation, almost a tactile sensation of sound shaped by warmth and moisture. Only after you heard the real thing did the synthesized speech seem cold and mechanical.
"How to Do Nothing," kids activities, back in print!
For a look at the "uncanny valley" in the other direction, I recall someone posted this link to something about "anime eyes" contact lenses (http://inventorspot.com/articles/girls_get_anime_look_with_extrawide_contact_lenses_16872) in a story a couple of days ago and it certainly freaked a number of people out.
The World Wide Web is dying. Soon, we shall have only the Internet.
Well, a better question is if the uncanny valley really exists. Or rather, if it's really as simple as that valley, or we're actually looking at a more complex and multi-dimensional phenomenon.
And I'll attempt to build a framework to falsify it. It's a bit roundabout and I'll start by explaining the what and why of that framework, before all else. Bear with me, please.
First of all, before someone jumps in with the ever popular, "OMG, you're not worthy to question the high priests!" (err... "scientists"), the uncanny valley is just a hypothesis. A very compelling and well argued one, no doubt, but hardly a proven fact.
Second, before I get into the meat of the argument, the points chosen to represent it are highly debatable. E.g., is a zombie scary because of being close enough to the real thing to fall in the "uncanny valley", or because of the whole cultural meaning of death, undeath, corpses, etc?
When you look at each point individually, you can handwave and argue it to be wherever you want it, to support your hypothesis. It's called the Texas sharpshooter fallacy, after the fable of the sharpshooter who shot first and then painted a bullseye around the hole. You can "prove" anything in (pseudo-)science if you can do just that to the data: take fuzzy and ill-defined points and argue where they belong on your curve.
The "uncanny valley" paper does just that. We don't know the exact X coordinate on that graph for a zombie or a robot. It could be way right or way left, or whatever. So what really follows is that Mori decided a priori where they belonged on that curve, and then placed them at a point based on that. It's a textbook application of the Texas sharpshooter fallacy.
So what I'm going to do is an ad absurdum reduction of his curve.
I don't know the exact coordinates of any of my examples either, but, here's the important part: I don't need to pretend to. I'll just peg them between two other values, which, assuming the curve is correct, both fall in the valley or outside it, or some other position. Based on the reaction they caused, and, again, assuming that the curve were correct.
And due to the shape of the curve, if two points are in the valley, then everything between them is in the valley too. If two points are, say, both to the left of the valley, then a point between them should be on the left of the valley too. That is the important part.
So, let's build a counter-example: the FF movie was called a clear example of the Uncanny Valley. It's in the valley. Sony's Everquest 2 (particularly with the unnatural ambient bloom enabled) caused a similar reaction, and many euphemisms were used to describe just that: that that world looked disturbingly unnatural, especially if you pushed the graphics settings high enough. Classic example of entering the uncanny valley from the left, eh? So it's point 2 in that valley.
A point between them should, obviously, also be in the valley. That curve only has one dip, right?
Well, point #3 could be Oblivion. The graphics are better and more detailed than Sony's graphics in EQ, but don't even come close to the insane polygon counts and animations of the FF movies. It's between the two points. It should also be in the valley. It isn't. Nobody was repulsed by Oblivion's graphics. Or pick Crysis, or whatever newer high-end game, and you get the same curious behaviour. It ought to be in the valley, but it isn't.
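To make the "between two valley points" argument concrete, here is a minimal Python sketch. It assumes only an ordering of the examples by realism and a yes/no "repulsed" reaction for each, both taken from the claims in this comment rather than from any measurement. The function name and the data layout are made up for illustration. If the curve has a single dip, the repulsive examples must form one unbroken run in that ordering.

```python
def consistent_with_single_valley(ordered_reactions):
    """Check an ordering against a curve with exactly one dip.

    ordered_reactions: list of (name, repulsed) pairs, sorted by
    increasing realism. Only the ordering is used, never exact
    x-coordinates, which is the point of the argument.
    """
    flags = [repulsed for _, repulsed in ordered_reactions]
    if True not in flags:
        return True  # nothing in the valley, nothing to contradict
    first = flags.index(True)
    last = len(flags) - 1 - flags[::-1].index(True)
    # Everything between the first and last valley point must also
    # be in the valley if there is only one dip.
    return all(flags[first:last + 1])


# Ordering and reactions as claimed in the comment (assumptions, not data).
examples = [
    ("EverQuest 2", True),
    ("Oblivion", False),
    ("Final Fantasy movie", True),
]

print(consistent_with_single_valley(examples))
```

With the reactions as claimed, the check fails, which is the contradiction being argued: either the ordering is wrong, the reactions are wrong, or the curve has more than one dip.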
Let's build another counter-example: so we're told that zombies are only repulsive because they're so close to humans as to fall in the uncanny valley. So logically, if you start with a zombie and move farther and farther away from human-like with it, eventually it exits the valley. Right? In fact, past a point it becomes outright _cute_ and appealing. Or ought to. I mean, that's the shape of that curve.
You probably realize already how absurd that statement is, but let's actually imagine it. Let's say we start with that corpse an
A polar bear is a cartesian bear after a coordinate transform.
Is this her?
If so, good but a little way to go yet :)
The only thing new here is that the equipment required to do the motion capture has been reduced to a single video camera. The facial movements are not being generated by a computer, merely copied from an actor so it's still nowhere near a believable simulation of a human face.
const int one = 65536; (Silvermoon, Texture.cs)
SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
Image Metrics calls this "performance transfer technology". It's not really animation; it's more of a scheme for pasting face A onto actor B. Quite a bit of this already goes on; often, when you see a stunt performer's face on screen, the face of the principal has been transferred to the image of the stunt performer. With this new technology, that can be done without matching camera angles or going through the whole "dots on the face" makeup ordeal.
The person in the video is Emily O'Brien, a professional actress. You can find a much better video of Image Metrics' work on this page of her website
So they make a 3D model of Emily's face (using a 3D scanner, presumably), then they film Emily moving her face, then they deform the model to match Emily's facial expressions, then they superimpose the model on Emily's head.
Er... what for?
At best they'll end up with something identical to the original (but they don't - the model doesn't wrinkle properly and sometimes the tracking is slightly off - you can see her face "float" relative to the hairline and ears).
I could understand the point if they could take expressions from one person's face and replicate them on another person's face (which is something you can do with motion capture - and some clean-up work). But obviously they can't do that automatically, or they would have done it for the demo.
I can see this kind of technology being useful to disguise the transition between an actor's real face and a 3D face (which will later be deformed by hand, or morphed into some creature, etc.), but the demo is so limited (camera doesn't move, the 3D face is almost identical to the real face, etc.) that it seems a long way off from being an alternative to motion capture and manual tweaking. This is like showing some (supposedly) revolutionary new GPU by making it print "Hello World" on the screen. If the technology is so great, why such a limited demo?
I would be shocked if "within a generation" you couldn't do video games that are animated in real time to the live action level. You're forgetting that one generation ago (~1990), it was impossible to do even cartoon level animation (the first full length CG picture was Toy Story in 1995). Today, a dozen amateurs using free software can produce a short film with equal or better effects (http://en.wikipedia.org/wiki/Elephants_Dream/).
What about that funky Replicant teddy bear from Blade Runner? That was all the way IN the Uncanny Valley.
BTW the girl on the video in the article...FAIL. Very, very, VERY creepy.
Knowledge is power. Knowledge shared is power multiplied.
They are trying to make a perfect-looking human, but humans are defined by their imperfections. When they airbrush real humans too much it winds up looking fake.
They need to add human imperfections to the CGI models to pass the uncanny valley test.
Broken link in parent, try this
"Since no amount of cosmetic surgery will make actual human eyes larger, some girls are trying another way to up their cute quotient: extra-wide contact lenses!"
Well, there is the crazy shit known as "eye tattooing". It's still a young procedure and I don't know if they can blend a tattoo that close to the iris.
Oh, say does that Star-Spangled Banner entwine / The myrtle of Venus with Bacchus's vine?
No, the whole problem with motion capture is that it's *not* exact. The results can be pretty faulty, especially when it comes to facial animation, and when you then apply that faulty animation data to an equally imperfect mesh you land right deep down in the uncanny area, exactly *because* it's motion capture. With hand animation, on the other hand, an artist can fine-tune the results till they look perfect, which however never really happens for realistic facial animation since it would just be way too much work.
An even bigger problem will be making robots that can convincingly pass for human while physically in their presence and trying to feign one-on-one communication. Have you ever noticed that somehow, something just kind of clicks, and you *know* you've made eye contact with someone... and you know that THEY know, too? They might be far away, in a moving vehicle, looking at something else (or just generally looking around), but every now and then "it" happens... you make random, fleeting eye contact with a stranger.
My theory is that it's due to the fact that your eyes are always moving (if your eye were perfectly still, you wouldn't be able to see, because rods and cones derive most of their information from CHANGES rather than instantaneous sampled state). I'm guessing that the pattern of movement appears random, but somehow the part of your brain responsible for background signal processing is able to recognize that movement pattern in the eyes of others, and tries to synchronize itself to it. Neither person is intentionally trying to do it, or is even aware of it, but their brains -- through visible eye movement -- are actively negotiating the equivalent of a handshake... and when it happens, a metaphorical "datagram" gets sent to your conscious brain letting you know that you've "locked on" to another person. When you're intentionally talking to someone, it lets you know that you have their attention. When it unexpectedly happens at some random moment when you're just gazing out at the horizon, it can be awkward and uncomfortable.
It's why if you're trying to hide, the worst thing you can possibly do is try to watch what's going on nearby. You might be in the dark shadows, or behind a large object with little more than a hole big enough to see through... but somehow, if someone happens to gaze in the right direction, and their eye detects the movement pattern of an eye somewhere nearby, they're going to immediately feel like something is amiss, even if they don't immediately realize what just happened. If their gaze crosses the gaze of another person who's looking at something entirely different, it might just be a feeling of unease. It's why looking for a lost person or animal is easier than looking for a lost object, at least if you're close enough to potentially make eye contact. Looking for a misplaced object, your brain has to process everything it sees, and constantly do pattern-matching. With people and animals, it's kind of like they're emitting a short-range beacon that allows you to randomly gaze around, but get "that feeling" whenever eye contact occurs, signaling that some area merits further visual inspection.
Anyway, getting back to the Uncanny Valley, it'll be interesting to see what impact the ability to feign eye contact by robots will have. A robot with no eye contact seems creepy in a "dead" kind of way. Would a robot that "almost" managed to maintain eye contact be MORE comforting, or creepier still? Would the "comfort" factor depend upon whether the person interacting with the robot KNEW they were interacting with a robot? Or would making "almost correct" subconscious eye contact with a robot send chills down the person's spine, setting off subconscious alarms to let them know, "DANGER! Something here isn't quite right!", regardless of whether the person KNEW it was a robot?
It wasn't the same fish each time, it wasn't the same place each time, it wasn't even the same time(of day) each time - I would be swimming & sightseeing, stop for no apparent reason, start observing closely, and there would be a large-eyed fish watching me.
Thanks for making me reconsider my initial position - what's that old expression? It's not the things we don't know that get us in trouble, it's the things we know that aren't so.
Hmmm. Your ideas are intriguing to me and I wish to subscribe to your newsletter.