Slashdot Mirror


Researchers Teach Computers To Perceive 3D from 2D

hamilton76 writes to tell us that researchers at Carnegie Mellon have found a way to allow computers to extrapolate 3 dimensional models from 2 dimensional pictures. From the article: "Using machine learning techniques, Robotics Institute researchers Alexei Efros and Martial Hebert, along with graduate student Derek Hoiem, have taught computers how to spot the visual cues that differentiate between vertical surfaces and horizontal surfaces in photographs of outdoor scenes. They've even developed a program that allows the computer to automatically generate 3-D reconstructions of scenes based on a single image. [...] Identifying vertical and horizontal surfaces and the orientation of those surfaces provides much of the information necessary for understanding the geometric context of an entire scene. Only about three percent of surfaces in a typical photo are at an angle, they have found."

7 of 145 comments

  1. Awesome! by rblum · · Score: 5, Funny

    Now run it on an Escher picture!

  2. Robot vision by amightywind · · Score: 4, Insightful

    They've even developed a program that allows the computer to automatically generate 3-D reconstructions of scenes based on a single image

    This is so not new. These researchers may have advanced techniques in some areas, but shape-from-shading inversion problems like this have been worked successfully since the 1970s and earlier. The theory is well established. Horn's Robot Vision is a classic.

    --
    an ill wind that blows no good
  3. Errr... by Ayanami+Rei · · Score: 5, Informative

    you've always been able to do that.
    Cities aren't the kind of thing this is targeted at.
    You can get building plans and architectural drawings and everything from the city for free. There are algorithms that can easily map pictures to objects if you know ahead of time the shape of the things that "should" be there.

    This stuff is for deciding the shape of unknown things, and more importantly, to gain new heuristics for image searches.

    With this technology, you could ask for "things that are round, and have a box".

    More importantly, you could show the computer one picture of something, and have it attempt to find more pictures of it (from different angles, with different colors, etc.). Like you show it a Volvo C90, and it shows you any and all pictures of Volvo C90s by the shape.

    --
    THIS THING CAN TURN ON A DIME, MACROSSZERO STYLE ALSO FUCK BETA, ~NYORON
  4. First application will be... by Onimaru · · Score: 5, Funny

    ...pr0n, of course. Now we can accurately predict and model the exact size and specularity of Lindsay Lohan's boobies, using this revolutionary new (wait for it) Mellon Engine. Truly, we live in the future.

    --
    adam b.
  5. "Enemy of the State" by Rob+T+Firefly · · Score: 4, Funny

    So we're one step closer to actually being able to do the dramatic image-enhancing stuff that's routine in film and television crime drama? You know, where the brooding detective notices four interesting pixels in the background of a scratchy security video, strokes his chin thoughtfully, and says "enhance this bit" to the stereotype computer geek. The geek types noisily, the computer zooms in on those four pixels, and clears it up into a detailed image of the bad guy, often moving other foreground stuff out of the way to do so.

    1. Re:"Enemy of the State" by Jerf · · Score: 4, Informative

      It's worth pointing out that a lot of that stuff isn't, strictly speaking, impossible.

      What's impossible is to take a single photo out of the stream and "enhance" it to the n-th degree without using the rest of the video.

      And no matter how good your technique, you can't generate information, so there will be some limit to your zooming in.

      But the idea that if you consider the entire video stream, you can extract a lot more information is not impossible at all, and you'd probably be surprised by both what is in there and what isn't. Seeing "through" something probabilistically is possible if the object being "seen" was in video at some point. On the other hand, "zooming" in to something on the counter that has been there for the entire duration of the video and has never moved is impossible, because while you may have 15,000 pictures of the object, they're all the same pictures.
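
      The point about moving versus static objects can be sketched with a toy model. This is only an illustration under stated assumptions, not how any real enhancement system works: the "scene" is a 1-D list of numbers, the "camera" point-samples every fourth value with no noise or blur, and the names capture, reconstruct, and FACTOR are hypothetical.

```python
# Toy model (an assumption for illustration): a "camera" that point-samples
# every FACTOR-th value of a 1-D scene, starting at a per-frame offset.
FACTOR = 4


def capture(scene, offset):
    """Return one low-resolution frame of the scene."""
    return scene[offset::FACTOR]


def reconstruct(scene_len, frames_with_offsets):
    """Merge frames onto the fine grid; None marks samples no frame saw."""
    fine = [None] * scene_len
    for offset, frame in frames_with_offsets:
        for i, value in enumerate(frame):
            fine[offset + i * FACTOR] = value
    return fine


scene = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3]

# Static object, static camera: 15,000 frames, all at the same offset.
static = [(0, capture(scene, 0)) for _ in range(15000)]
# Moving object (or camera): four frames, each at a different offset.
moving = [(k, capture(scene, k)) for k in range(FACTOR)]

static_result = reconstruct(len(scene), static)
moving_result = reconstruct(len(scene), moving)

# Count how many fine-grid samples each set of frames pins down.
known_static = sum(v is not None for v in static_result)
known_moving = sum(v is not None for v in moving_result)
print(known_static, "of", len(scene), "samples known from static frames")
print(known_moving, "of", len(scene), "samples known from moving frames")
```

      In this toy, the 15,000 identical frames fix only a quarter of the fine-grid samples, while four frames at different offsets fix all of them. Real video adds noise and blur, so repeated identical frames can still help reduce noise, but they add no new spatial detail.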

      Normally I don't bring this up when we're having one of our usual bitch-fests about CSI here on Slashdot because by and large the standard bitching is still correct. But as AI advances, some of the stuff that seems impossible now will become very possible.

      One early example I remember seeing is the demonstration of a system that could identify a person from roughly 15x15-pixel, high-temporal-resolution monochrome video of them walking, by comparing walking patterns. This was a while ago, and it's worth pointing out your brain can do a pretty decent job of the same task when shown the same video. I mention this because any given frame of the video is basically a random assortment of gray blobs, but in motion, not only is it "a person" but it's a specific person; making it a video adds a lot of information.

  6. Play with it yourself! by cranesan · · Score: 4, Interesting

    http://www.cs.cmu.edu/~dhoiem/projects/popup/index.html

    Looks like some of the software they wrote to do this has been GPL'ed.