Researchers Teach Computers To Perceive 3D from 2D

Posted by ryuzaki0 on Wednesday June 14, 2006 @07:16AM from the your-battlebot-wants-an-upgrade dept.

hamilton76 writes to tell us that researchers at Carnegie Mellon have found a way to allow computers to extrapolate three-dimensional models from two-dimensional pictures. From the article: "Using machine learning techniques, Robotics Institute researchers Alexei Efros and Martial Hebert, along with graduate student Derek Hoiem, have taught computers how to spot the visual cues that differentiate between vertical surfaces and horizontal surfaces in photographs of outdoor scenes. They've even developed a program that allows the computer to automatically generate 3-D reconstructions of scenes based on a single image. [...] Identifying vertical and horizontal surfaces and the orientation of those surfaces provides much of the information necessary for understanding the geometric context of an entire scene. Only about three percent of surfaces in a typical photo are at an angle, they have found."
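The three categories in the summary (horizontal, vertical, "at an angle") can be pinned down with a toy rule. This is only an illustration under loud assumptions: the surface normal is already known, +z points up, and the 10° tolerance is a hypothetical choice. The CMU system has no normals to start from; it learns to predict such labels from image cues.

```python
import math

def classify_orientation(normal, tol_deg=10.0):
    """Label a surface 'horizontal', 'vertical' or 'slanted' from its normal (x, y, z), +z up."""
    nx, ny, nz = normal
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    if length == 0:
        raise ValueError("zero-length normal")
    # Tilt of the normal away from the vertical axis; abs() treats floors and ceilings alike.
    tilt = math.degrees(math.acos(min(1.0, abs(nz) / length)))
    if tilt <= tol_deg:
        return "horizontal"   # ground, flat roofs, table tops
    if tilt >= 90.0 - tol_deg:
        return "vertical"     # walls, building facades, tree trunks
    return "slanted"

normals = [(0, 0, 1), (1, 0, 0), (0, -1, 0.05), (1, 0, 1)]
labels = [classify_orientation(n) for n in normals]
print(labels)  # → ['horizontal', 'vertical', 'vertical', 'slanted']
print(labels.count("slanted") / len(labels))  # fraction of surfaces "at an angle"
```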

23 of 145 comments

Awesome! by rblum · 2006-06-14 07:19 · Score: 5, Funny

Now run it on an Escher picture!
leaning tower by ZivZoolander · 2006-06-14 07:21 · Score: 3, Interesting

Wonder how this will handle those optical-illusion photos, like me knocking over the Leaning Tower of Pisa, or holding the Statue of Liberty.
1. Re:leaning tower by Tolleman · 2006-06-14 07:56 · Score: 2, Funny
  
  Just like us. Segmentation fault.
Directly applicable to the car racing AI grand.... by ChrisGilliard · 2006-06-14 07:22 · Score: 3, Interesting

...challenge. I think Carnegie Mellon wants revenge against Stanford for beating them in the 2005 DARPA Grand Challenge. Maybe 2007 will be Carnegie Mellon's year to win the Grand Challenge. If that happens, we're only a hop, skip, and a jump away from having these things drive us around (especially on freeways).

--
No Sigs!
Imagine the Possibilities by Valthan · 2006-06-14 07:23 · Score: 2, Interesting

One could conceivably take pictures of a city, upload them to this program, stitch the pieces together, and then import the result into a game world. How awesome would it be to actually be able to run around a city (say, Toronto) and do things you always wanted to do... (dropping a penny off of the CN Tower and having it hit someone :D)

--
--Valthan
Typical photos? by doti · 2006-06-14 07:24 · Score: 3, Interesting

Only about three percent of surfaces in a typical photo are at an angle

What typical photos are those? No faces, people, trees or any organic thing?
No cars? No roofs?

--
factor 966971: 966971
Robot vision by amightywind · 2006-06-14 07:26 · Score: 4, Insightful

They've even developed a program that allows the computer to automatically generate 3-D reconstructions of scenes based on a single image

This is so not new. These researchers may have advanced techniques in some areas, but shape-from-shading inversion problems like this have been worked on successfully since the 1970s and earlier. The theory is well established. Horn's Robot Vision is a classic.
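For reference, the forward model that classic shape from shading inverts is the Lambertian reflectance map in gradient space, the form used in Horn's work: with surface slopes p = ∂z/∂x and q = ∂z/∂y, and (p_s, q_s) the slopes of a surface facing the light, brightness is R(p, q) = (1 + p·p_s + q·q_s) / (√(1 + p² + q²) · √(1 + p_s² + q_s²)). A minimal sketch, assuming unit albedo and a distant point source; the function name is hypothetical. It shows why the inversion is hard: each pixel gives one brightness value but two unknown slopes, so extra constraints such as smoothness are needed.

```python
import math

def lambertian_reflectance(p, q, ps, qs):
    """Reflectance map R(p, q) for a unit-albedo Lambertian surface.

    p, q   : surface gradients dz/dx, dz/dy at the point being imaged
    ps, qs : gradients of a surface patch facing the light source
    Returns predicted brightness, clamped at 0 for self-shadowed patches.
    """
    num = 1.0 + p * ps + q * qs
    den = math.sqrt(1.0 + p * p + q * q) * math.sqrt(1.0 + ps * ps + qs * qs)
    return max(0.0, num / den)

# Light straight overhead: two differently oriented patches look identical,
# so brightness alone cannot tell which way the surface slopes.
print(lambertian_reflectance(1.0, 0.0, 0.0, 0.0))
print(lambertian_reflectance(0.0, 1.0, 0.0, 0.0))
```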

--
an ill wind that blows no good
Errr... by Ayanami+Rei · 2006-06-14 07:26 · Score: 5, Informative

you've always been able to do that.
Cities aren't the kind of thing this is targeted at.
You can get building plans and architectural drawings and everything from the city for free. There are algorithms that can easily map pictures to objects if you know ahead of time the shape of the things that "should" be there.
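When the shape is known ahead of time, matching a picture to a model mostly reduces to finding the camera pose. A minimal sketch, assuming a pinhole camera with focal length f and principal point (cx, cy) in pixels, and a candidate rotation R and translation t; all function names are hypothetical. A real system would search over poses (for example by solving a perspective-n-point problem) to minimize this error rather than take the pose as given.

```python
import math

def to_camera(point, R, t):
    """Rigid transform into the camera frame: R @ point + t."""
    return tuple(sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3))

def project(point_cam, f, cx, cy):
    """Pinhole projection of a camera-frame point (X, Y, Z), Z > 0, to pixel (u, v)."""
    X, Y, Z = point_cam
    if Z <= 0:
        raise ValueError("point is behind the camera")
    return (f * X / Z + cx, f * Y / Z + cy)

def reprojection_error(model_pts, image_pts, R, t, f, cx, cy):
    """Mean pixel distance between projected model points and where they were observed."""
    errors = [math.dist(project(to_camera(p, R, t), f, cx, cy), seen)
              for p, seen in zip(model_pts, image_pts)]
    return sum(errors) / len(errors)

# Hypothetical 1 m square, 5 m in front of an unrotated camera.
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
square = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
seen = [(50, 50), (70, 50), (70, 70), (50, 70)]
print(reprojection_error(square, seen, identity, (0, 0, 5), 100, 50, 50))  # → 0.0
```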

This stuff is for deciding the shape of unknown things, and more importantly, to gain new heuristics for image searches.

With this technology, you could ask for "things that are round, and have a box".

More importantly, you could show the computer one picture of something, and have it attempt to find more pictures of it (from different angles, with different colors, etc.). Like you show it a Volvo C90, and it shows you any and all pictures of Volvo C90s by the shape.

--
THIS THING CAN TURN ON A DIME, MACROSSZERO STYLE ALSO FUCK BETA, ~NYORON
1. Re:Errr... by jackbird · 2006-06-14 09:03 · Score: 2, Funny
  
  You can get building plans and architectural drawings and everything from the city for free. There are algorithms that can easily map pictures to objects if you know ahead of time the shape of the things that "should" be there.
  Dear Sir,
  ha ha ha.
  ha ha ha ha ha ha ha.
  ha.
  If only.
  Signed,
  every CAD operator in the world
First application will be... by Onimaru · 2006-06-14 07:29 · Score: 5, Funny

...pr0n, of course. Now we can accurately predict and model the exact size and specularity of Lindsay Lohan's boobies, using this revolutionary new (wait for it) Mellon Engine. Truly, we live in the future.

--
adam b.
"Enemy of the State" by Rob+T+Firefly · 2006-06-14 07:31 · Score: 4, Funny

So we're one step closer to actually being able to do the dramatic image-enhancing stuff that's routine in film and television crime dramas? You know, where the brooding detective notices four interesting pixels in the background of a scratchy security video, strokes his chin thoughtfully, and says "enhance this bit" to the stereotypical computer geek. The geek types noisily, the computer zooms in on those four pixels and clears them up into a detailed image of the bad guy, often moving other foreground stuff out of the way to do so.

--
Slashdot Burying Stories About Slashdot Media Owned
1. Re:"Enemy of the State" by Jerf · 2006-06-14 07:50 · Score: 4, Informative
  
  It's worth pointing out that a lot of that stuff isn't, strictly speaking, impossible.
  
  What's impossible is to take a single photo out of the stream and "enhance" it to the n-th degree without using the rest of the video.
  
  And no matter how good your technique, you can't generate information, so there will be some limit to your zooming in.
  
  But the idea that if you consider the entire video stream, you can extract a lot more information is not impossible at all, and you'd probably be surprised by both what is in there and what isn't. Seeing "through" something probabilistically is possible if the object being "seen" was in video at some point. On the other hand, "zooming" in to something on the counter that has been there for the entire duration of the video and has never moved is impossible, because while you may have 15,000 pictures of the object, they're all the same pictures.
  
  Normally I don't bring this up when we're having one of our usual bitch-fests about CSI here on Slashdot because by and large the standard bitching is still correct. But as AI advances, some of the stuff that seems impossible now will become very possible.
  
  One early example I remember seeing is the demonstration of a system that could identify a person from about 15x15-pixel, high-temporal-resolution monochrome video of them walking, by comparing walking patterns. This was a while ago, and it's worth pointing out that your brain can do a pretty decent job of the same task when shown the same video. I mention this because any given frame of the video is basically a random assortment of gray blobs, but in motion it's not only "a person" but a specific person; making it a video adds a lot of information.
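The simplest case of getting more out of a whole video stream than out of one frame is temporal averaging of a static scene. A minimal sketch, assuming independent Gaussian sensor noise and a perfectly still camera; the scene values and function names are hypothetical. Averaging N frames shrinks the noise by roughly √N. What it cannot do, if the camera and object never move relative to each other, is recover detail finer than the pixel grid: every frame samples the same points. Multi-frame super-resolution relies on sub-pixel motion between frames for that.

```python
import math
import random

def noisy_frames(scene, n_frames, noise_sd, seed=0):
    """Simulate n_frames captures of a static 1D scene with independent Gaussian noise."""
    rng = random.Random(seed)
    return [[value + rng.gauss(0.0, noise_sd) for value in scene] for _ in range(n_frames)]

def average_frames(frames):
    """Per-pixel mean over all frames."""
    return [sum(column) / len(frames) for column in zip(*frames)]

def rms_error(estimate, truth):
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(estimate, truth)) / len(truth))

scene = [float(10 * i) for i in range(20)]   # hypothetical 20-pixel "true" scene
frames = noisy_frames(scene, n_frames=100, noise_sd=5.0)
single = rms_error(frames[0], scene)               # error of one frame
averaged = rms_error(average_frames(frames), scene)
print(single, averaged)  # averaged is expected to be about sqrt(100) = 10 times smaller
```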
It is a fairly simple process by IndustrialComplex · 2006-06-14 07:33 · Score: 2, Informative

I remember doing something similar to this as an undergrad at Penn State. It was just an undergraduate computer vision course, but one of our exercises involved identifying common reference points in one or more images of the same object. These points can then be used to estimate the parallax between the images. It is really fun to play with, since you can use a few still images to create the illusion that a camera is panning around the object. Of course, that example is quite simple. It is very easy for the points to give false positives, and the processing time of our unoptimized algorithms nearly made it unusable. But it did at least give a proof of concept. However, taking this and expanding it to create 3D models, if they can do so reliably, is quite amazing.
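The parallax step can be made concrete for the simplest setup, a rectified stereo pair: two cameras displaced sideways with parallel optical axes, so a matched point only shifts horizontally. That setup is an assumption of this sketch, not necessarily what the course exercise used, and the function names are hypothetical. The shift (disparity) d of a matched reference point gives its depth as Z = f·B/d, with f the focal length in pixels and B the distance between the cameras. A false-positive match just yields a wrong depth, which is why bad correspondences hurt so much.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth Z = f * B / d for a rectified stereo pair (f in pixels, B in metres)."""
    if disparity_px <= 0:
        raise ValueError("non-positive disparity: point at infinity or a bad match")
    return focal_px * baseline_m / disparity_px

def triangulate_rectified(matches, focal_px, baseline_m):
    """matches: (x_left, x_right) pixel column of the same reference point in each image."""
    return [depth_from_disparity(xl - xr, focal_px, baseline_m) for xl, xr in matches]

print(triangulate_rectified([(110, 100), (105, 100)], focal_px=500, baseline_m=0.1))
# → [5.0, 10.0]  (halving the disparity doubles the depth)
```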

--
Out of modpoints but really liked a post? 1BDkF6TtmmeZ3yqXbz9yhdYVqRYnwFoXDj
That's been possible for years... by Penguinisto · 2006-06-14 07:35 · Score: 3, Interesting

It's called Canoma. Problem is, it's been limited in scope, and the original company that wrote it (MetaCreations) went out of business ages ago; it still exists as an orphan that Adobe has been sitting on, however.
(MetaCreations also produced Poser, Bryce, and Carrara, all three of which are still alive and in use in the 3D hobbyist market.)
/P

--
Quo usque tandem abutere, Nimbus, patientia nostra?
1. Re:That's been possible for years... by kthejoker · 2006-06-14 07:50 · Score: 2, Funny
  
  Looks like your sig has been rendered obsolete.
3D paradoxes by ortholattice · 2006-06-14 07:36 · Score: 3, Funny

I wonder what the software would end up doing with this: M.C. Escher's Waterfall. Would the program self-destruct like that robot in Star Trek?
Using multiple camera angles... by jsharkey · 2006-06-14 07:38 · Score: 3, Interesting

Last year I worked on an Artificial Intelligence project to recognize objects from several video angles. It takes 2D images (from camera video) and turns them into a 3D path.

It uses a super-neat concept called "Geometric Hashing" which can be used to recognize an object regardless of size, rotation, or even partially-obscured regions.
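The core of geometric hashing fits in a few dozen lines. A minimal 2D sketch, assuming point features have already been extracted: every ordered pair of model points defines a basis, the remaining points are expressed in that basis (which makes their coordinates invariant to translation, rotation and uniform scale), and the quantized coordinates index a hash table. At recognition time, one basis pair from the scene lets the other scene points vote for (model, basis) entries. Because votes come from whatever points remain visible, partial occlusion only lowers the count. The function names, the bin size q, and the toy models are hypothetical; this is the textbook scheme, not the code from the project described above, and a real system would try several scene bases.

```python
import math
from collections import defaultdict

def basis_coords(p0, p1, p):
    """Coordinates of p in the frame with origin p0, x-axis p1 - p0 and a
    perpendicular y-axis of the same length (similarity-invariant)."""
    ex, ey = p1[0] - p0[0], p1[1] - p0[1]
    norm2 = ex * ex + ey * ey
    dx, dy = p[0] - p0[0], p[1] - p0[1]
    return (dx * ex + dy * ey) / norm2, (-dx * ey + dy * ex) / norm2

def quantize(a, b, q):
    return round(a / q), round(b / q)

def build_table(models, q=0.25):
    """Offline step: hash every model point under every ordered basis pair."""
    table = defaultdict(list)
    for name, pts in models.items():
        for i, p0 in enumerate(pts):
            for j, p1 in enumerate(pts):
                if i == j:
                    continue
                for k, p in enumerate(pts):
                    if k not in (i, j):
                        table[quantize(*basis_coords(p0, p1, p), q)].append((name, i, j))
    return table

def recognize(table, scene, basis=(0, 1), q=0.25):
    """Online step: pick one scene basis pair and let the other points vote."""
    i, j = basis
    votes = defaultdict(int)
    for k, p in enumerate(scene):
        if k in (i, j):
            continue
        for entry in table.get(quantize(*basis_coords(scene[i], scene[j], p), q), []):
            votes[entry] += 1
    return max(votes.items(), key=lambda kv: kv[1]) if votes else None

def similarity(pts, angle, scale, tx, ty):
    """Rotate, scale and translate a point set (a stand-in for a new view)."""
    c, s = math.cos(angle), math.sin(angle)
    return [(scale * (c * x - s * y) + tx, scale * (s * x + c * y) + ty) for x, y in pts]

models = {"L": [(0, 0), (2, 0), (0, 1), (0, 2)],
          "square": [(0, 0), (2, 0), (2, 2), (0, 2)]}
table = build_table(models)
scene = similarity(models["L"], math.radians(30), 1.5, 5.0, -3.0)
print(recognize(table, scene))  # → (('L', 0, 1), 2)
```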
I worked with them briefly by moultano · 2006-06-14 07:42 · Score: 3, Informative

The complexity of the models that the program is able to extract is similar to what you would see in a game like Doom. All "floors" are perfectly horizontal, all "walls" are perfectly vertical, and most objects (people, trees, cars) become small vertical walls. This doesn't attempt to capture surface geometry at all; it approximates things with large planes. What they are saying is that most things you see in pictures are very well approximated by these simple primitives, so that when they build a scene from them it provides convincing parallax as you move around it. It's a really neat effect.
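A minimal sketch of why this Doom-like model is enough to produce parallax. It assumes a pinhole camera with no roll and a horizontal optical axis, a known camera height and focal length, and a horizon at a known image row (rows increase downward); these simplifications, the label strings, and the function names are all hypothetical. A ground pixel's depth follows from how far below the horizon it sits, Z = f·h/(v − v_horizon), and every "vertical" pixel inherits the depth of the point where its column meets the ground, because a vertical surface seen in one image column under these assumptions lies at a single depth.

```python
def ground_depth(v, f, cam_height, v_horizon):
    """Depth along the optical axis of a ground-plane pixel at image row v."""
    return f * cam_height / (v - v_horizon)

def popup_depth_map(labels, f, cam_height, v_horizon):
    """labels[row][col] is 'ground', 'vertical' or 'sky'. Returns depths (None = unknown)."""
    rows, cols = len(labels), len(labels[0])
    depth = [[None] * cols for _ in range(rows)]
    for c in range(cols):
        contact_depth = None
        # Scan each column from the bottom row upward.
        for v in range(rows - 1, -1, -1):
            label = labels[v][c]
            if label == "ground" and v > v_horizon:
                depth[v][c] = ground_depth(v, f, cam_height, v_horizon)
                contact_depth = depth[v][c]  # nearest ground point below what follows
            elif label == "vertical" and contact_depth is not None:
                depth[v][c] = contact_depth  # "fold up" the wall at its ground contact
            # sky, or a wall with no visible ground contact, stays None
    return depth

# One image column: sky, two wall pixels, two ground pixels; horizon at row 2.
labels = [["sky"], ["vertical"], ["vertical"], ["ground"], ["ground"]]
print(popup_depth_map(labels, f=100, cam_height=1.5, v_horizon=2))
# → [[None], [150.0], [150.0], [150.0], [75.0]]
```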
Re:Shits & Giggles by Joebert · 2006-06-14 07:44 · Score: 2, Funny

hmmmm.

I've got so many bills, it would be impossible for even the entire Slashdot reader base to pay them all.

--
Wanna fight ? Bend over, stick your head up your ass, and fight for air.
Not for objects at all by moultano · 2006-06-14 08:18 · Score: 2, Insightful

This is only for outdoor scenes and only extracts planar information. It isn't designed for objects at all. It provides general geometric context, i.e., this area is ground, this area is a left-facing wall, etc. That's not to say that a similar technique couldn't be used for identifying round objects, but that isn't what this is for.
Play with it yourself! by cranesan · 2006-06-14 08:41 · Score: 4, Interesting

http://www.cs.cmu.edu/~dhoiem/projects/popup/index.html

Looks like some of the software they wrote to do this has been GPL'ed.
Nothing like shape from shading approaches by moultano · 2006-06-14 09:10 · Score: 2, Insightful

Shape from shading works only on a very narrow set of objects. If you are trying to recover the shape of a marble statue, use shape from shading. If your object has varying color (non-uniform albedo), forget about it.
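The color problem can be shown with the standard Lambertian model, brightness = albedo × max(0, n · l) for unit normal n and light direction l. A minimal sketch, assuming a single distant light straight ahead; the function name is hypothetical. A white patch tilted 60° away from the light and a half-as-bright patch facing it head-on return the same pixel value, so from brightness alone a change in paint is indistinguishable from a change in shape.

```python
import math

def lambertian_intensity(albedo, normal, light):
    """Brightness of a Lambertian point: albedo * max(0, n . l) for unit vectors n, l."""
    dot = sum(n * l for n, l in zip(normal, light))
    return albedo * max(0.0, dot)

light = (0.0, 0.0, 1.0)
tilt = math.radians(60)
white_tilted = lambertian_intensity(1.0, (math.sin(tilt), 0.0, math.cos(tilt)), light)
gray_facing = lambertian_intensity(0.5, (0.0, 0.0, 1.0), light)
print(white_tilted, gray_facing)  # both about 0.5
```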

What you are saying amounts to "People have done research into computer vision in the past, therefore any new research into computer vision is soooo not new."
Re:Well... by jackbird · 2006-06-14 10:41 · Score: 2, Interesting
I've used PhotoModeler and Canoma, and have made camera-mapped environments in 3D software by hand for years. It is incredibly nontrivial. It is a lot of blood, sweat, tears, hand-painting, and a not-so-terribly-good result. Some typical problems:
- Camera barrel distortion
- chromatic aberrations
- hot colors in high-contrast areas of digital photos
- JPEG compression artifacts
- specular highlights and reflections
- lens flares and blooms from those specular highlights and reflections
- clipped/out of gamut areas
- occluding objects like trees, parked cars, signs, telephone poles, pedestrians, trashcans, newspaper vending machines, etc., etc., etc.
- occluding objects like other buildings in aerial photos
- only being able to shoot certain details from awkward angles
- not being able to shoot certain details from any angle at all
- horrendous texture stretching
- perspective problems with concave/convex detail like window ledges, cornices, awnings, etc., etc., etc.
- stuff you forgot to photograph
- different lighting conditions when you go back out to shoot the stuff you forgot to photograph
- unavailable architectural drawings
- paper architectural drawings
- poorly-reproduced paper architectural drawings from 1912
- architectural drawings that bear no resemblance to the conditions onsite
- CAD files aligned to state survey coordinates so large that the single-precision floats in most 3D software start scrambling the model due to rounding errors.
As I said, nontrivial.
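The last item on that list is easy to reproduce. An IEEE-754 single-precision float has a 24-bit significand, so between 2^20 (1,048,576) and 2^21 (2,097,152) the representable values are 0.125 apart: a coordinate around two million units simply cannot hold anything finer than an eighth of a unit. A minimal sketch; the coordinate and origin values are hypothetical. Subtracting a project-local origin in double precision before handing data to the 3D package is a common workaround, not something the poster describes.

```python
import struct

def to_float32(x):
    """Round a Python float (IEEE-754 double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

easting = 2_000_000.03               # hypothetical survey coordinate, 0.03 past a round number
print(to_float32(easting))           # → 2000000.0  (the 0.03 is gone)

origin = 2_000_000.0                 # hypothetical local origin, kept in double precision
print(to_float32(easting - origin))  # about 0.03: the local offset survives
```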