Researchers Teach Computers To Perceive 3D from 2D
hamilton76 writes to tell us that researchers at Carnegie Mellon have found a way to allow computers to extrapolate 3 dimensional models from 2 dimensional pictures. From the article: "Using machine learning techniques, Robotics Institute researchers Alexei Efros and Martial Hebert, along with graduate student Derek Hoiem, have taught computers how to spot the visual cues that differentiate between vertical surfaces and horizontal surfaces in photographs of outdoor scenes. They've even developed a program that allows the computer to automatically generate 3-D reconstructions of scenes based on a single image. [...] Identifying vertical and horizontal surfaces and the orientation of those surfaces provides much of the information necessary for understanding the geometric context of an entire scene. Only about three percent of surfaces in a typical photo are at an angle, they have found."
Now run it on an Escher picture!
I wonder how this will handle those optical-illusion photos, like me knocking over the Leaning Tower of Pisa or holding the Statue of Liberty.
...challenge. I think Carnegie Mellon wants revenge against Stanford for beating them in the 2006 DARPA grand challenge. Maybe 2007 will be Carnegie Mellon's year to win the grand challenge. If this happens, we're only a hop skip and a jump to having these things drive us around (esp on freeways).
No Sigs!
One could conceivably take pictures of a city, upload them to this program, stitch the pieces together and then import the result into a game world. How awesome would it be to actually be able to run around a city (say Toronto) and do things you always wanted to do... (dropping a penny off of the CN Tower and having it hit someone :D)
--Valthan
"Your scientists have yet to discover how neural networks create self-consciousness, let alone how the human brain processes two-dimensional retinal images into the three-dimensional phenomenon known as perception. Yet you somehow brazenly declare seeing is believing?"
-- Jesse "The Body" Ventura as a Man In Black
Only about three percent of surfaces in a typical photo are at an angle
What typical photos are those? No faces, people, trees or any organic thing?
No cars? No roofs?
factor 966971: 966971
They've even developed a program that allows the computer to automatically generate 3-D reconstructions of scenes based on a single image
This is so not new. These researchers may have advanced techniques in some areas, but shape-from-shading inversion problems like this have been worked successfully since the 1970s and earlier. The theory is well established. Horn's Robot Vision is a classic.
an ill wind that blows no good
you've always been able to do that.
Cities aren't the kind of thing this is targeted at.
You can get building plans and architectural drawings and everything from the city for free. There are algorithms that can easily map pictures to objects if you know ahead of time the shape of the things that "should" be there.
This stuff is for deciding the shape of unknown things, and more importantly, to gain new heuristics for image searches.
With this technology, you could ask for "things that are round, and have a box".
More importantly, you could show the computer one picture of something, and have it attempt to find more pictures of it (from different angles, with different colors, etc.). Like you show it a Volvo C90, and it shows you any and all pictures of Volvo C90s by the shape.
THIS THING CAN TURN ON A DIME, MACROSSZERO STYLE ALSO FUCK BETA, ~NYORON
"If this happens, we're only a hop skip and a jump to having these things drive us around (esp on freeways)."
Man that would be a pretty neat invention.
...the CMU web site. My Commodore 64 would really like to sign up for this.
Doesn't it make you feel good to know that our freedoms are protected by politicians, lawyers and journalists?
...pr0n, of course. Now we can accurately predict and model the exact size and specularity of Lindsay Lohan's boobies, using this revolutionary new (wait for it) Mellon Engine. Truly, we live in the future.
adam b.
So we're one step closer to actually being able to do the dramatic image-enhancing stuff that's routine in film and television crime drama? You know, where the brooding detective notices four interesting pixels in the background of a scratchy security video, strokes his chin thoughtfully, and says "enhance this bit" to the stereotype computer geek. The geek types noisily, the computer zooms in on those four pixels, and clears it up into a detailed image of the bad guy, often moving other foreground stuff out of the way to do so.
Slashdot Burying Stories About Slashdot Media Owned
I knew I bought that machine spec'd at 3.27 infinite loops per second for a reason.
I remember doing something similar to this while an undergrad at Penn State. It was just an undergraduate computer vision course, but one of our exercises involved identifying common reference points in two or more images of the same object. These points can then be used to estimate the parallax between the images. It is really fun to play with, since you can use a few still images to create the illusion that a camera is panning around the object. Of course, that example is quite simple. It is very easy for the points to give false positives, and the processing time of our unoptimized algorithms nearly made it unusable. But it did at least give a proof of concept. However, taking this and expanding it to create 3D models, if they can do so reliably, is quite amazing.
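To make the parallax idea concrete, here is a minimal sketch of how matched reference points turn into depth. It assumes an idealized, rectified pair of pinhole cameras; the function name, focal length, baseline and pixel coordinates are all hypothetical and not taken from the coursework described above.

```python
def depth_from_disparity(x_left, x_right, focal_px, baseline_m):
    """Depth in metres of a point matched in two rectified images.

    x_left and x_right are the horizontal pixel coordinates of the same
    reference point in the left and right image (assumed inputs).
    """
    disparity = x_left - x_right
    if disparity <= 0:
        # A false-positive match, or a point effectively at infinity.
        raise ValueError("non-positive disparity")
    return focal_px * baseline_m / disparity


# Hypothetical matches: (x in left image, x in right image).
matches = [(420.0, 400.0), (310.0, 305.0), (250.0, 248.0)]
depths = [
    depth_from_disparity(xl, xr, focal_px=800.0, baseline_m=0.1)
    for xl, xr in matches
]
```

Larger shifts between the two views mean closer points, which is why a single bad match (a false positive) can throw a point far out of place.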
Out of modpoints but really liked a post? 1BDkF6TtmmeZ3yqXbz9yhdYVqRYnwFoXDj
Nice to see we're doing things for shits & giggles. Is this some sort of practical joke?
Wanna fight ? Bend over, stick your head up your ass, and fight for air.
(MetaCreations also produced Poser, Bryce, and Carrara. - all three of which are still alive and in use by the 3D hobbyist market).
Quo usque tandem abutere, Nimbus, patientia nostra?
I wonder what the software would end up doing with this: M.C. Escher's Waterfall. Would the program self-destruct like that robot in Star Trek?
Last year I worked on an Artificial Intelligence project to recognize objects from several video angles. It takes 2D images (from camera video) and turns them into a 3D path.
It uses a super-neat concept called "Geometric Hashing" which can be used to recognize an object regardless of size, rotation, or even partially-obscured regions.
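For anyone curious, here is a rough sketch of the idea for 2D point sets, not the code from that project. It assumes points are given as complex numbers, and that any ordered pair of points can serve as a reference frame ("basis"); every name, the bin size and the toy models are hypothetical. Expressing the other points relative to a basis cancels translation, rotation and uniform scale, and voting over hash bins tolerates missing points.

```python
import cmath
from collections import defaultdict
from itertools import permutations


def _key(z, bin_size=0.25):
    # Quantize invariant coordinates so nearly equal values share a bin.
    return (round(z.real / bin_size), round(z.imag / bin_size))


def _invariant_coords(points, i, j):
    # Coordinates of every other point in the frame set by basis (i, j).
    a, b = points[i], points[j]
    return [(p - a) / (b - a) for k, p in enumerate(points) if k not in (i, j)]


def build_table(models):
    """Offline step: hash each model under every possible basis pair."""
    table = defaultdict(set)
    for name, pts in models.items():
        for i, j in permutations(range(len(pts)), 2):
            for z in _invariant_coords(pts, i, j):
                table[_key(z)].add((name, i, j))
    return table


def recognize(table, scene):
    """Online step: try scene bases and return (model name, best vote count)."""
    best = (None, 0)
    for i, j in permutations(range(len(scene)), 2):
        votes = defaultdict(int)
        for z in _invariant_coords(scene, i, j):
            for entry in table.get(_key(z), ()):
                votes[entry] += 1
        for (name, _, _), count in votes.items():
            if count > best[1]:
                best = (name, count)
    return best


models = {
    "house": [0j, 4 + 0j, 4 + 3j, 2 + 5j, 3j, 2 + 1j],
    "triangle": [0j, 1 + 0j, 0.3 + 2j],
}
table = build_table(models)

# Scene: the house scaled by 2, rotated 0.7 rad, shifted, with one point hidden.
s, t = cmath.rect(2.0, 0.7), 10 - 3j
scene = [s * p + t for p in models["house"][:5]]
result = recognize(table, scene)
```

A real system would follow the vote with a verification step and would handle noisy point positions more carefully than a single fixed bin size does.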
That's how Capt. Kirk will defeat the head android in the remake of "Mudd's Women"!
It'll be more entertaining than, "He always lies! .... I'm lying!
But, if he's lying then he's telling the truth!
But if he can't tell the truth because he always lies. But if he says he's lying, then he's telling the truth..."
Black and white.
No folly is more costly than the folly of intolerant idealism. - Winston Churchill
The complexity of the models that the program is able to extract is similar to what you would see in a game like doom. All "floors" are perfectly horizontal, all "walls" are perfectly vertical, and most objects (people, trees, cars) become small vertical walls. This doesn't attempt to capture surface geometry at all; it approximates things with large planes. What they are saying is that most things you see in pictures are very well approximated by these simple primitives, such that when they create a scene using them it provides convincing parallax as you move around it. It's a really neat effect.
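Part of why such a crude model still gives convincing parallax is that, once something is treated as a vertical plane standing on flat ground, the image row where it touches the ground fixes its distance. Below is a minimal sketch of that relationship. It is not taken from the CMU code, and it assumes a level pinhole camera over flat ground; the function name and all the numbers are hypothetical.

```python
def ground_distance(row, horizon_row, focal_px, camera_height_m):
    """Distance along the ground to the point imaged at `row`.

    Rows increase downward, so ground pixels lie below the horizon
    (row > horizon_row). Assumes a level pinhole camera over flat ground.
    """
    offset = row - horizon_row
    if offset <= 0:
        raise ValueError("pixel is at or above the horizon, not on the ground")
    return focal_px * camera_height_m / offset


# Hypothetical numbers: 800 px focal length, camera 1.6 m above the ground.
near = ground_distance(row=500, horizon_row=300, focal_px=800.0, camera_height_m=1.6)
far = ground_distance(row=340, horizon_row=300, focal_px=800.0, camera_height_m=1.6)
```

A "wall" whose base sits closer to the horizon line is placed farther away, which is all a flat billboard needs to move plausibly as the viewpoint shifts.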
Granted you can extrapolate an estimate of the surroundings for a 3d scene from a single image.
This is good when the source material doesn't exist.
However if I were in the grand challenge I wouldn't be swapping the (minimum) stereo imaging most cars appear to have.
1) it's an approximation and may not be applicable to different terrain or obstacles (similar rock against similar floor)
2) it's harder to fool 2 cameras than a single one; glitches could send you off the cliff.
3) with a stereo pair you can interpolate properly and produce a much better map.
Humans with one eye (and single image devices) benefit greatly when given a series of images because then the same interpolation can occur and the 3d scene can be rebuilt.
liqbase
I'd like to see this applied more directly to something like Google Earth. They already have the "show buildings".... this would be a great boon to that. It might need a different shading than the grey boxes used by Google earth as it stands now, to show which structures are derived from the 2d images, but still, I think it'd be great.
Google, you can send me my check now, please.
This could be a revolution in the CSI field. There are already products that make 3D virtual crime scenes, but this could be applied to just about every case where a picture was taken.
You are absolutely correct that it won't be able to tell what the 'reverse' side looks like, other than they will know that it has to be within certain size constraints.
So if I'm looking at a football, I won't be able to tell what is behind it from a single picture. You would have a blind spot, that would grow based upon the vectors from the image aperture to the edges of the object.
However, this could be a breakthrough for facial recognition. Given a facial photo, if they are able to extract the dimensions of features, it should provide another level of accuracy in the detection process.
For example: Recognition software might limit a face to 10 possible matches, but if you then run this software, maybe only 1 has a nose that is as long, or eye sockets of a certain depth.
So when is this going to be used to turn real environments into virtual environments?
:)
Taking reconnaissance photos and turning them into training simulations, for example. Or, closer to my level, taking photos of public places and turning them into deathmatch levels.
(Always wanted to make a Quake level of my high school, but then became worried people would think I'd be the source of the next Columbine. Then I wanted to do one of my college, but then 9/11 came along, and I was worried about being investigated as a terrorist. There's freedom of speech, for you.)
tasks(723) drafts(105) languages(484) examples(29106)
This is only for outdoor scenes and only extracts planar information. It isn't designed for objects at all. It provides general geometric context, ie this area is ground, this area is a left facing wall, etc. That's not to say that a similar technique couldn't be used for identifying round objects, but that isn't what this is for.
Many of these techniques aren't new; some of this stuff has been happening since '96.
I wrote a program to do something similar: it converts a 2D image into a 3D image.
I think you'll find this interesting: http://www.cs.technion.ac.il/~gershon/EscherForReal/
Left 30 degrees
click click click click click
Up twenty degrees
click click click click click
Enhance
click click click click click
Zoom in on that
click click click click click
Enhance
click click click click click
OK, give me a hardcopy right there.
"More human than human is oour motto"
http://www.cs.cmu.edu/~dhoiem/projects/popup/index.html
Looks like some of the software they wrote to do this has been GPL'ed.
Well, that and we have a gigantic corpus of training data to extrapolate from.
Occam's razor is the blind faith in the natural selection of least resistance and in universal oversimplification. -- EF
researchers at Carnegie Mellon have found a way to allow computers to extrapolate 3 dimensional models
I'd run it on a Victoria's Secret magazine. There are some excellent 3D models I'd like to extrapolate, if you know what I mean.
God spoke to me.
I saw something unusual when I saw (again) Blade Runner.
/ bladeea2.html (spanish, sorry, but it has a diagram of the scene, where "espejo" means "mirror", there is a convex mirror)m (some information in english)
:lol:
When the photo was examined with the ESPER machine, I noticed that it was somehow transformed into 3D. In fact, I remember the mirror; perhaps in the future a mirror inside a photo could provide information about the 3D scene...
The ESPER machine:
http://www.geocities.com/Hollywood/Boulevard/7920
http://www.brmovie.com/FAQs/BR_FAQ_Terminology.ht
It suddenly came to my mind when I read this announcement...
I post here once a year, so I am not registered, and forgive my spanglish
Egocentrico.
In the context of my stereoscopy hobby, for use with my eMagin Z800 VR visor, I discovered software that was able to detect some depth from the movement between frames in a movie. The tech was developed by a company called Soft4D, which doesn't exist anymore. But it seems http://www.colorcode3d.com/ sells a version of the software for use with any normal 2D DVDs and their stereoscopic 50-eurocent glasses. It sure adds some depth to a 2D movie: no true 3D effect, but still remarkable and more immersive to watch than plain 2D.
1. See if your school has LabView or Matlab. Both offer FFT out of the box. One of those would have actually been my first choice for the project you're describing.
2. If that fails, note that there are plenty of textbooks (or websites) that explain the FFT butterfly. A quick search turned up http://www.relisoft.com/Science/Physics/fft.html, which even has C++ source code available for download.
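If neither option is available, the butterfly itself is short enough to write by hand. The following is a minimal sketch of a recursive radix-2 decimation-in-time FFT using only the Python standard library; it is not the code from the linked site, the function name is my own, and it assumes the input length is a power of two.

```python
import cmath


def fft(x):
    """Recursive radix-2 decimation-in-time FFT.

    The length of x must be a power of two (an assumption of this sketch).
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])  # transform of the even-indexed samples
    odd = fft(x[1::2])   # transform of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        # The butterfly: one add and one subtract yield two outputs.
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


# One cycle of a sine wave sampled four times.
spectrum = fft([0, 1, 0, -1])
```

For that input all the energy lands in bins 1 and 3, the positive- and negative-frequency halves of the single sine. An iterative, in-place version is faster, but the recursion shows the structure more clearly.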
I have seen an example of this video enhancement technology where they have some crappy video of a car leaving a parking garage, and the front license plate is completely unreadable due to grainy pixelation. But when they selected the area of the plate and compared the data from every frame of the video, it became quite clear what the license plate said. It is very convincing.
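The simplest version of combining frames is plain averaging: random noise differs from frame to frame while the plate does not, so the noise partly cancels. Here is a toy sketch of that effect. It assumes the frames are already perfectly aligned and that the noise is independent per frame, which real footage does not guarantee; the synthetic "pixels", the noise level and the function names are all made up for illustration, and I don't know what the demo actually used.

```python
import random


def average_frames(frames):
    """Pixel-wise mean of equally sized, already aligned frames."""
    count = len(frames)
    return [sum(pixels) / count for pixels in zip(*frames)]


def rms_error(estimate, truth):
    """Root-mean-square difference between an estimate and the true signal."""
    return (sum((e - t) ** 2 for e, t in zip(estimate, truth)) / len(truth)) ** 0.5


rng = random.Random(0)
truth = [rng.uniform(0, 255) for _ in range(200)]  # made-up "plate" pixels
frames = [[p + rng.gauss(0, 20) for p in truth] for _ in range(50)]  # noisy copies

single_error = rms_error(frames[0], truth)
stacked_error = rms_error(average_frames(frames), truth)
```

With independent noise the error of the average shrinks roughly with the square root of the number of frames, so 50 frames should end up several times cleaner than any single one.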
Ever since the 9/11 conspiracy theorists started posting captured stills of the airplane hitting the tower, pointing out unknown devices strapped to the underside, I have wished that someone with access to this image processing technology would analyze the full video sequence to see if there is really anything there or not. It sure would be nice to use some high-tech tools to put this whole thing to rest.
Shape from shading works only on a very narrow set of objects. If you are trying to recover the shape of a marble statue, use shape from shading. If your object has color forget about it.
What you are saying amounts to "People have done research into computer vision in the past, therefore any new research into computer vision is soooo not new."
Obligatory Blade Runner quote:
Enhance 224176
Enhance, Stop
Move in, Stop
Pull out, Track right, Stop
Center in, Pull back, Stop
Track 45 right, Stop
Center and Stop
Enhance 34 to 36
Pan right and pull back, Stop
Enhance 34 to 46
Pull back, Wait a minute, Go right, Stop
Enhance 5719
Track 45 left, Stop
Enhance 15 to 23
Give me a hard copy right there.
Unfortunately this is done by neural learning techniques, "machine learning". So it is essentially randomly taught artificial neurons, and the researchers have no idea how the machine solves it. However, machine learning techniques, or Artificial Neural Networks (ANNs), have a lot of potential as custom ICs and computing power become better and better.
Now if only they could teach this to my dogs.
"We reject as false the choice between our safety and our ideals." --The American President (20.1.2009)
otherwise known as a steinmetz solid, which is often used as a demonstration for engineering drawing or architecture classes to show that a 3-d drawing of an object is not sufficient to determine its actual shape. A mouhefanggai in 3-D drawings looks like a sphere, but is actually a ridged object with a surface consisting entirely of flat-wrapped curves, rather than compound curves.
Nostalgia's not what it used to be.
Imagine what this could do for converting a 2D film to 3D. With the appropriate technology, we could have 3D movies that are worth a darn.
Hmm let me see here.. what could be considered prior art?
Maybe Pablo Picasso's Guernica??!?! Man, that Picasso was waaaay ahead of his time!
*watches out for rotten tomatoes*
SixD
This algorithm will breathe life into my old porn collection!
This all pre-supposes you can translate the diagram accurately and position it in the 3d world. You'd probably need GPS readings at different points on the building, and on the camera to get decent results.
And you need a light model and surface texture models (or a lot of pictures from different angles).
So this isn't trivial. But it's doable. Such techniques are used in film for scene composition and for texturing 3d representations of real-world objects.
It's not like you can just take a picture of a building and have your 3d modeling tool figure out what it's a picture of and create a new texture artifact, etc.
You have special tools and workflows to do this. I doubt they'll be bundled with AutoCAD any time soon.
But there's nothing ground breaking about:
1) take picture of things that you have a model of
2) derive textures for model for arbitrary model viewing
Shape from shading works only on a very narrow set of objects. If you are trying to recover the shape of a marble statue, use shape from shading. If your object has color forget about it.
Not true at all. If you understand the photometric function of the materials in the scene, variation due to color can be separated from variation due to shading. Image classification techniques are useful for doing this. This is discussed in the book and elsewhere. We used the technique for Voyager II to measure the topography of the Uranus and Neptune satellites. Stereo pairs were not often available.
What you are saying amounts to "People have done research into computer vision in the past, therefore any new research into computer vision is soooo not new."
Ding! Wrong again. The lesson is that slashdot editors should be careful to refrain from hyperbolic descriptions of results that are really incremental rather than revolutionary. And readers like you should not swallow the summaries hook line and sinker.
Got any links on this? Not much Wiki info on Geometric Hashing - I remember my physics teacher at school explaining why a robot has these problems recognising stuff so I'm kinda interested (this was in the 80's so I may have missed some of the newer stuff).
Imagine what this could do for the porn industry ;)
Glitches can't send you over a cliff. Maps & GPS (or inertial) keep you off the cliffs. Glitches could send you into a ditch or onto a bush, either of which would be difficult to extract from.
Can you be Even More Awesome?!
How impressive this research really is won't be known until we can have a look at their methods, algorithms and training data set. I have a feeling that the novel aspect of their work is not in the extraction of features, or the method used to determine whether a surface is vertical or horizontal. As others have already said, shape from shading (think shading a lit cube with a pencil on paper) and even geometric approaches can get you a 3D model from 2D images. It all depends on the assumptions you make beforehand. However, if their system learned the classification from the 300+ images it got fed, that would be pretty impressive, even though they most likely hand-labelled the images as mostly vertical, a bit of both, etc.
On a side note, Kanade is a very influential researcher in computer vision, and one with a massive and solid body of novel work. If he said that the CMU researchers' work is good and novel, it adds quite a bit of weight to their claims.
Oh, and to those that said images of buildings are not "everyday", keep in mind that most research papers I have seen operate on handcrafted images, sometimes of a single kind of object. Being able to handle arbitrary images of buildings is very, very general and "everyday".
Berserk Manga > All
> So we're one step closer to actually being able to do the dramatic
> image-enhancing stuff that's routine in film and television crime drama?
By a strange coincidence, we (the CMU graphics lab) have been making fun of that very piece of cinematic fiction for the last week or so.
If all you have is a single image, there's nowhere to get additional information from to do that magical "enhance" trick movies so love. If you've got a video stream there may be stuff you can do, but video data is typically stored at substantially lower resolution than still images due to space considerations, so it's pretty unlikely you're going to get magic out of it. Basically, the magic "enhance" button is pure fiction, and likely to remain so for the foreseeable future. That's not to say no image processing is possible or useful (stuff is done all the time), but it's not like what movies would have you believe.
On the other hand, neither are movies' portrayals of being a cop, being a programmer, or being a pretty coed spending a weekend at an abandoned cabin in the woods remotely accurate, so why should this be any different?
When it comes down to it, these men are shaking hands about teaching a computer to read Magic Eyes.
Isn't that like a second year problem at most universities?
"Only about three percent of surfaces in a typical photo are at an angle, they have found."
Doesn't it depend on whether the photo is of a city and man-made objects or of nature, trees and mountains...
Uh, this has already been done before.
And this is news because...
Mike D. Smith http://www.elecorn.com
In this rather disturbing episode in the fourth season of TNG, engineer Commander Geordi La Forge has to investigate why he and three colleagues are drawn to a particular small area of a planet, to their peril. There's visual evidence from their first trip that he keeps poring over until he decides to focus on a small shadow and take that frame to the holodeck. The holodeck gives the 2D shadow a 3D shape, proving there was something else in the room, and more: we see it's not human. In the fiction, he tells the computer to extrapolate the shadow based on the size of an average human, say 5'8", and, using personnel logs, the other crewmen can be rendered completely in 3D. Most of it, though, is from a 2D frame in a video. Geordi has a weird costume at the end of this episode... Blue Man Group, anyone?