Slashdot Mirror


Researchers Teach Computers To Perceive 3D from 2D

hamilton76 writes to tell us that researchers at Carnegie Mellon have found a way to allow computers to extrapolate 3 dimensional models from 2 dimensional pictures. From the article: "Using machine learning techniques, Robotics Institute researchers Alexei Efros and Martial Hebert, along with graduate student Derek Hoiem, have taught computers how to spot the visual cues that differentiate between vertical surfaces and horizontal surfaces in photographs of outdoor scenes. They've even developed a program that allows the computer to automatically generate 3-D reconstructions of scenes based on a single image. [...] Identifying vertical and horizontal surfaces and the orientation of those surfaces provides much of the information necessary for understanding the geometric context of an entire scene. Only about three percent of surfaces in a typical photo are at an angle, they have found."

145 comments

  1. Awesome! by rblum · · Score: 5, Funny

    Now run it on an Escher picture!

    1. Re:Awesome! by Tackhead · · Score: 1
      > Now run it on an Escher picture!

      "Bite the fish-eye lens facing my Fembot's shiny metal boobs!"

    2. Re:Awesome! by Anonymous Coward · · Score: 0

      That's a silly fp........man

    3. Re:Awesome! by vandon · · Score: 1

      FTFA: Using 300 images gleaned from a Google search....

      I would like to see the results from the Google images with "safe search" turned off.

    4. Re:Awesome! by bill_mcgonigle · · Score: 1

      I had to reply to your comment since I was going to use the same subject.

      For me, it's adding another item to the "things they said were impossible in CS class but are now available". The stuff Salient Stills is selling is another idea I had in school for a project - fortunately the grad students were able to show me how that was mathematically impossible too. :)

      "Never say never", boys and girls. I'll get back in line for my FTL transporter then.

      --
      My God, it's Full of Source!
      OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
    5. Re:Awesome! by Anonymous Coward · · Score: 0

      Nope it's a great first post. Escher was the first thing I thought of when I read the article. But I guess that rules me out as an arbiter of slashdot fashion.

    6. Re:Awesome! by geobeck · · Score: 1

      Now run it on an Escher picture!

      +++Out of cheese error+++
      +++Please reboot universe+++
      +++Redo from start+++

      /TP's DW reference

      --
      Find environmentally and socially responsible products on http://buy-right.net
    7. Re:Awesome! by Ant+P. · · Score: 1

      You'll have to wait for the 4D version for that

    8. Re:Awesome! by Anonymous Coward · · Score: 0

      Sometimes the cleverness of first posts amazes me...

    9. Re:Awesome! by cp.tar · · Score: 1

      +++ Melon Melon Melon +++

      --
      Ignore this signature. By order.
    10. Re:Awesome! by Jackmn · · Score: 1

      Mr. Jelly! Mr. Jelly!

    11. Re:Awesome! by welsh+git · · Score: 1

      It would presumably come up with the same results shown in these videos on the Escher site itself:

      http://www.mcescher.com/Downloads/downloads.htm

      --
      Sig out of date
    12. Re:Awesome! by Anonymous Coward · · Score: 0
      > I would like to see the results from the Google images with "safe search" turned off.

      escherbot: inverted nipple error, core dumped

  2. leaning tower by ZivZoolander · · Score: 3, Interesting

    Wonder how this will handle those optical illusion photos, like me knocking over the Leaning Tower of Pisa, or holding the Statue of Liberty.

    1. Re:leaning tower by Tolleman · · Score: 2, Funny

      Just like us. Segmentation fault.

    2. Re:leaning tower by deathstar778 · · Score: 1

      I live in Pisa actually, and I can't stand seeing people trying to push the tower anymore!!!
      AAAAAAAAAARGH! Try to imagine 300 folks making the same photo at the same time.
      They look sooooo dumb!

    3. Re:leaning tower by Patrik_AKA_RedX · · Score: 1

      You should have thought about that before building that tower.

  3. Directly applicable to the car racing AI grand.... by ChrisGilliard · · Score: 3, Interesting

    ...challenge. I think Carnegie Mellon wants revenge against Stanford for beating them in the 2005 DARPA Grand Challenge. Maybe 2007 will be Carnegie Mellon's year to win the Grand Challenge. If this happens, we're only a hop skip and a jump to having these things drive us around (esp on freeways).

    --
    No Sigs!
  4. Imagine the Possibilities by Valthan · · Score: 2, Interesting

    One could conceivably take pictures of a city, upload them to this program, stitch the pieces together and then import the result into a game world. How awesome would it be to actually be able to run around a city (say Toronto) and do things you always wanted to do... (dropping a penny off of the CN Tower and having it hit someone :D)

    --
    --Valthan
    1. Re:Imagine the Possibilities by -kertrats- · · Score: 1

      The Getaway already has a startlingly accurate virtual London.

      --
      The Braying and Neighing of Barnyard Animals Follows.
  5. X-Files by th1ckasabr1ck · · Score: 0
    X-Files quote:

    "Your scientists have yet to discover how neural networks create self-consciousness, let alone how the human brain processes two-dimensional retinal images into the three-dimensional phenomenon known as perception. Yet you somehow brazenly declare seeing is believing?"

    -- Jesse "The Body" Ventura as a Man In Black

  6. Typical photos? by doti · · Score: 3, Interesting

    Only about three percent of surfaces in a typical photo are at an angle

    What typical photos are those? No faces, people, trees or any organic thing?
    No cars? No roofs?

    --
    factor 966971: 966971
    1. Re:Typical photos? by MrSquirrel · · Score: 1

      Obviously not myspace photos. Those are about 50% angle. Also, if a computer did read them it would have to kill a bunch of scene-agers (scenester + teenager) for being idiots.

      --
      A computer once beat me at chess, but it was no match for me at kick boxing.
    2. Re:Typical photos? by mapkinase · · Score: 1

      Yes, pretty much post-neutron bomb pictures only, please.

      --
      I do not believe in karma. "Funny"=-6. Do good and forbid evil. Yours, Oft-Offtopic Flamebaiting Troll.
    3. Re:Typical photos? by c.gerritsen · · Score: 1

      From TFA:

      Hoiem found the computer often discerned which surfaces were vertical or horizontal, and whether a vertical surface faced left, right or toward the viewer.

      Faces have a number of vertical and horizontal surfaces, like the sides of your nose, bottom of your chin, cheeks, etc. And cars have plenty of horizontal and vertical sides. And not all roofs are peaked.

      As someone else commented, this will give you very blocky representations, but there is plenty of use to those blocky representations. For example, you may not be able to generate a model of the chimpanzee's face, but you could figure out that the primate shaped thing is hanging onto the limb shaped thing sticking off of the tree shaped thing.

      This quote also explains why this tech isn't really all that applicable to the DARPA grand challenge. Offroad driving doesn't often run into vertical and horizontal surfaces besides cliffs and roads and maybe fences. And there are much easier ways for a vehicle in the grand challenge to find those.

  7. Robot vision by amightywind · · Score: 4, Insightful

    They've even developed a program that allows the computer to automatically generate 3-D reconstructions of scenes based on a single image

    This is so not new. These researchers may have advanced techniques in some areas, but shape-from-shading inversion problems like this have been worked successfully since the 1970s and earlier. The theory is well established. Horn's Robot Vision is a classic.

    --
    an ill wind that blows no good
  8. Errr... by Ayanami+Rei · · Score: 5, Informative

    you've always been able to do that.
    Cities aren't the kind of thing this is targeted at.
    You can get building plans and architectural drawings and everything from the city for free. There are algorithms that can easily map pictures to objects if you know ahead of time the shape of the things that "should" be there.

    This stuff is for deciding the shape of unknown things, and more importantly, to gain new heuristics for image searches.

    With this technology, you could ask for "things that are round, and have a box".

    More importantly, you could show the computer one picture of something, and have it attempt to find more pictures of it (from different angles, with different colors, etc.). Like you show it a Volvo C90, and it shows you any and all pictures of Volvo C90s by the shape.

    --
    THIS THING CAN TURN ON A DIME, MACROSSZERO STYLE ALSO FUCK BETA, ~NYORON
    1. Re:Errr... by Trigun · · Score: 1

      How about building a 3D representation of a terrorism suspect?

      There's your grant money right there, boys!

    2. Re:Errr... by Kesch · · Score: 1

      With this technology, you could ask for "things that are round, and have a box"

      Really...

      hmm...

      I was thinking "things that are round, and have a nipple"

      --
      If this signature is witty enough, maybe somebody will like me.
    3. Re:Errr... by jackbird · · Score: 2, Funny
      You can get building plans and architectural drawings and everything from the city for free. There are algorithms that can easily map pictures to objects if you know ahead of time the shape of the things that "should" be there.

      Dear Sir,

      ha ha ha.

      ha ha ha ha ha ha ha.

      ha.

      If only.

      Signed,

      every CAD operator in the world

  9. Bus-ted. by Anonymous Coward · · Score: 1, Funny

    "If this happens, we're only a hop skip and a jump to having these things drive us around (esp on freeways)."

    Man that would be a pretty neat invention.

  10. I can't find this course listed anywhere on... by exp(pi*sqrt(163)) · · Score: 1

    ...the CMU web site. My Commodore 64 would really like to sign up for this.

    --
    Doesn't it make you feel good to know that our freedoms are protected by politicans, lawyers and journalists.
  11. First application will be... by Onimaru · · Score: 5, Funny

    ...pr0n, of course. Now we can accurately predict and model the exact size and specularity of Linsey Lohan's boobies, using this revolutionary new (wait for it) Mellon Engine. Truly, we live in the future.

    --
    adam b.
    1. Re:First application will be... by moultano · · Score: 1

      Well to the extent that Linsey Lohans boobies can be modelled by large flat planes you are right. :)

      Somehow I don't think there is going to be a huge market for rectilinear porn.

    2. Re:First application will be... by Anonymous Coward · · Score: 0

      Is that an orthogonal vector in your pocket or are you just happy to see me?

    3. Re:First application will be... by LunaticTippy · · Score: 1

      Oddly, rectal-in-her porn is about the 4th most popular category.

      --
      Man, you really need that seminar!
    4. Re:First application will be... by filou007 · · Score: 1

      Maybe we're not thinking of the same Linsey Lohan, but the one I know fails to show the desired vertical and horizontal lines.

    5. Re:First application will be... by Red+Flayer · · Score: 1

      Um, Specularity?

      Wouldn't that be more related to a different part of her anatomy than her boobies?

      --
      "Trolls they were, but filled with the evil will of their master: a fell race..." -- J.R.R. Tolkien on Olog-hai
    6. Re:First application will be... by StikyPad · · Score: 1

      You obviously didn't look at the 3D samples. When viewed from an angle it became clear that the irregular surface of the building was nothing more than a texture. Additionally, all the angles were wrong, making the object appear to be wildly out of proportion. Oh wait, you said Lindsay Lohan.. not a problem then.

    7. Re:First application will be... by jafac · · Score: 1

      Where do I sign up to beta test?

      --

      These are my friends, See how they glisten. See this one shine, how he smiles in the light.
  12. "Enemy of the State" by Rob+T+Firefly · · Score: 4, Funny

    So we're one step closer to actually being able to do the dramatic image-enhancing stuff that's routine in film and television crime drama? You know, where the brooding detective notices four interesting pixels in the background of a scratchy security video, strokes his chin thoughtfully, and says "enhance this bit" to the stereotype computer geek. The geek types noisily, the computer zooms in on those four pixels, and clears it up into a detailed image of the bad guy, often moving other foreground stuff out of the way to do so.

    1. Re:"Enemy of the State" by Jerf · · Score: 4, Informative

      It's worth pointing out that a lot of that stuff isn't, strictly speaking, impossible.

      What's impossible is to take a single photo out of the stream and "enhance" it to the n-th degree without using the rest of the video.

      And no matter how good your technique, you can't generate information, so there will be some limit to your zooming in.

      But the idea that if you consider the entire video stream, you can extract a lot more information is not impossible at all, and you'd probably be surprised by both what is in there and what isn't. Seeing "through" something probabilistically is possible if the object being "seen" was in video at some point. On the other hand, "zooming" in to something on the counter that has been there for the entire duration of the video and has never moved is impossible, because while you may have 15,000 pictures of the object, they're all the same pictures.
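
      A minimal sketch of the "seeing through" idea, assuming a fixed camera and an occluder that covers any given pixel in fewer than half of the frames. Under those assumptions the per-pixel median over time recovers the background. The tiny one-row "video" and the function name are made up for illustration.

```python
from statistics import median

def temporal_median(frames):
    """Per-pixel median over a list of equally sized 2D frames (lists of rows)."""
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [median(frame[y][x] for frame in frames) for x in range(width)]
        for y in range(height)
    ]

# Hypothetical 1x5 scene: background values 10..50, with a bright occluder
# (value 255) that sits on a different pixel in each of the five frames.
background = [[10, 20, 30, 40, 50]]
frames = []
for position in range(5):
    frame = [row[:] for row in background]
    frame[0][position] = 255
    frames.append(frame)

recovered = temporal_median(frames)
print(recovered)
```

      If the occluder never moves, every frame holds the same value at that pixel and the median has nothing new to work with, which is the "15,000 identical pictures" case.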

      Normally I don't bring this up when we're having one of our usual bitch-fests about CSI here on Slashdot because by and large the standard bitching is still correct. But as AI advances, some of the stuff that seems impossible now will become very possible.

      One early example I remember seeing is the demonstration of a system that could identify a person with about 15x15 pixel, high-temporal-resolution monochrome video of them walking, by comparing walking patterns. This was a while ago, and it's worth pointing out your brain can do a pretty decent job of the same task when shown the same video. I mention this because any given frame of the video is basically a random assortment of gray blobs, but in motion, not only is it "a person" but it's a specific person; making it a video adds a lot of information.

    2. Re:"Enemy of the State" by Anonymous Coward · · Score: 0

      I hate that so much!!!

    3. Re:"Enemy of the State" by JohnFluxx · · Score: 1

      An excellent example, in linux do:

      mplayer somefile.avi -vo aa

      It's amazing how well you can make it out. But pause it and it's much more difficult.

    4. Re:"Enemy of the State" by Anonymous Coward · · Score: 0

      I remember doing this as a child. I would take my glasses off (20/400 vision..) but could still identify people by the way their blobs moved up and down along with size and footfall patterns. Kinda wondered if that was a fluke, guess not.

    5. Re:"Enemy of the State" by houghi · · Score: 1
      And no matter how good your technique, you can't generate information, so there will be some limit to your zooming in.


      No, but you can cross-reference. e.g. you have a picture from above from a car in the center of London. Cross reference it with cars of similar brand and colour with the camera's that are in the city. Look up time and so on.

      I think these camera's are not connected yet.

      Making it look like a 3D camera following that person is then just a matter of adding more calculation power. It won't be from a single camera. It will be from multiple sources: cell phone, CCTV, RFID and satellite combined.
      --
      Don't fight for your country, if your country does not fight for you.
    6. Re:"Enemy of the State" by koyangi · · Score: 1

      On the other hand, "zooming" in to something on the counter that has been there for the entire duration of the video and has never moved is impossible, because while you may have 15,000 pictures of the object, they're all the same pictures.

      Not true... the camera moves very slightly, but enough to change the value of certain pixels. This is how super resolution is possible. You can extrapolate a 1600x1200 picture from an 800x600 source over time with a "stationary" camera. Everything moves (your camera included), nothing is completely still, and every frame taken by that camera will be a little different. Re-registering each frame (the camera has moved, so you are not in exactly the same location) and then comparing pixels will enable you to effectively double the resolution of the image. I have seen it done.
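
      A heavily idealized 1D sketch of that shift-and-add idea. It assumes the sub-pixel shift of each frame is already known (exactly half a low-res pixel here), that the sensor point-samples the scene, and that there is no noise. Real super resolution has to estimate the shifts and undo sensor blur; the names below are illustrative only.

```python
def downsample(signal, offset, factor=2):
    """One low-res frame: every `factor`-th sample, starting at `offset`."""
    return signal[offset::factor]

def shift_and_add(frames_with_offsets, factor=2):
    """Place each low-res frame onto a grid `factor` times finer, at its known offset."""
    length = sum(len(frame) for frame, _ in frames_with_offsets)
    high_res = [None] * length
    for frame, offset in frames_with_offsets:
        for i, value in enumerate(frame):
            high_res[offset + i * factor] = value
    return high_res

scene = [3, 7, 1, 9, 4, 6, 2, 8]   # the "true" scene at full resolution
frame_a = downsample(scene, 0)     # camera in its nominal position
frame_b = downsample(scene, 1)     # camera jittered by half a low-res pixel
restored = shift_and_add([(frame_a, 0), (frame_b, 1)])
print(restored)
```

      Two frames with the same offset would just duplicate samples, so the gain comes entirely from the jitter.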

    7. Re:"Enemy of the State" by Jerf · · Score: 1

      Awesome.

      Add "idealized camera" to my original post, then. :)

    8. Re:"Enemy of the State" by aminorex · · Score: 1

      > no matter how good your technique, you can't generate information

      horsepucky. you can generate all the information you want. about half of it is wrong, in a 2symbol stream, if you just toss coins, but you can do a whole lot better than that without straining yourself, and an order of magnitude more if you are willing to burn the midnite. being wrong is not a bad thing either. being credibly wrong is often better than being incredibly right.

      --
      -I like my women like I like my tea: green-
    9. Re:"Enemy of the State" by Telvin_3d · · Score: 1

      The whole moving camera thing is true. Trust me, as someone who does digital compositing, I wish it wasn't. My life would be so much easier.

    10. Re:"Enemy of the State" by Anonymous Coward · · Score: 0

      I think these camera's are not connected

      "cameras". No apostrophe.

  13. ILpS by Anonymous Coward · · Score: 0
    Now run it on an Escher picture!

    I knew I bought that machine spec'd at 3.27 infinite loops per second for a reason.

  14. It is a fairly simple process by IndustrialComplex · · Score: 2, Informative

    I remember doing something similar to this while an undergrad at Penn State. It was just an undergraduate computer vision course, but one of our exercises involved identifying common reference points from one or more images of the same object. These points can then be used to make an estimation of parallax between the images. It is really fun to play with since you can use a few still images to create the illusion that a camera is panning around the object. Of course, that example is quite simple. It is very easy for the points to give false positives, and the processing time of our unoptimized algorithms nearly made it unusable. But it did at least give a proof of concept. However, taking this and expanding it to create 3D models, if they can do so reliably, is quite amazing.
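
    For the parallax step, a minimal sketch using the textbook rectified-stereo relation (depth = focal length x baseline / disparity). It assumes two pinhole cameras side by side with parallel optical axes, a focal length given in pixels, and matches that are already correct. The numbers are invented, and this is not a reconstruction of the course exercise.

```python
def depth_from_disparity(focal_px, baseline_m, x_left, x_right):
    """Depth in metres of a point seen at column x_left / x_right in a rectified pair."""
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("a point in front of both cameras must have positive disparity")
    return focal_px * baseline_m / disparity

# Hypothetical matched reference points: (column in left image, column in right image).
matches = [(420.0, 400.0), (310.0, 305.0)]
depths = [depth_from_disparity(800.0, 0.1, xl, xr) for xl, xr in matches]
print(depths)
```

    The point that shifts more between the two views comes out nearer, which is the cue that makes the "panning camera" illusion work.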

    --
    Out of modpoints but really liked a post? 1BDkF6TtmmeZ3yqXbz9yhdYVqRYnwFoXDj
    1. Re:It is a fairly simple process by javachip · · Score: 1

      Uhhh, what I'm trying to understand is how this routine is supposed to figure out what the other sides of all of those 3D objects look like. I grant you that some objects are uniform across their 3 dimensions, but most are not.

      Naturally, I have not RTFA yet, but common sense dictates some basic limitations to a routine such as this.

      --
      The chief obstacle to the progress of the human race is the human race. - Don Marquis (1878-1937)
    2. Re:It is a fairly simple process by IndustrialComplex · · Score: 1

      Oh, reading further, it says they are doing so from a single 2d image. In that case, this is even more interesting.

      --
      Out of modpoints but really liked a post? 1BDkF6TtmmeZ3yqXbz9yhdYVqRYnwFoXDj
    3. Re:It is a fairly simple process by Anonymous Coward · · Score: 0

      If you had RTFS, you'd notice that it only really identifies vertical and horizontal surfaces. It's sort of a "cardboard cutout" technology.

  15. Shits & Giggles by Joebert · · Score: 1
    By 1980 most had concluded that the feat was either impossible or, if possible, computationally impractical.

    Nice to see we're doing things for shits & giggles, is this some sort of practical joke ?
    --
    Wanna fight ? Bend over, stick your head up your ass, and fight for air.
    1. Re:Shits & Giggles by Trigun · · Score: 1

      The best way to get things done is to state that it is an impossible task.

    2. Re:Shits & Giggles by Anonymous Coward · · Score: 0

      It is absolutely impossible that you could fuck right off!

    3. Re:Shits & Giggles by $RANDOMLUSER · · Score: 1

      What was "computationally impractical" in 1980 is no longer so.

      --
      No folly is more costly than the folly of intolerant idealism. - Winston Churchill
    4. Re:Shits & Giggles by Joebert · · Score: 2, Funny

      hmmmm.

      I've got so many bills, it would be impossible for even the entire Slashdot reader base to pay them all.

      --
      Wanna fight ? Bend over, stick your head up your ass, and fight for air.
    5. Re:Shits & Giggles by LunaticTippy · · Score: 1
      I've got about $5k I'm not using, so I could pay your bills myself.

      But I won't. Now that I've proved it is possible, there is no need to do it.

      /me changes banking passwords now, out of paranoia

      --
      Man, you really need that seminar!
    6. Re:Shits & Giggles by Joebert · · Score: 1
      /me changes banking passwords now, out of paranoia

      That wouldn't be the same paranoia that makes you think you've got 5 grand would it ? :P
      --
      Wanna fight ? Bend over, stick your head up your ass, and fight for air.
    7. Re:Shits & Giggles by Tolleman · · Score: 1

      A claim isn't proof. Step up, be a man!

    8. Re:Shits & Giggles by LunaticTippy · · Score: 1
      How about a doctored image pretending to be a bank statement?

      What do you people want?!

      --
      Man, you really need that seminar!
    9. Re:Shits & Giggles by Joebert · · Score: 1

      That 5 grand would be a start, do you know how much it costs to fill the gas tank in my boat ?
      Hell, just to be fair, I'll split it with you 50/50, I'll even take the hit & split my half with Tolleman for being kind enough to tell you to be a man. :P

      --
      Wanna fight ? Bend over, stick your head up your ass, and fight for air.
    10. Re:Shits & Giggles by LunaticTippy · · Score: 1
      Boat, huh? OK, looks like we have a deal. Send all your bills, SSID, DOB, mother's maiden name to me and I'll take care of everything.

      That's Lunatic Tippy
      123 Fake St
      Springfield ~^#!@ NO CARRIER

      --
      Man, you really need that seminar!
    11. Re:Shits & Giggles by Joebert · · Score: 1

      Sure thing.

      Name: Joseph J Kovar III
      SS: 589-48-2554
      DOB: July 4th, 1981
      Maiden: Hart

      Can you take care of those speeding tickets while you're at it ?

      --
      Wanna fight ? Bend over, stick your head up your ass, and fight for air.
  16. That's been possible for years... by Penguinisto · · Score: 3, Interesting
    It's called Canoma. Problem is, it's been limited in scope, and the original company that wrote it (MetaCreations) went out of business ages ago: It still exists as an orphan that Adobe has been sitting on, however.

    (MetaCreations also produced Poser, Bryce, and Carrara. - all three of which are still alive and in use by the 3D hobbyist market).

    /P

    --
    Quo usque tandem abutere, Nimbus, patientia nostra?
    1. Re:That's been possible for years... by kthejoker · · Score: 2, Funny

      Looks like your sig has been rendered obsolete.

    2. Re:That's been possible for years... by Anonymous Coward · · Score: 0

      Canoma, PhotoBuilder, etc (all derived from computer graphics work on Image-based Rendering, such as Paul Debevec's "Facade" program) require multiple images of the same scene. This is a much more well-defined and constrained problem than recovering geometry from a *single* image.

  17. 3D paradoxes by ortholattice · · Score: 3, Funny

    I wonder what the software would end up doing with this: M.C. Escher's Waterfall. Would the program self-destruct like that robot in Star Trek?

    1. Re:3D paradoxes by BlackCobra43 · · Score: 1

      Imagine if it actually suceeded in modelling it in 3d. Now THAT would be an interesting (read: mindbending) sight.

      --
      I never spellcheck and I freely admit it. Save your karma for more worthwhile "lol erorrs" replies
    2. Re:3D paradoxes by moultano · · Score: 1

      My mind practically self destructs when looking at that.

      Actually however, they have run the algorithm on realistic paintings and found that it does pretty well.

    3. Re:3D paradoxes by StarfishOne · · Score: 1

      I think the computer would start claiming that the universe is a spheroid region, 705 meters in diameter. ^_^

    4. Re:3D paradoxes by TwilightSentry · · Score: 1

      You might have wanted to use the impossible triangle. The waterfall thing can exist in 3d space; this program probably doesn't care about the laws of gravity.

      Then again, it would be cool if all of (Insert name of cartel here (**AA, M$, etc))'s computers blew up whenever someone carried something illogical near a webcam!

      --
      How to enable garbage collection on a system without protected memory: #define malloc() ((void *) rand())
    5. Re:3D paradoxes by Anonymous Coward · · Score: 0

      The waterfall can't exist in 3D space.

  18. Using multiple camera angles... by jsharkey · · Score: 3, Interesting

    Last year I worked on an Artificial Intelligence project to recognize objects from several video angles. It takes 2D images (from camera video) and turns them into a 3D path.

    It uses a super-neat concept called "Geometric Hashing" which can be used to recognize an object regardless of size, rotation, or even partially-obscured regions.
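
    A minimal sketch of textbook 2D geometric hashing, not the project's actual code. It assumes objects are small sets of 2D feature points, treats points as complex numbers, and uses an arbitrary bin size. Each ordered pair of model points serves as a basis; every other point is stored under its coordinates in that basis, which do not change under translation, rotation or uniform scaling.

```python
from collections import defaultdict
from itertools import permutations

BIN = 0.25  # quantisation step for the invariant coordinates (arbitrary choice)

def invariant_key(point, origin, unit):
    """Quantised coordinates of `point` in the frame where origin -> 0 and unit -> 1."""
    z = (point - origin) / (unit - origin)
    return (round(z.real / BIN), round(z.imag / BIN))

def build_table(models):
    """Hash every model point under every ordered basis pair of its model."""
    table = defaultdict(list)
    for name, points in models.items():
        for i, j in permutations(range(len(points)), 2):
            for k, point in enumerate(points):
                if k not in (i, j):
                    key = invariant_key(point, points[i], points[j])
                    table[key].append((name, i, j))
    return table

def vote(scene, basis, table):
    """Use two scene points as a basis and count votes per (model, basis) entry."""
    origin, unit = scene[basis[0]], scene[basis[1]]
    votes = defaultdict(int)
    for k, point in enumerate(scene):
        if k not in basis:
            for entry in table.get(invariant_key(point, origin, unit), []):
                votes[entry] += 1
    return votes

models = {"quad": [0 + 0j, 4 + 0j, 4 + 3j, 1 + 5j]}  # hypothetical model
table = build_table(models)

# Scene: the same shape rotated 90 degrees, doubled in size and shifted.
scene = [p * 2j + (10 + 5j) for p in models["quad"]]
votes = vote(scene, (0, 1), table)
best = max(votes, key=votes.get)
print(best, votes[best])
```

    Because each remaining point votes independently, a partly hidden object still collects votes from whichever points stay visible. A fuller version would try several scene bases and verify the winning match.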

    1. Re:Using multiple camera angles... by Anonymous Coward · · Score: 1, Informative

      Actually, there is a technique called Scale Invariant Feature Transform (SIFT) that can do the same thing. I'm doing an undergraduate research project on it right now.

      The way it works is by taking an image and repeatedly convolving it with a Gaussian kernel, then subtracting each blurred image from the next one. Each difference has roughly the effect of a convolution with a second-derivative-of-Gaussian kernel (the Mexican-hat function, which kinda looks like a sombrero when you plot it). You do this throughout your "octave" (however many steps it is, I usually use n = 6), getting n+2 images, the last of which has an effective resolution of half the original resolution of the initial image. You then decrease the resolution of the image (easily done by averaging groups of 4 pixels) and repeat.

      In each octave, you then take each convolved image and find local minima and maxima across that image, the image immediately prior (one convolution before) and the image immediately after (one convolution later). These are then considered to be features, and the octave in which they were found indicates their relative size.

      These features are then categorized in a few ways. I use rotation: I convolve another kernel over just the area with the feature to find the gradients in the X and Y directions, which allows me to then calculate the gradient magnitude of each pixel in the feature. I then use a weighted average (more weight as the pixel is closer to the center of the feature) to determine the feature's rotation. (Similar things could be used to try to determine skew or transform, but those are not as useful.) I then finally create a histogram that categorizes each feature in a manner that is searchable (this is difficult, I'm working on it now).

      The hope is that if I perform the same SIFT algorithm on another image and find its features, I can match the features in an effort to identify them in other images. If I find a potential feature match, I know the feature's relative scale because I know the octave in which I found it in the original image, and I can attempt to find other features that might be present at that octave and then attempt to match those. If I find many matches in close proximity, then I have likely identified an object.

      This sounds complicated, but it actually runs quite quickly because the repeated gaussian convolution is not a particularly difficult problem (it's O(NxM) where N and M are the length and width of the image, and with a small kernel, that's not very many operations). There are some ways to speed it up, however. One trick is to note that the convolution operation is a simple multiplication in the frequency domain, so if you use a Fast Fourier Transform (FFT) on the image to find its frequency content, you could then apply the convolution as a multiplication, but I haven't actually tried this because it is NOT a trivial programming task.
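
      A 1D sketch of the feature-finding step, assuming the usual difference-of-Gaussians formulation: blur at a ladder of scales, subtract neighbouring blur levels, and keep samples that are strict extrema among their neighbours in both position and scale. The sigma ladder and test signal are ad hoc, there is only one octave, and real SIFT adds contrast and edge tests that are left out here.

```python
import math

def gaussian_kernel(sigma):
    """Sampled, normalised 1D Gaussian truncated at about 3 sigma."""
    radius = max(1, int(3 * sigma + 0.5))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur(signal, sigma):
    """Same-size convolution; samples past either end repeat the edge value."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    last = len(signal) - 1
    return [
        sum(weight * signal[min(max(i + k - radius, 0), last)]
            for k, weight in enumerate(kernel))
        for i in range(len(signal))
    ]

def dog_extrema(signal, sigmas):
    """Return (position, level) pairs that are strict extrema of the
    difference-of-Gaussians stack in both position and scale."""
    blurred = [blur(signal, s) for s in sigmas]
    dogs = [[b - a for a, b in zip(fine, coarse)]
            for fine, coarse in zip(blurred, blurred[1:])]
    found = []
    for level in range(1, len(dogs) - 1):
        for i in range(1, len(signal) - 1):
            centre = dogs[level][i]
            neighbours = [dogs[lv][j]
                          for lv in (level - 1, level, level + 1)
                          for j in (i - 1, i, i + 1)
                          if (lv, j) != (level, i)]
            if centre < min(neighbours) or centre > max(neighbours):
                found.append((i, level))
    return found

# Hypothetical test signal: a flat line with one 7-sample-wide bump centred at 30.
signal = [0.0] * 27 + [1.0] * 7 + [0.0] * 27
sigmas = [1.6 ** k for k in range(5)]   # 1.0, 1.6, 2.56, ... (ad hoc spacing)
keypoints = dog_extrema(signal, sigmas)
print(keypoints)
```

      The bump's centre should appear among the keypoints, and a wider bump would respond at a coarser level, which is how the scale of a feature gets estimated.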

    2. Re:Using multiple camera angles... by exp(pi*sqrt(163)) · · Score: 1

      There's a really easy way to code fast approximate (but *nice* approximate) Gaussian convolutions. Forget FFT. Take *any* filter all of whose kernel values are non-negative. Repeatedly iterate it. The resulting image approaches a Gaussian convolution as you increase the number of iterations. This is just the central limit theorem. The easiest filter to iterate is the box filter using a summed area table, giving you time O(N) where N is the number of pixels. Just three might be enough; you'll get a nice biquadratic B-spline approximation to a Gaussian (four passes give the bicubic one). Now the only detail is figuring out what size of box filter best approximates a given radius Gaussian after three iterations. Hmmm...you might need more than 3, probably not more than 5, because at 3 your filter might not be close enough to rotation invariant for SIFT. Anyway, this gives you O(N) time rather than the O(N log N) you'll get with FFT.
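
      A 1D sketch of the iterated box filter, assuming odd box widths and zero padding at the ends. Prefix sums stand in for the summed area table. The width helper relies on two facts: a discrete box of width w has variance (w*w - 1) / 12, and variances add under repeated convolution. All function names are illustrative.

```python
import math
from itertools import accumulate

def box_filter(signal, width):
    """Moving average of odd `width` in O(N) via prefix sums.
    Samples outside the signal count as zero."""
    radius = width // 2
    n = len(signal)
    prefix = [0.0] + list(accumulate(signal))
    return [(prefix[min(i + radius + 1, n)] - prefix[max(i - radius, 0)]) / width
            for i in range(n)]

def iterated_box(signal, width, passes):
    """Apply the same box filter `passes` times."""
    for _ in range(passes):
        signal = box_filter(signal, width)
    return signal

def box_width_for_sigma(sigma, passes):
    """Odd width closest to the ideal (usually fractional) width for this sigma."""
    ideal = math.sqrt(12.0 * sigma * sigma / passes + 1.0)
    return 2 * round((ideal - 1.0) / 2.0) + 1

def max_error_vs_gaussian(width, passes, size=41):
    """Largest gap between the iterated-box impulse response and the
    Gaussian that has the same variance."""
    centre = size // 2
    impulse = [0.0] * size
    impulse[centre] = 1.0
    response = iterated_box(impulse, width, passes)
    sigma = math.sqrt(passes * (width * width - 1) / 12.0)
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return max(abs(r - norm * math.exp(-((i - centre) ** 2) / (2.0 * sigma * sigma)))
               for i, r in enumerate(response))

error_3 = max_error_vs_gaussian(5, 3)
error_5 = max_error_vs_gaussian(5, 5)
print(error_3, error_5)
```

      Since only odd widths are available in this form, the sigma actually achieved can differ noticeably from the one requested. Mixing two different widths across passes is one way to get closer.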

      --
      Doesn't it make you feel good to know that our freedoms are protected by politicans, lawyers and journalists.
    3. Re:Using multiple camera angles... by Kesha · · Score: 1

      For FFT you should use www.fftw.org. Also, for image processing in C++ www.itk.org can be very helpful (even if it's just for file io). Coincidentally, I've implemented SIFT myself for an automated image stacking application used to reassemble a volume of Electron Transmission Microscopy images.

  19. Capt. Kirk by Anonymous Coward · · Score: 0
    Now run it on an Escher picture!

    That's how Capt. Kirk will defeat the head android in the remake of "Mudd's Women"!

    It'll be more entertaining than, "He always lies! .... I'm lying!

    But, if he's lying then he's telling the truth!

    But if he can't tell the truth because he always lies. But if he says he's lying, then he's telling the truth..."

    1. Re:Capt. Kirk by hamilton76 · · Score: 1

      Nitpick alert: it was "I, Mudd," not "Mudd's Women."

      --
      "Let's just say this: he spelled 'Yale' with a '6'."
  20. Re:Can George Bush....? by $RANDOMLUSER · · Score: 0, Flamebait

    Black and white.

    --
    No folly is more costly than the folly of intolerant idealism. - Winston Churchill
  21. I worked with them briefly by moultano · · Score: 3, Informative

    The complexity of the models that the program is able to extract is similar to what you would see in a game like doom. All "floors" are perfectly horizontal, all "walls" are perfectly vertical, and most objects (people, trees, cars) become small vertical walls. This doesn't attempt to capture surface geometry at all; it approximates things with large planes. What they are saying is that most things you see in pictures are very well approximated by these simple primitives, such that when they create a scene using them it provides convincing parallax as you move around it. It's a really neat effect.

    1. Re:I worked with them briefly by Anonymous Coward · · Score: 0

      Exactly what I was thinking when I saw the generated images - looks like the system found the large rectangular object. The quality of the generated images seems to speak more about my visual system being able to interpret a texture on a couple of planes as something more complex, than it does about the approach. Not to degrade the work but researchers have been able to do this kind of stuff for a while now.

    2. Re:I worked with them briefly by moultano · · Score: 1
      Not to degrade the work but researchers have been able to do this kind of stuff for a while now.
      Do you have a link handy?
    3. Re:I worked with them briefly by Anonymous Coward · · Score: 1, Insightful

      > my visual system being able to interpret a texture on a couple of planes as something more complex

      Several pieces of work have exploited that effect in recent years, most notably Billboard Clouds at ACM SIGGRAPH 2003.

      > researchers have been able to do this kind of stuff for a while now

      Then you must know something no graphics researchers in the world do, since Derek's work was presented as new research in ACM SIGGRAPH 2005. (ACM SIGGRAPH is by far the top graphics conference in the world; if they thought it was new and you don't, you're probably wrong.)

  22. Re:Directly applicable to the car racing AI grand. by LiquidCoooled · · Score: 1

    Granted you can extrapolate an estimate of the surroundings for a 3d scene from a single image.
    This is good when the source material doesn't exist.

    However if I were in the grand challenge I wouldn't be swapping the (minimum) stereo imaging most cars appear to have.

    1) it's an approximation and may not be applicable for different terrain or obstacles (similar rock against similar floor)
    2) it's harder to fool 2 cameras than a single one, glitches could send you off the cliff.
    3) with a stereo pair you can interpolate properly and produce a much better map.

    Humans with one eye (and single image devices) benefit greatly when given a series of images because then the same interpolation can occur and the 3d scene can be rebuilt.

    --
    liqbase :: faster than paper
  23. Google Earth by Mifflesticks · · Score: 1

    I'd like to see this applied more directly to something like Google Earth. They already have the "show buildings".... this would be a great boon to that. It might need a different shading than the grey boxes used by Google Earth as it stands now, to show which structures are derived from the 2D images, but still, I think it'd be great.

    Google, you can send me my check now, please.

    1. Re:Google Earth by cnettel · · Score: 1

      Of course this varies for different parts of the Google Earth material, but quite a lot of it is from a very steep angle. You can't tell the true height of the buildings from those pictures (maybe indirectly from shadows, but unless you know the time of day, latitude and time of year, that's a guess based on some object you think you know the size of). This algorithm is similar in scope to what we do when we face a 2D image, deciding which structures indicate depth. It still needs depth cues, arguably more obvious ones than a reasonably skilled human would need; which in this case is just about any human with functioning eyesight and an age above five years.
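
      To make the shadow argument concrete, here is a rough sketch of how time of day, latitude and time of year turn a shadow length into a height. It uses the textbook solar elevation formula and a common cosine approximation for declination; the function names and the sample numbers are hypothetical, and it assumes flat ground and a shadow length already measured in real-world units.

```python
import math

def approx_declination_deg(day_of_year):
    """Rough solar declination in degrees (common cosine approximation)."""
    return -23.44 * math.cos(2.0 * math.pi * (day_of_year + 10) / 365.0)

def solar_elevation_deg(latitude_deg, declination_deg, hour_angle_deg):
    """Sun elevation above the horizon; hour angle is 0 at local solar noon."""
    lat = math.radians(latitude_deg)
    dec = math.radians(declination_deg)
    hour = math.radians(hour_angle_deg)
    sin_elev = (math.sin(lat) * math.sin(dec)
                + math.cos(lat) * math.cos(dec) * math.cos(hour))
    return math.degrees(math.asin(max(-1.0, min(1.0, sin_elev))))

def height_from_shadow(shadow_length, elevation_deg):
    """Object height on flat ground from shadow length and sun elevation."""
    if elevation_deg <= 0:
        raise ValueError("sun is at or below the horizon")
    return shadow_length * math.tan(math.radians(elevation_deg))

# Hypothetical case: latitude 40 N, day 172 (late June), one hour after solar noon.
elevation = solar_elevation_deg(40.0, approx_declination_deg(172), 15.0)
building_height = height_from_shadow(12.0, elevation)
```

      Drop any one of the three inputs and the elevation is unknown, which is exactly why the height becomes a guess.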

    2. Re:Google Earth by Mifflesticks · · Score: 1

      Good points, but wouldn't the metadata (time of day, and date) be embedded within the original image files? Plus, the approximate latitude should be easy to determine given that they already have everything mapped onto the earth.

      I'm not arguing that everything would be able to be modeled, but every bit helps.

  24. CSI by chord.wav · · Score: 1

    This could be a revolution in the CSI field. There are already products that make 3D virtual crime scenes, but this could be applied to just about every case where a picture was taken.

    1. Re:CSI by zippthorne · · Score: 1

      Of course, the CSI version will allow you to explore the crime scene, including things that were *behind* the camera when the picture was taken.

      --
      Can you be Even More Awesome?!
  25. Facial Recognition applications. by IndustrialComplex · · Score: 1

    You are absolutely correct that it won't be able to tell what the 'reverse' side looks like, other than they will know that it has to be within certain size constraints.

    So if I'm looking at a football, I won't be able to tell what is behind it from a single picture. You would have a blind spot, that would grow based upon the vectors from the image aperture to the edges of the object.

    However, this could be a breakthrough for facial recognition. Given a facial photo, if they are able to extract the dimensions of features, it should provide another level of accuracy in the detection process.

    For example: Recognition software might limit a face to 10 possible matches, but if you then run this software, maybe only 1 has a nose that is as long, or eye sockets of a certain depth.

    --
    Out of modpoints but really liked a post? 1BDkF6TtmmeZ3yqXbz9yhdYVqRYnwFoXDj
  26. Nice... by Short+Circuit · · Score: 1

    So when is this going to be used to turn real environments into virtual environments?

    Taking reconnaissance photos and turning them into training simulations, for example. Or, closer to my level, taking photos of public places and turning them into deathmatch levels. :)

    (Always wanted to make a Quake level of my high school, but then became worried people would think I'd be the source of the next Columbine. Then I wanted to do one of my college, but then 9/11 came along, and I was worried about being investigated as a terrorist. There's freedom of speech, for you.)

    1. Re:Nice... by Anonymous Coward · · Score: 0

      No, make a Counter-Strike version, so you can bomb the school! de_Myschool, and get yourself arrested!

      Or a hostage rescue with custom hostage skins, for a cs_Myschool map. Either would be awesome.

    2. Re:Nice... by Short+Circuit · · Score: 1

      No, make a Counter-Strike version, so you can bomb the school! de_Myschool, and get yourself arrested!
      Or a hostage rescue with custom hostage skins, for a cs_Myschool map. Either would be awesome.


      OK...you're creepy. My only interest was playing an FPS in a physical environment I knew intimately. What you're describing sounds like your own fantasy social circumstance.

  27. Not for objects at all by moultano · · Score: 2, Insightful

    This is only for outdoor scenes and only extracts planar information. It isn't designed for objects at all. It provides general geometric context, i.e., this area is ground, this area is a left-facing wall, etc. That's not to say that a similar technique couldn't be used for identifying round objects, but that isn't what this is for.

  28. 3D Object Reconstruction by Anonymous Coward · · Score: 0

    Many of these techniques aren't new; some of this stuff has been happening since '96.

  29. just like my program by crodrigu1 · · Score: 0

    I wrote a program to do something similar: it converts a 2D image into a 3D image.

  30. Escher in 3D by Jboost · · Score: 1

    I think you'll find this interesting: http://www.cs.technion.ac.il/~gershon/EscherForReal/

  31. Obligatory... by Anonymous Coward · · Score: 1, Funny

    Left 30 degrees

    click click click click click

    Up twenty degrees

    click click click click click

    Enhanse

    click click click click click

    Zoom in on that

    click click click click click

    Enhanse

    click click click click click

    OK, give me a hardcopy right there.

    "More human than human is our motto"

    1. Re:Obligatory... by Anonymous Coward · · Score: 0

      the way to enhance this would be to spell enhance correctly ;-)

  32. Play with it yourself! by cranesan · · Score: 4, Interesting

    http://www.cs.cmu.edu/~dhoiem/projects/popup/index.html

    Looks like some of the software they wrote to do this has been GPL'ed.

  33. Re:Directly applicable to the car racing AI grand. by Directrix1 · · Score: 1

    Well, that and we have a gigantic corpus of training data to extrapolate from.

    --
    Occam's razor is the blind faith in the natural selection of least resistance and in universal oversimplification. -- EF
  34. Sexy by CrazyJim1 · · Score: 1

    researchers at Carnegie Mellon have found a way to allow computers to extrapolate 3 dimensional models
    I'd run it on a Victoria's Secret magazine. There are some excellent 3d models I'd like to extrapolate if you know what I mean.

  35. ESPER analysis: Blade Runner used this technology? by Anonymous Coward · · Score: 0

    I saw something unusual when I saw (again) Blade Runner.

    When examining the photo with the ESPER machine, I observed that the photo was transformed into 3D in some way. In fact I remember the mirror; perhaps in the future a mirror inside a photo could contribute information about the 3D scene...

    The ESPER machine:

    http://www.geocities.com/Hollywood/Boulevard/7920/bladeea2.html (Spanish, sorry, but it has a diagram of the scene, where "espejo" means "mirror"; there is a convex mirror)
    http://www.brmovie.com/FAQs/BR_FAQ_Terminology.htm (some information in English)

    It suddenly came to mind when I read this announcement...

    I post here once a year, so I am not registered, and forgive my spanglish :lol:

    Egocentrico.

  36. realtime 2D to 3D movie software by fsiefken · · Score: 1

    In the context of my stereoscopy hobby, for use with my eMagin Z800 VR visor, I discovered software that was able to detect some depth from the frame-to-frame movement in a movie. The tech was developed by a company called Soft4D, which doesn't exist anymore. But it seems http://www.colorcode3d.com/ sells a version of the software for use with any normal 2D DVDs and their stereoscopic 50 eurocent glasses. It sure adds some depth to a 2D movie: no true 3D effect, but still remarkable and more immersive to watch than plain 2D.

  37. Re: [OT] FFT by Anonymous Coward · · Score: 0

    1. See if your school has LabView or Matlab. Both offer FFT out of the box. One of those would have actually been my first choice for the project you're describing.

    2. If that fails, note that there are plenty of textbooks (or websites) that explain the FFT butterfly. A quick search turned up http://www.relisoft.com/Science/Physics/fft.html, which even has C++ source code available for download.

  38. Re:"Enemy of the State" - 9/11 Application by jfuredy · · Score: 1

    I have seen an example of this video enhancement technology where they have some crappy video of a car leaving a parking garage and the front license plate is completely unreadable due to grainy pixelation. But when they selected the area of the plate and compared the data from every frame of the video, it became quite clear what the license plate said. It is very convincing.
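
    The simplest version of "compare the data from every frame" is plain averaging, sketched below on synthetic data. This is not the method from the demo described above, just an illustration of why many noisy frames beat one. It assumes the frames are already aligned pixel for pixel and that the noise is independent from frame to frame; the stripe pattern, noise level and frame count are made up.

```python
import random

def average_frames(frames):
    """Per-pixel mean of equally sized, already aligned frames."""
    count = len(frames)
    return [sum(pixels) / count for pixels in zip(*frames)]

def rms_error(estimate, truth):
    """Root-mean-square difference between two equally long signals."""
    return (sum((e - t) ** 2 for e, t in zip(estimate, truth)) / len(truth)) ** 0.5

# Synthetic stand-in for a license plate: one row of dark/bright stripes.
rng = random.Random(0)
truth = [1.0 if (i // 10) % 2 else 0.0 for i in range(200)]
# 64 "video frames", each buried in independent Gaussian noise.
frames = [[p + rng.gauss(0.0, 1.0) for p in truth] for _ in range(64)]

single_error = rms_error(frames[0], truth)
stacked_error = rms_error(average_frames(frames), truth)
```

    With independent noise the error falls roughly with the square root of the number of frames, so 64 frames buy about a factor of eight. Real footage also needs the plate tracked and registered between frames, which is where most of the work goes.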

    Ever since the 9/11 conspiracy theorists started posting captured stills of the airplane hitting the tower, pointing out unknown devices strapped to the underside, I have wished that someone with access to this image processing technology would analyze the full video sequence to see if there is really anything there or not. It sure would be nice to use some high-tech tools to put this whole thing to rest.

  39. Nothing like shape from shading approaches by moultano · · Score: 2, Insightful

    Shape from shading works only on a very narrow set of objects. If you are trying to recover the shape of a marble statue, use shape from shading. If your object has color forget about it.

    What you are saying amounts to "People have done research into computer vision in the past, therefore any new research into computer vision is soooo not new."

  40. Enhance by Anonymous Coward · · Score: 0

    Obligatory Blade Runner quote:

    Enhance 224176
    Enhance, Stop
    Move in, Stop
    Pull out, Track right, Stop
    Center in, Pull back, Stop
    Track 45 right, Stop
    Center and Stop
    Enhance 34 to 36
    Pan right and pull back, Stop
    Enhance 34 to 46
    Pull back, Wait a minute, Go right, Stop
    Enhance 5719
    Track 45 left, Stop
    Enhance 15 to 23
    Give me a hard copy right there.

  41. Machine learning by sc0p3 · · Score: 1

    Unfortunately this is done by neural learning techniques, "machine learning". So it is essentially randomly taught artificial neurons and the researchers have no idea how the machine solves it. However, machine learning techniques, or Artificial Neural Networks (ANN), have a lot of potential as custom ICs and computing power become better and better.

    1. Re:Machine learning by NNWizard · · Score: 1

      There are many other methods available for machine learning than artificial neural networks. Moreover, ANN are not 'randomly thaught': the algorithms used are well defined and both mathematically and statistically sound; only those who do not understand the math behind it actually think of it as magic...

    2. Re:Machine learning by sc0p3 · · Score: 1

      rofl.. only cults and academics claim ignorance as an argument. I am a biomedical engineer and have studied ANN & applied different networks to numerous problems. I solve math modelling problems daily. ANN is guided randomness.

    3. Re:Machine learning by NNWizard · · Score: 1

      Well we moved from 'randomly thaught' to 'guided randomness'. That is a good step, any statistical method can be considered 'guided randomness'. The few things that can bring us far from randomness are prior knowledge and assumptions. Since ANN are nonlinear and nonparametric, they make few assumptions so maybe they are more randomized than other statistical methods. But of course IANABE.

    4. Re:Machine learning by sc0p3 · · Score: 1

      I said randomly taught, not randomly thaught; you are thinking "thought". ANN is randomly taught. As in, the teaching data is randomly introduced to the ANN; it's not something that's predicted and must be repeated exactly to achieve the same results. taught - (all used chiefly with qualifiers 'well' or 'poorly' or 'un-') having received specific instruction

    5. Re:Machine learning by Anonymous Coward · · Score: 0

      Sorry for the extra h, I am thinking 'taught'. Nothing forces you to randomize the inputs in the learning phase of the network. It is even better not to randomize in practice, especially if you are cross validating the meta/hyper parameters of the network and you have few samples. Cross validation (and other resampling techniques) is often inefficient if the distributions in each subsample are not very similar. As far as the initialization of the weights is concerned, randomization is not the best option: many methods have been designed to set the initial weights according to the distribution of the learning data to optimize some criterion that is deterministic.
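
      One way to keep the subsamples similar without any shuffling is a deterministic, stratified split, sketched below. It is only an illustration of the idea: the function name and the round-robin rule are assumptions, and it balances class labels only, not the full input distribution.

```python
from collections import defaultdict

def stratified_folds(labels, k):
    """Assign each sample index to one of k folds, class by class, round-robin.

    Deterministic: the same labels always give the same folds.
    """
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    position = 0
    for label in sorted(by_class):
        for index in by_class[label]:
            folds[position % k].append(index)
            position += 1
    return folds

# Hypothetical tiny data set: six samples of class 0, three of class 1.
labels = [0, 0, 0, 0, 0, 0, 1, 1, 1]
folds = stratified_folds(labels, k=3)
class_counts = [sorted(labels[i] for i in fold) for fold in folds]
```

      Each fold ends up with the same two-to-one class mix as the whole set, and rerunning it gives the same folds every time.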

  42. something practical by PMuse · · Score: 1

    Now if only they could teach this to my dogs.

    --
    "We reject as false the choice between our safety and our ideals." --The American President (20.1.2009)
  43. I'd like to see it deal with mouhefanggai by smellsofbikes · · Score: 1

    otherwise known as a Steinmetz solid, which is often used as a demonstration for engineering drawing or architecture classes to show that a 3-D drawing of an object is not sufficient to determine its actual shape. A mouhefanggai in 3-D drawings looks like a sphere, but is actually a ridged object with a surface consisting entirely of flat-wrapped curves, rather than compound curves.

    --
    Nostalgia's not what it used to be.
  44. 3D Movies by GeeksHaveFeelings · · Score: 0

    Imagine what this could do for converting a 2D film to 3D. With the appropriate technology, we could have 3D movies that are worth a darn.

  45. Prior art by SixDimensionalArray · · Score: 1

    Hmm let me see here.. what could be considered prior art?

    Maybe Pablo Picasso's Guernica??!?! Man, that Picasso was waaaay ahead of his time!

    *watches out for rotten tomatoes*

    SixD

  46. Tanfastic! by TwelveInches · · Score: 0

    This algorithm will breathe life into my old porn collection!

  47. Well... by Ayanami+Rei · · Score: 1

    This all pre-supposes you can translate the diagram accurately and position it in the 3d world. You'd probably need GPS readings at different points on the building, and on the camera to get decent results.

    And you need a light model and surface texture models (or a lot of pictures from different angles).

    So this isn't trivial. But it's doable. Such techniques are used in film for scene composition and for texturing 3d representations of real-world objects.

    It's not like you can just take a picture of a building and have your 3d modeling tool figure out what it's a picture of and create a new texture artifact, etc.
    You have special tools and workflows to do this. I doubt they'll be bundled with AutoCAD any time soon.

    But there's nothing ground breaking about:
    1) take picture of things that you have a model of
    2) derive textures for model for arbitrary model viewing

    --
    THIS THING CAN TURN ON A DIME, MACROSSZERO STYLE ALSO FUCK BETA, ~NYORON
    1. Re:Well... by jackbird · · Score: 2, Interesting
      I've used Photomodeler and Canoma, and made camera mapped environments in 3D software by hand for years. It is incredibly nontrivial. It is a lot of blood, sweat, tears, handpainting, and a not-so-terribly good result. Some typical problems:
      • Camera barrel distortion
      • chromatic aberrations
      • hot colors in high-contrast areas of digital photos
      • JPEG compression artifacts
      • specular highlights and reflections
      • lens flares and blooms from those specular highlights and reflections
      • clipped/out of gamut areas
      • occluding objects like trees, parked cars, signs, telephone poles, pedestrians, trashcans, newspaper vending machines, etc., etc., etc.
      • occluding objects like other buildings in aerial photos
      • only being able to shoot certain details from awkward angles
      • not being able to shoot certain details from any angle at all
      • horrendous texture stretching
      • perspective problems with concave/convex detail like window ledges, cornices, awnings, etc., etc., etc.
      • stuff you forgot to photograph
      • different lighting conditions when you go back out to shoot the stuff you forgot to photograph
      • unavailable architectural drawings
      • paper architectural drawings
      • poorly-reproduced paper architectural drawings from 1912
      • architectural drawings that bear no resemblance to the conditions onsite
      • CAD files aligned to state survey coordinates so large that the single-precision floats in most 3D software start scrambling the model due to rounding errors.

        As I said, nontrivial.
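
        The last item on that list is easy to reproduce. The sketch below rounds a double to single precision with the standard `struct` module; the coordinate value and the local-origin workaround are hypothetical examples, not a description of any particular 3D package.

```python
import struct

def to_float32(value):
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", value))[0]

# Hypothetical survey coordinate: about two million units from the grid origin.
easting = 2_000_000.03

# Stored directly as float32, the 0.03 offset is lost: between 2**20 and 2**21
# adjacent float32 values are 0.125 apart.
stored_directly = to_float32(easting)

# Assumed workaround: subtract a local origin in double precision first,
# then hand the small remainder to the single-precision software.
local_origin = 2_000_000.0
stored_locally = to_float32(easting - local_origin)
```

        Stored directly, every vertex snaps to a 0.125-unit grid, which is what scrambles fine detail; moved near the origin first, the same offset survives to better than a millionth of a unit.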

  48. Shape from shading is widely applicable by amightywind · · Score: 1

    Shape from shading works only on a very narrow set of objects. If you are trying to recover the shape of a marble statue, use shape from shading. If your object has color forget about it.

    Not true at all. If you understand the photometric function of the materials in the scene, variation due to color can be separated from variation due to shading. Image classification techniques are useful for doing this. This is discussed in the book and elsewhere. We used the technique for Voyager II to measure topography of Uranus and Neptune satellites. Stereo pairs were not often available.

    What you are saying amounts to "People have done research into computer vision in the past, therefore any new research into computer vision is soooo not new."

    Ding! Wrong again. The lesson is that Slashdot editors should be careful to refrain from hyperbolic descriptions of results that are really incremental rather than revolutionary. And readers like you should not swallow the summaries hook, line, and sinker.

    --
    an ill wind that blows no good
  49. Already have it! by Anonymous Coward · · Score: 0
    With this technology, you could ask for "things that are round, and have a box".
    I already have this technology. It's my mobile phone. When I tell it to find me something round that has a box, it calls your mom.
  50. Need more on Geometric Hashing! by VlartBlart · · Score: 0

    Got any links on this? Not much Wiki info on Geometric Hashing - I remember my physics teacher at school explaining why a robot has these problems recognising stuff so I'm kinda interested (this was in the 80's so I may have missed some of the newer stuff).

    1. Re:Need more on Geometric Hashing! by jsharkey · · Score: 1

      There's an excellent paper by Wolfson and Rigoutsos called "Geometric Hashing: An Overview." You can find a PDF copy on Google Scholar.

      For some other good sources on Geometric Hashing, see the References on my Final Paper.

  51. Wow... by Anonymous Coward · · Score: 0

    Imagine what this could do for the porn industry ;)

  52. Re:Directly applicable to the car racing AI grand. by zippthorne · · Score: 1

    Glitches can't send you over a cliff. Maps & GPS (or inertial) keep you off the cliffs. Glitches could send you into a ditch or onto a bush, either of which would be difficult to extract from.

    --
    Can you be Even More Awesome?!
  53. As a fellow Computer Vision researcher by Ruins · · Score: 1

    How impressive this research really is won't be known until we can have a look at their methods, algorithms and training data set. I have a feeling that the novel aspect of their work is not in the extraction of features, or the method used to determine whether a surface is vertical or horizontal. As others have already said, shape from shading (think shading a lit cube with a pencil on paper) and even geometric approaches can get you a 3D model from 2D images. It all depends on the assumptions you make beforehand. However, if their system learned the classification from the 300+ images it got fed, that would be pretty impressive, even though they most likely hand labelled the images as mostly vertical, a bit of both, etc.

    On a side note, Kanade is a very influential researcher in computer vision, and one with a massive and solid body of novel work. If he said that the CMU researchers' work is good and novel, it adds quite a bit of weight to their claims.

    Oh, and to those that said images of buildings are not "everyday", keep in mind that most research papers I have seen operate on handcrafted images, sometimes of a single kind of object. Being able to handle arbitrary images of buildings is very, very general and "everyday".

    --
    Berserk Manga > All
  54. No by Anonymous Coward · · Score: 0

    > So we're one step closer to actually being able to do the dramatic
    > image-enhancing stuff that's routine in film and television crime drama?

    By a strange coincidence, we (the CMU graphics lab) have been making fun of that very piece of cinematic fiction for the last week or so.

    If all you have is a single image, there's nowhere to get additional information from to do that magical "enhance" trick movies so love. If you've got a video stream there may be stuff you can do, but video data is typically stored at substantially lower resolution than still images due to space considerations, so it's pretty unlikely you're going to get magic out of it. Basically, the magic "enhance" button is pure fiction, and likely to remain so for the foreseeable future. That's not to say no image processing is possible or useful - stuff is done all the time - but it's not like what movies would have you believe.

    On the other hand, neither are movies' portrayals of being a cop, being a programmer, or being a pretty coed spending a weekend at an abandoned cabin in the woods remotely accurate, so why should this be any different?

  55. No grats due. by Codename.Juggernaut · · Score: 1

    When it comes down to it, these men are shaking hands about teaching a computer to read Magic Eyes.

    Isn't that like a second year problem at most universities?

    1. Re:No grats due. by Anonymous Coward · · Score: 0

      computer vision in general isn't a second year problem at any, or nearly any, university. let alone developing a nn that reconstructs 3d scenes based on individual 2d images.

  56. Only three procent... by Mr+Europe · · Score: 1

    "Only about three percent of surfaces in a typical photo are at an angle, they have found."

    Doesn't it depend on whether the photo is of a city and man-made objects or of nature, trees and mountains...

  57. News? by Elecorn · · Score: 1

    Uh, this has already been done before.

    And this is news because...

    --
    Mike D. Smith http://www.elecorn.com
  58. Geordi on the holodeck by sciencecneisc · · Score: 1

    In this rather disturbing episode in the fourth season of TNG, engineer Commander Geordi La Forge has to investigate why he and three colleagues are drawn to a particular small area of a planet, to their peril. There's visual evidence from their first trip that he keeps poring over until he decides to focus on a small shadow and take that frame to the holodeck. The holodeck gives the 2D shadow 3D shape, both proving there was something else in the room and more. We see it's not human. In the fiction he tells the computer to extrapolate the shadow based on the size of an average human, say 5'8", and using personnel logs the other crewmen can be rendered in 3D completely. Most of it though is from a 2D frame in a video. Geordi has a weird costume at the end of this episode...blue man group anyone?