Slashdot Mirror


Making 3D Models from Video Clips

BoingBoing is covering an interesting piece of software called VideoTrace that allows you to easily create 3D models from the images in video clips. "The user interacts with VideoTrace by tracing the shape of the object to be modeled over one or more frames of the video. By interpreting the sketch drawn by the user in light of 3D information obtained from computer vision techniques, a small number of simple 2D interactions can be used to generate a realistic 3D model."

23 of 103 comments

  1. Terrible link by masterz · · Score: 5, Informative

    wow, what a terrible link.

    A quick search turns up the project homepage http://www.acvt.com.au/research/videotrace/

    1. Re:Terrible link by apankrat · · Score: 4, Insightful

      Outside of /. this sort of news "wrapper" article (BB or not) is considered blog spam. There is absolutely no reason to link to a wrapper when it just rehashes what's in the original article and then forwards to it for details (which is what a vast majority of readers would want anyways).

      --
      3.243F6A8885A308D313
    2. Re:Terrible link by wdebruij · · Score: 2, Funny

      which is what a vast majority of readers would want

      Are we on the same site? What is this "article" you talk of?

  2. Another step towards AI by CrazyJim1 · · Score: 3, Interesting

    AI needs a way of interpreting video input into 3d objects and environment. Once a computer can represent objects in a 3d environment, it can then perform operations on them. Technically you could make AI without this tool, but you'd have to do extremely precise and patient CAD inputs that would take most of your life. With a tool to convert video into 3d objects, you can just start cataloging all the objects out there. Add in a 3d physics simulator, and you're halfway to true AI. I have a quick overview on how to do AI, and as you'll note on the very beginning of the page: the reason I haven't worked on AI myself is that I can't code a video->3d object converter myself.

    1. Re:Another step towards AI by QuantumG · · Score: 4, Interesting

      Have you heard of the Scale Invariant Feature Transform? Well you have now. There are libraries written in C# (no less) which are publicly available to do this stuff. You can recognize a large collection of objects.

      --
      How we know is more important than what we know.
    2. Re:Another step towards AI by kudokatz · · Score: 5, Interesting

      SIFT is ok even for occluded objects, but is horrid in 3-d because SIFT features cannot match up for a significantly rotated scene. There are better algorithms that can recover both the shape of the scene as in the article and even produce the location of the camera as a by-product.

      In terms of object recognition, there has been great work done by treating an "nxn" pixel image as a point in n^2 space, and then reducing the computation space and projecting a given image onto that new, lower-dimensional approximation of the original object, and finding a match via a nearest-neighbor search through recognized objects.

      There is also good work being done in terms of getting a detailed 3-d model using structured light methods: http://www.prip.tuwien.ac.at/research/research-areas/3d-vision/structured-light

      There is good literature out there, but sometimes the math gets over my head =P
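
      The "image as a point in n^2 space" idea above can be sketched in a few lines. This is only a toy illustration, not any particular published system: the 2x2 "images", the labels, and the helper names (principal_axes, project, recognise) are all made up for the example, and the principal directions are approximated with plain power iteration rather than a proper eigen-solver.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def principal_axes(centered, k, iters=100, seed=0):
    """Approximate the top-k principal directions by power iteration with deflation."""
    rng = random.Random(seed)
    dim = len(centered[0])
    axes = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        for _ in range(iters):
            # Remove the directions that were already found.
            for a in axes:
                p = dot(v, a)
                v = [vi - p * ai for vi, ai in zip(v, a)]
            # Multiply by the (unnormalised) covariance: sum over rows of (row . v) * row.
            w = [0.0] * dim
            for row in centered:
                s = dot(row, v)
                w = [wi + s * ri for wi, ri in zip(w, row)]
            norm = math.sqrt(dot(w, w))
            if norm < 1e-12:  # no variance left in this direction
                break
            v = [wi / norm for wi in w]
        axes.append(v)
    return axes

def project(vec, mean, axes):
    """Coordinates of an image in the reduced space."""
    centered = [x - m for x, m in zip(vec, mean)]
    return [dot(centered, a) for a in axes]

def recognise(query, gallery, k=2):
    """Label of the gallery image nearest to the query in the reduced space."""
    labels = list(gallery)
    rows = [gallery[name] for name in labels]
    mean = [sum(col) / len(rows) for col in zip(*rows)]
    centered = [[x - m for x, m in zip(r, mean)] for r in rows]
    axes = principal_axes(centered, k)
    q = project(query, mean, axes)
    return min(labels, key=lambda name: math.dist(q, project(gallery[name], mean, axes)))

# Hypothetical 2x2 images flattened to 4 pixel values each.
gallery = {
    "bar_h": [9, 9, 0, 0],
    "bar_v": [9, 0, 9, 0],
    "diag": [9, 0, 0, 9],
}
print(recognise([8, 1, 9, 1], gallery))  # a noisy vertical bar; prints bar_v
```

      With real images the gallery would hold thousands of pixels per entry and far fewer axes than pixels, which is where the saving in computation comes from.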

    3. Re:Another step towards AI by CrazyJim1 · · Score: 4, Interesting

      I get that a lot. Blind people still have a 3d imagination. They need to know where the doors are, where the stairs are, and where objects they use are. You need a 3d imagination space to have AI and that is the primary reason that past attempts at making AI have failed. I love to watch the advances in video card technology and the competition between NVIDIA and ATI because the more they work, the easier it will be to do AI, and all computer advances for that matter. I think I could start some basic AI with this 3d recognition software with the hardware of an average modern desktop. I think it is just a software problem and not necessarily a hardware one. We'll see. I'm going to keep in touch with this group and see if they let me use their software because I'm an unemployed coder and I might as well work on AI because some group has to do it. I'll make it an open source project on SourceForge and maybe extra coders will jump on.

    4. Re:Another step towards AI by ADRenalyn · · Score: 2, Interesting

      Navigation needs 3D and that already works.

      Navigation might work, but it's far from perfect, or even good.

      It's nice that your robot can tell when something is blocking its way. But how does it know when there is nothing left to walk/drive on? For instance, a stair leading down, or a change in materials (from sand to water, or asphalt to ice) that would prevent it from moving properly? Can it tell that certain variations are normal (a rug, or different colored tiles on a ceramic floor) and some are dangerous (the edge of an in-ground pool)?

      When a robot/computer can tell that something is in its way, figure out what that object is, and work out whether it can be moved (safely, and to where), then we're approaching *decent* AI.

  3. Software for 2D images for 3D models is not new by bn0p · · Score: 5, Informative

    Software like Canoma from the now-defunct MetaCreations let you create 3D models from 2D images back in the mid-to-late 90s. I also remember reading about people using REALVIZ ImageModeler to convert images from video to models even though the software was designed for still images - the users would just capture the frames they needed to create the 3D model.

    The only thing "new" about this is using video as the input without having to grab the individual frames yourself.

    --
    Never let reality temper imagination
    1. Re:Software for 2D images for 3D models is not new by Anonymous Coward · · Score: 3, Interesting

      Actually, algorithmically, you can make a substantial leap in processing capabilities when you switch from feeding in series of still images to video. This may seem a bit counterintuitive, since a video is just a series of still images, but the key is that a video is a continuous series of still images.

      The main problem with existing techniques is that they often require a lot of user interaction to create a complete model, because points between images have to be delineated and correlated by hand, or at best with some minimal computer assistance.

      A video-based process can take advantage of the fact that changes between the images will be relatively small, and follow definite trajectories, which would allow an appropriate algorithm to identify and correlate features with almost no manual intervention. This would be an absolutely huge improvement in usability, although it's not an easy problem by any means.

      For example, the program may be able to easily isolate objects from the background by tracking differences in how points move due to perspective. This can be done with discontinuous still pictures too, but there it is much harder to say with any confidence which points correlate with which under arbitrary changes in point of view.

      To give an analogy, it'd be like giving you a picture of a whole egg, and a picture of a crushed egg, and asking you to try and accurately trace back where individual pieces of the shell came from. It'd be much, much easier if you had a video of the egg being smashed, where you could trace out, frame by frame, where individual pieces came from.

      It's not the same problem, but for a computer, it's comparably hard. For a human being, if the egg wasn't smashed, it'd be relatively simple to pick out which points relate to which, but that's only because we have a sophisticated image recognition system that allows us to reason about shapes. If you happen to have two pictures of an unfamiliar object from radically different points of view, it can be quite tricky to decide what the whole object must look like. Show a video of the same object, moving around between different points of view, and it's not nearly as hard.
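
      A toy sketch of why the small frame-to-frame steps matter, assuming features have already been reduced to 2D points and using the crudest possible matcher (nearest neighbour). The point coordinates and the nearest/track helpers are invented for illustration and are not how VideoTrace or any real tracker is implemented.

```python
import math

def nearest(point, candidates):
    """Index of the candidate closest to the given point."""
    return min(range(len(candidates)), key=lambda i: math.dist(point, candidates[i]))

def track(frames):
    """Follow each point of frames[0] through the sequence, one frame at a time.

    Returns, for every starting point, the index it ends up matched to in the last frame.
    """
    idx = list(range(len(frames[0])))
    for prev, cur in zip(frames, frames[1:]):
        idx = [nearest(prev[i], cur) for i in idx]
    return idx

# Two hypothetical features: a distant one (index 0) drifting 1 px per frame and
# a nearby one (index 1) drifting 2 px per frame because of parallax.
frames = [[(0 + t, 0), (5 + 2 * t, 0)] for t in range(6)]

video_matches = track(frames)                    # every intermediate frame available
still_matches = track([frames[0], frames[-1]])   # only the first and last "photo"

print(video_matches)  # [0, 1]: both features followed correctly
print(still_matches)  # [0, 0]: the nearby feature is confused with the distant one
```

      The matcher is identical in both runs; only the spacing between views changes, which is the point being made above about continuity.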

    2. Re:Software for 2D images for 3D models is not new by samkass · · Score: 4, Informative

      Yeah, the big breakthrough in this, IMHO, was a 1994 paper by Takeo Kanade of CMU's Robotics Institute titled "A Sequential Factorization Method for Recovering Shape and Motion from Image Streams", which did a pretty good job of factorizing out the 3D model as well as the camera motion from a video stream... it could tell you not only the dimensions of the house you were videotaping, but the stride of the person holding the camera. This laid the groundwork for a lot of other "model from video" work done throughout the 90's. More recently a group there has done a lot of work on "Shape from Silhouette" which looks closer to the technology that this product uses.

      I've been waiting for this technology to go big on eBay for a decade... maybe this'll be the year.

      --
      E pluribus unum
  4. computer vision technology is pretty wild by jollyreaper · · Score: 3, Insightful

    Remember back in the day when we were told that computers would never be able to learn how to understand human speech because it's too complicated? The arguments were compelling but now we've got voice recognition working over crappy telephone connections and dictation software is getting better all the time. As bad as the voice recognition problem was, computer vision seemed like an even harder nut to crack given how impossible it seemed to get a machine to go from a two-dimensional image to 3D. All of this stuff seems like impossibly difficult "we'll never get there" AI impossibilities and then we see a technology demonstration that nails it. I'm still astounded that DARPA is not only asking for robot-driven cars, they're actually getting teams producing working results. That's another problem I always thought would be impossible.

    My prediction for the future: the 21st century will be for robotics what the 20th was for aviation. We've been thinking about it for centuries but now the technology is maturing to the point that we can really do something with it. The stuff we're amazed by today is going to seem like wood and canvas biplanes.

    --
    Kwisatz Haderach
    Sell the spice to CHOAM
    This Mahdi took Shaddam's Throne
    1. Re:computer vision technology is pretty wild by MobileTatsu-NJG · · Score: 4, Interesting

      Remember back in the day when we were told that computers would never be able to learn how to understand human speech because it's too complicated? The arguments were compelling but now we've got voice recognition working over crappy telephone connections and dictation software is getting better all the time. As bad as the voice recognition problem was, computer vision seemed like an even harder nut to crack given how impossible it seemed to get a machine to go from a two-dimensional image to 3D. All of this stuff seems like impossibly difficult "we'll never get there" AI impossibilities and then we see a technology demonstration that nails it. I'm still astounded that DARPA is not only asking for robot-driven cars, they're actually getting teams producing working results. That's another problem I always thought would be impossible.

      Hmm. Though it's not really that clear from your post, I'm concerned that you're seeing one problem where really there are two. In the case of voice recognition, getting a computer to recognize a spoken word within a certain context is far easier than getting the computer to understand a phrase like "Set up an appointment for me on the Fifth of May at 2 pm.". One is simple signal analysis, the other is context-sensitive understanding. The former is easy and has been possible for years. The latter is virtually impossible without the computer in question having 'experience'.

      The same is true for image recognition. You can get a computer to recognize movement pretty easily. Heck, the ability for software to detect the 3d form of an object has been around for ages. However, getting a computer to watch Star Wars and say "I see Denis Lawson sitting inside an X-Wing fighter." is, as I said before, difficult to do without a concept of 'experience'.

      We'll get there one of these days, but right now the sorts of cool-sounding advancements we've been seeing really only work in very specific circumstances.

      --
      "I like to lick butts!" by MobileTatsu-NJG (#32700246) (Score:5, Informative)

  5. Oh yeah ? by witte · · Score: 3, Funny

    I'd like to see how it holds up against Calista Flockhart footage and not go Division By Zero.

  6. This sounds like a project I did some work on by markds75 · · Score: 5, Interesting

    I'm a Ph.D. student at UC Santa Cruz. I finished my master's a few years ago working on enhancements to a project with similar goals. My advisor, Jane Wilhelms (who unfortunately died shortly after I finished my master's), was working on computer vision techniques for several years. Her work focused on extracting motion for animals (often children or horses) out of videos. My master's contribution was to look at how the accuracy and usability of the software could be improved if we assume that the general motion of a walk is the same for all instances of a particular species (the knees all bend the same way, and the legs move in the same order, etc). I didn't have a high quality capture to start with, so the results were a bit fuzzy in terms of accuracy, but it did make the process easier for the user. The user had only to make the "original" motion match the video at key frames (maybe 4 per "walk cycle"), and the computer could easily interpolate the rest; I don't recall off the top of my head, but I think the number of key frames the user had to specify was reduced by half or more over the former process (without the canonical motion as a starting point). I didn't publish any papers based on my work, but my master's thesis (with example filmstrips) is available.

  7. Test case by kramulous · · Score: 3, Interesting

    Hook up the Google Maps API with a polar navigated flight path and some edge/point detection algorithms, and start mapping. That'd be an interesting video.

    --
    .
  8. "True AI"? by Anonymous Coward · · Score: 2, Insightful

    Add in a 3d physics simulator, and you're halfway to true AI.

    I've never heard of "true AI" -- do you mean strong AI?

    And no, computer vision plus physics simulation does not make half of strong AI, either. Russell and Norvig, the classic AI text, lists 9 abilities generally required for strong AI. 2 is not half of 9.

    I have a quick overview on how to do AI, and as you'll note on the very beginning of the page [geocities.com]: the reason I haven't worked on AI myself is that I can't code a video->3d object converter myself.

    I don't know what your dead geocities page has, but not working on AI because you can't write a video->3d object converter is like not working on video compression because you can't act.
  9. Youtube by Anonymous Coward · · Score: 5, Informative
  10. Re:I for one... by Anonymous Coward · · Score: 2, Funny

    When I was in grad school, I knew a fellow who was working on similar technology. I don't think he got anywhere near as advanced as this, but he did get good enough that given 10 to 15 still images, his software could create a primitive 3D model.

    Unfortunately for him, he tried to make a 3D model of his erect penis. I'm not sure if he realized it or not, but he wasn't very well hung (he's Korean). Well, at one of the presentations he had to make regarding his work, he accidentally opened up the model of his penis. He couldn't even deny that it was his, since his name was in the filename. And his supervisor, an older woman, just couldn't stop laughing. He did go on to get his degree, but I think his pride took a real beating.

  11. Similar concept for my thesis by ZedarSlash · · Score: 2, Interesting

    In my thesis I'm also creating a 3d model from a video stream, only I'm using stereoscopy and pattern recognition to find matching objects in each frame and triangulating the depth to said objects. By the end I'm hoping to reduce the objects to small pixel clusters; the tricky part is that all this is happening in real-time. By mounting the cameras on a device where the point of view is known, it could be used to map out any static terrain by just navigating through it. Adding more cameras from different perspectives increases the completeness of the generated model. The article has definitely got the right idea. With sufficient object detection and tracking algorithms, you could minimise or eliminate the need to draw the template.
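
    For the triangulation step, a minimal sketch under textbook assumptions: two identical pinhole cameras, rectified and mounted side by side, so a feature at column x_left in one image and x_right in the other has depth Z = f * B / (x_left - x_right). The function name, parameters, and numbers below are made up for illustration and say nothing about how the thesis code actually does it.

```python
def triangulate(x_left, x_right, y, focal_px, baseline_m, cx=0.0, cy=0.0):
    """Recover (X, Y, Z) in metres for one matched feature in a rectified stereo pair.

    x_left, x_right: column of the feature in the left/right image (pixels)
    y:               row of the feature, identical in both images after rectification
    focal_px:        focal length in pixels (assumed equal for both cameras)
    baseline_m:      distance between the two camera centres in metres
    cx, cy:          principal point (image centre) in pixels
    """
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("non-positive disparity: point at infinity or a bad match")
    z = focal_px * baseline_m / disparity   # depth shrinks as disparity grows
    x = (x_left - cx) * z / focal_px        # back-project through the left camera
    y3 = (y - cy) * z / focal_px
    return (x, y3, z)

# Hypothetical rig: 700 px focal length, cameras 10 cm apart, 640x480 images.
point = triangulate(400, 365, 240, focal_px=700, baseline_m=0.1, cx=320, cy=240)
print(point)  # depth component is about 2 metres for a 35 px disparity
```

    Since depth goes as 1/disparity, a one-pixel matching error matters far more for distant objects than for near ones, which is one reason extra cameras or a wider baseline help.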

  12. What, all these comments by SeaFox · · Score: 4, Funny

    ...and no one is going to make a porn joke?

  13. Re:Wake me when... by pnewhook · · Score: 3, Insightful

    We're a heck of a lot closer with this than without it. This is a huge step in that direction.

    Actually our company has had technology more advanced than that described in the article for years. With ours you simply pan the camera around and the model creation is fully automatic - there is no need to trace the image at all.

    It's called Instant Scene Modeller and here's a link to a demo of the technology for anyone that's interested: http://www.demo.com/demonstrators/demo2005/54188.php

    --
    Tesla was a genius. Edison however was an overrated hack who liked to torture puppies.
  14. linking to wrappers is probably good by someone1234 · · Score: 2, Insightful

    It surely mitigates the slashdot effect.

    --
    Patents Drive Free Software as Hurricanes Drive Construction Industry