Making 3D Models from Video Clips
BoingBoing is covering an interesting piece of software called VideoTrace that allows you to easily create 3D models from the images in video clips. "The user interacts with VideoTrace by tracing the shape of the object to be modeled over one or more frames of the video. By interpreting the sketch drawn by the user in light of 3D information obtained from computer vision techniques, a small number of simple 2D interactions can be used to generate a realistic 3D model."
AI needs a way of turning video input into 3D objects and environments. Once a computer can represent objects in a 3D environment, it can then perform operations on them. Technically you could make AI without this tool, but you'd have to do extremely precise and patient CAD inputs that would take most of your life. With a tool to convert video into 3D objects, you can just start cataloging all the objects out there. Add in a 3D physics simulator, and you're halfway to true AI. I have a quick overview on how to do AI, and as you'll note at the very beginning of the page: the reason I haven't worked on AI myself is that I can't code a video->3D object converter myself.
God spoke to me.
for finding the one BoingBoing post that's not about Doctorow's Disney fetish, or Xeni's insistence that she is, in fact, not a he.
Actually, algorithmically, you can make a substantial leap in processing capabilities when you switch from feeding in series of still images to video. This may seem a bit counterintuitive, since a video is just a series of still images, but the key is that a video is a continuous series of still images.
The main problem with existing techniques is that they often require a lot of user interaction to create a complete model, because points between images have to be delineated and correlated by hand, or at best with some minimal computer assistance.
A video-based process can take advantage of the fact that changes between the images will be relatively small, and follow definite trajectories, which would allow an appropriate algorithm to identify and correlate features with almost no manual intervention. This would be an absolutely huge improvement in usability, although it's not an easy problem by any means.
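To make the "small changes between frames" point concrete, here is a rough sketch of my own (not anything VideoTrace is known to do): if feature points only move a few pixels per frame, a plain nearest-neighbour search is already enough to pair them up. The function name `match_features` and the `max_move` threshold are made up for illustration.

```python
import math

def match_features(prev_pts, next_pts, max_move=5.0):
    """Greedily pair each point in prev_pts with the closest unused point
    in next_pts, assuming no point moves more than max_move pixels
    between consecutive frames (hypothetical threshold)."""
    matches = {}
    used = set()
    for i, (px, py) in enumerate(prev_pts):
        best_j, best_d = None, max_move
        for j, (nx, ny) in enumerate(next_pts):
            if j in used:
                continue
            d = math.hypot(nx - px, ny - py)
            if d <= best_d:
                best_j, best_d = j, d
        if best_j is not None:
            matches[i] = best_j
            used.add(best_j)
    return matches

# Two features that each drift by less than a pixel between frames.
frame_a = [(0.0, 0.0), (10.0, 10.0)]
frame_b = [(10.5, 10.2), (0.3, -0.2)]
print(match_features(frame_a, frame_b))
```

With two stills taken from very different viewpoints, no sensible `max_move` exists and this simple search falls apart, which is where the manual correlation comes in.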
For example, the program may be able to easily isolate objects from the background by tracking differences in how points move due to perspective. That can be done with discontinuous still pictures too, but there it is much harder to say with any confidence which points correlate with which under arbitrary changes in point of view.
To give an analogy, it'd be like giving you a picture of a whole egg, and a picture of a crushed egg, and asking you to try to accurately trace back where individual pieces of the shell came from. It'd be much, much easier if you had a video of the egg being smashed, where you could trace out, frame by frame, where individual pieces came from.
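A toy version of the perspective trick, assuming the points have already been matched between two frames and the camera is sliding sideways, so nearer points shift more than distant ones. The function name and the "split at the biggest gap" rule are my own simplifications, not a real segmentation algorithm.

```python
import math

def split_by_parallax(prev_pts, next_pts):
    """Split matched points into (near, far) index lists.

    prev_pts[i] and next_pts[i] are the same feature in two frames.
    Points are sorted by how far they moved and cut at the largest
    jump in displacement (an illustrative heuristic only)."""
    moves = [math.hypot(nx - px, ny - py)
             for (px, py), (nx, ny) in zip(prev_pts, next_pts)]
    if len(moves) < 2:
        return list(range(len(moves))), []
    order = sorted(range(len(moves)), key=moves.__getitem__)
    # Position in the sorted order where consecutive displacements differ most.
    gap_at = max(range(1, len(order)),
                 key=lambda k: moves[order[k]] - moves[order[k - 1]])
    far = sorted(order[:gap_at])    # small motion: background
    near = sorted(order[gap_at:])   # large motion: foreground object
    return near, far

before = [(0, 0), (10, 0), (20, 0), (30, 0)]
after = [(1, 0), (18, 0), (21, 0), (39, 0)]
near, far = split_by_parallax(before, after)
print("near:", near, "far:", far)
```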
It's not the same problem, but for a computer, it's comparably hard. For a human being, if the egg wasn't smashed, it'd be relatively simple to pick out which points relate to which, but that's only because we have a sophisticated image recognition system that allows us to reason about shapes. If you happen to have two pictures of an unfamiliar object from radically different points of view, it can be quite tricky to decide what the whole object must look like. Show a video of the same object, moving around between different points of view, and it's not nearly as hard.
I'm a Ph.D. student at UC Santa Cruz. I finished my master's a few years ago working on enhancements to a project with similar goals. My advisor, Jane Wilhelms (who unfortunately died shortly after I finished my master's), had been working on computer vision techniques for several years. Her work focused on extracting motion for animals (often children or horses) out of videos. My master's contribution was to look at how the accuracy and usability of the software could be improved if we assume that the general motion of a walk is the same for all instances of a particular species (the knees all bend the same way, the legs move in the same order, etc). I didn't have a high quality capture to start with, so the results were a bit fuzzy in terms of accuracy, but it did make the process easier for the user. The user had only to make the "original" motion match the video at key frames (maybe 4 per "walk cycle"), and the computer could easily interpret the rest; I don't recall off the top of my head, but I think the number of key frames the user had to specify was reduced by half or more compared with the former process (without the canonical motion as a starting point). I didn't publish any papers based on my work, but my master's thesis (with example filmstrips) is available.
Hook up the Google Maps API with a polar navigated flight path and some edge/point detection algorithms, and start mapping. That'd be an interesting video.
The same is true for image recognition. You can get a computer to recognize movement pretty easily. Heck, the ability for software to detect the 3D form of an object has been around for ages. However, getting a computer to watch Star Wars and say "I see Denis Lawson sitting inside an X-Wing fighter" is, as I said before, difficult to do without a concept of 'experience'.
We'll get there one of these days, but right now the sorts of cool-sounding advancements we've been seeing really only work in very specific circumstances.
"I like to lick butts!" by MobileTatsu-NJG (#32700246) (Score:5, Informative)
In my thesis I'm also creating a 3D model from a video stream, only I'm using stereoscopy and pattern recognition to find matching objects in each frame and triangulating the depth to said objects. By the end I'm hoping to reduce the objects to small pixel clusters; the tricky part is that all this is happening in real time. By mounting the cameras on a device where the point of view is known, it could be used to map out any static terrain by just navigating through it. Adding more cameras from different perspectives increases the completeness of the generated model. The article has definitely got the right idea. With sufficient object detection and tracking algorithms, you could minimise or eliminate the need to draw the template.
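For anyone wondering what "triangulating the depth" boils down to in the simplest case, here is the textbook relation for an idealised, rectified stereo pair: depth = focal length x baseline / disparity. This is only a sketch under those assumptions, not the thesis code; the function and parameter names are made up.

```python
def depth_from_disparity(x_left, x_right, focal_px, baseline_m):
    """Depth in metres of a point seen at column x_left in the left image
    and x_right in the right image of a rectified stereo pair.

    focal_px is the focal length in pixels and baseline_m the distance
    between the two cameras in metres (both assumed known)."""
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_px * baseline_m / disparity

# A feature 20 pixels apart between the two views, cameras 10 cm apart.
print(depth_from_disparity(320, 300, focal_px=800, baseline_m=0.1))
```

Because depth goes as 1/disparity, a one-pixel matching error matters far more for distant objects than for near ones, which is one reason extra cameras from other perspectives help.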