Making 3D Models from Video Clips
BoingBoing is covering an interesting piece of software called VideoTrace that allows you to easily create 3D models from the images in video clips. "The user interacts with VideoTrace by tracing the shape of the object to be modeled over one or more frames of the video. By interpreting the sketch drawn by the user in light of 3D information obtained from computer vision techniques, a small number of simple 2D interactions can be used to generate a realistic 3D model."
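To make the quoted idea a bit more concrete, suppose computer vision has already recovered the camera and a plane for one face of the object. A traced 2D outline can then be lifted into 3D by shooting a ray through each traced pixel and intersecting it with that plane. This is only a minimal illustration under stated assumptions, not VideoTrace's actual method. It assumes a pinhole camera at the origin looking down +Z, intrinsics `fx`, `fy`, `cx`, `cy`, and a plane `normal · X = offset` that was estimated elsewhere. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def pixel_ray(u: float, v: float, fx: float, fy: float, cx: float, cy: float) -> Vec3:
    """Direction of the viewing ray through pixel (u, v).

    Assumes a pinhole camera at the origin looking down +Z, so this is
    K^-1 @ [u, v, 1] for focal lengths fx, fy (in pixels) and principal
    point (cx, cy).
    """
    return ((u - cx) / fx, (v - cy) / fy, 1.0)


def lift_trace_to_plane(
    trace: Sequence[Tuple[float, float]],
    normal: Vec3,
    offset: float,
    fx: float, fy: float, cx: float, cy: float,
) -> List[Vec3]:
    """Back-project a traced 2D outline onto the (assumed) plane normal . X = offset."""
    nx, ny, nz = normal
    points: List[Vec3] = []
    for u, v in trace:
        rx, ry, rz = pixel_ray(u, v, fx, fy, cx, cy)
        denom = nx * rx + ny * ry + nz * rz
        if abs(denom) < 1e-12:
            raise ValueError("viewing ray is parallel to the plane")
        t = offset / denom  # scale factor along the ray; equals depth z because rz == 1
        if t <= 0:
            raise ValueError("plane lies behind the camera for this pixel")
        points.append((t * rx, t * ry, t * rz))
    return points


# Example: a plane 5 units in front of the camera (z = 5).
outline = [(320, 240), (820, 240), (820, 740)]
print(lift_trace_to_plane(outline, (0.0, 0.0, 1.0), 5.0, 500.0, 500.0, 320.0, 240.0))
# -> [(0.0, 0.0, 5.0), (5.0, 0.0, 5.0), (5.0, 5.0, 5.0)]
```

The hard part VideoTrace handles, and this sketch skips, is estimating the camera and the plane from the video in the first place.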
Wow, what a terrible link.
A quick search turns up the project homepage http://www.acvt.com.au/research/videotrace/
AI needs a way of interpreting video input into 3D objects and environments. Once a computer can represent objects in a 3D environment, it can then perform operations on them. Technically you could make AI without this tool, but you'd have to do extremely precise and patient CAD inputs that would take most of your life. With a tool to convert video into 3D objects, you can just start cataloging all the objects out there. Add in a 3D physics simulator, and you're halfway to true AI. I have a quick overview on how to do AI, and as you'll note at the very beginning of the page, the reason I haven't worked on AI myself is that I can't code a video->3D object converter.
God spoke to me.
Software like Canoma from the now-defunct Metacreations would let you create 3D models from 2D images in the mid-to-late 90s. I also remember reading about people using Viz ImageModeler to convert images from video to models even though the software is also designed for still images - the users would just capture those frames they needed to create the 3D model.
The only thing "new" about this is using video as the input without having to grab the individual frames yourself.
Never let reality temper imagination
Remember back in the day when we were told that computers would never be able to understand human speech because it's too complicated? The arguments were compelling, but now we've got voice recognition working over crappy telephone connections, and dictation software is getting better all the time. As bad as the voice recognition problem was, computer vision seemed like an even harder nut to crack, given how impossible it seemed to get a machine to go from a two-dimensional image to 3D. All of these seem like impossibly difficult "we'll never get there" AI problems, and then we see a technology demonstration that nails one. I'm still astounded that DARPA is not only asking for robot-driven cars, but actually getting teams that produce working results. That's another problem I always thought would be impossible.
My prediction for the future: the 21st century will be for robotics what the 20th was for aviation. We've been thinking about it for centuries but now the technology is maturing to the point that we can really do something with it. The stuff we're amazed by today is going to seem like wood and canvas biplanes.
Kwisatz Haderach
Sell the spice to CHOAM
This Mahdi took Shaddam's Throne
I'd like to see whether it can handle Calista Flockhart footage without going Division By Zero.
I'm a Ph.D. student at UC Santa Cruz. I finished my master's a few years ago working on enhancements to a project with similar goals. My advisor, Jane Wilhelms (who unfortunately died shortly after I finished my master's), had been working on computer vision techniques for several years. Her work focused on extracting the motion of animals (often children or horses) from videos. My master's contribution was to look at how the accuracy and usability of the software could be improved if we assume that the general motion of a walk is the same for all instances of a particular species (the knees all bend the same way, the legs move in the same order, etc.). I didn't have a high-quality capture to start with, so the results were a bit fuzzy in terms of accuracy, but it did make the process easier for the user. The user only had to make the "original" motion match the video at key frames (maybe 4 per "walk cycle"), and the computer could easily interpret the rest. I don't recall the exact figure offhand, but I think the number of key frames the user had to specify was reduced by half or more compared with the former process (without the canonical motion as a starting point). I didn't publish any papers based on my work, but my master's thesis (with example filmstrips) is available.
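A rough sketch of the key-frame idea described above: start from a canonical per-frame curve for one joint, let the user pin its value at a few key frames, and blend the corrections linearly in between. The single joint angle, the list-of-angles representation, linear blending, and the function name are all assumptions for illustration; the thesis may well do something different.

```python
from bisect import bisect_right
from typing import Dict, List, Sequence


def fit_to_keyframes(canonical: Sequence[float], keyframes: Dict[int, float]) -> List[float]:
    """Adjust a canonical per-frame joint-angle curve (hypothetical representation)
    so it passes through user-specified key-frame values.

    The correction (user value minus canonical value) is computed at each key
    frame and linearly interpolated for the frames in between; frames before the
    first key or after the last key reuse the nearest key's correction.
    """
    if not keyframes:
        return list(canonical)
    keys = sorted(keyframes)
    if keys[0] < 0 or keys[-1] >= len(canonical):
        raise ValueError("key frame index out of range")
    offsets = [keyframes[k] - canonical[k] for k in keys]

    result: List[float] = []
    for frame, base in enumerate(canonical):
        i = bisect_right(keys, frame)  # number of keys at or before this frame
        if i == 0:
            off = offsets[0]           # before the first key
        elif i == len(keys):
            off = offsets[-1]          # at or after the last key
        else:
            k0, k1 = keys[i - 1], keys[i]
            w = (frame - k0) / (k1 - k0)
            off = (1 - w) * offsets[i - 1] + w * offsets[i]
        result.append(base + off)
    return result


# Canonical knee angle over one short cycle; the user pins frames 0 and 4.
print(fit_to_keyframes([0.0, 10.0, 20.0, 30.0, 40.0], {0: 0.0, 4: 50.0}))
# -> [0.0, 12.5, 25.0, 37.5, 50.0]
```

The point of the canonical curve is that the in-between frames already have a plausible shape, so only a handful of key frames need user input.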
Hook up the Google Maps API with a polar navigated flight path and some edge/point detection algorithms, and start mapping. That'd be an interesting video.
I've never heard of "true AI" -- do you mean strong AI?
And no, computer vision plus physics simulation does not make half of strong AI, either. Russell and Norvig, the classic AI text, lists 9 abilities generally required for strong AI. 2 is not half of 9.
I don't know what your dead geocities page has, but not working on AI because you can't write a video->3d object converter is like not working on video compression because you can't act.
http://www.youtube.com/watch?v=vda2RAEuW_g
When I was in grad school, I knew a fellow who was working on similar technology. I don't think he got anywhere near as advanced as this, but he did get good enough that given 10 to 15 still images, his software could create a primitive 3D model.
Unfortunately for him, he tried to make a 3D model of his erect penis. I'm not sure if he realized it or not, but he wasn't very well hung (he's Korean). Well, at one of the presentations he had to make regarding his work, he accidentally opened up the model of his penis. He couldn't even deny that it was his, since his name was in the filename. And his supervisor, an older woman, just couldn't stop laughing. He did go on to get his degree, but I think his pride took a real beating.
In my thesis I'm also creating a 3D model from a video stream, only I'm using stereoscopy and pattern recognition to find matching objects in each frame and triangulating the depth to those objects. By the end I'm hoping to reduce the objects to small pixel clusters; the tricky part is that all this is happening in real time. By mounting the cameras on a device whose point of view is known, it could be used to map out any static terrain just by navigating through it. Adding more cameras from different perspectives increases the completeness of the generated model. The article has definitely got the right idea. With sufficient object detection and tracking algorithms, you could minimise or eliminate the need to draw the template.
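For a rectified stereo pair (cameras side by side, image rows aligned), the triangulation step described above reduces to depth from disparity, Z = f·B/d, where f is the focal length in pixels, B is the baseline between the cameras, and d is the horizontal pixel shift of a matched point. The sketch pairs that formula with simple sum-of-absolute-differences (SAD) block matching along one scanline. SAD matching is a textbook stand-in for the pattern recognition mentioned in the comment, not the method used in that thesis, and the function names and parameters are hypothetical.

```python
from typing import Sequence, Tuple


def match_disparity(left_row: Sequence[float], right_row: Sequence[float],
                    x: int, half: int = 1, max_disp: int = 16) -> int:
    """Disparity of the patch centred at column x of the left row, found by
    minimising SAD along the same row of the right image.

    Assumes a rectified pair, so a match lies on the same scanline and is
    shifted to the left (x_right = x_left - d) in the right image.
    """
    if x - half < 0 or x + half >= len(left_row):
        raise ValueError("patch does not fit inside the row")
    patch = left_row[x - half:x + half + 1]
    best_d, best_cost = 0, float("inf")
    for d in range(max_disp + 1):
        xr = x - d
        if xr - half < 0:
            break  # candidate patch would run off the left edge
        candidate = right_row[xr - half:xr + half + 1]
        cost = sum(abs(a - b) for a, b in zip(patch, candidate))
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


def depth_point(x_left: float, y: float, disparity: float, focal_px: float,
                baseline: float, cx: float, cy: float) -> Tuple[float, float, float]:
    """3D point in left-camera coordinates from Z = f * B / d."""
    if disparity <= 0:
        raise ValueError("disparity must be positive (zero means a point at infinity)")
    z = focal_px * baseline / disparity
    return ((x_left - cx) * z / focal_px, (y - cy) * z / focal_px, z)


left = [0, 0, 0, 9, 5, 7, 0, 0, 0, 0]
right = [0, 9, 5, 7, 0, 0, 0, 0, 0, 0]   # same feature, shifted 2 px to the left
print(match_disparity(left, right, 4))                        # -> 2
print(depth_point(360, 240, 40, 800.0, 0.5, 320.0, 240.0))    # -> (0.5, 0.0, 10.0)
```

Because depth is inversely proportional to disparity, a one-pixel matching error matters much more for distant objects than for near ones, which is one reason real-time systems work hard on the matching step.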
...and no one is going to make a porn joke?
Actually our company has had technology more advanced than that described in the article for years. With ours you simply pan the camera around and the model creation is fully automatic - there is no need to trace the image at all.
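The demo page doesn't explain how Instant Scene Modeller works. The basic geometry behind "pan the camera around," however, is triangulation: a feature tracked across two frames defines two viewing rays, and its 3D position is where those rays (nearly) meet. The sketch uses the midpoint method and assumes the camera centers and ray directions are already known; a real structure-from-motion pipeline would also have to estimate the camera motion. All names are hypothetical.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, ...]


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def _add(a: Sequence[float], b: Sequence[float]) -> Vec3:
    return tuple(x + y for x, y in zip(a, b))


def _sub(a: Sequence[float], b: Sequence[float]) -> Vec3:
    return tuple(x - y for x, y in zip(a, b))


def _scale(a: Sequence[float], k: float) -> Vec3:
    return tuple(x * k for x in a)


def triangulate_midpoint(c1: Sequence[float], d1: Sequence[float],
                         c2: Sequence[float], d2: Sequence[float]) -> Vec3:
    """Midpoint of the shortest segment between rays c1 + s*d1 and c2 + t*d2.

    c1, c2 are (assumed known) camera centers; d1, d2 are viewing-ray directions
    toward the same tracked feature in two frames.
    """
    w0 = _sub(c1, c2)
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w0), _dot(d2, w0)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are parallel; the camera must move sideways, not just rotate")
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1 = _add(c1, _scale(d1, s))   # closest point on ray 1
    p2 = _add(c2, _scale(d2, t))   # closest point on ray 2
    return _scale(_add(p1, p2), 0.5)


# Two frames 4 units apart both looking at a point at (2, 0, 2).
print(triangulate_midpoint((0, 0, 0), (1, 0, 1), (4, 0, 0), (-1, 0, 1)))  # -> (2.0, 0.0, 2.0)
```

Taking the midpoint is a simple way to handle noisy tracks, where the two rays almost never intersect exactly.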
It's called Instant Scene Modeller, and here's a link to a demo of the technology for anyone who's interested: http://www.demo.com/demonstrators/demo2005/54188.php
Tesla was a genius. Edison, however, was an overrated hack who liked to torture puppies.
It surely mitigates the slashdot effect.
Patents Drive Free Software as Hurricanes Drive Construction Industry