Building 3D Models On the Fly With a Webcam
blee37 writes "Here is an excellent video demonstration of a new program developed by Qi Pan, a graduate student, and other researchers at the University of Cambridge. The 'ProFORMA' software constructs a 3D model of an object in real time from (commodity) webcam video. The user can watch the program deduce more pieces of the 3D model as the object is moved and rotated. The resulting graphics are of high quality."
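The summary doesn't say how ProFORMA works internally, so what follows is only a generic illustration, not the Cambridge method. Once a program knows where the camera was for two frames, a feature seen in both frames can be placed in 3D by linear (DLT) triangulation. The sketch assumes the two 3x4 camera projection matrices are already known (in a live system they would come from camera tracking); the intrinsics, the 0.2-unit baseline, and the test point are made-up values, and it uses NumPy.

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 3D point from two views.

    P1, P2: 3x4 camera projection matrices (assumed known).
    x1, x2: (u, v) pixel coordinates of the same point in each view.
    Returns the point as a length-3 array in world coordinates.
    """
    # Each view contributes two linear equations in the homogeneous point X:
    # u * (P[2] @ X) - P[0] @ X = 0 and v * (P[2] @ X) - P[1] @ X = 0
    A = np.array([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The least-squares solution is the right singular vector
    # belonging to the smallest singular value
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]

def project(P, X):
    """Project a 3D point with a 3x4 camera matrix to pixel coordinates."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Hypothetical setup: same intrinsics for both frames,
# second camera centre shifted 0.2 units along x
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-0.2], [0.0], [0.0]])])

true_point = np.array([0.1, -0.05, 2.0])
recovered = triangulate(P1, P2, project(P1, true_point), project(P2, true_point))
print(np.round(recovered, 6))
```

With noise-free pixel coordinates the recovered point matches `true_point`; with real webcam features the same solve returns a least-squares estimate, which is why more views (rotating the object further) refine the model.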
With open-source rendering software already well established and continually improving, the only under-developed area left is content. This method will allow anyone with an object to digitize it, so people can take that content and mix it into virtual environments. Throw in some voice-synthesis software, some directing software, and a million monkeys hammering away at plots, and Hollywood as an institution is dead. This is one more piece of the puzzle; the others will fall into place as well. It is ironic that in one of the Civilization games, discovering the Internet invalidates the Hollywood Wonder.
Shh.
I can just see it now: anyone who can get a bit of video of you can create a 3D model of your face and body and then do anything with the likeness. When rendering gets really good, this could be a bit embarrassing. Instead of retouched 2D photos of celebrities and politicians, we'll be seeing hacked-up 'animated' (but realistic) video of them doing all sorts of wild stuff. Well, it might be a boon to the porn industry, at least in the short term, before the rendering software becomes available to consumers.
I was thinking about robots one day and wondered why people who work on computer vision didn't do something like this. Instead of trying to get the machine to understand the analog world directly, wouldn't it be better for the machine to build an internal representation of the world as a 3D map? Quake 3 CoffeeShop, if you will.
The idea I had was that the vision system builds a 3D map and populates it with entities, which it also detects. The AI works within this 3D representation of the world. If the AI wants to move from A to B, it signals the body-control subsystem to start walking. When the 3D representation, kept up to date by the vision system, tells the AI that it has reached point B, the AI signals the body to stop walking.
Hardware constraints notwithstanding, is this model any good?
I'm just a lowly, early-middle-aged novice C programmer who has never actually done anything with robotics, so if what I said makes no sense or is obviously idiotic, I do understand that my ideas are comin' outta my ass.
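A minimal sketch of the control loop described in that post, under loudly stated assumptions: `WorldModel`, `BodyController`, and `go_to` are hypothetical names, the map is reduced to the robot's own 2D position (a real system would also hold the vision-derived 3D entities), and the body's motion is simulated here rather than observed by a camera. What it shows is the division of labor: the planner only ever reads the map, and the body only ever receives start/stop commands.

```python
import math
from dataclasses import dataclass

@dataclass
class WorldModel:
    """Hypothetical stand-in for the vision-built map; only the robot's pose is tracked."""
    x: float = 0.0
    y: float = 0.0

    def robot_position(self):
        return (self.x, self.y)

class BodyController:
    """Hypothetical body subsystem: it can only walk along a heading or stop."""
    def __init__(self, world, speed=0.5):
        self.world = world
        self.speed = speed
        self.heading = None  # None means the body is stopped

    def start_walking(self, heading):
        self.heading = heading

    def stop_walking(self):
        self.heading = None

    def step(self, dt):
        # Simulated physical motion; in the proposed design the vision
        # system would observe this and update the map instead
        if self.heading is not None:
            self.world.x += self.speed * dt * math.cos(self.heading)
            self.world.y += self.speed * dt * math.sin(self.heading)

def go_to(world, body, target, tolerance=0.05, dt=0.1, max_steps=10_000):
    """Walk toward target until the map says the robot has arrived."""
    for _ in range(max_steps):
        x, y = world.robot_position()
        dx, dy = target[0] - x, target[1] - y
        if math.hypot(dx, dy) <= tolerance:
            body.stop_walking()  # the map reports we are at B
            return True
        body.start_walking(math.atan2(dy, dx))  # re-aim every tick
        body.step(dt)
    body.stop_walking()
    return False

world = WorldModel()
body = BodyController(world)
arrived = go_to(world, body, target=(2.0, 1.0))
print(arrived, world.robot_position())
```

Re-aiming on every tick is a deliberate choice: if the map ever reports drift, the next command corrects it, so the map stays the single source of truth for the AI.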
There seems to be a huge gap between these kinds of academic projects and commercially available programs. I have come across several commercial applications that can do this kind of thing, but they cost 1,000 dollars or more. And then there are all these academic projects (going on for at least two decades), which present nice videos and papers and sometimes release some software. But when you look at the software, you discover that you first have to download nine other packages and compile the whole thing, and what you get is some kind of script you have to run with all sorts of command-line options. So far, I have never found an application with a solid interface on the level of the GIMP or Blender, for that matter. I find this rather strange. I am almost getting the impression that some of the results are sold to the developers of the commercial packages.
The major reason these types of programs don't get expanded into commercial products, or bought and integrated into existing ones, is that they are cute tech demos with little real-world use.
Almost without exception, anything simple enough for these types of reconstruction programs to handle is too simple to bother with. Take the paper church in the demo video, for instance. The final wireframe product is, sadly, crap. Neat and interesting crap, but still crap. It has at least three times as many polys as the form needs, and almost all of the significant edges are in the wrong place. In the time it would take to clean up the data into something worth using, I could build a better model from scratch, textures included.
There are perhaps some very niche uses for this in augmented reality. It could be integrated into a game or chat program to give a more realistic version of those make-an-avatar-from-your-webcam gimmicks that seem to gain attention every once in a while. If this guy has developed some very good algorithms, he might get the interest of makers of match-moving software such as SynthEyes.
But the reason this kind of thing never shows up in professional 3D packages is that if you are good enough to use the software professionally, you are good enough not to need these kinds of crutches. It's the 3D equivalent of Dreamweaver's auto-generated spaghetti code.
I haven't read the article yet, but there's already a program that does this with cheap cameras; version 1.0 was free:
http://www.david-laserscanner.com/
That's called "simultaneous localization and mapping" (SLAM), and in the last five years good algorithms have been developed and quite a few systems are more or less working. Search for "Visual SLAM".
The Samsung Hauzen vacuum cleaner uses Visual SLAM. There's a video. This is way ahead of the blundering Roomba.
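To make the "simultaneous" part concrete, here is a toy one-dimensional Kalman-filter SLAM: the filter's state holds both the robot's position and a single landmark's position, and each range reading to the landmark updates both estimates at once. This is a textbook-style illustration chosen for brevity (the linear special case of EKF-SLAM), not how any visual SLAM system or the Samsung vacuum is implemented; the noise variances and the noise-free test data are arbitrary assumptions, and it uses NumPy.

```python
import numpy as np

def kf_slam_1d(controls, observations, motion_var=0.1, meas_var=0.05):
    """Toy 1D SLAM: jointly estimate robot position x and landmark position m.

    State s = [x, m]. Motion: x += u (the landmark is static).
    Measurement: z = m - x (signed range from robot to landmark).
    Returns the final state estimate and its covariance.
    """
    s = np.array([0.0, 0.0])        # robot starts at the origin; landmark unknown
    P = np.diag([0.0, 1e6])         # robot pose known exactly, landmark barely known
    F = np.eye(2)
    Q = np.diag([motion_var, 0.0])  # only the robot moves, so only it gains noise
    H = np.array([[-1.0, 1.0]])     # z = m - x
    R = np.array([[meas_var]])

    for u, z in zip(controls, observations):
        # Predict: apply odometry to the robot part of the state
        s = F @ s + np.array([u, 0.0])
        P = F @ P @ F.T + Q
        # Update: one range reading corrects robot and landmark together
        y = z - H @ s
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        s = s + K @ y
        P = (np.eye(2) - K @ H) @ P
    return s, P

# Noise-free synthetic run so the result is easy to check:
# the robot moves 0.1 per step toward a landmark that really sits at 5.0
true_m = 5.0
controls = [0.1] * 40
observations = [true_m - 0.1 * (k + 1) for k in range(40)]
s, P = kf_slam_1d(controls, observations)
print("robot:", s[0], "landmark:", s[1], "landmark variance:", P[1, 1])
```

The landmark starts with a variance of 1e6 and drops to well under 1 after the first sighting, because the known starting pose anchors it; from then on, the landmark in turn keeps the robot's estimate from drifting with the odometry.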