Cheap 3D Computer Vision?
InspectorPraline writes "According to this article at the New York Times [free reg req'd], a tech firm known as Tyzx is developing optics technology that will have three-dimensional capability -- using two cameras attached by a high-bandwidth connection to a custom processing card inside a PC. The article suggests that the system could reach as much as 132 stereo frames per second, which could be very useful in security systems. Of course, the real question is who's behind the cameras, but we can all drool over the other possibilities, right?"
No more taping the red and blue filters from my Mag-Lite to my eyelids! :-)
Do you need three eyes?
This is taken from the document Real-time Stereo Vision for Real-world object tracking:
.... the DeepSea chip may not be able to find a valid match for every pixel in the image. Large uniformly lit areas of a scene may have pixels of identical intensity; for pixels in such areas, no single match can be found. Pixels that correspond to an object that is invisible to one imager but not the other also do not have matching pixels.
... Once the matching process is complete, the range of each pixel can be calculated using the horizontal disparity of the matching pixels, the focal lengths of the lenses and the distance between them. The DeepSea chip designates the range of anomalous pixels as invalid. :)) See also an HP document covering partly the same matter.
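For anyone wondering what that range calculation looks like, here is a minimal sketch. It is not Tyzx's code. It assumes the textbook rectified-stereo relation (range = focal length x baseline / disparity), identical lenses, and a focal length expressed in pixels. The function name and parameters are made up for illustration, and None stands in for the "invalid" marker the document mentions.

```python
from typing import Optional


def range_from_disparity(
    disparity_px: Optional[float],
    focal_px: float,
    baseline_m: float,
) -> Optional[float]:
    """Estimate the range (in metres) of one pixel from its horizontal disparity.

    Assumes rectified cameras with the same focal length (in pixels),
    separated by baseline_m metres. Hypothetical helper, not the DeepSea API.
    """
    # Pixels with no valid match (uniform areas, occlusions) carry no
    # disparity, so their range is reported as invalid (None here).
    if disparity_px is None or disparity_px <= 0:
        return None
    # Similar triangles: a larger disparity means a closer object.
    return focal_px * baseline_m / disparity_px


# Example: 400 px focal length, 0.25 m between the cameras.
near = range_from_disparity(8.0, 400.0, 0.25)     # 12.5 m
far = range_from_disparity(2.0, 400.0, 0.25)      # 50.0 m
unknown = range_from_disparity(None, 400.0, 0.25)  # None (no match found)
```

Note how quickly the range grows as the disparity shrinks: a one-pixel matching error matters far more for distant objects than for near ones.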
<clip>
The DeepSea chip is a hardware implementation of the census correspondence algorithm invented by Tyzx staff... The algorithm's key concept is transforming a pixel's numeric absolute intensity value into a bit string that represents the pixel's brightness relative to its neighboring pixels. For each pixel, the DeepSea chip examines the pixel's surrounding area, called a neighborhood. A typical neighborhood is 7x7 pixels centered on the subject pixel. Comparing a subject pixel's intensity to its neighbours, the chip produces a relative intensity map (shown in the document, page 8).
</clip>
(typos are mine)
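A rough software sketch of the census idea described in the clip, for illustration only. The bit ordering, the "neighbour darker than centre" convention, and the function names are my assumptions, not details from the Tyzx document. Comparing two bit strings by Hamming distance is how census matching is usually described; the clip itself does not spell that step out.

```python
def census_signature(image, row, col, radius=3):
    """Return the census bit string of one pixel, packed into an int.

    image is a list of rows of intensities. radius=3 gives the 7x7
    neighbourhood mentioned in the document. Each neighbour contributes
    one bit: 1 if it is darker than the centre pixel, otherwise 0.
    """
    height, width = len(image), len(image[0])
    if not (radius <= row < height - radius and radius <= col < width - radius):
        raise ValueError("pixel needs a full neighbourhood inside the image")
    center = image[row][col]
    bits = 0
    for dr in range(-radius, radius + 1):
        for dc in range(-radius, radius + 1):
            if dr == 0 and dc == 0:
                continue  # the centre pixel is not compared with itself
            darker = image[row + dr][col + dc] < center
            bits = (bits << 1) | int(darker)
    return bits


def hamming(a, b):
    """Number of differing bits; a lower value means a better match."""
    return bin(a ^ b).count("1")


# Tiny 3x3 example with a 3x3 neighbourhood (radius=1).
dim = [[10, 20, 30],
       [40, 50, 60],
       [70, 80, 90]]
bright = [[v + 100 for v in r] for r in dim]  # same scene, more light

sig_dim = census_signature(dim, 1, 1, radius=1)
sig_bright = census_signature(bright, 1, 1, radius=1)
```

Both signatures come out as 0b11110000, so their Hamming distance is 0. Because only relative brightness is stored, a uniform change in lighting between the two cameras does not change the bit string.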
The technology employed (both hardware and software) is limited. CMOS sensors of the type described suffer from poor signal-to-noise ratio as well as interlacing artifacts. Pixel jitter is of major importance in machine vision, and I doubt these sensors offer much clock control over and above the 1 pixel mark (if any).
The matching algorithm described is very primitive, assuming rotation in depth between views doesn't affect the scene projection into the image - ooh, but it does. The census matching algorithm is very simple, and whilst it does recognise the problem of illumination variation, it fails to solve it in a manner you could describe as robust. Also, contrary to popular belief, you cannot robustly recover depth from every pixel in the image! There is no evidence that the human vision system does it (without knowledge of the object), so why are people trying it? Even if you attempt it, you are going to need some way of telling which data are more accurate in order to start using the results. Edges are your best bet, and I didn't see any evidence of preprocessing described in their system (although to be fair I only read it briefly).
I appreciate that this is supposed to be a cheap system and thus its limitations are probably to be expected. Might be fun to play with for a hundred Euros or so.
For a more state-of-the-art look at what is possible, you could do worse than take a look at TINA, an open source machine vision system with a very sophisticated stereo depth estimation algorithm (we even built a chip to accelerate it!).
-- "Can't sleep, clowns will eat me!"