iFixit Tears Down Microsoft's Kinect For Xbox 360
alphadogg writes "Microsoft's new hands-free Kinect game controller is packed with four microphones, two autofocus cameras and a motion detector chip that together make for one heck of a complex toy, according to iFixit's initial teardown of the device. 'We haven't been this excited to get our hands on new hardware since the iPad,' says Kyle Wiens, CEO of the company. 'The way that we interact with computers is (finally) evolving, and Kinect is unlike any hardware we've ever taken apart. In fact, the only thing we've ever taken apart that has anywhere close to this many sensors is Pleo, the dinosaur robot.' iFixit describes Kinect as 'a horizontal bar of sensors connected to a small, motorized pivoting base.' The $150 device that Microsoft put hundreds of millions of dollars of research into can be purchased separately from the Xbox 360 or as part of a bundle. A Prime Sense PS1080-A2 is at the heart of Kinect's motion detection capabilities, as it connects to all of Kinect's sensors and processes images of your game room's color and scope before shooting them over to the Xbox. iFixit couldn't immediately identify all of the chips within the box, so plans to update its teardown."
Open source drivers? For an MS-produced device that only works on MS-consoles and probably plays all sorts of non-standard tricks to avoid its use as a direct PC peripheral? Oh, right. See you in several years then.
$150 actually gets you three very low-spec webcams - two 640x480, and a third that is a 320x240 IR cam almost exactly the same as the WiiMote's infrared "sensor", cheap, and most laptops now have them built in "for free". Four microphones (10p throwaway electronics - hence why every laptop has one by default). The expense is in the software and image processing, always has been, always will be, and that's the bit that's INCREDIBLY hard to write accurately and have it work quickly (i.e. chances are that it's set to a particular set of criteria and won't work in odd conditions, won't be programmable and/or puts all its work on the host processor, which means some poor sod has to reinvent all the image processing algorithms without hitting a patent, using what is basically three off-the-shelf webcams). I wouldn't be surprised if any "drivers" that do appear for the Kinect actually work because people have been doing this on much cheaper hardware first (e.g. a bunch of second-hand 640x480 USB webcams and a multi-input sound card) - hell, getting a simple motion-detection algorithm working can be a pain - ever tried to set up Motion on a couple of cameras? You can spend your life tweaking, making image masks, etc. and that's just to say "the image changed" or "it didn't" against a background of auto-focus, auto-exposure cameras looking into someone's back yard.
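To make the "the image changed or it didn't" point concrete, here is a rough sketch of masked frame differencing in plain Python. It is not how Motion or the Kinect actually does it; the function names, the threshold values and the tiny 4x4 "frames" are all made up for illustration, and frames are assumed to be lists of rows of 0-255 grayscale values.

```python
def changed_fraction(prev, curr, mask, pixel_threshold=25):
    """Fraction of unmasked pixels whose brightness moved by more than pixel_threshold.

    prev, curr: grayscale frames as lists of rows of ints (assumed 0-255).
    mask: same shape; truthy means "watch this pixel", falsy means "ignore it".
    """
    considered = 0
    changed = 0
    for prev_row, curr_row, mask_row in zip(prev, curr, mask):
        for p, c, m in zip(prev_row, curr_row, mask_row):
            if not m:
                continue  # masked out, e.g. a tree that waves in the wind
            considered += 1
            if abs(c - p) > pixel_threshold:
                changed += 1
    return changed / considered if considered else 0.0


def motion_detected(prev, curr, mask, pixel_threshold=25, min_fraction=0.05):
    """Report motion when enough watched pixels changed (both thresholds are guesses)."""
    return changed_fraction(prev, curr, mask, pixel_threshold) >= min_fraction


if __name__ == "__main__":
    # Hypothetical frames: a small global brightness drift (auto-exposure)
    # plus a bright 2x2 blob appearing in the middle.
    prev = [[100] * 4 for _ in range(4)]
    curr = [[105] * 4 for _ in range(4)]
    for r in (1, 2):
        for col in (1, 2):
            curr[r][col] = 200

    watch_all = [[1] * 4 for _ in range(4)]
    print(motion_detected(prev, curr, watch_all))
```

Even in this toy version there are two numbers to tune plus a mask to draw per camera, which is exactly where the tweaking time goes once auto-exposure and auto-focus start shifting every pixel at once.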
And like all things reliant on software-based recognition, it will not be as accurate or as quick or as adjustable as you need it to be. Voice recognition hasn't improved much in 15 years. (But voice synthesis has, because that's "easy" in comparison.) My bank still can't understand when I say my account number and thus have reverted to DTMF tones to do that entry. Fingerprint recognition hasn't improved much in 15 years (with optical fingerprint scanners like one I own or the ones built into IBM laptops) - hence open-source software that's quite basic is just as good as the commercial offerings for hundreds of pounds, and people are touting "iris recognition" as the next best thing (with the same problems, but new rounds of investment). 3D image recognition hasn't improved much in 15 years - robots still bounce off walls they didn't see and cars still crash with "crash-proof" control software (like the extremely expensive and hi-tech demo a few months ago where two "crash-proof" cars rammed each other incessantly over multiple trials because their image recognition and distance sensors just were not processing the data in the right way - in front of the world's press, for a "crash-proof" car from a major car manufacturer).
These things have ALWAYS been in their current state, it's just that we can do more of them faster now. It doesn't mean that throwing a quad-core 3GHz at the problem somehow solves the fact that the algorithms are crap and limited, and that computers can only do what we tell them and not automatically recognise shapes, sizes, colours, etc. unless told exactly how. Hell, show people a bit of software that can actually work out (vaguely reliably) if there's a cow in a random image and you'll be a millionaire.
The current state of any of these processes is enough for basic tasks (arguably enough for gaming but, again, they've been around for decades, so people HAVE used them in video-controlled gaming devices for years for everything from the NES to the PC and they all flopped), but anyone who's ever used voice recognition software for a long time, or been involved with professional projects aimed at image processing, will tell you - beyond a certain point, there are no "groundbreaking" tricks to use that can get you better recognition even if you trawl through PhD papers. It's all pretty much the same thing - run an image through some basic Photoshop-style filters, try to identify edges and clusters, filter based