Kinect's AI Breakthrough Explained
mikejuk writes "Microsoft Research has just published a scientific paper (PDF) and a video showing how the Kinect body tracking algorithm works — it's almost as impressive as some of the uses the Kinect has been put to. This article summarizes how Kinect does it. Quoting: '... What the team did next was to train a type of classifier called a decision forest, i.e. a collection of decision trees. Each tree was trained on a set of features on depth images that were pre-labeled with the target body parts. That is, the decision trees were modified until they gave the correct classification for a particular body part across the test set of images. Training just three trees using 1 million test images took about a day using a 1000-core cluster.'"
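For readers unfamiliar with the term, here is a minimal sketch of the "collection of decision trees" idea in the summary: several small trees are each trained on a different slice of labeled data, and the forest classifies by majority vote. Everything here is an illustrative assumption rather than the method from the paper: the two toy features, the labels, the one-split trees (decision stumps), and the rule that each tree skips a different subset of samples.

```python
from collections import Counter

# Hypothetical toy data: two made-up depth features per pixel, plus a body-part label.
SAMPLES = [(0.10, 0.20), (0.20, 0.10), (0.15, 0.25),
           (0.80, 0.90), (0.90, 0.80), (0.85, 0.95)]
LABELS = ["hand", "hand", "hand", "torso", "torso", "torso"]

def train_stump(samples, labels, feature):
    """Train a one-split tree: find the threshold on `feature` with the fewest errors."""
    best = None
    for threshold in sorted({s[feature] for s in samples}):
        left = [l for s, l in zip(samples, labels) if s[feature] <= threshold]
        right = [l for s, l in zip(samples, labels) if s[feature] > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = (sum(l != left_label for l in left)
                  + sum(l != right_label for l in right))
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return feature, threshold, left_label, right_label

def train_forest(samples, labels, n_trees=3):
    """Train n_trees stumps (n_trees >= 2); tree k skips every sample with index % n_trees == k."""
    n_features = len(samples[0])
    forest = []
    for k in range(n_trees):
        keep = [i for i in range(len(samples)) if i % n_trees != k]
        forest.append(train_stump([samples[i] for i in keep],
                                  [labels[i] for i in keep],
                                  feature=k % n_features))
    return forest

def predict(forest, sample):
    """Classify one sample by majority vote over all trees."""
    votes = Counter()
    for feature, threshold, left_label, right_label in forest:
        votes[left_label if sample[feature] <= threshold else right_label] += 1
    return votes.most_common(1)[0][0]

forest = train_forest(SAMPLES, LABELS)
print(predict(forest, (0.12, 0.18)))  # hand
print(predict(forest, (0.88, 0.90)))  # torso
```

Because each tree sees different data and a different feature, the trees make different mistakes, and the vote tends to smooth those out. A real system would use much deeper trees and far more training data.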
Layered classification nets have always struck me as the right approach, particularly as we learn more about how human senses work - it seems like a lot of our "thinking" is done much closer to our sense organs than we might have once imagined. Interesting that the less "organic" type, decision trees, was used rather than neural nets. One wonders if maybe it was more a matter of ease of phrasing/training/debugging than of classification itself that decided which type to use.
I don't think so this time. This is a reasonably well written formal paper sent for peer review. It is also quite nice to see this published openly.
- "What do you do for a living?"
- "I train trees to make a decision forest that can see human limbs."
- "Ah, I see. Makes sense. (WHAT THE FUCK???)"
Why would MS rent/buy processor time? They've got the world's biggest botnet, and they even have the suckers pay MS to join it.
The same way that cybercriminals crack captchas, they just offered up a picture of a random boob to a random boob. The real problem was stopping at 1m pictures.
> I'm far more interested in how they generated those '1 million'
> pre-labelled test images in the first place.
Snapshots from the webcams attached to computers running Windows.
Trees have traditionally been trained in Entish, which although reliable, is such an un-hasty language.
The sensor came from PrimeSense. The algorithms in it are entirely from MSR.
> It is also quite nice to see this published openly.
And no doubt backed up by a dozen patents.
Hmm, no, actually, they just applied a machine learning technique that has been known for years to a huge sample of data, and it worked pretty well.
From my point of view, there is no major breakthrough but still it's a nice solution.
I read the paper; it was clever. They used a standard motion capture setup with their actor(s) going through several hundred different movements. Since their algorithm is stateless, they could analyze the motion and produce many distinct poses from each movement. Each pose was then "retargeted" (a well-known technique in animation) onto many different 3D models of people of varying height, body type, etc., before finally being rendered into a perfectly labeled depth map.
They went through several iterations of this process:
> And no doubt backed up by a dozen patents.
Of course. That's the purpose of patents, to encourage inventors to publish their inventions openly.
Yeah, they make nice products when they face competition, there is no doubt about it. But even then, some of their commercial practices are questionable, and that's where most of the hate comes from. For instance, say you buy an Xbox 360 and a PS3. On the Xbox you have to pay a monthly fee to play online games, whereas on the PS3 it is completely free. If Microsoft were the only player in town in that particular case, we would be in a world of hurt. Luckily, having options pushes Microsoft to do the right thing, even if it doesn't end up doing so most of the time. To me, that's what Linux and free software are really all about: options, more than being a much superior product.
"[..] the decision trees were modified until they gave the correct classification for a particular body part across the test set of images"
this is called cheating in machine learning (you are not allowed to modify your model(s) based on the results on the test set).
and of course it is not what they do.
nice piece of work, tho IMHO not an AI breakthrough.
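The test-set rule in the comment above can be shown in a few lines: the model is fitted using training data only, and the held-out test set is used once, to report accuracy. This is a minimal sketch under assumed toy conditions: the single-feature data, the threshold "model", and the function names are all hypothetical.

```python
# Hypothetical toy data: (feature value, label) pairs.
DATA = [(0.1, False), (0.2, False), (0.3, False), (0.4, False),
        (0.6, True), (0.7, True), (0.8, True), (0.9, True)]

def split(data, test_every=4):
    """Hold out every `test_every`-th example as the test set."""
    train = [d for i, d in enumerate(data) if i % test_every != 0]
    test = [d for i, d in enumerate(data) if i % test_every == 0]
    return train, test

def fit_threshold(train):
    """Pick the cut that makes the fewest errors on the TRAINING data only."""
    best_threshold, best_errors = None, None
    for threshold, _ in sorted(train):
        errors = sum((x > threshold) != label for x, label in train)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold

def accuracy(threshold, data):
    """Fraction of examples where the prediction `x > threshold` matches the label."""
    return sum((x > threshold) == label for x, label in data) / len(data)

train, test = split(DATA)
threshold = fit_threshold(train)   # the model never sees `test` here
print(accuracy(threshold, test))   # reported once, not used to retune the model
```

If the threshold were adjusted after looking at the test accuracy, the reported number would no longer estimate performance on unseen data.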
This has nothing to do with reconstructing a depth image from a 2D image. The Kinect is a depth camera and already gives you a real depth image (not a guess).