Kinect's AI Breakthrough Explained
mikejuk writes "Microsoft Research has just published a scientific paper (PDF) and a video showing how the Kinect body tracking algorithm works — it's almost as impressive as some of the uses the Kinect has been put to. This article summarizes how Kinect does it. Quoting: '... What the team did next was to train a type of classifier called a decision forest, i.e. a collection of decision trees. Each tree was trained on a set of features on depth images that were pre-labeled with the target body parts. That is, the decision trees were modified until they gave the correct classification for a particular body part across the test set of images. Training just three trees using 1 million test images took about a day using a 1000-core cluster.'"
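The summary says little about what the trees actually test. The Shotton et al. paper describes split nodes that compare the depths at two offsets around a pixel, with the offsets divided by that pixel's own depth. The forest's answer is the average of the body-part distributions stored at the leaves each tree reaches. Below is a minimal Python sketch of classification only (training is not shown). It is modelled on that depth-comparison feature, but several details are illustrative assumptions: off-image probes read as a large constant "background" depth, probe positions are rounded to the nearest pixel, and the two trees are hand-built with made-up leaf values rather than learned.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union

# Assumption: probes that fall off the image read as "very far" background.
BACKGROUND_DEPTH = 1e6

Image = List[List[float]]     # depth in metres, indexed [row][col]
Pixel = Tuple[int, int]       # (row, col)
Offset = Tuple[float, float]  # (row, col) offset, scaled by 1/depth when used


def depth_at(img: Image, r: float, c: float) -> float:
    """Depth at the nearest pixel, or BACKGROUND_DEPTH if outside the image."""
    ri, ci = int(round(r)), int(round(c))
    if 0 <= ri < len(img) and 0 <= ci < len(img[0]):
        return img[ri][ci]
    return BACKGROUND_DEPTH


def depth_feature(img: Image, x: Pixel, u: Offset, v: Offset) -> float:
    """f(I, x) = d(x + u/d(x)) - d(x + v/d(x)).

    Dividing the offsets by the depth at x shrinks the probe pattern as the
    person moves away, so the feature is roughly depth-invariant.
    Assumes x is a foreground pixel with positive depth.
    """
    r, c = x
    d = img[r][c]
    return (depth_at(img, r + u[0] / d, c + u[1] / d)
            - depth_at(img, r + v[0] / d, c + v[1] / d))


@dataclass
class Leaf:
    dist: Dict[str, float]  # P(body part | pixel reached this leaf)


@dataclass
class Split:
    u: Offset
    v: Offset
    threshold: float
    left: "Node"   # taken when feature < threshold
    right: "Node"  # taken otherwise


Node = Union[Leaf, Split]


def tree_predict(node: Node, img: Image, x: Pixel) -> Dict[str, float]:
    """Walk one tree from the root to a leaf and return that leaf's distribution."""
    while isinstance(node, Split):
        f = depth_feature(img, x, node.u, node.v)
        node = node.left if f < node.threshold else node.right
    return node.dist


def forest_predict(trees: List[Node], img: Image, x: Pixel) -> Dict[str, float]:
    """Average the per-tree distributions: P(c | I, x) = (1/T) * sum_t P_t(c | I, x)."""
    total: Dict[str, float] = {}
    for tree in trees:
        for part, p in tree_predict(tree, img, x).items():
            total[part] = total.get(part, 0.0) + p
    return {part: p / len(trees) for part, p in total.items()}


# Toy 5x5 depth image: background at 4 m, two "person" pixels at 2 m.
img = [[4.0] * 5 for _ in range(5)]
img[1][2] = img[2][2] = 2.0

# Two hand-built trees (hypothetical values; real trees are learned from labelled data).
trees: List[Node] = [
    Split(u=(-2.0, 0.0), v=(0.0, 0.0), threshold=0.5,
          left=Leaf({"torso": 0.8, "head": 0.2}),
          right=Leaf({"head": 0.9, "torso": 0.1})),
    Split(u=(0.0, 4.0), v=(0.0, 0.0), threshold=1.0,
          left=Leaf({"hand": 1.0}),
          right=Leaf({"torso": 0.6, "hand": 0.4})),
]

probs = forest_predict(trees, img, (2, 2))
print({part: round(p, 3) for part, p in probs.items()})
```

Each tree only does a handful of depth lookups and comparisons per pixel, which is part of why this style of classifier suits a real-time budget.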
Any decent large data center will be happy to rent you one for a price?
Wouldn't you still need software capable of using all of those resources?
Layered classification nets have always struck me as the right approach, particularly as we learn more about how human senses work - it seems like a lot of our "thinking" is done much closer to our sense organs than we might have once imagined. Interesting that the less "organic" type, decision trees, were used rather than neural nets. One wonders if maybe it was more a matter of ease of phrasing/training/debugging than of classification itself that decided which type to use.
I have more than 1000 cores in my desktop, lol. Maybe they just hooked their desktops together.
Smells like Neural Networks thinking ...
I don't think so this time. This is a reasonably well written formal paper sent for peer review. It is also quite nice to see this published openly.
- "What do you do for a living?"
- "I train trees to make a decision forest that can see human limbs."
- "Ah, I see. Makes sense. (WHAT THE FUCK???)"
Why would MS rent/buy processor time? They've got the world's biggest botnet, and they even have the suckers pay MS to join it.
From the summary it looks like they are basically using a classifier which they spent a lot of time training, and it works well. This is impressive, but I don't know if it meets the story title's claim of "AI breakthrough", since from the summary it sounds basically like, "researchers used classifier for classifying data and it worked!" Can someone summarize in a little more detail exactly what the "breakthrough" entails, beyond a fairly standard use of classifiers trained on labelled data sets?
So they fed an LCS with some sample data? OK, par-for-the-course. I'm far more interested in how they generated those '1 million' pre-labelled test images in the first place.
Yeah, amazing how any post about anything some neckbeard doesn't like because it's MS-related is shilling or advertising. Douche.
Ummm, all I've seen so far apart from this are pretty obvious uses of the depth sensor.
What Microsoft has done is solved an extremely hard AI problem. Check out the body-part identification. I think more credit is due.
Trees have traditionally been trained in Entish, which although reliable, is such an un-hasty language.
A lot of the MS-haters on Slashdot tried to write off the Kinect as a nice bit of third-party hardware with a crappy MS-made driver. I wonder how they'll respond to this. Microsoft has really outdone themselves here. I think Penny Arcade put it best. If only they could apply this sort of innovation to their more important products, they'd be back on top in no time.
Forget the 1000-core cluster. I want to know where I can get 1,000,000 images of people with all the (major) body parts zoned and referenced.
That's an impressive test corpus.
I haven't thoroughly read the paper yet, but calling this an AI breakthrough is inappropriate for a number of reasons. First, this is an application of machine learning, which is not the same thing as AI. Second, it seems to be a fairly incremental piece of work building on very common techniques; it is very far from a breakthrough in any respect. If you don't believe me, see some of Jamie Shotton's other work, which is good work, but this is nothing extraordinary in comparison.
The need to process 200 frames a second kind of puts a limiter on what techniques you can use.
The sensor came from PrimeSense. The algorithms in it are entirely from MSR.
It is also quite nice to see this published openly.
And no doubt backed up by a dozen patents.
IP (#35625124) resolved to slashdot user Wovel (964431).
So...it can't see the forest for the limbs?
I would assume they just used an established motion tracking system in parallel with the Kinect sensor input.
At 30 fps, that's about 10 hours of input.
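A quick check of that estimate, assuming (as the comment does) one training image per captured frame at 30 fps, and taking the 1 million figure from the summary:

```python
FRAMES = 1_000_000  # training images, from the summary
FPS = 30            # capture rate assumed in the comment above

seconds = FRAMES / FPS
hours = seconds / 3600
print(f"{seconds:,.0f} s = {hours:.2f} h")
```

This prints `33,333 s = 9.26 h`, so a little over 9 hours of continuous capture, consistent with "about 10 hours".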
Neural Network / perceptrons.
And here I am again, Mr Anonymous Coward, talking to myself.
Yes I am a douche.
I would like to know why they chose a relatively slow method, random forests (RF), over something like an SVM or another classifier based on convex optimization. They claim the RF run fast on GPUs, although they also mention reference [6], which uses SVMs on this problem. Are the RF actually faster than a modern, non-linear SVM implementation?
And no doubt backed up by a dozen patents.
Of course. That's the purpose of patents, to encourage inventors to publish their inventions openly.
The sensor came from PrimeSense. The algorithms in it are entirely from MSR.
Why can you download an SDK with this software directly from PrimeSense, but not from Microsoft, if the algorithms are M$ property?
Looks like M$ is just appropriating third party research.
I'd rather they kept their secrets and let somebody else figure it out than be granted a monopoly on an idea.
Looks like M$ is just appropriating third party research.
Splendid. PrimeSense are not complaining about this paper, but you accuse MSR of stealing work?
And is this how they got the images tagged in the first place? http://kotaku.com/#!5605936/is-this-how-microsoft-will-fix-kinects-couch-problem
[Neo voice]we need porn. A lot of porn.[/Neo voice]
"[..] the decision trees were modified until they gave the correct classification for a particular body part across the test set of images"
This is called cheating in machine learning (you are not allowed to modify your model(s) based on the results on the test set).
And of course that is not what they do.
Nice piece of work, though IMHO not an AI breakthrough.
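On the test-set point: the usual protocol keeps the test set out of every modelling decision. Below is a minimal, generic Python sketch (not the paper's procedure): shuffle once into disjoint train/validation/test splits, pick a hyperparameter on the validation split only, then score the test split exactly once. The 1D k-nearest-neighbour model, the toy data, the split fractions and the seeds are all hypothetical stand-ins chosen for brevity.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")
Example = Tuple[int, bool]  # (feature value, label)


def split_dataset(items: Sequence[T], val_frac: float, test_frac: float,
                  seed: int = 0) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle once, then carve out disjoint train / validation / test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


def knn_predict(train: List[Example], x: int, k: int) -> bool:
    """Majority vote among the k training points closest to x (odd k avoids ties)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(label for _, label in nearest) * 2 > k


def accuracy(train: List[Example], data: List[Example], k: int) -> float:
    return sum(knn_predict(train, x, k) == y for x, y in data) / len(data)


# Toy data: label is x >= 50, with roughly 10% of labels flipped as noise.
rng = random.Random(1)
data = [(x, (x >= 50) != (rng.random() < 0.1)) for x in range(100)]

train, val, test = split_dataset(data, val_frac=0.2, test_frac=0.2, seed=42)

# Tune k using the validation split only; the test split plays no part here.
best_k = max([1, 3, 5, 7], key=lambda k: accuracy(train, val, k))

# Touch the test split exactly once, after every modelling decision is frozen.
test_acc = accuracy(train, test, best_k)
print(f"k={best_k}, test accuracy={test_acc:.2f}")
```

If the trees were instead adjusted until they scored well on the test images, the reported test accuracy would no longer estimate performance on unseen data, which is the commenter's objection to the summary's wording.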
PrimeSense developed the sensor technology (hardware and firmware) that gives you a depth image. Microsoft took that depth image and created the algorithms that perform body tracking (software).
PrimeSense also have their own body tracking solution (they call it NITE), but it's based on an entirely different concept and requires a calibration pose to "lock in" initially. Microsoft doesn't use NITE.
Being possessed of an enormously large penis, I am unable to use Kinect as it keeps detecting it as a third leg!
The method they are using is called Haar cascades, proposed by Viola and Jones. I have been using the same thing with OpenCV for a while now. http://en.wikipedia.org/wiki/Haar-like_features It's basically passing an image through progressive classifiers to get a final match weight. Microsoft may have done the training to generate the classifiers, but the method has been around for a while. "Decision tree"... Pfffft.
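For reference, here is a minimal Python sketch of the technique this comment names: Haar-like rectangle features computed from an integral image, fed through a Viola-Jones style cascade that rejects a window at the first stage it fails. Note that the summary itself describes a decision forest, so this only illustrates the cascade idea and makes no claim about Kinect. It is not OpenCV's API, and it simplifies heavily: real Viola-Jones stages are boosted sums of many weak classifiers, while each stage here is a single feature with a hand-picked (hypothetical) threshold.

```python
from typing import Callable, List, Tuple

Image = List[List[int]]
Integral = List[List[int]]


def integral_image(img: Image) -> Integral:
    """ii[r][c] = sum of img[0..r-1][0..c-1], with a zero row and column of padding."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for r in range(h):
        row_sum = 0
        for c in range(w):
            row_sum += img[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii


def rect_sum(ii: Integral, r: int, c: int, h: int, w: int) -> int:
    """Sum of the h x w rectangle whose top-left pixel is (r, c): four lookups, any size."""
    return ii[r + h][c + w] - ii[r][c + w] - ii[r + h][c] + ii[r][c]


def two_rect_feature(ii: Integral, r: int, c: int, h: int, w: int) -> int:
    """Haar-like edge feature: left h x w block minus the h x w block to its right."""
    return rect_sum(ii, r, c, h, w) - rect_sum(ii, r, c + w, h, w)


# A stage scores a window (top-left at r, c) and rejects it if the score is below threshold.
Stage = Tuple[Callable[[Integral, int, int], int], int]


def run_cascade(ii: Integral, r: int, c: int, stages: List[Stage]) -> Tuple[bool, int]:
    """Return (accepted, stages_evaluated); most windows should exit at an early stage."""
    for i, (score, threshold) in enumerate(stages, start=1):
        if score(ii, r, c) < threshold:
            return False, i
    return True, len(stages)


# Toy 4x4 image: bright left half, dark right half (a vertical edge at column 2).
img = [[9, 9, 1, 1] for _ in range(4)]
ii = integral_image(img)

# Hypothetical hand-picked stages: a cheap 2-row edge test, then a taller one.
stages: List[Stage] = [
    (lambda ii, r, c: two_rect_feature(ii, r, c, 2, 1), 5),
    (lambda ii, r, c: two_rect_feature(ii, r, c, 4, 1), 20),
]

print(run_cascade(ii, 0, 0, stages))  # window does not straddle the edge
print(run_cascade(ii, 0, 1, stages))  # window straddles the edge
```

The window at (0, 0) is rejected after one stage, while the window at (0, 1) passes both. The early exit is what makes a cascade cheap when most windows contain nothing of interest.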
Why can't MS do stuff like this in all their departments? Are there not enough smart people to go around? You get truly cool things like this, juxtaposed with lame "us too!" attempts like WP7 and Bing.
So which side does it use, left or right?
Way more hyperbole was used to describe the feats of the Wii remote than in the Penny Arcade comic, even before Motion Plus was released...
I don't know if there is a version of Windows with support for more than 256 logical processors (whatever that means). http://www.microsoft.com/windowsserver2008/en/us/r2-scalability-reliability.aspx
"I think this line is mostly filler"
Ah, so you want shorter patent terms and non-ridiculous licensing costs.
Yell at the government regarding the former, and yell at the sellers regarding the latter.
For large sets, this will be our guide even unto death, for the LORD will work for each type of data it is applied to...