Kinect's AI Breakthrough Explained
mikejuk writes "Microsoft Research has just published a scientific paper (PDF) and a video showing how the Kinect body tracking algorithm works — it's almost as impressive as some of the uses the Kinect has been put to. This article summarizes how Kinect does it. Quoting: '... What the team did next was to train a type of classifier called a decision forest, i.e. a collection of decision trees. Each tree was trained on a set of features on depth images that were pre-labeled with the target body parts. That is, the decision trees were modified until they gave the correct classification for a particular body part across the test set of images. Training just three trees using 1 million test images took about a day using a 1000-core cluster.'"
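For anyone who hasn't met decision forests before, here is a toy sketch of the idea in the quoted summary. It is not the paper's implementation: it trains three small trees on bootstrap samples of the training data and classifies by majority vote. The two-number feature vectors, the labels "hand" and "head", and every function name are made up for illustration; the real system computes its features from depth images.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return the (feature, threshold) that most reduces impurity, or None."""
    best, best_score = None, gini(labels)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build_tree(rows, labels, depth=0, max_depth=4):
    """Grow a tree; internal nodes are tuples, leaves are plain labels."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority label
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
            build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth))

def predict_tree(tree, row):
    """Walk from the root to a leaf and return its label."""
    while isinstance(tree, tuple):
        f, t, low, high = tree
        tree = low if row[f] <= t else high
    return tree

def train_forest(rows, labels, n_trees=3, seed=0):
    """Train each tree on its own bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx], [labels[i] for i in idx]))
    return forest

def predict_forest(forest, row):
    """Majority vote over the trees."""
    votes = Counter(predict_tree(tree, row) for tree in forest)
    return votes.most_common(1)[0][0]

# Hypothetical, well-separated toy data: feature 0 distinguishes the classes.
rows = [(0.01 * i, 0.5) for i in range(20)] + [(0.8 + 0.01 * i, 0.5) for i in range(20)]
labels = ["hand"] * 20 + ["head"] * 20

forest = train_forest(rows, labels)
print(predict_forest(forest, (0.0, 0.5)), predict_forest(forest, (0.9, 0.5)))
```

The expensive part at Kinect scale is the search inside `best_split`, repeated over a million images, which is presumably why the training run needed a cluster.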
Any decent large data center will be happy to rent you one for a price?
Wouldn't you still need software capable of using all of the resources?
Layered classification nets have always struck me as the right approach, particularly as we learn more about how human senses work - it seems like a lot of our "thinking" is done much closer to our sense organs than we might have once imagined. Interesting that the less "organic" type, decision trees, were used rather than neural nets. One wonders if maybe it was more a matter of ease of phrasing/training/debugging than of classification itself that decided which type to use.
Smells like Neural Networks thinking ...
I don't think so this time. This is a reasonably well written formal paper sent for peer review. It is also quite nice to see this published openly.
- "What do you do for a living?"
- "I train trees to make a decision forest that can see human limbs."
- "Ah, I see. Makes sense. (WHAT THE FUCK???)"
Why would MS rent/buy processor time? They've got the world's biggest botnet, and they even have the suckers pay MS to join it.
From the summary it looks like they are basically using a classifier which they spent a lot of time training, and it works well. This is impressive, but I don't know if it meets the story title's claim of "AI breakthrough", since from the summary it sounds basically like, "researchers used classifier for classifying data and it worked!" Can someone summarize in a little more detail exactly what the "breakthrough" entails, other than basically standard use of classifiers for training on data sets?
So they fed an LCS with some sample data? OK, par-for-the-course. I'm far more interested in how they generated those '1 million' pre-labelled test images in the first place.
Ummm, all I've seen so far apart from this are pretty obvious uses of the depth sensor.
What Microsoft has done is solved an extremely hard AI problem. Check out the body-part identification. I think more credit is due.
Trees have traditionally been trained in Entish, which although reliable, is such an un-hasty language.
A lot of the MS-haters on Slashdot tried to write off the Kinect as a nice bit of third-party hardware with a crappy MS-made driver. I wonder how they'll respond to this. Microsoft has really outdone themselves here. I think Penny Arcade put it best. If only they could apply this sort of innovation to their more important products, they'd be back on top in no time.
Forget the 1000-core cluster. I want to know where I can get 1,000,000 images of people with all the (major) body parts zoned and referenced.
That's an impressive test corpus.
I haven't thoroughly read the paper yet, but calling this an AI breakthrough is inappropriate for a number of reasons. First, this is an application of machine learning, which is not the same thing as AI. Second, it seems to be a fairly incremental work building on very common techniques--very far from a breakthrough in any respect. If you don't believe me, see some of Jamie Shotton's other work, which is good work, but this is nothing extraordinary in comparison.
The sensor came from PrimeSense. The algorithms in it are entirely from MSR.
It is also quite nice to see this published openly.
And no doubt backed up by a dozen patents.
So...it can't see the forest for the limbs?
I would assume they just used an established motion tracking system in parallel with the Kinect sensor input.
At 30 fps, that's about 10 hours of input.
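Checking that arithmetic, assuming one labelled image per captured frame at the 30 fps the comment uses:

```python
frames = 1_000_000   # images quoted in the summary
fps = 30             # frame rate assumed in the comment above

seconds = frames / fps
hours = seconds / 3600
print(f"{hours:.2f} hours")  # 9.26 hours
```

So "about 10 hours" is a slight round-up; it is a little over nine.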
Neural Network / perceptrons.
And no doubt backed up by a dozen patents.
Of course. That's the purpose of patents, to encourage inventors to publish their inventions openly.
I don't care if it's 90,000 hectares. That lake was not my doing.
I'd rather they kept their secrets and let somebody else figure it out than be granted a monopoly on an idea.
Looks like M$ is just appropriating third party research.
Splendid. PrimeSense are not complaining about this paper but you accuse MSR of stealing work?
"[..] the decision trees were modified until they gave the correct classification for a particular body part across the test set of images"
This is called cheating in machine learning (you are not allowed to modify your model(s) based on the results on the test set).
And of course it is not what they do.
Nice piece of work, though IMHO not an AI breakthrough.
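To make the train/test distinction concrete, here is a minimal sketch with made-up one-dimensional data and a made-up threshold classifier (none of it is from the paper). The model is fitted on the training split only, and the held-out split is used once, for scoring.

```python
import random

def accuracy(threshold, samples):
    """Fraction of samples where 'x > threshold' matches the boolean label."""
    return sum((x > threshold) == y for x, y in samples) / len(samples)

def fit_threshold(samples):
    """Pick the threshold (from the sample values) with the best accuracy."""
    best_t, best_acc = None, -1.0
    for t, _ in samples:
        acc = accuracy(t, samples)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

# Hypothetical data: two overlapping Gaussian classes.
rng = random.Random(0)
data = [(rng.gauss(0.0, 1.0), False) for _ in range(200)]
data += [(rng.gauss(2.0, 1.0), True) for _ in range(200)]
rng.shuffle(data)

train, test = data[:300], data[300:]   # the test split stays untouched

t = fit_threshold(train)               # tuning sees the training split only
print("train accuracy:", accuracy(t, train))
print("test accuracy: ", accuracy(t, test))   # reported, never tuned on
```

If `fit_threshold` were handed `test` instead, the reported score would stop being an honest estimate of how the model does on unseen images.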
PrimeSense developed the sensor technology (hardware and firmware) that gives you a depth image. Microsoft took that depth image and created the algorithms that perform body tracking (software).
PrimeSense also have their own body tracking solution (they call it NITE), but it's based on an entirely different concept and requires a calibration pose to "lock in" initially. Microsoft doesn't use NITE.
The method they are using is called Haar cascades, proposed by Viola and Jones. I have used the same with OpenCV for a bit now. http://en.wikipedia.org/wiki/Haar-like_features It's basically passing an image through progressive classifiers to get a final weight of match. Microsoft may have done the training for generating the classifiers but the method has been around for a bit. "Decision tree".... Pfffft.
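For reference, a Haar-like feature of the kind the linked page describes is just a difference of rectangle sums, made cheap by an integral image. The sketch below shows only that building block, with hypothetical function names and a tiny made-up image; it does not settle whether the Kinect paper actually works this way, which is the commenter's claim.

```python
def integral_image(img):
    """Summed-area table with an extra zero row and column at the top-left."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of the w-by-h rectangle whose top-left pixel is (x, y)."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]

def two_rect_feature(ii, x, y, w, h):
    """Left half minus right half of the rectangle (w should be even)."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)

# Made-up 2x4 image: bright on the left, dark on the right.
img = [[1, 1, 0, 0],
       [1, 1, 0, 0]]
ii = integral_image(img)
print(two_rect_feature(ii, 0, 0, 4, 2))  # 4
```

A cascade then chains many thresholded features like this one, rejecting most windows early.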
"Us too!"?
Well it's hard for them to do stuff like this in all departments when you don't acknowledge all the other times that they offer innovative or superior products.
WP7 is in my opinion a far better thought out operating system from a user standpoint than any of the alternatives. So if by "Me Too!" you mean they released a great rewrite of their product which has been on the market longer than either Android or iOS then yes they too continued innovating. WinMo got sucky but when it was released it was pretty amazing. The problem was that A) capacitive touchscreens were prohibitively expensive so styluses were the only useful input device and B) data plans were prohibitively expensive and painfully slow.
They should have started preparing for the day when finger input would be useful and data plans would be accessible sooner but they eventually got caught up.
Zune would be another example where Microsoft was both releasing tech before the iPod and, with the Zune, still offering a superior product. The fact that it didn't sell well had far less to do with it being a bad product than with it not having the iPod's brand recognition when it launched.
I don't know if there is a version of Windows with support for more than 256 logical processors (whatever that means). http://www.microsoft.com/windowsserver2008/en/us/r2-scalability-reliability.aspx
"I think this line is mostly filler"
Ah, so you want shorter patent terms and non-ridiculous licensing costs.
Yell at the government regarding the former, and yell at the sellers regarding the latter.
For large sets, this will be our guide even unto death, for the LORD will work for each type of data it is applied to...
Well, I might have a way, but it only works on a semi spherical planet in a vacuum.