Kinect's AI Breakthrough Explained


Posted by Soulskill on Saturday March 26, 2011 @09:58AM from the expensive-hacker-toys dept.

mikejuk writes "Microsoft Research has just published a scientific paper (PDF) and a video showing how the Kinect body tracking algorithm works — it's almost as impressive as some of the uses the Kinect has been put to. This article summarizes how Kinect does it. Quoting: '... What the team did next was to train a type of classifier called a decision forest, i.e. a collection of decision trees. Each tree was trained on a set of features on depth images that were pre-labeled with the target body parts. That is, the decision trees were modified until they gave the correct classification for a particular body part across the test set of images. Training just three trees using 1 million test images took about a day using a 1000-core cluster.'"
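A minimal sketch of what "a decision forest" does at classification time. It assumes the common design in which each leaf stores a probability distribution over labels and the forest averages the leaf distributions of all its trees. The `Node` layout, the toy depth row, the thresholds and the two-label setup are hypothetical illustrations, not Microsoft's code.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Node:
    # Split nodes set feature/threshold/left/right; leaf nodes set posterior.
    feature: Optional[Callable[[int], float]] = None  # pixel index -> feature value
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    posterior: Optional[Dict[str, float]] = None  # label -> probability (leaves only)


def classify_with_tree(node: Node, pixel: int) -> Dict[str, float]:
    """Route one pixel from the root to a leaf and return that leaf's distribution."""
    while node.posterior is None:
        node = node.left if node.feature(pixel) < node.threshold else node.right
    return node.posterior


def classify_with_forest(trees: List[Node], pixel: int) -> Dict[str, float]:
    """Average the leaf distributions reached in every tree (the forest's vote)."""
    totals: Dict[str, float] = {}
    for tree in trees:
        for label, p in classify_with_tree(tree, pixel).items():
            totals[label] = totals.get(label, 0.0) + p
    return {label: p / len(trees) for label, p in totals.items()}


# Toy example: the "feature" is just the raw depth of the pixel (a real system
# would use richer depth features); the two trees disagree on pixel 1.
DEPTH_ROW = [1.0, 1.2, 2.5, 2.6]
LEAF_A = Node(posterior={"head": 0.9, "hand": 0.1})
LEAF_B = Node(posterior={"head": 0.2, "hand": 0.8})
TREE_1 = Node(feature=lambda i: DEPTH_ROW[i], threshold=2.0, left=LEAF_A, right=LEAF_B)
TREE_2 = Node(feature=lambda i: DEPTH_ROW[i], threshold=1.1, left=LEAF_A, right=LEAF_B)

print(classify_with_forest([TREE_1, TREE_2], 1))
```

Averaging several trees smooths out the mistakes of any single tree, which is one reason a small forest (three trees in the summary) can beat one deep tree.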


Re:1000-core cluster? by Luckyo · 2011-03-26 10:05 · Score: 1

Any decent large data center will be happy to rent you one for a price?
Re:1000-core cluster? by metalmaster · 2011-03-26 10:08 · Score: 1

Wouldn't you still need software capable of using all of the resources?
Sounds like vision, all right by liquiddark · 2011-03-26 10:11 · Score: 4, Interesting

Layered classification nets have always struck me as the right approach, particularly as we learn more about how human senses work - it seems like a lot of our "thinking" is done much closer to our sense organs than we might have once imagined. Interesting that the less "organic" type, decision trees, were used rather than neural nets. One wonders if maybe it was more a matter of ease of phrasing/training/debugging than of classification itself that decided which type to use.
1. Re:Sounds like vision, all right by hoytak · 2011-03-26 10:43 · Score: 4, Insightful
  
  Random forests have always been a nice classifier to use when working with really wacky data types. This is due in part to how easy it is to customize them; a lot of the ways they can be tweaked and tuned and customized have fairly intuitive effects on the outcome and behavior of the classifier. In my experience, while neural nets can also be pretty powerful, they are often much harder to work with as the parameters you have for tweaking can be really non-intuitive. We sometimes joke about neural nets being "black magic" because the training and tweaking can be really uninterpretable.
  However, the biggest reason random forests were used is probably because they are extremely fast on current chips, probably a couple orders of magnitude faster than neural nets when the trees are hard coded.
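  The point about hard-coded trees being fast can be illustrated with a flat, array-based layout: a trained tree becomes a few parallel arrays, and evaluation is a short loop of comparisons and index arithmetic. This is a sketch under assumptions (a breadth-first complete-tree layout with hypothetical array names), not a description of how the Kinect software stores its trees. Python only shows the structure; the layout pays off mainly in compiled code such as C or a GPU kernel.

```python
from typing import Sequence


def evaluate_flat_tree(
    feature_index: Sequence[int],
    threshold: Sequence[float],
    leaf_label: Sequence[int],
    features: Sequence[float],
) -> int:
    """Walk a tree stored breadth-first in parallel arrays.

    Node i has children 2*i + 1 (left) and 2*i + 2 (right);
    feature_index[i] == -1 marks a leaf whose answer is leaf_label[i].
    """
    i = 0
    while feature_index[i] >= 0:
        # One comparison and one index computation per tree level.
        i = 2 * i + 1 if features[feature_index[i]] < threshold[i] else 2 * i + 2
    return leaf_label[i]


# Toy tree: the root tests feature 0, its left child tests feature 1.
FEATURE_INDEX = [0, 1, -1, -1, -1]
THRESHOLD = [0.5, 0.3, 0.0, 0.0, 0.0]
LEAF_LABEL = [-1, -1, 2, 0, 1]

print(evaluate_flat_tree(FEATURE_INDEX, THRESHOLD, LEAF_LABEL, [0.2, 0.9]))  # prints 1
```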
  
  --
  Does having a witty signature really indicate normality?
2. Re:Sounds like vision, all right by Game_Ender · 2011-03-26 10:52 · Score: 4, Interesting
  
  Yep, it's not exactly an AI breakthrough, but it's really cool to see a practical application of machine learning in the consumer arena.
3. Re:Sounds like vision, all right by oliverthered · 2011-03-26 13:52 · Score: 1
  
  Search for the Tony Attwood interview about women with Asperger's.
  Keep listening until you get to the 'sixth sense' bit.
  You may not realize it, but that doesn't mean other people aren't 100% aware. (E.g. I'm in the third person; it's pretty apparent that I don't make the spelling mistakes. I just tell my body to write some stuff by pushing out a command, and it cocks up sometimes.)
  It does something similar in the other direction, with various levels of indirection... and I can also push things further down for some real number crunching.
  
  --
  thank God the internet isn't a human right.
4. Re:Sounds like vision, all right by Twinbee · 2011-03-26 14:12 · Score: 4, Interesting
  
  Yes, now all they need to do is fix the lag which can be quite high, maybe even 200ms:
  http://www.youtube.com/watch?v=weZOjotbuSU
  Something really low like 16ms or better is needed so that we don't notice, according to this article:
  http://www.sussex.ac.uk/Users/km3/hfes.pdf
  
  --
  Why OpalCalc is the best Windows calc
5. Re:Sounds like vision, all right by dominious · 2011-03-26 20:56 · Score: 1
  
  Yep, it's not exactly an AI breakthrough, but it's really cool to see a practical application of machine learning in the consumer arena.
  I suppose that's what the title "AI Breakthrough" means. Training decision trees in a random forest is not a breakthrough.
6. Re:Sounds like vision, all right by hvm2hvm · 2011-03-26 21:39 · Score: 2
  
  Exactly... sometimes "good enough" is better than "it should work in theory but we don't have the required hardware/algorithmic/whatever capabilities yet". It probably won't work perfectly in some cases, but for most applications it's great.

  I really think AI will be created in the same way. Once in a while a need appears for an AI-related task and someone finds a "good enough" solution. In time, someone will need a robot to have a serious conversation with, and there will be enough knowledge lying around that it will be easy to create that "good enough" solution.

  It will be like with the Kinect and the Wii: no one will expect what comes out of it, but everyone will think "hm, they should have done that years ago".
  
  --
  ics
7. Re:Sounds like vision, all right by amorsen · 2011-03-26 23:33 · Score: 1
  
  The youtube video doesn't really prove anything. The lag could just as easily be introduced by the TV or the game.
  
  --
  Finally! A year of moderation! Ready for 2019?
ANN? by Gulah · 2011-03-26 10:18 · Score: 1

Smells like Neural Networks thinking ...
Re:More advertising masquerading as news by symes · 2011-03-26 10:19 · Score: 4, Informative

I don't think so this time. This is a reasonably well written formal paper sent for peer review. It is also quite nice to see this published openly.
Strange Descriptions... by Anonymous Coward · 2011-03-26 10:27 · Score: 5, Funny

- "What do you do for a living?"
- "I train trees to make a decision forest that can see human limbs."
- "Ah, I see. Makes sense. (WHAT THE FUCK???)"
1. Re:Strange Descriptions... by Slutticus · 2011-03-26 15:04 · Score: 1
  
  Sounds like an upcoming xkcd strip.
2. Re:Strange Descriptions... by Sal+Zeta · 2011-03-27 03:48 · Score: 1
  
  -"Oh! So, you're the one who writes lyrics for Radiohead, then."
Re:1000-core cluster? by davester666 · 2011-03-26 10:33 · Score: 2

Why would MS rent/buy processor time? They've got the world's biggest botnet, and they even have the suckers pay MS to join it.

--
Sleep your way to a whiter smile...date a dentist!
Need a more descriptive summary by radarsat1 · 2011-03-26 10:34 · Score: 1

From the summary it looks like they are basically using a classifier which they spent a lot of time training, and it works well. This is impressive, but I don't know if it meets the story title's claim of "AI breakthrough", since from the summary it sounds basically like, "researchers used classifier for classifying data and it worked!" Can someone summarize in a little more detail exactly what the "breakthrough" entails, other than basically standard use of classifiers for training on data sets?
1. Re:Need a more descriptive summary by narcc · 2011-03-26 11:13 · Score: 1
  
  TFA says "it is all based on fairly standard classical pattern recognition"
  I'm a science reporter. I just want to clarify your above statement -- Are you saying that this is an unprecedented breakthrough in artificial intelligence research that will lead to "thinking machines" in the next year?
  
  --
  Required reading for internet skeptics
2. Re:Need a more descriptive summary by mikael · 2011-03-26 12:30 · Score: 1, Informative
  
  The function f = d(x + u/d(x)) - d(x + v/d(x)) would calculate the depth gradient of the pixel. It's possible to reconstruct a three-dimensional shape from a 2D image.
  Then your problem is trying to match a human skeleton to the shape. If you know the curvature of the gradient at a particular point, you can eliminate some body parts. A head is mostly spherical and within a particular maximum/minimum, limbs and the torso are more cylindrical with a linear depth along one axis. Look for that linearity, and you could determine that is a limb and what direction it is aligned in.
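  The quoted feature compares the depth at two probe points offset from pixel x, with the offsets u and v divided by the depth d(x) at x so that the same feature probes the same body-relative distance whether the person is near or far from the camera. A minimal sketch, assuming a depth image stored as a list of rows in metres, rounding of the scaled offsets to the nearest pixel, d(x) > 0, and a large constant for probes that land outside the image. These details are assumptions for illustration, not taken from the thread.

```python
from typing import List, Tuple

BACKGROUND = 1e6  # assumed "very far" depth for probes that fall outside the image


def depth_at(image: List[List[float]], row: int, col: int) -> float:
    """Depth lookup that treats out-of-bounds pixels as distant background."""
    if 0 <= row < len(image) and 0 <= col < len(image[0]):
        return image[row][col]
    return BACKGROUND


def depth_feature(
    image: List[List[float]],
    x: Tuple[int, int],
    u: Tuple[float, float],
    v: Tuple[float, float],
) -> float:
    """f = d(x + u/d(x)) - d(x + v/d(x)) for pixel x = (row, col); assumes d(x) > 0."""
    d_x = depth_at(image, *x)
    # Dividing the offsets by d(x) shrinks them for far-away pixels.
    p1 = (round(x[0] + u[0] / d_x), round(x[1] + u[1] / d_x))
    p2 = (round(x[0] + v[0] / d_x), round(x[1] + v[1] / d_x))
    return depth_at(image, *p1) - depth_at(image, *p2)


IMAGE = [
    [2.0, 2.0, 2.0],
    [2.0, 1.0, 3.0],
    [2.0, 2.0, 2.0],
]
print(depth_feature(IMAGE, (1, 1), (0.0, 1.0), (0.0, -1.0)))  # prints 1.0
```

  Each feature is only two lookups and a subtraction, so many of them can be evaluated per pixel per frame.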
  
  --
  Vintage computer adverts: http://www.vintageadbrowser.com/computers-and-software-ads
3. Re:Need a more descriptive summary by marcansoft · 2011-03-26 22:25 · Score: 2
  
  This has nothing to do with reconstructing a depth image from a 2D image. The Kinect is a depth camera and already gives you a real depth image (not a guess).
Focussing on the normal bit by Anonymous Coward · 2011-03-26 10:34 · Score: 1

So they fed an LCS with some sample data? OK, par-for-the-course. I'm far more interested in how they generated those '1 million' pre-labelled test images in the first place.
1. Re:Focussing on the normal bit by hedwards · 2011-03-26 10:50 · Score: 2
  
  The same way that cybercriminals crack captchas, they just offered up a picture of a random boob to a random boob. The real problem was stopping at 1m pictures.
2. Re:Focussing on the normal bit by multipartmixed · 2011-03-26 10:52 · Score: 2
  
  > I'm far more interested in how they generated those '1 million'
  > pre-labelled test images in the first place.
  Snapshots from the webcams attached to computers running Windows.
  
  --
  
  Do daemons dream of electric sleep()?
3. Re:Focussing on the normal bit by gmaslov · 2011-03-26 13:20 · Score: 4, Informative
  So they fed an LCS with some sample data? OK, par-for-the-course. I'm far more interested in how they generated those '1 million' pre-labelled test images in the first place.
  I read the paper; it was clever. They used a standard motion capture setup with their actor(s) going through several hundred different movements. Since their algorithm is stateless, they could analyze the motion and produce many distinct poses from each movement. Each pose was then "retargeted" (a well-known technique in animation) onto many different 3D models of people of varying height, body type, etc., before finally being rendered into a perfectly labeled depth map.
  They went through several iterations of this process:
  
  Train their algorithm on this huge data set
  Notice that it doesn't work so well in some situations
  Have their mo-cap actor(s) produce additional data to cover those situations
  Process the new mo-cap data into however many thousands of additional training poses
  GOTO 10
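  The train / notice failures / capture more / retrain loop described above can be illustrated with a toy simulation. Everything here is a hypothetical stand-in, not the paper's pipeline: a "pose" is a single number in [0, 1), `capture` plays the role of a mo-cap session plus rendering, a 1-nearest-neighbour lookup plays the role of the forest, and the "situations" are ten equal slices of the pose range.

```python
import random
from typing import List, Tuple

Sample = Tuple[float, int]
REGIONS = 10  # hypothetical: ten kinds of movement / situation


def true_label(pose: float) -> int:
    # Stand-in for the perfect labels that rendering a synthetic depth map provides.
    return int(pose * REGIONS) % 2


def capture(rng: random.Random, lo: float, hi: float, n: int) -> List[Sample]:
    # Stand-in for "have the mo-cap actor(s) cover situations in [lo, hi)".
    return [(p, true_label(p)) for p in (rng.uniform(lo, hi) for _ in range(n))]


def predict(model: List[Sample], pose: float) -> int:
    # 1-nearest-neighbour stand-in for the trained classifier.
    return min(model, key=lambda s: abs(s[0] - pose))[1]


def accuracy_by_region(model: List[Sample], rng: random.Random, probes: int = 50) -> List[float]:
    acc = []
    for r in range(REGIONS):
        lo, hi = r / REGIONS, (r + 1) / REGIONS
        tests = [rng.uniform(lo, hi) for _ in range(probes)]
        acc.append(sum(predict(model, p) == true_label(p) for p in tests) / probes)
    return acc


def bootstrap(rounds: int = 5, target: float = 0.95, seed: int = 0) -> List[float]:
    rng = random.Random(seed)
    data = capture(rng, 0.0, 0.3, 40)  # first session covers only part of the pose space
    history = []
    for _ in range(rounds):  # the loop plays the role of "GOTO 10"
        model = list(data)  # train on everything collected so far
        acc = accuracy_by_region(model, rng)  # notice where it doesn't work well
        history.append(sum(acc) / len(acc))
        weak = [r for r, a in enumerate(acc) if a < target]
        if not weak:
            break
        for r in weak:  # capture and process new data for those situations
            data += capture(rng, r / REGIONS, (r + 1) / REGIONS, 20)
    return history


print(bootstrap())
```

  The mean accuracy per round starts low because whole regions of the pose space are missing, then climbs once targeted data fills them in.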
4. Re:Focussing on the normal bit by im_thatoneguy · 2011-03-27 13:10 · Score: 1
  
  What I find really interesting about this approach is that it's machine learning in a virtual environment.
  They essentially taught a game controller how to be a game controller by feeding it virtual players inside of a game.
  I suspect this is how we'll want to train all artificial intelligence agents. Why go through the trouble of building a robotic body for an AI to use when it can simply be provided a virtual world to live and grow in?
  I've also always been curious why more AI research doesn't take place in virtual environments. A prime example is the DARPA driving challenges. You could simulate LIDAR data and stereo camera arrays in a quite photorealistic environment 24/7. Taking it out on the road seems like a formality.
"Almost as impressive"? by Anonymous Coward · 2011-03-26 10:51 · Score: 1

Ummm, all I've seen so far apart from this are pretty obvious uses of the depth sensor.
What Microsoft has done is solved an extremely hard AI problem. Check out the body-part identification. I think more credit is due.
1. Re:"Almost as impressive"? by Anonymous Coward · 2011-03-26 12:28 · Score: 2, Insightful
  
  Hum, no, actually, they just applied a machine learning technique that has been known for years to a huge sample of data, and it worked pretty well.
  From my point of view, there is no major breakthrough but still it's a nice solution.
Impressive. by Chocolate+Teapot · 2011-03-26 10:59 · Score: 4, Funny

Training just three trees using 1 million test images took about a day using a 1000-core cluster
Trees have traditionally been trained in Entish, which although reliable, is such an un-hasty language.

--
Modest doubt is called the beacon of the wise. - William Shakespeare
1. Re:Impressive. by feedayeen · 2011-03-26 11:16 · Score: 1
  
  Training just three trees using 1 million test images took about a day using a 1000-core cluster
  Trees have traditionally been trained in Entish, which although reliable, is such an un-hasty language.
  The 1K-core cluster is mostly needed because it takes such a long time to say anything. Had they gone with a one-core cluster, by day's end the system would have just managed to say 'good morning', having accomplished nothing. Thankfully, with this system they can complete that statement in a thousandth of the time; in other words, it reduced the startup time to 28.8 seconds.
2. Re:Impressive. by TheoMurpse · 2011-03-26 15:24 · Score: 1
  
  640K ought to be enough chlorophyll for anyone.
Very impressive by artor3 · 2011-03-26 11:02 · Score: 1, Flamebait

A lot of the MS-haters on Slashdot tried to write off the Kinect as a nice bit of third-party hardware with a crappy MS-made driver. I wonder how they'll respond to this. Microsoft has really outdone themselves here. I think Penny Arcade put it best. If only they could apply this sort of innovation to their more important products, they'd be back on top in no time.
1. Re:Very impressive by Dr+Max · 2011-03-26 12:27 · Score: 1
  
  I for one welcome the day I no longer have to push analogue sticks around or furiously slide a mouse around a desktop to take out a guy in the latest FPS. Give me a good head- and gun-tracking system and maybe a heads-up display any time. I agree we aren't at nirvana yet, but it won't be long before you're controlling your RTS armies with hand signs from the heavens.
  
  --
  Rocket Surgeon.
2. Re:Very impressive by drinkypoo · 2011-03-26 13:07 · Score: 1
  
  As for the device and things like that in general, just like Eyetoy, these will never, ever replace a controller. Ever.
  I think it's fairly clear that the future involves both approaches, sometimes both in one game. Keeping gamepad support anywhere it is possible to do so keeps the game accessible for as many people as possible, e.g. the disabled. But I really enjoy the fact that the Wii gets me moving around. I imagine I'd enjoy the same thing about Kinect (but my 360's optical drive died and I have been extremely lazy about replacing it. I have all the pieces...)
  
  --
  "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
3. Re:Very impressive by hoytak · 2011-03-26 13:41 · Score: 1
  
  More important products.... yes....
  http://www.telegraph.co.uk/technology/microsoft/8287610/Xbox-Kinect-helps-Microsoft-beat-Wall-Street-profit-forecasts.html
  
  --
  Does having a witty signature really indicate normality?
4. Re:Very impressive by Concerned+Onlooker · 2011-03-26 14:01 · Score: 1
  
  Well, I really like my Mac and I think this is cool. So, stuff that data point in your decision forest. :-)
  
  --
  http://www.rootstrikers.org/
5. Re:Very impressive by SpinyNorman · 2011-03-26 14:31 · Score: 1
  
  Yep, this is certainly a very impressive product. Microsoft has an absolutely world-class research lab and staff, but so far it seems rare for their work to make it into products (as was the case with Xerox PARC).
6. Re:Very impressive by wjsteele · 2011-03-26 14:57 · Score: 1
  
  "IBM's BOCA RATON : created the first PC"
  
  And here all this time I thought the Apple computer was out before the IBM... silly me.
  
  Bill
  
  --
  It's my Sig and you can't have it. Mine! All Mine!
7. Re:Very impressive by Clsid · 2011-03-26 20:39 · Score: 2
  
  Yeah, they make nice products when they face competition, there is no doubt about it. But even then, some of their commercial practices are questionable, and that's where most of the hate comes from. For instance, say you buy an Xbox 360 and a PS3. On the Xbox you have to pay a monthly fee to play online games, whereas on the PS3 it is completely free. If Microsoft were the only player in town in that particular case, we would be in a world of hurt. Luckily, having options pushes Microsoft to do the right thing, even if it doesn't end up doing so most of the time. To me, that's what Linux and free software are really all about: more about options than about being a much superior product.
8. Re:Very impressive by ObsessiveMathsFreak · 2011-03-27 04:25 · Score: 1
  
  The Penny Arcade strip was actually a send up of all the ridiculous hype surrounding the device. It can't actually restore sight to... you know what, never mind. Yeah, the Kinect is the digital manifestation of the second coming. It is the apex of technological development for the human race.
  
  --
  May the Maths Be with you!
Re:1000-core cluster? by woolpert · 2011-03-26 11:03 · Score: 1

Forget the 1000-core cluster. I want to know where I can get 1,000,000 images of people with all the (major) body parts zoned and referenced.
That's an impressive test corpus.
Summary hyperbole by Anonymous Coward · 2011-03-26 11:11 · Score: 1

I haven't thoroughly read the paper yet, but calling this an AI breakthrough is inappropriate for a number of reasons. First, this is an application of machine learning, which is not the same thing as AI. Second, it seems to be a fairly incremental work building on very common techniques--very far from a breakthrough in any respect. If you don't believe me, see some of Jamie Shotton's other work, which is good work, but this is nothing extraordinary in comparison.
1. Re:Summary hyperbole by Jeremi · 2011-03-26 15:13 · Score: 1
  
  First, this is an application of machine learning, which is not the same thing as AI.
  That's the beauty and mystery of AI -- once a technique is actually made to work on computers in the real world, it loses its status as an "AI technique". The AI goalposts automatically move ahead to some other, harder problem that isn't solved yet. Eventually we will have HAL-9000 style computers everywhere, and people will continually piss them off by telling them the reasons they don't count as "real AI".
  
  --
  
  I don't care if it's 90,000 hectares. That lake was not my doing.
2. Re:Summary hyperbole by Anonymous Coward · 2011-03-26 15:37 · Score: 1
  
  Nah. twice, three times, max. then the people will start dying.
3. Re:Summary hyperbole by Needlzor · 2011-03-26 17:52 · Score: 1
  
  As Abe Othman and Ariel Procaccia said: "AI is whatever gets published at AAAI/IJCAI". Best definition of AI yet.
Re:Developed by a 3rd party? by shriphani · 2011-03-26 12:14 · Score: 2

The sensor came from primasense. The algorithms in it are entirely from MSR.
Re:More advertising masquerading as news by Raenex · 2011-03-26 12:18 · Score: 2

It is also quite nice to see this published openly.
And no doubt backed up by a dozen patents.
Kinect's Perspective by AnotherAnonymousUser · 2011-03-26 13:20 · Score: 1

So...it can't see the forest for the limbs?
Re:1000-core cluster? by lrnj · 2011-03-26 14:10 · Score: 1

I would assume they just used an established motion tracking system in parallel with the Kinect sensor input.
At 30 fps, that's about 10 hours of input.
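The arithmetic behind that estimate, under the same assumption that each of the 1 million training images is one captured frame (gmaslov's reply above says the images were instead rendered from retargeted mo-cap poses):

```latex
\frac{10^{6}\ \text{frames}}{30\ \text{frames/s}} \approx 3.3 \times 10^{4}\ \text{s} \approx 9.3\ \text{h}
```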

--
Learn Japanese RPG -- lrnj.com
Summary in a few words by elsJake · 2011-03-26 14:48 · Score: 1

Neural Network / perceptrons.
Re:More advertising masquerading as news by Jeremi · 2011-03-26 15:15 · Score: 3, Insightful

And no doubt backed up by a dozen patents.
Of course. That's the purpose of patents, to encourage inventors to publish their inventions openly.

--

I don't care if it's 90,000 hectares. That lake was not my doing.
Re:More advertising masquerading as news by Raenex · 2011-03-26 19:46 · Score: 1

I'd rather they kept their secrets and let somebody else figure it out than be granted a monopoly on an idea.
Re:Developed by a 3rd party? by shriphani · 2011-03-26 19:52 · Score: 1

Sorry I misspelled the name there. The company is PrimeSense. Here's where I see the paper beating the OpenNI SDK - 200 fps on consumer grade hardware. This is just what the paper claims it is - a simple machine learning technique that when applied correctly produced very good results and allowed them to launch a highly successful peripheral.

Looks like M$ is just appropriating third party research.

Splendid. PrimeSense are not complaining about this paper, but you accuse MSR of stealing work?
TFA makes it sound like they're cheating by L4z4ru5 · 2011-03-26 21:16 · Score: 2

"[..] the decision trees were modified until they gave the correct classification for a particular body part across the test set of images"
This is called cheating in machine learning (you are not allowed to modify your model(s) based on the results on the test set).
And of course it is not what they do.
Nice piece of work, though IMHO not an AI breakthrough.
1. Re:TFA makes it sound like they're cheating by flyingkillerrobots · 2011-03-27 04:45 · Score: 1
  
  That's what a tuning set is for. Do you trust the summaries here to give a perfect description of what is going on?
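  The distinction drawn here (fit on training data, tune on a separate validation or "tuning" set, score the test set once at the end) can be sketched as below. The 1-D k-nearest-neighbour classifier, the split fractions, and the toy data are hypothetical choices for illustration; the thread does not say how the paper's authors split their data.

```python
import random
from typing import List, Sequence, Tuple

Sample = Tuple[float, int]


def split(data: Sequence, rng: random.Random, val_frac: float = 0.2, test_frac: float = 0.2):
    """Shuffle once, then carve out disjoint train / validation / test sets."""
    items = list(data)
    rng.shuffle(items)
    n_test = int(len(items) * test_frac)
    n_val = int(len(items) * val_frac)
    test = items[:n_test]
    val = items[n_test:n_test + n_val]
    train = items[n_test + n_val:]
    return train, val, test


def knn_predict(train: List[Sample], x: float, k: int) -> int:
    nearest = sorted(train, key=lambda s: abs(s[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > k else 0


def accuracy(train: List[Sample], data: List[Sample], k: int) -> float:
    return sum(knn_predict(train, x, k) == y for x, y in data) / len(data)


def tune_then_test(data: List[Sample], ks=(1, 3, 5, 7), seed: int = 0):
    rng = random.Random(seed)
    train, val, test = split(data, rng)
    best_k = max(ks, key=lambda k: accuracy(train, val, k))  # tuning looks at val only
    return best_k, accuracy(train, test, best_k)  # test is scored exactly once


DATA = [(i / 100, int(i >= 50)) for i in range(100)]
best_k, test_acc = tune_then_test(DATA)
print(best_k, test_acc)
```

  Because the test set never influences the choice of `k`, its score remains an honest estimate of performance on unseen data.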
  
  --
  "It is a good thing for an uneducated man to read books of quotations..." -Winston Churchill
Re:Developed by a 3rd party? by marcansoft · 2011-03-26 22:34 · Score: 1

PrimeSense developed the sensor technology (hardware and firmware) that gives you a depth image. Microsoft took that depth image and created the algorithms that perform body tracking (software).
PrimeSense also have their own body tracking solution (they call it NITE), but it's based on an entirely different concept and requires a calibration pose to "lock in" initially. Microsoft doesn't use NITE.
Decision tree my a$$ by sundru · 2011-03-27 00:44 · Score: 1

The method they are using is called Haar cascades, proposed by Viola and Jones. I have been using the same thing with OpenCV for a while now. http://en.wikipedia.org/wiki/Haar-like_features It basically passes an image through progressive classifiers to get a final match weight. Microsoft may have done the training to generate the classifiers, but the method has been around for a while. "Decision tree".... Pfffft.
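For comparison with the method named here: Viola-Jones detectors evaluate Haar-like features, which are differences of rectangle sums computed in constant time from an integral image, and chain boosted classifiers into a cascade. A minimal sketch of the integral image and a two-rectangle feature follows; the function names are hypothetical, and OpenCV ships its own implementation. The feature quoted earlier in the thread, f = d(x + u/d(x)) - d(x + v/d(x)), is a different construction: a comparison of two depth samples rather than of rectangle sums.

```python
from typing import List


def integral_image(img: List[List[float]]) -> List[List[float]]:
    """ii[r][c] = sum of img[0..r-1][0..c-1], padded with a leading zero row/column."""
    rows, cols = len(img), len(img[0])
    ii = [[0.0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        for c in range(cols):
            ii[r + 1][c + 1] = img[r][c] + ii[r][c + 1] + ii[r + 1][c] - ii[r][c]
    return ii


def rect_sum(ii: List[List[float]], top: int, left: int, height: int, width: int) -> float:
    """Sum of any axis-aligned rectangle with four lookups, regardless of its size."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def two_rect_feature(ii: List[List[float]], top: int, left: int, height: int, width: int) -> float:
    """Left half minus right half of a height x width window (width assumed even)."""
    half = width // 2
    return rect_sum(ii, top, left, height, half) - rect_sum(ii, top, left + half, height, half)


EDGE = [[9, 9, 1, 1],
        [9, 9, 1, 1]]
print(two_rect_feature(integral_image(EDGE), 0, 0, 2, 4))  # prints 32.0
```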
Re:Why? by im_thatoneguy · 2011-03-27 13:20 · Score: 1

"Us too!"?
Well it's hard for them to do stuff like this in all departments when you don't acknowledge all the other times that they offer innovative or superior products.
WP7 is, in my opinion, a far better thought-out operating system from a user standpoint than any of the alternatives. So if by "Me Too!" you mean they released a great rewrite of their product, which has been on the market longer than either Android or iOS, then yes, they too continued innovating. WinMo got sucky, but when it was released it was pretty amazing. The problem was that A) capacitive touchscreens were prohibitively expensive, so styluses were the only useful input device, and B) data plans were prohibitively expensive and painfully slow.
They should have started preparing for the day when finger input would be useful and data plans would be accessible sooner but they eventually got caught up.
Zune would be another example where Microsoft was both releasing tech before the iPod and, with the Zune, still offering a superior product. The fact that it didn't sell well had far less to do with it being a bad product than with it not having the brand recognition the iPod had when it launched.
Re:Operating system? by aled · 2011-03-27 14:18 · Score: 1

I don't know if there is a version of Windows with support for more than 256 logical processors (whatever that means). http://www.microsoft.com/windowsserver2008/en/us/r2-scalability-reliability.aspx

--

"I think this line is mostly filler"
Re:More advertising masquerading as news by X0563511 · 2011-03-28 03:22 · Score: 1

Ah, so you want shorter patent terms and non-ridiculous licensing costs.
Yell at the government regarding the former, and yell at the sellers regarding the latter.

--
For large sets, this will be our guide even unto death, for the LORD will work for each type of data it is applied to...
Re:Thanks for NOTHING, Microsoft! by Neil+Boekend · 2011-03-30 00:34 · Score: 1
There are 2 solutions:
1. Don't play naked.
2. Don't lie about your penis size
--
Well, I might have a way, but it only works on a semi spherical planet in a vacuum.