Slashdot Mirror


Kinect's AI Breakthrough Explained

mikejuk writes "Microsoft Research has just published a scientific paper (PDF) and a video showing how the Kinect body tracking algorithm works — it's almost as impressive as some of the uses the Kinect has been put to. This article summarizes how Kinect does it. Quoting: '... What the team did next was to train a type of classifier called a decision forest, i.e. a collection of decision trees. Each tree was trained on a set of features on depth images that were pre-labeled with the target body parts. That is, the decision trees were modified until they gave the correct classification for a particular body part across the test set of images. Training just three trees using 1 million test images took about a day using a 1000-core cluster.'"

19 of 97 comments

  1. Sounds like vision, all right by liquiddark · · Score: 4, Interesting

    Layered classification nets have always struck me as the right approach, particularly as we learn more about how human senses work - it seems like a lot of our "thinking" is done much closer to our sense organs than we might have once imagined. Interesting that the less "organic" type, decision trees, were used rather than neural nets. One wonders if maybe it was more a matter of ease of phrasing/training/debugging than of classification itself that decided which type to use.

    1. Re:Sounds like vision, all right by hoytak · · Score: 4, Insightful

      Random forests have always been a nice classifier to use when working with really wacky data types. This is due in part to how easy it is to customize them; a lot of the ways they can be tweaked and tuned and customized have fairly intuitive effects on the outcome and behavior of the classifier. In my experience, while neural nets can also be pretty powerful, they are often much harder to work with as the parameters you have for tweaking can be really non-intuitive. We sometimes joke about neural nets being "black magic" because the training and tweaking can be really uninterpretable.

      However, the biggest reason random forests were used is probably because they are extremely fast on current chips, probably a couple orders of magnitude faster than neural nets when the trees are hard coded.
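
To illustrate the speed point above: a hard-coded tree boils down to a few nested comparisons, so evaluating a small forest costs only a handful of branches per input. The sketch below is hypothetical; the feature values, thresholds, body-part labels, and the majority-vote rule are all made up for illustration and are not taken from the paper, which may combine its trees differently.

```python
from collections import Counter

# Hypothetical hard-coded trees: each one is just nested comparisons
# on a small feature vector f, so there is no tree structure to walk
# at run time.
def tree_a(f):
    if f[0] < 0.5:
        return "hand" if f[1] < 0.2 else "arm"
    return "torso"

def tree_b(f):
    if f[1] < 0.3:
        return "hand"
    return "arm" if f[0] < 0.6 else "torso"

def tree_c(f):
    if f[2] > 0.7:
        return "torso"
    return "hand" if f[0] < 0.4 else "arm"

FOREST = (tree_a, tree_b, tree_c)

def classify(features):
    """Return the label that most trees in the forest vote for."""
    votes = Counter(tree(features) for tree in FOREST)
    return votes.most_common(1)[0][0]
```

Calling `classify([0.1, 0.1, 0.0])` runs three short chains of comparisons and returns the label that most trees agree on.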

      --
      Does having a witty signature really indicate normality?
    2. Re:Sounds like vision, all right by Game_Ender · · Score: 4, Interesting

      Yep, it's not exactly an AI breakthrough but it's really cool to see a practical application of machine learning in the consumer arena.

    3. Re:Sounds like vision, all right by Twinbee · · Score: 4, Interesting

      Yes, now all they need to do is fix the lag which can be quite high, maybe even 200ms:
      http://www.youtube.com/watch?v=weZOjotbuSU

      Something really low like 16ms or better is needed so that we don't notice, according to this article:
      http://www.sussex.ac.uk/Users/km3/hfes.pdf

      --
      Why OpalCalc is the best Windows calc
    4. Re:Sounds like vision, all right by hvm2hvm · · Score: 2

      Exactly... sometimes "good enough" is better than "it should work in theory but we don't have the required hardware/algorithmic/whatever capabilities yet". It probably won't work perfectly in some cases but for most applications it's great.

      I really think AI will be created in the same way. Once in a while a need appears for an AI-related task and someone finds a "good enough" solution. In time, someone will need a robot to have a serious conversation with and there will be enough knowledge lying around that it will be easy to create that "good enough" solution.

      It will be like with the Kinect and Wii: no one will expect what comes out of it, but everyone will think "hm, they should have done that years ago".

      --
      ics
  2. Re:More advertising masquerading as news by symes · · Score: 4, Informative

    I don't think so this time. This is a reasonably well written formal paper sent for peer review. It is also quite nice to see this published openly.

  3. Strange Descriptions... by Anonymous Coward · · Score: 5, Funny

    - "What do you do for a living?"

    - "I train trees to make a decision forest that can see human limbs."

    - "Ah, I see. Makes sense. (WHAT THE FUCK???)"

  4. Re:1000-core cluster? by davester666 · · Score: 2

    Why would MS rent/buy processor time? They've got the world's biggest botnet, and they even have the suckers pay MS to join it.

    --
    Sleep your way to a whiter smile...date a dentist!
  5. Re:Focussing on the normal bit by hedwards · · Score: 2

    The same way that cybercriminals crack captchas, they just offered up a picture of a random boob to a random boob. The real problem was stopping at 1m pictures.

  6. Re:Focussing on the normal bit by multipartmixed · · Score: 2

    > I'm far more interested in how they generated those '1 million'
    > pre-labelled test images in the first place.

    Snapshots from the webcams attached to computers running Windows.

    --

    Do daemons dream of electric sleep()?
  7. Impressive. by Chocolate+Teapot · · Score: 4, Funny

    Training just three trees using 1 million test images took about a day using a 1000-core cluster

    Trees have traditionally been trained in Entish, which although reliable, is such an un-hasty language.

    --
    Modest doubt is called the beacon of the wise. - William Shakespeare
  8. Re:Developed by a 3rd party? by shriphani · · Score: 2

    The sensor came from PrimeSense. The algorithms in it are entirely from MSR.

  9. Re:More advertising masquerading as news by Raenex · · Score: 2

    It is also quite nice to see this published openly.

    And no doubt backed up by a dozen patents.

  10. Re:"Almost as impressive"? by Anonymous Coward · · Score: 2, Insightful

    Hum, no, actually, they just used a machine learning technique that has been known for years on a huge sample of data and it worked pretty well.
    From my point of view, there is no major breakthrough, but it's still a nice solution.

  11. Re:Focussing on the normal bit by gmaslov · · Score: 4, Informative

    So they fed an LCS with some sample data? OK, par-for-the-course. I'm far more interested in how they generated those '1 million' pre-labelled test images in the first place.

    I read the paper; it was clever. They used a standard motion capture setup with their actor(s) going through several hundred different movements. Since their algorithm is stateless, they could analyze the motion and produce many distinct poses from each movement. Each pose was then "retargeted" (a well-known technique in animation) onto many different 3D models of people of varying height, body type, etc., before finally being rendered into a perfectly labeled depth map.

    They went through several iterations of this process:

    1. Train their algorithm on this huge data set
    2. Notice that it doesn't work so well in some situations
    3. Have their mo-cap actor(s) produce additional data to cover those situations
    4. Process the new mo-cap data into however many thousands of additional training poses
    5. GOTO 10
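
The iteration described above can be sketched as a simple loop. This is only an illustration: `refine` and the toy stand-ins for training, failure detection, and mo-cap capture are hypothetical names and not anything from the paper.

```python
def refine(dataset, train, find_failures, capture_more, max_rounds=10):
    """Repeat train -> inspect failures -> add data until nothing fails."""
    model = train(dataset)
    for _ in range(max_rounds):
        failures = find_failures(model)
        if not failures:
            break
        # Add new samples covering the failing situations, then retrain.
        dataset = dataset + capture_more(failures)
        model = train(dataset)
    return model, dataset

# Toy stand-ins (assumptions): the "model" is just the set of
# situations it has seen, and it fails on any situation it has not.
SITUATIONS = ["standing", "sitting", "crouching"]

def toy_train(dataset):
    return set(dataset)

def toy_find_failures(model):
    return [s for s in SITUATIONS if s not in model]

def toy_capture_more(failures):
    return list(failures)

model, data = refine(["standing"], toy_train, toy_find_failures, toy_capture_more)
```

The `max_rounds` cap is a design choice of the sketch, so that the loop ends even if some situations never stop failing.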
  12. Re:More advertising masquerading as news by Jeremi · · Score: 3, Insightful

    And no doubt backed up by a dozen patents.

    Of course. That's the purpose of patents, to encourage inventors to publish their inventions openly.

    --


    I don't care if it's 90,000 hectares. That lake was not my doing.
  13. Re:Very impressive by Clsid · · Score: 2

    Yeah, they make nice products when they face competition, there is no doubt about it. But even then, some of the commercial practices are questionable and that's where most of the hate comes from. For instance, you buy an XBox360 and a PS3. On the XBox you have to pay a monthly fee to play online games, whereas on the PS3 it is completely free. If Microsoft were the only player in town in that particular case, we would be in a world of hurt. Luckily, having options pushes Microsoft to do the right thing, even if it doesn't end up doing so most of the time. To me, that's what Linux and free software are really all about: having options, rather than being a much superior product.

  14. TFA makes it sound like they're cheating by L4z4ru5 · · Score: 2

    "[..] the decision trees were modified until they gave the correct classification for a particular body part across the test set of images"

    this is called cheating in machine learning (you are not allowed to modify your model(s) based on the results on the test set).
    and of course it is not what they do.

    nice piece of work, tho IMHO not an AI breakthrough.
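
For readers unfamiliar with the rule referred to above, here is a minimal sketch of the usual discipline: fit on a training split, choose between models on a validation split, and touch the test split only once at the end. The threshold "model", the toy data, and the function names are assumptions made for illustration.

```python
import random

def split(rows, n_train, n_val, seed=0):
    """Shuffle once, then cut into disjoint train / validation / test parts."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    return rows[:n_train], rows[n_train:n_train + n_val], rows[n_train + n_val:]

def accuracy(threshold, rows):
    """Fraction of (x, label) rows where 'x >= threshold' matches the label."""
    return sum((x >= threshold) == label for x, label in rows) / len(rows)

# Toy data: the label is True exactly when x >= 0.5.
data = [(i / 100, i >= 50) for i in range(100)]
train, val, test = split(data, n_train=60, n_val=20)

# Candidate thresholds come from the training split only.
candidates = sorted(x for x, _ in train)
# Model selection uses the validation split, never the test split.
best = max(candidates, key=lambda t: accuracy(t, val))
# The test split is used once, to report the final number.
test_accuracy = accuracy(best, test)
```

Picking `best` by its accuracy on `test` instead of `val` would be the "cheating" described above, because the reported score would no longer be measured on unseen data.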

  15. Re:Need a more descriptive summary by marcansoft · · Score: 2

    This has nothing to do with reconstructing a depth image from a 2D image. The Kinect is a depth camera and already gives you a real depth image (not a guess).