Slashdot Mirror


Kinect's AI Breakthrough Explained

mikejuk writes "Microsoft Research has just published a scientific paper (PDF) and a video showing how the Kinect body tracking algorithm works — it's almost as impressive as some of the uses the Kinect has been put to. This article summarizes how Kinect does it. Quoting: '... What the team did next was to train a type of classifier called a decision forest, i.e. a collection of decision trees. Each tree was trained on a set of features on depth images that were pre-labeled with the target body parts. That is, the decision trees were modified until they gave the correct classification for a particular body part across the test set of images. Training just three trees using 1 million test images took about a day using a 1000-core cluster.'"
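For anyone who wants the "collection of decision trees" idea in concrete form, here is a minimal sketch of how a forest could label one pixel of a depth image by majority vote. Everything in it is an assumption made for illustration: the toy depth grid, the depth-difference feature, and the three hand-written trees (nothing here was trained, and none of it is taken from the paper).

```python
from collections import Counter

# Toy depth image: distances in metres (hypothetical data, not Kinect output).
DEPTH = [
    [4.0, 4.0, 4.0, 4.0],
    [4.0, 2.0, 2.0, 4.0],
    [4.0, 2.0, 2.0, 4.0],
    [4.0, 2.1, 2.1, 4.0],
]


def depth_at(img, row, col, background=10.0):
    """Depth at (row, col); pixels outside the image count as far background."""
    if 0 <= row < len(img) and 0 <= col < len(img[0]):
        return img[row][col]
    return background


def feature(img, row, col, offset):
    """Assumed feature: depth of a nearby pixel minus depth of this pixel."""
    d_row, d_col = offset
    return depth_at(img, row + d_row, col + d_col) - depth_at(img, row, col)


# A tree is either a leaf label (str) or a tuple
# (offset, threshold, subtree_if_below, subtree_if_not_below).
# These trees are written by hand purely for illustration.
TREE_A = ((-1, 0), 1.0, "torso", "head")
TREE_B = ((0, -1), 1.0, ((-1, 0), 1.0, "torso", "head"), "arm")
TREE_C = ((1, 0), 1.0, "torso", "feet")
FOREST = [TREE_A, TREE_B, TREE_C]


def classify_pixel(tree, img, row, col):
    """Walk one tree from the root to a leaf and return the leaf's label."""
    while not isinstance(tree, str):
        offset, threshold, below, not_below = tree
        tree = below if feature(img, row, col, offset) < threshold else not_below
    return tree


def forest_vote(forest, img, row, col):
    """Each tree votes for a body part; the most common label wins."""
    votes = Counter(classify_pixel(tree, img, row, col) for tree in forest)
    return votes.most_common(1)[0][0]


print(forest_vote(FOREST, DEPTH, 2, 1))
print(forest_vote(FOREST, DEPTH, 1, 2))
```

Training would mean choosing the offsets and thresholds automatically from labeled images; the sketch only covers the classification step.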

9 of 97 comments

  1. Sounds like vision, all right by liquiddark · · Score: 4, Interesting

    Layered classification nets have always struck me as the right approach, particularly as we learn more about how human senses work - it seems like a lot of our "thinking" is done much closer to our sense organs than we might have once imagined. Interesting that the less "organic" type, decision trees, was used rather than neural nets. One wonders if it was more a matter of ease of phrasing/training/debugging than of classification itself that decided which type to use.

    1. Re:Sounds like vision, all right by hoytak · · Score: 4, Insightful

      Random forests have always been a nice classifier to use when working with really wacky data types. This is due in part to how easy it is to customize them; a lot of the ways they can be tweaked and tuned and customized have fairly intuitive effects on the outcome and behavior of the classifier. In my experience, while neural nets can also be pretty powerful, they are often much harder to work with as the parameters you have for tweaking can be really non-intuitive. We sometimes joke about neural nets being "black magic" because the training and tweaking can be really uninterpretable.

      However, the biggest reason random forests were used is probably because they are extremely fast on current chips, probably a couple orders of magnitude faster than neural nets when the trees are hard coded.
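To show what "hard coded" means here, a rough sketch: the same small tree written once as data walked by a generic loop, and once as plain nested comparisons. The feature indices, thresholds and labels are made up for illustration. Python will not demonstrate the speed gap; the point is that the second form is nothing but compares and branches, which a compiled language can turn into a handful of instructions per tree.

```python
# One small tree, stored as data: (feature_index, threshold, below, not_below).
# All numbers and labels are hypothetical.
TREE = (0, 0.5, (1, 0.2, "hand", "arm"), (2, 0.7, "torso", "head"))


def classify_generic(tree, x):
    """Generic traversal: follow the tuples until a leaf label is reached."""
    while not isinstance(tree, str):
        index, threshold, below, not_below = tree
        tree = below if x[index] < threshold else not_below
    return tree


def classify_hardcoded(x):
    """The same tree unrolled into plain comparisons, with no tree structure at run time."""
    if x[0] < 0.5:
        if x[1] < 0.2:
            return "hand"
        return "arm"
    if x[2] < 0.7:
        return "torso"
    return "head"


SAMPLES = [
    (0.1, 0.1, 0.0),
    (0.1, 0.9, 0.0),
    (0.9, 0.0, 0.3),
    (0.9, 0.0, 0.9),
]

for sample in SAMPLES:
    print(sample, classify_generic(TREE, sample), classify_hardcoded(sample))
```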

      --
      Does having a witty signature really indicate normality?
    2. Re:Sounds like vision, all right by Game_Ender · · Score: 4, Interesting

      Yep, it's not exactly an AI breakthrough, but it's really cool to see a practical application of machine learning in the consumer arena.

    3. Re:Sounds like vision, all right by Twinbee · · Score: 4, Interesting

      Yes, now all they need to do is fix the lag which can be quite high, maybe even 200ms:
      http://www.youtube.com/watch?v=weZOjotbuSU

      Something really low like 16ms or better is needed so that we don't notice, according to this article:
      http://www.sussex.ac.uk/Users/km3/hfes.pdf

      --
      Why OpalCalc is the best Windows calc
  2. Re:More advertising masquerading as news by symes · · Score: 4, Informative

    I don't think so this time. This is a reasonably well written formal paper sent for peer review. It is also quite nice to see this published openly.

  3. Strange Descriptions... by Anonymous Coward · · Score: 5, Funny

    - "What do you do for a living?"

    - "I train trees to make a decision forest that can see human limbs."

    - "Ah, I see. Makes sense. (WHAT THE FUCK???)"

  4. Impressive. by Chocolate+Teapot · · Score: 4, Funny

    Training just three trees using 1 million test images took about a day using a 1000-core cluster

    Trees have traditionally been trained in Entish, which although reliable, is such an un-hasty language.

    --
    Modest doubt is called the beacon of the wise. - William Shakespeare
  5. Re:Focussing on the normal bit by gmaslov · · Score: 4, Informative

    So they fed an LCS with some sample data? OK, par-for-the-course. I'm far more interested in how they generated those '1 million' pre-labelled test images in the first place.

    I read the paper; it was clever. They used a standard motion capture setup with their actor(s) going through several hundred different movements. Since their algorithm is stateless, they could analyze the motion and produce many distinct poses from each movement. Each pose was then "retargeted" (a well-known technique in animation) onto many different 3D models of people of varying height, body type, etc., before finally being rendered into a perfectly labeled depth map.

    They went through several iterations of this process:

    1. Train their algorithm on this huge data set
    2. Notice that it doesn't work so well in some situations
    3. Have their mo-cap actor(s) produce additional data to cover those situations
    4. Process the new mo-cap data into however many thousands of additional training poses
    5. GOTO 10
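The loop in the list above, as a toy sketch. Every name here is hypothetical, and the "model" is only a stand-in that remembers which situations it has seen examples of; it is not the paper's classifier. The only thing it is meant to show is the control flow: train, find what fails, capture more data for those cases, repeat.

```python
def train(dataset):
    """Stand-in 'model': the set of situations that have training examples."""
    return {situation for situation, _pose in dataset}


def failing_situations(model, situations):
    """Situations the stand-in model has no examples for."""
    return [s for s in situations if s not in model]


def capture_more(situations, poses_per_situation=3):
    """Pretend mo-cap session: a few new labeled poses per failing situation."""
    return [
        (situation, f"{situation}-pose-{i}")
        for situation in situations
        for i in range(poses_per_situation)
    ]


def refine(dataset, field_situations, max_rounds=5):
    """Train, look for failures, add data for them, and go around again."""
    for round_number in range(1, max_rounds + 1):
        model = train(dataset)
        failures = failing_situations(model, field_situations)
        if not failures:
            return model, dataset, round_number
        dataset = dataset + capture_more(failures)
    return train(dataset), dataset, max_rounds


initial = [("standing", "standing-pose-0")]
field = ["standing", "crouching", "arms-crossed"]
model, dataset, rounds = refine(initial, field)
print(rounds, len(dataset), sorted(model))
```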
  6. Re:More advertising masquerading as news by Jeremi · · Score: 3, Insightful

    And no doubt backed up by a dozen patents.

    Of course. That's the purpose of patents, to encourage inventors to publish their inventions openly.

    --
    I don't care if it's 90,000 hectares. That lake was not my doing.