Predator Outdoes Kinect At Object Recognition

Posted by timothy on Thursday April 14, 2011 @06:23AM from the think-well-inside-the-box dept.

mikejuk writes "A real breakthrough in AI allows a simple video camera and almost any machine to track objects in its view. All you have to do is draw a box around the object you want to track and the software learns what it looks like at different angles and under different lighting conditions as it tracks it. This means no training phase: you show it the object and it tracks it. And it seems to work really well! The really good news is that the software has been released as open source so we can all try it out! This is how AI should work."

Wow, what a great idea. by Anon-Admin · 2011-04-14 06:32 · Score: 2

1) Integrate this with a physical tracking system to move the camera to follow the target. 2) A simple program to actuate a solenoid when on target. 3) Add gun 4) train with photo 5) leave somewhere days before target arrives. 6) Profit
1. Re:Wow, what a great idea. by bill_mcgonigle · 2011-04-14 07:02 · Score: 2
  
  Why bother with all this when a bluetooth (cell phone) listener with a range weapon is so much less complex?
  
  --
  My God, it's Full of Source!
  OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
Very nice. by Animats · 2011-04-14 06:36 · Score: 2

Very nice.
There are other systems which do this, though. This looks like an improvement on the LK tracker in OpenCV.
This could be used to handle focus follow in video cameras. Many newer video cameras recognize faces as focus targets, but don't stay locked onto the same face. A better lock-on mechanism would help.
Re:Um by SorcererX · 2011-04-14 06:39 · Score: 4, Informative

The Kinect doesn't have stereo cameras. It has one color camera which isn't really used for much, an IR projector (that projects IR dots all over the scene) and an IR camera. The IR camera uses the pixel distance between the dots to find the distance. The depth image you then get is used as input to the algorithm that detects the body parts and their orientation etc.

--
Any sufficiently advanced technology is indistinguishable from magic.
Re:I for one.. by Ruke · 2011-04-14 06:41 · Score: 4, Interesting

I'm looking forward to looking at the GPL'd source code. There are a lot of ways to do object tracking, and they've all generally got problems, but I was rather impressed with this presentation. It was able to track the moving vehicle while it passed into and out of shadows (non-uniform saturation), as well as track that panda while it turned around (changing its shape), and it was able to distinguish a black-and-white version of the presenter's face (not based on color). It was able to recognize objects that moved off screen, which seems to indicate that it's not just drawing a snake around the moving object. Furthermore, it doesn't seem to need to be specifically programmed to track each object (as we saw the presenter just drag-and-drop a box around his hand/face.)
Why still fooling with ONE camera? by ackthpt · 2011-04-14 06:44 · Score: 2

Shouldn't we be developing AI to use two? I mean, we have two eyes (most of us, condolences to those who do not, no disrespect intended) and we recognize objects, depth of field and rates of change within three dimensions, using them.

--

A feeling of having made the same mistake before: Deja Foobar
1. Re:Why still fooling with ONE camera? by Ruke · 2011-04-14 06:57 · Score: 2
  
  The only thing two cameras really net you is more reliable depth perception; however, this requires regular calibration, as minute shifts in the cameras (say, from being jostled around while moving) can translate to large errors if your focal points aren't exactly where you think they are. It's often easier to track movement using the change in size of your object, and have a separate specialized depth sensor (sonar, laser, etc.) to perform depth measurements when you need them to be exact.
2. Re:Why still fooling with ONE camera? by smelch · 2011-04-14 07:05 · Score: 2
  
  And a glossary.
  
  --
  If I can just reach out with my words and touch a butthole, just one, it will all be worth it.
3. Re:Why still fooling with ONE camera? by JanneM · 2011-04-14 10:26 · Score: 2
  
  "99% of living creatures have a pair of eyes."
  Most of those eyes - flies included - are not used for stereoscopic perception. They have two eyes because one eye typically covers less than half the visual field. Most animals' eyes are pointed away from each other, with very little or no visual overlap anywhere.
  Depth perception mostly does not need stereoscopy. If it did, one-eyed people would hardly be able to walk or feed themselves, never mind drive a car or other things.
  Stereovision is good mainly for precise location within, oh, a few meters, in the case of primates. The kind of distance and precision you'd need to move between tree branches. For longer distances, or for less accuracy, other cues are sufficient.
  
  --
  Trust the Computer. The Computer is your friend.
Not a breakthrough by Dachannien · 2011-04-14 06:51 · Score: 4, Informative

This isn't a breakthrough. Much of the technology for tracking objects in this way has been out for about a decade. See this Wikipedia article for one technique for doing this:
http://en.wikipedia.org/wiki/Scale-invariant_feature_transform
1. Re:Not a breakthrough by immakiku · 2011-04-14 06:54 · Score: 2
  
  I think the breakthrough is the speed improvements to do this in real time on reasonable commodity hardware?
2. Re:Not a breakthrough by bughunter · 2011-04-14 07:23 · Score: 3, Funny
  
  Indeed. I've worked on some military programs that track and intercept, umm... things... for various purposes... that use this very same image-based tracking algorithm. But instead of painting a red dot or drawing trails, it steers a, umm... vehicle... that... uh... delivers candy.
  Yea. Candy.
  Euphemism aside, he's done a very nice job of integrating it with commercial hardware and software. It's still impressive.
  
  --
  I can see the fnords!
3. Re:Not a breakthrough by durrr · 2011-04-14 08:45 · Score: 3, Insightful
  
  Unless you're a ballistic missile or insurgent you're likely to never see anything of those military systems. If we invent a matter replicator and only use it for creating delicious topping for ice cream it still wouldn't be as much of a waste as the military hush-hush superadvanced fancypants-never-for-good-use systems.
Re:Um by marcansoft · 2011-04-14 06:52 · Score: 2

Not the distance between dots. The camera sees exactly the same dot density regardless of depth because the projector and the camera are on the same plane (it doesn't matter if the surface is near or far, since dots will have the same angular distance when viewed from the camera). What it does measure is horizontal displacement vs. a reference image. This works because the camera and the projector are horizontally offset.
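
A minimal sketch of that displacement-to-depth step, assuming a plain pinhole triangulation model. The function name, the sign convention and the example numbers are illustrative assumptions, not Kinect's actual firmware or calibration values.

```python
def depth_from_shift(shift_px, focal_px, baseline_m, ref_depth_m):
    """Estimate depth from a dot's horizontal shift against a reference image.

    Assumptions (hypothetical, for illustration only):
    - the reference image was captured of a flat plane at ref_depth_m;
    - a positive shift_px means the surface is nearer than that plane;
    - focal_px is the IR camera focal length in pixels and baseline_m is
      the horizontal projector-to-camera offset in meters.
    """
    # Triangulation: shift = focal * baseline * (1/Z - 1/Z_ref), solved for Z.
    inv_depth = 1.0 / ref_depth_m + shift_px / (focal_px * baseline_m)
    if inv_depth <= 0:
        raise ValueError("shift implies a surface at or beyond infinity")
    return 1.0 / inv_depth
```

With zero shift the function returns the reference depth, and larger positive shifts map to nearer surfaces, which is why a horizontal offset between projector and camera is needed at all.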
Re:State of the art? Yes. Breakthrough? No. by countertrolling · 2011-04-14 06:58 · Score: 2

It's a breakthrough in the price of AI..

--
For justice, we must go to Don Corleone
Re:What should I name my video tracking technology by PReDiToR · 2011-04-14 07:00 · Score: 2

Bloody stupid name if you ask me ...

I mean they even spelled it properly and everything.

--

Do not meddle in the affairs of geeks for they are subtle and quick to anger
Nothing new or great by Anonymous Coward · 2011-04-14 07:03 · Score: 2, Interesting

As a person who does research on object tracking on a daily basis, and who has seen implementations and performances of many trackers (including this one) on real-world problems (including gaming), I can say this is nowhere near a new approach, nor one that outperforms many others published at recent computer vision conferences.
From TFA:
"It is true that it isn't a complete body tracker, but extending it to do this shouldn't be difficult."
Going from this to body tracking is a HUGE step; it's not really an easy thing to do. I don't know why there is such strange hype around this one. It's coming up on many websites, while, as I said, not being a great tracker.
Re:Great video, but by hotkey · 2011-04-14 07:14 · Score: 5, Informative

who can't spell
I guess you're fluent in Czech?
Re:No training phase? by ifrag · 2011-04-14 07:16 · Score: 2

I think what was meant is no independent training phase. The training is in parallel with actual use.

--
Fear is the mind killer.
Re:I for one.. by Anonymous Coward · 2011-04-14 08:00 · Score: 2, Funny

Yeah, well what would it recognise *this* as?!
Re:I for one.. by LingNoi · 2011-04-14 08:06 · Score: 4, Informative

The source code was already released. https://github.com/zk00006/OpenTLD
There are a few more repos here.. http://www.google.co.th/#q=site:github.com+%22TLD+is+an+algorithm+for+tracking+of+unknown+objects%22&hl=en&filter=0
Re:I for one.. by webmistressrachel · 2011-04-14 08:19 · Score: 2

Warning: Goatse ahead.
And yes, it could probably be adapted to scan links on pages being viewed in a browser for similar images to goats, and color all goatse trolls red, eliminating the need for posts like this one...

--
This tagline was transcoded to result in at least one smirk. If you experience failure to smirk, please consult your Gen
Re:Um by deapbluesea · 2011-04-14 08:28 · Score: 4, Interesting

it should be no problem to track individual limbs to generate a skeleton of the user
I'm not so sure about that. He is using a tracking algorithm paired with a template matching algorithm. His claim is that, although both methods have high error rates, their errors are mostly orthogonal to each other. In other words, one method works better sometimes, the other method works better sometimes, and combined, they do a pretty good job. In his videos he's left out scenes where there is a large area of near constant intensity. I'm curious how his method deals with this, as there aren't enough details to track, nor are there enough features to template match. Also, with arms and legs, if the texture is generally the same between the two (say you are wearing sweatpants and a sweatshirt of the same color), then there really isn't enough information for the tracker to work with in order to distinguish a leg from an arm. Straight arms and straight legs will both match the template, and the tracker will likely struggle with the relatively large area of constant intensity.
That's not to detract from Kalal's research - this is really good work - I just want to point out that it very likely suffers from a few Achilles' heels not mentioned in his video.
Now pair this method with the kinect, and you might see a real improvement.
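
To make the "orthogonal errors" point concrete, here is a rough sketch of fusing two unreliable estimators. It is not Kalal's actual integration step; the `Hypothesis` type, the `fuse` function and the 0.5 threshold are all hypothetical names and values chosen for illustration.

```python
from typing import NamedTuple, Optional, Tuple


class Hypothesis(NamedTuple):
    box: Tuple[int, int, int, int]  # (x, y, width, height), assumed layout
    confidence: float               # assumed to lie in [0, 1]


def fuse(tracker: Optional[Hypothesis],
         detector: Optional[Hypothesis],
         min_conf: float = 0.5) -> Optional[Hypothesis]:
    """Pick whichever estimator is currently more trustworthy.

    Returns None when neither the tracker nor the template matcher is
    confident enough, i.e. the object is treated as lost for this frame.
    """
    candidates = [h for h in (tracker, detector)
                  if h is not None and h.confidence >= min_conf]
    if not candidates:
        return None
    return max(candidates, key=lambda h: h.confidence)
```

The failure case described above, a large area of constant intensity, corresponds to both inputs being low confidence at once, so a scheme like this would simply report nothing.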

--
Government is not reason; it is not eloquent; it is force. Like fire, it is a dangerous servant and a fearful master.
Um...err... NO!!!! by SpinyNorman · 2011-04-14 08:48 · Score: 5, Informative

No - wrong on all counts.
- Kinect doesn't have stereo cameras (it has an IR camera for depth perception and a visible light camera for other usage)
- Kinect doesn't use the visible light camera for body recognition. Recognition is based on the depth map provided by the IR grid projector and IR camera.
- Kinect doesn't operate like a laser rangefinder (it operates via structured light displacements, not via light pulse reflection times)
- Kinect doesn't track a wireframe (it tracks independent body parts)
How you got modded as "4 - informative" is beyond me. The blind leading the blind.
The way Kinect works is by projecting a dense evenly-spaced grid of IR dots (i.e. structured light) on the scene, then using its IR camera (horizontally offset from the grid projector) to pick up the reflected dot pattern.
Due to depth differences in the scene, and the offset of the IR camera from the IR projector, the reflected dot pattern is not evenly spaced - the dots are horizontally displaced based on depth. To understand this, consider shining two parallel beams of light at a) a flat surface, and b) a surface angled at 45 degrees away from the light source. If you took a step sideways away from the light beams and looked at their reflections off the two surfaces, the dot (beam) separation on the flat surface would be the same as the true beam separation, but the dot separation on the angled surface would be increased by an amount you could calculate using simple trig.
In order to operate in real-time with low cost, a dedicated chip processes the IR camera image and converts the dot displacements into the corresponding depth map.
The clever, and somewhat counter-intuitive, part is how Kinect then turns this depth map into a body part map. The basic idea is that it probabilistically maps local clusters of depths to body parts (via having been trained on a huge manually body-part-labelled image set), then converts these local probabilities into larger scale body part labels (i.e. if 60% of the local clusters in a region say "hand", then the region is labelled as a hand). This way it doesn't track overall body position or a wireframe, but rather independently tracks body parts (which is why it has no trouble correctly tracking multiple partially occluded people in frame).
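
As a toy illustration of that last voting step only, assuming each local cluster has already been given a single most likely label. The function name and the default 60% threshold are hypothetical; the real classifier and its aggregation are not described in this thread.

```python
from collections import Counter


def label_region(cluster_labels, min_share=0.6):
    """Label a region by majority vote over its local cluster labels.

    cluster_labels: list of strings such as "hand" or "forearm", one per
    local cluster (assumed input format).
    Returns the winning label, or None if no label reaches min_share.
    """
    if not cluster_labels:
        return None
    label, count = Counter(cluster_labels).most_common(1)[0]
    return label if count / len(cluster_labels) >= min_share else None
```

Because every region is voted on independently, nothing here depends on a whole-body model, which fits the point about partially occluded people.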
1. Re:Um...err... NO!!!! by Skidborg · 2011-04-14 10:24 · Score: 2
  
  Because quantum physics and neuropsychology make you an expert on object tracking AI and any other field? What if specializing has made you blind to things outside of your field of expertise? You might be making a valid point somehow, but your reeking aura of arrogance makes you a nuisance rather than a good leader.
  
  --
  Supporter of the +1 Over Dramatic mod option. In memory of apk.
In Soviet Russia... by Kamiza+Ikioi · 2011-04-14 08:53 · Score: 2

...the software learns you!

--
I8-D