Combining Two Kinects To Make Better 3D Video
suraj.sun sends this quote from Engadget about improving the Kinect 3D video recordings we discussed recently:
"[Oliver Kreylos is] blowing minds and demonstrating that two Kinects can be paired and their output meshed — one basically filling in the gaps of the other. He found that the two do create some interference, the dotted IR pattern of one causing some holes and blotches in the other, but when the two are combined they basically help each other out and the results are quite impressive."
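A rough sketch of what "meshing" the two outputs could look like, assuming each Kinect already yields a list of 3D points and that the second camera's pose relative to the first (a rotation about the vertical axis plus a translation) is known from some calibration step. The function names and the example pose are hypothetical, and this is not Kreylos's actual code.

```python
import math

def to_first_camera_frame(point, yaw_deg, offset):
    """Rotate a point about the vertical (y) axis, then translate it.

    yaw_deg and offset describe the assumed pose of the second Kinect
    relative to the first; both are illustrative parameters.
    """
    x, y, z = point
    a = math.radians(yaw_deg)
    xr = math.cos(a) * x + math.sin(a) * z
    zr = -math.sin(a) * x + math.cos(a) * z
    ox, oy, oz = offset
    return (xr + ox, y + oy, zr + oz)

def merge_clouds(cloud_a, cloud_b, yaw_deg, offset):
    """Return one point list: cloud_a as-is, cloud_b moved into cloud_a's frame."""
    moved = [to_first_camera_frame(p, yaw_deg, offset) for p in cloud_b]
    return list(cloud_a) + moved

# Second Kinect at a right angle to the first, shifted 1 unit along x.
merged = merge_clouds([(0.0, 0.0, 2.0)], [(0.0, 0.0, 1.0)], 90.0, (1.0, 0.0, 0.0))
```

Points that one camera could not see (the "gaps") simply come from the other camera's list once both are in the same coordinate frame.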
awesome
u don't conform to the character limit for sub-headings?
Finally had enough. Come see us over at https://soylentnews.org/
How cost and/or physics prohibitive would it be to exploit the fact that "IR" actually covers a number of frequencies of invisible-to-the-naked-eye light with similar properties? Could one modify a Kinect with appropriate narrow-band filters, so that a second Kinect, with filters for a different narrow band, wouldn't even see the dot pattern of the first? If possible, how many Kinects could share a space this way (or, at what point do the required narrowness and wavelength tolerances become absurdly costly)?
Is that A) Wholly impractical, because of some sort of effect the reflecting materials would have on the IR wavelengths, B) Sure, it's possible, but have you checked the supplier's price list for narrowband IR filters recently, or C) Just a bit of eBay and some steady hands?
Perhaps more practically, I wonder if the Kinects could (with some mixture of hardware shutters and firmware or driver mods) be made to trade off sample rate for coverage (i.e. if the Kinects are ordinarily taking 60 frames/second, could two Kinects be made to take 30 frames/second each, turning off their IR source when it isn't their turn and turning it on when it is), or does their mechanism of operation require too much time to calibrate itself on startup?
What feat would that be that one stationary ear could do as well as kinect?
Recognize your voice from the kitchen
So won't 3 Kinects make 3D video?
There is a class of visual inputs that makes the human brain just tie itself in knots, even once you know what the trick is: "optical illusions", Escher stuff, and the like.
I wonder what the class of "optical illusions" for the Kinect's vision system and algorithms is... Off the top of my head, I'd imagine that retroreflective materials might kind of freak it out; but I'd be curious to know if there are any stimuli that cause it to wig out in weird ways, the way that optical illusions do the human visual system.
This makes for real 3D movies. Capture the streams from both sources, combine in real time in the viewer, and you're able to change your PoV and focus independently of any other observer.
This is revolutionary for entertainment. Not stereoscopy.
Can you imagine a beowulf cluster of kinects??
We don't know how to use jelly yet, so we settle with plastic and metal.
Still a crazy task.
I wonder what the class of "optical illusions" for the Kinect's vision system and algorithms is
I'm guessing kinect makes assumptions based on common human bone structure, e.g. something like a dog might freak it out and make it explode.
Am I the only one imagining getting a Kinect or two in every room of their home and then using them to fly through the 3D video feed of their apartment?
The "good ol' brain" does a fairly crappy job, actually. 3D vision systems like these tend to perform quite a bit better than we do. And we only do as well as we do because we can use a lot of indirect clues based on our long experience with a 3D-world - we know how big stuff normally is, for instance, so we can judge distance from size. Mess up those clues and we completely lose it.
And even with good clues we don't actually measure distance well. Have somebody place items on a parking lot or some place like that, then try to guess the distances. Not going to be very accurate. Try to estimate distance vertically rather than horizontally and you'll do even worse; you have fewer clues and less experience to fall back on.
Trust the Computer. The Computer is your friend.
You are essentially just comparing the brain to the computer. We would likely have better spatial resolution if we had more ears and eyes as well. And most of the capabilities of the ear, especially in regard to space, are learned in combination with other senses like vision and touch. If you lived your life from the beginning with your only sense being a single ear, you'd probably do worse than a Kinect unless someone explicitly taught you what the things you were hearing meant, if you could ever learn to understand them at all.
With all this stuff in the news recently about backscatter machines and the need for improved x-ray machines, this sort of system would be fantastic for improving the quality of screening, being able to look in and see depth in luggage.
Waiting for an amusing sig.
... is good, but I'm holding out for 4 Girls, 3 Kinects, 2 Boxes, 1 Cup :)
THE HONOUR OF THE KNIGHTS - CC Licensed Sci-Fi Novel
I would say both would be very accurate, considering no actual measuring would be taking place. You can extrapolate points of reference:
- A car is approx 3m long 2m wide. A parking space is about same
- The lanes between spaces are 2 cars wide, to allow for idiots who can't follow the arrows.
- Basic trig can give you any distance in a parking lot.
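As a minimal sketch of the reference-point arithmetic above: count parking spaces sideways and rows/lanes away, then apply Pythagoras. The constants are the poster's rough figures (a space of about 3 m by 2 m, a lane two car-widths wide); the function name and the way the counts are split are assumptions made for illustration.

```python
import math

SPACE_WIDTH_M = 2.0               # approx. width of one parking space
SPACE_LENGTH_M = 3.0              # approx. length of one parking space
LANE_WIDTH_M = 2 * SPACE_WIDTH_M  # lane taken as two car-widths wide

def estimate_distance_m(spaces_across, rows_deep, lanes_crossed):
    """Straight-line distance estimated by counting parking-lot features."""
    across = spaces_across * SPACE_WIDTH_M
    deep = rows_deep * SPACE_LENGTH_M + lanes_crossed * LANE_WIDTH_M
    return math.hypot(across, deep)

# Something 4 spaces to the side and 2 rows away, with no lane in between.
print(estimate_distance_m(4, 2, 0))
```

The estimate is only as good as the assumed sizes, which is the point being argued: the accuracy comes from the reference grid, not from measuring.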
The same applies to buildings. The average person is 6' tall, with 18" spare to the roof. The floor space is approx 6", making each floor approx 7'. Multiply $floors by 7. For offices, assume false ceilings; 9' per story.
This does go back to your "time spent in the 3D world" though. If we had no point of reference, yes we'd suck. However, we do, so we don't.
When I first saw the video of one Kinect, I immediately wondered how you could get multiple units working together.
It wasn't until I watched the video again later that day that it hit me. I had just explained to someone how 3D theater projection works, and so I had an epiphany: The most sensible course is to use polarizing filters.
With filters on the IR emitters and cameras, the units should be able to only see their own IR illumination. Of course, it would only work for two Kinects with maximum effectiveness, but considering how well this turned out with the units at right-angles from each other, I don't see why you couldn't combine the two ideas for 3-4 units and get sufficient quality.
I wish I had the money to get a couple Kinects and test my idea, but I'm no good with coding anyway.
It'd be awesome to see the Blender Foundation put out a bounty for a Kinect-based open source motion capture and 3D scanning suite though. :D
Friend: "The NIC is misconfigured..." Me: "No prob, I'll just telnet in and fix it." *Silence*
Which is exactly what the parent said. Besides, look at it this way. You're using cars as an overlay grid. The Kinect is using a dot pattern projected in infrared. What's the difference? Or, if you were to go to an empty grassy field, how would your distance estimates do?
Vintage computer games and RPG books available. Email me if you're interested.
And even with good clues we don't actually measure distance well.
Yep, just look at the quarterback for the Carolina Panthers.
Todd: I hope it proves as delicious as the farmers that grew them
Ear's can't do that, that's the brain. And for the kinect that would be the program in the device it connects to.
You can get training if you really care.
As the video demonstrates, the Kinect is fooled by spurious pattern projections from other Kinects in the vicinity. This could be solved by replacing the IR source in the 'projector' (actually a point source and a pinhole grid) with one of a different wavelength, and adding appropriate filters to the IR cameras in each Kinect. Each Kinect would then only see IR light of the 'colour' it emits. This would probably require the use of slightly brighter IR emitters.
The results look almost identical to the kind of data I get from the NextEngine 3D laser scanner. To create a 3D surface, the device sweeps a laser across the object in front of it. The laser sweeps a vertical line, and shines on the (arbitrary) surface of the object in front of it. Stereo cameras capture the shape of the laser line from different angles, and software is able to extract the 3D surface from there. An accompanying visible light image from one camera or the other is used to apply a "skin" to what is otherwise a wireframe. By using a laser and taking its time, rather than broadcasting an infrared grid of fiducial dots, the results are very good: sub-millimeter accuracy is easy, though for handheld objects, not people in a room. Similar technology can be used for very large scale models, such as the I-35W bridge collapse in Minneapolis.
Your comment reminds me of an interesting experiment you can do in 2D. Show people a page containing nothing but a creature that can't possibly exist and ask how big it is, obviously there's no way to answer without scale. If you put a picture of an elephant next to the creature it looks huge but if you put a picture of a mouse next to it the creature looks small.
They say a little knowledge is a dangerous thing, but it's not one half so bad as a lot of ignorance. - Terry Pratchett
Rugby pitches. 100m long. Divide or multiply as required. Plus, a healthy background in outdoor pursuits gave me a good eye for horizontal distance.
Plain buildings a la MiniPeace, however, would throw me completely.
Wow, imagine a Beowulf.... blah blah.
Because that's not a plural. What do you think I am, an idiot?
1. Find YouTube channel with worthy content
2. Subscribe
3. Share new videos on Slashdot
4. ????
5. PROFIT!
That's typically a product of training. We don't have much experience with it, because we don't need it.
But take an aborigine, and ask him to estimate how far something is, and you'll get a good accurate answer, even if it's not in feet and inches.
And even with good clues we don't actually measure distance well. Have somebody place items on a parking lot or some place like that, then try to guess the distances. Not going to be very accurate.
And yet we are able to navigate and interact with our environment with a high degree of precision. When I'm driving a car, for instance, without looking at how fast I'm going, knowing distances, the weight of the car, my acceleration and deceleration capabilities, I'm able to stop at a line painted on the road to within half a meter. Just with my eyes!
I work with robots, and even knowing all this information to a high accuracy, there is so much work that needs to be done with localization, navigation, planning, etc. to get it to mimic my performance. The robot must be equipped with laser range finders, wheel encoders, global positioning systems, and an array of other sensors. If only I could slap a vision system on it and call it a day. Whatever the human brain is doing under the hood, it's incredibly sophisticated. We're bad at estimating distances because we don't need to.
Ya know to the best of my knowledge you cannot use the Kinect as a webcam in Skype. I would love to buy a Kinect but I need a reason other than awesome tricks, I need useful functionality.
He's not the only one. My depth-first recursive post counter has found hundreds of such posts.
blog.sam.liddicott.com
Yes, we are. Our vision system is pretty successful when you look at how we actually use it in the real world. We don't actually need to know the precise distance to things; what we want to know is rather direction and time to impact and similar and we're really, really good at that (look up tau-margin estimation for instance). Though note that with a human-level vision system you would still need a lot of those sensors you talk about. Our vision system absolutely depends on proprioception to figure out where we are in the world and compensate for our own movements; we need separate dead-reckoning systems and (again) a lot of experience to be even somewhat correct about our movements over large distances and so on.
But I wrote this in reply to a poster that seemed to believe we humans are actually better than Kinect at the specific vision tasks it's built to do. Too many people seem to believe that the mammalian vision system is inherently great, at whatever tasks we imagine, and that if we could only make something like it our machine vision problems would be solved. That is simply not the case.
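For the tau-margin idea mentioned above, a minimal sketch under the usual textbook definition: tau is the angle an object subtends divided by the rate at which that angle is growing, which approximates time to contact without knowing the object's size or distance. The function name and the sample numbers are hypothetical, and the small-angle values are generated here purely for illustration.

```python
def tau_from_angles(theta_prev, theta_now, dt):
    """Estimate time to contact from two angular-size samples dt seconds apart.

    Uses a simple finite difference for the expansion rate; assumes the
    object is approaching (theta_now > theta_prev).
    """
    expansion_rate = (theta_now - theta_prev) / dt
    return theta_now / expansion_rate

# Illustration: an object 1 m wide approaching at 10 m/s, seen at 10 m and
# then at 9.9 m (0.01 s later). Small-angle approximation: theta = size / distance.
theta_prev = 1.0 / 10.0
theta_now = 1.0 / 9.9
print(tau_from_angles(theta_prev, theta_now, 0.01))  # roughly one second
```

Note that neither the 1 m size nor the distances are passed to the function; only the two angles and the time step are, which is why this kind of estimate works without precise range measurement.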
Trust the Computer. The Computer is your friend.
I do, kinda.
Just wait for the flood of homemade 3d pr0n :)
(hey somebody had to say it)
But I wrote this in reply to a poster that seemed to believe we humans are actually better than Kinect at the specific vision tasks it's built to do.
But we are better. Kinect is built to recognize faces and body postures, it’s not built to estimate the distance from you to the TV even if it can do that more accurately than we can.
Headings are for brief topic summaries (a few words.) Not content.
Every time I start to have faith in humanity, I ruin it by driving to work between 7 and 8 am.
Well, your wrong. "Connects" is a verb, and everybody knows that even plural verbs do not get apostrophe's. Sheesh man, do some research.
But we are better. Kinect is built to recognize faces and body postures, it’s not built to estimate the distance from you to the TV even if it can do that more accurately than we can.
That is a ridiculous statement. Kinect builds heightmaps. If that's not estimating the distance from you to the TV then I don't know what is. Kinect in fact does the other cool things it can do specifically because it is built to estimate the distance from you to the TV, when other camera systems are not. If this was ALL it would do you could still do the same stuff on the 360 in software, but it would take away from the available processing power which is why embedding it as a complete solution was the smart thing to do.
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
Given a quality enough image, bandwidth, and some motion-sensing gear (ahem), any immersion-style display (HUD, dome, etc) could allow for real-time panning of a distant location.
Examples:
- shooting a net of these at an operating table would let remote viewers move around the room and view the procedure without crowding the room or limited to the perspective of the single camera.
- a web site could point this setup at anything interesting (lab experiment, box of puppies, anthill, construction site, political debate) and stream it live for an amazing viewer-decided perspective.
- live news could mount an array of this setup to a vehicle and capture a modeled view of anything they could reach, then pan around without much camera work.
CMOS sensors' light-gathering capabilities fall off with increasing wavelength. Silicon's quantum efficiency at NIR is much lower than at visible wavelengths. There's not a huge range of NIR to play in without QE falling off.
IR diodes don't emit light at a single wavelength. Not only do they shift long with temperature, but the rated wavelength is really an average of the range the wavelength drifts over.
Very tight bandpass filters tend to drift shorter in wavelength off axis.
Check out this video where a guy has taken the model from the kinect and replaced the points with variable sized blobs. Looks cool. Fat Cat
Unexpect the expected!
This could be solved by replacing the IR source in the 'projector' (actually a point source and a pinhole grid) with one of a different wavelength, and adding appropriate filters to the IR cameras in each Kinect.
Or maybe timing the grid light to be off while the other camera is on and vice versa. Alternating back and forth quickly like the 3D LCD 'shutter' lenses. This way the grids would not interfere with each other. Don't have to turn off the cameras, just don't use the 3D grid data from those frames where the opposite grid is being used.
Just my .01 worth. :)
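The alternating scheme can be sketched as a simple round-robin over frame numbers. This is only an illustration of the bookkeeping, assuming the projectors could be switched per frame at all (nothing in the thread confirms the hardware allows it); the function names are made up.

```python
def projector_on(frame_index, kinect_id, num_kinects=2):
    """True if this Kinect's IR grid is the one lit during this frame."""
    return frame_index % num_kinects == kinect_id

def usable_frames(frame_indices, kinect_id, num_kinects=2):
    """Keep only the frames where this Kinect saw its own grid."""
    return [f for f in frame_indices if projector_on(f, kinect_id, num_kinects)]

# Cameras keep running; each unit just discards depth data from the other's turn.
print(usable_frames(range(6), 0))
print(usable_frames(range(6), 1))
```

Each unit ends up with every second frame, so two Kinects would each run at half the depth frame rate.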
I think you misunderstood me. Building height maps is just a means to an end; the end being figuring out just what you're doing with those limbs of yours. The Kinect wasn't created/built for the purpose of measuring objects, even if it's better at this than us humans.
The Kinect wasn't created/built for the purpose of measuring objects, even if it's better at this than us humans.
That's a big fail of a response. The statement that prompted your original comment was "But I wrote this in reply to a poster that seemed to believe we humans are actually better than Kinect at the specific vision tasks it's built to do." and you said "But we are better. Kinect is built to recognize faces and body postures, it’s not built to estimate the distance from you to the TV even if it can do that more accurately than we can." But that is plainly false. Kinect is built to measure the distance from you to the TV, it has hardware specifically for this purpose. We do not have hardware specifically for this purpose; we infer the information from our existing sensors. So no, I understood your comment perfectly, and I simply disagree with everything you said.
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
I simply disagree with everything you said
So you believe that Kinect is superior to humans at recognizing faces and body postures?
Thi's is a brilliant thread.
The Kinect uses an LED laser, so is truly monochromatic. You'd need some very narrow band-pass filters, but these are available, albeit sometimes bespoke.
I wasn't reading the thread or commenting on your grammar... I just think you're an idiot.
The (testable but not yet tested by the public, to my knowledge) question is whether a Kinect unit needs a prohibitive number of frames with its own IR unit on in order to figure out what is going on.
If it takes 30-60 frames to do so, that is only a 1-2 second delay, which is nearly irrelevant from the perspective of the standard use case. Just have the menu do some slightly sci-fi transition during that time and they will never even notice.
If, however, you are trying to use two or more Kinects with shutters, you go from "two Kinects, each at half frame rate, but no distortion" to "two Kinects, approximately 1 usable fix every second or two, but no distortion". That isn't totally useless (if imaging a static object, throwing a few Kinects on the floor and letting each take their turn, then crunching the results, is still pretty easy), but it is way too slow for 3D video purposes.
If a Kinect can get a fast fix on startup, all is well and shutters would work, if not, you'd pretty much have to play wavelength tricks or put up with some interference...
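The trade-off above can be put into numbers with a toy model: on each turn a Kinect burns some number of "settle" frames re-acquiring its pattern and then delivers one usable frame. The 30 frames/second default is inferred from the "30-60 frames is 1-2 seconds" figure above, and the settle-frame counts are the unknown the comment is asking about, so every value here is an assumption.

```python
def turn_length_s(settle_frames, fps=30.0):
    """Seconds one Kinect holds the floor: settle frames plus one usable frame."""
    return (settle_frames + 1) / fps

def fixes_per_second_per_kinect(settle_frames, num_kinects=2, fps=30.0):
    """Usable depth frames per second for each unit when taking turns."""
    return 1.0 / (num_kinects * turn_length_s(settle_frames, fps))

for settle in (0, 29, 59):
    print(settle, fixes_per_second_per_kinect(settle))
```

With no settle time each of two units gets half the frame rate; with 29 or 59 settle frames each unit manages one fix every two or four seconds, which is the "fine for static scans, too slow for video" case.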
Do you really consider it a fair competition: your brain against a device that has a sales price of $150?
Isn't it nice to know that someone at Microsoft could be checking in on our kids doing gymnastics? Most of us will just be leaving it plugged in all the time in our living rooms... I feel safer already.
Wikileaks Is Democracy