Hardware for Homebrew Motion Capture?
goruka asks: "We are a small garage game development and 3D animation group, and as such, we try to develop by reducing costs as much as we can. Recently, it occurred to us that we could set up and develop a home-brew motion capture system by using three consumer USB web-cams to motion track bright objects attached to the body. However, we don't know which web-cam models: can capture at a decent frame rate (25fps) and resolution; are supported and easily programmed under GNU/Linux, since we'd like to later release our software as open source; and lastly, won't cost us a fortune. What are your experiences with such devices?"
http://www.hackaday.com/entry/1234000427059760/
We run an Axis 207 at work. Pair it up with Zoneminder and you've got yourself a motion capture system, albeit in the form of home security system software.
ACs are modded -6. I don't read you, I don't mod you, I don't see you. Don't like it? Don't be a coward.
http://www.compusa.com/products/product_info.asp?p fp=SEARCH&Ntt=philips+900&N=0&Dx=mode+matchall&Nty =1&D=philips+900&Ntk=All&product_code=337160&pfp=s rch1
The reviews are not exaggerating, it's a nice camera.
I forgot, it also has a USB audio device on endpoint two that's a built-in mic, but that's not important.
The 1280x960 modes mentioned are software scaling, so they're useless. It's a fairly standard CCD board in the unit that is 640x480. Since it uses a Bayer pattern to filter color, you're going to want to throw away the chroma components in your analysis. You might be able to use chroma for helping it distinguish the balls from the background, but you'll want to use the luma information for accurate tracking.
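To make the luma suggestion concrete, here is a minimal sketch of finding a bright marker from luma alone. It assumes frames arrive as rows of 8-bit (r, g, b) tuples and uses the Rec. 601 luma weights; the function names and the threshold of 200 are made up for illustration, not anything the camera or its driver provides.

```python
def luma(r, g, b):
    """Rec. 601 luma from 8-bit RGB components."""
    return 0.299 * r + 0.587 * g + 0.114 * b


def bright_centroid(frame, threshold=200.0):
    """Luma-weighted centroid of all pixels at or above `threshold`.

    `frame` is a list of rows, each row a list of (r, g, b) tuples.
    Returns (x, y) in pixel coordinates, or None if nothing is bright enough.
    The threshold value is an assumption; tune it for your lighting.
    """
    total = 0.0
    sum_x = 0.0
    sum_y = 0.0
    for y, row in enumerate(frame):
        for x, (r, g, b) in enumerate(row):
            weight = luma(r, g, b)
            if weight >= threshold:
                total += weight
                sum_x += weight * x
                sum_y += weight * y
    if total == 0.0:
        return None
    return (sum_x / total, sum_y / total)
```

Weighting by luma gives a sub-pixel estimate when a ball covers several pixels, which helps a little at 640x480. This handles one marker per frame only; separating several balls needs a connected-component pass first.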
THIS THING CAN TURN ON A DIME, MACROSSZERO STYLE ALSO FUCK BETA, ~NYORON
First, they have many, many cameras, because you have to have 3 unobscured camera views to triangulate a point. I assume you want to mocap people doing game moves, so multiple cameras are required. Also, they capture with infra-red, not visible light. They put infra-red reflective spheres on the people at key locations, so what the camera picks up are dots on a black background. This is for 2D tracking.
Some real time tracking occurs, and a rough 3D wireframe is generated so they can see if they have a good take. Note that it's not one computer for all the cameras because of bandwidth limits. You may not be able to support very many cameras per computer, because you need to save all the frames for post processing. The rough tracking data is sent via UDP to a single computer that does the wireframe display.
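For the UDP part, a sketch of what one capture machine might send per frame. The packet layout here (camera id, frame number, point count, then x/y floats) is purely my assumption for illustration; I don't know what format the commercial systems actually use.

```python
import socket
import struct

# Hypothetical header layout: camera id (uint16), frame number (uint32),
# point count (uint16), little-endian, no padding.
HEADER = struct.Struct("<HIH")
POINT = struct.Struct("<ff")  # x, y in pixels as 32-bit floats


def pack_points(camera_id, frame_no, points):
    """Serialize one frame's 2D tracked points into a single datagram payload."""
    payload = HEADER.pack(camera_id, frame_no, len(points))
    for x, y in points:
        payload += POINT.pack(x, y)
    return payload


def unpack_points(datagram):
    """Inverse of pack_points: returns (camera_id, frame_no, [(x, y), ...])."""
    camera_id, frame_no, count = HEADER.unpack_from(datagram, 0)
    points = [
        POINT.unpack_from(datagram, HEADER.size + POINT.size * i)
        for i in range(count)
    ]
    return camera_id, frame_no, points


def send_points(sock, address, camera_id, frame_no, points):
    """Send one frame's points to the display machine over a UDP socket."""
    sock.sendto(pack_points(camera_id, frame_no, points), address)
```

Sending only the tracked points rather than the images is why this fits over the network: a few dozen markers is a few hundred bytes per frame, while the full frames stay on the capture machine's disk for post processing.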
After you have captured the data, then it gets really hard. You need to calibrate all the cameras so you can combine their data. You need to survey their positions and their pointing angles. It is possible to calibrate by locking down the cameras and shooting targets before you start. I don't know if you need to correct for lens distortion or not; it may depend on your cameras.
The cameras have to be synched. If they are taking pictures at different times, then the 3D point positions will not be right. Web cams don't have external synch.
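To put a rough number on the synch problem: if two free-running cameras run at the same frame rate and you pair up the nearest frames, they can still be up to half a frame period apart. The marker speed in the usage note is my assumption, not a measurement.

```python
def worst_case_skew_s(fps):
    """Largest time offset between nearest frames of two unsynchronized
    cameras running at the same frame rate: half a frame period."""
    return 0.5 / fps


def marker_displacement_m(speed_m_s, skew_s):
    """How far a marker travels between the two exposures."""
    return speed_m_s * skew_s
```

At 25fps the worst-case skew is 20 ms. Assuming a hand marker moving at 5 m/s during a fast move, the two cameras see it about 10 cm apart, and that disagreement goes straight into the triangulated 3D position.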
First, you have to do 2D tracking for each camera. Then you have to figure out which 2D tracked point on camera A corresponds to the same point on cameras B, C, D for each frame, typically while the mocap actors are being very athletic. Then you need to combine multiple 2D points into 3D points. Remember that 2D points will disappear and reappear during a move.
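A minimal sketch of the 2D-to-3D step for two cameras, using the midpoint of the closest approach between two viewing rays. It assumes calibration has already given you each camera's position and the direction of the ray through the tracked 2D point; real systems use more views and a least-squares solve, so treat this as an illustration rather than how any commercial package does it.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def triangulate_midpoint(c1, d1, c2, d2, eps=1e-12):
    """Closest-approach midpoint of two rays c1 + s*d1 and c2 + t*d2.

    c1, c2 are camera centers; d1, d2 are ray directions (need not be unit).
    Returns (point, gap), where gap is the distance between the rays at
    closest approach, or None if the rays are (nearly) parallel.
    """
    w0 = tuple(p - q for p, q in zip(c1, c2))
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w0), _dot(d2, w0)
    denom = a * c - b * b
    if abs(denom) < eps:
        return None
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1 = tuple(ci + s * di for ci, di in zip(c1, d1))
    p2 = tuple(ci + t * di for ci, di in zip(c2, d2))
    midpoint = tuple((u + v) / 2.0 for u, v in zip(p1, p2))
    gap = math.sqrt(sum((u - v) ** 2 for u, v in zip(p1, p2)))
    return midpoint, gap
```

The gap is also a cheap correspondence test: if rays from a point on camera A and a candidate point on camera B miss each other by a lot, they are probably not the same marker.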
After you have 3D points then you need to connect those dots into motion paths. This takes a lot of very complex motion filtering software. People often use Kalman filters for this. Sometimes they do Kalman filtering in 2D and in 3D. Multi pass filters can be used, where you go from 2D to 3D to motion paths, and then you take the estimated 3D position and project it back to 2D. The back projected data is combined with the captured images to get better data for the next pass.
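For a feel of what the Kalman filtering involves, here is a minimal constant-velocity filter for one coordinate of one marker. The noise values are placeholder assumptions, and a real tracker would run this per axis (or as one joint filter) for every marker; this is a sketch of the technique, not anyone's production filter.

```python
class ConstantVelocityKalman1D:
    """Kalman filter with state (position, velocity) and position-only measurements."""

    def __init__(self, pos, vel=0.0, p=1.0, q=1e-3, r=0.25):
        self.x = pos  # position estimate
        self.v = vel  # velocity estimate
        # 2x2 state covariance, stored element by element.
        self.p00, self.p01, self.p10, self.p11 = p, 0.0, 0.0, p
        self.q = q  # process noise added per step (assumed value)
        self.r = r  # measurement noise variance (assumed value)

    def step(self, z, dt):
        """Advance by dt; pass z=None when the marker is occluded this frame."""
        # Predict: x' = x + v*dt, P' = F P F^T + Q with F = [[1, dt], [0, 1]].
        self.x += self.v * dt
        p00 = self.p00 + dt * (self.p10 + self.p01) + dt * dt * self.p11 + self.q
        p01 = self.p01 + dt * self.p11
        p10 = self.p10 + dt * self.p11
        p11 = self.p11 + self.q
        if z is not None:
            # Update with a position measurement (H = [1, 0]).
            s = p00 + self.r
            k0, k1 = p00 / s, p10 / s
            innovation = z - self.x
            self.x += k0 * innovation
            self.v += k1 * innovation
            p00, p01, p10, p11 = (
                (1.0 - k0) * p00,
                (1.0 - k0) * p01,
                p10 - k1 * p00,
                p11 - k1 * p01,
            )
        self.p00, self.p01, self.p10, self.p11 = p00, p01, p10, p11
        return self.x
```

Passing `None` for an occluded frame makes the filter coast on its velocity estimate, which is one simple way to bridge short dropouts before the point reappears.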
Assuming all that works, then you can take 3D path data and translate to the frame of reference on the person so you can animate the character. Are you going to use inverse kinematics to derive joint angles from end positions of arms and legs? Often times you measure points on both sides of a joint and directly measure joint angles so you can directly apply the measured data to the 3D model.
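The direct joint-angle measurement boils down to the angle between two limb segments. A minimal sketch, assuming you already have three 3D marker positions (say shoulder, elbow, wrist; the names are only for illustration):

```python
import math


def joint_angle_deg(a, b, c):
    """Angle at joint b, in degrees, between segments b->a and b->c.

    a, b, c are (x, y, z) marker positions. Returns None if either
    segment has zero length.
    """
    u = tuple(p - q for p, q in zip(a, b))
    v = tuple(p - q for p, q in zip(c, b))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return None
    cos_angle = sum(x * y for x, y in zip(u, v)) / (norm_u * norm_v)
    # Clamp against rounding error before acos.
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return math.degrees(math.acos(cos_angle))
```

This gives one bend angle only. Twist around the limb axis needs extra markers, which is part of why several balls get placed around each joint.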
Heck, I don't do this stuff, I just have been around it, and these are things I remember. Optical tracking is very hard. People still use magnetic tracking and joint flex tracking, sometimes in combination with optical, because they are better for some kinds of measurements. Now you know why movies and high end video games are so expensive....
Couple of things in here, from researching the field with a university research lab to see about buying commercial gear, I have a lot of suggestions.
- For your cameras, look for cheap used DV cameras on ebay. Not super high res, but get lots of them. 3 ain't going to cut it; consider at least 9 (high/low from each of the cardinal directions, and on top [might want a few for different sectors]) - occlusion is an absolute bitch of a problem.
- This will provide reliable time-synced data, and NOT max out your USB bus.
- USB cannot provide you with images from 3 cameras with the same timesync, it's just not capable of such behavior.
- Firewire has a longer length limit on the cables, which is a big help for your work.
- Cheap PCI firewire cards - two should be enough, this will give you 6 separate firewire busses, and put you at the limit of your PCI bus.
- Find filters that fit said cameras, and are opaque to visible light, but transparent to infrared.
- Rig up really bright infra-red lighting, ideally with a low quantity of visible light output.
- Go to a burglar alarm supply place, and buy infra-red reflective tape - I learnt this tip from the EA guys a couple of years back, the 'official' reflective tape from 3M costs too much, and is a pain to order, but alarm places stock stuff that works even better, and is cheaper to boot.
- Buy really small polystyrene balls, and cover with infra-red tape. On one small part of the ball, put the hook side of a velcro dot. These are reusable now, avoiding problems with tape waste. You can also clean them easily to keep them very reflective.
- For your subjects, get them to wear any clothing that velcro will hook reliably onto (pretty easy choice)
- Place the reflective balls on either side of every joint, spaced not more than 90 degrees apart - e.g. your elbow should have 8 balls.
Using infra-red cuts the data-set size way down, since it also lets you capture from the cameras in monochrome.
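Some back-of-envelope arithmetic on why monochrome matters. This assumes uncompressed frames at 640x480 and 25fps with 9 cameras, numbers borrowed from elsewhere in the thread; DV cameras actually send compressed video, so take it as an upper-bound illustration rather than what the firewire bus will really carry.

```python
def bytes_per_second(width, height, fps, bytes_per_pixel):
    """Uncompressed data rate of one camera stream."""
    return width * height * fps * bytes_per_pixel


def total_megabytes_per_second(cameras, width, height, fps, bytes_per_pixel):
    """Combined uncompressed rate for several identical cameras, in MB/s (10^6 bytes)."""
    return cameras * bytes_per_second(width, height, fps, bytes_per_pixel) / 1e6
```

With those assumptions, 8-bit monochrome comes to about 69 MB/s for all 9 cameras, against about 207 MB/s for 24-bit color. Either way you have to save every frame for post processing, so the factor of three shows up in disk space as well.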
From working with several commercial mocap rigs, I'll say that the calibration routines are extremely important. You need to accurately map the entire volume that you wish to capture in. Depending on space available to you, consider building a simple frame or using a lighting rig to attach the cameras to.
I will repeat: occlusion is an absolute killer problem. From visiting the EA facilities in Burnaby BC to specifically research their systems (I was working with a university research lab at the time), they estimated that they lost 2 hours of production a day to occlusion problems during mocap shoots.
Your system must be capable of tracking all the balls, all of the time. If it loses one, it's almost impossible to pick it up again properly during a runtime - you'd need to recode the relative location of that ball before it gave you useful data again.
ICQ# : 30269588
"I used to be an idealist, but I got mugged by reality."
The iSight was only discontinued in Europe it turns out.
http://en.wikipedia.org/wiki/ISight#Discontinued_in_Europe
-
Systems Administrators: We read the manual so you don't have to.
I can second the OpenCV nomination.
However, I think I may be able to add something to the puzzle: I was informed (but have not yet tested) that IEEE1394 (Firewire) cams will synch across the bus. This means that you no longer have to worry about adjusting for framedrops or timing or whatnot. Rather, the two cameras "see" their fields in lock-step with respect to time. I know that some folks here locally have had great success with Uni-Brain Fire-i cameras, but earlier in the thread someone reported a bad experience with them.
However, this being slashdot, I must remind you that YMMV.
With that said, your idea of using webcams is spot on, but you are going to need more than three, mainly for occlusion handling. For the rig I was contemplating (using webcams much the same as you), I was thinking of at least four cameras. The main problem I ran into (just in thinking about it, no actual implementation), and as others have described, was timing issues. For best results, you need all the frames captured from the cameras to happen at the exact same time. Since with USB webcams this isn't possible, you either need to come up with another solution (people here have mentioned some "high end" cameras that have syncing systems), or deal with it in software (very difficult to do, in addition to dealing with everything else, and still getting a high frame rate).
Another problem you are going to run into (and has been mentioned by others, but not much on the reason) is webcam resolution. Most webcams that capture at decent framerates do so at QVGA (320x240). Even those that capture at a real 640x480 typically do so at only around 15fps, instead of 24 or 30. Rare (and more expensive) is the webcam that will capture at 24-30fps with VGA resolution. Even at VGA resolution, though, you are going to have to deal with the angular vs pixel resolution of the camera. What I mean by this is that as an object moves through the FOV of the camera, it is going to only be imaged by certain pixels of the CCD imaging device. Depending on the distance away from the camera, the object may move say a foot, and only move (on camera) a pixel or so. The further away the moving object, the fewer pixels its motion covers, due to perspective. This translates into a lower resolution of pixels (on camera) to inches/cm (in real motion). In fact, this is almost the inverse problem of HMDs, where you can have high resolution, and low FOV, or vice-versa. In order to have both (in either cameras or HMDs), you have to pay a lot of money. In optical camera-based mocap, this means HDTV or better resolution cameras. I hope you understand what I mean here, because it is important for motion capture where you may be capturing large amounts of motion over a lot of area. For close-ups (like facial capture) it is less important - but remember, the higher the resolution of the camera, the finer the motion you can capture at all distances from the person/object to the camera. Higher resolution cameras translate into higher prices for the system, because you have to deal with more data, all in realtime. Not easy, not cheap.
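To put numbers on the angular vs pixel resolution point, here is a small sketch using a simple pinhole model. The 60 degree horizontal FOV and 4 m distance in the usage note are assumptions I picked for illustration; check your own camera's lens.

```python
import math


def view_width_m(distance_m, horizontal_fov_deg):
    """Width of the scene covered by the image at a given distance (pinhole model)."""
    return 2.0 * distance_m * math.tan(math.radians(horizontal_fov_deg) / 2.0)


def metres_per_pixel(distance_m, horizontal_fov_deg, horizontal_pixels):
    """Real-world distance spanned by one pixel column at that distance."""
    return view_width_m(distance_m, horizontal_fov_deg) / horizontal_pixels
```

With an assumed 60 degree FOV and the actor 4 m away, QVGA (320 pixels across) works out to roughly 1.4 cm per pixel and VGA (640 across) to roughly 0.7 cm per pixel. Double the distance and both figures double, which is the trade-off between capture volume and precision described above.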
You might best be able to deal with this by going the custom camera route. What you would want to do is build a custom frame capturing system, using 640x480 (or better) b&w CCD cameras (you don't need color, you just need IR sensitivity - even with B/W cameras, you are going to filter the final image down so far that it is mostly only a true b&w 1bpp image - so the closer you can do that in hardware, the less you have to do in software). This won't be easy, but many people have done similar systems for homebrew robotic vision systems, so look there. Realize that this kind of a project will likely dwarf your game development project in both hardware and software needs, and you might end up with a system
Reason is the Path to God - Anon