Slashdot Mirror


CMU Video Conference System Gets 3D From Cheap Webcams

Hesham writes "Carnegie Mellon University's HCI Institute just released details on their "why-didn't-I-think-of-that-style" 3D video conferencing application. Considering how stale development has been in this field, this research seems like a nice solid step towards immersive telepresence. I was really disappointed with the "state-of-the-art" systems demoed at CES this year — they are all still just a flat, square, video stream. Hardly anything new. What is really cool about this project, is that researchers avoided building custom hardware no one is going to ever buy, and explored what could be done with just the generic webcams everyone already has. The result is a software-only solution, meaning all the big players (AIM, Skype, MSN, etc.) can release this as a simple software update. 'Enable 3D' checkbox anyone? YouTube video here. Behind the scenes, it relies on a clever illusory trick (motion parallax) and head-tracking (a la Johnny Lee's Wiimote stuff — same lab, HCII). It was just presented at IEEE International Symposium on Multimedia in December."

8 of 94 comments

  1. 2.5D, not 3D by adam · · Score: 5, Insightful

    The post title/summary is misleading -- this is actually 2.5D and not 3D at all. (It works on the premise that the background is static: it obtains a matte of the background, uses subtraction to dynamically key/mask the participant out of the image, and then adds the user as a second foreground layer; on the viewer side, head tracking is used to gently shift the user layer to reveal background hidden behind it.)
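
    To make the layering concrete, here is a minimal sketch of the pipeline as described above: subtract a stored background plate to get a mask, then composite the keyed user layer with a horizontal offset driven by head tracking. It uses tiny grayscale frames as nested lists; the function names, the threshold, and the shift value are all illustrative assumptions, not anything taken from the CMU system.

```python
# Minimal 2.5D sketch: key the participant off a static background,
# then composite them as a layer that shifts with the viewer's head.
# All names and the threshold value are illustrative assumptions.

def foreground_mask(frame, background, threshold=30):
    """Mark pixels that differ from the stored background plate."""
    return [
        [abs(p - b) > threshold for p, b in zip(frow, brow)]
        for frow, brow in zip(frame, background)
    ]

def composite_with_parallax(frame, background, mask, shift_px):
    """Draw the background, then the keyed user layer moved sideways.

    shift_px is the horizontal layer offset derived from head tracking
    (positive moves the user layer right). Pixels that slide off the
    edge are dropped; uncovered pixels show the background plate.
    """
    height, width = len(frame), len(frame[0])
    out = [row[:] for row in background]
    for y in range(height):
        for x in range(width):
            if mask[y][x]:
                nx = x + shift_px
                if 0 <= nx < width:
                    out[y][nx] = frame[y][x]
    return out

background = [[10] * 5 for _ in range(3)]
frame = [
    [10, 10, 10, 10, 10],
    [10, 200, 200, 10, 10],
    [10, 10, 10, 10, 10],
]
mask = foreground_mask(frame, background)
shifted = composite_with_parallax(frame, background, mask, shift_px=1)
```

    The user pixels move one column to the right and the column they vacated is filled from the background plate, which is the "reveal" part of the effect.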

    For what it's worth, I really don't care for this effect at all. I am not denigrating its inventors in the slightest; this is a novel (read: low cost) approach, and I am sure some people would enjoy having this in their iChat/AIM/Skype. To me, it's the equivalent of Apple's Photo Booth filters (fisheye, inverted colors, etc.) -- a cheap parlor trick that seems nifty for about 5 seconds, and then becomes precipitously distracting. True 3D has its own issues with distraction and visual anomalies (leading to headaches, etc.). Even the best 3D cinematographers around have to be very careful to avoid these issues (for instance, Vince Pace, who shoots 3D for James Cameron (Titanic, Terminator, etc.) has plenty of headache-inducing scenes in his demo reel, and this is a guy with state-of-the-art facilities who has as much knowledge as anyone about how to do stereoscopic cinematography). Frankly, I think video conferencing is best left 2D, and any efforts toward improving it should be spent increasing framerate/resolution (and reducing lag + dropped frames).

    --
    I am Jack's complete lack of surprise.
    1. Re:2.5D, not 3D by JustinOpinion · · Score: 5, Insightful

      I agree with you: having this kind of 2.5D experience is neat but not particularly useful.

      But I wonder if this software could be adapted to do something else... One of the things that most people dislike about webcam-conferencing is that the other person is never looking "at" you. They are looking on their screen at an image of you, so they are not looking directly at their camera, and so on your end they seem to be looking away from you. (And they see you looking away from them, too.)

      While this may seem trivial, it is actually a significant roadblock to inter-person tele-communication. People rely on body language and eye contact to establish each other's moods, to really "connect". Webcam-conferencing forces us to violate social conventions (like looking into people's eyes), which can be anywhere from subconsciously bothersome, to somewhat distracting, or even perceived as insulting.

      So what I would like is a multi-camera system that uses similar kinds of interpolation to rebuild the image of the person so that they are looking directly at the camera. So if I put one webcam on either side of my screen, they can combine their images to create a shifted image where I am looking directly at the viewer on the other end.

      Though it is a rather small and subtle addition to tele-conferencing, I believe it would have a bigger impact than what TFA seems to be showing. I think it would make the interaction "more real."

    2. Re:2.5D, not 3D by GameMaster · · Score: 4, Informative

      First off, the image would be an ugly red/blue mess. Secondly, even if you used one of the more advanced shutter-glasses or polarized 3D techniques, you'd still end up looking at someone wearing goofy 3D glasses obscuring eye contact. Don't get me wrong, I have no problem with wearing 3D glasses when playing games or watching a movie, but not when I'm trying to converse, face to face, with someone.

      --

      Rules of Conduct:
      #1 - The DM is always right.
      #2 - If the DM is wrong, see rule #1
    3. Re:2.5D, not 3D by forkazoo · · Score: 3, Informative

      So what I would like is a multi-camera system that uses similar kinds of interpolation to rebuild the image of the person so that they are looking directly at the camera. So if I put one webcam on either side of my screen, they can combine their images to create a shifted image where I am looking directly at the viewer on the other end.

      Geometric view interpolation is not unknown in the labs right now, and in some cases is being researched for exactly the reason you suggest. As another poster suggested, there are certainly some cases where the interpolation will break down. (Put a hand in front of each webcam at the side of your monitor, and it won't interpolate two palms to look like your face, for example.) Another one is that anything transparent makes it impossible to estimate the depth at a particular point because there are actually two depth values there. So, the smoke from your cigarette which is an amorphous volume of semitransparency through which you can see a window, the schmutz on the window, a reflection on the window, and something through the window will just ruin any chance of doing the interpolation properly. When you try to shift the pixel correctly to accommodate the view shift, you get like seven different answers for what direction it is supposed to go.
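
      For anyone who hasn't seen it, the basic idea can be sketched on a single scanline: each pixel carries one disparity value, and the virtual view is made by sliding pixels a fraction of that disparity. This is a toy forward warp under stated assumptions (one disparity per pixel, integer shifts, hypothetical function name), not how any particular product does it.

```python
# Sketch of disparity-based view interpolation on one scanline.
# Assumption: disparity[x] is how many pixels left_row[x] moves to the
# left between the left camera and the right camera.

def interpolate_row(left_row, disparity, alpha):
    """Forward-warp a scanline to a virtual camera between two webcams.

    alpha = 0.0 reproduces the left view, 1.0 approximates the right
    view, 0.5 is the midpoint (roughly where the screen is).
    Returns a list where None marks holes nothing was warped into.
    """
    width = len(left_row)
    out = [None] * width
    out_disp = [-1.0] * width  # disparity of the pixel currently stored
    for x, (value, d) in enumerate(zip(left_row, disparity)):
        nx = round(x - alpha * d)
        # Larger disparity means closer to the camera, so it wins.
        if 0 <= nx < width and d > out_disp[nx]:
            out[nx] = value
            out_disp[nx] = d
    return out

left_row = [1, 1, 9, 9, 1, 1]      # 9 = near object, 1 = far wall
disparity = [0, 0, 2, 2, 0, 0]     # near object shifts, wall does not
mid = interpolate_row(left_row, disparity, alpha=0.5)
```

      The `None` left behind the near object is a hole that has to be filled from the other camera, and the single `disparity[x]` per pixel is exactly the assumption that smoke, glass and reflections violate.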

      Still, look up the Foundry's "Ocula" system for 3D cinematography. It's a shipping commercial product that does a lot of strong magic with stereoscopic imagery on a daily basis. (Which I would have assumed was currently impossible.)

      It's too slow to be used for real time conferencing. You let it cook overnight for a single shot, or a handful of shots, to compute disparity maps offline. It needs to be at least an order of magnitude faster to be practical for real time work. Thankfully, there are a lot of researchers trying to figure out clever hacks to speed up these sorts of things, and a lot of engineers figuring out ways to build stonking GPUs to run OpenCL in a year or two. My guess is that stereo stuff becomes mainstream somewhere around 2011-2012.

  2. The tech is cool, sure.. by Quarters · · Score: 5, Insightful

    ...but that sample conversation at the end of the video may as well have been between two drunken epilepsy sufferers on boats in the North Atlantic. Who moves around like that while they are talking?

  3. Re:Game control? by Wumpus · · Score: 3, Informative

    John Carmack prototyped this a few years back. His conclusion at the time was that there was too much lag in the system to make it really useful.

  4. Bandwidth reduction? by Anonymous Coward · · Score: 3, Interesting

    I wonder if a more practical use would be to use the technique for video bandwidth reduction. If you know where the person is, you could concentrate video bandwidth on the face region, while keeping the rest of the "video" relatively static. No point in continuously compressing and sending boring background. Of course many codecs already do temporal compression that gives a similar effect, but this might increase the efficiency for video chat.
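
    A rough sketch of what that could look like, assuming the frame is cut into fixed blocks and some tracker hands over the set of blocks covering the face. The block representation (one mean brightness value per block), the threshold, and the function name are all made up for illustration; a real codec's motion compensation is far more sophisticated than this.

```python
# Sketch of region-of-interest updating for video chat.
# Assumptions: frames are split into fixed blocks, face_blocks comes from
# some face tracker, and the threshold value is illustrative.

def blocks_to_send(prev_blocks, cur_blocks, face_blocks, threshold=8):
    """Return indices of blocks worth transmitting for this frame.

    prev_blocks / cur_blocks: list of per-block mean brightness values.
    face_blocks: set of block indices covering the tracked face.
    Face blocks are always refreshed; background blocks are sent only
    when they changed by more than `threshold`.
    """
    selected = []
    for i, (old, new) in enumerate(zip(prev_blocks, cur_blocks)):
        if i in face_blocks or abs(new - old) > threshold:
            selected.append(i)
    return selected

prev_blocks = [50, 52, 120, 125, 49, 51]
cur_blocks = [50, 53, 131, 119, 49, 80]
send = blocks_to_send(prev_blocks, cur_blocks, face_blocks={2, 3})
```

    Here only the two face blocks and the one background block that really changed get sent; the other three are left alone.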

  5. I worked on that too. Look at these vids... by dinther · · Score: 3, Interesting

    Inspired by Johnny Lee's stuff, I pulled some old code out over a year ago and turned it into a decent engine that handles multiple screens and head tracking (TrackIR) to achieve the motion parallax effect. Like with all 3D effects, it needs to be seen but the following videos give you a good idea.

    Have a look at these demo videos and you can even download a demo:

    My first test
    http://nz.youtube.com/watch?v=X8PevTuEWlg

    More accurate tracking
    http://nz.youtube.com/watch?v=yf1hu6GLmf0

    Multi screen study
    http://nz.youtube.com/watch?v=ZBdtPz2V_vY

    Engine complete
    http://nz.youtube.com/watch?v=ku76aHq3pps

    Download Demo
    http://vandinther.googlepages.com/virtualwindow
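
    For the curious, the core of the motion parallax effect mentioned above is an off-axis projection: the screen is treated as a fixed window and the view frustum is skewed according to where the tracked head is. The sketch below is a generic version of that calculation, not code from the engine in the videos; the coordinate convention (screen centred at the origin in the z = 0 plane, metres, eye at positive z) and the function name are assumptions.

```python
# Sketch of the "virtual window" projection behind head-tracked parallax.
# Assumptions: the screen is centred at the origin in the z = 0 plane,
# units are metres, and the tracker reports the eye position (ex, ey, ez)
# with ez > 0 in front of the screen.

def off_axis_frustum(eye, half_width, half_height, near):
    """Return (left, right, bottom, top) of an asymmetric view frustum.

    The values are measured on the near plane, in the form expected by
    glFrustum-style projection setups.
    """
    ex, ey, ez = eye
    if ez <= 0:
        raise ValueError("eye must be in front of the screen")
    scale = near / ez
    left = (-half_width - ex) * scale
    right = (half_width - ex) * scale
    bottom = (-half_height - ey) * scale
    top = (half_height - ey) * scale
    return left, right, bottom, top

centred = off_axis_frustum((0.0, 0.0, 0.5), 0.25, 0.15, near=0.1)
moved = off_axis_frustum((0.1, 0.0, 0.5), 0.25, 0.15, near=0.1)
```

    With the head centred the frustum is symmetric; moving the head 10 cm to the right skews it to the left by the same proportion while its width stays the same, which is what makes the scene appear to sit behind the glass.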