Ask Slashdot: Tips On 2D To Stereo 3D Conversion?
An anonymous reader writes "I'm interested in converting 2D video to Stereoscopic 3D video -- the Red/Cyan Anaglyph type in particular (to ensure compatibility with cardboard Anaglyph glasses). Here are my questions: Which software(s) or algorithms can currently do this, and do it well? Also, are there any 3D TVs on the market that have a high quality 2D-to-3D realtime conversion function in them? And finally, if I were to try and roll my own 2D-to-3D conversion algorithm, where should I start? Which books, websites, blogs or papers should I look at?" I'd never even thought about this as a possibility; now I see there are some tutorials available; if you've done it, though, what sort of results did you get? And any tips for those using Linux?
Don't do it.
Give me Classic Slashdot or give me death!
we all were suckered. we tried it, hated it and moved on.
each time they try to re-invent this, it's still just an effects gimmick.
you'll soon grow bored.
don't invest anything in this. it's a recurring cash grab due to industry boredom.
and as a fulltime glasses wearer, I'd never be caught dead with cardboard glasses over my regular ones. an absurd concept if there ever was one.
--
"It is now safe to switch off your computer."
I'm interested in converting 2D video to Stereoscopic 3D video
George Lucas, is that you?
A friend of mine used to work for a French special effects company and he had to work on this. He told me that this is basically a world of pain and it produces great piles of smoking shit. It just sucks, even when done properly by highly trained people. Can you imagine making 3D out of a 2D tree? Make every background 3D or properly cut out the character to get the desired effect?
It sucks, it's mostly manual, get over it.
Stupidity is the root of all evil.
That's you, isn't it George Lucas?
Dammit, leave the original trilogy alone! The digital "remaster" was insulting enough!
An enigma, wrapped in a riddle, shrouded in bacon and cheese
You will want to avoid the old paper red/cyan glasses and go with the slightly more expensive plastic ones that are designed for LCD monitors and TVs. Otherwise be prepared for a LOT of ghosting. Also, nvidia makes plastic red/cyan glasses that are designed to fit over regular glasses. You may also need to calibrate your monitor to make sure that red is really red and cyan is really cyan.
I was personally very surprised at how well red/cyan works. Of course the colors get a little muddled, but not as much as I had expected.
I bought these, btw http://www.amazon.com/Glasses-Pro-Ana-movies-Computers/dp/B0036NP3CS/ref=sr_1_1?ie=UTF8&qid=1327421067&sr=8-1
You can't turn a two-dimensional photograph into 3D because the original has lost all the phase information that conveys needed info (e.g., "depth"). Similarly, you can't restore 2D sound to 3D, because the essential information isn't in the source recording that you'd need to "position" all the sound sources in 3D. In general, you can go from (N+1) to (N) dimensions, but you lose information. That means you can not automatically go from (N-1) dimensions to (N) without restoring that lost information...which wasn't recorded. Therefore, you'd have to synthesize every frame of video/sound to add the missing stuff, and you can't get it automagically, because the (N-1) version simply doesn't have the information you need to make the transformation.
Example. Set up an orchestra with a flutist positioned 20 feet above the main orchestra. 2D mics have picked up all sounds, but they have no sense of where, vertically, each musical instrument is located, because the two (or more) horizontally-dispersed stereo microphones are laterally displaced. You'd have to add microphones that are positioned vertically to gather the phase information for 3D, but your recording has no such information.
Clarification -- Arduino doesn't suck, just paraphrasing the unfortunate mentality of a bunch of posters on this article. It is bewildering to me that on a "news for nerds" site, people are disparaging somebody from undertaking what could turn out to be a cool tech project, even if it is known in advance that the end result isn't going to be "Avatar". And even if the best of 3D is a bomb in the theater, that doesn't mean it isn't a lot of fun to play with, as a school project, etc. I enjoyed messing with this stuff in physics lab in college.
Contra my provocative subject, Arduino is an excellent choice for serious hobbyists. And similarly, there is nothing wrong with playing around with 3D video techniques and even being willing to try rolling one's own algorithm.
Get a (homebrew friendly) life, slashdotters!
(If the OP clarifies that he's working on a big Hollywood title, I'll take this back. Until then...)
This anon has it right. If you have two synchronized 2D films of the same thing from slightly different angles, then you can try to match objects in the two frames, and use that to determine the depth. You could just apply the red filter to one film, blue to the other, superimpose, and boom, it's like you're watching the two different films with your two different eyes, and if they were filmed with cameras set properly, it will actually look right. But it sounds like you only have one 2D film. Here, the best you can hope for is to identify different objects in the film and apply a depth to each one, try to match and track those objects across different frames and keep the depths consistent. If the objects are sitting on a flat floor and you can see their bases, or if you can see shadows of the objects, you may be able to use that info to determine depth. Otherwise, you have to guess, and the result will probably look poor.
This can be done easily with ffmpeg and ImageMagick. You need two video sources, one from the left camera and one from the right. Use ffmpeg to extract a picture sequence from each video. Then, in a bash script, use ImageMagick to separate the colour channels of each frame: red from one camera, green/blue from the other. With the separation done, use ImageMagick again to join the red channel of one frame with the green/blue channels of the matching frame from the other camera into a new picture sequence. When that sequence is ready, convert it back into video with ffmpeg. Try googling for the ffmpeg and ImageMagick arguments when writing the bash script.
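The per-frame channel merge described above can be sketched in a few lines. This is only an illustration in plain Python, not the ffmpeg/ImageMagick script itself: the function name `merge_red_cyan` and the list-of-tuples image layout are assumptions made for the example, and the red-goes-with-the-left-eye choice follows the usual red/cyan glasses convention rather than anything stated in the comment.

```python
# An image here is a list of rows; each pixel is an (r, g, b) tuple.
# Real code would more likely use Pillow or NumPy; plain lists keep
# this sketch dependency-free.

def merge_red_cyan(left, right):
    """Take red from the left-eye frame and green/blue from the right-eye frame."""
    if len(left) != len(right) or any(len(a) != len(b) for a, b in zip(left, right)):
        raise ValueError("left and right frames must have the same size")
    return [
        [(lp[0], rp[1], rp[2]) for lp, rp in zip(left_row, right_row)]
        for left_row, right_row in zip(left, right)
    ]


# Tiny 1x2 "frames" standing in for one extracted picture from each camera.
left = [[(200, 10, 10), (50, 60, 70)]]
right = [[(5, 120, 130), (90, 100, 110)]]
print(merge_red_cyan(left, right))  # [[(200, 120, 130), (50, 100, 110)]]
```

Running this over every pair of extracted frames gives the new picture sequence that is then re-encoded to video.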
THIS. Somebody mod parent up please.
...gives me a headache, especially the flickering kind.
My Blu-ray player can simulate 3D from any 2D source (Panasonic DMP-BDT210) although I'm not exactly sure how it does it, or how good it looks. (no 3D tv) You might be able to talk someone into connecting one up at your favorite bigbox store for you if you acted interested in buying the Blu-ray player, and wanted a demo of its conversion capabilities. This would at least give you a firsthand idea of how it will look to see if YOU think it's worth it.
There was a recent NOVA episode about aerial photo reconnaissance during WWII. To make stereoscopic images, they'd fly the plane straight and level over the target. If they could take multiple pictures with 60% overlap, they could use two adjacent images to make one stereoscopic image that was good enough to tell a ship from a decoy.
Any motion picture where the camera pans side to side gives an opportunity to create a "3d" image. If an object moves across a still camera, you can also derive 3d information. (Also if it spins)
An interesting exercise would be to process a film, and make stereoscopic only what can be done properly, and leave the rest flat. A scene would start out flat, then people and things would begin to jump out at you.
All ideas^H^H^H^H^Hprocesses in this post are Patent Pending. (as well as the process of patenting all postings)
The convert utility in the imagemagick package does a good job of it with still images. I'd consider dumping your frames out as a series of images, running the convert utility on them, and then re-creating your movie.
I've also thought about taking that code in convert, merging it into VLC, and setting up VLC to grab from 2 cameras at once... with enough CPU and RAM, it could come very close to a real-time 3D movie.
Don't blame me, I voted for Kodos
You might want to look into luminosity based research. The brightness at each pixel may contain some information about the angle of the surface with respect to the camera and a light source. At some point that looked potentially promising. But of course the technique can fail pretty easily. Much of the work I've seen is based on trying to figure out how our brains do this all the time. Try closing one eye, see how 3D the world still looks (better than most 3D movies). You are going to have a tough challenge to beat that. But that doesn't mean it's not worth trying.
1. Display 2d images on a flat panel tv facing you
2. spin the display 45 degrees so that one edge is nearer to you than the other edge
3. That's it -- notice how pixels on one side are closer to you while the ones on the opposite edge are further away from you (spatially). Your display is in 3D now.
2d->3d converted media is much more likely to make people feel sick or get headaches from the video than media recorded directly in 3d. There are two reasons for this. Firstly, because you lack some information. For instance if you look at a box that is obscuring your vision of the objects behind it in the real world, each eye has different information based on its perspective. (Try looking at something with one eye, then the other, and look at what changes behind the object). 2d media will only have the information for one eye, and you'll have to make up/fake out that second eye. Secondly, you're trying to fake out the depth cues and it's very hard to do right because you often don't have the depth buffer necessary to do it right.
Standard Template:
I want to do _something_, but I do not know how to do _something_, how do I do _something_, provided I don't know how to or do not want to waste my time using Google.
Then a barrage of responses by people that don't really know how to do _something_, but surprisingly have a lot of opinion about _something_.
And then, of course, a smart *ss like myself pointing this out.
I haven't thought of anything clever to put here, but then again most of you haven't either.
In a few words: if you only have a 2D video, then it is a very hard computer vision problem, that has not been solved on the research side.
There is an active benchmark of disparity estimation algorithms (full bibliography at the end of the page). Those algorithms take two pictures and estimate a depth image. From this depth image, it is possible to reconstruct the scene in 3D (but you cannot see what's behind objects). From my experience, this class of algorithms does quite a bad job with real-life images, and has not been applied to video at all.
I've been using optical flows (see a related benchmark) for the development of an Android app (3D Camera) that converts pictures from 2D to 3D, without glasses (check it out!). The optical flow is a more general version of depth estimation (i.e. in any direction, not just left to right motion). It has been applied to 3D conversion of videos with relative success, I can search for references if you are interested.
From my knowledge & experience, optical flows are the state of the art algorithms to convert 2D pictures/videos to 3D, but they are quite computationally intensive.
Despite what some PR hustling excitables might claim, stereoscopic conversion cannot be effectively automated at this time. Do people try it? Yes. Does it generate watchable results? Sometimes by accident, yes.
The thing is, a stereoscopic conversion done painstakingly frame by frame by a highly skilled compositing artist looks pretty bad. Any automated conversion process will be orders of magnitude worse.
What you need is a ton of really excellent rotoscoping (I send my jobs out to work farms in Russia) to separate all of the elements, and then a compositing application like After Effects or Nuke to offset the various layers along the Z (while scaling to retain size coherency). Now the fun part: fill in all the missing pixels your offset has made visible! A combination of displacement maps, cloning and hand-painted details should do the trick (this is the part that separates the men from the boys).
Your mileage may vary, but in ideal circumstances this is still a pretty hard trick to pull off without inducing headaches or making everything on screen look like cardboard flats.
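The "scaling to retain size coherency" step mentioned above comes down to one ratio if you assume a simple pinhole camera, where apparent size is proportional to 1/distance. A rough sketch under that assumption (the function name and the idea of measuring distances from the virtual camera are illustrative, not taken from After Effects or Nuke):

```python
def scale_for_depth(original_distance, new_distance):
    """Scale factor that keeps a layer's on-screen size after moving it in Z.

    Assumes a pinhole camera: apparent size is proportional to
    1 / distance, so a layer pushed twice as far away has to be drawn
    twice as large to look unchanged from the original viewpoint.
    """
    if original_distance <= 0 or new_distance <= 0:
        raise ValueError("distances must be positive")
    return new_distance / original_distance


print(scale_for_depth(2.0, 4.0))  # 2.0 (pushed back, so scaled up)
print(scale_for_depth(4.0, 2.0))  # 0.5 (pulled forward, so scaled down)
```

Once every layer is offset and rescaled this way, the frame looks the same from the original camera, and the parallax only shows up when the second virtual camera is moved sideways.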
I'm waiting for the 1D to 2D algorithms to be perfected. I have this 1D sketch of the battle of Bull Run that I'd really like to get converted. Here's the 1D version: __________________________ ________________ ___ ____________ Hopefully Slashdot doesn't get a takedown notice. What will be really awesome is when all of these work together, so I can convert that 1D drawing into a 4D movie!
Monoprice sells a 2D to 3D HDTV/DLP Converter (Frame Sequential, Side by Side, and Red/Cyan) w/ Remote for $95
Very nice explanation.
You can't just run a 2D video through an algorithm and magically get a 3D video.
You have to run the video through a compositing program (think Photoshop for video) and use that to chop and mask each scene and introduce parallax effects. Then (if your compositing program supports 3D space) you output the streams from two different virtual cameras so that you have 2 final videos that are synced and are from two different angles (one for each eye). At that point, it's trivial to encode them to whichever 3D video container format you want to deliver as your final output.
If you really want to learn how to do this, try it first using stills with Photoshop or the Gimp. Once you understand what's involved for creating a believable 3D scene out of a 2D image, you're ready to start learning how to use a video compositing app to do the same thing.
Be prepared to spend a lot of time on this.
I'm out of my mind right now, but feel free to leave a message.....
This is basically impossible, or will have horrible artifacts.
The current crop of movies with 2D-to-3D conversions still took significant human and artistic effort to achieve, even though the results are mediocre. For a given frame, for every pixel in 2D, SOMETHING has to decide how far away the subject depicted must be. That is, it has to INVENT the third dimensional value. Then this value is used to calculate two new 2D frames with parallax involved.
There's no computational way to achieve this INVENTION of the depth value with an arbitrary photograph, though. Any computational model will have big gaps in its ability. With enough computing power, you can perhaps identify visual markers in neighboring frames (say, the corner of a lampshade), solve for where the camera position must be relative to the markers, then use the depth of the solved markers to base all the other pixels (say, the lampshade versus the drapes). But that (1) takes significant solver time now, (2) requires a lot of hand-adjustments to discard inappropriate markers that upset the solver process with bad results, and (3) won't find anywhere near enough quality markers across the whole frame in fast-moving action scenes to fill in the rest of the data.
Some people get ill with the best 3D out there, others get ill as the quality of the 3D information degrades. The inconsistent results of any realtime method would likely be epilepsy- and nausea-inducing in a matter of seconds.
It is possible. There are some algorithms that do this (semi-)automatically. Not sure how they work (perhaps using parallax from moving objects), but they do work, and I have seen the results. I came across a 3D version of one of the Star Wars movies, and I was quite impressed with the results from what is after all an automated process. The 3D in space and landscape scenes was pretty good. However closeups of talking faces revealed the weakness; the moving face confused the algorithm and the result was something that looked a bit like the shimmering produced by rising hot air.
Impressive from a technical point of view, but I wouldn't call the results suitable for the cinema or even for home viewing. 3D movies require the director and cameramen knowing their 3D stuff, you have to shoot specifically for 3D, and it is not easy. However, Cameron has shown that it can be done, and that it can add something to the movie rather than just being a gimmick with spears / body parts being pointed / thrown at the audience.
If construction was anything like programming, an incorrectly fitted lock would bring down the entire building...
In theory you can expand the image to 3d by clipping each object and vertically slicing it into layers (for a tree, or horizontally for a bench) then adding 3D effect to the layers, filling in where there are gaps and overlaying where needed, for each object in each frame of the film, then composite all the objects and masks back onto the scene. Now that you've spent ~100 hours on that, your first frame is done. Time to do the next, but now it's harder because you need to account for motion, so your clipping masks are not the same AR as the previous frame and your fill/overlap layers have changed as well since the perspective has changed. Wash, rinse, repeat for the rest of your life to finish a 10 minute clip.
possible to do right? yes.
practical? not at all.
-nB
whois gawk date unzip strip find touch finger mount join nice man top fsck grep eject more yes exit umount sleep dump
You don't simply want to filter the channels red vs green/blue. That creates terrible ghosting. Instead look up the Dubois algorithm: it's a linear projection from a 6D colorspace to a 3D colorspace, optimized for minimal ghosting using MSE. Finished matrices exist for red/cyan, green/magenta and amber/blue, available from Dubois' homepage. Recently used this for a project, works great.
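To make the "6D to 3D linear projection" concrete: each output pixel is a 3x6 matrix times the stacked left and right RGB values. The sketch below shows only that mechanism in plain Python. The matrix in it is NOT the Dubois matrix; it is the naive red-vs-green/blue split written in the same form, used as a placeholder so nothing here pretends to reproduce the published coefficients. The function names are made up for the example.

```python
def clamp8(x):
    """Round and clamp to the 0..255 range of an 8-bit channel."""
    return max(0, min(255, int(round(x))))


def project_pixel(matrix, left_px, right_px):
    """Multiply a 3x6 matrix by the stacked (Lr, Lg, Lb, Rr, Rg, Rb) vector."""
    six = tuple(left_px) + tuple(right_px)
    return tuple(clamp8(sum(m * v for m, v in zip(row, six))) for row in matrix)


def anaglyph(matrix, left, right):
    """Apply the projection to every pixel pair of two same-sized frames."""
    return [
        [project_pixel(matrix, lp, rp) for lp, rp in zip(lrow, rrow)]
        for lrow, rrow in zip(left, right)
    ]


# Placeholder: plain channel selection (red from left, green/blue from right).
# Substitute the red/cyan coefficients from Dubois' homepage here.
NAIVE_RED_CYAN = [
    [1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 1],
]

print(project_pixel(NAIVE_RED_CYAN, (200, 10, 10), (5, 120, 130)))  # (200, 120, 130)
```

The clamp is there because an optimized matrix can contain negative or greater-than-one coefficients, so a raw result may fall outside 0..255.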
You can try DVDFab from Fengtao and see if that works for you: http://www.dvdfab.com/
I work in post-production, and while some of the stereo-handling algorithms are impressive from a technical point of view (like the stuff in Eyeon Dimension and The Foundry's Ocula), and while I think stereo 3D is here to stay for video games (at least after consoles add some improvements to head tracking), I doubt it will be more than a passing fad for movies. It's simply not compatible enough with human vision, even when done properly (head movements spoil the effect, the difference between convergence point and focus plane puts stress on your eyes, etc.; it's as if someone nailed your head to the cameras). When I'm watching a movie, I'm a spectator, I don't feel any need to be "in" the movie; I'm fine with being an infinite distance away. Anything that makes watching the movie less comfortable is going to detract from the experience.
Anyway, although there are ways to extract 3D information from 2D image sequences (not from individual images), as done by camera trackers such as SynthEyes, PFTrack, etc., the result is a very low resolution point cloud, which is really only useful to calculate the camera position and / or track some scene features, not to create a usable stereoscopic image pair.
The only vaguely acceptable way to get stereo is to project the frames onto a (simplified) hand-made 3D model of the shot (typically a grid deformed by a displacement map), and then render it from two virtual cameras. This can take ages (to set up; rendering is quick) and is generally the kind of work you offload to some intern you don't like much. Even then, the results are generally less pleasant to watch than the original (mono) footage. If you're interested in seeing how this is done, search for "Stereo Conversion NAB" on YouTube, and you should find a few examples.
There is no way to convert individual frames from 2D to 3D in real time for the same reason that "digital zoom" can't show you text that was smaller than the sensor's pixels; the information is simply not there. You can, obviously, write an algorithm that adds made-up depth information to any image, just as you can write an algorithm that adds random text to zoomed images, but I doubt that would improve your movies in any way.
The problem isn't too hard if you are moving your camera sideways at an even speed since you could just use 1 frame for the left view, and a frame a short amount of time later for the right view. However, if the video camera is taking some unknown path then no 2 frames from the original video will in general create the correct parallax. Therefore, you would need to do a bundle adjust on the camera movement (computationally quite painful and not always reliable for arbitrary camera motion). Then comes the hard part of producing a close to 100% coverage dense 3D model at regular enough intervals to render new image frames with camera spacing and orientation to match human vision. Not impossible, but I think reliability of the currently available algorithms and computation time are the big problems.
I write post-production software used to do this (and it runs on Linux!). The best results I've seen involve manually breaking each shot into dozens of layers, using rotoscoping. Each set of layers is exported as masks and imported into a compositing application where the images for the layers are projected onto the masks in 3D space. In some cases they build rough 3D models and project the layers onto the respective models. Now they can add a virtual camera and render the scene from both views. Then they bring the footage into a paint system and manually paint in the "missing" parts that now show up because of the change in camera angle. This has to be done for both the left and the right eye.
They have a room of 300 guys in India doing this for Titanic. But the results are INCREDIBLE.
Some automatic techniques involve rotoscoping a depth map by hand (or with a combination of some automated depth map generation, but this almost always has to be tweaked for good results), then using that to synthesize two new views from the 2D footage. Then to fill in the gaps they can use either an automated warping (which looks almost, but not quite, entirely not all right) or hand-painting again.
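Here is a toy version of "use a depth map to synthesize two new views", reduced to a single scanline so the gaps are easy to see. It is a sketch under loud assumptions: the per-pixel value `nearness` is an integer shift in pixels (bigger means closer), the left view shifts pixels one way and the right view the other, and the names are invented for the example. Real tools work on full frames with sub-pixel warps.

```python
def warp_row(row, shifts, nearness, hole=None):
    """Forward-warp one scanline: the pixel at x moves to x + shifts[x].

    Far pixels are painted first so nearer ones overwrite them where
    they overlap. Positions nothing lands on stay as `hole`.
    """
    out = [hole] * len(row)
    for x in sorted(range(len(row)), key=lambda i: nearness[i]):
        nx = x + shifts[x]
        if 0 <= nx < len(out):
            out[nx] = row[x]
    return out


def synthesize_pair(row, nearness):
    """Return (left_view, right_view) for one scanline."""
    left = warp_row(row, nearness, nearness)
    right = warp_row(row, [-n for n in nearness], nearness)
    return left, right


row = list("abcdef")
nearness = [0, 0, 2, 2, 0, 0]  # 'c' and 'd' are the foreground object
left, right = synthesize_pair(row, nearness)
print(left)   # ['a', 'b', None, None, 'c', 'd']
print(right)  # ['c', 'd', None, None, 'e', 'f']
```

The `None` entries are exactly the "missing" pixels the comment talks about: background that the original camera never saw, which has to be warped in or painted by hand.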
The upshot is it is a very very manual, labor-intensive process, with somewhat specialized tools. But when done well it looks amazing.
Not really as bad as you think. All it does is show frame n in one eye and frame n + 1 in the other, stretched (and cropped to preserve aspect ratio) a bit to exaggerate the depth. So things that do not move are assumed to be in the background, while moving things seem to be closer. It's not as bad as you say, no resetting on key frames for example, but yes the effect is strange, often not right, as well as neat.
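The frame n / frame n + 1 trick is simple enough to sketch. The snippet below only shows the pairing; the stretching and cropping are left out, and which eye should get the later frame is an assumption here (in practice it would depend on the direction of motion). The function name is made up for the example.

```python
def delayed_pairs(frames, delay=1):
    """Yield (left, right) pairs where the right eye sees a later frame."""
    if delay < 1:
        raise ValueError("delay must be at least 1 frame")
    for i in range(len(frames) - delay):
        yield frames[i], frames[i + delay]


frames = ["f0", "f1", "f2", "f3"]
print(list(delayed_pairs(frames)))
# [('f0', 'f1'), ('f1', 'f2'), ('f2', 'f3')]
```

Anything that has not moved between the two frames ends up with zero disparity, which is why static content reads as background.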
The company DDD has built hardware to do this; it "works", after a fashion. It is, indeed, incorporated into a number of recent 3D TVs.
Basically, there are a number of algorithms in the box, and it chooses the one that is most appropriate for a given sequence. If the system sees blue in the top of the frame, it assumes that it is sky, and puts it in the back. If the camera is trucking from one side to the other to generate parallax, it uses that to generate depth. If I recall correctly, there are some 25 different algorithms using motion, color, brightness, etc -- and it indeed does sort of work.
The depth map that is generated is quite coarse. 3D conversion can look very good indeed. I was the stereo supervisor for 1/3 of Transformers III, which had both photographed and simulated 3D, and I felt that they were of comparable quality. It was a very non-real-time process, of course!
I love Mondays. On a Monday, anything is possible.
Have you tried http://www.youtube.com/editor_3d ? It's quite basic, and requires dual video input. I gave it a crack and got horrible results (mainly due to bad camera setup on my end and a lack of patience, oh and I didn't have the r/b glasses, did I mention that I wasn't trying very hard either). With a decent dual camera setup you could probably produce them quite painlessly.
Can a person program a new solution to a problem? Why should anyone be able to stop such a thing? -Richard Stallman
Use the Gimp and then this excellent 3-d tutorial. http://goldomega.deviantart.com/art/Photoshop-3d-Anaglyph-Tutorial-149857792 I have been converting my paintings to 3-d for use with red blue glasses.
Are you kidding, someone moderated this as Troll !!!
I certainly never intended to be negative about anyone or thing, just that it's a hard problem.
I am always doing that which I can not do, in order that I may learn how to do it. - Pablo Picasso
Someone mod parent up. That site's amazing!
Oh, I'm sorry sir, I thought you were referring to me, Mr. Wensleydale.
Download the Bundler + PMVS 3D reconstruction packages and feed video into them. Those packages are fairly stable and reliable and they give you a 3D point cloud. After that, convert the 3D point cloud to a surface. There are several packages which can do it, but I can't give any advice here: I don't know any *stable* package. All of them are research software, with memory leaks, random crashes, difficult parameter settings, compilation problems, etc. If you want to learn the algorithms yourself, that's at least a year's worth of math and computer vision (if you are not a math/phys major). "Multiple View Geometry in Computer Vision" is usually recommended for starters, but this book is thoroughly obsolete now. All the modern stuff is in the papers.
If you look on Google under "OpenCV stereo vision" you will find links showing how the code runs. There are video examples using two web cams that run in real time at around 5 frames per second. If you record and run off line you can get reasonable playback frame rates.
This code generates a depth map for the scene, so each pixel is assigned a distance from the camera. These techniques are derived from robotic vision research, so it is an image processing solution, not a 3D computer graphics solution. It does not generate 3D surfaces. What you do with the depth map is up to you.
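For anyone wondering how a two-camera matcher gets from pixel offsets to "a distance from the camera": for two parallel, rectified cameras the standard pinhole relation is depth = focal length x baseline / disparity. A small sketch of just that formula, not of the OpenCV code itself; the parameter names and the example numbers (a 700 px focal length, webcams 60 mm apart) are assumptions for illustration.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_mm):
    """Distance in mm for one pixel, given its left/right offset in pixels.

    Assumes rectified, parallel cameras (plain pinhole model).
    Zero or negative disparity is treated as "infinitely far".
    """
    if disparity_px <= 0:
        return float("inf")
    return focal_px * baseline_mm / disparity_px


print(depth_from_disparity(14, 700, 60))  # 3000.0 (about 3 m away)
print(depth_from_disparity(28, 700, 60))  # 1500.0 (double the offset, half the distance)
```

Because depth goes as 1/disparity, far objects all collapse into the last pixel or two of offset, which is one reason these depth maps get noisy with distance.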
Why is Snark Required?