Pencigraphy: Image Composites from Video

site is slow -- here's the text by krog · 2002-07-25 05:54 · Score: 4, Informative

Video Orbits of the Projective Group:
A New Perspective on Image Compositing.
Steve Mann
Abstract
A new technique has been developed for estimating the projective (homographic) coordinate transformation between pairs of images of a static scene, taken with a camera that is free to pan, tilt, rotate about its optical axis, and zoom. The technique solves the problem for two cases:

* images taken from the same location of an arbitrary 3-D scene, or
* images taken from arbitrary locations of a flat scene.

The technique, first published in 1993,

@INPROCEEDINGS{mannist,
AUTHOR = "S. Mann",
TITLE = "Compositing Multiple Pictures of the Same Scene",
Organization = {The Society of Imaging Science and Technology},
BOOKTITLE = {Proceedings of the 46th Annual {IS\&T} Conference},
Address = {Cambridge, Massachusetts},
Month = {May 9-14},
pages = "50--52",
note = "ISBN: 0-89208-171-6",
YEAR = {1993}
}

has recently been published in more detail in:

@techreport{manntip,
author = "S. Mann and R. W. Picard",
title = "Video orbits of the projective group;
A simple approach to featureless estimation of parameters",
institution = "Massachusetts Institute of Technology",
type = "TR",
number = "338",
address = "Cambridge, Ma",
month = "See http://n1nlf-1.eecg.toronto.edu/tip.ps.gz",
note = "Also appears {IEEE} Trans. Image Proc., Sept 1997, Vol. 6 No. 9",
year = 1995}

(The aspect of the 1993 paper dealing with differently exposed pictures to appear in a later Proc IEEE paper; please contact author of this WWW page if you're interested in knowing more about extending dynamic range by combining differently exposed pictures, or getting a preprint.)

A pdf file of the above publication, as it originally appeared, with the original pagination, etc., is also available.

The new algorithm is applied to the task of constructing high resolution still images from video. This approach generalizes inter-frame camera motion estimation methods which have previously used an affine model and/or which have relied upon finding points of correspondence between the image frames.

The new method, which allows an image to be created by ``painting with video'' is used in conjunction with a wearable wireless webcam, so that image mosaics can be generated simply by looking around, in a sense, ``painting with looks''.
Introduction
Combining multiple pictures of the same static scene allows for a higher ``resolution'' image to be constructed.

[Figure: Example of image composite from IS&T 1993 paper.]

In the above example, the spatial extent of the image is increased by panning the camera while mosaicing, and the spatial resolution is increased by zooming the camera and by combining overlapping frames from different viewpoints.

Note that the author overran the panning to appear twice in the composite picture (this is an old trick dating back to the days of the 1904 Kodak Cirkut No. 10 camera, which is still used to take the freshman portraits in Killian Court, and there are several people who still overrun the camera to get in the picture twice). Note also that the author appears sharper on the right than on the left because of the zooming in (``saliency'') at that region of the image.

Note also that, unlike previous methods based on the affine model, the inserts are not parallelogram-shaped (i.e. not affine), because a projective (homographic) coordinate transformation is used here rather than the affine coordinate transformation.

The difference between the affine model and the projective model is evident in the following figure:

For completeness, other coordinate transformations, such as bilinear and pseudo-perspective, are also shown. Note that the models are presented in two categories, models that exhibit the ``chirping'' effect, and those that do not.
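The point about parallelogram-shaped inserts can be checked numerically. The following is a minimal sketch, not the VideoOrbits code: the two matrices are made-up illustrative values, and the helper names `apply_homography` and `is_parallelogram` are hypothetical. It maps the corners of a unit square through an affine matrix (bottom row 0, 0, 1) and through a projective matrix (non-trivial bottom row) and tests whether the result is still a parallelogram.

```python
def apply_homography(H, x, y):
    """Map (x, y) through a 3x3 matrix H using homogeneous coordinates."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (u / w, v / w)


def is_parallelogram(quad, tol=1e-9):
    """True if the quadrilateral (corners in order) has coinciding diagonal midpoints."""
    (x0, y0), (x1, y1), (x2, y2), (x3, y3) = quad
    return abs((x0 + x2) - (x1 + x3)) < tol and abs((y0 + y2) - (y1 + y3)) < tol


square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]

# Illustrative values only (assumption): an affine map has bottom row [0, 0, 1].
affine = [[1.2, 0.3, 5.0],
          [-0.1, 0.9, 2.0],
          [0.0, 0.0, 1.0]]

# Same upper rows, but a non-trivial bottom row makes the map projective.
projective = [[1.2, 0.3, 5.0],
              [-0.1, 0.9, 2.0],
              [0.2, 0.1, 1.0]]

affine_quad = [apply_homography(affine, x, y) for x, y in square]
projective_quad = [apply_homography(projective, x, y) for x, y in square]

print("affine image is a parallelogram:", is_parallelogram(affine_quad))
print("projective image is a parallelogram:", is_parallelogram(projective_quad))
```

The division by `w` is what distinguishes the projective case: when the bottom row is (0, 0, 1), `w` is always 1 and the map reduces to the affine model.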
Examples

1. Extreme wide-angle architectural shot. A wide-sweeping panorama is presented in a distortion-free form (i.e. where straight lines map to straight lines).
2. My point of view at Wal-Mart (medium-resolution greyscale image; a somewhat higher resolution image is also available; a much higher resolution version of this same picture, in either 192 bit color (double) or 24 bit color (uchar), is available upon request).
3. ``Claire'' image sequence. Paul Hubel aims a hand-held video camera at his wife. Although the scene is not completely static and there is no constraint to keep the camera center of projection (COP) fixed, the algorithm produces a reasonable composite image.
4. An ``environment map'' of the Media Lab's ``computer garden''.
5. Head-mounted camera at a restaurant
6. Outdoor scene with people, close-up (Alan Alda interviewing me for Sci.Am "FRONTIERS").
7. National Geographic visit

See a gallery of quantigraphic image composites
Obtain (download) the latest version of VideoOrbits freesource from sourceforge,
or take a look at an older version (download of old version); if you don't want to obtain the whole tar file, you can take a look at the README of the old version. Send bugs, bug reports, suggestions for features, etc. to: mann@eecg.toronto.edu, fungja@eyetap.org, corey@eyetap.org
My original Matlab files upon which the C version of orbits is based (these in-turn were based on my PV-Wave and FORTRAN code)
For more info on orbits, see chapter 6 of the textbook.
Steve's personal Web page
List of publications

--
Cretin - a powerful and flexible CD reencoder

Restoring old video by Blind+Linux · 2002-07-25 05:57 · Score: 5, Interesting

Judging from the description found in that article, I believe that it is possible to enhance old video to higher qualities. However, the quality of color sometimes cannot be enhanced no matter what. Unless one has access to the original film reel, it is unlikely that any sort of improvements could be made; video copies are utterly useless in this manner. Anything from before 1990 in VHS is much worse quality, case in point being the John Woo film A Better Tomorrow. The problem with these videos is that not only is the quality blurry, but the color blending is off and sometimes exceeds the lines it should, creating distorted images. I've seen this in a lot of older movies... I wonder if there's a way to correct this.
At any rate this looks very promising indeed... it'd be cool to see some of the old classics in better quality. :)

Re:Restoring old video by Anonymous+Canard · 2002-07-25 07:45 · Score: 3, Informative

Well, NTSC video has 29.97 frames per second (w/ 2 alternating "fields" per frame). So when the camera is held steady, that's about 30 sample exposures of a particular angle.
For full motion video, our brains do this kind of integration for us anyway through persistence of vision. For the techniques described on the site to be used the successive images would have to be partially overlapping, not fully overlapping. It is pointless to do some sort of isomorphic mapping when the frames are fully overlapping already.
The reason I mention it is that I've written software to knit together separate photographs into a single panoramic picture. My approach was quite different, based on applying a lens distortion to the digital image to make all of the images map into a consistent spherical space, then mapping them back to an isomorphic image after they have been joined. The approach described here appears at first reading to involve rotating the images into a common plane and knitting them together in that plane - quite an interesting approach, and one I wish I had thought of at the time I was working on my own problem.
But taking that and saying 'yeah, now we can restore old movies' is just a bizarre misunderstanding of what the technique involves.

--
BitTorrent in C -- LibBT
http://www.sf.net/projects/libbt

Site already /.-ed by Hertog · 2002-07-25 05:58 · Score: 4, Funny

Can we be sure his head didn't explode?

--
-=- I heard rumours about an OS called "Social Life", heard of it? Is it stable? -=-

Not in real-time. by Christopher+Thomas · 2002-07-25 06:05 · Score: 5, Informative

The new method, which allows an image to be created by ``painting with video'' is used in conjunction with a wearable wireless webcam, so that image mosaics can be generated simply by looking around, in a sense, ``painting with looks''.

Just in case anyone was wondering - this wasn't being done in anything close to real-time the last time I checked. There's a cluster in Prof. Mann's lab which is dedicated to compositing these images (my cube is in the next room).

Still an interesting project. The affine transformation approach has been well-understood for some time (you do a brute force and ignorance test of promising-looking affine transformations [rotations and scalings] to find one that matches the new image to the old). As far as I can tell, he's doing the same thing with a different coordinate system that has a bit less distortion.

Re:Not in real-time. by dillon_rinker · 2002-07-25 08:44 · Score: 3, Informative

affine transformations [rotations and scalings]

Actually, a combination of rotation, scaling, translation, and shearing.

Algebraically (and more precisely) (and more pedantically), in 2-D:

X=A1*X1+B1*Y1+C1
Y=A2*X1+B2*Y1+C2

I'd try to show it as matrix arithmetic but the lameness filter won't let me. More evidence that being a math geek (or even a former one) is lame.
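For what it's worth, the matrix form of the two equations above can be sketched in a few lines. This is only an illustration: the helper names (`affine_matrix`, `matvec`, `matmul`) are hypothetical, and the numbers are made up. The six coefficients fill the top two rows of a 3x3 matrix whose bottom row is fixed at (0, 0, 1), so that chaining two affine maps becomes a matrix product.

```python
def affine_matrix(a1, b1, c1, a2, b2, c2):
    """Build the 3x3 homogeneous matrix for X=A1*X1+B1*Y1+C1, Y=A2*X1+B2*Y1+C2."""
    return [[a1, b1, c1],
            [a2, b2, c2],
            [0.0, 0.0, 1.0]]


def matvec(M, v):
    """Multiply a 3x3 matrix by a length-3 vector."""
    return [sum(M[i][k] * v[k] for k in range(3)) for i in range(3)]


def matmul(A, B):
    """Multiply two 3x3 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


# Illustrative maps (assumed values): scale by 2, then translate by (3, 4).
scale = affine_matrix(2.0, 0.0, 0.0,
                      0.0, 2.0, 0.0)
translate = affine_matrix(1.0, 0.0, 3.0,
                          0.0, 1.0, 4.0)

# Applying scale first and translate second is the product translate * scale.
combined = matmul(translate, scale)

point = [1.0, 1.0, 1.0]          # (X1, Y1) = (1, 1) in homogeneous form
mapped = matvec(combined, point)  # (X, Y, 1)
print(mapped[:2])
```

Rotation and shearing fit the same template; only the four A/B coefficients change.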

Look out Hollywood by JojoLinkyBob · 2002-07-25 06:11 · Score: 3, Interesting

A good testing ground for this concept could be the bootleg movie craze.

All of the different recordings for a given movie are commensurably low-quality, but wouldn't it be great if you could combine the best aspects of each (a "greater of goods") to generate one sharp quality movie? Testing it should be a little easier since you could use the rectangular silk-screen to calibrate the images. Food for thought.

--
-jc

Fourth Dimension. by Fross · 2002-07-25 06:11 · Score: 4, Interesting

This is a very interesting and inspired use of technologies, that is giving some great results. However, one thing that is not being taken into account here is that video is shot over time - subsequent frames of a scene represent a change in a scene according to how things progress over time. Thus for anything other than a static scene (which is not of too much use) this can cause problems.

Take for instance the example on the main page of this (if it's not slashdotted already), the two swimmers standing ready to dive in. In a real-world situation, by the time the first picture of the swimmer on the left was taken, the one on the right may have already dived in - when it comes to take that one's picture, he would be already swimming away. Hence if these images were composited, it would look like one dived in while the other was still on the blocks.

Possibly of artistic interest, but otherwise a bit of an annoyance in what is definitely a very cool use of technology. It's interesting that after 100 years or so, we could be back at the point where someone says "hold still for a few seconds, I'm going to take a picture".

Fross

Related to security techs? by mike3411 · 2002-07-25 06:12 · Score: 3, Interesting

The site's very /.'ed, but I believe what's done is similar to a technology used by security firms and the military. Essentially, when you take a picture of a given object/scene, the "true" resolution (composed of each individual photon bouncing off the objects and striking the lens) is always downsampled, to varying degrees, depending on the resolution of the camera. However, if a camera is moving, while each individual frame will be of equal resolution, the particular data that each is storing will contain different information about the object/scene. If, for example, the camera is pointed at a grayscale gradient that's so small it only occupies one pixel, that pixel might appear white, black, or somewhere in between depending on the exact orientation of the camera, and in a regular video would probably look like some indistinct blur between these colors. With analysis, the changes can be examined and used to create an image that accurately portrays the gradient.
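A toy 1-D version of that idea, as a sketch only: it assumes a noiseless sensor whose pixels each average exactly two fine-scale samples, a second frame shifted by exactly half a pixel, and a known first sample (all assumptions made for illustration; this is not how VideoOrbits or any particular product works). The function names `capture` and `reconstruct` are hypothetical.

```python
def capture(scene, offset):
    """Simulate a low-res frame: each pixel averages two adjacent fine samples."""
    return [(scene[i] + scene[i + 1]) / 2
            for i in range(offset, len(scene) - 1, 2)]


def reconstruct(frame_a, frame_b, first_sample):
    """Recover the fine samples from two frames offset by half a pixel.

    Interleaving the frames gives the averages of every consecutive pair of
    fine samples; with the first sample known, each next one follows from
    avg = (prev + next) / 2, i.e. next = 2 * avg - prev.
    """
    averages = []
    for i, pixel in enumerate(frame_a):
        averages.append(pixel)
        if i < len(frame_b):
            averages.append(frame_b[i])
    fine = [first_sample]
    for avg in averages:
        fine.append(2 * avg - fine[-1])
    return fine


# Made-up fine-scale scene with detail narrower than one sensor pixel.
scene = [0.0, 0.0, 1.0, 0.0, 0.0, 4.0, 2.0, 0.0]

frame_a = capture(scene, 0)  # pixels cover samples (0,1), (2,3), (4,5), (6,7)
frame_b = capture(scene, 1)  # pixels cover samples (1,2), (3,4), (5,6)

recovered = reconstruct(frame_a, frame_b, scene[0])
print("frame A:  ", frame_a)
print("frame B:  ", frame_b)
print("recovered:", recovered)
```

Neither frame alone resolves the narrow spike, but together they pin it down. With real footage the shifts are unknown and the data is noisy, so the shifts have to be estimated first and the inversion is far less clean than this.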

Traditionally, this has only been done with motionless cameras; it sounds like what this professor has done is to extend these capabilities to moving and zooming video, which is extremely cool (and I really want to check out his site, so everyone else stop going there :).

--
Mod me down, and I will become more powerful than you can possibly imagine!

video-still; what about video-video? by mikeee · 2002-07-25 06:26 · Score: 4, Interesting

It would be really neat if it could interlace multiple video streams into a higher-resolution single stream.

Use of such a technique to defeat no-copy flags left as an exercise.

I saw an article a few weeks ago about some DoD fooling about with tech that merged multiple cameras (at fixed locations) into a 3-D model that could be viewed from different angles in realtime. Anybody have a link to that one?

Consumer product did this, Snappy by t0qer · 2002-07-25 06:32 · Score: 3, Interesting

The Snappy video snapshot from Play Inc did this years ago IIRC. Even though NTSC res is 720x480, the Snappy was able to squeeze high-res pictures out by sampling 2 frames, then performed mathematical magic to achieve resolutions over 1280x1024.

More examples by interiot · 2002-07-25 06:33 · Score: 4, Informative

Some more pictures from Video Orbits:

``Claire'' image sequence. Paul Hubel aims a hand-held video camera at his wife.
An ``environment map'' of the Media Lab's ``computer garden''.
Outdoor scene with people, close-up (Alan Alda interviewing me for Sci.Am "FRONTIERS").

Ready the Slashdotting!

Clearing up some confusion by Astin · 2002-07-25 06:40 · Score: 5, Informative

My undergraduate design project was with Steve Mann on this technology (objective was the "parallelization" of the software on a Beowulf cluster - shout out to Mike and Anna :) ).

The main use of this system so far has been to stitch multiple images into one panoramic shot. Like any auto-stitching program, this requires a certain amount of overlap between frames - the more overlap, the better the stitching. The code works remarkably well, automatically rotating, zooming, skewing and otherwise transforming the images to fit together and then mapping them into a "flat" image as opposed to a parallelogram-shaped one.

Yes, the higher resolution from multiple shots of the same scene works, and is a very cool effect of the system. Of course, this requires a more or less static scene.

Finally, it's not necessarily "video" that it uses, although pulling individual frames from a video would work. It's based on the head-mounted cameras of the wearcam systems, which essentially use a stripped-down webcam for image-gathering, so you already know the fps and resolution limitations involved with those.

Of course, in the 2 years since I've been there, the technology has probably improved, although I doubt the webpage has. :)

Mann has a bunch of cool projects involved with the wearcam/wearcomps. This is a great one, another is the Photoquantigraphic Lightspace Rendering (painting with light), which can also be found on the wearcam site.

--
- In hell, treason is the work of angels.

Minority Report got its timeline wrong. by NeMon'ess · 2002-07-25 06:45 · Score: 5, Interesting

This is a gateway to pingpong-ball-less motion capture. In the future, with sufficient processing power and algorithms, it ought to be possible to combine two lenses spaced apart for stereo with x, y, and z axis positioning sensors. Such a device could record stereo data, combined with positional data and the understanding that objects "grow" as they come closer, to make 3D models of anything it sees. The more time it can watch an object and rotate/zoom around it, the more detailed the model can be. It doesn't even have to make the model in realtime, just record as much data as it can, then upload it to more powerful computers later. When does Minority Report take place? 2050 or so? Well, by then I fully expect that instead of the flat holograms Tom Cruise watched we'll have full 3D.

Re:Minority Report got its timeline wrong. by foobar104 · 2002-07-25 07:23 · Score: 3, Interesting

The techniques you talk about in such breathless terms have been in commercial use for several years. Discreet's compositing software has a 3D tracker module that can infer three-dimensional relationships from moving video; it works pretty well under most circumstances. And there's an outfit called RealVis, I think, that can turn a scene or a series of stills into a fully textured 3D model with only minimal human interaction. They used the same basic technique on The Matrix, way back in '98, to build virtual sets for some specific special effects shots.

The only real limitations are contrast-- a computer couldn't isolate a polar bear in a snowstorm no matter how well lit and shot-- and field of view. If you don't shoot the back of the car, you can't see the back of the car. (I know that's kind of a ``duh,'' but you'd be surprised how many people don't get that at first.)

Addendum by Astin · 2002-07-25 06:48 · Score: 3, Interesting

One more thing - this isn't done in real-time. It can be run on a single machine, but it takes a fair bit of time as it works through image pairs. Therefore, the more images you use, the longer it takes.

For example, with 5 images (1, 2, 3, 4, 5), it compares 1 & 2, 2 & 3, 3 & 4, and 4 & 5. The co-ordinate transformations for each pair are relative to the base image (so you don't have to re-transform after stitching).
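One way to picture how pairwise estimates end up relative to a base image is to chain them by matrix multiplication. The sketch below is an assumption about the general idea, not the actual VideoOrbits code: it supposes each pairwise 3x3 matrix maps frame k+1 coordinates into frame k coordinates, uses frame 1 as the base, and stands in plain translations with made-up offsets for the real projective estimates. `chain_to_base` and `translation` are hypothetical names.

```python
def matmul(A, B):
    """Multiply two 3x3 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


IDENTITY = [[1.0, 0.0, 0.0],
            [0.0, 1.0, 0.0],
            [0.0, 0.0, 1.0]]


def chain_to_base(pairwise):
    """Turn frame-to-previous-frame matrices into frame-to-base matrices.

    pairwise[k] is assumed to map frame k+1 coordinates into frame k
    coordinates (0-based), so the map from frame k+1 to the base is the
    previous cumulative map times pairwise[k].
    """
    cumulative = [IDENTITY]
    for H in pairwise:
        cumulative.append(matmul(cumulative[-1], H))
    return cumulative


def translation(tx, ty):
    """3x3 matrix for a pure shift; a stand-in for an estimated transform."""
    return [[1.0, 0.0, tx],
            [0.0, 1.0, ty],
            [0.0, 0.0, 1.0]]


# Four made-up pairwise estimates for five frames: 1&2, 2&3, 3&4, 4&5.
pairwise = [translation(10.0, 0.0),
            translation(12.0, 1.0),
            translation(9.0, -2.0),
            translation(11.0, 0.5)]

to_base = chain_to_base(pairwise)
for index, H in enumerate(to_base, start=1):
    print("frame", index, "offset in base frame:", (H[0][2], H[1][2]))
```

The pairwise comparisons are independent of one another, which is what makes them easy to farm out; only this cheap chaining step needs all of the results together.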

There has been work to farm out the comparisons across a Beowulf cluster (the one built when I was there was made of some impressive VA Linux boxes; I believe it's been expanded since). But this still takes some time. So unless someone's going to get a parallel computing cluster inside a single package and make it affordable, this won't be rolled out nationwide overnight.

--
- In hell, treason is the work of angels.

Pixelization No Longer Safe & Effective by Jah-Wren+Ryel · 2002-07-25 07:00 · Score: 5, Interesting

You know how television shows will pixelate the face of someone that doesn't want to be shown on television? Sometimes it is just a passerby on MTV's Real World who won't sign a release, but sometimes it's somebody a little more important like a corporate or federal whistle-blower.

I've long thought that pixelization wasn't a very good way to protect the identities of these people because when they are on video, they move around and the camera sometimes moves around, but often the pixelization is applied in post-production so it stays in a relatively constant location rather than tracking the features on the person's face. Anyone sufficiently motivated and sufficiently equipped with the right tools ought to be able to reconstruct a much higher resolution, non-pixelated image of the secret person's face by extracting all of the useful information from each frame and then correlating it all together with the general movements of the person in the frame.

It sounds to me like pencigraphy is exactly the kind of science required to do something like that. So now the question is, who do we want to unmask? Too bad Deep Throat never made an on camera appearance.

--
When information is power, privacy is freedom.

Er, but does it *work*? by pla · 2002-07-25 12:02 · Score: 3, Insightful

Can we say "documentation", people?

I have three pictures, with roughly 2/3rds overlap.

I ran them pairwise (1 and 2, then 2 and 3) through estpchirp2m. Good, I get two output sets of 8 reals. I stuff them into a single file, one on each line.

So I pintegrate that file, using picture #2 as the reference frame. Cool, I now have three sets of eight reals.

Next, I pchirp2nocrop all three separately, passing the appropriate line from pintegrate on the command line (why bother with text files here, if I need to cut-and-paste at this step anyway?). I now have three new .pbm files, which seems like what I should have according to the extremely limited documentation.

Step four, I cement the three new .pbm's together, and get a single file as the output. "Great!", I think, it worked and didn't give me too many problems.

So I open up the picture. Or try to. It seems that whatever the output file has in it, valid .pbm data doesn't top that list.

I tried again, but since I had followed the (limited) directions carefully the first time, my results did not differ.

So, I have three suggestions for Mr. Cyborg...

First, it doesn't matter *how* cool of a program you write, if no one can figure out how to use it (WRITE SOME REAL DOCS!!!).

Second, it doesn't matter how cool your program *sounds*, if it doesn't work.

Third, 99% of people playing with this will not want to tweak any of the in-between stages' results. Of those that *do*, 99% will just hack the source. Ditch the four (and then some) programs, and make a single executable that takes as its arguments just the name of the input files, in order, and perhaps a *few* tweaking options (like enable or disable filtering, which sounds useful, except YOU DON'T HAVE IT DESCRIBED ANYWHERE!).

Ahem.

Otherwise, great program. No doubt one of the many companies doing the same thing for the past 20 years will soon have their lawyers send their congrats.
