Pencigraphy: Image Composites from Video
jafuser writes: "Prof. Steve Mann (of cyborg fame) has a detailed technical description on his site that demonstrates a method of transforming video into a high resolution composite image. Pictures are seamlessly mosaiced together to form one larger picture of the scene. Portions of the video that were "zoomed in" will result in a much clearer region in the final picture. I wonder if this could be used in a linear sequence to 'restore' old video to higher resolutions? It's on sourceforge; download and play!" Mann has been experimenting with such composites using personal video cameras for years.
Video Orbits of the Projective Group:
A New Perspective on Image Compositing.
Steve Mann
Abstract
A new technique has been developed for estimating the projective (homographic) coordinate transformation between pairs of images of a static scene, taken with a camera that is free to pan, tilt, rotate about its optical axis, and zoom. The technique solves the problem for two cases:
* images taken from the same location of an arbitrary 3-D scene, or
* images taken from arbitrary locations of a flat scene.
The technique, first published in 1993,
@INPROCEEDINGS{mannist,
AUTHOR = "S. Mann",
TITLE = "Compositing Multiple Pictures of the Same Scene",
Organization = {The Society of Imaging Science and Technology},
BOOKTITLE = {Proceedings of the 46th Annual {IS\&T} Conference},
Address = {Cambridge, Massachusetts},
Month = {May 9-14},
pages = "50--52",
note = "ISBN: 0-89208-171-6",
YEAR = {1993}
}
has recently been published in more detail in:
@techreport{manntip,
author = "S. Mann and R. W. Picard",
title = "Video orbits of the projective group;
A simple approach to featureless estimation of parameters",
institution = "Massachusetts Institute of Technology",
type = "TR",
number = "338",
address = "Cambridge, Ma",
month = "See http://n1nlf-1.eecg.toronto.edu/tip.ps.gz",
note = "Also appears {IEEE} Trans. Image Proc., Sept 1997, Vol. 6 No. 9",
year = 1995}
(The aspect of the 1993 paper dealing with differently exposed pictures to appear in a later Proc IEEE paper; please contact author of this WWW page if you're interested in knowing more about extending dynamic range by combining differently exposed pictures, or getting a preprint.)
A pdf file of the above publication, as it originally appeared, with the original pagination, etc., is also available.
The new algorithm is applied to the task of constructing high resolution still images from video. This approach generalizes inter-frame camera motion estimation methods which have previously used an affine model and/or which have relied upon finding points of correspondence between the image frames.
The new method, which allows an image to be created by ``painting with video'' is used in conjunction with a wearable wireless webcam, so that image mosaics can be generated simply by looking around, in a sense, ``painting with looks''.
Introduction
Combining multiple pictures of the same static scene allows for a higher ``resolution'' image to be constructed.: Example of image composite from IS&T 1993 paper (click to see higher resolution version). In the above example, the spatial extent of the image is increased by panning the camera while mosaicing and the spatial resolution is increased by zooming the camera and by combining overlapping frames from different viewpoints.
Note that the author overran the panning to appear twice in the composite picture (this is an old trick dating back to the days of the 1904 Kodak circuit 10 camera which is still used to take the freshman portraits in Killian court, and there are several people who still overrun the camera to get in the picture twice). Note also that the author appears sharper on the right than on the left because of the zooming in (``saliency'') at that region of the image.
Note also that, unlike previous methods based on the affine model, the inserts are not parallelogram-shaped (e.g. not affine), because a projective (homographic) coordinate transformation is used here rather than the affine coordinate transformation.
The difference between the affine model and the projective model is evident in the following figure:
For completeness, other coordinate transformations, such as bilinear and pseudo-perspective, are also shown. Note that the models are presented in two categories, models that exhibit the ``chirping'' effect, and those that do not.
Examples
1. Extreme wide-angle architectural shot. A wide-sweeping panorama is presented in a distortion-free form (e.g. where straight lines map to straight lines).
2. My point of view at Wal-Mart Click for medium-resolution greyscale image; a somewhat higher resolution image is available here; a much higher resolution version of this same picture, in either 192 bit color (double) or 24 bit color (uchar), is available upon request).
3. ``Claire'' image sequence Paul Hubel aims a hand-held video camera at his wife. Although the scene is not completely static and there is no constraint to keep the camera center of projection (COP) fixed, the algorithm produces a reasonable composite image.
4. An ``environment map'' of the Media Lab's ``computer garden''.
5. Head-mounted camera at a restaurant
6. Outdoor scene with people, close-up (Alan Alda interviewing me for Sci.Am "FRONTIERS").
7. National geographic visit
See a gallery of quantigraphic image composites
Obtain (download) latest version of VideoOrbits freesource from sourceforge
or if you can take a look at an older version, (download of old version) or if you don't want to obtain the whole tar file, you can take a look at the README of the old version. bugs, bug reports, suggestions for features, etc. to: mann@eecg.toronto.edu, fungja@eyetap.org, corey@eyetap.org
My original Matlab files upon which the C version of orbits is based (these in-turn were based on my PV-Wave and FORTRAN code)
For more info on orbits, see chapter 6 of the textbook. Steve's personal Web page
List of publications
Cretin - a powerful and flexible CD reencoder
The new method, which allows an image to be created by ``painting with video'' is used in conjunction with a wearable wireless webcam, so that image mosaics can be generated simply by looking around, in a sense, ``painting with looks''.
Just in case anyone was wondering - this wasn't being done in anything close to real-time the last time I checked. There's a cluster in Prof. Mann's lab which is dedicated to compositing these images (my cube is in the next room).
Still an interesting project. The affine transformation approach has been well-understood for some time (you do a brute force and ignorance test of promising-looking affine transformations [rotations and scalings] to find one that matches the new image to the old). As far as I can tell, he's doing the same thing with a different coordinate system that has a bit less distortion.
Video Orbits of the Projective Group.
Grabbed this a little before the server colapsed
... v116.ppm (e.g. for an image sequence with 117 pictures in it).
The four main programs you need to use to assemble such image sets are estpchirp2m, pintegrate, pchirp2nocrop, and cement (Computer Enhanced Multiple Exposure Numerical Technique).
The programs use the ``isatty'' feature of the C programming language to provide documentation which is accessed by running them with no command line arguments (e.g. from a TTY) to get a help screen. The sections for each program give usage hints where appropriate. Future versions will support the ``pipe'' construct (e.g. some programs may be used without command line arguments but will still do the right thing in this case rather than just printing a help message).
The first program you need to run is estpchirp2m, which estimates coordinate transformation parameters between pairs of images. These ``chirp'' parameters are sets of eight real-valued quantities which indicate a projective (i.e., affine plus chirp) coordinate transformation on an image.
The images are generally numbered sequentially, for example, v000.ppm, v001.ppm,
After you run estpchirp2m on all successive pairs of input images in the sequence, the result will be a set of sets of eight numbers, in ASCII text, one set of numbers per row of text (the numbers separated by white space). The number of lines in the output ASCII text file will be one less than the total number of frames in the input image sequence. For example, if you have a 117-frame sequence (e.g. image files numbered v000.ppm to v116.ppm), there will be 116 lines of ASCII text in the output file from estpchirp2m.
The first row of the text file (e.g. the first set of numbers) indicates the coordinate transformation between frame 0 and frame 1; the second row, the coordinate transformation between frame 1 and frame 2, \ldots A typical filename for these parameters is ``parameters\_pairwise.txt''
These pairwise {\em relative} parameter estimates are then to be converted into ``integrated'' {\em absolute} coordinate transformation parameters (e.g. coordinate transformations with respect to some selected `reference frame'). This conversion is done by a program called pintegrate.
This program takes as input the filename of the file containing parameters from the ASCII text file produced by estpchirp2m (e.g. ``parameters\_pairwise.txt'' and a `reference frame' (specified by the user), and calculates the coordinate transformation parameters from each frame in the image sequence to this specified `reference frame'.
The output of pintegrate is another ASCII text file which lists the set of chirp parameters (again, 8 numbers per chirp parameter, each set of 8 numbers in ASCII, on a new row of text), this time one parameter per frame, designed to be used in order. That is, the first row of the output text file (first set of 8 real numbers) provides the coordinate transformation from frame 0 to the reference frame, the second from frame 1 to the reference frame\ldots
The program called pchirp2nocrop takes the ppm or pgm image for each input frame, together with the chirp parameter for this frame % from pintegrate, and `dechirps' it (applies the coordinate transformation to bring it into the same coordinates as the reference frame). Generally the parameters passed to pchirp2nocrop are those which come from pintegrate (e.g. {\em absolute} parameters, not relative parameters). The output of pchirp2nocrop is another ppm or pgm file.
The program called cement (CEMENT is an acronym for Computer Enhanced Multiple Exposure Numerical Technique.) assembles the dechirped images (which have been processed by pchirp2nocrop) and assembles them onto one large image `canvas'.
Ready the Slashdoting!
mib? i guess you're referring to the shades.. guess what those are? his own custom wearable computer. you can read more about it at wearcam.org. this guy is awesome.
My undergraduate design project was with Steve Mann on this technology (objective was the "parallelization" of the software on a Beowulf cluster - shout out to Mike and Anna :) ).
:)
The main use of this system so far has been to stitch multiple images into one panoramic shot. Like any auto-stitching program, this requires a certain amount of overlap between frames - the more overlap, the better the stitching. The code works remarkably well, automatically rotating, zooming, skewing and otherwise transforming the images to fit together and then mapping them into a "flat" image as opposed to a parallelogram-shaped one.
Yes, the higher resolution from multiple shots of the same scene works, and is a very cool effect of the system. Of course, this requires a more or less static scene.
Finally, it's not necessarily "video" that it uses, although pulling individual frames from a video would work. It's based of the head-mounted cameras of the wearcam systems, which essentially use a stripped-down webcam for image-gathering, so you already know the fps and resolution limitations involved with those.
Of course, in the 2 years since I've been there, the technology has probably improved, although I doubt the webpage has.
Mann has a bunch of cool projects involved with the wearcam/wearcomps. This is a great one, another is the Photoquantigraphic Lightspace Rendering (painting with light), which can also be found on the wearcam site.
- In hell, treason is the work of angels.
For full motion video, our brains do this kind of integration for us anyway through persistence of vision. For the techniques described on the site to be used the successive images would have to be partially overlapping, not fully overlapping. It is pointless to do some sort of isomorphic mapping when the frames are fully overlapping already.
The reason I mention it is that I've written software to knit together separate photographs into a single panoramic picture. My approach was quite different, based on applying a lens distortion to the digital image to make all of the images map into a consistent sperical space, then mapping them back to an isomorphic image after they have been joined. The approach described here appears at first reading to involve rotating the images into a common plane and knit them together in that plane - quite an interesting approach, and one I wish I had thought of at the time I was working on my own problem.
But taking that and saying 'yeah, now we can restore old movies' is just a bizarre misunderstanding of what the technique involves.
--
BitTorrent in C -- LibBT
http://www.sf.net/projects/libbt