Choosing Better-Quality JPEG Images With Software?
kpoole55 writes "I've been googling for an answer to a question and I'm not making much progress. The problem is image collections, and finding the better of near-duplicate images. There are many programs, free and costly, CLI or GUI oriented, for finding visually similar images — but I'm looking for a next step in the process. It's known that saving the same source image in JPEG format at different quality levels produces different images, the one at the lower quality having more JPEG artifacts. I've been trying to find a method to compare two visually similar JPEG images and select the one with the fewest JPEG artifacts (or the one with the most JPEG artifacts, either will serve.) I also suspect that this is going to be one of those 'Well, of course, how else would you do it? It's so simple.' moments."
Unfortunately, I think you may find that it will simply require a human-level brain. I'd be really impressed with software that said, "Yep, this image just *looks* better to me." Unless, of course, JPG artifacts are systematic and consistent across images, which could well be.
Paste both images into your image editor of choice, one layer on top of the other, and apply a difference/subtraction filter.
it is lossy compression, after all . . .
Do it with a bunch of images, and I expect you'll discover that the low-quality-gzipped image will be smaller than the high-quality-gzipped image...
Maybe? *shrug*
Larger file size should give a rough hint if both images are the same format (i.e. JPEG). But you've probably already thought of that.
Of course, there ought to be better ways...
Have you tried just comparing the files' sizes with respect to the images' dimensions? It'll vary from encoder to encoder, but higher-quality JPEGs will be larger than lower-quality ones. You could just use the number of pixels in the picture and the file size to obtain a rough approximation of "quality per pixel" and choose the image with the higher value. It won't be perfect, but it's a lot easier than trying to pick out JPEG artifacts.
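A minimal sketch of that bytes-per-pixel heuristic, in Python. The function names and the example file names are made up for illustration, and the sketch assumes you already know each file's size in bytes and its pixel dimensions.

```python
def bytes_per_pixel(file_size_bytes, width, height):
    """Rough 'quality per pixel': compressed bytes spent on each pixel."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    return file_size_bytes / (width * height)


def pick_larger_per_pixel(candidates):
    """candidates: iterable of (name, file_size_bytes, width, height) tuples.

    Returns the name of the candidate with the most bytes per pixel.
    """
    return max(candidates, key=lambda c: bytes_per_pixel(c[1], c[2], c[3]))[0]


# Hypothetical near-duplicates: same dimensions, different file sizes.
print(pick_larger_per_pixel([
    ("copy_a.jpg", 480000, 800, 600),
    ("copy_b.jpg", 300000, 800, 600),
]))
```

As the comment says, this is only a rough hint: it varies between encoders and says nothing about where in the picture the bytes were spent.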
Also, the number of artifacts doesn't tell the full story. One image may have more artifacts, but those artifacts may all exist in the background parts of the image, while the foreground is less blocky. It's a choice each encoder makes.
Given a set of pictures, it would be really nice to see them grouped by "these are several pictures of the same scene/object/subject". This is a tool I'm not aware of yet, and I'd love to hear what open-source tools people are using.
As a next step, it would be neat to pick out the one that's most in focus...
"You know, Hobbes, some days even my lucky rocketship underpants don't help" -- Calvin
Artifacts are something visible to us - they mean nothing to software. It doesn't know whether the pixels are intentionally colored that way (i.e., detail) or colored that way through some compression process at some point in time (i.e., artifacts) or something else (e.g., dithering, color depth, banding, etc.). If two images are compressed at vastly different ratios, you'll be able to tell easily. Otherwise, they're probably both at a default 90% and if you can't tell the difference, what's the problem?
If you want to know which image has more artefacts, it would still be hard to tell what is an artefact and what's supposed to be part of the image.
If you just want to know which is more compressed... don't JPEG images store the compression ratio used the last time they were saved? It should be in the header somewhere.
I suppose you could recompress both images as JPEG with various quality settings, then do a pixel-by-pixel comparison computing a difference measure between each of the two source images and its recompressed version. Presumably, the one with more JPEG artefacts to start with will be more similar to its compressed version, at a certain key level of compression. This relies on your compression program generating the same kind of artefacts as the one used to make the images, but I suppose that cjpeg with the default settings has a good chance of working.
Failing that, just take the larger (in bytes) of the two JPEG files...
-- Ed Avis ed@membled.com
less compression = bigger file
or else the problem is not truly resolvable. The other way is to assume all the similar images come from the same source; if so, then it's as simple as looking at the compression level in the file format and the various levels of scaling applied to the lossy images.
You save both images at a lower JPG quality and you test which one changed more (across all its pixels; remember to factor in the relative importance of various types of changes) from its original state to its new lower-quality state. The image that changed more was more dependent on the original JPG quality setting, so it had more information, detail, etc.
Take a look at Amazon's Mechanical Turk offering (www.mturk.com). You create "hits" either through their website or API and real humans complete the tasks. This one would be simple: show two (or more) images to the worker and have them select the one with the best quality. You can even pre-screen workers by having them complete a qualification test. You could get all your images sorted out for a penny or two apiece.
The ImageMagick package includes a command called identify, which can read the EXIF data in the JPEG file. You can use it like this:
identify -verbose creek.jpg | grep Quality
In my example, it gave " Quality: 94".
This will not work on very old cameras (from ca. 2002 or earlier?), because they don't have EXIF data. This is different info than you'd get by just comparing file sizes. The JPEG quality setting is not the only factor that can influence file size. File size can depend on resolution, JPEG quality, and other manipulations such as blurring or sharpening, adjusting brightness levels, etc.
all things the same, jpeg quality gives a good index to the quality of the image, but it can be just as true that a lower jpeg quality image might be a better quality image.
for example, two images: the first image might be scanned off a badly faded colour photocopy of a famous painting - it is saved at 300 dpi - approximately 2800 x 1200 pixels, and the jpeg quality set at 12 -- the second image is a well lit photograph of the original painting, scanned on a scitex scanner, and brought in as a tiff original -- all high end, but then they res it down to only 1600 x 900 pixels, say at 200 dpi, and saved at a jpeg quality of 8.
well, in such a case, most software based on the assumption that jpeg quality = image quality would auto-pick the worse image. :-(
handy index variable to have tho - could provide a resolution / jpeg quality metric in the google image searches... :-)
all the best
john penner
An image with more contrast (greater average difference between adjacent pixels) probably has more detail. But compressibility, as has already been noted, is probably just as good a measure.
To make a JPEG, you cut it into blocks, run the DCT on each block and mess with the 4:2:2 color formula and pkzip the pieces... That said, I would think measuring the number of blocks would be related to number of artifacts... In my barbaric approach to engineering, (assuming there is no other suggested way on slashdot), I would get the source code to the JPEG encoder/decoder and print out statistics (number of blocks, block size) of each image...
Run the DCT and check how much it's been quantized. The higher the greatest common factor, the more it has been compressed.
Alternatively, check the raw data file size.
Others have mentioned file size, but another good approach is to look at the quantization tables in the image as an overall quality factor. E.g., JPEG over RTP (RFC 2435) uses a quantization factor to represent the actual tables, and the value of 'Q' generally maps to quality of the image. Wikipedia's doc on JPEG has a less technical discussion of the topic, although the Q it uses is probably different from the example RFC.
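A hedged sketch of reading the quantization tables straight out of a JPEG byte stream, using only the standard library. It walks the marker segments up to the start of scan and collects every DQT (0xFFDB) table. The function names are my own, and reducing the tables to a single mean step size is just one illustrative choice, not a standard quality score. The tables are stored in zigzag order, which does not matter for a simple mean.

```python
import struct


def read_quant_tables(data):
    """Return {table_id: [64 quantizer steps]} from raw JPEG bytes."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG stream (missing SOI marker)")
    tables = {}
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("expected a marker at offset %d" % pos)
        marker = data[pos + 1]
        if marker == 0xFF:            # fill byte before a marker
            pos += 1
            continue
        if marker in (0xDA, 0xD9):    # start of scan / end of image: stop
            break
        if marker == 0x01 or 0xD0 <= marker <= 0xD8:
            pos += 2                  # standalone markers carry no length
            continue
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        segment = data[pos + 4:pos + 2 + length]
        if marker == 0xDB:            # DQT: one or more tables per segment
            i = 0
            while i < len(segment):
                precision, table_id = segment[i] >> 4, segment[i] & 0x0F
                i += 1
                if precision == 0:    # 8-bit entries
                    values = list(segment[i:i + 64])
                    i += 64
                else:                 # 16-bit big-endian entries
                    values = list(struct.unpack(">64H", segment[i:i + 128]))
                    i += 128
                tables[table_id] = values
        pos += 2 + length
    return tables


def mean_quant_step(tables):
    """Average quantizer step over all tables; larger means coarser."""
    values = [v for table in tables.values() for v in table]
    return sum(values) / len(values)
```

Usage would be something like `mean_quant_step(read_quant_tables(open(path, "rb").read()))` for each file, preferring the file with the smaller value. Note that this only reflects the last save: an image recompressed at high quality still looks "good" by this measure.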
When you save a JPEG, you usually choose a quality setting 0-100. I'm not sure if the effect of that is standard, but this should have reasonable results either way: try saving both images at 100, then progressively decrease the quality level until the image changes because additional artifacts are being introduced. This way, you can experimentally determine the quality setting of each image, and just choose the higher of the two. Alternatively, if that quality setting is available in the metadata somewhere, just read that.
Compute the root-mean-square difference between the original image and a gaussian-blurred version?
JPEG tends to soften details and reduce areas of sharp contrast, so the sharper result will probably
be better quality. This is similar to the PSNR metric for image quality.
Bonus: very fast, and can be done by convolution, which optimizes very efficiently.
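A small sketch of that sharpness measure in plain Python, assuming the image has already been decoded to a 2D list of grayscale values. A 3x3 binomial kernel stands in for the Gaussian blur here, and the function names are illustrative only.

```python
import math


def blur_3x3(gray):
    """Blur a 2D grayscale list with a 3x3 binomial kernel (edges clamped)."""
    h, w = len(gray), len(gray[0])
    kernel = ((1, 2, 1), (2, 4, 2), (1, 2, 1))
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    total += kernel[dy + 1][dx + 1] * gray[yy][xx]
            out[y][x] = total / 16
    return out


def sharpness(gray):
    """RMS difference between the image and its blurred copy."""
    blurred = blur_3x3(gray)
    squared = [(a - b) ** 2
               for row_a, row_b in zip(gray, blurred)
               for a, b in zip(row_a, row_b)]
    return math.sqrt(sum(squared) / len(squared))
```

Under this heuristic the copy with the higher `sharpness` value would be preferred, with the caveat that blocking artifacts themselves add sharp edges and can inflate the score.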
Just look at the manner in which JPEGs are encoded for your answer!
Take the DCT (discrete cosine transform) of blocks of pixels throughout the image. Examine the frequency content of each of these blocks and determine the amount of spatial frequency suppression. This will correlate with the quality factor used during compression!
load up both images in adobe after effects or some other image compositing program and apply a "difference matte"
Any differences in pixel values between the two images will show up as black on a white background or vice versa...
adam
BOXXlabs
Assuming you can find similar images programmatically you can probably use size to get a good guess. Alternately I know there are algorithms to find edges. Edges are where most jpeg artifacts show up. If you could then look at the gradient from the edges smooth ones will likely be the better image.
ThumbsPlus is an image management tool. It has a feature called "find similar" that should do what you want as far as identifying two pictures that are the same except for the compression level. Once the similar picture is found you can use ThumbsPlus to look at the file sizes and see which one is bigger.
I mean, you don't want second rate pictures in your pr0n stash?
I had problems building it back then, let alone writing the scripts for it and the hassle of figuring out which images were duplicates, but this utility seems to fit the bill.
Seems that if I really overcompress a JPEG, the main problems are at the edges of the blocks. This is not really unexpected.
So a simple first pass would be to apply a simple edge detector and look for discontinuities at the edges of the 8x8 blocks. For an example, just try an edge detector in any decent image editing app on an overcompressed JPEG.
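One way to turn that first pass into a number, sketched in plain Python: compare the average pixel jump across 8x8 block boundaries with the average jump everywhere else. This assumes a decoded grayscale image as a 2D list and a block grid aligned with the top-left corner (so it will be thrown off by cropped images). The name `blockiness` is made up for this example.

```python
def _jump_lists(lines, block):
    """Split neighbouring-pixel jumps into block-boundary and interior ones."""
    boundary, interior = [], []
    for line in lines:
        for i in range(1, len(line)):
            jump = abs(line[i] - line[i - 1])
            if i % block == 0:
                boundary.append(jump)
            else:
                interior.append(jump)
    return boundary, interior


def blockiness(gray, block=8):
    """Ratio of mean jump across block edges to mean jump inside blocks."""
    rows = [list(r) for r in gray]
    cols = [list(c) for c in zip(*rows)]
    b_rows, i_rows = _jump_lists(rows, block)
    b_cols, i_cols = _jump_lists(cols, block)
    boundary, interior = b_rows + b_cols, i_rows + i_cols
    if not boundary or not interior:
        raise ValueError("image must span more than one block")
    mean_boundary = sum(boundary) / len(boundary)
    mean_interior = sum(interior) / len(interior)
    # Small constant avoids division by zero on perfectly flat interiors.
    return mean_boundary / (mean_interior + 1e-9)
```

A ratio near 1 suggests no visible block grid; values well above 1 suggest stronger blocking, so of two near-duplicates the one with the lower ratio would be the better candidate.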
Compute the number of bits per pixel of the image data.
use an edge-detection filter. since jpeg artifacts usually present themselves as "smeared out" edges, you may be able to figure out some rule based on the edge-detected image.
HOSAKA K., A new picture quality evaluation method.
Proc. International Picture Coding Symposium, Tokyo, Japan, 1986, 17-18.
Compress a bunch of original images with variable quality, noise, etc.
Go through this set of images (where you know which one is "best") and train it to return two booleans, one for match/no-match, another for first better or second better.
Slow to train, but you can use GPGPU for massive speedups.
I wonder if out-of-focus or blur detection methods will give you a metric which varies with the level of JPEG artifacts. After all, the JPEG artifacts should make it more difficult to do things like edge detection, which are the same things that are made more difficult by blurry and out-of-focus images.
A Google search for blur detection should bring up things that you can try. Here is a series of posts that do a good job of explaining some of the work involved
Assuming the only quality loss is due to JPEG compression, I guess a fourier transform should give you a hint: I think the worse quality image should have lower amplitude of high frequencies.
Of course, that criterion may be misleading if the image was otherwise modified. For example, noise filters will typically reduce high frequencies as well, but you'd generally consider the result superior (otherwise you wouldn't have applied the filter).
You could just open the low quality images and save them with a higher quality setting.
I remember a Slashdot article of a guy who used JPEG quantization to detect if images were photoshopped... it had an example of a terrorist adding books. Can't find it via google tho.
First, make a bumpmap of each image. Then, render them onto quads with a light at a 45 degree angle to the surface normal. Run a gaussian blur on each resulting image. Then run a quantize filter, followed by lens flare, solarize, and edge-detect. At this point, the answer will be clear: both images look horrible.
JPEG compression averages groups of pixels with similar color data inside the JPEG image, but does not weigh that average against nearby pixel groups. You can use this fact to identify JPEG artifacts, even if the edges between artifacts are not visible to human eyes.
EG, in a patch of sky, which has a fairly random, but otherwise uniform distribution of shades of blue, there will emerge "squares" where the averaging algorithm has averaged a pixel group, but did not weigh the average of adjacent groups, resulting in a visually identifiable artifact.
You can gauge the quality of a compressed JPEG image by testing for discrete boundaries in areas of similar color values that would nominally contain a random (or smooth gradient with random dither) aggregation of similar color types, and assigning a "Severity" value based on the 'hardness' of the artifact's difference from its neighbors.
In other words, in areas that would have originally had a nice "smooth" blending of similar colors, you will end up with blocks of discrete colors that have discernible edges. The severity of artifacting would be determinable by measuring how far each artifact block differs from its neighbors (with caveats for natural boundaries, such as sky against tree, etc.)
To evaluate if an edge is a JPEG artifact or not, you should gather the JPEG pixel group size from the JPEG header, then see if your edges form a rectangle that is a multiple of that size.
This way you can tell if the hard edge is an artifact, or if it is the edge of Paris Hilton's nipple (or some other natural edge. Natural edges will very rarely have a mathematically perfect rectangular profile.)
A systematic evaluation of an image would be slow and painful, but would produce a scoring benchmark to rate two arbitrary JPEGs against each other. (Better would, of course, be 2 JPEGs and a lossless PNG -- that way you have the un-averaged data to help identify artifact boundaries with, among other things, but that isn't what you asked for.)
"It's known that saving the same source image in JPEG format at different quality levels produces different images"
news to me.
Check out Tineye - http://tineye.com/faq
It does not do exactly what above post suggests, but it partially does what submitter asked (finding similar images on the net).
For what it's worth: I remember using Paint Shop Pro 9 a few years ago. It has a function called "Removal of JPEG artifacts" (or similar). I remember being surprised how well it worked. I also remember that PSP has quite good functionality for batch processing. So what you could do is use the "remove artifact" function and look at the difference before/after this function. The image with the bigger difference has to be the one of lower quality.
I am not sure if there is a tool that automatically calculates the difference between two images, but this is a task simple enough to be coded in a few lines (given the right libraries are at hand). For each color channel (RGB) of each pixel, you basically just calculate the square of the difference between the two images. Then you add all these numbers up (all pixels, all color channels). The bigger this number is, the bigger the difference between the images.
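The "few lines" described above could look roughly like this in Python. The sketch assumes both images are already decoded to equal-length lists of (R, G, B) tuples; decoding is left to whatever image library is at hand, and the function names are illustrative.

```python
def sum_squared_difference(pixels_a, pixels_b):
    """Sum of squared per-channel differences between two pixel lists."""
    if len(pixels_a) != len(pixels_b):
        raise ValueError("images must have the same number of pixels")
    total = 0
    for pixel_a, pixel_b in zip(pixels_a, pixels_b):
        for channel_a, channel_b in zip(pixel_a, pixel_b):
            total += (channel_a - channel_b) ** 2
    return total


def mean_squared_error(pixels_a, pixels_b):
    """Same measure, normalised by the number of channel values."""
    count = sum(len(pixel) for pixel in pixels_a)
    return sum_squared_difference(pixels_a, pixels_b) / count
```

The normalised version is handy when comparing pairs of different sizes, since the raw sum grows with the pixel count.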
Maybe not your push-one-button solution, but should be doable. Just my $0.02.
compare both images against the original, not each other.
count number of pixels different from the original, then calculate max and average difference between either image and the original.
decide which parameter means more to you.
go forward from there.
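If the original really is available, the three numbers mentioned above (count of changed pixels, maximum difference, average difference) can be gathered in one pass. A minimal sketch, assuming flat lists of grayscale values of equal length; the dictionary keys are my own naming.

```python
def difference_stats(original, candidate):
    """Compare a candidate against the original, pixel by pixel."""
    if len(original) != len(candidate) or not original:
        raise ValueError("need two non-empty pixel lists of equal length")
    diffs = [abs(o - c) for o, c in zip(original, candidate)]
    return {
        "changed": sum(1 for d in diffs if d != 0),  # pixels that differ
        "max": max(diffs),                           # worst single pixel
        "mean": sum(diffs) / len(diffs),             # average deviation
    }
```

Running this once per candidate against the same original gives the parameters to choose between.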
I would very much like to do the same with audio. I have so many duplicate tracks in my music collection in different formats and bitrates.
JPEG works by breaking the image into 8x8 blocks and doing a two dimensional discrete cosine transform on each of the color planes for each block. At this point, no information is lost (except possibly by some slight inaccuracies converting from RGB to YUV as is used in JPEG). The step where the artifacts are introduced is in quantizing the coefficients. High frequency coefficients are considered less important and are quantized more than low frequency coefficients. The level of quantization is raised across the board to increase the level of compression.
Now, how is this useful? The reason heavily quantizing results in higher compression is because the coefficients get smaller. In fact, many become zero, which is particularly good for compression - and the high frequency coefficients in particular tend towards zero. So partially decode the images and look at the DCT coefficients. The image with more high frequency coefficients which are zero is likely the lower quality one.
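A rough illustration of counting near-zero high-frequency coefficients, in plain Python. Reading the stored coefficients directly would need a JPEG library, so this sketch instead recomputes an 8x8 DCT on a decoded grayscale block, which is an approximation of what the comment describes. The threshold, the "high frequency" cutoff (u + v >= 8) and the function names are all assumptions chosen for the example.

```python
import math

N = 8  # JPEG block size


def dct_2d(block):
    """Orthonormal 2D DCT-II of an 8x8 block (list of 8 lists of 8 numbers)."""
    def alpha(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = alpha(u) * alpha(v) * total
    return out


def high_freq_zero_fraction(block, threshold=0.5, cutoff=8):
    """Fraction of high-frequency coefficients that are (nearly) zero."""
    coeffs = dct_2d(block)
    high = [coeffs[u][v] for u in range(N) for v in range(N) if u + v >= cutoff]
    return sum(1 for c in high if abs(c) < threshold) / len(high)
```

Averaging this fraction over all blocks of each image gives a single score; by the reasoning above, the image with the higher fraction is likely the more heavily quantized one.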
Something like $\frac{1}{N} \sum_{i=1}^{N}(x_i-y_i)^2$, where $x$ and $y$ are arrays of pixels, and $N$ is the number of pixels in each array?
Does the JPEG header have the compression method listed as well as the compression ratio? If not, is there any way to figure out what kinda compression engine was used based on how an image is constructed?
If so, simply do some testing against some of the most popular compression engines, based on the artifacts, to determine what engine was used, then find out their compression ratio (perhaps a simple file size might work?). Then simply pick the images with the best quality based on engine used and ratio?
Run the free GREYCstoration algorithm on both images, subtract results from original, and pick the one most similar to the original: http://www.greyc.ensicaen.fr/~dtschump/greycstoration/
Cool story, bro.
Replying to your post to create a new sub-thread, hope you don't mind as I think it involves similar research...
Often when I look at digital photos taken at a camera's maximum megapixel range, or even scans of negatives, or random pictures on the interwebs, I find them to be rather blurry; not necessarily out-of-focus, but simply 'soft'.
Essentially.. there's more information being used to store the image 'as is' than there is casually useful* information -in- the image.
Does anybody know of software, or algorithms, to figure out how much casually useful information is in a picture, and at what size (dimensions) that picture would optimally be stored?
* by 'casually useful' I mean this... take today's APOD image: /anon
http://antwrp.gsfc.nasa.gov/apod/ap090716.html ( view full - sparing their bandwidth by not linking to it, though I'm sure they have plenty )
That image to me, the casual user, looks blurry. Every single pixel within it (and beyond from the original) is probably very important to the scientists; being able to run some algorithms on it to get every last bit of information from it. But when I look at it, I see the smallest 'feature' in it as being maybe 3-4 pixels across, let's say 4. So if I downsize it to 25% of the full size image, it looks perfectly sharp to me without any significant (to me, the casual user) loss of information.
NO. Not file size. File size would be a potential test if all images were from the same original source and if they were only ever jpeg compressed once. Unfortunately, quite often one will come across images that have been jpeg compressed and re-compressed, and the final re-compression was done at "high quality", so the file is large for the image, but it still contains all of the jpeg artifacts from the lower quality compression. You can also see extra artifacts when one file has only been compressed once but another file has been compressed repeatedly, even if the second file is the same size as the file that was only compressed once.
There are, of course, other issues that come into question too, such as original color depth and color depth of every intermediate image.
The poster asked a good question, but you did not provide a helpful answer.
Compute the variance of the Fourier coefficients within each block and then calculate the average for each image. The better quality image should have lower variance. If a block has a lot of edges, then the higher frequency coefficients should have much higher values than the lower ones. If a block is uniform, then the lower frequency coefficients should have higher values. So if you have a good image, it will be easy to see the difference between uniform parts and edges. That is, the coefficients of the most "important" frequency within a block will be higher. If you have a poor quality image, then not.
If you've identified two images as the same (this can be done by comparing pics at the same spatial resolution (make sure to low-pass filter before resizing to avoid artifacting!) and looking at the mean sum of squared differences for really small differences... you'll have to play around with the tolerance to find if it's the "same", and I'd always keep a wary eye... it'll just find similar images IMO), then you just have to take a look at their frequency domain counterpart images. The images with the most detail will have more energy in the high frequencies than the other less detailed images.
On the other hand, strictly for seeing who has the most artifacting, if you've identified images as the "same", the completely horizontal and vertical high frequencies should have lots of energy (by comparison wrt the good image and within the bad image itself) to make all those blocks.
Matlab makes it easy to visualize and transform this kind of stuff so take a look at its image processing toolbox or documentation (docs freely available online).
find dupes on the internet http://tineye.com/
find dupes on your HDD http://www.bigbangenterprises.de/en/doublekiller/
This is how I check for how much compression i have in my images.
1. Grab the original and the jpeg into photoshop (or whatever you use)
2. do a difference as your transfer mode. This will show you how different it is.
3. find out the value of all the pixels (I don't know, add them together or something)
Repeat the above steps with the second picture.
whichever is more is the one that is more different (why does that sound like bad English to me?) will be the lower quality image.
Use Python and the PIL (Python Imaging Library) to automate the whole thing and that's it.
JPEG is pretty efficient at compressing images -- the only way they get smaller on average is by increasing the quality loss. Therefore, the larger of the two images in bytes is probably the better looking copy.
- Michael T. Babcock (Yes, I blog)
How to save digital photos is a serious concern. JPEG sucks, it is not even an option. Any 24 bit option is doable. Here's the rub: Adobe needs to get more open source, we can help them and they can help us.
I just thought of a possible way to compare...
Assuming both JPEG aren't at the lowest (or very low) quality:
1. Take image A, create 10 or 20 more copies using different levels of quality (5, 10, 15, and so on).
2. Compare each of them with image A, from lowest to highest quality.
3. Stop where the diff no longer changes from the previous image; then we can assume image A is at the previous image's quality level.
Do the same with image B.
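The search in steps 1-3 can be sketched independently of any particular encoder. In this hedged Python example the re-encoding and diffing are hidden behind a caller-supplied function `diff_at(quality)`, which is an assumption of the sketch: in practice it would wrap something like cjpeg or an image library and return the difference between image A and its copy re-saved at that quality. The tolerance parameter is also made up for illustration.

```python
def estimate_quality(levels, diff_at, tolerance=1e-6):
    """Walk quality levels from low to high and find where the diff plateaus.

    levels:  iterable of quality settings to try (e.g. 5, 10, 15, ...).
    diff_at: callable returning the difference between the source image and
             its copy re-saved at the given quality (supplied by the caller).
    Returns the level just before the diff stopped changing, or None if it
    never settled.
    """
    previous_level = None
    previous_diff = None
    for level in sorted(levels):
        current_diff = diff_at(level)
        if previous_diff is not None and abs(current_diff - previous_diff) <= tolerance:
            return previous_level
        previous_level, previous_diff = level, current_diff
    return None
```

Run it once for image A and once for image B, then keep the image with the higher estimate. With a real encoder the diff rarely becomes perfectly constant, so the tolerance would need tuning.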
Well, your problem is that image quality is subjective. Can computers make good subjective judgements? Not really.
Let's say you count the number of pixels that are different? Well, what if JPEG usually slightly alters the brightness? You could weight the difference, but what if JPEG sometimes moves an edge by a pixel?
I think if you study a bit about how JPEG works, you might find that you can computationally determine how much information is lost; but that does not mean that your computed number is in any way related to what a human will say the image quality is.
It appears that you assume that he wants to compare images for which he himself is the source? What if the images he actually wants to compare are pr0n, of the same hi-res glamour photo sets obtained from different sources? He needs to decide which is the "best" pron to keep, right? (Never mind that he can probably jack off equally well to either/any... he's a COLLECTOR so it matters. :-)
Such images will almost always have the EXIF data scrubbed from them, so your technique wouldn't work at all for racy hi-res stuff. I'm deliberately not naming example sources, because I don't want them to know they're a topic. :-)
Exploit JPEG's weakness.
JPEG encodes pixels by using a cosine transform on 8x8 pixel blocks. The most perceptually visible artifacts (and the artifacts most susceptible to cause trouble to machine vision algorithms) appear on block boundaries.
Short answer:
a. 2D-FFT your image
b. Use the value of the 8-pixel period response in X and Y direction as your quality metric. The higher, the worse the quality.
This is a crude 1st approximation but works.
The 'quality' of a picture, as stated, is still a bit vague. If you have an image of a completely blue wall, I believe the entire picture could be compressed to a single 'artifact', yet retain the same amount of information as a bitmap. Perhaps what you're after is the amount of information given in an image.
Information Theory should help there. http://en.wikipedia.org/wiki/Entropy_(information_theory)
One quick and dirty method might take the histogram of the image, and then find the one with the greatest (or least) standard deviation. You could map light/depth, colors, etc to the histogram and see which one best suits your needs. It's not flawless (if for some reason you wanted a very blue wall instead of picking up little defects of dirt) but it could work.
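For the information-theory angle, a short sketch of the Shannon entropy of an image's value histogram, in bits per pixel. It assumes a flat list of pixel values (one channel) and uses a made-up function name; it swaps the standard deviation mentioned above for entropy, and it ignores spatial structure entirely, so treat it as a crude indicator only.

```python
import math
from collections import Counter


def histogram_entropy(values):
    """Shannon entropy (bits) of the distribution of pixel values."""
    counts = Counter(values)
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())
```

A completely blue wall (one value everywhere) scores 0 bits; an image using many values evenly scores close to 8 bits for 8-bit data.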
Judge. It's not perfect, but it works.
If by "near-duplicate" you mean different files that were actually once the same image, sorting by "date modified" might give you satisfactory results. Of course, I'm making certain assumptions here about how the images were acquired and why there are multiple versions, and only you will know if this applies to your situation, but I would suspect that the older files would be of better quality.
jpgQ - JPEG Quality Estimator
http://www.mediachance.com/digicam/jpgq.htm
Aside from the mathematical tests some have suggested, my gut tells me this is going to be almost impossible. There are tasks that a human can perform that just aren't doable given the present state of our software systems. The gap has as much to do with our understanding about how we perceive through our senses as it does with algorithms and calculation methodologies. We just don't yet know enough about the underlying processes to make a computer do it.
The same goes for other areas where AI is sorely lacking. Things like OCR, language recognition and translation, not to mention a program where you can whistle a tune and have it analyzed to the point where its name can be deduced (if it was written already), or scored as sheet music (if you're creating something new).
You are asking a machine to make a comparison between "good" and "not good" or "OK" and "fantastic" when all of these choices are by their very nature illusory at best.
Consider a photo of a person. I may prefer a softer focus, some may prefer sharper; some want more color saturation in a pastoral scene, others less. Individuals judge an image in many, many different ways.
In my youth I did a lot of photography. I was taking pictures of the Winternationals at Fremont Raceway (when it still existed) and was shooting a funny car as it came off the line. I was shooting Tri-X and pushing it a full stop, which resulted in a grainy negative. I did some darkroom magic and came up with a very eye-catching and award-winning photo. But if you mechanically compared it to the straight shot it would have been inferior.
The point is you can use a computer to compare some things, but you cannot use a computer to judge "better" in an artistic sense or a "pleasing to the eye" sense.
Umm...... I have to ask. If you can't tell just by looking at them, what difference does it make?
Um, Cisco has catastrophic layoffs today and this couldn't have waited until later?
Take advantage of the fact that JPEG quantizes the chrominance information more aggressively at higher compression levels. Quite ridiculously so, in fact. Look at these three images. The first two are the Cb and Cr channels of a highly-compressed JPEG. The third is the luminance channel. Notice that there is WAY more information contained in the luminance channel. This effect gets more and more extreme as JPEG quality goes down.
Histograms
Quantifying this is a different question. Look at the histograms of each of the three channels. The histogram of Cb and Cr is extremely sparse, with a few large peaks, but with no energy in most buckets. The luminance channel, on the other hand, has a much more detailed histogram. I leave it up to the reader to create a formula to boil this all down to a single number.
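One possible way to boil the histogram observation down to a single number, offered only as a sketch: measure what fraction of the 256 histogram buckets are empty for a channel. The function name is made up, and the sketch assumes the channel has already been extracted as a flat list of integers in the range 0 to 255 (for example from a YCbCr conversion in an image library).

```python
def empty_bucket_fraction(channel_values, buckets=256):
    """Fraction of histogram buckets that contain no pixels at all."""
    used = sum(1 for v in set(channel_values) if 0 <= v < buckets)
    return (buckets - used) / buckets
```

Computing this for Cb and Cr of each near-duplicate and preferring the image with the lower (less sparse) value follows the reasoning above, though how well it separates real images is untested here.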
What about just, you know... looking at them?
And if you can't tell the difference, does it matter then? (Just take the smaller one.)
That is my approach.
If you want the best one, even when you can't see the difference, just take the biggest one.
If the codec is the same, the chance that a higher quality image is smaller, is zero.
There, I solved it for you. :D
Or as a funny advertisement for a newspaper said:
[Image of a shiny pen.]
Before the first manned flight to space, NASA developed a pen, that can write in zero gravity, without the ink leaking.
The development costs amounted to $12 million.
[Removes pen, and puts a pencil in its place.]
That's... how the Russians solved the problem.
jpeg is plain evil. OP's problem can imo be solved by reading the JPEG compression level. Sure, it won't help if the image has been recompressed multiple times, but looking at size and compression level from the header together should be enough
The simple fact of the matter is that what you perceive as a "better" image, others won't. You may look at the primary subject matter, others will look at that and the background. You may be concerned about the contrast on the picture while others will look at the colors. While I understand that you're really looking for a good median, there is truth to the axiom that "a picture says a thousand words". Anytime you monkey with it, you're stripping at least a few of those words away. I think a better question is not "how do I compress this picture" but "what pictures should I keep". Just my $.02
There was a story a while back about a programmer who worked with the quantization field or something to tell if a photo had been photoshopped, how many layers, and by what program, EVEN if the file had been re-encoded and compressed. Google "Krawetz's software." He used it to show Al Qaeda's videos were manipulated.
While important, compression isn't the only issue. You'll also have to consider issues such as resolution, cropping, noise, blurriness, color balance, white level... especially if you're dealing with non-digital sources. I went through a phase of collecting scans of HR Giger works and came across all sorts of subjective issues. One scan might be extremely high res but cuts off the edges. Another might be blurry but have more accurate colors (compared to low-res images from the artist's official sites). Many times I ended up keeping multiple images since I couldn't find a single one reproducing everything faithfully.
Just Noticeable Difference. The objective way of measuring the subjective. http://en.wikipedia.org/wiki/Difference_limen
Thanks to the many who took this as a serious question and didn't turn this into a "It's just pr0n so who cares." Some is pr0n, some isn't, the most consistent thing is humor.
Many ideas needed the original image to find the better quality of the copy and some asked where I get these images from. These are linked in that I get the images from the USENET, from forums and from artists' galleries. This means that there's only a small set, from the artists' galleries, that I know are original. Others may be original but it may not be the original that comes to me first. On occasion, an artist may even publish the same image in different forms depending on the limitations of the different forums he frequents.
Some ideas were different enough from the directions I was following that they'll give me more to think about.
I'll also acknowledge those who said that how the image is represented is less important than what the image represents. That's quite true but if I have a machine that can find the best representation of something I enjoy then why not use it.
yeah, that's the problem with learning systems like neural nets - it is hard to be sure which variations they are 'focusing' on - figure or ground.
First I 'jpegtran' all files to even out different compression methods. Then I run 'fdupes' and delete all identical files.
In Windows: for /R .\ %%1 in (*.jpg *.jpeg) do jpegtran -optimize -perfect -copy none -progressive "%%1" "%%1"
Duplicate File Finder (Empty RecycleBin before to avoid confusion)
Then I use GQView (it exists for both Linux and win32). Set Preferences, Advanced, Custom Similarity to 98% to begin with. GQView Menu, New Collection, Load list of files and select Compare.
BONUS DISK-SPACE: jscl.exe -d -j -n -r -s *.jpg
hihi
This is a very interesting question!! (excuse me in advance for my English)
As mentioned in the previous posts, very simple mathematical equations can give you measures of the quality of an image. For instance, the 3 most popular are:
- Root mean square error
- Mean absolute difference
- Peak signal to noise ratio
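For reference, here is a minimal sketch of those three measures. It assumes both images are the same size and have already been flattened into lists of 8-bit sample values; the function names are only illustrative.

```python
import math


def rmse(a, b):
    """Root mean square error between two equal-length sample lists."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def mean_abs_diff(a, b):
    """Mean absolute difference between two equal-length sample lists."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and the same length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def psnr(a, b, peak=255):
    """Peak signal to noise ratio in dB; infinite when the inputs are identical."""
    err = rmse(a, b)
    return math.inf if err == 0 else 20 * math.log10(peak / err)
```

All three need a reference image to compare against, so they only rank two copies if you also have something you trust as the original.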
However, none of these can provide an accurate representation of the artefacts perceived by a human. I'm a student in Image Processing at the University of Sherbrooke. From what I know, there's a lot of research on "quality measurement" (especially one effort with people from Texas University and Université de Lyon) from which we expect promising results.
Until then, you can still use some old tricks. Chop off where it's the least perceivable.
- Translate RGB channels into YUV. Chop on the chrominance and keep the luminance. We tend to be more sensitive to the latter.
- Chop on high frequencies using a logarithmic filter. We're more sensitive to small variations at lower frequencies.
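To make the first trick concrete, here is a small sketch of "chopping" a chrominance plane by averaging 2x2 blocks and then stretching it back. It assumes the plane is a 2-D list with even width and height, and both helper names are invented for this example; real encoders may filter differently.

```python
def subsample_2x2(plane):
    """Average each 2x2 block of a 2-D list (even width and height assumed)."""
    h, w = len(plane), len(plane[0])
    return [
        [
            (plane[y][x] + plane[y][x + 1] + plane[y + 1][x] + plane[y + 1][x + 1]) / 4
            for x in range(0, w, 2)
        ]
        for y in range(0, h, 2)
    ]


def upsample_2x2(plane):
    """Repeat every sample into a 2x2 block (nearest-neighbour)."""
    out = []
    for row in plane:
        wide = [v for v in row for _ in (0, 1)]
        out.append(wide)
        out.append(list(wide))
    return out
```

Applying this to Cb and Cr while leaving Y alone keeps a quarter of the chroma samples, which is the kind of loss we notice least.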
All of the terms and concepts can be found with a quick search on google / wikipedia.
Also, take a look at the JPEG 2000 format. Its use of the wavelet transform leaves far fewer artefacts for a given compression ratio.
PGF (Progressive Graphics File) is similar to JPEG 2000: a bit faster at compression, but it leaves a few more artefacts.
However, some old tips are still
Try VisiPics (freeware.)
In general your best bet would be to use an image quality metric that takes into account how the human visual system works. The 2D frequency response of the human eye looks something like a diamond, which means that we see vertical and horizontal frequencies better than diagonal ones.
In fact, most image compression techniques (including JPEG) take this into account, however, conventional ways of determining the noise in images (minimum mean squared error, peak signal to noise, root mean squares) don't factor in the human visual system.
Your best bet is to use something like the structural similarity method (SSIM) by Prof. Al Bovik of UT Austin and his student Prof. Zhou Wang (now at the University of Waterloo).
You can read all about SSIM and get example code here: http://www.ece.uwaterloo.ca/~z70wang/research/ssim/
Or read more about image quality assessment at Prof. Bovik's website: http://live.ece.utexas.edu/research/Quality/index.htm
If you don't care about how it works and just want to use it, you can get example code for SSIM in MATLAB at that website, and there are C versions floating around the net. The method is easy to use; essentially the SSIM function takes two images and returns a number that describes how similar they are: 1 means identical, and values for real photos normally fall between 0 and 1 (the index can go negative when the structure is inverted). Given two compressed images and the original image, take the SSIM between each and the original. The compressed image with the higher SSIM value is the "best".
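To give a feel for what the number means, here is a heavily simplified sketch. It is NOT the reference implementation: the published method computes the index over small local windows and averages the results, while this version uses one set of statistics for the whole image. The function name is made up, and the inputs are assumed to be equal-length lists of 8-bit luminance samples.

```python
def global_ssim(x, y, peak=255):
    """Single-window SSIM over whole images (simplified; no sliding window)."""
    if len(x) != len(y) or not x:
        raise ValueError("inputs must be non-empty and the same length")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    # Stabilising constants with the commonly used K1 = 0.01 and K2 = 0.03.
    c1, c2 = (0.01 * peak) ** 2, (0.03 * peak) ** 2
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx * mx + my * my + c1) * (vx + vy + c2)
    )
```

With an original and two degraded copies, the copy that scores closer to 1 against the original is the one to keep.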
It sounds like for your problem you might NOT have the original uncompressed image. In that case you might try checking for minimal entropy or maximum contrast in your images.
Essentially entropy would be calculated as:
h = imhist(Image);            % histogram counts of a grayscale image
p = h ./ numel(Image);        % probability of each gray level
p = p(p > 0);                 % drop empty bins so log2(0) never occurs
entropy = -sum(p .* log2(p)); % Shannon entropy in bits per pixel
You will need to make sure you scale the image appropriately and don't take the log of zero! Or better yet, you should be able to find code for image entropy and contrast on the web. Just try searching for entropy.m for a MATLAB version.
Good luck!
Hello, is that you NASA ?
Matlab. Though you need to have the original picture to compare. One thing is though that mathematical difference does not correlate with image quality. By reducing the resolution of the chrominance channels (e.g. half resolution for color, full resolution for luminance), you can get a much smaller image, and you cannot easily see the difference. So image quality is always subjective.
I'm assuming you want to automatically/programmatically discard the one with the least/most artifacts. In this case there are very few programs around, but I'm working on a rules engine for my program that may be able to help you in future. Please evaluate DuMP3 at http://dump3.sourceforge.net/ to see if it may suit your needs.
double penetration;
But you don't know what you're talking about, and you're wrong. I was wondering how long it would be before some idiot thought this article would be a good excuse to reel off some buzz words they read in a book once but didn't really understand, in the hopes of looking intelligent...and here you are!
I take it that you want to extract and compare features of the actual jpeg image, regardless of quality. There are many ways to do that and none of them includes filesize comparisons or the like. You could look in the JPEG Standard and try to filter out compression by just reading the base of every 8x8 block (that's the one that shouldn't be compressed) and compare these values for similarity. However you should aim for more advanced image recognition and comparison algorithms, for example the ones used on TinEye. Most of these algorithms come from the field of AI, but they're quite simple generally.
JPEG images are built of 8x8 blocks. Those are then DCT'd (Discrete Cosine Transform) in order to get the block's frequency spectrum. The element x=1, y=1 is the so-called DC component (as in Direct Current). It is proportional to the average of the whole 8x8 block. The other positions are frequencies increasing with the position (e.g. pos x=2 is half an oscillation across the block, x=3 is one whole oscillation, and so on).
Now to the task. If you look at the DC component, and the other components are relatively small, this means that there is not much information in this block (e.g. if it is just a blue spot in a picture with the sky). However, if you have two similar pictures, you can compare block by block. The picture which has higher components in the higher x and y values will be the one with the better quality, since high frequency means high detail.
Of course, implementing this may be difficult. There is not just the DCT involved, but also a zip-like entropy coder (Huffman coding), and the actual compression is done by dividing the component values by a quantization step and "rounding" them to integers (since the DCT itself doesn't do any compression).
Maybe one could adapt a JPEG library by inserting some code in the decompression algorithm which creates a "fingerprint" of the individual blocks, and then compare it with the other picture's fingerprint. I think the result should really tell the quality difference.
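As a rough sketch of the block comparison, the code below runs a plain 8x8 DCT on already-decoded pixel blocks instead of hooking into a JPEG library, which is a simplification of the idea above. The names dct2_8x8 and high_freq_energy and the cutoff value are assumptions made for the example.

```python
import math

N = 8


def dct2_8x8(block):
    """2-D DCT-II of an 8x8 block (8 rows of 8 numbers), JPEG-style scaling."""

    def c(k):
        return math.sqrt(0.5) if k == 0 else 1.0

    out = [[0.0] * N for _ in range(N)]
    for v in range(N):
        for u in range(N):
            s = 0.0
            for y in range(N):
                for x in range(N):
                    s += (
                        block[y][x]
                        * math.cos((2 * x + 1) * u * math.pi / 16)
                        * math.cos((2 * y + 1) * v * math.pi / 16)
                    )
            out[v][u] = 0.25 * c(u) * c(v) * s
    return out


def high_freq_energy(block, cutoff=4):
    """Sum of squared coefficients whose index sum u + v is at least cutoff."""
    coeffs = dct2_8x8(block)
    return sum(
        coeffs[v][u] ** 2 for v in range(N) for u in range(N) if u + v >= cutoff
    )
```

Summing high_freq_energy over matching blocks of two similar pictures gives one number per picture; by the argument above, the larger total would point to the copy that kept more detail. Noise and sharpening raise it too, so treat it as a hint.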
Cheers
If you know you have the original files, the file with the oldest date has the best quality. Else, go by file size first, and break ties using the oldest date as the winner.
You could write a little app to interpolate across spaces in the JPEG and then compare the resulting differences between interpolated and actual data for each JPEG image. Presumably, the image with more JPEG compression artifacts will have a higher (on average) difference between interpolated values and actual values, because the artifacts will throw off the interpolation.
How finely grained your interpolation needs to be may be something you will have to experiment with... but I think this should work fairly reliably in theory.
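A bare-bones version of that idea, assuming a decoded grayscale image stored as a 2-D list and using the simplest interpolation I can think of (the average of the four direct neighbours). The function name and the choice of neighbourhood are just for illustration.

```python
def interpolation_residual(img):
    """Mean absolute gap between each interior pixel and its 4-neighbour average."""
    h, w = len(img), len(img[0])
    total, count = 0.0, 0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            predicted = (
                img[y - 1][x] + img[y + 1][x] + img[y][x - 1] + img[y][x + 1]
            ) / 4
            total += abs(img[y][x] - predicted)
            count += 1
    return total / count if count else 0.0
```

Keep in mind that genuine fine detail also raises this number, so it only makes sense when comparing two copies of the same picture at the same resolution.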
It seems to me it would work best if you had something to compare it to
since you don't have the original, how about looking at it from this point of view.
take each of the two images and reprocess them with the lowest quality of jpeg (producing the most artifacts) and see which original image is closer to its reprocessed image.
the other one should then be the highest quality.
I heard a similar story about an auto-tracking algorithm used for aiming cameras. It would happily follow the red car, but then it saw the red garbage dumpster. It never moved after seeing the red garbage dumpster.
The truth is that the AI algorithms are absolutely notorious for keying in on unanticipated patterns. For the AI algorithms to work, you need to verify they are doing what you expect. Depending on your choice of algo, this can be really tough.
Just looking at the image works best, especially when you have to judge between, for example, an image that has higher resolution and one that has fewer artifacts. The only way you can really tell which one will look best to you is by looking at it.
You probably don't necessarily want to find the "best quality" image, but rather the image that was closest to the original.
I take it you're either trying to eliminate the low-quality duplicates or thumbnails from a really large collection of pr0n, or trying to write an image search engine that tries to present the "best" rendition of a particular image first.
For the second pass, you'd likely want to scan through the metadata first, especially stuff exposed by EXIF. So you'd want to give higher scores to EXIF data that makes it sound like it came directly off a digital camera or scanner, and bump down the desirability of pictures that appeared to have been edited by any sort of photo editing software.
Then maybe you want to look at something that would rank down watermarks or other modifications.
Another step would be to compare compression quality, but I think that's what most of the other posts are concentrating on. But this is a difficult step because it can be easily fooled, since idiots can re-save a low quality image with the compression quality cranked all the way up so the file size becomes high even though the actual image quality is worse than the original. You probably need to run it through one of those "photoshop detectors" that could tell you whether the image has been through smoothing or other filters in a photo editor. The originals (especially in raw format and maybe high quality JPEG) will have a certain type of CCD noise signature that your software might be able to detect. In the same vein, a poorly-compressed JPEG will have lots of JPEG quantization artifacts that your software might be able to detect as well. Otherwise, you're kinda left with zooming in on pics and eyeballing it.
Finally you might be left with a group of images that are exactly the same but have different file names... you probably want some way to store some of the more useful bits of descriptive text as search/tag metadata, but then choose the most consistent file naming convention or slap on your own based on your own metadata.
Hopefully this gives you a start to important parts of the process that you might have overlooked...
I've used VisiPics (google it or just add .info). It works very well for me. It'll scan the directories you choose, check for duplicate photos, display them (allowing you to compare them), and give you the option to move or delete either or all.
1: Count how many unique values there are for each DCT coefficient. If you only find a small number, then it probably means the image has been through low-quality JPEG compression. This method may be fooled if the image has been cropped in a way that changes the block boundaries, though.
2: Check for excessive high-frequency noise; this may indicate the image has been dithered in the past. OTOH, an excessively low level of high frequencies may indicate heavy JPEG compression.
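Point 1 can be sketched roughly like this, assuming you already have the 8x8 DCT coefficient blocks for an image (from a JPEG library or by re-running a DCT on the decoded pixels). The function names are invented, and quantize is only a stand-in for real JPEG quantisation so the effect can be seen.

```python
def distinct_values_per_coefficient(blocks):
    """blocks: iterable of 8x8 coefficient lists.
    Returns an 8x8 list counting the distinct rounded values at each position."""
    seen = [[set() for _ in range(8)] for _ in range(8)]
    for block in blocks:
        for v in range(8):
            for u in range(8):
                seen[v][u].add(round(block[v][u]))
    return [[len(seen[v][u]) for u in range(8)] for v in range(8)]


def quantize(block, step):
    """Stand-in for JPEG quantisation: snap each coefficient to a multiple of step."""
    return [[step * round(c / step) for c in row] for row in block]
```

A copy whose counts are much lower than those of its near-duplicate has probably been through coarser quantisation at some point, even if it was later re-saved at a high quality setting.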
IMO storage is cheap, so what I would do is make a database which could index the various copies of each image. You could have things arranged so there was one version the software considered "probably best", but if you really needed the best-quality copy you could go back and check manually.
Note: I'm known as plugwash most places, but I screwed up registering that here somehow in the past and now can't register.