Choosing Better-Quality JPEG Images With Software?

AI problem? by darpo · 2009-07-16 10:05 · Score: 1

Unfortunately, I think you may find that it will simply require a human-level brain. I'd be really impressed with software that said, "Yep, this image just *looks* better to me." Unless, of course, JPG artifacts are systematic and consistent across images, which could well be.

Re:AI problem? by Anonymous Coward · 2009-07-16 10:10 · Score: 0

Or you could just check the compression and resolution...
Re:AI problem? by Robotbeat · 2009-07-16 10:10 · Score: 3, Interesting

...it will simply require a human-level brain.
How about Amazon's Mechanical Turk service?
https://www.mturk.com/
Re:AI problem? by Anonymous Coward · 2009-07-16 10:10 · Score: 0

Use a neural network. You can train the network by presenting it with high-quality photos, and their deteriorated versions.
Re:AI problem? by lunchlady55 · 2009-07-16 10:22 · Score: 4, Funny

Oh sure, it starts out innocently enough - pick the better image. Next thing you know Skynet's decided that it's the better LIFE-FORM.
AI - JUST SAY NO!
Brought to you by the Coalition for Human Survival (C) Aug. 29, 1997
Re:AI problem? by Anonymous Coward · 2009-07-16 10:36 · Score: 0

I'd say: do it while it is still legal.
Re:AI problem? by eikonoklastes · 2009-07-16 10:40 · Score: 2, Funny

Well, of course, how else would you do it? It's so simple.
Re:AI problem? by nametaken · 2009-07-16 10:46 · Score: 2, Insightful

You're right, it needs to be done by humans to be sure.
Amazon's Mechanical Turk should do the trick.
https://www.mturk.com/mturk/welcome
Re:AI problem? by CajunArson · 2009-07-16 10:49 · Score: 5, Interesting

I don't know about "quality", but frankly it shouldn't be too hard to compare similar images just by doing simple mathematical analysis on the results. I'm only vaguely familiar with image compression, but if a "worse" JPEG image is more blocky, would it be possible to run edge detection to find the most clearly defined blocks, which would indicate that a particular picture is producing "worse" results? That's just one idea; I'm sure people who know the compression better can name many other properties that could easily be measured automatically.
What a computer can't do is tell you if the image is subjectively worse, unless the same metric that the human uses to subjectively judge a picture happens to match the algorithm the computer is using, and even then it could vary by picture to picture. For example, a highly colorful picture might hide the artifacting much better than a picture that features lots of text. While the "blockiness" would be the same mathematically, the subjective human viewing it will notice the artifacts in the text much more.
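For what it's worth, the blockiness idea can be sketched in a few lines. This is only a rough sketch, not a tested metric: it assumes the image has already been decoded into a list of rows of grayscale values, that the JPEG grid is 8x8 and aligned with the top-left corner (not true after cropping), and the name `blockiness` is made up for illustration.

```python
def blockiness(pixels, block=8):
    """Ratio of the mean jump across block boundaries to the mean jump elsewhere.

    `pixels` is a list of rows of grayscale values. A ratio near 1 means the
    block grid is not visible; a much larger ratio suggests blocking.
    """
    height, width = len(pixels), len(pixels[0])
    boundary_sum = boundary_n = 0
    inner_sum = inner_n = 0

    # Horizontal neighbours: step between column x-1 and column x.
    for y in range(height):
        for x in range(1, width):
            step = abs(pixels[y][x] - pixels[y][x - 1])
            if x % block == 0:
                boundary_sum += step
                boundary_n += 1
            else:
                inner_sum += step
                inner_n += 1

    # Vertical neighbours: step between row y-1 and row y.
    for y in range(1, height):
        for x in range(width):
            step = abs(pixels[y][x] - pixels[y - 1][x])
            if y % block == 0:
                boundary_sum += step
                boundary_n += 1
            else:
                inner_sum += step
                inner_n += 1

    if boundary_n == 0 or inner_n == 0:
        raise ValueError("image must be larger than one block")

    boundary_mean = boundary_sum / boundary_n
    inner_mean = inner_sum / inner_n
    # Small epsilon avoids dividing by zero on perfectly flat blocks.
    return boundary_mean / (inner_mean + 1e-9)
```

As noted above, the same score can look very different to a human depending on the picture, so this would only make sense for ranking copies of the same image against each other.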

--
AntiFA: An abbreviation for Anti First Amendment.
Re:AI problem? by CajunArson · 2009-07-16 10:56 · Score: 2, Insightful

And to reply to myself.. several other posters have noted that taking the DCT of the compression blocks in the image will give information on how highly compressed the image is... there's one example.

--
AntiFA: An abbreviation for Anti First Amendment.
Re:AI problem? by sarbrot · 2009-07-16 11:03 · Score: 1

Yes, that was my first thought as well: compare filesize, dimensions and EXIF (if available). Filesize and dimensions alone should give you a usable factor, although there might be edge cases where files are mislabeled as JPEG when they are in fact BMP or whatever. You should verify them by probing the file header as well.
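Probing the header is cheap, since the common formats start with fixed signature bytes. A minimal sketch, with made-up function names and only a handful of formats covered:

```python
def sniff_image_type(header: bytes) -> str:
    """Guess the real file type from its first bytes, ignoring the extension."""
    if header.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if header.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if header[:6] in (b"GIF87a", b"GIF89a"):
        return "gif"
    if header.startswith(b"BM"):
        return "bmp"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read just enough of the file to check its signature."""
    with open(path, "rb") as handle:
        return sniff_image_type(handle.read(16))
```

Anything that does not come back as "jpeg" could be set aside before any size or quality comparison is attempted.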
Re:AI problem? by Xenographic · 2009-07-16 11:06 · Score: 1

> How about Amazon's Mechanical Turk service?
He might not want everyone looking at his porn collection?
Also, you'd have to scan every pair of images for dupes, which changes the complexity from N to N^2. Moreover, that relies on humans and some people have no idea which image is higher quality. Not everyone even understands what a compression artifact is. Such people won't give you useful answers.
In his situation, I'd probably run the dupe finder program, then examine all the duplicates personally. There can't be *that* many... right?
Re:AI problem? by Anonymous Coward · 2009-07-16 11:47 · Score: 0

What an idiot. Knee to the groin.
Re:AI problem? by kpoole55 · 2009-07-16 11:48 · Score: 2, Informative

I've been lax, in a way, in my pruning of late so the findimagedupes program found about 28000 groups of near duplicate images. Finding that many was a surprise and that's why I started looking to see if a program had been written yet for the next step, finding the better image. I wrote a little script that prunes the identical files but now run into the problem of non-identical files that contain the same or nearly the same image.
Re:AI problem? by moderatorrater · 2009-07-16 12:06 · Score: 2, Insightful

Even simpler mathematical analysis would include such techniques as seeing which one takes up more disk space. Last I checked, that was very highly correlated with compression level.
Re:AI problem? by Spy+der+Mann · 2009-07-16 12:37 · Score: 4, Insightful

Here's a simple but expensive formula:
1. Get the image
2. Compress it severely.
3. Compare the difference between original and the compressed.
The lower the difference, the lower the image quality.
4. Profit!
Or you could just measure the amount of data in the DCT space. Duh.
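The recompress-and-compare steps can be tried out without a JPEG library on a single 8x8 block. The sketch below is a toy model, not real JPEG: it assumes one uniform quantizer step for every DCT coefficient, skips rounding pixels back to 8-bit values, and all function names are invented for the example.

```python
import math

N = 8  # JPEG works on 8x8 blocks


def _scale(k):
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)


# Orthonormal DCT-II basis: BASIS[k][n]
BASIS = [[_scale(k) * math.cos((2 * n + 1) * k * math.pi / (2 * N))
          for n in range(N)] for k in range(N)]


def dct2(block):
    """2D DCT of an 8x8 block (rows first, then columns)."""
    tmp = [[sum(BASIS[v][y] * block[x][y] for y in range(N))
            for v in range(N)] for x in range(N)]
    return [[sum(BASIS[u][x] * tmp[x][v] for x in range(N))
             for v in range(N)] for u in range(N)]


def idct2(coeff):
    """Inverse of dct2."""
    tmp = [[sum(BASIS[u][x] * coeff[u][v] for u in range(N))
            for v in range(N)] for x in range(N)]
    return [[sum(BASIS[v][y] * tmp[x][v] for v in range(N))
             for y in range(N)] for x in range(N)]


def compress(block, step):
    """Toy lossy round trip: DCT, uniform quantization, inverse DCT."""
    coeff = dct2(block)
    quantized = [[step * round(c / step) for c in row] for row in coeff]
    return idct2(quantized)


def recompress_error(block, step=40):
    """Mean absolute change when the block is compressed severely (again)."""
    again = compress(block, step)
    return sum(abs(a - b) for row_a, row_b in zip(block, again)
               for a, b in zip(row_a, row_b)) / (N * N)
```

A block that has already been through `compress` barely changes when it goes through again, while a fresh block changes a lot, which is the "lower difference means lower quality" rule from the list. With real JPEG files the second-pass difference would be small rather than zero, because of pixel rounding and differing quantization tables.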
Re:AI problem? by arose · 2009-07-16 12:38 · Score: 5, Informative

AI or small utility... You never know with computers ;)

--
Analogies don't equal equalities, they are merely somewhat analogous.
Re:AI problem? by fractoid · 2009-07-16 13:20 · Score: 1

Or you could just measure the amount of data in the DCT space. Duh.
This is the direction I was going in, partly because then you don't have to render the image out in order to test it. As for detecting similar images, perform some standardized munching on all images to turn them into 32x32 4-grey-level icons and then sum the differences in pixels. If there's only a few pixels different, and they're only out a little on brightness, then you have a match.
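Roughly, the icon idea might look like the sketch below. It assumes the images are already decoded to lists of rows of 0-255 grayscale values; the box averaging, the function names and the `max_distance` threshold of 20 are all illustrative guesses, not tuned values.

```python
def icon(pixels, size=32, levels=4):
    """Shrink a grayscale image to size x size cells with a few grey levels."""
    height, width = len(pixels), len(pixels[0])
    out = []
    for iy in range(size):
        y0 = iy * height // size
        y1 = max(y0 + 1, (iy + 1) * height // size)
        row = []
        for ix in range(size):
            x0 = ix * width // size
            x1 = max(x0 + 1, (ix + 1) * width // size)
            # Average all source pixels that fall into this cell.
            total = sum(pixels[y][x] for y in range(y0, y1) for x in range(x0, x1))
            mean = total / ((y1 - y0) * (x1 - x0))
            # Map 0..255 onto 0..levels-1.
            row.append(min(levels - 1, int(mean * levels / 256)))
        out.append(row)
    return out


def icon_distance(a, b):
    """Sum of per-cell grey level differences between two icons."""
    return sum(abs(p - q) for row_a, row_b in zip(a, b) for p, q in zip(row_a, row_b))


def looks_like_same_image(pixels_a, pixels_b, max_distance=20):
    """True if the two images reduce to nearly identical icons."""
    return icon_distance(icon(pixels_a), icon(pixels_b)) <= max_distance
```

Cropped, rotated or captioned copies would not match with this approach; it only covers recompressed or rescaled versions of the same frame.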

--
Rampant carbon sequestration destroyed the Dinosaurs' tropical paradise. I'm here to help repair the damage.
Re:AI problem? by fractoid · 2009-07-16 13:26 · Score: 2, Informative

Thou shalt not make a machine in the likeness of a human mind.

--
Rampant carbon sequestration destroyed the Dinosaurs' tropical paradise. I'm here to help repair the damage.
Re:AI problem? by fractoid · 2009-07-16 13:30 · Score: 1

Well, given that a JPEG encoded image is stored as a compressed DCT in the first place, that makes things pretty easy. :)

--
Rampant carbon sequestration destroyed the Dinosaurs' tropical paradise. I'm here to help repair the damage.
Re:AI problem? by Anonymous Coward · 2009-07-16 13:30 · Score: 2, Informative

Or you could just measure the amount of data in the DCT space. Duh.
That'd be a Discrete Cosine Transform
(for the confused like me. Crazy what they can do with math these days)
Re:AI problem? by Anonymous Coward · 2009-07-16 13:38 · Score: 0

what if you have more than 1,073,741,824 images? better use 5 gray levels
Re:AI problem? by bendodge · 2009-07-16 13:41 · Score: 5, Informative

Since the mods haven't noticed, and I don't have mod points, let me point out that THIS POST HAS THE ANSWER. A real program that will do what the asker wants. The source is available, but I can't seem to find its license (it includes some of the Independent JPEG Group's code). Also, doesn't a jpeg's EXIF data or some other tag in the file tell you what quality it was saved at?

--
The government can't save you.
Re:AI problem? by Anonymous Coward · 2009-07-16 14:12 · Score: 0

Giggitygiggitygiggity.
-- Tucker
Re:AI problem? by VanessaE · 2009-07-16 14:44 · Score: 2, Insightful

Just checking the size of the file (or, I suspect, just the size of the DCT data) won't always work. Sometimes an image can end up growing in size slightly while losing quality, depending on the nature of the image and the settings of the imaging program.
Things such as thin wires, multi-colored ribbon cable, close-ups of a circuit board, and other images with lots of similar details seem to benefit most from this kind of tweaking, mainly thanks to the placement and qualities of the artifacts, rather than their mere existence or apparent severity.
I've had this happen many times - set an icon for, say, 35% quality and it will probably look kinda grungy, but step it down by just one or two percent and suddenly the artifacts shift around or change their appearance, sometimes in a manner that better suits the image - almost like constructive interference.
Re:AI problem? by iluvcapra · 2009-07-16 15:18 · Score: 1

I'm pretty sure it's impossible, information-theoretically, to examine the bitmap of several images and decide which among them is of the "highest quality," because in order to decide the fidelity of an image you need the original un-lossy image to compare with the others and make an objective determination of the total signal noise and distortion. Either that, or somehow have metadata in the file that captures knowledge of how much data was lost in the transform.
You could use mturk or some sort of program that detects the signature of particular JPEG artifacts, but in the end this will just be heuristic, and won't give you a positive answer. For all of these heuristics, I'd bet simply nominating the largest of all files found to be of the same image will pick the best image as often as any more sophisticated method.

--
Don't blame me, I voted for Baltar.
Re:AI problem? by adolf · 2009-07-16 16:51 · Score: 2, Interesting

It almost does what he wants. He doesn't spell it out, but it seems strongly implied that he also wants a system capable of automatically finding these duplicates by itself, and then automatically determining which image is "best."
Which seems obvious, to me: If he's got enough photos of sufficient disorganization that he can't tell automatically which duplicate is best, then there probably isn't any straight-forward way (with filenames or directory trees or whatever) to find out which ones are dupes to begin with.
Judge, the afore-linked program, only does the job of finding the best image out of a set of duplicates.
What tool can be used to find the (near) duplicates to begin with?

--
Kid-proof tablet..
Re:AI problem? by XDirtypunkX · 2009-07-16 17:02 · Score: 1

Different images show different amounts of artifacts at the same amount of compression.
Re:AI problem? by XDirtypunkX · 2009-07-16 17:07 · Score: 1

Yeah, but information theory wise you could have a very high quality JPEG copy of a previously compressed JPEG image, which doesn't help anyone.
In practice though, JPEG artifacts occur on block boundaries and it's pretty easy to create a good heuristic for the kind of images you care about.
Re:AI problem? by TheSpoom · 2009-07-16 17:26 · Score: 1

--
It's better to vote for what you want and not get it than to vote for what you don't want and get it.
- E. Debs
Re:AI problem? by bh_doc · 2009-07-16 18:29 · Score: 3, Informative

http://www.jhnc.org/findimagedupes/
There's a bunch, but I know you can construct command line operations with this one. I imagine you could construct a system from this and the parent program that will find dupes, then nuke the poorer quality of each, or whatever.
Re:AI problem? by scdeimos · 2009-07-16 18:39 · Score: 2, Insightful

That's only a reasonable indicator if the two copies of the same image you are comparing are also the same resolution. It's not hard to have a higher resolution image consume less disk space if the compression level has been bumped up. Also, different programs usually produce different JFIF streams even when set to the same compression level and using the same *uncompressed* source image, making the DCT size approach even less reliable.
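One way to take resolution into account is to compare bytes per pixel instead of raw file size. The sketch below reads the dimensions from the JPEG start-of-frame segment with nothing but the standard library. It is a simplified reader that assumes a well-formed header (every segment before the frame header carries a length field), and the function names are invented for the example.

```python
import struct

# Start-of-frame markers carry the dimensions. 0xC4, 0xC8 and 0xCC are other
# segment types, so they are left out.
SOF_MARKERS = {0xC0, 0xC1, 0xC2, 0xC3, 0xC5, 0xC6, 0xC7,
               0xC9, 0xCA, 0xCB, 0xCD, 0xCE, 0xCF}


def jpeg_dimensions(data: bytes):
    """Return (width, height) from the first SOF segment, or None."""
    if data[:2] != b"\xff\xd8":          # missing start-of-image marker
        return None
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:            # lost sync with the segment structure
            return None
        marker = data[pos + 1]
        if marker == 0xFF:               # fill byte before a marker
            pos += 1
            continue
        if marker in (0xD9, 0xDA):       # end of image / start of scan
            return None
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if marker in SOF_MARKERS:
            if pos + 9 > len(data):
                return None
            # Layout after the length: precision (1), height (2), width (2).
            height, width = struct.unpack(">HH", data[pos + 5:pos + 9])
            return width, height
        pos += 2 + length                # the length field counts itself
    return None


def bytes_per_pixel(data: bytes):
    """File size divided by pixel count, or None if dimensions are unknown."""
    dims = jpeg_dimensions(data)
    if dims is None or dims[0] == 0 or dims[1] == 0:
        return None
    return len(data) / (dims[0] * dims[1])
```

This still says nothing about encoder differences, so it is at best a tie-breaker between copies of the same picture.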
Re:AI problem? by Lord+Crc · 2009-07-16 19:23 · Score: 1

Even simpler mathematical analysis would include such techniques as seeing which one takes up more disk space. Last I checked, that was very highly correlated with compression level.
The problem is that there are many choices left to the compression program which affect the quality/size trade-off. A high quality compression program might generate optimized quantization tables for that specific image, resulting in a superior image at lower bitrate compared to say the standard libjpeg implementation.
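The quantization tables themselves are stored in the file (DQT segments, marker 0xFFDB), so they can at least be read back and compared. A hedged sketch using only the standard library follows; the helper names are made up, and the "mean quantizer" is just one crude way to summarize a table, not an established quality measure.

```python
import struct


def quantization_tables(data: bytes):
    """Return {table_id: [64 values in file order]} from the DQT segments."""
    tables = {}
    if data[:2] != b"\xff\xd8":
        return tables
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            break
        marker = data[pos + 1]
        if marker == 0xFF:               # fill byte
            pos += 1
            continue
        if marker in (0xD9, 0xDA):       # end of image / start of scan
            break
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if marker == 0xDB:
            segment = data[pos + 4:pos + 2 + length]
            i = 0
            while i < len(segment):
                # High nibble: 0 = 8-bit entries, 1 = 16-bit. Low nibble: id.
                precision, table_id = segment[i] >> 4, segment[i] & 0x0F
                size = 128 if precision else 64
                raw = segment[i + 1:i + 1 + size]
                if len(raw) < size:
                    break
                if precision:
                    tables[table_id] = list(struct.unpack(">64H", raw))
                else:
                    tables[table_id] = list(raw)
                i += 1 + size
        pos += 2 + length
    return tables


def mean_quantizer(data: bytes):
    """Average quantizer step over all tables; smaller means finer steps."""
    tables = quantization_tables(data)
    if not tables:
        return None
    values = [v for table in tables.values() for v in table]
    return sum(values) / len(values)
```

The tables only describe the last save. A poor image re-saved with fine tables would still score well here, so this cannot replace looking at the actual pixel data.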
Re:AI problem? by Anonymous Coward · 2009-07-16 19:31 · Score: 0

Turks shit in the street, so you're the imbecile.
Re:AI problem? by CarpetShark · 2009-07-16 20:08 · Score: 1

Unfortunately, I think you may find that it will simply require a human-level brain.
OK, great. Now where can I find a donkey?
Re:AI problem? by SlashWombat · 2009-07-16 21:31 · Score: 2, Insightful

Unfortunately, it's not all that easy to compare. In general, the file with the higher byte count will be the better image, BUT ... The problem is there are different ways to compress the same picture. There are several "controls", even in baseline JPEG: where the "quantisation" steps occur, and where the high frequency cutoff for each macroblock occurs. Then there are different ways for the JPEG engine to entropy encode the bitstream. (For example, arithmetic coding is allowed by the JPEG standard; however, due to patent issues, most implementations use Huffman coding, which is slightly less efficient.) It should be remembered that the JPEG standard is just a baseline. Any implementer is free to improve upon the baseline coding, as long as it still decodes correctly. There used to be JPEG viewing software that decompressed and cleaned up images that looked terrible using "standard" JPEG decoding software. (I am not sure, but I suspect the blockiness and quantisation errors were smoothed out, improving the displayed image immensely.)

Of course, what you really need is the NCIS image enhancement package.
Re:AI problem? by gnasher719 · 2009-07-16 23:37 · Score: 1

Even simpler mathematical analysis would include such techniques as seeing which one takes up more disk space. Last I checked, that was very highly correlated with compression level.
And it would often be completely wrong because it doesn't take into account that some people re-encode images again. Like an image could be compressed to 100 KB in JPEG, then become a 4 MB BMP image, then compressed to 500 KB JPEG. I doubt it will look better than the same image, compressed directly to 200 KB.
Re:AI problem? by nahdude812 · 2009-07-16 23:42 · Score: 2, Insightful

This just about gets to the heart of it. "Better" is a subjective term, so choosing better quality images is not going to be something everyone can agree on. Your example nails it. If you have two copies of the same image, one is higher resolution than the other, but saved with a higher compression rate, which is better? The answer is going to be "it depends on if the noise introduced by the higher compression annoys me more than the reduced information in the lower resolution image."
If the compression on the high resolution image is high enough, you might still have better detail in the lower resolution image. If the higher resolution image isn't actually higher resolution, just higher dimensions (it's the smaller image scaled up), this is automatically a lower quality image (you can always recreate the higher resolution image from the lower resolution image, but not vice versa as rounding errors cause information loss whenever you scale an image).
There may also be subjective differences like brightness/contrast/tone mapping differences.
Given that the question being asked is a subjective one, the correlation of file size to subjective image quality should be so high that you may gain only a few percent better predictability with an extremely complex algorithm.

--
Slay a dragon... over lunch!
Re:AI problem? by buchner.johannes · 2009-07-16 23:58 · Score: 1

You're right, it needs to be done by humans to be sure.
I bet this is how "Hot or not" et al. came to life

--
NB: The message above might reflect my opinion right now, but not necessarily tomorrow or next year.
Re:AI problem? by holmstar · 2009-07-17 01:00 · Score: 1

The author already said that finding which images are similar is not the problem. The problem is finding the highest quality image from those that are similar.
Re:AI problem? by Liquidretro · 2009-07-17 02:06 · Score: 1

I agree, sounds like an algorithm for Mathematica to me. If you are serious about this, there are people on Flickr who run detailed mathematical image analysis that compares camera to camera, sensor to sensor, etc. for things like noise and other properties. I would think one of them might be able to help you figure out how to do this best. You do not want to use people for this process. Most people, unless they are specially trained and actually care, are bad at spotting key differences in photos.
Re:AI problem? by Anonymous Coward · 2009-07-17 02:37 · Score: 0

Why is parent not funny? Illiterate philistines.
Re:AI problem? by sexconker · 2009-07-17 03:41 · Score: 1

How can you account for cropping, rotation, maybe added text, etc.?
Re:AI problem? by sexconker · 2009-07-17 03:45 · Score: 1

Aug 29th is Michael Jackson's birthday (and my own).
Re:AI problem? by fractoid · 2009-07-17 04:50 · Score: 1

I have an elegant solution but this margin is too small to contain it.

It's easy to make things harder by adding "but what if"s. The question is whether those what-ifs apply. For instance, if the question is "how can I find duplicate images that were independently compressed from the same (or almost identical) source image" as I guessed, then my solution works. On the far other end of the scale, you could ask for an algorithm that will scan your picture collection of lolcats, cute puppies, pron, and your holiday snaps, and return one each from those categories.

--
Rampant carbon sequestration destroyed the Dinosaurs' tropical paradise. I'm here to help repair the damage.
Re:AI problem? by rilian4 · 2009-07-17 07:07 · Score: 1

Of course, what you really need is the NCIS image enhancement package.
YES!! I wish I had mod-points today. You'd get a +1 for the NCIS reference. Let McGee get to work on that and you'll have an answer before the next commercial break.

--

...quicker, easier, more seductive the darkside is...but more powerful, it is not.
Re:AI problem? by treeves · 2009-07-17 08:15 · Score: 1

Maybe make a RECAPTCHA problem out of it. Get more people looking at each image. Works well for digitizing old books.

--
...the future crusty old bastards are already drinking the Kool-Aid.
Re:AI problem? by TheVedge · 2009-07-17 10:04 · Score: 1

If that is the case, then it is clearly a proof that human brains are not apt at the task.
Re:AI problem? by sexconker · 2009-07-17 10:32 · Score: 1

On the one hand, I asked a question.
On the other hand, you give a long winded version of "it doesn't, that's too hard.".
Re:AI problem? by adolf · 2009-07-17 17:48 · Score: 1

He has? Where?

--
Kid-proof tablet..
Re:AI problem? by kpoole55 · 2009-07-18 15:05 · Score: 1

Here's the core of what I feel the problem is ...
"The problem is image collections, and finding the better of near-duplicate images."
Perhaps this was said in too few words but what was meant was that in collections of images there are often sets of near duplicates and what is wanted is a way to find the best quality image of the set of nearly duplicate images.
I got judge to compile on my Ubuntu 8.04 system and tested it with a few known examples and it seems to do what I want. It certainly wasn't fooled by a low quality image resaved at a high quality setting. Now to call it from my little Python script that lets me oversee the process and see how it does on a wider set of examples.
Re:AI problem? by Nicolay77 · 2009-07-19 02:53 · Score: 1

A perfectly cromulent answer for a cromulent question.

--
We are Turing O-Machines. The Oracle is out there.
Re:AI problem? by adolf · 2009-07-19 12:43 · Score: 1

Funny. I read the exact same thing, and interpreted it totally differently.
In fact, I still do.
Glad that judge at least does what it's supposed to. I'll have to try to remember it for when I have a similar problem in the future (and I certainly have in the past).

--
Kid-proof tablet..
Re:AI problem? by kpoole55 · 2009-07-19 13:52 · Score: 1

I was trying not to use my usual form of run on sentences that I tend to favor. Sorry for the confusion.
But now I have two programs, findimagedupes, which finds sets of visually similar images in a larger set of images, and judge, which rates the JPEG quality of a JPEG file. Given these two programs and a few rules of thumb I should be able to create a Python script that will do most of the work and just present me with a few things that really do need an eyeball inspection.
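The glue script could be shaped something like the sketch below. Everything in it is an assumption: `groups` stands for whatever the duplicate finder's output gets parsed into, `score` is a hypothetical wrapper that returns a number where higher means better (how a real quality tool is invoked and which direction its scale runs would need checking), and the 5% `margin` is an arbitrary rule of thumb.

```python
def triage(groups, score, margin=0.05):
    """Split duplicate groups into automatic picks and ones needing a human.

    groups: iterable of lists of file paths, one list per near-duplicate set.
    score:  callable mapping a path to a number, higher meaning better.
    margin: relative gap below which the top two are too close to call.

    Returns (decided, needs_review) where decided is a list of
    (keeper, losers) tuples and needs_review is a list of ranked groups.
    """
    decided, needs_review = [], []
    for group in groups:
        if len(group) < 2:
            continue                      # nothing to choose between
        ranked = sorted(group, key=score, reverse=True)
        best, runner_up = score(ranked[0]), score(ranked[1])
        if best - runner_up <= margin * abs(best):
            needs_review.append(ranked)   # eyeball inspection required
        else:
            decided.append((ranked[0], ranked[1:]))
    return decided, needs_review
```

Keeping the decisions as data, rather than deleting files inside the loop, leaves room to review the list before anything is removed.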

Easy by Anonymous Coward · 2009-07-16 10:05 · Score: 3, Interesting

Paste both images in your image editor of choice, one layer on top of each other, apply a difference/subtraction filter.

Re:Easy by Random+Destruction · 2009-07-16 11:06 · Score: 3, Insightful

Ok, so you know how two images differ. Which one is closer to the original? You don't know, because you don't have the original to compare.

--
:x
Re:Easy by dainichi · 2009-07-16 12:28 · Score: 1

To 28000 images??? even a group of trained monkeys would start revolting.

--
"Oooh. I hate it when a paradigm shifts without a clutch"
Re:Easy by kpoole55 · 2009-07-16 12:59 · Score: 1

No. 28000 groups of similar images. Some of those groups, not many, may have upwards of ten members in the group.
Re:Easy by Hurricane78 · 2009-07-16 13:39 · Score: 1

This can be scripted too. With imagemagick.

--
Any sufficiently advanced intelligence is indistinguishable from stupidity.
Re:Easy by Anonymous Coward · 2009-07-16 16:43 · Score: 0

How about looking for the most high-frequency response, basically meaning that the images are sharper (although this will more likely tell you that it's closer to the original copy, rather than better quality, as someone could later apply noise reduction to fix it up.)
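A crude stand-in for "high-frequency response" is the energy of a Laplacian filter over the image. The sketch below assumes decoded grayscale rows and uses a made-up function name; it is a simple sharpness proxy, not a full frequency analysis.

```python
def high_frequency_energy(pixels):
    """Mean squared response of a 4-neighbour Laplacian over interior pixels."""
    height, width = len(pixels), len(pixels[0])
    total = 0
    count = 0
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            laplacian = (pixels[y - 1][x] + pixels[y + 1][x]
                         + pixels[y][x - 1] + pixels[y][x + 1]
                         - 4 * pixels[y][x])
            total += laplacian * laplacian
            count += 1
    return total / count if count else 0.0
```

Blocking artifacts and noise also raise this number, so a heavily compressed copy can score as "sharper" than a clean one. It probably only helps alongside some other check.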

File size by Tanman · 2009-07-16 10:06 · Score: 2, Insightful

it is lossy compression, after all . . .

Re:File size by Anonymous Coward · 2009-07-16 10:10 · Score: 0

Assuming, of course, that the resolution and aspect ratio are the same.
Re:File size by Robotbeat · 2009-07-16 10:14 · Score: 4, Informative

File size doesn't tell you everything about quality.
For instance, if you save an image as a JPEG vs. first saving as a dithered GIF and _then_ saving as JPEG, then the second one will have much worse actual quality, even if it has the same filesize (it may well have worse quality AND have a larger file size).
Re:File size by teko_teko · 2009-07-16 10:16 · Score: 3, Insightful

File size may not be accurate if it has been converted multiple times at different quality, or if the source is actually lower quality.
The only way to properly compare is if you have the original as the control.
If you compare between 2 different JPEG quality images, the program won't know which parts are the artifacts. You still have to decide yourself...
Re:File size by Anonymous Coward · 2009-07-16 10:19 · Score: 3, Insightful

File size doesn't tell you anything. If I take a picture with a bunch of noise (eg. poor lighting) in it then it will not compress as well. If I take the same picture with perfect lighting it might be higher quality but smaller file size.
Why this is modded up, I don't know. Too many morons out there.
Re:File size by Anonymous Coward · 2009-07-16 10:26 · Score: 0

That's a good first order approximation, but if you're collecting your images from the internet you'll find that sometimes someone will save a low-quality jpeg image at higher quality. Some ancient browsers used to save all images as bmps, then that image might get converted to a jpeg later, using a quality setting that doesn't match the original. The artifacts will still be there but the file size will not reflect that.
Re:File size by Shikaku · 2009-07-16 10:29 · Score: 4, Informative

http://linux.maruhn.com/sec/jpegoptim.html
No. You can compress JPEG lossless.
Re:File size by Anonymous Coward · 2009-07-16 10:37 · Score: 0

Informative? Slashdot's standards really have been dropping lately. This utility you linked to appears to perform some sort of additional lossless optimization of a JPEG file. However, THE ORIGINAL JPEG FILE WAS ALREADY COMPRESSED IN A LOSSLESS WAY, and furthermore, THERE IS NO LOSSLESS JPEG. PERIOD.
Re:File size by lymond01 · 2009-07-16 10:42 · Score: 1

I sort of had the impression the person was talking about the exact same picture, saved from the original to two different qualities of JPEG. If he were trying to tell the difference between the amount of JPEG artifacts in two different pictures, I imagine he would get inconsistent results, given many trials, for the reasons you say.
I suppose he could have meant something different than what he said, but there aren't too many politicians trolling slashdot, I'd guess.
Re:File size by Vectronic · 2009-07-16 10:42 · Score: 2, Interesting

Also, stuff like Photoshop, will insert a bunch of meta/exif-bullshit but something like Paint, doesn't... it's usually only about 2 to 3kb, but it's still tainting your results if you are going by size alone.
Re:File size by richardkelleher · 2009-07-16 10:46 · Score: 1

Suppose you had a database with, say, 100k original images, plus each of those images saved with various levels of compression. You would then scan through the files looking for differences between the known higher and lower quality images, building a database of what "artifacts" look like. You could then use the database of "artifacts" that occur most frequently to scan unknown images for matches. After visually reviewing the "artifacts" found in a thousand or so unknown images for validity (is it really something that lowers the quality of the image), you can determine if this method is valid. If determined to be valid, just bounce images off the database of known "artifacts" and rate them for image quality on some scale. The one with the highest (or lowest, depending on the design of the scale) rating is the better image.
Now, go write that grant (I'm guessing DOD would be interested in funding something like this), put together a staff of 10 or so (20 if you use undergrads) to build and visually analyze the database. Write some analysis software and have a field day.
Oh, by the way, I get 5% of the grant amount as consultant on the initial project design... :-)
Re:File size by Anonymous Coward · 2009-07-16 10:46 · Score: 0

Except the guy didn't ask about any of that--all he asked about was jpeg artifacts.
Re:File size by Anonymous Coward · 2009-07-16 10:52 · Score: 1, Informative

actually one of the meta values that is stored is a quality indicator.
Re:File size by Score+Whore · 2009-07-16 10:52 · Score: 5, Informative

...THERE IS NO LOSSLESS JPEG. PERIOD.
Except for Lossless JPEG standardized in 1993. But other than that, no there is no lossless jpeg.
Re:File size by PitaBred · 2009-07-16 10:53 · Score: 1

But if they're duplicate pictures (some kind of matching heuristic), then file size most certainly IS appropriate. You're starting from the same point, choosing the result with less lost during compression, and therefore larger, would be quite logical.

--
My blog. Good stuff (when I remember to update it). Read it.
Re:File size by kernelphr34k · 2009-07-16 10:54 · Score: 1

JPEG is a lossy format anyway... JPEG sucks if you want to keep quality. Every time you open and save the image it loses quality. Use a PNG or TIF for a better quality image that you can open and save many times without losing quality.
Re:File size by Anonymous Coward · 2009-07-16 11:07 · Score: 0

Err, that should read "the original jpeg file was already compressed in a lossy way."
Re:File size by izomiac · 2009-07-16 11:22 · Score: 1

File size doesn't tell you anything. If I take a picture with a bunch of noise (eg. poor lighting) in it then it will not compress as well. If I take the same picture with perfect lighting it might be higher quality but smaller file size.
That sounds like you took two different pictures and have two different files. Comparing file size obviously wouldn't work for different pictures, nor could I see why anyone would want to automatically delete one of them. But if it's the same picture, just more highly compressed, then the file size would almost certainly be greater for the less compressed image. Essentially by definition, since that's the whole point of compression.
Re:File size by Qzukk · 2009-07-16 11:27 · Score: 2, Insightful

actually one of the meta values that is stored is a quality indicator.
And when you save a max quality copy of a min quality jpeg, the picture still looks like crap.

--
If I have been able to see further than others, it is because I bought a pair of binoculars.
Re:File size by Chyeld · 2009-07-16 11:30 · Score: 1

Unfortunately, that's a subjective term based on the 'codec' used to make the jpg. Not everyone's 100 is the same nor is everyone working off the same scale (i.e. 1-10 vs 1-100). It helps if all the images were made by the same program using the same parameters, but breaks down quickly as a valid comparator after that.
Re:File size by Chyeld · 2009-07-16 11:43 · Score: 5, Interesting

There was an old story my AI teacher used to share back in college about a military contractor that was developing an AI based IFF (identification, friend or foe) system for aircraft.
They trained it using what was, at the time, a vast picture database of every aircraft known. In the lab, they were able to get it down to 99% accurate, with the error favoring 'unknown' as the third option.
So they took it out for a test run. The first night out the system tried firing on anything and everything it could lock on, including ground targets.
This was bad. Horribly bad. But they were certain that there was some sort of equipment failure going on. After all their AI was damn near perfect at ID'ing the targets in the lab, the issues must be up the line somewhere.
So they did a once over of the equipment and couldn't find a problem. Not sure what to do next the team took the system out for another dry run the next day. This time, the system refused to see any ground targets and anything it saw in the air was friendly.
Now this was getting ridiculous, the team was extremely confused. So they did what they should have done the first time around, they did a third test run looking at what the AI was actually 'thinking'.
And promptly discovered the problem. While they had a huge database of images to use, they realized that all their 'friendly' craft had pictures taken during the day, while in flight. All their 'hostile' craft however were pictures that had been taken at night during spy runs or from overhead satellite shots.
The AI wasn't keying off the planes, it was keying off whether it was daytime or night time.
I don't know if the above actually ever happened, but my point is, it doesn't matter how many images you seed your database with. Unless you are there to tell it what is an artifact and what is just part of the picture, you are going to end up with horrible results and comical results.
Re:File size by mezis · 2009-07-16 12:19 · Score: 2, Interesting

Every single JPEG is lossy, for three reasons:

a. Source (color) digital images use RGB colorspace (typically, the raw format is "RAW" with a Bayer layout). JPEG compresses three planes, with a YCrCb colorspace.
Due to colorspace conversion and quantization error, you lose information. That's called "lossy".
b. Even in lossless JPEG, each 64-pixel block is KR-transformed and quantized. Again, always lossy.
c. No free lunch.

Typically, even lossless JPEG makes you lose 1-2% of the total information (measured via image entropy). Things are slightly better with lossless JPEG2000. Both are *perceptually* lossless.
Re:File size by 4D6963 · 2009-07-16 12:34 · Score: 1

I heard the same one except with tanks.

--
You just got troll'd!
Re:File size by Anonymous Coward · 2009-07-16 12:39 · Score: 2, Funny

>>Except for Lossless JPEG [wikipedia.org] standardized in 1993. But other than that, no there is no lossless jpeg.
Katie Couric: What did John McCain do to try to stop the housing meltdown?
Sarah Palin: He voted for legislation to more carefully regulate Fannie Mae and Freddie Mac to stop bad lending practices.
Katie Couric: ...
Katie Couric: Well, besides that, what did he do?
Sarah Palin: ?
And the funny thing is, we all remember this now as Sarah Palin not knowing the answer to the question, when it was really Katie Couric who was the fucktard that didn't know about lossless jpeg.
True story.
Re:File size by 4D6963 · 2009-07-16 12:40 · Score: 1

Why is yours modded up higher I wonder. The OP wants to "compare two visually similar JPEG images and select the one with the fewest JPEG artifacts". That means they're the same image. That means file size will help you there, unless they're not the same resolution, although it should do regardless.
If I take a picture with a bunch of noise (eg. poor lighting) in it then it will not compress as well. If I take the same picture with perfect lighting it might be higher quality but smaller file size.
That's if you compensate for the poor lighting so it appears as bright though. But yeah, at that point that makes it completely different images, so why talk about lighting or noise, why not talk about the smoothness of features photographed or whatever else.
Which reminds me, am I the only one who can tell from looking at the various file sizes in a folder containing a set of photographs (let's say, porn) which is a close up or not?

--
You just got troll'd!
Re:File size by nabsltd · 2009-07-16 12:41 · Score: 3, Insightful

Unfortunately, that's a subjective term based on the 'codec' used to make the jpg. Not everyone's 100 is the same nor is everyone working off the same scale (i.e. 1-10 vs 1-100).
In addition, I bought a program (Windows only, sorry) that allows the user to pick the areas of the image that need the most bits. Basically, it allows you to pick the quality for any arbitrary region (using standard selection tools like lasso) when saving the JPEG.
I mostly got it for the batch processing and its excellent image quality when you set it to minimum compression.
Re:File size by Minwee · 2009-07-16 12:51 · Score: 4, Funny

And a squad of kangaroos firing RPGs.
Re:File size by Anonymous Coward · 2009-07-16 12:52 · Score: 0

Although if properly entered all that meta/exif bullshit can be used to do just what he is asking about. That is the reason that data exists. It can be searched and correlated.
Re:File size by RJFerret · 2009-07-16 13:15 · Score: 1

Sub-sampling will also totally throw off file size (which I adjust all the time depending on image content).
But...how about this? Re-compress both images to your lowest typical level. The one that's changed the greatest will be the highest quality, have the most detail and dynamic range, without time consuming visual inspection.
I just tried it and found this method superficially effective at least.
In the future, use better file naming notes! (My originals are Name00.jpg, first gen are Name01.jpg, radical changes go to Name10.jpg and weirdness can even be accommodated with Name10silo.jpg or Name10-512.jpg for resizes.) They also sort in sequential order in file requesters for easy work flow/processing.
-Randy
Re:File size by sbeckstead · 2009-07-16 13:26 · Score: 2, Insightful

But another bit of meta data there is "generation" so at least you could see how far it went from the place it started. The meta data actually has a purpose and people that process images without preserving it should be shot. And if the image hasn't got meta data and you are a professional you won't use it anyway. I hate tools like Paint because they destroy all that beautiful meta data you could have used to make this determination much easier. Assuming of course that image was generated and stored by someone who used the meta data in the first place. Alas you may be hosed here.

--
Why bother
Re:File size by sbeckstead · 2009-07-16 13:32 · Score: 1

You don't really have to answer the question he asked in this case because what he wants is a solution to a problem stated badly or in terms that were not quite within the parameters that were described. Solving his problem is far more important than answering a stupid question from a dweeb that obviously isn't an image processing professional like those of us who answered his question with a solution not contained within his original question. You just have to realize that when it comes to /. there are far more interesting things to do than merely answering a question which always presents a method of proving your superior information retention and processing abilities. Now read this three times until you can understand it and repeat it forward and backwards with proper punctuation. And he still won't be any closer to his solution because there was no information contained in your post that would have been helpful to us or him.

--
Why bother
Re:File size by sbeckstead · 2009-07-16 13:53 · Score: 1

Solving his problem is far more important than answering a stupid post from a dweeb that obviously isn't an image processing professional like yourself who answered his question with a solution not contained within his original question. You just have to realize that when it comes to /. there are far more interesting things to do than merely answering a question which always presents a method of proving your superior information retention and processing abilities. And the original poster still isn't any closer to his solution because there was no information contained in your post that would have been helpful to us or him.

--
Why bother
Re:File size by sbeckstead · 2009-07-16 13:55 · Score: 1

You loose dogs of war you lose quality of images. Don't get that backwards again...

--
Why bother
Re:File size by Anonymous Coward · 2009-07-16 13:58 · Score: 0

A min quality JPEG will have blocks of similar color, so saving at higher quality will tend to produce fewer bits because the low-quality blocks will compress well. I suppose you could check the number of colors used in the image as well as the size and compression. Perhaps the average number of pixels per color would help.
Re:File size by Hurricane78 · 2009-07-16 13:58 · Score: 1

Well, that's why you look at it first. If you can't tell the difference, and want the "best quality" anyway, you got the same disease as an "audiophile", and I recommend some Monster display cables to go with it. :P

--
Any sufficiently advanced intelligence is indistinguishable from stupidity.
Re:File size by SEWilco · 2009-07-16 14:11 · Score: 1

A hundred people standing next to each other every evening taking pictures of sunset at Yosemite can generate different pictures which look similar. If they're similar enough that it's hard to tell them apart, this fellow is willing to get rid of the lower quality ones even if they're not from identical sources.
Re:File size by timeOday · 2009-07-16 14:11 · Score: 5, Insightful

This is the kind of problem you can solve in 2 minutes with 95% accuracy (by using file size), or never finish at all by listening to all the pedants on slashdot. When people know a little too much they love to go on about stuff like entropy and information gain, just because they (sort of) can.
Try file size on the set of images of interest to you and see if it coincides with your intuition. If it does, you're done.
Re:File size by quacking+duck · 2009-07-16 14:11 · Score: 1

And promptly discovered the problem. While they had a huge database of images to use, they realized that all their 'friendly' craft had pictures taken during the day, while in flight. All their 'hostile' craft however were pictures that had been taken at night during spy runs or from overhead satellite shots.
[...]
I heard the same one except with tanks.
Wait, they imaged tanks while they were in flight...?
Re:File size by SEWilco · 2009-07-16 14:15 · Score: 1

Yeah, I also lose quality in my images when there are loose dogs of war around. All the running tends to do that.
Re:File size by Score+Whore · 2009-07-16 15:11 · Score: 1

Except that there is no requirement that a jpeg be encoded in YCbCr. Lossless jpeg is a totally separate mode of encoding. See faq entry.
Re:File size by Binary+Boy · 2009-07-16 15:12 · Score: 4, Informative

Lossless JPEG and lossless JPEG2000 are both exactly that - lossless. Not perceptually lossless, which is what people often use to refer to high-quality, lossy JPEG/JPEG2000, or JPEG-LS. Lossless JPEG uses a PCM-like encoder, not DCT, AFAIR. Lossless JPEG and lossless JPEG2000 are, in fact, lossless, at least with regards to image data in supported color spaces. This is in part a result of *not* converting to YCrCb, since that conversion is lossy, of course. Not all Lossless JPEGs are 8bit YCrCb.
Accusoft, for one, has a toolkit for building lossless JPEG applications which supports 16bit RGB and greyscale lossless JPEG modes.
The near-lossless JPEG you're thinking of is JPEG-LS, which is perceptually lossless, and guarantees a maximum error rate that is generally negligible for almost all applications. This format gets better compression ratios than Lossless JPEG, of course.
Neither the lossless nor the near-lossless JPEG modes are common though, outside of niche apps. Lossless JPEG2000 is, however, since almost all JPEG2000 libraries support it alongside the lossy modes.
Re:File size by izomiac · 2009-07-16 15:25 · Score: 1

Very true, but when dealing with different images the term "quality" becomes too subjective for a program to deal with. OTOH, the phrasing of the question makes it ambiguous whether the OP means the same original image that's been compressed at different levels (my impression), or multiple pictures of the same physical object (and lighting and all) that have been compressed at different levels (how could that happen?). Perhaps even multiple pictures of the same object compressed at the same level but keeping the one with less visible artifacts (also subjective).
Re:File size by gammaraybuster · 2009-07-16 15:41 · Score: 1

It just struck me as funny how there's a loss or corruption of information about lossless jpeg format.
Re:File size by swillden · 2009-07-16 15:44 · Score: 1

File size doesn't tell you anything.
I use it all the time, and it works really well.
Sometimes when I'm trying to handhold a shot and I have to use a shutter speed that's a little too slow (meaning small shakes of my hands cause blur), I put the camera in continuous mode and mash the button for 2-3 seconds, collecting 10-15 images of almost exactly the same image -- but some of them will come out less shaky and significantly sharper than others.
In post-processing, I could manually compare them one by one to find the sharpest, but it's much quicker and easier to look at the file sizes. Having done this a few hundred times, I now no longer even bother examining the images visually at 1:1 zoom, because in the many that I did check carefully, file size was always an accurate indicator. This is true with both JPEG files and CR2 (losslessly-compressed RAW files).
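
For a burst like that, the check is a few lines of script. This is only a sketch of the habit described above, assuming every file in the list is a shot of the same scene from the same camera at the same settings; `rank_by_size` is a made-up helper name.

```python
import os

def rank_by_size(paths):
    """Return (size_in_bytes, path) pairs, largest file first."""
    return sorted(((os.path.getsize(p), p) for p in paths), reverse=True)

if __name__ == "__main__":
    import sys
    # Usage: python rank.py burst_001.jpg burst_002.jpg ...
    for size, path in rank_by_size(sys.argv[1:]):
        print("%10d  %s" % (size, path))
```

The first entry is the candidate for "sharpest"; whether that holds for a given camera is worth spot-checking by eye a few times before trusting it.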

--
Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
Re:File size by Blue+Shifted · 2009-07-16 15:48 · Score: 1

ditto
Re:File size by Air-conditioned+cowh · 2009-07-16 17:02 · Score: 1

it is lossy compression, after all . . .
Time stamps, even!
Re:File size by wall0159 · 2009-07-16 19:00 · Score: 1

C'mon - next you'll be trying to tell me that the Earth orbits the sun...
Re:File size by beelsebob · 2009-07-16 19:40 · Score: 1

Yes, but it will have far fewer jpeg artifacts as well. The quality loss will all be gif artifacts.
Re:File size by Hognoxious · 2009-07-16 20:21 · Score: 1

Similar != same
There's a difference between two separate shots of the same scene and two images derived from the same shot. It's not really clear which the OP means.

--
Confucius say, "Find worm in apple - bad. Find half a worm - worse."
Re:File size by Hognoxious · 2009-07-16 20:26 · Score: 1

jpeg is a lossless format anyways... [snip] Everytime you open and save the image it looses quality.

Do you understand what the less suffix means?

--
Confucius say, "Find worm in apple - bad. Find half a worm - worse."
Re:File size by Anonymous Coward · 2009-07-16 20:26 · Score: 0

Brillant!
Re:File size by Anonymous Coward · 2009-07-16 21:32 · Score: 0

... A flatfish with six legs? Sing the evolution, man!
Re:File size by Anonymous Coward · 2009-07-16 21:43 · Score: 0

Lossless JPEG2000 is just a special case of regular JPEG2000. J2k is designed in a way that you can basically cut off a file at any desired size, and the larger the file, the better the quality. Up to lossless quality.
Re:File size by Anonymous Coward · 2009-07-16 22:04 · Score: 0

That's why some of us use our brains and read the _whole_ thing to actually figure out what he wants. Rather than fixate over the possible meanings of single words.

e.g.
> The problem is image collections, and finding the better of near-duplicate images
> It's known that saving the same source image in JPEG format at different quality levels produces different images, the one at the lower quality having more JPEG artifacts
> I've been trying to find a method to compare two visually similar JPEG images and select the one with the fewest JPEG artifacts

Most people with "image collections" who want to keep the better images from near dupes, are probably talking about the problem where the pics are actually from the same shot/source, but compressed differently or recompressed.

Plus if you have two pics that are actually separate shots of the same scene but hard to distinguish from each other, then it's likely for his purposes they are identical, and he might as well pick the better quality one.
Re:File size by Anonymous Coward · 2009-07-16 23:14 · Score: 0

Amen.
Re:File size by Ant+P. · 2009-07-16 23:31 · Score: 1

VBR JPEG compression... I'd be surprised if that hasn't been done somewhere before (and slightly annoyed). The user controllable part is a nice touch though.
Re:File size by Anonymous Coward · 2009-07-17 01:08 · Score: 0

god help us all
Re:File size by david.gilbert · 2009-07-17 01:22 · Score: 1

You're gonna have to change your thinking if you ever want to be a consultant.
Re:File size by Mr.+Suck · 2009-07-17 01:45 · Score: 1

Another more common example of this issue is the artifacts potentially introduced when an image is resized (resampled) - different resampling algorithms have differing quality.
A potentially intractable aspect of this problem is that there is no reference image supplied - your proposed algorithms have nothing concrete to be scored against so you have no way to objectively pick the best one.
Re:File size by Chyeld · 2009-07-17 03:12 · Score: 2, Interesting

I always wondered if that one wasn't an urban legend too, but apparently it was mostly true:

The reuse of some object-oriented code has caused tactical headaches for Australia's armed forces. As virtual reality simulators assume larger roles in helicopter combat training, programmers have gone to great lengths to increase the realism of their scenarios, including detailed landscapes and - in the case of the Northern Territory's Operation Phoenix - herds of kangaroos (since disturbed animals might well give away a helicopter's position).
The head of the Defense Science & Technology Organization's Land Operations/Simulation division reportedly instructed developers to model the local marsupials' movements and reactions to helicopters. Being efficient programmers, they just re-appropriated some code originally used to model infantry detachment reactions under the same stimuli, changed the mapped icon from a soldier to a kangaroo, and increased the figures' speed of movement.
Eager to demonstrate their flying skills for some visiting American pilots, the hotshot Aussies "buzzed" the virtual kangaroos in low flight during a simulation. The kangaroos scattered, as predicted, and the visiting Americans nodded appreciatively... then did a double-take as the kangaroos reappeared from behind a hill and launched a barrage of Stinger missiles at the hapless helicopter. (Apparently the programmers had forgotten to remove that part of the infantry coding.)
The lesson?
Objects are defined with certain attributes, and any new object defined in terms of an old one inherits all the attributes. The embarrassed programmers had learned to be careful when reusing object-oriented code, and the Yanks left with a newfound respect for Australian wildlife. Simulator supervisors report that pilots from that point onward have strictly avoided kangaroos, just as they were meant to.
Now the real story, with the Urban Myth removed...
On Friday DSD told the story of the killer kangaroos. Now we know the truth. And it is even weirder: the kangaroos threw beach balls!
Dr Anne-Marie Grisogono, Head, Simulation Land Operations Division at the Australian DSTO has told us what actually happened and we are delighted to set the record straight.
"I related this story as part of a talk on Simulation for Defence, at the Australian Science Festival on May 6th in Canberra. The Armed Reconnaissance Helicopter mission simulators built by the Synthetic Environments Research Facility in Land Operations Division of DSTO, do indeed fly in a fairly high fidelity environment which is a 4000 sq km piece of real outback Australia around Katherine, built from elevation data, overlaid with aerial photographs and with 2.5 million realistic 3d trees placed in the terrain in those areas where the photographs indicated real trees actually exist.
"For a bit of extra fun (and not for any strategic reason like kangaroos betraying your cover!) our programmers decided to put in a bit of animated wildlife. Since ModSAF is our simulation tool, these were modelled on ModSAF's Stinger detachments so that the associated detection model could be used to determine when a helo approached, and the behaviour invoked by such contact was set to 'retreat'. Replace the visual model of the Stinger detachment in your stealth viewer with a visual model of a kangaroo (or buffalo...) and you have wildlife that moves away when approached. It is true that the first time this was tried in the lab, we discovered that we had forgotten to remove the weapons and the 'fire' behaviour.
"It is NOT true that this happened in front of a bunch of visitors (American or any other flavour). We don't normally try things for the first time in front of an audience! What I didn't relate in the talk is that since we were not at that stage interested in weapons, we had not set any weapon or projectile types, so what the kangaroos fired at us was in fact the default object f
Re:File size by Anonymous Coward · 2009-07-17 04:06 · Score: 0

Apocryphal.
Re:File size by changedx · 2009-07-17 05:12 · Score: 1

The tricky thing is that the OP is asking two separate questions:
1) How do I group images of similar content (e.g. Natalie Portman eating hot grits) which may be of different dimensions/resolutions?
2) How do I choose the best archetype in each group?

For 1, you'll need specialized software that can measure image similarity. If the software doesn't do automatic resize/rescale, you'll need to script that too.
For 2, the above poster is correct: 95% of the time, the better image will have a larger file size.
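
For the grouping question, one common stand-in for "specialized software" is an average hash: shrink each image to a tiny grid, threshold against the mean, and compare the bit patterns. Nobody in this thread named that technique, so treat it as an assumption. The sketch also assumes the images are already decoded to 2D lists of grayscale values; decoding JPEGs is left out.

```python
def average_hash(gray, hash_size=8):
    """gray: 2D list (rows) of 0..255 values. Returns a hash_size*hash_size-bit int."""
    height, width = len(gray), len(gray[0])
    cells = []
    for cy in range(hash_size):
        y0 = cy * height // hash_size
        y1 = max(y0 + 1, (cy + 1) * height // hash_size)
        for cx in range(hash_size):
            x0 = cx * width // hash_size
            x1 = max(x0 + 1, (cx + 1) * width // hash_size)
            # Box-average the pixels that fall into this grid cell.
            block = [gray[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits

def hamming_distance(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")
```

Because the grid is resolution-independent, the same picture at two sizes should hash to nearly the same value; a small Hamming distance (the cutoff is something to tune) would put two files in the same group.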
Re:File size by Anonymous Coward · 2009-07-17 08:46 · Score: 0

This won't work, it's quite easy to set higher quality JPEG settings in later save iterations which may mean that the larger files are the less pristine of a set.
Re:File size by Anonymous Coward · 2009-07-17 08:49 · Score: 0

Oh how sweet it is to hear "Windows only, sorry".
Re:File size by jgrahn · 2009-07-19 00:00 · Score: 1

Well, that's why you look at it first. If you can't tell the difference, and want the "best quality" anyway, you got the same disease as an "audiophile", and I recommend some Monster display cables to go with it. :P
The OP obviously wants batch processing of perhaps thousands of images.
Even if he doesn't, it's not the same thing. You may want to enlarge parts of the image, change its contrast and so on later. It's hard to tell what artifacts hide at that level by just looking at the whole unmodified image.

I'm not an expert by Flimzy · 2009-07-16 10:07 · Score: 1

But what if you saved both images in an uncompressed format (bmp?), then compressed them both using a lossless format (gzip?), and compared the file sizes...

Do it with a bunch of images, and I expect you'll discover that the low-quality-gzipped image will be smaller than the high-quality-gzipped image...

Maybe? *shrug*

Re:I'm not an expert by gurps_npc · 2009-07-16 10:09 · Score: 1

I agree that this would probably be the simplest method. Note, I wonder if something as simple as examining the file size of the jpeg would be good enough for most cases.

--
excitingthingstodo.blogspot.com
Re:I'm not an expert by Bill,+Shooter+of+Bul · 2009-07-16 10:20 · Score: 1

Good idea. I'm also not an expert. Though, I would think there is a limit to how well this would work. If it were cel-shaded to some extent, it might look better than a lossy jpg, but compress to a smaller size. The question is if there would be any point in between where loss of information would actually result in better image quality.
Imagine a chess board is in the image. If an image is sort of lossy, the lines between the black and white might get a little blurred with some black running into some white and vice versa. If you just made the entire board to be a flat gray that averaged the two, it might look better to a human, even if that isn't what the original image was.

--
Well.. maybe. Or Maybe not. But Definitely not sort of.
Re:I'm not an expert by sznupi · 2009-07-16 10:37 · Score: 1

I'm also not an expert, but I suspect it might work in the other direction far too often.
Perhaps artifacts of low-quality jpeg images, embedded in simple stream of bmp, could look more like noise to general purpose compressor; more than "natural" photographs with gradual gradients.
And random noise is incompressible.
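
The last point is easy to see with a general-purpose compressor. A minimal sketch using zlib on synthetic 64x64 single-channel "images" (flat, smooth gradient, seeded random noise); these are toy buffers, not decoded JPEGs, so they only illustrate the direction of the effect.

```python
import random
import zlib

def deflated_size(raw):
    """Size in bytes of the zlib-compressed buffer at maximum compression."""
    return len(zlib.compress(raw, 9))

width = height = 64
flat = bytes([128]) * (width * height)
gradient = bytes(x * 255 // (width - 1) for _ in range(height) for x in range(width))
rng = random.Random(0)  # fixed seed so the run is repeatable
noise = bytes(rng.randrange(256) for _ in range(width * height))

if __name__ == "__main__":
    for name, raw in (("flat", flat), ("gradient", gradient), ("noise", noise)):
        print("%-8s %5d -> %5d bytes" % (name, len(raw), deflated_size(raw)))
```

The noise buffer barely shrinks at all, which is why artifacts that look like noise to the compressor could push the "worse" image's size up rather than down.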

--
One that hath name thou can not otter
Re:I'm not an expert by Anonymous Coward · 2009-07-16 10:57 · Score: 0

"I agree that this would probably be the simplest method."
Or he could just, you know, save it as an LZW or ZIP compressed tiff directly, rather than going from bitmap to zip. That would be simpler.
Not that that method is foolproof. Jpeg artifacting on a low-detail/high contrast image (like a checkerboard pattern) is spurious detail, and would result in a larger file size, whereas an unartifacted copy of the image would make a very small file.
Re:I'm not an expert by Anonymous Coward · 2009-07-16 11:22 · Score: 0

Gradients are not really a problem. A smooth gradient will become very blocky, resulting in less detail. What is a problem is high contrast edges, which will create spurious detail as a result of jpg artifacting. It's easily verifiable on your own--create a gradient in photoshop and save it as the lowest quality jpeg. Then create an image that's pure white on one half and pure black on the other and save it as the lowest quality jpeg. The gradient will turn into vast blocks of the same color. The b/w image will show significant spurious detail around the edge.
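
The blockiness in the gradient case can be measured crudely: compare how much pixel values jump across 8-pixel column boundaries with how much they change elsewhere. This is a rough sketch, assuming a decoded grayscale image as a 2D list and a block grid aligned to the left edge (so not a cropped image); it only looks at horizontal neighbors and is not a standard metric.

```python
def blockiness(gray, block=8):
    """Ratio of mean |step| across block-column boundaries to mean |step| elsewhere.

    Values near 1 mean boundaries look like any other pixel pair;
    much larger values suggest visible block edges.
    """
    edge_sum = edge_count = inner_sum = inner_count = 0
    for row in gray:
        for x in range(len(row) - 1):
            step = abs(row[x + 1] - row[x])
            if (x + 1) % block == 0:   # pair straddles a block boundary
                edge_sum += step
                edge_count += 1
            else:
                inner_sum += step
                inner_count += 1
    edge_mean = edge_sum / edge_count if edge_count else 0.0
    inner_mean = inner_sum / inner_count if inner_count else 0.0
    return edge_mean / (inner_mean + 1e-9)  # small constant avoids dividing by zero
```

Note this would not catch the ringing around a hard black/white edge, which is the other failure described above; that needs a different measure.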
Re:I'm not an expert by Binary+Boy · 2009-07-16 15:23 · Score: 1

Definitely not a reliable test - it'll vary significantly by the image content.
There are many cases where the version compressed initially with a lossy encoder and then recompressed with a lossless encoder will be larger than the version just compressed with the lossless encoder. For instance, a simple image of horizontal, solid color lines will compress very well with any lossless encoder; when you run it through a JPEG encoder first (at just about any quality level) it'll add a lot of noise that'll bloat the size once you compress it with a lossless encoder.
Alternately, a noisy source image will have its noise levels softened by heavy JPEG compression, likely resulting in a *smaller* file when encoded losslessly than the source. But a clean, low-noise source image will behave differently, at least at some JPEG compression levels. Too many variables here for this to be a useful test.
There's no easy solution to this problem, at least not without making certain assumptions about the image source - for instance, if you know all compressed versions came from the same source file, with no additional processing, and were encoded with the same jpeg library then the test can be very simple.
Re:I'm not an expert by sznupi · 2009-07-16 23:31 · Score: 1

Wasn't talking about gradients specifically, just that lack of proper "natural" ones (not only gradients, also edges) in picture with lots of artifacts (almost "random noise", essentially), might end up with bigger file in the method of OP, far too often for this method to be reliable.
Using your extreme example: image that is half white and half black. Compress it with jpeg quality settings 90 and 10. Then convert both to bmp and use general purpose compressor (as OP suggested). The "10" one will most likely be bigger! (much more ringing/noise, much more "random" data in comparison to "90" one; the latter will look much closer like alternating streams of pure zeros and ones - much easier to compress by general purpose compression algorithm)

--
One that hath name thou can not otter
Re:I'm not an expert by Anonymous Coward · 2009-07-17 00:10 · Score: 0

Wow, that's awesome!
I tried it with XnView:
Compressed an original image (Image A) to jpeg 50% (Image B) so that I had two comparable images.
Converted original and compressed to bitmaps (Image C and Image D). These of course are the same size (20737kb).
Then converted the bitmaps to same quality jpegs (Image E and Image F).
Image E came to 1124kb
Image F came to 806kb
Genius...

File Size. by Anonymous Coward · 2009-07-16 10:07 · Score: 0

Larger file size should give a rough hint if both images are the same format (i.e. JPEG). But you've probably already thought of that.

Of course, there ought to be better ways...

File size or density? by Durandal64 · 2009-07-16 10:08 · Score: 1

Have you tried just comparing the files' sizes with respect to the images' dimensions? It'll vary from encoder to encoder, but higher-quality JPEGs will be larger than lower-quality ones. You could just use the number of pixels in the picture and the file size to obtain a rough approximation of "quality per pixel" and choose the image with the higher value. It won't be perfect, but it's a lot easier than trying to pick out JPEG artifacts.

Also, the number of artifacts doesn't tell the full story. One image may have more artifacts, but those artifacts may all exist in the background parts of the image, while the foreground is less blocky. It's a choice each encoder makes.
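
The "quality per pixel" approximation is just file size divided by pixel count. A minimal sketch, assuming you already have each file's size and dimensions from somewhere; the tuple layout and function names are made up for illustration.

```python
def bytes_per_pixel(file_size_bytes, width, height):
    """Rough density measure: compressed bytes spent per pixel."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return file_size_bytes / (width * height)

def pick_denser(candidates):
    """candidates: iterable of (name, file_size_bytes, width, height) tuples.

    Returns the name of the candidate with the most bytes per pixel.
    """
    return max(candidates, key=lambda c: bytes_per_pixel(c[1], c[2], c[3]))[0]
```

For example, a 200,000-byte file at 1600x1200 works out to about 0.10 bytes per pixel, while a 150,000-byte file at 800x600 is about 0.31, so the smaller file would be picked despite having fewer pixels. Whether that is the one you want depends on how much you value resolution.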

Re:File size or density? by PCM2 · 2009-07-16 10:46 · Score: 1

And BTW, isn't this what most of us do already when we're searching Google Images?

--
Breakfast served all day!

Share your suggestions by gehrehmee · 2009-07-16 10:09 · Score: 1

Given a set of pictures, it would be really nice to see them grouped by "these are several pictures of the same scene/object/subject". This is a tool I'm not aware of yet, and I'd love to hear what open-source tools people are using.

As a next step, it would be neat to pick out the one that's most in focus...

--
"You know, Hobbes, some days even my lucky rocketship underpants don't help" -- Calvin

Re:Share your suggestions by Chabo · 2009-07-16 10:41 · Score: 1

I saw a piece of software that does something similar to what you're talking about; recently I watched James May's Big Ideas, they showed a camera that you wear around to create a lifelog.
The camera took photos every 30 seconds or so, and the software was able to divide sets of photos into "events"; it distinguished between the time the wearer was in the kitchen making breakfast, and when they sat at their computer typing up an article, for instance. I imagine that someone's created similar software for public use, then.

--
Convert FLACs to a portable format with FlacSquisher
Re:Share your suggestions by doti · 2009-07-20 09:11 · Score: 1

Yes, it takes a lot of work to organize all that porn.

--
factor 966971: 966971

The Human Eye by Anonymous Coward · 2009-07-16 10:09 · Score: 0

Artifacts are something visible to us - they mean nothing to software. It doesn't know whether the pixels are intentionally colored that way (ie, detail) or colored that way through some compression process at some point in time (ie, artifacts) or something else (eg, dithering, color depth, banding, etc). If two images are compressed at vastly different ratios, you'll be able to tell easily. Otherwise, they're probably both at a default 90% and if you can't tell the difference, what's the problem?

Re:The Human Eye by jhfry · 2009-07-16 10:52 · Score: 1

Software is well suited to detecting patterns, including patterns that might appear as distracting artifacts in an image. Just because subjectively the pictures are both equally similar to the original, doesn't mean that they are mathematically similar.
I can imagine a method for comparing two images to an original image and scoring the two based upon how similar they are to the original while detecting and deducting for distracting compression artifacts. Ironically this method would be very similar to the JPEG compression algorithm itself, as it tries to make decisions with regard to subjective ideas and artifact reduction (dark areas are reduced in complexity, noise is blurred, etc because most people don't perceive these changes as readily as other areas of detail loss.)
Perhaps using a subjective comparison scoring mechanism you could train this algorithm to favor your individual taste during its comparison; perhaps you could even add this capability to a JPEG compression tool so your JPEGs can reflect your individual perception better.
For example, I can accept more blurring, but banding in solid color areas is very distracting to me, while some people would rather keep detail that might be smoothed out and they might tolerate a bit of banding in large fields of color.

--
Sometimes the best solution is to stop wasting time looking for an easy solution.
Re:The Human Eye by sbeckstead · 2009-07-16 14:01 · Score: 1

There are DNA comparison tools that do just that actually, they compare for similarity of strings of DNA, not exact match and give you a percentage score. Hmmm have to look into this.

--
Why bother

Well... by Anonymous Coward · 2009-07-16 10:09 · Score: 0

If you want to know which image has more artefacts, it would still be hard to tell what is an artefact and what's supposed to be part of the image.

If you just want to know which is more compressed.. don't jpeg images store the compression ratio used the last time they were saved? It should be in the header somewhere.
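
As far as the file format goes, there is no "quality" or ratio number in a baseline JPEG header, but the quantization tables from the last save are stored in DQT segments, and larger table values mean coarser quantization. Below is a rough sketch that walks the marker segments and sums the table entries; `coarseness` is a made-up score, not a standard quality estimate, and it only reflects the last save, not any earlier lossy generations.

```python
import struct

def read_quant_tables(data):
    """Return {table_id: [64 ints]} from the DQT segments of a JPEG byte string."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG stream (missing SOI marker)")
    tables = {}
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("expected a marker at offset %d" % pos)
        marker = data[pos + 1]
        if marker == 0xFF:            # fill byte before the real marker
            pos += 1
            continue
        if marker in (0xD9, 0xDA):    # end of image or start of scan
            break
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        segment = data[pos + 4:pos + 2 + length]
        if marker == 0xDB:            # DQT: one or more tables per segment
            i = 0
            while i < len(segment):
                precision, table_id = segment[i] >> 4, segment[i] & 0x0F
                i += 1
                if precision == 0:    # 8-bit entries
                    values = list(segment[i:i + 64])
                    i += 64
                else:                 # 16-bit big-endian entries
                    values = list(struct.unpack(">64H", segment[i:i + 128]))
                    i += 128
                tables[table_id] = values
        pos += 2 + length
    return tables

def coarseness(tables):
    """Sum of all quantization entries; higher means harsher quantization."""
    return sum(sum(values) for values in tables.values())
```

Comparing `coarseness` between two files is only meaningful when they have the same number of tables, and an image re-saved at high quality after an earlier low-quality save will still look "fine" by this measure.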

Try compressing both further by Ed+Avis · 2009-07-16 10:09 · Score: 2, Insightful

I suppose you could recompress both images as JPEG with various quality settings, then do a pixel-by-pixel comparison computing a difference measure between each of the two source images and its recompressed version. Presumably, the one with more JPEG artefacts to start with will be more similar to its compressed version, at a certain key level of compression. This relies on your compression program generating the same kind of artefacts as the one used to make the images, but I suppose that cjpeg with the default settings has a good chance of working.

Failing that, just take the larger (in bytes) of the two JPEG files...

--
-- Ed Avis ed@membled.com

Re:Try compressing both further by JPortal · 2009-07-16 10:21 · Score: 1

I think you're onto something. It's not perfect, but all other solutions require (1) near-human AI, (2) the original file as a control, or (3) comparing filesize which may not be accurate, as described in other comments.
Re:Try compressing both further by 4D6963 · 2009-07-16 12:47 · Score: 1

Good ideas, although I suppose you could combine the two. Recompress both images using the same high quality setting and compare the file sizes, on the assumption that the JPEG algorithm will have an easier time compressing what's already been damaged (after all, why not? Doesn't it work by discarding spectral components? If more are already discarded from the start, it should compress better). The copy that comes out smaller would then be the one that was more degraded to begin with.
I think it should work for most cases, and the nice thing is you can make it work with a mere bash script and ImageMagick's convert command.

--
You just got troll'd!

Filesize is a hint by democrates · 2009-07-16 10:09 · Score: 1

less compression = bigger file

Re:Filesize is a hint by thethibs · 2009-07-16 10:34 · Score: 4, Informative

More Noise = Less Compression

--
I'm a Programmer. That's one level above Software Engineer and one level below Engineer.
Re:Filesize is a hint by democrates · 2009-07-16 10:44 · Score: 1

Lol, noise is indeed a poor word for information loss.
Re:Filesize is a hint by Anonymous Coward · 2009-07-16 14:57 · Score: 0

Exactly. This sounds overly simple, but I'll bet this would work 90+ percent of the time: multiply the height and width in pixels, and then divide that product by the number of unique color values across the pixels. More noise would likely mean more like-colors across neighboring pixels.
Admittedly it's not perfect, but most of the suggestions here involve comparing the images to the original. Doesn't that contradict the OP?
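
The ratio described above is easy to try out. A minimal sketch, assuming the pixels have already been decoded into a flat list of color tuples by whatever image library you prefer; `pixels_per_unique_color` is a made-up name for illustration:

```python
def pixels_per_unique_color(pixels, width, height):
    """Return (width * height) divided by the number of distinct colors.

    `pixels` is assumed to be a flat list of hashable color values,
    e.g. (r, g, b) tuples, in any order.
    """
    if width <= 0 or height <= 0 or len(pixels) != width * height:
        raise ValueError("pixel count does not match the given dimensions")
    return (width * height) / len(set(pixels))
```

A higher score means more pixels share the same color. Whether that tracks compression damage as well as the post bets is something you would have to check on your own images.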
Re:Filesize is a hint by SharpFang · 2009-07-16 20:36 · Score: 1

Different noise levels = Not same image.

--
45 5F E1 04 22 CA 29 C4 93 3F 95 05 2B 79 2A B2

Requires original image in loss-less form by xquark · 2009-07-16 10:11 · Score: 1

or else the problem is not truly resolvable. The other way is to
assume all the similar images come from the same source, if so then
it's as simple as looking at the compression level in the file format
and the various levels of scaling applied to the lossy images.

--
Arash Partow's Philosophy: Be a person who knows what they don't know, and not a person who doesn't know.

I believe I've heard of something similar... by Anonymous Coward · 2009-07-16 10:11 · Score: 0

You save both images at a lower JPG quality and you test which one changed more (across all its pixels; remember to factor in the relative importance of various types of changes) from its original state to its new lower-quality state. The image that changed more was more dependent on the original JPG quality setting, so it had more information, detail, etc.

Artificial, artificial intelligence by Ant2 · 2009-07-16 10:12 · Score: 1

Take a look at Amazon's Mechanical Turk offering (www.mturk.com). You create "HITs" either through their website or API and real humans complete the tasks. This one would be simple: show two (or more) images to the worker and have them select the one with the best quality. You can even pre-screen workers by having them complete a qualification test. You could get all your images sorted out for a penny or two apiece.

ImageMagick can give you EXIF data. by bcrowell · 2009-07-16 10:12 · Score: 4, Informative

The ImageMagick package includes a command called identify, which can read the EXIF data in the JPEG file. You can use it like this:

identify -verbose creek.jpg | grep Quality

In my example, it gave " Quality: 94".

This will not work on very old cameras (from ca. 2002 or earlier?), because they don't have EXIF data. This is different info than you'd get by just comparing file sizes. The JPEG quality setting is not the only factor that can influence file size. File size can depend on resolution, JPEG quality, and other manipulations such as blurring or sharpening, adjusting brightness levels, etc.

--
Find free books.

Re:ImageMagick can give you EXIF data. by ss122_ry · 2009-07-16 10:38 · Score: 1

Is it guaranteed that, of two visually similar JPEGs, the one with the higher quality setting has fewer artifacts? I think not.
Re:ImageMagick can give you EXIF data. by Jane+Q.+Public · 2009-07-16 10:53 · Score: 1

No guarantee, but the probability is extremely high. If you have two files of the same dimensions that are visually similar (i.e., different versions of the same picture), then the one with the higher dpi rating (which is not directly related to dimension in JPEG) and better quality (lower compression) as shown by EXIF data is almost certainly going to be the one with fewer artifacts in the real world. Of course it is possible to create situations in which that is not so, but they don't usually happen accidentally.

But as someone else mentioned, EXIF data is not always present. Some programs do not generate EXIF data when they save the file, for example.
Re:ImageMagick can give you EXIF data. by DotDotSlasher · 2009-07-16 10:56 · Score: 3, Informative

imagemagick can also compare two images, and tell you how different they are. That is -- quantify the differences by returning a floating point number or two (PSNR, RMSE) in a way that a more-compressed JPEG image will return a correspondingly different floating point value. I know the question concerns two JPEG-compressed images, but if you do have an original image -- and you want to test which is closest to the original, ImageMagick can do that. Use the ImageMagick compare function.
See http://www.imagemagick.org/script/compare.php

Also, GIMP (www.gimp.org) is able to look at an image and approximate what JPEG compression quality setting was used, and use that same quality setting to save an output JPEG copy of the image. So -- they have some algorithm inside of their application which takes an image and returns (a good guess of) the corresponding JPEG quality value.
Of course, this does not help you if the image was saved with a lousy JPEG quality value, like 10/100, and later saved at a much higher value, like 98/100. Since the algorithm only sees the last image, it would tell you the quality value is 98/100, even though the contents of the image would indicate the results of 10/100 compression, because of multi-generational lossy compression.
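
For anyone wondering what the RMSE and PSNR numbers mentioned above actually are, here is a rough sketch of the arithmetic. It assumes both images are the same size and already decoded into flat lists of 8-bit sample values; the function names are illustrative and are not ImageMagick's API:

```python
import math

def rmse(a, b):
    """Root-mean-square error between two equal-length sample lists."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def psnr(a, b, peak=255):
    """Peak signal-to-noise ratio in dB; infinite for identical inputs."""
    error = rmse(a, b)
    if error == 0:
        return math.inf
    return 20 * math.log10(peak / error)
```

Lower RMSE (or higher PSNR) against the original means the copy is numerically closer to it, which is not always the same thing as looking better.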
Re:ImageMagick can give you EXIF data. by bk2204 · 2009-07-16 11:42 · Score: 1

This doesn't work on all cameras. My Olympus Stylus 810 is from 2006, and it doesn't have that within the information. And yes, it does support EXIF.
Re:ImageMagick can give you EXIF data. by Will.Woodhull · 2009-07-16 13:52 · Score: 1

These techniques should work-- provided the only data in the files that are being compared is the image data. If other data has been overlaid on an image by steganography, then that is likely to confuse any attempt to identify the image with the greatest fidelity to the original.
I can't think of any way to assess how often steganography is currently being used on the web. Its use is growing; the number of downloads of steganographic software is increasing. There is software available that can test for the presence of a steganographic overlay in a suspected JPEG file with some reliability. But if the problem is determining whether one of the 250 snapshots taken at the Company Picnic and being uploaded to Flickr is also loaded with the recipe for the Company's Secret Sauce... I don't think there is any effective way to do that kind of screening.
Is Original Poster actually looking for a way to bulk scan for the use of steganography? It seems to me that this is an emerging problem in corporate espionage.

--
Will
Re:ImageMagick can give you EXIF data. by Anonymous Coward · 2009-07-16 23:11 · Score: 1, Informative

ImageMagick does not need EXIF data. It estimates the quality by looking at the JPEG quantization table.
$ convert logo: jpeg:- | identify -verbose - | grep Quality
Quality: 92

Translation: Please help me with my porn... by Chyeld · 2009-07-16 10:13 · Score: 5, Insightful

Dear Slashdot,
Recently I checked my porn drive and realized that I have over 50 gigabytes of jpg quality porn collected. Unfortunately, I've noticed that a good portion of these are all the same picture of Natalie Portman eating hot grits. Could you please point me to a free program that will allow me to find the highest resolution, best quality version of this picture from my collection and delete the rest?
Many Thanks!

Re:Translation: Please help me with my porn... by Pointy_Hair · 2009-07-16 10:36 · Score: 1

... and I've been wanking to it so often I actually -have- gone blind now and can't tell the versions apart, even on my new widescreen monitor!
Re:Translation: Please help me with my porn... by kpoole55 · 2009-07-16 11:18 · Score: 1

And if it were normal porn it wouldn't be so much of a problem, as JPEG losses aren't so noticeable in images of natural items or settings, but this is cartoon porn where people should have used .gif to start and the losses along what should be sharply defined lines become more noticeable.
Re:Translation: Please help me with my porn... by lenzm · 2009-07-16 13:13 · Score: 1

Pfft, 50 GB? This is slashdot, I'm sure he's got a dedicated RAID array.
Re:Translation: Please help me with my porn... by Anonymous Coward · 2009-07-16 23:09 · Score: 0

Since this is slashdot, here is the obligatory xkcd reference ;)
http://www.xkcd.com/598

jpeg quality != image quality by johnrpenner · 2009-07-16 10:14 · Score: 1

all things the same, jpeg quality gives a good index to the quality of the image,
but it can be just as true that a lower jpeg quality image might be a better quality image.

for example, two images: the first image might be scanned off a badly faded
colour photocopy of a famous painting - it is saved at 300 dpi - approximately
2800 x 1200 pixels, and the jpeg quality set at 12 -- the second image is a
well lit photograph of the original painting, scanned on a scitex scanner,
and brought in as a tiff original -- all high end, but then they res it down to
only 1600 x 900 pixels, say at 200 dpi, and saved at a jpeg quality of 8.

well, in such a case, most software based on the assumption
that jpeg quality = image quality would auto-pick the worser image. :-(

handy index variable to have tho - could provide a resolution / jpeg quality
metric in the google image searches... :-)

all the best
john penner

Contrast by clem.dickey · 2009-07-16 10:14 · Score: 0, Redundant

An image with more contrast (greater average difference between adjacent pixels) probably has more detail. But compressibility, as has already been noted, is probably just as good a measure.

use the JPEG underlying details by cellurl · 2009-07-16 10:15 · Score: 2, Insightful

To make a JPEG, you cut it into blocks, run the DCT on each block and mess with the 4:2:2 color formula and pkzip the pieces... That said, I would think measuring the number of blocks would be related to number of artifacts... In my barbaric approach to engineering, (assuming there is no other suggested way on slashdot), I would get the source code to the JPEG encoder/decoder and print out statistics (number of blocks, block size) of each image...

It's easy by Anonymous Coward · 2009-07-16 10:16 · Score: 5, Insightful

Run the DCT and check how much it's been quantized. The higher the greatest common factor, the more it has been compressed.

Alternatively, check the raw data file size.
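
A sketch of the greatest-common-factor idea, under one loud assumption: you already have the integer, dequantized DCT coefficients for a single frequency position gathered from many blocks (read from the file, not recomputed from pixels, since rounding would spoil the GCD). `estimate_step` is a hypothetical helper name:

```python
from functools import reduce
from math import gcd

def estimate_step(coefficients):
    """Estimate the quantization step for one frequency position.

    Every dequantized coefficient is a multiple of the step, so the GCD
    of the non-zero values is the step itself or a multiple of it.
    Returns 0 when every coefficient is zero (nothing to estimate from).
    """
    nonzero = [abs(c) for c in coefficients if c != 0]
    return reduce(gcd, nonzero, 0)
```

With only a few non-zero samples the result can overshoot the real step, so this works best when there are plenty of blocks to draw from.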

quantization tables by angryargus · 2009-07-16 10:17 · Score: 3, Insightful

Others have mentioned file size, but another good approach is to look at the quantization tables in the image as an overall quality factor. E.g., JPEG over RTP (RFC 2435) uses a quantization factor to represent the actual tables, and the value of 'Q' generally maps to quality of the image. Wikipedia's doc on JPEG has a less technical discussion of the topic, although the Q it uses is probably different from the example RFC.
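
To make the link between 'Q' and the tables concrete, here is a sketch of the scaling rule as I understand it from RFC 2435 (which follows the Independent JPEG Group convention); treat the details as my reading rather than gospel. `scale_table` is an illustrative name, and `base_table` is whatever reference table you start from:

```python
def scale_table(base_table, q):
    """Scale a base quantization table by a 1..99 quality factor.

    Q = 50 leaves the table unchanged, lower Q makes the steps larger
    (coarser), and higher Q makes them smaller (finer).
    """
    q = max(1, min(99, q))
    scale = 5000 // q if q < 50 else 200 - 2 * q
    # Each entry is rounded and clamped to the 8-bit range 1..255.
    return [max(1, min(255, (value * scale + 50) // 100)) for value in base_table]
```

Going the other way, comparing the tables found in a file against scaled versions of a reference table gives a rough guess at the Q that produced them.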

iterate over jpeg quality setting by Anonymous Coward · 2009-07-16 10:17 · Score: 0

When you save a JPEG, you usually choose a quality setting 0-100. I'm not sure if the effect of that is standard, but this should have reasonable results either way: try saving both images at 100, then progressively decrease the quality level until the image changes because additional artifacts are being introduced. This way, you can experimentally determine the quality setting of each image, and just choose the higher of the two. Alternatively, if that quality setting is available in the metadata somewhere, just read that.

Measure sharpness? by Anonymous Coward · 2009-07-16 10:18 · Score: 4, Interesting

Compute the root-mean-square difference between the original image and a gaussian-blurred version?
JPEG tends to soften details and reduce areas of sharp contrast, so the sharper result will probably
be better quality. This is similar to the PSNR metric for image quality.

Bonus: very fast, and can be done by convolution, which optimizes very efficiently.
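
A bare-bones sketch of this measure, assuming a grayscale image stored as a list of rows of numbers. To keep it short it uses a 3x3 box blur as a stand-in for a real Gaussian, so the numbers will differ from a proper implementation; the function names are made up:

```python
import math

def box_blur(img):
    """3x3 box blur with edge clamping (a crude stand-in for a Gaussian)."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            total = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    total += img[yy][xx]
            row.append(total / 9)
        out.append(row)
    return out

def sharpness(img):
    """RMS difference between an image and its blurred version."""
    blurred = box_blur(img)
    h, w = len(img), len(img[0])
    squared = sum((img[y][x] - blurred[y][x]) ** 2
                  for y in range(h) for x in range(w))
    return math.sqrt(squared / (h * w))
```

Run it on each candidate separately and compare the scores; no original is needed for this step.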

Re:Measure sharpness? by PCM2 · 2009-07-16 10:43 · Score: 1

But this method requires a copy of the original -- or failing that, you'd need to already know which of the JPEGs is the highest quality, which defeats the purpose.

--
Breakfast served all day!
Re:Measure sharpness? by uhmmmm · 2009-07-16 10:52 · Score: 3, Insightful

Even faster is look at the DCT coefficients in the file itself. Doesn't even require decoding - JPEG compression works by quantizing the coefficients more heavily for higher compression rates, and particularly for the high frequency coefficients. If more high frequency coefficients are zero, it's been quantized more heavily, and is lower quality.
Now, it's not foolproof. If one copy went through some intermediate processing (color dithering or something) before the final JPEG version was saved, it may have lost quality in places not accounted for by this method. Comparing quality of two differently-sized images is also not as straightforward either.
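
Counting the zeroed high-frequency coefficients could look roughly like this. It assumes some JPEG reader has already handed you the quantized coefficients as 8x8 blocks in row/column order (not zigzag); the cutoff of 8 on the index sum is an arbitrary choice for illustration:

```python
def zero_high_freq_fraction(blocks, cutoff=8):
    """Fraction of high-frequency coefficients that are exactly zero.

    A coefficient at row v, column u counts as high frequency when
    u + v >= cutoff. `blocks` is an iterable of 8x8 lists of integers.
    """
    zeros = total = 0
    for block in blocks:
        for v in range(8):
            for u in range(8):
                if u + v >= cutoff:
                    total += 1
                    if block[v][u] == 0:
                        zeros += 1
    return zeros / total if total else 0.0
```

The copy with the higher fraction is, by this argument, the more heavily quantized one.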
Re:Measure sharpness? by uhmmmm · 2009-07-16 10:56 · Score: 1

Also, JPEG works on blocks. While it's true that JPEG gets rid of high frequency details first (and thus results in blurring), this is only useful within each block. You can have high contrast areas at the edge of each block, and this is actually often some of the most annoying artifacting in images compressed at very low quality. So just because it has sharp edges doesn't mean it's high quality.
Re:Measure sharpness? by uhmmmm · 2009-07-16 11:00 · Score: 1

No it doesn't. This method has another problem (see my replies to it), but other than that, it could work. He's suggesting that to each copy of the image, you look at the difference between that copy and a blurred version of it. This will give you an idea of how sharp that copy is. And since JPEG throws out high frequency information first, resulting in blurring, it would appear at first glance that the sharper image should be the higher quality one.
As I said in another comment though, JPEG operates on blocks, and especially at very low qualities, you get sharp edges between each block. So the assumption that sharp image == high quality is not really valid.
Re:Measure sharpness? by Anonymous Coward · 2009-07-16 11:44 · Score: 0

Throwing out high frequencies can actually result in more "ringing" at each side of high contrast edges, and, as you say, increases blockiness in low contrast areas. So, no, the low quality will not (not usually, anyway) result in blurring; it will result in greater artifacts, and those will create an image with more "sharpness".
Re:Measure sharpness? by 4D6963 · 2009-07-16 12:50 · Score: 1

Even faster is look at the DCT coefficients in the file itself.
Yeah, if you're familiar with the way these are encoded in the JPEG format. But you're right though.

--
You just got troll'd!
Re:Measure sharpness? by Anonymous Coward · 2009-07-16 13:17 · Score: 0

Also, JPEG works on blocks. While it's true that JPEG gets rid of high frequency details first (and thus results in blurring), this is only useful within each block. You can have high contrast areas at the edge of each block, and this is actually often some of the most annoying artifacting in images compressed at very low quality. So just because it has sharp edges doesn't mean it's high quality.
I posted the "measure sharpness" suggestion. This problem is trivial to deal with -- block boundaries occur in very predictable locations (every 8 pixels), and their impact can be removed from the details calculation easily. You can blur along block boundaries or apply a weighting function to the "detail" images that cancels out boundary effects.
I bet it would be even easier operating in the fourier space.
Re:Measure sharpness? by Anonymous Coward · 2009-07-16 13:35 · Score: 0

I am the original "measure sharpness" poster.

ringing artifacts
Will have less contrast than the original high-contrast edge. They will need to be extremely prevalent or noticeable to contribute more to the root-mean-square than the original, sharper edge.
But yes, this could be a problem.
Re:Measure sharpness? by Hurricane78 · 2009-07-16 15:13 · Score: 1

Uuum... I think he does not have the original. And I think that is the point. Because if he had the original, he could, you know, ...use that one! ;)

--
Any sufficiently advanced intelligence is indistinguishable from stupidity.
Re:Measure sharpness? by b4dc0d3r · 2009-07-17 02:08 · Score: 1

In theory that would work, but you have to consider the data source. If you're talking about images from the net, this won't cut it. Lots of times the images will be downloaded and re-saved using higher quality settings. The result is double compression and the lesser quality image has the higher settings.
Re:Measure sharpness? by uhmmmm · 2009-07-18 02:12 · Score: 1

Merely resaving an image with a higher quality setting won't magically repopulate the high frequency coefficients, so I think this will still work in that case. Of course, if the image has gone through any sort of filtering or modification before being resaved, then all bets are off. But that's also beyond the scope of what the original poster asked for - he wanted a way to detect JPEG compression artifacts, not any extra filtering that might have also been applied.

DCT by tomz16 · 2009-07-16 10:18 · Score: 4, Informative

Just look at the manner in which JPEGs are encoded for your answer!

Take the DCT (discrete cosine transform) of blocks of pixels throughout the image. Examine the frequency content of each of these blocks and determine the amount of spatial frequency suppression. This will correlate with the quality factor used during compression!
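
For the curious, a slow but self-contained sketch of that measurement on a single 8x8 block of grayscale samples. The "share of AC energy in the high frequencies" score and the cutoff of 8 are my own illustrative choices, not anything defined by the JPEG standard:

```python
import math

def dct_8x8(block):
    """Naive 2-D DCT-II of an 8x8 block, using JPEG's normalization."""
    def c(k):
        return math.sqrt(0.5) if k == 0 else 1.0
    out = [[0.0] * 8 for _ in range(8)]
    for v in range(8):
        for u in range(8):
            total = 0.0
            for y in range(8):
                for x in range(8):
                    total += (block[y][x]
                              * math.cos((2 * x + 1) * u * math.pi / 16)
                              * math.cos((2 * y + 1) * v * math.pi / 16))
            out[v][u] = 0.25 * c(u) * c(v) * total
    return out

def high_freq_share(block, cutoff=8):
    """Share of the block's AC energy found where u + v >= cutoff."""
    coeffs = dct_8x8(block)
    ac = high = 0.0
    for v in range(8):
        for u in range(8):
            if u == 0 and v == 0:
                continue  # skip the DC term, it only carries brightness
            energy = coeffs[v][u] ** 2
            ac += energy
            if u + v >= cutoff:
                high += energy
    # Treat a numerically flat block as having no high-frequency content.
    return high / ac if ac > 1e-9 else 0.0
```

Averaging this share over all the blocks of each candidate gives one number per image; the lower number suggests stronger suppression.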

Re:DCT by mikenap · 2009-07-16 10:54 · Score: 3, Insightful

This seems to me the best suggestion, and there's a simple visual way to accomplish it! The hardest hit part of the image is going to be the chroma information, which your eye normally has reduced resolution sensitivity for in a normal scene. To overcome this, load your JPEGs into your favorite image editor and crank the saturation to the max (this throws away the luminance data). Now the JPEG artifacts in the chroma information will HIT YOU IN THE FACE, even in images that seemed rather clean before. Pick the less blocky of the two, and there you go!
Re:DCT by Anonymous Coward · 2009-07-16 11:37 · Score: 1, Insightful

Or just take the 2D FFT of the entire images. Higher JPEG compression should result in fewer high frequency components in an image.
Re:DCT by eggnoglatte · 2009-07-16 14:12 · Score: 3, Insightful

That works, but only if you have exact, pixel-to-pixel correspondence between the photos. It won't work if you just grab 2 photos from Flickr that both show the Eiffel tower, and you wonder which one is "better".
Luckily, there is a simple way to do it: use jpegtran to extract the quantization table from each image. Pick the one with the smaller values. This can easily be scripted.
Caveat: this will not work if the images have been decoded and re-coded multiple times.
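
The quantization tables can also be pulled straight out of the file bytes without any external tool. A minimal sketch, assuming an ordinary JPEG stream whose tables appear before the first scan; `read_quant_tables` and `coarseness` are made-up names, not part of jpegtran or any library:

```python
import struct

def read_quant_tables(data):
    """Return {table_id: [64 values in zigzag order]} from JPEG bytes."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("missing SOI marker; not a JPEG stream")
    tables = {}
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("expected a marker at offset %d" % pos)
        marker = data[pos + 1]
        if marker == 0xFF:          # fill byte before a marker
            pos += 1
            continue
        if marker in (0xDA, 0xD9):  # SOS or EOI: stop looking
            break
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if marker == 0xDB:          # DQT segment, may hold several tables
            segment = data[pos + 4:pos + 2 + length]
            i = 0
            while i < len(segment):
                precision, table_id = segment[i] >> 4, segment[i] & 0x0F
                size = 128 if precision else 64
                raw = segment[i + 1:i + 1 + size]
                if len(raw) != size:
                    raise ValueError("truncated quantization table")
                fmt = ">64H" if precision else ">64B"
                tables[table_id] = list(struct.unpack(fmt, raw))
                i += 1 + size
        pos += 2 + length
    return tables

def coarseness(tables):
    """Sum of all table entries; larger means coarser quantization."""
    return sum(sum(values) for values in tables.values())
```

Call it as `read_quant_tables(open(path, "rb").read())` on each file and keep the one with the lower `coarseness`. The same caveat applies: this only reflects the last save.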
Re:DCT by Anonymous Coward · 2009-07-16 14:59 · Score: 0

Just look at the manner in which JPEGs are encoded for your answer!
Take the DCT (discrete cosine transform) of blocks of pixels throughout the image. Examine the frequency content of each of these blocks and determine the amount of spatial frequency suppression. This will correlate with the quality factor used during compression!

good call. this is the only remotely correct answer to the actual question on here so far.
Re:DCT by Anonymous Coward · 2009-07-16 16:34 · Score: 0

It's not immediately clear to me that a simple 2D FFT on the entire image will work as you claim. The edges of the DCT transform window can have very sharp edges. Such artifacts might show up as a high spatial frequency signal in a 2D FFT.

use a "difference matte" by Anonymous Coward · 2009-07-16 10:20 · Score: 4, Informative

load up both images in adobe after effects or some other image compositing program and apply a "difference matte"

Any differences in pixel values between the two images will show up as black on a white background or vice versa...

adam
BOXXlabs

Re:use a "difference matte" by uhmmmm · 2009-07-16 11:03 · Score: 2, Insightful

So, that will show you which parts differ. How do you tell which is higher quality? Sure, you can probably do it by eye. But it sounds like the poster wants a fully automated method.
Re:use a "difference matte" by miggyb · 2009-07-16 11:53 · Score: 1

Someone already suggested that before, and I'm not understanding it. The result would be a delta, but it wouldn't help with figuring out which one of the original two was of a higher quality. Having a delta just tells you that the two pictures are different.

--
This signature serves no purpose other than to help you see which posts were made by me.
Re:use a "difference matte" by Anonymous Coward · 2009-07-17 04:56 · Score: 0

Ok, Mr. +4 Informative After Effects guy,
how do you know which image has the higher quality, which one the lower?
And how do you repeat the process hundreds of times?
Must be a real joy spending all day in your favorite Image Compositing Program. No wonder you haven't been sober in months.
And for you whack moderators out there: Stop modding up comments because it contains a few fancy-sounding words you've heard the cool kids use before!
adam
BOXXlabs

find the edges? but size is useful and easy? by with+a+'c' · 2009-07-16 10:20 · Score: 1

Assuming you can find similar images programmatically, you can probably use size to get a good guess. Alternately, I know there are algorithms to find edges. Edges are where most JPEG artifacts show up. If you could then look at the gradient from the edges, smooth ones will likely be the better image.

Try ThumbsPlus by Anonymous Coward · 2009-07-16 10:21 · Score: 3, Informative

ThumbsPlus is an image management tool. It has a feature called "find similar" that should do what you want as far as identifying two pictures that are the same except for the compression level. Once the similar picture is found you can use ThumbsPlus to look at the file sizes and see which one is bigger.

Re:Try ThumbsPlus by Anonymous Coward · 2009-07-17 05:06 · Score: 0

Oh I'm already looking forward to meta-moderating this whole thread!
Stupid kids don't know the fuck they're talking about and stupid moderators mod up everything that contains a brand name.
I wish I had a lawn -_-

Found it a while ago by sco08y · 2009-07-16 10:22 · Score: 5, Informative

I mean, you don't want second rate pictures in your pr0n stash?

I had problems building it back then, let alone writing the scripts for it and the hassle of figuring out which images were duplicates, but this utility seems to fit the bill.

Re:Found it a while ago by kpoole55 · 2009-07-16 11:23 · Score: 1

Thanks, I will look at this. It's not just that there are occasional duplicates but, in some subject matters, so many duplicates that the storage requirements become a nuisance.
Re:Found it a while ago by Hurricane78 · 2009-07-16 14:00 · Score: 1

Pictures? In *my* porn collection? And they are not moving? What is this? 1999??

--
Any sufficiently advanced intelligence is indistinguishable from stupidity.

The most obvious artefacts... by 91degrees · 2009-07-16 10:22 · Score: 1

Seems that if I really overcompress a JPEG, the main problems are at the edges of the blocks. This is not really unexpected.

So a simple first pass would be to apply a simple edge detector and look for discontinuities at the edges of the 8x8 blocks. For an example, just try an edge detector in any decent image editing app on an overcompressed JPEG.
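
A sketch of that first pass, assuming a grayscale image as a list of rows and a block grid that still starts at the left edge (i.e. the image has not been cropped or resized since it was compressed). It only looks at vertical block boundaries; the horizontal ones would be handled the same way on columns:

```python
def blockiness(img, block=8):
    """Mean step across block boundaries minus the mean step elsewhere.

    Looks at horizontal neighbours only. A clearly positive score means
    the jumps line up with the 8-pixel grid, which hints at JPEG blocking.
    """
    edge_sum = edge_count = inner_sum = inner_count = 0
    for row in img:
        for x in range(len(row) - 1):
            step = abs(row[x + 1] - row[x])
            if (x + 1) % block == 0:   # x and x + 1 sit in different blocks
                edge_sum += step
                edge_count += 1
            else:
                inner_sum += step
                inner_count += 1
    if edge_count == 0 or inner_count == 0:
        return 0.0
    return edge_sum / edge_count - inner_sum / inner_count
```

Subtracting the within-block average is there so that a genuinely detailed picture does not get penalized just for having lots of edges everywhere.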

Bits per pixel by Citizen+of+Earth · 2009-07-16 10:24 · Score: 1

Compute the number of bits per pixel of the image data.
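
Spelled out, assuming you take the file size in bytes as a stand-in for the image data (headers and EXIF will inflate it a little); `bits_per_pixel` is just an illustrative name:

```python
def bits_per_pixel(file_size_bytes, width, height):
    """Compressed bits spent per pixel, using the whole file size."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    return file_size_bytes * 8 / (width * height)
```

Unlike raw file size, this lets you compare copies that were saved at different resolutions.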

edge detection by Anonymous Coward · 2009-07-16 10:24 · Score: 0

use an edge-detection filter. since jpeg artifacts usually present themselves as "smeared out" edges, you may be able to figure out some rule based on the edge-detected image.

image quality measures by trb · 2009-07-16 10:28 · Score: 4, Informative

google (or scholar-google) for Hosaka plots, or image quality measures. Ref:

HOSAKA K., A new picture quality evaluation method.
Proc. International Picture Coding Symposium, Tokyo, Japan, 1986, 17-18.

Re:image quality measures by Anonymous Coward · 2009-07-16 10:55 · Score: 0

Any idea where to get hold of that paper? It only pops up as a reference for me.
Re:image quality measures by trb · 2009-07-16 17:13 · Score: 1

I don't know where to find the Hosaka paper now. I read it back then. I'm not in the imaging biz any more, so I haven't kept up.
The Picture Coding Symposium (where Hosaka presented and published his 1986 paper) still exists, perhaps you can get it from them.
Here's a paper which shows some Hosaka plots and discusses them, and apparently has some more recent info, since it was written 10 years after Hosaka's - still more than 10 years ago.
http://www.sci.brooklyn.cuny.edu/~eskicioglu/papers/IEEETransCom95.pdf
I assume that if you google for Hosaka and Eskicioglu you can follow the bibliography trail to more recent work, if you like.

Neural network! by Anonymous Coward · 2009-07-16 10:29 · Score: 0

Compress a bunch of original images with variable quality, noise, etc.

Go through this set of images (where you know which one is "best") and train it to return two booleans, one for match/no-match, another for first better or second better.

Slow to train, but you can use GPGPU for massive speedups.

Blur Detection? by HashDefine · 2009-07-16 10:37 · Score: 2, Informative

I wonder if out-of-focus or blur detection methods will give you a metric which varies with the level of JPEG artifacts. After all, the JPEG artifacts should make it more difficult to do things like edge detection, which are the same things that are made more difficult by blurry and out-of-focus images.

A Google search for blur detection should bring up things that you can try. Here is a series of posts that do a good job of explaining some of the work involved

Fourier transform by maxwell+demon · 2009-07-16 10:37 · Score: 2, Interesting

Assuming the only quality loss is due to JPEG compression, I guess a fourier transform should give you a hint: I think the worse quality image should have lower amplitude of high frequencies.

Of course, that criterion may be misleading if the image was otherwise modified. For example, noise filters will typically reduce high frequencies as well, but you'd generally consider the result superior (otherwise you wouldn't have applied the filter).

--
The Tao of math: The numbers you can count are not the real numbers.

Well by Anonymous Coward · 2009-07-16 10:39 · Score: 0

You could just open the low quality images and save them with a higher quality setting.

Check the quantization by PhrostyMcByte · 2009-07-16 10:43 · Score: 1

I remember a Slashdot article of a guy who used JPEG quantization to detect if images were photoshopped... it had an example of a terrorist adding books. Can't find it via google tho.

Re:Check the quantization by Animaether · 2009-07-16 11:19 · Score: 3, Informative

http://www.cs.dartmouth.edu/farid/research/tampering.html
http://www.cs.dartmouth.edu/farid/publications/tr06a.html

Filters by mypalmike · 2009-07-16 10:47 · Score: 5, Funny

First, make a bumpmap of each image. Then, render them onto quads with a light at a 45 degree angle to the surface normal. Run a gaussian blur on each resulting image. Then run a quantize filter, followed by lens flare, solarize, and edge-detect. At this point, the answer will be clear: both images look horrible.

--
There are 0x40000000 types of people: those who understand 32-bit IEEE 754 floating point, and those who don't.

Re:Filters by miggyb · 2009-07-16 11:56 · Score: 1

The one that looks less horrible is, of course, the one that was originally a higher-quality jpg. Genius!

--
This signature serves no purpose other than to help you see which posts were made by me.

Look for boundaries by Anonymous Coward · 2009-07-16 10:50 · Score: 0

JPEG compression averages groups of pixels with similar color data inside the JPEG image, but does not weigh that average against nearby pixel groups. You can use this fact to identify JPEG artifacts, even if the edges between artifacts are not visible to human eyes.

EG, in a patch of sky, which has a fairly random, but otherwise uniform distribution of shades of blue, there will emerge "squares" where the averaging algorithm has averaged a pixel group, but did not weigh the average of adjacent groups, resulting in a visually identifiable artifact.

You can gauge the quality of a compressed JPEG image by testing for discrete boundaries in areas of similar color values that would nominally contain a random (or smooth gradient with random dither) aggregation of similar color types, and assigning a "Severity" value based on the 'hardness' of the artifact's difference from its neighbors.

In other words, in areas that would have originally had a nice "smooth" blending of similar colors, you will end up with blocks of discrete colors that have discernible edges. The severity of artifacting would be determinable by measuring how discretely unique each artifact block is from its neighbors (with caveats for natural boundaries, such as sky against tree, etc.).

To evaluate if an edge is a JPEG artifact or not, you should gather the JPEG pixel group size from the JPEG header, then see if your edges form a rectangle that is a multiple of that size.

This way you can tell if the hard edge is an artifact, or if it is the edge of Paris Hilton's nipple (or some other natural edge. Natural edges will very rarely have a mathematically perfect rectangular profile.)

A systematic evaluation of an image would be slow and painful, but would produce a scoring benchmark to rate two arbitrary JPEGs against each other. (Better would, of course, be 2 JPEGs and a lossless PNG -- that way you have the un-averaged data to help identify artifact boundaries with, among other things, but that isn't what you asked for.)

different images? by Anonymous Coward · 2009-07-16 10:51 · Score: 0

"It's known that saving the same source image in JPEG format at different quality levels produces different images"

news to me.

tineye? by E+IS+mC(Square) · 2009-07-16 10:52 · Score: 1

Check out Tineye - http://tineye.com/faq

It does not do exactly what above post suggests, but it partially does what submitter asked (finding similar images on the net).

Re:tineye? by kpoole55 · 2009-07-16 11:57 · Score: 1

This is a very interesting service that might answer a different sort of question that I hear in the forums I frequent. Thanks.
Re:tineye? by Binary+Boy · 2009-07-16 15:29 · Score: 1

The image toolkit TinEye is based on (Piximilar) is far more powerful even than TinEye. Awesome stuff, one of the best commercial CBIR engines I've seen.
If you just want to group near-identical images, which vary only by minor processing - resolution, minor color correction - there are simple, low-end tools that can do this easily. imgseek is open source and works pretty well; I also use the Windows-based VSDIF, which isn't bad for finding duplicates in various formats, scales, and color spaces (I use it for deduplicating image libraries - the corporate edition has a command line interface). Both of these tools have limits when it comes to cropping and non-right-angle rotations, whereas Piximilar and some of its competitors can handle pretty radically modified images, or recognize individual components of larger images.

Automatic JPEG Artifact Removal by yet-another-lobbyist · 2009-07-16 10:55 · Score: 4, Interesting

For what it's worth: I remember using Paint Shop Pro 9 a few years ago. It has a function called "Removal of JPEG artifacts" (or similar). I remember being surprised how well it worked. I also remember that PSP has quite good functionality for batch processing. So what you could do is use the "remove artifact" function and look at the difference before/after this function. The image with the bigger difference has to be the one of lower quality.
I am not sure if there is a tool that automatically calculates the difference between two images, but this is a task simple enough to be coded in a few lines (given the right libraries are at hand). For each color channel (RGB) of each pixel, you basically just calculate the square of the difference between the two images. Then you add all these numbers up (all pixels, all color channels). The bigger this number is, the bigger the difference between the images.
Maybe not your push-one-button solution, but should be doable. Just my $0.02.
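
The "few lines" could look roughly like this in Python. It assumes both images are the same size and already decoded into flat lists of (R, G, B) tuples, for example by an imaging library; the function name is only illustrative.

```python
def sum_squared_difference(img_a, img_b):
    """Sum of squared per-channel differences between two equally sized RGB images.

    Each image is a flat list of (r, g, b) tuples.
    """
    if len(img_a) != len(img_b):
        raise ValueError("images must have the same number of pixels")
    total = 0
    for (ra, ga, ba), (rb, gb, bb) in zip(img_a, img_b):
        # Square the difference in each colour channel and add it all up.
        total += (ra - rb) ** 2 + (ga - gb) ** 2 + (ba - bb) ** 2
    return total
```

Calling it once per candidate, with the candidate and its artifact-removed version as arguments, gives the two numbers to compare.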

Re:Automatic JPEG Artifact Removal by kpoole55 · 2009-07-16 12:53 · Score: 1

This is an interesting idea in that it really doesn't compare the features in the two images under consideration to each other. That's a break from the usual trail I've been following. Thanks for the suggestion.

compare against the static baseline. by circusboy · 2009-07-16 11:04 · Score: 1

compare both images against the original, not each other.
count number of pixels different from the original, then calculate max and average difference between either image and the original.

decide which parameter means more to you.

go forward from there.
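
a minimal sketch of those three numbers in Python, assuming the original and the candidate are the same size and already decoded into flat lists of grayscale values (the function name is made up):

```python
def difference_stats(original, candidate):
    """Return (changed pixel count, max difference, average difference)."""
    if len(original) != len(candidate) or not original:
        raise ValueError("images must be non-empty and the same size")
    diffs = [abs(o - c) for o, c in zip(original, candidate)]
    changed = sum(1 for d in diffs if d != 0)
    return changed, max(diffs), sum(diffs) / len(diffs)
```

run it once for each candidate against the original and compare whichever of the three values matters most to you.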

--
-- it's ridiculous how many people misspell ridiculous... (damn, damn, damn...)

Re:compare against the static baseline. by circusboy · 2009-07-16 11:12 · Score: 1

adding to that, you can run the following algorithm on the diff images.
1. blur image by an arbitrary value,
2. darken the image by an arbitrary value.
3. repeat until image is all black.
count the number of repetitions. given various values for steps one and two, you can tune the algorithm to find images that have large areas of mismatch.
possibly not useful to you, but have found it good for validation testing for image manipulation software.
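
a sketch of that loop in Python, under some assumptions of mine: the diff image is a list of rows of non-negative numbers, "blur" is a 3x3 box blur, and "darken" subtracts a fixed amount and clamps at zero. both function names and the default values are arbitrary.

```python
def box_blur(img):
    """3x3 box blur; pixels at the border average over the neighbours that exist."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            neighbours = [img[j][i]
                          for j in range(max(0, y - 1), min(h, y + 2))
                          for i in range(max(0, x - 1), min(w, x + 2))]
            row.append(sum(neighbours) / len(neighbours))
        out.append(row)
    return out


def rounds_until_black(diff, darken=8.0, blur_passes=1):
    """Count blur-then-darken rounds needed until every pixel is zero."""
    if darken <= 0:
        raise ValueError("darken must be positive or the loop never ends")
    rounds = 0
    while any(v > 0 for row in diff for v in row):
        for _ in range(blur_passes):
            diff = box_blur(diff)
        # Darken by a fixed amount, clamping at black.
        diff = [[max(0.0, v - darken) for v in row] for row in diff]
        rounds += 1
    return rounds
```

an isolated bright pixel gets smeared out and dies within a couple of rounds, while a large mismatched area survives many more, which is the property the count relies on.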

--
-- it's ridiculous how many people misspell ridiculous... (damn, damn, damn...)

How about audio? by bondiblueos9 · 2009-07-16 11:07 · Score: 2, Interesting

I would very much like to do the same with audio. I have so many duplicate tracks in my music collection in different formats and bitrates.

--
Warning: The Surgeon General Has Determined that Sigs are Dangerous to Your Health

Re:How about audio? by notseamus · 2009-07-16 12:22 · Score: 1

If you're running a mac and have all your files in an itunes library, then Dupin is extremely useful. It matches on name, size, length, bit rate, or all at once.
It's pretty useful, and the freeware version lets you delete from drive as well as library.
If you're on windows, I searched for years and couldn't find anything :(

--
I dreamed of Freud: What does this mean?

Look at the DCT coefficients by uhmmmm · 2009-07-16 11:09 · Score: 3, Informative

JPEG works by breaking the image into 8x8 blocks and doing a two dimensional discrete cosine transform on each of the color planes for each block. At this point, no information is lost (except possibly by some slight inaccuracies converting from RGB to the YCbCr color space used in JPEG, often loosely called YUV, and by chroma subsampling if it is enabled). The step where the artifacts are introduced is in quantizing the coefficients. High frequency coefficients are considered less important and are quantized more than low frequency coefficients. The level of quantization is raised across the board to increase the level of compression.

Now, how is this useful? The reason heavily quantizing results in higher compression is because the coefficients get smaller. In fact, many become zero, which is particularly good for compression - and the high frequency coefficients in particular tend towards zero. So partially decode the images and look at the DCT coefficients. The image with more high frequency coefficients which are zero is likely the lower quality one.
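
To illustrate why heavier quantization zeroes out the high frequencies, here is a small Python sketch. It does not read coefficients from a JPEG file (that needs a library that exposes them, which is not shown here); instead it takes an 8x8 block of pixel values, applies the DCT itself, and quantizes with a single uniform step. The uniform `step` and the `cutoff` defining "high frequency" are simplifying assumptions; real JPEG uses a per-frequency quantization table.

```python
import math

N = 8  # JPEG block size


def dct_2d(block):
    """Orthonormal 2D DCT-II of an 8x8 block (list of 8 rows of 8 values)."""
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            cu = math.sqrt(1 / N) if u == 0 else math.sqrt(2 / N)
            cv = math.sqrt(1 / N) if v == 0 else math.sqrt(2 / N)
            s = sum(block[y][x]
                    * math.cos((2 * y + 1) * u * math.pi / (2 * N))
                    * math.cos((2 * x + 1) * v * math.pi / (2 * N))
                    for y in range(N) for x in range(N))
            out[u][v] = cu * cv * s
    return out


def zero_high_freq_fraction(block, step, cutoff=4):
    """Fraction of coefficients with u + v >= cutoff that quantize to zero."""
    coeffs = dct_2d(block)
    high = [round(coeffs[u][v] / step)
            for u in range(N) for v in range(N) if u + v >= cutoff]
    return sum(1 for q in high if q == 0) / len(high)
```

Averaged over all blocks of two copies of the same picture, the copy with the larger fraction would be the likelier candidate for the lower quality one, following the reasoning above.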

Image Quality Metrics. by Jeremy+Erwin · 2009-07-16 11:13 · Score: 1

Something like $\frac{1}{N} \sum_{i=1}^{N}(x_i-y_i)^2$, where $x$ and $y$ are arrays of pixels, and $N$ is the number of pixels in each array?

Is there a way to find out the compression engine? by ID000001 · 2009-07-16 11:18 · Score: 1

Does the JPEG header list the compression method as well as the compression ratio? If not, is there any way to figure out what kind of compression engine was used based on how an image is constructed?

If so, simply do some testing against some of the most popular compression engines, based on the artifacts, to determine what engine was used, then find out their compression ratio (perhaps a simple file size might work?). Then simply pick the images with the best quality based on the engine used and the ratio?

GREYCstoration by Rashdot · 2009-07-16 11:21 · Score: 1

Run the free GREYCstoration algorithm on both images, subtract results from original, and pick the one most similar to the original: http://www.greyc.ensicaen.fr/~dtschump/greycstoration/

--
This is not the sig you're looking for.

Re:the solution is simple by Anonymous Coward · 2009-07-16 11:26 · Score: 0

Cool story, bro.

Image sharpness measuring? by Anonymous Coward · 2009-07-16 11:28 · Score: 0

Replying to your post to create a new sub-thread, hope you don't mind as I think it involves similar research...

Often when I look at digital photos taken at a camera's maximum megapixel range, or even scans of negatives, or random pictures on the interwebs, I find them to be rather blurry; not necessarily out-of-focus, but simply 'soft'.

Essentially.. there's more information being used to store the image 'as is' than there is casually useful* information -in- the image.

Does anybody know of software, or algorithms, to figure out how much casually useful information is in a picture, and at what size (dimensions) that picture would optimally be stored?

* by 'casually useful' I mean this... take today's APOD image:
http://antwrp.gsfc.nasa.gov/apod/ap090716.html ( view full - sparing their bandwidth by not linking to it, though I'm sure they have plenty )
That image to me, the casual user, looks blurry. Every single pixel within it (and beyond from the original) is probably very important to the scientists; being able to run some algorithms on it to get every last bit of information from it. But when I look at it, I see the smallest 'feature' in it as being maybe 3-4 pixels across, let's say 4. So if I downsize it to 25% of the full size image, it looks perfectly sharp to me without any significant (to me, the casual user) loss of information. /anon

NO not file size by frovingslosh · 2009-07-16 11:33 · Score: 1

NO. Not file size. File size would be a potential test if all images were from the same original source and if they were only ever jpeg compressed once. Unfortunately, quite often one will come across images that have been jpeg compressed and re-compressed, and the final re-compression was done at "high quality", so the file is large for the image, but it still contains all of the jpeg artifacts from the lower quality compression. You can also see extra artifacts when one file has only been compressed once but another file has been compressed repeatedly, even if the second file is the same size as the file that was only compressed once.

There are, of course, other issues that come into question too, such as original color depth and color depth of every intermediate image.

The poster asked a good question, but you did not provide a helpful answer.

--
I'm an American. I love this country and the freedoms that we used to have.

variation by superwiz · 2009-07-16 11:36 · Score: 1

Compute the variance of the Fourier coefficients within each block and then calculate the average for each image. The better quality image should have lower variance. If a block has a lot of edges, then the higher frequency coefficients should have much higher values than the lower ones. If a block is uniform, then the lower frequency coefficients should have higher values. So if you have a good image, it will be easy to see the difference between uniform parts and edges. That is, the coefficients of the most "important" frequency within a block will be higher. If you have a poor quality image, then not.

--
Any guest worker system is indistinguishable from indentured servitude.

Tough process, have a look at the frequency domain by Anonymous Coward · 2009-07-16 11:38 · Score: 0

If you've identified two images as the same (this can be done by comparing pics at the same spatial resolution (make sure to low-pass filter before resizing to avoid artifacting!) and looking at the mean sum of squared differences for really small differences... you'll have to play around with the tolerance to find out if it's the "same", and I'd always keep a wary eye... it'll just find similar images IMO), then you just have to take a look at their frequency domain counterpart images. The images with the most detail will have more energy in the high frequencies than the other, less detailed images.
On the other hand, strictly for seeing who has the most artifacting, if you've identified images as the "same", the completely horizontal and vertical high frequencies should have lots of energy (by comparison wrt the good image and within the bad image itself) to make all those blocks.
Matlab makes it easy to visualize and transform this kind of stuff so take a look at its image processing toolbox or documentation (docs freely available online).

It depends what you want.. by Paracelcus · 2009-07-16 11:41 · Score: 1

find dupes on the internet http://tineye.com/
find dupes on your HDD http://www.bigbangenterprises.de/en/doublekiller/

--
I killed da wabbit -Elmer Fudd

difference by collywally · 2009-07-16 11:42 · Score: 1

This is how I check how much compression I have in my images.
1. Grab the original and the JPEG into Photoshop (or whatever you use).
2. Use Difference as your transfer mode. This will show you how different it is.
3. Find out the value of all the pixels (I don't know, add them together or something).
Repeat the above steps with the second picture.
Whichever total is higher is the one that is more different (why does that sound like bad English to me?) and will be the lower quality image.
Use Python and the PIL (Python Imaging Library) to automate the whole thing and that's it.

Just sort by the size by MikeBabcock · 2009-07-16 11:51 · Score: 1

JPEG is pretty efficient at compressing images -- the only way they get smaller on average is by increasing the quality loss. Therefore, the larger of the two images in bytes is probably the better looking copy.

--
- Michael T. Babcock (Yes, I blog)

Adobe DNG by Gruff1002 · 2009-07-16 11:54 · Score: 1

How to save digital photos is a serious concern. JPEG sucks, it is not even an option. Any 24 bit option is doable. Here's the rub: Adobe needs to get more open source, we can help them and they can help us.

Possible Method... by teko_teko · 2009-07-16 11:57 · Score: 1

I just thought of a possible way to compare...

Assuming both JPEGs aren't at the lowest (or very low) quality:

1. Take image A, create 10 or 20 more copies using different levels of quality (5, 10, 15, and so on).
2. Compare each of them with image A, from lowest to highest quality.
3. Stop where the diff no longer changes compared with the previous image; then we can assume image A is at the previous image's quality level.

Do the same with image B.

Subjective... by GWBasic · 2009-07-16 12:02 · Score: 1

Well, your problem is that image quality is subjective. Can computers make good subjective judgements? Not really.

Let's say you count the number of pixels that are different? Well, what if JPEG usually slightly alters the brightness? You could weight the difference, but what if JPEG sometimes moves an edge by a pixel?

I think if you study a bit about how JPEG works, you might find that you can computationally determine how much information is lost; but that does not mean that your computed number is in any way related to what a human will say the image quality is.

--
No, I will not work for your startup

Re:Admit your a huge faggot by rtyhurst · 2009-07-16 12:10 · Score: 1

No your the faggit, and I gone git pitbulls with AIDS to rape you face!

PS: HA HA!

You're assuming a bit too much, aren't ya? by macraig · 2009-07-16 12:10 · Score: 1

It appears that you assume that he wants to compare images for which he himself is the source? What if the images he actually wants to compare are pr0n, of the same hi-res glamour photo sets obtained from different sources? He needs to decide which is the "best" pron to keep, right? (Never mind that he can probably jack off equally well to either/any... he's a COLLECTOR so it matters. :-)

Such images will almost always have the EXIF data scrubbed from them, so your technique wouldn't work at all for racy hi-res stuff. I'm deliberately not naming example sources, because I don't want them to know they're a topic. :-)

Expert's answer by mezis · 2009-07-16 12:23 · Score: 2, Interesting

Exploit JPEG's weakness.

JPEG encodes pixels by using a cosine transform on 8x8 pixel blocks. The most perceptually visible artifacts (and the artifacts most susceptible to cause trouble to machine vision algorithms) appear on block boundaries.

Short answer:
a. 2D-FFT your image
b. Use the value of the 8-pixel period response in X and Y direction as your quality metric. The higher, the worse the quality.

This is a crude 1st approximation but works.
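
A cut-down sketch of the same idea in Python. To stay dependency-free it does not run a full 2D FFT; it builds the average horizontal gradient per column and evaluates a single DFT bin at a period of 8 pixels, normalized by the total gradient. That substitution, the function name, and the normalization are assumptions of this sketch; only the X direction is covered.

```python
import cmath


def period8_response(gray, period=8):
    """Normalized strength of the 8-pixel periodic component of column gradients.

    gray is a list of rows of grayscale values. Returns a value in [0, 1];
    higher means more of the edge energy repeats every `period` pixels.
    """
    h, w = len(gray), len(gray[0])
    # Mean absolute jump between column x-1 and column x, for x = 1 .. w-1.
    profile = [sum(abs(row[x] - row[x - 1]) for row in gray) / h
               for x in range(1, w)]
    total = sum(profile)
    if total == 0:
        return 0.0
    # Single DFT bin at the chosen period; jumps on the block grid add up in phase.
    bin_value = sum(g * cmath.exp(-2j * cmath.pi * x / period)
                    for x, g in enumerate(profile, start=1))
    return abs(bin_value) / total
```

Applying it to the transposed image gives the Y direction. As with the FFT version, the higher the value, the worse the quality.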

Entropy? by Anonymous Coward · 2009-07-16 12:23 · Score: 0

The 'quality' of a picture, as stated, is still a bit vague. If you have an image of a completely blue wall, I believe the entire picture could be compressed to a single 'artifact', yet retain the same amount of information as a bitmap. Perhaps what you're after is the amount of information given in an image.
Information Theory should help there. http://en.wikipedia.org/wiki/Entropy_(information_theory)
One quick and dirty method might take the histogram of the image, and then find the one with the greatest (or least) standard deviation. You could map light/depth, colors, etc to the histogram and see which one best suits your needs. It's not flawless (if for some reason you wanted a very blue wall instead of picking up little defects of dirt) but it could work.

Use judge by arose · 2009-07-16 12:29 · Score: 0, Redundant

Judge. It's not perfect, but it works.

--
Analogies don't equal equalities, they are merely somewhat analogous.

Re:Use judge by Anonymous Coward · 2009-07-16 20:25 · Score: 0

Somebody mod the parent up...

How about "Date Modified"? by tomsomething · 2009-07-16 12:34 · Score: 1

If by "near-duplicate" you mean different files that were actually once the same image, sorting by "date modified" might give you satisfactory results. Of course, I'm making certain assumptions here about how the images were acquired and why there are multiple versions, and only you will know if this applies to your situation, but I would suspect that the older files would be of better quality.

--
Welcome to Slashdot. Replace this text with your desired signature before replying to a story.

Try jpgQ - JPEG Quality Estimator by Anonymous Coward · 2009-07-16 12:38 · Score: 1, Informative

jpgQ - JPEG Quality Estimator
http://www.mediachance.com/digicam/jpgq.htm

Some things aren't doable yet by PingXao · 2009-07-16 12:44 · Score: 1

Aside from the mathematical tests some have suggested, my gut tells me this is going to be almost impossible. There are tasks that a human can perform that just aren't doable given the present state of our software systems. The gap has as much to do with our understanding about how we perceive through our senses as it does with algorithms and calculation methodologies. We just don't yet know enough about the underlying processes to make a computer do it.

The same goes for other areas where AI is sorely lacking. Things like OCR, language recognition and translation, not to mention a program where you can whistle a tune and have it analyzed to the point where its name can be deduced (if it was written already), or scored as sheet music (if you're creating something new).

All these replies are right yet all are wrong. by FlyingGuy · 2009-07-16 13:00 · Score: 1

You are asking a machine to make a comparison between "good" and "not good" or "OK" and "fantastic" when all of these choices are by their very nature illusory at best.

Consider a photo of a person. I may prefer a softer focus, some may prefer sharper; some want more color saturation in a pastoral scene, others less. Individuals judge an image in many, many different ways.

In my youth I did a lot of photography. I was taking pictures of the Winternationals at Fremont Raceway (when it still existed) and was shooting a funny car as it came off the line. I was shooting Tri-X and pushing it a full stop, which resulted in a grainy negative. I did some darkroom magic and came up with a very eye-catching and award-winning photo. But if you mechanically compared it to the straight shot it would have been inferior.

The point is you can use a computer to compare some things, but you cannot use a computer to judge "better" in an artistic sense or a "pleasing to the eye" sense.

--
Hey KID! Yeah you, get the fuck off my lawn!

What difference does it make? by tpstigers · 2009-07-16 13:15 · Score: 1

Umm...... I have to ask. If you can't tell just by looking at them, what difference does it make?

Re:Easy or using evolution like this... by barwasp · 2009-07-16 13:20 · Score: 1

Tiled background images by evolution
Horizontal 3d bars by evolution
Vertical 3d bars by evolution

Cisco? by Anonymous Coward · 2009-07-16 13:39 · Score: 0

Um, Cisco has catastrophic layoffs today and this couldn't have waited until later?

Try looking at the histograms of Y, Cb, Cr by pclminion · 2009-07-16 13:42 · Score: 1

Take advantage of the fact that JPEG quantizes the chrominance information more aggressively at higher compression levels. Quite ridiculously so, in fact. Look at these three images. The first two are the Cb and Cr channels of a highly-compressed JPEG. The third is the luminance channel. Notice that there is WAY more information contained in the luminance channel. This effect gets more and more extreme as JPEG quality goes down.

Histograms

Quantifying this is a different question. Look at the histograms of each of the three channels. The histogram of Cb and Cr is extremely sparse, with a few large peaks, but with no energy in most buckets. The luminance channel, on the other hand, has a much more detailed histogram. I leave it up to the reader to create a formula to boil this all down to a single number.
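
One possible way to boil it down, sketched in Python: convert RGB to YCbCr with the usual JFIF formulas and measure what fraction of the 256 histogram buckets is empty in each channel. The choice of "empty bucket fraction" as the single number, and the function names, are my own guesses at a formula, not an established metric.

```python
def rgb_to_ycbcr(pixel):
    """JFIF full-range RGB -> (Y, Cb, Cr), each rounded and clamped to 0..255."""
    r, g, b = pixel
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return tuple(min(255, max(0, round(c))) for c in (y, cb, cr))


def empty_bin_fraction(channel, bins=256):
    """Fraction of histogram buckets with no pixels in them (0.0 .. 1.0)."""
    hist = [0] * bins
    for value in channel:
        hist[value] += 1
    return hist.count(0) / bins


def channel_sparsity(rgb_pixels):
    """Return the empty-bucket fraction for the Y, Cb and Cr channels."""
    ycbcr = [rgb_to_ycbcr(p) for p in rgb_pixels]
    return tuple(empty_bin_fraction([p[i] for p in ycbcr]) for i in range(3))
```

Between two copies of the same picture, the one whose Cb and Cr values are much sparser relative to its Y value would be the suspect. Small images leave many buckets empty regardless, so the numbers are only comparable between images of similar size.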

Maybe I'm thinking too simple here, but: by Hurricane78 · 2009-07-16 13:53 · Score: 1

What about just, you know... looking at them?
And if you can't tell the difference, does it matter then? (Just take the smaller one.)
That is my approach.

If you want the best one, even when you can't see the difference, just take the biggest one.
If the codec is the same, the chance that a higher quality image is smaller, is zero.

There, I solved it for you. :D
Or as a funny advertisement for a newspaper said:

[Image of a shiny pen.]
Before the first manned flight to space, NASA developed a pen, that can write in zero gravity, without the ink leaking.
The development costs amounted to $12 million.
[Removes pen, and puts a pencil in its place.]
That's... how the Russians solved the problem.

--
Any sufficiently advanced intelligence is indistinguishable from stupidity.

Comment removed by account_deleted · 2009-07-16 13:56 · Score: 1

Comment removed based on user account deletion

JPEG compression - say no to jpeg! by Fotograf · 2009-07-16 15:44 · Score: 1

jpeg is plain evil. The OP's problem can, IMO, be solved by reading the JPEG compression level (which can be estimated from the quantization tables stored in the header). Sure, it won't help if the image has been recompressed multiple times, but looking at the file size together with that compression level should be enough.

--
God's gift to chicks

There are too many variables. by thesandbender · 2009-07-16 15:53 · Score: 1

The simple fact of the matter is that what you perceive as a "better" image, others won't. You may look at the primary subject matter; others will look at that and the background. You may be concerned about the contrast of the picture while others will look at the colors. While I understand that you're really looking for a good median, there is truth to the axiom that "a picture says a thousand words". Anytime you monkey with it, you're stripping at least a few of those words away. I think a better question is not "how do I compress this picture" but "what pictures should I keep". Just my $.02

Re by Anonymous Coward · 2009-07-16 15:53 · Score: 0

There was a story a while back about a programmer who worked with the quantization field or something to tell if a photo had been photoshopped, how many layers, and by what program, EVEN if the file had been re-encoded and compressed. Google "Krawetz's software." He used it to show Al Qaeda's videos were manipulated.

Compression is just one factor by Chris+Pimlott · 2009-07-16 16:26 · Score: 1

While important, compression isn't the only issue. You'll also have to consider issues such as resolution, cropping, noise, blurriness, color balance, white level... especially if you're dealing with non-digital sources. I went through a phase of collecting scans of HR Giger works and came across all sorts of subjective issues. One scan might be extremely high res but cuts off the edges. Another might be blurry but have more accurate colors (compared to low-res images from the artist's official sites). Many times I ended up keeping multiple images since I couldn't find a single one reproducing everything faithfully.

JND Baby! by Anonymous Coward · 2009-07-16 16:31 · Score: 0

Just Noticeable Difference. The objective way of measuring the subjective. http://en.wikipedia.org/wiki/Difference_limen

thanks for the serious consideration here by kpoole55 · 2009-07-16 16:32 · Score: 2, Interesting

Thanks to the many who took this as a serious question and didn't turn this into a "It's just pr0n so who cares." Some is pr0n, some isn't, the most consistent thing is humor.

Many ideas needed the original image to find the better quality of the copy and some asked where I get these images from. These are linked in that I get the images from the USENET, from forums and from artists' galleries. This means that there's only a small set, from the artists' galleries, that I know are original. Others may be original but it may not be the original that comes to me first. On occasion, an artist may even publish the same image in different forms depending on the limitations of the different forums he frequents.

There were some ideas that were nicely different from the directions I was following that they'll give me more to think about.

I'll also acknowledge those who said that how the image is represented is less important than what the image represents. That's quite true but if I have a machine that can find the best representation of something I enjoy then why not use it.

Re: AI matching by neonsignal · 2009-07-16 16:46 · Score: 1

yeah, that's the problem with learning systems like neural nets - it is hard to be sure which variations they are 'focusing' on - figure or ground.

win32 GQView by u64 · 2009-07-16 16:54 · Score: 1

First I 'jpegtran' all files to even out different compression methods. Then 'fdupes' and delete all identical files.
In Windows: for /R .\ %%1 in (*.jpg *.jpeg) do jpegtran -optimize -perfect -copy none -progressive "%%1" "%%1"
Duplicate File Finder (Empty RecycleBin before to avoid confusion)
Then I use GQView (exists for both Linux and win32). Set Preferences, Advanced, Custom Similarity to 98% to begin with. GQView Menu, New Collection, Load list of files and select Compare.

BONUS DISK-SPACE: jscl.exe -d -j -n -r -s *.jpg
hihi

Re:win32 GQView by Woek · 2009-07-16 18:29 · Score: 1

I use gqview as my standard image viewer in gnome, and I also have good experience with the duplicate search function. Works quite well!

This is the subject of many studies by Anonymous Coward · 2009-07-16 17:12 · Score: 0

This is a very interesting question!! (excuse me in advance for my english)

As mentioned in the previous posts, very simple mathematical equations can give you measures of the quality of an image. For instance, the 3 most popular are:

- Root mean square
- Mean absolute difference
- Peak signal to noise ratio
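
A small Python sketch of the three measures listed above, assuming two same-sized images given as flat lists of 8-bit grayscale values (so the peak value is 255); the function names are just illustrative.

```python
import math


def rmse(a, b):
    """Root mean square error between two equally sized pixel lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def mean_abs_diff(a, b):
    """Mean absolute difference between two equally sized pixel lists."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def psnr(a, b, peak=255):
    """Peak signal-to-noise ratio in dB; infinite when the images are identical."""
    mse = sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
    if mse == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / mse)
```

Lower RMSE and mean absolute difference mean the copy is closer to the reference; for PSNR, higher is closer.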

However, none of these can provide an accurate representation of the artefacts perceived by a human. I'm a student in Image Processing at the University of Sherbrooke. From what I know, there's a lot of research on "quality measurements" (especially one project with people from Texas University and Université de Lyon) from which we expect promising results.

Until then, you can still use some old tricks. Chop off where it's the least perceivable.
- Translate RGB channels to YUV. Chop on the chrominance and keep the luminance. We tend to be more sensitive to the latter.
- Chop on high frequencies using a logarithmic filter. We're more sensitive to small variations at lower frequencies.

All of the terms and concepts can be found with a quick search on google / wikipedia.

Also, take a look at the JPEG 2000 format. Its usage of the wavelet transform leaves a lot fewer artefacts for a given compression ratio.
PGF (Progressive Graphics File) is similar to JPEG 2000, a bit faster on compression, and leaves a few more artefacts.

However, some old tips are still

Try VisiPics (freeware.) by Anonymous Coward · 2009-07-16 17:24 · Score: 0

Try VisiPics (freeware.)

Structural Similarity Index Method (SSIM) by Paridel · 2009-07-16 18:31 · Score: 2, Interesting

In general your best bet would be to use an image quality metric that takes into account how the human visual system works. The 2D frequency response of the human eye looks something like a diamond, which means that we see vertical and horizontal frequencies better than diagonal ones.

In fact, most image compression techniques (including JPEG) take this into account, however, conventional ways of determining the noise in images (minimum mean squared error, peak signal to noise, root mean squares) don't factor in the human visual system.

Your best bet is to use something like the structural similarity method (SSIM) by Prof. Al Bovik of UT Austin and his student Prof. Zhou Wang (now at the University of Waterloo).

You can read all about SSIM and get example code here: http://www.ece.uwaterloo.ca/~z70wang/research/ssim/

Or read more about image quality assessment at Prof. Bovik's website: http://live.ece.utexas.edu/research/Quality/index.htm

If you don't care about how it works, and just want to use it, you can get example code for ssim in matlab at that website and C floating around the net. The method is easy to use; essentially the ssim function takes two images and returns a number between 0 and 1 that describes how similar the images are. Given two compressed images and the original image, take the SSIM between each and the original. The compressed image with the higher SSIM value is the "best".

It sounds like for your problem you might NOT have the original uncompressed image. In that case you might try checking for minimal entropy or maximum contrast in your images.

Essentially entropy would be calculated as:

h = histogram(Image);
p = h./(number of pixels in image);
p = p(p > 0);
entropy = -sum(p.*log2(p));

You will need to make sure you scale the image appropriately and skip empty histogram bins, since log2(0) is undefined! Or better yet, you should be able to find code for image entropy and contrast on the web. Just try searching for entropy.m for a matlab version.
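
For those without matlab, roughly the same entropy calculation as a Python sketch, assuming the image is a flat list of integer pixel values (the function name is illustrative). Counting only the values that actually occur sidesteps the empty-bin problem.

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Shannon entropy, in bits per pixel, of the pixel value histogram."""
    counts = Counter(pixels)  # only values that occur, so no log2(0)
    n = len(pixels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

An 8-bit image scores between 0 (a single flat value) and 8 (all 256 values equally common).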

Good luck!

NASA by ei4anb · 2009-07-16 19:38 · Score: 1

Hello, is that you NASA ?

Compare them mathematically by Anonymous Coward · 2009-07-16 19:43 · Score: 0

Matlab. Though you need to have the original picture to compare. One thing is though that mathematical difference does not correlate with image quality. By reducing the resolution of the chrominance channels (e.g. half resolution for color, full resolution for luminance), you can get a much smaller image, and you cannot easily see the difference. So image quality is always subjective.

shameless plug by pyropunk51 · 2009-07-16 20:06 · Score: 1

I'm assuming you want to automatically/programmatically discard the one with the least/most artifacts. In this case there are very few programs around, but I'm working on a rules engine for my program that may be able to help you in future. Please evaluate DuMP3 at http://dump3.sourceforge.net/ to see if it may suit your needs.

--
double penetration; //ouch

Oooh! Lots of big words! by Anonymous Coward · 2009-07-16 20:21 · Score: 0

But you don't know what you're talking about, and you're wrong. I was wondering how long it would be before some idiot thought this article would be a good excuse to reel off some buzz words they read in a book once but didn't really understand, in the hopes of looking intelligent...and here you are!

Welcome to Pattern Recognition by Yamavu · 2009-07-16 22:00 · Score: 1

I take it that you want to extract and compare features of the actual jpeg image, regardless of quality. There are many ways to do that and none of them includes filesize comparisons or the like. You could look in the JPEG Standard and try to filter out compression by just reading the base (DC) value of every 8x8 block (that's the one least affected by compression) and compare these values for similarity. However you should aim for more advanced image recognition and comparison algorithms, for example the ones used on TinEye. Most of these algorithms come from the field of AI, but they're quite simple generally.

inverse DCT comparison by Anonymous Coward · 2009-07-16 22:09 · Score: 0

JPEG images are built of 8x8 blocks. Those are then DCT'd (Discrete Cosine Transform) in order to get the block's frequency spectrum. The element x=1, y=1 is the so-called DC channel (like Direct Current). It is proportional to the average of the whole 8x8 block. The other positions are frequencies increasing with the position (each step adds half an oscillation across the block, e.g. pos x=3 is one whole oscillation, where x=5 is two oscillations).

Now to the task. If you look at the DC component, and the other components are relatively small, this means that there is not much information in this block (e.g. if it is just a blue spot in a picture with the sky). However, if you have two similar pictures, you can compare block by block. The picture which has higher components in the higher x and y values will be the one with the better quality, since high frequency means: high details.

Of course, implementing this would be difficult. There is not just the DCT involved, but also a zip-like entropy coding step, and the actual compression is done by "rounding" the components' values to integers (since the DCT itself doesn't do any compression).

Maybe one could adapt a JPEG library by inserting some code in the decompression algorithm which creates a "fingerprint" of the individual blocks, and then compare it with the other picture's fingerprint. I think the result should really tell the quality difference.
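
A rough sketch of the block-by-block idea, in Python. This is not a patched JPEG library: it assumes you have already decoded both pictures to grayscale pixel rows of the same size (lists of lists of numbers), and it recomputes the DCT on those pixels instead of reading the coefficients stored in the file. The function names and the `cutoff` parameter are made up for illustration.

```python
import math

N = 8  # JPEG block size


def dct_1d(vec):
    """Orthonormal 8-point DCT-II of a sequence of 8 numbers."""
    out = []
    for k in range(N):
        s = sum(v * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                for n, v in enumerate(vec))
        scale = math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
        out.append(scale * s)
    return out


def dct_2d(block):
    """2-D DCT of an 8x8 block: transform each row, then each column."""
    rows = [dct_1d(r) for r in block]                       # rows[y][u]
    cols = [dct_1d([rows[y][u] for y in range(N)])          # cols[u][v]
            for u in range(N)]
    return [[cols[u][v] for u in range(N)] for v in range(N)]


def high_freq_energy(pixels, cutoff=4):
    """Average, over all full 8x8 blocks, of the summed squared DCT
    coefficients whose indices satisfy u + v >= cutoff (0-based)."""
    h, w = len(pixels), len(pixels[0])
    total, count = 0.0, 0
    for by in range(0, h - N + 1, N):
        for bx in range(0, w - N + 1, N):
            block = [row[bx:bx + N] for row in pixels[by:by + N]]
            c = dct_2d(block)
            total += sum(c[v][u] ** 2
                         for v in range(N) for u in range(N)
                         if u + v >= cutoff)
            count += 1
    return total / count if count else 0.0
```

The copy with the larger `high_freq_energy` would be the candidate for "more detail kept". This only makes sense for two versions of the same picture at the same resolution and with the same block alignment; noise or sharpening added later would also raise the number.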

Cheers

KISS by SNACKeR · 2009-07-16 23:36 · Score: 1

If you know you have the original files, the file with the oldest date has the best quality. Else, go by file size first, and break ties using the oldest date as the winner.
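
For what it's worth, that rule is only a few lines of Python. A minimal sketch, assuming "date" means the filesystem modification time (which copying or downloading may well have reset); `pick_best` and `originals_known` are names invented here.

```python
import os


def pick_best(paths, originals_known=False):
    """Pick one file from a group of duplicates.

    If originals_known is True, the oldest modification time wins.
    Otherwise the largest file wins, with ties going to the oldest file.
    """
    def key(path):
        st = os.stat(path)
        if originals_known:
            return (st.st_mtime,)
        # Negate the size so that min() prefers the biggest file.
        return (-st.st_size, st.st_mtime)

    return min(paths, key=key)
```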

Interpolative Comparison by Anonymous Coward · 2009-07-17 00:13 · Score: 0

You could write a little app to interpolate across spaces in the JPEG and then compare the resulting differences from interpolated and actual data for each JPEG image. Presumably, the image with more JPEG compression artifacts will have a higher (on average) difference between interpolated values and actual values because of the random artifacts which will throw off interpolation.

How finely grained your interpolation needs to be may be something you will have to experiment with... but I think this should work fairly reliably in theory.
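
Something like this, as a sketch in Python. It assumes a decoded grayscale image as a list of pixel rows, and it uses the simplest interpolation I can think of (predict each pixel from its four direct neighbours); the function name is made up, and a coarser or smarter interpolation might work better.

```python
def interpolation_residual(pixels):
    """Mean absolute difference between each interior pixel and the
    average of its four direct neighbours (up, down, left, right)."""
    h, w = len(pixels), len(pixels[0])
    total, count = 0.0, 0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            predicted = (pixels[y - 1][x] + pixels[y + 1][x] +
                         pixels[y][x - 1] + pixels[y][x + 1]) / 4
            total += abs(pixels[y][x] - predicted)
            count += 1
    return total / count if count else 0.0
```

Run it on both copies and compare the two numbers. Keep in mind that genuine fine detail also throws off the interpolation, so a heavily smoothed copy could score "better" than a sharp one; the number is only a hint when both copies show the same picture at the same size.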

compare with a lower quality image by Anonymous Coward · 2009-07-17 00:45 · Score: 0

It seems to me it would work best if you had something to compare it to.
Since you don't have the original, how about looking at it from this point of view:
take each of the two images, reprocess them at the lowest JPEG quality (producing the most artifacts), and see which original image is closer to its reprocessed version.
The other one should then be the higher-quality image.

AI Algorithms by Cassini2 · 2009-07-17 01:50 · Score: 1

I heard a similar story about an auto-tracking algorithm used for aiming cameras. It would happily follow the red car, but then it saw the red garbage dumpster. It never moved after seeing the red garbage dumpster.

The truth is that the AI algorithms are absolutely notorious for keying in on unanticipated patterns. For the AI algorithms to work, you need to verify they are doing what you expect. Depending on your choice of algo, this can be really tough.

Eyeballing it works best by Anonymous Coward · 2009-07-17 02:16 · Score: 0

Just looking at the image works best, especially when you have to judge between, for example, an image that has higher resolution and one that has fewer artifacts. The only way you can really tell which one will look best to you is by looking at it.

Sorting steps to find originals by rwa2 · 2009-07-17 03:43 · Score: 2, Informative

You probably don't necessarily want to find the "best quality" image, but rather the image that was closest to the original.

I take it you're either trying to eliminate the low-quality duplicates or thumbnails from a really large collection of pr0n, or trying to write an image search engine that tries to present the "best" rendition of a particular image first.

As a quick first pass (after you've run through to collect all the similar images into separate groups), you'd obviously want to find the version of the image with the highest resolution. This might let you easily throw out thumbnails or scaled down versions you might come across. Of course, some dorks will upscale images and post them somewhere, so you might still want to hang on to some of them for the second stage.
For the second pass, you'd likely want to scan through the metadata first, especially stuff exposed by EXIF. So you'd want to give higher scores to EXIF data that makes it sound like it came directly off a digital camera or scanner, and bump down the desirability of pictures that appeared to have been edited by any sort of photo editing software.
Then maybe you want to look at something that would rank down watermarks or other modifications.
Another step would be to compare compression quality, but I think that's what most of the other posts are concentrating on. But this is a difficult step because it can be easily fooled, since idiots can re-save a low quality image with the compression quality cranked all the way up so the file size becomes high even though the actual image quality is worse than the original. You probably need to run it through one of those "photoshop detectors" that could tell you whether the image has been through smoothing or other filters in a photo editor. The originals (especially in raw format and maybe high quality JPEG) will have a certain type of CCD noise signature that your software might be able to detect. In the same vein, a poorly-compressed JPEG will have lots of JPEG quantization artifacts that your software might be able to detect as well. Otherwise, you're kinda left with zooming in on pics and eyeballing it.
Finally you might be left with a group of images that are exactly the same but have different file names... you probably want some way to store some of the more useful bits of descriptive text as search/tag metadata, but then choose the most consistent file naming convention or slap on your own based on your own metadata.
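
The first two passes could be sketched roughly like this in Python. It assumes the width, height and the EXIF "Model" and "Software" fields have already been read by whatever EXIF tool you use; the `Candidate` class, the `EDITOR_HINTS` list and the ordering of the tie-breakers are illustrative guesses, and upscaled images will still fool the resolution pass.

```python
from dataclasses import dataclass
from typing import Optional

# Substrings that suggest an editor touched the file (illustrative only).
EDITOR_HINTS = ("photoshop", "gimp", "paint")


@dataclass
class Candidate:
    name: str
    width: int
    height: int
    camera_model: Optional[str] = None  # EXIF "Model", if present
    software: Optional[str] = None      # EXIF "Software", if present


def rank_key(c):
    """Sort key: more pixels first, then camera EXIF, then unedited."""
    edited = c.software is not None and any(
        hint in c.software.lower() for hint in EDITOR_HINTS)
    return (c.width * c.height, c.camera_model is not None, not edited)


def rank(candidates):
    """Return candidates ordered from most to least likely original."""
    return sorted(candidates, key=rank_key, reverse=True)
```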

Hopefully this gives you a start to important parts of the process that you might have overlooked...

Visipics by Anonymous Coward · 2009-07-17 05:30 · Score: 0

I've used VisiPics (google it or just add .info). It works very well for me. It'll scan the directories you choose, check for duplicate photos, display them (allowing you to compare them), and give you the option to move or delete either or all.

A few ideas by petermgreen · 2009-07-17 09:29 · Score: 1

1: count how many unique values there are for each DCT coefficient. If you only find a small number then it probably means the image has been through low quality JPEG compression. This method may be fooled if the image has been cropped in a way that changes the block boundaries though.
2: check for excessive high frequency noise, this may indicate the image has been dithered in the past. OTOH excessively low high frequencies may indicate heavy JPEG compression.
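
Idea 1 could look something like this in Python. The sketch assumes you already have the image as a list of 8x8 blocks of integer DCT coefficients (for example by re-running a DCT on the decoded pixels and rounding, or from a JPEG library that exposes them); how you get them is left open, and the function names are made up.

```python
def distinct_values_per_coefficient(blocks):
    """For each of the 64 coefficient positions, count how many distinct
    values occur across all blocks. Returns an 8x8 grid of counts."""
    seen = [[set() for _ in range(8)] for _ in range(8)]
    for block in blocks:
        for v in range(8):
            for u in range(8):
                seen[v][u].add(block[v][u])
    return [[len(seen[v][u]) for u in range(8)] for v in range(8)]


def mean_distinct(blocks):
    """Average number of distinct values over the 64 positions."""
    counts = distinct_values_per_coefficient(blocks)
    return sum(sum(row) for row in counts) / 64
```

A copy whose counts are much lower than another copy of the same picture has probably been quantized harder at some point. What counts as "a small number" would have to be found by experiment.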

IMO storage is cheap so what I would do is make a database which could index the various copies of each image. You could have things arranged so there was one version the software considered "probably best" but if you really needed the best quality copy you could go back and check manually.

--
note: i'm known as plugwash most places but i screwed up registering that here somehow in the past and now can't register
