Slashdot Mirror


Choosing Better-Quality JPEG Images With Software?

kpoole55 writes "I've been googling for an answer to a question and I'm not making much progress. The problem is image collections, and finding the better of near-duplicate images. There are many programs, free and costly, CLI or GUI oriented, for finding visually similar images — but I'm looking for a next step in the process. It's known that saving the same source image in JPEG format at different quality levels produces different images, the one at the lower quality having more JPEG artifacts. I've been trying to find a method to compare two visually similar JPEG images and select the one with the fewest JPEG artifacts (or the one with the most JPEG artifacts, either will serve.) I also suspect that this is going to be one of those 'Well, of course, how else would you do it? It's so simple.' moments."

67 of 291 comments

  1. Easy by Anonymous Coward · · Score: 3, Interesting

    Paste both images in your image editor of choice, one layer on top of each other, apply a difference/subtraction filter.

    1. Re:Easy by Random+Destruction · · Score: 3, Insightful

      Ok, so you know how two images differ. Which one is closer to the original? You don't know, because you don't have the original to compare.

      --
      :x
  2. File size by Tanman · · Score: 2, Insightful

    it is lossy compression, after all . . .

    1. Re:File size by Robotbeat · · Score: 4, Informative

      File size doesn't tell you everything about quality.

      For instance, if you save an image as a JPEG vs. first saving as a dithered GIF and _then_ saving as JPEG, then the second one will have much worse actual quality, even if it has the same filesize (it may well have worse quality AND have a larger file size).

    2. Re:File size by teko_teko · · Score: 3, Insightful

      File size may not be accurate if it has been converted multiple times at different quality, or if the source is actually lower quality.

      The only way to properly compare is if you have the original as the control.

      If you compare between 2 different JPEG quality images, the program won't know which parts are the artifacts. You still have to decide yourself...

    3. Re:File size by Anonymous Coward · · Score: 3, Insightful

      File size doesn't tell you anything. If I take a picture with a bunch of noise (eg. poor lighting) in it then it will not compress as well. If I take the same picture with perfect lighting it might be higher quality but smaller file size.

      Why this is modded up, I don't know. Too many morons out there.

    4. Re:File size by Shikaku · · Score: 4, Informative

      http://linux.maruhn.com/sec/jpegoptim.html

      No. You can compress JPEG lossless.

    5. Re:File size by Vectronic · · Score: 2, Interesting

      Also, stuff like Photoshop, will insert a bunch of meta/exif-bullshit but something like Paint, doesn't... it's usually only about 2 to 3kb, but it's still tainting your results if you are going by size alone.

    6. Re:File size by Score+Whore · · Score: 5, Informative

      ...THERE IS NO LOSSLESS JPEG. PERIOD.

      Except for Lossless JPEG standardized in 1993. But other than that, no there is no lossless jpeg.

    7. Re:File size by Qzukk · · Score: 2, Insightful

      actually one of the meta values that is stored is a quality indicator.

      And when you save a max quality copy of a min quality jpeg, the picture still looks like crap.

      --
      If I have been able to see further than others, it is because I bought a pair of binoculars.
    8. Re:File size by Chyeld · · Score: 5, Interesting

      There was an old story my AI teacher used to share back in college about a military contractor that was developing an AI-based IFF (identification, friend or foe) system for aircraft.

      They trained it using what was, at the time, a vast picture database of every aircraft known. In the lab, they were able to get it down to 99% accurate, with the error favoring 'unknown' as the third option.

      So they took it out for a test run. The first night out the system tried firing on anything and everything it could lock on, including ground targets.

      This was bad. Horribly bad. But they were certain that there was some sort of equipment failure going on. After all their AI was damn near perfect at ID'ing the targets in the lab, the issues must be up the line somewhere.

      So they did a once over of the equipment and couldn't find a problem. Not sure what to do next the team took the system out for another dry run the next day. This time, the system refused to see any ground targets and anything it saw in the air was friendly.

      Now this was getting ridiculous, the team was extremely confused. So they did what they should have done the first time around, they did a third test run looking at what the AI was actually 'thinking'.

      And promptly discovered the problem. While they had a huge database of images to use, they realized that all their 'friendly' craft had pictures taken during the day, while in flight. All their 'hostile' craft, however, had pictures that had been taken at night during spy runs or from overhead satellite shots.

      The AI wasn't keying off the planes, it was keying off whether it was daytime or night time.

      I don't know if the above actually ever happened, but my point is, it doesn't matter how many images you seed your database with. Unless you are there to tell it what is an artifact and what is just part of the picture, you are going to end up with horrible and comical results.

    9. Re:File size by mezis · · Score: 2, Interesting

      Every single JPEG is lossy, for three reasons:

      a. Source (color) digital images use RGB colorspace (typically, the raw format is "RAW" with a Bayer layout). JPEG compresses three planes, with a YCrCb colorspace.
      Due to colorspace conversion and quantization error, you lose information. That's called "lossy".
      b. Even in lossless JPEG, each 64-pixel block is KR-transformed and quantized. Again, always lossy.
      c. No free lunch.

      Typically, even lossless JPEG makes you lose 1-2% of the total information (measured via image entropy). Things are slightly better with lossless JPEG2000. Both are *perceptually* lossless.

    10. Re:File size by Anonymous Coward · · Score: 2, Funny

      >>Except for Lossless JPEG [wikipedia.org] standardized in 1993. But other than that, no there is no lossless jpeg.

      Katie Couric: What did John McCain do to try to stop the housing meltdown?
      Sarah Palin: He voted for legislation to more carefully regulate Fannie Mae and Freddie Mac to stop bad lending practices.
      Katie Couric: ...
      Katie Couric: Well, besides that, what did he do?
      Sarah Palin: ?

      And the funny thing is, we all remember this now as Sarah Palin not knowing the answer to the question, when it was really Katie Couric who was the fucktard that didn't know about lossless jpeg.

      True story.

    11. Re:File size by nabsltd · · Score: 3, Insightful

      Unfortunately, that's a subjective term based on the 'codec' used to make the jpg. Not everyone's 100 is the same nor is everyone working off the same scale (i.e. 1-10 vs 1-100).

      In addition, I bought a program (Windows only, sorry) that allows the user to pick the areas of the image that need the most bits. Basically, it allows you to pick the quality for any arbitrary region (using standard selection tools like lasso) when saving the JPEG.

      I mostly got it for the batch processing and its excellent image quality when you set it to minimum compression.

    12. Re:File size by Minwee · · Score: 4, Funny

      And a squad of kangaroos firing RPGs.

    13. Re:File size by sbeckstead · · Score: 2, Insightful

      But another bit of meta data there is "generation" so at least you could see how far it went from the place it started. The meta data actually has a purpose and people that process images without preserving it should be shot. And if the image hasn't got meta data and you are a professional you won't use it anyway. I hate tools like Paint because they destroy all that beautiful meta data you could have used to make this determination much easier. Assuming of course that image was generated and stored by someone who used the meta data in the first place. Alas you may be hosed here.

    14. Re:File size by timeOday · · Score: 5, Insightful

      This is the kind of problem you can solve in 2 minutes with 95% accuracy (by using file size), or never finish at all by listening to all the pedants on slashdot. When people know a little too much they love to go on about stuff like entropy and information gain, just because they (sort of) can.

      Try file size on the set of images of interest to you and see if it coincides with your intuition. If it does, you're done.
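
      A minimal sketch of that file-size check in Python. The function name and the `size_of` parameter are made up for illustration; by default it asks the filesystem for each file's size in bytes.

```python
import os

def pick_largest(paths, size_of=os.path.getsize):
    """Return the path whose file is largest in bytes.

    size_of is injectable so the idea can be tried without real files;
    by default it reads the size from disk.
    """
    return max(paths, key=size_of)
```

      Called as pick_largest(["a.jpg", "b.jpg"]) on a group of near-duplicates, it keeps the biggest one. Whether "biggest" matches "best" is exactly what the rest of this thread argues about.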

    15. Re:File size by Binary+Boy · · Score: 4, Informative

      Lossless JPEG and lossless JPEG2000 are both exactly that - lossless. Not perceptually lossless, which is what people often use to refer to high-quality, lossy JPEG/JPEG2000, or JPEG-LS. Lossless JPEG uses a PCM-like encoder, not DCT, AFAIR. Lossless JPEG and lossless JPEG2000 are, in fact, lossless, at least with regards to image data in supported color spaces. This is in part a result of *not* converting to YCrCb, since that conversion is lossy, of course. Not all Lossless JPEGs are 8bit YCrCb.

      Accusoft, for one, has a toolkit for building lossless JPEG applications which supports 16bit RGB and greyscale lossless JPEG modes.

      The near-lossless JPEG you're thinking of is JPEG-LS, which is perceptually lossless, and guarantees a maximum error rate that is generally negligible for almost all applications. This format gets better compression ratios than Lossless JPEG, of course.

      Neither the lossless nor the near-lossless JPEG modes are common though, outside of niche apps. Lossless JPEG2000 is, however, since almost all JPEG2000 libraries support it alongside the lossy modes.

    16. Re:File size by Chyeld · · Score: 2, Interesting

      I always wondered if that one wasn't an urban legend too, but apparently it was mostly true:

      The reuse of some object-oriented code has caused tactical headaches for Australia's armed forces. As virtual reality simulators assume larger roles in helicopter combat training, programmers have gone to great lengths to increase the realism of their scenarios, including detailed landscapes and - in the case of the Northern Territory's Operation Phoenix - herds of kangaroos (since disturbed animals might well give away a helicopter's position).

      The head of the Defense Science & Technology Organization's Land Operations/Simulation division reportedly instructed developers to model the local marsupials' movements and reactions to helicopters. Being efficient programmers, they just re-appropriated some code originally used to model infantry detachment reactions under the same stimuli, changed the mapped icon from a soldier to a kangaroo, and increased the figures' speed of movement.

      Eager to demonstrate their flying skills for some visiting American pilots, the hotshot Aussies "buzzed" the virtual kangaroos in low flight during a simulation. The kangaroos scattered, as predicted, and the visiting Americans nodded appreciatively... then did a double-take as the kangaroos reappeared from behind a hill and launched a barrage of Stinger missiles at the hapless helicopter. (Apparently the programmers had forgotten to remove that part of the infantry coding.)

      The lesson?

      Objects are defined with certain attributes, and any new object defined in terms of an old one inherits all the attributes. The embarrassed programmers had learned to be careful when reusing object-oriented code, and the Yanks left with a newfound respect for Australian wildlife. Simulator supervisors report that pilots from that point onward have strictly avoided kangaroos, just as they were meant to.

      Now the real story, with the Urban Myth removed...

      On Friday DSD told the story of the killer kangaroos. Now we know the truth. And it is even weirder: the kangaroos threw beach balls!

      Dr Anne-Marie Grisogono, Head, Simulation Land Operations Division at the Australian DSTO has told us what actually happened and we are delighted to set the record straight.

      "I related this story as part of a talk on Simulation for Defence, at the Australian Science Festival on May 6th in Canberra. The Armed Reconnaissance Helicopter mission simulators built by the Synthetic Environments Research Facility in Land Operations Division of DSTO, do indeed fly in a fairly high fidelity environment which is a 4000 sq km piece of real outback Australia around Katherine, built from elevation data, overlaid with aerial photographs and with 2.5 million realistic 3d trees placed in the terrain in those areas where the photographs indicated real trees actually exist.

      "For a bit of extra fun (and not for any strategic reason like kangaroos betraying your cover!) our programmers decided to put in a bit of animated wildlife. Since ModSAF is our simulation tool, these were modelled on ModSAF's Stinger detachments so that the associated detection model could be used to determine when a helo approached, and the behaviour invoked by such contact was set to 'retreat'. Replace the visual model of the Stinger detachment in your stealth viewer with a visual model of a kangaroo (or buffalo...) and you have wildlife that moves away when approached. It is true that the first time this was tried in the lab, we discovered that we had forgotten to remove the weapons and the 'fire' behaviour.

      "It is NOT true that this happened in front of a bunch of visitors (American or any other flavour). We don't normally try things for the first time in front of an audience! What I didn't relate in the talk is that since we were not at that stage interested in weapons, we had not set any weapon or projectile types, so what the kangaroos fired at us was in fact the default object f

  3. Try compressing both further by Ed+Avis · · Score: 2, Insightful

    I suppose you could recompress both images as JPEG with various quality settings, then do a pixel-by-pixel comparison computing a difference measure between each of the two source images and its recompressed version. Presumably, the one with more JPEG artefacts to start with will be more similar to its compressed version, at a certain key level of compression. This relies on your compression program generating the same kind of artefacts as the one used to make the images, but I suppose that cjpeg with the default settings has a good chance of working.

    Failing that, just take the larger (in bytes) of the two JPEG files...
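
    A toy sketch of the recompress-and-compare idea in Python. It is not a real JPEG codec: plain rounding to a multiple of a step stands in for JPEG quantization, and the sample values and function names are made up. It only shows why the copy that was already compressed harder changes less when compressed again at the same "key level".

```python
def quantize(values, step):
    """Round each value to the nearest multiple of step (toy stand-in for JPEG quantization)."""
    return [round(v / step) * step for v in values]

def recompress_error(values, step):
    """Mean absolute change caused by quantizing values again with step."""
    requantized = quantize(values, step)
    return sum(abs(a - b) for a, b in zip(values, requantized)) / len(values)

source = [3, 17, 42, 58, 71, 96, 113, 127]  # made-up "original" values
high_quality = quantize(source, 2)          # lightly compressed copy
low_quality = quantize(source, 16)          # heavily compressed copy

# Recompress both copies at the coarse step and see which one moves less.
error_high = recompress_error(high_quality, 16)
error_low = recompress_error(low_quality, 16)
```

    Here the heavily compressed copy does not change at all, while the lightly compressed one does. With real files the same comparison would need an actual JPEG encoder such as cjpeg, and the caveat above about matching the original encoder's artefacts still applies.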

    --
    -- Ed Avis ed@membled.com
  4. Re:AI problem? by Robotbeat · · Score: 3, Interesting

    ...it will simply require a human-level brain.

    How about Amazon's Mechanical Turk service?
    https://www.mturk.com/

  5. ImageMagick can give you EXIF data. by bcrowell · · Score: 4, Informative

    The ImageMagick package includes a command called identify, which can read the EXIF data in the JPEG file. You can use it like this:

    identify -verbose creek.jpg | grep Quality

    In my example, it gave " Quality: 94".

    This will not work on very old cameras (from ca. 2002 or earlier?), because they don't have EXIF data. This is different info than you'd get by just comparing file sizes. The JPEG quality setting is not the only factor that can influence file size. File size can depend on resolution, JPEG quality, and other manipulations such as blurring or sharpening, adjusting brightness levels, etc.

    1. Re:ImageMagick can give you EXIF data. by DotDotSlasher · · Score: 3, Informative

      imagemagick can also compare two images, and tell you how different they are. That is -- quantify the differences by returning a floating point number or two (PSNR, RMSE) in a way that a more-compressed JPEG image will return a correspondingly different floating point value. I know the question concerns two JPEG-compressed images, but if you do have an original image -- and you want to test which is closest to the original, ImageMagick can do that. Use the ImageMagick compare function.
      See http://www.imagemagick.org/script/compare.php
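
      For reference, the two numbers mentioned above are easy to compute by hand. This is a sketch of the standard RMSE and PSNR formulas in Python, assuming both images have already been decoded into flat lists of 8-bit sample values of equal length (the decoding step is not shown, and this is not ImageMagick's own code).

```python
import math

def rmse(a, b):
    """Root-mean-square error between two equal-length sample sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def psnr(a, b, peak=255):
    """Peak signal-to-noise ratio in dB; infinite when the inputs are identical."""
    error = rmse(a, b)
    return math.inf if error == 0 else 20 * math.log10(peak / error)
```

      A lower RMSE, or a higher PSNR, against the original means the copy is closer to it.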

      Also, GIMP (www.gimp.org) is able to look at an image and approximate what JPEG compression quality setting was used, and use that same quality setting to save an output JPEG copy of the image. So -- they have some algorithm inside of their application which takes an image and returns (a good guess of) the corresponding jpeg quality value.
      Of course, this does not help you if the image was saved with a lousy JPEG quality value, like 10/100, and later saved at a much higher value, like 98/100. Since the algorithm only sees the last image, it would tell you the quality value is 98/100, even though the contents of the image would indicate the results of 10/100 compression, because of multi-generational lossy compression.

  6. Translation: Please help me with my porn... by Chyeld · · Score: 5, Insightful

    Dear Slashdot,

    Recently I checked my porn drive and realized that I have over 50 gigabytes of jpg quality porn collected. Unfortunately, I've noticed that a good portion of these are all the same picture of Natalie Portman eating hot grits. Could you please point me to a free program that will allow me to find the highest resolution, best quality version of this picture from my collection and delete the rest?

    Many Thanks!

  7. use the JPEG underlying details by cellurl · · Score: 2, Insightful

    To make a JPEG, you cut it into blocks, run the DCT on each block and mess with the 4:2:2 color formula and pkzip the pieces... That said, I would think measuring the number of blocks would be related to number of artifacts... In my barbaric approach to engineering, (assuming there is no other suggested way on slashdot), I would get the source code to the JPEG encoder/decoder and print out statistics (number of blocks, block size) of each image...

  8. It's easy by Anonymous Coward · · Score: 5, Insightful

    Run the DCT and check how much it's been quantized. The higher the greatest common factor, the more it has been compressed.

    Alternatively, check the raw data file size.
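
    A small sketch of the greatest-common-factor check in Python, assuming you already have the dequantized integer DCT coefficients for one frequency position collected across all blocks (getting them out of the file is not shown). The function name is made up.

```python
from functools import reduce
from math import gcd

def estimate_quant_step(coeffs):
    """Estimate the quantization step as the GCD of the nonzero coefficients.

    Returns 0 when every coefficient is zero, since nothing can be inferred.
    The result is a multiple of the true step, so treat it as an upper bound.
    """
    nonzero = [abs(c) for c in coeffs if c != 0]
    return reduce(gcd, nonzero, 0)
```

    For example, coefficients 24, -48, 0 and 72 give an estimate of 24. The image whose coefficients share the larger common factor was quantized more coarsely.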

  9. quantization tables by angryargus · · Score: 3, Insightful

    Others have mentioned file size, but another good approach is to look at the quantization tables in the image as an overall quality factor. E.g., JPEG over RTP (RFC 2435) uses a quantization factor to represent the actual tables, and the value of 'Q' generally maps to quality of the image. Wikipedia's doc on JPEG has a less technical discussion of the topic, although the Q it uses is probably different from the example RFC.

  10. Measure sharpness? by Anonymous Coward · · Score: 4, Interesting

    Compute the root-mean-square difference between the original image and a gaussian-blurred version?
    JPEG tends to soften details and reduce areas of sharp contrast, so the sharper result will probably
    be better quality. This is similar to the PSNR metric for image quality.

    Bonus: very fast, and can be done by convolution, which optimizes very efficiently.
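
    A rough sketch of this sharpness score in Python. Two assumptions to flag: the image is a grayscale list of rows of numbers, and a 3x3 box blur stands in for the Gaussian blur to keep the code short. Function names are made up.

```python
import math

def box_blur(img):
    """3x3 box blur; pixels at the border average over the neighbours that exist."""
    height, width = len(img), len(img[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total = count = 0
            for yy in range(max(0, y - 1), min(height, y + 2)):
                for xx in range(max(0, x - 1), min(width, x + 2)):
                    total += img[yy][xx]
                    count += 1
            out[y][x] = total / count
    return out

def sharpness(img):
    """RMS difference between an image and its blurred version (higher = sharper)."""
    blurred = box_blur(img)
    squared = sum((p - b) ** 2
                  for row, blurred_row in zip(img, blurred)
                  for p, b in zip(row, blurred_row))
    return math.sqrt(squared / (len(img) * len(img[0])))
```

    Score both near-duplicates and keep the one with the higher value, as long as both are the same size and show the same content.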

    1. Re:Measure sharpness? by uhmmmm · · Score: 3, Insightful

      Even faster is look at the DCT coefficients in the file itself. Doesn't even require decoding - JPEG compression works by quantizing the coefficients more heavily for higher compression rates, and particularly for the high frequency coefficients. If more high frequency coefficients are zero, it's been quantized more heavily, and is lower quality.

      Now, it's not foolproof. If one copy went through some intermediate processing (color dithering or something) before the final JPEG version was saved, it may have lost quality in places not accounted for by this method. Comparing quality of two differently-sized images is also not as straightforward either.

  11. DCT by tomz16 · · Score: 4, Informative

    Just look at the manner in which JPEGs are encoded for your answer!

    Take the DCT (discrete cosine transform) of blocks of pixels throughout the image. Examine the frequency content of each of these blocks and determine the amount of spatial frequency suppression. This will correlate with the quality factor used during compression!

    1. Re:DCT by mikenap · · Score: 3, Insightful

      This seems to me the best suggestion, and there's a simple visual way to accomplish it! The hardest hit part of the image is going to be the chroma information, which your eye normally has reduced resolution sensitivity for in a normal scene. To overcome this, load your JPEGs into your favorite image editor and crank the saturation to the max(this throws away the luminance data). Now the JPEG artifacts in the chroma information will HIT YOU IN THE FACE, even in images that seemed rather clean before. Pick the least blocky of the two, and there you go!

    2. Re:DCT by eggnoglatte · · Score: 3, Insightful

      That works, but only if you have exact, pixel-to-pixel correspondence between the photos. It won't work if you just grab 2 photos from Flickr that both show the Eiffel tower, and you wonder which one is "better".

      Luckily, there is a simple way to do it: use jpegtran to extract the quantization table from each image. Pick the one with the smaller values. This can easily be scripted.

      Caveat: this will not work if the images have been decoded and re-coded multiple times.
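
      A sketch of such a script in Python that reads the tables straight out of the file bytes instead of calling an external tool. It relies on the standard JPEG marker layout (SOI is FF D8, DQT is FF DB, SOS is FF DA) and only looks at tables stored before the first scan. The function names are made up, and it is not hardened against malformed files.

```python
import struct

def read_quant_tables(data):
    """Return {table_id: [64 values in zigzag order]} from the bytes of a JPEG file."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG (missing SOI marker)")
    tables = {}
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("expected a marker at offset %d" % pos)
        marker = data[pos + 1]
        if marker == 0xFF:            # fill byte before a marker, skip it
            pos += 1
            continue
        if marker in (0xDA, 0xD9):    # start of scan or end of image: stop
            break
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        segment = data[pos + 4:pos + 2 + length]
        if marker == 0xDB:            # DQT segment, may hold several tables
            i = 0
            while i < len(segment):
                precision, table_id = segment[i] >> 4, segment[i] & 0x0F
                i += 1
                if precision == 0:    # 8-bit entries
                    values = list(segment[i:i + 64])
                    i += 64
                else:                 # 16-bit entries
                    values = list(struct.unpack(">64H", segment[i:i + 128]))
                    i += 128
                tables[table_id] = values
        pos += 2 + length
    return tables

def mean_quant_step(tables):
    """Average of all table entries; smaller means lighter quantization."""
    values = [v for table in tables.values() for v in table]
    return sum(values) / len(values)
```

      Read each file with open(path, "rb").read(), pass the bytes in, and keep the image with the smaller mean. The caveat about re-encoded images applies here too, since only the tables from the last save are stored.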

  12. use a "difference matte" by Anonymous Coward · · Score: 4, Informative

    load up both images in adobe after effects or some other image compositing program and apply a "difference matte"

    Any differences in pixel values between the two images will show up as black on a white background or vice versa...

    adam
    BOXXlabs

    1. Re:use a "difference matte" by uhmmmm · · Score: 2, Insightful

      So, that will show you which parts differ. How do you tell which is higher quality? Sure, you can probably do it by eye. But it sounds like the poster wants a fully automated method.

  13. Try ThumbsPlus by Anonymous Coward · · Score: 3, Informative

    ThumbsPlus is an image management tool. It has a feature called "find similar" that should do what you want as far as identifying two pictures that are the same except for the compression level. Once the similar picture is found you can use ThumbsPlus to look at the file sizes and see which one is bigger.

  14. Re:AI problem? by lunchlady55 · · Score: 4, Funny

    Oh sure, it starts out innocently enough - pick the better image. Next thing you know Skynet's decided that it's the better LIFE-FORM.

    AI - JUST SAY NO!

    Brought to you by the Coalition for Human Survival (C) Aug. 29, 1997

  15. Found it a while ago by sco08y · · Score: 5, Informative

    I mean, you don't want second rate pictures in your pr0n stash?

    I had problems building it back then, let alone writing the scripts for it and the hassle of figuring out which images were duplicates, but this utility seems to fit the bill.

  16. image quality measures by trb · · Score: 4, Informative

    google (or scholar-google) for Hosaka plots, or image quality measures. Ref:

    HOSAKA K., A new picture quality evaluation method.
    Proc. International Picture Coding Symposium, Tokyo, Japan, 1986, 17-18.

  17. Re:Filesize is a hint by thethibs · · Score: 4, Informative

    More Noise = Less Compression

    --
    I'm a Programmer. That's one level above Software Engineer and one level below Engineer.
  18. Blur Detection? by HashDefine · · Score: 2, Informative

    I wonder if out-of-focus or blur detection methods would give you a metric that varies with the level of JPEG artifacts. After all, JPEG artifacts should make things like edge detection more difficult, and those are the same things that blurry and out-of-focus images make more difficult.

    A Google search for blur detection should bring up things that you can try. Here is a series of posts that do a good job of explaining some of the work involved

  19. Fourier transform by maxwell+demon · · Score: 2, Interesting

    Assuming the only quality loss is due to JPEG compression, I guess a fourier transform should give you a hint: I think the worse quality image should have lower amplitude of high frequencies.

    Of course, that criterion may be misleading if the image was otherwise modified. For example noise filters will typically reduce high frequencies as well, but you'd generally consider the result superior (otherwise you wouldn't have applied the filter).

    --
    The Tao of math: The numbers you can count are not the real numbers.
  20. Re:AI problem? by eikonoklastes · · Score: 2, Funny

    Well, of course, how else would you do it? It's so simple.

  21. Re:AI problem? by nametaken · · Score: 2, Insightful

    You're right, it needs to be done by humans to be sure.

    Amazon's Mechanical Turk should do the trick.

    https://www.mturk.com/mturk/welcome

  22. Filters by mypalmike · · Score: 5, Funny

    First, make a bumpmap of each image. Then, render them onto quads with a light at a 45 degree angle to the surface normal. Run a gaussian blur on each resulting image. Then run a quantize filter, followed by lens flare, solarize, and edge-detect. At this point, the answer will be clear: both images look horrible.

    --
    There are 0x40000000 types of people: those who understand 32-bit IEEE 754 floating point, and those who don't.
  23. Re:AI problem? by CajunArson · · Score: 5, Interesting

    I don't know about "quality", but frankly it shouldn't be too hard to compare similar images just by doing simple mathematical analysis on the results. I'm only vaguely familiar with image compression, but if a "worse" JPEG image is more blocky, would it be possible to run edge detection to find the most clearly defined blocks that indicates a particular picture is producing "worse" results? That's just one idea, I'm sure people who know the compression better can name many other properties that could easily be measured automatically.
    What a computer can't do is tell you if the image is subjectively worse, unless the same metric that the human uses to subjectively judge a picture happens to match the algorithm the computer is using, and even then it could vary by picture to picture. For example, a highly colorful picture might hide the artifacting much better than a picture that features lots of text. While the "blockiness" would be the same mathematically, the subjective human viewing it will notice the artifacts in the text much more.
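
    One hedged way to turn the "find the blocks" idea into a number, sketched in Python. It assumes a grayscale image as a list of rows and a block grid aligned to the left edge with 8-pixel blocks. Instead of a full edge detector it just compares pixel jumps that straddle a block boundary with jumps inside blocks. The function name is made up.

```python
def blockiness(img, block=8):
    """Mean horizontal jump across block boundaries minus mean jump inside blocks."""
    boundary, interior = [], []
    for row in img:
        for x in range(len(row) - 1):
            jump = abs(row[x + 1] - row[x])
            # Columns x and x+1 sit in different blocks when x+1 is a multiple of block.
            (boundary if (x + 1) % block == 0 else interior).append(jump)
    if not boundary or not interior:
        return 0.0
    return sum(boundary) / len(boundary) - sum(interior) / len(interior)
```

    A higher score suggests more visible block edges. A complete version would also check vertical boundaries, and it says nothing about the subjective side discussed above.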

    --
    AntiFA: An abbreviation for Anti First Amendment.
  24. Automatic JPEG Artifact Removal by yet-another-lobbyist · · Score: 4, Interesting

    For what it's worth: I remember using Paint Shop Pro 9 a few years ago. It has a function called "Removal of JPEG artifacts" (or similar). I remember being surprised how well it worked. I also remember that PSP has quite good functionality for batch processing. So what you could do is use the "remove artifact" function and look at the difference before/after this function. The image with the bigger difference has to be the one of lower quality.
    I am not sure if there is a tool that automatically calculates the difference between two images, but this is a task simple enough to be coded in a few lines (given the right libraries are at hand). For each color channel (RGB) of each pixel, you basically just calculate the square of the difference between the two images. Then you add all these numbers up (all pixels, all color channels). The bigger this number is, the bigger the difference between the images.
    Maybe not your push-one-button solution, but should be doable. Just my $0.02.
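
    The "few lines" could look something like this in Python, assuming both images are the same size and have already been loaded as lists of (R, G, B) tuples (the loading step depends on your image library and is not shown). The function name is made up.

```python
def sum_squared_diff(pixels_a, pixels_b):
    """Sum of squared per-channel differences between two equal-length pixel lists."""
    if len(pixels_a) != len(pixels_b):
        raise ValueError("images must have the same number of pixels")
    return sum((ca - cb) ** 2
               for pa, pb in zip(pixels_a, pixels_b)
               for ca, cb in zip(pa, pb))
```

    Run it once per image on the before and after versions of the artifact removal; the image with the larger total is the one the filter changed more.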

  25. Re:AI problem? by CajunArson · · Score: 2, Insightful

    And to reply to myself.. several other posters have noted that taking the DCT of the compression blocks in the image will give information on how highly compressed the image is... there's one example.

    --
    AntiFA: An abbreviation for Anti First Amendment.
  26. How about audio? by bondiblueos9 · · Score: 2, Interesting

    I would very much like to do the same with audio. I have so many duplicate tracks in my music collection in different formats and bitrates.

    --
    Warning: The Surgeon General Has Determined that Sigs are Dangerous to Your Health
  27. Look at the DCT coefficients by uhmmmm · · Score: 3, Informative

    JPEG works by breaking the image into 8x8 blocks and doing a two dimensional discrete cosine transform on each of the color planes for each block. At this point, no information is lost (except possibly by some slight inaccuracies converting from RGB to YUV as is used in JPEG). The step where the artifacts are introduced is in quantizing the coefficients. High frequency coefficients are considered less important and are quantized more than low frequency coefficients. The level of quantization is raised across the board to increase the level of compression.

    Now, how is this useful? The reason heavily quantizing results in higher compression is because the coefficients get smaller. In fact, many become zero, which is particularly good for compression - and the high frequency coefficients in particular tend towards zero. So partially decode the images and look at the DCT coefficients. The image with more high frequency coefficients which are zero is likely the lower quality one.
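
    A sketch of that zero-counting check in Python. Since pulling the stored coefficients out of a real file needs a JPEG library, this version recomputes an 8x8 DCT from pixel values and uses a single uniform quantization step. Both are simplifications, and the function names and the "high frequency" cutoff are made up.

```python
import math

N = 8  # JPEG block size

def dct2(block):
    """Orthonormal 2-D DCT-II of an 8x8 block given as a list of rows."""
    def scale(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * total
    return out

def quantize(coeffs, step):
    """Divide every coefficient by step and round, as a uniform toy quantizer."""
    return [[round(c / step) for c in row] for row in coeffs]

def zero_high_freq(levels, cutoff=4):
    """Count zero coefficients among positions with u + v >= cutoff."""
    return sum(1 for u in range(N) for v in range(N)
               if u + v >= cutoff and levels[u][v] == 0)
```

    Sum zero_high_freq over all blocks of each image; the image with the higher total is likely the more heavily quantized one.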

  28. Re:AI problem? by kpoole55 · · Score: 2, Informative

    I've been lax, in a way, in my pruning of late so the findimagedupes program found about 28000 groups of near duplicate images. Finding that many was a surprise and that's why I started looking to see if a program had been written yet for the next step, finding the better image. I wrote a little script that prunes the identical files but now run into the problem of non-identical files that contain the same or nearly the same image.

  29. Re:AI problem? by moderatorrater · · Score: 2, Insightful

    Even simpler mathematical analysis would include such techniques as seeing which one takes up more disk space. Last I checked, that was very highly correlated with compression level.

  30. Expert's answer by mezis · · Score: 2, Interesting

    Exploit JPEG's weakness.

    JPEG encodes pixels by using a cosine transform on 8x8 pixel blocks. The most perceptually visible artifacts (and the artifacts most susceptible to cause trouble to machine vision algorithms) appear on block boundaries.

    Short answer:
    a. 2D-FFT your image
    b. Use the value of the 8-pixel period response in X and Y direction as your quality metric. The higher, the worse the quality.

    This is a crude 1st approximation but works.
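
    A simplified sketch of the period-8 measurement in Python. Instead of a full 2D FFT of the image it evaluates a single DFT bin, at a period of 8 pixels, on the profile of horizontal pixel-to-pixel differences, so it covers the X direction only. The image is assumed to be a grayscale list of rows, and the function name is made up.

```python
import cmath

def period8_response(img):
    """Magnitude of the period-8 DFT component of the column-difference profile."""
    height, width = len(img), len(img[0])
    # Mean absolute difference between neighbouring columns x and x+1.
    profile = [sum(abs(row[x + 1] - row[x]) for row in img) / height
               for x in range(width - 1)]
    mean = sum(profile) / len(profile)
    # Single DFT bin at 1/8 cycles per pixel, with the mean removed first.
    total = sum((p - mean) * cmath.exp(-2j * cmath.pi * x / 8)
                for x, p in enumerate(profile))
    return abs(total) / len(profile)
```

    Transpose the image and call it again for the Y direction. As above, the higher the response, the stronger the 8-pixel block structure.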

  31. Re:AI problem? by Spy+der+Mann · · Score: 4, Insightful

    Here's a simple but expensive formula:

    1. Get the image
    2. Compress it severely.
    3. Compare the difference between original and the compressed.

    The lower the difference, the lower the image quality.
    4. Profit!

    Or you could just measure the amount of data in the DCT space. Duh.

  32. Re:AI problem? by arose · · Score: 5, Informative

    AI or small utility... You never know with computers ;)

    --
    Analogies don't equal equalities, they are merely somewhat analogous.
  33. Re:AI problem? by fractoid · · Score: 2, Informative

    Thou shalt not make a machine in the likeness of a human mind.

    --
    Rampant carbon sequestration destroyed the Dinosaurs' tropical paradise. I'm here to help repair the damage.
  34. Re:AI problem? by Anonymous Coward · · Score: 2, Informative

    Or you could just measure the amount of data in the DCT space. Duh.

    That'd be a Discrete Cosine Transform
    (for the confused like me. Crazy what they can do with math these days)

  35. Re:AI problem? by bendodge · · Score: 5, Informative

    Since the mods haven't noticed, and I don't have mod points, let me point out that THIS POST HAS THE ANSWER. A real program that will do what the asker wants. The source is available, but I can't seem to find its license (it includes some of the Independent JPEG Group's code). Also, doesn't a jpeg's EXIF data or some other tag in the file tell you what quality it was saved at?

    --
    The government can't save you.
  36. Re:AI problem? by VanessaE · · Score: 2, Insightful

    Just checking the size of the file (or, I suspect, just the size of the DCT data) won't always work. Sometimes an image can end up growing in size slightly while losing quality, depending on the nature of the image and the settings of the imaging program.

    Things such as thin wires, multi-colored ribbon cable, close-ups of a circuit board, and other images with lots of similar details seem to benefit most from this kind of tweaking, mainly thanks to the placement and qualities of the artifacts, rather than their mere existence or apparent severity.

    I've had this happen many times - set an icon for, say, 35% quality and it will probably look kinda grungy, but step it down by just one or two percent and suddenly the artifacts shift around or change their appearance, sometimes in a manner that better suits the image - almost like constructive interference.

  37. thanks for the serious consideration here by kpoole55 · · Score: 2, Interesting

    Thanks to the many who took this as a serious question and didn't turn this into a "It's just pr0n so who cares." Some is pr0n, some isn't, the most consistent thing is humor.

    Many ideas needed the original image to find the better quality of the copy and some asked where I get these images from. These are linked in that I get the images from the USENET, from forums and from artists' galleries. This means that there's only a small set, from the artists' galleries, that I know are original. Others may be original but it may not be the original that comes to me first. On occasion, an artist may even publish the same image in different forms depending on the limitations of the different forums he frequents.

    Some ideas were so nicely different from the directions I was following that they'll give me more to think about.

    I'll also acknowledge those who said that how the image is represented is less important than what the image represents. That's quite true but if I have a machine that can find the best representation of something I enjoy then why not use it.

  38. Re:AI problem? by adolf · · Score: 2, Interesting

    It almost does what he wants. He doesn't spell it out, but it seems strongly implied that he also wants a system capable of automatically finding these duplicates by itself, and then automatically determining which image is "best."

    Which seems obvious, to me: If he's got enough photos of sufficient disorganization that he can't tell automatically which duplicate is best, then there probably isn't any straight-forward way (with filenames or directory trees or whatever) to find out which ones are dupes to begin with.

    Judge, the afore-linked program, only does the job of finding the best image out of a set of duplicates.

    What tool can be used to find the (near) duplicates to begin with?

  39. Re:AI problem? by bh_doc · · Score: 3, Informative

    http://www.jhnc.org/findimagedupes/

    There's a bunch, but I know you can construct command line operations with this one. I imagine you could construct a system from this and the parent program that will find dupes, then nuke the poorer quality of each, or whatever.

  40. Structural Similarity Index Method (SSIM) by Paridel · · Score: 2, Interesting

    In general your best bet would be to use an image quality metric that takes into account how the human visual system works. The 2D frequency response of the human eye looks something like a diamond, which means that we see vertical and horizontal frequencies better than diagonal ones.

    In fact, most image compression techniques (including JPEG) take this into account. However, conventional ways of measuring the noise in images (mean squared error, peak signal-to-noise ratio, root mean square error) don't factor in the human visual system.

    Your best bet is to use something like the structural similarity method (SSIM) by Prof. Al Bovik of UT Austin and his student Prof. Zhou Wang (now at the University of Waterloo).

    You can read all about SSIM and get example code here: http://www.ece.uwaterloo.ca/~z70wang/research/ssim/

    Or read more about image quality assessment at Prof. Bovik's website: http://live.ece.utexas.edu/research/Quality/index.htm

    If you don't care about how it works, and just want to use it, you can get example code for ssim in matlab at that website and C floating around the net. The method is easy to use; essentially the ssim function takes two images and returns a number between 0 and 1 that describes how similar the images are. Given two compressed images and the original image, take the SSIM between each and the original. The compressed image with the higher SSIM value is the "best".
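
    For a feel of what the number means, here is a stripped-down sketch. It computes the SSIM formula once over the whole image (a single window), whereas the published method averages the same formula over small sliding windows, so the values will not match the reference code. It assumes 8-bit grayscale images given as lists of rows, and global_ssim is a hypothetical name.

```python
def global_ssim(img_a, img_b, dynamic_range=255):
    """Single-window SSIM between two equally sized grayscale images."""
    a = [v for row in img_a for v in row]
    b = [v for row in img_b for v in row]
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixels")
    n = len(a)
    mu_a = sum(a) / n
    mu_b = sum(b) / n
    var_a = sum((v - mu_a) ** 2 for v in a) / n
    var_b = sum((v - mu_b) ** 2 for v in b) / n
    cov = sum((x - mu_a) * (y - mu_b) for x, y in zip(a, b)) / n
    # Stabilising constants with the usual K1 = 0.01 and K2 = 0.03.
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    numerator = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    denominator = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return numerator / denominator
```

    With the original in hand, the copy with the higher global_ssim(original, copy) would be kept.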

    It sounds like for your problem you might NOT have the original uncompressed image. In that case you might try checking for minimal entropy or maximum contrast in your images.

    Essentially entropy would be calculated as:

    h = histogram(Image);
    p = h./(number of pixels in image);
    entropy = -sum(p.*log2(p));

    You will need to make sure you scale the image appropriately and don't take the log of zero (skip the empty histogram bins)! Or better yet, you should be able to find code for image entropy and contrast on the web. Just try searching for entropy.m for a matlab version.
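
    For anyone without matlab, a minimal Python sketch of the same calculation, using the Shannon entropy -sum(p * log2(p)) over the pixel-value histogram. It assumes a grayscale image given as a list of rows of integer values; image_entropy is just an illustrative name. Counting only the values that actually occur sidesteps the log-of-zero problem.

```python
import math
from collections import Counter


def image_entropy(pixels):
    """Shannon entropy, in bits per pixel, of the pixel-value histogram."""
    counts = Counter(v for row in pixels for v in row)
    total = sum(counts.values())
    # Counter only holds values that occur, so no probability is ever zero.
    return -sum(
        (c / total) * math.log2(c / total) for c in counts.values()
    )
```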

    Good luck!

  41. Re:AI problem? by scdeimos · · Score: 2, Insightful

    That's only a reasonable indicator if the two copies of the same image you are comparing are also the same resolution. It's not hard to have a higher resolution image consume less disk space if the compression level has been bumped up. Also, different programs usually produce different JFIF streams even when set to the same compression level and using the same *uncompressed* source image, making the DCT size approach even less reliable.
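
    One way to take resolution out of the file-size comparison, sketched here as a suggestion rather than a tested tool, is to compare bytes per pixel. The sketch reads the width and height from the JPEG start-of-frame (SOF) segment; jpeg_dimensions and bytes_per_pixel are hypothetical names, and the differences between encoders mentioned above still apply.

```python
import struct

# Start-of-frame markers (0xC4, 0xC8 and 0xCC are not frame headers).
SOF_MARKERS = {0xC0, 0xC1, 0xC2, 0xC3, 0xC5, 0xC6, 0xC7,
               0xC9, 0xCA, 0xCB, 0xCD, 0xCE, 0xCF}


def jpeg_dimensions(data):
    """Return (width, height) from the first SOF segment of a JPEG stream."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG stream")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("expected a marker")
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte before the real marker
            pos += 1
            continue
        # Segment length counts its own two bytes but not the marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if marker in SOF_MARKERS:
            # Payload: precision (1 byte), height (2), width (2), ...
            height, width = struct.unpack(">HH", data[pos + 5:pos + 9])
            return width, height
        pos += 2 + length
    raise ValueError("no SOF segment found")


def bytes_per_pixel(data):
    """Whole-file size divided by the pixel count."""
    width, height = jpeg_dimensions(data)
    return len(data) / (width * height)
```

    Typical use would be bytes_per_pixel(open(path, "rb").read()) for each file in a duplicate group.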

  42. Re:AI problem? by SlashWombat · · Score: 2, Insightful

    Unfortunately, it's not all that easy to compare. In general, the file with the higher byte count will be the better image, BUT ... the problem is that there are different ways to compress the same picture. There are several "controls", even in baseline JPEG: where the "quantisation" steps occur, and where the high-frequency cutoff for each macroblock occurs. Then there are different ways for the JPEG engine to entropy-encode the bitstream. (E.g., arithmetic coding is allowed by the JPEG standard; however, due to patent issues, most implementations use Huffman coding, which is slightly less efficient.) It should be remembered that the JPEG standard is just a baseline. Any implementer is free to improve upon the baseline coding, as long as it still decodes correctly. There used to be JPEG viewing software that decompressed and cleaned up images that looked terrible using "standard" JPEG decoding software. (I am not sure, but I suspect the blockiness and quantisation errors were smoothed out, improving the displayed image immensely.)

    Of course, what you really need is the NCIS image enhancement package.

  43. Re:AI problem? by nahdude812 · · Score: 2, Insightful

    This just about gets to the heart of it. "Better" is a subjective term, so choosing better quality images is not going to be something everyone can agree on. Your example nails it. If you have two copies of the same image, one is higher resolution than the other, but saved with a higher compression rate, which is better? The answer is going to be "it depends on if the noise introduced by the higher compression annoys me more than the reduced information in the lower resolution image."

    If the compression on the high resolution image is high enough, you might still have better detail in the lower resolution image. If the higher resolution image isn't actually higher resolution, just higher dimensions (it's the smaller image scaled up), this is automatically a lower quality image (you can always recreate the higher resolution image from the lower resolution image, but not vice versa as rounding errors cause information loss whenever you scale an image).

    There may also be subjective differences like brightness/contrast/tone mapping differences.

    Given that the question being asked is a subjective one, the correlation of file size to subjective image quality should be so high that you may gain only a few percent better predictability with an extremely complex algorithm.

  44. Sorting steps to find originals by rwa2 · · Score: 2, Informative

    You probably don't necessarily want to find the "best quality" image, but rather the image that was closest to the original.

    I take it you're either trying to eliminate the low-quality duplicates or thumbnails from a really large collection of pr0n, or trying to write an image search engine that tries to present the "best" rendition of a particular image first.

    1. As a quick first pass (after you've run through to collect all the similar images into separate groups), you'd obviously want to find the version of the image with the highest resolution. This might let you easily throw out thumbnails or scaled down versions you might come across. Of course, some dorks will upscale images and post them somewhere, so you might still want to hang on to some of them for the second stage.

    2. For the second pass, you'd likely want to scan through the metadata first, especially stuff exposed by EXIF. So you'd want to give higher scores to EXIF data that makes it sound like it came directly off a digital camera or scanner, and bump down the desirability of pictures that appeared to have been edited by any sort of photo editing software.

    3. Then maybe you want to look at something that would rank down watermarks or other modifications.

    4. Another step would be to compare compression quality, but I think that's what most of the other posts are concentrating on. But this is a difficult step because it can be easily fooled, since idiots can re-save a low quality image with the compression quality cranked all the way up so the file size becomes high even though the actual image quality is worse than the original. You probably need to run it through one of those "photoshop detectors" that could tell you whether the image has been through smoothing or other filters in a photo editor. The originals (especially in raw format and maybe high quality JPEG) will have a certain type of CCD noise signature that your software might be able to detect. In the same vein, a poorly-compressed JPEG will have lots of JPEG quantization artifacts that your software might be able to detect as well. Otherwise, you're kinda left with zooming in on pics and eyeballing it.

    5. Finally you might be left with a group of images that are exactly the same but have different file names... you probably want some way to store some of the more useful bits of descriptive text as search/tag metadata, but then choose the most consistent file naming convention or slap on your own based on your own metadata.

    Hopefully this gives you a start to important parts of the process that you might have overlooked...
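
    The first few passes above could be wired together roughly like this. Everything in the sketch is an assumption for illustration: the Candidate fields, the edited flag (imagined as coming from an EXIF software tag) and artifact_score (imagined as coming from some blockiness detector) are all hypothetical, and upscaled copies are not detected.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    width: int
    height: int
    edited: bool           # hypothetical: metadata hints at a photo editor
    artifact_score: float  # hypothetical: higher means more JPEG artifacts


def rank_candidates(candidates):
    """Sort a group of near-duplicates, most desirable first.

    Order of preference: more pixels, then unedited, then fewer artifacts.
    """
    return sorted(
        candidates,
        key=lambda c: (-(c.width * c.height), c.edited, c.artifact_score),
    )
```

    The first element of rank_candidates(group) would be the keeper for that group.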