Slashdot Mirror


Does Anyone Make a Photo De-Duplicator For Linux? Something That Reads EXIF?

postbigbang writes "Imagine having thousands of images on disparate machines. Many are dupes, even among the disparate machines. It's impossible to delete all the dupes manually and create a singular, accurate photo image base. Is there an app out there that can scan a file system, perhaps a target sub-folder system, and suck in the images-- WITHOUT creating duplicates? Perhaps by reading EXIF info or hashes? I have eleven file systems saved, and the task of eliminating dupes seems impossible."

2 of 243 comments

  1. findimagedupes in Debian by nemesisrocks · Score: 5, Interesting

    "whatever you decide on, it could probably be done in a hundred lines of perl"

    Funny you mention perl.

    There's a tool written in Perl called "findimagedupes" in Debian. It's a pretty awesome tool for large image collections, because it can identify duplicates even if they have been resized or messed with a little (e.g. by adding logos). Point it at a directory, and it'll find all the dupes for you.

  2. Quick shell script using exiftool by Khopesh · Score: 4, Interesting

    This will help find exact matches by EXIF data. It will not find near-matches unless they have the same EXIF data. If you want near-matches, good luck. Geeqie has a find-similar command, but it's only so good (image search is hard!). Apparently there's also a findimagedupes tool available, see comments above (I wrote this before seeing that and had assumed apt-cache search had already been exhausted).

    I would write a script that runs exiftool on each file you want to test, removes the fields that refer to timestamps, file name, path, etc., and makes an md5 hash of what remains.

    Something like this exif_hash.sh (sorry, slashdot eats whitespace so this is not indented):

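    #!/bin/sh
    # Print "<md5 of the filtered EXIF dump> <file>" for each image argument.
    for image in "$@"; do
    hash=$(exiftool "$image" |grep -ve 20..:..: -e 19..:..: -e File -e Directory |md5sum |cut -d' ' -f1)
    echo "$hash $image"
    done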

    And then run:

    find [list of paths] -type f -print0 |xargs -0 exif_hash.sh |sort > output

    If you have a really large list of images, do not run this through sort. Just pipe it into your output file and sort it later. It's possible that the sort utility can't deal with the size of the list (you can work around this by using grep '^[0-8]' output |sort >output-1 and grep -v '^[0-8]' output |sort >output-2, then cat output-1 output-2 > output.sorted or thereabouts; you may need more than two passes).

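    There are other things you can do to display these, e.g. awk '{print $1}' output |uniq -c |sort -n to rank the hashes by how many files share them (this needs the sorted output, since uniq only counts adjacent identical lines).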
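
    To see which files actually collide, rather than just the per-hash counts, something like the following sketch might work. It assumes the output file holds one "<hash> <path>" line per image, as the pipeline above produces; show_dupes is a made-up helper name, not part of any existing tool.

```shell
#!/bin/sh
# Print only the entries whose hash occurs more than once, grouped by hash.
# Input: a file of "<hash> <path>" lines (hypothetical helper, see lead-in).
show_dupes() {
    # awk reads the file twice: the first pass counts each hash,
    # the second pass prints every line whose hash was seen more than once.
    awk 'NR == FNR { count[$1]++; next } count[$1] > 1' "$1" "$1" | sort
}
```

    Calling show_dupes output then lists each set of matching files on adjacent lines, which is easier to review by eye than the bare counts.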

    On Debian, exiftool is part of the libimage-exiftool-perl package. If you know perl, you can write this with far more precision (I figured this would be an easier explanation for non-coders).
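
    For byte-identical copies (the "hashes" half of the question), a simpler variant could hash the file contents instead of the EXIF data. This is only a sketch: it assumes md5sum from GNU coreutils, content_hashes is a hypothetical helper name, and it will treat a re-saved or resized copy as a different file.

```shell
#!/bin/sh
# Hash file contents (not EXIF data) so byte-identical copies share a hash.
# Assumes GNU coreutils md5sum; content_hashes is an illustrative name.
content_hashes() {
    # -type f restricts the walk to regular files;
    # -exec ... {} + passes the file names to md5sum in batches.
    find "$@" -type f -exec md5sum {} + | sort
}
```

    Running content_hashes [list of paths] > output gives the same "hash, then path" layout as the EXIF script, so the uniq -c ranking above applies unchanged.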
