Slashdot Mirror


Automated PDF File Integrity Checking?

WomensHealth writes "I have about 6500 pdfs in my 'My Paperport Documents' folder that I've created over the years. As with all valuable data, I maintain off-site backups. Occasionally, when accessing a very old folder, I'll find one or two corrupted files. I would like to incorporate into my backup routine, a way of verifying the integrity of each file, so that I can immediately identify and replace with a backed-up version, any that might become corrupted. I'm not talking about verifying the integrity of the backup as a whole, instead, I want to periodically check the integrity of each individual PDF in the collection. Any way to do this in an automated fashion? I could use either an XP or OS X solution. I could even boot a Linux distro if required."

1 of 40 comments (clear)

  1. Sure there's a way by b4dc0d3r · · Score: 3, Interesting

    There are PDF libraries out there - write a wrapper that loads a file, and when it gets to the end without error emits a 0 "no error" return code, and any errors result in a non-zero code.

    Or maybe there are other cmd-line tools which issue a "failed to load" error. That's where I'd look first. Like a tool to strip content out of a PDF - script it so it outputs to /dev/null and check the exit code. I'd be surprised if there were a ready-made solution for this somewhere.