Automated PDF File Integrity Checking?
WomensHealth writes "I have about 6500 pdfs in my 'My Paperport Documents' folder that I've created over the years. As with all valuable data, I maintain off-site backups. Occasionally, when accessing a very old folder, I'll find one or two corrupted files. I would like to incorporate into my backup routine, a way of verifying the integrity of each file, so that I can immediately identify and replace with a backed-up version, any that might become corrupted. I'm not talking about verifying the integrity of the backup as a whole, instead, I want to periodically check the integrity of each individual PDF in the collection. Any way to do this in an automated fashion? I could use either an XP or OS X solution. I could even boot a Linux distro if required."
There are PDF libraries out there - write a wrapper that loads a file, and when it gets to the end without error emits a 0 "no error" return code, and any errors result in a non-zero code.
/dev/null and check the exit code. I'd be surprised if there were a ready-made solution for this somewhere.
Or maybe there are other cmd-line tools which issue a "failed to load" error. That's where I'd look first. Like a tool to strip content out of a PDF - script it so it outputs to