Slashdot Mirror


Ask Slashdot: What's a Good Tool To Detect Corrupted Files?

Volanin writes "Currently I use a triple boot system on my Macbook, including MacOS Lion, Windows 7, and Ubuntu Precise (on which I spend the great majority of my time). To share files between these systems, I have created a huge HFS+ home partition (the MacOS native format, which can also be read in Linux, and in Windows with Paragon HFS). But last week, while working on Ubuntu, my battery ran out and the computer suddenly powered off. When I powered it on again, the filesystem integrity was OK (after a scandisk by MacOS), but a lot of my files' contents were silently corrupted (and my last backup was from August...). Mostly, these files are JPGs, MP3s, and MPG/MOV videos, with a few PDFs scattered around. I want to get rid of the corrupted files, since they waste space uselessly, but the only way I have to check for corruption is opening them up one by one. Is there a good set of tools to verify the integrity by filetype, so I can detect (and delete) my bad files?"

27 of 247 comments

  1. AppleScript by noh8rz3 · · Score: 3, Interesting
    An AppleScript / Automator script can step through the files on a hard drive, open each one, and catch the error thrown if the open fails. This is a good automated way to flag the bad ones. Not the fastest method, but it could run at night.

    You seem to be surprisingly OK with the fact that your computer crashed and all your documents and media were corrupted, as was your backup. I would have been beside myself. Hulk smash! Please let us know what different setups you're exploring to avoid this.

    1. Re:AppleScript by dgatwood · · Score: 3, Insightful

      But the open usually won't fail. Unless the error is within the header bytes of a movie or image, the media will open, but will appear wrong. Worse, there is no way to detect this corruption because media file formats generally do not contain any sort of checksums. At best, you could write a script that looks for truncation (not enough bytes to complete a full macroblock), or write a tool that computes the difference between adjacent pixels across macroblock boundaries and flags any pictures in which there is an obvious high energy transition at the macroblock boundary, but even that cannot tell you whether the image is corrupt or simply compressed at a low quality setting with lots of blocking artifacts.

      The short answer, however, is "no". Such corruption can't usually be detected programmatically.
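      A minimal sketch of the truncation check mentioned above, assuming JPEG input. A complete JPEG stream starts with the SOI marker (FF D8) and normally ends with the EOI marker (FF D9), so a file without the trailing marker was probably cut short. The function names are hypothetical. Passing this check says nothing about damage in the middle of the file, and files with extra data appended after the image will be flagged even though they may be fine.

```shell
# jpeg_looks_complete FILE  (hypothetical helper)
# Succeeds only if FILE begins with FF D8 (SOI) and ends with FF D9 (EOI).
jpeg_looks_complete() {
    first=$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')
    last=$(tail -c 2 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$first" = "ffd8" ] && [ "$last" = "ffd9" ]
}

# list_truncated_jpegs DIR  (hypothetical helper)
# Prints every .jpg/.jpeg file under DIR that fails the marker check.
list_truncated_jpegs() {
    find "$1" -type f \( -iname '*.jpg' -o -iname '*.jpeg' \) |
    while IFS= read -r f; do
        jpeg_looks_complete "$f" || printf '%s\n' "$f"
    done
}
```

      Running list_truncated_jpegs on a photo directory gives a list of candidates to inspect by hand before deleting anything.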

      --

      Check out my sci-fi/humor trilogy at PatriotsBooks.

    2. Re:AppleScript by jasno · · Score: 3, Interesting

      Here's what I did when I realized my mp3 collection on my Mac was slowly dying:

      find . -type f -exec sh -c 'printf "%s\n" "$1"; cat "$1" > /dev/null' sh {} \;

      It takes a while, but for files with I/O errors you'll see a warning printed after the file name. Put the output in a file (redirect stderr as well, with 2>&1) and you can use grep (the -B option comes to mind) to get a list of the bad files.

      The sad thing is that Time Machine didn't seem to notice that the files were bad, so now the files are gone forever. Disk Utility didn't help.

      Shouldn't there be a way to find bad blocks on OS X? I looked around and all I could find were commercial products.

      --

      http://www.masturbateforpeace.com/
  2. The BEST method.. by Anonymous Coward · · Score: 5, Funny

    is urgency. Corrupted files have the ability to detect urgency and your discovery of them will come in a form compatible with the laws of Murphy.

  3. For MP3s use amp3test.exe by denis-The-menace · · Score: 5, Informative

    2000-2001 MAF-Soft http://www.maf-soft.de/
    The version I have is v1.0.3.102

    It can scan single mp3s and entire folders structures for defects and logs everything if you wish. It will give you a percentage of how good the file is.

    Depending on the damage you may be able to fix headers and chop off corrupted tag info with something like a MP3Pro Trim v1.80.exe

    --
    Obama's legacy: (N)othing (S)ecure (A)nywhere and (T)error (S)imulation (A)dministration
  4. md5sum by sl4shd0rk · · Score: 3, Interesting

    or sha1sum if you prefer. Automate in cron against a list of knowns.

    eg:
    $ md5sum /home/wilbur/Documents/* > /home/wilbur/Docs.md5
    $ md5sum -c /home/wilbur/Docs.md5
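    The example above covers a single directory level. A minimal sketch of the same idea for a whole tree, assuming an md5sum that accepts -c (as GNU coreutils does); the function names and the manifest location are hypothetical. Keep the manifest outside the directory being scanned so it does not checksum itself.

```shell
# make_manifest DIR MANIFEST  (hypothetical helper)
# Records an MD5 for every regular file under DIR, using paths relative to DIR.
make_manifest() {
    ( cd "$1" && find . -type f -exec md5sum {} + ) > "$2"
}

# verify_manifest DIR MANIFEST  (hypothetical helper)
# Prints only the entries that no longer match; succeeds if there are none.
verify_manifest() {
    bad=$( ( cd "$1" && md5sum -c - 2>/dev/null ) < "$2" | grep -v ': OK$' || true )
    [ -z "$bad" ] && return 0
    printf '%s\n' "$bad"
    return 1
}
```

    Run make_manifest while the files are known to be good, then schedule verify_manifest from cron. This only helps for the future: it cannot tell which files are already damaged.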

    --
    Join the Slashcott! Feb 10 thru Feb 17!
    1. Re:md5sum by subtr4ct · · Score: 3, Informative

      This type of approach is automated in a python script here.

  5. For JPEGs by Jethro · · Score: 4, Informative

    You can run jpeginfo -c. I have a script that runs against a directory and makes a list for when I do data recovery for all my friends who don't listen when I tell them their 10 year old laptop may be dying soon.

    --


    In the land of the blind, the one-eyed man is kinky.
  6. Re:Newbie question hour? by Volanin · · Score: 5, Informative

    Author here:

    > Last backup August.
    Yes, that was silly of me.

    > Thinks there is a way to detect generic file corruption
    There is no way to detect generic file corruption. But there is a way to detect specific filetype corruption. For example, I already found mp3val, which is able to scan all my mp3s and check for file integrity, and even fix a few kinds of corruption (such as mismatched bytes in the header and sound chunks). Maybe with the right set of tools, I might also detect (or even fix) my corrupted pictures, movies and books as well.

    --
    If I clone myself, can I call it a thread?
    If a girl winks to us, can I call it a race condition?
  7. Tech Tool Pro, perhaps by Anonymous Coward · · Score: 3, Informative

    Tech Tool Pro, over on the Mac side, has a "File Structures" check which looks at a lot of different structured file types to make sure that their internal format is valid.

  8. A lot of corrupt files? by 19thNervousBreakdown · · Score: 4, Interesting

    That seems very strange--the only files that should really be corrupted, unless something extremely rare and catastrophic happened, are the ones that were being written when power went out, or were cached. And even then, a flush usually flushes everything, or at least whole files at once, or areas of disk. Is the partition highly fragmented or something?

    I know this doesn't do much for your question, but that kind of failure mode is almost exactly what filesystems do their damnedest to avoid. HFS+, being journaled, should be even more proof against, well, exactly what happened to you. Maybe the Linux driver is poor, but man, if you got silent data corruption on a multitude of files that weren't even being written, that's really bad and the driver should be classified "EXPERIMENTAL" at best, and certainly not compiled into distros' default kernels.

    To answer your question, I don't have experience with any tools (I automate my backups, and any archival files go on a RAID volume that does a full integrity scan nightly), but once you find one, you should separate your files into two categories--"must be good", and "can be bad". The "must be good" files (serial #s, source code, etc.), you hand-check, so you know for certain that every one of them is good. It'll also motivate you to replace them now, instead of later when replacements will only get harder to come by. The "can be bad" files (music, pictures, etc.), you do the automated check on and then just delete as you run into ones that the check missed. This has the advantage of concentrating your effort into where it's useful. If you try to check all of your files, you'll just burn out before you finish. You may even want to do more advanced triaging, but you'll have to come up with the categories and criteria there. The main thing is, split this problem up.

    --
    <xml><I><am><so><damn>Web 2.0</damn></so></am></I></xml>
    1. Re:A lot of corrupt files? by rrohbeck · · Score: 4, Informative

      Very few filesystems keep checksums - only btrfs and zfs come to my mind.
      With defective hardware (RAM issues in main memory and disk or controller caches are fun) you can have silent corruption that goes on for a long time. Also bits on disks rot but those should give you a CRC or ECC error.

  9. mplayer/mencoder (or ffmpeg) & imagemagick by Bonteaux-le-Kun · · Score: 4, Informative

    You can just run mencoder or ffmpeg on all the mp3 and mov files (with a small shell script, probably involving 'find' or similar) and tell it to write the output to /dev/null. That should go through the files as fast as they can be read from disk and abort with an error on those that are broken. For the jpgs, you could try something similar with imagemagick's 'convert': convert them to whatever format and send the result to /dev/null, which also needs to read the whole file content and aborts if they're broken (one should hope). Those converters are really fast, especially ffmpeg, so that should complete in a reasonable time.
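    The decode-and-discard idea can be sketched as a small filter that reads file names on stdin and runs a checker command on each. Some decoders report damage on stderr while still exiting with status 0 (an assumption worth verifying for your tool and version), so this sketch flags a file on either a non-zero exit status or any stderr output. The name flag_bad is hypothetical, and the ffmpeg and ImageMagick invocations in the comments are untested examples.

```shell
# flag_bad CHECKER [ARGS...]  (hypothetical helper)
# Reads file names on stdin, runs "CHECKER ARGS... FILE" for each one, and
# prints the names where the checker fails or writes anything to stderr.
flag_bad() {
    while IFS= read -r file; do
        # Capture stderr only; decoded output on stdout is thrown away.
        if errs=$("$@" "$file" 2>&1 >/dev/null </dev/null) && [ -z "$errs" ]; then
            :   # checker was silent and exited 0
        else
            printf '%s\n' "$file"
        fi
    done
}

# Example invocations (assumptions, adjust to the tools you have installed):
#   find . -type f -name '*.mov' | flag_bad sh -c 'ffmpeg -v error -i "$1" -f null -' sh
#   find . -type f -name '*.jpg' | flag_bad identify
```

    Treat the output as a list of suspects, not a verdict: a clean decode does not prove the content is what it used to be.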

  10. Check why the files are corrupted by ncw · · Score: 5, Insightful

    I'd be asking myself why lots of files became corrupted from one dodgy file system event. Assuming HFS works like file systems I'm more familiar with, it will allocate sequential blocks for files wherever it can. This means that a random filesystem splat is really unlikely to corrupt loads and loads of files. You might expect a file system corruption to cause a load of files to go missing (if a directory entry is corrupted) or corrupt a few files, but not put random errors into loads of files.

    I'd check to see whether files I was writing now get corrupted too. It might be dodgy disk or RAM in your computer.

    The above might be complete paranoia, but I'm a paranoid person when it comes to my data, and silent corruption is the absolute worst form of corruption.

    For next time, store MD5SUM files so you can see what gets corrupted and what doesn't (that is what I do for my digital picture and video archive).

    --
    Every man for himself, all in favour say "I"
  11. Re:compare them to an intact backup by Calos · · Score: 5, Insightful

    Well...

    My first suspicion would be that the filesystem is messed up, not the actual files. Unless s/he had a lot of pending writes to all of these files, there is no reason that something should have actually overwritten or garbled them when the power shut down. Much more likely was an impending or in-progress write to the filesystem's tables, which has affected where it thinks all the files' pieces are stored. And if that is the case, date modified and size may be irrelevant because those are going to be reported by the filesystem.

    Aside from trying to read back sector-by-sector data and assembling them, however, I don't know that there's a remedy.

    --
    I vote based on politicians' actions, unless contrary to my preconceptions. Often wrong, never uncertain. #iamthe99%
  12. Re:BSOD? No, use open source "Tripwire" by quarkscat · · Score: 3, Informative

    Not the BSOD.
    If the OP had used open source "tripwire" on known-good files in each filesystem on his Macbook, and saved the resultant data output to a USB thumbdrive formatted with FAT32, the OP would have had a good chance of determining all corrupted files. In this case, an ounce of prevention would have prevented several pounds of "cure".

    Check out http://tripwire.org./

  13. Re:file(1) by Volanin · · Score: 3, Informative

    Author here:

    At first I thought this idea wouldn't work. As some people have already written here, the 'file' command sometimes just checks for a few bytes. But since it is so easy to implement, why not give it a try? And indeed, for videos it worked quite well. Some of the corrupted MOV files were detected simply as 'data file' or even 'MPEG sequence' and were promptly deleted! Thank you for the idea.

    --
    If I clone myself, can I call it a thread?
    If a girl winks to us, can I call it a race condition?
  14. Re:Gamemaker sucks ass by Volanin · · Score: 5, Funny

    Author here:

    Ok, I could deal with the loss of some unique videos and pictures from travels... but now that you mention the porn... *weep*

    --
    If I clone myself, can I call it a thread?
    If a girl winks to us, can I call it a race condition?
  15. Re:Newbie question hour? by loftwyr · · Score: 4, Interesting

    mplayer can detect corrupted movie and audio files:

    find . -name '*.mov' -exec mplayer -msglevel all=6 -speed 100.0 -framedrop -nogui -nolirc -cache 8192 -tskeepbroken -ao null -vo null {} \; | grep Warning! > $1.txt

    Change the *.mov as appropriate.

  16. Get Rid Of Paragon! by Lord_Jeremy · · Score: 5, Interesting

    Alright now I'm afraid I can't help with your verify problem but I do have one piece of solid advice: get rid of Paragon HFS immediately!

    It is a truly shoddy piece of software that as of version 9.0 has a terrible bug that will cause it to destroy HFS+ filesystems. Google "paragon hfs corruption" and you will see many many horror stories from people who just plugged a Mac OS X disk into a Windows machine w/ Paragon HFS and then discovered the entire filesystem was hosed. In my dual-boot win/mac setup I replaced my copy of MacDrive with a trial version of Paragon HFS 9.0 from their website and every single one of the six HFS+ disks I had connected internally was damaged. Disk Utility couldn't do a thing and I had to buy a program called Diskwarrior to even begin to recover data. I ended up losing two disks worth of files anyway.
    http://www.mac-help.com/t12137-opened-hfs-drive-win7-paragon-hfs-now-wont-boot.html
    http://www.wilderssecurity.com/showthread.php?t=299306
    http://hardforum.com/showthread.php?t=1677099
    http://www.avforums.com/forums/apple-mac/1509344-hfs-super-block-not-found.html

    whew! Anyway the pain I went through after that software very nearly ruined my life was so great, I don't want it to happen to anyone else. According to their own website 9.0 has this awful bug but they fixed it in 9.0.1. Evidently the trial download on the main page is still for version 9.0 and still has the disk destroying bug! Any software company that releases a filesystem driver with this terrible a bug (not to mention the numerous reports of BSODs and other relatively minor problems) clearly has terrible quality assurance and simply can't be trusted.

    1. Re:Get Rid Of Paragon! by macraig · · Score: 3, Interesting

      Having nothing at all to do with Paragon (not that I'm a fan of the company otherwise), I had a very similar disaster occur with an external eSATA 5TB RAID 5 enclosure. It's one that uses an internal hardware RAID 5 circuit and doesn't require port multiplication, so when connected it appears to the host as a single large volume. At the time I was swapping it between a Linux (Ubuntu) system and a Windows 7 system; it was of course configured as GPT. Eventually I connected it to the Windows 7 system and during boot Windows declared there were problems and initiated chkdsk. Chkdsk ran for more than 18 hours and when it was done, most of the files in the volume were hopelessly corrupted. Upon detailed inspection, I found that blocks of all the files were swapped and intermingled, as if something had made a jigsaw puzzle out of the MFT and couldn't reassemble Humpty Dumpty. Was it chkdsk itself that caused the damage? Was it the swapping between two machines and operating systems (both GPT compliant)? I suspect it was actually caused by chkdsk, but could never prove it.

  17. Re:compare them to an intact backup by ncw · · Score: 5, Informative

    That is a good thought, and photorec does an excellent job of finding pictures and videos by searching through your sectors - definitely worth a try.

    http://www.cgsecurity.org/wiki/PhotoRec_Step_By_Step

    --
    Every man for himself, all in favour say "I"
  18. Re:right filesystem by d3vi1 · · Score: 4, Informative

    Two aspects to your problem:

    1) Recovering from the current situation

    If you didn't make ANY changes to the filesystem after it was corrupted, you still have a chance with software like DiskWarrior or Stellar Phoenix. Never work on the original corrupted filesystem unless you have copies of it. So grab a second drive, connect it over USB and, using hdiutil or dd, copy the filesystem to the second drive. Once you do that, use DiskWarrior or Stellar Phoenix on either one of the copies, while keeping the other one intact. Always have an intact copy of the original FS. You might be successful trying multiple methods, so KEEP AN INTACT COPY.
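    A minimal sketch of the copy-first step, assuming dd and cmp are available. The name image_and_verify and the paths in the usage comment are hypothetical. It copies the source to an image file and then compares the two byte for byte, so you know the working copy matches before pointing a recovery tool at it. With default options dd stops at the first read error, so a disk with unreadable sectors needs a different approach.

```shell
# image_and_verify SOURCE IMAGE  (hypothetical helper)
# Copies SOURCE to IMAGE and succeeds only if the copy is byte-identical.
image_and_verify() {
    src=$1
    img=$2
    # Plain sequential copy with a 1 MiB block size; dd's statistics are discarded.
    dd if="$src" of="$img" bs=1048576 2>/dev/null || return 1
    # cmp -s prints nothing and exits 0 only when both inputs are identical.
    cmp -s "$src" "$img"
}

# Example invocation (device and path are assumptions; identify the right
# partition first and make sure it is not mounted read-write):
#   image_and_verify /dev/disk1s2 /Volumes/Spare/home-partition.img
```

    Make a second copy of the image file before experimenting, so one copy always stays untouched.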

    2) Avoiding it in the future
    NTFS is good at surviving a crash if and only if the crash occurs in Windows. Paragon NTFS for Mac/Linux or NTFS-3G don't use journaling to its full extent (for both metadata and data). So, if you get a crash while in Mac OS X or Linux, chances are that you get data corruption.

    Same goes for HFS+. While Mac OS X uses journaling on HFS+, Linux doesn't. It's read-only in Linux if it has journaling. Furthermore, the journaling is metadata only in HFS+.

    Now we get to the last journaled filesystem available to all 3 OSs: EXT3. It's the same crap as above.

    Because of the three points above, I have a conclusion: what you're looking for (ZFS) hasn't been invented on any of the OSs that you're using.
    Thus, I have a simple recommendation:
    Use ZFS in a VMware machine exported via CIFS/WebDAV/NFS/AFP to Linux, Windows or Mac OS X. A small FreeNAS VM with 256MB of RAM can run in VMWare Player and Workstation on Windows/Linux and Fusion on OS X.

    ZFS uses checksumming on the filesystem blocks, which lets you know of the silent corruptions. Furthermore, by design, it will be able to roll-back any incomplete filesystem transactions. I've had my arse saved by ZFS more times than I care to remember. The most difficult thing for my home storage system is to find external disk arrays that give me direct access to all the disks (not their RAID crap). A proper home storage system is RAIDZ2 (basically RAID6) + Hot Spare.

    Another way is to have a simple, TimeMachine-like backup solution on at least one of your operating systems. But even that doesn't catch silent data corruptions, let alone warn you. As such, we get back to: ZFS.

    --
    UNIX was not designed to stop you from doing stupid things, because that would also stop you from doing clever ones.
  19. Re:Newbie question hour? by Anonymous Coward · · Score: 5, Funny

    mplayer can detect corrupted movie and audio files:

    find . -name '*.mov' -exec mplayer -msglevel all=6 -speed 100.0 -framedrop -nogui -nolirc -cache 8192 -tskeepbroken -ao null -vo null {} \; | grep Warning! > $1.txt

    Change the *.mov as appropriate.

    <infomercial>its JUST. THAT. EASY folks!</infomercial>

  20. Photorec is great BUT by rduke15 · · Score: 3, Interesting

    Indeed, I used photorec/testdisk to recover mp4 files after they had (all) been accidentally deleted from an HFS+ partition.

    But when I first started it in its default mode, it "found" only rubbish, breaking up the actual mp4s into a mess of .doc, .xml, .jpg, .whatever files, including totally broken .mp4s.

    When I restarted it after configuring it to only look for .mov/.mp4, it did a fantastic job, and as far as I know, all files could be recovered. Of course, that was made easier by the fact that I knew that all the files which needed to be recovered were .mp4.

  21. Re:Your eyes by Score+Whore · · Score: 4, Informative

    Well, jpeg files have a structure that will generate detectable errors if it's damaged. So simply opening them with something as simple as djpeg from the IJG and piping the output to /dev/null should give you a pretty good start on damaged images. Something like this perhaps:

    find . -name "*jpg" -o -name "*jpeg" -o -name "*JPG" -o -name "*JPEG" | while IFS= read -r filename; do if djpeg "$filename" > /dev/null 2>&1; then :; else echo "$filename" is toast; fi; done

    You could probably do something similar with mpg123 and mplayer for .mp3 and movies.

  22. Re:Your eyes by Zaiff+Urgulbunger · · Score: 5, Informative
    Might be better using the "identify" command of ImageMagick. The man page says:

    The identify program is a member of the ImageMagick(1) suite of tools. It describes the format and characteristics of one or more image files. It also reports if an image is incomplete or corrupt.