Slashdot Mirror


Ask Slashdot: What's a Good Tool To Detect Corrupted Files?

Volanin writes "Currently I use a triple boot system on my Macbook, including MacOS Lion, Windows 7, and Ubuntu Precise (on which I spend the great majority of my time). To share files between these systems, I have created a huge HFS+ home partition (the MacOS native format, which can also be read in Linux, and in Windows with Paragon HFS). But last week, while working on Ubuntu, my battery ran out and the computer suddenly powered off. When I powered it on again, the filesystem integrity was OK (after a scandisk by MacOS), but a lot of my files' contents were silently corrupted (and my last backup was from August...). Mostly, these files are JPGs, MP3s, and MPG/MOV videos, with a few PDFs scattered around. I want to get rid of the corrupted files, since they waste space uselessly, but the only way I have to check for corruption is opening them up one by one. Is there a good set of tools to verify the integrity by filetype, so I can detect (and delete) my bad files?"

247 comments

  1. compare them to an intact backup by allo · · Score: 0

    Then you can see which files have changed, and check whether each changed file is much smaller or corrupted.

    1. Re:compare them to an intact backup by Pokermike · · Score: 2

      And even though your last backup is from August, this will still constrain the number of files you potentially have to eyeball.

    2. Re:compare them to an intact backup by Anonymous Coward · · Score: 2, Insightful

      Consider the possibility that the backup already contains corrupted files. I once had defective RAM where only one bit flipped occasionally. The machine was quite stable, so the defect went undetected and over a couple of months it silently corrupted hundreds of files. Unless he finds out what caused the crash, he can't be sure that the backup is alright.

    3. Re:compare them to an intact backup by Calos · · Score: 5, Insightful

      Well...

      My first suspicion would be that the filesystem is messed up, not the actual files. Unless s/he had a lot of pending writes to all of these files, there is no reason that something should have actually overwritten or garbled them when the power shut down. Much more likely was an impending or in-progress write to the filesystem's tables, which has affected where it thinks all the files' pieces are stored. And if that is the case, date modified and size may be irrelevant because those are going to be reported by the filesystem.

      Aside from trying to read back sector-by-sector data and assembling them, however, I don't know that there's a remedy.

      --
      I vote based on politicians' actions, unless contrary to my preconceptions. Often wrong, never uncertain. #iamthe99%
    4. Re:compare them to an intact backup by ncw · · Score: 5, Informative

      That is a good thought, and photorec does an excellent job of finding pictures and videos by searching through your sectors - definitely worth a try.

      http://www.cgsecurity.org/wiki/PhotoRec_Step_By_Step

      --
      Every man for himself, all in favour say "I"
    5. Re:compare them to an intact backup by VIPERsssss · · Score: 1

      I have used pc inspector file recovery with mixed results. It's slow but I have recovered pictures and documents. There's not much you can do if the file has already been stepped on.

      --
      We are eternal, all this pain is an illusion.
    6. Re:compare them to an intact backup by LordLimecat · · Score: 2

      Seconding the photorec / testdisk suite, they are incredible. I would rate it up with ddrescue as the top 2 data recovery tools.

    7. Re:compare them to an intact backup by DrVxD · · Score: 1

      Consider the possibility that the backup already contains corrupted files

      In which case, it's not a backup - it's just a waste of storage space.

      --
      Not everything that can be measured matters; Not everything that matters can be measured.
    8. Re:compare them to an intact backup by Anonymous Coward · · Score: 0

      WinHex is an excellent disk editor that will let you see all the files regardless of what the filesystem says. It's also used in forensics.

    9. Re:compare them to an intact backup by bedonnant · · Score: 1

      This happened to me once. At first I thought it was a hard drive problem, it took me a while to figure it out. I lost many pictures in the process, and had rsync'd corrupted files over my backup. To me the problem now is: how to make sure what I'm backing up is not corrupted and that my previous backup, about to be overwritten, is not cleaner than my source?

      --
      ~~~ Paf. Le chien.
  2. AppleScript by noh8rz3 · · Score: 3, Interesting
    An AppleScript / Automator script can step through files on a hd, open them, and catch a thrown error if the open fails. Tis sits a good automated way to glad the bad ones. Not the fastest method, but it could run at night.

    you seem to be surprisingly ok with the fact that your computer crashed and all your documents and media were corrupted, as was your backup. I would have been beside myself. Hulk smash! Please let us know what different set ups you're exploring to avoid this.

    1. Re:AppleScript by Anonymous Coward · · Score: 0

      Tis sits a good automated way to glad the bad ones.

      What? Are you sure your computer isn't crapping out on you too?

    2. Re:AppleScript by dgatwood · · Score: 3, Insightful

      But the open usually won't fail. Unless the error is within the header bytes of a movie or image, the media will open, but will appear wrong. Worse, there is no way to detect this corruption because media file formats generally do not contain any sort of checksums. At best, you could write a script that looks for truncation (not enough bytes to complete a full macroblock), or write a tool that computes the difference between adjacent pixels across macroblock boundaries and flags any pictures in which there is an obvious high energy transition at the macroblock boundary, but even that cannot tell you whether the image is corrupt or simply compressed at a low quality setting with lots of blocking artifacts.

      The short answer, however, is "no". Such corruption can't usually be detected programmatically.

      --

      Check out my sci-fi/humor trilogy at PatriotsBooks.

    3. Re:AppleScript by dgatwood · · Score: 1

      I should clarify. If you are intimately familiar with the format, and if it is a multi-frame format, such as a compressed audio or video format, it is possible to programmatically detect that there are frames that reference illegal frames, frames whose structure is not valid, etc. in much the same way that you can detect a JPEG file whose header is invalid.

      Again, though, none of this will be caught by merely opening the movie; the movie will generally play correctly up until the decoder encounters the error, at which point it may recover and continue playing content after the gap, or it may just choke and die. Either way, detection isn't something that can be easily automated.

      --

      Check out my sci-fi/humor trilogy at PatriotsBooks.

    4. Re:AppleScript by vlm · · Score: 2

      The TLDR version is this scenario is why you configure your mythtv box to store MPEG TS which have embedded CRC error detection and recovery instead of MPEG PS which are irrelevantly smaller, if you have the option.

      --
      "Science flies us to the moon. Religion flies us into buildings." - Victor Stenger
    5. Re:AppleScript by K.+S.+Kyosuke · · Score: 1

      Doesn't MPlayer report most file corruptions to stdout or stderr even if the playback continues? You should be able to grep for it. Granted, it isn't bulletproof, but I often get warnings even if the playback seems fine - it seems to be sensitive. I don't think it would ignore jumbled sectors.

      --
      Ezekiel 23:20
    6. Re:AppleScript by jasno · · Score: 3, Interesting

      Here's what I did when I realized my mp3 collection on my Mac was slowly dying:

      find . -type f -print -exec sh -c 'cat "$1" > /dev/null' sh {} \;

      it takes a while, but for files with ioerrors you'll see a warning printed after the file name. Put the output in a file and you can use grep(the 'B' option comes to mind) to get a list of the bad files.
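
The same read-everything sweep can be sketched in Python. This is a minimal sketch, not the poster's script: the function name `find_unreadable` and the chunk size are assumptions made for illustration. It only catches errors the operating system reports while reading, so silently altered bytes will pass.

```python
import os

def find_unreadable(root, chunk_size=1 << 20):
    """Read every file under root; return (path, error) for each read failure."""
    failures = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as handle:
                    # Read to the end so the whole file is actually requested.
                    while handle.read(chunk_size):
                        pass
            except OSError as exc:
                failures.append((path, str(exc)))
    return failures
```

Calling `find_unreadable("/path/to/music")` returns a list that can be printed or saved, which avoids grepping through mixed output afterwards.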

      The sad thing is that Time Machine didn't seem to notice that the files were bad, so now the files are gone forever. Disk Utility didn't help.

      Shouldn't there be a way to find bad blocks on OS X? I looked around and all I could find were commercial products.

      --

      http://www.masturbateforpeace.com/
    7. Re:AppleScript by LordLimecat · · Score: 1

      File corruption won't generate ioerrors, I don't think. Your system may be able to properly read data from the disks, data that it thinks is what you requested; it's just that the data is bad. A computer generally isn't going to be able to detect that without either knowledge of the file format or checksums.

    8. Re:AppleScript by s.petry · · Score: 1

      This is what the mythic WinFS is supposed to catch, and why it performs so horribly. Relies on external libraries that understand file formats just to write a file and check integrity.

      --

      -The wise argue that there are few absolutes, the fool argues that there are no probabilities.

    9. Re:AppleScript by dgatwood · · Score: 1

      File corruption won't generate ioerrors, I don't think.

      Depends on the cause of the corruption. If it was caused by bad RAM or a hard drive with bad cache memory on the board, causing filesystem corruption that randomly permutes data, then it won't cause any detectable I/O error. If it was caused by a bad block, it will. Then again, when your OS repaired the file system, it should have flagged the overlapping extents/blocks in such a way that you can determine which files were potentially corrupted, so there should still be a log unless the corruption is occurring actively while you read data off the disk (bad cache RAM on the drive itself), in which case your best bet is probably reading data off the disk a block at a time, flushing the disk cache before each block read command. This will take weeks, however. Or swap the controller board from an identical drive and use that board while cloning it to a third drive.

      The latter type of corruption (bad blocks) is more common than the former (silent corruption), though regrettably the former seems to be becoming more and more common lately, which has me growing quite concerned about data integrity with modern hard drives.

      --

      Check out my sci-fi/humor trilogy at PatriotsBooks.

    10. Re:AppleScript by wolrahnaes · · Score: 1

      Relies on external libraries that understand file formats just to write a file and check integrity.

      *citation needed*

      To my knowledge WinFS uses checksums for data integrity, just like many other filesystems. The external libraries that understand the file are for metadata extraction on files saved by applications that are not WinFS aware, since the main "selling point" of WinFS was its metadata storage and search capabilities. WinFS-aware applications will of course be expected to set their own metadata.

      --
      I used to get high on life, but I developed a tolerance. Now I need something stronger.
  3. file(1) by Anonymous Coward · · Score: 1

    If the entire contents of the files are messed up, you could write a quick script that compares the output of file(1) to the file extension. I wouldn't call this high-fidelity - I'd recommend using this to generate a list you go through by hand - but it's at least a starting place.
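
As a rough illustration of that idea, here is a sketch that compares a file's leading bytes with what its extension promises. It is a stand-in for file(1), not a wrapper around it: the `MAGIC` table is a tiny hand-picked subset, the MP3 entries cover only the most common frame headers, and the function name is hypothetical. Like file(1), it looks only at the header, so damage further into the file goes unnoticed.

```python
import os

# Leading bytes for a few formats; a small, assumed subset of what file(1) knows.
MAGIC = {
    ".jpg": [b"\xff\xd8\xff"],
    ".jpeg": [b"\xff\xd8\xff"],
    ".pdf": [b"%PDF-"],
    ".mp3": [b"ID3", b"\xff\xfb", b"\xff\xfa", b"\xff\xf3", b"\xff\xf2"],
}

def header_matches_extension(path):
    """Return True or False for known extensions, None when no signature is listed."""
    ext = os.path.splitext(path)[1].lower()
    signatures = MAGIC.get(ext)
    if signatures is None:
        return None
    with open(path, "rb") as handle:
        head = handle.read(8)
    return any(head.startswith(sig) for sig in signatures)
```

Files that come back `False` are candidates for the by-hand list; `None` just means the table has nothing to say about that type.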

    1. Re:file(1) by Volanin · · Score: 3, Informative

      Author here:

      At first I thought this idea wouldn't work. As some people have already written here, the 'file' command sometimes just checks for a few bytes. But since it is so easy to implement, why not give it a try? And indeed, for videos it worked quite well. Some of the corrupted MOV files were detected simply as 'data file' or even 'MPEG sequence' and were promptly deleted! Thank you for the idea.

      --
      If I clone myself, can I call it a thread?
      If a girl winks to us, can I call it a race condition?
    2. Re:file(1) by blueg3 · · Score: 1

      Another possibility is to use hachoir to check the validity of each file's internal metadata.

  4. BSOD? by G3ckoG33k · · Score: 1

    "What's a Good Tool To Detect Corrupted Files?"

    BSOD?

    1. Re:BSOD? by linear+a · · Score: 1

      "BSOD?" Naw. That just detects a corrupt O/S.

  5. Linux Command: file by Anonymous Coward · · Score: 1, Insightful

    Try running "file" from a command line on a few files you know to be corrupt. If the file command tells you the same, you could run a quick bash script to loop through the files and spit out the names of the bad ones. This is all assuming you know what you are doing with shell scripting.

  6. CRC32 or any other quick hash of the files by Anonymous Coward · · Score: 0

    Compare each file's hash to the known hash of a good copy.

  7. This might help by Anonymous Coward · · Score: 0

    I'm not entirely sure if this will apply to your setup because corruption is kinda sporadic but....

    Way back in my IRC days I ran an Fserv, DCC transfers between different versions of MIRC was a nightmare and quite often ended up corrupting files.

    What I did, as a half assed method, was write a simple batch script (this was a long time ago) that scanned my folders and created a text file listing of each file, extension, and most importantly, file size. I'd try to run that script daily, but you could easily automate it to run with scheduling.

    Then I wrote another simple script to basically compare the current folder contents with the latest list I created; any differences in file sizes would be spit out to another text file. This would be basic command line stuff in linux, but again, this only catches the files that have changed size since the scan, it's not a foolproof corruption detection method.
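
The two scripts described above might look something like this in Python. This is a sketch under assumptions: the original was a batch script whose details are not given, and the names `snapshot_sizes` and `changed_sizes` are invented here.

```python
import os

def snapshot_sizes(root):
    """Map each file's path (relative to root) to its size in bytes."""
    sizes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            sizes[os.path.relpath(path, root)] = os.path.getsize(path)
    return sizes

def changed_sizes(old, new):
    """Return {path: (old_size, new_size)} for files in both snapshots whose size differs."""
    return {p: (old[p], new[p]) for p in old if p in new and old[p] != new[p]}
```

Saving each snapshot (for example as JSON) and comparing the latest two gives the daily report; as noted, it flags truncation and growth only, not bytes that changed in place.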

    1. Re:This might help by Anonymous Coward · · Score: 1

      That won't help detect corruption, only truncation of files. You would need an md5 or similar hash.

    2. Re:This might help by jkflying · · Score: 1

      CRC is faster than md5, and for random corruption just as effective.

      --
      Help I am stuck in a signature factory!
    3. Re:This might help by vlm · · Score: 1

      That won't help detect corruption, only truncation of files. You would need an md5 or similar hash.

      md5 is (relatively) slow. a simple CRC-32 will only fail you for 1 in 2 ** 32 corruptions, and I suspect the guy doesn't even have 2 ** 16 files so the odds are CRC-32 is more than good enough and significantly faster.

      Then again, he's probably going to be hard drive speed limited not CPU limited. Then again, no point wasting laptop battery on an overly complicated algorithm. CRC32 is gonna use at least 1/5th the CPU/wallclock time and/or battery of md5.

      The tradeoff boils down to you can use md5 and burn at least 5 times more battery/heat/wall clock time (whatever is your limiting reagent) in exchange for (128-32) = 2 ** 96 times lower likelihood of mistake. The problem with accepting 2 ** 96 higher reliability is his dying hard drive probably cannot provide 2 ** 16 reliability so increasing the algorithm is a waste since it's already asymptotically limited.

      --
      "Science flies us to the moon. Religion flies us into buildings." - Victor Stenger
    4. Re:This might help by flargleblarg · · Score: 1

      md5 is (relatively) slow. a simple CRC-32 will only fail you for 1 in 2 ** 32 corruptions, and I suspect the guy doesn't even have 2 ** 16 files so the odds are CRC-32 is more than good enough and significantly faster.

      Not true. At all. On modern systems, MD5 is just as fast as CRC-32 because the disk is the bottleneck, not the CPU.
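
Whether the CPU or the disk is the limit is easy to check on a given machine, since both digests can be computed in a single pass over the file. The following is a sketch using Python's standard `zlib` and `hashlib` modules; the function name `crc32_and_md5` is hypothetical.

```python
import hashlib
import zlib

def crc32_and_md5(path, chunk_size=1 << 20):
    """Stream the file once and return (crc32_hex, md5_hex)."""
    crc = 0
    md5 = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)  # running CRC over the chunks seen so far
            md5.update(chunk)
    return format(crc & 0xFFFFFFFF, "08x"), md5.hexdigest()
```

Timing this against a version that only reads the chunks would show how much of the wall-clock time is hashing and how much is waiting on the disk.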

    5. Re:This might help by thereitis · · Score: 1

      Today with Mercurial/Git you can easily create a local repository that contains your text file listing and check in changes. Your 'hg log' will show all differences over time, which could be useful.

    6. Re:This might help by thereitis · · Score: 1

      I wrote a disk cataloger years ago and was surprised at the number of CRC32 collisions I found within my own set of files (on CDs at the time - I have a lot more data today). I recall that pairing CRC32 with file size helped a lot (and probably even solved the problem in my case).

    7. Re:This might help by petermgreen · · Score: 1

      Identification puts much higher requirements on a hash algorithm than corruption detection.

      Let's consider corruption detection.

      Say you have c corrupted files.
      Now suppose our hash has m possibilities, all equally likely (the ideal case for a hash function).
      The average number of files that will pass the hash test despite being corrupt is c/m.

      Now let's consider identification.

      Say you have n different files. The number of possible pairs of files is ((n^2)-n)/2. For large n we can approximate this as (n^2)/2.
      Now suppose our hash has m possibilities, all equally likely (the ideal case for a hash function). The probability of any given pair of files having colliding hashes is 1/m.
      So the average number of collisions is approximately (n^2)/(2m).

      Do the sums and you find you need a much larger hash to get acceptable performance in an identification application than in a corruption detection application.
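
To put numbers on those two formulas, here is a small sketch. The file counts used below are made-up examples and the function names are hypothetical; the arithmetic assumes an ideal hash and no malice, as stated above.

```python
def expected_missed_corruptions(corrupted_files, hash_bits):
    """c/m: corrupt files expected to slip past a check against a known-good hash."""
    return corrupted_files / 2 ** hash_bits

def expected_collisions(total_files, hash_bits):
    """(n^2 - n) / (2m): pairs of distinct files expected to share a hash."""
    pairs = total_files * (total_files - 1) / 2
    return pairs / 2 ** hash_bits
```

With a 32-bit hash, 1,000 corrupted files give about 2.3e-7 expected misses, while cataloguing 100,000 distinct files gives about 1.16 expected collisions. The same hash is ample for one job and inadequate for the other.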

      Note: all the above assumes no malice is involved. If malice is involved then the requirements on the hash get much tighter.

      --
      note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
  8. The BEST method.. by Anonymous Coward · · Score: 5, Funny

    is urgency. Corrupted files have the ability to detect urgency and your discovery of them will come in a form compatible with the laws of Murphy.

  9. WHAT GOOD TOOL??? by Anonymous Coward · · Score: 0

    What ever. Look at /. advertisers. Why are you pestering us with mundane marketing questions?

    I know - /. is a marketing company now ... gathering marketing data.

    My Bad.

  10. No easy answer by gstrickler · · Score: 1, Insightful

    1. Compare to backup, files that match are ok.
    2. AppleScript option others mentioned may help reduce it further.
    3. Backup regularly, and verify your backup procedure.
    4. Anything else will cost you consulting rates.

    --
    make imaginary.friends COUNT=100 VISIBLE=false
    1. Re:No easy answer by Anonymous Coward · · Score: 0

      Yes there is. I used a 15lb. sledgehammer on my eslate that I suspected had corrupt files and I was right. They were all corrupted.

    2. Re:No easy answer by interval1066 · · Score: 0

      Perl or bash will do this quite easily: run a hash compare of the two files, and if they don't match, delete the bad file. Is this a serious question?

      --
      Python: 'And then suddenly you have a language which says "we're all stuck with whatever the whiniest coder wants".'
    3. Re:No easy answer by gstrickler · · Score: 1

      Which part of "last backup is from August" is unclear to you? For any file that has been updated since his terribly outdated back up, your answer is useless and incorrect. Or did you not RTFS?

      --
      make imaginary.friends COUNT=100 VISIBLE=false
    4. Re:No easy answer by nateb · · Score: 1

      It's simple enough to check file dates to mitigate the generation of false positives.

      --
      -- Nate
    5. Re:No easy answer by gstrickler · · Score: 1

      That still doesn't address the issue. Yes, that will identify files that the metadata says haven't changed, but don't match. That's step 1 in my original post. But it does absolutely nothing about determining which files that are new or have changed since his last backup may be corrupt (step 2 in my list).

      Neither you, nor the poster I replied above have added anything to the discussion. If you have read the question, and have something useful to add, please do so.

      --
      make imaginary.friends COUNT=100 VISIBLE=false
    6. Re:No easy answer by Anonymous Coward · · Score: 0

      Try a commercial product (I'm a user not a salesman): Beyond Compare (from http://www.scootersoftware.com/)

      It will do the file comparison in several convenient ways. Works on Linux and Windows systems. A free trial is available and it is very affordable if you want to buy it.

    7. Re:No easy answer by DrVxD · · Score: 1

      Why bother hashing? Just compare the contents of the files.

      --
      Not everything that can be measured matters; Not everything that matters can be measured.
    8. Re:No easy answer by CountBrass · · Score: 1

      The problem is the answer that takes into account that he hasn't taken a backup in 8 months is "you are screwed, learn from it and move on."

      The gp was trying to be more helpful: don't bitch at him because the op was foolish.

      --
      Bad analogies are like waxing a monkey with a rainbow.
    9. Re:No easy answer by gstrickler · · Score: 1

      Which part of his answer adds anything to my original post? Answer: none. His answer addresses only the first step in my original post, and with an 8 month old backup, that step alone isn't very useful.

      --
      make imaginary.friends COUNT=100 VISIBLE=false
  11. For MP3s use amp3test.exe by denis-The-menace · · Score: 5, Informative

    2000-2001 MAF-Soft http://www.maf-soft.de/
    The version I have is v1.0.3.102

    It can scan single mp3s and entire folder structures for defects and log everything if you wish. It will give you a percentage of how good the file is.

    Depending on the damage you may be able to fix headers and chop off corrupted tag info with something like a MP3Pro Trim v1.80.exe

    --
    Obama's legacy: (N)othing (S)ecure (A)nywhere and (T)error (S)imulation (A)dministration
  12. md5 and shell scripting by swanzilla · · Score: 1, Offtopic

    Go nuts.

  13. diff by Anonymous Coward · · Score: 0

    use the 'diff' command between your backups and the originals.

    diff -rq /backuplocation /originallocation could work a treat.

    Switches:
    r = recursive
    q = tell only if files differ
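
For anyone who wants the list of differing files inside a script rather than on the terminal, a rough Python counterpart to `diff -rq` could look like this. It is a sketch: the function name `differing_files` is an assumption, and it reports only files that exist in both trees.

```python
import filecmp
import os

def differing_files(original_root, backup_root):
    """Yield relative paths present in both trees whose bytes differ."""
    for dirpath, _dirnames, filenames in os.walk(backup_root):
        for name in filenames:
            backup_path = os.path.join(dirpath, name)
            rel = os.path.relpath(backup_path, backup_root)
            original_path = os.path.join(original_root, rel)
            # shallow=False forces a byte-by-byte comparison of the contents.
            if os.path.isfile(original_path) and not filecmp.cmp(
                original_path, backup_path, shallow=False
            ):
                yield rel
```

The `shallow=False` argument matters here: the default shallow mode can treat two files as equal when their size and modification time match, which is exactly the situation with silent corruption.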

    1. Re:diff by hoggoth · · Score: 2

      These comments are full of 'helpful' suggestions to compare to backup or to md5's generated from the backups.
      That makes no sense.
      If he has a good set of backups JUST RESTORE THE BACKUPS to get known good files back. Why would you read every backup file and every current file, then compare them, then make a list of ones that don't match just to restore the backups. Restore them all. done.

      --
      - For the complete works of Shakespeare: cat /dev/random (may take some time)
  14. Can you compare to backup? by Anonymous Coward · · Score: 1

    Suppose your volume is mounted under /mnt/a and your backup is mounted under /mnt/b. Something like this should work:

    find /mnt/a -type f -mtime -2 | sed 's|^/mnt/a/||' | while IFS= read -r f ; do
        cmp "/mnt/a/$f" "/mnt/b/$f"
    done

    That should find all files which have been modified within the past 2 days and differ from your backup, which will help narrow down your search. I don't know of a tool that will address your specific question about testing for integrity for particular file formats. For specific file formats, you can automate this, of course, like using ImageMagick for image files, but I don't know of a tool that will just do everything. It shouldn't be hard to write a script to look at the extension and the output of the "file" command and determine which tool to use to automatically check integrity for that specific file format.

  15. md5sum by sl4shd0rk · · Score: 3, Interesting

    or sha1sum if you prefer. Automate in cron against a list of knowns.

    eg:
    $ md5sum /home/wilbur/Documents/* > /home/wilbur/Docs.md5
    $ md5sum -c /home/wilbur/Docs.md5
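
The same record-then-check cycle, done recursively, can be sketched in Python. This is an illustration rather than a replacement for md5sum: the names `build_manifest` and `verify_manifest` are hypothetical, and the manifest is kept as an in-memory dict that a real script would save to disk.

```python
import hashlib
import os

def file_md5(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(root):
    """Map each file's path (relative to root) to its MD5 digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            manifest[os.path.relpath(path, root)] = file_md5(path)
    return manifest

def verify_manifest(root, manifest):
    """Return sorted relative paths that are missing or no longer match the manifest."""
    bad = []
    for rel, expected in manifest.items():
        path = os.path.join(root, rel)
        if not os.path.isfile(path) or file_md5(path) != expected:
            bad.append(rel)
    return sorted(bad)
```

Running the verify step before each backup would flag files whose contents changed, though it cannot tell an intentional edit from corruption without also looking at modification times.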

    --
    Join the Slashcott! Feb 10 thru Feb 17!
    1. Re:md5sum by subtr4ct · · Score: 3, Informative

      This type of approach is automated in a python script here.

    2. Re:md5sum by Anonymous Coward · · Score: 1

      another script that wraps md5sum with some convenience features:

      http://worldsworstsoftware.com/md5tool.html

    3. Re:md5sum by Anonymous Coward · · Score: 0

      or sha1sum if you prefer. Automate in cron against a list of knowns.

      eg:
      $ md5sum /home/wilbur/Documents/* > /home/wilbur/Docs.md5
      $ md5sum -c /home/wilbur/Docs.md5

      Unless you suspect malicious tampering no need to step up to sha1sum. It's more intensive computationally, and md5 clashes are rare enough that you won't have a problem within your lifetime.

      Also I use fsum because it will do entire directory trees recursively.

    4. Re:md5sum by chagrinish · · Score: 1

      or sha1sum if you prefer. Automate in cron against a list of knowns.

      eg:
      $ md5sum /home/wilbur/Documents/* > /home/wilbur/Docs.md5
      $ md5sum -c /home/wilbur/Docs.md5

      Definitely comparing md5/sha hashes against the backup files is the way to go. This will get every corrupted file regardless of whether the file has structural integrity.

  16. Re:Gamemaker sucks ass by binarylarry · · Score: 2, Funny

    Have some respect, the man just lost his entire porn stash.

    --
    Mod me down, my New Earth Global Warmingist friends!
  17. backups, backups, backups by ballyhoo · · Score: 1

    If you're talking about recovery tools, you're already on the wrong track. A Time Capsule costs $300. How much is your data worth? How much are the tools going to cost to recover it? How much is your time worth? I'll bet that the sum of those last three things is a whole pile more than 300 bucks.

    If I were you, the thing I'd buy right now is a good backup solution. Re: your existing data, take a full image of your hard disk and take your time recovering it.

    Once you've got a new backup system, you can then sit there with a big smile on your face and comment smugly on all future /. posts about data loss.

    Have I lost data? Hell yeah. And it will never happen again.

    -bh

    1. Re:backups, backups, backups by spire3661 · · Score: 1

      This is pretty much the overriding sentiment. OP is going to get smugness from us because we've all been there and we all know there is no substitute for vigilance. He failed in his vigilance. I'm not trying to be a dick but rather to really drive home that backups are serious and you should treat them as such. Data > hardware ALWAYS.

      P.S. Synology NAS > Time Capsule by an order of magnitude.

      --
      Good-bye
    2. Re:backups, backups, backups by AshtangiMan · · Score: 1

      Agreed. My data loss happened from theft, and the backup was stolen as well. Now my backup drive sits hidden away, wirelessly capturing my backups. Time Capsule is a good solution, but there are others. I just bought a 2TB external drive for $160, which combined with a wireless router that has a USB port could be a less expensive alternative. I'm actually thinking that the 2TB drives might not be a good backup solution, and am looking into building a NAS specifically for backup using four 500GB drives in a RAID 5 configuration (I know RAID isn't backup, but that doesn't mean your backup can't be a RAID array).

  18. double-click on the file icon in Explorer by Anonymous Coward · · Score: 0

    If you see IE come up, followed by several dozen browser tabs in split-second progression, followed by a lot of message boxes proclaiming "Warning! This computer is infected by trojan pirate hacker malware viruses!". And you later open your inbox and see a lot of spam from debt remediation services.

    That's a bad sign. I would delete the file.

  19. A question about NTFS versus other file systems... by Anonymous Coward · · Score: 0

    An honest question :

    I've had several crashes over the years with Windows XP but the files, data and system files were never corrupted.
    In linux it seems that file systems are not very resilient, and the least crash can corrupt your files.
    Is NTFS such a good well designed file system compared to linux file systems ?

  20. For JPEGs by Jethro · · Score: 4, Informative

    You can run jpeginfo -c. I have a script that runs against a directory and makes a list for when I do data recovery for all my friends who don't listen when I tell them their 10 year old laptop may be dying soon.
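
For comparison, a much cruder check that needs no extra tools is to confirm that a JPEG begins with the start-of-image marker and ends with the end-of-image marker. This sketch is not what jpeginfo does (jpeginfo -c actually tries to decode the image), and the function name is hypothetical. It catches truncated files but not damage in the middle, and it will wrongly flag valid files that carry extra bytes after the end marker.

```python
import os

SOI = b"\xff\xd8"  # JPEG start-of-image marker
EOI = b"\xff\xd9"  # JPEG end-of-image marker

def jpeg_has_start_and_end_markers(path):
    """Return True if the file starts with SOI and its last two bytes are EOI."""
    with open(path, "rb") as handle:
        head = handle.read(2)
        handle.seek(0, os.SEEK_END)
        if handle.tell() < 4:
            return False  # too short to hold both markers
        handle.seek(-2, os.SEEK_END)
        tail = handle.read(2)
    return head == SOI and tail == EOI
```

Anything this flags is worth a closer look with a real decoder before it gets deleted.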

    --


    In the land of the blind, the one-eyed man is kinky.
    1. Re:For JPEGs by Volanin · · Score: 1

      Author here:

      This method detected a single corrupted picture.
      Probably my pictures were the least affected of all my data.
      Thanks for the great idea.

      --
      If I clone myself, can I call it a thread?
      If a girl winks to us, can I call it a race condition?
    2. Re:For JPEGs by Jethro · · Score: 1

      Glad I could help!

      --


      In the land of the blind, the one-eyed man is kinky.
  21. Use mtree by Anonymous Coward · · Score: 0

    see man mtree for details.

  22. the answer is not "file" by vlm · · Score: 2

    Unix "file" is not the answer. For some formats it does as little as look at a couple header bytes. It's a great tool to guess a format. It's a terrible verifying parser and does nothing to verify content.

    An example of what I'm getting at, with some made up details, unfortunately html is not like well formed xml and every viewer is different anyway so the best way to figure out if a html web page file format is corrupt is unfortunately to pull it up in firefox. This only detects corruption in the structure of the file, if the corruption is just a couple bits then you end up with problems like tQis where the only way to see the h got fouled up is to write more or less a IQ 100 artificial intelligence. All "file" is going to test is pretty much does the file begin with or contain a regex something like less-than html greater-than (getting past the filters).

    For content you could F around with, for example, piping an mp3 file thru a decoder and then thru an averaging spectrum analyzer and see if there's anything overly unusual in the spectrum. Also some heuristics: if the file is only 1 second long, then it's F'ed up.

    --
    "Science flies us to the moon. Religion flies us into buildings." - Victor Stenger
    1. Re:the answer is not "file" by Anonymous Coward · · Score: 0

      This has implications for steganography too. If you could solve this, it would make it a lot harder to hide data.

  23. Mod parent up by Anonymous Coward · · Score: 0

    After one page of nonsense and replies from people who didn't even bother to read the question, finally a useful answer!

  24. Why are there no good desktop filesystems? by Anonymous Coward · · Score: 0

    Why isn't there even one filesystem for desktop PCs which stores per-block checksums in the metadata? We store the freakin' last-accessed timestamp whenever a file is read, but no checksums?

    1. Re:Why are there no good desktop filesystems? by vlm · · Score: 1

      This reminds me of parity and ECC memory battles of decades past. OK, so it detects an error... Then what? Shut off the power? Not really sure what you'll be gaining. The sole example where it works is when you have the policy and budget to replace anything that takes an error. Useless for this situation.

      --
      "Science flies us to the moon. Religion flies us into buildings." - Victor Stenger
    2. Re:Why are there no good desktop filesystems? by nyctopterus · · Score: 1

      Then what? Restore from last (good) backup, instead of propagating the corrupted file through the backup system until the good version is lost, surely?

    3. Re:Why are there no good desktop filesystems? by Anonymous Coward · · Score: 0

      ZFS and btrfs both do just that.

    4. Re:Why are there no good desktop filesystems? by marcosdumay · · Score: 1

      The sole example where it works is when you have the policy and budget to replace anything that takes an error.

      Ok, forgetting that ECC also corrects random errors that happen on functional hardware... WTF? Of course detecting problems is only useful if you have the 'policy' of correcting them somehow.

    5. Re:Why are there no good desktop filesystems? by Anonymous Coward · · Score: 0

      The upcoming Linux btrfs does use checksums. For RAID setups, it uses them to automatically recover corrupt blocks from the other copies/parity. For RAID0 or non-RAID, it can only tell you that the file is corrupt though.

    6. Re:Why are there no good desktop filesystems? by vlm · · Score: 1

      Most end users don't have that policy. Is it running right now? Well wait until it breaks completely and is no longer usable in any form.

      --
      "Science flies us to the moon. Religion flies us into buildings." - Victor Stenger
    7. Re:Why are there no good desktop filesystems? by petermgreen · · Score: 1

      The real trick is to combine checksumming with mirroring or reconstructive parity.

      If you only have mirroring/reconstructive parity you can reconstruct the data if you know which block is bad.
      If you only have checksumming you can detect which block is bad but you can't do anything about it.
      If you have BOTH mirroring/reconstructive parity and things are set up properly so they work together*

      Afaict this was/is the killer feature of zfs, I think btrfs now has it too but i'm not positive (though the impression i've got is that btrfs is still too unstable generally to be used as a main fs).

      *Having a raid system below a checksumming system doesn't work for this because the raid system may overwrite the good data with the bad during a resync. Having a checksumming system below a raid system would work but runs into the problem that checksumming is difficult to do efficiently below the filesystem layer (because you need to store the checksums somewhere) so the only real way to do it efficiently and correctly is to integrate both checksumming and redundancy into the filesystem.

      --
      note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
    8. Re:Why are there no good desktop filesystems? by petermgreen · · Score: 1

      If you have BOTH mirroring/reconstructive parity and things are set up properly so they work together*

      oops, didn't complete that bit, it should have said

      If you have BOTH checksumming and mirroring/reconstructive parity and things are set up properly so they work together* then you can detect which block is bad and then take action to recover the data.
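
      To make that concrete, here is a toy sketch in Python of checksums and a mirror working together. It is only an illustration under made-up assumptions (two in-memory copies, 4-byte blocks, SHA-256 per block, and the hypothetical helper names split_blocks, checksum and read_with_repair); it is not how ZFS or btrfs are actually implemented.

```python
import hashlib

BLOCK_SIZE = 4  # tiny block size, chosen only to keep the example readable


def split_blocks(data, size=BLOCK_SIZE):
    """Cut data into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def checksum(block):
    """Per-block checksum, stored separately from the block itself."""
    return hashlib.sha256(block).hexdigest()


def read_with_repair(mirror_a, mirror_b, checksums):
    """Read every block, using the checksum to pick the good copy.

    A copy that fails its checksum is overwritten from the other mirror.
    Returns (data, repaired_block_indices); raises ValueError when both
    copies of a block are bad, because then nothing can be recovered.
    """
    good = []
    repaired = []
    for i, expected in enumerate(checksums):
        a_ok = checksum(mirror_a[i]) == expected
        b_ok = checksum(mirror_b[i]) == expected
        if a_ok and not b_ok:
            mirror_b[i] = mirror_a[i]
            repaired.append(i)
        elif b_ok and not a_ok:
            mirror_a[i] = mirror_b[i]
            repaired.append(i)
        elif not a_ok and not b_ok:
            raise ValueError("block %d is corrupt on both mirrors" % i)
        good.append(mirror_a[i])
    return b"".join(good), repaired
```

      The checksum says which copy is bad and the mirror supplies the good data; with only one of the two, the read could either detect the damage without fixing it, or not know which copy to trust.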

      --
      note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
  25. Using "find" and "file" by Anonymous Coward · · Score: 0

    Well, I don't own nor use any Apple products, but I do have some Linux experience.

    It would depend entirely on the type of corruption (fully corrupt vs randomly corrupt), but you could try the find command, combined with the file command to detect a file's MIME type (which, if zeroed or garbage, would be obvious).

    For example:

        find /path/to/start/at -type f -exec file "{}" \;

    That would look in "/path/to/start/at" and everything below, for all files ("-type f"), and for each one found, would run the file command on it ('-exec file "{}"'). The last part is "\;" because you need to tell find where the -exec command ends, and since ";" already means something to the shell, we need to escape it with a backslash ("\").

    The output you get will depend on what it is you are running the command on. If it is an image, file will report such. If it just says "data" or "empty", it is probably corrupt, or worth manually investigating.
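
    The same idea can be sketched in Python if you want to list only the suspicious files instead of reading through the whole output. This is a rough sketch, not a replacement for file: the MAGIC table covers just two formats, and the names classify and scan are made up for the example. It only catches files that are empty, start with a run of zero bytes, or lack the expected header; damage further into a file goes unnoticed.

```python
import os

# Leading bytes expected for a few extensions (illustrative, far from complete).
MAGIC = {
    ".jpg": b"\xff\xd8\xff",
    ".jpeg": b"\xff\xd8\xff",
    ".pdf": b"%PDF-",
}


def classify(path):
    """Return 'empty', 'zeroed', 'bad-header' or 'ok' for one file."""
    if os.path.getsize(path) == 0:
        return "empty"
    with open(path, "rb") as fh:
        head = fh.read(4096)
    if not any(head):  # the first 4 KiB (or the whole file) is all zero bytes
        return "zeroed"
    ext = os.path.splitext(path)[1].lower()
    magic = MAGIC.get(ext)
    if magic is not None and not head.startswith(magic):
        return "bad-header"
    return "ok"


def scan(root):
    """Yield (path, verdict) for every regular file under root that is not 'ok'."""
    for dirpath, _dirs, files in os.walk(root):
        for name in sorted(files):
            path = os.path.join(dirpath, name)
            if not os.path.isfile(path):  # skips broken symlinks and the like
                continue
            verdict = classify(path)
            if verdict != "ok":
                yield path, verdict
```

    Something like `for path, verdict in scan("/path/to/start/at"): print(verdict, path)` would then print only the files worth a manual look.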

  26. Re:Newbie question hour? by Volanin · · Score: 5, Informative

    Author here:

    > Last backup August.
    Yes, that was silly of me.

    > Thinks there is a way to detect generic file corruption
    There is no way to detect generic file corruption. But there is a way to detect filetype-specific corruption. For example, I already found mp3val, which can scan all my mp3s, check file integrity, and even fix a few kinds of corruption (such as mismatched bytes in the header and sound chunks). Maybe with the right set of tools, I might also detect (or even fix) my corrupted pictures, movies and books as well.

    --
    If I clone myself, can I call it a thread?
    If a girl winks to us, can I call it a race condition?
  27. Re:A question about NTFS versus other file systems by lordbeejee · · Score: 0

    Not sure if it's the HFS+ filesystem, I let my battery run out a lot while working on my ubuntu laptop (ext4 fs), never have a prob.

  28. right filesystem by zdzichu · · Score: 2

    You need a good filesystem, with embedded data checksums and self-healing using redundant copies. For Linux - btrfs is fine. For Mac OS X & Linux - ZFS.

    --
    :wq
    1. Re:right filesystem by ltwally · · Score: 1

      The best filesystem to survive a crash is a filesystem designed for an operating system that is expected to crash: NTFS.

      --
      /dev/random
    2. Re:right filesystem by Volanin · · Score: 1

      Author here:

      The problem lies in finding a filesystem that can be accessed by all three OSes. I would go with NTFS as well, but last time I tried, MacOS could not write to it. Every guide out there recommends FAT32, but the 4GB file size limitation is a deal breaker for me.

      --
      If I clone myself, can I call it a thread?
      If a girl winks to us, can I call it a race condition?
    3. Re:right filesystem by Githaron · · Score: 1

      The best filesystem to survive a crash is a filesystem designed for an operating system that is expected to crash: NTFS.

      I don't know if I should laugh or ask what evidence you have that NTFS is the "best".

    4. Re:right filesystem by cpu6502 · · Score: 1

      I use RAR to split the >4GB files in half. To date I've only needed to do that once (a DVD rip).

      --
      My AC stalker: " I personally agree with your posts most of the time, but that won't keep me from modding you troll"
    5. Re:right filesystem by vux984 · · Score: 2

      10.5 and 10.6 and I assume 10.7 have read/write support, but it's not enabled by default and is not officially supported.

      http://hints.macworld.com/article.php?story=20090913140023382

      Also you are using paragon HFS+ for windows... you should already be aware they have Paragon NTFS for Mac.

      A bigger question is whether NTFS is the best filesystem to use, and that's a separate question entirely. And that's a question I don't know the answer to.

      So, if the primary OS was windows... then I'd use NTFS.

      But if you spend most of your time in linux, and do most of the filesystem writing from linux... then I'd probably pick something robust and linux-native, and then get solutions for OSX and Windows to read it...

    6. Re:right filesystem by Shoe+Puppet · · Score: 2

      NTFS-3G supports writing to NTFS. AFAIK, most Linux distributions use it instead of the kernel driver and there's an OS X port as well.

      --
      (+1, Disagree)
    7. Re:right filesystem by marcosdumay · · Score: 1

      The problem with that rationale is that the set of developers that make systems that crash often is highly correlated with the set of developers that make FSs that corrupt data often.

    8. Re:right filesystem by d3vi1 · · Score: 4, Informative

      Two aspects to your problem:

      1) Recovering from the current situation

      If you didn't make ANY changes to the filesystem after it was corrupted, you still have a chance with software like DiskWarrior or Stellar Phoenix. Never work on the original corrupted filesystem unless you have copies of it. So grab a second drive, connect it over USB and, using hdiutil or dd, copy the filesystem to the second drive. Once you do that, use DiskWarrior or Stellar Phoenix on either one of the copies, while keeping the other one intact. Always have an intact copy of the original FS. You might be successful trying multiple methods, so KEEP AN INTACT COPY.

      2) Avoiding it in the future
      NTFS is good at surviving a crash if and only if the crash occurs in Windows. Paragon NTFS for Mac/Linux or NTFS-3G don't use journaling to its full extent (for both metadata and data). So, if you get a crash while in Mac OS X or Linux, chances are that you get data corruption.

      Same goes for HFS+. While Mac OS X uses journaling on HFS+, Linux doesn't. It's read-only in Linux if it has journaling. Furthermore, the journaling is metadata only in HFS+.

      Now we get to the last journaled filesystem available to all 3 OSs: EXT3. It's the same crap as above.

      Because of the three points above, I have a conclusion: what you're looking for (ZFS) hasn't been invented on any of the OSs that you're using.
      Thus, I have a simple recommendation:
      Use ZFS in a VMware machine exported via CIFS/WebDAV/NFS/AFP to Linux, Windows or Mac OS X. A small FreeNAS VM with 256MB of RAM can run in VMWare Player and Workstation on Windows/Linux and Fusion on OS X.

      ZFS uses checksumming on the filesystem blocks, which lets you know of the silent corruptions. Furthermore, by design, it will be able to roll-back any incomplete filesystem transactions. I've had my arse saved by ZFS more times than I care to remember. The most difficult thing for my home storage system is to find external disk arrays that give me direct access to all the disks (not their RAID crap). A proper home storage system is RAIDZ2 (basically RAID6) + Hot Spare.

      Another way is to have a simple, TimeMachine-like backup solution on at least one of your operating systems. But even that doesn't catch silent data corruptions, let alone warn you. As such, we get back to: ZFS.

      --
      UNIX was not designed to stop you from doing stupid things, because that would also stop you from doing clever ones.
    9. Re:right filesystem by omnichad · · Score: 2

      Finding a way to make the Mac read NTFS beats using MacDrive for HFS+ on the Windows side. NTFS just doesn't corrupt as easily with a power failure as HFS+, in my experience. Ideally, I would just use networked storage and access it from Mac OSX with afpd or NFS, from Windows with Samba, and linux with NFS.

    10. Re:right filesystem by xbytor · · Score: 1

      > what you're looking for (ZFS) hasn't been invented on any of the OSs that you're using.

      Actually, there is MacZFS. Runs just fine on OSX. I have the OS, apps, and my home folder on an HFS+ partition on an SSD. Everything else is on ZFS. It's exported via SMB to all my Win boxes.

    11. Re:right filesystem by trynis · · Score: 1

      Thus, I have a simple recommendation:
      Use ZFS in a VMware machine exported via CIFS/WebDAV/NFS/AFP to Linux, Windows or Mac OS X. A small FreeNAS VM with 256MB of RAM can run in VMWare Player and Workstation on Windows/Linux and Fusion on OS X.

      ZFS uses checksumming on the filesystem blocks, which lets you know of the silent corruptions. Furthermore, by design, it will be able to roll-back any incomplete filesystem transactions.

      Seconded. I'm running a similar setup right now for precisely these reasons, although I'm not running FreeNAS virtually, but rather have a dedicated machine for it. Once you get used to ZFS you will not want anything else (possibly with the exception of btrfs once it matures). I'm currently moving away from Linux to PC-BSD (an easy-to-set-up FreeBSD variant) to be able to have a ZFS root file system. Snapshotting and cloning are incredibly useful even on a single-disk machine, and incremental backups are trivial.

      --
      This is not a sig.
    12. Re:right filesystem by izomiac · · Score: 1

      I feel your pain, but this is the reason everyone recommends FAT32. I've used NTFS and ext2 for shared volumes before, but the filesystem invariably gets corrupted since you're stuck using hacked-on filesystem drivers with OSes that aren't designed for them (e.g. corruption upon crashing).

      exFAT might be an option in the future, but right now FAT32 is your best bet. Personally, I keep my larger files on my Samba server, and media files in a partition created for the OS I plan to use them with. If you really need >4 GB file support, then make a "small" partition for transferring files only, don't store anything on there long-term and don't delete the old OS's files until you've verified they were successfully copied from the shared partition to the new OS's partition.

    13. Re:right filesystem by d3vi1 · · Score: 1

      > what you're looking for (ZFS) hasn't been invented on any of the OSs that you're using.

      Actually, there is MacZFS. Runs just fine on OSX. I have the OS, apps, and my home folder on an HFS+ partition on an SSD. Everything else is on ZFS. It's exported via SMB to all my Win boxes.

      And there's the ten's complement implementation that's even better, but doesn't cover Windows and Linux. There is no Windows implementation and the Linux one is alpha quality at best.

      --
      UNIX was not designed to stop you from doing stupid things, because that would also stop you from doing clever ones.
    14. Re:right filesystem by Anonymous Coward · · Score: 0

      Because compared to HFS+ it would seem to be that way. And given your useless response without suggestion, you just sound like a Linux fanboi.

    15. Re:right filesystem by Anonymous Coward · · Score: 0

      http://tenscomplement.com/
      IS ZFS on Mac OS.

    16. Re:right filesystem by Anonymous Coward · · Score: 0

      ZFS is available for Mac: http://tenscomplement.com/our-products

  29. Tech Tool Pro, perhaps by Anonymous Coward · · Score: 3, Informative

    Tech Tool Pro, over on the Mac side, has a "File Structures" check which looks at a lot of different structured file types to make sure that their internal format is valid.

  30. Reed Solomon to the rescue by mpol · · Score: 1

    It's already too late, but I keep important files with par2 files. That way, when there's like 5% corruption, I can still fix the file.
    I do this with flac files and some datafiles.

    Also make sure you keep backups going. I guess this was your warning. Everyone needs one.

    --
    Well, don't worry about that. We can get you back before you leave. (Dr. Who)
    1. Re:Reed Solomon to the rescue by Jazari · · Score: 1
      I'll second that. QuickPar ( http://www.quickpar.org.uk/ ) has been exceptionally useful to me over and over again. I can check file integrity, recover minor corruption, and revert to past file states if I accidentally modify old archived files. It's also free. The only unfortunate thing is that it doesn't seem to be under development anymore, but at least it still works with Win7/64.

      For archival purposes, I've started using WinRAR ( http://www.rarlabs.com/ ) with the file authenticity and recovery options checked. Unfortunately none of this helps you now, but it will help in the future at least...

    2. Re:Reed Solomon to the rescue by bernywork · · Score: 1

      There is a good link here:

      http://ttsiodras.github.com/rsbep.html

      This is a good move for creating par files etc as part of your backups. He also has some other really good information up there in regards to protecting data. Especially creating backups under windows:

      http://ttsiodras.github.com/win32backup.html

      --
      Curiosity was framed; ignorance killed the cat. -- Author unknown
    3. Re:Reed Solomon to the rescue by St.Creed · · Score: 1

      Better use Crashplan (free). Backup to a remote computer, internet or your own disks in the background. Works for me (and lots of other people).

      --
      Therefore, by the (faulty) logic you're using, you're just a cow with a keyboard - osu-neko (2604)
  31. A lot of corrupt files? by 19thNervousBreakdown · · Score: 4, Interesting

    That seems very strange--the only files that should really be corrupted, unless something extremely rare and catastrophic happened, are the ones that were being written when power went out, or were cached. And even then, a flush usually flushes everything, or at least whole files at once, or areas of disk. Is the partition highly fragmented or something?

    I know this doesn't do much for your question, but that kind of failure mode is almost exactly what filesystems do their damnedest to avoid. HFS+, being journaled, should be even more proof against, well, exactly what happened to you. Maybe the Linux driver is poor, but man, if you got silent data corruption on a multitude of files that weren't even being written, that's really bad and the driver should be classified "EXPERIMENTAL" at best, and certainly not compiled into distros' default kernels.

    To answer your question, I don't have experience with any tools (I automate my backups, and any archival files go on a RAID volume that does a full integrity scan nightly), but once you find one, you should separate your files into two categories--"must be good", and "can be bad". The "must be good" files (serial #s, source code, etc.), you hand-check, so you know for certain that every one of them is good. It'll also motivate you to replace them now, instead of later when replacements will only get harder to come by. The "can be bad" files (music, pictures, etc.), you do the automated check on and then just delete as you run into ones that the check missed. This has the advantage of concentrating your effort into where it's useful. If you try to check all of your files, you'll just burn out before you finish. You may even want to do more advanced triaging, but you'll have to come up with the categories and criteria there. The main thing is, split this problem up.

    --
    <xml><I><am><so><damn>Web 2.0</damn></so></am></I></xml>
    1. Re:A lot of corrupt files? by rrohbeck · · Score: 4, Informative

      Very few filesystems keep checksums - only btrfs and zfs come to my mind.
      With defective hardware (RAM issues in main memory and disk or controller caches are fun) you can have silent corruption that goes on for a long time. Also bits on disks rot but those should give you a CRC or ECC error.

    2. Re:A lot of corrupt files? by 19thNervousBreakdown · · Score: 1

      Yeah, that's what I was saying--it's pretty unlikely that the power failure caused this, so the author should try to find the true root of the problem.

      --
      <xml><I><am><so><damn>Web 2.0</damn></so></am></I></xml>
    3. Re:A lot of corrupt files? by rrohbeck · · Score: 1

      You haven't lived until you troubleshoot a site with a couple 100TB in FC RAIDs that weren't accessed in months and that had media scan turned off due to a FW bug :)

  32. Well, for one thing by Anonymous Coward · · Score: 0

    First, copy to an external disk all files that are on your hard disk and not on your backup.
    Next, compare the files on the hard disk with the backup, and copy to that same external disk all the files whose MD5 sum is different AND last modified date is later than the backup.
    Then, wipe the hard disk.
    Then, restore the backup.
    Finally, just look at the files on the external disk. There won't be nearly as many of those.
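
    Steps one and two above could be sketched in Python roughly like this. It assumes the backup is mounted as an ordinary directory tree with the same layout as the live disk; files_to_review and md5_of are hypothetical names, and the script only lists candidates: it copies and deletes nothing.

```python
import hashlib
import os


def md5_of(path, chunk_size=1 << 20):
    """MD5 of a file, read in chunks so large videos don't fill memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_to_review(live_root, backup_root):
    """Relative paths that should be copied aside before restoring the backup.

    A file qualifies if it is missing from the backup (step one), or if its
    MD5 differs from the backup copy AND it was modified after that copy
    (step two). Everything else is left to be replaced by the restore.
    """
    review = []
    for dirpath, _dirs, files in os.walk(live_root):
        for name in files:
            live = os.path.join(dirpath, name)
            rel = os.path.relpath(live, live_root)
            backup = os.path.join(backup_root, rel)
            if not os.path.isfile(backup):
                review.append(rel)
            elif (md5_of(live) != md5_of(backup)
                  and os.path.getmtime(live) > os.path.getmtime(backup)):
                review.append(rel)
    return sorted(review)
```

    A file whose contents differ from the backup but whose modification time is not newer is deliberately left out of the list: that is the signature of silent corruption, and the restore will overwrite it with the good copy.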

  33. Re:Your eyes by dmacleod808 · · Score: 0

    TL;DR Summary. in a quest to be #, Mr. Courteaudotbiz forgoes reading the summary to post Snarky Comment #1(TM)

    --
    There Can Be Only One...
  34. mplayer/mencoder (or ffmpeg) & imagemagick by Bonteaux-le-Kun · · Score: 4, Informative

    You can just run mencoder or ffmpeg on all the mp3 and mov files (with a small shell script, probably involving 'find' or similar); just tell it to write the output to /dev/null. That should go through those files as fast as they can be read from disk and abort with an error on those that are broken. For the jpgs, you could try something similar with imagemagick's 'convert', converting them to whatever format and writing to /dev/null, which also needs to read the whole file content and aborts if they're broken (one should hope). Those converters are really fast, especially ffmpeg, so that should complete in a reasonable time.
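
    A minimal sketch of the ffmpeg variant, wrapped in Python so the results are easy to collect. It assumes ffmpeg is installed and on the PATH (otherwise subprocess raises FileNotFoundError); the helper names are made up, and treating any stderr output as a failure is an assumption of this sketch that may also flag files which only produce harmless decoder messages.

```python
import subprocess


def ffmpeg_check_command(path):
    """Build an ffmpeg call that decodes `path` and throws the output away.

    -v error   print only real errors
    -f null -  decode everything but write it to the null muxer
    """
    return ["ffmpeg", "-v", "error", "-i", path, "-f", "null", "-"]


def decodes_cleanly(command):
    """Run a decoder command; a non-zero exit or any stderr text counts as broken."""
    result = subprocess.run(
        command,
        stdin=subprocess.DEVNULL,   # keep the decoder from waiting on the keyboard
        stdout=subprocess.DEVNULL,
        stderr=subprocess.PIPE,
    )
    return result.returncode == 0 and not result.stderr.strip()
```

    Calling `decodes_cleanly(ffmpeg_check_command("song.mp3"))` for each file found gives a True/False verdict per file; the same runner could be pointed at an mencoder or convert command line instead.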

  35. Not have bit rot in the first place by Anonymous Coward · · Score: 1

    Having dealt with file corruption first hand after bit rot on some older media, the best recommendation I have is to find a solution which will prevent future bit rot. I have been using Great Lakes SAN (http://glsan.com), who base their system on zfs. Their system uses 256-bit checksums on all files and can detect and correct any/all file issues which may occur. Furthermore, backups, off-site disaster recovery and system monitoring are built into their system.

    1. Re:Not have bit rot in the first place by Anonymous Coward · · Score: 0

      So does zfs do checksumming of all files?

    2. Re:Not have bit rot in the first place by 0123456 · · Score: 1

      So does zfs do checksumming of all files?

      Yes. All filesystem blocks, I believe.

    3. Re:Not have bit rot in the first place by flargleblarg · · Score: 0

      Yep -- at the block level.

  36. Check why the files are corrupted by ncw · · Score: 5, Insightful

    I'd be asking myself why lots of files became corrupted from one dodgy file system event. Assuming HFS works like file systems I'm more familiar with, it will allocate sequential blocks for files wherever it can. This means that a random filesystem splat is really unlikely to corrupt loads and loads of files. You might expect a file system corruption to cause a load of files to go missing (if a directory entry is corrupted) or corrupt a few files, but not put random errors into loads of files.

    I'd check to see whether files I was writing now get corrupted too. It might be dodgy disk or RAM in your computer.

    The above might be complete paranoia, but I'm a paranoid person when it comes to my data, and silent corruption is the absolute worst form of corruption.

    For next time, store MD5SUM files so you can see what gets corrupted and what doesn't (that is what I do for my digital picture and video archive).
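
    A minimal sketch of that habit in Python, assuming the archive is a plain directory tree. The function names and the manifest location are hypothetical; the manifest uses the same "digest, two spaces, path" layout that md5sum prints, and it should be stored outside the tree it describes.

```python
import hashlib
import os


def md5_of(path, chunk_size=1 << 20):
    """MD5 of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(root, manifest_path):
    """Record one 'digest  relative/path' line per file under root."""
    entries = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            entries.append((os.path.relpath(full, root), md5_of(full)))
    with open(manifest_path, "w", encoding="utf-8") as fh:
        for rel, digest in sorted(entries):
            fh.write("%s  %s\n" % (digest, rel))


def verify_manifest(root, manifest_path):
    """Return the relative paths that are missing or no longer match."""
    bad = []
    with open(manifest_path, encoding="utf-8") as fh:
        for line in fh:
            line = line.rstrip("\n")
            if not line:
                continue
            digest, rel = line.split("  ", 1)
            full = os.path.join(root, rel)
            if not os.path.isfile(full) or md5_of(full) != digest:
                bad.append(rel)
    return bad
```

    Run write_manifest once while the files are known to be good, then verify_manifest after any crash; note that it cannot tell corruption from a deliberate edit, so it suits archives that are not supposed to change.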

    --
    Every man for himself, all in favour say "I"
    1. Re:Check why the files are corrupted by rrohbeck · · Score: 1

      The bit rot could have gone on for some time. How often do you check those videos or MP3s that you downloaded years ago?

    2. Re:Check why the files are corrupted by DocSavage64109 · · Score: 2

      I agree with this parent. Most likely there is a hardware failure, like the one that caused Intel to spend a billion dollars recalling Sandy Bridge motherboards for SATA errors. You need to isolate the problem to either a hard drive, ram, motherboard, cable, or even power supply and fix the root cause.

    3. Re:Check why the files are corrupted by Anonymous Coward · · Score: 0

      It's probably not a hardware failure. It's just that the quality of Apple filesystems is horrible. People lose their files all the time with HFS+ because it is an abomination among modern filesystems. It is basically a more complex version of FAT32 with journaling. If the OP were smarter, he would have formatted his filesystem as NTFS or ext3. At least those filesystems will actually tell you when a file is corrupted.

  37. Perception bias by tobiah · · Score: 1

    I've certainly seen corruption with XP crashes, not a big deal because I do backup. About the same with the other file systems. In this case he was using Mac OS 10.7 Lion, which is a mess, and two others accessing the same partition. Not surprised.

    --
    "The ability to delude yourself may be an important survival tool" - Jane Wagner -
  38. Re:BSOD? No, use open source "Tripwire" by quarkscat · · Score: 3, Informative

    Not the BSOD.
    If the OP had used open source "tripwire" on known-good files in each filesystem on his Macbook, and saved the resultant data output to a USB thumbdrive formatted with FAT32, the OP would have had a good chance of determining all corrupted files. In this case, an ounce of prevention would have prevented several pounds of "cure".

    Check out http://tripwire.org/

  39. Re:Gamemaker sucks ass by jkflying · · Score: 0

    Including all of his erotica e-books. Tough life, dude.

    --
    Help I am stuck in a signature factory!
  40. Re:Gamemaker sucks ass by Volanin · · Score: 5, Funny

    Author here:

    Ok, I could deal with the loss of some unique videos and pictures from travels... but now that you mention the porn... *weep*

    --
    If I clone myself, can I call it a thread?
    If a girl winks to us, can I call it a race condition?
  41. Re:A question about NTFS versus other file systems by Githaron · · Score: 1

    An honest question :

    I've had several crashes over the years with Windows XP but the files, data and system files were never corrupted. In linux it seems that file systems are not very resilient, and the least crash can corrupt your files. Is NTFS such a good well designed file system compared to linux file systems ?

    Linux supports a wide array of filesystems. Which ones have you used? I have used ext3 and ext4 and have never run into file corruption problems. Both of those are journaling filesystems. Journaling filesystems help prevent corruption in the event of a power failure.

    Besides the filesystem, one other possibility for corrupted files is a bad hard drive. I know someone who reinstalled Windows on his desktop on a regular basis because key files would go missing or get corrupted. I took a look at it and found out that he simply had a bad hard drive. After getting a new one, he didn't have any more problems.

  42. A suggestion: Instead of triple booting... by Sepiraph · · Score: 2

    I'd recommend running a base OS and then run something like VMware workstation so that you run other OSes inside the main OS. One huge benefit is that you can have access to multiple OSes at the same time and you don't need to reboot into them either. With hypervisor technology getting common on desktop, there probably isn't any need to multi-boot unless you have a specific reason not to use virtualization.

    1. Re:A suggestion: Instead of triple booting... by rHBa · · Score: 1

      3d graphics acceleration (lack of) and audio latency (in live audio/DJ apps) are two reasons I still dual boot. I run VirtualBox on Ubuntu for other applications but for gaming and mp3jing I boot into windows natively.

  43. Re:A question about NTFS versus other file systems by 0123456 · · Score: 1

    In linux it seems that file systems are not very resilient, and the least crash can corrupt your files.
    Is NTFS such a good well designed file system compared to linux file systems ?

    I've never had corrupt files after a Unix crash; be it SunOS, Solaris, HP-UX, Linux or any of the other Unix variants I've used.

    I've never had corrupt files after an XP crash, but I've often had scandisk delete files, including a multi-gigabyte game installer that I'd just downloaded before it crashed. It regularly deleted Firefox bookmarks before they switched from storing them in big HTML files.

    The NTFS approach appears to be 'I'll guarantee file system consistency but won't guarantee any of your files are still there'. I'm sure you can find similar Linux filesystems, but the most common ones don't seem to have any problems.

  44. Re:Your eyes by cpu6502 · · Score: 2

    Perhaps, but I agree with the first post. Going through and simply looking at all the JPEGs or MPEGs is probably the only way to tell if a file is corrupted (I wouldn't trust the CPU to do an accurate job). It also gives you a chance to erase a lot of stuff you really don't need anymore. I dumped 300 gig off my drive simply by going through everything... took a while but it was worthwhile to get rid of old shows/movies I'll likely never watch.

    --
    My AC stalker: " I personally agree with your posts most of the time, but that won't keep me from modding you troll"
  45. zfs by chocolatetrumpet · · Score: 2

    zfs! Works great. Included with FreeBSD 9, amongst other OSs.

    You might also enjoy John Siracusa's exhaustive review of filesystems on one of my favorite podcasts.

    --
    Spoon not. Fork, or fork not. There is no spoon.
    1. Re:zfs by Anonymous Coward · · Score: 0

      From the wikipedia entry on ZFS, i dont see any support listed for Microsoft Windows. So that might not be a solution.

      Also, I've been using freebsd since its inception, but i have never tried ZFS support.
      Can you comment on the stability/reliability of ZFS on freebsd? In comparison to the default?

      -Hashie @ TrYPNET.net

  46. Use JHOVE by mattpalmer1086 · · Score: 2

    The JSTOR/Harvard Object Validation Environment:

    http://hul.harvard.edu/jhove/

    It's specifically designed to first probabilistically identify files, then attempt to verify their format.

    Disclaimer: I haven't worked on it directly, but I did spend a number of years in the digital preservation space, so I probably know some of the people who have contributed to it.

  47. Re:Newbie question hour? by Anonymous Coward · · Score: 2, Insightful

    Let me ask a stupid question since I've never run a battery out on a machine running Ubuntu. Why did this happen? Running OSX or Windows, the machine would have hibernated safely before the battery ran out. Does Ubuntu not do this and it just dies? Or is this something you configured to act this way? If it is default behavior in Ubuntu it is something they ought to fix.

  48. Fake by Anonymous Coward · · Score: 0

    Yeah right, like a slashdot member could even lift a 15 lb. sledgehammer!

  49. Re:Gamemaker sucks ass by Anonymous Coward · · Score: 0

    I'm not using Gamemaker until it can also clean the kitty litter, go to the grocery store for me and find porn automatically for me!

  50. D&D approach by Ambitwistor · · Score: 1

    Cast Detect Evil, Sense Motive, and Discern Lies on the potentially corrupted files.

    1. Re:D&D approach by Volanin · · Score: 1

      Author here:

      Sorry, but I can't stand anymore the Paladin of the party insisting on replacing the HD for a tried and true Bag Of Holding.
      Thanks for the tip anyway.

      --
      If I clone myself, can I call it a thread?
      If a girl winks to us, can I call it a race condition?
  51. Re:Newbie question hour? by loftwyr · · Score: 4, Interesting

    mplayer can detect corrupted movie and audio files:

        find . -name '*.mov' -exec mplayer -msglevel all=6 -speed 100.0 -framedrop -nogui -nolirc -cache 8192 -tskeepbroken -ao null -vo null {} \; | grep Warning! > $1.txt

    Change the *.mov as appropriate.

  52. Get Rid Of Paragon! by Lord_Jeremy · · Score: 5, Interesting

    Alright now I'm afraid I can't help with your verify problem but I do have one piece of solid advice: get rid of Paragon HFS immediately!

    It is a truly shoddy piece of software that as of version 9.0 has a terrible bug that will cause it to destroy HFS+ filesystems. Google "paragon hfs corruption" and you will see many, many horror stories from people who just plugged a Mac OS X disk into a Windows machine w/ Paragon HFS and then discovered the entire filesystem was hosed. In my dual-boot win/mac setup I replaced my copy of MacDrive with a trial version of Paragon HFS 9.0 from their website and every single one of the six HFS+ disks I had connected internally was damaged. Disk Utility couldn't do a thing and I had to buy a program called DiskWarrior to even begin to recover data. I ended up losing two disks' worth of files anyway.
    http://www.mac-help.com/t12137-opened-hfs-drive-win7-paragon-hfs-now-wont-boot.html
    http://www.wilderssecurity.com/showthread.php?t=299306
    http://hardforum.com/showthread.php?t=1677099
    http://www.avforums.com/forums/apple-mac/1509344-hfs-super-block-not-found.html

    whew! Anyway the pain I went through after that software very nearly ruined my life was so great, I don't want it to happen to anyone else. According to their own website 9.0 has this awful bug but they fixed it in 9.0.1. Evidently the trial download on the main page is still for version 9.0 and still has the disk destroying bug! Any software company that releases a filesystem driver with this terrible a bug (not to mention the numerous reports of BSODs and other relatively minor problems) clearly has terrible quality assurance and simply can't be trusted.

    1. Re:Get Rid Of Paragon! by Volanin · · Score: 1

      Author here:

      Just out of curiosity, I went to check the version of my Paragon installer and guess what... it was corrupted! Oh the irony!
      Windows is the OS I least use, and I have not booted it for the last month or so. Unless Paragon silently corrupted something there previously and somehow "weakened" the filesystem integrity since. Anyway, thanks for the tip. What do you use currently to read HFS+ in Windows?

      --
      If I clone myself, can I call it a thread?
      If a girl winks to us, can I call it a race condition?
    2. Re:Get Rid Of Paragon! by jones_supa · · Score: 1

      Have you considered the option of running only OSX natively and Windows and Linux in virtual machines? It might make things a bit neater.

    3. Re:Get Rid Of Paragon! by macraig · · Score: 3, Interesting

      Having nothing at all to do with Paragon (not that I'm a fan of the company otherwise), I had a very similar disaster occur with an external eSATA 5TB RAID 5 enclosure. It's one that uses an internal hardware RAID 5 circuit and doesn't require port multiplication, so when connected it appears to the host as a single large volume. At the time I was swapping it between a Linux (Ubuntu) system and a Windows 7 system; it was of course configured as GPT. Eventually I connected it to the Windows 7 system and during boot Windows declared there were problems and initiated chkdsk. Chkdsk ran for more than 18 hours and when it was done, most of the files in the volume were hopelessly corrupted. Upon detailed inspection, I found that blocks of all the files were swapped and intermingled, as if something had made a jigsaw puzzle out of the MFT and couldn't reassemble Humpty Dumpty. Was it chkdsk itself that caused the damage? Was it the swapping between two machines and operating systems (both GPT compliant)? I suspect it was actually caused by chkdsk, but could never prove it.

    4. Re:Get Rid Of Paragon! by spads · · Score: 1

      Bro, yea, chkdsk, that's just ironic enough to be possible in a Windows universe! "Oh, and remember to be sure to allow it to complete uninterrupted!" lol

      --
      Bukowski said it. I believe it. That settles it.
    5. Re:Get Rid Of Paragon! by Lord_Jeremy · · Score: 2

      I had been using MacDrive before trying out Paragon. The version of MD I had (8 I think?) no longer worked when I upgraded Windows on one of my computers so I looked around for something else before buying the MacDrive upgrade. I saw Paragon had a promotion where you'd get a discount on a new copy of HFS+ for Windows if you proved you were switching from a competing driver (making it cheaper than the MD upgrade) so that's when I installed the evil trial.

      It's only been a couple weeks since the disaster so I haven't had the confidence to install any new drivers yet. I'm planning on going back to MacDrive after I buy the upgrade. In the years I've used it (pretty much since Bootcamp) I haven't had any problems with it; I went looking for alternatives simply out of curiosity. If it ain't broke, eh?

      Looking back at your particular problem I've got a couple thoughts. First of all, some of the common compressed media formats like JPG and MP3 can be crudely verified by some sort of utility that attempts to inflate the compressed structure. This guy has a suggestion for JPGs and I think I saw someone else post a recommendation for MP3s. I suspect that files like PDFs generally won't open at all if there is any corruption in the format, you could try using Spotlight to find all PDFs and then open them all at once. Preview has always given me error messages if I try to open a corrupted PDF. I've also noticed that corrupted MOV files tend not to open, but I can't guarantee that this is a rule.

      I might also try looking at some known corrupted files with a HEX editor. In the past I've encountered disk corruption that manifested as the binary contents of parts of files being entirely zeros. If there is some discernible pattern it may be possible to hack together some way to scan your files.
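
      A rough sketch of one such scan, assuming the damage really does show up as zero-filled regions: compressed formats like JPG and MP3 normally contain very few zero bytes, so an unusually high share of them is a hint (not proof) of damage. The function name zero_byte_percent is made up for illustration.

```shell
# Hypothetical helper: print the percentage (0-100, rounded down) of the
# bytes in a file that are zero. Assumes damage shows up as zero-filled runs.
zero_byte_percent() {
    local file=$1 total nonzero
    # Arithmetic expansion strips the padding some wc versions add.
    total=$(( $(wc -c < "$file") ))
    # Delete every NUL byte and count what is left.
    nonzero=$(( $(tr -d '\000' < "$file" | wc -c) ))
    if [ "$total" -eq 0 ]; then
        echo 0
    else
        echo $(( (total - nonzero) * 100 / total ))
    fi
}
```

      Running it over a few known-good and known-bad files first would show whether any threshold separates the two; if it does not, the corruption follows some other pattern and this check is no use.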

      Although it may be moot now, from what I've read online Paragon HFS creates all sorts of issues with the HFS+ filesystem journal. It's indeed possible that it left your disk in a state where it was vulnerable to further problems. I'm also curious what package and version you were/are using in Ubuntu. It wouldn't surprise me if that driver is nowhere near as robust as it should be.

      As someone who used to share many disks between Linux, Windows and Mac OS X I previously had come to the conclusion that the easiest solution was to use Ext3 formatting on disks that I wanted write access to from all three operating systems. Early on I had a minor filesystem problem with the HFS+ package I was using in Linux when writing files and from then on I mounted HFS disks in read-only mode. Now I very rarely use Linux to access the external disks I share between a Win/OSX dual-boot (gigabit network FTW). For Mac OS X I have a very good NTFS driver called Tuxera NTFS. I still occasionally mount Ext3/4 disks in Windows using Ext2Fsd (ignore the implications of the name). The Ext driver I was using in Mac OS X didn't have write capabilities for Ext4 last I checked, but I can't remember what it was called.

      I hope at least some of this is helpful. Cheers.

    6. Re:Get Rid Of Paragon! by Lord_Jeremy · · Score: 1

      Jesus, something similar happened to me back when I was dual-booting Vista and OS X on my MacBook Pro. One day when I booted Windows it declared that it had to run chkdsk, churned for the longest time and reported tons of errors. After it rebooted I was surprised to see Windows start up fine. Then I got an unreadable disk dialog box and saw that my Mac OS X drive (D: normally mounted by MacDrive) wasn't being mounted. Of course then when I reboot again I get a nice cheerful blinking question mark. Disk Utility on the Mac install dvd couldn't even recognize the formerly HFS+ partition.

    7. Re:Get Rid Of Paragon! by spads · · Score: 1

      This is just a guess, though still something you should have specified in your original description.

      If, on the off chance, you powered back up into Windows first after it originally died, then this Paragon thing (with the above-mentioned deficiencies) would sound like the likely culprit. Otherwise, as people have mentioned, it is very strange that disk-wide issues should result from a loss of power.

      --
      Bukowski said it. I believe it. That settles it.
    8. Re:Get Rid Of Paragon! by macraig · · Score: 1

      I'll tell ya, the experience scarred me worse than the bullying in elementary school. I'm distrusting of all filesystems now. Paranoid? Nah, I watched a 5TB MFT being turned into logical hamburger!

    9. Re:Get Rid Of Paragon! by Lord_Jeremy · · Score: 1

      Doesn't MBR have a 2TB size limit? I know you formatted it as GPT but GPT is designed to look like MBR in the case of software that can't recognize GPT. Maybe chkdsk thought it was an MBR disk or somesuch. I'm not very familiar with Windows disk management but in my experience, Windows operates under some very annoying assumptions about your disk layout.

    10. Re:Get Rid Of Paragon! by macraig · · Score: 1

      GPT includes a very small MBR partition at the front of the drive, which is the only partition that appears if a system doesn't support GPT; if the system does support GPT then the MBR partition is deliberately ignored and hidden. The MBR partition is empty and does not show any of the files in the GPT partition, so if that had been my problem chkdsk wouldn't have had anything to do. Nope, chkdsk spent that entire day checking - and probably destroying via the MFT - the millions of files in the GPT partition.

    11. Re:Get Rid Of Paragon! by Anonymous Coward · · Score: 1

      GPT includes a very small MBR partition at the front of the drive, which is the only partition that appears if a system doesn't support GPT;

      No, that's not necessarily true. GPT was designed to make it possible to have a MBR partition table in parallel with the GPT table. Within certain limits (mostly based on MBR's inability to do some things), the MBR table can point to the same partitions as the GPT table. A common source of disks partitioned like this is Apple's Boot Camp, which will automatically setup a GPT/MBR hybrid disk with 2 main partitions (one for OS X, one for Windows), with both partitions visible in both the GPT and MBR tables.

  53. It may be that simple by the_B0fh · · Score: 1

    Just have your OSX do a repair - it could be that certain VTOC or directory tables were damaged, and a repair may fix it. The files themselves should be OK, but the pointers to them are fubared.

    Also try something like http://www.cgsecurity.org/wiki/PhotoRec or similar to recover deleted files. There's one for OSX. Run it after a repair, and photorec, and you should get most of your crap back.

  54. Re:Newbie question hour? by StikyPad · · Score: 1

    Look, you're really taking the wrong approach here. The way to deal with corruption is avoidance, backup, and corrective action.

    1) Avoidance. This is generally the role of the filesystem and the underlying hardware, each of which has methods for preventing and correcting data corruption without ever involving the user. The user has a small part to play by doing things like shutting down instead of turning off whenever possible, though journaling filesystems (i.e., all modern filesystems) will know when a file operation was interrupted prematurely and check the integrity automatically. Also try not to put different file systems and OSes on the same drive, since there's the possibility that one OS may not respect the FS or limits of another (typically/historically, Windows has been the culprit here, but not always, and not so much anymore.) Any OS will generally leave an unrecognized drive alone unless you tell it to do otherwise, but the system drive has often been considered fair game.

    2) Backups (optional). Once you have a known-good (or believed good) installation, create your backup. Repeat somewhere between often (if your data is important) and never (if it's not).

    3) Correction. If and when you come across data corruption, that's not a sign that you're wasting space on your hard drive; it's a sign that something is seriously wrong. The proper course of action is to identify the underlying cause and correct it, not to delete the files to free up space. If you're experiencing corruption on only one drive regardless of channel and cable, replace the drive. If you're consistently having problems on a given channel, then don't use that channel. If you're having random issues across all drives on all channels, then the chipset is bad and the motherboard should be replaced. Basic troubleshooting.

    Technically you *could* take a checksum of all of your files and update the database every time a file is changed. Some antivirus systems already do this to detect infections, but it would also detect incidental changes as well. The problem is that constantly verifying the integrity of your files will only hasten the demise of your storage medium. It's a self-fulfilling prophecy.
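
    The checksum-database idea can be sketched in a few lines of shell. This is only an illustration: it assumes GNU coreutils' sha256sum is available, and the function names make_manifest and changed_files are invented here. It only helps if the manifest was taken while the files were still good.

```shell
# Hypothetical helpers; assume GNU sha256sum. Keep the manifest file
# outside the directory being scanned so it does not list itself.
make_manifest() {
    # $1 = directory to scan, $2 = manifest file to write
    ( cd "$1" && find . -type f -exec sha256sum {} + ) > "$2"
}

changed_files() {
    # $1 = directory, $2 = manifest written earlier by make_manifest.
    # Prints every listed path whose checksum no longer matches
    # or which can no longer be read.
    ( cd "$1" && sha256sum --quiet -c - 2> /dev/null ) < "$2" |
        sed -n 's/: FAILED.*$//p'
}
```

    Typical use would be make_manifest on a quiet day, then changed_files after a crash. Files edited on purpose since the manifest was taken are reported too, so the output is a list to review rather than a list to delete.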

  55. backup strategy to prevent this by osssmkatz · · Score: 1

    You clearly need an image-based backup system to prevent this from happening again. It needs to be a cron job (or a Task Scheduler task) and run at regular intervals when storage is available. Ideally, it needs to be network storage, so that a sudden disconnect (absence of power) cannot easily corrupt the backup. There is an open source version of Ghost, partd, rsync... options for you, though I am relatively new to Linux so I don't know what the appropriate option is for you. Time Machine you could use if you had a separate partition, but I think that isn't what you want. Also, fundamentally, writing to one partition from three OSes is asking for trouble.

  56. Re:Newbie question hour? by frisket · · Score: 1

    Ubuntu pops up a warning window, and if you ignore it the battery light turns orange, and then red, and then it should hibernate. Flat-out dying is not something I've come across under Ubuntu (and I have some flaky old machines with old batteries, and they still warn me and then shut down).

  57. Re:Newbie question hour? by Anonymous Coward · · Score: 0

    There is an oldie out there called cleanjpg.exe which does the same for jpg's.

  58. Bad news... and good by jimicus · · Score: 2

    The bad news is I don't know of any (and I don't think you'll find any) easy, one-shot tool to run across the whole lot that gives you a simple "corrupted yes/no?" answer to lots of different filetypes.

    The good news is it'd be reasonably easy to lash together something in bash, kick it off overnight and come back in the morning to a list of probably-corrupted files.

    In pseudo-bash (because I haven't the time to write it out and check it works properly), something like this would be a good start:


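    # Usage: script FILE   (prints FILE if it looks corrupted)

    function checkJpeg {
        # relies on jpeginfo -c exiting non-zero when the check fails
        jpeginfo -c "$1" > /dev/null 2>&1
    }

    function checkPdf {
        # placeholder check: assumes pdfinfo (poppler-utils) fails on a damaged PDF
        pdfinfo "$1" > /dev/null 2>&1
    }

    # ask file(1) for the MIME type only, e.g. "image/jpeg"
    FILETYPE=$(file --brief --mime-type "$1")
    case "$FILETYPE" in
        image/jpeg )
            checkJpeg "$1" || echo "$1" ;;
        application/pdf )
            checkPdf "$1" || echo "$1" ;;
    esac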

    Then run it with the help of find /home -type f -print0 to check every file in /home. This would give you a list of potentially-corrupted files. Up to you how you deal with it - personally I wouldn't run rm against it in case you find files that can be rescued or that your checks aren't as perfect as you'd like.

    For extra credit, determine the expected filetype based on file extension and then use file(1) as your first "is it corrupted?" test - that way you'll spot files that are too corrupted for file(1) to work reliably.
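
    A minimal sketch of that first-pass test, done by hand instead of through file(1) so it works even when file(1) gives up: compare the extension with the first few bytes. The function name magic_ok is invented for illustration, and only two well-known signatures are covered (JPEG starts with FF D8 FF, PDF starts with "%PDF-").

```shell
# Hypothetical helper: succeed if the leading bytes of a file match what
# its extension promises; unknown extensions are not judged at all.
magic_ok() {
    local f=$1 ext magic
    # Lower-case the part after the last dot.
    ext=$(printf '%s' "${f##*.}" | tr '[:upper:]' '[:lower:]')
    case "$ext" in
        jpg|jpeg)
            # First three bytes as hex, e.g. "ffd8ff".
            magic=$(od -An -tx1 -N3 "$f" | tr -d ' \n')
            [ "$magic" = "ffd8ff" ] ;;
        pdf)
            # "%PDF-" is 25 50 44 46 2d in hex.
            magic=$(od -An -tx1 -N5 "$f" | tr -d ' \n')
            [ "$magic" = "255044462d" ] ;;
        *)
            return 0 ;;
    esac
}
```

    This only catches files whose very beginning was damaged. A file with an intact header and garbage further in passes, so it is a cheap pre-filter ahead of the per-type checkers and not a replacement for them.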

    1. Re:Bad news... and good by mattpalmer1086 · · Score: 1

      Actually there is a tool that does all of that already: JHOVE - JSTOR/Harvard Object Validation Environment.

      http://hul.harvard.edu/jhove/

      It's used in the digital preservation field, for example in an archive to try to figure out what they've got and what state it's in.

    2. Re:Bad news... and good by macraig · · Score: 1

      All such a process can do is verify that the file header appears well-formed. That might flag a few bad apples, but the ones with good headers and corrupted contents will slip through the cracks.

  59. There's a whole slew of them by Solandri · · Score: 1

    http://en.wikipedia.org/wiki/Comparison_of_file_verification_software

    md5sum is the one I know best, but that's because my computing is unix-centric.

  60. cfv by Anonymous Coward · · Score: 0

    Short answer:

    cfv

    Long answer:

    How can you be so tremendously stupid to not even know about cfv?

  61. Re:Newbie question hour? by Vancorps · · Score: 2

    The real reason, and it was stated in the summary, is that the file system was HFS+, which is far less tolerant of this behavior than ext4.

  62. Par2 by Dadoo · · Score: 1

    That's a pretty good idea, if you only want to detect corrupted files (and yes, I know that's what the OP said he wanted), but I can't believe no one's suggested par2, yet. It will not only detect corrupted files, but repair them, too. If he had used par2, he wouldn't have to delete them.

    --
    Sit, Ubuntu, sit. Good dog.
    1. Re:Par2 by ibennetch · · Score: 1

      The problem with par2, at least as far as I've seen it implemented, is that it's really an archive container -- a real pain to use in day-to-day file storage. Am I missing something that lets me take advantage of the error detection and recovery of par2 while allowing me to seamlessly access the files? While I agree with you on the merits of a resilient format like par2 for archiving files, it doesn't seem very friendly for quick access to files on the disk.

  63. Re:Newbie question hour? by Anonymous Coward · · Score: 5, Funny

    mplayer can detect corrupted movie and audio files find . -name '*.mov' -exec mplayer -msglevel all=6 -speed 100.0 -framedrop -nogui -nolirc -cache 8192 -tskeepbroken -ao null -vo null {} \; | grep Warning! > $1.txt Change the *.mov as appropriate.

    <infomercial>its JUST. THAT. EASY folks!</infomercial>

  64. Folder hash creation/comparison tool by Anonymous Coward · · Score: 0

    I am using Hashdeep for this purpose.
    http://md5deep.sourceforge.net/
    It is similar to a script that generates an md5 or SHA hash value for every file, but much easier to use.

    1. Re:Folder hash creation/comparison tool by macraig · · Score: 1

      That's a preemptive strategy, though. No help at all if you only think to use it after your kid brother decides it would be fun to slap his Magnet Balls all over your computer case.

  65. Be philosophical about it by chepati · · Score: 2

    ... yes, this is not what you want to hear at this point, but try to have a positive take on this.

    Last year during a routine Windows 7 installation, my second hard drive, from which I dual boot my 90%-of-the-time-in-use Linux, was destroyed. Either a coincidence that it occurred during the Win7 installation or a nefarious plot, but the hard disk, a 1TB Seagate SATA, developed an unrecoverable click of death.

    On that hard drive I had my short stories which I had written in college and the intervening years since then, much of my photos, skype history and many other things, seemingly important to me at the time of the "disaster". I was inconsolable for a few days, and felt like I had been bereft of someone very dear to me. Then it hit me -- to hell with the stories, to hell with the photos, to hell with the rest of the digital baggage I had accumulated. I could write my stories again, and do it better, I could take more photos, I could hoard more useless junk. After a month I no longer missed any of the lost stuff.

    Learn to view such mishaps more philosophically and learn to shed all the useless garbage you accumulate through the years; realize that almost nothing that you can store on your computer, or up in your attic, has really all that sentimental value you attach to it. Learn what's important, intrinsically important, to you and safeguard that. All the rest, you'll be amazed how little you need it and how even less you'll miss it.

    To hell with useless stuff.

    1. Re:Be philosophical about it by jones_supa · · Score: 1

      +1

    2. Re:Be philosophical about it by Anonymous Coward · · Score: 0

      How hard is it to do automatic and periodic backups.....

  66. Re:A question about NTFS versus other file systems by Anonymous Coward · · Score: 0

    I've seen defective memory cause data corruption too, among a lot of other strange problems that seemed to point to another hardware component. Memtest86+ is your friend.

  67. Re:Your eyes by DarwinSurvivor · · Score: 2

    I used to do that, but found it to be pointless these days. Organizing the stuff is one thing, but deleting is basically pointless unless you can automate it. 300GB may seem like a job well done, but with 3TB drives for $100 these days, you just saved yourself $10 worth of harddrive space and it probably took you a few hours.

    My current setup is to have everything on my server box and simply copy over what I need to my laptop as I need it and NFS/SSHFS the rest of it on the fly when home.

  68. Re:Guess you should have used a real filesystem by JonJ · · Score: 1

    It's the HFS+ partition that died, not the Linux one. Mac OS is certainly not a "freetard" OS.

    --
    -- Linux user #369862
  69. Says The Knack: You'll find out the hard way by macraig · · Score: 1

    Lacking not only a backup but also PAR(2) and MD5 files, manual inspection of each and every file is the ONLY way you can determine their integrity. There is no automagic after-the-fact integrity check. If you had MD5 sums for every file, you could at least check their integrity. Some PAR2 files would not only verify but possibly repair if the damage wasn't more extensive than the PAR recovery blocks. Of course if you're willing and able to do all that, you'd probably have had full and differential backups first.

    (And yes, the subject was a lyrics reference.)

    1. Re:Says The Knack: You'll find out the hard way by Red_Chaos1 · · Score: 1

      I've thought about using PAR2 for my own files I care about, but the rub is knowing just what settings to use, which I don't, and there seems to be little info on the optimal settings.

    2. Re:Says The Knack: You'll find out the hard way by macraig · · Score: 1

      There's not much being done with PAR2, unfortunately. I think MultiPAR work might still be current.

  70. You are right to suspect the driver: by Burz · · Score: 1

    The Linux HFS+ driver can't even work in write mode unless the journal has been deleted, so the journal isn't working when using the HFS+ partition under Ubuntu and probably Windows as well (author take note). I would not use that filesystem under Linux or Windows on a daily basis. Also, since the journal has been deleted, you are probably missing the safety of journaling under the native OSX as well.

    Author should also note that archival backups with md5 or sha256 checksums are probably the most straightforward way to maintain data integrity. If you want something more elegant for day to day use, I would consider setting up a NAS using either BTRFS or ZFS as the filesystem along with a nice 1Gbps LAN (if you don't have that already).

  71. Nobody has mentioned it by Anonymous Coward · · Score: 0

    Diskwarrior works miracles
    http://www.alsoft.com/diskwarrior/index.html

  72. pseudo-bash... by Mister+Liberty · · Score: 1

    I hate your brace style anyway...

  73. Re:Your eyes by s.petry · · Score: 1

    Maybe you didn't mean it this way, but dang if I did not see all the PHBs come out from work with your comment. "I can get 1TB drives from Fry's for $80.00, why do you say it costs several hundred?"

    Oh, you wanted redundant drives to be covered in the event of a failure? You wanted a drive that has some performance so it does not take 32 minutes to open your word file? So much for that 1TB for 80 bucks thing...

    The new one is "SSDs are only $150.00, and they are the same as what you get for SSDs without all the writes."

    --

    -The wise argue that there are few absolutes, the fool argues that there are no probabilities.

  74. spinrite by Anonymous Coward · · Score: 0

    Spinrite may work

  75. Re:Gamemaker sucks ass by Anonymous Coward · · Score: 0

    I'm not using Gamemaker until it can also clean the kitty litter, go to the grocery store for me and find porn automatically for me!

    You might want to program it for the type of porn you like. If you do not, you may get some horrific* results.

    *Horrific meaning the sights may never leave your mind and may cause you to have nightmares, day-mares, lose your job, lose your wife, kids, house, car, computers, pet(s).

  76. George by sjames · · Score: 2

    George is your best bet. He's not bright enough for most support tasks, but he can certainly handle this one.

  77. Format Validation Tools by Anonymous Coward · · Score: 0

    The Digital Preservation Community (e.g. www.openplanetsfoundation.org) use JHOVE http://hul.harvard.edu/jhove/ and JHOVE2 https://bitbucket.org/jhove2/main/wiki/Home for validating some types of files against documented formats. They might be useful for checking which files are valid or not.

  78. fs.hash by Anonymous Coward · · Score: 0

    Check the md5 sums of the files. I was unsatisfied with md5deep so I wrote my own tool to calculate the md5 sums of all the files. It will tell you if the files have changed directories, been updated or are just corrupt by looking at each file's datestamp and comparing it with the one in its database.

    http://pastebin.com/uTTRh4Ws

  79. Re:Your eyes by Spazmania · · Score: 1

    My files go through descending levels of staleness. First I copy them off the primary drive to a network drive. They sit there for a while. When a couple years have passed without looking at them, they go on an offline drive and I create a file listing which I keep on the network drive. Once a decade has passed without loading the offline drive, they go in the trash.

    Files that still have value get caught in the sweep but then migrate back to the primary drive as needed. Saves having to scrutinize everything before hand.

    For the OP: restore from your August backup. Add anything on the drive with a more recent mod time. And then deal with the corrupt files as you come across them. If you come across them.

    --
    Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.
  80. Re:spinrite by Anonymous Coward · · Score: 0

    if the previous mention hadn't been modded down to -1 you would have noticed you were beaten by 2 hours.

  81. Photorec is great BUT by rduke15 · · Score: 3, Interesting

    Indeed, I used photorec/testdisk to recover mp4 files after they had (all) been accidentally deleted from an HFS+ partition.

    But when I first started it in its default mode, it "found" only rubbish, breaking up the actual mp4s into a mess of .doc, .xml, .jpg, .whatever files, including totally broken .mp4s.

    When I restarted it after configuring it to only look for .mov/.mp4, it did a fantastic job, and as far as I know, all files could be recovered. Of course, that was made easier by the fact that I knew that all the files which needed to be recovered were .mp4.

  82. The jpeg aspect. by PenguinJeff · · Score: 1
  83. Not all "journaling" is equivalent by Anonymous Coward · · Score: 0

    I suggest that if you care about your data, you learn the difference between metadata journaling and data journaling. Plus, while you're having fun, a side trip down proper filesystem/disk fencing is in order.

    The problem on Linux is that a lot of filesystems are labeled as "journaling" and not too many people dig into the details, picking filesystems based on popularity or performance. The truth is that nearly all of them (the Linux implementations) are susceptible to silent data corruption, and the ones that aren't don't tend to ship in their safest mode because it kills performance.

    Then there are the drivers/controllers with write-back caches, that fail to do proper fencing. Frankly, the state of data integrity over power loss/system crash is pretty bad.

    Bottom line, do your homework.

  84. Re:Newbie question hour? by Atti+K. · · Score: 2

    I think this is the root of the problem here, he chose the wrong filesystem to share between the three OSes. Sadly there are not too many choices. FAT32 is the only one natively supported by all three, with its well known limitations. He might have been better with NTFS though, using NTFS-3G on Linux and OS X, but that has some performance hit. There's really no perfect solution for this kind of problem.

    --
    .sig: No such file or directory
  85. Re:Your eyes by Score+Whore · · Score: 4, Informative

    Well, jpeg files have a structure that will generate detectable errors if it's damaged. So simply opening them with something as simple as djpeg from the IJG and piping the output to /dev/null should give you a pretty good start on damaged images. Something like this perhaps:

    find . -name "*jpg" -o -name "*jpeg" -o -name "*JPG" -o -name "*JPEG" | while read filename; do if djpeg "$filename" > /dev/null 2> then :; else echo "$filename" is toast; fi; done

    You could probably do something similar with mpg123 and mplayer for .mp3 and movies.

  86. Re:Gamemaker sucks ass by Nimey · · Score: 0

    Weeping sores? You'd better use more KY, buddy.

    --
    Hail Eris, full of mischief...

    E pluribus sanguinem
  87. What about partitioning? by postglock · · Score: 1

    If you rarely use Windows, you could partition your disk, and format the Windows part to FAT. Then, on the odd occasion where you want to transfer files, you could just mount its partition in Linux (that's what I do), or OS X. I like the idea of sandboxing Windows, and not letting it touch/corrupt my main (ext4) file system.

  88. Re:Newbie question hour? by LDAPMAN · · Score: 1

    HFS+ with journaling enabled is as solid as EXT. Journaling is on by default if the disk was formatted in the past several years. I'm curious to know if he had it turned off.

  89. Re:Your eyes by cpu6502 · · Score: 1

    >>>you just saved yourself $10 worth of harddrive space and [enjoyed watching a ton of TV shows/movies during the process]

    Fixed that for you.

    --
    My AC stalker: " I personally agree with your posts most of the time, but that won't keep me from modding you troll"
  90. Re:Your eyes by Score+Whore · · Score: 2

    There ought to be an &1 after the 2>.
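
    With that correction in mind, the same loop can be written as a small reusable function. This is a sketch, not a tested recipe for every decoder: the name list_failures is invented, and it assumes the checker (djpeg, mpg123, mplayer, ...) exits non-zero on damaged input, which is worth confirming on one known-bad file first.

```shell
# Hypothetical helper: read one file name per line on stdin, run the given
# checker command on each, and print the names for which it fails.
# File names that contain newlines are not handled.
list_failures() {
    local f
    while IFS= read -r f; do
        # Both output streams are discarded (the "2>&1" fix), and stdin is
        # closed so the checker cannot swallow the list of names.
        "$@" "$f" > /dev/null 2>&1 < /dev/null || printf '%s\n' "$f"
    done
}
```

    For JPEGs this would be run as: find . -iname "*.jpg" -o -iname "*.jpeg" | list_failures djpeg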

  91. for the ubuntu part by Anonymous Coward · · Score: 0

    Debian has debsums (I assume Ubuntu includes this too). It can check the md5 hash of every file installed via the package manager and list what changed. Then just --reinstall those packages, and you're back to a pristine state as far as OS and applications.

    A tiny part of what you want to do, but may help.

  92. journaling issues by Anonymous Coward · · Score: 0

    The Linux hfsplus driver does not support journaling. If your partition had journaling enabled then it was outdated when linux powered down. After the power failure the only files that should have been affected would be recently written files that linux had in the write cache. The problem is that when you used OSX to repair the disk it would try to use the outdated journal to correct errors in the directory structure, and that may have caused the corruption you found in files that were not open when power was lost.

  93. A multi-tool approach may be necessary by Arrogant-Bastard · · Score: 2

    First, let's presume you're running Linux for what follows.

    1. You're going to want to be familiar with both file(1) and find(1). File(1) is pretty straightforward, but be aware that its heuristics for file type detection vary in accuracy. If you're not find-literate, then at least get used to this construct:
    find /foo/bar -name "*.jpg" -print | sort -u > /tmp/files.jpg
    which will recursively search directory /foo/bar for all files suffixed ".jpg" and dump a sorted list of them into /tmp/files.jpg and this one:
    find /foo/bar -type f -print | sort -u > /tmp/files.all
    which will search the same directory, but will return a list of all (plain) files, that is, things which are not directories, devices, sockets, etc., sorted and dumped into file /tmp/files.all. (Note that the method by which find traverses filesystem trees won't yield sorted output, hence the need to pipe these through sort.)

    2. You now have (a) a list of all jpg files and (b) a list of all files. (I picked jpg arbitrarily to illustrate the process, by the way.) You can now generate a list of all files that are NOT jpg with this:
    comm -13 /tmp/files.jpg /tmp/files.all > /tmp/files.all2
    The point of this exercise is that you can now repeat steps 1-2 with .gif, .mpg, etc., as you deal with each file type and reduce the remaining list to those awaiting your attention. /tmp/files.all3, /tmp/files.all4, etc. will each be smaller and eventually, if you deal with all files, /tmp/files.allX will be zero-length. Note that not all files have suffixes, of course -- and those without will likely be the ones requiring the most manual effort. If you want to know which suffixes are most numerous, something like
    sed -e "s/.*\.//" /tmp/files.all | sort | uniq -c | sort -n
    will give you a rough idea.

    3. Now then...you'll need some tools for dealing with each file type. The first tool I'd use is stat(1), to check sizes for plausibility. Then things like jpeginfo(1), mp3val(1), tidy(1), will be some help, but of course you'll need to distinguish between "error message emitted because file is corrupt" and "error message emitted because file has minor issues...that it had BEFORE this episode". You may need to check the Ubuntu repository for tools you don't have; you may need to do some searching on the web for "Linux tool to check PDF integrity" and similar.

    4. If you have backups of any kind and can restore them, then you could try using sum(1) to compare checksums pre- and post-incident. This is a filetype-invariant method, which is good because it lets you skip the above...but bad because all it will tell you is "different", not "mildly damaged" or "horribly corrupted" or something in between.

    5. I would recommend against deleting anything at this point. Instead, move it to secondary storage, like an external drive. I don't have a specific reason for advising this, other than "many years of experience doing partially-manual, partially-automated things like this and a recognition that sometimes errors in the methodology...or fatigue introduced by the tedium of executing it...lead to mistakes".

    6. Good luck.
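
    For anyone who would rather script steps 1-2 than juggle temp files, here is a rough Python sketch of the same inventory idea. The function names (list_files, split_by_suffix, suffix_counts) are made up for illustration, and unlike find -name "*.jpg" the suffix match here is case-insensitive, which is an assumption about what you'd want.

```python
import os
from collections import Counter


def list_files(root):
    """Return a sorted list of regular files under root (roughly find -type f | sort)."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks, sockets, devices, etc.; keep plain files only.
            if os.path.isfile(path) and not os.path.islink(path):
                found.append(path)
    return sorted(found)


def split_by_suffix(paths, suffix):
    """Split paths into (matching, remaining) by case-insensitive suffix."""
    suffix = suffix.lower()
    matching = [p for p in paths if p.lower().endswith(suffix)]
    remaining = [p for p in paths if not p.lower().endswith(suffix)]
    return matching, remaining


def suffix_counts(paths):
    """Count files per extension, most numerous first; '' means no extension."""
    counts = Counter(os.path.splitext(p)[1].lower() for p in paths)
    return counts.most_common()
```

    Call split_by_suffix repeatedly on the "remaining" list as you finish each file type; when it is empty, everything has been looked at.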

  94. Re:Your eyes by Zaiff+Urgulbunger · · Score: 5, Informative
    Might be better using the "identify" command of ImageMagick. The man page says:

    The identify program is a member of the ImageMagick(1) suite of tools. It describes the format and characteristics of one or more image files. It also reports if an image is incomplete or corrupt.
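
    To give a sense of what a format-level check like this involves, here is a small stdlib-only Python sketch for PNG, where every chunk carries its own CRC-32. It is not how identify works internally (that is an assumption-free statement only about PNG's layout: 8-byte signature, then length/type/data/CRC chunks ending with IEND). The function name png_problem is made up, and the sketch does not try to decompress the image data, so it can miss damage that a real decoder would catch.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_problem(path):
    """Return None if signature, chunk layout and CRCs look fine, else a short description."""
    with open(path, "rb") as f:
        if f.read(8) != PNG_SIGNATURE:
            return "missing PNG signature"
        while True:
            header = f.read(8)  # 4-byte big-endian length + 4-byte chunk type
            if len(header) != 8:
                return "truncated before IEND chunk"
            length, ctype = struct.unpack(">I4s", header)
            data = f.read(length)
            crc_bytes = f.read(4)
            if len(data) != length or len(crc_bytes) != 4:
                return "truncated inside %r chunk" % ctype
            (stored_crc,) = struct.unpack(">I", crc_bytes)
            # The stored CRC covers the chunk type and the chunk data.
            if zlib.crc32(ctype + data) & 0xFFFFFFFF != stored_crc:
                return "CRC mismatch in %r chunk" % ctype
            if ctype == b"IEND":
                return None
```

    JPEG has no per-segment checksums, so for JPGs you are back to running an actual decoder over the file.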

  95. Re:Newbie question hour? by Vancorps · · Score: 1

    Last I checked, which was about two months ago, HFS+ with journaling enabled does not work reliably at all on Linux. I ran into this problem when trying to set up a Drobo that everyone could share. The reality is that the only option that works on all three would have been HFS+ with journaling disabled. In my experience NTFS-3G works great on Linux and like crap on OS X, never mind the fact that you wouldn't even be able to install OS X on an NTFS partition.

  96. A couple thoughts by Anonymous Coward · · Score: 0

    zfs, btrfs or git fsck

    I have a bunch of mods to git to run it as a backup utility. Things like all the meta information, device files, running across filesystems and so on... :-)
    Though, in this case, sounds like you want to get rid of Paragon.

  97. Re:Your eyes by Anonymous Coward · · Score: 0

    find . -type f | egrep -i jpe?g | xargs -n 1 djpeg etc....
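
    The same "run a decoder over every file and see which ones it chokes on" idea, as a hedged Python sketch that copes with spaces in filenames. The name failing_files is made up, and it assumes the checker command signals trouble through a non-zero exit status; check that for whatever tool you plug in (djpeg, jpeginfo, etc.) before trusting the result.

```python
import subprocess


def failing_files(paths, checker):
    """Run checker + [path] for each path; return the paths where it exits non-zero."""
    bad = []
    for path in paths:
        result = subprocess.run(
            list(checker) + [path],
            stdout=subprocess.DEVNULL,  # discard decoded output
            stderr=subprocess.DEVNULL,  # discard diagnostics; only the exit status is used
        )
        if result.returncode != 0:
            bad.append(path)
    return bad
```

    Something like failing_files(jpeg_paths, ["djpeg"]) would be the rough equivalent of the pipeline above, under the exit-status assumption.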

  98. Fingerprint (Re: md5sum) by Anonymous Coward · · Score: 0

    I made a command line tool called Fingerprint for this process: http://www.oriontransfer.co.nz/gems/fingerprint

    There is a GUI available for Mac OS X.

  99. Re:Guess you should have used a real filesystem by Anonymous Coward · · Score: 0

    Even trolls look stupid when they don't RTFS.

  100. "corrupt" is often subjective by MSG · · Score: 1

    Files can be corrupted by rare spontaneous bit flipping, by mis-writing a block that was intended for another file or corrupting the block list to include data from another file (cross-linked files), by including blocks that don't exist, or by including blocks that have no data or arbitrary data.

    Headers or meta data in some file formats can be verified by applications that support that file format, but it's possible for some of those problems to change the file's data such that the data is still valid, but wrong. If you have a large collection of media files or image files, filesystem corruption could potentially cross-link valid data from another file of the same type.

    All of that is to say that the only way you can reliably detect corrupt files is to compare them to files that are known good. To anyone with backups in your position, I would simply say that the best option would be to wipe the system and restore a backup that you trust. If you had rsnapshot backups, you might be able to:
    rsync -avcn /backup/ /filesystem/

    rsync would then tell you which files differed from backup.
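
    If rsync isn't handy, the same comparison can be sketched in Python with the standard filecmp module. The function name differing_files is made up for illustration; it only looks at files that exist in both trees, and it will also flag files you legitimately edited since the backup, so treat the output as a list of suspects.

```python
import filecmp
import os


def differing_files(backup_root, live_root):
    """Return relative paths present in both trees whose contents differ."""
    differing = []
    for dirpath, _dirnames, filenames in os.walk(backup_root):
        for name in filenames:
            backup_path = os.path.join(dirpath, name)
            rel = os.path.relpath(backup_path, backup_root)
            live_path = os.path.join(live_root, rel)
            if not os.path.isfile(live_path):
                continue  # missing from the live tree; not a content difference
            # shallow=False forces a byte-by-byte comparison instead of trusting size/mtime.
            if not filecmp.cmp(backup_path, live_path, shallow=False):
                differing.append(rel)
    return sorted(differing)
```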

    According to comment 39919031, Paragon HFS may have serious bugs. It's possible that the problem didn't actually come from the power loss, but from a bad filesystem driver. I'd recommend using something better supported by all systems for your shared space, or using hardware-assisted virtualization for all but one of the operating systems. On my own hardware, I run Linux with other systems in KVM guests, which works well. The host OS can export shared space over the network (NFS or CIFS) to the guests, which is probably the most stable filesystem configuration possible.

  101. Re:Newbie question hour? by Anonymous Coward · · Score: 0

    Well yes, it is relatively easy, when compared to the alternative of writing your own program to do the task. Now that you have the command line, all you have to do is copy and paste, no thinking required. I notice you did not offer an alternative, either.

  102. Re:Your eyes by neyla · · Score: 2

    That seems not worth it. The thing is, both drive-space and data-volume tend to double every ~18 months or so. You wait first "a couple of years", then on a network drive, then once a decade has passed, they go in the trash.

    But a decade ago the cheapest storage was a 40GB drive costing $130 or thereabouts. Today 40GB worth of space is 1.5% of that shiny new 3TB-disk costing $150 or thereabouts.

    There's essentially no benefit to deleting old data, because old data is *always* small data, and so copying it to the new disk will use a minuscule portion of the new disk and have essentially no cost. $150/3TB is equivalent to $2 for saving those 40GB.

    The only data that's potentially worthwhile to delete is *new* data that you have no need for. There is no such thing as "old but large data".

    Avoiding clutter is a different issue, but that's easily solved by copying all the old data to a named folder, then move out of that folder and into the current file-system only those files you actually use.

  103. Re:Newbie question hour? by Anonymous Coward · · Score: 0

    exFAT is a good alternative, even if it is proprietary. It is what I use to share data between Linux, Windows and Mac. There is a FUSE driver for it.

  104. Corrupt picture files in Windows by Elementalor · · Score: 1

    One way to check JPEG/PNG/GIF files that may be corrupted is browsing the folder with thumbnails on. The files that don't show a thumbnail are corrupted.

  105. Triple Booting with Write permission O.o by Anonymous Coward · · Score: 0

    They may have been corrupted before your crash. I've had spooky (similar) things happen using Linux NTFS write before. You may just have noticed the bad files after the power loss.

    Triple booting is the source of the problem. Some things don't mix even if they are technically "supported."
    If you must triple boot consider a more compatible file system (fat32 lol?) or disable writing to disk in foreign OS.

    Data that is irreplaceable may be worth recovering.
    For me formatting and starting fresh -- best option.

  106. Invalid path or corruption? by Anonymous Coward · · Score: 0

    The actual files may not be corrupt. Are deeply nested folders the ones corrupted? Something to consider.

  107. Bit Torrent by Barny · · Score: 1

    Simple enough: package up the 'known good' files on your server into a torrent, give the torrent file to the laptop, and have it use BitTorrent's built-in hash checking to verify and, if bad, replace the damaged parts.
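
    The reason this works is that a (v1) torrent stores a SHA-1 hash for every fixed-size piece of the data, so damage can be pinned down to individual pieces. A rough Python sketch of just that piece-hashing idea follows; it is not a torrent client, the function names are made up, and the 256 KiB piece size is an arbitrary assumption.

```python
import hashlib


def piece_hashes(path, piece_size=256 * 1024):
    """Return the SHA-1 hex digest of each fixed-size piece of the file."""
    digests = []
    with open(path, "rb") as f:
        while True:
            piece = f.read(piece_size)
            if not piece:
                break
            digests.append(hashlib.sha1(piece).hexdigest())
    return digests


def damaged_pieces(reference, path, piece_size=256 * 1024):
    """Return indices of pieces that differ from the reference hashes (or are missing/extra)."""
    current = piece_hashes(path, piece_size)
    count = max(len(reference), len(current))
    return [
        i for i in range(count)
        if i >= len(reference) or i >= len(current) or reference[i] != current[i]
    ]
```

    Compute piece_hashes on the known good copy, then run damaged_pieces against the laptop's copy; only the listed pieces need to be fetched again.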

    --
    ...
    /me sighs
  108. Too late now, but... by lga · · Score: 1

    Silent file corruption is the reason why I now keep all my data on a ZFS filesystem. ZFS has a checksum for every block, and if you have any redundancy at all (raidz, raidz2, or even just telling ZFS to keep two copies of each file) then it will repair the corruption as well as detect it. I've got an HP Microserver running Solaris, but I recommend running FreeNAS instead if you don't know ZFS. This blog is a good place to learn about ZFS.

  109. Re:Your eyes by Spazmania · · Score: 1

    I don't like losing data so it sits in a raid and gets backed up. This means I'm not just keeping one copy of primary data, I'm keeping many. And reprocessing/recopying it each time I make a backup.

    --
    Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.
  110. integrity checker by Anonymous Coward · · Score: 0

    http://diglloydtools.com/integritychecker.html

    I use it all the time on a daily basis on my post-production job, with daily TB of in/out.

    Does the job very well.

  111. getting your files back by przemekklosowski · · Score: 1

    It is rather unlikely that you have a massive corruption of actual data in multiple files--for that, you'd need a sustained write activity hitting all over the disk. Possible, but not very probable. Instead, I think you have metadata corruption, so that the filesystem points to wrong blocks. The glimmer of hope here is that the actual file data is mostly contiguous, so that you can scan through the image and identify individual files even without the filesystem information. There is a forensic tool called 'foremost' that does exactly that: rips through the binary filesystem image and as it finds headers of known data types (jpeg, gif, doc, mp3, etc. etc.), it tries to find as much following data as is consistent with known file layouts. The result of course has tons of cryptically named files of each type foremost knows about---not all of them are legitimate, even---but it's better than nothing.
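
    To illustrate the header-scanning idea (not foremost's actual implementation, which is an assumption-free claim only about the general technique), here is a tiny Python sketch that looks for JPEG start and end markers in a chunk of raw bytes. The function name carve_jpeg_candidates is made up. It assumes the file data is contiguous and takes the first end marker it sees, so embedded thumbnails and fragmented files will produce bogus or truncated candidates, which is exactly why carved output needs checking afterwards.

```python
JPEG_START = b"\xff\xd8\xff"  # start-of-image marker plus the first byte of the next marker
JPEG_END = b"\xff\xd9"        # end-of-image marker


def carve_jpeg_candidates(blob):
    """Return (start, end) byte offsets of spans in blob that look like JPEG files."""
    spans = []
    pos = 0
    while True:
        start = blob.find(JPEG_START, pos)
        if start == -1:
            break
        end = blob.find(JPEG_END, start + len(JPEG_START))
        if end == -1:
            break  # header without an end marker; give up on the rest
        spans.append((start, end + len(JPEG_END)))
        pos = end + len(JPEG_END)
    return spans
```

    Each (start, end) pair can be written out as blob[start:end] and then run through a real decoder to weed out the junk.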

  112. Re:Your eyes by godefroi · · Score: 1

    but with 3TB drives for $100 these days

    Where are you buying your drives? I can't find them for less than about $185...

    --
    Karma: Poor (Mostly affected by lame karma-joke sigs)
  113. Re:Your eyes by Anonymous Coward · · Score: 0

    If you really did "fix" DarwinSurvivor's post, then perhaps you'd care to explain this part of your prior post?

    took awhile but it was worthwhile to get rid of old shows/movies I'll likely never watch.

  114. Re:Your eyes by kyrio · · Score: 1

    3TB HDD is less than $150 at newegg. It's faster than or as fast as any 7200RPM drive from previous years as well, even if it's a 5900RPM drive, because technology just gets better. I'm not even sure what you are going on about with SSD.

  115. Re:Your eyes by kyrio · · Score: 1

    Are you not American? newegg has them for less than $150.

  116. Re:Guess you should have used a real filesystem by Anonymous Coward · · Score: 0

    But OSX IS based on a "freetard" OS

  117. Try DiskWarrior by MacDaffy · · Score: 1

    Since your major partition is HFS+, I recommend DiskWarrior (http://www.alsoft.com/diskwarrior/). I've used it professionally for over ten years, and it still does the best job at finding, fixing, and reporting corrupted files on HFS disks.

  118. Re:BSOD? No, use open source "Tripwire" by Anonymous Coward · · Score: 0

    or just run md5deep on all files and save it to a text file; a full backup would have been the best strategy here though (I use bacula, which also stores the hash of a file it backs up).

  119. Re:Your eyes by Anonymous Coward · · Score: 0

    So, 1TB for $50 with no backup in case of failure or corruption. Or did you mean 1TB with redundancy for $150? That begins to fall into the "several hundred" category.

  120. Re:Newbie question hour? by Anonymous Coward · · Score: 0

    It's just a silly joke, calm down Francis.

  121. FITS would be better by Anonymous Coward · · Score: 0

    JHOVE is good, but only recognises a small number of files. It would be better to use the File Information Tool Set (FITS) which wraps several tools, including JHOVE.

    http://code.google.com/p/fits/

  122. md5sum a directory tree by Anonymous Coward · · Score: 0

    Years ago, I had a motherboard with a ServerWorks chipset which had the really fun property of corrupting hard drives if DMA was enabled.

    I wrote a simple python script that recursed through directories, and generated a md5sum file in each directory, containing the md5sum of each file.
    When run again, it compares the current files against the md5sum file and reports differences, including new files, removed files and changed files.
    It is GPL3 and works anywhere python 2.6 or newer exists, including windows xp...7 and linux.
    It is at http://jdeifik.com/ under "md5sum a directory tree"

    Of course, it won't fix broken files, but it will detect all corrupted files (assuming you ran it over good files once).
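
    For readers who just want the gist, here is a minimal Python sketch of the same record-then-compare idea. It is not the script mentioned above: it keeps one in-memory manifest for the whole tree instead of an md5sum file per directory, uses SHA-256 rather than MD5, and the function names are made up for illustration.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            manifest[os.path.relpath(path, root)] = file_digest(path)
    return manifest


def compare_manifests(old, new):
    """Return (added, removed, changed) as sorted lists of relative paths."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    changed = sorted(p for p in set(old) & set(new) if old[p] != new[p])
    return added, removed, changed
```

    The manifest is a plain dict, so it can be saved with json.dump while the files are known good and compared against a fresh build_manifest run later.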

  123. Re:Newbie question hour? by Mysticalfruit · · Score: 1

    I've got a laptop running exclusively Ubuntu and the default behavior when the battery gets to 5% is to just hibernate.
    As for a validation of files, this is a truly trivial problem to solve...

    --
    Yes Francis, the world has gone crazy.
  124. Re:Your eyes by godefroi · · Score: 1

    I am, and that was the first place I looked...

    --
    Karma: Poor (Mostly affected by lame karma-joke sigs)
  125. Re:Your eyes by isorox · · Score: 1

    Maybe you didn't mean it this way, but dang if I did not see all the PHBs come out from work with your comment. "I can get 1TB Drives from Fryes for $80.00, why do you say it costs several hundred?".

    Are you one of the Exchange admins that claims that 10,000 emails totalling 500MB is too much to have in your inbox, despite Gmail AND Hotmail not even sniffing at it (and providing rich-text cross-platform email, instant search)?

    For enterprise storage I tend to aim high at a capital cost of $500/TB, and running costs of $50/TB/year.

    That means purging 300GB saves $20 a year. If it only takes a day that's fast. It also costs the company $800 in lost time. Whoop.

  126. Re:Newbie question hour? by isorox · · Score: 1

    Ubuntu pops up a warning window, and if you ignore it the battery light turns orange, and then red, and then it should hibernate. Flat-out dying is not something I've come across under Ubuntu (and I have some flaky old machines with old batteries, and they still warn me and then shut down).

    I have. I have flaky batteries that report 0% when they've still got another 20 minutes or so left. When the warning pops up I often kill gnome-power-manager, save my work, and take my chances.

    Sometimes I don't get to power in time.

    I also have a problem if I knock my second battery out the CD bay (thinkpad). The machine doesn't like that when on battery power, and just turns off.

  127. Re:Newbie question hour? by isorox · · Score: 1

    mplayer can detect corrupted movie and audio files

    find . -name '*.mov' -exec mplayer -msglevel all=6 -speed 100.0 -framedrop -nogui -nolirc -cache 8192 -tskeepbroken -ao null -vo null {} \; | grep Warning! > $1.txt

    Change the *.mov as appropriate.

    <infomercial>its JUST. THAT. EASY folks!</infomercial>

    Yes it is, as the OP was kind enough to tell you advanced mplayer use.

    Compare this cut-and-paste job to using "simple" tools.

    Click Start
    Click Find
    type *.mov
    wait for the find to finish
    double click the top file
    watch it
    if it breaks, delete the file
    double click the next file
    continue for all files

    The CLI makes complex tasks easy and fast. When someone is kind enough to give you a copy and paste line it's even easier and faster.

  128. Re:Your eyes by neyla · · Score: 1

    How many copies you keep of the data is entirely beside the point.

    The point is that a disk full of data from a decade back, is only going to fill 1.5% of a new disc.

    Yes, if you keep 2 separate backups plus your primary storage, you're going to need *3* new discs, not one. But the presence of the old data doesn't change this noticeably.

    My point is that "all my files from the last decade" are always going to be ~2 orders of magnitude larger than "all my files from before that", so deleting the latter is never going to save you more than a tiny percentage.

    Even if you keep 100 separate backups, deleting all the decade-old files will *still* only save you 1-2% of your total storage-needs.

  129. Re:Your eyes by Anonymous Coward · · Score: 0

    Did you even read what you typed before you hit submit?

  130. FBI by nobodyatnowhere · · Score: 1

    Send your drive to the FBI, they will scan it for you.