Slashdot Mirror


The Many Paths To Data Corruption

Runnin'Scared writes "Linux guru Alan Cox has a writeup on KernelTrap in which he talks about all the possible ways for data to get corrupted when being written to or read from a hard disk drive. This includes much of the information applicable to all operating systems. He prefaces his comments noting that the details are entirely device specific, then dives right into a fascinating and somewhat disturbing path tracing data from the drive, through the cable, into the bus, main memory and CPU cache. He also discusses the transfer of data via TCP and cautions, 'unfortunately lots of high performance people use checksum offload which removes much of the end to end protection and leads to problems with iffy cards and the like. This is well studied and known to be very problematic but in the market speed sells not correctness.'"

2 of 121 comments (clear)

  1. I think this has happened to me by jdigital · · Score: 4, Interesting

    I think I suffered from a series of Type III errors (rtfa). After merging lots of poorly maintained backups of my /home file system I decided to write a little script to look for duplicate files (using file size as a first indicator, then md5 for ties). The script would identify duplicates and move files around into a more orderly structure based on type, etc. After doing this i noticed that a small number of my mp3's now contain chunks of other songs in them. My script was only working with whole files, so I have no idea how this happened. When I refer back to the original copies of the mp3s the files are uncorrupted.

    Of course, no one believes me. But maybe this presentation is on to something. Or perhaps I did something in a bonehead fashion totally unrelated.

    --
    :wq ~ ~ ~ ~ ~
  2. Re:benchmarks by dgatwood · · Score: 5, Interesting

    I've concluded that nobody cares about data integrity. That's sad, I know, but I have yet to see product manufacturers sued into oblivion for building fundamentally defective devices, and that's really what it would take to improve things, IMHO.

    My favorite piece of hardware was a chip that was used in a bunch of 5-in-1 and 7-in-1 media card readers about four years ago. It was complete garbage, and only worked correctly on Windows. Mac OS X would use transfer sizes that the chip claimed to support, but the chip returned copies of block 0 instead of the first block in every transaction over a certain size. Linux supposedly also had problems with it. This was while reading, so no data was lost, but a lot of people who checked the "erase pictures after import" button in iPhoto were very unhappy.

    Unfortunately, there was nothing the computer could do to catch the problem, as the data was in fact copied in from the device exactly as it presented it, and no amount of verification could determine that there was a problem because it would consistently report the same wrong data.... Fortunately, there are unerase tools available for recovering photos from flash cards. Anyway, I made it a point to periodically look for people posting about that device on message boards and tell them how to work around it by imaging the entire flash card with dd bs=512 until they could buy a new flash card reader.

    In the end, I moved to a FireWire reader and I no longer trust USB for anything unless there's no other alternative (iPod, iPhone, and disks attached to an Airport Base Station). While that makes me somewhat more comfortable than dealing with USB, there have been a few nasty issues even with FireWire devices. For example, there was an Oxford 922 firmware bug about three years back that wiped hard drives if a read or write attempt was made after a spindown request timed out or something. I'm not sure about the precise details.

    And then, there is the Seagate hard drive that mysteriously will only boot my TiVo about one time out of every twenty (but works flawlessly when attached to a FW/ATA bridge chipset). I don't have an ATA bus analyzer to see what's going on, but it makes me very uncomfortable to see such compatibility problems with supposedly standardized modern drives. And don't get me started on the number of dead hard drives I have lying around....

    If my life has taught me anything about technology, it is this: if you really care about data, back it up regularly and frequently, store your backups in another city, ensure that those backups are never all simultaneously in the same place or on the same electrical grid as the original, and never throw away any of the old backups. If it isn't worth the effort to do that, then the data must not really be important.

    --

    Check out my sci-fi/humor trilogy at PatriotsBooks.