Slashdot Mirror


The Many Paths To Data Corruption

Runnin'Scared writes "Linux guru Alan Cox has a writeup on KernelTrap in which he talks about all the possible ways for data to get corrupted when being written to or read from a hard disk drive. Much of the information applies to all operating systems. He prefaces his comments by noting that the details are entirely device specific, then dives right into a fascinating and somewhat disturbing account that traces data from the drive, through the cable, onto the bus, and into main memory and the CPU cache. He also discusses the transfer of data via TCP and cautions, 'unfortunately lots of high performance people use checksum offload which removes much of the end to end protection and leads to problems with iffy cards and the like. This is well studied and known to be very problematic but in the market speed sells not correctness.'"

4 of 121 comments

  1. End-to-end by Intron · · Score: 4, Informative

    Some enterprise server systems use end-to-end protection, meaning the data block is longer. If you write 512 bytes of data + 12 bytes or so of check data and carry that through all of the layers, it can prevent the data corruption from going undiscovered. The check data usually includes the block's address, so that data written with correct CRC but in the wrong place will also be discovered. It is bad enough to have data corrupted by a hardware failure, much worse not to detect it.
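
A minimal sketch of the idea described above, not any vendor's actual on-disk format. It assumes a hypothetical 12-byte check field made of an 8-byte block address plus a 4-byte CRC-32; the names protect and verify are made up for illustration.

```python
import struct
import zlib

BLOCK_SIZE = 512
# Assumed layout (illustrative only): 8-byte block address + 4-byte CRC-32.
CHECK_FORMAT = ">QI"
CHECK_SIZE = struct.calcsize(CHECK_FORMAT)  # 12 bytes


def protect(data: bytes, address: int) -> bytes:
    """Append check data (address + CRC) to a 512-byte data block."""
    if len(data) != BLOCK_SIZE:
        raise ValueError("expected exactly 512 bytes of data")
    # The CRC covers the data and the address it was meant for.
    crc = zlib.crc32(data + struct.pack(">Q", address))
    return data + struct.pack(CHECK_FORMAT, address, crc)


def verify(block: bytes, expected_address: int) -> bytes:
    """Return the data if the block is intact and in the right place."""
    if len(block) != BLOCK_SIZE + CHECK_SIZE:
        raise ValueError("wrong protected block length")
    data, check = block[:BLOCK_SIZE], block[BLOCK_SIZE:]
    address, crc = struct.unpack(CHECK_FORMAT, check)
    if zlib.crc32(data + struct.pack(">Q", address)) != crc:
        raise IOError("CRC mismatch: block was corrupted")
    if address != expected_address:
        raise IOError("address mismatch: valid block found in the wrong place")
    return data
```

Every layer that handles the 524-byte block can call verify, so a flipped bit or a misdirected write is reported instead of being passed along silently.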

    --
    Intron: the portion of DNA which expresses nothing useful.
  2. Hello ZFS by Wesley+Felter · · Score: 4, Informative

    ZFS's end-to-end checksums detect many of these types of corruption; as long as ZFS itself, the CPU, and RAM are working correctly, no other errors can corrupt ZFS data.

    I am looking forward to the day when all RAM has ECC and all filesystems have checksums.
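
A toy sketch of why this kind of checksum catches more than a per-sector CRC. As far as I understand it, ZFS keeps each block's checksum in the pointer that refers to the block rather than next to the data itself. The dict-backed "disk", the BlockPointer class, and the use of SHA-256 are all assumptions made for illustration, not ZFS's real structures.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockPointer:
    """Hypothetical pointer: where the block lives and what it must hash to."""
    address: int
    checksum: bytes


def write_block(disk: dict, address: int, data: bytes) -> BlockPointer:
    """Store data on the toy disk and return a pointer carrying its checksum."""
    disk[address] = data
    return BlockPointer(address, hashlib.sha256(data).digest())


def read_block(disk: dict, ptr: BlockPointer) -> bytes:
    """Read a block and check it against the checksum held by its parent."""
    data = disk[ptr.address]
    if hashlib.sha256(data).digest() != ptr.checksum:
        raise IOError("checksum mismatch at block %d" % ptr.address)
    return data
```

Because the checksum travels with the pointer, a drive that returns stale or misdirected data with a perfectly valid sector CRC still fails the comparison.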

    1. Re:Hello ZFS by harrkev · · Score: 3, Informative

      I am looking forward to the day when all RAM has ECC and all filesystems have checksums.
      Not gonna happen. The problem is that ECC memory costs more, simply because there is 12.5% more memory. Most people are going to go for as cheap as possible.

      But, ECC is available. If it is important to you, pay for it.
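
For what it's worth, the 12.5% figure matches the usual 72-bit-wide ECC module: 64 data bits plus 8 check bits. A quick sketch of the arithmetic, assuming a Hamming-style single-error-correct, double-error-detect (SECDED) code; the function names are made up for illustration.

```python
def secded_check_bits(data_bits: int) -> int:
    """Check bits needed by a Hamming SECDED code for the given data width."""
    r = 0
    # Hamming bound for single-error correction: 2**r >= data_bits + r + 1.
    while 2 ** r < data_bits + r + 1:
        r += 1
    # One extra overall parity bit adds double-error detection.
    return r + 1


def ecc_overhead(data_bits: int) -> float:
    """Extra memory needed, as a fraction of the data width."""
    return secded_check_bits(data_bits) / data_bits


print(secded_check_bits(64))  # 8
print(ecc_overhead(64))       # 0.125
```
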
      --
      "-1 Troll" is apparently the same as "-1 I disagree with you."
  3. Real-life proof of ZFS detecting problems by E-Lad · · Score: 3, Informative

    Give this blog entry a read:
    http://blogs.sun.com/elowe/entry/zfs_saves_the_day_ta

    And you'll understand :)