Slashdot Mirror


Error-Proofing Data With Reed-Solomon Codes

ttsiod recommends a blog entry in which he details steps to apply Reed-Solomon codes to harden data against errors in storage media. Quoting: "The way storage quality has been nose-diving in the last years, you'll inevitably end up losing data because of bad sectors. Backing up, using RAID and version control repositories are some of the methods used to cope; here's another that can help prevent data loss in the face of bad sectors: Hardening your files with Reed-Solomon codes. It is a software-only method, and it has saved me from a lot of grief..."

49 of 196 comments (clear)

  1. It can make files a bit hard to read, though by Bromskloss · · Score: 3, Funny

    salkdffalkfhwefh2ihr5j45!"Â5jkcq2%"45wceh5 234j5cja4h5c2q4x524qZTkzzj3kzg3qkgl3kzgq3kjgh kq3gkzlq3hwgjlh 34qlgch34ljkw93q0x45c45 #&%#%&5vcXÂ%YXCHGC%ub64bVE5&UBy4vy5yc5E&Â E%vu64EV46rcuw4&C/4w6

    --
    Swedish plasma phys. PhD student; MSc EE; knows maths, programming, electronics; finance interest; seeks opportunities
    1. Re:It can make files a bit hard to read, though by xquark · · Score: 4, Insightful

      It really depends where you store the FEC, some techniques store the fec separately others concatenate and others interleave the FEC. Each method has its own advantages and disadvantages.

      --
      Arash Partow's Philosophy: Be a person who knows what they don't know, and not a person who doesn't know.
    2. Re:It can make files a bit hard to read, though by Tablizer · · Score: 5, Funny

      2ihr5j45!"Â5jkcq2%"45wceh5 234j5cja4h5c2q4x524q kq3gkzlq3hwgjlh 34qlgch34ljkw93q0x45c45 #&%#%&5vcXÂ%

      I for one welcome our Perl Overlords

  2. Drives already do this by Rene+S.+Hollan · · Score: 3, Informative

    ... at least CDROMs employ RS codes.

    --
    In Liberty, Rene
    1. Re:Drives already do this by Architect_sasyr · · Score: 5, Informative
      From CD-ROM wiki:

      A CD-ROM sector contains 2352 bytes, divided into 98 24-byte frames. The CD-ROM is, in essence, a data disk, which cannot rely on error concealment, and therefore requires a higher reliability of the retrieved data. In order to achieve improved error correction and detection, a CD-ROM has a third layer of Reed-Solomon error correction.[1] A Mode-1 CD-ROM, which has the full three layers of error correction data, contains a net 2048 bytes of the available 2352 per sector. In a Mode-2 CD-ROM, which is mostly used for video files, there are 2336 user-available bytes per sector. The net byte rate of a Mode-1 CD-ROM, based on comparison to CDDA audio standards, is 44.1k/s×4B×2048/2352 = 153.6 kB/s. The playing time is 74 minutes, or 4440 seconds, so that the net capacity of a Mode-1 CD-ROM is 682 MB.

      I'd say that's a yes.

      --
      Me failed English...
      FreeBSD over Linux. If my comments seem odd, this may explain...
    2. Re:Drives already do this by Marillion · · Score: 4, Interesting

      My biggest failed prediction in the world of computers was the CD-ROM.

      I was an audio CD early adopter and I knew from articles I read that audio CD's often had a certain defect rate. The defect rate was usually such that you would never hear it. One artist even published all the defects in the liner notes.

      Based upon this, I presumed that you would never get the defect rate to zero and that no one would trust a data medium with anything less than perfection - and thus predicted the CD-ROM would never catch on.

      They don't have to get the rate to zero. Just close enough to zero for the RS to function.

      --
      This is a boring sig
    3. Re:Drives already do this by Anonymous Coward · · Score: 2, Informative

      Yeah, but is there another problem elsewhere in the system? I have an el-cheapo USB-PATA adapter with a MTBF (mean time to bit flip) of about 2 hours. Every other disk was ruined, and I only knew because of a sick little obsession with par2. Disk data is ECC'd, PATA data is parity checked, and USB data is checksummed. Still, inside one little translator chip, all that can be ruined. and that's why data integrity MUST be an operating/file system service.

    4. Re:Drives already do this by xquark · · Score: 3, Interesting

      My understanding is that it is possible to drill a few holes no larger than 2mm in diameter equally spread over the surface of an "audio cd" and with the help of h/w RS erasure decoding, channel interleaving and channel prediction (eg:probabilistically reconstruct missing right channel from known left channel) one can produce a near perfect reconstruction - that's what usually happens to overcome scratches and other kinds of simple surface defects.

      --
      Arash Partow's Philosophy: Be a person who knows what they don't know, and not a person who doesn't know.
    5. Re:Drives already do this by wik · · Score: 2, Funny

      You haven't read too many theses. There are holes all over the place...

      --
      / \
      \ / ASCII ribbon campaign for peace
      x
      / \
    6. Re:Drives already do this by Solandri · · Score: 5, Informative

      That's a pretty fundamental part of information theory - communication in a noisy channel. If your communications (or data storage) are digital, you can overcome any level of random noise (error) at the cost of degraded transmission rate (increased storage requirement). Before CDs, it was (and still is) most prevalent in modem protocols and hard drives. Modern hard drives would probably be impossible without it - read errors are the norm, not the exception. It's just hidden from the high-level software by multiple levels of error correction in the low-level firmware.

    7. Re:Drives already do this by Solandri · · Score: 5, Informative

      Data is stored linearly on a CD (and DVD). So the data can survive huge scratches running from the center to edge, but is very susceptible to radial scratches rotated around the center. If you think of a CD as an old-style phonograph record, you can scratch across the grooves and the error correction will fix it; but scratching along a groove will quickly corrupt the data because the scratch will destroy sequential data (and its ECC). That's why they recommend cleaning CDs by wiping from the center out, never in a circular motion.

    8. Re:Drives already do this by femto · · Score: 4, Interesting

      Another view is that everything is a code in a noisy environment, so there is no way to talk about "the underlying device" as it itself is just another type of coding. Magnetic recording can be viewed as a way of encoding information onto the underlying (thermal) noisy matter. There is some very deep stuff happening in information theory. Let's take the empty universe as a noisy channel. Now every structure in the universe (including you and me) becomes information encoded over the empty universe. One gets the feeling that any "ultimate theory" won't be expressed in terms of forces and fields but some underlying, unifying, concept of information.

    9. Re:Drives already do this by norton_I · · Score: 5, Funny

      Dude, put the bong down.

    10. Re:Drives already do this by Yvan256 · · Score: 3, Funny

      Other valid replies would have been "I'll have what he's having" or "Pass it around, please".

    11. Re:Drives already do this by MagdJTK · · Score: 2, Interesting

      Indeed. In fact, the code used for CDs can cope with 4000 consecutive bits being unreadable. Quite remarkable!

    12. Re:Drives already do this by Wowsers · · Score: 4, Interesting

      I loved my DAT (for audio) portable recorder, it employed Double-Reed-Solomon error correction, you would have to do some serious hammering to the side of the recorder to get the tape to "skip" in a way the error correction could not correct it and you'd hear it drop out, running and recording was NOT out of the question though.

      Now what do the consumers have for recorders - cr*ppy, cheap, nasty, low bitrate, overcompressed MP3 recorders. The recording industry killed off an excellent (but expensive) format to palm off rubbish compressed audio to the masses. (Proper PCM recorders are no different in price to the DAT decks).

      --
      Take Nobody's Word For It.
    13. Re:Drives already do this by complete+loony · · Score: 3, Informative

      AFAIK when a disk is scratched you are more likely to get a tracking error than a failure to decode the audio.

      --
      09F91102 no, 455FE104 nope, F190A1E8 uh-uh, 7A5F8A09 that's not it, C87294CE no. Ah! 452F6E403CDF10714E41DFAA257D313F.
    14. Re:Drives already do this by mentaldrano · · Score: 4, Informative

      Radial scratches go from center to edge, azimuthal scratches go around the center.

    15. Re:Drives already do this by PitaBred · · Score: 2, Insightful

      Thing is, the "overcompressed" MP3 recorders are good enough. Most people use them to record lecture notes, or a meeting, or just talking to themselves. Those are about the only reasons to really need a portable recorder, and for those uses, mp3 is very good. Just because it's low bitrate doesn't mean it's bad, and just because your DAT recorder had higher quality doesn't mean it's more fit for the purposes it would be used for. Seriously... running and recording? Why would you ever want to do that?

  3. ZFS? by segfaultcoredump · · Score: 3, Interesting

    Uh, is this not one of the main features of the ZFS file system? It does a checksum on every block written and will reconstruct the data if an error is found? (assuming you are using either raid-z or mirroring. Otherwise it will just tell you that you had an error).

    1. Re:ZFS? by xquark · · Score: 5, Informative

      checksums really only help in detecting errors. Once you've found errors, if you have an exact redundancy somewhere else you can repair the errors. What reed-solomon codes do is provide the error detecting ability but also the error correcting ability whilst at the same time reducing the amount of redundancy required to a near theoretical minimum.

      btw checksums have limits on how many errors they can detect within lets say a file or other kind of block of data. A simple rule of thumb (though not exact) is that 16 and 32 bit checksums can detect upto 16,32 bit errors respectively anymore and the chance of not detecting every bit error goes up, it could even result in not finding any errors at all.

      --
      Arash Partow's Philosophy: Be a person who knows what they don't know, and not a person who doesn't know.
    2. Re:ZFS? by bobbozzo · · Score: 2, Informative

      Mirroring is RAID-1, not 0.

      --
      Nothing to see here; Move along.
    3. Re:ZFS? by this+great+guy · · Score: 3, Informative

      I have been a ZFS user for a while and know a lot of its internals. Let me comment on what you said.

      checksums really only help in detecting errors.

      Not in ZFS. When the checksum reveals silent data corruption, ZFS attempts to self-heal itself by rewriting the sector with a known good copy. Self-healing is possible if you are using mirroring, raidz (single parity), raidz2 (dual parity), or even a single disk (provided the copies=2 filesystem attribute is set). The self-healing algorithm in the raidz and raidz2 cases is actually interesting as it is based on combinatorial reconstruction: ZFS makes a series of guesses as to which drive(s) returned bad data, it reconstructs the data block from the other drives, and then validates whether this guess was correct or not by verifying the checksum.

      checksums have limits on how many errors they can detect.

      All the ZFS checksumming algorithms (fletcher2, fletcher4, SHA-256) generate 256-bit checksums. The default is fletcher2 and offers very good error detection (even errors affecting more than 256 bits of data) assuming unintentional data corruption (the fletcher family are not a cryptographic hash algorithms, it is actually possible to intentionally find collisions). SHA-256 is collision-resistant therefore it will in practice detect all data corruptions. It would be computationally infeasible to come up with a corrupted data block that still matches the SHA-256 checksum.

      A good intro to the ZFS capabilities are these slides

  4. Harden Files by inKubus · · Score: 2, Funny

    When he said "harden files", I thought he was going into a long soliloquy on all the porn on his computer, so I went to the next story.

    --
    Cool! Amazing Toys.
  5. Never underestimate the redunancy... by symbolset · · Score: 2, Interesting

    Look, if it's secret, one copy is too many. For everything else, gmail it to five separate recipients. It's not like Google has ever lost any of the millions of emails I've received to date. (This is not a complaint -- they don't show me the spam unless I ask for it).

    And if they ever did lose an email, well, to paraphrase an old Doritos commercial, "They'll make more."

    Seriously, personally I view the the persistence of data as a problem. It's harder to let go of than it is to keep.

    --
    Help stamp out iliturcy.
  6. Re:Been doing this from the past 3 decades by Nefarious+Wheel · · Score: 5, Funny
    Arrrh, aye, this be done since the dawn of time, matey! Ever since the days before global warming when pirates kept a second pistol in their belt just in case. Cap'n Jack Reed in the Solomons would harden his data with a second powder charge when the occasion demanded it.

    "Awk! Parroty Error! Parroty Error! Pieces of Seven, Pieces of Seven"

    (*BOOM*) never did like that bird.

    --
    Do not mock my vision of impractical footwear
  7. Speed? by grasshoppa · · Score: 3, Interesting

    My question is of speed; this seems a promising addition to anyone's back up routine. However, most folks I know have 100s of gigs of data to back up. While differentials could be involved, right now tar'ing to tape works fast enough taht the backup is done before the first staff shows up for work.

    I assume we're beating the hell out of the processor here; so I'm wondering how painful is this in terms of speed?

    --
    Mod me down with all of your hatred and your journey towards the dark side will be complete!
    1. Re:Speed? by Zadaz · · Score: 2, Informative

      Well since my $100 Radio Shack CD player I bought in 1990 could do it in real-time I'm guessing that the requirements are pretty low. In fact a lot of hardware already uses it.

      If you read the rest of the page you find out it's very ingenious and efficient at doing what it does.

      While it's certainly not new (it's from 1960) or unused (hell, my phone uses it to read QR codes) I'm sure its something that has been under the radar of a lot of Slashdot readers, so I'll avoid making a "slow news day" snark.

    2. Re:Speed? by xquark · · Score: 4, Interesting

      The speed of encoding and decoding directly relates to the type of RS and the amount of FEC required. Generally speaking erasure style RS can go as low as O(nlogn) (essentially inverting and solving for a vandermonde or Cauchy style matrix) A more general code that can correct errors (the difference between an error and an erasure is that in the latter you know the location of the error but not its magnitude) may require a more complex process, something like Syndrome-Berlekamp Massey-Forney which is about O(n^2).

      It is possible to buy specialised h/w (or even GPUs) to perform the encoding steps (getting roughly 100+MB/s) and most software encoders can do about 50-60+Mb/ for RS(255,223) - YMMV

      --
      Arash Partow's Philosophy: Be a person who knows what they don't know, and not a person who doesn't know.
  8. Version control != backups by XorNand · · Score: 2, Insightful

    Please, please stop thinking of version control as some sort of backup. When we initially started mandating the use of version control software, developers would just using the "commit" button instead of the "save" button. It makes it *much* more difficult to traverse through the repo when you have three dozen commits per day, per developer, each commented with "ok. really should be fixed now." The worst offenders were issued an Etchasketch for a week while their notebooks went in for service *cough*. Problem solved.

    --
    Entrepreneur : (noun), French for "unemployed"
    1. Re:Version control != backups by dgatwood · · Score: 2, Insightful

      Well, you shouldn't commit until you believe you have it in a state where the changes are usable (i.e. don't break the tree), but beyond that, I'd rather see more commits of smaller amounts of code than giant commits that change ten thousand things. If you end up having to back out a change, it's much easier if you can easily isolate that change to a single commit. My rule is commit early, commit often. I'm not the only one, either:

      http://blog.red-bean.com/sussman/?p=96

      --

      Check out my sci-fi/humor trilogy at PatriotsBooks.

    2. Re:Version control != backups by ceswiedler · · Score: 3, Insightful

      The best solution is for developers to use their own private branches. Then they can commit as much as they want, and integrate into the main branch when they're ready. Unfortunately subversion has crappy support for integration (even with version 1.5 AFAICT) compared to something like perforce.

  9. huh??? by Jane+Q.+Public · · Score: 2, Insightful

    If you think storage quality has been nose-diving, then you haven't been around very long. It just isn't so.. and there really is not much more I can say to add to that.

    I have been around this industry quite a while, and I call bullshit on that.

  10. Yes, RS should be a file system service. by Futurepower(R) · · Score: 3, Insightful

    "... data integrity MUST be an operating/file system service."

    I agree. I'm willing to have a small loss in speed and a small increase in price to have better data integrity.

    There is already data integrity technology embedded in hard drives, and I support making it more robust.

    1. Re:Yes, RS should be a file system service. by iminplaya · · Score: 5, Funny

      Those who sacrifice speed for data integrity deserve neither - BF

      --
      What?
  11. Re:Interesting by Anonymous Coward · · Score: 3, Informative

    The cross platform program dvdisaster will add extra information to your DVD as an error correcting code. Alternatively, you can make a parity file for an already-existing DVD and save it somewhere else.

    It actually has a GUI too, so it must be user friendly.

  12. Or you could just use PAR... by InakaBoyJoe · · Score: 2, Interesting

    TFA introduces some new ".shielded" file format. But do we need yet another file format when PAR (Parchive) has been doing the same job for years now? The PAR2 format is standardized and well-supported cross-platform, and might just have a future even IF you believe that Usenet is dying...

    I always thought it would be cool to have a script that:

    • Runs at night and creates PAR2 files for the data on your HD.
    • Occasionally verifies file integrity against the PAR2 files.

    With a system like this, you wouldn't have to worry about throwing away old backups for fear that some random bit error might have crept into your newer backups. Also, if you back up the PAR2 files together with your data, as your backup media gradually degrades with time, you could rescue the data and move it to new media before it was too late.

    Of course, at the filesystem level there is always error correction, but having experienced the occasional bit error, I'd like the extra security that having a PAR2 file around would provide. Also, filesystem-level error correction tends to happen silently and not give you any warning until it fails and your data is gone. So a user-level, user-adjustable redundancy feature that's portable across filesystems and uses a standard file format like PAR would be really useful.

  13. what about quickpar and dvdisaster? by MoFoQ · · Score: 4, Informative

    quickpar especially has been in use on usenet/newsgroups for years....o yea...forgot....they are trying to kill it.

    anyways...there's also dvdisaster which now has several ways of "hardening".
    one of them seems to catch my attention: adds error correction data to a CD/DVD (via a disc image/iso)

  14. Re:As I understand it... by Firehed · · Score: 2, Interesting

    From what I've read and heard, ZFS is designed to pretty much be the last filesystem we'll ever need. I'm pretty sure they've considered hash collisions with regards to data integrity.

    Also consider that you probably won't need to reconstruct the entire sector, but only a few bits from it. If there was some sort of insane scenario where you had to reconstruct a complete 1GB block from a single MD5 hash... (ie, "here's an MD5 hash. Give me a sequence of 1073741824 bytes to make it") well it's technically possible, though the electric bill for your server farm may piss off more than a few treehuggers. On the other hand, if you had only a few bytes that needed repair, brute-force reconstruction, while still time-consuming, suddenly becomes more more feasible. I always wonder why I can't apply this kind of logic to torrents with that one file stuck at 99.98%...

    I'm sure that kind of thing is largely irrelevant with ZFS as it's designed to be somewhat more efficient, but you get the point.

    --
    How are sites slashdotted when nobody reads TFAs?
  15. Re:As I understand it... by Pseudonym · · Score: 3, Insightful

    Ok, lets assume its a 128-bit hash. For a 1GB file how many combinations of 1GB will produce the same hash?

    You're asking the wrong question.

    The right question is: Given a 1Gb file, how much "mutation" do you have to do to it to produce a file with the same hash? And the answer to that is: Enough to make the data unrecoverable no matter what you do.

    --
    sub f{($f)=@_;print"$f(q{$f});";}f(q{sub f{($f)=@_;print"$f(q{$f});";}f});
  16. This is not the same thing as PAR ... by DrJimbo · · Score: 4, Informative
    ... even though both TFA and PAR use Reed-Solomon.

    The difference is that TFA interleaves the data so it is robust against sector errors. A bad sector contains bytes from many different data blocks so each data block only loses one byte which is easy to recover from. If you use PAR and encounter a bad sector, you're SOL.

    PAR was designed to solve a different problem and it solves that different problem very well but it wasn't designed to solve the problem that is addressed by TFA. Use PAR to protect against "the occasional bit error" as you suggest, but use the scheme given in TFA to protect against bad sectors.

    --
    We don't see the world as it is, we see it as we are.
    -- Anais Nin
  17. Re:Been doing this from the past 3 decades by Reed+Solomon · · Score: 2, Funny

    Bah, I never make any erorrs

  18. Datarecovery "data". by rew · · Score: 4, Insightful

    Working for a datarecovery company, I know that about half the cases where data is lost the whole drive "disappears". So, bad sectors? You can solve that problem with reed solomon! Fine! But that doesn't replace the need for backups to help you recover from: accidental removal, fire, theft and total disk failure (and probably a few other things I can't come up with right now)... .

  19. Re:they're good for different things by xquark · · Score: 2, Informative

    Your comment is incorrect, RS codes are a subset of BCH codes. In fact BCH codes are a general definition of a class of algebraic codes nothing more. your comment about one being better than the other for a specific purpose is wrong.

    Think of BCH codes as "vehicle" and RS codes as "The Bugatti Veyron" that is the relationship.

    --
    Arash Partow's Philosophy: Be a person who knows what they don't know, and not a person who doesn't know.
  20. Re:You are all dumb as there is only one way. by billcopc · · Score: 3, Interesting

    Reed-Solomon is ancient compared to par2.

    No, you're dumb. Par2 IS Reed-Solomon. Silly me to expect an AC to fact-check the most trivial subjects of a post.

    The procedure explained in TFA is basically adapting a different tool to behave more or less like single-file par2. That makes it redundant (in the /. sense, not the data-recovery sense).

    There is one thing I would love to see, and that's local disk checksumming. That's right, take a 500gb disk, chop it into slices and do RAID-5 on them as if they were individual spindles. It's been years since I've had a hard drive actually die on me, but I've seen bit-errors more often than I'd like. Having self-checking built into the filesystem (or low-level disk access) would help ensure 100% data integrity, and you could still do RAID-1 on top of it for safety.

    --
    -Billco, Fnarg.com
  21. Re:As I understand it... by twistedcubic · · Score: 2, Insightful

    You mean to say injective, though being bijective is sufficient.

  22. Re:You are all dumb as there is only one way. by brix · · Score: 2, Informative

    I do the same as the AC, but I keep a copy of the smallest par2 from the set on my local drive for recovery (and back these up as well). If a CD/DVD ever goes bad to the point it won't even read the FS, you can still create an ISO file of it including all errors. The par2 recovery can be done using the ISO image at that point, and as long as the damage to the DVD didn't exceed the redundancy level, full recovery of the original files is possible.

    Note that you aren't recovering the ISO itself at this point, you are using the ISO as input to par2repair (or the GUI). The recovery is done using the blocks of the original files and pars. The end result is the original file(s) stored on the disc.

    It sounds like dvdisaster does something similar, but I've been using this technique with pars for a few years now.

  23. Re:I wish PAR2 would have kept improving... by Fnord666 · · Score: 2, Informative

    Eventually, I started using ICE ECC, http://www.ice-graphics.com/ICEECC/IndexE.html, free as in beer, to enhance my DVD backups of stuff like photos and data. IIRC, I tested it's ability to reconstruct missing files and it seemed OK at the time.

    Unfortunately this software looks like it is closed source and windows only. A program to apply error correcting codes to your archived files is only useful if you still have a platform to run it on. Hopefully 15 years from now when you go to recover your files you have an old windows machine still available for use.

    --
    'The tyrant will always find pretext for his tyranny.' - Aesop's Fables
  24. Re:I wish PAR2 would have kept improving... by catenx · · Score: 2, Insightful

    The biggest limitation of PAR2 for me is the lack of directory handling. You can only create and verify parchives for the files within a directory. One solution is a script that runs the PAR creation or verification for each subdirectory but this is hardly elegant. Hard to use backup is backup that isn't used. A better solution is what ICE ECC offers.

    Agreeing with Fnord666, the software does not use an open algorithm. The general tone of the site is "use this software it is awesome, don't argue". There doesn't seem to be verification of its awesomeness. Furthermore, the program author's tone in many of the forum posts is abrasive and near combative when people question it.

    PAR2 is proven but limited. This /. post is the closest I've seen to addressing the progression of software past Parchives, or at least enhancing the PAR spec for new needs (directory traversing for example).

    Is this really the case, that no one has taken PAR2 to the next level? Judging from the lack of links in these comments to the flamebaiting posts of "we've been doing this for years" there isn't much progress.

    I want PAR3.