Error-Proofing Data With Reed-Solomon Codes
ttsiod recommends a blog entry in which he details steps to apply Reed-Solomon codes to harden data against errors in storage media. Quoting: "The way storage quality has been nose-diving in the last years, you'll inevitably end up losing data because of bad sectors. Backing up, using RAID and version control repositories are some of the methods used to cope; here's another that can help prevent data loss in the face of bad sectors: Hardening your files with Reed-Solomon codes. It is a software-only method, and it has saved me from a lot of grief..."
slow news day anyone?
Arash Partow's Philosophy: Be a person who knows what they don't know, and not a person who doesn't know.
salkdffalkfhwefh2ihr5j45!"Â5jkcq2%"45wceh5 234j5cja4h5c2q4x524qZTkzzj3kzg3qkgl3kzgq3kjgh kq3gkzlq3hwgjlh 34qlgch34ljkw93q0x45c45 #&%#%&5vcXÂ%YXCHGC%ub64bVE5&UBy4vy5yc5E&Â E%vu64EV46rcuw4&C/4w6
Swedish plasma phys. PhD student; MSc EE; knows maths, programming, electronics; finance interest; seeks opportunities
... at least CDROMs employ RS codes.
In Liberty, Rene
it already exists - QPAR
Uh, is this not one of the main features of the ZFS file system? It does a checksum on every block written and will reconstruct the data if an error is found? (assuming you are using either raid-z or mirroring. Otherwise it will just tell you that you had an error).
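For anyone who wants to see the idea rather than take it on faith, here is a rough sketch of "checksum every block, repair from the mirror". This is not ZFS code; the names checksum and read_with_repair are made up for illustration, and SHA-256 is just one hash choice assumed here.

```python
import hashlib

def checksum(block):
    """Hash stored separately from the block it protects."""
    return hashlib.sha256(block).digest()

def read_with_repair(copies, expected):
    """Return the first copy whose hash matches and overwrite any bad copies."""
    good = next((bytes(c) for c in copies if checksum(bytes(c)) == expected), None)
    if good is None:
        # No redundancy left: the error is detected but cannot be corrected.
        raise IOError("all copies fail the checksum")
    for c in copies:
        if bytes(c) != good:
            c[:] = good  # self-heal the damaged copy in place
    return good

block = b"important data"
stored_sum = checksum(block)
mirror = [bytearray(block), bytearray(block)]
mirror[0][3] ^= 0x01  # simulate a flipped bit on one side of the mirror
recovered = read_with_repair(mirror, stored_sum)
```

With a single copy the same function can only raise, which matches the "it will just tell you that you had an error" case.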
It would be nice if it was a file system layer like encFS but for error-correction.
I've been burned by scratched DVD+Rs too many times. I'd be interested if there were a way to do this kind of thing in Windows..
I am the maverick of Slashdot
Just use a modern versioning system, such as Git or Mercurial, which keep track of everything using hashes. Then, you not only get to detect and repair errors, you also get version history.
"The way storage quality has been nose-diving in the last years" I disagree totally: all my modern drives are working, whereas I used to be plagued by hard disk failures. For example, are any modern drives as bad as the Deathstar? In the past 5 years I have had 0 failures, but in the 5 preceding that I had about 8 drive failures. Small sample size, I know, but to me hard drives are getting better.
When he said "harden files", I thought he was going into a long soliloquy on all the porn on his computer, so I went to the next story.
Cool! Amazing Toys.
It's like a code that can correct errors!
Look, if it's secret, one copy is too many. For everything else, gmail it to five separate recipients. It's not like Google has ever lost any of the millions of emails I've received to date. (This is not a complaint -- they don't show me the spam unless I ask for it).
And if they ever did lose an email, well, to paraphrase an old Doritos commercial, "They'll make more."
Seriously, personally I view the persistence of data as a problem. It's harder to let go of than it is to keep.
Help stamp out iliturcy.
Yeah, I can spell it. Get a libe.
Help stamp out iliturcy.
My question is of speed; this seems a promising addition to anyone's backup routine. However, most folks I know have 100s of gigs of data to back up. While differentials could be involved, right now tar'ing to tape works fast enough that the backup is done before the first staff shows up for work.
I assume we're beating the hell out of the processor here; so I'm wondering how painful is this in terms of speed?
Mod me down with all of your hatred and your journey towards the dark side will be complete!
Please, please stop thinking of version control as some sort of backup. When we initially started mandating the use of version control software, developers would just use the "commit" button instead of the "save" button. It makes it *much* more difficult to traverse through the repo when you have three dozen commits per day, per developer, each commented with "ok. really should be fixed now." The worst offenders were issued an Etch A Sketch for a week while their notebooks went in for service *cough*. Problem solved.
Entrepreneur : (noun), French for "unemployed"
Whenever you back up to CD or DVD, fill up any unused remaining space with par files generated from the data being backed up.
Reed-Solomon is ancient compared to par2.
If you think storage quality has been nose-diving, then you haven't been around very long. It just isn't so.. and there really is not much more I can say to add to that.
I have been around this industry quite a while, and I call bullshit on that.
"... data integrity MUST be an operating/file system service."
I agree. I'm willing to have a small loss in speed and a small increase in price to have better data integrity.
There is already data integrity technology embedded in hard drives, and I support making it more robust.
since it is only a "snapshot" of the data at a particular time. Any time you change the data, you have to do another "snapshot". What a major pain in the ass.
This might be useful for archived files, but not something you change on a regular basis.
Yes, CDs and DVDs have error correction built in, but it doesn't do much if you happen to get a nice scratch that follows the spin of the disk. I.e. a moderate scratch from the outside to the inside of a CD is reasonably OK for data, but a scratch the other way will kill your data much more easily.
For a while I was using PAR2, yes, the PAR2 used on USENET, to beef up the safety of my DVD backups of my home data. Unfortunately, PAR2 never really evolved to handle subdirectories properly, which mattered when I wanted an off-site backup of my digital photos.
Eventually, I started using ICE ECC, http://www.ice-graphics.com/ICEECC/IndexE.html, free as in beer, to enhance my DVD backups of stuff like photos and data. IIRC, I tested its ability to reconstruct missing files and it seemed OK at the time.
Anyways, that's my $0.02 on Reed-Solomon for backups.
TFA introduces some new ".shielded" file format. But do we need yet another file format when PAR (Parchive) has been doing the same job for years now? The PAR2 format is standardized and well-supported cross-platform, and might just have a future even IF you believe that Usenet is dying...
I always thought it would be cool to have a script that:
With a system like this, you wouldn't have to worry about throwing away old backups for fear that some random bit error might have crept into your newer backups. Also, if you back up the PAR2 files together with your data, as your backup media gradually degrades with time, you could rescue the data and move it to new media before it was too late.
Of course, at the filesystem level there is always error correction, but having experienced the occasional bit error, I'd like the extra security that having a PAR2 file around would provide. Also, filesystem-level error correction tends to happen silently and not give you any warning until it fails and your data is gone. So a user-level, user-adjustable redundancy feature that's portable across filesystems and uses a standard file format like PAR would be really useful.
Doesn't par2 already employ reed-solomon? (http://en.wikipedia.org/wiki/Parchive)
And it has all sorts of options that let you configure the amount of redundancy you'd like?
And it has (ahem) been very well tested in the recovery of incomplete binary archives ... ?
Now that usenet has been stripped of binaries, we'll have to find other uses for these tools ....
Why aren't you encrypting your e-mail?
ZFS checksums are actually hashes, as in "cryptographic hash", so they're pretty damn reliable (though theoretically 100% reliable) at detecting errors.
HAND.
quickpar especially has been in use on usenet/newsgroups for years....o yea...forgot....they are trying to kill it.
anyways...there's also dvdisaster which now has several ways of "hardening".
one of them seems to catch my attention: adds error correction data to a CD/DVD (via a disc image/iso)
I'm glad it's not just me thinking my drives are dying sooner than they once did.
Why is storage quality going down, and what does that mean for that 1TB drive for $200? Will its lifespan exceed two years?
http://parchive.sourceforge.net/
Who is storing critical files without error correction and a checksummed file system?
Or in other words, ECC + ZFS.
You can't complain about Data Loss, if you're running cheap desktop hardware on NTFS.
Excellent, I agree. But generation of Par2 files should be automatic. I don't mind having only 3.5 gigabytes on a DVD for data if the Par2 files are generated and tested automatically.
Par2 is apparently Reed-Solomon done in a more helpful way.
Quote from the Parity Volume Set Specification 2.0: "PAR 2.0 uses a 16-bit Reed-Solomon code and can support 32768 blocks."
RAID6 uses Reed-Solomon error correction. In fact, RAID5 can be viewed as a special case of RAID6.
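To make the "special case" remark concrete: RAID5's single parity block is a plain XOR across the data blocks, and RAID6 keeps that XOR parity while adding a second, Reed-Solomon-style syndrome. A minimal sketch of the XOR half only, with toy block contents and a made-up helper name xor_blocks:

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length blocks together, one byte column at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

data = [b"AAAA", b"BBBB", b"CCCC"]   # three data "disks"
parity = xor_blocks(data)            # the RAID5 parity "disk"

# Disk 1 dies: XOR of the survivors and the parity gives it back.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
```

This recovers exactly one missing block whose position is known; surviving a second loss is what the extra RAID6 syndrome is for.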
This thing looks like a solution in search of a problem. Slow news day?
___
If you think big enough, you'll never have to do it.
The difference is that TFA interleaves the data so it is robust against sector errors. A bad sector contains bytes from many different data blocks so each data block only loses one byte which is easy to recover from. If you use PAR and encounter a bad sector, you're SOL.
PAR was designed to solve a different problem and it solves that different problem very well but it wasn't designed to solve the problem that is addressed by TFA. Use PAR to protect against "the occasional bit error" as you suggest, but use the scheme given in TFA to protect against bad sectors.
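A toy version of the interleaving argument, for the skeptical. The block size, sector size and layout below are assumptions chosen to keep the numbers small; they are not TFA's actual on-disk format.

```python
def interleave(blocks):
    """Write byte 0 of every block, then byte 1 of every block, and so on."""
    return bytes(b for column in zip(*blocks) for b in column)

def deinterleave(stream, n_blocks):
    """Undo interleave(): block i owns every n_blocks-th byte starting at i."""
    return [stream[i::n_blocks] for i in range(n_blocks)]

SECTOR = 8                                        # toy sector size in bytes
blocks = [bytes([i]) * 16 for i in range(1, 9)]   # 8 code blocks of 16 bytes
stream = bytearray(interleave(blocks))

stream[0:SECTOR] = bytes(SECTOR)                  # one bad sector, zeroed out

damaged = deinterleave(bytes(stream), len(blocks))
errors_per_block = [
    sum(x != y for x, y in zip(original, got))
    for original, got in zip(blocks, damaged)
]
```

Here the whole lost sector costs each code block a single byte, which is well within what a Reed-Solomon block can correct. Without the interleave, the same bad sector would wipe 8 consecutive bytes out of one block.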
We don't see the world as it is, we see it as we are.
-- Anais Nin
These codes, http://en.wikipedia.org/wiki/BCH_code , are far superior. However, both Miller code and these pale in comparison to Low Density Parity Check codes. http://en.wikipedia.org/wiki/Low-density_parity-check_code
Yes, this has been done forever etc. but has anyone experienced any ugly "bit rot"? I mean, I've had firewalls that would checksum applications and if it ever complained about surprise changes I didn't catch it. Equally I have about 100GB for which I have CSVs - no spontaneous corruption to note. Source code should very easily fail to compile if a random bit was flipped, and I can't think of any case of that either. I guess if it's that important having a PAR file with some recovery data won't hurt, but I'd take RAID + backups first any day.
Live today, because you never know what tomorrow brings
There is already the pretty mature and fairly widely used par2 application that already does this.
It's handy for downloading binaries off usenet, where you might lose a few parts.
(Say, only be able to download 18 of the 20 files)
I've also used it to success when burning files to cdr, that have later become corrupted beyond what the cd error correction could handle.
Much recommended!
http://en.wikipedia.org/wiki/Parchive
This site has really great info. I love finding out about all these codes. Thanks
Jay
Cyber Monday
Channel noise can be overcome via increased redundancy in transmission/storage, thereby reducing the effective transfer rate/storage density. Film at 11.
I could be wrong, but I'm pretty sure this is why we have on-disk (and on-bus) checksums and ECC RAM. And frankly if your mission-critical data is being ruined by DVD scratches, adding RS codes to your DVDs is probably not going to solve the fundamental problem of system administrator incompetence.
/ Seriously, these days Fark has more technically competent and interesting articles than /.
Working for a data recovery company, I know that in about half the cases where data is lost the whole drive "disappears". So, bad sectors? You can solve that problem with Reed-Solomon! Fine! But that doesn't replace the need for backups to help you recover from: accidental removal, fire, theft and total disk failure (and probably a few other things I can't come up with right now)...
Australia Post implemented the Royal Mail's 4 state barcoding system for all bulk and pre-sorted mail categories. The barcode incorporates RS and greatly improves the scan rate of damaged mail. The RM4SCC was adopted throughout the US and Canada.
Task Mangler
That'll teach me to leave out a "not". Of course, I meant "theoretically NOT 100% reliable". :)
The odds of collisions against a given fixed hash (which a hash for a data block is) of course depend on the method, but they are minuscule -- probably less than random bit flips on the bus or in RAM. Has anyone even ever found a single example of a SHA256 collision?
Even so, you can *NEVER* be absolutely 100% sure that the data is what you wrote. Even a two-way RAID1 doesn't get you there since you could (theoretically) have identical errors on both drives. Increasing it to a three-way RAID1 with a majority vote (or even just outright declaring an unrecoverable error when a mismatch is found) gets you closer to 100%, but errors are still theoretically possible.
So the point is: You can never attain 100%, but how close to 100% do you want to get? For me, ZFS hashes are "good enough".
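The three-way vote is easy to sketch. The following is an illustration only, with a hypothetical majority_read helper, not how any particular RAID implementation does it:

```python
from collections import Counter

def majority_read(copies):
    """Byte-wise majority vote; raises if no value wins a strict majority."""
    out = bytearray()
    for position, column in enumerate(zip(*copies)):
        value, votes = Counter(column).most_common(1)[0]
        if votes * 2 <= len(copies):
            raise ValueError(f"no majority at byte {position}")
        out.append(value)
    return bytes(out)

good = b"payload"
a = bytearray(good); a[0] ^= 0xFF   # copy A has an error in byte 0
b = bytearray(good); b[5] ^= 0x10   # copy B has a different error in byte 5
voted = majority_read([bytes(a), bytes(b), good])
```

Independent errors get outvoted, but if two copies carry the identical error the vote happily returns the wrong byte, which is the "still theoretically possible" case.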
HAND.
Mod parent, grandparent, and great grandparent way up please - this is the most significant thread in the comments so far.
o/~ Join us now and share the software
BCH is good for correcting large numbers of small errors. RS is good for correcting "bursty" errors.
You want to use BCH in media where bits bear no relation to each other, like in NAND where a cell contains only 1 or 2 bits, and adjacent cells are unaffected.
RS is better on things like hard drives where a flaw in the media is likely to produce longer runs of errors in a row. Two sequential bits on a hard drive are interdependent.
Also, the whole article is dumb. First, hard drives don't appear to be getting worse lately. Additionally, every mass storage device you use (hard drives or NAND flash) already uses error correction. Additionally, SATA uses error correction over the bus, and PCIe does too. If your machine has ECC RAM (most don't) and a SATA storage interface, then your data is already covered by ECC all the way from the storage into memory.
So why add more?
http://lkml.org/lkml/2005/8/20/95
The problem with this method is that drives already store data with this or similar techniques.
When you get a failure you're likely to lose a sector which is far larger than this scheme can save!
I'm sorry, but this is stupid. Error correction is done at the level of the disk controller. You gain nothing by re-doing it at the level of the file system. You only get file-system level errors when you don't pay attention to the disk controller telling you that the disk is going bad and wait for the disk to degrade to the point where errors can't be corrected anymore.
Install one of the many utilities that monitor disk health and replace your disk when they tell you there's a problem with your disk.
They are better for dispersing data over storage media than using error correction codes. In essence, an IDA transforms a file into k blocks of data, and any n (n < k) blocks suffice to reconstruct the file. (It doesn't matter which blocks you choose for reconstruction; as long as you have n different blocks, you're fine.)
Unfortunately there don't seem to be many tools or libraries available, so you have to implement the IDA yourself and this requires a bit of math.
so when is this coming as a block device so its not clumsy as hell?
Forward error correction using Vandermonde matrices does this quite nicely. There are N-K codes that allow K blocks to be encoded in N blocks (N>K) so that any K of the N can be used to decode. Thus you can lose N-K blocks. For blocks, read tracks, or sectors, or whatever unit is typically lost in a media failure.
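A small sketch of the "any K of N" property, under simplifying assumptions: arithmetic is done modulo the prime 257 so the code stays readable (real implementations normally work in GF(2^8) or GF(2^16) so every share fits in whole bytes), and the function names encode and decode are invented for this example.

```python
P = 257  # smallest prime above 255, so every byte value is a field element

def encode(data, n):
    """Multiply the n x k Vandermonde matrix [x**j] by the data vector, mod P."""
    return [(x, sum(d * pow(x, j, P) for j, d in enumerate(data)) % P)
            for x in range(1, n + 1)]

def decode(shares, k):
    """Recover the k data symbols from any k shares by Gauss-Jordan elimination."""
    if len(shares) < k:
        raise ValueError("need at least k shares")
    # Augmented matrix: one Vandermonde row per surviving share.
    m = [[pow(x, j, P) for j in range(k)] + [y] for x, y in shares[:k]]
    for col in range(k):
        # Distinct x values make the matrix invertible, so a pivot always exists.
        pivot = next(r for r in range(col, k) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        inv = pow(m[col][col], P - 2, P)  # modular inverse via Fermat's little theorem
        m[col] = [v * inv % P for v in m[col]]
        for r in range(k):
            if r != col and m[r][col]:
                factor = m[r][col]
                m[r] = [(a - factor * b) % P for a, b in zip(m[r], m[col])]
    return [row[k] for row in m]

data = list(b"RS!!")                  # K = 4 data symbols
shares = encode(data, 7)              # N = 7 shares, so any 3 may be lost
survivors = [shares[6], shares[1], shares[4], shares[2]]
recovered = bytes(decode(survivors, 4))
```

Note this handles erasures, where you know which shares are missing. Correcting errors at unknown positions takes the full Reed-Solomon decoding machinery.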
.. at least CDROMs employ RS codes.
RS codes are good at correcting randomly scattered bit errors. The error mode in CDs is missing chunks (e.g. scratches). So, they use a mechanism which scatters bits around. When a scratch (correlated errors) is descattered, the errors become randomly scattered, so the RS codes can do their job.
SJW n. One who posts facts.
There is an entire field of study related to this topic, but, in short, Reed-Solomon codes are not currently the state of the art. There are much more efficient iterative codes (e.g. Low-Density Parity-Check codes) and there are also rateless codes for a more incremental protection. At any rate, the right place to use these is probably at the hardware level ... even if efficiency is not an issue, they tend to require a fair amount of redundancy.
1. Error reporting paths are not reliable (in fact they're notoriously terrible). Drive failure prediction is not reliable.
2. The data path is not reliable. That includes media, drive controller, firmware, buffers, cabling, controller, host RAM, and other host subsystems. Cosmic rays. Whatever might flip a bit anywhere in your system.
3. Given the foregoing, ZFS detects ALL errors between media and application, and (if redundancy is available) corrects them.
Traditional RAID doesn't even know which side of a mirror is the good side, if they don't match. Some RAID systems do checksumming but this doesn't protect you against errors in the rest of the datapath. And isolated storage subsystems like NetApp, no matter how sophisticated, also do not protect the whole, longer, more vulnerable datapath.
ZFS is unique in doing that, by design. Join the mailing list or study the material at opensolaris.org. It may change the way you think about storage.
you had me at #!
No filesystem that's not a complete toy should allow files to become corrupted anyway. If your filesystem IS a complete toy (You know who you are) perhaps you should consider a filesystem that doesn't allow your files to become corrupted. If your OS is a complete toy (Again, you know who you are) and doesn't support a filesystem that's not a complete toy, then perhaps you should consider a different operating system.
I'm trying to teach myself to set people on fire with my mind... Is it hot in here?
they want their novel ideas back
Sun's new file system ZFS uses end-to-end checksums. Now that we have multi-core computers and maybe more processing than we need, it makes sense to do this. Sun's software is free and open source. Apple is just starting to use it in Mac OS X now.
I deeply recommend: "Introduction to Finite Fields and their Applications" Rudolf Lidl, Harald Niederreiter
A reed-solomon code isn't really a good way to do this at the highest-level, nor is it very useful for correcting errors at the highest level. Associating an ECC code with a regular file is an exercise in futility. ECC codes have a very small collision space for the bits they use, making them only suitable for error spaces which are well characterized. In other words, ECC works great when matched to a particular media, at the lowest level (e.g. hard drive or flash firmware, or radio, etc), and is horrible when used anywhere else.
Non-ECC codes are best used to detect high-level errors. These have much larger collision spaces. You are much better off doing, say, a (tar|md5) check on a snapshot as a means of high-level validation that the information has not changed. Ultimately what you would really like to do is have the OS track a check code at the highest level (the VNOPS) and then store it independent of the filesystem.
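A rough equivalent of the (tar|md5) check, sketched without tar. SHA-256 is swapped in for md5 here, and tree_digest is a made-up helper name; the point is only that one digest covers file names and contents of the whole snapshot.

```python
import hashlib
import os

def tree_digest(root):
    """One digest over every file path and file body under root, in sorted order."""
    h = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # make the traversal order deterministic
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            h.update(os.path.relpath(path, root).encode() + b"\0")
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            h.update(b"\0")
    return h.hexdigest()
```

Store the digest somewhere other than the filesystem being checked, then recompute and compare later. It detects a change but cannot repair one.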
ZFS does a good job detecting errors below the ZFS filesystem layer itself, but can't detect errors made at or above the filesystem layer... for example, due to bugs in the OS itself. Remember it isn't just the filesystem which must be bug-free, the OS has to be too. So ZFS is really good at detecting media errors, but no matter how good it is you still couldn't trust your data to a single logical ZFS mount. You still need off-site backups and replication, and if you use ZFS you wind up with a severe double-multiplication of storage requirements. As much as I like ZFS (and really like its ability to detect lost writes), it winds up being a very, very heavy-weight solution.
-Matt
All disks already do this, you dumb fuck. Another linux nutcase copying something that's already been done - FOR DECADES !! You are one stupid mofo, and the stupid ass losers who think "oooooo, great idea!" are only a little less of one.
I guess in your world, vendors never cheap out, cables are always perfect and files never go over the net. If only there weren't so many obnoxious cowards there...