Why RAID 5 Stops Working In 2009
Lally Singh recommends a ZDNet piece predicting the imminent demise of RAID 5, noting that increasing storage and non-decreasing probability of disk failure will collide in a year or so. This reader adds, "Apparently, RAID 6 isn't far behind. I'll keep the ZFS plug short. Go ZFS. There, that was it." "Disk drive capacities double every 18-24 months. We have 1 TB drives now, and in 2009 we'll have 2 TB drives. With a 7-drive RAID 5 disk failure, you'll have 6 remaining 2 TB drives. As the RAID controller is busily reading through those 6 disks to reconstruct the data from the failed drive, it is almost certain it will see an [unrecoverable read error]. So the read fails ... The message 'we can't read this RAID volume' travels up the chain of command until an error message is presented on the screen. 12 TB of your carefully protected — you thought! — data is gone. Oh, you didn't back it up to tape? Bummer!"
If you have one RAID5 box, just build another one that replicates it. Use that for your "hot backup". Then back that up to tape, if you must.
Storage is so cheap these days (especially if you don't need super-fast speeds and can use regular SATA drives), that you might as well just go crazy with mirroring/replicating all your drives all over the place for fault-tolerance and disaster-recovery.
you know the other solution is to not use RAID5 with these big drives, or to go to RAID1, or to actually back up the data you want to save to DVD and accept a disk failure will cost you the rest.
Now, while 1TB onto DVDs seems like quite a chore (and I'll admit it's not trivial), some level of data staging can help out immensely, as well as incrementally backing up files, not trying to actually get a full drive snapshot.
Say you backup like this:
my pictures as of 21oct2008
my documents (except pictures and videos) as of 22 oct2008
etc.
while you will still lose data in a disk failure, your loss can be mitigated, especially if you only try to backup what is important. With digital cameras I would argue that home movies and pictures are the two biggest data consumers that people couldn't backup to a single dvd and that they would be genuinely distressed to lose.
-nB
whois gawk date unzip strip find touch finger mount join nice man top fsck grep eject more yes exit umount sleep dump
The real issue is one that anyone who has ever had to recover a multi-drive array can tell you instantly: if one drive fails, and the other drive was bought at the same time, and has had a nearly identical usage pattern, the odds of the other drive failing are well above average.
I once had a single drive fail in a 24 disk array. The disks were arranged, RAID 5, in groups of 3, glued together by Veritas (from back before it got bought by crappy symantec). By the time the smoke cleared we had replaced 19 out of 24 drives. They had all been bought at the same time, and as they thrashed rebuilding their failed buddies, they started dying themselves. The remaining 5 drives we replaced anyway, just because.
That's a worst case, but multiple failures are far from uncommon, and very few people correctly cycle in new drives periodically to reduce the chance of a mass failure.
ad logicam Claiming a proposition is false because it was presented as the conclusion of a fallacious argument.
I have large RAID 5's and RAID 6's... I generally don't have any RAID columns over 8TB. I HAVE had drive failures. Yes... I'm talking cheapo SATA drives. No... I have not see the problem this article presents. Do I backup critical data? Yes. The only time I lost a column was due to a firmware bug which caused a rebuild to fail. Took awhile to restore from backup, but that was about the extent of the damage. I would call this article FUD... deceptive FUD, but very much FUD.
My observed error rate with about 4TB of storage is much, much lower. I did run a full surface scan every 15 days for two years and did not have a single read error in about two years. (The hardware has since been decomissioned and replace dby 5 RAID6 Arrays with 4TB each.)
So, I did read roughly 100 times 4TB. That is 400TB = 3.2 * 10^15 bits with 0 errors. That does not take into account normal read from the disks, which should be substantially more.
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
I run a raid5 with 1TB disks. Growing the array from 3 to 4 took around 4 hours, 4 to 5 took maybe 8 or 10, 5 to 7 took something like 30 hours I guess.
But that's growing from a previous capacity to a larger capacity.
Using mdadm to fake a failure by removing and adding a single drive, the recover time generally was 4-5 hours.
I've got a mainframe circa 1984 that's been using the same type of drive since 1989. Last year we pulled all the year-end financial numbers off the yearly backups dating back to that point. Zero failed tapes.
Consumer-grade CDs and DVDs use a photosensitive dye to record information. It can degrade in anywhere between 2 to 5 years...Longer if you keep it in a cool dark place, but not 20 years.
ad logicam Claiming a proposition is false because it was presented as the conclusion of a fallacious argument.
Redundancy... You keep using that word. I do not think it means what you think it means.
RAID 0, psudo-ironically, is not redundant at all. RAID 1, often called mirroring, are the arrays that are redundant.
Wow. I love your FUD. If you're going to lie, at least make it seem truthful.
Lacking in file system utilities (yes, fsck IS necessary even on healthy filesystems, especially on desktops and portables)
Why no fsck? And if you really feel the need to do something:
License-incompatible with anything worth running it on, other than Solaris itself... which is NOT worth running (see #1 above)
What you mean to say is "Some Operating Systems whose merits can be debated are license incompatible with the license of ZFS." FreeBSD can implement ZFS. Why can't Linux? Because of its license, not that of ZFS.
"Nature doesn't care how smart you are. You can still be wrong." - Richard Feynman
I worked at Maxtor up till 2006, and had the privilege of being able to play with several raid controllers, and that coincidently is how I got started with Linux at home (software RAID). At the time, and mind you I only had 160 GB and 250 GB drives to play with, I build a number of raid-5 arrays up to 2 TB. When people think about RAID failure, they generally think about a hardware failure - a sector that can't be read etc. That is only the "obvious" problems. Even under ideal conditions, the 1e15 - 1e17 error rates published by the disk drive vendors also includes data errors that ARE NOT detected in hardware. It does not take a sector read failure to generate a data miscompare. What I found back in '06, is that with a 2TB Raid5 made up of 8 drives, there was about a 10% probability of a RAID data failure every time the raid array was read, sector, by sector for the entire 2TB span. That implies that in the event of a real disk failure, there was about a 10% probability that the rebuild would fail because of an otherwise undetected data read error. I am not sure where state of the art is with Linux Software RAID, and perhaps the "scrub" operation mentioned above does the trick, but the biggest failing in RAID systems I have used, is that when a data error occurs, the algorithms don't/didn't calculate the missing block, and write it back to the failing device giving it a chance to push off the sector in error. Most disk drives can "heal" with most of the common problems in a RAID system. Whats missing is back ground grooming that deals with a missing data slice, and gives the device the chance to recover from it, while alerting the admin that a problem was "handled". Its not the 3%/year hard disk failure we should be worried about - its corrected error rate. 1e15 is very unforgiving when you are talking about terabytes... As long as RAID doesn't do the "right thing" and try to recapture the missing data, RAID-5 is in trouble.
Nothing evolves faster than the word of god in the minds of men who think themselves divinely inspired.
"Safe" production data ...with nightly replays/screenshots ...
Exactly. You make backups, no matter what. Anyone that relies on RAID for backups will get what they deserve, sooner than later.
RAID and SANs are for uptime (reliability) and/or performance. SANs with snapshots and RAID with backups are for data recovery.
The cesspool just got a check and balance.
I have to echo this comment. RAID is not a backup. It is a form of redundancy. Nothing is stopping that system from losing two drives and completely losing your data. RAID simply allows you to keep working after a SINGLE disk failure. If you're not making backups of your critical data and relying on RAID to save your behind, you're insane.
How reliable RAID5 is depends, because actually the more disks you have, the greater the likelihood that one of them will fail in any set period of time. So obviously if you have a RAID 0 of lots of disks, then there is a much better chance that the RAID will fail than that any particular disk will fail.
So the purpose of RAID5 is not so much to make it orders of magnitude more reliable than just having a single disk, but rather to mitigate the increased risk that would come from having a RAID0. So you'd have to calculate, for the number of disks and the failure rate of any particular drive, what are the chances of having 2 drives fail at the same time (given a certain response rate to drive failure). If you have enough drives and a slow enough response to disk failures, it's at least theoretically possible (I haven't done the math) that a single drive is safer.
Well, Windows does. Taking a snapshot of NTFS, even on a heavily used 1TB+ file server, takes only a few seconds, and under normal operation the file system is still fast.
NTFS is actually a pretty good file system. It's probably because it was originally designed by IBM.
- It's not the Macs I hate. It's Digg users. -
"This time?"
Ah, I see you've never read "Song of Songs"
I have CD-Rs dating back to 1994 or 1995 that are just fine -- and they're off-brand media too. "Good" media was $12 to $20 per CD then, and "cheap" media was $7.00 per CD.
I have DVD-Rs dating back to 2002 or 2003 -- again, just fine.
While it's good to be cautious, some in here are crying wolf regarding optical media.
The Christian Right is Neither (Christian nor right). See: Matthew 23, Matthew 25, Ezekiel 16:48-50
Sadly no. I have a ton of things to back up at home and just use Bacula with a ton of DVD-RWs. It's not really ideal. I keep scouring eBay and Craig's List for an LTO1 or 2 drive but I haven't had any luck getting something under a thousand dollars. I've looked at S3, rsync.net, and a few others, but they're all way too expensive for me.
That doesn't work for me. Try
hell, if you want to lose data, you've gotta at LEAST use dd. rm is just removing file handles, all your data is fine, you just cant access it. run
(or whatever disk you want to lose) and then see how many data recovery places will turn you away. the level of data recovery available to the public is pretty crappy, there's a guy offering a reasonably big prize to any data recovery company (or anyone at all i guess) who can recover data from a disk he zero'd with dd and hasnt had any takers yet. i wish i could find the link
External TB drives are around $150 bucks. Buy several. Make rotating copies. It's doable on your budget. (We're in the same boat, btw, and that was our solution for the dev machines)
However, the real issue is your employer has decided on the budget, and what you do with it is how well you're protected. Sometimes we don't get a Fibre NAS with remote backup, no matter how much we want it. Sometimes we have to get by with the old rsync, dd, or pure copy or even tar/zip with rotating media. (Anything less is suicide)
The cesspool just got a check and balance.
The problem is that the capacity has been growing faster than the transfer-bandwith. Thus it takes a longer and longer time to read (or write) a complete disk. This gives a larger window for double-failure.
No, the point is that (statistically) you can't actually read all of the data without having another read error (statistically speaking).
Whether you read it all at 100MB/sec or 10MB/sec (ie: how long it takes) is irrelevant (within reason). The problem is that published URE rates are such that you "will" have at least one during the rebuild (because of the amount of data).
The solution, as outlined by a few other posters, are more intelligent RAID5 implementations that don't take an entire disk offline just because of a single sector read error (some already act like this, most don't).
"Unrecoverable" implies that it is not possible to read the data anymore.
Also, data on the disk is addressed by sectors, so if one fails, this means you typically have at least 512 bytes lost.
It's true that even that might not completely break some kind of large media file, but you have to remember that RAID5 is a layer below your file system data, so if an error occurs when its trying to rebuild itself, it will not be able to give you your data back.
You might be able to recover a lot of your data from an error of this kind, but don't count on the RAID implementation to do it for you.