Why RAID 5 Stops Working In 2009

Posted by kdawson on Tuesday October 21, 2008 @11:03AM from the back-'em-up-rawhide dept.

Lally Singh recommends a ZDNet piece predicting the imminent demise of RAID 5, noting that increasing storage and non-decreasing probability of disk failure will collide in a year or so. This reader adds, "Apparently, RAID 6 isn't far behind. I'll keep the ZFS plug short. Go ZFS. There, that was it." "Disk drive capacities double every 18-24 months. We have 1 TB drives now, and in 2009 we'll have 2 TB drives. With a 7-drive RAID 5 disk failure, you'll have 6 remaining 2 TB drives. As the RAID controller is busily reading through those 6 disks to reconstruct the data from the failed drive, it is almost certain it will see an [unrecoverable read error]. So the read fails ... The message 'we can't read this RAID volume' travels up the chain of command until an error message is presented on the screen. 12 TB of your carefully protected — you thought! — data is gone. Oh, you didn't back it up to tape? Bummer!"
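
The "almost certain" claim can be sanity-checked with a little arithmetic. The sketch below is not from the summary: it assumes an unrecoverable read error rate of 1 in 10^14 bits (a figure often quoted for consumer drives, used here purely as an assumption) and that bit errors are independent. The function name is made up for illustration.

```python
import math

def p_unrecoverable_read_error(bytes_read, ure_per_bit=1e-14):
    """Chance of at least one unrecoverable read error while reading bytes_read.

    ure_per_bit is an ASSUMED error rate (1 in 10^14 bits); it is not given
    in the summary. Bit errors are assumed to be independent.
    """
    bits = bytes_read * 8
    # 1 - (1 - p)^bits, computed with log1p/expm1 to avoid rounding trouble
    return -math.expm1(bits * math.log1p(-ure_per_bit))

remaining_drives = 6          # survivors in the 7-drive array from the summary
drive_bytes = 2 * 10**12      # 2 TB per drive, decimal terabytes
p = p_unrecoverable_read_error(remaining_drives * drive_bytes)
print(f"Chance of hitting a read error during the rebuild: {p:.2f}")
```

Under these assumptions the result is about 0.62: well short of certainty, but far too high for comfort. A drive rated at 1 in 10^15 bits would bring the figure down considerably.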

Carefully protected? by Whiney+Mac+Fanboy · 2008-10-21 11:03 · Score: 5, Insightful

12 TB of your carefully protected -- you thought! -- data is gone. Oh, you didn't back it up to tape? Bummer!
If it wasn't backed up to an offsite location, then it wasn't carefully protected.

--
There are shills on slashdot. Apparently, I'm one of them.
1. Re:Carefully protected? by SatanicPuppy · 2008-10-21 11:16 · Score: 5, Insightful
  
  Yea, because we all backup 12TB of home data to an offsite location. Mine is my private evil island, and I've bioengineered flying death monkeys to carry the tapes for me. They make 11 trips a day. I'm hoping for 12 trips with the next generation of monkeys, but they're starting to want coffee breaks.
  I'm sorry, but I'm getting seriously tired of people looking down from the pedestal of how it "ought" to be done, how you do it at work, how you would do it if you had 20k to blow on a backup solution, and trying to apply that to the home user. Even the tape comment in the summary is horseshit, because even exceptionally savvy home users are not going to pay for a tape drive and enough tapes to archive serious data, much less handle shipping the backups offsite professionally.
  This is serious news. As it stands, the home user that actually sets up a RAID 5 array is in the top percentile for actually giving a crap about home data. Once that becomes a non-issue, then the point has come when a reasonable backup is out of reach of 99% of private individuals. This, at the same time as more and more people are actually needing a decent solution.
  
  --
  ad logicam Claiming a proposition is false because it was presented as the conclusion of a fallacious argument.
2. Re:Carefully protected? by sholsinger · 2008-10-21 11:49 · Score: 5, Funny
  
  Next they'll want to unionize. At that point you've lost everything.
3. Re:Carefully protected? by WhatAmIDoingHere · 2008-10-21 12:16 · Score: 5, Insightful
  
  RAID is NOT a back-up solution. RAID is a "oh shit my hard drive failed" solution.
  
  --
  Not a Twitter sockpuppet... but I wish I was.
4. Re:Carefully protected? by Facegarden · 2008-10-21 12:18 · Score: 5, Funny
  
  Buying a computer system you cannot afford to properly use is crazy. Yes, some people are crazy, and those crazy people are going to lose data, but there's no sense in defending it.
  Well, i guess i'm crazy, i have 3TB of space on my home PC, and no way to back it all up offsite. I do have some important folders from one drive automatically copy to another drive periodically, so if one drive dies the other will be okay, but if i lose them both or the place burns down or i get a nasty virus, it's all going to hell.
  Most of my space is taken up by pirated... err... backed up... HD movies. And porn, lots of porn.
  Either way, i'm not too worried if i lose that, it's just the things i back up i really care about.
  The thing is, i was going to RAID 3 of the drives into a secure 1TB array, but now i hear all these issues with RAID and i worry that it may be WORSE than just copying over the files periodically. I want a DROBO but those are expensive as hell.
  This article has inspired me to look into Tape Backup but i worry that it's not cost effective (i haven't looked yet).
  I should fill up some tapes with a few hundred gigs of porn, write "confidential" on them, and stash them in a bag, under some bush, across the street from HP near my apartment. I'm sure some curious person would come looking, only to discover their contents and wonder why the hell someone went to all that trouble....
  God i'm strange.
  -Taylor
  
  --
  Worldwide Military budgets: $2100 billion. Worldwide Space Exploration budgets: $38 billion. Really, world? Really?
5. Re:Carefully protected? by binarylarry · 2008-10-21 12:43 · Score: 5, Funny
  
  That's why serious IT people use Fedex.
  
  --
  Mod me down, my New Earth Global Warmingist friends!
6. Re:Carefully protected? by jaxtherat · 2008-10-21 14:41 · Score: 5, Insightful
  
  I love how you use the language "get what they deserve".
  What about my situation, where I have to store ~ 1TB of unique data per office in 3 offices that are roughly 1000 km apart and I have to keep everything backed up with a budget of less than ~AU$ 4000 IN TOTAL?
  I have to run 4 x 1TB RAID arrays on the file servers and use rsync to synchronise all the data between the offices nightly, "effectively" doing offsites, and have a 3 TB Linux NAS (also using RAID 5) for incrementals at the main site.
  That is all I can afford, and I feel that I'm doing my best for my employer given my budget and still maintaining my professional integrity as a sysad.
  Why do I "get what they deserve" when I can't afford the necessary LTO4 drives, servers and tapes (I worked it out I'd need ~ AU$ 30,000) to do it any other way?
  
  --
  http://www.zombieapocalypse.tv/
7. Re:Carefully protected? by camperdave · 2008-10-21 14:50 · Score: 5, Funny
  
  Keep in mind that Flash cells are memory arrays and as such are susceptible to ionizing radiation that can and will flip bits.
  
  That's okay. We'll just gang them together in a RAID 5 configuration.
  
  --
  When our name is on the back of your car, we're behind you all the way!
8. Re:Carefully protected? by ajkst1 · 2008-10-21 15:03 · Score: 5, Informative
  
  I have to echo this comment. RAID is not a backup. It is a form of redundancy. Nothing is stopping that system from losing two drives and completely losing your data. RAID simply allows you to keep working after a SINGLE disk failure. If you're not making backups of your critical data and relying on RAID to save your behind, you're insane.
9. Re:Carefully protected? by darkpixel2k · 2008-10-21 15:59 · Score: 5, Funny
  
  The company BOTH cares about their data AND can't afford a proper backup system.
  In this case, linux has one last resort for you:
  sudo apt-get install bible
  
  darkpixel@hoth:~$ bible
  bible: Debian/BRS Release 4.18, $Date: 2005/01/23 11:29:22 $
  Hit '?' for help.
  
  -snip-
  
  bible(KJV) [Gen1:1]> ec3:6
  
  Ecclesiastes 3
  
  6 A time to get, and a time to lose; a time to keep, and a time to cast away;
  bible(KJV) [Ec3:6]>
  
  Mainly pay attention to that whole '...and a time to lose' part.
  
  --
  There's no place like ::1 (I've completed my transition to IPv6)
10. Re:Carefully protected? by tengu1sd · 2008-10-21 16:19 · Score: 5, Insightful
  
  >>>The company BOTH cares about their data AND can't afford a proper backup system.
  It can be that the company cares, but doesn't care enough to budget for potential data recovery. All you can do is make sure the risks are explained, with budget options, and keep a well-documented paper trail to cover your nether regions. Been there, done that. The typical response is that backups are not important, until a failure and a few days of uncertainty are forced upon the company.
  Having the same, potentially corrupted, data at multiple sites protects against the loss of a disk, or even the loss of a single site. But user error or database corruption can wind up copied over your good data. Needing to go back more than a day or two may not be practical in a disk-to-disk backup environment.
  It's part of a system manager's role to spell out potential problems in easy-to-understand PowerPoint sound bites and show what options are available. The better you can do this, the more toys you'll have to play with.
11. Re:Carefully protected? by techess · 2008-10-22 01:04 · Score: 5, Interesting
  
  I always love it when Fed-Ex destroys something and then tries to hide it. One day I walked past the shipping office and I smelled the very strong odor of hydraulic oil coming from the room. I take a look inside since we shouldn't be receiving anything that has hydraulic oil in it. I found a bunch of boxes with the local Detroit Airport logo all over them and sealed with DET labeled tape. The cardboard was completely soaked through with the oil.
  I carefully opened one of the boxes and found it contained servers! It appears that the original boxes got in some sort of accident at the airport and were completely soaked. At the airport Fed-Ex or the baggage handlers did us a "favor" and re-boxed everything. The servers were so coated (and filled) that even the new boxes were completely soaked through and the bottoms of the boxes were starting to pull apart. The Fed-Ex guy (so we wouldn't refuse them) dropped them off at lunch and then got some random person in the hall to sign off on it.
  We had to pay for new servers to be built ASAP and shipped overnight (UPS this time) at huge cost for us. Since someone had signed off on the package we then had a very long fight to get Fed-Ex to pay for the equipment they destroyed. We never got the extra cost for the overnight shipping and the rush build reimbursed.
  
  --
  Don't anthropomorphize computers. They *hate* that.
Re:Dont worry too much by SatanicPuppy · 2008-10-21 11:25 · Score: 5, Informative

The real issue is one that anyone who has ever had to recover a multi-drive array can tell you instantly: if one drive fails, and the other drive was bought at the same time, and has had a nearly identical usage pattern, the odds of the other drive failing are well above average.
I once had a single drive fail in a 24 disk array. The disks were arranged, RAID 5, in groups of 3, glued together by Veritas (from back before it got bought by crappy symantec). By the time the smoke cleared we had replaced 19 out of 24 drives. They had all been bought at the same time, and as they thrashed rebuilding their failed buddies, they started dying themselves. The remaining 5 drives we replaced anyway, just because.
That's a worst case, but multiple failures are far from uncommon, and very few people correctly cycle in new drives periodically to reduce the chance of a mass failure.

--
ad logicam Claiming a proposition is false because it was presented as the conclusion of a fallacious argument.
Re:Dont worry too much by Angus+McNitt · 2008-10-21 11:44 · Score: 5, Insightful

... very few people correctly cycle in new drives periodically to reduce the chance of a mass failure.
That is also because very few people buy a RAID setup piecemeal. Most end up buying a solution, fully populated. The idea of swapping out some drives as you go, or growing your RAID over time doesn't always look good, either to the PHBs who usually run the budget, or to the vendor. We had a vendor trying to sell us an iSCSI SAN device tell us that varying the drive lots and dates increased the chances of failure. Needless to say we went elsewhere.

When we bought the RAID array for our Exchange box, this is going back a few years, everybody looked at me like an idiot because I asked for drives with different lot numbers. It was the best I could do as buying over time was not an option. HP was actually pretty cool about this request and out of 8 disks, no 3 have the same lot number or manufacture date.

Of course we are also running RAID on that machine for non-backup and do a nightly replication, so your mileage may vary.

--
"To Do Is To Be" - Socrates, "To Be Is To Do" - Sartre, "Do Be Do Be Do" - Sinatra
Re:RAID doesn't protect against your worst enemy by SatanicPuppy · 2008-10-21 11:45 · Score: 5, Insightful

Wow, how incite-ful. Doesn't matter what the discussion is, some geek is bound to weigh in with all the shortcomings of any idea.
Newsflash: there is no perfect backup! No method is foolproof, especially when it's bound to be boring as hell, and you've got an inevitable human factor. You get lazy moving the tapes offsite, you put off fixing a dead drive because there are 4 others, you wipe your main partition upgrading your distro and forget that your CRON rsync script uses the handy --delete flag, and BOOM wipes out your backup.
Shit happens. Pointing out what we all already know doesn't do anything helpful.

--
ad logicam Claiming a proposition is false because it was presented as the conclusion of a fallacious argument.
Re:RAID doesn't protect against your worst enemy by lucas+teh+geek · 2008-10-21 11:51 · Score: 5, Insightful

RAID doesn't protect against your worst enemy
rm -r *
nor is it supposed to. not being a moron seems to have protected me from "my worst enemy" just fine. RAID has protected me from random disk failures. seems to be working as designed

--
TIAEAE!
Confessions of a reformed RAID addict by rs79 · 2008-10-21 12:08 · Score: 5, Funny

You get your first RAID controller from a trusted friend. "Here" he says "try this" and hands you a Mylex board. It has a 64 bit bus and 3 SCSI LVD connectors. Oooh. That looks fast. So you start ebaying drives, cables, adapters, more controllers, the inevitable megawatt power supply and you mess around with raid 1, raid 0, raid 1+0 and raid 5. Suddenly every system falls prey to RAIDMANIA; eventually for yourself you build a system with 3 controllers, with 3 busses each and a drive on each one of 9 busses. With a controller for swap, one for data and one for the system will Windows now be fast? Yeah, sorta. Those drives sure are quiet - from a click-click busy noise perspective, NOT from a "sounds like a jet airplane when running" perspective. Heat is an issue, too.
http://rs79.vrx.net/works/photoblog/2005/Sep/15/DSCF0007s.jpg
But oh my are the failure modes spectacular.
I just use a laptop now and make several sets of backup DVDs or just copy to spare drives. I love RAID to death. But it's really only marginally worth the effort in the real world. But if you need fast, OMG.

--
Need Mercedes parts ?
Re:RAID doesn't protect against your worst enemy by Kleen13 · 2008-10-21 12:31 · Score: 5, Funny

(though it's been running since '04 without any problems, and my HD health monitors show it in good shape)
Oh man.... you didn't just say that out loud did you???

--
That sinking feeling deep in your gut when you KNOW you screwed up bad summed up with: {head desk} {head desk}
Re:RAID doesn't protect against your worst enemy by Junior+J.+Junior+III · 2008-10-21 12:31 · Score: 5, Funny

My data backup scheme is to steganographically embed my entire filesystem into nude pictures of Sarah Palin, and then upload them to usenet.

--
You see? You see? Your stupid minds! Stupid! Stupid!
Re:RAID doesn't protect against your worst enemy by Renderer+of+Evil · 2008-10-21 12:55 · Score: 5, Funny

That's why I chisel all my data (ones and zeros) onto stone tablets. In a few years the pile of stones will be taller than Everest. :)
And in a thousand years some bearded guy will discover a couple of those stones, come down the mountain and base a religion around them. These things are cyclical.
Raid 5 - Kills Drives Dead(tm) by fortapocalypse · 2008-10-21 13:02 · Score: 5, Funny

RAID???!!! Aaaaaaah! (Drive dies.)
Re:RAID doesn't protect against your worst enemy by tkw954 · 2008-10-21 13:25 · Score: 5, Funny

rm -r *

That doesn't work for me. Try

sudo rm -rf /*
Re:Don't panic! by Anonymous Coward · 2008-10-21 13:47 · Score: 5, Insightful

No, it won't. That's the point of this not-news article. It's getting to the point where (due to the size of the disks) a rebuild takes longer than the statistically "safe" window between individual disk failures. Two disks kick it in the same timeframe (the chance of which increases as you add disks) and you're screwed.
A poorly designed multi-disk storage system can easily be worse than a single disk.
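
To put rough numbers on the "safe window" idea, here is a back-of-the-envelope sketch. Everything in it is an assumption rather than something from the article: a constant failure rate, a 3% annual failure rate per disk, and fully independent failures. Drives bought in one batch tend to fail together, so treat the output as optimistic.

```python
import math

HOURS_PER_YEAR = 8760

def p_second_failure(remaining_disks, rebuild_hours, annual_failure_rate=0.03):
    """Chance that at least one surviving disk dies before the rebuild finishes.

    ASSUMPTIONS (not from the thread): constant failure rate, independent
    disks, and a hypothetical 3% annual failure rate per disk.
    """
    # Convert the annual failure probability into a per-hour hazard rate
    hourly_rate = -math.log1p(-annual_failure_rate) / HOURS_PER_YEAR
    # Any one of the survivors failing inside the window loses the array
    return -math.expm1(-remaining_disks * hourly_rate * rebuild_hours)

# Longer rebuilds and more disks both widen the window
for hours in (6, 24, 72):
    print(f"{hours:>3} h rebuild, 6 surviving disks: {p_second_failure(6, hours):.4%}")
```

The absolute numbers depend entirely on the assumed failure rate; the useful part is how the risk scales with rebuild time and disk count.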
Re:Don't panic! by Sillygates · 2008-10-21 16:07 · Score: 5, Insightful

The mathematical theory behind raid5 is not complicated at all. http://en.wikipedia.org/wiki/Standard_RAID_levels#RAID_5

And there is parity, that's how raid5 works.

You are probably referring to "silent" errors, which, for performance reasons, aren't detected by most raid5 implementations. And in reality there is little reason to actively read parity unless the array is running or recovering in degraded mode: sure, you'll be informed that there is data corruption, but there is no way to tell whether the parity or the original data is at fault (though it's true, some implementations will scrub/update the parity to match the original data on an occasional basis).
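
For anyone who wants to see both points concretely, here is a toy sketch of single-parity XOR in Python. It is an illustration only, not how any particular controller lays out stripes; the block contents and the helper name are made up.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe: three data blocks plus one parity block (toy values)
data = [b"disk", b"fail", b"soon"]
parity = xor_blocks(data)

# A disk that is known to be dead can be rebuilt from the survivors
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt)  # b'fail'

# Silent corruption: a data block changes with no disk reporting an error
syndrome_bad_data = xor_blocks([data[0], b"fajl", data[2], parity])

# The same mismatch appears if the parity block is the corrupted one instead
bad_parity = bytes(p ^ s for p, s in zip(parity, syndrome_bad_data))
syndrome_bad_parity = xor_blocks(data + [bad_parity])

print(syndrome_bad_data == syndrome_bad_parity)  # True
```

The check says "something in this stripe is wrong" but gives the same answer whichever block is at fault, so single parity alone cannot decide what to repair. Per-block checksums are one way around that.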

I don't see a single set of raid5 disks as a backup solution by any measure though (disk reliability is only one aspect of this; hardware/driver/filesystem bugs can also cause hard or impossible to detect corruption), but it is a great 'best effort' to prevent a bit of downtime on high availability disks.

--
I fear the Y2038 bug
Re:Don't panic! by Eivind · 2008-10-21 18:05 · Score: 5, Insightful

Yes. It's amazing that the article presents the basic point so horribly poorly. The problem is not the capacity of the disks.
The problem is that the capacity has been growing faster than the transfer bandwidth. Thus it takes a longer and longer time to read (or write) a complete disk. This gives a larger window for double-failure.
Simple as that.
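
A quick sketch of that arithmetic, for illustration. The capacity and throughput figures below are hypothetical round numbers chosen to show the trend, not measurements of any real drive, and the function name is made up.

```python
def full_read_hours(capacity_gb, throughput_mb_s):
    """Best-case hours to read an entire disk end to end at a sustained rate."""
    seconds = capacity_gb * 1000 / throughput_mb_s  # decimal GB to MB
    return seconds / 3600

# HYPOTHETICAL drive generations: capacity doubles, bandwidth creeps up
generations = [(500, 70), (1000, 90), (2000, 110)]

for capacity_gb, throughput_mb_s in generations:
    hours = full_read_hours(capacity_gb, throughput_mb_s)
    print(f"{capacity_gb:>5} GB at {throughput_mb_s:>3} MB/s: {hours:.1f} h minimum")
```

This is the floor for a rebuild: a real array that is still serving users while it rebuilds will take longer, which widens the window further.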