Slashdot Mirror


Why RAID 5 Stops Working In 2009

Lally Singh recommends a ZDNet piece predicting the imminent demise of RAID 5, noting that increasing storage and non-decreasing probability of disk failure will collide in a year or so. This reader adds, "Apparently, RAID 6 isn't far behind. I'll keep the ZFS plug short. Go ZFS. There, that was it." "Disk drive capacities double every 18-24 months. We have 1 TB drives now, and in 2009 we'll have 2 TB drives. With a 7-drive RAID 5 disk failure, you'll have 6 remaining 2 TB drives. As the RAID controller is busily reading through those 6 disks to reconstruct the data from the failed drive, it is almost certain it will see an [unrecoverable read error]. So the read fails ... The message 'we can't read this RAID volume' travels up the chain of command until an error message is presented on the screen. 12 TB of your carefully protected — you thought! — data is gone. Oh, you didn't back it up to tape? Bummer!"

11 of 803 comments (clear)

  1. Re:Dont worry too much by SatanicPuppy · · Score: 5, Informative

    The real issue is one that anyone who has ever had to recover a multi-drive array can tell you instantly: if one drive fails, and the other drive was bought at the same time, and has had a nearly identical usage pattern, the odds of the other drive failing are well above average.

    I once had a single drive fail in a 24 disk array. The disks were arranged, RAID 5, in groups of 3, glued together by Veritas (from back before it got bought by crappy symantec). By the time the smoke cleared we had replaced 19 out of 24 drives. They had all been bought at the same time, and as they thrashed rebuilding their failed buddies, they started dying themselves. The remaining 5 drives we replaced anyway, just because.

    That's a worst case, but multiple failures are far from uncommon, and very few people correctly cycle in new drives periodically to reduce the chance of a mass failure.

    --
    ad logicam Claiming a proposition is false because it was presented as the conclusion of a fallacious argument.
  2. Re:Testable assertion by theendlessnow · · Score: 4, Informative

    I have large RAID 5's and RAID 6's... I generally don't have any RAID columns over 8TB. I HAVE had drive failures. Yes... I'm talking cheapo SATA drives. No... I have not see the problem this article presents. Do I backup critical data? Yes. The only time I lost a column was due to a firmware bug which caused a rebuild to fail. Took awhile to restore from backup, but that was about the extent of the damage. I would call this article FUD... deceptive FUD, but very much FUD.

  3. 1 in 10^14 bit is not what I observe by gweihir · · Score: 4, Informative

    My observed error rate with about 4TB of storage is much, much lower. I did run a full surface scan every 15 days for two years and did not have a single read error in about two years. (The hardware has since been decomissioned and replace dby 5 RAID6 Arrays with 4TB each.)

    So, I did read roughly 100 times 4TB. That is 400TB = 3.2 * 10^15 bits with 0 errors. That does not take into account normal read from the disks, which should be substantially more.

    --
    Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
  4. Re:RAID != Backup by Walpurgiss · · Score: 4, Informative

    I run a raid5 with 1TB disks. Growing the array from 3 to 4 took around 4 hours, 4 to 5 took maybe 8 or 10, 5 to 7 took something like 30 hours I guess.

    But that's growing from a previous capacity to a larger capacity.
    Using mdadm to fake a failure by removing and adding a single drive, the recover time generally was 4-5 hours.

  5. Re:Carefully protected? by SatanicPuppy · · Score: 4, Informative

    I've got a mainframe circa 1984 that's been using the same type of drive since 1989. Last year we pulled all the year-end financial numbers off the yearly backups dating back to that point. Zero failed tapes.

    Consumer-grade CDs and DVDs use a photosensitive dye to record information. It can degrade in anywhere between 2 to 5 years...Longer if you keep it in a cool dark place, but not 20 years.

    --
    ad logicam Claiming a proposition is false because it was presented as the conclusion of a fallacious argument.
  6. Re:RAID doesn't protect against your worst enemy by Anonymous Coward · · Score: 4, Informative

    Redundancy... You keep using that word. I do not think it means what you think it means.

    RAID 0, psudo-ironically, is not redundant at all. RAID 1, often called mirroring, are the arrays that are redundant.

  7. Re:Can I tell you where to insert your plug? by pyite · · Score: 4, Informative

    Wow. I love your FUD. If you're going to lie, at least make it seem truthful.

    Lacking in file system utilities (yes, fsck IS necessary even on healthy filesystems, especially on desktops and portables)

    Why no fsck? And if you really feel the need to do something:

    zpool scrub <pool_name>

    License-incompatible with anything worth running it on, other than Solaris itself... which is NOT worth running (see #1 above)

    What you mean to say is "Some Operating Systems whose merits can be debated are license incompatible with the license of ZFS." FreeBSD can implement ZFS. Why can't Linux? Because of its license, not that of ZFS.

    --

    "Nature doesn't care how smart you are. You can still be wrong." - Richard Feynman

  8. Re:Carefully protected? by ajkst1 · · Score: 5, Informative

    I have to echo this comment. RAID is not a backup. It is a form of redundancy. Nothing is stopping that system from losing two drives and completely losing your data. RAID simply allows you to keep working after a SINGLE disk failure. If you're not making backups of your critical data and relying on RAID to save your behind, you're insane.

  9. Re:Don't panic! by nine-times · · Score: 4, Informative

    How reliable RAID5 is depends, because actually the more disks you have, the greater the likelihood that one of them will fail in any set period of time. So obviously if you have a RAID 0 of lots of disks, then there is a much better chance that the RAID will fail than that any particular disk will fail.

    So the purpose of RAID5 is not so much to make it orders of magnitude more reliable than just having a single disk, but rather to mitigate the increased risk that would come from having a RAID0. So you'd have to calculate, for the number of disks and the failure rate of any particular drive, what are the chances of having 2 drives fail at the same time (given a certain response rate to drive failure). If you have enough drives and a slow enough response to disk failures, it's at least theoretically possible (I haven't done the math) that a single drive is safer.

  10. Re:RAID doesn't protect against your worst enemy by cbreaker · · Score: 4, Informative

    Well, Windows does. Taking a snapshot of NTFS, even on a heavily used 1TB+ file server, takes only a few seconds, and under normal operation the file system is still fast.

    NTFS is actually a pretty good file system. It's probably because it was originally designed by IBM.

    --
    - It's not the Macs I hate. It's Digg users. -
  11. Re:Carefully protected? by kimvette · · Score: 4, Informative

    I have CD-Rs dating back to 1994 or 1995 that are just fine -- and they're off-brand media too. "Good" media was $12 to $20 per CD then, and "cheap" media was $7.00 per CD.

    I have DVD-Rs dating back to 2002 or 2003 -- again, just fine.

    While it's good to be cautious, some in here are crying wolf regarding optical media.

    --
    The Christian Right is Neither (Christian nor right). See: Matthew 23, Matthew 25, Ezekiel 16:48-50