Why RAID 5 Stops Working In 2009
Lally Singh recommends a ZDNet piece predicting the imminent demise of RAID 5, noting that increasing storage and non-decreasing probability of disk failure will collide in a year or so. This reader adds, "Apparently, RAID 6 isn't far behind. I'll keep the ZFS plug short. Go ZFS. There, that was it." "Disk drive capacities double every 18-24 months. We have 1 TB drives now, and in 2009 we'll have 2 TB drives. With a 7-drive RAID 5 disk failure, you'll have 6 remaining 2 TB drives. As the RAID controller is busily reading through those 6 disks to reconstruct the data from the failed drive, it is almost certain it will see an [unrecoverable read error]. So the read fails ... The message 'we can't read this RAID volume' travels up the chain of command until an error message is presented on the screen. 12 TB of your carefully protected — you thought! — data is gone. Oh, you didn't back it up to tape? Bummer!"
12 TB of your carefully protected â" you thought! â" data is gone. Oh, you didn't back it up to tape? Bummer!
If it wasn't backed up to an offsite location, then it wasn't carefully protected.
There are shills on slashdot. Apparently, I'm one of them.
Furthermore, for many enterprise applications disk size is not the main concern, but rather I/O throughput and reliability. Few need 7 disks of 2 TB in RAID5.
Some of us do need a large amount of reasonably priced storage with fast read speed & slower write speed. This pattern of data access is extremely common for all sorts of applications.
And this raid 5 "problem" is simply the fact that modern sata disks have a certain error rate. But as the amount of data becomes huge, it becomes very likely that errors will occur when rebuilding a failed disk. But errors can also occur during normal operation!
The problem is that sata disks have gotten a lot bigger without the error rate dropping.
So you have a few choices:
- use more reliable disks (like scsi/sas) which reduce the error rate even further
- use a raid geometry that is more tolerant of errors (like raid 6)
- use a file system that is more tolerant of errors
- replicate & backup your data
I can see a lot of people getting into a tizzy over this. The RAID 5 this guy is talking about is controlled by one STUPID controller.
There are a lot of methods, and patented technology that prevent just the situation he is talking about. Here is just one example:
RAID is not perfect, not by any stretch, but if you use it properly it will serve it's purpose quite nicely. If your data is that critical, having it on a single raid is ill advised anyways. If you are talking about databases, then RAID 10 is more preferable and replicating the databases across multiple sites, even more so.
What is this article about?
They say that since there is more data, you're more likely to encounter problems during a rebuild.
The issue isn't with RAID, it's with the file system. Use larger blocks/sectors.
Losing all of your data requires you to have a shitty RAID controller. A decent one will reconstruct what it can.
The odds of you encountering a physical issue increases as capacity increases, and decreases as reliability increases. In theory, the 1 TB and up drives are pretty reliable. Anything worth protecting should be on server-grade hard drives anyway.
The likelihood of a physical problem popping up during your rebuild is no higher with new drives than it was with old drives. I haven't noticed my larger drives failing at higher rates than my older, smaller drives. I haven't heard of them failing at higher rates.
Remember, folks, RAID is a redundant array of inexpensive disks. The purpose of RAID is to be fault-tolerant, in the sense that a few failures don't put you out of production. You also get the nice bonus of being able to lump a bunch of drives together to get a larger total capacity.
RAID is not a backup solution.
RAID 5 and RAID 6, specifically, are still viable solutions for most setups. If you want more reliability, go with RAID 1+0, RAID 5+0, whatever.
Choosing the right RAID level has always depended on your needs, setup, budget, and priorities.
Smells like FUD.
How many times does this have to be said.
RAID is not a backup. RAID is designed to protect against hardware failures. It can also increase your I/O speed, which is more important in some cases. Backups are different.
Depending on what you are doing, you may or may need a RAID, but you definitely need backups.
... very few people correctly cycle in new drives periodically to reduce the chance of a mass failure.
That is also because very few people buy a Raid setup piecemeal. Most end up buying a solution, fully populated. The idea of swapping out some drives as you go, or growing your RAID over time doesn't always look good, either to the PHBs who usually run the budget, or to the vendor. We had a vendor trying to sell us a iSCSI SAN device tell us that varying the drive lots and dates increased the chances of failure. Needless to say we went elsewhere.
When we bought the RAID array for our Exchange box, this is going back a few years, everybody looked at my like an idiot because I asked for drives with different lot numbers. It was the best I could do as buying over time was not an option. HP was actually pretty cool about this request and out of 8 disks, no 3 have the same lot number or manufacture date.
Of course we are also running RAID on that machine for non-backup and do a nightly replication, so your mileage may vary.
"To Do Is To Be" - Socrates, "To Be Is To Do" - Sartre, "Do Be Do Be Do" - Sinatra
Wow, how incite-ful. Doesn't matter what the discussion is, some geek is bound to weigh in with all the shortcomings of any idea.
Newsflash: there is no perfect backup! No method is foolproof, especially when it's bound to be boring as hell, and you've got an inevitable human factor. You get lazy moving the tapes offsite, you put off fixing a dead drive because there are 4 others, you wipe your main partition upgrading your distro and forget that your CRON rsync script uses the handy --delete flag, and BOOM wipes out your backup.
Shit happens. Pointing out what we all already know doesn't do anything helpful.
ad logicam Claiming a proposition is false because it was presented as the conclusion of a fallacious argument.
RAID doesn't protect against your worst enemy
rm -r *
nor is it supposed to. not being a moron seems to have protected me from "my worst enemy" just fine. RAID has protected me from random disk failures. seems to be working as designed
TIAEAE!
RAID 5 will still be orders of magnitude more reliable than just having a single disk.
No sig today...