Why RAID 5 Stops Working In 2009
Lally Singh recommends a ZDNet piece predicting the imminent demise of RAID 5, noting that increasing storage and non-decreasing probability of disk failure will collide in a year or so. This reader adds, "Apparently, RAID 6 isn't far behind. I'll keep the ZFS plug short. Go ZFS. There, that was it." "Disk drive capacities double every 18-24 months. We have 1 TB drives now, and in 2009 we'll have 2 TB drives. With a 7-drive RAID 5 disk failure, you'll have 6 remaining 2 TB drives. As the RAID controller is busily reading through those 6 disks to reconstruct the data from the failed drive, it is almost certain it will see an [unrecoverable read error]. So the read fails ... The message 'we can't read this RAID volume' travels up the chain of command until an error message is presented on the screen. 12 TB of your carefully protected — you thought! — data is gone. Oh, you didn't back it up to tape? Bummer!"
12 TB of your carefully protected -- you thought! -- data is gone. Oh, you didn't back it up to tape? Bummer!
If it wasn't backed up to an offsite location, then it wasn't carefully protected.
There are shills on slashdot. Apparently, I'm one of them.
When HDDs move to bigger sectors, there should be better error recovery, reducing the probability of unrecoverable read errors. Right? OK, I'm moving to ZFS.
Prediction: The real iPhone killer is going to be sex robots from Japan. Think about it.
RAID is not, and has never been, a substitute for backups.
I am not running Windows.......oh wait nvm
I mean, WTF? Many people regard RAID as something magical that will keep their data no matter what happens. Well ... it's not.
Furthermore, for many enterprise applications disk size is not the main concern, but rather I/O throughput and reliability. Few need 7 disks of 2 TB in RAID5.
The Raven
RAID 5 and "carefully protected"?
RAID 5, as well as RAID 6, is nothing more than an attempt to add some amount of redundancy without sacrificing too much space. Go RAID 1 instead with the same number of disks. Also do off-site mirroring of all your data.
And if you get an "unrecoverable read error" after a drive failure, it means the administrator should get fired, as he was too stupid to type "echo check > sync_action" followed later by "cat mismatch_cnt".
The problem with RAID 5 is that the more drives you have, the higher the probability that more than one drive dies. That's why you have multiple RAID 5 arrays of 4 disks maximum instead of one array of 7 disks.
If you use RAID to 'protect' your data, you clearly don't value your data at all.
While the interesting bit of this article is the coming demise of RAID 5, what you should be bringing away with it is, if RAID is all that stands between you and data loss, you're a noob.
That's what RAID stands for. It's a nice idea in theory, as long as the disks remain cheap, but I've never trusted them to work properly and had more than one break on me. "All you have to do is unplug the bad disk, plug in a good one in its place, and in a few minutes all will be hunky dory." Bzzt. Wrong. Thanks for playing.
Back up every day to tape, to another disk entirely on a different machine, to R/W DVD, twice a day if you have to, or all of the above--anywhere else but the machine itself. RAID: the accident waiting to happen. Yeah, I'm paranoid. It comes from experience.
How about a moderation of -1 pedantic.
If you have one RAID5 box, just build another one that replicates it. Use that for your "hot backup". Then back that up to tape, if you must.
Storage is so cheap these days (especially if you don't need super-fast speeds and can use regular SATA drives), that you might as well just go crazy with mirroring/replicating all your drives all over the place for fault-tolerance and disaster-recovery.
A RAID 5 setup is only a precaution in case of a hardware failure. It serves as no excuse for not having backed up your data.
And the topic is also flawed - RAID 5 doesn't have any self destruct mechanism.
This story is just ridiculous. It clearly states that this doesn't affect Enterprise users, as their URE rate is lower and unless they're idiots they use smaller drives. What home users will have 7 disk RAID 5 arrays of 2TB disks? Is this really a large enough percentage of RAID5 users to call for the death of it?
If you put 7 disks in a single RAID5 without backup then it's called bad design and bad implementation.
This has always been true regardless of disk size/speed.
As above posters have pointed out once you get past 4 disks the non-ZFS way to go is multiple blocks of RAID-(whatever number is appropriate for your scenario).
Though ZFS is awesome and if your OS/hardware supports it 100% there is little reason to stick with RAID
July 18th, 2007
This is trivially testable. Any slashdotters have experience rebuilding 7TB RAID 5 arrays?
You'd think, if this were really an issue, we'd be hearing stories from the front lines of this happening with increasing frequency. Instead we have a blog post based entirely on theory, without a single real-world example for corroboration.
What's more, who even uses RAID 5 anymore? I thought it was all RAID 10 and whatnot these days.
rm -r *
Seriously, you're kidding yourself if you think RAID is protecting you.
http://lkml.org/lkml/2005/8/20/95
Sure, everyone should use at least RAID 6 in production; at least it's an improvement over the classic RAID 5 with a hot spare.
But the big problem with Raid isn't disk failure, it's disk decay, and a major reason for that being a problem is the lack of hashing on most modern filesystems.
They basically don't check that what you put somewhere is what you get back, which means that the Raid can decay slowly and your data will just corrupt, sure it's still raid, it's just that the distributed data is corrupt.
I'm going to troll ridiculously old articles and post them to Slashdot and hope the editors don't notice... oh cool, they didn't here either!
I just wasted your mod points! HA!
I can see a lot of people getting into a tizzy over this. The RAID 5 this guy is talking about is controlled by one STUPID controller.
There are a lot of methods, and patented technology that prevent just the situation he is talking about. Here is just one example:
RAID is not perfect, not by any stretch, but if you use it properly it will serve its purpose quite nicely. If your data is that critical, having it on a single RAID is ill-advised anyway. If you are talking about databases, then RAID 10 is preferable, and replicating the databases across multiple sites even more so.
What is this article about?
They say that since there is more data, you're more likely to encounter problems during a rebuild.
The issue isn't with RAID, it's with the file system. Use larger blocks/sectors.
Losing all of your data requires you to have a shitty RAID controller. A decent one will reconstruct what it can.
The odds of you encountering a physical issue increase as capacity increases, and decrease as reliability increases. In theory, the 1 TB and up drives are pretty reliable. Anything worth protecting should be on server-grade hard drives anyway.
The likelihood of a physical problem popping up during your rebuild is no higher with new drives than it was with old drives. I haven't noticed my larger drives failing at higher rates than my older, smaller drives. I haven't heard of them failing at higher rates.
Remember, folks, RAID is a redundant array of inexpensive disks. The purpose of RAID is to be fault-tolerant, in the sense that a few failures don't put you out of production. You also get the nice bonus of being able to lump a bunch of drives together to get a larger total capacity.
RAID is not a backup solution.
RAID 5 and RAID 6, specifically, are still viable solutions for most setups. If you want more reliability, go with RAID 1+0, RAID 5+0, whatever.
Choosing the right RAID level has always depended on your needs, setup, budget, and priorities.
Smells like FUD.
The whole argument boils down to the published URE rate being both accurate and a foregone conclusion. Will disk makers _really_ make drives that have a sector failure for every 2 terabytes, or will they improve whatever technology is causing these UREs so that they become much rarer? (If the rate was real in the first place.)
AccountKiller
How many times does this have to be said.
RAID is not a backup. RAID is designed to protect against hardware failures. It can also increase your I/O speed, which is more important in some cases. Backups are different.
Depending on what you are doing, you may or may not need a RAID, but you definitely need backups.
Start with a short plug, because where I'll be asking you to put your ZFS "plug" will most-definitely hurt if your "plug" is any larger...
Need I go on? There are plenty more reasons.
We'll have a viable replacement soon enough, which is already designed to have quite a few more features that ZFS does not have, and cannot deliver in its current incarnation.
"12 TB of your carefully protected -- you thought! -- data is gone. Oh, you didn't back it up to tape? Bummer!"
Ummm... Wow.
A RAID is no substitute for a backup. A RAID cannot recover a file that you accidentally deleted, for example. A RAID can't be used to rebuild your server if the building burns down. If you aren't using some kind of offsite backup, your data is not carefully protected.
RAIDs are handy for giving you a little more reliability. If one HDD fails you can usually recover without any downtime.
RAIDs also give you much better speed over a single drive.
RAIDs can give you increased capacity over a single drive.
But a RAID is not a substitute for a backup. Never was, never will be.
In practice, this means that while your array is rebuilding, your performance SLAs go out of the window. If this is for an interactive server, such as a TP database or web service you end up with lots of complaints and a large backlog of work.
The result is that as disks get bigger, the recovery takes longer. This is what makes RAID less desirable, not the possibility of a subsequent failure - that can always be worked around.
politicians are like babies' nappies: they should both be changed regularly and for the same reasons
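The "bigger disks, longer recovery" point above can be put into rough numbers. This is only a back-of-the-envelope sketch: the 100 MB/s sustained rebuild rate is an assumption picked for illustration (real arrays under production load are often slower), and `rebuild_hours` is a made-up helper name.

```python
def rebuild_hours(capacity_tb, mb_per_s):
    """Lower bound on rebuild time: the replacement disk is written once, end to end.

    Uses decimal units (1 TB = 10**12 bytes, 1 MB = 10**6 bytes).
    """
    seconds = capacity_tb * 1e12 / (mb_per_s * 1e6)
    return seconds / 3600

# Assumed sustained rate of 100 MB/s; capacity doubles, rate stays flat.
for tb in (0.5, 1, 2):
    print(f"{tb} TB at 100 MB/s: at least {rebuild_hours(tb, 100):.1f} h")
```

If throughput does not grow as fast as capacity, the degraded-performance window grows with every drive generation, which is the complaint being made here.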
It's implied in the Slashdot description that ZFS solves the problem of drive failure. It does not. Just want to make that clear. In fact, I'd argue that there is actually more risk inside of ZFS with regards to the actual problem presented here.... for those that believe all is doom and gloom with regards to RAID.
I don't understand how the spec'd URE of a single SATA drive can be translated across multiple drives whose combined capacity happens to match that URE. I would think that the success or failure of each drive is independent of the others and the fact that you have 6 2-terabyte drives doesn't mean that you have to have a URE. You'd think that you'd have to have a URE on 7 2-terabyte drives in any configuration if that were true.
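One way to look at this question: a per-bit URE rate does not care which drive a bit sits on, only how many bits get read in total. A minimal sketch, assuming the 10^-14 per-bit figure quoted elsewhere in this thread, decimal terabytes, and fully independent bit errors (the function name is hypothetical):

```python
import math

def p_at_least_one_ure(drives, tb_per_drive, ure_per_bit=1e-14):
    """Probability of at least one URE when every bit on every drive is read once.

    Assumes independent bit errors and 1 TB = 10**12 bytes.
    """
    bits_read = drives * tb_per_drive * 1e12 * 8
    # 1 - (1 - p)**n, written with log1p/expm1 so the tiny p is not lost to rounding
    return -math.expm1(bits_read * math.log1p(-ure_per_bit))

# Rebuilding a 7-drive RAID 5 of 2 TB disks means reading the 6 survivors in full.
print(f"{p_at_least_one_ure(6, 2):.2f}")
```

Under these assumptions the result is a probability, not a guarantee: independence means a URE is never forced, just more likely the more bits the rebuild has to read.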
Spell it out for everyone.
RAID won't save your data if there is a fire.
Or if you delete a file.
Or if two drives fail.
Or a thousand other scenarios.
All RAID does is prevent the system from going down when a single drive fails (except RAID 0). Thus giving everyone in the office time to finish up their important work and log out for the day so you can swap the drive. Or, if you're brave, swap the drive during regular work hours.
For the home user (not working on huge graphic files) RAID 1 (mirroring) should be sufficient. As long as it is paired with another EXTERNAL hard drive that you copy your important information to. And leave with your brother or something. I'm talking family photos and such. Your tax information should be small enough to fit on a USB drive.
If your computer completely failed TODAY what would be the really irreplaceable files on it?
Back those up. Then store them with a friend or someone in your family.
There, problem solved.
...clearly, Raid 7 is needed.
RAID 5 isn't a false sense of security. It actually DOES protect you from a disk failure.
I made the decision about two years ago that all disks at home will be either mirrored or RAID5. Disks are so dirt cheap that there's no reason not to.
RAID doesn't prevent you from having to have some sort of backup solution, and if you can't trust yourself to do them unless you're being risky with your data, I'll happily avoid dealing with restoring data and all that bullshit from a single disk failure and you can sink your time into doing it all manually.
- It's not the Macs I hate. It's Digg users. -
can someone tell what's the meaning of this shit?
Scrub once a week, or once every two weeks.
RAID6 isn't about losing any two disks, it's about having two parity stripes. It's about being able to survive sector errors without any worry.
It's about losing ONE drive and still have enough parity to replace it without any errors.
RAID6 on 5 drives is retarded, tho, because it leaves you absurdly close to RAID1 in kept space. RAID6 is for when you have 8-10 drives. At that point you barely notice the (N - 2) effect and you have a fast (provided your processor can handle it all) chunk of throughput along with an incredibly reliable system. Well, N-3 with a hotswap.
Personally, I think I'd go RAID-Z2 via ZFS if only because it's a little bit sturdier a filesystem to begin with.
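The space argument above is easy to tabulate. A small sketch, assuming equally sized drives and comparing against a plain two-way mirror at 50%; `usable_fraction` is just an illustrative helper:

```python
def usable_fraction(n_drives, parity_drives=2, hot_spares=0):
    """Fraction of raw capacity left for data in a parity RAID of equal drives."""
    return (n_drives - parity_drives - hot_spares) / n_drives

# RAID 6 (two parity drives) at the array sizes mentioned above.
for n in (5, 8, 10):
    print(f"{n} drives: {usable_fraction(n):.0%} usable, "
          f"{usable_fraction(n, hot_spares=1):.0%} with a hot spare")
```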
They recycle old news forever!
Probably someone attempting the 1000 monkeys on a thousand keyboards theory? They have made quite some progress don't you think?
Sure, expecting the "editor" to actually look at the article is excessive. But:
"Disk drive capacities double every 18-24 months. We have 1 TB drives now, and in 2009 we'll have 2 TB drives."
Is an obvious indication that this article is old, since 18-24 months away puts you in 2010 now...
12 TB of your carefully protected -- you thought! -- data is gone.
oh noes!!! it will be even worse when you have 13TB instead of 12TB!!!!
now read that BS article and downmod this "news"...
RAID-10 ftw? Expensive I know, but at least you have a full layer of redundancy rather than just a parity drive.
Well what the author is saying is not true by my experience..
When I configure a new server with RAID, before handing over the box, I test every single RAID 1/5/10 set by pulling one drive out at a time for a couple of minutes, then putting it back in and waiting until the rebuild is complete.
I have worked with multiple Dell PowerVault MD1000s using 15 1TB SATA drives and never have I run into an issue of not being able to complete the rebuild because of the issue mentioned in the article.
I can tell you that drives will fail, but if you have the right monitoring in place to catch when a drive fails or is about to fail, and the right RAID solution in place, you can consider your data pretty safe. RAID is not a backup and not 100% safe, but chances are that a RAID solution will surely save your butt when a disk dies and you just don't have the time to rebuild / restore a box.
My observed error rate with about 4TB of storage is much, much lower. I did run a full surface scan every 15 days for two years and did not have a single read error in about two years. (The hardware has since been decommissioned and replaced by 5 RAID6 arrays with 4TB each.)
So, I did read roughly 100 times 4TB. That is 400TB = 3.2 * 10^15 bits with 0 errors. That does not take into account normal read from the disks, which should be substantially more.
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
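The scan figures above can be set against the spec number. A rough sketch, assuming the 10^-14 per-bit URE rate quoted in this thread and independent errors (so a Poisson approximation applies); the variable names are illustrative only:

```python
import math

scans = 100             # "roughly 100 times", as stated in the comment
tb_per_scan = 4         # 4 TB per full surface scan
ure_per_bit = 1e-14     # assumed spec rate, not a measured value

bits_read = scans * tb_per_scan * 1e12 * 8     # 3.2e15 bits
expected_ures = bits_read * ure_per_bit        # mean error count if the spec were typical
p_zero_errors = math.exp(-expected_ures)       # Poisson probability of seeing none

print(f"bits read:                  {bits_read:.1e}")
print(f"expected UREs at spec rate: {expected_ures:.0f}")
print(f"chance of seeing none:      {p_zero_errors:.1e}")
```

If the spec rate described typical behaviour, a clean run of this length would be extremely unlikely, which fits the reading that the published figure is a worst-case bound rather than an average.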
I'm in the process of building a new 8x 1T array. I'm not using any fancy raid card. Just a LSI 1068E chipset with a SAS expander to handle LOTS (16 slots in the case, using 8 right now).
I'm not putting the entire thing into one big array. I'm breaking up each 1T drive into ~100GB slices that I will use to build several overlapping arrays. Each MD device will be no more than 4-5 slices. This way if an error occurs on one disk in one part of a disk I will have a higher probability of recovery.
I may also use RAID 6 to give me more chance of rebuilding.
Disk errors tend to not be whole disk errors, just small broken physical parts of a single disk.
SMART will give me more chance to detect and replace dying drives.
Seriously - what's the problem with RAID 5? It's not a FALSE sense of security: It actually DOES prevent data loss or down time on a single disk failure. If you're a moron, you're creating 14 disk arrays. If you're smart, you keep it to 7 disks at the very most.
RAID 5 is great. It's fast, unless you have a shit controller without enough cache. It's going to prevent down time on a single disk failure (which is overwhelmingly the most common type of failure) and it doesn't cost you too much capacity.
Usually I'm more concerned with a fire or flood than a double-disk failure.
RAID 6 is good, but you get the same (actually worse) performance hit over RAID 5. More parity calculations. You can lose any two disks, which is nice, and if you can spare the space, go for it!
I don't see RAID 6 as being all that much more of a big deal over RAID 5 and actually it shouldn't really have its own number since it's exactly the same technology and parity system as 5. It should be RAID 5.1 or something. Or maybe RAID5+1. The only reason it's become more available now is because controllers have gotten fast enough to deal with the additional parity.
- It's not the Macs I hate. It's Digg users. -
In communications design, such as cell phones or digital TV, channels with low reliability (fading, burst interference, etc) are tasked with getting much better overall bit error rates than you'd think you could given all the crap spewed into the RF spectrum. I'm kind of confused by why the same techniques of forward error correction, interleaving, and such aren't employed more aggressively for hard drives (maybe they are more than I am thinking, maybe that's how you get to 10^-12 in the first place?). 10^-12 bit error rate is phenomenally good compared to what most digital communications devices deal with.
Typically you throw away 20-30% of your available channel with extra bits due to the encoding (imagine hardware encoding by the hard drive as it writes the bits), but you are guaranteed that if you can get most of the bits right (I'm talking 99%, not 99.9999999%) you can get the original data back, or at least know that you didn't. Interleaving spreads the bits around, so one dead sector (or 10 in a row) can easily be recovered automatically.
I use Unraid for all my data storage. The parity and data aren't striped across all drives; only one drive is parity. If I lose two disks, I only lose the data on those two disks. I've got 4TB of home storage at the moment, but will eventually scale it. Best thing is that it runs on most hardware, and boots off a memory stick. It happens to run Reiser FS though, so when a HD dies, it really makes your life hard to get any data back. But that's why you have backups, right?
If one read error occurs in reconstruction of the array you lose the piece of data it's tied to - not everything. Still get to keep 99.999% of it.
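The interleaving idea mentioned above can be shown with a toy block interleaver. This sketch only demonstrates how a burst gets spread out; it does not implement an error-correcting code, and the four six-symbol "codewords" are made-up placeholders. The assumption is that each codeword is protected by some code able to fix a single bad symbol.

```python
def interleave(codewords):
    """Block interleaver: lay codewords out as rows, emit them column by column."""
    return [sym for column in zip(*codewords) for sym in column]

def deinterleave(stream, n_words):
    """Undo interleave(): every n_words-th symbol belongs to the same codeword."""
    return [stream[i::n_words] for i in range(n_words)]

words = [list("aaaaaa"), list("bbbbbb"), list("cccccc"), list("dddddd")]
stream = interleave(words)

# A burst error wipes out 4 consecutive symbols on the medium.
stream[8:12] = ["?"] * 4

damaged = deinterleave(stream, len(words))
errors_per_word = [word.count("?") for word in damaged]
print(errors_per_word)   # the burst lands as one bad symbol per codeword
```

Without interleaving the same burst would put all four bad symbols into one codeword, which a single-error-correcting code could not repair.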
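The reason a single read error need not cost the whole array is that parity is computed stripe by stripe. A minimal sketch of single-parity (RAID 5 style) reconstruction using XOR; the three four-byte blocks are toy data, and whether a real controller carries on past a bad stripe or aborts the rebuild depends on the implementation.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

data = [b"AAAA", b"BBBB", b"CCCC"]   # one stripe spread over three data disks
parity = xor_blocks(data)            # what the parity disk stores for this stripe

# Disk 1 dies: its block is the XOR of every surviving block in the same stripe.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])
```

Each stripe is rebuilt from its own surviving blocks, so an unreadable sector on a surviving disk makes that one stripe unrecoverable while leaving the other stripes unaffected.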
*eye roll*
I guess you should be considered a new age Luddite?
Are you the same guy that always waits for SP1 before using any software? I thought so.
RAID is a proven technology and it's used in nearly all business IT systems, from big to tiny.
RAID isn't meant as a replacement for backups. It's one PART of the entire system of preventing unnecessary data loss, and more importantly, down time. You can keep on running your server while the failed disk is replaced and rebuilt.
So, while I eat cheeto's and surf Slashdot while that RAID array rebuilds itself, you can go ahead and recover your old data from last night all day long while people bitch at you for not using the technology that's been around since the inception of the hard drive.
If you actually did have the experience you claim, you'd slap yourself for such a stupid fucking post.
- It's not the Macs I hate. It's Digg users. -
You get your first RAID controller from a trusted friend. "Here" he says "try this" and hands you a Mylex board. It has a 64 bit bus and 3 SCSI LVD connectors. Oooh. That looks fast. So you start ebaying drives, cables, adapters, more controllers, the inevitable megawatt power supply and you mess around with raid 1, raid 0, raid 1+0 and raid 5. Suddenly every system falls prey to RAIDMANIA; eventually for yourself you build a system with 3 controllers, with 3 busses each and a drive on each one of 9 busses. With a controller for swap, one for data and one for the system will Windows now be fast? Yeah, sorta. Those drives sure are quiet - from a click-click busy noise perspective, NOT from a "sounds like a jet airplane when running" perspective. Heat is an issue, too.
http://rs79.vrx.net/works/photoblog/2005/Sep/15/DSCF0007s.jpg
But oh my are the failure modes spectacular.
I just use a laptop now and make several sets of backup DVDs or just copy to spare drives. I love RAID to death. But it's really only marginally worth the effort in the real world. But if you need fast, OMG.
Need Mercedes parts ?
I have read and written about 50 TB with my single, non-redundant 160 GB hard drive and I have never lost so much as one bit. This is just a store-bought hard disk without even a hint of longevity. It was the cheapest drive in terms of bytes for the buck - even cheaper than used drives, which tend to be slightly overpriced for people who want to spend next to nothing. I verify all my reads and writes because I handle some large files - no data loss has ever occurred, even though I do not treat my drive lightly. I transport my drive from place to place, run it from various power sources, and use it fearlessly during thunderstorms. Zero information has gone missing.
Therefore, with a RAID system of redundant 50 TB disks, I could fear nothing.
Know your pads. One time pad: good for cryptography. Two timing pad: where to take your mistress.
Is this the way it's going to be from now on? Always in full "crisis" mode? OMG! Y2K! No more IPv4 addresses! Spam! Spam! Spam! Spam! Spam! Great for job security, I guess. This whole fear factor thing is getting out of hand.
What?
Nobody is saying that RAID is a backup. RAID is there to keep you up and running in a business environment when a drive fails, which is, as the author puts it, inevitable. Then he goes on to statistically prove that, while rebuilding an array of currently relevant size for a large business, as in many TB of data, that you will almost certainly not be able to recover your array to a healthy state because of an unavoidable, highly probable read error on one of your "healthy" disks. Of course you have a fucking backup of your production 12 TB RAID array. He said what he did about tape backups to drive home the point, which is that your shit will be down, out of production, thereby making the fact that you had your data in RAID 5 completely pointless. The author has a good fucking point, RAID 5 is statistically useless when dealing with disks that large.
If I have 5 disks each with the exact same data is it possible so that when i want to access a file it will ask for 1/5 of the data from each drive? Wouldn't this increase drive life and increase read speed up by 5x?
First off, Isn't this story a year+ old? Sheesh.
Second off, if you're worried about URE on X number of disks, what about a single capacitor cooking off on the raid controller? No serious data is stored on a single raid controller system, without good backups or another raid'd system on completely unique hardware. Yes, if you put a lot of disk on one controller and have a failure you have a higher risk of *another* failure. That's why important data doesn't depend on *only* RAID, and why lots of places use mirroring, replication, data shuttling, etc. This isn't new. Most folks that can't afford to rebuild from backups or from a mirror'd remote device also couldn't have used 12TB for anything *but* bulk offline file storage because it's slower than christmas VS a 'real' storage array. Using it for the uber HD DVR? Great. Oh no, you lose X-files's last episodes. This isn't banking data we're talking here.
Prioritize your data. I cannot believe that a home user has 12TB of important stuff. Back up your critical records both on site and off [1]. Back up the important stuff on site with whatever is convenient. Let the rest go hang.
[1] Use DVDs in the unlikely event you have that much critical data. Few home users will have a critical need for that stuff beyond the life of the media. Any that do can copy it over every five years, and take the opportunity to delete the obsolete stuff.
Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
With regards to raid 6 being insufficient for *single* drive failure recovery:
RAID 6 allows you to recover from any number of read errors as long as no more than one disk in a stripe has a read error.
The probability of multiple failures on any given byte is thereby:
1-( (1-10^-14)^7 + 7*(10^-14)*(1-10^-14)^6) =~ 2.1*10^-27
Therefore, to have a better chance of getting a failed RAID 6 than of winning the lottery, you'd need disks in the 5EB range ( (1-2.1*10^-27)^X=1-10^-8 gives X=~ 5*10^18).
In these circumstances, the probabilities are negligible in comparison with those of a second total drive failure during recovery.
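The figures in this comment can be re-derived exactly. The sketch below uses Python's `fractions.Fraction` because ordinary floating point rounds 1 - 10^-14 badly enough that the subtraction cancels to noise. It takes the comment's own inputs as given (p = 10^-14 per position, 7 disks, a 10^-8 lottery odds target) and uses the first-order approximation (1 - q)^X ≈ 1 - qX for the last step.

```python
from fractions import Fraction
from math import comb

p = Fraction(1, 10**14)    # per-position error probability used in the comment
n = 7                      # disks taking part in each stripe

# P(two or more of the n disks are unreadable at the same position), exact
p_multi = 1 - ((1 - p) ** n + n * p * (1 - p) ** (n - 1))
print(f"exact:        {float(p_multi):.3e}")
print(f"leading term: {float(comb(n, 2) * p ** 2):.3e}")   # C(7,2) * p^2

# Positions needed before the cumulative failure chance reaches 1e-8
x = Fraction(1, 10**8) / p_multi
print(f"X is roughly  {float(x):.1e}")
```

The exact value and the leading term C(7,2)·p² agree to several digits, and X comes out a little under 5·10^18, in line with the "5EB range" estimate.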
This is why you scrub your RAID arrays once a week. If you're using software RAID on Linux, for example:
echo check > /sys/block/md0/md/sync_action
The above will scrub array md0 and initiate sector reallocation if needed. You do this while you have redundancy so the bad data can be recovered. Over time, weak sectors get reallocated from the spare bands, and when you do have a failure the probability of a secondary failure is very low over the interval needed for drive replacement.
Most non-crap hardware controllers also provide this function. Read the documentation.
Can You Say Linux? I Knew That You Could.
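For anyone who would rather drive the weekly scrub from a script than from a shell one-liner, here is a minimal sketch in Python. It only wraps the two md sysfs files already named in this thread (`sync_action` and `mismatch_cnt`); the helper names are made up, the array name md0 is an assumption, and it needs root on a machine that actually has a Linux software RAID array.

```python
from pathlib import Path

def start_check(md_dir):
    """Ask the md driver to scrub the array (same as `echo check > sync_action`)."""
    (Path(md_dir) / "sync_action").write_text("check\n")

def mismatch_count(md_dir):
    """Read mismatch_cnt, the counter md reports for the check."""
    return int((Path(md_dir) / "mismatch_cnt").read_text().strip())

# Usage (as root; the md0 path is an assumption, adjust to your array):
#   start_check("/sys/block/md0/md")
#   ... wait for the check to finish ...
#   print(mismatch_count("/sys/block/md0/md"))
```

Taking the directory as a parameter keeps the array name out of the helpers, so the same two functions can be pointed at any md device from a cron job.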
Isn't it more cost effective to do RAID 1, with a nightly backup to an external drive? At least in my home, I do not require mission-critical hot-swapping capabilities. Then again, I only have 3x 1TB hard drives. Also, after RTFAing: the author of the article assumes that an unrecoverable read error corrupts your RAID array. It does not; typically your bad sector gets added to the list and mapped out of being used. Speaking of used, the article assumes that the entire drive is being used, but if the error is on a part of the drive not covered with data, this is also a non-issue.
I will not be pushed, filed, stamped, indexed, briefed, debriefed or numbered. My life is my own.
How much is your data worth? Spend accordingly.
$ = D/?
money spent on backup is equal to value of data divided by %variable%
"Apparently, RAID 6 isn't far behind"
RAID 6 is not new, 3ware has it for some time now.
I'm self employed, and my computer(s) hold both personal home stuff as well as work files. My work files mainly consist of translated documents in Word, Excel, PPT and PDF format. My accounting data is a couple MB in size, although I have it all printed out on paper due to legal requirements.
That said... for any home user with 12TB of data (12TB!? WTF!?), I'm willing to bet that the important stuff, like me, will all fit on a DVD. Maybe even a CD. Make a habit to burn a new DVD/CD on Friday evening, and keep up to 5 generations. That's a maximum of $5 per month spent on backups. Dirt cheap, and relatively effective. If you don't want to lose more than a day's worth of data, just make duplicate copies to a thumbdrive whenever you feel the need.
The majority of files on a large HDD that are so big that DVD/CD backups aren't realistic, are files that probably will cause an inconvenience at best when a disk fails. (Again, I'm talking about home users, not corporate users.) Movies, music, home videos, Games, etc. You have the original media, yes? I'm betting that anyone that has more than 2TB of storage, and is nearly saturating it, is doing a lot of P2P downloading. Disregarding the legality of it for a moment, this data can be gotten again. It will take time, but it isn't lost for good.
If you have more than 100GB of data that is critical (I'm not talking about getting your panties in a knot over a few lost seasons of a TV show), then it's time to start thinking really, really hard about investing in a serious (and very expensive) backup system. Spare hard drives seem to be cheaper for smaller volumes of data. Just have 5 external HDDs where you can dump everything. If you have larger volumes of data, there's tape (which is expensive). It's a matter of how much you value your data.
If you're self employed doing 3D modeling and rendering or other media related work where your livelihood lies at the mercy of large volumes of data being intact, you need to make the necessary investment. If your sales don't rake in enough money to cover the essential equipment you need for your trade, you need to re-examine your business model.
Long story short, most people don't have 12TB of important data. Most people don't have 12GB of data for that matter! Put in that perspective, backups aren't hard, or expensive.
I should be safe now, right?
You see? You see? Your stupid minds! Stupid! Stupid!
RAID is not a backup solution.
For those administrators who think it is, you should keep your resume on the array.
I am the unwilling control for my Origin.
btrfs should be an option on Linux, for those who care to go that route
I used to use the old punch card system to back up my data. Sure it takes a while but it was totally worth it... Until one day, while attempting to move the many boxes full of carefully sorted cards, I fell down the steps and the cards went everywhere. I learned from that mistake and started writing everything down on paper... Lots o' 1's and 0's, my hand hurt.. A lot. But there was a fire at my off site :( so I had to resort to the ultimate old school backup. A chisel and a rock... a really really big rock.
Eating the brains of your enemies does not make you smarter. But it's still fun.
RAID 5 will still be orders of magnitude more reliable than just having a single disk.
No sig today...
Not only are there two parity drives, but the operating system can perform automatic scanning of the drives to ensure that all data and parity disks are correct and silently correct any errors that occur on only one disk. It only takes a few days to scan 12 TB, and if this is done often enough the probability of two failed disks plus a previously undetected unrecoverable error on a third disk is quite a bit lower than the failure rate for RAID5. RAID5 volumes can be automatically scanned, but if corruption is detected there's no way to know which of the disks was actually incorrect, barring an actual message from the hard disk. Silent corruption is a much bigger enemy of RAID5 than RAID6.
I don't know why the article focuses on RAID5; RAID1 or RAID10 will have exactly the same issues at a slightly lower frequency than RAID5, but more frequently than RAID6.
Ultimately, the solution is simply more redundancy, or more reliable hardware. RAID with 3 parity disks is not much slower than RAID6, and dedicated hardware or increasing CPU speed will take care of that faster than drive speeds increase.
RAID???!!! Aaaaaaah! (Drive dies.)
article from 2006: http://lwn.net/Articles/190222/
Nothing to see here. Move along.
"Time is nothing; timing is everything."
I have to say, the ZFS folks have convinced me. There are simply too many places where bit rot can creep in these days even when the drive itself is perfect. The fact that the drive is not perfect just puts a big exclamation point on the issue. Add other problems into the fray, such as phantom writes (which have also been demonstrated to occur), and it gets very scary very quickly.
I don't agree with ZFS's race-to-root block updating scheme for filesystem integrity but I do agree with the necessity of not completely trusting the block storage subsystem and of building checks into the filesystem data structures themselves.
Even more specifically, if one is managing very large amounts of data one needs a way to validate that the filesystem contains what it is supposed to contain. It simply isn't possible to do that with storage-system logic. The filesystem itself must contain sufficient information to make validation possible. The filesystem itself must contain CRCs and hierarchical validation mechanisms to have a proper end-to-end check. I plan on making some adjustments to HAMMER to fix some holes in validation checking that I missed in the first round.
-Matt
gee I sure hope someone invents a way to bind raid clusters together soon. Oh it's called storage area network? Great! Oh we can even bind those together with things like SVC? Awesome! Needless to say, I don't get this article. Using the biggest disks you can get is a dumb move anyway and has never been necessary.
Can we just stop pretending ZDNet is a news source please?
PerfectRAID(TM) is Promise's patented RAID data protection technology [...]
RAID is not perfect, not by any stretch, [...]
But I thought...? ;P
Can tolerate a complete drive failure + hundreds of unrecoverable reads per drive on the two remaining drives. The larger the disk, the less likely that both remaining drives will fail on the same sector, so larger drives are an advantage, not a disadvantage, compared to data split across drives that has to be "rebuilt" from the parity info ...
I concur but in a less-condescending way: if there are any people that are already on the way towards building a giant RAID, can someone please give this a try to see what actually happens? (That is, fill the drives with test data, make the array rebuild once or twice, and see if bad stuff happens.) The article is based off the URE spec, but maybe real-world URE rate is lower.
But to be fair to the article, note the date on the post: July 18th, 2007. It wasn't "trivially testable" at the time; terabyte drives weren't exactly cheap yet.
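Until someone runs that experiment, the spec-sheet arithmetic is easy to reproduce. The sketch below assumes the simplest possible model: every bit read fails independently at the quoted rate (1 in 10^14 or 1 in 10^15 bits), and a "TB" is 10^12 bytes. Real drives do not fail this way, so treat the result as what the spec implies, not as a prediction. The function name is made up for illustration.

```python
import math

def p_at_least_one_ure(bytes_read, bits_per_error):
    """Chance of one or more unrecoverable read errors while reading
    bytes_read bytes, assuming independent errors at 1 per bits_per_error bits."""
    bits = bytes_read * 8
    # 1 - (1 - p)**bits, written with log1p/expm1 to stay accurate for tiny p.
    return -math.expm1(bits * math.log1p(-1.0 / bits_per_error))

TB = 10**12  # decimal terabyte, as drive vendors count

# Rebuild of the article's 7-drive array: read the 6 surviving 2 TB drives.
p_desktop = p_at_least_one_ure(12 * TB, 1e14)     # roughly 0.62
p_enterprise = p_at_least_one_ure(12 * TB, 1e15)  # roughly 0.09
```

Under this model the 10^14 spec gives a bit better than a coin flip of hitting a URE during the rebuild, which is a serious risk but short of "almost certain". If real-world rates are an order of magnitude better than spec, the risk drops to under one in ten.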
if your first step is to use the largest drive available in a performance raid, then your first step is probably a misstep.
it's better to use those raids with the smaller-yet-faster drives, ie the 10k+ rpm raptors or perhaps some yet to be seen faster better (stronger harder?) SSD's and use the latest ultra large HDDs in a simple redundant raid setup for backups.
multi raids ftw
sigs... don't talk to me about sigs....
The theory of more disks = more disk failures sounds totally logical. But in practice it does not work at all. For 5 years I ran 5 servers with various IDE RAID5 and RAID1 solutions (Promise, HighPoint). There was a total of about 20 IDE disks. I see a disk failure about once every two months. About 3 years ago I added a Dell PowerEdge 2600 running TWO SCSI U160 disks on a SCSI RAID1. A single disk fails about three times a year on the Dell. I found a cheap NetApp F760 NAS. It has three disk shelves of 72 GB Fibre Channel drives for a total of 28 disks (2 TB) making up 4 RAID 4 volumes. I've had this for a little over a year running iSCSI for database servers and have yet to see a single disk failure. NetApp uses a technology, WAFL, that works on much the same principles as ZFS and has in fact been in production for more than a decade. But I digress.
My point here is, the number of disk failures in a particular IT system cannot be generalized upon. There is no global rule for disk failures. My guess is there are so many different reasons for failure that it is practically impossible to predict how a system will behave without looking at the system itself, not at the cloud of disk failures. In my case I had a bunch of IDE disks failing at one rate, a bunch of SCSI U160's failing even more frequently, and a whole lot more fiber channel disks that have yet to fail at all!
Also, the whole premise of the article emphasizes the failure of RAID5, then says enterprises won't be affected. But what typical home user even uses a RAID 5? If I were going to give my mother a RAID it would be a RAID 1, not a RAID 5. Furthermore the typical user doesn't even know what RAID is! The typical user still thinks his single HDD is safe! We (geeks) have a long way to go in educating the typical user before we get to the RAID5 is unsafe part, which is untrue anyway. A good disk controller will recover the data in your RAID 5, even with a URE rate of 1 in 10^14 bits. As with most generalizations and statistics, this one is clearly false. I'm sure Seagate loves that Samsung's failure rate affects their drives' failure rate somehow! The title would be better phrased "Crappy disks and crappy disk controllers are crappy". Hmmmm, yes, I like that much better. The previous title was boring and pedantic.
So am I on crack? What am I missing when they say MTTF @ 1,000,000 hours... By my calculation that's 114 years... Even if what the report says is true and the MTTF is more like 300,000 hours, that's still 34 years. I keep a stack of spare SATA drives on hand at work because they go bad so frequently, and that's in a climate-controlled, power-smoothed room. I'd say over the last two years I've replaced 6 drives against 14 servers. Granted I haven't had a bad one in quite a while (knock on glued particle wood fibers w/ a simulated cherry veneer).
So what am I missing???
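One way to read those numbers: MTTF is a statistic about a population of drives inside their service life, not a promise that any one drive lasts 114 years. The sketch below converts MTTF into an annualized failure rate and an expected failure count for a fleet, assuming a constant failure rate (the usual exponential model). The 100-drive fleet is a made-up figure for illustration, since the comment does not say how many drives sit in those 14 servers.

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8760, the same conversion that yields "114 years"

def annual_failure_rate(mttf_hours):
    """Probability that a given drive fails within one year of continuous
    operation, assuming a constant (exponential) failure rate."""
    return 1 - math.exp(-HOURS_PER_YEAR / mttf_hours)

def expected_failures(n_drives, mttf_hours, years):
    """Expected failures across a fleet, with failed drives replaced as they die."""
    return n_drives * years * HOURS_PER_YEAR / mttf_hours

afr_spec = annual_failure_rate(1_000_000)   # a little under 1% per year
afr_field = annual_failure_rate(300_000)    # a little under 3% per year

# Hypothetical fleet of 100 drives over the two years mentioned above.
fleet_spec = expected_failures(100, 1_000_000, 2)
fleet_field = expected_failures(100, 300_000, 2)
```

So even at the vendor's figure, a room full of drives should see a failure or two every couple of years, and at the 300,000-hour figure several. Six replacements in two years is not wildly out of line with the lower figure for a fleet of that rough size.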
Assuming that you'll have another drive fail before it can rebuild is pretty alarmist. Sure, it can happen... I've *seen* it happen. But it's not the norm.
Most people who run RAID5 are running pretty poor hardware implementations. But on a board with Raptors, via multiple (quality) SATA controllers each connected via PCI-E (avoiding bus contention), I've seen RAID5 rebuilds of over 200 MB/sec. That's a pretty far cry from something like 8 drives hooked to a 32-bit, 33MHz PCI SX8, and getting a tenth of that.
Oh, you're not stuck, you're just unable to let go of the onion rings.
Isn't ZFS a filesystem? Why would I care about what filesystem I am using when I am trying to protect my data from disk failures?
Slashdot: Failed Car Analogies. Amateur Lawyering. Anecdote Battles.
First they say that these new improved lower failure-rate drives along with bigger disks are going to kill RAID5. Then, they try to tell us that if 1 drive failed it's likely you will lose the array due to an unrecoverable error on the remaining array drives. So which is it? Are hard drives becoming more reliable or more error prone? You can't have it both ways. This sounds like complete crap to me. I'll still be using them at home, and probably 5 years from now too.
RAID-5 is UPTIME protection, not DATA protection. What I mean by that is if you have a non-redundant configuration (single disk, RAID-0, JBOD) and a disk fails, your shit is down. You can't use your system any more until you get the problem fixed. Then, it is a rebuild situation. So you can be facing a few days of downtime, maybe more. That can be real annoying.
Well RAID-1 and RAID-5 solve that to a large degree. Now a disk failure doesn't cause a system failure. A disk fails and, provided you can get a replacement before another one fails (and it is a good bet you can), there is no problem. You continue operating, maybe with a bit of a speed reduction but that's all.
That's why I have a RAID-5 at home. Used to run a RAID-0, since I keep good backups. Well, then a disk failed on me. Didn't lose any data, but I was sitting around with no computer to use, and it was Friday evening which meant that any order from an online shop wouldn't be there till Tuesday even if ordered with fast shipping. I elected to go buy disks locally, despite the premium charged by CompUSA, but this time got enough for a RAID-5.
The issue was never data loss, the issue was that my system was out and would remain so for quite a while. If that is an issue for you, then RAID is a good answer (since drives are probably the single most likely thing to fail other than fans). However backups are a completely different matter.
The reality is this:
* The home user who creates a RAID-5 array will continue to make a RAID-5 array and risk the data loss.
* The businesses that use RAID-5 arrays will likely fall into one of three groups. Note, I'm not saying that a business won't use all three of these groups either.
1. Lots of small capacity disks for performance (with or without off-site backup)
2. Fewer larger capacity disks (most likely with offsite backup)
3. RAID 5+1 users... data security by mirroring every drive in the set.
Personally, at home, I use a RAID-5 array. I'm fully aware of the implications and the likelihood that if I lose one drive, there's a chance a second one will die while I'm rebuilding... that's OK with me, however, since I've gained 2 things.
1. Some redundancy to prevent complete data loss in the event of an outage.
2. Higher performance disk (for reads), which makes a difference when I launch stuff remotely. Believe it or not, RAID-5 over a GigE connection is faster than local disk.
Indeed.
Personally, I believe the correct answer to ensuring data recoverability is RAID together with real-time replication. You can usually accomplish this with a very acceptable price-point.
RAID is important to prevent down time due to a single disk failure, and replication prevents loss of data due to an array failure.
Personally I think RAID5/6 will be around for a very long time because it works and there's actually people out there that use it correctly (versus SO MANY people on Slashdot, apparently.)
Gosh, there's even one guy a few posts up that's claiming "The biggest problem with RAID is DECAY." Holy crap. Any RAID card made in the last 10 years will periodically scrub the disks and make sure the parity is correct - or else it will mark a disk as bad.
- It's not the Macs I hate. It's Digg users. -
dumbass boyond raid is all you need atm gook bye you =FAIL //
d-_-b
Raid 1+0 on the machine - We want a safety net to deal with minor failures and keep the machine online, and the drives need to be fast.
Automatic backups by the hosting company, to local and offsite tape, twice a day, I'm told verified at both locations after written to tape against the original snapshot.
Automatic backups synced to our company office, once a day, verified after arrival at the office against the md5 and sha hashes of the data taken when the original backup was created.
Copies brought offsite to a safety deposit box whenever I feel like doing it, just in case everything else fails.
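The hash verification step in that list is simple to sketch. This is not the poster's actual tooling. It just shows the idea of recording digests when the backup is created and re-checking them once the copy arrives. The comment says "md5 and sha" without naming a SHA variant, so SHA-256 is an assumption here, and the file name and contents are throwaway stand-ins.

```python
import hashlib
import os
import tempfile

def file_digests(path, chunk_size=1 << 20):
    """Return (md5_hex, sha256_hex) for a file, reading it in 1 MiB chunks."""
    md5, sha256 = hashlib.md5(), hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
            sha256.update(chunk)
    return md5.hexdigest(), sha256.hexdigest()

def verify_copy(path, recorded_digests):
    """True only if both digests of the copy match those recorded at backup time."""
    return file_digests(path) == recorded_digests

# Demo on a throwaway file standing in for a backup archive.
with tempfile.TemporaryDirectory() as tmp:
    backup = os.path.join(tmp, "backup.bin")
    with open(backup, "wb") as f:
        f.write(b"database snapshot" * 1000)
    recorded = file_digests(backup)          # taken when the backup is created
    intact = verify_copy(backup, recorded)   # checked after the transfer
    with open(backup, "r+b") as f:           # simulate one damaged byte
        f.write(b"X")
    damaged = verify_copy(backup, recorded)
```

Note what this does and does not catch: it proves the copy matches the original, not that the original was good, which is exactly the gap the rest of the story falls into.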
Are we safe from all harm? No, there's a good chance that a nuclear detonation near our office, which is near the primary datacenter, will get most of our backups instantly, and possibly get the secondary backups provided by the hosting provider. However, as I told the president of our company, if that happens, I'm really not going to give a damn about restoring our data.
And ... the whole thing was screwed because no one noticed some of our reports were acting wonky in time to keep the last good backup (last snapshot of the database before the server or OS screwed up the file itself) from cycling out. This was the time when that random old backup laying in the safety deposit box saved us.
When that event occurred, since we had NEVER gone to the safety deposit box before, no one even thought about it when we were in the initial panic failure mode. It was actually a few days later when discussing the 'data loss' that the guy who actually controls the safety deposit box said 'What about the backups I take offsite?' and finally my obviously slow brain kicked in and said 'duh'.
For the record, I have since turned in my sysadmin card for obvious reasons and become a developer, and I no longer believe in backups; that's what the sysadmin is for! Now I make the problem worse with bad code rather than being responsible for and protecting someone else's data.
Sorry, what was my point again?
Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
Just an FYI, the default behavior in Debian is to scrub all of your software RAID arrays monthly via a cronjob that comes with the mdadm package (see the checkarray script).
I wouldn't be surprised if other distros do something similar.
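For the curious, a scrub of a Linux software RAID array is driven through the md sysfs files `sync_action` and `mismatch_cnt`. The sketch below assumes an array named md0 and root privileges on a real system. The helper names are made up for illustration, and the demo at the bottom runs against a fake directory tree so it can be tried safely without touching a real array.

```python
import tempfile
from pathlib import Path

def md_dir(array, sysfs_root="/sys"):
    """Directory holding the md control files for one array, e.g. /sys/block/md0/md."""
    return Path(sysfs_root) / "block" / array / "md"

def request_check(array, sysfs_root="/sys"):
    """Ask the kernel to start a read-and-compare pass over the whole array."""
    (md_dir(array, sysfs_root) / "sync_action").write_text("check\n")

def mismatch_count(array, sysfs_root="/sys"):
    """Number of mismatched sectors reported by the most recent check."""
    return int((md_dir(array, sysfs_root) / "mismatch_cnt").read_text())

# Dry run against a fake tree; on a real box you would call
# request_check("md0") as root and poll mismatch_count("md0") afterwards.
with tempfile.TemporaryDirectory() as fake_sys:
    d = md_dir("md0", fake_sys)
    d.mkdir(parents=True)
    (d / "sync_action").write_text("idle\n")
    (d / "mismatch_cnt").write_text("0\n")
    request_check("md0", fake_sys)
    action_after = (d / "sync_action").read_text().strip()
    mismatches = mismatch_count("md0", fake_sys)
```

On a live system the check runs in the background for hours, and progress shows up in /proc/mdstat.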
They don't grade fathers, but if your daughter's a stripper, you fucked up. --Chris Rock
A Black Swan is an event that is highly improbable, but statistically possible.
Yes, it is possible for a drive in a RAID 5 array to become absolutely inoperable, and for one of the other drives to have a read failure at the same time. This is highly unlikely though, and is not the Black Swan. The math used to calculate the likelihood of these two events occurring at the same time is faulty. The MTBF metric for hard drives is measured in 'soft failures'; this is very different from a 'hard failure'.
The difference between the two types of failures is that a soft failure, while a serious error, is something that the controlling operating system can work around if it detects it. It is extremely unlikely that a hard drive will exhibit a hard failure without having several soft failures first. It is even more unlikely that two drives in the same array will exhibit a hard failure within the length of time it takes to rebuild the array. In my experience, it is more likely that the software controlling the array will run into a bug rebuilding the array. I've seen this with several consumer-grade RAID controllers.
The true Black Swan is when a disk in the array catches fire, or does something equally as destructive to the entire array.
To echo other people's points, RAID increases availability, but only an off-site backup solves the data retention problem.
...is rising to the level of complexity of a simple operating system. They do a lot of very smart things in there to catch problems before they become unrecoverable, and they even report them to you via SMART so you can tell when they're at an increased risk of failure.
The catch is, there are several different ways to optimize firmware. Drives intended to be used in RAID arrays have different firmware from drives intended for desktop or laptop use. If you use desktop drives in a large RAID array, with a rather fault-intolerant parity RAID implementation, YOU ARE AN IDIOT AND DESERVE WHAT YOU GET.
The hard disk vendors aren't idiots. They make nice, fat margins on drives that get used primarily in RAID 5 arrays, and they're not about to let that revenue stream dry up. You'll have to pay a little more for the RAID-optimized drives, but the price gap between bargain SATA and RAID SATA is much smaller than the gap between IDE and SCSI, so it's still worth it to an awful lot of people.
There's no failure quite as dissatisfying as a complete and total solution to the wrong problem.
I think anyone talking about HDDs here, whether SCSI, ATA, Fibre Channel or SATA, is talking about the 3.5" form factor.
If you're willing to give up performance, storage size and space, are the 2.5" or 1.8" hard drives less *prone* to failure than the 3.5" ones? Say I want to be reckless, back up to a few DVDs and smaller format HDDs - different batches, maybe different manufacturers. Unplug them once copied.
Personally, I have well under 10 GB of REALLY essential work, photos, etc. I mention this because a drive of equal or greater capacity in the 1.8" form factor would be affordable. The rest I couldn't care less about. So how do I best protect it?
As the article mentions ZFS, would standardizing on one filesystem across Windows, Mac, Unix and Linux be a good idea with ZFS? I had a Mac and there's all kinds of crap software I had to install to get Windows to recognize the drives. Apple is considering ZFS. On Linux it's an option(?) and the BSDs (are working on it?).
Just encrypt all your data, rename it to $porn_dvd.part_$n and throw it on any filesharing network you like.
Your data will be safe for as long as the internet exists, and due to the increasing number of broadband users, sharing 12 TB of data is not a problem at all.
who has 12tb of data at home? i can only imagine that you have a nice media server. in which case, all of your music and movies should already be backed-up for you on the original media. maybe you should consider your downloaded porn expendable in the event of a disk failure. 12tb of family photos? maybe you should get from behind that camera and actually interact with your family.
if you really have 12tb of data at home that you cannot afford to lose, then you can afford to setup another box with 12tb of capacity and sync them.
I have no idea. But I don't. I use multiple servers. Hard disks are cheap.
I run linux software raid. If a drive fails, I'm notified, and then scripts stop samba, sendmail, ftp, etc. When they are stopped, the system checks two different secondary servers, which are also raid 5, and it will rsync the delta over to one secondary, and then the other secondary.
Compare the number of bytes involved in an emergency rsync, the last regularly scheduled rsync being at most 8 hours old, to a full blown rebuild of a drive.
Teensy weensy. Even with two complete duplicates. And one of those secondary systems contains rsync snapshots covering 12 months of history.
Then the primary machine waits for my intervention, limping along in degraded mode. I can do a rebuild with a hot spare that is already installed. Or, if the drives are already fairly old, I replace the whole set.
Never needed tape. Too slow. Too expensive. Not very flexible.
My Primary system is 4 TB (1 TB worth to parity)
Secondary One is 4 TB (1 TB worth to parity)
Secondary Two is 8 TB (2 TB worth of parity)
The Secondary Two has 60 days of snapshots, and a monthly snapshot for the last 12 months.
Drive failures are lowered dramatically if you take a $5 case fan and put into position to blow over the top of each drive.
How about RAID-60? You can lose up to four drives (two per sub-array).
OK. I'll bite... " it is almost certain it will see an [unrecoverable read error]." What? Like most stats the 12 terabytes failure rate doesn't mean anything. If it were true, then the servers I have running 4 terabyte arrays would be failing all the time.
What's all this talk about DVD? Why not use Blu-ray? What's it got, 50 gigs? A little better... 240 discs for the example above where someone said 1600+ DVDs. Also, Blu-ray discs today have 2 layers. In the future I am sure that will increase, or the next optical medium will be even denser. That being said, I still believe that SSD is the way of the future.
-EL
... or rather, haven't noticed, the fact that nobody has realized this article was written in mid-2007?
Why has it taken this long to hit slashdot? Is there a time warp here...
I work for a storage company. And have been in the storage industry for about 8 years now. And here is the problem with this whole story. "...data from the failed drive, it is almost certain it will see an [unrecoverable read error]." Why is this almost certain? I haven't seen this happen...ever...to the extent that the RAID set wouldn't rebuild. That's, like, the job of RAID 5. Yes, it will take longer to rebuild 2TB drives. And people are building faster RAID controllers. You can always do RAID 1 (which isn't dead, but sure is damn expensive). And with the advent of SSDs...and RAID controllers with a shit load of write cache...you can do what TMS does and RAID 1 from your 128GB SSDs (however many you may have in your array) to a set of SATA disks (which can themselves be mirrored). This way you have multiple failure domains. And although you lose performance, you don't lose DATA. There are always smart people out there to solve such problems...but I see that it's quite sexy to just call something dead. I like the term (from Princess Bride) "Mostly Dead".
Sometimes RAID 5 drives fail because of faulty controllers. I've had several drives "go bad" but they were perfectly fine. RAID 5 sucks. Takes too much time to rebuild. I'll only mirror or mirror+stripe from now on. Storage cost is inexpensive enough that having down-time from rebuilding is worth spending more money on a few more disks.
The article says "RAID 5 Stops working" blah blah blah. That's not the case. The purpose of RAID 5, as has been mentioned several times in the comments, is to give you MORE time to recover from a failed drive.
No matter what size your array is, having more time to recover from any failure is invaluable. Therefore, the fact that bigger drives are available does NOT invalidate the value that RAID5 has to an organisation.
Having said that, RAID5 is NEVER considered the ultimate means of protecting your data. If you think it is, then think again. You must always have multiple copies of your data, in multiple locations, and preferably on magnetic media for long-term storage if necessary.
I have a collection of servers, all with various amounts and types of data. These servers have RAID arrays, some mirrored drives only, others with Mirrors and RAID5 for more important data.
Each are incrementally backed up hourly to a "backup server" on to another RAID5 array.
A daily backup is also taken, on to another path.
Each night, this backup server's data is written to an LTO4 tape, and the next morning it is taken off-site.
We also keep monthly tapes on-site, and off-site.
At any one point, I can recover data to any server from up to an hour ago, as of last night from either yesterday's daily HDD backup, or last night's tape. Or at the end of every month for as long as we've had this backup strategy.
This is the best backup strategy I could come up with the budget I had, and I don't pretend that it's the best in the world.
But simply relying on RAID 5 and nothing else, you're asking for serious trouble.
The only solution is to regularly read everything:
The chance of avoiding double errors in the form of unreadable sectors during rebuild about doubles each time you halve the time between full reads of all sectors on a drive. (True to about weekly full reads.)
This is because a full read will allow each drive in the array to discover sectors that are becoming iffy (soft/recoverable read errors) and then remap them.
See lwn.net for a discussion and links to some good papers.
Terje
"almost all programming can be viewed as an exercise in caching"
I've only seen Raid Ant Baits III. Where can I pick up some Raid 5?
Well played you Magnificent Bastard (TM) :)
Nothing is perfect, but when you use the word perfect in a trademarked name it sounds really good.
I've seen 3 cases of data loss this year in 6 drive RAID5 with 750GB or 1TB drives. One drive failed, and a second drive failed completely during the rebuild or there was a read error on another drive during rebuild. That is not a fluke given that rebuilding can take days if the system is under high load.
Scrubbing costs too much performance if you want it to be useful. We're using RAID6 in newer products.
thegodmovie.com - watch it
Tapes are overrated. As long as you are careful not to drop them, HDDs are pretty decent.
;).
AU$1000 buys you about five 1TB SATA drives.
So AU$3K can buy you: 5 x daily 1TB drives, 5 weekly, and 5 month.
AU$1K is enough to buy a "build it yourself from decent parts" server - decent power supply, 4 x SATA, 2 x 1Gbps NICs, a core 2 duo (so you can do gzip to two "backup media" drives at the same time ), 2x 1GB RAM (not important but hey it's cheap) and a UPS+power filter, and special removable caddies for those HDDs (make sure they won't overheat the HDDs).
With tape drives, if there is new higher capacity tape technology, you will need to buy a new very _expensive_ tape drive to take advantage of it.
With HDDs you get a whole drive mechanism along with your "media".
If in 5 years time if there are cheap 10TB hard drives, you just buy them and use them.
Whereas if in 5 years time there are cheap 10TB LTOx tapes, you will need an expensive LTOx drive.
For the past decade or so that has been the trend.
I bet the SATA interface will be around for a long time, so in 5 years you'd still be able to read _most_ of your backups. Even if the bearings seize up due to age and lack of use, the data is likely to still be on the platter.
Now if you require _hundreds_ of tapes worth of storage, then using tapes as your "media" may become more viable than using HDDs.
But I get the impression you're not facing that scenario, so HDDs should be fine.
There are special caddies for HDDs, so that you can plug and unplug them.
So you could buy a Backup Server with a gigabit NIC or two, install the caddies, insert the drives, and then do backups.
Seems doable to me. After all AU$4000 is a lot of money in Malaysia where I am (we're the cheap labour, "low brains" country).
So, how much are they paying you in Australia to solve problems like this?
If a sector fails during an array rebuild in RAID-5 (after a complete drive failure), you lose one stripe's worth of data (ex., 64 kB x N-1, where N is the number of drives), you don't lose the entire array. Following the article author's logic, if you have a read error in a single drive, then all the data on the drive is lost.
It's amazing that ZDNet would pay someone this clueless to write an article about this subject, it's amazing they published it without any verification, it's amazing that the article is still online (and essentially uncorrected) after almost one year, and it's even more amazing that someone decided to post this on Slashdot.
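The stripe arithmetic above works out as follows. This is a rough sketch assuming the 64 kB chunk size from the example, the article's 7-drive array of 2 TB drives, and a controller that carries on past the bad sector (not all of them do, which is part of what the thread is arguing about). The function names are made up for illustration.

```python
def stripe_data_bytes(n_drives, chunk_bytes=64 * 1024):
    """Data held in one RAID-5 stripe: one chunk per drive, minus one parity chunk."""
    return (n_drives - 1) * chunk_bytes

def fraction_of_array_lost(n_drives, drive_bytes, chunk_bytes=64 * 1024):
    """Share of the array's usable capacity that sits in a single stripe."""
    usable_bytes = (n_drives - 1) * drive_bytes
    return stripe_data_bytes(n_drives, chunk_bytes) / usable_bytes

lost_bytes = stripe_data_bytes(7)                  # 393216 bytes, i.e. 384 KiB
lost_share = fraction_of_array_lost(7, 2 * 10**12)  # about 3.3e-08 of 12 TB
```

Losing a few hundred kilobytes out of 12 TB is still data loss, and it may land in filesystem metadata, but it is a very different outcome from the whole volume being unreadable.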
I just had this very same scenario happen to me about two weeks ago, except I only had about 1 TB and happened to have most of it backed up to an external drive. I had to manually rebuild the RAID array and I am now awaiting the arrival of my 1.5 TB hard drive. I am just going to manually back up the data on this drive and use my three 500GB hard drives in a RAID 1 array instead of RAID 5. It's not worth it to me to have a redundant system that doesn't work. I don't want to lose my entire collection of movies and music due to another poorly manufactured hard drive.
RAID was developed when the biggest hard drive was 100MB and people needed easy ways to prevent data loss. RAID was also for enterprises; only recently was it opened up to home users/gamers using RAID 0, 1, 10, etc. This doesn't surprise me one bit, yet people still make a big deal of it. The bigger the array, the more prone you are to disk issues and data loss during a rebuild.
50 pin ribbon cable?! in the same mess as all of those GLORIOUS amphenol cables?!.... connected to half height SCSI drive?!?!
At least 2 of the drives appear to be SCA-2 80 pin drives... but one of them appears to be of the 4.3 gig vintage..... (maybe a 9.1?)
(And seriously... what SCSI cards are you using again?!)
Goodness, even the summary says "didn't back up? bummer!". Yes, we all know RAID only hedges against hardware failure. The point of this whole exercise is that RAID 5 doesn't even adequately help with hardware failures once data per drive grows large enough.
URE stands for uncorrectable read error, so corrections via ECC should already be factored into that spec.
Why are you describing ZFS as the only option, are you working for Sun?
Real-time remote replication and distributed storage are real alternatives to RAID 5 or 6.
No need to use Solaris. There's a ton of very efficient tools to do that on Linux, like the excellent Zumastor project.
{{.sig}}
Thanks for the real-world data! Out of curiosity, what was the claimed URE rate on those drives?
The reason I ask is: It looks like your observed rate is better than 1 in 10^15 with 96% confidence, and probably nearer to 10^16. Getting an extra order of magnitude (i.e. from enterprise drives, which are 10^15) would be pretty impressive. But it would be quite astonishing if they claimed only 10^14 and actually gave 10^16 instead.
Whoops, technically it seems the "unrecoverable" expansion is more popular than "uncorrectable". But in either case, ECC should already be factored in.
I think the journalist should be fired for ethics violation. If not, the RAID manufacturers should bring the ZDNET news website to court over defamation charges. The journalist spread false information, fully knowing that it was false information!
He was totally aware that true hardware based RAID-5 controllers regularly and automatically check the attached disks' surfaces to find sector errors and advise disk replacement preemptively, therefore a spot error on another drive during an array rebuild simply won't happen, period.
This false and defamatory article by the journalist is similar to yelling Fire! in a crowded theatre when there is actually no fire. That is NOT protected speech under the First Amendment and can be prosecuted, because it presents great danger to the patrons (i.e. a lethal stampede).
Similarly, sedition against RAID-5 exposes server operators to data loss (financial loss, loss of life-saving medical records, etc.) when they abandon RAID protection based on the journalist's false propaganda.
In time of war people spreading panic via false information are summarily executed for good. I do not think journalists should be allowed to spread lies if they know it is a lie or even if they simply omitted checking the information's veracity.
They should be held responsible, because with freedom of speech there comes responsibility for the content of their communication.
Firstly, the core determinants of HDD failures are:
The studies by CMU and Google are not broken down at the application level, i.e. - what purpose were the HDDs serving. For example an HDD serving as an archive will perform differently from an HDD doing constant defragmentation, for the sake of example, or other read/write intensive functions as compared to archiving.
Such a mashing is therefore "unfair". But ok, let's take the numbers produced by CMU and Google. Their rates of failure do seem to threaten RAID 5's (and other RAIDs') reliability with increasing disk sizes. This issue is immediately resolved by the RAID controller - but yes, it means an extra performance penalty for the RAID implementation.
As such, RAID 5 will not die. It's the RAID controllers that need to be more intelligent, at the expense of performance.
putting all your important data on one disk, is putting it al on one raid now also "eggs in one basket"?
So - 2TB RAIDS for everyone - and everything is happy.
I mean, who really needs to keep all their data in ONE place anyway?
...but nobody *forces* you to wait with the replacement of a harddrive until it breaks. You may do it earlier and regularly.
Ohne Worte
Yeah the oldies were more reliable. But... I did some archaeology on an old 386 desktop computer and the HDD still had the price tag on it. $630 for a 120MB drive!!! Nowadays we expect to get a whole home computer at that price and with all the latest. Obviously the production has become more efficient in terms of price but we make computers more and more as disposable and so we get what we pay for.
Stupidity is its own reward.
- Disks are a lot cheaper than Data
- Disk IO is the major bottleneck
That is why I have been sticking to RAID 0+1 since 1997.
and regarding the comments on offsite backup, you don't need flying monkeys to transfer 12TB of data every backup. Haven't you heard of rsync and database replication???
Oops, selected wrong moderation option. This reply is to wipe that moderation.
Well... 1 failed read per 10^14 bits PER PHYSICAL DRIVE. In the example you have 6 drives, EACH having STATISTICAL 1 failed bit per 10^14 bits read. Since the drives are smaller than 12TB, we're quite safe. Until we get an array of 12TB drives :)
I'll also tell you it's pretty serious when you see a Fedex plane with a cargo fire in the evening news and later find out that your shipment was on it. And oh... it's not covered because it was considered an 'Act of God'.
There is a very nice program called rdiff-backup, useful for cheap disk-to-disk backups. It incorporates incremental changes at your current backup, making it equal to the more recent version and keeping incremental deltas that you can apply to get the old versions (the reverse of incremental backups). Of course it isn't as reliable as proper disk-to-tape backups.
Now, about your situation, I bet all those terabytes don't have the same importance for the company. Are you sure that you can't provide extra protection for some of the data?
Rethinking email
License-incompatible with anything worth running it on, other than Solaris itself... which is NOT worth running (see #1 above)
What you mean to say is "Some Operating Systems whose merits can be debated are license incompatible with the license of ZFS." FreeBSD can implement ZFS. Why can't Linux? Because of its license, not that of ZFS.
Mac OS X.
Even if it was feasible to buy all these hard drives or a tape drive, the amount of time it would take to properly do all these back-ups on a useful time scale seems to be beyond the reach of the typical user. Even power users do other things in their lives than worry about their computers. I can't see somebody with enough free time to make CD or DVD or tape backups every so often. And if you are copying your whole 1+ TB drive then it would take forever. It may just be that because I'm a college student I have less time than most people with normal jobs, but I see my dad come home late from work almost every day, and then he's just too tired to want to do anything else. So maybe this whole discussion just becomes irrelevant because not too many people realistically have the time to be able to do all this backing up, and would rather just take the risk of running a RAID setup.
Too expensive? Really?
I pay about 13 USD/month for 90 GB of images at S3. I'm a hobby photographer, and those 90 GB are primarily RAWs along with a few PSDs etc., in case you're wondering why I have 90 GB of images.
I don't bother protecting my music and my movies etc. I've got lots of legal DVDs and CDs.. And my mailbox is IMAP - so no problem there. My contact list is synced to my phone both at home and at work. So, for 13 USD/month I've insured myself against losing 5 years of my life.
Stop the brainwash
The article is a year old, and basically just rehashes part of this paper from 2004...
Seriously...look it up. If you've got only 3 drives it's free, but go ahead and pay for the normal version. Mine works amazingly well; I have about 2.5GB of data including my DVD and CD collections. It runs great, is expandable, and fails more gracefully than RAID5. If one drive fails, you rebuild; if two drives fail you'll only lose one drive's worth of data. Bad, but not as bad as losing the whole array. I haven't kept up with the software (it works, therefore I don't mess with it), but there were plans a few months ago to implement a hot-spare option in case of drive failure.
You'll need a separate PC box, but for just a few drives they can be had fairly cheaply.
Is it just my observation, or are there way too many stupid people in the world?
On Slashdot, you don't need to make excuses for the size of your porn archive. We understand.
In the post the writer states that SATA drives have a URE rate of approx. one per 12 Terabytes. I would think that the URE rate applies to a single drive, meaning that on any given drive you can expect an error for every 12 TB read. The key here is *for any given drive*. Therefore isn't it erroneous to apply that error rate across multiple drives? Every drive resets the chances back to the 12 TB limit. Since drives are still limited to about 1-2 TB in size I would think this issue is years off. When drives are 12 TB then there will be issues.
Can someone tell me if I am wrong?
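For what it's worth, the per-drive question can be checked with a few lines of arithmetic. This is only a rough sketch: it assumes the spec means independent bit errors at 1 in 10^14 (the article's reading, which other posters here dispute), decimal terabytes, and a made-up function name.

```python
import math

TB = 10**12  # decimal terabyte, an assumption

def p_at_least_one_ure(bytes_read, bit_error_rate=1e-14):
    """Chance of at least one unrecoverable read error, assuming
    every bit fails independently at the given rate."""
    bits = bytes_read * 8
    # 1 - (1 - p)^n, computed in a numerically stable way for tiny p
    return -math.expm1(bits * math.log1p(-bit_error_rate))

one_drive = p_at_least_one_ure(2 * TB)         # reading one 2 TB drive
six_drives = p_at_least_one_ure(6 * 2 * TB)    # reading 12 TB in one go

# Treating the six drives separately and combining the results
six_drives_combined = 1 - (1 - one_drive) ** 6

print(f"one 2 TB drive : {one_drive:.3f}")
print(f"12 TB at once  : {six_drives:.3f}")
print(f"six drives     : {six_drives_combined:.3f}")
```

Under that assumption the last two numbers come out the same (roughly 0.62, versus about 0.15 for a single 2 TB drive), so what matters is the total number of bits read during the rebuild, not how many drives they sit on.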
When I last priced it out for the amount of data I have it was about 50 USD a month, and being in Canada that cost has been higher and higher recently. Plus, with my current setup of Bacula and DVD-RWs there's no monthly cost whatsoever, and I get much faster recovery times. Recently when my wife's laptop drive crashed she was back up and running a couple hours after I bought a replacement disk. With S3 I'd have to wait while my DSL connection downloaded over 40 gigabytes. Add in two more computers being backed up and that's a very large cost in time when recovering (or even backing up).
Again though I'm talking about an ideal solution for me.
Read error = another failed drive.
As most failed drives don't completely self-destruct, you can recover that data in the given scenario. A better solution (sans backup) is to mix the brands and hardware revisions of the drives in a RAID 5 array from the start, and to keep spares on hand, with a hot spare already in the group and an aggressive rebuild strategy.
Oops, selected wrong moderation option. This reply is to wipe that moderation.
Fair enough, but the two people who modded this "informative" should have replied to you for the same reason ;)
Disk mirroring (RAID 1, or "RAID 10" when combined with striping) is easy to set up on most systems these days. Disks get cheaper every year. The original reason for RAID was to minimize hardware costs while providing some redundancy for failure-recovery, back when a 1 gig disk used to cost upwards of $10,000. As hardware costs have declined, the financial incentive to cut these corners has also declined. With disks well below $1000, there really isn't much reason not to keep online mirrors and offline full image backups.
At home, now, I keep critical data on mirrored disks, and rotate several other full image backup disks offline. All the disks are identical make/model, and all are bootable. The offline disks are stored in a fireproof safe. I periodically send a disk to my brother just in case of fire or flood. When my critical data grows to fill one of those, I will get a half dozen new disks, each 3 or 4 times bigger than the old ones.
If you have multiple terabytes of personal data, you might want to consider whether all of it really needs to be backed up. The Quicken file is important. The sixty-fifth five-hour video of the sleeping baby might not be as important.
You mean "scrubbing"; I don't know what Promise's patent covers, but basic scrubbing and self-diagnostic monitoring has been built in to commercial-grade arrays since forever.
The point of that article was how to present poor statistical analysis (being someone who hated statistics at uni, I can see my own half-arsed attempts in the article).
Or perhaps it's fairer to say "poor statistical analysis accompanied by a lack of real-world knowledge".
Now, if you work with SANs and storage, you probably already see the faults that I really can't be bothered pointing out... we can just sit here and laugh together at yet another ill-conceived ZDNet article, shall we?
A successful RAID scrub depends on perfect error reporting. ZFS does not.
you had me at #!
...than RAID can, by design. That's the OP's point here. Disk failures are far from the only failure mode; and many failures are neither detected nor reported.
you had me at #!
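The point about undetected failures can be illustrated with a toy example. This is a sketch only, not ZFS's actual on-disk format: the dictionaries stand in for a drive and for checksums kept apart from the data, and all names are invented.

```python
import hashlib

def write_block(store, checksums, key, data):
    """Store a block and keep its checksum separately from the data."""
    store[key] = data
    checksums[key] = hashlib.sha256(data).digest()

def read_block(store, checksums, key):
    """Return the block, or raise if it no longer matches its checksum."""
    data = store[key]  # the "drive" reports success either way
    if hashlib.sha256(data).digest() != checksums[key]:
        raise IOError(f"checksum mismatch on block {key}")
    return data

store, checksums = {}, {}
write_block(store, checksums, 0, b"important data")

# Simulate silent corruption: the drive returns wrong bytes without any error.
store[0] = b"important dat4"

try:
    read_block(store, checksums, 0)
    detected = False
except IOError:
    detected = True
```

A scrub that relies only on the drive reporting read errors would pass over this block; the separate checksum is what flags it.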
Is Slashdot reporting getting slower or is it just me? The publication date on this article is July 2007. I'm sure that the marketdroids (and tech guys too) have had plenty of time to get ready for the Slashdotting...
The server is still up, if only as fallback, so I can look this up. My original numbers were only rough guesses.
One is a set of 4 Seagate ST3400832A (400GB, i.e. 1.6TB total), with the disks showing 31321h power-up time (that is 3.6 years). The second one is 8 Seagate ST3500641AS (500GB, 4TB total), with disk uptimes of 20548h, i.e. 2.3 years. I did reduce the scrubbing frequency to once every 30 days about a year ago, so let's factor that in.
Incidentally, there is an error in my original numbers: 2 years at once every 15 days is 50 complete reads, not 100, i.e. 200TB read. Sorry. Better numbers:
1st Array:
2.6 yrs @ 15 day interval, 1.6TB = 63 scans, 1.6TB = 101 TB read
1 yr @ 30 day interval, 1.6TB = 12 scans, 1.6TB = 19 TB read
2nd Array:
1.3 yrs @ 15 day interval, 4TB = 18 scans, 4 TB = 72 TB read
1 yr @ 30 day interval, 4TB = 12 Scans, 4 TB = 48 TB read
That is 240 TB read in surface scans alone, without a single uncorrectable. There is one disk with 4 reallocated sectors and one with 5, the rest is at zero.
As to the uncorrectable rate, Seagate says 1 in 10^14 reads is uncorrectable, i.e. a 512-byte sector read, if I interpret this correctly. That would mean one unreadable sector every 5.12 * 10^16 bytes, i.e. one sector missing every 51 PB. That would fit the observation. Seems there is a major combinatoric goof in the original article. It is still true that one in 10^14 bits read independently (!) would be unrecoverable, but the independence is not there, as you either get a complete sector or nothing.
To spin this further, if it really is 1 in 10^14 sectors, an 8 disk RAID5 array with 1TB disks has about 1.36*10^10 sectors on the 7 surviving disks and hence a chance of one unrecoverable sector of roughly 1 in 7300 during RAID5 rebuild. As to the 3% failure rate per year number, at 10MB/s rebuild write speed (conservative), the rebuild takes 28 hours, i.e. adding another roughly 1 in 10000 chance of a disk dying during rebuild.
Interesting...
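For anyone who wants to redo that arithmetic, here is a small sketch. It assumes the per-sector reading of the spec (1 bad sector per 10^14 sector reads), 512-byte sectors, decimal units, and a per-disk failure chance that scales linearly with time; the function and parameter names are made up for illustration.

```python
SECTOR_BYTES = 512
TB = 10**12  # decimal terabyte

def rebuild_numbers(disks=8, disk_bytes=1 * TB, sector_error_rate=1e-14,
                    rebuild_mb_per_s=10, annual_failure_rate=0.03):
    """Rough RAID5 rebuild odds under the per-sector reading of the URE spec."""
    surviving = disks - 1
    sectors_to_read = surviving * disk_bytes / SECTOR_BYTES
    # For probabilities this small, n * p is a fine approximation.
    p_bad_sector = sectors_to_read * sector_error_rate
    rebuild_hours = disk_bytes / (rebuild_mb_per_s * 10**6) / 3600
    # Chance that one given surviving disk dies during the rebuild window.
    p_disk_dies = annual_failure_rate * rebuild_hours / (365 * 24)
    return sectors_to_read, p_bad_sector, rebuild_hours, p_disk_dies

sectors, p_bad, hours, p_dies = rebuild_numbers()
print(f"sectors to read      : {sectors:.3g}")
print(f"bad sector in rebuild: 1 in {1 / p_bad:.0f}")
print(f"rebuild time         : {hours:.1f} h")
print(f"disk dies (per disk) : 1 in {1 / p_dies:.0f}")
```

With the defaults this gives about 1.37*10^10 sectors, a bad-sector chance in the region of 1 in 7300, and a rebuild of just under 28 hours. The disk-failure figure is per surviving disk; with 7 of them the combined chance is several times higher.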
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
Mod parent up. So many idiots and FUD here it's not funny.
Because root-mean-square is the ONLY way to get the REAL voltage.
RAID 5 is designed for 3 to 4 disk arrays. RAID 6 should be used up to 6 or 7 disks. Anything bigger, and you really need something like RAID 1 over RAID 5 or a clustering file system. AndrewFS, Lustre, or anything similar can make the chances of losing every copy of your data extremely unlikely.
You should still have data somewhere off-site just in case. A Lustre system at your primary location and a smaller Lustre system for compressed backups at a secondary location should be plenty of redundancy.
Stupid Promise Vendor. Nice job marketing your Solution.
The problem being described will have little effect on the home user, but it is a looming problem for the Enterprise user, where data sets are getting out of hand even today. People may laugh at a 12TB or larger filesystem as unworkable and definitely not easy to index or back up. How many LTO 4 tapes is that?? But the reality is that some Enterprises have filesystems that size and even larger. Some have files that are 1TB+ today. Think Oil and Gas researchers, who deal with huge datasets.
Whether you have one controller or a dozen controllers makes no difference: since it is still one common filesystem, you are guaranteed to run into the problem described in the article. The only way to work around it is to leverage new technologies and techniques that are not all well defined yet. RAID 6 will become the standard to replace RAID 5 just to get past the next few years, as described in the article. New SCSI standards will also be developed (think SCSI-4), as well as techniques like DIF (Data Integrity Field) that add metadata to hash each block of data to allow for error correction from the base hardware all the way into the application stack.
Make no mistake we are about to turn a corner and the next few years are going to change everything about how we store and retrieve data simply because the old paradigms have run their useful course.
Good luck and thanks for all the bits.
First, ZFS is trash. It's not even complete. There are over 300 fixes in the current patch cluster from Sun. Not to mention there are no robust tools to fix it once it breaks. Did I mention software raid rocks? Secondly, hardware raid isn't going away. Finally, the article doesn't take into account increased throughput of new drives that will be developed.
The one that almost bit us in the ass was a setup where we had a staging environment with a few hundred gigs of files that we would push out with rsync to a handful of webservers across the globe.
The disk array that was on the staging machine failed to come up one day, so rsync dutifully synced up the empty /mnt/staging mount point to all the webservers.
In some backup situations (certainly my personal stuff) it's usually acceptable to tell rsync to never delete stuff on the other end. Sure you can still corrupt one file and propagate the corrupt version, but you are less likely to wholesale blow away your backup.
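A guard against the empty-mount-point failure described above might look something like this. It is a minimal sketch, assuming the source is expected to be a mounted filesystem with at least one entry in it; the function names and the example paths are hypothetical, and only the standard rsync `-a` and `--delete` options are used.

```python
import os
import subprocess

def source_looks_sane(path, min_entries=1):
    """True only if path is a mounted filesystem that is not empty."""
    return os.path.ismount(path) and len(os.listdir(path)) >= min_entries

def push(source, target, delete=False):
    """Run rsync from source to target, refusing if the source looks wrong."""
    if not source_looks_sane(source):
        raise RuntimeError(f"{source} is not mounted or is empty; not syncing")
    cmd = ["rsync", "-a"]
    if delete:
        cmd.append("--delete")  # only propagate deletions when asked to
    cmd += [source.rstrip("/") + "/", target]
    subprocess.run(cmd, check=True)

# Hypothetical usage:
# push("/mnt/staging", "web1.example.com:/var/www/", delete=True)
```

Leaving `delete` off by default matches the "never delete stuff on the other end" approach; the mount check covers the case where deletions are wanted but the source array failed to come up.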
I'm no RAID expert, but surely the 10^-14 bit error rate the guy is talking about is the rate of previously undiscovered errors? In which case, those errors won't have been found by a bad-sector sweep, nifty though that feature is for advance warning of other problems. His point is that one error in 10^14 bits, though it doesn't sound a lot, is actually one error every time you read 12TB of previously-working data. Which is what happens during RAID5 disk rebuild.
Peter
Drobo!
How many times does this have to be said.
RAID is not a backup.
Yeah... How many times does it have to be said? I mean, seriously, it seems everybody else who commented here said the same thing...
Bow-ties are cool.
Most RAID controllers have the capability to maintain a hot spare. This spare is tested to check its integrity, and if a RAID disk goes bad the hot spare comes online and reconstruction begins. Does this even protect you from a URE? I guess not, but wasn't this always the case with RAID5? If you really care I guess you could do a RAID 1:5, two mirrored RAID 5s or a cluster or hell I don't know. Just print the stuff out that you really want to keep.
I love how anytime somebody gives an example of a solution to a problem, an anonymous jerk on the Internet has to accuse them of working for the company's interests. That's a WONDERFUL way of trying to discredit me.
I have purchased and set up a lot of RAID systems in my time and Promise was just ONE EXAMPLE that came to mind. Why? I just bought a Promise 4-bay NAS. That was in the marketing literature I had read before purchasing the unit.
Promise is not the only one to have technology and methods like that. I would bet that Adaptec, 3Ware, etc. all have similar technology.
So *fucking* excuse me. Next time I will take the time to find ALL the technologies from EVERY vendor and present them to /. That way jerks like you will have less opportunity to claim that I secretly work for Vendor A serving products from Manufacturer B.
But... just for shits and giggles... why not try to respond to the technology I mentioned? All you have done is to parrot the article. The "problem described in the article" is less likely to happen with the technology that I mentioned. Plain and Simple. Respond to that. I'll wait.
It's still no replacement for offsite backups, but Lime Technology (http://lime-technology.com/) has a JBOD+parity solution that uses a parity drive to protect your data. Even in a 2-disk failure you only lose the data from those disks (or, if the second was the parity drive, you just lose the data from the one disk). And in the case of a single drive failure it'll limp along using the parity drive until you replace the bad disk and rebuild. The 3-drive version is free, but if you want to go bigger they charge a nominal cost (sub $100).
It works as a great solution for a media server where you may not necessarily care about all the data but you'd prefer to not lose ALL of it at once should multiple drives fail.
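The general idea of independent data disks plus one parity disk can be sketched with plain XOR. This is a toy illustration of the principle as described above, not Lime Technology's implementation; the "disks" are short byte strings and the names are invented.

```python
from functools import reduce

def xor_blocks(blocks):
    """Bytewise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Each data disk holds its own files; nothing is striped across disks.
data_disks = [b"movies..", b"music...", b"photos.."]
parity = xor_blocks(data_disks)

# One disk lost (index 1): rebuild it from the survivors plus parity.
rebuilt = xor_blocks([data_disks[0], data_disks[2], parity])

# Two disks lost (indexes 0 and 1): parity cannot rebuild them,
# but disk 2 was never striped, so it is still readable on its own.
still_readable = data_disks[2]
```

That last case is the difference from RAID 5, where every stripe spans all disks and a second failure takes the whole array with it.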
The company can't afford to not have a proper backup system.
You think of price per megabyte backed up. But you need to start thinking in price per megabyte "not restored".
They still spin don't they? Oh drats, there are now disks (SSDs) that don't spin since they are not disks but they otherwise look like disks... same form factors... sigh.. progress.
Which RAID controllers do that, exactly? That's a bit like saying that SATA controllers will treat a drive with a single bad sector as being dead. I've never seen a controller do such a thing, and I doubt anyone making those controllers would stay in business for long.
Some controllers will consider a drive as "missing" if it doesn't respond for more than 30 seconds or so, and some SATA drives had a bug (more of a design flaw when used in RAID) that made them spend up to 2 minutes trying to recover a bad sector before responding. The result was the controller assumed the drive had died and said the rebuild had failed. In other words, the problem was the delay, not the bad sector. Anyway, you could still restart the rebuild, but this could be painfully slow if you had a lot of bad sectors.
This is not true for modern controllers, SCSI drives or "RAID edition" SATA drives, that never spend more than a couple of seconds trying to recover bad sectors (they simply give up and let the RAID controller handle it).
I'm not sure if the SATA spec has been expanded to include a "I'm busy trying to remap a sector" drive state, which would obviously be the ideal solution, but "TLER" and longer wait times by modern controllers have made the problem essentially disappear, and I think the problem never existed with SCSI / SAS drives, which is what most people would use for "enterprise" RAID-5 arrays.
I guess you never heard of Hot Spare
Personally, I believe the correct answer to ensuring data recoverability is RAID together with real-time replication.
Like, say, RAID-1? :)
Honestly, given the cost/GB ratios these days, the space advantages of RAID-5 seem pretty silly compared to the reliability and performance issues. Why not just go with something like RAID-10 and be done with it?
I've had this happen a few times, and in every case I managed to recover my data with some effort.
What you need to do is to copy the bad-block disk to the new replacement disk via dd w/ the "conv=sync,noerror" option -- sync tells it to 0-pad bad blocks to length, and noerror tells it to keep reading in the face of errors.
You then use the copy, which will have 0's in place of bad blocks, in place of the bad-block disk, and use the bad-block disk in place of the failed disk.
Annoying and time-consuming, but better than losing 10T of data.
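For those without dd handy, the same idea (keep reading past errors, fill unreadable blocks with zeros) can be sketched in a few lines. This is a Unix-only illustration in the spirit of "conv=sync,noerror", not a replacement for dd: the function name and the 512-byte block size are assumptions, and unlike dd it does not pad a short final block up to the full block size.

```python
import os

def copy_skipping_bad_blocks(src_path, dst_path, block_size=512):
    """Copy src to dst block by block; unreadable blocks become zeros.

    Returns the list of block numbers that could not be read.
    """
    bad_blocks = []
    src = os.open(src_path, os.O_RDONLY)
    try:
        size = os.lseek(src, 0, os.SEEK_END)
        with open(dst_path, "wb") as dst:
            for offset in range(0, size, block_size):
                want = min(block_size, size - offset)
                try:
                    block = os.pread(src, want, offset)
                except OSError:
                    # Read error: note the block and carry on.
                    block = b""
                    bad_blocks.append(offset // block_size)
                # Zero-fill whatever could not be read.
                dst.write(block.ljust(want, b"\0"))
    finally:
        os.close(src)
    return bad_blocks
```

The returned list tells you which blocks were zero-filled, which is handy for working out which files need a closer look after the array is back up.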
kieran hervold
> With 12 TB of capacity in the remaining RAID 5
> stripe and an URE rate of 10^14, you are highly
> likely to encounter a URE. Almost certain, if
> the drive vendors are right.
So one sector (let's say 4 KB) in 12 trillion bytes is bad, and you're saying your RAID controller is so pathetic that it cannot continue?!?!?!
WTF?!?!?!?
> Oh, you didn't back it up to tape?
TAPE?!?!?
Try backing up to eSATA drives.
Basically, this moron has no clue of how to actually use a PC.
Andy
So a lot of people are saying ZFS is a great solution for a number of the issues brought up in the article.
ZFS isn't available on Linux due to incompatibilities between the CDDL and Linux's GPL license.
CDDL is (in my opinion) more "free" than GPL (which forces redistribution of code; but let's avoid that debate right now.)
Okay, so if ZFS can't be bundled in a Linux distribution and redistributed while maintaining a GPL license, fair enough. But what is stopping anyone from doing the port, and providing ZFS as a freely distributed, do-what-you-want-with package, that installs and runs fine on Linux. If a user chooses to take this freely licensed ZFS and compile/link/install it on their Linux system, that is none of Linux/GPL's business.
Yes, redistribution is thwarted by the GPL, but why would install-it-yourself be problematic? I'd settle for that. Why isn't this available? Why hasn't anyone finished a port that can be used in this manner?
Not trolling, honestly curious about it...
Love many, trust a few, do harm to none.
Well, because it's still too expensive for a lot of businesses to go full RAID 10 on their main storage system.
The disks you buy at NewEgg are cheap, but the disks you buy for your SAN are not as cheap. They might be the same disks, but that's just the way it is. And, the big costs come in the form of cost per slot, not necessarily the disk that plugs into it.
RAID-5 doesn't suffer from any real performance issues. Not for the last 10 years, anyway. Read speed is as fast as a stripe set, and write performance hits are easily mitigated by on-board cache. I kinda thought this was common knowledge..
Replication can be done to a much cheaper unit or DAS, and/or can be sent to an off-site location for better recoverability in the case of a real disaster.
- It's not the Macs I hate. It's Digg users. -