Why RAID 5 Stops Working In 2009
Lally Singh recommends a ZDNet piece predicting the imminent demise of RAID 5, noting that increasing storage and non-decreasing probability of disk failure will collide in a year or so. This reader adds, "Apparently, RAID 6 isn't far behind. I'll keep the ZFS plug short. Go ZFS. There, that was it." "Disk drive capacities double every 18-24 months. We have 1 TB drives now, and in 2009 we'll have 2 TB drives. With a 7-drive RAID 5 disk failure, you'll have 6 remaining 2 TB drives. As the RAID controller is busily reading through those 6 disks to reconstruct the data from the failed drive, it is almost certain it will see an [unrecoverable read error]. So the read fails ... The message 'we can't read this RAID volume' travels up the chain of command until an error message is presented on the screen. 12 TB of your carefully protected — you thought! — data is gone. Oh, you didn't back it up to tape? Bummer!"
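The summary's "almost certain" can be sanity-checked with a quick calculation. This is a minimal sketch, not the article's own math. It assumes the 1-in-10^14-bits URE rate commonly printed on consumer SATA spec sheets (the summary doesn't give a rate) and treats every bit as failing independently. The helper name `p_ure_during_rebuild` is hypothetical.

```python
import math

TB = 1e12  # decimal terabyte, as drive vendors count capacity

def p_ure_during_rebuild(bytes_read: float, ure_per_bit: float = 1e-14) -> float:
    """Probability of at least one unrecoverable read error (URE) while
    reading `bytes_read` bytes, assuming independent errors at a fixed
    per-bit rate (an assumption; real errors tend to cluster)."""
    bits = bytes_read * 8
    # P(no error) = (1 - p)^bits, computed in log space for numerical stability.
    p_clean = math.exp(bits * math.log1p(-ure_per_bit))
    return 1.0 - p_clean

# Rebuilding a 7-drive RAID 5 of 2 TB drives means reading the 6 survivors in full.
print(f"{p_ure_during_rebuild(6 * 2 * TB):.2f}")
```

Under these assumptions the rebuild hits at least one URE with a probability of roughly 0.62. That is more likely than not, but short of certain. The independence assumption is the weakest part of the estimate.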
12 TB of your carefully protected - you thought! - data is gone. Oh, you didn't back it up to tape? Bummer!
If it wasn't backed up to an offsite location, then it wasn't carefully protected.
There are shills on slashdot. Apparently, I'm one of them.
RAID is not, and has never been, a substitute for backups.
I mean, WTF? Many people regard RAID as something magical that will keep their data no matter what happens. Well ... it's not.
Furthermore, for many enterprise applications disk size is not the main concern, but rather I/O throughput and reliability. Few need 7 disks of 2 TB in RAID5.
The Raven
The problem with RAID 5 is that the more drives you have, the higher the probability that more than one drive dies. That's why you build multiple RAID 5 arrays of 4 disks maximum instead of one array of 7 disks.
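The "more drives, more risk" point can be quantified with a simple binomial model. This sketch assumes drives fail independently, each with the same hypothetical probability `p_fail` during the window before a rebuild finishes. The correlated same-batch failures described elsewhere in this thread break that assumption and make the real odds worse.

```python
from math import comb

def p_two_or_more_fail(n_drives: int, p_fail: float) -> float:
    """Probability that at least 2 of n independent drives fail within
    the same window (e.g. before a rebuild completes)."""
    p_none = (1 - p_fail) ** n_drives
    p_exactly_one = comb(n_drives, 1) * p_fail * (1 - p_fail) ** (n_drives - 1)
    return 1 - p_none - p_exactly_one

p = 0.05  # hypothetical per-drive failure probability over the window
for n in (4, 7):
    print(n, round(p_two_or_more_fail(n, p), 4))
```

With the illustrative p = 0.05, the double-failure probability rises from about 0.014 for 4 drives to about 0.044 for 7.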
If you have one RAID5 box, just build another one that replicates it. Use that for your "hot backup". Then back that up to tape, if you must.
Storage is so cheap these days (especially if you don't need super-fast speeds and can use regular SATA drives), that you might as well just go crazy with mirroring/replicating all your drives all over the place for fault-tolerance and disaster-recovery.
A RAID 5 setup is only a precaution in case of a hardware failure. It is no excuse for not having backed up your data.
And the topic is also flawed - RAID 5 doesn't have any self-destruct mechanism.
This is trivially testable. Do any Slashdotters have experience rebuilding 7TB RAID 5 arrays?
You'd think, if this were really an issue, we'd be hearing stories from the front lines of this happening with increasing frequency. Instead we have a blog post based entirely on theory, without a single real-world example for corroboration.
What's more, who even uses RAID 5 anymore? I thought it was all RAID 10 and whatnot these days.
I can see a lot of people getting into a tizzy over this. The RAID 5 this guy is talking about is controlled by one STUPID controller.
There are a lot of methods, and patented technologies, that prevent exactly the situation he is talking about. Here is just one example:
RAID is not perfect, not by any stretch, but if you use it properly it will serve its purpose quite nicely. If your data is that critical, having it on a single RAID is ill-advised anyway. If you are talking about databases, then RAID 10 is preferable, and replicating the databases across multiple sites even more so.
The real issue is one that anyone who has ever had to recover a multi-drive array can tell you instantly: if one drive fails, and the other drives were bought at the same time and have had nearly identical usage patterns, the odds of another drive failing are well above average.
I once had a single drive fail in a 24-disk array. The disks were arranged, RAID 5, in groups of 3, glued together by Veritas (from back before it got bought by crappy Symantec). By the time the smoke cleared we had replaced 19 out of 24 drives. They had all been bought at the same time, and as they thrashed rebuilding their failed buddies, they started dying themselves. The remaining 5 drives we replaced anyway, just because.
That's a worst case, but multiple failures are far from uncommon, and very few people correctly cycle in new drives periodically to reduce the chance of a mass failure.
ad logicam Claiming a proposition is false because it was presented as the conclusion of a fallacious argument.
What is this article about?
They say that since there is more data, you're more likely to encounter problems during a rebuild.
The issue isn't with RAID, it's with the file system. Use larger blocks/sectors.
Losing all of your data requires you to have a shitty RAID controller. A decent one will reconstruct what it can.
The odds of you encountering a physical issue increase as capacity increases, and decrease as reliability increases. In theory, the 1 TB and up drives are pretty reliable. Anything worth protecting should be on server-grade hard drives anyway.
The likelihood of a physical problem popping up during your rebuild is no higher with new drives than it was with old drives. I haven't noticed my larger drives failing at higher rates than my older, smaller drives. I haven't heard of them failing at higher rates.
Remember, folks, RAID is a redundant array of inexpensive disks. The purpose of RAID is to be fault-tolerant, in the sense that a few failures don't put you out of production. You also get the nice bonus of being able to lump a bunch of drives together to get a larger total capacity.
RAID is not a backup solution.
RAID 5 and RAID 6, specifically, are still viable solutions for most setups. If you want more reliability, go with RAID 1+0, RAID 5+0, whatever.
Choosing the right RAID level has always depended on your needs, setup, budget, and priorities.
Smells like FUD.
The whole argument boils down to the published URE rate being both accurate and a foregone conclusion. Will disk makers _really_ make drives that have a sector failure for every 2 terabytes, or will they improve whatever technology is causing these UREs so that they become much rarer (if the rate was real in the first place)?
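How much rides on the published rate becomes clear when you compute the expected number of UREs for the summary's 12 TB rebuild at a few rates. The rates below are illustrative spec-sheet figures, not measurements. Note that 1 error per 10^14 bits works out to one per 12.5 TB read.

```python
TB = 1e12  # decimal terabyte

def expected_ures(bytes_read: float, ure_per_bit: float) -> float:
    """Expected number of unrecoverable read errors when reading
    `bytes_read` bytes at a given per-bit error rate."""
    return bytes_read * 8 * ure_per_bit

# Illustrative rates: 1 error per 10^14, 10^15 and 10^16 bits read.
for rate in (1e-14, 1e-15, 1e-16):
    print(f"1 per {1 / rate:.0e} bits -> {expected_ures(12 * TB, rate):.4f} expected UREs")
```

Each factor of ten in the rate is a factor of ten in the expected error count. The "foregone conclusion" stands or falls on that one number.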
AccountKiller
How many times does this have to be said?
RAID is not a backup. RAID is designed to protect against hardware failures. It can also increase your I/O speed, which is more important in some cases. Backups are different.
Depending on what you are doing, you may or may not need a RAID, but you definitely need backups.
RAID 5, as well as RAID 6, is nothing more than an attempt to add some amount of redundancy without sacrificing too much space. Go RAID 1 instead with the same number of disks.
As far as I'm concerned, RAID 5 really has no redeeming features (it's slow, not particularly safe, but lulls people into a false sense of security).
From a data integrity perspective, though, RAID6 is a better solution than RAID1.
Given arrays of equal sizes, with RAID6 your data can survive the loss of *any* two disks; with RAID1, if you lose two disks which happen to be a mirrored pair, then you're hosed.
But, as you point out, RAIDn doesn't really qualify as "carefully protected"
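The "any two disks" argument can be checked by brute force: enumerate every two-disk failure and count how many each layout survives. This sketch assumes "RAID1" here means several mirrored pairs striped together (RAID 10), with disks (0, 1), (2, 3), ... paired. The function names are illustrative.

```python
from itertools import combinations

def survives_raid6(failed: set, n_disks: int) -> bool:
    # RAID 6 tolerates the loss of any two members.
    return len(failed) <= 2

def survives_raid10(failed: set, n_disks: int) -> bool:
    # Assumed layout: disks (0, 1), (2, 3), ... are mirrored pairs;
    # the array dies only if both halves of some pair are gone.
    return not any({d, d + 1} <= failed for d in range(0, n_disks, 2))

def two_disk_survival(survives, n_disks: int) -> float:
    """Fraction of all two-disk failure combinations the array survives."""
    pairs = list(combinations(range(n_disks), 2))
    ok = sum(survives(set(pair), n_disks) for pair in pairs)
    return ok / len(pairs)

for n in (4, 6, 8):
    print(n, two_disk_survival(survives_raid6, n),
          round(two_disk_survival(survives_raid10, n), 3))
```

With 4 disks, both layouts give the same usable capacity. RAID 10 survives 4 of the 6 possible double failures (about 0.667), while RAID 6 survives all 6. The RAID 10 fraction improves as pairs are added but never reaches 1.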
Not everything that can be measured matters; Not everything that matters can be measured.
The main point of the article is to point out a problem that is going to occur eventually. If you read the article, he mentions that later on, with large enough hard drives, everyone will require a RAID setup with their "Dell manufactured" computer (assuming Dell hands out >>2-4TB disks to their average user).
In practice, this means that while your array is rebuilding, your performance SLAs go out of the window. If this is for an interactive server, such as a TP database or web service you end up with lots of complaints and a large backlog of work.
The result is that as disks get bigger, the recovery takes longer. This is what makes RAID less desirable, not the possibility of a subsequent failure - that can always be worked around.
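Rebuild time has a hard floor: the time it takes to write the whole replacement drive. This is a rough sketch. The sustained rates are hypothetical, and a rebuild competing with production I/O will usually run slower than the drive's raw sequential speed.

```python
def rebuild_hours(capacity_bytes: float, rebuild_bytes_per_s: float) -> float:
    """Lower bound on rebuild time: the replacement drive has to be
    written end to end at the sustained rebuild rate."""
    return capacity_bytes / rebuild_bytes_per_s / 3600

# Hypothetical sustained rebuild rates in MB/s (decimal units).
for capacity_tb in (1, 2):
    for rate_mb_s in (100, 50, 20):
        hours = rebuild_hours(capacity_tb * 1e12, rate_mb_s * 1e6)
        print(f"{capacity_tb} TB at {rate_mb_s:>3} MB/s: {hours:5.1f} h")
```

At 50 MB/s, for example, a 2 TB drive needs at least 40,000 seconds, a little over 11 hours. Performance stays degraded for that whole window.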
politicians are like babies' nappies: they should both be changed regularly and for the same reasons
Spell it out for everyone.
RAID won't save your data if there is a fire.
Or if you delete a file.
Or if two drives fail.
Or a thousand other scenarios.
All RAID does is prevent the system from going down when a single drive fails (except RAID 0). Thus giving everyone in the office time to finish up their important work and log out for the day so you can swap the drive. Or, if you're brave, swap the drive during regular work hours.
For the home user (not working on huge graphic files) RAID 1 (mirroring) should be sufficient. As long as it is paired with another EXTERNAL hard drive that you copy your important information to. And leave with your brother or something. I'm talking family photos and such. Your tax information should be small enough to fit on a USB drive.
If your computer completely failed TODAY what would be the really irreplaceable files on it?
Back those up. Then store them with a friend or someone in your family.
There, problem solved.
Scrub once a week, or once every two weeks.
RAID6 isn't about losing any two disks, it's about having two parity stripes. It's about being able to survive sector errors without any worry.
It's about losing ONE drive and still having enough parity to replace it without any errors.
RAID6 on 5 drives is retarded, tho, because it leaves you absurdly close to RAID1 in kept space. RAID6 is for when you have 8-10 drives. At that point you barely notice the (N - 2) effect and you have a fast (provided your processor can handle it all) chunk of throughput along with an incredibly reliable system. Well, N-3 with a hotswap.
Personally, I think I'd go RAID-Z2 via ZFS if only because it's a little bit sturdier a filesystem to begin with.
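The space trade-off behind "RAID6 is for when you have 8-10 drives" is simple arithmetic. With N drives, two parity drives and an optional hot spare, either (N - 2)/N or (N - 3)/N of the raw capacity holds data. This is a small sketch, and the function name is illustrative.

```python
def usable_fraction(n_drives: int, parity_drives: int, hot_spares: int = 0) -> float:
    """Fraction of raw capacity left for data in a parity array
    (RAID 5: parity_drives=1, RAID 6: parity_drives=2)."""
    data_drives = n_drives - parity_drives - hot_spares
    if data_drives < 1:
        raise ValueError("not enough drives for this layout")
    return data_drives / n_drives

print("RAID1 mirror:", 0.5)  # a two-way mirror always keeps half
for n in (5, 8, 10):
    print(f"RAID6, {n} drives: {usable_fraction(n, 2):.1%}, "
          f"with hot spare: {usable_fraction(n, 2, hot_spares=1):.1%}")
```

Five drives give 60% usable space, compared with 50% for a mirror. Ten drives give 80%, or 70% with a hot spare.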
... very few people correctly cycle in new drives periodically to reduce the chance of a mass failure.
That is also because very few people buy a RAID setup piecemeal. Most end up buying a solution, fully populated. The idea of swapping out some drives as you go, or growing your RAID over time, doesn't always look good, either to the PHBs who usually run the budget or to the vendor. We had a vendor trying to sell us an iSCSI SAN device tell us that varying the drive lots and dates increased the chances of failure. Needless to say, we went elsewhere.
When we bought the RAID array for our Exchange box (this is going back a few years), everybody looked at me like an idiot because I asked for drives with different lot numbers. It was the best I could do, as buying over time was not an option. HP was actually pretty cool about this request, and out of 8 disks, no 3 have the same lot number or manufacture date.
Of course, the RAID on that machine isn't our backup; we also do a nightly replication, so your mileage may vary.
"To Do Is To Be" - Socrates, "To Be Is To Do" - Sartre, "Do Be Do Be Do" - Sinatra
Wow, how incite-ful. Doesn't matter what the discussion is, some geek is bound to weigh in with all the shortcomings of any idea.
Newsflash: there is no perfect backup! No method is foolproof, especially when it's bound to be boring as hell, and you've got an inevitable human factor. You get lazy moving the tapes offsite, you put off fixing a dead drive because there are 4 others, or you wipe your main partition upgrading your distro and forget that your cron rsync script uses the handy --delete flag, and BOOM, it wipes out your backup.
Shit happens. Pointing out what we all already know doesn't do anything helpful.
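One cheap guard against the rsync --delete accident described above is a pre-flight check that refuses to mirror when the source looks suspiciously empty. This is a hypothetical sketch, not a feature of rsync itself. The 50% threshold and the function names are assumptions you would need to tune.

```python
import os

def count_files(root: str) -> int:
    """Count regular files under `root` (os.walk does not follow symlinks by default)."""
    return sum(len(files) for _, _, files in os.walk(root))

def safe_to_mirror(src: str, dst: str, min_ratio: float = 0.5) -> bool:
    """Hypothetical pre-flight check before a mirroring job such as
    `rsync -a --delete src/ dst/`. Refuses if the source is missing or has
    shrunk below `min_ratio` of the backup's file count, which is what a
    wiped or unmounted source partition looks like."""
    if not os.path.isdir(src):
        return False  # source missing or not mounted
    dst_count = count_files(dst) if os.path.isdir(dst) else 0
    if dst_count == 0:
        return True  # first run: nothing in the backup to lose
    return count_files(src) / dst_count >= min_ratio
```

A wrapper script would run the mirror only when `safe_to_mirror` returns True. Another sanity check is to run `rsync --dry-run` first and read the list of deletions.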
RAID-10 ftw? Expensive I know, but at least you have a full layer of redundancy rather than just a parity drive.
RAID doesn't protect against your worst enemy
rm -r *
nor is it supposed to. not being a moron seems to have protected me from "my worst enemy" just fine. RAID has protected me from random disk failures. seems to be working as designed
TIAEAE!
My observed error rate with about 4TB of storage is much, much lower. I ran a full surface scan every 15 days for two years and did not have a single read error. (The hardware has since been decommissioned and replaced by 5 RAID6 arrays of 4TB each.)
So, I did read roughly 100 times 4TB. That is 400TB = 3.2 * 10^15 bits with 0 errors. That does not take into account normal read from the disks, which should be substantially more.
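The comment's numbers can also be read the other way round. Suppose the drives really erred at a spec-sheet rate of 1 per 10^14 bits (an assumed figure). Then 400 TB of reads should have produced about 32 errors. If errors are treated as a Poisson process (another assumption), the chance of seeing none at that rate is vanishingly small:

```python
import math

bits_read = 400e12 * 8   # 400 TB of scans, as in the comment above
spec_rate = 1e-14        # assumed spec: 1 URE per 10^14 bits read

expected = bits_read * spec_rate   # mean number of errors if the spec rate were exact
p_zero = math.exp(-expected)       # Poisson probability of observing no errors at all
print(f"expected {expected:.0f} UREs; P(0 observed) = {p_zero:.1e}")
```

Spec sheets usually phrase the figure as a bound ("less than 1 in 10^14"), so zero errors doesn't contradict the spec. It does suggest this hardware's actual rate was far lower.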
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
I'm in the process of building a new 8x 1T array. I'm not using any fancy RAID card, just an LSI 1068E chipset with a SAS expander to handle LOTS of drives (16 slots in the case, using 8 right now).
I'm not putting the entire thing into one big array. I'm breaking up each 1T drive into ~100GB slices that I will use to build several overlapping arrays. Each MD device will be no more than 4-5 slices. This way, if an error occurs in one part of one disk, I will have a higher probability of recovery.
I may also use RAID 6 to give me more chance of rebuilding.
Disk errors tend to not be whole disk errors, just small broken physical parts of a single disk.
SMART will give me more chance to detect and replace dying drives.
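One way to lay out the slices described above is to rotate which disks share an array for each slice index. That way every array's members sit on different physical disks. This is a hypothetical sketch for 8 disks, 10 slices of ~100 GB each and 4-member arrays. It only computes the assignment and does not call mdadm.

```python
def slice_layout(n_disks: int, slices_per_disk: int, members: int):
    """Hypothetical layout: slice s of every disk is grouped into md arrays of
    `members` slices. The disk order is rotated per slice index, so partners
    vary while each array's members stay on distinct disks."""
    if n_disks % members:
        raise ValueError("n_disks must be a multiple of members in this sketch")
    arrays = []
    for s in range(slices_per_disk):
        order = [(s + i) % n_disks for i in range(n_disks)]
        for start in range(0, n_disks, members):
            # Each array is a list of (disk, slice) pairs.
            arrays.append([(disk, s) for disk in order[start:start + members]])
    return arrays

layout = slice_layout(n_disks=8, slices_per_disk=10, members=4)
print(len(layout), "arrays; first:", layout[0])
```

There is a trade-off. A bad spot on one disk degrades only the array that owns that slice. A whole-disk failure, however, still degrades all 10 arrays that disk participates in, one per slice index.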
Seriously - what's the problem with RAID 5? It's not a FALSE sense of security: it actually DOES prevent data loss or downtime on a single disk failure. If you're a moron, you're creating 14-disk arrays. If you're smart, you keep it to 7 disks at the very most.
RAID 5 is great. It's fast, unless you have a shit controller without enough cache. It's going to prevent down time on a single disk failure (which is overwhelmingly the most common type of failure) and it doesn't cost you too much capacity.
Usually I'm more concerned with a fire or flood than a double-disk failure.
RAID 6 is good, but you get the same (actually worse) performance hit over RAID 5. More parity calculations. You can lose any two disks, which is nice, and if you can spare the space, go for it!
I don't see RAID 6 as being that much more of a big deal than RAID 5, and actually it shouldn't really have its own number, since it's exactly the same technology and parity system as 5. It should be RAID 5.1 or something. Or maybe RAID5+1. The only reason it's become more available now is because controllers have gotten fast enough to deal with the additional parity.
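The RAID 5 vs. RAID 6 performance hit discussed above mostly shows up on small random writes. There, each front-end write turns into several back-end disk I/Os. This rough sketch uses the textbook penalties: 2 for mirroring, 4 for RAID 5 and 6 for RAID 6. It ignores controller cache and full-stripe writes, which is exactly where a good controller claws performance back. The per-disk IOPS figure is hypothetical.

```python
# Back-end disk I/Os per small random write:
# RAID 10 writes both mirrors; RAID 5 reads data + parity, then writes both;
# RAID 6 reads data + P + Q, then writes all three.
WRITE_PENALTY = {"raid10": 2, "raid5": 4, "raid6": 6}

def random_write_iops(level: str, n_disks: int, iops_per_disk: float) -> float:
    """Rough front-end random-write IOPS, ignoring cache and full-stripe writes."""
    return n_disks * iops_per_disk / WRITE_PENALTY[level]

for level in WRITE_PENALTY:
    print(level, round(random_write_iops(level, n_disks=8, iops_per_disk=100)))
```

With 8 disks at 100 IOPS each, that works out to roughly 400, 200 and 133 random-write IOPS, respectively.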
- It's not the Macs I hate. It's Digg users. -
I guess you should be considered a new age Luddite?
Are you the same guy that always waits for SP1 before using any software? I thought so.
RAID is a proven technology, and it's used in nearly all business IT systems, from big to tiny.
RAID isn't meant as a replacement for backups. It's one PART of the entire system of preventing unnecessary data loss and, more importantly, downtime. You can keep on running your server while the failed disk is replaced and rebuilt.
So, while I eat Cheetos and surf Slashdot as that RAID array rebuilds itself, you can go ahead and spend all day recovering your old data from last night while people bitch at you for not using the technology that's been around since the inception of the hard drive.
If you actually did have the experience you claim, you'd slap yourself for such a stupid fucking post.
You get your first RAID controller from a trusted friend. "Here," he says, "try this," and hands you a Mylex board. It has a 64-bit bus and 3 SCSI LVD connectors. Oooh. That looks fast. So you start eBaying drives, cables, adapters, more controllers and the inevitable megawatt power supply, and you mess around with RAID 1, RAID 0, RAID 1+0 and RAID 5. Suddenly every system falls prey to RAIDMANIA; eventually you build yourself a system with 3 controllers, 3 busses each, and a drive on each of the 9 busses. With a controller for swap, one for data and one for the system, will Windows now be fast? Yeah, sorta. Those drives sure are quiet - from a click-click busy-noise perspective, NOT from a "sounds like a jet airplane when running" perspective. Heat is an issue, too.
http://rs79.vrx.net/works/photoblog/2005/Sep/15/DSCF0007s.jpg
But oh my are the failure modes spectacular.
I just use a laptop now and make several sets of backup DVDs or just copy to spare drives. I love RAID to death. But it's really only marginally worth the effort in the real world. But if you need fast, OMG.
Need Mercedes parts ?
First off, isn't this story a year+ old? Sheesh.
Second, if you're worried about UREs on X number of disks, what about a single capacitor cooking off on the RAID controller? No serious data is stored on a single-RAID-controller system without good backups or another RAIDed system on completely separate hardware. Yes, if you put a lot of disks on one controller and have a failure, you have a higher risk of *another* failure. That's why important data doesn't depend on *only* RAID, and why lots of places use mirroring, replication, data shuttling, etc. This isn't new. Most folks who can't afford to rebuild from backups or from a mirrored remote device also couldn't have used 12TB for anything *but* bulk offline file storage, because it's slower than Christmas vs. a 'real' storage array. Using it for the uber HD DVR? Great. Oh no, you lose the last episodes of The X-Files. This isn't banking data we're talking about here.
Prioritize your data. I cannot believe that a home user has 12TB of important stuff. Back up your critical records both on site and off [1]. Back up the important stuff on site with whatever is convenient. Let the rest go hang.
[1] Use DVDs in the unlikely event you have that much critical data. Few home users will have a critical need for that stuff beyond the life of the media. Any that do can copy it over every five years, and take the opportunity to delete the obsolete stuff.
Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
(though it's been running since '04 without any problems, and my HD health monitors show it in good shape)
Oh man.... you didn't just say that out loud did you???
That sinking feeling deep in your gut when you KNOW you screwed up bad summed up with: {head desk} {head desk}
My data backup scheme is to steganographically embed my entire filesystem into nude pictures of Sarah Palin, and then upload them to usenet.
You see? You see? Your stupid minds! Stupid! Stupid!
This is why you scrub your RAID arrays once a week. If you're using software RAID on Linux, for example:
echo check > /sys/block/md0/md/sync_action
The above will scrub array md0 and initiate sector reallocation if needed. You do this while you have redundancy so the bad data can be recovered. Over time, weak sectors get reallocated from the spare bands, and when you do have a failure the probability of a secondary failure is very low over the interval needed for drive replacement.
Most non-crap hardware controllers also provide this function. Read the documentation.
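A scheduled job can wrap the echo command above and then read back the result. Here is a minimal Python sketch using the md sysfs files `sync_action` and `mismatch_cnt`. The default path assumes the array is md0, writing requires root on a real system, and the function names are illustrative.

```python
from pathlib import Path

def start_check(md_dir: str = "/sys/block/md0/md") -> None:
    """Ask the md driver to scrub the array (same effect as the echo above)."""
    Path(md_dir, "sync_action").write_text("check\n")

def scrub_status(md_dir: str = "/sys/block/md0/md") -> tuple[str, int]:
    """Return (current sync_action, mismatch_cnt). mismatch_cnt reports how many
    sectors the last check/repair found inconsistent."""
    action = Path(md_dir, "sync_action").read_text().strip()
    mismatches = int(Path(md_dir, "mismatch_cnt").read_text().strip())
    return action, mismatches
```

When a check finishes, `sync_action` reads `idle` again. A non-zero `mismatch_cnt` is worth investigating.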
Can You Say Linux? I Knew That You Could.
The vast majority of Egypt's writings were stored on perishable papyrus, not carved or painted on stone. Of all that they ever wrote or stored, we have but the tiniest fraction remaining.
If we lost technology today, there would be nothing left but paper in 20 years. In a thousand, there wouldn't even be much paper.
I used to use the old punch card system to back up my data. Sure, it took a while, but it was totally worth it... until one day, while attempting to move the many boxes full of carefully sorted cards, I fell down the steps and the cards went everywhere. I learned from that mistake and started writing everything down on paper... Lots o' 1s and 0s; my hand hurt... a lot. But there was a fire at my off-site location :( so I had to resort to the ultimate old-school backup: a chisel and a rock... a really, really big rock.
Eating the brains of your enemies does not make you smarter. But it's still fun.
That's why I chisel all my data (ones and zeros) onto stone tablets. In a few years the pile of stones will be taller than Everest. :)
Redundancy... You keep using that word. I do not think it means what you think it means.
RAID 0, pseudo-ironically, is not redundant at all. RAID 1, often called mirroring, is the level that actually is redundant.
And in a thousand years some bearded guy will discover a couple of those stones, come down the mountain and base a religion on them. These things are cyclical.
If you trace the original term 'RAID', it goes back to an ACM article describing Redundant Arrays of Inexpensive Disks. RAID 0, which is actually a marketing term, has striping but no redundancy from which the contents of a missing member of the array can be inferred. From the perspective of availability, it has none. As you say, RAID 1 is a mirrored pair, usually of the same type of drive. It is also likely the fastest RAID, and the most expensive in terms of net data available after redundancy. There is also no RAID 6...10, as these are marketing terms, too.
---- Teach Peace. It's Cheaper Than War.
"Shit happens. Pointing out what we all already know doesn't do anything helpful."
Actually, it gives posters like you a chance to remind everyone else that shit happens.
I believe there would be many fewer frustrated/bitter IT workers if more people meditated on the fact that shit just happens. In today's marketplace it is usually IT left holding the bag when things go south anyhow... gotta get acclimated to that and roll on.
Anyhow, I doubt there are many IT veterans not familiar with really expensive, really borked backup systems. Smarter people than me have observed that as technology progresses, existing strategies either age or mature. The ones that age become brittle, and the ones that mature become more robust...
Corporate suits usually ensure that both aged and mature technologies will be flogged on long past their rational retirement dates.
And look what happened? Netcraft is already half way to confirming the demise of alt.binaries!
Bio questions? Ask me to start a Q&A journal. Computer analogies available for most topics!
RAID 5 will still be orders of magnitude more reliable than just having a single disk.
No sig today...
Not only are there two parity drives, but the operating system can automatically scan the drives to ensure that all data and parity disks are correct, and silently correct any errors that occur on only one disk. It only takes a few days to scan 12 TB. If this is done often enough, the probability of two failed disks plus a previously undetected unrecoverable error on a third disk is quite a bit lower than the failure rate for RAID5. RAID5 volumes can be scanned automatically too, but if corruption is detected there's no way to know which of the disks was actually incorrect, barring an actual error report from the hard disk. Silent corruption is a much bigger enemy of RAID5 than of RAID6.
I don't know why the article focuses on RAID5; RAID1 or RAID10 will have exactly the same issues at a slightly lower frequency than RAID5, but more frequently than RAID6.
Ultimately, the solution is simply more redundancy, or more reliable hardware. RAID with 3 parity disks is not much slower than RAID6, and dedicated hardware or increasing CPU speed will take care of that faster than drive speeds increase.
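The difference between single and double parity for silent corruption can be shown in a few lines. This sketch computes P as a plain XOR (RAID 5 style) and Q as a Reed-Solomon-style syndrome over GF(2^8) using polynomial 0x11d and generator 2. That is the construction commonly described for RAID 6, for example in Linux md. It is a toy over a few bytes, not a controller implementation. With P alone, the same bit flip on any data disk produces the same mismatch. With both P and Q, the ratio of the two mismatches identifies the disk.

```python
from functools import reduce

# Log/antilog tables for GF(2^8) with polynomial x^8+x^4+x^3+x^2+1 (0x11d), generator g = 2.
EXP = [0] * 255
LOG = [0] * 256
_x = 1
for _i in range(255):
    EXP[_i] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= 0x11D

def gf_mul(a: int, b: int) -> int:
    """Multiply two field elements via the log tables."""
    if a == 0 or b == 0:
        return 0
    return EXP[(LOG[a] + LOG[b]) % 255]

def xor_all(values) -> int:
    return reduce(lambda a, b: a ^ b, values, 0)

def parity(disks: list[bytes]) -> tuple[bytes, bytes]:
    """P = XOR of the data disks (RAID 5 style); Q = XOR of g^i * D_i."""
    p = bytes(xor_all(col) for col in zip(*disks))
    q = bytes(xor_all(gf_mul(EXP[i], d) for i, d in enumerate(col))
              for col in zip(*disks))
    return p, q

def locate_bad_disk(disks: list[bytes], p: bytes, q: bytes):
    """Return the index of a single silently corrupted data disk, or None.
    If disk z changed by e, then dP = e and dQ = g^z * e,
    so z = log(dQ) - log(dP) (mod 255)."""
    new_p, new_q = parity(disks)
    for old_p, cur_p, old_q, cur_q in zip(p, new_p, q, new_q):
        dp, dq = old_p ^ cur_p, old_q ^ cur_q
        if dp and dq:
            return (LOG[dq] - LOG[dp]) % 255
    return None

data = [bytes([10, 20, 30]), bytes([40, 50, 60]), bytes([70, 80, 90]), bytes([1, 2, 3])]
p, q = parity(data)
corrupted = list(data)
corrupted[2] = bytes([70 ^ 0x55, 80, 90])  # silent bit flips on data disk 2
print(locate_bad_disk(corrupted, p, q))     # -> 2
```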
RAID???!!! Aaaaaaah! (Drive dies.)
Wow. I love your FUD. If you're going to lie, at least make it seem truthful.
Lacking in file system utilities (yes, fsck IS necessary even on healthy filesystems, especially on desktops and portables)
Why no fsck? And if you really feel the need to do something:
License-incompatible with anything worth running it on, other than Solaris itself... which is NOT worth running (see #1 above)
What you mean to say is "Some Operating Systems whose merits can be debated are license incompatible with the license of ZFS." FreeBSD can implement ZFS. Why can't Linux? Because of its license, not that of ZFS.
"Nature doesn't care how smart you are. You can still be wrong." - Richard Feynman
I do the same thing, but I want to warn you...
I've had TWO occasions where it has failed me. Once, a lightning strike that zotched both drives. The second time a rubber isolator failed in the case and the master drive fell onto the backup.
In both cases the bad spots in the two drives were different so I got back most of my data, but now I use Mozy as well as mirroring. I REALLLLLLLY don't want to lose all of my digital photos. :)
W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
Let's hope he discovers some porn this time...
That doesn't work for me. Try
I have to say, the ZFS folks have convinced me. There are simply too many places where bit rot can creep in these days even when the drive itself is perfect. The fact that the drive is not perfect just puts a big exclamation point on the issue. Add other problems into the fray, such as phantom writes (which have also been demonstrated to occur), and it gets very scary very quickly.
I don't agree with ZFS's race-to-root block updating scheme for filesystem integrity but I do agree with the necessity of not completely trusting the block storage subsystem and of building checks into the filesystem data structures themselves.
Even more specifically, if one is managing very large amounts of data one needs a way to validate that the filesystem contains what it is supposed to contain. It simply isn't possible to do that with storage-system logic. The filesystem itself must contain sufficient information to make validation possible. The filesystem itself must contain CRCs and hierarchical validation mechanisms to have a proper end-to-end check. I plan on making some adjustments to HAMMER to fix some holes in validation checking that I missed in the first round.
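The "hierarchical validation" idea can be sketched with a two-level checksum tree. Each parent stores CRCs of its data blocks, and the root stores CRCs of the parents. A verifier walking down from the root can then tell exactly which block went bad. This is an illustration only. It is not HAMMER's or ZFS's on-disk format, and CRC32 from Python's zlib stands in for whatever checksum a real filesystem uses.

```python
import struct
import zlib

FANOUT = 4  # data blocks per parent (illustrative)

def parent_crc(child_crcs: list[int]) -> int:
    """CRC of a parent: checksum over its children's CRCs packed as uint32."""
    return zlib.crc32(struct.pack(f"<{len(child_crcs)}I", *child_crcs))

def build_index(blocks: list[bytes]):
    """Two-level checksum tree: parents hold block CRCs, root holds parent CRCs."""
    leaf_crcs = [zlib.crc32(b) for b in blocks]
    parents = [leaf_crcs[i:i + FANOUT] for i in range(0, len(leaf_crcs), FANOUT)]
    root = [parent_crc(p) for p in parents]
    return parents, root

def verify(blocks: list[bytes], parents, root):
    """Walk from the root down; report bad parents and bad data blocks."""
    bad = []
    for pi, (child_crcs, expected) in enumerate(zip(parents, root)):
        if parent_crc(child_crcs) != expected:
            bad.append(("parent", pi))
            continue  # this parent's block CRCs can't be trusted
        for ci, crc in enumerate(child_crcs):
            block_no = pi * FANOUT + ci
            if zlib.crc32(blocks[block_no]) != crc:
                bad.append(("block", block_no))
    return bad

blocks = [bytes([i]) * 512 for i in range(10)]
parents, root = build_index(blocks)
blocks[6] = b"\x00" + blocks[6][1:]   # simulate bit rot in block 6
print(verify(blocks, parents, root))  # -> [('block', 6)]
```

Each level is checked before it is trusted. As a result, a corrupted parent is reported as such instead of producing false alarms on its children.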
-Matt
You DID see my previous reply, right?
Yes, I did. It quotes an explanation that you can only fix errors in a redundant configuration. Considering that the whole basis for this discussion is RAID-5, I think that's a reasonable requirement. However, metadata is written in multiple places, so wanting a ZFS fsck to correct a corrupted superblock is kind of silly, since that superblock is written in multiple places anyway. Also, you can tell ZFS to do a manual scrub (as I showed), which has the advantage of running while the array is online, so you can script it with cron and still keep the array available.
I'm not going to argue license points. The fact is that ZFS is under an open source license and so is Linux. Sun had every right to use their own license.
The Egyptians found a way to preserve their message over thousands of years, surely we can come up with something. :)
And they would have saved future generations from vast amounts of confusion and effort, if they'd only been a little more diligent backing up their pyramid construction HOWTO files.
You leave RMS out of this!
oops, missed funny and hit overrated.
Sorry about that. Too bad this will remove some good mods up above.
It was me, I did it, I moved your cheese
Well, Windows does. Taking a snapshot of NTFS, even on a heavily used 1TB+ file server, takes only a few seconds, and under normal operation the file system is still fast.
NTFS is actually a pretty good file system. It's probably because it was originally designed by IBM.
"This time?"
Ah, I see you've never read "Song of Songs"
A Black Swan is an event that is highly improbable, but statistically probable.
Yes, it is possible for a drive in a RAID 5 array to become absolutely inoperable, and for one of the other drives to have a read failure at the same time. This is highly unlikely, though, and is not the Black Swan. The math used to calculate the likelihood of these two events occurring at the same time is faulty. The MTBF metric for hard drives is measured in 'soft failures'; this is very different from a 'hard failure'.
The difference between the two types of failures is that a soft failure, while a serious error, is something that the controlling operating system can work around if it detects it. It is extremely unlikely that a hard drive will exhibit a hard failure without having several soft failures first. It is even more unlikely that two drives in the same array will exhibit a hard failure within the length of time it takes to rebuild the array. In my experience, it is more likely that the software controlling the array will run into a bug rebuilding the array. I've seen this with several consumer-grade RAID controllers.
The true Black Swan is when a disk in the array catches fire, or does something equally as destructive to the entire array.
To echo other people's points, RAID increases availability, but only an off-site backup solves the data retention problem.
Isn't ZFS a filesystem? Why would I care about what filesystem I am using when I am trying to protect my data from disk failures?
Because it's a file system, volume management, and redundancy all rolled into one combined with native NFS and SMB sharing, iSCSI support, etc. etc.
No method is foolproof, especially when it's bound to be boring as hell, and you've got an inevitable human factor. You get lazy moving the tapes offsite, you put off fixing a dead drive because there are 4 others, you wipe your main partition upgrading your distro and forget that your CRON rsync script uses the handy --delete flag, and BOOM wipes out your backup.
Jesus Christ, you must be one unlucky soul. Do you live your entire life in a worst-case scenario?
The system that I use for data storage is as follows:
This system may not be foolproof (what is?), but it is pretty frickin' safe, and costs me roughly $3 or $4 per month. Not too shabby for what I would consider to be a fairly robust backup system for a home user.
I suppose the biggest challenge is deciding what goes into rsnapshot. If my RAID array suffered a massive failure, I would definitely lose data. But this is mostly video content, and really, if I lose my MythTV shows, it is not exactly as catastrophic as if I lost, say, my QuickBooks data.
There are a lot of things that keep me awake at night, but loss of important data is not one of them.
They don't grade fathers, but if your daughter's a stripper, you fucked up. --Chris Rock
That doesn't work for me. Try
hell, if you want to lose data, you've gotta at LEAST use dd. rm just removes the directory entries; all your data is fine, you just can't access it. run
(or whatever disk you want to lose) and then see how many data recovery places will turn you away. the level of data recovery available to the public is pretty crappy. there's a guy offering a reasonably big prize to any data recovery company (or anyone at all, i guess) who can recover data from a disk he zeroed with dd, and he hasn't had any takers yet. i wish i could find the link
The only solution is to regularly read everything:
The chance of avoiding double errors in the form of unreadable sectors during rebuild about doubles each time you halve the time between full reads of all sectors on a drive. (True to about weekly full reads.)
This is because a full read will allow each drive in the array to discover sectors that are becoming iffy (soft/recoverable read errors) and then remap them.
See lwn.net for a discussion and links to some good papers.
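A toy model shows why more frequent full reads help. It makes three assumptions. Latent sector errors appear on each drive as a Poisson process at some rate. A full read finds and remaps every one of them. A drive failure lands at a uniformly random point in the scrub interval T. Under these assumptions, the chance that a survivor holds an undiscovered bad sector when the rebuild starts is 1 - (1 - e^(-a))/a, with a = rate * survivors * T. For small a this is about a/2, so halving the interval roughly halves the risk. That is one way to read the "about doubles" claim. The rate below is a made-up illustration.

```python
import math

def p_latent_error_at_rebuild(rate_per_drive_hour: float, survivors: int,
                              scrub_interval_h: float) -> float:
    """Chance that at least one surviving drive holds an undiscovered latent
    sector error when a rebuild starts. Assumes Poisson error arrivals,
    scrubs that clear every latent error, and a failure time uniformly
    distributed within the scrub interval."""
    a = rate_per_drive_hour * survivors * scrub_interval_h
    if a == 0:
        return 0.0
    # Closed form of the average of 1 - exp(-a * u) over u uniform in [0, 1];
    # expm1 keeps precision when a is small.
    return 1.0 - (-math.expm1(-a)) / a

# Hypothetical rate: 1 latent error per 100,000 drive-hours; 6 surviving drives.
for interval_h in (720, 168, 84):  # roughly monthly, weekly, twice weekly
    risk = p_latent_error_at_rebuild(1e-5, survivors=6, scrub_interval_h=interval_h)
    print(f"scrub every {interval_h:>3} h: {risk:.4f}")
```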
Terje
"almost all programming can be viewed as an exercise in caching"
#%^@%!#$!!!! The second one works!!
Goodness, even the summary says "didn't back up? bummer!". Yes, we all know RAID only hedges against hardware failure. The point of this whole exercise is that RAID 5 doesn't even adequately help with hardware failures once data per drive grows large enough.
I'm glad the 'not being a moron' thing worked out for you. But, what would you suggest to those in the audience that cannot claim the same. :-)
OS X?
Firstly, the core determinants of HDD failures are:
The studies by CMU and Google are not broken down at the application level, i.e. by what purpose the HDDs were serving. For example, an HDD serving as an archive will behave differently from an HDD doing constant defragmentation or other read/write-intensive work.
Such a mashing together is therefore "unfair". But OK, let's take the numbers produced by CMU and Google. Their failure rates do seem to threaten the reliability of RAID 5 (and other RAID levels) as disk sizes increase. This issue is immediately resolved by the RAID controller - but yes, it means an extra performance penalty for the RAID implementation.
As such, RAID 5 will not die. It's the RAID controllers that need to be more intelligent, at the expense of performance.
Oops, selected the wrong moderation option. This reply is to wipe that moderation.
Even if it was feasible to buy all these hard drives or a tape drive, the amount of time it would take to properly do all these back-ups on a useful time scale seems to be beyond the reach of the typical user. Even power users do other things in their lives than worry about their computers. I can't see somebody with enough free time to make CD or DVD or tape backups every so often. And if you are copying your whole 1+ TB drive then it would take forever. It may just be that because I'm a college student I have less time than most people with normal jobs, but I see my dad come home late from work almost every day, and then he's just too tired to want to do anything else. So maybe this whole discussion just becomes irrelevant because not too many people realistically have the time to be able to do all this backing up, and would rather just take the risk of running a RAID setup.
Yes, that's what Time Machine is for. Sadly, my Mac is the best backed-up machine here. I have an external Seagate drive hooked up to Time Machine and average around a month of backup points. Twice a year I also burn the things I can't live without, like my iTunes collection, to DVD. I really wish Blu-ray would pick up on Macs for backup purposes. I could back up my iTunes collection with three 50GB BD discs. 135GB of data to back up on 8GB DVDs?
Tapes are cost-prohibitive, and optical hasn't kept up with hard drive capacity. I remember when I could back up my whole computer on 2 CDs. Now, even with BD I'd need 5 discs.
Optical discs have their own problems, but I like to have backups on at least two different types of media. Since tapes are expensive and I've had terrible luck with them professionally, I'd like to stick to optical when possible.
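The disc counts in this comment come from a ceiling division. Here is a trivial sketch, ignoring filesystem overhead and assuming files can be split across discs:

```python
import math

def discs_needed(data_gb: float, disc_gb: float) -> int:
    """Discs required to hold data_gb, ignoring filesystem overhead and
    assuming files can be split across discs."""
    return math.ceil(data_gb / disc_gb)

print(discs_needed(135, 50), discs_needed(135, 8))  # 3 17
```

For the 135GB iTunes collection, that is 3 Blu-ray discs versus 17 of the 8GB DVDs.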
MidnightBSD: The BSD for Everyone