RAID's Days May Be Numbered

Posted by kdawson on Thursday September 17, 2009 @09:15PM from the time-to-try-flit dept.

storagedude sends in an article claiming that RAID is nearing the end of the line because of soaring rebuild times and the growing risk of data loss. "The concept of parity-based RAID (levels 3, 5 and 6) is now pretty old in technological terms, and the technology's limitations will become pretty clear in the not-too-distant future — and are probably obvious to some users already. In my opinion, RAID-6 is a reliability Band Aid for RAID-5, and going from one parity drive to two is simply delaying the inevitable. The bottom line is this: Disk density has increased far more than performance and hard error rates haven't changed much, creating much greater RAID rebuild times and a much higher risk of data loss. In short, it's a scenario that will eventually require a solution, if not a whole new way of storing and protecting data."
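The summary's point, that growing capacity combined with flat hard-error rates raises the risk of data loss, can be made concrete with a small calculation. Below is a minimal Python sketch. It assumes independent bit errors at a fixed rate. The 1-in-10^14-bits default is an assumed, commonly quoted consumer-drive spec figure, not a number from the article, and the function name is hypothetical.

```python
import math

def p_unrecoverable_read(bytes_read: float, bit_error_rate: float = 1e-14) -> float:
    """Probability of at least one unrecoverable read error (URE) while reading
    `bytes_read` bytes, assuming independent bit errors.

    The 1e-14 default (one error per 10^14 bits read) is an assumed spec-sheet figure."""
    bits = bytes_read * 8
    # P(no error) = (1 - ber)^bits; log1p/expm1 keep precision for very small rates
    return -math.expm1(bits * math.log1p(-bit_error_rate))

for tb in (1, 2):
    # hypothetical 8-disk RAID-5: a rebuild reads all 7 surviving disks in full
    print(f"{tb} TB drives: {p_unrecoverable_read(7 * tb * 1e12):.0%}")
```

Under these assumptions, rebuilding with 1 TB drives gives roughly a 43% chance of hitting at least one URE. With 2 TB drives, the same array geometry gives roughly 67%. The error rate per bit did not change, but the amount of data that must be read did.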

17 of 444 comments

Solved a Long Time Ago by BBCWatcher · 2009-09-17 21:27 · Score: 4, Informative

Honestly, there really aren't that many unsolved problems in computing if you are aware enough to include mainframes and mainframe operating disciplines in your consideration. The basic way the mainframe community solved this particular problem long ago was, first, to take a holistic view of mitigating data loss. Double concurrent spindle failures are just one possible risk element. What about, for example, an entire data center exploding in a spectacular fireball? (Or whatever.) IBM, for example, came up with several different flavors of GDPS and continues to refine them, and they include multiple approaches to data storage tiering across geographies, depending on what you're trying to achieve. Data loss, whether physical or otherwise (such as security breaches), is not a particular problem with this class of technology and associated IT discipline, nor do there seem to be any signs of a growing problem in this particular technology class.

Re:simple idea by paulhar · 2009-09-17 21:34 · Score: 4, Informative

Enterprise arrays copy all the good data off the drive to a spare drive, use RAID to recover the failed sector(s), then fail the broken disk.
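The copy-then-patch approach described above can be sketched as follows. The sketch assumes a single-parity (RAID-5 style) stripe layout, where any one member of a stripe equals the XOR of the others. The disk-as-list-of-blocks model and the function names are hypothetical; real arrays do this in controller firmware. Readable blocks are copied directly. Only the unreadable ones require reading every other disk in the stripe.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks (RAID-5 style parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def migrate_to_spare(failing, stripe_peers):
    """Copy a failing disk onto a spare, using RAID only for sectors that won't read.

    failing      -- blocks from the suspect disk; None marks an unreadable sector
    stripe_peers -- the other stripe members (data + parity), each a list of blocks
    Returns the spare's contents and how many blocks had to be reconstructed."""
    spare, rebuilt = [], 0
    for i, block in enumerate(failing):
        if block is not None:
            spare.append(block)  # still readable: plain copy, no other disk touched
        else:
            # unreadable: rebuild this block from the rest of its stripe
            spare.append(xor_blocks([peer[i] for peer in stripe_peers]))
            rebuilt += 1
    return spare, rebuilt
```

Only after the spare is fully populated would the array retire the suspect disk. Until then, it keeps running with full redundancy.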
ZFS by DiSKiLLeR · 2009-09-17 22:16 · Score: 5, Informative

This is something the ZFS creators have been talking about for some time, and have been actively trying to solve.
ZFS now has triple parity, and it actively checksums every disk block.
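The per-block checksum idea can be sketched in a few lines. This is not ZFS code. SHA-256 from Python's hashlib stands in here, and ZFS offers several checksum algorithms. The function names and the list-of-copies model are hypothetical. The point is that a checksum stored apart from the data lets the reader tell which copy is good, rather than merely that copies disagree.

```python
import hashlib

def checksum(block: bytes) -> str:
    """Checksum kept apart from the block itself (in ZFS, in the parent's block pointer)."""
    return hashlib.sha256(block).hexdigest()

def read_verified(copies: list[bytes], expected: str) -> bytes:
    """Return the first candidate copy whose checksum matches.
    `copies` stands for the versions of one block obtainable from mirrors or parity."""
    for data in copies:
        if checksum(data) == expected:
            return data
    raise OSError("no copy of the block passed its checksum")

good = b"payroll records"
stored = checksum(good)
silently_corrupted = b"payroll recorcs"  # a flipped character the disk never reported
print(read_verified([silently_corrupted, good], stored))
```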

--
You can tell how powerful someone is by the magnitude of the crime they can commit and be able to get away with.
1. Re:ZFS by DiSKiLLeR · 2009-09-17 22:22 · Score: 5, Informative
  
  I thought I should add:
  ZFS speeds up rebuilding a RAID (called resilvering) compared with traditional non-intelligent or non-filesystem-based RAIDs by only rebuilding the blocks that actually contain live data; there's no need to rebuild EVERYTHING if only half the filesystem is in use.
  ZFS also starts the resilvering process by rebuilding the most IMPORTANT parts first: the filesystem metadata. It then works its way down the tree to the leaf nodes, rebuilding data. This way, if more disks fail, you have rebuilt as much recoverable data as possible. If filesystem metadata is hosed, everything is hosed.
  ZFS tells you which files, if any, are corrupt because insufficient replicas exist due to failed disks.
  All this on top of double or triple parity. :)
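The metadata-first, live-data-only ordering described above can be sketched as a breadth-first walk of a block tree. This illustrates the idea only and is not ZFS's actual traversal code. The Block type and the block names are hypothetical. Unallocated space never appears in the tree, so it is never visited, and that is where the savings over a whole-disk rebuild come from.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Block:
    name: str
    children: list = field(default_factory=list)  # blocks this one points to

def resilver_order(root: Block) -> list:
    """Breadth-first walk from the root: blocks near the top of the tree (metadata)
    are rebuilt before leaf data, and only blocks reachable from the root (live data)
    are visited at all."""
    order, queue = [], deque([root])
    while queue:
        blk = queue.popleft()
        order.append(blk.name)
        queue.extend(blk.children)
    return order

tree = Block("root", [Block("meta-A", [Block("data-1"), Block("data-2")]),
                      Block("meta-B", [Block("data-3")])])
print(resilver_order(tree))
```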
  
Re:reallocate on write by Erik+Hensema · 2009-09-17 22:19 · Score: 4, Informative

That's what any RAID controller worth its salt does. I've seen 3ware and Areca controllers do this, and those are far from the most expensive controllers on the market.

--
This is your sig. There are thousands more, but this one is yours.
Re:I thought RAID was about spindle count by gedhrel · 2009-09-17 22:30 · Score: 4, Informative

You don't rely on RAID to avoid data loss; you rely on it as a first line in providing continuity. We run backups of large systems here, but we tend to do other things too: synchronous live mirroring of the critical data between sites, and better system design. There are some systems where, whilst we _could_ go back to tape (or VTL) at a pinch, having to do so would be a disaster in itself.
We're designing systems that permit rapid service recovery (the most critical live data first) and a second tier of online recovery to get the rest back. We just can't afford the downtime.
Double-spindle failures on RAID systems are just one of those things that you _will_ see. Deciding whether a system deserves some other measure of redundancy is mostly an actuarial, rather than a technical, decision.
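The actuarial framing above can be sketched numerically. Below is a minimal Python example. It assumes independent disk failures at a constant annual failure rate (AFR) and a fixed rebuild window. It ignores unrecoverable read errors and correlated failures (same batch, same power event), both of which make double failures more likely in practice. Every number and function name here is hypothetical.

```python
HOURS_PER_YEAR = 8760

def p_double_failure_per_year(n_disks: int, afr: float, rebuild_hours: float) -> float:
    """Rough yearly probability that a second disk fails during a rebuild.
    Assumes independent failures at a constant annual failure rate `afr`."""
    expected_first_failures = n_disks * afr   # first failures per year (small-rate approximation)
    window = rebuild_hours / HOURS_PER_YEAR   # rebuild window as a fraction of a year
    # chance that at least one of the n-1 survivors dies inside that window
    p_second = 1 - (1 - afr) ** ((n_disks - 1) * window)
    return min(1.0, expected_first_failures * p_second)

def extra_redundancy_pays(p_event: float, cost_per_event: float, cost_per_year: float) -> bool:
    """The actuarial test: expected yearly loss versus the yearly cost of more protection."""
    return p_event * cost_per_event > cost_per_year

p = p_double_failure_per_year(n_disks=8, afr=0.03, rebuild_hours=24)
print(f"{p:.1e}")
print(extra_redundancy_pays(p, cost_per_event=1e6, cost_per_year=1e4))
print(extra_redundancy_pays(p, cost_per_event=1e8, cost_per_year=1e4))
```

The failure probability is identical in both calls. Only the cost of an outage changes the answer, which is why the decision is more actuarial than technical.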
Re:RAID is here to stay by Kjella · 2009-09-17 22:54 · Score: 4, Informative

And when RAID 6 has a high enough risk that it's worth expanding the scheme, everyone will start switching from double parity schemes to triple parity schemes, since they're much less expensive in terms of spindle count than RAID 6+1.
I don't think you've quite understood the problem described. You can have an infinite number of parity disks, but it does you no good if recovering one data disk causes another data disk to fail.
Imagine a disk fails after every 100TB (10^14 bytes) of reads. You have ten 1TB data disks. Imagine you keep them in perfect rotation so they've spent 10, 20, 30, 40, 50, 60, 70, 80, 90 and 100% of their lifetime. The last disk dies and you replace it with a new drive (0%). To rebuild the drive you read 1TB from each data disk and use whatever parity you need. They've now spent 11, 21, 31, 41, 51, 61, 71, 81, 91 and 1% (your new disk) of their lifetime, and you can read another 9TB before you need a new disk.
Now we try doing the same with ten 10TB disks and the same reliability. The last disk dies and you replace it, only now you must read 10TB from each disk. Instead of adding 1% to the lifetime it adds 10% so that they've spent 20, 30, 40, 50, 60, 70, 80, 90, 100 and 10% (your new disk) of their lifetime. But now another disk fails, you can recover that but then another will fail and another and another and another.
Basically, parity does not solve that issue. If you had a mirror, you would instead copy the mirrored disk with significantly less wear on the disks. RAID is very nice as a high-level check that the data isn't corrupted but it's a very inefficient way of rebuilding a whole disk.
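The thought experiment above can be run directly. Below is a minimal Python sketch of the same toy model. As in the comment, it assumes that a disk dies deterministically after exactly 100 TB of reads, that every rebuild reads each survivor once in full and writes the replacement once in full, and that only one disk fails at a time. The function name and the cap on rebuilds are hypothetical.

```python
def rebuild_chain(disk_tb: int, endurance_tb: int = 100, max_rebuilds: int = 20):
    """Count back-to-back rebuilds in the toy model, stopping at max_rebuilds.
    Wear is tracked in whole percent of each disk's lifetime."""
    step = disk_tb * 100 // endurance_tb   # % of lifetime used by one full-disk read or write
    wear = [10 * i for i in range(1, 11)]  # ten disks at 10%, 20%, ..., 100% of lifetime
    rebuilds = 0
    while rebuilds < max_rebuilds and max(wear) >= 100:
        wear.remove(max(wear))             # the worn-out disk dies and is pulled
        wear = [w + step for w in wear]    # every survivor is read once in full
        wear.append(step)                  # the replacement is written once in full
        rebuilds += 1
    return rebuilds, sorted(wear)

print(rebuild_chain(1))   # 1 TB disks: one rebuild, then every disk is below 100%
print(rebuild_chain(10))  # 10 TB disks: each rebuild wears out the next disk, so it hits the cap
```

With 1 TB disks, the model reproduces the comment's 1%, 11%, ..., 91% figures and stops after one rebuild. With 10 TB disks, every rebuild pushes the next-oldest disk to 100%, so the loop only stops at the artificial cap.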

--
Live today, because you never know what tomorrow brings
Re:Hardware RAID is dead by RulerOf · 2009-09-17 22:59 · Score: 3, Informative

FWIW, I'm a happy 3ware customer... saddened by their sellout to LSI, but I digress.

When I think of software RAID, I think of parity data being handled by the operating system, being done on x86 chips as part of the kernel or offloaded via a driver (thinking Fake-RAID).

If you're abstracting your storage away from the operating system that uses it, say via iSCSI, NFS or SMB to a dedicated storage box like a NetApp filer or a Celerra, then I would personally consider that hardware RAID. If you're saying that these dedicated storage boxes handle parity, mirroring and so on with the same chip that also runs their local operating system, then I have to admit that yes, that sounds like software RAID to me. But the real distinction I've come to draw between software and hardware RAID is one of performance and feature set. If such boxes give a workload the same or better performance (IOPS and throughput) as a dedicated internal storage system managed by something like my 9650SE, then hell... who cares, right? Aside from being rather impressed that it's possible without dedicated XOR chips, that is.

--
Boot Windows, Linux, and ESX over the network for free.
Re:Bogus outdated thinking by daybot · 2009-09-17 23:04 · Score: 3, Informative

And name 3 people you know who run raid-5 on their personal PCs, and I'll show you 3 guys who can't afford an SSD drive.
Yeah, every time an article on storage catches my eye, I have to check laptop SSD prices. So far, each time I do this, for the cost of a drive the size I need, I could buy a new snowboard, or a laptop, bike, half a holiday, room full of beer... etc. I really want one, but so far I haven't been able to look at that list and say "I'd rather have an SSD!"
Re:simple idea by paulhar · 2009-09-17 23:05 · Score: 3, Informative

They do, with varying degrees of success, but just because a disk can't read a particular sector doesn't mean that the drive is faulty; it could be a simple error in the onboard controller that is causing the issue.
FC/SAS drives mostly leave error handling to the array rather than doing it themselves, because the array can typically make better decisions about how to deal with the problem, and this helps with time-sensitive applications. The array can choose to issue additional retries, reboot the drive while continuing to use RAID to serve the data, etc.
Consumer SATA drives, on the other hand, try really hard to recover from the problem, for example by retrying again and again with different methods to get the sector. While admirable, that leads to the behaviour we see in consumer land where the PC just "locks up". The assumption there is that no RAID is available, so reporting an error back to the host is "a bad thing". The enterprise SATA drives we're seeing on the market are starting to disable this automatic behaviour so that they behave correctly when inserted into RAID arrays.
Usually ;-)
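The array-side policy described above, where the array rather than the drive decides when to stop waiting, can be sketched with asyncio timeouts. Everything here is hypothetical. The drive functions only simulate a drive that keeps retrying a bad sector. The tiny timeouts keep the demo fast, whereas real controllers wait seconds. `from_parity` stands in for XOR-ing the rest of the stripe.

```python
import asyncio

async def stubborn_drive_read(lba: int, recovery_s: float) -> bytes:
    """Simulated consumer-style drive: keeps retrying a bad sector for
    recovery_s seconds before finally reporting an error."""
    await asyncio.sleep(recovery_s)
    raise OSError(f"unrecoverable read error at LBA {lba}")

async def healthy_drive_read(lba: int) -> bytes:
    """Simulated drive that returns data promptly."""
    return f"data@{lba}".encode()

def from_parity(lba: int) -> bytes:
    """Stand-in for reconstructing the block from the other stripe members."""
    return f"rebuilt@{lba}".encode()

async def array_read(lba, drive_read, reconstruct, timeout_s):
    """Array-side policy: give the drive timeout_s to answer. On a timeout or an
    explicit error, stop waiting and serve the block from the rest of the stripe."""
    try:
        return await asyncio.wait_for(drive_read(lba), timeout_s)
    except (asyncio.TimeoutError, OSError):
        return reconstruct(lba)

async def demo():
    slow = await array_read(7, lambda lba: stubborn_drive_read(lba, 0.5), from_parity, 0.05)
    fast = await array_read(8, healthy_drive_read, from_parity, 0.05)
    return slow, fast

if __name__ == "__main__":
    print(asyncio.run(demo()))
```

If the drive's own recovery time is longer than the array's patience, the array gives up first. A real controller may then also mark the whole drive as failed, which is why drives intended for RAID keep their internal retries short.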
Re:Bogus outdated thinking by Lumpy · 2009-09-18 00:32 · Score: 4, Informative

The problem is IT guys and PHBs who think RAID = backup.
It's not, and it never has been, a backup solution. RAID is high availability and nothing more.
RAID does its job perfectly for high availability and will continue to do so for decades. Sorry, but I have yet to see any other technology deliver the capacity I use for our small 30TB database at work. Our RAID 50 array works great. We also mirror it in real time to the backup SQL server (not as a backup of the data, but as a backup of the entire server, so that when SQL1 goes offline SQL2 picks up the work).
SQL2 is backed up to an SDAT tape magazine nightly.
RAID does what it's supposed to do perfectly. Its days are not numbered, because no technology other than RAID can provide high availability.

--
Do not look at laser with remaining good eye.
Re:simple idea by operagost · 2009-09-18 02:05 · Score: 4, Informative

I'll assume you aren't trolling, and point out that disks work BECAUSE OF the air inside. The heads gain lift.

--

Gamingmuseum.com: Give your 3D accelerator a rest.
Re:simple idea by operagost · 2009-09-18 02:11 · Score: 3, Informative

The only real difference between WD's enterprise SATA and their consumer line (other than, perhaps, the warranty) is a firmware setting that determines how long it attempts to write to a sector before giving up and using a spare block. It has to be reduced for enterprise use so that the RAID controller doesn't fail the disk prematurely. My WD disks kept "failing" until I set this timeout shorter. It's been a year since I did that, and I've had no failures or data corruption. It's possible that this is no longer the case for their latest models.

Re:simple idea by amoeba1911 · 2009-09-18 02:12 · Score: 3, Informative

Speed of sound at sea level: 340.29 m/s
$$v = \pi \times (3.5\,\mathrm{in} \times 2.54\,\mathrm{cm/in}) \times \frac{15000\,\mathrm{rev/min}}{60\,\mathrm{s/min}} \times 0.01\,\mathrm{m/cm} \approx 69.82\,\mathrm{m/s}$$
If my calculation is correct, the outer edge of a 3.5" platter spinning at 15000 RPM is moving at 69.82 m/s, which is about 20% of the speed of sound. It's fast, but it's nowhere near the speed of sound.
Re:simple idea by Zenaku · 2009-09-18 02:12 · Score: 4, Informative

Air is necessary for the read/write head to operate. The piece that comes into close proximity with the platter is essentially a tiny hovercraft. It's about the size of a pepper flake, and has a microscopic pattern called an "air bearing" carved into the side facing the platter. Designing this air bearing is an exercise in fluid dynamics: it is the shape of the bearing and how air flows over it that allows the read/write head to skim over the platter at a distance measured in nanometers without actually touching its surface.
If the read/write head does contact the surface of the platter, that is called a head crash, and is bad.

--
If fate makes you a motorcycle, you become a motorcycle.
Re:simple idea by AlecC · 2009-09-18 03:20 · Score: 3, Informative

No: to reconstruct 1 sector you have to read one sector from every other drive, then write 1 sector to the replacement drive. Effectively, to reconstruct a drive you have to read the whole RAID, so both the read and write speeds count.
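Here is a back-of-the-envelope version of this point as a minimal Python sketch. It assumes a single-parity array whose survivors are read in parallel, so the slower of per-disk read speed and replacement write speed sets the pace. It ignores controller overhead and ongoing production I/O, both of which stretch real rebuilds considerably. The function name and the numbers are hypothetical.

```python
def rebuild_estimate(n_disks: int, disk_tb: float, read_mb_s: float, write_mb_s: float):
    """Return (bytes read from the survivors, hours) for rebuilding one disk."""
    disk_bytes = disk_tb * 1e12
    bytes_read = (n_disks - 1) * disk_bytes   # one full pass over every surviving disk
    pace = min(read_mb_s, write_mb_s) * 1e6   # bytes/s: the slower side limits the rebuild
    hours = disk_bytes / pace / 3600
    return bytes_read, hours

for tb in (1, 2):
    read, hours = rebuild_estimate(n_disks=8, disk_tb=tb, read_mb_s=120, write_mb_s=100)
    print(f"{tb} TB disks: read {read / 1e12:.0f} TB, about {hours:.1f} h")
```

Doubling the disk size doubles both the data read and the rebuild time. Per-disk speed improves only slowly, so rebuilds keep getting longer as disks grow.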

--
Consciousness is an illusion caused by an excess of self consciousness.
Re:fill the drive with helium by the_other_chewey · 2009-09-18 10:35 · Score: 3, Informative

Filling the drive with helium should help;
Yeah. For about half a week. Helium has the smallest "gas particles" there are. Hydrogen atoms would be smaller, but they really like to bond, and an H₂ molecule is quite a bit larger than a helium atom.

That's why He leaks out of everything. No exception. It diffuses through "leakproof" welds on vacuum tanks. It diffuses through the steel walls of tanks (albeit more slowly). That's also why He is used in leak detection: if, within a couple of seconds of injecting a little He into the container under test, you detect fewer than $not_so_few He atoms outside it, the container is considered airtight.

The only way to keep a He atmosphere in your drive would be to refill it constantly. I don't think there'll be any scenario where that seems like even a remotely good idea.