RAID's Days May Be Numbered

simple idea by shentino · 2009-09-17 21:16 · Score: 2, Interesting

Don't consider an entire drive is dead if you get a piddly one-sector error.

Just mark it read only and keep chugging.

Re:simple idea by Coren22 · 2009-09-18 00:49 · Score: 3, Interesting

They aren't talking about drive speeds as much as failure rate:

The bottom line is this: Disk density has increased far more than performance and hard error rates haven't changed much, creating much greater RAID rebuild times and a much higher risk of data loss.
They are talking about the MTBF of drives has not gone up as fast as the capacity, and the fact that a missed write is actually quite likely with a modern high capacity drive. Even saying drive speeds haven't gone up is very accurate, 15k RPM drives have been around for quite a while now, at least for 10 years, and there has not been an improvement in speed in that time. Where are my 30k RPM drives?~
Also, I have a bit of a problem with your statement about OMG small enterprise drives. Enterprise drives have caught up to consumer drives in size, you can now buy 1TB SAS drives; they are just OMG expensive compared to the consumer drives.

--
APK likes to ask for responses to the same things over and over. Maybe he just likes the responses?
Re:simple idea by alva_edison · 2009-09-18 00:55 · Score: 4, Interesting

The problem becomes space in the data center. I don't know about you, but we're trying to cram Petabytes into existing computer rooms and coming up short. Plus you don't address Tier 2 or Tier 3 storage which tends to be on SATA or near-line SAS both of which have the ridiculous size problem. Calling 15,000 RPM fast in the datacenter is also misleading because those are the speeds we've been at for a few years now, 10GB iSCSI (or FCoE, which bypasses the collison problem) is about to render that untenable. The current solution tends toward storage virtualization (in this case virtualization means excessive amounts of high-speed cache in front of controllers and less control on where controllers allocate space). The future is most likely some kind of grid technology (like XIV from IBM). Where any blcok is on two random drives in the array, and only the controller knows where. This means that drive rebuilds become subject to swarm speeds (since there is an equal chance that it is pulling data from every other drive in the tower).

--
He effected a bored affect.
Re:simple idea by paulhar · 2009-09-18 01:02 · Score: 4, Interesting

You're not likely to see 30k RPM drives any time soon. The speed of a 15k drive means that the outer edge of the 3 1/2" drive is spinning pretty fast... getting close to the speed of sound and the lions share of power consumed by 15k drives is consumed in counteracting the air buffeting the heads. With 2 1/2" drives we could go faster but while drives are open to the air it's not likely we'll see much in the short term.
It's why CDROM speeds haven't gone up much since the old day of 52x.
As areal density improves the drives will be able to push out more raw MB/sec just like DVD is better than CD, but in terms of IOPs it's not likely to dramatically improve.
Re:simple idea by denis-The-menace · 2009-09-18 01:37 · Score: 2, Interesting

Why not add multiple heads to the same platter?
Keep the disk spinning at 15K but add heads with their own actuator and everything. One could read only the other write only. Whatever makes sense.

--
Obama's legacy: (N)othing (S)ecure (A)nywhere and (T)error (S)imulation (A)dministration
Re:simple idea by Firethorn · 2009-09-18 02:15 · Score: 4, Interesting

Even partial evacuation would help, but you run into the problem that the read heads are designed to use the air to keep them from contacting the platters, so you'd need to replace that effect somehow.
The Space shuttle and ISS even have special sensors to shut the hard drives down if the air pressure goes too low. Reading about which was how I found out that hard drives are designed to use air.
Not to mention that you're now trying to build an air tight container, but if you're looking at ultra-high performance drives that's less of an issue.
Still, you have to look at how much such a drive would cost, and whether the cost would ever be repaid - if I was looking at investing in such technology I'd be concerned that Flash would outpace my vacuum drives before I got them released. Even if I DO manage to find a niche, would the niche last long enough against flash memory that's getting faster and cheaper so quickly?
For certain data sets and access patterns, flash is already much cheaper than the old raid options - the best example I saw was a dataset of a few hundred gigabytes that was mostly read-only, but accessed so much so randomly they had to mirror it on 10 hard drives to meet the read demands. One professional level SSD performed BETTER, while costing less than half of the setup.

--
I don't read AC A human right
Re:simple idea by Gothmolly · 2009-09-18 02:37 · Score: 3, Interesting

Can you say "instantaneous heat death" ? Vacuum is an excellent insulator.

--
I want to delete my account but Slashdot doesn't allow it.
Re:simple idea by pjr.cc · 2009-09-18 06:50 · Score: 3, Interesting

Unfortunately all that is quite a myth for the most part.
Having worked in storage for a aeons the reality is that the difference between enterprise and "consumer grade rubbish" has very little to do anything but tollerance. If you picked up a 300G 10k enterprise drive and compared it to the consumer grade rubbish you'd find nothing different. It used to be the case, way back when, that they were very different but because consumer grade drives have gotten so much better its just not worth the expense of building the same drive for enterprise as for consumers with slightly different specs. What is different is the acceptable tollerances, when a platter comes off the line if its within 2% of its manufacturing tollerances its ok to use for entperise and if its higher they throw it into consumer. The reality is that most drives are in that "better than 2% tollerance" range and that is simply because the processes to make them have gotten so good over the years. The point is that when you hit your magic tollerance number, the drive is capable of 100% duty cycle.
So essentially, the difference between "consumer" and "enterprise" when it comes to the casing, the platters, the heads and the motors is zero. There are alot of different spec drives out there today ranging from 146gb (typically the smallest you'll find these days) all the way to 2gb with speeds form 7200 to 15000 rpm and enterprise is the only place that uses all of them, but they still come off the same manufacturing line. The drivers behind it all come down to the consumer itself, in enterprise its often about performance, and with consumers its about size. Very conveniently building bigger consumer grade drives typically means improving the performance of a drive in ways that scale straight back to the enterprise. Sure, you wont see many users throwing around 15k rpm drives, but thats more because its unnecessary.
So why is it that in the mid-to-low server range do we find 300gb 15k drives? Because its a cheap way of getting performance - and that is fairly important at that end of the market where servers need to be cheap and theres alot of competition (you know, 1-2ru with 4-8 drives and a raid card, no san).
So what else differs between the two? Interface. In the mid-to-low server range we start talking SAS and this is more to do with being able to talk to several drives at once (Again not something alot of consumers do other than with usb drives perhaps). The SAS interface is quite brilliant cause it can scale quite well to a larger number of drives than can SATA and does it very cheaply. It also takes alot of load off the server when it comes to processing data transfer (for a large number of drives). But in that same space you WILL find sata drives going up to 2tb (often servers lag consumers in size simply because of certification, not because of anything to do with stability). To call a 1tb drive unstable is rather silly in reality.
Now the BIG end of town - SAN's. These days in most SAN's you'll find a mix of SATA and Fibre channel (some do do SAS as well, but its uncommon though its changing). In the SAN end of town (the big boy game) you'll see it all. 7.2k rpm 2tb SATA's sitting in the same array along side 146g 15k RPM fibre channel and its all about trading off storage density/cost to performance. Consider this: 10 1tb sata drives can consume (easily) a 8gbps FC interface - OUCH! Now alot of SAN arrays start at around 4 FC intercaes and go up to maybe 16, but they'll be supporting literally thousands of drives. Alot of the SAN industry realised some time ago that throwing 2tb SATA's into an array made alot of sense because SAN interfaces have grown very slowly in terms of throughput and single HD interfaces have grown very quickly. There are even several very popular arrays that only do SATA and that was the driver behind "enterprise" grade large-storage drives (i.e. entperise grade 1tb+ sata drives). At the server you still get the fibre channel performance. The critical difference is that the array does more work

Bogus outdated thinking by twisteddk · 2009-09-17 21:28 · Score: 5, Interesting

The author says it himself in the article:

"And running software RAID-5 or RAID-6 equivalent does not address the underlying issues with the drive. Yes, you could mirror to get out of the disk reliability penalty box, but that does not address the cost issue."

but he hasn't adressed the fact that today you get 100 times as much diskspace for the same cost as you did 10 years ago when cost was a factor. In real life cost isn't a factor when it comes to datastorage, simply because it's really low in real life projects, as compared to the other costs in a project requiring storage. So if you want the reliability you go get a mirror. Drivespace is dirt cheap.

As for the rebuildtimes, fine, go buy FASTER drives. I dont see the problem. HP and many other vendors have long been trying to sell combined raid soltions (like the EVA) where you mix high storage with high performance drives (like SSD vs. SATA).

The only real argument for the validity of this article is the personal use of drives/storage. And name 3 people you know who run raid-5 on their personal PCs, and I'll show you 3 guys who can't afford an SSD drive.

--
--- To err is human... Am I more human than most ?

Re:Bogus outdated thinking by Svartalf · 2009-09-18 01:37 · Score: 2, Interesting

RAID5 is not backup. It's resilience for bringing the whole system down with a failure.
RAID was originally developed to make what we consider small storage capacities (then massive) affordable and reasonably reliable.
You're using RAID5 in it's "intended" use- but an SSD of the same capacity will be inherently MORE reliable (by a factor of how many of those magnetic disks you remove) than your system design right now.
From personal experience with a system customer base of literally thousands of enterprise class servers spread out over many companies, RAID doesn't work QUITE the way people make it out to be. We're ripping it out of the equipment and reverting to warm backups instead- the RAID1 design they fielded made the servers unstable.
The field engineer crowd (one of my friends worked with Nortel in the field engineer group and my brother is a manager for outsource company doing a lot of the same work with the same customers...) HATES RAID.
Blow a controller? Better hope you have an identical one in stock. You can't just swap out a differing controller of the same brand or pop a different brand in- they all do things ever so slightly differently on the disks.
Blow a disk? Better hope you can get the new drive in there and integrate it properly before you lose another.
Disks don't have the reliability we once thought they had.
RAID doesn't do what most people thinks it does for them.

--
I am not merely a "consumer" or a "taxpayer". I am a Citizen of the State of Texas

There are always more solutions... by Anonymous Coward · 2009-09-17 21:37 · Score: 1, Interesting

Probably the next meta solution after RAID 6 will be something like ZFS, where the filesystem that works not just on the fs-specific layer, but on the LVM layer so it can log CRCs of files and immediately be able to tell if a file got corrupted (and perhaps fix it with some ECC records.) One can see a filesystem not just writing a RAID layer, but taking recovery data and storing that away as filesystem metadata.

Of course, there is always doing redundant arrays of RAID clusters, say three groups, two data, one parity, or mirroring RAID 5 volumes. You have the usual tradeoffs: The more fancy the RAID scheme, the more disks you need, and the more computing you have to do for every bit thrown at and read off the array.

Long term solution? A move to something other than magnetic storage. This could be optical, it could be SSD if some advance allows very large density increases, or something unknown. The technology would have to have a rate of failure magnitudes better than magnetic, as well as a cost on par with magnetic for it to completely work. Holographic storage has languished for a while, perhaps as the technology improves for that, we may see drives using 3D blocks of that replacing the old fashioned spindles.

Hardware RAID is dead by PiSkyHi · 2009-09-17 21:46 · Score: 3, Interesting

Hardware RAID is dead - software for redundant storage is just getting started. I am looking forward to making use of btrfs so I can have some consistency and confidence to how I deal with any ultimately disposable storage component.

The ZFS folks have been doing it fine for some time now.

Hardware RAID controllers have no place in modern storage arrays - except those forced to run Windows

Non-issue ... by Lazy+Jones · 2009-09-17 21:47 · Score: 3, Interesting

Modern RAID arrays show no dramatic performance degradation while rebuilding, also with RAID-50/RAID-60 arrays, only a fraction of the disk accesses is slower than usually when a single drive is replaced.

For enterprise level storage systems, this is also a non-issue because of thin provisioning.

--
"I love my job, but I hate talking to people like you" (Freddie Mercury)

If you want smaller drives... by asdf7890 · 2009-09-17 22:05 · Score: 4, Interesting

If you want smaller drives to speed up rebuild times then, erm, buy smaller drives? You can get ~70Gb 10Krpm and 15Krpm drives fairly readily - much smaller than the 500-to-2000-Gb monsters and faster too. You can still buy ~80Gb PATA drives too, I've seen them when shopping for larger models, though you only save a couple of peanuts compared to the cost of 250+Gb units.

If you can't afford those but still don't want 500+Gb drives because they take too long to rebuild if the array is compromised and needs a rebuild, and management won't let you buy bog standard 160Gb (or smaller) drives as they only cost 20% less than 750Gb units without the speed benefits of the high cost 15Krpm ones, how about using software RAID and only using the first part of the drive? Easily done with Linux's software RAID (partition the drives with a single 100Gb (for example) partition, and RAID that instead of the full drive) and I'm sure just as easy with other OSs. You'll get speed bonuses too: you'll be using the fastest part of the drive in terms of bulk transfer speed (most spinning drives are arranged such that the earlier tracks have higher data density) and you'll have lower latency on average as the heads will never need to move the full diameter of the platter. And you've got the rest of the drive space to expand onto if needed later. Or maybe you could hide your porn stash there.

ZFS, Anyone? by Tomsk70 · 2009-09-17 22:09 · Score: 2, Interesting

I've managed to get this going, using the excellent FreeNAS - although proceed with caution, as only the beta build supports it, and I've already had serious (all data lost) crashes twice.

However the principle is sound, and I'm sure this will become standard before long - the only trouble being that HP, Dell and the like can't simply offer upgrades for existing RAID cards - due to the nature of ZFS, it needs a 'proper' CPU and a gig or two or RAM. Even so, it does protect against many of the problems now besetting RAID (which was never meant to handle modern, gargantuan disk sizes).

Fountain codes? by andrewagill · 2009-09-17 22:11 · Score: 3, Interesting

What about fountain codes? The coding there is capable of recovering from a greater variety of faults.

Re:Fountain codes? by Anonymous Coward · 2009-09-17 22:44 · Score: 1, Interesting

Why fountain codes ? Any other erasure code http://en.wikipedia.org/wiki/Erasure_codes will do the job. Parity and Reed Solomon codes used in RAID are in fact erasure codes.

Re:Worked-around a Long Time Ago by Anonymous Coward · 2009-09-17 22:22 · Score: 5, Interesting

But really none of that should be necessary for the general case. Storing data in different physical locations is a good but entirely unrelated issue- the main problem of disk reliability is still very much in need of a solution. That's pretty much the point of the article: You can come up with various solutions which move the problem around, give multiple fallbacks for when something goes wrong.. but there's still the problem of things going wrong in the first place. I shouldn't need to use 12 separate disks spread across the globe just for basic reliability / redundancy

Old news by EmTeedee · 2009-09-17 22:22 · Score: 2, Interesting

Read that before on slashdot. Why RAID 5 Stops Working In 2009

Parity declustering by Biolo · 2009-09-17 22:37 · Score: 4, Interesting

Actually I like the parity declustering idea that was linked to in that article, seems to me if implemented correctly it could mitigate a large part of the issue. I have personally encountered the hard error on RAID5 rebuild issue, twice, so there definitely is a problem to be addressed...and yes, I do now only implement RAID6 as a result.

For those who haven't RTFATFALT (RTFA the f*** article links to), parity declustering, as I understand it, is where you have, say, an 8 drive array, but where each block is written to only a subset of those drives, say 4. Now, obviously you loose 25% of your storage capacity (1/4), but consider a rebuild for a failed disk. In this instance only 50% of your blocks are likely to be on your failed drive, so immediately you cut your rebuild time in half, halving your data reads, and therefore your chance of encountering a hard error. Larger numbers of disks in the array, or spanning your data over fewer drives, cuts this further.

Now, consider the flexibility you could build into an implmentation of this scheme. Simply by allowing the number of drives a block spans to be configurable on a per block basis, you could then allow any filesystem that is on that array to say, on a per file basis, how many disks to span over. You could then allow apps and sysadmins to say that a given file needs to have the maximum write performance, so diskSpan=2, which gives you effectively RAID10 for that file (each block is written to 2 drives, but with multiple blocks in the file is likely to be written to a different pair of drives, not quite RAID10, but close). Where you didn't want a file to consume 2x its size on the storage system, you could allow a higher diskSpan number. You could also allow configurable parity on a per block basis, so particularly important files can survive multiple disk failures, temp files could have no parity. There would need to be a rule however that parity+diskSpan is less than or equal to the number of devices in the array.

Obviously there is an issue here where the total capacity of the array is not knowable, files with diskSpan numbers lower than the default for the array will reduce the capacity, numbers higher will increase it. This alone might require new filesystems, but you could implement todays filesystems on this array as long as you disallowed the per-block diskSpan feature.

This even helps for expanding the array, as there is now no need to re-read all of the data in the array (with the resulting chance of encountering a hard error, adding huge load to the system causing a drive to fail, etc). The extra capacity is simply available. Over time you probably want a redistribution routine to move data from the existing array members to the new members to spread the load and capacity.

How about you implement a performance optimiser too, that looks for the most frequently accessed blocks and ensures they are evenly spread over the disks. If you take into account the performance of the individual disks themselves, you could allow for effectively a hierarchical filesystem, so that one array contains, say, SSD, SAS and SATA drives, and the optimiser ensures that data is allocated to individual drives based on the frequency of access of that data and the performance of the drive. Obviously the applications or sysadmin could indicate to the array which files were more performance sensitive, so influencing the eventual location of the data as it is written.

--
Stealing a rhinoceros should not be attempted lightly.

Remembering an article earlier this week: by Chrisq · 2009-09-17 22:44 · Score: 3, Interesting

Will scalable distributed storage systems like Hadoop and Google File System take over from RAID?

RAID concept is fine, it's that HDs are too big by trims · 2009-09-17 22:45 · Score: 5, Interesting

As others have mentioned, this is something that is discussed on the ZFS mailing lists frequently.

For more info there, check out the digest for zfs-discuss@opensolaris.org

and, in particular, check out Richard Elling's blog

(Disclaimer: I work for Sun, but not in the ZFS group)

The fundamental problem here isn't the RAID concept, is that the throughput and access times of spinning rust haven't changed much in 30 years. Fundamentally, today's hard drive is no more than 100 times as fast (both in throughput and latency) than a 1980s one, while it holds well over 1 million times more.

ZFS (and other advanced filesystems) will now do partial reconstruction of a failed drive (that is, they don't have to bit copy the entire drive, only the parts which are used), which helps. But there are still problems. ZFS's pathological case results in rebuild times of 2-3 WEEKS for a 1TB drive in a RAID-Z (similar to RAID-5). It's all due to the horribly small throughput, maximum IOPs, and latency of the hard drive.

SSDs, on the other hand, are no where near the problem. They've got considerably more throughput than a hard drive, and, more importantly, THOUSANDS of times better IOPS. Frankly, more than any other reason, I expect the significant IOPS of the SSD to signal the death knell of HDs in the next decade. By 2020, expect HDs to be gone from everything, even in places where HDs still have better GB/$. The rebuild rates and maintenance of HDs simply can't compete with flash.

Note: IOPS = I/O Per Second, or the number of read/write operations (irregardless of size) which a disk can service. HDs top out around 350, consumer SSDs do under 10,000, and high-end SSDs can do up to 100,000.

-Erik

--
There are always four sides to every story: your side, their side, the truth, and what really happened.

doesn't raid 10 solve this? by davros-too · 2009-09-17 23:13 · Score: 2, Interesting

Um, don't schemes like raid 1+0 solve the parity rebuild problem? Even in the worst case of full disk loss, only one disk needs to be rebuilt and even for a large disk that doesn't take very long. Am I missing something?

--
In theory, there's no difference between theory and practice; in practice there is.

Re:RAID is here to stay by paulhar · 2009-09-17 23:20 · Score: 2, Interesting

RAID 1 has much less reliability than RAID 6. Assume a typical case: one disk totally fails. You then start to reconstruct - in a RAID 1 scheme a single sector error will result in the rebuild failing. Not great.

In RAID 6 you start the rebuild and you get a single sector error from one of the drives you're rebuilding from. At that point you've got yet another parity scheme available (in the form of the RAID 6 bit) that figures out what that sector should have been and then continues the rebuild. Then you go back and decide what to do about that drive that had the second error.

A lot of drive failures aren't full head crashes or motor errors but just single sector, track, bits of dirt on the platter style errors. Other than the affected area the drive can be read.

With RAID 6 you can fail two disks completely and still access the data. You're still reading from the same ten 10TB disks in your example and if the implementation of RAID 6 is optimal (RAID-DP) you aren't having to read additional data from the same physical disks.

In the world you describe with 10TB drives it sounds like you'd just not be able to use the disks at all since any process that reads from the disks will kill them. There are a few things that could happen:

1. Disks get more reliable. Hasn't happened much yet but...
2. We switch to different packaging. Instead of making disks larger we cram more of them into the same space similar to CPU cores - same MTBF per disk but lots of them presented out by one physical interface.
3. We change technologies completely. SSD (interesting failure modes there too... needs RAID)

I guess we'll find out in only a few years...

Re:Worked-around a Long Time Ago by plover · 2009-09-17 23:42 · Score: 4, Interesting

Actually, storing data in a multiple data center / high availability environment is a completely related issue. The summary above talks of "entirely different paradigms." Cloud storage would be multiple data center based, which is entirely different from keeping the only copy on your local drives. In this concept, your machine would have enough OS to boot, and enough hard drive space to download the current version of whatever software you are leasing. Your personal info would always be maintained in the data centers, and only mirrored locally. Have a home failure? Drop in a new part or even a new PC, (possibly with an entirely different operating system, such as Chrome,) connect to the service, and you're 100% back.

It's no longer a novel concept for the home market. Consider Google Docs. It's not even being sold as "safer than RAID", it's being touted as "get it from anywhere" or "share with your friends". Safer than RAID is just a bonus.

So are we ready to move all our personal information to clouds? I certainly am not, but Google Docs are wildly popular and a lot of people are. I long ago learned that I can't look to myself to judge what the mainstream attitudes are in many things.

--
John

RAID 4 has a dedicated parity drive, not 5 by Targon · 2009-09-17 23:43 · Score: 4, Interesting

RAID 4 is where you have one dedicated parity drive. RAID 5 solves this by spreading the parity information for each drive to all the other drives in the array. RAID 6 adds a second parity block for increased reliability, but as a result of the increased write for that extra parity block, it slows down write speeds.

The real key to making RAID 4, 5, or 6 work is that you really need 4-6 drives in the array to take advantage of the design. I wouldn't say that it will fall out of favor though, because having solid protection from a single drive going bad really is critical for many businesses. Backups are all well and good for if your system crashes, but for most businesses, uptimes are more critical yet. So, backups for data so corruption problems can be rolled back, and RAID 5,6,10 for stability and to avoid having the entire system die if one drive goes bad. What takes more time, doing a data restore from a backup for when an individual application has problems, or having to restore the entire system from a backup, with the potential that the backup itself was corrupted?

With that said, web farms and other applications can get away with just using a cluster approach instead of a single well designed machine(or set of machines) have become popular, but there are many situations which make a system with one or more RAID arrays a better choice. The focus on RAID 0 and 1 for SMALL systems and residential setups has simply kept many people from realizing how useful a 4-drive RAID 5 setup would be.

Then again, most people go to a backup when they screw up their system, not because of a hard drive failure. With techs upgrading hardware before they run into a hard drive failure, the need for RAID 1, 4, 5, and 6 has dropped.

I will say this, since a RAID 5 array can rebuild on the fly(since it keeps working even if one drive fails), the rebuild time itself does not significantly impact system availability. Gone are the days when a rebuild has to be done while the system is down.

RAID6 with enterprise hardware is reliable by niola · 2009-09-17 23:56 · Score: 2, Interesting

I use RAID6 for several high-volume machines at work. Having double parity plus a hot spare means rebuild time is no worry.

But if you are not a fan you can always throw something together with ZFS's RAIDZ or RAIDZ2 which is also distributed parity but the ZFS filesystem checksums and keeps multiple (distributed) copies of every block to detect and fix data corruption before it becomes a bigger problem.

People using ZFS have been able to detect silent data corruption from a faulty power supply that other solutions would never have found just because of the checksumming process.

Re:RAID is here to stay by Anonymous Coward · 2009-09-18 00:23 · Score: 1, Interesting

Imagine a disk fails on every 100TB of reads (10^14). You have ten 1TB data disks. Imagine you keep them in perfect rotation so they've spent 10, 20, 30, 40, 50, 60, 70, 80, 90 and 100% of their lifetime. The last disk dies and you replace it with a new drive (0%). To rebuild the drive you read 1TB from each data disk and use whatever parity you need. They've now spent 11, 21, 31, 41, 51, 61, 71, 81, 91 and 1% (your new disk) of their lifetime and you can read another 9TB before you need a new disk.

Except that it doesn't work anywhere NEAR like that. The lifetime of disks are much, much greater than the read cost to rebuild a failed drive. You're definitely not spending 1% of its total lifetime. You're not even spending 1/1000th of that 1%. You couldn't measure the difference between having to rebuild that drive and just using the disk as a new one.

The vast majority of hard disk failures are manufacturing issues, not end-of-lifetime for an average drive issue. You don't have a raid system because the average lifetime of a harddisk is small, you get a raid system because of the outliers. Every once in a while, a manufacturer puts out a deathstar, and it's going to fail a month after you put it in. At the same time, you'll have disks in there that are going to keep chugging away for five years straight, and you'll eventually replace them because you want a bigger disk, not because they've failed.

I'm not sure I get it by Joce640k · 2009-09-18 00:27 · Score: 2, Interesting

Is he saying that you can never read a whole hard disk because it will fail before you get to the end?

That's what it seems like he's saying but my hard disks usually last for years of continuous so I'm not sure it's true.

--
No sig today...

Dear Seagate, Western Digital, et. al: by ThreeGigs · 2009-09-18 01:09 · Score: 4, Interesting

Here's what I want, folks:
A 5.25 inch device with 5 double-sided platters running at 5400 RPM. Basically the same size as a desktop CD/DVD drive, ala Quantum Bigfoot.
I want 8 sides of the platters dedicated to data, and the other two sides dedicated to parity (or one parity and the other servo), essentially a self-contained RAID on a single disk.
I want all data heads to write and read simultaneously, in Parallel. The idea is to have 64 byte sectors on each platter which are recombined into a 512-byte result. 8 heads writing and reading in paralell means HUGE throughput for sequential operations.

It's RAID 5 or 6 on a single disk, although without spindle redundancy.

And I also want a high-performance option: 2 sets of read/write heads 180 degrees apart, which effectively would cut seek times in half, making the drive perform more like a 10k RPM drive. With current densities, that's 12 TB in the volume of a DVD drive. It solves speed, sector error recovery and capacity issues. The only thing missing is a data bus that can handle the throughput.

Re:Dear Seagate, Western Digital, et. al: by adisakp · 2009-09-18 06:24 · Score: 2, Interesting

I want 8 sides of the platters dedicated to data
More platters == more mass. Which translates to more power required for the motor, higher energy usage and much more heat generated by the drive. Generating more heat == quicker hardware failures. Also with bigger / larger / more platters, it's much harder to spin the platters faster. Usually more platters == slower RPM drive speed and much slower seek rates. If you can do fewer, smaller, and lighter platters, you can make the drive spin faster and perform better -- this is exactly what the Velociraptor does with it's high RPM 2.5" format.

Also, using only one side of the platter is often faster and more reliable because the head arm weighs less (1/2 the heads) so they don't have as much mass to impede fast seeking or to cause vibration. Plus you don't have to worry about the alignment on both sides of the platter. This is one reason why the highest speed drives do not necessarily even use both sides of the platter.

It's RAID 5 or 6 on a single disk, although without spindle redundancy.
No it's not... what happens if the control electronics fail, the arm actuator, or the spindle motor? RAID 5/6 have whole disk redundancy. You just have data redundancy on the platters - not full hardware redundancy. Also, all this extra components you want to add to the drive will just make it more complicated and have more points of failure so the drives will actually fail earlier.

And I also want a high-performance option: 2 sets of read/write heads 180 degrees apart, which effectively would cut seek times in half, making the drive perform more like a 10k RPM drive.
Except that it would slow the drive down by making calibration harder and slower. Moving the head arms causes vibration and movement in the drive. One arm would not be able to reliably read while the other was moving unless the drive was spinning slower to begin with.

The moral of your story is that you have some interesting ideas, but believe it or not, most of them have already been tried and rejected well before coming to market because they weren't feasible or reliable or didn't actually result in performance improvements in a cost effective manner.

Re:RAID is here to stay by maraist · 2009-09-18 01:10 · Score: 2, Interesting

I don't understand what your failure rate strategy is. First of all, there's no such thing as saying you are 90% or 10% of the way through a disk's life.. It's a probability distribution, who's probability is dramatically effected by the current events (and somewhat related to historical events). A drive might be at a 0.00005% probability of failure at any given moment, but then a large sustained read occurs which adjusts the heat and causes voltage fluctuations , so now you're operating at 0.001% probability.

Then a drive dies in hot-swap-mode, a drive spins down, then another spins up, this has massive voltage fluctuations as well as slight tension on the cabling which causes reflections in the wiring which increases your probability of failure to say 0.02%. (I'm totally making up numbers, but the trends are what's important).

So the act of powering down/up or hot-swaping intrinsically increases the probability of co-disk-failures, unless you have a very expensive system with separate AC/DC converters (e.g. fully decoupled) and obviously isolated frames, heat-compartments, etc.

BUT, you can mitigate this by having 3+-way redundancy (RAID-1; I honestly don't understand the point of using slower RAID-5 / RAID-6 anymore). So when one drive fails, you have addressed the probability of a second failure. There is a geometric reduction in probability that 3 or 4 or 5 simultaneous drives fail. Meaning even at the peek risky part of the drive-swap operation, if you have say 2% probability that another drive will fail, then there is 0.004% probability that two drives will fail simultaneously. 0.0008% that three fail, etc.

This isn't strictly correct, of course, because the probabilities are not fully independent. You have many common components, and thus their probabilities are intertwined. But sufficient to say the probabilities are less.

Now I say 3+way RAID-1 because it may be silly to swap out a single drive when one goes bad. The process I would recommend (if you have a sufficiently advanced RAID controller, and non-super-expensive disks), is this:

5-way RAID-1 with 2 powered down disks (thus effectively 3-way RAID-1)
On a drive failure, power up the two disks and initiate their syncing.
Swap out the error'd drive, and and initiate it's syncing.

For a brief-while, you have 2 valid, 2 semi-valid, and 1 semi-semi-valid drive.

As the drives sync-up(may take over 24 hours), power-down the original remaining 2 and remove them.

Recycle the good disks into JBOH (Just a bunch of hardware) clustering. Meaning boot-disks / log-file disks in say RAID-1, swapping out the oldest drive.

You can either buy several 4-way/5-way RAID controllers, or get a single 15-disk RAID controller for under than $1k. This allows you to have multiple logical volumes and share the 'spun-down disks', So now you're really only using 3 disks per logical-volume, though having two logical volumes with bad disks does reduce your ideal reliability somewhat. But this gives you 4 volumes which can be combined into RAID-10. You could build such a system for under $6k with various mixtures of high-end and low-end disks (for different partition requirements, boot/OS/linear-logging (RAID-1), random-write-data (RAID-10)).

If the data is super critical, use a block-level master-slave replication. Ideally your application supports direct master-slave or better yet, multi-master.

And if you're JBOH (Just a Bunch Of Hardware) clustering, then trivial RAID-1 with 2 or 3 disks (in software-raid) is all you need. Note, I use 3-disk RAID10 on my home linux machine, (that plus DVD drive fills up my IDE slots) - pretty clever technique. Yes I know virtually all MB's have hardware RAID these days, but unless they've got an extra 4Gig of buffer-RAM in them, they're pointless in my opinion, plus they're non-portable (screw transparent windows support, you can't distinguish disk errors from forced reboots anyway).

--
-Michael

Re:RAID is here to stay by LWATCDR · 2009-09-18 01:21 · Score: 2, Interesting

Well the logical thing IMHO is after the first year you put in a new drive and do an array rebuild after making a backup.
Drives are really cheap and I would do that for as long as the array is in use.
Reuse the old drives in desktops if they are SATA.
Not perfect but it keeps you from having an array of old drives in your server.

--
See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.

Crappy RAID's days are numbered by zerofoo · 2009-09-18 01:26 · Score: 2, Interesting

Having worked with plenty of enterprise grade raid (EMC symetrix, clarion, and Dell SAN devices) I can say that capacity and rebuild times are not a problem for high-end arrays.

What will bring the problem to the masses are these stupid consumer NAS boxes. It is very easy to build a 4 or 8 TB array for home use using relatively cheap hardware. Unfortunately, no home user/abuser, that I know, has the skill set to manage or protect such a large array of data.

My most recent experience with a Western Digital sharespace was awful. Here is a box with a Gigabit NIC, and 4 - 2TB hard drives in a RAID 5 array that has transfer rates around 9MB/sec at best. Combine that pitiful performance with a rebuild/reformat time of over two days - and you know where this is going.

Average joes are going to put their entire lives on these things and never back them up due to the time and space cost. When a failure does occur - it will take days to perform a rebuild of the array - vastly increasing the likelyhood of another failure and permanent data loss.

Crappy RAID's days are numbered - good RAID implementations will be with us as long as hard drives have ANY failure rate at all.

-ted

fill the drive with helium by speedtux · 2009-09-18 02:29 · Score: 3, Interesting

Filling the drive with helium should help; the speed of sound in helium is 3x higher than in air, and it offers less resistance.

(Hydrogen would be even better, but it has a tendency to interact with metals in unfortunate ways.)

Re:fill the drive with helium by dkf · 2009-09-18 12:17 · Score: 2, Interesting

Filling the drive with helium should help; the speed of sound in helium is 3x higher than in air, and it offers less resistance.
(Hydrogen would be even better, but it has a tendency to interact with metals in unfortunate ways.)

Thinking about it, methane might be a more practical choice. Yes, it's denser than helium so the effect won't be anything like as strong (the speed of sound in methane is only about 40% faster) but it's also very cheap and available, and won't cause too many problems from interacting with the rest of the drive. Having to seal the drive is an issue, yes, but that's not far off what's needed now; it's imperative that dust is kept out of the platter enclosure anyway...

--
"Little does he know, but there is no 'I' in 'Idiot'!"

Re:Parent is spot on by Anonymous Coward · 2009-09-18 08:40 · Score: 1, Interesting

I do like mirroring more than this parity stuff.

Let's assume I have e.g. a RAID10 containing 100TB of Data. I loose one disk. and then I loose the second one. CrashBoomBang!

Of course I'm not stupid and I have a backup of all my 100TB of Data. What traditionally would happen now is, I will need to restore all of my Data, as I don't know what I had on these disks. Takes ages!

Now comes the magic of ZFS. Through the combination of ZFS as a Volume manager and a file system, ZFS can tell me exactly which files are missing. I will then replace both failed disk. If using 1TB Disks, I don't have to restore 100TB, but only up to 1TB. This will minimize the recovery time up to 100 times.

37 of 444 comments (clear)