RAID's Days May Be Numbered

simple idea by shentino · 2009-09-17 21:16 · Score: 2, Interesting

Don't consider an entire drive dead if you get a piddly one-sector error.

Just mark it read only and keep chugging.

Re:simple idea by Anonymous Coward · 2009-09-17 21:22 · Score: 0

Do you have any idea what a RAID is? By your post, I assume not.
Re:simple idea by paulhar · 2009-09-17 21:34 · Score: 4, Informative

Enterprise arrays copy all the good data off the drive to a spare drive, use RAID to recover the failed sector(s), then fail the broken disk.
Re:simple idea by Eric+Smith · 2009-09-17 22:45 · Score: 4, Insightful

The drives already do that internally. By the time they're reporting errors, bad things are happening, and it really IS time to replace the drive. Anyhow, drives are inexpensive. It's more cost effective to replace them than to spend a lot of time screwing around with them.
Re:simple idea by paulhar · 2009-09-17 23:05 · Score: 3, Informative

They do, with varying degrees of success, but just because a disk can't read a particular sector doesn't mean that the drive is faulty - it could be a simple error on the onboard controller that is causing the issue.
FC/SAS drives mostly leave error handling up to the array rather than doing it themselves, because the array can typically make better decisions about how to deal with the problem, and this helps cope with time-sensitive applications. The array can choose to issue additional retries, reboot the drive while continuing to use RAID to serve the data, etc.
Consumer SATA drives, on the other hand, try really hard to recover from the problem - for example, retrying again and again with different methods to get the sector - and while admirable, that leads to the behaviour we see in consumer land where the PC just "locks up". The assumption here is that there is no RAID available, so reporting an error back to the host is "a bad thing". The enterprise SATA drives we're seeing on the market are starting to disable this automatic functionality to make them behave correctly when inserted into RAID arrays.
Usually ;-)
Re:simple idea by Anonymous Coward · 2009-09-18 00:06 · Score: 5, Insightful

Enterprise arrays are also very VERY different from what most people know as RAID. Smart controllers, smart drive cages, drives that are an order of magnitude better than the consumer grade garbage.
The summary talks about how speed has not kept up with capacity. Yes, that is correct for the low grade consumer junk. Enterprise server class RAID drives are a different story. The 15,000 RPM drives I have in my RAID 50 array here on the database server are insanely fast. Plus, server class drives are not silly unstable capacities like 1TB or 1.5TB; they are an "OMG small" 300GB size but are stable as a rock.
So I guess the question is: is the summary talking about RAID on junk drives or RAID on real drives?
Re:simple idea by Coren22 · 2009-09-18 00:49 · Score: 3, Interesting

They aren't talking about drive speeds as much as failure rate:

The bottom line is this: Disk density has increased far more than performance and hard error rates haven't changed much, creating much greater RAID rebuild times and a much higher risk of data loss.
They are saying that the MTBF of drives has not gone up as fast as the capacity, and that a missed write is actually quite likely with a modern high capacity drive. Even saying drive speeds haven't gone up is very accurate: 15k RPM drives have been around for quite a while now, at least 10 years, and there has not been an improvement in speed in that time. Where are my 30k RPM drives?~
Also, I have a bit of a problem with your statement about OMG small enterprise drives. Enterprise drives have caught up to consumer drives in size; you can now buy 1TB SAS drives, they are just OMG expensive compared to the consumer drives.
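To put a rough number on the summary's "higher risk of data loss", here is a back-of-the-envelope sketch in Python. It assumes a hard error rate of one unrecoverable read error per 1e14 bits read (a figure often seen on consumer drive spec sheets, not one taken from the summary), independent errors, and a rebuild that reads every surviving drive end to end. The 8-drive array and the drive sizes are made up for illustration.

```python
import math

def p_rebuild_hits_error(surviving_drives, drive_bytes, bits_per_error=1e14):
    """Chance of at least one unrecoverable read error during a full rebuild.

    Assumes independent errors at a rate of one per `bits_per_error` bits read
    (an assumed spec-sheet style figure, not a measurement).
    """
    bits_read = surviving_drives * drive_bytes * 8
    # 1 - (1 - p)^n, computed in a way that stays accurate for tiny p
    return -math.expm1(bits_read * math.log1p(-1.0 / bits_per_error))

if __name__ == "__main__":
    for size_gb in (300, 1000, 2000):
        p = p_rebuild_hits_error(surviving_drives=7, drive_bytes=size_gb * 1e9)
        print(f"8-drive RAID 5, {size_gb} GB drives: {p:.0%} chance of a read error")
```

Under those assumptions the chance of tripping over a bad sector mid-rebuild is roughly 15% with 300 GB drives and over 40% with 1 TB drives, which is the capacity-versus-error-rate squeeze in a nutshell.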

--
APK likes to ask for responses to the same things over and over. Maybe he just likes the responses?
Re:simple idea by alva_edison · 2009-09-18 00:55 · Score: 4, Interesting

The problem becomes space in the data center. I don't know about you, but we're trying to cram petabytes into existing computer rooms and coming up short. Plus you don't address Tier 2 or Tier 3 storage, which tends to be on SATA or near-line SAS, both of which have the ridiculous size problem. Calling 15,000 RPM fast in the datacenter is also misleading, because those are the speeds we've been at for a few years now; 10Gb iSCSI (or FCoE, which bypasses the collision problem) is about to render that untenable. The current solution tends toward storage virtualization (in this case virtualization means excessive amounts of high-speed cache in front of controllers and less control over where controllers allocate space). The future is most likely some kind of grid technology (like XIV from IBM), where any block is on two random drives in the array, and only the controller knows where. This means that drive rebuilds become subject to swarm speeds (since there is an equal chance that it is pulling data from every other drive in the tower).

--
He effected a bored affect.
Re:simple idea by paulhar · 2009-09-18 01:02 · Score: 4, Interesting

You're not likely to see 30k RPM drives any time soon. The speed of a 15k drive means that the outer edge of the 3 1/2" drive is spinning pretty fast... getting close to the speed of sound and the lion's share of power consumed by 15k drives is consumed in counteracting the air buffeting the heads. With 2 1/2" drives we could go faster but while drives are open to the air it's not likely we'll see much in the short term.
It's why CDROM speeds haven't gone up much since the old days of 52x.
As areal density improves the drives will be able to push out more raw MB/sec just like DVD is better than CD, but in terms of IOPs it's not likely to dramatically improve.
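A rough sketch of why IOPS barely move with spindle speed, in Python. The model is the usual simplification of one average seek plus half a revolution per random request, ignoring transfer time and queueing. The seek times below are illustrative guesses, not measurements of any particular drive.

```python
def avg_rotational_latency_ms(rpm):
    """Average wait for the sector to come around: half a revolution."""
    return 0.5 * 60_000.0 / rpm

def random_iops(rpm, avg_seek_ms):
    """Rough random IOPS: one seek plus half a rotation per request."""
    return 1000.0 / (avg_seek_ms + avg_rotational_latency_ms(rpm))

if __name__ == "__main__":
    # Seek times are assumed values for illustration only.
    for rpm, seek_ms in ((7200, 8.5), (15000, 3.5), (30000, 3.5)):
        print(f"{rpm} RPM: {avg_rotational_latency_ms(rpm):.2f} ms latency, "
              f"~{random_iops(rpm, seek_ms):.0f} IOPS")
```

With those assumed seek times, even a hypothetical 30k RPM drive only manages about a fifth more random IOPS than a 15k drive, because the seek dominates.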
Re:simple idea by JediTrainer · 2009-09-18 01:32 · Score: 2, Funny

lion's share of power consumed by 15k drives is consumed in counteracting the air buffeting the heads

Until some genius figures out how to build one with no air inside?

--

You can accomplish anything you set your mind to. The impossible just takes a little longer.
Re:simple idea by MBGMorden · 2009-09-18 01:37 · Score: 1

The speed of a 15k drive means that the outer edge of the 3 1/2" drive is spinning pretty fast... getting close to the speed of sound and the lion's share of power consumed by 15k drives is consumed in counteracting the air buffeting the heads. With 2 1/2" drives we could go faster but while drives are open to the air it's not likely we'll see much in the short term.
Pardon my ignorance here, but is there any reason the casing couldn't just be vacuum sealed such that there was no air in the chamber where the platters were spinning?

--
"People who think they know everything are very annoying to those of us who do."-Mark Twain
Re:simple idea by denis-The-menace · 2009-09-18 01:37 · Score: 2, Interesting

Why not add multiple heads to the same platter?
Keep the disk spinning at 15K but add heads with their own actuator and everything. One could be read-only, the other write-only. Whatever makes sense.

--
Obama's legacy: (N)othing (S)ecure (A)nywhere and (T)error (S)imulation (A)dministration
Re:simple idea by Anonymous Coward · 2009-09-18 01:45 · Score: 0

> getting close to the speed of sound
Well, more like mach 0.2, but pretty impressive nonetheless.
Re:simple idea by Coren22 · 2009-09-18 01:51 · Score: 1

I was trying to use the new punctuation, the ~ to mean sarcasm. I know that it is very unlikely to ever see those kinds of speeds from hard drives. I had one of these and I remember how very hard it was to get it up to 15k rpm, so I would expect the amount of power needed to get a hard drive to 15k rpm to be pretty high.

--
APK likes to ask for responses to the same things over and over. Maybe he just likes the responses?
Re:simple idea by Just+Some+Guy · 2009-09-18 01:51 · Score: 1, Informative

The speed of a 15k drive means that the outer edge of the 3 1/2" drive is spinning pretty fast... getting close to the speed of sound
3.5in * 3.14 * 15000r/m * 60m/h * 1ft/12in * 1mi/5280ft = 156mi/h
That's still pretty fast, but not nearly the speed of sound at STP.

--
Dewey, what part of this looks like authorities should be involved?
Re:simple idea by Coren22 · 2009-09-18 01:52 · Score: 1

That is an interesting idea, only problem is, you would have to put one on each end of the disk drive so they had no chance of hitting each other. This would probably increase the size of the package quite a bit though.

--
APK likes to ask for responses to the same things over and over. Maybe he just likes the responses?
Re:simple idea by russotto · 2009-09-18 02:00 · Score: 3, Insightful

Pardon my ignorance here, but is there any reason the casing couldn't just be vacuum sealed such that there was no air in the chamber where the platters were spinning?
You'd need a whole new way of keeping the head off the platter. You'd have a problem with lubricants vaporizing. Heat would be a problem as well.
Re:simple idea by operagost · 2009-09-18 02:05 · Score: 4, Informative

I'll assume you aren't trolling, and point out that disks work BECAUSE OF the air inside. The heads gain lift.

--

Gamingmuseum.com: Give your 3D accelerator a rest.
Re:simple idea by operagost · 2009-09-18 02:11 · Score: 3, Informative

The only real difference between WD's enterprise SATA and their consumer line (other than, perhaps, the warranty) is a firmware setting that determines how long it attempts to write to a sector before giving up and using a spare block. It has to be reduced for enterprise use so that the RAID controller doesn't fail the disk prematurely. My WD disks kept "failing" until I set this timeout shorter. It's been a year since I did that, and I've had no failures or data corruption. It's possible that this is no longer the case for their latest models.
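A toy model of that interaction, in Python. The numbers are hypothetical (I'm not quoting real firmware or controller values); the point is only that whichever side gives up first decides what happens.

```python
def array_outcome(drive_recovery_limit_s, controller_timeout_s):
    """Who gives up first on a troublesome sector: the drive or the controller?"""
    if drive_recovery_limit_s < controller_timeout_s:
        return "drive reports the bad sector; the array rebuilds it from redundancy"
    return "controller times out first and fails the whole drive"

if __name__ == "__main__":
    CONTROLLER_TIMEOUT_S = 8.0  # hypothetical controller patience
    # Both recovery limits below are assumed values for illustration.
    for label, limit_s in (("consumer default", 60.0), ("shortened", 7.0)):
        print(f"{label}: {array_outcome(limit_s, CONTROLLER_TIMEOUT_S)}")
```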

--

Gamingmuseum.com: Give your 3D accelerator a rest.
Re:simple idea by amoeba1911 · 2009-09-18 02:12 · Score: 3, Informative

Speed of sound at sea level: 340.29 m/s
((3.5 inches) * (2.54 (cm / inches)) * pi) * (((15000 / minute) * (1 minute)) / (60 second)) * (0.01 (meter / centimeter)) = 69.8218967 m / s
If my calculation is correct, the outer edge of a 3.5" plate spinning at 15000 RPM is moving at 69.82 m/s, which is about 20% of the speed of sound. It's fast, but it's nowhere near the speed of sound.
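The same arithmetic as a small Python sketch, for anyone who wants to plug in other diameters or spindle speeds. It takes the 3.5 inch figure at face value as the platter diameter and uses the sea-level speed of sound quoted above; the function name is just mine.

```python
import math

SPEED_OF_SOUND_M_S = 340.29  # sea-level figure quoted above

def rim_speed_m_s(diameter_in, rpm):
    """Linear speed of the outer edge of a disc of the given diameter."""
    circumference_m = math.pi * diameter_in * 0.0254  # inches to metres
    return circumference_m * rpm / 60.0               # revolutions per second

if __name__ == "__main__":
    v = rim_speed_m_s(3.5, 15000)
    print(f"{v:.2f} m/s = {v * 3600 / 1609.344:.0f} mph "
          f"= Mach {v / SPEED_OF_SOUND_M_S:.2f}")
```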
Re:simple idea by Zenaku · 2009-09-18 02:12 · Score: 4, Informative

Air is necessary for the read/write head to operate. The piece that comes into close proximity of the platter is essentially a tiny hovercraft. It's about the size of a pepper flake, and has a microscopic pattern called an "air bearing" carved into the side facing the platter. Designing this air bearing is an exercise in fluid dynamics -- it is the shape of the bearing and how air flows over it that allows the read/write head to skim over the surface of the platter at a distance measured in microns without actually contacting the surface of the platter.
If the read/write head does contact the surface of the platter, that is called a head crash, and is bad.

--
If fate makes you a motorcycle, you become a motorcycle.
Re:simple idea by Firethorn · 2009-09-18 02:15 · Score: 4, Interesting

Even partial evacuation would help, but you run into the problem that the read heads are designed to use the air to keep them from contacting the platters, so you'd need to replace that effect somehow.
The Space shuttle and ISS even have special sensors to shut the hard drives down if the air pressure goes too low. Reading about which was how I found out that hard drives are designed to use air.
Not to mention that you're now trying to build an air tight container, but if you're looking at ultra-high performance drives that's less of an issue.
Still, you have to look at how much such a drive would cost, and whether the cost would ever be repaid - if I was looking at investing in such technology I'd be concerned that Flash would outpace my vacuum drives before I got them released. Even if I DO manage to find a niche, would the niche last long enough against flash memory that's getting faster and cheaper so quickly?
For certain data sets and access patterns, flash is already much cheaper than the old raid options - the best example I saw was a dataset of a few hundred gigabytes that was mostly read-only, but accessed so much so randomly they had to mirror it on 10 hard drives to meet the read demands. One professional level SSD performed BETTER, while costing less than half of the setup.

--
I don't read AC A human right
Re:simple idea by smaddox · 2009-09-18 02:21 · Score: 1

The current drive designs probably aren't that far away from being able to hold a decent vacuum (just enough to spin faster; there still needs to be enough air to keep the head off the disk). The problem is that air will eventually leak in, and without a way to remove it again, the drive lifetime will be dramatically reduced.
In other words, in order for low internal air pressure drives to work, they need to have built in pumps. That is not going to happen.
Re:simple idea by Anonymous Coward · 2009-09-18 02:23 · Score: 0

It will certainly take a genius. The read/write heads ride on a thin air boundary layer above the surface of the platter. Operating in a vacuum would result in direct contact.
Re:simple idea by Firethorn · 2009-09-18 02:25 · Score: 1

This means that drive rebuilds become subject to swarm speeds (since there is an equal chance that it is pulling data from every other drive in the tower).
Well, I'd imagine that the bottleneck would be the replacement drive's write speed, wouldn't it? Even if the controller has enough spare block space on all its other drives, the new drive is still empty, and that takes quite a bit of time to fill today.
Though I suppose you could restore redundancy very quickly via putting half of all NEW data on the new drive, while the lost redundancy is restored using spare space on all the other drives while waiting for the new drive to come online.
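A quick sketch of the two cases in Python, with made-up numbers: filling one replacement drive is capped by that drive's write speed, while re-creating the lost copies in spare space spreads the writes over every surviving drive. The throughput figures and the 180-drive tower are assumptions for illustration, not specs of any real array.

```python
def hours_single_target(lost_bytes, target_write_mb_s):
    """Rebuild onto one replacement drive: limited by its write speed."""
    return lost_bytes / (target_write_mb_s * 1e6) / 3600.0

def hours_spread(lost_bytes, surviving_drives, per_drive_rebuild_mb_s):
    """Re-create lost copies in spare space on all survivors, in parallel."""
    aggregate_bytes_s = surviving_drives * per_drive_rebuild_mb_s * 1e6
    return lost_bytes / aggregate_bytes_s / 3600.0

if __name__ == "__main__":
    lost = 1e12  # one failed 1 TB drive (assumed)
    print(f"single target at 80 MB/s:  {hours_single_target(lost, 80):.2f} h")
    print(f"179 survivors at 10 MB/s:  {hours_spread(lost, 179, 10):.2f} h")
```

With those assumed figures the single-target rebuild takes about three and a half hours, while the spread-out version restores redundancy in under ten minutes even though each drive contributes only a sliver of its bandwidth.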

--
I don't read AC A human right
Re:simple idea by Amouth · 2009-09-18 02:30 · Score: 1

If the read/write head does contact the surface of the platter, that is called a head crash, and is bad.
Head To Platter Interface is a perfect way to lose data. Normally it just fucks the head up and the sector it hit... but I did once have one fail where it put a really nice scrape on the platter, and the head and 1/2 the arm exited the drive via the side of the casing. I'm very glad I was not near it when that happened. There's a reason we called them IBM DeathStars.

--
'...if only "Jumping to a Conclusion" was an event in the Olympics.'
Re:simple idea by Gothmolly · 2009-09-18 02:37 · Score: 3, Interesting

Can you say "instantaneous heat death" ? Vacuum is an excellent insulator.

--
I want to delete my account but Slashdot doesn't allow it.
Re:simple idea by Anonymous Coward · 2009-09-18 02:55 · Score: 0

Never mind that rotational velocity is still tiny compared to seeking the head...
Re:simple idea by Anonymous Coward · 2009-09-18 03:03 · Score: 0

Tiny little rockets.
See, lift, not hard at all.
Re:simple idea by tedgyz · 2009-09-18 03:19 · Score: 1

Where are my 30k RPM drives?
They are likely to blow up in your face. Well, not exactly, but the stability of the outer edge becomes an issue leading to too much wobble. This is why cdroms stopped getting faster. The disks would shatter due to instability at high speed. I'm not saying a 30k HD would shatter, but I think it is too unstable to read/write.

--
"No matter where you go, there you are." -- Buckaroo Banzai
Re:simple idea by AlecC · 2009-09-18 03:20 · Score: 3, Informative

No - to reconstruct 1 sector you have to read one sector from every other drive, then write 1 sector to the replacement drive. Effectively, to reconstruct you have to read the whole RAID. So the read and write speeds both count.
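For anyone who hasn't seen why every other drive has to be read, here is a minimal Python sketch of single-parity (RAID 5 style) reconstruction. The parity sector is the XOR of the data sectors, so any one missing sector is the XOR of everything that is left. The three-drive layout and sector contents are invented for the example.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

if __name__ == "__main__":
    data = [b"disk0 sector", b"disk1 sector", b"disk2 sector"]
    parity = xor_blocks(data)
    # Drive 1 dies: its sector is rebuilt from every surviving drive plus parity.
    rebuilt = xor_blocks([data[0], data[2], parity])
    assert rebuilt == data[1]
    print(rebuilt)
```

Repeat that for every sector on the dead drive and you have read the whole array once.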

--
Consciousness is an illusion caused by an excess of self consciousness.
Re:simple idea by rayzat · 2009-09-18 03:32 · Score: 2, Informative

I do a lot of storage work and whenever the talk comes to spindle counts I've always wondered this as well. Since the only things not scaling in hard drives these days are the rotational and read speeds, adding another head could double the single drive throughput and IOPS. I've looked into it and found hundreds of patents on the idea, and one drive from 1986 that had multiple heads; it had 8 if I remember correctly, looked to be the size of a record player, held 200 MB, and cost 250k.
Re:simple idea by Anonymous Coward · 2009-09-18 03:32 · Score: 0

I don't think its very close to the speed of sound at all.
3.5 inches * pi * 15000 per minute in mph = 156 mph
Re:simple idea by PitaBred · 2009-09-18 03:33 · Score: 1

Just FYI, mechanical drives will NEVER not be open to the air. The heads depend on the Bernoulli effect to lift above the platter to where they do the reading. No air means that the head stays in contact with the disk, which is bad mojo.

--
My blog. Good stuff (when I remember to update it). Read it.
Re:simple idea by Anonymous Coward · 2009-09-18 03:33 · Score: 0

A CD rom is 120mm in diameter, so one revolution covers 120 * 2 * pi / 1000 = 0.7536 meters, or * 15000 / 60 = 188.4 meters per second
Re:simple idea by FireFury03 · 2009-09-18 03:45 · Score: 1

I'll assume you aren't trolling, and point out that disks work BECAUSE OF the air inside. The heads gain lift.
I'll correct that for you:
Current designs of hard disks rely on there being air inside in order to float the heads. But you wouldn't necessarily need to float the heads on air if you redesigned the disk to suspend the heads in some other way (off the top of my head, let's say a magnetic field. Of course, you'd have to make sure the magnetic field you suspend the heads in doesn't interfere with the magnetic field they are reading/writing). Also, air is just a fluid - it may be that you can find a different fluid to fill the drive with that has better properties for the job in hand (e.g. higher speed of sound).

--
http://blog.nexusuk.org
Re:simple idea by Rich0 · 2009-09-18 03:54 · Score: 2, Informative

I'm surprised that nobody has mentioned the issue of failure of the drive material itself at higher rotational velocities.
I believe CDs are limited to 52X because the polycarbonate they are constructed of explodes when you get too much higher than that (with a safety factor of course).
A metal hard drive probably can take more speed, but I'm sure that at some point you get deformation of the platter. You also have bearings/etc to deal with. 30k is a pretty fast rotation rate - and we're talking about a device that is always-on.
Additionally, even 10k SCSI drives aren't exactly consumer-grade hardware. We're already getting in to the high-end realm, and the whole point of RAID was the "I."
Re:simple idea by zippthorne · 2009-09-18 03:54 · Score: 2, Informative

You know google does the conversion for you: 2*pi*3.5 inches * 15,000 minute^-1 in m/s = 140 m / s

--
Can you be Even More Awesome?!
Re:simple idea by zippthorne · 2009-09-18 03:58 · Score: 1

And of course, I figured out the difference as soon as I clicked the submit button.
I blame drive manufacturers using different conventions to make their drives appear larger for marketing purposes.

--
Can you be Even More Awesome?!
Re:simple idea by thisisntme · 2009-09-18 04:01 · Score: 1

120mm diameter, not radius, so 94.2 m/s
Re:simple idea by networkBoy · 2009-09-18 04:12 · Score: 1

like helium perhaps?
or lower the viscosity of the working fluid by operating under partial vacuum?
-nB

--
whois gawk date unzip strip find touch finger mount join nice man top fsck grep eject more yes exit umount sleep dump
Re:simple idea by Anonymous Coward · 2009-09-18 04:13 · Score: 0

(A comment can't be redundant when it was posted 20 minutes before the others.)
Re:simple idea by Anonymous Coward · 2009-09-18 04:19 · Score: 3, Funny

340.29 m/s is the speed of sound in a vaccuum.
Moran.
Re:simple idea by networkBoy · 2009-09-18 04:20 · Score: 1

bearings are easy to deal with.
I have a turbo pump that spins at 35Krpm, uses a ruby bearing. The thing is loud when it starts up, but soon falls totally silent as it pumps all the air out of the system.
To get the remaining air out once it's mechanically evacuated the air we add LN2 to the jacket, and turn on an ion pump to push the atmosphere molecules towards the impeller. Damn fine vacuum that produces. Useless for hard drives, as far as Bernoulli goes, but useful on the bearing side.

--
whois gawk date unzip strip find touch finger mount join nice man top fsck grep eject more yes exit umount sleep dump
Re:simple idea by Binary+Boy · 2009-09-18 04:30 · Score: 3, Funny

lion's share of power consumed by 15k drives is consumed in counteracting the air buffeting the heads
Until some genius figures out how to build one with no air inside?
Lions need air.
Re:simple idea by networkBoy · 2009-09-18 04:31 · Score: 1

WrenIII drives did this.
High performance ESDI drives, all of 160 meg and the size of two CD-Rom drives stacked up.
Still have mine.
Ahh the memories of opening a debug 0x000zzzzz session to the controller and defining all the format data manually:
62, 63, or 64 sectors per track (0,1,or 2 spare sectors), 128, 256, or 512K sectors....
those were the days.

--
whois gawk date unzip strip find touch finger mount join nice man top fsck grep eject more yes exit umount sleep dump
Re:simple idea by WickedLilMonkies · 2009-09-18 04:49 · Score: 2, Informative

You're not likely to see 30k RPM drives any time soon. The speed of a 15k drive means that the outer edge of the 3 1/2" drive is spinning pretty fast... getting close to the speed of sound ...It's why CDROM speeds haven't gone up much since the old days of 52x...
Perhaps I haven't taken a math class in a while, but my cocktail napkin calculation says that the edge of a 3.5 inch disc spinning 15,000 times per minute will travel just over 156 miles/hour. Nowhere near 761 mph (speed of sound).

3.5 x Pi = 11 inch circumference x 15000 = 164,933 inches per minute / 12 inches / 5280 feet/mile * 60 minutes/hour = 156 mph.

Furthermore, while I don't argue your point that they are spinning pretty fast, I disagree with your assertion that CDROM's haven't increased because of this. More like, I believe CDROMs are simply not manufactured within sufficient tolerances, as indicated by their frequent vibrations when they spin up, and such vibrations could cause them to shatter.

For amusement: http://www.powerlabs.org/cdexplode.htm
Re:simple idea by lewiscr · 2009-09-18 05:03 · Score: 1

How did you change this setting? I don't recall seeing this setting in hdparm or smartctl. Is there a WD utility for this? I don't have a need for this info, I just want to poke around. :-)
Re:simple idea by 0xFCE2 · 2009-09-18 05:04 · Score: 1

Keep the disk spinning at 15K but add heads with their own actuator and everything.

Has been done some time ago (so no 15k/min), see: http://www.tomshardware.com/news/seagate-hdd-harddrive,8279.html
Re:simple idea by Anonymous Coward · 2009-09-18 05:12 · Score: 0

Mod parent Funny. He might've been serious, but if so, it's even more ironic.
Re:simple idea by Tuoqui · 2009-09-18 05:17 · Score: 1

I think the point they're trying to make is that 'consumer grade' stuff can utilize the same methods as some of these high end enterprise setups.
In reality what I'd expect to see, given the falling price of storage technology, is a move towards RAID 5+1, which is what the Enterprise level stuff is doing, but with consumer grade hardware. I know the original idea behind RAID was to provide redundancy and protection from data loss. Thus maybe we'll see people moving to RAID 1+0, since it's simpler to deal with and you can always have 2, 3 or 4 drives in the 1 section to build up a sufficient amount of redundancy, since each disk you add limits your data loss. Also, if you have 3 or 4 drives, rebuilding would go faster, since 2 drives could be used for RAID operation while 1 drive would be dedicated to rebuilding the drive being replaced.

--
09F911029D74E35BD84156C5635688C0
+2 Troll is Slashdot's way of saying groupthink is confused
Re:simple idea by frosty_tsm · 2009-09-18 05:41 · Score: 1

like helium perhaps? or lower the viscosity of the working fluid by operating under partial vacuum? -nB
Helium wouldn't work. You'd want higher density (speed of sound is higher in water than air). Viscosity isn't quite the issue. The issue is more how smoothly the air or fluid travels (which is why approaching the speed of sound is bad).
Re:simple idea by Anonymous Coward · 2009-09-18 05:49 · Score: 0

3.5" high-speed, enterprise-class drives may have a 3.5" form-factor, but they use 2.5" platters, not 3.5".
Re:simple idea by networkBoy · 2009-09-18 06:08 · Score: 1

right then...
$workingFluid =~ s/helium/sulfurhexafluoride/ig;

--
whois gawk date unzip strip find touch finger mount join nice man top fsck grep eject more yes exit umount sleep dump
Re:simple idea by Dogtanian · 2009-09-18 06:10 · Score: 1

I was trying to use the new punctuation, the ~ to mean sarcasm.
I've only ever seen that in one guy's .sig, which came across as him unilaterally taking it upon himself to invent some form of punctuation then expecting everyone to adopt it.

I might be wrong, I just haven't seen it elsewhere.

--
"Slashdot - News and Chat Sites Deviant". (Click "homepage" link above for details).
Re:simple idea by Crystalmonkey · 2009-09-18 06:12 · Score: 1

I'd say Hydrogen but it seems like Sony laptops are randomly testing that as we speak...
Re:simple idea by nuckfuts · 2009-09-18 06:18 · Score: 1

Are you nuts? Modern hard drives do all sorts of internal magic to present a flawless interface externally, like geometry virtualization, spare sector mapping (sometimes), and retry algorithms that try every combination of head alignment, early/late timing, etc.
When you get to the point that a hard drive reports a single hard error, there is something seriously wrong with that drive. Consider it an early warning. Copy the data off asap and replace the drive.
I have had many drives replaced by manufacturers due to a single-sector error. Not once has a manufacturer suggested that the drive might be "OK".
Re:simple idea by Anonymous Coward · 2009-09-18 06:36 · Score: 0

We are talking about RAID, not RAED (I=Inexpensive, E=Expensive).
Re:simple idea by pjr.cc · 2009-09-18 06:50 · Score: 3, Interesting

Unfortunately all that is quite a myth for the most part.
Having worked in storage for aeons, I can say the reality is that the difference between enterprise and "consumer grade rubbish" has very little to do with anything but tolerance. If you picked up a 300GB 10k enterprise drive and compared it to the consumer grade rubbish you'd find nothing different. It used to be the case, way back when, that they were very different, but because consumer grade drives have gotten so much better it's just not worth the expense of building the same drive for enterprise as for consumers with slightly different specs. What is different is the acceptable tolerance: when a platter comes off the line, if it's within 2% of its manufacturing tolerances it's OK to use for enterprise, and if it's higher they throw it into consumer. The reality is that most drives are in that "better than 2% tolerance" range, and that is simply because the processes to make them have gotten so good over the years. The point is that when you hit your magic tolerance number, the drive is capable of 100% duty cycle.
So essentially, the difference between "consumer" and "enterprise" when it comes to the casing, the platters, the heads and the motors is zero. There are a lot of different spec drives out there today, ranging from 146GB (typically the smallest you'll find these days) all the way to 2TB, with speeds from 7200 to 15000 rpm, and enterprise is the only place that uses all of them, but they still come off the same manufacturing line. The drivers behind it all come down to the customer: in enterprise it's often about performance, and with consumers it's about size. Very conveniently, building bigger consumer grade drives typically means improving the performance of a drive in ways that scale straight back to the enterprise. Sure, you won't see many users throwing around 15k rpm drives, but that's more because it's unnecessary.
So why is it that in the mid-to-low server range we find 300GB 15k drives? Because it's a cheap way of getting performance - and that is fairly important at that end of the market, where servers need to be cheap and there's a lot of competition (you know, 1-2RU with 4-8 drives and a RAID card, no SAN).
So what else differs between the two? Interface. In the mid-to-low server range we start talking SAS, and this is more to do with being able to talk to several drives at once (again, not something a lot of consumers do, other than with USB drives perhaps). The SAS interface is quite brilliant because it can scale to a larger number of drives than SATA can, and does it very cheaply. It also takes a lot of load off the server when it comes to processing data transfer (for a large number of drives). But in that same space you WILL find SATA drives going up to 2TB (often servers lag consumers in size simply because of certification, not because of anything to do with stability). To call a 1TB drive unstable is rather silly in reality.
Now the BIG end of town - SANs. These days in most SANs you'll find a mix of SATA and Fibre Channel (some do SAS as well; it's uncommon, though that's changing). In the SAN end of town (the big boy game) you'll see it all: 7.2k rpm 2TB SATA drives sitting in the same array alongside 146GB 15k RPM Fibre Channel, and it's all about trading off storage density/cost against performance. Consider this: 10 1TB SATA drives can (easily) consume an 8Gbps FC interface - OUCH! Now a lot of SAN arrays start at around 4 FC interfaces and go up to maybe 16, but they'll be supporting literally thousands of drives. A lot of the SAN industry realised some time ago that throwing 2TB SATA drives into an array made a lot of sense because SAN interfaces have grown very slowly in terms of throughput and single HD interfaces have grown very quickly. There are even several very popular arrays that only do SATA, and that was the driver behind "enterprise" grade large-storage drives (i.e. enterprise grade 1TB+ SATA drives). At the server you still get the Fibre Channel performance. The critical difference is that the array does more work
Re:simple idea by Grishnakh · 2009-09-18 07:05 · Score: 1

A CD-ROM is also made of plastic instead of aluminum, so it's not able to withstand the forces at high speeds that an aluminum platter can.
Re:simple idea by nairb774 · 2009-09-18 07:07 · Score: 1

Google handles unit conversions for you:
http://www.google.com/search?q=3.5+in+*+pi+*+15000+%2F+minute -> (3.5 in * pi * 15 000) / minute = 69.8218967 m / s
http://www.google.com/search?q=3.5+in+*+pi+*+15000+%2F+minute+in+yards%2Fhr -> (3.5 in * pi * 15 000) / minute = 274 889.357 yards / hr
Have fun
Re:simple idea by Coren22 · 2009-09-18 07:13 · Score: 1

Agreed. I just didn't know any other way to make it obvious it was a joke.

--
APK likes to ask for responses to the same things over and over. Maybe he just likes the responses?
Re:simple idea by BitZtream · 2009-09-18 07:32 · Score: 1

Water is a different material with different material properties. Light travels at a different speed in water as well, but it isn't faster.
The speed of sound increases at an inverse proportion to air pressure as well. As aircraft get higher in the atmosphere, the speed of sound increases.
You wouldn't want a higher density gas in the drive enclosure, you'd want a lower density gas and a better method of avoiding head strikes.
You also wouldn't want water in your drive enclosure since the drag and turbulence are much higher for it than any gas I'm aware of, even with an increased speed of sound.

--
Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
Re:simple idea by Marble68 · 2009-09-18 08:02 · Score: 1

Just thinking out loud here, but to get to 30k RPM what if you spun the read heads in the opposite direction of the spindles?

Yes, yes, yes, there are many serious engineering challenges to this; but if the platters are spinning clockwise at 15k RPM, and the read heads spun counter-clockwise at 15k RPM, you effectively have 30k RPM.

It'd probably have a large impact on the physical dimensions of the drive itself.

Read head arm technology would have to undergo a complete redesign, and there'd be warping and stretching of metal at such speeds.

Or perhaps the read head is completely replaced by a read platter. Alternating platters with several read heads dispersed across both sides in a pattern, spun in the opposite direction.

Only thing I can think of to move data up and down the central bus would be some type of optics, and leverage the kinetic energy of the "read platter" to power the electronics at the center of it.

Or another idea, a read drum that spins opposite to the platters. The drum's heads aren't mounted perpendicular towards the center, but are angled so the head is mounted via multiple points.

Again, the bus and power have to be addressed.

Dunno, just spit balling...

--
/me sips his coffee and ponders a new sig...
Re:simple idea by Firethorn · 2009-09-18 08:25 · Score: 1

No - to reconstruct 1 sector you have to read one sector from every other drive, then write 1 sector to the replacement drive. Effectively, to reconstruct you have to read the whole RAID. So the read and write speeds both count.
That's for traditional RAID 5/6 applications. I was responding to the cloud system proposal, where there's no longer necessarily a 1:1 relationship between drives, sectors, and the data contained on them.
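The traditional RAID 5 case described in the quote can be sketched in a few lines. This is a toy single-stripe model with made-up 4-byte "sectors", not any controller's actual implementation; it only shows why rebuilding one sector means reading the matching sector on every surviving drive.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three data sectors plus one parity sector: a single RAID 5 stripe.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)
stripe = data + [parity]

# Drive 1 fails. Its sector is rebuilt by reading the same sector
# from every surviving drive and XOR-ing them together.
failed = 1
survivors = [blk for i, blk in enumerate(stripe) if i != failed]
rebuilt = xor_blocks(survivors)

print(rebuilt)
```

Scale that up to every stripe on the array and you get the "read the whole RAID" behaviour: one read per surviving drive and one write to the replacement, per sector.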

--
I don't read AC A human right
Re:simple idea by mindstrm · 2009-09-18 09:13 · Score: 1

Just fucks up one sector.... but it also throws up particulate matter that can cause further damage.
Once you crash a head, it's new drive time....
Re:simple idea by JSlope · 2009-09-18 10:06 · Score: 1

but while drives are open to the air it's not likely we'll see much in the short term.
We should pour some liquid vacuum there...

--
ResoMail - the alternative secure e-mail system
Re:simple idea by Anonymous Coward · 2009-09-18 10:24 · Score: 0

Where any block is on two random drives in the array, and only the controller knows where. This means that drive rebuilds become subject to swarm speeds (since there is an equal chance that it is pulling data from every other drive in the tower).
Sounds very similar to the google file system.
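A rough simulation of the "two random drives per block" idea shows where the swarm effect comes from. Everything here is an assumption for illustration: the drive count, the block count, the fixed seed, and the function names. It is not a description of the Google file system or of any shipping array.

```python
import random
from collections import Counter

def place_blocks(num_blocks, num_drives, seed=0):
    """Put each block on two distinct, randomly chosen drives."""
    rng = random.Random(seed)
    return [tuple(rng.sample(range(num_drives), 2)) for _ in range(num_blocks)]

def rebuild_sources(placement, failed_drive):
    """Count how many blocks each surviving drive must supply for the rebuild."""
    sources = Counter()
    for a, b in placement:
        if a == failed_drive:
            sources[b] += 1
        elif b == failed_drive:
            sources[a] += 1
    return sources

placement = place_blocks(num_blocks=10_000, num_drives=20)
sources = rebuild_sources(placement, failed_drive=0)

print(len(sources), "surviving drives share the rebuild reads")
print("busiest drive supplies", max(sources.values()), "of", sum(sources.values()), "blocks")
```

With a classic mirror, one partner drive supplies every block; here the reads are spread across all the survivors, so no single spindle is the bottleneck.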
Re:simple idea by RedBear · 2009-09-18 12:04 · Score: 2, Informative

Besides which I have no idea what the speed of sound has to do with the theoretical upper limit of the speed of a spinning disk. It's not like an airplane wing with a trailing shock wave. I would think there would be much more pressing problems that are keeping us from seeing 30K RPM hard drives anytime soon, like:
- Shear strength of the platter material
- Total mass of the platter, especially near the edge
- Heat generated in the bearings
- Energy necessary to spin the platter at that speed
- Torsional forces from rotating the drive while it's spinning
And probably down near the bottom of the list of potential problems:
- Cavitation and/or shock waves from the air around the spinning platter.
Re:simple idea by lpq · 2009-09-18 14:54 · Score: 1

close to the speed of sound? define 'close'.
15k ~ 1/5, so 30k ~ 2/5 or certainly no more than 50%... is that what you mean by close?
They could get faster by using multiple heads out of phase from each other by either 180 or 120 degrees.
that AND multiple read heads in parallel/platter -- say add 2-3 then seek distance could be reduced by half.
If you use 2 arms at 180 degrees, you can read the disk in half the time.
I'm surprised no one has designed this type of disk -- all seem to stick to 1 head/platter-surface.
Maybe it's a form factor thing -- but maybe they could use the multiple heads on 2.5" platters in a 3.5" FF.
Of course they need to be sensitive to heat buildup under load -- maybe slow down the spin rate or something.
Worst thing I've seen on disks was in all the consumer-grade enclosures I looked at for external SATA's. At idle, they kept the disks in the mid 30C range. Consumer grade disks usually have a max temp of 40C. Under load, all of them, easily exceeded that safety margin. Only disks in the computer had enough cooling to stay below 40 consistently.
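The multiple-arm suggestion in this post can be put into numbers with a back-of-the-envelope sketch. It assumes the arms are spaced evenly around the platter and looks only at rotational latency, ignoring seek time, transfer time, and whether one head can reliably read what another wrote; the function name is just for illustration.

```python
def avg_rotational_latency_ms(rpm: float, arms: int = 1) -> float:
    """Average wait for a sector to rotate under the nearest head.

    With `arms` head assemblies spaced evenly around the platter, the worst
    case is 1/arms of a revolution and the average is half of that.
    """
    revolution_ms = 60_000.0 / rpm
    return revolution_ms / (2 * arms)

for arms in (1, 2, 3):
    latency = avg_rotational_latency_ms(15000, arms)
    print(f"{arms} arm(s) at 15k RPM: {latency:.3f} ms average rotational latency")
```

On this simple model, two arms at 180 degrees give a 15k RPM drive the rotational latency of a hypothetical 30k RPM drive without spinning the platter any faster.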
Re:simple idea by ResidentSourcerer · 2009-09-19 01:26 · Score: 1

If you are going to use the current pivot mechanism, I think 5 or 6 is all you can manage without them hitting each other.
However there may be problems with one head reading what another head wrote. I know this was a problem with floppy drives due to differences in head alignment.
Would this be a problem now?
For that matter, can a present head be used so that you can read platter 3 while writing to platter 1? It would mean more wires running to the head fork.
I see this as a huge complication with small benefit. After all, for reading, mirroring is effectively the same thing as having a dual head.
In general, throwing multiple spindles combined with a smart controller is probably less expensive than solving the control issues with multiple heads.

--
Third Career: Tree Farmer Second Career: Computer Geek First Career: Teacher, Outdoor Instructor, Photographer.
Re:simple idea by Dogtanian · 2009-09-19 01:56 · Score: 1

Agreed. I just didn't know any other way to make it obvious it was a joke.
Well, the winking smiley ;-) is the traditional way of highlighting a joke, as opposed to a scheme one person out of six billion has agreed to (^_^)

--
"Slashdot - News and Chat Sites Deviant". (Click "homepage" link above for details).
Re:simple idea by Anonymous Coward · 2009-09-19 04:14 · Score: 0

The distance is actually measured in nanometers, and has been at or under 10 for the better part of a decade. The difference may seem small, but you're talking dust flakes vs. viruses (yeah -- that's how clean the clean rooms are, when viruses are bad). If you scaled a read head up to the size of a 747, the scaled distance between platter and head (so ground and 747) would be all of about 1mm.
So yeah, it's an exercise in fluid dynamics... but it's a heck of a lot more complicated than just an exercise these days.
Re:simple idea by orac2 · 2009-09-21 02:34 · Score: 1

Actually --
"..the holy grail has always been a helium-filled drive. Helium offers several advantages arising from its very low density and very high thermal conductivity. It takes much less power to run the spindles; cooling of the VCM and preamp is greatly improved; and temperature differentials within the drive almost disappear. The most important gain is from the reduction of internal turbulence and buffeting of the actuator and disks."
-- from "Future hard disk drive systems," Roger Wood, Journal of Magnetism and Magnetic Materials 321 (2009) 555-561

--
"Just once, I'd like to meet an alien menace that wasn't immune to bullets." -- The Brigadier, Dr. Who
Re:simple idea by StuffMaster · 2009-09-21 06:47 · Score: 0

Try googling TLER. I believe there is a utility to update the drive to turn this on or off.
Re:simple idea by operagost · 2009-09-22 03:31 · Score: 1

WDTLER.exe

--

Gamingmuseum.com: Give your 3D accelerator a rest.
Re:simple idea by jon3k · 2009-09-23 08:53 · Score: 1

Kind of like 3PAR. Drives are split into 256MB "chunklets" and data is written across large pools of disks. They're actually #1 in the SPC-1 benchmarks right now. We recently deployed an F400 (active/active dual controllers, 4Gb, etc, the usual suspects) and have been very pleased so far. Works exceptionally well with VMWare clusters.
Re:simple idea by Anonymous Coward · 2009-09-28 10:17 · Score: 0

Actually, the reason that we never got over about 52x (and most drives these days are scaled back to around 40x) is that at 60x the cds were starting to fragment. There are so many cds out there, it's no surprise that many of them are very poorly made. They'd get up to speed, and then the minor flaws in the cd would finally give way and they'd just explode into chunks of plastic shrapnel inside your drive. The drive would be toast after that, and hopefully your cd wasn't anything important.
Actually, I just re-read what you wrote, and you're saying the same thing I am. I wrote all this though, so I'm just going to post it anyway. I thought you were saying that the power draw was why they hadn't gone past 52x, when it was actually just plain physical limitations in the cds.

reallocate on write by Spazmania · 2009-09-17 21:22 · Score: 2, Informative

Or just regenerate and write the one sector from the parity data since all modern hard disks reallocate bad sectors on write.

--
Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.

Re:reallocate on write by Erik+Hensema · 2009-09-17 22:19 · Score: 4, Informative

That's what any raid controller worth their salt does. I've seen 3ware and areca controllers do this, and those aren't the most expensive controllers on the market by far.

--
This is your sig. There are thousands more, but this one is yours.
Re:reallocate on write by badkarmadayaccount · 2009-09-19 23:39 · Score: 1

Gahhhh, I hate proprietary single-purpose controllers...
Where does that take us? Any serious computer *cough* mainframes *cough* the Amiga *cough* has dedicated I/O coprocessors. Heck, Intel CPUs could have had them, the 8089 was just that, but stupid beancounters in IBM thought that it was not worthy of a socket. Seriously, parity calculations go on the GPU, and anything else on the IO core. Whether the data in question is going through the SATA bus, or Ethernet, or PCI, or whatever, it shouldn't matter. </rant>

--
I know tobacco is bad for you, so I smoke weed with crack.

Solved a Long Time Ago by BBCWatcher · 2009-09-17 21:27 · Score: 4, Informative

Honestly, there really aren't that many unsolved problems in computing if you are sufficiently aware to include mainframes and mainframe operating disciplines in your consideration. The basic way the mainframe community solved this particular problem long ago was to, first, take a holistic view about mitigating data loss. Double concurrent spindle failures are just one possible risk element. What about, for example, an entire data center exploding in a spectacular fireball? (Or whatever.) IBM, for example, came up with several different flavors of GDPS and continues to refine them, and they include multiple approaches to data storage tiering across geographies, depending on what you're trying to achieve. Data loss, whether physical or otherwise (such as security breaches), is not a particular problem with this class of technology and associated IT discipline, nor do there seem to be any signs of a growing problem in this particular technology class.

Re:Solved a Long Time Ago by Anonymous Coward · 2009-09-17 21:32 · Score: 0

A great variety of solutions fit the "geocluster" scheme, it's certainly not a mainframe exclusive.
Re:Solved a Long Time Ago by Odinlake · 2009-09-17 23:59 · Score: 2, Insightful

As /.:ers so eagerly point out whenever RAID is mentioned: it's not for backup. It's for reducing downtime when hd's fail. So I assume that's the issue the original poster was thinking of. Not that I know what the solution would possibly be, but there's the correct question at least.
Re:Solved a Long Time Ago by DarkOx · 2009-09-18 00:31 · Score: 2, Informative

Well, the point of the article is that if it takes your array 6 hours to rebuild instead of 4 because the capacities have gone up but the failure rate of the hardware is unchanged, you have a problem. The problem is that you are more likely to experience another failure before the first one has been mitigated. If you have that additional failure on most RAIDs (unless you are doing 5-5 or 1-5 or some other RAID over RAID scheme) you get downtime. The volume is offline and must be restored from some other location.
The solution is usually a cluster or remote hotsite or something like that. It would be nice to have fast rebuild times back. There are lots of situations where five nines is not a requirement but downtime still should be avoided; shorter exposure windows for array rebuilds are a good thing.
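To put a rough number on the 4-hour versus 6-hour exposure window, here is a sketch that assumes independent drives with a constant failure rate (an exponential model). The MTBF figure and the array size are hypothetical, and the model deliberately ignores unrecoverable read errors and correlated failures, both of which make the real picture worse.

```python
import math

def p_second_failure(surviving_drives: int, rebuild_hours: float, mtbf_hours: float) -> float:
    """Chance that at least one surviving drive fails during the rebuild window."""
    return 1.0 - math.exp(-surviving_drives * rebuild_hours / mtbf_hours)

MTBF_HOURS = 500_000   # hypothetical per-drive MTBF, not a vendor figure
SURVIVORS = 7          # hypothetical 8-drive RAID 5 with one drive already dead

p4 = p_second_failure(SURVIVORS, 4, MTBF_HOURS)
p6 = p_second_failure(SURVIVORS, 6, MTBF_HOURS)

print(f"4 h rebuild: {p4:.6%}")
print(f"6 h rebuild: {p6:.6%}")
print(f"risk ratio:  {p6 / p4:.3f}")
```

Under these assumptions the risk grows almost linearly with the rebuild time, so a rebuild that takes 50% longer is roughly 50% more likely to be interrupted by a second failure.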

--
Repeal the 17th Amendment TODAY! Also Please Read http://www.gnu.org/philosophy/right-to-read.html
Re:Solved a Long Time Ago by AK+Marc · 2009-09-19 17:41 · Score: 1

The problem is that you are more likely to experience another failure before the first one has been mitigated.

The first one is mitigated. Completely and totally. It's that the second one causes data loss and can't be mitigated because the first one, though completely mitigated, reduced resources so that the second couldn't be mitigated.

The solution is usually a cluster or remote hotsite or something like that.

Which is a RAED. It's no more or less than RAID, other than its expense is higher. It can fail. Just because you have a 5+1 with your "+1" being off site doesn't mean that you are any more reliable than a 5+1 on-site. And any scheme you have dedicated to you off-site would be cheaper in-house on-site (the exception being if you don't have dedicated resources; if you have 10 people sharing a RAID 6+1, that will be cheaper than any of those 10 trying to duplicate it themselves alone).

--
Learn to love Alaska
Re:Solved a Long Time Ago by Anonymous Coward · 2009-09-20 20:07 · Score: 0

GDPS (Geographically Dispersed Parallel Sysplex) is a disaster recovery technology and won't necessarily address the problem in the article. Unless the two mainframes are within about 50Km of each other (less than 5ms), GDPS is forced to work in asynchronous mode. In async mode, the source system buffers writes, which means some amount of data will be lost in the event that one system goes up in smoke with the datacenter. You have the same problem with very high transactional rate systems. Even putting the two mainframes right next to each other won't help because the latency caused by the write will slow system performance to the point that the second system will be forced to run in the asynchronous mode. There is a mode, called synchronous, that will prevent any data loss, but it is only usable when the two systems are close to each other and the transactional rate is low enough. BTW, every other operating system (Windows, Linux, AIX, Solaris and HPUX) can do the same thing as GDPS.

Bogus outdated thinking by twisteddk · 2009-09-17 21:28 · Score: 5, Interesting

The author says it himself in the article:

"And running software RAID-5 or RAID-6 equivalent does not address the underlying issues with the drive. Yes, you could mirror to get out of the disk reliability penalty box, but that does not address the cost issue."

but he hasn't addressed the fact that today you get 100 times as much disk space for the same cost as you did 10 years ago when cost was a factor. In real life cost isn't a factor when it comes to data storage, simply because it's really low in real life projects, as compared to the other costs in a project requiring storage. So if you want the reliability you go get a mirror. Drive space is dirt cheap.

As for the rebuild times, fine, go buy FASTER drives. I don't see the problem. HP and many other vendors have long been trying to sell combined RAID solutions (like the EVA) where you mix high storage with high performance drives (like SSD vs. SATA).

The only real argument for the validity of this article is the personal use of drives/storage. And name 3 people you know who run raid-5 on their personal PCs, and I'll show you 3 guys who can't afford an SSD drive.

--
--- To err is human... Am I more human than most ?

Re:Bogus outdated thinking by TechnoFrood · 2009-09-17 22:56 · Score: 5, Insightful

I admit I haven't RTFA, but I don't quite get your statement of "And name 3 people you know who run raid-5 on their personal PCs, and I'll show you 3 guys who can't afford an SSD drive.", I can't see how an SSD is a replacement for a raid-5 array. Everyone I know who uses a raid-5 uses it for large amounts of storage with a basic level of protection against data loss. I could justify replacing a raid-0 set up with a SSD.
That said I definitely couldn't afford an SSD that would be able to replace the raid-5 in my pc (4x500GB usable space of 1.34TB), the largest SSD listed on ebuyer.com are 250GB @ £360 each, I would need 8 to match my raid 5 setup which is £2880 which is probably enough to build 2 reasonable machines both with a 1.34TB raid-5 using normal HDDs.
Re:Bogus outdated thinking by drsmithy · 2009-09-17 23:00 · Score: 5, Insightful

And name 3 people you know who run raid-5 on their personal PCs, and I'll show you 3 guys who can't afford an SSD drive.
Huh ? That's like saying show me 3 people who have a nice pair of running shoes and I'll show you 3 guys who can't afford a car.
Re:Bogus outdated thinking by daybot · 2009-09-17 23:04 · Score: 3, Informative

And name 3 people you know who run raid-5 on their personal PCs, and I'll show you 3 guys who can't afford an SSD drive.
Yeah, every time an article on storage catches my eye, I have to check laptop SSD prices. So far, each time I do this, for the cost of a drive the size I need, I could buy a new snowboard, or a laptop, bike, half a holiday, room full of beer... etc. I really want one, but so far I haven't been able to look at that list and say "I'd rather have an SSD!"
Re:Bogus outdated thinking by c6gunner · 2009-09-17 23:12 · Score: 1

That said I definitely couldn't afford an SSD that would be able to replace the raid-5 in my pc (4x500GB usable space of 1.34TB), the largest SSD listed on ebuyer.com are 250GB @ £360 each, I would need 8 to match my raid 5 setup which is £2880 which is probably enough to build 2 reasonable machines both with a 1.34TB raid-5 using normal HDDs.
In today's prices, it'd be enough to build 2 machines with MUCH higher capacity (5TB+). Remember that a 1 TB drive today costs less than what you probably spent per 500 gig drive.
Re:Bogus outdated thinking by Anonymous Coward · 2009-09-17 23:16 · Score: 1, Informative

Actually I run a RAID 5 array off of a server in my home. 8 147 GB SAS 15000rpm drives. I'm a photographer and have tens of thousands of images that I need an affordable storage solution for, and RAID 5 does the trick, along with off site back up.
For one, SSDs are simply not practical, nor cost-efficient yet, and certainly not in the size and quantity I would require. For two, your argument simply doesn't wash: even if you use SSDs it doesn't eliminate the need for a RAID array, not for someone that truly needs the fault tolerance and redundancy which is the reason for having an array in the first place. Your argument is simply to build an array at four times the cost. Sure, I can afford to spend 6 grand building an SSD RAID but the real question is why would I when I can have an enterprise class solution for $1500. Your summation is just ridiculous.
On a side note, if someone has a true need for RAID and they're using a software RAID solution then they're asking for problems. A hardware solution should be the ONLY consideration for a real RAID setup.
Re:Bogus outdated thinking by tg123 · 2009-09-17 23:28 · Score: 2, Insightful

I admit I haven't RTFA, but I don't quite get your statement of "And name 3 people you know who run raid-5 on their personal PCs, and I'll show you 3 guys who can't afford an SSD drive.", I can't see how an SSD is a replacement for a raid-5 array. Everyone I know who uses a raid-5 uses it for large amounts of storage with a basic level of protection against data loss........
I hope you're not mixing up RAID with a backup.
Raid when used for protecting your computer will not protect your data it just makes your system able to tolerate hard drive failure.
Re:Bogus outdated thinking by Anonymous Coward · 2009-09-17 23:32 · Score: 0

And name 3 people you know who run raid-5 on their personal PCs, and I'll show you 3 guys who can't afford an SSD drive.
Why don't you name three people you know who have 2TB (or more) of SSD storage on their personal PCs?
Good luck with that.
Re:Bogus outdated thinking by lorenlal · 2009-09-17 23:44 · Score: 1

I think the point is that SSDs are one way to address the problem of rebuilds and thus, reliability. The limiting factor in disk bandwidth is the mechanical process of spinning the disk and getting the head over the right part and reading. A 6 Gb/s SATA interface was approved this year, but that was mostly due to the emergence of these SSDs. Yes, it's great that you can have a huge RAID-5 setup at home, and it's probably of very little consequence if the array rebuilds for a couple of days.
It's a whole different matter if you're a large business and you're sweating it out for those 2 days waiting for the rebuild to finish. Not because you'd lose your data... That's why you have backups. But if you have a second failure during that rebuild, you're looking at having to start the rebuild over again, and then you have to restore from tape. The solution for that is to have a second server with the setup... And depending on the cost of the first, may end up costing a lot more than those SSDs, especially if you factor in energy usage.
Since SSDs are still relatively new, yes, they're expensive. I also remember how 10 years ago, a 10 GB mechanical drive was about $100... The deal is, it's all about needs.. You don't need SSD at home... You don't need 99.999% uptime... But there are businesses and people that do. They'll be plenty happy to pay.
Re:Bogus outdated thinking by plover · 2009-09-17 23:45 · Score: 3, Funny

Half a holiday is overrated. Buy the SSD! :-)

--
John
Re:Bogus outdated thinking by J4 · 2009-09-17 23:47 · Score: 1

Argument sounds good on paper yet faster drives don't have a lot of impact because improvements in capacity/$ outpace improvements in speed. 2 x not fast enough=still not fast enough.
When morguefile.com went down they spent 3 days in limbo waiting for a RAID-6 array to rebuild and when it finally finished it was garbage. The site was down for 2 weeks due to extenuating circumstances but what ate the time was the sheer amount of data that had to be processed.
One day I'll get around to writing down what really happened.
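The "2 x not fast enough" point can be illustrated with a best-case rebuild estimate: the replacement drive has to be written end to end, so capacity divided by sustained throughput is a floor on rebuild time. The two drive generations below are hypothetical round numbers, not measurements, and a busy array will be far slower than this floor.

```python
def min_rebuild_hours(capacity_gb: float, sustained_mb_per_s: float) -> float:
    """Best-case time to write a whole replacement drive (decimal GB and MB)."""
    seconds = capacity_gb * 1000.0 / sustained_mb_per_s
    return seconds / 3600.0

# Hypothetical generations: capacity grows 4x while throughput only doubles.
older = min_rebuild_hours(capacity_gb=500, sustained_mb_per_s=50)
newer = min_rebuild_hours(capacity_gb=2000, sustained_mb_per_s=100)

print(f"older drive: {older:.2f} h minimum")
print(f"newer drive: {newer:.2f} h minimum")
print(f"rebuild takes {newer / older:.1f}x as long despite the faster drive")
```

As long as capacity grows faster than throughput, the floor keeps rising, which is the trend the article is worried about.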
Re:Bogus outdated thinking by Shakrai · 2009-09-18 00:19 · Score: 3, Funny

Huh ? That's like saying show me 3 people who have a nice pair of running shoes and I'll show you 3 guys who can't afford a car.
We need a +1 car analogy mod.... ;)

--
I want peace on earth and goodwill toward man.
We are the United States Government! We don't do that sort of thing.
Re:Bogus outdated thinking by Anonymous Coward · 2009-09-18 00:21 · Score: 0

But there are businesses and people that do. They'll be plenty happy to pay.
And if those businesses have their critical data on some lousy Windows servers they were not happy enough to pay the cost for a system designed to do the job. There's so many SAN manufacturers out there creating decent solutions, that the best solution for an enterprise waiting 2 days for a disc rebuild is to find the people responsible for that decision not to go with a SAN and hang them by their balls.
Re:Bogus outdated thinking by FlyingBishop · 2009-09-18 00:22 · Score: 1

It's more like saying show me 3 guys that own an SUV and I'll show you 3 guys who can't afford a hybrid car. They're different engines, for different tasks.
Re:Bogus outdated thinking by Lumpy · 2009-09-18 00:32 · Score: 4, Informative

The problem is IT guys and PHB's that think RAID=Backup.
It's not and it never has been a backup solution. RAID is high availability and nothing more.
RAID does its job perfectly for high availability and will continue to do so for decades. Sorry but I have yet to see any other technology deliver the capacity I use for my small 30TB Database we have at work. Our Raid 50 array works great. We also realtime mirror that to the Backup SQL server (not for backup of data but backup of the entire server so that when SQL1 goes offline SQL2 picks up the work.)
SQL2 is backed up to a SDAT tape magazine nightly.
RAID does what it's supposed to do perfectly, its days are not numbered because no other technology other than RAID can provide high availability.

--
Do not look at laser with remaining good eye.
Re:Bogus outdated thinking by Anonymous Coward · 2009-09-18 00:32 · Score: 0

I have a nice pair of running shoes and can't afford a car. That's one.
Re:Bogus outdated thinking by L4t3r4lu5 · 2009-09-18 00:42 · Score: 5, Insightful

Raid when used for protecting your computer will not protect your data it just makes your system able to tolerate hard drive failure.
... Which will protect my data when a drive fails.

RAID-5 means that I can have 3x500GB drives with 1TB of space, and not have the same worry (total loss of data) that I would if a 1x1TB drive failed.

We know it doesn't replace backup. We know it doesn't protect against theft, fire, malicious data destruction etc etc. You do realise who you're talking to, don't you? This is an IT article on Slashdot. Telling people on this thread that RAID isn't a replacement for regular backups is like telling a mechanic that a stick of celery is not a suitable replacement for a piston.
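For the 3x500GB RAID-5 example in this post, the usable space is simply two drives' worth, since one drive's worth of capacity goes to parity. A minimal sketch of that arithmetic follows; the function name is illustrative, and it assumes equal-sized drives and ignores filesystem and controller overhead.

```python
def raid5_usable_gb(drives: int, drive_gb: float) -> float:
    """Usable capacity of a RAID 5 set: one drive's worth is spent on parity."""
    if drives < 3:
        raise ValueError("RAID 5 needs at least 3 drives")
    return (drives - 1) * drive_gb

for n in (3, 4, 8):
    usable = raid5_usable_gb(n, 500)
    print(f"{n} x 500GB RAID 5: {usable:.0f} GB usable, survives 1 drive failure")
```

The parity overhead shrinks as drives are added (a third of the raw space at 3 drives, an eighth at 8), but the array still only survives a single failure, which is the trade-off the rest of the thread argues about.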

--
Finally had enough. Come see us over at https://soylentnews.org/
Re:Bogus outdated thinking by TechnoFrood · 2009-09-18 00:44 · Score: 1

I am not mixing it up with a backup, anything important is backed up from the raid, but the fact it provides some form of tolerance is why I use raid-5 for the array instead of one of the other options my motherboard offers (JBOD,0,1,5).
Re:Bogus outdated thinking by Threni · 2009-09-18 00:46 · Score: 1

> In real life cost isn't a factor when it comes to datastorage, simply because it's really low in real life projects,
This is nonsense. You might be able to buy 1TB drives for £50, but they're shitty consumer drives, and they're way too slow for a proper company with a meaningful amount of data. If you are acquiring drives for a SAN based solution you'll be paying 1 or 2 orders of magnitude more to store a given amount of data, once you've taken into account redundancy, the need for backups, the fact you'll want really fast drives which won't fail, etc.
Re:Bogus outdated thinking by Anonymous Coward · 2009-09-18 00:47 · Score: 0

Obviously, YOU are not about to go to Oktoberfest!
Re:Bogus outdated thinking by bigstrat2003 · 2009-09-18 00:49 · Score: 1

That's a form of protecting your data, dude. Look, I'm not saying RAID is the only step you need to take, but let's not sell it shorter than we need to. It does protect your data, just not in every possible scenario.

--
"16MB (fuck off, MiB fascists)" - The Mighty Buzzard
Re:Bogus outdated thinking by Anonymous Coward · 2009-09-18 00:55 · Score: 0

As for the rebuildtimes, fine, go buy FASTER drives. I dont see the problem.
Obviously, you don't. The problem is not the wallclock time spent for rebuild but the vast amount of data that is read.
Re:Bogus outdated thinking by Coren22 · 2009-09-18 00:57 · Score: 4, Insightful

I will never run RAID 5 on anything but data I don't care about. The risk is too great, and the rebuild times are not near good enough. RAID 1 or 10 is the only way to go. The acronym is Redundant Array of Inexpensive Disks, if they are so Inexpensive, why are you concerned about the difference between losing 1 drive to parity, or losing half your drives to duplicates. I cannot think of a single place where RAID 5 is appropriate, the performance loss on write just isn't worth the trouble.
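The write performance loss mentioned here comes from the read-modify-write cycle on small updates. The sketch below is a toy model with 2-byte blocks, not any controller's real code path; it assumes no write-back cache and no full-stripe writes, both of which real hardware uses to hide some of this cost.

```python
def xor(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

def raid5_small_write(old_data: bytes, old_parity: bytes, new_data: bytes):
    """Update one block in a stripe.

    Costs 4 disk I/Os: read old data, read old parity,
    write new data, write new parity.
    """
    new_parity = xor(xor(old_parity, old_data), new_data)
    return new_parity, 4

# A stripe of three data blocks and its parity.
d = [b"\x01\x02", b"\x10\x20", b"\xaa\xbb"]
parity = xor(xor(d[0], d[1]), d[2])

# Overwrite the middle block and update parity incrementally.
new_block = b"\xff\x00"
new_parity, ios = raid5_small_write(d[1], parity, new_block)

# The incremental result matches recomputing parity from scratch.
full_recompute = xor(xor(d[0], new_block), d[2])
print(new_parity == full_recompute, "with", ios, "I/Os")
```

A mirrored pair handles the same update with 2 writes and no reads, which is the usual argument for RAID 1 or 10 on write-heavy workloads.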

--
APK likes to ask for responses to the same things over and over. Maybe he just likes the responses?
Re:Bogus outdated thinking by MadnessASAP · 2009-09-18 01:01 · Score: 1

Are you kidding me? Go for a room full of beer. With the amount of beer I could get for an SSD I would be so drunk I won't be able to tell my laptop from my ballsack, never mind an SSD from a regular drive.

--
I may agree with what you say, but I will defend to the death your right to face the consequences of saying it.
Re:Bogus outdated thinking by Coren22 · 2009-09-18 01:01 · Score: 1

It was such an appropriate car analogy too, really got the point across that the GGFP doesn't know what the hell he is talking about.

--
APK likes to ask for responses to the same things over and over. Maybe he just likes the responses?
Re:Bogus outdated thinking by Coren22 · 2009-09-18 01:09 · Score: 1

Maybe your thoughts are in the wrong direction? I recently bought a 32 GB Intel SSD, and that would be more than enough for your average laptop; you could then add external storage for your mass storage needs if you need more than that. It however is not enough space for a Vista Ultimate OS drive...time to use Ubuntu instead now that my tuner card is supported by Linux. FYI, I bought the SSD for my MCP, not my laptop as I already have a SSD hooked to my laptop.
NewEgg SATA SSD Drives
As you can see, there are quite a few smaller drives which are actually pretty reasonably priced. Granted, you aren't going to be getting a 100GB SSD for $100, but you can get 32GB for approximately $100

--
APK likes to ask for responses to the same things over and over. Maybe he just likes the responses?
Re:Bogus outdated thinking by Svartalf · 2009-09-18 01:37 · Score: 2, Interesting

RAID5 is not backup. It's resilience against the whole system being brought down by a failure.
RAID was originally developed to make what we consider small storage capacities (then massive) affordable and reasonably reliable.
You're using RAID5 in its "intended" use- but an SSD of the same capacity will be inherently MORE reliable (by a factor of how many of those magnetic disks you remove) than your system design right now.
From personal experience with a system customer base of literally thousands of enterprise class servers spread out over many companies, RAID doesn't work QUITE the way people make it out to be. We're ripping it out of the equipment and reverting to warm backups instead- the RAID1 design they fielded made the servers unstable.
The field engineer crowd (one of my friends worked with Nortel in the field engineer group and my brother is a manager for outsource company doing a lot of the same work with the same customers...) HATES RAID.
Blow a controller? Better hope you have an identical one in stock. You can't just swap out a differing controller of the same brand or pop a different brand in- they all do things ever so slightly differently on the disks.
Blow a disk? Better hope you can get the new drive in there and integrate it properly before you lose another.
Disks don't have the reliability we once thought they had.
RAID doesn't do what most people think it does for them.

--
I am not merely a "consumer" or a "taxpayer". I am a Citizen of the State of Texas
Re:Bogus outdated thinking by Anonymous Coward · 2009-09-18 01:44 · Score: 0

Can't make sense of your logic... but I support the "outdated" part... You see... I remember a slashdot story (October 22nd, 2008) about it: http://hardware.slashdot.org/story/08/10/21/2126252/Why-RAID-5-Stops-Working-In-2009 Which actually linked to a zdnet story published July 18th, 2007 ... http://blogs.zdnet.com/storage/?p=162 It's 2009... Does raid5 still do its job?
Re:Bogus outdated thinking by metamatic · 2009-09-18 01:47 · Score: 4, Insightful

Blow a controller? Better hope you have an identical one in stock. You can't just swap out a differing controller of the same brand or pop a different brand in- they all do things ever so slightly differently on the disks.
That's why I prefer software RAID.

--
GCHQ Quantum Insert installed. If only our tongues were made of glass, how much more careful we would be when we speak
Re:Bogus outdated thinking by Hi_2k · 2009-09-18 02:02 · Score: 2, Informative

That's why the smart money is based on node-based storage: Multiple boxes that are interchangeable. It's a shameless product plug, but I work for Isilon Systems, and our solution is that the whole system is considered replaceable: We don't sell a configuration that doesn't allow you to yank an entire box transparently. A drive failure is rebuilt and ready for swapping as soon as it comes up: Most of our admins don't know about disk failures until their data is already reprotected.

Granted, our smallest config is 9TB; We're somewhat overkill for a home user. But if you need a company-wide NAS...

Commodity hardware, standard networking (Gig and 10Gig Ethernet frontend, Infiniband backend), and a very smart filesystem (Capable of protecting from up to 4 simultaneous whole-node failures) == a killer combination; It takes some seriously bad luck for data-loss to become a problem.

--
When life gives you crap, Make Crapade.
Sluggy Freelance.
Re:Bogus outdated thinking by Anonymous Coward · 2009-09-18 02:07 · Score: 0

You had to go introduce a car analogy...
Re:Bogus outdated thinking by fluffernutter · 2009-09-18 02:07 · Score: 2, Informative

To do a true raid-5, cost of the drives is fairly negligible. AFAIK, to avoid missing writes in the event of a power outage you need a true raid chassis with a battery backup, which runs $4K+. Fake raid 5 and software raid 5 are pretty risky, as writes can get caught in the parity calculation and not get written out if the power goes down to everything at the same time.

--
Laws are rules for the court, but merely a bottom bar to hit for life. Think beyond laws in your actions always.
Re:Bogus outdated thinking by operagost · 2009-09-18 02:15 · Score: 1

We, and all of our customers, have service contracts with a major IT company-- the one who makes most of the hardware we buy. We let them worry about having replacement controllers, as it is their responsibility under contract. We also make sure our customers have good backups, of course.

--

Gamingmuseum.com: Give your 3D accelerator a rest.
Re:Bogus outdated thinking by Anonymous Coward · 2009-09-18 02:28 · Score: 0

Apparently you don't know where you are posting. Slashdot ALWAYS has had, and will have a bunch of ignorant folks who think RAID is a backup. No matter how many times you tell them it's not.
Re:Bogus outdated thinking by Anonymous Coward · 2009-09-18 02:34 · Score: 0

I have been supporting
You do not need nothing at home, but getting 99.99 at home is very achievable even with a single spindle running. My computers stay up 24/7 with 2.5" drives, and every 18 months or so I just buy a new drive, not because the old one is failing, but because the old ones are not big enough. Then the old drives make it into external enclosures wired to a good powered 7 port USB hub, that are fired up once every day to take the backup of the internal drives and are then shut down. Yes I do have identical copies of the backup, and yes, I take some of the drives to store in the desk at work. If I was really that paranoid I would not keep data at all. Computers, like all other electronics, are engineered and built to fail at some point - they are . Most people seem to not get it. It's like complaining that after using that pickaxe for 6 months in the quarry it does not look and perform like new.
On another note - never trust flash sticks and SSDs for your data. The cause of death on these is usually electrical, while inserting or removing the stick and while powering the SSD on or off. Not to mention the cell wear of the SSD - I have had SSDs die in 6 months even now, even the best makes. With spindle drives, if the drive fails - drive electronics, the heads' arms, or the spindle motor - the data is still on the spindle, and has a very good chance of a full, if pricey, recovery.
Re:Bogus outdated thinking by Rockoon · 2009-09-18 02:36 · Score: 2, Informative

To do a true raid-5, cost of the drives is fairly negligible.
While you are absolutely correct about cost, I think your definition of what a true raid-5 is needs a little work.

The purpose of RAID-n is to survive failures with near-zero downtime. The larger the disparity grows between capacity and performance as array sizes increase, the less and less these RAID's are serving their purpose. The chance of a drive failure while rebuilding a multi-TB array is quite significant, an occurrence that RAID-n was supposed to minimize to near-zero levels.

In the future, there will only be RAID-0 and RAID-JBOD for conventional drives. Uptime will have to be solved another way, because RAID-n solves it less and less as the years (and thus, capacity) tick away.

--
"His name was James Damore."
Re:Bogus outdated thinking by Ifni · 2009-09-18 02:44 · Score: 1

Unless it's FRESH ORGANIC celery, of course.

--
Oh, was that my outside voice?
Re:Bogus outdated thinking by Ifni · 2009-09-18 02:46 · Score: 1

Yes, but if you need TWO SSDs, then it becomes a tougher decision.

--
Oh, was that my outside voice?
Re:Bogus outdated thinking by hjf · 2009-09-18 02:47 · Score: 1

DAYS to rebuild an array? WTF?
I have an old athlon64, 3GB RAM (running at DDR200 instead of DDR400 because of a stupid memory controller, not even dual channel). Motherboard is a regular ASUS board, and it's running OpenSolaris (some 1-year-old build). It serves silly services for internal network, but also runs 2 VMs (VirtualBox) which keep the cpu and RAM moderately busy most of the time. SATA drivers are Sun-provided-never-updated drivers.
ZFS (software "RAID") takes about 2 hours to rebuild ("resilver", in zfs's terms) the 4x500GB array (90% full). I'm pretty sure that if the controller was SATA2, and I had more RAM (ZFS keeps huge caches in RAM -- by default "system memory - 1GB"), a second level cache (an SSD drive), and more up to date hardware, things would go much faster. I don't know why people keep saying that the array would take days to rebuild.
Probably only if you're near 100% utilization on disk I/O it would take you that long, but provided that ZFS can make caches from RAM (you can easily have 24GB on the newer Core i7 boards) and SSD (which would give you some 256GB of extra cache), I don't see a situation where it could take that long (or probably yes: if you run a program to read random sectors, effectively making cache useless).
Re:Bogus outdated thinking by hazydave · 2009-09-18 03:07 · Score: 1

Yes, it's true... I do run a RAID on my personal PC (well, in a little box next to it), and it's also true, I can't afford a 6TB SSD. Sad, but true.
However, I also know a secret... you can re-write a sector on an HDD many, many times more than the same physical sector on an SSD. The SSD gets you around the mechanical issue of seek times, but it doesn't get around the idea that things fail. And, given that SSDs usually do wear levelling to ensure no one sector wears out prematurely (depending on the specifics of the NAND flash used in the SSD, whether it's MLC or SLC, etc... some MLCs are only good for a few tens of thousands of writes), by the time you start to see a failure or two, chances are, the whole thing's ready to go. They're also typically slower at dealing with writes, particularly many small writes, since you have very large block sizes at the flash-memory level (up to about a megabyte) on large SSDs.
Obviously, there are plenty of usage patterns that would yield a mechanical failure from an HDD long before there's a flash cell failure from an SSD. In both cases, the realistic life expectancy may be fine... I don't expect an HDD to last forever... I'm happy when they last to the point at which a replacement's cost is essentially a "no brainer" (eg, when 2TB drives start showing up below $100 or so, my 6TB RAID is likely to become an 8TB RAID).

--
-Dave Haynie
Re:Bogus outdated thinking by Sancho · 2009-09-18 03:18 · Score: 1

It does protect your data, just not in every possible scenario.
But it's really, really bad to start thinking about it as protection for your data.
Anyway, RAID started out as a way to get higher performance I/O. Doing this reduced reliability, so measures were taken to restore some of that reliability. Using RAID5 for reliability is like going to Afghanistan, but only staying in the good parts.
Re:Bogus outdated thinking by Anonymous Coward · 2009-09-18 03:28 · Score: 1, Informative

I use Linux software RAID5 with three 640GB SATA drives in my home PC which serves as my MythTV DVR as well as my fileserver. At the time I assembled it, this was the sweet spot of price, size, performance, and power efficiency.
I have an almost identical host located about 400 miles away in a relatives home, serving as a mirror for my fileserver content (but not DVR content which I consider disposable, but still RAID protected just for convenience of recovery if a simple disk error hits me). That host has a mixture of 640 GB drives and 320 GB drives because it has actually evolved over six years since I originally assembled it with three 160 GB drives.
I replaced drives with larger cheap ones when something failed or its SMART data was looking iffy or I needed more space, always maintaining RAID5 level protection for my data except during brief degraded array events. I always purchased cost effective replacements which were usually larger, allowing a size progression like this: 3x 160 GB ... 2x 160 GB + 1x 320 GB (ignore upper 160 GB) ... 2x 160 GB + 2x 320 GB (migrate into logically 3x 320 GB by having upper and lower half of each 320 GB drive associate with one of the 160 GB drives) ... 3x 320 GB; repeat part of this sequence with 640 GB replacing 320 GB drives. By the way, I also migrated the host hardware through CPU speed upgrades, chassis and motherboard upgrades, whole conversion from AMD to Intel CPU, and many operating system replacements in that six year period. During all of this, Linux MD RAID let me maintain the same data arrays on the same disk set while swapping out such other components.
My backup strategy is to keep generational backups on the same disks (separate RAID5 filesystems) on each host and frequently synchronize the main fileserver image over the Internet with rsync. So recent changes are propagated between hosts and each keeps its own running generational snapshots so I can recover from a complete system loss with a worst case effort of sneaker-net to carry bulk data 400 miles. In practice, I've never experienced any complete system loss, though I have had to rebuild the boot OS remotely in order to gain access to the still intact data arrays after one peculiar hardware error event.
In my experience, Linux MD RAID has been wonderful. I was able to do the above-mentioned reconfiguration of disks in a live system (only powered down temporarily to physically install and remove internal SATA drives). I create the RAID arrays over disk partitions, so I can selectively add and remove the disk zones in chunks using mdadm. Rather than try to resize2fs on these multi-year filesystems, I admit that I did tar/reformat/untar some filesystems one time to resize and defrag.
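
The size progression described above (a 320 GB drive joining two 160 GB drives with its upper half ignored) follows from how a single RAID5 array sizes itself: every member contributes only as much as the smallest one, and one member's worth goes to parity. A minimal Python sketch of that arithmetic; the function name is made up for illustration and is not part of mdadm:

```python
def raid5_usable_gb(member_sizes_gb):
    """Usable capacity of one RAID5 array built from the given members.

    Assumption: each member is truncated to the smallest member's size,
    and one member's worth of space holds parity.
    """
    if len(member_sizes_gb) < 3:
        raise ValueError("this sketch expects at least three members")
    return (len(member_sizes_gb) - 1) * min(member_sizes_gb)


# The first steps of the progression from the post.
print(raid5_usable_gb([160, 160, 160]))
print(raid5_usable_gb([160, 160, 320]))  # upper 160 GB of the big drive is unused
print(raid5_usable_gb([320, 320, 320]))
```

Building arrays over partitions, as the post does, is what lets the otherwise wasted upper half join a second array.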
Re:Bogus outdated thinking by Wdomburg · 2009-09-18 03:32 · Score: 2, Informative

Try a rebuild on a much larger aggregate running a dual parity array under load. Trust me, they can easily run days. Say you have a 16 disk aggregate using 1TB 7200RPM disks. Because you need every block in a stripe to reconstruct parity, you need to read from the other disks to reconstruct; so 14 reads and 1 write per block.
You're also misunderstanding how the SSD caching works for ZFS. Blocks are only pulled in after repeated requests, which isn't going to be the case for a resilver. There will be at least some benefit to read ahead caching in memory, but even that has sharply diminishing returns, particularly with the ZFS rebuild strategy of reconstructing at a file level rather than a linear block rebuild. That approach has significant benefits though. By walking through the metadata instead of blindly copying blocks you don't have to rebuild empty space, and if - god forbid - you lose more than one drive in a RAID-Z or two drives in a RAID-Z2 array, you still have a partial recovery to work with.
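
To put numbers on the "14 reads and 1 write per block" figure for the 16-disk dual-parity example, here is a rough Python sketch. It assumes the rebuild reads the minimum number of surviving blocks per stripe (as many as there are data disks); a real controller may read more. The function name is hypothetical:

```python
def rebuild_io(total_disks, parity_disks, disk_tb):
    """Return (reads per rebuilt block, total TB read) for one failed disk.

    Assumption: reconstructing one missing block needs one block from
    each of (total_disks - parity_disks) surviving disks.
    """
    reads_per_block = total_disks - parity_disks
    return reads_per_block, reads_per_block * disk_tb


reads, tb_read = rebuild_io(total_disks=16, parity_disks=2, disk_tb=1)
print(reads, "reads per block,", tb_read, "TB read to rebuild one 1TB disk")
```

All of that read traffic competes with production I/O, which is where the multi-day rebuilds under load come from.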
Re:Bogus outdated thinking by PitaBred · 2009-09-18 03:37 · Score: 1

Seconded. I have a 4x1TB RAID5 in my media center for the same reason. It'd be a pain in the ass to re-rip all my DVD's and such, but I don't really need to back up the data. I just want it to be fault tolerant, so if a drive dies, I can replace it and not lose all my data.

--
My blog. Good stuff (when I remember to update it). Read it.
Re:Bogus outdated thinking by phoenix_rizzen · 2009-09-18 03:43 · Score: 1

The difference is that ZFS resilver only rebuilds data that existed on the failed drive. Normal hardware RAID rebuild touches every sector of the disk.
Thus, a ZFS rebuild of a 50% full disk will only touch 50% of the disk. A rebuild of a normal array would touch 100% of the disk.
Wait until your ZFS array is at 80% full and try to resilver a failed drive. ;)
And when you get into TB-sized arrays, rebuilds take a lot longer than on GB-sized arrays.
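
A back-of-the-envelope version of that difference, as a Python sketch. The 50 MB/s sustained rate is a made-up figure for illustration, and the model ignores seek overhead and competing load:

```python
import math


def rebuild_hours(disk_gb, fill_fraction, mb_per_s, only_live_data):
    """Hours to rewrite one replacement disk at a sustained rate.

    only_live_data=True models a resilver that copies just the used
    fraction; False models a block-level rebuild of the whole disk.
    """
    gb_to_copy = disk_gb * fill_fraction if only_live_data else disk_gb
    return gb_to_copy * 1000 / mb_per_s / 3600


# Hypothetical 1 TB disk at 50 MB/s sustained.
for fill in (0.5, 0.8):
    resilver = rebuild_hours(1000, fill, 50, only_live_data=True)
    full = rebuild_hours(1000, fill, 50, only_live_data=False)
    print(f"{fill:.0%} full: resilver {resilver:.1f} h, full rebuild {full:.1f} h")
```

The advantage shrinks as the pool fills up, which is the point of the 80% remark.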
Re:Bogus outdated thinking by PitaBred · 2009-09-18 03:58 · Score: 1

RAID5 is awesome for my media center. It's data that's not terribly valuable and can be recreated, but I don't want to lose it to a drive failure because the initial creation wasn't easy, and a large, cohesive bit of space is nice for a media center program like MythTV. The drives are inexpensive, true, and I only spent $400 on 4 1TB drives. That's all I can fit in that case, and having that extra 1TB of space on my media center right now means that instead of the array being 75% filled, it's only 50% filled.

Just because you can't think of a place for it doesn't mean there isn't one.

--
My blog. Good stuff (when I remember to update it). Read it.
Re:Bogus outdated thinking by tehSpork · 2009-09-18 04:14 · Score: 1

The problem is IT guys and PHB's that think RAID=Backup.
Bullshit. If that's true where you work then you'd better be looking for some new IT guys. In my time in IT I have never seen a RAID without a tape jukebox or some other backup system behind it.

Now were you to qualify that as "the unqualified scabs doing IT for most small businesses" that could be a different story.
Re:Bogus outdated thinking by Guspaz · 2009-09-18 04:36 · Score: 1

Which reminds me; the problem about rebuild times growing will more or less go away as the industry (eventually) moves to SSDs; freed of the mechanical limitations of existing drives, SSDs will likely see increases in capacity keep up with increases in speed.
Today's high-end (~$900) drives can theoretically do a rebuild on a 3x80GB RAID-5 array in about two and a half to three minutes. Yes, minutes; the read/write speeds are on the order of half a gig per second on the PCI-e based drives.
So, OK, we've got performance covered for the foreseeable future. What about the concern that RAID-6 is just a "reliability Band Aid"? That's bullshit, to be honest.
The big problem with RAID-5 is, what happens if you're rebuilding a missing drive, and you get a read error on one of your two remaining drives? Well, excusing the fact that filesystems using checksums such as ZFS should be able to detect and re-read if the data itself is sound, RAID-6 fixes the problem in that the *EXACT* same sector would have to be bad on two out of three disks in order to cause a permanent failure. The chances of that happening are infinitesimal, and always will be.
I took a look at Seagate's Cheetah drives. They have a reported error rate of 1x10^-16. If we assume that's in bits, that indicates you'll get a read/write error every 1.25 petabytes. OK, not too low, but I can see that being a concern as disk sizes get bigger.
OK, so what are the chances of the *SAME* sector having an error there? How often would that occur? Unless I'm mistaken, you multiply the probabilities, and you get a read error on the two disks simultaneously every 12,500,000,000,000,000 petabytes. That don't sound like no "Band Aid" to me.
*NOTE: I'm crap at statistics, so I know that I'm failing to take into account that there are three disks that can have a read error in a RAID-6 scenario. So sue me, it doesn't change my point.
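
The arithmetic above can be checked with a few lines of Python. This sketch follows the post's own simplifications: the error rate is per bit, errors on the two disks are independent, and a petabyte is 10^15 bytes. It uses exact fractions to avoid floating-point rounding:

```python
from fractions import Fraction

BYTES_PER_PETABYTE = 10**15  # decimal petabyte, an assumption


def petabytes_per_error(bits_per_error):
    """Average data read per error, in petabytes, as an exact fraction."""
    return Fraction(bits_per_error, 8 * BYTES_PER_PETABYTE)


single = petabytes_per_error(10**16)
# Same position bad on two independent disks: the probabilities multiply,
# so the expected number of bits between such events is (10**16) ** 2.
double = petabytes_per_error((10**16) ** 2)

print(float(single))  # petabytes per single-disk error
print(int(double))    # petabytes per coinciding error
```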
Re:Bogus outdated thinking by smoker2 · 2009-09-18 04:43 · Score: 1

I have ebuyer in my hosts file to redirect me to a warning on my system.
Re:Bogus outdated thinking by amplt1337 · 2009-09-18 05:02 · Score: 1

A room full of beer, on the other hand... just think of the possibilities!

--
Freedom isn't free; its price is the well-being of others.
Re:Bogus outdated thinking by Courageous · 2009-09-18 05:12 · Score: 2, Informative

As for the rebuild times, fine, go buy FASTER drives.
Hard drives are getting bigger faster than they are getting faster.
Hard drives are getting bigger faster than they are getting more reliable.
In an enterprise setting, SATA based storage is a reality, for cost reasons, in tiers 2 and 3.
Your suggestion that this problem is solved simply by buying faster drives is a poor one.
And in a few generations of high speed drives, the problem will manifest regardless.
Henry's article is not as clear as it could be, however. He's really talking about the pending failure for traditional raid sets as we know them, such as aggregates of N drives in a set, or drives hung off a RAID controller. RAID as an algorithm for error correction is nowhere near failure. Look at the manner in which Isilon does it. All the data in an Isilon system is part of a clustered RAID approach, but this is distributed in data packets far different than standard block. All nodes in an Isilon cluster participate in a "RAID rebuild" when it's needed; the system is capable of multigigabyte per second RAID rebuild, and it only rebuilds what is needed, not the "disk". This can all be done with economical SATA drives.
Note, however, that Isilon's RAID is not really RAID at all. I.e., it's not about arrays of disk, but rather parity based correction of lost file redundancy data. I.e., it's more object based, such as Henry was alluding to.
As for the classic RAID set, Henry is quite right when he says that it is trying to die. RAID rebuild times are already in excess of 24 hours, and are going to be that much worse with 2TB and 4TB drives. With longer RAID rebuild times, pDATALOSS increases notably, particularly if you are aware of the Google and Carnegie findings that drives actually tend to fail at the same time. I.e., pFAIL of a HD is not independent of pFAIL of other HD's in a RAID set. They tend to fail together.
C//
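
A rough Python sketch of why longer rebuilds raise pDATALOSS for a single-parity set. It assumes independent drives with a constant failure rate, which is exactly the assumption the Google and Carnegie findings undermine, so treat the result as an optimistic floor. The drive count and MTBF below are hypothetical:

```python
import math


def p_second_failure(surviving_drives, rebuild_hours, mtbf_hours):
    """Chance that at least one surviving drive fails during the rebuild.

    Assumption: independent exponential lifetimes with the given MTBF.
    Correlated failures would make the real figure higher.
    """
    rate = surviving_drives / mtbf_hours  # failures per hour across the set
    return 1.0 - math.exp(-rate * rebuild_hours)


# Hypothetical set: 15 survivors, 1,000,000-hour per-drive MTBF.
for hours in (2, 24, 72):
    print(hours, "h rebuild:", p_second_failure(15, hours, 1_000_000))
```

Under this model the risk grows almost linearly with rebuild time, so tripling the rebuild window roughly triples the exposure.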
Re:Bogus outdated thinking by Anonymous Coward · 2009-09-18 05:17 · Score: 0

Spoken like someone that has no idea what they're talking about.
First of all, the write performance on a RAID 1 set up is much, much slower than RAID 5. RAID 1 requires everything to be written twice, once on the primary, once on the mirror, that's what RAID 1 is, RAID 5 requires a single write and a parity bit. Secondly, as a RAID array is scaled up, RAID 5 is far cheaper. If you have an array spanning across 10 drives you only need 1 additional drive for parity, in RAID 1 you would need a total of 20 drives, not to mention power and physical space requirements. Rebuilding is much faster in RAID 5, when a drive fails, the entire array is working to rebuild the replacement drive plus you still have access to the array and all of its data while it is rebuilding, in RAID 1 you're out of commission until the contents of the failed drive are restored by the single mirror drive. And thirdly, RAID, as defined by the RAID advisory board, stands for Redundant Array of Independent Disks.
Re:Bogus outdated thinking by hawk · 2009-09-18 06:01 · Score: 1

Well, I think we just found out how vista was conceived :)
hawk
Re:Bogus outdated thinking by Coren22 · 2009-09-18 06:03 · Score: 1

Um...yeah...
So, write performance of a RAID 1 setup is slow? Write performance on a RAID 1 is 1 X each drive speed (1 x 2 / 2), write performance on a RAID 5 array is (1 x 1 / X + Y) X being the number of drives in the array and Y being the amount of time it takes to compute, which typically is the long part of the equation. As far as cheaper, is days of your time rebuilding the array for a missed read cheaper to you? It isn't to me when I have half the company asking me why our file server is so slow. Rebuilding is faster on a RAID 5? Now we are getting into "what are you smoking?" area. Rebuild on a RAID 10 is a matter of copying data from one drive to another, RAID 5 requires you reading from every drive, recomputing the parity, then rewriting to the replaced drive.
Obviously, it is you who needs to learn about RAID since you don't know the basics of how these things work, and since I work in the industry building these items and testing them. I would post the link to a wonderful site detailing why RAID 5 needs to die a painful death, but I can't seem to find it at the moment.

--
APK likes to ask for responses to the same things over and over. Maybe he just likes the responses?
Re:Bogus outdated thinking by Coren22 · 2009-09-18 06:06 · Score: 1

I would tend to agree with this as a good case to fit under "data I don't care about" such as rips of DVDs you have, or MP3 rips of your music CDs. If you are recording from TV to this though, make sure you have a good RAID controller that computes the parity on board or you might find that the encoder can't keep up with live TV. I actually use a RAID 0 on my media pc as it needs no overhead, and I could care less if my recordings are lost as they would be uploaded to my file server after processing.

--
APK likes to ask for responses to the same things over and over. Maybe he just likes the responses?
Re:Bogus outdated thinking by delcielo · 2009-09-18 06:41 · Score: 1

SSDs aren't perfect either. For critical data, you'll see companies deploying their shiny new SSD drives in SSD RAID arrays.

--
Hot Damn! It's the Soggy Bottom Boys!
Re:Bogus outdated thinking by qoncept · 2009-09-18 06:56 · Score: 1

Show me three hot chicks doing things even I think are gross and I'll show you three bills.

--
Whale
Re:Bogus outdated thinking by Burning1 · 2009-09-18 07:01 · Score: 1

It's not and it never has been a backup solution. RAID is high availability and nothing more.
Raid is fault tolerance, high availability, performance, capacity, management, and in many cases, monitoring.
Re:Bogus outdated thinking by Anonymous Coward · 2009-09-18 07:27 · Score: 0

and I could care less if my recordings are lost as they would be uploaded to my file server after processing.
So you must care a great deal then, since you could care less. But that doesn't make much sense in the context of your argument. Or do you mean you couldn't care less?
Re:Bogus outdated thinking by EXrider · 2009-09-18 08:15 · Score: 1

The problem is IT guys and PHB's that think RAID=Backup.
Nobody that thinks "RAID=Backup" qualifies as an "IT guy", that's a disaster waiting to happen.

--
grep -iw skynet /etc/services
Re:Bogus outdated thinking by Coren22 · 2009-09-18 09:13 · Score: 1

See sig for answer...or can you not read.

--
APK likes to ask for responses to the same things over and over. Maybe he just likes the responses?
Re:Bogus outdated thinking by Carnildo · 2009-09-18 09:38 · Score: 1

Lucky you. When I built my storage server, the comparisons were things like "new car" or "down payment on a house".
Needless to say, I instead spent $500 for five 1TB drives in a RAID-6 array.

--
"They redundantly repeated themselves over and over again incessantly without end ad infinitum" -- ibid.
Re:Bogus outdated thinking by hjf · 2009-09-18 11:30 · Score: 1

read the post. the array is 4x500GB (being upgraded to 4x1TB), 90% full. still takes 2 hours to resilver.
Re:Bogus outdated thinking by hjf · 2009-09-18 11:37 · Score: 1

when I mentioned SSD caching I was talking about the regular "business" reads (not the "rebuild" reads).
you mention RAID-Z2, that's a completely different animal. but yeah, that's what it's for: the chance of a second drive failing during the rebuild, which RAID-5 doesn't take into account (pretty much, the main topic of the original post was about a second drive failing)
so if it takes you days to rebuild an array you're, statistically speaking, much safer and it won't matter if it takes a couple of days to rebuild, it will become an issue only if you're near 100% I/O usage (which is rare 24/7)
Re:Bogus outdated thinking by daybot · 2009-09-18 12:14 · Score: 1

When I built my storage server, the comparisons were things like "new car"... I instead spent $500 for five 1TB drives in a RAID-6 array.
What good is a new house and a new car if you can't enjoy them because you're worrying about your RAID rebuild times? Choose life: choose SSD!
Re:Bogus outdated thinking by daybot · 2009-09-18 12:32 · Score: 1
32 GB Intel SSD, and that would be more then enough for your average laptop
I work and play on my laptop. It's my only computer - a fact that makes my life easier. Here's my storage requirements on top of OS/apps:
- Photos: 40GB
- Music: 50GB
- VMs: 100GB (work-related)
- Client data (encrypted, but no personal data): 30GB
- Video: 1,200GB
My laptop drive lets me carry everything but video without an external HD. Back at home, I have full-size external drives for the video and for backups. Let me know when 512GB SSDs cost less than a Burton Custom.
Re:Bogus outdated thinking by ypctx · 2009-09-18 13:20 · Score: 1

to protect from 4 node failure, is the storage overhead 1:5 or do you use some compression technique?
Re:Bogus outdated thinking by dbIII · 2009-09-18 14:18 · Score: 1

One reason I went for hardware RAID instead of software is that it would be more likely that my theoretical replacement I've never met would be able to go out and buy another card than read my documentation that they may not be able to find in the first frantic 5 minutes on site with a dead array. Large organisations where you can depend on things getting passed on are a completely different story.
In hindsight I could do it and just tape instructions onto the outside of the cases. I didn't think of that at the time and the shiny aspects of the web front ends and easy setup on boot sold me.
Re:Bogus outdated thinking by dbIII · 2009-09-18 14:38 · Score: 1

There are so many that The Daily WTF would no longer consider a "I thought RAID was backup?" disaster worth a story. I have a few in my workplace (developers are IT guys too), I keep an eye on their purchases and take snapshots every week or so of their "backup" areas on RAID5.
800GB tapes are cheaper than most people think.
Re:Bogus outdated thinking by dbIII · 2009-09-18 14:54 · Score: 1

I can think of several and have several. With geophysical applications you have the input data, the transformations you do on it and the working space which just contains modifications to the input data. I use RAID5 for the working space because anything in there can be constructed from the input data and the information about what has been done to it. Of course the whole lot goes onto tape at regular intervals. In the worst case I could lose five days worth of work if an array dies but in nearly every case it would take a lot less than five days to reconstruct it all. There are still all the jobs so they are just queued up to all run again and if time is an issue more nodes could be used than in the original run.
Disk is cheaper now so I'll be moving off RAID5 to avoid those worst case situations, but it certainly has its uses.
Re:Bogus outdated thinking by Trogre · 2009-09-19 00:13 · Score: 1

An electric car.
(given the current track record of SSDs)

--
"Nine times out of ten, starting a fire is not the best way to solve the problem." - my wife
Re:Bogus outdated thinking by Anonymous Coward · 2009-09-19 03:25 · Score: 0

Get with the times. It is Redundant Array of INDEPENDENT Disks.
Re:Bogus outdated thinking by Longstaff · 2009-09-19 06:26 · Score: 1

I would mod up if I could. I've got two separate Isilon clusters at work and they are indeed awesome. Just one of the features that go above and beyond standard RAID is what they call FlexProtect. It allows for the ability to handle your redundancy not just at a node/disk level, but right down to a directory/file level. Only need N+1 except for some critical data? You can give that file or whole directory N+4 while keeping the rest of the cluster N+1.
Re:Bogus outdated thinking by AK+Marc · 2009-09-19 17:48 · Score: 1

I don't know how you'd say that. If you have 400 nodes and can protect from a 4-node failure, then you have just 4 parity and 396 data, for a 4/396 or 1/99 of the available data being used for parity. But if you had 5 nodes, then 4/5 of the available data would be unavailable. So to give a number based on the redundancy seems to be absurd.
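
The two cases above, worked as a short Python sketch. It assumes equal-sized nodes and a simple N+4 parity layout; how Isilon actually lays data out is not stated in this thread, and the function names are made up for illustration:

```python
from fractions import Fraction


def parity_to_data(nodes, parity_nodes):
    """Parity capacity relative to data capacity."""
    return Fraction(parity_nodes, nodes - parity_nodes)


def parity_share(nodes, parity_nodes):
    """Parity capacity as a share of total raw capacity."""
    return Fraction(parity_nodes, nodes)


print(parity_to_data(400, 4))  # the 4/396 ratio from the post
print(parity_share(5, 4))      # the 4/5 case from the post
```

The overhead depends on cluster size, not on the protection level alone, which is why a single ratio does not answer the question.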

--
Learn to love Alaska
Re:Bogus outdated thinking by AK+Marc · 2009-09-19 17:52 · Score: 1

Just as an aside, this is why RAID-5 isn't always the best solution. Two 1TB drives would cost a little more than three 500GB drives, but would also offer better reliability. They should fail less often because, for the same per-drive MTBF, the RAID-1 solution will have a failure 33% less often. Fewer failures is a good thing.
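
The 33% figure is just the ratio of drive counts, as this small Python sketch shows. It assumes every drive has the same MTBF and fails independently, so the expected number of drive failures scales with the number of drives:

```python
from fractions import Fraction


def relative_failure_frequency(drives_a, drives_b):
    """How often array A sees a drive failure relative to array B.

    Assumption: identical, independent per-drive MTBF in both arrays.
    """
    return Fraction(drives_a, drives_b)


ratio = relative_failure_frequency(2, 3)  # 2-drive mirror vs 3-drive RAID-5
print("failures relative to RAID-5:", ratio)
print("reduction:", 1 - ratio)
```

This counts drive replacements only; it says nothing about how likely either array is to lose data.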

--
Learn to love Alaska
Re:Bogus outdated thinking by badkarmadayaccount · 2009-09-20 02:42 · Score: 1

And this is why Slashdot does not need signed certificates - this place is the only one where the parent post could for a second be considered serious.

--
I know tobacco is bad for you, so I smoke weed with crack.
Re:Bogus outdated thinking by Svartalf · 2009-09-21 05:34 · Score: 1

Heh... You'll note I didn't mention what type we were ripping out in my post- it was SOFTWARE RAID1 we were ripping out. It has more inherent problems than RAID5 and RAID5 in software has its own sets of issues. Intrinsically, "hardware" RAID is little more than software RAID on specialized hardware- but it only has to do one thing and it's typically battery backed up enough to ensure pending writes to the array get there on a power failure- if you're doing enterprise class hardware that is.
When someone mentions RAID of any kind, I question whether they understand what they're talking about. There are good reasons for RAID- but many of the people using it aren't in that domain of good reasons more often than not.

--
I am not merely a "consumer" or a "taxpayer". I am a Citizen of the State of Texas
Re:Bogus outdated thinking by jon3k · 2009-10-01 07:15 · Score: 1

SSDs when you need speed, HDDs when you need capacity. No one "needs" a ferrari, a minivan probably makes more sense. But if you've got the money and like the performance, buy an SSD.
Re:Bogus outdated thinking by jon3k · 2009-10-01 07:18 · Score: 1

Why do people continue to think that SSDs should be several orders of magnitude faster in IOPS and several times faster in throughput but somehow cost the same price? I never understood that.

Speed, Capacity, Low Cost - you can have any two.
Re:Bogus outdated thinking by jon3k · 2009-10-01 07:20 · Score: 1

Please define a "real RAID setup" and explain why software RAID isn't an option.
Re:Bogus outdated thinking by jon3k · 2009-10-01 07:31 · Score: 1

"Which reminds me; the problem about rebuild times growing will more or less go away as the industry (eventually) moves to SSDs; freed of the mechanical limitations of existing drives, SSDs will likely see increases in capacity keep up with increases in speed."

No, it will just increase the period of time before rebuild times become unacceptable. Growth of SSD capacity will far outpace the throughput increases going forward, so eventually, again, capacity will so far outpace performance that rebuild times will be unacceptable.

Enlighten me by El_Muerte_TDS · 2009-09-17 21:31 · Score: 3, Insightful

(Certain) RAID (levels) address the issue of potential dataloss due to hardware malfunction. How does moving to an Object-Based Storage Device address this issue better? Actually, I don't see how RAID and OSD are mutually exclusive.

Harddisks, not RAID by Anonymous Coward · 2009-09-17 21:35 · Score: 5, Insightful

Now that's a stupid article.

It basically says, you can't read a harddisk more than X times before you get an error on some sector, so RAID is dead. That's a logical non sequitur. RAID is a generic technology that also applies to flash memory cards, USB sticks, anything you can store data on basically. The base technique says "given this reliability, you can up the reliability if you add some redundancy". There's no link to harddisks other than that that's what they're used for right now.
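
A minimal sketch of that base technique, using single XOR parity over byte strings. The "devices" here are just Python bytes objects and the names are made up for illustration; nothing in it cares what medium the blocks live on.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three equal-sized blocks, each on a different (hypothetical) device.
data = [b"disk", b"card", b"usb!"]
parity = xor_blocks(data)  # stored on a fourth device

# Lose the second device, then rebuild it from the survivors plus parity.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt)
```

Since XOR-ing everything together gives zero, any one missing block is the XOR of all the others.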

Re:Harddisks, not RAID by Anonymous Coward · 2009-09-17 23:39 · Score: 0

The solution is simple. Just ask Jane to take care of it. She can do anything.
Oh wait, we don't have the ansible communication networks up yet. Nevermind....
Re:Harddisks, not RAID by J4 · 2009-09-18 00:26 · Score: 2, Insightful

RAID is here to stay for a while no doubt, but it's a response to a series of problems that has problems of its own. You can take 5+1 drives and make an array where one bad chassis slot can indeed take the whole thing out, or you make a bunch of mirrors at the expense of capacity, or you can stripe one scary large fragile volume. In production it's about performance & availability. Realize that the whole data integrity thing is relative and merely an illusion. It's kinda like on Futurama when they had the tanker with 1k hulls. The only solution to the first case is double the hardware, which is a major investment and recurring cost (rack space/electricity, stamps). Murphy's law tells us that indeed "shit happens", so there are no guarantees.
Although I didn't read the article I suspect it's promoting the cloud paradigm, which is the current ultimate expression of redundancy.
Re:Harddisks, not RAID by Coren22 · 2009-09-18 01:40 · Score: 2, Insightful

Wow, never thought I would see an obscure reference like that. Most of the people I know who read Ender's Game never bothered to read the rest of the series and would have no clue who Jane was.

--
APK likes to ask for responses to the same things over and over. Maybe he just likes the responses?
Re:Harddisks, not RAID by R2.0 · 2009-09-18 02:38 · Score: 1

"Wow, never thought I would see an obscure reference like that. Most of the people I know who read Ender's Game never bothered to read the rest of the series and would have no clue who Jane was."
Maybe that's because we read the novella first, thought it was fantastic, and then read the novel and thought WTF?
I could barely stand the crap he used to bulk up the page count the first time - why would I read an entire SERIES made up of that drivel?

--
"As God is my witness, I thought turkeys could fly." A. Carlson

There are always more solutions... by Anonymous Coward · 2009-09-17 21:37 · Score: 1, Interesting

Probably the next meta solution after RAID 6 will be something like ZFS, where the filesystem works not just on the fs-specific layer, but on the LVM layer so it can log CRCs of files and immediately be able to tell if a file got corrupted (and perhaps fix it with some ECC records.) One can see a filesystem not just writing a RAID layer, but taking recovery data and storing that away as filesystem metadata.
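
A toy sketch of the "log a checksum, verify on read" idea. This is not how ZFS is actually implemented; the dict-backed store and the function names are assumptions made purely for illustration.

```python
import zlib

def write_block(store, checksums, key, payload):
    """Store the payload and record its CRC32 separately, as metadata."""
    store[key] = payload
    checksums[key] = zlib.crc32(payload)

def read_block(store, checksums, key):
    """Return the payload only if it still matches the recorded CRC32."""
    payload = store[key]
    if zlib.crc32(payload) != checksums[key]:
        raise IOError(f"checksum mismatch on block {key!r}")
    return payload

store, checksums = {}, {}
write_block(store, checksums, "blk0", b"important data")
store["blk0"] = b"important dat\x00"  # simulate silent corruption on the medium

try:
    read_block(store, checksums, "blk0")
    detected = False
except IOError:
    detected = True

print("corruption detected:", detected)
```

Detection is the cheap part. Repair still needs a second copy or parity to read the good data from.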

Of course, there is always doing redundant arrays of RAID clusters, say three groups, two data, one parity, or mirroring RAID 5 volumes. You have the usual tradeoffs: The more fancy the RAID scheme, the more disks you need, and the more computing you have to do for every bit thrown at and read off the array.

Long term solution? A move to something other than magnetic storage. This could be optical, it could be SSD if some advance allows very large density increases, or something unknown. The technology would have to have a rate of failure magnitudes better than magnetic, as well as a cost on par with magnetic for it to completely work. Holographic storage has languished for a while, perhaps as the technology improves for that, we may see drives using 3D blocks of that replacing the old fashioned spindles.

Re:There are always more solutions... by denis-The-menace · 2009-09-18 01:44 · Score: 2, Informative

Who says there are no errors with optical media?
I've seen a CD with light shining through after 5 years.

--
Obama's legacy: (N)othing (S)ecure (A)nywhere and (T)error (S)imulation (A)dministration
Re:There are always more solutions... by Lifthrasir · 2009-09-20 10:55 · Score: 1

That's because there is a big hole in the middle of the disk :)

--
No beer, no TV make Lifthrasir something something

Ask what does Google do by Anonymous Coward · 2009-09-17 21:39 · Score: 0

Ask, ask, ask

Re:Ask what does Google do by K.+S.+Kyosuke · 2009-09-17 21:50 · Score: 0, Offtopic

I thought the modern version of the old saying was "Ask what would Jesus google."

--
Ezekiel 23:20
Re:Ask what does Google do by Carewolf · 2009-09-17 23:44 · Score: 2, Insightful

A search engine doesn't mind losing data, most of the storage is essentially just a cache or summary of the internet and can be regenerated. That said, Google already have so many mirrors for performance reasons that actual data loss is practically impossible.
Re:Ask what does Google do by hjf · 2009-09-18 03:00 · Score: 1

But they also host google mail and apps. The data on that can't be regenerated.
Re:Ask what does Google do by BitZtream · 2009-09-19 05:09 · Score: 1

That data is also stored on multiple machines across multiple data centers.
Google doesn't lose data. They lose drives, sometimes machines, and on rare occasion a full data center; they do not, however, lose data, because any one of those is insignificant when you store copies of everything on multiple continents.

--
Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager

RAID is here to stay by paulhar · 2009-09-17 21:41 · Score: 5, Insightful

Disclaimer: I work for a storage vendor.

> FTA: The real fix must be based on new technology such as OSD, where the disk knows what is stored on it and only has to read and write the objects being managed, not the whole device
OSD doesn't change anything. The disk has failed. How has OSD helped?

> FTA: or something like declustered RAID
Just skimming that document it seems to claim: only reconstruct data, not white space, and use a parity scheme that limits damage. Enterprise arrays that have native filesystem virtualisation (WAFL for example) already do this. RAID 6 arrays do this.

Let's recap. Physical devices including SSDs will fail. You need to be able to recover from failure. The failure could be as bad as the entire physical device failing, or as minor as a single sector being unreadable. In the former case a RAID reconstruct will recover the data but you'll hit RAID recovery errors due to the raw amount of data that needs to be recovered. Enterprise arrays mitigate the risk of recovery errors by using RAID 6. They could even recover the data from a DR mirrored system as part of the recovery scheme.
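
A back-of-envelope sketch of why the raw amount of data matters. It assumes unrecoverable read errors are independent and happen at the spec-sheet rate of one per so many bits read, which is a simplification; the array size is a made-up example.

```python
import math

def p_read_error(bytes_read, bits_per_error):
    """Probability of at least one unrecoverable read error while reading bytes_read."""
    bits = bytes_read * 8
    # 1 - (1 - 1/bits_per_error) ** bits, computed in a numerically stable way
    return -math.expm1(bits * math.log1p(-1.0 / bits_per_error))

TB = 1e12
rebuild_read = 6 * 2 * TB  # hypothetical: six surviving 2TB drives read end to end

p14 = p_read_error(rebuild_read, 1e14)  # drive rated at 1 error per 10^14 bits
p15 = p_read_error(rebuild_read, 1e15)  # drive rated at 1 error per 10^15 bits

print(f"10^14 drives: {p14:.1%} chance of hitting an error during the rebuild")
print(f"10^15 drives: {p15:.1%} chance of hitting an error during the rebuild")
```

With single parity, one such error during a rebuild means lost data. With double parity the second parity covers it, which is the mitigation described above.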

And when RAID 6 has a high enough risk that it's worth expanding the scheme everyone will start switching from double parity schemes to triple parity schemes since they're much less expensive in terms of spindle count than RAID 6+1.

One assumption is, at some point in the future, reconstructions will be a continually occurring background task just like any other background task that enterprise arrays handle. As long as there is enough resiliency and performance isn't impacted then it doesn't matter if a disk is being rebuilt.

Re:RAID is here to stay by Kjella · 2009-09-17 22:54 · Score: 4, Informative

And when RAID 6 has a high enough risk that it's worth expanding the scheme everyone will start switching from double parity schemes to triple parity schemes since they're much less expensive in terms of spindle count than RAID 6+1.
I don't think you've quite understood the problem described. You can have an infinite number of parity disks, but it does you no good if recovering one data disk causes another data disk to fail.
Imagine a disk fails on every 100TB of reads (10^14). You have ten 1TB data disks. Imagine you keep them in perfect rotation so they've spent 10, 20, 30, 40, 50, 60, 70, 80, 90 and 100% of their lifetime. The last disk dies and you replace it with a new drive (0%). To rebuild the drive you read 1TB from each data disk and use whatever parity you need. They've now spent 11, 21, 31, 41, 51, 61, 71, 81, 91 and 1% (your new disk) of their lifetime and you can read another 9TB before you need a new disk.
Now we try doing the same with ten 10TB disks and the same reliability. The last disk dies and you replace it, only now you must read 10TB from each disk. Instead of adding 1% to the lifetime it adds 10% so that they've spent 20, 30, 40, 50, 60, 70, 80, 90, 100 and 10% (your new disk) of their lifetime. But now another disk fails, you can recover that but then another will fail and another and another and another.
Basically, parity does not solve that issue. If you had a mirror, you would instead copy the mirrored disk with significantly less wear on the disks. RAID is very nice as a high-level check that the data isn't corrupted but it's a very inefficient way of rebuilding a whole disk.
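
The arithmetic in that thought experiment, as a small sketch. It keeps the deliberately extreme assumption that a disk is worn out after 100TB of reads, and it charges the replacement disk one full pass as well, as the numbers above do.

```python
def wear_after_rebuild(wear_percent, disk_tb, lifetime_tb=100):
    """Swap out the worn-out disk (at 100%), then charge every disk one full pass."""
    cost = 100 * disk_tb / lifetime_tb  # percent of lifetime used by one full pass
    replaced = [0 if w >= 100 else w for w in wear_percent]
    return [w + cost for w in replaced]

start = list(range(10, 101, 10))  # 10%, 20%, ..., 100% of lifetime used

small = wear_after_rebuild(start, disk_tb=1)   # ten 1TB disks
big = wear_after_rebuild(start, disk_tb=10)    # ten 10TB disks

print("1TB disks: ", small)
print("10TB disks:", big)
print("another disk worn out right after the rebuild:", max(big) >= 100)
```

With the 10TB disks every rebuild pushes the next-oldest disk over the edge, so the array never gets out of rebuild mode.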

--
Live today, because you never know what tomorrow brings
Re:RAID is here to stay by Anonymous Coward · 2009-09-17 23:19 · Score: 0

I don't think you can read. You can complain all you want but he already addressed your point here

One assumption is, at some point in the future, reconstructions will be a continually occurring background task just like any other background task that enterprise arrays handle. As long as there is enough resiliency and performance isn't impacted then it doesn't matter if a disk is being rebuilt.
Re:RAID is here to stay by paulhar · 2009-09-17 23:20 · Score: 2, Interesting

RAID 1 has much less reliability than RAID 6. Assume a typical case: one disk totally fails. You then start to reconstruct - in a RAID 1 scheme a single sector error will result in the rebuild failing. Not great.
In RAID 6 you start the rebuild and you get a single sector error from one of the drives you're rebuilding from. At that point you've got yet another parity scheme available (in the form of the RAID 6 bit) that figures out what that sector should have been and then continues the rebuild. Then you go back and decide what to do about that drive that had the second error.
A lot of drive failures aren't full head crashes or motor errors but just single sector, track, bits of dirt on the platter style errors. Other than the affected area the drive can be read.
With RAID 6 you can fail two disks completely and still access the data. You're still reading from the same ten 10TB disks in your example and if the implementation of RAID 6 is optimal (RAID-DP) you aren't having to read additional data from the same physical disks.
In the world you describe with 10TB drives it sounds like you'd just not be able to use the disks at all since any process that reads from the disks will kill them. There are a few things that could happen:
1. Disks get more reliable. Hasn't happened much yet but...
2. We switch to different packaging. Instead of making disks larger we cram more of them into the same space similar to CPU cores - same MTBF per disk but lots of them presented out by one physical interface.
3. We change technologies completely. SSD (interesting failure modes there too... needs RAID)
I guess we'll find out in only a few years...
Re:RAID is here to stay by Junta · 2009-09-18 00:02 · Score: 1

Enterprise arrays that have native filesystem virtualisation
Do we really have to put the word 'virtualisation' on everything? I don't see what aspect of the concept is remotely 'virtual'. Filesystem-level or filesystem-aware RAID schemes I wouldn't mind, but 'virtualisation' is being tossed around to the point of becoming a meaningless buzzword, completely stripped of its original, specific meaning.
Other than that one word, I agree with the sentiment. RAID is a sufficiently genericized concept that it can cover everything from the dumbest array configs (unclean arrays requiring full device resync, one bad sector read putting an array into degraded mode instantly, no filesystem awareness resulting in managing unimportant data) to smarter cases (unclean arrays being a near impossibility or resync aided by a journal to know which specific parts could have stale parity, bad sector read inducing sector rewrite if the hard drive can still rewrite and issuing a warning, and schemes that know the difference between used and unused space). If you focus on the low end and think it the end-all, be-all of RAID, then yes, you'll think it's in need of immediate attention. If you understand the more sophisticated implementations, you'll realize it scales no worse than the data you use.

--
XML is like violence. If it doesn't solve the problem, use more.
Re:RAID is here to stay by jittles · 2009-09-18 00:06 · Score: 1

I'm no expert on physical media here but as storage space increases wouldn't you expect the drive to be able to handle more reads?
Re:RAID is here to stay by secmartin · 2009-09-18 00:11 · Score: 1

In fact, ZFS has just gained support for triple-parity RAID precisely because of the long rebuild times with current-generation drives.
But given the ever-increasing size of drives, moving to RAID-10 might be a good alternative; you'll need more disks to reach a certain desired array size, but rebuild times will be far lower because you don't need to do parity calculations. With RAID-1 and RAID-10, a 2TB drive can be completely rebuilt in less than 8 hours, depending on how busy it is; and you don't suffer the extreme performance penalty you get when using a RAID-5 array in degraded mode.
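
For a sense of scale, a quick sketch of what the 8-hour figure implies. It assumes a mirror rebuild is one sequential copy at a steady rate with decimal units (1TB = 10^12 bytes); the 100 MB/s figure is a made-up example, not a measurement.

```python
def rebuild_hours(capacity_bytes, mb_per_second):
    """Hours needed to copy a whole drive at a sustained rate."""
    return capacity_bytes / (mb_per_second * 1e6) / 3600

def required_mb_per_second(capacity_bytes, hours):
    """Sustained rate needed to copy a whole drive within the given time."""
    return capacity_bytes / (hours * 3600) / 1e6

TWO_TB = 2e12

needed = required_mb_per_second(TWO_TB, 8)
at_100 = rebuild_hours(TWO_TB, 100)  # hypothetical 100 MB/s sustained copy

print(f"8-hour rebuild of 2TB needs about {needed:.0f} MB/s sustained")
print(f"at 100 MB/s the copy takes about {at_100:.1f} hours")
```

Capacity sits in the numerator and throughput in the denominator, so rebuild time grows whenever drives get bigger faster than they get quicker.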
Re:RAID is here to stay by Anonymous Coward · 2009-09-18 00:16 · Score: 0

To assume that we'll have 10TB disks which will give us on average one read error every 100TB of reads seems a little pessimistic. There are two kinds of read errors that a hard disk can give you: The fatal ones and the spurious ones. Fatal ones are caused by head crashes, material fatigue and other effects which will cause the whole drive to fail in short order. Spurious errors can be caused by vibration, surface impurities and electrical interference. These errors are more likely with increasing density (or to put it another way: As the capacity increases, the frequency of these errors remains constant, so they occur more often over the full capacity.) These errors can cause data loss, but they don't affect the rest of the drive. This means that the disk itself can lower the failure rate through internal redundancy schemes. The typical fatal errors on the other hand do not increase with capacity. A head crash is a head crash, whether the disk stores 1GB or 1TB. That means the frequency of fatal errors goes down as the capacity goes up. (Actually as read-write speed goes up.) It is true that it is futile to build a RAID with hard disks that can only be completely read/written 10 times before a fatal error occurs, but using such a hard disk for anything else would be just as futile.
Re:RAID is here to stay by dpilot · 2009-09-18 00:19 · Score: 2, Insightful

Even this doesn't handle the other side of the scenario...
Buy your box of drives and put them in a RAID-6. Chances are you just bought all of the drives at the same time, from the same vendor, and they're probably all the same model of the same brand. Chances are also very good that they're from the same manufacturing lot. You've got N "identical" drives. Install them all into your drive enclosure, power the whole thing up, build your RAID-6, put it into service.
Now all of your "identical" drives are running off of the same power supply, getting the same voltage. There's likely to be some temperature gradient inside the box, but overall they're all at similar temperatures. They have the same number of POH, the same number of read requests, same number of write requests. In essence, they remain very nearly "identical" through their service life.
Next, let one drive fail. What are your chances of having a second drive failure, especially when you power the RAID down to replace the first failing drive?
That's what I've heard anecdotal evidence of from those who manage this type of thing where I work. RAIDs tend not to have single-drive failures, or at least tend to have "time clustered" drive failures. Plan for it.

--
The living have better things to do than to continue hating the dead.
Re:RAID is here to stay by Anonymous Coward · 2009-09-18 00:23 · Score: 1, Interesting

Imagine a disk fails on every 100TB of reads (10^14). You have ten 1TB data disks. Imagine you keep them in perfect rotation so they've spent 10, 20, 30, 40, 50, 60, 70, 80, 90 and 100% of their lifetime. The last disk dies and you replace it with a new drive (0%). To rebuild the drive you read 1TB from each data disk and use whatever parity you need. They've now spent 11, 21, 31, 41, 51, 61, 71, 81, 91 and 1% (your new disk) of their lifetime and you can read another 9TB before you need a new disk.
Except that it doesn't work anywhere NEAR like that. The lifetime of a disk is much, much greater than the read cost to rebuild a failed drive. You're definitely not spending 1% of its total lifetime. You're not even spending 1/1000th of that 1%. You couldn't measure the difference between having to rebuild that drive and just using the disk as a new one.
The vast majority of hard disk failures are manufacturing issues, not end-of-lifetime for an average drive issue. You don't have a raid system because the average lifetime of a harddisk is small, you get a raid system because of the outliers. Every once in a while, a manufacturer puts out a deathstar, and it's going to fail a month after you put it in. At the same time, you'll have disks in there that are going to keep chugging away for five years straight, and you'll eventually replace them because you want a bigger disk, not because they've failed.
Re:RAID is here to stay by atamido · 2009-09-18 00:50 · Score: 2, Informative

Actually, reliability quickly scales towards RAID 1+0 as the number of drives increases. In a 14 drive array, a single drive failure in both is fine. A second drive failure has the possibility of destroying the RAID 1+0 array, but the chance of the right drive failing is low. With 3 total drive failures, RAID 6 will fail, while RAID 1+0 has a low probability of failure.
Rebuild times are also much shorter on RAID 1+0 as only a single drive has to be read, which reduces heat produced and the chance of a second failure.
There are some papers that describe the math of the statistical analysis to prove it, but I can't track them down at the moment. It is rather counterintuitive. But you have significantly less drive space, so RAID 6 may still be the better option for some circumstances.
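
Lacking the papers, here is a brute-force sketch of the 14-drive case. It assumes the failed drives are picked uniformly at random and all fail at once (no rebuild in between), and that drives 0/1, 2/3, ... form the mirror pairs.

```python
from fractions import Fraction
from itertools import combinations

def raid10_loss_probability(n_drives, n_failed):
    """Chance that n_failed simultaneous failures take out both halves of a mirror pair."""
    pairs = [(d, d + 1) for d in range(0, n_drives, 2)]
    total = fatal = 0
    for failed in combinations(range(n_drives), n_failed):
        total += 1
        failed = set(failed)
        if any(a in failed and b in failed for a, b in pairs):
            fatal += 1
    return Fraction(fatal, total)

def raid6_loss_probability(n_failed):
    """A single RAID 6 group survives any two failures and no three."""
    return Fraction(1 if n_failed >= 3 else 0)

for k in (1, 2, 3):
    r10 = raid10_loss_probability(14, k)
    r6 = raid6_loss_probability(k)
    print(f"{k} failed: RAID 1+0 loses data {float(r10):.1%}, RAID 6 {float(r6):.0%}")
```

Under these assumptions RAID 1+0 is the riskier one at two failures (1 in 13) and the safer one at three (3 in 13 versus certain loss for RAID 6).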
Re:RAID is here to stay by zrq · 2009-09-18 00:58 · Score: 1

2. We switch to different packaging. Instead of making disks larger we cram more of them into the same space similar to CPU cores - same MTBF per disk but lots of them presented out by one physical interface.
Um ... isn't that what RAID does ?
What you describe would just move the RAID controller inside the drive enclosure rather than on the PCI bus.
Unless you were thinking that we create a 10Tbyte disk from 10 x 1Tbyte discs, so if one fails you would only have to replicate 1Tbyte of data rather than the whole 10Tbyte.
In which case, the RAID controller would have to be able to 'see' inside the 10Tbyte virtual disc to know which of the internal discs had failed and what needed replicating.
So a 10Tbyte 'virtual' disc created by LVM gluing 10 x 1Tbyte RAID 1 arrays together to make them look like one large 10Tbyte disc ? .... all in one little box that could overheat, driven by one power supply that could fail or spike damaging the LVM or RAID controller chip corrupting the whole lot.
Re:RAID is here to stay by Kjella · 2009-09-18 01:01 · Score: 1

RAID 1 has much less reliability than RAID 6. Assume a typical case: one disk totally fails. You then start to reconstruct - in a RAID 1 scheme a single sector error will result in the rebuild failing. Not great.
Obviously, it was about RAID6+1 vs adding a third parity set, so you'd still have RAID6 as a fallback for that. Of course I pulled it to an extreme since these disks would only last 10 reads compared to current disks which claim 1 in 10^15 bits on 2TB = 500 reads between errors but that number has kept going down. The disk I presented would clearly be unusable, on the other hand I made some very kind assumptions about the rotation cycle, the effect of massive activity over short time and predictable failures.
My point was that if you're starting to hit size/reliability ratios where rebuilds trigger more rebuilds, you can't take the naive "What are the odds of THREE disks going down at once?" view. Take a bunch of disks of approximately the same generation, being suddenly asked to work very hard to rebuild terabytes of data. What are the odds of a third drive going bad but not a fourth? Pretty slim if they're all falling off the same cliff, and that's the only time another parity set would help.

--
Live today, because you never know what tomorrow brings
Re:RAID is here to stay by maraist · 2009-09-18 01:10 · Score: 2, Interesting

I don't understand what your failure rate strategy is. First of all, there's no such thing as saying you are 90% or 10% of the way through a disk's life. It's a probability distribution, whose probability is dramatically affected by the current events (and somewhat related to historical events). A drive might be at a 0.00005% probability of failure at any given moment, but then a large sustained read occurs which adjusts the heat and causes voltage fluctuations, so now you're operating at 0.001% probability.

Then a drive dies in hot-swap-mode, a drive spins down, then another spins up, this has massive voltage fluctuations as well as slight tension on the cabling which causes reflections in the wiring which increases your probability of failure to say 0.02%. (I'm totally making up numbers, but the trends are what's important).

So the act of powering down/up or hot-swapping intrinsically increases the probability of co-disk-failures, unless you have a very expensive system with separate AC/DC converters (e.g. fully decoupled) and obviously isolated frames, heat-compartments, etc.

BUT, you can mitigate this by having 3+-way redundancy (RAID-1; I honestly don't understand the point of using slower RAID-5 / RAID-6 anymore). So when one drive fails, you have addressed the probability of a second failure. There is a geometric reduction in probability that 3 or 4 or 5 simultaneous drives fail. Meaning even at the peak-risk part of the drive-swap operation, if you have say 2% probability that another drive will fail, then there is 0.04% probability that two drives will fail simultaneously. 0.0008% that three fail, etc.

This isn't strictly correct, of course, because the probabilities are not fully independent. You have many common components, and thus their probabilities are intertwined. But sufficient to say the probabilities are less.
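
The geometric reduction as a tiny sketch, under exactly that independence assumption. The 2% is the made-up per-drive figure from above, so only the trend is meaningful.

```python
def p_all_fail(p_single, k):
    """Probability that k specific drives all fail, assuming independent failures."""
    return p_single ** k

p = 0.02  # made-up chance that any one remaining drive fails during the swap

for k in (1, 2, 3):
    print(f"{k} more drive(s) failing: {p_all_fail(p, k):.4%}")
```

That works out to 2% for one more drive, 0.04% for two and 0.0008% for three. Shared power, heat and cabling make the real numbers worse than this.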

Now I say 3+way RAID-1 because it may be silly to swap out a single drive when one goes bad. The process I would recommend (if you have a sufficiently advanced RAID controller, and non-super-expensive disks), is this:

5-way RAID-1 with 2 powered down disks (thus effectively 3-way RAID-1)
On a drive failure, power up the two disks and initiate their syncing.
Swap out the error'd drive, and initiate its syncing.

For a brief while, you have 2 valid, 2 semi-valid, and 1 semi-semi-valid drive.

As the drives sync-up(may take over 24 hours), power-down the original remaining 2 and remove them.

Recycle the good disks into JBOH (Just a bunch of hardware) clustering. Meaning boot-disks / log-file disks in say RAID-1, swapping out the oldest drive.

You can either buy several 4-way/5-way RAID controllers, or get a single 15-disk RAID controller for under $1k. This allows you to have multiple logical volumes and share the 'spun-down disks'. So now you're really only using 3 disks per logical-volume, though having two logical volumes with bad disks does reduce your ideal reliability somewhat. But this gives you 4 volumes which can be combined into RAID-10. You could build such a system for under $6k with various mixtures of high-end and low-end disks (for different partition requirements, boot/OS/linear-logging (RAID-1), random-write-data (RAID-10)).

If the data is super critical, use a block-level master-slave replication. Ideally your application supports direct master-slave or better yet, multi-master.

And if you're JBOH (Just a Bunch Of Hardware) clustering, then trivial RAID-1 with 2 or 3 disks (in software-raid) is all you need. Note, I use 3-disk RAID10 on my home linux machine, (that plus DVD drive fills up my IDE slots) - pretty clever technique. Yes I know virtually all MB's have hardware RAID these days, but unless they've got an extra 4Gig of buffer-RAM in them, they're pointless in my opinion, plus they're non-portable (screw transparent windows support, you can't distinguish disk errors from forced reboots anyway).

--
-Michael
Re:RAID is here to stay by maraist · 2009-09-18 01:21 · Score: 1

Hehe, the 3..5-way redundancy I recommended was primarily to address the peak probability moment of failure. Ideally you address this by using a completely different mechanism to avoid the probability swings. The most correct way of doing so is, as I recommended, having fully electrically decoupled systems.

This means:
1) separate AC/DC converters for each power supply (you can often regulate DC voltages off a fluctuating AC source better than adjusting a DC voltage - well, unless you use some large step-down DC-to-DC voltage regulator, but I'm no expert in this industry, just a punk orphaned undergrad in EE)
2) fiber-optic connectors to the RAID controller.

You could accomplish step 1 with eSATA, but I have no experience with this in a server configuration. Obviously more expensive server solutions exist.

--
-Michael
Re:RAID is here to stay by LWATCDR · 2009-09-18 01:21 · Score: 2, Interesting

Well the logical thing IMHO is after the first year you put in a new drive and do an array rebuild after making a backup.
Drives are really cheap and I would do that for as long as the array is in use.
Reuse the old drives in desktops if they are SATA.
Not perfect but it keeps you from having an array of old drives in your server.

--
See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
Re:RAID is here to stay by vrmlguy · 2009-09-18 01:54 · Score: 1

I don't think you've quite understood the problem described. You can have an infinite number of parity disks, but it does you no good if recovering one data disk causes another data disk to fail.
Imagine a disk fails on every 100TB of reads (10^14). You have ten 1TB data disks. Imagine you keep them in perfect rotation so they've spent 10, 20, 30, 40, 50, 60, 70, 80, 90 and 100% of their lifetime. The last disk dies and you replace it with a new drive (0%). To rebuild the drive you read 1TB from each data disk and use whatever parity you need. They've now spent 11, 21, 31, 41, 51, 61, 71, 81, 91 and 1% (your new disk) of their lifetime and you can read another 9TB before you need a new disk.
Now we try doing the same with ten 10TB disks and the same reliability. The last disk dies and you replace it, only now you must read 10TB from each disk. Instead of adding 1% to the lifetime it adds 10% so that they've spent 20, 30, 40, 50, 60, 70, 80, 90, 100 and 10% (your new disk) of their lifetime. But now another disk fails, you can recover that but then another will fail and another and another and another.
This is more of a proof of concept, but imagine if your RAID-5 array is composed of mirrored pairs. You are still protected against double disk failures, but (if you spread your reads across the pair) you'll only see half the degradation when you rebuild. If that seems too expensive, then use RAID-5 (2+1), where a clever rebuild algorithm will only cost you 2/3s of the degradation. Now that you've wrapped your head around that idea, imagine an NxN set of disks, where each row and each column forms an independent parity set. Now you're protected against triple failures at the same cost as RAID-6. (You'd need to lose four drives on the corners of a rectangle to lose data.) Adjust your value of N to suit your needs; larger values cost you less overhead, but smaller values can improve write performance and potentially reduce rebuild degradation.
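
A toy sketch of the NxN idea, with small integers standing in for disk contents. It assumes XOR parity in the last column and last row, and a rebuild that repeatedly repairs any row or column with exactly one missing disk; all names are made up.

```python
from functools import reduce
from operator import xor

def build_grid(data):
    """Append a parity column and a parity row to a square block of data."""
    rows = [row + [reduce(xor, row)] for row in data]
    rows.append([reduce(xor, col) for col in zip(*rows)])
    return rows

def rebuild(grid, lost):
    """Repair lost cells in place; return True if everything came back."""
    n = len(grid)
    lost = set(lost)
    progress = True
    while lost and progress:
        progress = False
        for r, c in sorted(lost):
            row_peers = [(r, j) for j in range(n) if j != c]
            col_peers = [(i, c) for i in range(n) if i != r]
            for peers in (row_peers, col_peers):
                if not lost.intersection(peers):  # all peers readable
                    grid[r][c] = reduce(xor, (grid[i][j] for i, j in peers))
                    lost.discard((r, c))
                    progress = True
                    break
    return not lost

original = build_grid([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

def survives(lost):
    damaged = [row[:] for row in original]
    for r, c in lost:
        damaged[r][c] = None  # contents gone
    return rebuild(damaged, lost) and damaged == original

triple = survives([(0, 0), (0, 1), (1, 0)])
rectangle = survives([(0, 0), (0, 2), (2, 0), (2, 2)])
print("three disks lost:", triple, "| four on a rectangle:", rectangle)
```

The rectangle case is stuck because every lost disk has another lost disk in both its row and its column, so no parity set has a single unknown to solve for.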

Basically, parity does not solve that issue. If you had a mirror, you would instead copy the mirrored disk with significantly less wear on the disks. RAID is very nice as a high-level check that the data isn't corrupted but it's a very inefficient way of rebuilding a whole disk.
No one I know of uses RAID parity to check data integrity; it's way too expensive in terms of drive bandwidth. To check a sector, you have to read all the corresponding sectors on all of the drives in the parity set. If you're truly worried about this (and some people are!), it's much cheaper to add more error checking to your drive, either by custom microcode or in your device drivers (see, for example http://www.google.com/search?q=EMC+Double+Checksum and http://www.google.com/search?q=Oracle+HARD+Initiative).

--
Nothing for 6-digit uids?
Re:RAID is here to stay by Akatosh · 2009-09-18 02:04 · Score: 1

Also if two paired drives fail in a raid1+0 it's easier to recover data out of it (sometimes just powercycle and fsck) than a parity based raid. When raid5 double deaths your data gets turned into spaghetti. Join the Battle Against Any Raid Five.
Re:RAID is here to stay by Chris_Jefferson · 2009-09-18 03:15 · Score: 2, Insightful

Basically you are suggesting someone would make and then sell a disk which could only be read, entirely, 10 times in its entire lifetime?
Well that's easily solved. We won't buy those disks.

--
Combination - fun iPhone puzzling
Re:RAID is here to stay by sjames · 2009-09-18 05:18 · Score: 1

What that means is that a given RAID level with given hardware will have a maximum practical size. The filesystem will need to be able to handle being spread across multiple smaller block devices, each a RAID of maximum practical size.
Re:RAID is here to stay by sjames · 2009-09-18 05:20 · Score: 1

Personally, I like to rotate drives around. That is, pull a few (one at a time) from the RAID and put them aside for when another RAID is built. This gives a variety of drives at different ages to help prevent failures from clustering.
Re:RAID is here to stay by Anonymous Coward · 2009-09-18 05:46 · Score: 0

Disclaimer: I work for a storage vendor.
Clustered raid does work, and can allow for a constant rebuild time, even as disks grow larger.
Re:RAID is here to stay by Mad+Merlin · 2009-09-18 05:59 · Score: 1

> RAID 1 has much less reliability than RAID 6. Assume a typical case: one disk totally fails. You then start to reconstruct - in a RAID 1 scheme a single sector error will result in the rebuild failing. Not great.
> In RAID 6 you start the rebuild and you get a single sector error from one of the drives you're rebuilding from. At that point you've got yet another parity scheme available (in the form of the RAID 6 bit) that figures out what that sector should have been and then continues the rebuild. Then you go back and decide what to do about that drive that had the second error.

No, if you have 4 drives (the minimum useful amount for RAID 6), then you have 3 drives to read from when reconstructing a failed drive in RAID 1. If the read fails on the first drive you're reading from, you move on to the second and third to reconstruct the drive, which is much better reliability than having only dual parity.

--
Game! - Where the stick is mightier than the sword!
Re:RAID is here to stay by BitZtream · 2009-09-18 08:44 · Score: 1

I don't see what's so difficult about this.
Use smaller disk groups concatenated together for larger storage volumes.
Don't use 3 drives of 1TB each to get 2TB of RAID. Use 9 333GB drives.
3 virtual devices, let's call them vdevs.
You configure each virtual device with 3 drives in a RAID5 or 6 configuration.
Then build one big pool, let's call it a zpool, out of the 3 vdevs.
Then when one drive fails, it only takes the time for 333GB to copy, not 1TB.
Let's roll it all up into one complete system. Let's build a file system that goes right along with it, with proper consistency and snapshots, and make an efficient way to (insert the rest of the ZFS commercial here).
Then we could name it something cool ... like zfs ...
This has been solved already.
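The rebuild-time argument is just proportionality, sketched here in Python. The 100 MB/s sustained copy rate is an assumed round number, not a measurement, `rebuild_hours` is a hypothetical helper, and the small drive is taken as roughly a third of a terabyte.

```python
def rebuild_hours(capacity_gb, throughput_mb_s):
    """Hours to copy a whole drive at a sustained rate (decimal units)."""
    seconds = capacity_gb * 1000 / throughput_mb_s
    return seconds / 3600

ASSUMED_MB_S = 100  # hypothetical sustained rebuild rate

big = rebuild_hours(1000, ASSUMED_MB_S)   # one 1TB member
small = rebuild_hours(333, ASSUMED_MB_S)  # one member a third the size

print(f"1TB drive: {big:.2f} h, small drive: {small:.2f} h")
```

Whatever the real throughput is, the ratio stays the same: a member a third the size rebuilds in a third of the time.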

--
Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
Re:RAID is here to stay by badkarmadayaccount · 2009-09-20 05:12 · Score: 1

How about parity calcs on something that can deal with them - GPGPUs?

--
I know tobacco is bad for you, so I smoke weed with crack.

Hardware RAID is dead by PiSkyHi · 2009-09-17 21:46 · Score: 3, Interesting

Hardware RAID is dead - software for redundant storage is just getting started. I am looking forward to making use of btrfs so I can have some consistency and confidence in how I deal with any ultimately disposable storage component.

The ZFS folks have been doing it fine for some time now.

Hardware RAID controllers have no place in modern storage arrays - except those forced to run Windows

Re:Hardware RAID is dead by Chrisje · 2009-09-17 21:57 · Score: 4, Insightful

First of all, "Hardware RAID" is still software, just executed by dedicated circuits. The distinction is kind of moot. For low-cost, low performance systems, software can run on your main box to perform this task, but for high-end applications you'll want dedicated hardware to take care of it, so your machine can do what it needs to do with more zeal.
So my guess is that you're not working for a storage vendor. I haven't seen many people switch to SW RAID recently. If anything, the Unix world is finally crawling out of its "lvm striping" hole. Most servers anywhere are running on stuff like HP's Proliants, and I don't see customers ship back the SmartArray controllers.
Re:Hardware RAID is dead by paulhar · 2009-09-17 22:07 · Score: 2, Informative

> First of all, "Hardware RAID" is still software, just executed by dedicated circuits. The distinction is kind of moot.
I'm not sure where in my post you saw anything about a comparison between Hardware RAID or Software RAID.
> So my guess is that you're not working for a storage vendor. I haven't seen many people switch to SW RAID recently.
I work for NetApp. I didn't think it mattered much in the post I made though. To your second point, as all of the NetApp Enterprise storage systems use software based RAID I can happily confirm that many hundreds of thousands of customers have switched to software RAID.
As you mentioned earlier though the point is moot since when you're delivering an enterprise array to a customer it doesn't matter if the array uses RAID cards provided by a 3rd party vendor, uses RAID cards built in-house, or uses software RAID to write the data that the customer gives you. The ingress point for the customer is a physical port (IP/FC typically) and that port provides RAID capabilities. Maybe that's also hardware RAID?
Re:Hardware RAID is dead by Anonymous Coward · 2009-09-17 22:52 · Score: 0

Hint: he wasn't replying to you
Re:Hardware RAID is dead by RulerOf · 2009-09-17 22:59 · Score: 3, Informative

FWIW, I'm a happy 3ware customer... saddened by their sellout to LSI, but I digress.

When I think of software RAID, I think of parity data being handled by the operating system, being done on x86 chips as part of the kernel or offloaded via a driver (thinking Fake-RAID).

If you're abstracting your storage away from the operating system that uses it, say via iSCSI or NFS or SMB to a dedicated storage box, like a NetApp filer or a Celerra, then I would consider that hardware RAID, personally speaking. If you're saying that these dedicated storage boxes manage parity, mirroring and so on all done with the same chip that's also running their local operating systems, then I have to admit that yes, that sounds like software RAID to me, but the real distinction I've come to draw between software and hardware RAID is a matter of performance and feature set. If said boxes give the same or better performance (I/Ops and throughput) to a workload as a dedicated, internal storage system managed by something like my 9650SE, then hell..... who cares, right? Aside from being rather impressed that such is possible without dedicated XOR chips, that is.

--
Boot Windows, Linux, and ESX over the network for free.
Re:Hardware RAID is dead by amorsen · 2009-09-17 23:33 · Score: 1

> I don't see customers ship back the SmartArray controllers.
There's no need to ship them back, especially the low end versions. The high end ones can't handle non-RAID-formatted disks, which is a bit of a pain, but they perform ok as long as you avoid using real RAID (anything that requires calculating parity.) So stick with RAID-1, and SmartArray is fine -- just have a spare controller or server around, because you can't switch controller vendor without losing your data.

--
Finally! A year of moderation! Ready for 2019?
Re:Hardware RAID is dead by StormReaver · 2009-09-18 00:31 · Score: 1

> Most servers anywhere are running on stuff like HP's Proliants, and I don't see customers ship back the SmartArray controllers.
That doesn't mean it isn't happening. We had three SmartArray controllers fail in rapid succession. Each one we replaced failed within days, until the fourth one finally worked. That was certainly a rare situation, but SmartArrays are not a magic bullet. They sometimes fail, and they sometimes fail spectacularly, just like everything else.
Re:Hardware RAID is dead by tuxicle · 2009-09-18 00:57 · Score: 1
Two problems with RAID implemented on the host CPU:
1. I/O bandwidth between CPU and disks is used up to do the multiple reads/writes needed for parity-based RAIDs like RAID5/6. With a dedicated controller, the high-bandwidth traffic stops at the controller, and the host's I/O buses only see a single read or write.
2. Host-based RAID lacks the memory backup batteries for write caches, which help in the event of a power failure.
From a management point of view, though, host-based RAIDs are much nicer. Most "hardware" RAIDs are a pain to maintain; the less expensive ones require a reboot to get to the BIOS-based management software.
Re:Hardware RAID is dead by atamido · 2009-09-18 01:09 · Score: 1

While it isn't likely to gain many fans on Slashdot, the Drive Extender feature of Windows Home Server is the perfect RAID replacement for home uses.
http://en.wikipedia.org/wiki/Windows_Home_Server#Drive_Extender
It does file level replication across NTFS formatted drives of dissimilar sizes. You can add/remove drives easily. If you have a 20 drive array and half of your drives fail, you only lose the files that were just on those drives. You can pull any one of the drives out and hook it to another computer (Windows/Linux/etc) and you can read all of the files stored on that drive in their original directory structure.
There can be performance gains, but there might not be depending on usage (whether or not people are accessing files that have copies on different drives). It's perfect for home uses as recovery is brain dead simple and the storage scalability is great for storing videos on.
Nothing like this exists yet in Linux (or anywhere else that I've seen). There are some pretty flexible Linux distros out there that do some magic form of RAID, but there are still serious performance penalties with writing the parity, and they lack the simple (connect drive to any other machine) recovery method.
The magic happens through the Windows SMB share, so if the feature were to be replicated in Linux, it'd probably be through a modification/hack of Samba. Ideally the storage would be on ZFS/BTRFS so that data integrity is hashed at the block level. I'm basically waiting around for someone to figure out how to do this in Linux.
Re:Hardware RAID is dead by Anonymous Coward · 2009-09-18 01:22 · Score: 0

> First of all, "Hardware RAID" is still software, just executed by dedicated circuits. The distinction is kind of moot.
> I'm not sure where in my post you saw anything about a comparison between Hardware RAID or Software RAID.
Ummmm he wasn't replying to your post...
Re:Hardware RAID is dead by aiosaka · 2009-09-18 01:27 · Score: 1

> ..Most servers anywhere are running on stuff like HP's Proliants, and I don't see customers ship back the SmartArray controllers.
I'm shipping an HP P400 SmartArray controller back. It's a nice controller, but it only knows how to do RAID, not JBOD. And while the HP DL185 is a nice machine for a storage server (12 3.5-inch SATA drives in only 2U), there's no way I'm putting Windows on it to serve a several-terabyte datastore. OpenSolaris with ZFS and COMSTAR, on the other hand: yes. I just have to replace that nice shiny P400 controller first, to get access to the disks and make the ZFS magic work.
Re:Hardware RAID is dead by maraist · 2009-09-18 01:29 · Score: 1

lvm striping?? Where's the mdadm love??? It's so much more advanced (3-disk RAID-10, for example). When you've got 4Gig RAM and quad processing almost standard and 16-core with 64Gig on the cheap, where's the loss in performance due to SW raid?

Hell, I've got a mysql server with 32Gig of RAM with direct-IO so that non-app system-ram is used purely as a dirty disk spool (since fsync can return once one drive has the data). Ok, so that machine is using HW raid, but that's really only for hot-swap and 15-disk enclosure. My other machines (with only 4G RAM) have similar configurations but SW-RAID.

--
-Michael
Re:Hardware RAID is dead by maraist · 2009-09-18 01:45 · Score: 1

PARITY MUST DIE.. Seriously people, that's all I hear about in this thread.. Didn't you guys get the memo? Hard disks are large enough to not need disk-extension via RAID-5 (the ONLY advantage of RAID-5) for a vast majority of apps. I found some old RAID-5 SCSI configurations when I got to my current company.. We got a 20% speed-boost by switching it over to RAID-1 (granted the disk size cavitated). We did wind up quadrupling its RAM, but that was cheaper than upgrading a single new SCSI disk. Yes RAID-5 is faster at reading, but I guarantee you that 64Gig of sys-RAM is faster than RAID-5 reads and RAID-1 is MUCH faster at random-writes than RAID-5 (though it does trail in linear-writes which is generally a non-issue).

That being said, there's little that dedicated hardware can do that a good OS with lots of RAM can't do better (so long as you have enough parallel SATA connectors). Yes you can have battery-backup, but that's similar to a UPS per machine (like Google does). And when you enter multi-master clustering, then a good clustered OS RAID blows the pants off a SAN (granted, I'm not aware of such a currently 'good' system, but many theoretical ones are in the works).

Anyway, I'm making unsubstantiated claims, so feel free to embarrass me with benchmarks - I only have anecdotal evidence and theoretical purity.

--
-Michael
Re:Hardware RAID is dead by nxtw · 2009-09-18 02:00 · Score: 1

> Aside from being rather impressed that such is possible without dedicated XOR chips, that is.
XOR chips really aren't that special. A Core 2 can perform RAID6 calculations in excess of 6000 megabytes/sec. The difficulty is getting enough bandwidth from the drives to the CPU (for example, placing more than 2-4 drives on a PCI-e x1 controller will limit performance.)
Linux software RAID on (PCI-e based) commodity desktop hardware can saturate gigabit ethernet rather easily.
Re:Hardware RAID is dead by gbjbaanb · 2009-09-18 02:20 · Score: 1

And those BIOS-based dedicated 'fakeraid' controllers certainly aren't special at all.
Software RAID is often much faster than the fakeraid hardware controllers, and much easier to update :)
Re:Hardware RAID is dead by hazydave · 2009-09-18 03:14 · Score: 1

All RAID is software RAID. The only difference between "hardware" RAID and software RAID is where the software runs.. does it get its own CPU or not? And there are plenty of applications for "hardware" RAID... external "box of drives" when your PC is full, large NAS (I have one of these), SANs (not yet), etc. From the perspective of the RAID hardware, you may well be running Linux with ZFS or BTRFS or something entirely different, and that's software... but from the external client's perspective, it's hardware RAID. And this is the way it's always been done... there were "hardware RAID" cards for the PC's ISA bus... I recall one had an embedded SPARC processor running the RAID software, back in the mid-80s or so.

--
-Dave Haynie
Re:Hardware RAID is dead by smoker2 · 2009-09-18 05:01 · Score: 1

The problem with hardware raid is the hardware. If one card goes bad, you often must replace it with an identical card, same vendor, same model, same version etc. Or your raid stops working. How many spares do you buy to start with to be sure of having a replacement? That's why I prefer software based RAID.

If it's a dedicated COTS box then the vendor should be providing service. But for self build it's better to remain adaptable.
Re:Hardware RAID is dead by sjames · 2009-09-18 05:26 · Score: 1

Actually I try to avoid hardware RAID unless the vendor is willing to fully document the on-disk layout and promise that it will NEVER change. Otherwise, a controller failure is effectively un-recoverable.
Of course, on the lower end, a "RAID controller" is just a bunch of SATA ports and a poorly written soft RAID driver.
Re:Hardware RAID is dead by Courageous · 2009-09-18 06:06 · Score: 1

Even NetApp's software based RAID, with classic aggregates, is starting to get long in the tooth when it comes to RAID rebuild times on SATA disks. So while you are right that SW based parity recovery of lost data elements is far from dead, you're not so right about NetApp's current approach to it. Have you looked at your rebuild times for larger aggregates of SATA systems, and considered the implications for 2TB and (coming) 4TB SATA drives? The situation does not look good.
C//
Re:Hardware RAID is dead by Anonymous Coward · 2009-09-18 07:31 · Score: 0

Ok, so the MegaRAID line is pretty weak (I do work for a certain storage vendor)...
As far as "dedicated" storage boxes like NetApp, WAFL, etc... Most of those are basically filesystem heads that virtualize backend storage (IBM/LSI/EMC/etc). That backend storage 90% of the time still consists of RAID sets (mostly RAID5 still) that the frontend filer concatenates or otherwise virtualizes. Products like IBM's SVC fall into the same category.
From my years of working here, I'd say RAID is far from dead. Given the number of IT guys who seem to believe that they don't need backups, I think it speaks well for the reliability and stability of well-done mid-range or enterprise RAID solutions that I rarely see customers with data that is just gone. Upcoming solutions that incorporate SSDs will more than take care of the speed issues. Pre-emptive rebuilds can reduce reconstruction times to less than 10 minutes even in multi-TB RAID sets when properly implemented.
RAID is far from dead.
Re:Hardware RAID is dead by Anonymous Coward · 2009-09-18 07:44 · Score: 0

(not able to log in to reply in person)
Reconstruct times on a large aggr are similar to a small one, as the aggr gets broken up into raid sets, the maximum size of which is 14+2. Reading 15TB and writing 1TB doesn't take very long, less than 24 hours.
What numbers are you seeing?
Re:Hardware RAID is dead by Anonymous Coward · 2009-09-18 08:30 · Score: 0

If you work for a hardware raid vendor.. you *wouldn't* see people switching to software raid - By definition, you wouldn't see them come back!
Re:Hardware RAID is dead by Courageous · 2009-09-18 08:32 · Score: 1

Turn it around: what numbers is NetApp advertising, and will you upgrade our filers to a higher model if we don't get those numbers? And why the hell are you AC'ing this. It just annoys.
C//
Re:Hardware RAID is dead by Atario · 2009-09-18 21:08 · Score: 1

> the real distinction I've come to draw between software and hardware RAID is a matter of performance and feature set. If said boxes give the same or better performance (I/Ops and throughput) to a workload as a dedicated, internal storage system managed by something like my 9650SE, then hell..... who cares, right?

The whole point of doing software RAID is that you are no longer bound to a proprietary disk format set by the controller manufacturer. When your hardware RAID controller fails, you better have an exact duplicate already bought, because you have no guarantee the manufacturer even makes that one anymore (or is even in business anymore). When the software RAID machine's motherboard (or drive controller) fails, you just get whatever new one you want, because your software will be the same and the drive interfaces will be the same.

--
"A great democracy must be progressive or it will soon cease to be a great democracy." --Theodore Roosevelt
Re:Hardware RAID is dead by Anonymous Coward · 2009-09-18 22:10 · Score: 0

NetApp storage does perform the RAID processing in software; it doesn't use dedicated RAID chips.

Non-issue ... by Lazy+Jones · 2009-09-17 21:47 · Score: 3, Interesting

Modern RAID arrays show no dramatic performance degradation while rebuilding. Also, with RAID-50/RAID-60 arrays, only a fraction of the disk accesses are slower than usual while a single drive is being replaced.

For enterprise level storage systems, this is also a non-issue because of thin provisioning.

--
"I love my job, but I hate talking to people like you" (Freddie Mercury)

I thought RAID was about spindle count by BlueParrot · 2009-09-17 21:50 · Score: 4, Insightful

I admit I'm not an expert, but I was under the impression that RAID was mainly about ensuring you a large number of spindles and some redundancy so you can serve data quickly even if a couple of drives fail while the servers are under pressure. Surely you would not rely on a RAID to avoid data loss since you should be keeping external backups anyway?

Re:I thought RAID was about spindle count by gedhrel · 2009-09-17 22:30 · Score: 4, Informative

You don't rely on RAID to avoid data loss; you rely on it as a first line in providing continuity. We run backups of large systems here, but we tend to do other things too: synchronous live mirroring between sites of the critical data. And better system design. There are some systems where, whilst we _could_ go back to tape (or VTL) at a pinch, having to do so would be a disaster in itself.
We're designing systems that permit rapid service recovery (the most live critical data) and a second tier of online recovery to get the rest back. We just can't afford the downtime.
Double-spindle failures on RAID systems are just one of those things that you _will_ see. Deciding whether a system deserves some other measure of redundancy is mostly an actuarial, rather than a technical, decision.
Re:I thought RAID was about spindle count by Sobrique · 2009-09-17 23:14 · Score: 1

Yeah, RAID is just playing statistics - you're taking a chance that during your rebuild window, you don't get a second drive outage in the same RAID set. The bigger the RAID set, the lower the chance is, but the chance is always present. Even if you go to extremes like triple mirror, remote site replicas... the chance of a compound 6 drive failure exists - it's just that the odds are so phenomenally low that at that point it's far more likely that a plane has fallen out of the sky onto your datacentre instead.
Re:I thought RAID was about spindle count by Anonymous Coward · 2009-09-18 02:27 · Score: 0

Lemme correct you, here..
You *do* rely on RAID to avoid data loss (that's the point of the "REDUNDANCY" bit...). What you *don't* rely on RAID for, however, is BACKUPS.
Re:I thought RAID was about spindle count by Abcd1234 · 2009-09-18 03:07 · Score: 1

> You don't rely on RAID to avoid data loss
Well that's absurd, of course you do. You should not rely on RAID to the exclusion of other approaches (like proper backups, off-site mirroring, etc), but the whole point of RAID is to provide a first-line defense against catastrophic data loss.
Re:I thought RAID was about spindle count by sjames · 2009-09-18 05:30 · Score: 1

You use RAID to avoid data availability loss. The backup may or may not be another RAID. RAID is often relied upon to minimize the chance of losing the data that changes between backups.
Re:I thought RAID was about spindle count by xenocide2 · 2009-09-18 06:33 · Score: 1

RAID has two points: eliminating single point of failure, and multichannel IO. The RAID design you choose for a given system depends on your desired balance of performance and reliability. To claim the "whole point of RAID is to provide a first-line defense against catastrophic data loss" ignores half of the point and the decisions that should be made when designing systems.

--
I Browse at +4 Flamebait
Open Source Sysadmin

well they are right about one thing by Anonymous Coward · 2009-09-17 22:02 · Score: 0

the rebuilding times are really astronomical. I don't know how my arrays do it, but it routinely costs me 3+ hours to rebuild

that, and the various scanning / fixing / searching tasks.. endless, if you work with the larger drives, even if sata attached

'course it's nice to have large HD's on desktop pc's, but when you have to fix the boss's PC and it takes 8 (!!) hours to clone, scan and repair.. while he can't work.. that's no good.

my 2cts

Wrong assumptions by vojtech · 2009-09-17 22:03 · Score: 5, Insightful

The article assumes that when within a RAID5 array a drive encounters a single sector failure (the most common failure scenario), an entire disk has to go offline, be replaced and rebuilt.

That is utter nonsense, of course. All that's needed is to rebuild a single affected stripe of the array to a spare disk. (You do have spares in your RAID setups, right?)

As soon as the single stripe is rebuilt, the whole array is in a fully redundant state again - although the redundancy is spread across the drive with a bad sector and the spare.

Even better, modern drives have internal sector remapping tables and when a bad sector occurs, all the array has to do is to read the other disks, calculate the sector, and WRITE it back to the FAILED drive.
The drive will remap the sector, replace it with a good one, and tada, we have a well working array again. In fact, this is exactly what Linux's MD RAID5 driver does, so it's not just a theory.
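A rough Python sketch of the read-the-others, recompute, write-back idea for a single-parity stripe follows. This is not the MD driver's code; the block contents and the `reconstruct` helper are invented for illustration, and it assumes plain XOR parity with exactly one unreadable block.

```python
def xor_bytes(a, b):
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def reconstruct(surviving_blocks):
    """Rebuild the one missing block of a single-parity stripe.

    XOR-ing every surviving block (data and parity alike) yields
    the block that could not be read.
    """
    result = bytes(len(surviving_blocks[0]))  # all-zero starting value
    for block in surviving_blocks:
        result = xor_bytes(result, block)
    return result

# Illustrative stripe: three data blocks; parity is the XOR of all three.
d0, d1, d2 = b"RAID", b"five", b"demo"
parity = reconstruct([d0, d1, d2])

# Pretend d1 hit an unreadable sector: recompute it from the rest.
recovered = reconstruct([d0, d2, parity])
print(recovered)
```

The recovered bytes are what would be written back to the same sector, at which point the drive's own remapping can swap in a spare.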

Catastrophic whole-drive failures (head crash, etc) do happen, too. And there the article would have a point - you need to rebuild the whole array. But then - these are by a couple orders of magnitude less frequent than simple data errors. So no reason to worry again.

*sigh*

Re:Wrong assumptions by Anonymous Coward · 2009-09-17 22:53 · Score: 2, Insightful

Even if only a sector in a disk has failed, I'd mark the entire disk as failed and replace it as soon as I could. Maybe I'm paranoid, but I've seen many times that when something starts to fail, it continues failing at increasing speed.
Re:Wrong assumptions by Junta · 2009-09-18 00:18 · Score: 1

Hence why it is a warning condition.
First off, people keep saying 'disks never report bad blocks until they've exceeded their bad block count'. That is just wrong, it holds true for write operations, but it *cannot* magically do that on read (if it could read the sector, it isn't an error; if it can't, then it certainly can't reconstruct the data that it is missing). There may be some more complications involved, but that describes one scenario accurately.
If the bad sector relocation count is exceeded, then the drive is failed. Maybe frequent relocations before exhausting that overhead would be a sign too, but a one-off bad block on read is not something to be overly worried about.

--
XML is like violence. If it doesn't solve the problem, use more.
Re:Wrong assumptions by Anonymous Coward · 2009-09-18 00:27 · Score: 0

Unfortunately, many RAID implementations do exactly that -- a single read/write error and the drive is marked bad and taken offline, and a rebuild is required. That can be avoided. If, assuming the article is correct, you get a second disk failure while rebuilding one out of 234 rebuilds, then you need to REDUCE the frequency of rebuilds. If you have 1 rebuild a year, that means you're good for 200 years. I'm OK with that.
But still, by using intelligent remapping with a single error rather than taking the whole drive offline and requiring a rebuild, you can drastically reduce the number of rebuilds.
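The arithmetic behind that can be sketched in a few lines of Python, assuming (as a simplification) that every rebuild fails independently with the article's 1-in-234 chance; `p_any_rebuild_fails` is a hypothetical helper name.

```python
def p_any_rebuild_fails(per_rebuild_failure, rebuilds):
    """Chance that at least one of `rebuilds` independent rebuilds fails."""
    return 1 - (1 - per_rebuild_failure) ** rebuilds

P_FAIL = 1 / 234  # per-rebuild double-failure odds quoted from the article

# Mean number of rebuilds until the first failure (geometric distribution).
expected_rebuilds = 1 / P_FAIL

# Cumulative risk after ten rebuilds, e.g. one per year for a decade.
ten_rebuilds = p_any_rebuild_fails(P_FAIL, 10)

print(f"{expected_rebuilds:.0f} rebuilds on average, {ten_rebuilds:.1%} risk over ten")
```

Under that assumption the average wait is 234 rebuilds, and ten rebuilds carry roughly a 4% chance of at least one double failure, so cutting the rebuild count is what moves the number.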
Other technologies, such as unRAID, make failure much less of a problem -- with striped RAID you lose the whole array and have to try expensive RAID recovery services. With non-striped RAID, such as unRAID, each drive in the array is a standard filesystem, and has entire files, and can be simply put in another system and the data copied off of it.
Re:Wrong assumptions by Anonymous Coward · 2009-09-18 00:33 · Score: 0

> Even if only a sector in a disk has failed, I'd mark the entire disk as failed and replace it as soon as I could. Maybe I'm paranoid, but I've seen many times that when something starts to fail, it continues failing at increasing speed.
You're paranoid. Even if that failed sector results in the cascade failure you describe, which is unlikely, it's on a fucking raid. When it completely fails you're still going to have access to all your data, you can wait until it does fail, replace it, and lose nothing. On the other hand, if you replace it because of one sector failing, it could be a perfectly good drive. Hell, if you buy a high density drive, I challenge you to go through the step of simply installing your OS without seeing failed sectors. Today's drives internally do the work of marking the sector bad and skipping it, and failed sectors are part of the normal operating mode, not something that is considered unusual.
Re:Wrong assumptions by atamido · 2009-09-18 01:21 · Score: 1

> Unfortunately, many RAID implementations do exactly that -- a single read/write error and the drive is marked bad and taken offline, and a rebuild is required.
That's pretty much been my experience with the multiple RAID controller cards we've used. There are simpler/better ways to handle the error, but they seem to be ignored. Especially with the size of drives, a single error that has nothing to do with the drive's long term reliability marks the drive as bad. Sure you can pull the drive and pop it back in, but that requires a full rebuild. A controller card should make the drive read/write to that area a few times to get the drive to mark it bad and remap the sector, then just write the stripe back in place. (And log/email a notification of the error.)
This is basically what Xiotech does, and they make a bundle because it is all seamless and they never have to replace any drives in their storage boxes.
Re:Wrong assumptions by initialE · 2009-09-18 04:41 · Score: 1

Between having a spare and a raid-6, why not choose the raid-6? The system is in fully redundant mode even while recovering from an error.

--
Starbucks, Harbuckle of Breath.
Re:Wrong assumptions by rdebath · 2009-09-18 08:12 · Score: 1

I wouldn't go with never, but yes it is pretty rare.
You see a modern hard drive has two levels of error correction. There's the 'on the fly' correction, it takes very little CPU and happens fast. Then there's the 'advanced' mode, not only does it use the ECC to its full extent but it uses the block CRCs so it can check guesses and lots of other techniques like merging multiple passes.
So yes there's a pretty good chance that the hard drive will be able to read a sector even though it's bad enough that it won't use it again.

If you want smaller drives... by asdf7890 · 2009-09-17 22:05 · Score: 4, Interesting

If you want smaller drives to speed up rebuild times then, erm, buy smaller drives? You can get ~70GB 10Krpm and 15Krpm drives fairly readily - much smaller than the 500-to-2000-GB monsters and faster too. You can still buy ~80GB PATA drives too, I've seen them when shopping for larger models, though you only save a couple of peanuts compared to the cost of 250+GB units.

If you can't afford those but still don't want 500+GB drives because they take too long to rebuild if the array is compromised and needs a rebuild, and management won't let you buy bog standard 160GB (or smaller) drives as they only cost 20% less than 750GB units without the speed benefits of the high cost 15Krpm ones, how about using software RAID and only using the first part of the drive? Easily done with Linux's software RAID (partition the drives with a single 100GB (for example) partition, and RAID that instead of the full drive) and I'm sure just as easy with other OSs. You'll get speed bonuses too: you'll be using the fastest part of the drive in terms of bulk transfer speed (most spinning drives are arranged such that the earlier tracks have higher data density) and you'll have lower latency on average as the heads will never need to move the full diameter of the platter. And you've got the rest of the drive space to expand onto if needed later. Or maybe you could hide your porn stash there.

ZFS, Anyone? by Tomsk70 · 2009-09-17 22:09 · Score: 2, Interesting

I've managed to get this going, using the excellent FreeNAS - although proceed with caution, as only the beta build supports it, and I've already had serious (all data lost) crashes twice.

However the principle is sound, and I'm sure this will become standard before long - the only trouble being that HP, Dell and the like can't simply offer upgrades for existing RAID cards - due to the nature of ZFS, it needs a 'proper' CPU and a gig or two of RAM. Even so, it does protect against many of the problems now besetting RAID (which was never meant to handle modern, gargantuan disk sizes).

Re:ZFS, Anyone? by c6gunner · 2009-09-17 23:21 · Score: 1

> I've managed to get this going, using the excellent FreeNAS - although proceed with caution, as only the beta build supports it, and I've already had serious (all data lost) crashes twice.
That's horrible!!! Even when I was running ZFS under FUSE on Ubuntu, it didn't take out any of my data. I did that for well over a year, and felt pretty nervous about it, but never had an issue. You need to ditch FreeNAS ASAP and get your server on OpenSolaris.
Re:ZFS, Anyone? by Cheeze · 2009-09-18 00:06 · Score: 1

I ran ZFS/FUSE on Ubuntu 64-bit for about 3 months. Aside from some performance issues, it worked great up until about 20-30 reading and writing threads, when it crashed. It was easy enough to restart the file system, but I also had to restart the 15 VMs I had running on it. It would crash predictably though, so that's something.
ZFS under FreeBSD or Solaris is so much nicer. The performance even on the same hardware is many times better in straight reading and writing throughput.

--
Why read the article when I can just make up a snap judgement?
Re:ZFS, Anyone? by Tomsk70 · 2009-09-18 01:04 · Score: 1

Well I got it up and running (I do like FreeNAS, it allows me to set things up without having to delve too deeply into Unix/Linux). However, once I tried to copy a large (>5TB) amount of data to it, not only did it crash and burn, but it wouldn't boot while the drives were still in place. Removing the SATA cables to the drives allowed it to boot, but as a result any chance of recovery was gone - even rebuilding FreeNAS and recreating the drive pool didn't change anything.
This left me much more wary of using FreeNAS for ZFS/iSCSI stuff - but I can't complain, as FreeNAS 0.7 is still in beta, so I guess that's the price for being bleeding edge :-)
I did consider using a heavier-duty Unix OS, but I'm a Windows engineer (the home setup is a 2003 domain) and spend more than enough time already fixing those, so I'm not mad keen on learning a new OS just for one purpose. Plus, I found that the alternative file systems to NTFS (UFS, etc.) are all very well, but I quickly ran into trouble in all sorts of ways. One silly example is the new Win7 libraries: unless the required drive/dir is indexed, you can't add it (and the libraries, while just another implementation of the same thing, are very handy). Add to that FreeNAS keeping my domain account's password in a plain text file (eeek), and it's sort of convinced me to wait until there's a decent offering for 2008 R2....
Yes, yes, I know it will be in time for 2020 :-)
Re:ZFS, Anyone? by Cato · 2009-09-18 01:54 · Score: 1

Sounds like ZFS is very bleeding edge at the moment - a crash leaving your disks unrecoverable is worse than ext4, for example, which is a newer filesystem.
Re:ZFS, Anyone? by c6gunner · 2009-09-18 04:08 · Score: 1

I did consider using a heavier-duty Unix OS, but I'm a Windows engineer (the home setup is a 2003 domain) and spend more than enough time already fixing those, so I'm not mad keen on learning a new OS just for one purpose.
Trust me - I'm no Unix expert either. If you can install and run Ubuntu Linux, you can do the same with OpenSolaris.
The only problem I had with making the switch was hardware incompatibility - one of my SATA controller cards wasn't supported, and my on-board NIC would crap out every half hour or so for about a minute. To correct those issues, I bought a new controller card ($40) and compiled a different version of the network drivers. Everything else was easy. Give it a try - you can at least download the live-cd. That way you can get a feel for it and see if it'll support your hardware, without having to install anything.
Re:ZFS, Anyone? by jimicus · 2009-09-18 05:51 · Score: 1

only the beta build supports it, and I've already had serious (all data lost) crashes twice.
Get real. There's no earthly way I'm putting anything important onto a system for which a caveat like that exists, and I won't until that caveat's been gone for at least 18 months - 2 years.
Re:ZFS, Anyone? by Tomsk70 · 2009-09-20 21:59 · Score: 1

Get real?
I was illustrating that this is a new technology that will fix a lot of the issues that have grown around RAID, not trying to sell new s/w.
Do let me know where in my article I was giving the impression that this was a sales pitch... ...although now you've pointed that out, I've realised that Firefox is less than two years old, and I've now hit the 100+ user mark for unprotected password lists. So I guess until they fix that for default installs, I'd better not use FF for at least 18 months :-)
Re:ZFS, Anyone? by Tomsk70 · 2009-09-20 22:00 · Score: 1

I like the sound of that - you've convinced me, I'll give it a whirl!

Fountain codes? by andrewagill · 2009-09-17 22:11 · Score: 3, Interesting

What about fountain codes? The coding there is capable of recovering from a greater variety of faults.

Re:Fountain codes? by Anonymous Coward · 2009-09-17 22:44 · Score: 1, Interesting

Why fountain codes? Any other erasure code (http://en.wikipedia.org/wiki/Erasure_codes) will do the job. Parity and Reed-Solomon codes used in RAID are in fact erasure codes.
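To make the "parity is an erasure code" point concrete, here is a minimal Python sketch. The three toy "disk" blocks and the helper name xor_blocks are made up for illustration and are not taken from any real RAID implementation. It only shows the single-parity case: if you know which block is missing, XORing the survivors with the parity block gives it back.

```python
from functools import reduce
from operator import xor

def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(xor, column) for column in zip(*blocks))

# Three hypothetical data "disks", each holding one 8-byte block.
data = [b"AAAAAAAA", b"BBBBBBBB", b"CCCCCCCC"]
parity = xor_blocks(data)  # what a single parity disk would hold

# Pretend disk 1 is lost (an erasure: we know which one is gone).
survivors = [data[0], data[2], parity]
recovered = xor_blocks(survivors)
print(recovered == data[1])
```

Surviving two simultaneous erasures needs a second, independent syndrome, which is where Reed-Solomon style codes come in.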
Re:Fountain codes? by kasperd · 2009-09-18 01:51 · Score: 1

What about fountain codes?
The properties are quite similar to the Reed-Solomon codes used for RAID-5 and RAID-6. The main difference, according to your link, is a tradeoff between storage efficiency and CPU usage. For Reed-Solomon, adding parity disks causes CPU usage to grow quadratically, but the disk space used remains optimal. Fountain codes on the other hand will use more disk space, but CPU usage grows only linearly. With the one parity disk used for RAID-5, and the two typically used for RAID-6, the CPU usage is not a problem.

A problem with fountain codes is that you don't know beforehand how many disks you are going to need for recovery. You'll absolutely need a safety margin there, which in itself is going to add more CPU time than you saved from using fountain codes to begin with. You'll probably need arrays with 10s of parity disks before it starts paying off. And at that point the time needed to read and write that many different disks every time you update a sector is going to be a performance problem.

Instead of just doing parity between the disks, you could do parity that goes both across disks and within disks. Imagine a slice of say 16 sectors, and eight disks, then instead of 6+2 disks, you could do 104+24 sectors with fountain codes and spread them across the disks. If you lose one complete disk and another one has a bad sector, you have lost just 17 sectors, and the remaining 7 redundant sectors give you a high chance of recovery. And you get slightly better storage efficiency than with RAID-6. However, single sector updates will still be expensive.
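The arithmetic behind that 104+24 layout can be checked in a few lines of Python. The constant names are only for illustration; the numbers are the ones from the paragraph above.

```python
DISKS = 8
SECTORS_PER_DISK = 16                 # sectors each disk contributes to one slice
TOTAL = DISKS * SECTORS_PER_DISK      # sectors in the whole slice

DATA, REDUNDANT = 104, 24             # the 104+24 split
assert DATA + REDUNDANT == TOTAL

raid6_efficiency = 6 / 8              # 6 data disks out of 8
slice_efficiency = DATA / TOTAL       # 104 data sectors out of 128

lost = SECTORS_PER_DISK + 1           # one dead disk plus one bad sector elsewhere
margin = REDUNDANT - lost             # redundant sectors still in hand

print(raid6_efficiency, slice_efficiency, lost, margin)
```

So the mixed layout stores 81.25% data against RAID-6's 75%, and the failure described still leaves 7 redundant sectors spare.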

In terms of performance and safety, I think the standard RAID-6 Reed-Solomon codes sound better than fountain codes.

Fountain codes sound much more appropriate for communication over a very lossy link. In that case a 10% chance of needing more data than expected doesn't mean data loss, it just means a 10% chance of the sender needing to compute for a bit longer to keep sending. That's a very reasonable tradeoff for getting the performance benefit from fountain codes.

--

Do you care about the security of your wireless mouse?
Re:Fountain codes? by andrewagill · 2009-09-18 02:42 · Score: 1

I was thinking of the within-disk concept you mentioned when I posted that. (My first encounter with fountain codes was trying to come up with a way to recover data from CDs that would eventually degrade)

I'm not sure how Reed Solomon compares, since I was mainly thinking of the simple parity encoding of earlier RAIDs.
Re:Fountain codes? by jabuzz · 2009-09-21 00:36 · Score: 1

Given that I can buy a storage controller that is able to do RAID 6 reads and writes with *NO* performance penalty what is this rubbish about CPU usage for? That most storage vendors are lame and don't do such controllers at the moment is not really relevant.

This video would disagree... by Anonymous Coward · 2009-09-17 22:13 · Score: 0

http://www.youtube.com/watch?v=96dWOEa4Djs

ZFS by DiSKiLLeR · 2009-09-17 22:16 · Score: 5, Informative

This is something the ZFS creators have been talking about for some time, and been actively trying to solve.

ZFS now has triple parity, as well as actively checksumming every disk block.

--
You can tell how powerful someone is by the magnitude of the crime they can commit and be able to get away with.

Re:ZFS by DiSKiLLeR · 2009-09-17 22:22 · Score: 5, Informative

I thought I should add:
ZFS speeds up rebuilding a RAID (called resilvering) over traditional non-intelligent or non-filesystem based RAIDS by only rebuilding the blocks that actually contain live data; there's no need to rebuild EVERYTHING if only half the filesystem is in use.
ZFS also starts the resilvering process by rebuilding the most IMPORTANT parts first: the filesystem metadata. It then works its way down the tree to the leaf nodes, rebuilding data. This way, if more disks fail, you have attempted to rebuild the most data possible. If filesystem metadata is hosed, everything is hosed.
ZFS tells you which files are corrupt, if any are, when insufficient replicas exist due to failed disks.
All this on top of double or triple parity. :)
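A toy model of that ordering, in Python. This is only a sketch of the idea as described above (walk the tree of live blocks from the top, so metadata is repaired before leaf data and free space is never visited); it is not ZFS's actual resilver code, and the block names and the resilver_order helper are invented for the example.

```python
from collections import deque

def resilver_order(tree, root):
    """Return live blocks in top-down (breadth-first) order.

    `tree` maps a block name to the list of blocks it points to.
    Anything not reachable from `root` is free space and is skipped.
    """
    order, queue = [], deque([root])
    while queue:
        block = queue.popleft()
        order.append(block)
        queue.extend(tree.get(block, []))
    return order

# Hypothetical pool: one root block, two directories, three files.
tree = {
    "root": ["dir_a", "dir_b"],
    "dir_a": ["file1", "file2"],
    "dir_b": ["file3"],
}
order = resilver_order(tree, "root")
print(order)
```

The metadata blocks (root, dir_a, dir_b) come out ahead of every file block, which is the property that matters if another disk dies halfway through.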

--
You can tell how powerful someone is by the magnitude of the crime they can commit and be able to get away with.
Re:ZFS by Anonymous Coward · 2009-09-17 23:52 · Score: 1, Insightful

But does it run on Linux?
I wish someone would just make a friggin kernel patch to add real ZFS support to Linux. You can't distribute pre-built Linux kernels with ZFS support due to licensing issues, BUT you could distribute a kernel patch that we can then apply to our kernels and compile ourselves and everything would be OK legally as long as you don't redistribute the patched binaries.
Re:ZFS by Anonymous Coward · 2009-09-18 00:21 · Score: 0

But when will you be able to raidz across mirrors?
Our current setup (ZFS mirror across two hw-raid external units) allows an entire unit to fail (power to rack, link or whatever) and then a disk to fail in the other unit. It keeps the ZFS data integrity and recovery aspect at the cost of the probably-not-that-likely "RAID5 write hole".
I can't see how you do that yet with a pure ZFS solution. The ability to raidz across mirrored pairs (each side being in a different external JBOD) would eliminate the "write hole" while still coping with the suggested failure mode (which it isn't too hard to imagine happening).
Re:ZFS by ari_j · 2009-09-18 01:15 · Score: 1

Oblig. Futurama: The fools! If only they'd built it with six thousand and one hulls!
Re:ZFS by Anonymous Coward · 2009-09-18 01:19 · Score: 0

On top of this is snapshots, very efficient replication, and probably most importantly for those married to statistics, a very good caching system. The caching system helps you avoid the ooooSCARYoooo issues of the outdated article by simply serving from *memory*.
The article has many problems, including the fact that the author doesn't really understand modern virtualized filesystems or how they address these issues by merging intimate knowledge of the disk state, inline checksums (ZFS), smart data recovery, and inline scrubbing. I have *never* seen a disk read error of the sort he is talking about (the 10^14 error), as those are effectively masked by the filesystem/OS and RAID (ZFS) (they both exist for this reason), good data management (snapshots, replication), and monitoring. He is essentially assuming that the bit error rate equates to disk failure, and that simply is wrong.
Articles to read:
http://storagemojo.com/2007/09/19/cerns-data-corruption-research/
http://research.microsoft.com/pubs/64599/tr-2005-166.pdf
http://www.cs.wisc.edu/adsl/Publications/latent-sigmetrics07.pdf
Re:ZFS by atamido · 2009-09-18 01:30 · Score: 1

ZFS now has triple parity, as well as actively checksumming every disk block.
You can also store multiple copies of blocks, however there is a caveat on this. From the #zfs channel on Freenode:

[2009-09-12 09:24:03] [kjetilho] in ZFS, you need to add multiple disks at a time to get redundancy
[2009-09-12 09:24:49] [kjetilho] you can specify "copies=2" which means all data will be stored twice, but you're not guaranteed the copies will be on different disks
So you can store multiple copies of blocks, but you can't guarantee the copies will be on different drives? I don't know whether to laugh or cry.
Re:ZFS by Anonymous Coward · 2009-09-18 02:01 · Score: 0

This feature was implemented for machines with one disk
Re:ZFS by Anonymous Coward · 2009-09-18 02:09 · Score: 0

Oh, yeah! And I will surely take advice on file systems from a guy named 'Disk killer'! :)
Re:ZFS by Anonymous Coward · 2009-09-18 03:17 · Score: 0

ZFS now has triple parity, as well as actively checksumming every disk block.
Screw that, we've got QUINTUPLE parity!
Re:ZFS by Anonymous Coward · 2009-09-18 03:27 · Score: 0

And since it doesn't have a FSCK, you're screwed until access time to identify such failures.
The fact of the article is that parity RAID is the dying breed. Mirroring solutions should only become better as drive densities continue to grow and drives become cheaper. The missing part of the mirroring equation is that RAID runs at a lower level and has no context for the data. If it knew some things about the filesystem, as it does with ZFS, then mirror recovery and failure mechanisms could be radically smarter than just blindly rebuilding every single sector of ever-ballooning disks.
Re:ZFS by Anonymous Coward · 2009-09-18 03:55 · Score: 0

Yes it does have fsck. It's called 'scrub', i.e.: 'zpool scrub tank'
Re:ZFS by Anonymous Coward · 2009-09-18 08:48 · Score: 0

Who cares?
Just switch to OpenSolaris.
Re:ZFS by BitZtream · 2009-09-18 09:16 · Score: 1

If there aren't enough drives to spread the blocks across distinct drives then it can't put multiple copies on different drives.
If you read the documentation, and take the IRC excerpt in context, you'll see that you simply need multiple disks to achieve the goal. You basically have to ignore his first line to take the second line to mean what you say.
Put two vdevs of equal size into a zpool, set copies=2 and you will have copies stored on 2 physical disks, and depending on your setup RAIDz or mirrored at the vdev level as well.
ZFS is predictable. It is not random. You can guarantee copies=2 will store data on different drives by simply creating it properly, with multiple vdevs, and setting copies=2 while there is sufficient space.
The caveat is: copies=2 can't store on 2 separate disks if you only have one disk.
And my comment on that is ... no shit Sherlock.

--
Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
Re:ZFS by atamido · 2009-09-18 10:01 · Score: 1

I asked to verify I understood correctly, and he indicated that it was a quickly implemented feature based on metadata copies, so basically not to expect much out of it. It's possible he was wrong or there was a miscommunication though.
Looking more at the documentation though seems to indicate that the only time this would happen is if the other drive(s) do not have the space available for the copies, in which case multiple copies will be written to the same drive.
http://blogs.sun.com/relling/entry/zfs_copies_and_data_protection
Re:ZFS by ducomputergeek · 2009-09-18 10:46 · Score: 1

I knew I ran FreeBSD over Linux for a reason.
*ducks*

--
"The problem with socialism is eventually you run out of other people's money" - Thatcher.

Re:Worked-around a Long Time Ago by Anonymous Coward · 2009-09-17 22:22 · Score: 5, Interesting

But really none of that should be necessary for the general case. Storing data in different physical locations is a good but entirely unrelated issue- the main problem of disk reliability is still very much in need of a solution. That's pretty much the point of the article: You can come up with various solutions which move the problem around, give multiple fallbacks for when something goes wrong.. but there's still the problem of things going wrong in the first place. I shouldn't need to use 12 separate disks spread across the globe just for basic reliability / redundancy

Old news by EmTeedee · 2009-09-17 22:22 · Score: 2, Interesting

Read that before on slashdot. Why RAID 5 Stops Working In 2009

Parity declustering by Biolo · 2009-09-17 22:37 · Score: 4, Interesting

Actually I like the parity declustering idea that was linked to in that article, seems to me if implemented correctly it could mitigate a large part of the issue. I have personally encountered the hard error on RAID5 rebuild issue, twice, so there definitely is a problem to be addressed...and yes, I do now only implement RAID6 as a result.

For those who haven't RTFATFALT (RTFA the f*** article links to), parity declustering, as I understand it, is where you have, say, an 8 drive array, but where each block is written to only a subset of those drives, say 4. Now, obviously you loose 25% of your storage capacity (1/4), but consider a rebuild for a failed disk. In this instance only 50% of your blocks are likely to be on your failed drive, so immediately you cut your rebuild time in half, halving your data reads, and therefore your chance of encountering a hard error. Larger numbers of disks in the array, or spanning your data over fewer drives, cuts this further.
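The 50% figure can be checked by brute force in Python. This sketch assumes stripes are spread uniformly over every possible subset of drives, which is an idealisation for the example rather than how any particular declustered layout places them; the function name is made up.

```python
from fractions import Fraction
from itertools import combinations

def fraction_touching_failed(n_drives, span, failed=0):
    """Fraction of all possible `span`-wide stripes that include drive `failed`."""
    stripes = list(combinations(range(n_drives), span))
    hit = sum(1 for stripe in stripes if failed in stripe)
    return Fraction(hit, len(stripes))

print(fraction_touching_failed(8, 4))    # the 8-drive, span-4 case above
print(fraction_touching_failed(16, 4))   # more drives, same span
print(fraction_touching_failed(8, 2))    # same drives, narrower span
```

Under that assumption the answer is simply span / drives: 1/2 for the case above, and 1/4 for either of the other two, matching the point that more disks or narrower spans cut the rebuild reads further.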

Now, consider the flexibility you could build into an implementation of this scheme. Simply by allowing the number of drives a block spans to be configurable on a per block basis, you could then allow any filesystem that is on that array to say, on a per file basis, how many disks to span over. You could then allow apps and sysadmins to say that a given file needs to have the maximum write performance, so diskSpan=2, which gives you effectively RAID10 for that file (each block is written to 2 drives, but each block in a multi-block file is likely to land on a different pair of drives, so not quite RAID10, but close). Where you didn't want a file to consume 2x its size on the storage system, you could allow a higher diskSpan number. You could also allow configurable parity on a per block basis, so particularly important files can survive multiple disk failures, while temp files could have no parity. There would need to be a rule however that parity+diskSpan is less than or equal to the number of devices in the array.

Obviously there is an issue here where the total capacity of the array is not knowable: files with diskSpan numbers lower than the default for the array will reduce the capacity, numbers higher will increase it. This alone might require new filesystems, but you could implement today's filesystems on this array as long as you disallowed the per-block diskSpan feature.

This even helps for expanding the array, as there is now no need to re-read all of the data in the array (with the resulting chance of encountering a hard error, adding huge load to the system causing a drive to fail, etc). The extra capacity is simply available. Over time you probably want a redistribution routine to move data from the existing array members to the new members to spread the load and capacity.

How about you implement a performance optimiser too, that looks for the most frequently accessed blocks and ensures they are evenly spread over the disks. If you take into account the performance of the individual disks themselves, you could allow for effectively a hierarchical filesystem, so that one array contains, say, SSD, SAS and SATA drives, and the optimiser ensures that data is allocated to individual drives based on the frequency of access of that data and the performance of the drive. Obviously the applications or sysadmin could indicate to the array which files were more performance sensitive, so influencing the eventual location of the data as it is written.

--
Stealing a rhinoceros should not be attempted lightly.

Re:Parity declustering by Shag · 2009-09-18 01:50 · Score: 1

How about you implement a performance optimiser too, that looks for the most frequently accessed blocks and ensures they are evenly spread over the disks. If you take into account the performance of the individual disks themselves, you could allow for effectively a hierarchical filesystem, so that one array contains, say, SSD, SAS and SATA drives, and the optimiser ensures that data is allocated to individual drives based on the frequency of access of that data and the performance of the drive.
Props for being the only poster to bring HSM into this - I think it's increasingly necessary. Okay, sure, we've all got N times as much data as we did 10 years ago - but really, how much of that data are we accessing regularly, and how much of it is "just lying around?" I'd even like to see a couple more layers in that hierarchical system, some kind of near-online storage would be good. 15 years ago, I worked at a place that had an optical jukebox sitting next to one bank of main Sequent servers, and that thing held probably hundreds of optical disks. These days, you could replicate the idea with 32GB SDHC cards in less space (don't know if anyone has). And of course there needs to be a way to take the truly unused stuff all the way off-line.
I'm really waiting for good home HSM. Take something suitably automated like Apple's "Time Capsule" backup, give it an optical burner and an SDHC card slot, and a way to tell my computer "hey, I'm getting full, I want to clean out the files you've touched least recently, please give me some media to shove them onto" and I will be a happy camper. (If this already exists, somebody please let me know!)

--
Village idiot in some extremely smart villages.
Re:Parity declustering by lewiscr · 2009-09-18 06:05 · Score: 1

For those that missed the point, the poster is describing ZFS.
Re:Parity declustering by Anonymous Coward · 2009-09-18 06:42 · Score: 0

Did you just reinvent ZFS before our very eyes?
Re:Parity declustering by Anonymous Coward · 2009-09-18 07:43 · Score: 0

Now, obviously you loose 25% of your storage capacity (1/4)...
You lost me at "loose". How do you loosen storage capacity? Build robot legs for the drives?
I'm not reading comments any further. It is clear the summary of the entire thread is: Idiot speaks whereof he knows not, some other idiot reposts it, yet another idiot submits it to Slashdot, another...person flips a coin and decides to approve it, people comment, some can't spell.
And someone needs a hug....
Re:Parity declustering by jabuzz · 2009-09-21 00:40 · Score: 1

Install GPFS and then throw TSM into the mix and you have just that today. Some fast 15k spindles, tiered to some slow but large 7.2K SATA spindles, tiered finally to a large tape library.
Then throw in CTDB for some clustered Samba and node resiliency.
Re:Parity declustering by Shag · 2009-09-24 21:24 · Score: 1

Sounds nice. Where can I get that in click-and-drool form for my home WLAN? :)

--
Village idiot in some extremely smart villages.

Re:Worked-around a Long Time Ago by Fred_A · 2009-09-17 22:43 · Score: 5, Funny

I shouldn't need to use 12 separate disks spread across the globe just for basic reliability / redundancy

You're trying to weasel out of paying IBM protection money !

--

May contain traces of nut.
Made from the freshest electrons.

Remembering an article earlier this week: by Chrisq · 2009-09-17 22:44 · Score: 3, Interesting

Will scalable distributed storage systems like Hadoop and Google File System take over from RAID?

RAID concept is fine, it's that HDs are too big by trims · 2009-09-17 22:45 · Score: 5, Interesting

As others have mentioned, this is something that is discussed on the ZFS mailing lists frequently.

For more info there, check out the digest for zfs-discuss@opensolaris.org

and, in particular, check out Richard Elling's blog

(Disclaimer: I work for Sun, but not in the ZFS group)

The fundamental problem here isn't the RAID concept; it's that the throughput and access times of spinning rust haven't changed much in 30 years. Fundamentally, today's hard drive is no more than 100 times as fast (both in throughput and latency) as a 1980s one, while it holds well over 1 million times more.

ZFS (and other advanced filesystems) will now do partial reconstruction of a failed drive (that is, they don't have to bit copy the entire drive, only the parts which are used), which helps. But there are still problems. ZFS's pathological case results in rebuild times of 2-3 WEEKS for a 1TB drive in a RAID-Z (similar to RAID-5). It's all due to the horribly small throughput, maximum IOPs, and latency of the hard drive.

SSDs, on the other hand, are nowhere near as much of a problem. They've got considerably more throughput than a hard drive, and, more importantly, THOUSANDS of times better IOPS. Frankly, more than any other reason, I expect the significant IOPS of the SSD to signal the death knell of HDs in the next decade. By 2020, expect HDs to be gone from everything, even in places where HDs still have better GB/$. The rebuild rates and maintenance of HDs simply can't compete with flash.

Note: IOPS = I/O Per Second, or the number of read/write operations (irregardless of size) which a disk can service. HDs top out around 350, consumer SSDs do under 10,000, and high-end SSDs can do up to 100,000.
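To see why IOPS rather than raw throughput dominates a badly fragmented rebuild, here is a rough Python sketch. The 8 KiB block size is purely an assumption for the example (it is not a figure from this post), the IOPS numbers are the ones quoted above, and the model ignores everything except one random I/O per block.

```python
def random_io_days(capacity_bytes, block_bytes, iops):
    """Days needed to touch every block once at a fixed random-I/O rate."""
    operations = capacity_bytes / block_bytes
    return operations / iops / 86400

TB = 10**12
BLOCK = 8 * 1024  # assumed block size, hypothetical

for label, iops in [("hard drive", 350),
                    ("consumer SSD", 10_000),
                    ("high-end SSD", 100_000)]:
    print(f"{label}: {random_io_days(TB, BLOCK, iops):.2f} days")
```

With these assumptions the hard drive needs about four days for 1TB of purely random small reads, while the SSD figures drop to a fraction of a day. It does not reproduce the 2-3 week pathological case, which would need smaller blocks or far fewer effective IOPS than the drive's peak.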

-Erik

--
There are always four sides to every story: your side, their side, the truth, and what really happened.

Re:RAID concept is fine, it's that HDs are too big by c6gunner · 2009-09-17 23:27 · Score: 1

But there are still problems. ZFS's pathological case results in rebuild times of 2-3 WEEKS for a 1TB drive in a RAID-Z (similar to RAID-5).
Huh? How big is your array? 2,500 Petabytes?
I've got 2 RAIDZ zpools on one server, one is 5x500GB, the other is 6x1TB. When I ran some tests with the arrays half-filled, the 500GB drives rebuilt in around 2 hours, the 1TB in around 5. That gives me a rebuild time of around 10 hours for a FULL terabyte array. That's quite a bit shorter than your 2-3 weeks.
Re:RAID concept is fine, it's that HDs are too big by Schraegstrichpunkt · 2009-09-18 00:17 · Score: 1

Do you understand the difference between a pathological case and a common case?

--
http://outcampaign.org/
Re:RAID concept is fine, it's that HDs are too big by Joce640k · 2009-09-18 00:24 · Score: 1

He said "pathological case", not "average".

--
No sig today...
Re:RAID concept is fine, it's that HDs are too big by Daniel_Staal · 2009-09-18 00:40 · Score: 1

That's probably the difference between 'worst case' and 'normal case': If they can avoid the worst case most of the time average times will be much faster. But that doesn't mean that you couldn't get yourself into the special case where it will take ages.

--
'Sensible' is a curse word.
Re:RAID concept is fine, it's that HDs are too big by SwashbucklingCowboy · 2009-09-18 01:38 · Score: 2, Insightful

"The fundamental problem here isn't the RAID concept, is that the throughput and access times of spinning rust haven't changed much in 30 years."
Uh, there's another bigger problem. The drive error rate (when reading data) hasn't changed that much either while data on a drive has dramatically increased.
When doing a rebuild when you've lost all redundancy a single read error means the rebuild will fail. Increase the size of a drive (while keeping error rates constant) and you increase the likelihood of a rebuild failure.
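A back-of-the-envelope version in Python. It assumes the commonly quoted spec of one unrecoverable read error per 10^14 bits and treats every bit as independent, both of which are simplifications for illustration; the function name is made up.

```python
import math

def p_read_error(bytes_read, errors_per_bit=1e-14):
    """Probability of at least one unrecoverable read error while reading
    `bytes_read` bytes, assuming independent errors at a fixed per-bit rate."""
    bits = bytes_read * 8
    # 1 - (1 - p)^bits, computed in a numerically stable way
    return -math.expm1(bits * math.log1p(-errors_per_bit))

TB = 10**12
for tb in (1, 2, 6):
    print(f"read {tb} TB: {p_read_error(tb * TB):.1%} chance of hitting an error")
```

Under those assumptions a 1TB read has a bit under an 8% chance of tripping over an error, and reading 6TB (say, the six surviving members of a seven-disk RAID 5 of 1TB drives) pushes it to roughly 38%.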
Re:RAID concept is fine, it's that HDs are too big by Anonymous Coward · 2009-09-18 02:23 · Score: 0

"Irregardless" is not a word, you fucking moron.
Re:RAID concept is fine, it's that HDs are too big by chance360 · 2009-09-18 06:06 · Score: 1

(regardless of size)
Re:RAID concept is fine, it's that HDs are too big by lewiscr · 2009-09-18 06:08 · Score: 2, Funny

Irregardless, I'll continue to use it.
Re:RAID concept is fine, it's that HDs are too big by mindstrm · 2009-09-18 07:05 · Score: 1

Wait - SSDs can't fail?

Wrong title. Or dramatization again? by dostick · 2009-09-17 23:00 · Score: 1

Article should be titled "Parity - based RAID days are numbered". There's nothing wrong with RAID 1,0, 10

Re:Wrong title. Or dramatization again? by defireman · 2009-09-17 23:21 · Score: 2, Informative

RAID 0 does not offer any redundancy. Just a performance increase from reading simultaneously from 2 drives.
Re:Wrong title. Or dramatization again? by arndawg · 2009-09-18 00:42 · Score: 1

Everyone knows that. So i think that's a pretty redundant comment i think that comment is.
Re:Wrong title. Or dramatization again? by Hatta · 2009-09-18 02:26 · Score: 1

If you simply mirror drives, and the expected amount of data read before you encounter an error is a significant proportion of the size of the drives, then you may not be able to rebuild your mirror without encountering an error.

--
Give me Classic Slashdot or give me death!
Re:Wrong title. Or dramatization again? by EmagGeek · 2009-09-18 02:39 · Score: 1

No...
It should read:
The days, of implementing RAID with hardware and software that have none of the design elements necessary for implementing RAID properly, are numbered.

1 error = 1TB rebuild by valentyn · 2009-09-17 23:04 · Score: 1

The real problem with "classic" RAID is that 1 single error means a total rebuild of the array.

--
my other sig is a 500 page novel

Look the solution is obvious by jayhawk88 · 2009-09-17 23:08 · Score: 5, Funny

The cloud. Just cloud it, baby. Nothing bad ever happens in the cloud; they're so white and fluffy after all.

Re:Look the solution is obvious by Anonymous Coward · 2009-09-18 02:48 · Score: 0

The cloud. Just cloud it, baby. Nothing bad ever happens in the cloud; they're so white and fluffy after all.
Dude, where do you think lightning comes from? And lightning isn't good for dogs. I mean hard drives.

doesn't raid 10 solve this? by davros-too · 2009-09-17 23:13 · Score: 2, Interesting

Um, don't schemes like raid 1+0 solve the parity rebuild problem? Even in the worst case of full disk loss, only one disk needs to be rebuilt and even for a large disk that doesn't take very long. Am I missing something?

--
In theory, there's no difference between theory and practice; in practice there is.

Re:doesn't raid 10 solve this? by confused+one · 2009-09-18 00:44 · Score: 1

losing two disks, one on each mirror.
Re:doesn't raid 10 solve this? by Anonymous Coward · 2009-09-18 03:27 · Score: 0

In every scheme, a full disk loss requires one disk to be rebuilt. The problem is that for large disk that takes very long. (1 TB @ 60 MB/s takes 4.6h to read or write).
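The 4.6h figure is just size divided by sustained rate. A tiny Python check, assuming decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and a helper name made up for the example:

```python
def sequential_hours(size_bytes, mb_per_s):
    """Hours to read or write `size_bytes` at a sustained rate in MB/s."""
    return size_bytes / (mb_per_s * 1e6) / 3600

print(round(sequential_hours(1e12, 60), 1))  # 4.6
```

That is the best case: it assumes the disk does nothing else for the whole rebuild.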
Re:doesn't raid 10 solve this? by lewiscr · 2009-09-18 06:19 · Score: 1

RAID5 is not redundant during a single drive failure. Once the failed drive is rebuilt, it is singly redundant again.
RAID6 attempts to keep a fully redundant array even during a single drive failure. Once the failed drive is rebuilt, it is doubly redundant (redundantly redundant?). 2 drive failures make a RAID6 look the same as a RAID5 with 1 drive failure.
RAID1+0 is partially redundant during a single drive failure. The downside is that it focuses the stress of rebuilding the failed drive on a single drive (its mirror). If the mirrors are not from the same manufacturing lot, the probability of a 2nd failure is fairly well known. If the failed drive and its mirror come from the same manufacturing lot, the risk of a 2nd failure (during rebuild) increases considerably.
I personally run RAID1+0, with the RAID1 part being a two way mirror. I'm debating making the RAID1 portion a 3-way mirror. Just to assuage my paranoia.
Re:doesn't raid 10 solve this? by kaaona · 2009-09-18 06:54 · Score: 1

I think you have it exactly right. RAID 1 pairs can be chained as 1+0 strings to achieve truly enormous capacities without the exponential rebuild times of RAID 5 arrays. The rebuild / mirroring times for RAID 1+0 are fast and finite, and in most systems this can be done both on-line and on-the-fly. More significantly, with RAID 1+0 there's essentially no parity computation write penalty, making it a good choice for high-volume transaction systems.
The only downsides are the physical volume, electrical power, and cooling required by larger numbers of hard drives. The latest terabyte-class SATA drives have shrunk those numbers some, but not to the degrees seen in memory density and CPU line geometries.
Re:doesn't raid 10 solve this? by lewiscr · 2009-09-18 08:22 · Score: 1

I forgot to address the final point:

only one disk needs to be rebuilt and even for a large disk that doesn't take very long.
That depends. I can clone a 120GB SATA drive in about 2 hours, if I dedicate all the IO to the clone operation. Admittedly, this is on my laptop (which peaks around 30MB/s), but it's also a dedicated sequential read from the source and write to the destination. If I try to use both disks while the cloning is occurring, the clone is slowed down significantly. I would expect that operation to drop to well below 30MB/s.
On top of that, a lot of RAID cards/implementations will cap the amount of I/O that can be consumed by a rebuild. My Linux software RAID1 caps the transfer rate (I *think* around 10MB/s, but I'm kind of guessing. The trivial test I tried didn't run long enough to hit the cap, and I don't want to rebuild the big volume just for the fun of it).
All these things add up to a much longer rebuild time than a back of the envelope calculation would indicate.
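
Rather than guessing at the cap, Linux md exposes its resync limits as two files under /proc/sys/dev/raid, with values in KB/s. A small sketch for reading them; the function name and the `base` parameter are hypothetical and exist only so the reader can be pointed at a test directory:

```python
from pathlib import Path


def read_md_speed_limits(base="/proc/sys/dev/raid"):
    """Return md resync limits in KB/s, or None if the files are absent."""
    limits = {}
    for name in ("speed_limit_min", "speed_limit_max"):
        path = Path(base) / name
        if not path.is_file():
            return None  # not Linux, or the md driver is not loaded
        limits[name] = int(path.read_text().strip())
    return limits


if __name__ == "__main__":
    print(read_md_speed_limits())
```

On a machine without md loaded this just prints None, so it is safe to run anywhere.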
Re:doesn't raid 10 solve this? by davros-too · 2009-09-18 11:44 · Score: 1

You're right, rebuild times are not short. However, I think they're still a lot shorter than failure times. I think rebuild times are probably a big advantage of the better hardware raid controllers. Rebuild IOs are 'local' to the card. I recently rebuilt a 300G drive while the mirror continued with an online and fairly active database, with no noticeable performance loss during the rebuild.

--
In theory, there's no difference between theory and practice; in practice there is.

Tahoe-LAFS by Anonymous Coward · 2009-09-17 23:16 · Score: 0

The RAID concept can be extended to multiple PCs forming a storage grid. One open-source implementation is Tahoe LAFS.

Re:Worked-around a Long Time Ago by plover · 2009-09-17 23:42 · Score: 4, Interesting

Actually, storing data in a multiple data center / high availability environment is a completely related issue. The summary above talks of "entirely different paradigms." Cloud storage would be multiple data center based, which is entirely different from keeping the only copy on your local drives. In this concept, your machine would have enough OS to boot, and enough hard drive space to download the current version of whatever software you are leasing. Your personal info would always be maintained in the data centers, and only mirrored locally. Have a home failure? Drop in a new part or even a new PC, (possibly with an entirely different operating system, such as Chrome,) connect to the service, and you're 100% back.

It's no longer a novel concept for the home market. Consider Google Docs. It's not even being sold as "safer than RAID", it's being touted as "get it from anywhere" or "share with your friends". Safer than RAID is just a bonus.

So are we ready to move all our personal information to clouds? I certainly am not, but Google Docs are wildly popular and a lot of people are. I long ago learned that I can't look to myself to judge what the mainstream attitudes are in many things.

--
John

RAID 4 has a dedicated parity drive, not 5 by Targon · 2009-09-17 23:43 · Score: 4, Interesting

RAID 4 is where you have one dedicated parity drive. RAID 5 solves this by spreading the parity information for each drive to all the other drives in the array. RAID 6 adds a second parity block for increased reliability, but as a result of the increased write for that extra parity block, it slows down write speeds.
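
The difference between the dedicated parity drive of RAID 4 and the spread-out parity of RAID 5 can be sketched as a layout function. This is an illustration only: real implementations offer several rotation orders, and the one below (parity moving one drive to the left per stripe) is an assumption, as is the function name.

```python
def parity_drive(stripe: int, n_drives: int, level: int) -> int:
    """Index of the drive that holds parity for a given stripe."""
    if level == 4:
        return n_drives - 1  # fixed, dedicated parity drive
    if level == 5:
        # rotate parity one drive to the left on each successive stripe
        return (n_drives - 1 - stripe) % n_drives
    raise ValueError("only RAID 4 and RAID 5 are sketched here")


for level in (4, 5):
    print(level, [parity_drive(s, 4, level) for s in range(6)])
# 4 [3, 3, 3, 3, 3, 3]
# 5 [3, 2, 1, 0, 3, 2]
```

With RAID 4 every write touches drive 3; with RAID 5 the parity writes are shared among all four drives.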

The real key to making RAID 4, 5, or 6 work is that you really need 4-6 drives in the array to take advantage of the design. I wouldn't say that it will fall out of favor though, because having solid protection from a single drive going bad really is critical for many businesses. Backups are all well and good for if your system crashes, but for most businesses, uptimes are more critical yet. So, backups for data so corruption problems can be rolled back, and RAID 5,6,10 for stability and to avoid having the entire system die if one drive goes bad. What takes more time, doing a data restore from a backup for when an individual application has problems, or having to restore the entire system from a backup, with the potential that the backup itself was corrupted?

With that said, web farms and other applications that can get away with just using a cluster approach instead of a single well-designed machine (or set of machines) have become popular, but there are many situations which make a system with one or more RAID arrays a better choice. The focus on RAID 0 and 1 for SMALL systems and residential setups has simply kept many people from realizing how useful a 4-drive RAID 5 setup would be.

Then again, most people go to a backup when they screw up their system, not because of a hard drive failure. With techs upgrading hardware before they run into a hard drive failure, the need for RAID 1, 4, 5, and 6 has dropped.

I will say this: since a RAID 5 array can rebuild on the fly (since it keeps working even if one drive fails), the rebuild time itself does not significantly impact system availability. Gone are the days when a rebuild has to be done while the system is down.

Crystals? by Hitman_Frost · 2009-09-17 23:44 · Score: 1

Sorry if this is a bit offtopic, guys.

I noticed the crystals tag on the story, which reminded me of the old Star Trek episodes where someone would open a case of storage crystals, select one, and then access some tremendously huge amount of data on a local terminal using it.

The thought that popped into my head the other day was this - how do they always seem to know what crystal to select? There would often be 20 - 25 in a case, and they were all unlabelled!

I had an amusing image of some kind of 100 petabyte crystal technology marred by users sticking labels on the sides of them with "movie collection" scrawled in biro!

Re:Crystals? by DamnStupidElf · 2009-09-18 06:19 · Score: 1

The obvious answer is that it was a RAID1 of crystals with a whole lot of redundancy.

Distributed storage ... by Itkovian · 2009-09-17 23:46 · Score: 1

*cough* http://www.b-virtual.org/display/WWWBVIRT/Distributed+Storage+System *cough*

Seems like an adequate solution to the problem, at least according to the talk I heard last week. BTW, I am not associated with B-virtual.

--
I am the Shield Anvil. And I am not yet done.

To summarize the summary, by Anonymous Coward · 2009-09-17 23:54 · Score: 0

In my opinion, RAID-6 is a reliability Band Aid for RAID-5, and going from one parity drive to two is simply delaying the inevitable. The bottom line is this: Disk density has increased far more than performance and hard error rates haven't changed much, creating much greater RAID rebuild times and a much higher risk of data loss. In short, it's a scenario that will eventually require a solution, if not a whole new way of storing and protecting data.

So, in summary, current RAID designs have problems with large drives. This basically means that you'll encounter issues. The simplest way of saying this is that it fails. A quick recap: RAID has problems.

RAID6 with enterprise hardware is reliable by niola · 2009-09-17 23:56 · Score: 2, Interesting

I use RAID6 for several high-volume machines at work. Having double parity plus a hot spare means rebuild time is no worry.

But if you are not a fan you can always throw something together with ZFS's RAIDZ or RAIDZ2 which is also distributed parity but the ZFS filesystem checksums and keeps multiple (distributed) copies of every block to detect and fix data corruption before it becomes a bigger problem.

People using ZFS have been able to detect silent data corruption from a faulty power supply that other solutions would never have found just because of the checksumming process.
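
The detect-and-repair idea can be sketched in a few lines. This is not ZFS code: SHA-256 from `hashlib` stands in for whatever checksum the filesystem actually uses, and `read_with_repair` is a hypothetical helper working on in-memory copies.

```python
import hashlib


def checksum(block: bytes) -> str:
    return hashlib.sha256(block).hexdigest()


def read_with_repair(copies: list, expected: str) -> bytes:
    """Return the first copy matching the stored checksum; fix the others in place."""
    good = next((c for c in copies if checksum(c) == expected), None)
    if good is None:
        raise IOError("all copies fail the checksum")
    for i, c in enumerate(copies):
        if c != good:
            copies[i] = good  # overwrite the silently corrupted copy
    return good


data = b"payroll records"
stored = checksum(data)
copies = [b"payroll recorcs", data]  # first copy has one corrupted byte
assert read_with_repair(copies, stored) == data
print(copies[0] == data)  # True
```

A plain mirror without checksums would have no way to tell which of the two copies is the bad one.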

Raid does not equal a valid backup by Anonymous Coward · 2009-09-18 00:14 · Score: 0

No raid based solution equals having a backup... period.

If you're worried about down time, build in redundancy as was mentioned several times above...

We run raid5 and raid 10 on various systems for their data, and backup to multiple destinations (tape and hard drive)... we can afford some downtime if things fail...

Not sure I see a real problem...

I'm not sure I get it by Joce640k · 2009-09-18 00:27 · Score: 2, Interesting

Is he saying that you can never read a whole hard disk because it will fail before you get to the end?

That's what it seems like he's saying but my hard disks usually last for years of continuous use so I'm not sure it's true.

--
No sig today...

Re:I'm not sure I get it by SwashbucklingCowboy · 2009-09-18 01:25 · Score: 1

Yes, that's pretty much it. BUT... It's not an entire drive failing that causes the problem, it's a read of a single sector of a drive. Here's a pretty good explanation: http://blogs.zdnet.com/storage/?p=162
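
The single-sector argument is a probability estimate. A rough sketch, assuming an unrecoverable read error rate of 1 per 10^14 bits (a commonly quoted consumer SATA spec, not a number from this thread), independent bit errors, and a 12 TB full read as the example size:

```python
import math


def p_read_error(bytes_read: float, ure_per_bit: float = 1e-14) -> float:
    """Probability of at least one unrecoverable read error over a full read."""
    bits = bytes_read * 8
    # 1 - (1 - p)^bits, computed stably for very small p
    return -math.expm1(bits * math.log1p(-ure_per_bit))


print(f"{p_read_error(12e12):.2f}")  # roughly 0.6 under these assumptions
```

Whether real drives actually fail at their spec-sheet rate is a separate question; the sketch only reproduces the arithmetic behind the claim.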
Re:I'm not sure I get it by Anonymous Coward · 2009-09-18 03:10 · Score: 0

my hard disks usually last for years of continuous use
Yeah, who needs this RAID stuff. Just buy a new disk every couple of years and copy the data to the new disk. This way you will also have backups on the old disks. Why didn't anyone think of this before.
Re:I'm not sure I get it by Joce640k · 2009-09-18 04:23 · Score: 1

The article says "So the read fails. And when that happens, you are one unhappy camper. The message 'we can't read this RAID volume' travels up the chain of command until an error message is presented on the screen. 12 TB of your carefully protected - you thought! - data is gone. Oh, you didn't back it up to tape? Bummer!"
Ummm...no.
One bad sector isn't the end of the world. Even on RAID5 it only damages one file, not the entire 12TB. You can restore the file from backup and rewrite the sector so that the drive's controller will remap it.
With RAID6 you'd need two drives to have the exact same bad sector - probably less likely than a meteorite hitting them.

--
No sig today...

Re:Worked-around a Long Time Ago by vrmlguy · 2009-09-18 00:32 · Score: 1

Actually, storing data in a multiple data center / high availability environment is a completely related issue. The summary above talks of "entirely different paradigms." Cloud storage would be multiple data center based, which is entirely different from keeping the only copy on your local drives. In this concept, your machine would have enough OS to boot, and enough hard drive space to download the current version of whatever software you are leasing. Your personal info would always be maintained in the data centers, and only mirrored locally. Have a home failure? Drop in a new part or even a new PC, (possibly with an entirely different operating system, such as Chrome,) connect to the service, and you're 100% back.

Unfortunately, that's just moving the issue to the cloud, which (since it is storing great gobs of data) is likely to be using the highest capacity drives available in some sort of RAID configuration.

--
Nothing for 6-digit uids?

Hey there, Mr. Obvious... by Anonymous Coward · 2009-09-18 00:36 · Score: 0

Long time listener, first time caller...

How is this news? Mirrored raids cut possible storage space in half for something a simple backup would take care of in most instances. The data access increase provided by striped raid is negligible at best, and if you lose that raid, you lose everything.

The only possible advantage to a raid is *if* you have a mirrored raid and *if* you're working with a major business server that you can't afford to be down for any length of time. Of course, you still have to bring it down to swap the drives and rebuild the array. So what's the point?

Re:Hey there, Mr. Obvious... by Anonymous Coward · 2009-09-18 16:45 · Score: 0

The only possible advantage to a raid is *if* you have a mirrored raid and *if* you're working with a major business server that you can't afford to be down for any length of time. Of course, you still have to bring it down to swap the drives and rebuild the array. So what's the point?
Ever heard of hot-swap?

Just make enough copies by Anonymous Coward · 2009-09-18 00:36 · Score: 0

IIRC, Google doesn't use RAID for their data. They "just" ensure there is always a specified number of distributed copies available. If one copy becomes unavailable, make another copy.

An Idea... by Kookus · 2009-09-18 00:45 · Score: 1

It seems that time is of the essence when doing a rebuild, so why not consume disks faster but decrease the time it takes to do a rebuild? So in a raid setup with a hot spare, come up with an algorithm to basically write to the spare a distributed set of the data that is being written to the raid. The algorithm part, I guess, can be better explained as a daily/weekly backup. So if I'm writing an incremented number every day to the same spot on a disk, maybe the hot spare only has the data written to it on Fridays the next time it is updated. My guess is that data is usually changed infrequently, and more or less you only add data. So the hot spare should have pretty close to 1/2 the correct data on it upon a disk failure. Then you only have to rebuild the other 1/2 or so. So my guess would be the hot spare would have a fraction of the writes (closer to 1/1 than 1/2) of a normal disk, and none of the reads until it's ready to go into service. 1/2 the rebuild time or so. Maybe it's worth it?

Re:Worked-around a Long Time Ago by Anonymous Coward · 2009-09-18 00:46 · Score: 0

Nice database you got there. Shame if anything happened to it.

Re:Worked-around a Long Time Ago by kickedfortrolling · 2009-09-18 00:48 · Score: 4, Funny

Don't discourage the boy. Weaseling out of things is important to learn. It's what separates us from the animals

--
--AlexC
Just because I dont agree with climate change doesnt make me a troll

Re:Worked-around a Long Time Ago by 2obvious4u · 2009-09-18 00:50 · Score: 2, Insightful

And then like AOL, Google goes out of business (shocker I know) and all your data is lost forever. The cloud is good for a lot of stuff, but for data storage it should be part of the solution, not 100% of it.

duplicate? by schwit1 · 2009-09-18 01:00 · Score: 1

http://hardware.slashdot.org/article.pl?sid=08/10/21/2126252

Re:Worked-around a Long Time Ago by yahwotqa · 2009-09-18 01:02 · Score: 5, Funny

Not from weasels, though...

Many a job has been lost... by Caduceus1 · 2009-09-18 01:02 · Score: 1

...because someone thought that since they had RAID, they didn't need to back up the data...

What gets me, especially in the Linux world, is the difficulty or sometimes the impossibility of monitoring the arrays for their state. We've had several controllers where we only found out about bad disks on physical inspection. This limits the controllers we use, and thus we might be using a lesser-performing controller only because we can monitor it...
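
For Linux software RAID (md) at least, array state is readable from /proc/mdstat; hardware controllers like the ones described here generally need the vendor's own tool instead. A minimal sketch of spotting a degraded md array, where the sample text is hand-written for illustration and not a real log:

```python
import re


def degraded_md_arrays(mdstat_text: str) -> list:
    """Names of md arrays whose status map shows a missing member ('_')."""
    degraded, current = [], None
    for line in mdstat_text.splitlines():
        m = re.match(r"^(md\S+)\s*:", line)
        if m:
            current = m.group(1)  # remember which array the next lines describe
            continue
        status = re.search(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]", line)
        if status and current and "_" in status.group(3):
            degraded.append(current)
    return degraded


sample = """Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      976630336 blocks super 1.2 [2/2] [UU]

md1 : active raid1 sdc1[0]
      488254464 blocks super 1.2 [2/1] [U_]

unused devices: <none>
"""
print(degraded_md_arrays(sample))  # ['md1']
```

On a real system the input would come from reading /proc/mdstat, and the result could feed a cron job or a monitoring check.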

--
rm /dev/mem
Sci-Fi Storm

Dear Seagate, Western Digital, et. al: by ThreeGigs · 2009-09-18 01:09 · Score: 4, Interesting

Here's what I want, folks:
A 5.25 inch device with 5 double-sided platters running at 5400 RPM. Basically the same size as a desktop CD/DVD drive, ala Quantum Bigfoot.
I want 8 sides of the platters dedicated to data, and the other two sides dedicated to parity (or one parity and the other servo), essentially a self-contained RAID on a single disk.
I want all data heads to write and read simultaneously, in parallel. The idea is to have 64 byte sectors on each platter which are recombined into a 512-byte result. 8 heads writing and reading in parallel means HUGE throughput for sequential operations.

It's RAID 5 or 6 on a single disk, although without spindle redundancy.
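
The data path of that proposal can be sketched as follows: eight 64-byte chunks per 512-byte sector, one per data surface, plus one XOR parity chunk (the single-parity, RAID 5-like variant). The function names are made up, and nothing here models servo tracks or head alignment.

```python
from functools import reduce

CHUNK = 64  # bytes per head, as proposed above


def xor_chunks(chunks: list) -> bytes:
    """Byte-wise XOR of equally sized chunks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*chunks))


def split_sector(sector: bytes) -> list:
    """Split a 512-byte sector into eight data chunks plus one parity chunk."""
    assert len(sector) == 8 * CHUNK
    chunks = [sector[i:i + CHUNK] for i in range(0, len(sector), CHUNK)]
    return chunks + [xor_chunks(chunks)]


def recover(chunks: list, lost: int) -> bytes:
    """Rebuild the chunk at index `lost` from the eight surviving chunks."""
    return xor_chunks([c for i, c in enumerate(chunks) if i != lost])


sector = bytes(range(256)) * 2
stored = split_sector(sector)
assert recover(stored, 3) == stored[3]    # one unreadable surface is recoverable
assert b"".join(stored[:8]) == sector     # the data chunks recombine to the sector
```

This covers a bad spot on one surface only; a failure of anything the surfaces share is outside what parity can fix.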

And I also want a high-performance option: 2 sets of read/write heads 180 degrees apart, which effectively would cut seek times in half, making the drive perform more like a 10k RPM drive. With current densities, that's 12 TB in the volume of a DVD drive. It solves speed, sector error recovery and capacity issues. The only thing missing is a data bus that can handle the throughput.
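
A quick check of the 10k RPM comparison, under the assumption that what a second head set 180 degrees apart shortens is rotational latency (the wait for the sector to come around), not the arm's seek time. Average rotational latency for one head set is half a revolution; the function name is hypothetical.

```python
def avg_rotational_latency_ms(rpm: float, head_sets: int = 1) -> float:
    """Average wait for a sector to reach the nearest of evenly spaced head sets."""
    rev_ms = 60_000 / rpm  # duration of one revolution in milliseconds
    return rev_ms / (2 * head_sets)


for rpm, heads in ((5400, 1), (5400, 2), (10_000, 1)):
    print(rpm, heads, round(avg_rotational_latency_ms(rpm, heads), 2))
# 5400 1 5.56
# 5400 2 2.78
# 10000 1 3.0
```

On this simplified model a 5400 RPM drive with two head sets lands close to a single-head 10k RPM drive for rotational latency alone.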

Re:Dear Seagate, Western Digital, et. al: by EmagGeek · 2009-09-18 02:37 · Score: 3, Insightful

Without spindle redundancy...
or logic element redundancy...
or power supply redundancy...
or cable interconnect redundancy...
add to that the cost of adding dedicated RAID hardware to every single drive (that's an expensive PLD), and it's no wonder it's not on the market. High cost - no return.
Re:Dear Seagate, Western Digital, et. al: by Anonymous Coward · 2009-09-18 02:58 · Score: 0

You forgot to mention the pony.
Re:Dear Seagate, Western Digital, et. al: by R2.0 · 2009-09-18 03:06 · Score: 1

"It's RAID 5 or 6 on a single disk, although without spindle redundancy."
You are aware that the "R" in RAID stands for "redundant", yes? So restating your assertion would be: "It's like a redundant array of inexpensive disks, although without spindle redundancy."

--
"As God is my witness, I thought turkeys could fly." A. Carlson
Re:Dear Seagate, Western Digital, et. al: by Anonymous Coward · 2009-09-18 03:19 · Score: 1, Informative

And then the onboard hard drive controller fails. Zap. Game over.
RAID on discrete disks is really good at avoiding that kind of hardware failure.
Re:Dear Seagate, Western Digital, et. al: by Joce640k · 2009-09-18 04:54 · Score: 1

So put two of them in "RAID1" ... it's not as if it will break the bank these days.

--
No sig today...
Re:Dear Seagate, Western Digital, et. al: by tepples · 2009-09-18 04:55 · Score: 1

I want all data heads to write and read simultaneously, in parallel.
In order to treat the platters of a single hard disk as an array, you'd need to have a separate servo mechanism to align each head to the data track under it.
Re:Dear Seagate, Western Digital, et. al: by adisakp · 2009-09-18 06:24 · Score: 2, Interesting

I want 8 sides of the platters dedicated to data
More platters == more mass. Which translates to more power required for the motor, higher energy usage and much more heat generated by the drive. Generating more heat == quicker hardware failures. Also with bigger / larger / more platters, it's much harder to spin the platters faster. Usually more platters == slower RPM drive speed and much slower seek rates. If you can do fewer, smaller, and lighter platters, you can make the drive spin faster and perform better -- this is exactly what the Velociraptor does with its high RPM 2.5" format.

Also, using only one side of the platter is often faster and more reliable because the head arm weighs less (1/2 the heads) so they don't have as much mass to impede fast seeking or to cause vibration. Plus you don't have to worry about the alignment on both sides of the platter. This is one reason why the highest speed drives do not necessarily even use both sides of the platter.

It's RAID 5 or 6 on a single disk, although without spindle redundancy.
No it's not... what happens if the control electronics fail, the arm actuator, or the spindle motor? RAID 5/6 have whole disk redundancy. You just have data redundancy on the platters - not full hardware redundancy. Also, all these extra components you want to add to the drive will just make it more complicated and have more points of failure so the drives will actually fail earlier.

And I also want a high-performance option: 2 sets of read/write heads 180 degrees apart, which effectively would cut seek times in half, making the drive perform more like a 10k RPM drive.
Except that it would slow the drive down by making calibration harder and slower. Moving the head arms causes vibration and movement in the drive. One arm would not be able to reliably read while the other was moving unless the drive was spinning slower to begin with.

The moral of your story is that you have some interesting ideas, but believe it or not, most of them have already been tried and rejected well before coming to market because they weren't feasible or reliable or didn't actually result in performance improvements in a cost effective manner.
Re:Dear Seagate, Western Digital, et. al: by sootman · 2009-09-18 06:58 · Score: 1

Agreed--I've wanted something like this for years. :-) I didn't think as far as data and parity platters or multiple heads but I'd be more than happy with large, slow, quiet, cool disks.

--
Dear Slashdot: next time you want to mess with the site, add a rich-text editor for comments.
Re:Dear Seagate, Western Digital, et. al: by Anonymous Coward · 2009-09-18 07:35 · Score: 0

Problems with your idea:
It isn't possible to do simultaneous reads/writes on several heads at once unless you dedicate a servo mechanism/arm for each head.
Having 64 byte sectors isn't smart as the overheads (synch patterns, ECC) would dwarf the actual data.
A (hard) head crash on one of the platters would contaminate the drive interior with debris, probably causing crashes of the other heads too.
Re:Dear Seagate, Western Digital, et. al: by Anonymous Coward · 2009-09-18 12:59 · Score: 0

And how does this help if your disk doesn't spin up?
Re:Dear Seagate, Western Digital, et. al: by drfreak · 2009-09-18 13:33 · Score: 1

Dear potential Client,
How's it feel to want?
Sincerely,
Western Digital
Re:Dear Seagate, Western Digital, et. al: by BitZtream · 2009-09-19 05:20 · Score: 1

Great, awesome ... redundant spindles...
That does nothing for every failure mode I've encountered.
Those being, drive motor stops spinning the drive up, or the heads stop working (click click click).
Since either of those would break your idea, it makes it rather useless to build a 'redundant device' that protects against failure modes that nobody experiences because the other failure modes happen first.
Two sets of heads 180 degrees apart may increase your sequential throughput, but unless they can seek to different locations than the other heads, you aren't changing seek time.
You haven't actually solved a speed, sector error, or capacity issue that doesn't get resolved the exact same way with 2 disks mirrored, except that with 2 disks you get all the protection of your own super disk, as well as protection against most of the things that will take out your superdisk.
I'm not sure why it seems like a bright idea to essentially take a raid, put it in a single box, and then take away a few layers of redundancy and advantages. Fortunately there are plenty of mods on slashdot who don't actually analyze something before throwing an interesting mod at it.

--
Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
Re:Dear Seagate, Western Digital, et. al: by toddestan · 2009-09-19 08:12 · Score: 1

I'll take the consumer version of that drive. In other words, drop the redundancy platters for more data capacity, and no high performance options. Basically one big huge media storage drive, the lower throughput would not bother me.
In order to back it up, give me another one in some kind of external enclosure with Firewire or eSATA connection.

windows home server? by flappinbooger · 2009-09-18 01:11 · Score: 1

I thought windows home server resolved this issue a while ago. Their non-raid multi disk approach is well documented regarding its function as well as philosophy. http://www.google.com/search?q=windows+home+server+doesn't+use+raid

I was suspicious of this at first but reading up on it, it sounds pretty good.

--
Flappinbooger isn't my real name

Solved, but at a price by Anonymous Coward · 2009-09-18 01:12 · Score: 2, Insightful

You are absolutely correct in that the mainframe world has dealt with all of the modern recovery issues. But think of the actual USE of storage these days. What used to be a colossal database is now just a bunch of home videos from my camcorder. Not only has the cost of storage dropped to nearly nothing, the threshold for using it has dropped even lower. I'm perfectly willing to commit a few megabytes every time I push the button on my digital camera. I remember college, where my mainframe disk quota was a mere 256K.

Today's challenge is to get mainframe-class recovery without bringing back mainframe-style prices. Some of this is controlled by the way we USE data storage. And then there is all the "savings" we get from server consolidation. Everything we do to consolidate just makes storage management a bigger headache. The trick is to evolve not just the low-level, "invisible" management of storage, but the high level applications as well. If I don't truly NEED to have 10TB on a single mount point, perhaps I should have multiple volumes, distribute my storage, and find a way to be happy with twenty 500GB volumes instead. The easiest way to avoid the recovery time of a 10TB RAID set is to not build one.

I was in mainframe IT long before RAID was commonplace. We commonly faced limits of 450MB on indexed files, because that's as much as you could get from a hard drive back in the early 1980's. Modern Oracle DBAs must be scratching their heads at all of the tablespace management options that seem so redundant when you have RAID storage. This was the pre-RAID method of storage management, in which database container files could be of any size, mounted anywhere, and utilized in all sorts of creative ways to circumvent the hardware limitations of storage in those days. Today, it represents little more than an opportunity to inadvertently bring out the worst of both worlds by setting up these two storage methodologies in conflict with each other.

Re:Solved, but at a price by badkarmadayaccount · 2009-09-20 02:12 · Score: 1

Sounds fun being a storage engineer. But why not make the database the filesystem and volume manager and software RAID? The layering and flexibility are already there, no need to retrofit anything, just put a small binary in the kernel and wrap it with POSIX.

--
I know tobacco is bad for you, so I smoke weed with crack.

Crappy RAID's days are numbered by zerofoo · 2009-09-18 01:26 · Score: 2, Interesting

Having worked with plenty of enterprise grade raid (EMC Symmetrix, CLARiiON, and Dell SAN devices), I can say that capacity and rebuild times are not a problem for high-end arrays.

What will bring the problem to the masses are these stupid consumer NAS boxes. It is very easy to build a 4 or 8 TB array for home use using relatively cheap hardware. Unfortunately, no home user/abuser, that I know, has the skill set to manage or protect such a large array of data.

My most recent experience with a Western Digital sharespace was awful. Here is a box with a Gigabit NIC, and 4 - 2TB hard drives in a RAID 5 array that has transfer rates around 9MB/sec at best. Combine that pitiful performance with a rebuild/reformat time of over two days - and you know where this is going.

Average joes are going to put their entire lives on these things and never back them up due to the time and space cost. When a failure does occur - it will take days to perform a rebuild of the array - vastly increasing the likelihood of another failure and permanent data loss.

Crappy RAID's days are numbered - good RAID implementations will be with us as long as hard drives have ANY failure rate at all.

-ted

Re:Crappy RAID's days are numbered by lotho+brandybuck · 2009-09-18 02:01 · Score: 1

There's a lot of people who've been burnt by a crummy on-motherboard RAID that will never touch it again. I've never been burnt by a regular disk failure. I've been seriously burnt by an on-motherboard RAID on a client's machine that went south. I had pretty recent backups; the real time sink was spending 14 hrs trying all sorts of stuff to rebuild the RAID, instead of 5hrs replacing a drive, rebuilding the system and recovering about 1/2 hr of programming.
I'll never touch RAID, although I'm sure there's good controllers out there.
If I needed 24/7 disk, with second-by-second guaranteed no dataloss, I'd hire someone to set it up and support it. I'll never touch a RAID again.
Re:Crappy RAID's days are numbered by swillden · 2009-09-18 03:14 · Score: 1

I've been seriously burnt by an on-motherboard RAID on a client's machine that went south
A big part of the reason why I only use software RAID on low-end hardware. Decent HW RAID equipment is great, but the low-end stuff you find on motherboards is lousy. Linux MD-RAID is fast and extremely reliable.

--
Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
Re:Crappy RAID's days are numbered by Anonymous Coward · 2009-09-18 08:59 · Score: 0

We had a hardware/software error on an EMC Symmetrix.
EMC's engineering told us that, to be safe, they would recalculate all parity bits, and that this would take months (!) (for about 50TB of usable space).
After all the problems we had with these boxes, I'm starting to think that high-end storage is not so high-end after all.
Re:Crappy RAID's days are numbered by davros-too · 2009-09-18 11:54 · Score: 1

I'll never touch a RAID again.
Yeah mate, I feel your pain, I've been there! However, for servers you still need raid - software raid in server OS is just fine these days, or better still if you've been burned, any server-quality raid card really does 'just work' (look for hardware with 'hot swap' drives - worth the money).
For workstations, backups plus SSD for speed.

--
In theory, there's no difference between theory and practice; in practice there is.

Next Gen File Systems/Storage Management Solutions by Anonymous Coward · 2009-09-18 01:30 · Score: 0

When a RAID Array becomes too large to be efficient, it's time to deploy some other storage management solutions. The idea is to keep RAID arrays small, manageable and reliable. As mentioned elsewhere in comments, storage today is inexpensive compared to the other costs of large projects.

From the commercial side I saw IBM's GDPS mentioned, but there is also the General Parallel File System from IBM which allows you to stripe your data over multiple arrays over multiple networked systems as well as to have multiple mirrors. The storage does not have to be RAID storage either.

Also, there is the XAM space for storage management in the Content Addressable Storage space. The current implementations are mostly commercial though Sun's (Oracle's now) Honey Comb project provides a version of the XAM interface.

There are also several open source storage management solutions that provide for redundancy and stability - Twisted Storage is one. Tahoe and XTreemFS are two others.

Pick the right technology for the job, there are solutions out there and I'm sure new ones on the way.

Check out XIV or EMC's using enterprise flash by Anonymous Coward · 2009-09-18 01:30 · Score: 0

There is a storage platform called XIV; it distributes the data in what they call RAID-X. Go check it out at xiv.com. Also, EMC is using Enterprise Flash in their arrays in a RAID-5 config, and I think in 18-24 months we will see EFD (enterprise flash) priced very closely with a 15K FC drive.

Re:Next Gen File Systems/Storage Management Soluti by Anonymous Coward · 2009-09-18 01:38 · Score: 0

reference links

SATA vs FC/SAS: grapes and oranges by argent · 2009-09-18 01:54 · Score: 3, Insightful

The chart he's using goes from SCSI, to fiberchannel, to SAS... to SATA. When you go from professional/server interfaces to hobby/desktop ones, of course the rebuild time skyrockets. If you did this article a few years ago and slid ATA in as the last data point instead of fiberchannel, you'd be seeing the knee showing up then instead of now. How about looking at 2010 and doing the calculations with 6 Gb SAS interconnect and 3 Gb drives, instead of 1.5 Gb SATA and 1 Gb drives?

a better solution by dayton967 · 2009-09-18 01:54 · Score: 1

Build multiple smaller arrays of drives and, with the right file system, pool the storage, e.g. ZFS, LVM, btrfs, and many others. In theory as well (and practicality), you could do raid of raids of raids ... etc. (e.g. a raid 5 array consisting of raid 5 arrays). You can also have something like hadoop, or the various other solutions coming, which allows for clustering across your infrastructure. Remember Google does not have one single big massive array of disks running to serve all of us peons.

Sw&Hw RAID have been obsolete for some time by toby · 2009-09-18 02:05 · Score: 1

..and the sooner people learn that, the better. Studying ZFS' design reveals a great deal about what RAID simply cannot achieve.

--
you had me at #!

Re:Sw&Hw RAID have been obsolete for some time by BitZtream · 2009-09-19 05:10 · Score: 1

You realize one layer of ZFS is software raid ... right?

--
Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager

Re:Worked-around a Long Time Ago by Hatta · 2009-09-18 02:08 · Score: 4, Insightful

Consider Google Docs.

If you have so much data that you're likely to encounter an error when rebuilding your RAID array, I don't think Google Docs is going to cut it.

--
Give me Classic Slashdot or give me death!

This whole discussion is stupid.. by fluffernutter · 2009-09-18 02:13 · Score: 1

I just don't see evidence of this in real life.. I have a couple companies as customers who together are running six SANs from three different vendors. There must be about 100 raid arrays between them. We replace a drive about twice a year between all of them.. There is definitely no flaming ball of fire anywhere. We put in a new drive, it rebuilds over a couple hours and we are good for another six months. Plus I don't know about the other vendors but IBM is scaling upwards, too.. you get a few SANs under an SVC and you can migrate data from SAN to SAN at will.

--
Laws are rules for the court, but merely a bottom bar to hit for life. Think beyond laws in your actions always.

Re:Worked-around a Long Time Ago by Anonymous Coward · 2009-09-18 02:23 · Score: 0

I don't see the situation really evolving for Raid.

A long time ago, big servers used to have around 50 disks totalling 1 GB of storage, MTBF of individual disks of 200,000 hours, some form of Raid, a bandwidth of 10Mb/s

Today's servers have also about 50 disks, totalling 40TB of storage, MTBF of individual disks ranging from 200,000 to 1.6 million hours, Raid 6, a bandwidth of 40Gb/s

So I fail to understand why Raid would become obsolete.

We are just adding extra layers of security by mirroring the data in different locations. But the Raid technology is not dead. servers, even those in the cloud still need Raid.

Re:Worked-around a Long Time Ago by Swordsman02155 · 2009-09-18 02:25 · Score: 1

Not from weasels, though...

Eagles may soar, but weasels don't get sucked into jet engines.

bad idea by speedtux · 2009-09-18 02:26 · Score: 1

Modern drives perform bad block replacement internally. By the time you actually see drive errors at the OS level, the drive has already experienced massive failures.

fill the drive with helium by speedtux · 2009-09-18 02:29 · Score: 3, Interesting

Filling the drive with helium should help; the speed of sound in helium is 3x higher than in air, and it offers less resistance.

(Hydrogen would be even better, but it has a tendency to interact with metals in unfortunate ways.)
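For anyone who wants to check the "3x" figure, here is a rough sketch using the ideal-gas formula c = sqrt(gamma * R * T / M). The gas constants (gamma and molar mass for air and helium) and the 20 C temperature are textbook values I am assuming, not numbers from this thread, and the function name is just for illustration.

```python
import math

R = 8.314462618  # universal gas constant, J/(mol*K)

def speed_of_sound(gamma, molar_mass_kg, temp_k=293.15):
    """Ideal-gas speed of sound in m/s (assumes ideal-gas behaviour)."""
    return math.sqrt(gamma * R * temp_k / molar_mass_kg)

# Assumed textbook properties at roughly room temperature.
air = speed_of_sound(gamma=1.40, molar_mass_kg=0.02897)
helium = speed_of_sound(gamma=5 / 3, molar_mass_kg=0.0040026)
ratio = helium / air

print(f"air: {air:.0f} m/s, helium: {helium:.0f} m/s, ratio: {ratio:.2f}")
```

Under those assumptions the ratio comes out a little under 3, so "3x" is a fair round number.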

Re:fill the drive with helium by Volante3192 · 2009-09-18 03:05 · Score: 1

Wouldn't work without a major redesign. Hard drives aren't sealed to the outside world; there has to be an equilibrium between air pressure and volume.
You'd somehow have to work out a closed system for a hard drive possibly leading to some sort of bladder to allow helium to leave and return the main hard drive chassis while not being vented to the outside world.
Re:fill the drive with helium by nacturation · 2009-09-18 03:59 · Score: 1

Wouldn't work without a major redesign. Hard drives aren't sealed to the outside world; there has to be an equilibrium between air pressure and volume.
They're not? So any dust/smoke particles and so on are free to enter the spinning platter area?

--
Want to improve your Karma? Instead of "Post Anonymously", try the "Post Humously" option.
Re:fill the drive with helium by networkBoy · 2009-09-18 04:14 · Score: 3, Insightful

there are these things called filters.
They work pretty well.

--
whois gawk date unzip strip find touch finger mount join nice man top fsck grep eject more yes exit umount sleep dump
Re:fill the drive with helium by Volante3192 · 2009-09-18 04:29 · Score: 1

Why do you think people are told not to smoke around hard drives?
http://www.buzzle.com/articles/smoke-damages-and-hard-drive-recovery.html
Re:fill the drive with helium by Volante3192 · 2009-09-18 04:51 · Score: 1

Yes, but filters do not seal a hard drive. Sealed means nothing gets in, nothing gets out.
Filters say "this can go in and out, but nothing larger."
The problem when we add helium to the mix is it has the tendency to head skywards and not come back down, so if you make your filter fine enough to only allow helium to pass in or out (let's assume for the sake of argument this is possible) then as pressure increases inside, helium is pushed out. This helium, having free rein over its journey now, floats up and out.
Now, when the pressure drops in the HD, there's nothing to replenish the volume.
Re:fill the drive with helium by Volante3192 · 2009-09-18 05:03 · Score: 1

Wow, epic reply fail... Plus side, found a better link.
Short answer: Yes!
http://answers.google.com/answers/threadview/id/115859.html
Re:fill the drive with helium by networkBoy · 2009-09-18 05:24 · Score: 1

this is a reply to the wrong post :(
should have been a reply to the "smoke/dust" post further above...

--
whois gawk date unzip strip find touch finger mount join nice man top fsck grep eject more yes exit umount sleep dump
Re:fill the drive with helium by Volante3192 · 2009-09-18 05:41 · Score: 1

Yeah, we broke slashdot's threading...
Anyway, filters don't work if the particles are smaller than air, which works well enough in general for dust, but many smoke particles are smaller than air molecules (ok, N2 and O2).
Smoke = HD killer.
Re:fill the drive with helium by Dogtanian · 2009-09-18 05:52 · Score: 1

many smoke particles are smaller than air molecules (ok, N2 and O2)
Citation needed.

You're not seriously suggesting that smoke particles are smaller than N2 or O2 are you? The only way I can think of that is that you're talking about *molecules* (or atoms) of potentially damaging substances, which wouldn't be what most people would consider "smoke particles".

I might be wrong, though I'd be interested to hear your answer if I was.

--
"Slashdot - News and Chat Sites Deviant". (Click "homepage" link above for details).
Re:fill the drive with helium by Volante3192 · 2009-09-18 06:09 · Score: 1

Well, maybe not that per se, but smoke particles are small enough to get through the filter on the hard drive.
http://answers.google.com/answers/threadview/id/115859.html
Re:fill the drive with helium by Grishnakh · 2009-09-18 07:02 · Score: 1

I don't get it, why not just seal them hermetically with helium inside, and not worry about outside air pressure? As long as the drive's chassis, lid, and seals are strong enough, they shouldn't have any trouble as long as the air outside isn't too far from 1 atm (obviously, you couldn't put it in a 100 atm chamber or whatever without causing damage).
Obviously, they'd need a more rigid lid than what they currently put on drives, but that's no big deal.
Re:fill the drive with helium by Firethorn · 2009-09-18 07:27 · Score: 2, Informative

I don't get it, why not just seal them hermetically with helium inside, and not worry about outside air pressure?

1. Hasn't been necessary
2. Helium is expensive
3. Sealing something Helium-tight is expensive, about as bad as trying to seal in hydrogen*
4. Fairly sensitive to pressure - not a problem in a non-airtight HD, but a problem in a sealed HD that's heating up.
5. Cooling can be an issue
*Mostly because He tends to stay monoatomic, H pairs up into H2. End result is that the H2 molecule is around the same size as a He atom.

--
I don't read AC A human right
Re:fill the drive with helium by Volante3192 · 2009-09-18 07:32 · Score: 1

Outside pressure is a nonissue. It's the inside pressure that you're worried about. The platter spinning increases the pressure and pushes air out.
Therefore, when you seal the drive, you can't reduce the inside pressure.
Now, I'm going to disclaim that this is all hypothesis, I have no citation, however it stands to reason that this equilibrium between air pressure and volume plays a role in how hard drives work or else manufacturers would not have included a vent in the first place.
It's way too small to provide any cooling functionality, thus the only possible reason I can see it being there is to deal with pressure changes and since outside pressure is a constant (relatively) that leaves inside pressure as the issue.
Re:fill the drive with helium by Volante3192 · 2009-09-18 07:50 · Score: 1

I should amend this comment after doing some research:
Outside pressure is a nonissue at low altitudes. It will become an issue at high altitudes when there's just not enough air to create enough pressure inside.
There ARE sealed hard drives ( http://www.mt-optech.com/ ) but it doesn't look like you get these unless you have a damn good reason to.
Re:fill the drive with helium by BitZtream · 2009-09-18 08:01 · Score: 1

Yes.
Every drive has a hole to make sure pressure is balanced inside and out. Most of them have nothing at all over the hole. A few have a small piece of open cell foam covering the hole on the inside.
The reality of it is however, very little air is going to transition through that hole due to its small size, so dust build up is going to take a little while under normal circumstances.

--
Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
Re:fill the drive with helium by Grishnakh · 2009-09-18 08:16 · Score: 1

Aha! So it does look like it's possible, they just don't normally do it, probably because of cost (after all, they're selling 80GB drives now for about $35 on NewEgg).
At some point, it may become cost-effective to make HDs sealed and helium-filled to get higher performance; if they did it on all drives at the factory, the incremental cost probably wouldn't be that much. More likely though, they'll just move to SSDs.
Re:fill the drive with helium by Anonymous Coward · 2009-09-18 08:19 · Score: 0

Don't try it! The heads fly above the platters. The platter surface and head aerodynamics need the density of normal AIR to fly at the correct height. If you lower the density of the ambient gas, you will lower the fly height, the heads will probably flutter, and either way you will increase the likelihood of a head crash (total destruction).
Re:fill the drive with helium by mindstrm · 2009-09-18 09:08 · Score: 1

No they aren't - because the engineers wisely thought of this and put in a particulate filter between the area where the platters spin and the vent hole on the outside.
Re:fill the drive with helium by the_other_chewey · 2009-09-18 10:35 · Score: 3, Informative

Filling the drive with helium should help;
Yeah. For about half a week. Helium has the smallest "gas particles" there are - Hydrogen atoms would
be smaller, but those really like to bond, and an H_2 molecule is quite a bit larger than a Helium atom

That's why He leaks out of everything. No exception. It diffuses through "leakproof" welds for vacuum tanks.
It diffuses through the steel walls of tanks (albeit more slowly). That's also why He is used in leakage detection:
If you see less than $not_so_few He atoms on the outside of the container you test within a couple of seconds after you injected a little bit of He, the container is considered airtight.

The only way to keep a He atmosphere in your drive would be to constantly refill it. I don't think that there'll be any scenario where this would seem like an even remotely good idea.
Re:fill the drive with helium by Bigjeff5 · 2009-09-18 12:15 · Score: 1

If you've ever opened up a hard drive, you may have noticed there are very weak barriers over open holes - which often say things like "removal voids the warranty yadda yadda". These exist to let air in and keep dust out. Dust = bad, vacuum = worse.
You'd be hard pressed to come up with something better, the drive arms and platters would have to be extremely rigid to prevent the platters from contacting disk while still being able to make minute changes in direction extremely quickly (they are delicate and agile currently), all while maintaining a separation of well below a millimeter. There is also zero heat dissipation through the vacuum, so you would need some sort of heat conductor to make up for the difference there. All in all it may be feasible, but the drives would be a lot heavier, more expensive, and probably slower - which completely defeats the purpose of removing the air to speed the drive up.

--
Security is mostly a superstition... Avoiding danger is no safer in the long run than outright exposure. - Helen Keller
Re:fill the drive with helium by dkf · 2009-09-18 12:17 · Score: 2, Interesting

Filling the drive with helium should help; the speed of sound in helium is 3x higher than in air, and it offers less resistance.
(Hydrogen would be even better, but it has a tendency to interact with metals in unfortunate ways.)

Thinking about it, methane might be a more practical choice. Yes, it's denser than helium so the effect won't be anything like as strong (the speed of sound in methane is only about 40% faster) but it's also very cheap and available, and won't cause too many problems from interacting with the rest of the drive. Having to seal the drive is an issue, yes, but that's not far off what's needed now; it's imperative that dust is kept out of the platter enclosure anyway...

--
"Little does he know, but there is no 'I' in 'Idiot'!"
Re:fill the drive with helium by Anonymous Coward · 2009-09-18 13:33 · Score: 0

Don't try it! The heads fly above the platters. The platter surface and head aerodynamics need the density of normal AIR to fly at the correct height. If you lower the density of the ambient gas, you will lower the fly height, the heads will probably flutter, and either way you will increase the likelihood of a head crash (total destruction).
Re:fill the drive with helium by Anonymous Coward · 2009-09-18 16:57 · Score: 0

Hard drives already use (inert) helium gas; see http://www.patentstorm.us/patents/5454157/claims.html
Re:fill the drive with helium by baptiste · 2009-09-18 23:11 · Score: 1

Ever wondered why every drive has that small hole marked DO NOT COVER? It's an air vent, with a filter on the inside to keep out contaminants, that allows pressure to equalize.

--
Top Most Bizarre/Disturbing Error Messages
Re:fill the drive with helium by Anonymous Coward · 2009-09-19 01:25 · Score: 0

If you fill the drive with helium then there will be a slashdot post two years later describing how to crack open an "old" helium hard drive to suck the gas out and all of your datacenter admins will be talking funny.
Re:fill the drive with helium by RockDoctor · 2009-09-19 09:41 · Score: 1

You'd somehow have to work out a closed system for a hard drive possibly leading to some sort of bladder to allow helium to leave and return the main hard drive chassis while not being vented to the outside world.
That's not a show-stopper. For a design to work from, look at a mechanical barograph. They contain a moderate size, thin-wall partial-vacuum chamber. As the external air pressure varies, the deformation of the can is picked up by a lever system and used to wiggle the pen. The same construction can be used to produce a package that allows you to contain a fluid around some space while having it stay in barometric equilibrium with the outside world.
We use systems like this at work in the other direction to maintain computers that work while the whole package is suspended in 20,000 psi hot, corrosive fluids.
It's not rocket science. Though similar systems are probably used in rocket science.

--
Birds are not dinosaur descendants;birds are dinosaurs, for all useful meanings of "birds", "are" and "dinosaurs"
Re:fill the drive with helium by Anonymous Coward · 2009-09-19 23:44 · Score: 0

methane might be a more practical choice. Yes, it's denser than helium so the effect won't be anything like as strong (the speed of sound in methane is only about 40% faster) but it's also very cheap and available
So if you suspect your drive is failing, you'll just have to fart in its general direction?
Re:fill the drive with helium by Anonymous Coward · 2009-09-21 07:11 · Score: 0

I've just checked quite a few drives I have lying around and many don't have that label, they all have a hole though. However some of the drives I checked are getting on a bit, so the labeling may be more prevalent on newer drives.
Re:fill the drive with helium by Anonymous Coward · 2009-09-21 07:19 · Score: 0

Bollocks they do. Helium leaks out of just about everything, so just isn't practical. Just because someone patented the idea, doesn't mean it has actually been made, and this idea wouldn't have got past the prototype stage with helium. It may be feasible to implement with other inert gases, but it still isn't done with most hard drives.
Re:fill the drive with helium by Anonymous Coward · 2009-09-21 07:25 · Score: 0

You're a moron. Do you really think anyone on Slashdot is going to fill their hard drives with helium to make it spin faster? This was just a proposed design idea, which if anything would be implemented by manufacturers, who could change the head aerodynamics to suit the lower density of helium. Not that the idea is in any way feasible, as others have pointed out.

There are good solutions today by Anonymous Coward · 2009-09-18 02:50 · Score: 0

There are a number of Enterprise storage array vendors that have very good solutions. Last year my company looked at all the major storage vendors and we chose 3PAR for a number of reasons, including RAID rebuild issues. They had two things going for them. 1) They only rebuild the blocks that are written to. 2) Since they wide stripe across all the drives in the array, they can rebuild from more drives. We have had 2 drive failures since we purchased the array. Both times the applications did not notice any performance slowdown during the failure and rebuild times. This was different from the other arrays we currently have.

While we selected 3PAR, there are a number of storage vendors that do something similar that would fit most companies' need for quick rebuilds.

Disks are huge now by Sloppy · 2009-09-18 03:00 · Score: 1

In short, it's a scenario that will eventually require a solution, if not a whole new way of storing and protecting data.

And that "new solution" is a few minutes older than RAID5 itself: RAID1. C'mon, people, RAID1 is the bee's knees. Between RAID1, RAID0 (and various combinations of the two), the dirt-fucking-cheap huge-fucking-capacity disks you can get now, and high bus bandwidths, life is great. Everything I used to solve with RAID5, I now solve with RAID1, and it just doesn't have problems. Worried about another disk dying when you're rebuilding a new one? Just spring for the extra $100 (oh, the horrors) for a couple more disks. There's no rule that says you can't have 4 or 5 disk redundancy, and at today's prices, you can afford it. RAID is such a non-problem now, you just have to use the non-parity levels.

Ok, but how else could we improve things? More ECC and redundancy at the sector level. e.g. I'm willing to give up a lot more capacity for even more ECC. If I have 512 bytes of data, I'm ok with using 1024 bytes per disk: let both fail before you call it a disk failure. And those of you who cry about all this (and on top of multi-disk RAID1!) being so space inefficient, I want you to go to newegg and look at dollars per gi^H^Hterabyte. (Is it still terabytes today, or have they moved up to the next prefix yet? (joking, but when I go back and read this in 10 years, it won't sound like a joke anymore))

--
As copyright owner of this comment, I authorize everyone to defeat any technological measure which limits access to it.

Re:Disks are huge now by mindstrm · 2009-09-18 07:03 · Score: 1

And you'll see this in enterprise SANs.. even a decade ago, an EMC unit would use mirrored pairs of drives at the lowest level available to the operator - so when you decided to build a raid 5 array - you were already using mirrored drives for each element of the array. And yeah, that was expensive.
Re:Disks are huge now by Anonymous Coward · 2009-09-18 10:03 · Score: 0

I agree. We need significantly more redundancy at the sector level. Just make these things tunable parameters and let people who know what they are doing tweak them. Take the 1TB drive, make it, say, 800GB but have more redundancy. I'd also want to be able to tune things like when the drive reports errors to the controller. For my ZFS fileserver using redundant drives, I don't need the disk to worry about it. Report the read error and ZFS will find a good copy elsewhere. Hell, I might even use copies=2 or higher for really important data. The video storage doesn't really matter, I can re-rip that. The /home filesystem though, that could stand some duplication.
Of course, such drives do exist, in a way. They are the ones that say "Enterprise Grade" on the label and cost 2x as much. Retarded.

Re:Worked-around a Long Time Ago by Anonymous Coward · 2009-09-18 03:00 · Score: 0

Are you kidding me? Google Docs is not safer than raid. It has failed and been down for hours. At best it is as good as raid.

Quantum Hard Drives by TheLeopardsAreComing · 2009-09-18 03:03 · Score: 1

Yes, this is as cool as it sounds! Along with the implementation of the quantum computer (which has been speculated to crack pretty much every encryption algorithm known, because it's capable of sending every possible answer at the same time), the implementation of the Quantum hard drive has been explored. Different orientations of spin give 1's and 0's as well as both 1&2 spin (superposition factor). These properties could be used to create very dense and stable hard drives. Someday our HDDs will be the tape drives of today!

Raid, Availability and Backup by Archangel+Michael · 2009-09-18 03:07 · Score: 1

What is it(data) worth?

Define these failure types: minor, normal, major, critical, catastrophic.

What are the chances of ____ failure?
What is an acceptable downtime for _____ failure?

These are the questions one needs to ask (and others) before saying "Raid" or "Tape" or "Enterprise Solution" or .... whatever.

If we don't have answers to these questions, then the discussion is simply an exercise in mental masturbation. Simply put, we have to be able to effectively mitigate against the types of disasters by appropriate counter measures weighing against the cost of replacing the data and the likelihood of that disaster.

Losing data is critical to everyone. How much effort is needed to protect it, varies widely.

RAID is simply one level of protection. It isn't a backup, it isn't redundant data. It is just a means to mitigate against a certain kind of failure, nothing more, nothing less.

--
Agent K: A *person* is smart. People are dumb, stupid, panicky animals, and you know it.

RAID is very much alive and well by boingolover · 2009-09-18 03:25 · Score: 1

RAID actually IS a valid solution to combat the problem of failing drives, even when the technology in those drives is a moving target. Areal densities have gone up, meaning tolerances have to be tighter, which makes it harder to build a reliable drive than it was some time ago, and that trend will only continue. RAID 5 specifically might not be the best solution for today's needs, but RAID 6 is a step in the right direction. And to call RAID 6 a bandaid is hardly fair. There is no storage methodology that you could put in place now that could be guaranteed valid 10 years from now. Besides, 1+0 is gaining popularity and there are extensions beyond that. RAID 1+0 in particular is much quicker in rebuild time than RAID 5/6. You give up half your disk space to redundancy, but you take comparatively little performance hit during rebuilds and it's very fast in the day-to-day. Granted, in the worst case scenario you are no more redundant than you would be with RAID 5 (i.e., you lost both disks in the same mirror before it could be rebuilt on a hot spare), but in the best case scenario you could lose up to half your disks and keep chugging along.
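To put rough numbers on the worst case versus best case for RAID 1+0, here is a small sketch. It assumes two-way mirrors and that the second failure hits a uniformly random surviving disk, which ignores correlated failures (same batch, same age); the function names are made up for illustration.

```python
def raid10_second_failure_fatal(n_disks):
    """Chance that a second random disk failure lands on the mirror
    partner of the already-failed disk (assumes two-way mirrors)."""
    if n_disks < 2 or n_disks % 2:
        raise ValueError("need an even number of disks, at least 2")
    # Only 1 of the n-1 surviving disks is the dead disk's partner.
    return 1 / (n_disks - 1)

def raid10_max_survivable(n_disks):
    """Best case: one disk lost from every mirror pair."""
    return n_disks // 2

for n in (4, 8, 16):
    print(n, f"{raid10_second_failure_fatal(n):.1%}", raid10_max_survivable(n))
```

So with 8 disks a second failure before the rebuild finishes is fatal roughly one time in seven, while a lucky run could lose 4 disks and survive.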

hybrid mix of RAID drives? by Tumbleweed · 2009-09-18 03:40 · Score: 1

Is the parity drive a bottleneck to RAID performance? I wonder what the impact would be of using SSDs for the parity drives, and regular drives for the data drives? Assuming you either don't need the same size drive for the parity drives as you do for the data drives, or you simply don't need any more data than an SSD holds, and you can afford 1 or 2 SSDs but not enough for the entire array to be SSDs. :)

Re:Worked-around a Long Time Ago by tomhudson · 2009-09-18 03:47 · Score: 2, Insightful

Faster to just copy it to a usb key. You have multiple copies of your data, and no longer have to worry about network latency, or even if there IS a network available.

RAID isn't going anywhere. by Wdomburg · 2009-09-18 04:00 · Score: 1

Ermmm, how about simply moving to a smaller form factor drive. Most servers have moved to 2.5" bays already and I see no reason to doubt SAN vendors will start offering SFF shelves as well. Another approach is to just throw enough cheap disk at the problem that long background rebuilds aren't a concern. Multiple RAID sets off redundant storage stacks have better DR characteristics regardless.

That isn't to say alternate approaches to data integrity aren't called for as well. It's clear that future filesystems are going to include some level of end-to-end checksumming and offer software approaches to data replication. Likewise, there are plenty of approaches to data replication that don't follow traditional filesystem conventions; consider Google FS or CouchDB.

Anyone know how the hard error rate is calculated? by putaro · 2009-09-18 04:06 · Score: 1

A meme that has been going around is that the size of the disk relative to the hard error rate is getting to the point where reading the entire disk once or twice guarantees getting a hard error. My experience with disks, though, is that they do not "wear out" by how many blocks you read from them. They do tend to wear out with age, vibration, heat, start/stop, etc. Is the hard error rate really based on number of bits read (and, it's also not a guarantee you will get a hard error - it's the minimum) or is it an estimate based on expected usage vs age? Or is it not related to the media at all but instead a measure of how many bits you can transfer over the link without an error?

This meme is really getting silly. Long rebuild times are a problem but we're still talking about days for devices that have typical life spans of years.

What we need is a by advocate_one · 2009-09-18 04:06 · Score: 1

redundant array of redundant arrays of inexpensive discs...

RARAID

--
Donald 'Duck' Dunn: We had a band powerful enough to turn goat piss into gasoline.

Joking aside... by Joce640k · 2009-09-18 04:28 · Score: 1

Surely the HDD manufacturers can add extra error-correction data to the hard disks. With 2TB to play with I wouldn't mind losing 5% (or even 10%) to error-correction.

--
No sig today...

Re:simple idea (the bad old days) by Informative · 2009-09-18 04:29 · Score: 2, Informative

The sealed drives we use now showed up in the '80s. Before that the platters were not part of the drive, they were in a plastic cover to keep the dust off. On the mainframes the cover held a stack of platters; on the minis there were just one or two 5MB platters inside. We would place the whole stack with cover into the drive, then rotate the handle to pull the cover out, leaving the spindle of platters in the drive. Then just close the door and push the button to spin it up.
In either those old open ones or the "new" sealed ones, the head flies on a cushion of air, but the distance from head to platter is microscopic; a piece of dust is big in comparison. In the old open drives, if the head hit even a tiny piece of dirt, it could "crash" into the platter and gouge out a rip. If you haven't heard it, it was actually fairly loud and startling.

Mod Parent Up - First Really Correct Comment by billstewart · 2009-09-18 04:31 · Score: 1

The big problem with RAID on bigger disks is that you're splitting data across N+1 disks, so if one disk fails catastrophically and needs to be rebuilt, you need to have good data on the remaining N disks, and as the disks become larger, the chances that all that data is good are no longer near-100%. So you're likely to get some bad data even with RAID, though you're not going to have as much bad data as you'd have without RAID, and you're still avoiding the problem that if your disk fails, you lose everything; you're just not guaranteed to lose nothing.
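A back-of-the-envelope sketch of that "no longer near-100%" claim: if every bit read fails independently with some fixed probability, the chance of reading all N surviving disks cleanly is (1 - p)^bits. The 1e-14 per-bit rate below is an assumed figure of the kind quoted on consumer drive spec sheets, not something measured in this thread, and the independence assumption is exactly what other posters here are questioning.

```python
import math

def p_clean_rebuild(surviving_disks, disk_tb, ure_per_bit=1e-14):
    """Probability of reading every bit on the surviving disks without
    an unrecoverable read error (assumes independent per-bit errors)."""
    bits = surviving_disks * disk_tb * 1e12 * 8  # decimal terabytes
    # (1 - p) ** bits, computed via logs to stay numerically stable.
    return math.exp(bits * math.log1p(-ure_per_bit))

for tb in (0.5, 1.0, 2.0):
    print(f"5 surviving disks of {tb} TB: {p_clean_rebuild(5, tb):.0%} clean")
```

Under these assumptions a 6-disk RAID 5 of 1 TB drives has only about a two in three chance of a completely clean rebuild, and the odds drop as the disks grow.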

--

Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks

Re:Mod Parent Up - First Really Correct Comment by Rockoon · 2009-09-18 05:16 · Score: 1

so if one disk fails catastrophically and needs to be rebuilt, you need to have good data on the remaining N disks, and as the disks become larger, the chances that all that data is good are no longer near-100%.
This is true as the # of disks increases as well. It is the total capacity of the array, not the specific size or number of individual disks, that leads us here.

The higher the rebuild time, the more redundancy you need... at some point the redundancy needed even grows beyond the 100%.

--
"His name was James Damore."

The future is virtualization. by rickb928 · 2009-09-18 04:42 · Score: 1

You heard it here, remember me well.

--
deleting the extra space after periods so i can stay relevant, yeah.

Relevant quote by Anonymous Coward · 2009-09-18 04:42 · Score: 0

It's kinda like on Futurama when they had the tanker with six thousand hulls.

Fry: What happened?
Dr. Zoidberg: All six thousand hulls have been breached.
Fry: Oh, the fools! Why didn't they build it with six thousand and one hulls? When will they learn?

Mixing up RAID-for-speed and RAID-for-protection by billstewart · 2009-09-18 04:44 · Score: 1

There are three reasons you run RAID, other than because it's cool:

Goes faster
Protects you from losing all your data if a whole drive fails
Protects you from losing bits of your data if bits of a drive are bad

SSDs are about speed, but don't have high capacity unless you've got an infinite budget. That's not the problem here. The problems are that

If a whole drive fails and you need to rebuild it, it's going to take a long time which you may not be able to afford, and
If a whole drive fails and you need to rebuild it, you've no longer got parity information, so if there are bad bytes on your disks or bad bits in disk transmission, you can't recover those bits.

SSDs don't really help you on time, because you can't afford to run your whole shop on SSDs if it's big enough that RAID's not reliable. The recovery problem is helped somewhat by disk block error-correcting codes: you can detect if a block has corruption and rewrite/remap it when you're first storing it, so the ratio of bad blocks on disk isn't as bad as you'd expect from raw error rates, and you may need to continually recheck your raid block and rebuild bad blocks proactively rather than finding them after failures.

--

Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks

The rebuild problem isn't about drive speed. by Anonymous Coward · 2009-09-18 04:56 · Score: 0

It's about RAID controller speed, and that just isn't scaling as fast as disk capacity or business requirements for disk space.

To build or rebuild a RAID5 / RAID6 array is very expensive - in order to calculate parity on one drive you have to read every other drive and XOR all of those bits together. Since in a RAID5 / RAID6 array there is no dedicated parity disk (parity is distributed across all drives) replacing one drive requires that every bit on every other drive is either read or written during the rebuild process. That means that the rebuild time is O(n) on the number of disks in the array and also O(n) on the size of the disks in the array. BTW, the performance of a RAID5/6 array during a rebuild is terrible - the controller is flat out already.
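For anyone who hasn't seen the XOR trick spelled out, a toy sketch of single-parity (RAID5-style) reconstruction follows. The block contents and the helper name are made up, and this ignores striping and parity rotation. RAID6's second parity block uses different math and is not shown.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three hypothetical data blocks from one stripe, plus their parity.
data = [b"disk0 data", b"disk1 data", b"disk2 data"]
parity = xor_blocks(data)

# Disk 1 dies: XOR everything that survived to get its block back.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt)
```

Note that the rebuild has to touch every surviving block in the stripe, which is why the whole array gets read end to end.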

To give you an example, a well known company I worked for a few years back was archiving event data onto a cluster of storage machines. The cluster itself was a specialized spatial database optimized for parallel searches, so that TBs of data could be searched in a second or two. IIRC, in 2004 each storage machine within that cluster contained 2TB RAID5 arrays split across 6 disks and a rebuild took most of a day. By 2007 each storage machine contained about 10TB split across 6 disks and a rebuild took most of a week.

Now consider time between failures. Carnegie Mellon did a study of 100000 drives in which the MTBFs claimed by manufacturers were on average 15 times higher than those attained during the study. So that expensive SCSI drive with a claimed 1.2 million hour MTBF really has an MTBF of about 100000 hours.

In a RAID array the mean time until any one disk fails is the MTBF of a single disk divided by the number of disks. So that 100000 hour MTBF divided by 6 drives equates to about 700 days between disk failures. 2 years sounds reasonable enough. But in my former employer's cluster there were 40 machines (and growing by about 4 each year). So the MTBF for a single disk in the entire cluster was about 17 days. On average, every 17 days someone had to babysit an array rebuild that took most of a week.
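
The arithmetic, as a quick sketch. It assumes disks fail independently at a constant rate (which the next paragraph says is not really true); the function name is just for illustration.

```python
HOURS_PER_DAY = 24

def days_between_failures(disk_mtbf_hours, disk_count):
    """Expected days between disk failures across disk_count identical disks."""
    return disk_mtbf_hours / disk_count / HOURS_PER_DAY

per_array = days_between_failures(100_000, 6)          # one 6-disk array
per_cluster = days_between_failures(100_000, 6 * 40)   # 40 machines, 6 disks each

print(f"one array:     {per_array:.0f} days")
print(f"whole cluster: {per_cluster:.1f} days")
```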

It gets worse though. The same Carnegie Mellon study showed a strong link between failure and disk age, i.e. your oldest drives are most likely to fail. That means that the chances of a second drive failing in the array before the rebuild is complete cannot be ignored, and indeed we saw this happen on 2 occasions. Fortunately backups existed and the data was recovered. By the time I left we were using RAID 15 (read performance is maintained during a rebuild of one of the constituent RAID5 arrays), and had an active data migration policy so that older machines were taken out of the cluster before all of their disks started failing.

I hate to think what problems Google must have with this.

Parent is spot on by bussdriver · 2009-09-18 05:11 · Score: 1

I have seen rebuilding problems TWICE where a 2nd drive died or had sector errors during rebuild. I've also avoided that problem but had the drives in the RAID all dying or having errors within the same year as the 1st drive's problem. It's not comfortable to spend 6 months swapping out most of the drives when you can only afford to lose 1 of them.

The solution is preemption. You replace drives every 3 years regardless; see Google's study on drive life. 5 years seems to be the furthest one should go and 3 years is safer. It's also true that a significant number of drives show trouble in the 1st month (under heavy use) and I was about to start a burn-in policy before I left that job. We ordered 30-40 drives at a time.

Buying the disks at the same time is somewhat unavoidable because if you buy lots you get discounts; PO approval etc. I think the risks are low of multiple drive deaths. I would however replace a whole batch immediately if just 1 died from the lot (my new policy). Key to this is tracking the drives by AGE and batch. Oh, I USED to swear by Seagate.

Hardware raid may cost but just try getting back online when the raid controller DIES. Buy two. I prefer software RAID, especially on servers; screw PCI drivers! I never was a fan of parity RAID; use RAID 10. You can lose up to HALF of all disks in RAID 10; sure, a specific half, not a random half... it's all a gamble anyhow.

Ultimately the #1 question is always: "What is my data worth to me?"

Going into the future, I've been WISHING somebody would standardize drive clustering; hopefully with dumb controller boards on the disks and a standardized controller board. Sure, it's a hardware controller (and centralized caching?) but it doesn't have to be RAID; it can be a block level manager without the limitations of RAID. If standardized, then the main problems of resolving controller failure and vendor lock would be gone; plus you would have an upgrade path. Today, this cannot be done because you get tied to a vendor, a series, or even just 1 model of controller board.

The problem will continue to grow to the point where people will want management not dumb RAIDs; then we might see a move towards a convention and maybe standardization.

--
Democracy Now! - uncensored, anti-establishment news

Re:Parent is spot on by Anonymous Coward · 2009-09-18 08:40 · Score: 1, Interesting

I do like mirroring more than this parity stuff.
Let's assume I have e.g. a RAID10 containing 100TB of data. I lose one disk, and then I lose the second one. CrashBoomBang!
Of course I'm not stupid and I have a backup of all my 100TB of data. What traditionally would happen now is, I will need to restore all of my data, as I don't know what I had on these disks. Takes ages!
Now comes the magic of ZFS. Through the combination of ZFS as a volume manager and a file system, ZFS can tell me exactly which files are missing. I will then replace both failed disks. If using 1TB disks, I don't have to restore 100TB, but only up to 1TB. This can cut the recovery time by up to a factor of 100.

Raid N+1 by davidwr · 2009-09-18 05:15 · Score: 1

With disk space getting cheaper and cheaper, "N+1" plus transaction logging may be the way to go.

If a drive fails, mark that half of the mirror offline, then when the drive is replaced, replay the log on that half.

Granted, this won't work if your throughput is so fast the log can never be replayed, but in most cases you'll have the drive replaced within 24 hours, if not within one.
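
A toy sketch of the idea, nothing more. The class and method names are invented for illustration, blocks live in a dict instead of on disks, and it assumes the offline half still has its older contents once its drive is swapped (e.g. rebuilt from that half's own parity), so only the writes it missed need replaying.

```python
class MirrorHalf:
    """One side of the mirror; blocks maps block number -> data."""
    def __init__(self):
        self.blocks = {}
        self.online = True

class LoggedMirror:
    """Two mirror halves plus a log of writes an offline half has missed."""
    def __init__(self):
        self.halves = [MirrorHalf(), MirrorHalf()]
        self.log = []

    def write(self, block_no, data):
        for half in self.halves:
            if half.online:
                half.blocks[block_no] = data
        # Only log while a half is offline, so the log stays small.
        if not all(half.online for half in self.halves):
            self.log.append((block_no, data))

    def fail(self, index):
        self.halves[index].online = False

    def replace_and_replay(self, index):
        half = self.halves[index]
        for block_no, data in self.log:   # replay missed writes in order
            half.blocks[block_no] = data
        half.online = True
        self.log.clear()

mirror = LoggedMirror()
mirror.write(0, b"old")
mirror.fail(1)
mirror.write(0, b"new")
mirror.write(1, b"more")
mirror.replace_and_replay(1)
assert mirror.halves[0].blocks == mirror.halves[1].blocks
```

The catch mentioned above shows up here too: if writes arrive faster than the replay loop can apply them, the log never drains.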

You'll still have the rare case where both mirrors have a disk failure at the same time, in which case you'll degrade to RAID 5 or whatever RAID N you are using as a baseline.

In a large enterprise, it may pay to segregate your data between "gotta have it as fast as local storage," "gotta have it within fractions of a second," "gotta have it within seconds," "gotta have it within minutes," "gotta have it within an hour," "gotta have it within a day," and "gotta have it within a week" and buy your drives accordingly. "Gotta have it within minutes" data can be stored on drives that are usually powered off. "Gotta have it within an hour" can be on drives that are not even in the computer room and are fetched by a low-level clerk when needed. "Gotta have it within a day" data would be archived data in an offsite data center that can be retrieved fairly quickly. Everything else, mostly old data kept for legal reasons that probably nobody will ever read, would be in the "gotta have it within a week" bin.

--
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.

Re: 30k RPM drives by rnturn · 2009-09-18 06:42 · Score: 1

``15k RPM drives have been around for quite a while now, at least for 10 years, and there has not been an improvement in speed in that time. Where are my 30k RPM drives?''

Probably doubling as incandescent lights in the development lab. OK, that may be a bit of an exaggeration but have you ever touched a 15K RPM drive that's been running for a while? They get damned hot. I mean "burning your finger" hot. You don't use those in enclosures that aren't designed to provide a lot of air flow to prevent the drives from cooking themselves to death. (And more air flow inevitably means louder enclosures so most users will balk at deploying them in a non-data center environment.) Now imagine the heat dissipation and the cooling needed for a 30K RPM drive.

--
CUR ALLOC 20195.....5804M

Re:Worked-around a Long Time Ago by Anonymous Coward · 2009-09-18 06:44 · Score: 0

So are we ready to move all our personal information to clouds? I certainly am not, but Google Docs are wildly popular and a lot of people are. I long ago learned that I can't look to myself to judge what the mainstream attitudes are in many things.

Mainstream... fads?

Ridiculous by mindstrm · 2009-09-18 07:01 · Score: 1

That's pretty silly...

Yes - running a home raid-5 array with 4 drives (3 for the array, 1 spare) - its days are limited. The risk profile goes up because data density has grown faster than rebuild times. Fair enough.

That's why raid methods continue to evolve to keep up with the times, and enterprise solutions continue to become more sophisticated.
But we'll still be using "Redundant arrays of independent disks" for quite a while I imagine.

I think a lot of people missed the point.... by pjr.cc · 2009-09-18 07:37 · Score: 1

Raid may be coming to an end, but it's nothing to do with the speeds of drives. It's all about size. It's all about the fact that you are GUARANTEED to have a sector fail on read after x number of reads. Now because x hasn't changed much over the decades while drives have gotten vastly larger, we're approaching x, but we've still got a while to go yet, and in the enterprise (where raid actually matters) it's not something that'll really be a problem for at least another 5-10 years. This is not a problem that raid 6 or ZFS solve.

But the problem is this: if a drive fails in a raid 5 array and the array runs off to rebuild it, then when you get near x you'll fail during the rebuild, and that will keep occurring until the array gives up. Raid 6 still has the same issue, it'll still read all the drives to rebuild the broken disk, same with ZFS. Unfortunately, there has been a lot of industry spin suggesting raid 6 and ZFS solve this issue, which is patently false.
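
To put rough numbers on "getting near x", here is a back-of-envelope sketch. It assumes read errors are independent and uses one unrecoverable error per 10^14 bits read as a stand-in for x; that figure and the disk sizes are assumptions for illustration, not measurements.

```python
import math

def p_read_error(bytes_read, bits_per_error=1e14):
    """Chance of at least one unrecoverable read error while reading bytes_read bytes."""
    bits = bytes_read * 8
    # 1 - (1 - 1/x)^bits, written so tiny per-bit probabilities don't round away
    return -math.expm1(bits * math.log1p(-1.0 / bits_per_error))

TB = 10**12
# Rebuilding one disk of a hypothetical 6-disk raid 5 means reading the 5 survivors in full.
for disk_tb in (0.3, 1, 2):
    p = p_read_error(5 * disk_tb * TB)
    print(f"{disk_tb} TB disks: {p:.1%} chance of a read error during rebuild")
```

Same x, bigger disks, and the chance of tripping over a bad sector mid-rebuild climbs quickly.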

Raid 6 was designed because of how arrays were being deployed. When large scale disk boxes were being deployed (SAN etc.) we got to a point where, if you had enough disks, you might as well have an engineer on site, because the failure rate of drives was such that once you got to a certain number of drives (we're talking hundreds of thousands here, not that hard in a large data center packed with SAN arrays) you could guarantee at least one drive would fail EVERY SINGLE DAY. So, with raid sets getting larger (i.e. 10-15 drives in a single raid 5 array, and they were getting much bigger than that) it became important to save yourself from a dual-drive failure (because the likelihood of it happening went up exponentially as the set got larger). Having two parity drives solved that issue - hence raid 6.

ZFS was designed for performance. Raid 5 (and 6) SUCKS for performance when you're not using some form of off-load device (hardware raid card or SAN array head), and so ZFS was designed to (in part) give people the ability to do that kind of thing on the server cpu without killing the performance of having 6-10 striped drives. The thing is, if you just throw some data at a drive, computers are very good at it. They can just say "write this chunk of data here please" and the rest just happens. With raid (if it's done in software), what you have to do is calculate the parity of every single byte of data on the array - this hurts a lot and destroys the whole transfer mechanism.

The point is though, we're approaching x and the only real solution is to fix x. Though, with SSDs coming along quite nicely, the problem will likely resolve itself and we'll be using raid 6 SSDs. SSDs are quite capable of solving this little drama because x can be quite variable, unlike their "spinning-rusty-metal" counterparts. They can also be scaled a lot more easily for physical size (something we haven't seen yet). It's very hard to produce a tonne of different sizes when it comes to disks - i.e. we have three formats, 1.8", 2.5" and 3.5", and a lot of that has to do with producing some "spinning-rusty-metal" in an efficient way at different sizes, i.e. that's just not practical for motorized components. SSDs on the other hand are a completely different boat. Because they are just chips on a little board they can scale very well in terms of physical size. Right now it hasn't happened; they are doomed to follow the pre-set sizes of their cousins because of the fact we're forced to use both.

This won't always be the case; eventually SATA, FC, InfiniBand (an already mostly dead tech anyway) and SAS will be a thing of the past. (They were designed for spinning rusty metal, and with people like Seagate threatening to sue SSD makers for using SATA and the like, the SSD boys will probably make their own interface.) The traditional interfaces for HDs are not fantastic (as it turns out) for SSDs; you could in fact make a much better interface if you're talking directly to SSDs. This also hasn't happened, but one day it will, it'll hit the consumer market and that will be the death of "spinning-rus

stupid article! by Anonymous Coward · 2009-09-18 08:16 · Score: 0

Your story makes claims about RAID, then you narrow it down to RAID 3, 5, or 6, but then you talk about it again as if referring to all RAID types. Get a clue! Nothing wrong with RAID 10 or RAID 1 when it comes to reliability and rebuild times! On top of all this you claim RAID is old technology and state how it will be replaced, but there is no replacement, so how can this be? Stupid author!

Should it really be Hard? by hyperion2010 · 2009-09-18 11:02 · Score: 1

Come on guys, just store everything in RAM.

If the point of RAID is access times then clearly RAM beats the hell out of all the competitors here.
If what you need is uptime then the system is always on, thus RAM.
If you're worried about running out of space in your server rooms, RAM is a lot smaller than HDDs.
If you want redundancy just set up mirrored tmpfses.
If you're worried about failure, RAM's failure rate is WAY lower than an HDD's and it has ECC built in, plus, hotplugging RAM is cake.

So, WHY, I ask, are we still using hard drives? What could possibly go wrong?

Author's data doesn't match his hypothesis by OrangeCatholic · 2009-09-18 23:55 · Score: 1

There we go. Somebody is talking about the article.

I'm not a storage expert, but something about this article seems half-baked. For example, looking at the rebuild table, from 2002 to 2005, rebuild time stayed roughly the same, while drive size went from 146 GB to 300 GB and bandwidth went from 89 to 119 MB/sec. Is there a problem with that?

Then, from 2005 to 2009, rebuild time dropped, even though disk size went from 300 GB to 450 GB, because the number of drives on the channel doubled from three to six. Again, is there a problem?

Now looking at his second table, I guess the author's point is that since rebuild times have increased from 5.28 hours in 1994, to 6.6 hours in 2009, and drives have gotten 100x more reliable (10e14 vs 10e16), the ponderously slow, 2-order of magnitude growth in drive reliability won't keep up with the dramatic 25% increase in rebuild times.

Is this the point he's trying to make?

ZFS by bussdriver · 2009-09-19 10:27 · Score: 1

ZFS seems to solve many issues with a higher-end block manager approach in the filesystem itself; it is like a smart software raid.

However, ZFS is not widespread and, outside of Sun, not something I can trust for serious use at this time. It's also a server-bound solution; some higher-end hardware would be nice that had ZFS features like the high-end solutions already out there but didn't cost a fortune with vendor lock.

I have (for fun) created Raid 10 with triple redundancy but I never used it in production (due to cost). Again, with a RAID 10 you are using at least 4 disks. You COULD lose 2 disks and be ok as long as it is not the wrong pair of disks. Your odds go up the larger the stripe gets. Yes, the whole thing is gone if you happen to lose the wrong 2 drives. The other RAIDs can't lose 2 disks, except raid 6, which isn't even available in many situations.
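
A quick sketch of the "wrong pair" odds, assuming plain 2-way mirrors and that the second failure hits a uniformly random surviving disk (real failures are correlated by age and batch, so treat this as optimistic). The function name is made up for illustration.

```python
from fractions import Fraction

def p_fatal_second_failure(mirror_pairs):
    """Chance that a second random disk loss hits the partner of the first failed disk."""
    total_disks = 2 * mirror_pairs
    # Of the disks still running, exactly one is the dead disk's mirror partner.
    return Fraction(1, total_disks - 1)

for pairs in (2, 4, 8):
    p = p_fatal_second_failure(pairs)
    print(f"{2 * pairs} disks: {p} chance the second loss kills the array")
```

With the minimum 4 disks the second loss is fatal one time in three; the wider the stripe, the smaller that fraction gets.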

--
Democracy Now! - uncensored, anti-establishment news

444 comments