Everything You Know About Disks Is Wrong

MTBF by seanadams.com · 2007-02-20 13:36 · Score: 5, Interesting

MT[TB]F has become a completely BS metric because it is so poorly understood. It only works if your failure rate is linear with respect to time. Even if you test for a stupendously huge period of time, it is still misleading because of the bathtub curve effect. You might get an MTBF of say, two years, when the reality is that the distribution has a big spike at one month, and the rest of the failures forming a wide bell curve centered at say, five years.

Suppose a tire manufacturer drove their tires around the block, and then observed that not one of the four tires had gone bald. Could they then claim an enormous MTBF? Of course not, but that is no less absurd than the testing being reported by hard drive manufacturers.

Re:MTBF by Wilson_6500 · 2007-02-20 13:45 · Score: 5, Informative

Um, but doesn't the summary of the paper say that there is no infant mortality effect, and that failure rates increase with time, and thus the bathtub curve doesn't actually apply?
Re:MTBF by gvc · 2007-02-20 14:04 · Score: 3, Interesting

MT[TB]F has become a completely BS metric because it is so poorly understood. It only works if your failure rate is linear with respect to time. Even if you test for a stupendously huge period of time, it is still misleading because of the bathtub curve effect. You might get an MTBF of say, two years, when the reality is that the distribution has a big spike at one month, and the rest of the failures forming a wide bell curve centered at say, five years.
The simplest model for survival analysis is that the failure rate is constant. That yields an exponential distribution, which I would not characterize as a bell curve. The Weibull distribution more aptly models things (like people and disks) that eventually wear out; i.e. the failure rate increases with time (but not linearly).
With the right model, it is possible to extrapolate life expectancy from a short trial. It is just that the manufacturers have no incentive to tell the truth, so they don't. Vendors never tell the truth unless some standardized measurement is imposed on them.
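
As a rough illustration of the two models named above, the sketch below compares the constant failure rate of the exponential model with the rising failure rate of a Weibull model whose shape parameter is greater than 1. The shape and scale values are made-up assumptions for illustration, not figures from the paper or from any vendor.

```python
import math

def weibull_hazard(t, shape, scale):
    """Instantaneous failure rate h(t) = (k / lam) * (t / lam) ** (k - 1)."""
    return (shape / scale) * (t / scale) ** (shape - 1)

def weibull_mean(shape, scale):
    """Mean time to failure of a Weibull distribution: lam * Gamma(1 + 1/k)."""
    return scale * math.gamma(1 + 1 / shape)

# Hypothetical parameters, in years; chosen only for illustration.
SCALE = 5.0
for shape in (1.0, 2.0):  # shape 1.0 is the exponential special case
    rates = [round(weibull_hazard(t, shape, SCALE), 3) for t in (1, 3, 5)]
    print(f"shape={shape}: hazard at 1/3/5 years = {rates}, "
          f"mean life = {weibull_mean(shape, SCALE):.2f} years")
```

With shape 1.0 the hazard is the same at every age; with shape 2.0 it grows with age, which is the "wears out eventually" behaviour described above.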
Re:MTBF by kidgenius · 2007-02-20 15:16 · Score: 2, Informative

Well, I guess you don't really understand reliability then. You also don't understand MTBF/MTTF (hint: they aren't the same). What they have said is a big "no duh" to anyone in the field. MTTF will work regardless of whether or not your failure rate is linear with time. Also, there are other distributions of failure beyond just exponential, such as the Weibull. Exponential is a subset of the Weibull. Using this distribution you can accurately calculate an MTTF. Now, the MTBF will not match the MTTF initially, but given enough time, it will eventually match the MTTF. All of this information is very useful to anyone that actually knows what to do with those numbers.
Re:MTBF by kidgenius · 2007-02-20 15:24 · Score: 2, Insightful

I'm also going to add to my statement and mention that the authors of the article do not understand MTTF. They have calculated MTBF, not MTTF. They are not the same. In fact, they have assumed that the drives fail in a random way by doing a simple hours/failures. They really need to look at failures and suspensions and perform a Weibull analysis to see how close their stuff is to the manufacturers' stated values.
Re:MTBF by kidgenius · 2007-02-20 15:35 · Score: 3, Informative

No, they don't. Hard drive manufacturers state an MTTF, which is very different from MTBF. The two can be similar, but they are not interchangeable. The author of this paper has calculated MTBF, and tried to compare it to MTTF, which is WRONG. They really should've consulted a reliability engineer. Any competent one worth their salt would see the difference. One of them varies with time, the other is static and unchanging based on age.
Re:MTBF by 6th+time+lucky · 2007-02-20 15:56 · Score: 3, Insightful

MT[TB]F has become a completely BS metric because it is so poorly understood.
Don't forget the M in MTBF. It's mean (statistically speaking...). That means (!) that some might fail now, some later, but on average they last a while. Manipulate that information and you might get 1,000,000 hrs MTBF, but you have to account for and not forget about the worst case scenario (that's what a failure is), which might be that the next drive is going to fail *now*, which is why RAID5 isn't as good as it might seem looking at the average statistics.

Backup, backup, backup has always been my motto (and that's just personal data). Interesting that Google thinks this is the way to go also (i.e. 3 copies of all data).
Re:MTBF by angio · 2007-02-20 17:56 · Score: 3, Informative

Your statement doesn't make a lot of sense. a) Hard drives are a non-repairable system, for all intents and purposes. Therefore, there *is* no repair. MTTF is the only useful metric. b) MTBF = MTTF + the time to repair. Assuming that's zero, then for any useful failure engineering, hard drive MTBF = hard drive MTTF. That's about all you've got if you're expressing the statistic as a single number. The reason that MTBF is a function of time is to cope with the assumption that the system is less reliable after a repair, which doesn't apply in this case.

Now, you can have all sorts of distributions that you draw that mean from, but a mean is a mean.
Re:MTBF by vtcodger · 2007-02-20 20:01 · Score: 5, Insightful

***Um, but doesn't the summary of the paper say that there is no infant mortality effect,***
It does. But it also says -- repeatedly -- that the data is disk replacement data, NOT disk failure data. i.e. it's data on the number of problems that the user tech thought might be fixed by replacing the disk, not by the number of disks that actually failed. One might wonder if, for example, the response to a system failing while it was being set up or in early lifetime might not be to put the whole damn thing into a box and ship it back to the vendor rather than dink around trying to figure out what is wrong. That won't be recorded as a disk failure.
The study is fine -- really it is. But, table 3 ought to give pause. It's quite clear that different data sets show quite different diagnostic patterns. We've got one set of data that says that power supplies, for example, are hardly ever replaced and a second set that says that they are the most frequently replaced item. There MAY be good reasons for this. But it could also be an indication that the technicians are incompetent, that the record keeping is erratic, or (and I'd seriously consider this one) that only certain kinds of failures are being recorded.
Finally, I think someone really ought to mention that there is no way that a disk manufacturer is actually going to measure MTBFs of 100000 hours prior to printing up the data sheets. The problem is that there are only around 750 hours in a month. And you need a reasonable number of failures (many quality guys would say at least 4) in order to get a reasonably valid MTBF. In order to actually measure a six digit MTBF, the manufacturer would have to run maybe 500 units for a month. My guess is that isn't going to happen. If they have the production line producing 500 units, they are going to ship them. Manufacturer MTBF data are surely based on data from a handful of engineering and preproduction units plus a bunch of wild guesses.
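
The test-hours arithmetic in the paragraph above can be sketched as follows. This is only a back-of-the-envelope point estimate (total unit-hours divided by failures), using the comment's own round figures of 750 hours per month and a minimum of 4 failures; the function names are made up for illustration.

```python
import math

HOURS_PER_MONTH = 750   # the round figure used in the comment
MIN_FAILURES = 4        # "many quality guys would say at least 4"

def units_needed(target_mtbf_hours, test_hours_per_unit, failures=MIN_FAILURES):
    """Units to run so that `failures` failures are expected at the target MTBF."""
    total_unit_hours = target_mtbf_hours * failures
    return math.ceil(total_unit_hours / test_hours_per_unit)

def point_estimate_mtbf(units, hours_per_unit, failures):
    """Naive MTBF estimate: accumulated unit-hours divided by observed failures."""
    return units * hours_per_unit / failures

print(units_needed(100_000, HOURS_PER_MONTH))
print(point_estimate_mtbf(500, HOURS_PER_MONTH, MIN_FAILURES))
```

With these numbers, 500 units run for one month with 4 failures gives a point estimate of 93,750 hours, just under six digits, which is consistent with the "maybe 500 units" guess.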
My guess, and it is just a guess, is that manufacturer MTBFs for disks are probably pretty much the MTBF goal in the drive specifications established before the design actually started.
Incidentally, based on some experience with other sorts of high tech gadgetry, if the engineering/preproduction units do fail during test, a failure analysis will be done, and steps will be taken to fix the problem. Problem's fixed. OK, we shouldn't count those failures since they won't happen any more. That's called "censoring failure data". Begin to get an idea why disk MTBFs might be pretty much pure fiction?

--
You can't see ANYTHING from a car, You've got to get out of the goddamned contraption and walk...Edward Abbey
Re:MTBF by kidgenius · 2007-02-21 06:10 · Score: 2, Insightful

And for my final trick, let me give you an example.
Let's say you have five units with an MTTF of 5000 hours, and we put a new one into service every 500 hours.
It'll look something like this:
0-5000
500-5500
1000-6000
1500-6500
2000-7000
Now, each drive failed after five thousand hours. This is the mean time to failure. In other words, each drive had, on average, 5000 hours on it when it failed.
Next, let's calculate MTBF. There were 5 failures, with a total of 7000 hours of operation. This would result in a cumulative MTBF of 7000/5 = 1400 for the system. If you really look at it even closer you can see that you had an MTBF of infinity for the first 5000 hours, then an MTBF of only 500 hours for the last 2000 hours. Notice how MTBF has changed over time but MTTF has remained the same? Notice the huge difference between MTBF and MTTF now? Notice how I didn't take repair into account at all?
So repeat after me....MTBF is NOT the same as MTTF. The paper is incorrect in this regard.
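
The arithmetic in the example above can be reproduced with a short sketch. It uses the comment's own definitions (per-drive lifetime for MTTF, elapsed system time over failures for cumulative MTBF) and adds, for comparison, total drive operating hours over failures. Whether any of these is the "right" figure is exactly what this thread disputes; the code only reproduces the numbers.

```python
# Each tuple is (start_hour, failure_hour) for one drive, from the table above.
drives = [(0, 5000), (500, 5500), (1000, 6000), (1500, 6500), (2000, 7000)]

def mean_time_to_failure(drives):
    """Average hours each drive had on it when it failed."""
    lifetimes = [end - start for start, end in drives]
    return sum(lifetimes) / len(lifetimes)

def cumulative_mtbf(drives, now):
    """Elapsed system time divided by failures seen so far."""
    failures = sum(1 for _, end in drives if end <= now)
    return float("inf") if failures == 0 else now / failures

def operating_hours_per_failure(drives):
    """Total hours accumulated across all drives divided by failures."""
    return sum(end - start for start, end in drives) / len(drives)

print(mean_time_to_failure(drives))
print(cumulative_mtbf(drives, 4000), cumulative_mtbf(drives, 7000))
print(operating_hours_per_failure(drives))
```

The per-drive mean is 5000 hours and the elapsed-time figure at hour 7000 is 1400 hours, matching the comment. Dividing the 25,000 total drive operating hours by the 5 failures gives 5000 again, so the choice of denominator matters a great deal.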

moving parts by DogDude · 2007-02-20 13:41 · Score: 5, Funny

Every single mechanism with moving parts will fail. It's just a matter of when. In a few years, when everybody is using solid state drives, people will look back and shake their heads, wondering why we were using spinning magnetic platters to hold all of our critical data for such a long time.

--
I don't respond to AC's.

Re:moving parts by Nimloth · 2007-02-20 13:57 · Score: 2, Interesting

I thought flash memory had a lower read/write cycle expectancy before crapping out?
Re:moving parts by theReal-Hp_Sauce · 2007-02-20 14:05 · Score: 5, Funny

Forget Solid State Drives, soon we'll have Isolinear Chips. It won't matter if they fail or not because as long as the story line supports it Geordi can re-route the power through some other subsystem, Data can move the chips around really quickly, Picard can "make it so", and after it's all over with Wesley can wear a horrible sweater and deliver a really cheesy line.

-C
Re:moving parts by NMerriam · 2007-02-20 14:19 · Score: 4, Informative

I thought flash memory had a lower read/write cycle expectancy before crapping out?

They do have a limited read/write lifetime for each sector, BUT the controllers automatically distribute data over the least-used sectors (since there's no performance penalty to non-linear storage), and you wind up getting the maximum possible lifetime from well-built solid-state drives (assuming no other failures).

So in practice, the lifetime of modern solid state will be better than spinning disks as long as you aren't reading and writing every sector of the disk on a daily basis.

--
Recursive: Adj. See Recursive.
Re:moving parts by wik · 2007-02-20 14:45 · Score: 4, Informative

Not true. Transistors at really small dimensions (e.g., 32nm and 22nm processes) will experience soft breakdown during (what used to be) normal operational lifetimes. This will be a big problem in microprocessors because of gate oxide breakdown, NBTI, electromigration, and other processes. Even "solid-state" parts have to tolerate current, electric fields, and high thermal conditions and gradually break down, just like mechanical parts. Don't go believing that your storage will be much safer, either.

--
/ \
\ / ASCII ribbon campaign for peace
x
/ \
Re:moving parts by scoot80 · 2007-02-20 14:56 · Score: 2, Informative

Flash memory will have about 100,000 write cycles before you will burn it out. As parent mentioned, a controller would write that data to several different locations, at different times, thus increasing the lifetime. What this would mean though is that your flash disk will be considerably bigger than what it can actually hold.
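
To put the 100,000-cycle figure in perspective, here is a back-of-the-envelope sketch of what wear leveling buys. It assumes ideal leveling (every block wears evenly) and uses a made-up capacity and daily write volume; real controllers, write amplification and spare area will change the result.

```python
def wear_leveled_lifetime_years(capacity_gb, erase_cycles, gb_written_per_day,
                                write_amplification=1.0):
    """Years until every block hits its cycle limit, assuming ideal wear leveling."""
    total_writable_gb = capacity_gb * erase_cycles / write_amplification
    return total_writable_gb / gb_written_per_day / 365

def unleveled_lifetime_days(erase_cycles, rewrites_per_day):
    """Days until one hot block wears out if it is always rewritten in place."""
    return erase_cycles / rewrites_per_day

# Hypothetical drive: 32 GB, 100,000 cycles per block, 50 GB written per day.
print(wear_leveled_lifetime_years(32, 100_000, 50))
# The same cycle limit with one block rewritten 1,000 times a day, no leveling.
print(unleveled_lifetime_days(100_000, 1_000))
```

The gap between the two figures is why the controller's spreading of writes matters more than the raw cycle count.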
Re:moving parts by blackest_k · 2007-02-20 15:37 · Score: 3, Interesting

Still doesn't mean it will last, got a 1 gig usb flash drive here dead in less than 8 weeks and very few reads and writes. It will not identify itself. It might have 99,900 write cycles left but it's still trashed.
Let's face it, there is no reliable storage media, the only way to be safe is multiple copies.

--
Blarney Quality Restaurant, Plants
Re:moving parts by am+2k · 2007-02-20 21:32 · Score: 2, Insightful

The point you didn't get was that even solid state disks can fail without warning, so you need a backup anyways.

You only need a single counterexample to disprove a theory.
Re:moving parts by koyangi · 2007-02-21 01:11 · Score: 2

Given some failure data, you can calculate an MTBF for almost anything. The military has been compiling reliability data for various electronic components for many decades. Yes they have.

I work in the defense industry and in general hardware works just long enough to be installed in the vehicle. All you need to know is when the first system test will be done in front of the customer and you can easily predict when failure of every critical component will occur.

Some people insist upon using math and MIL-HDBK-217, but I say give me a program schedule and I can tell you exactly when you will hit a 50% failure rate!
Re:moving parts by Boglin · 2007-02-21 05:32 · Score: 3, Funny

I think you need to reread the article. It clearly states that consumer solutions are just as good as Enterprise ones.
Re:moving parts by Maximum+Prophet · 2007-02-21 07:43 · Score: 2, Informative

If you look at the numbers for the failure of the system RAM and assume that most machines have much, much more disk space than RAM, SSDs don't make sense. They are faster, but you won't get better MTBFs. On the HPC1 and COM1 groups of machines, the memory was replaced almost as often as the hard drives. If you had to replace all that HD space with RAM, your failure rate would go through the roof.

--
All ideas^H^H^H^H^Hprocesses in this post are Patent Pending. (as well as the process of patenting all postings)

i'll tell you by User+956 · 2007-02-20 13:43 · Score: 2, Interesting

Bianca Schroeder, of CMU's Parallel Data Lab, submitted Disk failures in the real world: What does an MTTF of 1,000,000 hours mean to you?

It means I should be storing my important, important data on a service like S3.

--
The theory of relativity doesn't work right in Arkansas.

Re:i'll tell you by karnal · 2007-02-20 14:03 · Score: 2

"redundancy, redundancy, redundancy."

So that Department of Redundancy Department really does something after all!

--
Karnal
Re:i'll tell you by DarkVader · 2007-02-20 15:12 · Score: 2, Funny

And could there be anything funnier that could happen to that comment than it being moderated "Redundant"?

"Everything You Know About Disks Is Wrong" by cookieinc · 2007-02-20 13:49 · Score: 3, Funny

Everything You Know About Disks Is Wrong

Finally, a paper which dispels the common myth that disks are made of boiled candy.

Re:"Everything You Know About Disks Is Wrong" by egr · 2007-02-20 14:44 · Score: 4, Funny

I've read the article, then the title, damn!

Amazing! by Dr.+Eggman · 2007-02-20 13:52 · Score: 2, Insightful

You mean to tell me these people have found hard drives that don't fail beyond repair by the end of the first year? I've never encountered a HD that has done this, much to the despair of my wallet. Now, I am serious, what is wrong with the hard drives I choose that kills them so quickly? Is Western Digital no longer a good manufacturer? Should I maybe not run a virus check nightly and a disk defrag weekly? Is 6.5GB of virtual memory too much to ask? Of course not, the manufacturers are just making crappier HDs. This article has told me one thing: it's time to get a RAID setup. I've been looking at RAID 5, but two things still trouble me, the price and the performance hit. Does anyone have any information on just how much of a performance hit I might experience if I have to access the HD a lot?

--
Demented But Determined.

Re:Amazing! by BagOBones · 2007-02-20 15:41 · Score: 2, Insightful

Those Deathstars, as I like to call them, were really, really bad. If you build your servers with a strong support contract from your vendor you can get really fast drive replacement times. We run completely on Dell servers with GOLD level support. I had a drive fail in my primary file server; I had a replacement drive on my reception desk in 4 hours from putting my phone down to report the problem. The controller supported background rebuilding so the users didn't even feel the loss.

If you build your own servers, you need to have more spares on hand than 1.

--
EA David Gardner -"... but the consumers have proven that actually what they want is fun."
Re:Amazing! by Kadin2048 · 2007-02-20 17:58 · Score: 2, Interesting

Somewhere around I have an Apple 20MB hard drive that is getting on 15 years old. Sure, it hasn't seen a lot of usage recently, but I still fire it up every once in a while. (It makes the greatest turbine-like startup sound; seriously, it's like a 747.) Connects to the floppy disk controller. Has its own power supply.

I'm sure there are people around with even older, still-working-fine gear. A while back, I saw some DEC disk packs for the early removable-platter hard drives selling on eBay, as pulls-from-working equipment. I'm not sure what exactly was going through the minds of the designers when they were building stuff, a decade or two ago, but they just seemed to not be planning for obsolescence in the same way that the people churning out today's disposable gear are. (Although the sample is clearly biased: looking at the 20-year-old gear from 1986 that's still around today might make you think that everything then was bulletproof, but in reality all the crappy stuff is already 30 feet down in some landfill somewhere.)

I suspect in 20 years, people will look back at 2006 gear as the height of reliability, just because it'll only be the really exceptionally well-built pieces of gear that will still be around. The Deathstars and other crap drives that failed will long be forgotten.

--
"Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."
Re:Amazing! by petermgreen · 2007-02-20 21:30 · Score: 2, Informative

If anything, RAID should make your hard disk access a lot faster. That is, unless you go for software RAID, which will put a hit on your processor.
afaict Linux software raid is actually pretty good nowadays at least as long as you stick to the basic raid levels

beware of the very common fake hardware controllers (i.e. really software, but with some bios and driver magic to make the array bootable and generally behave like hardware raid from the user's point of view). These often have far worse performance than linux software raid and many of them only support windows.

--
note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
Re:Amazing! by drsmithy · 2007-02-20 22:37 · Score: 2, Informative

That is, unless you go for software RAID, which will put a hit on your processor.
This myth needs to die. No remotely modern processor takes a meaningful performance hit from the processing overhead of RAID.
However, I think if you're going to make the investment to go with RAID 5, then buying a proper hardware controller won't add a significant amount to the cost of your set up.
Decent RAID5-capable controllers are hundreds of dollars. Software RAID is free and - in most cases - faster, more flexible and more reliable.
Re:Amazing! by drsmithy · 2007-02-20 22:41 · Score: 2, Informative

Uh sorta. Depends on the raid type. Striped will be faster, mirrored will be about as fast, raid 5 is gonna be the slowest, even in hardware.
Compared to a single disk, RAID5 is still going to be faster (except perhaps for the odd corner-case here and there).
Also, in many cases, software RAID5 is faster than hardware RAID5.
Re:Amazing! by 10Ghz · 2007-02-20 23:30 · Score: 2, Interesting

"If anything, RAID should make your hard disk access a lot faster. That is, unless you go for software RAID, which will put a hit on your processor."

Since we are talking about IO-bound operations, does that matter? I mean, CPU is hardly ever the bottleneck these days, the hard-drive quite often is. So even if soft-RAID puts more load on the CPU, does it cause any slowdown? Especially if it makes IO faster?

--
Lesbian Nazi Hookers Abducted by UFOs and Forced Into Weight Loss Programs - -all next week on Town Talk.

Re:Dr. Schroeder is pretty hot, too! by Anonymous Coward · 2007-02-20 13:53 · Score: 5, Funny

Except she requires a MTBF of more than 3 seconds. Sorry dude.

Re:Dr. Schroeder is pretty hot, too! by gardyloo · 2007-02-20 13:56 · Score: 2, Funny

Except she requires a MTBF of more than 3 seconds. Sorry dude.

You call that failure?!? I'd call it success.

infant mortality by Anonymous Coward · 2007-02-20 13:57 · Score: 5, Insightful

I suspect that the 'infant mortality' syndrome really has to do with the drives being abused before they are installed in the machines (getting dropped during shipping for example)

the large shops like these studies are looking at get the drives in bulk directly from the manufacturer, the rest of us who have to go through several middle-men before we get our drives have more of a chance that something happened to them before we received them.

David Lang

Re:infant mortality by mabhatter654 · 2007-02-20 16:55 · Score: 2, Insightful

I think the myth of infant mortality is that if the drive works in the first week/month it will work perfectly until the warranty/magic dust wears off and you don't have to worry about reliability until then. What they saw in the real world was that some drives had consistently reduced performance and lifespan right from the start. You can't operate on the assumption that I replaced 5 drives so I'm good for 3 years and not keep spares or backups ready... the Google report takes this another step because they were interested in what the drives were reporting for health.. is the drive's internal software giving good reliability numbers... In Google's case the drives weren't and work needs to be done.
The whole idea is to get away from the main myth that drives never fail unless they're junk... to the idea that drives DO and WILL fail because they are mechanical parts. Engineers aren't interested in blaming the manufacturer for imperfect parts, but in doing THEIR job of keeping the data and the network going.

Re:MTBF? RTFA. by Vellmont · 2007-02-20 13:58 · Score: 4, Informative

You might get an MTBF of say, two years, when the reality is that the distribution has a big spike at one month, and the rest of the failures forming a wide bell curve centered at say, five years.

Well, the article actually says that drives don't have a spike of failures at the beginning. It also says failure rates increase with time. So you're right that MTBF shouldn't be taken for a single drive, since the failure rate at 5 years is going to be much higher than at one.

The other thing that the article claims is that the stated MTBF is simply just wrong. It mentioned a stated MTBF of 1,000,000 hours, and an observed MTBF of 300,000 hours. That's pretty bad. It's also quite interesting that the "enterprise" level drives aren't any better than the consumer level drives.
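
To make the two MTBF figures quoted above easier to compare, the sketch below converts an MTBF in hours into an annualized failure rate. It assumes a constant failure rate (the exponential model), which the paper itself questions, so treat the output as a rough translation of the numbers rather than a prediction.

```python
import math

HOURS_PER_YEAR = 8760

def annualized_failure_rate(mtbf_hours):
    """Chance that a drive fails within one year, assuming a constant failure rate."""
    return 1 - math.exp(-HOURS_PER_YEAR / mtbf_hours)

for label, mtbf in (("stated", 1_000_000), ("observed", 300_000)):
    print(f"{label}: MTBF {mtbf} h -> AFR {annualized_failure_rate(mtbf):.2%}")
```

Under that assumption, the stated figure corresponds to a little under 1% of drives failing per year and the observed figure to a little under 3%.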

--
AccountKiller

Desktop vs Server usage. by DigiShaman · 2007-02-20 13:58 · Score: 3, Insightful

Key observations from Dr. Schroeder's research:
High-end "enterprise" drives versus "consumer" drives?

"Interestingly, we observe little difference in replacement rates between SCSI, FC and SATA drives, potentially an indication that disk-independent factors, such as operating conditions, affect replacement rates more than component specific factors."

Maybe consumer stuff gets kicked around more. Who knows?

Or maybe powering up the drives off and on is more stressful to the components; say in a desktop environment. With servers racked up, the drives are always spinning with near constant thermal conditions.

--
Life is not for the lazy.

Re:Desktop vs Server usage. by Lumpy · 2007-02-20 14:38 · Score: 4, Interesting

Or she forgot to put in the part that Enterprise drives are replaced on a schedule BEFORE they fail. At Comcast I used to have 30 some servers with 25-50 drives each scattered about the state. Every hard drive was replaced every 3 years to avoid failures. These servers (TV ad insertion servers) made us between $4500-13,000 a minute they were in operation in spurts of 15 minutes down, 3-5 minutes inserting ads. Downtime was not acceptable so we replaced them on a regular basis.

Most enterprise level operations that rely on their data replace drives before they fail. In fact the replacement rate was increased to every 2 years not for failure prevention but for capacity increases.

--
Do not look at laser with remaining good eye.
Re:Desktop vs Server usage. by markov_chain · 2007-02-20 15:02 · Score: 2, Informative

I never had a hard drive fail. I buy one more new one a year, and drop the smallest one. I run 4 at a time in a beige box PC. They are a mix of all sorts of manufacturers (usually from a CompUSA sale for less than $0.30/GB).

- I never turn off the PC.
- The case has no cover.

--
Tsunami -- You can't bring a good wave down!
Re:Desktop vs Server usage. by Reziac · 2007-02-20 15:53 · Score: 2, Interesting

Well, I can connect my own anecdotes ;) Once they're fully set up, my everyday machines are never powered down again (except to upgrade the hardware), nor do the HDs spin down. They are also on good quality power supply units, AND are protected by a good UPS, AND have good cooling. Those 3 points can make all the difference in the world to their longevity, regardless of use patterns.

Right now my everyday HDs number thus:

6.4GB W.D. -- new in 1998, has always run 24/7. No SMART but probably has upward of 70,000 hours uptime. (Its identical twin failed about a year ago, but it had always clanked louder while doing thermal recalibration. This one is still quiet.)

8.4GB W.D. -- new in 1998, used about 12hrs/day thru 2002, offline 2002-2006, running 24/7 for the past year. No SMART but probably has about 25,000 hours uptime.

45GB W.D. -- SMART data: 42093 hours uptime, 181 power cycles (mainly as hard resets).

40GB W.D. -- SMART data: 3919 hours uptime, 197 power cycles. (Dated 2002; found in trash in 2006)

60GB W.D. -- SMART data: 28056 hours uptime, 100 power cycles (mainly as hard resets)

Running 24/7 pretty much eliminates thermal stress and the "what do you mean you're not powering up today?!!" that happens sometimes with older HDs.

Other points of conventional wisdom about running fulltime:
1) "It causes more bearing wear." I wonder if that's so -- might the lubricant stay better distributed when it never chills down and never gets a chance to settle and congeal??
2) "It's more likely to stiction if it does sit til it's cold." In my experience it's the opposite -- the HD with only intermittent use is far more likely to stiction, and sometimes can be cured permanently by letting 'em run for a few days solid.

One of the points in TFA was that over 40% of RMA'd HDs proved to have nothing wrong with them. This is in line with my own observations (in fact, closer to 100% in SOHO/home-user environments) -- many supposed HD failures are actually user or software errors, not the hardware at all.

I don't know that this is at all helpful :) But my recommendation to my clients is that if they don't want to run 24/7, they should not power the machine on and off more than once a day.

--
~REZ~ #43301. Who'd fake being me anyway?
Re:Desktop vs Server usage. by MadMorf · 2007-02-20 16:01 · Score: 5, Informative

Most enterprise level operations that rely on their data replace drives before they fail.

You worked at an unusual place!

I'm a Tech Support Engineer for a large storage system manufacturer and I can tell you that NONE of our customers replace disks before they fail unless our OS detects a "predictive failure" for the disk. Our customers are some of the biggest names in business from all over the planet.

--
Goofy, Geeky Gifts and More!
Re:Desktop vs Server usage. by yoprst · 2007-02-20 18:05 · Score: 3, Interesting

It's broadcasting, dude! No downtime is allowed. Here in Soviet Russia we (broadcasters) do exactly the same, except that we prefer 2-year period.
Re:Desktop vs Server usage. by the_womble · 2007-02-20 21:25 · Score: 2, Informative

There are some good reasons to shut down:

1) Electricity consumption
2) Power cuts (unless you have a UPS and software for a clean shutdown installed, what happens if there is a power cut while you are away?).
3) Power fluctuations (my power supply blew dramatically after one a few months ago) and lightning.
4) Heat (in a hot climate)

Re:Dr. Schroeder is pretty hot, too! by Anonymous Coward · 2007-02-20 14:03 · Score: 2, Insightful

A quick look into her lectures/talks in the past:

June 2006 Microsoft Research, Mountain View, CA. Host: Chandu Thekkath. "Understanding failure at scale".

It's okay, man.. She will understand..

Cyrus IMAP by More+Trouble · 2007-02-20 14:05 · Score: 2, Interesting

From StorageMojo's article: Further, these results validate the Google File System's central redundancy concept: forget RAID, just replicate the data three times. If I'm an IT architect, the idea that I can spend less money and get higher reliability from simple cluster storage file replication should be very attractive.

For best-of-breed open source IMAP, that means Cyrus IMAP replication.
:w

Every single solid state drive will fail too... by EmbeddedJanitor · 2007-02-20 14:07 · Score: 2, Informative

It is just a matter of time. Depending on the technology (eg. flash) it might be a short to medium time or a long time.

If something has an MTBF of 1 million hours (that's 114 years or so), then you'll be a long time dead before it fails.

At this stage, the only reasonable non-volatile solid state alternative is NAND flash which costs approx 2 cents per MByte ($20/Gbyte) and dropping. NAND flash has far slower transfer speeds than HDD, but is far smaller, uses less power and is mechanically robust. NAND flash typically has a lifetime of 100k erasure cycles and needs special file systems to get robustness and long life.

--
Engineering is the art of compromise.

Re:Every single solid state drive will fail too... by Detritus · 2007-02-20 16:09 · Score: 2, Informative

MTBF tells you the failure rate over the item's service lifetime, which for hard disks, is commonly five years.

--
Mea navis aericumbens anguillis abundat

This paper and the Google paper are complementary by Thagg · 2007-02-20 14:10 · Score: 4, Informative

What's interesting about both of these papers is that previously-believed myths are shown to be, in fact, myths.

The Google paper shows that relatively high temperatures and high usage rates don't affect disk life.
The current paper shows that interface (SCSI, FC vs ATA) had no effect either. The Google paper shows
a significant infant mortality that the CMU paper didn't, and the Google paper shows some years of flat
reliability where the current paper shows decreasing reliability from year one.

They both show that the failure rate is far higher than the manufacturers specify, which shouldn't come
as a surprise to anybody with a few hundred disks.

I'm particularly pleased to see a stake driven through the heart of "SCSI disks are more reliable."
Manufacturers have been pushing that principle for years, saying that "oh, we bin-out the SCSI disks
after testing" or some other horseshit, but it's not true and it's never been true. The disks are
sometimes faster, but they're not "better".

Thad

--
I love Mondays. On a Monday, anything is possible.

Human MTBF by EmbeddedJanitor · 2007-02-20 14:12 · Score: 4, Funny

MTBF of a human until gross catastrophic failure (i.e. death) is approx 50 years which is approx 440,000 hours.

Of course if we count relatively minor failures (like forgetting to take out the trash or pick up dirty underwear), then MTBF is approx 27 minutes!

--
Engineering is the art of compromise.

That's wrong by ArbitraryConstant · 2007-02-20 14:22 · Score: 2, Informative

It didn't conclude RAID 5 doesn't help, it concludes RAID 5 doesn't help as much as people think, because people think the probability of another failure before the rebuild is complete is negligible and they're wrong.

It helps, and distributing the data more helps more. Someone concerned about multi-drive failures can, for example, use a 3-way RAID 1 array, or a RAID 6 array (which can tolerate the loss of any 2 drives).
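A rough sketch of the rebuild-window risk, assuming independent drives with a constant failure rate. That is exactly the optimistic model the paper questions, so treat the result as a floor. The drive count, AFR values, rebuild time and the `hazard_multiplier` knob are all made-up illustration values, not figures from the paper.

```python
import math

HOURS_PER_YEAR = 8760

def p_second_failure(n_drives, afr, rebuild_hours, hazard_multiplier=1.0):
    """Chance that at least one surviving drive fails during the rebuild.

    afr is the annual failure rate per drive (0.03 means 3%).
    hazard_multiplier is a hypothetical fudge factor for correlated
    failures: 1.0 means fully independent drives.
    """
    # Convert the annual failure probability to a constant hourly rate.
    rate_per_hour = -math.log(1.0 - afr) / HOURS_PER_YEAR
    survivors = n_drives - 1
    exposure = rate_per_hour * hazard_multiplier * survivors * rebuild_hours
    return 1.0 - math.exp(-exposure)

if __name__ == "__main__":
    for afr in (0.01, 0.03, 0.06):
        base = p_second_failure(8, afr, rebuild_hours=24)
        worse = p_second_failure(8, afr, rebuild_hours=24, hazard_multiplier=4.0)
        print(f"AFR {afr:.0%}: independent {base:.4%}, correlated guess {worse:.4%}")
```

Longer rebuilds, bigger arrays and any correlation between drives all push the number up, which is the argument for RAID 6 or a 3-way mirror.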

--
I rarely criticize things I don't care about.

Re:That's wrong by jcgf · 2007-02-20 14:35 · Score: 2, Insightful

In my humble opinion it also helps to use different branded drives in your raid array, that way the chance of them failing at the same time for the same reason is less and you should have longer to do your rebuild.

Re:That's wrong by petermgreen · 2007-02-20 22:13 · Score: 2, Informative

However, when that failure point is reached is at a random point within the distribution so while the probability of another failure at any point in time is not zero it is pretty small.

There are three real dangers with raid

The first is that arrays are typically built out of identical drives, usually drives from the same batch and then all the drives are run for the same time periods. This means that if there is a design or manufacturing fault that causes a failure peak at a certain number of operational hours there is a good chance that more than one drive in your array will fail at about the same time.

The second is that the drives in an array are typically in one machine, running off one power supply (or one pair of redundant power supplies) and connected to one controller. This means that faults with other hardware in the machine can destroy multiple hard drives at once.

The third is failure of the controller. In many cases the controller stores information on how the data is set up within its own non-volatile memory (some better controllers do store it on the disks themselves). While this doesn't destroy the actual data, it can easily put it beyond the ability of non-experts to reassemble the array in a way that gets the data back (and if they make a mistake they can easily destroy the data they were trying to recover). There is also the problem that getting a suitable replacement controller may be difficult.

--
note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register

Re:Infant Mortality and stuff by Wilson_6500 · 2007-02-20 14:24 · Score: 5, Insightful

That may be the new 'theory' but we all know about theory vs reality.

Uh, but wasn't this data accumulated via testing actual drives? That's... kinda how science works--by replacing anecdotal evidence with scientifically-gathered data. That's basically condemning science in favor of anecdotes--and the medical fields can tell you how well _that_ works.

So SSD's are not only faster, but more reliable? by gelfling · 2007-02-20 14:32 · Score: 3, Interesting

I wonder if anyone looked at what actually failed in the drives? An arm, a platter, an actuator, a board, an MPU?

Would an analysis tell us that SSDs are not only faster but more reliable and if so by how much?

forget RAID? by juventasone · 2007-02-20 14:38 · Score: 2, Informative

Translation: one array drive failure means a much higher likelihood of another drive failure ... Further, these results validate the Google File System's central redundancy concept: forget RAID, just replicate the data three times.

The fact that another drive in an array is more likely to fail if one has already failed makes a lot of sense, but the conclusion to forget RAIDs doesn't. Arrays are normally composed of the same drive model, even the same manufacturing batch, and are in the same operating environment. If something is "wrong" with any of these three variables, and it causes a drive to fail, it's common sense the other drives have a good chance at following. I've seen real-world examples of this.

In my real-world situations, the RAID still did its job, the drive was replaced, and nothing was lost, despite subsequent failure of other drives in the array. Sure you can get similar reliability at a lower price by replicating data, but I think that's always been understood as the case. Furthermore, as someone else in the forum mentioned, enterprise-class RAIDs are often used primarily for performance reasons. A modern hardware RAID controller (with a dedicated processor and RAM) can create storage performance unattainable outside of a RAID.

Re:forget RAID? by Ragin'Cajun · 2007-02-20 17:50 · Score: 2, Interesting

I used to work at a company that made network-attached storage appliances. Amazingly enough, one source of drive failures was the hot spare spinning up! The current draw during the spinup would cause a voltage dip on the power plane, which could lead to a read or write error on one of the neighbouring drives. Unfortunately, the most common cause of the hot spare spinning up was...another drive failing. So suddenly a second drive fails because of a read or write error.

The thing is, sometimes getting a read error doesn't actually mean the media is bad. There could have been some power fluctuation during the write, so the checksum doesn't match the data and the drive's controller returns a failure during the read. But if you rewrite that sector, it will be fixed (e.g. during an unconditional format).

--
--It's all fun and games, 'till someone loses an eye. Then it's one-eyed fun!--

Schroeder's disk... by Anonymous Coward · 2007-02-20 14:42 · Score: 2, Funny

is neither working nor broken... Unless you look at it of course ;)

How much does handling matter? by RebornData · 2007-02-20 14:43 · Score: 5, Interesting

What's interesting to me is that neither of these papers mentions the issue of pre-installation handling. The good folks over at Storage Review seem to be of the opinion that the shocks and bumps that happen to a drive between the factory and the final installation are the most significant factor in drive reliability (much more than brand, for example).

The Google paper talks a bit about certain drive "vintages" being problematic, but I wonder if they buy drives in large lots, and perhaps some lots might have been handled roughly during shipping. If they could trace back each hard drive to the original order, perhaps they could look to see if there's a correlation between failure and shipping lot.

-R

Re:How much does handling matter? by ForestGrump · 2007-02-20 18:41 · Score: 3, Informative

the google paper was posted a day or 2 ago. let me find it.
here you go
http://hardware.slashdot.org/article.pl?sid=07/02/18/0420247

--
Is it true that more people vote for the winner of American Idol, than vote for the president? -Ali G.

all this is moot by billcopc · 2007-02-20 14:56 · Score: 3, Insightful

Hard drives die often because the manufacturers build them cheaply, the same as every other component in a PC. Why would they ever make a bulletproof hard drive ? They'd go out of business!

Sure, some of them end up being replaced under warranty, but a lot of them don't, and so Maxtor/IBM/Hitachi make another buck off your sorry ass. There isn't a sane server admin that doesn't keep a set of spares in his desk drawer, because it's not a question of "if" it dies but WHEN. Hell, most decently-geared techies have a whole box of hard drives, pre-mounted in hotswap bays ready to rock. And if it weren't for the fact that I was just laid off a month ago, I'd be buying a couple spare SATA drives myself, I just have a funny feeling something's going to go tits up in my media server. I haven't had any warnings or hiccups, but I just know the Seagate devil's planning his move, waiting for 2 drives to start straying so he can kill my Raid-5 nice and fast. Hard drives are little more than Murphy's Law in a box.

--
-Billco, Fnarg.com

Re:all this is moot by Diordna · 2007-02-20 16:07 · Score: 2

No, they wouldn't. People buy new drives because the price of storage keeps going down and the size of the average file keeps going up.

Exponential with time by tedgyz · 2007-02-20 14:58 · Score: 2, Informative

All the hard drives I installed in my family's computers have failed in the last 5 years - including mine. :-(

Waaaah! They cry, when I tell them there is no hope for the family photos, barring a media reclamation service == $$$

I tell everyone: "Assume your hard drive will fail at any moment, starting now! What is on your hard drive that you would be upset if you never saw it again?"

--
"No matter where you go, there you are." -- Buckaroo Banzai

Re:Infant Mortality and stuff by DarkVader · 2007-02-20 15:02 · Score: 2, Interesting

1 in 20 drive failures? What are you using, Western Digital drives? I don't see anything close to that failure rate, more like 1 in 300.

I don't deploy "enterprise" drives, they're overpriced, and the few I did install years ago proved to be less reliable than "consumer" drives. My real world experience is that the "consumer" drives are generally reliable, I just plan on a 2-3 year replacement schedule.

I can't disagree with RAID being fallible depending on what takes out the drive, though.

Nothing I knew about hard drives was mentioned by AllParadox · 2007-02-20 15:21 · Score: 3, Insightful

As mechanical devices, hard drives are appallingly reliable.

The electronics on the hard drive rank as major players in heat generation in the boxen.

Heat kills transistorized components.

"Hard Drive Data Recovery" companies often have nothing more sophisticated than a hard drive buying program, and very competent techs soldering and unsoldering drive electronics. They buy a few each of most available hard drives, as the drives appear on the market. When a customer sends them a hard drive for "recovery", the techs find a matching drive in inventory, disconnect the electronics, and replace the electronics in the drive. The percentage of drive failures due to mechanical failure is very low.

When I bought a desktop computer for an unsophisticated family member, I also purchased and installed a drive cooler - a special fan that blows directly on the drive electronics.

I was very concerned about MTBF. I just assumed that the manufacturer's information was totally irrelevant to my situation - a hard drive in a corner of the tower, covered with dust, and no air circulation.

I occasionally pick up used equipment from family and friends. Usually, it is broken. Often, it is the hard drive. What is amazing is not that they failed, but that they lasted so long with a 1.5 inch coating of insulating dust.

I suspect this would also explain the rising failure rate with time. Nobody seems to clean the darned things. They just sit and run 24/7/365, until they fail.

--
All is paradox. Retired lawyer, so this is just one more layman's opinion.

Re:Nothing I knew about hard drives was mentioned by AllParadox · 2007-02-20 15:53 · Score: 2, Funny

"temperature has no effect on the failure rate"

Said by people who do not know how to light off a cutting torch.

Trust me, I *can* make 'em fail.

Real quick, too.

--
All is paradox. Retired lawyer, so this is just one more layman's opinion.

Re:Infant Mortality and stuff by TheLink · 2007-02-20 15:33 · Score: 4, Insightful

quote: "Sorta. Again, real world vs theory. Try banging the hell out of an off the shelf consumer drive 24/7/365 and see how long it holds up"

Uh the paper is based on _real_world_ stats (which part of "empirical evidence" + "she looked at 100,000 drives" don't you understand?).

Your assumptions = theory. Paper = real world.

And that's why the paper was voted "Best Paper", because it seems lots of people had similar assumptions and this paper is very useful to at least get some people to revisit those assumptions.

It might still be proven wrong by a bigger/better study, or it could turn out that it was flawed in some way. But I'll give them the benefit of doubt - more than I'll trust the MTTF/MTBF figures from drive manufacturers.

--

Re:Infant Mortality and stuff by Anonymous Coward · 2007-02-20 16:30 · Score: 5, Insightful

Use two drives that are not in a raid setup. Use one as the data holder and rsync or tar.gz the data to the other one at your comfort level (hourly/daily/weekly/monthly or whatever time frame you would like). Much cheaper than raid, easier to get going, no gotchas involved with different HD controllers or different drives and most importantly, the second drive is not "live" and not in normal operation which constitutes a backup (remember, raid is not and never was a backup solution, it is only for uptime and maybe speed).

Raid controllers come in two flavors. Ones that are very well supported and you will always find a similar or compatible one if that controller fails, the down side of this type is it is very expensive. The other type is the cheap ones, you know, the ones for under $100 which may not exist in 2 years when yours fails leaving your raid array useless and the on board SATA raid chip sets that change at least yearly as well. Good luck with those. They do work but I'd bet you will have more problems with the raid setup itself than with actual drives the data is on.

I know, KISS is not in typical /. speak but it definitely applies here. 300GB HDs are about $80 without rebates, using one to hold a copy of the other using rsync or robocopy is about the cheapest backup you can get and since it is not a live file system, all the other things that happen to data that are not the fault of the actual HD (virus, mouse slip, kids messing around, accidents, overwriting) will be recoverable.
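For anyone who wants the idea without rsync or robocopy, here is a minimal sketch using only the Python standard library. It is a stand-in, not an rsync equivalent: it recopies every file on each run and never deletes anything from the backup side. The directory names are hypothetical and the demo runs inside a temp directory.

```python
import shutil
import tempfile
from pathlib import Path

def mirror(src, dst):
    """Copy everything under src into dst, overwriting older copies.

    Files that were deleted from src stay in dst, which is arguably
    what you want from a backup that should survive a mouse slip.
    """
    shutil.copytree(src, dst, dirs_exist_ok=True)  # needs Python 3.8+

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "data_drive"      # stand-in for the live disk
        backup = Path(tmp) / "backup_drive"  # stand-in for the second disk
        data.mkdir()
        (data / "photos.txt").write_text("family photos")
        mirror(data, backup)
        print(sorted(p.name for p in backup.iterdir()))
```

Run it from cron or Task Scheduler at whatever interval matches your comfort level.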

and Google contradicts. by bill_mcgonigle · 2007-02-20 16:33 · Score: 4, Interesting

Well, the article actually says that drives don't have a spike of failures at the beginning.

Hmm, the Google paper says they do, from 3-6 months (Figure 2).

Which leaves us with confirmation that 50% of all studies are wrong.

--
My God, it's Full of Source!
OUTSIDE_IP=$(dig +short my.ip @outsideip.net)

disk spin-up is most responsible for failure ? by cats-paw · 2007-02-20 16:52 · Score: 3, Interesting

I keep hearing this persistent rumor that it's disk spin-up which is the most significant contribution to disk failure. The moral of the story is that systems which are left on 24/7 are less likely to see HD failures than systems turned on/off everyday.

Now if that's really true, wouldn't it be quite simple for the manufacturers to simply spin-up the disk more slowly by putting in very simple and reliable motor control circuitry ?

Does anyone have any real evidence, i.e. not anecdotal, that this is really true.

--
Absolute statements are never true

OSS Software RAID, too. by Kadin2048 · 2007-02-20 17:43 · Score: 4, Insightful

On the other hand, you could get a cheap drive controller, and do software RAID, using OSS tools; the setup might be more complex than hardware RAID, but there shouldn't be any issues with recovering your data later due to the format it's written in.

I agree though, that for most people, some sort of "userland RAID" where the disks are just mounted as regular volumes to the filesystem, and then you just write the data twice, is probably the best bet. There's no format problems, and you'll always be able to pull a drive out, stick it in another machine, and get at your data.

--
"Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."

Re:Infant Mortality and stuff by duffbeer703 · 2007-02-20 18:09 · Score: 4, Informative

That may be the new 'theory' but we all know about theory vs reality. Here in reality if you put a couple of dozen new drives into service you have one or two spare hard drives to replace the ones that WILL fail in the first week. Especially with consumer grade drives typical in workstation deployment. If you only have one dud out of twenty it was a good rollout.

This study looks pretty realistic to me, in fact it's better data than the Google paper's because they are looking at different usage scenarios. The study also jibes with vendors' warranty periods -- right around the 3 year mark (end of warranty) failures start going up.

I take issue with your "real world vs. theory" argument about workstation disks versus server disks as well, only because I have my own numbers. Based on numbers that my company gathers for its 50,000 workstations, the disk failure rate is around 1.9% annually. (Still a lot of disks) There are exceptions -- those numbers are driven upward by one deployment of workstations from a vendor that had a 22% failure rate. (the PCs were replaced by the vendor) Server disks are in the same ballpark - slightly less than 2%.
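To put those numbers in datasheet terms, a quick back-of-the-envelope sketch. It uses the usual approximation AFR ≈ 8760 / MTTF, which only holds if the failure rate is constant over the year (an assumption, and one this thread disputes). The 1,000,000 hour figure is a hypothetical datasheet value picked for illustration.

```python
HOURS_PER_YEAR = 8760

def expected_failures(fleet_size, afr):
    """Expected number of failed disks per year in a fleet."""
    return fleet_size * afr

def implied_mttf_hours(afr):
    """MTTF a given annual failure rate would imply (constant-rate assumption)."""
    return HOURS_PER_YEAR / afr

def implied_afr(mttf_hours):
    """Annual failure rate a datasheet MTTF would imply (same assumption)."""
    return HOURS_PER_YEAR / mttf_hours

if __name__ == "__main__":
    print(f"Failures per year: {expected_failures(50_000, 0.019):.0f}")
    print(f"Implied MTTF at 1.9% AFR: {implied_mttf_hours(0.019):,.0f} hours")
    print(f"AFR implied by 1,000,000 h MTTF: {implied_afr(1_000_000):.2%}")
```

A 1.9% observed rate works out to roughly 950 dead disks a year across 50,000 machines, and to an MTTF well under what a million-hour datasheet claim would suggest.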

Vendors provide more evidence of that fact. Many servers are being shipped with SATA disks, often the same as what you'll find in workstations. If SATA was less reliable, that would increase the vendor's support costs and they wouldn't ship them.

You're totally right about RAID-5... it can be a dangerous thing for an inept admin. Bad disks often come in batches, and bad controllers can ruin your day. A redundant array of bad data isn't very helpful ;)

--
Conformity is the jailer of freedom and enemy of growth. -JFK

Actually, mostly it DOESN'T contradict by Moraelin · 2007-02-20 20:19 · Score: 5, Insightful

The two don't really contradict each other that much. Google's spike is relatively small and it's really a spike in the first 1-3 months. By the 6th month it's basically settled. In this paper half the time they graph in whole year increments, so that kind of a spike would be averaged into the first year. So, no, they don't contradict each other as such. And in at least one of the graphs by month in this paper (HPC1), there is something that looks like a spike in the first month.
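A toy illustration of how yearly buckets can swallow an early spike. The monthly counts are invented for the example and are not taken from either paper.

```python
def yearly_totals(monthly_failures):
    """Sum monthly failure counts into consecutive 12-month buckets."""
    return [
        sum(monthly_failures[i:i + 12])
        for i in range(0, len(monthly_failures), 12)
    ]

# Made-up data: a spike in months 1-3, then a slowly rising baseline.
monthly = [9, 7, 5] + [2] * 9 + [3] * 12 + [4] * 12

if __name__ == "__main__":
    print("First six months:", monthly[:6])
    print("Yearly totals:   ", yearly_totals(monthly))
```

Month by month the first quarter clearly stands out, yet the year-one total lands in the same range as the later years, so a per-year graph would show no obvious infant mortality.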

More importantly, they don't contradict each other in respect to the rest of the curve. With or without that spike, the curve just doesn't look like the bathtub fairy tale that drive makers try to bullshit us with. You're led into a false sense of security that, basically, if a drive didn't fail within the first couple of months, then it'll be at a (nearly) constant and very small probability to fail for the whole next 5 years, and only then it starts rising again. Basically that if you upgrade your drives every 4 years, whatever didn't fail within 2-3 months, heck, it's very unlikely to fail. And the curve just doesn't look that way. The probability to fail rises continuously, and (again whether that spike actually exists or not) after as little as 1 year you're above the starting height of the "bathtub" already.
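One common way to describe these curve shapes is the Weibull hazard function, which came up earlier in the thread. The sketch below is only an illustration of the shapes, not a fit to either paper's data; the shape and scale values are arbitrary.

```python
def weibull_hazard(t, shape, scale):
    """Instantaneous failure rate at age t (t > 0) for a Weibull model.

    shape < 1: rate falls with age (infant mortality)
    shape = 1: rate is constant (the flat bottom of the "bathtub")
    shape > 1: rate rises with age (wear-out)
    """
    return (shape / scale) * (t / scale) ** (shape - 1)

if __name__ == "__main__":
    for shape in (0.5, 1.0, 1.5):
        rates = [weibull_hazard(year, shape, scale=5.0) for year in (1, 2, 3, 4)]
        print(f"shape {shape}: " + ", ".join(f"{r:.3f}" for r in rates))
```

A bathtub needs all three regimes stitched together; a failure rate that climbs from year one looks more like the shape > 1 case on its own.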

In retrospect, I don't even know when and why the "bathtub" myth even started. The bathtub distribution was originally for stuff like electronic components, without moving parts. For something with mechanical wear and tear like a hard drive, who the heck came up with the idea that the same curve must apply? Shouldn't it have been common sense all along that it linearly gets more wear and tear?

Both papers also tell us that the manufacturers' MTBF numbers are, basically, pure bullshit. They're some impressive number put there for the benefit of the marketing department, not because someone at Seagate/Maxtor/whatever actually believes that number.

In retrospect, again, we should have had an alarm signal when the manufacturers lowered their warranty from 3 to 1 year. If indeed there was (1) the MTBF they claim, and more importantly (2) the bathtub curve they claim, the reduction wouldn't have even made too much of a difference. I mean, most drives would have failed within a couple of months, followed by barely a trickle of defective drives for the next 5 years straight. Why bother doing the bad-for-marketing thing of lowering the warranty in that scenario? Or did they already know that they lie?

And finally, a very important point is that (again, bullshit marketing claims be damned) there is no difference in reliability between cheap SATA and expensive SCSI and FC. There is this assumption permeating the whole society that if something is expensive, it _must_ automatically be better and more durable than the cheap stuff. That if you buy a big plasma TV, it's automatically better and lasts longer than an el-cheapo CRT. (Yeah, right. Plasma is actually known for its decay over time.) A whole edifice of consumerism, conspicuous consumption, and SFV (Stupid Fashion Victim) syndrome is based on that bullshit excuse to spend more than you need to spend. "Yeah, but it'll be better and last longer!" Yeah, right.

I've actually met people who wouldn't even _consider_ putting an ATA drive in any kind of server. "What, you're going to put your enterprise data on ATA drives???" (Said with a perplexed look, as if I had proposed flushing it to /dev/null or something.) Well, now we know they're not actually any worse. If you don't actually need the extra bandwidth or lower latency or a 15,000 RPM drive, then you can just as well drop a SATA drive in that machine. Even for 10,000 RPM, 4.5ms, there are the WD Raptor drives with SATA interface, and they're cheaper than a SCSI or FC drive. For a lot of stuff you don't even need those, a 7200 RPM will do perfectly fine.

--
A polar bear is a cartesian bear after a coordinate transform.

Re:Actually, mostly it DOESN'T contradict by darCness · 2007-02-21 02:31 · Score: 2, Informative

"There is this assumption permeating the whole society that if something is expensive, it _must_ automatically be better"

This is known as the Veblen Effect based on work by Thorstein Veblen.

Re:Infant Mortality and stuff by empaler · 2007-02-20 22:09 · Score: 2, Informative

I actually only have good experiences with WD and was about to order a new batch of SATA disks (now-ish).

Yes, Everything! by DeeVeeAnt · 2007-02-21 00:27 · Score: 2, Funny

It turns out they are actually triangular

--
Home fucking is killing prostitution.

Re:No "infant mortality" effect? by asuffield · 2007-02-21 01:29 · Score: 2, Informative

Love the RAID5 stat, though... Perhaps this study will finally convince people to only use RAID for performance or huge-JBOD reasons, never for (the illusion of) reliability.

It's true that you should never buy anything for the illusion of reliability, but the article does not claim RAID is not a good way to get reliability.

First, let's look at the common mistake when people think about RAID: "If the probability of a drive failure is X, then the probability of two drives in a RAID volume failing is X*X, which is much smaller". That's nonsense, as the article demonstrates - the probability is only X*X if the events are independent, which they are clearly not.
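The difference is easy to see with a couple of lines of arithmetic. The probabilities below are invented for illustration; the only real point is that P(both) = P(first) * P(second given first), and the second factor is not X once failures are correlated.

```python
def p_both_fail(p_single, p_second_given_first=None):
    """Probability that two drives both fail in the same period.

    With no conditional probability supplied, the drives are treated as
    independent, which gives the familiar (and optimistic) X*X.
    """
    if p_second_given_first is None:
        p_second_given_first = p_single  # independence assumption
    return p_single * p_second_given_first

if __name__ == "__main__":
    x = 0.03  # hypothetical chance of one drive failing in the period
    independent = p_both_fail(x)
    correlated = p_both_fail(x, p_second_given_first=0.12)  # hypothetical
    print(f"independent: {independent:.4%}  correlated: {correlated:.4%}")
    print(f"ratio: {correlated / independent:.1f}x")
```

Same batch, same enclosure and same power supply all raise that conditional term, so the naive X*X can be off by a multiple.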

But the idea was nonsense even before that. The statement is taking the wrong attitude to the problem - it is considering the probability of data loss at *one point in time*. That's not actually what you care about - if your server dies on Tuesday, it is no comfort to you that it did not die on Monday. Here is a more sensible way to look at what is going on (ignoring backups for the moment):

Every drive is going to fail, typically within the first ten years of its life. So if you have a non-RAID system, the probability of data loss is 100% - certain. Really. Without RAID, sooner or later, you are going to lose that volume. What RAID gives you is a moderate chance of getting through the inevitable drive failures without losing the volume, and that's a chance that you never had at all without RAID. Different configurations can modify how large that chance is, but the essential feature of RAID is that you get the chance.

So what do backups get you? It's basically the same thing, except that you've got to rebuild the server. So if you just have backups and no RAID, it is a certainty that sooner or later your server is going to have significant amounts of downtime while it's being rebuilt from the backup. If downtime bothers you, you need RAID, period. Exactly what kind of RAID depends on what chance you want to take (standard risk management calculation), but there's just no contest between "certain failure" and "chance of avoiding failure" - even a 10% chance of surviving a disk failure is infinitely better than no chance (and the actual figure should be much better than that).

Lastly, what happens if you have RAID and no backups? It should be apparent that you get the same scenario as RAID with backups, only with a higher chance of failure. So there's no fundamental reason not to do that - line up the figures along with RAID+backup solutions in your risk management analysis, and pick the cheapest option for the level of risk you (or your insurance company) are willing to accept.

The impact of this study is a nice improvement in the accuracy of that analysis. Neither more nor less. If you're running large servers, this would be a good time to pull out those numbers and take another look at them (if you don't have those numbers on file, this study is not for you).

Re:moving parts - Don't always wear out by leonardluen · 2007-02-21 02:18 · Score: 2, Funny

but what happens when we run out of cats to power them?

Slashdot Mirror
