Slashdot Mirror


Disk Failure Rates More Myth Than Metric

Lucas123 writes "Using mean time between failure rates suggests that disks can last from 1 million to 1.5 million hours, or 114 to 171 years, but study after study shows that those metrics are inaccurate for determining hard drive life. One study found that some disk drive replacement rates were greater than one in 10. This is nearly 15 times what vendors claim, and all of these studies show failure rates grow steadily with the age of the hardware. One former EMC employee turned consultant said, 'I don't think [disk array manufacturers are] going to be forthright with giving people that data because it would reduce the opportunity for them to add value by 'interpreting' the numbers.'"
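
The arithmetic behind the summary's figures can be checked with a minimal sketch. It assumes round-the-clock operation and a constant failure rate (the usual reading of an MTBF rating), and it reads "one in 10" as 10% per year; the function names are illustrative only.

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8,760 hours, assuming the drive runs around the clock


def mtbf_to_years(mtbf_hours):
    """Express an MTBF figure in years of continuous operation."""
    return mtbf_hours / HOURS_PER_YEAR


def mtbf_to_afr(mtbf_hours):
    """Annualized failure rate implied by an MTBF, assuming a constant failure rate."""
    return 1.0 - math.exp(-HOURS_PER_YEAR / mtbf_hours)


OBSERVED_RATE = 0.10  # "one in 10" from the summary, read here as 10% per year

for mtbf in (1_000_000, 1_500_000):
    afr = mtbf_to_afr(mtbf)
    print(f"MTBF {mtbf:,} h = {mtbf_to_years(mtbf):.0f} years, "
          f"implied AFR {afr:.2%}, observed/implied = {OBSERVED_RATE / afr:.1f}x")
```

Under these assumptions the implied annual failure rate is well under 1%, and a 10% replacement rate is roughly 11 to 17 times higher, which brackets the "nearly 15 times" in the summary.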

283 comments

  1. Never had a drive fail by Jafafa+Hots · · Score: 4, Interesting
    I've gone through many over the years, replacing them as they became too small - still using some small ones many years old for minor tasks, etc. and the only drive I've ever had partially fail is the one I accidentally launched across a room.

    I don't understand how people are always complaining about their hard drives failing. In 30 years it hasn't happened to me yet.

    I'm about to lug a huge Wang hard drive out to the trash pickup on Monday - weighs over 100 pounds... still runs. Actually it uses removable platters but still...

    --
    This space available.
    1. Re:Never had a drive fail by Anonymous Coward · · Score: 5, Funny

      Wait. You've got a huge Wang, and you're throwing it out? D00d, that's just uncool. Give it to someone else at least. It would be fun to ask people "wanna come see my huge Wang?" just to see their reaction! :)

      hah. captcha word: largest

    2. Re:Never had a drive fail by Anonymous Coward · · Score: 3, Insightful

      Drive failures are actually fairly common, but usually the failures are due to cooling issues. Given that most PCs aren't really set up to ensure decent hard drive cooling, it is probable that the failure ratings are inflated due to operation outside of the expected operational parameters (which are probably not conservative enough for real usage). In my opinion, if you have more than a single hard drive closely stacked in your case you should have some sort of hard drive fan.

    3. Re:Never had a drive fail by serviscope_minor · · Score: 5, Funny

      I'm about to lug a huge Wang hard drive out to the trash pickup on Monday - weighs over 100 pounds... still runs. Actually it uses removable platters but still...

      <Indiana Jones> IT BELONGS IN A MUSEUM!</Indiana Jones>

      --
      SJW n. One who posts facts.
    4. Re:Never had a drive fail by hedwards · · Score: 3, Informative

      I think cooling issues are somewhat less common than most people think, but they are definitely significant. And I wouldn't care to suggest that people neglect to handle heat dissipation on general principle.

      Dirty, spikey power is a much larger problem. A few years back I had 3 or 4 nearly identical WD 80gig drives die within a couple of months of each other. They were replaced with identical drives that are still chugging along fine all this time later. The only major difference is that I gave each system a cheapo UPS.

      Being somewhat cheap, I tend to use disks until they wear out completely. After a few years I shift the disks to storing things which are permanently archived elsewhere or swap. Seems to work out fine, only problem is what happens if the swap goes bad while I'm using it.

    5. Re:Never had a drive fail by GIL_Dude · · Score: 3, Insightful

      I'd agree with you there; I have had probably 8 or 9 hard drives fail over the years (I currently have 10 running in the house right now and I have 8 running at my desk at work, so I do have a lot of drives). I am sure that I have caused some of the failures by just what you are talking about - I've maxed out the cases (for example my server has 4 drives in it, but was designed for 2 - I had to make my own bracket to jam the 4th in there, the 3rd went in place of a floppy). But I've never done anything about cooling and I probably caused this myself. Although to hear the noises coming from some of the platters when they failed I'm sure at least a couple weren't just heat. For example at work I have had 2 drives fail in just bog standard HP Compaq dc7700 desktops (without cramming in extra stuff). Sometimes they just up and die, other times I must have helped them along with heat.

    6. Re:Never had a drive fail by danwat1234 · · Score: 1

      Dude don't chuck it! "It Belongs in a museum!"

    7. Re:Never had a drive fail by kesuki · · Score: 3, Informative

      And I had 5 fail this year. Welcome to the law of averages. Note I own about 15 hard drives including the 5 that failed.

    8. Re:Never had a drive fail by Kibblet · · Score: 1

      I wish I had your luck.

    9. Re:Never had a drive fail by Kjella · · Score: 2, Informative

      1.6GB drive: failed
      3.8GB drive: failed
      45GB drive: failed
      2x500GB drive: failed

      Still working:
      9GB
      27GB
      100GB
      120GB
      2x160GB
      2x250GB
      3x500GB
      2x750GB
      3x500GB external

      However, in all the cases they've been the worst possible. The 45GB drive was my primary drive at the time with all my recent stuff. The 2x500GB were in a RAID5, you know what happens in a RAID5 when two drives fail? Yep. Right now I'm running 3xRAID1 for the important stuff (+ backup), JBOD on everything else.

      --
      Live today, because you never know what tomorrow brings
    10. Re:Never had a drive fail by Thought1 · · Score: 1

      I've only had two drives fail, but that was due to a tree falling on the power lines close to my house and shorting them (watching the bright showers of sparks into the road in the dark was fun, though). I think I've bought somewhere around 50 of them over the last 15 years (since they actually got inexpensive enough to be worth buying).

    11. Re:Never had a drive fail by Anonymous Coward · · Score: 0

      i had a friend who said the same thing once but he lived to regret it. you can email him yourself and maybe he would want to share the details -- jaredj@aieranco.com

    12. Re:Never had a drive fail by STrinity · · Score: 4, Funny

      I'm about to lug a huge Wang
      There needs to be a -1 "Too Easy" moderation option.
      --
      Les Miserables Volume 1 now up with my reading of
    13. Re:Never had a drive fail by mpeskett · · Score: 1

      Well, that's somewhat reassuring - I have 3 drives, but they have at least one drive space on either side and a fan blowing air into the case directly over/between them. Ought to be nice and cool.

      Never had a failure myself. I thought a portable drive had gone bad once but it turned out to be the USB lead... a bit annoying, but I got a bigger one to replace it, meaning I now have more space, which is good.

    14. Re:Never had a drive fail by Anonymous Coward · · Score: 0

      I don't know how you've done that, but good for you.

      I've been around tech for 20 years and in the industry for 15. I've personally seen at least a dozen drives fail, across the following brands: Western Digital, Seagate, IBM, Hitachi and especially Maxtor (these WILL fail, without failure).

      We currently have a failing drive in one server, and I've just replaced failing drives in two workstations, this week alone. These drives are all 3-5 years old and on 24/7, being hit by an application that hits the drives 24/7.

      At home, I've never had one fail, either the workstation or drive is replaced within 3-5 years.

    15. Re:Never had a drive fail by tdelaney · · Score: 1

      For an opposing anecdote, my family had 3 fairly new drives fail within 3 months of each other - 1 Seagate (approx 1 year old), 1 Samsung (approx 6 months old) and 1 Western Digital (3 weeks old).

      During this period, I learned not to buy WD drives in Australia again - whereas Seagate and Samsung handle warranty returns locally, and each took about 3 days to get a new drive to me, WD wanted me to send the drive to Singapore, and estimated a 4-week turnaround. Fortunately, I was able to convince the retailer to take it back (for a restocking fee), and was able to buy the same-sized Samsung for less than the modified refund.

      OTOH, I've still got 8GB drives that work just fine.

      OTGH, I bought 2 IBM Deathstars (75GXP) several years ago (at the same time, presumably the same batch) - one died very quickly, but the other is still in use today.

    16. Re:Never had a drive fail by afidel · · Score: 4, Informative

      I would tend to agree with that. I run a datacenter that's cooled to 74 degrees and has good clean power from the online UPSes and I've had 6 drive failures out of about 500 drives over the last 22 months. Three were from older servers that weren't always properly cooled (the company had a crappy AC unit in their old data closet.) The other three all died in their first month or two after installation. So properly treated server class drives are dying at a rate of about .5% per year for me, I'd say that jibes with manufacturer MTBF.
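
A rough sketch of that calculation, using only the figures quoted in the comment (6 failures, about 500 drives, 22 months). It assumes the drive count stayed constant and that every drive ran 24/7; the function names are illustrative only.

```python
HOURS_PER_YEAR = 24 * 365


def annualized_failure_rate(failures, drives, months):
    """Failures per drive-year, assuming the drive count was constant over the period."""
    drive_years = drives * months / 12
    return failures / drive_years


def observed_mtbf_hours(failures, drives, months):
    """Total powered-on drive-hours divided by failures (24/7 operation assumed)."""
    drive_hours = drives * months / 12 * HOURS_PER_YEAR
    return drive_hours / failures


# Figures from the comment: 6 failures, about 500 drives, 22 months.
all_failures = annualized_failure_rate(6, 500, 22)
# Same fleet, counting only the 3 failures not blamed on the poorly cooled closet.
well_cooled = annualized_failure_rate(3, 500, 22)

print(f"{all_failures:.2%} per year counting all 6 failures")
print(f"{well_cooled:.2%} per year counting only the 3 early failures")
print(f"{observed_mtbf_hours(6, 500, 22):,.0f} hours observed MTBF (all 6)")
```

The "about .5%" estimate sits between the two rates this produces, and the observed MTBF with all six failures counted works out to roughly 1.3 million hours, inside the 1 million to 1.5 million hour range quoted in the summary.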

      --
      There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
    17. Re:Never had a drive fail by cheater512 · · Score: 1

      I make a point of buying WD drives in Australia.
      Never had one fail yet. Very impressed. :)

    18. Re:Never had a drive fail by Depili · · Score: 3, Informative

      Excess heat can cause the lubricant of an HD to go bad and cause weird noises; logic board failures and head positioning failures also cause quite a racket.

      In my experience most drives fail without any indication from SMART tests, i.e. through logic board failures; bad sectors are quite rare nowadays.

    19. Re:Never had a drive fail by rolfwind · · Score: 1

      I already had 2 hard drives fail in 2 separate notebooks. They weren't old either at the time, maybe one was 16 months and the other was 3 months at the time. I've only owned about 4 notebooks.

      Something about moving around and hard drives don't mix. (Can't wait for SSD).

    20. Re:Never had a drive fail by Rosy+At+Random · · Score: 2, Interesting

      Am I the only one who wants to hear more about the drive that went ballistic?

      --
      Would you like a slice of toast?
    21. Re:Never had a drive fail by Xtravar · · Score: 1

      I agree with you almost 100%.
      The only time I had a hard drive die was at work... which is probably one of the worst places for it to happen.

      And our tech people couldn't recover data; I had to ask for the broken drive and recover it myself.

      And I was quite dicked because we get just one big partition and so the fragmentation rate was extremely high over my important documents.

      That's why:
        1. always partition everything
        2. never use Maxtor drives
        3. never buy Dell

      --
      Buckle your ROFL belt, we're in for some LOLs.
    22. Re:Never had a drive fail by tomhudson · · Score: 1

      When they fail within minutes, in an open box, with extra fans blowing across them (4 out of 4 from one batch, 2 out of 4 with a replacement batch - and yes, they were also individually checked in another machine afterwards, but let's face it, when they're making grinding or zip-zip-zip noises, they're defective) there's a problem with quality control. Specifically, China.

      Also, do NOT use those hard drive fans that mount under the hd - I tried that with a raid 4 years ago. The fans become unbalanced after a while, and will ruin your drives. Instead, mount an additional fan inside the case, pointing directly at the drives, to help avoid hot spots.

    23. Re:Never had a drive fail by Zak3056 · · Score: 3, Funny

      The only possible response to that is this Penny Arcade.

      --
      What part of "shall not be infringed" is so hard to understand?
    24. Re:Never had a drive fail by kylemonger · · Score: 1

      My Maxtor drives have lasted the longest so far. No failures across four drives with ages up to 52 months. LaCie is my "never again" brand, with a 100% failure rate across 6 drives in 24 months. Every drive manufacturer seems to go through bad patches where their product just sucks for a while. Buy and pray, basically, because who's good today may suck tomorrow.

    25. Re:Never had a drive fail by pipatron · · Score: 1

      these WILL fail, without failure

      That's not a very epic fail.

      --
      c++; /* this makes c bigger but returns the old value */
    26. Re:Never had a drive fail by crmarvin42 · · Score: 1

      I've also never had a drive fail. That said I've had the enclosure (LaCie and their use of substandard firewire bridges) or its power brick (Cheap radio shack enclosures) fail several times for external drives.

      The first one was bad for 3 years before I realized that it was the enclosure and not the drive, and I returned the drive in the second one before I realized it was the power brick and not the drive itself that was bad. D'oh!

      The only genuinely bad drive I've ever come across came with a second hand blue and white G3 mac 2 years ago; it was bad when I got it, but it was the one that originally shipped with the machine and too small to be useful anyway.

      --
      Bureaucracy expands to meet the needs of the expanding bureaucracy.-Oscar Wilde
    27. Re:Never had a drive fail by reboot246 · · Score: 1
      I've never had one fail either. Among the ones I have that are still working are -- a Quantum 8GB, several Western Digital 20GB & 40GB drives, 20GB Seagate (still use it every day in a SimplyMEPIS system), and a 3GB Maxtor that runs a Damn Small Linux system. These old drives are useful for fun projects, but I'd never trust any of my important data with them. Drives *can* fail; I know because I've heard rumors about failures. :)

      I still have a 20 MB hard drive that I bought for my Atari 520ST way back in the early 90s! It worked the last time I used it a few years ago.

    28. Re:Never had a drive fail by plasmoidia · · Score: 1

      I hope you have up-to-date backups. 'Cause after making a comment like that, you are sure to have a drive or two fail in the coming week...

    29. Re:Never had a drive fail by gweihir · · Score: 1

      This matches my experience with about 50 Maxtors (known to die early when not cooled properly). I lost one in 3 years without apparent cause and two more that had been dropped in shipping (incompetently packaged).

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    30. Re:Never had a drive fail by Urza9814 · · Score: 1

      I've only ever seen two hard drives fail. Both were 80GB Western Digitals, and neither lasted more than a year. I now avoid WD at all costs.

    31. Re:Never had a drive fail by justinchudgar · · Score: 1

      Dirty, spikey power is a much larger problem.
      I agree. The only times I've seen drives die is when there have been utility, UPS or PSU problems. In fact, I just switched to all solid state on a key server because the client is in a rural area where the power can go off for 1-3 hours half a dozen times a year. And, they have been unable to get funding for a generator; so, after the last UPS fried itself and the server's drives and PSU, I switched to flash. The added bonus is that the power draw and heat dissipation are lower which means the server stays up longer anyway.
      --
      WARNING: Smoking this sig may cause lowered IQ, insanity or short term memory loss. It is also really bad for your monit
    32. Re:Never had a drive fail by petermgreen · · Score: 1

      Many people, when building arrays, use identical drives; this is good for performance but bad for data protection. Identical drives (particularly if bought from the same vendor at the same time and therefore likely from the same batch) subjected to almost identical loading (being in an array together) are a recipe for multiple drives failing in the same manner at about the same time.

      --
      note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
    33. Re:Never had a drive fail by bcmm · · Score: 1

      ...the only drive I've ever had partially fail is the one I accidentally launched across a room.
      You CANNOT say something like that and then not tell us how you accidentally launched it across the room!
      --
      # cat /dev/mem | strings | grep -i llama
      Damn, my RAM is full of llamas.
    34. Re:Never had a drive fail by tomhudson · · Score: 1

      I bought the original batch of 4 drives from 2 different retailers in 2 different cities - 3 of them STILL ended up being the same batch. Go figure ...

      I'd adopt Heinlein's thinking (mutatis mutandis) and buy all the drives for one raid from 1 batch. Either they fail, or they don't. If you're lucky, they all last a long time. If you're not, you won't end up with a raid that you have to junk because you can't replace one obsolete bad disk.

      Of course, raid is no replacement for backing up, just as svn isn't (but try to explain that to someone who hasn't been bitten in the you-know-whats :-)

    35. Re:Never had a drive fail by slysithesuperspy · · Score: 1

      I had a similar situation, 2x320gb failed in raid5. And guess what, I sent them back to Seagate for replacements and they kindly sent me 400gb, so I was very happy until they BOTH failed a few days after. Oh, and that was after 1 drive failing a few months before. Now I backup with some external drives.

      I had read on slashdot to buy drives from different batches and I ignored that advice :( That is really the last time I'm going to be a cheapskate

    36. Re:Never had a drive fail by petermgreen · · Score: 1

      Yes, raid isn't a replacement for backups because there are some types of threat it doesn't protect against, but that doesn't mean improving reliability isn't part of its function (if it weren't, everyone would be using RAID 0).

      1: keep the system running after a drive failure
      2: protect the data that is too recent to have been caught in a backup run yet against drive failure
      3: improve performance

      Many people claim they are doing raid for reliability but then go and buy identical drives rather than using a mixture of different brands.

      --
      note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
    37. Re:Never had a drive fail by PhotoGuy · · Score: 1

      I'm about to lug a huge Wang hard drive out to the trash pickup on Monday - weighs over 100 pounds... still runs. Actually it uses removable platters but still...

      Reminds me of when I was a student worker at a large corporation, years ago, and was helping their transition from dedicated Wang word processors to PC-based word processors for the typing pool. It met with a lot of resistance.

      The first time I came across the president of the company, in the elevator after work one day, he started chatting about the transition, and how it was being resisted. Someone made the comment "yeah, those girls in the typing pool really love the Wangs." Never seen an elevator full of people try to stifle so many snickers in front of the prez...

      --
      Love many, trust a few, do harm to none.
    38. Re:Never had a drive fail by tomhudson · · Score: 1

      The "mix-n-match" thing is counter-intuitive with raids.

      If one drive fails, you're going to want to migrate your data asap, no matter what, and put the old drives on the shelf "just in case." More than likely, you'll also be upsizing the drives (You can buy a 750 gig for what a 250 gig cost 2 years ago).

      1. If you get 4 drives, and half last 2 years, and the other half 4, you'll be pulling them at the 2-year point.
      2. Same scenario as above, but the drives are all from one batch, and they all last 2 years - net result is the same - the drives are retired at 2 years.
      3. Same scenario as above, but the drives are all from one batch, and they all last 4 years - the drives are retired at 4 years.

      Since you're going to retire the array at the first failure, mixing from different batches means you've increased the odds that at least one drive fails prematurely.
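
The trade-off being argued here can be sketched with a toy model. It assumes each batch is independently "bad" with some fixed probability, that every drive from a bad batch fails early, and that drives from good batches do not; the 10% figure and the function names are hypothetical, chosen only for illustration.

```python
def p_any_bad(n_drives, p_bad_batch, same_batch):
    """Probability that at least one drive in the array comes from a bad batch."""
    if same_batch:
        return p_bad_batch  # one batch: either every drive is bad or none is
    return 1 - (1 - p_bad_batch) ** n_drives


def p_two_or_more_bad(n_drives, p_bad_batch, same_batch):
    """Probability that two or more drives come from bad batches."""
    if same_batch:
        return p_bad_batch if n_drives >= 2 else 0.0
    none_bad = (1 - p_bad_batch) ** n_drives
    exactly_one = n_drives * p_bad_batch * (1 - p_bad_batch) ** (n_drives - 1)
    return 1 - none_bad - exactly_one


P_BAD = 0.10  # hypothetical chance that any given batch is a bad one
N = 4         # drives in the array, as in the comment's example

for same in (True, False):
    label = "one batch" if same else "four batches"
    print(f"{label}: P(any early failure) = {p_any_bad(N, P_BAD, same):.3f}, "
          f"P(two or more) = {p_two_or_more_bad(N, P_BAD, same):.3f}")
```

In this model, mixing batches raises the chance of seeing at least one early failure (about 0.34 versus 0.10) but lowers the chance of two or more drives being bad at once (about 0.05 versus 0.10). Which number matters more depends on whether the array is retired at the first failure or expected to survive a rebuild.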

    39. Re:Never had a drive fail by Jafafa+Hots · · Score: 1
      Oh, it's mundane... I had the drive outside the comp., sitting on a high table, not in an enclosure, just hooked up with an IDE cable to grab some stuff off of it. Tripped over the cable which acted like a whip, drive shot off the end of it.

      Got about 74% of the data off of it afterward.

      --
      This space available.
    40. Re:Never had a drive fail by Reziac · · Score: 2, Interesting

      I live where the power spikes and sags constantly. My machines are all on UPSs. And each PC has a decent quality PSU. And if a HD runs more than "pleasantly warm" to the touch, it gets its own dedicated fan. Consequently, I firmly believe all HDs are supposed to live A Long Time... the oldest of my 24/7 HDs right now is 10 years old, and has about 80,000 actual hours on it -- Like yourself, I think they're supposed to be worn out before being thrown out. :)

      Of course, yonder is a large stack of backups, which also help increase HD longevity. ;)

      --
      ~REZ~ #43301. Who'd fake being me anyway?
    41. Re:Never had a drive fail by Reziac · · Score: 1

      Hmm... I've been using those "under the HD" fans for 10 years; they REALLY help when a HD runs a bit hot and there's no good way to mount a fan blowing directly across the HD. (Not every PC case was designed by someone who actually puts components inside it. :)

      Even with one that developed Rattly Worn-out Bearings, I didn't observe any wobble (and the HD under it outlived two of those fans, and was just retired at age 8) -- but my HD mounts are solid and don't wiggle even if you bang on them. Some mounts are so cheapassed that they vibrate just from the HD running, and in those cases you're probably right.

      --
      ~REZ~ #43301. Who'd fake being me anyway?
    42. Re:Never had a drive fail by Killjoy_NL · · Score: 1

      Ok, now I'm curious, how do you accidentally launch a hdd across a room?

      Please let there be a funny story behind it :)

      --
      This is the sig that says NI (again)
    43. Re:Never had a drive fail by kasperd · · Score: 1

      many people when building arrays use identical drives, this is good for performance but bad for data protection.
      Sounds like two urban legends to me. Sure if you buy two disks from the same vendor, they could be part of a bad batch. To avoid losing data because of that do an initial stress test of your disks before you start relying on them to store valuable data. If one fails during stress test get it replaced and keep stress testing until you are able to complete the stress test without failures. After that it would be highly unlikely that your raid experiences two simultaneous drive failures unless it is caused by an external event such as a power spike, a short circuit, a kicked cabinet, or something else. Even if drives from the same batch were going to have similar life expectations, they are not going to fail within seconds of each other. And the second one to fail just has to survive long enough for data to be copied to the hot spare.

      There still is a reason why you might see one disk fail and another disk fail before the raid recovery finishes. If one of your disks has a bad sector, it could go unnoticed for a long time, if you never read it. If the bad sector happened to be a parity sector in a raid-5, it is quite likely that it is not going to be read until you actually need it for recovery. So what happens if you have a bad sector on one disk, and another disk in your raid fails? In some raid implementations, the recovery hits the bad sector and marks that disk as failed; now your raid has two failed disks and recovery fails.

      To protect against this you need a few things. First of all the raid must never give up on recovery because of bad sectors, even if a disk has bad sectors and should be replaced, the raid must still try reading other sectors from that disk, if that is the only way to recover data. And even if some sector is unrecoverable, it should just report that as a bad sector to the higher layer, and keep recovering everything else.

      Above is of course not sufficient to prevent a bit of data loss in the event of some bad sectors and one total disk failure. Periodically reading the raw disks to watch out for bad sectors may help, and your raid system might be able to do this automatically. But even better is to have a bit more redundancy. Have three mirrors if you use raid-1, and use raid-6 instead of raid-5. In those cases even with bad sectors spread across all your disks and one total disk failure, you are unlikely to lose data.
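
The recovery behaviour described above (keep rebuilding, and report only the affected sector as bad) can be illustrated with a small sketch of XOR-parity reconstruction. This is a toy model and not any particular raid implementation: blocks are short byte strings, `None` stands for an unreadable sector, and the function names are made up for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_disk(stripes, failed_disk):
    """Reconstruct every block of the failed disk from the survivors.

    Each stripe is a list with one block per disk; None marks an unreadable
    sector. An unrecoverable stripe yields None instead of aborting the rebuild.
    """
    rebuilt = []
    for stripe in stripes:
        survivors = [blk for i, blk in enumerate(stripe) if i != failed_disk]
        if any(blk is None for blk in survivors):
            rebuilt.append(None)  # report this one stripe as bad, keep going
        else:
            rebuilt.append(xor_blocks(survivors))
    return rebuilt


# Three data blocks plus one parity block per stripe (parity kept on the last
# disk for simplicity; real RAID-5 rotates it across disks).
data = [[bytes([s * 16 + d] * 4) for d in range(3)] for s in range(3)]
stripes = [row + [xor_blocks(row)] for row in data]

FAILED = 1                 # disk 1 dies completely
stripes[2][3] = None       # latent bad sector on another disk, in stripe 2

result = rebuild_disk(stripes, FAILED)
print([blk == data[s][FAILED] if blk is not None else "unrecoverable"
       for s, blk in enumerate(result)])
```

Here the first two stripes come back intact and only the stripe with the latent bad sector is lost. An implementation that instead marks the second disk as failed on the first read error would lose all three.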

      You could use raid-1 with three mirrors for things you write frequently and raid-6 for things you don't write that frequently. I was thinking about setting up a file system with data on raid-6 and journal on raid-1, but dropped the idea again because the access patterns in a journal are actually not that bad for raid-6.

      Those performance problems from mixing drives from different vendors, I have never observed, even though I have done raids that way. One advantage you get from using drives from different vendors is, that it improves your chances of being able to find a drive matching in size once you need to replace one. But if you are going to be using larger drives when you replace one in your raid, then that doesn't matter anyway.
      --

      Do you care about the security of your wireless mouse?
    44. Re:Never had a drive fail by h4rm0ny · · Score: 1


      Slightly off-topic, but I recently bought my first computer case that was actually a quality case (an Antec P182) and the difference to what I normally buy (the cheapest I can find) is extraordinary. It's got lots of space around the hard drives with an air-flow design that specifically sucks the air right over them. Impossible to say if it will increase their reliability, but it's certainly not going to be doing any harm.

      --

      Aide-toi, le Ciel t'aidera - Jeanne D'Arc.
    45. Re:Never had a drive fail by Reziac · · Score: 1

      One of the reasons I like RaidMax cases is because the HD bays (and their midtowers have a total of 10 drive bays!) enforce half an inch of free space between drives, so even if you fill ALL the drive bays there is still room for airflow (and the side-mounted case fan swishes air through them well enough, too).

      This is in direct contrast to the typical case (cheap or not!) that unless you only use half the drive bays, crams the HDs together cheek-by-jowl, literally touching one another. So they're baking in one another's heat, and said heat can't dissipate properly even if a fan is blowing directly on the HDs.

      I remember this point from Google's survey of some 100,000 of their own HDs: HDs that run hot have an average lifespan of about 3 years, whereas HDs that run cool have a lifespan of 5 or more years. This exactly matches my own observation of typical system failure rates -- the typical OEM case is DESIGNED to retain heat, and if the OEM system's HD fails, it will do so at about 3 years. Conversely, HDs in clone machines (which usually are not so "efficiently engineered" and spread stuff around inside the case, and are typically MUCH better vented overall) usually survive 5 years or longer.

      Don't believe that OEMs design 'em to run hot? I have a Dell P4 here, one of their more-expensive models ($4000 new -- it was given to me because the previous owner got sick of its problems). With stock cooling, which consisted of one case fan and a shroud aimed at the CPU, it ran so hot that it was unstable. I removed the shroud, and added a proper CPU heatsink/fan** and a case fan. Its running temperature dropped 40F DEGREES, and it stopped crashing.

      ** You can buy a standard HSF to fit the nonstandard Dell mounting arrangement for about $15 from tekgems.com

      The plastic sheathing that's always been on most OEMs and is now popular on clone cases is another problem. Metal dissipates heat, and a large portion (sometimes the majority) of system cooling is heat transfer *through* the metal case. But plastic is an insulator, and the dead airspace between metal interior and plastic exterior is ALSO an insulator. Take a standard oldfashioned metal case, throw a towel over it (don't cover the front or back, only the sides and top) and watch the temperature skyrocket. Then consider whether plastic sheathing is such a wonderful idea after all.

      HDs don't like running at temps below about 60F either, but that's not usually an issue, unless you're an Eskimo :)

      I try to keep my system innards somewhere in the 35C range, and there's little doubt in my mind that this helps longevity. (I don't rush out and buy new shit all the time; I use old paid-for shit til it either dies or is no longer of any use for anything.) I'm writing this on a box whose major innards are almost 10 years old. :)

      --
      ~REZ~ #43301. Who'd fake being me anyway?
    46. Re:Never had a drive fail by aunt+edna · · Score: 1

      Will I buy Samsung again? abso bloody utely not
      Did my newish 160gb SpinRite fail?

      Only good taste reins back the terms of endearment I could use.

    47. Re:Never had a drive fail by petermgreen · · Score: 1

      mixing from different batches means you've increased the odds that at least one drive fails prematurely.
      but decreased the odds of a second drive failure before you can get the system back up to redundant running.

      --
      note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
    48. Re:Never had a drive fail by tomhudson · · Score: 1

      mixing from different batches means you've increased the odds that at least one drive fails prematurely.
      but decreased the odds of a second drive failure before you can get the system back up to redundant running.

      You would think so, but by the time a RAID says it's dying, there's a good chance that at least one other disk has problems too, and you won't be able to recover everything anyway. ISTR that about 15% of all attempts to recover a raid fail because another disk has errors, or dies during the recovery.

      Also, what do you do if 2 or more drives fail because their controllers got zapped, or someone "accidentally" "booted" the box - with their foot? Being from different batches, or different manufacturers, isn't going to help, so might as well go for the scenario that gives the best chance for a longer useable lifetime.

    49. Re:Never had a drive fail by petermgreen · · Score: 1

      there's a good chance that at least one other disk has problems too, and you won't be able to recover everything anyway. ISTR that about 15% of all attempts to recover a raid fail because another disk has errors, or dies during the recovery.
      Frankly, with the way most people build their raid arrays, I'm not surprised. What do you expect to happen if you take a group of identical drives and run them under almost identical load and environmental conditions?

      I would like to see stats on raid recovery failures for raids built with a strict policy to make every drive a different model. I bet they would be much lower.

      --
      note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
    50. Re:Never had a drive fail by tomhudson · · Score: 1

      The problem isn't that the drives fail at the same time. In a lot of cases, the problem is that data that hasn't been accessed from one of the surviving drives is no longer readable (bad sector), but neither the drive electronics nor the raid noticed, simply because no attempt had been made to read it in a while.

      The alternative is to have a background process continually re-read and verify data over the entire disk surface, with the higher load, earlier failure rates, and reduced performance that will cause. In other words, the cure could be worse than, or even cause, the disease. And under heavy load, the process would have to be stopped anyway.

      With the large disk sizes available today, why not have 3 identical drives in a RAID-1, and rotate out 1 drive each day/week/month/whatever-arbitrary-time-floats-your-boat? Or just leave all 3 in - much less wear reading data, much higher performance (3 disk caches, 3 sets of heads, platters, etc., for the same data), and when one fails, you still have a 2-drive raid, or 2 backups of your data to recover from.

    51. Re:Never had a drive fail by Larryish · · Score: 1

      That proves what I have been saying for years... you can't go wrong when you have a huge Wang.

    52. Re:Never had a drive fail by Jesus_666 · · Score: 1

      Recently, I've seen two drives (a 200G Seagate and a 120G Maxtor) die with similar symptoms - they went completely inert. They just don't care whether there is power or not. All drive failures I've seen before had the drive at least attempting to spin up, so I find this unusual. The only lead I have is that both drives have at some point been connected to the same external controller, which has since been retired.

      Any idea what that might be? I guess something on the logic board must have fried so the drives don't even make it to the spinup procedure...

      --
      USE HOT GRITS WITH STATUE OF NATALIE PORTMAN (NAKED AND PETRIFIED)
    53. Re:Never had a drive fail by Anonymous Coward · · Score: 0

      There are those who are careful with their computer, and those who have hard drives fail.

      But seriously, much of the reason why 1331 people are always complaining about hard drives failing is that they subject them to the vibrations of the cooling fans they put in their computers to keep their drives cool. Heat isn't good for hard drives, but vibrations are far worse; put a cooling fan up against a hard drive, and it's going to die.

    54. Re:Never had a drive fail by Dalroth · · Score: 1

      Do what I do.

      4x500GB RAID 1+0, then rsync occasionally to one external USB 1TB drive.

      Keep the external USB drive offsite if you're extra paranoid (friend's house, parents' house, trunk of car, whatever works for you).

      Bryan

    55. Re:Never had a drive fail by Allador · · Score: 1

      That's why:
          1. always partition everything
          2. never use Maxtor drives
          3. never buy Dell

      I would strongly argue that you're learning the wrong lesson here.

      The lesson to be learned is to have your data backed up.

      With a secondary lesson in that if downtime is expensive for you, then have redundancy built in (ie, mirroring or better).
    56. Re:Never had a drive fail by the+brown+guy · · Score: 1

      I have a 20 gig WD hard drive that keeps failing CRCs, which is a total pain in the ass when your 5 GB torrents don't work, but this drive has seen some serious abuse: dropping it, stepping on it, having the computer kicked over many times (while the drive was only secured by the wiring). I think it mostly comes down to how well you take care of your drives. If you leave it in your computer, provide adequate room for airflow to prevent overheating, and stop it from falling over, then hard drives last many years.

      --
      Orbis terrarum est non altus satis
    57. Re:Never had a drive fail by zippthorne · · Score: 1

      Of course, there's an obvious and easy solution to this: heterogeneous ages.

      After six months, replace the first disk whether it needs it or not. (or shorter or longer period depending on the ratio of "disks that can fail safely" to "disks in the array" and how long you desire between complete disk refreshment cycles) Then, replace the next disk after another six months. Continue indefinitely.

      Drive size inflation is handled by buying disks that are integral-stripe-units larger each time and, when all disks have enough additional capacity (basically after one complete cycle, and every upgrade following), growing the array to fill the disks. There will always be (n-1) disks with "wasted" space, but it's a small price to pay for fault tolerance.

      If you have failures requiring unscheduled rebuilding, then you make whatever corrections are necessary to re-establish the original level redundancy and spacing.

      (also, after nearly one complete cycle, you can use the used disks (and one new one) to establish an additional array without the bootstrapping)

      --
      Can you be Even More Awesome?!
    58. Re:Never had a drive fail by tomhudson · · Score: 1

      Of course, there's an obvious and easy solution to this: heterogeneous ages.

      After six months, replace the first disk whether it needs it or not. (or shorter or longer period depending on the ratio of "disks that can fail safely" to "disks in the array" and how long you desire between complete disk refreshment cycles) Then, replace the next disk after another six months. Continue indefinitely.

      That leaves you vulnerable to the high initial failure rate of drives. Most of us have had drives that have failed either out of the box, within a few hours, or in the first few weeks. It's a bit of a crap-shoot. The real answer is always have a good backup.
    59. Re:Never had a drive fail by WuphonsReach · · Score: 1

      That's been my experience as well.

      Killer #1 has always been poor cooling around the drives. And it takes hardly any airflow to keep a drive cool.

      Killer #2 has been poor power quality.

      --
      Wolde you bothe eate your cake, and have your cake?
    60. Re:Never had a drive fail by GWBasic · · Score: 1

      I don't understand how people are always complaining about their hard drives failing. In 30 years it hasn't happened to me yet.

      I've had two drives fail on me in the last three years. The most recent case was interesting. My group at work moved to a new building, and the week after we moved, two drives failed.

      My failure was rather spectacular. I arrived at work, and much to my surprise, iTunes couldn't find my music. After I realized that the drive was dead, my coworkers started arriving and telling me how my hard drive had been making a loud screeching sound the night prior.

  2. There are only two kind of peeps... by **loki969** · · Score: 5, Insightful

    ...those that make backups and those that never had a hard drive fail.

    1. Re:There are only two kind of peeps... by Raineer · · Score: 5, Insightful

      I see it the other way... Once I start taking backups my HDD's never fail, it's when I forget that they crash.

    2. Re:There are only two kind of peeps... by Anonymous Coward · · Score: 0

      What about those of us who are both? Or some people I know who have had 3 failures yet still don't back up (it has gotten to be a joke among their friends).

    3. Re:There are only two kind of peeps... by gparent · · Score: 1

      And I'm part of those who've never had a hard drive fail :)

    4. Re:There are only two kind of peeps... by Metasquares · · Score: 1

      Don't forget about those of us who just keep their important work checked into a remote version control system.

    5. Re:There are only two kind of peeps... by OS24Ever · · Score: 2, Insightful

      More like 'those that never owned an IBM Deskstar drive'

      --

      As a rock-in-roll Physicist once said, No matter where you go, there you are.

    6. Re:There are only two kind of peeps... by Anonymous Coward · · Score: 0
      I use a raid 1 mirror.
      Have had many HD's fail, don't make backups.

      Am I of the "those that make backups" type, as the mirror can be seen as a continual backup, or am I of the "those that never had a harddisk fail", because I've never had a raid 1 mirror fail?

    7. Re:There are only two kind of peeps... by johannesg · · Score: 1

      That counts as backup... No one says it *has* to be tape, you know.

    8. Re:There are only two kind of peeps... by BSAtHome · · Score: 2, Funny

      Real men don't make backups; they cry.

    9. Re:There are only two kind of peeps... by BSAtHome · · Score: 1

      I remember those 75G IBM drives. Had an array of them, totaling 16 drives, 14 of them failed within 12 months.

    10. Re:There are only two kind of peeps... by mikael_j · · Score: 1

      I actually had a few "Deathstars", in fact, I still have one in one of my machines still running fine. Not a single one of them has crashed for me, one made the infamous "click of death" but then just kept on running...

      I always love mentioning that when people say IBM hardware is of poor quality (generally gamers and similar people who will never again touch any product by a company if they so much as hear a rumour about one of their products failing a bit too often, yet they'll gladly buy the cheapest possible parts and bitch that anything that isn't consumer-grade crap is overpriced).

      /Mikael

      --
      Greylisting is to SMTP as NAT is to IPv4
    11. Re:There are only two kind of peeps... by Nefarious+Wheel · · Score: 1

      I use a raid 1 mirror. Have had many HD's fail, don't make backups. Am I of the "those that make backups" type, as the mirror can be seen as a continual backup

      There are at least two ways of looking at backups. By using a RAID device you are protecting your system against catastrophic hardware failure; by using remote media (tape or remote VCS) you are protecting your data. Two separate things. For data that's important, you have to guard against young Bobby Tables running a script that renders your database or file unusable, and for that you need the ability to roll back to a previous version. RAID1 doesn't do that by itself unless you use a 3-disk RAID1 set and periodically drop the third mirror (i.e. have it spun down during the work day).

      --
      Do not mock my vision of impractical footwear
    12. Re:There are only two kind of peeps... by asuffield · · Score: 1

      Once you have reliable hardware that won't trash your data, you can build your idiot-protection in software on top of that: simply take nightly snapshots of the filesystem contents onto a different region of the drive. I use rsync's hardlink-forest mode for this.
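
      A minimal sketch of the nightly-snapshot idea above, assuming that "hardlink-forest mode" refers to rsync's --link-dest option. The paths and the snapshot_command helper are hypothetical, and the function only builds the command line; it does not run rsync.

```python
from datetime import date
from pathlib import PurePosixPath
from typing import List, Optional


def snapshot_command(source: str, snapshot_root: str,
                     today: date, previous: Optional[date]) -> List[str]:
    """Build an rsync command that copies `source` into a dated snapshot directory.

    With --link-dest, files unchanged since the previous snapshot are
    hard-linked instead of copied, so each snapshot only costs the space
    of the files that changed.
    """
    root = PurePosixPath(snapshot_root)
    cmd = ["rsync", "-a"]
    if previous is not None:
        cmd.append(f"--link-dest={root / previous.isoformat()}")
    # A trailing slash on the source copies its contents, not the directory itself.
    cmd.append(source.rstrip("/") + "/")
    cmd.append(str(root / today.isoformat()) + "/")
    return cmd


# Hypothetical example paths; pass the result to subprocess.run() to execute it.
print(snapshot_command("/home/me", "/snapshots", date(2007, 4, 2), date(2007, 4, 1)))
```

      Keeping the snapshots on the same drive, as the post describes, protects against mistakes but not against losing the drive itself.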

      After that you only need to worry about area-destructive effects, like major fires - and at that point, you may well decide that your data is not important enough to protect it against your house burning down, because when that happens your mail is the least of your problems.

      External backup media has its applications, but it is not universally required. It is also frequently unreliable due to the human element, while a fully automated system that solves the same problems can be far more reliable.

    13. Re:There are only two kind of peeps... by petermgreen · · Score: 1

      You also have to worry about things that kill everything in the PC, in particular if a high voltage somehow gets on the machine's power rails.

      Sure you may be able to get the data back with a controller board swap but finding the exact right boards may be a PITA and the high voltages may have fried parts that aren't on the controller board, requiring professional recovery.

      Further, if you use multiple identical drives in your array (lots of people do this because it means better performance and more efficient use of disk space) there is a very real risk of multiple failures close together.

      --
      note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
    14. Re:There are only two kind of peeps... by squidinkcalligraphy · · Score: 4, Insightful

      "Backups are for wimps. Real men upload their data to an FTP site and
      have everyone else mirror it." -Linus Torvalds

      --
      "I think it would be a good idea" Gandhi, on Western Civilisation
    15. Re:There are only two kind of peeps... by jamesh · · Score: 1

      Real men don't make backups; they cry.

      That is one of the funniest comments I've read this year :)

      I guess nothing else is going to make a man cry more than the loss of his pr0n collection.
    16. Re:There are only two kind of peeps... by Anonymous Coward · · Score: 0

      I remember them too, mostly because the 75GB Deskstar drive that I bought almost 9 years ago is still working today.

    17. Re:There are only two kind of peeps... by Anonymous Coward · · Score: 0

      always love mentioning that when people say IBM hardware is of poor quality

      Yes, because your few anecdotes outweigh the massive amount of evidence out there that IBM's "Deathstar" series had a horrendous failure rate.

      Mind you, I agree that it's stupid to never buy from a particular company again just because they had a bad run of devices. I feel that Maxtor gets a similar bad rap around here -- they had a run of bad drives in the early 90's, and now look at the stigma they have here on Slashdot. I avoided buying any drives in their bad series, and I've never had a Maxtor fail on me.

    18. Re:There are only two kind of peeps... by HardCase · · Score: 1

      ...those that understand what MTBF means and those that don't.

    19. Re:There are only two kind of peeps... by thefekete · · Score: 1

      The way I see it, with all types of "insurance" (e.g. backups) you are not paying IN CASE something happens, you are really just paying Murphy through a third party to prevent such mishaps.

      --
      The cool things is to have windows that bounce up and down like a good tits.
    20. Re:There are only two kind of peeps... by OS24Ever · · Score: 1

      First off, I didn't call them deathstars, you did.

      Secondly, I didn't say IBM was poor hardware quality. I stated a well known fact that there were issues with those drives; in fact there was a lawsuit about it and IBM settled the suit. So there was something to it. I'm happy you had better results. I didn't enjoy that luck.

      I owned 1x75GB and 4x250GB drives. The 75GB failed, and 2 of the 250GB drives failed. Because I didn't read the part about 'no peanuts' my 75GB was voided and it was never fixed. The second time I sent the 2x250GBs in I made sure not to make that mistake, and they were fixed under warranty and worked until I got rid of the server they were in.

      I now happen to buy Seagate hard drives for my personal use, but it happens that the reason I do that is IBM no longer is in the hard drive business so my incentive to buy them is not so high any more.

      What was that incentive? I happen to work for them. One of those 'eat your own dogfood' kinda things in that if I can work for them, I can use the stuff in my personal projects. That being said my response was a personal one, on my own time, and didn't express any IBM opinion on the matter.

      I'm just expressing my own opinion that the original post was rather dubious, because I find it hard to believe that anyone never had a hard disk fail, ever. I've been using computers now for almost 30 years, starting with my little Atari 800. Saying that in all that time you never had a disk fail says you either never used computers, or you should buy a lottery ticket. I can remember a 10MB hard drive actually smoking in the late 80s in a Compaq Plus, and the excitement of getting it replaced with a 'hard card' that had a whoppin 20MB on it.

      Since then, I've had hard disks that never failed the entire time I owned a system, and I've had others crash and burn within weeks of getting them. It's impossible to expect something that does the equivalent of a 747 flying over water 2ft above ground at mach 20 or whatever the stat is to not break every once in a while.

      --

      As a rock-in-roll Physicist once said, No matter where you go, there you are.

    21. Re:There are only two kind of peeps... by Nefarious+Wheel · · Score: 1
      I tend to think more in the enterprise context, because that's kind of where I live. Banks take it to extremes - block replication of everything in the SAN to a dupe across the city. That sort of combines off-site backup with mirroring. Saving the database logs pretty much perpetually in a version control system covers the databases, plus backup of individual file shares by more common methods (although disk is increasingly preferred to tape).

      Nowadays you're pretty much encouraged to consider your laptop/desktop storage as nothing more than a working cache, often times by requiring documents to be stored / indexed on some form of doc control system such as Documentum or similar.

      A good place to start reading is to hit the EMC web site and follow the links and reference terms you read there, either Wiki or JFGI. Interesting subject, storage infrastructure, and a good career path that covers boring to brilliant solutions.

      --
      Do not mock my vision of impractical footwear
  3. Marketplace can't function without good data by dpbsmith · · Score: 5, Insightful

    If everyone knows how much a disk drive costs, and nobody can find out how long a disk drive really will last, there is no way the marketplace can reward the vendors of durable and reliable products.

    The inevitable result is a race to the bottom. Buyers will reason they might as well buy cheap, because they at least know they're saving money, rather than paying for quality and likely not getting it.

    1. Re:Marketplace can't function without good data by Anonymous Coward · · Score: 0
      The problem is that the consumers are idiots. How many of them do you suppose understand what Mean Time Between Failure statistics indicate? Example:

      Using mean time between failure rates suggest that disks can last from 1 million to 1.5 million hours, or 114 to 170 years
      That is obviously wrong.
    2. Re:Marketplace can't function without good data by piojo · · Score: 1

      The inevitable result is a race to the bottom. Buyers will reason they might as well buy cheap, because they at least know they're saving money, rather than paying for quality and likely not getting it.

      That's the description of a lemon market. However, I don't think it applies here, because brands gain reputations in this realm. If one brand of hard drives becomes known as flaky, people (and OEMs) will stop buying it.
      --
      A cat can't teach a dog to bark.
    3. Re:Marketplace can't function without good data by commodoresloat · · Score: 3, Interesting

      If everyone knows how much a disk drive costs, and nobody can find out how long a disk drive really will last, there is no way the marketplace can reward the vendors of durable and reliable products.

      And that may be the exact reason why the vendors are providing bad data. On the flip side, however, if people knew how often drives failed, perhaps we'd buy more of them in order to always have backups.
    4. Re:Marketplace can't function without good data by Jeff+DeMaagd · · Score: 1

      For the most part, buying a more expensive drive doesn't necessarily mean it's more reliable. The Google paper on the subject said that they saw no significant difference between the regular desktop drives and the pricey Fiber Channel drives.

    5. Re:Marketplace can't function without good data by Jeff+DeMaagd · · Score: 1

      Isn't it the very point? By not saying what MTBF really means, it's another way to dupe even the smart people.

      The way I understand MTBF to really be, one in ten drives failing might translate to an MTBF of 30 years, assuming the drive is replaced at the end of a 3 year service life, or 50 years assuming a 5 year service life.

    6. Re:Marketplace can't function without good data by Anonymous Coward · · Score: 0

      I don't think you quite understand what MTBF means.

      MTBF really does refer to the average number of operating hours between each failure; the kicker is, they're not talking about a single drive, they're talking about any reasonably large population of drives. For example, let's say you're an HD manufacturer, and you're running a MTBF test. The test is performed with a population of 1000 drives and is run for 1000 hours (therefore a total of 1000x1000 = 1M operating hours are accumulated). During that time, 1 drive fails. You have now established a 1,000,000 hour MTBF figure for that drive model. (Not very well -- in real MTBF testing one would probably want to accumulate more than one failure.)
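
      The arithmetic of that test can be sketched in a few lines. This is only an illustration of the calculation described above; the function name is made up here, and it makes the simplifying assumption that every drive counts for the full test duration.

```python
def mtbf_hours(drives: int, test_hours: float, failures: int) -> float:
    """Estimate MTBF as total accumulated operating hours divided by observed failures."""
    if failures <= 0:
        raise ValueError("need at least one observed failure to estimate MTBF")
    return drives * test_hours / failures


# The example from the post: 1000 drives, 1000 hours, 1 failure.
print(mtbf_hours(1000, 1000, 1))  # 1000000.0
```

      Note how sensitive the estimate is to the failure count: a second failure in the same test would halve the figure to 500,000 hours.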

      For the end user of a single drive, MTBF isn't too meaningful. All you can take out of it is that a bigger figure means the drive is less likely to fail during the time you own it.

      This doesn't mean it's a bad thing that drive manufacturers use it so much. MTBF is a great specification for people who use large numbers of identical disks -- data center operators, for example. It's also very useful to system integrators (Dell, HP, etc.) who use it to predict return rates and warranty costs (and probably even insist on penalty clauses in their contracts so they can recover some money if the failure rate is significantly worse than the MTBF figure suggests).

    7. Re:Marketplace can't function without good data by rHBa · · Score: 1

      Okay, I'm assuming that the manufacturers get their data by testing their drives in perfect conditions i.e consistent power supplies, no physical shock, hermetically sealed rooms etc.

      And your average user probably has no UPS and a good proportion of those are laptop/external HDDs which obviously go through a significant amount of physical shock.

      So unless we can remove the 'average users' statistics from the independent data then we are comparing apples© and oranges...

    8. Re:Marketplace can't function without good data by petermgreen · · Score: 3, Insightful

      An MTBF is only meaningful when combined with an operating lifespan over which it was measured and after which it is advised that customers needing high reliability replace their drives.

      Also the manufacturer needs to specify the conditions of the test, temperature, humidity etc., and customers requiring reliability need to ensure they run near those conditions.

      If you do a 1000 hour test and all your drives have a design fault that causes a large proportion of them to fail after about 5000 hours of usage, you probably won't notice the fault, but 7 months down the line customers who run the drive 24/7 will.

      The problem is of course that by the time you have done proper testing (= running the drives for their expected lifespan under realistic operating conditions and seeing what proportion fail during that time and when) for a device with an expected lifetime in years, the device is obsolete.

      --
      note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
    9. Re:Marketplace can't function without good data by terrymr · · Score: 1

      Surely that's Apples®

    10. Re:Marketplace can't function without good data by Fweeky · · Score: 1

      On the other hand, another study found FC drives to have a 10 times lower bit error rate than SATA drives.

    11. Re:Marketplace can't function without good data by aunt+edna · · Score: 1

      Thanks. I so liked your post. Wish I could just get to it like that. Go ahead, write some more. :-)

    12. Re:Marketplace can't function without good data by HardCase · · Score: 1

      Completely not correct.

      MTBF in this case is the amount of time that will pass before a drive fails, assuming that it is properly maintained and replaced as the end of its useful lifetime is reached. That failure number is the mean number of hours that pass before a failure occurs under those conditions. It doesn't mean that the drive will last for hundreds of years and it isn't a measure of how many drives were tested (although the number of drives tested does determine the accuracy of the statistic). It just means that if you use the drive properly, it should be a long time before one dies unexpectedly.

      Now the real question is...what is the useful lifetime of the drive?

    13. Re:Marketplace can't function without good data by canuck57 · · Score: 1

      The inevitable result is a race to the bottom. Buyers will reason they might as well buy cheap, because they at least know they're saving money, rather than paying for quality and likely not getting it.

      I cheat. I ask the storage administrators I know. One place I know is really good, they see drives from everywhere. You get a feeling on which drives are better than others. About 5-7 years ago one manufacturer really messed up, gaining a nickname "death....". I still don't buy them today. Another I stopped buying went out of business years ago.

      I know google collects these stats but getting the inside on it is tough. But here is a glimpse of what they have.

  4. MTBF For Unused Drive? by sarahbau · · Score: 1

    Maybe they mean the MTBF for drives that are just on, but not being used. I've never put any stock into those numbers, because I've had too many drives fail to believe that they're supposed to be lasting 100 years. I've had 3 die in the last 3 years alone (all in my server, so probably getting more than average use, but still...)

    1. Re:MTBF For Unused Drive? by zappepcs · · Score: 4, Interesting

      The problem is that the MTBF is calculated on an accelerated lifecycle test schedule. Life in general does not actually act like the accelerated test expanded out to 1day=1day. It is an approximation, and prone to errors because of the aggregated averages created by the test.

      On average, a disk drive can last as long as the MTBF number. What are the chances that you have an average drive? They are slim. Each component in the drive, every resistor, every capacitor, every part has an MTBF. They also have tolerance values: that is to say they are manufactured to a value with a given tolerance of accuracy. Each tolerance has to be calculated as one component out of tolerance could cause failure of complete sections of the drive itself. When you start calculating that kind of thing it becomes similar to an exercise in calculating safety on the space shuttle... damned complex in nature.

      The tests remain valid because of a simple fact. In large data centers where you have large quantities of the same drive spinning in the same lifecycles, you will find that a percentage of them fail within days of each other. That means that there is a valid measurement of the parts in the drive, and how they will stand the test of life in a data center.

      Is your data center an 'average' life for a drive? The accelerated lifecycle tests cannot tell you. All the testing does is look for failures of any given part over a number of power cycles, hours of use etc. It is quite improbable that your use of the drive will match that of the expanded testing life cycle.

      The MTBF is a good estimation of when you can be certain of a failure of one part or another in your drive. There is ALWAYS room for it to fail prior to that number. ALWAYS.

      Like any electronic device for consumers, if it doesn't fail in the first year, it's likely to last as long as you are likely to be using it. Replacement rates of consumer societies mean that manufacturers don't have to worry too much about MTBF as long as it's longer than the replacement/upgrade cycle.

      If you are worried about data loss, implement a good data backup program and quit worrying about drive MTBFs.

    2. Re:MTBF For Unused Drive? by WaltBusterkeys · · Score: 2, Insightful

      Great post above. It also depends on how you count "failure." I've had external drives fail where the disk would still spin up, but the interface was the failure point. I took the disk out of the external enclosure and it worked just fine with a direct IDE (I know, who uses that anymore?) connection.

      If I were running a data-based business I'd count that as a "failure" since I had to go deal with the drive, but the HD company probably wouldn't since no data was permanently lost.

    3. Re:MTBF For Unused Drive? by NovaSupreme · · Score: 1

      MTBF has *nothing* to do with life expectancy. It's the failure rate of good drives that are not expected to fail. More precisely, it's the failure rate of the drives during the period when their failure rate is constant (that excludes the high failure rates at the beginning and at the end of life). I can make hard drives that are guaranteed to work for 1 day, tell that to my customers and their MTBF will be infinity, since there will never be unexpected failures! Another example: the MTBF of a healthy adult in the USA may be 10000 years (chances of road accidents are small) but life expectancy is only 80 years!

    4. Re:MTBF For Unused Drive? by BSAtHome · · Score: 2, Insightful

      There is another failure rate that you have to take into account: unrecoverable bit-read error-rate. This is detected as an error in the upstream connection, which can cause the controller to fail the drive. An unrecoverable read fails the ECC mechanism and can under circumstances be recovered by performing a re-read of the sector.

      The error rate is on the order of one unrecoverable error per 10^14 bits read. On a busy system reading 1 MByte/s, that gives you approx. 10^7 seconds between unrecoverable read failures, or about 3 per year on average. So, forget MTBF on busy systems and hope that your controller is able to do re-reads on a disk. Otherwise, your busy system/array is not going to last very long.
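
      A quick check of that arithmetic, as a sketch only. It assumes 1 MByte means 10^6 bytes and that reads run continuously all year; the function name is made up for illustration.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def unrecoverable_read_stats(bits_per_error: float, bytes_per_second: float):
    """Return (seconds between unrecoverable read errors, expected errors per year)."""
    bits_per_second = bytes_per_second * 8
    seconds_per_error = bits_per_error / bits_per_second
    errors_per_year = SECONDS_PER_YEAR / seconds_per_error
    return seconds_per_error, errors_per_year


# Figures from the post: one error per 10^14 bits, 1 MByte/s sustained reads.
seconds, per_year = unrecoverable_read_stats(1e14, 1e6)
print(f"{seconds:.3g} s between errors, {per_year:.2f} errors per year")
```

      This comes out at roughly 1.25 x 10^7 seconds between errors and about 2.5 errors per year, which matches the rounded figures in the post. A system reading ten times faster would see ten times as many.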

    5. Re:MTBF For Unused Drive? by Anonymous Coward · · Score: 0

      No, that's not how MTBF for hard drives is calculated. And it's also not the "average" number of hours someone can expect from an "average" hard drive.

      See here for an extensive discussion, http://forums.storagereview.net/index.php?showtopic=18811 I'm sure there's more info in that forum.

    6. Re:MTBF For Unused Drive? by mollymoo · · Score: 4, Informative

      Maybe they mean the MTBF for drives that are just on, but not being used. I've never put any stock into those numbers, because I've had too many drives fail to believe that they're supposed to be lasting 100 years.

      If you think an MTBF of 100 years means the disk will last 100 years you're bound to be disappointed, because that's not what it means. MTBF is calculated in different ways by different companies, but generally there are at least two numbers you need to look at, MTBF and the design or expected lifetime. A disk with an MTBF of 200 000 hours and a lifetime of 20 000 hours means that 1 in 10 are expected to fail during their lifetime, or with 200 000 disks one will fail every hour. It does not mean the average drive will last 200 000 hours. After the lifetime is over all bets are off.

      In short, the MTBF is a statistical measure of the expected failure rate during the expected lifetime of a device, it is not a measure of the expected lifetime of a device.
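
      The "1 in 10" figure above is simply lifetime divided by MTBF. A minimal sketch that checks it, under the added assumption (not stated in the post) of a constant failure rate, where the fraction failing by time t is 1 - exp(-t / MTBF):

```python
import math


def fraction_failing(mtbf_hours: float, lifetime_hours: float) -> float:
    """Expected fraction of drives that fail within the lifetime.

    Assumes a constant failure rate of 1/MTBF (exponential model).
    """
    return 1.0 - math.exp(-lifetime_hours / mtbf_hours)


# The example from the post: MTBF of 200 000 hours, lifetime of 20 000 hours.
print(round(fraction_failing(200_000, 20_000), 3))  # 0.095
```

      For lifetimes much shorter than the MTBF the simple ratio and the exponential model agree closely, which is why "1 in 10" is a fair summary here.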

      --
      Chernobyl 'not a wildlife haven' - BBC News
    7. Re:MTBF For Unused Drive? by SuperQ · · Score: 3, Informative

      MTBF is NOT calculated for a single drive. MTBF is calculated based on an average for ANY pool size of drives.

      If you have 10,000 drives, and the failure is 1 in 1,000,000 hours, you will have a failure every 100 hours.
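
      The same scaling written out, as a rough sketch. The function names are made up here, and the per-drive figure assumes 24/7 operation and a failure rate low enough that the simple ratio holds.

```python
HOURS_PER_YEAR = 24 * 365


def hours_between_fleet_failures(mtbf_hours: float, drives: int) -> float:
    """Average time between failures somewhere in a pool of identical drives."""
    return mtbf_hours / drives


def annual_failure_fraction(mtbf_hours: float) -> float:
    """Approximate fraction of drives expected to fail per year of 24/7 use."""
    return HOURS_PER_YEAR / mtbf_hours


print(hours_between_fleet_failures(1_000_000, 10_000))   # 100.0
print(round(annual_failure_fraction(1_000_000) * 100, 3))  # percent per year
```

      A 1,000,000 hour MTBF therefore corresponds to a bit under 1% of drives failing per year, which is the kind of vendor figure the studies in the summary are being compared against.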

      Here's a good document on disk failure information:
      http://research.google.com/archive/disk_failures.pdf
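
      The pool arithmetic can be sketched as follows. This assumes every drive is powered on around the clock and that the quoted MTBF holds, which the studies in the story dispute; the function names are hypothetical.

```python
HOURS_PER_YEAR = 24 * 365


def hours_between_failures(mtbf_hours, pool_size):
    """Average gap between failures somewhere in a pool of identical drives."""
    return mtbf_hours / pool_size


def annualized_failure_rate(mtbf_hours):
    """Approximate fraction of always-on drives failing per year.

    Linear approximation, reasonable while the MTBF is much longer than a year.
    """
    return HOURS_PER_YEAR / mtbf_hours


def expected_failures_per_year(mtbf_hours, pool_size):
    """Expected number of failed drives per year across the whole pool."""
    return pool_size * annualized_failure_rate(mtbf_hours)


print(hours_between_failures(1_000_000, 10_000))
print(f"{annualized_failure_rate(1_000_000):.2%}")
print(f"{expected_failures_per_year(1_000_000, 10_000):.1f}")
```

      For 10,000 drives at 1,000,000 hours this gives one failure per 100 hours, an annual rate a little under 1%, and about 88 dead drives per year.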

    8. Re:MTBF For Unused Drive? by hkfczrqj · · Score: 1

      On average, a disk drive can last as long as the MTBF number. What are the chances that you have an average drive? They are slim. Each component in the drive, every resistor, every capacitor, every part has an MTBF. They also have tolerance values: that is to say they are manufactured to a value with a given tolerance of accuracy. Each tolerance has to be calculated as one component out of tolerance could cause failure of complete sections of the drive itself. When you start calculating that kind of thing it becomes similar to an exercise in calculating safety on the space shuttle... damned complex in nature.

      Read up on some "Reliability Theory." It's not that complex, and it can give simple and meaningful results, even on complex systems. BTW, I'd love to get a hold of hard disk data and run it through some math. It'll probably be very obvious why they were withholding the data in the first place.
    9. Re:MTBF For Unused Drive? by baggins2001 · · Score: 1

      I would be interested in this also and have asked an administrator at a server farm about this. He said that out of about 10000+ computers they have about 15 failures per week. They are continuously replacing about 60 per week (upgrades).
      They hardly look at them. They just yank them out and put in another one. The people that would actually know are the companies that sell them computers, because they may be the ones investigating the failures.
      Most of their time was spent on efficiency of replacing downed systems and software and OS updates. Those were the things that usually gave them the most headaches.
      I found out he was an administrator for a server farm, when he started telling me a story about how one of the guys there pushed out the wrong updates to 1500 computers and they all crashed.

      --
      He who said 1,000,000 monkeys on 1,000,000 typewriters would eventually type the great novel, never saw an AOL chat room
    10. Re:MTBF For Unused Drive? by greyhueofdoubt · · Score: 1

      >>if it doesn't fail in the first year, it's likely to last as long as you are likely to be using it.

      For anyone not familiar with this concept, it's known as a bathtub curve:
      http://en.wikipedia.org/wiki/Bathtub_curve

      This applies as much to hard drives as it does to cars, buildings, etc.
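
      For a feel of the shape, here is a toy model of such a curve: an early-failure term that fades, a flat floor, and a wear-out term that grows with age. Every coefficient below is invented purely for illustration and is not fitted to any drive data.

```python
import math


def bathtub_hazard(t_years, infant=0.08, decay=2.0, baseline=0.02,
                   wearout=0.002, wear_power=3.0):
    """Illustrative failure rate (failures per drive-year) at age t_years.

    Sum of three terms: early failures that die away exponentially,
    a constant random-failure floor, and wear-out that rises with age.
    All default coefficients are made-up values.
    """
    early = infant * math.exp(-decay * t_years)
    late = wearout * t_years ** wear_power
    return early + baseline + late


for age in range(0, 7):
    print(f"year {age}: {bathtub_hazard(age):.3f} failures per drive-year")
```

      The printed rate starts high, bottoms out after a year or two, and then climbs again, which is the bathtub.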

      God I love data that can be represented on curves... :)

      -b

      --
      No offense, but I've stopped responding to AC's.
  5. Never had a drive *not* fail. by Murphy+Murph · · Score: 4, Informative

    I've gone through many over the years, replacing them as they became too small - still using some small ones many years old for minor tasks, etc. and the only drive I've ever had partially fail is the one I accidentally launched across a room.

    My anecdotal converse is I have never had a hard drive not fail. I am a bit on the cheap side of the spectrum, I'll admit, but having lost my last 40GB drives this winter I now claim a pair of 120s as my smallest.
    I always seem to have a use for a drive, so I run them until failure.

    --
    I dub thee... Sir Phobos, Knight of Mars, Beater of Ass.
    1. Re:Never had a drive *not* fail. by neumayr · · Score: 1

      Wow, you had 40GB drives last this long?
      Impressive. All mine were of the IBM Deathstar type :-/

      --
      Truth arises more readily from error than from confusion. -Francis Bacon
    2. Re:Never had a drive *not* fail. by hcmtnbiker · · Score: 1

      My anecdotal converse is I have never had a hard drive not fail. I am a bit on the cheap side of the spectrum, I'll admit, but having lost my last 40GB drives this winter I now claim a pair of 120s as my smallest. I always seem to have a use for a drive, so I run them until failure.

      If this were the case, I would seriously consider looking for a problem that's not directly related to the hard drives themselves. Around 80% of HDD failures are controller board failures, so I wonder if maybe your setup is experiencing electrical problems, brownouts or surges that might mess with the controller boards. I myself have never had an HDD fail on me, even with constant abuse.

      --
      If i had one dollar for every brain you dont have, i would have $1.
    3. Re:Never had a drive *not* fail. by Murphy+Murph · · Score: 1

      I have very clean power, and use UPSs to boot. I believe I simply use them much longer than average. How many drives have you had running 24x7x365 for seven years?

      --
      I dub thee... Sir Phobos, Knight of Mars, Beater of Ass.
    4. Re:Never had a drive *not* fail. by Anonymous Coward · · Score: 0

      I CAN'T type today!
      24x7x52

    5. Re:Never had a drive *not* fail. by kesuki · · Score: 1

      Well, see, I read Slashdot when I bought my 80 gig drives, right about the time they started calling them IBM Deathstars, so I have two really nice 80 gig IBM drives that came from their OTHER plant, and they're doing real nice. FWIW, my smallest HDD that works is a 3.5" 4.3 GB Maxtor. I have researched every HD I ever bought since losing almost 2 GB of irreplaceable data to my first ever HD failure. (It was a Maxtor, ironically; I RMAed it and got the 4.3 GB that is currently my smallest. They were no longer selling the 2 GB models, so: free upgrade.)

      I'm also somewhat good at using optical media for backup (I used to use zip drives, then tape, now optical discs), but it didn't stop me from loosing about 120 GB* of data this year to 5, count 'em, 5 HD failures. Only 1 of them was electrocuted, by me buying a cheap PSU to try to get my long-unused 'buggy asus' dual AMD MP 2000+ system to run, because I figured it needed a 500 watt ATX 1.4 PSU, something I ran across online last year. (It shipped with a 400 watt PSU because that was the best ATX 1.4 PSU the company I bought from had, even though the damn system locked up every 45 minutes of run time, no matter the OS, because of under-voltage.) It was actually a really nice 400 watt PSU, but 400 watts didn't cut it for those watt-hungry Athlon MPs, not even with 1 HD, no optical drive, and minimal fans.

      To complicate things, 1 TB of my optical media is infected with a nasty Windows rootkit (sigh), but I found a way to cleanse my files safely: a Linux machine plus a 'clean' Windows USB drive enclosure. Sadly I'm missing part 3, a GOOD (Google Mail good) Windows rootkit scanner for Linux. No, the stupid scanner people recommend for cleaning email sucks and can't detect the rootkit. I've yet to find a Windows solution that can detect it; so far only diff can tell a system that is infected by exposure from one that isn't, based on files that update for no reason when optical media is inserted. But detecting the files it modifies != detecting the rootkit. All I know is Gmail detects it and no virus/rootkit scanner I've tried has, and I don't have time to send 1 TB of data through Gmail, not even on cable.

      * Not total capacity. Total capacity was 210 GB or so, but I only lost around 120 GB (max), since I was pulling one drive; of the rest, 2 were test drives in test setups (no data loss), 1 was electrocuted in my Asus POS workstation from hell (no loss), and 1 was already backed up and formatted when it started corrupting data on a test system during configuration (no loss). I'm not sure what was on the 120 GB I lost, so I might have had most of it backed up, with the exception of about 20-30 GB. (I was backing up those files when it failed. Ahh, sigh, for not backing up my old server drives before they got 8+ years old.)

    6. Re:Never had a drive *not* fail. by autocracy · · Score: 1

      I've heard stories of the Deathstars. Thus far, I've only lost iPod hard drives (though those failures started a week after I mentioned "I've never lost a drive before."). I've been running twin 18gig IBM De(ath|sk)star SCSIs for something on the order of 6 years now. Right now I have the laptop (80g), external (close to 500g), and twin SCSI system. Going to be turning up a 6 spindle set of 18 gig drives soon. Anyway, I suppose I should double-check my backups now.

      --
      SIG: HUP
    7. Re:Never had a drive *not* fail. by Depili · · Score: 3, Interesting

      The Deathstars were all 80 GB PATA disks, manufactured by a single plant. I had 8 of them; all failed.

    8. Re:Never had a drive *not* fail. by neumayr · · Score: 2, Insightful

      *blink*

      Okay, when I think of backup, it's data backup.
      I wouldn't backup applications or operating systems, just their configuration files.
      Anyway, what I'd try doing is diff(1)ing all those backed up system files with the originals.

      Or am I missing something completely, and it's some weird rootkit that's embedded in some wm* media file?

      --
      Truth arises more readily from error than from confusion. -Francis Bacon
    9. Re:Never had a drive *not* fail. by KillerBob · · Score: 2, Insightful

      I still have a working 10MB hard drive from an IBM 8088... >.> (and yes, that system still works too, complete with the Hercules monochrome graphics and orange scale CRT)

      --
      If you believe everything you read, you'd better not read. - Japanese proverb
    10. Re:Never had a drive *not* fail. by Anonymous Coward · · Score: 0

      But does it run Linux?

    11. Re:Never had a drive *not* fail. by eWarz · · Score: 0, Offtopic

      No Offense, but I currently have a Core 2 Quad, 4 GB RAM, 2 500 GB Seagate Drives, 1 250 GB Hitachi Desk Star, 1 80 GB Maxtor, a GeForce 8800 GT and i'm pulling 270-300 watts while GAMING. My PS is a 430 watt antec and i've had 0 problems from my rig in general, and i game A LOT. I also overclocked my Core 2 Quad to 3 Ghz recently. The system is rock solid stable. I've NEVER lost a hard drive. Even the hard drive in my old IBM PS/2 still works. The key is a nice clean energy source and proper cooling.

    12. Re:Never had a drive *not* fail. by dgatwood · · Score: 2, Interesting

      One drive, 24x7, approx. 12 years. Seagate. Why?

      --

      Check out my sci-fi/humor trilogy at PatriotsBooks.

    13. Re:Never had a drive *not* fail. by Mathinker · · Score: 1

      Of course! He runs it zippety quick in a virtual machine under Vista!

    14. Re:Never had a drive *not* fail. by FrankieBaby1986 · · Score: 1

      Have a Gateway from 1998 (PIII 733) with a 7200 RPM, 30 gig HD; can't remember the brand, think it is Seagate. This machine has been ON, with yearly breaks of about a week (vacation), since we bought it. It has even been pushed off a table onto a cement floor while running! That was a year ago, and (sadly) it is still the family computer!

      --
      ERROR: SIG NOT FOUND (A)bort, (R)etry, (F)ail?:
    15. Re:Never had a drive *not* fail. by Epistax · · Score: 1

      Judging by your nick, you aren't representative of everyone. Just anyone who happens to read this message.

    16. Re:Never had a drive *not* fail. by KillerBob · · Score: 4, Informative

      Admittedly, it's a different environment entirely than what you're running, but let me see if I can shed some light on it for you....

      I administer a small server, which runs its services in virtual sandboxes. One physical box, but through KVM the Apache/PHP/MySQL is in one sandbox, the SMTP/IMAP is in another, etc. Each VM image is about 20GB, give or take, and the machine has two physical hard drives. My backup is periodic, and incremental. And the backup alternates between the drives... at any given time each hard drive will have two copies of every VM, not counting the one that's actually running.

      Now... here's where the full system backup comes in: because it's a virtual machine, it's only a single 20GB file. Backing it up is as easy as shutting down the VM and copying the file. Recovering from a backup is where it gets even easier... all I have to do is copy that one file back, and start it up. Poof. *everything* is back the way it was at the time of the backup. Total time to recover? Less than a minute.
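
      A minimal sketch of that rotation in Python, assuming plain full copies of the image file (not the incremental scheme mentioned above) and leaving the VM shutdown and restart to the caller, since the tooling for that isn't specified here. The function name, file naming scheme, and parameters are all hypothetical.

```python
import shutil
import time
from pathlib import Path


def backup_image(image, backup_dirs, run_number, keep=2):
    """Copy a stopped VM's disk image into one of several backup directories.

    run_number selects the target directory round-robin, so successive
    backups alternate between drives. Only the newest `keep` copies of
    this image are retained in the chosen directory.
    The caller must shut the VM down before calling this.
    """
    image = Path(image)
    target_dir = Path(backup_dirs[run_number % len(backup_dirs)])
    target_dir.mkdir(parents=True, exist_ok=True)

    stamp = time.strftime("%Y%m%d-%H%M%S")
    copy_path = target_dir / f"{image.stem}.{run_number:06d}.{stamp}{image.suffix}"
    shutil.copy2(image, copy_path)

    # Zero-padded run numbers make lexical order match backup order.
    copies = sorted(target_dir.glob(f"{image.stem}.*{image.suffix}"))
    surplus = max(len(copies) - keep, 0)
    for old_copy in copies[:surplus]:
        old_copy.unlink()
    return copy_path
```

      Called as, say, backup_image("/vm/web.img", ["/mnt/disk1/backup", "/mnt/disk2/backup"], n) with n counting up on each run (paths invented for the example), each drive ends up holding the two most recent copies that were written to it. Restoring is then just copying one of those files back over the live image.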

      And the host OS is easy to rebuild, too, because there's no configuration files to worry about. SSH and KVM are the only services the host is running, and for the most part an out of the box configuration for most Linux distributions will handle it quite nicely.

      So... I guess to answer your question... in my case a complete system backup makes administering, and recovering from "oh shit" moments a hell of a lot easier. :) If you have the hard drive storage space available, I'd definitely suggest going that route.

      --
      If you believe everything you read, you'd better not read. - Japanese proverb
    17. Re:Never had a drive *not* fail. by monkaru · · Score: 2, Interesting

      *laughs* Redundancy exists for a purpose. I ALWAYS assume hardware will fail. It does, you know. I guess that's why I still have the data from my 1966 756 byte Multics terminal account.

    18. Re:Never had a drive *not* fail. by leenks · · Score: 1

      Actually no - they had 30 and 60 gig drives failing well before the 80gb shenanigans. We had dozens of them, all replaced under warranty.

    19. Re:Never had a drive *not* fail. by putaro · · Score: 2, Informative

      No, the key is a small sample size. Disks in data centers, running in a nice, fully A/C'd room off nicely filtered power will fail. All disks will fail eventually - they have little spinny things in them and bearings and such that will eventually give out. But, your mileage will vary, disks *are* reliable, and it's easy to have a small sample set that works well.

    20. Re:Never had a drive *not* fail. by Minimalist360 · · Score: 1
      I still have an 80MB (that's megabyte) Quantum Fireball SCSI drive. It's been running almost every day since 1990 in an Amiga 2500/030. It doesn't do a lot these days, just shows a slideshow, but for a long time I had three of these running a BBS, you know, with modems and a bulletin board and "warez".

      The other two were sold off and I had two full-height 5.25" 1.2GB seagate drives for a while, etc, etc.

    21. Re:Never had a drive *not* fail. by Minimalist360 · · Score: 1
      Wow, here's one for sale on eBay

      No bad sectors!

    22. Re:Never had a drive *not* fail. by fbjon · · Score: 1

      Old but alive: 100 MB seagate, 420 MB Conner, 4 and 6 GB Quantum Fireballs. Dead drives: 2-3 Maxtors and 1 Seagate Barracuda IV.

      --
      True confidence comes not from realising you are as good as your peers, but that your peers are as bad as you are.
    23. Re:Never had a drive *not* fail. by Reziac · · Score: 1

      In 15 years with PCs, I've had 3 everyday-use HDs fail, but between 'em they were almost old enough to vote, and ran 24/7 most of their lives, and they all staggered along for a year or more after the first symptoms:

      2GB W.D., got headcrashed when I moved and developed the creeping crud; 3 years later, at age 5, it finally got to where it lost the partition table and I had to replace it.

      6GB W.D., at age 7 the thermal calibration function developed a habit of sticking on. No bad sectors, tho.

      45GB W.D., retired at age 8 (with almost 55,000 actual hours on it) because the tail end of the drive had the creeping crud (tho my *real* motivation for replacing it was that I was out of disk space). Other than that last gig, it was still perfect. (Most likely it got headcrashed when I had the machine apart to insert the network card. I'm careful, but sometimes eggs still break.) This one ran a little hot by my standards (more than pleasantly-warm to the touch), so spent its life with its own dedicated fan, which I expect helped its longevity.

      In clients' machines and donations to our user group, I seldom see sick or dead HDs, and when I do, it's usually in an OEM 1) with a really cheapassed power supply and 2) where the HD is crammed into an unventilated corner and is being cooked in its own juice.

      Anyway, I agree with you... HDs' reputation for failure seems to be rather worse than the reality. Most of the so-called HD failures I've seen were actually fucked-up filesystems or trashed MBRs, nothing to do with the hardware itself.

      --
      ~REZ~ #43301. Who'd fake being me anyway?
    24. Re:Never had a drive *not* fail. by msromike · · Score: 1

      And if you ran your VMs on a Windows server product you could back up the VM while it was still running by using the built-in Windows VSS (Volume Shadow Copy Service) and a free tool like Hobocopy.

    25. Re:Never had a drive *not* fail. by rts008 · · Score: 1

      "It has even been pushed off a table onto a cement floor while running! That was a year ago, and it (sadly) it is still the family computer!"

      You have my sympathy, dude.
      Might I suggest pushing it off the roof, or better yet, nuke it from orbit...it's the only way to be sure. :)

      On a serious note though, there seems to be no rhyme or reason with HDD failures.
      I've had a 40 GB maxtor, 15 GB WD, a 670 MB Seagate, 30 MB WD all fail, and an 80 GB WD that is now only 63 GB's of usable HDD, but still have a multitude of 10 MB-200 GB IDE/PATA drives still running strong along with Maxtor, WD, and Seagate SATA HDD's that never miss a beat.
      Meanwhile, I have known friends that cannot seem to keep a hard-drive alive for more than 1 1/2 years.

      (yes- I still have running, functional 8086's, 8088's, 286's, 386's, 486's (with and without co-processors), 586's, a p133+ Cyrix 586 (supposed to be equal to an Intel Pentium 166 according to marketing hype) and enough parts to have at least 6-10 686's running... from 500 MHz Pentium III's to AMD 1.8 GHz Athlons, to my current (posting from this one) P4 478 3.0 GHz PC.)

      Currently in my home net it is: (from low to high-end)

      Thrown together Frankenstein: Cyrix 586 p133+ w/ 64 MB RAM, 4x cd-ROM, 1.44 floppy, 670 MB HDD, S3 2 MB ISA vid card, running Caldera Open Linux Base 1.1.....just for the fun/exploration of it.

      Dell Dimension XPS T500 P3 500 MHz, 448 MB RAM, 40 GB HDD (OEM), 10 GB WD ATA100 HDD, ATI AIW Radeon 7500 64 MB vid card, Lite-On 32x/12x/48x CD -+ burner, and 16x DVD-ROM, and 1.44 floppy, running Win XP Pro SP2, and Win 98 SE.

      P3 733, 512 MB RAM, ATI rage 128 vid card, 24x CD-ROM, 1.44 floppy, 120 GB ATA133 PATA HDD, running CentOS 5.0 set up as a file server on my subnet.

      Dell ??*something* (not in my stepdaughter's room, and too lazy to go look at her computer now- and yes, her and her mom are visiting friends right now!) w/ Athlon 1.8 GHz, 1 GB RAM, ATI 9200 128 MB vid card, 4x dual layer DVD burner (Sony DRUxxxx??), 16x DVD-ROM, 1.44 floppy, Primary Disk is OEM 40 GB SATA, Secondary is same OEM 40 GB SATA- no RAID, with above mentioned 63 of 80 GB ATA 133 PATA drive, dual-booting Kubuntu 7.10 and Win XP Pro SP2.(about 3/4's of the time in Kubuntu- only has one school web-page that requires windows, but she can 'work it' using Firefox!)

      Home built P4 478 3.0 GHz, 2 GB RAM, ATI 9550 256 MB vid, Pri. HDD is 200 GB PATA 133 running Kubuntu 7.10, with an 80 GB SATA, and a 100 GB SATA, with /home on 80 GB, and back ups to the 100 GB- long story!

      Bottom line, we can only get hopelessly bogged down in all of this anecdotal give and take.
      I don't really know for sure where and how to find solid info for this.

      Help?!

      --
      Down With Slashdot BETA!!! I've been around the corner and seen the oliphant; you can only abuse me from your perspecti
    26. Re:Never had a drive *not* fail. by Gazzonyx · · Score: 1

      OTOH, if you are of the *nix persuasion, you could place your VMs on either an (XFS XOR ZFS) formatted slice, or any other file system (journaled FTW, kids!) on top of LVM2; snapshot the sucker and put it wherever you'd like. Also, if you want to do this over a network, you can use samba's VSS VFS module on the net share and pull the VM over CIFS. Extra flexibility points for intelligent use of iSCSI or AoE.

      Killerbob, what are you using for your incremental backups? AMANDA or rsync or something of the likes?

      --

      If I mod you up, it doesn't necessarily mean I agree with what you've said, sorry.

    27. Re:Never had a drive *not* fail. by Hal_Porter · · Score: 1

      It was the ones made in Hungary that were the problem

      http://www.pcworld.com/article/id,59943-page,1/article.html?tk=dn082901X

      http://everything2.com/e2node/IBM%2520DeskStar

      The ones made in Thailand IIRC were OK. So this one bad plant in Hungary caused IBM to sell its hard drive business at a $2B loss.

      An interesting story to tell people if they consider outsourcing. Seems like factories are not a commodity after all. In this case the Hungarian one ended up costing them $2B.

      --
      echo -e 'global _start\n _start:\n mov eax, 2\n int 80h\n jmp _start' > a.asm; nasm a.asm -f elf; ld a.o -o a;
    28. Re:Never had a drive *not* fail. by Gription · · Score: 1

      The small sample size is indeed the overriding feature of these failure/success stories. In the last 10 years I have gone through about 10,000 computers in the sites my guys support. Currently we have almost 2,500 computers out there in service.

      With my larger statistical sample the data smooths out. Every week I get at least one drive that is sent in that they want me to attempt a recovery on. This doesn't include the number of failed HDs where they didn't want to attempt recovery because they didn't have anything critical or they had everything backed up. So basically we are talking about over 50 drives a year from a sample of about 2500 computers.
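
      Turning those counts into the vendor's own unit gives a rough comparison. The sketch below assumes one drive per computer and machines powered on around the clock, neither of which is stated above, and the function names are hypothetical. Since 50 a year is only a floor on the real failure count, the resulting MTBF is a ceiling.

```python
HOURS_PER_YEAR = 24 * 365


def annual_failure_fraction(drives_in_service, failures_per_year):
    """Share of the installed drives that failed in one year."""
    return failures_per_year / drives_in_service


def implied_mtbf_hours(drives_in_service, failures_per_year):
    """Drive-hours accumulated in a year divided by failures seen that year.

    Assumes every drive runs 24 hours a day, all year.
    """
    return drives_in_service * HOURS_PER_YEAR / failures_per_year


print(f"{annual_failure_fraction(2500, 50):.1%} of drives per year")
print(f"{implied_mtbf_hours(2500, 50):,.0f} hours implied MTBF")
```

      That works out to at least 2% of drives per year, or an implied MTBF of at most about 438,000 hours, well short of the 1 million hours and up quoted in the story summary.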

      A while ago there was a /. story listing the drive failure stats from Google's server farms. It had a seriously large sample and they had some really detailed findings.

    29. Re:Never had a drive *not* fail. by Jesus_666 · · Score: 1

      Same here. For me, hard disk drives are the computer component with the second highest failure rate, right after floppy drives.

      I think that MTBF is a completely random number with no connection to reality. A hard drive lives for about two to four years, period. Much more interesting is how the manufacturer deals with failing drives. Maxtor makes bad drives but I like their RMA process. Seagate gives a five-year warranty. Much more important than MTBF.


      The oldest working drive I have is about 2 GB in size, but it's in an old 486DX laptop that rots in some corner, so it doesn't see any use. (Note: It's not the original drive and the laptop's BIOS can only address 214 MB of it.)

      --
      USE HOT GRITS WITH STATUE OF NATALIE PORTMAN (NAKED AND PETRIFIED)
    30. Re:Never had a drive *not* fail. by kesuki · · Score: 1

      most of my files are wm, mpg, or avi files, how do you think i got 1 terabyte of files?

      and i will admit that in fact i have been sloppy and have backed up many programs (exe's zips, etc.) as well as program files folders, and sometimes just the whole hard drive, to save my time in sifting through files...

      most of the sloppy saving of files I've stopped using any of the exe's or zip files (most stuff can be DLed fresh) but for instance there are tons of game save files and folders that use a hybrid of weird file extensions etc..

      since I don't have a scanner that can detect this nasty rootkit I can't honestly know if the files are infected. for the time being I'm only playing the media files on a Linux system but many of the files don't play back (especially the wm* files)

    31. Re:Never had a drive *not* fail. by ckaminski · · Score: 1

      You can do the same thing with VMware and vmware-cmd pause, and an LVM snapshot.

    32. Re:Never had a drive *not* fail. by m50d · · Score: 1

      I've been running six hard drives continuously for the past three years, and not had one fail. Drives which are on 24x7x365 tend to do better in my experience - the wear comes from spinning them up or down. Or at least, the two drives that I've ever lost were in machines that got powered on and off one or more times each day. I suggest you listen to the man and look for other sources of trouble.

      --
      I am trolling
    33. Re:Never had a drive *not* fail. by petermgreen · · Score: 1

      A while ago there was a /. story listing the drive failure stats from Google's server farms. It had a seriously large sample and they had some really detailed findings.
      yeah, unfortunately iirc the fact it was information from a datacenter meant their temperature vs reliability data was lacking many data points for higher temperatures.

      --
      note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
    34. Re:Never had a drive *not* fail. by Anonymous Coward · · Score: 0

      Considering the wide range of PSUs, drives, locations, etc. in my sample I doubt "power" is the problem. The only thing in common amongst all my drives is their (near) constant use.
      I'm talking nearly a score of drives over the last two decades.
      I've changed states (not to mention substations) more than enough to rule out power as a source of problems.
      Fact is simple, I use them until they die.

    35. Re:Never had a drive *not* fail. by Anonymous Coward · · Score: 0

      but it didn't stop me from loosing about 120 GB* of data this year

      Where did your data go when you set it loose? Is it frightened and alone, cast adrift on the Internet by its uncaring owner?
      Oh, you meant that you lost your data. Lose != Loose.
    36. Re:Never had a drive *not* fail. by Atti+K. · · Score: 1
      My first ever HD was a 335 MB Fujitsu (bought with a 486 back in '95), failed after 2 years.

      Then came a 2 GB Quantum Fireball, ran about 4 years with no problem, then I sold it. Those were the best hard disks of their age I think.

      Then I went with Western Digital: first a 20 GB, failed after 2 years. Got a 40 GB replacement, it's still working (~4 years old), but rarely used.

      Including that one, I now have 6 WD drives (5 desktop + 1 notebook), with no problems yet: 160 GB (~3.5 yrs), 200 GB (~2 yrs), 250 GB (~1.5 yrs), 500 GB (3 months) and a 120 GB notebook drive (~1 month :).

      I tend to believe that WD drives are quite reliable. I check SMART attributes often, no signs of problems yet. Btw, the SMART warning saved my data from that failing 20 GB WD.

      --
      .sig: No such file or directory
    37. Re:Never had a drive *not* fail. by Eivind · · Score: 1

      But then again, with something like disk drives, where price/performance keeps going down rapidly, it doesn't make that much economic difference whether the disks last 3 or 5 years on average.

      If disk space is half the price after 18 months (a fair estimate, I think), then two disks that fail after 18 months can be replaced with -one- new one; after 36 months (3 years), 4 failed disks can be replaced with 1 new one. After 4.5 years, 8 failed disks can be replaced with a single one.

      So, in financial terms, a disk that will last for 3 years is worth 75% of one that will last forever. And one that will last 5 years is worth roughly 90% of one that will last forever.

      This ain't -strictly- true, of course; there's management overhead in addition to the cost of the physical new disks. Still, the general idea is sound: if you've paid to store data for 18 months, you've paid half of what it costs to store it forever.
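
      That reasoning is a geometric series, and can be sketched as below. The model assumes the price of a given capacity halves smoothly every 18 months and that each dead disk is replaced at the then-current price, with no overhead; the function names are hypothetical.

```python
def relative_value(lifetime_years, halving_years=1.5):
    """Value of a disk lasting lifetime_years, relative to one lasting forever."""
    return 1.0 - 0.5 ** (lifetime_years / halving_years)


def cost_of_perpetual_storage(lifetime_years, halving_years=1.5, replacements=200):
    """Total spend, in units of today's disk price, to keep the data forever.

    Each replacement costs less than the previous one by the factor the
    price has dropped over one disk lifetime, so the sum converges.
    """
    price_ratio = 0.5 ** (lifetime_years / halving_years)
    return sum(price_ratio ** k for k in range(replacements))


for years in (1.5, 3, 5):
    print(f"{years} years: worth {relative_value(years):.0%} of a disk that never dies")
```

      Under that model an 18-month disk is worth 50%, a 3-year disk 75%, and a 5-year disk about 90% of one that lasts forever. The second function gets the same answer the long way: a stream of 3-year disks costs about 1.33 times one purchase, and 1/1.33 is 75%.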

  6. Failure rates ! warranty period. by Kenja · · Score: 0

    As drive sizes have been going up, overall the warranty periods have been going down. With few exceptions (Seagate does three years) drives have a one year expected life time.

    --

    "Have you ever thought about just turning off the TV, sitting down with your kids, and hitting them?"
    1. Re:Failure rates ! warranty period. by ABasketOfPups · · Score: 5, Informative

      Warranty periods for 750 gig and 1 terabyte drives from Western Digital, Samsung, and Hitachi, are 3 years to 5 years according to the info on zipzoomfly.com.

      A one year warranty doesn't seem that common. External drives seem to have one year warranties, but even SATA drives at Best Buy mostly have 3 years

    2. Re:Failure rates ! warranty period. by KillerBob · · Score: 1

      External drives seem to have one year warranties, but even SATA drives at Best Buy mostly have 3 years


      3 years is pretty much the industry standard on hard drives. Likewise for monitors, btw... so if your HP or Dell starts having problems with the monitor, you should check the warranty on the monitor because it'll usually be longer than the warranty on the desktop. :)

      But yes... external peripherals usually only have a 1 year warranty. My 1TB external drive is the only thing I've ever bought the extended warranty on... 3 years with Future shop. :P
      --
      If you believe everything you read, you'd better not read. - Japanese proverb
    3. Re:Failure rates ! warranty period. by Anonymous Coward · · Score: 0

      In the EU, manufacturers are obligated by law to provide at least two years of warranty on common electronic devices, including (external) hard drives. I just returned a crashed one after one and a half years and it got replaced without hassle (as long as I stated it wasn't because of my own doing).

    4. Re:Failure rates ! warranty period. by adolf · · Score: 1

      FYI: I have a 1TB WD Mybook World Edition network-attached drive. It included a 3-year warranty, which was about 2 years longer than I expected it to be.

      So far, so good, but the thing is slow like molasses in January. It's really an odd thing: Gigabit ethernet, but with a slow CPU, a slow network driver, and a slow SATA port. It is lucky to be able to sustain rates of 4.3 megabytes per second. But SSH was easy to turn on, and WD included a full development environment, which makes up for a (very small) bit of the pain involved in using it.

      Somehow I'm thinking that before the 3 year warranty is up, I'll have yanked the drive and installed it in a real computer.

    5. Re:Failure rates ! warranty period. by Killjoy_NL · · Score: 1

      How slow is molasses in january? compared to other months?

      --
      This is the sig that says NI (again)
    6. Re:Failure rates ! warranty period. by LinuxDon · · Score: 1

      I only buy drives that have at least a 5-year warranty and are designed for 24/7 operation.

      If the manufacturer doesn't have so much faith in their product, then why should I?

  7. warranties by qw0ntum · · Score: 4, Insightful

    The best metric is probably going to be the length of warranty the manufacturer offers. They have financial incentive to find out the REAL mean time until failure in calculating the warranty.

    --
    'Every story, if continued long enough, ends in death.' --Ernest Hemingway
    1. Re:warranties by dh003i · · Score: 1

      The best metric is probably going to be the length of warranty the manufacturer offers. They have financial incentive to find out the REAL mean time until failure in calculating the warranty.

      They do provide "real" MTBF numbers. It's just that MTBF isn't for what you think it's for. See my post explaining this.
    2. Re:warranties by qw0ntum · · Score: 1

      Yes... we say the same thing (last paragraph). I know very well what MTBF means and how it's calculated. In your words, I put my stock in the warranty, because "that's what they're willing to put their money behind." The warranty is set so that most devices don't stop working until after the warranty period ends. This more accurately reflects the amount of time a drive lasts under normal use.

      I'm not saying that MTBF is a completely unreliable number. I'd imagine there is a correlation between higher MTBF numbers and longer warranties.

      Great post, by the way. Very informative and well worded. :)

      --
      'Every story, if continued long enough, ends in death.' --Ernest Hemingway
    3. Re:warranties by Murphy+Murph · · Score: 1

      The best metric is probably going to be the length of warranty the manufacturer offers. They have financial incentive to find out the REAL mean time until failure in calculating the warranty.

      ASSuming anything approaching a significant number of the drives which fail during the warranty period are claimed. Otherwise a warranty is nothing more than advertising.
      I strongly suspect this is not the case and you are simply replacing one false metric with another.
      --
      I dub thee... Sir Phobos, Knight of Mars, Beater of Ass.
    4. Re:warranties by afidel · · Score: 1

      It's worse than your post implies because the manufacturers actually specify that drives be replaced every so often to get the MTBF rating. Basically the only thing an MTBF rating is good for is figuring out statistically what the chances are of a given RAID configuration losing data before a rebuild can be completed.

      --
      There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
    5. Re:warranties by ooloogi · · Score: 3, Insightful

      Warranties beyond about two years become largely meaningless for this purpose, because as a drive gets older people often won't bother claiming warranty for what is by then such a small drive. The cost of shipping/transport is likely to be more than the marginal $/GB on a new drive.

      So in this way a manufacturer can get away with a long warranty, without necessarily incurring a cost for unreliability.

    6. Re:warranties by Anonymous Coward · · Score: 1, Interesting

      Interesting ideas about hard drive reliability. I spend many hours each month looking at hard drive performance as part of my work. My job is to qualify drives (and other devices) for our servers. Also have a large volume of drives in the lab and in the field to monitor.

      I see the useful life of most drives as 3 - 5 years. The drive supplier is going to cover failures inside that 5 year useful life. Most folks are replacing the obsolete gear as new hardware becomes available. The idea that a drive could actually have a million hour MTBF is just fantasy. I see lots of failures with less than 8000 hours (a year) on the drive. Those are certainly outside the "early life failure" category. They are just worn out. Lots of drives have defects that the user doesn't even know about. They don't generate SMART reports or they don't analyze what the report says. I see drives all the time that have only a few hundred hours on them that I wouldn't install in my system.

      MTBF specs for hard drives are a marketing ploy.

    7. Re:warranties by dh003i · · Score: 1

      Thanks. Looks like I was just misunderstanding what you meant by "real MTBF".

      I think the MTBF numbers as published are more useful for, say, enterprises setting up large arrays of hard-drives for redundancy.

    8. Re:warranties by gweihir · · Score: 1

      The best metric is probably going to be the length of warranty the manufacturer offers. They have financial incentive to find out the REAL mean time until failure in calculating the warranty.

      I am still waiting for a replacement of the 15 Netgear GA302 cards that died after 3 years. The "5 year warranty" is worth nothing with some manufacturers.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    9. Re:warranties by SuperQ · · Score: 1

      MTBF means a LOT to people with large disk arrays. Think about having a cluster of 1000 machines with 2 drives each. That's 2000 drives. If the MTBF is 750,000 hours (seagate specs for SATA drives) that's a broken drive every 15 days on average.
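
      The arithmetic above can be sketched in a few lines. This is only a rough illustration: the function name is made up here, and it assumes every drive fails independently at a constant rate of 1/MTBF, which the studies in the summary suggest is optimistic.

```python
HOURS_PER_DAY = 24


def days_between_failures(mtbf_hours: float, drive_count: int) -> float:
    """Average days between failures across a whole fleet.

    Assumption: drives fail independently at a constant rate of 1 / mtbf_hours,
    so the fleet as a whole sees one failure every mtbf_hours / drive_count hours.
    """
    fleet_hours_between_failures = mtbf_hours / drive_count
    return fleet_hours_between_failures / HOURS_PER_DAY


# Figures from the comment: 1000 machines with 2 drives each, 750,000-hour MTBF.
interval_days = days_between_failures(750_000, 2 * 1000)
print(f"one failed drive roughly every {interval_days:.1f} days")
```

      Doubling the fleet halves the interval, which is why the number matters far more to someone running thousands of drives than to someone running one.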

    10. Re:warranties by Anonymous Coward · · Score: 0

      If they had to honor the warranty. But in my experience the hassle of returning an old, small, slow hard drive just isn't worth it for most people compared to the ease of replacing it with a cheap new larger and faster unit.

    11. Re:warranties by Killjoy_NL · · Score: 1

      "real MTBF"

      Buffering... buffering... buffering... ;)

      --
      This is the sig that says NI (again)
    12. Re:warranties by HardCase · · Score: 1

      Just because you don't know what MTBF means doesn't mean that it's a marketing ploy.

    13. Re:warranties by kalirion · · Score: 1

      For all we know, we ARE getting the real mean time. Say 9 out of 10 hard drives fail within an hour, and the last hard drive lasts 10 million hours. There you have it, mean time of 1 million hours. I bet there's at least one hard drive of just about any model, that's never failed. How do you know it won't last a few billion years?
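
      To make the mean-versus-typical point concrete, here is a tiny sketch using the made-up lifetimes from this comment. Treating "within an hour" as exactly 1 hour is an assumption made only to have numbers to plug in.

```python
from statistics import mean, median

# Hypothetical population from the comment: nine drives die after about
# an hour (assumed to be exactly 1 hour here), one lasts 10 million hours.
lifetimes_hours = [1] * 9 + [10_000_000]

mean_life = mean(lifetimes_hours)
median_life = median(lifetimes_hours)

print(f"mean lifetime:   {mean_life:,.1f} hours")
print(f"median lifetime: {median_life:,.1f} hours")
```

      The mean lands at about a million hours while the median drive is dead after an hour, so a mean on its own says very little about what a typical buyer will see.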

  8. Easy to get the quoted figures ... by Alain+Williams · · Score: 1

    put the 500GB drive into your bottom drawer ... the unused disk will break when thrown out by your great great grand kids - who will simultaneously wonder if you really did use storage of such tiny capacity.

    1. Re:Easy to get the quoted figures ... by JimboFBX · · Score: 1

      In the future, the letter 'a' will take up 5 megabytes of space.

  9. What MTBF is for. by sakusha · · Score: 5, Insightful

    I remember back in the mid 1980s when I received a service management manual from DEC, it had some information that really opened my eyes about what MTBF was really intended for. It had a calculation (I have long since forgotten the details) that allowed you to estimate how many service spares you would need to keep in stock to service any installed base of hardware, based on MTBF. This was intended for internal use in calculating spares inventory level for DEC service agents. High MTBF products needed fewer replacement parts in inventory, low MTBF parts needed lots of parts in stock. Presumably internal MTBF ratings were more accurate than those released to end users.

    So anyway.. MTBF is not intended as an indicator of a specific unit's reliability. It is a statistical measurement to calculate how many spares are needed to keep a large population of machines working. It cannot be applied to a single unit in the way it can be applied to a large population of units.

    Perhaps the classical example is about the old tube-based computers like ENIAC, if a single tube has an MTBF of 1 year, but the computer has 10,000 tubes, you'd be changing tubes (on average) more than once an hour, you'd rarely even get an hour of uptime. (I hope I got that calculation vaguely correct)

    1. Re:What MTBF is for. by dh003i · · Score: 1

      Good post, I think we were on the same wavelength, as I posted something very similar to that below.

    2. Re:What MTBF is for. by sakusha · · Score: 3, Informative

      Thanks. I read your comment and got to thinking about it a bit more. I vaguely recall that in those olden days, MTBF was not an estimate, it was calculated from the service reports of failed parts. The calculations were released in monthly reports so we could increase our spares inventory to cover parts that were proving to be less reliable than estimated. But then, those were the days when every installed CPU was serviced by authorized agents, so data gathering was 100% accurate.

    3. Re:What MTBF is for. by davelee · · Score: 4, Informative

      MTBFs are designed to specify a RATE of failure, not the expected lifetime. This is because disk manufacturers don't test MTBF by running 100 drives until they die, but rather running say, 10000 drives and counting the number that fail during some period of months perhaps. As drives age, clearly the failure rate will increase and thus the "MTBF" will shrink.

      long story short -- a 3 year old drive will not have the same MTBF as a brand new drive. And a MTBF of 1 million hours doesn't mean that the median drive will live to 1 million hours.
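
      As a rough illustration of that last point: if the failure rate really were constant for the whole life of the drive (an exponential model, which is an assumption and a generous one given that rates climb with age), the median life would still be well short of the MTBF. The function name below is invented for this sketch.

```python
import math

HOURS_PER_YEAR = 365 * 24


def median_life_hours(mtbf_hours: float) -> float:
    """Median lifetime under a constant failure rate of 1 / mtbf_hours.

    Assumption: exponential lifetimes, where half the drives have failed
    by t = mtbf_hours * ln(2).
    """
    return mtbf_hours * math.log(2)


median_hours = median_life_hours(1_000_000)
print(f"median life: {median_hours:,.0f} hours "
      f"(about {median_hours / HOURS_PER_YEAR:.0f} years)")
```

      With failure rates that grow as the hardware ages, the real median would be lower still.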

    4. Re:What MTBF is for. by flyingfsck · · Score: 2, Informative

      That is an urban legend. Colossus and Eniac were far more reliable than that. The old tube based computers seldom failed, because the tubes were run at very low power levels and tubes degrade slowly, they don't pop like a light bulb (which is run at a very high power level to make a little visible light). Colossus for example was built largely from Plessey telephone exchange registers and telex machines. These registers were in use in phone exchanges for decades after the war. I saw some tube based exchanges in the early 80s that were still going strong.

      --
      Excuse me, but please get off my Pennisetum Clandestinum, eh!
    5. Re:What MTBF is for. by jrumney · · Score: 1

      It cannot be applied to a single unit in the way it can be applied to a large population of units.

      This is the case with any statistic. They are very useful for predicting trends in a large enough population, but completely useless for predicting individuals' behaviour.

    6. Re:What MTBF is for. by tzot · · Score: 1

      More or less, you are correct:

      $ units
      1948 units, 71 prefixes, 28 functions

      You have: 10000 hours
      You want: year
                      * 1.1407955
                      / 0.87658128

      --
      I speak England very best
    7. Re:What MTBF is for. by Bacon+Bits · · Score: 2, Informative

      Exactly, it's a basic misunderstanding of what MTBF means.

      Let's say you buy quality SAS drives for your servers and SAN. They're Enterprise grade, so they have a MTBF of 1 million hours. Your servers and SAN have a total of 500 disks between them all. How many drives should you expect to fail each year?

      IIRC, this is the calculation:

      1 year = 365 days x 24 hours = 8760 hours per year
      500 disks * 8760 hours per year = 4,380,000 disk-hours per year
      4,380,000 disk-hours per year / 1,000,000 hours per disk failure = 4.38 disk failures per year

      So a 500 disk server farm should expect 4-5 disk failures annually.
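
      The same calculation as a short sketch, with one extra step: if failures are independent and the rate is constant (both assumptions, not something the MTBF spec guarantees), the yearly count follows a Poisson distribution, which shows how much a real year can stray from the average. Function names are made up for illustration.

```python
import math

HOURS_PER_YEAR = 365 * 24  # 8760


def expected_annual_failures(disk_count: int, mtbf_hours: float) -> float:
    """Disk-hours accumulated per year divided by hours per failure."""
    return disk_count * HOURS_PER_YEAR / mtbf_hours


def poisson_pmf(k: int, mean_count: float) -> float:
    """Probability of exactly k failures when the average is mean_count.

    Assumption: independent failures at a constant rate (Poisson model).
    """
    return math.exp(-mean_count) * mean_count ** k / math.factorial(k)


expected = expected_annual_failures(500, 1_000_000)
print(f"expected failures per year: {expected:.2f}")
for k in range(0, 10):
    print(f"P(exactly {k} failures) = {poisson_pmf(k, expected):.3f}")
```

      Under that model a year with 2 failures or 8 failures is not unusual even when the spec is honest, so a single year's count does not prove or disprove the rating.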

      --
      The road to tyranny has always been paved with claims of necessity.
    8. Re:What MTBF is for. by Hatta · · Score: 1

      Except that you can't really calculate the number of drives you'd expect to die this year from the MTBF unless you know something more about how those deaths are distributed. There's no reason to assume a normal distribution.

      --
      Give me Classic Slashdot or give me death!
    9. Re:What MTBF is for. by mollymoo · · Score: 1

      One of the wonders of the normal distribution is that if you have a large number of independent random variables contributing to your value (like the tolerances of the numerous components in a hard drive, which are themselves caused by random process like thermal effects, uncorrelated vibrations and distributions of concentration during mixing) you can indeed reasonably expect a normal distribution.

      --
      Chernobyl 'not a wildlife haven' - BBC News
    10. Re:What MTBF is for. by Reziac · · Score: 1

      Thanks, that explains it nicely.

      Plugging in my own HDs (usually 6 HDs going 24/7) the annual failure number comes out to about 0.05, which is a little low, but not ridiculously so -- in the Real World, I have a HD failure about once every 5 years.

      Hmm... actually, that's right on as an average. My first computer with a HD arrived in 1993. I just replaced the 3rd HD I've ever had get really sick. (Tho it was 8 years old.)

      --
      ~REZ~ #43301. Who'd fake being me anyway?
    11. Re:What MTBF is for. by denmarkw00t · · Score: 1

      "Take the number of hard-drives in the field, (A), and multiply it by the probable rate of failure, (B), then multiply the result by the average out-of-court settlement, (C). A times B times C equals X... If X is less than the cost of a recall, we don't do one." "Are there a lot of these kinds of hard-drive failures?" "Oh, you wouldn't believe." "... Which... hard-drive manufacturer do you work for?" "A major one."

  10. Misunderstanding MTBF by dh003i · · Score: 4, Interesting

    I think that a lot of people are mis-understanding MTBF. A HD might have a MTBF of 100 years. This doesn't mean that the company expects the vast majority of consumers to have that HD running for 100 years without problems.

    MTBF numbers are generated by running say thousands of hard-drives of the same model and batch/lot, and seeing how long it takes before 1 fails. This may be a day or so. You then figure out how many total HD running hours it took before failure. If you have 1,000 HD's running, and it takes 40 hours before one fails, that's a 40,000 hr MTBF. But this number isn't generated by running say 10 hard-drives, waiting for all of them to fail, and averaging that number.
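
    A minimal sketch of that estimate, assuming the simple "total drive-hours divided by failures observed" method described here (real vendor procedures may differ, and the function name is invented for illustration):

```python
def estimate_mtbf_hours(drive_count: int, test_hours: float, failures: int) -> float:
    """Point estimate of MTBF from a short test of many drives.

    Total drive-hours run are divided by the number of failures seen.
    Failed drives are counted as if they ran the whole test, which is a
    simplification that barely matters when failures are rare.
    """
    if failures <= 0:
        raise ValueError("need at least one observed failure to form an estimate")
    return drive_count * test_hours / failures


# Example from the paragraph above: 1,000 drives, first failure at 40 hours.
print(estimate_mtbf_hours(1_000, 40, 1))
```

    Note that nothing in the formula knows how the drives behave after the test ends, which is the whole problem.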

    Thus, because of the way MTBF numbers are generated, they may or may not reflect hard-drive reliability beyond a few weeks. It depends on our assumptions about hard-drive stress and usage beyond the length of time before the 1st HD of the 1,000 or so they were testing failed. Most likely, it says less and less about hard-drive reliability beyond that initial point of failure (which is on the order of tens or hundreds of hours, not hundreds of thousands of hours or millions of hours!).

    To be sure, all-else equal, a higher MTBF is better than a lower one. But as far as I'm concerned, those numbers are more useful for predicting DOA, duds, or quick-failure; and are more useful to professionals who might be employing large arrays of HD's. They are not particularly useful for getting a good idea of how long your HD will actually last.

    HD manufacturers also publish an expected life-cycle of their HD. But I usually put the most stock in the length of the warranty. That's what they're willing to put their money behind. Albeit, it's possible their strategy is just to warranty less than how long they expect 90% of HD's to last, so they can then sell them cheaper. But if you've had a HD and you've had it for longer than what the manufacturer publishes as the expected-life, what they're saying by that is you've basically got a good value, and will probably want to have something else on hand, and be backed up.

    1. Re:Misunderstanding MTBF by flyingfsck · · Score: 1

      Nope, MTBF is usually *calculated* and the number is just that - a number - it means fuck-all in real time. The numbers are used comparatively, to show the designers which potentially stressed components need to be looked at during the design phase. Eventually the numbers are mis-used by the marketing department to mislead the customers, but that is not the intent of the designers and is not the purpose of the MTBF calculations.

      --
      Excuse me, but please get off my Pennisetum Clandestinum, eh!
    2. Re:Misunderstanding MTBF by scsirob · · Score: 2, Insightful

      "A HD might have a MTBF of 100 years"

      That's not how it works. A certain type of HD may have a specified MTBF, a single drive never does. It's all about quantities. A drive may be designed for 5 years of economic life. That's 43800 hours.

      If that type of drive is specified for 1 million hours MTBF, approximately one in every 23 drives will fail within those 5 years.

      If you run a disk array with about 115 of these drives, you will have an average of one drive fail every year. Run a data centre with 3500 drives and you will have a drive failure roughly every 12 days.
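
      The one-in-23 figure can be checked with a short sketch. It assumes a constant failure rate over the whole 5 years (an exponential model, which is my assumption rather than anything in the spec sheet), and the function name is made up.

```python
import math


def fraction_failed(service_hours: float, mtbf_hours: float) -> float:
    """Share of drives expected to have failed after service_hours.

    Assumption: constant failure rate, so F(t) = 1 - exp(-t / MTBF).
    expm1 is used to keep precision when t / MTBF is small.
    """
    return -math.expm1(-service_hours / mtbf_hours)


five_years_hours = 5 * 365 * 24  # 43,800 hours of economic life
share = fraction_failed(five_years_hours, 1_000_000)
print(f"{share:.2%} of drives fail within 5 years, about 1 in {1 / share:.0f}")
```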

      --
      To Terminate, or not to Terminate, that's the question - SCSIROB
    3. Re:Misunderstanding MTBF by glitch23 · · Score: 1

      I think that a lot of people are mis-understanding MTBF. A HD might have a MTBF of 100 years. This doesn't mean that the company expects the vast majority of consumers to have that HD running for 100 years without problems.

      Maybe this is implied but it seems to me that the misunderstanding comes from the fact that they aren't using the correct unit of measurement for MTBF. These "hours" should be documented as "drive-hours" not just "hours" which to me means regular hours.

      MTBF numbers are generated by running say thousands of hard-drives of the same model and batch/lot, and seeing how long it takes before 1 fails. This may be a day or so. You then figure out how many total HD running hours it took before failure. If you have 1,000 HD's running, and it takes 40 hours before one fails, that's a 40,000 hr MTBF. But this number isn't generated by running say 10 hard-drives, waiting for all of them to fail, and averaging that number.

      Given this example, they are measuring drive-hours. A similar type of measurement, man-hours, is used for project lengths.

      --
      this nation, under God, shall have a new birth of freedom. -- Lincoln, Gettysburg Address
    4. Re:Misunderstanding MTBF by dh003i · · Score: 1

      That's a good point. It should be called drive-hours.

  11. Temperature is the key by arivanov · · Score: 4, Interesting

    Disk MTBF is quoted for 20C.

    Here is an example of my server. At 18C ambient in a well cooled and well designed case with dedicated hard drive fans the Maxtors I use for RAID1 run at 29°C. My Media server which is in the loft with sub-16C ambient runs them at 24-34 depending on the position in the case (once again, proper high end case with dedicated hard drive fans).

    Very few hard disk enclosures can bring the temperature down to 24-25C.

    SANs or high density servers usually end up running disks at 30C+ while at 18C ambient. In fact I have seen disks run at 40C or more in "enterprise hardware".

    From there on it is not amazing that they fail at a rate different from the quoted one. In fact I would have been very surprised if they did.

    --
    Baker's Law: Misery no longer loves company. Nowadays it insists on it
    http://www.sigsegv.cx/
    1. Re:Temperature is the key by 0123456 · · Score: 1

      From what I remember, the Google study showed that temperature made far less difference than had previously been believed (of course my memory may be past its MTBF).

    2. Re:Temperature is the key by ABasketOfPups · · Score: 5, Interesting

      Google says that's just not what they've seen. "The figure shows that failures do not increase when the average temperature increases. In fact, there is a clear trend showing that lower temperatures are associated with higher failure rates. Only at the very high temperatures is there a slight reversal of this trend."

      On the graph it's clear that 30-35C is best at three years. But up until then, 35-40C has lower failure rates, and both have lower rates by a lot than the 15-30C range.

    3. Re:Temperature is the key by Jugalator · · Score: 3, Informative

      I agree, I had a Maxtor disk that ran at something like 50-60 C and wondered when it was going to fail, never really treated it as my safest drive. And lo and behold, after ~3-4 years the first warnings on bad sectors started cropping up, and a year later Windows panicked and told me to immediately back it up if I hadn't already because I guess the number of SMART errors were building up.

      On the other hand, I had a Samsung disk that ran at 40 C tops, in a worse drive bay too! The Maxtor one had free air passage in the middle bay (no drives nearby), where the Samsung was side-by-side with the metal casing.

      So I'm thinking there can be some measurable differences between drive brands, and a study of this, along with perhaps relationship with brand failure rates would be most interesting!

      --
      Beware: In C++, your friends can see your privates!
    4. Re:Temperature is the key by ViperAFK · · Score: 1

      The problem is you had a maxtor... No wonder it failed.

    5. Re:Temperature is the key by 0123456 · · Score: 1

      "The problem is you had a maxtor... No wonder it failed."

      I've had over a dozen Maxtors in the last decade and none have failed. Of course I replace them every 3-4 years because by then they've got too damn small.

      The only drive I've had that did fail was an IBM, and even then I had plenty of advance warning so I could replace it before it was unusable.

    6. Re:Temperature is the key by afidel · · Score: 1

      Yeah my datacenter is 23-24C and the hottest disk bays in my SAN average about 37C. I don't care because my SAN is designed to lose an entire bay without losing data and the manufacturer is responsible for warranty replacement parts. So far in 22 months of operation it's lost three drives out of 160 and two of those were basically DOA with the other dying at about two months.

      --
      There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
    7. Re:Temperature is the key by drsmithy · · Score: 2, Insightful

      However, Google's data doesn't appear to have a lot of points when temperatures get over 45 degrees or so (as to be expected, since most of their drives are in a climate controlled machine room).

      The average drive temperature in the typical home PC would be *at least* 40 degrees, if not higher. While it's been some time since I checked, I seem to recall the drive in my mum's G5 iMac was around 50 degrees when the machine was _idle_.

      Google's data is useful for server room environments, but I'd be hesitant to extrapolate it to drives that aren't kept in a server room with a ~20 degrees C ambient temperature and have active cooling.

    8. Re:Temperature is the key by drsmithy · · Score: 2, Informative

      On the other hand, I had a Samsung disk that ran at 40 C tops, in a worse drive bay too! The Maxtor one had free air passage in the middle bay (no drives nearby), where the Samsung was side-by-side with the metal casing.

      Air is a much better insulator than metal.

    9. Re:Temperature is the key by ooloogi · · Score: 2, Informative

      From the Google study, it would appear that there was a brand of hard drive that ran cool and was unreliable. If there's a correlation between brand/model/design and temperature (which there will be), then the temperature study may just be showing that up.

      To get a meaningful result, it would require taking a population of the same drive and comparing the effects of temperature on it.

    10. Re:Temperature is the key by Fex303 · · Score: 1

      The average drive temperature in the typical home PC would be *at least* 40 degrees, if not higher. While it's been some time since I checked, I seem to recall the drive in my mum's G5 iMac was around 50 degrees when the machine was _idle_.

      Just for the record, that's really not the case with current Macs. My MacBook's HDD is currently sitting on 34C and remains around the same temperature when the machine is under load. I can't speak to other machines, but I consider that rather impressive given how tiny the MacBooks are.
    11. Re:Temperature is the key by Slippy. · · Score: 1

      And I've worked at an ISP where, for a year, 90% of the failed drives were maxtors (20 or 30 failed - lots and lots of drives). Many in the first week of use, then a slow attrition. Probably from the same manufacturing batch - all 40, 80, 120G. Lots with head crashes that sounded like a computer beep as they were failing.

      Two or three chirping maxtors in a server room, but no warning lights on the servers, is confusing as h*ll. :)

      Early 2000's were really bad for HD failures for some reason. It's dropped off now in the places I work. Still failures, but less of them.

      Personally, the worst was still the DeskStar series crap (all in desktops, which made it worse - users hate backing up till it's too late). And all our 60G drives went bad. Every single 60G drive failed eventually too. I still cringe when I think of 60G drives.

      In the 90's, a batch of Fireballs lived up to the name. Thank god it didn't trigger fire suppression in one esp' bad situation.

      That said, in my limited experience, drive failures don't follow brands. Batches often, but brands have never been reliable for me in the long term.

      A co-worker always says, "Plan for failure, hope for success" - works really well for storage.

      --
      -- Life is good. Tastes like chicken.
    12. Re:Temperature is the key by SoupIsGoodFood_42 · · Score: 1

      My 24" plastic iMac has HDD Bay temp of 48C and fan at 1299rpm -- no disk activity. It's in a cosy room -- probably mid 20s.

    13. Re:Temperature is the key by DaleGlass · · Score: 1

      The Google study didn't include drives running at the insane temperatures they may reach in consumer hardware. Especially the 7200RPM drives. I had a drive running at 55-60C in a tiny box made for a mini-ITX board. 45C is not hard to reach in a normal but badly cooled tower case. Google's study topped out at somewhere about 30C and the graph was going up at that point.

    14. Re:Temperature is the key by arivanov · · Score: 1

      I have yet to see a Maxtor fail when cooled properly (and I ran the IT for a company which bought only maxtors).

      Further to this as far as observations go, DiamondMax series 8 in that company was failing with nearly 100% probability in 2 years when run at 40C+ (inside HP Evo cases and badly designed server enclosures) and with nearly 0% when run at less than 30C (proper case/enclosure).

      So based on my first hand experience - if you have a maxtor - cool it. Otherwise it fails.

      --
      Baker's Law: Misery no longer loves company. Nowadays it insists on it
      http://www.sigsegv.cx/
    15. Re:Temperature is the key by jdowland · · Score: 1

      The metal casing may have helped conduct away heat. With my thecus n2100, I have found that taking the plastic lid off results in cooler temperatures than leaving it on, but putting a biscuit tin lid on top results in the coolest, although still running at about 50 Celsius which is not good :(

  12. Well, duh! by Jurily · · Score: 1

    Everyone who's ever had a hard drive already knows that.

  13. Bring Me the Drivemaker by KermodeBear · · Score: 0

    From a wonderful satire site Married to the Sea, comes this little gem.

    Drive makers have always relied on questionable statistics and outright misrepresentation to make sales, and as we all know, statistics are worse than even damned lies.

    I am not a supporter of industry regulation or class action lawsuits, I think that both are used far too much these days, but it would be nice if these companies were given a hard kick in the pants. They've gotten away with this for far too long.

    --
    Love sees no species.
  14. Recycle, don't just dump it! by Anonymous Coward · · Score: 1, Interesting

    He should look at the escalating price of gold too. The older the computer component, the more gold in the connectors and the thicker the gold on the traces, etc. Not to mention other precious metals involved in some of the components such as platinum, palladium, etc. Perhaps the greatest consideration should be given to the fact that it would increase the heavy metal pollution at the dump it goes to.

    Probably some nice magnets inside to play with too. :P

    1. Re:Recycle, don't just dump it! by Jafafa+Hots · · Score: 1

      Well, technically this is the pickup the town does twice a year for stuff and they will be recycling it.

      --
      This space available.
  15. My drives work great ... by buchner.johannes · · Score: 0, Offtopic

    My drives work great ... until someone comes along and puts stickers on other drives that say they are more "ready" than my drives.

    --
    NB: The message above might reflect my opinion right now, but not necessarily tomorrow or next year.
  16. Chicken/egg problem...sort of by jmpeax · · Score: 1

    I don't think [disk array manufacturers are] going to be forthright with giving people that data because it would reduce the opportunity for them to add value by 'interpreting' the numbers.

    By the same token, by acknowledging the realistic lifespan of their products, they could get customers to replace hard disks more often and therefore give them more business.

    However, I strongly suspect that the problem lies in the fact that one manufacturer would have to be the first to change the documented lifespan of their products, and the danger is that unless their competitors follow, their products could be interpreted as inferior and they could lose a lot of business.
  17. What about google? by MMC+Monster · · Score: 1

    Didn't Google present data on their disk failure rates? How about other large purchasers? Who cares if the manufacturers don't report them. If you have some very large purchasers report them, it may be more useful information, anyway.

    --
    Help! I'm a slashdot refugee.
  18. Add value & Interpreting by Nicolay77 · · Score: 1

    I would put the quotation marks around "add value" instead of adding them around "interpreting".

    They are obviously interpreting the numbers.

    How the hell can they be adding value is way beyond me.

    Adding price, may be, but VALUE ????

    --
    We are Turing O-Machines. The Oracle is out there.
    1. Re:Add value & Interpreting by drsmithy · · Score: 1

      How the hell can they be adding value is way beyond me.

      By having larger amounts of data and more skill in interpreting it.

  19. Build your own USB drives by omnirealm · · Score: 3, Informative

    While we are on the topic of failing drives, I think it would be appropriate to include a warning about USB drives and warranties.

    I purchased a 500GB Western Digital My Book about a year and a half ago. I figured that a pre-fab USB enclosed drive would somehow be more reliable than building one myself with a regular 3.5" internal drive and my own separately purchased USB enclosure (you may dock me points for irrational thinking there). Of course, I started getting the click-of-death about a month ago, and I was unpleasantly surprised to discover that the warranty on the drive was only for 1 year, rather than the 3 year warranty that I would have gotten for a regular 3.5" 500GB Western Digital drive at the time. Meanwhile, my 750GB Seagate drive in an AMS VENUS enclosure has been chugging along just fine, and if it fails sometime in the next four years, I will still be able to exchange it under warranty.

    The moral of the story is that, when there is a difference in the warranty periods (i.e., 1 year vs. 5 years), it makes a lot more sense to build your own USB enclosed drive rather than order a pre-fab USB enclosed drive.

    --
    An unjust law is no law at all. - St. Augustine
    1. Re:Build your own USB drives by Anonymous Coward · · Score: 0

      Agree 100% always build your own!

      Yet another reason is that you void the warranty on a pre-built drive enclosure if you open it up. I had a 2.5" hammer enclosure that went bad. The drive itself was fine, but the usb bridge died.

      I needed the data off of it so I took the drive out and popped it into another enclosure I had laying around, got my data and was happy.

      When I called Hammer up to try and get a warranty repair or even PAY for a replacement for the USB controller portion, they told me that my warranty was right-out, and they won't even sell me the part that broke.

      Meh... a stand alone enclosure is like $10 anyway.

    2. Re:Build your own USB drives by line-bundle · · Score: 1

      Did you check/confirm with Western Digital? I bought the My Book World edition. It was clearly written "3 year warranty" on the box, but when I registered it it only said 1 year. After raising a stink they changed my online registration warranty to 3 years.

      Needless to say, it's my last WD drive. Their service sucks.

    3. Re:Build your own USB drives by Anonymous Coward · · Score: 0

      I'd guess from a supplier's POV it makes a lot of sense to reduce the warranty on a USB version. They have to figure users (includes CD-tray-coffee-holder people) will be plugging & unplugging & transporting the units, vs install once to sit in a stationary box. Think about that from an IT support staff perspective for a moment, and you'll probably agree with the supplier's estimate.

    4. Re:Build your own USB drives by micheas · · Score: 1

      I had a drive array that was eating drives about '02.

      WD started sending extra return packages with the replacement drives.

      MTBF was running about two months, or two failures a week.

      Got to know their support department well.

      I think every drive was replaced twice, and then it became stable. so something changed, but I don't really know what.

      I have rarely had drives last their full three years. but pretty much all of them have had good RMA policies.

      I just wish that they had easy-to-get Linux-based drive test software so I could easily net boot the machine and test the drives without having to deal with removable media.

  20. MTBF rate calculation method is flawed by DonChron · · Score: 2, Insightful
    Drive manufacturers take a new hard drive, run a hundred drives or so for some number of weeks, and measure the failure rate. Then they extrapolate that failure rate out to thousands of hours... So, let's say one in 100 drives fail in a 1000-hour test (just under six weeks). MTBF = 100,000 hours, or 11.4 years!
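
The arithmetic in that example can be sketched in a few lines of Python. The function name `mtbf_from_test` is made up for illustration, and the calculation assumes the simple convention of dividing accumulated drive-hours by observed failures; actual vendor test procedures are not described in this thread.

```python
HOURS_PER_YEAR = 24 * 365  # 8760, ignoring leap years


def mtbf_from_test(drives: int, test_hours: float, failures: int) -> float:
    """Estimate MTBF as total drive-hours divided by failures observed."""
    if failures <= 0:
        raise ValueError("need at least one observed failure to estimate MTBF")
    return drives * test_hours / failures


# 100 drives run for 1000 hours with a single failure, as in the example.
mtbf_hours = mtbf_from_test(drives=100, test_hours=1000, failures=1)
print(f"MTBF = {mtbf_hours:,.0f} hours = {mtbf_hours / HOURS_PER_YEAR:.1f} years")
```

Note how sensitive the estimate is: a second failure in the same test would halve the result, which is one reason a short test on a small sample says little about long-term behaviour.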

    To make this sort of test work, it must be run over a much longer period of time. But in the process of designing, building, testing and refining disk drive hardware and firmware (software), there isn't that much extra time to test drive failure rates. Want to wait an extra 9 months before releasing that new drive, to get accurate MTBF numbers? Didn't think so. How many different disk controllers do they use in the MTBF tests, to approximate different real-world behaviors? Probably not that many.

    Could they run longer tests, and revise MTBF numbers after the initial release of a drive? Sure, and many of them do, but that revised MTBF would almost always be lower, making it harder to sell the drives. On the other hand, newer drives are certainly available every quarter, so it may not be a bad idea to lower the apparent value of older drive models.

    So, it's better to assume a drive will fail before you're done using it. They're mechanical devices with high-speed moving parts, very narrow tolerable ranges of operation (that drive head has to be far enough away from the platters not to hit them, but close enough to read smaller and smaller areas of data). Anyone who's worked in a data center, or even a small server room, knows that drives fail. When I've had around two hundred drives, of varying ages, sizes and manufacturers, in a data center, I observed a failure rate of five to ten drives per year. This is well below the MTBF for enterprise disk array drives (SCSI, FC, SAS, whatever), but drives fail. That's why we have RAID. Storage Review has a good overview of how to interpret MTBF values from drive manufacturers.

  21. I don't know what you people do to your drives by gelfling · · Score: 2, Interesting

    But since 1981 I have had exactly zero catastrophic PC drive crashes. That's not to say I haven't seen some bad/relocated sectors, but hard failures? None. Granted that's only 20 drives. But in fact in my experience in PC's, midranges and mainframes in almost 30 years I have seen zero hard drive crashes.

  22. At work by Anonymous Coward · · Score: 0

    I'm at work right now...copying terabytes of data from an array that has failing drives and cannot rebuild itself due to the amount of simultaneous drive failures. I have been here for 32 hours. So, please don't give me this "hard drives never fail" crap!

  23. All my hard drives eventually failed by dfcamara · · Score: 1

    From my first 200MB Seagate (bought in 1993) to a 20GB Maxtor that failed last year. Fortunately they fail when they're no longer my primary drive. I would say they last something about 5-6 years...

  24. MTBF is a useful statistical measure by Kupfernigk · · Score: 3, Insightful
    which many people confuse with MTTF (mean time to failure) - which is relevant in predicting the life of equipment. It needs to be stated clearly that MTBF applies to populations; if I have 1000 hard drives with a MTBF of 1 million hours, I would on average expect one failure every thousand hours. These are failures rather than wearouts, which are a completely different phenomenon.
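
A quick way to check the population arithmetic, as a rough Python sketch. The function names are invented here, and the model assumes every drive has the same constant failure rate of 1 / MTBF, which is exactly the simplification MTBF figures rely on.

```python
def hours_between_failures(population: int, mtbf_hours: float) -> float:
    """Average interval between failures across a whole population,
    assuming a constant per-drive failure rate of 1 / MTBF."""
    return mtbf_hours / population


def expected_failures(population: int, mtbf_hours: float, hours: float) -> float:
    """Expected number of failures in the population over `hours`."""
    return population * hours / mtbf_hours


# 1000 drives rated at 1 million hours MTBF.
print(hours_between_failures(1000, 1_000_000))        # hours per failure
print(expected_failures(1000, 1_000_000, 24 * 365))   # failures per year
```

Under those assumptions the population sees a failure about every thousand hours, or a little under nine per year, even though no individual drive is expected to run for a million hours.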

    Anecdotal reports of failures also need to consider the operating environment. If I have a server rack, and most servers in the rack have a drive failure in the first year, is it the drive design or the server design? Given the relative effort that usually goes into HDD design and box design, it's more likely to be due to poor thermal management in the drive enclosure. Back in the day when Apple made computers (yes, they did once, before they outsourced it) their thermal management was notoriously better than that of many of the vanilla PC boxes, and properly designed PC-format servers like the HP Kayaks were just as expensive as Macs. The same, of course, went for Sun, and that was one reason why elderly Mac and Sparc boxes would often keep chugging along as mail servers until there were just too many people sending big attachments.

    One possibly related oddity that does interest me is laptop prices. The very cheap laptops are often advertised with optional 3 year warranties that cost as much as the laptop. Upmarket ones may have three year warranties for very little. I find myself wondering if the difference in price really does reflect better standards of manufacture so that the chance of a claim is much less, whether cheap laptops get abused and are so much more likely to fail, or whether the warranty cost is just built into the price of the more expensive models because most failures in fact occur in the first year.

    --
    From scarped cliff or quarried stone she cries "A thousand types are gone, I care for nothing, no not one."
    1. Re:MTBF is a useful statistical measure by dosboot · · Score: 1

      Happen to know the MTTF for typical drives?

    2. Re:MTBF is a useful statistical measure by mindslut · · Score: 1

      I think that "MTBF" is a misleading statistic because of the first word: "mean" or average. Drive reliability tends to be related to "batches", it is not evenly distributed over the whole population of drives. It is unlikely you will get something close to "mean" reliability.

      It is not hard to understand why: hard drives are very complex electro-mechanical devices, and are a commodity under intense cost pricing pressures. There are many high tolerance parts in the system of a drive upon which the whole drive's reliability depends. Periodically one of those components will go through a period of sub-standard quality or reliability, resulting in the lots of drives with that sub-std component having much higher than spec'd MTBF rates. The same model drives from a different "lot" may last virtually forever.

      This bursty reliability is inherent in the product and is common to all drive manufacturers because at the lowest level of components they are all using the same few, trusted, cost-effective suppliers. You really cannot guess the reliability based on the brand or model of drive, though MTBF does give you an idea of what level of reliability they were designed for.

      Hard drives are miracle devices in some ways, and on average very reliable. The challenge is that we entrust them with our invaluable data and need them to be perfectly reliable.

      Multiple disks and software (RAID) seem to be the answer in the data center but considering the bursty reliability - when you have many, many drives, it is quite possible that two drives in a RAID group will fail. This gave rise to diagonal parity RAID, aka RAID 6 or RAID-DP, which allows you to recover your data even if two of the drives fail.

      But what do we do about our laptop and home desktop drives? Is anybody committed to RAID at the desktop and laptop levels? Why do laptops come with one fallible drive?

      EP
      "not in the drive business, anymore"

    3. Re:MTBF is a useful statistical measure by mindslut · · Score: 1

      Periodically one of those components will go through a period of sub-standard quality or reliability, resulting in the lots of drives with that sub-std component having much higher than spec'd MTBF rates.

      Oops. Should have read:

      Periodically one of those components will go through a period of sub-standard quality or reliability, resulting in the lots of drives with that sub-std component having much lower than spec'd MTBF rates.
    4. Re:MTBF is a useful statistical measure by petermgreen · · Score: 1

      you can really feel (in how much it creaks etc) the difference in build quality between a cheap craptop and a decent laptop (macbook, thinkpad, vaio, latitude etc).

      If the case is creaking/bending that is going to be putting stress on all the circuit boards which is going to increase the failure rate.

      and there are many other areas in which spending a little extra money upfront will really increase the reliability of a piece of electronic equipment that is going to be moved around all the time (for example my old time craptop has the hard drive solidly screwed to the case, I believe my macbook has it supported by shock absorbing rubber runners).

      --
      note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
  25. RAID, If You Really Care by crunchy_one · · Score: 2, Insightful

    Hard drives have been becoming less and less reliable as densities increase. Seagate, WD, Hitachi, Maxtor, Toshiba, heck, they all die, often sooner than their warranties are up. They're mechanical devices, for crying out loud. So here's a bit of good advice: If you really care about your data, use a RAID array with redundancy (RAID 1 or 5). It will cost a bit more, but you'll sleep better at night. Thank you all for your kind attention. That is all.

    1. Re:RAID, If You Really Care by AK+Marc · · Score: 1

      Just remember, redundant disks is not redundant data. If a drive starts to fail and inserts errors in the stripe before it is detected and taken off-line, you will be left with useless data. I've seen it happen. RAID isn't backup. RAID is uptime. Backups are backup. And an untested backup is a worthless backup.
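
To make the "errors in the stripe" point concrete, here is a toy Python sketch of single-parity (RAID 5 style) reconstruction. It is not how any real controller is implemented; the block contents and the helper `xor_blocks` are invented for illustration.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


data = [b"AAAA", b"BBBB", b"CCCC"]   # three data blocks in one stripe
parity = xor_blocks(data)            # parity block stored on a fourth disk

# Disk 1 dies cleanly: its block is rebuilt from the survivors plus parity.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])            # True

# Disk 0 silently returns a flipped bit before disk 1 dies:
corrupted = bytes([data[0][0] ^ 0x01]) + data[0][1:]
rebuilt_bad = xor_blocks([corrupted, data[2], parity])
print(rebuilt_bad == data[1])        # False: the error spreads into the rebuild
```

Parity alone cannot tell which block is wrong, only that something is. That is why a rebuild from a quietly failing disk can hand back garbage, and why a tested backup is still needed.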

  26. It belongs in a museum by Cowclops · · Score: 1

    Panama Hat: SO DO YOU!

  27. Typical misleading title (and bad article) by oren · · Score: 2, Insightful

    Disk reliability metrics are much more science than myth. Like all science, this means you actually need to put some minimal effort into understanding them. Unlike myths :-)

    Disks have two separate reliability metrics. The first is their expected life time. In general disks failure follows a "bathtub distribution". They are much more likely to fail at the first few weeks of operation. If they make it past this phase, they become very reliable - for a while anyway. Once their expected lifetime is reached, their failure rate starts steeply climbing.

    The often quoted MTBF numbers express the disk reliability during the "safe" part of this probability distribution. Therefore, a disk with an expected lifetime of, say, 4 years, can have an MTBF of 100 years. This sounds theoretical until you consider that if you have 200 of such disks, you can expect that on average two of them will fail each year.

    People running large data warehouses are painfully aware of these two separate numbers. They need to replace all "expired" disks, and also have enough redundancy to survive disk failures in the duration.
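
The bathtub shape described above can be sketched as a piecewise failure-rate function. Every number below is a made-up placeholder chosen only to mirror the example (a 4-year expected lifetime and a 100-year MTBF on the flat part); it is not fitted to any real drive data, and the field study quoted later in this thread suggests the flat part may not be so flat in practice.

```python
def bathtub_hazard(age_years: float,
                   infant_period: float = 0.1,
                   lifetime: float = 4.0,
                   base_rate: float = 0.01) -> float:
    """Annual failure rate at a given age (all parameters are made up).

    base_rate=0.01 per year corresponds to an MTBF of 100 years
    during the flat, "safe" part of the curve.
    """
    if age_years < infant_period:
        return base_rate * 10          # early failures: elevated rate
    if age_years <= lifetime:
        return base_rate               # flat bottom of the bathtub
    # wear-out: rate doubles for every year past the expected lifetime
    return base_rate * 2 ** (age_years - lifetime)


for age in (0.05, 1, 3, 5, 7):
    print(f"age {age:>4} y: {bathtub_hazard(age):.3f} failures per drive-year")
```

The quoted MTBF only describes the middle branch, which is why it says nothing about when the wear-out branch begins.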

    The article goes so far as to state this:

    "When the vendor specs a 300,000-hour MTBF -- which is common for consumer-level SATA drives -- they're saying that for a large population of drives, half will fail in the first 300,000 hours of operation," he says on his blog. "MTBF, therefore, says nothing about how long any particular drive will last."

    However, this obviously flew over the head of the author:

    The study also found that replacement rates grew constantly with age, which counters the usual common understanding that drive degradation sets in after a nominal lifetime of five years, Schroeder says.

    Common understanding is that 5 years is a bloody long life expectancy for a hard disk! It would take divine intervention to stop failures from rising after such a long time!

    1. Re:Typical misleading title (and bad article) by NerveGas · · Score: 0

      Actually, their numbers are more based in fantasy (marketing) than reality or science.

      They claim an MTBF in the ballpark of 50 years, but that's just a number pulled out of their rectal cavity.

      If you take a large number of drives and perform scientifically valid MTBF failures, you would certainly come up with a number less than half of that, and perhaps as low as 10% of that.

      --
      Oh, you're not stuck, you're just unable to let go of the onion rings.
    2. Re:Typical misleading title (and bad article) by mobby_6kl · · Score: 2, Insightful

      They claim an MTBF in the ballpark of 50 years, but that's just a number pulled out of their rectal cavity.

      If you take a large number of drives and perform scientifically valid MTBF failures, you would certainly come up with a number less than half of that, and perhaps as low as 10% of that.

      Where did you pull these numbers from?
    3. Re:Typical misleading title (and bad article) by IdeaMan · · Score: 1

      Where did you pull these numbers from?

      Uhh I don't think we want to know...
      --
      They ARE out to get you simply because They are in it for themselves and they don't care about you.
  28. MTBF assumes drives are replaced every few years by AySz88 · · Score: 3, Informative

    MTBF is only valid during the "lifetime" of a drive. (For example, "lifetime" might mean the five years during which a drive is under warranty.) Thus, the MTBF is the mean time between failures if you replace the drive every five years with other drives with identical MTBF. Thus the 100-some year MTBF doesn't mean that an individual drive will last 100+ years, it means that your scheme of replacing every 5 years will work for an average time of 100+ years.
    Of course, I think this is another deceptive definition from the hard drive industry... To me, the drive's lifetime ends when it fails, not "5 years".
    Source: http://www.rpi.edu/~sofkam/fileserverdisks.html

  29. Wow... by NerveGas · · Score: 1

    Was this even a question? I mean, did anybody actually believe the claims from the hard drive manufacturers?

    --
    Oh, you're not stuck, you're just unable to let go of the onion rings.
  30. To quote Scott McNealy by HockeyPuck · · Score: 1

    People say, 'Tape is kind of boring.' Well, I say go in and tell your customer that you have lost their back-up tapes and you'll see excitement pretty quickly.

  31. try calling sun recently? by Anonymous Coward · · Score: 0

    Have you tried to get a drive replaced from Sun recently?
    It'll take you an hour just to reach someone and then they aren't even in the right department.
    Why?
    Within the past six months the amount of hard drive failures on their equipment has skyrocketed.
    They can't hire people fast enough to answer the phones/web requests for new ones.

  32. WD Green drive - marketing invention by Anonymous Coward · · Score: 1, Interesting

    disclaimer: I work for Samsung 3.5" HDD Lab

    One difference would be that the voice coil motor that pushes the head back and forth on seeks on Samsung drives runs slower, but quieter and lower power. Samsung drives generally have a reputation for being lower power. That has been one differentiating factor between Samsung versus Seagate, Fujitsu and Western Digital. However, an even bigger difference is the number of disks in the drive. The more disks, the harder all of the motors have to work.

    There are differences from model to model within vendors as well. For each new model of hard drive you have a custom designed motor, enclosure, ICs, media, etc. The technology is moving so fast it is hard to follow. The current generation is the 1TB disks.

    One funny example is that right now Western Digital is pushing their so-called "Green" 5400 rpm drives. Running at 5400 rpm does indeed use less power -- but they didn't set out to make a low power drive. Engineering was simply unable to get their 1TB drive to work at the higher performance 7200 rpm. So, they marketed it as a "green" drive, and had a huge success!

  33. Externals failing power supplies by Anonymous Coward · · Score: 0

    A little off-topic, but maybe someone should start complaining about stupid crappy power supplies failing for external hard drives.
    I have had TWO failed power supplies happen on me, one a couple days ago because i accidentally kicked it when i tripped (on the floor...)
    And it wasn't even a full-force kick either, its like one of those nudges you'd use to check if the guy you ran over was dead.

    Also, both drives were from Seagate, it could be that they just fail at making anything good (i'd have to stick with that one)

    I'm almost considering ripping the drive out, breaking the plastic into pieces then mailing it back to Seagate with a little message, something like "LOOK AT ME! I'M A FUCKING WRECK! THIS IS YOUR FAULT DAMN IT! WHY DID YOU USE THOSE STUPID SECURE TORX SCREWS? YOU GOD DAMN SHITPENGUIN!"

    1. Re:Externals failing power supplies by petermgreen · · Score: 1

      did they really use tamper proof torx rather than regular torx? tamper proof torx (with the dimple in the middle of the screw) is quite rare in my experience.

      regular torx is quite common in such equipment because it gives very good torque transfer and at least in my experience doesn't tend to foul.

      either way the bits aren't hard to get.

      --
      note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
  34. Seagate = 5 yrs, not 3. by ClioCJS · · Score: 2, Informative

    Last I checked.

    --
    -Clio
    Karma: Bad (mostly from not giving a fuck)
    Blog: http://clintjcl.wordpress.com
  35. MTBF Curve by hla · · Score: 1

    I would expect that MTBF for new and old kit would differ significantly, and found http://www.tech-faq.com/mtbf.shtml which defines the 'MTBF Curve' - the variation of MTBF across a product's lifecycle.

    However, by the very nature of the beast, this cannot be grasped by a single number, which is what marketeers prefer.

    This would be a nice application for Sparklines [http://en.wikipedia.org/wiki/Sparkline]

    --
    change is inevitable ... change i3 !nevitable ... change i3 inevitable cbange i3 !n
    1. Re:MTBF Curve by Anonymous Coward · · Score: 0

      > I would expect that MTBF for new and old kit...

      I don't think there has ever been a harddrive kit. At least in 35 years in this business I have never seen one. If someone did sell a kit, you'd have to have a clean room to assemble it. A harddrive kit? You sound more like an Onion article.

    2. Re:MTBF Curve by Anonymous Coward · · Score: 0

      Kit = Gear = Stuff = Items...

  36. my 20GB WD has been on continuously since 99 by Anonymous Coward · · Score: 0


    on this workstation (pIII500 1GB RAM XP/98) my 20GB western digital has been on for more or less 24/7 since 1999 and is still going strong, sure its a bit noisy (till it spins down) but no errors or bad blocks and the transfer rate is as good as ever
    my server, on the other hand, has gone through 2 drives in 4 years and in my day job i have seen many a maxtor/wd fail 6 - 18 months after the customer purchased it /old adage

    they just dont make 'em like they used to

  37. What it is matters by MrNougat · · Score: 1

    Things which have moving parts have shorter lifespans than things which are solid state.

    Of the pieces of hardware in a computer, these have moving parts:

    Cooling fans
    Hard drives
    Removeable media drives (CD, floppy, tape, etc)
    Switches (power button and the like)
    Ports (if you count things like the pins inside network and modem ports)

    All of those things except hard drives could fail simultaneously, and you'd be pretty likely to be able to have the server running again in short order - by pulling the hard drive(s) and transplanting them into identical hardware. If any one of those non-hard drive things fails alone, the server is likely to continue running long enough to effect replacement before catastrophe (excepting maybe cooling fans, but you still wouldn't have a recovery scenario).

    If the hard drives all simultaneously fail, you need to restore from backup, which is a lot of downtime.

    Because failed drives result in so much downtime, fudging the reliability statistics on hard drives creates an exponentially higher risk to the consumer, one they are not generally aware of.

    Note: I know about RAID, and everyone should use redundant disks. I'm talking a complete failure of all hard drives in a system. Unlikely, yes, but not out of the question. I had two of three drives in a RAID5 in predictive failure on a brand new HP server that I had just finished a customer project with, and had to wait until after the weekend to receive new drives. I was damned lucky.

    --
    Web 2.0 == Giant Blogspam Circle Jerk
  38. In Other Words .... by PPH · · Score: 1

    ... disk drive manufacturers insist on putting spin on metrics.

    Sorry. Mod me -1 Bad Pun.

    --
    Have gnu, will travel.
  39. Except... by absurdist · · Score: 2, Insightful

    ...that by the time the drive fails beyond that warranty, the vendor more likely than not won't have any drives that small in stock. So they'll replace it with whatever's on the shelf, which is usually an order of magnitude larger, at the very least.

  40. OMG!!! Johnny Wadd (aka:John Holmes) Resurrected! by rts008 · · Score: 1

    I would not haul MY wang out to the trash...especially if it weighed over 100 pounds! *ducks and runs*

    --
    Down With Slashdot BETA!!! I've been around the corner and seen the oliphant; you can only abuse me from your perspecti
  41. Given enough time... by Stormwatch · · Score: 1

    Given enough time, the failure rate for any kind of device is 100%.

    1. Re:Given enough time... by Anonymous Coward · · Score: 0

      Exactly what was the point of your post?

      No, seriously, I can't see what you brought to this discussion. Are you just looking for somebody to address you as Captain Obvious?

  42. The variation is more important than the mean by Anonymous Coward · · Score: 0

    Which would you rather own? A model that lasts 5 years to the day no matter how many you own, or a model that on average lasts 5 years but is as likely to fail on its 10th day as it is on its 10th anniversary of service?

    Bell curves can be steep or shallow. Steeper curves, having values tightly clustered around the mean, lend themselves to far more predictable results. And the worst part about drive failures is that they're so freaking unpredictable.
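
Here is a small Python sketch of that comparison, using the standard library's `statistics.NormalDist`. Treating drive lifetimes as normally distributed is purely an assumption made to match the "bell curve" wording (a wide normal even assigns some probability to negative lifetimes), and both sigma values are invented.

```python
from statistics import NormalDist

MEAN_LIFE_YEARS = 5.0

steep = NormalDist(mu=MEAN_LIFE_YEARS, sigma=0.25)    # tightly clustered
shallow = NormalDist(mu=MEAN_LIFE_YEARS, sigma=2.5)   # widely spread

for name, dist in (("steep", steep), ("shallow", shallow)):
    p_early = dist.cdf(2.0)  # chance of dying before the two-year mark
    print(f"{name:>7}: mean {dist.mean} y, P(fail before 2 y) = {p_early:.4f}")
```

Both models advertise the same 5-year mean, yet only the shallow one leaves a meaningful chance of losing a drive in its first two years.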

    I'm surprised so many have commented on the MTBF without so much as mentioning the relevance of the variation.

    1. Re:The variation is more important than the mean by sakusha · · Score: 1

      Statistically, those two options are equal.

      I'm not a statistician, but I recall hearing the problem described as this: Suppose you could design an anti-ballistic missile shield that was only 80% efficient. There are two design options. In one case, 20% of the missiles will always penetrate. And in the other case, 20% of the time, all missiles will penetrate. These options were asserted to be equal risks, statistically. Maybe someone else could explain this reasoning, but I've long forgotten the details.
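
One way to see in what sense the two shields are "equal" is to compute a few summary numbers for each. This is only a sketch: the attack size of 10 missiles is an arbitrary assumption, and exact fractions are used to avoid rounding noise.

```python
from fractions import Fraction

MISSILES = 10  # hypothetical attack size

# Each design is a list of (probability, missiles that get through).
design_a = [(Fraction(1), MISSILES // 5)]           # always 20% penetrate
design_b = [(Fraction(1, 5), MISSILES),             # 20% of the time: all
            (Fraction(4, 5), 0)]                    # 80% of the time: none


def mean(dist):
    return sum(p * x for p, x in dist)


def variance(dist):
    m = mean(dist)
    return sum(p * (x - m) ** 2 for p, x in dist)


def p_any_hit(dist):
    return sum(p for p, x in dist if x > 0)


for name, dist in (("A", design_a), ("B", design_b)):
    print(name, "mean:", mean(dist), "variance:", variance(dist),
          "P(any hit):", p_any_hit(dist))
```

The expected number of missiles getting through is the same for both designs, which is presumably the sense in which they were called equal. The variance and the chance of being hit at all are very different, so which one is "better" depends on what you are trying to protect.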

    2. Re:The variation is more important than the mean by zippthorne · · Score: 1

      They are, but it's not a good mapping to the GP.

      For in the first case, if you add one more criterion, namely that you know which 20% are going to fail, you can set up another, parallel system that happens to have a very specific, different 20% of failure cases, and end up with a 100% missile shield.

      Whereas in the second case, the best you can do is add layers in series, each filtering out 80% of what remains.

      Obviously, we're ignoring the case where only one missile defense system would be an option because it's not the case where you would only be limited to one hard drive for any other than fiscal reasons.

      --
      Can you be Even More Awesome?!
  43. Enough with the little sample sizes by fluffy99 · · Score: 2, Insightful

    To the guys who claim they've never lost a drive, you've had what? Maybe 3 or 4? I deal with several large raids, encompassing a few hundred drives and running 24/7. The power and cooling are very tightly controlled. Looking at our statistics, we have about a 5% failure rate for drives within the first year. About 10% over four years. SCSI drives seem to last longer than SATA drives, but they are also much more expensive. The MTBF numbers from the manufacturers are total BS. The best number to go by is the warranty, because that's what matters to the manufacturer. Depending on the expected failure rate of a particular model and the profit margin, they set the warranty period to minimize the number of replacements and still be able to make a profit. For some models that might be a 5% or even 10% warranty replacement rate.

  44. WRONG WRONG WRONG WRONG WRONG! by cwm9 · · Score: 1

    All of this is WRONG.

    All this is just confusion, admittedly happily encouraged by the hardware manufacturers.

    MTBF is NOT and has NOTHING to do with the expected time before a drive fails.

    MTBF is the expected time between failures in a SYSTEM which is REGULARLY MAINTAINED.

    What does regularly maintained mean? It means that when a component reaches the end of its SERVICE LIFE that component is REPLACED.

    TO WIT: If at the end of the warranty period of your drive you replace said drive with a new burned-in hard drive, copying the data from old drive to new drive, and you keep doing this over and over again, on average it will take the MTBF before you encounter a failure.

    Also, MTBF figures are notoriously inaccurate as they are arrived at using a formula which takes into account the MTBF of each component that goes into a system -- components which often have incorrect MTBF times.

    Example: An electrolytic capacitor might have an MTBF of 100,000 years, assuming you replace it with a new tested electrolytic capacitor of the same type every year before all the electrolyte evaporates!
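
A rough Python sketch of what that replacement scheme implies for a single unit. It assumes a constant failure rate of 1 / MTBF for as long as the part is within its service life, which is the textbook simplification and not something stated in this thread; the drive figures are hypothetical.

```python
import math


def p_fail_within_service_life(mtbf_years: float, service_life_years: float) -> float:
    """Chance one unit fails before its scheduled replacement,
    assuming a constant failure rate of 1 / MTBF while in service."""
    return -math.expm1(-service_life_years / mtbf_years)


# The capacitor example: MTBF of 100,000 years, replaced every year.
print(f"{p_fail_within_service_life(100_000, 1):.6%}")

# A hypothetical drive: MTBF of 100 years, replaced every 5 years.
print(f"{p_fail_within_service_life(100, 5):.2%}")
```

The large MTBF and the short service life are both needed to get these small numbers. Run the part past its service life and the constant-rate assumption, and with it the MTBF figure, no longer applies.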

    Knowing the MTBF without knowing the service life of the component or the burn-in procedures for a component is meaningless.

    For more info see: http://www.apcmedia.com/salestools/VAVR-5WGTSB_R0_EN.pdf

    This has been hashed and rehashed over and over.

    1. Re:WRONG WRONG WRONG WRONG WRONG! by cwm9 · · Score: 1
    2. Re:WRONG WRONG WRONG WRONG WRONG! by fluffy99 · · Score: 1

      MTBF is NOT and has NOTHING to do with the expected time before a drive fails.

      For those who don't know the commonly accepted definition of MTBF: http://en.wikipedia.org/wiki/Mean_time_between_failures. So yes and no. Some definitions say MTBF is not supposed to include the first failure and some definitions say it does. However, for refurbished drives MTBF should predict the next failure. In both cases the stated MTBF is total hogwash and everyone knows it.

      MTBF is the expected time between failures in a SYSTEM which is REGULARLY MAINTAINED.

      For a system that has no serviceable parts, MTBF is effectively the average service life. How exactly do you maintain your HD?
    3. Re:WRONG WRONG WRONG WRONG WRONG! by cwm9 · · Score: 1

      Your SYSTEM is your COMPUTER. You maintain it by replacing the hard drive and transferring the data as I said in the original post.

    4. Re:WRONG WRONG WRONG WRONG WRONG! by fluffy99 · · Score: 1

      Okay, so a MTBF for the computer as a whole would be meaningful as it is a system and has serviceable components (parts that need maintenance and have individual service life expectancies). MTBF for a non-serviceable item is meaningless.

      To stretch the example, MTBF makes sense for a car that is properly maintained. After the initial break-in period, assuming routine maintenance is performed, MTBF represents the unexpected failures such as the starter going out.

    5. Re:WRONG WRONG WRONG WRONG WRONG! by cwm9 · · Score: 1

      MTBF for a single part makes perfect sense.

      In fact, all components that go into the hard drive have an MTBF, and it is those MTBFs that all get stuck in a big calculation to get the resultant MTBF.

      Nothing in the definition of MTBF says anything about replacing the entire unit not being permitted.

      Single capacitors/resistors/etc. have MTBFs.

      If you want to calculate the MTBF of your computer, you will need the MTBF of the hard drive, the processor, memory, motherboard... etc.

      It makes perfect sense to talk about the MTBF of the drive by itself. So long as you replace it on a regular basis, copying the data each time, you can expect performance on average to the MTBF.
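
For what it's worth, the simplest version of that "big calculation" is the series model: the system fails when any one component fails, so the failure rates add. The Python sketch below assumes independent components with constant failure rates, and the component figures are placeholders, not vendor data; real reliability predictions are considerably more involved.

```python
def system_mtbf(component_mtbfs_hours: dict[str, float]) -> float:
    """MTBF of a system that fails when any one component fails,
    assuming independent components with constant failure rates."""
    total_rate = sum(1.0 / mtbf for mtbf in component_mtbfs_hours.values())
    return 1.0 / total_rate


# Placeholder figures for illustration only, not vendor data.
parts = {
    "hard drive": 500_000,
    "processor": 2_000_000,
    "memory": 2_000_000,
    "motherboard": 1_000_000,
}
print(f"{system_mtbf(parts):,.0f} hours")
```

The combined figure is always lower than the weakest component's MTBF, and an error in any single input carries straight through to the result.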

    6. Re:WRONG WRONG WRONG WRONG WRONG! by cwm9 · · Score: 1

      Actually, in your car example, you would have to know the service life of the starter and replace it every time the service life expired for the MTBF to apply.

      The thing about MTBF is it's pretty useless because most of the time we don't care if something breaks down... we just repair it.

      For example, you don't go replacing the starter on a regular basis because if it goes out you just replace it -- no harm done. You don't replace your hard drive on a regular basis because if it goes out you buy a new one and restore your backups.

      MTBFs apply more to things like 747s, where you DO replace certain parts on a regular basis because if, say, the turbines in the jet failed, you might die.

      NASA cares about MTBFs. You can bet that when a component in a schematic call-out says, "replace every 5 flights", they do it. The MTBF for the shuttle may be very high, but the service life of an o-ring might only be 2 months.

      The problem is that MTBF doesn't do anything for the average consumer. It's a worthless metric for them.

      Now suppose you need a hard drive to go into a U-2 spy plane...

      You can bet the builder of that plane is going to find out what both the service life AND MTBF of that hard drive is, and that hard drive will get replaced every so many years, and the MTBF will go into the equation for the reliability of that plane.

    7. Re:WRONG WRONG WRONG WRONG WRONG! by cwm9 · · Score: 1

      I guess the thing about MTBF is that it's just a huge number that manufacturers throw out there because consumers don't understand what it really means.

      It's like writing "100% Juice" on the front of a Cranberry Cocktail bottle -- the statement is true, but 90% of consumers don't get that it's 80% APPLE/GRAPE juice plus 20% cranberry. But does it help sell Cranberry Cocktail? Hell, yes!

      Is the MTBF of a drive real? Probably pretty close. Does it mean anything for the consumer? Not even close. Does it help sell drives? Hell, yes!

    8. Re:WRONG WRONG WRONG WRONG WRONG! by fluffy99 · · Score: 1

      No. A capacitor is non-serviceable and the first failure is the end of life. The proper term here is MTTF, mean-time-to-failure.

  45. Real world numbers by fluffy99 · · Score: 1
    From the conclusions section of http://db.usenix.org/events/fast07/tech/schroeder/schroeder_html/index.html


    7 Conclusion

    Many have pointed out the need for a better understanding of what disk failures look like in the field. Yet hardly any published work exists that provides a large-scale study of disk failures in production systems. As a first step towards closing this gap, we have analyzed disk replacement data from a number of large production systems, spanning more than 100,000 drives from at least four different vendors, including drives with SCSI, FC and SATA interfaces. Below is a summary of a few of our results.

    • Large-scale installation field usage appears to differ widely from nominal datasheet MTTF conditions. The field replacement rates of systems were significantly larger than we expected based on datasheet MTTFs.

    • For drives less than five years old, field replacement rates were larger than what the datasheet MTTF suggested by a factor of 2-10. For five to eight year old drives, field replacement rates were a factor of 30 higher than what the datasheet MTTF suggested.

    • Changes in disk replacement rates during the first five years of the lifecycle were more dramatic than often assumed. While replacement rates are often expected to be in steady state in year 2-5 of operation (bottom of the "bathtub curve"), we observed a continuous increase in replacement rates, starting as early as in the second year of operation.

    • In our data sets, the replacement rates of SATA disks are not worse than the replacement rates of SCSI or FC disks. This may indicate that disk-independent factors, such as operating conditions, usage and environmental factors, affect replacement rates more than component specific factors. However, the only evidence we have of a bad batch of disks was found in a collection of SATA disks experiencing high media error rates. We have too little data on bad batches to estimate the relative frequency of bad batches by type of disk, although there is plenty of anecdotal evidence that bad batches are not unique to SATA disks.

    • The common concern that MTTFs underrepresent infant mortality has led to the proposal of new standards that incorporate infant mortality[33]. Our findings suggest that the underrepresentation of the early onset of wear-out is a much more serious factor than underrepresentation of infant mortality and recommend to include this in new standards.

    • While many have suspected that the commonly made assumption of exponentially distributed time between failures/replacements is not realistic, previous studies have not found enough evidence to prove this assumption wrong with significant statistical confidence[8]. Based on our data analysis, we are able to reject the hypothesis of exponentially distributed time between disk replacements with high confidence. We suggest that researchers and designers use field replacement data, when possible, or two parameter distributions, such as the Weibull distribution.

    • We identify as the key features that distinguish the empirical distribution of time between disk replacements from the exponential distribution, higher levels of variability and decreasing hazard rates. We find that the empirical distributions are fit well by a Weibull distribution with a shape parameter between 0.7 and 0.8.

    • We also present strong evidence for the existence of correlations between disk replacement interarrivals. In particular, the empirical data exhibits significant levels of autocorrelation and long-range dependence.
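
    To make the Weibull bullets above concrete, here is a minimal Python sketch (my own illustration, not code from the paper). It evaluates the Weibull hazard rate for the shape values quoted above, with shape 1.0 included as the exponential case for comparison. The function names and the scale of 1.0 are assumptions chosen for illustration.

```python
import math

def weibull_hazard(t, shape, scale=1.0):
    """Hazard rate h(t) = (k/lam) * (t/lam)**(k-1) of a Weibull distribution."""
    return (shape / scale) * (t / scale) ** (shape - 1)

def weibull_cv(shape):
    """Coefficient of variation (std/mean); it is exactly 1 for shape == 1."""
    g1 = math.gamma(1 + 1 / shape)
    g2 = math.gamma(1 + 2 / shape)
    return math.sqrt(g2 - g1 ** 2) / g1

# Shape 0.7 and 0.8 are the values quoted above; 1.0 is the exponential case.
for k in (0.7, 0.8, 1.0):
    rates = [round(weibull_hazard(t, k), 3) for t in (0.5, 1.0, 2.0, 4.0)]
    print(f"shape={k}: hazard at t=0.5,1,2,4 -> {rates}, CV={weibull_cv(k):.2f}")
```

    With a shape below 1 the hazard falls as the time since the last replacement grows, and the coefficient of variation comes out above 1. Those are the two features (decreasing hazard rates, higher variability) that the quoted text uses to distinguish the data from an exponential distribution.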
  46. I completely disagree! by Atario · · Score: 1
    --
    "A great democracy must be progressive or it will soon cease to be a great democracy." --Theodore Roosevelt
  47. Re:Misunderstanding MTBF [indeed!] by AliasMarlowe · · Score: 1

    If you have 1,000 HD's running, and it takes 40 hours before one fails, that's a 40,000 hr MTBF. Manufacturers employ a more sophisticated method to arrive at unreliable numbers for reliability.

    The manufacturer tests a population of drives and waits for a significant fraction of the drives under test to fail, recording the failure time for each. In this way, it is possible to separate "infant mortality" failures from "random event" failures. Typically, the failure times are fitted to a Weibull distribution http://en.wikipedia.org/wiki/Weibull_distribution. This process also provides a value for the post-manufacture burn-in time which will kill most units which are prone to "infant mortality" type failure. The MTBF is estimated based on the "random event" failure rate, giving absurdly large MTBF values. Unfortunately, the test rarely lasts long enough to identify the third type of failure: "wear out", which determines the end of life, and which is often less than the MTBF.

    Think of estimating human life expectancy in the U.S. as MTBF, using data from http://www.data360.org/dsg.aspx?Data_Set_Group_Id=587. A small fraction (683 out of 100000) die in the first year. Death rates are low for the next 60 years, then climb to a very steep peak. The 85+ category is not subdivided, because very few live long enough to make separate 85-94 and 95+ categories worthwhile. However, if life expectancy were calculated for humans in the same way as MTBF for disk drives, then only deaths between ages 1 and 24 would be used. Since from the 99317 who survived infancy, only 126 die in that time, the MTBF for humans would be estimated at several centuries. If only we didn't wear out...
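
    The arithmetic behind both examples can be sketched in a few lines of Python. This is only an illustration: population_mtbf is a name I made up, and treating the 126 deaths as a per-year figure is my assumption (the numbers above do not say so explicitly), chosen because it reproduces the "several centuries" result.

```python
def population_mtbf(units, period, failures):
    """Naive MTBF estimate: accumulated unit-time divided by observed failures."""
    if failures <= 0:
        raise ValueError("need at least one observed failure")
    return units * period / failures

# 1,000 drives running for 40 hours with one failure.
drive_mtbf_hours = population_mtbf(1000, 40, 1)

# Human analogy: 99,317 infancy survivors, assuming 126 deaths per year
# (the per-year reading is an assumption, not stated in the data quoted above).
human_mtbf_years = population_mtbf(99317, 1, 126)

print(f"drives: {drive_mtbf_hours:,.0f} hours")
print(f"humans: {human_mtbf_years:,.0f} years")
```

    Nothing in this calculation knows about wear-out, which is the whole point: the estimate only reflects the failure rate during the window that was observed.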
    --
    Those who can make you believe absurdities can make you commit atrocities. - Voltaire
  48. 2 HD failures in 8 years by alizard · · Score: 1

    However, I generally run out of HD space long before the drive has had time to wear out and I buy drives in pairs, one main, one mobile rack with rsync backup, and I occasionally rotate the main and backup drives to equalize wear.

  49. different strokes by alizard · · Score: 1

    I do straight disk image backups to a mobile rack. I do this 3x a week using a modified Knoppix disk with backup scripts (to find out how, google on alizard Knoppix), taking about 15 minutes to rsync the drives together. Upside? If my main HD fails, I'm back up and running in 15 minutes. (I'm running LVM, so I have to change the volume ID to boot normally from a Linux initrd.) For you, a bare-metal restore is going to take a lot longer.

  50. since I'm an individual user by alizard · · Score: 1

    I back up to a mirror drive in a mobile rack, which is unplugged from the computer when not actively backing up. I back up 3x/week (IOW, when KAlarm tells me to) using a Knoppix disk modified with rsync and dar backup scripts.

    The problem with RAID is the obvious one. If a disk drive in a RAID environment fails due to factors extrinsic to the drive (i.e. lightning bolt blowing up the UPS and surge protector and dumping into the PSU), every redundant drive probably goes with it. The way to avoid this is ... secondary backup storage, whether to another site, NAS, a backup server, or pile of DVD-Rs.

    1. Re:since I'm an individual user by MrNougat · · Score: 1

      That's not a "problem" with RAID. That has nothing to do with the failure rates of hard drives under normal conditions. Fire, flood, lightning strike - these are all anomalous external forces. Hard drives tend to fail when you put them in a drill press, too, but I wouldn't expect those events to have a statistical significance.

      Besides which, I wasn't talking about "what you should do to be safe in case your hard drives all die." All of that is obvious. I was pointing out that, because of the misrepresentation of failure rates in hard drives, the risk to the consumer of said drives is both higher and more hidden.

      --
      Web 2.0 == Giant Blogspam Circle Jerk
  51. and when a PSU glitch fries your RAID drives? by alizard · · Score: 1

    I use a drive mirror in a mobile rack that is unplugged when not backing up (schedule 3x a week and I'm pretty religious about it), and back up to an offline DVD-R pile monthly. My rack cost me about $20, it's a very nice aluminum case to an SATA plug. And the last time I had to replace the drive, I was up and running in 10 minutes and at the Maxtor site working on my RMA a few minutes later. (props to Maxtor, the warranty replacement was hassle-free)

    And yes, I do sleep better at night. If your RAID array doesn't have some sort of separate offline/nearline backup, you shouldn't.

    1. Re:and when a PSU glitch fries your RAID drives? by crunchy_one · · Score: 1

      You betcha, alizard, both RAID and regular offsite backups are mandatory for that good, refreshing sleep.

  52. Re:Misunderstanding MTBF [indeed!] by dh003i · · Score: 1

    LOL, great post & analogy. Yea, if only we didn't wear out.

    Thanks for the more detailed analysis of how MTBF is calculated (and how burn-in failures are ignored -- shouldn't that also be something that they report?). So it seems like this calc is just enough to get beyond burn-in duds or DOA, and into maybe the "mid-life" of the HD, but not into the burn-out phase. Although with 3 or 5 years being burn-out, that would be impractical to calculate. Albeit, they could provide an estimate based on reported burnouts (StorageReview.com) of their similar HDs manufactured with similar processes.

  53. Lots of drive failures are... not by Anonymous Coward · · Score: 0

    I see this a lot in my line of work. While I have had the occasional hard drive die (I look after a lot of machines, so my odds are up over the normal person - I also have backups so failures aren't a problem), most people who say "my hard drive died" are usually more like "windows refused to boot so I got sold a new hard drive".

    I'm really surprised at the rush to sell people new hard drives in the asshole stores. Just this weekend I got given a drive by someone to 'recover' all their info from it after it had 'died'. I stick it in a case... it's fine. The only thing wrong with it was a fucked up Windows registry. So I happily copy all their files off onto some DVDs. Too bad I don't get paid extra for that. I also wonder what would have normally happened to this 'dead' hard drive after it had been replaced. I bet it would have just been reformatted and gone into the next machine.

  54. Google disagrees by Anonymous Coward · · Score: 0

    Google released a report about a year ago with the surprising finding that heat had no apparent effect on the rate of hard disk failure. This was based on Google's set of several tens of thousands of always-on hard disks.

    1. Re:Google disagrees by IdeaMan · · Score: 1

      Yes, but Google doesn't use laptops.

      I had to replace 3 hard drives in my Dell Inspiron laptop until I wised up to the massive amount of heat coming off of the underside of it and got a laptop cooler.

      --
      They ARE out to get you simply because They are in it for themselves and they don't care about you.
  55. The best drive reliability assessment I could find by the_olo · · Score: 1

    ...is the StorageReview.com Hard Drive Reliability Survey: http://www.storagereview.com/map/lm.cgi/survey_login

    You basically input all the hard drives you possess into their database and then they let you see the statistics collected so far.

    When one of your drives fails, you ought to update its status (the age at which it failed).

    The database is still a bit sparse, but it's the best I could locate on the Internet.

  56. look out for counterfeit drives though by WhiteDragon · · Score: 1

    I bought some hard drives from a company I found through one of those online cheap-price locator sites (not mentioning the name because I don't want to encourage them); it was about 10% cheaper than the next cheapest vendor. My company's policy at the time was to record the serial numbers of all the drives. I noticed that I could not find a serial number printed, but there was a barcode where the serial number field should be. I scanned the barcodes of the drives, noticed that they were all the same, and figured I was just looking in the wrong place. I called Maxtor (the drives were labeled with Maxtor labels), and they had me run some more tests, and they came to the conclusion, "We never made those drives". They were all counterfeit. Needless to say, the drives all failed after only a few days / weeks of use.

    --
    Did you mount a military-grade, variable-focus MASER on an unlicensed artificial intelligence?
  57. Response from Western Digital re drive lifespan by Reziac · · Score: 1

    I just wrote W.D. and asked about *actual* expected lifespan of their hard drives, and received this response:

    ===============
    We no longer measure the reliability of our drives using Mean Time Between Failure (MTBF). Our current drive reliability is measured using Component Design Life (CDL) and Annualized Failure Rate (AFR). The Component Design Life of the drive is 5 years and the Annualized Failure Rate is less than 0.8%.
    ================
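
    For anyone who wants to compare that AFR figure with the old MTBF numbers, here is a rough Python sketch of the usual conversion. It is my own back-of-the-envelope, not anything W.D. sent: it assumes a constant failure rate (exponential model) and a drive powered on all 8,760 hours of the year, and the function names are made up for illustration.

```python
import math

HOURS_PER_YEAR = 8760  # assumes the drive is powered on around the clock

def mtbf_to_afr(mtbf_hours):
    """Annualized failure rate implied by an MTBF under a constant failure rate."""
    return 1 - math.exp(-HOURS_PER_YEAR / mtbf_hours)

def afr_to_mtbf(afr):
    """MTBF in hours implied by an annualized failure rate (inverse of the above)."""
    return -HOURS_PER_YEAR / math.log(1 - afr)

print(f"AFR 0.8% -> MTBF of about {afr_to_mtbf(0.008):,.0f} hours")
for mtbf in (1_000_000, 1_500_000):
    print(f"MTBF {mtbf:,} h -> {mtbf / HOURS_PER_YEAR:.0f} years, AFR {mtbf_to_afr(mtbf):.2%}")
```

    Under those assumptions an AFR of 0.8% works out to roughly 1.09 million hours, so the new figure is in the same ballpark as the old MTBF claims, just expressed per year. The constant-rate assumption is exactly what the field studies dispute, so treat the conversion as a way to compare datasheets, not as a lifespan prediction.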

    --
    ~REZ~ #43301. Who'd fake being me anyway?