Slashdot Mirror


Disk Drive Failures 15 Times What Vendors Say

jcatcw writes "A Carnegie Mellon University study indicates that customers are replacing disk drives more frequently than vendor estimates of mean time to failure (MTTF) would suggest. The study examined large production systems, including high-performance computing sites and Internet services sites running SCSI, FC and SATA drives. The data sheets for the drives indicated MTTF between 1 and 1.5 million hours. That should mean annual failure rates of at most 0.88%, but annual replacement rates were between 2% and 4%. The study also shows no evidence that Fibre Channel drives are any more reliable than SATA drives."
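
For reference, the 0.88% figure in the summary follows from dividing the hours in a year by the data-sheet MTTF. A minimal Python sketch, assuming a constant failure rate and drives powered on around the clock (both are assumptions, and the function name is only illustrative):

```python
HOURS_PER_YEAR = 24 * 365  # 8760, assuming drives run continuously


def annual_failure_rate(mttf_hours: float) -> float:
    """Nominal annual failure rate (as a fraction) implied by an MTTF in hours."""
    return HOURS_PER_YEAR / mttf_hours


for mttf in (1_000_000, 1_500_000):
    # Prints the failure rate a data sheet with this MTTF would imply
    print(f"MTTF {mttf:>9,} h -> AFR {annual_failure_rate(mttf):.2%}")
```

Under those assumptions the data-sheet range works out to roughly 0.58% to 0.88% per year, against the 2% to 4% replacement rates the study observed.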

284 comments

  1. Repeat? by Corith · · Score: 2, Insightful

    Didn't we already see this evidence with Google's report?

    --
    user corith signing off...
    1. Re:Repeat? by georgewilliamherbert · · Score: 3, Informative

      We did both this study and the Google study in the first couple of days after FAST was over. Completely redundant....

    2. Re:Repeat? by LiquidCoooled · · Score: 2, Interesting

      Yes, and it's mentioned in the report.
      The best part about the entire thing is the very last quote:

      "If they told me it was 100,000 hours, I'd still protect it the same way. If they told me it was 5 million hours I'd still protect it the same way. I have to assume every drive could fail."

      Just common sense.

      --
      liqbase :: faster than paper
    3. Re:Repeat? by countSudoku() · · Score: 1

      Yes, it was posted last week... It's still very interesting though.

      http://hardware.slashdot.org/article.pl?sid=07/02/21/004233

      --
      This is the NSA, we're gonna geet U h@x0r5! Also, what is a h@x0r5?
    4. Re:Repeat? by Anonymous Coward · · Score: 0
    5. Re:Repeat? by Anonymous Coward · · Score: 1, Funny

      I read so much from the firehose these days, I can't tell a dupe from the scoop anymore. I guess I need a new tag - dejavu.

    6. Re:Repeat? by ajs · · Score: 5, Informative

      The best part about the entire thing is the very last quote:

      "If they told me it was 100,000 hours, I'd still protect it the same way. If they told me it was 5 million hours I'd still protect it the same way. I have to assume every drive could fail."

      Just common sense.

      It's "common sense," but not as useful as one might hope. What MTTF tells you is, within some expected margin of error, how much failure you should plan on in a statistically significant farm.

      So, for example, I know of an installation that has thousands of disks used for everything from root disks on relatively drop-in-replaceable compute servers to storage arrays. On the budgetary side, that installation wants to know how much replacement cost to expect per annum. On the admin side, that installation wants to be prepared with an appropriate number of redundant systems, and wants to be able to assert a failure probability for key systems.

      That is, if you have a RAID array with 5 disks and one spare, then you want to know the probability that three disks will fail on it in the, let's say, 6 hour worst-case window before you can replace any of them. That probability is non-zero, and must be accounted for in your computation of anticipated downtime, along with every other unlikely but possible event that you can account for.

      When a vendor tells you to expect a 0.2% failure rate, but it's really 2-4%, that's a HUGE shift in the impact to your organization.

      When you just have one or a handful of disks in your server at home, that's a very different situation from a datacenter full of systems with all kinds of disk needs.
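
The kind of estimate described in this comment can be sketched in a few lines of Python. This is only a rough illustration, assuming disks fail independently at a constant rate (an optimistic assumption, since real failures tend to cluster); the function names and the 6-disk, 6-hour scenario are taken from the example above or invented for illustration:

```python
import math

HOURS_PER_YEAR = 24 * 365


def p_fail_in_window(afr: float, window_hours: float) -> float:
    """Probability one disk fails within the window, given an annual failure rate."""
    # Convert the annual rate to a constant per-hour hazard (exponential model)
    rate_per_hour = -math.log(1.0 - afr) / HOURS_PER_YEAR
    return 1.0 - math.exp(-rate_per_hour * window_hours)


def p_at_least(k: int, n: int, p: float) -> float:
    """Binomial probability that at least k of n independent disks fail."""
    return sum(math.comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


# Compare the data-sheet rate (0.88%) with the observed 2% and 4% rates
for afr in (0.0088, 0.02, 0.04):
    p = p_fail_in_window(afr, window_hours=6)
    print(f"AFR {afr:.2%}: P(>=3 of 6 disks fail in 6 h) = {p_at_least(3, 6, p):.3e}")
```

Because the triple-failure probability scales roughly with the cube of the per-disk rate, even a modest understatement of the annual rate changes the result by a large factor.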
    7. Re:Repeat? by Anonymous Coward · · Score: 0

      No, this is an indirect duplicate of the *other* disk drive endurance paper that was posted back in mid-February. What is the sound of Slashdot clapping with one hand? A posting of some other online crud that's merely a poor synopsis of the original, already posted item. So typical...

    8. Re:Repeat? by PitaBred · · Score: 1

      But what kind of money would you budget for replacing/fixing drive failure in each case? That's the rub.

    9. Re:Repeat? by Anonymous Coward · · Score: 0
      Yes,

      Your post IS redundant

    10. Re:Repeat? by Ramble · · Score: 0

      Yes, but as the description says, these have only been tested on high-performance rigs and servers. Usage and environmental conditions affect MTBF. I'm pretty sure that a desktop drive that Granny keeps for playing solitaire in a cool environment will last longer than a hard disk in a hot server room being hammered.

      --
      "Oh boy"
    11. Re:Repeat? by Maxo-Texas · · Score: 1

      By the same logic tho...

      For ANY number of drives, there is a non-zero chance that all will fail too close to the same time.

      You can't win if you play long enough.

      --
      She was like chocolate when she drank... semi-sweet at first and then increasingly bitter.
    12. Re:Repeat? by Detritus · · Score: 2, Funny

      There's also a non-zero probability that all of the air molecules in a room will rush to the corner of the room, suffocating the occupants.

      --
      Mea navis aericumbens anguillis abundat
    13. Re:Repeat? by Baddas · · Score: 2, Funny

      Just think what a fantastic way to die that would be. You'd get all kinds of notoriety.

    14. Re:Repeat? by CastrTroy · · Score: 1

      Doesn't the laws of physics prevent this? Wouldn't a shift of some of the molecules of air to one corner cause a partial vacuum (to some degree) in the rest of the room, causing the room to even itself out. Think of an empty room, with no openings, there'd be very little wind of any kind, apart from heat from the outside causing a little bit of air movement. How is it possible that this could happen?

      --

      Anthropic principle: We see the universe the way it is because if it were different we would not be here to see it.
    15. Re:Repeat? by Courageous · · Score: 1

      It's in fact most likely that the drives will fail at closely clustered periods of time. I.e., if drives are failing, there's a SIGNIFICANTLY non zero chance that others will be failing really soon.

      C//

    16. Re:Repeat? by shaitand · · Score: 1

      If we were really talking about molecules of air sure but by air we really only mean oxygen. There are plenty of non-oxygen molecules in air that could fill the space previously occupied by air and thus all the oxygen could fill one part of the room while all the other gases filled the side you are in.

      Should this ever happen to you I would recommend breathing out while on the ground and jumping as high as possible to take a breath. Oxygen is a lot lighter than CO2 so if the molecules are going to split it will probably be a vertical split rather than a horizontal split.

    17. Re:Repeat? by ShakaUVM · · Score: 3, Informative

      Except MTBF is just pulled out of their asses. Look at the development cycle of a hard drive. Look at the MTBF. I used to work for an engineering company, and have worked doing test suites to determine MTBF. Sure, there's numbers involved, but it's probably 60% wishful thinking and 40% science.

      Believe me, they aren't determining an 11 year MTBF empirically.

    18. Re:Repeat? by turbidostato · · Score: 0

      "Doesn't the laws of physics prevent this?"

      Not at all. The ones that prevent this (for any practical meaning) are statistical ones.

      "How is it possible that this could happen?"

      You must remember that air is, for the most part, just "dirty" vacuum: there's plenty of empty space among air molecules, and they are all just "wandering at will". Of course, there's a chance that by mere coincidence all the air molecules wander into one corner of a room, so the rest of the volume is empty and you suffocate. Of course, too, when you consider the *huge* number of molecules there are in a room (just remember Avogadro's number), you can expect to wait quite a few lifetimes of the universe to see such an event.

    19. Re:Repeat? by philwx · · Score: 1

      I would like to tell them about all the people that throw away perfectly good hard drives and buy replacements when "Winders" crashes.

      \worked as tech support in the past.

    20. Re:Repeat? by Anonymous Coward · · Score: 0

      Your post is both redundant and uninformative.

    21. Re:Repeat? by pipatron · · Score: 1

      IANAB, but wouldn't breathing 100% oxygen be kinda bad?

      --
      c++; /* this makes c bigger but returns the old value */
    22. Re:Repeat? by Anonymous Coward · · Score: 0

      Your post is redundant, uninformative, and unimaginative.

    23. Re:Repeat? by shaitand · · Score: 1

      IANAB either, but I'm fairly sure it would be better than breathing 0% Oxygen.

    24. Re:Repeat? by Maxo-Texas · · Score: 1

      And when statistical extremes occur, people tend to disbelieve them or assume mystical forces are at work. When really, the event was just really improbable.

      There are a huge number of very improbable events, so the likelihood that one of them will occur somewhere is fairly high.

      Say that the odds of the air in a 10'x10' area piling up like that was one in a trillion per year. Then that means it happens an average of once a year on some planet somewhere in the universe.

      Likewise, if you have a million drives running, you're likely to see some freaky statistical runs that might never occur for the average home user with 10 drives. OTOH, your "average" behavior would be much more average than the home user's too.

      --
      She was like chocolate when she drank... semi-sweet at first and then increasingly bitter.
    25. Re:Repeat? by Sj0 · · Score: 1

      Your post is redundant, uninformative, unimaginative, and redundant.

      --
      It's been a long time.
    26. Re:Repeat? by iamstretchypanda · · Score: 1

      ... WOOOSHHHH

  2. it's relative. by User+956 · · Score: 4, Funny

    The data sheets for the drives indicated MTTF between 1 and 1.5 million hours.

    Yeah, but I bet they didn't say what planet those hours are on.

    --
    The theory of relativity doesn't work right in Arkansas.
    1. Re:it's relative. by bigtangringo · · Score: 2, Funny

      Or what percentage of the speed of light they were traveling.

      --
      Yes, I am a smart ass; it's better than the alternative.
    2. Re:it's relative. by astrashe · · Score: 1

      If an observer on a rail platform measures the MTTF of a hard disk on a rail car moving at speeds close to the speed of light...

    3. Re:it's relative. by goombah99 · · Score: 1
      How does it compare to flash MTBF? Or between manufacturers? If the ratio of actual to stated MTBF is the same for all hard disks, that's fine I guess, since I know how to divide by 15. But if it varies between manufacturers or between alternative technologies (DVD, hard drive, flash drive, metal film drive, tape), then this matters a great deal, as one will make the wrong choices or pay way too much for reliability not gained.

      Unless they warranty this, which none do, the spec is meaningless, and they might as well lie.

      --
      Some drink at the fountain of knowledge. Others just gargle.
  3. Masters of estimates by Anonymous Coward · · Score: 0

    First the thing about the drive sizes (1000 or 1024?), now this guesstimate...

    1. Re:Masters of estimates by dangitman · · Score: 1

      Well, the hard-drive makers are correct on the size thing - a Gigabyte is 1000 Megabytes, and the OS and software makers are wrong. I wish the software side would fix this problem. Does anybody know of any way to change preferences in MacOS or Windows so that filesizes are read out correctly? i.e, that Gigabytes are actually displayed as Gigabytes, or that the listing is changed to correctly display Gibibytes as the value? (or Kibibytes, Mebibytes, whatever)

      --
      ... and then they built the supercollider.
    2. Re:Masters of estimates by Anonymous Coward · · Score: 0

      You're confusing giGabytes with giBabytes.

    3. Re:Masters of estimates by Fulcrum+of+Evil · · Score: 3, Insightful

      Well, the hard-drive makers are correct on the size thing - a Gigabyte is 1000 Megabytes, and the OS and software makers are wrong.

      Yeah, they coined the term and have been using it for 40 years, but they're wrong.

      Gigabytes are actually displayed as Gigabytes, or that the listing is changed to correctly display Gibibytes as the value? (or Kibibytes, Mebibytes, whatever)

      Listen, just because someone comes up with a standard doesn't obligate everyone to use it, especially when they already have a perfectly workable system already. Claiming that NIST can impose an unwanted standard on the world is like saying that it isn't a word until the OED lists it.

      --
      "We returned the General to El Salvador, or maybe Guatemala, it's difficult to tell from 10,000 feet"
    4. Re:Masters of estimates by shaitand · · Score: 1

      Simply because NIST and Europe don't want to confuse the prefixes they are using for other things (like the metric system that has no relevance to computers; tough, Europeans) doesn't mean the world should adapt to an inconsistent system.

      I for one prefer that the computing world use a consistent system and simply ignore European confusion with metric prefixes. After all, they really aren't relevant here in the US, where almost every element of modern computing was invented.

      How about we get rid of all the marketing systems and go back to good old base 2 for ALL data measurements. Stop calling it 10/100 Ethernet when it tops out at 11.9 MB/s.
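
The 11.9 figure above appears to be 100 Mbit/s expressed in binary megabytes per second. A quick Python check of that arithmetic, assuming the link speed is a decimal 100,000,000 bits per second and ignoring protocol overhead (the function name is illustrative):

```python
def mbit_per_s_to_bytes_per_s(mbit: float) -> float:
    """Convert a link speed in decimal megabits per second to bytes per second."""
    return mbit * 1_000_000 / 8


rate = mbit_per_s_to_bytes_per_s(100)
print(f"{rate / 10**6:.2f} MB/s (decimal)")  # 12.50
print(f"{rate / 2**20:.2f} MiB/s (binary)")  # 11.92
```

The same raw rate reads as 12.50 or 11.92 depending only on which definition of "mega" the divisor uses.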

    5. Re:Masters of estimates by dangitman · · Score: 1

      You're confusing giGabytes with giBabytes.

      How so? A gigabyte is 1000 megabytes. A gibibyte is 1024 mebibytes. And there is no such thing as a "gibabyte."

      --
      ... and then they built the supercollider.
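
To make the two conventions being argued over concrete, here is a small Python sketch that formats the same byte count both ways. It is an illustration only, not how any particular OS reports sizes; the helper name and unit lists are assumptions made for this example:

```python
SI_UNITS = ["B", "kB", "MB", "GB", "TB"]        # powers of 1000
IEC_UNITS = ["B", "KiB", "MiB", "GiB", "TiB"]   # powers of 1024


def format_size(num_bytes: int, binary: bool = False) -> str:
    """Format a byte count with decimal (SI) or binary (IEC) prefixes."""
    base, units = (1024, IEC_UNITS) if binary else (1000, SI_UNITS)
    value = float(num_bytes)
    for unit in units:
        # Stop at the first unit that keeps the value below the base,
        # or at the largest unit available
        if value < base or unit == units[-1]:
            return f"{value:.2f} {unit}"
        value /= base


drive = 500 * 10**9  # a drive sold as "500 GB"
print(format_size(drive))               # 500.00 GB
print(format_size(drive, binary=True))  # 465.66 GiB
```

The gap between the two readings grows with each prefix: about 2.4% at kilo, 4.9% at mega and 7.4% at giga.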
    6. Re:Masters of estimates by dangitman · · Score: 1

      Yeah, they coined the term and have been using it for 40 years, but they're wrong.

      What? Software writers coined the Greek language? I'm sorry, those Greek prefixes have been around for a lot longer than 40 years.

      Listen, just because someone comes up with a standard doesn't obligate everyone to use it, especially when they already have a perfectly workable system already

      But it's obviously not perfectly workable, otherwise this confusion would never come up. And why would it be so hard for software to give me an option? It's my data - if I want it correctly labeled as Gibibytes, instead of incorrectly labeled - why don't I have that option? You can't change the size of a HD after it's made, but you can change how data is displayed in file lists. So, what's the problem?

      --
      ... and then they built the supercollider.
    7. Re:Masters of estimates by dangitman · · Score: 1

      I for one prefer that the computing world use a consistent system and simply ignore European confusion with metric prefixes.

      It's perfectly fine to use base 2 - just use a different label; don't incorrectly use language. "Mega" means 1,000, not 1024. It does not mean "around 1000." So, why did some idiot decide to use "Mega" to denote 1024?

      I'm sick of technical and industry types ruining language for everybody. They've done enough damage with jargon and three-letter-abbreviations, why do they have to fuck with scientific language too?

      --
      ... and then they built the supercollider.
    8. Re:Masters of estimates by dangitman · · Score: 1

      Duh. I mean "kilo" means 1000, of course.

      --
      ... and then they built the supercollider.
    9. Re:Masters of estimates by Fulcrum+of+Evil · · Score: 1

      What? Software writers coined the Greek language?

      No, we coined the terms kilobyte and megabyte.

      But it's obviously not perfectly workable, otherwise this confusion would never come up.

      There's no confusion, only some deception by HD manufacturers.

      And why would it be so hard for software to give me an option? It's my data - if I want it correctly labeled as Gibibytes, instead of incorrectly labeled - why don't I have that option?

      Because the correct label is Gigabyte, which is 2^30 bytes. If you care so much, write a patch.

      So, what's the problem?

      We don't want to change something that works just fine because some people who only tangentially get involved with what we do every day decided they want to redefine our jargon.

      --
      "We returned the General to El Salvador, or maybe Guatemala, it's difficult to tell from 10,000 feet"
    10. Re:Masters of estimates by shaitand · · Score: 1

      Truthfully I couldn't agree more. But the fact is that this particular screwing with the language was done decades ago and the damage is done. Attempting to change it now will only make things worse.

    11. Re:Masters of estimates by shaitand · · Score: 1

      'They've done enough damage with jargon and three-letter-abbreviations, why do they have to fuck with scientific language too?'

      Last I checked the metric system had nothing to do with science. It is simply another arbitrary measurement system that happens to be a little easier to work with when using base 10.

    12. Re:Masters of estimates by dangitman · · Score: 1
      Don't scientists use measurements?

      Language is arbitrary. But it works a lot better when it's consistent. Some computer geek made a mistake some years ago, and used the wrong label. Why not just admit this mistake, change it, and move on? Science and human endeavor should be about improving ourselves, not getting stuck in old habits. A long time ago, somebody proclaimed that the sun revolves around the earth. Should we just stick to this incorrectness, just because it happened before we knew that it is not true? Somehow, the rest of the world manages to evolve and move on, yet computer geeks want to cling to incorrectness out of habit - even though the history of computers is like the blink of an eye compared to other fields of knowledge. Humans have gotten over much larger misconceptions, so why is it so difficult to get over something so trivial as this?

      --
      ... and then they built the supercollider.
    13. Re:Masters of estimates by dangitman · · Score: 1

      No, we coined the terms kilobyte and megabyte.

      Which contain the Greek prefixes. So, they were coined in error.

      There's no confusion, only some deception by HD manufacturers.

      How can they be being deceptive when they are using accurate terms? And are network interface manufacturers being deceptive? Streaming video providers being deceptive?

      Because the correct label is Gigabyte, which is 2^30 bytes.

      No, it's not. You are using the label incorrectly, and not using standards.

      We don't want to change something that works just fine because some people who only tangentially get involved with what we do every day decided they want to redefine our jargon.

      But it doesn't work just fine. If it did, why does this argument even exist? How can it possibly work "just fine" when it is completely out of sync with every other field on the planet, and every other usage of these prefixes? And what do you mean "tangentially involved"? Who are "we"? I work with data every day, it's not tangential at all. What gives you authority over this?

      --
      ... and then they built the supercollider.
    14. Re:Masters of estimates by dangitman · · Score: 1

      But the fact is that this particular screwing with the language was done decades ago and the damage is done. Attempting to change it now will only make things worse.

      I'd have to disagree. Decades is nothing. Computer technology is still in its infancy. What better time to break this bad habit than now, just before computing gets really widespread?

      See my other reply for more of this argument, but we've held other misconceptions for centuries, even millennia - yet we are able to change those. Why should it be so hard to whack this silly bit of trivia on the head, when we are so early in the game?

      --
      ... and then they built the supercollider.
    15. Re:Masters of estimates by dangitman · · Score: 1
      P.S:

      We don't want to change something that works just fine because some people who only tangentially get involved with what we do every day decided they want to redefine our jargon.

      Why are you so afraid of change? It's only two letters in a label! Why are you so attached to it? That doesn't seem like a very geeky or rational approach - what ever happened to looking to the future and improving things?

      Also, you use "we" and "our" a lot. You say that "we" coined the term. Are you saying that you personally had something to do with coining these terms? I find that hard to believe. Who "owns" them? Again, we get back to the Greek problem. Surely, "they" own their language, and you shouldn't be abusing it. Why don't you come up with your own damn alphabet and language? Huh?

      --
      ... and then they built the supercollider.
    16. Re:Masters of estimates by shaitand · · Score: 1

      'Computer technology is still in its infancy. What better time to break this bad habit than now, just before computing gets really widespread?'

      I don't know how it is elsewhere in the world. But here in the US there is a computer in pretty much every home. They are about as common as screwdrivers and hammers. I don't know how much more widespread you could get than that.

      'Why should it be so hard to whack this silly bit of trivia on the head, when we are so early in the game?'

      Marketing. The moment you create a different label for the proper base 2 units that are actually required for computing you have just vindicated the marketing droids. The marketing droids will continue to use the terms just as they always have (incorrectly) because you have just made it correct for them to do so. The public will never be aware that any change was made at all and will continue to mistake a base 2 megabyte for a marketing megabyte.

      The only way to make this change happen appropriately would be to require labels on all computer merchandise that use base 2 units much like we do for food.

    17. Re:Masters of estimates by Fulcrum+of+Evil · · Score: 1

      Why are you so afraid of change?

      Why fetishize change? I like how things are, and there's no compelling reason to change.

      It's only two letters in a label! Why are you so attached to it?

      Largely because I'm used to kilo and kibi, mebi, gibi sound retarded. It's never come up in any job I've held, and the only people I've heard advocate it live on slashdot.

      Also, you use "we" and "our" a lot. You say that "we" coined the term. Are you saying that you personally had something to do with coining these terms?

      I identify with my profession. What are you on about?

      Who "owns" them? Again, we get back to the greek problem.

      The SI units are adapted from both Greek and Latin. In Greek, they didn't mean 10^n, they were various words for 'big' and fifth or sixth. They also don't count for information content. Mega is 10^6 or 2^20 depending on context.

      --
      "We returned the General to El Salvador, or maybe Guatemala, it's difficult to tell from 10,000 feet"
    18. Re:Masters of estimates by Fulcrum+of+Evil · · Score: 1

      Which contain the Greek prefixes. So, they were coined in error.

      No, they are used differently. Bytes are not part of SI, so we get to do what makes sense for what we do.

      How can they be being deceptive when they are using accurate terms?

      They aren't being accurate. About 10-12 years back, Western Digital started using 10^9 to compute disk capacity so they could list a bigger number. They got sued for false advertising, so they (and everybody else) now have to declare that they're using the 10^9 number when they say Gigabyte. Marketing weasels do not set standards.

      No, it's not. You are using the label incorrectly, and not using standards.

      Bytes are outside of SI, and ISO 31 doesn't apply to information content, so no, I'm not, because there's no standard. That gibi foolishness doesn't count because nobody cares about it.

      If it did, why does this argument even exist? How can it possibly work "just fine" when it is completely out of sync with every other field on the planet, and every other usage of these prefixes?

      We have different requirements, and a 5% difference for someone outside the field isn't a big deal. We won't change what we're doing because computers are base 2, not 10, so accommodations must be made.

      And what do you mean "tangentially involved"?

      A bunch of standards bodies have adopted this new notation in the face of a lack of controversy.

      What gives you authority over this?

      There is no authority - standards bodies have none unless it is given to them by the people who they purport to standardize. I am merely noting that most of us in the field use mega and giga, while the mebi, gibi faction is small and anemic. What gives you authority over this?

      --
      "We returned the General to El Salvador, or maybe Guatemala, it's difficult to tell from 10,000 feet"
    19. Re:Masters of estimates by dangitman · · Score: 1

      I don't know how it is elsewhere in the world. But here in the US there is a computer in pretty much every home. They are about as common as screwdrivers and hammers. I don't know how much more widespread you could get than that.

      But they are very primitive, and not used anywhere near their potential. Many of the computers go almost unused. How much more widespread could they get? Well, a computer in everyone's pocket for a start. And also us making more effective use of them.

      The marketing droids will continue to use the terms just as they always have (incorrectly) because you have just made it correct for them to do so. The public will never be aware that any change was made at all and will continue to mistake a base 2 megabyte for a marketing megabyte.

      I'm not seeing it. The current situation is much more vulnerable to marketing manipulation. If the terms were not so often interchanged, then an informed consumer could easily know what they are getting.

      In any case, I don't see any evidence of this conspiracy by HD manufacturers that people keep whining about. On every company website that I've seen, they actually define "Gigabyte" (correctly) in the small print. This is something that I don't see very often in software or OSes, which freely use the term incorrectly, but don't disclose their definition.

      The deception thus should lie on the software side - for example, is a software maker deceiving you, when they say that their application uses 256MB of RAM, when it is really consuming 256MiB of RAM? I'm not sure why everyone blames the HD manufacturers, but software gets off scot-free with their misrepresentations.

      The only way to make this change happen appropriately would be to require labels on all computer merchandise that use base 2 units much like we do for food.

      Couldn't we just encourage programmers and other people in the industry to use those terms correctly? For example, if my OS listed Mebibytes correctly, then people would become aware of it. Eventually, companies who did not use the standards would be questioned, and market/peer pressure would prevail.

      The problem is that, as you can see on slashdot, programmers and others are so addicted to their labeling out of habit, that they want to either ignore the problem, or shout down anyone who proposes a standard. Although this label might be a minor issue in the scale of things - the attitude behind this really reveals the primitiveness of the industry as I alluded to above. Such unprofessional attitudes are widespread, and it makes computer people seem very immature compared to other sciences and fields of engineering - where the practitioners are much more willing to debate more rationally, and change their ways.

      --
      ... and then they built the supercollider.
    20. Re:Masters of estimates by dangitman · · Score: 1

      Why fetishize change?

      I'm not. I'm arguing for consistency, and change when something isn't working well.

      I like how things are, and there's no compelling reason to change.

      Except there is a compelling reason to change. You just want to ignore it because of the sound of a word.

      Largely because I'm used to kilo and kibi, mebi, gibi sound retarded.

      That's just about the lamest argument ever. If you don't like the sound of it, then why didn't you think about that when "you" incorrectly used "Mega" and "Giga" to avoid this problem? But if you don't like the sound of "Mebi" - then just switch to "Mega" and use it correctly. But seriously. There are plenty of lame-sounding terms. Are these any lamer than "Kelvin" or "meter"? It's just because you aren't used to them.

      I identify with my profession. What are you on about?

      But not everybody in your profession agrees with this usage. What makes you think that all programmers agree with you? What am I on about? You claimed that "we" invented the term. So, did you, or did you not have anything to do with coining the term? If you didn't, you can't claim you were a part of the decision. Just being a programmer doesn't give you credit for that coinage, any more than you being a programmer means you were a part of inventing the Turing test.

      Mega is 10^6 or 2^20 depending on context.

      Which is why it is so retarded. It means 10^6 everywhere - except for this one "special" area. Where everything has to be different. Why? If you want computer science and engineering to be taken more seriously, and rise past the level of psychology, you should start acting more like the real scientists and engineers. Do you want to be stuck at the kid's table, or on the "special bus" where we have to make special exceptions for little Johnny IT (psst - he's a bit slow), or do you want to come to the big table?

      --
      ... and then they built the supercollider.
    21. Re:Masters of estimates by dangitman · · Score: 1

      No, they are used differently. Bytes are not part of SI, so we get to do what makes sense for what we do.

      So what? Mega and Giga are a part of SI, and a part of a consistent language beyond that. Just because "byte" was coined by programmers/computer science - doesn't mean it's appropriate to modify the prefixes.

      Megabyte - is literally "Million Bytes"

      The size of a byte, and its basis in binary math has nothing to do with it.

      Western Digital started using 10^9 to compute disk capacity so they could list a bigger number. They got sued for false advertising, so they (and everybody else) now have to declare that they're using the 10^9 number when they say Gigabyte. Marketing weasels do not set standards.

      That's just dumb. They got sued for using a term literally? It would have been more appropriate to sue software manufacturers for saying "this software requires 256MB of disk space" - when in actual fact it uses more. Remember, Mega=10^6.

      Doesn't this debacle show how stupid people were for abusing the prefixes in the terminology? It was never formally defined as a standard, it was just a casual (mis)use of language. Western Digital is hardly to blame for that ambiguity, and that lawsuit should have been dismissed as baseless.

      You claim there is no problem - but then you clearly demonstrate that there is a big problem - a company can get sued for using language more consistently. That's not good.

      We won't change what we're doing because computers are base 2, not 10, so accommodations must be made.

      Again, the arrogance and immaturity of the industry goes on show. You won't change because you're egotistical. In reality, the programmer works for the user. The user is more important than the programmer. (excluding of course, programs made by the programmer for him/herself)

      Just because you program in base 2 has no relevance to the end user, who is most likely NOT using a base two number system. Shit, why even display decimal values at all? If you love binary so much, why doesn't your file browser list filesizes in binary? Why even use "mega" which is base ten? Why not just display all values as a direct binary value?

      I am merely noting that most of us in the field use mega and giga, while the mebi, gibi faction is small and anemic.

      That's because of attitudes like yours. Having the majority of an industry being wrong isn't exactly a good thing. It threatens the credibility of the industry. And there's nothing to stop you from continuing to use "mega" - just use it correctly. Most people use a decimal number system. And the binary system only really comes into play when doing low-level functions right up against the silicon. The majority of our use is in the higher layers.

      What gives you authority over this?

      None. Just like you have none. But metric and ISO standards are pretty much where it's at. It would be good for the industry to get involved in that, rather than being a ghetto.

      --
      ... and then they built the supercollider.
    22. Re:Masters of estimates by Fulcrum+of+Evil · · Score: 1

      But if you don't like the sound of "Mebi" - then just switch to "Mega" and use it correctly.

      I am. Megabyte is 2^20 bytes. Bytes are not SI units.

      But not everybody in your profession agrees with this usage.

      I don't see a groundswell of support for your position.

      So, did you, or did you not have anything to do with coining the term?

      We as in my profession. Are you not quite right in the head?

      If you want computer science and engineering to be taken more seriously, and rise past the level of psychology, you should start acting more like the real scientists and engineers.

      That's rich - we aren't taken seriously because we don't use SI units. What are you typing on, again?

      Do you want to be stuck at the kid's table, or on the "special bus" where we have to make special exceptions for little Johnny IT (psst - he's a bit slow), or do you want to come to the big table?

      I am at the big table. When you grow up, you can be there too.

      --
      "We returned the General to El Salvador, or maybe Guatemala, it's difficult to tell from 10,000 feet"
    23. Re:Masters of estimates by Fulcrum+of+Evil · · Score: 1

      Megabyte - is literally "Million Bytes"

      No, literally, it is 'monster byte'. Mega is only a million in the context of SI. Bytes are not SI.

      They got sued for using a term literally?

      They got sued for using a term differently from the common usage in order to deceive the buying public.

      Doesn't this debacle show how stupid people were for abusing the prefixes in the terminology?

      No, not really.

      It was never formally defined as a standard, it was just a casual (mis)use of language

      Look up 'de facto' and get back to me.

      You won't change because you're egotistical.

      No, because it works just fine.

      The user is more important than the programmer.

      Not so. Users are more easily replaced. If you want to let the users define technical jargon, then please refer to your computer as the hard drive and prefix every program's name with 'microsoft', even when someone else makes it. Face it, most users have no concept of size - what does 200M mean? Is that too big to send over email?

      That's because of attitudes like yours.

      Yeah, we like our terms as they are, so the mebi contingent is minor. That's what you call a tautology.

      Having the majority of an industry being wrong isn't exactly a good thing.

      Use defines meaning. The majority of the industry says megabyte = 2^20 bytes, thus it is so.

      But metric and ISO standards are pretty much where it's at.

      Only to the extent that we allow it. Since the use of Mega is already well established, you have sod all chance of changing things.

      --
      "We returned the General to El Salvador, or maybe Guatemala, it's difficult to tell from 10,000 feet"
  4. In other news... by Mr.+Underbridge · · Score: 4, Informative

    ...Carnegie Mellon researchers can't tell a mean from a median. This is inherently a long-tailed distribution in which the mean will be much higher than the median. Imagine a simple situation in which failure rates are 50%/yr, but those that last beyond a year last a long time. Mean time to failure might be 1000 years. You simply can't compare the statistics the way they have without knowing a lot more about the distribution than I saw in the article. Perhaps I missed it while skimming.
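The mean-versus-median point can be made concrete with a toy population. This is only a sketch under made-up assumptions: the 100 drives and their lifetimes below are hypothetical numbers picked to mimic the "half die in the first year, the rest last a very long time" scenario, and they are not taken from the study.

```python
import statistics

# Hypothetical population of 100 drives (illustrative numbers, not study data):
# 51 die after half a year, the other 49 last 2000 years.
lifetimes_years = [0.5] * 51 + [2000.0] * 49

mean_ttf = statistics.mean(lifetimes_years)      # pulled up by the long tail
median_ttf = statistics.median(lifetimes_years)  # what a "typical" drive sees

print(f"mean time to failure:   {mean_ttf:.1f} years")
print(f"median time to failure: {median_ttf:.1f} years")
```

With these assumed numbers the mean lands near a thousand years while the median is half a year, which is the gap the comment is describing. Whether real drive lifetimes look anything like this is exactly what the replies dispute.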

    1. Re:In other news... by Anonymous Coward · · Score: 0

      Except this isn't the first of these studies. Google ran one based on their own drive usage, and another group did one over an even larger set of drives than Google used (!)

      Their findings: 1) Drives suck. 2) Expensive drives don't suck less. and finally 3)

      those that last beyond a year last a long time

      is false. There is no "bathtub" distribution of drive failures with a spike at the beginning and the end. The "burn-in" myth is just that.

      Both of these reports were on /. just a few weeks ago.

    2. Re:In other news... by Falkkin · · Score: 3, Informative

      In other news, Carnegie Mellon researchers know more about statistics than you give them credit for; blame ComputerWorld for crappy coverage of what the paper says. If you read the paper or the abstract, the researchers actually claim the opposite of what you are suggesting, namely, that the "infant mortality effect" (bathtub curve) often claimed for hard drives isn't actually the case. See Figure 4 in the paper and Section 5 ("Statistical properties of disk failures"). The paper is online here:

      http://www.usenix.org/events/fast07/tech/schroeder/schroeder_html/index.html

    3. Re:In other news... by Anonymous Coward · · Score: 0

      You've been busted, Mr. Underbridge!

    4. Re:In other news... by kidgenius · · Score: 1
      And in further other news....Carnegie Mellon researchers don't know as much about statistics as they think they do.

      Let me explain. They show a Weibull slope of 0.71. This is highly indicative of infant mortality in their system, and leaning towards random failure. So, to say that it isn't infant mortality goes against their actual statistical results. So, they've got some mix-up of data. One thing says "failure rates increased with time", but then their Weibull shows otherwise. They didn't do it right. Also, they confused MTTF and MTBF. They tried to call them the same when they're not. There is a big difference, and there's a reason manufacturers state MTTF instead of MTBF. I even ran a simulation based off the CMU data and I can show an MTTF of 1M hours for the drives, yet my MTBF is five times lower. They attempted reliability and failed. If they actually got some help from someone knowledgeable in the field, then they wouldn't have made these HUGE, glaring errors, and their results would be more believable.
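For readers unfamiliar with the Weibull argument: a shape ("slope") parameter below 1 means the instantaneous failure rate falls over time, exactly 1 means it is constant, and above 1 means it rises. The sketch below only illustrates that textbook property. The scale of 1.0 and the sample times are arbitrary assumptions, and it does not settle who is right about the paper's data.

```python
def weibull_hazard(t, shape, scale=1.0):
    """Instantaneous failure rate h(t) = (k/s) * (t/s)**(k - 1) for a Weibull distribution."""
    return (shape / scale) * (t / scale) ** (shape - 1)

# Arbitrary sample times in units of the (assumed) scale parameter.
times = [0.5, 1.0, 2.0, 5.0]

for shape in (0.71, 1.0, 2.0):
    rates = [weibull_hazard(t, shape) for t in times]
    formatted = ", ".join(f"{r:.3f}" for r in rates)
    print(f"shape={shape}: hazard at t={times} -> {formatted}")
```

The 0.71 row decreases from left to right, the 1.0 row is flat, and the 2.0 row increases.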

  5. Personally I am SHOCKED by dingbatdr · · Score: 2, Insightful

    Yes, I am SHOCKED that companies have implemented a systematic program of distorting the truth in order to increase profits.

    I propose a new term for the heinous practice---"marketing".

    --
    The truth is an offense, but not a sin.------R. N. Marley
    1. Re:Personally I am SHOCKED by Beardo+the+Bearded · · Score: 4, Informative

      What, really?

      The same companies that lie about the capacity on EVERY SINGLE DRIVE they make? You don't think that they're a bunch of lying fucking weasels? (We're both using sarcasm here.)

      I don't care how you spin it. 1024 is the multiple. NOT 1000!

      Failure doesn't get fixed because making a drive more reliable means it costs more. If it costs more, it's not going to get purchased.

      --

      ---
      ECHELON is a government program to find words like bomb, jihad, plutonium, assassinate, and anarchy.
    2. Re:Personally I am SHOCKED by Lord+Ender · · Score: 3, Informative

      Before computers were used in real engineering, we could get away with "k" sometimes meaning 1024 (like in memory addresses) and sometimes meaning 1000 (like in network speeds). Those days are past. Now that computers are part of real engineering work, even the slightest amount of ambiguity is not acceptable.

      Differentiating between "k" (=1000) and "ki" (=1024) is a sign that the computer industry is finally maturing. It's called progress.

      --
      A slashdotter who didn't build his own computer is like a Jedi who didn't build his own lightsaber.
    3. Re:Personally I am SHOCKED by JackMeyhoff · · Score: 1

      You mean like the AMD CPU ratings for example? Shocking, isn't it?

      --
      http://www.rense.com/general79/wdx1.htm
    4. Re:Personally I am SHOCKED by DogDude · · Score: 1

      Failure doesn't get fixed because making a drive more reliable means it costs more. If it costs more, it's not going to get purchased.

      I couldn't disagree more. I know that I would pay more for even somewhat more reliable drives. The problem is that I can't find any sold that guarantee any kind of reliability other than the rock-bottom standard one year.

      --
      I don't respond to AC's.
    5. Re:Personally I am SHOCKED by Hamilton+Lovecraft · · Score: 0

      That's pretty hilarious. Computers used to be part of real engineering work. Now they're toys.

      --
      step 3: god dammit, it doesn't work
    6. Re:Personally I am SHOCKED by Intron · · Score: 1

      And those lying road signs, too. Everyone knows there should be 1024 meters in a kilometer!

      --
      Intron: the portion of DNA which expresses nothing useful.
    7. Re:Personally I am SHOCKED by Anonymous Coward · · Score: 0

      And who are you, may I ask? Are you some idiot engineering school freshman?

      Because whenever someone starts bragging about "real" engineering or "real" programming or whatever the hell they think is in, you can pretty much guarantee what follows is pure crap.

    8. Re:Personally I am SHOCKED by Thirdsin · · Score: 1

      Strangle me with my own wireless keyboard, but a large part of my job is replacing these failing drives. So, as much as I'd like to scream and yell for increased reliability, I can't. There is a lot to say for my (current) job security ;-)

      --
      No words of wisedom here.
    9. Re:Personally I am SHOCKED by Moofie · · Score: 1

      If you rely on others' incompetence for job security, then you are an incompetent.

      --
      Why yes, I AM a rocket scientist!
    10. Re:Personally I am SHOCKED by Lord+Ender · · Score: 1

      Yes, true wisdom only comes from Anonymous Cowards, like yourself. I'm convinced.

      --
      A slashdotter who didn't build his own computer is like a Jedi who didn't build his own lightsaber.
    11. Re:Personally I am SHOCKED by Fulcrum+of+Evil · · Score: 1

      Now that computers are part of real engineering work, even the slightest amount of ambiguity is not acceptable.

      Since when are lying marketroids doing real engineering?

      --
      "We returned the General to El Salvador, or maybe Guatemala, it's difficult to tell from 10,000 feet"
    12. Re:Personally I am SHOCKED by Thirdsin · · Score: 1

      Or I am just starting out in the field. Some might call this gaining "experience". Think outside the box moron. ::smooches::

      --
      No words of wisedom here.
    13. Re:Personally I am SHOCKED by CorSci81 · · Score: 2

      I'd just like to point out that computers were used for "real" engineering long before they became ubiquitous in the workplace or home. Why do you think FORTRAN is one of the oldest computing languages in existence?

    14. Re:Personally I am SHOCKED by Chonine · · Score: 3, Informative
      Standard metric is indeed powers of 10, and a megabyte is indeed 10^6 bytes.

      To clear up the confusion, the notation for binary, as in 2^20 bytes was developed. That would be a Mebibyte.

      http://en.wikipedia.org/wiki/Mebibyte

    15. Re:Personally I am SHOCKED by binarybum · · Score: 4, Funny

      yeah, I used to think they were dirty bastards, but they just work on a different scale than the rest of us.
          The trick is to purchase your HD in pennies.

        "100,000 pennies! why that's 1024 dollars!!"

      --
      ôó
    16. Re:Personally I am SHOCKED by Ed+Avis · · Score: 1

      Are network card makers also evil because '100 megabits per second' really does mean 100 million bits and not 1.048576 million?

      --
      -- Ed Avis ed@membled.com
    17. Re:Personally I am SHOCKED by TheThiefMaster · · Score: 1

      It has always been that when dealing with bytes k was 1024 and when dealing with anything else it was 1000. Network speeds are in bits per second, which is not bytes, therefore it uses k=1000. Hard drives are in bytes, so k=1024.

      And I refer you to a previous post I made where I mentioned having two "Maxtor 160GB" hard-disks, one which used the definition GB=1,000,000 kB (with k=1024) and the other which used the definition GB=1,000,000,000 Bytes. That's just plain being confusing on purpose.
      http://hardware.slashdot.org/comments.pl?sid=222978&cid=18059026
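To see how far apart the "160GB" definitions mentioned above really are, here is a small sketch that works out the byte count under each one. The dictionary labels are just descriptive names chosen for this example, and the 2^30 row is added for comparison; only the first two definitions come from the comment.

```python
KB_BINARY = 1024  # bytes in a "k=1024" kilobyte

# Bytes per "gigabyte" under each definition.
definitions = {
    "GB = 10^9 bytes": 10**9,
    "GB = 1,000,000 kB, kB = 1024 bytes": 1_000_000 * KB_BINARY,
    "GiB = 2^30 bytes": 2**30,
}

# Byte capacity of a drive labelled "160" under each definition.
capacity_bytes = {label: 160 * size for label, size in definitions.items()}

for label, total in capacity_bytes.items():
    print(f"{label:<36} -> {total:>15,} bytes")
```

The mixed definition sits between the pure decimal and pure binary ones, so two drives with the same label can differ by a few gigabytes.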

    18. Re:Personally I am SHOCKED by Timothy+Brownawell · · Score: 2, Insightful

      Before computers were used in real engineering,

      Computers have *always* been used for "real engineering" as you call it. It's only recently that they've gotten cheap enough to use as toys.

      we could get away with "k" sometimes meaning 1024 (like in memory addresses) and sometimes meaning 1000 (like in network speeds). Those days are past.

      WTF? It's like any other part of language, things have different meanings in different contexts. What does "cat" mean?

      Now that computers are part of real engineering work, even the slightest amount of ambiguity is not acceptable.

      Ok, so do we rename cat-the-program or cat-the-heavy-machinery (and what about cat-the-animal)? Computers and heavy machinery are both used for "real engineering work", so we can't have any ambiguity in which we're talking about. That would be not acceptable.

      Differentiating between "k" (=1000) and "ki" (=1024) is a sign that the computer industry is finally maturing. It's called progress.

      No, it's a sign that too many people have sticks up their butts and can't accept that language can be context-dependent. The world is not binary, and failing to recognize this is likely one reason that software sucks so much.

      Also, it's a sign that disks (as opposed to RAM) are sized by cost, rather than efficient use of address lines. RAM is sold in power-of-2 sizes for technical reasons. Disks are different enough that those technical reasons aren't there, so marketing dictates that the prefixes used be chosen to give the largest numbers.

    19. Re:Personally I am SHOCKED by MightyYar · · Score: 1

      I'm kind of a youngin', only been out of engineering school for 9 years... but in my experience, seasoned engineers don't pay too much attention to specs listed on components. If a spec is critical, you need to test against it. While I certainly saw this in-house, the Japanese seem almost religious about it - I would get calls and emails all the time from our Japanese field service guys because the customer had tested against our spec and (surprise!) it didn't come out exactly as our spec stated.

      Thus, a hard drive marked with k instead of ki: (a) would only surprise someone who has never purchased a drive before and (b) wouldn't matter to a "real" engineer because that engineer would test the drive to make sure it was suitable for the application.

      --
      W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
    20. Re:Personally I am SHOCKED by Anonymous Coward · · Score: 0

      Yes, when it was rammed down our throats by marketing teams lobbying for a standard change because they wanted to improve the computer industry... It had nothing to do with making the drive look bigger.

      Fucktwit.

    21. Re:Personally I am SHOCKED by subl33t · · Score: 1

      Well done.

    22. Re:Personally I am SHOCKED by Ryan+Mallon · · Score: 4, Funny

      Why do you think FORTRAN is one of the oldest computing languages in existence?

      Because it was invented before most other computer languages? Is this a trick question ;-)

    23. Re:Personally I am SHOCKED by HappyEngineer · · Score: 1

      I don't understand why anyone would use the base 2 definition. A kilometer is 1000 meters. A giga-year is a billion years. A mega-liter is a million liters. A teraflop is a trillion flops.

      A teraflop is not 2^40 flops. It is 10^12 flops.

      If I have a gigabyte then I assume I have a billion bytes. If you want to use prefixes that denote base 2 then you should come up with prefixes that are not the same as the metric prefixes.

      The hard drive manufacturers are of course just doing it for marketing purposes, but that doesn't make it wrong.

    24. Re:Personally I am SHOCKED by ShakaUVM · · Score: 1

      There's no particular reason why a Megabyte should be 1024 bytes, instead of 1000 bytes, which is the SI standard.

    25. Re:Personally I am SHOCKED by CastrTroy · · Score: 1

      I think that manufacturers brought the warranty down to 1 year, but consumers started to get really mad, so they started to raise the warranties again. Here's some drives with 5 year warranties.

      --

      Anthropic principle: We see the universe the way it is because if it were different we would not be here to see it.
    26. Re:Personally I am SHOCKED by Anonymous Coward · · Score: 0

      What's up with the hate of ACs? Sometimes they have important things to say, too - and reasons for being anonymous.

      Dumbass!

    27. Re:Personally I am SHOCKED by Anonymous Coward · · Score: 0

      You've wasted your time. DogShit doesn't want real information; he obviously didn't even look for it. He's just a sad little troll.

    28. Re:Personally I am SHOCKED by evilviper · · Score: 1

      Before computers were used in real engineering, we could get away with "k" sometimes meaning 1024 (like in memory addresses) and sometimes meaning 1000

      BEFORE? So the ENIAC was built to play solitaire?

      Engineering was one of the very FIRST uses of computers, it's the frivolous crap that is the recent addition.

      So, yeah, your whole post is utter nonsense.
      --
      Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
    29. Re:Personally I am SHOCKED by shaitand · · Score: 1

      Yes but computers don't use the metric system, they use a base 2 system. There is no context in which it is appropriate to apply metric reasoning to computers. Especially when USians still constitute a majority share in the modern computing world and USians don't use the metric system or recognize its prefixes.

      The biggest reason is that memory uses base 2. Since everything must be addressed through memory it simply doesn't make sense to measure it in a unit that can't be evenly addressed.

      People don't even bit about network devices but they should be measured in base 2 units as well. The new prefixes are ridiculous because they continue to allow the alternate notations by giving them difficult to remember prefixes. A network card should not be 100mb or mib or MB or whatever damn prefix we are using for that marketing notation today simply because 11.9mb sounds less impressive.

      It is simply stupid that I have to figure out the notation and expand it to 100000000bits, divide by 8 to get bytes, divide by 1024 to get k and then again to get mb simply to figure out how fast this device can actually transfer a given set of data (which will always be measured in base 2 units).

      Even non-techs understand the base 2 units because they are used to purchasing memory that way.
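The conversion chain described above (expand to bits, divide by 8, divide by 1024 twice) is easy to check. A minimal sketch, assuming "100 megabit" means 100,000,000 bits per second and ignoring protocol overhead; the function name is made up for this example.

```python
def link_rate_mib_per_s(bits_per_second):
    """Convert a raw link rate in bits/s to binary megabytes (2^20 bytes) per second."""
    bytes_per_second = bits_per_second / 8
    return bytes_per_second / 1024 / 1024

rate = link_rate_mib_per_s(100_000_000)  # nominal "100 megabit" link
print(f"{rate:.2f} binary megabytes per second")
```

This reproduces the roughly 11.9 figure quoted in the comment.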

    30. Re:Personally I am SHOCKED by egomaniac · · Score: 1, Interesting

      There is no context in which it is appropriate to apply metric reasoning to computers.

      It's exactly this kind of bullshit that irritates me. Suppose you look at a file. It's 95,015,327 bytes long. You're claiming that referring to the file as being 95MB is "inappropriate"?

      I'm a software engineer, fully versed in binary math, and the fact that computers refer to that file as being 90MB still really pisses me off. It's pointless and annoying.

      --
      ZFS: because love is never having to say fsck
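The 95-versus-90 discrepancy for that example file comes purely from the divisor. A quick sketch that reports the same byte count both ways; the function name and output wording are invented for illustration and do not mirror any particular file browser.

```python
def describe_size(n_bytes):
    """Report a byte count with both the decimal (10^6) and binary (2^20) divisor."""
    decimal_mb = n_bytes / 10**6
    binary_mib = n_bytes / 2**20
    return f"{n_bytes:,} bytes = {decimal_mb:.1f} MB (10^6) = {binary_mib:.1f} MiB (2^20)"

print(describe_size(95_015_327))
```

Nothing about the stored data changes between the two figures; only the unit the display code divides by.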
    31. Re:Personally I am SHOCKED by Beardo+the+Bearded · · Score: 1

      Really? An actual, Iron Ring wearing, B.Eng holding, Association of Professional Engineers affiliated Software Engineer?

      I'm just asking because they have those now, and they're rare.

      /EE here
      //Wait, this isn't Fark.

      --

      ---
      ECHELON is a government program to find words like bomb, jihad, plutonium, assassinate, and anarchy.
    32. Re:Personally I am SHOCKED by shaitand · · Score: 1

      'Suppose you look at a file. It's 95,015,327 bytes long. You're claiming that referring to the file as being 95MB is "inappropriate"?'

      As an informal rough estimate, not at all. But yes, in any formal or written sense I absolutely think that is inappropriate.

      'It's pointless and annoying.'

      As a software engineer you don't know that; as an electronic engineer you would know that the machine cannot be efficiently made to work with a different system. Understanding that fact means that using anything but the base 2 numbering will always require numerous extra steps whenever a REAL calculation is performed. You will always ultimately end up having to translate to the base 2 system that the hardware and software is using.

    33. Re:Personally I am SHOCKED by MBGMorden · · Score: 1

      Not all companies even came down to a year. Can't remember if it was Samsung or Seagate, but I know one of those two was sticking with 3 year warranties even when most were going to 1 year. That being said, I use both and they've been rock solid. My most trouble has come from Western Digital (which I've heard have gotten better but a few years back we considered them "toy" hard drives) and IBM.

      Of course, I've got an 80mb drive made by Conner out of an old 486 circa '92 or so, and it STILL works :D. Got an old 5gb Micropolis that's still kicking too.

      --
      "People who think they know everything are very annoying to those of us who do."-Mark Twain
    34. Re:Personally I am SHOCKED by pipingguy · · Score: 1

      FWIW, I define "real engineering" as technical knowledge that can be transmitted to skilled tradesmen so that tangible, useful things can be constructed (software "engineers" need not apply). Always remember that if computers and software disappeared tomorrow there'd still be people remaining that know how to design and build bridges, highways, pipelines and large buildings. Can the same be honestly said if the reverse happened?

    35. Re:Personally I am SHOCKED by kripkenstein · · Score: 1

      I don't care how you spin it. 1024 is the multiple. NOT 1000!

      The real issue here is that the mathematical community didn't have the foresight to see what having 2^10 = 1024 != 1000 would cause. But let's not play the blame game; solutions are what we need. For example, we can change math so that 2^10 = 1000, by setting 10 = 9.965. That would solve everything. Or we could change 2 = 1.995.

      Both of these options are simple compromise solutions that the mathematical community can surely accept. If not, congress can pass it as legislation. Meanwhile, I'm going to fix the relevant Wikipedia pages right now.

    36. Re:Personally I am SHOCKED by __aajfby9338 · · Score: 1

      There's a very good reason why a megabyte should not be 1000 bytes, or 1024 bytes, either:

      A megabyte is 1048576 bytes. Or maybe 1000000 bytes, depending on who you ask.

    37. Re:Personally I am SHOCKED by l3v1 · · Score: 1

      I don't care how you spin it. 1024 is the multiple. NOT 1000!

      It's all because of technology getting widespread. Don't get me wrong I _don't_ mean people getting more technological, just more of them using it. Engineers, coders, etc. just know what a byte is, what the whole binary system is/has been used for, why 1024 as a multiplier was nice, things about computer history, we just don't "get" the whole "ambiguous" story what people were referring to when they introduced the whole kibibyte joke. We just know what we are talking about, what we are referring to and there's no ambiguity. But, among the general public, kilo just means kilo, they don't know or just simply don't care about the whole 1024 issue, and you know, the crowd always wins. We just have to live with it. But, thing is, we don't have to blindly accept it. I always use the 1024 base when appropriate and when somebody looks questioningly, I tell them what's all about, if they are willing to listen.
       
      Otherwise I just don't care. If they want kibi or kilo, give them kibi or kilo.
       

      --
      I am putting myself to the fullest possible use, which is all I can think that any conscious entity can ever hope to do.
    38. Re:Personally I am SHOCKED by PerlDudeXL · · Score: 1

      I thought the extra "i" is used to indicate SI units (10,100,1000,...) and not the 1024 style!?

    39. Re:Personally I am SHOCKED by s52d · · Score: 1

      It is so nice, when 2 Mbit/second is actually 2*1024*1000 bit/second.

    40. Re:Personally I am SHOCKED by TheThiefMaster · · Score: 1

      Actually there's no reason why a computer couldn't just display the numbers using the base 10 prefix, and just use base 2 internally. But I don't think it's ever going to get changed.

    41. Re:Personally I am SHOCKED by Anonymous Coward · · Score: 0

      If you post on Slashdot just to insult other people, you need to get out of your parents' basement.

    42. Re:Personally I am SHOCKED by Sj0 · · Score: 1

      The units are for humans. The computer knows the entire number.

      --
      It's been a long time.
    43. Re:Personally I am SHOCKED by Sj0 · · Score: 1

      Joke's on you! I worked hard, paid my dues, finally graduated college and got a good job, and now I pay good money to live in SOMEONE ELSE'S mom's basement!

      --
      It's been a long time.
  6. I believe it... by madhatter256 · · Score: 2, Informative

    Yeh. Don't rely on the HDD after it surpasses its manufacturer warranty.

    --
    Previewing comments are for sissies!
    1. Re:I believe it... by SighKoPath · · Score: 2, Insightful

      Also, don't rely on the HDD before it surpasses its manufacturer warranty. All the warranty means is you get a replacement if it breaks - it doesn't provide any extra guarantees of the disk not failing.

    2. Re:I believe it... by Anonymous Coward · · Score: 0

      I sort of rely on the fact that drives won't. I haven't been able to work the system as well as a friend of mine, but as long as they fail within the warranty window, that's basically a free upgrade.

    3. Re:I believe it... by drinkypoo · · Score: 1

      Don't rely on a HDD ever. This is why we have backups and RAID. Even RAID's not enough by itself.

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    4. Re:I believe it... by The+Clockwork+Troll · · Score: 2, Insightful

      In my experiences with several major drive vendors, I have never gotten an "upgrade". What you get is a replacement drive, but generally it's the same drive (perhaps refurbished or firmware-revised) and the original warranty period is still in effect (with perhaps a 30 day extension to account for your downtime). I've RMA'd a lot of drives and never have I gotten one of different spec/size. I'm not even sure this would be desirable, e.g. in the case of replacing a drive in a RAID array with something of different specification (yes, even "better" specification). Symmetry and everything.

      --

      There are no karma whores, only moderation johns
    5. Re:I believe it... by Short+Circuit · · Score: 1

      Manufacturers don't want to spend more than a certain percentage of their sales on warranty replacements, so they limit their warranty periods to a value that would yield a comfortably low number of RMAs.

      By comparing manufacturer warranty rates, one can get a rough idea of how confident different manufacturers are about the lifetime of their products.

      However, the only justification I can think of for not relying on a drive beyond the warranty would be that one doesn't get a free drive as replacement if it fails. But buying a new drive every three-to-five years, just because one can't get a free drive, seems silly to me.

    6. Re:I believe it... by PoconoPCDoctor · · Score: 1

      Just a tad bit into the off-topic realm, but a few years back, a company I worked for used Compaq (pre-HP takeover) as the desktop standard. Some of these systems had 10 gigabyte hard drives. When they failed under warranty, we'd get identical 10 gig drives, even though the standard was now 20 or 30 gig drives.

      When I checked the manufacturing date, the drive was three frickin' years old! I didn't care that the drive was small, I cared that the drive had been sitting on a shelf or worse, been kicked around in warehouses for three years before it got to me.

      --
      "Let us raise a standard to which the wise and honest can repair" - George Washington
    7. Re:I believe it... by SirKron · · Score: 1

      I would rather the manufacturers post a MTBF rate limited in scope to a 3 or 5 year max as that is when most servers are recycled in the corporate world. By running the good drives for 10+ years the MTBF is drastically skewed to a much smaller MTBF rate. Anyways, we will only be arguing this for another five years. By then we will all be running solid state flash hard drives with a drastically better MTBF rate. So much so that we will be seeing way less RAID 10 and much more RAID 5 or 6.

    8. Re:I believe it... by The+Clockwork+Troll · · Score: 1
      I wish we could get the data on how many "reserve"/warranty drives each major manufacturer makes and stores for future replacement purposes.

      Forget MTTF - that would be a pretty realistic reflection of how often they really expect their drives to fail.

      --

      There are no karma whores, only moderation johns
  7. Statistics by Anonymous Coward · · Score: 0

    >The data sheets for the drives indicated MTTF between 1 and 1.5 million hours

    In statistics the average alone doesn't say anything, you need to give the variance

    http://en.wikipedia.org/wiki/Variance

    You can give an average value of expected life, but you also need to know how spread out your distribution is to understand if your product lasts longer than the competition.

    1. Re:Statistics by Anonymous Coward · · Score: 0

      Sorry but MTTF is actually the only statistic that directly translates into expected cost of replacement in a large HDD farm. Unless the downtime incurred by an unexpected HDD failure is so high that you're considering preemptively replacing some HDDs before they fail, variance and higher moments only have entertainment value.

  8. Fuzzy math by Spazmania · · Score: 1, Insightful

    Disk Drive Failures 15 Times What Vendors Say [...] That should mean annual failure rates of 0.88% [but] annual replacement rates were between 2% and 4%.

    0.88 * 15 = 4?
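    Spelling the arithmetic out, using only the figures quoted in the summary: the observed 2% to 4% works out to roughly 2.3 to 4.5 times the 0.88% datasheet figure, while a true 15x would imply about 13.2%. The function and variable names are made up for this sketch.

```python
DATASHEET_AFR_PCT = 0.88        # annual failure rate implied by the datasheets
OBSERVED_ARR_PCT = (2.0, 4.0)   # annual replacement rates quoted in the summary
HEADLINE_MULTIPLE = 15          # the "15 times" from the headline


def multiple_of_datasheet(observed_pct, datasheet_pct=DATASHEET_AFR_PCT):
    """How many times the datasheet rate an observed rate is."""
    return observed_pct / datasheet_pct


low, high = (multiple_of_datasheet(p) for p in OBSERVED_ARR_PCT)
implied_by_headline = HEADLINE_MULTIPLE * DATASHEET_AFR_PCT

print(f"Observed rates are {low:.1f}x to {high:.1f}x the datasheet rate")
print(f"A {HEADLINE_MULTIPLE}x multiple would mean about {implied_by_headline:.1f}% per year")
```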

    --
    Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.
    1. Re:Fuzzy math by mistahkurtz · · Score: 1

      uh, yeah.... and 2+2=5

      --
      not only is time travel possible, it's irrelevant.
    2. Re:Fuzzy math by flyingfsck · · Score: 1

      only for large values of 2

      --
      Excuse me, but please get off my Pennisetum Clandestinum, eh!
  9. This study is useless. by Lendrick · · Score: 2, Interesting

    In the article, they mention that the study didn't track actual failures, just how often customers *thought* there was a failure and replaced their drive. There are all sorts of reasons someone might think a drive has failed. They're not all correct. I can't begin to guess what percentage of those perceived failures were for real.

    This study is not news. All it says is that people *think* their hard drives fail more often than the mean time to failure.

    1. Re:This study is useless. by mandelbr0t · · Score: 1

      And I think they fail less often than the MTTF. There, the statistics are satisfied as well, and it's still not news.

      --
      "Please describe the scientific nature of the 'whammy'" - Agent Scully
    2. Re:This study is useless. by crabpeople · · Score: 3, Interesting

      That's fair, but if you pull a bad drive, ghost it (assuming it's not THAT bad), plop the new drive in, and the system works flawlessly, what are you to assume?

      I don't really care to know exactly what is wrong with the drive. If I replace it, and the problem goes away, I would consider that a bad drive. Even if you could still read and write to it. I just did one this morning that showed no symptoms other than Windows taking what I considered a long time to boot. All the user complained about was sluggish performance, and there were no errors or drive noises to speak of. Problem fixed, user happy, drive bad.

      As I already posted, a good rule of thumb is that 3 years from the date of manufacture is when most drives go bad.

      --
      I'll just use my special getting high powers one more time...
    3. Re:This study is useless. by Lendrick · · Score: 1

      You obviously know what you're doing. Not all users do... in fact, the bitter techie in me is screaming that most don't. :)

    4. Re:This study is useless. by zcat_NZ · · Score: 1

      You get a virus or spyware that's causing the system to crash frequently. Pull the drive out, ghost the new drive with a clean copy of the OS and the machine stops crashing. This surprises you?

      I get given quite a few drives that are suspected faulty. I run diagnostics and zero-fill them, and about one in three is just fine. Haven't had to buy a drive for quite a while.

      --
      455fe10422ca29c4933f95052b792ab2
    5. Re:This study is useless. by timeOday · · Score: 1

      Except they didn't study what "people" thought, they studied "a number of large production systems, including high-performance computing sites and internet services sites" - in other words, the best case scenario in terms of user expertise.

    6. Re:This study is useless. by AusIV · · Score: 1
      I've been guilty of that. I had a drive I thought was getting ready to die. It was rattling fairly consistently, so I put it in a RAID with a couple other drives, not really knowing what would happen if the drive died. One day, shortly after upgrading from Ubuntu Dapper to Edgy, my RAID crapped out on me, giving me a screen saying that my logical volume couldn't be assembled from the RAID components, because the RAID devices weren't found. Having never dealt with a drive failure in a RAID, I assumed this meant the drive had died, so I went to the store, bought a new drive, and started to install it. I then found that the rattle was because I had used a screw that was one size too small, and the drive wasn't being properly held in place. I replaced it anyway, still thinking it was bad. I booted up a live CD and repaired the RAID, then booted back to my regular boot, only to find the original problem. Turns out the software RAID in Edgy just sucked, so I went back to Dapper, and bought an external case for the old drive. It turned out to be fine.

      Long story short, I replaced a drive when I didn't need to. I happened to find out that the drive was fine, but lots of people probably never would have. I blame my own ignorance, both for causing the rattle, and not knowing what a raid failure looked like. It certainly isn't fair to blame Seagate for the drive failure, or misrepresenting the failure time - I assume they make sure their drives have legitimately failed before counting them in their drive failure statistics. You can't blame them for user ignorance.

    7. Re:This study is useless. by rrohbeck · · Score: 1

      >That's fair, but if you pull a bad drive, ghost it (assuming it's not THAT bad), plop the new drive in, and the system works flawlessly, what are you to assume?

      Been there, done that. If you have NTFS with encrypted files, you won't be able to read them even though your credentials are the same. Somehow the disk serial number goes into the key.

    8. Re:This study is useless. by Anonymous Coward · · Score: 0

      Problem fixed, user happy, drive bad.

      Your defragmentation technique sounds awfully expensive.

    9. Re:This study is useless. by Anonymous Coward · · Score: 0

      I dunno.... I have a couple Conner and Seagate drives that are around 10 years old. They're packed up in storage but still work as far as I know.

      My longest active service drive was a 5.7GB Maxtor which was retired from service after 8 flawless years. It migrated from one system to another to another to another to another to another with nearly constant daily use. It never had any problems and still passes a Spinrite check with flying colors.

      I actually removed it from service about last year ONLY because I realized just how freaking old it was and the system it was in was being handed down to a relative. Figured some moving part was bound to fail sooner or later.

      It still sits here on my desk, literally as a paperweight. Can't bear to toss out something that had been so loyal to me.

      Next to it is one of my MP3 players. It holds about the same amount of data but it fits in my coin pocket. Times sure have changed.

    10. Re:This study is useless. by Anonymous Coward · · Score: 0

      I guarantee the ones from LANL were bad. If not before they were pulled from the arrays, then shortly after as they were physically destroyed.

      Seriously, though, the Los Alamos drives all came from RAID arrays. It wasn't some moron's opinion that the drive had failed, it was a high-end RAID controller. Not perfect, but better than "i [sic] think they fail less often".

    11. Re:This study is useless. by Lord+Crc · · Score: 1

      There are all sorts of reasons someone might think a drive has failed. They're not all correct.

      I can attest to this. One of my hd's started to "crash" (spun down, parked head, spun up again) with shorter and shorter intervals. I replaced it, thinking it was dying. After a while, another hd started doing the same, and a third started to drop out and come back online (as if I had disconnected it and reconnected it). Turns out, when I switched from the chipset sata controller to the onboard raid controller, everything was nice and dandy, even the first drive works like a charm now.

  10. Over my career, I've replaced a TON of SCSI drives by mmell · · Score: 1

    I've still got quite a few of them, sizes ranging from 20MB - 2GB. Still operational (I presume). I wonder if those'd count towards the average?

  11. MTTF to annual failure rates by Anonymous Coward · · Score: 0

    The data sheets for the drives indicated MTTF between 1 and 1.5 million hours. That should mean annual failure rates of 0.88%
    Shouldn't that mean annual failure rates between 0.58% and 880000% ?
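    For anyone checking the numbers, here is a small sketch of the usual conversion, assuming 24/7 operation (8760 hours a year) and the simple approximation AFR = hours per year / MTTF. That approximation is only reasonable when the MTTF is much longer than a year, which is why the "1 hour" reading gives a nonsense percentage. The function name is made up for illustration.

```python
HOURS_PER_YEAR = 24 * 365  # 8760, assuming the drive is powered on all year


def annual_failure_rate_pct(mttf_hours):
    """Approximate annual failure rate in percent: hours per year divided by MTTF."""
    return 100.0 * HOURS_PER_YEAR / mttf_hours


afr_1m = annual_failure_rate_pct(1_000_000)    # datasheet MTTF of 1 million hours
afr_15m = annual_failure_rate_pct(1_500_000)   # datasheet MTTF of 1.5 million hours
afr_1h = annual_failure_rate_pct(1)            # the literal "between 1 and ..." reading

print(f"MTTF 1,000,000 h -> {afr_1m:.2f}% per year")
print(f"MTTF 1,500,000 h -> {afr_15m:.2f}% per year")
print(f"MTTF 1 h         -> {afr_1h:.0f}% per year")
```

    So the summary's 0.88% matches the 1 million hour figure, 0.58% matches 1.5 million hours, and the joke number is 876000% rounded up.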
  12. Interface matters why? by neiko · · Score: 3, Interesting

    TFA seems surprised by SATA drives lasting as long as Fibre...why on earth would your data interface have any consequences on the drive internals? Or are we assuming Interface = Data Throughput?

    1. Re:Interface matters why? by Anonymous Coward · · Score: 0

      I think the general assumption is that more expensive "enterprise" level drives are significantly more reliable than much cheaper consumer level equipment. Recent studies show this not to be true.

    2. Re:Interface matters why? by ender- · · Score: 3, Insightful

      TFA seems surprised by SATA drives lasting as long as Fibre...why on earth would your data interface have any consequences on the drive internals? Or are we assuming Interface = Data Throughput?

      That statement is based on the long-held assumption that hard drive manufacturers put better materials and engineering into enterprise-targeted drives [Fibre] than they put into consumer-level drives [SATA].

      Guess not...

    3. Re:Interface matters why? by Danga · · Score: 1

      I thought the exact same thing. They are just dumbasses. The interface has probably zero effect on failure rate compared to the mechanical parts which are just about the same in all the drives.

      FTA:

      "the things that can go wrong with a drive are mechanical -- moving parts, motors, spindles, read-write heads," and these components are usually the same"

      The only effect I can see it having would be if really shitty parts were used for one interface compared to the other.

      --
      Hey, there is only one Return and it's not of the King, it's of the Jedi.
    4. Re:Interface matters why? by mollymoo · · Score: 5, Informative

      TFA seems surprised by SATA drives lasting as long as Fibre...why on earth would your data interface have any consequences on the drive internals?

      Fibre Channel drives, like SCSI drives, are assumed to be "enterprise" drives and therefore better built than "consumer" SATA and PATA drives. It's nothing inherent to the interface, but a consequence of the environment in which that interface is expected to be used. At least, that's the idea.

      --
      Chernobyl 'not a wildlife haven' - BBC News
    5. Re:Interface matters why? by Penguin's+Advocate · · Score: 1

      It is probably assumed that FC drives are more reliable because they are expensive and only really used in relatively expensive servers. It's the whole "professional vs consumer grade" issue. It is generally assumed that "professional grade" drives should be more reliable than "consumer grade" drives. In my experience this is true, the 10000RPM scsi drives in my 10 year old Sun Ultra2s (which see continuous round the clock use) still work great, while I've never had a regular desktop drive from any manufacturer last more than 5 years. Not that my experience counts for much, I've only dealt with several hundred harddrives vs. the several hundred thousand in these studies. (Just for reference, the ratio I've seen between FC/SCSI and SATA/ATA drives failing is about 15:1 in favor of FC/SCSI, and I've never had a SCSI drive last less than 3 years, while I've had plenty of SATA and ATA drives last less than a couple months).

      --
      Frag 'em all...
    6. Re:Interface matters why? by Bill+Dimm · · Score: 1

      TFA seems surprised by SATA drives lasting as long as Fibre...why on earth would your data interface have any consequences on the drive internals?

      Because drive manufacturers claim they use different hardware for the drive based on the interface. For example, a SCSI drive supposedly contains a disk designed for heavier use than an ATA drive, they aren't just the same disk with different interfaces.

    7. Re:Interface matters why? by Mr.Ziggy · · Score: 1

      In theory, just changing the interface board on a drive would not change the reliability of the drive. BUT manufacturers are charging much more for Fibre Channel drives than SATA or IDE, because they are of supposed 'enterprise' quality. With suggestions of batch sorting or higher tolerances. It turns out those who are paying more for drive reliability are wrong. You can get more speed by spending more $/GB, but not more reliability.

    8. Re:Interface matters why? by Spazmania · · Score: 2, Informative

      They certainly charge enough more. SATA drives run about $0.50 per gig. Comparable Fibre Channel drives run about $3 per gig. A sensible person would expect the Fibre Channel drive to be as much as 6 times as reliable, but per the article there is no difference.

      --
      Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.
    9. Re:Interface matters why? by Spazmania · · Score: 1

      My 10 year old drives are still working great too. It's my 1 to 4 year old drives that are failing with alacrity.

      --
      Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.
    10. Re:Interface matters why? by Intron · · Score: 1

      An alternative and simpler explanation is that the manufacturers are correctly specifying MTBF when drives are properly mounted and cooled. When used in the substandard conditions actually experienced, then overheating and lousy shock and vibration characteristics cause any drive to fail much sooner.

      --
      Intron: the portion of DNA which expresses nothing useful.
    11. Re:Interface matters why? by hurfy · · Score: 1

      hehe, unfortunately too true :(

      not one of my old 4G in the daily desktops has failed in years, but, 4 of 12 from the 2-1/2 year-old desktops have bit it tho at less than half their age. All 4 within the same 6 months window. Currently replacing those old boxes, i'll be back in 2 years whining about the cheesy hardware ... oh well 5 years+ and almost zero maintenance couldn't last forever :(

      Wish i could find the article on MTBF, it didn't mean what one would expect it to mean.

    12. Re:Interface matters why? by timeOday · · Score: 1

      If you think "you get what you pay for" is sensible (and yes, a lot of people believe it). I'd say you get less than or equal to what you pay for.

    13. Re:Interface matters why? by timeOday · · Score: 1

      According to the google paper presented at the same conference, temperature doesn't matter nearly as much as people thought/assumed, and neither does read/write activity. (I don't think anybody has debunked vibration though!)

    14. Re:Interface matters why? by ZorinLynx · · Score: 1

      Yep, assumed. This is not the case, though; the quality is likely very similar. Keep in mind that companies use the same plants to build SATA and FC/SCSI drives, and likely a lot of the same parts, too.

      The reason FC and SCSI drives are so much more is because of lower volume. SATA drives likely sell 20x as many units as FC/SCSI, and economy of scale really helps out when it comes to hard drive production.

      -Z

    15. Re:Interface matters why? by WuphonsReach · · Score: 1

      According to hearsay (no source) - Enterprise drives (SCSI/FC) are tested individually while consumer drives are only spot-checked as part of a batch.

      I have no idea if that's true or not.

      --
      Wolde you bothe eate your cake, and have your cake?
  13. I have thought the MTTF is bullshit for a while by Danga · · Score: 5, Interesting

    I have had 3 personal use hard drives go bad in the last 5 years, they were either Maxtor or Western Digital. I am not hard on the drives other than leaving them on 24/7. The drives that failed were all just for data backup and I put them in big, well ventilated boxes. With this use I would think the drives would last for years (at least 5 years), but nope! The drives did not arrive broken either, they all functioned great for 1-2 years before dying. The quality of consumer hard drives nowadays is way, WAY low, and the manufacturers should do something about it.

    I don't consider myself a fluke because I know quite a few other people who have had similar problems. What's the deal?

    Also, does anyone else find this quote interesting?:

    "and may have failed for any reason, such as a harsh environment at the customer site and intensive, random read/write operations that cause premature wear to the mechanical components in the drive."

    It's a f$#*ing hard drive! Jesus H Tapdancing Christ how can they call that premature wear, do they calculate the MTTF by just letting the drive sit idle and never reading and writing to it? That actually wouldn't surprise me.

    --
    Hey, there is only one Return and it's not of the King, it's of the Jedi.
    1. Re:I have thought the MTTF is bullshit for a while by dextromulous · · Score: 1

      I have had 3 personal use hard drives go bad in the last 5 years, they were either Maxtor or Western Digital. I am not hard on the drives other than leaving them on 24/7.
      Ever read the manufacturer's fine print on how they determine MTBF? Last time I did (yeah, it was over a year ago,) it read: "8 hour a day usage." Drives that are on 24/7 get HOT, and heat leads to mechanical failure.
      --
      There are two types of people in the world: those who divide people into two types and those who don't.
    2. Re:I have thought the MTTF is bullshit for a while by XenoPhage · · Score: 1

      Ever read the manufacturer's fine print on how they determine MTBF? Last time I did (yeah, it was over a year ago,) it read: "8 hour a day usage." Drives that are on 24/7 get HOT, and heat leads to mechanical failure.

      MTTF, no? MTBF would indicate a fixable system.

      Yeah, but there has to be a plateau to the heat curve at some point. It's not as if the heat just keeps going up and up.. I would think that the constant on/off each day, causing expansion and contraction of the parts as they heat and cool, would cause much more wear over time. Leaving it on 24/7 in a well ventilated and cooled system should, I would think, keep the drives running better.

      Where are the majority of the failures anyway? In the mechanical components or on the disk platters themselves? ie, is this mechanical wear causing failures, or a breakdown of the chemicals used to coat the drive platters?

      --
      XenoPhage
      Technological Musings
    3. Re:I have thought the MTTF is bullshit for a while by dextromulous · · Score: 1
      I was able to quickly find at least one reference to this measure (8 hours/300 days a year for personal storage [PS] drives, 24 hours/365 days for enterprise storage [ES] drives.)

      The most significant difference in the reliability specification of PS and ES drives is the expected power-on hours (POH) for each drive type. The MTBF calculation for PS assumes a POH of 8 hours/day for 300 days/year while the ES specification assumes 24 hours per day, 365 days per year.
      --
      There are two types of people in the world: those who divide people into two types and those who don't.
    4. Re:I have thought the MTTF is bullshit for a while by mollymoo · · Score: 1

      Do you seriously think a drive won't have reached thermal equilibrium after an hour, let alone after several hours? Mine seem to get up to their 'normal' temperatures in 30 minutes or less. And according to the Google study, heat doesn't lead to a significantly increased risk of failure till you get above 45 C or so.

      --
      Chernobyl 'not a wildlife haven' - BBC News
    5. Re:I have thought the MTTF is bullshit for a while by dextromulous · · Score: 1

      Acronyms schmackronyms... anyway, I found at least one paper that I read in the past that states the 8 hours/day thing I was referring to: http://www.seagate.com/content/docs/pdf/whitepaper/D2c_More_than_Interface_ATA_vs_SCSI_042003.pdf

      The 8 hours/day is referring to personal storage (as opposed to enterprise storage systems,) and this discussion is supposed to be about enterprise storage, so I'm off topic anyway. (BTW, the whitepaper I linked to does specify it as MTBF, for what it's worth)
      --
      There are two types of people in the world: those who divide people into two types and those who don't.
    6. Re:I have thought the MTTF is bullshit for a while by dextromulous · · Score: 1

      Do you seriously think a drive won't have reached thermal equilibrium after an hour, let alone after several hours? Mine seem to get up to their 'normal' temperatures in 30 minutes or less.
      Sure, they will have reached "thermal equilibrium" after a short period of time. See Figure 9 in this paper " Reliability reduction with increased power on hours, ranging from a few hours per day to 24 x 7 operation " to see how I'm not sure that merely being hot is the problem.

      And according to the Google study, heat doesn't lead to a significantly increased risk of failure till you get above 45 C
      I'll have to take your word for it, I haven't read their study yet.
      --
      There are two types of people in the world: those who divide people into two types and those who don't.
    7. Re:I have thought the MTTF is bullshit for a while by mollymoo · · Score: 1

      Sure, they will have reached "thermal equilibrium" after a short period of time. See Figure 9 in this paper " Reliability reduction with increased power on hours, ranging from a few hours per day to 24 x 7 operation " to see how I'm not sure that merely being hot is the problem.

      The graph mostly seems to indicate that drives wear out when they are spinning. It's not all that far from a straight line (if you ignore the very low hours), which you would expect if wear was a significant component in the risk of failure. To a rough approximation, that graph shows a 0.5% risk of failure independent of usage level, then an additional 0.5% risk per 3000 hours/year of usage.

      I always assumed quoted MTBF was for power-on hours and framed my discussion in those terms, but it seems I should have read the small-print.

      --
      Chernobyl 'not a wildlife haven' - BBC News
    8. Re:I have thought the MTTF is bullshit for a while by dextromulous · · Score: 1

      To a rough approximation, that graph shows a 0.5% risk of failure independent of usage level, then an additional 0.5% risk per 3000 hours/year of usage.

      Perhaps you misinterpreted the label on the y-axis of that figure. It is not in percent, it is a multiplier. So 0.5 means 50%.

      Quoting the paper, emphasis mine:

      The chart in Figure 9 shows the expected increase in AFR due to higher power-on-hours. Moving a drive from an expected 2,400 POH per year to 8,760 POH per year would increase the failure rate almost two-fold, if there were no compensation elsewhere in the design.
      --
      There are two types of people in the world: those who divide people into two types and those who don't.
    9. Re:I have thought the MTTF is bullshit for a while by Jeff+DeMaagd · · Score: 1

      I don't think drive reliability is that bad. I'm using more drives now (five in each of two computers, then there are external drives) than I ever have been (often just one or two drives per computer) and I am getting fewer failures than I did a decade ago. I had one drive fail a month ago, and two fail about a decade ago. I've got many drives that work but aren't worth connecting, my first drive probably would still work, but 40MB isn't worth it except for the nostalgia - to see what I ran ~15 years ago and what files I made.

      I think Maxtor has had a history of poor reliability, and some think that Western Digital isn't all that great either. Unfortunately, I haven't seen any reports comparing reliability between brands (the reports chicken out, though understandable if the maker is willing to sue), so now we're stuck without quantifiable information on how much brand and model makes a difference.

    10. Re:I have thought the MTTF is bullshit for a while by mollymoo · · Score: 1

      The quality of consumer hard drives nowadays is way, WAY low, and the manufacturers should do something about it.

      Point, counterpoint...

      I've never had a single one of my own hard drives fail. Not a single one, ever. I've had a dozen or so that I can remember, from the 20MiB drive in my Amiga to the 250GiB that now hangs off my NSLU2. They are all either still functioning or became obsolete before failing. Many of them have been run 24/7 for significant chunks of their lives and I don't replace them unless they become too small to be useful (they don't shrink of course, but my needs grow).

      --
      Chernobyl 'not a wildlife haven' - BBC News
    11. Re:I have thought the MTTF is bullshit for a while by mollymoo · · Score: 1

      Perhaps you misinterpreted the label on the y-axis of that figure. It is not in percent, it is a multiplier. So 0.5 means 50%.

      Oops, indeed I did. It only scales my interpretation, rather than contradicting it though. It still indicates that wear is highly significant (which I expected, but previously erroneously assumed was accounted for by MTBFs applying to power-on time rather than calendar time).
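      A rough sketch of that linear reading, with the y-axis taken as a multiplier on the failure rate rather than a percentage. The 0.5 baseline and the 0.5 per 3000 hours/year slope are eyeballed numbers from this thread, not values taken from the paper, and the function name is made up for illustration.

```python
BASELINE_MULTIPLIER = 0.5   # eyeballed: part of the multiplier independent of usage
SLOPE_PER_3000_POH = 0.5    # eyeballed: extra multiplier per 3000 power-on hours/year


def afr_multiplier(power_on_hours_per_year):
    """Rough linear fit to the AFR multiplier as a function of yearly power-on hours."""
    return BASELINE_MULTIPLIER + SLOPE_PER_3000_POH * power_on_hours_per_year / 3000.0


desktop = afr_multiplier(2400)   # 8 hours/day for 300 days/year
always_on = afr_multiplier(8760)  # 24 hours/day for 365 days/year
ratio = always_on / desktop

print(f"2400 POH/year: {desktop:.2f}, 8760 POH/year: {always_on:.2f}, ratio: {ratio:.2f}")
```

      The ratio from this crude fit comes out a little above 2, which is in the same ballpark as the "almost two-fold" increase the paper quotes for going from 2,400 to 8,760 POH per year.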

      --
      Chernobyl 'not a wildlife haven' - BBC News
    12. Re:I have thought the MTTF is bullshit for a while by geekoid · · Score: 1

      That's a good point, but it does indicate another reason to get lots of RAM

      --
      The Kruger Dunning explains most post on /. http://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect
    13. Re:I have thought the MTTF is bullshit for a while by paeanblack · · Score: 1

      I am not hard on the drives other than leaving them on 24/7.

      Are you sure? How's your power quality? This has a huge impact on the life of any piece of electronic equipment.

      If you haven't invested any serious money in power conditioning and monitoring, it's a pretty safe bet your power quality falls somewhere between total crap and utter shit. Unfortunately, if you are only replacing a thousand dollars or so worth of equipment every year, there isn't really a financially worthwhile solution. Good power costs much more than a few hard drives.

      I'd still wager you'd notice a drop in your drive failures if you dropped $500-$1000 on some entry-level UPSes.

    14. Re:I have thought the MTTF is bullshit for a while by mollymoo · · Score: 1

      What sort of power conditioning? Would galvanic isolation, rectification into a substantial input capacitance and a regulated output suffice? That's what you get in a computer PSU.

      --
      Chernobyl 'not a wildlife haven' - BBC News
    15. Re:I have thought the MTTF is bullshit for a while by timeOday · · Score: 1

      To count non-operational (powered off) hours in the MTTF is just as dishonest as any other lie. Do you think aircraft engine manufacturers could get away with that?

    16. Re:I have thought the MTTF is bullshit for a while by Anonymous Coward · · Score: 0

      Consider yourself extremely lucky, for most IT people the situation is very different. For us, quite a few of the drives failed at some point. I even had a desktop WD drive that failed twice during the 3-year warranty period (that is, the original and the first replacement drive). And I'm not even talking about the 2.5" laptop drives, that despite the slow speed and the active protection system (we use IBM/Lenovo Thinkpads) have about 20%-30% chance to die during the first three years.

    17. Re:I have thought the MTTF is bullshit for a while by Danga · · Score: 1

      I didn't have a power conditioning UPS on it when the drives failed (although I do now) but overall my apartment building had pretty good power (yes I checked it, I worked as an electrician while getting my CS degree). IMO power conditioners are a waste of money and the only place I have heard of them making a difference is with high end home theatre systems where the sound quality may be better with a conditioner. I don't think a conditioner would make much difference if I plugged my home computer into it.

      The biggest reason I think it would make no difference is because if unconditioned power is supposed to be so bad for electronics then why is the only thing that I have a problem with turn out to be the hard drives? I would think bad power would take out RAM before a hard disk. Either way the only failure I have had with this box is the hard drives, nothing else.

      BTW I do have a high quality power supply in my main box so the blame can't be placed on a POS power supply which definitely do cause problems.

      --
      Hey, there is only one Return and it's not of the King, it's of the Jedi.
    18. Re:I have thought the MTTF is bullshit for a while by Sibko · · Score: 1

      Personally, I'm not too worried. Flash drives are gaining speed, and the Magnetic Drive manufacturers are going to be finding themselves out of customers if they can't offer reliable storage at cheap prices.

    19. Re:I have thought the MTTF is bullshit for a while by paeanblack · · Score: 2, Insightful

      The biggest reason I think it would make no difference is because if unconditioned power is supposed to be so bad for electronics then why is the only thing that I have a problem with turn out to be the hard drives? I would think bad power would take out RAM before a hard disk.

      Ram has no significant inductive load.

    20. Re:I have thought the MTTF is bullshit for a while by dextromulous · · Score: 1

      To count non-operational (powered off) hours in the MTTF is just as dishonest as any other lie. Do you think aircraft engine manufacturers could get away with that?

      I'm not sure they count the powered off hours, I have not found anything specifically stating that yet.

      --
      There are two types of people in the world: those who divide people into two types and those who don't.
    21. Re:I have thought the MTTF is bullshit for a while by Anonymous Coward · · Score: 0

      There was never this "8 hour a day usage" thing, until the IBM "Deathstar" which their lawyers used that as a defense for the harddrive failures in the lawsuit. I guess this "not meant for 24/7 usage" idea caught on :/

    22. Re:I have thought the MTTF is bullshit for a while by myowntrueself · · Score: 1

      To count non-operational (powered off) hours in the MTTF is just as dishonest as any other lie. Do you think aircraft engine manufacturers could get away with that?

      How about if aero-engine manufacturers gave MTTF based on engines kept running on the ground in a test harness with big filters in front of their intakes ensuring that no crap gets sucked into them? Not exactly 'powered off' hours.

      (Or 'second hand' guitar strings from the Bay City Rollers; not exactly new but then not exactly used either, as Jasper Carrott once noted)

      --
      In the free world the media isn't government run; the government is media run.
    23. Re:I have thought the MTTF is bullshit for a while by jimicus · · Score: 1

      Quite correct, but the effectiveness of that is proportional to the quality of the PSU.

      I've seen more desktop hard drives die than servers, but generally speaking I'd expect the PSU in a server to be substantially better quality.

    24. Re:I have thought the MTTF is bullshit for a while by drsmithy · · Score: 1

      And according to the Google study, heat doesn't lead to a significantly increased risk of failure till you get above 45 C or so.

      The hard disk in the average small, poorly ventilated PC would hit 40 degrees fairly easily. Heck, my mum's iMac reports the drive sitting around 50 degrees at idle.

      Most hard disks aren't living in well ventilated machines sitting in climate-controlled data centres.

  14. I am shocked! by Anonymous Coward · · Score: 2, Insightful

    I just can't believe that the same vendors that would misrepresent the capacity of their disk by redefining a Gigabyte as 1,000,000,000 bytes instead of 1,073,741,824 bytes would misrepresent their MTBF too! And by the way, nobody actually runs a statistically significant sample set of their equipment for 10,000 hours to arrive at a MTBF of 10,000 hours, so isn't their methodology a little suspect in the first place?

  15. And that's a really wide range by VampireByte · · Score: 2, Funny

    I feel sorry for anyone buying drives on the low end of that range. A MTTF of 1 hour really sucks.

    --

    Run and catch, run and catch, the lamb is caught in the blackberry patch.

    1. Re:And that's a really wide range by User+956 · · Score: 1

      I feel sorry for anyone buying drives on the low end of that range. A MTTF of 1 hour really sucks.

      Well, they don't call it "Best Borrow" for no reason.

      --
      The theory of relativity doesn't work right in Arkansas.
  16. Even better ... by khasim · · Score: 3, Interesting

    Give me 6 month failure rates.

    Start with 100 drives. Continuous usage.

    How many fail in the first 6 months? 12 months? 18 months? ... 60 months? That would be the info that I'd need. Where's the big failure spike? I'm going to be replacing them right before that.

    1. Re:Even better ... by ivan256 · · Score: 1

      The big spike is at the beginning.

    2. Re:Even better ... by Falkkin · · Score: 5, Informative

      This is handled in the paper. See this graph: http://www.usenix.org/events/fast07/tech/schroeder/schroeder_html/img14b.PNG

      Unfortunately there is no big "spike"; the average replacement rate just grows and grows with time.

    3. Re:Even better ... by Grail · · Score: 1

      TFA tells you that there is no "bathtub curve" and no "failure spike". The drives just fail more frequently as they get older - it's an exponentially rising curve.

    4. Re:Even better ... by Rakishi · · Score: 1

      And the google study posted a while back disagrees with that, as do a ton of other sources I'd assume. That does not say good things about the accuracy of this study.

    5. Re:Even better ... by vux984 · · Score: 2, Funny

      Alrighty then... I'll just replace them before I install them ;)

    6. Re:Even better ... by Anonymous Coward · · Score: 0

      Well, I had a brand new 100GB Toshiba notebook drive that died in less than 1 year. Mind you, this drive wasn't even in constant use and the notebook that it was in never left my desk.

      I tried to contact their support, but kept getting voicemail and no return calls. Ever since then, I won't touch a Toshiba product again.

    7. Re:Even better ... by kidgenius · · Score: 1
      I kind of read through the google paper, but just skimmed it. I was hoping for more specific numbers, etc. But I saw the CMU paper first, and focused on it.

      I will also add to what you said. After looking at the Weibull analysis they performed, they are completely incorrect when they say "there is no infant mortality effect". Whenever you have a Weibull shape factor of less than 1 (and in their case it was 0.71), that tells you that your system exhibits signs of infant mortality. Add to that the fact that they are erroneously comparing MTBF to MTTF, when the two are very different and tell you very different things. Also, I tried to talk to one of the researchers, and her reply to me was starting to give me the idea that she really doesn't know what she is talking about.
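For anyone wondering why a shape factor below 1 is read as infant mortality: the textbook Weibull hazard (instantaneous failure) rate falls over time when the shape is below 1, stays flat at exactly 1, and rises above 1. Here is a minimal sketch of that property only. The 0.71 comes from the comment above; the scale of 1.0 and the sample times are arbitrary assumptions, and this is not a refit of the paper's data.

```python
def weibull_hazard(t, shape, scale):
    """Weibull hazard rate h(t) = (k / lam) * (t / lam) ** (k - 1), for t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1)


SHAPE = 0.71   # shape factor quoted in the comment above
SCALE = 1.0    # arbitrary assumption, only sets the time unit
SAMPLE_TIMES = (0.5, 1.0, 2.0, 4.0)  # arbitrary assumption

# Hazard rate at each sample time; with shape < 1 this list shrinks over time.
rates = [weibull_hazard(t, SHAPE, SCALE) for t in SAMPLE_TIMES]

if __name__ == "__main__":
    for t, h in zip(SAMPLE_TIMES, rates):
        print(f"t={t}: hazard={h:.3f}")
```

A falling hazard rate means a unit that has already survived a while is less likely to fail in the next instant than a brand new one, which is what "infant mortality" describes.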

      They may have had very good ambitions, but when you show that you don't understand the very basics of Reliability, then your entire research is in question. They either performed their Weibull very incorrectly, don't understand infant mortality, wearout, and random modes of failure, or some combination of these.

      And if I get any replies from anyone that states "MTBF = MTTF + MTTR" and then goes on to say that hard drives are not repairable, then you obviously don't understand Reliability. There is a huge difference between MTTF of a single drive, and the MTBF/MTTR of a SYSTEM of drives. One relates to a single piece of hardware, and the other two relate to a system. This is one thing where I got a strong impression that the CMU researchers don't know what they are saying. They aren't purposefully stating things incorrectly, they just are ignorant.

    8. Re:Even better ... by justthinkit · · Score: 1

      I find it interesting how big the biggest "failed" spikes are pretty much exactly at 36 and 48 months. To me this suggests a scheduled replacement cycle or a claim-before-the-warranty-runs-out move. The stats don't seem kosher to me.

      --
      I come here for the love
  17. Re:Over my career, I've replaced a TON of SCSI dri by Pojut · · Score: 1

    I've got a big ol' 5-inch 20MB hard drive whirring at home still...

    In fact, my TRS-80 is still functional too...the tape drive is a little wonky, but what are ya gonna do?

  18. You takes your chances by davidwr · · Score: 1

    At any given time, the drive has a finite probability of failing in the next 30 days of normal use.

    When this probability is high enough, you should replace it or take actions (like more frequent backups) that raise your tolerance for failure.

    Imagine drives had a failure rate similar to radioactive decay:

    2% of drives failed in the 1st year,
    2% of the remaining drives failed in the 2nd year,
    2% of the remaining drives failed in the 3rd year,
    and so on.

    Why should I replace my 5 year old drive with an identical new one? I shouldn't.

    However, that's not the real world. In the real world, drives are more like cars - a drive with the equivalent of 100,000 miles and 10 years on it is a lot more likely to have a mechanical breakdown than one with 6 months and 5,000 miles.
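The "radioactive decay" case above is what statisticians call a memoryless failure model. A small sketch, assuming the 2% per year figure from the example (the function names are made up for illustration), shows why age would not matter under that model:

```python
def survival_constant_rate(years, annual_failure_prob=0.02):
    """Fraction of drives still alive after `years`, if a fixed share fails each year."""
    return (1.0 - annual_failure_prob) ** years


def prob_fail_next_year(age_years, annual_failure_prob=0.02):
    """Chance that a drive which has survived `age_years` fails within the next year."""
    alive_now = survival_constant_rate(age_years, annual_failure_prob)
    alive_next = survival_constant_rate(age_years + 1, annual_failure_prob)
    return (alive_now - alive_next) / alive_now


if __name__ == "__main__":
    for age in (0, 5, 10):
        print(age, round(prob_fail_next_year(age), 4))
```

Under this assumption a 5 year old drive and a new one carry the same risk for the coming year, so swapping one for the other buys nothing. The car analogy is the opposite case, where the risk climbs with age.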

    --
    Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
    1. Re:You takes your chances by ivan256 · · Score: 1

      This isn't about data loss, it's about cost. No one smart is taking their chances and playing the odds. They are protecting their data with redundancy and backups. You're going to run the drive until it dies, has performance-impacting error rates, or needs to be upgraded for some other reason. This isn't about knowing when you need to buy a new drive to save your stuff. It's about knowing how much budget to allocate to drive replacements in your organization that has 50,000 drives. Tolerance for failure is not measured in data lost. It is measured in dollars.
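To put numbers on the budgeting point, here is a back-of-envelope sketch. The 50,000 drive fleet is from the comment above and the 2% to 4% annual replacement rates are from the story summary; the per-drive cost is a purely hypothetical placeholder, and the function names are made up for illustration.

```python
def expected_annual_replacements(fleet_size, annual_replacement_rate):
    """Expected number of drives replaced per year for a fleet of the given size."""
    return fleet_size * annual_replacement_rate


def annual_replacement_budget(fleet_size, annual_replacement_rate, cost_per_drive):
    """Expected yearly spend on replacement drives (hardware only, no labor)."""
    return expected_annual_replacements(fleet_size, annual_replacement_rate) * cost_per_drive


if __name__ == "__main__":
    FLEET = 50_000
    HYPOTHETICAL_COST = 100.0  # assumption: placeholder price per drive
    for rate in (0.0088, 0.02, 0.04):
        drives = expected_annual_replacements(FLEET, rate)
        budget = annual_replacement_budget(FLEET, rate, HYPOTHETICAL_COST)
        print(f"rate={rate:.2%}: ~{drives:.0f} drives, ~{budget:.0f} per year")
```

The gap between the datasheet rate and the observed rates is the difference between budgeting for a few hundred drives a year and budgeting for one or two thousand.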

    2. Re:You takes your chances by Anonymous Coward · · Score: 0

      "Why should I replace my 5 year old drive with an identical new one? I shouldn't."

      Read the FULL report ( http://www.usenix.org/events/fast07/tech/schroeder /schroeder_html/index.html )

      Scroll down to "4.2 Age-dependent replacement rates"

      It looks pretty clear from that graph that drives 5 years old are much more likely to fail. If you are a statistics person, that seems pretty convincing.

      Not to say you need to chuck the drives, my anecdotal evidence being I have 2 linux boxes running with 10 year old 4.3GB IBM drives, and they just work.

    3. Re:You takes your chances by vaporland · · Score: 1

      I knew someone with an AppleShare server with a drive that ran continuously from 1990 until it was finally replaced in 2003. 13 years! Quantum drive in an Apple SCSI case. They do not make them like they used to . . .

      --
      Ask Me About... The 80's!
    4. Re:You takes your chances by Anonymous Coward · · Score: 0

      > At any given time, the drive has a finite probability of failing in the next 30 days of normal use.

      I suppose you mean, "non-zero" instead of "finite", because probability is by definition between 0 and 1.

  19. Having read the paper and seen the talk... by reset_button · · Score: 2, Informative
    Here are the main conclusions:
    • the datasheet MTTF is always much higher than the observed time to disk replacement
    • SATA is not necessarily less reliable than FC and SCSI disks
    • contrary to popular belief, hard drive replacement rates do not enter steady state after the first year of operation, and in fact steadily increase over time.
    • early onset of wear-out has a stronger impact on replacement than infant mortality.
    • they show that the common assumptions that the time between failure follows an exponential distribution, and that failures are independent, are not correct.
    It was an interesting paper (won the best paper award) at this year's FAST (File and Storage Technologies) conference. Here is a link to the paper, and the summary from the conference.
  20. Perhaps your data is safe if you DUPElicate it by Anonymous Coward · · Score: 0

    At least we know slashdot won't be in danger of losing their data if that's the case ;-)
    http://hardware.slashdot.org/article.pl?sid=07/02/21/004233

  21. It's not relative. by tomhudson · · Score: 1

    ... it's because they were on "Internet Time."

  22. Corporations misrepresent products, news at 11:00! by NerveGas · · Score: 1

    Is there anyone out there that actually believed the published MTBF figures, even BEFORE these articles came out?

    It's hard to take someone seriously when they claim that their drives have a 100+ year MTBF, especially since precious few are still functional after 1/10th of that much use. To make it better, many drives are NOT rated for continuous use, but only a certain number of hours per day. I didn't know that anyone EVER believed the MTBF B.S..

    --
    Oh, you're not stuck, you're just unable to let go of the onion rings.
  23. I replace Drives for reasons other than failure. by zibix · · Score: 1

    I didn't notice anything in the article that would indicate that they only took into account drives being replaced due to failure. It seems like this would be common sense, but I'd like some verification that only drive failures were being included in this "replacement" study.

  24. Check SMART Info by Bill+Dimm · · Score: 3, Interesting

    Slightly off-topic, but if you haven't checked the Self-Monitoring, Analysis and Reporting Technology (SMART) info provided by your drive to see if it is having errors, you probably should. You can download smartmontools, which works on Linux/Unix and Windows. Your Linux distro may have it included, but may not have the daemon running to automatically monitor the drive (smartd).

    To view the SMART info for drive /dev/sda do:
    smartctl -a /dev/sda
    To do a full disk read check (can take hours) do:
    smartctl -t long /dev/sda
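If you would rather script those checks, here is a rough sketch that wraps the same commands from Python. It assumes smartmontools is installed and that you have permission to query the device; the function names and the "action" labels are made up for illustration, and only the `-a`, `-H`, and `-t long` flags of smartctl are used.

```python
import shutil
import subprocess

# Map of illustrative action names to real smartctl flags.
SMARTCTL_FLAGS = {
    "info": ["-a"],               # print all SMART information
    "health": ["-H"],             # print the overall health assessment
    "long_test": ["-t", "long"],  # start the extended self-test
}


def smartctl_command(device, action="info"):
    """Build the argument list for a smartctl call without running it."""
    if action not in SMARTCTL_FLAGS:
        raise ValueError(f"unknown action: {action}")
    return ["smartctl", *SMARTCTL_FLAGS[action], device]


def run_smartctl(device, action="info"):
    """Run smartctl and return its text output, or None if it is not installed."""
    if shutil.which("smartctl") is None:
        return None
    # smartctl uses its exit status as a bitmask, so a nonzero code is not
    # treated as a fatal error here.
    result = subprocess.run(
        smartctl_command(device, action),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout


if __name__ == "__main__":
    print(run_smartctl("/dev/sda", "health"))
```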

    Sadly, I just found read errors on a 375-hour-old drive (manufacturer's software claimed that repair succeeded). Fortunately, they were on the Windows partition :-)

    1. Re:Check SMART Info by drinkypoo · · Score: 1

      Slightly off-topic, but if you haven't checked the Self-Monitoring, Analysis and Reporting Technology (SMART) info provided by your drive to see if it is having errors, you probably should.

      The last survey that popped up here said that if SMART says your drive will fail, it probably will, but if SMART doesn't say it will fail, it doesn't mean much.

      Suffice to say that you should never trust any piece of hardware that thinks it's SMARTer than you are.

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    2. Re:Check SMART Info by Bill+Dimm · · Score: 1

      The last survey that popped up here said that if SMART says your drive will fail, it probably will, but if SMART doesn't say it will fail, it doesn't mean much.

      Yes, that was the Google study. So, if SMART says there is a problem, you should pay attention to it. If SMART doesn't find a problem, that doesn't mean you are out of the woods.

    3. Re:Check SMART Info by sparkz · · Score: 1

      Good point; I just downloaded it. It just stores the 5 most recent errors:

          hda has had 356 errors in its short life (I've had it about a year; 200Gb Seagate IDE)
          hdc has had 4,560 errors its life (after nearly 3 years of service; 80Gb Maxtor IDE)

      That doesn't sound good to me.

      I got the Seagate because my previous drive had failed fsck a few times and had some dodgy-looking data on it.

      These figures suggest about 1 error/day for the Seagate, and 4 errors/day for the Maxtor.
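The per-day figures can be reproduced with a quick calculation. This sketch assumes "about a year" means 365 days and "nearly 3 years" means 3 * 365 days, which is only an approximation of the ages given above.

```python
def errors_per_day(total_errors, days_in_service):
    """Average logged errors per day over the drive's time in service."""
    return total_errors / days_in_service


# Error counts from the comment above; service times are rounded assumptions.
seagate_rate = errors_per_day(356, 365)
maxtor_rate = errors_per_day(4560, 3 * 365)

if __name__ == "__main__":
    print(f"Seagate: {seagate_rate:.2f} errors/day")
    print(f"Maxtor:  {maxtor_rate:.2f} errors/day")
```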

      I don't like those numbers :-(

      --
      Author, Shell Scripting : Expert Re
    4. Re:Check SMART Info by Anonymous Coward · · Score: 0

      Yes, that was the Google study. So, if SMART says there is a problem, you should pay attention to it. If SMART doesn't find a problem, that doesn't mean you are out of the woods.

      Right, that was in a study by Google recently. Basically it said that when SMART reports a problem your drive will likely fail, but if it doesn't report a problem you may still have a problem.

      Echo... echo...

    5. Re:Check SMART Info by mollymoo · · Score: 1

      The last survey that popped up here said that if SMART says your drive will fail, it probably will, but if SMART doesn't say it will fail, it doesn't mean much.

      Actually, other than scan errors, the Google data shows that even with SMART errors, your drive probably won't fail soon (they all fail eventually, of course). It is much more likely to fail than a drive without any errors, but still more likely to last another year than fail.

      --
      Chernobyl 'not a wildlife haven' - BBC News
    6. Re:Check SMART Info by DimGeo · · Score: 1

      That's right. Shop smart. Shop S-MART! :P

    7. Re:Check SMART Info by Chalex · · Score: 2, Informative

      Slightly off-topic, but if you haven't checked the Google paper on Self-Monitoring, Analysis and Reporting Technology (SMART) info provided by your drive to see if it is having errors, you probably should. The paper is available here: http://hardware.slashdot.org/hardware/07/02/18/0420247.shtml

      The conclusions are roughly the following: a) if there are SMART errors, the disk will fail soon, b) if there are no SMART errors, the disk is still likely to fail. They saw no SMART errors on 36% of their failed disks.

  25. RAID = Redundant Articles of Identical Discourse by MasterC · · Score: 2, Informative
    New meaning for RAID: Redundant Articles of Identical Discourse.
    Slashdot has a high rate of RAID, which is a bad thing. Which is a bad thing. It has been a whole 9 days. Slashdot needs a story moderation system so dupe articles can get modded out of existence. Ditto for slashdot editors who do the duping! :) (I have long since disabled tagging since 99% of the tags were completely worthless: "yes", "no", "maybe", "fud", etc. If tagging is actually useful now, please let me know!)

    Can we get redundant posting on the story about google's paper?
    --
    :wq
  26. Redundancy by pizza_milkshake · · Score: 3, Funny

    I thought storage-related redundancy was supposed to be a good thing ;)

    1. Re:Redundancy by georgewilliamherbert · · Score: 5, Funny

      Redundant Array of Irritating Discussions?

    2. Re:Redundancy by networkBoy · · Score: 1

      No... To fully redundify the issue it is:
      Redundant Array of Imitating Duplicates

      --
      whois gawk date unzip strip find touch finger mount join nice man top fsck grep eject more yes exit umount sleep dump
    3. Re:Redundancy by __aalwyc6372 · · Score: 1

      Redundant Amount of Irrelevant Discussions!

  27. Unfortunately by nmos · · Score: 1

    Unfortunately the data was skewed by one large web site that reported its results multiple times.

    1. Re:Unfortunately by winkydink · · Score: 1

      Well put! That took me a second. :)

      --

      "I'd rather be a lightning rod than a seismometer." -Ken Kesey

  28. Thanks for the tip! by PatPending · · Score: 1

    He echoed storage vendors and analysts in pointing out that as many as half of the drives returned to vendors actually work fine and may have failed for any reason, such as a harsh environment at the customer site and intensive, random read/write operations that cause premature wear to the mechanical components in the drive. Random read/write operations? Oh, okay, I'll start using *sequential* read/write operations instead! Thanks for the tip!
    --
    What one fool can do, another can. (Ancient Simian Proverb)
    1. Re:Thanks for the tip! by Anonymous Coward · · Score: 0

      Random read/write operations? Oh, okay, I'll start using *sequential* read/write operations instead! Thanks for the tip!
      Why do you sound so surprised? Random I/O operations cause the disk heads to thrash across the disc, which increases wear on the actuator arms. That should just be obvious.
  29. Odd numbers for memory failure? by nmos · · Score: 1

    One of the things that bugged me last time this report was on /. was that two of the three sources reported that memory was replaced after 20% or more of their system failures. That seems pretty odd because in my experience memory hardly ever just goes bad. Sure, sometimes it's bad right out of the box, which is why I test every module that I buy, but once it's installed and tested, memory tends to keep working just about forever. If that number is off then I wonder how seriously I should take their other numbers.

    1. Re:Odd numbers for memory failure? by Akaihiryuu · · Score: 2, Interesting

      I had a 4mb 72-pin parity SIMM go bad one time...this was about 12 years ago in a 486 I used to have. It just didn't work one day (it worked for the first two months). Turn the computer on, get past BIOS start, bam...parity error before bootloader could even start. Reboot, try again, parity error. Turn off parity checking, it actually started to boot and then crashed. The RAM was obviously very defective...when I took that 1 stick out the computer booted normally even with parity on, if I tried to boot with just that stick it would never even POST. That's the only time I have ever seen memory fail...but then it came from a really shady local dealer who regularly scammed people...this same guy had a rack of "shareware" DOS games with neatly printed labels (all labels he printed) for like $5/disk, all of the disks completely blank (not even formatted). I had happened to get one of those when I got the RAM, and my friend did too (from another part of the rack, we didn't give much thought to that at the time, was just an "oh, this looks like it might be neat" thing). Neither disk was even formatted. The CDROM drives he sold me and my friend died within a month also (about a month after the RAM). Amazingly the store was still in business when I went back with the stick of RAM...he looked at it with a magnifying glass, claimed it was "scratched" and therefore abused. I burned rubber out of his parking lot, tossing a lot of gravel against the windows, then I found a reputable place to get RAM (though this was back in the days when 4MB cost $200). 2 days later I drove by, the place was boarded up and closed. Both CDROM drives died within 2 days of each other a month later. Nothing that came out of that place worked.

    2. Re:Odd numbers for memory failure? by Anonymous Coward · · Score: 1, Informative

      Where I work we have some large compute clusters where the nodes report memory errors. It's actually very common for a memory module to start throwing errors that eventually exceed a threshold for replacement.

      We see everything eventually die - power supplies, fans, motherboards, RAM, CPUs, drives. Nothing is immune from "wearing out" except maybe the boxes themselves.

  30. Which is why I use Samsung by WindBourne · · Score: 1

    Samsung seems to have pretty decent QC at this time. I have no issues with them. OTOH, I have seen Maxtors die with less than 2 years on them.

    --
    I prefer the "u" in honour as it seems to be missing these days.
    1. Re:Which is why I use Samsung by Yetihehe · · Score: 1

      It means nothing. I have just returned a 1-year-old Samsung 120GB disk with bad sectors under warranty. But still, I have seen one Maxtor disk which passed away during first partitioning.

      --
      Extreme Programming - Redundant Array of Inexpensive Developers
  31. No way by Tablizer · · Score: 2, Funny

    High rate of failure? That's a bunch of

    1. Re:No way by Anonymous Coward · · Score: 0

      Now that's funny. I always hated that whole thing where folks try to imitate a broken connection by laboriously typing out ç##;]~*÷"#NO CARRI

  32. Seagate by mabu · · Score: 3, Insightful

    After 12 years of running Internet servers, I won't put anything but Seagate SCSI drives in any mission critical servers. My experience indicates Seagate drives are superior. Who's the worst? Quantum. The only thing Quantum drives are good for is starting a fire IMO.

    1. Re:Seagate by CelticWhisper · · Score: 2, Funny

      Well, duh. Why do you think they used to call them Fireballs?

      --
      Help protect civil rights from abuse by the TSA - visit TSA News Blog.
      http://www.tsanewsblog.com
  33. just assume 3 years by crabpeople · · Score: 4, Informative

    A good rule of thumb is 3 years. Most hard drives fail in 3 years. I don't know why, but I'm currently seeing a lot of bad 2004 branded drives and consider that right on schedule. Last year the 02-03 drives were the ones failing left and right. I just pulled one this morning that's stamped March 04. Just started acting up a few days ago. Like clockwork.

    --
    I'll just use my special getting high powers one more time...
    1. Re:just assume 3 years by misleb · · Score: 1

      It is pretty amazing how that works out. Apple recalled a large subset of G4 eMacs because of that leaky capacitor issue in the power supplies. And after a few years of service, a bunch started failing within a window of a couple months. They got repaired for free, of course. But it was fairly chaotic having so many machines out for service at a time.

      Then again, considering the assembly-line efficiency and relative consistency with which devices and components are made these days, maybe it isn't so surprising. It is almost like everything is designed to be disposable. :P

      -matthew

      --
      "THERE IS NO JUSTICE, THERE IS ONLY ME." -Death
    2. Re:just assume 3 years by seaturnip · · Score: 1

      Google's study said that failure rates were not much correlated with the amount of time a drive's been in use. I'd trust them over a subjective impression on a small sample.

    3. Re:just assume 3 years by Toon+Moene · · Score: 1

      Ugh, three years is only 3 * 365 * 24 = 26280 hours - not a million - not even close ...

    4. Re:just assume 3 years by Zaiff+Urgulbunger · · Score: 1

      I'd agree that the Google study is probably the best, but I have also seen the same ~3yr failure point. I'm guessing here, but I'd say that the Google study shows that drives don't *just* fail at the start or end of their lives, but also, a not insignificant amount fail at some random interval.

      Thus, the 3yr figure still stands as being the point at which you should replace a drive regardless. But the Google study shows that you need to plan for plenty of failures anyway.

  34. Not only that, it's duplicated! by winkydink · · Score: 0, Offtopic
    --

    "I'd rather be a lightning rod than a seismometer." -Ken Kesey

  35. Faster, cheaper, more reliable by dangitman · · Score: 2, Informative
    Pick any two.

    I've noticed this personally. Now, anecdotal evidence doesn't count for a lot, and it may be a case that we are pushing our drives more. But back in the day of 40MB hard drives that cost a fortune, they used to last forever. The only drive I ever had fail on me in the old days were the Syquest removable HD cartridges, for obvious reasons. But even they didn't fail that often, considering the extra wear-and-tear of having a removable platter with separate heads in the drive.

    But these days, with our high-capacity ATA drives, I see hard drives failing every month. Sure, the drives are cheap and huge, but they don't seem to make them like they used to. I guess it's just a consequence of pushing the storage and speed to such high levels, and cheap mass-production. Although the drives are cheap, if somebody doesn't back up their data, the costs are incalculable if the data is valuable.

    --
    ... and then they built the supercollider.
  36. Re:Corporations misrepresent products, news at 11: by mollymoo · · Score: 1

    It's hard to take someone seriously when they claim that their drives have a 100+ year MTBF, especially since precious few are still functional after 1/10th of that much use.

    You're misinterpreting MTBF. A 100 year MTBF does not mean the drive will last 100 years, it means that 1/100 drives will fail each year. There will be another spec somewhere which specifies the design lifetime. For the Fujitsu MHT2060AT drive which was in my laptop the MTBF is 300 000 hours, but the component life is a crappy 20 000 hours or 3 years - 93% of drives should make it that far given the MTBF. After the end of the design lifetime, all bets are off.
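The arithmetic behind those figures can be sketched as follows. This assumes a constant failure rate (an exponential lifetime model), which is the usual reading of a datasheet MTBF but is exactly the assumption the CMU paper disputes, so treat it as an illustration of the spec and not of real drives. The function names are made up for illustration.

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8760 hours of continuous operation


def annual_failure_rate(mtbf_hours):
    """Approximate fraction of drives failing per year of 24/7 use."""
    return HOURS_PER_YEAR / mtbf_hours


def survival_fraction(hours, mtbf_hours):
    """Fraction of drives expected to survive `hours` under a constant failure rate."""
    return math.exp(-hours / mtbf_hours)


if __name__ == "__main__":
    # A 100 year MTBF works out to roughly 1 in 100 drives failing per year.
    print(f"{annual_failure_rate(100 * HOURS_PER_YEAR):.2%}")
    # The 1,000,000 hour datasheet figure gives the 0.88% from the story summary.
    print(f"{annual_failure_rate(1_000_000):.2%}")
    # 300 000 hour MTBF over a 20 000 hour component life.
    print(f"{survival_fraction(20_000, 300_000):.1%}")
```

The last line is where the "93% should make it that far" figure comes from: exp(-20000/300000) is a bit over 0.93.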

    --
    Chernobyl 'not a wildlife haven' - BBC News
  37. A Story by alan_dershowitz · · Score: 1

    When I was in high school in 1995, I was a network intern. We had a 486 Novell Netware server for the high school building. The actual admin was a LOTR fan, and named it GANDALF, others were SAMWISE, etc. One day about four years ago, a friend of mine who worked for the school district calls me and says, "hey, I saw Gandalf in the dumpster today. I thought you might want him, so I grabbed him."

    Besides nostalgia, there wasn't a lot I could do with a giant, noisy 486 anymore, so I ended up just pulling the SCSI interface and drive for use in another machine I had and dumping the rest. I was living in a trailer at the time, and was using a closet as my "server room." After about six months of service, the machine died on me. Everything inside the case had a crust on it. It turned out that I had a roof leak in the closet, and it eventually soaked and killed the machine.

    Anyway, it's 2007 and I'm still using that drive in a Samba print server. It's still alive despite a decade having passed and it being soaked with rainwater.

    1. Re:A Story by LordPhantom · · Score: 1

      I don't suppose the volume label is "The One Ring" is it?

  38. Re:RAID = Redundant Articles of Identical Discours by Nimey · · Score: 1

    They aren't useful yet. Given the crowd, won't be until they're rethought.

    --
    Hail Eris, full of mischief...

    E pluribus sanguinem
  39. Off-Topic: SI Units by ewhac · · Score: 5, Informative

    I just can't believe that the same vendors that would misrepresent the capacity of their disk by redefining a Gigabyte as 1,000,000,000 bytes instead of 1,073,741,824 bytes would misrepresent their MTBF too!

    Not that this is actually relevant or anything, but there's been a long-standing schism between the computing community and the scientific community concerning the meaning of the SI prefixes Kilo, Mega, and Giga. Until computers showed up, Kilo, Mega, and Giga referred exclusively to multipliers of exactly 1,000, 1,000,000, and 1,000,000,000, respectively. Then, when computers showed up and people had to start speaking of large storage sizes, the computing guys overloaded the prefixes to mean powers of two which were "close enough." Thus, when one speaks of computer storage, Kilo, Mega, and Giga refer to 2**10, 2**20, and 2**30 bytes, respectively. Kilo, Mega, and Giga, when used in this way, are properly slang, but they've gained traction in the mainstream, causing confusion among members of differing disciplines.

    As such, there has been a decree to give the powers of two their own SI prefix names. The following have been established:

    • 2**10: Kibi (abbreviated Ki)
    • 2**20: Mebi (Mi)
    • 2**30: Gibi (Gi)

    These new prefixes are gaining traction in some circles. If you have a recent release of Linux handy, type /sbin/ifconfig and look at the RX and TX byte counts. It uses the new prefixes.
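For the curious, the size of the gap is easy to work out. A small sketch, with an arbitrary 500 GB drive as the example size (the helper name is made up for illustration):

```python
GB = 10**9    # SI gigabyte, as printed on the drive box
GIB = 2**30   # gibibyte, what most operating systems report as a "GB"


def advertised_gb_to_gib(gigabytes):
    """Convert a capacity in SI gigabytes to gibibytes."""
    return gigabytes * GB / GIB


# Fraction of the number on the box that "goes missing" in the OS display.
shortfall = 1 - GB / GIB

if __name__ == "__main__":
    print(f"1 GiB - 1 GB = {GIB - GB:,} bytes")
    print(f"500 GB = {advertised_gb_to_gib(500):.2f} GiB")
    print(f"shortfall = {shortfall:.2%}")
```

The shortfall is about 7% at the giga level and grows with each prefix step, which is why the complaint gets louder as drives get bigger.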

    Schwab

    1. Re:Off-Topic: SI Units by Fulcrum+of+Evil · · Score: 1

      As such, there has been a decree to give the powers of two their own SI prefix names.

      One question: when did nist get to make decrees?

      --
      "We returned the General to El Salvador, or maybe Guatemala, it's difficult to tell from 10,000 feet"
    2. Re:Off-Topic: SI Units by DimGeo · · Score: 1

      That, my friend, is *completely* beside the point. Your OS uses 1024 as a base for these measures, and so does every software I can think of. You shouldn't need a bachelor's degree in CS to be able to figure out hard drive sizes when you go to the store...

    3. Re:Off-Topic: SI Units by Anonymous Coward · · Score: 0

      Advertising hard drive sizes in GB instead of GiB might be inconvenient to you but it is not deceptive, just like selling wood in metric units might be inconvenient in the U.S. but it is not deceptive. Selling 1-cm-thick wood and *calling* it 1-inch-thick would be deceptive, but that's the OS vendors, not the disk vendors. Also I said "inconvenient to you" because personally I'm tired of multiplying/dividing by 1024 in my head when I'm comparing filesizes with different prefixes and I would gladly set my OS to display values using 1000 as a base if the option was available.

    4. Re:Off-Topic: SI Units by ewhac · · Score: 1

      One question: when did nist get to make decrees?

      Dude, it's the National Institute of Standards and Technology. These are the guys who keep the reference kilogram for the United States, against which all others are measured. They establish and keep the standards of weights and measures for the country, in cooperation with other international standards organizations. Wanna know exactly how long an inch is? You go to them.

      If anyone gets to decree new SI prefix names, they are among one of the handful of organizations in the world that gets to do that.

      Schwab

    5. Re:Off-Topic: SI Units by Anonymous Coward · · Score: 0

      and who lobbied for that change? oh yeah, HD manufacturers.
      Keebee, meebee, geebee? no, those don't sound lame at all.
      Wait, and why didn't the HD manufacturers adopt the new prefixes and not use the misleading prefix? oh yeah, to make their drives seem bigger.

    6. Re:Off-Topic: SI Units by Timothy+Brownawell · · Score: 1

      • 2**10: Kibi (abbreviated Ki)
      • 2**20: Mebi (Mi)
      • 2**30: Gibi (Gi)

      Hmm, "MiB"... perhaps there is some good come from all this nonsense.

    7. Re:Off-Topic: SI Units by chris_eineke · · Score: 1

      Someone with brains anticipated jokes about 500 GB not being a bigi and put the bi denominator after the magnitude. She deserves a Bimer. :-)

      --
      "All you have to do is be fragile and grateful. So stay the underdog." Chuck Palahniuk, Choke
    8. Re:Off-Topic: SI Units by glwtta · · Score: 1

      As a small addendum to that excellent explanation, I have to ask - how many freaking times do we have to go over this? I mean, in every single article about hard drives or networks, there's a dozen identical threads about this. Every time.

      How hard is this? Hard drives and network speeds are always measured in actual base 10 SI units. Always have been, always will be. Always.

      Get over it already!

      And no one's trying to mislead you - all HD packaging always specifies that "1 GB = 1,000,000,000 bytes", just for smart-asses like you.

      --
      sic transit gloria mundi
    9. Re:Off-Topic: SI Units by toddestan · · Score: 1

      Actually, it is deceptive because they are counting on the fact that most people don't realize that harddrive MB/GB's are different from other MB/GB's, even if they are technically correct according to NIST. Don't forget that they started this whole thing anyway back in the 80's.

      And your wood analogy is kind of close. Last time I was in the hardware store, the 1/2" plywood was really 15/32" plywood if you looked close enough. Like the harddrive makers back in the 80's, they just figured they can skim off a bit, and people won't notice.

    10. Re:Off-Topic: SI Units by marcosdumay · · Score: 1

      Disks used to be measured on base 2 prefixes. It was a few years ago that they changed to base 10.

    11. Re:Off-Topic: SI Units by ScrewMaster · · Score: 1

      Yeah, well, mebi I'll use those terms and mebi I won't. So there.

      --
      The higher the technology, the sharper that two-edged sword.
    12. Re:Off-Topic: SI Units by Fulcrum+of+Evil · · Score: 1

      I guess my point was that nobody asked for new SI prefixes. When talking about memory and disk space, kilo = 1024.

      --
      "We returned the General to El Salvador, or maybe Guatemala, it's difficult to tell from 10,000 feet"
    13. Re:Off-Topic: SI Units by Anonymous Coward · · Score: 0

      Yeah yeah just like an ounce is 28.349523 g, except for precious metals it is 31.103476 g. One goal of the metric system was to get rid of this mess so it is not coming back. If you want a prefix for 1024 find a new name, it's that simple. Kilo is already taken. Kibi has been suggested. if you don't like it come up with something else if you want, it can't be hard.

    14. Re:Off-Topic: SI Units by Anonymous Coward · · Score: 0

      Advertising hard drive sizes in GB instead of GiB might be inconvenient to you but it is not deceptive, just like selling wood in metric units might be inconvenient in the U.S. but it is not deceptive. Selling 1-cm-thick wood and *calling* it 1-inch-thick would be deceptive, but that's the OS vendors, not the disk vendors.

      Considering that a 2x4 is actually 1.5x3.5 inches in the US, I'm not sure that's a comparison you want to make. :)

      Besides, no self respecting man would toss around words like "mebibyte" and "gibibyte."
    15. Re:Off-Topic: SI Units by fnord_uk · · Score: 1

      NIST? Those guys? I found a 16dB discrepancy in their RF link budget calculator a few years ago. I often wonder how many over-specified systems that caused to be deployed. Still, you can't grumble about a better link margin if you can afford the costs of the design.

      fnord

      --
      In theory, theory and practice are the same. In practice, they're not.
    16. Re:Off-Topic: SI Units by Obfiscator · · Score: 1

      No one asked anyone to redefine the old SI prefixes, either. I don't understand why people decided to take a standard (the prefix "kilo") and change it to something that was almost, but not quite, what it originally meant. It seems like a new prefix would have been a better choice, and if that's the case why persist with the wrong choice?

      --
      "Nothing shocks me. I'm a scientist." -Indiana Jones
    17. Re:Off-Topic: SI Units by Fulcrum+of+Evil · · Score: 1

      Because it's now standard. The reason for measuring information in powers of two (and it really is different from physical quantities) is tied to the fundamental nature of computers. It's jargon, so it's up to the CS people to define its meaning and they have chosen what you see. It isn't really confusing to people who use it every day.

      --
      "We returned the General to El Salvador, or maybe Guatemala, it's difficult to tell from 10,000 feet"
  40. Re:Not So Fuzzy math by Annoying · · Score: 4, Informative

    0.88% != 0.88
    0.0088 * 15 = 0.132 (13%)
    13% you say? The excerpt says 2%-4%. RTA and you'll see though they report up to 13% on some systems.
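    The arithmetic above can be checked with a short Python sketch. It assumes a constant failure rate, so that the annual failure rate is simply hours per year divided by MTTF; the function name is made up for this example, and the 13% figure is the one quoted from the article.

```python
HOURS_PER_YEAR = 24 * 365  # 8760


def annual_failure_rate(mttf_hours: float) -> float:
    """Approximate annual failure rate implied by a datasheet MTTF.

    Assumes a constant failure rate, which is a simplification.
    """
    return HOURS_PER_YEAR / mttf_hours


if __name__ == "__main__":
    for mttf in (1_000_000, 1_500_000):
        print(f"MTTF {mttf:>9,} h -> AFR {annual_failure_rate(mttf):.2%}")

    # The article's worst case (about 13%) against the 1,000,000 h figure.
    observed = 0.13
    ratio = observed / annual_failure_rate(1_000_000)
    print(f"observed / datasheet = {ratio:.1f}x")
```

    Under that assumption the datasheet range works out to roughly 0.58% to 0.88% per year, and only the worst observed systems reach the 15x of the headline.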

  41. or BEFORE... by toby · · Score: 1

    Sigh.

    As Schwartz put it recently, there are two kinds of disk: Those that have failed, and those that are going to.

    --
    you had me at #!
  42. Before that by phorm · · Score: 1

    Hell, nowadays I wouldn't rely on one single drive before it reaches warranty. Usually by the time of the smaller warranties (1yr) you've accumulated enough important stuff to make the data-loss much more painful than the cost of the replacement drive.

    Now in some cases manufacturers with longer warranties are stating that they have more faith in their product, and certainly the sudden drop in warranty length (from 2-3 years down to one for many) indicates a lack of faith in their products.

    Basically, a warranty isn't so much your guarantee on a product so much it says:
    This warranty length gives us the maximal profit on drive sales vs returns. In other words, any longer than that and the returns are going to eat into the company's profits, but there will be drive deaths both before and after that term. Nowadays a three year warranty isn't any sort of guarantee of such longevity, but rather the point at which the manufacturer is no longer willing to eat the cost of returns.

  43. Mod parent up! by Jaqenn · · Score: 1

    I burned all my mod points this morning, and this one definitely deserves +X informative.

    --
    You are awash in a sea of fiercely stated opinions. Obvious exits are: 'File->Quit', 'Reply', and 'Page Down'.
  44. Get your language right by billcopc · · Score: 1

    There's a big difference between a drive failure and a drive replacement.

    Just because Seagate/Western replace a drive doesn't mean that drive is toast. It means someone has a problem with it. Sometimes the problem is bad cabling. Sometimes the problem is bad cooling. Sometimes the problem is outside the box, sitting at a keyboard.

    Heck some people will get a new hard drive because they don't know how to reload the OS... very very often! Let's say someone unenlightened gets a boot-time error message, so they call tech support and the techie has them run a diagnostic tool... but they can't get it to run because the thing won't boot. The kid sets up an RMA, the customer gets a new blank drive, pops it in, and since there's nothing on the drive, it tries to boot off the CD-Rom. Windows Setup loads and the machine is magically "fixed". It would have been better fixed by changing the boot order in the BIOS, and doing a repair install, but the average user doesn't know all that "nerdy hacker stuff", and the average tech support drone is quite happy with the bad, easy solution. After all, India doesn't pay for the hard drive, Dell/HP/Toshiba do. Another problem is that stupid users think they're smarter than the tech they called (they're often right, but let's not go there). If they're staring at a blue screen, and you give them a 5 minute fix that brings it all back, a lot of idiots will say "You didn't fix it. It's gonna break again. I still want a new hard drive!". Personally I'd ship them a box of TNT but that usually doesn't show up as an option in the RMA part list. So you send the idiot a new hard drive, even when you know it's perfectly fine. Worst case, the guys in receiving will test the returned drive and put it back on the shelf.

    Now hard drives do actually fail from time to time, but not nearly as often as people seem to think. I learned the hard way about hard drive reliability. I used to be the alpha geek teenager who crammed a half dozen hard drives with handmade rounded ATA cables and a sparkomatic power supply, the one that comes with the $20 cheapo chassis at your local asian importer. Oh yeah, the CPU was an overclocked Athlon T-Bird, often mistaken for an industrial heating unit. Fastest ghetto RAID array in town only I had dead drives every six months.

    Then one day I started putting those same drives in a well ventilated chassis with top-quality cables, power supplies and lots of big efficient fans right up against the drive rails. I crack the case open every now and then to clean out any dust buildup, perhaps every 6 to 8 weeks or so.. doesn't even require a shutdown. I haven't had a drive fail in four years, seriously! And I'm talking about 20 drives here across my 3 main rigs, they get the living tar beat out of them on a daily basis. Random luck certainly has a play in all of this, but my point is a lot of failures can be prevented. I'd like to think a lot of physical failures could also be avoided if the damned manufacturers would spend a little more time and money on reliability. Sell me a drive that costs up to 30% more, but has subtle improvements that lead to a noticeably longer lifetime. Most people won't get it, they'd rather get a drive that dies twice as often but costs 30% LESS... just look at all the Nova DVD players Wal-Mart sold over the holidays... humans are cheap ignorant reptiles, that's just nature.

    For the other 10% whose time and data are actually worth something, there is a market for disaster-proof drives. Heck, just sandwich two disks with mirroring and have it tell me when one of them's on the fritz. That's what I end up doing anyway, only my current method involves buying an overpriced RAID controller. Well if I considered my data so important that I chose to spend $400 on a controller to RAID up a pair of $75 hard drives, I don't think I'd have a problem spending even $200 on a single unit that does it all in one neat package, I'd still be about 300 bucks ahead, and instead of me giving all that cash to Adaptec/

    --
    -Billco, Fnarg.com
  45. Actually, one useful feature of Vista... by Tim+Browse · · Score: 4, Interesting

    ...is that it detects SMART disk errors in normal use (i.e. you don't have to be watching the BIOS screens when your PC boots).

    When I was trying the Vista RC, it told me that my drive was close to failing. I, of course, didn't believe it at first, but I ran the Seagate test floppy and it agreed. So I sent it back to Seagate for a free replacement.

    About the only feature that impressed me in Vista, sadly. (And I'm not sure it should have impressed me, tbh. I'm assuming XP never did this as I've never seen/heard of such a feature.)

    1. Re:Actually, one useful feature of Vista... by Matt+Perry · · Score: 3, Informative

      When I was trying the Vista RC, it told me that my drive was close to failing. ... About the only feature that impressed me in Vista, sadly.
      Be sad no more. SmartMonTools will run in UNIX or Windows and notify you if it detects SMART errors. For the Windows installer look for the phrase "Install the Windows package" on the smartmontools home page.
      --
      Slashdot: Failed Car Analogies. Amateur Lawyering. Anecdote Battles.
    2. Re:Actually, one useful feature of Vista... by Jeff+DeMaagd · · Score: 1

      I'm not sure it needs to be built into the OS. I know there are third party tools for Linux and OS X that check the SMART status, I would be surprised if XP didn't have something similar.

      I'm not totally convinced that it really does anything good, and there's some indication that my gut impression is correct. The Google drive paper said that SMART data is good for statistical tracking of a population of like drives, but a SMART error on a particular drive isn't an indicator that the given drive will actually die.

  46. Oddly... by Anonymous Coward · · Score: 0

    I have about 8 Maxtor drives that I've accumulated over the years. I've only had one ever die, and it was over 10 years old.

    I've had 3 Samsung drives, and they've all died after a few years.

    Just for reference, personal anecdotes mean nothing -- statistically, all of the major manufacturers have roughly identical failure rates.

    1. Re:Oddly... by mollymoo · · Score: 1

      Just for reference, personal anecdotes mean nothing -- statistically, all of the major manufacturers have roughly identical failure rates.

      Source?

      --
      Chernobyl 'not a wildlife haven' - BBC News
    2. Re:Oddly... by Gnight · · Score: 1
      Google's study actually contradicts this.

      Failure rates are known to be highly correlated with drive models, manufacturers and vintages....For example, [the chart showing failure rate] changes significantly when we normalize failure rates per each drive model.

      Source: Google, (2007). Failure Trends in a Large Disk Drive Population. Section 3.2, paragraph 1. http://labs.google.com/papers/disk_failures.pdf
  47. Firehose by pavon · · Score: 1

    Slashdot does have story moderation system now. It is called firehose - you can find a link in the menu at the top of the screen. It allows you to give thumbs up or thumbs down to a story as well as marking a story with feedback such as dupe or typo, in addition to the normal tagging system.

    I both gave this story a thumbs down and dupe feedback, however, so many other people moderated the story up that it was at the highest (visible) ranking by the time it got posted. Apparently a bunch of people missed the story the first time around, or didn't realize this was the same study or something. I guess I can't really blame the editors for giving users what they want.

  48. Minor nitpick by Anonymous Coward · · Score: 0

    While their symbols are often uppercase, the prefixes themselves are all-lowercase, e.g. kilo, mega, giga, not Kilo, Mega, Giga. If you meant to simply have them stand out in your text, try using italics.

  49. This is only news... by rickb928 · · Score: 2, Informative

    ...to those of you who haven't managed 24x7x365 servers very much. And little news to those of you who have a computer at all.

    I expect most desktop drives to last 5 years max. MAX. No manufacturer has an edge. It's just the way it is. MTBF is fiction.

    For an always-on server, I expect failures about every 3-4 years. For my clients who cared enough to pay for the very best, I replaced the drives in the 3rd year without waiting. No failures costs a bit more.

    My experience is that Seagate and Fujitsu are my best server drives. IBM was also on the list, but I'm watching Hitachi. No decision.

    The losers: Quantum (thankfully gone), Samsung (until recently), Maxtor. Not my opinion, my experience.

    Now, in fairness, these are some of my historical losers:

    Seagate: Early IDE drives and the 'stiction' problem. Remember banging drives to get them started?

    Quantum 'Bigfoot' drives: popular in Compaq machines, the 5.25" .7" thin piece of junk. died often. Even Compaq admitted these were bad.

    Seagate SCSI drives: Many different types had a bad habit of going off-line for no apparent reason. Your Novell server would log the 'device deactivated to a non-media defect' error. Just restarting the bus controller would sometimes wake them up. Sometimes repowering the drives. Would happen every few months. Usually when I was elsewhere...

    And then there was Miniscribe.

    But MTBF numbers are universally fiction. Imagine trying to sell the idea of a wave bearing lasting 16 years to an engineer with real-world experience. I figure MTBF numbers come out of the marketing department.

    -rick

    --
    deleting the extra space after periods so i can stay relevant, yeah.
  50. Calculated MTBF != Reality by flyingfsck · · Score: 1

    The whole calculated MTBF thing is a sham. It is based on figures compiled by large telcos 20 or 30 years ago. The result of a calculated MTBF bears zero relationship to reality and the university is calling the bluff of the manufacturers.

    The only use of a calculated MTBF is to call attention to potentially stressed components during the design cycle, but even that is dubious. The actual figures are meaningless; each is really just a number, where greater usually means better, but not necessarily.

    --
    Excuse me, but please get off my Pennisetum Clandestinum, eh!
  51. 1024m in a km by Tumbleweed · · Score: 1

    And those lying road signs, too. Everyone knows there should be 1024 meters in a kilometer!

    Wouldn't that be called a 'kebimeter'?

  52. Wow... by ThatsNotPudding · · Score: 0, Offtopic

    porn is a harsh mistress.

  53. How much RAM needed for a diskless system? by Fastball · · Score: 1

    I wonder if anyone has tried to develop a system that limits hard disk activity to boot up and shutdown. Like hibernation in a way. Then everything else takes place in memory, lots of memory. Yes, you're subject to losing data in the event of a power failure or application crash. However, with a decent UPS, there's time to retreat that data to disk before powering off. And some applications are mature enough that they don't crash sporadically if much at all.

    This is all very speculative, I know, but this is Slashdot. I believe I read something not long ago about DBMS that ran solely in memory and realized very nice performance.

    1. Re:How much RAM needed for a diskless system? by Anonymous Coward · · Score: 0

      > I wonder if anyone has tried to develop a system that limits hard disk activity to boot up and shutdown. Like hibernation in a way. Then everything else takes place in memory, lots of memory. Yes, you're subject to losing data in the event of a power failure or application crash. However, with a decent UPS, there's time to retreat that data to disk before powering off. And some applications are mature enough that they don't crash sporadically if much at all.

      > This is all very speculative, I know, but this is Slashdot. I believe I read something not long ago about DBMS that ran solely in memory and realized very nice performance.

      I've thought about this a little bit. You'll need as much RAM as you think your software will ever use at its highest load, plus the entirety of your dataset, plus generous room for growth/temp space that might otherwise be on disk. With gobs of RAM, you can rely on the OS' disk cache and/or the DB software's cache to almost never have to read from the disk, reducing some wear. There'd still be disk writes, but reads of in-cache sectors shouldn't be blocked by the OS writing to disk, so you win in performance.

      To go further, and eliminate disk reads and writes, you could mount a big ramdrive on bootup and copy your working directory tree over. On shutdown, resynchronize it to disk. In this situation all your harddrives could die simultaneously and you'd still be up and running.

      To improve speed further, you can get database software that is designed from the ground-up to reside entirely in memory, and it should be faster, because it skips all the usual file IO calls/filesystem abstraction/kernel disk cache/DB caching. Google for in-memory databases.

      You can also get harddrive-backed RAM drives with standard hard drive interfaces. Google solid-state drives.

  54. Ideal conditions vs. Real world by CorporalKlinger · · Score: 2, Informative

    I think one of the key problems here isn't necessarily the statistical methods used, it is that the CMU team was comparing real-life drive performance to the "ideal" performance levels predicted by the drive manufacturers. Allow me to provide two examples of this "apples to oranges" comparison problem.

    I have had two computers with power supply units that were "acting up." They ended up killing my hard drives on multiple occasions - Seagates, WD's, Maxtors, etc. It didn't matter what type of drive you put in these systems, the drive would die after anywhere from a week to two years. I later discovered that the power supplies were the problems, replaced them with brand new ones, and replaced the drives one last time. That was quite some time ago (years), and those drives, although small, still work, and have been transferred into newer computer systems since that time. The PSU was killing the drives; they weren't inherently bad or had a manufacturing defect. A friend of mine who lives in an apartment building constructed circa 1930 experienced similar problems with his drives. After just a few months, it seemed like his drives would spontaneously fail. When I tested his grounding plug, I found that it was carrying a voltage of about 30V (a hot ground - how wonderful). Since he moved out of that building and replaced his computer's PSU, no drive failures.

    The same type of thing is true in automobile mileage testing. Car manufacturers must subject their cars to tests based on rules and procedures dictated by state and federal government agencies. These tests are almost never real world - driving on hilly terrain, through winds, with the headlights and window wipers on, plus the AC for defrost. They're based on a certain protocol developed in a laboratory to level the playing field and ensure that the ratings, for the most part, are similar. It simply means when you buy a new car, you can expect that under ideal conditions and at the beginning of the vehicle's life, it should BE ABLE to get the gas mileage listed on the window (based on an average sampling of the performance of many vehicles).

    My point is that there really isn't a decent way to go about ensuring that an estimated statistic is valid for individual situations. By modifying the environmental conditions, the "rules of the game" change. A data-center with exceptional environmental control and voltage regulation systems, and top-quality server components (PSU's, voltage regulators, etc.) should expect to experience fewer drive failures per year than the drives found in an old chicken-shack data center set up in some hillbilly's back yard out in the middle of nowhere where quality is the last thing on the IT team's mind. It's impractical to expect that EVERY data center will be ideal - and since it's very very difficult to have better than the "ideal" testing conditions used in the MTTF tests - the real-life performance can only move towards more frequent and early failures. Using the car example above, since almost nobody is going to be using their vehicle in conditions BETTER than the ideal dictated by the protocols set forth by the government, and almost EVERYONE will be using their vehicles under worse conditions, the population average and median have nowhere to go but down. That doesn't mean the number is wrong, it just means that it's what the vehicle is capable of - but almost never demonstrates in terms of its performance - since ideal conditions in the real world are SO rare.

  55. Change the specs by DigitAl56K · · Score: 1

    Manufacturers should be compelled to update published MTBF specifications (and similar metrics) over time based on actual data, e.g. how many units have been sold (not just shipped), how many have been returned or reported dead, and how long the diagnostic data on the drive reports it was actually working. DOA drives could be excluded.

  56. EMC automatic replacement - anecdotal by zerofoo · · Score: 1

    Two jobs ago I was a sysadmin at a place that had EMC Clariion and Symmetrix SANs. Both SANs had the ability to call home to EMC when they detected a drive failure and EMC would send out a replacement drive automatically.

    We saw FedEX overnight boxes sitting on our doorstep in the morning with disturbing regularity. The "quality" of the systems did not seem to matter. A $30,000 SAN using SATA drives or a $500,000 SAN using FC drives...both had almost equal failure rates.

    The FC SAN had WAY better performance....probably due to the 32 GB of system cache.

    -ted

  57. Anecdotally by queenb**ch · · Score: 1

    We have several large computing labs in our building. We run mostly IDE or SATA drives, depending on the age of the hardware in them. Now, in their defense, we've been undergoing constant construction, which means huge power fluctuations all day long and a big surge at night when all the construction equipment is shut off. The labs are not on UPSes because it just isn't feasible. In some of the labs during the past year, we've seen hard drive failure rates as high as 25%. Brand doesn't seem to matter, neither does size, RPMs, etc. Fortunately, these are lab machines and it's pretty easy to bring one back up. Take the spare hard drive, apply ghost image, install. Send dead hard drive back for warranty. Put new drive on shelf as spare. Rinse and repeat. Still, it's a lot of time to replace all those.

    2 cents.

    QueenB.

    --
    HDGary secures my bank :/
    1. Re:Anecdotally by BrokenHalo · · Score: 1

      How do they work out the Mean Time to Failure anyway? 1.5 million hours in laymanspeak is 171.23 years. I don't know anyone who has had a disk drive that long. ;-)

    2. Re:Anecdotally by queenb**ch · · Score: 1

      That I don't know. I know how we work ours out. We keep a running annual (March 2, 2006 to March 2, 2007) count of drives that have failed. On any given day we take the number of failed drives divided by the total number of drives deployed. We haven't been tracking the usage hours on them, but I know for a fact that there is not one single machine in the labs that's older than May of 2004. We've seen it as high as one in four of the drives deployed throughout the building failing.
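      A running annual count like the one described could be sketched in Python roughly as follows. The function name, the failure dates and the fleet size are all invented for illustration; only the window (March 2, 2006 to March 2, 2007) comes from the description.

```python
from datetime import date, timedelta


def trailing_annual_failure_rate(failure_dates, drives_deployed, today):
    """Failures in the 365 days ending at `today`, divided by drives deployed."""
    window_start = today - timedelta(days=365)
    failed = sum(1 for d in failure_dates if window_start < d <= today)
    return failed / drives_deployed


if __name__ == "__main__":
    # Hypothetical failure log for a small lab of 8 drives.
    failures = [date(2006, 1, 15), date(2006, 6, 1), date(2007, 1, 10)]
    rate = trailing_annual_failure_rate(failures, 8, date(2007, 3, 2))
    print(f"trailing annual failure rate: {rate:.0%}")
```

      This ignores how long each drive has been in service, so it says nothing about usage hours, which matches the caveat in the description.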

      2 more cents,

      QueenB.

      --
      HDGary secures my bank :/
    3. Re:Anecdotally by cg0def · · Score: 1

      There is a statistical formula that they use. What the Carnegie Mellon study is trying to prove is that the data being plugged into the formula is wrong, or that the formula needs to be adjusted so that it takes other parameters into account as well.

    4. Re:Anecdotally by Anonymous Coward · · Score: 0

      The MTTF does not define lifetime. MTTF of 171 years just means that if you put 1710 drives in service for a year, about 10 of them will fail within that time.
      It does not in any way tell you how long it will take for a single drive to fail, or for all 1710 drives to fail. That may happen after 3 or 5 years.
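      The fleet reading of MTTF can be sketched in a few lines of Python. This assumes a constant failure rate over the period; the function name is hypothetical, and the 1710-drive fleet is the one from the example above.

```python
HOURS_PER_YEAR = 24 * 365  # 8760


def expected_failures(drives: int, mttf_hours: float,
                      hours_in_service: float = HOURS_PER_YEAR) -> float:
    """Expected number of failures in a fleet over a period.

    Assumes a constant failure rate, i.e. total drive-hours / MTTF.
    """
    return drives * hours_in_service / mttf_hours


if __name__ == "__main__":
    mttf = 1_500_000  # hours, from the datasheet range in the summary
    print(f"MTTF in years: {mttf / HOURS_PER_YEAR:.2f}")
    print(f"Expected failures, 1710 drives, 1 year: "
          f"{expected_failures(1710, mttf):.2f}")
```

      The result is a statement about drive-hours across a population, not a prediction of when any single drive will die.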

  58. "Enterprise" drives have different firmware by Sits · · Score: 1

    According to this NetApp reply to an open letter on StorageMojo, while the electronics of the drive beyond the interface may be the same on consumer and enterprise drives, the way the firmware behaves is not. Consumer drive firmware apparently does all it can to try to read data back, even if that makes the drive temporarily unavailable, and trusts additional information less than enterprise firmware.

  59. SMART does indicate failure... by Sits · · Score: 1
    The Google paper said that there were SMART parameters that did indicate failure (but only a few parameters have a strong correlation with failure). The problem is that those parameters do not change in MOST failure cases - i.e. your disk can die without any warnings from SMART (StorageMojo summarises the Google paper and here's the original Google Failure Trends in Disks PDF).

    If (for example) the reallocated sector count is high I don't think it's a matter of if but when your drive will fail. A count of 1 doesn't guarantee failure but indicates a higher probability than usual of imminent failure. From page 7 of the PDF:

    After their first reallocation, drives are over 14 times more likely to fail within 60 days than drives without reallocation counts, making the critical threshold for this parameter also one.
  60. Re:Not So Fuzzy math by Anonymous Coward · · Score: 0

    So the headline should say " Disk Drive Failures Up To 15 Times What Vendors Say".

  61. Apparently, everyone ignores the word "mean" by kmweber · · Score: 0

    All it takes is a few drives whose reliability is sky-high to compensate for the many clustered around the bottom of the barrel. There's nothing fraudulent or corrupt about this. You can, certainly, question whether MTBF is a useful metric for measuring reliability, but it takes someone ignorant of high-school statistics to claim that the numbers released are dishonest or fraudulent somehow just because the vast majority of drives fail BEFORE the "mean time to failure". Geez...and y'all wonder how the American public gets whipped into such a frenzy about "terrorism" and whatnot. Y'all are the same way, just for different topics.
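    One way to see how most units can fail before the mean is a skewed lifetime model. The Python sketch below assumes exponentially distributed lifetimes purely for illustration; nothing in the thread says drives actually follow that distribution, and the function names are made up.

```python
import math


def fraction_failed_by(t: float, mean_life: float) -> float:
    """Fraction of units failed by time t, for exponential lifetimes."""
    return 1.0 - math.exp(-t / mean_life)


def median_life(mean_life: float) -> float:
    """Median lifetime of an exponential distribution with the given mean."""
    return mean_life * math.log(2)


if __name__ == "__main__":
    mean = 1_500_000  # hours, the upper datasheet MTTF from the summary
    print(f"failed before the mean: {fraction_failed_by(mean, mean):.1%}")
    print(f"median lifetime: {median_life(mean):,.0f} h")
```

    In this model a majority of units fail before the mean, because a long tail of survivors pulls the average up.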

    --
    "Other than that, Mrs. Lincoln, how was the play?"
  62. Re:RAID = Redundant Articles of Identical Discours by Dirtside · · Score: 1

    I like dupes on Slashdot -- I sometimes go entire days without checking the site and it's nice to get another chance to participate in a discussion I may have missed out on. And when I do see the same story twice, oh well. Sometimes I ignore the second one, sometimes I read the discussion to see how it might have evolved from the first one.

    --
    "Destroy science and religion. Science would re-emerge exactly the same; but not religion." - Penn Jillette, paraphrased
  63. Urine Containment Device by Impy+the+Impiuos+Imp · · Score: 1

    > The study also shows no evidence that Fibre Channel drives are any more reliable than SATA drives

    Why would one think this to begin with? The core mechanisms are probably the same thing, just wrapped with a new I/O mechanism.

    --
    (-1: Post disagrees with my already-settled worldview) is not a valid mod option.