Slashdot Mirror


Reviews of Hard Drive Reliability?

ewhac asks: "After having three 18G drives go toes-up on me in the last two months, all of them done so after about 40 days of use, I want the replacement drives to be rock-solid. While Tom's Hardware and AnandTech review individual drives and their performance, I haven't yet been able to locate any comprehensive or cohesive review of drive reliability and longevity. Does such a resource exist?"

19 of 44 comments

  1. Never buy IBM Drives by duffbeer703 · · Score: 3, Interesting

    Just had two 18GB IBM SCSI (LZX) drives die after less than a year. Also had 6 bad disks in 5 months on a Shark (IBM Enterprise Storage Server) at work.

    Never, ever, ever, ever buy IBM storage.

    --
    Conformity is the jailer of freedom and enemy of growth. -JFK
    1. Re:Never buy IBM Drives by IronChef · · Score: 2

      Were they of the 75 GXP series? Those are (now) known to be turkeys.

    2. Re:Never buy IBM Drives by gagravarr · · Score: 2

      I've got an IBM 40GB disk (it calls itself an IBM-DTLA-305040), and I've had no issues with it at all for about a year. Several people I know have also had no issues with their IBM disks.

      My advice with any new disk is to put it into a non-critical box, then thrash it like mad for about a week solid: lots of disk I/O, keep the head moving a fair bit, read and write data, etc. If it survives that, you shouldn't have any problems with it for the next two years (based on the normal failure-versus-usage distribution curves). If it does fail, you haven't lost any data when you send for a new one.
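      The week of thrashing can be partly scripted. What follows is a minimal Python sketch, not a substitute for the manufacturer's diagnostics: it assumes the new drive is mounted somewhere you can create a scratch file, and the function `burn_in_pass` and its parameters are hypothetical. It writes pseudo-random chunks, forces them to disk, reads them back in shuffled order to keep the heads seeking, and counts chunks whose SHA-256 digest no longer matches. Because reads can be served from the OS page cache, treat it as a load generator first and a verifier second.

```python
import hashlib
import os
import random
import tempfile


def burn_in_pass(path: str, n_chunks: int = 64, chunk_size: int = 1 << 20,
                 seed: int = 0) -> int:
    """Write, sync, and re-read a scratch file; return mismatched chunk count.

    Hypothetical helper: `path` should be a scratch file on the drive under
    test. Reads may come from the page cache rather than the platters.
    """
    rng = random.Random(seed)
    digests = []
    with open(path, "wb") as f:
        for _ in range(n_chunks):
            block = rng.randbytes(chunk_size)  # requires Python 3.9+
            digests.append(hashlib.sha256(block).digest())
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # push the data out to the device

    # Read the chunks back in shuffled order so the heads seek around.
    order = list(range(n_chunks))
    rng.shuffle(order)
    mismatches = 0
    with open(path, "rb") as f:
        for i in order:
            f.seek(i * chunk_size)
            if hashlib.sha256(f.read(chunk_size)).digest() != digests[i]:
                mismatches += 1
    return mismatches


if __name__ == "__main__":
    # Demo in a throwaway directory; point `target` at the new drive instead.
    with tempfile.TemporaryDirectory() as scratch:
        target = os.path.join(scratch, "burnin.bin")
        print("mismatched chunks:", burn_in_pass(target, n_chunks=16))
```

      Run it in a loop with different seeds for the week; any nonzero count, or I/O errors in the kernel log, is a reason to send the drive back before it holds real data.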

      Part of the problem seems to be that most disk manufacturers don't like to advertise exactly how reliable (or not) their disks really are. The best way to tell how reliable they think they are is to look at their returns process. If it is really, really easy and straightforward, they can't be expecting many returns (or else it wouldn't be economical).

      --
      This post will enter the public domain 70 years after my death, unless Disney buys another extension.
    3. Re:Never buy IBM Drives by duffbeer703 · · Score: 2

      These drives are supposed to have 750,000 hours MTBF. They are enterprise-class 10k RPM SCSI drives that cost about $850 each when they were new.

      The fact that so many have broken to the point that there is a 60-day wait to get them replaced under warranty (IBM is out of spares at the moment) is an absolute outrage.

      --
      Conformity is the jailer of freedom and enemy of growth. -JFK
  2. resist anecdotal evidence by Lepruhkawn · · Score: 5, Insightful

    I commend the request for asking for real data.

    Anecdotal evidence from people who have had drives of a certain brand fail on them and then say "never use this drive" is basically worthless. Even if you hear 5 or 10 people say that, ignore them.

    What you need to know is if there are enough anecdotes to show that the mfgr's MTBF rate is inaccurate and the real rate is a lot lower than what they report (or a lot lower than other mfgr's). Or maybe if there is a certain batch of drives that are anomalous.

    The question is: is the mfg's MTBF rate good enough for you and is it accurate?
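    One way to make "enough anecdotes" concrete is to compare observed failures with what the rated MTBF predicts. The sketch below assumes a constant failure rate (exponential lifetimes), under which the number of failures among n drives each run for t hours is roughly Poisson with mean n*t/MTBF. The 750,000-hour figure is the one quoted elsewhere in this thread for IBM's SCSI drives; the drive count and runtime are illustrative, loosely based on the original question, not measured data.

```python
import math


def expected_failures(n_drives: int, hours_each: float, mtbf_hours: float) -> float:
    """Mean failure count if every drive fails at a constant rate 1/MTBF."""
    return n_drives * hours_each / mtbf_hours


def poisson_tail(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    cdf = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))
    return 1.0 - cdf


# Illustrative numbers: 3 drives, ~40 days (960 h) each, rated 750,000 h MTBF.
lam = expected_failures(3, 960, 750_000)
print(f"expected failures: {lam:.5f}")
print(f"P(3 or more failures): {poisson_tail(3, lam):.2e}")
```

    A tiny tail probability says the rated MTBF does not describe those particular drives in that particular box; it cannot say whether the cause is the drives, a bad batch, or the environment (power, heat).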

    www.storagereview.com has started a reliability database, but I don't know if their data is statistically valuable yet.

    --
    Jesus saves....And takes 1/2 damage.
    1. Re:resist anecdotal evidence by polymath69 · · Score: 2
      Every one of them recommended Seagate.

      That's interesting. Seagate drives used to be famous for having a sudden death problem called "stiction", where the heads would fuse to the platters and the drives would become good only for so much landfill.

      Perhaps they solved that difficulty some time ago, but I'm only guessing, because Googling for "stiction" doesn't turn up Seagate anywhere in the top 10. But "seagate stiction" at least shows me that some people out there remember this. Some pages call it "infamous." When was this solved, if it was?

      --
      I don't want to rule the world... I just want to be in charge of mayonnaise.
  3. Point of failure by ArcticChicken · · Score: 4, Informative

    If you've had 3 hard disks die on you in 2 months, the problem may not be with the disks themselves. The first thing to check is whether the area where the hard disks sit is getting adequate ventilation. You might also want to test the voltage your power supply is putting out.
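    When you do measure the rails, it helps to have a pass/fail threshold. The sketch below assumes the commonly cited ATX tolerance of +/-5% on the +3.3 V, +5 V and +12 V rails (drives draw mainly from +5 V and +12 V); the readings are hypothetical multimeter values, ideally taken while the drives are busy rather than at idle.

```python
from typing import Dict, List

# Assumed nominal rail voltages and an ATX-style +/-5% regulation window.
NOMINAL_VOLTS = {"+3.3V": 3.3, "+5V": 5.0, "+12V": 12.0}
TOLERANCE = 0.05


def rails_out_of_spec(readings: Dict[str, float]) -> List[str]:
    """Return the rails whose measured voltage falls outside nominal +/- 5%."""
    bad = []
    for rail, volts in readings.items():
        nominal = NOMINAL_VOLTS[rail]
        if abs(volts - nominal) > TOLERANCE * nominal:
            bad.append(rail)
    return bad


# Hypothetical readings taken while the drives are busy.
print(rails_out_of_spec({"+3.3V": 3.31, "+5V": 4.92, "+12V": 11.21}))  # -> ['+12V']
```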

    Questions like this about hard disks are really better answered here.

    1. Re:Point of failure by martyb · · Score: 4, Informative

      You might also want to test the voltage your power supply is putting out.

      Couldn't agree more; and not only in a static situation, but especially when you are booting the system.

      Here's a strange but true experience. I was working at a small company which was making custom PBXs. We had a few prototypes which were supposed to be identical. Most of them would boot up fine, but one exhibited strange behavior and would fail to boot cleanly. We saw many different modes of failure. We swapped out boards, power supply, etc. between the "good" and the "strange" PBX, but to no avail.

      Finally, I noticed that the power strips for the "good" systems had 16-gauge wire to plug into the wall; the "strange" one had 18-gauge (i.e. a thinner wire, since a higher gauge number means a smaller diameter). Swapped in a new power strip and it worked like a charm.

      The voltage drop over the thinner wire was significant enough at boot time (when there was the greatest demand for power) to cause the system to fail!
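      For a rough sense of scale, the resistance of a cord's conductors can be estimated from the standard AWG diameter formula and the resistivity of copper; the drop is then Ohm's law over the out-and-back path. The cord length and current below are hypothetical, since the story does not give them, so this shows the relative effect of 18- versus 16-gauge rather than reproducing the original situation.

```python
import math

RHO_COPPER = 1.724e-8  # ohm-metres, annealed copper at 20 C


def awg_diameter_m(gauge: int) -> float:
    """AWG definition: d = 0.127 mm * 92 ** ((36 - n) / 39)."""
    return 0.127e-3 * 92 ** ((36 - gauge) / 39)


def ohms_per_metre(gauge: int) -> float:
    """Resistance of one metre of solid copper wire of the given gauge."""
    radius = awg_diameter_m(gauge) / 2
    return RHO_COPPER / (math.pi * radius ** 2)


def cord_drop_volts(gauge: int, cord_length_m: float, amps: float) -> float:
    """Voltage lost in a cord: current goes out on one conductor and back on
    the other, so the resistive path is twice the cord length."""
    return amps * ohms_per_metre(gauge) * 2 * cord_length_m


# Hypothetical 2 m power-strip cord carrying a 10 A surge at power-on.
for awg in (16, 18):
    print(f"{awg} AWG: {cord_drop_volts(awg, 2.0, 10.0):.2f} V drop")
```

      The thinner cord loses about 59% more voltage for the same current (the resistance ratio is 92^(4/39), roughly 1.59); whether that alone is enough to upset a given power supply depends on how close to its limits it was already running.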

    2. Re:Point of failure by Zeio · · Score: 2
      I agree. 90% of all problems, I believe, are related (not necessarily caused by, but related) to unconditioned power. I use the cheap and effective yet lesser-known APC LINE-R line conditioners, up to 1250 VA. They can be had from places like www.pricewatch.com/ and http://www.streetprices.com/ for about $115-$130. Well worth it: they offer no battery backup, but they do offer *superior* line conditioning, like the integrated line conditioners on APC's very high-end UPSes. I'd rather pay for a superior conditioner than pay for some lead-acid batteries, an inverter and a "regular" conditioner. The cheap UPSes use crappy relays and a fast clamp time, thus they are not "real." To me anyway, with exacting standards. Watch the tolerance on "conditioned" output on cheap UPSes.

      BACK to hard drives, I have had great success with both Maxtor and IBM, and reasonably high success with Seagate SCSI, just not Medalist drives or the types with the nasty Medalist fluid-bearing design; some Barracudas (none RECENT) suffer from this. I have seen many IDE drives fail, usually on low-memory systems when lots of thrashing/swapping occurs, and secretaries need to have every "office" application open along with www.revlon.com.

      Outside of that, since I work in IT, I have seen obscene failure rates with Western Digital products. There have been times when ONTRACK got $3000+ because someone's hard drive failed and they needed the "CRITICAL" data, blah blah blah (learn to back up - beeeotch, need to be a BOFH.) Dell was putting these garbage 6GB WDs in the Optiplex systems for a while, and they were really good at saying F**k You when you wanted them to do something extra nice after the broken hard drive cost you money and downtime. Cute, Dell.

      Aside from the nasty 75GXP, particularly the ones made in Hungary, the new IBM drives, and especially the 120GXP drives, are simply superior in performance. I'll get back to you in a few months on MTBF for the 120GXP, but I don't expect any problems; plus, I do in fact check the SMART status with the superior IBM support disks to see if any shit is about to hit the fan. The 60GXP was very reliable, but I never got more than 3-4 months on that one. None of my drives ever spin down or get shut off; I think cycling the power all the time can piss drives off as well (just a superstition). For Win32 victims, there is decent SMART Defender software to give you an early heads-up. I'm sure some *nix variant of SMART polling has appeared or will; I just don't care to monitor *nix operations that carefully, because impending hardware failure seems to be easier to see coming... Just a feeling.
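      On the *nix side, the smartmontools package provides `smartctl`, whose `-H` option asks the drive for its overall SMART health verdict. The sketch below polls it from Python; the result strings it looks for ("PASSED" in ATA output, "SMART Health Status: OK" in SCSI output) are assumptions about smartctl's output format that may vary between versions, and the device path is a placeholder. smartctl usually needs root, and a PASSED verdict is not a guarantee that the drive is healthy.

```python
import subprocess
from typing import Optional


def parse_smart_health(output: str) -> Optional[bool]:
    """Return True/False from `smartctl -H` text, or None if no verdict found.

    Assumed formats: ATA drives print
    "SMART overall-health self-assessment test result: PASSED";
    SCSI drives print "SMART Health Status: OK".
    """
    for line in output.splitlines():
        if "overall-health self-assessment test result:" in line:
            return line.rsplit(":", 1)[1].strip() == "PASSED"
        if line.strip().startswith("SMART Health Status:"):
            return line.split(":", 1)[1].strip() == "OK"
    return None


def check_drive(device: str) -> Optional[bool]:
    """Run smartctl on one device, e.g. a placeholder path like /dev/sda."""
    # smartctl reports problems through its exit status, so don't raise on it.
    result = subprocess.run(["smartctl", "-H", device],
                            capture_output=True, text=True, check=False)
    return parse_smart_health(result.stdout)
```

      Calling `check_drive` from cron and mailing yourself whenever it returns False or None gives roughly the early warning SMART Defender offers on Win32.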

      Touching on power once again, I would also suggest a PC Power and Cooling (overpriced) or an ENERMAX power supply, there are many other decent vendors, but these seem to get the job done, have a medusa pile of wires - more than any case needs, and are relatively quiet and reliable.

      Watch the temp on some of the hard drives as well, keeping the airflow good is essential. I kept an 18GB HDD on for almost 3 years straight until I got my 60GXP (soon to be upgraded to a 120GXP =), and I have had several SCSI drives in other machines as well, and thank goodness knock on wood never had any HDD failure.

      --
      Legalize the constitution. Think for yourself question authority.
    3. Re:Point of failure by kruczkowski · · Score: 2

      I bought my new computer about 6 months ago with a new Lian-Li case that has 2 big blowers cooling the drive. The new IBM 40-gigger died in 4 months; I bought a WD 60-gigger and it died 2 months later.

      I just think that new drives cram so much onto the platters that the heads are more sensitive. At work we had an old 30MB drive and installed Win3.1 on it, removed the HD cover and had it running for a month in the open; later we got bored and started throwing things at it, then freezing it with an air can... still ran. Of course, we abused it so much that eventually it died.

      --
      hmm... for fun I enjoy launching DDoS attacks against 127.87.42.5
  4. storagereview.com by zoombah · · Score: 5, Informative

    The StorageReview reliability index should serve you well. Unfortunately, the site itself may be taken down soon (for financial reasons), so get there quick.

  5. All Drives Suck -- Go Redundant by InitZero · · Score: 5, Informative

    Four of the sixteen 36-gig drives in our array blew out last week. (Probably heat-related: we had some AC problems in the computer room, but the room never exceeded rated temperature.) Two weeks before that, two 18-gig drives in separate machines died for unknown reasons. The 36-gig drives were IBM. The 18-gig drives were Seagate (who, at one time, made the IBM drives). In the last two months, we've also lost a few Maxtor drives.

    Except for the batch of drives in one array, the above is fairly typical. We have thousands of drives from many vendors and I can't swear one is any better or worse than the other. Hard drives all pretty much suck.

    Sure, we all read about MTBF being 500,000 hours for new drives but that's a pipe dream. Drives burn out every single day.

    If you have the money, buy a pair of top quality drives and mirror them. If you can't afford that, buy a couple of cheap drives and mirror them. Don't put important data on a single drive and expect it to be there when you get back from lunch.
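    To put rough numbers on this advice, here is a minimal sketch that assumes every drive fails independently at a constant rate of 1/MTBF. A mirror only loses data if the surviving drive also dies before the failed one is replaced and resynced. The 500,000-hour MTBF is the figure quoted above; the 48-hour repair window is a hypothetical value. The independence assumption is the weak spot: Nonesuch argues later in the thread that drives from one batch tend to die together.

```python
import math

HOURS_PER_YEAR = 8760


def p_fail(hours: float, mtbf_hours: float) -> float:
    """Probability one drive fails within `hours` (exponential lifetime)."""
    return 1.0 - math.exp(-hours / mtbf_hours)


def single_drive_loss(mtbf_hours: float, years: float = 1.0) -> float:
    """Chance a lone drive fails, taking its data with it, within `years`."""
    return p_fail(years * HOURS_PER_YEAR, mtbf_hours)


def mirror_loss(mtbf_hours: float, repair_hours: float, years: float = 1.0) -> float:
    """Approximate chance a two-drive mirror loses data within `years`.

    Expected number of first failures times the chance the survivor dies
    inside the repair window; only valid while both numbers are small.
    """
    first_failures = 2 * years * HOURS_PER_YEAR / mtbf_hours
    return first_failures * p_fail(repair_hours, mtbf_hours)


print(f"single drive: {single_drive_loss(500_000):.2%} per year")
print(f"mirrored pair: {mirror_loss(500_000, 48):.6%} per year")
```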

    InitZero

    1. Re:All Drives Suck -- Go Redundant by tunah · · Score: 2
      Sure, we all read about MTBF being 500,000 hours for new drives but that's a pipe dream. Drives burn out every single day.

      Can't quite work out this sentence. Drives burn out every day? Stars burn out every single day (maybe they don't, but you get the idea) but that doesn't mean stars don't have long lifetimes (hint: they do)
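      Daily failures and a 500,000-hour MTBF can both be true at once. Under a constant-failure-rate assumption, MTBF describes a population: a fleet of n drives running around the clock should expect about n * 24 / MTBF failures per day. A quick sketch, where the fleet sizes are hypothetical stand-ins for "thousands of drives":

```python
def failures_per_day(n_drives: int, mtbf_hours: float) -> float:
    """Expected failures per day for drives running 24 h/day at rate 1/MTBF."""
    return n_drives * 24 / mtbf_hours


# Hypothetical fleet sizes, all at the quoted 500,000 h MTBF.
for fleet in (1_000, 5_000, 20_000):
    rate = failures_per_day(fleet, 500_000)
    print(f"{fleet:>6} drives: {rate:.2f} failures/day, "
          f"about one every {1 / rate:.1f} days")
```

      At 20,000 drives that is close to one dead drive a day, which fits "drives burn out every single day" without contradicting the rated MTBF; it says nothing, though, about whether the rated figure is honest.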

      --
      Free Java games for your phone: Tontie, Sokoban
  6. What's the real problem? by uslinux.net · · Score: 4, Interesting
    I have several 9 and 18 GB drives in a mid size desktop, and they've been running for ages. I've used IBM, Maxtor, Seagate, Quantum, Western Digital, etc, and what I generally find is drives last about 3 years, which is really their useful life anyway. Some go longer, but in general, you should be able to count on 3 years.

    So, if you're finding your drives die in 30-60 days, there's likely another problem you're missing. If you're using SCSI, I'd guess they're probably 7200 or 10k RPM drives, which means LOTS of heat, especially if you have several. So, first of all, go buy a few 60 or 80mm fans, and stick them in front of the drives, if you can. Get some air flow across them (remember, air pushed across the drives does much more than air pulled/sucked across them). Heat will quickly kill a drive.

    Barring that, you haven't said how the drives have died (won't spin up, unusual read errors, etc), but a poor power supply, especially one running at capacity could burn out a drive. Finally, any sort of shock (case constantly being moved, bounced around, kicked, etc) could do a drive in, though that is probably less likely.

    As with anything else, it's all IMO, YMMV, etc.

    1. Re:What's the real problem? by ewhac · · Score: 2

      So, if you're finding your drives die in 30-60 days, there's likely another problem you're missing.

      If there is, I'd sorely like to know what it is.

      Barring that, you haven't said how the drives have died [ ... ]

      Two drives died by developing an unrecovered read error on exactly two consecutive sectors. The latest one was right in the middle of the directory structure for C:\WINDOWS\SYSTEM. Fortunately, the Linux and BeOS partitions remain bootable. The third drive hasn't malfunctioned yet, but is making a very worrying "squeak" noise regularly every 60 seconds, so I'm unwilling to commit data to it.

      The system is all SCSI, all the time. The internal chain is all Wide SCSI (no 50-pin adapters), with a twisted-pair cable and a separate terminator pack. The controller is a Mylex (nee BusLogic) BT-958 single-ended controller.

      The internal SCSI chain appears as follows (nice /proc/scsi/scsi formatting ruined to get past lameness filter):

      Attached devices:
      Host: scsi0 Channel: 00 Id: 00 Lun: 00
        Vendor: IBM      Model: DDYS-T18350N     Rev: S96H
        Type:   Direct-Access                    ANSI SCSI revision: 03
      Host: scsi0 Channel: 00 Id: 01 Lun: 00
        Vendor: IBM      Model: DDYS-T18350N     Rev: S9YB
        Type:   Direct-Access                    ANSI SCSI revision: 03
      Host: scsi0 Channel: 00 Id: 02 Lun: 00
        Vendor: IBM      Model: DDRS-39130D      Rev: DC1B
        Type:   Direct-Access                    ANSI SCSI revision: 02
      Host: scsi0 Channel: 00 Id: 08 Lun: 00
        Vendor: PLEXTOR  Model: CD-ROM PX-40TW   Rev: 1.03
        Type:   CD-ROM                           ANSI SCSI revision: 02

      The first two drives in the chain are the ones with problems. Drive 0 (boot drive) has the unrecovered read error; Drive 1 is the squeaker. Drive 1 itself is an RMA replacement for an earlier, identical drive that developed an unrecovered read error. Both of these drives have a fan blowing over them.

      Drive 2 has never exhibited any problems.

      The motherboard is an ASUS P2B-D, with two 1GHz Pentium-3s. The RAM is from Crucial, CAS latency 2, ECC. The power supply is 300W and came with the Antec case.

      In short, I've tried to not cheap out on anything. If you can spot something I've missed, I'd be happy to know.

      Schwab

    2. Re:What's the real problem? by uslinux.net · · Score: 2
      For starters, upgrade the power supply. Seriously. If you have 3 drives, plus a CD-ROM and a dual P3 system, you're probably sucking mad amounts of power. If the drives aren't getting enough power, you could brown them out. Think about when the lights dim in your house because an appliance kicks on (okay, maybe not in *your* house, but it does in mine). That's because the voltage is dropping below (usually) 103 volts because of the sudden load. If several of your drives are being accessed simultaneously, you may very well be doing the same thing *inside* the case. You can usually run at about 75% of your max wattage before you start to have problems. I've found drives are generally about 25 watts apiece, and those P3 CPUs are probably 25-30 watts each. When you start considering RAM, motherboard, etc., you'll probably find you're running over 225 watts. 300-watt power supplies were really designed for single-CPU, 2-drive + CD & CD-RW systems. With 3 drives constantly spinning and dual CPUs, you really need 350-400 watts.
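      The arithmetic behind that estimate can be sketched as follows. The per-drive and per-CPU wattages and the 75% loading rule are the figures from this post; the 100 W for motherboard, RAM, CD-ROM and fans is a hypothetical placeholder, so substitute your own numbers (or, better, the ratings printed on each component).

```python
def system_load_watts(drive_w: float, n_drives: int, cpu_w: float,
                      n_cpus: int, rest_w: float) -> float:
    """Sum the estimated draw of drives, CPUs and everything else."""
    return drive_w * n_drives + cpu_w * n_cpus + rest_w


def min_psu_watts(load_w: float, max_loading: float = 0.75) -> float:
    """Smallest supply that keeps `load_w` at or under `max_loading` of its rating."""
    return load_w / max_loading


# 3 drives at 25 W, 2 CPUs at 30 W, plus a hypothetical 100 W for the rest.
load = system_load_watts(25, 3, 30, 2, rest_w=100)
print(f"estimated load: {load:.0f} W")
print(f"minimum supply at 75% loading: {min_psu_watts(load):.0f} W")
```

      With these guesses, a 300 W supply would already be running above the suggested 75% loading, before counting the extra draw when all the drives spin up at once.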

      Seriously, if you're getting unrecoverable read errors on adjacent sectors, it really sounds to me like there wasn't enough power, and the data didn't get written cleanly to disk. Remember, bits and bytes are just current.

      In addition, make sure you have a well-shielded setup (cover on your system), and a good, high-quality drive cable. I had a Maxtor 340 MB drive back in '95/'96 which occasionally crashed with "unrecoverable read error writing to drive c:" under Windows 95/DOS. Swapping IDE cables fixed it (Maxtor claimed it was RF interference).

  7. YMMV by raygundan · · Score: 2

    We need a truly objective survey of hard drive reliability. My personal experience is nearly the exact opposite of yours: I have had two Fujitsu drive failures within 2 years, and one IBM failure in 8 months. My Maxtor and Western Digital drives (even the really old ones) are all still running happily.

    Just goes to show how true YMMV really is, and why anecdotal evidence isn't much help.

  8. Problems with mirroring disks by Nonesuch · · Score: 2
    InitZero writes:
    If you have the money, buy a pair of top quality drives and mirror them. If you can't afford that, buy a couple of cheap drives and mirror them. Don't put important data on a single drive and expect it to be there when you get back from lunch.
    Good advice.

    One problem I have is that most of the times I have had drives die early in their lifespan, it has been a 'batch' problem; had I purchased two identical drives from the same vendor, chances are both of them would have died at about the same time.

    Most mirroring solutions depend on using nearly-identical drives for the mirrored pair, right?
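    Reliability engineers model this kind of batch effect with a common-cause term. In the simple "beta-factor" model, a fraction beta of each drive's failure probability p comes from causes that hit both halves of the mirror at once (same batch, same power supply, same shutdown), and only the rest is independent. The sketch below shows how quickly even a small beta dominates; the values of p and beta are hypothetical.

```python
def mirror_both_fail(p: float, beta: float) -> float:
    """Probability both drives of a mirror fail, beta-factor model.

    p    -- probability a single drive fails over the period of interest
    beta -- fraction of those failures that are common-cause (hit both)
    """
    independent = (1.0 - beta) * p
    return beta * p + independent ** 2


# Hypothetical 5% per-drive failure chance over the period.
for beta in (0.0, 0.05, 0.10, 0.20):
    print(f"beta={beta:.2f}: P(lose both) = {mirror_both_fail(0.05, beta):.4f}")
```

    With beta = 0.1 the pair is almost three times as likely to be lost as the independent case predicts, which is one reason to consider building a mirror from drives out of different batches.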

    Another issue: I've had very few drives fail in service, where the system was running for years and then either just went dead or started getting disk errors, increasing over time. 99% of the failures I have encountered have been with drives that just would not come back up after a shutdown.

    Sometimes you can hear the bearings going out, other times you shut the system down for just a few minutes, turn the power back on, and the drives just go 'clunk', but cannot spin up.

    In the old days of 'stiction' this could sometimes be overcome by repeated powercycles or the old 'weak karate chop to the side of the drive' method.

    Again, I've had multiple drives of about the same age fail in this manner, which in the case of a mirror, means losing the data...

  9. SunFire servers and redundant boot disks by Nonesuch · · Score: 2
    A cool feature of the latest FC-AL-based systems from Sun: the OS includes commands to support hot-swap, including the ability to disconnect and/or power down one drive in a system without affecting the others.

    I've attempted this 'live software disconnect/spin down' with other OS's using standard SCSI, but haven't had much luck. Solaris never supported it before, and now only on FC-AL.

    One trick you can do with this is to have a 'warm spare' installed: a drive that contains a mirror of the system as of the last major change, but is not constantly running. By keeping the spare drive updated, installed, and ready, you can recover from a failed disk remotely, without any need for physical intervention. Combine this with the new "RSC" (a battery-backed lights-out-management card with its own Ethernet and modem paging), and you really have something to brag about.

    If the big Sunfires are out of your budget, a subset of the full feature set is in the LOM interface on some(?) Netra models.

    One drawback of spinning down the disk: as I mentioned in another comment here, one of the most common failure modes is a drive that just won't spin up once you turn it off...