Reviews of Hard Drive Reliability?
ewhac asks: "After having
three 18G drives go toes-up on me in the last two months, all of them
having done so after about 40 days of use, I want the replacement drives to
be rock-solid. While Tom's Hardware and
AnandTech review individual
drives and their performance, I haven't yet been able to locate any
comprehensive or cohesive review of drive reliability and longevity.
Does such a resource exist?"
Just had two 18GB IBM SCSI (LZX) drives die after less than a year. Also had 6 bad disks in 5 months on a shark at work.
Never, ever, ever, ever buy IBM storage.
Conformity is the jailer of freedom and enemy of growth. -JFK
Use a RAID array so that you have failure protection. I know Compaq sells a good product that goes with their servers; maybe they sell it standalone too.
~ now you know
I commend the request for asking for real data.
Anecdotal evidence from people who have had drives of a certain brand fail on them and then say "never use this drive" is basically worthless. Even if you hear 5 or 10 people say that, ignore them.
What you need to know is if there are enough anecdotes to show that the mfgr's MTBF rate is inaccurate and the real rate is a lot lower than what they report (or a lot lower than other mfgr's). Or maybe if there is a certain batch of drives that are anomalous.
The question is: is the mfg's MTBF rate good enough for you and is it accurate?
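One rough way to check whether a run of failures is consistent with a claimed MTBF is to treat failures as a Poisson process. This is only a sketch: it assumes a constant failure rate (no infant mortality, no wear-out), and the function names are made up for illustration. The 500,000-hour figure is just the number quoted elsewhere in this thread, not any particular vendor's spec.

```python
import math

def expected_failures(n_drives, hours_each, mtbf_hours):
    """Expected failure count if every drive really meets the claimed MTBF."""
    return n_drives * hours_each / mtbf_hours

def prob_at_least(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X < k)."""
    below = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))
    return 1.0 - below

# Assumed scenario: 3 drives, each run about 40 days, claimed MTBF 500,000 hours.
lam = expected_failures(n_drives=3, hours_each=40 * 24, mtbf_hours=500_000)
p = prob_at_least(3, lam)
print(f"expected failures: {lam:.5f}")
print(f"chance of 3 or more failures: {p:.2e}")
```

If the chance comes out vanishingly small, as it does here, then either the claimed MTBF is wrong or something outside the drives (heat, power, shock) is killing them. A handful of anecdotes across an unknown number of drives can't tell you which.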
www.storagereview.com has started a reliability database but I don't know if their data is statistically valuable yet.
Jesus saves....And takes 1/2 damage.
If you've had 3 hard disks die on you in 2 months, the problem may not have been with the disks themselves. The first thing to check is whether you're getting adequate ventilation to the area where the hard disks are. You might also want to test the voltage your power supply is putting out.
Questions like this about hard disks are really better answered here.
i know for sure that i have an ibm deskstar. and when i went to the computer store they said that there are a lot of problems with ibm disks (ide at least)
i used to run a store before, and then i had mostly problems with western digital caviar disks.
now i'm not sure whether it is just normal for disks to die/have bad sectors, or if there is still some hope using samsung disks...
i know samsung bought a fab from conner. and i know i had some problems with those old conner disks.
i guess somebody should cover which fabs are producing what brands and how their reliability tests are performed. at least i think it's a shame that ibm just has a bad name nowadays.
regards,
fluor
the storage review reliability index should serve you well. Unfortunately the site itself may be taken down soon (due to financial reasons), so get there quick.
Four 36 gig drives out of 16 in our array blew out last week. (Probably heat-related. We had some AC problems in the computer room but the room never exceeded rated temperature.) Two weeks before that, two 18-gig drives in separate machines died for unknown reasons. The 36-gig drives were IBM. The 18-gig drives were Seagate (who, at one time, made the IBM drives). In the last two months, we've also lost a few Maxtor drives.
Except for the batch of drives in one array, the above is fairly typical. We have thousands of drives from many vendors and I can't swear one is any better or worse than the other. Hard drives all pretty much suck.
Sure, we all read about MTBF being 500,000 hours for new drives but that's a pipe dream. Drives burn out every single day.
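For scale, here is a back-of-the-envelope sketch of what a 500,000-hour MTBF would predict for a large fleet. It assumes a constant failure rate, and the fleet size of 5,000 is a made-up stand-in for "thousands of drives"; the helper names are hypothetical too.

```python
import math

HOURS_PER_YEAR = 24 * 365

def annualized_failure_rate(mtbf_hours):
    """Chance that a single drive fails within one year of continuous use."""
    return 1.0 - math.exp(-HOURS_PER_YEAR / mtbf_hours)

def fleet_failures_per_day(n_drives, mtbf_hours):
    """Average number of failures per day across the whole fleet."""
    return n_drives * 24 / mtbf_hours

def drives_for_one_failure_per_day(mtbf_hours):
    """Fleet size at which the claimed MTBF predicts one failure a day."""
    return mtbf_hours / 24

mtbf = 500_000  # hours, the figure quoted above
print(f"AFR per drive: {annualized_failure_rate(mtbf):.2%}")
print(f"failures/day for 5,000 drives: {fleet_failures_per_day(5_000, mtbf):.2f}")
print(f"drives needed for 1 failure/day: {drives_for_one_failure_per_day(mtbf):,.0f}")
```

Under these assumptions a 5,000-drive shop would see a failure about every four days, and it would take roughly 20,000 drives to average one a day. Seeing failures daily on a smaller fleet would suggest the real-world rate is worse than the claimed one.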
If you have the money, buy a pair of top quality drives and mirror them. If you can't afford that, buy a couple of cheap drives and mirror them. Don't put important data on a single drive and expect it to be there when you get back from lunch.
InitZero
So, if you're finding your drives die in 30-60 days, there's likely another problem you're missing. If you're using SCSI, I'd guess they're probably 7200 or 10k RPM drives, which means LOTS of heat, especially if you have several. So, first of all, go buy a few 60 or 80mm fans, and stick them in front of the drives, if you can. Get some air flow across them (remember, air pushed across the drives does much more than air pulled/sucked across them). Heat will quickly kill a drive.
Barring that, you haven't said how the drives have died (won't spin up, unusual read errors, etc), but a poor power supply, especially one running at capacity could burn out a drive. Finally, any sort of shock (case constantly being moved, bounced around, kicked, etc) could do a drive in, though that is probably less likely.
As with anything else, it's all IMO, YMMV, etc.
I would like to note (as someone else did) that StorageReview was attempting to build a reliability database, in addition to reviewing units themselves. Though they seem to have intended to make money, they have subsequently followed the dot-bust and are going to end their site when their current funding runs out. It would be a shame to lose the data. Anyone interested should email them and ask them to make the database public domain, and then see if there is enough support for someone to host it. This would be a valuable resource. There is no substitute for good statistical data analysis. The only other thing you can review is manufacturer-claimed MTBF (Mean Time Between Failures). If your drive bites the dust outside a statistically likely variance from the manufacturer's claim, at least call them up and ask that they give you a new drive.
That said, you should use HW RAID and SCSI if you want reliability. Otherwise, simply buy a good tape backup device and back up regularly. IDE drives are a commodity item, and are basically least-common-denominator products where whoever can cut a corner to bring down the price will. Given that, use equipment aimed at business/enterprise/professionals and use HW RAID if the data needs to have reliable uptime.
You know, Consumer Reports has long been known for compiling reliability data for automobiles by surveying its readers.
If that kind of method is a good one, I wonder if we can get some techy rag to do something similar.
Jesus saves....And takes 1/2 damage.
Some of this may be redundant - but that's the point! Redundant storage :-)
:-)
Is it you? First check:
1. Power Supply. Don't run 3x 10000 RPM's off a 230W p/s. It's just not cricket.
2. Cooling. Blow wind onto them, don't suck it from them - someone smarter than me can say why, but it just works better.
3. Shock. Did the courier drop them? Did you drop them?
If it's not you, then:
1. If it is the slightest bit valuable then it should be redundant.
2. Did I mention redundancy?
I have a seagate SCSI disk in a MicroVAX that has hardly missed a beat since 1987. These disks don't exist any more - they just don't make em as good. This is sad because it is a reliable disk, but not so bad because it weighs about 5kg and I can hear it spin up from the other end of the house!
Having said all that - The newer IDE disks die _way_ before they should. It pisses me off as much as the next guy. What can we do?
1. Death
2. Taxes
3. Hard Drive Failure
Fortunately, there is good news. Though the last of these will never be completely eliminated, data loss as a result of hard drive failures can!!! The secret is actually no secret at all - redundancy!
You can read a truckload of technical documentation, bore yourself to death with piles of market research, or even consult a psychic, but nothing will stop the inevitable failure of hard drives. It is an industry-wide problem with (in my experience) little bearing on the hardware manufacturer. Sure, everyone has their favorites, but in the end the simple fact is that hard drives have moving parts and anything with a moving part can, will, and DOES break...
Beer is proof that God loves us and wants us to be happy. -- Benjamin Franklin
I don't know of any such resource, but there are surely enough users here to form an idea of what to buy and what not to, just from their experiences.
Only last week I was agreeing with fellow LinuxSA members that Seagate, Fujitsu, and IBM drives are reliable, and Maxtor and Western Digital drives are not. The last-mentioned brands seem far more likely to seize or develop bad clusters after a few years of use.
It also does not seem coincidental that larger reputable companies seem to sell those drives perceived to be reliable and smaller "iffier" companies (such as those marketing only on cost) seem to sell those drives perceived to be unreliable.
Storage Review lost their entire database. Scroll down their front page to "A few answers on the event"
--Giving to trolls for the benefit of us all
The few hard drives I have had fail over time, the bad block blackhole, have always failed due to heat issues. This is especially true for 10K and 15K rpm SCSI drives. One particular PC chassis of mine was on an IBM 18GB 10K SCSI drive killing spree, until I stuck the latest drive in a well ventilated 5.25" slot.
Now that I've given my opinion on hard drives and heat, I'm going to reinforce some advice that has already been posted. If at all possible mirror your data drives. If your data is of a life or death nature create a backup system and don't forget to regularly verify backups.
--I'm just glad electronic devices work most of the time.
We need a truly objective survey of hard drive reliability. My personal experience is nearly the exact opposite of yours: I have had two Fujitsu drive failures within 2 years, and one IBM failure in 8 months. My Maxtor and Western Digital drives (even the really old ones) are all still running happily.
Just goes to show how true YMMV really is, and why anecdotal evidence isn't much help.
I've had some Maxtor DiamondPlus drives for a while and they work great. I've got the newer 40gig and older 13gig, both 7200rpm drives. When I first got the 13gig drive it started failing within about two weeks, so I brought it back and got a replacement (which was actually a 15gig).
One problem I have is that most of the times I have had drives die early in their lifespan, it has been a 'batch' problem, and had I purchased two identical drives from the same vendor, chances are both of them would have died at about the same time.
Most mirroring solutions depend on using nearly-identical drives for the mirrored pair, right?
Another issue, I've had very few drives fail in service, where the system was running for years and then either just went dead or started getting disk errors, increasing over time. 99% of the failures I have encountered have been with drives that just would not come back up after a shutdown.
Sometimes you can hear the bearings going out, other times you shut the system down for just a few minutes, turn the power back on, and the drives just go 'clunk', but cannot spin up.
In the old days of 'stiction' this could sometimes be overcome by repeated powercycles or the old 'weak karate chop to the side of the drive' method.
Again, I've had multiple drives of about the same age fail in this manner, which in the case of a mirror, means losing the data...
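To put rough numbers on the batch worry, here is a toy model (my own simplification, not measured data): a batch is "bad" with some probability, and drives from a bad batch fail far more often within the period you care about. All three probabilities below are invented for illustration.

```python
def marginal_failure(p_bad_batch, p_fail_bad, p_fail_good):
    """Chance that one drive picked at random fails in the period."""
    return p_bad_batch * p_fail_bad + (1 - p_bad_batch) * p_fail_good

def both_fail_same_batch(p_bad_batch, p_fail_bad, p_fail_good):
    """Chance both halves of a mirror fail when they share one batch."""
    return p_bad_batch * p_fail_bad**2 + (1 - p_bad_batch) * p_fail_good**2

def both_fail_independent(p_single):
    """Chance both halves fail when their failures are unrelated."""
    return p_single * p_single

# Hypothetical inputs: 5% of batches are bad; a bad-batch drive fails half
# the time, a good-batch drive 1% of the time.
p_bad, p_fail_bad, p_fail_good = 0.05, 0.5, 0.01
p_single = marginal_failure(p_bad, p_fail_bad, p_fail_good)
same = both_fail_same_batch(p_bad, p_fail_bad, p_fail_good)
indep = both_fail_independent(p_single)
print(f"single drive fails:       {p_single:.4f}")
print(f"mirror lost, same batch:  {same:.6f}")
print(f"mirror lost, independent: {indep:.6f}")
```

With these made-up inputs the same-batch mirror is lost about ten times as often as one built from unrelated drives, even though each individual drive is equally likely to fail. In this model, mixing batches or vendors in a mirror helps; whether your controller tolerates mismatched drives is a separate question.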
I do not deploy Linux. Ever.
I've attempted this 'live software disconnect/spin down' with other OS's using standard SCSI, but haven't had much luck. Solaris never supported it before, and now only on FC-AL.
One trick you can do with this is to have a 'warm spare' installed, a drive that contains a mirror of the system as of the last major change, but is not constantly running. By keeping the spare drive updated, installed, and ready, you can recover from a failed disk remotely, without any need for physical intervention. Combine this with the new "RSC" (battery-backed lights-out-management card with its own ethernet and modem paging), and you really have something to brag about.
If the big Sunfires are out of your budget, a subset of the full feature set is in the LOM interface on some(?) Netra models.
One drawback of spinning down the disk (as I mentioned in another comment here): one of the most common failure modes is a drive that just won't spin up once you turn it off...
I do not deploy Linux. Ever.