Hardware For Bulk IDE Hard Drive Burn-In?
r0gue_ asks: "I work for a mid-size OEM hardware manufacturer. We ship approximately 300 to 500 IDE hard drives every month across all our units. Currently we see about a 4% failure rate (Maxtor and WD drives), though in recent months it has been a couple of percent higher. The problem is that our systems are dedicated boxes with a form factor that is not end-user friendly, so virtually every physical hard drive failure results in an RMA. What we are looking for is a hardware-based IDE hard drive burn-in platform: something we could drop a dozen or so drives into at once, stress test them for a day or two, then put them into inventory for builds. I know the drive manufacturers and larger OEMs use them, but I have not been able to track down anywhere we could purchase one. Right now, moving to SCSI or to a form factor that supports externally removable drives is not an option. I was hoping the Slashdot community could point me in the right direction."
Put 8 IDE controllers into a box (more than this maxes out the PCI bus bandwidth).
Write a bash script that checks dmesg for how many drives are in the system and invokes the following Perl script for each drive.
Write a Perl script that:
Partitions the drive to its maximum size and formats it.
Copies a kernel or some other large file onto the disk repeatedly until it is full.
Monitors syslog for IDE errors.
Runs md5sum on the copies to make sure they all match.
Reports an error if an MD5 sum doesn't match.
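The per-drive steps above could be sketched in Python instead of Perl (the language swap is ours, not the poster's). This is a minimal sketch under stated assumptions: the drive is already partitioned, formatted and mounted at a directory you pass in, and the error pattern assumes old Linux IDE driver messages of the form "hdb: dma_intr: error=...". The names fill_with_copies, verify_copies and ide_errors are hypothetical. One caveat: reading copies back right after writing them may be served from the page cache rather than the platters, so a real test should unmount and remount the drive before verifying.

```python
import errno
import hashlib
import os
import re
import shutil

CHUNK = 1 << 20  # hash files 1 MiB at a time to keep memory use flat

# ASSUMED pattern for old Linux IDE driver messages such as
# "hdb: dma_intr: error=0x40 { UncorrectableError }"; adjust for your kernel.
IDE_ERROR = re.compile(r"\b(hd[a-z]):.*\berror", re.IGNORECASE)


def md5_of(path):
    """Return the hex MD5 digest of the file at `path`."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(CHUNK), b""):
            h.update(block)
    return h.hexdigest()


def fill_with_copies(source, mount_point, max_copies=None):
    """Copy `source` into `mount_point` until the disk reports ENOSPC
    (or `max_copies` is reached). Returns the paths of complete copies."""
    copies = []
    while max_copies is None or len(copies) < max_copies:
        dest = os.path.join(mount_point, "burnin_%06d.bin" % len(copies))
        try:
            shutil.copyfile(source, dest)
        except OSError as exc:
            if exc.errno != errno.ENOSPC:
                raise
            # The copy that hit "disk full" is truncated; drop it so it
            # is not reported as an MD5 mismatch.
            if os.path.exists(dest):
                os.remove(dest)
            break
        copies.append(dest)
    return copies


def verify_copies(source, copies):
    """Return the copies whose MD5 does not match the source file's."""
    expected = md5_of(source)
    return [p for p in copies if md5_of(p) != expected]


def ide_errors(log_lines):
    """Count IDE error lines per drive name (hda, hdb, ...)."""
    counts = {}
    for line in log_lines:
        m = IDE_ERROR.search(line)
        if m:
            counts[m.group(1)] = counts.get(m.group(1), 0) + 1
    return counts
```

On a test box you might call fill_with_copies on a kernel image and a mount point such as /mnt/hdb1 (hypothetical paths), remount, run verify_copies, and pass the lines of your syslog file to ide_errors; any non-empty result marks the drive as suspect.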
Unless you get hot-swap controllers, you will have to reboot every time you want to test another batch of drives.
If you don't wish to write this Perl script, I can be hired to do it for you.
In the late '80s I did hard drive repair, and we used Wilson and Flexstar equipment for testing and burn-in. I can't find any links to Wilson equipment right now. Flexstar had a more extensible architecture and sounds like what you need. I've used the Flexstar 2550-series RLL and ESDI modules (this was the late '80s; we all thought IDE was a passing fad at the time), and I can vouch that the programming language for this equipment was very straightforward. The Flexstar equipment was very reliable. The only trouble we ever had was with the cable ends, which naturally wore out from constant plugging and unplugging; we just replaced all the cable ends every two or three months.
Slashdotters! If you don't find a story interesting, please don't complain and call Slashdot lame. Just ignore the story. Do you complain to your local newspaper that they should not publish recipes because you don't cook?
Comment about the Slashdot question: the wording seems to imply that you believe Maxtor and Western Digital hard drives have equal failure rates. That has not been my experience; in my experience, Western Digital drives are the most reliable. I'm very interested to know the experience of other readers.
Several years ago Western Digital went through a bad stretch in which a problem caused high failure rates, but that was cured.
It's shocking that you are in the computer business and knowingly shipping products with a 4% failure rate. That's very expensive and annoys the customers.
However, you are on the right track. Electronic products exhibit what is called "infant failure": most failures occur in the first week. Over the first 168 hours (one week), the failure rate typically falls by a factor of 100 or even 10,000, so by the end of that week most of the early failures have already happened.
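One common way to model infant failure (our assumption; the poster doesn't name a model) is a Weibull lifetime distribution with shape parameter β < 1 and scale η, whose hazard rate h(t) falls over time. The ratio of hazard rates at two times then depends only on β:

```latex
h(t) = \frac{\beta}{\eta}\left(\frac{t}{\eta}\right)^{\beta-1},
\qquad
\frac{h(t_2)}{h(t_1)} = \left(\frac{t_2}{t_1}\right)^{\beta-1}
```

For example, with β = 0.5 the hazard at t₂ = 168 h is 168^(-0.5) ≈ 0.077 of its value at t₁ = 1 h, roughly a 13-fold drop; a 100-fold drop over the same span would need β ≈ 0.1.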
It's very easy to write a program that exercises a hard drive: just copy files back and forth from folder to folder, or fill the drive with files, erase them, and start again.
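The "back and forth" idea can be sketched as follows. This is a minimal illustration, not the poster's program; ping_pong and its arguments are hypothetical names, and both folders are assumed to live on the drive under test. It copies and then deletes rather than using a rename, because a rename on the same filesystem only updates metadata and never rewrites the data. As with any write-then-read check, results may come from the page cache unless caches are flushed between passes.

```python
import hashlib
import os
import shutil


def md5_of(path):
    """Return the hex MD5 digest of the file at `path`."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            h.update(block)
    return h.hexdigest()


def ping_pong(dir_a, dir_b, passes):
    """Copy every file from dir_a to dir_b and back, `passes` times,
    deleting the source after each copy and checking the MD5 of every
    new copy. Returns a list of (pass_number, file_name) mismatches."""
    names = sorted(n for n in os.listdir(dir_a)
                   if os.path.isfile(os.path.join(dir_a, n)))
    expected = {n: md5_of(os.path.join(dir_a, n)) for n in names}
    src, dst = dir_a, dir_b
    bad = []
    for p in range(passes):
        for n in names:
            target = os.path.join(dst, n)
            # Copy + delete forces the data itself to be rewritten.
            shutil.copyfile(os.path.join(src, n), target)
            os.remove(os.path.join(src, n))
            if md5_of(target) != expected[n]:
                bad.append((p, n))
        src, dst = dst, src  # reverse direction for the next pass
    return bad
```

After an odd number of passes the files end up in dir_b; after an even number, back in dir_a.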
The Promise Ultra133 TX2 adds four more hard drives to the four already supported by modern motherboards. Eight is usually enough for one test computer, because the power supply won't support more. Be careful to use delayed start, and you may need more powerful power supplies than you normally use.
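Why delayed start matters can be shown with a little arithmetic. A drive draws much more 12 V current while its spindle motor spins up than once it is at speed, so the worst case is every drive spinning up at once. The per-drive figures below (2 A at spin-up, 0.6 A at idle) are illustrative assumptions, not numbers from the thread; substitute the values from your drives' datasheets.

```python
# ILLUSTRATIVE ASSUMPTIONS, not from the thread: use the 12 V spin-up
# and idle currents from your drives' datasheets.
SPINUP_AMPS_12V = 2.0
IDLE_AMPS_12V = 0.6


def peak_12v_current(drives, spinup_a, idle_a, group_size):
    """Worst-case 12 V current when drives are started `group_size` at a
    time and each group is at speed before the next one starts."""
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    peak = 0.0
    started = 0
    while started < drives:
        group = min(group_size, drives - started)
        # Drives already at speed draw idle current; the new group draws
        # spin-up current.
        peak = max(peak, started * idle_a + group * spinup_a)
        started += group
    return peak


if __name__ == "__main__":
    for g in (8, 1):
        amps = peak_12v_current(8, SPINUP_AMPS_12V, IDLE_AMPS_12V, g)
        print("group of %d: %.1f A on 12 V (%.0f W)" % (g, amps, amps * 12))
```

With these assumed figures, starting all eight drives at once peaks at 16 A on the 12 V rail, versus about 6.2 A when they start one at a time.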
Make SURE you are not having trouble with heat. Do your drives stay cool once they are installed in your product? High heat will cause a high failure rate.
We built some disk arrays using a front-loading IDE case with drive trays. This one is pretty pricey but it's _nice_ hardware:
https://www.rackmountplus.com/spec.asp?ID=RMAC4D-IDE
That, plus a couple of RAID cards (like 3ware's new 12-port cards) in 64-bit/66 MHz PCI slots and bonnie++, would do a pretty good job of burning in your drives. You could swap drives in and out in a few seconds.
---------------------
Western Digital: I've owned several Western Digital drives over the past decade, and none of them has ever failed me. At my workplace, I've found old WD drives in Pentium I PCs that have been in service for 6+ years without a single problem.
Maxtor: I've been plagued with problems from Maxtor drives over the years. Of one original Maxtor I bought and its RMA replacements, two developed abnormally loud spindle motors, one failed catastrophically (IDE auto-detect had trouble even identifying the drive), and the last one started failing (with a SMART warning, at least) about 5 days after my warranty period ended.
Seagate: I've never used any of the newer Seagate offerings, but my older Seagate drives lasted for years before I replaced them with higher-capacity drives.
Quantum: Most of the Quantum drives (standard 3.5" form factor) I have encountered at my workplace have performed reliably over the years. Recently, however, we've started seeing a bunch of them fail, but considering most of them are 4+ years old, that's not a bad track record. A Quantum Bigfoot of mine died last fall, but it was also close to 4 years old. (Just what exactly prompted Quantum to make the strange hybrid Bigfoot form factor, anyway?)
IBM: The older drives were great, and I used to love their service: I would contact their support and they would do an advance RMA with no problem. However, in my most recent experience with them, in fall '02, getting an RMA on a drive less than a year old, they told me they couldn't advance-RMA me a drive and I would have to use a standard RMA, which, including shipping times, would take almost a month. That was quite unacceptable. Not to mention that their newer drives had a problem with failures.
You can't go by brand alone - at some point every manufacturer has had a line of bad drives.
StorageReview has a Drive Reliability Survey that lists statistics for many drive families. For example, WD 205Bx drives are near the top of the rankings (99th percentile) while the 600Ax is near the bottom (10th percentile).
That 99th percentile is based on barely enough drives to be rated (just over 60), so it doesn't mean much; most of the drives have hundreds of units in the database. Besides, I think the database is flawed in many ways, since most people who list their drives do so because they have had a drive fail. The best approach, from my perspective, is to look at what companies with hundreds or thousands of drives are doing. Rackspace switched all of their IBM IDE drives out for Maxtors, Google uses Maxtors, and the recent IDE backup unit featured here used Maxtors. But maybe I'm just biased, because as an OEM I had tons of drives from almost every manufacturer die on me, except Maxtors.
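To make the sample-size point concrete, here is a minimal sketch using the Wilson score interval for a failure proportion. This is our illustration; StorageReview does not necessarily compute its rankings this way, and no interval corrects the self-selection bias described above. With zero failures among 60 drives the 95% interval still reaches roughly 6%, whereas zero among 600 bounds the rate near 0.6%.

```python
import math


def wilson_interval(failures, n, z=1.96):
    """Wilson score interval for a failure proportion; z=1.96 gives
    roughly 95% coverage."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = failures / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    # Clamp to [0, 1] to absorb floating-point round-off at the edges.
    return max(0.0, centre - half), min(1.0, centre + half)
```

Calling wilson_interval(0, 60) and wilson_interval(0, 600) shows how much a tenfold larger sample narrows the plausible range of failure rates.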