Hardware For Bulk IDE Hard Drive Burn-In?
r0gue_ asks: "I work for a mid-size OEM hardware manufacturer. We ship approximately 300 to 500 IDE HDs every month across all our units. Currently we experience about a 4% failure rate (Maxtor and WDs), though in recent months it has been a couple percent higher. The problem is our systems are dedicated boxes with a non end-user friendly form factor. Virtually every physical HD failure results in an RMA. What we are looking for is a hardware based IDE HD burn-in platform. Something that we could drop a dozen or so drives in at once, stress test them for a day or two, then put them into inventory for builds. I know the HD manufacturers and larger OEMs use them but I have not been able to track down anywhere we could purchase one. Right now moving to SCSI or a form factor that supports externally removable drives is not an option. I was hoping that the Slashdot community could point me in the right direction."
"We sell expertise. The only catch: we are not experts ourselves." Worry not! There's always "ask Slashdot".
A-mazing.
What's with the lame Ask Slashdot questions lately? I'm fine with people asking stuff, all sorts of stuff, but lately there has been this trend: "I'm too lazy to do my own job, could you please do it for me? And make that before 5, I'd like to go home." For goodness' sake! If you are an OEM, call the fscking provider! They've got to know this stuff. "Look, Joe, we are having this problem with the drives we are shipping, can you please tell me where to find stress testing hardware? And if it's not much of a problem, we'd like to avoid killing half of the drive's usable life while we are at it".
For the love of God, don't use Western Digital or Maxtor drives. It's like you're asking for that 4%.
Put 8 IDE controllers into a box (more than this maxes out the PCI bus bandwidth).
Write a bash script that checks dmesg for how many drives are in the system and invokes the following Perl script for each drive.
Write a Perl script that does this:
formats and partitions the drive to its max size,
copies a kernel or some other large file onto the disk until it is full,
monitors syslog for IDE errors,
md5sums the files to make sure they all match,
reports an error if the MD5 doesn't match.
Unless you get hot-swap controllers, you will have to reboot every time you want to test another batch of drives.
If you don't wish to write this Perl script, I can be hired to do it for you.
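For anyone who wants to see the shape of it, here is a rough sketch of the fill-and-checksum part of the steps above, in Python rather than Perl. It is only an illustration: the function names, the `copy_NNNNNN.bin` file naming and the dmesg line format matched by the regex are all assumptions, and it expects the drive to be already partitioned, formatted and mounted at `target_dir`. Note that reading files back right after writing them may be served from the OS cache rather than the platters, so a real rig would want to remount or drop caches between the write and verify phases.

```python
import hashlib
import os
import re
import shutil

# Assumed dmesg line format for IDE disks, e.g. "hda: <model>, ATA DISK drive".
IDE_DISK_LINE = re.compile(r"^(hd[a-z]): .*ATA DISK drive", re.MULTILINE)


def find_ide_disks(dmesg_text):
    """Return the device names (hda, hdc, ...) that dmesg reports as ATA disks."""
    return IDE_DISK_LINE.findall(dmesg_text)


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fill(source, target_dir, max_copies=None):
    """Copy `source` into `target_dir` until the disk is full (or `max_copies`
    is reached) and return the list of complete copies."""
    copies = []
    while max_copies is None or len(copies) < max_copies:
        dest = os.path.join(target_dir, "copy_%06d.bin" % len(copies))
        try:
            shutil.copyfile(source, dest)
        except OSError:
            # Most likely "no space left on device": drop the partial copy.
            if os.path.exists(dest):
                os.remove(dest)
            break
        copies.append(dest)
    return copies


def verify(copies, expected_md5):
    """Return the copies whose MD5 does not match the source file's MD5."""
    return [path for path in copies if md5_of(path) != expected_md5]
```

Typical use would be `bad = verify(fill(src, mount_point), md5_of(src))` once per drive, flagging the drive if `bad` is non-empty. Watching syslog for IDE errors is left out of this sketch.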
In the late '80s I did hard drive repair and we used Wilson and Flexstar equipment for testing and burn-in. I can't find any links to Wilson equipment right now. Flexstar had a more extensible architecture and sounds like what you need. I've used the 2550 series RLL and ESDI Flexstar modules (this was the late '80s, we all thought that IDE was a passing fad at the time) and I can verify that the programming language for this equipment was very straightforward. The Flexstar equipment was very reliable. The only trouble we ever had was the cable ends that would naturally wear out from constant plugging and unplugging. We just replaced all the cable ends every two or three months.
Slashdotters! If you don't find a story interesting, please don't complain and call Slashdot lame. Just ignore the story. Do you complain to your local newspaper that they should not publish recipes because you don't cook?
Comment about the Slashdot question: The wording of the question seems to imply that you believe that Maxtor and Western Digital hard drives have an equal failure rate. That has not been my experience. My experience has been that Western Digital are the most reliable hard drives. I'm very interested to know the experience of other readers.
Western Digital went through a bad stretch in which they experienced a problem that caused high failure rates several years ago, but that was cured.
It's shocking that you are in the computer business and knowingly shipping products with a 4% failure rate. That's very expensive and annoys the customers.
However, you are on the right track. Electronic products have what is called "infant failure". Most failures occur in the first week. During the first 168 hours (one week), the failure rate typically falls by a factor of 100 or even 10,000. At the end of one week most failures have already happened.
It's very easy to write a program that exercises a hard drive. Just copy files back and forth from folder to folder. It is easy to write a program that fills a hard drive with files, then erases them and starts again.
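As a rough illustration of the folder-to-folder idea, the Python sketch below bounces one file between two directories and checks its MD5 after every hop. The function names and the return convention are made up for the example, and both directories are assumed to live on the drive under test.

```python
import hashlib
import os
import shutil


def checksum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def ping_pong(dir_a, dir_b, name, rounds):
    """Copy `name` back and forth between two folders `rounds` times.

    Returns the number of the first round whose copy no longer matches the
    original checksum, or None if every round verified cleanly."""
    expected = checksum(os.path.join(dir_a, name))
    src, dst = dir_a, dir_b
    for round_number in range(1, rounds + 1):
        src_path = os.path.join(src, name)
        dst_path = os.path.join(dst, name)
        shutil.copyfile(src_path, dst_path)
        os.remove(src_path)
        if checksum(dst_path) != expected:
            return round_number
        src, dst = dst, src  # next round goes the other way
    return None
```

A small file will mostly exercise the OS cache, so for a real burn-in the file should be much larger than the machine's RAM.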
The Promise Ultra133 TX2 supports adding four more hard drives to the 4 already supported by modern motherboards. Eight is enough for one test computer, usually, because the power supply won't support more. Be careful to use delayed start. Maybe you will need more powerful power supplies than you normally use.
Make SURE that you are not having trouble with heat. Are your drives cool when they are installed in your product? High heat will cause a high failure rate.
Over what period? If that's over anything less than five years, I'd perhaps be looking towards the conditions the drives are in; are they well ventilated, or near any hot components? Keeping a drive cool can reduce failure rate by ~30% (based on a study IBM did on their SCSI drives); keeping them too hot can drastically increase it. Don't underestimate the effects a bit of active cooling on a drive could have on reducing early failures too.
After that, I'd look at maybe trying some different manufacturers. Seagate, for instance, have a very good reputation for low failure rates.
I used to be involved with a manufacturer, and we used something called an Octet machine for mastering IDE drives for desktops and laptop computers.
IIRC, there was a feature to test the disks as they were being mastered, but we never ran the machine in this mode due to the time it took to do it.
You could do 8 disks at a time, hence the name. I did a Google search, but couldn't find you a manufacturer.
It looks like an elongated cash register, with an area covered with padding to site the drives. It can be connected to a PC where other programs can control it, rather than the limited software built into the machine.
Around 1995, this machine cost about 800 pounds (sterling).
Sheesh -- an Ask Slashdot that's already been answered on Slashdot! Not exactly a duplicate post, but apparently the Editors aren't the only ones who don't read /.
If all this should have a reason, we would be the last to know.
We built some disk arrays using a front-loading IDE case with drive trays. This one is pretty pricey but it's _nice_ hardware:
https://www.rackmountplus.com/spec.asp?ID=RMAC4D -IDE
That, plus a couple RAID cards (like 3ware's new 12-port cards) in a 64/66 PCI slot and bonnie++ would do a pretty good job of burning in your drives. You could flip drives in and out in a few seconds.
---------------------
Interesting.
Intel motherboards have a BIOS setting called "Hard Disk Pre-Delay". The system waits for the hard drives to spin up before it tries to detect them.
Buy a few IDE RAID cards and set them up as RAID 1? This implements a full mirror of the data on the RAIDed devices. Then perform the burn-in on the RAID device.
Note: I have never implemented RAID and am not an expert, so this idea would need to be independently verified.
-- The morphemes of your disquisition are ascertainable, but they have eschewed an ambit of transpicuous exposition.
Get several IDE adapters and run the cables out the back of the box. Use another power supply to spin the drives. I do not have any ideas about hot swapping. You could do a cheap environmental chamber with a cardboard box and no fans to see how the drives do without any ventilation. Then get Iometer and write some tests. Be sure to do several passes with different byte patterns (00, AA, 55, FF) over the whole media. Also throw in a large block of random accesses varying in both length and location. You should also do a butterfly-pattern write/read: FIRST LBA, LAST LBA, FIRST+1, LAST-1, and so on. Loop and let it run as long as necessary to make you happy. The SCSI guys will do this kind of thing to drives for weeks or months non-stop to figure MTBF and find other problems. They have specialized software solutions, but Iometer should do unless you want to learn how to code direct disk accesses rather than going through the file system.
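In case the butterfly ordering is not obvious, here is a small Python sketch of it, combined with the four byte patterns mentioned above. The names (`PATTERNS`, `butterfly`, `butterfly_pass`) and the 512-byte block size are assumptions for illustration, and `device` is any seekable binary file object. Pointing it at a raw disk device would destroy whatever is on that disk, and on a real drive the read-back should bypass the OS cache, which this sketch does not attempt.

```python
# Byte patterns from the post: all zeros, 10101010, 01010101, all ones.
PATTERNS = (0x00, 0xAA, 0x55, 0xFF)


def butterfly(first_lba, last_lba):
    """Yield block addresses in the order FIRST, LAST, FIRST+1, LAST-1, ...
    working inward until the two ends meet."""
    low, high = first_lba, last_lba
    while low < high:
        yield low
        yield high
        low += 1
        high -= 1
    if low == high:
        yield low  # odd block count: the middle block comes last


def butterfly_pass(device, block_count, pattern, block_size=512):
    """Write `pattern` to every block in butterfly order, then read every
    block back in the same order. Returns the list of blocks that differ."""
    payload = bytes([pattern]) * block_size
    for lba in butterfly(0, block_count - 1):
        device.seek(lba * block_size)
        device.write(payload)
    device.flush()
    bad_blocks = []
    for lba in butterfly(0, block_count - 1):
        device.seek(lba * block_size)
        if device.read(block_size) != payload:
            bad_blocks.append(lba)
    return bad_blocks
```

A full run would loop over `PATTERNS` and call `butterfly_pass` once per pattern, repeating for as many passes as you have patience for. The long seeks between the two ends of the disk are the point: they work the head actuator much harder than a sequential pass does.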
We have several IDE fileservers at work. Each box is equipped with two 3Ware 8-port controllers, and 16 removable drive bays. Stick a 17th drive in there as an OS drive, install Linux, and run benchmark of your choice. Once you're happy with the drives, just pull the bays and swap in new drives.
Yes, exactly. However, cooking the power supply is not a problem, since all power supplies have overcurrent protection. The problem is that the BIOS begins its detection process before the power supply has stabilized enough to provide the correct voltage, due to the unusual load. When the detection fails, there is an error message. So the BIOS pre-delay can be helpful.
A friend of mine interned at the Seagate R&D plant in Longmont, Colorado last year, doing testing runs for hard drive series, all IDE. They had BIG refrigerator-looking things that did automatic testing (or actually any other low-level function) based on commands from a terminal.
If I'm not mistaken, they just upgraded their cabinets, so it is likely that either there are surplus cabinets around from the various manufacturers, or there's somewhere making 'em. They might be a bit expensive, but if you're looking at throughput this might be the key.
Give one of em a ring, find out what they can recommend. I'm sure you can find someone if you just look.
--onyx--
Antec is a case maker. I have not been impressed with their power supplies. They are adequate, not wonderful, in my experience.
You can power the drives with old AT power supplies which can be had a lot cheaper than ATX supplies these days.
I see even classic Slashdot is now pretty much unusable on dial-up.
USB, also a usable plan.
Sadly you may need to use Windows, as Solaris just isn't right and Linux has horrible spaghetti code for this stuff. Windows, for all its oh so many faults, will let you get this up quickest.
Sorry about the writing. Robot fingers, you know? Cliff Steele in DOOM PATROL #23
I was looking for some support tools for my Deskstar yesterday, and ran across this tool for OEMs from Hitachi.
Hitachi DDD-SI
Looking at the User's Guide, it looks like you could use its basic features on non-IBM/Hitachi drives. You also might want to check out the other manufacturers' sites and see if they've got something similar.
... of testing HDs is to dd if=/dev/zero of=/dev/hdX bs=1. This is one of the best stress tests anyone can easily deploy. I learned it from an old friend, Mr. Nesc
"Emancipate yourself from mental slavery, none but ourselves can free our minds !"