Slashdot Mirror


Hardware For Bulk IDE Hard Drive Burn-In?

r0gue_ asks: "I work for a mid-size OEM hardware manufacturer. We ship approximately 300 to 500 IDE HDs every month across all our units. Currently we experience about a 4% failure rate (Maxtor and WDs), though in recent months it has been a couple percent higher. The problem is our systems are dedicated boxes with a non end-user friendly form factor. Virtually every physical HD failure results in an RMA. What we are looking for is a hardware based IDE HD burn-in platform. Something that we could drop a dozen or so drives in at once, stress test them for a day or two, then put them into inventory for builds. I know the HD manufacturers and larger OEMs use them but I have not been able to track down anywhere we could purchase one. Right now moving to SCSI or a form factor that supports externally removable drives is not an option. I was hoping that the Slashdot community could point me in the right direction."

2 of 51 comments

  1. I used to repair disk drives... by dfinster · Score: 5, Informative

    In the late '80s I did hard drive repair, and we used Wilson and Flexstar equipment for testing and burn-in. I can't find any links to Wilson equipment right now. Flexstar had a more extensible architecture, and it sounds like what you need. I've used the 2550 series RLL and ESDI Flexstar modules (this was the late '80s; we all thought that IDE was a passing fad at the time), and I can verify that the programming language for this equipment was very straightforward. The Flexstar equipment was very reliable. The only trouble we ever had was with the cable ends, which would naturally wear out from constant plugging and unplugging. We just replaced all the cable ends every two or three months.

  2. Re:How-to by dubl-u · Score: 5, Informative

    That's a good approach. At 8 drives a day, that's roughly 250 a month for a station that you can build for well under $1000. I'd only add that you may need to tune this based on the failure modes that you are seeing.
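
    To make the throughput arithmetic concrete, here is a rough sketch of how many such stations the 300 to 500 drives a month from the question would need. The 8 drives a day comes from the comment above; the 30-day month of continuous operation and the function name are assumptions for illustration.

```python
import math

def stations_needed(drives_per_month, drives_per_station_per_day=8, days_per_month=30):
    """Return how many burn-in stations cover the monthly drive volume.

    days_per_month=30 assumes the station runs every day (an assumption,
    not something stated in the thread).
    """
    per_station = drives_per_station_per_day * days_per_month
    return math.ceil(drives_per_month / per_station)

if __name__ == "__main__":
    for volume in (300, 500):
        print(volume, "drives/month ->", stations_needed(volume), "station(s)")
```

    Under those assumptions one station handles about 240 drives a month, so the quoted volume works out to two or three stations.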

    For example, if it's just bad spots, then you'll want to do as many reads and writes as possible. For that, the fastest thing would be a little C program that reads and writes different patterns to the raw device linearly.
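
    A minimal sketch of that linear pattern test follows. The comment suggests C for speed; this version is in Python purely for illustration. The function name, default pattern, and block size are assumptions, and on a real station `path` would be the raw device node (which this overwrites completely). Note that the operating system's page cache can satisfy the read-back without touching the platters, so a real tool would likely want direct I/O or a cache flush between phases.

```python
import os

def _write_all(fd, data, offset):
    """Write all of `data` at `offset`, retrying after short writes."""
    while data:
        n = os.pwrite(fd, data, offset)
        data = data[n:]
        offset += n

def linear_pattern_pass(path, size, pattern=b"\xaa\x55", block_size=1 << 20):
    """Fill the first `size` bytes of `path` with `pattern`, then verify.

    WARNING: destroys whatever is stored at `path`.
    Returns the byte offsets of blocks that failed to read back correctly.
    """
    # One block's worth of the repeating pattern.
    block = (pattern * (block_size // len(pattern) + 1))[:block_size]
    bad = []
    fd = os.open(path, os.O_RDWR)
    try:
        # Write phase: sequential, front to back.
        offset = 0
        while offset < size:
            chunk = block[: min(block_size, size - offset)]
            _write_all(fd, chunk, offset)
            offset += len(chunk)
        os.fsync(fd)

        # Verify phase: read each block back and compare byte for byte.
        offset = 0
        while offset < size:
            want = block[: min(block_size, size - offset)]
            try:
                got = os.pread(fd, len(want), offset)
            except OSError:
                got = None  # an I/O error counts as a bad block
            if got != want:
                bad.append(offset)
            offset += len(want)
    finally:
        os.close(fd)
    return bad
```

    Running several passes with different patterns (for example all zeros, all ones, and alternating bits) would follow the "different patterns" suggestion; an empty return list means every block read back as written.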

    On the other hand, if the failures are tied to seeks, you'll want to write to semi-random locations on the device, to force maximum seeks. Or if you see a mix of both, then your best bet might be to follow m0rph3us0's plan, perhaps tweaking it a bit to better simulate normal filesystem efficiency (and you can just do bit compares rather than md5sums if CPU is an issue).
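
    The seek-heavy variant could be sketched like this, again in Python for illustration only. The function name, block size, and seeded random generator are assumptions. It writes a distinct payload to a randomly chosen block and does a direct byte compare on read-back, as the comment suggests in place of md5sums. As with any buffered I/O, the read-back may be served from cache unless direct I/O is used, in which case the head does not actually move.

```python
import os
import random

def seek_stress(path, size, iterations, block_size=4096, seed=0):
    """Write and verify blocks at pseudo-random offsets within `size` bytes.

    WARNING: overwrites data at `path`.
    Returns the number of blocks that failed to read back correctly.
    """
    blocks = size // block_size
    if blocks == 0:
        raise ValueError("size must be at least one block")
    rng = random.Random(seed)  # seeded so a failing run can be replayed
    failures = 0
    fd = os.open(path, os.O_RDWR)
    try:
        for i in range(iterations):
            # Jump to a random block to force a seek on a real drive.
            offset = rng.randrange(blocks) * block_size
            # Payload derived from the iteration counter, so stale data
            # left over from an earlier write cannot pass the compare.
            stamp = i.to_bytes(8, "little")
            payload = (stamp * (block_size // 8 + 1))[:block_size]
            try:
                os.pwrite(fd, payload, offset)
                ok = os.pread(fd, block_size, offset) == payload
            except OSError:
                ok = False  # an I/O error counts as a failure
            if not ok:
                failures += 1
    finally:
        os.close(fd)
    return failures
```

    Mixing calls to this with a linear pass would approximate the "mix of both" case; the ratio between the two is something to tune against the failure modes seen in the field.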

    You should also keep an eye on heat issues. The burn-in should happen at temperatures that are like what they will be in the end systems. If you pack 8 seeking drives into some cases, they'll cook. If you leave them in the open air, they might not trigger the failures you are seeing in the field. Try to match measured operating case temperature.

    Oh, and don't forget to measure whether this burn-in is really helping. Take stats now, and keep tracking causes of return. It could be that the drives are sensitive to noisy power or vibration or something else that your burn-in won't catch.
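
    One rough way to check whether burn-in is helping is to compare the return rate before and after, and to tally return causes. The sketch below uses a standard two-proportion z-test; the function names, the cause labels, and every number in the example are hypothetical and only show the shape of the calculation.

```python
import math
from collections import Counter

def two_proportion_z(returned_before, shipped_before, returned_after, shipped_after):
    """z statistic for the drop in return rate (positive means the rate fell)."""
    p_before = returned_before / shipped_before
    p_after = returned_after / shipped_after
    pooled = (returned_before + returned_after) / (shipped_before + shipped_after)
    se = math.sqrt(pooled * (1 - pooled) * (1 / shipped_before + 1 / shipped_after))
    return (p_before - p_after) / se

def tally_causes(causes):
    """Count returns per recorded cause, most common first."""
    return Counter(causes).most_common()

if __name__ == "__main__":
    # Hypothetical figures: 48 returns of 1200 shipped before burn-in,
    # 24 returns of 1200 shipped after.
    z = two_proportion_z(48, 1200, 24, 1200)
    print("z =", round(z, 2))
    # Hypothetical cause labels from RMA paperwork.
    print(tally_causes(["bad sectors", "seek errors", "bad sectors", "power"]))
```

    A z value above about 1.96 suggests the drop is unlikely to be chance at the usual 5% level. If the cause tally is dominated by something burn-in cannot provoke, such as power or vibration, that points back to the end-system environment.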