Ask Slashdot: Do You Test Your New Hard Drives?

Posted by timothy on Sunday December 23, 2012 @05:22AM from the just-bite-the-corner-a-little dept.

An anonymous reader writes "Any Slashdot thread about drive failure is loaded with good advice about EOL — but what about the beginning? Do you normally test your new purchases as thoroughly as you test old, suspect drives? Has your testing followed the proverbial 'bathtub' curve of a lot of early failures, but with those that survive the first month surviving for years? And have you had any return problems with new failed drives, because you re-partitioned it, or 'ran Linux,' or used stress-test apps?"

Re:SSDs by roc97007 · 2012-12-23 05:29 · Score: 5, Insightful

> Who cares about HDDs anymore these days?
Anyone with a need for a massive amount of storage space.

--
Oliver's law of assumed responsibility: If you're seen fixing it, you will be blamed for breaking it.
smartmontools by WD · 2012-12-23 05:35 · Score: 5, Informative

Set up the smartd.conf file to do the example short-test daily and long-test weekly, and email you when something is fishy. It's a trivial amount of effort, resulting in a significant amount of peace of mind. (In many cases, you'll have some amount of warning before your drive kicks the bucket and it's too late)
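For illustration, a smartd.conf entry along the lines described above might look like the fragment below. It is only a sketch: the device name `/dev/sda` and the `root` mail recipient are assumptions, the file's location varies by distribution, and the schedule follows the example pattern from the smartd.conf documentation rather than anything the poster specified.

```
# smartd.conf sketch (device name and recipient are assumptions)
# -a  monitor all SMART attributes and the drive's health status
# -s  schedule self-tests: short test daily at 02:00,
#     long test on Saturdays (day 6) at 03:00
# -m  send warning mail to this address
/dev/sda -a -s (S/../.././02|L/../../6/03) -m root
```

smartd has to be restarted or reloaded before a changed entry takes effect.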
1. Re:smartmontools by Deekin_Scalesinger · 2012-12-23 05:58 · Score: 5, Funny
  
  This should be modded up for your username alone lol
  
  --
  "As the intrepid kobold companion continues his journey, he begins to wonder... if priests raises dead, why anybody die?
Yes! Especially before adding them to an array. by Anonymous Coward · 2012-12-23 05:42 · Score: 5, Interesting

I run some ZFS systems at work. With the current version of the filesystem, you can expand the zpools but you can't shrink them, so adding a bad drive causes immediate problems.
I've found that some drives are completely functional but write at extremely slow rates: maybe 10% of normal. With typical consumer drives, maybe 1/20 is like this. To ensure I don't put a slow drive into a production zpool array of disks, I always make a small test zpool consisting of just the new batch of drives and stress-test them.
This catches not only obviously bad drives, but also the slow or otherwise odd ones.
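One way to spot the slow drives described above is to record a write rate for each new drive during the stress test and flag the outliers. The helper below is a minimal sketch, not the poster's method: the function name `flag_slow_drives`, the "device MB/s" input format, and the default cutoff of half the fastest drive's rate are all assumptions for illustration.

```shell
# Read "device rate" pairs (e.g. "sdc 14") on stdin and print every device
# whose rate is below a fraction of the fastest drive in the batch.
# $1 is the cutoff fraction (hypothetical default: 0.5).
flag_slow_drives() {
    awk -v frac="${1:-0.5}" '
        # Remember each device and its numeric rate; track the fastest rate.
        { dev[NR] = $1; rate[NR] = $2 + 0; if (rate[NR] > max) max = rate[NR] }
        # Print, in input order, the devices slower than frac * fastest.
        END { for (i = 1; i <= NR; i++) if (rate[i] < frac * max) print dev[i] }
    '
}
```

The rates themselves could come from any measurement you trust, such as timing a large sequential write to each drive. A drive writing at 10% of normal would stand out clearly against the rest of its batch.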
Re:dban followed by smartctl by bill_mcgonigle · 2012-12-23 05:45 · Score: 5, Interesting

Yes, this. I do it online:

dd if=/dev/zero of=/dev/sdX bs=8M

and then check smartctl. If I'm making a really big zpool, I fill them up and let ZFS fail out the turkeys:

dd if=/dev/zero of=/tank/zeros.dd bs=8M
zpool scrub tank

If I'm building a 30-drive storage server for a client I'll often see 1-2 fail out. Better to catch them now than when they're deployed (especially with the crap warranties on spinning rust these days). I need to order in staggered lots anyway, so having 10% overhead helps keep things moving along.

--
My God, it's Full of Source!
OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
SMART + badblocks by SuperBanana · 2012-12-23 06:23 · Score: 5, Interesting

I run smartctl and capture the registers, then run badblocks, and compare smartctl's output to the pre-bad-blocks check.
If there are any remapped blocks, the drive goes back, as the factory should have remapped the initial defects already, and that means new failed blocks in the first few hours of operation.
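The before/after comparison can be scripted. The sketch below is one possible reading of the routine, not the poster's actual procedure. It assumes the usual ATA attribute table printed by `smartctl -A`, where the attribute name is the second column and the raw value is the tenth, and it checks only `Reallocated_Sector_Ct`. The function names and the file names in the usage comments are hypothetical.

```shell
# Print the raw value of one SMART attribute, reading `smartctl -A` output
# from stdin. Exits non-zero if the attribute is not present.
smart_raw_value() {
    awk -v name="$1" '$2 == name { print $10; found = 1 }
                      END { if (!found) exit 1 }'
}

# Compare two saved `smartctl -A` captures. Succeeds only if the
# reallocated sector count did not grow; returns 2 if it cannot be read.
reallocated_unchanged() {
    before=$(smart_raw_value Reallocated_Sector_Ct < "$1") || return 2
    after=$(smart_raw_value Reallocated_Sector_Ct < "$2") || return 2
    [ "$after" -le "$before" ]
}

# Possible usage (the badblocks flags are an assumption; -w is a destructive
# write test that erases the drive, so use it only on a new, empty disk):
#   smartctl -A /dev/sdX > before.txt
#   badblocks -wsv /dev/sdX
#   smartctl -A /dev/sdX > after.txt
#   reallocated_unchanged before.txt after.txt || echo "remapped blocks: return it"
```

Other attributes, such as pending or uncorrectable sector counts, could be compared the same way by passing a different name to `smart_raw_value`.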

--
Please help metamoderate.
Re:Heh by greg1104 · 2012-12-23 07:54 · Score: 5, Interesting

Spinrite hasn't been useful for years. There's a good analysis of why at "Does SpinRite do what it claims to do?". Everything the program does can be done more efficiently with a simpler program run from a Linux boot CD. And the fact that it takes so long is a problem--you want to get data off a dying drive as quickly as possible. Here's what I wrote on that question years ago, and the rise of SSDs makes this even more true now:
SpinRite was a great program in the era it was written, a long time ago. Back then, it would do black magic to recover drives that were seemingly toast, by being more persistent than the drive firmware itself was.
But here in 2009, it's worthless. Modern drives do complicated sector mapping and testing on their own, and SpinRite is way too old to know how to trigger those correctly on all the drives out there. What you should do instead is learn how to use smartmontools, probably via a Linux boot CD (since the main time you need them is when the drive is already toast).
My usual routine when a drive starts to go bad is to back its data up using dd, run smartmontools to see what errors it's reporting, trigger a self-test and check the errors again, and then launch into the manufacturer's recovery software to see if the problem can be corrected by it. The idea that SpinRite knows more about the drive than the interface provided by SMART and the manufacturer tools is at least ten years obsolete. Also, getting the information into the SMART logs helps if you need to RMA the drive as defective, something SpinRite doesn't help you with.
Note that the occasional reports you see that SpinRite "fixes" problems are coincidence. If you access a sector on a modern drive that is bad, the drive will often remap it for you from the spares kept around for that purpose. All SpinRite did was access the bad sector; it didn't actually repair anything. This is why you still get these anecdotal "it worked for me" reports related to it--the same thing would have been much better accomplished with a SMART scan.