Ask Slashdot: Do You Test Your New Hard Drives?

Heh by Deekin_Scalesinger · 2012-12-23 05:23 · Score: 4, Insightful

Like, never. Out of the box and away she goes...good luck to thee!

--
"As the intrepid kobold companion continues his journey, he begins to wonder... if priests raises dead, why anybody die?

Re:Heh by JMJimmy · 2012-12-23 06:00 · Score: 4, Insightful

Add to the above:
HDD tools are useless. I recently tried a bunch of them - they all reported my HDD in perfect condition... while it was doing the click of death. HDD failed within a week.
Re:Heh by danomac · 2012-12-23 06:20 · Score: 1

The only new computer component I always test out-of-the-box is RAM - I've had many bad experiences over the last 10 years with instability due to bad RAM.
As far as hard drives go, I never test them. I run several RAID arrays in the house, and I actually have had a replacement drive fail in a week (one of Seagate's recertified drives.) I noticed odd behaviour and rebooted the server and the RAID array was degraded. Oops!
I guess in a way I do test them - if the new drive fails shortly after rebuilding the array, it was likely a lemon to start with.
I don't think the hard drive tools really test anything other than the SMART information on the drives anyway. You're at the mercy of the drive failing badly (or hard) enough to actually trip a SMART error. I've also had a drive that clicked and made awful noises for a year before it finally died. And no, it didn't ever report a SMART error, it just crapped out randomly.
Re:Heh by PlusFiveTroll · 2012-12-23 06:40 · Score: 3, Informative

Sounds more like your hard drive's SMART was useless. The tools can only report what the drive tells them; if SMART isn't reporting relocated sectors, resets, or whatever other terrible malfunction, then they are left in the dark.
Re:Heh by war4peace · 2012-12-23 06:56 · Score: 1

On a more general note: I never move important data. What I do is: I copy data from old HDD to new HDD and then use KLS Backup to set up incremental back-up. I still use old HDD until it fails. When that happens, the old HDD is taken out of the system, the "new" HDD becomes the "old" HDD and a brand new HDD becomes... yes, you guessed it: new HDD :)
Unimportant data never gets backed up (e.g. installed games or large ISOs I keep for some reason, music, uncompressed video captures, etc). It goes straight to the new HDD because that's usually larger than the old one.

--
...gis sdrawkcab (usually not responding to ACs; don't bother posting as AC)
Re:Heh by Anonymous Coward · 2012-12-23 07:10 · Score: 1, Informative

And I bet you're living in the past. Computer hardware is cheap, easily replaceable commodity parts these days. Why the fuck would I bother running worthless burn in tests when it's so easy and/or cheap to replace faulty parts? I don't care about the drive, just my data, which is always backed up with the most important stuff doubly redundant.
Re:Heh by hairyfeet · 2012-12-23 07:13 · Score: 4, Interesting

The problem is the best damned tool ever made for testing drives hasn't been updated in years and now won't work on drives bigger than 500GB, I am of course talking about Spinrite. With Spinrite on lvl 2 you just bypass the firmware and write patterns of zeroes and ones and then read back what it reports, if it's spitting errors right off the bat then you know to send it back. Problem is Gibson hasn't updated the thing since 06 so it can't handle drives bigger than 500GB which makes it all but useless today.
So if anybody has found something that works similarly to Spinrite but works on large drives I too would like to know, I get drives coming in from all over the place with ZERO history here at the shop so I don't know if they've been barely used or thoroughly abused and having a tool I can run on them would be a big help.

--
ACs don't waste your time replying, your posts are never seen by me.
Re:Heh by spire3661 · 2012-12-23 07:21 · Score: 1

SMART itself is mostly useless and we should ignore it completely.

--
Good-bye
Re:Heh by hairyfeet · 2012-12-23 07:25 · Score: 3, Informative

That's nice, an OS used by less than 2% of the entire planet has some tool that reports what SMART is telling it, no different than a billion freeware programs for Windows. Just FYI but I can think of about a dozen freeware programs that will do the same damned thing in Windows, INCLUDING the email, so it's not exactly like you got anything to brag about Ms AC.
Now I'm gonna spell out what the REAL problem is, which any guy who has spent time in the trenches will tell you, and that is SMART SUCKS ASS and for several years has been more about covering bad batches for the HDD OEMs than it has been for actually telling you something is going bad. I have had drives in the shop that sounded like an angle grinder bouncing on pavement where SMART said "Nope, nothing wrong here la la la" while the thing just ground and sputtered, it's the most fucking pointless diagnostic tool there is.
What we NEED is a replacement for Spinrite, something that bypasses the lying SMART and just runs a pass of zeroes and ones on the drive and reports a simple pass/fail on the read/writes. Spinrite was fucking brilliant for this, it would give you a layout of the entire drive with red for sectors that failed to report the correct data back and blue for clean so it took just a second to glance at the readout to spot a drive that was buggy out of the box, but nobody has updated the tool in years so it's useless now since it can't do SATA 6 or drives above 500GB.
So how about it FOSS devs, here are the requirements: Bypass SMART, do a single R/W cycle, report results. That's ALL it has to do and so far nobody has stepped up to the plate. Damned near every shop I knew including mine had bought a copy of Spinrite so there is good money to be made there if you are willing to put in the work, it's a niche but it's a niche with money, builders, repair shops and gamers would all love to hand you money for this tool, so get on it and report back when it's done, okay?

--
ACs don't waste your time replying, your posts are never seen by me.
Re:Heh by Runaway1956 · 2012-12-23 07:26 · Score: 2

I saw nothing about any burn in tests in the GP post. The guy has a couple of scripts running to ensure that A) he is made aware of impending hard disk problems, and B) his data is backed up in the event of a hard disk problem.
Reading comprehension 101, available at a community college near you.
Unless, of course, you're just trolling a Linux user. In which case, feel free to continue making a fool of yourself.

--
"Windows is like the faint smell of piss in a subway: it's there, and there's nothing you can do about it." - Charlie Br
Re:Heh by JMJimmy · 2012-12-23 07:30 · Score: 3, Informative

No, not SMART. I did a full range of tests with all suites on top of SMART (surface tests, etc)
The only HDD tool I trust is the ancient one from GRC.
Re:Heh by Anonymous Coward · 2012-12-23 07:47 · Score: 0

Norton Disk Destroy is the only tool ya need :)
Re:Heh by greg1104 · 2012-12-23 07:54 · Score: 5, Interesting

Spinrite hasn't been useful for years. There's a good analysis of why at "Does SpinRite do what it claims to do?". Everything the program does can be done more efficiently with a simpler program run from a Linux boot CD. And the fact that it takes so long is a problem--you want to get data off a dying drive as quickly as possible. Here's what I wrote on that question years ago, and the rise of SSDs makes this even more true now:
SpinRite was a great program in the era it was written, a long time ago. Back then, it would do black magic to recover drives that were seemingly toast, by being more persistent than the drive firmware itself was.
But here in 2009, it's worthless. Modern drives do complicated sector mapping and testing on their own, and SpinRite is way too old to know how to trigger those correctly on all the drives out there. What you should do instead is learn how to use smartmontools, probably via a Linux boot CD (since the main time you need them is when the drive is already toast).
My usual routine when a drive starts to go bad is to back its data up using dd, run smartmontools to see what errors it's reporting, trigger a self-test and check the errors again, and then launch into the manufacturer's recovery software to see if the problem can be corrected by it. The idea that SpinRite knows more about the drive than the interface provided by SMART and the manufacturer tools is at least ten years obsolete. Also, getting the information into the SMART logs helps if you need to RMA the drive as defective, something SpinRite doesn't help you with.
Note that the occasional reports you see that SpinRite "fixes" problems are coincidence. If you access a sector on a modern drive that is bad, the drive will often remap it for you from the spares kept around for that purpose. All SpinRite did was access the bad sector, it didn't actually repair anything. This is why you still get these anecdotal "it worked for me" reports related to it--the same thing would have been much better accomplished with a SMART scan.
Re:Heh by SuperTechnoNerd · 2012-12-23 07:56 · Score: 4, Interesting

You have to interpret the data correctly. Looking at seek error rate and raw read errors tells you if the heads are positioning accurately. Run the drive hard (read/write patterns) and watch the temperature. And of course if you start seeing non-zero pending and realloc sector counts you know the end is near. And watch the spin-up time: as a drive gets older it will increase. (I rarely shut the RAID server down so this is less important.) I have smartd email and text me any time things start to get out of a happy place. I do nightly quick tests and weekly extended tests. SMART is useful - if you're smart about it...
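A rough sketch of that kind of attribute watching, in Python. It assumes the usual ten-column table that `smartctl -A` prints; the sample text below is made up for illustration and is not output from a real drive, and `parse_smart_attributes` and `flag_problems` are hypothetical helper names. Which attributes a drive exposes, and how it encodes raw values, varies by vendor.

```python
import re

# Attribute names as smartctl commonly prints them; a non-zero raw value on
# either of these is the "end is near" signal described above.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector")


def parse_smart_attributes(text):
    """Return {attribute_name: raw_value} from `smartctl -A` style table text."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # the header row starts with "ID#" and is skipped by the isdigit check.
        if len(fields) >= 10 and fields[0].isdigit():
            match = re.match(r"\d+", fields[9])  # leading integer of RAW_VALUE
            if match:
                attrs[fields[1]] = int(match.group())
    return attrs


def flag_problems(attrs):
    """List the watched attributes whose raw value is above zero."""
    return [f"{name} = {attrs[name]}" for name in WATCHED if attrs.get(name, 0) > 0]


# Illustrative sample only (invented values, not from a real drive).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       8
194 Temperature_Celsius     0x0022   035   045   000    Old_age   Always       -       35 (Min/Max 20/45)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
"""

attrs = parse_smart_attributes(SAMPLE)
print(flag_problems(attrs))
```

In practice the text would come from running `smartctl -A` on the device, and smartd already does this kind of monitoring on its own; the point is only that the raw counters are easy to check yourself.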
Re:Heh by Burpmaster · 2012-12-23 07:57 · Score: 4, Informative

What you want is just 'badblocks -w '.
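For anyone wondering what a write-mode test amounts to, here is a minimal Python sketch of the same idea: write a pattern over every block, read it back, report pass/fail. It is not how badblocks is implemented; `pattern_test` is a hypothetical name, and the demo runs against a throwaway temp file rather than a disk. Pointed at a real device it would destroy the data on it, and without bypassing the OS page cache the read-back may never touch the platters.

```python
import os
import tempfile

# The four patterns the badblocks man page lists for -w mode.
PATTERNS = (0xAA, 0x55, 0xFF, 0x00)


def pattern_test(path, size, block_size=4096, patterns=PATTERNS):
    """Destructively overwrite `size` bytes of `path`; return bad block indices."""
    bad = set()
    n_blocks = size // block_size
    with open(path, "r+b", buffering=0) as f:
        for pattern in patterns:
            block = bytes([pattern]) * block_size
            # Write pass: fill every block with the current pattern.
            f.seek(0)
            for _ in range(n_blocks):
                f.write(block)
            os.fsync(f.fileno())
            # Read pass: a mismatch or an I/O error marks the block as bad.
            for i in range(n_blocks):
                f.seek(i * block_size)
                try:
                    ok = f.read(block_size) == block
                except OSError:
                    ok = False
                if not ok:
                    bad.add(i)
    return sorted(bad)


# Demo on a 64 KiB scratch file, which stands in for a drive here.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\0" * 65536)
try:
    bad = pattern_test(tmp.name, 65536)
    print("PASS" if not bad else f"FAIL: {len(bad)} bad blocks")
finally:
    os.unlink(tmp.name)
```

Write errors are deliberately left uncaught in this sketch, so a device that refuses writes will stop the run with an exception instead of a FAIL line.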
Re:Heh by Anonymous Coward · 2012-12-23 08:03 · Score: 0

So how about it FOSS devs, here are the requirements: Bypass SMART, do a single R/W cycle, report results. That's ALL it has to do and so far nobody has stepped up to the plate. Damned near every shop I knew including mine had bought a copy of Spinrite so there is good money to be made there if you are willing to put in the work, it's a niche but it's a niche with money, builders, repair shops and gamers would all love to hand you money for this tool, so get on it and report back when it's done, okay?
I guess you've never heard of dd?
Re:Heh by greg1104 · 2012-12-23 08:10 · Score: 4, Interesting

SMART is a part of the modern drive's firmware. You can't bypass it. Anyone who tells you otherwise--such as the makers of Spinrite--is lying to you in order to sell a product.
The quality of SMART implementation varies significantly based on the manufacturer. Anecdotally, I have 3 failed Western Digital drives here that flat out lie about the drive's errors. Running the tool needed to generate an RMA does a full SMART scan of the drive, remaps some bad sectors, and then says everything is good. But it's not--each drive is still broken, in a way the firmware seems downright evasive about. Try to use it again, it doesn't take long until another failure. It does seem like the sole purpose of SMART and its associated utilities on WD drives is to keep people from returning a bad drive, by providing a gatekeeper in that process that never says there's a problem.
Most of my serious installations avoid WD drives like the plague for this reason. I think that Seagate's drives are probably less reliable overall than WD nowadays. Regardless I prefer them, simply because the firmware is more honest about the errors that do happen. Drives fail and I plan for that. What I can't deal with is drives that fail but don't admit it.
The reason there is "RAID edition" firmware available is to provide a drive that isn't supposed to be as evasive about errors. It may be that some WD RAID edition models don't have the problem I'm describing. I soured on them as a brand before those became mainstream.
Re:Heh by Culture20 · 2012-12-23 08:26 · Score: 4, Informative

My usual routine when a drive starts to go bad is to back its data up using dd
ddrescue is the tool for backing up a failing drive unless you really want to manually check every failed sector read then restart a new dd (skipping to the next sector).
Re:Heh by fufufang · 2012-12-23 08:42 · Score: 1

That's nice, an OS used by less than 2% of the entire planet has some tool that reports what SMART is telling it, no different than a billion freeware programs for Windows. Just FYI but I can think of about a dozen freeware programs that will do the same damned thing in Windows, INCLUDING the email, so it's not exactly like you got anything to brag about Ms AC.
Now I'm gonna spell out what the REAL problem is, which any guy who has spent time in the trenches will tell you, and that is SMART SUCKS ASS and for several years has been more about covering bad batches for the HDD OEMs than it has been for actually telling you something is going bad. I have had drives in the shop that sounded like an angle grinder bouncing on pavement where SMART said "Nope, nothing wrong here la la la" while the thing just ground and sputtered, it's the most fucking pointless diagnostic tool there is.
What we NEED is a replacement for Spinrite, something that bypasses the lying SMART and just runs a pass of zeroes and ones on the drive and reports a simple pass/fail on the read/writes. Spinrite was fucking brilliant for this, it would give you a layout of the entire drive with red for sectors that failed to report the correct data back and blue for clean so it took just a second to glance at the readout to spot a drive that was buggy out of the box, but nobody has updated the tool in years so it's useless now since it can't do SATA 6 or drives above 500GB.
You mean badblocks? It does exactly what you suggested. However, it can't detect the dynamic bad-sector remapping done by the firmware.
So how about it FOSS devs, here are the requirements: Bypass SMART, do a single R/W cycle, report results. That's ALL it has to do and so far nobody has stepped up to the plate. Damned near every shop I knew including mine had bought a copy of Spinrite so there is good money to be made there if you are willing to put in the work, it's a niche but it's a niche with money, builders, repair shops and gamers would all love to hand you money for this tool, so get on it and report back when it's done, okay?
Re:Heh by koinu · 2012-12-23 08:44 · Score: 2

Why does SMART suck?
When I watch the SMART values and events I can tell about 3 weeks in advance before a hard drive fails. Also, the manufacturers watch the SMART values to check if a replacement can be offered or if you made some mistake.
To me SMART does not lie, but reports too much. It reports every replaced sector which is totally unimportant, especially when you buy a new hard drive, you will find faulty sectors in 50% of cases (quite normal). The hard drive with few faulty sectors on day one will function for decades correctly.
Re:Heh by BLKMGK · 2012-12-23 08:49 · Score: 3, Informative

Not exactly useless... There's a preclear script that many unRAID users use to beat up their drives while monitoring SMART. It doesn't just look at SMART for a thumbs up or down but monitors the various parameters that SMART throws out. Users run this multiple times in a row and find bad drives fairly regularly. I will admit that I've not been running it, but judging from the number of folks who have been finding it useful and from the fact that warranties seem to be getting ever shorter I may begin doing so. I use a decent number of the 3TB drives that are always going on sale and I'm starting to think I'm tempting fate by not testing them. I've gotten spoiled in that my unRAID box covers my ass in the event of a failure but I see too damn many reports of new drives going toes up to not be concerned. I have 3 drives sitting on the shelf waiting to be loaded and I may beat them up beforehand just to be sure they won't screw me when I least expect it...

--
Build it, Drive it, Improve it! Hybridz.org
Re:Heh by Anonymous Coward · 2012-12-23 09:07 · Score: 0

That's nice, an OS used by less than 2% of the entire planet has some tool that reports what SMART is telling it, no different than a billion freeware programs for Windows. Just FYI but I can think of about a dozen freeware programs that will do the same damned thing in Windows, INCLUDING the email, so it's not exactly like you got anything to brag about Ms AC.
Now I'm gonna spell out what the REAL problem is, which any guy who has spent time in the trenches will tell you, and that is SMART SUCKS ASS and for several years has been more about covering bad batches for the HDD OEMs than it has been for actually telling you something is going bad. I have had drives in the shop that sounded like an angle grinder bouncing on pavement where SMART said "Nope, nothing wrong here la la la" while the thing just ground and sputtered, it's the most fucking pointless diagnostic tool there is.
What we NEED is a replacement for Spinrite, something that bypasses the lying SMART and just runs a pass of zeroes and ones on the drive and reports a simple pass/fail on the read/writes. Spinrite was fucking brilliant for this, it would give you a layout of the entire drive with red for sectors that failed to report the correct data back and blue for clean so it took just a second to glance at the readout to spot a drive that was buggy out of the box, but nobody has updated the tool in years so it's useless now since it can't do SATA 6 or drives above 500GB.
You mean badblocks? It does exactly what you suggested. However, it can't detect the dynamic bad-sector remapping done by the firmware.
So how about it FOSS devs, here are the requirements: Bypass SMART, do a single R/W cycle, report results. That's ALL it has to do and so far nobody has stepped up to the plate. Damned near every shop I knew including mine had bought a copy of Spinrite so there is good money to be made there if you are willing to put in the work, it's a niche but it's a niche with money, builders, repair shops and gamers would all love to hand you money for this tool, so get on it and report back when it's done, okay?
Re:Heh by PhunkySchtuff · 2012-12-23 09:09 · Score: 2

Get enterprise series drives, not consumer drives. One difference is the firmware is a lot more up-front about errors, rather than trying to hide them and carry on as if everything is OK.
In a RAID, you're going to want to fail a drive as soon as it starts to play up, whereas the average consumer wants a drive that doesn't turn around and die at the first small error, where it can remap sectors and pretend that nothing happened.
Part of the reason enterprise drives cost more, when they're often the same, or very similar, physical hardware is that the price includes the better warranty...

--
Specialist Mac support for creative pros, Melbourne
Re:Heh by GameboyRMH · 2012-12-23 09:32 · Score: 1

That's sort of the test IMO. They immediately get loaded up with data that's verified in some way afterwards. And I usually run a SMART test early on.

--
"When information is power, privacy is freedom" - Jah-Wren Ryel
Re:Heh by SuperTechnoNerd · 2012-12-23 09:41 · Score: 1

The clicking you hear is the heads loading and unloading (parking). Look at the difference between 'Power cycle count' and 'Load cycle count': they should be about the same. However a deranged drive may load/unload its heads for several reasons: an internal controller reset, spindle speed out of spec, or too many failed thermal re-calibrations. These two parameters are a good sign of how the drive is doing. If they are way different then something's wrong. And some drives just sit in an endless load/unload cycle... Click.. Click... Click. Smash them with a hammer :)
Re:Heh by thegarbz · 2012-12-23 09:53 · Score: 4, Informative

No, not SMART. I did a full range of tests with all suites on top of SMART (surface tests, etc)
The only HDD tool I trust is the ancient one from GRC.
That is absolutely laughable. Spinrite is about as good at interfacing with a modern drive as an old 16-bit DOS program is at squeezing every ounce of performance out of a 64-bit processor. It had it's purpose in its day. These days running it will more likely do more harm than good.
Not to mention that if your drive is at the end of life running a program that is widely known to give it a most horrendous thrashing is probably not a good idea.
Re:Heh by TeknoHog · 2012-12-23 10:10 · Score: 1

Duh, you'll just have to hack the Gibson.

--
Escher was the first MC and Giger invented the HR department.
Re:Heh by Anonymous Coward · 2012-12-23 10:41 · Score: 0

The quality of SMART implementation varies significantly based on the manufacturer. Anecdotally, I have 3 failed Western Digital drives here that flat out lie about the drive's errors. Running the tool needed to generate an RMA does a full SMART scan of the drive, remaps some bad sectors, and then says everything is good. But it's not--each drive is still broken, in a way the firmware seems downright evasive about. Try to use it again, it doesn't take long until another failure. It does seem like the sole purpose of SMART and its associated utilities on WD drives is to keep people from returning a bad drive, by providing a gatekeeper in that process that never says there's a problem.
.
This is why I never buy a "reconditioned" drive.
Re:Heh by LordLimecat · 2012-12-23 10:42 · Score: 2

Not useless, just not a good indicator of a drive NOT being near death. It's a great indicator to confirm that the drive IS dying-- if you see for instance 500 bad sectors, you may want to prepare to replace that drive.
Re:Heh by LordLimecat · 2012-12-23 11:04 · Score: 1

Not sure if you're aware, but Windows (since Vista, I believe) monitors SMART values and will actually pop up a message when your disk is in imminent danger of failure.
Re:Heh by Lennie · 2012-12-23 11:11 · Score: 1

I think he does claim certain people had success with spinrite with SSD.

--
New things are always on the horizon
Re:Heh by Anonymous Coward · 2012-12-23 11:47 · Score: 0

But here in 2009
Come join the rest of us in 2012, it's great here.
Re:Heh by washu_k · 2012-12-23 11:57 · Score: 2

Running spinrite against an SSD is one of the clearest ways of showing that it is complete BS. It will report all sorts of things about the drive that are clearly impossible. It won't error or give no data, it clearly makes things up about the drive.

Another good BS test for spinrite is to run it against a non-ATA drive that is still BIOS accessible. A booted USB flash drive is the best, but something like a modern SCSI/SAS controller works as well. It's clearly impossible for spinrite to access such a device directly, yet it still reports all sorts of things it simply could not see. No errors or blank data, it again makes shit up and displays it.
Re:Heh by Maow · 2012-12-23 12:06 · Score: 1

The problem is the best damned tool ever made for testing drives hasn't been updated in years and now won't work on drives bigger than 500GB, I am of course talking about Spinrite. With Spinrite on lvl 2 you just bypass the firmware and write patterns of zeroes and ones and then read back what it reports, if it's spitting errors right off the bat then you know to send it back. Problem is Gibson hasn't updated the thing since 06 so it can't handle drives bigger than 500GB which makes it all but useless today.
So if anybody has found something that works similarly to Spinrite but works on large drives I too would like to know, I get drives coming in from all over the place with ZERO history here at the shop so I don't know if they've been barely used or thoroughly abused and having a tool I can run on them would be a big help.
Have you contacted Gibson about updating the software?
I'm curious why he hasn't updated it. Has he given any reasons? Would he consider updating it for a commission (although I imagine he'd make a good bit from sales regardless of the initial commission)?
Re:Heh by Anonymous Coward · 2012-12-23 12:38 · Score: 0, Insightful

It had it's purpose in its day.
Apostrophes are hard.
Re:Heh by Anonymous Coward · 2012-12-23 12:44 · Score: 2, Interesting

Agreed. I just recovered a very messed up 120GB drive with GNU ddrescue. It took over 7 days to read, but only lost 300MB of data. Very happy with the results.
Re:Heh by greg1104 · 2012-12-23 13:29 · Score: 1

Of course they claim it works with SSD. Snake oil sales are always driven by "it worked for me!" testimonial claims. That doesn't mean the product was the cause of the change.
Spinrite touches every sector on the drive in a way that gets the drive firmware to re-allocate bad ones. This does something that can be useful for all drives, SSD or not. But the program is not necessary to do so. You can use SMART tests to do the same thing, faster, and without paying for the software. And the claim that it's doing some low-level magic is even more obviously crap when you're running against SSD. It can't possibly know how to communicate with such a drive below the firmware level, yet it still suggests it can.
Re:Heh by Anonymous Coward · 2012-12-23 14:02 · Score: 0

Does throwing it in a degraded raid array and having it rebuild count?
Re:Heh by aNonnyMouseCowered · 2012-12-23 14:32 · Score: 1

Mod parent up. A hopefully not woefully wrong layman's explanation for the program's advantages over plain dd: ddrescue or gddrescue can skip unreadable sectors based on some parameter like number of retries.
By default ddrescue also doesn't zero out or truncate its output. So if during one pass you successfully copied parts 1 to 10, you can resume copying from parts 11 to N. This is a nice feature because a sector that's unreadable during one pass may turn out to be readable during the next pass. I've not used ddrescue yet on magnetic media, but it has saved the data off quite a few DVD/CDs I used for backups before I decided to just dump all my data onto several cheap external hard drives.
There are probably a few other tools similar to ddrescue, available via the Debian/Ubuntu repositories: apt-cache search is your friend.
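To make the skip-and-resume idea above concrete, here is a small Python sketch. It is not ddrescue's algorithm, just the general shape: never truncate the output, remember which blocks are done, and let a later pass retry only the ones that failed. `rescue_pass` and `FlakySource` are hypothetical names, and the "disk" is a simulated source whose listed blocks fail once and then read fine.

```python
import io


def rescue_pass(read_block, out, n_blocks, block_size, done, bad):
    """Copy every block not yet in `done`; update the `done` and `bad` sets in place."""
    for i in range(n_blocks):
        if i in done:
            continue  # already rescued on an earlier pass
        try:
            data = read_block(i)
        except OSError:
            bad.add(i)  # skip it and keep going instead of aborting like plain dd
            continue
        out.seek(i * block_size)
        out.write(data)
        done.add(i)
        bad.discard(i)


class FlakySource:
    """Simulated failing disk: blocks in `fail_once` raise on their first read only."""

    def __init__(self, data, block_size, fail_once):
        self.data = data
        self.block_size = block_size
        self.fail_once = set(fail_once)

    def read_block(self, i):
        if i in self.fail_once:
            self.fail_once.remove(i)
            raise OSError(f"simulated read error at block {i}")
        return self.data[i * self.block_size:(i + 1) * self.block_size]


BLOCK = 4
original = b"AAAABBBBCCCCDDDD"
src = FlakySource(original, BLOCK, fail_once={2})
out = io.BytesIO(bytes(len(original)))  # pre-sized output that is never truncated
done, bad = set(), set()

rescue_pass(src.read_block, out, 4, BLOCK, done, bad)
first_pass = out.getvalue()  # block 2 is still zero-filled at this point
first_bad = sorted(bad)

rescue_pass(src.read_block, out, 4, BLOCK, done, bad)  # retries only block 2
print(first_bad, sorted(bad), out.getvalue() == original)
```

The real tool keeps this state in a file on disk so a rescue can be interrupted and resumed later; the two in-memory sets here are a simplified stand-in for that.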
Re:Heh by rev0lt · 2012-12-23 15:30 · Score: 1

On the system of a real computer user, every disk has a line like this in /etc/smartd.conf: /dev/disk/by-id/ata-COMPANY_MODEL_SERIAL -a -d sat -n never -m root@intranet.myhomenetwork -M diminishing -s (L/../../5/17)
On the system of a real computer user that actually CARES about data, you'd have at least a good RAID setup or something like ZFS. And not only is smartd not part of most real-world operating systems (it's a package), but SMART info is also usually lacking when it comes to preventing any kind of failure. I actually have on my desk several broken disks with pristine SMART data.

Doing that part to keeping your data safe doesn’t cost any relevant effort at all.
Maybe, but you actually did nothing in this regard. It is way more useful (but still dumb as fuck) to scan the system log for ATA CRC and timeout messages (ever seen those?) than just gathering SMART data. Another point is that even if you actually get relevant SMART errors, most of the time you are already too late.
Re:Heh by Pentium100 · 2012-12-23 15:35 · Score: 3, Interesting

MHDD works best for me for testing the drive. Spinrite (and ddrescue) is good for data recovery, but not that good for testing. I had one drive that had a lot of sectors that were good, except that the drive took 10-30 seconds to read them, making the PC extremely slow (Windows would drop to PIO mode and be slow even when reading the good sectors). Chkdsk didn't detect anything, Spinrite didn't detect anything, only MHDD showed lots of slow sectors (I later made a list and manually marked them as bad; getting a 2.5" IDE drive is not that easy or fast, so it will have to do until then).
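The slow-sector case is easy to miss because every read eventually succeeds, so a plain pass/fail surface test sees nothing. A sketch of the timing idea in Python follows; this is not how MHDD works internally, just the general principle of timing each read and flagging the outliers. `time_blocks`, `slow_blocks` and `FakeDisk` are hypothetical names, and the latencies are invented so the example stays deterministic.

```python
import time


def time_blocks(read_block, n_blocks, clock=time.perf_counter):
    """Read each block once and return the elapsed seconds per block."""
    timings = []
    for i in range(n_blocks):
        start = clock()
        read_block(i)
        timings.append(clock() - start)
    return timings


def slow_blocks(timings, threshold_s):
    """Indices of blocks whose read took longer than `threshold_s` seconds."""
    return [i for i, t in enumerate(timings) if t > threshold_s]


class FakeDisk:
    """Simulated disk: each read advances a fake clock by that block's latency."""

    def __init__(self, latencies):
        self.latencies = latencies
        self.now = 0.0

    def clock(self):
        return self.now

    def read_block(self, i):
        self.now += self.latencies[i]
        return b""


# Invented latencies in seconds: block 2 behaves like one of the 10-30 s sectors.
disk = FakeDisk([0.010, 0.012, 15.0, 0.011])
timings = time_blocks(disk.read_block, 4, clock=disk.clock)
slow = slow_blocks(timings, threshold_s=1.0)
print(slow)
```

Against a real drive, `read_block` would have to read from the raw device with OS caching out of the way, otherwise the second run times the cache and not the disk.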
Re:Heh by Golddess · 2012-12-23 16:03 · Score: 1

Running the tool needed to generate an RMA
This must be something new. Like, within the last few months new. Because out of the few consumer-grade Western Digital drives I've had to RMA over the years (one from May of this year), not once did I need to run any sort of program before I could generate the RMA and ship the drive back. Nope, it was as simple as entering the serial number in a form on their website, verifying that the drive was under warranty, and clicking "Generate RMA".

--
"I'm not sure I like the fugnutish tone you used in your post!" -RogL (608926)-
Re:Heh by Nutria · 2012-12-23 16:40 · Score: 2

Computer hardware is cheap
Relative to 10 years ago, but $150 here, $100 there and $75 somewhere else add up for an impoverished college student, or a middle class family with other expenses out the wazoo to pay.

--
"I don't know, therefore Aliens" Wafflebox1
Re:Heh by 1s44c · 2012-12-23 17:15 · Score: 1

To me SMART does not lie, but reports too much. It reports every replaced sector which is totally unimportant, especially when you buy a new hard drive, you will find faulty sectors in 50% of cases (quite normal). The hard drive with few faulty sectors on day one will function for decades correctly.
If your new drive has reallocated sectors it's broken and will likely fail soon. It's unlikely to survive decades if it's in actual use.
Re:Heh by toddestan · 2012-12-23 17:28 · Score: 2

I wouldn't ignore it. While SMART saying everything is okay doesn't mean much, SMART telling you that there is a problem is a definite reason for concern.
Re:Heh by toddestan · 2012-12-23 17:37 · Score: 1

Because even if they are cheap and you have backups, replacing a bad hard drive is a pain in the ass, even more so when it's the system drive? (unless you also use RAID, I suppose).
Re:Heh by the_B0fh · 2012-12-23 19:38 · Score: 1

Had to. On black Friday, I bought 3 seagate 3TB drives from Newegg ($80 each, had to!!!! :))
1 was DOA.
1 died after 2 days
1 still going strong after 4 weeks of nonstop writes and deletes.
Called newegg up, and returned the 2 bad ones. Ugh, seagate really really sucks now.
Re:Heh by hairyfeet · 2012-12-23 22:16 · Score: 1

He's never said but I'm guessing he's retired now, he hasn't updated anything since 06 so there ya go.
And all this PITA GNU-Tool crap doesn't cut it, I need something I can throw in a drive, hit a button and move onto the next, I just don't have time to sit and baby the damned thing and I couldn't care less about the data on the drive, all I want to know is if the drive is good or not, a simple pass/fail on the drive.
So whether Spinrite was snake oil on its data recovery? Fuck if I know, never used it for that. But I can tell you that lvl 2, where it just bypassed the SMART and did a simple read/write and report, worked fucking brilliantly. I could pop that into a drive, hit a single button and move on, and when I came back I would have a simple graph where you could see instantly whether it's a pass or fail.

--
ACs don't waste your time replying, your posts are never seen by me.
Re:Heh by hairyfeet · 2012-12-23 22:23 · Score: 1

Look, all I can tell you is what I've seen with my own two peepers: drives where SMART was saying "la la la, nothing wrong, la la la" that I'd run Spinrite on and the screen would be RED as blood from all the bad blocks on the thing. I could run a single pass with Spinrite and tell you in less than 10 minutes if a drive was good or if it was shit and I have yet to find a single tool that will do that today. As you pointed out all SMART is today is a con by the OEMs to keep from paying for RMAs, I too have had drives that were sounding like you'd dropped them down a flight of stairs and WD Tools would just make the drive lie its ass off, that's all.
So what we need is a tool that does the same thing Spinrite did, but on modern drives, just something I can throw into a machine, hit a single button, and come back in an hour and see whether it's a good drive or a shit drive, that's all. All I've been able to find is either a bunch of CLI crap you have to baby and keep typing shit into or GUI tools that just spit back whatever SMART tells them, neither is any use in a shop where you got 40 drives sitting in a box you need to test.

--
ACs don't waste your time replying, your posts are never seen by me.
Re:Heh by hairyfeet · 2012-12-23 22:38 · Score: 1

Because more recent drives, say made in the last 3 years (after Seagate started having major fails in the over 640GB drives) are frankly more about CYA of the OEMs so they don't have to pay for RMAs than it is for giving you useful data. read the post above you about the guy running WD Tools and having the drive supposedly "fix itself". I have seen this exact same behavior with SeaTools and most of the time the OEMs will insist you run these tools before giving you an RMA.
In a way it's similar to the horseshit HP pulled when Nvidia had bumpgate: they would demand you run this "BIOS update" that only set the fan to run at 100%, which would make the laptop sound like an F15 taking off, but they wouldn't let you send it back because of noise, and the high revving fan would keep the chip from failing until a week or two after the warranty went out. It's the same thing with the new drive tools: instead of giving you an honest report it basically takes the data SMART is giving, which is often bullshit as I've found some of the newer drives won't even report remaps until they get above a certain "threshold", and then tells it to just use up all the extra sectors and ignore updating the SMART data. What you get is a drive that could be making noise like an angle grinder but which will report "100%" under SMART.
To use a /. car analogy it would be like an automobile manufacturer finding out its cars leaked oil but rather than have a recall simply update the car's firmware to say the car was fine until the warranty ran out. SMART was SUPPOSED to be a VERY accurate way to gauge a drive's health, but for the past 3 years or so it's just been a way to turn down RMAs.

--
ACs don't waste your time replying, your posts are never seen by me.
Re:Heh by dhaen · 2012-12-23 22:40 · Score: 1

I've deployed hundreds of enterprise Seagate drives over the last 5 years as they're my maker of choice. I stick to these as I trust SeaTools. Overall the failure rate has been lower than expected, none failed prematurely and one was D.O.A - a 3TB drive. Whilst I haven't had cause to graph the failures, I'm pretty sure the failures are closer to random than to a bathtub curve.
Re:Heh by richy+freeway · 2012-12-23 22:46 · Score: 1

Had to do a recovery for a customer once. Used ddrescue as I always do. Took literally 9 months of solid recovery but it worked, got 100% back.
Re:Heh by Anonymous Coward · 2012-12-23 22:59 · Score: 0

Now I'm gonna spell out what the REAL problem is, which any guy who has spent time in the trenches will tell you and that is SMART SUCKS ASS and for several years has been more about covering bad batches for the HDD OEMs than it has been for actually telling you something is going bad
I wish I had mod points..
Remember the IBM 'Deathstars'?, the fault there was SMART related(at least on the examples that passed through my hands), and I've never quite trusted SMART since, which leads me to Spinrite.

What we NEED is a replacement for Spinrite, something that bypasses the lying SMART and just runs a pass of zeroes and ones on the drive and reports a simple pass/fail on the read/writes. Spinrite was fucking brilliant for this
Agreed,
Now, I've seen what I regard as an irrational hatred for Spinrite appear elsewhere in comments above/below (It's more probably a manifestation of hatred of Steve Gibson, but hey, let's not confuse the man with his software or any hyperbole he generates ).
Can Spinrite-as-is (or a modern rewrite) hope to cope with all the 'bespoke' BS that drive manufacturers put in their firmware nowadays? probably not (which is no doubt a factor in why Spinrite hasn't been updated for quite some time now). All I'll say here is that 'back in the day' I used it to recover irreplaceable research data from a couple of 'dead' hard disks, so if it is/was 'snake oil' as some other commentards assert, then it was bloody good snake oil. (Yes, yes, other tools may have done the same job, point here is, so did Spinrite).
Ok, so I last used Spinrite about 10 years ago, I've not even looked at the GRC site for about 5 years (till today), and nowadays I don't even bother trying to keep faulty drives going and I mostly use ddrescue/dd_rescue for simple data recovery, that's not the point, back then, Spinrite was available, claimed to do the job, and worked.
Back to the IBM 'Deathstars': when they started failing in the systems I was unfortunate enough to be looking after at the time (back in 2001/2002), I pulled them from service, then let Spinrite do its thing on them (on the basis of, 'they're fscked anyway, what harm can it do?'). I recovered some critical data and got about another 5 years of 'non-critical' use out of them (running on machines with SMART disabled). 10 years on from the time it 'failed', I still have one of these 'Deathstars' lurking in a drawer which still works (at least, it did 8 months ago, when I last fired it up in the spirit of 'does-this-old-POS-still-work?').
How I wish I could find a similar 'snake oil' which would do as good a job for my lower back pains..
Re:Heh by jon_doh2.0 · 2012-12-23 23:34 · Score: 2

God, the grammar Nazis are breading on Slashdot. Fuck off!
Re:Heh by Anonymous Coward · 2012-12-23 23:39 · Score: 0

And you're feeding them bread to help them breed?
Re:Heh by yoshi_mon · 2012-12-23 23:57 · Score: 1

What I've observed is that there seems to be an uptick in the current failure rate of all drives. Both WD and Seagate have plenty of issues when you read the reviews on Newegg.
My personal theory, and I have 0 facts to back this up so take it for what you will, is that they have salvaged what they could from the fab plants that went under water. They then gave the best of their stock to the OEMs and are selling the rest as over the counter OEM/retail.
Their reasoning being that while their OEMs have contracts/constant cash flow/etc, end users who buy a drive here or there they could care less about. If a drive fails well you get to ship it back to them, on your dime, and cross your fingers that what they send you back will work any better.
My hope is that this will clear up after they burn through the stock of salvaged parts...but given that the market has been allowed to become a duopoly I'm not sure what to say.

--

Really, I know what I'm doing...Ohhhh, look at the shiny buttons!
Re:Heh by Anonymous Coward · 2012-12-24 00:18 · Score: 0

I second that. Or dd_rescue, which sounds like the same thing and does function similarly, but is a different implementation.
Another very useful tool is testdisk, which can operate on the drive itself or more safely on a disk image made with ddrescue or dd_rescue. I used testdisk to retrieve several hundred GB of files from an almost brand new disk that had a load of bad blocks. A friend had just backed up all their files to an external drive, deleted the original, and then planned to copy things back after a reformat of their main system drive when they discovered the external had gone bad. Talk about bad timing. Lesson learned. They now make two backups. Anyway, it took weeks to read and re-read the files from the disk, and we didn't get 100% of it back, but we got over 90% of the files.
Re:Heh by unitron · 2012-12-24 00:37 · Score: 1

My usual routine when a drive starts to go bad is to back its data up using dd
ddrescue is the tool for backing up a failing drive unless you really want to manually check every failed sector read then restart a new dd (skipping to the next sector).
dd_rescue
on the MFS Live cd v1.4 has saved me in the past after freezing the bad drive and then running with the -r switch and a default of 512 and a fallback of 1.

--
I see even classic Slashdot is now pretty much unusable on dial up anymore.
Re:Heh by Richy_T · 2012-12-24 01:06 · Score: 1

I RMAed some drives last month and the way you describe is the way it is. Though I think it did seem to imply you would need to use the tool at some point, it didn't actually ask for the results (Though I printed them out and stuffed them in the boxes anyway)
Re:Heh by Electricity+Likes+Me · 2012-12-24 01:23 · Score: 1

This is asking for the wrong solution to the wrong problem.
If you don't care about the data on the drive, then this is what RAID is for - what it was designed for. High availability. If the drive fails, you don't care and you just replace it.
Spinrite and related tools are answering the question users end up with though which is "I know I should've backed up, but I never got round to it..."
Re:Heh by the_B0fh · 2012-12-24 01:55 · Score: 1

I'm willing to believe "enterprise" seagate drives have a different QC/design.
The consumer variants really and truly suck.
But having these suck means they're not getting my money when I perform enterprisey buys at a $7b company.
Re:Heh by drinkypoo · 2012-12-24 02:01 · Score: 1

Had to do a recovery for a customer once. Used ddrescue as I always do. Took literally 9 months of solid recovery but it worked, got 100% back.
Congratulations, you just discovered what Raspberry Pi is good for. I'm not aware of a cheaper option when you consider power costs. You could use pogoplugs or dockstars but you'd still want a power supply solution to avoid proliferation of wall warts and unless you decased them you'd still end up with a larger solution with the tiniest ones.

--
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
Re:Heh by drinkypoo · 2012-12-24 02:05 · Score: 1

I use a decent number of the 3TB drives that are always going on sale and I'm starting to think I'm tempting fate by not testing them.
I got a couple of goflex drives and was gratified to learn that they have barracuda drives inside, which is nice because I haven't had a problem with one since the original black ultra-narrow-scsi half-height 3.5" case heaters. But I got a couple of them literally; one gets backed up to the other and, since I don't have a good place to put it offsite, the backup goes into a fire safe. The only way you're tempting fate is if you're not making backups. Yes, buying disks in pairs increases the cost, but what are you going to back up to?

--
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
Re:Heh by drinkypoo · 2012-12-24 02:09 · Score: 1

Because more recent drives, say made in the last 3 years (after Seagate started having major fails in the over 640GB drives) are frankly more about CYA of the OEMs so they don't have to pay for RMAs than it is for giving you useful data.
This has always varied widely drive-by-drive, not just manufacturer-by-manufacturer. Some drives have NEVER had useful SMART diagnostics (as in, during the time period since people started implementing it) and some always have had. If a manufacturer makes a lemon they'll probably make it lie. You have no way to know save by buying disks and doing stress tests.

--
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
Re:Heh by richy+freeway · 2012-12-24 02:11 · Score: 1

I have a Debian box dedicated to the job with hot swap drive bays and loads of storage for imaging too. Massive UPS could keep it all going for a good few hours before it gave up. :) ddrescue truly is a fantastic tool. Just mounted the partition image and pulled all the files out. Never thought it would work at all but when it's going for the first month and not getting in the way it's impossible to give up on it :)
Re:Heh by Anonymous Coward · 2012-12-24 02:50 · Score: 0

It's not a backup if you intend to delete the original
Re:Heh by hairyfeet · 2012-12-24 03:09 · Score: 1

Apparently you don't understand the problem so I'll spell it out. I have a shop, in this shop LOTS of drives are coming from all over the place. Now I don't care about the data because they are gonna be wiped and put in systems, okay? What I NEED is as simple a tool as possible, one button is fine, no button is better, that I can pop into a big old tower I have sitting in the corner with a pile of drive cages in it so that it can give me a simple pass/fail on the drives, THAT'S IT.
So you see your "solution" is worth less than nothing, I'm not gonna RAID a bunch of strange drives and then try to sell the mess, okay? with Spinrite it didn't matter if it were one drive or a half a dozen i could just pop in the disks, tell it to do a single pass on all of them, and when the tests were done all I had to do was pop the down arrow and it would give me a simple layout of the drive with red marking bad sectors and blue marking clear, easy peasy and I could tell at a glance which drives were good and which were shit and THAT is what I need, not RAID.

--
ACs don't waste your time replying, your posts are never seen by me.
Re:Heh by Anonymous Coward · 2012-12-24 05:13 · Score: 0

I saw nothing about any burn in tests in the GP post. The guy has a couple of scripts running to ensure that A) he is made aware of impending hard disk problems, and B) his data is backed up in the event of a hard disk problem.
Reading comprehension 101, available at a community college near you.
Unless, of course, you're just trolling a Linux user. In which case, feel free to continue making a fool of yourself.
Speaking of reading comprehension, the people before you apparently have trouble understanding that the summary is asking slashdot about NEW drives, not ones which have data on them. It's a pretty simple question- do you test prior to using a new drive?
Personally, I run a tool which just does a complete format (write/read the entire disc) and watch the SMART performance. Most drives will have some failed sectors from the manufacturing process, and can develop a few more during shipping, so it's usually well worth the wait. As long as it doesn't show an increasing problem, it's probably going to be fine.
But then again I'm not storing Enterprise data or using Enterprise hardware, just normal consumer grade stuff. At work we have a series of automated tools (which you can usually get from the manufacturer if your place of business has a good support contract) that we run on each new drive prior to putting them into production. We have a test lab set up where we install the new drives and run stress and performance tests, etc. Basically we thrash the holy fuck out of the drive in every possible way for about a week, using a combination of generic testing patterns/tools as well as some stuff which simulates actual behavior of the system we're going to put it in, and if it holds up to the abuse we clear it for installation in the production racks.
Re:Heh by Reziac · 2012-12-24 05:28 · Score: 1

What would you recommend as a live-ISO to use with ddrescue already included, specifically for folks who are NOT Linux-savvy?
I've used an old version of Ghost to copy a sick drive... it did thousands of retries on bad sectors, took ~24 hours to copy a few GB, but got it all -- with NO errors found in the resulting disk image. I was amazed.

--
~REZ~ #43301. Who'd fake being me anyway?
Re:Heh by BLKMGK · 2012-12-24 05:45 · Score: 1

Well... I have about 20TB worth of data, mostly static media, but many many hours worth of work building it. This isn't something I can stick on a portable and lock away I'm afraid. I could certainly back up my music to such a drive, and I do, but the rest? Yeah not so much. I do need to back up the drive holding my ESX VMs but that's not looking like cake either. I'm protected from a single drive failure and I've had them more than once. If I lose two drives I lose the data on those two drives, nothing more (unRAID). A lightning strike or fire would suck to say the least...

--
Build it, Drive it, Improve it! Hybridz.org
Re:Heh by Anonymous Coward · 2012-12-24 06:34 · Score: 0

And that person responded to someone saying that they didn't run burn in tests. Learn to read.
Re:Heh by Culture20 · 2012-12-24 07:05 · Score: 1

Dunno, I made my own fedora spin with ddrescue, gparted et al. If you can stomach the newer Ubuntu live CDs, it's easy to apt-get ddrescue to the live OS. Just make sure you have a flash drive or a second (third)HDD to store the log file. Without the log file, you can't retry sectors or restart copying only uncopied or failed sectors.
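The log file point is easier to see in a two-pass run. What follows is a minimal sketch, assuming GNU ddrescue (not the separate dd_rescue tool). The helper names rescue_image and run, the DRY_RUN switch, and all device and file paths are placeholders made up for illustration, not anything from the thread.

```bash
#!/usr/bin/env bash
# Sketch only: two-pass imaging with GNU ddrescue.
# rescue_image, run and DRY_RUN are made-up helper names for illustration.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# rescue_image SOURCE_DEVICE IMAGE_FILE MAP_FILE
# Keep IMAGE_FILE and MAP_FILE on a different, healthy disk.
rescue_image() {
    local src=$1 img=$2 map=$3
    # Pass 1: grab everything that reads cleanly; -n skips the slow scraping phase.
    run ddrescue -n "$src" "$img" "$map" || return 1
    # Pass 2: reuse the same map file so only the failed areas are retried,
    # with direct disc access (-d) and up to 3 retry passes (-r3).
    run ddrescue -d -r3 "$src" "$img" "$map"
}

# Example (placeholder paths):
#   rescue_image /dev/sdX /mnt/usb/disk.img /mnt/usb/disk.map
```

Because both passes share one map file, an interrupted run can be restarted with the same arguments and should pick up where it stopped instead of re-reading the whole drive.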
Re:Heh by digitalsolo · 2012-12-24 07:23 · Score: 1

God, the grammar Nazis are breading on Slashdot. Fuck off!
Pretzel breading or just a simple flour breading?

--
Just another ignorant American.
Re:Heh by hairyfeet · 2012-12-24 09:38 · Score: 1

But then we are right back at square one and needing a "new Spinrite" which we sadly don't have.
Frankly i don't give a shit if Steve Gibson was waving chicken bones when it came to the data recovery crap, it did ONE job very very VERY well that I have yet to find a tool to replace it with and the job was thus: You could slap a half a dozen drives into a box, slap in Spinrite, push a SINGLE BUTTON to have it check ALL the drives fully automated, and at the end of it just by pressing the down arrow i could look at a VERY easy to read at a glance layout of the drive. Red meant bad sector, blue meant good.
With a tool THAT simple and painless frankly it made testing a box of 40 odd sized drives a pleasure, no other tool i have ever found will do that one job, not one. if you are a programmer or know one this would be a niche you could make damned good money in, there are plenty of guys like me that would be happy to hand you $20-$40 for a disc that did what I just described. as it is I still use Spinrite on the under 500GB drives just because its so damned much easier to have a big old open sided ATX box with a SATA controller and a couple of extra SATA and IDE slots sitting in the corner where I can just slap a half a dozen drives in the cages at a time and have them all be tested while I'm working on other things.

--
ACs don't waste your time replying, your posts are never seen by me.
Re:Heh by suss · 2012-12-24 10:44 · Score: 1

Spinrite crashing on big disks is supposedly because of a bug in freedos (which it boots from) and should be fixable by replacing it with ms-dos...
Re:Heh by jon_doh2.0 · 2012-12-24 11:42 · Score: 1

lol. You sure got me.
Re:Heh by Anonymous Coward · 2012-12-24 12:29 · Score: 0

Trinity Rescue Kit
Comes with ddrescue and smartmontools, along with a lot of other useful software for working with Windows systems.
Re:Heh by Anonymous Coward · 2012-12-25 02:56 · Score: 0

Add to the above:
HDD tools are useless. I recently tried a bunch of them - they all reported my HDD in perfect condition... while it was doing the click of death. HDD failed within a week.
Drive failures are mostly non-linear. Just like heart attacks. See Black Swan
Re:Heh by Reziac · 2012-12-25 04:26 · Score: 1

An AC coughed up this:
Trinity Rescue Kit -- http://trinityhome.org/
Comes with ddrescue and smartmontools, along with a lot of other useful software for working with Windows systems.
Sounds quite useful. Thanks to both of you!

--
~REZ~ #43301. Who'd fake being me anyway?
Re:Heh by RockDoctor · 2012-12-25 09:16 · Score: 1

getting a 2.5" IDE drive is not that easy or fast, so it will have to do until then).
What? Which benighted part of the world are you stuck in that you can't readily find a 2.5" hard drive? Without more than 3 words of the local language, I could find a laptop hard drive in Dar es Salaam within 4 hours of starting to look, on a Sunday, fresh off a plane.
Or are you just extremely poor? (Nothing wrong with that, but it does impose its own constraints which have nothing to do with the general availability of hardware in your region.)

--
Birds are not dinosaur descendants;birds are dinosaurs, for all useful meanings of "birds", "are" and "dinosaurs"
Re:Heh by Pentium100 · 2012-12-25 10:21 · Score: 1

Oh, the stores have lots of 2.5" SATA drives, but no 2.5" IDE drives. While it is possible to use an adapter to connect a SATA drive to an IDE bus, that adapter does not fit inside a laptop along with the SATA drive.
No point in buying an old used drive, since it too may be days before failure. I could most likely get a NOS drive on ebay, but shipping from another country takes time (especially around Christmas).
Also, the laptop was not mine, and the owner did not want to spend money on a new drive and wait for it to arrive.
Re:Heh by RockDoctor · 2012-12-25 11:00 · Score: 1

OIC.
Hmm, yes, 2.5in IDE drives could be a problem. There are antique companies that deal with flint-knapping and semaphore flag stitching too, I'm sure. And yes, I've had to nurse along antique hardware too. Not fun. Something to be done for as long as necessary to recover data from the machine and then replace it with something more maintainable. There are times when a bullet in the back of the head really is the kindest thing.
And if your Boss doesn't understand that, it's time to give him the task of fixing the problem himself, or of finding a replacement for you.

--
Birds are not dinosaur descendants;birds are dinosaurs, for all useful meanings of "birds", "are" and "dinosaurs"
Re:Heh by Pentium100 · 2012-12-25 12:53 · Score: 1

My boss understands it, this however, was not for him, but for a friend of the family. I made the hard drive work, told him that the drive can fail at any time now and said that the next time he most likely will need a new drive as this one will have failed completely. The laptop does not hold any valuable data, it is just used for repairing cars, even if the drive fails, he will only have the inconvenience of the time it takes me to find a working drive and install Windows.
And I got some experience and the knowledge of how to manually mark clusters as bad in FAT32 filesystem.

And yes, I've had to nurse along antique hardware too. Not fun.
In some cases it is even more fun than playing with brand new hardware. I would not have repaired a motherboard with a 286 if it wasn't fun.
Re:Heh by RockDoctor · 2012-12-26 07:26 · Score: 1

Ah ; essentially a hobby system rather than a mission-critical system. Fun enough for Saturday between lunch and pub.
Try some industrial back-plane computers running ROM-installed software (no source code available) on 8088s. They still do the DAQ job ... but one day the last will fry and my former employer will have to build a new system. S.E.P. : someone else's problem.

--
Birds are not dinosaur descendants;birds are dinosaurs, for all useful meanings of "birds", "are" and "dinosaurs"
Re:Heh by Pentium100 · 2012-12-26 07:41 · Score: 1

ROM-installed software (no source code available)
Maybe it was written in asm then there would not really be any source...
Those computers can most likely be repaired more easily than modern ones - my 286 has a lot of common chips (mainly 74LS or 74F series) and uses low frequencies (low enough to see with my 100MHz scope) so if a chip fails it is possible that I could replace it.
Yes, your former employer will have to replace the system some time in the future, but it could be in 10 years or maybe he will no longer need that system.
Also, the economy is such that I would not quit unless there were bigger problems than having to maintain old hardware, because it may be that I won't be able to find a job for quite some time.
Re:Heh by Wolfrider · 2012-12-26 08:27 · Score: 1

--Try this:
o Run the manufacturer's testing tool on the drive (SeaTools, etc)
o Boot a Linux CD, such as SystemRescueCd or Knoppix
# blockdev --setra 8192 /dev/sda   (speed up I/O)
# time badblocks -f -c 10240 -n -s -v /dev/sda   (R/W entire disk looking for bad sectors)
--SMART health check and short test:
# smartctl -H /dev/sda   (check overall health)
# smartctl --test=short /dev/sda
--If you're really worried:
# smartctl --test=long /dev/sda
--View results:
# smartctl -a /dev/sda
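For a box full of drives, the badblocks step above could be wrapped in a loop that prints one pass/fail line per drive. This is a rough sketch under assumptions, not a finished tool: verdict and test_drives are made-up names, the device paths are placeholders, and it judges a drive only by whether badblocks wrote anything to its output list.

```bash
#!/usr/bin/env bash
# Sketch only: one PASS/FAIL line per drive, based on whether badblocks
# listed any bad blocks. verdict and test_drives are made-up names.

# verdict LIST_FILE -> prints PASS if the bad-block list is empty, else FAIL.
verdict() {
    if [ -s "$1" ]; then
        echo FAIL
    else
        echo PASS
    fi
}

# test_drives DEVICE...
# Runs the non-destructive read-write mode (-n) on each drive in turn.
test_drives() {
    local dev list
    for dev in "$@"; do
        if [ ! -b "$dev" ]; then
            echo "$dev: ERROR (not a block device)"
            continue
        fi
        list=$(mktemp)
        # -o writes the list of bad block numbers to a file.
        badblocks -n -o "$list" "$dev"
        echo "$dev: $(verdict "$list")"
        rm -f "$list"
    done
}

# Example (placeholder devices, run as root from a live CD):
#   test_drives /dev/sdb /dev/sdc /dev/sdd
```

The drives are tested one after another here, so a full cage of large disks will take a long time; running one loop per drive in the background would be the obvious next step.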

--
.
== WolfriderV6 == I'm willing to admit that *I just might* be wrong... Are you??
Re:Heh by Wolfrider · 2012-12-26 08:33 · Score: 1

--Oh, just one thing to be aware of -- on 1TB+ disks, the badblocks R/W test can take ~13+ hours. However, I run that test on ALL of my new drives - even before formatting them, and have had excellent results. Barring a factory defect, my drives last for years and years. (noatime helps!)

--
.
== WolfriderV6 == I'm willing to admit that *I just might* be wrong... Are you??
Re:Heh by RockDoctor · 2012-12-30 08:25 · Score: 1

ROM-installed software (no source code available)
Maybe it was written in asm then there would not really be any source...
Most likely it was. The programmers who wrote the code were sacked during the next slump - and re-hired in the next boom when updates were needed and no-one could find anything resembling source code, programming notes, or anything. That happened several times over boom-bust cycles until I left (and stopped caring), but I believe that the programmers simply didn't bring their (putative) notes back into the country they were being employed in, having learned to screw the corporation for every penny they could get. "Want something fixed ... that'll be X for the first test version, then Y for each revision cycle afterwards."
Which is the way to treat a bunch of corporate back-stabbers, once they've demonstrated unequivocally what a bunch of scum-suckers they are.
Unfortunately, corporate entities are probably smarter about ownership of code and contracts these days.

--
Birds are not dinosaur descendants;birds are dinosaurs, for all useful meanings of "birds", "are" and "dinosaurs"
Re:Heh by randyleepublic · 2013-01-01 20:36 · Score: 1

>> Spinrite... had its purpose in its day.

Spake the voice of small experience. Spinrite still dependably recovers failing drives. First you use level 2 to clean the drive up enough to recover the data. Then you stress test with level 4. If it passes level 4, use it. Otherwise, toss it and put the data on a new drive. No, this does not work every time - obviously if the drive is clicking or severely overheating you have to take more extreme measures. But it does work often enough to make it still worth having. No, I don't do this with server drives. Yes, I do it with workstation drives regularly.

--
Social Credit would solve everything...
Re:Heh by randyleepublic · 2013-01-01 20:37 · Score: 1

Mmmm. Bread! With butter!

--
Social Credit would solve everything...

No by Anonymous Coward · 2012-12-23 05:24 · Score: 0

No.

No by Anonymous Coward · 2012-12-23 05:25 · Score: 0

No

Re:SSDs by roc97007 · 2012-12-23 05:29 · Score: 5, Insightful

> Who cares about HDDs anymore these days?

Anyone with a need for a massive amount of storage space.

--
Oliver's law of assumed responsibility: If you're seen fixing it, you will be blamed for breaking it.

dban followed by smartctl by X0563511 · 2012-12-23 05:30 · Score: 3, Interesting

If dban can write out every sector and not have smartctl show any pending sectors after the fact (and the average speed of the dban wipe was normal) then you've got good chances the drive will be fine.
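Checking "no pending sectors after the fact" can be scripted. A minimal sketch, assuming an ATA drive whose `smartctl -A` table uses the usual smartmontools attribute names (these vary by drive, and some drives do not report all of them); smart_bad_sectors is a made-up helper name and /dev/sdX is a placeholder.

```bash
#!/usr/bin/env bash
# Sketch only: add up the raw counts of the SMART attributes that usually
# matter after a full-disk write. smart_bad_sectors is a made-up helper name.

# Reads `smartctl -A` style output on stdin and prints the summed raw values
# (10th column) of the reallocated, pending and offline-uncorrectable
# sector attributes.
smart_bad_sectors() {
    awk '$2 == "Reallocated_Sector_Ct" ||
         $2 == "Current_Pending_Sector" ||
         $2 == "Offline_Uncorrectable" { total += $10 }
         END { print total + 0 }'
}

# Example (placeholder device):
#   count=$(smartctl -A /dev/sdX | smart_bad_sectors)
#   if [ "$count" -eq 0 ]; then echo "nothing reported"; else echo "suspect: $count"; fi
```

A zero here only means the drive reported nothing, so it is worth comparing the count before and after the wipe rather than trusting a single reading.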

--
For large sets, this will be our guide even unto death, for the LORD will work for each type of data it is applied to...

Re:dban followed by smartctl by Anonymous Coward · 2012-12-23 05:43 · Score: 0

While I don't know what dban is, the idea of writing every sector and then using smartctl seems to be a good one. I personally usually just do a long format, which works in either windows or linux and, oddly enough has identified a couple of drives early as they didn't survive the long format. Of course, last time I replaced a failed drive in a raid 6 set I just replaced it, counting on the redundancy left in the array to last long enough to make sure the drive was good. Also, the array repair automatically writes every sector...
As far as time goes, well it's not as if you have to pay attention to the testing, so it's not really a delay unless you need that drive online right then...
Re:dban followed by smartctl by bill_mcgonigle · 2012-12-23 05:45 · Score: 5, Interesting

Yes, this. I do it online:

dd if=/dev/zero of=/dev/sdX bs=8M

and then check smartctl. If I'm making a really big zpool, I fill them up and let ZFS fail out the turkeys:

dd if=/dev/zero of=/tank/zeros.dd bs=8M
zpool scrub tank

If I'm building a 30-drive storage server for a client I'll often see 1-2 fail out. Better to catch them now than when they're deployed (especially with the crap warranties on spinning rust these days). I need to order in staggered lots anyway, so having 10% overhead helps keep things moving along.

--
My God, it's Full of Source!
OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
Re:dban followed by smartctl by mathew7 · 2012-12-23 06:30 · Score: 1

I actually do read-write-read tests (dd_rescue because it can keep going after an error), with smartctl in between each and judge by myself what changed. I did it so often that I bothered to make a bash script to do everything and run overnight (I think it takes 3 hours for each step on my 2TB drives).
The idea for the 1st read is to update the "pending" list (if needed), followed by writing to rectify it. A final read to see everything is ok (no pending increase).
Although I have several HDDs, I only had 1 pending sector on a new HDD, and that was on a 2TB 4KB/sector WD green drive when I played with the 63-sectors compatibility jumper. I still have that after 1 year.
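The read-write-read routine described here might look roughly like the sketch below. It is not the poster's script: it swaps in plain dd for dd_rescue, and rwr_test, run, DRY_RUN and /dev/sdX are made-up names for illustration. The write pass wipes the drive, so this is only for disks with nothing on them.

```bash
#!/usr/bin/env bash
# Sketch only: read / write / read passes with a SMART snapshot after each.
# Uses plain dd rather than dd_rescue. rwr_test, run and DRY_RUN are made-up names.
# WARNING: the write pass destroys all data on the device.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# rwr_test DEVICE
rwr_test() {
    local dev=$1
    # Read pass: lets the drive notice weak sectors and mark them as pending.
    # conv=noerror keeps dd going after a read error.
    run dd if="$dev" of=/dev/null bs=1M conv=noerror
    run smartctl -A "$dev"
    # Write pass: overwriting gives the drive a chance to remap pending sectors.
    run dd if=/dev/zero of="$dev" bs=1M
    run smartctl -A "$dev"
    # Final read pass: the pending count should not have grown.
    run dd if="$dev" of=/dev/null bs=1M conv=noerror
    run smartctl -A "$dev"
}

# Example (placeholder device, output saved for comparing the three snapshots):
#   rwr_test /dev/sdX > sdX-report.txt 2>&1
```

Comparing the three snapshots is left to the reader, in the same spirit as judging by hand what changed between steps.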
Re:dban followed by smartctl by Anonymous Coward · 2012-12-23 06:46 · Score: 0

>> While I don't know what dban is,
Really? I just found this to be kind of odd in a comment on /.
Re:dban followed by smartctl by simplypeachy · 2012-12-23 07:26 · Score: 1

Amen to that! I tend to let the disk acclimatise at room temperature if it's been sat in a cold warehouse, and then run the short, long and conveyance SMART tests. Assuming they don't report any errors I dban it with the PRNG stream with verify all passes, at least three rounds, then check all of the SMART attributes and error logs again. If it survives that then it might even live until the end of the month! SMART isn't a be-all-end-all, but I tend to accept a disk is faulty if SMART says so.

I don't use vendor-supplied diagnostics any more. I've seen both Seagate and Western Digital disks with very nasty errors (audible faults, hundreds of re-allocated sectors) pass their tests with flying colours.
Re:dban followed by smartctl by Redlazer · 2012-12-23 08:04 · Score: 1

I'm a computer technician, and I'm always looking for new and better ways to test our equipment. Linux is one of my favourite testing tools, but right now I don't have a good way to test hard drives in Linux - I use stress to see if I get any errors or crashes on a suspicious drive. We use a Seagate boot disc for most of our testing though.
I'm interested in using your technique, but I don't understand it well enough to give it a go on someone else's data. Can you explain it a bit please?

--
Guns don't kill people, "with glowing hearts" kills people.
Re:dban followed by smartctl by X0563511 · 2012-12-23 09:10 · Score: 1

Skip the short test - that's all run during the long test. Don't skip the conveyance test though, supposedly that does some additional testing specific to the drives having been transported that the short/long tests do not cover.

--
For large sets, this will be our guide even unto death, for the LORD will work for each type of data it is applied to...
Re:dban followed by smartctl by simplypeachy · 2012-12-23 09:30 · Score: 1

Thanks for the tip. I also read that about conveyance - I assume it checks that all is well after being thrown around in a delivery van :-)
Re:dban followed by smartctl by Anonymous Coward · 2012-12-23 09:32 · Score: 0

Look everyone...the "spinning rust" dork is back.
Re:dban followed by smartctl by Anonymous Coward · 2012-12-23 10:16 · Score: 0

One of the criteria I use when shopping for a new drive is a minimum 3-year warranty. Many drives have only 2 years, so sometimes I have to pick a drive that would otherwise be my second choice. But I've had too many drives fail in the second-to-third year interval. A number have been low-end Maxtor drives, from a decade or so ago, but other vendors' drives as well.
Re:dban followed by smartctl by kasperd · 2012-12-23 12:02 · Score: 1

the idea of writing every sector and then using smartctl seems to be a good one.
In the past I have done a sequential write of all sectors with pseudo-random data, followed by a verification read to ensure that every sector contained the value I had written to it. Besides actually testing most of the surface of the disk, it would also catch any drive lying about its capacity. I don't know if there are hard drives that lie about their capacity, but I have seen it on SD cards.
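A minimal sketch of that write-then-verify pass, assuming a Linux box with GNU coreutils (head -c, tee, sha256sum). The function name write_and_verify is made up for illustration. Instead of a seeded generator, it hashes the random stream on its way out and compares that with a hash of what is read back. It destroys everything on the target.

```shell
#!/bin/sh
# write_and_verify TARGET BYTES
# Fill TARGET with BYTES of random data, hashing the stream as it is written,
# then read the same number of bytes back and compare the two hashes.
# WARNING: overwrites TARGET. Only point it at a blank drive or a scratch file.
write_and_verify() {
    target=$1
    bytes=$2
    written=$(head -c "$bytes" /dev/urandom | tee "$target" | sha256sum | cut -d' ' -f1)
    sync    # flush pending writes before reading back
    readback=$(head -c "$bytes" "$target" | sha256sum | cut -d' ' -f1)
    if [ "$written" = "$readback" ]; then
        echo "OK: $bytes bytes verified on $target"
    else
        echo "MISMATCH on $target" >&2
        return 1
    fi
}
```

On a real disk the byte count would come from something like `blockdev --getsize64 /dev/sdX`. A device that wraps around past its true capacity should show up as a mismatch, since the early sectors get overwritten. Recently written data may be served from the page cache, so the read-back is more honest after dropping caches or re-plugging the device.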

--

Do you care about the security of your wireless mouse?
Re:dban followed by smartctl by 1s44c · 2012-12-23 17:29 · Score: 1

While I don't know what dban is
I've never used it. I normally use shred which appears to do the same.
http://www.dban.org/
Re:dban followed by smartctl by DMUTPeregrine · 2012-12-23 18:38 · Score: 1

Essentially he's writing zeroes to the entire drive (wiping the drive) and then checking SMART to see if any write errors were reported.
The zfs method is similar: it writes zeroes to the pool and then tells zpool to scrub the pool, which is essentially an online fsck. That will report which disks (if any) had write errors.

--
Not a sentence!
Re:dban followed by smartctl by cblack · 2012-12-23 19:26 · Score: 1

dd does a block copy from input to output; /dev/zero is a device in Linux that always reads as zeros. smartctl is a command-line utility (part of the smartmontools package) that can be used to read error logs and counters from hard drives. So he uses dd to write a bunch of zeros to a new drive in blocks of 8MB and then checks to see if the SMART firmware reported any errors.
http://smartmontools.sourceforge.net/man/smartctl.8.html
The zpool/scrub bits are specific to running a zfs pool but could be useful to check a batch of drives at once.
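To make the "check what SMART reported" step concrete, here is a small sketch that pulls one raw counter out of `smartctl -A` output. The helper name smart_raw is hypothetical, and it assumes the usual ATA attribute table where the attribute name is the second column and the raw value the tenth.

```shell
# smart_raw NAME: read `smartctl -A` output on stdin and print the raw value
# (10th column) of the attribute called NAME.
smart_raw() {
    awk -v name="$1" '$2 == name { print $10 }'
}

# Intended use on a blank drive (destructive, so left commented out):
#   dd if=/dev/zero of=/dev/sdX bs=8M
#   smartctl -A /dev/sdX | smart_raw Reallocated_Sector_Ct
```

Comparing the counter before and after the zero-fill is the interesting part: a new drive whose reallocated or pending sector count climbs during its first full write is a reasonable candidate for return.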
Re:dban followed by smartctl by unitron · 2012-12-24 01:11 · Score: 1

Would you care to share that script? asked the fellow dd_rescue user.

--
I see even classic Slashdot is now pretty much unusable on dial up anymore.
Re:dban followed by smartctl by unitron · 2012-12-24 01:14 · Score: 1

Skip the short test - that's all run during the long test. Don't skip the conveyance test though; supposedly it does some additional testing, specific to drives having been transported, that the short/long tests do not cover.
Run the short test.
If it fails that, no need to waste hours on the long one.
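A rough sketch of that short-then-long ordering with smartctl. The function names are made up, and it assumes the drive supports captive (foreground) self-tests via -C and has no older failures sitting in its self-test log. It relies on smartctl setting bit 7 (value 128) of its exit status when the self-test log contains errors.

```shell
# SMARTCTL can be overridden (e.g. with a stub) for dry runs.
SMARTCTL=${SMARTCTL:-smartctl}

# selftest_log_clean DEV: true if the drive's self-test log records no errors.
selftest_log_clean() {
    status=0
    $SMARTCTL -l selftest "$1" >/dev/null || status=$?
    [ $((status & 128)) -eq 0 ]
}

# staged_selftest DEV: quick test first, long test only if that passes.
staged_selftest() {
    dev=$1
    # -C runs the short test in captive mode so the log can be checked right away.
    $SMARTCTL -t short -C "$dev" >/dev/null || :
    if selftest_log_clean "$dev"; then
        # The long test runs inside the drive; it takes hours to finish.
        $SMARTCTL -t long "$dev" >/dev/null || :
        echo "short test passed; long test started"
    else
        echo "short test failed; skipping long test"
        return 1
    fi
}
```

The conveyance test mentioned above would be started the same way with `smartctl -t conveyance /dev/sdX`, and `smartctl -l selftest /dev/sdX` shows the results once a test finishes.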

--
I see even classic Slashdot is now pretty much unusable on dial up anymore.
Re:dban followed by smartctl by mathew7 · 2012-12-24 04:30 · Score: 2

Now that I searched for it, it seems I used dd. But anyway, here it is:
    #!/bin/sh
    hdparm -i $1
    smartctl -a $1
    date
    time dd if=$1 of=/dev/null bs=1M
    date
    smartctl -a $1
    time dd if=/dev/zero of=$1 bs=1M
    date
    smartctl -a $1
    time dd if=$1 of=/dev/null bs=1M
    smartctl -a $1
I run the script followed by "| tee result.txt". In case you want to change to dd_rescue, bear in mind that it outputs a lot of data (progress) which should not be redirected.
Re:dban followed by smartctl by Anonymous Coward · 2012-12-24 04:43 · Score: 0

Please don't give it a go on someone else's data. This technique is for testing drives before they go into service; it wipes the entire drive in the process.
Re:dban followed by smartctl by Anonymous Coward · 2012-12-24 08:36 · Score: 0

I'm interested in using your technique, but I don't understand it well enough to give it a go on someone else's data. Can you explain it a bit please?
Yes. Don't do this on a drive which has data on it. We're talking about testing out new drives prior to putting them into production, not dealing with drives which already contain data.
Re:dban followed by smartctl by Anonymous Coward · 2012-12-24 09:22 · Score: 0

Method described by GP only works for blank drives. It writes zeros to the entire drive and then looks for failed write sectors.
Re:dban followed by smartctl by X0563511 · 2012-12-24 10:46 · Score: 1

Hmm unitron has a good point!

Run the short test.
If it fails that, no need to waste hours on the long one.

--
For large sets, this will be our guide even unto death, for the LORD will work for each type of data it is applied to...
Re:dban followed by smartctl by Anonymous Coward · 2012-12-25 03:36 · Score: 0

filling drive with zeroes (using dd) and then smartctl is the way i do it too...
nice to hear someone else using exactly the same method!

Yes. I mean no. by JustinFreid · 2012-12-23 05:30 · Score: 0

I actually throw mine in a bathtub.
How did this make it to the front page, especially with SSD prices being what they are?

--
Hey, how's it going?

Re:Yes. I mean no. by Anonymous Coward · 2012-12-23 05:48 · Score: 0

i think you are misunderstanding the bathtub failure curve
Re:Yes. I mean no. by Anonymous Coward · 2012-12-23 05:54 · Score: 2, Funny

Let me guess... if it sank to the bottom it was a good drive, but if it floated it was a bad drive and needed to be burnt at the stake.
Re:Yes. I mean no. by geminidomino · 2012-12-23 05:58 · Score: 1

Of course! Fucking witches are getting into everything these days!
Re:Yes. I mean no. by davester666 · 2012-12-23 06:09 · Score: 1

Well, to be sure, if your HDD does float in water, it probably is possessed...

--
Sleep your way to a whiter smile...date a dentist!
Re:Yes. I mean no. by Anonymous Coward · 2012-12-23 06:22 · Score: 0

It's nice that you have 800 dollars to spend on a TB worth of storage, but many of us need more than a couple TB, and don't want to spend upwards of 2 grand for a relatively small storage array.
Re:Yes. I mean no. by nabsltd · 2012-12-23 06:26 · Score: 1

How did this make it to the front page, especially with SSD prices being what they are?
I have a 20TB RAID array that cost me about $0.10/GB, including controllers. If you can afford to build a 20TB array using SSD, you have far more money than I do. You will also need more controllers than I do (port multipliers divide the bandwidth, which you don't want to do for SSDs), since you'd need at least 20 SSDs (if you were willing to pay about $2.50/GB), but more likely more than 45 (at about $0.85/GB).
You also need special controllers that understand SSDs and can pass TRIM commands, and that will add about $0.15/GB. And, you'll need a much more expensive motherboard, since you need at least 24 PCIe lanes that can all be used for something other than video cards, but likely more than 40. Last, since this is Slashdot, you might not be able to use those special controllers, as not all of them have drivers for the kernel version you want to use.
So, yeah, for a boot drive, SSDs kick ass, but for storing your movie collection, not only are they 10 times more expensive than magnetic disks, but they are way overkill as far as performance is concerned.
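The "10 times more expensive" figure can be checked with a few lines of integer arithmetic. This is only a sketch: array_cost is a made-up helper, 20TB is taken as 20,000 GB, and prices are passed in cents per GB to avoid floating point.

```shell
# array_cost GB CENTS_PER_GB: print the total cost in whole dollars.
array_cost() {
    echo $(( $1 * $2 / 100 ))
}

array_cost 20000 10     # magnetic disks at $0.10/GB
array_cost 20000 100    # SSDs at $0.85/GB plus $0.15/GB for controllers
```

That comes to $2,000 against $20,000 before the more expensive motherboard is counted.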
Re:Yes. I mean no. by Anonymous Coward · 2012-12-23 06:43 · Score: 0

>> How did this make it to the front page, especially with SSD prices being what they are?
There are lots of widely different needs when it comes to consumer vs corporate/small business, and even hobbyists who deal with large amounts of data. If SSDs meet all your needs, that's great but you're missing the point about why the OP asked about conventional hard drives.
Re:Yes. I mean no. by PlusFiveTroll · 2012-12-23 06:55 · Score: 1

>So, yeah, for a boot drive, SSDs kick ass, but for storing your movie collection, not only are they 10 times more expensive than magnetic disks, but they are way overkill as far as performance is concerned.
And where performance is concerned the raid of SSDs replaces many many more disks.
Use a sledgehammer to drive railroad spikes
Use a finishing hammer to drive finishing nails.

Normally I don't test them... by Anonymous Coward · 2012-12-23 05:31 · Score: 1

..but I have not had a hard drive fail catastrophically on me.

I do test flashcards, and their survival rate is about 50% :-(. (tar czvf /dev/sdb ..., and another flashcard dead...)

Re:Yes, it's happened. by zidium · 2012-12-23 05:32 · Score: 1

And you don't even tell us the vendor?!

--
Slashdot Valentines Beta Massacre: iT WORKED! The boycotts killed Beta!!

Used to never test by AK+Marc · 2012-12-23 05:32 · Score: 2

My first help desk job included every computer in the company. We had a server drive fail, so I had Compaq send a replacement. The new arrival didn't work. So then I spent more time looking at RAID configuration and such, but we got a second replacement. That one didn't work either. But I tested it on arrival. The third replacement worked fine, just when I was worried it was something stupid I was missing. Two DOA RMAs for the same part. And yes, that's happened to me again since that first time.

I test every "used" part as if it's suspect. The question was about new, but they are still new to me.

--
Learn to love Alaska

Re:Used to never test by Hentes · 2012-12-23 06:54 · Score: 1

Not to mention that some shadier shops tend to resell used or returned parts as new.
Re:Used to never test by PlusFiveTroll · 2012-12-23 07:05 · Score: 3, Interesting

Two DOA of the same part isn't out of the question, a good amount of the time the same part number is from the same batch, which may suffer from the same manufacturing defects. I see things like that pretty often in batches of disks that fall out of RAIDs.
Re:Used to never test by simplypeachy · 2012-12-23 07:28 · Score: 1

The UK's Novatech do this. I've seen Novatech's pre-built backup disks have "recertified" disks from "Magnetic Data Devices" in them. If that doesn't sound dodgy to you, then you should have seen how well they worked :-)
Re:Used to never test by kasperd · 2012-12-23 12:22 · Score: 1

Two DOA RMAs for the same part. And yes, that's happened to me again since that first time.
I have experienced that as well. I won't name the vendor, but I was working for one of their largest customers. At one time I investigated a DOA case in more detail. We had records about all our previous RMA cases, and it turns out I could find the serial number of the DOA part in multiple previous RMA cases. Once I knew this sort of thing was happening, I looked through our records and found many individual serial numbers which we had created RMA cases for more than once.

So it appears that sometimes the faulty part returned in an RMA case is simply put in the stash of new parts to be shipped to customers.

--

Do you care about the security of your wireless mouse?
Re:Used to never test by glsunder · 2012-12-23 14:17 · Score: 1

Way back when I was a tech at a local computer shop, we'd see bad batches of drives. The one that stuck in my mind was 6GB IBM drives for a period of a few months. I think 1/3 of the drives were bad. We tested every system with a variety of tests including drive tests and even winbench, since it worked pretty well at catching flaky motherboards.
Re:Used to never test by gmhowell · 2012-12-23 16:33 · Score: 1

My first help desk job included every computer in the company. We had a server drive fail, so I had Compaq send a replacement. The new arrival didn't work. So then I spent more time looking at RAID configuration and such, but we got a second replacement. That one didn't work either. But I tested it on arrival. The third replacement worked fine, just when I was worried it was something stupid I was missing. Two DOA RMAs for the same part. And yes, that's happened to me again since that first time.
I test every "used" part as if it's suspect. The question was about new, but they are still new to me.
But the third drive burned down, fell over, then sank into the swamp. But the fourth one stayed up. And that's what you're going to get, Lad, the strongest drive in the entire organization.

--
Jesus was all right but his disciples were thick and ordinary. -John Lennon
Re:Used to never test by 1s44c · 2012-12-23 17:34 · Score: 1

Wow, really?
Did you confront the vendor? If so what did they say?
You really should name this vendor so we can all avoid them.
Re:Used to never test by chrispatch · 2012-12-23 20:31 · Score: 1

But the third drive burned down, fell over, then sank into the swamp. But the fourth one stayed up. And that's what you're going to get, Lad, the strongest drive in the entire organization.
But did it have HUGE tracts of storage ?
Re:Used to never test by Anonymous Coward · 2012-12-24 01:20 · Score: 0

..The one that stuck in my mind was 6GB IBM drives for a period of a few months.
I think I know the drives you speak of...
I had a relative who worked with Big Blue at the time who warned me about a batch of dodgy hard disks doing the rounds. They'd found a whole bunch of them faulty during their QA/QC/whatever checks, wouldn't deploy them internally, but somehow these disks eventually found themselves on the OEM market..I remember an idiot at the place I then worked coming in one morning grinning like the Cheshire cat going on about the bargain he'd just gotten..a box of these disks at a very low price.
Oy, did I laugh...
Re:Used to never test by kasperd · 2012-12-28 03:18 · Score: 1

Did you confront the vendor? If so what did they say?
I handed over the information to a colleague, who was handling the interaction with that vendor. So I don't know if the vendor was confronted with this, or what their reaction was. Soon thereafter I transferred to a development project which would get us out of the vendor-lock-in, which we had found ourselves in.

--

Do you care about the security of your wireless mouse?

Anyone actually does this? by tantrum · 2012-12-23 05:33 · Score: 1

I haven't even considered testing my personal hard drives. If they break I try to retrieve whatever is on them, but I just buy new drives instead of spending any amount of time fixing them. I've never returned a disk - I just buy a couple of new ones whenever I need more space.

At work we're using properly configured SANs with 24x7 support, so I couldn't be arsed to test disks there either. We don't have multiple racks of disks, so I don't see any good reason to test everything.

If you're testing new diskdrives you must be really bored or very broke.

Re:Anyone actually does this? by darkHanzz · 2012-12-23 05:40 · Score: 1

It used to be that stress-testing HDs with random disk access for one day could flush out a lot of bad ones. The ones that did survive tended to last many years. It's a tricky thing with RAID drives: if you happen to have bought a 'bad' batch, chances are pretty high that more than one will fail before you replace one. So testing makes sense sometimes. A while ago, Google published some research showing that drives do not fail randomly, but in clusters, making RAID a bit more susceptible to data loss than one might expect.
Re:Anyone actually does this? by Anonymous Coward · 2012-12-23 07:08 · Score: 0

If you can retrieve whatever is on them, they are not broken!
In other words: If they were already broken, YOU DID NOT RETRIEVE EVERYTHING ON THEM. YOU LOST DATA.
If you are OK with losing data, I may call you completely insane, but in the end it's your thing. But don't expect anyone else to be as crazy as you are.
Re:Anyone actually does this? by VortexCortex · 2012-12-23 07:24 · Score: 1

If you're testing new diskdrives you must be really bored or very broke.
If you believe this you must not have many computers...
Re:Anyone actually does this? by tantrum · 2012-12-23 09:24 · Score: 1

I didn't really think I had to mention that I kept backups of my important stuff. Thought that was pretty much what everyone on slashdot did. As a hobby photographer I have monthly backups in 2+1 locations, one of my backups is even in another country :)
I'm ok with losing my porn and pirated movie collection, though
Re:Anyone actually does this? by tantrum · 2012-12-23 09:32 · Score: 1

nah, not running _that_ many at home anymore. Still I don't think I need more of them
2 laptops, two desktops, a mediabox and a fileserver. Only the Fileserver has more than a couple of tb of disks.
Fileserver uses raid 5, just in case. Desktops are striped SSDs, laptops use single SSDs and my mediabox boots from an SD card.
I've got neither the time nor the interest to test my disks beyond replacing them if they break.
Re:Anyone actually does this? by tantrum · 2012-12-23 09:37 · Score: 1

Yeah I looked at the data from google when they published it and it makes sense. I don't really worry about it though.
At home I take backups of the stuff I care about (sourcecode, pictures, video). I've had quite a few disks breaking on me through the years, but haven't lost any data I care about due to a disk that breaks down.
At work we have redundant backups both of data and vms. Backups are on disk inhouse, on disk in a second location and on tape in a second location.
Re:Anyone actually does this? by tantrum · 2012-12-23 09:44 · Score: 1

If you can retrieve whatever is on them, they are not broken!
Yawn.. please try to keep a disk with a bad motor. You might get it to spin up, you'll get your data - but the disk is still broken.
Looking at one of the shelves here in my home office, I spot 3x1tb, 4x500gb and 2x3tb disks that I can pop in if something breaks. The 3tb disks are spares for my FS.
I'm using the 1tb disks for backups(3x3x1tb) and haven't thrown out the 500gb ones as I don't think they're quite old enough for the trash just yet.
Re:Anyone actually does this? by JackieBrown · 2012-12-24 03:11 · Score: 1

Same here. I don't test disks but I do save stuff that is important to me in multiple physical locations.
I am not going to set up a raid plus buy hard drives in pairs just to back up movies. (If I buy 2 hard drives, I want to use both of them for something more tangible for me than backups.) Anything lost, while inconvenient, is not worth the cost and time involved with back ups.

smartmontools by WD · 2012-12-23 05:35 · Score: 5, Informative

Set up the smartd.conf file to do the example short-test daily and long-test weekly, and email you when something is fishy. It's a trivial amount of effort, resulting in a significant amount of peace of mind. (In many cases, you'll have some amount of warning before your drive kicks the bucket and it's too late)
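For reference, a single smartd.conf directive along the lines of the example shipped with smartmontools might look like this. Treat it as a sketch: the file location, the test hours and the mail recipient are assumptions to adjust for your own system.

```
# /etc/smartd.conf (path varies by distro)
# -a      monitor all SMART attributes and logs
# -o on   enable automatic offline data collection
# -S on   enable attribute autosave
# -s ...  short self-test daily at 02:00, long self-test Saturdays at 03:00
# -m root mail warnings to root (placeholder address)
DEVICESCAN -a -o on -S on -s (S/../.././02|L/../../6/03) -m root
```

smartd has to be restarted or reloaded after editing the file, and mail delivery depends on the machine having a working local mailer.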

Re:smartmontools by Deekin_Scalesinger · 2012-12-23 05:58 · Score: 5, Funny

This should be modded up for your username alone lol

--
"As the intrepid kobold companion continues his journey, he begins to wonder... if priests raises dead, why anybody die?
Re:smartmontools by Gaygirlie · 2012-12-23 07:08 · Score: 1

Set up the smartd.conf file to do the example short-test daily and long-test weekly, and email you when something is fishy. It's a trivial amount of effort, resulting in a significant amount of peace of mind. (In many cases, you'll have some amount of warning before your drive kicks the bucket and it's too late)
This is the setup I've used on my server for a while now.
I see all these defrags, fscks and such inferior when compared to S.M.A.R.T. self-tests simply because the drive itself will always know more about its condition than any 3rd-party tools that just try to guess its state via secondary effects, and as such it sometimes baffles me how many people, even in this day and age, ignore S.M.A.R.T. I recommend smartmontools and smartd under Linux and Hard Disk Sentinel under Windows, though HD Sentinel ain't free.
Re:smartmontools by spire3661 · 2012-12-23 07:31 · Score: 1

SMART is not a useful indicator of anything, don't rely on it.

--
Good-bye
Re:smartmontools by Anonymous Coward · 2012-12-23 07:35 · Score: 0

I think the problem is that the short tests don't necessarily write to the disks and some attributes are not updated until the offline data collection completes. Therefore, you need to set SMART to do that as well. That means the long test for the former and enabling automatic offline data collection for the latter. If you don't do that, then theoretically speaking a third party tool would know more.
Re:smartmontools by Anonymous Coward · 2012-12-23 07:44 · Score: 0

This should be modded up for your username alone lol

Did I miss the memo where "lol" replaced "." as a way to end sentences? What are the substitutes for other punctuation marks?
Is it supposed to rain today wtf if so omg I should bring an umbrella lol
Re:smartmontools by Anonymous Coward · 2012-12-23 07:54 · Score: 0

Unfortunately, many USB interface chipsets do not support S.M.A.R.T.
Re:smartmontools by Anonymous Coward · 2012-12-23 09:25 · Score: 1

This should be modded up for your username alone lol
Did I miss the memo where "lol" replaced "." as a way to end sentences? What are the substitutes for other punctuation marks?
Is it supposed to rain today wtf if so omg I should bring an umbrella lol
lmao can substitute for "!", lmfao can substitute for "!!". As in "Just saw a car run off the road lmao" or "and it flipped over 3 times lmfao"
Re:smartmontools by thegarbz · 2012-12-23 10:05 · Score: 1

Maybe you should read up on exactly what the parent has set up. People think of S.M.A.R.T. as a useless online scanning tool which will only report something if there's an issue writing the current data.
What the parent is suggesting is to set up a weekly long SMART test. The drive does a surface scan taking a few hours and reports any problems it finds. This is no different from other suggestions on here such as running badblocks in read-only mode on the entire drive. If it has difficulty reading some sectors it'll report it.
Also, saying it's not a useful indicator of anything clearly shows you don't know what to do with the data. There are many failures that show up in the data in advance. My personal example is that every drive of mine that has died had earlier reported a sharply rising bad sector relocation count. Right there we have a metric covering any number of issues with a drive head or platter due to mechanical degradation.
Sure, looking at drive data once won't tell you anything; trending it over time will.
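Trending can be as simple as logging one number a day and flagging increases. The sketch below assumes a hypothetical log file with one "EPOCH_SECONDS COUNT" pair per line; the function name realloc_rising and the log format are made up for illustration.

```shell
# A daily cron job could append to the log with something like:
#   echo "$(date +%s) $(smartctl -A /dev/sdX | awk '$2 == "Reallocated_Sector_Ct" { print $10 }')" >> realloc.log

# realloc_rising LOGFILE: succeed (and print the change) if the newest count
# is higher than the one logged before it.
realloc_rising() {
    tail -n 2 "$1" | awk '
        NR == 1 { prev = $2 }
        NR == 2 { if ($2 + 0 > prev + 0) { print "rising: " prev " -> " $2; found = 1 } }
        END { exit (found ? 0 : 1) }'
}
```

A cron job could then mail the output whenever the function succeeds, which is roughly what smartd does on its own when it tracks attribute changes.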
Re:smartmontools by Anonymous Coward · 2012-12-23 10:29 · Score: 0

I see all these defrags, fscks and such inferior when compared to S.M.A.R.T. self-tests
You can't compare fsck to SMART. SMART checks the hardware. fsck checks the file system. It's like saying a tire is better than a steering wheel. And defrag doesn't check anything. I'd recommend learning more about computers before posting again.
Re:smartmontools by Anonymous Coward · 2012-12-24 06:35 · Score: 0

I think you missed the memo that explained what lol means
Re:smartmontools by Anonymous Coward · 2012-12-27 03:54 · Score: 0

I think you missed the whoosh over your head.

Exercise the drive by roc97007 · 2012-12-23 05:36 · Score: 1

Yes, if it's a windows box, I run chkdsk /F /R a few times, and defragment the drive after deploy. (Not because it needs it, but for the exercise.) Similar with fsck on linux. If it fails, I want it to fail when the in-store return policy is still in effect, so I don't have to deal with the manufacturer.

But having a returned drive rejected because I repartitioned it or "ran linux"? Never heard of that.

--
Oliver's law of assumed responsibility: If you're seen fixing it, you will be blamed for breaking it.

Re:Exercise the drive by whoever57 · 2012-12-23 06:11 · Score: 1

Yes, if it's a windows box, I run chkdsk /F /R a few times, and defragment the drive after deploy. (Not because it needs it, but for the exercise.) Similar with fsck on linux. If it fails, I want it to fail when the in-store return policy is still in effect, so I don't have to deal with the manufacturer.
Rather ineffective tests.
Use smartctl and schedule long tests. Also try something like:
dd if=/dev/sda of=/dev/sda bs=64k

--
The real "Libtards" are the Libertarians!

Yes, always by Anonymous Coward · 2012-12-23 05:36 · Score: 0

Yes, after four drives failed in their first 48 hours of use. Not a very nice situation.

Re:SSDs by Anonymous Coward · 2012-12-23 05:37 · Score: 0, Insightful

The massive storage requirements cause massive backup time, making a RAID setup of some kind necessary. At which point a dying disk now and then no longer is an issue.

man this is sad by Anonymous Coward · 2012-12-23 05:38 · Score: 0

if you want to develop a testing protocol for hard drives, there are lots of papers.

a common technique is to stress the drive over its rated capacity (i.e. thermal) in order to develop a curve,
so you dont have to wait the years required for it to fail under normal conditions.

if instead you care about operations, you use some mitigation strategy like raid.

dealing with drive failure is bread and butter for this crowd. why does such a lame, useless question end up on slashdot?

betteridge's law of headlines by whoever57 · 2012-12-23 05:38 · Score: 1

betteridge's law of headlines applies here. Hard drives go through extensive calibration before shipping, so the need for burn in doesn't really exist. As for problems with RMAs for hard drives used under Linux, repartitioned, etc. No.

--
The real "Libtards" are the Libertarians!

Re:betteridge's law of headlines by Anonymous Coward · 2012-12-23 07:07 · Score: 0

Hard drives go through extensive calibration before shipping, so the need for burn in doesn't really exist.
Seagate constellations with a consistent 3-5% DOA disagree.
Re:betteridge's law of headlines by simplypeachy · 2012-12-23 07:31 · Score: 1

They are also abused during the supply chain. I've had enough fail under (24hr) burn-in tests that I never let a disk hit production until it has been tested.
Re:betteridge's law of headlines by rrohbeck · 2012-12-23 07:32 · Score: 1

> Hard drives go through extensive calibration before shipping, so the need for burn in doesn't really exist.
Not any more apparently. In our manufacturing line we see a lot of bad block replacements during the first write pass.
When I worked in the HDD field a couple of years ago every drive went through a 24h burn-in before it shipped. That doesn't seem to happen any more.

--
thegodmovie.com - watch it

never had early failure by iggymanz · 2012-12-23 05:38 · Score: 1

manufacturers do a burn-in before shipping, that gets most the early failures. of course, some will still win the lottery and get a crappy early-failure drive but has never happened to me.

Re:never had early failure by hairyfeet · 2012-12-23 08:31 · Score: 2

Then you sir are either the luckiest bastard on the planet or haven't bought any Seagate drives above 500GB, because I've seen so many Seagate 1TB and above drives dead OOTB or very soon after leaving the box that I won't even touch them anymore.
There is a reason this guy is asking this question, its because we are now down to just 2 makers of drives and the Seagates are Russian roulette with your data. Most likely he has seen that the new Seagates are selling for as low as $50 a TB online and wants more space but can see all the horror stories in the feedback and wants some way to help mitigate the risk.
But I'm sorry friend, the only way I've found to mitigate the risk is to avoid Seagate like an STD, even with WD drives often double the price of the Seagate, because while the WDs seem to have about a 1 in 15 failure rate the Seagates depending on the size (1TB-2TB the worst, 3TB better but not great) you are looking at as low as a 1 in 3 chance of failure. With failures THAT high, which frankly I hadn't seen since the big Maxtor mess of 2002, i just would avoid Seagate for anything i gave a shit about as its just not worth the risk.

--
ACs don't waste your time replying, your posts are never seen by me.
Re:never had early failure by Osgeld · 2012-12-23 09:03 · Score: 1

on the other hand of this totally "works for me" story, I have a few 1tb seagate drives without a problem ... WD's on the other hand, every single one I have ever purchased in my life ends up dead within 2 years.
Re:never had early failure by drsmithy · 2012-12-23 09:38 · Score: 1

If seagate drives really had anything close to a 33% failure rate, it'd be plastered all over every bit of vaguely consumer protection oriented media in the world.
Re:never had early failure by hairyfeet · 2012-12-23 22:47 · Score: 1

Dude just look at the feedback on NewEgg and Tiger, NewEgg especially because that is where a LOT of system builders buy their drives and they often buy in bulk. You'll see feedback like "Bought 10 drives, 4 failed OOTB, its been 8 months now and all but 2 are fails and the other 2 are sounding iffy" and "Bought 4, all dead in less than a year" and it goes on and on and on.
Hell, look at the sale prices man, if that don't tell you Seagate has a serious problem I don't know what will! So far I've seen the same drive offered in every single sale since BF last month, and they still have plenty in stock. WD drives were nearly double the price, and they were sold out and backordered within A DAY. When word gets around it don't take long for the system builders to all get wind and start avoiding the bad batches, hell that is how I was able to avoid most of the big Maxtor mess of 2002. Even tiger has to know it, they've been selling Seagate 1TB drives for as low as $50 on sale, but more than 2/3rds of their kits? Seagate 500GB, NOT the 1TB. This time last year you couldn't hardly find any tiger kits that were less than 1TB, even the $150 AMD dual special had a 1TB, not anymore.

--
ACs don't waste your time replying, your posts are never seen by me.
Re:never had early failure by the_B0fh · 2012-12-24 01:01 · Score: 1

during black friday this year, newegg had some seagate 3TBs for sale for $80 each. I bought 3.
1x doa
1x died within 2 days
1x still going strong.
Seagate is the new maxtor.
Re:never had early failure by Richy_T · 2012-12-24 04:32 · Score: 1

I had a 1TB seagate that failed within a month. I haven't even been bothered to RMA it. Then again, I just RMAed 3 WD blacks, two of which failed at nearly the same time in a RAID 1 and unfortunately, Nvidia's RAID, which seemed to work fairly OK has crappy tools which don't report errors by default. It seems like HD quality is through the floor since my main file/mail-server is using a couple of drives from the early 2000s with not even a hiccup
Re:never had early failure by Reziac · 2012-12-24 05:51 · Score: 1

I've been a WD bigot for over 15 years now... out of all the ...perhaps 200 or so (anecdotal level but still a reasonable sample) WDs I've used or maintained, I've only seen ONE fail without warning, and likely from *lack* of use as it was 8YO but had only 2600 hours on it (apparently sitting doing nothing is not so good; from what I've read the magnet can start affecting stuff and then the head gets stuck and you get the clank-of-death). I have some with over 11 years running 24/7. I've nursed several along for years after they came down with the creeping crud. They do occasionally die, but with that one exception, they've always given me plenty of warning (even without SMART).
The Seagates I'd seen had been generally reliable, but when they died, went in a blaze of glory -- literally got too hot to touch, even when laying out in the air. (Bad bearings, someone told me. Makes sense.) And that overheating was all the notice you got.
Conversely, I've seen dozens of Maxtors die without warning (and only rarely with a complaint from SMART). After Seagate bought first Conner (prone to lose data when idle) and then Maxtor, I started hearing of and seeing more that just ...died... without a prior hint of trouble. I'm guessing what's labeled Seagate now is more Maxtor tech than not, and THAT is the real problem.
BTW someone pointed out something with the "refurbs" that are now usually our RMA replacements... Seagate found that about 60% of RMAs were actually perfectly good HDs, suffering from software errors. And that "refurb" has been individually tested, not just batch-tested like new drives. So... you may be better off with refurbs for critical storage. (Certainly the only refurb I have has been good -- it has about 60,000 hours on it.)

--
~REZ~ #43301. Who'd fake being me anyway?
Re:never had early failure by digitalsolo · 2012-12-24 07:45 · Score: 1

Interesting, my company purchased ~400 1TB Seagate drives about 6 months ago for data storage arrays (RAID 6). We've had 3-4 failures so far. We must have better than average luck I guess. We replaced ~400 750Gb drives that were in place for a couple years in the same arrays, FWIW. Also Seagates.

--
Just another ignorant American.
Re:never had early failure by hairyfeet · 2012-12-24 10:14 · Score: 1

I'm afraid the blacks are whack, have been for a couple of revs now. The greens are good but naturally they have been phasing those out so good luck finding any, and of course as I said Seagate over 640GB is just shit, absolute garbage and I'd be afraid to put anything I cared about on it.
And sadly drive quality IS through the floor. I've got a box of 80-200GB drives sitting in the shop to be tested and I bet a good 90% of them, even after running for who knows how long in what kind of conditions, will be good, whereas in the past 3 years or so the quality has really taken a nosedive. I frankly haven't seen this many bad drives coming through my door since The Maxtor Mess of 02.
Here is my personal order from best to worst, just based on what I've seen at the shop: Samsung EcoGreen > Samsung Spinpoint / Hitachi Deskstar (tie), WD Green > WD Black > Seagate under 640GB > Seagate over 640GB. The Seagates over 640GB and WD Black 1TB and 2TB have gotten so piss poor in QA that frankly I would and actually have bought refurb Samsung and Hitachi over new, yes they are THAT bad. And while the Greens are fine, WD seems to be having the exact same problem Seagate is having when it comes to their large capacity Black drives; for some reason neither company can build 7200RPM drives 1TB or larger worth a fuck ATM, the failure rate is just insane.

--
ACs don't waste your time replying, your posts are never seen by me.
Re:never had early failure by drsmithy · 2012-12-24 10:27 · Score: 1

The plural of anecdote is not data.
Re:never had early failure by the_B0fh · 2012-12-24 15:10 · Score: 1

*shrug* go look at the reviews on newegg. plural of anecdote may not be data, but it definitely draws an interesting line.
Re:never had early failure by iggymanz · 2012-12-25 03:46 · Score: 1

I've been spinning a couple of Seagate 2TB drives for a year now; my PC gets turned on after work and turned off every night. My son has a 1TB Seagate, and he's a gamer. Nice drives, those. You must be the unluckiest bastard on the planet; don't walk under ladders and don't cross busy streets.
Re:never had early failure by iggymanz · 2012-12-25 06:32 · Score: 1

do some math, how many posters compared to units sold? 0.0001%?
Re:never had early failure by Richy_T · 2012-12-27 02:44 · Score: 1

I'm kinda surprised that there's not a central database of drive reliability. Perhaps automated reporting could be added in to smartmontools even.
Re:never had early failure by Wolfrider · 2012-12-27 04:51 · Score: 2

--If I were you, I would look into the following:
o Test all drives before putting them into production - either with SMART long test, or linux 'badblocks'
o Cooling - is it adequate enough?
o Powerful enough Power supply ++ UPS (essential these days)
o Mount all drives with "noatime" option in Linux, or in XP and later:
' fsutil behavior set disablelastaccess 1 ' and reboot
o Spin down all HDs when not in use.
--I do all of the above, and my drives last for years and years. Just sayin'
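For the noatime item, a small sketch of how one might check which filesystems are still mounted without it. It assumes input in /proc/mounts format (device, mount point, type, options); the function name and the list of filesystem types are illustrative choices, not something from the checklist above.

```shell
# Reads /proc/mounts-format text on stdin and prints the mount points of
# ext2/3/4 and xfs filesystems whose options (field 4) lack "noatime".
# Assumption: the filesystem list is an arbitrary choice for this sketch.
missing_noatime() {
    awk '$3 ~ /^(ext[234]|xfs)$/ {
            n = split($4, opt, ",")
            has = 0
            for (i = 1; i <= n; i++) if (opt[i] == "noatime") has = 1
            if (!has) print $2
        }'
}
```

On a Linux box this would typically be run as: missing_noatime < /proc/mounts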

--
.
== WolfriderV6 == I'm willing to admit that *I just might* be wrong... Are you??
Re:never had early failure by Osgeld · 2012-12-27 07:00 · Score: 1

oh I have drives dating back to the 80's, none are WD's though ... they all got sorted in the wash
Re:never had early failure by drsmithy · 2012-12-28 08:35 · Score: 1

Like I said, if there were _real_ data showing any particular brand of hard disk had failure rates in double-digit percentages, consumer protection groups would be all over it like a rash.
Re:never had early failure by the_B0fh · 2012-12-28 09:56 · Score: 1

Like they were all over Microsoft with the RROD, sure, whatever.

Every time by Anonymous Coward · 2012-12-23 05:38 · Score: 1

Every single platter HD I get I scan for bad sectors. I got sick and tired of returning faulty WD Black drives to different suppliers because of huge bad sector counts. Since I have been testing I have returned about 5 drives due to sector issues. I don't run any tests on SSDs.

Lifetime of bathtubs by cvtan · 2012-12-23 05:39 · Score: 2

Old bathtubs lasted longer than old hard drives. Now it's the other way around.

--
Sorry, but gray text on gray background is making my eyes bleed.

Tools by Anonymous Coward · 2012-12-23 05:40 · Score: 0

I run badblocks -sw /dev/sdX on both new and old disks before use.

This does a 4 pass read/write scan of the disk and reports on errors it finds.

Yet to come across a new disk with issues, but it has saved my bacon several times on old disks that are about to die.

I also run smartctl -t long /dev/sdX and then smartctl -a /dev/sdX to read the results.
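Reading the result by eye gets old if you do this for a batch of disks. A minimal sketch of checking it automatically, assuming the self-test log was saved with smartctl -l selftest /dev/sdX > selftest.txt and that the newest entry is the line starting with "# 1", which is the layout smartmontools usually prints for ATA drives. The function name is made up.

```shell
# Succeeds (exit 0) if the most recent entry in a saved self-test log
# reports "Completed without error".
# Assumption: ATA-style log where the newest entry starts with "# 1".
last_selftest_ok() {
    awk '$1 == "#" && $2 == "1" {
            seen = 1
            ok = ($0 ~ /Completed without error/)
        }
        END { if (seen && ok) exit 0; exit 1 }' "$1"
}
```

Usage would be something like: last_selftest_ok selftest.txt || echo "self-test failed or missing"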

RAID them and you're OK by Anonymous Coward · 2012-12-23 05:42 · Score: 0

Buy more than one. Then you're OK if one is bad. Though some testing does seem like a decent idea.

Yes! Especially before adding them to an array. by Anonymous Coward · 2012-12-23 05:42 · Score: 5, Interesting

I run some ZFS systems at work. With the current version of the filesystem, you can expand the zpools but you can't shrink them, so adding a bad drive causes immediate problems.

I've found that some drives are completely functional but write at extremely slow rates: maybe 10% of normal. With typical consumer drives, maybe 1/20 is like this. To ensure I don't put a slow drive into a production zpool array of disks, I always make a small test zpool consisting of just the new batch of drives and stress-test them.

This catches not only obviously bad drives, but also the slow or otherwise odd ones.
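One way to pick out the slow ones, sketched under some assumptions: you have already measured a sustained write rate per drive (for instance from zpool iostat -v during the stress test) and put it in "name MB/s" lines. The function name and the 50% default cutoff are hypothetical; a drive at 10% of normal would be flagged by almost any cutoff.

```shell
# Reads "name MB_per_s" lines on stdin and prints the names of drives
# whose rate is below pct percent of the fastest drive in the batch.
# Assumption: pct defaults to 50, an arbitrary cutoff for this sketch.
flag_slow_drives() {
    pct=${1:-50}
    awk -v pct="$pct" '
        BEGIN { n = 0; max = 0 }
        NF >= 2 {
            n++
            name[n] = $1
            rate[n] = $2 + 0
            if (rate[n] > max) max = rate[n]
        }
        END {
            for (i = 1; i <= n; i++)
                if (rate[i] * 100 < max * pct) print name[i]
        }'
}
```

Comparing against the fastest drive in the same batch avoids having to know what "normal" is for that model.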

Re:Yes, it's happened. by ArchieBunker · 2012-12-23 05:50 · Score: 2

Sounds like a really old troll.

--
Only the State obtains its revenue by coercion. - Murray Rothbard

Re:SSDs by White+Flame · 2012-12-23 05:53 · Score: 3, Insightful

Not really. People usually don't modify gigantic footprints of data per day, so standard incremental backup strategies are still very applicable. Most of the large data tends to be read-only over time, typically media, archives, large installation files, etc.

Murphy's Law of Testing by White+Flame · 2012-12-23 05:55 · Score: 2

Trying to coax an error will never reveal one. Only when you start using it "for real" will the problem manifest.

Re:Murphy's Law of Testing by marcosdumay · 2012-12-24 07:03 · Score: 1

By Murphy's Law, your testing procedure will cause defects. And void your warranty.

--
Rethinking email

Re:SSDs by cpghost · 2012-12-23 05:58 · Score: 3, Informative

Who cares about HDDs anymore these days?

We do here at work. We need some modest 120+ TB of storage right now, and 30% of that content is highly dynamic (PostgreSQL databases). Anything but data center quality HDD would be silly, not to mention unreliable as hell and heavily expensive. SSDs are just for laptops or so, not for real data storage requirements.

--
cpghost at Cordula's Web.

Do you test third party software components? by thePowerOfGrayskull · 2012-12-23 05:59 · Score: 1

Do you perform extensive functional tests against third party software libraries before including them in your system? In most situations, no -- if it's established and proven. You trust that it does what it advertises, and only when it doesn't do you dig further.

Same goes for hard drives.

Re:Do you test third party software components? by PlusFiveTroll · 2012-12-23 07:16 · Score: 1

Wat? Do you download your software over UDP without any error checking or means of correction? Do your dll's and exe's not verify their size and signature? I tend to verify my packets, files, and packages.
Re:Do you test third party software components? by VortexCortex · 2012-12-23 07:39 · Score: 1

Software is logic; It's mathematics. The problem with your logic is thus:
"Do you perform mathematical proofs of theorems known to be proven and tested by many already? No, of course not. The same rules that govern logic constructs can be applied to physical reality"
That is to say, you're ignoring the vast difference in the reliability of their construction materials: Matter is an imperfect imprecise medium very different from mathematics.
Protip: Even the very elements themselves vary in atomic mass among atoms of the same elements!
Re:Do you test third party software components? by White+Flame · 2012-12-23 11:06 · Score: 1

"In theory, there's no difference between theory and practice. In practice, there is." - Jan L. A. van de Snepscheut
Re:Do you test third party software components? by unitron · 2012-12-24 01:33 · Score: 1

Do you perform extensive functional tests against third party software libraries before including them in your system? In most situations, no -- if it's established and proven. You trust that it does what it advertises, and only when it doesn't do you dig further.
Same goes for hard drives.
Can those software libraries be packed much more poorly by one vendor than another, or shipped via a carrier that plays football with them, and be different from how they were before shipping as a result?

--
I see even classic Slashdot is now pretty much unusable on dial up anymore.

format and secure erase by danlip · 2012-12-23 06:05 · Score: 1

I always do a format and a secure erase (one pass of zeros). In addition to finding bad sectors I want to be sure to get rid of any trace of whatever crap they put on it at the factory (viruses, kiddie porn, crapware, etc).

Re:format and secure erase by Gaygirlie · 2012-12-23 07:26 · Score: 1

I always do a format and a secure erase (one pass of zeros).
That's not a secure erase. Doing that only clears out whatever content the OS has access to, it does not clear reallocated sectors at all, for example.

This isn't real estate by Anonymous Coward · 2012-12-23 06:06 · Score: 0

Real estate is a scam business filled with thieves, liars, hypocrites and leeches. You need to inspect before you buy, after you buy, you need insurance, lawyers, and notaries just to be sure that in 2012 a roof doesn't leak water. Real estate is filled with people who couldn't do any better in life.

Hard drives are built by engineers and technicians with a built-in sense of ethics. There's a whole lot less to worry about.

Re:SSDs by Desler · 2012-12-23 06:08 · Score: 1

People who need reliable, long-term storage care about HDDs. Just like how people still used tape drives even when CDs and DVDs came along.

Testing SSDs? by Anonymous Coward · 2012-12-23 06:08 · Score: 0

Does anyone have experience or a good protocol for stress testing SSDs while minimizing wearout?

Or should I even bother?

Re:Testing SSDs? by allo · 2012-12-23 10:24 · Score: 1

When a stress test wears out the SSD too much, either you do not need it in the first place, or your usual usage will be too much stress as well.

Badblocks/Shred by SealBeater · 2012-12-23 06:09 · Score: 1

badblocks -t random /dev/sdX && shred /dev/sdX

Badblocks checks for bad sectors while writing random data to the drive and after all is good, I run shred once or twice to fill the drive with random data. You can probably get by with just badblocks tho.

--
-- Its survival of the fittest...and we got the fucking guns!!!

Re:Badblocks/Shred by Anonymous Coward · 2012-12-23 07:31 · Score: 0

Badblocks checks by reading it back and making sure it worked. shred won't do much for you since many of the write errors are silent and will go unnoticed until the drive fails to read the block back. I'd rather spend the time on a second badblocks pass.
Re:Badblocks/Shred by SealBeater · 2012-12-23 08:32 · Score: 1

shred won't do much for you since many of the write errors are silent and will go unnoticed until the drive fails to read the block back. I'd rather spend the time on a second badblocks pass.
The reason I use shred is because it fills the drive with random data faster than badblocks does. I do this because I do whole disk encryption.

--
-- Its survival of the fittest...and we got the fucking guns!!!
Re:Badblocks/Shred by rdebath · 2012-12-23 20:47 · Score: 1
If you do FDE you don't want to use badblocks -t random; it creates ONE random block and writes it out repeatedly.
I find one of these is better..
- Testing the disk with four writes and four reads.
  
  testdisk () {
      [ -e "$1" ] || { < "$1" ; return; }
      hdparm -f "$1" 2>/dev/null ||:
      cryptsetup create towipe $1 -c aes-xts-plain -d /dev/urandom
      badblocks -svw /dev/mapper/towipe
      cryptsetup remove towipe
      dd bs=512 count=1 if=/dev/zero of=$1
  }
- Fast write to true random usually runs at full disk speed.
  
  wipedisk () {
      [ -e "$1" ] || { < "$1" ; return; }
      hdparm -f "$1" 2>/dev/null ||:
      dd bs=512 count=100 if=/dev/zero of=$1
      cryptsetup create towipe $1 --offset 1 -c aes-xts-plain -d /dev/urandom
      dd bs=1024k if=/dev/zero of=/dev/mapper/towipe
      cryptsetup remove towipe
  }
- Alternate full speed random wipe, sometimes faster.
  
  wipedisk() {
      [ -e "$1" ] || { < "$1" ; return; }
      hdparm -f "$1" 2>/dev/null ||:
      openssl enc -bf-cbc -nosalt -nopad \
          -pass "pass:`head -16c /dev/urandom | od -t x1`" \
          -in /dev/zero | dd bs=1024k > $1
      dd bs=512 count=1 if=/dev/zero of=$1 2>/dev/null
  }
The end result is a drive filled with true cryptographically random data completely indistinguishable from an encrypted drive, because it is an encrypted drive!
Re:Badblocks/Shred by SealBeater · 2012-12-23 22:15 · Score: 1

Awesome, thank you!

--
-- Its survival of the fittest...and we got the fucking guns!!!

Yep! by Anonymous Coward · 2012-12-23 06:18 · Score: 0

I always do (or at least when I get some time) and I've found a bad sector a couple of times now. The supplier has always sent me a new one straight out, and I was glad I found out sooner rather than later. For me at least, it seems worth it, and if you run a check overnight and get the computer to shut down after it's done, you're not losing much.

No - I just assume they will fail by turkeyfeathers · 2012-12-23 06:22 · Score: 1

I buy hard drives in pairs, using one for live data and one kept offline until it's time to back up the live drive (I use Unison sync to quickly determine what's changed between the two drives). My boot drive gets backed up every night with Macrium Reflect. The secret to a happy life: assume that every drive will fail tomorrow and keep everything backed up.

Re:No - I just assume they will fail by VortexCortex · 2012-12-23 07:51 · Score: 1

The secret to a happy life: assume that every drive will fail tomorrow and keep everything backed up.
That's why I hoard precious metals instead of money: The fear that every drive will fail tomorrow. Can't say it's made me any happier overall -- being on a terrorist watch list. It does have its rare moments, e.g., I can't fly, but I get to avoid the TSA.
Re:No - I just assume they will fail by White+Flame · 2012-12-23 11:09 · Score: 1

Buying in pairs is a bad idea. Chances are high that you'll end up with two drives from the same manufacturing run. If there's some issue with them that leads to very short life span, both drives are very likely to experience the same problem. It's best to purchase more sporadically.
Re:No - I just assume they will fail by turkeyfeathers · 2012-12-23 12:05 · Score: 1

Good point but I'm unlikely to get two from the same manufacturing run since I buy one WDC and one Seagate each time to avoid that sort of correlation! I haven't found one brand is any worse than the other.

SMART + badblocks by SuperBanana · 2012-12-23 06:23 · Score: 5, Interesting

I run smartctl and capture the registers, then run badblocks, and compare smartctl's output to the pre-bad-blocks check.

If there are any remapped blocks, the drive goes back, as the factory should have remapped the initial defects already, and that means new failed blocks in the first few hours of operation.
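A rough sketch of that before/after comparison, assuming the attribute table was saved with smartctl -A /dev/sdX > before.txt ahead of the badblocks run and again to after.txt once it finished. The function and file names are made up for illustration, and the raw value is taken to be the tenth column, which is how smartmontools usually prints ATA attributes.

```shell
# Print the raw Reallocated_Sector_Ct value from a saved `smartctl -A` dump.
# Assumption: ATA-style attribute table with the raw value in column 10.
realloc_count() {
    awk '$2 == "Reallocated_Sector_Ct" { print $10; found = 1 }
         END { if (!found) exit 1 }' "$1"
}

# Succeeds (exit 0) only if the count did not grow between the two dumps.
# Returns 2 if the attribute is missing from either file.
no_new_remaps() {
    before=$(realloc_count "$1") || return 2
    after=$(realloc_count "$2") || return 2
    [ "$after" -le "$before" ]
}
```

Then something like: no_new_remaps before.txt after.txt || echo "new remapped sectors, send it back"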

--
Please help metamoderate.

Re:SMART + badblocks by sribe · 2012-12-23 06:39 · Score: 1

Great idea, thanks. I always test new drives, but this one had not occurred to me.
Re:SMART + badblocks by Anonymous Coward · 2012-12-23 06:54 · Score: 0

I do:
DRIVE="/dev/sdb"
smartctl --xall $DRIVE > sm_data_0.txt; smartctl -t short $DRIVE; smartctl --xall $DRIVE > sm_data_1.txt; badblocks -vws $DRIVE; smartctl --xall $DRIVE > sm_data_2.txt
In addition, if I copy a significant amount of data to the drive I also recheck (and save) the smartctl output.
For drives larger than 1TB I am seeing lots of raw read errors on new drives, so I have to agree with other observations I've read about larger-than-1TB drive technology -- it's not there yet ....
Re:SMART + badblocks by rrohbeck · 2012-12-23 07:50 · Score: 2

That's the right way to do it but manufacturers increasingly don't accept returns for a single or few bad blocks. They say that's acceptable.
The reason is probably that it's too time consuming to test the entire surface with the high capacities but mostly unchanged transfer rates that we see.

--
thegodmovie.com - watch it

Google Whitepaper Answers Your Questions by idealego · 2012-12-23 06:23 · Score: 1

This answers most of your questions and does so using data based on a large dataset.
http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en//archive/disk_failures.pdf

If you are concerned about reliability I suggest using an Intel SSD. Their failure rate is very low.

Re:Google Whitepaper Answers Your Questions by rrohbeck · 2012-12-23 07:53 · Score: 1

Except for FW bugs that may lock up the drive hard or cause it to say it has 8MB capacity.

--
thegodmovie.com - watch it

Re:SSDs by aaarrrgggh · 2012-12-23 06:29 · Score: 3, Insightful

Rebuild time. It takes our hardware raids about 24 hours to rebuild, and software raids about 72 hours. If the disk failure isn't detected immediately, even with RAID-6 you are pushing your luck.

RAID is not backup.

Re:SSDs by Anonymous Coward · 2012-12-23 06:29 · Score: 0, Offtopic

Let's see. I have three drives connected to this laptop. The internal, which is 1TB, an external that is 3TB and another external that is 4TB.

Let me know when I can buy an SSD for $100 that matches the size of any of those.

Re:Yes! Especially before adding them to an array. by mrmeval · 2012-12-23 06:30 · Score: 1

How have you been treated when returning them? I'd like to know what brands and what vendor. I'm always looking for success stories especially on commodity hardware. Thanks.

--
I'd go on a Vegan diet but the delivery time from Vega is too long. --brownkitty

I saw linear failure rates. by AlecC · 2012-12-23 06:36 · Score: 1

I used to work for a manufacturer of video raid arrays. While I was writing software, not on hardware QA, I saw a lot of drives go past. I saw no sign of high early failures, bathtub style. It seemed to me essentially random. The only tip I would have would be to monitor your bad block count. Most drives only showed one or two "grown" as opposed to factory marked bad blocks. If the bad block list grows into the teens, swap that drive.

--
Consciousness is an illusion caused by an excess of self consciousness.

Quick Disk Test tool by Anonymous Coward · 2012-12-23 06:40 · Score: 0

Some time ago I wrote a Java app to perform a quick, non-scientific test on USB sticks and hard drives, it is here:

http://sourceforge.net/apps/mediawiki/filereadtest/index.php?title=Main_Page#Quick_Disk_Test_tool

Reallocated Sector Count = lost data? by Anonymous Coward · 2012-12-23 06:41 · Score: 0

If my SMART data is showing the following:

Reallocated Sectors Count = 15
Reallocation Event Count = 15
Current Pending Sector Count = 0
Uncorrectable Sector Count = 0

Could the reallocated sectors mean data was lost? I've seen conflicting information on whether reallocated sectors means data was lost. Are there any other SMART attributes I can look at to determine if data was lost on the drive?

Re:Reallocated Sector Count = lost data? by Gaygirlie · 2012-12-23 07:19 · Score: 1

If my SMART data is showing the following:
Reallocated Sectors Count = 15
Reallocation Event Count = 15
Current Pending Sector Count = 0
Uncorrectable Sector Count = 0
Could the reallocated sectors mean data was lost? I've seen conflicting information on whether reallocated sectors means data was lost. Are there any other SMART attributes I can look at to determine if data was lost on the drive?
You would know if there was data that was lost. Normally the drive silently copies the data off of failing sectors to new sectors, reallocates the sector, and you don't notice anything. But if the sector is completely unreadable or returns incorrect CRC (that is, the drive's internal CRC, which is independent of how the drive is formatted) then the drive will return an error to the operating system and you will be notified of it. The drive does not automatically reallocate such sectors; it waits until the OS tries to write data to the broken sector before reallocating it, exactly so that there is no silent corruption of files without the user's knowledge.
Case in point: the power supply on my server caught on fire and disrupted the other electrical components, and on one of my drives there was a bunch of sectors with broken internal CRC -- nothing I could do about it, but at least I was informed of what files I lost when I tried to read them. I proceeded to delete the files in question and wrote random data to the affected sectors, after which the reallocated sectors count increased.
Re:Reallocated Sector Count = lost data? by SuperQ · 2012-12-23 08:35 · Score: 1

Yea, I would like to see a better method for these errors to be communicated up from the kernel through userspace. Most of the time when a "normal" user gets errors for EIO, they see some kind of crash or debug message. If the filesystem could simply put the filename with the error into a list for some userspace service, the GUI file manager(s) or some health monitoring service could notify the end user with something a little more descriptive.
This could also let the user activate the relocation write scrub for that file.
I guess this is all stuff that can be solved in the more advanced filesystems like ZFS/btrfs where they can simply read the replicated copy or recover with the RS code blocks. Then the end user doesn't even know they had a platter defect outside the relocation count.
Re:Reallocated Sector Count = lost data? by Gaygirlie · 2012-12-23 09:48 · Score: 1

If the filesystem could simply put the filename with the error into a list for some userspace service, the GUI file manager(s) or some health monitoring service could notify the end user with something a little more descriptive.
This could also let the user activate the relocation write scrub for that file.
I wholeheartedly agree, and I'd also like S.M.A.R.T. capabilities to actually be properly integrated with the OS if the drive supports them and reports sane values. Alas, very few OSes by default actually monitor S.M.A.R.T. or provide facilities for reporting component health to the end-user -- if the same facilities also monitored the health status of any other possible components in the system -- GPU, CPU, motherboard, other attached devices that know how to report their health -- it could possibly save people huge amounts of needless headaches.

I guess this is all stuff that can be solved in the more advanced filesystems like ZFS/btrfs where they can simply read the replicated copy or recover with the RS code blocks. Then the end user doesn't even know they had a platter defect outside the relocation count.
Well, it shouldn't be solved completely silently. End-user should still be warned of such defects even if the filesystem can correct them just so that the user can keep this in mind should there appear more such defects in a short amount of time.
Re:Reallocated Sector Count = lost data? by kasperd · 2012-12-23 12:46 · Score: 1

But if the sector is completely unreadable or returns incorrect CRC (that is, drive's internal CRC that is irrelevant of how the drive is formatted) then the drive will return an error to the operating system and you will be notified of it. The drive does not automatically reallocate such sectors as it will wait until the OS tries to write data to the broken sector before the drive reallocates it exactly for the reason that there wouldn't be silent corruption to files without users' knowledge.
That is all correct. If you are lucky, retrying the read may help. If the hard disk does not receive any writes for that sector, it does not get relocated. If the hard disk repeatedly receives read requests for that sector, it will try to read it each time it gets a read request. If eventually the read succeeds, then the sector gets relocated at that time.

So relocated sectors don't mean lost data. However, relocated sectors are a warning sign that the drive may be dying. The study on hard drive failures that Google did a few years back found that even a single relocated sector meant a significant increase in the likelihood of the disk failing.
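To put the rule of thumb from this subthread in one place, here is a small sketch that takes the raw values of SMART attributes 5 (reallocated), 197 (current pending) and 198 (offline uncorrectable) and prints a rough reading of them. The function name and the wording of the verdicts are my own; it is a reading aid, not a diagnosis.

```shell
# Args: reallocated pending uncorrectable (raw SMART values for
# attributes 5, 197 and 198). Prints a one-line interpretation.
smart_verdict() {
    realloc=$1
    pending=$2
    uncorr=$3
    if [ "$pending" -gt 0 ] || [ "$uncorr" -gt 0 ]; then
        echo "unreadable sectors outstanding: data in them may be lost"
    elif [ "$realloc" -gt 0 ]; then
        echo "sectors were remapped: the count alone does not show data loss, but watch this drive"
    else
        echo "no remapped or pending sectors"
    fi
}
```

With the numbers from the original question (15 reallocated, 0 pending, 0 uncorrectable) this lands in the second branch.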

--

Do you care about the security of your wireless mouse?
Re:Reallocated Sector Count = lost data? by SuperQ · 2012-12-23 17:12 · Score: 1

Yea, I was happy to see Ubuntu doing something with basic SMART output by default. The main problem is the more advanced health detection values are basically noise unless you're the manufacturer or a big enough disk customer that they will let you in on the secrets. But like you implied, lots of drives don't output sane values.
Yes, more bubble up health reporting would go a long way toward making computer support easier.

hundreds of drives... by spywhere · 2012-12-23 06:41 · Score: 1

...bought and installed in desktops & laptops over the last decade, and what I've learned is to buy Seagate drives. I have seen way fewer defects and first-year failures on Seagate than WD, and I was happy to see Maxtor go away.

Re:hundreds of drives... by Anonymous Coward · 2012-12-23 07:20 · Score: 0

I have never seen a Maxtor drive take a complete dump. They were probably the most reliable drives to be had for a good while. I miss that company.
Seagate has pulled themselves together and made something very good in recent times, but there was a time not so long ago that they were the recycled Korean plastic of hard disks.
Re:hundreds of drives... by rrohbeck · 2012-12-23 07:56 · Score: 1

Same thing here. We use hundreds of drives per week, mostly Seagate plus some Hitachi and recently qualified Toshiba. No WD unless you count HGST.

--
thegodmovie.com - watch it
Re:hundreds of drives... by hairyfeet · 2012-12-23 08:45 · Score: 1

Funny but for me it's been the opposite, it's been Samsung > Hitachi > WD > Seagate. Now admittedly my customers prefer 1TB and above, never had a problem with Seagates below 640GB but those above 640GB, especially the 7200RPM 1TB and 1.5TB have had high enough failure rates I just avoid them.
Know what you mean about Maxtor though, those DiamondMax drives were failure city. The worst part was there was about a year and a half where the ONLY drives you'd find in an HP or eMachine were DiamondMax so I had to deal with a LOT of systems from those companies with dead drives.

--
ACs don't waste your time replying, your posts are never seen by me.

Re:SSDs by jmichaelg · 2012-12-23 06:41 · Score: 1

> Anything but data center quality HDD would be silly, not to mention unreliable as hell and heavily expensive.

Guess Google is silly then using the cheapest possible hard drives and accommodating the inevitable failures.

Re:SSDs by Anonymous Coward · 2012-12-23 06:48 · Score: 0

If you prioritize rebuild over user I/O that drops to <8h for a 20*3T SW raid60.
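A back-of-the-envelope check on that figure, assuming the rebuild is limited by how fast the replacement drive can be written sequentially (an assumption; parity computation and competing user I/O can make it slower). The function name is made up, and TB and MB are decimal.

```shell
# Hours needed to rewrite one array member of the given capacity (TB)
# at a sustained write rate (MB/s). Decimal units: 1 TB = 1,000,000 MB.
rebuild_hours() {
    LC_ALL=C awk -v tb="$1" -v mbs="$2" \
        'BEGIN { printf "%.1f\n", tb * 1000000 / mbs / 3600 }'
}
```

For example, rebuild_hours 3 100 prints 8.3, so getting a 3T member under 8 hours needs a bit over 100 MB/s sustained for the whole rebuild.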

Re:SSDs by PlusFiveTroll · 2012-12-23 06:49 · Score: 1

> SSDs are just for laptops or so, not for real data storage requirements

Yep, just for laptops

http://www.intel.com/content/www/us/en/solid-state-drives/solid-state-drives-910-series.html

http://www.equallogic.com/products/default.aspx?id=10857

SSD isn't great for bulk data storage, but where you need high IOPS a few SSDs in arrays replace a truckload of drives.

Re:SSDs by PlusFiveTroll · 2012-12-23 06:50 · Score: 3, Insightful

Depending on your definition of reliable and long term, people still use tapes.

Re:SSDs by Anonymous Coward · 2012-12-23 06:53 · Score: 0

Google isn't running PostgreSQL databases on those expendable machines

Did ketchup lead to the extinction of dinosaurs? by Dogtanian · 2012-12-23 06:53 · Score: 1

betteridge's law of headlines applies here.

No, it doesn't. This is an actual, legitimate question.

As I correctly predicted earlier this year, lots of Slashdotters have seized upon Betteridge as the latest fad kneejerk response, and are misapplying it without understanding what it means. In his own words, Betteridge's Law applies to cases where journalists "know the story is probably bollocks, and don’t actually have the sources and facts to back it up, but still want to run it."

For example, without the evidence to back it up, a headline saying "Tomato ketchup caused AIDS that led to extinction of dinosaurs" would be obvious crap and lead to criticism of the paper and/or journalist. OTOH, "Did Tomato ketchup cause AIDS that led to the extinction of the dinosaurs?" gives them the weasellish get-out of "Well, we didn't actually *claim* that it did".

Even then, if a question headline was a genuine attempt to present a plausibly-supported but not universally-accepted idea (possibly because it was new and/or divisive), then Betteridge's wouldn't apply.

In short, Betteridge's original observation was insightful where he claimed it applied, but it was never a blanket dismissal of question headlines, so please stop the tedious, kneejerk misapplication.

--
"Slashdot - News and Chat Sites Deviant". (Click "homepage" link above for details).

Disk Utility by hackertourist · 2012-12-23 06:55 · Score: 1

When installing a new disk in a Mac, I run Disk Utility with the Secure Erase option enabled. This will write 7 or 35 passes over every block, which should find any early problems...

Do You Test Your New Hard Drives? by OneWordReply · 2012-12-23 06:55 · Score: 1

Never.

My testing methodology by dpidcoe · 2012-12-23 06:56 · Score: 2

I thoroughly test any new hdd I get for my desktop PC:

The first thing I do is format it and install windows. If that works, then we know the drive isn't DOA
From there I torture test it by copying several hundred gigabytes of software and movies, as well as installing some more programs.
After that, I let it run for a few months, using it normally. If it crashes during that time, then I know it was bad.

Manufacturer Tools by SrLnclt · 2012-12-23 06:58 · Score: 1

Recently picked up a couple 3TB Seagate drives and a Synology box for a new NAS at home. Since I was planning to move all my music, pictures, video, and general documents to the new box, I decided to download the manufacturer HDD tools and scan the drives first just in case. I think Seagate's is called SeaTools, I'm sure WD has a program as well. No errors reported on either drive, and no errors so far with the RAID array after a couple months of use.

Because you ran linux? by damn_registrars · 2012-12-23 06:59 · Score: 1

Well, the last drive I returned to a manufacturer was one that I was running FreeBSD on and they didn't seem to care. Granted, the experience with the manufacturer (Seagate) was less-than-pleasant but that had nothing to do with my choice of OS which I don't think they ever asked.

I now buy only Western Digital.

--
Damn_registrars has no butt-hole. Damn_registrars has no use for a butt-hole.

Spinrite by Anonymous Coward · 2012-12-23 07:00 · Score: 0

When I purchase any computer, I always do a SpinRite cycle on it.

Re:SSDs by war4peace · 2012-12-23 07:02 · Score: 1

Nope. SSDs are reliable enough to be used in server-grade implementations. The only issue with them is that they're highly specialized. If your regular HDDs become the bottleneck, you will need SSDs. Also, if you have some small implementations where you need fast access to read/write/modify data (some MMOs come to mind) and need to protect it against a power failure or RAM going awry, you should use SSDs.

--
...gis sdrawkcab (usually not responding to ACs; don't bother posting as AC)

Burn In Testing for New Gear by hackus · 2012-12-23 07:03 · Score: 1

This is part of a process for testing new server gear.

Since I use Fedora, currently at 17, burn in testing is important.

Quick tip: Most distros currently do not detect SSDs during the install and do not include the "discard" option in the fstab entries for the device.

If you do use a modern distro and install or use an SSD with it, make sure to mount the device with the mount option for TRIM support set.

For example:

UUID=xxxxxxxxxxxxxxxxx /mnt/ssd2 ext4 discard,defaults 1 2

Where xxx... is the UUID of your device and discard enables TRIM support.

My burn-in process for hard disks usually involves writing a file the entire size of the disk, reading it back, seeking randomly within it, then deleting it.
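A minimal Python sketch of that write / read / random-seek / delete cycle, not the actual burn-in script. The function names, the 1 MiB block size, and the SHA-256-derived fill pattern are all assumptions made for illustration. Note that on a real system the reads may be served from the page cache unless the scratch file is much larger than RAM.

```python
import hashlib
import os
import random

BLOCK = 1 << 20  # 1 MiB per block (arbitrary choice for this sketch)

def pattern(index, size=BLOCK):
    """Deterministic block contents derived from the block index."""
    digest = hashlib.sha256(str(index).encode()).digest()
    return (digest * (size // len(digest) + 1))[:size]

def burn_in(path, n_blocks, seeks=200, seed=0, block=BLOCK):
    """Write, sequentially read, randomly seek, then delete one scratch file.

    Returns the number of blocks that did not read back as written.
    """
    # 1. Write the scratch file and ask the OS to push it to the device.
    with open(path, "wb") as f:
        for i in range(n_blocks):
            f.write(pattern(i, block))
        f.flush()
        os.fsync(f.fileno())
    errors = 0
    try:
        with open(path, "rb") as f:
            # 2. Sequential read-back.
            for i in range(n_blocks):
                if f.read(block) != pattern(i, block):
                    errors += 1
            # 3. Random seeks to block-aligned offsets.
            rng = random.Random(seed)
            for _ in range(seeks):
                i = rng.randrange(n_blocks)
                f.seek(i * block)
                if f.read(block) != pattern(i, block):
                    errors += 1
    finally:
        # 4. Delete the scratch file.
        os.remove(path)
    return errors
```

To approximate "the entire size of the disk", n_blocks would be set to roughly the free space divided by the block size; a non-zero return value means at least one block came back different.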

I also use a custom script to drive sysbench with some common fileio tasks.

This is important for disks, as it can reveal differences in firmware between SSDs used in arrays. For example, a customer of mine had a really badly performing RAID array, and it was due to mixing and matching firmware between drives. (It worked well for a while, but then went bad when one of the drives in the RAID 5 array died and he replaced it with a new one with different firmware.)

-Hack

--
Got Geometrodynamics? Awe, too hard to figure out? Too bad.

Re:Yes! Especially before adding them to an array. by Anonymous Coward · 2012-12-23 07:06 · Score: 0

No problems at all, from either WD or Seagate for RMA. That surprised me a bit, since in some cases I'm asking them to take back a drive that works, it just works -slow-... in their place, I would be skeptical.

I have had nothing but good luck with commodity hardware: my personal theory is that poor shipping and handling practices are responsible for most of commodity hard drive failure. I always buy them with overnight or two-day shipping (to a business address, never thrown on a doorstep), and they always just work.

Re:SSDs by guruevi · 2012-12-23 07:09 · Score: 1

HW RAID and SW RAID have been on par in performance for at least a decade. SW RAID these days is actually exceeding HW RAID performance because of the large difference in performance and calculation capabilities of the CPU (especially with data checksumming and compression).

--
Custom electronics and digital signage for your business: www.evcircuits.com

Re:SSDs by cpghost · 2012-12-23 07:14 · Score: 4, Interesting

Actually, our only current use for SSDs is ZILs (ZFS intent logs), and we're evaluating whether to put PostgreSQL transaction logs on an SSD, but that's another story. Our main storage farm is still HDD-based.

--
cpghost at Cordula's Web.

Plug it in by mbone · 2012-12-23 07:18 · Score: 2

Testing is simple - plug it in, and run it till it fails. Might as well use it in the mean-time.

Always by Anonymous Coward · 2012-12-23 07:20 · Score: 0

I write /dev/zero to ALL blocks and I check SMART statistics and /var/log/messages for any timeout/IO issues or for defective blocks. Sometimes it's not a defective drive, but a bad cable.
I've also found FreeBSD to be a lot more informative and restrictive when something isn't working 100%. I believe this is because of the GEOM framework. Linux can be more forgiving, which can let a minor problem grow into a major one because you never knew about the minor problem in the first place.

Re:SSDs by stuporglue · 2012-12-23 07:21 · Score: 1

I just bought a new ThinkPad which had several SSD options. I chose the slower 1 terabyte disk instead. I'd rather have everything I need with me, even if it is a little slower.

As for backups, I have a daily cron job which rsyncs between my laptop and my home server.

When I have massive changes I make sure I'm hooked up to the wired home network, otherwise it just goes on over wifi.

--
https://www.facebook.com/digitizeicm -- Show your support for the digitization of the Iron County Miner newspaper archiv

5-7% of incoming drives fail badblocks -w. by Anonymous Coward · 2012-12-23 07:22 · Score: 0

Yes, we do, and we find problems.

It all started due to a Seagate drive that had sat on the shelf too long for the retailer to take a return, but Seagate did a warranty replacement with a refurbished drive, which meant we couldn't sell it to a customer. So we started doing incoming acceptance testing.

I'm willing to believe that it worked fine at the factory, but shipping knocked a speck of dust loose. With modern head-flying heights, it doesn't take much.

Nooooo! by briancox2 · 2012-12-23 07:26 · Score: 1

The Heisenberg Principle states that measuring anything changes it. So I don't check anything to see if it works for fear of it falling apart.

--
We should learn what we need to know about issues, before we decide what we need to feel about them.

Re:Nooooo! by rrohbeck · 2012-12-23 07:57 · Score: 1

I thought that applies only to software.

--
thegodmovie.com - watch it

smartmontools under Windows by simplypeachy · 2012-12-23 07:30 · Score: 1

smartmontools works brilliantly under Windows too as smartd can be run as a service. With a suitable smartd.conf and blat to email reports, it can be a double-click-installed jobby. Also writes to the Event Log.

Re:Did ketchup lead to the extinction of dinosaurs by rvw · 2012-12-23 07:30 · Score: 1

betteridge's law of headlines applies here.

No, it doesn't. This is an actual, legitimate question.

Thanks for the clarification. If you read the answers here, you'll notice that while most people don't test their new drives, some people do, so that proves you're right.

Newp. by Anonymous Coward · 2012-12-23 07:34 · Score: 0

I buy a new drive, and it usually goes something like the following:

Using parted:

mktable gpt
mkpart "" ext4 2048s 1000GB
set 1 legacy_boot on
quit

Then from the shell (mkpart only records the partition type, so the filesystem has to be made before mounting):

mkfs.ext4 /dev/sdX1
mount /dev/sdX1 /mnt/disk1 -o noatime
mkdir /mnt/disk1/boot/
mount /geekhut/SubLinux-(insert timestamp here).sqfs /mnt/loop0
cp -a /mnt/loop0/* /mnt/disk1/
dd if=/dev/zero of=/mnt/disk1/.swapfile bs=1M count=1000
mkswap /mnt/disk1/.swapfile
echo "/.swapfile swap swap defaults 0 0" >> /mnt/disk1/etc/fstab
extlinux -i /mnt/disk1/boot/
umount /mnt/disk1
cat /usr/lib/syslinux/mbrs/gpt_mbr.bin > /dev/sdX
umount /mnt/loop0

DEPLOY! :^3

Re:SSDs by roc97007 · 2012-12-23 07:35 · Score: 2

At two companies I managed IP libraries (massive amounts of photographs and drawings used in catalogs and advertisements). The data changes only slowly, and (depending on usage) seasonally, so incremental backups are very much practical. But that's not really the issue.

This is important. RAID protects you from certain kinds of failures, usually limited to the mechanical or electrical failure of a single hard drive. (More protection can be had by nesting RAID levels, but for most installations this is the case.) RAID does not protect you from a wide variety of failures, including data corruption from a bad controller or application bug, systemic failure of the RAID appliance (example: a catastrophic power supply failure taking out multiple drives), operator-induced data loss, either accidental or malicious, or environmental catastrophe. If your data is important, there is still no substitute for backing up your data and sending it to a remote site. Even geosynch won't necessarily help if you're synching bad data to the only remote copy. And I'm not yet convinced that syncing to "the cloud" is a good idea.

Mind you, backups don't have to be to tape. I'm a photographer when I'm not a geek, and I typically keep tens of thousands of photographs online on my workstation. As backup to tape, DVD or even blu-ray isn't really practical, I back up to a series of hard drives using one of those plug-in hard drive toasters, then carefully store them elsewhere, disconnected from the computer. Disaster recovery is a set of drives in a safe at a friend's house.

There are examples where backups aren't necessary. I worked with one array that was essentially a huge cache for 1-800 calls, and a complete wipe would only mean that customers would see a delay on the next call as their particular part of the cache was rebuilt. But for the most part, depending on raid instead of a properly implemented backup solution is a really bad idea.

--
Oliver's law of assumed responsibility: If you're seen fixing it, you will be blamed for breaking it.

Waste of time by Anonymous Coward · 2012-12-23 07:48 · Score: 0

There is no useful testing a user can do. The factory has already run more tests, at a deeper level, than anything the user could imagine or replicate.

From a user's POV, a new HDD either works or it does not, something that will be immediately apparent upon connection to the computer.

The common failure modes for a HDD are:
1) DOA,
2) fairly early visible errors due to manufacturing defects, and
3) EOL failures (the other end of the 'bath tub' curve).

1 is fairly common, and easy to accommodate, obviously. 2 is the fault users dread, and is very uncommon UNLESS the model of drive (think 'deathstars') is garbage. 3 is the usual failure mode that we just have to live with.

Clever testing tools are usually needed to determine whether 2 is happening (complicated by the fact that similar faults can be caused by PSU issues, MB issues, memory issues, other devices on the bus issues, etc.)

So long as important data is backed-up to multiple locations, nothing comes close to the HDD for reliable mass storage of data that still needs to be conveniently accessed. The biggest problem HDDs have is that they are increasingly reliable. This fools too many people into thinking a HDD that has worked flawlessly for years will continue to do so into the future.

This article seems to imply that there is a magic way to test a new HDD so that you can then rely on it. Big mistake! Far better to be a grown-up, accept that modern HDDs are VERY reliable, but that that very fact may well lull one into a false sense of security.

Re:Waste of time by unitron · 2012-12-24 01:44 · Score: 1

There is no useful testing a user can do. The factory has already run more tests, at a deeper level, than anything the user could imagine or replicate....
And then it gets packed and shipped to a vendor, and repacked and shipped to a customer.
Or it gets packed and shipped to a company which divides up that shipment and ships it again to their brick and mortar outlets.
Or there's a distributor in between the manufacturer and the retailer.
Anyway, it can be perfect coming out of the factory and get used as a football somewhere during shipping.

--
I see even classic Slashdot is now pretty much unusable on dial up anymore.

Re:SSDs by aaron552 · 2012-12-23 07:58 · Score: 1

Isn't this what hot spares are for?

--
I had a sig once. It was lost in the great storm of '09.

Re:SSDs by hairyfeet · 2012-12-23 08:08 · Score: 1

And those building systems that want QA on their builds, people storing their family videos and photos, hell, the list could go on all day. Problem is SMART got ruined by the OEMs; it's more about CYA than actually reporting the truth, and the only program that bypassed the lying SMART hasn't been updated in over half a decade and can't support newer drives.

--
ACs don't waste your time replying, your posts are never seen by me.

Re:SSDs by Anonymous Coward · 2012-12-23 08:16 · Score: 0

No, it's why we have RAID 6 now.

Re:SSDs by hairyfeet · 2012-12-23 08:18 · Score: 4, Interesting

Unless you are using SLC, which is getting harder to find and more expensive every day, you are really pushing your luck. The problem is the hot/crazy scale when it comes to these drives, specifically the fact that nobody has figured out how to lick the controller issue. For those that haven't run into it yet (lucky bastards), the controller issue will cause a drive to suddenly fail without ANY warning. SSDs are always bragged on as "failing safe" into a read-only mode, but what actually happens is that when the controller fails the whole drive is completely dead; it won't even show up in BIOS/UEFI.

So until somebody figures out how to lick the controller problem (and when they do, the money they make will truly be insane), or comes up with the idea that I have been advocating for years of putting a second, cheaper ARM controller on the board designed to take over as a read-only backup while you get your data out, I'd be seriously leery of trusting any data I cared about to an SSD, not without spinning rust backups at the very least. The controller bug seems to bite every OEM on the ass; I have seen it from Intel to OCZ and it's always the same. Push the button and poof! Data all gone with the drive. And of course, since you can't get your data off or even wipe it, you have to hope they don't send it to some third world country for refurb where they help themselves to your data. Because of this I don't think my customers have even used 10% of their warranties for fear of the data falling into the wrong hands. Great for the OEMs, which rarely have to make good on warranties, not so good for the customer.

--
ACs don't waste your time replying, your posts are never seen by me.

Wrong Approach by nuckfuts · 2012-12-23 08:29 · Score: 4, Insightful

I've been dealing with hardware failures for 20+ years. What I've learned is that disasters WILL happen, regardless of what preventive measures are in place. So I shifted my focus toward recoverability. To me, the important question is "When something catastrophic happens, how quickly and easily can I put things back in working order?"

Since I use RAID where appropriate, and more importantly, I am positively fanatic about frequent, full, and tested backups, the only concern I have when a hard drive dies is whether I'm still entitled to a warranty replacement.

Re:Wrong Approach by mcrbids · 2012-12-23 17:38 · Score: 1

Personally, I am positively fanatic about frequent, full, tested, and redundant backups. More than once I've seen a recovery fail when an otherwise frequently performed, frequently tested backup failed at the worst possible moment.
My rule of thumb is that if you ever get to where there are no remaining recovery options in the event of a failure, it's time to add another. Thus, we have redundant points of failure after a failure in (almost) every direction.
Backups of backups, as it were....

--
I have no problem with your religion until you decide it's reason to deprive others of the truth.

Don't degauss it to start with by andy+the+engineer · 2012-12-23 08:36 · Score: 2

On Black Friday I bought a 1 TB drive at Office Depot, and of course they waved the box over their anti-theft degausser. I asked for a different drive and told them that they shouldn't do that with drives. The girl gave me the look we all have seen, but the boy behind her actually agreed with me, and they gave me a drive out of the cage and let me leave the store with the alarm blaring. I've just about filled it up already and it's been working fine.

--
Jack of all trades, master of some.

Re:SSDs by Anonymous Coward · 2012-12-23 08:41 · Score: 0

Let's see. I have three drives connected to this laptop. The internal, which is 1TB, an external that is 3TB and another external that is 4TB.

Let me know when I can buy an SSD for $100 that matches the size of any of those.

A quick search on NewEgg shows you can't get a 3TB internal or external HDD for $100US so I'm not sure what point you're making.

Heck yeah I test. by MasterOfGoingFaster · 2012-12-23 08:43 · Score: 1

I test every single drive before deployment. I've found Gibson Research Corp (grc.com) SpinRite to be vital. It's pretty much the only drive test / repair / recover tool I use - other than RAID recovery tools. I'm astonished at the number of people who say they don't test at all.

Go visit a UPS or FedEx distribution center and watch the "slapper" kick packages off the 45MPH belt onto a slide at the loading dock. Small boxes like hard drive packages go airborne. It doesn't matter how much the factory tests. Shipping damage is a very serious issue.

BTW - I have no connection with GRC, except as a near-daily user of their products.

--
Place nail here >+

Where the real errors are by Anonymous Coward · 2012-12-23 08:43 · Score: 0

In my experience, the real issues are with setting it up and partitioning it, plus properly doing a format for each one.

The solution? I use live CD versions of System Rescue or Knoppix to make sure it's sorted properly. It's a 5 or ten minute job but then it just drops into whatever system you want to use it with and you're done. Rarely are there major issues on a new drive if the electronics are working correctly.

Testing? No... by bwcbwc · 2012-12-23 08:43 · Score: 1

But I always do a full/slow format to at least do a sanity check.

--
We are the 198 proof..

Shouldn't this by rossdee · 2012-12-23 08:45 · Score: 1

Shouldn't this be a question for HDD manufacturers and OEM resellers?

Thoroughly Test? Too Time Consuming by Scarletdown · 2012-12-23 08:54 · Score: 1

I don't bother doing a full test before putting a new drive in service. If I am buying a new drive, it is because I need it now, and I need to be back up and running stat.

When buying a new drive, I have expectations of the manufacturer doing their job and selling me a working product that should be operational out of the box. However, after I have copied my data over from where I had backed it up (usually onto a shared drive on another system on my network), I keep the backup handy until I know for sure that the new drive will not need to be returned to where I got it for a warranty replacement.

--
This space unintentionally left blank.

Re:SSDs by Anonymous Coward · 2012-12-23 09:16 · Score: 0

Why not both? Just get the 256GB mSATA SSD (sadly only 3Gbps instead of 6), use that as your boot drive, and leave your rotational drive as is (bootable). Hell, you could even get a caddy replacement for the DVD drive and stick another 1TB in there and run those in RAID 0 or 1.

No. No. No. by Anonymous Coward · 2012-12-23 09:19 · Score: 0

No. I don't speed or stress test my new hard drives.

No. I don't have a high initial failure rate.

No. In the rare cases that I have had a hard drive fail under warranty, there has been no issue with what was on the drive. If the drive was at all functional, I wipe it. If it is completely non-functional and there is nothing of value/risk on the drive I return it. If I can't wipe it and there was any sensitive information on it, I don't do the warranty route. I destroy it and buy a new one.

So, the answer to all of your questions is: NO!

SeaTools by aaronfaby · 2012-12-23 09:23 · Score: 1

Not sure how effective this is, but we've been testing our hard drives using the Long Generic test with SeaTools. It appears to do a write/read test on each sector of the drive, as large drives such as a 2TB can take almost a full day to complete. There's also an option to repair bad sectors during the test. Seems to be pretty effective, and it's probably better than nothing. YMMV

Re:SSDs by drsmithy · 2012-12-23 09:23 · Score: 4, Funny

Holy crap. Twenty 3T spindles in a single array ? What do you do to de-stress ? Run between cars on a highway ?

Simple... by Anonymous Coward · 2012-12-23 09:38 · Score: 0

I shove them up the ass of the nearest stranger, punch them in the face, and skull fuck their children.

Fill drive, read drive, format drive, use drive by davidwr · 2012-12-23 10:03 · Score: 1

If I can fill a drive, flush or disable anything that would interfere with read tests, then read back all the data okay, then I'm confident enough to trust the drive. Yes, it could be bad but I'm not going to waste time doing additional burn-in tests.

If I'm doing a fresh OS install, it's easy enough to install the OS on the bare drive, configure it the way I want to, then boot with another disk (bootable CDs are your friend), fill up the disk with pad data, reboot, clone the disk, reboot, then verify the clone. If the clone succeeds, I trust the drive. Delete the pad data and boot with the new drive and away we go.

If it's not a fresh install I'm probably either cloning another OS disk to this one, in which case I do pretty much what I said above, or it's going to be a data disk. If it's going to be a data disk, I copy any data I want on it then fill it up with a pad file then clone and verify it as above. If the clone verifies okay, I'm good to go.

This is for non-mission-critical use of course.

If it were mission critical it's probably in a RAID with some redundancy anyways, so this test would be adequate.

If I don't have enough scratch space to fully clone the disk, I can clone and verify it in stages, one several-hundred-GB-chunk at a time.

OK, I lied. I don't really do all of this most of the time. Most of the time I just use a disk-erase program that offers verify-after-write and trust that if the drive's firmware does lie to me and report "successful read-after-zero" before the sector is zeroed out that it will either successfully remap the sector or before the end of the testing a "hard" error of some type will be reported back to the disk-erase program. I will be fooled if I bought a bum drive with lots of bad sectors but at least as many spare sectors for the firmware to remap, but that's an acceptable risk for almost every scenario I run into. I may also be fooled if all but the last failed sector remapped successfully but the last one did not. Hitting this edge case on a brand-new drive is extremely unlikely.

Oh, there's also the "ear" test, the "it's taking too long" and "gee, that was fast" tests, the "ouch this drive is way too hot" test, and other tests you don't think about but which give an experienced person a reason to suspect a problem every time he hears an abnormal sound, waits way too long for an operation or is surprised at how fast a drive is, or touches a drive and nearly (?) burns his fingers.

--
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.

Re:SSDs by mlts · 2012-12-23 10:07 · Score: 1

The best is having both RAID6 and at least a hot spare. With drive capacities having grown so much faster than I/O throughput, even with two parity drives there is a large window (~24 hours in some cases, even on the high end SANs and lower tier drives) where the array is in a degraded state and is being rebuilt. The hot spare is important in this case because it allows the array to start being rebuilt immediately.
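To put rough numbers on that rebuild window, here is a back-of-the-envelope calculation in Python. The 3 TB capacity and both rebuild rates are assumed figures for illustration, not measurements from any particular array; real rebuilds also compete with production I/O.

```python
def rebuild_hours(capacity_bytes, rebuild_bytes_per_sec):
    """Hours needed to rewrite one whole drive at a sustained rebuild rate."""
    return capacity_bytes / rebuild_bytes_per_sec / 3600.0

# Assumed example: one 3 TB drive, rebuilt at 35 MB/s while the array
# is still serving other I/O, versus 100 MB/s on an otherwise idle array.
print(round(rebuild_hours(3e12, 35e6), 1))   # 23.8
print(round(rebuild_hours(3e12, 100e6), 1))  # 8.3
```

Capacity grows the numerator while the sustained rate barely moves, which is why the degraded window keeps getting longer with bigger drives.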

The ironic thing is that tier 3 drives store a lot of critical data, even though they are relatively cheap and slow. Because of this, taking reasonable measures such as RAID 6 + a hot spare or two is a must.

Testing is redundant by WaffleMonster · 2012-12-23 10:12 · Score: 1

If the drive successfully syncs up to the array then it passes otherwise it gets RMAd.

Re:SSDs by mlts · 2012-12-23 10:39 · Score: 1

I've seen HW and SW RAID leapfrog each other. Before Oracle bought Sun, there was a time where Sun pushed software RAID and Veritas Volume Manager, saying how CPU is cheap so might as well use it for drive I/O. A year or two later, they came out with hardware RAID appliances and the sales guys pushed how good having all the CPU overhead of parity generation done on disk controllers was.

HW and SW raid I use on an application basis. For enterprise tasks, this tends to be moot because the data resides on FC/FCoE LUNs, and the only time I might worry about using RAID on the system side would be if I were migrating data from one SAN to another without shutting things down. Local disk tends to be more an afterthought (mainly because it tends not to have the ability to use MPIO), so it ends up being mirrored, just out of simplicity.

Of course, for desktops that are used on a day to day basis, those get hardware RAID so a drive failure means an SNMP trap firing off and a trouble ticket to desktop support versus a user screaming bloody murder.

S.M.A.R.T. conveyance and extended tests by knarf · 2012-12-23 10:57 · Score: 1

Whenever I get hold of a new drive I run it first through the SMART conveyance test (which usually comes up clean) followed by an extended test. The latter has shown errors in a surprising number of drives; if I had to give a rough estimate I'd say around 5%. These are usually read errors, which can usually be 'fixed' by overwriting the sectors in question, but they generally forebode problems with the drive later on. If a drive shows errors in any of these tests I RMA it. The replacement drive gets a similar treatment.
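For anyone wanting to script that sequence with smartmontools, a small Python sketch follows. The smartctl options used (`-t conveyance`, `-t long`, `-l selftest`) are standard, but the helper names and the device path are placeholders of mine, and this assumes smartctl is installed and run with sufficient privileges. Not every drive supports the conveyance test, and each self-test runs in the background on the drive, so the log should only be read after the test has had time to finish.

```python
import subprocess

def selftest_commands(device):
    """smartctl invocations for a conveyance test, an extended test,
    and reading back the self-test log, in that order."""
    return [
        ["smartctl", "-t", "conveyance", device],  # short check for shipping damage
        ["smartctl", "-t", "long", device],        # extended (full surface) self-test
        ["smartctl", "-l", "selftest", device],    # show the self-test results log
    ]

def run(cmd):
    """Run one command and return its text output (requires smartctl on PATH)."""
    return subprocess.run(cmd, capture_output=True, text=True).stdout

if __name__ == "__main__":
    # Only prints the commands; wait for each test to finish before
    # starting the next one or reading the log.
    for cmd in selftest_commands("/dev/sdX"):  # placeholder device name
        print(" ".join(cmd))
```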

--
--frank[at]unternet.org

That is a good question! by poofmeisterp · 2012-12-23 11:21 · Score: 1

I test new drives for performance, not for reliability. Now you got me thinking...

Try to break the disk before you lose your data by ncw · 2012-12-23 12:21 · Score: 2

Stress testing hard disks is a particular bugbear of mine, after having some really bad luck with early hard disks. Over the 15 years that I've been doing it I've had to send back loads of hard disks and flash cards because they failed my tests, either breaking completely or returning single bit errors in your data. Mostly the manufacturers will take disks back if you can get their stupid Windows program to return an error code. Sometimes it takes a bit of arguing but ultimately the manufacturers want to keep you happy. Flash disks with single bit errors are the hardest to send back in my experience.

Here is the latest generation of my stress testing code (re-written in Go recently): https://github.com/ncw/stressdisk

(Interestingly the stressdisk program sometimes finds bad ram in your computer too!)

I generally thrash every new hard disk or memory card for 24 hours to see if I can break it before trusting any data to it!

I also run a long smart test too.

Somewhat paranoid, yes, but I really, really hate losing data!

--
Every man for himself, all in favour say "I"

Never... but RAID is a hedge by Anonymous Coward · 2012-12-23 12:23 · Score: 0

I never test my new hard drives, but I figure RAID will prevent me losing data if a new drive fails. It's nice not to have to constantly back up 10TB of data, too.

Re:SSDs by Anonymous Coward · 2012-12-23 12:33 · Score: 0

There are many failure patterns which RAID does not cover, so even if you have RAID, you should still invest in a backup solution. RAID systems don't replace backups.

I asked that too... by g7a · 2012-12-23 12:42 · Score: 1

I asked the same question not so long ago: http://slashdot.org/submission/2004807/ask-slashdot-how-do-you-go-about-testing-a-storage-medium Sadly, the comments with all the helpful messages in them seem to have disappeared.

Re:SSDs by dinfinity · 2012-12-23 12:58 · Score: 2

Please. Quoting Jeff Atwood as an authoritative source on SSDs?
Some anecdotal evidence and a subsequent admission of buying from the brand known for the highest failure rate in SSDs isn't going to convince anyone.
I'd like to see some proper statistics before I believe anything you say.

The most reliable statistics I've seen show SSDs performing as well as or better than HDDs when it comes to failing. I haven't seen any statistics on what percentage of failing drives did so spontaneously, completely, without warning and without any possibility for repair.

Mind you, I'm not claiming they don't. Just that I haven't seen any evidence beyond some anecdotes. And well, anybody that trusts a single drive with important data is an idiot or ignorant anyway.

fill drive with pseudo-random data, then read back by Anonymous Coward · 2012-12-23 13:47 · Score: 0

I don't trust a drive until I've filled it with pseudo-random data, then read it back for comparison. Then perhaps write with zeroes just to tidy up. As other commentators have pointed out, it also helps to check the elapsed time (to catch drives that are writing unusually slowly). The last Java program I wrote for this is GPL freeware:

http://www.softpedia.com/get/System/Hard-Disk-Utils/Erase-Disk.shtml
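For readers who just want the idea, a rough Python sketch of the fill-then-compare approach follows. This is not the Java program linked above; `fill`, `verify`, the seed, and the chunk size are made-up names and values. It regenerates the same seeded stream on the read pass instead of keeping a copy, and `fill` returns the elapsed time so an unusually slow write stands out.

```python
import os
import random
import time

def _stream(total_bytes, seed, chunk):
    """Yield the same pseudo-random chunks every time for a given seed."""
    rng = random.Random(seed)
    left = total_bytes
    while left > 0:
        n = min(chunk, left)
        yield rng.getrandbits(8 * n).to_bytes(n, "little")
        left -= n

def fill(path, total_bytes, seed=1234, chunk=1 << 20):
    """Write the stream to path; return elapsed seconds."""
    start = time.monotonic()
    with open(path, "wb") as f:
        for data in _stream(total_bytes, seed, chunk):
            f.write(data)
        f.flush()
        os.fsync(f.fileno())
    return time.monotonic() - start

def verify(path, total_bytes, seed=1234, chunk=1 << 20):
    """Regenerate the stream and count chunks that read back differently."""
    bad = 0
    with open(path, "rb") as f:
        for data in _stream(total_bytes, seed, chunk):
            if f.read(len(data)) != data:
                bad += 1
    return bad
```

Pointing path at a raw device instead of a file would destroy whatever is on it, so that variation is left to the reader.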

Re:Yes! Especially before adding them to an array. by Anonymous Coward · 2012-12-23 14:30 · Score: 0

not the GP, but am in almost the same boat (ZFS on OI).

I normally buy Seagate and have seen this failure mode as well. the disk sends data, the data is even right, it's just SLOW. Seagate has never had an issue with me returning these.

Hard Disk Sentinel by Anonymous Coward · 2012-12-23 14:49 · Score: 0

Check out Hard Disk Sentinel.. It's basically a really nice SMART monitoring tool, but it has a surface tool similar to SpinRite that's quite useful. It color codes the disk blocks by access time for read, read/write, write, etc. I recently had a WD drive that had what I assumed was a bad block (pending/offline) that would basically lockup the drive when accessed for 2 minutes or something crazy like that. Quick formatting then writing to the entire disk did nothing. Stupid WD drive didn't even give me a write error or any other indication of a problem when I filled the whole disk up again, but it couldn't read the data back. However sentinel clearly showed the block read failures (on 3 blocks). Like SpinRite I was able to constrain a new test to just the area around the bad blocks and do what they call re-initializing the disk surface (basically read/write lots of times). That forced the stupid drive to finally figure out the block was bad and move it. Drive finally appears happy.
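The "code blocks by access time" idea can be approximated in a few lines of Python. This is only a sketch of the general technique, not how Hard Disk Sentinel or SpinRite work internally; the function names, the 1 MiB block size, and the "10x the median" threshold are arbitrary assumptions, and cached reads will look misleadingly fast.

```python
import statistics
import time

def time_blocks(path, block_size=1 << 20):
    """Read path sequentially; return a list of (offset, seconds) per block."""
    timings = []
    offset = 0
    with open(path, "rb", buffering=0) as f:
        while True:
            start = time.perf_counter()
            data = f.read(block_size)
            elapsed = time.perf_counter() - start
            if not data:
                break
            timings.append((offset, elapsed))
            offset += len(data)
    return timings

def slow_blocks(timings, factor=10.0):
    """Offsets whose read time exceeds `factor` times the median read time."""
    if not timings:
        return []
    median = statistics.median(seconds for _, seconds in timings)
    return [offset for offset, seconds in timings if seconds > factor * median]
```

A block that takes minutes to read, like the one described above, would show up far beyond any sensible threshold.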

Our test methods by john_uy · 2012-12-23 16:23 · Score: 1

We have recently purchased around 300-400 drives of 500GB from Hitachi GST.

Our test method for checking the drives is filling them up with files (by replication) and doing a hash check afterwards (comparing against the original source file). Should the drive drop out (due to retry errors), it is RMAd. We do check SMART afterwards as, based on experience, it is fairly accurate on the sector reallocation count when the drive is in imminent failure. You also need to keep an eye on read statistics (we use iostat) to check if the performance is subpar. Normally, the drives will return to normal speeds after sector reallocation.
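A stripped-down Python sketch of the replicate-then-hash step, for illustration only. The helper names, the file naming scheme, and the choice of SHA-256 are assumptions; the post does not say which hash or tooling is actually used.

```python
import hashlib
import os
import shutil

def sha256_of(path, chunk=1 << 20):
    """Hash a file in chunks so large files don't need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            digest.update(block)
    return digest.hexdigest()

def replicate(source, target_dir, copies):
    """Copy source into target_dir `copies` times; return the new paths."""
    paths = []
    for i in range(copies):
        dest = os.path.join(target_dir, f"copy_{i:06d}.bin")
        shutil.copyfile(source, dest)
        paths.append(dest)
    return paths

def failed_copies(source, paths):
    """Return the copies whose hash differs from the source file's hash."""
    expected = sha256_of(source)
    return [p for p in paths if sha256_of(p) != expected]
```

To fill a drive, the copy count would be roughly the drive's free space divided by the source file size; any path returned by failed_copies is a reason to look at that drive more closely.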

Based on our statistics, I would say that we get around a 1% defect rate for the drives (we have swapped out around 3 of them: 1 died and 2 had bad sectors). After around a month, you get a further 1% or less (typically from further bad sectors). The same goes for after around 1 year.

On another interesting note, we purchased around eight 2TB drives from Hitachi GST and, probably because of a batch problem, we had to replace around half of them due to bad sectors. We had an earlier batch of around eight 2TB drives, and all of them were good.

As for performance, there are times when some of the drives deliver inconsistent performance (the hash checks don't all finish at the same time). We don't classify the drives, but my guesstimate is around 5%.

--
Live your life each day as if it was your last.

Never a consistent answer by mcrbids · 2012-12-23 16:54 · Score: 1

I grind through lots of hard drives.

Among my other duties at our ASP software company, I perform the system administration, which includes backing up a few hundred databases daily and perhaps a few dozen billion files. To give you some idea of our backup size, we currently have about 20 TB of data on redundant consumer drives in RAID1 fashion, for about 10 TB of effective storage space.

I've gone through dozens of WD consumer drives with nary a failure, while I've had 2 out of 3 consumer Seagate drives fail within a few months, over several model lines.

For the past few years, I've more or less stayed clear of Seagate, although I had a number of their SCSI 10K drives in production with no trouble.

And everywhere you go, you get wildly conflicting results like this. (shrug)

--
I have no problem with your religion until you decide it's reason to deprive others of the truth.

Re:Never a consistent answer by 1s44c · 2012-12-23 17:12 · Score: 1

And everywhere you go, you get wildly conflicting results like this.
That's my experience. The reliability changes from model to model from the same manufacturer are huge.
Buying drives is pretty much pot-luck.
Re:Never a consistent answer by dbIII · 2012-12-23 20:26 · Score: 1

I've had a run of 2TB WD drives fail lately (maybe six?), which is a worry since normally drives are so reliable that I've been through a pile of 1TB, 750GB, 500GB and 200GB drives in service for 3 or more years each time with almost no failures, and a lot of those were WD.
Re:Never a consistent answer by loosescrews · 2012-12-23 21:21 · Score: 1

You aren't the only one having problems with the 2TB WD drives. I bought three WD 2TB Black edition drives and two failed within a year. I turned the remaining drives into cold backups.
Re:Never a consistent answer by hairyfeet · 2012-12-23 22:29 · Score: 1

Let old Hairy fill you in dbill, WD 2TB are shit, all Seagate above 500GB are shit, avoid both, simple. Don't ask me why but both WD and Seagate have had nothing but trouble with their large capacity drives, with Seagate anything above around 640GB is like playing roulette with your data, WD has 1TB down but 2TB are just as bad as Seagate. I go through a LOT of drives here at the shop and frankly the ONLY 2TB drives which were consistently solid were the Samsung EcoGreens, good luck finding one of those now but frankly I'd take a refurb EcoGreen over a new Seagate, yes they are THAT bad and I wouldn't trust anything I cared about to a WD that was bigger than 1TB.
So the moral of the story is stick with 1TB drives, yes that sucks, yes we'd all like more room, but are you willing to risk having that 2TB of data crapped on by the drive?

--
ACs don't waste your time replying, your posts are never seen by me.
Re:Never a consistent answer by dbIII · 2012-12-23 23:22 · Score: 1

I've had some seagate 1TB drives getting rejected by arrays as well. I think I've reused a couple in desktops that are almost dumb terminals and can afford to lose a drive but still have 3 on my desk that I've written "suspect" on. No SMART errors (which reminds me of your earlier post), but a couple of 3ware cards choked on them after a couple of months so I pulled them out.
The 2TB WD drives on the other hand are unquestionably dead. Maybe it was 4 or 5 instead of 6, but it was enough to both piss me off and annoy me when two died in a RAID6 array and a third one started timing out.
Re:Never a consistent answer by DarwinSurvivor · 2012-12-23 23:25 · Score: 1

Seeing as the 1TB and 2TB drives are almost the same price, you're probably better off getting the 2TB drives and using the extra capacity (on a separate drive of course) as additional backups.
Re:Never a consistent answer by unitron · 2012-12-24 00:47 · Score: 1

And everywhere you go, you get wildly conflicting results like this.
That's my experience. The reliability changes from model to model from the same manufacturer are huge.
Buying drives is pretty much pot-luck.
Not to mention that specifications get changed while the model number stays the same, so you can't even rely on that to know what you're getting.
And if it's a drive in a retail box the actual drive inside probably isn't the same model as the one that was in the box a year ago, so you're twice blind.
Yep, it's a roll of the dice at best.
And that's before you factor in that the brand you're buying just got bought out by another brand so that what you think is from one company may be a rebadged one from the other company, which really makes the model number meaningless.

--
I see even classic Slashdot is now pretty much unusable on dial up anymore.
Re:Never a consistent answer by hairyfeet · 2012-12-24 03:02 · Score: 1

Uhhh...did you not see the "2TB on both WD and Seagate are shit" part of the post friend? Who cares if it's the same price if the 2TB shits all over itself and dies hard in 3 months?
As I said I go through a shitload of drives at the shop, here is what I have found, Seagate above 500GB-640GB are SHIT, WD above 1TB are SHIT, and with both the 3TB are strictly roulette, you might get 3 good ones in a row followed by 5 total shit, it's completely random when it comes to 3TB.
So sure if you don't give a shit about your data? Go ahead and get the 2TB but don't be surprised if it's a paperweight less than a year after purchase. And buying two WILL NOT SAVE YOU as these 2TB drives can crap out within hours of each other, the QC on these drives must be just horrible. The ONLY 2TB drive where I saw consistently good results, in fact out of nearly 30 drives I didn't end up with a single loser, was the Samsung EcoGreen. That is why I was so pissed when I heard Samsung was selling out as the EcoGreen was my "go to" drive for any system that was gonna get heavy usage as even though it's a 5400RPM drive the 32MB cache is fast enough to even use as a boot drive and they could take serious punishment and keep right on going. In fact the system I'm typing on is my home system which is a 1TB EcoGreen for OS and 2TB EcoGreen for data.
But if you actually give a shit about your data I wouldn't trust either a Seagate nor a WD 2TB right now, WD drives are a LITTLE better but frankly not much when it comes to 2TB as BOTH have truly horrible failure rates at 2TB. As I said at 1TB you can buy WD and have a pretty good shot at getting a solid drive, Seagate you're not getting shit above 640GB that isn't a total crapshoot.
If it were me and I HAD to have a 2TB drive? I'd be scouring the web for an EcoGreen right now, in fact if you can find their 1TB or 2TB drives they are worth buying, solid as a rock and low heat/noise.

--
ACs don't waste your time replying, your posts are never seen by me.
Re:Never a consistent answer by DarwinSurvivor · 2012-12-24 04:37 · Score: 1

Perhaps you didn't quite understand what I meant. Most 1TB and 2TB (and 500GB for that matter) drives cost about $100. Now say you have 3TB of data to store (not counting backups). You would probably recommend buying 7 1TB drives. 6 mirrored for 3TB of storage plus a spare in case 1 dies for $700. You could also buy 7 2TB drives which would give you 12TB of storage, enough for 3 copies of everything, plus a spare AND you'd have an extra TB of mirrored storage available ((3TB + 1TB) x 3 mirrors) == 12TB.
With the 1TB drives, you would need 2 drives in the same mirror to die. With 2TB drives, you would need 3 in the same mirror to die.
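The arithmetic in this comparison can be checked with a short sketch. It assumes, as the post does, that one of the 7 drives is held back as a spare and the other 6 are split evenly into mirror sets; the function names are made up for illustration.

```python
def mirrored_capacity(drive_count, drive_tb, copies, spares=1):
    """Usable TB when the non-spare drives are grouped into `copies`-way mirrors."""
    active = drive_count - spares
    if active % copies != 0:
        raise ValueError("active drives must divide evenly into mirror sets")
    return (active // copies) * drive_tb


def failures_to_lose_data(copies):
    """Every copy in the same mirror set has to fail before that set's data is lost."""
    return copies


one_tb_plan = mirrored_capacity(7, 1, copies=2)  # 3 TB usable, 2-way mirrors
two_tb_plan = mirrored_capacity(7, 2, copies=3)  # 4 TB usable, 3-way mirrors
```

For the same $700, the 2TB plan gives one more usable TB and tolerates one more failure per mirror set, provided the larger drives are not so much less reliable that this advantage is cancelled out.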
Re:Never a consistent answer by Crosshair84 · 2012-12-24 08:27 · Score: 1

What kind of WD 2TB drives are you using? We use 2TB WD Blacks on our DVR systems and have a very low failure rate. I have three 2TB Greens from 2010 that are going along just fine. Just wondering if you're using something different than what we are using.
Re:Never a consistent answer by hairyfeet · 2012-12-26 00:47 · Score: 1

The blue ones which turned out to be deep fried ass, luckily I got my money back. I haven't tried the blacks yet, last i checked the price was kinda high on those, a little too high for my customers. The greens are good but they can be hard to find and I heard WD is phasing those out for the blue/black/red schema and I haven't had a chance to try the red yet but if you are building DVRs the reds are designed for DVR, security cams, and other industrial uses so you might want to check them out.
But just FYI, avoid anything Seagate over 640GB right now, talking to my fellow shop owners they are having Seagate 1TB and 1.5TB die on them left and right, just like Nvidia with bumpgate from the looks of it if you get a Seagate over 640GB it WILL fail, the only question is how long. I had one customer that refused to listen to me because he was building his own media tank/NAS setup and "found a deal" on new 1.5TB Seagate which gave him something like 10TB when he was finished. I tried to warn him but he didn't listen and now he is having to re-rip his entire Blu-ray collection because every damned one of those 1.5TB drives bought the farm in less than 6 months, taking most of his collection with them.
so if I run across some 2TB blacks at a price i can use I'll be sure to check 'em out, but if you don't want to spend time filling out RMAs you better avoid Seagate like an STD until they fix the problem. i thought I would die laughing as it took nearly 2 months for Tiger to finally find a price low enough to get people to take a chance on the Seagate 1TB, for first it was $80 on BF, nope, then $70 the week after, nope, then it was $60 after a $10 MIR, nope, finally the week before Xmas they said "fuck it" and sold them for $53 with no rebate and FINALLY sold out of the things,but I wouldn't be surprised if Seagate gets more than half of them back as RMAs and the only reason they won't get more is I knew several people that refused to RMA a drive because they didn't have a way to remove their data.

--
ACs don't waste your time replying, your posts are never seen by me.

S.M.A.R.T. - things are often named for what not by Anonymous Coward · 2012-12-23 17:34 · Score: 0

Having dealt with three major vendors across PATA/SATA/SAS lines over the past decade, my experience is that they have all had runs of drive models, or components thereof, which exhibited greatly increased annualized failure rates. No clear cut naughty or nice vendors in the group.

In general, most of the faults didn't stand out in testing, nor in early life, and became noticeable after about two years of hard use. In hindsight, most of them could have been found with extensive testing and following up failures with tear downs, though as it happens only one defect had an obvious symptom in early use and testing.

Your mileage may vary.

Re:SSDs by toddestan · 2012-12-23 17:47 · Score: 1

Anyone who cares about reliability? Probably why they would also test the drives before doing anything important with them.

More... by mcrbids · 2012-12-23 17:49 · Score: 1

I also wanted to point out that the difference between full and partial backups disappears when you do backups using file-level hard links as your backup solution.

Doing backups disk-to-disk with rsync, using the hard links option, the difference between partial and full backups disappears. All backups are full AND partial; you get the benefits of both.

We do an "incremental" backup daily, in that only the files changed in the interim transfer as part of the backup, and only the changes occupy additional disk space, but the result is a "full" backup in that we end up with a complete snapshot of the file system that can be copied or used on demand.

This is really the way to go, and our particular solution is free and open sourced long ago.
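For readers who have not seen the technique, here is a minimal sketch of a hard-link snapshot. It is not the poster's tool and not rsync itself (rsync's `--link-dest` option behaves in a broadly similar way); the function name and directory layout are assumptions, and empty directories, symlinks and permissions are ignored for brevity.

```python
import filecmp
import os
import shutil
from pathlib import Path


def snapshot(source, previous, destination):
    """Create a full snapshot of `source` in `destination`.

    Files that are byte-identical to their counterpart in `previous`
    (the last snapshot, or None for the first run) are hard-linked
    instead of copied, so they take no additional data space.
    """
    source, destination = Path(source), Path(destination)
    for src_file in sorted(p for p in source.rglob("*") if p.is_file()):
        relative = src_file.relative_to(source)
        dest_file = destination / relative
        dest_file.parent.mkdir(parents=True, exist_ok=True)
        old_file = Path(previous) / relative if previous is not None else None
        unchanged = (
            old_file is not None
            and old_file.is_file()
            and filecmp.cmp(src_file, old_file, shallow=False)
        )
        if unchanged:
            os.link(old_file, dest_file)       # share the existing data blocks
        else:
            shutil.copy2(src_file, dest_file)  # new or changed: store a fresh copy
```

Each snapshot directory then looks like a complete copy of the tree, yet only changed files consume new space. Deleting an old snapshot is safe because the data stays on disk until the last hard link to it is removed.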

--
I have no problem with your religion until you decide it's reason to deprive others of the truth.

Re:SSDs by war4peace · 2012-12-23 17:51 · Score: 2

Exactly this.
I know a (very large) Data Center belonging to a (very large) company which started replacing their HDDs with SSDs. The price difference isn't even that large; price-per-GB for a server-grade 15K RPM SAS was negligibly close to SSD price. And the advantages are really there: (much) lower heat produced, less noise, less space taken, less energy consumed. Even with a similar failure rate, the advantages are there.

--
...gis sdrawkcab (usually not responding to ACs; don't bother posting as AC)

Spinrite then data load till full then encrypt all by Anonymous Coward · 2012-12-23 18:21 · Score: 0

First, run SpinRite
Second, assuming first step did not fail drastically, copy huge stuff onto drive filling the drive to near capacity
Third, format drive as an encrypted partition
Fourth, mount the encrypted drive partition, fill it with junk a la the second step then format the encrypted partition
Fifth, use drive as per normal use

If it doesn't survive until the fourth step then return the drive for a replacement. Wash, rinse and repeat until you get to the end of step four.

Re:SSDs by DMUTPeregrine · 2012-12-23 18:34 · Score: 1

The controller issue is mitigated by backups, RAID, and hot spares.

Of course even with RAID 1 (mirroring) and a hot spare that's 3 drives, not to mention your backup system, though that's not normally an SSD.

Ideally you'd have the same redundancy with spinning rust drives, of course, but the higher MTBF and the higher chance of detecting a pre-failure state can allow one to get away with less, say RAID with hot-swap.

--
Not a sentence!

No. by UltraZelda64 · 2012-12-23 20:01 · Score: 1

The last thing I would want to do to a new, potentially-untrustworthy hard drive is get it off to a bad start by causing more wear and tear on it right from the beginning. I just put it in, fire it up and start using it. First on simple and less important things, and then after a while of regular use and after I have gained some trust in it I start using it for more important things.

Of course... this can't always be done, like when replacing the system drive, so in that case I install the drive up as usual, set the OS up, make copies of the files I will need on it, and just see how it goes from there.

I have yet to buy a drive that died without working for at least several years of its life (5-8 years or more usually), so I don't typically buy a new drive with the expectation that it's a goner. I have had pretty good luck, with hard drives long outlasting computers.

Re:SSDs by dbIII · 2012-12-23 20:33 · Score: 1

ZFS is a lot quicker to rebuild so long as the array isn't completely full, for no other reason than it's doing less than a full dumb bit for bit copy.
Hardware RAID can sometimes spend stupid amounts of time copying all of that empty space, but I'm sure that will improve over time with better firmware. Even though that stuff is at the filesystem level there'll be ways to recognise empty space.

No Return, No Refund a COST of Security by Anonymous Coward · 2012-12-23 21:10 · Score: 0

Do you normally test your new purchases as thoroughly as you test old, suspect drives?
no, I install it, and it's either working or dead.

Has your testing followed the proverbial 'bathtub' curve of a lot of early failures, but with those that survive the first month surviving for years?
no, I install it, and it's either working or dead, if it's working it works until death.

And have you had any return problems with new failed drives, because you re-partitioned it, or 'ran Linux,' or used stress-test apps?"
I never return drives, I destroy them when they are dead, I got lucky and never bought from the pool of bad drives from Maxtor. I have bought a fucked drive though over the years--it's just I don't take them back. Most of the drives that don't work (e.g. DEAD/Clicking, etc) I got from friends who don't know crap about security on their computer, and they probably took the fucking covers off and hid their drugs (or house spiders and broken cherry succrets!?) in there. I get the magnets out and I won't say what I do with the rings--that's secret--all you need to know is they do not EVER get returned.

Yes by SAFH · 2012-12-23 22:50 · Score: 1

Bare minimum is dd if=/dev/zero of=/dev/sdX before any drive is put into use while monitoring with smartctl followed by a file system and large file (50% of drive size) read and hashing.

Same thing with RAM, who doesn't stress test it with memtest before using it?

Recently I purchased a bunch of WD Red drives and all six failed within 37 hours of first spin up. Dead Reds with a 37 hour MTBF.

--

I cannot confirm nor deny the allegation or allegations you may or may not have just made

Re:Yes! Especially before adding them to an array. by scsirob · 2012-12-23 23:26 · Score: 1

Interesting. I have a NAS with 3TB Seagate drives set up in RAID5. There was apparently a firmware issue in these drives that made the NAS drop a drive every now and then for a spurious read or write error. Run a verify pass on the failed drive and all checked out OK. The drive would happily be added back into the RAID group.

Because of the capacity, the rebuild took about 2 days. But on one of the drives it actually took 6 days! Perhaps I should look into this again..
The drives have since been updated to the latest firmware and the NAS has not dropped any since.

--
To Terminate, or not to Terminate, that's the question - SCSIROB

Re:SSDs by unitron · 2012-12-24 00:52 · Score: 1

Isn't this what hot spares are for?

Which model TiVo is it that lets you use hot spares?

--

I see even classic Slashdot is now pretty much unusable on dial up anymore.

Re:SSDs by Electricity+Likes+Me · 2012-12-24 01:38 · Score: 1

Actually the real saving grace of ZFS is that you can't get into the absurd situation of rebuilding a whole drive that only has a single bad sector. This is a life-saver, since chances are you end up finding another bad sector somewhere else during the rebuild, but thanks to the checksums if you do you're not left with corrupted data.

I had this happen to me with an mdraid6 volume: single disk, single bad sector - kicks out the disk (and so stops syncing changes to it). Triggers a rebuild, finds another bad sector elsewhere, kicks out that disk. Suddenly my 2-disk redundancy has dropped to zero, and god help me if there's a bad sector anywhere else.

Re:SSDs by Electricity+Likes+Me · 2012-12-24 01:41 · Score: 1

I'm pretty skeptical of the "offline hard disk" approach. Unless you are very careful with those things, they're not inert when unpowered - you've got grease slowly hardening, oxygen leaking in, thermal cycling stress - and they're most importantly, designed to last about 3-5 years under constant use, not 10 years on the shelf.

Re:SSDs by Anonymous Coward · 2012-12-24 02:13 · Score: 0

Mirroring is backup. It's saved my enterprise stuff, so I didn't have to nurse backups.
I had the backup VERIFY against the restored array, but never had a glitch there.

I worked on a RAID-5 last week. One drive fail, one drive complaining. Live backup, followed by critical backup.
Replaced the dead drive, told it to rebuild.
4 drive array, and the RAID-6, I didn't trust, so I formatted it as RAID-10. ( a mirror of stripes...)
Well, the complaining drive bit the dust,
and since it was a mirror, No one knew. No one cared. Verified all the live data, ( the online stuff didnt match, but DID match the critical backup ). Replaced the complaining drive, Rebuilt the mirror overnight.
Somehow, a long standing 4 minute delay in the morning was cut to 1.5 minutes. People were complaining that the Accounting database was coming up too fast. ( A consultant said 'Mark it as offline!') They didnt have time to enjoy their coffee.

I asked for three of the largest reports, ( "this might take hours..." ) completed in 35 mins over lunch.
Oh time for benchmarks...40% faster with a mirror, 25% faster for writes.

So, watching carefully, saved both a lot of downtime, changed the configuration ( 5 to 10 ), sped up the throughput, made it more reliable, and avoided a complete loss and rebuild, with two failed drives on a RAID-5 because I converted it to mirroring.

Down side? We lost 33% of the total space, but considering that the array was less than 1/2 full, means that they will not need another pair of disks for a few years.
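The 33% figure follows from the usual capacity rules for these RAID levels on a 4-drive array. A small sketch (helper names are hypothetical, equal-sized drives assumed):

```python
def usable_drives(level, drive_count):
    """Drives' worth of usable space for a few common RAID levels."""
    if level == "raid5":
        return drive_count - 1   # one drive's worth of parity
    if level == "raid6":
        return drive_count - 2   # two drives' worth of parity
    if level == "raid10":
        if drive_count % 2:
            raise ValueError("RAID-10 needs an even number of drives")
        return drive_count // 2  # every block is stored twice
    raise ValueError(f"unsupported level: {level}")


def space_lost_percent(old_level, new_level, drive_count):
    """Percentage of usable space given up by switching RAID levels."""
    before = usable_drives(old_level, drive_count)
    after = usable_drives(new_level, drive_count)
    return 100.0 * (before - after) / before
```

With 4 drives, RAID-5 yields 3 drives' worth of space and RAID-10 yields 2, so `space_lost_percent("raid5", "raid10", 4)` comes out at roughly 33.3.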

One nice feature to take advantage of is the email feature, but dont trust. Look at the status as a habit every morning.

RAID is NOT backup, but its a hell of a time saver avoiding restores. Thanks.

Re:SSDs by aaron552 · 2012-12-24 03:14 · Score: 1

The same model TiVo that lets you add disks to its RAID array?

--
I had a sig once. It was lost in the great storm of '09.

keep old drive around by Causemos · 2012-12-24 03:16 · Score: 1

Usually if I buy a new drive, it's to rotate out an older one. My "test" is to copy the mostly full drive onto the new one and keep the old drive on the shelf for a couple months in case of problems. Most drives either suffer an early death or last a good number of years. After 2-3 months I'll reuse the old drive for other storage needs.

Waseihou by Anonymous Coward · 2012-12-24 03:33 · Score: 0

I do only a simple test - a hearing test. If the hard drive sounds "weird", then it's time to replace it. If a new drive is making weird sounds, like high pitches or some other weird sounds made by its motor, then one should have it replaced or be careful with backups. If you are hearing your HDD every day, you will notice the change. If you notice that, get a new drive and make a backup. Your ear cannot mislead you, and you should depend on it!

Re:SSDs by roc97007 · 2012-12-24 05:49 · Score: 1

It's true they're not inert when unpowered, but modern drives park the heads, making them less fragile than in the old days. It's true they're more sensitive to physical abuse than are tapes, but one takes that into account.

It's important to keep track of how old they are and cycle them. I write on the face with a sharpie the date I started using them and what they're backing up. (Just as I track the start date for memory cards for the camera.) Once a year I replace the drive with my important data with a new, usually larger drive, (ghosting the data) and the old one becomes the level zero backup. Incremental backups are done to spare lower capacity drives which are used for five years or so then scrubbed and donated to the local freegeek.

I don't keep the backup drive online. I know that a lot of people use an external drive as a backup and leave it plugged in, but that does not protect you from data corruption and some types of viruses. A good backup is physically disconnected from the machine. A great backup is geographically distant from the machine.

--
Oliver's law of assumed responsibility: If you're seen fixing it, you will be blamed for breaking it.

Chance of infected MBR on replacement drive? by cpm99352 · 2012-12-24 06:39 · Score: 1

Quite relevant post for me, as I just had a Seagate 1.5T drive (which they sneakily had branded as Samsung) go bad after just 3000 hours - I purchased the drive in May and there are plenty of upset reviewers there complaining about Seagate trashing Samsung's name. I heard clicking, but interestingly SMART returned no errors. Luckily Seagate's SeaTools software detected the error:

Model: ST2000DL004 HD204UI
Firmware Revision: 1AQ10001
SMART - Pass 12/18/2012 10:45:55 AM
Short DST - Started 12/18/2012 10:46:11 AM
Short DST - FAIL 12/18/2012 10:48:14 AM
SeaTools Test Code: 6C9AC2A4

So, I set up the RMA. I think I'll go with a WD as a real replacement - they still have drives with 5 year warranties. Even there, though, on the newegg board are allegations they're either experiencing significant delays in getting a replacement, or the replacements are also bad.

But my real reason for posting was wondering about the integrity of a replacement drive? If I'm getting a "refurbished" drive, can I be guaranteed there's no virus/worm residing on the MBR? Is there a way to completely purge the drive that would clear any virus/worms?

I wiped my drive using HDDErase which worked without a hitch. I believe that would fix any infections, so maybe I'll start doing that before installing replacement drives. Thoughts?

Re:SSDs by Anonymous Coward · 2012-12-24 06:39 · Score: 0

Yes, because Newegg is the only place to buy hardware from. Seriously, I feel sorry for you morons who still buy from them. You can always get better prices at Amazon. Always.

I just bought a 3TB Seagate Backup Plus 7200RPM external USB 3.0 drive from Amazon last month. Cost? $109.

Yes, you are a stupid motherfucker.

If you consider new laptop = new hard drive by Anonymous Coward · 2012-12-24 06:49 · Score: 0

I stop the load/unload cycles creeping fast to their failing point -- 300 000 cycles (fight the planned obsolescence). Disabling this has negligible effect on hd temperature, which is currently at 38 degrees Celsius

gksudo gedit /etc/hdparm.conf

/dev/sda {
apm = 254
apm_battery = 254
}

Re:Yes! Especially before adding them to an array. by Anonymous Coward · 2012-12-24 07:04 · Score: 0

Interesting. I thought I was in some sort of HD Bermuda Triangle. Some drives come directly from the manufacturer that just plain write slow. Glad to see others with this problem... not losing my mind.

WD, Seagate and Samsung, all have exhibited this issue for me- I'd guess 5% rate as well. Very weird.

Useless HDD tools and hidden partitions by billstewart · 2012-12-24 08:43 · Score: 1

The HDD tool that I really needed that nobody supports is the one that lets you manage manufacturers' hidden disk partitions on PATA drives.

A couple of years ago I had a 200GB Maxtor external hard drive which eventually started getting bad blocks, so I replaced the disk with a new 500GB PATA drive. The box didn't recognize the drive (because it was newer than the box, so the model wasn't listed), so it reformatted it using the "hide a partition" feature, leaving me with a drive that pretended to be 200 GB. (The feature's sometimes used for hiding a small Windows recovery partition, but is also available so a manufacturer can do things like turning a 300 GB drive with bad blocks into a 200 GB drive with only good blocks...) Windows couldn't see the extra space at all, and the one Linux tool that was supposed to be able to support it was able to make the partition smaller but not larger, because it really didn't know about LBA yet :-) I ended up buying pizza for my local kernel wizard, and we looked at the source code for the tools and recompiled the Linux kernel to get some things to work a bit better but still weren't able to get it going. At that point I decided it was more cost-effective to buy a new drive than keep wasting time, but it was still really annoying, and it was at least a working spare 200GB drive.

I hope that by now, SATA drives don't have that feature, or else Win7 or Linux can work with it (because the point of the feature is supposed to be that you can't have full access to that space...)

--

Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks

Disk drive makers that you don't hate? by billstewart · 2012-12-24 09:03 · Score: 1

I see people here hating on Seagate, and on WD, and I'll happily complain about Maxtor from before they got bought, but who's left? Are there any disk makers that people don't hate?

--

Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks

Absolutely, we test our drives! by jafo · 2012-12-24 14:30 · Score: 1

I run a small hosting company (an add-on to our consulting), and we run burn-in testing on every drive we put into production.

Before we did this, we would regularly run into drives that would sporadically fall out of the RAID array because a block couldn't be read during validation. Once we started testing our drives, this all but stopped. My guess would be that some parts of the platters were marginal, and after a few read/write cycles they would fail and need to be relocated. So doing some testing would cause these blocks to be remapped.

What we do is "badblocks -svw -p 10". However, we've reduced it from 10 down to 3 because 2TB drives take so long to test now and that is our standard drive now. We target a few days to a week of burn-in testing.
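For anyone unfamiliar with what a write-mode pattern test does, the idea can be sketched as below. This is an illustration only and not a substitute for badblocks: it works on an ordinary file rather than a raw device, the function name and parameters are made up, and the four patterns are the ones badblocks documents for its `-w` mode.

```python
import os

# Test patterns documented for badblocks' destructive write mode.
PATTERNS = (0xAA, 0x55, 0xFF, 0x00)


def pattern_test(path, size_bytes, block_size=4096, passes=1):
    """Overwrite `path` with each pattern, read it back, return bad block offsets.

    DESTRUCTIVE: whatever `path` contained before is overwritten.
    """
    if size_bytes % block_size:
        raise ValueError("size_bytes must be a multiple of block_size")
    bad_offsets = set()
    for _ in range(passes):
        for pattern in PATTERNS:
            block = bytes([pattern]) * block_size
            with open(path, "wb") as handle:
                for _offset in range(0, size_bytes, block_size):
                    handle.write(block)
                handle.flush()
                os.fsync(handle.fileno())  # push the data out to the device
            with open(path, "rb") as handle:
                for offset in range(0, size_bytes, block_size):
                    if handle.read(block_size) != block:
                        bad_offsets.add(offset)
    return sorted(bad_offsets)
```

On a healthy disk this returns an empty list. Note that the read-back may be served from the page cache unless the test area is much larger than RAM, which is one reason a purpose-built tool running against the raw device is the better choice for real burn-in.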

Other things that this resulted in:

There was a Linux kernel bug with the LBA access code that caused one specific block on the drive to always report as bad with certain firmwares. The old firmware used to silently do the right thing with what the kernel was asking, the newer firmware reported a read error on this sector.

We've also found some drives that passed the testing fine, but did so at around a tenth the throughput. We were never able to track down why this was, we had a batch that were exhibiting this and we just gave the 10 or 15 drives away that were impacted.

We also had a batch of 10 or so drives of which half were reporting high numbers of failures. We figured something had happened upstream (at the reseller or during shipping) and so we replaced even the ones that tested out OK.

So, yes, test your drives. Even though we're putting them in RAID arrays, we like to run the tests.

Sean

Always test by Anonymous Coward · 2012-12-24 21:15 · Score: 0

Yes, when I did a lot of hardware repair, I ran a BIOS HDD test every time.
It's tech 101.
Average was 1 in 10 that failed ("new" replacement drives)
Save yourself the headache--fighting a drive problem when you don't
know you have one is counter-productive.

If your unit doesn't have a HDD test built in, get DFT (Drive Fitness
Test) free (legally) from IBM / Hitachi. It tests all makes of drives
and is non - destructive (it won't damage your data).

Re:SSDs by unitron · 2012-12-25 03:54 · Score: 1

Correct.

It's their latest model, the Unobtania.

--

I see even classic Slashdot is now pretty much unusable on dial up anymore.

RAM too! by Anonymous Coward · 2013-01-01 02:18 · Score: 0

I was never in the habit of testing new hardware until this year, when I spent a month pulling my hair out over intermittent errors in a new system that turned out to be rooted in a bad RAM module. Now, if I can test it preemptively, I do!

Mid-life failures... by AliasMarlowe · 2013-01-01 03:57 · Score: 1

Two 1.5TB Seagates failed here, one died completely just after its 90 day warranty expired, the other lasted almost 6 months before its SMART error rate abruptly became huge (with a lurid warning that the drive is about to fail). They were the first Seagates I've had in years, and they'll be the last allowed in this house for several more years. My previous experiences with Seagate had been good, but that was back in the sub-GB days (and sub-GB is not a typo) before Seagate quality became a crap-shoot. The failed 1.5TB Seagates were replaced with 2TB WD drives, which have been humming along without SMART errors for almost two years.

--
Those who can make you believe absurdities can make you commit atrocities. - Voltaire
