Ask Slashdot: How Do You Test Storage Media?
First time accepted submitter g7a writes "I've been given the task of testing new hardware for the use in our servers. For memory, I can run it through things such as memtest for a few days to ascertain if there are any issues with the new memory. However, I've hit a bit of a brick wall when it comes to testing hard disks; there seems to be no definitive method for doing so. Aside from the obvious S.M.A.R.T tests ( i.e. long offline ) are there any systems out there for testing hard disks to a similar level to that of memtest? Or any tried and tested methods for testing storage media?"
http://www.grc.com/sr/spinrite.htm
K Man
I've hit a bit of a brick wall when it comes to testing hard disks
Have you tried throwing them against the brick wall?
MHDD will test each sector and time how long it takes to access; you can blacklist weak/slow sectors. About the best I know for disk integrity.
Even if your storage passes the test, it could fail the next day. What you should be doing is designing your storage to gracefully handle failure, like RAID 5 with spares.
In previous jobs, I've used the system of:
Full Format, Verify, Erase, then a Drive fitness test.
If there are errors in media, the Format, verify or erase will pick it up, then the fitness test to check the hardware.
Hitachi has a Drive Fitness test program
I have also used hddllf (hddguru.com)
It's a joke. I've seen drives work fine for years with it showing imminent drive failure and I've seen drives die instantly with no warning given whatsoever.
There is no perfect tool that I could name. Each drive manufacturer makes their own, and there are numerous third party tools out there as well. My best advice is to have them all and have them handy. One I use quite a bit is HDD Regenerator, a pretty thorough utility, but it takes some time to run.
If carrots got you drunk, rabbits would be fucked up. - Comedian Mitch Hedberg R.I.P. 03/30/68-2/24/05
don't use consumer drives if you're concerned.
see also http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/archive/disk_failures.pdf
The Goog wrote a nice paper on hard drives.
Need Mercedes parts ?
I look at the SMART data, then I run "fsck -f -c" to test all blocks on the drive, then I look at SMART data again to see if there have been any read errors or remapped sectors. Next, I run dd if=/dev/zero of=/dev/sdx (where sdx is the new drive), to write all sectors. I look at the SMART data again, and repeat the fsck/dd commands as many times as I need. This can easily be scripted, and you can do some random writing as well to exercise the drive's seek characteristics.
ZFS is focused mainly on integrity, so just set copies to 3 (the highest value the property accepts), with checksumming, and stress it by filling it up with files, an occasional scrub, and so forth. If there's a problem, zfs will report it.
Then have ghosting software auto backup periodically.
You can test that the drive works pretty easily, put it in a PC, copy a bunch of files to it (perhaps enough to fill it up), then run MD5 on those files vs the originals. That would be the "pedantic" way to test it, for "turbo-pedantic" (a bit like running memtest for 72 hours) you can test this way for your entire MP3 collection, then test again for your entire Quantum Leap upscaled 720p dvdrips collection.
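The copy-then-compare check described above can be scripted. What follows is only a minimal sketch in Python, assuming the originals sit in one directory tree and the copies on the new drive in another; the function names `md5_of` and `compare_trees` are made up for illustration.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1024 * 1024):
    """Return the hex MD5 of a file, reading it in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_trees(original_root, copy_root):
    """Return relative paths that are missing or differ under copy_root."""
    original_root, copy_root = Path(original_root), Path(copy_root)
    problems = []
    for src in sorted(original_root.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(original_root)
        dst = copy_root / rel
        # A missing copy and a hash mismatch are both reported as problems.
        if not dst.is_file() or md5_of(src) != md5_of(dst):
            problems.append(rel.as_posix())
    return problems
```

An empty list from `compare_trees("/path/to/originals", "/mnt/newdrive/copies")` (both paths hypothetical) would mean every file read back identical.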
For more practical testing, most drive manufacturers offer "validation" software tools for RMA purposes to test low-level operations and performance, and most of them are generic to the extent that you can actually test any make of drive the same way. It's free and it works, what more could you ask for?
Jet Stress does a good job of running the storage media through a lot of work.
If you're concerned about drive performance and reliability don't waste your time on off-the-shelf junk. Buy actual enterprise class drives from distributors which pay many dollars to have each and every drive tested for both performance and reliability in varying environmental conditions.
Has several levels of testing including a full blown exerciser. I've found it very effective for detecting the slightest drive problems. It's available for download from multiple sources.
All I usually do is:
1. smartctl -AH
Get an initial baseline report.
2. mke2fs -c -c
Perform a read/write test on the drive.
3. smartctl -AH
Get a final report to compare to the initial report.
If the drive remains healthy, and error counters aren't incrementing between the smartctl reports, it's good to go.
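Comparing the two smartctl reports by eye gets tedious across many drives. Here is a rough Python sketch of the comparison step, assuming the usual ATA attribute table printed by `smartctl -A` (ten whitespace-separated columns, raw value last). The watched attribute names are common ones but vary by vendor, so treat the `WATCHED` list and both function names as assumptions.

```python
# Attribute names commonly seen on ATA drives; your drive may label them differently.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")


def parse_smart_attributes(report):
    """Map attribute name to integer raw value from `smartctl -A` text output."""
    attrs = {}
    for line in report.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least ten columns.
        if len(fields) >= 10 and fields[0].isdigit() and fields[9].isdigit():
            attrs[fields[1]] = int(fields[9])
    return attrs


def changed_counters(before, after, watched=WATCHED):
    """Return {name: (old, new)} for watched counters that went up."""
    old = parse_smart_attributes(before)
    new = parse_smart_attributes(after)
    return {
        name: (old[name], new[name])
        for name in watched
        if name in old and name in new and new[name] > old[name]
    }
```

The two reports could be captured with something like `subprocess.run(["smartctl", "-A", "/dev/sdX"], capture_output=True, text=True).stdout` before and after the read/write test; an empty dict from `changed_counters` means none of the watched counters moved.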
Can You Say Linux? I Knew That You Could.
Disappear for 4 months and come back and say they are good. Even if you test there is no reason that hardware can't fail at any point after the test. That's why we buy redundancy and support.
I've always had good results with the Hitachi Drive Fitness Test. Works fine with non-Hitachi drives too.
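root ~/bin # cat scandisk
#!/bin/bash
# RW scan of HD. Usage: scandisk sdX (device name without the /dev/ prefix)
argg="/dev/$1"
# if IDE (old kernels): enable 32-bit I/O, DMA and interrupt unmasking
hdparm -c1 -d1 -u1 "$argg"
# Speedup I/O (larger read-ahead) - also good for USB disks
blockdev --setra 16384 "$argg"
blockdev --getra "$argg"
#time badblocks -f -c 20480 -n -s -v "$argg"
#time badblocks -f -c 16384 -n -s -v "$argg"
# non-destructive read-write scan, testing 10240 blocks at a time
time badblocks -f -c 10240 -n -s -v "$argg"
exit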
---------
Note that this reads existing content on the drive, writes a randomized pattern, reads it back, and writes the original content back. With modern high-capacity over-500GB drives, you should plan on leaving this running overnight. You can do this from pretty much any linux livecd, AFAIK. If running your own distro, you can monitor the disk I/O with ' iostat -k 5 '.
From ' man badblocks '
-n Use non-destructive read-write mode. By default only a non-destructive read-only test is done. This option must not be combined with the -w option, as they are mutually exclusive.
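To make the read, overwrite, read back, restore cycle concrete, here is a toy Python version of the same idea. It is only a sketch of the technique, not how badblocks is implemented; the function name and block size are assumptions, and because it goes through the OS page cache the read-back may never touch the platters, so it illustrates the logic rather than replacing the real tool.

```python
import os


def nondestructive_rw_test(path, block_size=4096):
    """Return offsets of blocks that failed to hold a random test pattern.

    Each full block is saved, overwritten with random bytes, read back,
    and then restored. A trailing partial block is left untouched.
    """
    bad = []
    with open(path, "r+b") as dev:
        size = dev.seek(0, os.SEEK_END)
        for offset in range(0, size - size % block_size, block_size):
            dev.seek(offset)
            original = dev.read(block_size)   # save existing content
            pattern = os.urandom(block_size)
            dev.seek(offset)
            dev.write(pattern)                # write the test pattern
            dev.flush()
            os.fsync(dev.fileno())
            dev.seek(offset)
            if dev.read(block_size) != pattern:
                bad.append(offset)
            dev.seek(offset)
            dev.write(original)               # put the original data back
        dev.flush()
        os.fsync(dev.fileno())
    return bad
```

Like the real thing, an interruption between the pattern write and the restore would leave a block of garbage behind, so run it on a scratch file first.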
== WolfriderV6 == I'm willing to admit that *I just might* be wrong... Are you??
http://www.uxd.com/qtpro.shtml
This is, by far, the best hard drive (and hardware) testing suite I've ever used.
There are many free tools for doing surface scans of a hard drive that test for bad sectors. Usually it's bad sectors that cause hard drives to fail, and that's all you can really test anyway. Hard drives will fail, it's just a matter of time; that's why you need redundancy. Other than testing, keeping the system cool and dust free is all you can do.
-- By all means let's be open-minded, but not so open-minded that our brains drop out.
Modern failure modes tend to be catastrophic, you won't find bad sectors on a hard drive these days. The drives have so much error correcting and sector re-mapping that the very act of writing to a bad portion of the platter will silently correct and remap the sector. The main way you can see failures is to write data, do not read it for a *long* time, then get a read failure. Plus the initial part of the bathtub curve is in months not days, so testing for reliability is really not something you can do.
Hard Disk Sentinel (http://www.hdsentinel.com/) is a great tool. They even have a free Linux client. What it does over SMART is take the SMART data and weight the attributes according to how strongly they indicate failure, then give you a score of 0-100 (100 being great, 0 being dead) for how healthy the drive is. We use this extensively and have created Nagios scripts that monitor the output. Generally, if a drive has a score of 65 or higher, I will continue using it (pretty much all my setups are RAID 10 or RAID 6). If the score starts dropping rapidly (a few points every day, even if it started high) or gets below 65 or so, I go ahead and replace it. It has helped out a bunch.
Even with that, using the SMART data in a smart way still only predicts about 30% of failures. The other 70% will come out of nowhere. That is why it is best to assume all drives are suspect and can die at any time, and never allow a single drive to be the sole copy of anything.
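The replacement rule above (below 65, or falling a few points every day) is simple enough to put in a monitoring check. A hedged sketch in Python: the 65 floor comes from the comment, while reading "a few points every day" as at least 2 points per day for 3 days running is purely my assumption, as is the function name.

```python
def should_replace(daily_scores, floor=65, daily_drop=2, window=3):
    """Decide whether a drive should be swapped out.

    daily_scores: health scores 0-100, oldest first, one reading per day.
    floor: replace once the latest score is under this value.
    daily_drop / window: assumed meaning of "dropping rapidly", i.e. the
    score fell by at least daily_drop on each of the last `window` days.
    """
    if not daily_scores:
        return False
    if floor > daily_scores[-1]:
        return True
    recent = daily_scores[-(window + 1):]
    drops = [earlier - later for earlier, later in zip(recent, recent[1:])]
    return len(drops) >= window and all(drop >= daily_drop for drop in drops)
```

With these defaults a drive at 89 that read 98, 95, 92 on the previous three days would be flagged even though it is still well above the floor.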
When it comes to media, even with SMART your drives will work 'till they die, and there's no way to predict that with a test.
Given that, your best option is to ensure that the drives are performing as expected. I've found many a faulty drive with IOZONE.
http://www.iozone.org/
-- "In order to have power, I must be taken seriously." -Mojo Jojo
OK so that was the noob version of the question.
I have a question for the old timers. has anyone ever implemented something like:
1) log the time and temp
2) do a run of bonnie++ or a huge dd command
3) log the time and temp
4) Repeat above about ten times
5) numerical differentiation of time and temp and also any "overtemps"
In theory run from a cold or lukewarm start that could detect a drive drawing "too much" current or otherwise being F'd up, or cooling fan malfunction
I'm specifically looking for rate of temp increase as in watts expended, not just static workload temp.
In practice it might be a complete waste of time.
Another one might be something like a smart reported temp vs iostat reported usage plotted on a scatterplot graph.
So the old timer question is has anyone ever bothered to implement this, and if so, did it do anything useful other than pad your billable hours?
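For what it's worth, the numerical differentiation step is the easy part. A small Python sketch, assuming you have already logged (seconds, degrees C) pairs around each bonnie++ or dd run; the thresholds and function names are placeholders, not recommendations.

```python
def heating_rates(samples):
    """Finite-difference heating rate in degrees C per minute.

    samples: list of (seconds, degrees_c) pairs in time order.
    """
    rates = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        if not t1 > t0:
            raise ValueError("timestamps must be strictly increasing")
        rates.append((c1 - c0) * 60.0 / (t1 - t0))
    return rates


def check_heating(samples, max_rate=1.0, max_temp=55.0):
    """Summarise a run: peak heating rate, plus rate and overtemp flags.

    max_rate and max_temp are assumed limits; pick your own from a
    known-good drive of the same model in the same chassis.
    """
    peak = max(heating_rates(samples), default=0.0)
    return {
        "peak_rate": peak,
        "too_fast": peak > max_rate,
        "overtemp": any(temp > max_temp for _, temp in samples),
    }
```

On many drives the temperature itself can be read from the SMART attribute table, which makes the logging step a one-liner around smartctl.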
"Science flies us to the moon. Religion flies us into buildings." - Victor Stenger
I manage a team that oversees PB of disk, both within an enterprise array and internal to the server. For testing the arrays, since there's GB of cache in front of the disks, I can only rely on the vendor to do the appropriate post installation testing to make sure there are no DOA disks. For internal disks, as others have mentioned you could run IOMeter for days without a problem and then the very next day it's dead. Unlike memory, disks have moving parts that can fail much easier than chips. However, with proper precautions like RAID, single disk failures can be avoided.
The bigger problem is having a double disk failure. This is due to the amount of time required to rebuild the failed disk. Back when disks were 100GB this was a "relatively" quick process. However, in some of my arrays with 3TB drives in them, it can take much longer to replace the drive. Even to the point whereby having hotspares has been considered to be not worth it as my array vendor will have a new disk in the array within 4hrs. With what an enterprise disk costs from the array vendor (not Frys), it can start to add up.
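A back-of-the-envelope figure shows why the rebuild window grows. This Python sketch just divides capacity by a sustained rebuild rate; the 100 MB/s rate is an assumption for illustration, and a real array serving production I/O will usually rebuild slower than that.

```python
def rebuild_hours(capacity_gb, rate_mb_per_s):
    """Best-case hours to rewrite a whole drive at a sustained rate.

    Uses decimal units (1 GB = 1000 MB), as drive vendors do.
    """
    return capacity_gb * 1000 / rate_mb_per_s / 3600
```

At the assumed 100 MB/s, `rebuild_hours(100, 100)` comes to about 0.28 hours (roughly 17 minutes) while `rebuild_hours(3000, 100)` comes to about 8.3 hours, so the window in which a second failure kills the array is thirty times longer.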
I generally run 'badblocks' (included in most Linux distributions).
Not completely related to how to test, but...
In 2007 Google reported that for a sample of 100k drives, only 60% of their drives with failures had ever encountered any SMART errors. Also, NetApp has reported a significant number of drives with temporary failures, such that they can be placed back into a pool after being taken offline for a period of time and wiped. Google also had a lot of other interesting things to say (such as that heat has no noticeable effect on hard drive life under 45C, that load is unrelated to failure rates, and that if a drive doesn't fail after 3 months, it's very unlikely to fail until the 2-3 year timeframe).
You can find the google paper here: http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/archive/disk_failures.pdf
A few other notes that you can find from storage vendor tech notes if you own their arrays:
* Enterprise-level SAS drives aren't any more reliable than consumer SATA drives
- But they do have considerably different firmwares that assume they will be placed in an array, and thus have a completely different self-healing scheme than consumer-level drives (generally resulting in higher performance in failure scenarios)
* RAID 5 is a really bad idea - correlated failures are much more likely than the math would indicate, especially with the rebuild times involved with today's huge drives
* You have a lot more filesystem options that might not even make sense to use with a RAID system, like ZFS, as well as other mechanisms for distributing your data at a layer higher than the filesystem
Ultimately the reality is that regardless of the testing you put them under, hard drives will fail, and you need to design your production system around this fact. You *should* burn them in with constant read/write cycles for a couple days in order to identify those drives which are essentially DOA, but you shouldn't assume any drive that passes that process won't die tomorrow.
I mirror data and test it periodically with rsync using the dry-run (-n) and checksum options (-c) to do a full comparison. I usually have more confidence in a new disk after I've done this a few times.
Read this for more info on disk storage
http://storagemojo.com/2007/02/26/netapp-weighs-in-on-disks/
I have a favorite boulder that has served my burn-in testing needs pretty well. Would you like a photo so you can chisel your own? I added some LED bling to mine.
I mean, really, someone working at slashdot doesn't know this? This is about as basic a question as it gets when it comes to hardware.
Raid 5, a solid backup scheme, and a storage closet full of replacement drives. There is no good way to test HDDs.
Due entirely to the fact that they are a WEAR item it is only possible to decide which brand you trust the most.
Other than that, if it's a big job and a lot of HDDs are going to be bought, you could take 10 of each candidate drive and run them through SpinRite till they fail. The one that fails last wins.
SSDs are still too expensive to be using for mass storage, and should be treated the exact same way even if you ARE using them for mass storage regardless, as they are also a wear item, and will fail without warning after so many read/write operations, the same as a HDD.
IIRC zfs supports online "data scrubbing" http://en.wikipedia.org/wiki/Data_scrubbing#In_RAID . This, combined with the other features of zfs can help you prevent data loss.
The storage team where I work uses a program called IOmeter. It runs on Windows and Linux (and I think other platforms as well) and is open source. They were using it for stress testing SAN storage but I think it would work for locally attached storage as well.
Any insufficiently advanced magic is indistinguishable from technology.
http://linux.die.net/man/1/spew
While you can't predict against future failures, if you want to make sure that your drive media is okay today, there is a tool that will fill your disk with garbage and then verify that your disk has the right garbage on it: spew. Spew isn't the friendliest tool, but it does the job.
As a side effect, it stresses your I/O systems and memory. Years ago, I discovered that some Dell 2550's I had couldn't pass this test with the SATA controller I had shoved into them that seemed to work fine otherwise.
http://lime-technology.com/forum/index.php?topic=2817.0 ... the main feature of the script is
1. gets a SMART report
2. pre-reads the entire disk
3. writes zeros to the entire disk
4. sets the special signature recognized by unRAID
5. verifies the signature
6. post-reads the entire disk
7. optionally repeats the process for additional cycles (if you specified the "-c NN" option, where NN = a number from 1 to 20, default is to run 1 cycle)
8. gets a final SMART report
9. compares the SMART reports alerting you of differences.
Check it out. Its "original" purpose was to set the drive to all "0's" for easy insertion into a parity array (read: parity drive does not need to be updated if the new drive is all zeros) but it has also shown great utility as a stress test / burn-in tool to detect infant mortality and "force the issue" as far as satisfying the criteria needed for an RMA (read: sufficient reallocated block count)
If your skill level is enough to adapt the script to your own environment then great, otherwise UnRaid Basic is free and allows 3 drives in the array, which should allow you to pre-clear three drives simultaneously. You might even be able to pre-clear more than that (up to available hardware slots) since you aren't technically dealing with the array at that point, but with enumerated hardware that the script has access to, which should be everything on the disc. Hardware requirements are minimal and it runs from flash.
If you can't be good, be good at it!
A good Storage Unit will do a good job at maintaining the Hard Drives you purchased and keeping them safe. They can also handle problems with drives to prevent data loss, and notify you when a drive is about to fail. A good SAN or DAS is what I would purchase. We purchased a Dell Powervault DAS, and have been very happy with it, I never worry about Hard Drives failing because I know the DAS will take care of it. Some companies like Dell and EMC will know if you have a bad hard drive, and ship you a new one before you realize it.
-- By all means let's be open-minded, but not so open-minded that our brains drop out.
It's always good to do a full write with read verify on new media. For my own peace of mind, I wrote a Java application that fills a drive with pseudo-random data and then reads it back to make sure (1) the data is correct, and (2) the entire drive capacity can be accessed. Use this in addition to the many good hardware diagnostic tools (see other comments). As has been pointed out, this only tells you that the drive is working now, but can't predict when it will fail.
BLATANT ADVERTISEMENT: The Java program has been released under the GPL and can be found here (Linux, MacOS, Windows, etc): http://linux.softpedia.com/get/System/System-Administration/Erase-Disk-46749.shtml
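The fill-and-verify idea itself fits in a few lines. The sketch below is in Python and is not the Java program linked above; it only illustrates the same technique under my own assumptions: the target is a file on the new drive's filesystem, and a seeded generator reproduces the data for checking so nothing has to be kept in memory. All names are hypothetical.

```python
import os
import random


def _blocks(total_bytes, seed, block_size):
    """Yield (offset, data) pairs of reproducible pseudo-random blocks."""
    rng = random.Random(seed)
    offset = 0
    while total_bytes > offset:
        n = min(block_size, total_bytes - offset)
        yield offset, rng.getrandbits(8 * n).to_bytes(n, "little")
        offset += n


def write_pattern(path, total_bytes, seed=1, block_size=1024 * 1024):
    """Fill path with total_bytes of seeded pseudo-random data."""
    with open(path, "wb") as out:
        for _, block in _blocks(total_bytes, seed, block_size):
            out.write(block)
        out.flush()
        os.fsync(out.fileno())


def verify_pattern(path, total_bytes, seed=1, block_size=1024 * 1024):
    """Return the offset of the first bad or short block, or None if all match."""
    with open(path, "rb") as src:
        for offset, expected in _blocks(total_bytes, seed, block_size):
            if src.read(len(expected)) != expected:
                return offset
    return None
```

Running `verify_pattern` after a remount or reboot, with the same seed and block size, gives better odds that the data really comes off the disk and not out of RAM.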
we use HP servers and HP ships a suite of software to install on the server along with the OS. they monitor the hardware and warn you of any problems. unless you like doing things the hard way, this was solved years and years ago
i have a bad hard drive i call HP, send them a log file and in 2 hours i have a new one delivered
Don't use raid 5 (write hole).
use 10 or simple mirroring.
(raidz should be ok).
Use the money saved on 2 expensive raid controllers (One fails you are stuffed if you cannot get another which is by no means guaranteed) to buy more disks.
Just RAID your storage or better connect to a SAN and be done with it.
Hitachi's (previously IBM's) Drive Fitness Test is the most thorough disk test I've used. It works on all makes, and has a "drive exerciser" that can loop a test sequence.
I've seen it find problems with drives that the manufacturer's own tools don't expose.
My policy is that if a drive survives 20 loops of the exerciser and then a full extended test that it's fit for production service.
Testing hardware by exercising it is like testing matches to see if they are good.
Sorry, but gray text on gray background is making my eyes bleed.
http://www.heise.de/download/h2testw.html - switchable to English of course.
While it is primarily advertised for flash media these days (and indispensable since there have been numerous forgeries or DOAs at least on the European market lately), it evolved as an HDD tester in the first place.
On Linux in particular, a combination of dd and smartctl (before&after writing the entire disk, as well as for self-tests) may come in handy too, of course.
It takes a while, but if you really want to be sure of your hardware (as sure as you can be, at least.)
Check the SMART status. If there are any re-allocated sectors, make note of the number.
Run badblocks with the -w switch against the drive (from a Linux live cd of your choice, for example)
That should completely read/write test the drive 4 times with multiple patterns. There should be no errors reported. This test will take longer than overnight on modern drives.
Check the SMART data again. Be wary if there has been an increase in re-allocated sectors. This is considered normal and does not constitute drive failure. However, most drives should not have any reported re-allocations so early in life, and this may indicate you have a drive of marginal quality.
Do not try this on SSD drives.
or read a book
Most everything above is good, but don't overlook the obvious. Spin the drive up in a quiet room and listen to it. If it sounds different from all the other drives like it, there's a good chance something is wrong.
I replaced the drive in my TiVo. The 1st replacement was so much louder, I swapped the original back, then put the new drive in a test rig. It started getting bad sectors in a few days. RMA'd it to Seagate, and the new one was much quieter.
All ideas^H^H^H^H^Hprocesses in this post are Patent Pending. (as well as the process of patenting all postings)
Send your drives to me, postage paid. I'll test them for you for no charge, and send them back to you before the warranty expires.
Damn_registrars has no butt-hole. Damn_registrars has no use for a butt-hole.
Or any tried and tested methods for testing storage media?
#2 pencil, Scantron sheet and the test.
Works for me everytime.
If "disco" means "I learn" in Latin, does "discothèque" mean "I learn technology"?
There's a program called h2benchw and that's great for testing hard drive.
There might be a few versions floating around, but it was published by the German magazine "C't". I use it to test my hard drives because it's the ONLY one that I've found that actually has a "swapping" application profile and tries to mimic the swap file/data access pattern, as a lot of what I do is swap (performance) limited.
The rest of the usual suite that I use include ATTO Express Pro Tools (now renamed to something else I think), and also HDTach as the most basic.
But there's no test that can replace what you're ultimately going to actually do with the system. Create a uniform benchmark for yourself and run your regular workload on it. That'll be the sure-fire way to test new hardware.
badblocks -c 10240 -s -w -t random -v /dev/sda1
that's my standard test for all HDDs
#
#\ @ ? Colonize Mars
#
If storage media is used for backups - try restoring them. If files that are stored there are not changed - you can also save MD5 (or some other hash) in separate file and re-calculate/re-check it periodically.
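The save-a-hash-and-recheck routine can be automated along these lines. A minimal Python sketch, assuming SHA-256 in place of MD5 (the "some other hash" option) and a JSON manifest kept outside the tree being checked; the function names and manifest format are mine.

```python
import hashlib
import json
from pathlib import Path


def file_hash(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(root, manifest_path):
    """Record a hash for every file under root in a separate JSON file."""
    root = Path(root)
    manifest = {
        p.relative_to(root).as_posix(): file_hash(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }
    Path(manifest_path).write_text(json.dumps(manifest, indent=2))
    return manifest


def check_manifest(root, manifest_path):
    """Return files that are missing or no longer match the saved hash."""
    root = Path(root)
    manifest = json.loads(Path(manifest_path).read_text())
    bad = []
    for rel, expected in sorted(manifest.items()):
        target = root / rel
        if not target.is_file() or file_hash(target) != expected:
            bad.append(rel)
    return bad
```

Run `write_manifest` once after the backup is written, then `check_manifest` from cron; anything it returns is a file that has silently changed or vanished.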
while label_visible(media):
    apply_lighter_fluid(media)
    ignite(media)
    wait_while_burning(media)
return media_is_bad
Well.. maybe. Or Maybe not. But Definitely not sort of.
MHDD, If this hasn't been mentioned yet, it's official...everyone on /. is an idiot.
Spinrite is the one you want.
http://www.grc.com
1a) Build a small test computer with several removable hard drive racks. (SNT-125B go for around $16 on Newegg)
1b) Or plug a hard drive dock with eSata to your rig for individual drive testing.
2a) Run SeaTools for all hard drive brands, including WD. (http://www.seagate.com/www/en-us/support/downloads/seatools)
2b) Or run Data Lifeguard Diagnostics if it's a WD drive.
I've run SeaTools against every brand of hard drive, and it's caught every one.
For the most part I stick with WD drives. Their warranty is excellent.
I use Hiren's boot disk and run the HDAT test, it seems to work for the stuff i do. The other best way to test a HDD is to run it through a serious burn in test. We test hard drives and ship them daily and rarely do we have failures or returns.
Other people have posted about the manufactures test utilities, RAID etc.
But I didn't see anyone mention that most newer drives can run media scans while idle.
For example, Seagate supports the T10 Background Media Scan (BMS), where the drive scans itself and relocates sectors when it's not actively processing commands from the host. It also supports idle read after write, which reads recently written sectors and compares them with the copy in the cache during idle periods.
Finally, all the modern RAID controllers I've seen have scrub options for validating the RAID parity and taking drives offline that are failing the parity checks (which is mostly pointless if your drive is scrubbing itself; the ECC from the drive provides more protection than RAID 5. RAID 6 is probably safer).
The key, of course, is to make sure your RAID controller understands the drive failure metrics behind it.
Speaking as somebody that has done hardware qualifications and burn-in development at very large scale for companies you have heard of, let me tell you the tools I use:
fio: The _BEST_ tool for raw drive performance and burnin testing. A couple of hours of random access will ensure the drive head can dance, then a full block by block walk through with checksum verification will ensure that all blocks are readable and writable.. I usually do 2 or 3 passes here. You can tell fio to reject drives that do not perform to a minimum standard. Very useful for finding functional yet not quite up to speed drives. The statistics produced here are awesome as well.. Something like 70 stats per device per test.
stressapptest: This is Google's burn-in tool and virtually the only one I have ever found that supports NUMA on modern dual socket machines. This is IMPORTANT as it's easy to ignore issues that come up with the link between the CPUs. The various testing modes give you the ability to tear the machine to pieces, which is awesome. Stressapptest is also the most power hungry test I have ever seen, including the Intel power testing suite that you have to jump through hoops to get.
Pair this with a pass of memtest and you get a really, really nice burn-in system that can brutalize the hardware and give you scriptable systems for detecting failure.
Choosing one brand to trust seems counterproductive, since this implies your disks will all be from one brand, thus increasing the chances that both/all disks in a mirror set are affected by a single process flaw. Best practice is to ensure mirrored drives are not only not of the same lot, but not produced on the same equipment, and the only easy way to do that is to use different brands.
dbench, lmbench, bonnie/++ and badblocks monitoring with smartmontools, blktrace, and seekwatcher
I've always used "consumer" grade HDs in my servers and they work most reliably. Difference from a desktop: they never stop spinning from day 1. No heat up/cool down expansion/contraction. The "green" drives that spin down after some time of no use are the ones I've had the highest failure rates with.
I see that many comments include utilities that may or may not indicate ACTUAL problems. Some of these test for speed, but how? Personally, i find the most accurate way to test storage media (and network media) is by copying data to and from the storage media in question, to a RAMDISK. For network throughput, i like to copy ramdisk to ramdisk via differing protocols, ie ftp/http/samba, etc. (i run a small ramdisk as a virtual folder in IIS (win7) for hosting small files to give to friends.) Also, installing (cpu/mem/gpu) benchmark utilities to a ramdisk yields interesting results. i don't believe most or all benchmark utilities are ACTUALLY unhindered by storage media when testing other components, like the cpu.
RAID 1 is what you are referring to; RAID 10 is a nested RAID setup, which is a stripe of mirrors.
When they start selling drives large enough and can support more I/O, your RAID 1 only plan might work. Some people need more than 2TB per LUN and need far more I/O than two spindles can support. Sure, you can make one big RAID1 I guess but that is a 50% loss in space and not a good method to use.
You can go back to /b/ now..
It may not be sophisticated, but MHDD is what I use at work (among a couple of other tools). Other tools are more reliable in different circumstances, but my first stop is always MHDD, because it will give me a comprehensive R/W delay test on a disk. Extremely practical for a workshop, perhaps not practical for a data centre.
Admit it. You post strawman arguments as AC so you get modded Insightful for refuting them, rather than Troll
Or do any sort of verified format of the device.
Drives exposing logical sectors instead of physical sectors has been a problem for over 10 years. There are plenty of ways to determine the physical geometry such as timing seek times and sometimes it is as easy as looking up the drive type in a list. As far as SpinRite being BS because it works on USB drives, the program does what it is programmed for. When it encounters a condition it doesn't understand it makes lemonade and moves on. It may not be programmed to respond, "Are you sure this is rotational media?"
...passwords. Be sure they're hardened and salted. Also, redundancy and the cloud.
Basically, check the reallocated sectors in smartctl, run badblocks -w (for greater coverage, add some "-t random" passes to the usual 4 fixed patterns), and make sure that reallocated sectors & pending sectors haven't changed.
If they have, you can use some judgement, but if they haven't, it's probably okay.
Some seek tests would be nice, but it already takes long enough to just read and write every sector.
If the weather's been cold (<5C) between the warehouse and the place of testing then I give a disk 24 hours to acclimatise outside of any packaging before powering it up. I check it for visible signs of damage as I've had a few delivered which had undamaged packaging but was still visibly broken. Once I've noted the product code, model name and serial (great fun otherwise, if $mfg does a recall and you don't know if affected) I do a warranty check on the drive. Ebuyer in the UK send out grey imports with no UK warranty sometimes, which shows up on $mfg's web site. If this is the case I'll get written confirmation from the supplier that they will honour the manufacturer warranty. n.b. I run a business, so consumer protection laws do not apply to me - this is a necessary precaution.
Then I start testing using a procedure which is designed to be thorough but not wasting any time if something is amiss. I don't consider SMART to be the end-all of fault finding, but always trust it if it's saying something is wrong. The manufacturer will honour returns according to SMART, so it's good enough for me.
I check the SMART attributes for anything obvious. Any mention of any LBA being bad or possibly bad, then the drive fails. I then run the short, conveyance, offline and long tests, checking the attributes and logs afterwards.
I run the manufacturer's short test on it, then the long test. These often lie if a disk is bad but if they do show up anything, the disk fails. I record the firmware version and check the manufacturer's site to see if they advise an upgrade (I don't trust the version printed on the label). smartmontools is also quite good at alerting you to this. I'll then DBAN the disk with DoD short to give it a good test and then check the SMART attributes again. Once these all pass I'll start using it and get smartmontools to schedule regular tests and email me if anything bad shows up.
They might know something or have utilities for doing such things...
"This post is an artistic work of fiction and falsehood. Only a fool would take anything posted here as fact."
Try ZFS.
If it's that important, implement an appropriate RAID solution and you don't even need to care about individual drives.
http://www.google.com/patents/US7461298
If it's a single disk (e.g. in a compute only server), then treat it like toilet paper. Know it's going to be replaced, how to replace it, and have spares on hand. Inform all people who use these servers that data on the server is being treated as throw-away at all times, and to never, ever expect data to survive a reboot -- and when the hard drive fails, you *might* be able to recover some data. It goes without saying that you also need to be able to re-provision the system with a common image, as well.
If people can't deal with that, then it's time to get into RAID, decent backups, NAS, SAN and other technologies that require additional cost and care to purchase and maintain but that are designed to guard against disk failures bringing down critical systems.
You should also make sure that you, and your consumers, understand your drive replacement procedure. Are you keeping a stack of spares and self-servicing, or are you paying a vendor to provide n-hour service?
When in doubt, test in production...
Fio
https://www.linux.com/learn/tutorials/442451-inspecting-disk-io-performance-with-fio
http://www.obsecurities.com/reviews/5-fio-flexible-io-tester-review
Maurice W. Hilarius Voice: (778) 347-9907
To answer the actual question: IOMeter. It's a load generator / benchmark. You can generate loads to test a storage device for your specific requirements and see if performance is up to snuff. You can also generate loads to stress a device until you halt it.
As someone else mentioned, throw bunches of read/writes at a drive for a couple days then put it into production with a reliable system to gracefully handle failure. You want to find drives that would fail in the first couple weeks and keep them from hitting your production environment.
It's not. It's actually cheaper to test a particular batch if it's for a big enough job.
In a three-drive RAID 5 configuration, if one disk fails you rebuild it, then swap each of the other drives out one at a time immediately afterwards. Since one drive has failed and they've all had the same use pattern, you can assume that the others are going to fail as well.
Beyond that, if things are that mission critical you should have streaming backups to an additional backup array of newer drives. Brand makes zero difference here; they just have to have less usage.
Besides which, whatever they claim, RAID controllers to this day do not like different types of hardware inside a single array. I've seen IT guys take the manufacturers at their word and spend months looking for "ghosts in the system" due to random corrupt data errors. It's just the RAID controller acting up because someone installed a Western Digital drive in the same array as a Maxtor, or a 120 GB drive alongside two 80 GB drives.
There is no "easy" when it comes to proper hardware management. If you're working in IT you need to learn this fast before it costs your company big time down the road.
You set some basic parameters, such as min/max file sizes to use, stepping size between files, how many processes/threads to use in the test, etc. It will run a gamut of tests reading/writing files: sequential, random, varying read/write sizes, and varying sizes of the files it is creating. It outputs nice graphs so you can see where the peak performance values are in terms of the storage dealing with different sized files and read/write sizes.
http://www.iozone.org/
We were all warned a long time ago that MS products sucked, remember the Magic 8 Ball said, "Outlook not so good"
The company I work for reads the manufacturer's white papers on their drives. That's been our testing strategy for years; doesn't everyone do it that way?
Spinrite is good.
Also, as drives get bigger, they also get faster
True, but they get bigger faster. That's why there's a movement to migrate away from RAID 3-5: the risk of two drives failing within two days is too great.
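To put rough numbers on "they get bigger faster", here is a back-of-the-envelope Python sketch. It only computes how long one full pass over a drive takes at a sustained rate, which is a lower bound on rebuild time. The capacities and throughputs are hypothetical figures picked to show the trend, not measurements of any real drive.

```python
def full_pass_hours(capacity_gb, throughput_mb_s):
    """Hours to read or write the whole drive once at a sustained rate."""
    seconds = (capacity_gb * 1000) / throughput_mb_s  # GB -> MB, decimal units
    return seconds / 3600


# Hypothetical drives, chosen only to illustrate the trend.
old = full_pass_hours(capacity_gb=80, throughput_mb_s=50)
new = full_pass_hours(capacity_gb=2000, throughput_mb_s=120)
print(f"80 GB at 50 MB/s: {old:.2f} h; 2000 GB at 120 MB/s: {new:.2f} h")
```

With these assumed figures capacity grows 25 times while throughput grows 2.4 times, so the window in which a second failure can kill the array gets roughly ten times longer.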
And you obviously don't know how to price them.
On a new drive it is a good idea to run badblocks in write mode, which fills the drive with several byte patterns and compares. This will cause bad blocks to be remapped internally in the drive and generally ensures the physical medium is working.
But there can be other problems with disks, and even more so with multi-disk storage. Specifically, badblocks writes a single byte pattern over all of the disk and then reads that back. But what if the drive makes a mistake with the address that data is written to or read from? Since the disk is filled with the same pattern, an address error isn't detected. It would be better to write different patterns at different addresses.
One quick hack to do this is to encrypt the drive and run badblocks on the encrypted volume. Most encryption algorithms produce different output depending on the position of the block, so address errors will then result in corrupt data being read back from the drive.
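The same idea without encryption, as a minimal Python sketch: derive each block's contents from its own index, then check on read-back that every block holds the pattern for its address. This is a different technique from the encryption hack, not what badblocks does. The block size, function names and the scratch-file demo are all assumptions for illustration; pointing it at a real device would overwrite everything on it.

```python
import hashlib
import os
import tempfile

BLOCK_SIZE = 4096  # hypothetical block size for this sketch


def block_pattern(index, size=BLOCK_SIZE):
    """Build a block whose contents depend on its own index."""
    seed = hashlib.sha256(index.to_bytes(8, "little")).digest()  # 32 bytes
    return (seed * (size // len(seed) + 1))[:size]


def write_patterns(path, num_blocks):
    """Write a distinct pattern to every block and ask the OS to flush it."""
    with open(path, "r+b") as f:
        for i in range(num_blocks):
            f.seek(i * BLOCK_SIZE)
            f.write(block_pattern(i))
        f.flush()
        os.fsync(f.fileno())


def verify_patterns(path, num_blocks):
    """Return indices of blocks whose contents do not match their address."""
    bad = []
    with open(path, "rb") as f:
        for i in range(num_blocks):
            f.seek(i * BLOCK_SIZE)
            if f.read(BLOCK_SIZE) != block_pattern(i):
                bad.append(i)
    return bad


def demo():
    """Run against a scratch file and simulate a misdirected write."""
    num_blocks = 16
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.truncate(num_blocks * BLOCK_SIZE)
        path = tmp.name
    try:
        write_patterns(path, num_blocks)
        clean = verify_patterns(path, num_blocks)
        # Simulate an address error: block 3's data lands at block 7.
        with open(path, "r+b") as f:
            f.seek(7 * BLOCK_SIZE)
            f.write(block_pattern(3))
        after_fault = verify_patterns(path, num_blocks)
    finally:
        os.remove(path)
    return clean, after_fault


print(demo())
```

A uniform fill would not notice the misplaced block in the demo, because every block would look the same. On a real disk the read-back can be served from the operating system's cache rather than the platters, so a serious run would need to bypass or drop the cache between the write and verify passes.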
Nice shift of goalposts and blame - but I was replying to YOU, that stupid comment, and your somewhat silly ones since trying to justify it. Are you really trying to pretend that you were not dismissive of every hardware solution in that post and all those ones since?
Your mysterious and nebulous "most people" doesn't matter anyway when you were attempting to pretend that it was never useful. You've also entirely misread or pretended to misread my post immediately above - please read it again (or for the first time) before babbling about just buying a better CPU when 1 is not enough no matter how good it is.