Slashdot Mirror


Ask Slashdot: How Do You Test Storage Media?

First time accepted submitter g7a writes "I've been given the task of testing new hardware for use in our servers. For memory, I can run it through things such as memtest for a few days to ascertain if there are any issues with the new memory. However, I've hit a bit of a brick wall when it comes to testing hard disks; there seems to be no definitive method for doing so. Aside from the obvious S.M.A.R.T. tests (i.e. long offline), are there any systems out there for testing hard disks to a similar level to that of memtest? Or any tried and tested methods for testing storage media?"

4 of 297 comments

  1. The usual by macemoneta · · Score: 5, Informative

    All I usually do is:

    1. smartctl -AH
    Get an initial baseline report.

    2. mke2fs -c -c
    Perform a read/write test on the drive.

    3. smartctl -AH
    Get a final report to compare to the initial report.

    If the drive remains healthy, and error counters aren't incrementing between the smartctl reports, it's good to go.
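
One way to automate the before/after comparison is sketched below in Python. It assumes the two reports were saved as text from `smartctl -A` in the usual ATA attribute-table layout (ten columns, raw value last). `parse_smart_attributes` and `changed_counters` are illustrative names, not part of smartmontools, and NVMe or SAS drives print a different format that this does not handle.

```python
from __future__ import annotations

import re


def parse_smart_attributes(report: str) -> dict[str, int]:
    """Map attribute name -> leading integer of RAW_VALUE from `smartctl -A` text.

    Assumes the ATA attribute table layout: ID#, name, flag, value, worst,
    thresh, type, updated, when_failed, raw value.
    """
    attrs = {}
    for line in report.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        match = re.match(r"\d+", fields[9])
        if match:
            attrs[fields[1]] = int(match.group())
    return attrs


def changed_counters(before: str, after: str) -> dict[str, tuple[int, int]]:
    """Return attributes whose raw value differs between the two reports."""
    old = parse_smart_attributes(before)
    new = parse_smart_attributes(after)
    return {
        name: (old[name], new[name])
        for name in old
        if name in new and new[name] != old[name]
    }
```

Some counters, such as Power_On_Hours, will change on any healthy drive; the ones worth a closer look are error-type counters like Reallocated_Sector_Ct or Current_Pending_Sector.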

    --

    Can You Say Linux? I Knew That You Could.

  2. Reliability and fault-tolerance by Mondragon · · Score: 5, Informative

    Not completely related to how to test, but...

    In 2007 Google reported that for a sample of 100k drives, only 60% of their drives with failures had ever encountered any SMART errors. Also, NetApp has reported a significant number of drives with temporary failures, such that they can be placed back into a pool after being taken offline for a period of time and wiped. Google also had a lot of other interesting things to say, such as that heat has no noticeable effect on hard drive life under 45C, that load is unrelated to failure rates, and that if a drive doesn't fail after 3 months, it's very unlikely to fail until the 2-3 year timeframe.

    You can find the Google paper here: http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/archive/disk_failures.pdf

    A few other notes that you can find from storage vendor tech notes if you own their arrays:
      * Enterprise-level SAS drives aren't any more reliable than consumer SATA drives
        - But they do have considerably different firmware that assumes they will be placed in an array, and thus a completely different self-healing scheme than consumer-level drives (generally resulting in higher performance in failure scenarios)
      * RAID 5 is a really bad idea - correlated failures are much more likely than the math would indicate, especially with the rebuild times involved with today's huge drives
      * You have a lot more filesystem options that might not even make sense to use with a RAID system, like ZFS, as well as other mechanisms for distributing your data at a layer higher than the filesystem
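
For reference, "the math" that the RAID 5 point argues against is usually an independent-failure estimate, sketched below in Python. Every number here (annual failure rate, drive size, rebuild throughput) is a made-up assumption for illustration, not a figure from the Google paper or any vendor. The point made above is that real, correlated failures make the true risk higher than this estimate.

```python
import math


def rebuild_hours(drive_tb: float, rebuild_mb_per_s: float) -> float:
    """Hours to rewrite one whole drive at a sustained rate (decimal TB and MB)."""
    return drive_tb * 1e6 / rebuild_mb_per_s / 3600


def second_failure_probability(surviving_drives: int, afr: float, hours: float) -> float:
    """Chance that at least one surviving drive fails during the rebuild window.

    Assumes independent failures at a constant rate derived from the annual
    failure rate (AFR); that independence is exactly the assumption in question.
    """
    rate_per_hour = -math.log(1.0 - afr) / (365 * 24)
    return 1.0 - math.exp(-surviving_drives * rate_per_hour * hours)


if __name__ == "__main__":
    # Hypothetical array: six 4 TB drives, one failed, 50 MB/s rebuild rate.
    hours = rebuild_hours(drive_tb=4, rebuild_mb_per_s=50)
    risk = second_failure_probability(surviving_drives=5, afr=0.03, hours=hours)
    print(f"rebuild ~{hours:.1f} h, naive second-failure risk {risk:.4%}")
```

Larger drives stretch the rebuild window, so even this optimistic model gets worse as capacity grows.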

    Ultimately the reality is that regardless of the testing you put them under, hard drives will fail, and you need to design your production system around this fact. You *should* burn them in with constant read/write cycles for a couple days in order to identify those drives which are essentially DOA, but you shouldn't assume any drive that passes that process won't die tomorrow.
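
A burn-in pass of the kind described above can be sketched as a write-then-verify loop. This Python sketch is one guess at what "constant read/write cycles" might look like, not a replacement for tools like badblocks: it writes reproducible pseudo-random blocks to a path, reads them back, and counts mismatches. `burn_in_pass` is a hypothetical helper name. Pointing it at a raw device destroys the device's contents, and a serious test would also need to bypass the OS page cache, which this sketch does not do.

```python
import os
import random


def _pattern(rng: random.Random, size: int) -> bytes:
    """Next `size` bytes of the seeded pseudo-random test pattern."""
    return rng.getrandbits(8 * size).to_bytes(size, "little")


def burn_in_pass(path: str, total_bytes: int, block_size: int = 1 << 20, seed: int = 0) -> int:
    """Write seeded blocks to `path`, read them back, return the mismatched block count.

    DESTRUCTIVE: overwrites `path`. Reads may be served from the page cache,
    so this only illustrates the pattern-verify idea.
    """
    blocks = total_bytes // block_size

    rng = random.Random(seed)
    with open(path, "wb") as f:
        for _ in range(blocks):
            f.write(_pattern(rng, block_size))
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push data to the device

    rng = random.Random(seed)  # replay the same pattern for comparison
    bad = 0
    with open(path, "rb") as f:
        for _ in range(blocks):
            if f.read(block_size) != _pattern(rng, block_size):
                bad += 1
    return bad
```

Running several passes with different seeds over a couple of days, and checking the SMART counters before and after, would approximate the burn-in described here.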

  3. Re:SpinRite by washu_k · · Score: 5, Informative

    SpinRite may do an OK job of exercising disks, but 90% of what it claims to do is BS.

    An easy test to prove that SpinRite is BS is to run it against a USB key. Not a SATA SSD, but a USB flash drive. Make the USB key bootable with DOS, put SpinRite on it and boot a PC with no other drives. Run its "tests" against the USB key. All the "low level" tests SpinRite claims to do will appear to work, but are impossible on a USB device.

    In fact, they are impossible on a modern mechanical HD as well. As yacc143 pointed out, modern drives are not the same as the MFM/RLL drives of the past. The low level tests that SpinRite claims to do are simply impossible.

    It's also a terrible data recovery program, since it can only write recovered data back to the same disk. That's a data recovery 101 no-no, and SpinRite fails.

  4. Re:Why? by tlhIngan · · Score: 5, Informative

    Also: HARDWARE RAID CARDS.

    I can't stress that enough. Software and semi-software RAID is a joke.

    That holds only until the hardware fails and you need the data that was on there but not on the backup (or you realize the backup failed a long time ago...).

    For performance, yes, hardware is fastest. For reliability though, software RAID is better (hardware RAID can have interesting firmware version issues).

    Linux running an md RAID array? If the server goes down, pop the drives in another server, a couple of mdadm commands later and the array is up and running. Hell, even Windows' software RAID ought to be able to work to recover an array where the server hardware died.

    So if you're using RAID not for performance reasons, but for protection against hard drive failure, soft-RAID works very well. Hell, one of my NAS appliances died, and all I did was take the drives out, attach 4 USB adapters to them, and plug them into my Linux box. Instant access to the data.

    There's nothing like the panic that happens when an array goes down due to non-drive hardware failure.