Slashdot Mirror


Ask Slashdot: How Do You Test Storage Media?

First time accepted submitter g7a writes "I've been given the task of testing new hardware for use in our servers. For memory, I can run it through tools such as memtest for a few days to ascertain whether there are any issues. However, I've hit a bit of a brick wall when it comes to testing hard disks; there seems to be no definitive method for doing so. Aside from the obvious S.M.A.R.T. tests (i.e. long offline), are there any systems out there for testing hard disks to a level similar to that of memtest? Or any tried and tested methods for testing storage media?"

5 of 297 comments

  1. Why? by headhot · · Score: 5, Insightful

    Even if your storage passes the test, it could fail the next day. What you should be doing is designing your storage to gracefully handle failure, like RAID 5 with spares.

    1. Re:Why? by Shagg · · Score: 4, Insightful

      No, the point is to design your system so that if it fails 2 weeks down the line... it isn't a problem.

      --
      Unix is user friendly, it's just selective about who its friends are.
    2. Re:Why? by gregmac · · Score: 5, Insightful

      Even if your storage passes the test, it could fail the next day. What you should be doing is designing your storage to gracefully handle failure, like RAID 5 with spares.

      And then what you should test is that it actually notifies you when something does fail, so you know about it and can fix it. You can also test how long it takes to rebuild the array after replacing a disk, and how much performance degradation there is while that is happening.
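One way to check the notification side is to poll the array state yourself and compare it against what your alerting reports. Below is a minimal sketch, assuming Linux software RAID (md) and its `/proc/mdstat` text layout; the thread does not say which RAID implementation is in use, and the sample text and the `degraded_arrays` helper are made up for illustration.

```python
import re

# Sample text in the layout Linux md uses for /proc/mdstat. This is an
# assumed setup: array names, devices and sizes are invented for the example.
SAMPLE_MDSTAT = """\
Personalities : [raid1] [raid6] [raid5] [raid4]
md0 : active raid5 sdc1[2](F) sdb1[1] sda1[0]
      1953260544 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [UU_]

md1 : active raid1 sde1[1] sdd1[0]
      976630336 blocks super 1.2 [2/2] [UU]

unused devices: <none>
"""


def degraded_arrays(mdstat_text):
    """Return names of arrays whose member map (e.g. [UU_]) shows a missing disk."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        # Array header lines look like "md0 : active raid5 ..."
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        # The status line ends with a map of members: U = up, _ = missing.
        status = re.search(r"\[([U_]+)\]\s*$", line)
        if current and status and "_" in status.group(1):
            degraded.append(current)
    return degraded


if __name__ == "__main__":
    print(degraded_arrays(SAMPLE_MDSTAT))
```

In a real test you would read `/proc/mdstat` instead of the sample string, pull a disk on purpose, and confirm that an alert actually reaches someone within the time you expect.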

      --
      Speak before you think
    3. Re:Why? by Joce640k · · Score: 4, Insightful

      Point is: You can't 'test'.

      You can only tell if it's working, not when it's about to fail.

      If people could predict when hard drives were going to fail we wouldn't need RAID or backups.

      --
      No sig today...
  2. Are you testing an array or individual drives? by HockeyPuck · · Score: 4, Insightful

    I manage a team that oversees petabytes of disk, both in enterprise arrays and internal to servers. For testing the arrays, since there are gigabytes of cache in front of the disks, I can only rely on the vendor to do the appropriate post-installation testing to make sure there are no DOA disks. For internal disks, as others have mentioned, you could run IOMeter for days without a problem and then the very next day the disk is dead. Unlike memory, disks have moving parts that can fail much more easily than chips. However, with proper precautions like RAID, a single disk failure can be tolerated.
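For anyone wondering why a parity RAID level survives one dead disk, here is a toy sketch of the XOR parity idea behind RAID 5. The block contents and the `xor_blocks` helper are invented for illustration; a real array works on fixed-size stripes and rotates parity across the disks.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks in one stripe (made-up contents) plus their parity block.
data = [b"disk0 blk", b"disk1 blk", b"disk2 blk"]
parity = xor_blocks(data)

# Simulate losing disk 1: rebuild its block from the survivors and the parity.
rebuilt = xor_blocks([data[0], data[2], parity])

if __name__ == "__main__":
    print(rebuilt == data[1])
```

The same arithmetic shows the limit: with two blocks missing from a stripe, a single parity block is not enough to recover either one.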

    The bigger problem is a double disk failure. The risk comes from the time required to rebuild the failed disk. Back when disks were 100GB this was a "relatively" quick process. However, in some of my arrays with 3TB drives, the rebuild can take much longer. It has even reached the point where hot spares have been considered not worth it, since my array vendor will have a new disk in the array within 4 hours. With what an enterprise disk costs from the array vendor (not Fry's), it can start to add up.
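A rough back-of-the-envelope sketch of why capacity matters here. The 50 MB/s sustained rebuild rate is purely an assumption for illustration (real rates depend on the array, RAID level and production load), and `rebuild_hours` is a hypothetical helper, not anything from a vendor tool.

```python
def rebuild_hours(capacity_gb, rebuild_mb_per_s):
    """Hours to rewrite a whole disk at a steady rebuild rate (decimal units)."""
    seconds = (capacity_gb * 1000) / rebuild_mb_per_s
    return seconds / 3600


# Hypothetical sustained rebuild rate; substitute a figure measured on your array.
ASSUMED_RATE_MB_S = 50

if __name__ == "__main__":
    for capacity_gb in (100, 3000):
        hours = rebuild_hours(capacity_gb, ASSUMED_RATE_MB_S)
        print(f"{capacity_gb} GB: {hours:.1f} h")
```

At a fixed rate the rebuild window grows linearly with capacity, so a 3TB drive spends 30 times as long exposed to a second failure as a 100GB drive does.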