Slashdot Mirror


Data Center Study Reveals Top 5 SMART Stats That Correlate To Drive Failures

Lucas123 writes Backblaze, which has taken to publishing data on hard drive failure rates in its data center, has just released data from a new study of nearly 40,000 spindles revealing what it said are the top 5 SMART (Self-Monitoring, Analysis and Reporting Technology) values that correlate most closely with impending drive failures. The study also revealed that many SMART values that one would intuitively consider related to drive failures actually don't relate to them at all. Gleb Budman, CEO of Backblaze, said the problem is that the industry has created vendor-specific values, so that a stat related to one drive and manufacturer may not relate to another. "SMART 1 might seem correlated to drive failure rates, but actually it's more of an indication that different drive vendors are using it themselves for different things," Budman said. "Seagate wants to track something, but only they know what that is. Western Digital uses SMART for something else; neither will tell you what it is."

3 of 142 comments

  1. Skip the blogspam, here's the real link by Anonymous Coward · Score: 5, Informative

    https://www.backblaze.com/blog/hard-drive-smart-stats/

    Goes into a lot more detail too.

  2. The measurements in question: by Immerman · Score: 4, Informative

    for those who are only passingly curious and don't want to read the article.
            SMART 5 - Reallocated_Sector_Count.
            SMART 187 - Reported_Uncorrectable_Errors.
            SMART 188 - Command_Timeout.
            SMART 197 - Current_Pending_Sector_Count.
            SMART 198 - Offline_Uncorrectable.
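For anyone who wants to watch just these five attributes, here is a minimal Python sketch. It assumes the text comes from `smartctl -A` (smartmontools) in its usual ten-column attribute table, with the raw value in the last column. The sample text is made up for illustration, and the helper names `parse_watched_raw_values` and `nonzero` are hypothetical. The names smartctl prints for these attributes can differ from the ones listed above, so the sketch matches on the ID number only.

```python
# Attribute IDs from the list above; matching is by ID, since the
# names a given tool or vendor prints may differ.
WATCHED = {
    5: "Reallocated_Sector_Count",
    187: "Reported_Uncorrectable_Errors",
    188: "Command_Timeout",
    197: "Current_Pending_Sector_Count",
    198: "Offline_Uncorrectable",
}

# Made-up sample in the assumed `smartctl -A` table layout.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       8
  9 Power_On_Hours          0x0032   090   090   000    Old_age   Always       -       9123
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       2
"""


def parse_watched_raw_values(smartctl_output: str) -> dict:
    """Return {attribute_id: raw_value} for the watched IDs found in the text."""
    values = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attr_id = int(fields[0])
        if attr_id not in WATCHED:
            continue
        try:
            values[attr_id] = int(fields[9])
        except ValueError:
            # Some drives report raw values that are not plain integers; skip those.
            continue
    return values


def nonzero(values: dict) -> dict:
    """Keep only the watched attributes whose raw value is above zero."""
    return {attr_id: raw for attr_id, raw in values.items() if raw > 0}


if __name__ == "__main__":
    for attr_id, raw in nonzero(parse_watched_raw_values(SAMPLE)).items():
        print(f"SMART {attr_id} ({WATCHED[attr_id]}): raw value {raw}")
```

Attributes a drive does not report (188 and 198 in the sample) are simply absent from the result, so a missing key means "not reported", not "zero".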

    --
    --- Most topics have many sides worth arguing, allow me to take one opposite you.
    1. Re:The measurements in question: by omnichad · Score: 4, Informative

      And I can confirm. Reallocated Sector Count rarely goes above zero when the drive is fine. It's possible to have a few sectors go bad and get reallocated, but it's usually part of a bigger problem when it happens (this number is reset to zero at the factory, after all initially bad sectors have been remapped). If the Current Pending Sector Count is non-zero, it's likely over.

      I always clone a drive immediately with ddrescue when it gets to this point, while the drive is still working.
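The rule of thumb in this comment can be sketched as a small Python function. This is only one reading of it: the labels "ok", "watch" and "clone-now" and the function name `triage` are hypothetical, and it assumes "this point" means a non-zero Current Pending Sector Count. The inputs are the raw values of SMART 5 and SMART 197.

```python
def triage(reallocated: int, pending: int) -> str:
    """Map two raw SMART counts to a rough action label.

    reallocated: raw value of SMART 5 (Reallocated Sector Count)
    pending:     raw value of SMART 197 (Current Pending Sector Count)
    """
    if reallocated < 0 or pending < 0:
        raise ValueError("sector counts cannot be negative")
    if pending > 0:
        # Non-zero pending sectors: image the drive while it still reads.
        return "clone-now"
    if reallocated > 0:
        # A few remapped sectors can happen, but are often part of a bigger problem.
        return "watch"
    return "ok"


if __name__ == "__main__":
    for counts in [(0, 0), (3, 0), (0, 1), (12, 4)]:
        print(counts, "->", triage(*counts))
```

Pending sectors are checked first so that a drive with both counters above zero still gets the more urgent label.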