Data Center Study Reveals Top 5 SMART Stats That Correlate To Drive Failures
Lucas123 writes Backblaze, which has taken to publishing data on hard drive failure rates in its data center, has just released data from a new study of nearly 40,000 spindles revealing what it said are the top 5 SMART (Self-Monitoring, Analysis and Reporting Technology) values that correlate most closely with impending drive failures. The study also revealed that many SMART values that one would intuitively consider related to drive failures actually don't relate to them at all. Gleb Budman, CEO of Backblaze, said the problem is that the industry has created vendor-specific values, so that a stat related to one drive and manufacturer may not relate to another. "SMART 1 might seem correlated to drive failure rates, but actually it's more of an indication that different drive vendors are using it themselves for different things," Budman said. "Seagate wants to track something, but only they know what that is. Western Digital uses SMART for something else; neither will tell you what it is."
I have had drives fail. I took them offline and wrote 0s and 1s to them with dd until Reallocated_Sector_Ct stopped rising and Current_Pending_Sector went to zero, then ran e2fsck -c -c on them 2 or 3 times, and then I put them back online!!!
Most people would say this is crazy, but in my opinion the surface of a drive often has bad spots while the rest is perfectly OK. Some of those drives are still online without reporting any new errors after more than 5 years, some after almost 10 years. Those are server drives with very low Start_Stop_Count, Power_Cycle_Count and Power-Off_Retract_Count, all lower than 250 after 10 years. Those drives are spinning all the time.
Newer drives will remap bad sectors to spare space they keep in reserve for that purpose. As long as you don't run out of spare space, IMHO, it is worth a try.
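For anyone who wants to check that stopping condition without eyeballing the numbers, here is a rough Python sketch. It assumes you have saved the attribute table printed by `smartctl -A` before and after an overwrite pass, and that the raw value sits in the tenth column, as it does in the usual smartmontools layout. The function names are made up for illustration, and the "settled" rule is only my reading of the procedure above, not anything from the Backblaze study.

```python
from typing import Dict


def parse_smart_attributes(text: str) -> Dict[str, int]:
    """Map attribute name -> raw value from `smartctl -A`-style table text."""
    attrs: Dict[str, int] = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # the header row and any other text are skipped.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        raw = fields[9]
        # Only keep plain integer raw values; vendor-specific formats are ignored.
        if raw.isdigit():
            attrs[fields[1]] = int(raw)
    return attrs


def surface_looks_settled(before: Dict[str, int], after: Dict[str, int]) -> bool:
    """True when the reallocated count did not rise across a pass
    and no sectors are still pending (assumed stopping rule)."""
    realloc_before = before.get("Reallocated_Sector_Ct")
    realloc_after = after.get("Reallocated_Sector_Ct")
    pending = after.get("Current_Pending_Sector")
    if None in (realloc_before, realloc_after, pending):
        return False  # attribute missing: do not guess
    return realloc_after <= realloc_before and pending == 0
```

Feed it the saved text from two consecutive passes; if `surface_looks_settled` returns False, the drive is still finding new bad spots or still has pending sectors. It says nothing about whether the drive is safe to trust, only whether those two counters have stopped moving.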
Everything I write is lies, read between the lines.