Slashdot Mirror


Google Finds DRAM Errors More Common Than Believed

An anonymous reader writes "A Google study of DRAM errors in their data centers found that they are hundreds to thousands of times more common than has been previously believed. Hard errors may be the most common failure type. The DIMMs themselves appear to be of good quality, and bad mobo design may be the biggest problem." Here is the study (PDF), which Google engineers published with a researcher from the University of Toronto.

3 of 333 comments

  1. Re:ZFS by jfengel · Score: 4, Insightful

    Changing your file system solves RAM errors how?

  2. Re:Percentage? by Tumbleweed · Score: 4, Insightful

    Add to that the fact that Google (apparently) tends to run their data centers "hot" compared to what is commonly accepted, and use significantly cheaper components, and you've got a good explanation for why their error count is as high as it is.

    Yeah, but let's look at the more common situation - a home. Variable temperatures, most likely QUITE variable power quality, low-quality PSU, and almost certainly no UPS to make up for it. Add that to low-quality commodity components (mobo & RAM).

    I'd not be surprised to find the problem much more prevalent in non-datacenter environments.

    Switching to high-quality memory, PSU & UPS has made my systems unbelievably reliable the last several years. YMMV, but I doubt by much.

  3. clearly not a radiation engineer by SuperBanana · Score: 5, Insightful

    That window looked out to a pile of coal, so the culprit was assumed to be low level alpha radiation.

    Alpha radiation is stopped by a sheet of office paper. It certainly wouldn't make it through the window, through the machine case, electromagnetic shield, circuit board, chip case, and into the silicon. Even beta radiation would be unlikely to make it that far.

    What is much more likely: thermal effects, i.e., infrared from the sun heating up machines near the window.