Slashdot Mirror


Many DDR3 Modules Vulnerable To Bit Rot By a Simple Program

New submitter Pelam writes: Researchers from Carnegie Mellon and Intel report that a large percentage of tested regular DDR3 modules flip bits in adjacent rows (PDF) when a voltage in a certain control line is forced to fluctuate. The program that triggers this is dead simple: just two memory reads with a special relative offset and some cache control instructions in a tight loop. The researchers don't delve deeply into applications of this, but hint at possible security exploits. For example, a rather theoretical attack on the JVM sandbox using random bit flips (PDF) has been demonstrated before.

23 of 138 comments

  1. Many DDR3 modules? by ArcadeMan · · Score: 3, Insightful

    This is all very interesting but totally pointless! Which modules? Tell us the brands, model names, manufacturer numbers?

    1. Re:Many DDR3 modules? by DigiShaman · · Score: 4, Insightful

      FTFP. "We induce errors in most DRAM modules (110 out of 129) from three major DRAM manufacturers."

      Short version: leakage current from adjacent gates can nudge others to bit-flip. I don't think this is a manufacturing problem so much as a fundamental EE design oversight. So yeah, defective by design (unintentionally)!!

      --
      Life is not for the lazy.
    2. Re:Many DDR3 modules? by Rei · · Score: 5, Informative

      If you're wanting to narrow it down, you won't like this line from the paper:

      In particular, all modules manufactured in the past two years (2012 and 2013) were vulnerable,

      It's pretty clever, and something I always wondered about being possible. They're exploiting the fact that DRAM rows need to be read every so often to refresh them because they leak charge, and eventually would fall below the noise threshold and be unreadable. Their exploit works by running code that, by heavily and cyclically reading rows, makes adjacent rows leak faster than expected, leading to them falling below the noise threshold before they get refreshed.
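
      The mechanism described above can be sketched as a toy model. This is an illustration only: every constant below is made up, the linear leak is an assumption, and none of it comes from the paper or a datasheet. It only shows how extra leakage from neighbor activations could push a cell under the read threshold before its next refresh.

```python
# Toy model only: every constant here is made up for illustration and is not
# taken from the paper or from any datasheet.
FULL_CHARGE = 1.0
READ_THRESHOLD = 0.5              # below this the cell reads back as the wrong value
NORMAL_LEAK_PER_WINDOW = 0.3      # charge lost between refreshes with quiet neighbors
EXTRA_LEAK_PER_ACTIVATION = 5e-7  # hypothetical disturbance from one neighbor activation


def charge_at_next_refresh(neighbor_activations: int) -> float:
    """Charge left in a victim cell just before its next refresh."""
    disturbance = EXTRA_LEAK_PER_ACTIVATION * neighbor_activations
    return max(0.0, FULL_CHARGE - NORMAL_LEAK_PER_WINDOW - disturbance)


def bit_survives(neighbor_activations: int) -> bool:
    """True if the cell still holds enough charge to be read correctly."""
    return charge_at_next_refresh(neighbor_activations) >= READ_THRESHOLD


# Compare a quiet neighbor with a lightly and a heavily hammered one.
for activations in (0, 100_000, 1_000_000):
    print(activations, round(charge_at_next_refresh(activations), 3), bit_survives(activations))
```

      In this sketch the cell survives a quiet or lightly used neighbor but not a heavily hammered one, which is the qualitative behavior the comment describes.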

      --
      I am a proud traitor to my species in alliance with my mother the Earth in opposition to those who would destroy her.
    3. Re:Many DDR3 modules? by DigiShaman · · Score: 4, Interesting

      True, and commodity chips not to exact spec will introduce disturbance errors. But apparently this has been a known problem with DRAM, with various methods of mitigation during the binning process. It's just that density and tolerances have become so tight that the issue is now exacerbated. I wouldn't be surprised at all if those 19 modules also had a few that failed if tested again and again.

      Honestly, general computing devices, from low-end PCs to phones, are long overdue to employ ECC by default. So you lose some capacity and take a tiny performance hit. BFD if that means your data doesn't become corrupted. The only people that would care are the PC gaming benchmark queens.

      --
      Life is not for the lazy.
    4. Re:Many DDR3 modules? by DigiShaman · · Score: 2

      In my personal experience, "benchmark queens" in general, be it in automotive performance or computing, are all about the synthetic numbers with zero basis in practicality (let alone value for cost). If a gamer doesn't give a toss about a particular core subset of general computing (video, CPU, RAM, and storage), they're not a benchmark queen. I've met plenty online who are. And when queens start debating online over numbers, the flamewars begin.

      --
      Life is not for the lazy.
    5. Re:Many DDR3 modules? by ChrisMaple · · Score: 2

      So, other than fixing the DRAM design, the solution is to refresh more frequently. A software fix might be a high-priority background program that forces a full refresh at regular intervals (probably a big performance hit). If the CPU does its own DRAM control, there might be a register that affects refresh rate, or perhaps a microcode fix.

      The problem is analog in nature, which suggests that optimized and very clean supply voltages, and very clean and precisely timed control signals might reduce or eliminate the problem.

      In any case, this means that manufacturers need to fix their designs and test them more thoroughly.
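
      To put rough numbers on the "refresh more frequently" idea, here is a back-of-the-envelope sketch. The 64 ms refresh window and 50 ns row cycle time are assumed round figures typical of DDR3, not values quoted in this thread, and `max_activations` is a hypothetical helper name.

```python
# Assumed figures: a 64 ms refresh window and a 50 ns row cycle time are round
# numbers typical of DDR3, not values quoted in this thread.
REFRESH_WINDOW_NS = 64_000_000
ROW_CYCLE_NS = 50


def max_activations(refresh_window_ns: int, row_cycle_ns: int = ROW_CYCLE_NS) -> int:
    """Upper bound on row activations that fit between two refreshes of a row."""
    return refresh_window_ns // row_cycle_ns


# Shorten the refresh window step by step and see how the attacker's budget shrinks.
for divisor in (1, 2, 4, 8):
    window = REFRESH_WINDOW_NS // divisor
    print(f"refresh every {window / 1e6:g} ms: at most {max_activations(window):,} activations")
```

      Under these assumptions, each halving of the refresh window halves the number of activations an attacker can squeeze in before the victim row is recharged, at the cost of the memory spending more of its time refreshing.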

      --
      Contribute to civilization: ari.aynrand.org/donate
    6. Re:Many DDR3 modules? by MightyYar · · Score: 3, Funny

      Climate change... [ducks].

      --
      W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
    7. Re:Many DDR3 modules? by tlhIngan · · Score: 2

      Data sheets nowadays are not available to the public

      Datasheets ARE publicly available. However, they're for the actual DRAM ICs themselves, and not of the modules.

      There are only a few DRAM manufacturers out there - Samsung, Hynix, Elpida, Micron are among them.

      Samsung Computing DRAM (they also have Graphics DRAM and others). Some of their newest chips don't have datasheets yet, but that'll be forthcoming. The older ones in production do, however.

      Hynix

      Micron (and Elpida).

      These are all generally available. Since the only real difference between them is a few timing numbers, they're not generally a huge secret - it's all governed by JEDEC standards anyhow.

      Memory modules are just collections of these chips so they can be generalized to what you buy in the store for your PC.

    8. Re:Many DDR3 modules? by greg1104 · · Score: 2

      I'm also bothered by people who put the word audiophiles in scare quotes for no good reason. P.S. Not all audiophiles are opposed to blind testing; some people like expensive audio toys that are objectively better too.

    9. Re:Many DDR3 modules? by MightyYar · · Score: 2

      ALL of that audiophile stuff sounds good (pun intended).

      --
      W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
  2. Re:good news for ECC memory makers by Rei · · Score: 3, Informative

    According to the paper, ECC only reduces but does not eliminate the problem (section 6.3). Multiple bits can be corrupted at once.
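
    Why multiple flipped bits defeat ECC can be shown with a small sketch. The code below is a textbook Hamming(7,4) code plus an overall parity bit (SECDED), shrunk to 4 data bits for readability. It is an assumption that this stands in for what an ECC DIMM does; real modules typically protect 64 data bits with 8 check bits, and the function names here are hypothetical.

```python
def encode(nibble: int) -> list:
    """Encode 4 data bits as an 8-bit SECDED codeword (index 0 = overall parity)."""
    code = [0] * 8  # code[1..7] is Hamming(7,4); parity bits sit at 1, 2 and 4
    code[3], code[5], code[6], code[7] = [(nibble >> i) & 1 for i in range(4)]
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    code[0] = sum(code[1:]) % 2  # makes the total number of 1 bits even
    return code


def decode(received: list):
    """Return (status, nibble); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(received)
    syndrome = 0
    for position in range(1, 8):
        if code[position]:
            syndrome ^= position  # XOR of set positions is 0 for a valid codeword
    parity_ok = sum(code) % 2 == 0
    if syndrome == 0 and parity_ok:
        status = "ok"
    elif not parity_ok:
        # Odd number of flips: the decoder assumes exactly one and repairs it.
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with even parity: at least two bits flipped.
        return "uncorrectable", None
    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return status, nibble


def flip(code: list, *positions: int) -> list:
    """Return a copy of the codeword with the given bit positions inverted."""
    damaged = list(code)
    for position in positions:
        damaged[position] ^= 1
    return damaged


word = encode(0b1011)
print(decode(flip(word, 5)))        # one flip
print(decode(flip(word, 5, 6)))     # two flips
print(decode(flip(word, 1, 2, 4)))  # three flips
```

    In this sketch one flip is repaired and two flips are at least detected, but three flips look like a single-bit error, so the decoder reports a "correction" and hands back the wrong data.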

    --
    I am a proud traitor to my species in alliance with my mother the Earth in opposition to those who would destroy her.
  3. Malicious code can cause computers to crash by rossdee · · Score: 2

    Of course, if you can get the target computer to run certain code, you can completely wipe all the RAM, but where's the fun in that, huh..

    1. Re:Malicious code can cause computers to crash by MightyYar · · Score: 2

      This gives you a way to affect RAM outside of a sandbox.

      --
      W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
  4. Re:good news for ECC memory makers by DigiShaman · · Score: 2

    Ouch! Seriously bad. Worse than the Pentium FPU bug (and that's bad). What good is a computer if you can't rely on the data being committed back to disk because of corruption mid-flight in RAM?! At least with the FPU bug, it was only the FPU. But here we're talking about an industry-wide issue where no operation can guarantee that data isn't corrupted on its way back to disk. By the time bit-rot sets in, you may have to dive into your grandfather-father-son backup archive. And that's assuming such a backup scheme is being used by those who are affected. Shit, that's assuming people are even backing up their data in the first place!!

    --
    Life is not for the lazy.
  5. Does the cache control commands require root acces by TheSunborn · · Score: 2

    Do the cache control commands require root access on Windows or Linux?

  6. Re:good news for ECC memory makers by sshir · · Score: 4, Insightful

    At least with ECC you'll get _some_ feedback (it's random so it will pop from time to time) indicating that something fishy is going on. With regular ram all corruptions are silent so you'll get random crashes that will drive you crazy...

  7. Re:good news for ECC memory makers by ericloewe · · Score: 2

    Difference being that the system is immediately halted if an uncorrectable error is discovered.

  8. Re:Does the cache control commands require root ac by PhrostyMcByte · · Score: 5, Informative

    No. These are standard instructions that many apps require to function correctly when using multiple threads. Even if you aren't using them directly, at least some of the APIs you use most certainly are.

  9. Not theoretical. It's hogwash. by Anonymous Coward · · Score: 5, Funny

    This is ridiculous. Realistically, when have you ever run into a situation where stib teg ylirartibra deppilf?

  10. Re:good news for ECC memory makers by wolrahnaes · · Score: 2

    ECC does not mitigate it, but it will detect the problem where non-ECC memory will happily keep on operating with the corrupted data.

    For the standard car analogy, consider tire pressure monitoring systems. They won't stop you from getting a flat, but they'll let you know you have a slow leak where you might otherwise keep driving until it's bad enough that you notice otherwise. By that time the damage is done and you probably need a new tire.

    --
    I used to get high on life, but I developed a tolerance. Now I need something stronger.
  11. Known issue by Anonymous Coward · · Score: 5, Informative

    This has been known for some time. It's been referred to as "Row Hammer" and has been discussed at length by Intel and DRAM manufacturers.

    https://www.google.com/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#safe=off&q=intel%20row%20hammer

    I've seen it cause multi-bit errors in ECC systems.

  12. Re:good news for ECC memory makers by Dragonslicer · · Score: 2

    If this was actually happening in the real world, computers would probably be crashing every few minutes.

    You mean attackers have been exploiting this ever since Windows 95?

  13. Wow. Superbad. by drolli · · Score: 2

    That's an evil bug. This could even be triggered accidentally by bad programming.

    But more important, this allows you to break your VM's memory boundaries without any restriction. If you happen to make an educated guess about the memory layout of the physical machine and the host and guest kernel images loaded, you can try to

    a) manipulate the host kernel directly (that would be nearly undetectable)

    b) manipulate private keys in other VMs or the host

    c) manipulate other VMs' memory

    d) communicate between VMs

    And all of this is independent of any software bug. The only thing that can be done about it would be to disable the feature on the simulated guest processor that allows manipulating the cache arbitrarily (and implicitly limit running guest programs to 1 core!). Alternatively, increase the refresh rate (I remember that the refresh rate could actually be set manually in the 90s).

    That being said, I just wonder if it is possible to trigger this bug from a high-level language (e.g. MATLAB) or the JVM, where the operation causing the problem could be used implicitly for some vectorized code or other operations. E.g., can this bug be triggered by using the volatile keyword in Java and accessing the memory in the same way?