Many DDR3 Modules Vulnerable To Bit Rot By a Simple Program
New submitter Pelam writes: Researchers from Carnegie Mellon and Intel report that a large percentage of tested regular DDR3 modules flip bits in adjacent rows (PDF) when a voltage in a certain control line is forced to fluctuate. The program that triggers this is dead simple — just two memory reads with special relative offset and some cache control instructions in a tight loop. The researchers don't delve deeply into applications of this, but hint at possible security exploits. For example a rather theoretical attack on JVM sandbox using random bit flips (PDF) has been demonstrated before.
I don't know if there are hundreds or thousands or hundreds of thousands of low level 'bugs' like this related to simple subsystems abused in specific ways.. but there are plenty.
If it's been demonstrated how is it still theoretical?
Either way, I wouldn't say that cache control instructions in a tight loop is dead simple.
Still pretty bad though.
This is all very interesting but totally pointless! Which modules? Tell us the brands, model names, manufacturer numbers?
ALLR? (Address Line Layout Randomization)?
as for me, i'll wait for some real world examples of this possible exploit before i switch to ECC memory, which would mean a new motherboard on top of the more expensive memory.
We have proof!
Your reply already seems to suffer,
Of course, if you can get the target computer to run certain code, you can completely wipe all the RAM, but where's the fun in that, huh..
Do the cache control instructions require root access on Windows or Linux?
"just two memory reads with special relative offset and some cache control instructions in a tight loop" Yuh hurt yer what?
"Win treats sysadmins better than users. Mac treats users better than sysadmins. Linux treats everyone like sysadmins."
The authors did a good job of covering the issue
Also, the paper is a good primer on dram stuff in general.
Unfortunately, this Christmas present violates the Engineer's first rule.
Try to stay out of the news, because when you are in the news, it's usually not a good thing.
The failure mechanism:
There is a bug in most DDR3 chips, especially those built after 2010.
If you do too many read cycles in too short a time to the same row, some bits in an adjacent row may automagically change.
Kind of a cumulative, adjacent cell disturb mechanism.
Existing programs may do this accidentally, but it is unlikely because the cache usually lowers the number of read cycles to a safe number.
This can easily be done with a strange program using cache flushes, which an ordinary x86 user process can do if it wishes.
Mitigations on existing memory controllers:
ECC likely does not help because more bits are likely to be disturbed than most ECC can handle.
Keep strange programs off your system.
Changing the refresh interval from 64 ms to 8 ms seems to eliminate the issue, with perhaps a 35% performance hit.
The OS might be able to remap the memory so that only every other physical row is used, with a 50% decrease of memory capacity.
At least it's a 100% increase in reliable memory.
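To put rough numbers on the refresh mitigation above, here is a back-of-the-envelope sketch of how many times one row could be activated between two refreshes of its neighbours. The 50 ns row-cycle time is an assumed round figure for illustration, not a value from the paper or from any datasheet, and the function names are made up.

```python
# Assumption: one activate/precharge cycle of a row takes about 50 ns.
ASSUMED_ROW_CYCLE_NS = 50


def max_activations(refresh_interval_ms: int, row_cycle_ns: int) -> int:
    """Upper bound on activations of one row within a single refresh interval."""
    return (refresh_interval_ms * 1_000_000) // row_cycle_ns


def refresh_work_factor(old_interval_ms: int, new_interval_ms: int) -> float:
    """How many times more often every row must be refreshed."""
    return old_interval_ms / new_interval_ms


if __name__ == "__main__":
    for interval_ms in (64, 32, 16, 8):
        budget = max_activations(interval_ms, ASSUMED_ROW_CYCLE_NS)
        factor = refresh_work_factor(64, interval_ms)
        print(f"{interval_ms:2d} ms: at most {budget} activations, "
              f"{factor:.0f}x the refresh work")
```

Under that assumption the hammering budget drops from about 1.28 million activations to about 160 thousand, while the refresh work goes up eightfold, which is where the performance hit comes from.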
Mitigations on new equipment:
DRAMs that meet their specifications would be nice, but this seems more likely to lead to a change in the specs.
An increased refresh rate on rows near a lot of activity.
The authors propose a probability-based plan.
Seems like one based on hard accounting might be smarter if you have to change the controller anyway.
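For a feel of why a probabilistic plan can work at all, here is a minimal sketch. It assumes the controller refreshes the neighbours of a row with some small probability p on every activation of that row. The value p = 0.001 and the activation count are made-up illustration numbers, not figures from the paper, and the function names are hypothetical.

```python
import math


def prob_no_neighbor_refresh(p: float, activations: int) -> float:
    """Chance that every one of `activations` activations skips the neighbour refresh."""
    return (1.0 - p) ** activations


def log10_prob_no_neighbor_refresh(p: float, activations: int) -> float:
    """Same quantity as a base-10 logarithm, to avoid underflow for large counts."""
    return activations * math.log10(1.0 - p)


if __name__ == "__main__":
    p = 0.001               # assumed refresh probability per activation
    activations = 100_000   # assumed length of a hammering run
    exponent = log10_prob_no_neighbor_refresh(p, activations)
    print(f"P(no neighbour refresh in {activations} activations) ~ 10^{exponent:.1f}")
```

The catch is that this only makes failure astronomically unlikely rather than impossible, which is the trade-off against a scheme based on hard accounting.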
Consequences:
This mechanism produces random results.
It seems there are likely more fruitful ways to break into a system.
The ease of implementation and wide applicability still make it an (ah-hem) interesting bug to say the least.
No. These are standard instructions that many apps require to function correctly when using multiple threads. Even if you aren't using them directly, at least some of the APIs you use most certainly are.
Way back when RAM was stupid expensive, one way to reduce cost was to use so-called composite RAM. On high-end Macs back in the early-mid 1990s, that could cause the machine to not boot but instead play the first four notes of the Twilight Zone theme song.
No. These instructions would be pretty pointless if they were restricted. They are designed to control low-level behaviour of processor caches for code optimization purposes. This is mostly relevant for expensive computations. These happen much more often in application code than in kernel code.
This is ridiculous. Realistically, when have you ever run into a situation where stib teg ylirartibra deppilf?
I'm never upgrading from my vaxstation 4000. none of this new fangled tech for me. no sirree.
Good summary.
My question concerns MEMTEST, one of my boot options in GRUB, which tests all kinds of memory patterns, from systematic to pretty random.
Will passing a few total test cycles (often 12-24 hours for large RAM amounts) indicate less chance of this kind of adjacent bit corruption?
Unless you are making a Speak-and-Spell, it's foolish to use non-ECC RAM. I would rather pay an additional ninth as much and have some peace of mind that the RAM will at least keep from flipping a bit from cosmic rays, which happens about once a week.
I take that back; put it in the Speak-and-Spell, too.
This is a sort of already-known 'weakness'. Recent memtest86 versions include a 'hammer test' for the purpose of testing this case, see http://www.passmark.com/forum/showthread.php?4836-MemTest86-v6-0-Beta
No. These are standard instructions that many apps require to function correctly when using multiple threads.
Can you explain when you'd need to flush the cache when using multiple threads? You'd have to flush the cache back to RAM (isn't that a privileged instruction?), invalidate it, then read the data back from RAM. That's surely insanely slow compared to just using the CPU's internal cache coherency mechanisms?
At least with parity or inadequate ecc, the computer would likely stop before causing too much harm with bad results.
It's actually not for multithreaded code but rather for DMA coherency. DMA devices can access main memory so you may need to flush data from the CPU cache to main memory to ensure the PCI devices get the current data.
This paper was published at ISCA in June and on Soylent News earlier today (or possibly yesterday). Why is it suddenly being circulated six months after publication? Someone trying to promote ECC memory?
I am TheRaven on Soylent News
This has been known for some time. It's been referred to as "Row Hammer" and has been discussed at length by Intel and DRAM manufacturers.
https://www.google.com/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#safe=off&q=intel%20row%20hammer
I've seen it cause multi-bit errors in ECC systems
...or better DRAM modules. Does this problem occur with DDR2 modules?
clflush is really quite pointless. wfence() is what you want. arm has gotten rid of cache line flushing.
Nope. It's an amazing DoS attack. Just wrap the code in some offset finding and address marching code, compile it in /tmp, execute it in the background, rm it, and watch it cause chaos. This will be a big problem on any shared platform.
Liquid nitrogen for your RAM then...?
if you want people to think you know what you are talking about, just put ".com" at the end of everything you say.com
This is the reason I recommend that everyone invest in write-only memory for their computers. It is far more secure and hack proof than the alternatives.
The rush to DDR3 was a cynical cash-grab designed to artificially inflate the price of DRAM modules (which happened beyond the wildest dreams of commodity memory speculators). Now we find the 'standard' is inherently faulty by design. And yet the equally worthless DDR4 is on our doorstep, offering almost nothing BUT another round of price inflation.
Current CPUs are NOT RAM bound: their large internal caches have mitigated restricted external bandwidth for all common computational tasks for years now. ONLY the onboard GPU gains any advantage from boosting RAM speeds, and even then recent moves by AMD and Nvidia to use real-time compression to reduce bandwidth requirements imply existing memory bandwidth would be more than good enough if something like GDDR5 was used in place of the DDRn families of chips.
What we do NEED, however, is absolute reliability in memory chips. Now that a modern computer handles memory allocation through a VASTLY complicated system of dynamic virtual mapping, with many cores potentially seeking access to the 'same' memory locations, only rock-solid memory design can prevent terrible, deep, dangerous 'bugs' caused by unpredictable memory faults.
Since DDR3 makes MASSIVE profits hand over fist, we might at least expect reliability from memory that has never been more expensive across the recent past. Sadly, the old adage of "the more you pay, the less you get" is kicking in. It is years since PC performance fanatics have fretted over exotic brands of RAM, so no-one has paid proper attention to the issue of correct RAM stick behaviour for quite some time.
Can't be a co-incidence that while working with a brand new Kabini (AMD) laptop recently, problems led to running the standard memory tests, which immediately showed the manufacturer-provided RAM was 'faulty'. I'd bet anything that the entire run of this particular laptop from HP has the same problem. A machine designed and built without ANYONE at HP even bothering to run the standard Microsoft Windows memory test. Recent AMD CPU/APU parts have proven VERY poor on the memory reliability side (AMD was the first to build the memory controller INTO the CPU chip, and AMD systems subsequently became FAR fussier about the memory sticks they'd work correctly with).
Why can't ECC be done in software? At least for userland applications (maybe not necessarily for kernel-space memory)?
There's prior art on softecc:
http://pdos.csail.mit.edu/papers/softecc:ddopson-meng/softecc_ddopson-meng.pdf
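To make the question concrete, here is roughly what the core of a software ECC could look like. This is not the scheme from the linked paper, just a textbook extended Hamming code (single error correct, double error detect) applied to 4 data bits at a time; the function names and the 8-bit word layout are my own choices for illustration.

```python
def encode(nibble: int) -> int:
    """Pack 4 data bits into an 8-bit SECDED codeword."""
    bits = [0] * 8  # index 1..7 are Hamming positions, index 0 is overall parity
    bits[3], bits[5], bits[6], bits[7] = [(nibble >> i) & 1 for i in range(4)]
    bits[1] = bits[3] ^ bits[5] ^ bits[7]   # parity over positions 1, 3, 5, 7
    bits[2] = bits[3] ^ bits[6] ^ bits[7]   # parity over positions 2, 3, 6, 7
    bits[4] = bits[5] ^ bits[6] ^ bits[7]   # parity over positions 4, 5, 6, 7
    bits[0] = sum(bits[1:]) & 1             # makes the whole word even parity
    return sum(b << i for i, b in enumerate(bits))


def decode(word: int):
    """Return (nibble, status); nibble is None when the error is uncorrectable."""
    bits = [(word >> i) & 1 for i in range(8)]
    syndrome = 0
    for pos in range(1, 8):
        if bits[pos]:
            syndrome ^= pos                 # XOR of the positions of all set bits
    odd_parity = sum(bits) & 1
    if syndrome == 0 and not odd_parity:
        status = "ok"
    elif odd_parity:
        bits[syndrome] ^= 1                 # single flip; syndrome 0 is the parity bit
        status = "corrected"
    else:
        return None, "uncorrectable"        # even parity, nonzero syndrome: two flips
    nibble = bits[3] | bits[5] << 1 | bits[6] << 2 | bits[7] << 3
    return nibble, status
```

One flipped bit per word gets repaired and two get detected, but three or more flips in the same word can be miscorrected silently, which is the worry with a disturbance that hits many bits at once.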
Yes, yes...let the butthurt flow through you. Waste your modpoints knocking me down.
Story I heard about mid-20th-century IBM mainframe. (I think it was the 360 series).
Core memory was tight and had cooling issues. The designers examined the instruction set and determined that, given caching and the like, no infinite loop could hammer a particular location more than one cycle in four (25% duty cycle), for which cooling was adequate. So they shipped.
Turns out, though, you could do a VERY LONG FINITE loop that hit a location every other cycle, for 50% duty cycle (not to mention the possibility of hitting a nearby location with some of the remaining cycles). Wasn't too long before a student managed to do this.
And set the core memory on fire.
Bantam Dominique roosters crow a four-note song. Once you've heard it as "Happy BIRTHday" you can't NOT hear it that way
Read disturb was already known for flash memory. Read disturb is when a flash cell flips a bit because other cells adjacent to the disturbed cell are repeatedly read.
That's an evil bug. This could even be triggered accidentally by bad programming.
But more important, this allows you to break your VM's memory boundaries without any restriction. If you happen to make an educated guess about the memory layout of the physical machine and the host and guest kernel images loaded, you can try to
a) manipulate the host kernel directly (that would be nearly undetectable)
b) manipulate private keys in other VMs or the host
c) manipulate other VMs' memory
d) communicate between VMs
And all of this is independent of any software bug. The only thing which can be done about it would be to disable the feature on the simulated guest processor which allows the cache to be manipulated arbitrarily (and implicitly limit running guest programs to 1 core!). Alternatively, increase the refresh rate (I remember that the refresh rate could actually be set manually in the 90s).
That being said, I just wonder if it is possible to trigger this bug from a high-level language (e.g. MATLAB) or the JVM, where the operation causing the problem could be used implicitly for some vectorized code or other operations, e.g. can this bug be triggered by the volatile keyword in Java and accessing the memory in the same way?
You shouldn't be required to skip the cache for most CPU bound thread sync. Cores on one package can snoop each other's cache to resolve semaphores without a RAM trip, and I believe that multi-chip SMP also has a communication channel for that.
For now, I'm sure that Xen is going nuts making sure there are guard pages to keep any smashing inside your own VM. I wonder if any Cloud providers match ECC errors to particular VMs, and would like to have a nice chat with anybody trying to trigger this on their hardware.
FYI: You can snoop L2 cache, but not L1. Intel went with inclusive cache so snooping wouldn't be needed. AMD went with exclusive, which gives better cache usage, but when trying to sync threads, all of that cache snooping is a high-latency operation. With an inclusive cache, you no longer need to snoop, just look at the cache normally.
AMD has higher overall throughput for many GPU type work loads, but Intel shines with work loads that require thread syncing.
http://www.google.com/patents/WO2014004748A1?cl=en
Row hammer refresh command
Claim 1:
Watch for too many cycles to a row
and when it happens send a refresh to the adjacent row.
Given an understanding of the failure mechanism, a junior engineer should be able to think of this in about 100 ms.
The paper in this article was very careful not to present this solution, instead presenting a probabilistic one.
This presents an interesting and useful experiment.
How many folks reading the paper thought of the claim 1 solution, which Intel claims is novel?
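For what it's worth, the idea as summarized above (count activations per row, refresh the neighbours when a count gets too high) fits in a few lines. This is a toy sketch, not the patent's actual design or anything a real memory controller does; the class name, the threshold and the row count are all invented for illustration.

```python
from collections import defaultdict


class RowHammerGuard:
    """Toy model of a controller that counts activations per row."""

    def __init__(self, num_rows: int, threshold: int):
        self.num_rows = num_rows
        self.threshold = threshold          # assumed safe activation budget
        self.counts = defaultdict(int)      # activations since the last reset
        self.refreshed = []                 # log of neighbour refreshes issued

    def activate(self, row: int) -> None:
        """Record one activation; refresh both neighbours once the budget is used up."""
        self.counts[row] += 1
        if self.counts[row] >= self.threshold:
            for neighbor in (row - 1, row + 1):
                if 0 <= neighbor < self.num_rows:
                    self.refreshed.append(neighbor)
            self.counts[row] = 0

    def periodic_refresh(self) -> None:
        """A regular refresh restores every row, so all counts start over."""
        self.counts.clear()


if __name__ == "__main__":
    guard = RowHammerGuard(num_rows=8, threshold=1000)
    for _ in range(2500):
        guard.activate(4)
    print(guard.refreshed)
```

The obvious cost compared with the probabilistic approach is the storage for one counter per row, which this sketch hides behind a dictionary.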