Exploiting the DRAM Rowhammer Bug To Gain Kernel Privileges

Posted by Soulskill on Monday March 9, 2015 @04:21PM from the flipping-tables-over-flipped-bits dept.

New submitter netelder sends this excerpt from the Project Zero blog: 'Rowhammer' is a problem with some recent DRAM devices in which repeatedly accessing a row of memory can cause bit flips in adjacent rows. We tested a selection of laptops and found that a subset of them exhibited the problem. We built two working privilege escalation exploits that use this effect. One exploit uses rowhammer-induced bit flips to gain kernel privileges on x86-64 Linux when run as an unprivileged userland process. When run on a machine vulnerable to the rowhammer problem, the process was able to induce bit flips in page table entries (PTEs). It was able to use this to gain write access to its own page table, and hence gain read-write access (PDF) to all of physical memory.
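The summary's key step is that a bit flip inside a page-table entry (PTE) changes which physical page the entry maps. Below is a minimal sketch of that arithmetic. It assumes a simplified x86-64 4 KiB PTE layout that keeps only the present, writable and user bits plus the physical-address field, and the helper names are hypothetical. One flipped bit in the address field moves a still-writable, user-accessible mapping to a different physical page. If that page happens to hold a page table, the process can now edit its own page tables.

```python
# Simplified x86-64 page-table entry for a 4 KiB page.
# Assumption: MAXPHYADDR = 52; NX (bit 63) and the other flag bits are ignored here.
PRESENT = 1 << 0   # P: the mapping is valid
WRITABLE = 1 << 1  # R/W: writes are allowed
USER = 1 << 2      # U/S: user mode may access the page
PFN_MASK = 0x000F_FFFF_FFFF_F000  # bits 12..51: physical address of the 4 KiB frame


def make_pte(phys: int, flags: int) -> int:
    """Build a PTE mapping a 4 KiB-aligned physical address with the given flags (hypothetical helper)."""
    if phys & 0xFFF:
        raise ValueError("physical address must be 4 KiB aligned")
    return (phys & PFN_MASK) | flags


def phys_addr(pte: int) -> int:
    """Physical frame address the PTE points at."""
    return pte & PFN_MASK


def flip_bit(value: int, bit: int) -> int:
    """Model a single disturbance error: invert one bit of the 64-bit entry."""
    return value ^ (1 << bit)


pte = make_pte(0x1234_5000, PRESENT | WRITABLE | USER)
flipped = flip_bit(pte, 20)  # a flip inside the physical-address field
print(hex(phys_addr(pte)), "->", hex(phys_addr(flipped)))  # 0x12345000 -> 0x12245000
```

The flag bits are untouched, so the corrupted mapping stays writable from user space; only its target has moved (here by 1 MiB). The Project Zero write-up describes filling physical memory with page tables so that such a redirected entry is likely to land on one.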

180 comments

Impressive by Anonymous Coward · 2015-03-09 16:28 · Score: 5, Insightful

Don't have much more to say than that's an impressive exploit.
1. Re:Impressive by ArmoredDragon · 2015-03-09 16:53 · Score: 1
  
  I'll second that, and add that I suspect this could also be used for hypervisor/sandbox escapes on practically *any* platform that doesn't use ECC memory.
2. Re:Impressive by garyisabusyguy · 2015-03-09 16:55 · Score: 0, Offtopic
  
  I cannot believe this; an AC on Slashdot told me that privilege escalation is impossible in Linux, so this must be wrong.
  
  --
  Wherever You Go, There You Are
3. Re:Impressive by twistedcubic · 2015-03-09 17:53 · Score: 3, Insightful
  
  Double bonus if this result gets manufacturers of laptops to FINALLY include ECC memory.
4. Re:Impressive by Anonymous Coward · 2015-03-09 18:14 · Score: 2, Informative
  
  It's a hardware problem, not a software problem.
5. Re:Impressive by amalcolm · 2015-03-09 19:12 · Score: 1, Offtopic
  
  I'm sure all operating systems (kernels) will be similarly vulnerable to a hardware bug like this, so I think your pious comment is out of place.
  
  --
  Time for bed, said Zebedee - boing
6. Re:Impressive by Anonymous Coward · 2015-03-09 21:25 · Score: 1
  
  ECC won't stop this exploit, only mitigate it if scrub rates are high enough, and worst case it'll drop from a privilege escalation vector to a DoS attack.
7. Re: Impressive by Anonymous Coward · 2015-03-09 21:29 · Score: 0
  
  No, TempleOS won't be affected by this bug. The beauty of TempleOS is that it lets you trust in GOD and run everything in ring 0.
8. Re:Impressive by Shinobi · 2015-03-09 21:48 · Score: 5, Informative
  
  And, if you had read the actual paper, you'd see that ECC isn't proof against it either
9. Re:Impressive by Anonymous Coward · 2015-03-10 00:02 · Score: 0
  
  What body part do you want to bet that they will raise RAM prices sky high in such a case?
10. Re:Impressive by Anonymous Coward · 2015-03-10 00:33 · Score: 0
  
  Your initial comment worded the issue as if it was explicitly a Linux-based vulnerability. It's not.
11. Re:Impressive by Anonymous Coward · 2015-03-10 00:47 · Score: 1
  
  You get your ass handed to you twice in a row, yet you keep coming back for more. I like that in a man.
12. Re:Impressive by MachineShedFred · 2015-03-10 01:25 · Score: 2
  
  so when they said:
  
  We also tested some desktop machines, but did not see any bit flips on those. That could be because they were all relatively high-end machines with ECC memory. The ECC could be hiding bit flips.
  they actually said that ECC doesn't matter?
  I guess we read differently.
  
  --
  Slashdot still doesn’t support Unicode after it was added to the HTML standard in 1997.
13. Re:Impressive by MachineShedFred · 2015-03-10 01:32 · Score: 3, Informative
  
  I see - you were looking at the PDF link, which is more theoretical. Yes, if you can get two or more bits to shift inside a 64-bit chunk, then ECC doesn't help. There's got to be a low probability of that actually happening, though; Google's Project Zero team wasn't able to make it happen with ECC at all.
  
14. Re:Impressive by Anonymous Coward · 2015-03-10 01:43 · Score: 0
  
  It isn't sacred, but it's a hell of a lot better than the common alternatives.
  $ printf "\x6d\x6f\x6f\n" | cowsay
15. Re:Impressive by bluefoxlucid · 2015-03-10 02:31 · Score: 1
  
  Two bits would trigger double-error detection.
  
  --
  Support my political activism on Patreon.
16. Re:Impressive by bluefoxlucid · 2015-03-10 02:32 · Score: 1
  
  The desire to believe in the infallibility of your chosen tools leaves you open to attack. What is that word again?
  American.
  
17. Re:Impressive by Anonymous Coward · 2015-03-10 03:42 · Score: 0
  
  What body part do you want to bet that they will raise RAM prices sky high in such a case?
  Prices are determined by supply and demand, not by vendors who just happen to want to make more money than other vendors.
18. Re:Impressive by Anonymous Coward · 2015-03-10 04:11 · Score: 0
  
  Don't have much more to say than that's an impressive exploit.
  OK, OK. I won't. Sheesh.
19. Re:Impressive by Anonymous Coward · 2015-03-10 04:38 · Score: 0
  
  ECC RAM has 9 bits per byte. What you do with that extra bit is up to the controller. If you use it for simple per-byte parity, you might be able to create an undetectable corruption. If you use the extra bits from a bunch of bytes together (which you can because RAM is read multiple bytes at a time), then you can detect more complex corruption than just single (or odd number) bit errors.
  Personally I hope that this exploit will make ECC RAM the norm rather than an overpriced exception only available on server hardware. RAM errors are fiendishly difficult to debug without ECC and often cause undetected data corruption for months or years before bad RAM is suspected.
20. Re:Impressive by gnupun · 2015-03-10 04:47 · Score: 1
  
  I would be happy with giving a single bonus if some EE engineers or physicists can explain how this exploit works. Right now we only know what it does -- flip bits in some memory locations by writing to other memory locations.
21. Re:Impressive by Anonymous Coward · 2015-03-10 04:48 · Score: 0
  
  For that to happen, Intel would have to pull its head out of its ass and enable ECC RAM support in the memory controllers of their non-server CPUs. I'm looking forward to seeing reviewers down-rate CPUs for lack of ECC RAM support.
22. Re:Impressive by Shinobi · 2015-03-10 05:01 · Score: 1
  
  Nowhere did I say that ECC didn't matter. I just said that ECC isn't guaranteed to protect you. It just reduces probability.
23. Re:Impressive by Shinobi · 2015-03-10 05:05 · Score: 1
  
  That depends on how you do it, though. Through some smart row walking you could probably increase the frequency a lot. And then you have ASLR complicating things from both ends (though I think it could actually be used to help in an attack).
24. Re:Impressive by Anonymous Coward · 2015-03-10 05:13 · Score: 1
  
  It flips bits by *reading* other bits. This happens because RAM is volatile. The charge in the small capacitors which form the RAM cells leaks. It needs to be refreshed (read and rewritten) periodically in order to maintain the data. Reading RAM discharges the small capacitors. The RAM controller immediately rewrites the data it just read. Due to the small structures and dense packing of RAM cells these days, this also discharges neighboring cells a little bit, but those aren't rewritten by the RAM controller. Normally that isn't a problem, because due to caching, the same RAM cells normally aren't read again and again, and after a short time, the small loss of charge from neighboring cells being read is "repaired" by the next refresh cycle. This attack however clears the cache between reads, so the same physical RAM cells are read over and over, and that depletes neighboring cells far enough to corrupt the data before it is rewritten by the refresh cycle.
  It's a hardware problem, but software could do something to mitigate the impact until the hardware is fixed (which basically means buying all new computers, computers that don't even exist yet.) The operating system could make sure to allocate critical data structures in physical RAM that is separated by unused physical RAM from user space allocations (and user space allocations from each other). The operating system could also prevent code from running if it uses the instruction for clearing the cache line. Neither of these are easy to do though, so don't expect a quick fix.
25. Re:Impressive by MachineShedFred · 2015-03-10 05:35 · Score: 1
  
  Yes, it would be detected, but it could not be corrected. It would likely crash the software with a memory fault, in which case you would have a denial of service attack rather than defeating protected memory and allowing something access from outside the box.
  Not good, but better than complete privilege escalation.
  
26. Re:Impressive by gnupun · 2015-03-10 06:11 · Score: 1
  
  Due to the small structures and dense packing of RAM cells these days, this also discharges neighboring cells a little bit, but those aren't rewritten by the RAM controller... This attack however clears the cache between reads, so the same physical RAM cells are read over and over, and that depletes neighboring cells far enough to corrupt the data before it is rewritten by the refresh cycle.
  Are you sure that's the case? If what you say is true, the adjacent rows would drain charge and the 'one' bits in the 'victim' row would become 'zero' bits. But what's happening is, 'zero' bits are changing to 'one.' How can that happen if reads deplete capacitors of neighboring rows? It could instead be related to magnetic field interference between the rows... just my guess.
27. Re:Impressive by Anonymous Coward · 2015-03-10 06:19 · Score: 0
  
  Whether a charged capacitor is interpreted as a one or a zero depends on the implementation. Take a look at the pictures illustrating the fading of DRAM memory in relation to cold boot attacks. When the data slowly fades (slower than normal due to the lower temperature in cold boot attacks), it doesn't uniformly fade to black but to a pattern, which is the result of this implementation dependence.
28. Re:Impressive by cheesybagel · 2015-03-10 06:31 · Score: 1
  
  ECC != parity check. It can detect two errors and correct one.
29. Re:Impressive by Anonymous Coward · 2015-03-10 06:36 · Score: 0
  
  I think the leak can either drain or fill. So if you are flooding all 1s, the charge can leak to the other capacitors as well.
30. Re:Impressive by haruchai · 2015-03-10 08:58 · Score: 1
  
  I've wondered for a long time why unregistered ECC hasn't become the default on desktops while reserving buffered DIMMs for servers. Mass production would likely erase any price advantage of non-ECC memory.
  
  --
  Pain is merely failure leaving the body
31. Re: Impressive by Anonymous Coward · 2015-03-10 09:55 · Score: 1
  
  Looks like he got multiple rows to flip with one single troll hammer.
  Seems people are pretty exploitable.
32. Re:Impressive by gweihir · 2015-03-10 10:19 · Score: 1
  
  ECC does not fix this at all; you just have to flip 3 bits in a 32 bit word. What fixes it is to refresh often enough to make sure bits do not get so weak that they can flip. On laptops, people may want to save power on refreshes and hence space them as wide as possible, making this attack possible in the first place. (Also keep in mind that during hibernation, refresh may be the main power consumer.) On desktops and servers, this attack may not work at all, or only with bad-quality DRAM. Of course we need actual numbers from the field, but my guess is that most servers and desktops are not actually vulnerable to this, ECC or no.
  
  --
  Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
33. Re:Impressive by gweihir · 2015-03-10 10:23 · Score: 1
  
  Yes, they would need to flip 3 bits in 32 bit (not 64, ECC in x86 works on 32 bit words) and they will happily get corrected to the wrong value. Takes longer, but is still feasible if the original attack is. The other thing is that desktops and servers will not try to refresh as slow as possible, unlike (some) laptops. You need to discharge the cells to a certain, chip-dependent level before this attack becomes possible. If you refresh often, that level should/may not be reached. I think it is far too early to panic on this. Even on laptops, there may be a software fix that involves refreshing more often.
  
34. Re:Impressive by AaronW · 2015-03-10 14:15 · Score: 1
  
  It makes it completely ineffective when there is ECC. If three bits haven't flipped then the error will be reported and logged and probably blasted out on every console. The likelihood of this attack being detected well before it is exploited is extremely high. If it's two bits then either the application will be killed or the machine will halt, most likely the latter.
  
  --
  This post is encrypted twice with ROT-13. Documenting or attempting to crack this encryption is illegal.
35. Re:Impressive by AaronW · 2015-03-10 14:17 · Score: 1
  
  Every implementation I have seen in recent times has always performed error correction as well as detection. You only need 7 bits to correct one bit error out of 64 bits. With 8 bits you can detect two bit errors and correct one. ECC DIMMs are typically 72 bits wide. LPDDR4 also includes support to detect and prevent this sort of attack.
  
36. Re:Impressive by AaronW · 2015-03-10 14:20 · Score: 1
  
  And try flipping three bits before you try and access the memory. If it's not exactly three bits then the memory corruption will be detected, and if it's one bit it will be corrected. In any case, if it is not three bits when the memory is accessed then it will be reported. Also, high end hardware can do memory scrubbing, where the memory controller periodically reads all of the memory in the background to detect and correct single bit errors before they are a problem. See Memory Scrubbing.
  
37. Re:Impressive by AaronW · 2015-03-10 14:42 · Score: 1
  
  You must have read a different paper than me.
  ECC on a decent machine would make this attack almost impossible.
  For one thing, the memory controller will likely support memory scrubbing, which will detect and correct single memory errors long before enough bits are flipped for ECC to no longer detect the corruption. ECC will typically correct one bit error and detect two bit errors. Since the chance of a bit flipping is random and takes thousands of operations, the chance of having three bits flip and not be detected is exceedingly low.
  Also, the CPU does not operate on single words in DRAM. It is always a burst operation on a cache line (or even more). When you write to a word in memory, if it is not in the cache then the entire cache line will be fetched into the cache before it can be modified and written back as an entire cache line. It may be possible to bypass this by writing an entire cache line into the write buffer, but I don't know enough about the low-level Intel architecture to know if it does this or not.
  
38. Re:Impressive by Lost+Race · 2015-03-10 18:20 · Score: 1
  
  (not 64, ECC in x86 works on 32 bit words)
  There's no 36,32 hamming code that can give full SECDED. Intel and AMD use 72,64. Probably everyone else too, since x72 memory is so common.
39. Re:Impressive by Anonymous Coward · 2015-03-11 04:49 · Score: 0
  
  No. Not really. It's just potentially showing that ECC is doing its job, "correcting" "errors". So the takeaway is just that ECC is working as it should and may or may not have bit flipping going on, but this is getting pretty arcane, and I'm pretty impressed as well that they got a workable exploit.
40. Re:Impressive by Agripa · 2015-03-11 11:24 · Score: 1
  
  if you can get two or more bits to shift inside a 64-bit chunk, then ECC doesn't help.
  Reporting on the event or halting the machine upon detection of a double bit error *is* better than missing it completely and some triple bit errors would be detected as well. If chipkill ECC was being used which is commonly available, then up to 4 bits within a nibble boundary would be detected and corrected.
41. Re:Impressive by Agripa · 2015-03-11 11:25 · Score: 1
  
  It would generate a report listing the error and may or may not halt the machine depending on how that is set.
42. Re:Impressive by Agripa · 2015-03-11 11:28 · Score: 1
  
  There's no 36,32 hamming code that can give full SECDED. Intel and AMD use 72,64. Probably everyone else too, since x72 memory is so common.
  For a while now 4 bit chipkill has been supported as well which I think takes 144,128.
43. Re:Impressive by Agripa · 2015-03-11 11:32 · Score: 1
  
  I've wondered for a long time why unregistered ECC hasn't become the default on desktops while reserving buffered DIMMs for servers. Mass production would likely erase any price advantage of non-ECC memory.
  Adding the extra 8 bits per 64 bit line would always raise the price 12.5% although that could be insignificant. There are also some boot time issues because memory has to be initialized which takes time and adds complexity.
  I am inclined to blame Intel as far as why ECC is not more common since they use this feature for market segmentation.
44. Re:Impressive by Agripa · 2015-03-11 11:38 · Score: 1
  
  ECC errors do not have to halt the machine. They can just be reported.
45. Re:Impressive by Anonymous Coward · 2015-03-13 14:35 · Score: 0
  
  Yeah, that's why if you price hardware from Lenovo, HP, and Dell, you end up with an $800 price difference for essentially the same parts. Because the Lenovos are in such high demand, and not because HP wants to make $800 extra profit. Idiot.
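The replies above describe a mechanism. Each activation of a row disturbs its neighbors a little, and a refresh restores them, so the attacker has to defeat the cache to reopen the row often enough between refreshes. They also argue that refreshing more often helps. Both points fit into a deliberately crude toy model, sketched below. The class name and every number in it are hypothetical illustration values, not measurements from the paper or the Project Zero post. Only the 64 ms figure is the usual DDR3 refresh window; the other parameters are invented so that the threshold is easy to see.

```python
from dataclasses import dataclass


@dataclass
class ToyVictimCell:
    """Toy model of one DRAM cell next to a hammered row.

    All values passed in are hypothetical illustration numbers, not measured data.
    """
    refresh_interval_ms: float     # the victim row is fully restored at each refresh
    activations_per_ms: float      # aggressor-row opens the attacker achieves (needs cache flushes)
    disturb_per_activation: float  # fraction of the cell's noise margin lost per aggressor open

    def activations_per_window(self) -> float:
        # Charge lost before a refresh is restored by it, so only the
        # activations that fit inside one refresh window accumulate.
        return self.refresh_interval_ms * self.activations_per_ms

    def flips(self) -> bool:
        # The cell reads back wrong once the accumulated disturbance uses up its whole margin.
        return self.activations_per_window() * self.disturb_per_activation >= 1.0


# Same hypothetical chip and attacker; only the refresh interval differs.
normal = ToyVictimCell(refresh_interval_ms=64.0, activations_per_ms=5_000.0, disturb_per_activation=5e-6)
doubled = ToyVictimCell(refresh_interval_ms=32.0, activations_per_ms=5_000.0, disturb_per_activation=5e-6)
print(normal.flips(), doubled.flips())  # True False
```

Halving the interval halves the number of activations an attacker can fit into one window, which is why a faster refresh rate works as a mitigation. In this toy it succeeds only because the invented margin happens to fall between the two values, so the model says nothing about how much faster real DRAM would need to refresh.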
possible iOS Exploit? by muphin · 2015-03-09 16:33 · Score: 2, Interesting

Is this possible to exploit on an iPhone?

--
It's not a typo if you understood the meaning!
1. Re:possible iOS Exploit? by Anonymous Coward · 2015-03-09 18:26 · Score: 0
  
  We don't know. Answering that would require highly specific technical knowledge.
2. Re:possible iOS Exploit? by jandrese · 2015-03-10 02:08 · Score: 1
  
  There is a definite maybe. One caveat of the process is that it requires access to special instructions to flush the cache constantly (hundreds of thousands of times per second), and a processor fast enough to pound the memory controller. Those could be handicaps to running this exploit on a smartphone platform. It looks like modern phones use DDR3-derived memory (older phones like the iPhone 4 use DDR2-style memory, and this won't work on them), so it's not impossible.
  
  --
  
  I read the internet for the articles.
3. Re:possible iOS Exploit? by pimproot · 2015-03-11 07:43 · Score: 1
  
  Probably, and if you have physical access to the iPhone you can increase the rate of memory errors with mild heat. This paper from 2003 is pretty interesting and I'm not sure why it hasn't led to a new class of jailbreaking / rooting exploits yet. (That I'm aware of, at least.)
  http://sip.cs.princeton.edu/pr... :
  "Our attack works by sending to the JVM a Java program that is designed so that almost any memory error in its address space will allow it to take control of the JVM. All conventional Java and .NET virtual machines are vulnerable to this attack. The technique of the attack is broadly applicable against other language-based security schemes such as proof-carrying code.
  "We measured the attack on two commercial Java Virtual Machines: Sun’s and IBM’s. We show that a single-bit error in the Java program’s data space can be exploited to execute arbitrary code with a probability of about 70%, and multiple-bit errors with a lower probability.
  "Our attack is particularly relevant against smart cards or tamper-resistant computers, where the user has physical access (to the outside of the computer) and can use various means to induce faults; we have successfully used heat. Fortunately, there are some straightforward defenses against this attack."
NSA memory controller hack by mveloso · 2015-03-09 16:34 · Score: 1

Geez, who knew that writing 'NSA' to 0xdeadbeef over and over would give you kernel access? Those NSA guys really broke into everything.
ECC Memory by Frobnicator · 2015-03-09 16:35 · Score: 5, Interesting

Yet another reason to push shared providers for ECC memory. The error correcting memory is so far not vulnerable to this attack, all the researchers that have tried it report that ECC memory identifies and corrects the corruptions. Of course some attackers may have found a way, but ECC minimizes the risk
Amazon says it uses ECC in their AWS machines, but other big hosts like Equinix say that ECC memory is "available". Be careful about your hosting, folks.

--
//TODO: Think of witty sig statement
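Whether ECC "identifies and corrects the corruptions" depends on how many bits flip in one protected word. ECC DIMMs protect each 64-bit word with 8 check bits. The sketch below instead uses the small textbook extended Hamming (8,4) SECDED code, purely so that every bit is visible. It illustrates the SECDED principle, not the exact code any particular memory controller uses, and the function names are hypothetical. One flip is corrected, two are detected but not corrected, and three can be "corrected" into the wrong data.

```python
from typing import List, Tuple


def encode(data: List[int]) -> List[int]:
    """Extended Hamming (8,4) encoder: 4 data bits -> 8-bit SECDED word."""
    code = [0] * 8
    code[3], code[5], code[6], code[7] = data  # data bits sit at non-power-of-two positions
    for p in (1, 2, 4):
        # Parity bit p covers every position whose index has bit p set.
        code[p] = sum(code[i] for i in range(1, 8) if i & p and i != p) % 2
    code[0] = sum(code[1:]) % 2  # overall parity bit enables double-error detection
    return code


def decode(code: List[int]) -> Tuple[str, List[int]]:
    """Return (status, data bits) after SECDED checking."""
    code = list(code)
    syndrome = 0
    for i in range(1, 8):
        if code[i]:
            syndrome ^= i  # XOR of set positions is 0 for a valid Hamming word
    overall = sum(code) % 2
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd number of flips: assume exactly one and fix it (syndrome 0 means bit 0).
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Even number of flips with a nonzero syndrome: detected, not correctable.
        status = "uncorrectable"
    return status, [code[3], code[5], code[6], code[7]]


def flip(code: List[int], positions) -> List[int]:
    """Simulate disturbance errors at the given bit positions."""
    out = list(code)
    for pos in positions:
        out[pos] ^= 1
    return out


word = encode([1, 0, 1, 1])
print(decode(flip(word, [5])))        # ('corrected', [1, 0, 1, 1])
print(decode(flip(word, [3, 6])))     # ('uncorrectable', [0, 0, 0, 1])
print(decode(flip(word, [3, 5, 6])))  # ('corrected', [0, 1, 0, 1])  wrong data
```

The replies below distinguish double errors, which are detected and so at worst cause a crash or DoS, from triple errors, which can slip through as silent corruption. Real controllers may use codes that catch some triple errors, and memory scrubbing lowers the chance that several flips accumulate in one word before it is checked.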
1. Re: ECC Memory by Bruce+Perens · 2015-03-09 16:45 · Score: 2
  
  Yes, you beat me to it. A correctly-configured ECC motherboard with real ECC memory would defeat this. Watch out for fake ECC memory that just simulates the correction bits.
  Once memory starts being vulnerable to row interference, having a machine without ECC becomes much more dangerous, regardless of this exploit.
  
  --
  Bruce Perens.
2. Re: ECC Memory by Pinhedd · 2015-03-09 16:50 · Score: 1
  
  I can't see how it would be possible to defeat ECC.
  The attacker would have to construct a write that affects the desired bits in the row-to-be-hammered and has check bits that affect the row-to-be-hammered's check bits such that the altered row is validated. This is probably nigh impossible to do in all but a select few constrained cases.
3. Re: ECC Memory by Bruce+Perens · 2015-03-09 16:50 · Score: 4, Insightful
  
  It has yet to be established whether hammer techniques can result in a correct data+ECC pattern. If so, it should be possible to permute the memory in a way that defeats this, either on the memory module or the memory controller.
  That would make a good research paper for someone.
  
4. Re: ECC Memory by ArmoredDragon · 2015-03-09 16:59 · Score: 1
  
  other big hosts like Equinix say that ECC memory is "available"
  I suspect they'd change that policy in a hurry if people started using this for hypervisor escapes.
5. Re: ECC Memory by Anonymous Coward · 2015-03-09 17:07 · Score: 1
  
  It has yet to be established whether hammer techniques can result in a correct data+ECC pattern. If so, it should be possible to permute the memory in a way that defeats this, either on the memory module or the memory controller.
  That would make a good research paper for someone.
  Not sure how as the ECC bits are not stored on the same chip as the data bits. It's why there are nine chips per side of a DIMM.
  http://www.realworldtech.com/parity-and-ecc-explored/
  I'd say that ECC RAM is immune to this attack.
6. Re: ECC Memory by Anonymous Coward · 2015-03-09 17:37 · Score: 0
  
  It's why there are nine chips per side of a DIMM.
  Only sometimes. Look hard enough and you'll find 3, 5, 7 and 9 chip ECC DIMMs depending on what batches of chips a given manufacturer has available to them. (PQI once made a DDR3 DIMM with 19 chips.)
7. Re: ECC Memory by thogard · 2015-03-09 17:38 · Score: 2
  
  ECC might be able to help the attack. If you know the state of memory and the associated ECC values you would like and can calculate a designed bit pattern with the same ECC that meets the requirements, you may be able to get the ECC hardware to flip the bit for you as you hammer bits that don't matter as much.
  Hammering memory to induce writes where they shouldn't happen has been done for decades. It was used back in the days when you needed high voltages to do writes in EEPROMs, when people found out that you could use a 5V write power supply and sometimes get bits to change if you tried enough times. Related techniques have been used with bubble memory and iron core as well.
8. Re: ECC Memory by djdanlib · 2015-03-09 17:49 · Score: 1
  
  Do you have an example of this fake ECC memory that interested parties should avoid?
9. Re: ECC Memory by tlhIngan · 2015-03-09 17:55 · Score: 2
  
  I can't see how it would be possible to defeat ECC.
  The attacker would have to construct a write that affects the desired bits in the row-to-be-hammered and has check bits that affect the row-to-be-hammered's check bits such that the altered row is validated. This is probably nigh impossible to do in all but a select few constrained cases.
  It's possible, but very unlikely.
  Rowhammer is not new - it's been known since the 90s since it affects NAND flash memory as well (the same stuff in an SSD) - here there are two problems. It's called write disturb and read disturb. Because in NAND flash, all the storage transistors are wired in series - each transistor is a page of flash, and all the pages are wired in series, so reading one page requires the transistors in other pages to be activated to be able to read the desired page.
  So NAND flash manufacturers design their transistors specially so you minimize write disturbs (where writing to a page flips bits in other pages) as long as you write the pages in sequence. But there are also read disturbs, where the act of reading a page can flip bits on other pages because you're still activating those series transistors.
  That's why NAND flash has the spare area after each page - it's for ECC data which is required to catch such errors.
10. Re: ECC Memory by viperidaenz · 2015-03-09 18:05 · Score: 4, Informative
  
  When you write to a row of memory, its ECC bits are written too by the memory controller, and they are stored next to the other ECC bits.
11. Re: ECC Memory by Macman408 · 2015-03-09 20:05 · Score: 5, Interesting
  
  I hadn't heard of this either, but a quick google turned up a description of false parity RAM: http://en.wikipedia.org/wiki/R...
  TLazy;DR: To save cost where parity RAM was required by the hardware but not by the operator, modules existed that would calculate the parity bit upon reading the RAM, rather than storing the parity bit. I don't see any evidence that this type of module ever existed for ECC though.
  To make sure memory is ECC, it's probably sufficient to count the memory chips on a DIMM. If there are 9 or 18 (or even 36, if it's a particularly large DIMM) identically-marked chips, that's ECC. If there are 4, 8, 16, or 32 chips, then it's probably not. If one of the chips is marked differently than the others, it might be a little more complicated; it might be possible that it's a different memory chip (e.g. if there are 4 x16 memory chips, you'd only need one x8 to get a x72 ECC DIMM, so that last chip would be different). But it's also possible that it's buffered/registered memory, and the different chip is the buffer/register.
  And an aside on the topic of buying RAM for yourself:
  In general, I'm not a fan of cheaping out on memory. I did computer repair for a while, and it shocked me how many problems were caused by bad RAM - from the obvious ("my computer crashes every time I boot it") to less obvious ("every few days, an application crashes") to the rather insidious ("it was running fine, and now I can't mount my hard drive any more"). It got to the point where, when a computer came in with nonspecific symptoms like that, I'd open up the computer and peek at the RAM chips first. If they had no recognizable manufacturer, they were certainly garbage. If they were recognizable but not top-tier, they probably needed some stress testing on our RAM tester. And if they were the good stuff (Samsung always had my vote there, though it's hard to find because they don't sell directly to consumers), then it was probably something else.
  That's also where I learned that things like memtest86 or other software diagnostic tools were basically useless too. Only the absolute worst memory would fail a test, even a looped test run for days. Most bad RAM was marginal - after all, it probably passed some manufacturing tests. We had a rather expensive (~$4k-8k) box that would test memory, doing things like varying the supply voltage or self-heating the RAM. When RAM is installed in your PC, you're still limited by the hardware - i.e. the voltage regulator and the memory controller - which probably keep the memory as close to nominal conditions as possible. Obviously, those machines are rather hard to come by, so you have to make do with software tests instead - but a pass on those just means I can't prove it's bad; it doesn't mean the memory is good. Even if I pass all memory testing, I'll still swap/remove/replace DIMMs in an attempt to find which one is bad, because it's often not obvious.
12. Re: ECC Memory by Anonymous Coward · 2015-03-09 20:54 · Score: 0
  
  [...] to the rather insidious ("it was running fine, and now I can't mount my hard drive any more").
  Huh. I once had some RAM with a stuck bit on a system used for burning data CDs and DVDs. 50% of the disks produced had one single 0-bit where they should have a 1-bit buried somewhere in the data. Now that's insideous!
13. Re: ECC Memory by Anne+Thwacks · 2015-03-09 21:25 · Score: 1
  
  If there are 9 or 18 (or even 36, if it's a particularly large DIMM) identically-marked chips, that's ECC. If there are 4, 8, 16, or 32 chips, then it's probably not
  In the days of the AMD586, it was common for motherboards to be sold with fake ECC. There was actually a "fake ECC" chip soldered where the ECC should be. Often, these boards had defective RAM in them too, but would pass the BIOS fake memory test! Memtest86 was written because of these boards.
  I bought one myself and was astonished that it was cost effective to deliberately engineer defective machines. Memtest86 may not be very good, but it would flush these boards out, which was the problem when it was written.
  Realistic testing for the kind of problem in this article requires knowledge of the layout of memory cells to know which is adjacent to what, as well as prolonged testing. However, it should be possible to produce a background task that does the test continuously and put it in the idle loop. This is often done in embedded systems. Perhaps the memory chips could hold a guide to the cell layout.
  And perhaps people who sell defective memory chips could face a class action.
  
  --
  Sent from my ASR33 using ASCII
14. Re: ECC Memory by Anonymous Coward · 2015-03-09 21:46 · Score: 0
  
  http://users.ece.cmu.edu/~omutlu/pub/dram-row-hammer_kim_talk_isca14.pdf
  slide 32:
  Other Results in Paper (cont’d)
  As many as 4 errors per cache-line
  – Simple ECC (e.g., SECDED) cannot prevent all errors
15. Re: ECC Memory by Shinobi · 2015-03-09 22:04 · Score: 3, Informative
  
  In the paper, they actually state that ECC isn't entirely proof either, because you can get multiple bit errors as per their testing. SECDED will be defeated by that. Chipkill might work.
16. Re: ECC Memory by Anonymous Coward · 2015-03-09 23:57 · Score: 0
  
  Yes, but you don't get to pick the ECC bits independently of the data bits. I don't know exactly what algorithm is used in ECC, but the ECC bits are always the same for the same data word. This poses significant difficulties for a row-hammer type side-channel attack, as it may not be possible to generate the correct bit pattern simultaneously for the data and ECC bits. If the rowhammer attack on the ECC flips the data bits, or the rowhammer attack flips the ECC bits, either way the ECC checksum is invalid, and the line will be fetched from its mirror if DRAM mirroring is used; if mirroring is not enabled or supported, it will still cause an uncorrectable ECC error, and the OS will most likely crash unless it has some special support for pagetable sparing.
17. Re: ECC Memory by Anonymous Coward · 2015-03-10 00:38 · Score: 0
  
  About 15 years ago...
  I second your findings about RAM. We used to test RAM with a specific disk containing AT&T Unix: open multiple windows and load the PC with commands like "find ./ -name *.*" and a few others. IMHO Linux would probably do as well.
  If a machine survived a whole day without a kernel panic, we had good RAM. Some of these observations could be confirmed with a hardware memory tester, but as you said, this is not obvious.
  Stress tests are the only reliable way I know to perform such tests. Memory tests run at or near idle do not draw much power, do not heat up the PC, etc.
  We used a test PC with some known-good RAM, added the suspect RAM, and repeated the same series of tests to find out whether the RAM was useless.
18. Re: ECC Memory by MachineShedFred · 2015-03-10 01:35 · Score: 1
  
  ECC is able to correct one error, but only detect more than one. If by some strange probability you were able to flip 2+ bits in the same 64-bit chunk of DRAM, ECC would detect it, but just mark it as errored rather than accept the value.
  It would more likely be a denial-of-service attack than a way to manipulate values. The fact that Google's Project Zero team couldn't make it happen on ECC systems speaks to the probabilities though - ECC may be enough protection until the problem gets solved in the next generation of DRAM.
  
  --
  Slashdot still doesn't support Unicode after it was added to the HTML standard in 1997.
19. Re: ECC Memory by An+ominous+Cow+art · 2015-03-10 01:36 · Score: 5, Funny
  
  Now that's insideous!
  It takes 2 bit changes to change 'i' to 'e', so your problems are worse than you thought...
20. Re: ECC Memory by pla · 2015-03-10 01:45 · Score: 1
  
  I bought one myself and was astonished that it was cost effective to deliberately engineer defective machines.
  
  16GB (as 2x8GB) of ECC will cost you at least $160 for the absolute bottom of the barrel. The same 16GB of non-ECC goes for just about $100. That gives you a 60% markup for only 12% more chips. Really, it surprises me we don't see more fraud like that.
21. Re: ECC Memory by jandrese · 2015-03-10 02:13 · Score: 1
  
  Except that the ECC memory only costs so much because so few people buy it. It's a "business part". I don't think most consumer mobos are equipped to handle ECC memory either. It's a shame too because if the costs were in line with the actual hardware (it cost $112 instead of $160) and it was supported by the mobo manufacturers then I think a lot of system builders would go for ECC memory. $12 is not a bad price to pay to know when it is a faulty memory chip that's causing your system to crash and not something else, especially if the system tells you which bank is faulty so you don't have to waste time swapping DIMMs around looking for the one causing the crash.
  
  --
  
  I read the internet for the articles.
22. Re: ECC Memory by bluefoxlucid · 2015-03-10 02:42 · Score: 3, Interesting
  
  Wouldn't it be better design to put ECC into the memory controller, and arrange the chips to support ECC? That is: physically wire the memory chips and the memory bus to write far from each other (64 chips = interleave every 1 bit across all chips; 32 chips = interleave 2 bits per chip, starting on high bits in addressing so you put them half a chip's distance away from each other), physically protecting against chip-local anomalies. Have the MMU perform a single rotation, logically reserving an amount of RAM on each slot to carry ECC for the previous slot. Write ECC bits to those areas.
  Doing it in this way provides zero-cost physical isolation of single-chip memory errors (it's just the wiring layout), while also isolating the ECC from its corresponding module (avoiding RAS/CAS thrashing, allowing simultaneous access to the ECC bits and the corresponding RAM). It lets you pop in an ECC chip and use ECC, or pop in a non-ECC chip and sacrifice 12.5% of your RAM to error correction. That's a lot of RAM: on an 8GB system, it's almost 1GB.
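The bit-striping idea above can be sketched by comparing which bits of a 64-bit word land on a single chip under two hypothetical wirings: `blocked`, roughly how each x8 chip on an ordinary DIMM carries one byte lane, and `striped`, the proposed interleave. All names are illustrative assumptions; real DIMM routing is fixed by the module and the memory controller.

```python
WORD_BITS = 64  # data bits per ECC word, as on a standard 72-bit ECC DIMM

def striped(bit, num_chips):
    """Hypothetical wiring: consecutive bits of a word go to different chips."""
    return bit % num_chips

def blocked(bit, num_chips):
    """Conventional-style wiring: each chip supplies one contiguous slice of the word."""
    return bit // (WORD_BITS // num_chips)

def bits_on_chip(chip, layout, num_chips):
    """Which bits of one word are lost if `chip` fails under the given layout."""
    return [b for b in range(WORD_BITS) if layout(b, num_chips) == chip]
```

With 64-way striping a dead chip costs one bit per word, which SECDED can correct; with 32 chips it costs two bits per word, which SECDED can only detect; with the blocked layout a dead x8 chip takes out 8 adjacent bits, which is why schemes like Chipkill exist.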
  
  --
  Support my political activism on Patreon.
23. Re: ECC Memory by bluefoxlucid · 2015-03-10 02:42 · Score: 1
  
  That is not fraud.
  
  --
  Support my political activism on Patreon.
24. Re: ECC Memory by Anonymous Coward · 2015-03-10 03:10 · Score: 0
  
  http://www.allhdd.com/index.php?target=products&mode=search&subcats=Y&type=extended&avail=Y&pshort=Y&pfull=Y&pname=Y&pkeywords=Y&cid=0&q=XJM2N&src=pw
  192GB bulk pack for $80.
  I've bought from them several times with no issue.
25. Re: ECC Memory by tlhIngan · 2015-03-10 03:14 · Score: 1
  
  Except that the ECC memory only costs so much because so few people buy it. It's a "business part". I don't think most consumer mobos are equipped to handle ECC memory either. It's a shame too because if the costs were in line with the actual hardware (it cost $112 instead of $160) and it was supported by the mobo manufacturers then I think a lot of system builders would go for ECC memory.
  It's chipset and processor support, actually.
  Intel, for example, typically mandates ECC on the Xeon line (modern CPUs have the memory controller on die now, so ECC is dictated by the processor chosen).
  It's also available on the workstation line of processors since workstations typically go for stability.
  Of course, ECC is still a "feature" people will pay an inflated markup for so those Xeons and workstation processors cost a fair bit more than the low end and enthusiast parts.
  (Enthusiast parts? Yes, those don't typically include ECC, as ECC typically adds a clock or two of delay, and people wanting "extreme performance" are trying to get the fastest RAM possible - ECC adding a clock or two (with buffering/registers) just defeats the whole purpose of buying fast RAM.) Look up some time how going from CL6 to CL5 will cost you extra.
26. Re: ECC Memory by pla · 2015-03-10 03:37 · Score: 1
  
  Okay, I admit it, I don't get the punchline. I even added one to the cart expecting that as the price per stick (even then, unbelievably low, but maybe for tested pulls), but no, $79 for the whole bulk pack, new???
  
  You can't even get no-name sticks of non-ECC labelled in Chinese for that. That can't count as a real price, can it?
27. Re: ECC Memory by Anonymous Coward · 2015-03-10 04:24 · Score: 0
  
  When trying to identify ECC-RAM by counting, remember to count only the chips of similar size. There are small (i.e. really small typically 2-4 mm across) EEPROMs that store the capabilities on all DIMMs, and on "Server" Memory, there may be small buffer chips to amplify the signals from the DIMM.
28. Re: ECC Memory by gTsiros · 2015-03-10 05:06 · Score: 1
  
  can you think of a way to do those tests with less cost?
  
  --
  Looking for people to chat about multicopters, coding, music. skype: gtsiros
29. Re: ECC Memory by jandrese · 2015-03-10 05:15 · Score: 1
  
  I'd rather most systems came with ECC memory by default and "enthusiasts" could special-order non-ECC memory to try to eke out another couple of FPS in the benchmark. It would be treated like overclocking. You trade off some system life and maybe a little stability to get a few percentage points more performance.
  
  --
  
  I read the internet for the articles.
30. Re: ECC Memory by Anonymous Coward · 2015-03-10 06:03 · Score: 0
  
  i3, Pentium G and Celeron are workstation processors?
  http://ark.intel.com/search/ad...
31. Re: ECC Memory by gweihir · 2015-03-10 10:14 · Score: 1
  
  ECC does not fix this problem. You just need to flip any 3 bits in a 32 bit word, and then ECC happily corrects to the wrong value. That makes the attack take longer, but that is all. A fix that actually works is to refresh often enough that cells do not get so weak that this attack works. Most desktops and servers may already do that. But on laptops, there is an incentive to refresh as rarely as possible to save power.
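The refresh argument can be put in rough numbers. A minimal sketch, assuming DDR3's standard 64 ms refresh window and an illustrative row-cycle time tRC of 50 ns: an aggressor row can be activated at most about window / tRC times before its neighbour is refreshed, so refreshing k times as often cuts that budget by a factor of k. The disturbance threshold is a hypothetical parameter, not a measured figure, and the model ignores that refresh is actually spread over many REF commands.

```python
def max_activations_per_window(refresh_window_ms, t_rc_ns):
    """Rough upper bound on activations of one aggressor row between two
    refreshes of its neighbour: at most one activation per row-cycle time tRC."""
    return refresh_window_ms * 1_000_000 // t_rc_ns

def refresh_speedup_needed(flip_threshold, refresh_window_ms, t_rc_ns):
    """Smallest integer factor by which the refresh rate must rise so that the
    activation budget drops to or below a (hypothetical) disturbance threshold."""
    acts = max_activations_per_window(refresh_window_ms, t_rc_ns)
    return -(-acts // flip_threshold)  # ceiling division
```

Under these assumptions one row can be activated up to 1,280,000 times per window; if a module flipped bits after a hypothetical 200,000 activations, refresh would have to run 7 times as often.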
  
  --
  Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
32. Re: ECC Memory by Macman408 · 2015-03-10 11:33 · Score: 1
  
  The place I used to work used to offer it as a service for $5-10 per SIMM/DIMM. If you can find a local shop that has such a tester, maybe they'd do the same. Otherwise, it's probably not terribly likely to get a low-cost solution, since the volume of such testers is pretty small.
33. Re: ECC Memory by rsaralegui · 2015-03-11 04:34 · Score: 0
  
  ECC is no longer a "business part". For example, HP ProLiant Microservers (Gen7 and Gen8) are often recommended for setting up a home NAS. They are small and relatively cheap computers (I got my Gen7 for about 180 USD). See the specs (PDF).
34. Re: ECC Memory by Rich0 · 2015-03-11 06:01 · Score: 1
  
  I was looking at motherboards and even finding ones that support ECC is difficult, unless this is one of those situations where any motherboard works as long as the CPU supports it.
  As far as I understand it, most AMD processors do support ECC, and Intel only supports it in their upper-end products (artificial restriction to segment the market - i7/Xeon/etc). What I don't understand is where the motherboard fits in.
  Heck, Newegg doesn't even track that as an option on their product selector.
35. Re: ECC Memory by Rich0 · 2015-03-11 06:22 · Score: 1
  
  Okay, I admit it, I don't get the punchline. I even added one to the cart expecting that as the price per stick (even then, unbelievably low, but maybe for tested pulls), but no, $79 for the whole bulk pack, new???
  You can't even get no-name sticks of non-ECC labelled in Chinese for that. That can't count as a real price, can it?
  Sure. You just have to pay with bitcoin in advance. You know, just in case you're not a reliable purchaser.
36. Re: ECC Memory by Agripa · 2015-03-11 12:55 · Score: 1
  
  can you think of a way to do those tests with less cost?
  The problem with the software only test is that it does not verify the operating margin in timing, voltage, and temperature. If the motherboard supports it, then raising the memory operating frequency by say 10% and lowering the DRAM operating voltage by say 10% and *then* running the software only test like memtest86 or maybe the Prime95 stress test should work to detect marginal DRAM.
lp q s,M by HouseOfMisterE · 2015-03-09 16:37 · Score: 1

sopuM sn I ssupoo u
Rowhammer in MemTest86 & on Slashdot by PassMark · 2015-03-09 16:48 · Score: 5, Informative

It is worth noting that the row hammer issue isn't new. It has been known about for some time, including in this old Slashdot post:
http://hardware.slashdot.org/s...
There has been an implementation of row hammer testing in MemTest86 V6.0 for over 6 months now as well. MemTest86 implements just the single sided hammer, whereas Google used a double sided hammer.
http://www.memtest86.com/
While the double-sided hammer might produce more RAM errors, this pattern of memory accesses isn't very likely to occur in real-life software, so it is of limited use as a RAM reliability test.
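The difference between single-sided and double-sided hammering can be illustrated with a toy row-buffer model. This is a minimal sketch assuming an open-page policy (a row stays open until another row in the same bank is accessed) and assuming each activation of a physically adjacent row disturbs the victim once; the `(bank, row)` tuples and the helpers `activations` and `disturbance` are hypothetical. A real attack also has to evict the addresses from the CPU cache on every iteration (Project Zero's test uses CLFLUSH) and has to discover which addresses share a bank.

```python
from collections import Counter

def activations(accesses):
    """Count row activations for a sequence of (bank, row) accesses under a
    toy open-page policy: a row stays in its bank's row buffer until a
    different row in the same bank is accessed."""
    open_row = {}
    acts = Counter()
    for bank, row in accesses:
        if open_row.get(bank) != row:
            acts[(bank, row)] += 1  # row buffer miss: the row is activated
            open_row[bank] = row
    return acts

def disturbance(acts, bank, victim):
    """Toy model: each activation of a physically adjacent row disturbs the victim once."""
    return acts[(bank, victim - 1)] + acts[(bank, victim + 1)]

N = 1000
alone = [(0, 99)] * N                 # same row over and over
single = [(0, 99), (0, 500)] * N      # one neighbour plus a distant row, same bank
double = [(0, 99), (0, 101)] * N      # both neighbours of victim row 100
```

Hammering one row on its own gives a single activation, because every later access hits the open row buffer; alternating two rows in the same bank forces an activation per access, and placing both aggressors next to the victim doubles the disturbance it sees.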
What is new in this report is the fact that they manipulated the RAM bit flips to turn them into an exploit. Something that was previously speculated on but considered too hard to implement.
What they didn't show however is any results from desktop machines. All their testing was on laptops. In fact they state, "We also tested some desktop machines, but did not see any bit flips on those". So the problem isn't as grave as it might at first appear. They speculate that ECC RAM blocks the bit flips and this has also been the experience with MemTest86, most (but not all) of the flips are single bit flips, which ECC would correct.
Disclaimer: I'm one of the MemTest86 developers.
1. Re:Rowhammer in MemTest86 & on Slashdot by Anonymous Coward · 2015-03-09 18:01 · Score: 0
  
  Would you have any information on how DDR4 ram performs against this type of attack in practice? Also, why do desktop computers appear to be safe?
2. Re:Rowhammer in MemTest86 & on Slashdot by jklovanc · 2015-03-09 18:05 · Score: 1
  
  What is new in this report is the fact that they manipulated the RAM bit flips to turn them into an exploit.
  From the paper;
  
  Left unchecked, disturbance errors can be exploited by a malicious program to breach memory protection and compromise the system. With some engineering effort, we believe we can develop Code 1a into a disturbance attack that injects errors into other programs, crashes the system, or perhaps even hijacks control of the system. We leave such research for the future since the primary objective in this work is to understand and prevent DRAM disturbance errors.
  They have demonstrated the bit flip but not the exploit. An exploit would be much more difficult as you would need access to memory right next to the location you need to flip. Then flip it in just the right pattern to not crash. They have done the easy part and left the hard part to someone else.
3. Re:Rowhammer in MemTest86 & on Slashdot by Anonymous Coward · 2015-03-09 18:11 · Score: 0
  
  Thank you for your single thread cpu and entire cpu benchmark tables!
4. Re:Rowhammer in MemTest86 & on Slashdot by Anonymous Coward · 2015-03-09 18:20 · Score: 2, Interesting
  
  Similarly, this is a relatively well documented issue in the embedded world as well...
  ARM have errata for half a dozen IP revisions over the past few decades, as do many IHVs that make SoCs based on them.
  Similar deals in the MIPS and PowerPC realms.
  Some architectures thankfully have a rather locked down MMU/MPU, where unprivileged code simply can't even attempt to rowhammer (some ARMs are like this) - alas, most do not - thus you're at the mercy of all the other hardware vulnerabilities in your system (not just your CPU, but your memory controller, interconnect bus(es), coprocessors, and memory itself - which is what this rowhammer exploit is really about, JEDEC (LP)DDR2/3 simply isn't strict enough at lower frequencies about signal integrity - which is what rowhammer exploits).
  I guess hardware validation isn't as big a deal in the x86 scene as it is in the embedded scene.
5. Re:Rowhammer in MemTest86 & on Slashdot by phantomfive · 2015-03-09 19:34 · Score: 2
  
  What is new in this report is the fact that they manipulated the RAM bit flips to turn them into an exploit.
  That's bigger than being able to corrupt memory in the first place. What it means is that every computer (laptop I guess, without ECC) is vulnerable to a privilege escalation exploit, and the difference between root and a normal user is meaningless.
  
  Next all we need is a way to exploit this from javascript. :)
  
  --
  "First they came for the slanderers and i said nothing."
6. Re:Rowhammer in MemTest86 & on Slashdot by Anonymous Coward · 2015-03-09 19:51 · Score: 0
  
  Read the section "Exploiting rowhammer bit flips".
  They explain step by step how to go forth and exploit.
7. Re:Rowhammer in MemTest86 & on Slashdot by Anonymous Coward · 2015-03-09 23:10 · Score: 0
  
  Could I carefully direct solar flares towards the memory to get a similar effect?
8. Re:Rowhammer in MemTest86 & on Slashdot by Anonymous Coward · 2015-03-09 23:19 · Score: 0
  
  Yes, I started running rowhammer on my home Linux server, which has the cheapest motherboard (low-end Asus), CPU (Sempron 125) and RAM (non-ECC) I could buy on the day. I tested it with memtest86 when I first built it (probably a pre-rowhammer version) and it passed. Today I have run rowhammer for 6 hours and I have not been able to reproduce it. I would consider RAM that is vulnerable to this to be broken. Clearly not all RAM, and not all non-ECC RAM, is affected.
9. Re:Rowhammer in MemTest86 & on Slashdot by TCM · 2015-03-09 23:56 · Score: 1
  
  What about servers that employ data scrambling? From the sound of it, this should completely defeat the Hammer exploit.
  http://en.wikipedia.org/wiki/M... /scrambling
  
  --
  Of course it runs NetBSD. BTC: 1NT7QvbetmANwaMzhpVL6
10. Re:Rowhammer in MemTest86 & on Slashdot by Shinobi · 2015-03-10 05:11 · Score: 1
  
  It's a matter of speed and density too. The vast majority of embedded stuff operates at significantly lower speeds/throughput/memory size requirements, which means there's less vulnerability. Also, embedded stuff tends to be aimed at running very specific applications, which also reduces the vulnerability.
11. Re:Rowhammer in MemTest86 & on Slashdot by Anonymous Coward · 2015-03-10 08:12 · Score: 0
  
  There has been an implementation of row hammer testing in MemTest86 V6.0 for over 6 months now as well.
  Disclaimer: I'm one of the MemTest86 developers.
  From website:
  
  Version 6.0.0 13/Feb/2015
  New Features
  * New "Hammer Test" for detecting disturbance errors
  Feb 2015 is more than 6 months ago [within error: +/- 600%] Do you write software within the same tolerance?
Deja vu... by sotweed · 2015-03-09 16:49 · Score: 5, Interesting

This problem is remarkably similar to a problem I encountered in the memory of a 7094 (old IBM computer) which had a core memory which stored 36-bit words. The memory was supposed to work by operating on 6 bits at a time at 200 nanosecond intervals. The reason for this was to avoid creating a magnetic field that was too strong. The problem occurred when the timing was off due to failure of a component and two of the intervals overlapped. This meant that when one attempted to store a word with 35 1s, the field created was strong enough to store 36 1s. We wrote a diagnostic to demo the problem, and with that the engineers were able to isolate and fix the problem in short order.
1. Re:Deja vu... by ArcadeMan · 2015-03-09 17:02 · Score: 5, Funny
  
  Alright, alright! We're getting off your lawn!
  
  --
  Get free satoshi (Bitcoin) and Dogecoins
2. Re:Deja vu... by Billly+Gates · 2015-03-09 17:21 · Score: 1, Offtopic
  
  words do not mean anything today as each cpu has a different number of bytes representing each word.
  How much ram is that?
  
  --
  http://saveie6.com/
3. Re:Deja vu... by sotweed · 2015-03-09 17:42 · Score: 5, Interesting
  
  I was describing something that happened in a machine that was built before the world settled on 8-bit bytes. The machine had 36-bit words, and each word had an address. The 6-bit nibbles were not addressable. It was 32,768 (2**15) words of 36 bits. Equivalent to a little over 100K bytes!
4. Re:Deja vu... by Anonymous Coward · 2015-03-09 17:55 · Score: 0
  
  That explains why you're typing on a terminal with approx 100 characters per line limit.
  Weird.
5. Re:Deja vu... by Anonymous Coward · 2015-03-09 18:01 · Score: 0
  
  >had a core memory which stored 36-bit words
6. Re:Deja vu... by Anonymous Coward · 2015-03-09 19:16 · Score: 0
  
  That explains why you're typing on a terminal with approx 100 characters per line limit.
  Weird.
  You mean Twitter?
7. Re:Deja vu... by wonkey_monkey · 2015-03-09 21:05 · Score: 2
  
  We wrote a diagnostic to demo the problem, and with that the engineers were able to isolate and fix the problem in short order.
  That claim alone is enough to date your story back at least 30 years.
  
  --
  systemd is Roko's Basilisk.
8. Re:Deja vu... by Anne+Thwacks · 2015-03-09 21:33 · Score: 2
  
  The IBM 709x series were in use around 1970. By 1974 the System/360 was all the rage, and bytes were in common use, so probably over 40 years ago.
  
  --
  Sent from my ASR33 using ASCII
9. Re:Deja vu... by Anonymous Coward · 2015-03-09 21:50 · Score: 1
  
  And the rest, 30 years ago was 1985. Let's put this in perspective. The Commodore Amiga was released the summer of '85, as was the lesser Atari ST. Both models used Motorola 68000 CPUs, 32 bit registers on a 16 bit bus. This is pretty much the end of the golden age of home computing.
10. Re:Deja vu... by confused+one · 2015-03-09 22:49 · Score: 1
  
  Clearly you don't work in the embedded world, where VLIW processors, using word lengths that are not a multiple of a byte, are common.
11. Re:Deja vu... by Anonymous Coward · 2015-03-09 23:05 · Score: 1
  
  Your user ID should be a negative number...
12. Re:Deja vu... by wonkey_monkey · 2015-03-10 01:51 · Score: 1
  
  Just hedging my bets :)
  
  --
  systemd is Roko's Basilisk.
13. Re:Deja vu... by frank_adrian314159 · 2015-03-10 03:38 · Score: 1
  
  Wow! That was HUGE for the day! I grew up on PDP-8's which were limited to 4096 12-bit words (unless you used bank selection extensions - some machines had more) and soon migrated to 64K 16-bit words on the PDP-11, but I did my time with IBM's 360 and Control Data's 6xxx series (with its odd 60-bit word), too. Christ on a crutch, how impoverished the world is as far as computer architectures go. And, I'm getting old. Time to go lawn-yell.
  
  --
  That is all.
14. Re:Deja vu... by Anonymous Coward · 2015-03-10 03:48 · Score: 0
  
  Actually, it's closer to 50 years, since the machine in question was taken out of service around July, 1965, after about 5 years of operation, and shipped to the DEW (Distant Early Warning) line somewhere in Canada or Greenland..
15. Re:Deja vu... by maestroX · 2015-03-10 04:35 · Score: 1
  
  I was describing something that happened in a machine that was built before the world settled on 8-bit bytes
  
  I know, you'd implement ECC by visually inspecting the pins.
Re:Frist Psot!! by grcumb · 2015-03-09 16:52 · Score: 5, Funny

I got First Post!!!!!!! Yippppppeeeeeeee!!!!!!!!
And I don't even know what a DRAM rowhammer is!!!!!!!!!!
Dude, guess what? We row-hammered you into second place. Now excuse me while I flip you the bit. :-)

--
Crumb's Corollary: Never bring a knife to a bun fight.
Multiprocessing by Bruce+Perens · 2015-03-09 16:56 · Score: 2

Multi-threaded programs really do need those cache flushes to implement their interprocessor communications, don't they? It seems to me that they would be the ones most likely to hit this problem.

--
Bruce Perens.
1. Re:Multiprocessing by Pelam · 2015-03-09 17:35 · Score: 2
  
  I'm not sure. The locked instructions, compare and exchange and mfence ensure cache coherency so in my experience the flushes are not necessary.
  Maybe driver code needs the flushes. Driver needs to know data is really in the RAM before hardware with DMA can get it.
  Cache flush instructions seem to be a late addition with SSE2.
2. Re:Multiprocessing by Bruce+Perens · 2015-03-09 17:51 · Score: 2
  
  Compare-and-exchange and mfence would be doing cache flush all of the way to RAM and global cache line invalidation, wouldn't they? So, they can potentially be used to hammer too.
  
  --
  Bruce Perens.
3. Re:Multiprocessing by TheRaven64 · 2015-03-09 22:27 · Score: 3, Interesting
  They don't flush, no. They will add memory fences, which will generate cache coherency bus traffic, but won't trigger a write back to main memory (modern CPUs can snoop the cache of other cores, so the data will be sent cache to cache).
  The main reasons for flushing the cache are:
  
  If you have some non-volatile DRAM and want to ensure consistency.
  
  If you're doing DMA on anything other than the latest Intel chips, so that the DMA controller will see the data that you've flushed from the cache.
  
  If you're writing a JIT compiler or some other form of self-modifying code (including a run-time linker) and need to ensure that i-cache and d-cache are consistent (I think x86 does this automatically, but I could be wrong).
  
  If you're writing a crypto algorithm and want to make side-channel attacks via the cache difficult.
  --
  I am TheRaven on Soylent News
4. Re:Multiprocessing by TheRaven64 · 2015-03-09 22:30 · Score: 3, Interesting
  
  Nope, no cache flush for compare and exchange. Modern CPUs use a modified version of the MESI protocol, where each cache line has a state associated with it (modified, exclusive, shared, invalid in MESI, a few more in modern variants). When you do a compare and exchange, you move your copy of the cache line into exclusive state and everyone else's into invalid. Before this, you must have the line in the shared state (where multiple caches can have read-only copies). When another core wants access to the memory, it will request the line in shared state. If another cache has it in its exclusive state, then the exclusive line will be downgraded to shared and a copy of its contents sent to the requesting site.
  If atomic operations had to go via main memory then they would be significantly slower than they are and would be a huge bottleneck for multicore systems.
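The MESI behaviour described above can be sketched as a small state machine for one cache line shared by several cores. This is a simplified textbook model, not how any particular CPU implements coherence; `CacheLine`, `read` and `write` are hypothetical names, and an atomic compare-and-exchange is modelled simply as a write that needs exclusive ownership of the line.

```python
from enum import Enum

class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"

class CacheLine:
    """Textbook MESI for one cache line held in several cores' private caches."""

    def __init__(self, cores):
        self.state = {core: State.INVALID for core in cores}
        self.dram_reads = 0  # fetches that had to go all the way to DRAM

    def read(self, core):
        if self.state[core] is not State.INVALID:
            return  # cache hit: no coherence traffic at all
        holders = [c for c, s in self.state.items()
                   if c != core and s is not State.INVALID]
        if not holders:
            self.dram_reads += 1  # no cache has the line: fetch it from DRAM
            self.state[core] = State.EXCLUSIVE
            return
        # Another cache supplies the data and M/E copies are downgraded to S.
        # (Basic MESI also writes a Modified line back to memory here;
        # MOESI's Owned state avoids that write-back.)
        for c in holders:
            self.state[c] = State.SHARED
        self.state[core] = State.SHARED

    def write(self, core):
        """A store or atomic read-modify-write: gain ownership, invalidate other copies."""
        self.read(core)
        for c in self.state:
            if c != core:
                self.state[c] = State.INVALID
        self.state[core] = State.MODIFIED
```

After the first fetch, two cores can bounce the line back and forth a thousand times with `line.write(i % 2)` while `dram_reads` stays at 1: contended atomics generate traffic between caches, not repeated DRAM reads.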
  
  --
  I am TheRaven on Soylent News
5. Re:Multiprocessing by Anonymous Coward · 2015-03-10 00:31 · Score: 1
  
  AMD does cache snooping, but Intel uses an inclusive L3 cache which means all of that data is always in L3, no need to do slow snooping. It's a trade off. Exclusive gives more effective cache at the cost of increased latency when attempting to lock memory. Inclusive duplicates more data, but has lower locking latency. CPU or Memory tradeoff.
6. Re:Multiprocessing by Anonymous Coward · 2015-03-10 03:16 · Score: 0
  
  I wonder more about using this attack on the on-die CPU memory cache. This would seem to be the most likely place to defeat ECC RAM, for starters. I wonder if the hammer issue affects CPU cache?
7. Re:Multiprocessing by Anonymous Coward · 2015-03-10 03:43 · Score: 0
  
  I wonder if the hammer issue affects CPU cache?
  As long as it is not composed of dram manufactured with the same, particular buggy processes as the main memory between the certain time periods, everything should be peachy.
8. Re:Multiprocessing by Lost+Race · 2015-03-10 17:54 · Score: 1
  
  1. On-die caches are SRAM, not DRAM.
  2. On-die caches have ECC.
9. Re:Multiprocessing by Bruce+Perens · 2015-03-11 06:50 · Score: 1
  
  I suspect that we could persuade those caches to flush to RAM, simply by exhausting the number of possible lines for that address - if the cache is set-associative. Of course modern processors have multiple levels of cache, so that makes it harder.
  
  --
  Bruce Perens.
10. Re:Multiprocessing by TheRaven64 · 2015-03-11 07:14 · Score: 1
  
  I don't think I understand what you think you're trying to do. You can't make a cache flush a line that you're modifying with an atomic operation to RAM, because atomic ops require the value to be in cache. Given an n-way set-associative cache, however, you can typically force cache flushes (without requiring special cache flush instructions) by repeatedly writing n+1 values at addresses that map to the same cache set (e.g. at address X, X+S, X+2S,..., where S is the number of sets times the 64-byte line size). This probably wouldn't trigger the rowhammer issues though, because it's up to the CPU which row it evicts each time and you'd end up repeatedly stalling on loads without bashing a single DRAM line. You might be able to do something similar with the nontemporal store instructions that Intel added in recent generations of processor...
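A minimal sketch of the set-conflict eviction trick, assuming a hypothetical 8-way cache with 64 sets of 64-byte lines and true LRU replacement. Real caches differ: L1 indexing may use virtual address bits, last-level caches on recent Intel parts hash addresses across slices, and replacement is usually pseudo-LRU, so `set_index`, `conflicting_addresses` and `lru_misses` are illustrations only. The key point is that addresses collide in one set only when they are spaced by (number of sets × line size), not by one line.

```python
LINE_BYTES = 64  # assumed cache-line size
NUM_SETS = 64    # assumed number of sets (e.g. 32 KiB / (8 ways * 64 B))
WAYS = 8         # assumed associativity

def set_index(addr):
    """Set that an address maps to in this toy cache."""
    return (addr // LINE_BYTES) % NUM_SETS

def conflicting_addresses(base, count):
    """Addresses that all map to the same set as `base` (stride = sets * line size)."""
    stride = LINE_BYTES * NUM_SETS
    return [base + k * stride for k in range(count)]

def lru_misses(addresses, rounds):
    """Simulate one set with true LRU replacement; return the number of misses.
    Assumes every address maps to that same set."""
    resident = []  # tags in the set, most recently used at the end
    misses = 0
    for _ in range(rounds):
        for addr in addresses:
            tag = addr // (LINE_BYTES * NUM_SETS)
            if tag in resident:
                resident.remove(tag)  # hit: refresh its LRU position
            else:
                misses += 1
                if len(resident) == WAYS:
                    resident.pop(0)   # evict the least recently used line
            resident.append(tag)
    return misses
```

With `WAYS + 1` colliding addresses accessed round-robin, every access misses and evicts a line; with only `WAYS` of them, just the first round misses.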
  
  --
  I am TheRaven on Soylent News
Previous rowhammer discussion by Pelam · 2015-03-09 17:05 · Score: 1

Here is the link to the earlier slashdot discussion on this prevalent hardware bug. The original paper suggested the possibility of exploit.
Yet another design consideration by eighthdev · 2015-03-09 17:32 · Score: 1

As if there aren't enough things to take into consideration regarding security and exploits. Now we have to worry about faulty hardware design as well. Yikes!
1. Re: Yet another design consideration by Anonymous Coward · 2015-03-09 18:22 · Score: 0
  
  As an EE:
  bwahahahaha you thought you didn't before....
2. Re:Yet another design consideration by Anonymous Coward · 2015-03-09 20:33 · Score: 0
  
  This is why Mozilla's asm.js is a better solution than Google's NaCl. While NaCl starts up faster and runs marginally faster than asm.js, if there is ever a hardware exploit then the only solution may be whitelists and signed code... just like ActiveX.
Difficult to exploit on servers by AaronW · 2015-03-09 19:12 · Score: 5, Informative

In reality this would be difficult to exploit on a server, since servers typically use ECC memory. ECC memory can typically detect two-bit errors and correct single-bit errors, and will likely catch this unless you can flip enough bits correctly so that the ECC remains correct. Doing this means that you would need to know the contents of those memory locations prior to flipping the bits.
I don't know about X86, but on the CPUs my company makes we support hardware address randomization, so that the address lines going out to memory are randomized such that finding adjacent rows or columns can be very difficult to figure out.
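A minimal sketch of why a scrambled address mapping frustrates targeting, assuming a hypothetical controller that maps logical row numbers to physical rows with a secret keyed bijection. Real controllers use their own hardware-specific functions; the XOR-then-multiply mapping and the names `ROW_BITS`, `MULT`, `scramble` and `unscramble` are stand-ins, chosen only because multiplication by an odd constant is invertible modulo a power of two.

```python
ROW_BITS = 16                # assumed: 65,536 rows per bank
MASK = (1 << ROW_BITS) - 1
MULT = 0x9E37                # any odd constant is invertible mod 2**ROW_BITS

def scramble(logical_row, key):
    """Hypothetical keyed bijection from logical to physical row numbers."""
    return ((logical_row ^ key) * MULT) & MASK

def unscramble(physical_row, key):
    """Inverse mapping; pow(x, -1, m) computes a modular inverse (Python 3.8+)."""
    inverse = pow(MULT, -1, 1 << ROW_BITS)
    return ((physical_row * inverse) & MASK) ^ key
```

With `key = 0x5A5A`, logical rows 999, 1000 and 1001 end up far apart physically, so an attacker who does not know the key cannot simply pick the rows on either side of a victim. Presumably this only makes aiming harder; hammering random addresses would still hit some physical neighbours by chance.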
It's a shame that Intel only supports ECC memory with their XEON processors rather than all of their processors. ECC does not add much in terms of cost, only 12.5% more DRAM chips and a few more traces and a few other miscellaneous parts (resistors, capacitors). Even the lowest end processors my company makes (for things like small routers, wireless access points, etc) supports ECC and address line randomization.

--
This post is encrypted twice with ROT-13. Documenting or attempting to crack this encryption is illegal.
1. Re:Difficult to exploit on servers by hottoh · 2015-03-10 05:27 · Score: 1
  
  Modern i3s support ECC, not just XEONs. See below.
  
  http://ark.intel.com/products/77480/Intel-Core-i3-4130-Processor-3M-Cache-3_40-GHz
  
  I picked the first ARK link & the i3 supports ECC RAM.
  
2. Re:Difficult to exploit on servers by Anonymous Coward · 2015-03-10 05:59 · Score: 0
  
  Not only i3, also plenty of Pentium Gs and even a bunch of Celerons. Easy to find using the advanced search
3. Re:Difficult to exploit on servers by gweihir · 2015-03-10 10:11 · Score: 1
  
  Sorry, but that is not accurate. If you flip any 3 bits on ECC, it happily corrects to the wrong value. Hence the attack may take longer and possibly produce even more crashes before it succeeds, but it still is doable. A different reason this attack will be much harder or infeasible on servers (and desktops) is that there is no reason to slow down refresh to save power on these systems. For laptops, it can make a difference, especially when hibernating.
  
  --
  Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
4. Re:Difficult to exploit on servers by AaronW · 2015-03-10 14:08 · Score: 1
  
  The problem is you don't know how many bits have flipped, and if only one bit has flipped it will get corrected when you check. If two bits flip then it will also be detected and the fact that this occurring will be reported. Depending on the OS, it will either kill the application and lock out that block of memory or it will cause the system to halt. In any event, one and two bit errors will be reported and detected.
  
  --
  This post is encrypted twice with ROT-13. Documenting or attempting to crack this encryption is illegal.
5. Re:Difficult to exploit on servers by AaronW · 2015-03-10 14:34 · Score: 1
  
  Flipping more than one bit with ECC is far more difficult since you need to make sure that there is no read operation involved before the write operation. You also need to make sure that the server isn't doing memory scrubbing. If memory scrubbing is taking place, which is something I would expect any decent server to do, it will detect and correct a single bit error long before three bit errors are generated. Also, if you don't generate three bit errors but only one or two or more than three then it will be detected. Since this requires a large number of write operations over a lot of memory to make this happen it is far more likely that this will be detected.
  Also, if you start a write operation on a memory location you need to make sure that the memory controller does not first try and read in the entire cache line before the write happens. Memory accesses are performed on cache lines, not single words in DRAM. Typically the memory controller will read 32 or 64 bytes at a time from DRAM, modify the changed data, then write back the entire cache line.
  
  --
  This post is encrypted twice with ROT-13. Documenting or attempting to crack this encryption is illegal.
6. Re:Difficult to exploit on servers by Agripa · 2015-03-12 12:55 · Score: 1
  
  Doesn't ECC support on the i3, Pentium G, and Celeron also depend on the motherboard chipset enabling ECC, even though the memory controller is in the CPU?
My solution, and I'm patenting this.. by borknado · 2015-03-09 19:13 · Score: 1

...is no physical memory at all. The processor is simply as wide as necessary, so if your program and data are 100 megabytes, then you need an 800-megabit processor, or a processor with 100 megabytes of registers. Yes, thank you.
old servers as desktops by dltaylor · 2015-03-09 19:20 · Score: 1

So I'm not (quite) a paranoid nutcase for running server-class hardware, including always using ECC DIMMs. My current desktops are older Dell T3500s with nearly top-bin Xeons, upgraded power supplies and graphics, plus, of course, 24GBytes of ECC RAM.
First big splurge on a desktop had a Tyan mainboard with the ServerWorks chipset (since Intel's were pathetic, at the time), dual P-IIIs, PCI-X, PLUS an AGP slot. Awesome, for its time.
http://www.tyan.com/archive/l_chinese/html/pr01_s2567.html
Re:Frost pist by Anonymous Coward · 2015-03-09 20:30 · Score: 0

C-C-C-C-Combo breaker
Thanks to Wang by michaelmalak · 2015-03-09 20:51 · Score: 4, Informative

All RAM on PCs used to be parity RAM until Wang started suing RAM manufacturers in the 90s over its patents on parity SIMMs.
1. Re: Thanks to Wang by fuzzyfuzzyfungus · 2015-03-09 21:18 · Score: 5, Funny
  
  That sounds like a real dick move on their part.
2. Re: Thanks to Wang by Anne+Thwacks · 2015-03-09 21:42 · Score: 1
  
  How was that patent even slightly valid? Parity was known about before WW1 - and by implication probably before An Wang was born!
  
  --
  Sent from my ASR33 using ASCII
3. Re: Thanks to Wang by Anonymous Coward · 2015-03-09 21:53 · Score: 0
  
  Wang were going down the shitter, it was the last throw of the dice. They were bankrupt a year later.
4. Re: Thanks to Wang by Anonymous Coward · 2015-03-10 00:45 · Score: 0
  
  A right cock-up. I'll bet it was Chubby Peter's idea. That guy had a real set of cahones, but he was also a prick.
5. Re: Thanks to Wang by coofercat · 2015-03-10 03:06 · Score: 1
  
  Calling their global support service Wang Care wasn't a great move.
  (the story goes that the European head had to answer directly to Dr. Wang about why the name had been changed from Wang Care to whatever it ended up being)
  Opening an office in Cologne, Germany wasn't a successful one either - no one wanted to go to Wang Cologne ;-)
6. Re: Thanks to Wang by hottoh · 2015-03-10 05:37 · Score: 3, Informative
  
  All RAM on PCs used to be parity RAM until Wang started suing RAM manufacturers in the 90s over its patents on parity SIMMs.
  
  Not so.
  
  Many had ECC. AAPL computers skipped ECC, to save money and look stupid at the same time (I made the stupid part up).
  
  AAPL = Apple Inc., FYI
7. Re: Thanks to Wang by michaelmalak · 2015-03-10 07:22 · Score: 1
  
  By "PC" I meant IBM PC and compatibles. Apple, Atari et al. were "Personal Computers", not PCs. I think there was a TV commercial about that.
8. Re: Thanks to Wang by gweihir · 2015-03-10 10:08 · Score: 1
  
  And with parity, you just need two bit-flips instead of one. Does not really help. What helps is faster refresh.
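gweihir's point about parity is easy to check. A parity SIMM stores one extra bit per byte (the ninth chip on the nine-chip SIMMs mentioned in this thread), chosen so that the count of 1 bits stays even. Any odd number of flips breaks that invariant; any even number preserves it. A minimal sketch, with hypothetical helper names:

```python
def parity_bit(byte: int) -> int:
    """Even parity: the extra bit makes the total count of 1s even."""
    return bin(byte & 0xFF).count("1") % 2

def store(byte: int) -> tuple[int, int]:
    """A 9-bit cell: 8 data bits plus 1 parity bit."""
    return byte & 0xFF, parity_bit(byte)

def check(data: int, parity: int) -> bool:
    """True if the stored parity still matches the data (no error detected)."""
    return parity_bit(data) == parity

data, p = store(0b1011_0010)
one_flip = data ^ 0b0000_0100     # flip one data bit
two_flips = data ^ 0b0100_0100    # flip two data bits

print(check(one_flip, p))    # False: a single flip is detected
print(check(two_flips, p))   # True: a double flip goes unnoticed
```

Parity only detects; it cannot tell which bit flipped, so on PCs of that era a parity error typically halted the machine rather than repairing the data.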
  
  --
  Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
9. Re: Thanks to Wang by toddestan · 2015-03-10 15:31 · Score: 1
  
  Because it was using parity ... in a computer!
  Though surely those patents have expired by now. Time for parity RAM to make a comeback?
Escalation by ThatsNotPudding · 2015-03-09 23:59 · Score: 1

Weird how most bug exploits result in pretty much every OS reacting with: "I don't know what you want, so here's the keys to the kingdom! (escalation)" instead of: "I don't know what you want, so I'm jumping out of the window and taking you with me! (system crash)."
1. Re:Escalation by Anonymous Coward · 2015-03-10 00:32 · Score: 0
  
  Err... that's why they're bug exploits, and not crashes due to a bug.
2. Re:Escalation by ledow · 2015-03-10 01:50 · Score: 1
  
  Why? Because you cannot make code run in different sections. Here, the physical hardware is PROVIDING the facility to access a table which is normally privileged, which determines whether a program is allowed to access ANY AND ALL RAM.
  The privilege is not normally available, and would normally block almost all such attacks. This is a complete way around all the hardware features that are supposed to stop this kind of access and so, of course, the kernel can do NOTHING about it.
  The problem is that most software bugs DO NOT give up escalation at all, except where poor code is run in an escalated context because it HAS to. It's actually quite hard to find a privilege escalation bug that an ordinary user can actually exploit anywhere near reliably, and they are usually patched EXTREMELY quickly. This is actually a hardware bug meaning that all such hardware precautions, restrictions and security are basically bypassed because of a hardware bug.
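ledow's description matches the Project Zero summary at the top of the story: the exploit flips a bit in a page table entry (PTE) so that it maps a different physical page, ideally one of the process's own page tables. The arithmetic is simple. In an x86-64 4-KiB PTE, bit 0 is Present, bit 1 is Read/Write, bit 2 is User/Supervisor, and bits 12 through 51 hold the physical frame number (PFN), so one flip in bit 12+k moves the mapping by 2^k pages. The sketch below only decodes fields; the helper names and example values are hypothetical, and it is not the Project Zero exploit code.

```python
PRESENT, WRITABLE, USER = 1 << 0, 1 << 1, 1 << 2
PFN_SHIFT = 12
PFN_MASK = ((1 << 40) - 1) << PFN_SHIFT   # bits 12..51 on x86-64

def make_pte(pfn: int, flags: int) -> int:
    """Build a 4-KiB PTE from a frame number and low flag bits."""
    return ((pfn << PFN_SHIFT) & PFN_MASK) | flags

def pfn_of(pte: int) -> int:
    """Extract the physical frame number the PTE points at."""
    return (pte & PFN_MASK) >> PFN_SHIFT

pte = make_pte(0x12345, PRESENT | WRITABLE | USER)
flipped = pte ^ (1 << (PFN_SHIFT + 4))    # one rowhammer flip in bit 16

print(hex(pfn_of(pte)), hex(pfn_of(flipped)))   # 0x12345 0x12355
```

The permission bits are untouched; only the target frame moves. The Project Zero post describes spraying physical memory with page tables first, so that a PTE corrupted this way has a good chance of pointing at a page table the process can then write through.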
3. Re:Escalation by mlts · 2015-03-10 02:29 · Score: 2
  
  I wonder if there is -any- way to mitigate this in software, similar to how the Linux kernel intercepted the instructions to prevent the FDIV bug from happening in early Pentium chips. The only way I see would be to use a Bochs-style emulator and accept the immense performance hit of its style of emulation (where hardware virtualization hooks are not used).
4. Re:Escalation by KingMotley · 2015-03-10 04:27 · Score: 1
  
  Of course there is. The question is how quickly it could be implemented, and what type of performance overhead it would take, and if it is worthwhile.
5. Re:Escalation by LeadSongDog · 2015-03-10 06:54 · Score: 1
  
  The sensible response is just to run the test and find out if your DRAM has this bug. If not, then the attack already fails: no amount of coding is going to make you more resistant. If you do have the bug, return the defective product for replacement. Again, no amount of coding is going to make you more resistant. Why do people persist in thinking that everything should be fixed in software?
  
  --
  Oh, I'm sorry sir, I thought you were referring to me, Mr. Wensleydale.
6. Re:Escalation by mlts · 2015-03-10 07:47 · Score: 1
  
  Easier said than done in a lot of cases. For example, if a newer MacBook has this bug, the only way to fix the problem is to toss the entire thing.
  Having some form of software remedy, even one that merely detects it happening and does a hard reset or a power down, may be better than a compromise in some environments, especially with virtualization, where getting ring 0 on the bare metal can be an incredible catastrophe.
7. Re:Escalation by gweihir · 2015-03-10 10:07 · Score: 1
  
  That is not actually what happens. In practice, unless you know exactly which bits you are attacking (think Windows where the attacker has access to the binary, or Linux with a distribution kernel), you will produce a lot of system crashes for each successful privilege escalation. Even if you know exactly, it may still produce a lot more crashes than compromises.
  This is not so different from most software-level attacks, and it is one of the reasons why repeated crashes are highly suspicious in production systems. If you are paranoid, you limit the number of automatic reboots for this reason. But for many systems, uptime is more important than security (or at least that is what "business" thinks), hence unlimited automatic reboots are often the norm.
  
  --
  Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
GaryIsABusyGuy == Troll by Anonymous Coward · 2015-03-10 01:22 · Score: 0

Enjoy your triple Troll moderations, asshole. This is Hardware. If this were a vulnerability introduced in the Linux Kernel, then you'd have a fucking point. This is Hardware. If you're a damn software genius and build a custom OS that locks everything down to bare metal, you'll still have this bug. This is Hardware. All operating systems running on this hardware will have the exact same attack vector, and thus the playing field is just as level now as it was before the bug was discovered. This is Hardware. Even given this hardware bug, Linux security is still leagues better than the Windows equivalent. OS X would still be vulnerable to this bug, and it's only slightly more secure than Linux given the fact that it's such a damn walled garden. This is Hardware. Do you get the recurring theme, asshole?
Why you don't use non-ECC memory by Anonymous Coward · 2015-03-10 02:11 · Score: 0

I suspect that ECC (Error Correction Code) RAM will avoid this. I will not purchase a machine without it. It also protects from failing memory. All modern systems should implement this. Yes, it is more expensive than simple parity checking RAM, or just simple RAM, but it will detect and protect you from such attacks.
1. Re:Why you don't use non-ECC memory by gweihir · 2015-03-10 10:02 · Score: 1
  
  Unfortunately, it will not. It just makes the attack take longer, as you have to introduce 3 bit flips instead of one. (Memory ECC is 1-bit-correcting, 2-bit-detecting error correction.) Even fast memory scrubbing is not a sure way to prevent this attack. The only way I see is to refresh fast enough that no bit flips happen, or that they become so unlikely as to make the attack infeasible. There is a real possibility that most non-laptop systems already do that and that this problem is the result of misguided attempts to save power on laptops.
  That is not to say ECC is worthless. It has other benefits. But for this attack it is not the right fix.
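The 1-bit-correcting, 2-bit-detecting (SECDED) behavior gweihir describes can be seen in a toy 8-bit extended Hamming code: 4 data bits, 3 Hamming check bits, and 1 overall parity bit. A sketch under that assumption, with hypothetical function names; real ECC DIMMs protect 64 data bits with 8 check bits, and many controllers use other SECDED constructions (such as Hsiao codes), in which some triple errors produce a syndrome that matches no bit position and are flagged rather than miscorrected. So this shows the principle, not any particular memory controller.

```python
DATA_POSITIONS = (3, 5, 6, 7)   # data bits sit at the non-power-of-two positions

def encode(nibble: int) -> list[int]:
    """Toy extended Hamming (8,4): position 0 is overall parity, 1..7 are Hamming(7,4)."""
    code = [0] * 8
    for bit, pos in enumerate(DATA_POSITIONS):
        code[pos] = (nibble >> bit) & 1
    for p in (1, 2, 4):
        # Check bit p covers every other position whose index has bit p set.
        code[p] = sum(code[i] for i in range(1, 8) if i & p and i != p) % 2
    code[0] = sum(code[1:]) % 2   # make the total number of 1s even
    return code

def decode(code: list[int]) -> tuple[str, int]:
    """Return (status, recovered nibble) the way a SECDED decoder would."""
    syndrome = 0
    for i in range(1, 8):
        if code[i]:
            syndrome ^= i             # XOR of set positions is 0 for a valid word
    odd = sum(code) % 2 == 1
    fixed = list(code)
    if syndrome == 0 and not odd:
        status = "ok"
    elif odd:
        # Odd error count: assume exactly one and flip it (syndrome 0 means the parity bit).
        fixed[syndrome] ^= 1
        status = "corrected"
    else:
        status = "uncorrectable"      # even error count with a nonzero syndrome
    nibble = sum(fixed[pos] << bit for bit, pos in enumerate(DATA_POSITIONS))
    return status, nibble

def flip(code: list[int], *positions: int) -> list[int]:
    """Return a copy of code with the given bit positions inverted."""
    bad = list(code)
    for pos in positions:
        bad[pos] ^= 1
    return bad

word = encode(0b1011)
print(decode(flip(word, 6)))        # ('corrected', 11)
print(decode(flip(word, 5, 6)))     # ('uncorrectable', 13)
print(decode(flip(word, 5, 6, 7)))  # ('corrected', 5): silently wrong
```

Because an odd error count looks like a single error to this decoder, every three-bit error in the toy code is "corrected" to a wrong value, which is the failure mode gweihir relies on.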
  
  --
  Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
[citation needed] by Anonymous Coward · 2015-03-10 02:40 · Score: 2, Interesting

I've followed the PC market since the 1970s. I was paying a good bit of attention to memory cost. As far as I could tell, people noticed that parity RAM hardly ever caught errors, and that non-parity RAM was cheaper than parity RAM. Parity became optional, then eventually went away entirely, simply because it was marginally more expensive.
Patent issues came and went, but I don't think they were a major driver away from parity.
1. Re:[citation needed] by michaelmalak · 2015-03-10 07:27 · Score: 1
  
  The Wang patent was actually for having nine chips on a SIMM. When Wang started enforcing its patent, competitors switched to putting three chips on a SIMM instead. During that transition, parity RAM was scarce and expensive -- 9-chip because it was being phased out and 3-chip because quantities weren't available at first. It got people to reconsider whether parity was necessary, and it became "socially acceptable" to have non-parity RAM.
  Back in the days of discrete RAM chips, they were always installed in multiples of 18.
It's an issue of DRAM settings. by Anonymous Coward · 2015-03-10 06:14 · Score: 0

Row hammering corruption occurs when refreshes are set too far apart (posted refreshes). Consecutive reads between refreshes drain the charge to such an extent that corruption occurs. Turning up the refresh rate until all possibility of row hammering corruption becomes nil effectively cures this issue.
It will come at the expense of slight, hardly measurable DRAM speed losses and a slight power consumption increase.
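The refresh-spacing point can be made concrete with DDR3's standard figures: every row is refreshed within a 64 ms window using 8192 REF commands, so the average refresh interval tREFI is 64 ms / 8192 = 7.8125 µs. The number of row activations (hammer rounds) a bank can issue before the victim row is refreshed is bounded by the window divided by the row cycle time tRC. The tRC of 50 ns below is an assumed round number, not a value from any particular datasheet, and the helper names are hypothetical.

```python
REF_COMMANDS_PER_WINDOW = 8192   # DDR3: 8K REF commands per retention window

def trefi_us(window_ms: float) -> float:
    """Average interval between REF commands, in microseconds."""
    return window_ms * 1000.0 / REF_COMMANDS_PER_WINDOW

def max_activations(window_ms: float, trc_ns: float = 50.0) -> int:
    """Upper bound on row activations one bank can issue within the window.

    trc_ns (row cycle time) is an assumed round figure, not a datasheet value.
    """
    return int(window_ms * 1e6 // trc_ns)

for window in (64.0, 32.0):
    print(f"{window:>4.0f} ms window: tREFI = {trefi_us(window):.4f} us, "
          f"at most {max_activations(window):,} activations")
```

Halving the window to 32 ms halves the hammering budget between refreshes of a victim row. The price is twice as many refresh commands, which is where the power and speed costs mentioned in this thread come from.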
Normal code will never hit this issue, because normal reads result in cached data. Even code that severely hammers some address will be served from the cache, not from memory. This type of hammering is possible if you can bypass the cache hierarchy or evict/invalidate the cached data between reads.
1. Re:It's an issue of DRAM settings. by gweihir · 2015-03-10 09:57 · Score: 1
  
  I agree on the settings. This may be why laptops could be a primary target: refresh consumes the same power even when the CPU is idle or asleep, hence there is an incentive to space it as widely as possible. On desktop and server systems it hardly matters if the RAM consumes 1W more or so, but it does matter for laptops, especially when going into hibernation. At the same time, refreshing slower under hibernation does not make sense (well, unless you know about rowhammer), and hence laptops may often be more susceptible to this problem.
  As to bypassing caches, that is not a problem if you have a large enough memory area to play with. You do not even need root permissions for that.
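Bypassing caches "with a large enough memory area" relies on eviction: if you cycle through more addresses that map to the same cache set than the set has ways, each line is evicted before it is reused, so every read goes to DRAM. A toy model of one 8-way set with true LRU replacement shows the cliff between 8 and 9 addresses. This is a simplification under stated assumptions: real Intel last-level caches hash addresses across slices and do not use plain LRU, so building real eviction sets is harder; the class and function names are hypothetical.

```python
from collections import OrderedDict

class LRUSet:
    """One set of a set-associative cache with true LRU replacement (a simplification)."""

    def __init__(self, ways: int):
        self.ways = ways
        self.lines: OrderedDict[int, None] = OrderedDict()   # least recently used first

    def access(self, tag: int) -> bool:
        """Return True on a hit, False on a miss (a miss goes to DRAM)."""
        if tag in self.lines:
            self.lines.move_to_end(tag)
            return True
        if len(self.lines) >= self.ways:
            self.lines.popitem(last=False)   # evict the least recently used line
        self.lines[tag] = None
        return False

def dram_reads(ways: int, n_addresses: int, rounds: int) -> int:
    """Cycle through n_addresses that all map to one set; count misses."""
    cache = LRUSet(ways)
    misses = 0
    for _ in range(rounds):
        for tag in range(n_addresses):
            if not cache.access(tag):
                misses += 1
    return misses

print(dram_reads(ways=8, n_addresses=8, rounds=1000))   # 8: only cold misses
print(dram_reads(ways=8, n_addresses=9, rounds=1000))   # 9000: every access misses
```

The other route mentioned in this thread, explicitly invalidating lines between reads, corresponds to the x86 CLFLUSH instruction, which is what Google's published rowhammer test program uses.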
  
  --
  Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
Wow! That explains it. by Anonymous Coward · 2015-03-10 09:28 · Score: 0

Years ago, I was doing my undergrad project writing a kernel in 68K assembler (years ago, remember). I was at the point where my kernel tick driver was working but I didn't have any process support yet. I was getting strange reboots after an exact amount of time (2min or so); 100% reproducible on my machine. I eventually discovered the issue went away if the tick handler just did some do-nothing XOR ops to some memory locations before returning. I knew it had to be something to do with the kernel running such a small amount of code in such a highly-recurring fashion -- perhaps this was what was going on. I didn't have a scope handy or anything to determine if it was some sort of bus glitch but I always suspected...
Once I had processes actually running in the kernel, I was able to remove those XOR ops.
Just tried it by gweihir · 2015-03-10 09:51 · Score: 1

I just tried this on my older AMD Phenom(tm) II X4 965 server with 32GB of non-ECC DDR3-1333 (I think) Kingston memory. Unfortunately, the datasheet does not list the refresh cycle length. No results in 200 test cycles. I did not see any description of how long the test was run for the table in the referenced article, but 200 cycles take about 15 minutes on my machine.
This makes me wonder whether this problem may be more common in laptops, where refresh cycles may be set to longer intervals to save power. Refresh is a significant source of power consumption for DRAM, and it stays at full power even when the CPU sleeps (otherwise the memory loses its contents). At the same time, the longer the refresh cycles, the weaker the bits in the cells get before refresh, and hence the larger the risk of rowhammer succeeding. Maybe some laptop or memory module manufacturers have gone into unsafe territory here? On a desktop system, refreshing faster does not matter: the power consumed is small and the loss in performance minuscule. Hence more aggressive refresh actually increases system stability. But on a laptop things are different, and even saving 1W on refresh can make quite a difference.
Does anybody know of a tool to manually change DRAM refresh intervals on Linux/x86? The only options I know of would be manipulating the SPD EEPROMs or going into the BIOS and changing them there, if the BIOS setup supports it.

--
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.