Many DDR3 Modules Vulnerable To Bit Rot By a Simple Program

Posted by Soulskill on Wednesday December 24, 2014 @02:05AM from the flipping-bits-for-fun-and-profit dept.

New submitter Pelam writes: Researchers from Carnegie Mellon and Intel report that a large percentage of tested regular DDR3 modules flip bits in adjacent rows (PDF) when a voltage in a certain control line is forced to fluctuate. The program that triggers this is dead simple: just two memory reads with a special relative offset and some cache control instructions in a tight loop. The researchers don't delve deeply into applications of this, but hint at possible security exploits. For example, a rather theoretical attack on a JVM sandbox using random bit flips (PDF) has been demonstrated before.

Applications include... crashing computers. by Anonymous Coward · 2014-12-24 02:11 · Score: 1

I don't know if there are hundreds or thousands or hundreds of thousands of low level 'bugs' like this related to simple subsystems abused in specific ways.. but there are plenty.
Many DDR3 modules? by ArcadeMan · 2014-12-24 02:12 · Score: 3, Insightful

This is all very interesting but totally pointless! Which modules? Tell us the brands, model names, manufacturer numbers?

--
Get free satoshi (Bitcoin) and Dogecoins
1. Re:Many DDR3 modules? by DigiShaman · 2014-12-24 02:21 · Score: 4, Insightful
  
  FTFP. "We induce errors in most DRAM modules (110 out of 129) from three major DRAM manufacturers."
  Short version: leakage current from adjacent gates can nudge others to bit-flip. I don't think this is a manufacturing problem so much as a fundamental EE design oversight. So yeah, defective by design (unintentionally)!!
  
  --
  Life is not for the lazy.
2. Re:Many DDR3 modules? by ArcadeMan · 2014-12-24 02:29 · Score: 1
  
  It also means that 19 out of 129 DRAM modules are not affected by this problem, hence my question.
  
  --
  Get free satoshi (Bitcoin) and Dogecoins
3. Re:Many DDR3 modules? by Rei · 2014-12-24 02:32 · Score: 5, Informative
  
  If you're wanting to narrow it down, you won't like this line from the paper:
  
  In particular, all modules manufactured in the past two years (2012 and 2013) were vulnerable,
  It's pretty clever, and something I always wondered would be possible. They're exploiting the fact that DRAM rows need to be read every so often to refresh them because they leak charge, and eventually would fall below the noise threshold and be unreadable. Their exploit works by running code that - by heavily, cyclically reading rows - makes adjacent rows leak faster than expected, leading to them falling below the noise threshold before they get refreshed.
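A rough way to picture the mechanism described above is a toy charge model. This is only a sketch: the leak rate, the per-activation disturbance, and the sense threshold are all invented illustration values, not figures from the paper or from any real DRAM part. Only the 64 ms refresh window is a standard DDR3 number.

```python
# Toy model only: leak_per_ms, disturb_per_activation and threshold are
# hypothetical values chosen for illustration, not measurements.
REFRESH_WINDOW_MS = 64.0  # standard DDR3 refresh window

def charge_at_refresh(neighbor_activations,
                      leak_per_ms=0.004,
                      disturb_per_activation=2e-6):
    """Fraction of charge left in a cell just before its next refresh."""
    natural_loss = leak_per_ms * REFRESH_WINDOW_MS
    disturbance_loss = disturb_per_activation * neighbor_activations
    return max(0.0, 1.0 - natural_loss - disturbance_loss)

def bit_flips(neighbor_activations, threshold=0.5):
    """True if the cell drops below the (hypothetical) sense threshold."""
    return charge_at_refresh(neighbor_activations) < threshold

# Natural leakage alone is survivable; heavy activity next door is not.
for n in (0, 50_000, 139_000, 300_000):
    print(n, round(charge_at_refresh(n), 3), bit_flips(n))
```

In this model the cell survives a quiet refresh window with margin to spare, and only the extra loss from repeated activations of the neighboring row pushes it under the threshold before the refresh arrives.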
  
  --
  I am a proud traitor to my species in alliance with my mother the Earth in opposition to those who would destroy her.
4. Re:Many DDR3 modules? by ArcadeMan · 2014-12-24 02:36 · Score: 1
  
  That PDF has a lot of details but TL;DR, you were able to condense it into a single paragraph that we can read in a few seconds.
  Thank you.
  
  --
  Get free satoshi (Bitcoin) and Dogecoins
5. Re:Many DDR3 modules? by DigiShaman · 2014-12-24 02:37 · Score: 4, Interesting
  
  True, and commodity chips not to exact spec will introduce disturbance errors. But apparently this has been a known problem with DRAM, with various methods of mitigation during the binning process. It's just that density and tolerances have become so tight that the issue is now exasperated. I wouldn't be surprised at all if those 19 modules also had a few that failed if tested again and again.
  Honestly, general computing devices, from low-end PCs to phones and everything else, are long overdue in employing ECC by default. So you lose some capacity and take a tiny performance hit. BFD if that means your data doesn't become corrupted. The only people that would care are the PC gaming benchmark queens.
  
  --
  Life is not for the lazy.
6. Re:Many DDR3 modules? by WaywardGeek · 2014-12-24 02:50 · Score: 1
  
  It sounds like you know a bit about modern DRAM architecture. Data sheets nowadays are not available to the public, so it's hard to figure out basic things, like how much power is burned in the DRAM in a simple loop. Do you have a simple rule of thumb for modern DRAM power loss? If I understand correctly, static power is minimal, but dynamic power can reach several watts.
  
  --
  Celebrate failure, and then learn from it - Nolan Bushnell
7. Re:Many DDR3 modules? by Luckyo · 2014-12-24 03:02 · Score: 1
  
  The overwhelming majority of "PC gaming benchmark queens" wouldn't give a toss because memory speed hasn't been a bottleneck in gaming in many years.
  People who would care are ordinary users and OEMs who would have to absorb the extra cost. Especially for OEMs, the costs are far from trivial.
8. Re:Many DDR3 modules? by DigiShaman · 2014-12-24 03:20 · Score: 2
  
  In my personal experience, "benchmark queens" in general, be it in automotive performance or computing, are all about the synthetic numbers, with zero basis in practicality (let alone value for cost). If a gamer doesn't give a toss about a particular core subset of general computing (Video, CPU, RAM, and Storage), they're not benchmark queens. I've met plenty online who are. And when queens start debating online over numbers, the flamewars begin.
  
  --
  Life is not for the lazy.
9. Re:Many DDR3 modules? by QQBoss · 2014-12-24 03:31 · Score: 1
  
  It can, but the chances of it staying perfectly readable are very small. And realize that removing RAM from a machine puts it under a very different condition than intentionally accessing the RAM in a pattern which causes faster than normal leakage, so the results aren't mutually exclusive.
10. Re:Many DDR3 modules? by MightyYar · 2014-12-24 04:01 · Score: 1
  
  I'm not sure whether I'm more bothered by "benchmark queens" or people who flame over their subjective opinions. The latter are a lot like "audiophiles", unwilling to believe in blind testing.
  
  --
  W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
11. Re:Many DDR3 modules? by swillden · 2014-12-24 04:05 · Score: 1
  
  But I was assured that DRAM stays readable for minutes after they're removed from the machine?
  http://it.slashdot.org/story/0...
  Not if adjacent rows are being heavily, cyclically read.
  
  --
  Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
12. Re:Many DDR3 modules? by ChrisMaple · 2014-12-24 04:24 · Score: 2
  
  So, other than fixing the DRAM design, the solution is to refresh more frequently. A software fix might be a high priority background program that forces a full refresh at regular intervals (probably a big performance hit). If the CPU does its own DRAM control, there might be a register that affects refresh rate, or perhaps a microcode fix.
  The problem is analog in nature, which suggests that optimized and very clean supply voltages, and very clean and precisely timed control signals might reduce or eliminate the problem.
  In any case, this means that manufacturers need to fix their designs and test them more thoroughly.
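To get a feel for how much more often "more frequently" would have to be, here is a back-of-the-envelope sketch. It assumes a row cycle time (tRC, the minimum gap between two activations in one bank) of about 49 ns, which is an assumed typical DDR3 value and not something stated in this thread. The 139,000-read figure and the 64 ms window are the ones quoted elsewhere in the discussion.

```python
# Assumption: T_RC_NS is a typical DDR3 row cycle time, not a thread value.
T_RC_NS = 49.0
HAMMER_THRESHOLD = 139_000   # reads per refresh window quoted in the thread
DEFAULT_WINDOW_MS = 64.0     # refresh window quoted in the thread

def max_activations(window_ms, t_rc_ns=T_RC_NS):
    """Upper bound on row activations that fit in one refresh window."""
    return int(window_ms * 1e6 / t_rc_ns)

def safe_window_ms(threshold=HAMMER_THRESHOLD, t_rc_ns=T_RC_NS):
    """Longest refresh window in which the threshold cannot be reached."""
    return threshold * t_rc_ns / 1e6

print(max_activations(DEFAULT_WINDOW_MS))      # activations possible in 64 ms
print(round(safe_window_ms(), 2))              # window needed, in ms
print(round(DEFAULT_WINDOW_MS / safe_window_ms(), 1))  # refresh rate multiplier
```

Under these assumptions an attacker can fit well over a million activations into one 64 ms window, so the window would have to shrink to roughly 7 ms, about nine times the normal refresh rate, before the quoted threshold became unreachable. That is consistent with the guess above that a refresh-based fix would carry a noticeable performance cost.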
  
  --
  Contribute to civilization: ari.aynrand.org/donate
13. Re:Many DDR3 modules? by Archtech · 2014-12-24 04:28 · Score: 1
  
  'I'm not sure whether I'm more bothered by "benchmark queens" or people who flame'.
  FTFY. Does anyone ever flame about anything except subjective opinions?
  
  --
  I am sure that there are many other solipsists out there.
14. Re:Many DDR3 modules? by Archtech · 2014-12-24 04:36 · Score: 1
  
  Reminds me of the first time I ever heard this particular discussion: at DEC in about 1983. A colleague who had gone to do quality engineering on VAX/VMS systems asked for statistics on crashes caused by memory errors. All VAX computers had built-in ECC (of course), but the advanced thinkers in engineering were wondering if it would be more cost-effective to do without. Money would be saved, both by the manufacturer and the customer, and systems would run significantly faster (maybe). Surely that would be worth the fairly infrequent crash, which could be recovered from with the help of backups, logs, etc.?
  We all thought the idea was daft - purely on general principle. The reduction in speed due to ECC could be exactly specified, as could the extra cost. But random crashes couldn't - and what if human error caused the backups, logs, etc. to be missing or corrupt? Worse still, what if errors were introduced that didn't cause a crash or any noticeable problem? All sorts of critical systems could go on stacking up subtly wrong data more or less indefinitely.
  To this day I always ask for ECC whenever I buy a new PC - but the only machines I have ever found that had it were Dell workstations.
  
  --
  I am sure that there are many other solipsists out there.
15. Re:Many DDR3 modules? by MightyYar · 2014-12-24 04:49 · Score: 3, Funny
  
  Climate change... [ducks].
  
  --
  W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
16. Re:Many DDR3 modules? by tlhIngan · 2014-12-24 05:02 · Score: 2
  
  Data sheets nowadays are not available to the public
  Datasheets ARE publicly available. However, they're for the actual DRAM ICs themselves, and not of the modules.
  There are only a few DRAM manufacturers out there - Samsung, Hynix, Elpida, Micron are among them.
  Samsung Computing DRAM (they also have Graphics DRAM and others). Some of their newest chips don't have datasheets yet, but that'll be forthcoming. The older ones in production do, however.
  Hynix
  Micron (and Elpida).
  These are all generally available. Since the only real difference between them is a few timing numbers, they're not generally a huge secret - it's all governed by JEDEC standards anyhow.
  Memory modules are just collections of these chips so they can be generalized to what you buy in the store for your PC.
17. Re:Many DDR3 modules? by greg1104 · 2014-12-24 05:29 · Score: 1
  
  Memory speed can technically still be the bottleneck on large memory footprint games like BF4; see the bit-tech review for some numbers on that. The people chasing after PC gaming benchmarks reflexively use the fastest memory around though, and if you do that it's less likely for memory to dictate the speed limits.
18. Re:Many DDR3 modules? by greg1104 · 2014-12-24 05:31 · Score: 2
  
  I'm also bothered by people who put the word audiophiles in scare quotes for no good reason. P.S. Not all audiophiles are opposed to blind testing; some people like expensive audio toys that are objectively better too.
19. Re:Many DDR3 modules? by Guy+From+V · 2014-12-24 06:04 · Score: 1
  
  Audio queen here, you probably mean double blind testing.
20. Re:Many DDR3 modules? by Guy+From+V · 2014-12-24 06:11 · Score: 1
  
  If the module is supercooled quickly after it's removed, it can be minutes before RAM bits start to wipe. Even if they do, RAM bits "erode" in a predictable manner, allowing information to be rebuilt if it has not degraded too much after power-down.
21. Re:Many DDR3 modules? by ttucker · 2014-12-24 08:05 · Score: 1
  
  To this day I always ask for ECC whenever I buy a new PC - but the only machines I have ever found that had it were Dell workstations.
  Always ECC user here as well. With Intel, only Xeon systems come with ECC support in the chipset. You are actually looking for any workstation level computer with a Xeon chip, although Dell is the only outfit with an even semi reasonable price.
22. Re:Many DDR3 modules? by MightyYar · 2014-12-24 10:46 · Score: 1
  
  Even blind testing would be an improvement.
  
  --
  W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
23. Re:Many DDR3 modules? by MightyYar · 2014-12-24 10:49 · Score: 1
  
  They aren't scare quotes - they are there to differentiate people who think they can hear things that they really can't from people who truly chase better sound. If I hear anything about oxygen in your speaker wire, you'll get the quotes.
  
  --
  W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
24. Re:Many DDR3 modules? by oobayly · 2014-12-24 11:03 · Score: 1
  
  My understanding was that oxygen-free copper is supposed to be more fatigue tolerant so that it gives better plug-unplug endurance, not better sound.
25. Re:Many DDR3 modules? by MightyYar · 2014-12-24 11:36 · Score: 1
  
  I've seen nonsense about inductance and capacitance. And then it'll be stranded. Oy.
  Most people are using it to make a permanent connection in their homes with stranded wire... so endurance, fatigue, corrosion are all non-issues. I would wager a very high sum of money that double-blind testing would result in no perceptible difference.
  
  --
  W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
26. Re:Many DDR3 modules? by oobayly · 2014-12-24 12:23 · Score: 1
  
  Oops, had meant to say that in my comment - that very few people will need the "endurance" - I completely agree. I have to admit that I got suckered into buying zero-oxygen-copper cables (it sounds good, doesn't it), until I decided to check what it actually meant - zilch!
27. Re:Many DDR3 modules? by Luckyo · 2014-12-24 14:33 · Score: 1
  
  That is indeed the problem with many technologies. "If they were standard, their costs would be much cheaper".
  At which point the question becomes that of "is this functionality actually needed as a standard in most use scenarios?"
  For ECC memory, this question was asked ever since the early 80s and the answer is still "no".
28. Re:Many DDR3 modules? by Luckyo · 2014-12-24 14:39 · Score: 1
  
  This used to be the problem back in the day before DDR3, true. After DDR3 got to around 1333-1600MHz, the problem was effectively eliminated in favour of latency being the only reasonable bottleneck. And that actually gets worse rather than better when you increase the frequency.
  The tests you link show exactly that - no noticeable difference. They're looking at 1-2% difference between 1333 modules and 2400 modules. Because that is not the bottleneck. System is bottlenecked elsewhere, most likely on GPU. If this was a bottleneck, you would see improvements that would match the differential in RAM speed, as happens with most GPU tests for example.
29. Re:Many DDR3 modules? by MightyYar · 2014-12-24 16:10 · Score: 2
  
  ALL of that audiophile stuff sounds good (pun intended).
  
  --
  W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
30. Re:Many DDR3 modules? by sjames · 2014-12-24 19:17 · Score: 1
  
  That crazy theory again!?!
  Sir, I assure you that ducks absolutely do not cause climate change!
  But note that pirates can slow it.
31. Re:Many DDR3 modules? by sjames · 2014-12-24 19:21 · Score: 1
  
  Plenty have it on the server side. Just use a server board in your desktop.
32. Re:Many DDR3 modules? by sjames · 2014-12-24 19:43 · Score: 1
  
  In those cases, there tend to be a LOT of errors. The risk is that enough will read correctly to leak valuable information like passwords. Also, in those cases the memory is not active.
33. Re:Many DDR3 modules? by greg1104 · 2014-12-25 00:55 · Score: 1
  
  That's not how it works. The way you spot a bottleneck in performance work is that if you change anything else, there is zero impact on the resulting system speed. Conversely, if you alter something and the system really does get faster, you must have just hit one of the bottlenecks.
  Given that, the way high detail performance goes from 83 to 86 FPS as RAM speed increases means that RAM speed must have been a bottleneck. If speed had been strictly limited by the video card instead, speeding up the memory would have given zero total system speed increase. It's not hard to get RAM fast enough to no longer be the bottleneck, but you can't just throw junk memory at this game without that turning into a limiter.
34. Re:Many DDR3 modules? by greg1104 · 2014-12-25 01:35 · Score: 1
  
  Inductance and capacitance impact total impedance, and it is possible to find bad combinations where that turns into an easily measurable problem with the cable. See high cap wire section of "Speaker Wire: A History" for how that comes out on a scope. It's very easy to find cases where the wire doesn't matter too. One of the funny things about objective audio testing is that people usually find what they set out to, because it's so easy to set up tests to give the results you want. That doesn't prove there are no edge cases where those things do matter. Audible amplifier feedback and oscillation is a real thing.
  Serious corrosion does happen in old audio cables, with them turning a lovely puke green eventually. I have some systems going back to the 80's here, and that copper is totally grody, fer sure! Preventing that is mainly about the jacket and termination though. Just using high quality copper doesn't make it go away.
  That coat hanger wire vs. Monster Cable test used Martin Logan SL3 speakers, which was such a weird choice I have to throw the whole thing out as a waste of time. Those are electrostatic panels with a traditional woofer. Electrostatics have very different electrical properties than regular speakers. You can't really extrapolate from that exotic test to the rest of the market, where people are mainly using traditional cone and dome speakers. Car analogy: you can observe that changing gas for a Tesla electric vehicle doesn't impact its performance, but that doesn't prove gas quality is irrelevant to regular engines.
35. Re:Many DDR3 modules? by greg1104 · 2014-12-25 01:38 · Score: 1
  
  Oxygen-free copper is very much a real thing, and it does matter for some applications. The only part that's hard to support is whether those differences are audible in home audio. All other things being equal between two cables, it shouldn't matter. (All other things are usually not equal.)
36. Re:Many DDR3 modules? by MightyYar · 2014-12-25 04:33 · Score: 1
  
  Perhaps you can measure things on a scope, but that doesn't mean the difference is perceptible. It's not my money, so I don't really care what audiophiles do with it - but they also seem to expect me to be impressed, which I am not. I politely nod but honestly think they are just burning their money. I can't take someone seriously who thinks that oxygen makes a perceptible difference in audio, and then think nothing of using stranded wire vs. solid. Even with an oscilloscope, the stranded vs. solid will be a much bigger difference than the 97% vs 99.99% copper. And by "much bigger", I mean "still not perceptible".
  I know a guy who does installs. He tells many stories, but I like this one: He ran out of super-expensive speaker wire specified by one customer. He temporarily finished the job with landscaping wire, of all things. It was the proper gauge and everything, but cheap stuff that he uses for outdoor installs (which, unbelievably, people insist on having fancy cable for! Shut those birds up, would you?). He came back later (when the specified wire came in) and told the customer what he needed to do. The guy, completely oblivious to the "problem", was horrified. Just horrified! He had been quite happy with the new system, but now noted that certain things do indeed sound wrong... the brain is an amazing machine.
  
  --
  W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
37. Re:Many DDR3 modules? by lsatenstein · 2014-12-25 10:11 · Score: 1
  
  FTFP. "We induce errors in most DRAM modules (110 out of 129) from three major DRAM manufacturers."
  Short version: leakage current from adjacent gates can nudge others to bit-flip. I don't think this is a manufacturing problem so much as a fundamental EE design oversight. So yeah, defective by design (unintentionally)!!
  So, as DDR3 gets denser and the space between the cells decreases, we should be standardizing on ECC memory for all desktops and servers. The second thought I have is "What minimal CPU clock speed would enable this activity to occur with standard hardware?" Is this problem likely to occur with off-the-shelf motherboards and CPUs?
  
  --
  Leslie Satenstein Montreal Quebec Canada
38. Re:Many DDR3 modules? by Bengie · 2014-12-25 11:35 · Score: 1
  
  Memory speed can technically still be the bottleneck
  And taking a piss before you head to work can save you gas money. Your link shows an 80% increase in memory speed giving a 1.7% increase in performance. Congrats, you just doubled your memory's power consumption.
39. Re:Many DDR3 modules? by greg1104 · 2014-12-25 12:55 · Score: 1
  
  Some ludicrously overpriced cable aimed at the mass market is stranded, with Monster being the biggest offender by volume. But most of the really expensive speaker cable is solid core instead of stranded, with the core size limited only by how flexible the cable needs to be. The stuff I like uses a number of 14 AWG wires that total to match 12 AWG. I've tried using twisted pairs of 12 AWG copper instead, just basic power cable from Home Depot, but I can barely route the stuff. I like the cables (and amplifiers) I use to be mechanically sound and measure perfectly, priority #1, so I never have to include them in troubleshooting why something sounds bad. Multi-component systems are hard to optimize, and making individual parts as perfect as is practical lowers the complexity. That doesn't lead me to $1000 speaker cables, but I'm not getting $5 ones either.
  When audio changes are big enough to show on a scope, normally the only reason someone bothered to isolate them out is because they were audible. Some of what audiophiles complain about here is real albeit misunderstood. Let's say you start with low-feedback amplifiers with a low damping factor, which some people think are good things. That gives you an amp that's more prone to oscillation than it has to be. If you then combine that with a high capacitance cable, next thing you know there's a perfect storm of bad design that really does sound different. What's supposed to be ultrasonic junk moves into audible. And some idiots will think that because it's different, it's better, so next thing you know every part of people's system is tweaked for more of that junk.
  There is a side of the market that demands the best engineered products for the price point at every step of the chain too though. I read audio reviews starting with the bench plots.
40. Re:Many DDR3 modules? by Luckyo · 2014-12-25 17:59 · Score: 1
  
  Actually that is how it works. The concept of a bottleneck refers to the aspect of a pipe+pool system where the thickness of the pipe is the limiting factor and increasing the width of the pipe offers a comparable increase in flow throughput.
  When you double pipe's thickness and get 1-2% more flow, it means that your system's bottleneck is elsewhere.
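One hedged way to put a number on this disagreement is Amdahl's law. The sketch below assumes frame time splits cleanly into a part that scales inversely with memory data rate and a part that does not, which is a simplification. The inputs are the figures quoted in this thread: 83 to 86 FPS when going from 1333 to 2400 modules.

```python
def memory_bound_fraction(fps_before, fps_after, mem_before, mem_after):
    """Solve Amdahl's law for p, the share of frame time tied to memory speed.

    overall_speedup = 1 / ((1 - p) + p / local_speedup)
    """
    overall = fps_after / fps_before
    local = mem_after / mem_before
    return (1 - 1 / overall) / (1 - 1 / local)

p = memory_bound_fraction(83, 86, 1333, 2400)
best_case = 1 / (1 - p)   # speedup if memory were infinitely fast

print(f"memory-bound share of frame time: {p:.1%}")
print(f"ceiling on speedup from faster RAM: {best_case:.3f}x")
```

Under that model roughly 8% of the frame time depends on memory speed. That is more than zero, so memory is a limiter in the strict sense, but it also caps the possible gain from faster RAM at under 10%, which fits the observation that the dominant limit lies elsewhere.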
41. Re:Many DDR3 modules? by strikethree · 2014-12-25 23:09 · Score: 1
  
  the issue is now exasperated.
  Not being a pedant, just trying to be helpful: The word that you are looking for is exacerbated.
  
  --
  "Someone needs to talk to the tree of liberty about its ghoulish drinking problem." by ohnocitizen
42. Re:Many DDR3 modules? by Agripa · 2014-12-27 06:56 · Score: 1
  
  This used to be the problem back in the day before DDR3, true. After DDR3 got to around 1333-1600MHz, the problem was effectively eliminated in favour of latency being the only reasonable bottleneck. And that actually gets worse rather than better when you increase the frequency.
  The latency at higher clock frequency does not increase in the way you suggest. It only appears that way because latency is measured in clock cycles so when the clock cycle is halved, twice as many are needed for a given duration.
43. Re:Many DDR3 modules? by Luckyo · 2014-12-27 20:07 · Score: 1
  
  Where did I post anything to suggest what you're suggesting?
  It's well known that increasing RAM frequency impacts latency in net negative way. Your suggestion implies that impact is neutral, when it's rarely so unless you buy much more expensive RAM specifically picked and binned for those frequencies and latencies. Typical RAM sold incurs significant net negative impact on latency as frequency increases. Alternative is lower reliability.
  Anyone who did any overclocking and worked with RAM memory doing it should be well aware of this issue.
44. Re:Many DDR3 modules? by Agripa · 2014-12-28 02:21 · Score: 1
  
  Where did I post anything to suggest what you're suggesting?
  And that [latency] actually gets worse rather than better when you increase the frequency
  Increasing the RAM frequency has little or no effect on latency; it only changes the unit of measurement. Latency as measured in clock cycles goes up but latency measured in nanoseconds stays roughly the same (actually it generally gets better) and it is the latter which matters as far as the processor is concerned.
  The first word access time shown in this table is the most relevant:
  http://en.wikipedia.org/wiki/C...
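The cycles-versus-nanoseconds point can be checked with a few lines of arithmetic. The speed and CL pairs below are common DDR3 timing combinations picked for illustration. They are not taken from this thread, and the prices of modules that actually reach the tighter timings are a separate question.

```python
def cas_latency_ns(data_rate_mts, cl_cycles):
    """CAS latency in nanoseconds.

    DDR moves data twice per clock, so the clock in MHz is half the
    advertised data rate in MT/s.
    """
    clock_mhz = data_rate_mts / 2
    return cl_cycles * 1000 / clock_mhz

# (data rate in MT/s, CAS latency in cycles): illustrative pairs only
for rate, cl in [(1066, 7), (1333, 9), (1600, 11), (1600, 9)]:
    print(f"DDR3-{rate} CL{cl}: {cas_latency_ns(rate, cl):.2f} ns")
```

The cycle count climbs from 7 to 11 across the first three entries while the latency in nanoseconds stays in the same narrow band. Only the last entry, a faster module with tighter timings, is meaningfully quicker in absolute terms.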
45. Re:Many DDR3 modules? by Luckyo · 2014-12-28 08:14 · Score: 1
  
  Took me a while to figure out what you're talking about. That's some exotic trolling. Well done. Shame no one cares about it this far down the chain.
  Your case was specifically addressed long ago when I mentioned the costs. You've linked to a standards table which addresses what kinds of memory are made. It's correct to state that in those standards, CAS latency generally gets net better as frequency goes up. What you are trolling on is costs, a subject mentioned at the very beginning.
good news for ECC memory makers by funkymonkjay · 2014-12-24 02:21 · Score: 1

as for me, i'll wait for some real world examples of this possible exploit before i switch to ECC memory, which would mean a new MB on top of the more expensive memory.
1. Re:good news for ECC memory makers by Rei · 2014-12-24 02:35 · Score: 3, Informative
  
  According to the paper, ECC only reduces but does not eliminate the problem (section 6.3). Multiple bits can be corrupted at once.
  
  --
  I am a proud traitor to my species in alliance with my mother the Earth in opposition to those who would destroy her.
2. Re:good news for ECC memory makers by DigiShaman · 2014-12-24 02:48 · Score: 2
  
  Ouch! Seriously bad. Worse than the Pentium FPU bug (and that's bad). What good is a computer if you can't rely on the data being committed back to disk because of corruption mid-flight in RAM?! At least with the FPU bug, it was only FPU. But here we're talking about an industry wide issue where no operation can guarantee that data isn't corrupted on its way back to disk. By the time bit-rot sets in, you may have to dive into your grandfather-father-son backup archive. And that's assuming such a backup scheme is being used by those who are affected. Shit, that's assuming people are even backing up their data in the first place!!
  
  --
  Life is not for the lazy.
3. Re:good news for ECC memory makers by Anonymous Coward · 2014-12-24 03:04 · Score: 1
  
  Welcome to the Digital Dark Age
4. Re:good news for ECC memory makers by sshir · 2014-12-24 03:19 · Score: 4, Insightful
  
  At least with ECC you'll get _some_ feedback (it's random so it will pop from time to time) indicating that something fishy is going on. With regular ram all corruptions are silent so you'll get random crashes that will drive you crazy...
5. Re:good news for ECC memory makers by ericloewe · 2014-12-24 03:19 · Score: 2
  
  Difference being that the system is immediately halted if an uncorrectable error is discovered.
6. Re:good news for ECC memory makers by wolrahnaes · 2014-12-24 04:16 · Score: 2
  
  ECC does not mitigate it, but it will detect the problem where non-ECC memory will happily keep on operating with the corrupted data.
  For the standard car analogy, consider tire pressure monitoring systems. They won't stop you from getting a flat, but they'll let you know you have a slow leak where you might otherwise keep driving until it's bad enough that you notice otherwise. By that time the damage is done and you probably need a new tire.
  
  --
  I used to get high on life, but I developed a tolerance. Now I need something stronger.
7. Re:good news for ECC memory makers by 0123456 · 2014-12-24 04:24 · Score: 1
  
  Ouch! Seriously bad. Worse than the Pentium FPU bug (and that's bad). What good is a computer if you can't rely on the data being committed back to disk because of corruption mid-flight in RAM?!
  It apparently only happens if you read the same bytes from RAM 139,000 times in 64 milliseconds. If your program is doing that, you probably have a lot more to worry about than disk corruption.
  If this was actually happening in the real world, computers would probably be crashing every few minutes.
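For scale, the quoted figures work out as follows. This is plain arithmetic on the two numbers above. As the story summary notes, each read also has to bypass the cache, so the time budget applies to real DRAM accesses and not to cache hits.

```python
READS_PER_WINDOW = 139_000   # figure quoted above
WINDOW_SECONDS = 0.064       # 64 milliseconds

def reads_per_second(reads=READS_PER_WINDOW, window_s=WINDOW_SECONDS):
    """Sustained read rate needed to reach the quoted count in one window."""
    return reads / window_s

def nanoseconds_per_read(reads=READS_PER_WINDOW, window_s=WINDOW_SECONDS):
    """Average time available for each read."""
    return window_s / reads * 1e9

print(f"{reads_per_second():,.0f} reads per second")
print(f"{nanoseconds_per_read():.1f} ns per read")
```

That comes to about 2.2 million uncached reads per second, or one every 460 ns or so. Ordinary software rarely does this by accident because repeated reads of one address are served from the CPU cache.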
8. Re:good news for ECC memory makers by greg1104 · 2014-12-24 05:46 · Score: 1
  
  The test numbers in section 6.3 show that ECC mitigates most of the errors, as the bulk of them are single bit ones. And if you're on a system that's prone to this problem, the odds are you will see a warning about that ECC correction kicking in long before you'll hit one of the uncorrectable multi-bit errors.
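The single-bit versus multi-bit distinction is easy to see in a miniature version of the scheme. The sketch below is an extended Hamming (8,4) code, a textbook SECDED code used here purely as an illustration. Real ECC DIMMs protect 64 data bits with 8 check bits, and nothing here models a specific memory controller.

```python
from functools import reduce
from operator import xor

def encode(data_bits):
    """Encode 4 data bits into an 8-bit extended Hamming (SECDED) word."""
    code = [0] * 8                         # index 0: overall parity, 1..7: Hamming
    code[3], code[5], code[6], code[7] = data_bits
    code[1] = code[3] ^ code[5] ^ code[7]  # covers positions 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]  # covers positions 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]  # covers positions 4, 5, 6, 7
    code[0] = reduce(xor, code[1:])        # makes the whole word even parity
    return code

def decode(word):
    """Return (status, data_bits): 'ok', 'corrected' or 'uncorrectable'."""
    word = list(word)
    # XOR of the positions holding a 1 is zero for any valid codeword.
    syndrome = reduce(xor, (pos for pos in range(1, 8) if word[pos]), 0)
    parity_error = reduce(xor, word) == 1
    if syndrome == 0 and not parity_error:
        status = "ok"
    elif parity_error:
        word[syndrome] ^= 1                # syndrome 0 is the parity bit itself
        status = "corrected"
    else:
        status = "uncorrectable"           # even number of flips detected
    return status, [word[3], word[5], word[6], word[7]]

data = [1, 0, 1, 1]
stored = encode(data)
one_flip = stored.copy()
one_flip[5] ^= 1
two_flips = one_flip.copy()
two_flips[6] ^= 1
print(decode(stored)[0], decode(one_flip)[0], decode(two_flips)[0])
```

One flipped bit is repaired silently, two flipped bits in the same word are reported but cannot be repaired, and three or more can be mistaken for a single error. That matches the point made in this subthread: ECC handles the common case and raises a warning in the next one, but it is not a complete defense if several bits in one word flip at once.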
9. Re:good news for ECC memory makers by Dragonslicer · 2014-12-24 06:19 · Score: 2
  
  If this was actually happening in the real world, computers would probably be crashing every few minutes.
  You mean attackers have been exploiting this ever since Windows 95?
10. Re:good news for ECC memory makers by complete+loony · 2014-12-24 11:52 · Score: 1
  
  Worse problem; VM server farms. If you can run arbitrary code, you might be able to flip bits in the hypervisor or another VM.
  
  --
  09F91102 no, 455FE104 nope, F190A1E8 uh-uh, 7A5F8A09 that's not it, C87294CE no. Ah! 452F6E403CDF10714E41DFAA257D313F.
Re:So, what comes next? by ThePhilips · 2014-12-24 02:29 · Score: 1

Wear Leveling?
Leakage Leveling?
P.S. Question is whether a workaround is possible with the CPU microcode.

--
All hope abandon ye who enter here.
Re:Theoretical vs demonstrated by beelsebob · 2014-12-24 02:42 · Score: 1

I would guess that it's theoretical because it involves things like knowing exactly where the JVM is positioned in physical memory, and how its pages are laid out. That, and that the demonstration involved knowing all of these things before you started.
Re:So, what comes next? by FirstOne · 2014-12-24 02:46 · Score: 1

ECC is dismissed in the article, but the article ignores that ECC systems also have a scrubbing capability
Unfortunately, ASUS is the only manufacturer that consistently includes ECC support in their AMD based motherboard line.
Malicious code can cause computers to crash by rossdee · 2014-12-24 02:47 · Score: 2

Of course if you can get the target computer to run certain code, you can completely wipe all the RAM, but where's the fun in that huh..
1. Re:Malicious code can cause computers to crash by MightyYar · 2014-12-24 04:05 · Score: 2
  
  This gives you a way to affect RAM outside of a sandbox.
  
  --
  W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
2. Re:Malicious code can cause computers to crash by 0123456 · 2014-12-24 04:10 · Score: 1
  
  This gives you a way to affect RAM outside of a sandbox.
  Only if the sandbox lets you repeatedly access memory and flush the cache between accesses, and you happen to know where your data is in physical RAM.
3. Re:Malicious code can cause computers to crash by MightyYar · 2014-12-24 04:53 · Score: 1
  
  Ah, yes, well I should have said "possibly" :)
  
  --
  W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
4. Re:Malicious code can cause computers to crash by Macman408 · 2014-12-24 10:23 · Score: 1
  
  It depends a bit on the physical structure of the RAM, but for the most part, the errors fall on logically adjacent rows (i.e. nearby memory addresses) in the RAM. So most of the time, you'll only affect other RAM inside your sandbox, and if you affect something outside the sandbox, it won't be far outside.
  I remember encountering a similar failure when designing a system; the particular memory controller and the particular DRAM module we were using both met all applicable specs, but when used together in a particular manner, they would fail miserably. The specific test was to alternate writing all zeros and all ones at different addresses. The RAM controller had an oddity where it would enable the drivers for the RAM data pins very briefly before the data was known. For that particular data pattern, that meant that it would drive all ones on the data pins to the RAM for less than a nanosecond, before starting to drive all zeros (or the reverse). There's nothing really against that in the spec; the data was all correct for all the relevant setup and hold time requirements relative to the control signals. However, it caused a lot of noise on the ground plane of the DRAM module; we measured as much as 0.75V or so. (That's measuring the ground voltage on one side of the SO-DIMM to the ground voltage on the other side; it's shorted by a mostly-solid layer of copper, but that just wasn't enough to carry all the current with this particular access pattern.) So from the point of view of the RAM chips, it's a little like having your 2.5V supply voltage suddenly drop to 1.75V. It messes up all the reference voltages, so a 1 might be interpreted as a 0, or vice versa. The memory controller manufacturer refused to do anything about it (and it would've taken them many months to redesign and respin the chip anyway), but the RAM module manufacturer was friendly to us, and they beefed up the ground plane so that the noise level was much more manageable.
  In any case, I'm sure there are thousands of faults like this that are just waiting to be found and exercised in any given system. No modern computer is 100% tested; they're far too complicated. There will always be some weird sequence of things that could happen and trigger some failure - but hopefully that sequence is so odd, it'll never happen.
Does the cache control commands require root acces by TheSunborn · 2014-12-24 03:14 · Score: 2

Do the cache control commands require root access on Windows or Linux?
Wow, a Forgetful Christmas Bug by Anonymous Coward · 2014-12-24 03:21 · Score: 1

The authors did a good job of covering the issue.
Also, the paper is a good primer on DRAM stuff in general.
Unfortunately, this Christmas present violates the Engineer's first rule.
Try to stay out of the news, because when you are in the news, it's usually not a good thing.
The failure mechanism:
There is a bug in most DDR3 chips, especially those built after 2010.
If you do too many read cycles in too short a time to the same row, some bits in an adjacent row may automagically change.
Kind of a cumulative, adjacent cell disturb mechanism.
Existing programs may do this accidentally, but it is unlikely because the cache usually lowers the number of read cycles to a safe number.
This can easily be done with a strange program using cache flushes, which an ordinary x86 user process can do if it wishes.
Mitigations on existing memory controllers:
ECC likely does not help because more bits are likely to be disturbed than most ECC can handle.
Keep strange programs off your system.
Changing the refresh interval from 64 ms to 8 ms seems to eliminate the issue, with perhaps a 35% performance hit.
The OS might be able to remap the memory so that only every other physical row is used, with a 50% decrease of memory capacity.
At least it's a 100% increase in reliable memory.
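To put rough numbers on the refresh mitigation above: the number of times a single row can be activated between two refreshes is bounded by the refresh interval divided by the row cycle time. The sketch below assumes a row cycle time of 50 ns, which is only a ballpark figure for DDR3 and not a value taken from the paper; the function name is made up for illustration.

```python
def max_activations(refresh_interval_ms: float, row_cycle_ns: float = 50.0) -> int:
    """Upper bound on activations of one row within one refresh interval.

    row_cycle_ns is an assumed ballpark value, not a measured one.
    """
    interval_ns = refresh_interval_ms * 1_000_000  # 1 ms = 1,000,000 ns
    return int(interval_ns // row_cycle_ns)


default_window = max_activations(64)  # 1280000 activations at most
short_window = max_activations(8)     # 160000 activations at most

# Refreshing 8x as often gives an attacker 1/8 as many activations
# before the neighboring rows are restored.
print(default_window, short_window, default_window // short_window)
```

Whether the shorter window is short enough depends on how many activations a given module tolerates, which this sketch deliberately does not assume.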
Mitigations on new equipment:
DRAMs that meet their specifications would be nice, but this seems more likely to be a change in the specs.
An increased refresh rate on rows near a lot of activity.
The authors propose a probability-based plan.
Seems like one based on hard accounting might be smarter if you have to change the controller anyway.
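The difference between the two controller-side ideas above can be sketched with a toy model. This is not the authors' design or any real memory controller: `hammer`, `p`, and `threshold` are hypothetical names and values chosen for illustration. The model hammers one row and reports the longest run of activations its neighbors endured without being refreshed.

```python
import random


def hammer(n_activations, policy, rng=None, p=0.001, threshold=1000):
    """Return the longest run of activations without a neighbor refresh.

    policy "probabilistic": refresh neighbors with probability p per activation.
    policy "counter": refresh neighbors every `threshold` activations.
    """
    if rng is None:
        rng = random.Random(0)
    since_refresh = 0  # activations since the neighbors were last refreshed
    worst = 0          # longest unrefreshed run seen so far
    count = 0          # activation counter, used by the "counter" policy only
    for _ in range(n_activations):
        since_refresh += 1
        worst = max(worst, since_refresh)
        if policy == "probabilistic":
            refresh = rng.random() < p
        elif policy == "counter":
            count += 1
            refresh = count >= threshold
        else:
            raise ValueError(f"unknown policy: {policy}")
        if refresh:
            since_refresh = 0
            count = 0
    return worst


print("counter:      ", hammer(100_000, "counter"))
print("probabilistic:", hammer(100_000, "probabilistic"))
```

The counter policy gives a hard bound equal to `threshold` but needs state for every row being tracked. The probabilistic policy needs no per-row state, and its worst case is only bounded statistically.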
Consequences:
This mechanism produces random results.
It seems there are likely more fruitful ways to break into a system.
The ease of implementation and wide applicability still make it an (ah-hem) interesting bug to say the least.
1. Re:Wow, a Forgetful Christmas Bug by skids · 2014-12-24 03:43 · Score: 1
  
  Thank you. Very helpful of you.
  
  --
  Someone had to do it.
2. Re:Wow, a Forgetful Christmas Bug by ChipMonk · 2014-12-24 04:35 · Score: 1
  
  It's too bad you posted this as AC. You could have gotten some good karma from the mod points.
Re:Does the cache control commands require root ac by PhrostyMcByte · 2014-12-24 03:31 · Score: 5, Informative

No. These are standard instructions that many apps require to function correctly when using multiple threads. Even if you aren't using them directly, at least some of the APIs you use most certainly are.
This makes me think of mid-90s Macs by RogueWarrior65 · 2014-12-24 03:40 · Score: 1

Way back when RAM was stupid expensive, one way to reduce cost was to use so-called composite RAM. On high-end Macs back in the early-mid 1990s, that could cause the machine to not boot but instead play the first four notes of the Twilight Zone theme song.
Not theoretical. It's hogwash. by Anonymous Coward · 2014-12-24 03:42 · Score: 5, Funny

This is ridiculous. Realistically, when have you ever run into a situation where stib teg ylirartibra deppilf?
Using Non-ECC Ram is Unacceptable by BrendaEM · 2014-12-24 04:16 · Score: 1, Insightful

Unless you are making a Speak-and-Spell, it's foolish not to use non-ECC RAM. I would rather pay an additional ninth as much and have some peace of mind that the RAM will at least keep from flipping a bit from cosmic rays, which happens about once a week.
I take that back; put it in the Speak-and-Spell, too.

--
https://www.youtube.com/c/BrendaEM
1. Re:Using Non-ECC Ram is Unacceptable by twistedcubic · 2014-12-24 07:17 · Score: 1
  
  This is true. However, getting a laptop with ECC RAM straight from the manufacturer is never an option, and impossible when RAM is soldered onto the motherboard. I think if Apple started using ECC RAM, and advertised it, others might follow suit (like with the "retina" displays).
2. Re:Using Non-ECC Ram is Unacceptable by thegarbz · 2014-12-24 10:53 · Score: 1
  
  How foolish and for what specific workload? I have a gaming rig where I sometimes edit photos and do 3d design and some light coding. In the past 10 years I've never seen any visible data corruption and not had an inexplicable crash.
  So tell me again why I should spend the money? Your once-a-week problem sounds more theoretical than practical.
3. Re:Using Non-ECC Ram is Unacceptable by Archtech · 2014-12-25 23:47 · Score: 1
  
  Why was my comment moderated "Troll" when I merely pointed out that the parent had unintentionally inserted an extra negative in his statement? The drift of his comment was surely that ECC RAM is better. Yet he wrote "it's foolish not to use non-ECC RAM".
  It's sad that moderators don't take the trouble to read what is in front of them. Or, worse still, that at least one moderator routinely mods my comments "Troll" without reading them.
  
  --
  I am sure that there are many other solipsists out there.
memtest86 includes a test for this by Anonymous Coward · 2014-12-24 04:17 · Score: 1

This is a sort of already-known 'weakness'. Recent MemTest86 versions include a 'hammer test' for the purpose of testing this case; see http://www.passmark.com/forum/showthread.php?4836-MemTest86-v6-0-Beta
Re:Does the cache control commands require root ac by 0123456 · 2014-12-24 04:17 · Score: 1

No. These are standard instructions that many apps require to function correctly when using multiple threads.
Can you explain when you'd need to flush the cache when using multiple threads? You'd have to flush the cache back to RAM (isn't that a privileged instruction?), invalidate it, then read the data back from RAM. That's surely insanely slow compared to just using the CPU's internal cache coherency mechanisms?
Known issue by Anonymous Coward · 2014-12-24 05:10 · Score: 5, Informative

This has been known for some time. It's been referred to as "Row Hammer" and has been discussed at length by Intel and DRAM manufacturers.
https://www.google.com/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#safe=off&q=intel%20row%20hammer
I've seen it cause multi-bit errors in ECC systems.
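Why multi-bit errors matter for ECC can be shown with a toy single-error-correcting, double-error-detecting (SECDED) code. The sketch below uses an 8-bit extended Hamming code over 4 data bits, which is an assumption made for brevity; ECC DIMMs typically protect 64 data bits with 8 check bits, but the failure pattern is the same in kind. All function names are made up for illustration.

```python
def encode(data):
    """Encode 4 data bits into an 8-bit extended Hamming (SECDED) word."""
    code = [0] * 8
    code[3], code[5], code[6], code[7] = data
    code[1] = code[3] ^ code[5] ^ code[7]  # covers positions 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]  # covers positions 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]  # covers positions 4, 5, 6, 7
    code[0] = sum(code[1:]) % 2            # overall parity bit
    return code


def decode(code):
    """Return (status, data); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos
    overall = sum(code) % 2
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        code[syndrome] ^= 1  # syndrome 0 means the parity bit itself flipped
        status = "corrected"
    else:
        return "uncorrectable", None  # even number of flips: detected only
    return status, [code[3], code[5], code[6], code[7]]


def flip(code, *positions):
    """Return a copy of code with the given bit positions inverted."""
    out = list(code)
    for pos in positions:
        out[pos] ^= 1
    return out


original = [1, 0, 1, 1]
word = encode(original)
print(decode(flip(word, 5)))        # one flip: corrected
print(decode(flip(word, 5, 6)))     # two flips: detected, not corrected
print(decode(flip(word, 3, 5, 7)))  # three flips: "corrected" to wrong data
```

One flipped bit is repaired, two are flagged, and three are silently "corrected" into wrong data, which is the case that a disturbance flipping several bits in one word can produce.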
So.... by Festering+Leper · 2014-12-24 06:09 · Score: 1

Liquid nitrogen for your RAM then...?

--
if you want people to think you know what you are talking about, just put ".com" at the end of everything you say.com
Re:not for threaded xode by 0123456 · 2014-12-24 06:24 · Score: 1

That seems more likely, but when I was writing DMA code years ago, we put the buffers in non-cached RAM (and they were only written to from a driver in the kernel). Maybe explicit cache flushes are faster these days.
Re:Why we need coders. by Pelam · 2014-12-24 06:47 · Score: 1

XD
Write-Only Memory by marciot · 2014-12-24 06:49 · Score: 1

This is the reason I recommend that everyone invest in write-only memory for their computers. It is far more secure and hack proof than the alternatives.
Not the first time hammering caused trouble. by Ungrounded+Lightning · 2014-12-24 11:46 · Score: 1

Story I heard about a mid-20th-century IBM mainframe. (I think it was the 360 series.)
Core memory was tight and had cooling issues. The designers examined the instruction set and determined that, given caching and the like, no infinite loop could hammer a particular location more than one cycle in four (25% duty cycle), for which cooling was adequate. So they shipped.
Turns out, though, you could do a VERY LONG FINITE loop that hit a location every other cycle, for 50% duty cycle (not to mention the possibility of hitting a nearby location with some of the remaining cycles). Wasn't too long before a student managed to do this.
And set the core memory on fire.

--
Bantam Dominique roosters crow a four-note song. Once you've heard it as "Happy BIRTHday" you can't NOT hear it that way
Allready known for flash memory by xluap · 2014-12-24 12:35 · Score: 1

Read disturb was already known for flash memory. Read disturb is when a flash cell flips a bit because other cells adjacent to it are repeatedly read.
1. Re:Allready known for flash memory by Agripa · 2014-12-27 07:44 · Score: 1
  
  We used to call this "pattern sensitivity" when applied to RAM.
Wow. Superbad. by drolli · 2014-12-24 15:32 · Score: 2

That's an evil bug. This could even be triggered accidentally by bad programming.
But more important, this allows you to break your VM's memory boundaries without any restriction. If you happen to make an educated guess about the memory layout of the physical machine and the host and guest kernel images loaded, you can try to
a) manipulate the host kernel directly (that would be nearly undetectable)
b) manipulate private keys in other VMs or the host
c) manipulate other VMs memory
d) communicate between VMs
And all of this is independent of any software bug. The only thing that can be done about it would be to disable the feature on the simulated guest processor that allows manipulating the cache arbitrarily (and implicitly limit running guest programs to 1 core!). Alternatively, increase the refresh rate (I remember that the refresh rate could actually be set manually in the 90s).
That being said, I just wonder if it is possible to trigger this bug from a high-level language (e.g. MATLAB) or the JVM, where the operation causing the problem could be used implicitly for some vectorized code or other operations. E.g., can this bug be triggered by the volatile keyword in Java and accessing the memory in the same way?
1. Re:Wow. Superbad. by Pinhedd · 2014-12-25 05:27 · Score: 1
  
  It's not possible to do any of those.
  1. The mechanism that this uses doesn't provide for deterministic results. At worst, rewriting the same row numerous times may result in some of the bits in spatially related rows being corrupted.
  2. Address spaces are highly randomized and virtual to physical translation makes it incredibly difficult to obtain even an educated guess as to the layout.
  This exploit just allows an attacker to possibly corrupt nearby data. It's a troll tool, nothing else.
2. Re:Wow. Superbad. by drolli · 2014-12-27 01:53 · Score: 1
  
  Maybe. Maybe not. Not sure what the effect of second-order page translation would be if you manage to trigger the loading of a module (or the first use of memory in a module) in another VM after your VM has been loaded. If you manage to trigger access to the module's data memory, which normally may be unused, after you allocate ("pad") enough memory, I could imagine that you can actually kill "nearby" data (which under second-order translation would appear physically close to your memory).
  I am not saying that this is a MOV instruction into another VM's memory, but merely stating that with an educated guess and innocent network communication you could sometimes reset/clear flags or counters, which may enable you to do further things.
Re:Theoretical vs demonstrated by MShook · 2014-12-24 20:41 · Score: 1

Because it's a scientific theory, or as wiki says: A scientific theory is a well-substantiated explanation of some aspect of the natural world that is acquired through the scientific method and repeatedly tested and confirmed through observation and experimentation. As with most (if not all) forms of scientific knowledge, scientific theories are inductive in nature and aim for predictive power and explanatory force.
Re: Does the cache control commands require root a by Bengie · 2014-12-25 11:48 · Score: 1

FYI: You can snoop L2 cache, but not L1. Intel went with inclusive cache so snooping wouldn't be needed. AMD went with exclusive, which gives better cache usage, but when trying to sync threads, all of that cache snooping is a high-latency operation. With an inclusive cache, you no longer need to snoop; you just look at the cache normally.

AMD has higher overall throughput for many GPU-type workloads, but Intel shines with workloads that require thread syncing.