Intel Patents On-Chip Cosmic Ray Detectors
holy_calamity writes "Intel has been awarded a patent for building cosmic ray detectors into chips, to guard against soft errors where a high-energy particle from space changes a value in a circuit. It's a problem that largely only affects RAM. As component sizes shrink further, "this problem is projected to become a major limiter of computer reliability in the next decade", says the patent. Intel's solution is to build in a detector that responds to cosmic errors by repeating the latest operation, reloading previous instructions, or rolling back to a previous state. You can also read the full patent."
I guess the butterfly stunt in http://xkcd.com/378/ wouldn't work after all.
Quidquid latine dictum sit, altum videtur.
But you can't really verify it because those events are so rare. It seems to me that Intel's innovation is to use some sort of detector, instead of using two or more chips and a comparator. It's probably way cheaper, but it won't work if the majority of unexplainable events are not, in fact, caused by cosmic rays but by some other effect (perhaps something temperature-related).
How did they manage to build a detector that can work out whether the cosmic rays collided with the actual bits (no pun intended) that hold the data? According to the oracle, cosmic rays collide with nuclei in an essentially random way, so there's no way a detector could just see a ray passing through and know whether it was on a collision course. Perhaps they are detecting the pions and other subatomic particles that result from a collision actually occurring? If they've found a way to do that, then it sounds fairly ingenious to me and a well-deserved patent.
apterous.org
Cosmic ray detector certainly makes for better marketing hype than ECC.
It's just as likely that registers, or the "rollback" state itself, could be corrupted. Wouldn't it be easier to have, I dunno, maybe error correction/detection involved, instead of some arbitrary cosmic ray detector?
Sometimes the more "esoteric" designers try to get, the more potential for disaster they create.
Cosmic ray detection would be far better for random number generation than for anything else.
I know at least four people who REALLY could have used this. Oh well, too late now.
SJW: Someone who has run out of real oppression, and has to fake it.
/sigh They haven't built it... They haven't even designed it... They don't even know if it is possible... Yet... they can patent it? Patents should require proof of a working/workable DESIGN...
It seems to me that, even if the individual detectors are very simplistic, and the geocoding of inputs is very rough, there would probably be some interesting scientific uses for a multi-million node planet-sized distributed cosmic ray detector.
Does anyone in a relevant field see a good use for this?
"The worst tyrannies were the ones where a governance required its own logic on every embedded node." - Vernor Vinge
It won't take long for someone to figure out how to detect the gamma errors and turn laptop computers into what amount to Geiger counters. If this bill passes, http://www.villagevoice.com/news/0803,thompson,78873,2.html will everyone be required to get a permit for their laptop computers? ;-)
I nominate this patent as the poster child for Patent Reform.
Tech Support: "No, sir...clicking on 'Remember Password' will NOT help you remember your password."
Currently, chips (both computational and memory) are protected against soft errors using multiple methods. There are radiation-hardening methods (both hardware and software), and most of the latest research involves using error-correcting codes. Simply duplicating the output and comparing the copies can only detect an error in one bit, and cannot correct it. The more times you duplicate, the more errors you can detect (n copies detect up to n-1 errors), and the maximum number of errors that can be corrected is half that, rounded down. However, duplication takes a lot of space, so other codes such as Hamming or BCH codes are generally used.
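The "n-1 detected, half that corrected" rule is the behaviour of a plain repetition code. A minimal Python sketch, assuming each bit is simply stored n times and read back by majority vote (the function names are illustrative, not from any library):

```python
def encode_repetition(bit: int, n: int) -> list[int]:
    """Store n identical copies of one bit."""
    return [bit] * n


def decode_repetition(copies: list[int]) -> tuple[int, bool]:
    """Majority-vote decode.

    Returns (decoded bit, error_detected). Any disagreement between copies
    is detected; the vote is right only while fewer than half the copies
    are flipped. Ties (possible for even n) are resolved to 0.
    """
    ones = sum(copies)
    decoded = 1 if 2 * ones > len(copies) else 0
    error_detected = 0 < ones < len(copies)
    return decoded, error_detected


def capabilities(n: int) -> tuple[int, int]:
    """(detectable, correctable) bit errors for an n-copy repetition code."""
    return n - 1, (n - 1) // 2


stored = encode_repetition(1, 5)
stored[0] ^= 1   # two simulated upsets
stored[3] ^= 1
print(decode_repetition(stored))  # (1, True)
print(capabilities(5))            # (4, 2)
```

With n = 2 (plain duplication) this gives one detectable error and zero correctable ones, which matches the comparator case above.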
The main problem with using codes is that cosmic rays cause what are called single event upsets (SEUs), and most codes cannot detect 100% of errors where the Hamming weight of the error (the number of ones in the error vector) is larger than the code was designed for. The problem comes when an SEU manifests itself as a multi-bit fault and the error vector cannot be detected by the code. SEUs are the most common type of error in space applications: see http://www.eas.asu.edu/~holbert/eee460/see.html
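To make the Hamming-weight point concrete, here is the textbook Hamming(7,4) single-error-correcting code in Python. It is purely illustrative (real memory ECC uses wider SEC-DED codes). A single flipped bit is corrected, but a two-bit upset produces a nonzero syndrome that points at an innocent bit, so the decoder "corrects" that bit and returns wrong data without complaint:

```python
def hamming74_encode(d: list[int]) -> list[int]:
    """Encode 4 data bits as a 7-bit codeword (positions 1..7, parity at 1, 2, 4)."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # checks positions 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # checks positions 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # checks positions 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(c: list[int]) -> list[int]:
    """Correct up to one flipped bit and return the 4 data bits."""
    c = list(c)
    syndrome = 0
    for pos, bit in enumerate(c, start=1):
        if bit:
            syndrome ^= pos  # XOR of set positions is 0 for a valid codeword
    if syndrome:
        c[syndrome - 1] ^= 1  # assume exactly one bit flipped, at that position
    return [c[2], c[4], c[5], c[6]]


data = [1, 0, 1, 1]
word = hamming74_encode(data)

single = list(word)
single[4] ^= 1                     # one upset: corrected
print(hamming74_decode(single))    # [1, 0, 1, 1]

double = list(word)
double[4] ^= 1
double[5] ^= 1                     # multi-bit upset beyond the code's design
print(hamming74_decode(double))    # [0, 1, 0, 1]  (silently wrong)
```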
The contribution of the cosmic ray detector is that if you know a cosmic ray struck at some point in time, you can flush and redo your computation (for computation channels, e.g. microprocessors) or flush that line in memory (for memory channels) in case of SEUs, and that is a pretty big deal.
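A minimal sketch of the "flush and redo" idea, assuming the detector reports at most one strike per attempted operation and that the whole machine state is a single integer. `run_with_replay` and the strike trace are hypothetical; this is not Intel's design, only the checkpoint-and-replay pattern it describes:

```python
from typing import Callable, Iterable


def run_with_replay(acc: int,
                    ops: list[Callable[[int], int]],
                    strikes: Iterable[bool],
                    max_retries: int = 3) -> tuple[int, int]:
    """Apply ops in order, replaying any op during which the detector fired.

    `strikes` is a hypothetical detector trace with one bool per attempt.
    Returns (final value, number of replays).
    """
    detector = iter(strikes)
    replays = 0
    for op in ops:
        checkpoint = acc                   # state before the op (a copy, for real state)
        for _ in range(max_retries + 1):
            result = op(checkpoint)        # always restart from the checkpoint
            if next(detector, False):      # strike seen: result may be corrupt
                replays += 1               # discard it and redo the op
                continue
            acc = result                   # no strike seen: commit
            break
        else:
            raise RuntimeError("op disturbed on every attempt; report a hard error")
    return acc, replays


ops = [lambda x: x + 1, lambda x: x * 10, lambda x: x - 3]
print(run_with_replay(0, ops, [False, True, False, False]))  # (7, 1)
```

The cost is paid only when the detector fires, which is the appeal over always-on redundancy; the catch, as other posters point out, is a strike the detector misses.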
Legally obligatory sig : My opinions are my own... etc etc
POWER6 has actually been shipping with this for a while: if an instruction fails (cosmic ray or not, although cosmic rays account for a large percentage of random bit-flipping events), it gets automatically retried, transparently to the rest of the system. Without this sort of thing you generally take a hard fault, so this type of protection is great to see. The same goes for SPARC64, incidentally (but not UltraSPARC, i.e. Niagara or its children). What sets POWER6 apart from both SPARC64 and this patent is that if the instruction fails repeatedly (possibly indicating a chip fault), in many cases it can actually back the instruction out of the failing core and slap it onto another core, also transparently and without a hard crash. Someone noted that this has been done on mainframes for years; yup, also true. This is another case of UNIX-class technology making inroads up the platform stack.
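The retry-then-migrate behaviour can be modelled in a few lines. This is a toy, not IBM's implementation: the error checker is assumed to be a mod-3 residue check on an adder (one standard way to check arithmetic; it catches the low-bit flip used here but not errors that change the result by a multiple of 3), and `Core`, `add_with_recovery` and the retry count are hypothetical:

```python
class Core:
    """Toy execution core; `stuck` models a hard fault, `transient_upsets` soft ones."""

    def __init__(self, name: str, stuck: bool = False, transient_upsets: int = 0):
        self.name = name
        self.stuck = stuck
        self.transient_upsets = transient_upsets

    def add(self, a: int, b: int) -> int:
        result = a + b
        if self.stuck or self.transient_upsets > 0:
            self.transient_upsets = max(0, self.transient_upsets - 1)
            result ^= 1                    # corrupt the low bit
        return result


def residue_ok(a: int, b: int, result: int) -> bool:
    """Mod-3 residue check: (a + b) mod 3 must equal result mod 3."""
    return (a % 3 + b % 3) % 3 == result % 3


def add_with_recovery(cores: list[Core], a: int, b: int,
                      retries: int = 2) -> tuple[int, str]:
    """Retry a failed add on the same core, then move it to the next core."""
    for core in cores:
        for _ in range(retries + 1):
            result = core.add(a, b)
            if residue_ok(a, b, result):
                return result, core.name
        # Repeated failure here suggests a hard fault: migrate to another core.
    raise RuntimeError("no healthy core left: hard crash")


print(add_with_recovery([Core("core0", transient_upsets=1)], 20, 22))         # (42, 'core0')
print(add_with_recovery([Core("core0", stuck=True), Core("core1")], 20, 22))  # (42, 'core1')
```

A transient upset is absorbed by the retry on the same core; only a fault that survives every retry triggers the move, which mirrors the distinction drawn above between a one-off bit flip and a failing core.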
In the late '70s, T.C. May, a scientist working at Intel, showed that alpha particles could flip bits in memory chips. Given that the discovery was made many years ago, it seems rather clear that as chips get smaller, cosmic ray detection could be a good thing to have on chips. http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=1479948
http://www.hawknest.com/
For RAM - there is really no problem - just use error checking. It's got to be easier to add an extra couple of bits to the width of your RAM to permit error-correction than to have a cosmic ray detector for every single bit.
The tricky problem isn't RAM - it's computational elements. There is no single way to error-correct computational elements because they are so diverse. A multiplier would need different protection to an adder which is different from a shift-register. Hence, the idea of rolling back (say) the last instruction executed and having a "do-over".
But for large arrays of homogeneous circuitry - like RAM - this doesn't seem worth the effort.
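How many is "an extra couple of bits"? For a Hamming single-error-correcting code over m data bits you need the smallest r with 2^r >= m + r + 1, plus one more overall parity bit to also detect double errors (SEC-DED). A quick Python check of that bound:

```python
def hamming_check_bits(m: int) -> int:
    """Smallest r with 2**r >= m + r + 1: enough to locate any single-bit error."""
    r = 0
    while (1 << r) < m + r + 1:
        r += 1
    return r


def secded_check_bits(m: int) -> int:
    """One extra overall-parity bit turns SEC into SEC-DED."""
    return hamming_check_bits(m) + 1


for m in (8, 16, 32, 64, 128):
    c = secded_check_bits(m)
    print(f"{m:>3} data bits -> {c} check bits ({100 * c / m:.1f}% overhead)")
```

For 64-bit words this gives 8 check bits, which is why ECC DIMMs are 72 bits wide. The relative overhead falls as the protected word gets wider, which is part of why ECC works so well on wide, homogeneous memory arrays.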
www.sjbaker.org
I feel another distributed computing project coming up: after SETI@home, and Folding@home, maybe this would make for an interesting way to get statistics on cosmic rays?
... Just mount the chips in a vertical fashion. I work in an X-ray crystallography lab and we have a large-format CCD detector. It's maybe about half a foot in diameter, but because it is mounted vertically, I see a cosmic ray streak maybe once every 200 or so 40-second exposures. Compare that to a cosmic ray detector of roughly the same size which is mounted horizontally on the other side of the building. It's counting cosmic rays almost constantly.
09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0 is the magic number.
You could stage a denial of service attack using your death ray to target a given victim's machine. Their machine will completely lock up.
This sounds similar to what DARPA's EMPiRe project is doing.
This subject reminds me of a paper I saw some time ago on a way to use cosmic rays to your advantage to break out of the JVM. Here's the link: http://www.cs.princeton.edu/sip/pub/memerr.pdf
It's widely acknowledged that Intel created EMF-burst-proof chips for the government. The technology inside them was never publicly discussed. I think it might be similar to cosmic ray correction. They might just be patenting a subset of it now, before shrinking die sizes lead someone else to patent technology they've been using for years.
Well.. maybe. Or Maybe not. But Definitely not sort of.
So they can now tell when a cosmic ray hits the chip, and correct for it. But what happens when a cosmic ray hits the cosmic-ray detector and scrambles its brains, huh? Will we need a corrector for the corrector now too? And a corrector-corrector corrector? WHERE WILL IT ALL END
the coolest club on
I remember joking about "stray alpha particles from the sun" screwing up code for class projects. Now that Intel is trying to take that one away, all we'll be left with is "my dog ate it".
this is my sig, there are many like it, but this one is mine.
More for laughs than anything else, I started logging them and found that a server with 16GB got maybe one or two hits per week. After that I started to take ECC seriously for professional-quality servers.
You probably don't need it for the domestic-appliance-quality stuff that people run at home, but for real work, get some decent kit.
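Soft-error rates are usually quoted in FIT (failures per 10^9 device-hours) per megabit, so an anecdote like this one can be converted for comparison. A rough sketch, assuming "16GB" means 16 GiB and that every logged hit was an independent single-bit soft error:

```python
HOURS_PER_WEEK = 24 * 7


def fit_per_mbit(errors_per_week: float, capacity_gib: float) -> float:
    """Convert an observed error rate to FIT (errors per 10**9 hours) per Mbit."""
    mbits = capacity_gib * 1024 * 8          # GiB -> Mibit
    errors_per_hour = errors_per_week / HOURS_PER_WEEK
    return errors_per_hour / mbits * 1e9


for per_week in (1, 2):
    print(f"{per_week}/week on 16 GiB ~ {fit_per_mbit(per_week, 16):.0f} FIT/Mbit")
```

For the one to two hits per week quoted above, this works out to roughly 45 to 91 FIT/Mbit.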
politicians are like babies' nappies: they should both be changed regularly and for the same reasons
Tin foil hats, for RAM!
In order to get a good idea of whether a few bits have changed in a large RAM array due to radiation (which is all it takes... more than a couple of bits can bollix data even in ECC memory), the detector itself would have to be comparable in size to the memory array.
It is a waste of space.
It would be cheaper (and maybe even lighter) to just radiation-harden the chip.
That's what you are patenting: the idea! Although you are supposed to be working on something commercially viable.
The patent office does not insist on working models unless it is an extremely unlikely idea... like perpetual motion, or free energy. There are good reasons for that.
You're forgetting about what happens after an interaction of a cosmic ray with an atom. In the case of the ray being a neutron, the interaction will result in a lot of kinetic energy imparted to the nucleus (called the primary knock-on atom), which will then tear off a bunch of electrons as it slows down (a heavy charged particle with a given energy will have a well-defined range in matter, which is why ion implantation superseded diffusion in chip fabrication). The range of the nucleus will likely be much larger than the thickness of the chip, which would allow for use of a separate detector.
This doesn't sound so extremely new to me. You can even download the VHDL for a rad-hard Leon3 (SPARC V8 instruction set) from Gaisler. This chip protects against SEUs (single event upsets) typical of those caused by cosmic rays.
Microsoft's XP crash analysis early in this decade concluded that PCs always left on tended to crash unexpectedly. Dump analysis showed strange values in key OS variables, and cosmic rays (or other bit-blasting particles) were among the likely sources. The conclusion was so clear that Microsoft floated the idea (see URL above) that Vista-generation PCs should use Error-Correcting Code (ECC) memory to detect and correct bit errors -- in consumer PCs. [Note that servers and business workstations have used ECC memory for decades].
Having seen corrupted data in my own copy of Microsoft Money and other applications that I have left open for weeks, I am prepared to accept cosmic rays as well as Microsoft bugs as potential sources. Finally, why would Intel invest R&D capital in a cosmic ray detector if it had no likely or practical use for Intel's consumer and business customers?
Maybe the solution will come when we abandon charge-states as our means of information processing, and instead shift into photonics. These components will then be immune to ionizing radiation.
Meaning that the insane one was allowed to try and patent this? ;)
Patent litigation: A doctrine of Mutually Assured Destruction... in which everyone seems willing to push the button
http://xkcd.com/378/
Microsoft claims Vista's poor performance and unreliability are due to interference from cosmic rays. Vista makes a computer run so fast, they claim, that cosmic rays present a serious threat to the computer's stability, often resulting in lower performance than older operating systems like XP. Microsoft plans to release a cosmic ray shielding computer case, which will retail for $300, and should be released some time this month. Current Vista license holders will get a $50 discount.
"When information is power, privacy is freedom" - Jah-Wren Ryel
Patents are easier to read online with Google Patents. It also lets you download a PDF.
Stop worrying about the risks of nuclear power and start worrying about the risks of not using nuclear power.
Does anybody know whether your typical battery-backed CMOS RAM is susceptible to corruption by cosmic rays too?
I've looked into a few systems that arrived DOA due to a corrupt CMOS RAM (they were OK after resetting them with the jumper on the motherboard) after air shipment from the US to Europe or Asia and I wonder if that's the root cause.
thegodmovie.com - watch it
To ensure the integrity of data and program logic flow under conditions that can cause bit flipping, just run multiple copies of your code on a multi-core CPU and have any thread that does not match the majority become invalid. If three threads agree and the remaining one will not self-terminate, they vote it off the CPU. Each thread has its own memory blocks too, so you have the computational equivalent of RAID. It would be cheaper and faster than Intel's idea, as no rollback is required, just perhaps a block copy to overwrite the corrupted code/data. Anyone interested in cooking this up in a custom Linux kernel?
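The voting scheme proposed here is N-modular redundancy. A minimal user-space sketch in Python, assuming results are hashable and that a disagreeing replica is simply reported rather than killed. `run_replicated`, `vote` and `checksum` are illustrative names, and threads in one process share hardware, so this shows the voting logic rather than real fault isolation:

```python
import copy
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable


def vote(results: list[Any]) -> tuple[Any, list[int]]:
    """Return the majority result and the indices of replicas that disagree."""
    winner, count = Counter(results).most_common(1)[0]   # results must be hashable
    if 2 * count <= len(results):
        raise RuntimeError("no strict majority: cannot tell which replica is right")
    return winner, [i for i, r in enumerate(results) if r != winner]


def run_replicated(fn: Callable[[Any], Any],
                   replica_inputs: list[Any]) -> tuple[Any, list[int]]:
    """Run fn once per replica in parallel threads, then vote on the outputs."""
    with ThreadPoolExecutor(max_workers=len(replica_inputs)) as pool:
        results = list(pool.map(fn, replica_inputs))
    return vote(results)


def checksum(buf: list[int]) -> int:
    return sum(buf) & 0xFFFF


data = list(range(1000))
replicas = [copy.deepcopy(data) for _ in range(4)]  # separate memory per replica
replicas[2][500] ^= 1 << 3                          # simulated bit flip in replica 2
print(run_replicated(checksum, replicas))           # (40748, [2])
```

Keep in mind that every replica repeats the full computation, so four copies cost roughly four times the work.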
dan@tekgnu.com
You'd have to hit the computer in question with enough radiation that a number of errors should occur. Then you see if the mechanism actually works. It might, of course, be somewhat expensive to build or borrow a particle accelerator that provides a passable approximation of cosmic rays ;-)
As Mrsooreams wrote just one post below you, it's not guaranteed that the particles actually hit the detector rather than only the computing elements. So I'd take this one with a grain of salt...
C - the footgun of programming languages
I recall reading a paper IBM published years back, arguing that ECC memory is still necessary because of cosmic rays. I forget the details, but they possibly tied it into why their Chipkill technology (essentially a RAID array for each DIMM) should be used. As I understand it, ECC is adequate for correcting a single cosmic-ray bit flip; Chipkill is only necessary if you have multiple bits flipped, which is potentially the concern as DIMM sizes scale up.
I really fail to see this as more than some marketing hype for a solution to an already solved problem.
Troll, Troll, go away and flame again some other day
As someone who has worked for several hardware vendors, including Sun, I am still amused by the truth in the following joke that was once told to me:
Q: How do you tell the difference between a computer salesman and a used car salesman?
A: A used car salesman knows when he is lying.
( My apologies to those computer salesmen who do really understand the technology they sell. Unfortunately there are too many who do not. )
My discussion above was in regard to "utility" patents, which are for inventions and the like. It is also possible to get a "design" patent, which is on things like shape and style... that is a different animal entirely. The standards and rules are completely different for design patents, and not really relevant to the discussion.