Intel Patents On-Chip Cosmic Ray Detectors
holy_calamity writes "Intel has been awarded a patent for building cosmic ray detectors into chips, to guard against soft errors, where a high-energy particle from space changes a value in a circuit. It's a problem that largely only affects RAM. As component sizes shrink further, 'this problem is projected to become a major limiter of computer reliability in the next decade,' says the patent. Intel's solution is to build in a detector that responds to cosmic-ray hits by repeating the latest operation, reloading previous instructions, or rolling back to a previous state. You can also read the full patent."
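The summary only names the recovery options (replay, reload, roll back). A minimal sketch of the roll-back idea, assuming a simple commit-per-step model: the function `run_with_rollback`, the `strikes` set of steps at which a hypothetical detector fires, and the single flipped bit are all illustrative assumptions, not details from the patent.

```python
def run_with_rollback(values, strikes, rollback=True):
    """Sum `values` one step at a time, keeping a checkpoint after each step.

    `strikes` holds step indices where a (hypothetical) detector fires; each
    strike also corrupts that step's result by flipping bit 4. With rollback
    enabled, the corrupted result is discarded and the step is re-executed.
    Returns (final_sum, number_of_replays).
    """
    pending = set(strikes)
    checkpoint = 0          # last committed ("known good") state
    replays = 0
    i = 0
    while i < len(values):
        acc = checkpoint + values[i]
        if i in pending:
            pending.discard(i)      # assume each particle hits only once
            acc ^= 1 << 4           # simulated soft error in the result
            if rollback:
                replays += 1
                continue            # drop acc, keep checkpoint, redo step i
        checkpoint = acc            # commit this step as the new safe state
        i += 1
    return checkpoint, replays
```

For example, `run_with_rollback([1, 2, 3, 4], {1, 3})` returns `(10, 2)`, while `run_with_rollback([1, 2, 3, 4], {1}, rollback=False)` silently commits the flipped bit and returns `(26, 0)`.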
Cosmic ray detector certainly makes for better marketing hype than ECC.
... Just mount the chips vertically. I work in an X-ray crystallography lab, and we have a large-format CCD detector. It's about half a foot in diameter, but because it is mounted vertically, I see a cosmic ray streak only about once every 200 or so 40-second exposures. Compare that to a cosmic ray detector of roughly the same size, mounted horizontally on the other side of the building: it counts cosmic rays almost constantly.
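Part of the orientation effect is geometric: at sea level, cosmic-ray muons arrive mostly near the vertical, with intensity commonly approximated as proportional to cos²θ (θ from the zenith), so a vertical plate presents less projected area to them. A rough numerical sketch under that approximation only, assuming a thin flat detector and ignoring thickness, thresholds and shielding; `relative_rate` is a hypothetical helper.

```python
import math

def relative_rate(normal, n=300):
    """Relative rate of tracks crossing a thin flat detector with unit normal
    `normal`, assuming downward-going muons with intensity proportional to
    cos^2(theta), theta measured from the vertical. Either face may be crossed."""
    nx, ny, nz = normal
    dth = (math.pi / 2) / n          # zenith-angle step over the upper hemisphere
    dph = (2 * math.pi) / n          # azimuth step
    total = 0.0
    for i in range(n):
        th = (i + 0.5) * dth         # midpoint rule
        s, c = math.sin(th), math.cos(th)
        for j in range(n):
            ph = (j + 0.5) * dph
            u = (s * math.cos(ph), s * math.sin(ph), c)
            proj = abs(nx * u[0] + ny * u[1] + nz * u[2])  # projected-area factor
            total += (c * c) * proj * s * dth * dph        # I(theta) * |n.u| * dOmega
    return total

horizontal = relative_rate((0.0, 0.0, 1.0))  # analytic value: pi/2
vertical = relative_rate((1.0, 0.0, 0.0))    # analytic value: pi/4
print(round(vertical / horizontal, 3))       # → 0.5
```

Under this model alone, turning the plate from horizontal to vertical roughly halves the rate of tracks crossing it.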
I work on distributed cosmic ray detectors. The patent is very sparse on details, so it's difficult to say much about it. The biggest problems I see are timing and data analysis. The detectors need clocks synchronized to within a few nanoseconds. This is possible with GPS if you know all the circuitry and the delays therein, but I don't think you could do it in normal PCs. Each PC would also need at least two detectors to do some triggering before sending the data; if it doesn't, you'll end up with huge amounts of "noise" data. After that, you still have a huge pile of raw data from a collection of (probably crappy) detectors that are not calibrated.
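The two-detector triggering mentioned above is a coincidence requirement: only forward an event when both detectors fire within a short time window, since independent noise hits rarely line up. A minimal sketch, assuming the timestamps are already synchronized (the hard part described above) and given in sorted nanoseconds; `coincidences`, `accidental_rate` and the window values are illustrative.

```python
from bisect import bisect_left

def coincidences(times_a, times_b, window_ns):
    """Pairs (ta, tb) with |ta - tb| <= window_ns.

    Both timestamp lists are assumed sorted, in nanoseconds, on a common clock.
    """
    pairs = []
    for ta in times_a:
        k = bisect_left(times_b, ta - window_ns)   # first candidate in B
        while k < len(times_b) and times_b[k] <= ta + window_ns:
            pairs.append((ta, times_b[k]))
            k += 1
    return pairs

def accidental_rate(window_s, rate_a, rate_b):
    """Expected rate of chance coincidences between two independent random
    (Poisson) hit streams, for a half-width window of window_s seconds."""
    return 2.0 * window_s * rate_a * rate_b
```

For example, `coincidences([100, 5000, 9000], [103, 7000, 9010], 5)` keeps only `(100, 103)`. Because chance coincidences grow linearly with the window, a loose clock forces a wide window and lets more noise through.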
It seems painfully inefficient to 'redo' stuff that doesn't seem to be wrong just because a cosmic ray was detected.
1) The likelihood of a cosmic ray strike is ridiculously small. So small, in fact, that the cost of rewinding progress when one is detected would be completely unnoticeable.
2) We *do* have the ability to package CPUs such that they are protected from cosmic rays. The problem is that the packages are so large and expensive that no one would buy them given the current probability of soft errors.
So the solution is most definitely NOT to stop shrinking transistors. Even ten process technology generations from now, the mean time to a soft error actually affecting a bit on a CPU is something like 1 million hours. Never mind whether or not that particular soft error is critical.
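The arithmetic behind the 1-million-hour figure and the claim that rewinding is unnoticeable is easy to check. A minimal sketch: FIT (failures per 10^9 device-hours) is the usual unit for soft-error rates, while the 1 ms rollback cost and the one-detection-per-hour interval below are assumed, illustrative values, not figures from the patent. The detector fires on strikes whether or not a bit actually flips, so overhead depends on the time between detections, which is taken as a separate input.

```python
HOURS_PER_YEAR = 24 * 365.25

def fit(mttf_hours):
    """Failure rate in FIT (failures per 10**9 device-hours) for a given MTTF."""
    return 1e9 / mttf_hours

def fleet_errors_per_hour(n_devices, mttf_hours):
    """Expected soft errors per hour across n independent devices."""
    return n_devices / mttf_hours

def rollback_overhead(hours_between_detections, rollback_cost_s):
    """Fraction of run time lost if every detection costs one rollback."""
    return rollback_cost_s / (hours_between_detections * 3600.0)

mttf = 1e6                                 # hours, the figure quoted above
print(fit(mttf))                           # → 1000.0
print(round(mttf / HOURS_PER_YEAR))        # → 114 (years per CPU)
print(fleet_errors_per_hour(10**6, mttf))  # → 1.0 (per hour, million-CPU fleet)
print(rollback_overhead(1.0, 1e-3))        # assumed: 1 detection/hour, 1 ms each
```

Even with a detection every hour, a 1 ms rollback costs well under a millionth of run time; the flip side is that a large enough fleet still sees errors routinely.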
The laws of probability forbid it!
[quote]They didn't, they've created a detector which works out whether the chip was hit by a cosmic ray or not.[/quote]
As the GP said, there is no way of knowing whether a cosmic ray passed through you or not. The cosmic ray could easily just smash your bit to a new, random state and pass happily by without ever touching the actual detector thingy. The only way to improve the situation would be to build a large detector volume (at least a couple of cm³).
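The detector-volume argument is geometric: a track that flips a given bit is only caught if it also crosses the detector. A rough Monte Carlo sketch, assuming straight tracks with isotropic directions (real sea-level muons favour the vertical) and a cubic detector sitting beside the bit; the geometry and the helpers `line_hits_box` and `coverage` are hypothetical, with dimensions in arbitrary units.

```python
import math
import random

def line_hits_box(d, lo, hi):
    """True if the infinite line through the origin with direction d crosses
    the axis-aligned box with corners lo and hi (slab method)."""
    t_min, t_max = -math.inf, math.inf
    for di, l, h in zip(d, lo, hi):
        if abs(di) < 1e-12:
            if not (l <= 0.0 <= h):   # parallel to this slab and outside it
                return False
            continue
        t1, t2 = l / di, h / di
        t_min = max(t_min, min(t1, t2))
        t_max = min(t_max, max(t1, t2))
    return t_min <= t_max

def coverage(side, distance, trials=50_000, seed=1):
    """Fraction of isotropic tracks through a bit at the origin that also pass
    through a cubic detector of edge `side` centred `distance` away on the x axis."""
    rng = random.Random(seed)
    half = side / 2
    lo = (distance - half, -half, -half)
    hi = (distance + half, half, half)
    hits = 0
    for _ in range(trials):
        # Normalised Gaussian vectors are isotropic; the scale is irrelevant here.
        d = (rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1))
        if line_hits_box(d, lo, hi):
            hits += 1
    return hits / trials
```

With a detector of edge 1 at distance 10, only a fraction of a percent of the tracks through the bit also cross it; growing the detector raises that fraction roughly with its cross-sectional area.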