That Time The Windows Kernel Fought Gamma Rays Corrupting Its Processor Cache (microsoft.com)
Long-time Microsoft programmer Raymond Chen recently shared a memory about an unusual single-line instruction that was once added into the Windows kernel code -- accompanied by an "incredulous" comment from the Microsoft programmer who added it:
;
; Invalidate the processor cache so that any stray gamma
; rays (I'm serious) that may have flipped cache bits
; while in S1 will be ignored.
;
; Honestly. The processor manufacturer asked for this.
; I'm serious.
invd
"Less than three weeks later, the INVD instruction was commented out," writes Chen. "But the comment block remains.
"In case we decide to resume trying to deal with gamma rays corrupting the processor cache, I guess."
The need for error checking has been around for a very long time. Yes, cosmic particles are indeed a thing, and result in increased memory errors at high altitude, in airplanes, or especially in space.
I remember parity RAM being around in the 90s, and I'm pretty sure it's older than that. Pretty much any server these days uses ECC for this reason.
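For anyone who hasn't met parity memory, here is a minimal sketch of the idea: one extra bit is stored alongside each byte so that a single flipped bit can be detected, though not corrected. The function names and the choice of even parity are mine, purely for illustration; real parity RAM does this in hardware.

```python
def parity_bit(byte):
    """Even parity: 1 if the byte has an odd number of set bits, else 0."""
    return bin(byte & 0xFF).count("1") % 2


def store(byte):
    """Model a memory cell as (data, parity), as parity RAM does per byte."""
    return (byte & 0xFF, parity_bit(byte))


def check(cell):
    """Return True if the stored parity still matches the data."""
    data, parity = cell
    return parity_bit(data) == parity


if __name__ == "__main__":
    cell = store(0b10110010)
    print("clean cell ok:", check(cell))

    # Simulate a single bit flipped by a stray particle.
    flipped = (cell[0] ^ 0b00001000, cell[1])
    print("single flip ok:", check(flipped))

    # Two flips in the same byte cancel out and go unnoticed.
    double = (cell[0] ^ 0b00011000, cell[1])
    print("double flip ok:", check(double))
```

Parity only tells you that something went wrong. ECC stores more check bits per word so that a single-bit error can also be corrected.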
I run ECC and see the occasional bit flip recorded in my logs. These can be found at /sys/devices/system/edac/mc/mc0/.
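A rough sketch of how those counters could be read, assuming a Linux box with an EDAC driver loaded. The `ce_count` (corrected errors) and `ue_count` (uncorrected errors) files are part of the kernel's EDAC sysfs interface; the function name and the returned dictionary layout are just my own choices for illustration.

```python
from pathlib import Path


def read_edac_counts(base="/sys/devices/system/edac/mc"):
    """Return {controller: {"ce_count": int|None, "ue_count": int|None}}.

    An empty dict means no EDAC memory controllers were found
    (driver not loaded, no ECC support, or not a Linux system).
    """
    counts = {}
    root = Path(base)
    if not root.is_dir():
        return counts
    for mc in sorted(root.glob("mc[0-9]*")):
        entry = {}
        for name in ("ce_count", "ue_count"):
            try:
                entry[name] = int((mc / name).read_text().strip())
            except (OSError, ValueError):
                entry[name] = None  # file missing or unreadable
        counts[mc.name] = entry
    return counts


if __name__ == "__main__":
    for controller, values in read_edac_counts().items():
        print(controller, values)
```

A slowly rising corrected-error count is the occasional bit flip; a count that climbs quickly usually points at a failing module.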
What's odd is that ECC is not routinely used in all hardware. Depending on the conditions it can be of great help, as the rare bit flip can cause strange problems that can take ages to track down. And it works well for figuring out when you have a bad memory module -- the computer will figure it out on its own.
One component that many defence contracts required was a Nuclear Event Detector. This little component would set a pin when it detected the precursor of a nuclear detonation. What the system did next was up to the vendor, but usually it would involve a shutdown and disconnect of ports and power lines.
Vintage computer adverts: http://www.vintageadbrowser.com/computers-and-software-ads
Some of the newer Doppler WX radars do a rapid narrow scan in some modes of operation for some fine examination of a particular front or phenomenon they want to image with more detail or using some more specialized mode like water vapor density, etc.
So, the usual low(er) power scanning 'round and 'round, like radars usually do, probably isn't enough to trigger this poster's problem, but if the high-powered focused scans happen to be in his direction, well, bad news that day.
Perhaps some meteorologist can weigh in on this mode of operation with the radars; I don't know enough about them to be more specific.
-- You are in a maze of little, twisty passages, all different... --
"If I had to guess this was because of a real processor bug Intel didn't want to admit to."
Alpha particles affecting memory is a known, but uncommon, issue. This code invalidated the cache when coming out of S1 (sleep) state. The deeper (S2+) sleep states already invalidate the cache. The longer the processor is in a static state (sleep), the more chance that an alpha particle hit will flip a bit. Invalidating the cache when coming out of a sleep state has no meaningful impact on performance. The time to re-fetch is nothing compared to the amount of time spent sleeping. Of course, there are many more bits in RAM which could be affected, so a problem is more likely to occur there, which this doesn't address.
But it hurts nothing, avoids an (admittedly rare) issue, and is but a single instruction. I wonder why they removed it?
"National Security is the chief cause of national insecurity." - Celine's First Law
A friend of mine, developer of a spreadsheet SW back in the days of DOS and Norton Commander, had one customer who would keep complaining about the SW crashing from time to time. This kind of crash would only happen to this customer and no other.
He installed a debug build at the customer's site and waited... and sure enough, the SW would crash, and crash again and again... at completely random places in the code. In some cases there was literally no way those lines of code could make the program crash under any circumstances.
Well, he spent days trying to debug it and came up empty-handed. Until it struck him to look at the times when the SW was crashing. And sure enough, it was crashing on one particular day of the week, usually within a span of a few hours during that day. Now comes the interesting part -- the customer's site was actually a railway station on the Slovakia-Ukraine border (in a town called Uzghorod). So he called the customer to ask if there was a train in the station regularly on that day and hour every week and voila, there was one train coming from Ukraine to Slovakia with some goods. So he asked the customer to take a Geiger counter and see if there was anything going on in the air.
They found out one of the train cars was radiating like hell. It had previously been used for transporting spent nuclear fuel, and the Ukrainians thought they would save some money by using it for regular cargo after EOL. I wouldn't like to be a person living near those railway tracks...
tl;dr
Spreadsheet SW was crashing on the computers in the train station, and thanks to customer complaints they found out the crashes were caused by a radioactive train coming regularly to the station.