That Time The Windows Kernel Fought Gamma Rays Corrupting Its Processor Cache (microsoft.com)
Long-time Microsoft programmer Raymond Chen recently shared a memory about an unusual single-line instruction that was once added into the Windows kernel code -- accompanied by an "incredulous" comment from the Microsoft programmer who added it:
;
; Invalidate the processor cache so that any stray gamma
; rays (I'm serious) that may have flipped cache bits
; while in S1 will be ignored.
;
; Honestly. The processor manufacturer asked for this.
; I'm serious.
invd
"Less than three weeks later, the INVD instruction was commented out," writes Chen. "But the comment block remains.
"In case we decide to resume trying to deal with gamma rays corrupting the the processor cache, I guess."
;
; Invalidate the processor cache so that any stray gamma
; rays (I'm serious) that may have flipped cache bits
; while in S1 will be ignored.
;
; Honestly. The processor manufacturer asked for this.
; I'm serious.
invd
"Less than three weeks later, the INVD instruction was commented out," writes Chen. "But the comment block remains.
"In case we decide to resume trying to deal with gamma rays corrupting the the processor cache, I guess."
Since it explains the reasoning why that code is there.(Since another developer could come by and wonder why that code is there.) I've seen way too many people put in a comment like ;invalidate cache
and call it a day.
Did you know 80 to 90% of the moderators on slashdot wouldn't recognize a troll even if one dragged them under a bridge.
The need for error checking has been around for a very long time. Yes, cosmic particles are indeed a thing, and result in increased memory errors at high altitude, in airplanes, or especially in space.
I remember parity RAM being around in the 90s, and I'm pretty sure it's older than that. Pretty much any server these days uses ECC for this reason.
I run ECC and record the occassional bit flip in my logs once in a while. These can be found at /sys/devices/system/edac/mc/mc0/.
What's odd is that ECC is not routinely used in all hardware. Depending on the conditions it can be of great help, as the rare bit flip can cause strange problems that can take ages to track down. And it works well for figuring out when you have a bad memory module -- the computer will figure it out on its own.
that's what their embedded OSes were for. AFAIK this was in their consumer code base.
If I had to guess this was because of a real processor bug Intel didn't want to admit to. I remember when Win XP hit the shop I was at was flooded with dead computers from upgrades. Manufacturers had been selling bad ram in computers for years. By default Win98 would only make use of the first 64 MB of ram in most cases (there was a registry hack I've long since forgotten to force it to use your entire ram before going to the cache).
Anyway, XP's installer would copy the CD into ram to make the (very slow) install run faster. So you got to find out your OEM stuck bad ram in your box the hard way when the installer blew up. The best part was the upgrade couldn't roll itself back gracefully. I don't remember all the steps to fix it but it was a pain. We just did software where I was at too so it was fun having to send them somewhere else to get new ram and have them yell at me that the ram was fine. Good times.
Hi! I make Firefox Plug-ins. Check 'em out @ https://addons.mozilla.org/en-US/firefox/addon/youtube-mp3-podcaster/
Your cpu has been asleep for an apriori unknown amount of time, you are powering back up you'd absolutely want to clear the cache to purge any potential bit flips. It's a relatively cheap way of insuring data integrity.
I think they use laptops on the International Space Station and there you are not protected from cosmic rays by the blanket of the Earth's atmosphere. Just read up on the phosphenes experienced by the astronauts as they try to go to sleep.
Not sure if "gamma rays" is the correct term here, as high-energy protons are most likely to create a local change in electric charge density. With modern processors being built ont the 14 nanometres process this becones a serious problem. All the processors that are used in spacecraft and control vital functions are radiation-hardened. That usually means older fabrication processes (wider paths reduce the probability of cross-talk) and amorphous silicon (a monocrystal can sustain permanent damage from a particle of high enough energy)
Overall, it does make sense if it is meant to be used in space.
"Less than three weeks later, the INVD instruction was commented out," writes Chen. "But the comment block remains.
I don't like seeing commented out code. If it's commented out then it has no business being in the source code file - even if there's an explanation in the comment block. The code's removal along with its comment block should be documented in whatever revision control system is in use. Maybe I'm bias because I worked in safety critical environments where commented out code is a no-no.
A friend of mine, developer of the spreadsheet SW back in the days of DOS a Norton Commander, had one customer who would keep complaining about the SW crashing from time to time. These kind of crashes would only happen to this customer and no other.
He installed a debug build on the customer's site and and waited... and fair enough, the SW would crash, and crash again and again... at completely random places in the code. In some cases there was literally no way those lines of code could make the program crash under any circumstances.
Well, he spent days trying to debug it and came up empty handed. Until it struck him to look at the time when the SW is crashing. And fair enough, it was crashing on one particular day in a week usually in the time-span of few hours during that day. Now comes the interesting part -- the customer's site was actually a railway station on the Slovakia-Ukraine border (in town called Uzghorod). So he called the customer to ask if there was a train in the station regularly on that day and hour every week and voila, there was one train coming from Ukraine to Slovakia with some goods. So he asked the customer to take Geiger counter and see if there was anything going on in the air.
They found out one of the train cars was radiating like hell. It was used for transferring spent nuclear fuel before. And Ukrainians thought they would save some money by using it for regular cargo after EOL. I wouldn't like to be a person living near those railway tracks...
tl;dr
Spreadsheet SW was crashing on the computers in the train station and thanks to customer complaints they found out the crashes were caused by radioactive train coming regularly to the station.