Slashdot Mirror


That Time The Windows Kernel Fought Gamma Rays Corrupting Its Processor Cache (microsoft.com)

Long-time Microsoft programmer Raymond Chen recently shared a memory about an unusual one-line instruction that was once added to the Windows kernel code -- accompanied by an "incredulous" comment from the Microsoft programmer who added it:

;
; Invalidate the processor cache so that any stray gamma
; rays (I'm serious) that may have flipped cache bits
; while in S1 will be ignored.
;
; Honestly. The processor manufacturer asked for this.
; I'm serious.
invd


"Less than three weeks later, the INVD instruction was commented out," writes Chen. "But the comment block remains.

"In case we decide to resume trying to deal with gamma rays corrupting the processor cache, I guess."
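The quoted assembly depends on one detail of x86 semantics worth spelling out: according to Intel's instruction reference, INVD invalidates the processor's internal caches without writing modified (dirty) lines back to memory, whereas WBINVD writes them back first. The toy Python model below is a sketch of that difference only, not of real hardware; the class `ToyCache`, its methods, and the `demo` helper are hypothetical names, and a cache is reduced to a dict of address to (value, dirty) entries.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCache:
    """Hypothetical write-back cache model: address -> (value, dirty flag)."""
    memory: dict[int, int]
    lines: dict[int, tuple[int, bool]] = field(default_factory=dict)

    def read(self, addr: int) -> int:
        # On a miss, fill the line from backing memory as a clean copy.
        if addr not in self.lines:
            self.lines[addr] = (self.memory.get(addr, 0), False)
        return self.lines[addr][0]

    def write(self, addr: int, value: int) -> None:
        # Write-back policy: update only the cached copy and mark it dirty.
        self.lines[addr] = (value, True)

    def invd(self) -> None:
        # Like INVD: drop every line, dirty ones included, with no write-back.
        self.lines.clear()

    def wbinvd(self) -> None:
        # Like WBINVD: write dirty lines back to memory, then drop everything.
        for addr, (value, dirty) in self.lines.items():
            if dirty:
                self.memory[addr] = value
        self.lines.clear()


def demo(flush) -> int:
    """Write through the cache, flush with the given method, re-read 0x10."""
    cache = ToyCache(memory={0x10: 1})
    cache.write(0x10, 2)   # the new value exists only in the cache
    flush(cache)
    return cache.read(0x10)  # miss, so the value is refetched from memory


print(demo(ToyCache.invd))    # the cached write was discarded
print(demo(ToyCache.wbinvd))  # the cached write reached memory
```

In this model a corrupted clean line is repaired by either flush, because the next read refetches it from memory; the difference is that INVD also throws away legitimate dirty data, so it is only safe when no modified lines can be present.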

5 of 166 comments

  1. That's a great comment by NotSoHeavyD3 · · Score: 5, Insightful

    It explains why that code is there (since another developer could come along and wonder why it's there). I've seen way too many people write a comment like "; invalidate cache" and call it a day.

    --
    Did you know 80 to 90% of the moderators on slashdot wouldn't recognize a troll even if one dragged them under a bridge.
  2. Laptop aboard the International Space Station? by Laxator2 · · Score: 5, Informative

    I think they use laptops on the International Space Station and there you are not protected from cosmic rays by the blanket of the Earth's atmosphere. Just read up on the phosphenes experienced by the astronauts as they try to go to sleep.

    Not sure if "gamma rays" is the correct term here, as high-energy protons are the most likely to create a local change in electric charge density. With modern processors being built on a 14-nanometre process, this becomes a serious problem. All processors that control vital functions aboard spacecraft are radiation-hardened. That usually means older fabrication processes (wider paths reduce the probability of cross-talk) and amorphous silicon (a monocrystal can sustain permanent damage from a particle of high enough energy).

    Overall, it does make sense if it is meant to be used in space.

  3. Commented out code by DigressivePoser · · Score: 5, Insightful

    The comment block was descriptive and necessary, but it should also have included processor errata information that traces back to published documentation. Perhaps this was something newly discovered, and the processor and software engineers were in close communication.

    "Less than three weeks later, the INVD instruction was commented out," writes Chen. "But the comment block remains."

    I don't like seeing commented-out code. If it's commented out, it has no business being in the source file, even if there's an explanation in the comment block. The code's removal, along with its comment block, should be documented in whatever revision control system is in use. Maybe I'm biased because I've worked in safety-critical environments where commented-out code is a no-no.

  4. Re:Sure they did by msauve · · Score: 5, Interesting

    "If I had to guess this was because of a real processor bug Intel didn't want to admit to."

    Alpha particles flipping bits in memory are a known, but uncommon, issue. This code invalidated the cache when coming out of the S1 (sleep) state. The deeper (S2+) sleep states already invalidate the cache. The longer the processor sits in a static state (sleep), the greater the chance that an alpha particle hit will flip a bit. Invalidating the cache when coming out of a sleep state has no meaningful impact on performance: the time to re-fetch is nothing compared to the time spent sleeping. Of course, there are many more bits in RAM that could be affected, so a problem is more likely to occur there, and this doesn't address that.

    But it hurts nothing, avoids an (admittedly rare) issue, and is but a single instruction. I wonder why they removed it?

    --
    "National Security is the chief cause of national insecurity." - Celine's First Law
  5. It happened like this... by toxygen01 · · Score: 5, Interesting

    A friend of mine, who developed spreadsheet SW back in the days of DOS and Norton Commander, had one customer who kept complaining about the SW crashing from time to time. This kind of crash happened only to this customer and no other.

    He installed a debug build at the customer's site and waited... and sure enough, the SW would crash, and crash again and again... at completely random places in the code. In some cases there was literally no way those lines of code could make the program crash under any circumstances.

    Well, he spent days trying to debug it and came up empty-handed. Then it occurred to him to look at when the SW was crashing. Sure enough, it was crashing on one particular day of the week, usually within a span of a few hours on that day. Now comes the interesting part: the customer's site was actually a railway station on the Slovakia-Ukraine border (in a town called Uzhhorod). So he called the customer to ask whether a train regularly stood in the station on that day and hour every week, and voila, there was one: a train coming from Ukraine to Slovakia with some goods. So he asked the customer to take a Geiger counter and check whether anything unusual was in the air.

    They found out that one of the train cars was radiating like hell. It had previously been used to transport spent nuclear fuel, and the Ukrainians thought they would save some money by reusing it for regular cargo after its end of life. I wouldn't want to live near those railway tracks...

    tl;dr
    Spreadsheet SW was crashing on the computers in a train station, and thanks to customer complaints they found out the crashes were caused by a radioactive train that came to the station regularly.
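msauve's comment above rests on simple proportionality: if each bit is upset at some constant, independent rate, the expected number of upsets grows linearly with both the number of bits exposed and the time spent exposed. The back-of-envelope sketch below works under exactly that assumption; the helper `expected_upsets`, the per-bit rate, and the cache and RAM sizes are hypothetical placeholders, not measured figures for any real processor.

```python
def expected_upsets(bits: int, hours: float, rate_per_bit_hour: float) -> float:
    """Expected single-bit upsets, assuming a constant, independent per-bit rate."""
    return bits * hours * rate_per_bit_hour


# Hypothetical placeholder values, chosen only to show the scaling.
RATE = 1e-15                   # upsets per bit per hour (assumed)
CACHE_BITS = 8 * 8 * 2**20     # an assumed 8 MiB cache, in bits
RAM_BITS = 8 * 16 * 2**30      # an assumed 16 GiB of RAM, in bits

for hours in (0.01, 1.0, 8.0):
    cache = expected_upsets(CACHE_BITS, hours, RATE)
    ram = expected_upsets(RAM_BITS, hours, RATE)
    print(f"{hours:>5} h asleep: cache {cache:.2e}, RAM {ram:.2e}, "
          f"RAM/cache {ram / cache:.0f}x")
```

Linear growth in sleep time is the intuition behind invalidating after an idle period, and the fixed RAM/cache ratio (2048x for these assumed sizes) restates the commenter's point that main memory presents a far larger target than the cache.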