Intel Skylake Bug Causes PCs To Freeze During Complex Workloads (arstechnica.com)

Posted by samzenpus on Monday January 11, 2016 @09:00AM from the protect-ya-neck dept.

chalsall writes: Intel has confirmed an in-the-wild bug that can freeze its Skylake processors. The company is pushing out a BIOS fix. Ars reports: "No reason has been given as to why the bug occurs, but it's confirmed to affect both Linux and Windows-based systems. Prime95, which has historically been used to benchmark and stress-test computers, uses Fast Fourier Transforms to multiply extremely large numbers. A particular exponent size, 14,942,209, has been found to cause the system crashes. While the bug was discovered using Prime95, it could affect other industries that rely on complex computational workloads, such as scientific and financial institutions. GIMPS noted that its Prime95 software 'works perfectly normal' on all other Intel processors of past generations."

Deja Voo of the Pentium 5 FDIV bug by xmas2003 · 2016-01-11 09:04 · Score: 4, Insightful

Old-timers will remember the Pentium 5 FDIV bug where the chip sometimes yielded incorrect results for complex mathematical calculations.

--
Hulk SMASH Celiac Disease
1. Re:Deja Voo of the Pentium 5 FDIV bug by Junta · 2016-01-11 09:09 · Score: 5, Informative
  
  Well 'Deja Vu' and you can leave '5' off.
  For an analogous screw-up, you only need to look at Haswell/Broadwell and the TSX feature, which they retroactively disabled due to a defect.
  The FDIV bug was noteworthy because the state of things was such that they didn't have much recourse other than replacing the processors. We haven't seen a defect that forced a physical recall at such scale since, though there have been a number of issues that would have been similarly disastrous if not for the fact that they could push a microcode change to disable something or work around it...
  
  --
  XML is like violence. If it doesn't solve the problem, use more.
2. Re:Deja Voo of the Pentium 5 FDIV bug by 110010001000 · 2016-01-11 09:14 · Score: 4, Informative
  
  All processors have bugs. Some are fixed and some are not. You can obtain errata sheets from the manufacturers. At least this one is easily fixable.
Re:Lack of competition by Moof123 · 2016-01-11 09:15 · Score: 5, Insightful

If you saw the actual errata list for processors on launch day, regardless of manufacturer, your jaw would drop. A lot of nasties get cleaned up on subsequent revisions (mask changes), but in the meantime patches show up for the BIOS, libraries, and compilers so that the user never sees the warts. With billions of transistors there will be design errors that even Intel will not catch during verification or characterization. The fact that a BIOS fix will take care of it is a sign that it is not that egregious.
If you want to avoid this kind of stuff you should wait a few months after any major shakeup to buy.
Re:Lack of competition by Moof123 · 2016-01-11 09:17 · Score: 5, Informative

Go see page 21 for example:
http://www.intel.com/content/d...
When hardware must just work by Puff_Of_Hot_Air · 2016-01-11 09:31 · Score: 5, Informative

This is a really interesting talk from 32c3 detailing the challenges involved in designing and verifying something as complex as a CPU, where it can only be simulated at 1 Hz and it costs 5 million to produce silicon for testing. https://www.youtube.com/watch?v=eDmv0sDB1Ak. The level of difficulty in getting this right just blows my mind. If it weren't for economies of scale, CPUs would be completely out of reach. Also interesting in the talk is the vast number of CPU defects that are found and cataloged that most people appear to be unaware of. Most are of little importance (and hence don't get fixed) and some are fixed via code (as in this case), but there is no guarantee that these fixes are being rolled out by OEMs.
1. Re:When hardware must just work by Moof123 · 2016-01-11 10:19 · Score: 5, Interesting
  
  I work on ASIC design, though I am on the analog side of things. There are more people doing verification than design, by roughly 2:1. I am told that for the smaller nodes and more complex designs the ratio is even higher. Basically you can slap down some RTL code (Verilog or VHDL) quickly, but torturing it through all exceptions is very hard. Then you have to synthesize and build it, which can introduce all sorts of timing and parasitic problems that have to be double checked. Finally, test vectors have to be created to double check the functionality of every transistor in the design, to assure that what was built matches the masks.
  It is truly phenomenal that anything with billions of gates ever works at all, let alone with the high yield and relatively low error count we have come to expect.
2. Re:When hardware must just work by tlhIngan · 2016-01-11 11:06 · Score: 5, Interesting
  
  I've done this.
  First, billions of transistors is actually easy - most of the transistors in a modern CPU are actually spent on caches and other memory. Logic itself doesn't have as high a transistor density as you might think. In fact, in practically all ASIC designs, there's so much extra silicon space that they put extra gates there that do nothing but are tied to a logic value. These spare transistors serve to provide "rework" room for the design. If you look at most steppings, you start with A0, then you have A1, A2, ... B0, B1, ... etc. Well, going from A0 to A1 is basically just a metal mask change - they don't change the transistor masks (each mask costs around $100K, and 10-layer metal designs often have 30+ masks, so that's a $3M cost before the first silicon is patterned). Instead, they rewire the transistors using this spare sea of transistors to fix the issues - hopefully only needing to change 5, maybe 10 masks tops ($1M). When you go from Ax to B0, that implies a complete new mask set - either there are too many fixes, or the design is being revised.
  As for simulation, it's multi-stage. First each block is individually tested, and simulated, then it's all brought together and software simulated to check for easy to spot faults and have full inner visibility to see why things are the way they are. The complexity of modern CPUs and SoCs means this is only around 1Hz, usually less, so it's reserved for initial testing and sanity checking test vectors.
  The next step is to put it on an accelerator - systems like Cadence's Palladium, which can get your clock speeds up to the hundreds of Hz range. The simulation isn't as visible and the timings can be off, but you can functionally check most of the blocks and, with careful probe design, bring error cases back to the software model to understand what's going on.
  The next stage is FPGA simulation, where you're testing the logic itself on FPGAs (we're talking about the ones that easily cost $30K each, and you need at least 4 or 8 of them, or more - that's a quarter million dollars in FPGAs!). But the system moves to the kHz range, even up to 1MHz, which despite its slowness is actually fast enough to boot an OS like Windows or Linux or run test software, so software development for drivers and such can begin. Visibility is limited to whatever probes you could install and whatever debugging tools your FPGA toolset has.
  Then it's all laid out and routed and all that, and software simulations are run to verify timings - ensuring there are no setup and hold violations in the final floorplan.
  And it's not as bad as you think - each block is quite independent and as long as the interface contract is held (setup and hold, timings and other things for the block), the tools will tell you how close you are to violating the specs for each block. So you can test each block in isolation and as long as the interface contract is held, be assured it will work.
  Of course, it won't catch integration errors like ground bounce or other such things. It's more akin to building a space shuttle or airplane - with the right design, you can get something that works.
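To put the clock rates quoted above in perspective, here is a rough Python sketch of how long a fixed number of simulated cycles would take at each stage. The one-billion-cycle workload and the 500 Hz figure for the accelerator are assumptions made for illustration; the comment only says "around 1Hz", "hundreds of Hz" and "kHz range to even 1MHz".

```python
# Approximate effective clock rates per verification stage.
# 1 Hz and 1 MHz come from the comment above; 500 Hz is an assumed
# stand-in for "hundreds of Hz".
STAGE_CLOCK_HZ = {
    "software simulation": 1,
    "hardware accelerator": 500,
    "FPGA prototype": 1_000_000,
}

SECONDS_PER_DAY = 86_400


def wall_clock_seconds(cycles: int, clock_hz: int) -> float:
    """Wall-clock time needed to step through `cycles` clock cycles."""
    return cycles / clock_hz


if __name__ == "__main__":
    workload_cycles = 10**9  # hypothetical workload size
    for stage, hz in STAGE_CLOCK_HZ.items():
        days = wall_clock_seconds(workload_cycles, hz) / SECONDS_PER_DAY
        print(f"{stage:>22}: {days:,.2f} days")
```

Under these assumptions the same workload drops from decades in pure software simulation to minutes on FPGAs, which is roughly why booting an OS only becomes practical at the FPGA stage.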
At 3 GHz, 1 in a billion is 3 times a second by Anonymous Coward · 2016-01-11 09:47 · Score: 5, Interesting

Just saw this video
https://www.youtube.com/watch?v=eDmv0sDB1Ak
Gives some insight into the insanely complex nature of processor design and how absurdly reliable processors need to be. Modern computers pretty much expect the CPU to be flawless, and that's a daunting task considering their complexity and the staggering number of computations they perform even in ordinary day-to-day use.
An error that occurs once in a billion operations will happen 3 times a second at 3 GHz.
So yeah. Some bugs are gonna happen. Thankfully most can be fixed with microcode updates.
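The arithmetic behind that figure can be sketched in a few lines of Python. This assumes one operation per clock cycle, which is a simplification (real cores retire several instructions per cycle), and the function name is made up for the example.

```python
def expected_errors_per_second(clock_hz: int, ops_per_error: int) -> float:
    """Expected errors per second if one operation in `ops_per_error` fails.

    Assumes exactly one operation per clock cycle (a simplification).
    """
    return clock_hz / ops_per_error


if __name__ == "__main__":
    per_second = expected_errors_per_second(3_000_000_000, 1_000_000_000)
    per_day = per_second * 86_400
    print(f"{per_second:.0f} errors per second, {per_day:,.0f} per day")
```

Turned around, getting down to one error per day at the same clock speed would need a failure rate of roughly one in 2.6 * 10^14 operations.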
Re:Prime95 is now an industry? by slew · 2016-01-11 21:24 · Score: 4, Informative

FWIW, your "mathematical" explanation is totally bogus. You appear to have literally no idea what you are saying.
The reason the FFT works for modular multiplication of *integers* with thousands of bits is that you can pick a radix and a convolution size where you do multi-digit convolution where you don't lose any precision in those thousands of bits. Using a "logarithm" algorithm would require nearly 10x the precision to do modular multiplication on integers and using hw floating point (even long doubles) would be totally useless because it isn't accurate to more precision.
Also, addition and multiplication in the time domain does NOT magically become multiplication and addition in the frequency domain. Convolution in the time domain becomes multiplication in the frequency domain (that's how the FFT algorithm works: FFT multiply iFFT becomes cheaper than digit convolution when the size of the problem becomes large).
Finally, although it might be technically possible to use the DCT from a typical video decoder to do some trivial digit convolution, the precision of a typical video decoder's DCT is only 14-16 bits and limited to 8 points, which isn't enough precision to do squat for the modular multiplication needed to search for very large Mersenne primes (which is what the Prime95 program does). Of course, you can't even get to the 1D DCT used in GPU hardware accelerators (they are generally hardwired to do 2D DCT only, and modern compression algorithms don't even use the DCT anymore).
Sorry to rain on your parade, but leaving stream of consciousness BS like that around unchallenged risks it getting modded up and makes it harder for people to distinguish the real shit from the BS...
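For readers who want to see the "FFT, multiply, iFFT" idea concretely, here is a toy Python sketch that multiplies two integers by convolving their digits. It is not Prime95's code: the base-10 radix, the naive recursive FFT and the function names are all assumptions chosen for readability, and it relies on double-precision rounding being exact, which only holds for modest sizes.

```python
import cmath


def fft(values, invert=False):
    """Recursive radix-2 FFT; len(values) must be a power of two.

    With invert=True the result is the unnormalized inverse transform.
    """
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], invert)
    odd = fft(values[1::2], invert)
    sign = 1 if invert else -1
    result = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        result[k] = even[k] + twiddle
        result[k + n // 2] = even[k] - twiddle
    return result


def fft_multiply(a: int, b: int) -> int:
    """Multiply two non-negative integers via digit convolution."""
    digits_a = [int(d) for d in reversed(str(a))]  # least significant first
    digits_b = [int(d) for d in reversed(str(b))]
    size = 1
    while size < len(digits_a) + len(digits_b):
        size *= 2
    spectrum_a = fft(digits_a + [0] * (size - len(digits_a)))
    spectrum_b = fft(digits_b + [0] * (size - len(digits_b)))
    # Pointwise product in the frequency domain == convolution of the digits.
    product = [x * y for x, y in zip(spectrum_a, spectrum_b)]
    raw = fft(product, invert=True)
    # Normalize, then round away floating-point noise to recover integers.
    coefficients = [round(c.real / size) for c in raw]
    # Summing with place values performs the carry propagation.
    return sum(c * 10**i for i, c in enumerate(coefficients))


if __name__ == "__main__":
    x, y = 123456789, 987654321
    print(fft_multiply(x, y) == x * y)
```

The rounding step is where radix and transform size matter: each convolution output must stay small enough that floating-point error cannot push it to the wrong integer, which is the precision constraint the comment above is talking about.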