Intel Skylake Bug Causes PCs To Freeze During Complex Workloads (arstechnica.com)

Posted by samzenpus on Monday January 11, 2016 @09:00AM from the protect-ya-neck dept.

chalsall writes: Intel has confirmed an in-the-wild bug that can freeze its Skylake processors. The company is pushing out a BIOS fix. Ars reports: "No reason has been given as to why the bug occurs, but it's confirmed to affect both Linux and Windows-based systems. Prime95, which has historically been used to benchmark and stress-test computers, uses Fast Fourier Transforms to multiply extremely large numbers. A particular exponent size, 14,942,209, has been found to cause the system crashes. While the bug was discovered using Prime95, it could affect other industries that rely on complex computational workloads, such as scientific and financial institutions. GIMPS noted that its Prime95 software 'works perfectly normal' on all other Intel processors of past generations."

Re:Deja Voo of the Pentium 5 FDIV bug by Junta · 2016-01-11 09:09 · Score: 5, Informative

Well 'Deja Vu' and you can leave '5' off.
For an analogous screw-up, you only need to look at Haswell/Broadwell and the TSX feature, which they retroactively disabled due to a defect.
The FDIV bug was noteworthy because the state of things was such that they didn't have much recourse other than replacing the processors. We haven't seen a defect since that forced a physical recall of processors at that scale, though a number of issues would have been similarly disastrous if not for the fact that they could push a microcode change to disable something or work around it...

--
XML is like violence. If it doesn't solve the problem, use more.
Re:Deja Voo of the Pentium 5 FDIV bug by 110010001000 · 2016-01-11 09:14 · Score: 4, Informative

All processors have bugs. Some are fixed and some are not. You can obtain errata sheets from the manufacturers. At least this one is easily fixable.
Re:Lack of competition by Moof123 · 2016-01-11 09:17 · Score: 5, Informative

Go see page 21 for example:
http://www.intel.com/content/d...
When hardware must just work by Puff_Of_Hot_Air · 2016-01-11 09:31 · Score: 5, Informative

This is a really interesting talk from 32c3 detailing the challenges involved in designing and verifying something as complex as a CPU, where it can only be simulated at 1 Hz and costs 5 million to produce silicon for testing. https://www.youtube.com/watch?v=eDmv0sDB1Ak. The level of difficulty in getting this right just blows my mind. If it weren't for economies of scale, CPUs would be completely out of reach. Also interesting in the talk is the vast number of CPU defects that are found and cataloged that most people appear to be unaware of. Most are of little importance (and hence don't get fixed), and some are fixed via code (as in this case), but there is no guarantee that these fixes are being rolled out by OEMs.
Re:Prime95 is now an industry? by tlhIngan · 2016-01-11 18:33 · Score: 1, Informative

That's the technical explanation, but the mathematical one is actually fairly simple - you convert the multiplication to an addition. There are several ways to do this - logarithms are one common way (A*B = inverse log(log(A) + log(B)) ), but so is convolution, or realizing that addition and multiplication in say, the time domain becomes multiplication and addition in the frequency domain, respectively.
So if you have two numbers, you do the FFT of them to convert the domains, then you add them up, and then do the inverse FFT. The FFT is not the only way - the DCT is another way (the FFT is a Fourier transform optimized for computers that uses sines, while the DCT uses cosines). You might use the DCT if you have, say, DCT hardware available like on a GPU (video encoders and decoders generally use the DCT over the FFT as the DCT's first parameter gets you the DC level).
Re:Prime95 is now an industry? by slew · 2016-01-11 21:24 · Score: 4, Informative

FWIW, your "mathematical" explanation is totally bogus. You appear to have literally no idea what you are saying.
The reason the FFT works for modular multiplication of *integers* with thousands of bits is that you can pick a radix and a convolution size where you do multi-digit convolution where you don't lose any precision in those thousands of bits. Using a "logarithm" algorithm would require nearly 10x the precision to do modular multiplication on integers and using hw floating point (even long doubles) would be totally useless because it isn't accurate to more precision.
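To make the precision point concrete, here is a minimal sketch, assuming Python's standard `math` module and ordinary IEEE double-precision floats. The function name `multiply_via_logs` and the two operands are made up for illustration. It only shows that a hardware double cannot carry all the digits of a large product; it does not try to measure how much extra precision a logarithm-based method would actually need.

```python
import math

def multiply_via_logs(a, b):
    """Approximate a*b as exp(log(a) + log(b)) using double precision."""
    return round(math.exp(math.log(a) + math.log(b)))

# Two arbitrary odd 30-digit integers (illustrative values only).
a = 123456789012345678901234567891
b = 987654321098765432109876543211

exact = a * b                     # Python integers are arbitrary precision
approx = multiply_via_logs(a, b)  # limited to a 53-bit significand

# The exact product is odd, but any double this large is an even integer,
# so the two values cannot be equal.
relative_error = abs(approx - exact) / exact
print(exact == approx)
print(relative_error)
```

The leading digits agree, but an exact modular multiplication needs every digit, so a tiny relative error is still a wrong answer.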
Also, addition and multiplication in the time domain do NOT magically become multiplication and addition in the frequency domain. Convolution in the time domain becomes multiplication in the frequency domain (that's how the FFT algorithm works: FFT, multiply, iFFT becomes cheaper than digit convolution when the size of the problem becomes large).
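As a rough illustration of the FFT, multiply, iFFT idea, here is a minimal sketch in plain Python. It is not how Prime95 is implemented: the base-10 digits, the textbook recursive radix-2 FFT, and the names `fft`, `to_digits`, and `fft_multiply` are all assumptions chosen for readability. The digits of each number are treated as a sequence, the sequences are convolved via the FFT, and the rounded result is recombined into an integer.

```python
import cmath

def fft(values, invert=False):
    """Recursive radix-2 Cooley-Tukey FFT; len(values) must be a power of two.
    The inverse transform is left unnormalized (caller divides by n)."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], invert)
    odd = fft(values[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def to_digits(n, base):
    """Little-endian digit list of a non-negative integer."""
    digits = []
    while n:
        n, r = divmod(n, base)
        digits.append(r)
    return digits or [0]

def fft_multiply(a, b, base=10):
    """Multiply non-negative integers by convolving their digits via the FFT."""
    da, db = to_digits(a, base), to_digits(b, base)
    size = 1
    while size < len(da) + len(db):  # room for the full linear convolution
        size *= 2
    fa = fft(da + [0] * (size - len(da)))
    fb = fft(db + [0] * (size - len(db)))
    # Pointwise product in the frequency domain == convolution of the digits.
    conv = fft([x * y for x, y in zip(fa, fb)], invert=True)
    coeffs = [round(c.real / size) for c in conv]
    # Weighting by powers of the base absorbs the carries.
    return sum(c * base**i for i, c in enumerate(coeffs))

print(fft_multiply(123456789, 987654321) == 123456789 * 987654321)
```

The rounding step only works while the floating-point error in each coefficient stays below 0.5, which is why the radix and transform size have to be chosen with care for operands with millions of digits.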
Finally, although it might be technically possible to use a DCT used in a typical video decoder to do some trivial digit convolution, the precision of a typical video decoder's DCT is only 14-16 bits and limited to 8 points, which isn't enough precision to do squat for the modular multiplication needed to search for very large Mersenne primes (which is what the Prime95 program does). Of course you can't even get to the 1D DCT used in GPU hardware accelerators (they are generally hardwired to do 2D DCT only and modern compression algorithms don't even use the DCT anymore).
Sorry to rain on your parade, but leaving stream of consciousness BS like that around unchallenged risks it getting modded up and makes it harder for people to distinguish the real shit from the BS...