Intel Skylake Bug Causes PCs To Freeze During Complex Workloads (arstechnica.com)
chalsall writes: Intel has confirmed an in-the-wild bug that can freeze its Skylake processors. The company is pushing out a BIOS fix. Ars reports: "No reason has been given as to why the bug occurs, but it's confirmed to affect both Linux and Windows-based systems. Prime95, which has historically been used to benchmark and stress-test computers, uses Fast Fourier Transforms to multiply extremely large numbers. A particular exponent size, 14,942,209, has been found to cause the system crashes. While the bug was discovered using Prime95, it could affect other industries that rely on complex computational workloads, such as scientific and financial institutions. GIMPS noted that its Prime95 software 'works perfectly normal' on all other Intel processors of past generations."
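For anyone wondering what "uses Fast Fourier Transforms to multiply extremely large numbers" means in practice, here is a minimal sketch of the general idea. It is not Prime95's code, which is heavily hand-optimized. The function names `fft` and `fft_multiply` are made up for illustration, and splitting the numbers into decimal digits is an assumption chosen for readability.

```python
import cmath

def fft(values, invert=False):
    """Recursive radix-2 FFT. len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], invert)
    odd = fft(values[1::2], invert)
    sign = -1 if invert else 1
    out = [0j] * n
    for k in range(n // 2):
        # Twiddle factor; the inverse transform uses the conjugate.
        w = cmath.exp(sign * 2j * cmath.pi * k / n)
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out

def fft_multiply(x, y):
    """Multiply two non-negative integers via FFT convolution of their digits."""
    a = [int(d) for d in reversed(str(x))]  # least significant digit first
    b = [int(d) for d in reversed(str(y))]
    n = 1
    while n < len(a) + len(b):  # pad so the convolution does not wrap around
        n *= 2
    fa = fft(a + [0] * (n - len(a)))
    fb = fft(b + [0] * (n - len(b)))
    product = fft([p * q for p, q in zip(fa, fb)], invert=True)
    # Scale by 1/n and round the floating-point results back to integers.
    coeffs = [round(c.real / n) for c in product]
    # Summing with place values takes care of the carries.
    return sum(c * 10 ** i for i, c in enumerate(coeffs))
```

Calling `fft_multiply(x, y)` should agree with Python's built-in `x * y` for inputs of modest size. For very long inputs the floating-point rounding step is where a scheme like this can go wrong, so this toy version should not be trusted at anything like the sizes Prime95 handles.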
Well 'Deja Vu' and you can leave '5' off.
For an analogous screw-up, you only need to look at Haswell/Broadwell and the TSX feature, which they retroactively disabled due to a defect.
The FDIV bug was noteworthy because, at the time, they didn't have much recourse other than replacing the processors. We haven't seen a defect that forced a physical recall at that scale since, though a number of issues would have been similarly disastrous if they couldn't push a microcode change to disable something or work around it...
XML is like violence. If it doesn't solve the problem, use more.
Go see page 21 for example:
http://www.intel.com/content/d...
This is a really interesting talk from 32c3 detailing the challenges involved in designing and verifying something as complex as a CPU, where it can only be simulated at 1 Hz and costs 5 million to produce silicon for testing: https://www.youtube.com/watch?v=eDmv0sDB1Ak The level of difficulty in getting this right just blows my mind. If it weren't for economies of scale, CPUs would be completely out of reach. Also interesting in the talk is the vast number of CPU defects that are found and cataloged, which most people appear to be unaware of. Most are of little importance (and hence don't get fixed). Some are fixed via code (as in this case), but there is no guarantee that OEMs are actually shipping these patches.