Intel Skylake Bug Causes PCs To Freeze During Complex Workloads (arstechnica.com)
chalsall writes: Intel has confirmed an in-the-wild bug that can freeze its Skylake processors. The company is pushing out a BIOS fix. Ars reports: "No reason has been given as to why the bug occurs, but it's confirmed to affect both Linux and Windows-based systems. Prime95, which has historically been used to benchmark and stress-test computers, uses Fast Fourier Transforms to multiply extremely large numbers. A particular exponent size, 14,942,209, has been found to cause the system crashes. While the bug was discovered using Prime95, it could affect other industries that rely on complex computational workloads, such as scientific and financial institutions. GIMPS noted that its Prime95 software "works perfectly normal" on all other Intel processors of past generations."
Old-timers will remember the Pentium FDIV bug, where the chip sometimes yielded incorrect results for certain floating-point divisions.
Hulk SMASH Celiac Disease
If you saw the actual errata list for processors on launch day, regardless of manufacturer, your jaw would drop. A lot of nasties get cleaned up in subsequent revisions (mask changes), but in the meantime patches show up for the BIOS, libraries, and compilers so that the user never sees the warts. With billions of transistors there will be design errors that even Intel will not catch during verification or characterization. The fact that a BIOS fix will take care of it is a sign that it is not that egregious.
If you want to avoid this kind of stuff you should wait a few months after any major shakeup to buy.
It probably just means the NSA is already using your processor's compute capacity as part of their vast decryption botnet. The fix should improve resource management so you won't notice it in future.
Go see page 21 for example:
http://www.intel.com/content/d...
This is a really interesting talk from 32c3 detailing the challenges involved in designing and verifying something as complex as a CPU, where it can only be simulated at 1 Hz and costs 5 million to produce silicon for testing. https://www.youtube.com/watch?v=eDmv0sDB1Ak. The level of difficulty in getting this right just blows my mind. If it weren't for economies of scale, CPUs would be completely out of reach. Also interesting in the talk is the vast number of CPU defects that are found and cataloged that most people appear to be unaware of. Most are of little importance (and hence don't get fixed), and some are fixed via code (as in this case), but there is no guarantee that these are being patched by OEMs.
Just got an MSI with 32GB of RAM and the Skylake processor because I need to manipulate large AutoCAD files. For no reason my laptop would lock up and nothing would be in the dump logs. I could not figure it out...until now.
As with software, one should wait until the product has had its first revision.
Oddly, we think of Intel CPUs and chipsets as rock solid and operating systems as garbage, based on Vista, ME, and 8.1. Perhaps doing the same and buying older hardware would be wise too.
I was disappointed in my Gigabyte board, for example, and the same with Asus when Z97 Haswell was new. Both are top brands, but both were extremely unstable and buggy. The Asus Sabertooth is unusable, and the Gigabyte got somewhat stable after 4 updates.
http://saveie6.com/
Isn't it easier to distribute new firmware with microcode_ctl/intel-microcode packages? MS-Windows also seems to have some such package updates.
The CPU makes the PC freeze? If they could just crank this bug down a bit it could revolutionize the server cooling industry.
lucm, indeed.
Just saw this video
https://www.youtube.com/watch?v=eDmv0sDB1Ak
Gives some insight into the insanely complex nature of processor design and how absurdly reliable they need to be. Modern computers pretty much expect the CPU to be flawless, and that's a daunting task considering their complexity and the staggering number of computations they perform even in ordinary day-to-day use.
An error that occurs once in a billion operations will happen 3 times a second at 3 GHz.
So yeah. Some bugs are gonna happen. Thankfully most can be fixed with microcode updates.
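For anyone who wants to check the "3 times a second" figure, here is a back-of-the-envelope sketch. It assumes one operation per clock cycle and a bug that strikes uniformly at random, neither of which is claimed above; the function name is made up for illustration.

```python
def expected_errors_per_second(ops_per_second, ops_per_error):
    """Expected error count per second, assuming errors are spread evenly.

    Assumption: one operation retires per clock cycle, so a 3 GHz clock
    means 3e9 operations per second. Real CPUs retire more or fewer.
    """
    return ops_per_second / ops_per_error


# 3 GHz clock, one error per billion operations.
rate = expected_errors_per_second(3_000_000_000, 1_000_000_000)
print(rate)
```

The same function shows why rarer triggers matter: at one error per 10^15 operations the expected gap between errors stretches to several days of continuous running.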
Surprising, I expected in-silicon code to be more robustly tested prior to getting released. Turns out, code is code.
Everything is getting faster. Development cycles are getting shorter, schedules are getting tighter, margins are being trimmed down and testing is taking some of that hit. Software is already brutally paced to the point that customers are now performing QA. We're having to train our customers how to use Bugzilla and we somehow accept this as "Ok". Eventually the pacing will become so brutal that version 2 won't even use the same codebase as version 1. Posting bugs will become useless. Software development velocity is such that no-one wants to write long-lived code anymore.
Once hardware reaches this breakneck prototyping velocity it's going to be the same thing. Defects will become more common. Revisions will become more common. Just hope they don't tell us to change out the mobo each time or we'll never get anything working. Even if the time between revisions stays the same the complexity is going up and I'd expect they're pulling all-nighters just to keep pace. Risk goes up accordingly.
indeed. Streng was bold.
http://blogs.sciencemag.org/pipeline/archives/2010/02/23/things_i_wont_work_with_dioxygen_difluoride
see also
https://what-if.xkcd.com/40/
Thanks, that got me one step closer, and then I found these on the Intel website:
http://www.intel.com/content/www/us/en/processors/core/core-i3-processor.html?wapkw=skylake
http://www.intel.com/content/www/us/en/test/manju-test/core-i5-processor.html?wapkw=skylake
So, it looks like the processor names (what I can find in the system specs) are i3-6x00 and i5-6x00, etc.
I don't have anything that new, so I am OK.
For a given value of performance expectation, as purchased.
One might be a little bit cheesed to discover that the entire hardware floating point subsystem has been replaced with on chip emulator, which additionally wires down half of your L2 cache to host the microcode execution vectors and/or byte codes.
In the spirit of good will and transparency, I hope to see Intel recirculate the original sample chips to all the hardware review websites (whose benchmarks are still found all over the internet) so that these websites can all update their benchmarks (and conclusions, if necessary) to the new Skylake post-BIOS performance reality.
Admittedly, it's not a large hope.
Apple calls these sorts of things "firmware updates" (yes, that is a generic name). Things like this are included, as are things like updates for Ethernet chipsets, FireWire routers (there are 3 on the Mac Pro), and even, rarely, firmware for the GPU. Additionally there are sometimes "SMC" updates for the part of the computer that manages power and sleep behavior.
You think the processor companies have the time or budget to do exhaustive testing?
and run simultaneously on 7.9335 threads, too!
if this is supposed to be a new economy, how come they still want my old fashioned money?
This is awesome!. I so rarely get a chance to use the phrase correlation != causation on Slashdot! (Also, I have some awesome swamp^H^H^H^H^Hland for sale, cheap!)
Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
Guilty as charged, but I'm going to go with "Everyone, I found the OS X manifestation of this bug!!"
Correct, but for the wrong reason:
There are currently no Apple products that utilize a Skylake CPU.
I was incorrect. The 2015 iMac has a Skylake CPU.
How exactly does one use "Fast Fourier Transforms to multiply extremely large numbers" and when exactly did Prime95 become an industry?
The most common way to multiply numbers larger than the register size of the machine (e.g., 4000-bit numbers) is to write each number as a string of digits relative to some base R, the same way most people multiply numbers of more than one digit.
(c0 + c1*R+ c2*R^2 + c3*R^3 + ...) * (d0 + d1*R+ d2*R^2 + d3*R^3 + ...) = (p0 + p1*R+ p2*R^2 + p3*R^3 + ...)
Here R is 10 for humans; for a computer, R is some power of 2 (because computers like that).
A basic observation of the math is that product of digits computed this way is very similar to a linear convolution of those digits (coefficients in this representation) and you can speed up large convolutions using an FFT. If you pick R small enough, you can do the multiplication and all the partial products together without any rounding problems using the SSE/AVX SIMD floating point math on your x86-64 computer.**
Prime95 is a freeware app used by GIMPS that uses this FFT technique to multiply large numbers together very quickly, and it is a big stress on the CPU because the code is highly optimized.
Nobody claimed Prime95 is an "industry", but other industries that rely on Skylake processors to do complex operations might be affected by the same bug Prime95 has triggered.
**Interestingly, straightforward integer multiplication is slower than floating point for certain precisions in nearly all x86-64 implementations. Because of the premium on SSE/AVX speed, Intel has invested more in 32-bit FP math (a 24-bit mantissa multiplier for FP) than in 32-bit int math (32-bit x 32-bit -> 64-bit int multipliers are much bigger).
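To make the digits-as-convolution idea concrete, here is a minimal pure-Python sketch. It is not Prime95's code (which is hand-tuned assembly); the names fft, to_digits, and fft_multiply and the choice R = 256 are assumptions made for illustration. It splits two non-negative integers into base-R digits, convolves the digit lists with a textbook recursive FFT, rounds back to integers, and re-evaluates the product polynomial at R.

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)  # twiddle factor
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out


def to_digits(x, base):
    """Little-endian digits of a non-negative integer in the given base."""
    digits = []
    while x:
        x, r = divmod(x, base)
        digits.append(r)
    return digits or [0]


def fft_multiply(x, y, base=1 << 8):
    """Multiply non-negative integers via FFT convolution of their digits.

    Assumption: base and operand sizes are small enough that the
    double-precision rounding error stays below 0.5 per coefficient.
    """
    c = to_digits(x, base)
    d = to_digits(y, base)
    n = 1
    while n < len(c) + len(d):  # room for the full linear convolution
        n *= 2
    fc = fft(c + [0] * (n - len(c)))
    fd = fft(d + [0] * (n - len(d)))
    prod = fft([u * v for u, v in zip(fc, fd)], invert=True)
    coeffs = [round(p.real / n) for p in prod]  # unscaled inverse, so divide by n
    # Horner evaluation at R; this also absorbs the carries.
    result = 0
    for coeff in reversed(coeffs):
        result = result * base + coeff
    return result


print(fft_multiply(12345, 6789) == 12345 * 6789)
```

The rounding step is where "pick R small enough" matters: with a larger base or much longer operands the floating-point error can exceed 0.5 and the rounded coefficients come out wrong, which is why production code checks the round-off error as it goes.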
I find neither Gigabyte nor Asus to be "top" motherboard manufacturers. At best they are premium value boards (cheap boards with some premium features enabled). I have found them consistently to be buggy and sometimes even outright useless. The last time I bought them, I actually returned an Asus board because it 'supported' ECC RAM but didn't actually implement it (simply disabled it).
I buy SuperMicro boards, not always on the edge but consistently configurable and very good support if any bugs do arise. I've had decent luck with Via boards way back in the day and MSI/Tyan as well.
Custom electronics and digital signage for your business: www.evcircuits.com
The only difference between the low end and high end chips is the number of flaws in the core coming off the die. It's impossible to get a consistent yield on a wafer. Minor electrical variances, impurities in the materials, flaws in the machines that do the manufacturing, etc. The chip maker has to test each and every chip that is produced to sort them into a wide variety of performance bins. The ones that have the fewest flaws and can run the fastest get put in the most expensive bins. The ones with flaws in the cache and inoperative cores get dumped in the cheap bin. And everything in between.
So really, they only have to test one design to root the bugs out. And the test applies to all of the grades of chips coming off the line.
Even so, it's impossible to fully test the chip before it goes to market. So they have to decide to test it to a "good enough that we can patch it in BIOS or software patches" level.
That is why I never buy the (new/lat)est stuff. I'll get the old and more stable stuff.
Ant(Dude) @ Quality Foraged Links (AQFL.net) & The Ant Farm (antfarm.ma.cx / antfarm.home.dhs.org).
Most processor bugs have nothing to do with the frequency of execution; they're caused by a unique set of circumstances. So when someone says a bug will happen once out of every billion operations, they're assuming that you will set up that unique case once out of every billion times. This depends heavily on what you're doing with the processor. For example, this bug is in a math-related operation, and chances are that if you put the chip in one of Google's or Netflix's web servers it would never hit the bug for the duration of its use, even though it's getting hammered... because they're not doing math operations of this nature. However, a math major may hit it 2-3 times a week doing their homework (I was in college during the FDIV bug, my g/f at the time was an engineering student who had a statics simulation that triggered it... I thought it was cool... she did not :p)
That's the technical explanation, but the mathematical one is actually fairly simple - you convert the multiplication to an addition. There are several ways to do this - logarithms are one common way (A*B = inverse log(log(A) + log(B)) ), but so is convolution, or realizing that addition and multiplication in say, the time domain becomes multiplication and addition in the frequency domain, respectively.
So if you have two numbers, you do the FFT of them to convert the domains, then you add them up, and then do the inverse FFT. The FFT is not the only way - the DCT is another way (the FFT is an optimized for computers Fourier transform using sines, while DCT uses cosines). You might use the DCT if you have say, DCT hardware available like on a GPU (video encoders and decoders generally use the DCT over the FFT as the DCT's first parameter gets you the DC level)
FWIW, your "mathematical" explanation is totally bogus. You appear to have literally no idea what you are saying.
The reason the FFT works for modular multiplication of *integers* with thousands of bits is that you can pick a radix and a convolution size where you do multi-digit convolution where you don't lose any precision in those thousands of bits. Using a "logarithm" algorithm would require nearly 10x the precision to do modular multiplication on integers and using hw floating point (even long doubles) would be totally useless because it isn't accurate to more precision.
Also, addition and multiplication in the time domain does NOT magically become multiplication and addition in the frequency domain. Convolution in the time domain becomes multiplication in the frequency domain (that's how the FFT algorithm works: FFT multiply iFFT becomes cheaper than digit convolution when the size of the problem becomes large).
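A quick numerical check of that point, as a sketch with a naive O(n^2) DFT and made-up four-point inputs (the helper names are illustrative only): the DFT of a circular convolution matches the pointwise product of the DFTs, while the DFT of a pointwise product does not match the sum of the DFTs.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform, O(n^2), for illustration only."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def circular_convolve(a, b):
    """Circular convolution of two equal-length sequences."""
    n = len(a)
    return [sum(a[j] * b[(i - j) % n] for j in range(n)) for i in range(n)]


a = [1, 2, 3, 0]
b = [4, 5, 0, 0]

# Convolution theorem: DFT(a conv b) == DFT(a) * DFT(b), pointwise.
lhs = dft(circular_convolve(a, b))
rhs = [u * v for u, v in zip(dft(a), dft(b))]
max_err = max(abs(p - q) for p, q in zip(lhs, rhs))

# The claim being rebutted: pointwise product in time vs. sum in frequency.
time_product = dft([u * v for u, v in zip(a, b)])
freq_sum = [u + v for u, v in zip(dft(a), dft(b))]
mismatch = max(abs(p - q) for p, q in zip(time_product, freq_sum))

print(max_err < 1e-9, mismatch > 0.5)
```

Addition is the one operation that does carry over unchanged, since the transform is linear: the DFT of a sum is the sum of the DFTs.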
Finally, although it might be technically possible to use the DCT from a typical video decoder to do some trivial digit convolution, the precision of a typical video decoder's DCT is only 14-16 bits and limited to 8 points, which isn't enough precision to do squat for the modular multiplication needed to search for very large Mersenne primes (which is what the Prime95 program does). Of course, you can't even get to the 1D DCT used in GPU hardware accelerators (they are generally hardwired to do 2D DCT only, and modern compression algorithms don't even use the DCT anymore).
Sorry to rain on your parade, but leaving stream of consciousness BS like that around unchallenged risks it getting modded up and makes it harder for people to distinguish the real shit from the BS...
While the bug was discovered using Prime95, it could affect other industries that rely on complex computational workloads, such as scientific and financial institutions.
How about porn?
Please login to access my lawn
Simulation testing is very difficult. It is many orders of magnitude slower than the actual device. At some point, you have to ask "should we do 2 more months of simulation on this or just spend a million $ or so to fabricate some samples with the newest tiny geometry?" So you fabricate, find 50 errors missed in simulation, fix those and start simulations again. Fabricate again (whoops! there goes another million) and find that there are flaws caused by the fixes, flaws hidden by the previous flaws, newly discovered flaws, and yet there will still be flaws that won't be easily found, or found soon.
With each new fabrication the pressure builds to market a chip that at least has known bugs that can be worked around. Customers want faster chips and more features, and they're somewhat willing to work around bugs to have what they want. Rather than wait for another multi-month cycle that angers fans and customers and costs money, the manufacturer ships and crosses its fingers. We're human beings, and we're doing the best we can, which is very good.
Contribute to civilization: ari.aynrand.org/donate
I've found Gigabyte to be okay, but I've never understood why people like Asus so much. Their stuff is way too flaky and unreliable to command the premium prices you'll pay for it. It's too bad that Intel stopped making motherboards (at least ones in standard form factors). They generally weren't terribly friendly to overclockers and could be a bit conservative on what settings they exposed but they tended to be pretty stable and well supported.
I told him I had canceled the PC orders I placed and would not buy more of them until the situation was resolved. A short while later, Intel changed their tune and also started being more open about the bugs in their processors.
Before I was born, Britain had never had a female prime minister, America had never had a black president, and the Shah still ruled Iran.
My birth clearly changed all of this...
systemd is Roko's Basilisk.
It's incorrect to say that the FFT is "using sines". The FFT uses complex exponentials as basis functions, while the DCT uses real cosine functions. The major practical difference between the two is the discontinuities at the boundaries, which are present in the FFT but absent in the DCT. That's what makes the DCT easier to apply in compression jobs.
A successful API design takes a mixture of software design and pedagogy.
weapons of math destruction. 3
you insensitive clod!