Slashdot Mirror

Nvidia Pascal GP100 GPU To Rock 4 TFLOPS Double Precision, 12 TFLOPS Single Precision Processing Power (techtimes.com)

Posted by BeauHD on Sunday February 21, 2016 @10:06AM from the chick-magnet dept.

New information has emerged regarding Nvidia's Pascal GPU, covering the total compute performance of the much-anticipated FinFET-based chip. Based on a number of slides from an independent researcher, the Nvidia Pascal GP100 features stacked DRAM (1 TB/s) and as much as 12 TFLOPS of single-precision (FP32) compute performance. The flagship GPU is purportedly able to provide four TFLOPS of double-precision (FP64) compute performance as well.

2 of 45 comments

Re:For comparison by thogard · 2016-02-21 11:49 · Score: 5, Interesting

Those numbers make it look like they were using a 32x32 hardware multiplier-adder and the new one uses a 64x64. Multiplying is a great example of how a 2x increase in transistor density from Moore's law can result in something far greater than a 2x real speed increase. To do a 64x64 multiply on an 8-bit CPU (like the 6809, which had an 8x8 multiply instruction) you would have to do 56 separate multiplies (for the significand) and then 16 sums before a number of other sums and shifts to get the exponent normalized. Each of those instructions would take 2 to 11 CPU cycles. A 16-bit hardware multiplier would reduce 56 mul operations to 16, and a 32-bit hardware multiplier would reduce it to 4. The barrel multiplier is often the largest structure in the ALU part of even a modern CPU. They show up on photos of modern chips as the largest rectangular area that isn't cache or memory controllers.
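
To make the counting concrete, here is a minimal Python sketch of schoolbook multiplication built only from small limb-by-limb products. It assumes the significand is padded to 56 bits (7 bytes) and that every pair of limbs is multiplied once, so the count is ceil(bits / limb width) squared. The function name and parameters are made up for illustration. Under those assumptions the 16-bit and 32-bit cases come out at 16 and 4 as quoted, while the 8-bit case comes out at 49 (or 64 for a full 64-bit operand), a little off from the 56 figure above.

```python
def schoolbook_multiply(a, b, operand_bits, limb_bits):
    """Multiply two unsigned integers using only limb_bits x limb_bits products.

    Hypothetical helper for illustration. Returns (product, multiply_count).
    """
    limbs = -(-operand_bits // limb_bits)  # ceiling division
    mask = (1 << limb_bits) - 1

    # Split each operand into little-endian limbs of limb_bits each.
    a_limbs = [(a >> (i * limb_bits)) & mask for i in range(limbs)]
    b_limbs = [(b >> (i * limb_bits)) & mask for i in range(limbs)]

    product = 0
    multiplies = 0
    for i, x in enumerate(a_limbs):
        for j, y in enumerate(b_limbs):
            # One small hardware-sized multiply, shifted into position and summed.
            product += (x * y) << ((i + j) * limb_bits)
            multiplies += 1
    return product, multiplies


if __name__ == "__main__":
    # Two 53-bit values, the significand width of an IEEE 754 double.
    a = (1 << 53) - 1
    b = (1 << 52) + 12345
    for operand_bits in (56, 64):
        for limb_bits in (8, 16, 32):
            product, count = schoolbook_multiply(a, b, operand_bits, limb_bits)
            assert product == a * b
            print(f"{operand_bits}-bit operands, {limb_bits}-bit multiplier: "
                  f"{count} multiplies")
```

The sketch only counts multiplies. The additions, shifts, and exponent normalization mentioned above would come on top of that on a real 8-bit CPU.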

Top supercomputer was 1 TFLOP until late 2000 by Mostly+a+lurker · 2016-02-21 18:44 · Score: 3, Interesting

It is amazing to recall that ASCI Red, the world's top supercomputer from 1997 to 2000, was capable of only just over 1 TFLOPS.