Nvidia Pascal GP100 GPU To Rock 4 TFLOPS Double Precision, 12 TFLOPS Single Precision Processing Power (techtimes.com)

Posted by BeauHD on Sunday February 21, 2016 @10:06AM from the chick-magnet dept.

New information emerged regarding Nvidia's Pascal GPU, covering the total compute performance of the much-anticipated FinFET-based chip. Based on a number of slides from an independent researcher, the Nvidia Pascal GP100 features Stacked DRAM (1 TB/s) giving it as much as 12 TFLOPs of Single-Precision (FP32) compute performance. The flagship GPU is purportedly able to provide four TFLOPs of Double-Precision (FP64) compute performance as well.

Quad SLI Supercomputer For The Desktop. by zenlessyank · 2016-02-21 10:18 · Score: 5, Funny

Stick these in a dual processor 18 core Xeon board with some nice fiber channel flash storage and then we can really play some solitaire.
techtimes - 230 ad elements blocked and counting! by Anonymous Coward · 2016-02-21 10:20 · Score: 2

Shockwave flash has crashed after autoplaying an ad with music. Twice.
Can someone link to a real website?
For comparison by PhrostyMcByte · 2016-02-21 10:52 · Score: 2

This new chip is potentially quite a large step up in raw compute performance. Their current flagship Titan X is pushing 6 TFLOPS of single-precision and 192 GFLOPS of double-precision.
They're clearly aiming high for 4K and VR performance here.
1. Re:For comparison by Arkh89 · 2016-02-21 11:47 · Score: 2
  
  Note that both the Titan and the Titan Z have better DP performance than the Titan X (1 TFLOPS and 1.5 TFLOPS, IIRC). I am hoping that they will stop crippling the DP on their "gaming" boards though (or at least do it to a lesser extent than the current 1/20~1/32x).
  Also, it is nice to see that the global memory bandwidth will go 4x from this generation (~250GB/s).
2. Re:For comparison by thogard · 2016-02-21 11:49 · Score: 5, Interesting
  
  Those numbers make it look like they were using a 32x32 hardware multiplier-adder and the new one uses a 64x64. Multiplying is a great example of how a 2x increase in transistor density from Moore's law can result in something far greater than a 2x real speed increase. To do a 64x64 multiply on an 8-bit CPU (like the 6809, which had an 8x8 multiply instruction) you would have to do 56 separate multiplies (for the significand) and then 16 sums before a number of other sums and shifts to get the exponent normalized. Each of those instructions would take 2 to 11 CPU cycles. A 16-bit hardware multiplier would reduce 56 mul operations to 16, and a 32-bit hardware multiplier would reduce it to 4. The barrel multiplier is often the largest structure in the ALU part of even a modern CPU. They show up on photos of modern chips as the largest rectangular area that isn't cache or memory controllers.
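The scaling described in this comment can be checked with a small sketch. It assumes plain schoolbook (limb-by-limb) multiplication of unsigned integers, which is only one way such a routine could be written; `schoolbook_multiply` and its parameters are names invented for this illustration. Under that assumption the number of partial products is the square of the limb count, which reproduces the 16 and 4 quoted for 16-bit and 32-bit multipliers. For 8-bit limbs the sketch counts 64 for a full 64-bit operand and 49 for a 56-bit one, so the comment's figure of 56 presumably comes from a different accounting.

```python
def schoolbook_multiply(a, b, limb_bits, operand_bits=64):
    """Multiply two unsigned integers limb by limb, counting partial products.

    limb_bits is the width of the (hypothetical) hardware multiplier;
    operand_bits is the width of each input operand.
    """
    limbs = operand_bits // limb_bits
    mask = (1 << limb_bits) - 1
    a_limbs = [(a >> (i * limb_bits)) & mask for i in range(limbs)]
    b_limbs = [(b >> (i * limb_bits)) & mask for i in range(limbs)]

    product = 0
    multiplies = 0
    for i, x in enumerate(a_limbs):
        for j, y in enumerate(b_limbs):
            # One narrow hardware multiply, shifted into position and summed.
            product += (x * y) << ((i + j) * limb_bits)
            multiplies += 1
    return product, multiplies


a, b = 0xFEDCBA9876543210, 0x0123456789ABCDEF
for width in (8, 16, 32, 64):
    result, count = schoolbook_multiply(a, b, width)
    print(f"{width:2d}-bit multiplier: {count:2d} partial products, "
          f"correct={result == a * b}")
```

Each doubling of the multiplier width cuts the partial-product count by a factor of four, which is the "far greater than 2x" effect the comment points at.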
None more TFLOPS by PopeRatzo · 2016-02-21 11:02 · Score: 3, Funny

How many TFLOPS do I need to run the latest AAA games?
All of them.

--
You are welcome on my lawn.
1. Re:None more TFLOPS by Z80a · 2016-02-21 14:45 · Score: 2
  
  And how many to make em actually good?
Re:single precision is for marketing by godrik · 2016-02-21 11:18 · Score: 2

Actually, it really depends on the application. In the Machine Learning community, half-precision is quite popular! For all graphics purposes, single precision is what you need. Only scientific computing really needs double precision.
Re:single precision is for marketing by ShanghaiBill · 2016-02-21 17:49 · Score: 2

"For all graphics purposes, single precision is what you need."
For many graphics applications, half-precision is good enough. FP16 isn't much faster to compute than FP32, but it is a big win for memory bandwidth, which is usually the performance chokepoint for GPUs.
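The bandwidth point can be illustrated on the CPU side with Python's `struct` module, which supports IEEE 754 half precision through the `e` format code. This is only a rough sketch: the sample data is arbitrary, and `pack_floats` and `max_relative_error` are helper names made up for the example. It shows that the same values occupy half the bytes in FP16, at the cost of a much larger rounding error.

```python
import struct


def pack_floats(values, fmt):
    """Pack values little-endian; fmt is 'f' (FP32) or 'e' (FP16)."""
    return struct.pack(f"<{len(values)}{fmt}", *values)


def max_relative_error(values, fmt):
    """Largest relative error after a round trip through the given format."""
    restored = struct.unpack(f"<{len(values)}{fmt}", pack_floats(values, fmt))
    return max(abs(r - v) / abs(v) for v, r in zip(values, restored) if v != 0)


samples = [0.1 * i for i in range(1, 1025)]  # arbitrary test data

fp32_size = len(pack_floats(samples, "f"))
fp16_size = len(pack_floats(samples, "e"))
print("FP32 bytes:", fp32_size)
print("FP16 bytes:", fp16_size)
print("FP32 max relative error:", max_relative_error(samples, "f"))
print("FP16 max relative error:", max_relative_error(samples, "e"))
```

Halving the bytes per value means half the memory traffic for the same number of elements, which is the win the comment describes when bandwidth is the limiting factor.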
Unrelated metrics by Anonymous Coward · 2016-02-21 18:31 · Score: 2, Insightful

"Nvidia Pascal GP100 features Stacked DRAM (1 TB/s) giving it as much as 12 TFLOPs of Single-Precision (FP32) compute performance"
Theoretical memory bandwidth has no impact on theoretical floating point performance.
It would've been better to say something about the core count and clock.
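To make the point concrete, theoretical peak throughput is usually estimated from the core count and the clock alone. The sketch below assumes each core retires one fused multiply-add (counted as 2 floating-point operations) per cycle. The core count and clock used here are illustrative values chosen so the result lands near the roughly 6 TFLOPS quoted for the Titan X elsewhere in this thread; they are not figures taken from the story, and `peak_flops` is a name invented for the example.

```python
def peak_flops(cores, clock_hz, flops_per_core_per_cycle=2):
    """Theoretical peak FLOPS; the default of 2 assumes one FMA per cycle."""
    return cores * clock_hz * flops_per_core_per_cycle


# Illustrative assumptions, not specifications from the source.
ASSUMED_CORES = 3072
ASSUMED_CLOCK_HZ = 1.0e9

fp32_peak = peak_flops(ASSUMED_CORES, ASSUMED_CLOCK_HZ)
fp64_peak = fp32_peak / 32  # assuming the 1/32 DP ratio mentioned in the thread

print(f"FP32 peak: {fp32_peak / 1e12:.3f} TFLOPS")
print(f"FP64 peak: {fp64_peak / 1e9:.1f} GFLOPS")
```

Memory bandwidth does not appear anywhere in the formula, which is the commenter's objection to the summary's wording.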
Top supercomputer was 1 TFLOP until late 2000 by Mostly+a+lurker · 2016-02-21 18:44 · Score: 3, Interesting

It is amazing to recall that the world's top supercomputer ASCI Red from 1997 to 2000 was only capable of just over 1 TFLOP.
FLOPS, not FLOPs by hankwang · 2016-02-21 18:57 · Score: 2

FLoating-point Operations Per Second. It makes no sense to speak of one FLOP, two FLOPs, as the S is not for plural.

--
Avantslash: low-bandwidth mobile slashdot.
Re:single precision is for marketing by zapadnik · 2016-02-21 19:10 · Score: 3, Informative

True. In graphics, single precision is used because it is faster, but it means that some extra work is required to ensure loss of precision doesn't occur. Consider a flight simulator that wants a precision of one millimeter over the circumference of the Earth. Single-precision floating point doesn't cut it; you have to use relative locations for rendering, because you can't just use the full global coordinates you have. However, if the GPU is fast enough at double-precision operations, then you can do everything in global coordinates (e.g. unmodified WGS-84).
Graphics will probably always choose the extra speed of single precision over the ease of use of double. But the advent of faster and faster consumer-grade cards like this might start to change that for some applications. The competition between Nvidia and AMD (and, to a lesser extent, Intel) really benefits consumers and developers. The performance of this card is great news.
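The flight-simulator example can be put into numbers with a short sketch. It assumes an equatorial circumference of about 40,075,017 m and uses `struct` to emulate float32 rounding on the CPU; `to_float32` and `float32_spacing` are helper names invented here, the positions are made up, and `math.ulp` needs Python 3.9 or later.

```python
import math
import struct

EARTH_CIRCUMFERENCE_M = 40_075_017.0  # approximate value, assumed for illustration


def to_float32(x):
    """Round a Python float (double) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def float32_spacing(x):
    """Gap between the float32 nearest to positive x and the next one above it."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    upper = struct.unpack("<f", struct.pack("<I", bits + 1))[0]
    return upper - to_float32(x)


print("float32 spacing at that magnitude (m):", float32_spacing(EARTH_CIRCUMFERENCE_M))
print("float64 spacing at that magnitude (m):", math.ulp(EARTH_CIRCUMFERENCE_M))

# Camera-relative rendering: subtract in double, then hand float32 the small offset.
camera = 40_075_000.0           # hypothetical camera position in metres
obj = 40_075_017.123            # hypothetical object position in metres
offset = obj - camera           # computed in double precision
print("error using global float32 coordinate (m):", abs(to_float32(obj) - obj))
print("error using relative float32 offset (m):", abs(to_float32(offset) - 17.123))
```

At this magnitude adjacent float32 values are metres apart, while the camera-relative offset keeps sub-millimetre accuracy, which is the extra work the comment refers to. Double precision resolves far below a millimetre without that step.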
Re:single precision is for marketing by greenfruitsalad · 2016-02-21 22:06 · Score: 2

real men don't need precision. only brute force.