Slashdot Mirror


Nvidia Pascal GP100 GPU To Rock 4 TFLOPS Double Precision, 12 TFLOPS Single Precision Processing Power (techtimes.com)

New information emerged regarding Nvidia's Pascal GPU, covering the total compute performance of the much-anticipated FinFET-based chip. Based on a number of slides from an independent researcher, the Nvidia Pascal GPU100 features Stacked DRAM (1 TB/s) giving it as much as 12 TFLOPs of Single-Precision (FP32) compute performance. The flagship GPU is purportedly able to provide four TFLOPs of Double-Precision (FP64) compute performance as well.

45 comments

  1. single precision is for marketing by Anonymous Coward · · Score: 0

    the real men need double.

    1. Re:single precision is for marketing by Anonymous Coward · · Score: 0

      At least the card will not cost one million dollars.

    2. Re:single precision is for marketing by ls671 · · Score: 1

      Nah, I am a real man and I get along fine with half-precision.

      --
      Everything I write is lies, read between the lines.
    3. Re:single precision is for marketing by Anonymous Coward · · Score: 0

      Actually the bigger push is for FP16 these days, as superfast calculations are more important than higher levels of precision for many game engines.

    4. Re:single precision is for marketing by godrik · · Score: 2

      Actually, it really depends on the application. In the Machine Learning community, half-precision is quite popular! For all graphics purposes single precision is what you need. Only scientific computing really needs double precision.

    5. Re:single precision is for marketing by Anonymous Coward · · Score: 0

      I worked in VFX briefly. Half-precision was pretty popular there as well.

    6. Re:single precision is for marketing by ShanghaiBill · · Score: 2

      For all graphics purposes single precision is what you need.

      For many graphics applications, half-precision is good enough. FP16 isn't much faster to compute than FP32, but it is a big win for memory bandwidth, which is usually the performance chokepoint for GPUs.
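The bandwidth point above can be made concrete with a minimal Python sketch. It relies only on the standard `struct` module, whose `e`, `f`, and `d` format codes correspond to IEEE 754 half, single, and double precision. The 1024-value buffer and the helper names `packed_size` and `round_trip` are illustrative assumptions, not anything from the thread.

```python
import struct

# struct format codes for IEEE 754 binary16, binary32, and binary64.
FORMATS = {"FP16": "e", "FP32": "f", "FP64": "d"}

def packed_size(values, code):
    """Bytes needed to store `values` in the given struct format."""
    return len(struct.pack(f"<{len(values)}{code}", *values))

def round_trip(x, code):
    """Store x in the given format and read it back as a Python float."""
    return struct.unpack(f"<{code}", struct.pack(f"<{code}", x))[0]

# Hypothetical data in [0, 1); every value fits in FP16 without overflow.
values = [i / 1024 for i in range(1024)]
sizes = {name: packed_size(values, code) for name, code in FORMATS.items()}

for name, code in FORMATS.items():
    # Same number of values, but half the bytes per step down in precision,
    # so half the memory traffic for the same element count.
    print(name, sizes[name], "bytes; 0.1 is stored as", round_trip(0.1, code))
```

Halving the bytes per element halves the traffic for a fixed element count, at the cost of the rounding visible in the stored value of 0.1.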

    7. Re:single precision is for marketing by Shinobi · · Score: 1

      Correction, for some scientific computing needs, double precision is a must. There are other fields where single precision is perfectly fine.

    8. Re:single precision is for marketing by zapadnik · · Score: 3, Informative

      True. In graphics, single precision is used because it is faster, but it means that some extra work is required to ensure loss of precision doesn't occur. Consider a flight simulator that wants a precision of one millimeter over the circumference of the Earth. Single-precision floating point doesn't cut it: you have to use relative locations for rendering; you can't just use the full global coordinates you have. However, if the GPU is fast enough for double-precision operations, then you can do everything in global coordinates (e.g. unmodified WGS-84).

      Graphics will probably always choose the extra speed of single precision over the ease of use of double. But the advent of faster and faster consumer grade cards like this might start to change that for some applications. The competition between NVidia and AMD (and to a lesser extent, Intel) really benefits consumers and developers. The performance of this card is great news.
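The millimeter-over-the-Earth example above can be checked with a small Python sketch. It emulates FP32 by round-tripping through the standard `struct` module. The circumference value, the sample coordinates, and the helper names `to_f32` and `ulp_f32` are assumptions chosen for illustration; a real simulator would do this on the GPU rather than in Python.

```python
import struct

def to_f32(x: float) -> float:
    """Round a Python float (FP64) to the nearest FP32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def ulp_f32(x: float) -> float:
    """Gap between x (rounded to FP32) and the next larger FP32 value.

    Valid for positive, finite x: adjacent FP32 values have adjacent bit patterns.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    next_value = struct.unpack("<f", struct.pack("<I", bits + 1))[0]
    return next_value - to_f32(x)

EARTH_CIRCUMFERENCE_M = 40_075_017.0  # approximate equatorial value, in meters

# Resolution of FP32 for a coordinate near the full circumference.
print("FP32 spacing at global scale:", ulp_f32(EARTH_CIRCUMFERENCE_M), "m")

# Two points one millimeter apart, expressed in global coordinates.
origin = 40_075_000.0
point = 40_075_000.001

# In FP32 global coordinates the two points collapse onto the same value.
collapsed = to_f32(point) == to_f32(origin)

# Relative coordinates: subtract in FP64 first, then hand FP32 the small offset.
local_offset = to_f32(point - origin)
print("collapsed in FP32:", collapsed, "| local offset:", local_offset, "m")
```

Subtracting a nearby origin in double precision before converting is the "relative locations" workaround described above: the small offset keeps its millimeter detail even in FP32.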

    9. Re:single precision is for marketing by greenfruitsalad · · Score: 2

      real men don't need precision. only brute force.

    10. Re:single precision is for marketing by NotDrWho · · Score: 1

      as much as 12 TFLOPs of Single-Precision (FP32) compute performance

      Can someone tell me what this is in cores?

      --
      SJW's don't eliminate discrimination. They just expropriate it for themselves.
    11. Re:single precision is for marketing by Areyoukiddingme · · Score: 1

      FP16 isn't much faster to compute than FP32, but it is a big win for memory bandwidth, which is usually the performance chokepoint for GPUs.

      It used to be. This thing has stacked memory (called High Bandwidth Memory (HBM)) with an absurdly wide memory bus. With four stacks, the memory bus is 4096 bits wide, vs the typical 512 bits over 8 channels of GDDR5. HBM2, which Samsung started producing in volume a month ago, doubles the number of dies in the stacks, from 4 to 8, and doubles the throughput.

      AMD's Zen is being built with the same memory interface. One supposes that a CPU operating on much less homogeneous data won't enjoy a bump the size of the 4X gain between the Titan double precision and this thing's double precision performance, but it should be at least double. Combined with their 40% gain in instructions per clock over their previous generation, and for a couple of months at the end of 2016, AMD might actually be ahead of Intel. It might even last longer than that, if Intel is smart enough to drag their feet before introducing their own stacked memory CPU. AMD needs the life support.

    12. Re: single precision is for marketing by Redbehrend · · Score: 1, Interesting

      Funny how nvidia won't share anything or help the market but they'll take everyone else's tech and use it lol. I want karma to catch up to nvidia so maybe they'll think twice about trying to pull one or charge high prices on its customers again. Only reason customers needed the new 900 series is because they pulled some seriously shady stuff with the studios.

  2. Quad SLI Supercomputer For The Desktop. by zenlessyank · · Score: 5, Funny

    Stick these in a dual processor 18 core Xeon board with some nice fiber channel flash storage and then we can really play some solitaire.

    1. Re:Quad SLI Supercomputer For The Desktop. by Anonymous Coward · · Score: 0

      To hell with solitaire.

      I want to play pool via the solar system, simulated with actual physics.
      Give me 6 blackholes and a whitehole.

      So what is it?

    2. Re:Quad SLI Supercomputer For The Desktop. by Anonymous Coward · · Score: 0

      Minesweeper

  3. techtimes - 230 ad elements blocked and counting! by Anonymous Coward · · Score: 2

    Shockwave flash has crashed after autoplaying an ad with music. Twice.

    Can someone link to a real website?

  4. Re:techtimes - 230 ad elements blocked and countin by Anonymous Coward · · Score: 0

    I think somebody has a malware infestation.

  5. For comparison by PhrostyMcByte · · Score: 2

    This new chip is potentially quite a large step up in raw compute performance. Their current flagship Titan X is pushing 6 TFLOPS of single-precision and 192 GFLOPS of double-precision.

    They're clearly aiming high for 4K and VR performance here.
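The figures quoted in this thread imply quite different double-to-single precision ratios, which a short Python sketch can work out exactly. It uses only the numbers from the summary and the comment above (converted to GFLOPS); the helper name `dp_to_sp_ratio` is an assumption for illustration.

```python
from fractions import Fraction

def dp_to_sp_ratio(dp_gflops: int, sp_gflops: int) -> Fraction:
    """Exact ratio of double-precision to single-precision throughput."""
    return Fraction(dp_gflops, sp_gflops)

# Titan X as quoted above: 192 GFLOPS DP, 6 TFLOPS SP.
titan_x = dp_to_sp_ratio(192, 6_000)
# Pascal as rumored in the summary: 4 TFLOPS DP, 12 TFLOPS SP.
pascal = dp_to_sp_ratio(4_000, 12_000)
# Generation-over-generation DP gain implied by those same figures.
dp_gain = Fraction(4_000, 192)

print(f"Titan X DP:SP = 1/{float(1 / titan_x):g}")
print(f"Pascal  DP:SP = 1/{float(1 / pascal):g}")
print(f"DP gain over Titan X = {float(dp_gain):.1f}x")
```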

    1. Re:For comparison by Arkh89 · · Score: 2

      Note that both the Titan and the Titan Z have better DP performance than the Titan X (1 TFlops and 1.5 TFlops IIRC). I am hoping that they will stop crippling the DP on their "gaming" board though (or at least doing it to a lesser extent than the current 1/20~1/32x).
      Also, it is nice to see that the global memory bandwidth will go 4x from this generation (~250GB/s).

    2. Re:For comparison by thogard · · Score: 5, Interesting

      Those numbers make it look like they were using a 32x32 hardware multiplier-adder and the new one uses a 64x64. Multiplying is a great example of how a 2x increase in transistor density from Moore's law can result in something far greater than 2x real speed increase. To do a 64x64 multiply in an 8 bit cpu (like the 6809 which had an 8x8 multiply instruction) you would have to do 56 separate multiplies (for the significand) and then 16 sums before a number of other sums and shifts to get the exponent normalized. Each of those instructions would take 2 to 11 cpu cycles. A 16 bit hardware multiplier would reduce 56 mul operations to 16 and a 32 bit hardware multiplier would reduce it to 4. The barrel multiplier is often the largest structure in the ALU part of even a modern CPU. They show up on photos of modern chips as the largest rectangle area that isn't cache or memory controllers.
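The scaling described above can be sketched in Python as a schoolbook multiply that only ever multiplies two narrow "limbs", the way a CPU with a narrow hardware multiplier would. The function name `limb_multiply` and the sample operands are assumptions for illustration. For a full 64x64 product this sketch counts 64 primitive multiplies with 8-bit limbs, which differs from the 56 quoted above (that figure may be counting something narrower than a full 64-bit product); the 16 and 4 for wider multipliers match.

```python
def limb_multiply(a: int, b: int, total_bits: int = 64, limb_bits: int = 8):
    """Schoolbook multiply using only limb_bits x limb_bits products.

    Returns (product, number_of_primitive_multiplies).
    """
    mask = (1 << limb_bits) - 1
    limb_count = total_bits // limb_bits
    # Split each operand into limbs, least significant first.
    a_limbs = [(a >> (i * limb_bits)) & mask for i in range(limb_count)]
    b_limbs = [(b >> (i * limb_bits)) & mask for i in range(limb_count)]

    product = 0
    multiplies = 0
    for i, x in enumerate(a_limbs):
        for j, y in enumerate(b_limbs):
            # One narrow hardware multiply, then a shift and an add.
            product += (x * y) << ((i + j) * limb_bits)
            multiplies += 1
    return product, multiplies

a, b = 0xFEDCBA9876543210, 0x0123456789ABCDEF  # arbitrary 64-bit operands
counts = {}
for width in (8, 16, 32, 64):
    result, counts[width] = limb_multiply(a, b, limb_bits=width)
    assert result == a * b  # every limb width must give the same product
    print(f"{width:2d}-bit multiplier: {counts[width]} primitive multiplies")
```

Doubling the multiplier width cuts the number of primitive multiplies by four, which is why a wider hardware multiplier pays off by more than the 2x its width suggests.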

    3. Re:For comparison by Junta · · Score: 1

      Note that Maxwell consciously screwed DP performance. You have to go back to Kepler for decent DP.

      --
      XML is like violence. If it doesn't solve the problem, use more.
    4. Re:For comparison by Anonymous Coward · · Score: 0

      I am hoping that they will stop crippling the DP on their "gaming" board though (or at least doing it to a lesser extent than the current 1/20~1/32x).

      Fat chance, considering AMD started doing the exact same thing (a 2013 Hawaii is 1/8 DP rate as 290X/390X, yet 1/2 DP when it's a FirePro ...).

      Also, it is nice to see that the global memory bandwidth will go 4x from this generation (~250GB/s).

      Yup, pretty much confirms the rumors, 4096 bit HBM2 @ 1GHz.
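The bus-width figure above lines up with the 1 TB/s in the summary, as a minimal Python sketch shows. It assumes that "1GHz" means a 1 GHz clock with double data rate, i.e. 2 Gbit/s per pin; that interpretation and the helper name `peak_bandwidth_gb_s` are assumptions, not something stated in the thread.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, gbit_per_pin: float) -> float:
    """Theoretical peak bandwidth in GB/s: pins x per-pin rate, bits to bytes."""
    return bus_width_bits * gbit_per_pin / 8

# Assumption: 1 GHz clock, double data rate, so 2 Gbit/s on each of 4096 pins.
hbm2_gb_s = peak_bandwidth_gb_s(4096, 2.0)

# Compare with the ~250 GB/s quoted above for the current generation.
gain = hbm2_gb_s / 250

print(f"HBM2 peak: {hbm2_gb_s:.0f} GB/s, about {gain:.1f}x the quoted 250 GB/s")
```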

    5. Re:For comparison by AHuxley · · Score: 1

      Yes 4K will be the test at 60 fps, 120 and beyond. No more sli needed :)

      --
      Domestic spying is now "Benign Information Gathering"
    6. Re:For comparison by Anonymous Coward · · Score: 0

      Nope, DP was severely crippled even for Kepler, you are thinking of Fermi. My GTX 580 (Fermi) has better DP performance than even most Maxwell cards. This is the reason you actually see a lot of 580s in server farms that require DP for whatever reason.

    7. Re:For comparison by Anonymous Coward · · Score: 0

      The flaw in this logic is the assumption that a 16bit vs 8bit multiplier only uses 2x the transistors.

    8. Re:For comparison by SScorpio · · Score: 1

      The Titan was 1.5 TFlops, while the Titan Black was 1.7 TFlops. The Z is listed at 2.7 TFlops, but it's two chips on a single card while the others are single-chip.

      I'd also like to see their gaming cards get better DP performance, but I'd be very surprised if we actually got the reported 1/4x. The Titan was 1/3x.

    9. Re:For comparison by Anonymous Coward · · Score: 0

      yes, 8x8 -> 16x16 is roughly 4x the gate-count... but he did say 2x increase in gate density, not count. if talking linear density (which we often do), that is 4x the devices in the same area.

  6. None more TFLOPS by PopeRatzo · · Score: 3, Funny

    How many TFLOPS do I need to run the latest AAA games?

    All of them.

    --
    You are welcome on my lawn.
    1. Re:None more TFLOPS by Z80a · · Score: 2

      And how many to make em actually good?

    2. Re:None more TFLOPS by Shinobi · · Score: 1

      Depends on the game, doesn't it? With X-Plane and other sims, the answer is: ALL OF THEM! since you can make the flight models far more advanced, and also include more advanced radar simulation etc.

  7. Re:techtimes - 230 ad elements blocked and countin by St.Creed · · Score: 1

    I see nothing, but I'm running Ghostery. Ghostery only detects 5 ad-networks, and one of 'm is the twitter button. The rest of the items seem harmless.

    You may, as suggested by AC below, want to remove some malware from your system.

    --
    Therefore, by the (faulty) logic you're using, you're just a cow with a keyboard - osu-neko (2604)
  8. Re:techtimes - 230 ad elements blocked and countin by Anonymous Coward · · Score: 0

    Nah, this website really does try to continuously load ads.

    Leave it open for a few minutes and the adblock counter will be in the hundreds if not thousan

  9. Re:techtimes - 230 ad elements blocked and countin by Anonymous Coward · · Score: 0

    ABP count: 5

  10. Re:techtimes - 230 ad elements blocked and countin by alvinrod · · Score: 1

    I run Ghostery as well and I paused the blocking to test it out. At least one of the initial networks is responsible for loading other networks which seem to load other crap in turn. It didn't take more than 10 seconds before the count was over 100. At that point, stuff started auto-playing and making noise so I shut the tab, but I wouldn't be surprised if more shit kept getting pulled in. I don't know which is the bad apple, but it's pretty damned clear that it's out of control.

  11. Obligatory by dmgxmichael · · Score: 1

    Can it run Crysis?

    At 8K?

  12. Just imagine by 0100010001010011 · · Score: 1

    A full beowulf cluster of those!

  13. Unrelated metrics by Anonymous Coward · · Score: 2, Insightful

    "Nvidia Pascal GPU100 features Stacked DRAM (1 TB/s) giving it as much as 12 TFLOPs of Single-Precision (FP32) compute performance"

    Theoretical memory bandwidth has no impact on theoretical floating point performance.
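The point above, that peak FLOPS follows from core count and clock rather than memory bandwidth, can be sketched in Python. The sketch assumes each core retires one fused multiply-add (2 floating-point operations) per cycle. The Titan X figures (3072 cores at roughly 1.0 GHz) are recalled from public specifications rather than taken from this thread, and the 1.5 GHz clock in the second calculation is purely hypothetical, not a claim about Pascal.

```python
FLOPS_PER_CORE_PER_CYCLE = 2  # assumption: one fused multiply-add per cycle

def peak_tflops(cores: int, clock_ghz: float) -> float:
    """Theoretical peak throughput in TFLOPS; memory bandwidth plays no part."""
    return cores * clock_ghz * FLOPS_PER_CORE_PER_CYCLE / 1000

def cores_needed(target_tflops: float, clock_ghz: float) -> float:
    """Core count required to reach a target peak at a given clock."""
    return target_tflops * 1000 / (clock_ghz * FLOPS_PER_CORE_PER_CYCLE)

# Titan X (assumed 3072 cores at ~1.0 GHz) lands near the 6 TFLOPS quoted earlier.
titan_x_tflops = peak_tflops(3072, 1.0)

# Hypothetical: cores needed for 12 TFLOPS if the clock were 1.5 GHz.
hypothetical_cores = cores_needed(12, 1.5)

print(f"Titan X peak: {titan_x_tflops:.3f} TFLOPS")
print(f"12 TFLOPS at a hypothetical 1.5 GHz: {hypothetical_cores:.0f} cores")
```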
    It would've been better to say something about the core count and clock.

  14. Top supercomputer was 1 TFLOP until late 2000 by Mostly+a+lurker · · Score: 3, Interesting

    It is amazing to recall that the world's top supercomputer ASCI Red from 1997 to 2000 was only capable of just over 1 TFLOP.

  15. FLOPS, not FLOPs by hankwang · · Score: 2

    FLoating-point Operations Per Second. It makes no sense to speak of one FLOP, two FLOPs, as the S is not for plural.

    1. Re:FLOPS, not FLOPs by Tumbleweed · · Score: 1

      FLoating-point Operations Per Second. It makes no sense to speak of one FLOP, two FLOPs, as the S is not for plural.

      This comment is endorsed by Pedantic-Man(tm)!

  16. Re: techtimes - 230 ad elements blocked and counti by Anonymous Coward · · Score: 0

    Someone's using uBlock or some other crap that slows the browser to a crawl.

  17. To Rock? by Anonymous Coward · · Score: 0

    To Rock?

    Is the GPU going to be wobbly?

    Could someone explain to me how the wobblyness of the GPU helps it achieve these high throughputs?

  18. Re:techtimes - 230 ad elements blocked and countin by St.Creed · · Score: 1

    Eew...

    Yes, after a bit closer examination I clicked on one of the links in the ads that were reasonably well-behaved, and it led me straight into a number of sites registered at the nr. 1 destination for crooks and criminals - straight up fraudulent websites.

    So since they apparently don't mind that criminals advertise on their site, they probably don't mind that some of them have "drive-by payloads" either. It's probably just a number that pop up irregularly. Nice...

    That site is indeed best avoided.

    --
    Therefore, by the (faulty) logic you're using, you're just a cow with a keyboard - osu-neko (2604)