Slashdot Mirror


NVIDIA Unveils 2 Petaflop DGX-2 AI Supercomputer With 32GB Tesla V100, NVSwitch Tech

bigwophh writes from a report via HotHardware: NVIDIA CEO Jensen Huang took to the stage at GTC today to unveil a number of GPU-powered innovations for machine learning, including a new AI supercomputer and an updated version of the company's powerful Tesla V100 GPU that now sports a hefty 32GB of on-board HBM2 memory. A follow-on to last year's DGX-1 AI supercomputer, the new NVIDIA DGX-2 can be equipped with double the number of Tesla V100 processing modules for double the GPU horsepower. The DGX-2 can also have four times the available memory space, thanks to the updated Tesla V100's larger 32GB of memory. NVIDIA's new NVSwitch technology is a fully crossbar GPU interconnect fabric that allows NVIDIA's platform to scale to up to 16 GPUs and utilize their memory space contiguously, where the previous DGX-1 NVIDIA platform was limited to 8 total GPU complexes and associated memory. NVIDIA claims NVSwitch is five times faster than the fastest PCI Express switch and offers an aggregate 2.4TB per second of bandwidth. A new Quadro card was also announced. Called the Quadro GV100, it too is being powered by Volta. The Quadro GV100 packs 32GB of memory and supports NVIDIA's recently announced RTX real-time ray tracing technology.

41 comments

  1. This Is Not A 2 Petaflop Supercomputer At All by dryriver · · Score: 3, Informative

    The Nvidia V100 is a 15 TeraFlops capable GPU at 32 Bit accuracy, and half that at 64 Bit accuracy. You'd need a whopping 134 of these GPUs in a box with perfect parallelization between them to hit 2 PetaFlops for general GPGPU compute tasks. Nvidia claims that the TENSOR cores in a V100 deliver about 120 TeraFlops of MACHINE LEARNING performance. How they measured this is an open question - did they take a machine learning task that was 120 times faster than a 1 TeraFlop CPU with no AI optimization could do, and magically arrive at 120 TFLOPS? What AI tasks these TENSOR core TeraFlops can be used for is the next question. So for anyone thinking "I can get 2000 GPGPU TeraFlops in 1 box", sorry that isn't the case here. For specific AI tasks, this may be the machine to get. For general GPGPU, this thing is just a casing with a couple of 15 TFLOP GPUs crammed together.
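
The arithmetic in this comment can be checked with a short sketch. The per-GPU rates (15 TFLOPS at 32-bit, 120 TFLOPS for the tensor cores) are the figures quoted in the thread, and the 16-GPU count comes from the story summary; the function names are made up for illustration.

```python
import math

FP32_TFLOPS_PER_GPU = 15.0     # 32-bit rate quoted in the comment
TENSOR_TFLOPS_PER_GPU = 120.0  # tensor-core rate NVIDIA claims, as quoted
TARGET_TFLOPS = 2000.0         # 2 petaflops expressed in teraflops
DGX2_GPU_COUNT = 16            # GPU count from the story summary


def gpus_needed(target_tflops, per_gpu_tflops):
    """Smallest whole number of GPUs whose combined peak reaches the target."""
    return math.ceil(target_tflops / per_gpu_tflops)


def aggregate_tflops(gpu_count, per_gpu_tflops):
    """Combined peak rate, assuming perfect scaling across GPUs."""
    return gpu_count * per_gpu_tflops


fp32_gpus = gpus_needed(TARGET_TFLOPS, FP32_TFLOPS_PER_GPU)
tensor_gpus = gpus_needed(TARGET_TFLOPS, TENSOR_TFLOPS_PER_GPU)
dgx2_fp32 = aggregate_tflops(DGX2_GPU_COUNT, FP32_TFLOPS_PER_GPU)
dgx2_tensor = aggregate_tflops(DGX2_GPU_COUNT, TENSOR_TFLOPS_PER_GPU)

print(f"GPUs needed for 2 PF at FP32 rate:   {fp32_gpus}")
print(f"GPUs needed for 2 PF at tensor rate: {tensor_gpus}")
print(f"16 GPUs at FP32 rate:   {dgx2_fp32:.0f} TFLOPS")
print(f"16 GPUs at tensor rate: {dgx2_tensor:.0f} TFLOPS")
```

With these inputs, 134 GPUs are needed at the 32-bit rate, while 16 GPUs at the quoted tensor rate reach 1,920 TFLOPS, which is presumably where the rounded "2 petaflop" figure comes from.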

    --
    Why did the chicken cross the road? Because Elon Musk put an AI chip in its head.
    1. Re: This Is Not A 2 Petaflop Supercomputer At All by Anonymous Coward · · Score: 1

      A flop is a flop, no matter how many petas.

    2. Re:This Is Not A 2 Petaflop Supercomputer At All by Anonymous Coward · · Score: 1

      The Nvidia V100 is a 15 TeraFlops capable GPU at 32 Bit accuracy, and half that at 64 Bit accuracy. You'd need a whopping 134 of these GPUs in a box with perfect parallelization between them to hit 2 PetaFlops for general GPGPU compute tasks. Nvidia claims that the TENSOR cores in a V100 deliver about 120 TeraFlops of MACHINE LEARNING performance. How they measured this is an open question - did they take a machine learning task that was 120 times faster than a 1 TeraFlop CPU with no AI optimization could do, and magically arrive at 120 TFLOPS? What AI tasks these TENSOR core TeraFlops can be used for is the next question. So for anyone thinking "I can get 2000 GPGPU TeraFlops in 1 box", sorry that isn't the case here. For specific AI tasks, this may be the machine to get. For general GPGPU, this thing is just a casing with a couple of 15 TFLOP GPUs crammed together.

      It is literally called an __AI__ supercomputer, the target market and intended purpose is deep learning, training, and inference. Tasks which make use of the tensorcores which are matrix-multiply-and-accumulate units.
      Sure the flop count is only on workloads making use of the tensorcores, but seeing as how that's the market for it anyway I see no problem.

    3. Re: This Is Not A 2 Petaflop Supercomputer At All by Anonymous Coward · · Score: 0

      We ditched SIMD vector cores for scalar because vector had poor occupancy and a bloated ISA. Now we're adding SIMD back with Tensor Cores. It's like Cray all over again.

      Usually there's more than 1 year between revisiting bad architectures. Maybe ours brain is get the stupider.

    4. Re:This Is Not A 2 Petaflop Supercomputer At All by Anonymous Coward · · Score: 0

      From the FA:

      DGX-2 can train FAIRSeq, a cutting-edge neural machine translation model, in about one and a half days, where it took 15 days on DGX-1 (a 10X improvement). Other gains NVIDIA boasts for DGX-2 are in areas of inference or image recognition, where DGX-2 is claimed 190X faster, and up to 60X faster with speech recognition and voice synthesis

      There's big money to be had in providing governments and security companies with vision and speech recognition that works.

      So this is a return to a crossbar architecture like the old 8-CPU SGI workstations? With two beefy 28-core Xeon CPUs and a ton of RAM. But it gets twice the performance of the prior model on a particular LT model and higher training speeds for unlisted IR and SR tasks.

      16 of the V100 GPU cores at 15 TF each still gives 240 trillion floating point operations per second in 32-bit mode. That's about 1/8th of the 2 Petaflops unless you use some creative accounting a la hard drive "megabytes."

      So compared to a really big Ethereum-miner style compute block this just has very high speed RAM interconnects for large matrix transforms like in ML. Unlike ML, Proof of Work crypto algorithms are generally not bottlenecked at the IO. So you could see people powering 16 to 24 V100 or GV100 chips on a single frame. That large 24 core miner is still only a 360 trillion floating point operations per second architecture. (And God help you when the power bills come in.)

      Really these are only useful for the development phase of Machine Learning. Being a matrix multiply function, once you have a model trained they take very little power. Even complex models can be run on Raspberry Pi class hardware. But compared to hardware, developer time is expensive. Reducing that is useful to reduce cost. But the real hidden message is this may enable less skilled researchers - like the kind your non-Apple / non-Google company can hire - to use less tuned ML models.

      Time to market is much more expensive than developer time. This may not be 2 PF but a 10-fold reduction in training to convergence time can let you iterate faster on training various parameters.

      (Although both time-to-market and developer-time pale in cost to the expense of a production outage after release.)

    5. Re:This Is Not A 2 Petaflop Supercomputer At All by Anonymous Coward · · Score: 0

      Tensor cores only do one thing: D = (A x B) + C, where A, B and C are 4x4 matrices. That is 64 multiply-accumulates per clock cycle per core, however. Very useful if you can reduce a meaningful part of your workload to that function.
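
A rough sketch of that operation, and of how a figure near 120 TFLOPS could fall out of it. The function name is hypothetical, and the tensor-core count (640) and boost clock (about 1.53 GHz) are taken from NVIDIA's published V100 specifications rather than from this thread, so treat them as assumptions.

```python
def mma_4x4(a, b, c):
    """Compute D = A x B + C for 4x4 matrices and count scalar multiply-adds."""
    n = 4
    d = [[0.0] * n for _ in range(n)]
    fma_count = 0
    for i in range(n):
        for j in range(n):
            acc = c[i][j]                 # start from the accumulator matrix C
            for k in range(n):
                acc += a[i][k] * b[k][j]  # one multiply-add
                fma_count += 1
            d[i][j] = acc
    return d, fma_count


# Multiplying by the identity makes the result easy to check: D = A + C.
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
a = [[float(4 * i + j) for j in range(4)] for i in range(4)]
c = [[1.0] * 4 for _ in range(4)]
d, fmas = mma_4x4(a, identity, c)

# Assumed hardware figures (not from the thread).
TENSOR_CORES = 640     # tensor cores per V100
BOOST_CLOCK_GHZ = 1.53 # approximate boost clock
FLOPS_PER_FMA = 2      # one multiply plus one add

# Peak rate if every tensor core finishes one 4x4 multiply-accumulate per clock.
peak_tflops = TENSOR_CORES * fmas * FLOPS_PER_FMA * BOOST_CLOCK_GHZ / 1000.0

print(f"multiply-adds in one 4x4 operation: {fmas}")
print(f"estimated peak: {peak_tflops:.1f} TFLOPS")
```

Under those assumptions the estimate lands a little above 120 TFLOPS, which suggests the advertised number is a peak multiply-add count rather than a benchmark against a CPU.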

    6. Re:This Is Not A 2 Petaflop Supercomputer At All by Anonymous Coward · · Score: 0

      Perhaps they are using 16-bit floats to achieve the advertised performance.

      IMHO, if they are doing that, they're being deceptive: while 16-bit arithmetic may be adequate for "AI" (i.e., neural nets, which are kinda fuzzy anyway), it is totally useless for serious number crunching. With only 16 bits, you lose so much precision on repeated calculations that the results you get are worthless. Even 32 bits are questionable. That's why the standard flop uses 64 bits.
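
The precision loss on repeated calculations can be illustrated with a small sketch. It emulates IEEE half precision by rounding through Python's `struct` format `'e'` after every addition; the helper names are made up, and this models pure 16-bit accumulation only, not how any particular GPU accumulates its results.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def accumulate(step, count, use_fp16):
    """Add `step` to a running total `count` times, optionally rounding to fp16."""
    total = 0.0
    if use_fp16:
        step = to_fp16(step)
    for _ in range(count):
        total += step
        if use_fp16:
            total = to_fp16(total)  # every intermediate result is stored in 16 bits
    return total


fp16_sum = accumulate(0.1, 10_000, use_fp16=True)
fp64_sum = accumulate(0.1, 10_000, use_fp16=False)

print(f"16-bit running sum: {fp16_sum}")
print(f"64-bit running sum: {fp64_sum}")
```

The 64-bit total comes out very close to 1000, while the 16-bit total stalls far short of it: once the running sum is large enough, adding 0.1 is smaller than half the gap between neighbouring 16-bit values and rounds away entirely.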

  2. 2 Petaflop? by cold+fjord · · Score: 4, Funny

    I thought that nobody needed more than 640 teraflops?

    --
    much of left-wing thought is a kind of playing with fire by people who don't even know that fire is hot - George Orwell
  3. Fuck raytracing by Anonymous Coward · · Score: 0

    How much bitcoin can this bad mofo crank through?

    1. Re:Fuck raytracing by mark-t · · Score: 2

      What difference does that make? You'll spend more on electricity trying to mine them than what bitcoin is worth.... it's been that way for years.

    2. Re:Fuck raytracing by dryriver · · Score: 1

      It's only the TENSOR cores for Machine Learning and AI tasks that supposedly deliver 120 TFlops per GPU card. The card itself does just 15 TFlops for general computation tasks. So unless you can figure out how to mine Bitcoin using the Tensor cores, these Volta V100 GPUs are basically just like the GTX 1080 GPU, just with about 5000 CUDA cores and more RAM capacity.

      --
      Why did the chicken cross the road? Because Elon Musk put an AI chip in its head.
    3. Re:Fuck raytracing by InvalidsYnc · · Score: 2

      Yes, Please, let those stupid fricking bitcoin miners start using something other than the top end of the consumer line so that regular people don't have to spend so damn much or wait so damn long to buy a decent video card.

      Now, what I really meant:

      Bitcoin miners should stop buying the fucking graphics cards all up so that I can buy one cheap. Assholes.

    4. Re: Fuck raytracing by Anonymous Coward · · Score: 0

      Ray tracing won't make stupid games for retards magically more attractive.

    5. Re: Fuck raytracing by Anonymous Coward · · Score: 0

      Forget mining... crack some elliptic curve equations with that 1tb of ram in a box, and voila... ALL YOUR BITCOIN BELONG TO US !

    6. Re: Fuck raytracing by Anonymous Coward · · Score: 1

      AMD fan, I take it? Your point is valid, but you keep harping on it as if it's some big conspiracy you're uncovering...

    7. Re: Fuck raytracing by Anonymous Coward · · Score: 0

      Not if mom pays for it.

  4. Hahahaha by Anonymous Coward · · Score: 0

    In other news, earlier today BeauHD unveiled his 2-inch micro-penis to his new boyfriend. The boyfriend died from uncontrolled laughter.

    1. Re: Hahahaha by Anonymous Coward · · Score: 0

      Micropenis is a vagina, no?

    2. Re: Hahahaha by Anonymous Coward · · Score: 0

      No, inverted penis is a vagina.

  5. Re:Trump can't have one in prison though by Anonymous Coward · · Score: 0

    wut?

  6. more spyware by Anonymous Coward · · Score: 0

    hint: replace flowers with faces

  7. But will it run Crysis? by Anonymous Coward · · Score: 0

    But will it run Crysis?

    1. Re:But will it run Crysis? by dryriver · · Score: 1

      It can run Crysis BACKWARDS and SIDEWAYS at 1,000,000 FPS. The gameplay is also far more tense, because you are using Nvidia's new TENSOR cores.

      --
      Why did the chicken cross the road? Because Elon Musk put an AI chip in its head.
    2. Re:But will it run Crysis? by PopeRatzo · · Score: 1

      It can run Crysis BACKWARDS and SIDEWAYS at 1,000,000 FPS. The gameplay is also far more tense, because you are using Nvidia's new TENSOR cores.

      I just had a brainstorm. There should be a version of Crysis that mines bitcoins while you play and the more dudes you kill in the game the more bitcoins it mines.

      That's totally my idea don't none a you try to steal it I'm going to patent it in the morning. Or copyright it. I can't remember which.

      --
      You are welcome on my lawn.
  8. Consistent computation? by Anonymous Coward · · Score: 0

    I wonder if this beast has the same issue that was reported a few days ago with the cards that were giving inconsistent results?

  9. Sweet by Anonymous Coward · · Score: 0

    This is just what I need to get started building my AGI !

  10. What a waste of electricity ... by Anonymous Coward · · Score: 0

    Hot!!!

  11. But can it by AHuxley · · Score: 1

    Ray trace at 8K?

    --
    Domestic spying is now "Benign Information Gathering"
  12. No need for Intel to worry by Anonymous Coward · · Score: 0

    They have their awesome stable of Atom and Celeron processors.

  13. RoboCop by harvey+the+nerd · · Score: 1

    Is that the new enhanced version with the Robo-Cop routines to whack old homeless ladies pushing their cart or bicycle across the street....

    1. Re:RoboCop by bennet42 · · Score: 1

      No that's the Death Race 2000 version of AI that accidentally got loaded by Uber.

  14. Re:Trump can't have one in prison though by Anonymous Coward · · Score: 0, Informative

    He'll get his. Mark my words.

  15. Performance? by Anonymous Coward · · Score: 0

    Okay.. someone hijack that and tell me the hash rate please?

    Someone call the Russians ..they know what they're doing.

  16. Imagine . . . by Joey+Vegetables · · Score: 1

    Imagine a Beowulf cluster of these!

    Yeah, I'm showing my age. So what?

    1. Re:Imagine . . . by LoganTeamX · · Score: 1

      I laughed, well played. Also showing my age here.

      --
      One of the 187.
  17. But . . . by hduff · · Score: 1

    Can it play DOOM?

    --
    "I believe in Karma. That means I can do bad things to people all day long and I assume they deserve it." : Dogbert
  18. Mining anybody? by Anonymous Coward · · Score: 0

    That's one hell of a Bitcoin mining rig!