AMD Introduces Radeon Instinct Machine Intelligence Accelerators (hothardware.com)

Posted by ryuzaki0 on Monday December 12, 2016 @03:25AM from the shape-of-things-to-come dept.

Reader MojoKid writes: AMD is announcing a new series of Radeon-branded products today, called Radeon Instinct, targeted at machine intelligence and deep learning enterprise applications. As its name suggests, the new Radeon Instinct product line consists of GPU-based solutions for deep learning inference and training. The new GPUs are complemented by a free, open-source library and framework for GPU accelerators, dubbed MIOpen. MIOpen is architected for high-performance machine intelligence applications and is optimized for the deep learning frameworks in AMD's ROCm software suite. The first products in the lineup are the Radeon Instinct MI6, the MI8, and the MI25. The 150W Radeon Instinct MI6 accelerator is powered by a Polaris-based GPU, packs 16GB of memory (224GB/s peak bandwidth), and will offer up to 5.7 TFLOPS of peak FP16 performance. Next up in the stack is the Fiji-based Radeon Instinct MI8. Like the Radeon R9 Nano, the Radeon Instinct MI8 features 4GB of High-Bandwidth Memory (HBM) with peak bandwidth of 512GB/s. The MI8 will offer up to 8.2 TFLOPS of peak FP16 compute performance, with a board power that typically falls below 175W. The Radeon Instinct MI25 accelerator will leverage AMD's next-generation Vega GPU architecture and has a board power of approximately 300W. All of the Radeon Instinct accelerators are passively cooled, but when installed in a server chassis you can bet there will be plenty of airflow. Like the recently released Radeon Pro WX series of professional graphics cards for workstations, Radeon Instinct accelerators will be built by AMD. All of the Radeon Instinct cards will also support AMD Multiuser GPU (MxGPU) hardware virtualization technology.
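The peak figures above can be combined into a simple roofline estimate, which bears on the argument further down that moving data, not doing math, is the bottleneck. The Python sketch below is an illustration under stated assumptions: it uses only the quoted peak FP16 TFLOPS and memory bandwidth for the MI6 and MI8 (no figures were given for the MI25), and the "one FLOP per value read" kernel is a hypothetical example, not a benchmark.

```python
# Peak figures as quoted in the summary. The MI25 is omitted because no
# FLOPS or bandwidth numbers were given for it.
SPECS = {
    "MI6": {"fp16_tflops": 5.7, "bandwidth_gb_s": 224.0},
    "MI8": {"fp16_tflops": 8.2, "bandwidth_gb_s": 512.0},
}

# Storage size of one value in each format.
BYTES_PER_VALUE = {"fp16": 2, "fp32": 4}


def balance_point(tflops: float, bandwidth_gb_s: float) -> float:
    """FLOPs per byte of memory traffic at which peak compute, rather than
    memory bandwidth, becomes the limiting factor."""
    return (tflops * 1e12) / (bandwidth_gb_s * 1e9)


def attainable_tflops(tflops: float, bandwidth_gb_s: float, flops_per_byte: float) -> float:
    """Basic roofline bound: the lesser of peak compute and
    bandwidth multiplied by arithmetic intensity (FLOPs per byte)."""
    return min(tflops, bandwidth_gb_s * 1e9 * flops_per_byte / 1e12)


if __name__ == "__main__":
    for name, s in SPECS.items():
        bp = balance_point(s["fp16_tflops"], s["bandwidth_gb_s"])
        print(f"{name}: ~{bp:.1f} FLOPs/byte needed to be compute-bound")

    # Hypothetical kernel doing one FLOP per value read. Halving the value
    # size doubles the FLOPs per byte, and with it the bandwidth-bound
    # throughput; both results sit far below peak, so bandwidth decides.
    mi6 = SPECS["MI6"]
    for fmt, nbytes in BYTES_PER_VALUE.items():
        bound = attainable_tflops(mi6["fp16_tflops"], mi6["bandwidth_gb_s"], 1 / nbytes)
        print(f"MI6, 1 FLOP per {fmt} value: <= {bound:.3f} TFLOPS")
```

On these numbers the MI6 needs roughly 25 FLOPs per byte of memory traffic before its FP16 peak is the limit, and the MI8 about 16; a kernel below that line is bandwidth-bound, which is where storing values in half the bytes helps most.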

27 of 55 comments

Re:CUDA benchmarks? by 0100010001010011 · 2016-12-12 04:11 · Score: 1

Woosh.
Addressable memory by BigBuckHunter · 2016-12-12 04:13 · Score: 1

Every time I see "16 GB of memory" on a GPU card, I have to ask the same question... Is all 16GB addressable? I've never been 'not' disappointed before.
1. Re:Addressable memory by rsmith-mac · 2016-12-12 10:23 · Score: 2
  
  Every time I see "16 GB of memory" on a GPU card, I have to ask the same question... Is all 16GB addressable?
  As opposed to RAM that's put on a video card but isn't addressable, so that all it does is waste space and power?
AMD driver developer says: by citizenr · 2016-12-12 04:26 · Score: 1

In the words of an AMD driver developer:
"We don't happen to have the resources to pay someone else to do that for us."
https://lists.freedesktop.org/...
AMD does hardware, but they don't support it with software.

--
Who logs in to gdm? Not I, said the duck.
Re:CUDA benchmarks? by Anonymous Coward · 2016-12-12 04:48 · Score: 2, Informative

"Besides being built for massive scaling, it includes compilers, language run times and interesting (and importantly) CUDA-application support. (CUDA being the NVIDIA developed GPGPU programming language.)"
Holy balls! Time to eat crow, buddy: CUDA is fucking supported...
Source: https://www.pcper.com/reviews/Graphics-Cards/Radeon-Instinct-Machine-Learning-GPUs-include-Vega-Preview-Performance
FP16 isn't even meant for computation by sl3xd · 2016-12-12 05:18 · Score: 1

So they're all excited about the lowest-precision, smallest-size floating point math in IEEE 754?
Not only that, but FP16 is intended for storage (of many floating-point values where higher precision need not be stored), not for performing arithmetic computations.
Kudos to AMD's marketing department for boasting about their compute performance with a number format that was never meant for computation.
Tell them to get back to me with their 64, 128, and 256-bit IEEE floating point performance.

--
-- Sometimes you have to turn the lights off in order to see.
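To make the precision argument in this subthread concrete, here is a minimal Python sketch (standard library only, relying on `struct`'s half-precision `'e'` format) of what an IEEE 754 binary16 value holds, and of the small-update problem darkstar949 raises in the replies. The weight of 1.0 and the step of 1e-4 are arbitrary illustrative values, not taken from any real training run: because the spacing between binary16 values near 1.0 is 2^-10 (about 0.00098), each 1e-4 step rounds away and the stored weight never moves, while the same loop in FP32 ends up near 2.0.

```python
import struct
from typing import Callable


def fp16_fields(x: float) -> tuple[int, int, int]:
    """Round x to IEEE 754 binary16 and return its (sign, exponent, mantissa) bit fields."""
    (bits,) = struct.unpack("<H", struct.pack("<e", x))
    return bits >> 15, (bits >> 10) & 0x1F, bits & 0x3FF


def round_fp16(x: float) -> float:
    """Round a Python float to the nearest binary16 value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def round_fp32(x: float) -> float:
    """Round a Python float to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def accumulate(weight: float, update: float, steps: int,
               rounder: Callable[[float], float]) -> float:
    """Add the same small update `steps` times, storing the running value
    in the format implemented by `rounder` after every step."""
    w = rounder(weight)
    step = rounder(update)
    for _ in range(steps):
        w = rounder(w + step)
    return w


if __name__ == "__main__":
    # 1 sign bit, 5 exponent bits (bias 15), 10 mantissa bits.
    print(fp16_fields(1.0))      # (0, 15, 0)
    print(fp16_fields(65504.0))  # (0, 30, 1023), the largest finite binary16 value
    print(accumulate(1.0, 1e-4, 10_000, round_fp16))  # 1.0: every update rounded away
    print(accumulate(1.0, 1e-4, 10_000, round_fp32))  # close to 2.0
```

The sketch only shows the rounding behaviour itself; whether it hurts a trained network is exactly what the replies argue about.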
1. Re:FP16 isn't even meant for computation by Graymalkin · 2016-12-12 05:40 · Score: 1
  
  Storage is technically a floating point operation!
  — AMD'S marketing department
  
  --
  I'm a loner Dottie, a Rebel.
2. Re:FP16 isn't even meant for computation by ShanghaiBill · 2016-12-12 05:42 · Score: 3, Insightful
  
  So they're all excited about the lowest-precision, smallest-size floating point math in IEEE 754?
  FP16 is good enough for neural nets. Do you really think the output voltage of a biological neuron has 32 bits of precision and range? For any given speed, FP16 allows you to run NNs that are wider and deeper, and/or to use bigger datasets. That is way more important than the precision of individual operations.
3. Re:FP16 isn't even meant for computation by darkstar949 · 2016-12-12 06:03 · Score: 1
  
  Neural networks have a shaky biological basis at best. More pragmatically, they are a network of perceptrons with sigmoidal output functions. In that case, yes, more bits of precision can be very relevant. Once you start talking about a deep learning network, the updates to individual perceptrons can be very small and 32 bits are needed.
4. Re:FP16 isn't even meant for computation by Anonymous Coward · 2016-12-12 06:23 · Score: 1
  
  Actually, since perceptrons can't do non-linearly separable problems, it might be more accurate to call them non-linear multiple regression optimizers. Although, since gradient descent backpropagation minimizes a sum of squared differences, it might be even more accurate to call them non-linear least squares multiple regression optimizers. But then, since they do non-linear regression with zillions of terms, they are really arbitrary function approximators, so it might be even more accurate to call them non-linear least squares multiple regression arbitrary function approximator optimizers.
5. Re:FP16 isn't even meant for computation by Kartu · 2016-12-12 07:18 · Score: 1
  
  This is aimed at "deep learning" and 16 bit is just what they need.
6. Re:FP16 isn't even meant for computation by ShanghaiBill · 2016-12-12 08:02 · Score: 1
  
  More pragmatically, they are a network of perceptrons with sigmoidal output functions.
  Today, most bleeding edge NNs use rectified linear activation functions. Sigmoids are soooo 2014.
  
  Once you start talking about a deep learning network the updates to individual perceptrons can be very small and 32 bits are needed.
  You can get the same flexibility and more just by going wider and deeper. The bottleneck for NNs is not the math ops, but getting data in and out of the GPU. By using FP16, you cut the per-neuron data in half.
7. Re:FP16 isn't even meant for computation by QuantumFTL · 2016-12-12 08:46 · Score: 1
  
  Accidentally posted as anonymous coward, reposting under my actual name.
  
  So they're all excited about the lowest-precision, smallest-size floating point math in IEEE 754?
  FP16 is good enough for neural nets. Do you really think the output voltage of a biological neuron has 32 bits of precision and range? For any given speed, FP16 allows you to run NNs that are wider and deeper, and/or to use bigger datasets. That is way more important than the precision of individual operations.
  There's a lot of rounding error with FP16. The neural networks I use are 16-bit integers, which work much, much better, at least for the work I'm doing. Also, do you have a good citation that FP16 neural networks are, overall, more effective than FP32 networks, as you've described?
8. Re:FP16 isn't even meant for computation by HeckRuler · 2016-12-12 10:04 · Score: 1
  
  Do you really think the output voltage of a biological neuron has 32 bits of precision and range?
  ...What? It's analog. It's got precision going down into the quantum scale... You know, depending on noise. Range is also a big issue. But it's leveraging real-world physics to compute things. Think about how many discrete binary operations you'd have to perform to calculate the weighted middle point between populated cities. With an analog "computer" (it's a board with holes, a bit of string with some rocks on the end), its "computation" is done practically instantly when you lift the thing up and gravity pulls all the rocks down and the middle knot jumps to the point you want. When I say a neuron has god's own precision, that's not entirely hyperbole.
  Analog computers are just different. Neural Networks EMULATE the brain, but the differences are important.
  But yeah, I'd agree with the "wider deeper better" comment, so it's a bit moot.
9. Re:FP16 isn't even meant for computation by HeckRuler · 2016-12-12 10:42 · Score: 1
  
  My apologies. I assumed biological neurons ran in meatspace.
10. Re:FP16 isn't even meant for computation by ShanghaiBill · 2016-12-12 11:22 · Score: 2
  
  There's a lot of rounding error with FP16.
  Sure, but it doesn't matter. Backprop, learning rate, denoising, etc. are all just heuristics anyway. So what if your mantissa is off by one bit? You get better accuracy by going wider, adding layers, and (most importantly) using more data. But you can't afford to do that if half your bandwidth is sucked up transmitting meaningless precision.
  
  Also, do you have a good citation that FP16 neural networks are, overall, more effective than FP32 networks, as you've described?
  They are not necessarily more effective, just more efficient. If you have infinite resources, you might even get better results using FP32. But resources are never infinite. Here is a guy who claims that even 8 bits is enough for deep NNs.
11. Re:FP16 isn't even meant for computation by ShanghaiBill · 2016-12-12 13:56 · Score: 1
  
  ...What? It's analog. It's got precision going down into the quantum scale...
  That is not true in any meaningful sense. If you give the same inputs to the same biological neuron, there is no way that you are going to get the same output down to the Planck scale. In fact, it is unlikely that you are even going to get 8-bit precision (an output difference of 1/256th).
12. Re:FP16 isn't even meant for computation by HeckRuler · 2016-12-14 03:55 · Score: 1
  
  Wow, it's like the meaningful usage of a biological neuron depends on how much noise there is in the system.
  But I think you forget the subject matter. It doesn't matter if the neuron fires 10% early 20% of the time. It's a real-world genetic algorithm system. That's just a feature the GA gets to play with. Because it truly doesn't care about getting exact answers, only good enough to balance a schmuck on two legs... most of the time.
  Jesus, meat-space is just different. Comparing the two is going to run into problems.
13. Re:FP16 isn't even meant for computation by HeckRuler · 2016-12-14 04:00 · Score: 1
  
  Did you even read the thing I quoted?
  How about my second to last sentence?
  Tell you what, give the entire post another once over and then try again.
14. Re:FP16 isn't even meant for computation by Blaskowicz · 2016-12-14 07:01 · Score: 1
  
  What I wanted to reply to the parent: it's not like a one-dimensional analog signal either. This leaves out chemicals and finer details of what's happening in dendrites and axons and whatever stuff I can't name.
  The idea that you can map out the high-level electrical brain, and only that, and get a brain is a fallacy. It's like we're stuck in the late 90s with Ray Kurzweil's ideas of the brain; I reckon it's the main limit to transhumanism or singularity philosophies. Computer neural networks do have their own uses though, obviously.
15. Re:FP16 isn't even meant for computation by QuantumFTL · 2016-12-18 10:53 · Score: 1
  
  So, one problem is that there is not always more data. In my field, we have a surplus of some sorts of data, but other data requires hundreds of thousands of hours of human input, and we only have so much of that to go around. Processing all of that is easy enough, getting more is not.
  Also, by "effective", I should have made it clear that I meant "an effective overall solution to the problem", which includes all costs of training a wider, lower-precision network. This includes input data collection, storage and processing, all of the custom software to handle this odd floating point format, including FP16-specific test code and documentation, run time server costs and latency, any increased risks introduced by using code paths in training and , etc.
  I'm not saying that I don't believe it's possible, I've just seen absolutely no evidence that this is a significant win in most or even a sizable fraction of cases, or that it represents a "best practice" in the field. Our own experiments have shown a severe degradation in performance when using these nets w/out a complete retraining, the software engineering costs will be nontrivial, and much of the hardware we are forced to run on does not even support this functionality.
  As an analog, when we use integer-based nets and switch between 16-bit and 8-bit integers, we see an unacceptable level of degradation, even though there is a modest speedup and we can use slightly larger neural nets. I'm very wary of anything with a mantissa much smaller than 16 bits for that reason--those few bits seem to make a significant difference, at least for what we're doing. We're solving a very difficult constrained optimization problem using Markov chains in real time, and if the observational features are lower fidelity, the optimization search will run out of time to explore the search space effectively before the result is returned to the rest of the system. It's possible that the sensitivity of our optimization algorithm to input quality is the issue here, not the fundamental usefulness of FP16, but I'm still quite skeptical. If this were a "slam dunk", I'd expect to see it move through the literature in a wave like the Restricted Boltzmann Machine did.
  Oh, and thank you for the link (great reading) and the thoughtful reply. Not always easy to find on niche topics online.
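QuantumFTL describes running 16-bit integer networks and seeing unacceptable degradation at 8 bits, while ShanghaiBill points to a claim that 8 bits is enough. As a rough illustration of what is being traded, here is a minimal Python sketch of symmetric per-tensor linear quantization. This is a generic textbook scheme chosen for illustration, not the method either commenter uses, and the Gaussian "weights" are synthetic. With the same value range, the integer step for 8 bits is 32767/127 (about 258) times larger than for 16 bits, and the worst-case round-trip error grows with it.

```python
import random


def quantize(values: list[float], bits: int) -> tuple[list[int], float]:
    """Symmetric per-tensor linear quantization to signed `bits`-bit integers."""
    qmax = 2 ** (bits - 1) - 1                       # 127 for 8 bits, 32767 for 16 bits
    largest = max(abs(v) for v in values)
    scale = largest / qmax if largest > 0 else 1.0   # real value of one integer step
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def dequantize(q: list[int], scale: float) -> list[float]:
    """Map the integers back to approximate real values."""
    return [qi * scale for qi in q]


def max_abs_error(values: list[float], bits: int) -> float:
    """Worst-case round-trip error; at most half a quantization step."""
    q, scale = quantize(values, bits)
    return max(abs(v - r) for v, r in zip(values, dequantize(q, scale)))


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the made-up weights are reproducible
    weights = [rng.gauss(0.0, 0.1) for _ in range(10_000)]
    for bits in (16, 8):
        print(f"int{bits}: max round-trip error {max_abs_error(weights, bits):.2e}")
```

A single scale per tensor is the simplest choice; it lets one outlier stretch the step size for every other weight, which is one reason the real-world results people report vary so much by model and data.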
Re:CUDA benchmarks? by ShanghaiBill · 2016-12-12 05:34 · Score: 1

I was thinking Slashdot would be the crowd that I wouldn't have to add the sarcasm tag (/s) but it appears a few people took it literally.
Many Aspies have difficulty understanding sarcasm. We take everything literally. Slashdot tends to have more "whooshes" than other online forums.
Re:What exactly are they good for? by GrumpySteen · 2016-12-12 05:55 · Score: 1

The specific AI use is deep learning, which you'll no doubt write off as a buzz word, but it's important to a large number of fields such as image recognition, voice recognition, drug research, product recommendations and so on.
Part of deep learning is the analysis of large quantities of data. A GPU should be able to analyze thousands of sets of data in parallel, which would make deep learning cheaper and faster. AMD is attempting to produce the tools needed to make that happen.
Re:CUDA benchmarks? by K.+S.+Kyosuke · 2016-12-12 12:09 · Score: 1

nVidia: "You can have any programming language you want as long as it's our bastardized version of C". :D

--
Ezekiel 23:20
Re:CUDA benchmarks? by K.+S.+Kyosuke · 2016-12-12 12:14 · Score: 2

There's this thing called "compilers". They eat source code and spit out binaries. Then there's this thing called SPIR-V. AMD supports it. Now put two and two together. If you want to be tortured on the CUDA rack, there's little preventing you from opting for it.

--
Ezekiel 23:20
Re: CUDA benchmarks? by Entrope · 2016-12-12 13:02 · Score: 1

Which open, reasonably available standards make it as easy to write compute kernels and interface host code with them as CUDA? CUDA lock-in is not pleasant, but writing code to launch OpenCL or Vulkan kernels is at least an order of magnitude harder than the code to launch a CUDA kernel, and often two orders of magnitude harder.
Re:Value-add feature. by Blaskowicz · 2016-12-14 07:36 · Score: 1

There already exist some Fire Pro-branded cards with the virtualization features. One is based around the Radeon R9 380's GPU and is very expensive, but you mostly pay a "Fire Pro" premium akin to a "Quadro" premium. (Substitute FireGL / Fire Pro / Pro)
It's the counterpart to nvidia's GeForce Grid (formerly VGX); one redeeming quality is that nvidia has sold complete GeForce Grid systems, as in pre-built rackable servers, while AMD will sell you the card only.
I'll say it's a licensing issue: on a similar note, nothing could have stopped Windows XP Home from being able to run thirty thin clients if you wished. Except they asked you to run Windows 2000 Server or Windows Server 2003 and pay expensive per-seat licenses instead.
(XP Home did actually come with an RDP server and multi-user support, only these were labeled "remote assistance" and "fast user switching" respectively)