AMD Introduces Radeon Instinct Machine Intelligence Accelerators (hothardware.com)

Posted by ryuzaki0 on Monday December 12, 2016 @03:25AM from the shape-of-things-to-come dept.

Reader MojoKid writes: AMD is announcing a new series of Radeon-branded products today, targeted at machine intelligence and deep learning enterprise applications, called Radeon Instinct. As its name suggests, the new Radeon Instinct line of products consists of GPU-based solutions for deep learning, inference and training. The new GPUs are also complemented by a free, open-source library and framework for GPU accelerators, dubbed MIOpen. MIOpen is architected for high-performance machine intelligence applications and is optimized for the deep learning frameworks in AMD's ROCm software suite. The first products in the lineup consist of the Radeon Instinct MI6, the MI8, and the MI25. The 150W Radeon Instinct MI6 accelerator is powered by a Polaris-based GPU, packs 16GB of memory (224GB/s peak bandwidth), and will offer up to 5.7 TFLOPS of peak FP16 performance. Next up in the stack is the Fiji-based Radeon Instinct MI8. Like the Radeon R9 Nano, the Radeon Instinct MI8 features 4GB of High-Bandwidth Memory (HBM) with peak bandwidth of 512GB/s. The MI8 will offer up to 8.2 TFLOPS of peak FP16 compute performance, with a board power that typically falls below 175W. The Radeon Instinct MI25 accelerator will leverage AMD's next-generation Vega GPU architecture and has a board power of approximately 300W. All of the Radeon Instinct accelerators are passively cooled, but when installed into a server chassis you can bet there will be plenty of air flow. Like the recently released Radeon Pro WX series of professional graphics cards for workstations, Radeon Instinct accelerators will be built by AMD. All of the Radeon Instinct cards will also support AMD MultiGPU (MxGPU) hardware virtualization technology.
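
For a rough sense of what the quoted specifications imply, the sketch below computes peak FP16 throughput per watt and per byte of memory bandwidth, using only the MI6 and MI8 figures from the summary. The MI25 is left out because no FP16 figure is quoted for it. The MI8 board power is described as typically below 175W, so its per-watt figure should be read as a lower bound. The dictionary layout and function names are illustrative assumptions.

```python
# Figures quoted in the summary above; MI25 is omitted (no FP16 number given).
cards = {
    "MI6": {"fp16_tflops": 5.7, "board_power_w": 150, "bandwidth_gbs": 224},
    "MI8": {"fp16_tflops": 8.2, "board_power_w": 175, "bandwidth_gbs": 512},
}

def gflops_per_watt(card):
    """Peak FP16 GFLOPS per watt of board power."""
    return card["fp16_tflops"] * 1000 / card["board_power_w"]

def flops_per_byte(card):
    """Peak FP16 operations available per byte of peak memory traffic."""
    return card["fp16_tflops"] * 1e12 / (card["bandwidth_gbs"] * 1e9)

for name, card in cards.items():
    print(f"{name}: {gflops_per_watt(card):.1f} GFLOPS/W, "
          f"{flops_per_byte(card):.1f} FLOP per byte")
```

These are peak numbers only; sustained throughput on real workloads will be lower.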

55 comments

What exactly are they good for? by Anonymous Coward · 2016-12-12 03:41 · Score: 0

What can I do with these new "AI cards"? Specific stuff, preferably which makes me money. Not abstract, generic buzzwords, please.
1. Re:What exactly are they good for? by Anonymous Coward · 2016-12-12 04:01 · Score: 0
  
  you can make your own alexa or google home
  then you too can know the joy and satisfaction of having a device that opens your garage door when you ask it to play a podcast
2. Re:What exactly are they good for? by Anonymous Coward · 2016-12-12 04:11 · Score: 0
  
  What can I do with these new "AI cards"? Specific stuff, preferably which makes me money. Not abstract, generic buzzwords, please.
  They allow you to 3D print a paradigm shifting immersive VR experience and deliver it via drone to your target demographic.
3. Re:What exactly are they good for? by GrumpySteen · 2016-12-12 05:55 · Score: 1
  
  The specific AI use is deep learning, which you'll no doubt write off as a buzz word, but it's important to a large number of fields such as image recognition, voice recognition, drug research, product recommendations and so on.
  Part of deep learning is the analysis of large quantities of data. A GPU should be able to analyze thousands of sets of data in parallel, which would make deep learning cheaper and faster. AMD is attempting to produce the tools needed to make that happen.
4. Re:What exactly are they good for? by Anonymous Coward · 2016-12-12 06:22 · Score: 0
  
  You make no money, you have no time to think deeply about things and nothing to invest.
5. Re:What exactly are they good for? by Anonymous Coward · 2016-12-12 12:30 · Score: 0
  
  Garbage in, garbage out. These cards, and any others being pushed by various vendors, are intended for deep analysis of very large data sets. You should already have high-bandwidth access to a data set that you strongly suspect may provide interesting insights before asking if this is the right tool to help you expose them.
CUDA benchmarks? by 0100010001010011 · 2016-12-12 03:41 · Score: 0, Flamebait

How well do they run CUDA? From what I've done so far with ML/NNs it's CUDA all the way.
Almost all "How to use GPU for ___" searches come back with CUDA instructions first, and OpenCL support is nowhere close.
Looking at the tensorflow open tickets it's still very much a work in progress: https://github.com/tensorflow/...
1. Re:CUDA benchmarks? by Anonymous Coward · 2016-12-12 04:01 · Score: 0
  
  Uh, CUDA is nVIdia.
2. Re:CUDA benchmarks? by 0100010001010011 · 2016-12-12 04:11 · Score: 1
  
  Woosh.
3. Re:CUDA benchmarks? by Anonymous Coward · 2016-12-12 04:32 · Score: 0
  
  http://hothardware.com/gallery/NewsItem/39569?image=big_radeon-instinct-slide-4.jpg&tag=&p=1
  Looks like they've forked all the big frameworks to support their GPU. In that case: I'd like to walk back my earlier claim and simply state that I'm disinclined to deal with the headache of working from forked branches of those frameworks: but there are many people who would probably be willing to tolerate that to save some money.
4. Re:CUDA benchmarks? by Anonymous Coward · 2016-12-12 04:46 · Score: 0
  
  Nobody gets your lame attempt at humor, so no "woosh" for you.
5. Re:CUDA benchmarks? by Anonymous Coward · 2016-12-12 04:48 · Score: 2, Informative
  
  "Besides being built for massive scaling, it includes compilers, language run times and interesting (and importantly) CUDA-application support. (CUDA being the NVIDIA developed GPGPU programming language.)"
  Holy balls! Time to eat crow buddy: CUDA is fucking supported...
  Source: https://www.pcper.com/reviews/Graphics-Cards/Radeon-Instinct-Machine-Learning-GPUs-include-Vega-Preview-Performance
6. Re:CUDA benchmarks? by 0100010001010011 · 2016-12-12 04:49 · Score: 0
  
  I was thinking Slashdot would be the crowd that I wouldn't have to add the sarcasm tag (/s) but it appears a few people took it literally.
7. Re:CUDA benchmarks? by ShanghaiBill · 2016-12-12 05:34 · Score: 1
  
  I was thinking Slashdot would be the crowd that I wouldn't have to add the sarcasm tag (/s) but it appears a few people took it literally.
  Many Aspies have difficulty understanding sarcasm. We take everything literally. Slashdot tends to have more "whooshes" than other online forums.
8. Re: CUDA benchmarks? by Anonymous Coward · 2016-12-12 11:48 · Score: 0
  
  It runs CUDA-style workflows amazingly, as it lets you upgrade to OpenCL vs Nvidia's buggy and proprietary version.
9. Re:CUDA benchmarks? by K.+S.+Kyosuke · 2016-12-12 12:09 · Score: 1
  
  nVidia: "You can have any programming language you want as long as it's our bastardized version of C". :D
  
  --
  Ezekiel 23:20
10. Re:CUDA benchmarks? by K.+S.+Kyosuke · 2016-12-12 12:14 · Score: 2
  
  There's this thing called "compilers". They eat source code and spit out binaries. Then there's this thing called SPIR-V. AMD supports it. Now put two and two together. If you want to be tortured on the CUDA rack, there's little preventing you from opting for it.
  
  --
  Ezekiel 23:20
11. Re: CUDA benchmarks? by Entrope · 2016-12-12 13:02 · Score: 1
  
  Which open, reasonably available standards make it as easy to write compute kernels and interface host code with them as CUDA? CUDA lock-in is not pleasant, but writing code to launch OpenCL or Vulkan kernels is at least an order of magnitude harder than the code to launch a CUDA kernel, and often two orders of magnitude harder.
12. Re: CUDA benchmarks? by Anonymous Coward · 2016-12-13 20:02 · Score: 0
  
  Everything you're saying tells me you gave up on OpenCL before you'd learned a graceful way to run it. Then you learned CUDA, persisted with it, and now you think CUDA is the easier of the two APIs you learned. But you only actually learned one of them. ...or, everyone who's persisted with OpenCL is doing something "at least an order of magnitude harder" than whatever it is you're doing, by choice, without complaining about it. Which of those scenarios is likelier?
Addressable memory by BigBuckHunter · 2016-12-12 04:13 · Score: 1

Every time I see "16 GB of memory" on a GPU card, I have to ask the same question... Is all 16GB addressable? I've never been 'not' disappointed before.
1. Re:Addressable memory by rsmith-mac · 2016-12-12 10:23 · Score: 2
  
  Every time I see "16 GB of memory" on a GPU card, I have to ask the same question... Is all 16GB addressable?
  As opposed to RAM that's put on a video card but isn't addressable, so that all it does is waste space and power?
AMD driver developer says: by citizenr · 2016-12-12 04:26 · Score: 1

In the words of an AMD driver developer:
"We don't happen to have the resources to pay someone else to do that for us."
https://lists.freedesktop.org/...
AMD does hardware, but they don't support it with software.

--
Who logs in to gdm? Not I, said the duck.
1. Re:AMD driver developer says: by 0100010001010011 · 2016-12-12 04:47 · Score: 0
  
  Boss: I need you to get some hardware to try out neural net training on $DATASET
  0100010001010011: Well. I can buy the Nvidia and get started with Tensorflow or AMD opensourced everything and I have to write the tools myself.
  Boss: I need the results by the end of the month, I don't care how you do it.
OpenCL Support in Major Tensor Libraries by Anonymous Coward · 2016-12-12 04:42 · Score: 0

None of this will matter much until they get full, out-of-the-box OpenCL support in major deep learning libraries like TensorFlow, Theano, and Torch.
Just hire some people to do this and watch your sales shoot up.
FP16 isn't even meant for computation by sl3xd · 2016-12-12 05:18 · Score: 1

So they're all excited about the lowest-precision, smallest-size floating point math in IEEE 754?
Not only that, but FP16 is intended for storage (of many floating-point values where higher precision need not be stored), not for performing arithmetic computations.
Kudos to AMD's marketing department for boasting about their compute performance with a number format that was never meant for computation.
Tell them to get back to me with their 64, 128, and 256-bit IEEE floating point performance.

--
-- Sometimes you have to turn the lights off in order to see.
1. Re:FP16 isn't even meant for computation by Graymalkin · 2016-12-12 05:40 · Score: 1
  
  Storage is technically a floating point operation!
  — AMD'S marketing department
  
  --
  I'm a loner Dottie, a Rebel.
2. Re:FP16 isn't even meant for computation by ShanghaiBill · 2016-12-12 05:42 · Score: 3, Insightful
  
  So they're all excited about the lowest-precision, smallest-size floating point math in IEEE 754?
  FP16 is good enough for neural nets. Do you really think the output voltage of a biological neuron has 32 bits of precision and range? For any given speed, FP16 allows you to run NNs that are wider and deeper, and/or to use bigger datasets. That is way more important than the precision of individual operations.
3. Re:FP16 isn't even meant for computation by darkstar949 · 2016-12-12 06:03 · Score: 1
  
  Neural networks have a shaky biological basis at best. More pragmatically, they are a network of perceptrons with sigmoidal output functions. In that case, yes, more bits of precision can be very relevant. Once you start talking about a deep learning network the updates to individual perceptrons can be very small and 32 bits are needed.
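
The concern about very small updates can be illustrated with a minimal sketch that rounds values to IEEE 754 half and single precision using Python's `struct` module. The weight and update values are hypothetical, chosen so that the update is smaller than half the FP16 spacing just above 1.0; the helper names are not from any library.

```python
import struct

def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 binary16 value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def to_fp32(x):
    """Round a Python float to the nearest IEEE 754 binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

weight = 1.0
update = 1e-4  # hypothetical step; FP16 spacing just above 1.0 is 2**-10, about 0.00098

# Add in double precision, then round the sum back to the storage format.
fp16_result = to_fp16(to_fp16(weight) + to_fp16(update))
fp32_result = to_fp32(to_fp32(weight) + to_fp32(update))

print("FP16 weight after update:", fp16_result)
print("FP32 weight after update:", fp32_result)
```

In this sketch the FP16 weight is unchanged by the update while the FP32 weight moves. Whether that matters in practice depends on the weight magnitudes and learning rate involved, which is the point the replies go on to argue about.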
4. Re:FP16 isn't even meant for computation by Anonymous Coward · 2016-12-12 06:23 · Score: 1
  
  Actually, since perceptrons can't do non-linearly separable problems, it might be more accurate to call them non-linear multiple regression optimizers. Although, since gradient descent backpropagation is based on a difference of squares, it might be even more accurate to call them non-linear least squares multiple regression optimizers. But then, since they do non-linear regression with zillions of terms, they are really arbitrary function approximators, so it might be even more accurate to call them non-linear least squares multiple regression arbitrary function approximator optimizers.
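
The remark that a perceptron cannot handle problems that are not linearly separable is usually illustrated with XOR. The sketch below is an illustration only: it brute-forces a coarse grid of weights for a single threshold unit and finds none that reproduce XOR, then shows a two-unit hidden layer with hand-picked (not trained) weights that does. The grid range and the specific weights are assumptions made for the example.

```python
from itertools import product

XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

def relu(z):
    return max(0.0, z)

def single_unit(w1, w2, b, x1, x2):
    """One threshold perceptron: fires when the weighted sum is positive."""
    return 1 if w1 * x1 + w2 * x2 + b > 0 else 0

def two_layer(x1, x2):
    """Two ReLU hidden units with hand-picked weights, then a linear output."""
    h1 = relu(x1 + x2)
    h2 = relu(x1 + x2 - 1)
    return h1 - 2 * h2

# Search weights and bias in -4.0 .. 4.0 (steps of 0.5) for a single unit.
grid = [i / 2 for i in range(-8, 9)]
single_unit_solutions = [
    (w1, w2, b)
    for w1, w2, b in product(grid, repeat=3)
    if all(single_unit(w1, w2, b, *x) == y for x, y in XOR.items())
]

print("single-unit solutions found:", len(single_unit_solutions))
print("two-layer outputs:", {x: two_layer(*x) for x in XOR})
```

The grid search is only a demonstration on a finite set of weights; the general result that no single linear threshold unit computes XOR is a standard one.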
5. Re:FP16 isn't even meant for computation by Kartu · 2016-12-12 07:18 · Score: 1
  
  This is aimed at "deep learning" and 16 bit is just what they need.
6. Re:FP16 isn't even meant for computation by Anonymous Coward · 2016-12-12 07:24 · Score: 0
  
  https://arxiv.org/pdf/1502.02551.pdf
  Tl; dr:
  Noisy data means that rounding errors don't matter more than sampling errors/ bias.
7. Re:FP16 isn't even meant for computation by ShanghaiBill · 2016-12-12 08:02 · Score: 1
  
  More pragmatically, they are a network of perceptrons with sigmoidal output functions.
  Today, most bleeding edge NNs use rectified linear activation functions. Sigmoids are soooo 2014.
  
  Once you start talking about a deep learning network the updates to individual perceptrons can be very small and 32 bits are needed.
  You can get the same flexibility and more just by going wider and deeper. The bottleneck for NNs is not the math ops, but getting data in and out of the GPU. By using FP16, you cut the per-neuron data in half.
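
The "cut the per-neuron data in half" argument is simple arithmetic, sketched below with the MI8 figures from the story summary (4GB of HBM, 512GB/s peak bandwidth). The model size `n` is a hypothetical number for illustration, and the streaming time is a lower bound that assumes peak bandwidth is actually reached.

```python
import struct

BYTES_FP16 = struct.calcsize("<e")  # 2 bytes per half-precision value
BYTES_FP32 = struct.calcsize("<f")  # 4 bytes per single-precision value

HBM_BYTES = 4 * 1024**3  # MI8: 4GB of HBM (from the summary), taken as GiB here
BANDWIDTH = 512e9        # MI8: 512GB/s peak (from the summary)

def values_that_fit(bytes_per_value):
    """How many values of the given width fit in on-board memory."""
    return HBM_BYTES // bytes_per_value

def seconds_to_stream(n_values, bytes_per_value):
    """Lower bound on the time to read n_values once at peak bandwidth."""
    return n_values * bytes_per_value / BANDWIDTH

n = 500_000_000  # hypothetical number of weights, for illustration only

for label, width in (("FP16", BYTES_FP16), ("FP32", BYTES_FP32)):
    print(f"{label}: {values_that_fit(width):,} values fit; "
          f"{seconds_to_stream(n, width) * 1e3:.2f} ms to stream {n:,} values")
```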
8. Re:FP16 isn't even meant for computation by Anonymous Coward · 2016-12-12 08:40 · Score: 0
  
  FP16 is good enough for neural nets. Do you really think the output voltage of a biological neuron has 32 bits of precision and range? For any given speed, FP16 allows you to run NNs that are wider and deeper, and/or to use bigger datasets. That is way more important than the precision of individual operations.
  There's a lot of rounding error with FP16. The neural networks I use are 16-bit integers, which work much, much better, at least for the work I'm doing.
  Also, do you have a good citation that FP16 neural networks are, overall, more effective than FP32 networks, as you've described?
9. Re:FP16 isn't even meant for computation by QuantumFTL · 2016-12-12 08:46 · Score: 1
  
  Accidentally posted as anonymous coward, reposting under my actual name.
  
  So they're all excited about the lowest-precision, smallest-size floating point math in IEEE 754?
  FP16 is good enough for neural nets. Do you really think the output voltage of a biological neuron has 32 bits of precision and range? For any given speed, FP16 allows you to run NNs that are wider and deeper, and/or to use bigger datasets. That is way more important than the precision of individual operations.
  There's a lot of rounding error with FP16. The neural networks I use are 16-bit integers, which work much, much better, at least for the work I'm doing. Also, do you have a good citation that FP16 neural networks are, overall, more effective than FP32 networks, as you've described?
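
One way to see why 16-bit integers and FP16 can behave differently is to compare their rounding error at different magnitudes. The sketch below assumes a hypothetical fixed-point encoding with 15 fractional bits for values in [-1, 1); the encoding actually used in the poster's networks is not stated, so this is an illustration rather than a reconstruction.

```python
import struct

def fp16_error(x):
    """Absolute error from rounding x to IEEE 754 binary16."""
    rounded = struct.unpack("<e", struct.pack("<e", x))[0]
    return abs(rounded - x)

def int16_error(x, scale=2**15):
    """Absolute error from a hypothetical 15-fractional-bit fixed-point encoding."""
    q = max(-32768, min(32767, round(x * scale)))
    return abs(q / scale - x)

for x in (0.7, 0.0007):
    print(f"x={x}: fp16 error={fp16_error(x):.2e}, int16 error={int16_error(x):.2e}")
```

Under these assumptions, fixed point has the smaller error for values near 1, because its step size is uniform, while FP16 has the smaller error for values near zero, because its step size shrinks with magnitude. Which one suits a given network depends on how its values are distributed.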
10. Re:FP16 isn't even meant for computation by Anonymous Coward · 2016-12-12 09:53 · Score: 0
  
  I wonder at what point increased precision no longer helps. I've seen some visual examples of FP errors and the differences in output. Going from 16 to 32 showed a large difference within a few iterations, but going from 32 to 64 took quite a few, and for 64 to 128 we had to take the word of the presenter that they would visually diverge some time in the future. The presenter's main point was that even going from 32 to 64 it was difficult to claim a benefit. You got different answers because 64 was more precise, but the number of decimal places of precision is insane. As for 128, I will use NTP 128's time precision of "the ability to reference any point in time from the Big Bang to the heat death of the Universe with a precision of the time it takes for an electron to emit a photon." In other words, total overkill.
11. Re:FP16 isn't even meant for computation by HeckRuler · 2016-12-12 10:04 · Score: 1
  
  Do you really think the output voltage of a biological neurons has 32 bits of precision and range?
  ...What? It's analog. It's got precision going down into the quantum scale... You know, depending on noise. Range is also a big issue. But it's leveraging real-world physics to compute things. Think about how many discrete binary operations you'd have to perform to calculate the weighted middle point between populated cities. With an analog "computer" (it's a board with holes, a bit of string with some rocks on the end) its "computation" is done practically instantly when you lift the thing up and gravity pulls all the rocks down and the middle knot jumps to the point you want. When I say a neuron has god's own precision, that's not entirely a hyperbole.
  Analog computers are just different. Neural Networks EMULATE the brain, but the differences are important.
  But yeah, I'd agree with the "wider deeper better" comment, so it's a bit moot.
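
The board-and-strings device described above settles at the point that minimizes the weighted sum of distances to the cities, known as the weighted geometric median. A digital counterpart is Weiszfeld's iteration, sketched below. The city coordinates and weights are made-up values for illustration, and the function names are not from any library.

```python
import math

def cost(point, sites, weights):
    """Weighted sum of Euclidean distances from point to every site."""
    return sum(w * math.hypot(point[0] - sx, point[1] - sy)
               for (sx, sy), w in zip(sites, weights))

def weighted_centroid(sites, weights):
    total = sum(weights)
    return (sum(w * sx for (sx, _), w in zip(sites, weights)) / total,
            sum(w * sy for (_, sy), w in zip(sites, weights)) / total)

def weiszfeld(sites, weights, iterations=200, eps=1e-12):
    """Iteratively re-weighted average that approaches the weighted geometric median."""
    x, y = weighted_centroid(sites, weights)
    for _ in range(iterations):
        num_x = num_y = denom = 0.0
        for (sx, sy), w in zip(sites, weights):
            d = math.hypot(x - sx, y - sy)
            if d < eps:  # iterate landed on a site; stop rather than divide by zero
                return (x, y)
            num_x += w * sx / d
            num_y += w * sy / d
            denom += w / d
        x, y = num_x / denom, num_y / denom
    return (x, y)

# Hypothetical cities (coordinates) and populations (weights).
sites = [(0.0, 0.0), (10.0, 0.0), (5.0, 8.0)]
weights = [2.0, 2.0, 1.0]

median = weiszfeld(sites, weights)
print("weighted centroid:", weighted_centroid(sites, weights))
print("geometric median: ", median)
```

Each iteration costs one distance computation per city, repeated until the estimate stops moving, which gives a feel for the discrete work the analog device replaces.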
12. Re:FP16 isn't even meant for computation by Anonymous Coward · 2016-12-12 10:31 · Score: 0
  
  Umm, not sure what all that was about but he was clearly talking about the precision and range of A/D. Not the granularity of meatspace.
13. Re:FP16 isn't even meant for computation by HeckRuler · 2016-12-12 10:42 · Score: 1
  
  My apologies. I assumed biological neurons ran in meatspace.
14. Re:FP16 isn't even meant for computation by ShanghaiBill · 2016-12-12 11:22 · Score: 2
  
  There's a lot of rounding error with FP16.
  Sure, but it doesn't matter. Backprop, learning rate, denoising, etc. are all just heuristics anyway. So what if your mantissa is off by one bit? You get better accuracy by going wider, adding layers, and (most importantly) using more data. But you can't afford to do that if half your bandwidth is sucked up transmitting meaningless precision.
  
  Also, do you have a good citation that FP16 neural networks are, overall, more effective than FP32 networks, as you've described?
  They are not necessarily more effective, just more efficient. If you have infinite resources, you might even get better results using FP32. But resources are never infinite. Here is a guy who claims that even 8 bits is enough for deep NNs.
15. Re:FP16 isn't even meant for computation by ShanghaiBill · 2016-12-12 13:56 · Score: 1
  
  ...What? It's analog. It's got precision going down into the quantum scale...
  That is not true in any meaningful sense. If you give the same inputs to the same biological neuron, there is no way that you are going to get the same output down to the Planck scale. In fact, it is unlikely that you are even going to get 8-bit precision (an output difference of 1/256th).
16. Re:FP16 isn't even meant for computation by Anonymous Coward · 2016-12-13 20:20 · Score: 0
  
  FP16 is often preferred for neural network simulation if it's faster and uses less memory. It hasn't always been faster, but it is now, and it definitely uses less memory. You don't necessarily need accurate results out of a neural network. You want interesting results, and half precision isn't a disadvantage there.
  FP16 is often sufficient for image processing and other graphics tasks, too. Particles & cloth simulation for games, fluid simulation for VFX, encoding & decoding video... none of those things require full floating-point precision, and if half-float is double-speed, that's a huge win.
17. Re:FP16 isn't even meant for computation by Anonymous Coward · 2016-12-13 20:28 · Score: 0
  
  Simulating meat isn't really the goal of neural network research, though.
  A very accurate simulation of some ideal & disembodied neurons isn't as interesting as an approximate simulation that can do a lot of work.
18. Re:FP16 isn't even meant for computation by HeckRuler · 2016-12-14 03:55 · Score: 1
  
  Wow, it's like the meaningful usage of a biological neuron depends on how much noise there is in the system.
  But I think you forget the subject matter. It doesn't matter if the neuron fires 10% early 20% of the time. It's a real-world genetic algorithm system. That's just a feature the GA gets to play with. Because it truly doesn't care about getting exact answers, only good enough to balance a shmuck on two legs... most of the time.
  Jesus, meat-space is just different. Comparing the two is going to run into problems.
19. Re:FP16 isn't even meant for computation by HeckRuler · 2016-12-14 04:00 · Score: 1
  
  Did you even read the thing I quoted?
  How about my second to last sentence?
  Tell you what, give the entire post another once over and then try again.
20. Re:FP16 isn't even meant for computation by Blaskowicz · 2016-12-14 07:01 · Score: 1
  
  What I wanted to reply to the parent : it's not like a one dimensional analog signal either. This leaves out chemicals and finer details of what's happening in dendrites and axon and whatever stuff I can't name.
  The idea you can map out the high level electrical brain and only that, and get a brain is a fallacy. It's like we're stuck in the late 90s and Ray Kurzweil's ideas of the brain; I reckon it's the main limit to transhumanism or singularity philosophies. Computer neural networks do have their own uses though, obviously.
21. Re:FP16 isn't even meant for computation by Anonymous Coward · 2016-12-14 07:18 · Score: 0
  
  nvidia is making FP16 computation very slow on gaming cards though, as far as I remember. As in really, awfully slow and only nominally compatible: it's meant to be really fast on Titan and Tesla (or just the biggest Tesla). GP100 focuses on FP16; GP102, 104 and 106 can do INT8. The latter ones ought to be especially good at "executing" a neural network rather than training it.
  So, use of FP16 might suffer for the uses you mentioned; we'll have to see what future (but already existing) AMD consumer GPUs do. They might support FP16 at full speed after all, along with a number of mobile SoC chips.
22. Re:FP16 isn't even meant for computation by QuantumFTL · 2016-12-18 10:53 · Score: 1
  
  So, one problem is that there is not always more data. In my field, we have a surplus of some sorts of data, but other data requires hundreds of thousands of hours of human input, and we only have so much of that to go around. Processing all of that is easy enough, getting more is not.
  Also, by "effective", I should have made it clear that I meant "an effective overall solution to the problem", which includes all costs of training a wider, lower-precision network. This includes input data collection, storage and processing, all of the custom software to handle this odd floating point format, including FP16-specific test code and documentation, run time server costs and latency, any increased risks introduced by using code paths in training and , etc.
  I'm not saying that I don't believe it's possible, I've just seen absolutely no evidence that this is a significant win in most or even a sizable fraction of cases, or that it represents a "best practice" in the field. Our own experiments have shown a severe degradation in performance when using these nets w/out a complete retraining, the software engineering costs will be nontrivial, and much of the hardware we are forced to run on does not even support this functionality.
  As an analog, when we use integer based nets and switch between 16-bit and 8-bit integers, we see an unacceptable level of degradation, even though there is a modest speedup and we can use slightly larger neural nets. I'm very wary of anything with a mantissa much smaller than 16 bits for that reason--those few bits seem to make a significant difference, at least for what we're doing. We're solving a very difficult constrained optimization problem using markov chains in real time, and if the observational features are lower fidelity, the optimization search will run out of time to explore the search space effectively before the result is returned to the rest of the system. It's possible that the sensitivity of our optimization algorithm to input quality is the issue here, not the fundamental usefulness of FP16, but I'm still quite skeptical. If this were a "slam dunk", I'd expect to see it move through the literature in a wave like the Restricted Boltzmann Machine did.
  Oh, and thank you for the link (great reading) and the thoughtful reply. Not always easy to find on niche topics online.
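
The drop in fidelity when moving from 16-bit to 8-bit integers can be made concrete with a minimal sketch of symmetric uniform quantization. The scheme and the evenly spaced test values are assumptions for illustration; the quantization actually used in the poster's system is not described.

```python
def quantize(x, bits):
    """Symmetric uniform quantization of x in [-1, 1], returned as the dequantized value."""
    levels = 2 ** (bits - 1) - 1  # 32767 for 16 bits, 127 for 8 bits
    q = max(-levels, min(levels, round(x * levels)))
    return q / levels

def max_error(values, bits):
    """Largest absolute quantization error over the given values."""
    return max(abs(quantize(v, bits) - v) for v in values)

# Hypothetical activations: 1001 evenly spaced values in [-1, 1].
values = [i / 500 - 1 for i in range(1001)]

err16 = max_error(values, 16)
err8 = max_error(values, 8)
print(f"16-bit max error: {err16:.2e}")
print(f"8-bit  max error: {err8:.2e}")
```

The worst-case error is bounded by half a quantization step, so each bit removed roughly doubles it. Whether a particular model tolerates that is an empirical question, as the comment above notes.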
They can only compete on cost. by Anonymous Coward · 2016-12-12 05:37 · Score: 0

If they open their consumer products up to full, unrestricted GPGPU access they could have a chance by getting their tools in to the hands of programmers on the cheap.
Nvidia gates the good stuff behind expensive product lines because they can get away with it. The only difference between the consumer gear is preferential binning, memory amounts, and configuration fuses that enable/disable features (many of which are software only). The silicon is the same.
AMD could make inroads if they enable the small players to get the same results with generic gaming hardware. Of course that comes without the benefits of professional level support and mature frameworks, which is why Nvidia is ahead in the business and research space.
AMD needs to get their foot in the door because right now they're behind.
Ayy by Anonymous Coward · 2016-12-12 05:55 · Score: 0

Ayy
I hear... by Anonymous Coward · 2016-12-12 05:59 · Score: 0

I hear the Radeon Indistinct will be using fuzzy logic.
Value-add feature. by Anonymous Coward · 2016-12-12 06:11 · Score: 0

The only problem with MxGPU is that all the virtualization vendors see it as a value-add so it's going to be an extra cost item.
1. Re:Value-add feature. by Blaskowicz · 2016-12-14 07:36 · Score: 1
  
  There already exist some Fire Pro branded cards with the virtualization features. One is based around the Radeon R9 380's GPU and is quite expensive, but you mostly pay a "Fire Pro" premium akin to a "Quadro" premium. (Substitute FireGL / Fire Pro / Pro)
  It's the counterpart to nvidia's Geforce Grid (or formerly VGX), one redeeming quality is nvidia has sold complete Geforce Grid systems as in pre-built rackable servers while AMD will sell you the card only.
  I'll say it's a licensing issue: on a similar note, nothing could have stopped Windows XP Home from being able to run thirty thin clients if that's what you wish. Except they asked you to run Windows Server 2000 or Server 2003 and pay expensive per seat licenses instead.
  (XP Home did actually come with an RDP server and multi-user support, only these were labeled "remote assistance" and "fast user switching" respectively)
Efficiency indicators by Anonymous Coward · 2016-12-13 08:45 · Score: 0

The real bet in massively parallel processing is one: cost per flop.
If this is not much lower than current cpu/gpu configurations for a typical pc/server, even if these boards are super-fast for their scale, they will fail as a product in market.
Non-scalable (due to cost) massively parallel processing is only for research institutions, DoE and NSA that have enough money to spend on supercomputers anyway.