AMD Introduces Radeon Instinct Machine Intelligence Accelerators (hothardware.com)
Reader MojoKid writes: AMD is announcing a new series of Radeon-branded products today, targeted at machine intelligence and deep learning enterprise applications, called Radeon Instinct. As its name suggests, the new Radeon Instinct line of products is comprised of GPU-based solutions for deep learning, inference and training. The new GPUs are also complemented by a free, open-source library and framework for GPU accelerators, dubbed MIOpen. MIOpen is architected for high-performance machine intelligence applications and is optimized for the deep learning frameworks in AMD's ROCm software suite. The first products in the lineup consist of the Radeon Instinct MI6, the MI8, and the MI25. The 150W Radeon Instinct MI6 accelerator is powered by a Polaris-based GPU, packs 16GB of memory (224GB/s peak bandwidth), and will offer up to 5.7 TFLOPS of peak FP16 performance. Next up in the stack is the Fiji-based Radeon Instinct MI8. Like the Radeon R9 Nano, the Radeon Instinct MI8 features 4GB of High-Bandwidth Memory (HBM) with peak bandwidth of 512GB/s. The MI8 will offer up to 8.2 TFLOPS of peak FP16 compute performance, with a board power that typically falls below 175W. The Radeon Instinct MI25 accelerator will leverage AMD's next-generation Vega GPU architecture and has a board power of approximately 300W. All of the Radeon Instinct accelerators are passively cooled, but when installed into a server chassis you can bet there will be plenty of air flow. Like the recently released Radeon Pro WX series of professional graphics cards for workstations, Radeon Instinct accelerators will be built by AMD. All of the Radeon Instinct cards will also support AMD MultiUser GPU (MxGPU) hardware virtualization technology.
"Besides being built for massive scaling, it includes compilers, language run times and interesting (and importantly) CUDA-application support. (CUDA being the NVIDIA developed GPGPU programming language.)"
Holy balls! Time to eat crow buddy: CUDA is fucking supported...
Source: https://www.pcper.com/reviews/Graphics-Cards/Radeon-Instinct-Machine-Learning-GPUs-include-Vega-Preview-Performance
So they're all excited about the lowest-precision, smallest-size floating point math in IEEE 754?
FP16 is good enough for neural nets. Do you really think the output voltage of a biological neuron has 32 bits of precision and range? For any given speed, FP16 allows you to run NNs that are wider and deeper, and/or to use bigger datasets. That is way more important than the precision of individual operations.
As opposed to RAM that's put on a video card but isn't addressable, so that all it does is waste space and power?
There's a lot of rounding error with FP16.
Sure, but it doesn't matter. Backprop, learning rate, denoising, etc. are all just heuristics anyway. So what if your mantissa is off by one bit? You get better accuracy by going wider, adding layers, and (most importantly) using more data. But you can't afford to do that if half your bandwidth is sucked up transmitting meaningless precision.
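To put rough numbers on the rounding-error versus bandwidth trade-off being argued here, this is a minimal Python sketch using only the standard library's `struct` module, which can pack IEEE 754 half-precision values with the `e` format code. The helper names `to_fp16` and `to_fp32` and the sample weight `0.1` are made up for illustration; nothing here comes from AMD's software.

```python
import struct

def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# Hypothetical weight value, chosen only because 0.1 is not exactly
# representable in binary floating point.
weight = 0.1

w16 = to_fp16(weight)
w32 = to_fp32(weight)

# Relative rounding error introduced by each format.
rel_err16 = abs(w16 - weight) / weight
rel_err32 = abs(w32 - weight) / weight

# Storage (and therefore transfer) cost per value, in bytes.
bytes16 = struct.calcsize("<e")
bytes32 = struct.calcsize("<f")

print(f"FP16: {w16!r}  relative error {rel_err16:.2e}  {bytes16} bytes")
print(f"FP32: {w32!r}  relative error {rel_err32:.2e}  {bytes32} bytes")
```

FP16 keeps 10 explicit mantissa bits, so for values in its normal range the relative rounding error is bounded by 2^-11 (about 0.05%), while each value takes half the bytes of FP32. Whether that error matters for a given network is exactly what the thread is debating.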
Also, do you have a good citation that FP16 neural networks are, overall, more effective than FP32 networks, as you've described?
They are not necessarily more effective, just more efficient. If you have infinite resources, you might even get better results using FP32. But resources are never infinite. Here is a guy who claims that even 8 bits is enough for deep NNs.
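For a sense of what "8 bits" can mean in practice, here is a rough Python sketch of symmetric linear quantization to signed 8-bit integers. This is only one common scheme and an assumption on my part, not necessarily what the linked author describes; the function names and the sample weights are made up for illustration.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using one shared scale factor.

    Assumes at least one value is nonzero, otherwise the scale would be 0.
    """
    scale = max(abs(v) for v in values) / 127.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats from the 8-bit integers."""
    return [qi * scale for qi in q]

# Hypothetical layer weights, for illustration only.
weights = [0.42, -1.30, 0.07, 0.95, -0.58]

q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Worst-case absolute error across the weights; rounding to the nearest
# integer step keeps it within about half a step (scale / 2).
max_err = max(abs(a - b) for a, b in zip(weights, restored))

print("quantized:", q)
print("max abs error:", max_err, "step size:", scale)
```

Each weight now takes one byte instead of four, at the cost of an absolute error of up to roughly half a quantization step.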
There's this thing called "compilers". They eat source code and spit out binaries. Then there's this thing called SPIR-V. AMD supports it. Now put two and two together. If you want to be tortured on the CUDA rack, there's little preventing you from opting for it.
Ezekiel 23:20