AMD Unveils Vega GPU Architecture With 512 Terabytes of Memory Address Space (hothardware.com)
MojoKid writes: AMD lifted the veil on its next generation GPU architecture, codenamed Vega, this morning. One of the underlying forces behind Vega's design is that conventional GPU architectures have not been scaling well for diverse data types. Gaming and graphics workloads have shown steady progress, but today's GPUs are used for much more than just graphics. In addition, the compute capability of GPUs may have been increasing at a good pace, but memory capacity has not kept up. Vega aims to improve both compute performance and addressable memory capacity, however, through some new technologies not available on any previous-gen architecture. First, Vega has the most scalable GPU memory architecture built to date, with 512TB of address space. It also has a new geometry pipeline tuned for more performance and better efficiency with over 2X peak throughput per clock, a new Compute Unit design, and a revamped pixel engine. The pixel engine features a new draw stream binning rasterizer (DSBR), which reportedly improves performance and saves power. All told, Vega should offer significant improvements in terms of performance and efficiency when products based on the architecture begin shipping in a few months.
Most high-end GPU cards available have 8 GB, a large number of budget versions settle for 4 GB, and only a few offer 16 GB. Marketing this as a standout point is iffy.
What you will find is that most cards have only a fraction of their RAM as addressable, so a 16GB card has either 4 or 8 gigs addressable. The increase to 512GB is a godsend to AI researchers and other fields with large datasets.
With Ryzen coming out soon and a new GPU design that looks very advanced, AMD is set to make substantial progress in market share, as long as they don't screw up. I'm rooting for them. I had switched all of our shop's new PCs to Intel when they released their 6th gen Core series as AMD was just too far behind. The consumer PCs were all AMD for the past five years or so. I wanna go back to AMD, as long as the new stuff performs. Don't let us down AMD!
Lisandro you COULD RTFA, you know? It's even an effing meme around here.
The HBCC gives the GPU access to 512TB (half a petabyte) of virtual address space and gives the GPU fine-grained control, for adaptable and programmable data movement. Often, more memory is allocated for a particular workload than is necessary; the HBCC will allow the GPU to better manage disparities like this for more efficient use of memory. The huge address space will also allow the GPU to better handle datasets that exceed the size of the GPU’s local cache. AMD showed a dataset being rendered in real-time on Vega using its ProRender technology, consisting of hundreds of gigabytes of data. Each frame with this dataset takes hours to render on a CPU, but Vega handled it in real-time.
The "news for nerds" version of this story's headline is "AMD Unveils Vega GPU Architecture With 49 bits of Memory Address Space"
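For anyone who wants to check the 49-bit figure, here is a minimal sketch of the arithmetic. The helper name `address_bits` is made up for illustration, and it assumes byte addressing with "512TB" read as 512 binary terabytes (TiB).

```python
def address_bits(num_bytes):
    """Smallest number of address bits able to index num_bytes distinct bytes."""
    # The highest address is num_bytes - 1; its bit length is the width needed.
    return (num_bytes - 1).bit_length()


TIB = 2**40  # one binary terabyte, in bytes (assumed reading of "TB")

# 512 TiB = 2**9 * 2**40 = 2**49 bytes, so 49 address bits.
print(address_bits(512 * TIB))
```

Reading "TB" as decimal terabytes (10**12 bytes) gives the same width, since 512 * 10**12 falls between 2**48 and 2**49.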
See that "Preview" button?
But this is not new at all. IIRC Nvidia's CUDA 5 already gives you 49 bits of unified address space. Don't really know the addressing limitations on previous AMD architectures, but I doubt it was substantially lower.
Realistically, a large address space of which you can practically fill only 0.05-0.1% means little for performance. I don't want to attack AMD over this, since they usually make really good GPU hardware, but this sounds like a marketing gimmick and nothing more. I particularly enjoyed the "hours to real-time" comparison... against a CPU.
What you will find is that most cards have only a fraction of their RAM as addressable, so a 16GB card has either 4 or 8 gigs addressable. The increase to 512GB is a godsend to AI researchers and other fields with large datasets.
Nope.
1: The GPU addresses the whole damn pool.
2: We're talking about 512 TB, not GB.
3: They're not planning to release a card with 512 TB of RAM, but they are releasing professional cards with lots of RAM (8 GB, 16 GB, or more) AND onboard connections for flash storage (SSDs). Vega will likely continue and extend this. By having a huge address space, you simply have the ability to keep the entire dataset in your cache on the card. The memory controller then decides what needs to live in the fast HBM2 chips at any given moment. You don't need to use PCIe bandwidth, go through the CPU, or (gasp) go to disk storage to get your dataset onto the card for processing after the initial load. You don't need to manually load pieces into or out of the GPU's memory. You just load your shit once and tell it to fucking go.
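A toy sketch of the idea described above: a huge virtual address space backed by a small pool of fast memory, with something deciding which pages stay resident. This is not AMD's HBCC. The class name `ToyPageCache`, the 4 KiB `PAGE_SIZE`, and the least-recently-used eviction policy are all assumptions picked for illustration.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # bytes per page (assumed for illustration)


class ToyPageCache:
    """Toy LRU cache standing in for a small pool of fast on-card memory."""

    def __init__(self, capacity_pages):
        self.capacity_pages = capacity_pages
        self.resident = OrderedDict()  # page number -> True, least recent first
        self.hits = 0
        self.misses = 0

    def access(self, virtual_address):
        """Touch one address; return 'hit' if its page was already resident."""
        page = virtual_address // PAGE_SIZE
        if page in self.resident:
            self.hits += 1
            self.resident.move_to_end(page)  # mark as most recently used
            return "hit"
        self.misses += 1
        if len(self.resident) >= self.capacity_pages:
            self.resident.popitem(last=False)  # evict least recently used page
        self.resident[page] = True  # "fetch" the page from the slower tier
        return "miss"


cache = ToyPageCache(capacity_pages=2)
# Addresses anywhere in a 49-bit space work; only two pages fit in "fast" memory.
results = [cache.access(a) for a in (0, 4096, 100, 2**48, 4096)]
print(results, cache.hits, cache.misses)
```

The calling code only ever hands over addresses. Which pages sit in the fast tier is the cache's problem, which is the "load once and go" point made above.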
They're actually using NVMe drives as the extra "memory". This works out well for huge datasets where you take a performance hit streaming it from the host. Load up the data on one of those 4GiB/s NVMe SSDs. They already have a product out that does this and it makes certain workloads much faster. Just waiting to see an 8x PCIe 4.0 NVMe XPoint SSD. Will be wicked fast for what they use it for.
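A back-of-the-envelope sketch of what that bandwidth means. The 256 GiB dataset size is a made-up example, not a figure from the article, and the function name `stream_seconds` is hypothetical. It also assumes the drive sustains its peak rate for the whole read, which real drives may not.

```python
def stream_seconds(dataset_gib, bandwidth_gib_per_s):
    """Time to read a dataset once at a sustained sequential bandwidth."""
    return dataset_gib / bandwidth_gib_per_s


# Hypothetical 256 GiB dataset on a 4 GiB/s NVMe SSD.
print(stream_seconds(256, 4.0))
```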
There is already technology available to feed this monster. Things like the EMC DSSD can have 1/2 PB of NVMe flash connected via a PCIe bridge, and presented as a single shared memory mapped space to an entire rack of servers. I assume that is the use case for these cards, mostly in the supercomputing space.
Pretty much all of it. I'm having a hard time finding out the number of addressing bits supported by, say, an Arctic Islands (4xx) GPU, but considering that AMD's GCN has offered unified memory on the entire 64-bit space since 2011 and nVidia has offered 49 bits of unified address space since CUDA 5, it surprises me that someone tried to make a selling point out of this feature.
They went as far as comparing a CPU render against their new GPU. WTF.
The only reason I can imagine someone would try to push this feature is that 512TB sounds like a huge number. There's no practical application for it in the near future, and any benefit of such a large addressing space you already got on previous architectures, both from AMD and its competition.
Someone check me on my logic here. The way I read this article is that AMD has created a new architecture with a memory controller that can address 512TB of memory address space. That's great and all but are we going to see cards any time in the near future with 512TB of GDDR on them? Not likely. How many years away are we? Who knows. It seems to me this is highly theoretical and possibly to put pressure on the memory industry to innovate on even more dense memory to push graphics even farther to the limit. It could also be to get some investor interest in the next "big thing".
Side question: How did AMD validate that their architecture works without being able to fabricate an actual board? In simulation?
The Intel i386 had a 32-bit address bus back in the late eighties. Nobody could afford 4 GB back then for the type of machine that would have one. I had an Amiga with 5 MB of RAM and people ooh'd and aah'd about that.
But it didn't matter. Early 32-bit machines didn't use the top bits to support more RAM, they used them to support more functionality. Flat address spaces, with VM used to locate memory exactly where it needed to be.
I suspect the aim is similar here too.
You don't understand.
The architecture can address that much, but the actual product will only address what's available.
There will be on-package HBM2 and the ability to connect to on-board (but off-package) storage in the form of fast flash.
512 TB of addressable space is just future proofing to allow for seamless work with a dataset regardless of whether it's on the 16 GB of ball-smackingly fast HBM2, on the SSD on your Radeon Pro card, or in your system memory (or potentially even abstracted out to disk storage). The drivers and GPU's memory manager handle moving the data around as you work on it. You shouldn't have to explicitly manage the dataset and thus not have to load it twice. In a worst-case scenario you'll be moving stuff back and forth over the PCIe bus, just as you do now. But that'll only happen if you're trying to be stupid or your dataset exceeds the capacity of the on-board memory and SSD.
Also keep in mind that AMD will almost certainly be releasing APUs (their CPU+GPU combos) with Vega cores and HBM2 memory. This is all an extension of their previous push for HSA (heterogeneous system architecture) - essentially tying CPU and GPU and memory and everything else closely together and letting everyone talk to each other in the most efficient way possible.