AMD Details High Bandwidth Memory (HBM) DRAM, Pushes Over 100GB/s Per Stack
MojoKid writes: Recently, a few details of AMD's next-generation Radeon 300-series graphics cards have trickled out. Today, AMD publicly disclosed new information about its High Bandwidth Memory (HBM) technology, which will be used on some Radeon 300-series and APU products. Currently, a relatively large number of GDDR5 chips are necessary to offer sufficient capacity and bandwidth for modern GPUs, which means significant PCB real estate is consumed. On-chip integration is not ideal for DRAM because it is not size- or cost-effective with a logic-optimized GPU or CPU manufacturing process. HBM, however, brings the DRAM as close as possible to the logic die (GPU). AMD partnered with Hynix and a number of other companies to help define the HBM specification and design a new type of memory chip with low power consumption and an ultra-wide bus, which was eventually adopted by JEDEC in 2013. They also developed a DRAM interconnect called an "interposer," along with ASE, Amkor, and UMC. The interposer allows DRAM to be brought into close proximity with the GPU and simplifies communication and clocking. HBM DRAM chips are stacked vertically, and "through-silicon vias" (TSVs) and "bumps" are used to connect one DRAM chip to the next, then to a logic interface die, and ultimately to the interposer. The end result is a single package on which the GPU/SoC and High Bandwidth Memory both reside. 1GB of GDDR5 memory (four 256MB chips) requires roughly 672mm². Because HBM is vertically stacked, that same 1GB requires only about 35mm². The bus on an HBM chip is 1024 bits wide, versus 32 bits on a GDDR5 chip. As a result, the High Bandwidth Memory interface can be clocked much lower but still offer more than 100GB/s per stack, versus 25GB/s per chip with GDDR5. HBM also requires significantly less voltage, which equates to lower power consumption.
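As a rough check on the summary's numbers, the sketch below back-solves the per-pin data rate each interface would need from the bandwidth and bus-width figures quoted above, and compares the two footprints. The function name and the use of decimal units (1 GB/s = 8 Gbit/s) are assumptions made for illustration; the summary does not give AMD's actual signaling rates.

```python
def per_pin_rate_gbps(bandwidth_gb_s: float, bus_width_bits: int) -> float:
    """Per-pin data rate (Gbit/s) needed to reach a given bandwidth (GB/s)."""
    return bandwidth_gb_s * 8 / bus_width_bits

hbm_rate = per_pin_rate_gbps(100, 1024)    # HBM stack: 100 GB/s over 1024 bits
gddr5_rate = per_pin_rate_gbps(25, 32)     # GDDR5 chip: 25 GB/s over 32 bits
area_ratio = 672 / 35                      # mm^2 for 1GB: GDDR5 vs HBM

print(f"HBM per-pin rate:   {hbm_rate:.2f} Gbit/s")
print(f"GDDR5 per-pin rate: {gddr5_rate:.2f} Gbit/s")
print(f"Footprint ratio:    {area_ratio:.1f}x")
```

On these figures each HBM pin runs at about an eighth of the GDDR5 per-pin rate, which is the sense in which the interface "can be clocked much lower."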
What is the point of providing more than one link if all of them point to the same page?
They're a substantially less evil force than either Intel or NVIDIA.
The Internet King? I wonder if he could provide faster nudity.
They just went from 320GB/s on the 290X to ~500GB/s. This will do nothing at all to security. Also, we still have no idea how memory latency is impacted (shorter paths, but also lower clocks). If they can scale this down to APUs and lower-cost GPUs, then there is some really great potential.
One has to give it to AMD. Despite their stock and sales taking a battering, they have consistently refused to let go of cutting edge innovation. If anything, their CPU team should learn something from their GPU team.
On the topic of HBM, the most exciting thing is the power saving. This would potentially shave off 10-15W from the DRAM chip and possibly more from the overall implementation itself - simply because this is a far simpler and more efficient way for the GPU to address memory.
To quote:
"Macri did say that GDDR5 consumes roughly one watt per 10 GB/s of bandwidth. That would work out to about 32W on a Radeon R9 290X. If HBM delivers on AMD's claims of more than 35 GB/s per watt, then Fiji's 512 GB/s subsystem ought to consume under 15W at peak. A rough savings of 15-17W in memory power is a fine thing, I suppose, but it's still only about five percent of a high-end graphics card's total power budget. Then again, the power-efficiency numbers Macri provided only include the power used by the DRAMs themselves. The power savings on the GPU from the simpler PHYs and such may be considerable."
http://techreport.com/review/2...
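The arithmetic in that quote can be reproduced in a few lines. This is only a sketch using the quoted figures (10 GB/s per watt for GDDR5, 35 GB/s per watt for HBM, 512 GB/s for Fiji); the 320 GB/s value for the R9 290X is the one implied by the quoted 32W, and the helper name is made up for illustration.

```python
def memory_power_watts(bandwidth_gb_s: float, gb_s_per_watt: float) -> float:
    """DRAM power at a given bandwidth and efficiency (GB/s per watt)."""
    return bandwidth_gb_s / gb_s_per_watt

gddr5_watts = memory_power_watts(320, 10)   # R9 290X, GDDR5
hbm_watts = memory_power_watts(512, 35)     # Fiji, HBM at AMD's claimed efficiency
savings = gddr5_watts - hbm_watts

print(f"GDDR5: {gddr5_watts:.1f} W, HBM: {hbm_watts:.1f} W, savings: {savings:.1f} W")
```

As the quote notes, this counts only the DRAMs themselves, not any savings on the GPU side.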
For high-end desktop GPUs, this may not be much, but it opens up exciting possibilities for gaming laptop GPUs, small form factor / console form factor gaming machines (Steam Machine... sigh), etc. This kind of power savings combined with increased bandwidth can be a potential game changer. You can finally have a lightweight, thin gaming laptop that can still do 1080p at high detail levels in modern games.
I know Razer etc already have some options, but a power efficient laptop GPU from the AMD stable will be a very compelling option for laptop designers. And really, AMD needed something like Fiji - they really have to dig themselves out of their hole.
Do we really need to see an article from MojoKid every day to drive traffic to his site?
The answer depends on how that question is interpreted. No, we don't need to see such an article. Yes, such an article is necessary to drive traffic to said site, if that's your goal anyway.
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
Nvidia fanboy since switching to Linux. Between AMD releasing their new unified driver, the latest Nvidia chips being notoriously hard for nouveau, and now this, I think my next card is going to come from AMD.
All the graphics I have seen show the HBM memory as much taller or thicker than the GPU core. Is that how it will be? And if so, how are they going to make the lid make good contact with the hot parts when they are "recessed"?
The good: next year AMD starts making APUs with HBM. The only thing that was holding back the iGPU was memory bandwidth. So now they can put a 1024-shader GPU on the die and not have it starved for bandwidth. That will have interesting applications: powerful gaming laptops much cheaper than those with a discrete GPU, and HPC (especially considering HSA applications).
The bad: this year AMD is only releasing one new GPU, Fiji. The rest are rebadges. And there is no new architecture. Even Fiji is making do with GCN 1.2
Moore's law covers this security concern. Expect computations to keep doubling; your key that takes a year to crack will take a few hours, or a few days, within the next decade.
If something is so important that you feel the need to post it on the internet... It probably isn't that important.
which was eventually adopted by JEDEC 2013
I hope the organizers were careful not to invite any patent trolls to that round.
I'm not sure how increased bandwidth could make a difference beyond sheer number of tries per second. On that scale, it's (very) roughly a factor of 2 difference, so add a bit and move on, no?
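To put "add a bit and move on" in numbers: if an attacker's guess rate goes up by some factor, the extra key length needed to cancel it out is the base-2 logarithm of that factor. A minimal sketch, assuming guess rate scales linearly with memory bandwidth (a generous assumption in the attacker's favor) and taking 320 GB/s (R9 290X) and 512 GB/s (Fiji) as the before and after bandwidths; the function name is made up for illustration.

```python
import math

def extra_key_bits(speedup: float) -> float:
    """Extra key bits needed to offset a brute-force speedup factor."""
    return math.log2(speedup)

bandwidth_speedup = 512 / 320               # Fiji vs R9 290X memory bandwidth
print(f"{bandwidth_speedup:.1f}x speedup -> {extra_key_bits(bandwidth_speedup):.2f} bits")
print(f"2.0x speedup -> {extra_key_bits(2.0):.2f} bits")
```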
It's not like Bitcoin mining hardware hasn't already made passwords effectively useless.
What a retarded statement. Even all bitcoin hardware in the world can not break one (real) password,
nor any password that is also hashed with something other or more than SHA-256.
GPUs tend to be more bandwidth-constrained than latency-constrained, as they typically have long pipelines that can hide much of the latency. Though I'm sure you could make them really latency-constrained if you wrote shaders badly.
Not when your data accesses are very heavily pipelined, as in the case of 3D graphics. Efficient rendering code is generally set up to render a small number of large batches precisely because of this pipelining.
I'm glad you brought that up. Clearly, the AMD engineers spent all this time working on the wrong solution.
And if you stagger the memory addresses round robin and maybe offset the clock of each stack appropriately, might there be a performance gain as well?
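For what the round-robin idea might look like, here is a minimal sketch of address interleaving across stacks. The stack count, the 256-byte interleave granularity, and the function name are all hypothetical; nothing in the article says how AMD actually maps addresses to HBM stacks.

```python
NUM_STACKS = 4          # hypothetical number of HBM stacks
INTERLEAVE_BYTES = 256  # hypothetical interleave granularity

def stack_for_address(addr: int) -> int:
    """Map a byte address to a stack, rotating every INTERLEAVE_BYTES."""
    return (addr // INTERLEAVE_BYTES) % NUM_STACKS

# A sequential 2 KiB read visits each stack in turn, so consecutive
# chunks land on different stacks and could be serviced in parallel.
touched = [stack_for_address(a) for a in range(0, 2048, INTERLEAVE_BYTES)]
print(touched)
```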
Googling the JEDEC document number JESD235 from the article found several references with NVIDIA talking about this for 2 years now for their Pascal series of chips after Maxwell.
future-nvidia-pascal-gpus-pack-3d-memory-homegrown-interconnect
http://en.wikipedia.org/wiki/High_Bandwidth_Memory
http://www.cs.utah.edu/thememoryforum
No. Driving an external bus requires a lot of silicon area to handle the capacitance, resistance, and distance. This also requires a lot of power.
Stacked chips require far smaller drivers. The distance is in millimeters rather than decimeters. The insulators are far better (as the current and voltage can be far smaller). Capacitance is also far lower. And you do not need to route 1024-bit data paths plus address and signaling lines on the motherboard, which makes motherboards far simpler and cheaper to make. That is before counting the problems with signal propagation along different-length paths on a motherboard (designed into the chip in this case) or interactions between the multilayer PCB traces.
So yes there are very good reasons to do this.
Control-F "heat"
[No Results]
My thoughts exactly.
That's not to say this isn't great stuff, but thermal issues have always been the first or second most important problem with stacked memory. The other problem being fabrication and routing. You can't put a heatsink on a die when the die is wedged between 5 other dies. So you're hoping a heatsink on the top and bottom is enough for the middle wafers, or you're running some sort of tiny heat exchanger system.
How about they work on the drivers? I love AMD cards and the new HBM is super, but every 1-2 years I fall for the AMD/ATI video card hype and fall into the driver hell trap. The 260X and 285X had horrid Linux/Windows support. I can live with heat or noise issues, but driver crashes on a black desktop with icons? Come on. Before they break out a new GPU, how about they fix the drivers? Nvidia gets hate for their drivers, but they work and don't go bonkers.
So confused. GPU memory bandwidth really hasn't been an issue.
Uh, yes, it has. That's why GPU manufacturers often made mid-range cards by putting the same chip on a board with a half-wide memory bus; saves money on RAM, and is significantly slower.
Expect computations to keep doubling this your key that takes a year to crack will take a few hours in the course of a few days in the next decade.
Not to worry; I'll change my key several times before then.
Oh yes, when you get to the hundreds-of-MHz to GHz range, the signaling properties and distance become very important. Electrical signals in copper do not travel at the speed of light. Even if they did, you would still have timing issues due to the different lengths of the copper traces on the motherboard. It takes a lot of skill to negate this single effect. The wider the bus, the harder it is to achieve.
Unless you get Monster cables, of course!
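To get a feel for the trace-length problem, the sketch below estimates the timing skew caused by a length mismatch between two traces and compares it with the bit period. The propagation speed (about half the speed of light, a common rule of thumb for FR-4 boards), the 1 cm mismatch, and the 5 Gbit/s data rate are assumptions chosen for illustration, not measurements.

```python
SIGNAL_SPEED_MM_PER_NS = 150.0   # assumed: roughly half the speed of light

def skew_ps(length_mismatch_mm: float) -> float:
    """Arrival-time difference (ps) between two traces of unequal length."""
    return length_mismatch_mm / SIGNAL_SPEED_MM_PER_NS * 1000

def bit_period_ps(data_rate_gbps: float) -> float:
    """Duration of one bit (ps) at the given per-pin data rate."""
    return 1000 / data_rate_gbps

mismatch_mm = 10.0               # assumed 1 cm length difference
rate_gbps = 5.0                  # assumed per-pin data rate
fraction = skew_ps(mismatch_mm) / bit_period_ps(rate_gbps)
print(f"skew {skew_ps(mismatch_mm):.1f} ps = {fraction:.0%} of a {bit_period_ps(rate_gbps):.0f} ps bit")
```

Under these assumptions a 1 cm mismatch eats a third of each bit time, and that budget has to hold across every line of the bus.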
Latency is dominated by reading the DRAM itself, not by the interface frequency, which is why latency specified in clocks is roughly proportional to interface frequency.
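A small worked example of that point, assuming a fixed DRAM access time of 14 ns (an illustrative figure, not one from the article): the latency in nanoseconds stays the same, while the latency counted in interface clocks scales with the clock.

```python
ACCESS_TIME_NS = 14.0   # assumed DRAM array access time, for illustration

def latency_in_clocks(clock_mhz: float) -> float:
    """Fixed access time expressed in interface clock cycles."""
    return ACCESS_TIME_NS * clock_mhz / 1000

for clock_mhz in (500, 1000, 1500):
    print(f"{clock_mhz} MHz: {latency_in_clocks(clock_mhz):.0f} clocks, {ACCESS_TIME_NS:.0f} ns")
```

So a lower-clocked interface reports fewer clocks of latency without the access itself getting any faster or slower.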
I take it the HotHardware people have mod points today
Have you seen how their company is doing lately?
That is debatable. Bumping the signals from the silicon to light seems to be quite slow (sort of time expensive as these things go). Almost enough to negate the transmission time overhead. Many people are working on this issue (lasing elements on the silicon substrate and various other good ideas)
But this seems to work. It avoids the large bus drivers, it avoids the distance issues, and it avoids the routing issues.
Note that I said "avoid" a lot there:)