AMD Details Next-Gen Kaveri APU's Shared Memory Architecture

Posted by timothy on Tuesday April 30, 2013 @05:25AM from the grander-unified-theory dept.

crookedvulture writes "AMD has revealed more details about the unified memory architecture of its next-generation Kaveri APU. The chip's CPU and GPU components will have a shared address space and will also share both physical and virtual memory. GPU compute applications should be able to share data between the processor's CPU cores and graphics ALUs, and the caches on those components will be fully coherent. This so-called heterogeneous uniform memory access, or hUMA, supports configurations with either DDR3 or GDDR5 memory. It's also based entirely in hardware and should work with any operating system. Kaveri is due later this year and will also have updated Steamroller CPU cores and a GPU based on the current Graphics Core Next architecture." bigwophh links to the Hot Hardware take on the story, and writes "AMD claims that programming for hUMA-enabled platforms should ease software development and potentially lower development costs as well. The technology is supported by mainstream programming languages like Python, C++, and Java, and should allow developers to more simply code for a particular compute resource with no need for special APIs."

Re:Where's the fine print? by serviscope_minor · 2013-04-30 05:56 · Score: 5, Insightful

You can't beat an Ivy Bridge chip for performance for watt though.
Ehugh. Yes no kind of.
For "general" workloads IVB chips are the best in performance per Watt.
In some specific workloads, the high-core-count Piledrivers beat IVB, but that's rare. For almost all x86 work IVB wins.
For highly parallel churny work that GPUs excel at, they beat all x86 processors by a very wide margin. This is not surprising. They replace all the expensive silicon that makes general purpose processors go fast and put in MOAR ALUs. So much like the long line of accelerators, coprocessors, DSPs and so on, they make certain kinds of work go very fast and are useless at others.
But for quite a few classes of work, GPUs trounce IVB at performance per Watt.
The trouble is that GPUs suck. They have teeny amounts of local memory and a slow interconnect to main memory. They also suck at certain things, and batting data between the fast (for some things) GPU and fast (for other things) CPU is a real drag because of the latency. This limits the applicability of GPUs.
With the new architecture, which I (and presumably many others) hoped was AMD's long-term goal, a number of these problems disappear, since the link is very low latency and the memory is fully shared.
This means the very superior performance per Watt (for some things) GPU can be used for a wider range of tasks.
So yes, this should do a lot for power consumption for a number of tasks.
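To put rough shape on the latency argument above, here is a toy cost model in Python. It is only a sketch: the function names and every number are made up for illustration, and nothing here is measured on Kaveri or any other hardware. The point it shows is that a small job only pays off on the GPU when the round trip to it is cheap.

```python
# Toy cost model with made-up numbers; nothing here is measured on real hardware.

def offload_time(gpu_compute_s, transfer_s):
    """Total time when work is sent to the GPU: a round trip plus compute."""
    return 2 * transfer_s + gpu_compute_s


def worth_offloading(cpu_compute_s, gpu_compute_s, transfer_s):
    """True when the GPU path finishes sooner than staying on the CPU."""
    return offload_time(gpu_compute_s, transfer_s) < cpu_compute_s


# Hypothetical small task: 1.0 ms on the CPU, 0.2 ms on the GPU.
slow_link = worth_offloading(1.0e-3, 0.2e-3, transfer_s=0.5e-3)   # 1.2 ms total
fast_link = worth_offloading(1.0e-3, 0.2e-3, transfer_s=0.05e-3)  # 0.3 ms total
```

With the slow link the transfers alone eat the whole CPU budget, so the job stays on the CPU; with the fast link the same job is worth handing to the GPU.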

--
SJW n. One who posts facts.

Re:Why compromise? by SenatorPerry · 2013-04-30 06:13 · Score: 5, Informative

In OpenCL you need to copy items from system memory to the GPU's memory and then load the kernel on the GPU to start execution. After execution, you must copy the data back from the GPU's memory. AMD is saying that you can pass a pointer to the data in main memory instead of making copies of it.
This should reduce some of the memory shifting on the system and speed up OpenCL execution. It will also eliminate some of the memory constraints on OpenCL regarding what you can do on the GPU. On a larger scale it will open up some opportunities for optimizing work.
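The difference between the two flows can be sketched in plain Python. This is a toy model, not OpenCL: `scale_kernel`, `run_with_copies` and `run_with_shared_pointer` are hypothetical names, a `bytearray` stands in for system memory, and a `memoryview` stands in for the shared pointer.

```python
# Toy model only: plain Python standing in for host and device memory.
# None of these names come from OpenCL or from AMD's hUMA material.

def scale_kernel(buf, factor):
    """Hypothetical 'kernel': scales every byte of buf in place (mod 256)."""
    for i in range(len(buf)):
        buf[i] = (buf[i] * factor) % 256


def run_with_copies(host_data, factor):
    """Copy-based flow: copy in, execute, copy back."""
    device_buf = bytearray(host_data)    # host -> device copy
    scale_kernel(device_buf, factor)     # kernel works on the device copy
    host_data[:] = device_buf            # device -> host copy
    return 2 * len(host_data)            # bytes moved between the two memories


def run_with_shared_pointer(host_data, factor):
    """Shared-memory flow: the kernel gets a reference to the same bytes."""
    shared_view = memoryview(host_data)  # a view of the buffer, not a copy
    scale_kernel(shared_view, factor)    # kernel writes straight into host memory
    return 0                             # nothing was copied


data_a = bytearray([1, 2, 3, 4])
data_b = bytearray([1, 2, 3, 4])
copied = run_with_copies(data_a, 3)
shared = run_with_shared_pointer(data_b, 3)
```

Both calls leave the same result in the buffer; the only difference is how many bytes had to be shuffled around to get there.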