AMD Details Next-Gen Kaveri APU's Shared Memory Architecture
crookedvulture writes "AMD has revealed more details about the unified memory architecture of its next-generation Kaveri APU. The chip's CPU and GPU components will have a shared address space and will also share both physical and virtual memory. GPU compute applications should be able to share data between the processor's CPU cores and graphics ALUs, and the caches on those components will be fully coherent. This so-called heterogeneous uniform memory access, or hUMA, supports configurations with either DDR3 or GDDR5 memory. It's also based entirely in hardware and should work with any operating system. Kaveri is due later this year and will also have updated Steamroller CPU cores and a GPU based on the current Graphics Core Next architecture."
bigwophh links to the Hot Hardware take on the story, and writes "AMD claims that programming for hUMA-enabled platforms should ease software development and potentially lower development costs as well. The technology is supported by mainstream programming languages like Python, C++, and Java, and should allow developers to more simply code for a particular compute resource with no need for special APIs."
Dear Linux Advocate,
Money doesn't grow on trees. And, Linux Advocates is growing. Naturally, we anticipate operating costs and hope to be able to meet them.
But, any amount you feel you are able to donate in support of our ongoing work will be most surely appreciated and put to very good use. Your contributions keep Linux Advocates growing.
Show your support by making a donation today.
Thank you.
Dieter T. Schmitz
Linux Advocates, Owner
http://www.linuxadvocates.com/p/support.html
HXT slots and video cards any time soon ?
World English Dictionary
synonyms: alleged, ostensible, nominal
This is no match for Haswell. But I'm glad to see AMD is doing its job so well: being a nice distant 2nd place while keeping Intel's prices at bay.
will feature this technology. It will be interesting to see how it stacks up.
I'm not so sure how I feel about this whole Linux advocacy thing you're trying to promote. But spam, now there's an idea I can get behind! Take my money!
I went to eat some animal crackers and the box said, "Do not eat if seal is broken." I opened the box and sure enough..
I'm curious how long it will be before these optimizations are found in the compilers themselves.
As usual, AMD is leaving out some key information. What will be the TDP of such chips? I've always rooted for AMD and all my systems were built with them. You can't beat an Ivy Bridge chip for performance per watt, though. AMD doesn't offer anything compelling to compete with the i7-3770K. I like the idea that they're using the GCN architecture to assist with processing, but have they done anything to the lithography or power consumption? Intel's Haswell chips come out soon and those are even better. Power is key in the mobile space where a lot of chips are going. -Joe
Does this make it easier or harder to write malware? Serious question.
This should really help round trip times through the GPU. With most existing setups, doing a render to texture and getting the results back CPU side is quite expensive, but this should help a lot. It should also work great for procedural editing/generating/swapping geometry that you are rendering. Getting all those high poly LODs onto the GPU will no longer be an issue with systems like this.
Interestingly enough, this is somewhat similar to what Intel has now for their integrated graphics, except it looks like the AMD GPU has access to the full address space and cache system, which Intel's does not. Also, it's not an Intel GPU, so it's likely better in other ways too, but I shouldn't need to point that out.
Intel's Haswell is moving in the opposite direction, working to get some dedicated memory for the GPU, which is closer to the traditional GPU approach. It's nice to see companies exploring new areas; hopefully we will get some great hardware out of it, ideally with no broken drivers.
One question they never seem to answer is why bother unifying the memory architecture at all? CPU and GPU memory architectures have always been different for the same reasons that CPUs and GPUs themselves are different; one is designed for fast execution of serial instructions with corresponding random smaller reads and writes to memory, and the other is designed for fast execution of parallel instructions with corresponding contiguous reads and writes that are much larger in size. It seems like you're just going to end up compromising if you try to shoehorn one onto the other.
They talk about passing pointers back and forth as though the GPU and CPU effectively share an MMU. The problem is, GPUs and CPUs don't work the same way. GPUs need to access shared resources that are per-system, whereas CPUs need to limit access to resources on a per-process basis. It would be devastating if a GPU could, for example, allow an arbitrary user-space process to overwrite parts of the kernel and inject virus code that runs with greater-than-root privilege. It would similarly be devastating if some arbitrary process could, for example, read the private RAM that backs your keychain or other security-related processes.
I'm assuming that they're doing something sane like having a separate set of RWX bits on each page table entry to control what the GPU's rights are for that page, so that the GPU would only be allowed to read specifically flagged main-memory pages, but these fuzzy marketing briefs provide just enough information to be terrifying.
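To make that guess concrete, here is a rough sketch of what separate per-page GPU permission bits could look like. Everything in it (the `PageTableEntry` class, the `gpu_perm` field, the `gpu_access_allowed` check) is made up for illustration; nothing in AMD's briefing describes the actual page-table format or how access is enforced.

```python
from enum import IntFlag


class Perm(IntFlag):
    """Read/write/execute permission bits."""
    NONE = 0
    R = 1
    W = 2
    X = 4


class PageTableEntry:
    """Hypothetical PTE with independent CPU and GPU permission fields."""

    def __init__(self, frame, cpu_perm, gpu_perm=Perm.NONE):
        self.frame = frame          # physical frame number
        self.cpu_perm = cpu_perm    # what CPU code may do with the page
        self.gpu_perm = gpu_perm    # what the GPU may do; default is no access


def gpu_access_allowed(pte, wanted):
    """Return True only if every requested bit is granted to the GPU."""
    return (pte.gpu_perm & wanted) == wanted


# A kernel page: the CPU may read and execute it, the GPU gets nothing.
kernel_page = PageTableEntry(frame=0x10, cpu_perm=Perm.R | Perm.X)

# A buffer explicitly flagged as shared with the GPU for reading and writing.
shared_buf = PageTableEntry(frame=0x20, cpu_perm=Perm.R | Perm.W,
                            gpu_perm=Perm.R | Perm.W)

print(gpu_access_allowed(kernel_page, Perm.W))  # GPU write to kernel page is refused
print(gpu_access_allowed(shared_buf, Perm.R))   # GPU read of the shared buffer is allowed
```

The point of the default `gpu_perm=Perm.NONE` is that a page is invisible to the GPU unless somebody deliberately flags it, which is the "sane" behavior assumed above.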
...heterogeneous uniform ...
Now there's an oxymoron!
Apparently not too many Finnish speakers here yet. Kaveri => partner/pal/mate, APU => help.
HTH,
ac
If you want the fastest thing on the market, buy Intel. For the majority who want the best deal or most bang for their buck, AMD is the best buy.
With a GPU next to the CPU, the latency between them is reduced, which is awesome for OpenCL applications. Imagine you wanted to work a Markov model into your AI, you needed to do a large number of matrix calculations to get it to run properly, and you wanted it in real time. I think this might solve that problem. I'm imagining game AI improving with adoption of this style of processor. Anyone see this differently?
Eat sleep die
That's why Intel's HD4000 is faster than AMD's HD 7660D in several OpenCL benchmarks. http://semiaccurate.com/2013/04/29/a-look-at-intels-opencl-performance/
And it didn't get 'shelved', it got turned into a Tesla-Alike, since while it was great for GPGPU loads, it actually sucked as a replacement for the then-current generation AMD/Nvidia GPUs and by the time it was released to the public it would've been an i740/i752 disaster all over again.
In low-cost systems the CPU and GPU are combined on a single chip with a single (slow) memory controller. Given that constraint, AMD is trying to at least wring as much efficiency as they can from that single cheap chip. I salute them for trying to give customers more for their money, but let's admit that this hUMA thing is not about breaking performance records.
"Kaveri" is actually the name of a major river in Karnataka, a state in India. AMD also has a "Kabini" chip, and "Kabini" is also a river in Karnataka, India :)
I think AMD overrates heterogeneous computing. The assumption is that all applications can take advantage of GPGPU. This is simply not true. Only certain types of application are suitable, such as multimedia and simulation, where it's very obvious which part of the code can be parallelised.
With GDDR5 memory this could be very interesting.
Hi Charlie!
I'd still prefer an i3 and an entry level dedicated videocard.
Oh, hello there. I must disappoint you: I'm not Charlie, I'm just one of SemiAccurate's readers. Anyway, a few days ago I was rather surprised that even though the HD4000 has half the raw performance of the HD 7660D, it still manages to beat the HD 7660D in quite a few benchmarks. Shared Memory Architecture is an obvious explanation for that...
OK, so the SGI O2's UMA has now been reinvented for a new generation, just with more words tacked on....
Does this mean that you can pass a pointer to a buffer object from a GPU process to a CPU process, manipulate it on the CPU and pass the pointer back to the GPU to continue processing there?
Is to use it, install it in as many places as you can (for friends and family) and work out any problems or questions they may have. Even if they don't stick with it, the experience will be useful in general and will help shape and grow Linux. You can start them off slowly by recommending some OSS apps where they may be useful, such as LibreOffice, VLC, Firefox, Chromium, Inkscape, Gimp, Pidgin, Thunderbird, etc. Many of them are probably already running a couple of those apps. Eventually they can switch over painlessly, or at least benefit from OSS in general.
I've never been to your site or heard of it, and I still come across many advocates. I don't think pouring resources into such an insignificant site will benefit Linux in general. The hard core are already doing their job, the supporters are fine with Google and official forums for their distro, and the users are doing the best thing they can, using the product.
So where exactly does linuxadvocates.com come in again? Seems useless.
I'm interested to see what the software model for this will be. Sure, they could use OpenCL, but it seems like a lot of the pain in using OpenCL derives from the underlying memory architecture. With a shared virtual address space and fully coherent caches all in hardware, it should be possible to have a much simpler software model than OpenCL. I guess it doesn't really matter what the software model is, though, since now that everything is in main memory, GPU functions can be called just like regular functions and the caller doesn't need to care how they are implemented. E.g. it should be possible to have a BLAS GPU library that operates on main memory pointers, where before the cost of copying a matrix to the GPU and back for a single operation wouldn't have been worth it.
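A back-of-the-envelope model shows why the copy matters for a single BLAS-style call. This is only a sketch: the throughput and bus figures are invented round numbers, not measurements of any real CPU, GPU, or Kaveri itself, and the model ignores launch overhead and memory-bandwidth limits on the compute side.

```python
def cpu_time(flops, cpu_flops_per_s):
    """Seconds to do the work on the CPU."""
    return flops / cpu_flops_per_s


def offload_time(flops, gpu_flops_per_s, bytes_moved=0, bus_bytes_per_s=1.0):
    """Seconds to do the work on the GPU, plus any copy over the bus."""
    return flops / gpu_flops_per_s + bytes_moved / bus_bytes_per_s


# One single-precision matrix-vector multiply with a 1024 x 1024 matrix.
n = 1024
flops = 2 * n * n          # one multiply and one add per matrix element
matrix_bytes = 4 * n * n   # 4 bytes per float; vector and result are ignored

# Assumed, purely illustrative rates.
CPU_RATE = 20e9    # 20 GFLOP/s
GPU_RATE = 400e9   # 400 GFLOP/s
BUS_RATE = 8e9     # 8 GB/s for the copy to a discrete GPU

t_cpu = cpu_time(flops, CPU_RATE)
t_copy = offload_time(flops, GPU_RATE, matrix_bytes, BUS_RATE)
t_shared = offload_time(flops, GPU_RATE)  # shared memory: nothing to copy

print(f"CPU only:        {t_cpu * 1e6:8.1f} us")
print(f"GPU with copy:   {t_copy * 1e6:8.1f} us")
print(f"GPU shared mem:  {t_shared * 1e6:8.1f} us")
```

With these made-up numbers the copy alone costs more than just doing the multiply on the CPU, while the no-copy case comes out ahead of both. The crossover point will move with real hardware, but the shape of the argument is the same.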
You didn't read the word PS4 and "will"?
Why would a graphics card want to use virtual memory? Also, what motherboard takes GDDR5? Who the heck wrote this nonsense?
Why would a graphics card want to use virtual memory?
Shared physical memory avoids the cost of copying data to and from the GPU but without shared virtual memory the data will end up at different addresses on the CPU and GPU. This means that you cannot use pointers to link parts of the data together and must rely on indexes of some sort. This makes it harder to port existing code and data structures to use GPU computation.
Also, with shared physical memory you have to tell the device which memory you want to use (so that it can tell you which address to use). With shared virtual memory you can use any memory that is mapped into the CPU process and the memory system will automatically make it visible to the GPU.
In other words, it makes the programmers' life easier. How you measure this benefit is another question altogether!
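Here is a small sketch of the pointers-versus-indexes difference described above. Python object references stand in for pointers, and the `flatten` helper is a hypothetical stand-in for the conversion step you need when the two sides cannot agree on addresses.

```python
class Node:
    """Linked-list node that refers to its successor directly (a 'pointer')."""

    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node


def sum_linked(head):
    """Walk the list by following references, as CPU code normally would."""
    total = 0
    node = head
    while node is not None:
        total += node.value
        node = node.next
    return total


def flatten(head):
    """Convert the list to index form: parallel arrays, -1 marks the end."""
    values, next_index = [], []
    node = head
    while node is not None:
        values.append(node.value)
        # After the append, len(values) is the slot the successor will occupy.
        next_index.append(len(values) if node.next is not None else -1)
        node = node.next
    return values, next_index


def sum_indexed(values, next_index, start=0):
    """Walk the same list using array indexes instead of references."""
    total = 0
    i = start
    while i != -1:
        total += values[i]
        i = next_index[i]
    return total


head = Node(3, Node(5, Node(7)))
values, next_index = flatten(head)
print(sum_linked(head), sum_indexed(values, next_index))  # prints: 15 15
```

Without a shared virtual address space, something like `flatten` has to run before the data can be handed to the device, and every algorithm has to be rewritten in the `sum_indexed` style. With it, the `sum_linked` form could in principle be used on both sides unchanged.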
In my experience, the bottleneck for GPU and especially GPGPU work is not the amount of memory but memory bandwidth. A 256- to 512-bit bus is not adequate for existing apps. Before the amount of memory becomes important, manufacturers should move to at least a 2048-bit memory bus and also increase the number of registers per core severalfold.
I haven't seen this magical word in the presentation. Moreover, I do not see the CPU/GPU convergence that is so often talked about. It sounds more like marketing hype. The ecosystem could also be enriched with DSP or network-processor cores, all uniformly offering their resources to software, but I did not see that.
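For a sense of scale, theoretical peak bandwidth is just bus width times transfer rate. The sketch below works that out for a few bus widths; the 5 GT/s GDDR5 rate is an illustrative round number rather than the spec of any particular card, and real sustained bandwidth is lower than the peak.

```python
def peak_bandwidth_gb_s(bus_width_bits, transfers_per_s):
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfers_per_s / 1e9


# Dual-channel DDR3-1600: 128 bits wide at 1600 million transfers per second.
ddr3_dual = peak_bandwidth_gb_s(128, 1600e6)

# GDDR5 at an assumed 5 GT/s per pin, for several bus widths.
GDDR5_RATE = 5e9
for width in (256, 512, 2048):
    bw = peak_bandwidth_gb_s(width, GDDR5_RATE)
    print(f"{width:5d}-bit GDDR5 at 5 GT/s: {bw:7.1f} GB/s")

print(f"  128-bit DDR3-1600:       {ddr3_dual:7.1f} GB/s")
```

Going from a 256-bit to a 2048-bit bus at the same transfer rate is an eightfold increase in peak bandwidth, which is the kind of jump being asked for here.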
The question now will be how long it takes before the drivers (OpenCL, DirectX, OpenGL) and even the OSes themselves can take advantage of the architecture. And once that's done, AMD would be wise to work directly with the big compilers (gcc, Clang, msvc, and Intel if they would do it) to allow developers to flip a bit so that the RTLs could use OpenCL for as many math calculations as possible. After all, this is just one step closer to performing nearly all floating point math on the "math-coprocessor" (aka GPU).
If you prefer one hardware over the other without seeing benchmarks, then you are someone that is usually referred to as a "fanboy". Have fun with that.
Watch for Penguins, they eat Apples and throw rocks at Windows.
So how do you do this in Java, Python? Did nobody ask? I did a search for "java huma uniform memory access" and this page came up first with nothing from java.com or oracle in sight.
OK, more searching says to use OpenCL and lots of stackoverflow questions... but they're not new... and OpenCL is not Java. What do you do for this new easier to program hardware? Is their definition of "supported" currently a bit optimistic? Supported by Java..... because Java lets you do lots of things not actually in Java and still work with a Java program, so pretty much anything is "supported" in Java. Is that the gist? I guess we need the tools to evolve before things really take hold.
And if you believe benchmarks are a good indicator of real performance, you're just fucking stupid.