AMD Details Next-Gen Kaveri APU's Shared Memory Architecture
crookedvulture writes "AMD has revealed more details about the unified memory architecture of its next-generation Kaveri APU. The chip's CPU and GPU components will have a shared address space and will also share both physical and virtual memory. GPU compute applications should be able to share data between the processor's CPU cores and graphics ALUs, and the caches on those components will be fully coherent. This so-called heterogeneous uniform memory access, or hUMA, supports configurations with either DDR3 or GDDR5 memory. It's also based entirely in hardware and should work with any operating system. Kaveri is due later this year and will also have updated Steamroller CPU cores and a GPU based on the current Graphics Core Next architecture."
bigwophh links to the Hot Hardware take on the story, and writes "AMD claims that programming for hUMA-enabled platforms should ease software development and potentially lower development costs as well. The technology is supported by mainstream programming languages like Python, C++, and Java, and should let developers code for a particular compute resource more simply, with no need for special APIs."
HTX slots and video cards any time soon?
will feature this technology. It will be interesting to see how it stacks up.
I'm not so sure how I feel about this whole Linux advocacy thing you're trying to promote. But spam, now there's an idea I can get behind! Take my money!
I went to eat some animal crackers and the box said, "Do not eat if seal is broken." I opened the box and sure enough..
I'm curious how long it will be before these optimizations are found in the compilers themselves.
As usual, AMD is leaving out some key information. What will be the TDP of such chips? I've always rooted for AMD and all my systems were built with them. You can't beat an Ivy Bridge chip for performance per watt, though. Against the i7-3770K, AMD doesn't offer anything compelling. I like the idea that they're using the GCN architecture to assist with processing, but have they improved the lithography or power consumption? Intel's Haswell chips come out soon and those are even better. Power is key in the mobile space, which is where a lot of chips are going. -Joe
Does this make it easier or harder to write malware? Serious question.
This should really help round-trip times through the GPU. With most existing setups, rendering to a texture and getting the results back on the CPU side is quite expensive, but this should help a lot. It should also work great for procedurally editing, generating, or swapping the geometry you are rendering. Getting all those high-poly LODs onto the GPU will no longer be an issue with systems like this.
Interestingly enough, this is somewhat similar to what Intel has now for its integrated graphics, except it looks like the AMD GPU has access to the full address space and cache system, which Intel's does not. Also, it's not an Intel GPU, so it's likely better in other ways too, but I shouldn't need to point that out.
Intel's Haswell is moving in the opposite direction, working to get some dedicated memory for the GPU, which is closer to the traditional GPU approach. It's nice to see companies exploring new areas; hopefully we will get some great hardware out of it, ideally with no broken drivers.
One question they never seem to answer is why bother unifying the memory architecture at all? CPU and GPU memory architectures have always been different for the same reasons that CPUs and GPUs themselves are different: one is designed for fast execution of serial instructions with corresponding small, random reads and writes to memory, and the other is designed for fast execution of parallel instructions with corresponding contiguous reads and writes that are much larger in size. It seems like you're just going to end up compromising if you try to shoehorn one onto the other.
The APU graphics kick the shit out of Intel's, and now you don't even need a bus between main memory and video memory. Think about it.
They talk about passing pointers back and forth as though the GPU and CPU effectively share an MMU. The problem is, GPUs and CPUs don't work the same way. GPUs need to access shared resources that are per-system, whereas CPUs need to limit access to resources on a per-process basis. It would be devastating if a GPU could, for example, allow an arbitrary user-space process to overwrite parts of the kernel and inject virus code that runs with greater-than-root privilege. It would similarly be devastating if some arbitrary process could, for example, read the private RAM that backs your keychain or other security-related processes.
I'm assuming that they're doing something sane like having a separate set of RWX bits on each page table entry to control what the GPU's rights are for that page, so that the GPU would only be allowed to read specifically flagged main-memory pages, but these fuzzy marketing briefs provide just enough information to be terrifying.
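A minimal sketch of the scheme guessed at above, assuming a hypothetical page-table entry that carries a second set of permission bits for the GPU alongside the CPU's. The bit values, `PageTableEntry`, and `check_access` are invented for illustration; this is not AMD's actual hUMA page-table format, which the briefs don't describe.

```python
from dataclasses import dataclass

# Hypothetical permission bits; real CPU and GPU page-table layouts differ.
READ, WRITE, EXEC = 0b001, 0b010, 0b100


@dataclass
class PageTableEntry:
    frame: int      # physical frame number backing this virtual page
    cpu_perms: int  # rights for code running on the CPU cores
    gpu_perms: int  # separate rights for the GPU (0 = page invisible to the GPU)


def check_access(pte: PageTableEntry, agent: str, wanted: int) -> bool:
    """Return True if `agent` ('cpu' or 'gpu') holds every permission bit in `wanted`."""
    if agent not in ("cpu", "gpu"):
        raise ValueError(f"unknown agent {agent!r}")
    perms = pte.cpu_perms if agent == "cpu" else pte.gpu_perms
    return (perms & wanted) == wanted


# A buffer the process explicitly shares with the GPU.
shared = PageTableEntry(frame=0x1234, cpu_perms=READ | WRITE, gpu_perms=READ | WRITE)
# A page backing keychain secrets: usable from the CPU, never exposed to the GPU.
secret = PageTableEntry(frame=0x5678, cpu_perms=READ | WRITE, gpu_perms=0)

print(check_access(shared, "gpu", READ | WRITE))  # True
print(check_access(secret, "gpu", READ))          # False
```

The point of keeping the GPU bits separate is that sharing becomes opt-in per page: a process can hand the GPU one buffer without exposing the rest of its address space, let alone the kernel's.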
Check out my sci-fi/humor trilogy at PatriotsBooks.
...heterogeneous uniform ...
Now there's an oxymoron!
Apparently not too many Finnish speakers here yet. Kaveri => partner/pal/mate, APU => help.
HTH,
ac
If you want the fastest thing on the market, buy Intel. For the majority who want the best deal or most bang for their buck, AMD is the best buy.
When someone asks me about buying AMD or Intel, the general summary I give them is that AMD's built-in GPU handily beats Intel's built-in GPU, but Intel's CPU beats AMD's CPU. If graphics are a big concern, they should get a cheap discrete card as one under $100 will be good for most games. Thus AMD's advantage is negated. Also, both companies offer more CPU processing power than most consumers can use anyway.
Well, there's spam egg sausage and spam, that's not got much spam in it.
With the GPU next to the CPU, the latency between them is reduced, which is awesome for OpenCL applications. Imagine you wanted to work a Markov model into your AI, you needed to do a large number of matrix calculations to get it to run properly, and you want it in real time; I think this might solve that problem. I'm imagining game AI improving with adoption of this style of processor. Anyone see this differently?
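To make that workload concrete: advancing a Markov model one step is a vector-matrix product of the current state distribution with the transition matrix, and running it for many agents or many steps per frame multiplies the cost. The sketch below is plain Python on the CPU with made-up states and probabilities; it only shows the arithmetic an OpenCL kernel would take over, not any AMD API.

```python
def markov_step(dist, transition):
    """One step: new[j] = sum_i dist[i] * transition[i][j] (each row of `transition` sums to 1)."""
    n = len(transition)
    return [sum(dist[i] * transition[i][j] for i in range(n)) for j in range(n)]


# Hypothetical 3-state game AI: patrol, chase, flee.
T = [
    [0.8, 0.15, 0.05],
    [0.2, 0.7, 0.1],
    [0.3, 0.1, 0.6],
]

state = [1.0, 0.0, 0.0]  # start out patrolling
for _ in range(10):
    state = markov_step(state, T)
print([round(p, 3) for p in state])
```

Each step costs about n² multiply-adds for n states, so thousands of agents with larger state spaces quickly add up to the kind of batched matrix work that suits a GPU.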
Eat sleep die
That's why Intel's HD4000 is faster than AMD's HD 7660D in several OpenCL benchmarks. http://semiaccurate.com/2013/04/29/a-look-at-intels-opencl-performance/
And it didn't get 'shelved', it got turned into a Tesla-Alike, since while it was great for GPGPU loads, it actually sucked as a replacement for the then-current generation AMD/Nvidia GPUs and by the time it was released to the public it would've been an i740/i752 disaster all over again.
In low-cost systems the CPU and GPU are combined on a single chip with a single (slow) memory controller. Given that constraint, AMD is trying to at least wring as much efficiency as they can from that single cheap chip. I salute them for trying to give customers more for their money, but let's admit that this hUMA thing is not about breaking performance records.
"Kaveri" is actually the name of a major river in Karnataka, a state in India. AMD also has "Kabini" chip, and "Kabini" is also a river in Karnataka, India :)
I think AMD overrates heterogeneous computing. The assumption is that all applications can take advantage of GPGPU. This is simply not true. Only certain types of applications are suitable, such as multimedia and simulation, where it's very obvious which parts of the code can be parallelised.
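This is essentially Amdahl's law: if a fraction p of a program's run time can be parallelised and that part is sped up by a factor s, the overall speedup is 1 / ((1 - p) + p / s). The 50x factor below is an illustrative assumption, not a measurement of any AMD part.

```python
def amdahl_speedup(p: float, s: float) -> float:
    """Overall speedup when a fraction p of the run time is accelerated by a factor s."""
    if not 0.0 <= p <= 1.0 or s <= 0:
        raise ValueError("need 0 <= p <= 1 and s > 0")
    return 1.0 / ((1.0 - p) + p / s)


# Even a GPU that runs the parallel part 50x faster helps little if most of the code is serial.
for p in (0.1, 0.5, 0.9, 0.99):
    print(f"p={p:.2f}: {amdahl_speedup(p, 50):.2f}x")
```

The serial fraction dominates: with p = 0.5 the whole program gets less than 2x faster no matter how fast the GPU is, which is why the payoff concentrates in workloads like the multimedia and simulation examples where nearly all the time is spent in parallelisable code.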
With GDDR5 memory this could be very interesting.
Hi Charlie!
I'd still prefer an i3 and an entry-level dedicated video card.
Oh hello there. I must disappoint you: I'm not Charlie, just one of SemiAccurate's readers. Anyway, a few days ago I was rather surprised that even though the HD4000 has half the raw performance of the HD 7660D, it still manages to beat the HD 7660D in quite a few benchmarks. The shared memory architecture is an obvious explanation for that...
"a cheap discrete card as one under $100 will be good for most games. "
HAHAHAHA
Maybe games from 5 years ago; otherwise you're stuck with the lowest possible settings in any "modern" game. Ever used one of those sub-$100 graphics cards? I did, and all I could do was play older games, such as Unreal Tournament 2004 or StarCraft 1, and play back HD videos. Other than that, I couldn't play games like BioShock, SC2, or any MMO such as WoW or Guild Wars at any decent settings...
OK, so the SGI O2's UMA has now been reinvented for a new generation, just with more words tacked on....
Does this mean that you can pass a pointer to a buffer object from a GPU process to a CPU process, manipulate it on the CPU and pass the pointer back to the GPU to continue processing there?
Is to use it, install it in as many places as you can (for friends and family) and work out any problems or questions they may have. Even if they don't stick with it, the experience will be useful in general and will help shape and grow Linux. You can start them off slowly by recommending some OSS apps where they may be useful, such as LibreOffice, VLC, Firefox, Chromium, Inkscape, Gimp, Pidgin, Thunderbird, etc. Many of them are probably already running a couple of those apps. Eventually they can switch over painlessly, or at least benefit from OSS in general.
I've never been to your site or heard of it, and I still come across many advocates. I don't think pouring resources into such an insignificant site will benefit Linux in general. The hard core are already doing their job, the supporters are fine with Google and official forums for their distro, and the users are doing the best thing they can, using the product.
So where exactly does linuxadvocates.com come in again? Seems useless.
I'm interested to see what the software model for this will be. Sure, they could use OpenCL, but it seems like a lot of the pain in using OpenCL derives from the underlying memory architecture. With a shared virtual address space and fully coherent caches, all in hardware, it should be possible to have a much simpler software model than OpenCL. I guess it doesn't really matter what the software model is, though, since now that everything is in main memory, GPU functions can be called just like regular functions and the caller doesn't need to care how they are implemented. E.g., it should be possible to have a GPU BLAS library that operates on main-memory pointers, where before, the cost of copying a matrix to the GPU and back for a single operation wouldn't have been worth it.
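A rough way to see the BLAS point is to compare the bytes that must cross the bus with the floating-point work done. A matrix-vector product (BLAS level 2, gemv) does about 2n² flops on about n² matrix elements, so copying the operands costs about as much as the work itself; a matrix-matrix product (level 3, gemm) does about 2n³ flops on 3n² elements. The CPU, GPU, and bus rates below are placeholder assumptions, not specifications of Kaveri or of any discrete card.

```python
BYTES_PER_DOUBLE = 8


def offload_time(flops, bytes_moved, gpu_flops_per_s, bus_bytes_per_s):
    """Seconds to copy operands over the bus and compute on the GPU (no copy/compute overlap)."""
    return bytes_moved / bus_bytes_per_s + flops / gpu_flops_per_s


def compare(n, cpu=50e9, gpu=500e9, bus=8e9):
    """Return {op: (cpu_seconds, copy_plus_gpu_seconds)} under the assumed rates."""
    gemv_flops = 2 * n * n
    gemv_bytes = (n * n + 2 * n) * BYTES_PER_DOUBLE  # A and x in, y out
    gemm_flops = 2 * n**3
    gemm_bytes = 3 * n * n * BYTES_PER_DOUBLE        # A and B in, C out
    return {
        "gemv": (gemv_flops / cpu, offload_time(gemv_flops, gemv_bytes, gpu, bus)),
        "gemm": (gemm_flops / cpu, offload_time(gemm_flops, gemm_bytes, gpu, bus)),
    }


for op, (cpu_s, gpu_s) in compare(4096).items():
    print(f"{op}: CPU {cpu_s * 1e3:.1f} ms, copy + GPU {gpu_s * 1e3:.1f} ms")
```

Under these assumptions the copy makes offloading a single gemv a loss while gemm still wins. With shared memory the copy term drops out because the GPU reads the operands in place; this simple model ignores that the GPU would then be limited by main-memory bandwidth, which matters most for memory-bound level-2 operations.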
Ok. Noted. Either will do fine CPU-wise.
Ah. Great. So AMD is the better buy then.
Not only that, but it will save ~$100 on the CPU and ~$50 more on the motherboard. That's GREAT advice.
But no.. Then we hear this;
Ummm.. First you made a good case for AMD, and now you're saying they should pick Intel anyway, and not only that, they should cough up an extra $100 on top of the ~$150 extra they already need to cough up, just to negate AMD's advantage. WTF? Why not just pick AMD in the first place then?
AMD beats Intel on the price point however.
And that isn't even counting that with Intel you'd also need to buy an extra $100 card.
If you *need* top notch performance, go Intel. Otherwise AMD will be lighter on your wallet and do the same job very well.
You didn't read the words "PS4" and "will"?
No. I'm saying if the user intends to get a discrete GPU there isn't an advantage to AMD and a slight advantage to Intel. But most consumers don't do anything that would see a difference anyways. Either works.
Radeon 5670 with 512 MB onboard (it cost $90 from Newegg when I bought it) plays GW, SC and all the other games I've thrown at it quite handily. It will have problems if a game is heavily tesserected, but that's the only time it's a problem, and I run 1900x1080 (my monitor's native resolution). The funniest thing is that the new Radeon drivers support the damn thing, while my 7300GT is no longer supported by either Nvidia or Linux, even with the god damn nouveau and nv drivers. The reason I still have the old GeForce 7300GT: it's fanless, so I don't have to worry about it dying from overheating.
I've said it before and I'll say it again: what AMD is doing is pushing the APU as the new FPU. That's right. Once they get things completely revamped, you're going to be looking at a CPU that outperforms Intel's best by quite a bit in the next decade.
Mod me up/Mod me down: I won't frown as I've no crown
Why would a graphics card want to use virtual memory? Also, what motherboard takes GDDR5? Who the heck wrote this nonsense?
You can get a Geforce GTX 650 for under $100 these days. That will handle pretty much any game at maximum or high settings.
I think AMD's target for this architecture is a typical Walmart shopper (lower price point, higher sales volume) looking to buy a laptop, so add-on video cards are out of the question. The first two questions this type of shopper will ask are "how much?" and "which one is better?"
AMD still has the advantage of the CPU and mobo costing much less than the Intel system, even if you are going with a dedicated GPU. With a 970-based mobo, a cheap Phenom II or Vishera FX-4 series CPU and any $100+ GPU, you are still coming in around $100 under the equivalent Intel system.
It's time to stop thinking with your e-peen and start making better use of your money.
Why would a graphics card want to use virtual memory?
Shared physical memory avoids the cost of copying data to and from the GPU but without shared virtual memory the data will end up at different addresses on the CPU and GPU. This means that you cannot use pointers to link parts of the data together and must rely on indexes of some sort. This makes it harder to port existing code and data structures to use GPU computation.
Also, with shared physical memory you have to tell the device which memory you want to use (so that it can tell you which address to use). With shared virtual memory you can use any memory that is mapped into the CPU process and the memory system will automatically make it visible to the GPU.
In other words, it makes the programmers' life easier. How you measure this benefit is another question altogether!
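A toy illustration of the pointer point, assuming a made-up memory model in which the GPU sees a copy of the data at a different base address (separate address spaces). A linked list built with absolute addresses breaks after the copy, while one built with indexes still works; with shared virtual memory both sides use the same addresses, so the pointer version works unchanged. `Memory`, `build_list`, `walk`, and the base addresses are all hypothetical.

```python
class Memory:
    """Toy address space: maps absolute addresses to (value, next_link) cells."""

    def __init__(self, base):
        self.base = base
        self.cells = {}

    def copy_from(self, other):
        # Same data placed at this space's base address, like a buffer copied to a GPU.
        for addr, cell in other.cells.items():
            self.cells[addr - other.base + self.base] = cell


def build_list(mem, values, use_pointers):
    for i, v in enumerate(values):
        last = i == len(values) - 1
        if use_pointers:
            nxt = None if last else mem.base + i + 1  # absolute address of the next node
        else:
            nxt = None if last else i + 1             # index relative to the buffer start
        mem.cells[mem.base + i] = (v, nxt)


def walk(mem, use_pointers):
    out, link = [], (mem.base if use_pointers else 0)
    while link is not None:
        addr = link if use_pointers else mem.base + link
        if addr not in mem.cells:
            raise KeyError(f"dangling link {link:#x}")
        value, link = mem.cells[addr]
        out.append(value)
    return out


cpu, gpu = Memory(base=0x1000), Memory(base=0x8000)
build_list(cpu, [10, 20, 30], use_pointers=True)
gpu.copy_from(cpu)
# walk(gpu, True) raises KeyError: the stored links still point into CPU space.

cpu_idx, gpu_idx = Memory(base=0x1000), Memory(base=0x8000)
build_list(cpu_idx, [10, 20, 30], use_pointers=False)
gpu_idx.copy_from(cpu_idx)
print(walk(gpu_idx, False))  # [10, 20, 30]
```

Rewriting every pointer as an index is exactly the porting work described above; giving the GPU the same base address as the CPU (as `Memory(base=0x1000)` would) is what makes it unnecessary.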
In my experience, the bottleneck for GPU and especially GPGPU work is not the amount of memory but memory bandwidth. A 256- to 512-bit bus is not adequate for existing apps. Before the amount of memory becomes important, manufacturers should move to at least a 2048-bit memory bus and also increase the number of registers per core several times.
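For scale, peak memory bandwidth is roughly the bus width in bytes times the effective transfer rate. The 6 GT/s figure is an assumed, typical GDDR5 data rate used only to compare bus widths; the 128-bit line assumes dual-channel (2 x 64-bit) DDR3-1866, the kind of memory an APU shares between CPU and GPU.

```python
def peak_bandwidth_gb_s(bus_bits: int, transfers_per_s: float) -> float:
    """Peak bandwidth in GB/s: (bus width in bytes) x (transfers per second)."""
    return bus_bits / 8 * transfers_per_s / 1e9


print(f"  128-bit DDR3-1866: {peak_bandwidth_gb_s(128, 1.866e9):7.1f} GB/s")
for bits in (256, 512, 2048):
    print(f"{bits:5d}-bit @ 6 GT/s: {peak_bandwidth_gb_s(bits, 6e9):7.1f} GB/s")
```

Peak figures only; sustained bandwidth is lower and depends on access patterns.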
I haven't seen this magical word in the presentation. Moreover, I do not see the often-talked-about CPU/GPU convergence. It sounds more like marketing hype. The ecosystem could also be enriched with DSP or network processor cores, all uniformly offering their resources to software, but I did not see that either.
I think AMD's target for this architecture is a typical Walmart shopper
Partly that, and partly it's way more interesting. The unified memory trades performance for flexibility (as always), but puts it in a very interesting space: less performant than a discrete GPU with a crazy memory architecture, but it puts tons more FPU grunt under the flexible memory subsystem of easy-to-use CPUs.
It will make acceleration more applicable to a much wider range of tasks at the cost of being slower on some.
Due to the close coupling, on the right codes this thing ought to absolutely hammer even the top-end i7s. It should also be able to handily beat discrete GPUs on tasks where the CPU-GPU-CPU latency is just too high, or where GPUs just don't have enough memory.
Of course, on single-threaded tasks even the i5 and i3 will probably beat it, though AMD has been slowly closing the gap and this will improve the situation slightly.
I can see this being personally useful to me. The thing is that discrete GPUs are a bit of a major faff for too many tasks and yield too little benefit.
Honestly, due to the enhanced opportunities for acceleration, it's probably a waste to use it for graphics. May as well offload that onto some dedicated hardware. And the cycle of reinvention begins again.
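One way to frame the latency argument, under assumed numbers: offloading a job of W operations pays when the fixed round-trip overhead plus W / gpu_rate is less than W / cpu_rate, i.e. when W exceeds overhead / (1/cpu_rate - 1/gpu_rate). Cutting the overhead, which is what close CPU/GPU coupling is supposed to do, lowers the smallest job worth offloading. The overheads and throughputs are placeholders, not measurements of any product.

```python
def break_even_work(overhead_s: float, cpu_rate: float, gpu_rate: float) -> float:
    """Smallest work W (ops) with overhead_s + W / gpu_rate <= W / cpu_rate."""
    if gpu_rate <= cpu_rate:
        return float("inf")  # the GPU never catches up, so offloading never pays
    return overhead_s / (1.0 / cpu_rate - 1.0 / gpu_rate)


CPU, GPU = 50e9, 500e9  # assumed ops per second
for label, overhead in (("discrete card, assumed 100 us overhead", 100e-6),
                        ("shared-memory APU, assumed 5 us overhead", 5e-6)):
    print(f"{label}: offload pays above ~{break_even_work(overhead, CPU, GPU):.3g} ops")
```

Because the break-even point scales linearly with the overhead, a 20x cheaper round trip makes jobs 20x smaller worth sending to the GPU, which is the "wider range of tasks" argument in numbers.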
SJW n. One who posts facts.
tesserected isn't a word, but it's such a good word I'm going to go away and develop the technology, saving you any spelling embarrassment.
Yours, anon.
The question now will be how long it takes before the drivers (OpenCL, DirectX, OpenGL) and even the OSes themselves can take advantage of the architecture. And once that's done, AMD would be wise to work directly with the big compilers (gcc, Clang, msvc, and Intel if they would do it) to allow developers to flip a bit so that the RTLs could use OpenCL for as many math calculations as possible. After all, this is just one step closer to performing nearly all floating point math on the "math-coprocessor" (aka GPU).
Not trying to pick a fight here, but I don't think this computes unless you change your mind about the importance of the CPU's computational power, or take some other - not yet mentioned - factor(*) into consideration.
Eg: If the user intends to get a discrete GPU, as you say, s/he will have approx $150 more to spend on the GPU if s/he picks the AMD solution. A $250 GPU vs. a $100 GPU is a pretty significant difference. Thus if graphics matter, the user should pick the AMD solution.
(*) of which there is possibly a boatload to consider. Socket longevity, thermal design power, ability to build a quiet system, ability to use ECC memory, etc. Not only price, but also many 'features' favor AMD since AMD tends to enable ECC, AMD-V and such in consumer CPUs, whereas you have to step up to Xeons to get that from Intel. However, some properties such as Computational power per Watt tend to favor Intel in a significant way. Where I think we agree, is that with Intel you can get pretty much everything you can get from AMD, provided you're willing to spend the money (Eg. step up to a Xeon CPU, add a discrete graphics card).
If you prefer one hardware over the other without seeing benchmarks, then you are someone that is usually referred to as a "fanboy". Have fun with that.
Watch for Penguins, they eat Apples and throw rocks at Windows.
So how do you do this in Java, Python? Did nobody ask? I did a search for "java huma uniform memory access" and this page came up first with nothing from java.com or oracle in sight.
OK, more searching says to use OpenCL, and turns up lots of Stack Overflow questions... but they're not new... and OpenCL is not Java. What do you do for this new, easier-to-program hardware? Is their definition of "supported" currently a bit optimistic? Supported by Java..... because Java lets you do lots of things not actually in Java and still work with a Java program, so pretty much anything is "supported" in Java. Is that the gist? I guess we need the tools to evolve before things really take hold.
simple, fast homepage with your links: http://www.ngumbi.com/
And if you believe benchmarks are a good indicator of real performance, you're just fucking stupid.
Either will do fine CPU-wise
Except the Intel CPU will complete its given tasks two to four times faster than AMD's closest equivalent CPU.
So AMD is the better buy then
Nope. Benchmarks show that Intel HD 4000 graphics are easily on par with any integrated AMD GPU.
it will save ~$100 on the CPU
That's easy to say when you don't even specify which specific products you are referencing. I can easily find an Intel CPU that outperforms an AMD CPU in the same price range.
~$50 more on the motherboard
Considering you can buy an Intel brand LGA 1155 motherboard for $50 or less, I'd like to know where I can get AMD motherboards for free or where AMD will pay me to take one.