AMD Details Next-Gen Kaveri APU's Shared Memory Architecture

HTX slots and video cards any time soon? by Anonymous Coward · 2013-04-30 05:34 · Score: 0

HTX slots and video cards any time soon?

The PS4 by MXPS · 2013-04-30 05:38 · Score: 4, Interesting

will feature this technology. It will be interesting to see how it stacks up.

Re:The PS4 by Anonymous Coward · 2013-04-30 06:43 · Score: 0

will feature this technology. It will be interesting to see how it stacks up.
The PS3 already uses shared address space. And so do most laptop computers.
Or did you mean this specific technology? No, the PS4 will not, it will be using the AMD "Jaguar" CPU line, and a Radeon-line GPU.
Re:The PS4 by Wesley+Felter · 2013-04-30 06:54 · Score: 2

One of the problems with the PS3 is that it didn't have shared memory. Maybe you're thinking of the 360.
Re:The PS4 by triffid_98 · 2013-04-30 08:09 · Score: 1

And unlike the Atari Jaguar, it will actually be a 64 bit system. *rimshot*
Re:The PS4 by thoper · 2013-04-30 08:22 · Score: 2

In effect, the PS4's memory is even more integrated. See: here and here

Spam Advocates by TheNinjaroach · 2013-04-30 05:40 · Score: 2

I'm not so sure how I feel about this whole Linux advocacy thing you're trying to promote. But spam, now there's an idea I can get behind! Take my money!

--
I went to eat some animal crackers and the box said, "Do not eat if seal is broken." I opened the box and sure enough..

Interesting by Malenx · 2013-04-30 05:41 · Score: 1

I'm curious how long it will be before these optimizations are found in the compilers themselves.

Where's the fine print? by madwheel · 2013-04-30 05:41 · Score: 1

As usual, AMD is leaving out some key information. What will be the TDP of such chips? I've always rooted for AMD and all my systems were built with them. You can't beat an Ivy Bridge chip for performance per watt though. Against the i7-3770K, AMD doesn't offer anything compelling. I like the idea that they're using the GCN architecture to assist with processing, but have they done anything about the lithography or power consumption? Intel's Haswell chips come out soon and those are even better. Power is key in the mobile space where a lot of chips are going. -Joe

Re:Where's the fine print? by K.+S.+Kyosuke · 2013-04-30 05:45 · Score: 1

Power is key in the mobile space where a lot of chips are going. -Joe
I hope that your i7-3770K is serving you well in your cell phone.

--
Ezekiel 23:20
Re:Where's the fine print? by madwheel · 2013-04-30 05:49 · Score: 1

I guess I need to provide more information to help get my point across. Intel has 4th gen chips that run on a 7 watt TDP. The performance per watt is pretty remarkable. Intel's i7-3770K has a 77 watt TDP. AMD's FX-8350 has a 125 watt TDP, gets spanked by Intel in most benchmarks, and doesn't have any graphics chip on die to drive a monitor. Translating that down, Intel has an advantage. I would love to be proven wrong though.
Re:Where's the fine print? by serviscope_minor · 2013-04-30 05:56 · Score: 5, Insightful

You can't beat an Ivy Bridge chip for performance per watt though.
Ehugh. Yes no kind of.
For "general" workloads IVB chips are the best in performance per Watt.
In some specific workloads, the high core count Piledrivers beat IVB, but that's rare. For almost all x86 work IVB wins.
For highly parallel churny work that GPUs excel at, they beat all x86 processors by a very wide margin. This is not surprising. They replace all the expensive silicon that makes general purpose processors go fast and put in MOAR ALUs. So much like the long line of accelerators, coprocessors, DSPs and so on, they make certain kinds of work go very fast and are useless at others.
But for quite a few classes of work, GPUs trounce IVB at performance per Watt.
The trouble is that GPUs suck. They have teeny amounts of local memory and a slow interconnect to main memory. They also suck at certain things and batting data between the fast (for some things) GPU and fast (for other things) CPU is a real drag because of the latency. This limits the applicability of GPUs.
But with the new architecture, which I (and presumably many others) hoped was AMD's long-term goal, a number of these problems disappear, since the link is very low latency and the memory is fully shared.
This means the very superior performance per Watt (for some things) GPU can be used for a wider range of tasks.
So yes, this should do a lot for power consumption for a number of tasks.

--
SJW n. One who posts facts.
Re:Where's the fine print? by K.+S.+Kyosuke · 2013-04-30 06:06 · Score: 1

Intel's i7-3770K has a 77 watt TDP. AMD's FX-8350 has a 125 watt TDP, gets spanked by Intel in most benchmarks, and doesn't have any graphics chip on die to drive a monitor.
You know, that might be exactly the problem here. This is something completely different. If the GPU is any good, chances are that a combination of a high-end-GPU equipped APU with a lot of GDDR5 memory would make many HPC people much happier than Haswell ever could. In some application areas, it's all about bandwidth. Today, if you're trying to do HPC on, say, a 20GB dataset in memory, on a single machine, you're screwed.

--
Ezekiel 23:20
Re:Where's the fine print? by serviscope_minor · 2013-04-30 06:09 · Score: 1

Translating that down, Intel has an advantage.
i7 3770k: £250
FX 8350: £160
Yes. Advantage Intel. Also take into account that quality motherboards are usually cheaper for AMD and that one can also upgrade more easily.
The more apt comparison is to some i5. At that point, the 8350 beats it in a large number of benchmarks (and does actually beat the much more expensive i7). Basically in multi threaded code the FX8350 wins. In single threaded code the i5 wins.

--
SJW n. One who posts facts.
Re:Where's the fine print? by parlancex · 2013-04-30 06:12 · Score: 1

The trouble is that GPUs suck. They have teeny amounts of local memory and a slow interconnect to main memory. They also suck at certain things and batting data between the fast (for some things) GPU and fast (for other things) CPU is a real drag because of the latency. This limits the applicability of GPUs.
The "slow interconnect" you're talking about to main memory, PCI Express v3.0 has an effective bandwidth of 32GB/s which actually exceeds the best main memory bandwidth you'd get out of an Ivy Bridge CPU with very fast memory, so no, that's not a bottleneck for bandwidth, though yes, there is some latency there.

I don't know why everyone seems to forget that GPUs aren't just fast because they have a lot of ALUs (TFA included), they are fast because of the highly specialized GDDR memory they are attached to. One would be completely useless without the other. Even the lowly GTX 285 from 4 years ago was pushing 160GB/s for memory bandwidth.
Re:Where's the fine print? by Anonymous Coward · 2013-04-30 06:14 · Score: 0

This would also require greater use of GPGPU coding (e.g., OpenCL, CUDA) in order to take full advantage of this architecture, or a thread scheduler approach (e.g., Mac OS X Grand Central Dispatch) where the scheduler would best know which hardware to send each process to. It also gives a lot more credence to AMD HyperTransport. One of the biggest issues I have seen with the Piledriver cores is total memory bandwidth. Maybe AMD might consider opening up the HT bus a little more...
Re:Where's the fine print? by madwheel · 2013-04-30 06:19 · Score: 1

I do agree with you. I'm simply referring to the simple tasks the general public does. Web surfing, iTunes, emails, etc. These are not heavily threaded tasks. Granted the difference is marginal because any modern processor can handle this with ease. Sure in highly threaded workloads the AMDs offer a better bang for your buck, but the general public does not do this on a day to day basis.
Re:Where's the fine print? by Anonymous Coward · 2013-04-30 06:25 · Score: 0

As usual, AMD is leaving out some key information. What will be the TDP of such chips?
Does it really matter? As long as you can get a cooler solution for it, I doubt too many really care about TDP as opposed to power/money.
While I would like to buy AMD, I do have an i7-3770K for my main system now due to the sheer generic performance it gives. And I really think that AMD's plan for heterogeneous computing is a great idea. Especially with cache coherency, which I've always wondered about, this system will be a beast if they can match it with a decent GPU.
AMD really dropped the ball after the Athlon 64, which, as I understand it, people blame on AMD at one point kicking out quite a chunk of their experienced hardware engineers.
But I think the heterogeneous memory access puts them in a good place against Intel. Especially with cache coherency, the performance for swapping between CPU and GPU will be quite impressive. Right now the latencies with PCIe are pretty awful: unless you post a large chunk of processing, it'll be slower than the CPU or only just as fast. But I have to say that the advantage of this architecture will be more evident on consoles, like ... the PS4. (Assuming it actually has cache coherency.)
Re:Where's the fine print? by bored · 2013-04-30 06:52 · Score: 3, Informative

The "slow interconnect" you're talking about to main memory, PCI Express v3.0 has an effective bandwidth of 32GB/s which actually exceeds the best main memory bandwidth you'd get out of an Ivy Bridge CPU with very fast memory, so no, that's not a bottleneck for bandwidth, though yes, there is some latency there.
It's both. For my application, the GPU is roughly 3x-5x as fast as a high-end CPU. This is fairly common on a lot of GPGPU workloads. The GPU provides a decent but not huge performance advantage.
But we don't use the GPU! Why not? Because copying the data over the PCIe link, waiting for the GPU to complete the task, and then copying the data back over the PCIe link yields a net performance loss over just doing it on the CPU.
In theory, a GPU sharing the memory subsystem with the CPU avoids this copy latency. Nor does it preclude still having a parallel memory subsystem dedicated to local accesses on the GPU. That is the "nice" thing about OpenCL/CUDA: the programmer can control the memory subsystems at a very fine level.
Whether or not AMD's solution helps our application remains to be seen. Even if it doesn't, it's possible it helps some portion of the GPGPU community.
BTW:
In our situation it's a server system, so it has more memory bandwidth than your average desktop. On the other hand, I've never seen a GPU pull more than a small percentage of the memory bandwidth over the PCIe links doing copies. Nvidia ships a raw copy benchmark with the CUDA SDK; try it on your machines. The results (theoretical vs. reality) might surprise you.
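The copy trade-off described in this comment can be put into a rough back-of-the-envelope model. This is only a sketch: the function name is made up, and every number below is an illustrative assumption, not a measurement from the poster's system or any other.

```python
def offload_speedup(cpu_seconds, gpu_kernel_speedup, bytes_moved, link_bytes_per_s):
    """Overall speedup from offloading a job, counting the copies.

    Hypothetical model: GPU total time = kernel time + time to move the data
    over the link. Per-transfer latency and any overlap of copy and compute
    are deliberately ignored.
    """
    gpu_kernel_seconds = cpu_seconds / gpu_kernel_speedup
    copy_seconds = bytes_moved / link_bytes_per_s
    return cpu_seconds / (gpu_kernel_seconds + copy_seconds)

# Illustrative numbers only: a job that takes 10 ms on the CPU, a GPU kernel
# that is 4x faster, and 64 MB copied in total (there and back) at an
# achieved 6 GB/s.
with_copy = offload_speedup(0.010, 4.0, 64e6, 6e9)

# Shared memory modelled as "no copy at all", which is an idealisation.
without_copy = offload_speedup(0.010, 4.0, 0, 6e9)

print(f"with copies: {with_copy:.2f}x, without copies: {without_copy:.2f}x")
```

With these made-up inputs the copies alone take longer than the whole CPU run, so the offload comes out below 1x even though the kernel itself is 4x faster. A value above 1x only appears once the copy time drops well under the time the kernel saves.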
Re:Where's the fine print? by Anonymous Coward · 2013-04-30 07:20 · Score: 0

With Valve doing their Steam Box for the living room, and the PS4 using a very similar design, I think this will make a very interesting product that Intel can't match. If Valve adds more features to their Steam Box, like shopping on Amazon, eBay and other sites from your armchair, then with the shift from physical media to streamed and downloaded content I think a lot of households might have an entertainment/media PC in the living room. The PS4, Xbox 720 and Mac are all really just locked-down PCs with some customized hardware anyway, so most non-mobile gaming will be on one basic platform. If AMD and Valve can convince manufacturers to make boxes for the living room, I see good things ahead.
Sure, Intel's CPUs are faster, but with more things using GPUs and GPGPU I don't think this is a bad product.
It might also mean we start getting AMD-optimized games (more console ports... yay) that offer serious performance improvements.
Re:Where's the fine print? by Anonymous Coward · 2013-04-30 07:36 · Score: 0

Each tab in a web browser is a different thread. Each program (as you mentioned) is in a different thread. In terms of smoothness, AMD may be better if you have multiple programs running in the background.
I know that some people's usage is that they leave all their reasonably frequently used websites in a tab and leave it running. This may help them.
Re:Where's the fine print? by Kjella · 2013-04-30 07:40 · Score: 1

Assuming you're willing to write special software that'll only see benefit on AMD's APUs, not on Intel nor anything with discrete GPUs. I suppose it's different for the PS4 or Xbox 720, where you can assume that everyone who'll use the software will have it, but for most PC software the advantages would have to be very big indeed. If you need tons of shading power it's better to run on discrete GPUs. Even with unified memory, switching between shaders and cores isn't entirely free, so it might not do that much for general computing; you need the right kind of mix. I'm hoping, but can't help but feel that AMD is giving up a big market in pursuit of a small market.

--
Live today, because you never know what tomorrow brings
Re:Where's the fine print? by skids · 2013-04-30 07:43 · Score: 1, Interesting

Speaking as someone currently considering buying slightly behind the curve, I was all set to jump on an Intel-based fanless system because of the TDP figures. However, with the PowerVR versions of the Intel GPU c**k-blocking Linux graphics, and with AMD finally open-sourcing UVD, I'm now back to considering a Brazos. Fewer choices for fanless pre-built systems, though. May have to skip on the pay-a-younger-geek-because-I-dont-enjoy-playing-legos-anymore part.
So no, for some markets, Intel has not yet realized the advantage that their IC processes should technically give them, and to the point of TFA, if they do not combine that advantage with architectural improvements, there will be ways for AMD to stay in this market for some time to come.

--
Someone had to do it.
Re:Where's the fine print? by tibman · 2013-04-30 09:44 · Score: 1

You're welcome : ) http://4changboard.wikia.com/wiki/Falcon_Guide

--
http://soylentnews.org/~tibman
Re:Where's the fine print? by Luckyo · 2013-04-30 10:02 · Score: 1

In terms of APUs, they have Intel not just beat but utterly demolished. Intel has absolutely nothing on AMD when it comes to the combination of a slowish low-TDP CPU and a built-in GPU with the performance of a low-end discrete GPU.
And while they lack CPU power for the high end, wouldn't you want a discrete CPU with a discrete GPU in that segment in the first place?
Re:Where's the fine print? by Rockoon · 2013-04-30 10:26 · Score: 2

The "slow interconnect" you're talking about to main memory, PCI Express v3.0 has an effective bandwidth of 32GB/s
32GB/s doesn't sound like a lot when you divide it amongst the 400 stream processors that an upper end AMD APU has, and that's as favorable a light as I can shine on your inane bullshit. There is a reason that discrete graphics cards have their own memory, and it isn't because they have more stream processors (these days they do, but they didn't always). It's because PCI Express isn't anywhere near fast enough to feed any modern GPU.

Llano APUs have been witnessed pulling 500 GFLOPS. Does 32GB/s still sound like a lot? No, it sounds like shit. Clearly memory bandwidth is a big issue in this scene.
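To put rough numbers on that, here is a roofline-style estimate of how many FLOPs a kernel would have to perform per byte fetched before compute, rather than the memory path, becomes the limit. It is only a sketch: the function name is invented, and the bandwidth and GFLOPS figures are simply the ones quoted in this thread, not anything measured here. How much this matters in practice depends on how often the data actually has to cross the link.

```python
def flops_per_byte_to_saturate(peak_flops, bytes_per_second):
    """FLOPs a kernel must do per byte fetched before compute, not the
    memory path, becomes the limit (a roofline-style estimate)."""
    return peak_flops / bytes_per_second

PEAK_FLOPS = 500e9  # the 500 GFLOPS figure quoted above

# Bandwidth figures as quoted in this thread, not measured here.
paths = {
    "PCIe 3.0 (32 GB/s)": 32e9,
    "GDDR on an older card (160 GB/s)": 160e9,
}

for name, bandwidth in paths.items():
    intensity = flops_per_byte_to_saturate(PEAK_FLOPS, bandwidth)
    # A single-precision float is 4 bytes.
    print(f"{name}: {intensity:.1f} FLOPs per byte, "
          f"{intensity * 4:.1f} FLOPs per float loaded")
```

The slower the path, the more arithmetic each loaded value has to feed before the ALUs stop waiting on memory.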

--
"His name was James Damore."
Re:Where's the fine print? by parlancex · 2013-04-30 10:45 · Score: 1

Sigh. Here I go feeding the trolls.
I'm not sure what point you're trying to make here, since MY main point in the rest of this topic was that modern GPUs are mostly limited by memory bandwidth, which makes the development in TFA pretty pointless. You're right! 32GB/s isn't enough to make the most of the computing resources available on a modern GPU! That was my point. How exactly would the GPU accessing main memory directly help? The fastest system RAM currently available in consumer markets in the fastest possible configuration can barely reach 30GB/s. In order for GPUs to confer a computational advantage they need to be doing heavy lifting on GDDR RAM, which could deliver over 160GB/s on cards that are over 4 years old.
Re:Where's the fine print? by Anonymous Coward · 2013-04-30 11:24 · Score: 0

He has his own website now -- http://www.logicalincrements.com/
The guide is a bit more oriented toward machines built for gaming, but the lower-end builds should be fine for an HTPC.
Re:Where's the fine print? by cynyr · 2013-04-30 12:04 · Score: 1

Because you wouldn't need to transfer it between the CPU and GPU? You could just point the GPU at main system RAM and let it have at it.

--
All of the above was encrypted with a Quad ROT-13 method. Unauthorized decryption is in violation of the DMCA.
Re:Where's the fine print? by Anonymous Coward · 2013-04-30 12:35 · Score: 0

Nobody gave a shit about "TDP" until it was Intel's advantage. Now they won't shut up about it. They slam AMD's desktop chip for having 125W TDP, but nobody says tickety-boo about i7-3820's 130W TDP.
People just lap it up and regurgitate it without even digesting it.
Re:Where's the fine print? by fast+turtle · 2013-04-30 13:03 · Score: 1

The only way you can state that "each tab in the browser is another thread" is if you're not using Firefox. Even on 20.1, they still don't have that, and a single bad tab can and does take the entire browser down. Hell, even IE 9 finally got it straight with tabs. It doesn't handle too many tabs at once, but it runs each one in a separate thread, just like Chrome does. Opera does the same AFAIK (don't use it). You're right about background tasks and such though, as the multithread performance of an AMD CPU is far better than Intel's unless you're into the Xeons.

--
Mod me up/Mod me down: I wont frown as I've no crown
Re:Where's the fine print? by Anonymous Coward · 2013-04-30 15:03 · Score: 0

From what I read, the new APU will support GDDR5 in a unified RAM configuration. That would be better than DDR3/GDDR5 combo.
Re:Where's the fine print? by Anonymous Coward · 2013-04-30 15:33 · Score: 0

You could just go with a low-TDP E2 or A4 APU and add a good aftermarket cooler with a wide fin gap and get better performance and still be fanless. Scythe and Noctua make several coolers suited to this; they are the models paired with fans in the 120-140mm, 800-1200RPM range, like the Noctua NH-C14 or Scythe Kabuto or Ninja Rev.B. I've used them to make silent builds before. There are fanless power supplies as well as models that turn off the fan if the heat or draw is low enough.
http://silentpcreview.com/ is a great place to start, you'd be surprised just how fast you can go and still have no fans.
Re:Where's the fine print? by Anonymous Coward · 2013-04-30 15:59 · Score: 0

Since Ivy Bridge, Intel GPUs support GPGPU via OpenCL. The performance for this will undoubtedly go up with the Haswell GPUs.
The market is changing. The heaviest lifting most people do is in photo and video manipulation, which is far and away faster on the GPU than on the CPU. They want to browse the web, play some games, and edit down that 4 hour long video of their cat to the 5 seconds that will be YouTube gold.
Re:Where's the fine print? by ardor · 2013-04-30 19:04 · Score: 1

The sad thing is that the PowerVRs are actually pretty decent. The drivers (made by Intel) are to blame.

--
This sig does not contain any SCO code.
Re:Where's the fine print? by Anonymous Coward · 2013-05-01 04:47 · Score: 0

The foolish man says "my audience failed to understand me". The wise man says "I failed to understand my audience".
Re:Where's the fine print? by Anonymous Coward · 2013-05-02 11:05 · Score: 0

GPU FLOPS and bus bandwidth have very little to do with each other. You are clueless and talking out of your ass. Go to school and learn about computer science.

GPU malware? by Anonymous Coward · 2013-04-30 05:45 · Score: 0

Does this make it easier or harder to write malware? Serious question.

CPU - GPU - CPU latency by Anonymous Coward · 2013-04-30 05:51 · Score: 2, Interesting

This should really help round trip times through the GPU. With most existing setups, doing a render to texture and getting the results back CPU side is quite expensive, but this should help a lot. It should also work great for procedural editing/generating/swapping of geometry that you are rendering. Getting all those high poly LODs onto the GPU will no longer be an issue with systems like this.

Interestingly enough, this is somewhat similar to what Intel has now for their integrated graphics, except it looks like the AMD GPU has access to the full address space and cache system, which Intel does not do. Also, it's not an Intel GPU, so it's likely better in other ways too, but I shouldn't need to point that out.

Intel's Haswell is moving in the opposite direction, working to get some dedicated memory for the GPU, which is closer to the traditional GPU approach. It's nice to see companies exploring new areas; hopefully we will get some great hardware out of it, ideally with no broken drivers.

Re:CPU - GPU - CPU latency by Anonymous Coward · 2013-04-30 07:01 · Score: 0

Haswell is getting dedicated memory because the usual sticks of DDR-whatever are slow. Modern GPUs are completely hamstrung by memory bandwidth, and using main memory for video memory is the biggest reason integrated graphics always suck.
Haswell is only getting dedicated memory, and thus enhanced GPU performance, on certain SKUs. In all cases, those systems will have the CPU directly soldered onto the motherboard because the signaling requirements for high-speed GDDR5 are so tight that it's impossible to put the memory on separate modules (or sticks).
This new AMD product, in practice, will be the same. Soldered on, dedicated high speed memory.
I imagine both setups will have a two-tier memory setup, with larger amounts of main memory on traditional sticks. The dedicated GDDR5 memory will act as a sort of pseudo-L4 cache and graphics memory, I'd guess. I don't see how it would be easy to make it addressable by the OS without some sort of new memory management and scheduling scheme. (It would be annoying to have your program randomly be assigned fast or slow memory.)
Re:CPU - GPU - CPU latency by UnknownSoldier · 2013-04-30 07:43 · Score: 2

/Oblg.
http://www.dvhardware.net/news/nvidia_intel_insides_gpu_santa.jpg
or
http://media.bestofmicro.com/V/6/233106/original/feature_image09.jpg
Re:CPU - GPU - CPU latency by Anonymous Coward · 2013-04-30 10:27 · Score: 0

Intel's Haswell is moving in the opposite direction working to get some dedicated memory for the GPU, which is closer to the traditional GPU approach.
According to currently available information (including a presentation made by Intel at the recent Intel Developer Forum in Beijing), that's not true at all. Haswell models with an on-package eDRAM die will use it as a huge 128MB L4 cache common to GPU and CPU, rather than dedicated GPU memory.
(It makes a lot of sense that it's cache. 128MB isn't really enough to be a GPU's only memory, and is at the same time too much to be dedicated frame and Z-buffer storage ala the Xbox360's eDRAM.)

Why compromise? by parlancex · 2013-04-30 05:53 · Score: 1

One question they never seem to answer is why bother unifying the memory architecture at all? CPU and GPU memory architectures have always been different for the same reasons that CPUs and GPUs themselves are different; one is designed for fast execution of serial instructions with corresponding random smaller reads and writes to memory, and the other is designed for fast execution of parallel instructions with corresponding contiguous reads and writes that are much larger in size. It seems like you're just going to end up compromising if you try to shoehorn one onto the other.

Re:Why compromise? by Anonymous Coward · 2013-04-30 06:00 · Score: 0

One question they never seem to answer is why bother unifying the memory architecture at all? CPU and GPU memory architectures have always been different for the same reasons that CPUs and GPUs themselves are different; one is designed for fast execution of serial instructions with corresponding random smaller reads and writes to memory, and the other is designed for fast execution of parallel instructions with corresponding contiguous reads and writes that are much larger in size. It seems like you're just going to end up compromising if you try to shoehorn one onto the other.
I'd assume that this would allow instructions to pass from the CPU to the GPU without the step of looking up a memory register a 2nd time, which in computer time is actually a major bottleneck. * knows nothing about hardware architecture *
Re:Why compromise? by SenatorPerry · 2013-04-30 06:13 · Score: 5, Informative

In OpenCL you need to copy items from the system memory to the GPU's memory and then load the kernel on the GPU to start execution. Then you must copy the data back from the GPU's memory at the end after execution. AMD is saying that you can instead pass a pointer to the data in the main memory instead of actually making copies of the data.
This should reduce some of the memory shifting on the system and speed up OpenCL execution. It will also eliminate some of the memory constraints on OpenCL regarding what you can do on the GPU. On a larger scale it will open up some opportunities for optimizing work.
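As a loose analogy for the difference (this is plain Python, not OpenCL, and every function name below is hypothetical), compare copying a buffer to a "device", working on the copy and copying it back, with handing over a reference to the same buffer:

```python
import array

def run_with_copies(host_data, kernel):
    """Discrete-GPU style: copy in, compute on the copy, copy the result back."""
    device_data = array.array("f", host_data)  # stands in for the host -> device copy
    kernel(device_data)
    host_data[:] = device_data                 # stands in for the device -> host copy
    return 2 * len(host_data) * host_data.itemsize  # bytes moved by the two copies

def run_shared(host_data, kernel):
    """Shared-address-space style: pass a reference to the same buffer."""
    kernel(memoryview(host_data))              # no bytes are duplicated
    return 0

def double_in_place(buf):
    # Toy "kernel": doubles every element of whatever buffer it is handed.
    for i in range(len(buf)):
        buf[i] = buf[i] * 2.0

a = array.array("f", [1.0, 2.0, 3.0])
b = array.array("f", [1.0, 2.0, 3.0])
print(run_with_copies(a, double_in_place), list(a))
print(run_shared(b, double_in_place), list(b))
```

Both paths leave the same result in the host buffer; the only difference in this toy is how many bytes had to be duplicated to get there, which is the cost the pointer-passing approach is meant to remove.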
Re:Why compromise? by dgatwood · 2013-04-30 06:13 · Score: 1

I can see the benefit of being able to allocate a GPU/CPU-shared memory region in VRAM for fast passing of information to the GPU without a copy, but apart from making the above concept slightly cheaper to implement, the only benefit I could come up with for allowing the GPU access to main memory is making password theft easier. That and letting their driver developers write sloppier code that doesn't have to distinguish between two types of addresses....
The most hilarious part of this is that while they're doing this, the rest of the world seems to be moving in the opposite direction, towards having separate I/O and physical address spaces that are mapped with an MMU. But I digress.

--
Check out my sci-fi/humor trilogy at PatriotsBooks.
Re:Why compromise? by forkazoo · 2013-04-30 06:13 · Score: 4, Insightful

Because when you are doing stuff like OpenCL, dispatching from CPU space to GPU space has a huge overhead. The GPU may be 100x better at doing a problem than the CPU, but it takes so long to transfer data over to the GPU and set things up that it may still be faster to do it on the CPU. It's basically the same argument that led to the FPU being moved onto the same chip as the CPU a generation ago. There was a time when the FPU was a completely separate chip, and there were valid reasons why it ought to be. But moving it on chip was ultimately a huge performance win. The idea behind AMD's strategy is basically to move the GPU so close to the CPU that you use it as freely as we currently use the FPU.
Re:Why compromise? by parlancex · 2013-04-30 06:18 · Score: 0

Wrong! The GPU is only 100x faster at doing certain problems because of the fast GDDR memory it is attached to which is optimized for very large sequential reads and writes. There are a tiny number of applications that require huge numbers of FLOPs on very small amounts of data (BitCoin mining and password hashing attacks come to mind, but that's about it.)
Re:Why compromise? by Anonymous Coward · 2013-04-30 06:23 · Score: 0

The *eventual* end game is x number of CPUs (no GPU in the mix) all bashing away. For that you need concurrent memory. Instead of 4- or 8-way CPUs, think 64- or 128-way CPUs. At that point you do not need a dedicated GPU. You can peel off 32 or 64 CPUs to do the work that the GPU used to do, or you can use those CPUs to do other work, no special libraries needed. You have one unified model to program to, instead of a specific shader language that only a small number of people understand. You can use Python/C/Perl/whatever and just code it up and use the GNU toolchain with no special flags to make it work. Improve your compiler and all programs get better, instead of only those written for specific GPUs.
For example, Intel a few years ago had a 48-way x86 CPU on one die. It was almost able to do real-time ray tracing. The TDP was probably huge and they were going low power, so it got 'shelved' for now.
Parallel reads and writes are the same problem N-way CPUs have. GPUs have the exact same issue, just with a specific ISA that they expose.
Simplifying the ISA is very appealing, both from a programmer perspective and a manufacturing perspective.
Re:Why compromise? by Anonymous Coward · 2013-04-30 06:27 · Score: 0

Yeah they never talk about why, except when they spell it out in plain English for people that use their eyeballs, such as in the slides on the link in the article summary.

It makes it so that you don't need to copy data back and forth in memory nearly as much. When the CPU and GPU have to communicate, they can just pass pointers. Depending on the workload this can be a big deal.
Re:Why compromise? by kukulcan · 2013-04-30 06:29 · Score: 1

I agree with you.
Having a unified memory is a nice thing, but I expect it will only make a difference in something like the PS4, where you can target a specific architecture which has GDDR5 as main memory and doesn't have a discrete GPU. These two points are relevant: if you have "normal" DDR3 you lose a lot more than you gain by having UMA, and this will not change a thing in discrete GPUs because the PCIe bus is always going to be in the way of the GPU accessing main memory.
I think it is more a "nice to have" than a big step forward. The difficulty in programming GPUs lies in the different algorithms one must employ, and while having to copy memory back and forth between the CPU and GPU is a nuisance and something to be avoided, that usually isn't a dealbreaker, though I admit it is useful in some situations.
Re:Why compromise? by Anonymous Coward · 2013-04-30 06:31 · Score: 0

If you are memory bandwidth bound, then a good discrete GPU is only about 10x faster than a CPU.
Re:Why compromise? by parlancex · 2013-04-30 06:39 · Score: 1

Yeah, that's about right. I was just quoting from the OP there. An application that was already properly optimized on the CPU generally only sees performance gains of around 10 to 20x in best case scenarios.
Re:Why compromise? by Anonymous Coward · 2013-04-30 06:58 · Score: 0

One question they never seem to answer is why bother unifying the memory architecture at all? CPU and GPU memory architectures have always been different for the same reasons that CPUs and GPUs themselves are different; one is designed for fast execution of serial instructions with corresponding random smaller reads and writes to memory, and the other is designed for fast execution of parallel instructions with corresponding contiguous reads and writes that are much larger in size. It seems like you're just going to end up compromising if you try to shoehorn one onto the other.
CPU's are running multiple cores and multiple processes, so when looking at the entire address space there really isn't "serial access" any more.
Re:Why compromise? by markhahn · 2013-04-30 07:28 · Score: 1

nah. providing wider and faster memory will help even purely CPU codes, even those that are often quite cache-friendly. the main issue is that people want to do more GPUish stuff - it's not enough to serially recalculate your excel spreadsheet. you want to run 10k MC sims driven from that spreadsheet, and that's a GPU-like load.
but really it's not up to anyone to choose. add-in GPU cards are dying fast, and CPUs almost all have GPUs. so this is really about treating APUs honestly, rather than trying to pretend they can survive on old-fashioned CPU memory interfaces.
Re:Why compromise? by hedwards · 2013-04-30 08:30 · Score: 1

I'm sure that they've thought about that already. The question is whether they've done the work necessary to deal with the problem.
Re:Why compromise? by Anonymous Coward · 2013-04-30 14:25 · Score: 0

I can see the benefit of being able to allocate a GPU/CPU-shared memory region in VRAM for fast passing of information to the GPU without a copy, but apart from making the above concept slightly cheaper to implement, the only benefit I could come up with for allowing the GPU access to main memory is making password theft easier. That and letting their driver developers write sloppier code that doesn't have to distinguish between two types of addresses....
I believe the general idea in both AMD and Intel camps (but especially AMD's) is to fully integrate the GPU's view of memory addresses with the CPU's. They're pushing towards the integrated GPU becoming a tightly integrated low-latency heterogeneous compute resource, rather than a batchmode peripheral in a galaxy far, far away. If an iGPU has a MMU which walks the same page table as the CPU's MMU, issues like password theft by scanning memory go away... (To the same extent they go away for code running on the CPU, anyways.)
Re:Why compromise? by GoatCheez · 2013-04-30 17:47 · Score: 1

That wouldn't exist otherwise because.......?

Re:No match for Haswell by Anonymous Coward · 2013-04-30 05:56 · Score: 1, Insightful

The APU graphics kick the shit out of Intels, and now, you don't even need a memory->vid memory BUS. Think about it

Security model? by dgatwood · 2013-04-30 06:00 · Score: 1

They talk about passing pointers back and forth as though the GPU and CPU effectively share an MMU. The problem is, GPUs and CPUs don't work the same way. GPUs need to access shared resources that are per-system, whereas CPUs need to limit access to resources on a per-process basis. It would be devastating if a GPU could, for example, allow an arbitrary user-space process to overwrite parts of the kernel and inject virus code that runs with greater-than-root privilege. It would similarly be devastating if some arbitrary process could, for example, read the private RAM that backs your keychain or other security-related processes.

I'm assuming that they're doing something sane like having a separate set of RWX bits on each page table entry to control what the GPU's rights are for that page, so that the GPU would only be allowed to read specifically flagged main-memory pages, but these fuzzy marketing briefs provide just enough information to be terrifying.
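
To make the per-page idea concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the `PageEntry` layout, the `Perm` flag names and the `check_access` helper are invented for illustration and are not taken from anything AMD has published.

```python
from dataclasses import dataclass
from enum import Flag


class Perm(Flag):
    """Permission bits; the names are illustrative only."""
    NONE = 0
    R = 1
    W = 2
    X = 4


@dataclass
class PageEntry:
    """Hypothetical page table entry with separate CPU and GPU rights."""
    frame: int
    cpu: Perm
    gpu: Perm = Perm.NONE  # the GPU gets nothing unless the page is flagged


def check_access(entry: PageEntry, agent: str, wanted: Perm) -> bool:
    """Return True only if `agent` ('cpu' or 'gpu') holds every wanted bit."""
    granted = entry.cpu if agent == "cpu" else entry.gpu
    return (granted & wanted) == wanted


# A kernel page the GPU may not touch, and a buffer the GPU may only read.
kernel_page = PageEntry(frame=0x10, cpu=Perm.R | Perm.W | Perm.X)
shared_buf = PageEntry(frame=0x20, cpu=Perm.R | Perm.W, gpu=Perm.R)
```

With these entries, `check_access(shared_buf, "gpu", Perm.R)` is allowed, while any GPU access to `kernel_page` is refused because its GPU bits default to `Perm.NONE`.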

--

Check out my sci-fi/humor trilogy at PatriotsBooks.

Re:Security model? by forkazoo · 2013-04-30 06:17 · Score: 1

My understanding is that there will indeed be something like RWX control. Not just for security, but also for performance. If both sides can freely write to a chunk of memory, you can get into difficulties accounting for caches in a fast way.
That said, if the CPU and the GPU are basically sharing an MMU, then the GPU may be restricted from accessing pages that belong to processes that aren't being rendered/computed. There's no reason why two different applications should be able to clobber each other's texture memory if they do something stupid. So, having the GPU share pointers with the CPU is potentially a very good thing for security. (How well AMD implements the concept in practice remains to be seen, but I'm optimistic.)
Re:Security model? by frank_adrian314159 · 2013-04-30 06:18 · Score: 1

GPUs need to access shared resources that are per-system, whereas CPUs need to limit access to resources on a per-process basis.
If you plan to make the GPU easy to use as a general computing resource (which, according to the writeup, seems to be what they're aiming at) the GPU needs to also be working at a per-process basis and linked to the main system memory so that results are easily available to the main system for I/O, etc.
Of course, even if this is their goal, one question still remains... Will this be useful? It all depends on the apps. I could see this architecture potentially making programming easier for folks who program audio processing software, rendering, or large simulations, especially if they can make the GPU look more like a general purpose processor.

--
That is all.
Re:Security model? by Anonymous Coward · 2013-04-30 06:30 · Score: 0

Yeah you're right, a company full of CPU and GPU engineers would be too stupid to think of issues like that. I'm glad that you brought this point up in a way that showcases your immense knowledge while also being just slightly alarmist so that we can all learn from you.
Re:Security model? by Anonymous Coward · 2013-04-30 12:37 · Score: 0

Yeah because there are no vulnerabilities now, right?

Makes Sense! by Anonymous Coward · 2013-04-30 06:03 · Score: 0

...heterogeneous uniform ...

Now there's an oxymoron!

Name is a pun by Anonymous Coward · 2013-04-30 06:08 · Score: 2, Informative

Apparently not too many finnish speakers here yet. Kaveri => partner/pal/mate, APU => help.

HTH,

ac

Re:Name is a pun by Anonymous Coward · 2013-04-30 06:46 · Score: 1

Apparently not too many finnish speakers here yet. Kaveri => partner/pal/mate, APU => help.
HTH,
ac
"Kaveri" is actually the name of a major river in Karnataka, a state in India. AMD names its cores on major rivers all around the world.
HTH.
Re:Name is a pun by Anonymous Coward · 2013-04-30 07:14 · Score: 0

What's the closest thing to a fish's arsehole?
A Finn!
Hahahahaha!
Re:Name is a pun by Anonymous Coward · 2013-04-30 07:15 · Score: 0

I think it would be more accurately translated as "buddy".
Re:Name is a pun by jones_supa · 2013-04-30 07:29 · Score: 1

Yes, buddy would be spot on. :)
But all in all, the chip's name really sounds sympathetic to the Finnish ear!
Re:Name is a pun by Anonymous Coward · 2013-04-30 08:26 · Score: 0

Apparently not too many finnish speakers here yet. Kaveri => partner/pal/mate, APU => help.
HTH,
ac
"Kaveri" is actually the name of a major river in Karnataka, a state in India. AMD names its cores on major rivers all around the world.
HTH.
And Tamil Nadu too.
https://en.wikipedia.org/wiki/Kaveri_River
Re:Name is a pun by Radak · 2013-04-30 10:45 · Score: 1

It's the night before Vappu. We're way too busy getting drunk in Finland.
Re:Name is a pun by TeknoHog · 2013-04-30 17:59 · Score: 1

It's not just Finnish. Hebrew chaver may be the common etymology for both this and the Finnish word. It is also the origin of the Dutch word gabber.
OTOH, the pun with APU is harder to explain without Finnish.

--
Escher was the first MC and Giger invented the HR department.

AMD = most bang by Anonymous Coward · 2013-04-30 06:24 · Score: 0

If you want the fastest thing on the market, buy Intel. For the majority who want the best deal or most bang for their buck, AMD is the best buy.

Re: No match for Haswell by UnknowingFool · 2013-04-30 06:36 · Score: 2

When someone asks me about buying AMD or Intel, the general summarization I give them is that AMD's built-in GPU handily beats Intel's built-in GPU but Intel's CPU beats AMD's CPU. If graphics are a big concern, they should get a cheap discrete card as one under $100 will be good for most games. Thus AMD's advantage is negated. Also both companies offer more CPU processing power than most consumers can use anyway.

--
Well, there's spam egg sausage and spam, that's not got much spam in it.

Here's to better AI! by pieisgood · 2013-04-30 06:37 · Score: 1

With a GPU next to the CPU the latency between them is reduced; this is awesome for OpenCL applications. Imagine you wanted to work a Markov model into your AI and you needed to do a large number of matrix calculations to get it to run properly and you want it in real time, I think this might solve that problem. I'm imagining game AI improving with adoption of this style of processor. Anyone see this differently?

--
Eat sleep die

Re:Here's to better AI! by godrik · 2013-04-30 07:20 · Score: 1

I don't know... This heterogeneous computing with low latency seems interesting if it does not harm raw performance. The main advantage would be to transport data back and forth between the two. If the computation on one side is long, then the decrease in latency is not very useful. If both of them are really fast, then there is not too much to gain to begin with.
It really helps when you need fast turn around so for small and very synchronous computation. I am waiting to see one good usecase.
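
The trade-off described above can be put into a rough cost model. This is only a sketch: the function name and all the timings below are made-up assumptions, not measurements of any real hardware.

```python
def offload_speedup(cpu_time: float, gpu_time: float, round_trip: float) -> float:
    """Speedup from offloading one task, counting the CPU<->GPU hand-off cost.

    All three arguments use the same time unit. A result below 1.0 means the
    offload is slower than just running the task on the CPU.
    """
    return cpu_time / (gpu_time + round_trip)


# Assumed timings in microseconds, chosen only to illustrate the shape.
small_slow_link = offload_speedup(cpu_time=100, gpu_time=10, round_trip=200)
small_fast_link = offload_speedup(cpu_time=100, gpu_time=10, round_trip=5)
long_slow_link = offload_speedup(cpu_time=1_000_000, gpu_time=100_000, round_trip=200)
long_fast_link = offload_speedup(cpu_time=1_000_000, gpu_time=100_000, round_trip=5)
```

Under these assumptions the short task goes from a net loss to a clear win when the round trip shrinks, while the long task sees almost no difference, which matches the point that lower latency mainly helps small, tightly synchronized work.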
Re:Here's to better AI! by gmueckl · 2013-04-30 08:09 · Score: 1

Learning AIs in games have been problematic in the past. Mostly it is about control over the experience that gets delivered to the customer: as a designer your job is to get it just right. You can do this easily with current more or less heuristic AI algorithms. The ability to learn opens the scope of possible behaviours so much that it's not possible anymore to deliver a guaranteed experience.
Short version: the designer can't stop the game from going nuts in unpredictable ways because of stupid player input (and well, all player input is "stupid").

--
http://www.moonlight3d.eu/

Ivy Bridge already has shared memory architecture by Anonymous Coward · 2013-04-30 06:49 · Score: 0

That's why Intel's HD4000 is faster than AMD's HD 7660D in several OpenCL benchmarks. http://semiaccurate.com/2013/04/29/a-look-at-intels-opencl-performance/

Larrabee by Anonymous Coward · 2013-04-30 06:57 · Score: 0

And it didn't get 'shelved', it got turned into a Tesla-Alike, since while it was great for GPGPU loads, it actually sucked as a replacement for the then-current generation AMD/Nvidia GPUs and by the time it was released to the public it would've been an i740/i752 disaster all over again.

Re:Larrabee by Anonymous Coward · 2013-04-30 07:05 · Score: 0

It was probably the memory subsystem that killed it. When you end up with that many CPU's having a suck memory system will kill you quick. Think O(n^x) suck.
It got bashed on the benchmarks as everything out there assumed nVidia/ATI hardware. The heat was too high for it so everything got clocked back. It was not designed around vector math. It was basically a 48 way x86 cpu chip. Newer instructions added to the x86 isa probably would work around that.
Re:Larrabee by gmueckl · 2013-04-30 08:01 · Score: 1

They sell this stuff under the brand name Xeon Phi now. It's something like 60 simplified x86-like units on a die. Looks like they only cater to big orders from supercomputer builders right now.

--
http://www.moonlight3d.eu/
Re:Larrabee by Anonymous Coward · 2013-04-30 11:15 · Score: 0

It was probably the memory subsystem that killed it. When you end up with that many CPU's having a suck memory system will kill you quick. Think O(n^x) suck.
Oh really? Try justifying that rather outlandish claim. What do n and x even mean, in your mind?
Not that it matters what you retroactively decide you were talking about. It doesn't take a genius to see that if several CPUs are performing the same task (as will usually be the case in a massively parallel computer), memory bandwidth requirements will scale linearly with the number of CPUs.

It got bashed on the benchmarks as everything out there assumed nVidia/ATI hardware.
Intel never released Larrabee as a GPU, nor did Intel ever release any directly comparable benchmark data of it as a GPU. You're making shit up, such data simply wasn't available to the general public.

The heat was too high for it so everything got clocked back.
You know this how? This is another case where Intel never released enough data to make any such claims.

It was not designed around vector math. It was basically a 48 way x86 cpu chip. Newer instructions added to the x86 isa probably would work around that.
Gee, you mean like the "Larrabee New Instructions", the 512-bit vector instructions which were pretty much the entire point of the chip? Larrabee's x86 support structure can be viewed as nothing more than control logic for the LNI vector ALUs -- if I recall correctly the vector ALUs (one per x86 core) were over 90% of the chip's logic.
For someone who clearly knows nothing about Larrabee, or computer architecture in general, you sure seem eager to present yourself as an expert.

The real reason is cost by Wesley+Felter · 2013-04-30 07:00 · Score: 1

In low-cost systems the CPU and GPU are combined on a single chip with a single (slow) memory controller. Given that constraint, AMD is trying to at least wring as much efficiency as they can from that single cheap chip. I salute them for trying to give customers more for their money, but let's admit that this hUMA thing is not about breaking performance records.

Kaveri is named after a river in India!!! by Anonymous Coward · 2013-04-30 07:04 · Score: 0

"Kaveri" is actually the name of a major river in Karnataka, a state in India. AMD also has "Kabini" chip, and "Kabini" is also a river in Karnataka, India :)

Re:Kaveri is named after a river in India!!! by mooglez · 2013-04-30 07:11 · Score: 1

Kaveri is also finnish, and means "a friend", seems to describe the system decently.

heterogenous computing overrated by edxwelch · 2013-04-30 07:09 · Score: 1

I think AMD overrate heterogenous computing. The assumption is that all applications can take advantage of GPGPU. This is simply not true. Only certain types of application are suitable, such as multimedia and simulation - where it's very obvious what part of the code can be parallelised.

Re:heterogenous computing overrated by Anonymous Coward · 2013-04-30 07:29 · Score: 0

Well... if we conveniently ignore that those two applications are among the most demanded from a computer, both in the consumer and professional worlds...
Re:heterogenous computing overrated by Anonymous Coward · 2013-04-30 22:55 · Score: 0

You are excluding the possibility that future compilers will be better at parallelizing programs.

This looks an awful lot like the PS4 by juancn · 2013-04-30 07:11 · Score: 2

Today I read an article in Gamasutra that details some of the internals of the PlayStation 4 and the architecture looks a lot like what's described here.

With GDDR5 memory this could be very interesting.

Re:This looks an awful lot like the PS4 by UnknownSoldier · 2013-04-30 07:51 · Score: 1

Holy crap -- has hell frozen over? Sony is actually thinking about developers for once!? Using (mostly) off the shelf commodity parts is definitely going to help win back some developers. Time will tell if "they are less evil than Microsoft"
Thanks for the great read.
Re:This looks an awful lot like the PS4 by Anonymous Coward · 2013-04-30 08:47 · Score: 0

This basically IS the PS4, and the Xbox Infinity; the Xbox Infinity has DDR3, the PS4 has GDDR5. They have lower-power Jaguar cores instead of Steamrollers, but the GCN-based Southern Islands GPUs on the APU are similar (if tweaked a bit).
It's a solid, very workable architecture, as you'll see, although it needs faster RAM. Strong caches and DDR3 just aren't as effective as you'd hope.

Re:Ivy Bridge already has shared memory architectu by dstyle5 · 2013-04-30 07:33 · Score: 1

Hi Charlie!

The GPU side still needs it's own memory channels by Anonymous Coward · 2013-04-30 07:51 · Score: 0

I'd still prefer an i3 and an entry level dedicated videocard.

Re:Ivy Bridge already has shared memory architectu by Anonymous Coward · 2013-04-30 08:04 · Score: 0

Oh hello there. I must disappoint you, I'm not Charlie, I'm just one of semiaccurate readers. Anyway, a few days ago, I was rather surprised, that even though HD4000 has 2x lower raw performance than HD 7660D, it still manages to beat HD 7660D in quite a few benchmarks. Shared Memory Architecture is an obvious explanation for that...

Re: No match for Haswell by Anonymous Coward · 2013-04-30 08:05 · Score: 0

"a cheap discrete card as one under $100 will be good for most games. "
HAHAHAHA
Maybe games from 5 years ago, otherwise you're stuck with the lowest possible settings in any "modern" game. Ever used one of those sub $100 graphic cards? I did, all I could do was play older games, such as Unreal Tournament 2004 or Starcraft 1 and playback HD videos, other than that, couldn't play at any decent settings games like bioshock, SC2, any mmo such as WoW, Guildwars, etc...

SGI O2 reinvented by Shinobi · 2013-04-30 08:11 · Score: 2

OK, so the SGI O2's UMA has now been reinvented for a new generation, just with more words tacked on....

How does it work? by Anonymous Coward · 2013-04-30 08:51 · Score: 0

Does this mean that you can pass a pointer to a buffer object from a GPU process to a CPU process, manipulate it on the CPU and pass the pointer back to the GPU to continue processing there?

The best way to advocate/support Linux... by Anonymous Coward · 2013-04-30 08:56 · Score: 0

Is to use it, install it in as many places as you can (for friends and family) and work out any problems or questions they may have. Even if they don't stick with it, the experience will be useful in general and will help shape and grow Linux. You can start them off slowly by recommending some OSS apps where they may be useful, such as LibreOffice, VLC, Firefox, Chromium, Inkscape, Gimp, Pidgin, Thunderbird, etc. Many of them are probably already running a couple of those apps. Eventually they can switch over painlessly, or at least benefit from OSS in general.

I've never been to your site or heard of it, and I still come across many advocates. I don't think pouring resources into such an insignificant site will benefit Linux in general. The hard core are already doing their job, the supporters are fine with Google and official forums for their distro, and the users are doing the best thing they can, using the product.

So where exactly does linuxadvocates.com come in again? Seems useless.

What about the software model by PhamNguyen · 2013-04-30 09:17 · Score: 1

I'm interested to see what the software model for this will be. Sure they could use OpenCL, but it seems like a lot of the pain in using OpenCL derives from the underlying memory architecture. With a shared virtual address space and fully coherent caches all in hardware, it should be possible to have a much simpler software model than OpenCL. I guess it doesn't really matter what the software model is though since now that everything is in main memory, GPU functions can be called just like regular functions and the caller doesn't need to care how they are implemented. E.g. it should be possible to have a BLAS GPU library that operates on main memory pointers, where before the cost of copying a matrix to the GPU and back for a single operation wouldn't have been worth it.

Re:What about the software model by serviscope_minor · 2013-04-30 21:30 · Score: 1

Indeed it should be easier. There will still be some cost, since the processors are still in thread bundles and still trade speed for throughput, but the cost will be much lower. I expect the break even point will be pretty small though and won't have the huge disadvantage of limited memory for very large things.
I wonder what the low level locking primitives between the GPU and CPU will be. Those will have some effect on the speed.
I also wonder what/how the stream processors will be dealt with by the OS and scheduler.
Since they're the right side of the MMU now, it would seem that the os kernel is the right place to do the scheduling and so on. Presumably with thread bundles though the program will have to keep the kernel well informed about scheduling requirements.
At this point would the most sensible thing be for AMD to simply release the instruction set and let everyone go nuts on it. I guess they can still provide proprietary drivers for 3D graphics, but the interesting thing here isn't the graphics capability anyway.

--
SJW n. One who posts facts.
Re:What about the software model by Anonymous Coward · 2013-05-01 14:32 · Score: 0

OpenCL developer here. It seems likely that the software model will be OpenCL, at least at first.
Currently, we use a command queue to schedule all our work, including data transfers. We have to copy data from the host (CPU) to our OpenCL device (GPU), then enqueue our kernels (the work code), then when the kernels have finished, we copy our finished data back to the host. Naturally, the CPU can't touch what's in GPU memory while the command queue is being executed.
With hUMA, those copy commands won't actually copy anything. They'll just transfer ownership of memory that's been claimed as a 'buffer' from the CPU to the GPU, and back again when the command queue finishes. The CPU won't be able to access those resources while they're owned by the GPU. We won't have to change much (if any) of our code, and any hUMA-friendly OpenCL program will still work on a non-hUMA device.
Note that this sort of thing is already abstracted away by things like C++AMP and Bolt, so if anything, the software model can only be simpler than what we have now. We developers just won't have to think about it.
This is going to be wonderful. I don't care if the early hUMA devices have only a fraction of the performance of a discrete GPU, because that'll still be far faster than CPU code alone, but with access to all the GDDR5 RAM I can fit in the motherboard. And I can still have a discrete GPU as well, for teraFLOPs processing of any problem that'll fit in its ~2GB.
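
As a rough illustration of the difference between copying a buffer and handing over ownership of it, here is a toy model in plain Python. It does not use the OpenCL API at all; `Buffer`, `copy_to_device` and `transfer_ownership` are hypothetical names invented for this sketch, and the "GPU" is just a label.

```python
class Buffer:
    """Hypothetical shared buffer that has exactly one owner at a time."""

    def __init__(self, data: bytearray):
        self.data = data
        self.owner = "cpu"

    def view(self, who: str) -> memoryview:
        """Give `who` direct access to the bytes, but only if it is the owner."""
        if who != self.owner:
            raise PermissionError(f"{who} does not own this buffer")
        return memoryview(self.data)


def copy_to_device(host: bytearray) -> bytearray:
    """Discrete-GPU style: the device works on its own separate copy."""
    return bytearray(host)


def transfer_ownership(buf: Buffer, new_owner: str) -> None:
    """Shared-memory style as described above: no bytes move, only the owner changes."""
    buf.owner = new_owner


# Copy model: changes on the "device" side do not reach the host data.
host_a = bytearray(b"\x01\x02\x03")
device_copy = copy_to_device(host_a)
device_copy[0] = 9

# Ownership model: the "GPU" writes straight into the same bytes.
host_b = bytearray(b"\x01\x02\x03")
shared = Buffer(host_b)
transfer_ownership(shared, "gpu")
shared.view("gpu")[0] = 9
```

After this runs, `host_a[0]` is still 1 while `host_b[0]` is 9, and calling `shared.view("cpu")` raises `PermissionError` until ownership is handed back.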

Re: No match for Haswell by Flodis · 2013-04-30 09:25 · Score: 1

You do realize that what you're saying is an argument for AMD, don't you?

both companies offer more CPU processing power than most consumers can use anyway.

Ok. Noted. Either will do fine CPU-wise.

AMD's built-in GPU handily beats Intel's built-in GPU

Ah. Great. So AMD is the better buy then.

Not only that, but it will save ~$100 on the CPU and ~$50 more on the motherboard. That's GREAT advice.

But no.. Then we hear this;

If graphics are a big concern, they should get a cheap discrete card as one under $100 will be good for most games. Thus AMD's advantage is negated.

Ummm.. First you made a good case for AMD, and now you're saying they should pick Intel anyway, and not only that, They should cough up an extra $100 on top of the ~$150 extra they already need to cough up, just to negate AMD's advantage. WTF? Why not just pick AMD in the first place then?

Re: No match for Haswell by cheater512 · 2013-04-30 09:29 · Score: 2

AMD beats Intel on the price point however.
And that isn't even counting that with Intel you need to buy a $100 extra card either.

If you *need* top notch performance, go Intel. Otherwise AMD will be lighter on your wallet and do the same job very well.

PS3 != PS4 by Anonymous Coward · 2013-04-30 09:32 · Score: 0

You didn't read the word PS4 and "will"?

Re:PS3 != PS4 by Anonymous Coward · 2013-04-30 12:30 · Score: 0

"The PS3 already uses shared address space."

Re: No match for Haswell by UnknowingFool · 2013-04-30 10:51 · Score: 1

No. I'm saying if the user intends to get a discrete GPU there isn't an advantage to AMD and a slight advantage to Intel. But most consumers don't do anything that would see a difference anyways. Either works.

--
Well, there's spam egg sausage and spam, that's not got much spam in it.

Re: No match for Haswell by fast+turtle · 2013-04-30 12:54 · Score: 1

radeon 5670 with 512m onboard (cost from newegg when bought $90) plays GW, SC and all the other games I've thrown at it quite handily. Will have probs if game is heavily tesserected but that's the only time it's a prob and I run 1900x1080 (monitor native rez) and the funniest thing is - the new radeon drivers support the damn thing while my 7300GT is no longer supported by either Nvidia or Linux, even with the god damn nouveau and nv driver. The reason I still have the old Geforce 7300GT - it's fanless so don't have to worry about it dying from overheating.

I've said it before and I'll say it again, what AMD is doing is pushing the APU as the new FPU. That's right. Once they get things completely revamped, you're going to be looking at a CPU that outperforms Intels best by quite a bit in the next decade.

--
Mod me up/Mod me down: I wont frown as I've no crown

whaaaaaaat? by slashmydots · 2013-04-30 14:18 · Score: 0

Why would a graphics card want to use virtual memory? Also, what motherboard takes GDDR5? Who the heck wrote this nonsense?

Re: No match for Haswell by Anonymous Coward · 2013-04-30 14:42 · Score: 0

You can get a Geforce GTX 650 for under $100 these days. That will handle pretty much any game at maximum or high settings.

Re: No match for Haswell by camg188 · 2013-04-30 14:55 · Score: 1

I think AMD's target for this architecture is a typical Walmart shopper (lower price point, higher sales volume) looking to buy a laptop, so add-on video cards are out of the question. The first 2 questions this type of shopper will ask is "how much?' and "which one is better?"

Re: No match for Haswell by Anonymous Coward · 2013-04-30 16:41 · Score: 0

AMD still has the advantage of the CPU and mobo costing much less than the Intel system even if you are going with only a dedicated GPU. a 970 based mobo and a cheap Phenom2 or Vishera FX4 series CPU and any $100+ GPU and you are still coming in around $100 under the equivalent Intel system.

It's time to stop thinking with your e-peen and making better use of your money.

One address is better than two by Ottibus · 2013-04-30 18:11 · Score: 1

Why would a graphics card want to use virtual memory?

Shared physical memory avoids the cost of copying data to and from the GPU but without shared virtual memory the data will end up at different addresses on the CPU and GPU. This means that you cannot use pointers to link parts of the data together and must rely on indexes of some sort. This makes it harder to port existing code and data structures to use GPU computation.

Also, with shared physical memory you have to tell the device which memory you want to use (so that it can tell you which address to use). With shared virtual memory you can use any memory that is mapped into the CPU process and the memory system will automatically make it visible to the GPU.

In other words, it makes the programmers' life easier. How you measure this benefit is another question altogether!
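
A small sketch of the pointer-versus-index point, in Python. Object references stand in for pointers here, and `flatten` is a hypothetical helper showing the extra conversion step that code needs when the data will live at a different address on the other side.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    """Linked-list node; `next` plays the role of a pointer."""
    value: int
    next: Optional["Node"] = None


def flatten(head: Optional[Node]) -> Tuple[List[int], List[int]]:
    """Replace reference links with index links (-1 marks the end).

    This is the rewrite that is needed when addresses are not shared.
    """
    values: List[int] = []
    next_idx: List[int] = []
    node = head
    while node is not None:
        values.append(node.value)
        # The successor, if any, will be stored at the next free slot.
        next_idx.append(len(values) if node.next is not None else -1)
        node = node.next
    return values, next_idx


def sum_by_reference(head: Optional[Node]) -> int:
    """Walk the original structure directly, as shared virtual memory allows."""
    total, node = 0, head
    while node is not None:
        total += node.value
        node = node.next
    return total


def sum_by_index(values: List[int], next_idx: List[int]) -> int:
    """Walk the flattened copy using indexes instead of references."""
    total, i = 0, (0 if values else -1)
    while i != -1:
        total += values[i]
        i = next_idx[i]
    return total


head = Node(1, Node(2, Node(3)))
flat_values, flat_links = flatten(head)
```

Both traversals give the same sum, but only the index version needs the data structure to be rebuilt first, which is the porting cost being described.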

GPU/GPGPU bottleneck by S3D · 2013-04-30 19:43 · Score: 1

In my experience GPU and especially GPGPU bottleneck is not amount of memory but memory access bandwidth. 256-512 bit is not adequate for existing apps. Before amount of memory will become important manufacturers should move to at least 2048 bit mem bus and also increase amounts of registers per core several times.
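
For a sense of scale, peak bandwidth is the bus width in bytes multiplied by the transfer rate. The sketch below just does that arithmetic; the 5 GT/s effective rate is an assumed round number for illustration, not a figure from the article or from any particular card.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, transfers_per_s: float) -> float:
    """Theoretical peak bandwidth in GB/s: bytes per transfer times transfers per second."""
    return bus_width_bits / 8 * transfers_per_s / 1e9


ASSUMED_RATE = 5e9  # transfers per second; an illustrative assumption

bw_256 = peak_bandwidth_gb_s(256, ASSUMED_RATE)
bw_512 = peak_bandwidth_gb_s(512, ASSUMED_RATE)
bw_2048 = peak_bandwidth_gb_s(2048, ASSUMED_RATE)
```

At the same transfer rate, going from a 256-bit bus to a 2048-bit bus multiplies the theoretical peak by eight; real sustained bandwidth would be lower than any of these figures.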

IOMMU by FithisUX · 2013-04-30 21:17 · Score: 1

I haven't seen this magical word in the presentation. Moreover I do not see the CPU/GPU convergence often talked about. It sounds more like a marketing hype. Moreover the ecosystem could be enriched with DSP or Network processor cores all uniformly offering their resources to software, I did not see it.

Re: No match for Haswell by serviscope_minor · 2013-04-30 21:39 · Score: 1

I think AMD's target for this architecture is a typical Walmart shopper

Partly that and partly it's way more interesting. The unified memory trades performance for flexibility (as always), but puts it in a very interesting space. Less performant than a discrete GPU with a crazy memory architecture, but puts tons more FPU grunt under the flexible memory subsystem of easy to use CPUs.

It will make acceleration more applicable to a much wider range of tasks at the cost of being slower on some.

Due to the close coupling, on the right codes, this thing ought to absolutely hammer even the top end i7s. It should also be able to handily beat discrete GPUs on tasks where the cpu-gpu-cpu latency is just too high, or where GPUs just don't have enough memory.

Of course on single threaded tasks even the i5 and i3 will probably beat it though AMD has been slowly closing the gap and this will improve the situation slightly.

I can see this being personally useful to me. The thing is that discrete GPUs are a bit of a major faff for too many tasks and yield too little benefit.

Honestly due to the enhanced opportunities for acceleration, it's probably a waste to use it for graphics. May as well offload that on to some dedicated hardware. And the cycle of reinvention begins again.

--
SJW n. One who posts facts.

Re: No match for Haswell by Anonymous Coward · 2013-05-01 00:17 · Score: 0

tesserected isn't a word, but it's such a good word I'm going to go away and develop the technology, saving you any spelling embarrassment.

Yours, anon.

Very Interesting by Aquineas · 2013-05-01 01:07 · Score: 0

The question now will be how long it takes before the drivers (OpenCL, DirectX, OpenGL) and even the OSes themselves can take advantage of the architecture. And once that's done, AMD would be wise to work directly with the big compilers (gcc, Clang, msvc, and Intel if they would do it) to allow developers to flip a bit so that the RTLs could use OpenCL for as many math calculations as possible. After all, this is just one step closer to performing nearly all floating point math on the "math-coprocessor" (aka GPU).

Re: No match for Haswell by Flodis · 2013-05-01 07:45 · Score: 1

Not trying to pick a fight here, but I don't think this computes unless you change your mind about the importance of the CPU's computational power, or take some other - not yet mentioned - factor(*) into consideration.

Eg: If the user intends to get a discrete GPU, as you say, s/he will have approx $150 more to spend on the GPU if s/he picks the AMD solution. A $250 GPU vs. a $100 GPU is a pretty significant difference. Thus if graphics matter, the user should pick the AMD solution.

(*) of which there is possibly a boatload to consider. Socket longevity, thermal design power, ability to build a quiet system, ability to use ECC memory, etc. Not only price, but also many 'features' favor AMD since AMD tends to enable ECC, AMD-V and such in consumer CPUs, whereas you have to step up to Xeons to get that from Intel. However, some properties such as Computational power per Watt tend to favor Intel in a significant way. Where I think we agree, is that with Intel you can get pretty much everything you can get from AMD, provided you're willing to spend the money (Eg. step up to a Xeon CPU, add a discrete graphics card).

Re:The GPU side still needs it's own memory channe by halltk1983 · 2013-05-01 08:18 · Score: 1

If you prefer one hardware over the other without seeing benchmarks, then you are someone that is usually referred to as a "fanboy". Have fun with that.

--
Watch for Penguins, they eat Apples and throw rocks at Windows.

OK so how do I (or someone) use this? by RalphTheWonderLlama · 2013-05-01 10:22 · Score: 1

The technology is supported by mainstream programming languages like Python, C++, and Java, and should allow developers to more simply code for a particular compute resource with no need for special APIs.

So how do you do this in Java, Python? Did nobody ask? I did a search for "java huma uniform memory access" and this page came up first with nothing from java.com or oracle in sight.

Ok more searching says to use OpenCL and lots of stackoverflow questions... but they're not new... and OpenCL is not Java. What do you do for this new easier to program hardware? Is their definition of "supported" currently a bit optimistic? Supported by Java..... because Java lets you do lots of things not actually in Java and still work with a Java program, so pretty much anything is "supported" in Java. Is that the gist? I guess we need the tools to evolve before things really take hold.

--
simple, fast homepage with your links: http://www.ngumbi.com/

Re:The GPU side still needs it's own memory channe by Anonymous Coward · 2013-05-02 11:07 · Score: 0

And if you believe benchmarks are a good indicator of real performance, you're just fucking stupid.

Re: No match for Haswell by Anonymous Coward · 2013-05-02 11:24 · Score: 0

Either will do fine CPU-wise

Except the Intel CPU will complete its given tasks two to four times faster than AMD's closest equivalent CPU.

So AMD is the better buy then

Nope. Benchmarks show that Intel HD 4000 graphics are easily on par with any integrated AMD GPU.

it will save ~$100 on the CPU

That's easy to say when you don't even specify which specific products you are referencing. I can easily find an Intel CPU that outperforms an AMD CPU in the same price range.

~$50 more on the motherboard

Considering you can buy an Intel brand LGA 1155 motherboard for $50 or less, I'd like to know where I can get AMD motherboards for free or where AMD will pay me to take one.


128 comments