Slashdot Mirror


AMD Details Next-Gen Kaveri APU's Shared Memory Architecture

crookedvulture writes "AMD has revealed more details about the unified memory architecture of its next-generation Kaveri APU. The chip's CPU and GPU components will have a shared address space and will also share both physical and virtual memory. GPU compute applications should be able to share data between the processor's CPU cores and graphics ALUs, and the caches on those components will be fully coherent. This so-called heterogeneous uniform memory access, or hUMA, supports configurations with either DDR3 or GDDR5 memory. It's also based entirely in hardware and should work with any operating system. Kaveri is due later this year and will also have updated Steamroller CPU cores and a GPU based on the current Graphics Core Next architecture." bigwophh links to the Hot Hardware take on the story, and writes "AMD claims that programming for hUMA-enabled platforms should ease software development and potentially lower development costs as well. The technology is supported by mainstream programming languages like Python, C++, and Java, and should allow developers to more simply code for a particular compute resource with no need for special APIs."

70 of 128 comments

  1. The PS4 by MXPS · · Score: 4, Interesting

    will feature this technology. It will be interesting to see how it stacks up.

    1. Re:The PS4 by Wesley+Felter · · Score: 2

      One of the problems with the PS3 is that it didn't have shared memory. Maybe you're thinking of the 360.

    2. Re:The PS4 by triffid_98 · · Score: 1

      And unlike the Atari Jaguar, it will actually be a 64 bit system. *rimshot*

    3. Re:The PS4 by thoper · · Score: 2

      in effect, the ps4 memory is even more integrated.. see: here and here

  2. Spam Advocates by TheNinjaroach · · Score: 2

    I'm not so sure how I feel about this whole Linux advocacy thing you're trying to promote. But spam, now there's an idea I can get behind! Take my money!

    --
    I went to eat some animal crackers and the box said, "Do not eat if seal is broken." I opened the box and sure enough..
  3. Interesting by Malenx · · Score: 1

    I'm curious how long it will be before these optimizations are found in the compilers themselves.

  4. Where's the fine print? by madwheel · · Score: 1

    As usual, AMD is leaving out some key information. What will be the TDP of such chips? I've always rooted for AMD and all my systems were built with them. You can't beat an Ivy Bridge chip for performance per watt though. AMD doesn't offer anything compelling to compete with the i7-3770K. I like the idea that they're using the GCN architecture to assist with processing, but have they done anything to the lithography or power consumption? Intel's Haswell chips come out soon and those are even better. Power is key in the mobile space where a lot of chips are going. -Joe

    1. Re:Where's the fine print? by K.+S.+Kyosuke · · Score: 1

      Power is key in the mobile space where a lot of chips are going. -Joe

      I hope that your i7-3770K is serving you well in your cell phone.

      --
      Ezekiel 23:20
    2. Re:Where's the fine print? by madwheel · · Score: 1

      I guess I need to provide more information to help get my point across. Intel has 4th gen chips that run on a 7 watt TDP. The performance per watt is pretty remarkable. Intel's i7-3770K has a 77 watt TDP. AMD's FX-8350 has a 125 watt TDP, gets spanked by Intel in most benchmarks, and doesn't have any graphics chip on die to drive a monitor. Translating that down, Intel has an advantage. I would love to be proven wrong though.

    3. Re:Where's the fine print? by serviscope_minor · · Score: 5, Insightful

      You can't beat an Ivy Bridge chip for performance per watt though.

      Ehugh. Yes no kind of.

      For "general" workloads IVB chips are the best in performance per Watt.

      In some specific workloads, the high core count piledrivers beat IVB, but that's rare. For almost all x86 work IVB wins.

      For highly parallel churny work that GPUs excel at, they beat all X86 processors by a very wide margin. This is not surprising. They replace all the expensive silicon that make general purpose processors go fast and put in MOAR ALUs. So much like the long line of accelerators, co processors, DSPs and so on, they make certain kinds of work go very fast and are useless at others.

      But for quite a few classes of work, GPUs trounce IVB at performance per Watt.

      The trouble is that GPUs suck. They have teeny amounts of local memory and a slow interconnect to main memory. They also suck at certain things, and batting data between the fast (for some things) GPU and fast (for other things) CPU is a real drag because of the latency. This limits the applicability of GPUs.

      Only with the new architecture, which I (and presumably many others) hoped was AMD's long-term goal, a number of these problems disappear, since the link is very low latency and the memory is fully shared.

      This means the very superior performance per Watt (for some things) GPU can be used for a wider range of tasks.

      So yes, this should do a lot for power consumption for a number of tasks.

      --
      SJW n. One who posts facts.
    4. Re:Where's the fine print? by K.+S.+Kyosuke · · Score: 1

      Intel's i7-3770K has a 77 watt TDP. AMD's FX-8350 has a 125 watt TDP, gets spanked by Intel in most benchmarks, and doesn't have any graphics chip on die to drive a monitor.

      You know, that might be exactly the problem here. This is something completely different. If the GPU will be any decent, chances are that a combination of a high-end-GPU equipped APU with a lot of GDDR5 memory would make many HPC people much happier than Haswell ever could. In some application areas, it's all about bandwidth. Today, if you're trying to do HPC on, say, a 20GB dataset in memory, on a single machine, you're screwed.

      --
      Ezekiel 23:20
    5. Re:Where's the fine print? by serviscope_minor · · Score: 1

      Translating that down, Intel has an advantage.

      i7 3770k: £250
      FX 8350: £160

      Yes. Advantage Intel. Also take into account that quality motherboards are usually cheaper for AMD and that one can also upgrade more easily.

      The more apt comparison is to some i5. At that point, the 8350 beats it in a large number of benchmarks (and does actually beat the much more expensive i7). Basically in multi threaded code the FX8350 wins. In single threaded code the i5 wins.

      --
      SJW n. One who posts facts.
    6. Re:Where's the fine print? by parlancex · · Score: 1

      The trouble is that GPUs suck. They have teeny amounts of local memory and a slow interconnect to main memory. They also suck at certain things, and batting data between the fast (for some things) GPU and fast (for other things) CPU is a real drag because of the latency. This limits the applicability of GPUs.

      The "slow interconnect" you're talking about to main memory, PCI Express v3.0 has an effective bandwidth of 32GB/s which actually exceeds the best main memory bandwidth you'd get out of an Ivy Bridge CPU with very fast memory, so no, that's not a bottleneck for bandwidth, though yes, there is some latency there.

      I don't know why everyone seems to forget that GPUs aren't just fast because they have a lot of ALUs (TFA included), they are fast because of the highly specialized GDDR memory they are attached to. One would be completely useless without the other. Even the lowly GTX 285 from 4 years ago was pushing 160GB/s for memory bandwidth.

    7. Re:Where's the fine print? by madwheel · · Score: 1

      I do agree with you. I'm simply referring to the simple tasks the general public does. Web surfing, iTunes, emails, etc. These are not heavily threaded tasks. Granted the difference is marginal because any modern processor can handle this with ease. Sure in highly threaded workloads the AMDs offer a better bang for your buck, but the general public does not do this on a day to day basis.

    8. Re:Where's the fine print? by bored · · Score: 3, Informative

      The "slow interconnect" you're talking about to main memory, PCI Express v3.0 has an effective bandwidth of 32GB/s which actually exceeds the best main memory bandwidth you'd get out of an Ivy Bridge CPU with very fast memory, so no, that's not a bottleneck for bandwidth, though yes, there is some latency there.

      It's both. For my application, the GPU is roughly 3x-5x as fast as a high end CPU. This is fairly common on a lot of GPGPU workloads. The GPU provides a decent but not huge performance advantage.

      But, we don't use the GPU! Why not? Because copying the data over the PCIe link, waiting for the GPU to complete the task, and then copying the data back over the PCI bus yields a net performance loss over just doing it on the CPU.

      In theory, a GPU sharing the memory subsystem with the CPU avoids this copy latency. Nor does it preclude still having a parallel memory subsystem dedicated for local accesses on the GPU. That is the "nice" thing about opencl/CUDA the programmer can control the memory subsystems at a very fine level.

      Whether or not AMD's solution helps our application remains to be seen. Even if it doesn't, it's possible it helps some portion of the GPGPU community.

      BTW:
      In our situation it's a server system, so it has more memory bandwidth than your average desktop. On the other hand, I've never seen a GPU pull more than a small percentage of the memory bandwidth over the PCIe links doing copies. Nvidia ships a raw copy benchmark with the CUDA SDK; try it on your machines, the results (theoretical vs reality) might surprise you.
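The trade-off described above (a 3x-5x faster GPU that still loses once the copies are counted) can be put into a rough cost model. This is only a sketch: the function names are made up for illustration, and the workload numbers (10 ms of CPU work, 64 MiB each way, a 6 GB/s effective copy rate) are hypothetical, not measurements from the poster's system.

```python
def offload_time(cpu_seconds, gpu_speedup, bytes_moved, link_bytes_per_s,
                 fixed_latency_s=0.0):
    """Estimated wall time when the work is shipped to the GPU and back."""
    compute = cpu_seconds / gpu_speedup          # time the GPU spends computing
    transfer = bytes_moved / link_bytes_per_s    # time spent copying over the link
    return compute + transfer + fixed_latency_s


def worth_offloading(cpu_seconds, gpu_speedup, bytes_moved, link_bytes_per_s,
                     fixed_latency_s=0.0):
    """True only if the GPU path beats simply staying on the CPU."""
    return offload_time(cpu_seconds, gpu_speedup, bytes_moved,
                        link_bytes_per_s, fixed_latency_s) < cpu_seconds


# Hypothetical workload: 10 ms on the CPU, GPU is 4x faster,
# 64 MiB copied in and 64 MiB copied out at an effective 6 GB/s.
CPU_SECONDS = 0.010
SPEEDUP = 4.0
BYTES_MOVED = 2 * 64 * 1024 * 1024
LINK_RATE = 6e9

discrete = worth_offloading(CPU_SECONDS, SPEEDUP, BYTES_MOVED, LINK_RATE)
shared = worth_offloading(CPU_SECONDS, SPEEDUP, 0, LINK_RATE)  # no copies at all
print("discrete GPU worth it:", discrete)
print("shared memory worth it:", shared)
```

With these assumed numbers the copy time alone exceeds the CPU run time, so the discrete path loses, while the same kernel with nothing to copy wins. Real shared-memory hardware would still have some dispatch cost, which the `fixed_latency_s` parameter stands in for.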

    9. Re:Where's the fine print? by Kjella · · Score: 1

      Assuming you're willing to write special software that'll only see benefit on AMD's APUs, not on Intel nor anything with discrete GPUs. I suppose it's different for the PS4 or Xbox720 where you can assume that everyone that'll use the software will have it, but for most PC software the advantages would have to be very big indeed. If you need tons of shading power it's better to run on discrete GPUs. Even with unified memory, switching between shaders and cores isn't entirely free, so it might not do that much for general computing; you need the right kind of mix. I'm hoping, but can't help but feel that AMD is giving up a big market in pursuit of a small market.

      --
      Live today, because you never know what tomorrow brings
    10. Re:Where's the fine print? by skids · · Score: 1, Interesting

      Speaking as someone currently considering buying slightly behind the curve, I was all set to jump on an Intel-based fanless system because of the TDP figures. However, with the PowerVR versions of the Intel GPU c**k-blocking linux graphics, and with AMD finally open-sourcing UVD, I'm now back to considering a Brazos. Less choices for fanless pre-built systems, though. May have to skip on the pay-a-younger-geek-because-I-dont-enjoy-playing-legos-anymore part.

      So no, for some markets, Intel has not yet realized the advantage that their IC processes should technically give them, and to the point of TFA, if they do not combine that advantage with architectural improvements, there will be ways for AMD to stay in this market for some time to come.

    11. Re:Where's the fine print? by tibman · · Score: 1
      --
      http://soylentnews.org/~tibman
    12. Re:Where's the fine print? by Luckyo · · Score: 1

      In terms of APUs, they have Intel not just beat but utterly demolished. Intel has absolutely nothing on AMD when it comes to the combination of a slowish low TDP CPU and a built-in GPU with the performance of a low end discrete GPU.

      And while they lack CPU power for the high end, wouldn't you want a discrete CPU with a discrete GPU in that segment in the first place?

    13. Re:Where's the fine print? by Rockoon · · Score: 2

      The "slow interconnect" you're talking about to main memory, PCI Express v3.0 has an effective bandwidth of 32GB/s

      32GB/s doesn't sound like a lot when you divide it amongst the 400 stream processors that an upper end AMD APU has, and that's as favorable a light as I can shine on your inane bullshit. There is a reason that discrete graphics cards have their own memory, and it isn't because they have more stream processors (these days they do, but they didn't always); it's because PCI Express isn't anywhere near fast enough to feed any modern GPU.

      Llano APUs have been witnessed pulling 500 GFLOPS. Does 32GB/s still sound like a lot? No, it sounds like shit. Clearly memory bandwidth is a big issue in this scene.
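The arithmetic behind that complaint can be spelled out. The sketch below only reuses figures quoted in this thread (32 GB/s over PCIe, 400 stream processors, 500 GFLOPS, and the 160 GB/s GDDR figure mentioned elsewhere in the discussion); none of them are verified here, and the helper name is made up for illustration.

```python
def bytes_per_flop(bandwidth_bytes_per_s, flops_per_s):
    """How many bytes of memory traffic are available per floating-point op."""
    return bandwidth_bytes_per_s / flops_per_s


# Figures as quoted by posters in this thread (not independently checked).
PCIE_BANDWIDTH = 32e9        # bytes/s claimed for PCI Express 3.0
GDDR_BANDWIDTH = 160e9       # bytes/s claimed for an older discrete card
PEAK_FLOPS = 500e9           # FLOPS claimed for a Llano APU
STREAM_PROCESSORS = 400      # stream processors claimed for a high end APU

# Bandwidth share if the link were split evenly across stream processors.
per_processor = PCIE_BANDWIDTH / STREAM_PROCESSORS

pcie_ratio = bytes_per_flop(PCIE_BANDWIDTH, PEAK_FLOPS)
# Same compute rate paired with GDDR-class bandwidth, for comparison only.
gddr_ratio = bytes_per_flop(GDDR_BANDWIDTH, PEAK_FLOPS)

print(f"per stream processor: {per_processor / 1e6:.0f} MB/s")
print(f"PCIe: {pcie_ratio:.3f} bytes per FLOP")
print(f"GDDR: {gddr_ratio:.3f} bytes per FLOP")
```

On these quoted numbers each stream processor would get about 80 MB/s, and a single 4-byte float arriving over the link would have to be reused for dozens of operations to keep the ALUs busy.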

      --
      "His name was James Damore."
    14. Re:Where's the fine print? by parlancex · · Score: 1

      Sigh. Here I go feeding the trolls.

      I'm not sure what point you're trying to make here, since MY main point in the rest of this topic was that modern GPUs are mostly limited by memory bandwidth, which makes the development in TFA pretty pointless. You're right! 32GB/s isn't enough to make the most of the computing resources available on a modern GPU! That was my point. How exactly would the GPU accessing main memory directly help? The fastest system RAM currently available in consumer markets in the fastest possible configuration can barely reach 30GB/s. In order for GPUs to confer a computational advantage they need to be doing heavy lifting on GDDR RAM, which could deliver over 160GB/s on cards that are over 4 years old.

    15. Re:Where's the fine print? by cynyr · · Score: 1

      Because you wouldn't need to transfer it between the CPU and GPU? You could just point the GPU at main system RAM and let it have at it.

      --
      All of the above was encrypted with a Quad ROT-13 method. Unauthorized decryption is in violation of the DMCA.
    16. Re:Where's the fine print? by fast+turtle · · Score: 1

      The only way you can state that "each tab in the browser is another thread" is if you're not using Firefox. Even on 20.1, they still don't have that, and a single bad tab can and does take the entire browser down. Hell, even IE 9 finally got it straight with tabs. It doesn't handle too many tabs at once, but it runs each one in a separate thread, just like Chrome does. Opera does the same AFAIK (don't use it). You're right about background tasks and such though, as the multithread performance of an AMD CPU is far better than Intel's unless you're into the Xeons.

      --
      Mod me up/Mod me down: I wont frown as I've no crown
    17. Re:Where's the fine print? by ardor · · Score: 1

      The sad thing is that the PowerVR's are actually pretty decent. The drivers (made by Intel) are to blame.

      --
      This sig does not contain any SCO code.
  5. CPU - GPU - CPU latency by Anonymous Coward · · Score: 2, Interesting

    This should really help round trip times through the GPU. With most existing setups, doing a render to texture and getting the results back CPU side is quite expensive, but this should help a lot. It should also work great for procedural editing/generating/swapping geometry that you are rendering. Getting all those high poly LODs onto the GPU will no longer be an issue with systems like this.

    Interestingly enough, this is somewhat similar to what Intel has now for their integrated graphics, except it looks like the AMD GPU has access to the full address space and cache system, which Intel does not do. Also, it's not an Intel GPU, so it's likely better in other ways too, but I shouldn't need to point that out.

    Intel's Haswell is moving in the opposite direction, working to get some dedicated memory for the GPU, which is closer to the traditional GPU approach. It's nice to see companies exploring new areas; hopefully we will get some great hardware out of it, ideally with no broken drivers.

  6. Why compromise? by parlancex · · Score: 1

    One question they never seem to answer is why bother unifying the memory architecture at all? CPU and GPU memory architectures have always been different for the same reasons that CPUs and GPUs themselves are different; one is designed for fast execution of serial instructions with corresponding random smaller reads and writes to memory, and the other is designed for fast execution of parallel instructions with corresponding contiguous reads and writes that are much larger in size. It seems like you're just going to end up compromising if you try to shoehorn one onto the other.

    1. Re:Why compromise? by SenatorPerry · · Score: 5, Informative

      In OpenCL you need to copy items from the system memory to the GPU's memory and then load the kernel on the GPU to start execution. Then you must copy the data back from the GPU's memory at the end after execution. AMD is saying that you can instead pass a pointer to the data in the main memory instead of actually making copies of the data.

      This should reduce some of the memory shifting on the system and speed up OpenCL execution. It will also eliminate some of the memory constraints on OpenCL regarding what you can do on the GPU. On a larger scale it will open up some opportunities for optimizing work.
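The difference between the two models can be shown with a plain-Python analogy. To be clear, this is not OpenCL or any AMD API: `run_with_copies` and `run_shared` are hypothetical names, a `bytearray` stands in for system memory, and a byte-doubling loop stands in for the GPU kernel. The point is only the data movement pattern.

```python
def run_with_copies(host_data):
    """Copy-in / compute / copy-out, like a discrete GPU with its own memory."""
    device_buf = bytearray(host_data)            # copy host -> "device" memory
    for i in range(len(device_buf)):             # the stand-in "kernel"
        device_buf[i] = (device_buf[i] * 2) % 256
    return bytes(device_buf)                     # copy "device" -> host


def run_shared(host_buf):
    """Pass a reference to the same memory; the 'kernel' works in place."""
    view = memoryview(host_buf)                  # no copy, same underlying bytes
    for i in range(len(view)):
        view[i] = (view[i] * 2) % 256


data = bytearray([1, 2, 3, 4])

copied_result = run_with_copies(data)   # data itself is untouched
print(copied_result, data)

run_shared(data)                        # data is updated where it lives
print(data)
```

In the first function the input is duplicated twice, once in each direction, which is the overhead described above. In the second, the caller's buffer is the only copy that ever exists, which is what passing a pointer into shared memory would amount to.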

    2. Re:Why compromise? by dgatwood · · Score: 1

      I can see the benefit of being able to allocate a GPU/CPU-shared memory region in VRAM for fast passing of information to the GPU without a copy, but apart from making the above concept slightly cheaper to implement, the only benefit I could come up with for allowing the GPU access to main memory is making password theft easier. That and letting their driver developers write sloppier code that doesn't have to distinguish between two types of addresses....

      The most hilarious part of this is that while they're doing this, the rest of the world seems to be moving in the opposite direction, towards having separate I/O and physical address spaces that are mapped with an MMU. But I digress.

      --

      Check out my sci-fi/humor trilogy at PatriotsBooks.

    3. Re:Why compromise? by forkazoo · · Score: 4, Insightful

      Because when you are doing stuff like OpenCL, dispatching from CPU space to GPU space has a huge overhead. The GPU may be 100x better at doing a problem than the CPU, but it takes so long to transfer data over to the GPU and set things up that it may still be faster to do it on the CPU. It's basically the same argument that led to the FPU being moved onto the same chip as the CPU a generation ago. There was a time when the FPU was a completely separate chip, and there were valid reasons why it ought to be. But moving it on chip was ultimately a huge performance win. The idea behind AMD's strategy is basically to move the GPU so close to the CPU that you use it as freely as we currently use the FPU.

    4. Re:Why compromise? by kukulcan · · Score: 1
      I agree with you.

      Having a unified memory is a nice thing, but I expect it will only make a difference in something like the PS4, where you can target a specific architecture, which has GDDR5 as main memory and doesn't have a discrete GPU. These two points are relevant: if you have "normal" DDR3 you lose a lot more than you gain by having UMA, and this will not change a thing in discrete GPUs because the PCIe bus is always going to be in the way of the GPU accessing main memory.

      I think it is more a "nice to have" than a big step forward. The difficulty in programming GPUs lies in the different algorithms one must employ, and while having to copy memory back and forth between the CPU and GPU is a nuisance and something to be avoided, that usually isn't a dealbreaker, though I admit it is useful in some situations.

    5. Re:Why compromise? by parlancex · · Score: 1

      Yeah, that's about right. I was just quoting from the OP there. An application that was already properly optimized on the CPU generally only sees performance gains of around 10 to 20x in best case scenarios.

    6. Re:Why compromise? by markhahn · · Score: 1

      nah. providing wider and faster memory will help even purely CPU codes, even those that are often quite cache-friendly. the main issue is that people want to do more GPUish stuff - it's not enough to serially recalculate your excel spreadsheet. you want to run 10k MC sims driven from that spreadsheet, and that's a GPU-like load.

      but really it's not up to anyone to choose. add-in GPU cards are dying fast, and CPUs almost all have GPUs. so this is really about treating APUs honestly, rather than trying to pretend they can survive on old-fashioned CPU memory interfaces.

    7. Re:Why compromise? by hedwards · · Score: 1

      I'm sure that they've thought about that already. The question is whether they've done the work necessary to deal with the problem.

    8. Re:Why compromise? by GoatCheez · · Score: 1

      That wouldn't exist otherwise because.......?

  7. Re:No match for Haswell by Anonymous Coward · · Score: 1, Insightful

    The APU graphics kick the shit out of Intel's, and now you don't even need a memory->vid memory bus. Think about it.

  8. Security model? by dgatwood · · Score: 1

    They talk about passing pointers back and forth as though the GPU and CPU effectively share an MMU. The problem is, GPUs and CPUs don't work the same way. GPUs need to access shared resources that are per-system, whereas CPUs need to limit access to resources on a per-process basis. It would be devastating if a GPU could, for example, allow an arbitrary user-space process to overwrite parts of the kernel and inject virus code that runs with greater-than-root privilege. It would similarly be devastating if some arbitrary process could, for example, read the private RAM that backs your keychain or other security-related processes.

    I'm assuming that they're doing something sane like having a separate set of RWX bits on each page table entry to control what the GPU's rights are for that page, so that the GPU would only be allowed to read specifically flagged main-memory pages, but these fuzzy marketing briefs provide just enough information to be terrifying.
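What a separate set of GPU permission bits per page might look like can be sketched in a few lines. This is purely a toy illustrating the idea guessed at above; it is not AMD's design, and every name here (`Perm`, `PageTableEntry`, `gpu_access_allowed`, the 4 KiB page size) is an assumption made for the example.

```python
from dataclasses import dataclass
from enum import IntFlag

PAGE_SIZE = 4096  # assumed page size for the toy model


class Perm(IntFlag):
    NONE = 0
    R = 1
    W = 2
    X = 4


@dataclass
class PageTableEntry:
    cpu: Perm = Perm.NONE   # what the owning process may do from the CPU
    gpu: Perm = Perm.NONE   # what the GPU may do on that process's behalf


def gpu_access_allowed(page_table, address, wanted):
    """Allow a GPU access only if the page is mapped and flagged for it."""
    entry = page_table.get(address // PAGE_SIZE)
    if entry is None:                       # unmapped page: always refuse
        return False
    return (entry.gpu & wanted) == wanted   # every requested bit must be set


# Page 0 is explicitly shared read-only with the GPU; page 1 is CPU-only.
table = {
    0: PageTableEntry(cpu=Perm.R | Perm.W, gpu=Perm.R),
    1: PageTableEntry(cpu=Perm.R | Perm.W | Perm.X),
}

print(gpu_access_allowed(table, 100, Perm.R))    # flagged page, read
print(gpu_access_allowed(table, 100, Perm.W))    # flagged page, write
print(gpu_access_allowed(table, 4096, Perm.R))   # CPU-only page
```

The design choice being illustrated is default-deny: a page the process never flagged for the GPU stays invisible to it, so sharing pointers does not by itself mean sharing all of memory.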

    --

    Check out my sci-fi/humor trilogy at PatriotsBooks.

    1. Re:Security model? by forkazoo · · Score: 1

      My understanding is that there will indeed be something like RWX control. Not just for security, but also for performance. If both sides can freely write to a chunk of memory, you can get into difficulties accounting for caches in a fast way.

      That said, if the CPU and the GPU are basically sharing an MMU, then the GPU may be restricted from accessing pages that belong to processes that aren't being rendered/computed. There's no reason why two different applications should be able to clobber each other's texture memory if they do something stupid. So, having the GPU share pointers with the CPU is potentially a very good thing for security. (How well AMD implements the concept in practice remains to be seen, but I'm optimistic.)

    2. Re:Security model? by frank_adrian314159 · · Score: 1

      GPUs need to access shared resources that are per-system, whereas CPUs need to limit access to resources on a per-process basis.

      If you plan to make the GPU easy to use as a general computing resource (which, according to the writeup, seems to be what they're aiming at) the GPU needs to also be working at a per-process basis and linked to the main system memory so that results are easily available to the main system for I/O, etc.

      Of course, even if this is their goal, one question still remains... Will this be useful? It all depends on the apps. I could see this architecture potentially making programming easier for folks who program audio processing software, rendering, or large simulations, especially if they can make the GPU look more like a general purpose processor.

      --
      That is all.
  9. Name is a pun by Anonymous Coward · · Score: 2, Informative

    Apparently not too many Finnish speakers here yet. Kaveri => partner/pal/mate, APU => help.

    HTH,

    ac

    1. Re:Name is a pun by Anonymous Coward · · Score: 1

      Apparently not too many Finnish speakers here yet. Kaveri => partner/pal/mate, APU => help.

      HTH,

      ac

      "Kaveri" is actually the name of a major river in Karnataka, a state in India. AMD names its cores on major rivers all around the world.

      HTH.

    2. Re:Name is a pun by jones_supa · · Score: 1

      Yes, buddy would be spot on. :)

      But all in all, the chip's name really sounds sympathetic to the Finnish ear!

    3. Re:Name is a pun by Radak · · Score: 1

      It's the night before Vappu. We're way too busy getting drunk in Finland.

    4. Re:Name is a pun by TeknoHog · · Score: 1

      It's not just Finnish. Hebrew chaver may be the common etymology for both this and the Finnish word. It is also the origin of the Dutch word gabber.

      OTOH, the pun with APU is harder to explain without Finnish.

      --
      Escher was the first MC and Giger invented the HR department.
  10. Re: No match for Haswell by UnknowingFool · · Score: 2

    When someone asks me about buying AMD or Intel, the general summarization I give them is that AMD's built-in GPU handily beats Intel's built-in GPU but Intel's CPU beats AMD's CPU. If graphics are a big concern, they should get a cheap discrete card as one under $100 will be good for most games. Thus AMD's advantage is negated. Also both companies offer more CPU processing power than most consumers can use anyway.

    --
    Well, there's spam egg sausage and spam, that's not got much spam in it.
  11. Here's to better AI! by pieisgood · · Score: 1

    With a GPU next to the CPU, the latency between them is reduced, which is awesome for OpenCL applications. Imagine you wanted to work a Markov model into your AI, you needed to do a large number of matrix calculations to get it to run properly, and you wanted it in real time. I think this might solve that problem. I'm imagining game AI improving with adoption of this style of processor. Anyone see this differently?

    --
    Eat sleep die
    1. Re:Here's to better AI! by godrik · · Score: 1

      I don't know... This heterogeneous computing with low latency seems interesting if it does not harm raw performance. The main advantage would be to transport data back and forth between the two. If the computation on one side is long, then the decrease in latency is not very useful. If both of them are really fast, then there is not too much to gain to begin with.

      It really helps when you need fast turnaround, so for small and very synchronous computations. I am waiting to see one good use case.

    2. Re:Here's to better AI! by gmueckl · · Score: 1

      Learning AIs in games have been problematic in the past. Mostly it is about control over the experience that gets delivered to the customer: as a designer your job is to get it just right. You can do this easily with current more or less heuristic AI algorithms. The ability to learn opens the scope of possible behaviours so much that it's not possible anymore to deliver a guaranteed experience.

      Short version: the designer can't stop the game from going nuts in unpredictable ways because of stupid player input (and well, all player input is "stupid").

      --
      http://www.moonlight3d.eu/
  12. The real reason is cost by Wesley+Felter · · Score: 1

    In low-cost systems the CPU and GPU are combined on a single chip with a single (slow) memory controller. Given that constraint, AMD is trying to at least wring as much efficiency as they can from that single cheap chip. I salute them for trying to give customers more for their money, but let's admit that this hUMA thing is not about breaking performance records.

  13. heterogeneous computing overrated by edxwelch · · Score: 1

    I think AMD overrate heterogeneous computing. The assumption is that all applications can take advantage of GPGPU. This is simply not true. Only certain types of application are suitable, such as multimedia and simulation, where it's very obvious what part of the code can be parallelised.

  14. This looks an awful lot like the PS4 by juancn · · Score: 2
    Today I read an article in Gamasutra that details some of the internals of the PlayStation 4, and the architecture looks a lot like what's described here.

    With GDDR5 memory this could be very interesting.

    1. Re:This looks an awful lot like the PS4 by UnknownSoldier · · Score: 1

      Holy crap -- has hell frozen over? Sony is actually thinking about developers for once!? Using (mostly) off the shelf commodity parts is definitely going to help win back some developers. Time will tell if "they are less evil than Microsoft".

      Thanks for the great read.

  15. Re:Kaveri is named after a river in India!!! by mooglez · · Score: 1

    Kaveri is also Finnish and means "a friend", which seems to describe the system decently.

  16. Re:Ivy Bridge already has shared memory architectu by dstyle5 · · Score: 1

    Hi Charlie!

  17. Re:Larrabee by gmueckl · · Score: 1

    They sell this stuff under the brand name Xeon Phi now. It's something like 60 simplified x86-like units on a die. Looks like they only cater to big orders from supercomputer builders right now.

    --
    http://www.moonlight3d.eu/
  18. SGI O2 reinvented by Shinobi · · Score: 2

    OK, so the SGI O2's UMA has now been reinvented for a new generation, just with more words tacked on....

  19. What about the software model by PhamNguyen · · Score: 1

    I'm interested to see what the software model for this will be. Sure they could use OpenCL, but it seems like a lot of the pain in using OpenCL derives from the underlying memory architecture. With a shared virtual address space and fully coherent caches all in hardware, it should be possible to have a much simpler software model than OpenCL. I guess it doesn't really matter what the software model is though, since now that everything is in main memory, GPU functions can be called just like regular functions and the caller doesn't need to care how they are implemented. E.g. it should be possible to have a BLAS GPU library that operates on main memory pointers, where before the cost of copying a matrix to the GPU and back for a single operation wouldn't have been worth it.

    1. Re:What about the software model by serviscope_minor · · Score: 1

      Indeed it should be easier. There will still be some cost, since the processors are still in thread bundles and still trade speed for throughput, but the cost will be much lower. I expect the break even point will be pretty small though and won't have the huge disadvantage of limited memory for very large things.

      I wonder what the low level locking primitives between the GPU and CPU will be. Those will have some effect on the speed.

      I also wonder what/how the stream processors will be dealt with by the OS and scheduler.

      Since they're the right side of the MMU now, it would seem that the os kernel is the right place to do the scheduling and so on. Presumably with thread bundles though the program will have to keep the kernel well informed about scheduling requirements.

      At this point, would the most sensible thing be for AMD to simply release the instruction set and let everyone go nuts on it? I guess they can still provide proprietary drivers for 3D graphics, but the interesting thing here isn't the graphics capability anyway.

      --
      SJW n. One who posts facts.
  20. Re: No match for Haswell by Flodis · · Score: 1
    You do realize that what you're saying is an argument for AMD, don't you?

    both companies offer more CPU processing power than most consumers can use anyway.

    Ok. Noted. Either will do fine CPU-wise.

    AMD's built-in GPU handily beats Intel's built-in GPU

    Ah. Great. So AMD is the better buy then.

    Not only that, but it will save ~$100 on the CPU and ~$50 more on the motherboard. That's GREAT advice.

    But no.. Then we hear this;

    If graphics are a big concern, they should get a cheap discrete card as one under $100 will be good for most games. Thus AMD's advantage is negated.

    Ummm.. First you made a good case for AMD, and now you're saying they should pick Intel anyway, and not only that, they should cough up an extra $100 on top of the ~$150 extra they already need to cough up, just to negate AMD's advantage. WTF? Why not just pick AMD in the first place then?

  21. Re: No match for Haswell by cheater512 · · Score: 2

    AMD beats Intel on the price point however.
    And that isn't even counting that with Intel you need to buy a $100 extra card either.

    If you *need* top notch performance, go Intel. Otherwise AMD will be lighter on your wallet and do the same job very well.

  22. Re: No match for Haswell by UnknowingFool · · Score: 1

    No. I'm saying if the user intends to get a discrete GPU there isn't an advantage to AMD and a slight advantage to Intel. But most consumers don't do anything that would see a difference anyways. Either works.

    --
    Well, there's spam egg sausage and spam, that's not got much spam in it.
  23. Re: No match for Haswell by fast+turtle · · Score: 1

    Radeon 5670 with 512m onboard (cost from Newegg when bought $90) plays GW, SC and all the other games I've thrown at it quite handily. Will have probs if a game is heavily tessellated but that's the only time it's a prob and I run 1900x1080 (monitor native rez) and the funniest thing is - the new Radeon drivers support the damn thing while my 7300GT is no longer supported by either Nvidia or Linux, even with the god damn nouveau and nv driver. The reason I still have the old GeForce 7300GT - it's fanless so I don't have to worry about it dying from overheating.

    I've said it before and I'll say it again, what AMD is doing is pushing the APU as the new FPU. That's right. Once they get things completely revamped, you're going to be looking at a CPU that outperforms Intel's best by quite a bit in the next decade.

    --
    Mod me up/Mod me down: I wont frown as I've no crown
  24. Re: No match for Haswell by camg188 · · Score: 1

    I think AMD's target for this architecture is a typical Walmart shopper (lower price point, higher sales volume) looking to buy a laptop, so add-on video cards are out of the question. The first 2 questions this type of shopper will ask are "how much?" and "which one is better?"

  25. One address is better than two by Ottibus · · Score: 1

    Why would a graphics card want to use virtual memory?

    Shared physical memory avoids the cost of copying data to and from the GPU but without shared virtual memory the data will end up at different addresses on the CPU and GPU. This means that you cannot use pointers to link parts of the data together and must rely on indexes of some sort. This makes it harder to port existing code and data structures to use GPU computation.

    Also, with shared physical memory you have to tell the device which memory you want to use (so that it can tell you which address to use). With shared virtual memory you can use any memory that is mapped into the CPU process and the memory system will automatically make it visible to the GPU.

    In other words, it makes the programmers' life easier. How you measure this benefit is another question altogether!
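
    The pointer-versus-index point can be sketched as follows. This is not GPU code: Python object references stand in for pointers, and the `Node`, `flatten` and `sum_by_*` names are hypothetical, made up for the illustration. The flatten step is the kind of conversion you need when the two sides do not agree on addresses.

```python
class Node:
    """A linked-list node that refers to its successor directly."""

    def __init__(self, value, nxt=None):
        self.value = value
        self.nxt = nxt


def sum_by_pointer(head):
    """Walk the list by following references, as shared virtual memory allows."""
    total = 0
    node = head
    while node is not None:
        total += node.value
        node = node.nxt
    return total


def flatten(head):
    """Convert the list into index-based arrays; -1 marks the end of the list."""
    values, next_index = [], []
    node = head
    while node is not None:
        values.append(node.value)
        next_index.append(len(values) if node.nxt is not None else -1)
        node = node.nxt
    return values, next_index


def sum_by_index(values, next_index):
    """Walk the flattened form using indexes instead of references."""
    total = 0
    i = 0 if values else -1
    while i != -1:
        total += values[i]
        i = next_index[i]
    return total


head = Node(1, Node(2, Node(3)))
values, next_index = flatten(head)
```

    Both walks give the same sum, but only the first one works on the data structure as the CPU code already has it; the second needs the extra flatten pass and a second representation to keep in sync.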

  26. GPU/GPGPU bottleneck by S3D · · Score: 1

    In my experience the GPU and especially GPGPU bottleneck is not the amount of memory but memory access bandwidth. A 256-512 bit bus is not adequate for existing apps. Before the amount of memory becomes important, manufacturers should move to at least a 2048 bit memory bus and also increase the number of registers per core several times over.
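
    For a sense of scale, peak bandwidth is just bus width in bytes times the transfer rate. A minimal sketch, assuming a made-up rate of 5 billion transfers per second on every bus width (the function name and the rate are assumptions, not figures for any real part):

```python
def peak_bandwidth_gb_s(bus_width_bits, transfers_per_sec):
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfers_per_sec / 1e9


# Hypothetical transfer rate, held constant so only the bus width varies.
RATE = 5e9

for width in (256, 512, 2048):
    print(width, "bit:", peak_bandwidth_gb_s(width, RATE), "GB/s")
```

    At a fixed transfer rate the peak scales linearly with width, so going from 256 to 2048 bits is an 8x increase on paper; sustained bandwidth in practice would be lower.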

  27. IOMMU by FithisUX · · Score: 1

    I haven't seen this magical word in the presentation. Moreover, I do not see the CPU/GPU convergence often talked about. It sounds more like marketing hype. The ecosystem could also be enriched with DSP or network processor cores, all uniformly offering their resources to software, but I did not see that either.

  28. Re: No match for Haswell by serviscope_minor · · Score: 1

    I think AMD's target for this architecture is a typical Walmart shopper

    Partly that and partly it's way more interesting. The unified memory trades performance for flexibility (as always), but puts it in a very interesting space. Less performant than a discrete GPU with a crazy memory architecture, but puts tons more FPU grunt under the flexible memory subsystem of easy to use CPUs.

    It will make acceleration more applicable to a much wider range of tasks at the cost of being slower on some.

    Due to the close coupling, on the right codes, this thing ought to absolutely hammer even the top end i7s. It should also be able to handily beat discrete GPUs on tasks where the cpu-gpu-cpu latency is just too high, or where GPUs just don't have enough memory.

    Of course on single threaded tasks even the i5 and i3 will probably beat it though AMD has been slowly closing the gap and this will improve the situation slightly.

    I can see this being personally useful to me. The thing is that discrete GPUs are a bit of a major faff for too many tasks and yield too little benefit.

    Honestly due to the enhanced opportunities for acceleration, it's probably a waste to use it for graphics. May as well offload that on to some dedicated hardware. And the cycle of reinvention begins again.

    --
    SJW n. One who posts facts.
  29. Re: No match for Haswell by Flodis · · Score: 1

    Not trying to pick a fight here, but I don't think this computes unless you change your mind about the importance of the CPU's computational power, or take some other - not yet mentioned - factor(*) into consideration.

    Eg: If the user intends to get a discrete GPU, as you say, s/he will have approx $150 more to spend on the GPU if s/he picks the AMD solution. A $250 GPU vs. a $100 GPU is a pretty significant difference. Thus if graphics matter, the user should pick the AMD solution.

    (*) of which there is possibly a boatload to consider. Socket longevity, thermal design power, ability to build a quiet system, ability to use ECC memory, etc. Not only price, but also many 'features' favor AMD since AMD tends to enable ECC, AMD-V and such in consumer CPUs, whereas you have to step up to Xeons to get that from Intel. However, some properties such as Computational power per Watt tend to favor Intel in a significant way. Where I think we agree, is that with Intel you can get pretty much everything you can get from AMD, provided you're willing to spend the money (Eg. step up to a Xeon CPU, add a discrete graphics card).

  30. Re:The GPU side still needs it's own memory channe by halltk1983 · · Score: 1

    If you prefer one hardware over the other without seeing benchmarks, then you are someone that is usually referred to as a "fanboy". Have fun with that.

    --
    Watch for Penguins, they eat Apples and throw rocks at Windows.
  31. OK so how do I (or someone) use this? by RalphTheWonderLlama · · Score: 1

    The technology is supported by mainstream programming languages like Python, C++, and Java, and should allow developers to more simply code for a particular compute resource with no need for special APIs.

    So how do you do this in Java, Python? Did nobody ask? I did a search for "java huma uniform memory access" and this page came up first with nothing from java.com or oracle in sight.

    Ok more searching says to use OpenCL and lots of stackoverflow questions... but they're not new... and OpenCL is not Java. What do you do for this new easier to program hardware? Is their definition of "supported" currently a bit optimistic? Supported by Java..... because Java lets you do lots of things not actually in Java and still work with a Java program, so pretty much anything is "supported" in Java. Is that the gist? I guess we need the tools to evolve before things really take hold.

    --
    simple, fast homepage with your links: http://www.ngumbi.com/