AMD Fusion System Architecture Detailed
Vigile writes "At the first AMD Fusion Developer Summit near Seattle this week, AMD revealed quite a bit of information about its next-generation GPU architecture and the eventual goals it has for the CPU/GPU combinations known as APUs. The company is finally moving away from a VLIW architecture and instead is integrating a vector+scalar design that allows for higher utilization of compute units and easier hardware scheduling. AMD laid out a 3-year plan to offer features like unified address space and fully coherent memory for the CPU and GPU that have the potential to dramatically alter current programming models. We will start seeing these features in GPUs released later in 2011."
Integrating CPU, GPU and unifying the memory address space will probably make things easier for programmers. So hopefully it'll help programmers utilize the hardware better.
I think only a small number of computer users upgrade components these days - gamers and power users. But the majority of people these days buy a beige box or a laptop and never ever open them. From a business point of view, combining the GPU and the CPU makes sense. Heck, nobody cried when separate math coprocessors disappeared.
"A door is what a dog is perpetually on the wrong side of" - Ogden Nash
Dead. Project.
Larrabee proved to have a few fundamental flaws, last I checked.
One concern of mine is simply performance with unified memory. The reason is that memory bandwidth is a big factor in 3D performance. The kind of math you have to do just needs a shitload of memory access. This is why GPUs have such insane memory configurations. They have massively wide controllers, special high performance ram (GDDR5 is based on DDR3, but higher performance) and so on. That's wonderful, but also expensive.
So it seems to me that you run into a situation where either you need much more expensive memory for the computer, possibly with additional constraints (at high speeds memory on a stick isn't feasible; electrical issues are such that you have to solder it to the board), or you get a system whose performance suffers because it is starved for memory bandwidth. Remember that the GPU would also have to share that memory with the CPU.
Perhaps they've found a way to overcome this, but I'm skeptical.
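To put rough numbers on the bandwidth gap, here is a back-of-the-envelope sketch. The two memory configurations are illustrative assumptions (a dual-channel DDR3-1333 desktop and a 256-bit GDDR5 card running at 4.8 GT/s), not figures from AMD's presentation, and the results are theoretical peaks rather than measured throughput.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, transfers_per_sec: float) -> float:
    """Theoretical peak bandwidth in GB/s: bytes per transfer times transfers per second."""
    return bus_width_bits / 8 * transfers_per_sec / 1e9


# Assumed system memory: two 64-bit DDR3 channels at 1333 MT/s.
ddr3 = peak_bandwidth_gb_s(128, 1333e6)

# Assumed discrete card: 256-bit GDDR5 at 4.8 GT/s effective.
gddr5 = peak_bandwidth_gb_s(256, 4.8e9)

print(f"DDR3 dual channel: {ddr3:.1f} GB/s")
print(f"GDDR5 256-bit:     {gddr5:.1f} GB/s")
print(f"ratio:             {gddr5 / ddr3:.1f}x")
```

Under those assumptions the discrete card has several times the peak bandwidth, and the integrated GPU would still be splitting its smaller share with the CPU.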
I also worry this could lead to fragmentation of the market. What I mean is right now we have a pretty nice unified situation from a developer perspective. AMD and Intel have all kinds of cross licensing agreements with regards to instruction sets. So the instructions for one are the instructions for the other. While there are special cases, like 3DNow that only AMD does, or AVX which Intel has and AMD has yet to implement, by and large you have no problems supporting both with a very similar, or dead identical, codebase.
Likewise GPUs are unified from an app perspective. You talk to them with DirectX or OpenGL. The details of how AMD or nVidia do things aren't so important; that is handled for you. You use one interface to talk to whatever card the user has. Not saying there can't be issues, but by and large it is the same deal.
Well this could change that. APUs might need a drastically different development structure. Ok fine, except AMD might be the only company that has them. Intel doesn't seem to be going down this road right now, and nVidia doesn't have a CPU division. So then as a developer you could have a problem where something that works well for traditional CPU/GPU doesn't work well, or maybe at all, for an APU.
That could lead to a choice of three situations, none of them that good:
1) You develop for traditional architectures. That's great for the majority of people, who are Intel owners (and people who own what is now current AMD stuff) but screws over this new, perhaps better, way of doing things.
2) You develop for the APU. That is nice for the people who have it but it screws over the mass market.
3) You develop two versions, one for each. Everyone is happy but your costs go way up from having more to maintain.
Of course even if everything goes APU it could be problematic if AMD and Intel have very different ways of doing things. Their cross licensing does not extend to this sort of thing, and I could see them deciding to try and fight it out.
So neat idea, but I'm not really sure it is a good one at this point.
Except Larrabee failed because performance didn't live up to expectations and was a generation behind the best from AMD and nVidia. What this development from AMD allows is much more efficient interaction and sharing of data between a traditional CPU and an on-die GPU through updates to the memory architecture. These memory changes will also allow the parts to take advantage of the very fastest DDR3 memory that current CPUs struggle to fully utilise.
The two most obvious scenarios for this technology are for accelerating traditional problems that take advantage of the existing vector units (SSE, etc.) by utilising the integrated GPU to massively accelerate these programs, and in gaming rigs where there is a discrete GPU the new architecture allows the integrated GPU to share some of the workload. The example given, and one that is increasingly relevant as all games now have physics engines, is for the discrete GPU to concentrate on pushing pixels to the screen and the integrated GPU to be used to accelerate the physics engine.
Is it a game changer? Probably not in the first couple of generations, although it would be a very welcome boost to AMD's platform that could get them back in the game as the preferred CPU maker. But long term Intel will have to come up with an answer to this in some form as programmers get ever more adept at exploiting the GPU for general purpose computing, and changes like those AMD are incorporating into their designs make these techniques ever more powerful and relevant to wider ranges of problems. Adding more x86 cores won't necessarily be the answer.
... and congratulated AMD for rediscovering SGI's O2 Unified Memory Architecture.
PS: The IBM PCjr (1984) and Commodore Amiga (1985) were actually the first to use UMA. Could this mean we will have "Chip RAM" and "Fast RAM" again? :)
It's not that difficult to write code that takes full advantage of modern hardware. The limitation is need. Every 18 months, we get a new generation of processors that can easily do everything that the previous generation could just about manage. Something like an IBM 1401 took a weekend to run all of the payroll calculations for a medium sized company in 1960, using heavily optimised FORTRAN (back when Fortran was written in all caps). Now, the same calculations written in interpreted VBA in a spreadsheet on a cheap laptop will run in under a second.
It would be naive to say that computers are fast enough - that's been said every year for the last 30 or so, and been wrong every time - but the number of problems for which efficient use of computational resources is no longer important grows constantly. Look at the number of applications written in languages like Python and Ruby and then run in primitive AST interpreters. A decent compiler could run them 10-100x faster, but there's no need because they're already running much faster than required. I work on compiler optimisations, and it's slightly disheartening when you realise that the difference that your latest improvements make is not a change from infeasible to feasible, it's a change from using 10% of the CPU to using 5%.
I am TheRaven on Soylent News
I would imagine that you'll likely still be able to upgrade by adding a discrete graphics card for quite some time.
There will still be that same ability to get separate components, but the GPU element is being moved from the chipset onto the CPU (now called an APU).
There really have been only three general configurations:
1: CPU with integrated graphics on the motherboard
2: CPU with integrated graphics on the motherboard PLUS a discrete video card/GPU.
3: CPU without integrated graphics on the motherboard with ONLY one or more video cards.
So, what this does is to update 1 and 2, since you can still add a discrete video card. Since the graphics portion of Fusion is better than what Intel offers, this isn't a bad setup. There will also be the option to swap the APU with a faster version that has both a faster CPU core as well as faster GPU core in most motherboards.
Yes, there are certain advantages offered by the APU design, but it isn't an "all or nothing" offering. AMD will continue to offer straight CPUs (with Bulldozer being the next core design). If you think about it, AMD may go to a tick-tock model like Intel has, but rather than alternating between core design and fab process technology, we may see AMD going CPU core design, GPU design, and then an APU that combines the latest CPU and GPU designs.
Right now, many are waiting for AMD to release its first all new core design since 2003, since that will hopefully get AMD the better CPU core performance that many have been waiting for.
Laptop sales passed desktop sales a couple of years ago. Anyone buying a desktop is now in the minority. With laptops, the constraints are different. Having the CPU and GPU in separate chips complicates the board design, which adds to the cost. With integrated CPU and GPU designs, you can have a simple board design and just pop a faster chip in the top of the line models.
Upgrading your GPU separately? My first PC had a slot for installing an FPU. You could get one from Intel, but you could get faster ones from AMD. Then Intel integrated their inferior FPU into the die with the 486. How many people now complain about not being able to replace their FPU with a faster third-party one?
No it doesn't. Like OpenCL, CUDA basically means you're sending instructions to the GPU by writing data to a mapped memory region. Sharing address space is not possible at that level. It's only possible to do at a CPU level.
Apple has "Mac vs PC", Microsoft has "Laptop Hunters", Linux has recession
Since this design seems to be about using the APU for non-graphics things as well, you could probably stick an nVidia card in the PCI-E slot for better video and continue to use the Fusion APU for OpenCL (or whatever) at the same time.
Also, nothing about AMD's new design precludes discrete GPUs more or less similar to today's models, it is just an effort to make the (economically inevitable) integrated GPU more useful by virtue of its close integration with the system, rather than simply cheaper as integrated GPUs are today.
Expansion will be slightly trickier than today's Crossfire/SLI, because certain GPU elements (while comparatively few) will enjoy much faster access to the CPU and main memory, while the expansion GPU(s) will presumably have many more elements and their own pool of RAM, but be a PCIe bus away from the CPU. I'm sure that the beta drivers and the edge cases will be pretty dire, but it will eventually be worked out.
One reason why laptop sales passed desktop sales is of course that desktops last longer, due to their upgradeability.
While I agree with you regarding application programming, need, etc. I must clarify that I was talking about graphics/game applications that require the full hardware potential.
If you compare this new architecture with an arguably overcomplicated architecture like the PlayStation 3, I'd argue that writing software that utilizes the hardware to its full potential is indeed hard. And in this context, making a more elegant, integrated GPU/CPU will make the lives of us poor indie game programmers a bit easier.
The current trend seems to be towards more power efficient hardware and virtualization (and dynamic scaling etc), rather than ever faster hardware...
So while your interpreted spreadsheet may be able to compute payroll calculations in a second, your hardware will consume more power doing it that way than using an optimized implementation... Also, with suboptimal code you won't be able to run as many instances on a single piece of hardware, and will thus require more hardware.
http://spamdecoy.net - free throwaway anonymous email - avoid spam!
Not really. Now the CPU spends 95% of its time waiting for data from the network or disk instead of 90%, but the CPU is rarely the bottleneck these days.
Around the time of the Pentium II, Intel did some simulations where they increased the (simulated) speed of the CPU running typical applications and measured performance. They found that, if the speed of other components didn't change, an infinitely fast CPU (i.e. all CPU operations took 0 simulation time) ran about twice as fast as the ones that they were shipping. It doesn't take much of an improvement in CPU speed before the CPU just isn't the bottleneck anymore, even in processor intensive tasks. RAM and disk bandwidth and latency quickly take over. This was one of the problems Apple had with the PowerPC G4 - the RAM wasn't fast enough to supply it with data as fast as it could process it, so it rarely came close to its theoretical maximum speed.
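The "infinitely fast CPU is only twice as fast" result is what Amdahl's law predicts when about half of the run time is spent waiting on something other than the CPU. A minimal sketch of that arithmetic follows; the 2x limit is the figure quoted above, while the function names and the choice of a 2x faster CPU as the comparison point are only illustrative assumptions.

```python
def overall_speedup(cpu_fraction: float, cpu_speedup: float) -> float:
    """Amdahl's law: only the CPU-bound fraction of the run time gets faster."""
    return 1.0 / ((1.0 - cpu_fraction) + cpu_fraction / cpu_speedup)


def cpu_fraction_from_limit(max_speedup: float) -> float:
    """CPU-bound fraction implied by the speedup seen with an infinitely fast CPU."""
    return 1.0 - 1.0 / max_speedup


# An infinitely fast CPU gave about 2x, so about half the time was CPU-bound.
fraction = cpu_fraction_from_limit(2.0)

# Hypothetical comparison: doubling CPU speed on that same workload.
doubled = overall_speedup(fraction, 2.0)

# Sanity check: an infinitely fast CPU recovers the 2x limit.
limit = overall_speedup(fraction, float("inf"))

print(f"CPU-bound fraction: {fraction:.2f}")
print(f"2x faster CPU gives {doubled:.2f}x overall")
print(f"infinitely fast CPU gives {limit:.2f}x overall")
```

So under that model, doubling the CPU speed buys only about a third more overall performance, and every further doubling buys less.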
But since they couldn't do it, the original plan doesn't mean much, now does it?
Will it run Linux?
I'm not being facetious, I got stung by the lack of support by Nvidia for their Optimus graphics cards on my ASUS U30JC.
Thankfully Martin Juhl has been working on a solution using VirtualGL, which gives us the use of our Nvidia cards under Linux.
I think what is going to be really interesting is to see what this does to PC gaming from the perspective of non-Windows operating systems.
APUs are clearly a step forward in the direction of putting powerful graphics processing on portable devices, an area where Microsoft and Windows have very little market share at the moment.
Therefore, this surely must bring DirectX's domination in the PC gaming market into question - will this therefore result in more commercial games being developed around OpenGL, thus making cross-platform games much easier to develop?
Gentoo Linux - another day, another USE flag.
A "math coprocessor" is just the FPU (Floating Point Unit) of a particular era of microcomputers. The FPU implements machine instructions for floating point math. Before the microcomputer, when machines filled cabinets, you might have an FPU (on one or more circuit boards), you might not. Same with the early micros. Eventually they built the FPU into the same die as the CPU, so no need for a separate chip. The FPU is always tightly coupled to the CPU because it shares the same control unit as the CPU. (A CPU consists of a control unit plus an arithmetic/logic unit.) You can't change the design of one without changing the other.
A GPU is different from an FPU. It doesn't process CPU instructions -- it has its own control unit. GPUs operate independently of the CPU.
Building a GPU into the same die or IC package as the CPU won't prevent you from installing a discrete graphics card. No need to get all upset about it.
Although the tech may eventually get to the point where you won't bother with a discrete graphics card. I suspect we'll eventually see a large package containing CPU, GPU and memory, for performance reasons. One will upgrade them all together.
Before you panic about that: In the early days of minicomputers, CPUs were implemented as many boards containing lots of discrete logic and small scale integration. It was possible to do things like change how the adder was implemented, how memory was accessed, or add whole new machine instructions. You could "upgrade" at that level. That capability was lost with the move to (very) large scale integration. However, things are so much cheaper and faster with (V)LSI that it's worth it.
So if $100 will bring you a new CPU, GPU, and RAM, running 10x faster than what you had before, then yah, I can see it happening, and being a win.
dragonhawk@iname.microsoft.com
I do not like Microsoft. Remove them from my email address.
Frankly I really don't see how much better GPUs can get picture wise myself. Hell my HD4850 which my GF got me for my BDay cranks the living hell out of the purty, so much I have to be careful not to be distracted by the purty and get my ass blown off! And maybe it is different with CUDA but the only thing I've seen come out for Streams is a video transcoder that frankly doesn't give you as good a result as a plain Jane CPU only transcode, and the time savings isn't worth the picture hit.
So while I'm sure this will make programmers happy I really don't see how it will make much of a difference to Joe user. Hell even the sub $150 GPUs that are the biggest market have so much purty being thrown on the screen it is truly insane, I never thought I'd see the day that human faces and movements would get THAT realistic!
And finally there is that bloated stinking dead elephant in the room no one mentions, I'm of course talking about the craptastic consoles that everyone is writing the games for. While I like the fact that the vast majority of games will run native resolution with lots of bling even on my 3 generations old HD4850, I'll be the first to admit PCs aren't the main target market anymore. Hell the new Nintendo is gonna have the HD4xxx series, which like mine is already three generations behind and it ain't come out yet!
So I honestly don't see how all this extra goodness is gonna make much of a diff. The developers write to the consoles first, the consoles don't have these features, therefore nobody writes to them. Hell look at how few DX10 and DX11 games are out, simply because the consoles are DX9. If the other consoles follow Nintendo then we'll be seeing DX10 in late 2012, so maybe this cutting edge stuff will get used by the majority of games around 2022, when you can pick up these chips at a yard sale for $5. Depressing, but that is life.
ACs don't waste your time replying, your posts are never seen by me.
Actually, more a latency problem than a bandwidth problem. It's that the pyramid of L1-2-3 cache and system ram is quite a few cycles from the CPU. You can see with multi-core systems that they scale quite nicely as long as you are running well multi-threaded code.
The other question is of course how much of the bottleneck is between chair and keyboard. Very often they'll complain if the computer takes 5 seconds to do some heavy processing while they happily goof off for 5 minutes. And it's not like computers practically lock up under load anymore, you can always do other things while it's working.
Live today, because you never know what tomorrow brings