AMD's Fusion CPU + GPU Will Ship This Year
mr_sifter writes "Intel might have beaten AMD to the punch with a CPU featuring a built-in GPU, but it relied on a relatively crude process of simply packaging two separate dies together. AMD's long-discussed Fusion product integrates the two key components into one die, and the company is confident it will be out this year — earlier than had been expected."
Sup dawg. We herd you like processing units, so we put a processing unit in yo' processing unit so you can computer while you compute!
It doesn't really matter, any more than AMD's "proper" quad core mattered more than Intel pasting two dual-core dies together. This is really just AMD getting beaten to the punch again, and having to try to spin it in some positive way. It's great news that it will be out earlier than expected, but I think they would have been better off taking the less "beautiful" route and just throwing discrete dies into a single package, particularly as it has yet to be seen how big the market for this sort of thing is. More exciting to me is that AMD is ahead of schedule with this, so hopefully they'll be similarly ahead with their next architecture. I'm yearning for the day when AMD is back to being competitive on a clock-for-clock basis with Intel.
Calling Intel's offerings crude sounds like it is quoting from AMD's press release. It may be crude, but it works and was quick and cheap to implement. But does it have any disadvantages? Certainly the quote from the article doesn't seem terribly confident that the integrated offering is going to be any better:
We hope so. We've just got the silicon in and we're going through the paces right now - the engineers are taking a look at it. But it should have power and performance advantages.
Dissing a product for some technical reason that may not have any real performance penalties? That's FUD!
Even faster than current generation discrete GPUs? I think not.
They'll move data inside the chip instead of having to send it off to the internal bus, they'll have access to L2 cache (and maybe even L1 cache), they'll be running in lock-step with the CPU, etc, etc. These have distinct advantages over video cards.
I hope so, Intel is far too dominant right now.
AMD Fusion was meant to compete with Larrabee, which has not been released. The Intel package with two separate dies is not interesting. The point of these products is to give the programmer access to the vast FP power of a graphics chip, so they can do, for instance, a large-scale FFT and IFFT faster than a normal CPU. If this proves more powerful than Nvidia's latest Fermi (GTX 480, I believe), then expect a lot of shops to switch. Right now my workplace has an Nvidia Fermi on backorder, so it looks like this is a big market.
That'll certainly increase bandwidth which will help outperform current integrated graphics and really low end discrete chips, but I severely doubt it will be enough to compensate for the raw number of transistors in the mid to high end discrete chips. An ATI 5670 graphics chip has just about as many transistors as a quad core Intel Core i7.
In addition to the CPGPU or whatever they're calling it, Fusion should finally catch up to (and exceed) Intel in terms of niftilicious vector instructions. For example, it should have crypto and binary-polynomial acceleration, bit-fiddling (XOP), FMA and AVX instructions. As an implementor, I'm looking forward to having new toys to play with.
This is great for mobile devices and laptops but I don't think I want my CPU and GPU combined in my gaming rig. I generally upgrade my video card twice as often as my CPU. If this becomes the norm then eventually I'll either get bottlenecked or have to waste money on something I don't really need. Being forced to buy two things when I only need one is not my idea of a good thing.
Call me when they can fit 9 inches of graphics card into one of these CPUs.
Size isn't everything!
No.
CUDA is Nvidia.
ATI has Stream.
Worth noting is that Apple has invested rather heavily in technology to allow programmer use of the GPU in Mac OS X, and was recently rumored to have met with high-ranking people from AMD. It seems only logical that this type of chip could find its way into some of the Apple gear.
The question, of course, is whether it would be power-efficient enough for laptops, where space is an issue...
Actually Intel had a radical way to handle this: Larrabee. It was going to be 48 in-order processors on a die with the Larrabee New Instructions. There was a Siggraph paper with very impressive scalability figures for a bunch of games running DirectX in software. They captured the DirectX calls from a machine with a conventional CPU and GPU and injected them into a Larrabee simulator.
This was going to be a very interesting machine. You'd have a machine with good but not great gaming performance and killer server performance, since servers are naturally "embarrassingly parallel" because you can have one thread per client. A sort of x86 take on Sun's Niagara.
Of course there are problems with this sort of approach. Most current games are not very well threaded - they have a small number of threads that will run poorly on an in-order CPU. So if the only chip you had was a Larrabee and it was both a CPU and a GPU, the GPU part would be well balanced across multiple cores. The CPU part would likely not. You have to wonder about memory bandwidth too.
Larrabee was switched to be a GPU only and then canned.
Of course as a pure GPU it is a bit of a poor design. Real GPUs don't drag in x86 compatibility - they can implement whatever instruction set is best and nothing else. The instruction set is not publicly exposed and can change from generation to generation. You can cram a lot more than 48 cores onto a GPU and the peak performance is higher. Power consumption is lower too.
Still, a modern gaming GPU is huge - there's no way you're going to cram it and a modern CPU onto a die and get something affordable. Then again, CPU+GPU chips are probably not aimed at gamers - there's an argument for having a CPU and a stripped-down integrated GPU on one chip for netbooks, like the latest Atoms do.
You could cram in a chipset too to reduce the price on netbooks.
echo -e 'global _start\n _start:\n mov eax, 2\n int 80h\n jmp _start' > a.asm; nasm a.asm -f elf; ld a.o -o a;
I look forward to seeing what AMD's new architecture brings. The interesting way to think about it is not as integrating a GPU into the same space as a CPU, but as creating one chip that can do more exotic types of calculations than either chip could alone, and making it available in every system. I'm also envisioning "GPU" instructions being executed on hardware that would normally run CPU instructions when it is idle, and vice versa, basically so everything available could be put to use.
Arguably, the "off-chip FPU" nowadays IS a GPU - hence all the GPGPU stuff.
That's what the guys tell themselves.
Balderdash!
There are two sides to this coin, and Intel's is pretty neat. By not having the GPU integrated into the CPU die, Intel can improve the CPU or the GPU without having to redesign the entire chip. For example, any power management improvements can be moved into the design as soon as they're ready. Another advantage for them is that the CPU and GPU dies are actually independent, so each can be manufactured using whatever process makes the most sense for it.
AMD's design offers a major boost to overall CPU performance simply because the integration is far deeper than Intel's. From what I've read, Fusion ties the Stream Processors (FPU) directly to a CPU and should offer a major boost in all math ops of the CPU, and I expect that it will finally compete with Intel's latest CPUs in regard to FPU operations.
And if Moore's law continues to hold, within the next four years it won't be an issue to put both of those chips on the same die. Hell, that may even be the budget option.
Of course there are problems with this sort of approach. Most current games are not very well threaded - they have a small number of threads that will run poorly on an in-order CPU. So if the only chip you had was a Larrabee and it was both a CPU and a GPU, the GPU part would be well balanced across multiple cores. The CPU part would likely not. You have to wonder about memory bandwidth too.
I believe that it was in fact memory bandwidth which killed Larrabee. A GPU's memory controller is nothing like a CPU's memory controller, so trying to make a many-core CPU behave like a GPU while still also behaving like a CPU just doesn't work very well.
Modern well-performing GPUs require the memory controller to be specifically tailored to filling large cache blocks. Latency isn't that big of an issue. The GPU is likely to need the entire cache line, so latency is sacrificed for more bandwidth. The latency is amortized over many, many operations.
CPUs, on the other hand, require the memory controller to be tailored to filling small cache blocks. Latency is a big issue. The CPU may only want or need 4 bytes from that cache line, so latency can't be sacrificed for bandwidth. The latency may not be amortized over many operations.
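To put a rough number on the cache-line point, here is a toy model in Python. It only looks at how much of each fetched line gets used, and it ignores latency entirely. The 32 GB/s raw bandwidth and the 64-byte line size are assumptions picked for illustration, not figures for any real chip, and `effective_bandwidth` is a made-up helper name.

```python
# Toy model: useful bandwidth when only part of each fetched cache line is used.
# RAW_GB_S and LINE_BYTES are illustrative assumptions, not measurements.

def effective_bandwidth(raw_gb_s: float, line_bytes: int, used_bytes: int) -> float:
    """Return the bandwidth that carries useful data, in GB/s."""
    if not 0 < used_bytes <= line_bytes:
        raise ValueError("used_bytes must be between 1 and line_bytes")
    return raw_gb_s * used_bytes / line_bytes

RAW_GB_S = 32.0   # assumed raw memory bandwidth
LINE_BYTES = 64   # assumed cache line size

# GPU-like access: the whole line is consumed.
gpu_like = effective_bandwidth(RAW_GB_S, LINE_BYTES, LINE_BYTES)
# CPU-like access: only 4 bytes of the line are wanted.
cpu_like = effective_bandwidth(RAW_GB_S, LINE_BYTES, 4)

print(f"GPU-like: {gpu_like} GB/s useful, CPU-like: {cpu_like} GB/s useful")
```

Under those assumptions the CPU-like pattern gets one sixteenth of the useful bandwidth, which is why a controller tuned for big fills is a poor fit for pointer-chasing code.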
"His name was James Damore."
It seems like the caching issues could be fixed with prefetch instructions that can fetch bigger chunks, which it apparently has.
Still just fetching instructions for 48 cores is a huge amount of bandwidth.
http://perilsofparallel.blogspot.com/2010/01/problem-with-larrabee.html
Let's say there are 100 processors (high end of numbers I've heard). 4 threads / processor. 2 GHz (he said the clock was measured in GHz).
That's 100 cores x 4 threads x 2 GHz x 2 bytes = 1600 GB/s.
Let's put that number in perspective:
* It's moving more than the entire contents of a 1.5 TB disk drive every second.
* It's more than 100 times the bandwidth of Intel's shiny new QuickPath system interconnect (12.8 GB/s per direction).
* It would soak up the output of 33 banks of DDR3-SDRAM, all three channels, 192 bits per channel, 48 GB/s aggregate per bank.
In other words, it's impossible.
So 48 cores need 16 banks of DDR3-SDRAM.
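For anyone who wants to check the arithmetic, here is the same estimate as a small Python sketch. The formula and the 48 GB/s per bank figure come from the quoted post; the function name is made up, and the 2 bytes per thread per cycle is the post's assumption, not a measured number.

```python
# Reproduces the back-of-the-envelope estimate quoted above.
# bytes_per_cycle = 2 is the quoted post's assumption.

def required_bandwidth_gb_s(cores: int, threads_per_core: int,
                            clock_ghz: float, bytes_per_cycle: int) -> float:
    """Bandwidth needed if every thread pulls bytes_per_cycle each clock."""
    return cores * threads_per_core * clock_ghz * bytes_per_cycle

DDR3_BANK_GB_S = 48.0  # aggregate per bank, as quoted in the post

hundred_cores = required_bandwidth_gb_s(100, 4, 2.0, 2)
forty_eight_cores = required_bandwidth_gb_s(48, 4, 2.0, 2)

for label, need in (("100 cores", hundred_cores), ("48 cores", forty_eight_cores)):
    banks = need / DDR3_BANK_GB_S
    print(f"{label}: {need:.0f} GB/s, about {banks:.1f} DDR3 banks")
```

The estimate scales linearly with the core count, so halving the cores only halves the problem.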
If AMD puts a competitive GPU onto the CPU die (comparable to their current high-end graphics boards), then this is a really big deal. Perhaps the biggest issue with GPGPU programming is the fact that the graphics unit is at the end of a fairly narrow pipe with limited memory, and getting data to the board and back is a performance bottleneck and a pain in the butt for a programmer.
Putting the GPU on the die could mean massive bandwidth from the CPU to the hundreds of streaming processors on the GPU. It also strongly implies that the GPU will have access directly to the same memory as the CPU. Finally, it would mean that if you have a Fusion-based renderfarm then you have GPUs on the renderfarm.
This is exciting!
I've been watching this for a while, and as far as I can tell, discrete graphics cards will still be significantly faster for most things. The reason is memory bandwidth. Sure, cache is faster for smaller datasets. Unfortunately, even if you assume 10MB of cache, your average screen will take about half of that (call it 5MB for a 32-bit 1440x900 image), and that's not counting the CPU's cache usage if it's shared. So you can't cache many textures, geometry or similar, after which it drops off to the figures below:
DDR3-1066: 8533 MB/s per channel (x2 or x3), up to ~25 GB/s (~8600 GT)
DDR3-1333: 10667 MB/s per channel, up to 32 GB/s (~8600 GT)
Both are well below the 103 GB/s of an 8800 Ultra.
Compare that with a few current-generation cards:
GeForce 220: 25 GB/s
GeForce 260: 111 GB/s
GeForce 280: 141 GB/s
GeForce 480: 177 GB/s
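The numbers above can be sanity-checked with a short Python sketch. It assumes each DDR3 channel is 64 bits (8 bytes) wide, and it uses the discrete-card figures exactly as quoted in this post without verifying them. The function names are made up for the example.

```python
# Rough check of the framebuffer size and DDR3 peak bandwidth figures.
# Assumption: one DDR3 channel is 64 bits (8 bytes) wide.

def ddr3_peak_mb_s(mega_transfers_per_s: float, channels: int = 1,
                   bus_bytes: int = 8) -> float:
    """Theoretical peak bandwidth in MB/s."""
    return mega_transfers_per_s * bus_bytes * channels

def framebuffer_bytes(width: int, height: int, bits_per_pixel: int = 32) -> int:
    """Size of one uncompressed frame in bytes."""
    return width * height * bits_per_pixel // 8

# DDR3-1066 and DDR3-1333 are rounded names for 1066.67 and 1333.33 MT/s.
ddr3_1066_x1 = ddr3_peak_mb_s(3200 / 3)
ddr3_1333_x3 = ddr3_peak_mb_s(4000 / 3, channels=3)
frame = framebuffer_bytes(1440, 900)

# Discrete-card bandwidth in GB/s, as quoted in the post (not verified).
discrete_gb_s = {"GeForce 220": 25, "GeForce 260": 111,
                 "GeForce 280": 141, "GeForce 480": 177}

print(f"1440x900 at 32 bits: {frame / 1e6:.2f} MB per frame")
print(f"DDR3-1066, one channel: {ddr3_1066_x1:.0f} MB/s")
print(f"DDR3-1333, three channels: {ddr3_1333_x3 / 1000:.0f} GB/s")
for name, gb_s in discrete_gb_s.items():
    ratio = gb_s / (ddr3_1333_x3 / 1000)
    print(f"{name}: {ratio:.1f}x triple-channel DDR3-1333")
```

These are theoretical peaks, so real sustained bandwidth on either side would be lower.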
There will be some advantages to having it on die, but for anything requiring lots of memory bandwidth, a discrete card is likely to absolutely trounce Fusion, especially when you consider that the memory bandwidth of the DDR3 quoted above is shared with the processor. (I was thinking of all current CPUs when I wrote those figures; AMD's are only dual channel, or x2, not x3 as above. That may change, and probably should if they introduce a new socket, which they probably need to do simply to support the graphics outputs.) That's a lot of the reason integrated graphics using main memory have always been behind anything with its own memory. Even the really cheap Nvidia cards that were advertised as 64MB (of system memory) had at least 16MB of their own. (I don't remember which they were, but they came out about the time PCI Express did.) That was for two reasons: latency (PCI Express has a lot of bandwidth, but local memory is faster) and the framebuffer.
Fusion strikes me as AMD repeating Nvidia's experiment, probably with the result of beating the heck out of current integrated chips, but being at best comparable to 'midrange' (x6____) graphics cards. If it has that performance and good drivers, it will be a resounding success for them. It won't cannibalize the highly profitable high end, but it will make good gaming even cheaper on AMD. Every AMD Fusion-based computer would be capable of good-enough gaming or 3D work. Bonus for them if, when it is not in use for graphics because there is a separate dedicated card, the GPU part also speeds up the CPU. (I think that's another intention, though not the primary focus, but like everyone I'll have to wait and see.)