AMD Previews New Processor Extensions
An anonymous reader writes "It has been all over the news today: AMD announced the first of its Extensions for Software Parallelism, a series of x86 extensions to make parallel programming easier. The first are the so-called 'lightweight profiling extensions.' They would give software access to information about cache misses and retired instructions so data structures can be optimized for better performance. The specification is here (PDF). These extensions have a much wider applicability than just parallel programming — they could be used to accelerate Java, .Net, and dynamic optimizers." AMD gave no timeframe for when these proposed extensions would show up in silicon.
Yup. They tried it with Itanium, and it didn't work.
The thing is, at this stage in processor design, the actual instruction set isn't all that important.
But *compilers* are more important than ever, and writing a good compiler is hard work. x86 compilers have been tweaked and improved for nearly 30 years. A new instruction set could NEVER achieve that kind of optimization.
Interestingly,the Itanium and the EPIC architecture were designed to move all the hard work of "parallel processing" to the compiler. Unfortunately, they could never get the compiler to work all that well on most kinds of code. The compiler could never really "extract" the parallelism that Itanium CPUs needed to work at full speed.
Which is *exactly* the problem we have now with our multi-core CPUs. Compilers don't know how to extract parallelism very well. It's an *incredibly* difficult problem that Intel has already thrown untold billions of dollars at. Essentially, even though Itanium/EPIC never caught on, we're having to deal with all the same problems it had, anyway.
Well we had the 68000 family which had much better instruction set then the X86.
We have the Power and PowerPC which had a much better instruction set than the X86.
We have the ARM which is a much better instruction set then the X86.
We have the MIPS which is pretty nice.
And we had the Alpha and still do for a little while longer.
The problem with all of them is that they didn't run X86 code. Intel and AMD both made so much money from selling billions of CPUs that they could plow a lot of money into making the X86 the fastest pig with lipstick that the world has ever seen.
What made the IA-64 such a disaster was that it was slow running X86 code.
See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
EM64T?
What happened is that the P4 architecture was more of a marketing scheme to push MHz, but not performance. AMD came out with an architecture directed at high performance. Intel came out with the Core 2 products which also focused on peroformance instead of clock speed. Intel has a lead in the manufacturing process side with respect to node size. This helps them to produce a lot at a lower cost. And If you look at Intel's and AMD's financials, you'll see how much each has to spend on R&D. Intel has a lot more money to put down on more designs and more engineers than AMD does.
Not again.
Why is this nonsense still perpetuated? The instruction set is irrelevant - it's just an interface to tell the processor what to do. Internally, Barcelona is a very nice RISC core capable of doing so many things at once its insane. The only thing that performs better is a GPU, and that's only because they're thrown at embarassingly parallel problems. The fastest general purpose CPUs come from Intel and AMD, and it has nothing to do with instruction set.
AMD64, and the new Core2 and Barcelona chips are very nice chips. 16 64-bit registers, 16 128-bit registers, complete IEEE-754 floating point support, integer and floating-point SIMD instructions, out-of-order execution, streaming stores and hardware prefetch. Add to that multiple cores with very fast busses, massive caches - with multichip cache coherency - and the ability to run any code compiled in the last 25 years. What's not to like?
Okay, I'll feed the troll. Tell me where I can buy an ATX (or smaller) PPC motherboard and CPU new for, oh, say $200, and I'll look at PPC again. The reason that x86 gets all the software is because it's the cheapest, it's the cheapest because all the motherboard manufacturers make motherboards for it, and all the motherboard manufacturers make motherboards for it because it gets all the software.
Looking at the PDF, it supposedly gathers profile data in the background (in local caches on the chip itself) and dumps periodically depending on the OS/application settings. This allows it to profile on-the-fly with very little impact on application performance.
The application can then gather the information, which is stored in its address space, and do with it what it will (optimize on-the-fly).
Of particular interest is that the OS can allow the profile information to be dumped to the address space of other threads/processes as well as the one that the data is collected on. The OS controls the switching of the cached profile information during a context switch.
This is both cool (in that a secondary core/thread can help optimize the first) and scary (one thread getting access to another's instruction address information). I predict there will be exactly 42 Windows patches released 3.734 days after the service pack that allows Windows to take advantage of this feature because of security reasons.
Performance counters could be used by JITs to generate more optimized code. I wonder which programming languages use JITs...
Why is this nonsense still perpetuated? The instruction set is irrelevant - it's just an interface to tell the processor what to do.
Sure, now it is, since the decoding of CISC instructions into micro-ops has largely decoupled ISA from the microarchitecture, allowing many of those neat-o performance features you meantion like out-of-order execution. However in the past this wasn't the case and a lot of x86's odd behaviors that seemed like good ideas when they were made were serious performance limiters. Like a global eflags register that is only partially written by various instructions (and they always write even if the result isn't needed).
Even today, I would say that all those RISC ISAs are better than x86, simply from the standpoint that they are cleaner, easier to decode, have fewer tricky modes to deal with, fewer odd dependencies, and all the other things that make building an actual x86 chip a pain in the arse. No, in the end it makes no difference in performance. Yet, if you had it to do all over again, building the One ISA to Rule Them All without concern for software compatability, and you decided to make something that was more like x86 than Alpha, I'd slap the taste out of your mouth.
But we do have to be concerned with software compatability, and that I think was the GP's main point. All of those other ISAs failed to dominate -- even when there were actual performance implications! -- simply because they were not x86 and hence didn't run the majority of software. IA64 failed not because it was itself all that bad, but because it couldn't run x86 software well. So when AMD came out with 64-bit backward-compatible x86, everyone stopped caring about IA64. Because it wasn't x86, and AMD64 was.
So ultimately I agree with you both, and I don't think the GP was nonsense at all. It's a very valid point -- backward compatability is king, so x86 wins by default no matter what. Your point -- that x86 isn't actually hurting us anymore -- is just the silver lining on that cloud.
The enemies of Democracy are
So what we need really is a "native" x86 compiler, say, from Intel, that would maybe outperform the multi-platform GCC compiler... an Intel C/C++ Compiler, or 'ICC' we could call it... maybe...
Oh who am I kidding, that could never happen.
The revolution will not be televised... but it will have a page on Wikipedia
Funny. I've seen a $59 Brisbane core (1.9 out of the box) overclocked to 2.9 GHz with just air cooling, so I'm not sure why everyone insists AMD can't hit the 3GHz barrier, especially when AMD keeps displaying 3GHz Barecelonas.
There are three reasons to buy AMD right now.
1. Price, price and price. AMD knows Intel has the better fab, but AMD is selling super cheap. You can get a dual-core processor for half what Intel charges, and for the average user, it is more than enough. I'm running Oblivion at 30 FPS with a $59 processor, and I've barely overclocked it. The cheapest Intel dual-core proc was $120 when I bought my $59 proc. Most people have no idea that their proc these days often underclocks itself, and you rarely touch the full potential of your proc. Intel is faster, and no one doubts that today, but if you never see the speed benefit, why spend the extra dollars? On a performance per dollar basis, AMD wins hands down.
2. There is a mountain of evidence against Intel for anti-trust violations, and I try not to financially support evil. The EU is also coming down on Intel for anti-trust violations.
3. Even if the anti-trust suits both come through, AMD is near bankruptcy, and I prefer choice in the marketplace. I am terrified of the day when Intel has no competition pushing them and they can just sell what they want and whatever price they want.
http://blindscribblings.com - Tasty pop-culture in conceptual fashion.