AMD Previews New Processor Extensions
An anonymous reader writes "It has been all over the news today: AMD announced the first of its Extensions for Software Parallelism, a series of x86 extensions to make parallel programming easier. The first are the so-called 'lightweight profiling extensions.' They would give software access to information about cache misses and retired instructions so data structures can be optimized for better performance. The specification is here (PDF). These extensions have a much wider applicability than just parallel programming — they could be used to accelerate Java, .Net, and dynamic optimizers." AMD gave no timeframe for when these proposed extensions would show up in silicon.
Anybody?
I wonder - amongst 16-bit "real mode", 16-bit "protected mode", 32-bit mode, 64-bit mode - how many different instruction kinds / opcodes a modern x86 CPU supports?
Looks like there isn't a whole lot there that you couldn't get using existing performance counters and a tool like oprofile....
-- Erich
Slashdot reader since 1997
and did away with the aging x86 instruction set and came up with something new.
Yeah, I know, Intel tried with Itanium.
These extensions could be useful, but speaking as someone from the target audience... I just don't care right now. No minor improvement (such as might be gained through these) is as important to me as seeing a viable alternative to Intel. Not because I'm an AMD fanboy, but because competition brings prices down and accelerates the release of faster chips. From what I hear now, we'll finally see Barcelona chips out on September 10th at -maybe- up to 2.3 GHz if you're one of the cherished few, but most retail ones will be 1.9 GHz. I haven't seen the (valid) numbers, so I can't say for sure, but I'm worried about how competitive this will be.
/Grumble
I realize that the software people and hardware people both have their projects to work on, and they work largely independently in terms of a time-frame, but I figure this news might be timed to say, "Hey! Look at us! We're doing stuff!" It only serves to frustrate me that there still aren't any real numbers on Barcelona and that, on the whole, AMD seems to have dropped the ball.
They can't get the chips to clock up nicely as a whole; an individual chip or a few dozen individuals can, but most of them are binning in the sub-2GHz category, and that's simply atrocious; no matter how much "better" they are than Intel's quad cores, Intel's are already pushing 3GHz (and benchmarking roughly 50% better, meaning both architectures are performing pretty similarly and roughly the same clock-for-clock).
The first Barcelona chips we're getting are going to pathetically under-perform compared to the competition.
Has there in the past been an example of AMD adding new instructions and then Intel following along and adopting them? I know it works in the converse, but somehow I doubt Intel wants AMD taking the lead in extending its own ISA.
Part of the hardcore faithful who believed in Apple long before it was cool again to do so
The game.
I never quite understood why chip manufacturers kept adding cores long after memory bandwidth had become a problem. Why not add specialized execution units and make the instruction set a bit fatter? It's not like arithmetic and logic operations are all you can do with an int or a few ints. Same for floats (but even more operations).
It's a good start, but it isn't much. Parallel programming will still be a bitch.
2008: x86_64 retired because of bad performance; there are too many prefix bytes in the instructions of the CISC ISA x86_64.
x86-64 IS DEAD!!!
Let's go ppc64!!!
Let's go IBM!!! Let's go AMD-IBM!!!
I for one
think this
is good
news.
Please sign petition to restore sanity to our banking system!!!
http://financialpetition.org/
I see all fuss about programming. easy. don't what the is parallel It's
"It has been all over the news today:". Really? The only AMD news I've been seeing all day has been "Barcelona not shipping on schedule, and parts won't be as fast as promised". Ooops. Well, those Core 2s are still cheap. And faster.
Also, I know from asm on SPARC that many op codes are really just variations of other ops (and/or pseudo ops). For instance, (I'm not sure of the x86 equivalent)
If I mod you up, it doesn't necessarily mean I agree with what you've said, sorry.
That must have been speculation or a SWAG from the poster to suggest it could be used to accelerate Java and/or .NET. There is nothing special about Java or .NET that would allow this optimization. Both run on top of the OS and not on top of the hardware. So if the OS provided similar information about its routines, then that could be used. As it stands, the only thing that would accelerate Java or .NET (both of which are C/C++ programs) is something that would accelerate any C/C++ program running on top of an OS.
You can poo-poo Java all you want, but the reality is that it's made programming a lot easier for the "rest of us", especially in a world where cross platform compatibility is key.
Yet another waste of silicon to 'accelerate' badly written software.
AND well-written software. What, you think you could write code that's just as fast without all the "hardware acceleration" being done for you, without using any instruction set extensions that have been added over the years? You are on crack.
Instead of devoting transistors to speed up the latest toy programming languages ('managed' code), why can't we just train programmers better?
And better profiling tools are contrary to this goal how exactly? And at what point do you tell your better-trained programmers that using those hardware acceleration features will make their code go faster?
Ahh..of course, because of Java..don't bother learning HOW to optimize, let Java do it FOR you...
Or let your C compiler do it for you. Whichever. There's a matter of degree, to be sure, but even still you're most likely wasting your time "optimizing" individual lines of C code since the compiler can probably do a better job and that's been the case for quite a while. The thing that will get you the most bang for buck is the same in C as it is in Java -- optimize your algorithms. Java can't do that for you, and neither can your C compiler.
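To make the "optimize your algorithms" point concrete, here is a minimal Python sketch (an illustration added here, not from the thread; the function names are hypothetical). Both functions answer "does this list contain a duplicate?", but no amount of line-by-line tuning of the nested-loop version will match switching to a hash set, which turns O(n^2) pairwise comparisons into O(n) expected lookups.

```python
def has_duplicate_quadratic(items):
    """Compare every pair of elements: O(n^2) comparisons in the worst case."""
    n = len(items)
    for i in range(n):
        for j in range(i + 1, n):
            if items[i] == items[j]:
                return True
    return False


def has_duplicate_linear(items):
    """Remember elements already seen in a set: O(n) expected time.

    Assumes the items are hashable.
    """
    seen = set()
    for item in items:
        if item in seen:
            return True
        seen.add(item)
    return False


if __name__ == "__main__":
    # The only duplicate is the final element, so both functions do maximal work.
    data = list(range(2000)) + [1999]
    print(has_duplicate_quadratic(data), has_duplicate_linear(data))
```

Timing the two with `timeit` on a few thousand elements should already show a large gap; the exact ratio depends on the machine and interpreter, and the same reasoning applies whether the code is C or Java.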
The enemies of Democracy are
Ah, yes, the 'rest of us', meaning to me, the mediocre programmers, or 'Code Monkeys'. Please, by all means continue to churn out steaming mounds of code.
It's cross-platform alright, but crap is crap, on any platform, and in any language.
Java never made anything easier for anyone and you know it.
This is a joke. I am joking. Joke joke joke.
"They would give software access to information about cache misses..." Yeah that ought to help significantly with side-channel attacks against crypto software.
It isn't Intel's job to train programmers to do things right. That is the responsibility of the education system. Nothing stops the education system from still teaching proper programming and design skills.
this nation, under God, shall have a new birth of freedom. -- Lincoln, Gettysburg Address
Instead of devoting transistors to speed up the latest toy programming languages ('managed' code), why can't we just train programmers better?
Ahh..of course, because of Java..don't bother learning HOW to optimize, let Java do it FOR you...
I'm tempted to slam this as an uneducated rant, but since there's a little teeny kernel of truth in it, I'll let it slide.
The issue is not "badly written code". It's being able to run the same compiled code on a wide variety of hardware without recompiling it for every chip variant.
The huge drawback with all the RISC architectures (at least initially) was that each version of each chip had different numbers of functional units, different latencies for the functional units, different latencies to cache and memory, etc.
If you ever dealt with the MIPS or Sun compilers, they have a huge number of flags for hyper-optimizations on a variety of implementations of those architectures. The problem is that when you optimize it for one variant, it often makes it worse on other variants (because instructions that didn't collide in the instruction pipeline now do, to give just one example).
Now all of the modern architectures play the same games. Power/PowerPC, SPARC, Itanium, all of them. They all have multiple pipelines and execution units, massively parallel instruction issue, etc. Just like x86.
And it's not because the programmers are idiots, but because that's the only way you could ever ship one binary that would run "optimally" on every implementation of that architecture.
PS. Java and C++ only make this worse because they are so dependent on such out-of-order massively-parallel execution (since they are so darn difficult to statically optimize).
The supreme irony of this is that for a while there, Java on x86 (Sun's implementation, no less!) ran rings around Java on SPARC (great strategy for pulling in customers for SPARC!). It's only with recent SPARC implementations (Niagara/Niagara 2), which play the same way as the x86s, that SPARC has finally caught up with and passed x86 again.
Intel isn't alone in providing detailed documentation. AMD provides instruction set references and detailed optimization tips too.
It would be cool if GPU manufacturers were as helpful as CPU manufacturers are!
The number depends on how you look at it. I made a table that lists every x86 instruction excluding prefixes a while ago and it came out to 57,839 instruction/parameter combinations. That doesn't factor in the specific values passed to the opcode, or in the registers, or the differences in behavior of the chip depending on mode, how memory protection is set up, out-of-order execution, or instruction prefixes.
The large number of combinations certainly makes validation a tremendous challenge.
Do you even know Java? Or do you know anything OTHER than Java?
Running code that is directed at one architecture or another was an issue for RISC. If you look at the x86 CISC machines, you'll have a lot less variance. When it comes to RISC vs. CISC, it's not so important to optimize for a specific architecture on CISC simply because the CPU handles a lot of things that would otherwise fall to the programmer/compiler. The variance between running a program on different CISC implementations is much smaller than doing the same on RISC architectures.
Yeah, but I couldn't find a way to get AMD to mail me a hard copy of their documentation (at least, not for free). If they do so, please correct me, as I haven't looked in quite a few months.
Profiling is useful for code produced by any language, and being able to profile without adding code (e.g., at the beginning of functions) means you get to see how the actual software runs, without doing things that affect caching, etc. (for example, profiling code might push certain instructions onto a different cache line, skewing the results).
The revolution will not be televised... but it will have a page on Wikipedia
Funny. I've seen a $59 Brisbane core (1.9 GHz out of the box) overclocked to 2.9 GHz with just air cooling, so I'm not sure why everyone insists AMD can't hit the 3 GHz barrier, especially when AMD keeps displaying 3 GHz Barcelonas.
There are three reasons to buy AMD right now.
1. Price, price and price. AMD knows Intel has the better fab, but AMD is selling super cheap. You can get a dual-core processor for half what Intel charges, and for the average user, it is more than enough. I'm running Oblivion at 30 FPS with a $59 processor, and I've barely overclocked it. The cheapest Intel dual-core proc was $120 when I bought my $59 proc. Most people have no idea that their proc these days often underclocks itself, and you rarely touch the full potential of your proc. Intel is faster, and no one doubts that today, but if you never see the speed benefit, why spend the extra dollars? On a performance per dollar basis, AMD wins hands down.
2. There is a mountain of evidence against Intel for anti-trust violations, and I try not to financially support evil. The EU is also coming down on Intel for anti-trust violations.
3. Even if the anti-trust suits both come through, AMD is near bankruptcy, and I prefer choice in the marketplace. I am terrified of the day when Intel has no competition pushing them and they can just sell what they want and whatever price they want.
http://blindscribblings.com - Tasty pop-culture in conceptual fashion.
Java is a great concept with piss-poor execution.
Oddly enough, the same code can often be compiled cross-architecture and cross-platform quite easily with GCC into a nice, fast native executable for each platform and architecture, and it uses a fraction of the start-up time and resources of Java.
I'm a crappy programmer, and even that is transparent to me.
Jombeewoof is a bastard who thinks the world owes him a living. http://slashdot.org/comments.pl?sid=267807&cid=20207637 Jombeewoof tried to destroy an Internet Service Provider in Massachusetts by expecting large bandwidth without paying anything. Educated alone doesn't pay the bills. Jombeewoof is not worth your mod points and is a MySpace loser. Jombeewoof, give up, get off the Internet. The TrollGoons won't leave you alone.
I was reading the Great Microprocessors list and it says AMD already did that back in the K5 days. It had a mode where it can natively execute the RISC-like instructions. Nobody used it, so I don't know whether current gen AMD chips support it.
-- "This world is a comedy to those who think, a tragedy to those who feel."
Sony had a $10k PS2 called the PA that recorded exactly what happened to every cycle on the cpu, gpu etc. without changing the way the game ran. It was the most incredible thing, like you had been sitting in the dark for years and then suddenly someone turned on the lights.
Is it cache misses, dma contention, background threads, branch stalls or actual work? Optimizing on the PC just feels like groping around in the dark again.
--
thegirlorthecar.com - a dating game for guys
Whole families have one or two computers but every member has their own phone. ARM has triumphed numerically. It doesn't try to compete with x86 but a future could exist in which many people have an extremely powerful ARM-based phone and rely on the internet a lot instead of having a PC.
This is all just my personal opinion.
There's a matter of degree, to be sure, but even still you're most likely wasting your time "optimizing" individual lines of C code since the compiler can probably do a better job and that's been the case for quite a while.
That's terrible. If people give up on optimizing code (and on understanding why it works), the net result will always be a noticeable decrease in programming quality (a very common situation).
I know that you are aiming at premature optimization, and you are really right on this one, but the notion that 'optimize code' == 'wasting time' is a perfect excuse not to learn how and why things work.
You have to know how to optimize the code to decide when it is detrimental to use that optimization, and then, when you know how to do it, you realize it's a matter of degree, as you rightly said.
What's in a sig?
releases ANOTHER newer faster processor two weeks later ... effectively kicking AMD in the groin AGAIN.
You're crazy if you think the education system teaches programmers how to write good code. They can't even teach math and English well. Good programmers are mentored by other programmers.
Yes, but then the 8051 probably outnumbers the x86 and the ARM. The MIPS, ARM, Power, and even the 68k still exist in the embedded market. For example, Power is in all three of the new game consoles. ARMs are in a lot of the WAPs. I keep wondering if we will see a CPU the size of the latest AMD but containing 16 or more ARM cores. Sort of a T1 competitor.
See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
i don't mean that you'd be an idiot for being a mac person, but that x86 cpu particulars would slip your mind. :D
Please stop stalking me, bro.
Isn't this just exposing/documenting the CPU's internal debug features so that developers can use them?
If you look at the die shots of recent CPUs, you will see a big chunk of transistors marked DEBUG.
Like most /.ers, you have these weird categories of evil and non-evil companies.
ALL large companies are the same - the more successful, the more evil.
Why is this so?
While everyone professes to like the free market, businessmen hate the free market and love monopoly - in a free market you have to work harder for less, and who in their right mind would actually like that?
So the first thing a company does when it becomes big and successful is to use its power to dampen market forces in any way it can.
Now sometimes, when a company is really, really rich and successful, like Google or the old AT&T, they are so successful that they can hide their evilness behind total monopoly power. But as soon as their market position slips, they become evil.
Mark my words, you heard it here first: as soon as Google's profit starts to fall and it is no longer a Wall Street darling, they will be right in there with MS and GM and whoever.
There is a multi-core ARM CPU under development. The idea is that multiple cores are the best way to keep increasing performance without increasing power consumption.
I don't think that it's anything astoundingly interesting by desktop standards but it will allow embedded devices to keep advancing. As usual, before your phone can handle it properly, there is probably going to be some software that needs a redesign if it's going to show a speed improvement.
If you ever dealt with the MIPS or Sun compilers, they have a huge number of flags for hyper-optimizations on a variety of implementations of those architectures
Sun's compiler has a huge number of flags for hyper-optimizations on a variety of implementations of x86 too. Near as I can tell, though, their impact on the vast majority of code is minimal. AMD and Intel can throw in all the new instructions they want, but they won't be meaningful for years -- if ever -- because code has to run on existing processors that don't implement those instructions.
Hey, I know this. Java makes storing random things inside a hashtable easier than C. What do I win?
(what it doesn't make easier as well as the set of languages that are better suited for this particular example is left as an exercise)
Do you mind if I call Microsoft into that committee? They are the ones keeping x86 alive.
Rethinking email
The ARM core isn't slow by any stretch. I would bet that a good dual- or quad-core ARM would run all the software the average desktop needs. It would probably work just fine for most business systems. Since the ARM core is so small compared to, say, a Core 2 Duo or Athlon X2, I would bet that you could put 16 or more on a single die and then use HyperTransport for memory I/O. You would need to add something like SSE and maybe an FPU, but the end result could be very interesting for servers.
You must be crazy if that's what you got out of my message. I didn't say the education system currently teaches programmers how to write good code. I said nothing stops them from doing so, whether they know how to is a different issue.
I know Python, C, C++, VBScript and JavaScript. I did try to learn Java once; that didn't last long, though. Relax, man, it's a joke.
I guess the education system failed me on reading comprehension....LOL