Slashdot Mirror


AMD Releases X86-64 Architecture Programmers Overview

AMD has released a manual in PDF format to allow software developers to migrate their code to its 64-bit Hammer microprocessor platform. The PDF document is here and Sandpile Web site gives excellent explanatory material of the content of the PDF file. Surprisingly, Sun supports the Sledge Hammer. Article can be found here

9 of 199 comments (clear)

  1. GCC can't do VLIW by Anonymous Coward · · Score: 5
    Face it, with VLIW the onus for performance is on the compiler. And C is not the easiest language for VLIW compilers. VLIW compilers work best when global analysis can be performed. But global ananlysis is very difficult amid the untamed world of C pointers (aliasing problems).

    AMD's 64 bit processor is a better match for the current state of compiler technology, particularly Open Source compiler technology.

  2. The inevitable outcome by ameoba · · Score: 4
    Both AMD and Intel have impressive 64bit architectures coming in the near future. I don't really know enough about either to say if one is technically superior to the other, but as we've seen time and time again, that's not going to be the determining factor. Since the two aren't compatable, they'll be forced to compete, and I don't think the market has room for both of these platforms.

    Technical merit won't really make a difference here; both Intel and AMD can make a good chip. It's going to come down to who gets to market first, who can buy the press coverage, and who's going to get the required software support.

    I have some vague memory of an Intel sponsored 64bit Linux port; If AMD expects to succeed, they should be doing the same.

    I can sum the whole post up in two words:

    Betamax
    --
    my sig's at the bottom of the page.
  3. Intel's superior offering????? by Idaho · · Score: 5

    You obviously never read Tom's Hardware. Look at this article describing how Tom thinks about Intel's IA64 architecture (Itanium, formerly Merced) in relation to AMDs SledgeHammer technology.

    Why do you think it's called SledgeHammer in the first place???

    --
    Every expression is true, for a given value of 'true'
  4. Don't Blame AMD for that one by Killer+Napkin · · Score: 4

    Intel started that whole game with adding MMX Technology (R) to their chips. It made everyone think that, somehow, they were getting something entirely different with the Pentium chips because MMX was this magical add-on that no one else had.

    Suddeny Cyrix (you remember them don't you) chips started coming out with MX technology. AMD, if I remember correctly started licensing MMX for a while and finally came out with 3DNow.

    Now every peice of hardware you buy comes with magical tweaks. Hard drives, video cards, and even motherboards -- they all come in brightly colored boxes with completely unrelated pictures on them flouting "Proprietary Brand Useless Features (R)" If you're going to blame someone for being lame, blame Intel. (Of course, I'd also blame Microsoft, who touts useless or standard features as being "innovative")

  5. VLIW = Very Long Instruction Word by Idaho · · Score: 4

    For the uninformed,

    VLIW means Very Long Instruction Word, which means that the processor can execute
    several instructions at one time (one Instruction Word can contain multiple instructions).

    This is a good idea because this way the processor does not have to worry about
    running instructions concurrently. Instead, the compiler has to worry about that,
    which makes it really hard to write a compiler that handles this right, while still
    optimizing as much as possible.

    Since the VLIW technology has not been tested very much in microprocessor designs,
    Intel is doing some really hard (maybe even innovate!) work, but that also means
    many problems/bugs/performance issues, as with almost every 'new' technology.

    VLIW is successfully used in (e.g.) DSP processors, but those have very simple
    instruction sets, where the order in which things are processed often does not
    really matter, so this is rather simple to implement.

    However, in a microprocessor it's much harder, which may be part of the reason why Intel's Itanium is delayed so much (performance problems, and problems running high clock frequencies).

    --
    Every expression is true, for a given value of 'true'
    1. Re:VLIW = Very Long Instruction Word by ToLu+the+Happy+Furby · · Score: 4

      Sorry to be picky, but all modern CPU evaluate many instructions simultaneously, using multiple-issue systems for many similar functional units of eack kind. The difference with VLIW is that several instructions are held in the same instruction word.

      More to the point, most modern CPU's, the chip itself determines which instructions should be run simultaneously. With VLIW, in theory, the compiler does this. Thus (in theory) the VLIW chip can be a lot simpler, since it doesn't need to spend a whole bunch of chip area on complicated logic to manage the process of choosing which instructions to process simultaneously and on large buffers to hold many instructions while the fastest combinations are chosen.

      In conventional VLIW the instructions combined in the same word must not have any dependencies on one another. The Crusoe native instruction set is VLIW in this sense. The Itanium system - which they call EPIC - is more complex, allowing instructions that interfere with one another to be combined in the same word. This ought to be somewhat easier for the compiler, but it seems in fact that elements of it are causing problems.

      Ah, what happens when theory becomes practice. See, the problem with all of the above is that traditional VLIW (and it is very a traditional idea, despite Intel pretending they just invented it (and even then, it was HP who invented EPIC)) can only produce code which is known to be dependency-safe at compile time. In the fields where VLIW has been used for decades--i.e. specialized DSP stuff--this means most code. In the fields in which Intel is trying to sell its IA-64 chips--i.e. general purpose, multitasking machines (first servers and eventually workstations and even, they claim, desktops)--this turns out to be almost no code at all. It turns out that virtually nothing that a general-purpose computer tends to do has very many instructions which can be proven at compile-time never to have any dependencies on many other instructions. This compares to a normal superscalar x86/RISC processor, which has the much more fruitful task of determining which instructions are likely to be dependency free at run-time.

      This presented a problem. So, with EPIC, HP came up with the idea of having long-words (or "bundles") composed of instructions which were merely likely not to have dependencies on each other. So far so good...except that then you need a bunch of hardware on the chip to determine when you have dependencies and when you don't, and to figure out what to do when you do (after all, the chip still needs to execute all those instructions). That is, all the hardware that any non-VLIW superscalar CPU needs. But wasn't the whole point of VLIW was to make the chip simpler by moving all that stuff into the compiler and keeping it there?

      Another feature that most modern CPU's (so called "out-of-order designs") have is the ability to pick one side of a conditional branch--the side which is more likely to be taken--and pretend it has already been taken, executing the results before it knows they will be needed. The problem with this is that now the chip needs to keep track of which instructions are definitely needed and which would have to be "thrown out" if it turns out the wrong branch was taken. This would seem to be counter to VLIW philosophy, but because of the speed gains speculative execution offers, EPIC includes it as well. While branching "hints" are indeed introduced by the IA-64 compiler, this is still just more complexity which ends up not being removed from silicon.

      Further holding back Itanium is the fact that without dynamic handling of instructions in the CPU, the chip can't take advantage of something called register renaming, which essentially allows the compiler to pretend it's using certain registers while the on the chip the programs just use whatever registers happen to be available. The upshot of this is that Itanium needs a whopping 128 14-ported (a "port" is like a line into/out of the register) 64-bit general purpose registers. More registers--sounds like a good thing, right? Problem is that 128 14-ported 64-bit registers take up such a phenomenal amount of die space that it's difficult to get them to do all the things they need to do in one clock cycle and still stay in sync. The result is that Itanium won't clock very fast without failing, and Intel had to screw with the pipeline timings late in the design process to get it to "acceptable" clock speeds at all.

      The end result of this? Amazingly, in a processor whose entire design philosphy is to remove as much complicated control logic from the chip to the compiler, the portion of Itanic's die dedicated to control logic will be bigger than the entire die of an Alpha 21264 on an identical process!! Intended to be a high-volume chip released at 800 MHz in late 1997 or 1998, Itanium is going to end up a low-volume chip released at 733 MHz in 2001, if indeed it is released at all in anything more than engineering-sample quantities.

      Thus everyone's hopes have turned to the next IA-64 chip, McKinley. Intended to be released in late 2001 (put me on the record as doubting it) as a high-volume, moderate clock speed chip on Intel's upcoming .13um process, it may indeed be an impressive performer. (Interestingly enough, McKinley is being designed entirely at HP...) Whether that's because of or despite the EPIC design will be interesting to see. It is, at the least, pretty doubtful that the competing Alpha 21364 and Power4 chips around by then won't be even faster than McKinley, and they'll be manufactured on less advanced processes than Intel's.

      Bottom line: at this point in the game, VLIW looks like a damn neat idea that never should have made the jump to general-purpose computing. Whether IA-64 will end up dominating the world anyways will be an interesting story to see. On the one hand, the server market isn't known for rewarding well-designed processors: the lion's share of the market is owned by Sun, whose UltraSPARC-II's are horribly outdated and outclassed on a per-processor basis by everything this side of a Celeron. Meanwhile, the Alpha, universally agreed to be the fastest and most elegant processor around, is languishing in the market. On the other hand, Sun's big strength (other than marketing) is in scalability, and given that the two big OS's for IA-64 appear to be Linux and Windows-64 (laff), it seems Intel won't be able to able to compete on this measure either.

    2. Re:VLIW = Very Long Instruction Word by cburley · · Score: 4
      Excellent post. I have only a few off-hand observations to add.

      My impression of IA-64/EPIC, based on some reading of the docs, is that it's VLIW redesigned so it can scale up or down quite a large range of implementations.

      (I say something about this on my linuxexpo web page (see my site), and that's over a year old.)

      Basically, VLIW design can be (roughly) thought of as focusing on the question "given chip process design circa <insert two-or-so-year period>, and problem space <type of software being targeted>, and assuming very clever compilers, what's the optimal ISA we can implement?"

      (Note that it doesn't take into account long-term viability of the ISA that gets produced, because the idea is that, with very clever compilers, when you get to the next generation of chip/process design, you instantiate a new ISA based on asking that question again, recompile, and, viola, you've got "optimal results" absent previous-ISA baggage. Ah, dreams.)

      EPIC seems to augment the question with "and that scales across a fairly wide range of potential chip sizes and processes".

      So, e.g. on the 128-bit VLIW machines I used to deal with, there were reasonably decent low-end chip designs that couldn't viably handle the ISA, because too much stuff would have to go off-chip (like, say, the register file maybe ;-).

      Whereas simpler designs, 16-bit or even 32-bit CISC or RISC, could fit on such chips, blowing the doors off a similarly-processed instantiation of that 128-bit VLIW.

      A good amount of the logic you talk about in your article strikes me as being unnecessary for low-end applications.

      E.g. the register renaming, the huge multi-port register file (a huge file is needed, but not so hugely multiported), the logic that attempts to figure out what the dependencies are, etc. -- these all strike me as being necessary only for medium-to-large-scale (vis-a-vis the high-level ISA) implementations.

      My impression is that early Alphas, e.g. 20264 or 20266 or whatever, were intentionally done as low-end implementations. But they didn't attract a lot of attention. Even though the (well-done, IMO) ISA allowed for a great deal of scaling up, as the 21264 and 21264 are apparently proving (my 21264 is "mothballed", sadly), there isn't enough "groundswell" to make it popular. Perhaps not coming out with more of a barn-burner chip -- shooting higher than the basic ISA implementation -- is partly responsible for this.

      I'm thinking that the only way EPIC, IA64 in particular, ends up looking good is if it's popular and implemented across a wide spectrum of chips, from low to high end.

      That way, code that's already compiled to the ISA can, in theory (and perhaps in practice too), be optimized for mid-level to high-end chips, and still run pretty well on the low end.

      Similarly, code that's "mundanely generated" might run about as well on the low end as the highly-optimized stuff anyway, but could still get worthwhile boosts being moved, without recompiling, up the chain.

      That's a nice dream, if it's what they have in mind, but there are a couple of problems.

      One, any information that is able to be encoded into ISA-level instructions that helps a variety of implementations of an architecture make their own decisions how to implement them represents ISA space (and usually other resources) that can't be used in a way that's optimal for a given implementation -- one that didn't have to accommodate the "wide-range" ISA (like EPIC).

      E.g. if you reserve in the ISA a bit that says "please serialize between previous and next op" so a faster chip can know when to do dependency analysis before assuming parallel execution will work, that's a bit that a low-end chip, which is going to serialize it anyway, won't need.

      Also e.g. if you reserve bits and fields for all sorts of optimization hints, so mid-level chips can do a better job of guessing, not only do you hurt low-end chips that won't use them, but you hurt high-end chips that have either enough logic to do at least that good a job of guessing at runtime anyway or have otherwise rendered most of the compile-time guessing redundant (e.g. by reducing latencies on branches, memory loads, whatever).

      In both cases, these bits in the ISA could, on chips that don't really benefit from them, have served to boost performance in some other fashion. E.g. a low-end chip might want hints about L1/L2 cache issues, or maybe just a more compact ISA so it holds more instructions in Icache. Whereas a high-end chip might want hints about completely different things, or maybe just a way to specify what some new ALU can do in parallel with everything else!

      Two, notice the Catch-22 in my statement about how EPIC/IA64 might look good. It has to be popular and run across a wide variety of architectures to show how good it really is (to the market, anyway), but how does it get popular until the market knows how good it is?

      HP/Intel seems to be taking the approach of doing their best to "mandate" its popularity -- by making a real commitment to the architecture, including ensuring Linux runs out it from the outset, making good (great?) compilers (including OSS ones -- not sure how SGI Pro64 fits into this) available, etc.

      But while they struggle to popularize this "epic" ISA by planting many seeds far and wide, and, at the same time, proving its value by rolling out an increasingly-wide array of implementations (in terms of performance), they do risk giving competitors opportunities to make inroads targeting shorter-term strategies.

      And, in the longer run that IA64 seems targeted towards, will ISA compatibility across implementations be nearly as important as it was when this strategy was formulated and adopted, what, five or so years ago?

      If so, then assuming IA64 is not much more or less "correct" an ISA for the longer haul than the competitions' designs for their shorter outlooks, Intel perhaps can afford (fiscally speaking) to remain committed to an ISA that might seem like a dinosaur roaming around slowly while clever early mammals expand into a bunch of niches for awhile, until that day comes when the dinosaur's strengths are visible -- applications compiled to IA64 will have longer useful lifetimes compared to those compiled to upstart ISAs, they'll scale better across a wide range of machines, etc.

      But what works against this is the increasing viability of the ISA-ignorant code base out there, by which I mean code distributed as source, bytecodes, whatever, and which cares little for the underlying ISA to meet its performance goals.

      That viability is increased by things like:

      • Increased deployment of Open Source software
      • Better compiler technology, encouraging less use of ISA-specific tactics (assembly coding being the extreme) for performance
      • Increased awareness of the pitfalls of investing in, or developing, ISA-specific applications (this applies especially to the comparatively huge "market" of in-house software)
      • The mere existence of IA64, which tells everyone that even the "King" realizes that the one-time "only" ISA (the 1980s Unix equivalent of the VAX ISA), the IA32, will someday pass away as a viable platform for many
      That last item is kinda like what Y2K did for two-digit year encoding. After all, similar mind-sets, priorities, etc. lead to that as to doing ISA-specific, e.g. IA32-specific, platform development.

      Yet it seems Y2K didn't cause a lot of people to resort to "1900/2000" solutions. I.e. we don't see a lot of cases (that I know of anyway) where the developers said "okay, the old code was for 1900, the new code supports 2000, and includes some 1900->2000 conversion utilities, but it's still all two-digit stuff".

      No, it seems most Y2K issues were handled by finally going the full four (or more!) digits, so the problem wouldn't come up again in 2100.

      If so much effort (US$Billions) was put into keeping the problem from resurfacing in another hundred years, that suggests the industry (those who decide what platforms to target for their software, whether for distribution or use in-house) will respond, to a significant extent, to the IA32->IA64 transition with less of an "okay, let's retarget everything to IA64" attitude and more of a "hmm, IA64 might get us 10-20 years of viability, that's too short for the investment we have to make, let's go the extra distance, get away from ISA-specific tactics, and pick a strategy that gives us flexibility in choosing IA64 vs. IA32 vs. Alpha vs. whatever at a suitably fine-grained level".

      To the extent the industry adopts that model, the advantages of EPIC decrease, while the disadvantages remain the same.

      It's also interesting to note the strong feedback among the other items.

      In particular, consider how the success of Open Source (mainly GNU/Linux -- GCC specifically) came about soon enough to cause HP/Intel to openly "target" OSS so it could join the IA64 revolution.

      Now, that helps promote IA64 acceptance.

      But it also allows competitors, who want to produce better, "one-off" ISAs using whatever IA64-like techniques are appropriate, to do so without losing all that Open Source software, and even without necessarily having to do much high-end compiler development!

      I.e. to the extent OSS compiles code well for IA64, it can be fairly easily modified to compile code well for a one-off ISA that doesn't have all the baggage of IA64 but does do some of the sophisticated stuff.

      So OSS popularity led to IA64 "openness", which could well lead to better compiler technology being available/affordable for arbitrary ISAs that are VLIW or RISC subsets of IA64, and that could encourage the third item above, in that more "users" of code bases will realize they'd spend less $$ to get performance if they could just recompile for the ISA de jour (from AMD, Compaq, whoever).

      I'm not saying IA64 represents taking a risky path, since I don't know the percentages -- could be anywhere from 10% to 90% chance of success for all I know.

      But, of course, a huge amount of $$ and energy is being put into IA64, so it is, indeed, a "big risk" to say the least.

      --
      Practice random senselessness and act kind of beautiful.
  6. As usually slashdot line noise is high by arivanov · · Score: 5
    By the time I read AMD pdf slashdot was already full of various "meaningfull opinions". Whatever, as usual, people never change.

    I have to admit I am impressed:

    • AMD managed to get rid of quite a bit of legacy for the 64 bit mode. Most of the utterly idiotic segmented switching is gone. There is a well defined supervisior mode and user mode. There is a SYSCALL instruction so the mess of "jump to that magic offset to get promoted to OS level" is gone and OSes will have a nice clean API.
    • At the same time there is a reasonable amount of backwards compatibility. It is not without a cost but the cost is way less than Itanic. Almost all idiotic x86ism (tm) like the x86 task switching mechanism generate a GPF on the spot. So the OS will have to handle them in software. This means two things: the compatibility is relatively simple and the compatibility will be easier to achieve for emulating 32 bit OSes where ring 0 is called within a well defined API. Porting "the world of bypass backdoors" - NT on this will suck (even after AMD has broken their own rules and made exemptions for it). Porting all Unix OSes available for x86 will be a piece of cake.
    • Also unless told to it will do all calculations in 32 bit. So that most of favourite x86 legacy issues will have less impact. Also it has a reasonable number of register. 8 more general purpose 64 bit ones and 8 more 128 bit SIMD ones.
    Quite an interesting beast to play with, question being will it have proper linux and BSD support. Also after reading the PDF it becomes clear why is the Sun support for this. This approach is very close to their uSparc/Sparc legacy scenario. And they have swam in that swamp for years (and are still swimming there). It is their CPU. It was born for them ;-)
    --
    Baker's Law: Misery no longer loves company. Nowadays it insists on it
    http://www.sigsegv.cx/
  7. Re:Moving forward, a little at a time by SQL+Error · · Score: 4

    Agreed 100%. IA-64 is a very interesting architecture (all those little crinkly bits - lovely baroque feel) but the first chips (i.e. Merced) look like they'll be hot (150W), slow, and expensive. Intel are already spinning Merced as a development platform only. McKinley (chiefly designed by HP, who tend to know what they're doing) looks like the first "real" IA-64 chip, but it's still some distance away.

    Meanwhile, AMD has designed the SledgeHammer as a minimal 64-bit extension to the Athlon, so it's not much bigger than existing chips (on the order of 10%), and so hopefully won't be too expensive. Though if it comes with dual cores and a large L2 cache, you can expect to pay rather more than for a Duron.

    Assuming that it does come with dual cores and a healthy L2 cache (and no critical bugs), SledgeHammer will make a good choice for small servers and high-end desktops even without true 64-bit OS support. And Intel's troubles with Merced effectively hand AMD a 12-to-18-month window of opportunity in the low-end 64-bit market. Which may be all they need.

    The other issue is that Merced is likely to be relatively slow on IA-32 apps, while SledgeHammer could well be the performance leader. If you don't expect all your code to go 64-bit overnight, it's another plus for the AMD solution.