Slashdot Mirror


Is Prescott 64-bit?

unassimilatible writes "According to The Inquirer, Intel's new Prescott has 64-bit instructions lurking inside. Could really rain on the parade of those who thought the new Athlon 64s would be supreme - especially when you look at Intel's price roadmap. Don't run out and buy an Athlon 64 just yet..."

11 of 487 comments

  1. Performance doesn't come directly from 64 bits by vlad_petric · · Score: 4, Informative
    MMX and SSE can already do integer operations on 64 bits ... What people don't realize is that the performance improvement comes from a significant change in the instruction set architecture (ISA). While x86 is the most commercially-successful ISA, it is ugly as hell, difficult to compile for and stressing the memory system unnecessarily, as it has very few registers ("difficult to explain and impossible to love" once said an Intel designer).
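To illustrate why 64-bit integer width alone is not where the speedup comes from, here is a minimal sketch of how a 64-bit addition can be composed from 32-bit operations, in the spirit of an add followed by an add-with-carry. The function name and the Python modelling are assumptions for illustration and are not from the comment above.

```python
MASK32 = 0xFFFFFFFF

def add64_with_32bit_ops(a, b):
    """Add two unsigned 64-bit values using only 32-bit-wide additions."""
    # Split each operand into low and high 32-bit halves.
    a_lo, a_hi = a & MASK32, (a >> 32) & MASK32
    b_lo, b_hi = b & MASK32, (b >> 32) & MASK32

    # Low halves first; the bit above bit 31 plays the role of the carry flag.
    lo_sum = a_lo + b_lo
    carry = lo_sum >> 32
    lo = lo_sum & MASK32

    # High halves plus the carry; anything past 64 bits is dropped.
    hi = (a_hi + b_hi + carry) & MASK32

    return (hi << 32) | lo

print(hex(add64_with_32bit_ops(0xFFFFFFFF, 1)))  # 0x100000000
```

The cost is one extra dependent operation per 64-bit add, which only matters for code that does a lot of wide integer arithmetic.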

    Itanium is a full fix to the problem. The horrendous x86 ISA is completely replaced by an explicitly-parallel (EPIC) instruction set that has all the nice properties of a RISC machine (easier to compile for, less stress on the memory system as you get 128 registers, easier for the machine to decode the instructions as they're fixed-format and don't require RISC conversion, etc.). The problems with it are:

    1. You need a compiler that "knows" how to bundle instructions effectively (a VLIW compiler). GCC clearly isn't there yet (it's not uncommon for the Intel compiler to beat GCC by 30-50% when running computationally intensive stuff).

    2. Being completely different than x86, it can't be very efficient at emulating x86 programs.
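The bundling problem in item 1 can be sketched with a toy in-order packer. This is a simplified model under stated assumptions, not the real IA-64 template rules: the instruction tuples, the `pack_bundles` name and the bundle width of 3 are hypothetical, and only read-after-write and write-after-write conflicts inside a bundle are checked.

```python
def pack_bundles(instructions, width=3):
    """Greedily pack (name, dest, sources) instructions into bundles, in order.

    A new bundle is started when the current one is full, or when the next
    instruction reads or overwrites a register written earlier in the bundle.
    """
    bundles = []
    current = []
    written = set()  # registers written by instructions already in `current`

    for name, dest, sources in instructions:
        depends = dest in written or any(src in written for src in sources)
        if current and (len(current) == width or depends):
            bundles.append(current)
            current, written = [], set()
        current.append(name)
        written.add(dest)

    if current:
        bundles.append(current)
    return bundles

program = [
    ("load r1",      "r1", []),
    ("load r2",      "r2", []),
    ("add r3=r1+r2", "r3", ["r1", "r2"]),
    ("load r4",      "r4", []),
    ("mul r5=r3*r4", "r5", ["r3", "r4"]),
]

for b in pack_bundles(program):
    print(b)
```

The naive packer leaves slots empty in every bundle here. A smarter compiler would reorder independent instructions (for example, hoisting `load r4` into the first bundle) to fill them, and finding such reorderings across real code is the hard part.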

    AMD partially fixes the problem by extending the x86 ISA to 64 bits, *and* adding 8 general purpose registers. Because they just extended the ISA, running old code is just as fast. Furthermore, new code can benefit from the extra 8 registers, and run even faster.

    For the short term the Opteron is a pretty impressive chip, but I really don't see how AMD is going to stay on Moore's curve with such a shitty instruction set architecture.

    P.S. Clearly 32 bits can only address 4GB of RAM, and for *some* servers more addressing space buys you something. But I'd say they are a very small minority.
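The 4GB figure in the P.S. is plain arithmetic on the address width. A minimal sketch (the helper name is made up for illustration):

```python
GIB = 2 ** 30  # bytes in one gibibyte

def addressable_gib(address_bits):
    """Number of GiB reachable with byte addresses of the given width."""
    return 2 ** address_bits // GIB

print(addressable_gib(32))  # 4
```

Each additional address bit doubles the reachable space, so the limit is a property of the pointer width rather than of any particular chip.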

    --

    The Raven

    1. Re:Performance doesn't come directly from 64 bits by drinkypoo · · Score: 4, Informative
      AMD did add more registers. In fact, they quadrupled the number of registers. Vlad asserts that x86 has eight GPRs, and that x86-64 just adds eight more, but he is wrong on both counts. x86 has four GPRs, [E]AX through [E]DX (The "E" means 32 bit) and it has four index registers - actually two index registers, and two index offset registers - which can be used with some instructions. Many x86 instructions specify that your result must be stored in a specific register or pair of registers (for 64 bit results of multiply operations for example) and none of those results go into the address index registers. Furthermore, many instructions require that you use the index registers for - gasp - addressing, and they look at those registers to determine where to get information, and/or where to store it. Hence, you have FOUR general purpose registers in x86. If you want to be really strict about it, you have zero general purpose registers on the x86, because each of the four so-called GPRs has a purpose to many instructions. CX, for example, must be used for your counter by many instructions, so when writing assembler you are forever having to take into account where each instruction is going to want to look for data. Modern x86-compatible processors actually have a whole shitload of temporary registers and do register renaming so that when you think you're moving data from register to register to avoid this problem, the CPU is actually leaving it in place and just renaming the registers. This is true of the processors from both Intel and AMD, and presumably even the VIA processors, though I have no information there.
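As a concrete case of an instruction that dictates where its result goes, the unsigned 32-bit `MUL` takes one operand implicitly from EAX and leaves the 64-bit product split across the EDX:EAX pair. The sketch below models that behaviour in Python; the `mul32` helper is an illustrative assumption, not something from the comment.

```python
MASK32 = 0xFFFFFFFF

def mul32(eax, operand):
    """Model unsigned 32-bit MUL: return (edx, eax) holding the 64-bit product."""
    product = (eax & MASK32) * (operand & MASK32)
    high = (product >> 32) & MASK32  # lands in EDX
    low = product & MASK32           # lands in EAX
    return high, low

edx, eax = mul32(0xFFFFFFFF, 0xFFFFFFFF)
print(hex(edx), hex(eax))  # 0xfffffffe 0x1
```

Because both EAX and EDX are clobbered, anything live in those registers has to be moved or spilled first, which is the bookkeeping the comment complains about.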

      Now, I admit I haven't spent a lot of time looking through my x86-64 manuals, because it's been vaporware until fairly recently, and furthermore they lied to us about how many HT buses would be on each flavor of processor right up until the very end, so I won't be buying anything until either they bring out an Athlon 64 MP which has the missing HyperTransport bus re-added, or until the Athlon 64 brings down the price of Opteron processors. My Athlon XP is holding me for the time being, and besides, there's no 64-bit Windows yet. Even after there IS a 64-bit Windows, I expect to have to wait a little while for some of my drivers. So it hasn't really been a serious consideration for me. But I suspect that in many cases they have provided us with new instructions to replace the old instructions which require that the result go into specific registers.

      So, x86-64 has 16 GPRs, plus 16 "XMM" registers for SSE/SSE2, not to mention it implements the SSE2 functions from the P4. I think it pretty effectively does all the things it needs to do. Meanwhile it still has hardware solutions for emulating all the deficiencies of the x86 so that it can maintain backwards compatibility without sloowowwwwwing dooowowwwnnnn like itanic. It's the perfect solution for those persons who are not ready to give up their backwards compatibility, and it does not have the flaws that you and vlad assert. If you don't believe me, go root around AMD's site for the PDFs. Hell, I even got them to send me the paper documentation for free, which I intended to read in the bathroom. Unfortunately, even my wholly irrelevant nintendo summer 2003 catalog has been water damaged in there, so I'm definitely not going to venture into the latrine with my AMD technical docs.

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    2. Re:Performance doesn't come directly from 64 bits by scheme · · Score: 5, Informative
      Itanium is a full fix to the problem. The horrendous x86 ISA is completely replaced by an explicitly-parallel (EPIC) instruction set that has all the nice properties of a RISC machine (easier to compile for, less stress on the memory system as you get 128 registers, easier for the machine to decode the instructions as they're fixed-format and don't require RISC conversion, etc.). The problems with it are:
      1. You need a compiler that "knows" how to bundle instructions effectively (a VLIW compiler). GCC clearly isn't there yet (it's not uncommon for the Intel compiler to beat GCC by 30-50% when running computationally intensive stuff).
      2. Being completely different than x86, it can't be very efficient at emulating x86 programs.

      The Itanium ISA is elegant and clean in some places, but in others it is an ungodly mess of complicated things. Take the Register Stack Engine (RSE), for example. It's supposed to handle spilling registers to the stack and loading them back from the stack. This includes handling page faults, exceptions, interrupts, and memory errors. Oh yeah, this is supposed to be automatic and handled invisibly by hardware without software intervention. Hasn't happened yet.

      Also the EPIC ISA that the Itanium uses isn't easy to compile for. This is one of the biggest problems with the Itanium. It requires compilers to pull out a lot of parallelism in the code and present that to the hardware for execution. Intel sort of glossed over this when introducing the Itanium about 10 years ago and the compiler technology hasn't been able to really do this. So although the Intel compiler is better than gcc, it still isn't all that great.

      Incidentally, the Itanium does a better job at emulating the x86 ISA in software than in hardware. It was a big deal a few months ago when Intel introduced a software x86 emulator that offered a dramatic improvement over using the built in hardware emulation.

      --
      "When you sit with a nice girl for two hours, it seems like two minutes. When you sit on a hot stove for two minutes, it
  2. Re:I wouldn't buy the Athlon anyway by Malor · · Score: 4, Informative

    Athlon chipsets sucked rocks for a long time, and were really unstable. But VIA finally got their act together, I think with the KT133A.

    AFAIK, other than stomping on occasional driver bugs, Athlon chips have been pretty excellent ever since. I have an Athlon 1900+ on an ASUS A7V333 that's rock solid, and a new Athlon 2500+ on an nForce2 board that's not quite as solid, but which is still pretty good.

    I'd like to see some improvements in nForce2 chipset stability. It's not all the way there yet, in my opinion. But the VIA chipsets are extremely solid.

  3. Re:gotta compete by Waffle+Iron · · Score: 3, Informative
    But look at the PC architecture...the same outdated CISC architecture that was used in 1981 is still there in today's PC's.

    You do realize that there has been no such thing as a "CISC processor" since the Pentium Pro came out? Underneath the X86 bytecode VM, Pentium IVs, Athlons, etc. are highly advanced RISC cores with multiple concurrent execution units.

    The main reason that the huge expensive power sucking Itanium scrapes out a small lead in benchmarks over X86 CPUs is because of its expensive huge power sucking cache.

  4. Re:IInntteerreessttiinng by Waffle+Iron · · Score: 5, Informative

    Your guess is basically on the right track. I don't want to violate any NDAs, but let me just say that the AAA and AAS opcodes will now support Unicode.

  5. No, it isn't by Groo+Wanderer · · Score: 5, Informative

    As the author of the article, I had to REALLY make things vague. The people involved would be hurt badly by Intel if their names got out. Some of the situations that were told to me make it quite apparent who was leaking. That was as specific as I could make it :(.

    -Charlie

  6. Re:There's more to it than 64-bit instructions by Wesley+Felter · · Score: 3, Informative

    Intel x86 CPUs can already address 36 bits of physical memory, which should be enough for the next few years.
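For scale, here is what 36 physical address bits buy compared with 32, assuming the usual 4 KiB page size. The helper name is hypothetical, and the comment itself does not name the mechanism (commonly known as PAE).

```python
PAGE_SIZE = 4 * 1024  # 4 KiB pages, i.e. 12 offset bits
GIB = 2 ** 30

def physical_limits(address_bits):
    """Return (GiB of physical memory, number of page frames, frame-number bits)."""
    total_bytes = 2 ** address_bits
    frames = total_bytes // PAGE_SIZE
    frame_number_bits = address_bits - 12  # log2(PAGE_SIZE) == 12
    return total_bytes // GIB, frames, frame_number_bits

print(physical_limits(32))  # (4, 1048576, 20)
print(physical_limits(36))  # (64, 16777216, 24)
```

Note that this widens only physical addresses. Each process still sees a 32-bit virtual address space, so a single program cannot map all 64 GiB at once.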

    Intel doesn't do on-board memory controllers because they got burned by Timna.

  7. Re:gotta compete by Waffle+Iron · · Score: 3, Informative
    The logic to convert x86 instructions to micro-ops takes up space on the die and uses extra power.

    However, the compact legacy CISC instruction set does conserve on instruction cache space. This offsets much of the cost of the conversion logic. Moreover, it allows custom optimizations for the exact architecture du jour without affecting binary compatibility.

    And any way you look at it, you have to read from memory a lot more often with 8 "general purpose" registers than 32 real GPRs, which is what most sane CPUs have.

    Many modern X86 CPUs have more than 32 real GPRs which are utilized by register renaming. Like quantum mechanics, the processor state for any given instruction is smeared out over time and space, and the CPU is operating on many instructions simultaneously. The number of visible registers just doesn't matter as much as it would seem on the surface.
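Register renaming can be sketched as a lookup table from visible register names to a larger pool of physical registers. This is a toy model under stated assumptions: the `Renamer` class, the register counts and the three-instruction sequence are invented for illustration, and physical registers are never freed, whereas a real core recycles them as instructions retire.

```python
class Renamer:
    def __init__(self, arch_regs, num_physical):
        # Start with each visible register mapped to its own physical register.
        self.table = {reg: i for i, reg in enumerate(arch_regs)}
        # Remaining physical registers are available for new results.
        self.free = list(range(len(arch_regs), num_physical))

    def rename(self, dest, sources):
        # Sources read whichever physical register currently holds their value.
        phys_sources = [self.table[s] for s in sources]
        # The destination gets a fresh physical register, so instructions still
        # waiting on the old value of `dest` are not disturbed.
        phys_dest = self.free.pop(0)
        self.table[dest] = phys_dest
        return phys_dest, phys_sources

r = Renamer(["eax", "ebx", "ecx", "edx"], num_physical=8)
first = r.rename("eax", ["ebx", "ecx"])   # eax = ebx + ecx
second = r.rename("edx", ["eax"])         # edx = eax * 2
third = r.rename("eax", ["ecx", "ecx"])   # eax = ecx + ecx (reuses the name eax)
print(first, second, third)
```

The third instruction reuses the name `eax` but writes a different physical register than the one the second instruction reads, so the two can execute in either order. That is why the small visible register count hurts less than it appears to.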

    Itanium doesn't have to do this, PowerPC doesn't have to do this, no modern ISA requires this nonsense.

    They will when somebody figures out the next architecture trick that doesn't match the assumption of the designers of their ISAs. Take a look at history; remember when MIPS stood for "Microprocessor Without Interlocked Pipeline Stages"? What did the R4000 introduce? Could it be - interlocked pipeline stages? Exposing CPU implementation details to the software is not something that wears well over time.

    It doesn't just "scrape out a lead" in floating-point benchmarks, it absolutely destroys the x86 competition.

    That's because the FPU has not been very important in the X86 market up to this point. Business and multimedia apps just don't need it. If AMD or Intel put their efforts into an X86 with ultimate FPU performance, it could match or beat the Itanium.

    I suspect that Intel took advantage of the huge schedule delays in the Itanium to throw in more FPU horsepower because it had to beat the consumer-grade chips on something.

    And oh yeah, it's running at what, half the clockspeed of the P4? If Itanium had the same economies of scale behind it at this point, there would be no competition.

    As I said, the cache and memory architecture is the primary factor in the performance of CPUs today. Clockspeed, instruction set, registers ... who cares? Everything that's not cache is only a small fraction of the die size.

    All of that hardware architecture stuff is a red herring. Worrying about those non-issues has caused the Itanium schedule to slip nearly a decade while they desperately tried to write a C compiler that could statically wring out performance from their brittle concurrent execution model without the benefits of the run-time statistics information available to the X86 code translators.

  8. Good point, one little problem. by Groo+Wanderer · · Score: 4, Informative

    I didn't consider timing when I wrote the story, or any of its predecessors. Silly, as I am going to the A64 launch Tuesday. Anyway, I have been chasing this story since the chip-architect articles. The timing was unfortunate, but it wasn't an Intel plant, that much I can assure you.

    For about 3 months, I have known there was 64 bit functionality there, but I didn't have enough to prove it to my own satisfaction. I chased leads, interviewed people, and got that info.

    The fact that IDF brought me into close proximity with a ton of sources was the thing that got me so much info so quickly. There was only one thing from Intel directly, the rest were from third parties supporting the chip. If IDF had happened last January, I probably would have gotten the info then.

    -Charlie

  9. amd64 CPU's available _now_ by Brian+Ristuccia · · Score: 4, Informative

    You can order amd64 systems from places like Appro and Penguin Computing right now, with decent-sized collections of 64-bit applications provided by popular distributions such as SuSE. Let's not forget that amd64 CPUs can run ia32 binaries faster than many ia32 CPUs. On a system with an amd64 kernel, they also allow more aggregate address space across processes, and you can install tremendous amounts of physical memory for buffers and cache even if individual processes can only take advantage of a few gigabytes.

    With other groups like the Debian project well underway in their amd64 porting efforts, you can expect thousands of popular applications built for the amd64 platform. There's tons of software available for amd64 already, and you can bet by the time that AMD releases their "Athlon64" or whatever they're targeting the low-end market with, there will be even more.