Slashdot Mirror


Is Prescott 64-bit?

unassimilatible writes "According to The Inquirer, Intel's new Prescott has 64 bit instructions lurking inside. Could really rain on the parade of those who thought the new Athlon 64's would be supreme - especially when you look at Intel's price roadmap. Don't run out and buy an Athlon 64 just yet..."

25 of 487 comments

  1. Performance doesn't come directly from 64 bits by vlad_petric · · Score: 4, Informative
    MMX and SSE can already do integer operations on 64 bits ... What people don't realize is that the performance improvement comes from a significant change in the instruction set architecture (ISA). While x86 is the most commercially successful ISA, it is ugly as hell, difficult to compile for, and stresses the memory system unnecessarily, as it has very few registers ("difficult to explain and impossible to love," an Intel designer once said).

    Itanium is a full fix to the problem. The horrendous x86 ISA is completely replaced by an explicitly-parallel (EPIC) instruction set that has all the nice properties of a RISC machine (easier to compile for, less stress on the memory system as you get 128 registers, easier for the machine to decode the instructions as they're fixed-format and don't require RISC conversion, etc.). The problems with it are:

    1. You need a compiler that "knows" how to bundle instructions effectively (a VLIW compiler). GCC clearly isn't there yet (it's not uncommon for the Intel compiler to beat gcc by 30-50% when running computationally intensive stuff)

    2. Being completely different than x86, it can't be very efficient at emulating x86 programs.

    AMD partially fixes the problem by extending the x86 ISA to 64 bits, *and* adding 8 general purpose registers. Because they just extended the ISA, running old code is just as fast. Furthermore, new code can benefit from the extra 8 registers, and run even faster.

    For the short term the Opteron is a pretty impressive chip, but I really don't see how AMD is going to stay on Moore's curve with such a shitty instruction set architecture.

    P.S. Clearly 32 bits can only address 4GB of RAM, and for *some* servers more addressing space buys you something. But I'd say they are a very small minority.

    --

    The Raven

    1. Re:Performance doesn't come directly from 64 bits by khuber · · Score: 2, Informative
      P.S. Clearly 32 bits can only address 4GB of RAM

      Intel has had more than 32 bit addressing since the Pentium Pro, which introduced 36 bit physical addresses (64 GB).
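
      The arithmetic behind the 4 GB and 64 GB figures can be checked with a minimal sketch. The helper name `addressable_bytes` and the constant `GIB` are made up for illustration; the only assumption is byte-addressable memory, so an n-bit address reaches 2^n bytes.

```python
def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses reachable with the given address width."""
    return 2 ** address_bits


GIB = 2 ** 30  # bytes per gibibyte

# Compare plain 32-bit addressing with the 36-bit physical addresses
# mentioned above for the Pentium Pro.
for bits in (32, 36):
    print(f"{bits}-bit addresses: {addressable_bytes(bits) // GIB} GiB")
```

      The four extra physical address bits multiply the reachable memory by 16, which is where 64 GB comes from.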

    2. Re:Performance doesn't come directly from 64 bits by vlad_petric · · Score: 2, Informative

      Yeah, but support for it in the 2 major x86 OSes (Win & Linux) is rather flaky. Furthermore, a "normal" app is still limited to 4G by the OS.

      --

      The Raven

    3. Re:Performance doesn't come directly from 64 bits by drinkypoo · · Score: 4, Informative
      AMD did add more registers. In fact, they quadrupled the number of registers. Vlad asserts that x86 has eight GPRs, and that x86-64 just adds eight more, but he is wrong on both counts. x86 has four GPRs, [E]AX through [E]DX (The "E" means 32 bit) and it has four index registers - actually two index registers, and two index offset registers - which can be used with some instructions. Many x86 instructions specify that your result must be stored in a specific register or pair of registers (for 64 bit results of multiply operations for example) and none of those results go into the address index registers. Furthermore, many instructions require that you use the index registers for - gasp - addressing, and they look at those registers to determine where to get information, and/or where to store it. Hence, you have FOUR general purpose registers in x86. If you want to be really strict about it, you have zero general purpose registers on the x86, because each of the four so-called GPRs has a purpose to many instructions. CX, for example, must be used for your counter by many instructions, so when writing assembler you are forever having to take into account where each instruction is going to want to look for data. Modern x86-compatible processors actually have a whole shitload of temporary registers and do register renaming so that when you think you're moving data from register to register to avoid this problem, the CPU is actually leaving it in place and just renaming the registers. This is true of the processors from both Intel and AMD, and presumably even the VIA processors, though I have no information there.
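
      The renaming idea can be illustrated with a toy model. This is only a sketch of the concept described above, not of any particular Intel or AMD design: the class `RenameTable`, its methods, and the choice of 8 physical slots are all hypothetical, and real hardware ties renaming to in-flight instructions and retirement rather than to a simple reference count.

```python
class RenameTable:
    """Toy register-alias table: architectural names point at physical slots."""

    def __init__(self, arch_regs, num_physical):
        if num_physical <= len(arch_regs):
            raise ValueError("need more physical slots than architectural registers")
        self.values = [0] * num_physical               # physical register file
        self.refs = [0] * num_physical                 # names pointing at each slot
        self.alias = {}                                # architectural name -> slot
        for slot, name in enumerate(arch_regs):
            self.alias[name] = slot
            self.refs[slot] = 1
        self.free = list(range(len(arch_regs), num_physical))

    def _release(self, slot):
        # A slot becomes reusable once no architectural name refers to it.
        self.refs[slot] -= 1
        if self.refs[slot] == 0:
            self.free.append(slot)

    def write(self, reg, value):
        """A new value gets a fresh physical slot; the name is repointed."""
        new_slot = self.free.pop(0)
        old_slot = self.alias[reg]
        self.values[new_slot] = value
        self.refs[new_slot] = 1
        self.alias[reg] = new_slot
        self._release(old_slot)

    def move(self, dst, src):
        """A register-to-register move copies no data; dst is just renamed."""
        if self.alias[dst] == self.alias[src]:
            return
        old_slot = self.alias[dst]
        self.alias[dst] = self.alias[src]
        self.refs[self.alias[src]] += 1
        self._release(old_slot)

    def read(self, reg):
        return self.values[self.alias[reg]]


table = RenameTable(["eax", "ebx", "ecx", "edx"], num_physical=8)
table.write("eax", 42)
table.move("ecx", "eax")      # ecx now names the same physical slot as eax
table.write("eax", 7)         # eax gets a fresh slot; ecx still sees 42
print(table.read("eax"), table.read("ecx"))
```

      In this model the move changes only the alias table, which is the sense in which the data is "left in place".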

      Now, I admit I haven't spent a lot of time looking through my x86-64 manuals, because it's been vaporware until fairly recently, and furthermore they lied to us about how many HT buses would be on each flavor of processor right up until the very end, so I won't be buying anything until either they bring out an Athlon 64 MP which has the missing hypertransport bus re-added, or until the Athlon 64 brings down the price of Opteron processors. My Athlon XP is holding me for the time being, and besides, there's no 64 bit windows yet. Even after there IS a 64 bit windows, I expect to have to wait a little while for some of my drivers. So it hasn't really been a serious consideration for me. But I suspect that in many cases they have provided us with new instructions to replace the old instructions which require that the result go into specific registers.

      So, x86-64 has 16 GPRs, plus 16 "XMM" registers for SSE/SSE2, not to mention it implements the SSE2 functions from the P4. I think it pretty effectively does all the things it needs to do. Meanwhile it still has hardware solutions for emulating all the deficiencies of the x86 so that it can maintain backwards compatibility without sloowowwwwwing dooowowwwnnnn like itanic. It's the perfect solution for those persons who are not ready to give up their backwards compatibility, and it does not have the flaws that you and vlad assert. If you don't believe me, go root around AMD's site for the PDFs. Hell, I even got them to send me the paper documentation for free, which I intended to read in the bathroom. Unfortunately, even my wholly irrelevant nintendo summer 2003 catalog has been water damaged in there, so I'm definitely not going to venture into the latrine with my AMD technical docs.

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    4. Re:Performance doesn't come directly from 64 bits by scheme · · Score: 5, Informative
      Itanium is a full fix to the problem. The horrendous x86 ISA is completely replaced by an explicitly-parallel (EPIC) instruction set that has all the nice properties of a RISC machine (easier to compile for, less stress on the memory system as you get 128 registers, easier for the machine to decode the instructions as they're fixed-format and don't require RISC conversion, etc.). The problems with it are:
      1. You need a compiler that "knows" how to bundle instructions effectively (a VLIW compiler). GCC clearly isn't there yet (it's not uncommon for the Intel compiler to beat gcc by 30-50% when running computationally intensive stuff)
      2. Being completely different than x86, it can't be very efficient at emulating x86 programs.

      The Itanium ISA is elegant and clean in some places, but in others it is an ungodly mess of complicated things. Take the Register Stack Engine (RSE), for example. It's supposed to handle spilling registers to the stack and loading them back from the stack. This includes handling page faults, exceptions, interrupts, and memory errors. Oh yeah, this is supposed to be automatic and handled invisibly by hardware without software intervention. Hasn't happened yet.

      Also the EPIC ISA that the Itanium uses isn't easy to compile for. This is one of the biggest problems with the Itanium. It requires compilers to pull out a lot of parallelism in the code and present that to the hardware for execution. Intel sort of glossed over this when introducing the Itanium about 10 years ago and the compiler technology hasn't been able to really do this. So although the Intel compiler is better than gcc, it still isn't all that great.

      Incidentally, the Itanium does a better job at emulating the x86 ISA in software than in hardware. It was a big deal a few months ago when Intel introduced a software x86 emulator that offered a dramatic improvement over using the built-in hardware emulation.

      --
      "When you sit with a nice girl for two hours, it seems like two minutes. When you sit on a hot stove for two minutes, it
  2. Re:BAH! 286 is all you need. by carsont · · Score: 2, Informative

    As far as I know, the UltraSPARC made its debut in 1995, while the first 64-bit Alpha from DEC was announced in 1992. 64-bit MIPS and PA-RISC chips were probably sometime between those two dates. See here.

    --

    Ubi dubium, ibi libertas.
  3. Re:I wouldn't buy the Athlon anyway by Malor · · Score: 4, Informative

    Athlon chipsets sucked rocks for a long time, and were really unstable. But VIA finally got their act together, I think with the KT133A.

    AFAIK, other than stomping on occasional driver bugs, Athlon chips have been pretty excellent ever since. I have an Athlon 1900+ on an ASUS A7V333 that's rock solid, and a new Athlon 2500+ on an Nforce2 board that's not quite as solid, but which is still pretty good.

    I'd like to see some improvements on the NForce2 chip stability. It's not all the way there yet, in my opinion. But the VIA chipsets are extremely solid.

  4. Re:gotta compete by Waffle+Iron · · Score: 3, Informative
    But look at the PC architecture...the same outdated CISC architecture that was used in 1981 is still there in today's PC's.

    You do realize that there has been no such thing as a "CISC processor" since the Pentium Pro came out. Underneath the X86 bytecode VM, Pentium IVs, Athlons, etc. are highly advanced RISC cores with multiple concurrent execution units.

    The main reason that the huge expensive power sucking Itanium scrapes out a small lead in benchmarks over X86 CPUs is because of its expensive huge power sucking cache.

  5. Re:IInntteerreessttiinng by Waffle+Iron · · Score: 5, Informative

    Your guess is basically on the right track. I don't want to violate any NDAs, but let me just say that the AAA and AAS opcodes will now support Unicode.

  6. No, it isn't by Groo+Wanderer · · Score: 5, Informative

    As the author of the article, I had to REALLY make things vague. The people involved would be hurt badly by Intel if their names got out. Some of the situations that were told to me make it quite apparent who was leaking. That was as specific as I could make it :(.

    -Charlie

  7. Re:gotta compete by EastCoastSurfer · · Score: 2, Informative

    gotta compete. Intel had to come up with something better (cost-effective) than Athlon 64. If there was no competition, we would be still using 8088/6

    LOL, Intel is actually their largest competitor. Every time they release a new chip guess who they are primarily up against? People who are running other Intel chips.

    Without AMD though, I'm sure Intel would keep their new chips at higher price points for a bit longer and milk the power user crowd for a little more money.

  8. Re:There's more to it than 64-bit instructions by Wesley+Felter · · Score: 3, Informative

    Intel x86 CPUs can already address 36 bits of physical memory, which should be enough for the next few years.

    Intel doesn't do on-board memory controllers because they got burned by Timna.

  9. Look to the G5 by Groo+Wanderer · · Score: 2, Informative

    A really good example of what you are talking about is the G5. It simply extends an efficient architecture to 64 bits. Other than upping the memory limit, it does precious little for performance. The chip in 32 bit mode is about as fast as in 64 bit mode, and only starts to show a difference when memory usage gets large.

    As for AMD, you can see the effect by running a program in 32 bit mode, then running a 32 bit program recompiled to take advantage of the registers in 'compatibility' mode. There is quite a difference.

    -Charlie

  10. Actually, do go get that AMD or IBM chip by droleary · · Score: 2, Informative

    Don't run out and buy an Athlon 64 just yet...

    Anyone for whom 64 bits might make a difference is steering clear of Intel, which has stated it's not going to focus on that desktop market for another 5 years. So all this article amounts to is Prescott FUD to support Intel's (misguided) roadmap.

    Disclaimer: I own some AMD stock and I do my Unix development on Mac OS X.

  11. Re:gotta compete by Waffle+Iron · · Score: 3, Informative
    The logic to convert x86 instructions to micro-ops takes up space on the die and uses extra power.

    However, the compact legacy CISC instruction set does conserve on instruction cache space. This offsets much of the cost of the conversion logic. Moreover, it allows custom optimizations for the exact architecture du jour without affecting binary compatibility.

    And any way you look at it, you have to read from memory a lot more often with 8 "general purpose" registers than 32 real GPRs, which is what most sane CPUs have.

    Many modern X86 CPUs have more than 32 real GPRs which are utilized by register renaming. Like quantum mechanics, the processor state for any given instruction is smeared out over time and space, and the CPU is operating on many instructions simultaneously. The number of visible registers just doesn't matter as much as it would seem on the surface.

    Itanium doesn't have to do this, PowerPC doesn't have to do this, no modern ISA requires this nonsense.

    They will when somebody figures out the next architecture trick that doesn't match the assumption of the designers of their ISAs. Take a look at history; remember when MIPS stood for "Microprocessor Without Interlocked Pipeline Stages"? What did the R4000 introduce? Could it be - interlocked pipeline stages? Exposing CPU implementation details to the software is not something that wears well over time.

    It doesn't just "scrape out a lead" in floating-point benchmarks, it absolutely destroys the x86 competition.

    That's because the FPU has not been very important in the X86 market up to this point. Business and multimedia apps just don't need it. If AMD or Intel put their efforts into an X86 with ultimate FPU performance, it could match or beat the Itanium.

    I suspect that Intel took advantage of the huge schedule delays in the Itanium to throw in more FPU horsepower because it had to beat the consumer-grade chips on something.

    And oh yeah, it's running at what, half the clockspeed of the P4? If Itanium had the same economies of scale behind it at this point, there would be no competition.

    As I said, the cache and memory architecture is the primary factor in the performance of CPUs today. Clockspeed, instruction set, registers ... who cares? Everything that's not cache is only a small fraction of the die size.

    All of that hardware architecture stuff is a red herring. Worrying about those non-issues has caused the Itanium schedule to slip nearly a decade while they desperately tried to write a C compiler that could statically wring out performance from their brittle concurrent execution model without the benefits of the run-time statistics information available to the X86 code translators.

  12. Good point, one little problem. by Groo+Wanderer · · Score: 4, Informative

    I didn't consider timing when I wrote the story, or any of its predecessors. Silly, as I am going to the A64 launch Tuesday. Anyway, I have been chasing this story since the chip-architect articles. The timing was unfortunate, but it wasn't an Intel plant, that much I can assure you.

    For about 3 months, I have known there was 64 bit functionality there, but I didn't have enough to prove it to my own satisfaction. I chased leads, interviewed people, and got that info.

    The fact that IDF brought me into close proximity with a ton of sources was the thing that got me so much info so quickly. There was only one thing from Intel directly, the rest were from third parties supporting the chip. If IDF had happened last January, I probably would have gotten the info then.

    -Charlie

  13. amd64 CPUs available _now_ by Brian+Ristuccia · · Score: 4, Informative

    You can order amd64 systems from places like Appro and Penguin Computing right now, with decent-sized collections of 64-bit applications provided by popular distributions such as SuSE. Let's not forget that amd64 CPUs can run ia32 binaries faster than many ia32 CPUs. On a system with an amd64 kernel, they also allow more aggregate address space across processes and let you install tremendous amounts of physical memory for buffers and cache, even if individual processes can only take advantage of a few gigabytes.

    With other groups like the Debian project well underway in their amd64 porting efforts, you can expect thousands of popular applications built for the amd64 platform. There's tons of software available for amd64 already, and you can bet by the time that AMD releases their "Athlon64" or whatever they're targeting the low-end market with, there will be even more.

  14. Re:Hrmm by Groo+Wanderer · · Score: 2, Informative

    According to Le Inq, Prescott takes more than that.
    http://www.theinquirer.net/?article=11588
    Now these may have been taken from a roadmap that I really should not have seen, but you can see that the 100w number is a bit conservative. The next few generations are specced to narrow the gap between min and max power usage, but not lessen it. Depressing.

    -Charlie

  15. Re:I wouldn't buy the Athlon anyway by Sj0 · · Score: 2, Informative

    I'd like to inform you that any modern processor in a laptop will run hot. If you don't believe that to be the case, I invite you to run a p3 or p4 laptop on your lap for several hours.

    Tell me, Amsterdam Vallon, what broke on your AMD college computer? Unless it was a defect with the construction of an AMD processor, your point will prove irrelevant. I'm using an AMD processor right now, and my Windows 2000 machine got a virus thanks to IE and broke. That's not AMD's fault. My old motherboard needed a flash upgrade to use an XP 1800+. That's not AMD's fault. My hard drive was an old 1 GB and after years of service, died. That's not AMD's fault. Furthermore, if you managed to crush your core, or if you installed inadequate cooling or did a substandard installation initially in any way, you cannot blame AMD. They make processors. Installing a P4 with some sub-par Aladdin chipset motherboard by PC-CHIPS, a 100 watt power supply, an IBM DeathStar hard drive, cheap RAM made in some communist country and a Socket 7 heatsink will result in your machine breaking as well.

    For the record, I'm on my fourth Athlon. I've used the chips without problems, upgraded without a hitch, and run the new chip without problems until I decided to upgrade again. My next machine is undoubtedly going to be an Athlon 64 as a result of the quality I've witnessed.

    --
    It's been a long time.
  16. Re:gotta compete by Anonymous Coward · · Score: 1, Informative

    To be more accurate, Intel didn't do it first; the NexGen Nx586 did. AMD bought NexGen, in part b/c their K5 sucked ass, and it was the NexGen engineers (in part at least) who made the K6 as well.
    So, it wasn't just 6th gen processors that had RISC emulating CISC.
    --
    tabris

  17. Compiler by vlad_petric · · Score: 2, Informative
    To be more specific, the compiler has to build traces (or hypertraces) from multiple basic blocks, as the level of parallelism in a basic block is just too small (this is also called "Flynn's bottleneck"). To do this properly you need profiling. JITs and software interpreters can do this on the fly (i.e. you don't need two-step compilation), and that's the reason software emulation does better than the hardware one (note: VLIW scheduling in hardware is possible, but no processor does this AFAIK)
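
    A minimal sketch of profile-guided trace selection, assuming the profile is available as per-edge execution counts. The function `build_trace`, the block names, and the counts are invented for illustration; a real trace scheduler also has to handle side exits and compensation code, which this ignores.

```python
def build_trace(edge_counts, start):
    """Greedily follow the most frequently taken edge out of each basic block.

    edge_counts maps a block name to a dict of {successor: times_taken}.
    The trace stops at a block with no successors or when it would loop back.
    """
    trace = [start]
    seen = {start}
    block = start
    while True:
        successors = edge_counts.get(block, {})
        if not successors:
            break
        hottest = max(successors, key=successors.get)
        if hottest in seen:
            break
        trace.append(hottest)
        seen.add(hottest)
        block = hottest
    return trace


# Hypothetical profile: the branch in "check" goes to "fast_path" 95% of the time.
profile = {
    "entry": {"check": 1000},
    "check": {"fast_path": 950, "slow_path": 50},
    "fast_path": {"exit": 950},
    "slow_path": {"exit": 50},
    "exit": {},
}
print(build_trace(profile, "entry"))
```

    The resulting chain of blocks is what gets scheduled as one unit, so the scheduler sees more independent instructions than any single basic block would offer.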

    I also agree with you about RSE being a mess - but stackable registers (similar to register windows on SPARC) are a very effective mechanism for reducing memory accesses. It does make out-of-order execution a living hell, but in the end it all comes down to stressing the memory less, as RAM doesn't follow Moore's curve ...

    --

    The Raven

  18. x86 has been 48-bit for years by Animats · · Score: 2, Informative

    x86 machines have had 48-bit address spaces for years. Some of them even bring out a few more pins, so you can address more than 4GB of memory. It's even supported by both Linux and Windows. You can't have more than 4GB per process space, but you can have more than 4GB in the machine. Works fine.

  19. Re:This is pure speculation by mauriceh · · Score: 2, Informative

    Mike Magee STARTED The Register, and still owns part of it.
    He left due to an internal disagreement and started The Inquirer.

    Get some facts before you babble my friend!

    --
    Maurice W. Hilarius Voice: (778) 347-9907
  20. Re:Itanium? by Anonymous Coward · · Score: 1, Informative

    88000 was a Motorola RISC processor from the late 1980's. Perhaps you mean Intel i860? (Intel's late 1980's RISC processor.)

  21. Re:Hrmm by jonadab · · Score: 2, Informative

    You don't upgrade to get a faster CPU. Not these days. You upgrade
    for other reasons -- your old motherboard is maxed out for RAM, and
    you need more. Your old motherboard is USB1.1 and you want 2.0. You
    could get an expansion card, but you've only got one slot left and you
    really wanted to add IEEEwhateverthenumberisforthattrademarkedbus.
    The new board supports SATA RAID, which will give you a performance
    boost for disk-intensive applications. And so on and so forth.

    Do you go for a faster CPU while you're upgrading? Well, sure.
    Nobody wants to buy a new computer with the same MHz number as the
    old one, for psychological reasons if nothing else. But unless you
    raytrace animations for a living or something, it's probably not the
    thing driving you to upgrade.

    --
    Cut that out, or I will ship you to Norilsk in a box.