Slashdot Mirror


Itanium Update

NegaMaxAlphaBeta writes: "For those of you interested in Intel's Itanium 64 bit processor, EETimes has a nice update article to let us know what's happening with this beast. With an 8 stage pipeline, as opposed to the 20 stage pipeline in the P4, clock frequencies are obviously not as high (~1 GHz). Other notable numbers extracted from the article: 130 Watts power consumption, 328 registers, 6 MB of onchip L3 cache ... quite nice (well, not the power thing). I'm sure many people can appreciate 64 bit integer ops; for me, it means single instruction xor for the 64 bit hash codes used in chess transposition tables."

7 of 297 comments (clear)

  1. Re: "HyperThreading" in IA-64 by 2002 by Bodero · · Score: 5, Informative
    Hyperthreading, as implemented, exists in the Pentium 4 line.

    Right. And there's no indication that something similar will appear in IA64 until at least 2006 (which is the *earliest* that the Alpha team could likely add it to that complex - or if you prefer messy - an architecture if the hooks for it weren't already built in).

    It's a weak second to SMT. With HT, as I understood it, if a processor happens to have a floating point op and an integer op on hand at the same time, it can run both of 'em at once, instead of sequentially. That's the limit to the HT magic. It can't do two FP or integer ops at once.

    Well, real-world server applications could be sped up by 30%, which would mean that HT could execute multiple *non*-FP instructions at once (and the article doesn't say it can't, just that it can't execute two FP ops at once).

    It actually seems to look quite a bit like EV8's SMT, except that we don't know if it currently adds more execution units to the P4 architecture and whether all execution units can be applied to service a single thread if multiple threads aren't present. And, of course, it only supports two concurrent threads rather than four.

    Intel stole and then implemented Alpha technologies for its Pentium, and only much later did it negotiate with Digital to get the official right to use that stuff.

    No: I'm assessing the situation, unlike your propensity for drawing conclusions based on vague speculation and no data.

    IA64 has to all appearances been developed with zero attention paid to things like out-of-order execution (in fact, it was developed explicitly to *avoid* out-of-order execution). OOO and SMT are intimately intertwined in EV8's SMT design, and apparently also in HT's. There's no indication that Intel has until now given any thought toward incorporating SMT/HT technology in EPIC, and every indication that it will thus take at least close to 5 years before such IA64 technology hits the street (especially as incorporating it into EPIC will almost certainly involve radically different internal approaches than those used to incorporate it into EV8 and P4).

  2. Re:What a dog by deranged+unix+nut · · Score: 3, Informative

    I'm sure it is proprietary, but Intel has written it's own optimizing compiler for the IA64 instruction set.

    It is an interesting solution to the performance problem: Rather than just increase clock speed again, figure out the performance details at compile time and arrange the code to help the processor run it more efficiently.

    For example, if you have an if statement and the compiler can determine that 95% of the time the TRUE block will be executed, the code can be arranged so the branch prediction will choose the more frequent route and the pipeline penalty won't need to be paid as frequently. (This is just a simple case of optimization, the IA64 will require insanely complex optimizations, but that is just expanding on what compiler writers have been doing for years.)

    It makes the compiler orders of magnitude more complex, but it could potentially increase execution speed by a couple orders of magnitude too.

  3. Re:IA64 is the "heir apparent" by DarkEdgeX · · Score: 3, Informative
    Pentium III: CPUID - A 'workstation idea' that once again missed its market. Maybe if they'd found a way to node-lock software that can't be used for machine tracing. Maybe that's not what they were after.

    I think you're confusing CPUID with Processor Serial Number (PSN).. PSN, IMHO, was a good idea, but the privacy zealots cried foul and ruined an otherwise good way to lock software to a specific individuals CPU. (YES, I know there are work-arounds that pirates can use (from simply hex-editting the instructions that check for the PSN to writing drivers return false info).) I really wish Intel hadn't backed down on PSN and included it in the P4 (afterall, for those naysayers that don't want PSN, or their identity, revealed to websites or software, you can disable it in the BIOS).

    Oh well. Thought I'd clear that up. CPUID is GOOD. PSN is BAD (to the privacy folk, anyways).

    --
    All I know about Bush is I had a good job when Clinton was president.
  4. Re:Compiler by anonymous+loser · · Score: 5, Informative

    Apparently you're not familiar with VLIW processor design. It's not "throwing it off" to the software guys because it's too difficult to implement. It is dramatically reducing the complexity of the pipeline, thereby increasing throughput by orders of magnitude (see CISC vs. RISC).

    And the compiler has far *more* information than the runtime hardware has. The scheduling hardware is only capable of looking a few instructions at a time to decide how to enhance ILP, whereas the compiler by its very nature has access to the entire program at once, and can perform optimizations not possible in hardware.

    This is further enhanced by a development cycle that includes profiling. As you use the program during development, the compiler can use the same profiling information that is used to "manually" optimize code to perform its own optimizations. With an advanced OS, this become extremely powerful, as some of the registers on the processor actually keep track of profile data at runtime. Then, during page swaps to/from virtual memory, the processor has the opportunity to dynamically optimize and recompile the code.

  5. Re:So, McKinley isn't a properly designed system? by Saidin · · Score: 2, Informative

    McKinley is the CPU, the system in the box that the CPU gets plugged into. So, if the box (system) is properly designed, the CPU never needs to throttle itself.

  6. There was already a 64-bit xor! by Anonymous Coward · · Score: 1, Informative

    for me, it means single instruction xor for the 64 bit hash codes used in chess transposition tables

    Try PXOR in the MMX instruction set. It's been there for years.

  7. Re:Whew! It's fun to be over your head. by doug363 · · Score: 2, Informative
    I think I've figured out what the whole 64-bit thing is about. It means that each instruction (right term?) has more capacity to carry data. This doesn't necessarily mean that it will be twice as fast, of course, because not all instructions are that large.

    Yup, exactly right. It means that the CPU tends to deal mainly with 64-bit (8 byte) chunks of data at a time, instead of the more common 32-bit chunks. As far as programming goes, not everything needs larger instructions. For example, to program a user interface, 32 bit integers are quite sufficient for most purposes (unless you have over 4 billion items in a listbox or something). If you only need to store a number from 1 to 10, using 8 bytes instead of 4 is a waste of memory. (This happens a lot.) However, it is useful for many operations, such as multimedia, games, DSP applications, crypto, etc. etc. These applications would run faster on a 64-bit processor because they can use 1 instruction to manipulate a 64 bit number instead of 2 or more that are necessary to do the same thing on a 32-bit processor.

    The other reason to use 64-bit processors is that it makes it easier to use 64-bit memory addressing. (For various reasons, it's a little easier to program if memory addresses are the same size as integers.) If you have more than 4 GB of RAM, (or you want more than a 4GB address space more precisely) then you need larger pointers. At the moment x86 programs use 32 bit pointers, but the Pentiums and above actually have 36 address lines, so they can use up to 64GB of RAM. Anyway, a 4 GB address space will be fairly cramped in about 10 years, so it's time they bumped that up a bit.

    Intel has an emulation mode in the IA-64 series to allow people to run existing 32-bit programs, but at the moment it's dog slow. (It runs at about the speed of a Pentium 133, if that, when the processor is running at around 700 MHz.) The IA-64 architecture is completely different from the current IA-32 (x86) stuff. I get the impression that the 32 bit emulation doesn't use as many tricks as the existing processors to get programs to run faster. They're also overhauling the motherboard/BIOS stuff that's been around for a long while. (Some of it since the original IBM PC.)

    Of course, just because a processor can do 64-bit operations, it doesn't mean that it's actually faster than its predecessors. For instance, IA-64 has a few weaknesses:

    • It doesn't have an integer multiply instruction. You have to convert to floats and back if you don't want to program the multiply using shifts or something.
    • It doesn't support a floating-point type with better precision than 64 bits (called "double" in many programming languages). This makes it unsuitable for high-precision calculations. Current IA-32 chips can use up to 80 bit floating point values.
    • Intel seems to have tried to include every feature (except see above) but the kitchen sink in the instruction set. Loads of processor hints about instruction grouping, branch prediction, cache hints, and heaps of other stuff. This makes quite a complex design that could be difficult to implement and write really good compilers for. (Then again, Intel could always sell their own...)
    • And all of the space-heater comments.
    Anyway, it remains to be seen what effect the above points will have on its acceptance.