Itanium Update
NegaMaxAlphaBeta writes: "For those of you interested in Intel's Itanium 64 bit processor, EETimes has a nice update article to let us know what's happening with this beast. With an 8 stage pipeline, as opposed to the 20 stage pipeline in the P4, clock frequencies are obviously not as high (~1 GHz). Other notable numbers extracted from the article: 130 Watts power consumption, 328 registers, 6 MB of onchip L3 cache ... quite nice (well, not the power thing). I'm sure many people can appreciate 64 bit integer ops; for me, it means single instruction xor for the 64 bit hash codes used in chess transposition tables."
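For anyone wondering what the 64-bit hash codes in the summary look like in practice, here is a minimal sketch of Zobrist-style hashing, which is the usual way chess programs key a transposition table. The names (`make_zobrist_table`, `hash_position`, `apply_move`), the piece encoding, and the seed are all made up for illustration and are not taken from any particular engine. Python integers are arbitrary precision, so this shows only the logic; on a 64-bit machine each `^=` below would map to one integer XOR, whereas a 32-bit CPU needs two.

```python
import random

PIECES = "PNBRQKpnbrqk"   # 12 piece kinds (hypothetical encoding)
SQUARES = 64
MASK64 = (1 << 64) - 1


def make_zobrist_table(seed=2001):
    """Build one random 64-bit key per (piece, square) pair."""
    rng = random.Random(seed)  # fixed seed so hashes are reproducible
    return [[rng.getrandbits(64) for _ in range(SQUARES)] for _ in PIECES]


def hash_position(table, position):
    """Hash a position given as {square_index: piece_char} from scratch."""
    h = 0
    for square, piece in position.items():
        h ^= table[PIECES.index(piece)][square]
    return h


def apply_move(table, h, piece, src, dst):
    """Incrementally update the hash for a quiet (non-capturing) move."""
    p = PIECES.index(piece)
    h ^= table[p][src]   # XOR the piece out of its old square
    h ^= table[p][dst]   # XOR it into its new square
    return h
```

The point of the scheme is the incremental update: making or unmaking a move costs a couple of XORs instead of rehashing the whole board, which is why cheap 64-bit integer ops matter to a search that visits millions of positions.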
Right. And there's no indication that something similar will appear in IA64 until at least 2006 (which is the *earliest* that the Alpha team could likely add it to an architecture that complex, or if you prefer, that messy, if the hooks for it weren't already built in).
It's a weak second to SMT. With HT, as I understood it, if a processor happens to have a floating point op and an integer op on hand at the same time, it can run both of 'em at once, instead of sequentially. That's the limit to the HT magic. It can't do two FP or integer ops at once.
Well, real-world server applications could be sped up by 30%, which would mean that HT could execute multiple *non*-FP instructions at once (and the article doesn't say it can't, just that it can't execute two FP ops at once).
It actually seems to look quite a bit like EV8's SMT, except that we don't know if it currently adds more execution units to the P4 architecture and whether all execution units can be applied to service a single thread if multiple threads aren't present. And, of course, it only supports two concurrent threads rather than four.
Intel stole and then implemented Alpha technologies for its Pentium, and only much later did it negotiate with Digital to get the official right to use that stuff.
No: I'm assessing the situation, unlike your propensity for drawing conclusions based on vague speculation and no data.
IA64 has to all appearances been developed with zero attention paid to things like out-of-order execution (in fact, it was developed explicitly to *avoid* out-of-order execution). OOO and SMT are intimately intertwined in EV8's SMT design, and apparently also in HT's. There's no indication that Intel has until now given any thought toward incorporating SMT/HT technology in EPIC, and every indication that it will thus take at least close to 5 years before such IA64 technology hits the street (especially as incorporating it into EPIC will almost certainly involve radically different internal approaches than those used to incorporate it into EV8 and P4).
The way I understand it, Intel bought Alpha not to praise it, but to bury it.
I'm sure it is proprietary, but Intel has written its own optimizing compiler for the IA64 instruction set.
It is an interesting solution to the performance problem: Rather than just increase clock speed again, figure out the performance details at compile time and arrange the code to help the processor run it more efficiently.
For example, if you have an if statement and the compiler can determine that 95% of the time the TRUE block will be executed, the code can be arranged so the branch prediction will choose the more frequent route and the pipeline penalty won't need to be paid as frequently. (This is just a simple case of optimization, the IA64 will require insanely complex optimizations, but that is just expanding on what compiler writers have been doing for years.)
It makes the compiler orders of magnitude more complex, but it could potentially increase execution speed by a couple orders of magnitude too.
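To put a rough number on the if-statement example, here is a back-of-the-envelope sketch. It assumes, as a simplification, that a mispredicted branch costs about one pipeline's worth of cycles; the 8- and 20-stage depths are the figures quoted in the story summary, and the function name is hypothetical. Real penalties depend on where in the pipeline the branch resolves.

```python
def expected_branch_penalty(p_mispredict, flush_cycles):
    """Average cycles lost per branch = misprediction rate * flush cost."""
    return p_mispredict * flush_cycles


# Branch laid out for the 95% case: wrong 5% of the time.
well_laid_out_deep = expected_branch_penalty(0.05, 20)     # 20-stage pipeline
well_laid_out_short = expected_branch_penalty(0.05, 8)     # 8-stage pipeline

# Same branch with no useful information: wrong about half the time.
coin_flip_deep = expected_branch_penalty(0.50, 20)
```

Under these assumptions the saving is a handful of cycles per branch, so how much it matters overall depends on how branch-heavy the code is.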
So when most people go out and buy a computer, they see a lot of MHz and think it's really fast. So if they're used to 2 GHz+ Pentiums, why would they even think of buying a 1 GHz Itanium? Sure, I know it'll probably be faster, but how does Intel plan to market these? Will they also drop MHz ratings like AMD? Or will they go on some major re-education campaign, like Apple?
F-bacher
James Tiberius Kirk: "Spock, the women on your planet are logical. No other planet in the galaxy can make that claim."
...the Itanium product line will see its speed increase from 800 to 1 GHz, which is half the frequency of the company's fastest 2-GHz Pentium 4....Intel contends, however, that the faster front-side bus, more on-chip memory and redundant logic resources will more than make up for the processor's lag in clock speed.
We can only hope that this chip helps the media away from using clock speed as the primary (often only) measure of performance.
for me, it means single instruction xor for the 64 bit hash codes used in chess transposition tables
Watch where you say that, or you'll be using that nifty Itanium to repel the hordes of women instinctively flocking to you like the salmon of Capistrano.
Is anyone else as completely stunned as I am that essentially everyone (except AMD) has rolled over and allowed the IA64 to be crowned heir apparent as the new high-end microprocessor? The Alpha is dead by acquisition, HPPA is dead by partnership, MIPS is lost somewhere in the low end, and Sparc and Power4 are both retreating upstream.
It's amazing that ANYONE can field the number of mistakes that Intel has, and get away with it. For some time now, their first-outs have been essentially flops:
Pentium: Remember the 5V room heaters?
Pentium: Then the 3.3V units with floating point bugs?
Pentium Pro: The ancestor of the Pentium II/III line was a good CPU in its own right, and worked well for Unix and OS/2. But it completely missed the market, performing terribly on 16 bit code.
Celeron: DeCeleron, until they put the cache back on. From another point of view, the whole Celeron program has been a disaster, either by its own crippling, or by revealing how overpriced the PII/PIII line is.
Pentium III: CPUID - A 'workstation idea' that once again missed its market. Maybe if they'd found a way to node-lock software that can't be used for machine tracing. Maybe that's not what they were after.
Pentium 4: Let's face it, this CPU is just plain uneven and imbalanced. After a round of redesign to even it out, just like with the others, it could very well be an excellent CPU. Tame the prefetch, expand the trace cache, etc.
Itanium: Didn't even make it out the door before spin-doctoring began. "Just wait for McKinley!" I've already heard one set of rumors that McKinley isn't going to *really* do it either, so just wait for IA64-III.
Is all this any better than the "Just wait for this new release!" that Microsoft keeps pulling? Though I guess Intel does generally get each family right on the second shot.
AMD has a good product, I just wish they were a little less mum, and had a better response than warmed-over P-numbers. I also wish we could hear a bit more noise about the Hammers.
The living have better things to do than to continue hating the dead.
(bolding is my emphasis)
To protect against heat-related system meltdowns, McKinley includes a programmable thermal trip that can throttle processor performance by 40 percent to cut power consumption. But the company sees that more as a safety net, not as an answer to thermal issues. "This should never be needed in a properly designed system," said Naffziger.
Apparently you're not familiar with VLIW processor design. It's not "throwing it off" to the software guys because it's too difficult to implement. It is dramatically reducing the complexity of the pipeline, thereby increasing throughput by orders of magnitude (see CISC vs. RISC).
And the compiler has far *more* information than the runtime hardware has. The scheduling hardware is only capable of looking a few instructions at a time to decide how to enhance ILP, whereas the compiler by its very nature has access to the entire program at once, and can perform optimizations not possible in hardware.
This is further enhanced by a development cycle that includes profiling. As you use the program during development, the compiler can use the same profiling information that is used to "manually" optimize code to perform its own optimizations. With an advanced OS, this becomes extremely powerful, as some of the registers on the processor actually keep track of profile data at runtime. Then, during page swaps to/from virtual memory, the processor has the opportunity to dynamically optimize and recompile the code.
328 *physical* registers, not logical (ISA-accessible) ones. With 128 registers to save, context switches will hurt IA64 big time. Yet another bad design decision of the Itanic.
A context switch happens once in a blue moon. Fast context switches are not going to make up for sluggish performance for the real work the machine is doing between context switches. Registers are considerably faster than cache; the absolutely fastest cache in the world is P4's L1 cache which has a load latency of 2 cycles, and on most architectures it is 3 cycles. Putting 128 qwords into registers is an absolutely dramatic speedup for programs which have a working set more than 8 dwords (all that IA-32 gives you).
Mattel offered "barbie" and "hot wheels" computers earlier this year ... maybe intel could go in with Mattel and offer an Easy Bake Itanium computer.
Free Techno/Jazz/DNB/MI Music by guys obsessed with monkeys!
This makes me wonder, how many Crusoe processors could you put in a box (all other components equal) and equal this power consumption? Would the performance of such a box meet or exceed the performance of an Itanium box for real-world servers?
For all intensive purposes, "whom" is no longer a word. That begs the question, "who cares"?
With an 8 stage pipeline, as opposed to the 20 stage pipeline in the P4, clock frequencies are obviously not as high (~1 GHz).
...6mb of on die cache...
This beast has a small wang... it's not the size that counts, but how you use it. (No giggling from the girls, dammit.)
130 Watts power consumption...
Who needs space heaters anyway?
OY! Hold your wallet tight, not for the light bank accounted!
I'm sure many people can appreciate 64 bit integer ops; for me, it means single instruction xor for the 64 bit hash codes used in chess transposition tables.
Not quite what the intel boys will be using in their next commercial. However, the wizards in marketing will be stressing the enhanced features of porn browsing. The fourth blue intel commando will be a scantily clad woman... further emphasizing the need for this processor which will not just make the internet faster, but will speed on your favorite pron sights.
"You should always go to other people's funerals; otherwise, they won't come to yours." -- Yogi Berra
> It is an interesting solution to the performance
> problem: Rather than just increase clock speed
> again, figure out the performance details at
> compile time and arrange the code to help the
> processor run it more efficiently.
That is neither interesting nor a solution. People (i.e. compiler writers) have been working on this for forty years with some (limited) success.
> the IA64 will require insanely complex
> optimizations, but that is just expanding on
> what compiler writers have been doing for years.
Just because the IA64 demands heroic compiler optimization to make up for its shortcomings doesn't mean that the ability to write such compilers will suddenly spring out of nowhere.
Compiler researchers haven't just been sitting on their butts for the last forty years.
> For example, if you have an if statement and the
> compiler can determine that 95% of the time the
> TRUE block will be executed, the code can be
> arranged so the branch prediction will choose
> the more frequent route and the pipeline penalty
> won't need to be paid as frequently.
This was a bad example. Dynamic branch predictors (such as you find in any modern fast CPU) do a great job in practice, better than any known static predictors.
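To make the static-versus-dynamic point concrete, here is a toy simulation. It compares a fixed "always predict taken" guess against a 2-bit saturating counter, which is the textbook building block of dynamic predictors and far simpler than what a real CPU uses. The function name and the branch traces are invented for illustration.

```python
def simulate(outcomes):
    """Return (static_hits, dynamic_hits) for a list of branch outcomes.

    static  : always predicts "taken" (a fixed compile-time guess)
    dynamic : 2-bit saturating counter; 0-1 predict not taken, 2-3 taken
    """
    static_hits = 0
    dynamic_hits = 0
    counter = 2  # start in the "weakly taken" state
    for taken in outcomes:
        if taken:
            static_hits += 1
        if (counter >= 2) == taken:
            dynamic_hits += 1
        # Train the counter on the actual outcome, saturating at 0 and 3.
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return static_hits, dynamic_hits


# A branch that is taken for a while and then changes its behaviour.
steady = [True] * 95 + [False] * 5
phase_change = [True] * 50 + [False] * 50
```

On the `phase_change` trace the static guess is right only half the time, while the counter mispredicts twice and then adapts. For a branch that is randomly taken 95% of the time the two come out about even, so this sketch shows only why a predictor that can adapt wins when behaviour shifts at runtime.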
Well it's fairly obvious that you are an expert on cpu design.
I've programmed about a dozen chips in both the games field and compiler-writing field, I don't design chips any more than Eddie Irvine designs racing cars. But I don't think I'll ever see him getting into a tractor for his qualifying lap.
Raw speed became less important for most applications, so intel added mmx to speed up multimedia.
What planet are you on? MS and Intel have conspired to make raw speed as important as possible for years. I personally have been offered payment by Intel to produce slower software as part of their "everybody must upgrade" roadmap. MMX came as a direct response to the increasing performance of 3D boards which reduced the need for a faster CPU. Intel fear anything which reduces the need to upgrade so they tried to fight back with MMX. That fear led to the only significant addition to the instruction set since the 386.
Once a few quality compilers are around this won't even be an issue.
You grossly underestimate the difficulty of this instruction set. I doubt there will ever be more than one (i.e. Intel's) good compiler and I doubt there will ever be even one which is reliable and predictable.
"Encyclopedia" is to "Wikipedia" what "Library" is to "Some people at a bus stop"