Intel and HP Commit $10 billion to Boost Itanium

Intel just removed 32bit hardware support by TubeSteak · 2006-01-26 17:52 · Score: 3, Informative

http://www.digit-life.com/archive.shtml?2006/0125

an Intel spokesperson confirmed that the Montecito platform, which will premiere the company's next-generation 64-bit Itanium architecture, will dispense with executing all 32-bit instruction set applications on-die, prompting customers to opt instead for software-based emulation which Intel promises will be faster anyway.

The rest of the article is quite interesting. They claim that 32bit software emulation will outperform by "[greater than a factor of three]" their old hardware implementation.

Anyone want to tie this into their $10 billion push?

--
[Fuck Beta]
o0t!

It is rather uninspiring to see all the negativity by Superfarstucker · 2006-01-26 18:37 · Score: 3, Informative

Sure, it is a huge sum of cash and perhaps the 'shareholders' might get more short term benefit out of investing the same sum of money into commodity microprocessor R&D but the Itanium could eventually pay off in a big kind of way. It seems that most people posting here are just as impatient as shareholders when it comes to results, they want them NOW! Good things can't always manifest themselves in a short period of time and I think it is impressive that Intel & HP continue to invest money into something that has yet to produce any tangible benefits over existing architecture. I'm willing to bet that x86 isn't the omega to processor design ideology, and Itanium may not be either, but Intel & HP seem to believe it is a step in the right direction. Very few people that post here have the knowledge necessary to even begin assessing whether such a design may ever pan out and it appears the jury is still out among those who have the capacity to decide. Meanwhile Apple continues to receive gratuitous praise for releasing shiny white computers with chamfered corners. Maybe if Intel & HP invested 10 Bn into cosmetic processor design they would be received more favorably by the press.

Re:AMD64 by demachina · 2006-01-26 18:45 · Score: 4, Informative

It's pretty good for vectorizable Fortran codes like those typically run on supercomputers: finite element analysis, computational fluid dynamics, crash codes, and 3D molecular modeling. These kinds of codes can be scheduled by compilers to take full advantage of the instruction parallelism in Itanic's EPIC instruction set. Itanic is a dog on most of the C and C++ codes most of the rest of the world uses on their computers because compilers have a pretty hard time scheduling four instructions in parallel at compile time on C and C++ codes.

There is a market for Itanic in some traditional supercomputing applications but it is a relatively small market and has never been a big growth market. I really doubt Intel and HP will ever recover the billions they've already sunk into Itanic, let alone another $10 billion.

I imagine the people at AMD are dancing in the streets at this news because Intel and HP are going to keep throwing even more good money after bad trying to salvage this dog. It's money that they won't be investing in R&D in markets that really matter.

AMD can continue their push to dominate servers, workstations and desktops. If they could crack laptops, phones and embedded apps Intel would be in serious trouble.

--
@de_machina

Re:AMD64 by jayslambast · 2006-01-26 18:51 · Score: 4, Informative

Well, not in a small processor system, but once you start building larger and larger systems, Itanium (or Power5+) has the extra 'features' for error handling and reporting that x86 doesn't have. Xeon and Opteron have the error handling of a fleet of 1950's cars. Sure, they have a lot of horsepower, but when one breaks down it just stops running. You might have to drive the car a couple more times to determine whether it needs to be replaced. In a large computer system, this increases the downtime of the system. Itanium is like a BMW X3. Sure, it's a gas hog, and maybe has a little less horsepower, but when it breaks down, you have tons of status lights to tell you what's wrong, which processor is broken, and whether the part is still good (a correctable single-bit cache error) or needs to be replaced (an MBE on the FSB). In a large system, you can determine the source of the problem: whether it was an ignorable processor error, one that requires replacing the processor, or a chipset problem.
If any of you have ever put together a computer that has a bad part, you know it's sometimes really hard to figure out what caused the problem. The systems that Itaniums usually go in have the error detection and error logging to pinpoint exactly where problems lie. This is the reason Oracle DBs use these types of processors. It doesn't make sense for the common user to use Itanium, but companies like Amazon and Visa want these systems more for the reliability features than the speed.

I hope this works. by megabeck42 · 2006-01-26 18:52 · Score: 3, Informative

Truth be told, IA64 is a fantastically better architecture than IA32 or x86-64. Some of its current caveats, for example, suboptimal software support and high costs, are not due to its technical qualifications or drawbacks. Once the architecture reaches a critical mass and reasonable market acceptance, these issues should disappear. (more chips -> more people will target software for it, more chips produced in volume -> less cost per chip, etc.)

Its other caveats, for example, poor compiler support, are issues that need to be considered carefully. I'd like to specifically address the poor compiler support. I am not concerned about this issue for the following reasons:

1. Compilers can improve easily, with a recompile. If the architecture achieves a critical mass, then more people and organizations will justify the time and effort to improve compilers on the architecture. Not only can they improve, but taking advantage of such improvements would not require replacing hardware, which makes it an issue of time.

2. The architecture is much more realistic about the guarantees that it's willing to make as a processor. One of the early complaints was that the initial generation of compilers for IA64 would generate, on average, 40% NOPs. It's important to consider a few details when regarding that statement.
A. First, each clock cycle could allow the execution of up to 3 concurrent operations.
B. Second, the architecture is not inserting extra NOPs transparently into the pipeline, as almost all modern processors do in the event of a pipeline data hazard. This fact can be viewed different ways.
i. Most modern processors have to evaluate whether to insert a pipeline stall every single time that an instruction is executed. This is, essentially, wasted work because such a computation could be done by the assembler; however, it does spare the processor the burden of loading useless NOPs into the pipeline and the cache. On the other hand, minimizing the logic that a processor has to complete per cycle generally decreases the minimum amount of time necessary per clock (meaning that it could scale to higher clock speeds.)
ii. The immediate question is, does reading all these NOPs out of memory cause a bigger hit to performance than making the processor calculate the data hazards? Personally, I don't know. But, let's consider the idea for a moment. On both processors, let's assume that the instruction cache is fast enough to deliver data without wait states, assuming the cache has the data. When your processor is prefetching well, then the NOPs shouldn't be a big issue. (Except for the fact that the NOPs will now be in the binary, making the binaries larger. I consider this a moot point given the low cost of modern storage.) When your prefetcher can't anticipate correctly, though, I think the IA64 loses. Both IA64 and other modern architectures have branch predictors, so I suspect unanticipated branches which cause a pipeline flush (unavoidable) and unanticipated cache fills (unavoidable) will be mitigated roughly equally. But because the IA64 has longer instructions that aren't quite as dense, the IA64 will stall longer. Btw, I'm ignoring data stalls, to simplify my argument and because I don't think the architectural differences in the IA64 will significantly impact it. I'd enjoy being corrected on this point.
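To put rough numbers on the NOP question, here is a back-of-the-envelope sketch. It takes the 40% NOP figure and the 3 operations per bundle from the points above, and uses the fact that an IA-64 bundle is 128 bits (16 bytes). The function names and the 900-operation example are made up for illustration; this is arithmetic, not a measurement of any real compiler.

```python
from fractions import Fraction
from math import ceil

BUNDLE_BYTES = 16      # an IA-64 bundle is 128 bits
SLOTS_PER_BUNDLE = 3   # three instruction slots per bundle

def useful_ops_per_bundle(nop_fraction):
    """Average number of non-NOP slots in a bundle."""
    return SLOTS_PER_BUNDLE * (1 - nop_fraction)

def code_bytes(useful_ops, nop_fraction):
    """Bytes of instruction stream needed to hold `useful_ops` real operations."""
    bundles = ceil(Fraction(useful_ops) / useful_ops_per_bundle(nop_fraction))
    return bundles * BUNDLE_BYTES

nops = Fraction(2, 5)                # the quoted 40% NOP rate
print(useful_ops_per_bundle(nops))   # 9/5, i.e. 1.8 useful ops per bundle
print(code_bytes(900, nops))         # 8000 bytes
print(code_bytes(900, Fraction(0)))  # 4800 bytes if every slot were useful
```

Under these assumptions the instruction stream is 5/3 the size of a perfectly packed one, which is the extra instruction cache and fetch bandwidth cost being weighed against the simpler hazard logic.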
The IA64 includes a predicate register, which stores the results of comparison instructions. Instructions in an IA64 'bundle' can be qualified to be executed conditionally, based on the condition of a certain bit in the predicate register. This allows the IA64 to avoid some branches. The compiler/assembler can pack a bundle which includes the appropriate two instructions, each qualified to execute for different states of the predicate register. Essentially, the processor is simultaneously issued the commands for both p
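For anyone who hasn't seen predication before, a toy model of the idea described above might look like the following. It is plain Python standing in for hardware, so every name in it (the register names, the `bundle` list, both functions) is hypothetical. The one architectural detail assumed is that a compare can set a predicate and its complement, and that each instruction's result is kept only if its qualifying predicate is true.

```python
def select_branchy(a, b):
    """Ordinary control flow: only one arm is ever executed."""
    if a > b:
        return a - b
    return b - a

def select_predicated(a, b):
    """If-converted form: both arms are issued, predicates gate the writes."""
    regs = {"a": a, "b": b, "r": None}
    # A compare sets a predicate and its complement.
    preds = {"p1": a > b, "p2": not (a > b)}
    # (predicate, destination, operation) triples, issued unconditionally.
    bundle = [
        ("p1", "r", lambda r: r["a"] - r["b"]),
        ("p2", "r", lambda r: r["b"] - r["a"]),
    ]
    for pred, dest, op in bundle:
        value = op(regs)      # the work is done either way
        if preds[pred]:       # but only the qualified result is kept
            regs[dest] = value
    return regs["r"]

print(select_predicated(5, 3), select_predicated(3, 5))
```

The trade is visible even in the toy: the predicated version has no branch to mispredict, but it always spends issue slots on the arm whose result gets thrown away.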

--
fnord.

Re:I hope this works. by megabeck42 · 2006-01-26 20:07 · Score: 2, Informative

ad 1: Compilers can improve easily, with a recompile. this remark I consider extremely naive and it really, really hurts your credibility.

Agreed. The point I was trying to make was that realizing the benefits of compiler improvements requires updating your software, not replacing the processor. Obviously, recompiling the same software isn't going to be an advantage.

B huh? Are you mixing up RISC and VLIW (EPIC) designs?
No, I'm not mixing them up. I was trying to compare their merits.

Essentially, I tried to reason a guess to the following question.

What would be the effect of removing the data-hazard protection from the chip and relying on the compiler to insert explicit NOPs? I surmise that an unpredicted branch will hurt more on IA64.

Then I babble on for too long about different features. Sorry.

Look at Itanium's performance on data dependent branches, it is underwhelming...
This is unfortunate; do you know what is limiting the chip here?

Itanium greatly (like: insanely) benefits from repeated compile-execute-profile iterations of the benchmark.
Where, generally, does the compile-and-execute profile work improve things? Does it use the profiling output to hint the processor's branch predictor?

Patterson and Hennessy, Computer Organization and Design - on the shelf, well worn and well read.

I'll check out the others at the library.

--
fnord.

Re:I hope this works. by twitchingbug · 2006-01-26 21:51 · Score: 2, Informative

Agreed. The point I was trying to make was that realizing the benefits of compiler improvements requires updating your software, not replacing the processor. Obviously, recompiling the same software isn't going to be an advantage.
Ah... but you see, this is the problem. Improving compiler technology is extremely hard. Of course, the big hope in VLIW and EPIC architectures was that compiler technology would improve by some huge factor. This hasn't really panned out. Most code that we run is highly data dependent and branches way too frequently to parallelize anything. This is the same reason chips are moving to multiple cores now. It's hard to eke out that extra 3% single thread performance now - in chip or in the compiler.
From your original post...
Most modern processors have to evaluate whether to insert a pipeline stall every single time that an instruction is executed. This is, essentially, wasted work because such a computation could be done by the assembler; however, it does spare the processor the burden of loading useless NOPs into the pipeline and the cache
Uh, this doesn't make any sense. Inserting NOPs for data dependencies/cache misses/etc. doesn't "burden" processors. The only burden is if you happen to load your instruction stream with a ton of useless NOPs. Now I don't know IA-64 well, but somehow I doubt they removed all data dependency stalls - the instruction code explosion would be amazing. Your binaries would be huge.
Look at Itanium's performance on data dependent branches, it is underwhelming... This is unfortunate; do you know what is limiting the chip here?
Data dependent branches - the holdup is that it's a serial stream of instructions. You can't parallelize code at all if each instruction is dependent on the instruction before it.
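A quick way to see the serial-stream point is to count cycles for a toy machine. The sketch below assumes unit latency for every operation and a greedy scheduler that issues up to `width` ready operations per cycle; `schedule_cycles` and the two example graphs are invented for illustration and don't model any real Itanium pipeline.

```python
def schedule_cycles(deps, width):
    """Cycles needed to run all ops, issuing at most `width` ready ops per cycle.

    `deps` maps each op to the list of ops whose results it needs.
    """
    done = set()
    remaining = set(deps)
    cycles = 0
    while remaining:
        # An op is ready once everything it depends on finished in an earlier cycle.
        ready = sorted(op for op in remaining if all(p in done for p in deps[op]))
        if not ready:
            raise ValueError("dependency cycle")
        issued = ready[:width]
        done.update(issued)
        remaining.difference_update(issued)
        cycles += 1
    return cycles

chain = {i: ([i - 1] if i else []) for i in range(6)}   # each op needs the previous one
independent = {i: [] for i in range(6)}                 # no dependencies at all

print(schedule_cycles(chain, 3))        # 6
print(schedule_cycles(independent, 3))  # 2
```

Six independent operations finish in two cycles on the 3-wide toy machine, while the six-operation chain takes six cycles no matter how wide the machine is, and no amount of compile-time scheduling changes that.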
Where, generally, does the compile-and-execute profile work improve things? Does it use the profiling output to hint the processor's branch predictor?
No, you feed the profiling information back to the compiler, which will use loop counts and branch results to unroll certain loops, spend more time software pipelining heavily used loops, and move basic blocks around to reduce branching and increase block sizes. Then you'll get faster code. Of course, it's not unheard of for Intel or AMD to make specific compiler optimizations to speed up SPEC. When I say specific, I mean very specific: if you see a unique-only-to-SPEC block of code, then compile it into the nice hand-optimized assembly. :P
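As a rough illustration of that feedback loop, here is a minimal sketch of one decision a profile-guided compiler could make: which outcome of a branch to lay out as the straight-line path. The trace format, the branch name `loop_exit` and both functions are hypothetical, and real compilers use far richer profile data than this.

```python
from collections import Counter

def collect_profile(trace):
    """Count (branch_id, outcome) pairs seen during an instrumented run."""
    return Counter(trace)

def pick_fallthrough(profile, branch_id):
    """Lay out the more frequent outcome as the straight-line path."""
    taken = profile[(branch_id, "taken")]
    not_taken = profile[(branch_id, "not_taken")]
    return "taken" if taken > not_taken else "not_taken"

# Pretend training run: the loop exit branch is almost never taken.
trace = [("loop_exit", "not_taken")] * 97 + [("loop_exit", "taken")] * 3
profile = collect_profile(trace)
print(pick_fallthrough(profile, "loop_exit"))  # not_taken
```

The same counts could just as well drive unrolling or software pipelining choices; the point is only that the information flows back into the next compile, not into the chip's branch predictor.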

Re:Itanium vs. Ultrasparc T1 by raftpeople · 2006-01-26 19:25 · Score: 2, Informative

Itanium has been taking market share from Power????

"Sales of IBM's Unix systems, called the pSeries, grew 15% in the first quarter and 36% in the second quarter--far outpacing Sun and HP. The trend should continue in the fourth quarter--historically, industrywide Unix sales have spiked 25% during this period--and into 2006, when IBM introduces a new high-end chip called Power5+."

Re:Apple by be-fan · 2006-01-26 19:32 · Score: 2, Informative

The G6 would've been a POWER5 derivative. The POWER5 is a massively out-of-order RISC. Itanium is an in-order VLIW. They share nothing in common. The IP would've been useless.

--
A deep unwavering belief is a sure sign you're missing something...

Re:Itanium vs. Ultrasparc T1 by ChrisGilliard · 2006-01-26 20:06 · Score: 2, Informative

The T1 has terrible FP performance

Yes, you're right about this. The T1 can only do a single thread of floating point ops at a time. This is why it's being marketed to the web/app server market, which doesn't do many flops. Sun is working on a new chip code-named Rock which will address these issues. If I remember correctly, Rock will support 8 floating point threads at a time. It will also have some really awesome I/O lookahead features that allow a special 'thread' to read thousands of instructions ahead and look for I/O that can be started early. What the T1 is going to do to the webserver market, Rock will do to the high end number crunching market.

--
No Sigs!

Re:Itanium vs. Ultrasparc T1 by JonAnderson · 2006-01-26 23:04 · Score: 3, Informative

Your logic is based around the concept that every task is highly parallel - they're simply not.

Well, the server I am logged into right now has 358 processes running, each of which has at least 1 lwp, which equates to at least one thread. How many people have a server running one process with one thread?

Even Sun don't claim that a T1 is comparable to an Itanium/Power/Sparc for tasks which need a few fast cores, which is why they use examples like Java application servers as the primary benchmark.

Like SPECweb? Like SAP SD 2-tier? Like Lotus Notes? These are just the published benches.

The Ultrasparc T1 is not a high-end machine, it's a low end one designed to compete against cheap x86 machines, I think the main surprise for me is that it's not available in a blade form-factor.

Exactly. The T1 costs $26K in its most expensive config (32GB DDR2) for a 2U system capable of beating out bigger, hotter Itanic, x86 and Power systems on certain workloads (contrary to what you think, those certain workloads represent a significant segment of what customers buy these types of machines to do). There are definite plans to have a blade version out this year.

In 90% or more of workloads out there, a 32 thread core would have about 28 cores sat idle and 4 cores working flat out.

Really? That sounds like total bollocks to me.
