Intel Dumps Itanium's x86 Hardware Compatibility

Posted by CowboyNeal on Thursday January 19, 2006 @12:13PM from the back-to-the-drawing-board dept.

Spinlock_1977 writes "C|Net is running a story that Intel is going back to software x86 emulation on Itanium in order to reclaim chip real estate. (room for another 9MB of cache?) One notable quote about x86 emulation: 'Basically, no one ever used hardware-based IA-32 execution, so better to use the silicon for something else,' said Illuminata analyst Gordon Haff. 'Of course, basically no one uses software-based emulation either, but at least that doesn't cost chip real estate.'"
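
Running x86 code "in software" can be illustrated with a toy fetch-decode-execute loop. This is only a sketch: it handles four real 32-bit x86 opcodes (MOV EAX, imm32; ADD EAX, imm32; INC EAX; HLT), the function name `run` is made up, and real emulators typically translate whole blocks of x86 code into native instructions rather than interpreting one instruction at a time.

```python
import struct


def run(code: bytes) -> int:
    """Toy interpreter (hypothetical): execute a tiny x86 subset, return final EAX."""
    eax = 0
    pc = 0  # offset of the next byte to fetch
    while pc < len(code):
        opcode = code[pc]
        if opcode == 0xB8:  # MOV EAX, imm32 (little-endian immediate)
            (imm,) = struct.unpack_from("<I", code, pc + 1)
            eax = imm
            pc += 5
        elif opcode == 0x05:  # ADD EAX, imm32
            (imm,) = struct.unpack_from("<I", code, pc + 1)
            eax = (eax + imm) & 0xFFFFFFFF  # wrap like a 32-bit register
            pc += 5
        elif opcode == 0x40:  # INC EAX (one-byte form, valid in 32-bit mode)
            eax = (eax + 1) & 0xFFFFFFFF
            pc += 1
        elif opcode == 0xF4:  # HLT: stop the toy machine
            break
        else:
            raise NotImplementedError(f"opcode {opcode:#04x} at offset {pc}")
    return eax


program = bytes([0xB8, 0x2A, 0, 0, 0,   # mov eax, 42
                 0x05, 0x0A, 0, 0, 0,   # add eax, 10
                 0x40,                  # inc eax
                 0xF4])                 # hlt
print(run(program))  # prints 53
```

Every guest instruction costs several host operations (fetch, dispatch, update state), which is why a translating emulator is usually needed for acceptable speed.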

14 of 277 comments

Re:x86: Intel's biggest mistake by wiredlogic · 2006-01-19 12:33 · Score: 2, Informative

The 640KiB barrier was imposed by the IBM PC architecture, not by the 8086 hardware. The 8086 can directly address 1MiB of RAM, or 4MiB if you isolate each of CS, DS, SS, and ES into its own bank with additional decoding logic.
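
A quick sketch of the arithmetic behind those two numbers, assuming standard 8086 real-mode translation (segment × 16 + offset on 20 address lines). The banked version is hypothetical: it assumes external logic that knows which segment register an access used (the 8086 reports this on its status lines) and uses it as two extra address bits; the bank order below is arbitrary.

```python
def physical_address(segment: int, offset: int) -> int:
    """8086 real mode: segment * 16 + offset, truncated to 20 address lines (1 MiB)."""
    return ((segment << 4) + offset) & 0xFFFFF


# Hypothetical bank decoding: one 1 MiB bank per segment register (order is arbitrary).
SEGMENT_BANK = {"ES": 0, "SS": 1, "CS": 2, "DS": 3}


def banked_address(seg_reg: str, segment: int, offset: int) -> int:
    """Prepend a 2-bit bank number to the 20-bit address: 4 x 1 MiB = 4 MiB."""
    return (SEGMENT_BANK[seg_reg] << 20) | physical_address(segment, offset)


print(hex(physical_address(0xFFFF, 0x000F)))      # 0xfffff, last byte of 1 MiB
print(hex(physical_address(0xFFFF, 0x0010)))      # 0x0, wraps around on an 8086
print(hex(banked_address("DS", 0x1234, 0x0005)))  # 0x312345
```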

--
I am becoming gerund, destroyer of verbs.
Re:x86: Intel's biggest mistake by maynard · 2006-01-19 12:36 · Score: 3, Informative

I don't seem to remember any "640K" barrier with the 8088 or 8086. Didn't it support up to 20 address lines? Yup... I thought so. That missing 384K was reserved for ROM, video RAM, and whatever else one might need. And let's not forget the bank-switched expanded RAM boards that were around back in the day. As one whose family owned an original XT w/ 20MB drive and full 640K from 1983 onward, I can say with assurance that 640K was a whopping amount of RAM back in the day. It also cost a buttload.
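
The 640K/384K split follows from 20 address lines (2^20 bytes = 1024K). Below is a simplified sketch of the IBM PC memory map; the boundaries are the usual ones, but the region labels are approximate and real machines varied in how they used the upper area.

```python
KIB = 1024

# Simplified IBM PC memory map: (start, end-exclusive, label). Labels are approximate.
REGIONS = [
    (0x00000, 0xA0000, "conventional RAM"),
    (0xA0000, 0xC0000, "video RAM"),
    (0xC0000, 0xF0000, "adapter ROMs and other uses"),
    (0xF0000, 0x100000, "system ROM (BIOS)"),
]


def region_of(address: int) -> str:
    """Return the label of the region containing a 20-bit physical address."""
    for start, end, name in REGIONS:
        if start <= address < end:
            return name
    raise ValueError("outside the 8086's 1 MiB address space")


total = 2 ** 20 // KIB                            # 20 address lines -> 1024 KiB
conventional = 0xA0000 // KIB                     # 640 KiB
print(total, conventional, total - conventional)  # prints 1024 640 384
print(region_of(0xB8000))                         # prints video RAM
```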
Re:Better use for silicon? by PCeye · 2006-01-19 12:40 · Score: 2, Informative

Fake breasts are made of silicone... unless you have a thing for bots.

Oblig. Futurama ref: "Hey there sailing unit!"
not invented here? by ScottCooperDotNet · 2006-01-19 12:47 · Score: 2, Informative

Could it be the old "not invented here" syndrome?
The 486 on core is VERY, VERY slow... worthless by theendlessnow · 2006-01-19 12:50 · Score: 2, Informative

This is a good thing. The Itanium can emulate the x86 faster than the 'good for nothing' 486 that was on core. It's worthless and NOBODY has been using it for a LONG time.
Irony .... Where? by vanka · 2006-01-19 13:01 · Score: 2, Informative

I'm not sure what your point is with that comment. Apple's emulation of the PPC architecture (Rosetta) is all done in software, which doesn't run at native speed. As I recall, the Itanium had software emulation of x86 at first, and then, I guess, they added hardware emulation. Now, to cut costs and reclaim chip real estate, they are taking out hardware emulation and reverting to software emulation. I'm missing the irony in this particular situation. How is this ironic?
Re:Intel is continuing development? by Anonymous Coward · 2006-01-19 13:06 · Score: 5, Informative

Parent mocked:
>Sheesh, the Itanic wasn't exactly a success story. How does it fit into their new roadmap with cooler chips that eat less power? That processor was a goddamn space heater.

See: http://www.ideasinternational.com/benchmark/bench.html

Take special note of the SPECint2000 and SPECfp2000 pages, and also of the TPC-C scores.

The Itanium 2 takes the top three SPECint_rate_base2000 spots (128 cores), the top SPECfp_base2000 spot (single core), and the top two SPECfp_rate_base2000 spots (128 cores). The 64-way HP Superdome (by now they're all Itaniums, so they don't bother noting PA-RISC vs. Intel) is in four of the top eight non-clustered TPC-C spots.

In short, the Itanium 2 is the best scientific computing chip on the market, as shown by the SPECint_rate_base2000 and SPECfp_rate_base2000 stats (beating out the Power5). Also, it's not too shabby on the TPC-C numbers, being edged out only by the IBM Power5.

If you don't work with a 16+ core Itanium 2 or Power5, please STFU about them being market failures. They're not marketed at you.
Re:What is Itanium good for, anyway? by Anonymous Coward · 2006-01-19 13:07 · Score: 2, Informative

These are my personal opinions and not those of my employer.

Some use cases:
- Certain well-tuned scientific and engineering applications that are floating-point intensive but not memory-bandwidth bound. Ideally, the code should have few branches. There is a significant performance bonus for code that can fit within the L3. However, the per-processor cost delta over the Opteron is difficult to justify for the standard two-processors-per-node compute cluster model.
- Large systems. SGI can support up to 512 processors and 6 TB of memory in a seamless single system image today. This is useful if you need to run applications that require large contiguous memory maps, e.g. certain computational chemistry applications (this is why there is a single half-terabyte system sitting in the room next to me). Additionally, 1-2 microsecond MPI latency is a major benefit for huge MP applications (PathScale's new InfiniPath adapters come very close, however).
- Real-world supercomputing. The difficulty of porting code to, and developing code for, Blue Gene cannot be overstated; a high Linpack score is not necessarily reflective of real-world performance.
- I/O bandwidth (this is more SGI-specific than IA-64-specific): the IA-64-based Altix is particularly good at streaming large amounts of data. For example, a single 3700 IO-brick has 6 independent 133MHz PCI-X buses (12 ports total). A single system's I/O subsystem can handle over 3 gigabytes per second of sustained reads and writes.
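
For a sense of scale, here is the theoretical peak bandwidth of one IO-brick, assuming a 64-bit PCI-X bus at a nominal 133.33 MHz and ignoring all protocol overhead. This is raw arithmetic, not a measured figure; sustained throughput, like the 3 GB/s quoted above, is always lower than the sum of bus peaks.

```python
BUS_WIDTH_BYTES = 8        # PCI-X is a 64-bit bus
CLOCK_HZ = 133_333_333     # nominal "133 MHz" PCI-X clock
BUSES_PER_BRICK = 6        # per 3700 IO-brick, as stated above

per_bus = BUS_WIDTH_BYTES * CLOCK_HZ       # bytes per second, theoretical peak
per_brick = per_bus * BUSES_PER_BRICK
print(f"{per_bus / 1e9:.2f} GB/s per bus, {per_brick / 1e9:.2f} GB/s per brick (peak)")
# prints 1.07 GB/s per bus, 6.40 GB/s per brick (peak)
```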
Re:as soon as... by msbsod · 2006-01-19 13:26 · Score: 2, Informative

OpenVMS/Itanium - two excellent products, very closely connected, and both pushed together into the niche market in absolute silence. The same happened to OpenVMS/Alpha. What a waste!
Just one thing by Andy+Dodd · 2006-01-19 14:09 · Score: 2, Informative

The Itanium was known as Merced (for a river in California, I believe), not Mercedes.

--
retrorocket.o not found, launch anyway?
Re:x86: Intel's biggest mistake by Zencyde · 2006-01-19 14:32 · Score: 3, Informative

250 or 240? I'll just use the 240 number: 257,698,037,760 bytes using the GiB system, 240,000,000,000 bytes using the GB system. That's over 17 gigs of difference, and you're telling me you won't notice it? :P
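
The same arithmetic as a quick check, reading "GiB" as 2^30 bytes and "GB" as 10^9 bytes (the drive-maker convention):

```python
n = 240
binary = n * 2 ** 30    # 240 GiB: 1 GiB = 1,073,741,824 bytes
decimal = n * 10 ** 9   # 240 GB: 1 GB = 1,000,000,000 bytes
diff = binary - decimal

print(f"{binary:,} vs {decimal:,}")  # prints 257,698,037,760 vs 240,000,000,000
print(f"difference: {diff:,} bytes = {diff / 10**9:.1f} GB = {diff / 2**30:.1f} GiB")
# prints difference: 17,698,037,760 bytes = 17.7 GB = 16.5 GiB
```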

--
What day is it? Could you please tell me?
Re:Athlon64 should to it too. by tomstdenis · 2006-01-19 15:34 · Score: 2, Informative

You really ought to read up on the AMD64 design. There is little advantage to ignoring 32-bit opcodes. First off, "64-bit opcodes" are actually extensions of 32-bit ones. That is, 32-bit code can be [and often is] smaller than 64-bit code [except when there are a lot of register spills].
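
A small illustration of why 64-bit code tends to be larger: the same register-to-register ADD needs an extra REX.W prefix byte for 64-bit operands. The bytes below are hand-assembled for this one instruction form (ADD r/m, r, opcode 01 /r); the helper names are just for illustration.

```python
EAX, EBX = 0, 3     # x86 register numbers (RAX/RBX use the same numbers)
REX_W = 0x48        # REX prefix with W=1: 64-bit operand size
ADD_RM_R = 0x01     # ADD r/m, r


def modrm(mod: int, reg: int, rm: int) -> int:
    """Pack a ModRM byte: mod (2 bits), reg (3 bits), r/m (3 bits)."""
    return (mod << 6) | (reg << 3) | rm


add32 = bytes([ADD_RM_R, modrm(0b11, EBX, EAX)])         # add eax, ebx
add64 = bytes([REX_W, ADD_RM_R, modrm(0b11, EBX, EAX)])  # add rax, rbx
print(add32.hex(" "), "|", add64.hex(" "))  # prints 01 d8 | 48 01 d8
```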

Second, the hardware real estate involved is not that much. If the CPU only did 64-bit opcodes [hint: think of something as simple as adding two chars], you'd have MORE overhead as you mask off bytes, words, and dwords.
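
A sketch of that masking overhead on a hypothetical CPU whose only add is 64 bits wide: an 8-bit add then needs an extra AND, and a signed char needs a sign-extension step on top of that. The function names are made up for illustration.

```python
MASK64 = (1 << 64) - 1


def add64(a: int, b: int) -> int:
    """The only add our hypothetical 64-bit-only CPU has."""
    return (a + b) & MASK64


def add_u8(a: int, b: int) -> int:
    """Unsigned 8-bit add built from add64: extra AND drops the carry out of bit 7."""
    return add64(a, b) & 0xFF


def add_i8(a: int, b: int) -> int:
    """Signed 8-bit add: mask to 8 bits, add, then sign-extend the result."""
    r = add_u8(a & 0xFF, b & 0xFF)
    return r - 0x100 if r & 0x80 else r


print(add64(200, 100), add_u8(200, 100), add_i8(100, 100))  # prints 300 44 -56
```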

Aside from cache and the ALU, decoder space is where the waste is.

A better design would be to keep the ALU and drop the ISA [or at least rearrange the opcode map to favour more common opcodes ... à la Huffman!].
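
The "à la Huffman" idea, sketched: give frequent opcodes short encodings and rare ones long encodings. The opcode mix below is invented for illustration, not measured, and a real instruction encoding would also have to stay fast to decode, so this only shows the principle.

```python
import heapq


def huffman_code_lengths(freq: dict[str, int]) -> dict[str, int]:
    """Return the Huffman code length (in bits) for each symbol."""
    # Heap entries: (subtree weight, unique tie-breaker, symbols in the subtree).
    heap = [(w, i, [sym]) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    lengths = {sym: 0 for sym in freq}
    counter = len(heap)
    while len(heap) > 1:
        w1, _, s1 = heapq.heappop(heap)
        w2, _, s2 = heapq.heappop(heap)
        for sym in s1 + s2:
            lengths[sym] += 1  # each merge pushes these symbols one level deeper
        heapq.heappush(heap, (w1 + w2, counter, s1 + s2))
        counter += 1
    return lengths


# Hypothetical opcode mix (percent of executed instructions).
mix = {"mov": 40, "add": 20, "cmp": 15, "jcc": 15, "call": 6, "other": 4}
lengths = huffman_code_lengths(mix)
avg = sum(mix[s] * lengths[s] for s in mix) / sum(mix.values())
print(lengths)  # prints {'mov': 1, 'add': 3, 'cmp': 3, 'jcc': 3, 'call': 4, 'other': 4}
print(avg)      # prints 2.3 (bits per opcode, vs. 3 for a fixed-length code)
```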

Tom

--
Someday, I'll have a real sig.
Re:Intel is continuing development? by boner · 2006-01-19 15:56 · Score: 2, Informative

While the Itanic is a nice piece of engineering, please consider the following. The EPIC (or VLIW) architecture employed in the Itanic really puts the burden of optimization on the compiler. The current generation of compilers is really not good enough to fully leverage the Itanic's computing resources.
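
To see what "the burden is on the compiler" means concretely: IA-64 code comes in 128-bit bundles, each a 5-bit template (which tells the hardware which execution-unit types the slots target and where stops fall) plus three 41-bit instruction slots, and it is the compiler's job to fill those slots with independent work. The sketch below only shows the bit layout; the template and slot values are placeholders, not real instruction encodings.

```python
TEMPLATE_BITS = 5
SLOT_BITS = 41  # 5 + 3 * 41 = 128


def split_bundle(bundle: int) -> tuple[int, list[int]]:
    """Split a 128-bit bundle into its template (bits 0-4) and three 41-bit slots."""
    if not 0 <= bundle < 1 << 128:
        raise ValueError("bundle must fit in 128 bits")
    template = bundle & ((1 << TEMPLATE_BITS) - 1)
    slot_mask = (1 << SLOT_BITS) - 1
    slots = [(bundle >> (TEMPLATE_BITS + i * SLOT_BITS)) & slot_mask for i in range(3)]
    return template, slots


def make_bundle(template: int, slots: list[int]) -> int:
    """Inverse of split_bundle: pack a template and three slots into 128 bits."""
    bundle = template
    for i, slot in enumerate(slots):
        bundle |= slot << (TEMPLATE_BITS + i * SLOT_BITS)
    return bundle


b = make_bundle(0x10, [1, 2, 3])  # placeholder values, not real instructions
print(split_bundle(b))            # prints (16, [1, 2, 3])
```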

While the SPEC and TPC-C numbers are impressive, please consider this: those numbers are the result of several compile-execute-profile iterations. These iterations provided the compilers with the information needed for their optimization decisions.

SPEC reflects workloads that are generally repetitive in nature and can therefore be optimized correctly using compile-execute-profile iterations. SPECfp accurately reflects the behavior of number-crunching codes and various other flavors of scientific computing. However, the ad-hoc OLTP and data-warehousing workloads seen in the real world cannot be optimized as well as TPC-C. Workloads that have many data-dependent execution paths cannot be efficiently optimized for the EPIC instruction set (other than through speculative branching). Therefore these ad-hoc workloads never reach the performance levels boasted by TPC-C results.
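
A toy model of that argument, with all names and data distributions invented for illustration: a profile collected on a repetitive, benchmark-like input picks a "hot" side of a data-dependent branch, and the same choice turns out to be wrong most of the time on differently distributed ad-hoc input. Real profile-guided optimization works on compiled code, not Python, so this only mirrors the reasoning.

```python
import random
from collections import Counter


def branch_outcome(record: int) -> str:
    """A data-dependent branch (a stand-in for a query predicate)."""
    return "taken" if record < 100 else "not_taken"


def profile(records) -> Counter:
    """'Execute' step: count how often each side of the branch is taken."""
    return Counter(branch_outcome(r) for r in records)


def hit_rate(predicted: str, records) -> float:
    """Fraction of executions that follow the path the 'compiler' optimized for."""
    counts = profile(records)
    return counts[predicted] / sum(counts.values())


rng = random.Random(0)
training = [rng.randrange(0, 110) for _ in range(10_000)]  # benchmark-like: mostly < 100
ad_hoc = [rng.randrange(0, 1000) for _ in range(10_000)]   # a different data distribution

hot_path = profile(training).most_common(1)[0][0]  # 'compile' step picks the hot side
print(hot_path, round(hit_rate(hot_path, training), 2), round(hit_rate(hot_path, ad_hoc), 2))
```

With these made-up distributions the chosen path is right roughly 90% of the time on the training data but only about 10% of the time on the ad-hoc data.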

The Power5 edges out the Itanium on TPC-C for exactly that reason: the RISC architecture allows easier optimization of data-dependent execution paths.

While the Itanic is indeed a decent scientific computing chip, price-performance-wise it is not better than the AMD Opteron. The main reasons for the Itanic's high performance are its many parallel execution units combined with its large caches. Both are expensive, substantially decreasing the price-performance.

According to some estimates (too lazy to find the link), it will take maybe two generations of compilers before the EPIC instruction set can really shine. Optimizing code at compile time is hard, and a lot of investment in optimization routines is needed to get it done properly. Great performance improvements are seen using compile-execute-profile, but currently (AFAIK) the best-running code on Itanium is still hand-tweaked. (Check out: Itanium - A system implementor's tale, Charles Gray et al., USENIX 2005)

BTW, your last comment was uncalled for; Intel originally did market the Itanic as the be-all and end-all of computing. Although I would be interested to see pictures of a 16+ core Itanium 2+. My latest count of Itanium cores stopped at two. Did you mean 8+ dual-core Itanium 2s?

just my 2 cents
Re:Shouldn't matter with modern software. by Hurricane78 · 2006-01-19 18:54 · Score: 5, Informative

I'm sorry, but this is plainly wrong!

The CPUs in a modern mobile phone are the same as those in modern Game Boys: ARM9 (or sometimes lower).
The only difference is the added chips for multimedia and other stuff in Game Boys.

As someone who actually *wrote* a game engine and other apps for mobile phones in Java, I can tell you that it IS Java's fault!
The best proof is that apps compiled directly for the chip run at least three times faster without doing anything better.
So it can't be the chip.
Even with the phone manufacturer's libs it does not get much better, because in addition to still being slow as crap, the app no longer runs everywhere, even if you automate handling of the different screen sizes, performance levels, buttons, and so on...
But at least you don't have to stick with the extremely minimal functionality of MIDP 1 or 2. ;)

For my part, I can say that I will never write a program for a virtual machine again!
If you *have* to compile a different version for every phone out there, you at least don't want it to be slow. ;)

--
Any sufficiently advanced intelligence is indistinguishable from stupidity.