Intel Dumps Iitanium's x86 Hardware Compatibility
Spinlock_1977 writes "C|Net is running a story that Intel is going back to software x86 emulation on Itanium in order to reclaim chip real estate. (room for another 9MB of cache?)
One notable quote about x86 emulation: 'Basically, no one ever used hardware-based IA-32 execution, so better to use the silicon for something else,' said Illuminata analyst Gordon Haff. 'Of course, basically no one uses software-based emulation either, but at least that doesn't cost chip real estate.'"
Intel's chips will use that extra silicon for a nice pair of fake breasts. That's sure to up their earnings next quarter. Take that, AMD.
I think the days of it mattering what the exact instruction set is are pretty much over.
Sheesh, the Itanic wasn't exactly a success story. How does it fit into their new roadmap with cooler chips that eat less power? That processor was a goddamn space heater.
This is very old news. Various sources and die photos have shown this for more than a year... and no one cares.
The die space reclaimed was somewhat significant, and the software emulation is faster than the hardware emulation.
-- Have you ever imagined a world with no hypothetical situations?
"Of course, basically no one uses Itanium either..."
if they are going to dump x86 compatibility, why not dump Itanium compatibility and just go back to Alpha?
Why not extend that logic? No one really used the Itanium chip anyway, so why not use the silicon to make Yonahs for Apple?
Help fight continental drift.
Why not just say....
Basically, no one ever used Itanium, so better to use the silicon in a more meaningful manner...
1. Stop making Itanium chips
2. Harvest saved silicon
3. ????
4. Profit!
Given ???? involves *cough* implants of some type....
Imagine Intel branded implants.
I'm talking about cyborg implants, what were you guys thinking about!!
There's a sense of irony with Apple having, apparently, no problem getting PPC emulation to work on an Intel x86 ... and Intel having no joy running x86 emulation on IA64. If I didn't know better it would look to me like IA64 is a bag of crap.
Oh, hang on.
Dave
I write a blog now, you should be afraid.
The 640KiB barrier was imposed by the IBM PC architecture not the 8086 hardware. The 8086 can directly address 1MiB of RAM. 4MiB if you isolate each of CS, DS, SS, and ES into their own banks with additional decoding logic.
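To make the arithmetic behind that comment concrete, here is a minimal sketch of 8086 real-mode address formation. The function name `physical_address` and the constants are illustrative choices, not anything from the thread; the one outside fact assumed is that the IBM PC memory map reserves everything from segment 0xA000 upward.

```python
KIB = 1024
ADDRESS_LINES = 20                      # the 8086 has 20 address lines


def physical_address(segment, offset):
    """Combine a 16-bit segment and a 16-bit offset into a 20-bit address."""
    # The segment is shifted left by 4 bits and added to the offset;
    # on the 8086 the result wraps around at 2**20 (1 MiB).
    return ((segment << 4) + offset) & 0xFFFFF


addressable = 2 ** ADDRESS_LINES        # total space the CPU can address
conventional_limit = physical_address(0xA000, 0x0000)  # start of the reserved area
reserved = addressable - conventional_limit

print(addressable // KIB)               # total KiB the CPU can address
print(conventional_limit // KIB)        # KiB left for ordinary RAM on the PC
print(reserved // KIB)                  # KiB set aside for video RAM, ROM, etc.
```

The barrier falls out of where the PC's designers placed the reserved region, not out of the address computation itself.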
I am becoming gerund, destroyer of verbs.
Politics. Yes, Alpha is a much superior platform, technically speaking, than pretty much anything else out there today. But for Intel to turn their back on Itanic (thank you, Register, for consistently misnaming the Itanium in such an apt way) would mean admitting that the billions of R&D they spent on it was a waste. HP also has political reasons to not resurrect Alpha.
Damn shame, that. If they'd poured as much money into Alpha as they did into Itanic, they'd have a platform that would whomp all over everything currently in the marketplace.
I don't seem to remember any "640K" barrier with the 8088 or 8086. Didn't it support up to 20 address lines? Yup... I thought so. That missing 384K was reserved for ROM, video RAM, and whatever else one might need. And let's not forget the bank-switched expanded RAM boards that were around in the day. As one whose family owned an original XT w/ 20MB drive and full 640K from 1983 onward, I can say with assurance that 640K was a whopping amount of RAM in the day. It also cost a buttload.
Perhaps this is an indication that Intel has finally realized that their stranglehold on the CPU market may be threatened by AMD? And that they will have to optimize and trim the fat off their products? Competition is good.
There's obviously a typo in this headline, which I've corrected:
Iintel Dumps Iitanium's x86 Hardware Compatibility.
C'mon Slashdot editors, get with it.
I work in Computational Engineering and can say that I know people who specifically write for EPIC because it is good at pushing _huge_ amounts of serial computations (mostly solving large systems of equations) through the processor quickly.
I have personally had a dual Itanium workstation sit under my desk for around 9 months. It was ok I suppose. I was doing Finite Element mechanical simulations on it and it did fairly well at it (it helped that it had 8 Gigs of RAM). I also got Gentoo compiled on it (this was before it was really supported) and it worked fairly well as a desktop (had an NVIDIA Quadro card in it).
Personally, I think Intel should just give up... they obviously lost the fight. But who knows, maybe it is actually making them _some_ money (although it can't be much).
Friedmud
Could it be the old "not invented here" syndrome?
This is a good thing. The Itanium can emulate the x86 faster than the 'good for nothing' 486 that was on core. It's worthless and NOBODY has been using it for a LONG time.
This is offtopic but... I'm glad to see someone using MiB rather than MB, that makes things so much simpler because of the way metric prefixes work. I salute you!
What day is it? Could you please tell me?
I think removal of the x86 emulation from the Itanium CPU was overdue. It should never have made it into the chip. Every serious software developer would have recompiled their code on the new chip anyway. What I wish to see next is a dramatic reduction of the power consumption and a return to Intel's original promise to make the Itanium a replacement for the aging x86 architecture, not only for expensive servers, but also for desktop and notebook PCs. The x86 is a smash hit because it is available for so many different applications. The Itanium, however, was pushed into a niche.
If you have the time to hand-optimize your code, it blows anything out of the water. This means it's useful for simple number crunching, but not much else - more processors are generally cheaper than more coders. It was expected that compilers would improve by the time Itanium was adopted, but that hasn't really happened. (I read here that the hurd coders were able to make their Itanium message-passing routine TEN TIMES faster by doing it in hand-coded assembly compared to what a compiler churned out)
I am trolling
Maybe they just make it for the supercomputer folks... a niche market which is probably 10x larger and 100x more profitable than the propeller-beanie AMD fanboy crowd that trolls around here, scoffing at neon-illumination-free chassis.
https://www.accountkiller.com/removal-requested
I'm not sure what your point is with that comment. Apple's emulation of the PPC architecture (Rosetta) is all done in software, which doesn't run at native speed. As I recall, the Itanium had software emulation of x86 at first, then I guess they added hardware emulation. Now, to cut costs and chip real estate, they are taking out hardware emulation and reverting to software emulation. I'm missing the irony in this particular situation. How is this ironic?
These are my personal opinions and not those of my employer.
Some users are:
- Certain well-tuned scientific and engineering applications that are floating-point intensive but not memory-bandwidth bound. Ideally, the code should have few branches. There is a significant performance bonus for code that can fit within the L3. However, the per-processor cost delta over the Opteron is difficult to justify for the standard 2-processor-per-node compute cluster model.
- Large systems. SGI can support up to 512 processors and 6 TB of memory in a seamless single system image today. This is useful if you need to run applications that require large contiguous memory maps, e.g. certain computational chemistry applications (this is why there is a single half-terabyte system sitting in the room next to me). Additionally, 1-2 microsecond MPI latency is a major benefit for huge MP applications (Pathscale's new Infinipath adapters are very close, however).
- Real-world supercomputing. The difficulty of porting and developing code for Blue Gene cannot be overstated. A high Linpack score is not necessarily reflective of real-world performance.
- I/O bandwidth (this is more SGI-specific than IA64-specific). The IA64-based Altix is particularly good at streaming large amounts of data. For example, a single 3700 IO-brick has 6 independent 133MHz PCI-X busses (12 ports total). A single system's IO subsystem can handle over 3 gigabytes per second sustained read and write.
I'm sure that both of the users of the Itanium are thrilled by this development. They should drop it and use the extra fab capacity to make 8-bit microcontrollers. There's still a market for those.
It's good to use your head, but not as a battering ram.
Intel's biggest failure was taking so dang long to get the Itanium to market. I remember when it was announced, and a short decade later the thing arrived. By the time it arrived things had changed quite a bit at HP, Intel and the computer markets in general.
Business users typically run ancient software, available in binary form only, from companies or consulting companies that no longer exist. Compatibility is more important for Intel than for a company like Apple.
People who buy PCs do so because it's what everyone else buys.
It's a mess and I am glad I am not Intel. I bet HP has a contract forcing Intel to keep making the Itanium too. They killed the Alpha for Itanium, and it's just astounding what a few billion in sunk costs can do to make sure you won't leave for something better.
I think Alpha had a much better chance of taking over. W2k beta 3 was out at the same time as the x86 version, and Linux and BSD were already ported. HP could keep VMS and their other OSes without porting them to another platform. The Alpha had great FX!32 software and, I think, some basic hardware-assisted x86 emulation, but I could be wrong. I remember reading back in '97 that the emulation was as fast as a Pentium 166 on the 250 MHz Alpha. Pretty impressive.
Some mass production to lower costs, and more software resulting from Intel and HP being behind it, could have brought the Windows-based software to the platform as well. Just like Apple by now, I bet we all would be using Alpha-based systems. Intel would have had a strong edge over AMD as well, as the Athlon XP would be struggling to compete.
http://saveie6.com/
Yes it is, because MB means 1000 kilobytes, and MiB means 1024 kibibytes. Usually you're off by a factor of about 0.95 or 1.05 (and more with each larger prefix), which can mean a lot. Please don't insult the functionality of accuracy. :D
I find it odd that Intel keeps backtracking to its decade-old Pentium Pro design. Both of their recent high-budget designs, the P4 and the Itanium, proved to be flops to some extent, while the P6/Pentium Pro/PII/PIII/Centrino/Banias architecture has scaled amazingly well since its humble 200 MHz beginnings.
Was there a generation change at the design offices? What else could have caused the most prominent chip design firm to lose its ability to do solid engineering? Granted even the golden boys created a dead end (i960) architecture, it wasn't quite as expensive a mistake as Itanium...
I remember that in the nineties new chip generations would be popping up left and right, each of them offering some really unique and cool innovation in terms of memory management, execution streamlining or heat management. But Transmeta was the last memorable innovation, and since then everyone seems to be exclusively focused on cache megabytes and transistor sizes. I would love to see real experimentation and innovation reintroduced in the CPU arena...
I think the days of it mattering what the exact instruction set is are pretty much over.
Indeed-- which is why it's hilarious that pretty much the entire world is just this moment moving to a single common unified instruction set. The server world has standardized on x86-64, Itanium is a walking corpse; the PC world has standardized on x86 as well, PPC has retreated to video game systems. We are moving to a new world of processor agnosticism, at the exact same time processor agnosticism has become largely pointless.
The Itanium was known as Merced (for a river in California), not Mercedes.
retrorocket.o not found, launch anyway?
Damn shame, that. If they'd poured as much money into Alpha as they did into Itanic, they'd have a platform that would whomp all over everything currently in the marketplace.
I don't know that I agree. The Alpha was a particular set of optimizations. Dual register files, branch-prediction hints. Pure 32bit (sub-32 bit data access had to be emulated through a multi-step process). Deep pipeline (for its day).
But at the same time, they purposefully withheld adding out-of-order execution (plays havoc w/ their highly optimized register configuration). Sparc had similar problems with their rolling register-stack.
I studied the alpha prior to the announcement that their new version would have out-of-order, so I don't know if they ever did go that route.
The point is that by adding all of the techniques that were employed by modern CPUs (aside from slightly higher speed memory), they would not have maintained much of an advantage. Their performance would be comparable to the AMD-64, but not much faster.
I'd still love to see the Alpha kept alive; there was absolutely nothing wrong with it, except its price (for general workstation use).
-Michael
250 or 240? I'll just use the 240 number. 257,698,037,760 bytes using the GiB system. 240,000,000,000 bytes using the GB system. over 17 gigs difference, and you're telling me you won't notice it? :P
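The figures in that comment can be checked with a few lines of arithmetic. This is only a sketch; the helper name `size_gap` is made up for illustration.

```python
GB = 10 ** 9     # decimal gigabyte (SI prefix)
GIB = 2 ** 30    # binary gibibyte (IEC prefix)


def size_gap(units):
    """Return (bytes if units are GiB, bytes if units are GB, difference)."""
    binary_bytes = units * GIB
    decimal_bytes = units * GB
    return binary_bytes, decimal_bytes, binary_bytes - decimal_bytes


binary_bytes, decimal_bytes, gap = size_gap(240)
print(f"{binary_bytes:,}")    # 240 GiB in bytes
print(f"{decimal_bytes:,}")   # 240 GB in bytes
print(f"{gap:,}")             # how far apart the two readings are
```

The gap is a fixed ratio per prefix, so it widens in absolute terms as capacities grow.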
Memory allocation while coding. :P
This means anytime it misses in L1, the entire machine stalls waiting for the data to come back from L2/L3/memory. This is fine for applications where the compiler can figure out all the data dependences and schedule the code to hide these cache misses (i.e. scientific applications). It is not good for your run-of-the-mill GUI programs like Word, Firefox, your favorite email reader, etc. Out-of-order architectures like Pentium Pro/II/III/4 and Athlon hide L1 misses a LOT better because other (independent) instructions can execute while the cache miss is going on.
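The contrast drawn above can be illustrated with a deliberately crude cycle-count model. Everything here is an assumption made for the sake of the sketch: the function names, the 200-cycle miss latency, the window size, and the simplification that all instructions are independent. It takes the comment's "miss means stall" description at face value and is not a model of any real core.

```python
def in_order_cycles(latencies):
    """Stall-on-miss model: each instruction finishes before the next starts."""
    return sum(latencies)


def overlapped_cycles(latencies, window):
    """Idealized out-of-order model: up to `window` independent instructions
    are in flight together, so a group costs only as much as its slowest
    member."""
    total = 0
    for start in range(0, len(latencies), window):
        total += max(latencies[start:start + window])
    return total


# 1 = L1 hit, 200 = assumed cost of going out to memory.
latencies = [200, 1, 1, 200, 1, 1, 1, 1]

print(in_order_cycles(latencies))        # both misses are paid in full, serially
print(overlapped_cycles(latencies, 4))   # the two misses overlap in one window
```

The gap between the two numbers comes almost entirely from overlapping the misses, which is the comment's point: hiding memory latency matters more than raw issue width.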
A few points brought up in the article that I'll respond to:
- Predication - Predication (conversion of if/else code with branches to branchless straight-line code using predicated instructions) is not limited to EPIC/Itanium architectures. Conditional movs (cmov) in x86/AMD64/EM64T are a watered-down version, but they suffice for a lot of simple situations such as the one the article brings up.
- Instruction Level Parallelism (ILP) - Sure, the Itanium can decode/execute/retire up to 6 instructions per clock. That's dependent on two things: a) the compiler finding 6 independent instructions to schedule every clock, b) no L1 cache misses occurring (remember, Itanium is in-order, cache miss = stall).
- ILP is dead anyway - CPU cores are much faster than memory. Any time you have to go to main memory for something, you take a HUGE hit in performance. Who cares if your CPU core executes 100,000 instructions in 0.00001 ns if it takes 100,000 cycles to bring a cache line in from memory? Memory bottlenecks are starting to dominate CPU performance (see this paper for more info), so single-thread performance is going to be dominated by how well the cores mitigate cache misses. Out-of-order cores can do this well (it's getting harder, read the paper), but it's difficult for in-order cores.
- Thread Level Parallelism (TLP) - Any benefits of TLP stated in the article will apply to dual-core out-of-order processors in the same way they will apply to Itanium processors.
- Power - Intel just came out with their dual-core mobile stuff. AMD will sometime before the summer. The article claims that performance per watt is superior for Itanium; that may have been true a year ago, but it's about to not be true.
- Floating point performance - Itanium is the fastest FP chip on the planet. However, a lot of consumer apps aren't floating point-intensive, they're non-FP apps like Word, Firefox, an email client. Performance of these apps, like I said before, is much more dependent on not having cache misses dominate performance. Plus, with SSE2/SSE3 taking over all the FP duties in the latest Athlon64/Xeon/P4s, and Intel and AMD concentrating their efforts on improving those functional units, I bet consumer-level FP performance goes up.
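The predication point in the list above (turning an if/else into straight-line code) can be sketched as follows. This only illustrates the idea of computing a selection without a branch; it is not how Itanium predicate registers or x86 `cmov` are implemented, and the function names are invented for the example.

```python
def select_branching(cond, a, b):
    """Ordinary if/else: control flow depends on the condition."""
    if cond:
        return a
    return b


def select_predicated(cond, a, b):
    """Branch-free selection: both inputs are used and a mask picks one."""
    # mask is -1 (all bits set in two's complement) when cond is true, else 0.
    mask = -int(bool(cond))
    return (a & mask) | (b & ~mask)


def max_predicated(x, y):
    """max() written as a comparison feeding a predicated select."""
    return select_predicated(x > y, x, y)


print(max_predicated(3, 7))
print(max_predicated(-2, -9))
```

The straight-line version always does the work for both sides, which is why it pays off mainly for short, cheap if/else bodies like this one.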
Now, one predicted trend for the future is for all architectures to move to simple, cheap, in-order cores, and put a lot of them on the chip to give increases in TLP without using a hugely complicated, expensive, lots-of-power-and-chip-area out-of-order core. From what I can tell, Itanium is a hugely complicated, expensive, in-order core, not exactly what we need to put 16 cores on a chip. Intel could easily resurrect the original Pentium core, retrofit SSE/SSE2/SSE3 to it, maybe add some runahead execution stuff (from that paper I linked to above) or maybe two-pass pipelining to mitigate the cache misses, and voila: a cheap, in-order core.
Oh yeah, this is all academic anyway; backwards-compatibility (x86 has it, Itanium doesn't) is probably going to be the real driving force like it has been for the past 6 years.
You really ought to read up on the AMD64 design. There is little advantage to ignoring 32-bit opcodes. First off, "64-bit opcodes" are actually extensions of 32-bit ones. That is, 32-bit code can be [and often is] smaller than 64-bit code [except when there are a lot of register spills].
Second, the hardware real estate is not that much. If the CPU only did 64-bit opcodes [hint: think something as simple as adding two chars] you'd have MORE overhead as you mask off bytes, words, dwords.
Aside from cache and the ALU, decoder space is where there is waste.
Better designs would be to keep the ALU and drop the ISA [or at least re-arrange the opcode map to favour more common opcodes ... ala huffman!].
Tom
Someday, I'll have a real sig.
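The masking overhead Tom mentions for a hypothetical 64-bit-only machine can be sketched like this. The helper names are invented, and Python integers stand in for 64-bit registers; it is an illustration of the extra steps, not a description of any real instruction sequence.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def add_u8_with_wide_alu(x, y):
    """Add two 8-bit values using only a wide add, then truncate to a byte."""
    return (x + y) & 0xFF


def add_to_byte_in_word(word, index, delta):
    """Add `delta` to byte `index` of a 64-bit word without touching the
    neighbouring bytes: extract, add, mask, and merge back."""
    shift = 8 * index
    old = (word >> shift) & 0xFF                  # extract the byte
    new = (old + delta) & 0xFF                    # add and drop the carry
    cleared = word & ~(0xFF << shift) & MASK64    # zero the target byte
    return cleared | (new << shift)               # merge the result back in


print(add_u8_with_wide_alu(200, 100))             # wraps like an 8-bit add
print(hex(add_to_byte_in_word(0x01FF, 0, 1)))     # the carry does not leak into byte 1
```

A native byte-sized add does all of this in one instruction, which is the overhead argument in a nutshell.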
They do, hard-drive companies use the metric standard when listing sizes, while Windows will read it in the ibi-prefix standard. Which is why you lose about 10% of your data capacity upon purchasing a hard-drive. :P
I studied the alpha prior to the announcement that their new version would have out-of-order, so I don't know if they ever did go that route.
;)
Yep, with the 21264 - an aggressively out-of-order CPU. The 21064 and 21164 might not have executed instructions out-of-order, however they were highly speculative. AXP arch was designed for out-of-order from the beginning; the two early CPUs did memory IO out-of-order. 21064 had a 32 entry register file it seems, not 2, btw, according to a paper on the AXP 21064 I found on Google written by a DECy.
Their performance would be comparable to the AMD-64, but not much faster.
Agreed, cause guess what: AMD64 is Alpha's progeny-in-spirit.
The AMD K7 is very alpha-like (hence so is the K8). Highly speculative, out-of-order, wide multiple issue CPUs like the 21264. Not co-incidentally given that Dirk Meyer, co-architect of the 21264, led the AMD K7 design team. K7 used the 21164/21264 EV6 PtP interconnect too. K8 made it routable with HyperTransport - just as DEC^WCompaq did with EV6 in the 21364. You would still expect this mythical equivalently developed Alpha to beat AMD64 though, given it'd be able to use the die-space 'wasted' on x86-decoding for something more productive (cache or somesuch).
I use Friend/Foe + mod-point modifiers as a karma/reputation system.
x86 was never used on Itanium? Crap.
Sure it was (and I assume is) used - for the firmware. IIRC, the EFI firmware of the Itanium boxen was entirely x86. They use the x86 ISA for running the x86-based firmware of add-on cards. That way, Itanium boxen are able to use about any PCI card out there, without them having any special firmware.
Alphas did that in software, which mostly worked but was far from working with everything.
SPARCs and the PowerPC-based Apples have PCI, but neither is able to handle standard PCI cards for exactly that reason, which is why you have to shell out $$$ to get the same PCI hardware with their native firmware support.
Ok, any PCI card stuffed in an Itanium box would need decent OS drivers, but at least that is in the realm of the OS vendor and drivers can be ported. Only very few PCI HW manufacturers ever did anything but x86 firmware, geared towards BIOS.
EFI, the firmware that ships with Itaniums, is quite good at handling that crappy PC-BIOS type firmware. Need a decent RAID controller? Just stuff it in.
I'd call that a big plus. There are and have been numerous misconceptions about Itanium from the very beginning, but saying "Nobody needs on-chip x86" is utterly stupid.
IIRC, the chip "real estate" needed for x86 was in the lowish single-digit percentage of the total chip real estate. And it was a good investment, since it saves $$$ for anybody running Itaniums. It was there for exactly that purpose, until some marketing freak obviously decided to sell that as "backwards compatibility". x86 on Itanium was and is dead slow, but for POST/Init purposes, it is sufficient.
Please, Intel, keep it. If Itanium is ever going to be a success, users will happily welcome the ability to extend systems using standard off-the-shelf components.
And, while we are at it, start shipping EFI for the "x86 crowd" now. I think I am not alone with the perception that hitting "CTRL-S", "ESC whatsoever" at the right moment during POST to enter some firmware configuration tool of some card just plain sucks.
I want a firmware shell. I want x86-style SRM. EFI is close to that. Intel even open-sourced major parts of EFI ( www.tianocore.org ). AFAIK, the Intel-based Apples will use it. I want it too.
For god's sake, keep x86 in Itaniums.
Regards