Microsoft Announces End of the Line For Itanium Support
WrongSizeGlass writes "Ars Technica is reporting that Microsoft has announced on its Windows Server blog the end of its support for Itanium. 'Windows Server 2008 R2, SQL Server 2008 R2, and Visual Studio 2010 will represent the last versions to support Intel's Itanium architecture.' Does this mean the end of Itanium? Will it be missed, or was it destined to be another DEC Alpha waiting for its last sunset?"
How could anyone possibly have any use for servers that don't run Windows?
Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
It would appear that the good ship Itanic has struck an MS Iceberg 2010 Datacenter Edition R2!
Seriously, though: is this an admission by Microsoft that HP-UX is (somehow) hanging on at the high end, despite HP's every attempt to mismanage it, or (more likely) is this a consequence of the fact that, at this point, there is nothing Itanium can do that Intel couldn't do better and cheaper just by bolting some extra cache and a few extra Itanium features onto Xeons?
With Alpha finally gone for good, its job is done and it can now sail off into the West.
Lacking <sarcasm> tags,
I am incredibly offended that you would compare this bloated, brute-force abomination of a chip to the incredibly well-designed, elegant, and efficient Alpha (may it rest in peace).
Does this mean the end of Itanium? Will it be missed, or was it destined to be another DEC Alpha waiting for its last sunset?
Kinda funny to make that comparison, since the Alpha was killed to enable the Itanium. (Long story involving HP making a deal with Intel to hand over the last of PA-RISC/Itanium processor development to Intel, and Compaq, which by then owned DEC, killing Alpha at the same time to clear out the market, since HP was in the process of purchasing Compaq, although the acquisition was not yet public at the time of the cpucide.)
But I doubt it's the end of Itanium. Itanium models have things that even the latest Xeons don't in terms of RAS (reliability, availability, serviceability). Most customers don't care about that level of fault tolerance and reliability, but the ones who can't migrate to Linux (or Windows) because they depend on features of more proprietary OSes like Tandem (now HP) NonStop do need Itanium, and their software is unlikely to be ported to x86 anytime soon (it took roughly 4 years to get NonStop ported to Itanium to begin with).
When information is power, privacy is freedom.
Having used Alpha workstations, I beg to differ. The Alpha was a design that did the absolute minimum per clock cycle in each pipeline stage. That allowed very deep pipelines, very high clock speeds, and high theoretical peak performance. In reality, the branch misprediction penalty of those deep pipelines was so bad that you never got close to the theoretical peak, and the high clock speeds made the chips hot and unreliable - poor reliability was the main driving factor for switching to SPARC. Everyone should've been able to see the problems with the Pentium 4 well in advance - it was basically an Alpha with an x86 recompiler front end, so it suffered from all the same problems.
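The deep-pipeline argument is the standard cycles-per-instruction accounting: every mispredicted branch throws away roughly a pipeline's worth of work, so deepening the pipeline raises the clock and the penalty together. A minimal sketch of that model, where the branch frequency, misprediction rate, clock speeds and penalties are all made-up illustrative numbers, not measurements of any Alpha or Pentium 4:

```python
# Back-of-the-envelope model: how branch mispredictions eat into the speedup
# a deeper, faster-clocked pipeline promises. Every number below is an
# illustrative assumption, not a measurement of any real Alpha or Pentium 4.

def effective_cpi(base_cpi, branch_fraction, mispredict_rate, penalty_cycles):
    """Average cycles per instruction, including misprediction stalls."""
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles


def throughput(clock_ghz, cpi):
    """Instructions retired per nanosecond (GHz = cycles per ns)."""
    return clock_ghz / cpi


# Assumed workload: 20% of instructions are branches, 10% of those mispredict.
BRANCHES, MISS_RATE = 0.2, 0.1

# Hypothetical shallow pipeline: 1 GHz, 5-cycle mispredict penalty.
shallow = throughput(1.0, effective_cpi(1.0, BRANCHES, MISS_RATE, 5))
# Hypothetical deep pipeline: twice the clock, but a 20-cycle penalty.
deep = throughput(2.0, effective_cpi(1.0, BRANCHES, MISS_RATE, 20))

print(f"shallow: {shallow:.3f} instructions/ns")
print(f"deep:    {deep:.3f} instructions/ns")
print(f"speedup: {deep / shallow:.2f}x from doubling the clock")
```

With these invented inputs, doubling the clock buys roughly 1.57x rather than 2x, and the gap widens as the misprediction rate grows.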
DEC Tru64 had a lot going for it - lots of good ideas in there. When DEC and HP merged, they should have taken what was worthwhile from HP-UX and integrated it into Tru64, then ported the result to HP-PA. That would've produced a system that people wanted. (HP-UX is horrible - nothing behaves quite how it should. I'd be surprised if the thing really passed POSIX conformance without some money under the table.)
They all get outmoded.
No one ever had to evacuate a city because the solar panels broke!
The Alpha was a design that managed to do the absolute minimum per clock cycle in each pipeline stage
That is pretty much what RISC was about, in a nutshell.
and the high clock speeds made them hot and unreliable
I don't know what system you were running. I was using an AlphaServer ES40: four 667 MHz Alphas with 8 GB of RAM. It was one of the most reliable systems I've ever used for HPC. There was a rack of Intel x86 systems of the same era right next to it - something like 32 Intel Xeon CPUs - and the Alpha made the rack look silly and wasteful. On BLAST, the Alpha ran circles around the Intel rack, and it became even more embarrassing for the Intel rack when the data sets got larger. That was only one example, though; for pretty much anything we could get source code for, we found the Alpha ran better. And that was going up against 1.8 GHz Xeons.
By comparison, the Itanium wants to run native 32-bit code (though it certainly doesn't do it well). The compilers aren't easy to set up (even in Linux), and it's hard to find a Linux distro that runs on one. I have an SGI cluster with Itanium 2 CPUs in it; I know the care and feeding for this system well.
No one can stop the x86 train, not even Intel.
The Alpha didn't even attempt out-of-order execution until the EV6 chip...
The EV4 and EV5 chips were strict in-order processors.
The difference with the P4 is that the P4 was expected to run code originally optimized for a 386, whereas the original Alpha ran code that specifically targeted it... In-order execution works very well when you can target a particular processor (see game consoles), since you can tune the code to the resources the processor has available... The compiler for the Alpha was also pretty good; it could beat gcc hands down at floating-point code, for instance.
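The point about in-order execution rewarding code tuned for a specific processor is the classic static-scheduling argument: on an in-order core the compiler has to hide latencies by reordering instructions, because the hardware won't. A minimal sketch, assuming a toy single-issue in-order pipeline with made-up latencies (3-cycle load, 1-cycle add); the `Instr` record and register names are hypothetical, and nothing here models a real EV4 or EV5 pipeline:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    dest: str
    srcs: tuple
    latency: int  # cycles from issue until `dest` can be read


def in_order_cycles(program):
    """Toy single-issue, strictly in-order pipeline.

    Each instruction issues no earlier than the cycle after its predecessor
    and stalls until all of its source registers are ready. Returns the
    cycle at which the last result becomes available. Registers never
    written by the program (e.g. address registers) count as ready at 0.
    """
    ready = {}
    prev_issue = -1
    finish = 0
    for ins in program:
        issue = max([prev_issue + 1] + [ready.get(r, 0) for r in ins.srcs])
        ready[ins.dest] = issue + ins.latency
        finish = max(finish, ready[ins.dest])
        prev_issue = issue
    return finish


LOAD_LAT, ADD_LAT = 3, 1  # assumed latencies, not real Alpha figures

load_x = Instr("r1", ("pa",), LOAD_LAT)
use_x = Instr("r2", ("r1",), ADD_LAT)
load_y = Instr("r3", ("pb",), LOAD_LAT)
use_y = Instr("r4", ("r3",), ADD_LAT)

naive = [load_x, use_x, load_y, use_y]       # each use right after its load
scheduled = [load_x, load_y, use_x, use_y]   # compiler hoists the 2nd load

print(in_order_cycles(naive), in_order_cycles(scheduled))  # 8 5
```

Under these assumptions the naive order finishes in cycle 8 and the scheduled one in cycle 5. An out-of-order core like the P4 does this reordering in hardware at run time, for code that was never tuned for it.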
In terms of Alphas getting hot, the only workstation I remember having heat problems was the rather poorly designed Multia (which used a cut-down Alpha chip anyway)... Other Alpha systems I used were rock-solid reliable, and I still have several in the loft somewhere - one of which ran for 6 months after the fans failed before I noticed and shut it down...
Clock for clock, the Alpha was pretty quick too, unlike the P4, which was considerably slower than a P3 at the same clock...
http://forum.pcvsconsole.com/viewthread.php?tid=11606 shows Alphas getting SPECfp2000 scores higher than x86 chips running at 3x the clock rate.
A lot of people, myself included, think Itanium should never have existed, and that the development effort should have been put into Alpha instead - an architecture that already had a good software and user base...
Debian 27 plans to drop support.
Microsoft has had a strict policy since the dawn of Windows that Windows be built for at least 2 processor architectures at all times. They really worried about i386-isms creeping into the kernel. It pretty much doesn't matter which two you choose; as long as there's more than one (and they're somewhat different), it keeps the kernel devs honest. I wonder what they're doing now: perhaps they just decided that i386 and "amd64" are different enough to serve their purpose.
Socialism: a lie told by totalitarians and believed by fools.
The other thing is, keep a full build internally.
The rumor mill says that Microsoft has current versions of Windows built for ARM internally... sorta like how Apple kept x86 builds of Mac OS X internally the whole time.
If the 1.8 GHz Xeon was based on the NetBurst architecture, first you have to multiply by 2/3 to correct for diet Pepsi clock cycles; then, if your code base is scientific, you have to divide by two for the known x86 floating-point catastrophe; and finally, if your scientific application benefits especially from a large register set, there's another factor of 0.75. So on that particular code base, a 1.8 GHz Netbust is about equal to a 400 MHz Alpha (I only ever worked with the in-order edition). NetBurst usually had some stinking fast benchmarks to show for itself if it happened to have exactly the right SSE instructions for the task at hand, and it gained a lot of relative performance on pure integer code. BTW, were you running the Xeon in 64-bit mode? That could be another factor of 0.75.
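The chain of correction factors above is easy to multiply out. A minimal sketch that just applies the commenter's rule-of-thumb factors (the constant and function names are made up here); exact fractions avoid floating-point rounding noise:

```python
from fractions import Fraction

# The commenter's rule-of-thumb factors (opinions, not measurements).
NETBURST_CLOCK = Fraction(2, 3)       # "diet Pepsi" clock cycles
X86_FLOATING_POINT = Fraction(1, 2)   # x87 floating point vs. Alpha FP
LARGE_REGISTER_SET = Fraction(3, 4)   # code that loves a big register file
NOT_64_BIT_MODE = Fraction(3, 4)      # optional: Xeon not run in 64-bit mode


def alpha_equivalent_mhz(xeon_mhz, apply_64_bit_factor=False):
    """Scale a NetBurst Xeon clock to an 'equivalent' Alpha clock."""
    factor = NETBURST_CLOCK * X86_FLOATING_POINT * LARGE_REGISTER_SET
    if apply_64_bit_factor:
        factor *= NOT_64_BIT_MODE
    return Fraction(xeon_mhz) * factor


print(float(alpha_equivalent_mhz(1800)))        # 450.0
print(float(alpha_equivalent_mhz(1800, True)))  # 337.5
```

The product is 450 MHz, in the same ballpark as the "about 400 MHz" figure, and 337.5 MHz if the optional 64-bit factor applies as well.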
A lot of people, myself included, think Itanium should never have existed, and that the development effort should have been put into Alpha instead - an architecture that already had a good software and user base
Yeah, you and a lot of clear headed people with insight into the visible half of the problem space. Not good enough.
Alpha was a nice little miracle, but it fundamentally cheated in its fabrication tactics. This was a long time ago, but as I recall, to get single-cycle 64-bit carry propagation they added extra metal layers for look-ahead carry generation. In a chip intended for Intel-scale mass production, this kind of thing probably makes an Intel engineer's eyebrows pop off. That chip was tuned like a Ferrari. I'm sure the Alpha was designed to scale, but almost certainly not at a production cost that generates the fat margins Intel is accustomed to.
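Look-ahead carry generation is what makes a single-cycle 64-bit add plausible: instead of letting a carry ripple through 64 bit positions, the adder computes every carry in a logarithmic number of steps. A minimal sketch using Python integers as bit vectors, with the textbook Kogge-Stone parallel-prefix scheme; this is one common carry-lookahead design chosen for illustration, not a description of the actual Alpha adder or its metal layers, and the function names are made up here:

```python
import random

MASK64 = (1 << 64) - 1


def ripple_add(a, b, width=64):
    """Reference ripple-carry adder: each carry waits on the previous bit,
    so the carry chain is `width` stages long."""
    carry, total = 0, 0
    for i in range(width):
        x, y = (a >> i) & 1, (b >> i) & 1
        total |= (x ^ y ^ carry) << i
        carry = (x & y) | (carry & (x ^ y))
    return total


def kogge_stone_add(a, b, width=64):
    """Carry-lookahead addition in Kogge-Stone parallel-prefix form.

    Every bit starts with generate g = a AND b and propagate p = a XOR b.
    Each round merges a bit's (g, p) with the group `dist` bits below it,
    doubling `dist`, so after log2(width) rounds (6 for 64 bits) g[i] says
    whether bits 0..i produce a carry out. Each Python int operation stands
    in for one level of gates applied to all bits at once.
    """
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    g, p = a & b, a ^ b
    half_sum = p
    dist = 1
    while dist < width:
        g = (g | (p & (g << dist))) & mask   # uses p from the previous round
        p = (p & (p << dist)) & mask
        dist <<= 1
    carries = (g << 1) & mask  # carry into bit i = group generate of bits 0..i-1
    return half_sum ^ carries


if __name__ == "__main__":
    for _ in range(1000):
        a, b = random.getrandbits(64), random.getrandbits(64)
        assert kogge_stone_add(a, b) == ripple_add(a, b) == (a + b) & MASK64
    print("ok")
```

The trade-off shows in the loop structure: the ripple adder's carry chain is 64 steps deep, the prefix version 6 rounds deep, paid for with far more wiring per round. Kogge-Stone is known for its heavy wiring demands, which gives some feel for why a fast adder can lean on extra interconnect.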
Around the time Itanium was first announced, I spent a week poking into transport-triggered architectures (TTAs). There was some kind of TTA tool download, from HP I think, and I poked my nose into a lot of the rationale and sundry documentation.
TTA actually contains a lot of valid insight into the design problem. The problem is that Intel muffed the translation, through a combination of monopolistic sugar cravings, management hubris, and cart-before-the-horse engineering objectives. I'm sure many of the Intel engineers would like to take a mulligan on some of the original design decisions. There might have been a decent chip in there somewhere trying to get out. Itanium was never that chip.
I pretty much threw in the towel on Itanium becoming the next standard platform for scientific computing when I discovered that the instruction bundles contained three *independent* instructions. They went the wrong way right there. They could have defined the bundles to contain up to seven highly dependent instructions, something like complex number multiplication: four operands, six operations (four multiplies and two adds), two results. It should have been possible to encode that in a single bundle. Either the whole bundle retires, or not at all.
Dependencies *internal* to a bundle are easy to make explicit with a clever instruction encoding format. You wouldn't need a lot of circuitry to track these local dependencies. What you gain is that you only have to perform four reads from the register file and two writes to the register file to complete, in this example, six ALU operations. Register-file ports are one of the primary bottlenecks in TTA theory.
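Counting register-file traffic makes the port argument concrete. The textbook complex multiply, (a + bi)(c + di) = (ac - bd) + (ad + bc)i, uses four multiplies and two add/subtracts. A minimal sketch, assuming a hypothetical bundle format in which values produced inside the bundle are forwarded internally and only live-out results are written back; the `Op` record and bundle layout are made up for illustration:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    dest: str
    kind: str    # "mul", "add" or "sub"
    srcs: tuple


# (a + bi)(c + di) = (ac - bd) + (ad + bc)i as one hypothetical bundle.
COMPLEX_MUL = (
    Op("t0", "mul", ("a", "c")),
    Op("t1", "mul", ("b", "d")),
    Op("t2", "mul", ("a", "d")),
    Op("t3", "mul", ("b", "c")),
    Op("re", "sub", ("t0", "t1")),
    Op("im", "add", ("t2", "t3")),
)
LIVE_OUT = {"re", "im"}  # only the two results leave the bundle


def bundled_traffic(bundle, live_out):
    """(register reads, register writes, ALU ops) when values produced inside
    the bundle are forwarded internally instead of via the register file."""
    produced = {op.dest for op in bundle}
    reads = {s for op in bundle for s in op.srcs if s not in produced}
    writes = {op.dest for op in bundle if op.dest in live_out}
    return len(reads), len(writes), len(bundle)


def unbundled_traffic(bundle):
    """Same ops issued as independent instructions: every source operand is
    a register-file read and every result a register-file write."""
    return sum(len(op.srcs) for op in bundle), len(bundle), len(bundle)


print(bundled_traffic(COMPLEX_MUL, LIVE_OUT))  # (4, 2, 6)
print(unbundled_traffic(COMPLEX_MUL))          # (12, 6, 6)
```

The same six operations need a third of the register-file reads and writes when the intermediates never leave the bundle.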
What you lose is that these bundles have a very long flight time before final retirement. Using P6 latencies, it's about ten clock cycles for the complex multiplication mul/add tree in this example (not assuming a fused multiply-add). This means you have to keep a lot of the complexity of the P6 on the ROB (reorder buffer) side. But that also functions as a shock absorber for non-determinism, and takes a huge burden off the shoulders of the compiler writers. This was apparent to me long before the dust settled on the failure of the Itanium compiler initiative.
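The shock-absorber role is what a reorder buffer does: work finishes whenever it finishes, but architectural state changes only in program order. A minimal sketch of that idea at bundle granularity, assuming a hypothetical design in which a bundle's results commit all at once (the "whole bundle retires, or not at all" rule); the class and method names are made up, and timing, ports and latencies are not modeled:

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class _Entry:
    bundle_id: str
    pending: int                      # ops still in flight
    results: dict = field(default_factory=dict)


class BundleReorderBuffer:
    """Toy reorder buffer that tracks whole bundles (hypothetical design).

    Bundles are dispatched in program order and their ops may complete in
    any order, but bundles retire strictly in order and atomically: results
    reach the register file only when every op in the bundle is done.
    """

    def __init__(self):
        self._entries = deque()

    def dispatch(self, bundle_id, num_ops):
        self._entries.append(_Entry(bundle_id, num_ops))

    def complete_op(self, bundle_id, dest, value):
        for entry in self._entries:
            if entry.bundle_id == bundle_id:
                entry.pending -= 1
                entry.results[dest] = value
                return
        raise KeyError(bundle_id)

    def retire(self, regfile):
        """Commit finished bundles from the head; stop at the first unfinished."""
        retired = []
        while self._entries and self._entries[0].pending == 0:
            entry = self._entries.popleft()
            regfile.update(entry.results)   # all-or-nothing commit
            retired.append(entry.bundle_id)
        return retired

    def flush(self):
        """Drop everything in flight (e.g. on a mispredicted branch)."""
        self._entries.clear()


regs = {}
rob = BundleReorderBuffer()
rob.dispatch("B0", 2)
rob.dispatch("B1", 1)
rob.complete_op("B1", "x", 5)   # younger bundle finishes first...
print(rob.retire(regs), regs)   # [] {}  ...but cannot retire past B0
rob.complete_op("B0", "re", 1)
rob.complete_op("B0", "im", 2)
print(rob.retire(regs), regs)   # ['B0', 'B1'] {'re': 1, 'im': 2, 'x': 5}
```

Because nothing partial ever reaches `regfile`, a flush after a misprediction needs no undo step; that simplicity is what the buffer buys in exchange for holding long-latency bundles in flight.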
In my intuitively preferred approach, instructions within bundles would be tightly bound and s
Oh come on. It's really disingenuous to be quoting that kind of shit. Have you ever taken a really close look at the kind of hardware the vendors use to get these benchmark numbers? Database app benchmarks are almost always very sensitive to I/O, and these kinds of numbers are usually generated by systems that have their I/O card slots maxed out, with several hundred (if not thousands of) small high-speed disks behind them. The cost of these solutions in real life would be crippling. Vendor-quoted benchmarks should usually be taken with a generous pinch of salt.
This is a response to my own post. Sometimes after uncorking a minor screed, I note to myself "that was more obnoxious than normal" and then my subconscious goes "ding!" and I get what's grinding me.
The secret of x86 longevity is to have been so coyote-ugly that it turns into pablum the brain of any x86-hater who tries to make a chip to rid the planet of the scourge once and for all.
For three decades right-thinking chip designers have *wanted* x86 to prove as bad in reality as ugliness ought to dictate.
Instead of having a balanced perspective on beauty, the x86-haters succumb to the rule of thumb that the less like x86, the better. And almost always, that led to a mistake, because x86 was never in fact rotten to the core. You need a big design team, and it bleeds heat, but in all other respects it proved salvageable over and over and over again.
On the empirical evidence, high standards of beauty in CPU design are overrated. Instead, we should have been employing high standards of pragmatic compromise.
If any design team had aimed merely for "a hell of a lot less ugly", instead of becoming mired in some beauty-driven conceptual overreaction, maybe x86 might have died already.
Maybe instruction sets aren't meant to be beautiful. Of course, viewed that way, this is an age-old debate.
The Rise of "Worse is Better"
Empirically, x86 won.
The lingering question is this: is less worse less better, or was there a way out, and all the beauty mongers failed to find it?
The Alpha was supposed to run Unix - Tru64 Unix in particular. Running in a proper 64-bit environment, the Alpha was an incredible chip.
This is a pretty gross oversimplification. First of all, Microsoft spent a lot of money writing a portable OS partly because the conventional wisdom at the time was that RISC would bury x86. (Keep in mind they could have just kept using OS/2.) Digital also badly needed volume for its chip production and made a somewhat serious attempt at the Windows workstation/server market. That Alpha was pigeonholed as a Unix chip is one of the main reasons it failed.
Business. Numbers. Money. People. Computer World.
Just to keep this clear: you're talking about NT (which wasn't even called "Windows NT" initially, internally). NT is almost entirely written in C; the few architecture-specific parts are abstracted from the core codebase, typically into assembly modules maintained separately for each architecture, and the build picks the appropriate one for the target CPU. There's some use of inline assembly or x86 specifics, but it's all behind #if blocks, with equivalent code for the other CPU architectures. Overall, NT has been ported to at least 6 architectures that I know of - x86 (32-bit), x64, ia64 (Itanium), MIPS, PPC, and DEC Alpha. If MS wanted to, it would be possible to port it to ARM, SPARC, or almost any other reasonably modern architecture of at least 32 bits.
By comparison, Win9x has a ton of assembly code that let it run fast even on low-end machines, keeping the system requirements down (and making it attractive to home users back in the days before consumer hardware caught up with the demands of NT). Of course, use of assembly like this has downsides - 9x was badly unstable, and completely non-portable. It only ever ran on x86, and I'm not even sure it made much use of features introduced after the i386.
There's no place I could be, since I've found Serenity...
The POSIX NT subsystem (and Interix, the user-space software that runs in the subsystem) has existed for a very long time; the POSIX subsystem itself shipped with the very first release, NT 3.1. The NT kernel doesn't actually use Win32 (or Win16, DOS, or Win64) system calls; it uses NT system calls, which are a superset of the functionality in all of those, plus the functionality required for OS/2 and POSIX. For example, the NtCreateFile system call not only implements the Win32 CreateFile call (as seen in Win9x) but also the OpenFile call (Win16) and the open system call (POSIX). For each API that NT supports, there is a user-mode DLL that translates the API-specific calls (such as open(2)) into NT system calls (such as NtCreateFile()). These are then passed to ntdll.dll, which executes the actual system call (invoking ring-0 kernel code).
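The layering described above, where an API-specific user-mode DLL translates calls into native NT calls, can be illustrated with file-creation semantics alone. A simplified sketch, assuming a hypothetical translation function: the `FILE_*` names and values are NtCreateFile's documented CreateDisposition constants and the Win32 names are CreateFile's dwCreationDisposition values, but the `PosixFlag` numbers are arbitrary stand-ins. Access masks, share modes and error handling are left out, and this is not the code of any actual Windows subsystem DLL:

```python
from enum import Enum, IntFlag


class NtDisposition(Enum):
    """CreateDisposition values accepted by the native NtCreateFile call."""
    FILE_SUPERSEDE = 0
    FILE_OPEN = 1
    FILE_CREATE = 2
    FILE_OPEN_IF = 3
    FILE_OVERWRITE = 4
    FILE_OVERWRITE_IF = 5


class PosixFlag(IntFlag):
    # Stand-ins for the POSIX open(2) creation flags; the real numeric
    # values differ between systems, so these numbers are arbitrary.
    O_CREAT = 1
    O_EXCL = 2
    O_TRUNC = 4


def posix_open_to_nt(flags):
    """Map open(2) creation flags onto a single NtCreateFile disposition."""
    creat = bool(flags & PosixFlag.O_CREAT)
    excl = bool(flags & PosixFlag.O_EXCL)
    trunc = bool(flags & PosixFlag.O_TRUNC)
    if creat and excl:
        return NtDisposition.FILE_CREATE        # fail if it already exists
    if creat and trunc:
        return NtDisposition.FILE_OVERWRITE_IF  # create, or truncate existing
    if creat:
        return NtDisposition.FILE_OPEN_IF       # open, or create if missing
    if trunc:
        return NtDisposition.FILE_OVERWRITE     # must exist; truncate it
    return NtDisposition.FILE_OPEN              # must exist


# Win32 CreateFile's dwCreationDisposition funnels into the same native values.
WIN32_TO_NT = {
    "CREATE_NEW": NtDisposition.FILE_CREATE,
    "CREATE_ALWAYS": NtDisposition.FILE_OVERWRITE_IF,
    "OPEN_EXISTING": NtDisposition.FILE_OPEN,
    "OPEN_ALWAYS": NtDisposition.FILE_OPEN_IF,
    "TRUNCATE_EXISTING": NtDisposition.FILE_OVERWRITE,
}

print(posix_open_to_nt(PosixFlag.O_CREAT).name)  # FILE_OPEN_IF
```

Both personalities land on the same native value: POSIX `O_CREAT | O_TRUNC` and Win32 `CREATE_ALWAYS` both become `FILE_OVERWRITE_IF`, which is why one native call can sit underneath several APIs.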
The OS/2 subsystem was discontinued years ago, but the POSIX one is still supported. From XP forward, it's been possible to enable the POSIX subsystem and download pre-compiled libraries, shells, utilities, headers, build toolchain (optionally using GCC or MSVC), manpages, and so forth to produce a working, if somewhat bare-bones, UNIX-like environment. Initially called OpenNT and now known as Interix, various third parties have provided additional functionality such as package managers (apt, portage, pkgsrc, or one specifically for Interix from http://suacommunity.com/ ), additional shells, libraries, utilities, X servers, and more.