More Itanium-Linux Capability

Posted by ryuzaki0 on Wednesday February 16, 2000 @04:23PM from the make-bzimage-;-make-modules dept.

gregus writes "Cnet is reporting that SGI and Red Hat have released their Itanium compilers and will make them open source." Mentions the Trillian kernel porting effort, and other stuff. Kinda a fluff piece: any piece that explains what a compiler is is probably fluff ;)

Re:Aww yeah. by Guy Harris · 2000-02-16 14:32 · Score: 4

"I think TM should at least document the instruction set for their chips"

You left an "s" out following "instruction set"; Transmeta's technical white paper on Crusoe says on pages 7 and 8 that "the native ISA of the model TM5400 is an enhancement (neither forward nor backward compatible) of the model TM3120's ISA and therefore runs a different version of Code Morphing software."
As others have noted, publishing the native instruction set architecture may trap them into continuing to provide products that implement that ISA (or writing a binary-to-binary translator (he says, avoiding the "CM" phrase) to map that ISA to the new chip's native ISA), and that appears to be one thing they don't want to do - they want to be able to change the internal instruction set from product to product as they think appropriate.

Re:RedHat/Cygnus IA-64 Developer Release README by Mr Z · 2000-02-16 14:50 · Score: 3

The Haifa scheduler and other "interesting" pieces in the backend should really help a lot. From what I recall, Haifa includes a software pipeliner as well as some other block-scheduling pieces which will be very necessary to get parallelism out of this beast.

One thing I wonder is whether they're actually generating bundles, or if they're just issuing a serial code stream. For the uninitiated, a bundle is Intel's term for a group of instructions that have been marked for parallel execution. An early compiler port that's striving for correctness can ignore bundling entirely by simply issuing bundles which contain a single instruction each. The peephole optimizer might do trivial pairing of instructions after the fact, but you really don't get a lot of parallelism that way, trust me.
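A minimal sketch of the two strategies, in Python. Everything here is a toy model: the instruction tuples, register names, mnemonics, and the `width` limit are invented for illustration, and "bundle" is used in the simplified sense above (a group of mutually independent instructions), not as a model of the real IA-64 encoding.

```python
# Hypothetical toy model: an instruction is (dest, sources, text).
# "Bundle" means a group of instructions with no dependencies among them.

def naive_bundles(instrs):
    """Correct-but-slow strategy: one instruction per bundle."""
    return [[ins] for ins in instrs]

def greedy_bundles(instrs, width=3):
    """Pack consecutive instructions together until a dependency or the
    width limit forces a new bundle. Instruction order is never changed."""
    bundles, current = [], []
    written, read = set(), set()
    for dest, srcs, text in instrs:
        # Conservative check: any overlap with the current bundle is a conflict.
        conflict = (
            any(s in written for s in srcs)   # read-after-write
            or dest in written                # write-after-write
            or dest in read                   # write-after-read
        )
        if current and (conflict or len(current) == width):
            bundles.append(current)
            current, written, read = [], set(), set()
        current.append((dest, srcs, text))
        written.add(dest)
        read.update(srcs)
    if current:
        bundles.append(current)
    return bundles

program = [
    ("r1", ("r10",), "ld  r1 = [r10]"),
    ("r2", ("r11",), "ld  r2 = [r11]"),
    ("r3", ("r1", "r2"), "add r3 = r1, r2"),
    ("r4", ("r12",), "ld  r4 = [r12]"),
    ("r5", ("r3", "r4"), "mul r5 = r3, r4"),
]

print(len(naive_bundles(program)))   # 5
print(len(greedy_bundles(program)))  # 3
```

The greedy pass is roughly the "trivial pairing after the fact" case: it never reorders, so it only finds parallelism that happens to sit next to each other in the serial stream. A real scheduler would reorder instructions and model latencies before packing.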

The compiler won't truly shine until the full IA-64 pipeline model, complete with instruction latencies, numbers and mixes of functional units, etc. is described in minute detail to the compiler, and the compiler has the infrastructure for stitching together tightly packed bundles. There are many techniques and optimizations that will need to be implemented in order to stitch those bundles together.

It'll be even more interesting if the compiler can tune for different EPIC iterations, since different chips will have different numbers of functional units. Although the EPIC encoding is scalable, the best performance will be reached if the code provides parallelism which matches the available hardware, rather than exceeding it, since overly parallel code may tie up more registers than is necessary and will thrash the instruction cache if it's unrolled too much.

I'm willing to wager that this early GNU C port is available now because the IA64 offers a protected pipeline. IMHO, the single biggest difference between EPIC and VLIW is that EPIC provides pipeline interlocks, whereas traditional VLIW exposes all delay slots and requires the programmer to get it right. While the protected pipeline allows early compilers to ramp up quickly, it also lowers the performance ceiling for a given transistor count.

If anyone here wants to see really hairy VLIW code, go check out TI's C6000 benchmarks page. The C6000 can issue 8 instructions every cycle, and has a fully exposed pipeline. (For those of you crazy enough to click the link, the '||' are used to denote parallel instructions, and branches occur 5 cycles after they're issued.) It's an absolute blast to program by hand (it's my day job), but you don't want to program anything larger than a function in scope. You get a very strong appreciation for compiler technology too. :-) Let me tell you, I've seen some of these "interesting" optimizations coming from the C6000 compiler, and they're pretty mind-bending. I wonder how long they'll take to get these into the IA64 compilers...
--Joe
--
Program Intellivision!

EPIC is much more than VLIW. by Mr Z · 2000-02-16 15:19 · Score: 5

EPIC adds an awful lot to the VLIW base. It encodes explicit parallelism, much like VLIW does, but it breaks away from some VLIW principles in order to make it easier to get initial compilers targeted to the platform and easier for Intel to change the pipeline later.

Traditional VLIW machines sport a "fully exposed pipeline", which means that if an instruction takes more than 1 cycle, the program doesn't see the result until it's actually written back, and the machine lets the user read the old value in the meantime. (For those of you who are familiar with the MIPS or SPARC architectures, you might recognize this concept as "delay slots". VLIW takes this to the extreme such that all delay slots are always fully exposed.) The benefit of this is that you eliminate pipeline interlocks, thereby simplifying the hardware greatly. The pipeline always knows it can issue the next instruction and never has to compare notes between packets. Very clean, and quite simple compared to the heavy voodoo modern CPUs currently perform.
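A toy simulation of that behavior, sketched in Python. The class name, the latencies, and the register names are all invented for illustration; this does not model any particular VLIW chip.

```python
# Toy model of a fully exposed pipeline (no interlocks).
# Latencies and register names are hypothetical.

class ExposedPipeline:
    def __init__(self, regs):
        self.regs = dict(regs)
        self.cycle = 0
        self.pending = []  # (writeback_cycle, register, value)

    def _retire(self):
        # Commit every result whose write-back cycle has arrived.
        due = [p for p in self.pending if p[0] <= self.cycle]
        self.pending = [p for p in self.pending if p[0] > self.cycle]
        for _, reg, value in sorted(due):
            self.regs[reg] = value

    def issue(self, dest, fn, srcs, latency):
        """Issue one instruction. Sources are read immediately, so a
        register whose result is still in flight yields its OLD value."""
        self._retire()
        value = fn(*(self.regs[s] for s in srcs))
        self.pending.append((self.cycle + latency, dest, value))
        self.cycle += 1

    def nop(self):
        self._retire()
        self.cycle += 1

    def drain(self):
        while self.pending:
            self.nop()
        return self.regs

def run(delay_nops):
    cpu = ExposedPipeline({"r1": 2, "r2": 10, "r3": 0})
    cpu.issue("r1", lambda a, b: a * b, ("r2", "r2"), latency=3)  # r1 = 100, later
    for _ in range(delay_nops):
        cpu.nop()
    cpu.issue("r3", lambda a: a + 1, ("r1",), latency=1)          # r3 = r1 + 1
    return cpu.drain()["r3"]

print(run(0))  # 3   (the add saw the old r1)
print(run(2))  # 101 (two delay slots filled, the add saw the new r1)
```

Neither run is "wrong" as far as the hardware is concerned. The programmer or compiler has to know the multiply's latency and either fill the delay slots or deliberately exploit the old value.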

EPIC, in contrast, offers a protected pipeline. From what I've read, it sounds like it's using a simple scoreboard approach to keep track of in-flight values, so it's not nearly as complex as the many register-renaming approaches that are out there; however, it's still quite a bit more complex than the traditional VLIW approach. The protected pipeline makes it easier for Intel to change the pipeline depth later. VLIW doesn't have that luxury for its native code, since changing the pipeline changes the delay slots and breaks all existing code. (Incidentally, that's probably the real reason Transmeta doesn't want anyone targeting its VLIW engine directly. It can't change the pipeline very much if anyone actually does. It's not the instruction set that matters as much as it is the pipeline!)
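A rough sketch of what a scoreboard-style interlock buys you, in Python. This is an assumption-laden toy: the function name, the `latency_of` table, and the two-instruction program are hypothetical, and it is not a description of how any real Itanium pipeline tracks in-flight values.

```python
def run_with_scoreboard(program, regs, latency_of):
    """program: list of (op, dest, srcs, fn). Returns (regs, total_cycles).

    ready_at plays the role of the scoreboard: for each register it records
    the first cycle at which the newest value can be read."""
    regs = dict(regs)
    ready_at = {}
    cycle = 0
    for op, dest, srcs, fn in program:
        # Interlock: stall until every source (and the destination, to keep
        # writes ordered in this toy model) is no longer in flight.
        start = max([cycle] + [ready_at.get(r, 0) for r in (*srcs, dest)])
        regs[dest] = fn(*(regs[s] for s in srcs))
        ready_at[dest] = start + latency_of[op]
        cycle = start + 1
    return regs, max([cycle] + list(ready_at.values()))

program = [
    ("mul", "r1", ("r2", "r2"), lambda a, b: a * b),
    ("add", "r3", ("r1",), lambda a: a + 1),
]
init = {"r1": 2, "r2": 10, "r3": 0}

shallow, shallow_cycles = run_with_scoreboard(program, init, {"mul": 3, "add": 1})
deep, deep_cycles = run_with_scoreboard(program, init, {"mul": 5, "add": 1})

print(shallow["r3"], shallow_cycles)  # 101 4
print(deep["r3"], deep_cycles)        # 101 6
```

The same code stream gives the same answer on both pipelines; only the cycle count moves. On an exposed pipeline, lengthening the multiply would instead change what the add reads.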

Traditional VLIW also encodes the exact functional unit that each instruction will be issued on. It does this either positionally (by having a slot in the VLIW opcode for each functional unit and using a fixed-length opcode), or, in the case of C6000, by assigning each unit a different portion of the opcode space and stringing together independent instructions through some bundling mechanism. The main point here is that traditional VLIW encodes the mix of functional units in the code stream. This makes it difficult to change the number or mix of functional units, but it can greatly simplify dispatch, as the dispatcher only needs to look at the instruction word -- it doesn't need to know if the functional units are busy or whatever.*

EPIC, on the other hand, relies on superscalar issue techniques to identify functional units that are available and to issue instructions to them. Again, this costs a lot of hardware, but since the parallelism is encoded for the CPU, the hardest part (determining if two instructions have a dependency) is taken care of. There still needs to be a fair amount of logic in the pipeline, though, for pulling instructions out of bundles and finding units for them.
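To make the "finding units for them" step concrete, here is a small Python sketch. The unit kinds, unit names, and instruction strings are made up; the only assumption carried over from the text is that the instructions in a group are already known to be independent, so the hardware only has to match them to free units.

```python
# Hypothetical unit kinds and names; not any real chip's functional-unit mix.

def dispatch(group, units):
    """Assign each instruction in an independent group to a free unit of the
    right kind. Returns (assigned, leftover); leftover instructions must
    wait for a later cycle."""
    free = {kind: list(names) for kind, names in units.items()}
    assigned, leftover = [], []
    for kind, text in group:
        if free.get(kind):
            assigned.append((free[kind].pop(0), text))
        else:
            leftover.append((kind, text))
    return assigned, leftover

group = [("mem", "ld r1"), ("alu", "add r2"), ("alu", "sub r3"), ("mem", "ld r4")]

small = {"alu": ["A0"], "mem": ["M0"]}
big = {"alu": ["A0", "A1"], "mem": ["M0", "M1"]}

print(dispatch(group, small))
print(dispatch(group, big))
```

The same group runs on both configurations: the small machine issues two instructions and holds two back, the big one issues all four. With positional VLIW encoding, the unit would be baked into each instruction word, so the code would have to be regenerated for the other machine.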

That said, there are many ways in which EPIC and VLIW are the same. EPIC features such as predication, speculative loads, rotating register files, and so on are also available in the VLIW world. (Not all VLIWs implement these though. The C6000, for instance, only implements predication, but arguably it's the feature with the greatest bang/buck ratio.) Also, explicitly coded parallelism is another distinguishing feature shared by both EPIC and VLIW.

But please, don't confuse the issue by insisting they are the same. A true VLIW core has very spartan decode and dispatch hardware compared to what will be necessary to fully support an EPIC machine. The VLIW will be much more finicky to support, but as long as you have a compiler of some sort in between your codebase and the core (e.g., the Transmeta Code Morphing software), you're safe.
--Joe

[*Actually, it does need to know, if the architecture has some instructions that aren't fully pipelined. However, it only needs to know enough so that it doesn't blow up the chip. Code which issues an instruction to a unit that's busy is incorrect code in the VLIW world, and the hardware won't save you. Period.]

--
Program Intellivision!

Aww yeah. by pb · 2000-02-16 11:47 · Score: 3

On a chip this weird, we'll need the compiler. The fact that it's open source is awesome. That's just as cool as if Transmeta made their code-morphing software open source... (just so people understand, these are somewhat similar issues) Actually, maybe Transmeta could work on fast x86 translation for running natively on these platforms. I don't know if it'd be faster or better than the emulation or not.

CISC was made to make the assembler programmer's life easier. RISC was made to make the hardware manufacturer's life easier. VLIW was made to eke out more speed without using different (increasingly weird) techniques. I don't think it makes anyone's life particularly easy except for perhaps the end user. But I know it will make the compiler writers' lives hell. :)

My take on it is that by executing instructions in parallel by design, you can avoid the bother of reordering so many instructions on the fly, and trust the compiler to do a good job the first time. Therefore, good compilers will be crucial to the speed improvements with this new platform.
---
pb Reply or e-mail; don't vaguely moderate.

Awwwww.... by Accipiter · 2000-02-16 11:32 · Score: 3

"Intel said it will offer only minimal help to Sun because Sun wasn't doing enough to encourage software companies to use Intel chips instead of Sun's own UltraSparc chips."

I guess it's Intel's ball: if they don't want to play, they'll take their ball and go home.

This can backfire though. Okay, so Intel is doing the same thing Sun did, and most likely will have a similar result. So Sun won't encourage users to run Solaris on Intel chips. (It's not going to have a huge impact on Intel, but it's a factor.)

-- Give him Head? Be a Beacon?
(If you can't figure out how to E-Mail me, Don't. :P)

Heroic compilers for EPIC by possible · 2000-02-16 11:57 · Score: 3

"For this architecture, you really need a great compiler," said HP's David Mosberger in an interview earlier this month. Mosberger has been working on Linux for Intel's upcoming chip families for two years.
My understanding is that this new Intel chip will be the first commercially available chip to use EPIC (Explicitly Parallel Instruction Computing).
From what I've read, the philosophy of EPIC is to have the CPU slavishly execute instructions in the exact order and manner prescribed by the compiler, allowing compilers to do intense optimizations without worrying about being second-guessed by the CPU. To quote from an article in this month's issue of IEEE Computer magazine:
[EPIC and VLIW code] provides an explicit plan for how the processor will execute the program, a plan the compiler creates statically at compile time. The code explicitly specifies when each operation will be executed, which functional units will do the work, and which registers will hold the operands.
There is a decent overview of EPIC at http://www.linux3d.net/cpu/CPU/epic/.
What I couldn't determine from my reading was whose standard it is and to what degree the IA-64 chip will implement it.

Hey! by Signal 11 · 2000-02-16 12:07 · Score: 3

Itanium, Pentium, Xeon...
Okay, I'm seeing a pattern developing here.. but why not name the chip what it is? I propose a new chip...
Marketanium
Marketanium is a revolutionary new 13th generation Inhell(tm)(r)(c) processor capable of over 30 FudFlops per second. It also has the new MNI (Means Nothing) instruction set and boasts a 1.6 BogoHerz speed....
Bleh. I wish they'd just name them the way they used to: 8088.. 80286..386..486..586... or at least come up with better names for their chips.. like the Sextium!

RedHat/Cygnus IA-64 Developer Release README by Richard Wakefield · 2000-02-16 13:03 · Score: 5

The README for the Linux/ia64 Developer's Release on Cygnus' ftp site (which, incidentally, is what RedHat's site links to) has some very interesting tidbits:

The entire GNU toolchain has been extended to support IA-64 (this includes binutils, gcc, and gdb).

The compiler generates working code, but does not generate optimized code for the Itanium processor yet. It has some basic optimizations, but no "interesting" optimizations yet.

Binutils is mostly functional, with the exception of shared library support and a few other things.

Gdb has only partial functionality--basic commands work, but most advanced commands are not working.

--
"You can represent this entire problem as a 3x2 matrix"