More Itanium-Linux Capability

Re:Not errors, but... by Guy+Harris · 2000-02-16 14:17 · Score: 2

Some niceties cc gets that gcc doesn't:

Variables that are assigned a value, but never used in an expression/function call

Static functions that aren't referenced by any other function (gcc does do this for variables, however)

GCC gets those two, at least if you use a sufficiently high level of optimization:

% gcc --version
2.7.2.3
% cat foo.c
static int
unused(int arg)
{
    return arg + 17;
}

int
foo(int bar)
{
    int a = 17;

    return bar;
}
% gcc -c -O2 -Wall foo.c
foo.c: In function `foo':
foo.c:10: warning: unused variable `a'
foo.c: At top level:
foo.c:3: warning: `unused' defined but not used

although it doesn't get the other two (although "function arguments that aren't referenced" are sometimes desirable if the function is called through a pointer, and some functions pointed to do use the argument in question; other times, though, it's an indication of an error).
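To make the function-pointer case concrete, here is a minimal sketch. The names (`handler_fn`, `on_data`, `on_reset`, `dispatch`) are made up for illustration and are not from any real codebase. Every entry in the table must share one signature, so one handler legitimately ignores an argument:

```c
#include <stddef.h>

/* Hypothetical handler type: every handler receives the same arguments. */
typedef int (*handler_fn)(int event, int payload);

/* This handler uses both arguments. */
static int on_data(int event, int payload)
{
    return event + payload;
}

/* This handler ignores `payload` on purpose; its signature is fixed
   by handler_fn, so the parameter cannot simply be dropped. */
static int on_reset(int event, int payload)
{
    (void)payload;  /* explicit "yes, I meant to ignore this" */
    return event;
}

static const handler_fn handlers[] = { on_data, on_reset };

/* Call handler number `which` through the pointer table. */
int dispatch(size_t which, int event, int payload)
{
    return handlers[which](event, payload);
}
```

A warning about `payload` being unreferenced in `on_reset` would be noise here, whereas the same warning on a function that is only ever called directly might point at a real mistake.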

To gcc's credit, it does do some pretty spiffy control-flow analysis with -O9

Does -O9 do better than, say, -O2?

(One problem with said flow analysis is that it sometimes gives "false hits", so one occasionally either has to filter out the noise or stick in an unnecessary initialization.)
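A sketch of the kind of code that can produce such a false hit (the function name `last_positive` is invented for the example, and whether any particular gcc version actually warns on it is not guaranteed):

```c
/* Returns the last positive element of v[0..n-1], or -1 if there is none.
   `value` is only read when `have_value` is set, so it is never used
   uninitialized, but flow analysis may not be able to prove that. */
int last_positive(const int *v, int n)
{
    int have_value = 0;
    int value;  /* writing "int value = 0;" is the unnecessary initialization */
    int i;

    for (i = 0; i < n; i++) {
        if (v[i] > 0) {
            value = v[i];
            have_value = 1;
        }
    }
    return have_value ? value : -1;
}
```

The two ways out are the ones mentioned above: ignore the warning for this function, or initialize `value` even though the logic never needs it.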

Back to gcc, I'll have to try -W along with -Wall... that really turns up the analness?

I don't think so. The man page says:

-W     Print extra warning messages for these events:
       [list of events elided]
...
-Wall  All of the above `-W' options combined. These are all the options which pertain to usage that we recommend avoiding and that we believe is easy to avoid, even in conjunction with macros.

so -Wall would appear to include -W.

(At work, we compile the software for our products with -Wall, some other -W options to insist that functions have prototype declarations to increase the chances that a prototype declaration will be available when the function is called, and -Werror to ensure that if you violate the rules you don't get an image to download to the box....)

Re:Aww yeah. by Guy+Harris · 2000-02-16 14:32 · Score: 4

I think TM should at least document the instruction set for their chips

You left an "s" out following "instruction set"; Transmeta's technical white paper on Crusoe says on pages 7 and 8 that "the native ISA of the model TM5400 is an enhancement (neither forward nor backward compatible) of the model TM3120's ISA and therefore runs a different version of Code Morphing software."

As others have noted, publishing the native instruction set architecture may trap them into continuing to provide products that implement that ISA (or writing a binary-to-binary translator (he says, avoiding the "CM" phrase) to map that ISA to the new chip's native ISA), and that appears to be one thing they don't want to do - they want to be able to change the internal instruction set from product to product as they think appropriate.

Re:-Wall no longer implies -W by Guy+Harris · 2000-02-16 14:38 · Score: 2

-Wall no longer implies -W.

Well, that's annoying; any idea what the rationale for not including the -W warnings in the list of "all" (not-too-hard-to-eliminate) warnings was?

Oh, well, time to tweak the recipe files from which Makefiles are built at work, and to tweak Makefile.am for Ethereal....

Re:RedHat/Cygnus IA-64 Developer Release README by Mr+Z · 2000-02-16 14:50 · Score: 3

The Haifa scheduler and other "interesting" pieces in the backend should really help a lot. From what I recall, Haifa includes a software pipeliner as well as some other block-scheduling pieces which will be very necessary to get parallelism out of this beast.
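For readers who haven't met software pipelining, here is a rough hand-written sketch of the idea in C. It is not what Haifa or any real compiler emits, and the function name `dot_pipelined` is made up. The loads for one iteration are overlapped with the multiply-add of the previous one, with a prologue to start the pipeline and an epilogue to drain it:

```c
#include <stddef.h>

/* Dot product with the loop pipelined by hand: each trip through the
   loop loads the operands for iteration i while doing the arithmetic
   for iteration i-1. */
long dot_pipelined(const int *a, const int *b, size_t n)
{
    long sum = 0;
    int x, y;
    size_t i;

    if (n == 0)
        return 0;

    x = a[0];                   /* prologue: first loads */
    y = b[0];
    for (i = 1; i < n; i++) {
        int nx = a[i];          /* loads for iteration i */
        int ny = b[i];
        sum += (long)x * y;     /* multiply-add for iteration i-1 */
        x = nx;
        y = ny;
    }
    sum += (long)x * y;         /* epilogue: last multiply-add */
    return sum;
}
```

On a machine with several functional units, the loads and the multiply-add inside the loop body are independent of each other and can be issued together, which is the point of the transformation.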

One thing I wonder is whether they're actually generating bundles, or if they're just issuing a serial code stream. For the uninitiated, a bundle is Intel's term for a group of instructions that have been marked for parallel execution. An early compiler port that's striving for correctness can avoid knowing about bundles by simply issuing bundles which contain a single instruction each. The peephole optimizer might do trivial pairing of instructions after the fact, but you really don't get a lot of parallelism that way, trust me.

The compiler won't truly shine until the full IA-64 pipeline model, complete with instruction latencies, numbers and mixes of functional units, etc. is described in minute detail to the compiler, and the compiler has the infrastructure for stitching together tightly packed bundles. There are many techniques and optimizations that will need to be implemented in order to stitch those bundles together.
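As a toy illustration of what "stitching bundles together" involves, here is a greedy in-order packer. Everything in it is an assumption made for the sketch: the `struct insn` layout, the width of 3, and the simplified dependency rule. It ignores latencies, functional-unit mixes, and the real IA-64 encoding entirely:

```c
#include <stddef.h>

#define BUNDLE_WIDTH 3  /* assumed maximum instructions per bundle */

/* Toy instruction: one destination register and up to two sources. */
struct insn {
    int dst;   /* register written, or -1 if none */
    int src1;  /* registers read, or -1 if unused */
    int src2;
};

/* True if `b` must not share a bundle with the earlier instruction `a`.
   Only read-after-write and write-after-write are checked. */
static int depends(const struct insn *a, const struct insn *b)
{
    if (a->dst < 0)
        return 0;
    if (a->dst == b->src1 || a->dst == b->src2)
        return 1;  /* b reads what a writes */
    if (a->dst == b->dst)
        return 1;  /* both write the same register */
    return 0;
}

/* Greedy in-order packing. bundle_of[i] receives the bundle index of
   instruction i; the return value is the number of bundles used. */
size_t pack_bundles(const struct insn *code, size_t n, size_t *bundle_of)
{
    size_t start = 0;    /* first instruction of the current bundle */
    size_t bundles = 0;  /* index of the current bundle */
    size_t i, j;

    for (i = 0; i < n; i++) {
        int conflict = (i - start >= BUNDLE_WIDTH);  /* bundle is full */

        for (j = start; j < i && !conflict; j++)
            conflict = depends(&code[j], &code[i]);
        if (conflict) {  /* close the bundle, start a new one at i */
            start = i;
            bundles++;
        }
        bundle_of[i] = bundles;
    }
    return n ? bundles + 1 : 0;
}
```

Because it never reorders anything, this is closer to the "trivial pairing after the fact" end of the spectrum. A real scheduler would move independent instructions past dependent ones to fill the bundles.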

It'll be even more interesting if the compiler can tune for different EPIC iterations, since different chips will have different numbers of functional units. Although the EPIC encoding is scalable, the best performance will be reached if the code provides parallelism which matches the available hardware, rather than exceeding it, since overly parallel code may tie up more registers than is necessary and will trash the instruction cache if it's unrolled too much.

I'm willing to wager that this early GNU C port is available now because the IA64 offers a protected pipeline. IMHO, the single biggest difference between EPIC and VLIW is that EPIC provides pipeline interlocks, whereas traditional VLIW exposes all delay slots and requires the programmer to get it right. While the protected pipeline allows early compilers to ramp up quickly, it also lowers the performance ceiling for a given transistor count.

If anyone here wants to see really hairy VLIW code, go check out TI's C6000 benchmarks page. The C6000 can issue 8 instructions every cycle, and has a fully exposed pipeline. (For those of you crazy enough to click the link, the '||' are used to denote parallel instructions, and branches occur 5 cycles after they're issued.) It's an absolute blast to program by hand (it's my day job), but you don't want to program anything larger than a function in scope. You get a very strong appreciation for compiler technology too. :-) Let me tell you, I've seen some of these "interesting" optimizations coming from the C6000 compiler, and they're pretty mind-bending. I wonder how long they'll take to get these into the IA64 compilers...

--Joe
--
Program Intellivision!

EPIC is much more than VLIW. by Mr+Z · 2000-02-16 15:19 · Score: 5

EPIC adds an awful lot to the VLIW base. It encodes explicit parallelism, much like VLIW does, but it breaks away from some VLIW principles in order to make it easier to get initial compilers targeted to the platform and easier for Intel to change the pipeline later.

Traditional VLIW machines sport a "fully exposed pipeline", which means that if an instruction takes more than 1 cycle, the program doesn't see the result until it's actually written back, and the machine lets the user read the old value in the meantime. (For those of you who are familiar with the MIPS or SPARC architectures, you might recognize this concept as "delay slots". VLIW takes this to the extreme such that all delay slots are always fully exposed.) The benefit of this is that you eliminate pipeline interlocks, thereby simplifying the hardware greatly. The pipeline always knows it can issue the next instruction and never has to compare notes between packets. Very clean, and quite simple compared to the heavy voodoo modern CPUs currently perform.

EPIC, in contrast, offers a protected pipeline. From what I've read, it sounds like it's using a simple scoreboard approach to keep track of in-flight values, so it's not nearly as complex as the many register-renaming approaches that are out there; however, it's still quite a bit more complex than the traditional VLIW approach. The protected pipeline makes it easier for Intel to change the pipeline depth later. VLIW doesn't have that luxury for its native code, since changing the pipeline changes the delay slots and breaks all existing code. (Incidentally, that's probably the real reason Transmeta doesn't want anyone targeting its VLIW engine directly. It can't change the pipeline very much if anyone actually does. It's not the instruction set that matters as much as it is the pipeline!)
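The difference between the two pipeline styles can be shown with a very small model. This is only a sketch: the `struct reg` layout and both function names are invented here, and a real scoreboard tracks far more than one in-flight write per register:

```c
/* A register with exactly one write in flight (assumed for the sketch). */
struct reg {
    int value;        /* old, architecturally visible value */
    int pending;      /* result currently being produced */
    int ready_cycle;  /* cycle at which `pending` is written back */
};

/* Exposed pipeline: a read issued before write-back silently sees the
   old value. Correct code must be scheduled around the latency. */
int read_exposed(const struct reg *r, int cycle)
{
    return cycle >= r->ready_cycle ? r->pending : r->value;
}

/* Protected pipeline: the hardware interlock stalls the reader until
   write-back, so it always sees the new value. *cycle is advanced to
   reflect the stall. */
int read_protected(const struct reg *r, int *cycle)
{
    if (*cycle < r->ready_cycle)
        *cycle = r->ready_cycle;  /* wait for the result */
    return r->pending;
}
```

If the latency changes (a different `ready_cycle`), `read_protected` still returns the same answer and only the timing moves, whereas `read_exposed` can return a different value for the same code. That is the compatibility problem described above.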

Traditional VLIW also encodes the exact functional unit that each instruction will be issued on. It does this either positionally (by having a slot in the VLIW opcode for each functional unit and using a fixed-length opcode), or, in the case of C6000, by assigning each unit a different portion of the opcode space and stringing together independent instructions through some bundling mechanism. The main point here is that traditional VLIW encodes the mix of functional units in the code stream. This makes it difficult to change the number or mix of functional units, but it can greatly simplify dispatch, as the dispatcher only needs to look at the instruction word -- it doesn't need to know if the functional units are busy or whatever.*

EPIC, on the other hand, relies on superscalar issue techniques to identify functional units that are available and to issue instructions to them. Again, this costs a lot of hardware, but since the parallelism is encoded for the CPU, the hardest part (determining if two instructions have a dependency) is taken care of. There still needs to be a fair amount of logic in the pipeline, though, for pulling instructions out of bundles and finding units for them.

That said, there are many ways in which EPIC and VLIW are the same. EPIC features such as predication, speculative loads, rotating register files, and so on are also available in the VLIW world. (Not all VLIWs implement these though. The C6000, for instance, only implements predication, but arguably it's the feature with the greatest bang/buck ratio.) Also, explicitly coded parallelism is a distinguishing feature that EPIC and VLIW share.

But please, don't confuse the issue by insisting they are the same. A true VLIW core has very spartan decode and dispatch hardware compared to what will be necessary to fully support an EPIC machine. The VLIW will be much more finicky to support, but as long as you have a compiler of some sort in between your codebase and the core (e.g., the Transmeta Code Morphing software), you're safe.

--Joe

[*Actually, it does need to know, if the architecture has some instructions that aren't fully pipelined. However, it only needs to know enough so that it doesn't blow up the chip. Code which issues an instruction to a unit that's busy is incorrect code in the VLIW world, and the hardware won't save you. Period.]

--
Program Intellivision!

What Compilers Really Do by Carnage4Life · 2000-02-16 11:29 · Score: 2

Cmdr Taco: Kinda a fluff piece: any piece that explains what a compiler is is probably fluff

CNet: Much of the performance gain that's expected from Itanium will only become a reality if compilers can line up instructions in just the right way so that the chip can operate efficiently. And compilers also are an essential tool for getting higher-level software, such as databases or e-commerce software, to work on the new chip.

Aww yeah. by pb · 2000-02-16 11:47 · Score: 3

On a chip this weird, we'll need the compiler. The fact that it's open source is awesome. That's just as cool as if Transmeta made their code-morphing software open source... (just so people understand, these are somewhat similar issues) Actually, maybe Transmeta could work on fast x86 translation for running natively on these platforms. I don't know if it'd be faster or better than the emulation or not.

CISC was made to make the assembler programmer's life easier. RISC was made to make the hardware manufacturer's life easier. VLIW was made to eke out more speed without using different (increasingly weird) techniques. I don't think it makes anyone's life particularly easy except for perhaps the end user. But I know it will make compiler writers' lives hell. :)

My take on it is that by executing instructions in parallel by design, you can avoid the bother of reordering so many instructions on the fly, and trust the compiler to do a good job the first time. Therefore, good compilers will be crucial to the speed improvements with this new platform.
---
pb Reply or e-mail; don't vaguely moderate.


two things: by Blue+Lang · 2000-02-16 11:48 · Score: 2

one: according to my cursory search, this is the ninth c|net story posted to /. in the month of february. clue to everyone reading /. - read www.news.com - it's good, and there are no grits.

clue to those of you who post this stuff: c|net has a daily email digest. you can just procmail it straight into HTML.

two: On top of the above, isn't it sort of ironic that Taco makes fun of articles about compilers, when /. is written in perl?

Maybe not.

--
blue, burning karma because he can.

--
i browse at -1 because they're funnier than you are.

Awwwww.... by Accipiter · 2000-02-16 11:32 · Score: 3

Intel said it will offer only minimal help to Sun because Sun wasn't doing enough to encourage software companies to use Intel chips instead of Sun's own UltraSparc chips.

I guess it's Intel's ball: if they don't want to play, they'll take their ball and go home.

This can backfire though. Okay, so Intel is doing the same thing Sun did, and most likely will have a similar result. So Sun won't encourage users to run Solaris on Intel chips. (It's not going to have a huge impact on Intel, but it's a factor.)

-- Give him Head? Be a Beacon?

(If you can't figure out how to E-Mail me, Don't. :P)

gcc or what? by Straker+Skunk · 2000-02-16 11:42 · Score: 2

Fast binaries rock!! GO SGI!!!

(*dances jig*)

I'm wondering, especially since they went the whole nine yards to release this under GPL-- will this be a new backend to gcc, or is it a whole new animal?

While I haven't had cause to complain about gcc's compiled-binary performance, I will go gaga if [SGI's compiler] has the same code-analysis capability as the IRIX cc. Having lint practically built right in is soooooo nice for debugging... if you can build it cleanly with -fullwarn, you can build it anywhere. (IMHO, gcc's greatest fault is that it's too damned lenient }:-> )

--
iSKUNK!

Re:gcc or what? by sterwill · 2000-02-16 11:44 · Score: 2

Having never used IRIX's C compiler in-depth, I can't really comment on its capabilities (they sound impressive). But GCC is one of the most correct compilers I've used. Do you have any examples of invalid code that GCC will allow through with "-W -Wall -ansi -pedantic"?

--

Heroic compilers for EPIC by possible · 2000-02-16 11:57 · Score: 3

"For this architecture, you really need a great compiler," said HP's David Mosberger in an interview earlier this month. Mosberger has been working on Linux for Intel's upcoming chip families for two years.

My understanding is that this new Intel chip will be the first commercially available chip to use the EPIC (Explicitly Parallel Instruction Computing) architecture.

From what I've read, the philosophy of EPIC is to have the CPU slavishly execute instructions in the exact order and manner prescribed by the compiler, allowing compilers to do intense optimizations without worrying about being second-guessed by the CPU. To quote from an article in this month's issue of IEEE Computer magazine:

[EPIC and VLIW code] provides an explicit plan for how the processor will execute the program, a plan the compiler creates statically at compile time. The code explicitly specifies when each operation will be executed, which functional units will do the work, and which registers will hold the operands.

There is a decent overview of EPIC at http://www.linux3d.net/cpu/CPU/epic/.

What I couldn't determine from my reading was whose standard it is and to what degree the IA-64 chip will implement it?

Hey! by Signal+11 · 2000-02-16 12:07 · Score: 3

Itanium, Pentium, Xeon..

Okay, I'm seeing a pattern developing here.. but why not name the chip what it is? I propose a new chip...

Marketanium

Marketanium is a revolutionary new 13th generation Inhell(tm)(r)(c) processor capable of over 30 FudFlops per second. It also has the new MNI (Means Nothing) instruction set and boasts a 1.6 BogoHerz speed....

Bleh. I wish they'd just name them the way they used to: 8088.. 80286..386..486..586... or at least come up with better names for their chips.. like the Sextium!

Re:Not errors, but... by wowbagger · 2000-02-16 20:13 · Score: 2

Actually, all the items you list are caught if you use g++ (i.e. compile as C++, not C), and use -W -Wall -pedantic -ansi. And you still get inline functions.

OK, I'm a C++ "bigot", and I don't understand why anyone would program in C when they have a perfectly good C++ compiler handy. The only times I drop back to plain ol' C is when I'm working on a microcontroller or DSP that has no C++.

And remember: You don't HAVE to use objects/inheritance/polymorphism/RTTI/exceptions/streams/STL just because you threw the C++ switch: You want to pretend you're writing C, go for it.

And to forestall the flames: C++, written by somebody with a clue how to use it and compiled with a good compiler, is every bit as efficient as C. The only times I've seen poor size or speed from C++ was when the person writing it had no clue how to design objects, and rather than sticking with plain C they botched the design. No language, not Java, not Ada, not Pascal, not Modula, can save you from an incompetent programmer.

--
www.eFax.com are spammers

Who's Arkady? by divec · 2000-02-16 12:15 · Score: 2

Who's Arkady? It rings a bell with the name "Darrell" but I'm not sure why!

--

perl -e 'fork||print for split//,"hahahaha"'

Not errors, but... by Straker+Skunk · 2000-02-16 13:00 · Score: 2

It's not that gcc lets through incorrect code, just that SGI's compiler is very good at pointing out quirks that could be potential problems. All warning-land stuff, of course.

Some niceties cc gets that gcc doesn't:

Variables that are assigned a value, but never used in an expression/function call
Static functions that aren't referenced by any other function (gcc does do this for variables, however)
Function arguments that aren't referenced (a bit nitpicky, but it's good to name them foo_unused or whatever)
Mixing of integer and enumerated types without (int) casts (annoying sometimes, 'specially when building GTK+ code, but hey, it doesn't hurt)
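For the last item, a small sketch of the enum/int mixing in question. The `enum color` type and both functions are invented for illustration, and the exact diagnostics any given compiler prints are not shown:

```c
/* Hypothetical enum, for illustration only. */
enum color { RED, GREEN, BLUE };

/* Mixing without casts: legal C, but arithmetic on an enum and the
   implicit conversion back are what a lint-like compiler may flag. */
enum color next_color_implicit(enum color c)
{
    return (c + 1) % 3;
}

/* The same computation with the conversions spelled out. */
enum color next_color_explicit(enum color c)
{
    return (enum color)(((int)c + 1) % 3);
}
```

Both functions behave identically; the casts only document that the mixing is intentional.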

There are others, but alas, I don't have a big hairy codebase under development right now to give more examples :-( Most of the time, it's things that make you say "okay, no big deal, but I really should clean up that bit..."

To gcc's credit, it does do some pretty spiffy control-flow analysis with -O9 ("variable foo may be used before initialization", "statement is unreachable", etc.). But still, I don't consider a program done until I have an excuse for each and every warning given by SGI's compiler.

(Back to gcc, I'll have to try -W along with -Wall... that really turns up the analness? I'd use -ansi, except that it takes out nice things like inline functions and printf'ing long long ints)

--
iSKUNK!

RedHat/Cygnus IA-64 Developer Release README by Richard+Wakefield · 2000-02-16 13:03 · Score: 5

The README for the Linux/ia64 Developer's Release on Cygnus' ftp site (which incidentally is what RedHat's site links to), has some very interesting tidbits:

The entire GNU toolchain has been extended to support IA-64 (this includes binutils, gcc, and gdb).

The compiler generates working code, but does not generate optimized code for the Itanium processor yet. It has some basic optimizations, but no "interesting" optimizations yet.

Binutils is mostly functional, with the exception of shared library support and a few other things.

Gdb has only partial functionality--basic commands work, but most advanced commands are not working.

--
"You can represent this entire problem as a 3x2 matrix"

Re:Wouldn't that be like... by Accipiter · 2000-02-16 23:41 · Score: 2

That's entirely my point.

Sun makes UltraSparc. Intel is complaining because Sun didn't endorse Intel Chips. WHY would Sun advertise for Intel?

So BECAUSE Sun didn't endorse Intel, Intel won't help them port Solaris to Itanium.

-- Give him Head? Be a Beacon?

(If you can't figure out how to E-Mail me, Don't. :P)

Re:It's just VLIW by Anonymous Coward · 2000-02-16 13:21 · Score: 2

It's VLIW, but with a ton of other features thrown in.

My favorites:
1) It allows predication of instructions. Suppose you have an "if...else..." branch. If it is sometimes true, sometimes false, then it will really slow down current processors. EPIC allows you to execute both sides (the "if" section and the "else" section) concurrently, and just throw away the results of the part which turns out to be unneeded. This is a hell of a lot better than conditional moves.

2) Software pipelining. It will automatically unroll and pipeline "for" and "while" loops. This means that you will not get branch misprediction stalls on these loops, and you should get pretty close to the theoretical limit of performance for the chip.

3) The ability to move loads and stores around (advancing loads, speculative memory accesses, etc.). This won't mean anything to most people here, but this is a big deal when you are trying to optimize code.
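To picture item 1, here is what the transformation (usually called if-conversion) looks like when mimicked in C. C has no way to express predicated instructions directly, so this is only an analogy, and the function names are made up:

```c
/* Branchy version: only one side runs, chosen by a conditional jump. */
int abs_diff_branch(int a, int b)
{
    if (a > b)
        return a - b;
    else
        return b - a;
}

/* If-converted version: both sides are computed, and the predicate
   decides which result is kept. No jump is needed in principle. */
int abs_diff_predicated(int a, int b)
{
    int p = (a > b);       /* the predicate */
    int then_val = a - b;  /* "if" side */
    int else_val = b - a;  /* "else" side */

    return p ? then_val : else_val;
}
```

On a predicated machine the two subtractions would each be guarded by the predicate (or its complement) and could issue in the same cycle, with the unneeded result simply discarded.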

None of this stuff is part of VLIW. VLIW just means that you issue groups of instructions at the same time to the decoder, to parallel execution units.

One other point that keeps getting screwed up on Slashdot: most of the newer compilers do profile-guided optimization. (I know that Intel's compiler and Digital's Fortran compiler both do it.) This is the technique where your executable is instrumented by the compiler, which then recompiles the code based on the run-time statistics.
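A hand-rolled sketch of what "instrumented" means here. The counters and function names are hypothetical, and real compilers insert and consume this bookkeeping automatically rather than exposing it in source:

```c
/* Hypothetical counters of the kind an instrumenting compiler might
   attach to a branch. */
static unsigned long taken_count;
static unsigned long not_taken_count;

/* The function being profiled: clamps negative inputs to zero and
   records which way the branch went on each call. */
int clamp_nonneg(int x)
{
    if (x < 0) {
        taken_count++;
        return 0;
    }
    not_taken_count++;
    return x;
}

/* What the recompile step would ask of the collected statistics:
   is the branch taken more often than not? */
int branch_usually_taken(void)
{
    return taken_count > not_taken_count;
}
```

With an answer like this in hand, a second compile can lay out the common path as the fall-through case, or decide whether the branch is worth predicating at all.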

Transmeta announced it as though they were the only ones to have thought of something so simple...just like "code morphing."

Slashdot Mirror
