Intel's New Compiler Boosts Transmeta's Crusoe
Bram Stolk writes: "Intel recently released its new C++ compiler for Linux.
I've been testing it on my TM5600 Crusoe. Ironically, it turns out that Transmeta's archnemesis, Intel, provides the tools to really unlock Crusoe's full potential on Linux." It doesn't support all of gcc's extensions, so Intel's compiler can't compile the Linux kernel yet, but choice is nice.
Hell, I'll buy Transmeta. Sure, I'll have to go without food for...say...twenty minutes or so, but... =^)
Intel's C++ compiler still compiles code to x86. This is really great, considering that the approx. 28% speedup on Crusoe does not come from native Crusoe code. I wonder how Crusoe will fare once there is a compiler that builds straight to its native instruction set.
For me, Crusoe + icc + GNU/linux is a winning combination.
Well, to me, it's a hasty conclusion. P4 gains 26%, Athlon XP gains 19%, and plain Athlon gains 16%.
--
Error 500: Internal sig error
At first in the writeup it looked as though you were planning on compiling an image, and I thought to myself "Holy crap, self! Can compilers these days make graphics from source code?" Then I realized that you were just compiling the program to make the image. Then I looked at the example, and it looks as though you are (effectively) compiling a graphic. I'm so confused... :o)
Intel's compiler can't compile the Linux kernel yet
Last time I checked, the kernel was in C, not C++.
~wally
Any bets on which of the next versions will spew an error about "incompatible architecture" when used on non-Intel hardware?
I Am My Own Worst Enemy
Wait, the kernel uses GCC extensions? I thought the kernel was written in real C, not that bastard GCC version. I've never looked at kernel code, so I'm not sure. Is this really true?
If it's true, I think that's a huge mistake. The Kernel should not be at the mercy of one compiler.
Sometimes it's best to just let stupid people be stupid.
Seriously, anything that is going to need the optimizations that this new compiler does, should probably be written in ASM anyway. Your 'hello world' and 'count and increment an array' programs are not going to run any faster. Don't bother.
/. is a commercial entity. goto slashdot.com
Uhh, just because you can't recompile the kernel doesn't mean you can't recompile your other programs. You can keep your gcc-compiled (slower) kernel and then recompile your other programs with the Intel compiler (faster). Just because only part of your system speeds up doesn't mean it's useless.
10 PRINT CHR$(205.5+RND(1)); : GOTO 10
What I think is being pointed out is that on-CPU time will be shorter. This has some nice pluses to it, even if the kernel doesn't get the benefit of those optimizations. I would love a faster Apache server, even if my kernel is (relatively speaking) more pokey. :-)
Sam
Say... The Linux kernel, and povray for that matter, ran just fine on Transmeta BEFORE icc came out. You just compiled them with gcc.
;)
The only thing different here is that they run a bit faster!
Exactly how much did you learn about Transmeta before you bought their stock?
Justin
"Why would God give us a waist if we wasn't supposed to rest our pants on it?" - Rev. Roy McDaniels
...because this is the first question everyone asks as soon as they find out Intel's compiler works on Linux. ;-)
I'm not surprised the compiler helped Crusoe. GCC is a remarkable achievement in portability, but architecture-tailored compilers (MSVC, ICC) do better both in terms of code size and speed - like 30% better. But if you're going to PAY for your compiler, it better not be beaten by a free alternative.
I hope we see distros using icc, and I also hope it spurs further development in GCC.
I wonder if Intel's compiler is binary compatible with gcc. While it's probably against the licensing to redistribute the compiler's math or C library, I wonder if you could compile the gnu math/C library with icc and produce a shared object? An optimized math or other system library would give some decent improvement in performance.
Given that Intel makes a lot of its money from selling silicon, why on earth would it develop compiler technology which legitimized the approach of one of its major competitors ?
I can only assume that Intel has some fairly advanced code morphing technology of its own, and has been using the transmeta devices as a testbed.
I can just see it now, a 4GHz pentium with code morphing extensions.
I expect this one will be fought out in the patent arena. IBM and Intel are heavyweight players and I don't see either of them giving any ground willingly.
How is Intel the 'archenemy' of us... just because Linus works at Transmeta? What chip are you running your OS on? I bet it's an Intel chip, or an Intel clone (AMD).
/me is wintel-free, yay Mac
Just a thought: Might this compiler perhaps be different in a way that improves the situation regarding the C++ library relocation issues that bother KDE?
- El riesgo siempre vive - Private J. Vasquez
Why don't they use ANSI C for the kernel?
I'm surprised I haven't seen anyone else post this. Intel's compiler is EXPENSIVE! $499? Since most programmers are not exactly rich (Gates excluded), I think most Linux people are not going to embrace this new compiler.
$500? I paid less than that for my MS compiler!
Erioll
This isn't a particularly startling result. Many of the things an x86 compiler has to optimize for these days are similar across all processors: e.g., regular branch patterns are faster than unpredictable ones; you have very few visible registers; it's helpful to have closely associated data in the same cache lines; you're usually better with the RISCy subsets of the ISA; etc. Intel would have had to go well out of its way to optimize for their own chips and pessimize for others, and I can't see Intel bothering.
That's what I said. My point was that the 28% gain is basically on par with the P4's. The Athlon gains weren't too shabby either. Meanwhile, we understand that current Crusoe performance is pretty dismal compared to the P4 or Athlon. So a difference of 2 percentage points in performance gain doesn't mean that Crusoe performance has been lifted to a new level.
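A quick sketch of that arithmetic in Python, for anyone who wants to check it. The baseline scores below are made up purely for illustration (nobody in this thread posted absolute numbers); only the 28% and 26% gains come from the discussion.

```python
# Relative gains from recompiling with icc, as quoted in this thread.
GAIN = {"Crusoe": 0.28, "P4": 0.26}

# HYPOTHETICAL baseline scores (higher = faster). Invented for illustration,
# not measured on any real hardware.
BASELINE = {"Crusoe": 100.0, "P4": 300.0}


def score_after(chip):
    """Score after applying the chip's relative gain to its baseline."""
    return BASELINE[chip] * (1.0 + GAIN[chip])


def gap(p4_score, crusoe_score):
    """How many times faster the P4 is than the Crusoe."""
    return p4_score / crusoe_score


gap_before = gap(BASELINE["P4"], BASELINE["Crusoe"])
gap_after = gap(score_after("P4"), score_after("Crusoe"))

# Whatever the baselines are, the gap is multiplied by this same factor.
shrink = (1.0 + GAIN["P4"]) / (1.0 + GAIN["Crusoe"])

print(f"gap before: {gap_before:.2f}x, after: {gap_after:.2f}x")
print(f"gap multiplied by {shrink:.3f}")
```

The `shrink` factor depends only on the two quoted gains, so the conclusion does not hinge on the invented baselines: the relative standing of the two chips barely moves.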
If it were compiled to native Crusoe code, we could then see Crusoe's raw power and compare them neck and neck. The story would have been much different.
Note also that I am not a revisionist. I believe the Slashdot community is intelligent enough to figure out what I said.
--
Error 500: Internal sig error
Intel's compiler boosts AMD Athlons too.
AMD uses (or at least, used to use, I haven't checked lately) Intel's compilers for their SPEC runs.
Intel's compiler is the best available for CPUs that implement the x86 ISA. Transmeta implements that ISA, so why does this news surprise people?
GCC has never been an especially performant compiler. On sparc/mips/alpha at least, the vendor compilers are CONSIDERABLY faster than gcc. It really sickens me to see programs which use nonstandard features of C and refuse to compile on anything other than gcc. Perhaps the gcc team should work more on generating more optimised output, and less on adding nonstandard features.
http://spamdecoy.net - free throwaway anonymous email - avoid spam!
There shouldn't be a lot of problems for binary compatibility with C (e.g. glibc, libcurses, X libraries). (Famous last word is "should" so unless someone does some testing and reports the results, take with a grain of salt). For C++, it gets a bit murkier. The Intel page has a section called "Compatibility with the GNU Compilers". They refer to the C++ ABI that was developed for Itanium, which I believe is basically the same ABI as GCC 3.x (it has mangled names which start with _Z). When they say they aren't compatible with g++, I suspect they mean g++ 2.95.x and maybe even 3.0 or 3.0.1, I'm not sure that sentence applies to 3.0.2 or (certain unspecified) future releases of 3.x.
How much time does the CPU spend running kernel code as opposed to user-land app code? Virtually none. Idiot.
Crusoe does cool things because it runtime optimizes the code that it is morphing. If you were to run crusoe code natively, you'd no longer get the optimization benefits, and all you'd be left with is an even slower low-power chip.
Theoretically, you could write a Crusoe-to-Crusoe code morphing module, but that wouldn't buy you anything more than the X86-to-Crusoe morpher.
pooptruck
So, again, until you can actually compile the kernel, it's a fascinating breakthrough, but one with little utility to the real world.
So what you're saying is that the only really useful use of a compiler is to compile the Linux kernel?
That's quite possibly the silliest thing I've heard someone say. Try:
Son: "Look ma, I got the fastest engine in the world for my car! Now I can drive faster than anyone else!"
Ma: "Um, sonny, it can't play MIDI files or make julean fries, so it's totally useless."
Totally wrong. There are thousands of pieces of software out there. The Linux kernel is merely one.
--Dan
(10, Uses Car-as-Computer analogy)
You're benchmarking an intel compiler which will generate optimized intel code, but telling gcc to use "-m386" ?
You have an 80386 machine here secretly? Why not use the optimized flags like "-mcpu=i686 -march=i686" and give a fair comparison?
Am I the only one to see this? C'mon people, wake up, read the manual.
What if, besides caruso, Intel's compiler is actually a BETTER compiler than gcc on Intel hardware? Then we're stuck using gcc for compiling the kernel when something better is, or might some day be, available. Locking the kernel to a compiler is a BAD THING[tm].
The Linux kernel is not only available on Intel chips. It is available on ARMs, DEC Alphas, Sun SPARCs, M68000 machines (like Atari and Amiga), MIPS and PowerPC, as well as IBM mainframes.
Which makes more sense? Targeting a cross-platform compiler like gcc, or targeting individual compilers for each platform Linux runs on?
the macro should be:
Sig? What sig?
The real story here is that the maintainers of GCC ought to look carefully at their optimization code for x86 FPUs.
I'm betting that Intel developers have done their best to make use of the P4 cache. Since Transmeta CPUs do work recompiling programs on the fly they have larger caches (128KB L1 + 512KB L2) than the Athlon (128KB L1 + 256KB L2) and the Pentium 4 (20? KB L1 and 256KB L2). ICC is probably also highly aggressive in implementing SSE and SSE2 instructions. Transmeta CPUs also use VLIW instructions in the core, which are by their nature highly parallel (compared to native x86). Even if the Transmeta chips can't use SSE and SSE2 they may benefit from the parallel-oriented optimizations that ICC probably makes.
On a different note: in a program like POVRay that executes basically the same tight loop of instructions mega-gazillions of times during a scene, the Transmeta chip's software has the opportunity to highly optimize the program. I would like to see the stats on the second and third runs of that rendering to see how much the Transmeta "code morphing" improved the performance. It would be very interesting if the GCC and ICC built POVRays performed at almost the same speed after a few runs. It would obviously be a great proof of the value of Transmeta's design. I for one have always wondered what the code morphing stuff would be able to do if it were able to interface with the operating system and recompile and save the entire system back to the hard disk as it goes through the optimization processes. (I suppose that errors could be highly disastrous.)
That's just my $0.02, and I'm no expert, so I could definitely be wrong.
This is not a signature.
You cannot distribute (shared) objects, binaries, etc. made with ICC. It's purely for you to play with on your own machine; you cannot redistribute anything produced by ICC, regardless of whether it's free or not. This is of course my understanding of ICC's license (I haven't read it in a while, though).
As I type this, I'm downloading Intel's Linux Fortran compiler. While this is slightly off-topic, it will be interesting to see if this free (non-supported version) will compile some code I have that previously relied on Compaq/Digital Fortran's fort26.dll on the Win32 platform (not my code, honest :).
:)
If I can get it to compile on Linux, then I can do a whole host of things my employer previously thought impossible.
Praying for the end of your wide-awake nightmare.
Interesting benchmark of Intel's compiler vs. gcc 2.95.4, but what about gcc 3.0? I'd love to see how that compared, given that I've heard such mixed opinions about whether its optimisation tends to be better, worse, or the same as the 2.95 series.
That thread is from May. In the meanwhile, it seems that almost all the new KDE tree is compilable with the intel compiler (at least based on the cvs logs, I didn't check it myself).
Now, for the expected performance increases. If I am correct, the Intel compiler is the old KAI C++ compiler, which was highly regarded in number crunching circles as the best optimizing, most standards-compliant compiler around.
Still, the spectacular increases occur only in very specific cases which are amenable to optimization. Number crunching (big math computations) is the best example, and this probably applies to mp3 encoding, divx playback and compression, image processing and other stuff like this, too. But for the average, highly heterogeneous code which goes into your typical desktop apps, the increase is significantly smaller.
Lotzi
gcc has gotten so far behind the specialized instruction set curve that you're better off writing hardware descriptions for an FPGA using iverilog than spending $500 to write useful software for a modern instruction set.
Most speed loss is via applications, not the kernel. This compiler can most definitely help out in that area. Mozilla and XFree86 are two that I bet could get a nice boost from it!
Derek Greene
Actually, having read the license, I found the following loophole:
. . . if you buy the compiler, you are allowed to distribute code that you compile with icc ;) Find someone who has paid for icc, and ask them nicely if they would compile something for you. No, it's not open-source, but you can distribute source code along with an optimized binary if you're so inclined.
The gcc "open projects" page gives people a good idea of what remains to be done on gcc. The minutes of the IA-64 GCC summit are especially interesting and informative, because it gives a good idea of the current state of GCC and also what GCC needs to be a competitive compiler in the future.
Bottom line: Do not be surprised when commercial compilers beat gcc performance. It's catching up, but it's still got a long way to go.
GCC Home Page
I wonder, would we see noticeable speed increases if a major Linux distribution (say, Mandrake) were to build all of their binary packages using the Intel compiler? The usefulness of this compiler for the average Linux user seems questionable given that all distros come with a perfectly wonderful compiler (gcc), but a use like this seems like a shoo-in.
Assuming, of course, that you would actually see any speed up. I wonder if any distro maintainers have bought the compiler and are rebuilding their binaries to compare execution speed, load times, and binary size?
wow, the compiler fixes bloated design issues too?
Seriously though, any speedup with any program helps. Given that Mozilla's UI is in XUL and that's where a lot of the sluggish behavior seems to be, has anyone come up with a JIT compiler for XUL?
the good ground has been paved over by suicidal maniacs
More generally, why not ignore the x86 and treat the Transmeta as its own architecture?
I expect the Code Morphing hardware can be used for more than x86 compatibility.
OS X is built with Apple's version of gcc. That always bugged me. I mean, gcc's great and everything, but going from MrC to gcc doesn't sound like a great idea... I can see a number of reasons why they'd want to use gcc, but I don't think performance is one of them :(
Does anyone know if Apple's gcc is pushing ahead in PPC optimizations? IIRC, their gcc's code base is pretty far apart from the main trunk.
Moderators should have to take a reading comprehension test.
Are any of the slashdot "writers" or owners of OSDN investors in Transmeta? Is Transmeta an investor in OSDN? Yes, there is a preference to filter Transmeta stories. But why are there so many PR stories about this company listed on slashdot?
glibc requires gcc - and a relatively recent gcc at that.
So - no, for the same reason you can't compile a Linux kernel with it.
Yet. (I agree with the poster who said "probably by the next version icc will support at least some gcc extensions".)
"How can you claim that you are anti-crack, while still writing a window manager?" — Metacity README
Another one who learned the pronunciation of "Crusoe" from the Gilligan's Island theme song!
I realize you've got a smiley there, but I've got to say: Duh! Who would use/buy a compiler that didn't allow you to distribute your binaries? That would be like using a word processor where you didn't own the work you wrote.
Though it wouldn't surprise me if sooner or later the Microsoft C++ or Word license would claim that any work produced with the tools is property of MS.
thanks,
Michael
Floating point performance doesn't tell much about integer performance and vice versa (remember the Itanium). It is well-known that GCC has got its problems with the stack-based x86 floating point unit (especially pre-3.0 versions; some people claim that 3.x is faster).
Since the kernel doesn't use floating point instructions, it's not such a big loss that you can't compile it with icc yet. In addition, compiling the kernel (which is not written in ISO C, let alone ISO C++) might uncover a few bugs in the kernel code and the compiler, and it's not very likely that the kernel folks are able or even willing to help you if you use a strange system configuration with a proprietary compiler.
I think the most dramatic demonstration of this was a test done by Tom's Hardware last year. He ran a test on a bunch of different processors doing MPEG-4 encoding using FlaskMPEG. The Pentium 4 performed abysmally, coming in behind a Pentium III 1GHz. Intel then decided to download the source code to FlaskMPEG and recompile it with their compiler. This moved the P4 up to the top of the heap, but also increased all the other scores. The P4 1.5GHz got the biggest boost, from 3.83fps to 14.03fps; the PIII 1GHz got a lesser boost, from 4.39fps to 8.03fps. However, the Intel compiler helped out the Athlon 1.2GHz too, boosting it from 6.43fps to 11.14fps. So it even gave their competitor's hardware a speed boost of roughly 73%.
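For anyone who wants to redo the percentages, here is a small Python sketch. The fps figures are the ones quoted above; the dictionary and function names are just my own labels.

```python
# (before, after) frames per second, as quoted from the Tom's Hardware test.
FPS = {
    "P4 1.5GHz": (3.83, 14.03),
    "PIII 1GHz": (4.39, 8.03),
    "Athlon 1.2GHz": (6.43, 11.14),
}


def percent_gain(before, after):
    """Relative speedup of `after` over `before`, in percent."""
    return (after / before - 1.0) * 100.0


gains = {cpu: percent_gain(before, after) for cpu, (before, after) in FPS.items()}

for cpu, gain in gains.items():
    print(f"{cpu}: +{gain:.0f}%")
```

Dividing the recompiled score by the original one, rather than subtracting fps values, keeps the chips comparable even though they start from different baselines.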
Intel's compiler division isn't interested in trying to screw their competitors and make Intel's chips look the best; they are interested in producing the most optimized x86 code possible. Now of course the Intel compiler supports all the special Intel extensions (MMX, SSE, SSE2) and I don't believe it supports things like 3DNow!, but that doesn't mean they are going to screw up their code on purpose to make it run poorly on other chips.
Anybody tried to compile kde with icc ? The pre-linking optimization helps a bit, but even the calculator takes about as long to start up as M$ word (and that's not a joke :( ).
This is an x86-optimised compiler, and since Transmeta emulates x86, it was obvious that Transmeta would also benefit from it. Then it became an "Intel helped Transmeta without knowing it, haha" thing, but I'm sure even the AMD chips' performance improves with this compiler.
The duality weakens
I'm surprised to see the lack of "I recompiled everything using this on my Athlon and my performance increase was XX%". Simply said, it's an optimized x86 compiler and any processor that uses that instruction set should benefit from using it. Intel releasing it for 'free' gives HPTC guys one less hoop to jump through when tweaking their applications. It also adds value to Intel processors in general.
Peter
www.alphalinux.org
I'm not a big Intel fan, but I just have to respond to this. The fact that the Intel compiler is unable to compile the Linux kernel is absolutely not the compiler's fault ... if the code is written against a bunch of weird gcc-specific extensions, that's hardly the compiler's fault.
I am currently working on the firmware-level compiler team at AMD, converting the legacy firmware compiler to a newer firmware base to match the new core. (64-bit, VLIW, etc... The upcoming Unicorn chip, will be released in 2003). I can tell you this much: While I admire the gcc team, the gcc compiler is quite bloated and has a lot of exotic features which do not work well with standard compilers. If the gcc team ever tried to fit gcc into firmware runspace, it would be literally impossible without a complete rewrite.
2DUP * ;
P4/1.7 +26%, P3/866 +23%, Athlon/1.2 +16%, AthlonXP/1.2 +19% (due to SSE). But the boost is somewhat lower if you exclude the subtest 252.eon, which is more than 3 times as fast with icc.
Another interesting test compared scores for icc on Linux vs. Windows on the P4. Linux scores a little lower on average, but two tests show huge differences: 176.gcc on Linux scores 745 vs. 529 on Windows, while for 252.eon it's 406 (L) vs. 745 (W) - gcc only scored 115. You can see that SPEC sub-scores can differ wildly on the same processor even when using the same compiler.
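To put those sub-scores side by side, a rough Python sketch follows. The numbers are the ones quoted above; the labels are mine, and I am assuming the 115 is gcc's 252.eon score on Linux, since the comparison only makes sense that way.

```python
# SPEC sub-scores on the P4, as quoted above (higher is better).
# ASSUMPTION: the gcc figure of 115 was measured on Linux.
SCORES = {
    "176.gcc": {"icc/Linux": 745, "icc/Windows": 529},
    "252.eon": {"icc/Linux": 406, "icc/Windows": 745, "gcc/Linux": 115},
}


def ratio(test, numerator, denominator):
    """Score of one configuration divided by another for the same sub-test."""
    return SCORES[test][numerator] / SCORES[test][denominator]


gcc_linux_vs_win = ratio("176.gcc", "icc/Linux", "icc/Windows")
eon_linux_vs_win = ratio("252.eon", "icc/Linux", "icc/Windows")
eon_icc_vs_gcc = ratio("252.eon", "icc/Linux", "gcc/Linux")

print(f"176.gcc, icc Linux vs Windows: {gcc_linux_vs_win:.2f}x")
print(f"252.eon, icc Linux vs Windows: {eon_linux_vs_win:.2f}x")
print(f"252.eon, icc vs gcc on Linux:  {eon_icc_vs_gcc:.2f}x")
```

The last ratio is the one behind the "more than 3 times as fast" remark about 252.eon, and the first two show the same compiler swinging in opposite directions between operating systems.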
Mixed results for C++: 252.eon is C++, so it's obviously fast, but icc doesn't work with gcc compiled libraries (incl. most graphic toolkits).
One more thing: if you set some switches the wrong way, the resulting code may not work as intended.
Lars T.
To the guy who modded me down from perfect to terrible Karma - Apple haters still suck
You can also look at some rudimentary benchmarks comparing gcc 3.0.1 and Intel C++ 5.0.
All about me
This development may be that step in the right direction that Transmeta, and for that matter Intel and Linux need at this point. I would really hate to see a great company such as Transmeta go by the wayside, because variety is good. Maybe Intel finally realizes that there is life after Windows.
I hate sigs.
I could stand to use Intel's Signal Processing Library on Linux right now.
My understanding is that Intel does have these libraries ready to go for Linux (and has had for at least a year), but for some reason refuses to release them.
Anyone have any clues about this?
In the course of every project, it will become necessary to shoot the scientists and begin production.
You can download it from Intel
Reminder: This compiler includes no support and cannot be used to produce products for resale or commercial use.
And thus produces binaries incompatible with the GNU General Public License, which allows no such restrictions on distributed binaries.
Will I retire or break 10K?
Transmeta is NOT RISC, it is VLIW with a x86 to VLIW optimizing translator.
VLIW means "very long instruction word," and EPIC means "explicit parallel instruction computing," both of which in practice mean "architectures that combine several fixed-length instructions into one word." RISC means "reduced instruction set computing," which in practice means "architectures with fixed-length instructions." All important VLIW/EPIC instruction sets have fixed-length instructions (32-bit in a 256-bit word for TMS320C6K, 32-bit in a 128-bit word for Crusoe, or 41-bit in a 128-bit word for IA64), but MIPS, PPC, and Sparc disprove the converse; therefore, VLIW/EPIC is a subset of RISC.
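The bundle sizes above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the widths are the ones quoted in the comment, and the table and function names are my own.

```python
# (word width in bits, instruction width in bits), as quoted above.
FORMATS = {
    "TMS320C6K": (256, 32),
    "Crusoe": (128, 32),
    "IA64": (128, 41),
}


def slots_and_spare(word_bits, instr_bits):
    """Number of whole instructions per word, and the bits left over."""
    return divmod(word_bits, instr_bits)


layout = {name: slots_and_spare(*widths) for name, widths in FORMATS.items()}

for name, (slots, spare) in layout.items():
    print(f"{name}: {slots} instructions per word, {spare} spare bits")
```

The two 32-bit formats fill their words exactly, while IA64's three 41-bit slots leave 5 bits over, which that architecture uses as the bundle's template field.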
Will I retire or break 10K?
Just because the compiler is from Intel, it does not mean that it always generates better x86 code than gcc. Quite the contrary: there is a lot of real-world C++ code for which g++ 2.95 and g++ 3.0.2 generate significantly better code on Pentium IIIs and Pentium 4s than the Intel compiler does. I am talking about the Intel-compiled code being anywhere between 1.5 and 4 times slower.
Surprisingly that includes floating-point heavy applications, even with SSE2 instructions enabled. You'd expect that the Intel compiler should do particularly well at these, but this is not always the case.
We did some benchmarking and measuring as a consequence of these results. It turned out that Intel's compiler is rather bad at handling typical C++ data and procedural abstractions. g++ is much better at these, and it shows. I don't understand how people can keep harping on how lousy the code that gcc generates is supposed to be. In my experience, it has been quite respectable, especially with gcc 3.0.2.
The bottom line is, as so often: Measure the performance of your C++ programs before deciding whether to compile it with g++ or Intel's compiler.
I agree, design with those two is a big issue...i'd say particularly more so with XFree86...In a way with the networking ideas behind XFree86 it makes sense for the purpose but when it's just on your desktop machine...wtf? Err am I making sense?
Derek Greene
There is a sub-project for Mozilla called Rhino that implements the JavaScript interpreter in Java. It apparently can or did translate JavaScript into Java bytecode that could be processed by the JVM's JIT. According to the history page, it doesn't sound like it worked all that great (leaked memory and the JavaScript->Java translation was slow).
the good ground has been paved over by suicidal maniacs
ICC doesn't even attempt SSE optimizations at the optimization level tested (-xMi; that's PPRO and MMX instructions; you need -xMiKW to get SSE and SSE2 as well). The big wins that gcc could get would come from rewriting the scheduler and register allocator. The difference for gcc probably comes from extra loads and stores, and possibly more code in loop bodies. Function inlining may also play a part, as gcc doesn't do that very well.
You may also be right that gcc doesn't play with the x87 stack very well, but that is likely a minor difference in comparison.
Even Slashdot wants to hide some things
All I see there is directory after directory named things like "Boys In Bondage" and "Hot Rimming Volume 6"
Ah yes, Shakespear's Classical "Gay Boys in Bondage".
"Hot Rimming Volume 7" is about creative pie crust making.
What have you been downloading?
"Face it, a nation that maintains a 72% approval rating on George W. Bush is a nation with a very loose grip on reality.
I tried Intel's C++ compiler on my own floating point heavy plasma simulation program. I tried some very high optimization flags, and that produced a binary which crashed.
Using -O1 produced a binary roughly 1/2 as fast as a -O3 g++-compiled binary.
Perhaps this compiler is a win on C code, but on C++ it sure looks like a dog to me.
Since when does the GPL not allow restrictions on distributing binaries? It only requires the ability to get the source for free.
A deep unwavering belief is a sure sign you're missing something...
I wonder if Intel's compiler is binary compatible with gcc. While it's probably against the licensing to redistribute the compiler's math or C library, I wonder if you could compile the gnu math/C /X library with icc and produce a shared object? An optimized math or other system (X-)library would give some decent improvement in performance.