A Review of GCC 4.0


Posted by Hemos on Monday May 2, 2005 @04:44AM from the only-time-will-tell dept.

ChaoticCoyote writes "I've just posted a short review of GCC 4.0, which compares it against GCC 3.4.3 on Opteron and Pentium 4 systems, using LAME, POV-Ray, the Linux kernel, and SciMark2 as benchmarks. My conclusion: Is GCC 4.0 better than its predecessors? In terms of raw numbers, the answer is a definite "no". I've tried GCC 4.0 on other programs, with similar results to the tests above, and I won't be recompiling my Gentoo systems with GCC 4.0 in the near future. The GCC 3.4 series still has life in it, and the GCC folk have committed to maintaining it. A 3.4.4 update is pending as I write this. That said, no one should expect a "point-oh-point-oh" release to deliver the full potential of a product, particularly when it comes to a software system with the complexity of GCC. Version 4.0.0 is laying a foundation for the future, and should be seen as a technological step forward with new internal architectures and the addition of Fortran 95. If you compile a great deal of C++, you'll want to investigate GCC 4.0. Keep an eye on 4.0. Like a baby, we won't really appreciate its value until it's matured a bit."

29 of 429 comments


Expected by Hatta · 2005-05-02 04:46 · Score: 4, Interesting

It was a long time before GCC 3 got better than 2.95. I expect the same thing will happen here.

--
Give me Classic Slashdot or give me death!
1. Re:Expected by ajs · 2005-05-02 05:17 · Score: 4, Interesting
  
  I'm not convinced that this test shows that gcc4 is less effective than gcc3, though.
  
  First off, all of the programs tested are programs that use hand-tooled assembly in the most performance-sensitive code. That has to mean that the compiler is moot in those sections.
  
  A better test would be to compare three things: the hand-optimized assembly under gcc 3 vs the C code (usually there's a configure switch that tells the code to ignore the hand-tuned assembly, and use a C equivalent) under gcc 3 vs that same C code under gcc 4.
  
  I think you'd see a surprising result, and if the vectorization code is good enough, you should even see a small boost over the hand-tuned assembly (since ALL of the code is being optimized this way, not just critical sections).
2. Re:Expected by Rimbo · 2005-05-02 06:00 · Score: 5, Interesting
  
  I think the author of the article misunderstands just what happened with GCC 4.0.
  
  The main improvement in GCC 4.0 is implementing Static Single Assignment (SSA).
  
  SSA is not an optimization. It is a simplification. If you can assume SSA, then it opens the door to an entire class of optimizations that can help improve your performance without affecting your code's correctness.
  
  That last bit -- optimizing code without affecting correctness -- was a big problem in the days before SSA.
  
  In that regard, SSA is a similar technology to RISC -- it does not speed things up by itself, but it enables speedups for later on.
  
  The lack of SSA is one thing that kept gcc out of the hands of compiler researchers. Now that gcc has it, academia can start hacking away with gcc, and the delay you expect is the time between implementing SSA and implementing all of the optimizations that really will improve code performance.
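To make the SSA idea above concrete, here is a small hand-written sketch. The function names are made up for illustration, and this is ordinary C that mimics the single-assignment form, not GCC's actual internal representation. In the second function every name is assigned exactly once, and the conditional expression stands in for the "phi" merge that SSA uses where two control-flow paths meet.

```c
/* Ordinary form: the variable x is assigned more than once. */
int scale_plain(int a, int b)
{
    int x = a + b;
    if (x > 10)
        x = x * 2;
    else
        x = x - 1;
    return x + a;
}

/* The same computation in SSA style: each name has a single definition.
 * Both branch values are computed eagerly here, which is fine for small
 * inputs but is a simplification of what a compiler would do. */
int scale_ssa(int a, int b)
{
    const int x1 = a + b;
    const int x2 = x1 * 2;                 /* value on the "then" path */
    const int x3 = x1 - 1;                 /* value on the "else" path */
    const int x4 = (x1 > 10) ? x2 : x3;    /* plays the role of phi(x2, x3) */
    return x4 + a;
}
```

Because each name has one definition, questions like "which assignment reaches this use?" become trivial, which is the simplification the comment describes.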
What about... by elid · 2005-05-02 04:47 · Score: 2, Interesting

...Tiger? Wasn't it compiled with GCC 4.0?
1. Re:What about... by Space+cowboy · 2005-05-02 18:23 · Score: 3, Interesting
  
  It would be nice to see some test results for Apple's GCC versions 3 and 4.
  
  Well, I did have a bunch of results for you, but the CRAPPY LAMENESS FILTER won't let me post them. Apparently I have to use less 'junk' characters (of course the CRAPPY PROGRAMMER didn't define what a 'junk' character is in the error message, so that's NO USE WHAT-SO-EVER.)
  
  So, I guess I'll summarise. gcc version 4 is slightly worse than 3.3, and slightly better when the tree-vectorize option is passed and altivec code is generated.
  
  Simon
  
  --
  Physicists get Hadrons!
intel compiler by Anonymous Coward · 2005-05-02 04:49 · Score: 1, Interesting

wasn't one of the goals of 4.0 to become more like intel's prized x86 compiler?
1. Re:intel compiler by MORB · 2005-05-02 05:07 · Score: 5, Interesting
  
  The reason Intel's compiler generates faster code is that it does auto-vectorisation (i.e., it automatically works out how to transform some code patterns to take advantage of native vector operations, such as those provided by SSE). They started to implement this in gcc 4.0, but it's a very first iteration that, as far as I know, is still kinda limited. I'm not even sure it's enabled by default, even at -O3. There are lots of improvements there targeted at gcc 4.1.
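For readers who have not seen what "code patterns" means here, the loop below is the kind of thing an auto-vectorizer looks for: independent iterations over contiguous arrays. It is only a sketch; the function name is an assumption, and whether any particular GCC release actually vectorizes it is not something this thread establishes.

```c
#include <stddef.h>

/* y[i] = a * x[i] + y[i] for i in [0, n).
 * `restrict` promises the compiler that x and y do not overlap, which
 * removes one common obstacle to vectorization. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

One would try it with something like `gcc -O2 -ftree-vectorize -msse2 -c saxpy.c` and compare the generated assembly with and without `-ftree-vectorize`.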
Re:The performance of compiled code by Chirs · 2005-05-02 04:51 · Score: 1, Interesting

At some point you've got the best algorithm, you've profiled, you've hand-optimised, you've got the fastest hardware you can afford....and you *still* need that last 5%.

That's when you spend 10 hours tweaking compiler settings...
The Future? by liam193 · 2005-05-02 04:52 · Score: 2, Interesting

Version 4.0.0 is laying a foundation for the future, and should be seen as a technological step forward with new internal architectures and the addition of Fortran 95.

While I know the benefits of Fortran 95 are a big thing, saying it's a technological step forward to incorporate for the first time a 10 year old standard seems a bit ridiculous. When I first saw this article I had to check my calendar to make sure it was May 1st and not April 1st.
Compilation Speed Test by a KDE developer by Anonymous Coward · 2005-05-02 04:55 · Score: 5, Interesting

http://www.kdedevelopers.org/node/view/1004

Qt:
             -O0      -O2
gcc 3.3.5    23m40    31m38
gcc 3.4.3    22m47    28m45
gcc 4.0.0    13m16    19m23

KDElibs (with --enable-final):
             -O0      -O2
gcc 3.3.5    14m44    27m28
gcc 3.4.3    14m49    27m03
gcc 4.0.0     9m54    23m30

KDElibs (without --enable-final):
             -O0
gcc 3.3.5    32m56
gcc 3.4.3    32m49
gcc 4.0.0    15m15

I think KDE and Gentoo people will like GCC 4.0 ;)
Re:The performance of compiled code by Anonymous Coward · 2005-05-02 04:59 · Score: 1, Interesting

If you really, positively need an extra 5% performance, you might as well just buy a computer that's 5% faster.
Why should they? What problem do you have how someone spends their time? If someone wants to make their system as fast as they can, that's their business.

You're obviously a small box user. Have you ever worked in the real world where huge batch runs can take weeks? You think companies should splash out another million or two on new hardware, just because you use a pissy little machine?
Re:The performance of compiled code by pclminion · 2005-05-02 05:03 · Score: 4, Interesting

You're obviously a small box user. Have you ever worked in the real world where huge batch runs can take weeks?
Yes.

You think companies should splash out another million or two on new hardware, just because you use a pissy little machine?

I think that companies should re-evaluate their "need" for an extra 5% performance. Here's an idea -- if you need something 10 minutes faster, why not start the process 10 minutes sooner?

5% just gets lost in the noise. You beef up your system, making it 5% faster... And then some retard in production makes a mistake and sets you back six weeks.
Kind of a weird review by Just+Some+Guy · 2005-05-02 05:03 · Score: 4, Interesting

As far as I'm concerned, unless you're using "-Os" because you're deliberately building small binaries at the expense of all else - say, for embedded development - the resulting binary size is completely irrelevant as a compiler benchmark. What if the smaller result uses a slower, naive algorithm (which in this case would mean choosing an obviously-correct set of opcodes to implement a line of C instead of a less-obvious but faster set)?
Second, the runtime benchmarks were close enough to be statistically meaningless in most cases. The author concludes with:

Is GCC 4.0 better than its predecessors?
In terms of raw numbers, the answer is a definite "no".

My take would have been "in terms of raw numbers, it's not really any better yet." It's close enough to equal (and slower in few enough cases that I'd be willing to accept them), though, that I'd be willing to switch to it if I could do so without having to modify a lot of incompatible code. It's clearly the way of the future, and as long as it's not worse than the current gold standard, why not?
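One rough way to check the "close enough to be statistically meaningless" point is to time each build several times and see whether the gap between the averages is larger than the run-to-run scatter. The sketch below is an assumption about how one might do that, not what the review did; the function names and the two-standard-deviation threshold are arbitrary choices for illustration.

```c
#include <stddef.h>

/* Arithmetic mean of n samples (n must be > 0). */
double mean(const double *t, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += t[i];
    return sum / (double)n;
}

/* Unbiased sample variance of n samples (n must be > 1). */
double sample_variance(const double *t, size_t n)
{
    const double m = mean(t, n);
    double acc = 0.0;
    for (size_t i = 0; i < n; ++i)
        acc += (t[i] - m) * (t[i] - m);
    return acc / (double)(n - 1);
}

/* Crude noise check: returns 1 when the gap between the two means is
 * no larger than two standard deviations of the noisier series.
 * Works on squared quantities so that no sqrt() call is needed. */
int within_noise(const double *a, const double *b, size_t n)
{
    const double diff = mean(a, n) - mean(b, n);
    const double va = sample_variance(a, n);
    const double vb = sample_variance(b, n);
    const double v = (va > vb) ? va : vb;
    return diff * diff <= 4.0 * v;
}
```

Feed it the wall-clock times from, say, five runs of each binary; if `within_noise` returns 1, the benchmark has not really shown a difference either way.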

--
Dewey, what part of this looks like authorities should be involved?
Re:I'll tell you what the problem is... by Inkieminstrel · 2005-05-02 05:03 · Score: 3, Interesting

Gee, I would have just compiled 4.0.0 with 3.4.3, then compiled 4.0.0 again with 4.0.0.
Do the new models replace or confuse old ones? by expro · 2005-05-02 05:07 · Score: 3, Interesting

I agree that this compiler is a cornerstone of free software.

But it was very frustrating to me to try to port the compiler to a new platform by modifying existing back ends for similar platforms.

After spending a few months on it (m68k in this case), I could not escape the layers of hack upon cruft upon hack upon cruft, that made it extremely difficult to make even fairly superficial mods because everyone seemed to be using the features differently and all the power seemed lost in hacks that made it impossible to do simple things (for me anyway). I am quite familiar with many assemblers and optimizing compilers.

I hope that the new work makes a somewhat-clean break with the old, otherwise, I would fear yet another layer to be hacked and interwoven, with the other ones that were so poorly fit to the back ends.

I suspect that not all backends are the same and perhaps the same experience would not be true for a more-popular target, but it seems to me it shouldn't be that hard to create a model that is more powerful yet more simple. Such would seem to me to be a major step forward and enable much greater optimization, utilization, maintainability, etc.
-ftree-* by Anonymous Coward · 2005-05-02 05:08 · Score: 5, Interesting

The whole point of gcc 4.0.0 is the tree-ssa thing. The author of this test didn't seem to notice that this stuff isn't enabled by -O2 or -O3; it has to be enabled by hand. This includes autovectorization (-ftree-vectorize), among other things that may make a difference.

If I were him, I'd repeat the tests with the -ftree stuff enabled when building with gcc 4.0.0.
when? why? by ChristTrekker · 2005-05-02 05:10 · Score: 2, Interesting

At what point (of 3's evolution) would you say it surpassed 2.95? Why?

--
Constitutionally Correct
non-x86 arch? by ChristTrekker · 2005-05-02 05:12 · Score: 3, Interesting

What about the performance on MIPS? PPC? C'mon, people...enquiring minds want to know!

--
Constitutionally Correct
Re:The performance of compiled code by The+Snowman · 2005-05-02 05:13 · Score: 3, Interesting

I think that companies should re-evaluate their "need" for an extra 5% performance. Here's an idea -- if you need something 10 minutes faster, why not start the process 10 minutes sooner?

In any large organization, the process gets in the way. Some suit decides the product needs a new feature, or needs to ship sooner, or whatever, and this slowly trickles down to the developers who suddenly are put in crunch time where every minute counts. Schedules and deadlines may change daily. People's jobs may be at risk. Shit happens.

Nobody really likes it, but that is sometimes how we arrive at the point where we "need" an extra 5% performance, where we "need" the program to finish ten minutes sooner. Starting earlier is not always an option, usually because you don't know you even have to start *at all* until the last minute.

--
24 beers in a case, 24 hours in a day. Coincidence? I think not!
Re:What the hell? by Kupek · 2005-05-02 05:16 · Score: 2, Interesting

One of the major changes in the 4.0.0 release is the internal reorganization that allows for more aggressive optimizations. Hence, he tested the optimized performance of the latest 3.x versus 4.0.0. How do you tell the compiler to optimize? Well, you have to pass it "lots of flags."
Re:The ? operator by keshto · 2005-05-02 05:19 · Score: 3, Interesting

Dude, you've never coded in a commercial environment, have you? Or are all your company's projects meant to be compiled by a specific version of gcc only, regardless of the OS and architecture? I use gcc exclusively these days, but it's for my research. Back when I was working, we had to code for both VC++ and g++. At least, the ones of us who worked on core-engine code did. Fixing some moron's VC++-specific idiocy sucked.
Re:I'll tell you what the problem is... by confused+one · 2005-05-02 05:19 · Score: 2, Interesting

I believe the Linux from Scratch (LFS) folks have found you have to repeat this three (3) times to have, what is effectively, a clean 4.0.0 compile.
Funny you should say that - a story about sprintf by Szplug · 2005-05-02 06:11 · Score: 3, Interesting

A prospective professor gave a talk at our school; he'd been working at some grid computing lab. Well, their code was underperforming, so they profiled it and found that sprintf was taking 30? 50? 70? % (I forget) of their code time - the machines had to communicate with each other a lot, and they used sprintf to serialize. (It's an easy fix - C++'s stream operators are much faster, since the type is known at compile time.)
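The "type is known at compile time" argument can be shown in plain C as well. This is a sketch with made-up function names, not the lab's code: the first routine goes through a format string that has to be interpreted on every call, the second is written for one type only and skips that step. Whether the difference matters in a given program is something only a profile can say.

```c
#include <stdio.h>
#include <stddef.h>

/* Generic route: snprintf must parse the format string at run time. */
int format_generic(char *buf, size_t size, unsigned value)
{
    return snprintf(buf, size, "%u", value);
}

/* Type-specific route: the caller already knows the value is an
 * unsigned int, so no format string is interpreted.
 * Returns the number of characters written (excluding the NUL),
 * or -1 if the buffer is too small. */
int format_unsigned(char *buf, size_t size, unsigned value)
{
    char tmp[3 * sizeof(unsigned) + 1];    /* enough digits for any unsigned */
    size_t len = 0;

    do {                                   /* digits come out in reverse */
        tmp[len++] = (char)('0' + value % 10u);
        value /= 10u;
    } while (value != 0u);

    if (len + 1 > size)
        return -1;
    for (size_t i = 0; i < len; ++i)       /* reverse into the output */
        buf[i] = tmp[len - 1 - i];
    buf[len] = '\0';
    return (int)len;
}
```

Both routines produce the same text for the same input; the second simply does less work per call.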

Oh yeah, also, for Quake 1, John Carmack hired Michael Abrash, an assembly language guru, to help out. Well, Abrash found that GCC's memcpy() (or whatever it was) was copying byte-by-byte instead of by word (or something, I don't remember), and his reimplementation of that alone doubled the frame rate!
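The byte-versus-word distinction in that anecdote looks roughly like this. It is a generic sketch under assumptions (non-overlapping buffers, hypothetical function names) and makes no claim about what the Quake code or any GCC library actually did.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Naive copy: one byte per loop iteration. */
void copy_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
}

/* Word-sized copy: moves sizeof(uint64_t) bytes per iteration and
 * finishes the tail byte by byte. The fixed-size memcpy calls are the
 * portable way to express an unaligned 8-byte load/store in C; optimizing
 * compilers typically turn each into a single move instruction. */
void copy_words(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n >= sizeof(uint64_t)) {
        uint64_t w;
        memcpy(&w, s, sizeof w);
        memcpy(d, &w, sizeof w);
        d += sizeof w;
        s += sizeof w;
        n -= sizeof w;
    }
    copy_bytes(d, s, n);
}
```

The second version runs the loop body about one eighth as many times, which is where the speedup in stories like this one comes from.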

Just some interesting counter examples to keep in mind :)

--
Someday we'll all be negroes
Observations on Apple's GCC4 release by Paradox · 2005-05-02 06:30 · Score: 4, Interesting

It isn't a huge deal for most people, but it seems like the new GCC is significantly better at optimizing for the PowerPC now.

I've been working with the GNU GSL on my mac a lot, and I recently updated to Tiger. The first thing I noticed when I recompiled the GSL with Apple's modified GCC 4.0 is the significant and noticeable speed increase. With this intense math stuff, doing SVD on 300x200 matrices, it's shocking how much faster it is. I went from 3-5 seconds down to less than one.

I am not going to post any hard numbers because I haven't rigorously compared them yet, but I'll make some formal comparisons this week.

--
Slashdot. It's Not For Common Sense
No, the third run is for finding bugs by Bananenrepublik · 2005-05-02 06:44 · Score: 2, Interesting

If your compiler compiles correctly, a program (leaving floating point inaccuracies aside) should produce the same result no matter what compiler it is compiled with. I.e. a gcc 4.0 should produce the same results no matter if it's itself compiled with 3.4.3 or 4.0.
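The check being described boils down to "build the compiler with itself twice and confirm the two results are byte-for-byte the same". A minimal sketch of such a comparison is below; the function name is an assumption, and a real bootstrap comparison may need to skip things like embedded timestamps, which this does not attempt.

```c
#include <stdio.h>

/* Returns 1 if both streams yield exactly the same bytes from their
 * current positions to end-of-file, 0 otherwise. A read error is
 * treated the same as end-of-file in this sketch. */
int streams_identical(FILE *a, FILE *b)
{
    int ca, cb;
    do {
        ca = fgetc(a);
        cb = fgetc(b);
        if (ca != cb)
            return 0;
    } while (ca != EOF);
    return 1;
}
```

Open the stage-2 and stage-3 object files in binary mode and pass both streams in; a 0 result means the compiler produced different code depending on which compiler built it, which points at a bug.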
Still generating 386 assembly? by heroine · 2005-05-02 06:44 · Score: 1, Interesting

It seems aside from extremely rare cases, gcc still generates exclusively 386 assembly. The commercial compilers have long since migrated into MMX, MMX2, SSE, SSE2, 3DNow instructions. The new versions of Windows software are all compiled with modern instructions but GCC and all of Linux is still using old 386 instructions.

If the amount of energy being spent on redesigning the kernel architecture, redesigning the compiler architecture, and redesigning the command usages was spent supporting new instruction sets, it could probably catch up to MSVC from 2000.

It's sort of sad that instead of improving the computer's ability to perform a certain amount of work in a certain amount of time, all the energy in GCC has always gone towards the study of compiler design itself.
1. Re:Still generating 386 assembly? by Anonymous Coward · 2005-05-02 07:13 · Score: 1, Interesting
  
  Errrr....
  -march=pentium4
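To see what a flag like that changes, one can ask the compiler which instruction-set extensions it believes it may use. The sketch below relies on GCC's predefined macros `__MMX__`, `__SSE__` and `__SSE2__`; the function name is made up, and it only distinguishes those three levels.

```c
/* Reports which of MMX, SSE and SSE2 the compiler was told it may use,
 * judged by GCC's predefined macros (highest of the three wins). */
const char *target_simd_level(void)
{
#if defined(__SSE2__)
    return "SSE2";
#elif defined(__SSE__)
    return "SSE";
#elif defined(__MMX__)
    return "MMX";
#else
    return "baseline (no MMX/SSE macros defined)";
#endif
}
```

Compiling the same file for 32-bit x86 with and without `-march=pentium4` and printing the result shows whether the newer instructions are on the table for that build.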
Re:kettle? black? by Grishnakh · 2005-05-02 06:56 · Score: 2, Interesting

The other argument that has been missed being... GCC is not charging you money to upgrade/use 4.0. When you pay a bunch of money, expectations of performance, reliability, and speedy bug-fixes rise.

Exactly. The goals in releasing software are completely different for GCC and MS.

For GCC, like a lot of open-source products, the idea behind releasing all-new x.0 versions is to get it out there so early adopters will start using it, filing bug reports, etc. It's the same reason the Linux kernel releases new even-numbered versions (2.x.0) before they're really ready for mainstream use. If they waited too long, people would avoid them, thinking they're "just development versions", and it'd take forever to get the bugs out. Unlike with commercial software, you need to know what you're doing when you use these open-source products directly, rather than using a packaged distribution. If you want a stable system, don't download GCC or Linux directly and compile from sources. Get the version that comes with your distribution.

MS, on the other hand, wants to get money and marketshare when they release all-new versions of their software. When they release a new version, they do so with the implication that this product is ready for general use, and that all the bugs are worked out (after all, you're paying a lot of money for it, so shouldn't the debugging be part of the price?). This is reinforced by the fact that most updates are not free.

If I buy a car, I expect it to work reliably, and not break down within the warranty term, as long as I perform the required maintenance. If a wheel falls off as soon as I drive out of the dealer's lot, they have to fix it at their own expense. I might even get a new car under the Lemon Law if this happens multiple times. What's more, if the car has any serious defects, these are usually fixed for free under factory recalls.

But when I buy MS software, none of this applies. There is no warranty whatsoever. If it doesn't work, there is no recourse. And if they release a new version that fixes a lot of problems, there's no guarantee it will be free. Did all the Windows ME buyers get a new version of Windows to fix that disaster of an OS? No, they had to upgrade at their own expense.

The bottom line is that the product MS releases is a shrink-wrapped product that is supposedly intended for direct use by the general public, which implies a certain level of fitness. GCC, on the other hand, does not release its product directly to the general public (though they're certainly free to download it and try it out if they wish). Its product is intended for use by software developers (who want to try out the latest, and possibly buggy compiler) and distributions. People who just want to surf the net or write documents should not be concerned with this, nor should developers who want to produce stable code. These people should all be simply using precompiled distributions, and using the software versions provided by them.
Re:Fast KDE compile. by Anonymous Coward · 2005-05-02 08:59 · Score: 1, Interesting

For those of you who miss the humor in Andrew Pinski's reply, some background:

GCC's optimization passes try to be as machine independent as possible. They, of course, handle machine specific things, like allocating and spilling registers and so on. But sometimes some platforms have some stupid rules for what registers and instructions to be used when (instruction XXX sets a flag, so instruction YYY has to be issued afterward to use that flag before it gets overwritten, or maybe register foo had been spilled so it has to be reloaded before ZZZ, or this value always has to be in register A, etc). A lot of this code ends up in a function called reload, in a file by the same name.

Over the years reload has grown, and grown, and grown. No developer understands all of it. A goal of many people is to improve the register allocator and other passes to make it so reload can be disabled on most platforms. The thing is, well, not quite buggy, but fragile and causes many bugs when things change.

So when Pinski says "Reload, wow, not that unexcepted really." he means it. There are also other comments in that thread, like "Though it is still a reload patch, and all reload patches are dangerous."

From what I understand from the rest of the thread, reload needs to make two hardware registers the same for an instruction, but ends up using up a virtual register it shouldn't, which later confuses the rest of the compiler since there are still notes to not use the virtual register.

The bug has been fixed in CVS. Patch is in the attached web page, both for khtml to use with 4.0.0 and for GCC if you roll your own and want to compile khtml.