Slashdot Mirror


A Review of GCC 4.0

ChaoticCoyote writes " I've just posted a short review of GCC 4.0, which compares it against GCC 3.4.3 on Opteron and Pentium 4 systems, using LAME, POV-Ray, the Linux kernel, and SciMark2 as benchmarks. My conclusion: Is GCC 4.0 better than its predecessors? In terms of raw numbers, the answer is a definite "no". I've tried GCC 4.0 on other programs, with similar results to the tests above, and I won't be recompiling my Gentoo systems with GCC 4.0 in the near future. The GCC 3.4 series still has life in it, and the GCC folk have committed to maintaining it. A 3.4.4 update is pending as I write this. That said, no one should expect a "point-oh-point-oh" release to deliver the full potential of a product, particularly when it comes to a software system with the complexity of GCC. Version 4.0.0 is laying a foundation for the future, and should be seen as a technological step forward with new internal architectures and the addition of Fortran 95. If you compile a great deal of C++, you'll want to investigate GCC 4.0. Keep an eye on 4.0. Like a baby, we won't really appreciate its value until it's matured a bit. "

16 of 429 comments (clear)

  1. Expected by Hatta · · Score: 4, Interesting

    It was a long time before GCC 3 got better than 2.95. I expect the same thing will happen here.

    --
    Give me Classic Slashdot or give me death!
    1. Re:Expected by ajs · · Score: 4, Interesting

      I'm not convinced that this test shows that gcc4 is less effecitve than gcc3, though.

      First off, all of the programs tested are programs that use hand-tooled assembly in the most performance-sensitive code. That has to mean that the compiler is moot in those sections.

      A better test would be to compare three things: the hand-optimized assembly under gcc 3 vs the C code (usually there's a configure switch that tells the code to ignore the hand-tuned assembly, and use a C equivalent) under gcc4 vs that same C code under gcc4.

      I think you'd see a surprising result, and if the vectorization code is good enough, you should even see a small boost over the hand-tuned assembly (since ALL of the code is being optimized this way, not just critical sections).

    2. Re:Expected by Rimbo · · Score: 5, Interesting

      I think the author of the article misunderstands just what happened with GCC 4.0.

      The main improvement in GCC 4.0 is implementing Single Static Assignment.

      SSA is not an optimization. It is a simplification. If you can assume SSA, then it opens the door to an entire class of optimizations that can help improve your performance without affecting your code's correctness.

      That last bit -- optimizing code without affecting correctness -- was a big problem in the days before SSA.

      In that regard, SSA is a similar technology to RISC -- it does not speed things up by itself, but it enables speedups for later on.

      The lack of SSA is one thing that kept gcc out of the hands of compiler researchers. Now that it does that, academia can start hacking away with gcc, and the delay you expect is the time between implementing SSA and implementing all of the optimizations that really will improve code performance.

  2. Compilation Speed Test by a KDE developer by Anonymous Coward · · Score: 5, Interesting

    http://www.kdedevelopers.org/node/view/1004

    Qt:
    -O0 -O2
    gcc 3.3.5 23m40 31m38
    gcc 3.4.3 22m47 28m45
    gcc 4.0.0 13m16 19m23

    KDElibs (with --enable-final)
    -O0 -O2
    gcc 3.3.5 14m44 27m28
    gcc 3.4.3 14m49 27m03
    gcc 4.0.0 9m54 23m30

    KDElibs (without --enable-final)
    -O0
    gcc 3.3.5 32m56
    gcc 3.4.3 32m49
    gcc 4.0.0 15m15

    I think KDE and Gentoo people will like GCC 4.0 ;)

  3. Re:The performance of compiled code by pclminion · · Score: 4, Interesting
    You're obviously a small box user. Have you ever worked in the real world where huge batch runs can take weeks?

    Yes.

    You think companies should splash out another million or too on new hardware, just because you use a pissy little machine?

    I think that companies should re-evaluate their "need" for an extra 5% performance. Here's an idea -- if you need something 10 minutes faster, why not start the process 10 minutes sooner?

    5% just gets lost in the noise. You beef up your system, making it 5% faster... And then some retard in production makes a mistake and sets you back six weeks.

  4. Kind of a weird review by Just+Some+Guy · · Score: 4, Interesting
    As far as I'm concerned, unless you're using "-Os" because you're deliberately building small binaries at the expense of all else - say, for embedded development - the resulting binary size is completely irrelevant as a compiler benchmark. What if the smaller result uses a slower, naive algorithm (which in this case would mean choosing an obviously-correct set of opcodes to implement a line of C instead of a less-obvious but faster set)?

    Second, the runtime benchmarks were close enough to be statistically meaningless in most cases. The author concludes with:

    Is GCC 4.0 better than its predecessors?

    In terms of raw numbers, the answer is a definite "no".

    My take would have been "in terms of raw numbers, it's not really any better yet." It's close enough to equal (and slower in few enough cases that I'd be willing to accept them), though, that I'd be willing to switch to it if I could do so without having to modify a lot of incompatible code. It's clearly the way of the future, and as long as it's not worse than the current gold standard, why not?

    --
    Dewey, what part of this looks like authorities should be involved?
  5. Re:I'll tell you what the problem is... by Inkieminstrel · · Score: 3, Interesting

    Gee, I would have just compiled 4.0.0 with 3.4.3, then compiled 4.0.0 again with 4.0.0.

  6. Re:intel compiler by MORB · · Score: 5, Interesting

    Intel compiler's reason why it generate faster code is because it does auto-vectorisation (ie, it automatically finds out how to transform some code patterns to take advantage of native vector operation, such as those provided by sse). They started to implement this in gcc 4.0, but it's a veyr first iteration that for what I know is still kinda limited. I'm not even sure it's enabled by default, even in -O3. There are lots of improvement there targeted at gcc4.1.

  7. Do the new models replace or confuse old ones? by expro · · Score: 3, Interesting

    I agree that this compiler is a cornerstone of free software.

    But it was very frustrating to me to try to port the compiler to a new platform by modifying existing back ends for similar platforms.

    After spending a few months on it (m68k in this case), I could not escape the layers of hack upon cruft upon hack upon cruft, that made it extremely difficult to make even fairly superficial mods because everyone seemed to be using the features differently and all the power seemed lost in hacks that made it impossible to do simple things (for me anyway). I am quite familiar with many assemblers and optimizing compilers.

    I hope that the new work makes a somewhat-clean break with the old, otherwise, I would fear yet another layer to be hacked and interwoven, with the other ones that were so poorly fit to the back ends.

    I suspect that not all backends are the same and perhaps the same experience would not be true for a more-popular target, but it seems to me it shouldn't be that hard to create a model that is more powerful yet more simple. Such would seem to me to be a major step forward and enable much greateer optimization, utilization, maintainability, etc.

  8. -ftree-* by Anonymous Coward · · Score: 5, Interesting

    The whole point of gcc4.0.0 is the tree-ssa thing. The author of this test didn't seem to notice that this stuff doesn't get enabled in -O2 nor -O3, but does have to be enabled by hand. This includes autovectorization (-ftree-vectorice) among other things which may make a difference.

    If I was him, I'd repeat the tests again enabling the -ftree stuff when building with gcc4.0.0.

  9. non-x86 arch? by ChristTrekker · · Score: 3, Interesting

    What about the performance on MIPS? PPC? C'mon, people...enquiring minds want to know!

  10. Re:The performance of compiled code by The+Snowman · · Score: 3, Interesting

    I think that companies should re-evaluate their "need" for an extra 5% performance. Here's an idea -- if you need something 10 minutes faster, why not start the process 10 minutes sooner?

    In any large organization, the process gets in the way. Some suit decides the product needs a new feature, or needs to ship sooner, or whatever, and this slowly trickles down to the developers who suddenly are put in crunch time where every minute counts. Schedules and deadlines may change daily. People's jobs may be at risk. Shit happens.

    Nobody really likes it, but that is sometimes how we arrive at the point where we "need" an extra 5% performance, where we "need" the program to finish ten minutes sooner. Starting earlier is not always an option, usually because you don't know you even have to start *at all* until the last minute.

    --
    24 beers in a case, 24 hours in a day. Coincidence? I think not!
  11. Re:The ? operator by keshto · · Score: 3, Interesting

    Dude, you've never coded in a commercial environment , have you ? Or are all your company's projects meant to be compiled by a specific version of gcc only, regardless of the OS and architecture? I use gcc exclusively these days, but it's for my research. Back when I was working, we had to code for both VC++ and g++ . Atleast, the ones of us who worked on core-engine code. Fixing some moron's VC++ -specific idiocy sucked.

  12. Funny you should say that - a story about sprintf by Szplug · · Score: 3, Interesting

    A prospective professor gave a talk at our school, he'd been working at some grid computing lab. Well their code was underperforming, so they profiled it and found that sprintf was taking 30? 50? 70? % (I forget) of their code time - the machines had to communicate with each other a lot, and they used sprintf to serialize. (It's an easy fix - C++'s stream operators are much faster, since the type is known at compile time.)

    Oh yeah, also, for Quake 1, John Carmack hired Michael Abrash, an assembly language guru, to help out. Well Abrash found that GCC's memcpy() (or whatever it was) was copying byte-by-byte instead of by word (or something, I don't remember) and his reimplementation of that alone, doubled the frame rate!

    Just some interesting counter examples to keep in mind :)

    --
    Someday we'll all be negroes
  13. Observations on Apple's GCC4 release by Paradox · · Score: 4, Interesting

    It isn't a huge deal for most people, but it seems like the new GCC is singificantly better at optimizing for the PowerPC now.

    I've been working with the GNU GSL on my mac a lot, and I recently updated to Tiger. The first thing I noticed when I recompiled the GSL with Apple's modified GCC4.0 is the significant and noticable speed increase. With this intense math stuff, doing SVD on 300x200 matricies, and it's shocking how much faster it is. I went from 3-5 seconds down to less than one.

    I am not going to post any hard numbers because I haven't rigorously compared them yet, but I'll make some formal comparisons this week.

    --
    Slashdot. It's Not For Common Sense
  14. Re:What about... by Space+cowboy · · Score: 3, Interesting

    It would be nice to see some test results for Apple's GCC versions 3 and 4.

    Well, I did have a bunch of results for you, but the CRAPPY LAMENESS FILTER won't let me post them. Apparently I have to use less 'junk' characters (of course the CRAPPY PROGRAMMER didn't define what a 'junk' character is in the error message, so that's NO USE WHAT-SO-EVER.)

    So, I guess I'll summarise. gcc version 4 is slightly worse than 3.3, and slightly better when the tree-vectorize option is passed and altivec code is generated.

    Simon

    --
    Physicists get Hadrons!