Slashdot Mirror


A Review of GCC 4.0

ChaoticCoyote writes " I've just posted a short review of GCC 4.0, which compares it against GCC 3.4.3 on Opteron and Pentium 4 systems, using LAME, POV-Ray, the Linux kernel, and SciMark2 as benchmarks. My conclusion: Is GCC 4.0 better than its predecessors? In terms of raw numbers, the answer is a definite "no". I've tried GCC 4.0 on other programs, with similar results to the tests above, and I won't be recompiling my Gentoo systems with GCC 4.0 in the near future. The GCC 3.4 series still has life in it, and the GCC folk have committed to maintaining it. A 3.4.4 update is pending as I write this. That said, no one should expect a "point-oh-point-oh" release to deliver the full potential of a product, particularly when it comes to a software system with the complexity of GCC. Version 4.0.0 is laying a foundation for the future, and should be seen as a technological step forward with new internal architectures and the addition of Fortran 95. If you compile a great deal of C++, you'll want to investigate GCC 4.0. Keep an eye on 4.0. Like a baby, we won't really appreciate its value until it's matured a bit. "

40 of 429 comments

  1. The performance of compiled code by pclminion · · Score: 5, Informative
    This has always bugged me.

    Some people spend 10 hours tweaking compiler settings and optimizations to get an extra 5% performance from their code.

    Other people spend 2 hours selecting the proper algorithm in the first place and get an extra 500% performance from their code.

    To semi-quote The Matrix: One of these endeavors... is intelligent. And one of them is not.
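    To make the point concrete, here is a hypothetical C sketch (not from the thread) that checks an array for duplicate values. The first function compares every pair, which is O(n^2). The second sorts the array and then only compares neighbours, which is O(n log n) with a typical qsort. No optimization flag will turn the first into the second, and for large n the algorithmic change dwarfs anything -O3 can do. The function names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* O(n^2): compare every pair. Compiler flags can only shave constant factors. */
bool has_duplicate_naive(const int *a, size_t n)
{
    assert(a != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            if (a[i] == a[j])
                return true;
    return false;
}

/* qsort comparator for ints; avoids the overflow risk of returning x - y. */
static int cmp_int(const void *p, const void *q)
{
    int x = *(const int *)p;
    int y = *(const int *)q;
    return (x > y) - (x < y);
}

/* O(n log n): sort the array IN PLACE, then duplicates must be adjacent. */
bool has_duplicate_sorted(int *a, size_t n)
{
    assert(a != NULL || n == 0);
    if (n < 2)
        return false;
    qsort(a, n, sizeof *a, cmp_int);
    for (size_t i = 1; i < n; i++)
        if (a[i - 1] == a[i])
            return true;
    return false;
}
```

Note that the faster version reorders its input; copy the array first if the original order matters.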

  2. email by Anonymous Coward · · Score: 1, Informative

    I tried to email you about your "thethere" mistake, but you don't want to talk to people apparently. Not the most important of corrections maybe, but anyway...

    scott.ladd@coyotegulch.com
    SMTP error from remote mailer after RCPT TO::
    host smtp.secureserver.net [64.202.166.12]:
    553 217.209.223.* mail rejected due to excessive spam

  3. Fast KDE compile. by Anonymous Coward · · Score: 4, Informative

    It's damn fast at compiling KDE, as someone tested.

    1. Re:Fast KDE compile. by badfish99 · · Score: 3, Informative
      Well, the article you link to starts with the words:
      KDE sources now blacklist gcc 4.0.0 because it miscompiles KDE
      It must be easy to compile fast if you don't mind getting the wrong answer.
    2. Re:Fast KDE compile. by AArmadillo · · Score: 4, Informative

      http://gcc.gnu.org/bugzilla/show_bug.cgi?id=20973
      GCC bug report
      http://lists.kde.org/?l=kde-cvs&m=111451142117674&w=2
      KDE CVS report

      It involves some problem with register allocation. It seems to miscompile only KHTML, and there is already a patch attached to the GCC bug report (although the patch just disables the optimization that is causing the problem, rather than fixing the core problem itself).

  4. Re:What about... by scotlewis · · Score: 5, Informative

    Yes and no. The default compiler is GCC4; however, the kernel and much of the OS (pretty much all of Darwin, in fact) are still compiled with GCC3 because they haven't completely cleared the codebase of GCC3-isms.

    That said, remember that the submitter is talking about GCC4 on x86 platforms, and remember that Apple is putting a lot of work into making sure the PowerPC optimizations are as good as possible. Not to mention things like GCC4's auto-vectorization of code to take advantage of the AltiVec unit (which has a more noticeable effect than MMXing x86 code).

    It would be nice to see some test results for Apple's GCC versions 3 and 4.

  5. Re:kettle? black? by Dan+Berlin · · Score: 5, Informative

    Significant difference. If you ask gcc folk (like me), we'd happily tell you that 4.0 will probably be, performance-wise, a win in some cases and a loss in others. Anytime you add large numbers of optimizations, it takes a while to tune everything else so that we get good generated code. 4.0 is more a test of the new optimizers than something that is supposed to produce spectacular results in all cases.

  6. Re:Screenshots? by wawannem · · Score: 2, Informative

    damn... I meant:

    Here you go:

    bash$ gcc -o test main.c
    bash$

  7. Re:I'll tell you what the problem is... by Anonymous Coward · · Score: 5, Informative

    This was meant as a joke, but for those who took this too seriously: if you have ever tried building GCC yourself, you should know that it always recompiles itself.

    A gcc "stage 1" build is gcc compiled with your old compiler. The "stage 2" build is gcc compiled with the compiler created in the previous stage. This is the one that gets installed. The "stage 3" build is optional and verifies that the "stage 2" compiler creates the same output as the previous one.
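    The stage 2 / stage 3 check amounts to a byte-for-byte comparison of the object files each compiler produced. Below is a minimal C sketch of that comparison, assuming the two outputs are already open as streams. GCC's own build does this with makefile and shell rules, not with a C function like the hypothetical `streams_identical` shown here.

```c
#include <assert.h>
#include <stdio.h>

/* Returns 1 if both streams yield exactly the same bytes from their current
 * positions up to EOF, 0 otherwise. fgetc() also returns EOF on a read error,
 * so real code should check ferror() on both streams afterwards. */
int streams_identical(FILE *a, FILE *b)
{
    assert(a != NULL && b != NULL);
    for (;;) {
        int ca = fgetc(a);
        int cb = fgetc(b);
        if (ca != cb)
            return 0;   /* a differing byte, or one stream ended early */
        if (ca == EOF)
            return 1;   /* both streams ended at the same point */
    }
}
```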

  8. Re:Kind of a weird review by larien · · Score: 2, Informative

    Smaller binaries = quicker load time (less disk I/O or memory being moved around) and smaller memory footprint. Yes, this is mostly in embedded apps where memory sizes might still be in KB rather than GB, but if you're analyzing performance, memory usage is relevant, even if it may not be your primary concern.

  9. This is expected, I think by diegocgteleline.es · · Score: 5, Informative

    I found this in the OSNews announcement:

    "Before we get a bunch of complaints about the fact that most binaries generated by GCC 4.0 are only marginally faster (and some a bit slower) than those compiled with 3.4, let me point out a few things that I've gathered from casually browsing the GCC development lists. I'm neither a GCC contributor nor a compiler expert.

    Prior to GCC 4.0, the implementation of optimizations was mostly language-specific; there was little or no integration of optimization techniques across all languages. The main goal of the 4.0 release is to roll out a new, unified optimization framework (Tree-SSA), and to begin converting the old, fragmented optimization strategies to the unified framework.

    Major improvements to the quality of the generated code aren't expected to arrive until later versions, when GCC contributors will have had a chance to really begin to leverage the new optimization infrastructure instead of just migrating to it.

    So, although GCC 4.0 brings fairly dramatic benefits to compilation speed, the speed of generated binaries isn't expected to be markedly better than 3.4; that latter speedup isn't expected until later installments in the 4.x series."

    1. Re:This is expected, I think by Ian+Lance+Taylor · · Score: 2, Informative

      That is only partly true. All gcc releases since 1.0 have integrated optimizations across all languages. What gcc 4.0 has added is a new, higher-level framework for language independent optimizations. The new framework, known as tree-ssa, permits more powerful optimization techniques, in particular some of the techniques which have been developed by the compiler research community since gcc was first written in the 1980s. The old language independent optimization framework, known as RTL, is still there and is still used in gcc 4.0.
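      For readers unfamiliar with the term, SSA (static single assignment) form means every variable is assigned exactly once, with "phi" functions choosing between versions where control flow merges. The pair of functions below is a hand-written illustration of that idea in C, not actual GCC output; the numbered suffixes only loosely mimic the naming style of GCC's tree dumps, and the function names are hypothetical.

```c
#include <assert.h>

/* Ordinary C: x is reassigned along some paths. */
int clamp_scale(int x, int limit)
{
    assert(limit >= 0);
    if (x > limit)
        x = limit;
    x = x * 2;
    return x;
}

/* The same computation in SSA style: each name is assigned exactly once,
 * and the merge after the if-statement becomes a phi that selects the
 * version from whichever path was taken (modelled here with ?:). */
int clamp_scale_ssa(int x_0, int limit_0)
{
    assert(limit_0 >= 0);
    int cond_1 = x_0 > limit_0;
    int x_1 = limit_0;              /* value produced on the "then" path */
    int x_2 = cond_1 ? x_1 : x_0;   /* x_2 = PHI <x_1, x_0>             */
    int x_3 = x_2 * 2;
    return x_3;
}
```

Because every value has a single definition, an optimizer can see directly where each use comes from, which is what makes many of the newer techniques practical.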

  10. Re:I'll tell you what the problem is... by Anonymous Coward · · Score: 1, Informative

    Nice try. When you build GCC you do a "make bootstrap", which does things in multiple stages:
    1- Compiles 4.0 with the system compiler (3.4.3) and no optimizations
    2- Compiles 4.0 with the stage 1 compiler, full optimization
    3- Compiles 4.0 with the stage 2 compiler, full optimization
    4- Checks that stages 2 and 3 produced the same code. The result of stage 3 is the final compiler.

    (I might be missing a stage there)

    Modern GCCs even have a bootstrap target that adds an extra stage where GCC is profiled, to see which branches are taken more often, and the results are fed back into the next stage so the compiler is optimized for real world usage. Nice stuff really.

  11. It really does depend on the code by DoofusOfDeath · · Score: 4, Informative

    There was one test case I did for my own use. I've got a small C++ program that's computationally heavy and has a small working set of memory.

    On that program (on a P4) I got an 11% reduction in runtime using GCC 4 vs. GCC 3.3.5. This was actually a big deal for my work.

    The lesson here: your mileage with GCC 4.0's improvements may vary from the benchmarks, and you might want to try it on your own code.
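    A minimal harness for trying a compiler on your own code: build the same file once with each compiler at the same optimization level and compare the reported times. The `kernel` function is a hypothetical stand-in for your hot loop. `clock()` measures processor time, which is adequate for single-threaded, CPU-bound code like the program described above.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for "your own code": a sum of squares. */
double kernel(const double *v, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += v[i] * v[i];
    return s;
}

/* Runs the kernel `reps` times and returns the smallest CPU time observed,
 * in seconds. Taking the minimum filters out some scheduling noise.
 * The kernel's result is stored through `result` so the work is not
 * optimized away. */
double best_time(const double *v, size_t n, int reps, double *result)
{
    assert(reps > 0 && result != NULL);
    double best = -1.0;
    for (int r = 0; r < reps; r++) {
        clock_t t0 = clock();
        *result = kernel(v, n);
        clock_t t1 = clock();
        double secs = (double)(t1 - t0) / CLOCKS_PER_SEC;
        if (best < 0.0 || secs < best)
            best = secs;
    }
    return best;
}
```

Keep the input size large enough that each run takes well over the resolution of `clock()`, otherwise the comparison measures noise.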

  12. Re:The Future? by Lemming+Mark · · Score: 2, Informative

    It's not just the addition of F95, which obviously only benefits Fortran users.

    GCC4 has a new optimisation architecture called "Tree SSA". This introduces a new representation (well, actually two: GENERIC and GIMPLE, although the latter is a subset of the former) for programs under compilation. The GIMPLE representation is used for advanced high-level optimisations before feeding the code into the compiler "back end" for architecture-specific optimisation and code generation.
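    As a rough illustration of what lowering to GIMPLE means, the second function below rewrites the first by hand in a GIMPLE-like style: explicit temporaries, one operation per statement, and the conditional expression turned into plain conditional jumps. This is an approximation written in C for readability, not real GCC dump output, and both function names are hypothetical.

```c
#include <assert.h>

/* Source form: the control flow is hidden inside a conditional expression. */
int abs_diff(int a, int b)
{
    assert(a >= 0 && b >= 0);   /* keeps a - b and b - a from overflowing */
    return a > b ? a - b : b - a;
}

/* Hand-written GIMPLE-like lowering of the same function: at most one
 * operation per statement, a compiler-style temporary, and explicit jumps. */
int abs_diff_lowered(int a, int b)
{
    int t1;
    assert(a >= 0 && b >= 0);
    if (a > b) goto then_branch; else goto else_branch;
then_branch:
    t1 = a - b;
    goto done;
else_branch:
    t1 = b - a;
done:
    return t1;
}
```

This flat, regular shape is what lets language-independent passes work the same way regardless of whether the code started as C, C++, or Fortran.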

    The advantages of Tree SSA are multiple:
    * cleaner architecture
    * allows high level optimisations that were previously hard or impossible to do at the RTL (Register Transfer Language) level used by the backends
    * despite being "high level" many optimisations that take advantage of program structure can be made language independent because of the GENERIC / GIMPLE representation

    However, it'll take a while for new optimisations that have been enabled by this framework to be written. The idea is that Tree SSA breaks a fundamental barrier to the continued improvement of optimisation in GCC and should yield gains in years to come.

    There are some other nifty things in GCC4 like the "mudflap" system for detecting program errors. Enhanced type-checking fussiness is also welcome as far as I'm concerned, even if it results in some compile errors.

  13. Re:Speed/Performance Benchmarks?? by tomstdenis · · Score: 2, Informative

    Not always. Usually Intel's CC produces faster code except when you give it code [LibTomMath] that it can't efficiently optimize.

    Also, as he hates to have pointed out, his options aren't always optimal.

    Quite a few applications are faster with 3.4.3 on a P4 with "-fno-regmove" as well as -O3. My AES for instance goes down from >500 cycles/block to 380 cycles/block on my Prescott P4 with this switch.

    380 cycles/block is faster than Intel CC v8.0 with "-O3 -xP -ip" by about 30 cycles/block.

    Also the guy probably didn't try profiling. I can drop a fair chunk of cycles in doing ECC point multiplies on my P4 with GCC by doing a profiled build system. ...

    ETC!

    Tom

    --
    Someday, I'll have a real sig.
  14. Re:Screenshots? by Anonymous Coward · · Score: 1, Informative

    One of my small joys in life is reading slashdot and seeing people who just didn't get the joke. :)
    -K

  15. 4.0.0 broke backward compatibility big time by cpghost · · Score: 4, Informative

    Recently, a discussion took place on a FreeBSD mailing list about whether the project wanted to use GCC 4.0.0 as the system compiler. Some objections were:

    • KDE would not compile cleanly
    • Most of the 12,000+ ports would need manual tweaking because of other incompatibilities.
    • Some C constructs have been obsoleted, requiring huge sweeps over the existing BSD code base.
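    The comment doesn't say which constructs were dropped. One widely cited example, offered here as an illustration rather than as the specific BSD problem, is GCC's old cast-as-lvalue extension, deprecated during the 3.x series and removed in 4.0, which let code write `((char *)p) += n;`. A portable replacement, sketched with a hypothetical helper name:

```c
#include <assert.h>
#include <stddef.h>

/* Advances an untyped pointer by `nbytes` bytes and returns the result.
 * Old code often relied on the cast-as-lvalue extension:
 *     ((char *)p) += nbytes;
 * Standard C requires doing the arithmetic on a char pointer and
 * assigning (or returning) the result instead. */
void *advance_bytes(void *p, size_t nbytes)
{
    assert(p != NULL);
    char *c = p;    /* void * converts to char * implicitly in C */
    c += nbytes;
    return c;
}
```

At a call site the old one-liner becomes `p = advance_bytes(p, n);`, or simply `p = (char *)p + n;`.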

    If I understood it right, we won't have a GCC 4.0.0 system compiler on FreeBSD anytime soon. Installing the gcc40 port is, of course, always possible.

    --
    cpghost at Cordula's Web.
    1. Re:4.0.0 broke backward compatibility big time by mandolin · · Score: 3, Informative
      If you actually follow the link, it looks like gcc4 was miscompiling bits of kde. Here is an example.

      "not compiling cleanly" may have been a less-accurate description of the problem.

    2. Re:4.0.0 broke backward compatibility big time by JoeBuck · · Score: 4, Informative
      In the case of the KDE problem, there were two bugs, one of which was in KDE, but the other of which was in 4.0.0. So it sometimes is appropriate to blame the compiler. That bug is fixed in CVS and will be fixed in 4.0.1.

      It should not surprise anyone that the first 0.0 release has some bugs. It's the first release of a compiler with a completely new optimization structure (tree-ssa). I would advise waiting for 4.0.1 for a production-quality release, or going with vendor patches (by shipping Fedora Core 4 with a 4.0.0-based compiler, Red Hat will probably shake out a few more bugs).

  16. Re:The Future? by Anonymous Coward · · Score: 1, Informative

    The last C standard was finalized in 1999 (that's what C99 means). The last (and only) C++ standard is from 1998. There was a later technical revision to C++, published in 2003.

    SH

  17. Re:I'll tell you what the problem is... by rsidd · · Score: 4, Informative

    The GP was a joke, but since you're serious, this is exactly what the "bootstrapping" build of gcc does: it builds a stage 1 build with the system compiler, then a stage 2 build with the stage 1 build, then -- if you want -- a stage 3 build with the stage 2 build, and verifies that the stage 2 and stage 3 builds are the same.

  18. Re:-ftree-* by Florian+Weimer · · Score: 3, Informative

    The whole point of gcc4.0.0 is the tree-ssa thing.

    True, this is the major infrastructure change which justified the "4".

    The author of this test didn't seem to notice that this stuff doesn't get enabled in -O2 nor -O3, but does have to be enabled by hand.

    No, most tree-ssa optimizers are enabled implicitly at -O2 (they replace quite a few of the old RTL-based optimizers). Only some numerics code can benefit from loop autovectorization (which has to be enabled explicitly; for most source code, it just increases compile time).
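    A sketch of the kind of loop the GCC 4.0 vectorizer targets: a simple counted loop with unit-stride accesses and no dependence between iterations. As noted in this thread, `-ftree-vectorize` is not on by default, so it has to be requested together with a SIMD-capable target, e.g. `-O2 -ftree-vectorize -msse2` on x86 or `-maltivec` on PowerPC. Whether a given loop actually gets vectorized depends on the compiler version; the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* y[i] = a * x[i] + y[i] for every i: each iteration is independent, so a
 * vectorizer may process several elements per SIMD instruction. The C99
 * `restrict` qualifiers promise the arrays do not overlap; without that
 * promise, possible aliasing between x and y can block vectorization. */
void scale_add(size_t n, float a, const float *restrict x, float *restrict y)
{
    assert(n == 0 || (x != NULL && y != NULL));
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

For most code that isn't dominated by loops like this one, enabling the vectorizer mostly just costs compile time, which is why it isn't part of the default -O levels.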

  19. Re:Kind of a weird review by Waffle+Iron · · Score: 2, Informative
    the resulting binary size is completely irrelevant as a compiler benchmark.

    As the ratio of raw CPU MIPS to memory bandwidth and latency continues to increase, systems lean more and more on caches to compensate. Since larger code eats up more of the scarce cache resources to do the same job, small code can be more important than code with the lowest raw instruction clock count. This can be especially important in C++, where redundant code generated by templates can really get out of hand if not properly controlled.

  20. Re:Expected by The+boojum · · Score: 4, Informative

    Sorry to tell you this, but the review is even mistaken with respect to Povray. Povray is not a C++ program - it's good ol' C. So in fact, none of the programs he benchmarked were C++. The test was exclusively on C code.

    As nice as C is, a lot of the improvements in GCC seem to have been targeted at improving its handling of C++ code. I'd particularly like to know how it fares with respect to modern C++ style code - massively templated stuff with STL, Boost, traits and policies, smart pointers, lots of small inlined methods, etc. This test tells me nothing about that, and that's where a lot of development is these days.

  21. Re:Expected by Shisha · · Score: 2, Informative

    I call it the most reasonable benchmark because it has thousands of contributors and covers a wide range of code purposes and individual coding habits - and yet, performance is omitted.
    As someone who has done some kernel (see this old project) and other programming, I would probably disagree with this statement. The code you find in the linux kernel is rather different (think concurrency, locking, I/O waiting, message passing) to the code you'd find in a number crunching application (think for loops that take forever, huge data sets, nested recursion) which would be rather different to the code you find in something like LAME (think DSP code).

    As for the coding habits, the kernel development process encourages similar coding habits as it makes the code easier for others to read. There would be differences between different subsystems, which brings us to another problem: where do you start benchmarking the kernel as a whole?

  22. Re:kettle? black? by Miniluv · · Score: 2, Informative
    No, what it means is that if you're doing anything that is compiler sensitive, you should test before you deploy. Some people are seeing gains with GCC4, other people are seeing the same, and some are seeing losses. Each group needs to make a decision about a compiler for their situation.

    The GCC folks released this with it being well documented that it wasn't going to blow the doors off for everyone in every situation, but instead that this was a major step forward for internals, which should allow them to make some major steps forward that are externally visible soon.

  23. Re:Tree-ssa is in there by Dan+Berlin · · Score: 2, Informative

    of the ones you listed, only -ftree-loop-linear and -ftree-vectorize are not on by default. (you also missed a couple, though they slipped my mind)

  24. Re:Expected by uid8472 · · Score: 2, Informative

    POVray used to be written in plain C, but recent versions -- 3.5 and later, I think -- use C++.

  25. Re:Expected by ma_luen · · Score: 5, Informative

    I think you are overestimating the interest of the research community in working on gcc. The move from the intentionally underdocumented and ill-defined intermediate representations to tree-ssa is a huge step for gcc. Unfortunately, there is still no real effort to make the platform attractive for experimental work.

    The McCat compiler from McGill (which is what gcc borrowed the ssa rep from), C-- or the LLVM project all provide a much nicer platform. The internal representation is clearly documented, there are frameworks and examples for writing new passes and most importantly they all allow for whole program compilation.

    Until gcc decides to support some of this the project will continue to be ignored by research groups. This might be fine since research compiler work can be fairly ugly and it is just easier to port what works.

    Otherwise I agree that the move to SSA form is a critical step for gcc to take and will enable it to become a "modern" compiler. More importantly, it will enable the inclusion of the large body of compiler work that is based on SSA form.

    Mark

  26. Re:Still generating 386 assembly? by Ian+Lance+Taylor · · Score: 4, Informative

    You need to read up on the -march, -mfpmath, -mmmx, -msse, -msse2, -msse3, and -m3dnow options.

    If you build gcc yourself, you can even make them the default by configuring with an appropriate --with-arch option.
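    One way to check what those switches actually enabled is GCC's predefined macros: `__MMX__`, `__SSE__` and `__SSE2__` are defined when the corresponding instruction sets are enabled, whether by `-mmmx`/`-msse`/`-msse2` or by an `-march` value that implies them. A small sketch (the function name is hypothetical):

```c
#include <assert.h>

/* Reports the newest of MMX/SSE/SSE2 that the compiler was allowed to use.
 * The answer reflects the compile-time flags, not the CPU the program
 * happens to run on. */
const char *enabled_simd(void)
{
#if defined(__SSE2__)
    return "sse2";
#elif defined(__SSE__)
    return "sse";
#elif defined(__MMX__)
    return "mmx";
#else
    return "none";
#endif
}
```

On 32-bit x86, building with `-march=i386` versus `-march=pentium4` should change the answer from "none" to "sse2"; on x86-64, SSE2 is part of the baseline and is enabled by default.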

  27. Re:intel compiler by X · · Score: 2, Informative

    Um, if it doesn't make code run faster, what's the point of including it?

    Sigh... I wasn't saying that it doesn't make code run faster. I was saying that it doesn't necessarily make code run faster. Auto-vectorization is only a win in certain circumstances. There are a whole host of optimizations that only apply in specific circumstances and/or only improve performance in certain circumstances and slow things down in others. If there weren't trade-offs with optimizations, compilers would just have "-O" and wouldn't bother with tons of other optimization flags.

    --
    sigs are a waste of space
  28. Re:No, the third run is for finding bugs by Anonymous Coward · · Score: 5, Informative

    Yes, as long as it wasn't miscompiled.

    Historically, GCC tends to bring out the worst in compilers. That is why when you build GCC, the system compiler will be used once, /without optimizations/ to produce a slow GCC 4.0 which can be used to compile itself. This is done twice (stage 1 compiles stage 2 and stage 2 compiles stage 3) so that 2 and 3 can be compared to ensure that there were no miscompilations, as it is unlikely that a miscompiled compiler will produce correctly executable machine code that replicates exactly.

    Unlikely but possible. Look for the paper "Reflections on Trusting Trust" for a beautiful hack involving intentional miscompilations. The author basically changed the compiler so that when "login" was being compiled, the compiler inserted a back door. And when a new compiler was being compiled, the compiler would insert the code to insert the back door and to change the next compiler. And then no matter how much you checked the source to either login or the compiler, you would never notice the back door.

  29. Re:Expected by Anonymous Coward · · Score: 1, Informative

    I'd particularly like to know how it fairs with respect to modern C++ style code - massively templated stuff with STL, Boost, traits and policies, smart pointers, lots of small inlined methods, etc.

    Even then, a lot of current code is optimized for existing compilers. For example, GCC 4.0 finally includes full-blown scalar replacement of aggregates, so struct members can be accessed with less overhead when accessed repeatedly (as opposed to reloading address and offset). This makes a lot of difference in heavily templated code with lots of iterators. OTOH, GCC has included a simpler version since 2.95 at least, which does something similar for structs with a single member, like most of the STL iterators.

    So in this case, if you use code that only uses STL iterators, you won't see any improvement (as there was already no abstraction penalty), so you have to choose benchmarks carefully. Likewise you might run into a lot of code that avoids unoptimized constructs, like passing and returning pointers to avoid depending on constructor elision and NRVO.
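    A C approximation of the multi-member case described above, using a hypothetical two-member "iterator" struct passed by value. Scalar replacement of aggregates lets the optimizer split `it` into two independent scalars that can stay in registers instead of being reloaded through the struct. In GCC 4.x this pass is controlled by `-ftree-sra`, which is enabled at -O1 and above; whether it fires for a particular function depends on the version and flags.

```c
#include <assert.h>
#include <stddef.h>

/* Two-member "iterator", similar in spirit to what templated C++ code
 * passes around by value. */
struct int_range {
    const int *cur;
    const int *end;
};

/* Sums the elements in [it.cur, it.end). After scalar replacement, `it.cur`
 * and `it.end` can be treated as two plain pointer variables. */
long sum_range(struct int_range it)
{
    long total = 0;
    assert(it.cur != NULL || it.cur == it.end);
    for (; it.cur != it.end; ++it.cur)
        total += *it.cur;
    return total;
}
```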

    Smart pointers are something I'm very interested in and will be checking out as soon as I clear out enough space to build a cygwin binary. Modern smart pointers (boost::shared_ptr or std::tr1::shared_ptr) have more than one member, so scalar replacement of aggregates should be really nice in terms of execution speed.

    And at some point I'll play around with the new lambda libraries. Should be nice to implicitly build up a function inside the parameter list of a transform :)

  30. Re:The Future? by GauteL · · Score: 2, Informative

    It IS a big thing. This is the first freely distributable, readily available compiler of Fortran 95.

    Up until now, my PhD work has needed compilers I can't simply install without high fees, because the academic free license for proprietary compilers still sounds a bit fishy in its requirements. This is actually a major boost for the Scientific Computing community.

    However, lots of people have just NOW started to trust current F95 compilers (lots of academic code is still written in F77). It will be several more years until they trust the GNU Fortran 95 compilers.

    Besides, just because it is called Fortran 95 does not mean it was actually in heavy use by 1995.

  31. "problem" with gcc 4.0 by Anonymous Coward · · Score: 1, Informative
    Keep an eye on 4.0. Like a baby, we won't really appreciate its value until it's matured a bit.
    A lot of people will probably get upset when their code which compiled just fine with 3.4.x doesn't compile with 4.0.0 anymore.

    Question is, should they be upset at the compiler?

    Recently, I found this thread on the ReactOS forums. It is about compiling ReactOS with gcc 4. Sure enough, there were problems. One thing that caught my eye is this:

    Also a mountain of Warnings in the reactos code mainly to do with signed and unsigned. Yep people have been mixing them all over the place some files have 10-20 Warning each just to do with signed and unsigned.
    Seems like a good opportunity to start checking code against 4.0.0 and fix those warnings before they get promoted to errors in a subsequent version...
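    Why those signed/unsigned warnings are worth fixing, as a hypothetical sketch: when an `int` is compared with an `unsigned`, the `int` is converted to `unsigned` first, so a negative value turns into a very large one. In C, GCC reports this with `-Wsign-compare`, which `-Wextra` enables.

```c
#include <assert.h>

/* Buggy: i is converted to unsigned before the comparison, so i == -1
 * becomes UINT_MAX and "-1 < 5u" evaluates to false. */
int is_before_buggy(int i, unsigned n)
{
    return i < n;   /* gcc -Wextra (via -Wsign-compare) warns here */
}

/* Fixed: handle negative values explicitly, then compare as unsigned. */
int is_before(int i, unsigned n)
{
    return i < 0 || (unsigned)i < n;
}
```

The buggy version compiles and usually works, which is exactly why such comparisons pile up unnoticed until a compiler starts complaining.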
  32. Re:moronic review by ChaoticCoyote · · Score: 2, Informative

    I have no clue what you're talking about.

    One benchmark *was* C++ (povray), and, in fact, I use KDE as my desktop. It just so happens that most code in a distribution is written in C.

    I have quite a bit of heavily-templatized C++ in my library and customer code, but it is either proprietary (under NDA) or unsuitable for timing. As I state in the article, C++ programmers should seriously consider GCC 4.0 for its improved compile times, if nothing else.

  33. Re:No, the third run is for finding bugs by petermgreen · · Score: 4, Informative
    --
    note: i'm known as plugwash most places but i screwed up registering that here somehow in the past and now can't register
  34. Re:The ? operator by Rhesus+Piece · · Score: 2, Informative

    You know... funny doesn't always mean
    we're laughing _with_ you. ;-)

  35. Re:4.0? by cimetmc · · Score: 2, Informative

    The recommended compiler for linux 2.6.x is the main compiler which was used when linux 2.6 was released. The linux developers don't want to officially switch compilers in mid term of a version because that would be a potential additional source of bugs. It does not necessarily mean that newer GCC version would be less good. It's just that it might produce code that would behave differently for whatever reason and it would make development more difficult.
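    As an aside on how code can tell which GCC built it: GCC predefines `__GNUC__`, `__GNUC_MINOR__` and `__GNUC_PATCHLEVEL__`, and the GCC manual suggests combining them into a single number for version checks. A sketch along those lines; `GCC_VERSION_NUM` and `gcc_at_least` are hypothetical names, not the kernel's own macros.

```c
#include <assert.h>

/* Encodes the building compiler's version as one number,
 * e.g. 3.4.3 -> 30403 and 4.0.0 -> 40000. Note that some non-GCC
 * compilers also define __GNUC__ for compatibility. */
#if defined(__GNUC__)
#  define GCC_VERSION_NUM \
     (__GNUC__ * 10000 + __GNUC_MINOR__ * 100 + __GNUC_PATCHLEVEL__)
#else
#  define GCC_VERSION_NUM 0   /* not a GCC-compatible compiler */
#endif

/* Returns nonzero if the building compiler is at least major.minor.patch. */
int gcc_at_least(int major, int minor, int patch)
{
    assert(minor >= 0 && minor < 100 && patch >= 0 && patch < 100);
    return GCC_VERSION_NUM >= major * 10000 + minor * 100 + patch;
}
```

In practice such checks are usually done directly in `#if` directives so that unsupported compilers fail at build time rather than at run time.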

    The discussion about recommended GCC versions for the Linux kernel regularly pops up on the kernel mailing list. For instance, you can see one such discussion here:
    http://groups-beta.google.com/group/linux.kernel/browse_frm/thread/c2da87604102e689/5d754728f97e5105

    A much better indicator on GCC quality is to see what versions various Linux distributions actually use. For instance, if you take SuSE pro 9.3, it uses GCC 3.3.5.

    Marcel