A Review of GCC 4.0
ChaoticCoyote writes "
I've just posted a short review of GCC 4.0, which compares it against GCC 3.4.3 on Opteron and Pentium 4 systems, using LAME, POV-Ray, the Linux kernel, and SciMark2 as benchmarks. My conclusion:
Is GCC 4.0 better than its predecessors? In terms of raw numbers, the answer is a definite "no". I've tried GCC 4.0 on other programs, with similar results to the tests above, and I won't be recompiling my Gentoo systems with GCC 4.0 in the near future. The GCC 3.4 series still has life in it, and the GCC folk have committed to maintaining it. A 3.4.4 update is pending as I write this.
That said, no one should expect a "point-oh-point-oh" release to deliver the full potential of a product, particularly when it comes to a software system with the complexity of GCC. Version 4.0.0 is laying a foundation for the future, and should be seen as a technological step forward with new internal architectures and the addition of Fortran 95. If you compile a great deal of C++, you'll want to investigate GCC 4.0.
Keep an eye on 4.0. Like a baby, we won't really appreciate its value until it's matured a bit.
"
Well, clearly the problem is that you compiled GCC 4.0.0 with GCC 3.4.3! What I did was go through the GCC 4.0 source code in two separate windows, fire up hexedit in another, and go through line by line "compiling" GCC 4.0 with the GCC 4.0 source, in my head. I wouldn't recommend doing this with -funroll-loops; my hands started cramping up.
Or you could wait to compile 4.0 until the 3.0 branch makes it to 3.9.9, then it will be close enough anyway. YMMV, people say I give out bad advice, go figure...
It was a long time before GCC 3 got better than 2.95. I expect the same thing will happen here.
Give me Classic Slashdot or give me death!
...Tiger? Wasn't it compiled with GCC 4.0?
Some people spend 10 hours tweaking compiler settings and optimizations to get an extra 5% performance from their code.
Other people spend 2 hours selecting the proper algorithm in the first place and get an extra 500% performance from their code.
To semi-quote The Matrix: One of these endeavors... is intelligent. And one of them is not.
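To make the algorithm-versus-flags point concrete, here is a minimal sketch in C. It is an illustration only: the function names and the step counters are made up for this example and come from nowhere in the review. Both functions look up a key in a sorted array; one scans every element, the other halves the range on each probe.

```c
#include <stddef.h>

/* Linear scan: up to n comparisons. The probe count is stored in *steps. */
long find_linear(const int *a, size_t n, int key, size_t *steps)
{
    size_t i;

    *steps = 0;
    for (i = 0; i < n; i++) {
        (*steps)++;
        if (a[i] == key)
            return (long)i;
    }
    return -1;
}

/* Binary search on a sorted array: roughly log2(n) probes. */
long find_binary(const int *a, size_t n, int key, size_t *steps)
{
    size_t lo = 0, hi = n;              /* search the half-open range [lo, hi) */

    *steps = 0;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        (*steps)++;
        if (a[mid] == key)
            return (long)mid;
        if (a[mid] < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return -1;
}
```

No optimization flag turns the first function into the second, which is the kind of gap the parent comment is talking about.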
"Like a baby, we won't really appreciate its value until it's matured a bit."
Does this mean I have to wait until it's 18?
It's damn fast at compiling KDE, as someone tested.
While I know the benefits of Fortran 95 are a big thing, saying it's a technological step forward to incorporate for the first time a 10 year old standard seems a bit ridiculous. When I first saw this article I had to check my calendar to make sure it was May 1st and not April 1st.
Is that what you say to new parents? :-)
Where are the screenshots?
http://www.kdedevelopers.org/node/view/1004
;)
Qt:
            -O0     -O2
gcc 3.3.5   23m40   31m38
gcc 3.4.3   22m47   28m45
gcc 4.0.0   13m16   19m23
KDElibs (with --enable-final)
            -O0     -O2
gcc 3.3.5   14m44   27m28
gcc 3.4.3   14m49   27m03
gcc 4.0.0    9m54   23m30
KDElibs (without --enable-final)
            -O0
gcc 3.3.5   32m56
gcc 3.4.3   32m49
gcc 4.0.0   15m15
I think KDE and Gentoo people will like GCC 4.0
I love the open source movement, but I wonder why the following comment is OK for open source projects and not closed source?
quote "That said, no one should expect a "point-oh-point-oh" release to deliver the full potential of a product, particularly when it comes to a software system with the complexity of GCC."
I bet no one would dare say that about a certain product from Redmond.
"Like a baby, we won't really appreciate its value until it's matured a bit."
Seriously, this is why I don't appreciate babies. At least after about 4 or 5 years, they're useful for mild manual labour. Sure they'll complain and cry, but all you gotta do is tie their dishwashing to the number of fish heads they're allotted that week. Works pretty well, I gotta say. Anyway, at least they're not a net productivity drain like babies are.
Anyway, what I mean to say is: from your description, it looks like I'll be staying away from GCC 4 for a while, too. Goddamn babies.
-Laxitive
Second, the runtime benchmarks were close enough to be statistically meaningless in most cases. The author concludes with:
My take would have been "in terms of raw numbers, it's not really any better yet." It's close enough to equal (and slower in few enough cases that I'd be willing to accept them), though, that I'd be willing to switch to it if I could do so without having to modify a lot of incompatible code. It's clearly the way of the future, and as long as it's not worse than the current gold standard, why not?
Dewey, what part of this looks like authorities should be involved?
The reason Intel's compiler generates faster code is that it does auto-vectorisation (i.e., it automatically finds out how to transform some code patterns to take advantage of native vector operations, such as those provided by SSE). They started to implement this in gcc 4.0, but it's a very first iteration that, as far as I know, is still kinda limited. I'm not even sure it's enabled by default, even at -O3. There are lots of improvements there targeted at gcc 4.1.
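For readers who haven't seen what an auto-vectorizer looks for, here is a minimal sketch of the kind of loop that is usually a candidate. The function name and the build line in the comment are assumptions for illustration; whether GCC 4.0.0 actually vectorizes this exact loop is not something verified here.

```c
#include <stddef.h>

/*
 * y[i] += a * x[i] for every i in [0, n).
 *
 * 'restrict' (C99) promises that x and y do not overlap, which removes
 * one common obstacle to turning the loop into SSE vector operations.
 *
 * Possible build line (an assumption, adjust to taste):
 *   gcc -std=c99 -O2 -ftree-vectorize -msse2 -c saxpy.c
 */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    size_t i;

    for (i = 0; i < n; i++)
        y[i] += a * x[i];
}
```

The loop has a simple counted form, unit-stride accesses, and no dependence between iterations, which are the properties a vectorizer typically needs to see.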
I agree that this compiler is a cornerstone of free software.
But it was very frustrating to me to try to port the compiler to a new platform by modifying existing back ends for similar platforms.
After spending a few months on it (m68k in this case), I could not escape the layers of hack upon cruft upon hack upon cruft, that made it extremely difficult to make even fairly superficial mods because everyone seemed to be using the features differently and all the power seemed lost in hacks that made it impossible to do simple things (for me anyway). I am quite familiar with many assemblers and optimizing compilers.
I hope that the new work makes a somewhat-clean break with the old, otherwise, I would fear yet another layer to be hacked and interwoven, with the other ones that were so poorly fit to the back ends.
I suspect that not all backends are the same and perhaps the same experience would not be true for a more popular target, but it seems to me it shouldn't be that hard to create a model that is more powerful yet simpler. Such would seem to me to be a major step forward and enable much greater optimization, utilization, maintainability, etc.
The whole point of gcc4.0.0 is the tree-ssa thing. The author of this test didn't seem to notice that this stuff doesn't get enabled in -O2 nor -O3, but does have to be enabled by hand. This includes autovectorization (-ftree-vectorize) among other things which may make a difference.
If I was him, I'd repeat the tests again enabling the -ftree stuff when building with gcc4.0.0.
At what point (of 3's evolution) would you say it surpassed 2.95? Why?
Constitutionally Correct
What about the performance on MIPS? PPC? C'mon, people...enquiring minds want to know!
Constitutionally Correct
One of the major changes in the 4.0.0 release is the internal reorganization that allows for more aggressive optimizations. Hence, he tested the optimized performance of the latest 3.x versus 4.0.0. How do you tell the compiler to optimize? Well, you have to pass it "lots of flags."
I found this in the osnews announcement
"Before we get a bunch of complaints about the fact that most binaries generated by GCC 4.0 are only marginally faster (and some a bit slower) than those compiled with 3.4, let me point out a few things that I've gathered from casually browsing the GCC development lists. I'm neither a GCC contributor nor a compiler expert.
Prior to GCC 4.0, the implementation of optimizations was mostly language-specific; there was little or no integration of optimization techniques across all languages. The main goal of the 4.0 release is to roll out a new, unified optimization framework (Tree-SSA), and to begin converting the old, fragmented optimization strategies to the unified framework.
Major improvements to the quality of the generated code aren't expected to arrive until later versions, when GCC contributors will have had a chance to really begin to leverage the new optimization infrastructure instead of just migrating to it.
So, although GCC 4.0 brings fairly dramatic benefits to compilation speed, the speed of generated binaries isn't expected to be markedly better than 3.4; that latter speedup isn't expected until later installments in the 4.x series."
Like a baby, we won't really appreciate its value until it's matured a bit.
"Come here son. Did you know your mother and I almost decided to not keep you when you were born? You were just a baby at the time, you didn't seem to have any value. I mean, seriously, what use is there for a baby? I'm glad we didn't make that mistake.
Now go play outside and don't come back before dinner time, and pick up the trash when you leave."
There was one test case I did for my own use. I've got a small C++ program that's computationally heavy and has a small working set of memory.
On that program (on a P4) I got an 11% reduction in runtime using GCC 4 vs. GCC 3.3.5. This was actually a big deal for my work.
The lesson here: Your mileage with GCC 4.0's improvements may vary from the benchmarks, and you might want to try it on your own code.
I don't wanna read the review; it reveals the ending or something. I mean, what good is a compiler without some big unexpected surprises?
Are you kidding? Babies are worth $15,000-$20,000 easily, even if they're female. Once e-Bay stops being a bunch of pussies and we get some open bidding started, I expect their value to go up even higher.
Once again, we see that the
Only on /. could this get modded up...
He's not a dumbass because he uses Gentoo. It's pretty obvious that he doesn't know what he's talking about. Straight from TFA:
Some folk may object to my use of -ffast-math -- however, in numerous accuracy tests, -ffast-math produces code that is both faster and more accurate than code generated without it.
"I don't know about you, but if I want my math done fast and wrong I'll ask my cat" - Anonymous
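The reason people get nervous about -ffast-math is that, among other things, it lets the compiler reorder floating-point arithmetic as if it were associative, which it is not. A minimal sketch (the function names are made up for this example) showing that the grouping alone changes the answer:

```c
/* Sum three doubles grouped as (a + b) + c. */
double sum_left(double a, double b, double c)
{
    return (a + b) + c;
}

/* The same three values grouped as a + (b + c). */
double sum_right(double a, double b, double c)
{
    return a + (b + c);
}
```

With a = 1e100, b = -1e100 and c = 1.0, the first grouping cancels the large terms before adding 1.0, while in the second the 1.0 is absorbed into -1e100 and lost. Whether a reordering makes a given program more or less accurate depends on the data, so neither the article's claim nor the cat joke settles it in general.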
Not always. Usually Intel's CC produces faster code except when you give it code [LibTomMath] that it can't efficiently optimize.
...
Also, as he hates to have pointed out, his options aren't always optimal.
Quite a few applications are faster with 3.4.3 on a P4 with "-fno-regmove" as well as -O3. My AES for instance goes down from >500 cycles/block to 380 cycles/block on my Prescott P4 with this switch.
380 cycles/block is faster than Intel CC v8.0 with "-O3 -xP -ip" by about 30 cycles/block.
Also the guy probably didn't try profiling. I can drop a fair chunk of cycles in doing ECC point multiplies on my P4 with GCC by doing a profiled build system.
ETC!
Tom
Someday, I'll have a real sig.
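For anyone who hasn't tried the profiled build mentioned above, a rough sketch of the idea follows. The function, the file names, and the input in the comment are placeholders invented for this example; -fprofile-generate and -fprofile-use are the GCC options for this workflow.

```c
#include <stddef.h>

/*
 * Two-step profiled build (file names and input are placeholders):
 *   gcc -O2 -fprofile-generate digits.c main.c -o app   (instrumented build)
 *   ./app < typical-input                               (run; writes profile data)
 *   gcc -O2 -fprofile-use digits.c main.c -o app        (rebuild using the profile)
 *
 * The branch below depends on the data, so the compiler cannot know at
 * build time which way it usually goes. A profile from a representative
 * run gives it that information.
 */
size_t count_digits(const unsigned char *buf, size_t n)
{
    size_t i, count = 0;

    for (i = 0; i < n; i++) {
        if (buf[i] >= '0' && buf[i] <= '9')
            count++;
    }
    return count;
}
```

How much this helps depends heavily on the code and on how representative the training run is, so treat any speedup as something to measure rather than assume.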
Unless the GCC documentation is very wrong, the only tree-ssa optimizations in 4.0 which don't get turned on by default at -O3 are -ftree-loop-linear, -ftree-loop-im, -ftree-loop-ivcanon, -fivopts, and -ftree-vectorize. It's true that some of these may be good optimization wins (probably increasing compile time in the process, but that's what the higher optimization levels are all about), but there are plenty of tree-ssa optimization passes being used in these tests.
Auto-vectorization, by the way, does not fall into a "obvious optimization wins which perhaps should be enabled at -O3 by default" category. It can bring very big performance benefits in some situations, but it should be used with caution.
Recently, a discussion took place on a FreeBSD mailing list about whether the project wanted to use GCC 4.0.0 as the system compiler. Some objections were:
If I understood it right, we won't have a GCC 4.0.0 system compiler on FreeBSD anytime soon. Installing the gcc40 port is, of course, always possible.
cpghost at Cordula's Web.
Uh, and this review is helping us... how?
lame uses assembler code for vectorization. One of the new features of gcc 4 is the beginnings of a vectorization model. A good test for gcc 4 would have been to compile some C-only bignum libraries, and Ogg Vorbis! povray is also a good example, but then you need to test more than one specific test-run. Maybe gcc 4 makes radiosity in pov-ray 400% faster at a 2% cost in the rest of the code?
This guy is the Tom's Hardware of Linux reviews, except he doesn't have the annoying ads, and he does not split his lack of content over 30 HTML pages.
The new warnings of gcc 4 have helped me find a bug in my code. That saved me a week. Consider how much faster gcc 4 needs to make pov-ray or lame to save you a week of work!
gcc 4 can now reorder functions according to profile feedback. That should make large C++ projects faster. Also, the ELF visibility should make KDE start much faster. This should have been tested!
Please note that I'm not saying gcc 4 produces faster code. I don't rightly know. I do know it produces smaller code for my project dietlibc, where size matters more than speed.
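The ELF visibility feature mentioned above is exposed in GCC 4.0 as a function attribute (and as the -fvisibility option). A minimal sketch, with hypothetical names that are not taken from KDE or dietlibc:

```c
/*
 * Internal helper: marked hidden, so when this file is built into a
 * shared library the symbol is not exported and calls to it from inside
 * the library need no dynamic symbol lookup.
 */
__attribute__((visibility("hidden")))
int clamp_internal(int v, int lo, int hi)
{
    if (v < lo)
        return lo;
    if (v > hi)
        return hi;
    return v;
}

/* Public entry point: explicitly exported from the library. */
__attribute__((visibility("default")))
int mylib_percent(int v)
{
    return clamp_internal(v, 0, 100);
}
```

A common pattern is to compile with -fvisibility=hidden and mark only the public API as "default". Fewer exported symbols means less work for the dynamic linker at startup, which is where the hoped-for KDE start-up gain would come from.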
A prospective professor gave a talk at our school, he'd been working at some grid computing lab. Well their code was underperforming, so they profiled it and found that sprintf was taking 30? 50? 70? % (I forget) of their code time - the machines had to communicate with each other a lot, and they used sprintf to serialize. (It's an easy fix - C++'s stream operators are much faster, since the type is known at compile time.)
:)
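The underlying point is that sprintf has to interpret its format string at run time, whereas code that already knows the type can skip that step. Here is a sketch of the same idea in plain C rather than C++ streams; put_uint is a hypothetical helper written for this example, not a library function, and no claim is made that it beats any particular sprintf without measuring.

```c
#include <stddef.h>

/*
 * Write the decimal digits of v into out, NUL-terminated, and return the
 * number of digits written. The caller must provide room for at least
 * 21 bytes (20 digits for a 64-bit value plus the terminator).
 */
size_t put_uint(char *out, unsigned long v)
{
    char tmp[24];                 /* scratch space, digits in reverse order */
    size_t n = 0, i;

    do {                          /* emit least significant digit first */
        tmp[n++] = (char)('0' + v % 10);
        v /= 10;
    } while (v != 0);

    for (i = 0; i < n; i++)       /* reverse into the caller's buffer */
        out[i] = tmp[n - 1 - i];
    out[n] = '\0';
    return n;
}
```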
Oh yeah, also, for Quake 1, John Carmack hired Michael Abrash, an assembly language guru, to help out. Well Abrash found that GCC's memcpy() (or whatever it was) was copying byte-by-byte instead of by word (or something, I don't remember) and his reimplementation of that alone, doubled the frame rate!
Just some interesting counter examples to keep in mind
Someday we'll all be negroes
Does Gentoo have users? I though they only had installers.
It isn't a huge deal for most people, but it seems like the new GCC is significantly better at optimizing for the PowerPC now.
I've been working with the GNU GSL on my Mac a lot, and I recently updated to Tiger. The first thing I noticed when I recompiled the GSL with Apple's modified GCC 4.0 is the significant and noticeable speed increase. With this intense math stuff, doing SVD on 300x200 matrices, it's shocking how much faster it is. I went from 3-5 seconds down to less than one.
I am not going to post any hard numbers because I haven't rigorously compared them yet, but I'll make some formal comparisons this week.
Slashdot. It's Not For Common Sense
ACK! If 70% of your time is spent in a serialization function call, FORGET about optimizing the function call.... You are WAY too fine grained in your algorithm for effective parallelization. He'd have been better off running the whole damn thing serially on a single box, methinks. His fancy grid algorithm spent more time doing "grid" stuff than working on his problem!
If your compiler compiles correctly, a program (leaving floating point inaccuracies aside) should produce the same result no matter what compiler it is compiled with. I.e. a gcc 4.0 should produce the same results no matter if it's itself compiled with 3.4.3 or 4.0.
Signed, a parent of three.
Dewey, what part of this looks like authorities should be involved?
You need to read up on the -march, -mfpmath, -mmmx, -msse, -msse2, -msse3, and -m3dnow options.
If you build gcc yourself, you can even make them the default by configuring with an appropriate --with-arch option.
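One way to check what those options actually enabled is to look at the macros GCC predefines when the matching -m option (or a -march that implies it) is in effect. A minimal sketch; simd_level is a made-up name for this example, and the build line in the comment is only one possibility.

```c
/*
 * Report the highest SIMD extension the compiler was allowed to use for
 * this translation unit, based on GCC's predefined macros.
 *
 * Possible build line (an assumption, adjust to your CPU):
 *   gcc -O2 -march=pentium4 -mfpmath=sse -c simd.c
 */
const char *simd_level(void)
{
#if defined(__SSE3__)
    return "sse3";
#elif defined(__SSE2__)
    return "sse2";
#elif defined(__SSE__)
    return "sse";
#elif defined(__MMX__)
    return "mmx";
#else
    return "none";
#endif
}
```

If this reports "none" on an x86 build, the compiler was not told it could use those instructions, regardless of what the CPU supports.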
I don't get it. I was very serious when I wrote that, yet this comment is rated 60% Funny, and even 20% Troll. ....
Oh well. I guess this comment will now lose all its Funny mod points, but what the heck.
To the contrary, now it's even funnier.
"There is more worth loving than we have strength to love." - Brian Jay Stanley
Auto-vectorization is not the reason why Intel's compiler is better. It certainly helps, but in my experience, not much. Intel's compiler just does better optimizations across the board. Which is no surprise: Intel is making a compiler for their chipsets. They have inside knowledge of what's best to do when for a particular chip. Further, Intel's compiler is marketed as the fastest compiler for x86, which, as far as I know, is true. Hence, they spend a lot of time on the optimizations.
GCC, on the other hand, has a different goal: get a working compiler on as many platforms as possible.
Um, if it doesn't make code run faster, what's the point of including it?
Sigh.. I wasn't saying that it doesn't make code run faster. I was saying that it doesn't necessarily make code run faster. Auto-vectorization is only a win in certain circumstances. There are a whole host of optimizations that only apply in specific circumstances and/or only improve performance in certain circumstances and slow things down in others. If there weren't trade-offs with optimizations, compilers would just have "-O" and wouldn't bother with tons of other optimization flags.
sigs are a waste of space
You know, I remember when someone did this to GCC 3, comparing against 2.95.
4.0.0 is a brand new compiler. Lots of techniques in it are brand new. Lots of tweaks and polish can be applied. If you actually take the time to compare 3.4 to 3.0, you'll find that the gap is bigger than 4.0 to 3.4. Furthermore, if you compare 2.95 to 3.0, you'll find 2.95 is better than 3.0 by a much wider margin than 3.4 is to 4.0.
This is a misunderstanding of the nature of progress. 4.0 is a brand new compiler with brand new internal behaviors. Lots of things are at the It Works stage, instead of the It's Efficient stage. You can't compare a 3-year polished compiler to a 3-week polished compiler; it's utter nonsense.
If you want to compare 4.0 to something, compare it to 3.0, or sit down.
StoneCypher is Full of BS
I have no clue what you're talking about.
One benchmark *was* C++ (povray), and, in fact, I use KDE as my desktop. It just so happens that most code in a distribution is written in C.
I have quite a bit of heavily-templatized C++ in my library and customer code, but it is either proprietary (under NDA) or unsuitable for timing. As I state in the article, C++ programmers should seriously consider GCC 4.0 for its improved compile times, if nothing else.
All about me
The recommended compiler for Linux 2.6.x is the main compiler which was used when Linux 2.6 was released. The Linux developers don't want to officially switch compilers in the middle of a version's lifetime because that would be a potential additional source of bugs. It does not necessarily mean that a newer GCC version would be worse. It's just that it might produce code that would behave differently for whatever reason, and that would make development more difficult.
The discussion about recommended GCC versions for the Linux kernel regularly pops up on the kernel mailing list. For instance, you can see one such discussion here:
http://groups-beta.google.com/group/linux.kernel/browse_frm/thread/c2da87604102e689/5d754728f97e5105
A much better indicator on GCC quality is to see what versions various Linux distributions actually use. For instance, if you take SuSE pro 9.3, it uses GCC 3.3.5.
Marcel