A Review of GCC 4.0

← Back to Stories (view on slashdot.org)

Posted by Hemos on Monday May 2, 2005 @04:44AM from the only-time-will-tell dept.

ChaoticCoyote writes " I've just posted a short review of GCC 4.0, which compares it against GCC 3.4.3 on Opteron and Pentium 4 systems, using LAME, POV-Ray, the Linux kernel, and SciMark2 as benchmarks. My conclusion: Is GCC 4.0 better than its predecessors? In terms of raw numbers, the answer is a definite "no". I've tried GCC 4.0 on other programs, with similar results to the tests above, and I won't be recompiling my Gentoo systems with GCC 4.0 in the near future. The GCC 3.4 series still has life in it, and the GCC folk have committed to maintaining it. A 3.4.4 update is pending as I write this. That said, no one should expect a "point-oh-point-oh" release to deliver the full potential of a product, particularly when it comes to a software system with the complexity of GCC. Version 4.0.0 is laying a foundation for the future, and should be seen as a technological step forward with new internal architectures and the addition of Fortran 95. If you compile a great deal of C++, you'll want to investigate GCC 4.0. Keep an eye on 4.0. Like a baby, we won't really appreciate its value until it's matured a bit. "

15 of 429 comments (clear)

Min score:

Reason:

Sort:

Re:The performance of compiled code by pclminion · 2005-05-02 04:54 · Score: 3, Insightful

At some point you've got the best algorithm, you've profiled, you've hand-optimised, you've got the fastest hardware you can afford....and you *still* need that last 5%. That's when you spend 10 hours tweaking compilers settings...
If you really, positively need an extra 5% performance, you might as well just buy a computer that's 5% faster.
Re:The performance of compiled code by Jeremy.DeGroot · 2005-05-02 04:55 · Score: 3, Insightful

You should do both. Choosing the right algorithms is crucial, no doubt about it. But if you've got a massive database application, that 5% can represent a huge amount of work and be worth the trouble. A little bit of extra performance can, in many cases, go a long, long way towards adding to the value of the software. Both endeavors are intelligent in many (if not most) cases. Performance is important in software, and any little bit you can squeeze out will likely be a big deal.
Re:The performance of compiled code by Minwee · 2005-05-02 04:59 · Score: 5, Insightful

Unfortunately, including a faster computer with every copy of the code you distribute may be prohibitively expensive.
Re:The performance of compiled code by kfg · 2005-05-02 04:59 · Score: 5, Insightful

And in both groups you will find people who believe that execution speed is the measurement of code quality.

KFG
Re:Expected by Rei · 2005-05-02 05:11 · Score: 5, Insightful

I think the problem is that, if I'm not mistaken, he's testing all C code except Povray. The biggest reported improvements in 4.0 were for g++, so using such a small C++ sample base (Povray - one purpose, one set of design principles, few authors) seems bound to produce inaccurate benchmarking.

Further, on his most reasonable C benchmark (the Linux kernel), he only records compile time and binary size, but no performance. I call it the most reasonable benchmark because it has thousands of contributors and covers a wide range of code purposes and individual coding habits - and yet, performance is omitted.

In short, I wouldn't trust this benchmark. Probably the best benchmark would be to build a whole Gentoo system with both, with identical configurations, and check build times and performances ;)

--
Dear Lord: One of your creatures may be hurt tonight. Please let it be the other creature.
Re:The performance of compiled code by mattgreen · 2005-05-02 05:11 · Score: 4, Insightful

It is because it is easier to delve into needlessly technical aspects afforded by compiler settings and 'optimizations' than it is to admit that one's algorithm is not sound. Kids running Gentoo delude themselves into thinking that omitting the frame pointer on compiles is going to make a massive difference in terms of performance, and fail to remember it makes bug hunting far more difficult when applications crash. Additionally, the 5% gain mentioned can be a severe overstatement. I frequent a game programming board, and the widespread use of C++ has led to an abundance of nano-optimization threads, the most amusing of which was an attempt to optimize strlen().

Optimizing every single line of code is a complete waste of time, since the 80/20 rule generally applies. Use a profiler to determine where that 20% is.
Re:The performance of compiled code by IHateSlashDot · 2005-05-02 05:25 · Score: 3, Insightful

The point of this article is compiler optimizations, not algorithm selection. At the point that I look at compiler performance, I've already done all of the algorithm tuning so your point is moot. This is a very interesting benchmark for those that of who already write good code and want the compiler to make the best of it.
Re:kettle? black? by imroy · 2005-05-02 05:39 · Score: 3, Insightful

I think you're a little mixed up there. When we criticize MS, we're often referring to the release of known buggy and badly implementated software to the general public. Instead the submitter of this article is referring to the "full potential" of the new optimization framework in GCC-4.0. It will, in theory, allow for much better optimizations to be performed on internal parse tree. But for now many of the CPU models are incomplete or non-existant, or something like that. The full potential of these optimizations will be delivered in a later release, either 4.0.x or 4.1, or perhaps a little of both. And the GCC team wouldn't have released GCC 4.0 with known, serious bugs.

Or perhaps I've just been trolled. Wouldn't be the first time. I see that this is your first comment on slashdot. Welcome. Just don't troll.
Re:The performance of compiled code by BlurredWeasel · 2005-05-02 05:43 · Score: 5, Insightful

I run gentoo (not for performance, but mainly because I am familiar with it, and it is easy), and you know what...I don't bug hunt. And adding -fomitframepointer or whatever the hell the option is (its in my flags somewhere) doesn't cost me anything, makes my system say (made up stat) 5% faster and I am happy. It makes no sense why you should deride me (read: gentooers) as an idiot. We're just end users, and if we can get a little bit of performance for free, well why not.
Tree-ssa is in there by jensend · 2005-05-02 05:51 · Score: 3, Insightful

Unless the GCC documentation is very wrong, the only tree-ssa optimizations in 4.0 which don't get turned on by default at -O3 are -ftree-loop-linear, -ftree-loop-im, -ftree-loop-ivcanon, -fivopts, and -ftree-vectorize. It's true that some of these may be good optimization wins (probably increasing compile time in the process, but that's what the higher optimization levels are all about), but there are plenty of tree-ssa optimization passes being used in these tests.

Auto-vectorization, by the way, does not fall into a "obvious optimization wins which perhaps should be enabled at -O3 by default" category. It can bring very big performance benefits in some situations, but it should be used with caution.
What a crappy review by Fefe · 2005-05-02 06:04 · Score: 3, Insightful

Uh, and this review is helping us... how?

lame uses assembler code for vectorization. One of the new features of gcc 4 is the beginnings of a vectorization model. A good test for gcc 4 would have been to compile some C-only bignum libraries, and Ogg Vorbis! povray is also a good example, but then you need to test more than one specific test-run. Maybe gcc 4 makes radiosity in pov-ray 400% faster at a 2% cost in the rest of the code?

This guy is the Tom's Hardware of Linux reviews, except he doesn't have the annoying ads, and he does not split his lack of content over 30 HTML pages.

The new warnings of gcc 4 have helped me find a bug in my code. That saved me a week. Consider how much faster gcc 4 needs to make pov-ray or lame to save you a week of work!

gcc 4 can now reorder functions according to profile feedback. That should make large C++ projects faster. Also, the ELF visibility should make KDE start much faster. This should have been tested!

Please note that I'm not saying gcc 4 produces faster code. I don't rightly know. I do know it produces smaller code for my project dietlibc, where size matters more than speed.
Re:4.0.0 broke backward compatibility big time by chrysalis · 2005-05-02 06:14 · Score: 4, Insightful

gcc 4.0 just tries to follows standards. If something doesn't compile with gcc 4, don't blame the compiler. The source code was broken at the first place.

--
{{.sig}}
Re:The performance of compiled code by Brandybuck · 2005-05-02 07:58 · Score: 4, Insightful

If you have a choice of algorithms, then of course use the better algorithm. But for most of the day-to-day code we deal with, we don't have that choice, because we're not dealing with code that has any grand algorithms to it. For example, if I'm writing a GUI frontend to a command line app, what are my choices of algorithms? Not much.

In my real life coding work, the places where algorithm efficiency makes a difference are far outweighed by those places that don't. And of those places that do make a difference, the performance is rarely a critical need. For example, I just coded up some RAMDAC lookup tables, and a difference of algorithm would make a huge difference in efficiency. But this particular routine was triggered by a user event (clicking a button in a config dialog), so that my dogslow but highly readable/understandable algorithm wasn't a bottleneck for anything. In this case tweaking the compiler settings would have given a 5% boost to everything, but a change in algorithm would only have given a 1/10 second boost for an event that would happen approximately once a week or less.

--
Don't blame me, I didn't vote for either of them!
God, this gets old. by stonecypher · 2005-05-02 08:21 · Score: 3, Insightful

You know, I remember when someone did this to GCC 3, comparing against 2.9.5.

4.0.0 is a brand new compiler. Lots of techniques in it are brand new. Lots of tweaks and polish can be applied. If you actually take the time to compare 3.4 to 3.0, you'll find that the gap is bigger than 4.0 to 3.4. Furthermore, if you compare 2.9.5 to 3.0, you'll find 2.9.5 is better than 3.0 by a much wider margin than 3.4 is to 4.0.

This is a misunderstanding of the nature of progress. 4.0 is a brand new compiler with brand new internal behaviors. Lots of things are at the It Works stage, instead of the It's Efficient stage. You can't compare a 3-year polished compiler to a 3-week polished compiler; it's utter nonsense.

If you want to compare 4.0 to something, compare it to 3.0, or sit down.

--
StoneCypher is Full of BS
Re:The performance of compiled code by mattgreen · 2005-05-02 13:27 · Score: 3, Insightful

Exactly. You couldn't distinguish an app compiled with frame pointers disabled from one with frame pointers enabled. The reason I point this particular optimization out is that it makes life hell for developers if you end up sending a core dump file in. This is a net loss in my book, since open source software evolves rapidly and being able to assist in reporting bugs is vital.

How is pointing out that one optimization people crow about is largely ineffective being an asshole?