Speed Test: Comparing Intel C++, GNU C++, and LLVM Clang Compilers
Nerval's Lobster writes "Benchmarking is a tricky business: a valid benchmark tries to remove all extraneous variables in order to get an accurate measurement, a process that's often problematic: sometimes it's nearly impossible to remove all outside influences, and often the process of taking the measurement can skew the results. In deciding to compare three compilers (the Intel C++ compiler, the GNU C++ compiler (g++), and the LLVM clang compiler), developer and editor Jeff Cogswell takes a number of 'real world' factors into account, such as how each compiler deals with templates, and comes to certain conclusions. 'It's interesting that the code built with the g++ compiler performed the best in most cases, although the clang compiler proved to be the fastest in terms of compilation time,' he writes. 'But I wasn't able to test much regarding the parallel processing with clang, since its Cilk Plus extensions aren't quite ready, and the Threading Building Blocks team hasn't ported it yet.' Follow his work and see if you agree, and suggest where he can go from here."
compiled with clang
The benchmarks in TFA are a little funny. Why is system time so large while user time is so small? The only time I've seen this in real applications is when there is major core contention for resources.
Which one produced the fastest code?
What on earth does compiler benchmarking have to do with the BI section of Slashdot?
Furthermore, why on earth are you idiots creating a blurb on the main screen that just links to a different Slashdot article? It's such terrible self-promotion. Just freaking write the main article as the main article. No need to make it seem as if the Business Intelligence section is actually worth reading; it's not.
Well... maybe. Or maybe not. But definitely not sort of.
Man, it took a long time to read it.
Interesting info, but I have a couple of issues:
First off, why wasn't Microsoft's C++ compiler included in this? That's the one we use at work, so that's the one I'd really like compared to all those others. Are we the only ones still using it or something?
More importantly, why on earth was compilation speed the only thing compared? I mean, I suppose it's nice for g++ users to know that their 10-minute compiles would have been 2 minutes longer if they used the Intel compiler, but Intel users might not really care if they believe their resulting code is going to run faster. Speed of compilation of optimized code is a particularly useless metric, because different compilers have different definitions of "unoptimized", so it's guaranteed you aren't comparing apples to apples.
I suppose compilation speed is a nice metric to brag about between compiler writers. But for compiler users, the most important things are roughly these, in order: toolchain support, language feature support (e.g. C++11/14 features), clarity of error/warning messages, speed of generated code (optimization), and lastly speed of compilation. I'm not really sure why you took it upon yourself to measure the least important factor, and only that one.
The code in the benchmark runs a parallel for over a 10 billion element array but in steps of 100 elements.
It's going to be limited by the creation and destruction of threads.
Also, by not initializing the input array, the floating point arithmetic is vulnerable to eventual denormal values.
Why, so we can have more first posters?
I read TFA and all I got was this lousy cookie
The main claim for g++ for a very long time was "while it does not optimize much or support all of the language, it is FREE".
I have never heard that claim, maybe because it isn't true. g++ has always been one of the best at language support. It has not always been the best at low-level processor specific optimizations, but it has made up for that by being really good at higher level optimizations, like recognizing unused code, inlining, and code hoisting. I haven't seen a better compiler at any price.
first ++pre
+1 IDisagreeSoHeMustBeATrollOrAnAstroturferOrAShill
I am a scholar and study parallel computing. These benchmarks are pretty much pointless. You cannot draw any conclusions from these results. Here the author takes the whole execution time, from the creation of the process to its destruction. That means lots of overhead is included that would count as startup time in a real application.
There is also apparently no thread pinning to computational cores. This is known to make a HUGE difference.
Then the author compared Cilk results. Cilk is known to be slow for simple codes that neither require work stealing nor have complex dependencies. For the record, I know they are also comparing TBB. But TBB is implemented on top of the Cilk engine in the Intel compiler (I don't know about gcc).
In these results, hyperthreading is enabled. The proper use of hyperthreading is complicated: there are some problems where it helps and others where it hurts, and I would not be surprised if this behavior were compiler-dependent.
Finally, it is almost impossible to compare compilers. On different platforms, with the same compilers you will get different results. Some functions are better compiled by one compiler and some functions are better compiled by the other compiler. This has been reported over and over and over again.
If you care about performance, you should not rely on what your compiler is doing behind your back. You need to know what it is doing. Depending on memory alignment (and what the compiler knows about it), on how vectorization happens, and on potential memory aliasing, you will get different results.
If you care about performance, you need to benchmark and you need to optimize and you need to know what the compiler does.
#include <stdio.h>

void FirstPost(int a, int b)
{
    if (a < b)
        printf("I got first post!\n");
    else
        printf("No, I got first post!\n");
}

int main(void)
{
    int i = 0;
    // What prints out here?
    FirstPost(i++, i++);
    return 0;
}
The best thing about UDP jokes is I don't care if you get them or not
Assuming the typical C calling convention... "No, I got first post" will be printed, where a will be 1 and b will be 0 in the call to FirstPost. This is because, generally, final arguments are evaluated and pushed onto the stack before earlier ones.
Although the standard leaves this behavior undefined, in practice almost all modern C compilers will produce the output I've described here.
File under 'M' for 'Manic ranting'
This information is perhaps 2 years out of date, but back then, on one of my projects, when we switched from g++ to Intel C++, our software got about twice as fast with no other changes. It got even faster when we took advantage of SSE3 instructions.
If you reply, do so only to what I explicitly wrote. If I didn't write it, don't assume or infer it.
it has made up for that by being really good at higher level optimizations
Heh, heh, heh, don't remember the great EGCS split of '97, do you sonny? Yep, us old timers knew that gcc was a dog of an optimizer, but them EGCS whippersnappers fixed it, and even got the fork accepted as the official gcc. Remember, you probably got to where you are today by running over the body of some crusty old-timer.
If it were just up to the order of evaluation of the function arguments, then it would be unspecified. However, the program also modifies the same object twice without an intervening sequence point, and that puts it into undefined behavior territory (6.5/2, C99 draft standard).
Clang will just issue a warning that you are making multiple unsequenced modifications. This is undefined in the C spec, and the compiler just increments i sequentially, printing "I got first post!". Sequence points like this are hard to specify for all cases, which is why the C99 spec leaves it undefined. In C11 a detailed memory model has been created which should define most cases. http://en.wikipedia.org/wiki/C11_(C_standard_revision)
Confirmed with:
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 5.0 (clang-500.2.79) (based on LLVM 3.3svn)
Target: x86_64-apple-darwin13.0.0
Thread model: posix
It is speed that is important, which is why a lot of HPC people still prefer the Intel compilers.
The main problem back then was x86 optimizations, not the high-level optimizations, although it was lacking in those too. Eventually they started porting the code to use GIMPLE and moved most of the optimizations away from the language-dependent trees to the GIMPLE language-independent code. This was done before LLVM was even popular.
six release cycles a day is probably why you have bugs in the first place...
No doubt that gcc is a damned good compiler these days, at least in terms of the quality of code produced, if not the speed at which the compiler runs. My point was just that it wasn't always so. Back then gcc was considered a toy compared to some of the commercial compilers, and it was. Thankfully the EGCS people did a lot to change that, and got the ball rolling for future improvements.
I thought that was something people used back when MS-DOS was a popular OS; I was not even aware the product still existed.
I am talking about Watcom C++ of course.
It was open sourced some time ago. Now it supports Linux (to some extent) and some other CPU architectures.
It can still make DOS/4GW exes, though. Ahh, nostalgia.
he said bugfix/test, not release.
I run: Windows, OS X, Linux, FreeBSD. Just because you have a hammer, doesn't mean everything is a nail.