Slashdot Mirror


Speed Test: Comparing Intel C++, GNU C++, and LLVM Clang Compilers

Nerval's Lobster writes "Benchmarking is a tricky business: a valid benchmark tries to remove all extraneous variables in order to get an accurate measurement, a process that's often problematic: sometimes it's nearly impossible to remove all outside influences, and often the process of taking the measurement can skew the results. In deciding to compare three compilers (the Intel C++ compiler, the GNU C++ compiler (g++), and the LLVM clang compiler), developer and editor Jeff Cogswell takes a number of 'real world' factors into account, such as how each compiler deals with templates, and comes to certain conclusions. 'It's interesting that the code built with the g++ compiler performed the best in most cases, although the clang compiler proved to be the fastest in terms of compilation time,' he writes. 'But I wasn't able to test much regarding the parallel processing with clang, since its Cilk Plus extensions aren't quite ready, and the Threading Building Blocks team hasn't ported it yet.' Follow his work and see if you agree, and suggest where he can go from here."

46 of 196 comments

  1. Re:first post by Anonymous Coward · · Score: 5, Funny

    compiled with clang

  2. Funny benchmarks by Anonymous Coward · · Score: 2, Insightful

    The benchmarks in TFA are a little funny. Why is system time so large while user time so small? The only time I've seen this in real applications is when there is major core contention for resources.

    1. Re:Funny benchmarks by Trepidity · · Score: 3, Interesting

      This looks like it's testing the compile-time, in which case a large % of time being system time isn't that uncommon. Lots of opening and closing small files, generating temp files, general banging on the filesystem. Can heavily depend on your storage speed: compiling the Linux kernel on an SSD is much faster than on spinning rust.

    2. Re:Funny benchmarks by godrik · · Score: 4, Interesting

      That's normal. There is hyperthreading on that machine, and it screws up that kind of measurement. You should always use wall-clock time when dealing with parallel codes. You should also repeat the test multiple times and discard the first few results, which the author did not do. This is standard practice in parallel programming benchmarks. And since the author did not do that, I assume he does not know much about benchmarking. Lots of parallel middleware has high initialization overhead. This tends to be particularly true for Intel tools.
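
      The routine described above (wall-clock time, repeated runs, first few results discarded) could be sketched roughly as follows. This is only an illustration, not what the article's author ran: `time_runs` and `median` are hypothetical helper names, and reporting the median is an assumption rather than something the comment asks for.

      ```cpp
      #include <algorithm>
      #include <cassert>
      #include <chrono>
      #include <cstddef>
      #include <vector>

      // Runs `work` (warmup + measured) times and returns the wall-clock
      // duration, in seconds, of the measured runs only.
      template <typename Work>
      std::vector<double> time_runs(Work work, std::size_t warmup, std::size_t measured) {
          using clock = std::chrono::steady_clock;  // monotonic wall-clock timer
          std::vector<double> seconds;
          seconds.reserve(measured);
          for (std::size_t run = 0; run < warmup + measured; ++run) {
              const auto start = clock::now();
              work();
              const auto stop = clock::now();
              if (run >= warmup) {  // discard the warm-up runs
                  seconds.push_back(std::chrono::duration<double>(stop - start).count());
              }
          }
          return seconds;
      }

      // Median of the samples; less sensitive to one slow run than the mean.
      double median(std::vector<double> values) {
          assert(!values.empty());
          std::sort(values.begin(), values.end());
          const std::size_t mid = values.size() / 2;
          return values.size() % 2 != 0 ? values[mid]
                                        : (values[mid - 1] + values[mid]) / 2.0;
      }
      ```

      Something like `median(time_runs(kernel, 3, 10))` would then give one number per compiler, where `kernel` stands for whatever loop is being measured.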

    3. Re:Funny benchmarks by robthebloke · · Score: 5, Insightful

      I agree. He's not testing compiled code performance, he's just created a set of tests which will all be memory bandwidth limited. FTA:

      I’m testing these with an array size of two billion.

      That's all I needed to read to ignore him completely. Completely and utterly pointless. If g++ won, it is likely because it utilised stream intrinsics to avoid writing data back to the CPU cache, which would have freed up more cache, and minimised the number of page faults. This will not in any way test the performance of the CPU code, it will just prove that your 1333MHz memory is slower than your 3GHz processor. This is why you don't profile code (wrapped up in a stupid for loop), but profile whole applications instead. From my own tests (measuring the performance of large scale applications using real world data sets), intel > clang > g++ (although the difference between them is shrinking). The author of the article hasn't got a clue what he's doing. FTA:

      Notice the system time is higher than the elapsed time. That’s because we’re dealing with multiple cores.

      No it isn't. It's because your CPU is sat idle whilst it waits for something to do.

    4. Re:Funny benchmarks by Anonymous Coward · · Score: 5, Insightful

      Here's another tipoff that the guy is clueless about benchmarking, talking about a test which does FP math:

      I’m not initializing the arrays, and that’s okay

      Actually, it's not. This is a bad mistake which totally invalidates the data. Many FPUs have variable execution time depending on input data. There is often a large penalty for computations involving denormalized numbers. If uninitialized data arrays happen to be different across different compilers (and they might well be), execution time can vary quite a lot for reasons completely unrelated to compiled code quality.

      It's not limited to FP, either. I remember at least one PowerPC CPU which had variable execution time for integer multiplies -- the multiplier could detect "early out" conditions when one of the operands was a small number, allowing it to shave a cycle or two off the execution time.

      The moral of the story: making sure that input data for benchmarks is always the same is very important, even when it's trivially obvious that the code will execute the exact same instruction count for any data set.
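
      As a rough illustration of that moral, the input could be filled from a fixed seed and checked for denormals before timing anything. The names `make_input` and `count_subnormal` are made up for this sketch, and the value range [1, 2) is an arbitrary choice that keeps every element a normal number.

      ```cpp
      #include <cassert>
      #include <cmath>
      #include <cstddef>
      #include <cstdint>
      #include <random>
      #include <vector>

      // Fills an array with reproducible values in [1, 2). The raw output of
      // std::mt19937 is fixed by the C++ standard for a given seed, so every
      // compiler under test sees the same input data.
      std::vector<double> make_input(std::size_t count, std::uint32_t seed) {
          std::mt19937 engine(seed);
          std::vector<double> data;
          data.reserve(count);
          for (std::size_t i = 0; i < count; ++i) {
              // engine() yields a 32-bit value; dividing by 2^32 maps it to [0, 1).
              const double value = 1.0 + static_cast<double>(engine()) / 4294967296.0;
              assert(value >= 1.0 && value < 2.0);  // normal numbers only
              data.push_back(value);
          }
          return data;
      }

      // Counts denormalized (subnormal) values, which are slow on many FPUs.
      std::size_t count_subnormal(const std::vector<double>& data) {
          std::size_t count = 0;
          for (double value : data) {
              if (std::fpclassify(value) == FP_SUBNORMAL) ++count;
          }
          return count;
      }
      ```

      The engine's raw output is used directly because the standard distribution classes are allowed to produce different sequences on different standard libraries.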

    5. Re:Funny benchmarks by robthebloke · · Score: 2

      Last time I checked, 11 minutes was greater than 250 seconds. ('%e' is the default format for user/system time [seconds], '%E' is the format time for elapsed time [minutes]. The different formatting should have been a big hint).

  3. Compile time is irrelevant. by Anonymous Coward · · Score: 4, Insightful

    Which one produced the fastest code?

    1. Re:Compile time is irrelevant. by 0123456 · · Score: 2, Insightful

      Which one produced the fastest code?

      My current project takes two hours to compile from scratch, and uses around 20% CPU when it runs. So yes, compile time can be more important than how fast the code runs.

    2. Re:Compile time is irrelevant. by ledow · · Score: 4, Insightful

      But over the lifetime of any average program, runtime should outweigh compile time by orders of magnitude.

      Otherwise, honestly, why bother to write the program efficiently at all?

      And if you want to decrease compile times, it's easy - throw more machines and more power at the job. If you want to decrease runtime, then ALL of your users have to do that.

      Honestly, if your compile times are that much, and that much of a burden, you need to upgrade, and you also need to modularise your code more. The fact is that most of that compile time isn't actually needed for 90% of compiles unless your code is very crap.

    3. Re:Compile time is irrelevant. by ShanghaiBill · · Score: 4, Insightful

      Which one produced the fastest code?

      It doesn't matter. It may matter which one compiles your code faster. Depending on your use of things like templates, classes, etc. that may be a different compiler than the best for the benchmarks. But even that is unlikely to matter much. I doubt there is much more than a few percent difference. More important are issues like standard compliance, good warning messages, tool-chain/IDE integration, etc.

    4. Re:Compile time is irrelevant. by marcello_dl · · Score: 2

      The time spent on running code vs compiling code, for me, is like 10000:1, to be optimistic. Compilation time is pretty irrelevant for me and I daresay most users.

      --
      ---- MISSING MISCELLANEOUS DATA SEGMENT --- [sigdash] trolololol
    5. Re:Compile time is irrelevant. by rfolkker · · Score: 5, Informative

      I have worked on projects that have taken upwards of 8 hours for a full compile. There is a lot of validity behind the business impact of different compilers.

      The current mentality of throw more horse power at a problem is not always the practical, or the logical conclusion. If you can improve your overall compile time, it can improve your productive time.

      From a Build Engineering perspective, analyzing why it takes time for a project to compile is one of the most important metrics.

      Not only do I monitor how long a project takes to compile, but I also keep an active average, and try to maintain highs and lows to identify compile spikes.

      We monitor processor(s), disk access speeds, memory loads, build warnings, change size, concurrent builds, etc.

      We look at all possible solutions. With the current build tools we have, we can either provision another build system for the queue, or if necessary increase memory, or disk space, or faster drives, more processors, or even upgraded software. We have gone as far as home-grown fixes to get around issues until better solutions become available.

      All of this needs to be accounted for, so, not only is compile time relevant, but what is CAUSING compile times is relevant.
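
      The running average with highs and lows mentioned above might look something like this in its simplest form. `BuildTimeTracker` and its spike rule (a build slower than some multiple of the average so far) are assumptions made for illustration, not the poster's actual tooling.

      ```cpp
      #include <algorithm>
      #include <cassert>
      #include <cstddef>
      #include <limits>

      // Tracks build durations and flags builds that are much slower than usual.
      class BuildTimeTracker {
      public:
          // spike_factor: a build counts as a spike when it takes longer than
          // spike_factor times the average of all earlier builds.
          explicit BuildTimeTracker(double spike_factor) : spike_factor_(spike_factor) {
              assert(spike_factor > 1.0);
          }

          // Records one build duration and returns true if it was a spike.
          bool record(double seconds) {
              const bool spike = count_ > 0 && seconds > spike_factor_ * average();
              total_ += seconds;
              ++count_;
              low_ = std::min(low_, seconds);
              high_ = std::max(high_, seconds);
              return spike;
          }

          double average() const {
              return count_ > 0 ? total_ / static_cast<double>(count_) : 0.0;
          }
          double low() const { return low_; }
          double high() const { return high_; }

      private:
          double spike_factor_;
          double total_ = 0.0;
          std::size_t count_ = 0;
          double low_ = std::numeric_limits<double>::infinity();
          double high_ = 0.0;
      };
      ```

      A real build system would also want the other metrics listed (disk, memory, change size) recorded next to each duration, so that a spike can be traced to a cause.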

    6. Re:Compile time is irrelevant. by TheGavster · · Score: 3, Interesting

      While any user-facing application is going to spend most of its time waiting for the user to do something, the latency to finish that task is still something the user will want to see optimized. Further, if a long-running task tops out at 20% CPU, apparently optimization was weighted too much towards CPU and you need to look into optimizing your IO or memory usage.

      --
      "Because Science" is one step from "Because old book". Try "Because of my experiment testing my falsifiable assertion".
    7. Re:Compile time is irrelevant. by Gothmolly · · Score: 2

      You're IO bound. Get a real disk subsystem.

      --
      I want to delete my account but Slashdot doesn't allow it.
    8. Re:Compile time is irrelevant. by F.Ultra · · Score: 2, Funny

      A week? Try Gentoo.

    9. Re:Compile time is irrelevant. by wiredlogic · · Score: 2

      An excessively long build time can inflate development costs if the delay in testing new code becomes prohibitively long. A large codebase that takes 4 hours to build on a slow compiler will force developers to frequently wait overnight for test results to come back. If a different compiler can build that code 4x faster you have many more opportunities to observe test results during a work day. Upgrading the build system isn't always an option when you have to support legacy platforms with inherently slow hardware and cross compiling isn't an option.

      --
      I am becoming gerund, destroyer of verbs.
    10. Re:Compile time is irrelevant. by VortexCortex · · Score: 4, Interesting

      Which one produced the fastest code?

      My current project takes two hours to compile from scratch, and uses around 20% CPU when it runs. So yes, compile time can be more important than how fast the code runs.

      I had a C++ project like that once... It was a tightly coupled scripting language that could be compiled down to the host language if parts needed to be sped up. I noticed that I was mostly avoiding C++ features since they didn't exist ( eg: using multiple inheritance with non-pure virtual base classes -- Which the scripting lang allowed by allowing variables to be virtual ) and implementing them in C instead. So, I ditched C++ and coded to C99 instead. When I got all the C++ out of the codebase (thus making it compilable in either) the compile time dropped from an hour and a half in C++ to 15 minutes in C. Since I absolutely must have scripting and the VM lang optionally allows GC transparently across C or script (by replacing malloc and friends), and it has more flexible OOP (entities can change "state" and thus remap method behaviors (function pointers), making large improvements over jump-tables (switch statements) for my typically highly stateful code), I avoid C++ like the plague.

      In fact, since the scripting language can translate itself into C, I don't touch C much either for my projects unless I'm extending the language itself. Over the years I've ported the "compiled" output to support Java and Perl, and Javascript (and am working on targeting ASM.js). It's grueling work just for my games and OS hobby projects, but I really can't bring myself to use a compilable language that doesn't double as its own scripting language -- That's asinine, especially if it compiles an order of magnitude slower.

      Don't get me wrong, I get the utility of a general purpose OOP language built around the most general purpose use cases possible; However when you design something for everyone, you've actually designed it for no one at all. I'll take a language with full language features applicable to its templating (code generation), like my scripting language (or Lisp) over C++ or Java any day. (Note: Rhino or Lua JIT + Java is a close contender as far as usability goes, but nothing beats native compiled code for my applications' use case.)

      WRT to the "insightful" commenter above: Deadlines are far more important to code being able to run than the distributed minute performance gains on end user systems which are influenced by Moore's law. Release date is far more significant: The code has 0% usability if I can't produce it in time. Unfortunately, some projects depend on emergent behaviors and thus require fast revisions to tweak (this goes doubly for me, hence the scripting component requirement).

    11. Re:Compile time is irrelevant. by semi-extrinsic · · Score: 2

      I have to say, if we ignore Firefox/Linux/other ridiculously huge codes, the people having hours of compile time must be doing something wrong. The 30 000 line Fortran code I'm working on now takes

      $ time make clean optim_ifort
      ...
      make clean optim_ifort 50.16s user 0.86s system 98% cpu 51.731 total

      in serial, and it's roughly 3x faster for the parallel cmake build. This is with -O3 and inter-procedural optimizations turned on, generating AVX-tuned code. If I have only edited a file or two, the compilation takes less time than a sip of coffee.

      --
      for i in `facebook friends "=bday" 2>/dev/null | cut -d " " -f 3-`; do facebook wallpost $i "Happy birthday!"; done
    12. Re:Compile time is irrelevant. by TheRaven64 · · Score: 2

      30,000 lines of code is a tiny project. I have codebases I wrote myself that are larger. Anything developed by a team is likely to be at least an order of magnitude larger. You're also comparing Fortran to C++, so you get a much faster compile because Fortran doesn't encourage large compile-time code generation in the way C++ templates do (which makes parsing very slow), and because Fortran makes alias analysis trivial, which makes a lot of optimisations easier.

      --
      I am TheRaven on Soylent News
  4. DIe Buisness Intelligence DIE by Bill, Shooter of Bul · · Score: 5, Interesting

    What on earth does compiler benchmarking have to do with the BI section of slashdot?

    Furthermore, why on earth are you idiots creating a blurb on the main screen that just links to a different slashdot article? It's such terrible self promotion. Just freaking write the main article as the main article. No need to make it seem as if the Business Intelligence section is actually worth reading, it's not.

    --
    Well.. maybe. Or Maybe not. But Definitely not sort of.
    1. Re:DIe Buisness Intelligence DIE by MrNemesis · · Score: 3, Insightful

      Oxymoronic.

      --
      Moderation Total: -1 Troll, +3 Goat
  5. Re:first post by Anonymous Coward · · Score: 2, Funny

    man, it took a long time to read it.

  6. Measuring pebbles by T.E.D. · · Score: 5, Insightful

    Interesting info, but I have a couple of issues:

    First off, why wasn't Microsoft's C++ compiler included in this? That's the one we use at work, so that's the one I'd really like compared to all those others. Are we the only ones still using it or something?

    More importantly, why on earth was compilation speed the only thing compared? I mean, I suppose it's nice for g++ users to know that their 10 minute compiles would have been 2 minutes longer if they used the Intel compiler, but Intel users might not really care if they believe their resulting code is going to run faster. Speed of compilation of optimized code is a particularly useless metric, because different compilers have different definitions of "unoptimized", so it's guaranteed you aren't comparing apples to apples.

    I suppose compilation speed is a nice metric to brag about between compiler writers. But for compiler users, the most important things are roughly these, in order: Toolchain support, language feature support (eg: C++11/14 features), clarity of error/warning messages, speed of generated code (optimization), and lastly speed of compilation. I'm not really sure why you took it upon yourself to measure the least important factor, and only that one.

    1. Re:Measuring pebbles by Anonymous Coward · · Score: 2, Informative

      Your pre-elementary reading and comprehension skills leave much to be desired.

      It’s interesting that the code built with the g++ compiler performed the best in most cases, although the clang compiler proved to be the fastest in terms of compilation time.

      Just in case you didn't get that: They did benchmark the resulting binaries, and g++ made the best ones.

    2. Re:Measuring pebbles by T.E.D. · · Score: 4, Interesting

      OK. Much abashed, I went back through the article.

      It turns out that there are numbers for an actual code benchmark. It's found about 2/3rds of the way through the report, in the third graph (untitled), after the balance of the text had already been devoted to compilation speed comparisons. Also, it only listed 2 of the 3 compilers, half the data was for unoptimized code (and thus useless), and it was hidden behind a sign that said "Beware of leopard".

      OK, perhaps I made that last part up.

      For the curious, the difference in at least that one table was never more than 5%. In my mind, hardly a differentiator unless you are doing heavy number crunching or stock trading programs.

      Perhaps the remaining 1/3rd is all about more important things? I've lost interest. You're right. I'm weak.

    3. Re:Measuring pebbles by T.E.D. · · Score: 3, Interesting

      This could be the difference between an hour and 10 minutes for builds of some projects.

      If that's really the delta (one is 600% slower), then something is likely seriously pathologically wrong with one of those two compilers. Submit a bug report (not that it helps you, but it will help someone else).

      But yes, different users in different phases will have different priorities. I'm not laying down an immutable law here, just trying to restore the proper proportion to a situation that we both agree is way out of whack.

  7. Crappy benchmark by raxx7 · · Score: 5, Informative

    The code in the benchmark runs a parallel for over a 10 billion element array but in steps of 100 elements.
    It's going to be limited by the creation and destruction of threads.

    Also, by not initializing the input array, the floating point arithmetic is vulnerable to eventual denormal values.
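
    To put a number on the first point: assuming the figures quoted in this comment, a grain of 100 elements turns the loop into a hundred million separately scheduled pieces. The sketch below only does the partitioning arithmetic; `task_count`, `Chunk` and `split_range` are hypothetical names, and actually running each chunk on a worker is left out.

    ```cpp
    #include <cassert>
    #include <cstdint>
    #include <vector>

    // Half-open index range [begin, end) handled by one worker.
    struct Chunk {
        std::uint64_t begin;
        std::uint64_t end;
    };

    // Number of separately scheduled pieces when n elements are processed
    // `grain` elements at a time.
    std::uint64_t task_count(std::uint64_t n, std::uint64_t grain) {
        assert(grain > 0);
        return n / grain + (n % grain != 0 ? 1 : 0);
    }

    // Splits [0, n) into `parts` contiguous chunks of nearly equal size,
    // for example one chunk per hardware thread.
    std::vector<Chunk> split_range(std::uint64_t n, std::uint64_t parts) {
        assert(parts > 0);
        std::vector<Chunk> chunks;
        chunks.reserve(parts);
        const std::uint64_t base = n / parts;
        const std::uint64_t extra = n % parts;  // first `extra` chunks get one more
        std::uint64_t begin = 0;
        for (std::uint64_t p = 0; p < parts; ++p) {
            const std::uint64_t size = base + (p < extra ? 1 : 0);
            chunks.push_back(Chunk{begin, begin + size});
            begin += size;
        }
        return chunks;
    }
    ```

    With one chunk per core, the scheduling cost is paid a handful of times instead of once per 100 elements, so the measurement has a better chance of reflecting the loop body.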

    1. Re:Crappy benchmark by Anonymous Coward · · Score: 2, Interesting

      By not initializing the input array, the code's behaviour is undefined. Which makes this a test of what these compilers do with complete garbage source that isn't even a valid C++ program.

  8. Re:first post by Mitchell314 · · Score: 2

    Why, so we can have more first posters?

    --
    I read TFA and all I got was this lousy cookie
  9. Re:future compiler trends by ShanghaiBill · · Score: 2

    The main claim for g++ for a very long time was "while it does not optimize much or support all of the language, it is FREE".

    I have never heard that claim, maybe because it isn't true. g++ has always been one of the best at language support. It has not always been the best at low-level processor specific optimizations, but it has made up for that by being really good at higher level optimizations, like recognizing unused code, inlining, and code hoisting. I haven't seen a better compiler at any price.

  10. Re:first post by neokushan · · Score: 3, Funny

    first ++pre

    --
    +1 IDisagreeSoHeMustBeATrollOrAnAstroturferOrAShill
  11. This benchmark is pointless by godrik · · Score: 4, Informative

    I am a scholar and study parallel computing. These benchmarks are pretty much pointless. You can not make any conclusions out of these results. Here the author takes the whole time of the execution, from the creation of the process to its destruction. That means lots of overhead is included which would count as startup time in a real application.

    There is also apparently no thread pinning to computational cores. This is known to make a HUGE difference.

    Then the authors compared cilk result. cilk is known to be slow for simple codes that do not require workstealing and have complex dependencies. For the record, I know they are also comparing TBB. But TBB is implemented on top of the cilk engine in the intel compiler (I don't know about gcc).

    In these results hyperthreading is enabled. The proper use of hyperthreading is complicated. There are some problems where it helps, others where it harms, and I would not be surprised if this behavior were compiler dependent.

    Finally, it is almost impossible to compare compilers. On different platforms, with the same compilers you will get different results. Some functions are better compiled by one compiler and some functions are better compiled by the other compiler. This has been reported over and over and over again.

    If you care about performance, you should not rely on what your compiler is doing in your back. You need to know what it is doing. Depending on memory alignment (and what the compiler knows about it), depending on how the vectorization happens, depending on potential memory aliasing you will get different results.

    If you care about performance, you need to benchmark and you need to optimize and you need to know what the compiler does.

    1. Re:This benchmark is pointless by Anonymous Coward · · Score: 2, Insightful

      I am a scholar and study parallel computing.

      aka I'm a second year computer science student.

      No, give the guy a break... English is not his first language. You can tell from the "what your compiler is doing in your back", instead of "behind your back", that sort of thing. From timezone, European seems most likely... from the sentence structure... French?

  12. Re:first post by TheNastyInThePasty · · Score: 4, Funny

    #include <stdio.h>

    /* Prints which argument was the smaller one. */
    void FirstPost(int a, int b)
    {
        if (a < b)
            printf("I got first post!");
        else
            printf("No, I got first post!");
    }

    int main(void)
    {
        int i = 0;

        // What prints out here?
        FirstPost(i++, i++);
        return 0;
    }

    --
    The best thing about UDP jokes is I don't care if you get them or not
  13. Re:first post by mark-t · · Score: 2

    Assuming typical C calling convention.... "No, I got first post" will be printed, where a will be 1 and b will be 0 in the call to FirstPost. This is because generally, final arguments are evaluated and pushed onto the stack before earlier ones.

    Although typically, the standard may say this behavior is undefined, in practice, almost all modern C compilers will produce the output I've described here.

  14. Intel C++ produced fastest code for us by pauljlucas · · Score: 4, Insightful

    This information is perhaps 2 years out of date, but back then, for one of my projects, when we switched from g++ to Intel C++, our software got about twice as fast with no other changes. It got even faster when we took advantage of SSE3 instructions.

    --
    If you reply, do so only to what I explicitly wrote. If I didn't write it, don't assume or infer it.
  15. Re:future compiler trends by ebno-10db · · Score: 4, Informative

    it has made up for that by being really good at higher level optimizations

    Heh, heh, heh, don't remember the great EGCS split of '97, do you sonny? Yep, us old timers knew that gcc was a dog of an optimizer, but them EGCS whippersnappers fixed it, and even got the fork accepted as the official gcc. Remember, you probably got to where you are today by running over the body of some crusty old-timer.

  16. Re:first post by EvanED · · Score: 4, Informative

    If it were just up to the order of evaluation of the function arguments, then it would be unspecified. However, the program also modifies the same object twice without an intervening sequence point, and that puts it into undefined behavior territory (6.5/2, C99 draft standard).
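
    For comparison, a version with no such problem might look like the sketch below: each increment gets its own statement, so the two modifications of `i` are sequenced and the result no longer depends on the compiler. The names `first_post` and `call_in_sequence` are made up here, and the function returns the string rather than printing it purely to make it easy to check.

    ```cpp
    #include <string>

    // Same comparison as the original FirstPost, returning the message.
    std::string first_post(int a, int b) {
        return a < b ? "I got first post!" : "No, I got first post!";
    }

    // Each increment is a full statement of its own, so the order is fixed.
    std::string call_in_sequence() {
        int i = 0;
        const int a = i++;  // a == 0, i == 1
        const int b = i++;  // b == 1, i == 2
        return first_post(a, b);
    }
    ```

    Swapping the two declarations gives the other message, which is the point: the programmer picks the order instead of the compiler.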

  17. Re:first post by elbonia · · Score: 2

    Clang will just issue a warning that you are making multiple unsequenced modifications. This is undefined in the C spec and the compiler just increments i sequentially, printing "I got first post!". Sequence points like this are hard to clarify for all cases which is why the C99 spec leaves it undefined. In C11 a detailed memory model has been created which should define most cases. http://en.wikipedia.org/wiki/C11_(C_standard_revision)

    Confirmed with:
    Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
    Apple LLVM version 5.0 (clang-500.2.79) (based on LLVM 3.3svn)
    Target: x86_64-apple-darwin13.0.0
    Thread model: posix

  18. for compilers by mjwalshe · · Score: 2

    It is speed that is important which is why a lot of HPC people still prefer the intel compilers.

  19. Re:future compiler trends by cheesybagel · · Score: 3, Informative

    The main problem back then was X86 optimizations. Not the high level optimizations although it was lacking in those too. Eventually they started porting the code to use GIMPLE and moved most of the optimizations away from the language dependent trees to the GIMPLE language independent code. This was done before LLVM was even popular.

  20. Re:Four-hour compile times means a 1 day turnaroun by namgge · · Score: 4, Insightful

    six release cycles a day is probably why you have bugs in the first place...

  21. Re:future compiler trends by ebno-10db · · Score: 4, Informative

    No doubt that gcc is a damned good compiler these days, at least in terms of the quality of code produced, if not the speed at which the compiler runs. My point was just that it wasn't always so. Back then gcc was considered a toy compared to some of the commercial compilers, and it was. Thankfully the EGCS people did a lot to change that, and got the ball rolling for future improvements.

  22. Re:other compilers by aiht · · Score: 2

    I thought that was something people used back when MS-DOS was a popular OS. I was not even aware the product still existed.

    I am talking about Watcom C++ of course.

    It was open sourced some time ago. Now it supports Linux (to some extent) and some other CPU architectures.
    It can still make DOS/4GW exes, though. Ahh, nostalgia.

  23. Re:Four-hour compile times means a 1 day turnaroun by smash · · Score: 2

    he said bugfix/test, not release.

    --
    I run: Windows, OS X, Linux, FreeBSD. Just because you have a hammer, doesn't mean everything is a nail.