Speed Test 2: Comparing C++ Compilers On Windows
Nerval's Lobster writes "In a previous posting, developer and programmer Jeff Cogswell compared a few C++ compilers on Linux. Now he's going to perform a similar set of tests for Windows. "Like all things Windows, it can get costly doing C++ development in this environment," he writes. "However, there are a couple notable exceptions" such as the free and open-source Cygwin and MinGW, the Express versions of Visual Studio, and Embarcadero. He also matched up the Intel C++ Compiler, Microsoft C++ Compiler, and the Embarcadero C++ 6.70 Compiler. He found some interesting things; for example, Intel's compiler is pretty fast, but its annoying habit of occasionally "calling home" to check licensing information kept throwing off the results. Read on to see how the compilers matched up in his testing."
>> its annoying habit of occasionally "calling home" to check licensing information
Calling home for the latest NSA exploits to inject into your application? /tinfoil-hat-not-so-paranoid-these-days-dept
I do believe that no one on slashdot cares about this.
Nobordy uses Windose anymorr, it is nowt a ipon.
UNITE with the Campaign for a Free Internet because today, our future begins with tomorrow!
Did calling home really throw off the results? Since that is something that ordinary users would have to put up with, I would think it should be part of the test. It might be difficult to get an average, but testing Intel's compiler only when it is at its fastest doesn't seem fair.
Does the Intel one still slow down on AMD systems, and/or turn out code with AMD slow-down blocks?
Based on his description, he is using a very synthetic benchmark:
The code I’m testing contains no #include directives, and makes use of only standard C++ code. It starts with one class, and then is followed by 6084 small classes derived from various instantiations of the template classes. (So these 6084 classes are technically not templates themselves.) Then I create 6084 instantiations of the original template class, using each of the 6084 classes. The end result is 6084 different template instantiations. Now, obviously in real life we wouldn’t write like that (at least I hope you don’t).
So in his own words, the code does not reflect realistic compiles. There is no reason to assume that the results generalise to any programs that anyone actually cares about.
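For readers trying to picture the quoted description, here is a minimal sketch of that shape with three classes instead of 6084. The names (`Wrapper`, `Base`, `C1`..`C3`, `sum_ids`) are hypothetical and the exact layout of the article's test file is an assumption; the article only describes it in prose.

```cpp
// Hypothetical stand-in for the "original template class".
template <typename T>
struct Wrapper {
    T value;
    constexpr int id() const { return value.id(); }
};

// A plain starting class so the first instantiation has something to wrap.
struct Base {
    constexpr int id() const { return 0; }
};

// Small non-template classes, each derived from an instantiation of the template.
struct C1 : Wrapper<Base> { constexpr int id() const { return 1; } };
struct C2 : Wrapper<C1>   { constexpr int id() const { return 2; } };
struct C3 : Wrapper<C2>   { constexpr int id() const { return 3; } };

// One further instantiation of the template per small class.
constexpr int sum_ids() {
    return Wrapper<C1>{}.id() + Wrapper<C2>{}.id() + Wrapper<C3>{}.id();
}
```

Every `Wrapper<Cn>` is a distinct type the compiler has to instantiate, so a file with thousands of these mostly stresses the template machinery rather than ordinary code generation.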
Also, there are no error bars of any kind listed.
In other words, I have no reason to assign any meaning to these numbers.
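As a rough illustration of what error bars would add, the sketch below computes a mean and sample standard deviation over repeated compile timings. The helper name `summarize` and any numbers fed to it are hypothetical; nothing here comes from the article's data.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Summary {
    double mean;
    double stddev;  // sample standard deviation (n - 1 denominator)
};

// Summarize repeated timings of the same build, in seconds.
Summary summarize(const std::vector<double>& seconds) {
    const std::size_t n = seconds.size();
    if (n == 0) return {0.0, 0.0};

    double sum = 0.0;
    for (double s : seconds) sum += s;
    const double mean = sum / static_cast<double>(n);
    if (n < 2) return {mean, 0.0};

    double squared_dev = 0.0;
    for (double s : seconds) squared_dev += (s - mean) * (s - mean);
    return {mean, std::sqrt(squared_dev / static_cast<double>(n - 1))};
}
```

With made-up timings of 10, 12 and 14 seconds this gives a mean of 12 and a deviation of 2. Adding a single slow 40-second run (say, one that stopped for a license check) moves the mean to 19 and the deviation to about 14, which is exactly the kind of spread a reader would want reported rather than silently dropped.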
I took a quick look at their website. It looks quite scammy: they only talk about how much you will save, not about how much it will cost.
After clicking through the buy-now buttons twice, I found the C++ version was $4000.
I should use this sig to advertise my book ISBN-13 : 978-1501515132.
My understanding is that they never explicitly 'slowed down' AMD systems; but that the binaries produced by their compiler refused to honor the capabilities flags of non-Intel processors (e.g. even if an AMD CPU lists 'SSE2, SSE3' among supported instructions, it would get the fallback to non-SSE instructions, while Intel CPUs would get whatever their supported-instruction lists specified). No actual 'here be lots of NOPs for no reason'; but x87 on a machine that can do recent SSE is probably enough to achieve the same effect...
LLVM has got to be dynamically linking and stripped by default. There are switches on the other compilers that will let you do that, and it looks like they're being ignored.
For all intensive purposes, "whom" is no longer a word. That begs the question, "who cares"?
This doesn't test the speed of generated code. I like to know which compiler produces faster code when looking at benchmarks.
According to the fine article, "The Intel compiler occasionally "calls home" to an Intel-owned Website to check licensing information. When it does so, it prints out a message about when the current license expires. I didn't use the results when that happens, since it would add time and skew the timing results." WRONG. The tester should not have excluded these results where time was wasted with this nonsense: If WE the users have to put up with it, it SHOULD be included in the benchmarks.
Sometimes the "writing on the wall" is blood spatter...
Wow, let's look at what's being measured here: COMPILE TIME and EXECUTABLE SIZE... what about the performance of the generated application?
This doesn't measure optimizations, this just measures COMPILE TIME. I don't care if my application takes 1 sec or 1 hour to compile, I care about the PERFORMANCE of the actual APPLICATION.
This is just crap.
My understanding is that they never explicitly 'slowed down' AMD systems; but that the binaries produced by their compiler refused to honor the capabilities flags of non-intel processors
Oh, my. Just how many major non-Intel x86-64 CPU vendors are there? AMD, and...? It's suspiciously similar to the ACPI and SecureBoot affairs, don't you think?
Ezekiel 23:20
Well, our wiki overlords list 15 known CPU IDs; but one of them is intel, one is AMD, one is a VM, and most of the rest are the forlorn epitaphs of the fallen.
If you're from the states, it's a bit early for the bottle isn't it?
Oh, my. Just how many major non-Intel x86-64 CPU vendors are there?
Why is it Intel's job to waste time supporting processors that aren't their own?
never heard of Embarcadero.
I've heard of Borland, Watcom, Digital Mars and Bloodshed
I'd just like to see a C++11 compiler for windows.
File under 'M' for 'Manic ranting'
If you're trying to figure out what features are available with only the vendor ID, you're doing it wrong. "GenuineIntel" and "AuthenticAMD" have never been enough to tell you anything about what SSE operations a chip can do. And if every other vendor can parse the rest of the CPUID data, Intel can too (or they can fuck off, IMO).
I guess they don't teach science in Computer Science, because the Linux post from 23 days ago doesn't state the compiler options that were used. Even after having the glaring omission pointed out by a commenter 20 days ago, the problem still hasn't been fixed. Fixing it should be a priority over moving on to Windows.
The reason these feature flags exist is that they can be used to identify the capabilities of the processor. It's not supporting processors that aren't theirs, it's supporting the same features in all processors that support the features.
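The difference being argued about can be shown with a toy model. This is not Intel's dispatcher code; `CpuInfo`, `CodePath` and both functions are hypothetical names, and the struct stands in for whatever CPUID actually reports.

```cpp
#include <string>

// Toy model of what CPUID reports: a vendor string plus a capability flag.
struct CpuInfo {
    std::string vendor;
    bool has_sse2;
};

enum class CodePath { X87, SSE2 };

// Policy described in the thread: take the fast path only on "GenuineIntel".
CodePath dispatch_by_vendor(const CpuInfo& cpu) {
    const bool is_intel = (cpu.vendor == "GenuineIntel");
    return (is_intel && cpu.has_sse2) ? CodePath::SSE2 : CodePath::X87;
}

// Policy the feature flags are meant for: trust the reported capability.
CodePath dispatch_by_features(const CpuInfo& cpu) {
    return cpu.has_sse2 ? CodePath::SSE2 : CodePath::X87;
}
```

For a CPU modelled as `{"AuthenticAMD", true}`, the vendor-based policy picks the x87 fallback while the feature-based policy picks SSE2; for `{"GenuineIntel", true}` both agree.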
Benchmarking compilers on how long it takes to compile would be like benchmarking cars based on how long it takes to fill the gas tank.
There are so many things that can affect compile time more than the compiler - and the end customer really doesn't care anyway. Frankly, if you want a 3-5x speedup, just put the whole thing on an SSD and let it fly.
I have mod points and I am not afraid to use them
Also Microsoft's Jim Radigan held a cool presentation in GoingNative 2013 where he reveals some optimization tricks done by the MSVC++ compiler. It also shows some screenshots where Windows is being compiled on a monster multi-core machine.
VIA was also one that was affected by Intel's compiler behaviour.
My compiler instantly produces a 0-byte executable for any codebase. You can't actually run the resulting file, but since all we care about here is compile time and executable size, I guess my compiler wins!
There were checks for "GenuineIntel" in the cpuid result (http://www.agner.org/optimize/blog/read.php?i=49). The only thing that could possibly be worse would be *excluding* "AuthenticAMD". And that's debatable.
Just use libsimdpp ( https://github.com/p12tic/libsimdpp ) or any of the myriad similar wrappers. With modest time investment you get almost optimal implementation for multiple instruction sets on any compiler you use.
Am I blind or is every compiler version specified except for the Visual Studio one? Version 18 (which ships with 2013) is the latest.
Also, the VS compiler times are reversed. All of the others listed in the article are "No optimization" followed by "full optimization".
Final nitpick, /Ox favors execution speed
Just use libsimdpp ( https://github.com/p12tic/libsimdpp ) or any of the myriad similar wrappers. With modest time investment you get almost optimal implementation for multiple instruction sets on any compiler you use.
If memory serves, their argument varies (depending on whether the FTC appears interested or not) between 'fuck you, it's the Intel compiler collection, and it'll do what's best for Intel. Go suck an Opteron if you like AMD so much.' and 'Gosh, we sure know about the capabilities flags; but we can't be sure of the details of other vendors' (*cough*shoddy, probably reverse engineered illegally*cough*) implementations of certain complex features, and our customers expect our compiler suite to provide stable, correct output, so reverting to the x87 codepath is our only real option...'
Oh, VIA... I'm honestly always a bit surprised to see them still trying.
Back before Intel got (slightly) serious about cheap, with 'atom' and AMD got slightly serious about low-power, with some of their APUs, they made more sense, (in particular, a number of rather interesting x86 embedded specialty boards were VIA based, for situations too low-power or cost constrained for a p3/p4); but lately they've been a much tougher sell. Still some interesting specialty stuff; but 'Unichrome' graphics are such a clusterfuck to deal with that they make AMD look like GPU driver gods, and Intel look (while slow) nearly infallible, and both Intel and AMD have put some rather more aggressive parts into what used to be VIA's playground.
With modest time investment you get almost optimal implementation for multiple instruction sets on any compiler you use.
I'm using ClozureCL and SBCL. I don't think that this is going to work. :-)
Sorry, my post was directed to the parent of your post. Somehow I misclicked somewhere and didn't notice.
Doesn't matter, it's still an interesting thing to study. Maybe if I ported and macroified the whole thing for ClozureCL, some good use for me could come from it, too! :-)
The Intel compilers do NOT "phone home" for licensing. What they do "phone home" for is to send anonymous usage data. When you install, you're asked if you want to opt in to this - it is not enabled by default. Licensing is done entirely locally for single-user licenses. See http://software.intel.com/en-us/articles/software-improvement-program for more information.
Why would they need to "reverse engineer" features (instruction set extensions) that are already publicly documented for the benefit of compiler writers and assembly language programmers?
Long, long ago some review site ran a Via CPU based system while spoofing the CPU ID to appear as an Intel CPU of similar capabilities.
They expected a few percent gain in the FP and INT benches, but oddly got an 8-fold increase in reported memory bandwidth. The other benchmarks appeared to reflect a real increase in memory performance.
Don't wipe your arse with Intel, they're so dirty you'll end up shittier.
I used that one a lot when I was younger. It is a free C/C++/Fortran compiler for Windows, OS/2, DOS and Linux. It has an integrated IDE and also does cross-compiling.
Given how a comparison like this is fairly objective for a variety of reasons, it would have been a much better use of the readers' time to just do a final chart which shows you which compiler, on average, and from his specific tests, performed the best. Then toss in another chart which compares binary size, even though people aren't really going to care as much about that.
A lot.
Typing this on an AMD Phenom II Black Edition, which is very fast and was not that far off from an i7 back in 2010 when I purchased it. True, the newer ones are slower per GHz, sadly.
But what if AMD's next chip kicked ass! Remember the Athlon and the later AthlonXPs were the fastest x86 chip you could buy a decade ago?
Tom's Hardware would include Skyrim and other Intel-compiled apps and whine about how slow the inferior AMD chips are, and Intel fanboys would gloat... but regardless, I have a problem with Intel. I do not want to pay more money for a CPU that provides less value.
Even if you do want to argue price/performance, I get an unlocked BIOS that can support virtualization. No $900 computer needed to run VMware natively. Any AMD chipset can run it if you turn it on in the BIOS!
To answer the grandparent: YES, Intel compilers DO CRIPPLE AMD. On non-Intel CPUs they skip SSE3, the extra registers, and other features, and fall back to non-IEEE-standard x87 FPU code, to make Intel's own CPUs look better. It is like Nvidia crippling OpenCL to force developers to make CUDA apps, who then go on about how slow AMD's Radeons are, etc.
http://saveie6.com/
Why is it Intel's job to waste time supporting processors that aren't their own?
You mean like all the CPUs based on the X86_64 (aka AMD64) instruction set? You know the instruction set invented by AMD that Intel licenced when it became apparent that it would kill Itanium.
Seriously, Intel devised a scheme to determine if a CPU supports advanced features. They documented the scheme and told programmers that it was the correct way to determine the feature set of a CPU. Then they create a compiler that does not use the documented method. In fact, their compiler went out of its way to ensure that only Intel CPUs would actually use the advanced features even if a CPU reported that it could.
F*ck Intel and f*ck any company that would waste my time by doing such underhanded things. I have no respect for Intel and I don't use their sh!t any more.
Right now, I'm posting this as I wait for a brain-dead compiler to complete its task. This is productive time lost because we've chosen a compiler which can't figure out dependency management, takes a long time to compile, and needlessly recompiles unchanged files. Compile time is the single largest user-changeable component of the compile-edit-debug cycle.
It matters to those with a deadline to meet, or to those who'd like to see their families once in a while.
My understanding is that they never explicitly 'slowed down' AMD systems
You are wrong:
"Overview of CPU dispatching in Intel software"
http://www.agner.org/optimize/blog/read.php?i=49#121
Posting to cancel moderation.
But what if AMD's next chip kicked ass! Remember the Athlon and the later AthlonXPs were the fastest x86 chip you could buy a decade ago?
It could theoretically happen, but the Athlon's success was as much about AMD coming up with a decent architecture as it was Intel simultaneously dropping the ball with the Netburst architecture.
My Via QuadCore Artigo comes tomorrow in the mail. I recently fell in love with these processors because of the super low power without sacrificing a lot. I run gentoo and develop C++ apps. I have an FX8120 with a cooler so huge that it blocks two memory slots and ran it all day long at 5.5ghz stable. But with the 8800GTX and that CPU I was idling at 230 watts. Under load 500-700 on my Kill-A-Watt.
The Via QuadCore idles around 10-15 watts and has a total TDP of 25 watts under 100% load. With a fanless box, no CDROM, and a high end SSD with caching; it does everything I need at even less power. Hell then you undervolt it and underclock if you want 2-5 watts idle.
The point is I went from a 230watt idle and ~650watt load computer to a 10-15 watt idle and 25watt under load computer that does the same stuff. Plays videos, flash on webpages isn't laggy and generally I'm happy to see my power bill drop. The room is quiet and cool. Unbelievable how wasteful that 8-core rig was. I'll just take it to work and make it a remote distCC host for the odd huge compile.
The QuadCore is even faster and has a rather impressive stock video card. Can't wait to get Gentoo on it tomorrow afternoon. I put the Kill-A-Watt on the desk so I can always see how much power it uses. My C7-D under 15.x load averages for roughly 3 days, tends to meter 0.89kWH. My desktop hit that in 15 minutes or less under load. I'm happy.
Next up: run this rig off my solar panels.....
For the next benchmark, why not compare Windows, Mac, Linux, Android, BeOS?
The basis for comparison can be "how long does it take to open the clock application, measured from the desktop". We won't count boot time, and we can run on a virtual machine to equalize the comparison.
The numbers would be just as meaningless, and still be better than that useless benchmark.
How good would the Intel compiler have to be at optimizing on AMD processors to avoid accusations that they were deliberately slowing things down?
Did anyone ever hack the compiler to remove this check, i.e. just assume the chip is a genuine Intel? I'd be interested in the results.
Visual C++ has this handy /MP option which tells the compiler to do multi-threaded compiles. On some of our build machines (with 16 cores) this gives an almost linear increase in build speeds. It's obvious from the author's discussion of multi-core that he is not aware of this option and did not use it.
A performance benchmark which doesn't turn on the go-fast option is not going to produce meaningful results.
The author also doesn't discuss debug symbols. VC++ generates debug symbols by default, whereas the other compilers do not. Generating builds without symbols is not a reasonable scenario for most builds, so this makes the file size comparisons rather meaningless.
Now the Via quad core is equally overclocked in order to make a valid comparison, right?
And by the way, Flash sucks at video. Take any computer where Flash lags and pegs the CPU and play an un-transcoded flv in ANY other player, and it will be perfectly smooth while putting almost no load on the computer. It's a shame Adobe ate its only competition and ruined all of the products it acquired.
That is a form of explicitly slowing down and a rather blatant one. Like if someone decides to 'run' the 100 meter by hopping on one foot.
Naturally, they wouldn't. It's just liars lying to people not technical enough to catch it.
It's not really, but it is their job not to sandbag them. The ICC isn't a freebie you get with the CPU.