Comparing G++ and Intel Compilers and Vectorized Code
Nerval's Lobster writes "A compiler can take your C++ loops and create vectorized assembly code for you. It's obviously important that you RTFM and fully understand compiler options (especially since the defaults may not be what you want or think you're getting), but even then, do you trust that the compiler is generating the best code for you? Developer and editor Jeff Cogswell compares the g++ and Intel compilers when it comes to generating vectorized code, building off a previous test that examined the g++ compiler's vectorization abilities, and comes to some definite conclusions. 'The g++ compiler did well up against the Intel compiler,' he wrote. 'I was troubled by how different the generated assembly code was between the 4.7 and 4.8.1 compilers—not just with the vectorization but throughout the code.' Do you agree?"
I don't think it's troubling.
Firstly they beat on the optimizer a *lot* between major versions.
Secondly, the compiler does a lot of micro optimizations (e.g. the peephole optimizer) to choose between essentially equivalent snippets. If they change the information about the scheduling and other resources you'd expect that to change a lot.
Plus I think that quite a few intresting problems such as block ordering are NP-hard. If they change the parameters of their heuristic NP-hard solver, that will give very different outputs too.
So no, not that bothered, myself.
SJW n. One who posts facts.
Asking any audience larger than about 20 to compare the qualitative differences of object code vectorization is statistically problematic as the survey group is larger than the qualified population.
Help stamp out iliturcy.
Unfortunately, that's not unique to GCC. I've seen this happen with several different compliers for different programming languages over the years. Worse, I've seen it with the same compiler, but different Optimizer settings.
In one case, our system didn't work (segfaulted) with the optimizer engaged, and didn't meet timing requirements without the optimizer. And the problem wasn't in our code, it was in a commercial product we bought. The compiler vendor, the commercial product vendor (and the developer of that product, not the same company as we bought it from) and our own people spent a year pointing fingers at each other. No one wanted to (a) release source code and then (b) spend the time stepping through things at the instruction level to figure out what was going on.
And the lesson I learned from this: Any commercial product for which you don't have access to source code is an integration and performance risk.
Yeah, on Intel processors. What about AMD and other x86 processors? Don't ever forget that ICC was once caught red-handed disabling important features when the CPUID did not return GenuineIntel...