GNU GCC Vs Sun's Compiler on a SPARC
JigSaw writes "When doing research for his evaluation of Solaris 9 on his Ultra 5, Tony Bourke kept running into the same comment online over and over again: Sun's C compiler produces much faster code than GCC does. However, he couldn't find one set of benchmarks to back this up and so he did his own."
I wish this guy would tell us what CPU he's using. There's a hell of a lot of difference between the low-cache and high-cache CPUs (yes, these will work in a U5 as well as a U10). Looks like he's using a low-cache one, where there's not as much difference (and where the 64-bit penalty isn't as noticeable).
Read the articles. 2MB cache, 333 MHz UltraSPARC IIi
The Sun compiler has some optimizations, not turned on in this test, that gcc doesn't even offer: scheduling optimization based on profiles from previous runs (rarely used), optimization and inlining across source files and across all of a program's object files, code ordering to help page-fault analysis, enabling instruction prefetch, and so on. It would be interesting to see whether these give vast or just incremental improvements in run time. I think Sun is adding auto-parallelization to future compilers. Most people don't use these optimizations, since they take some work and some testing to see whether they help a particular dataset. I know we don't: by the time we have something we could test, we're already on a new codebase.
That said, the fact that a generic compiler like gcc is within spitting distance of Forte or SunONE or whatever they call it this week is impressive.
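To make the "instruction prefetch" item above concrete: that option has the compiler insert prefetch hints on its own. GCC can be asked to do something similar automatically with `-fprefetch-loop-arrays`, or you can place the hints by hand with the `__builtin_prefetch` builtin. The sketch below is the manual version; `sum_with_prefetch` and `PREFETCH_DISTANCE` are hypothetical names, and the distance of 16 elements is an assumed value, not a tuned one.

```c
#include <stddef.h>

/* Hypothetical tuning constant: how many elements ahead to prefetch.
   The right value depends on the CPU's memory latency and cache line size. */
#define PREFETCH_DISTANCE 16

/* Sum an array while hinting the CPU to start loading data
   PREFETCH_DISTANCE elements ahead of the current position.
   __builtin_prefetch is only a hint; it never changes the result. */
long long sum_with_prefetch(const int *a, size_t n)
{
    long long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n)
            __builtin_prefetch(&a[i + PREFETCH_DISTANCE],
                               0 /* read access */,
                               1 /* low temporal locality */);
        total += a[i];
    }
    return total;
}
```

Whether hints like this help at all depends on the data set and the cache, which is exactly the kind of per-workload testing the comment says most people skip.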
Actually, in early iterations, gcc killed most vendors' compilers, including Sun's. This was mostly because most vendors' compilers were absolutely terrible when gcc was first released. Since then, compiler technology has made huge advances and vendors have spent a lot of effort improving their compilers. At the same time, with the increasingly complex scheduling requirements of today's RISC processors, making a compiler produce fast code takes a lot more work. Designing a portable instruction scheduler that performs well on very different processors is nearly impossible (though gcc does it surprisingly well).
The OpenSSL code has highly optimized assembly for those functions under x86. On other archs it is just C code that the compiler has to optimize.
That may explain the speed difference that you are seeing.
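As a hedged illustration of the "just C code" path (this is not OpenSSL's actual source), here is a portable 32-bit rotate-left, a primitive used in hash functions such as MD5 and SHA-1. On the x86 path hand-written assembly can use a rotate instruction directly; in portable C, how well this compiles is entirely up to the compiler and the target. `rotl32` is a hypothetical name.

```c
#include <stdint.h>

/* Portable 32-bit rotate-left. Masking the shift counts keeps it
   well-defined for n == 0 (a plain x >> 32 would be undefined
   behaviour). Many compilers recognize this pattern and emit a single
   rotate instruction on targets that have one; elsewhere it becomes
   two shifts and an OR. */
static inline uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31;
    return (x << n) | (x >> ((32 - n) & 31));
}
```

For example, `rotl32(0x12345678u, 8)` yields `0x34567812u`.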
Scheduling optimization based on profiles from previous runs (rarely used)
gcc supports this.
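A minimal sketch of what profile feedback looks like with GCC, assuming a GCC recent enough to have `-fprofile-generate` and `-fprofile-use` (older releases spelled the same idea `-fprofile-arcs` followed by `-fbranch-probabilities`). The function below is a hypothetical example of code whose branch behaviour depends on the input data, which is the information a profile captures; the file name `prog.c` in the comment is also hypothetical.

```c
#include <stddef.h>

/* A loop whose inner branch outcome depends on the data. With profile
   feedback the compiler learns how often the test is true and can lay
   out and schedule the code for the common case.

   Build sketch (GCC):
     gcc -O2 -fprofile-generate prog.c -o prog
     ./prog                       (run on representative input)
     gcc -O2 -fprofile-use prog.c -o prog
*/
size_t count_below(const int *a, size_t n, int limit)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (a[i] < limit)   /* the profile records how often this is true */
            count++;
    }
    return count;
}
```

The catch mentioned above still applies: the optimized binary is only as good as the training run, so a profile from unrepresentative input can make things slower.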
May we never see th