Comparing G++ and Intel Compilers and Vectorized Code
Nerval's Lobster writes "A compiler can take your C++ loops and create vectorized assembly code for you. It's obviously important that you RTFM and fully understand compiler options (especially since the defaults may not be what you want or think you're getting), but even then, do you trust that the compiler is generating the best code for you? Developer and editor Jeff Cogswell compares the g++ and Intel compilers when it comes to generating vectorized code, building off a previous test that examined the g++ compiler's vectorization abilities, and comes to some definite conclusions. 'The g++ compiler did well up against the Intel compiler,' he wrote. 'I was troubled by how different the generated assembly code was between the 4.7 and 4.8.1 compilers—not just with the vectorization but throughout the code.' Do you agree?"
For better or worse, I've always given the Intel compiler the benefit of the doubt. They have access to documents that the GCC folks don't.
I don't think it's troubling.
Firstly they beat on the optimizer a *lot* between major versions.
Secondly, the compiler does a lot of micro optimizations (e.g. the peephole optimizer) to choose between essentially equivalent snippets. If they change the information about the scheduling and other resources you'd expect that to change a lot.
Plus I think that quite a few interesting problems such as block ordering are NP-hard. If they change the parameters of their heuristic NP-hard solver, that will give very different outputs too.
So no, not that bothered, myself.
SJW n. One who posts facts.
I have worked on a couple of projects that compiled and ran perfectly with GCC 4.6 and 4.7. They no longer run when compiled with the latest versions of GCC. No warnings, no errors during compilation, they simply crash when run. It's the same source code, so something has changed. The same code, when compiled with multiple versions of Clang, runs perfectly. The GCC developers are doing something different and it is causing problems. Now it may be that a very well hidden bug is lurking in the code and the latest GCC is exposing that in some way, but this code worked perfectly for years under older versions of the compiler so it's been a nasty surprise.
And got completely different results!
Asking any audience larger than about 20 to compare the qualitative differences of object code vectorization is statistically problematic as the survey group is larger than the qualified population.
Help stamp out iliturcy.
One amusing thing I discovered is that GCC 4.8.0 will actually unroll and vectorize this simple factorial function: Just look at that output!
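The function itself did not survive in this copy of the comment. A minimal sketch of the kind of loop being described, assuming a plain iterative factorial over `unsigned int` (the name `fact` is borrowed from a later reply; the body is a guess, not the poster's code):

```cpp
// Hypothetical reconstruction: a plain iterative factorial.
// With a 32-bit unsigned int the result wraps for n > 12
// (unsigned overflow is well defined, so this is not undefined behaviour).
unsigned int fact(unsigned int n) {
    unsigned int result = 1;
    for (unsigned int i = 2; i <= n; ++i) {
        result *= i;
    }
    return result;
}
```

Compiling this with something like `g++ -O3 -S fact.cpp` and reading `fact.s` is one way to see whether a given GCC version unrolls or vectorizes the loop; the outcome depends on the compiler version and target flags.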
Program Intellivision!
This is 2013 (almost 2014!) why are we talking about vectorization? Why don't people write code in vector notation in the first place anyway? If Matlab and Fortran could implement this 25 years ago, I am sure we are ready to move on now...
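For what it's worth, standard C++ does have a limited form of whole-array notation in `std::valarray`. A small sketch, with `axpy` as a made-up helper name; whether the compiler emits vector instructions for it still depends on the compiler and flags:

```cpp
#include <valarray>

// Hypothetical helper: computes a*x + y element-wise with no explicit loop.
// x and y are assumed to have the same size; valarray does not check this.
std::valarray<double> axpy(double a,
                           const std::valarray<double>& x,
                           const std::valarray<double>& y) {
    return a * x + y;
}
```

The call site reads much like the Matlab or Fortran array expression, for example `axpy(2.0, x, y)` on two equally sized arrays.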
I just had one of those "WTF was I thinking then" moments. An old C program of mine started dumping core. Long story short: I did one of those things in the "don't do that" category: a function scans a text buffer, stuffs a null byte here and there and returns pointers into the buffer: think tokenizing.
Point is... the buffer was stack allocated in this function! Don't try this at home, I say :-)
A newer gcc saw the buffer wasn't being accessed within the function and thought "meh, after return this buffer is toast anyway. No need to write those pesky NULLs. No one will see that!".
The compile with the old compiler "just worked" because nobody touched the stack in the meantime: the caller copied away the bits needed. Sheer luck, I'd say.
So it may well be that there are some undefined behaviours in there which fall prey to a more aggressive optimizer. Try compiling with -O0 and see whether there's any difference in behaviour. If there is... happy bug hunting :-)
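To make the failure mode concrete: the broken version kept the text in a local array and returned pointers into it, so the pointers dangled as soon as the function returned. A sketch of one way to avoid that, assuming the caller owns the buffer; `split_in_place` is an illustrative name, not taken from the original program:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Splits `buffer` on spaces in place: each separator is overwritten with '\0'
// and a pointer to the start of every token is returned.
// The pointers stay valid only while `buffer` is alive and unmodified.
std::vector<const char*> split_in_place(std::string& buffer) {
    std::vector<const char*> tokens;
    bool in_token = false;
    for (std::size_t i = 0; i < buffer.size(); ++i) {
        if (buffer[i] == ' ') {
            buffer[i] = '\0';              // terminate the token that just ended
            in_token = false;
        } else if (!in_token) {
            tokens.push_back(&buffer[i]);  // first character of a new token
            in_token = true;
        }
    }
    return tokens;
}
```

Because the storage belongs to the caller, the writes of the null bytes are visible after the function returns and the optimizer has no licence to drop them.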
...do you trust that the compiler is generating the best code for you?...
Trust, but verify.
I come from the days when it was the programmer, not the compiler, that optimized the code. So nowadays, I let the compiler do its thing, but I do a lot of double-checking of the generated code.
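A sketch of the kind of loop worth double-checking, with `scale_add` as a made-up name. Building it with `g++ -O3 -S` writes the assembly to a `.s` file, where packed SSE/AVX instructions (or their absence) show whether the loop was vectorized:

```cpp
#include <cstddef>

// Hypothetical example loop: out[i] = a * x[i] + y[i].
// __restrict__ is a compiler extension (GCC, Clang, Intel) promising that the
// arrays do not overlap, which removes one common obstacle to vectorization.
void scale_add(float a,
               const float* __restrict__ x,
               const float* __restrict__ y,
               float* __restrict__ out,
               std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = a * x[i] + y[i];
    }
}
```

In GCC the vectorizer is switched on by `-O3`, or by `-ftree-vectorize` at lower optimization levels, so comparing the `.s` output with and without it is a quick sanity check.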
OMG! What's this goatse doing here?? I thought all these images were taken down by a DMCA notice by the original asshole!
Slashdot's name? When my compiler sees
Mantle is a good idea insofar as it should kick Microsoft and/or NVIDIA up the behind. We desperately need someone to cure us of the pain that is OpenGL and the lack of cross platform compatibility that is Direct 3D.
Obviously NVIDIA won't play ball with Mantle but I've got a feeling they might have to eventually given that some AAA games developers are going to code a path for it. When it starts showing up how piss-poor our current high level layers are compared to what the metal can do, they'll have no choice.
Educate yourself. Is it not embarrassing to fling all this poo only to find out you are wrong?
I write code in Machine Code with a bootable hex editor (446 bytes, fits in a HDD boot sector). It's the easiest way to bootstrap an OS from scratch now that MoBos don't have boot from serial port anymore...
Here, run it in a VM: "qemu-system-i386 hexboot.img ", if you want.
Or, "dd if=hexboot.img of=/dev/sda bs=1 count=446 conv=notrunc", if you want to preserve the partition table on a bootable drive.
Arrows,PgUp,PgDn,Home,End = navigate; Tab = ASCII/Hex, Esc = jump to segment under cursor, F8 = Run code at the cursor. (this is a real-mode version)
When it boots you'll be looking at the code that booted, there's only two variables that didn't fit into the registers, you can see them changing at the bottom of the code as you stroll around.
That's all you need to create an OS, compiler, etc. from scratch. You'll probably destroy your system though if you're not careful, so keep it in the VM if you're a noob; lock-up is a mild danger, but corrupting the CMOS, etc. can leave your system bricked. You can replace the BIOS too if you know what you're doing. Maybe some day I'll publish a path to go from zero to OS while avoiding the Ken Thompson Compiler Hack... Folks are only just beginning to get interested in having actual system security, so maybe we'll lick the problem some other way. There's still chip microcode to worry about, but programmable hardware may allow us to route that exploit vector out too some day.
Screw your bullshit optimized compiler crap. It's stupid and far slower than you think, esp. since the binaries are bigger (1 cache miss and I've already beaten you in most cases). Besides, next year or so the system will run twice as fast. My need for speed is tempered by my greater need for security and readable machine code. If I identify a patch of code that needs to be optimized or vectorized, I can do it myself.
I don't care about my lawn, it's just there to keep the dirt intact.
Hey genius, try removing the 'as' command and then run gcc. Let us know how well the compiler works without an assembler.
'Nuff said.
When documentation runs to hundreds or thousands of pages, it's hard to read it from cover to cover and reread it when each new version comes out.
the day that AMD came out with Mantle and started leveraging its 100% monopoly in the console market
Among consoles that aren't discontinued or battery-powered, I count Xbox 360, PlayStation 3, Wii U, Xbox One, PlayStation 4, and OUYA. Of these, two have NVIDIA graphics: PlayStation 3 has RSX, and OUYA has the same Tegra 3 that's in the first-generation Nexus 7 tablet. The forthcoming iBuyPower Steam Machine also has NVIDIA graphics.
Was your program dealing with dates or tenses?
sometimes I want to reserve a global variable in a fixed register, but that requires all modules be compiled with the same flag.
That isn't "breaking the rules" as much as creating your own ABI. Classic Mac OS on 68K used to do this, where register A5 was typically reserved as a pointer to the program's global variable segment because the Mac OS ABI used position-independent code.
A compiler going to an assembler today is LAME.
How so? A tool should do one thing well. What an assembler does well is generate relocatable object code in a given format. If you're targeting two platforms, one of which uses ELF and the other COFF or whatever, one could use the same compiler to target both along with two different assemblers, one for each object code format.
if (a && (b = f(a)) && (c = g(b))) {
    /* do stuff with a and b and c */
}
If you convert that into the other format then you need to add something like six lines of code and two levels of nested if statements.
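Spelled out, the two forms look like this. `f`, `g`, and the return values are placeholders, not from the original post; the point is only that both functions take the same branch for every input:

```cpp
// Placeholder helpers, purely for illustration.
int f(int a) { return a - 1; }
int g(int b) { return b * 2; }

// Chained form: each assignment is parenthesized so that && tests its result.
int chained(int a) {
    int b = 0, c = 0;
    if (a && (b = f(a)) && (c = g(b))) {
        return a + b + c;   // "do stuff with a and b and c"
    }
    return -1;
}

// Nested form: same logic, one if per step.
int nested(int a) {
    if (a) {
        int b = f(a);
        if (b) {
            int c = g(b);
            if (c) {
                return a + b + c;
            }
        }
    }
    return -1;
}
```

Without the parentheses around the assignments the chained condition does not compile, since `&&` binds tighter than `=`.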
Actually, no. Computers are not getting faster.
Microprocessors stopped getting faster a few years ago, now we just get more of them. Supercomputers have mostly reached the limits of scalability, so there is a limit to that too.
Well, what are they using in 2025?
while(1) attack(People.Sandy);
As soon as I get my C++ loops to end I'll worry about converting them to ASM.
Having to work for a living is the root of all evil.
I suspect the vectorized version of fact(1000000) is faster than the naive implementation.
It sabotages for non-Intel. We're talking about a compiler. It shouldn't matter what brand of CPU is being used.