Examining the User-Reported Issues With Upgrading From GCC 4.7 To 4.8
Nerval's Lobster writes "Developer and editor Jeff Cogswell writes: 'When I set out to review how different compilers generate possibly different assembly code (specifically for vectorized and multicore code), I noticed a possible anomaly when comparing two recent versions of the g++ compiler, 4.7 and 4.8. When I mentioned my concerns, at least one user commented that he also had a codebase that ran fine after compiling with 4.6 and 4.7, but not with 4.8.' So he decided to explore the difference and see if there was a problem between 4.7 and 4.8.1, and found a number of issues, most related to optimization. Does this mean 4.8 is flawed, or that you shouldn't use it? 'Not at all,' he concluded. 'You can certainly use 4.8,' provided you keep in mind the occasional bug in the system."
If it ain't broke, don't fix it. No need to upgrade.
Thanks for another worthless uninformative article.
Holy fuck, I sure won't be using this for anything mission-critical.
Once upon a time I heard someone say "Trust the compiler, the compiler is your friend."
In theory that sounds fine, but the more I compile, the more I lean towards the view that it is absolutely necessary for C and C++ users to know assembler and preferably have a good idea of what the compiler will output. Problems can occur, and abstractions make them harder to analyze.
This is even more important when working with microcontrollers where the compilers generally aren't as tried and tested as one would wish.
Does this mean 4.8 is flawed, or that you shouldn't use it? 'Not at all,' he concluded. 'You can certainly use 4.8,' provided you keep in mind the occasional bug in the system."
It reminds me of the [in]famous Windows 9x BSOD whenever I wanted to print some particular Word document. If I wanted it to print without throwing the BSOD, all I had to do was to remove the leading space at the beginning of the header. The same document prints fine in Windows XP.
With this kind of logic, it just doesn't make sense!
Though the code behaves differently with, and without optimisation, and does not work on the new compiler whereas it did on the old,
this does not mean it is a bug in the compiler.
GCC, Clang, acc, armcc, icc, msvc, open64, pathcc, suncc, ti, windriver, xlc all apply optimisations that vary across versions, and
that rely on exact compliance with the C standard. If your code is violating this standard, it risks breaking on upgrade.
http://developers.slashdot.org/story/13/10/29/2150211/how-your-compiler-can-compromise-application-security
http://pdos.csail.mit.edu/~xi/papers/stack-sosp13.pdf
Click on the PDF, and scroll to page 4 for a nice table of optimisations vs compiler and optimisation level.
_All_ modern compilers do this as part of optimisation.
GCC 4.2.1 for example, with -o0 (least optimisation) will eliminate if(p+100p)
At first glance this doesn't seem like insane code for checking whether a buffer will overflow if you put some data into it. However, the C standard says that overflowing a
pointer is undefined behaviour, which means the compiler is free to assume that it never occurs, and it can safely omit the test.
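For reference, the check meant above reads if (p + 100 < p); the '<' was swallowed by the comment system. A minimal sketch of a bounds check that avoids forming an out-of-range pointer is below. The helper name fits and its parameters are made up for illustration, and it assumes p and end point into (or one past the end of) the same buffer with p <= end.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns true when `len` more bytes fit between
 * `p` and `end`. Assumes both pointers refer to the same buffer and
 * p <= end, so the subtraction `end - p` is well defined. */
bool fits(const char *p, const char *end, size_t len)
{
    /* Compare sizes instead of computing p + len, which would be
     * undefined if it landed beyond one past the end of the buffer. */
    return len <= (size_t)(end - p);
}
```

Because no out-of-range pointer is ever computed, there is nothing here for the optimizer to assume away, so the check should survive at any optimisation level.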
I've only run into a few compiler bugs (like the one in this article, most always due to the optimizers), and it was always so incredibly aggravating, because it's easy to believe that compilers are always perfect. Granted, they might not produce the most efficient code, but bugs? No way! Of course I know better now, and most of the bugs I came across were back in the Pocket PC days when we had to maintain 3 builds (SH3, MIPS and ARM) for the various platforms (and of course the bugs were specific to an individual platform's compiler, which made it a little easier actually to spot a compiler bug, when a simple piece of code worked on 2 of 3 architectures).
Better known as 318230.
Some people see "bugs," others see "features."
I've seen solution features designed around security holes before, and when we finally patched the breach, we received emails demanding that the decision be reversed and how dare we break customer solutions by surreptitiously patching things!
Sometimes you never can win.
-- "Simplicity is prerequisite for reliability." --Dijkstra
The article basically says:
"GCC 4.8 includes new optimizations! Because of this, the generated assembly code is different! This might be BAD."
Like, duh? Do you expect optimizations to somehow produce the same assembly as before, except magically faster?
The linked "bug" is here: http://stackoverflow.com/questions/19350097/pre-calculating-in-gcc-4-8-c11 - which says, "Hey, this certain optimization isn't on by default anymore?" And to which the answer is, "Yeah, due to changes in C++11, you're supposed to explicitly flag that you want that optimization in your code."
So, yeah. Total non-story.
One of the projects I work on will compile and run perfectly with GCC 4.6 and any recent version of Clang. However, compiling under GCC 4.7 or 4.8 causes the program to crash, seemingly at random. We have received several bug reports about this and, until we can track down all the possible causes, we have to tell people to use older versions of GCC (or Clang). Individual users are typically fine with this, but Linux distributions standardize on one version of GCC (usually the latest one) and they can't/won't change, meaning they're shipping binaries of our project known to be bad.
So, as has always been the case: use optimizers with caution, and verify the results. This is standard software development procedure. Some aspects of optimization are deterministic and straightforward, and are therefore pretty low risk; other optimizations can have unpredictable results that can break code.
OMG, you mean when you revise code, add features and alter functionality, it may result in bugs?
Undefined behavior is a very big problem in C and C++. It causes major headaches in producing cross-compiler code. I don't have much experience with multi-threaded code in GCC, but it must cause major headaches there too because of all the timing involved.
Complain about compiler optimizations fucking shit up quite moronically and being generally stupid as a brick, including introducing block-headed bottlenecks due to over optimistic assumptions of synchronization overhead when applying automatic vectorization or improper use of SIMD, and you're just some old wanker who loves assembly code too much and doesn't know what the fuck you're talking about. Instruct folks that counting backwards is faster because comparing to the static value 0 is faster (even in high level languages, like JavaScript), and the moronic mods down vote. Nope, let's not consider that the idyllic high level language actually has to ever run on actual Von Neumann architecture hardware -- Just ignore the hardware altogether, let the holy compiler sort it out. Ever actually LOOKED at what the shit is going on in there? I have. It's horrendous -- Oh, but confirmation bias bolsters your own uninformed opinion over anyone else's (as your unevolved fight-or-flight lizard brain logic dictates).
Compiler Apologists abound -- Much like the religious zealots. They're quick to claim that "just because your code doesn't work any more doesn't mean it's the compiler's fault" and then ignore any evidence that contradicts their beliefs. Fuck humans. Downmod me again. Label me a troll for heresy. Search your foolish feelings, you know it's fucking true.
Protip: If compiler devs knew anything about cybernetics there would be no such thing as undefined behavior. array[i] = i++; should either index the array first, or increment the index var first, and thus be compile-time defined behavior as only either of these two outcomes; by no reasonable stretch of imagination should this result in "undefined" behavior allowing optimization as a no-op or running fucking Nethack, you damn idiotic overachieving primordial pond-scum. Compilers should throw a fatal error if they can't figure out what to do -- Just like when a semicolon is missing and thus requires further clarification of intent -- you don't just no-op that line for the sake of optimization eh? For fuck's sake, if that's the best your brightest have to offer, your planet is doomed!
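For what it's worth, in C the statement array[i] = i++; is undefined because i is modified and also read for the index with no sequencing between the two. Either of the two readings described above can be spelled out so that the order is explicit. The function names below are invented for illustration; this is only a sketch of the two interpretations, not a claim about what any particular compiler does with the original statement.

```c
#include <stddef.h>

/* Reading A: index with the old value of i, then increment.
 * Returns the new value of i. */
size_t index_then_increment(int *array, size_t i)
{
    array[i] = (int)i;
    return i + 1;
}

/* Reading B: increment first, so the new i selects the slot, while the
 * value produced by `i++` is still the old i. Returns the new value of i. */
size_t increment_then_index(int *array, size_t i)
{
    size_t old = i;
    i = i + 1;
    array[i] = (int)old;
    return i;
}
```

Starting from i == 1, reading A stores 1 into array[1] and reading B stores 1 into array[2]; both leave i at 2. Writing out whichever one was intended removes the ambiguity.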
He actually observed that different assembler code was generated - well, how do you think you can generate _faster_ assembler code without generating _different_ assembler code?
The article does _not_ make any claim that any code would be working incorrectly, or give different results. The article _doesn't_ examine any user-reported issues. So on two accounts, the article summary is totally wrong.
I _cannot wait_ to see how much hilarity ensues in the Gentoo world, where it's real common for random clowns with no debugging (or bug reporting) ability to have -Oeverything set.
If you depend on undefined behaviour, and it seems to work, you're just lucky.
The problem isn't the language, it's the offender's bad coding practices.
Having been somewhat involved in the migration of a lot of C++ code from older versions of gcc to gcc 4.8.1, I can tell you that 4.8.1 definitely has bugs, in particular with -ftree-slp-vectorize. This doesn't appear to be a huge problem in that almost all the (correct) C++ code we threw at the compiler produced good compiler output, meaning that the quality of the compiler is very good overall. If you do find a bug, and you have some code that reproduces the problem, file a bug report, and the gcc devs will fix the problem. At any rate, gcc 4.8.2 has been out for a number of months now, so if you're still on 4.8.1, you may want to upgrade.
Please correct me if I got my facts wrong.
I haven't tried this with the latest version, but even a version 4.x GCC cannot generate inline code with the 8-byte version of cmpxchg in 32-bit code. Doing this in a function is OK.
I think the problem is that this instruction almost takes up all of the registers and GCC cannot cope with this if you want to do it inline.
cmpxchg8b is useful for lock-free code.
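One way to sidestep hand-written inline assembly is to let the compiler pick the instruction through C11 atomics. The sketch below assumes a compiler that ships <stdatomic.h>, and the function update_max is a made-up example of a lock-free update on a 64-bit value. Whether a 32-bit x86 build actually lowers this to cmpxchg8b depends on the compiler and target flags, so treat that part as an assumption to verify in the generated assembly.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical lock-free "raise to maximum" on a shared 64-bit value.
 * Returns true if `candidate` was stored, false if the slot already
 * held a value at least as large. */
bool update_max(_Atomic uint64_t *slot, uint64_t candidate)
{
    uint64_t seen = atomic_load(slot);
    while (candidate > seen) {
        /* On failure `seen` is refreshed with the current value,
         * and the loop condition is checked again before retrying. */
        if (atomic_compare_exchange_weak(slot, &seen, candidate))
            return true;
    }
    return false;
}
```

The compare-and-swap loop is the usual shape for this kind of code: read, compute, attempt the swap, and retry if another thread got there first.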
Government cannot make man richer, but it can make him poorer. - Ludwig von Mises
C does what you tell it to.
If you tell it to do something stupid, it will still try to do it.
It's up to YOU to not tell it to do stupid things.
Maybe you need a static code checker?
The whole problem is the introduction of C++ into the code base.
C++ is a departure from conventional problem solving and in general,
most C++ coders come from an academic or Microsoft background
and are not hardened programmers. Yeah, their theory is wonderful,
but there's little experience in edge cases, which will be the majority
of the issues.
>GCC 4.2.1 for example, with -o0 (least optimisation) will eliminate if(p+100p)
Seriously? Wait, no, I think Slashdot just ate your <, and that should be if(p+100 < p)
edit: Wait, Slashdot silently swallows malformed "HTML tags", but doesn't parse < properly? How the $#@! are you supposed to include a less-than sign?
--- Most topics have many sides worth arguing, allow me to take one opposite you.
<< LIke This >>
Hint: the trailing ';' is not optional.
Tiller's Rule: Never use a word in written form that you've only heard and never read. You will end up looking foolish.
On a related note, does anyone have any suggestion on how to track down such bugs? Are there for example code-analysis tools that will highlight code with undefined behavior likely to give different results when optimized, or valid code that may trigger known compiler bugs? It seems like such a thing would be immensely valuable - if I have a compiler-related mystery bug *somewhere* in my codebase, being able to narrow that down to even the 0.1% of lines containing "suspicious" code could make the difference between it being impossible to solve and merely difficult.
In fact I'm rather surprised that "this code may cause undefined behavior" isn't a standard compiler warning. I mean C and C++ are performance-oriented languages that practically invite developers to come up with "clever" solutions, a warning that they have exceeded the sometimes non-obvious limits of defined behavior would probably save more debugging hours than any other warning on the planet.
Is gcc 4.8 the one where the compiler source was completely converted to C++?
/me ducks.
Stick Men
If you're using virtual memory, you're doing it wrong. Malloc() my ass.
The start of the summary was just so bizarre to me. Of course different versions generate different code, that's what happens when you change how code is optimized. Why would someone set out to investigate this, except as a question about how it improves the code?
Now if there's a bug that's a different issue, and all compilers are going to have some sort of bugs somewhere as these are complex pieces of code. But a change in the output should never be treated as evidence of a bug.
This.
I've seen a few cases where I've inherited code working on 4.7 that broke on 4.8. In all cases compiling with -Wall -Wextra -Werror and then correcting all the mistakes it flagged up removed the errors. Basically the only reason it was working in the first place was the original developer was making assumptions about how the specific compiler would behave in undefined circumstances. By cleaning up the code it was now only relying on defined behaviour.
The real issue with GCC 4.8 is it uses C++ as implementation language, so now I need a C++ compiler to compile a C compiler. Fuuuuuuuuuck.
The GP has a valid point. Most people complaining of these optimizer "bugs" likely have undefined behavior. In C & C++, the compiler/optimizer/linker is given full freedom on what to do. Often, the compiler will just eliminate the code. It could, in theory, format your hard drive. Yes, compiler bugs do happen, but they tend to be rare and infrequent. Last GCC bug I saw was on a minor revision of 4.1.2 that caused an ICE (internal compiler error) when you had an anonymous namespace at the global namespace level.
GCC 4.2.1 for example, with -o0 (least optimisation) will eliminate if (p+100<p)
With pointers that's fairly reasonable, because they practically never overflow like that anyway, but it applies to signed integers as well and there it's fairly dangerous optimization. Many overflow checks etc are easiest to do with wraparound arithmetic and it's terribly easy to forget the unsigned keyword there (thus invoking UB) and the resulting bugs get promoted from lurking portability issues to actual live security hazards.
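As a rough illustration of an overflow check that does not depend on signed wraparound, the test can be done before the addition by comparing against INT_MAX and INT_MIN. The name checked_add and its out-parameter are assumptions made for this sketch.

```c
#include <limits.h>
#include <stdbool.h>

/* Returns true and writes a + b to *out when the sum is representable
 * as an int; returns false and leaves *out untouched when the addition
 * would overflow. */
bool checked_add(int a, int b, int *out)
{
    /* INT_MAX - b cannot overflow when b > 0, and INT_MIN - b cannot
     * overflow when b < 0, so the test itself stays within defined
     * behaviour. */
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;
    *out = a + b;
    return true;
}
```

Since the overflow is ruled out before the addition happens, the optimizer has no undefined case to exploit, unlike the pattern of adding first and then testing whether the result went negative.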
Nm, rhetorical question, you ARE an idiot. GCC has extensive regression tests, any patch that you send in MUST come with tests, and if you RTFA you will see that it doesn't mention any actual bugs. Most complaints about GCC 4.8 optimizations are from users who wrote code that did stuff that the C standard said would cause undefined behavior, but in older compilers happened to do predictable things so the buggy user code went into production without anyone noticing. GCC 4.8 optimizes more aggressively and if your code does undefined operations, all the dire warnings about that will ACTUALLY COME TO PASS instead of being FUD. Of course there ARE compiler bugs but the more common case is user code bugs that newer compilers actually trigger. C is just a bloody dangerous language and people unwilling to deal with that shouldn't be writing in it. See blog.regehr.org for much more.
To find undefined behaviour, using the Clang static analyzer or something like Coverity, etc. are all approaches to pursue.
Some info about KCC is here: http://blog.regehr.org/archives/523
Basically the code that comes out of it is slower than crap, but it checks just about every error condition and undefined behaviour you can imagine, and tells you if it hits such a thing. I want to start using it (haven't yet).
Even Stroustrup hates seeing C/C++.
C++ is an intersection, not a subset, of C.
If you are writing your C++ code with a C mindset, you are doing it wrong and should get the fuck out.
Get back to Python you script kiddie piece of shit.
You need the compiler to do everything for you?
If you don't know if something is stupid, GTFO.
Now the compiler is supposed to reason out your intent?
What are you smoking, numbnuts.