Examining the User-Reported Issues With Upgrading From GCC 4.7 To 4.8
Nerval's Lobster writes "Developer and editor Jeff Cogswell writes: 'When I set out to review how different compilers generate possibly different assembly code (specifically for vectorized and multicore code), I noticed a possible anomaly when comparing two recent versions of the g++ compiler, 4.7 and 4.8. When I mentioned my concerns, at least one user commented that he also had a codebase that ran fine after compiling with 4.6 and 4.7, but not with 4.8.' So he decided to explore the difference and see if there was a problem between 4.7 and 4.8.1, and found a number of issues, most related to optimization. Does this mean 4.8 is flawed, or that you shouldn't use it? 'Not at all,' he concluded. 'You can certainly use 4.8,' provided you keep in mind the occasional bug in the system."
If it ain't broke, don't fix it. No need to upgrade.
Thanks for another worthless uninformative article.
Holy fuck, I sure won't be using this for anything mission-critical.
Once upon a time I heard someone say "Trust the compiler, the compiler is your friend."
In theory that sounds fine, but the more I compile, the more I lean towards thinking that it is absolutely necessary for C and C++ users to know assembler and preferably have a good idea of what the compiler will output. Problems can occur, and abstractions make them harder to analyze.
This is even more important when working with microcontrollers where the compilers generally aren't as tried and tested as one would wish.
Does this mean 4.8 is flawed, or that you shouldn't use it? 'Not at all,' he concluded. 'You can certainly use 4.8,' provided you keep in mind the occasional bug in the system."
It reminds me of the [in]famous Windows 9x BSOD whenever I wanted to print some particular Word document. If I wanted it to print without throwing the BSOD, all I had to do was to remove the leading space at the beginning of the header. The same document prints fine in Windows XP.
With this kind of logic, it just doesn't make sense!
Though the code behaves differently with, and without optimisation, and does not work on the new compiler whereas it did on the old,
this does not mean it is a bug in the compiler.
GCC, Clang, acc, armcc, icc, msvc, open64, pathcc, suncc, ti, windriver, xlc all do varying optimisations that vary across version, and
that rely on exact compliance with the C standard. If your code is violating this standard, it risks breaking on upgrade.
http://developers.slashdot.org/story/13/10/29/2150211/how-your-compiler-can-compromise-application-security
http://pdos.csail.mit.edu/~xi/papers/stack-sosp13.pdf
Click on the PDF, and scroll to page 4 for a nice table of optimisations vs compiler and optimisation level.
_All_ modern compilers do this as part of optimisation.
GCC 4.2.1 for example, with -o0 (least optimisation) will eliminate if(p+100p)
This doesn't on first glance seem insane code to check if a buffer will overflow if you put some data into it. However the C standard says that an overflowed
pointer is undefined, and this means the compiler is free to assume that it never occurs, and it can safely omit the result of the test.
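To make that concrete, here is a minimal sketch of the pattern being described. The function names and the buf/end/len parameters are made up for illustration; they are not from the paper or from any real codebase. The first function uses the wraparound-style test that an optimizer is allowed to drop, the second compares sizes instead and never forms an out-of-range pointer.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helpers: does a write of `len` bytes fit in the buffer
 * that starts at `buf` and ends just before `end`? Both assume
 * buf <= end and that both point into the same object. */

/* Fragile form: if buf + len runs past the end of the object, the
 * pointer arithmetic itself is undefined, so the compiler may assume
 * the first comparison is always false and remove it. */
bool fits_fragile(const char *buf, const char *end, size_t len)
{
    if (buf + len < buf)          /* "did the pointer wrap?" check */
        return false;
    return buf + len <= end;
}

/* Sturdier form: compare the requested length with the space that is
 * actually available, without computing buf + len at all. */
bool fits_checked(const char *buf, const char *end, size_t len)
{
    return len <= (size_t)(end - buf);
}
```

With a 16-byte buffer, fits_checked accepts a length of 16 and rejects 17, regardless of optimisation level.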
I've only run into a few compiler bugs (like the one in this article, most always due to the optimizers), and it was always so incredibly aggravating, because it's easy to believe that compilers are always perfect. Granted, they might not produce the most efficient code, but bugs? No way! Of course I know better now, and most of the bugs I came across were back in the Pocket PC days when we had to maintain 3 builds (SH3, MIPS and ARM) for the various platforms (and of course the bugs were specific to an individual platform's compiler, which made it a little easier actually to spot a compiler bug, when a simple piece of code worked on 2 of 3 architectures).
Better known as 318230.
Some people see "bugs," others see "features."
I've seen solution features designed around security holes before, and when we finally patched the breach, we received emails demanding that the decision be reversed and how dare we break customer solutions by surreptitiously patching things!
Sometimes you never can win.
-- "Simplicity is prerequisite for reliability." --Dijkstra
The article basically says:
"GCC 4.8 includes new optimizations! Because of this, the generated assembly code is different! This might be BAD."
Like, duh? Do you expect optimizations to somehow produce the same assembly as before, except magically faster?
The linked "bug" is here: http://stackoverflow.com/questions/19350097/pre-calculating-in-gcc-4-8-c11 - which says, "Hey, this certain optimization isn't on by default anymore?" And to which the answer is, "Yeah, due to changes in C++11, you're supposed to explicitly flag that you want that optimization in your code."
So, yeah. Total non-story.
One of the projects I work on will compile and run perfectly with GCC 4.6 and any recent version of Clang. However, compiling under GCC 4.7 or 4.8 causes the program to crash, seemingly at random. We have received several bug reports about this and, until we can track down all the possible causes, we have to tell people to use older versions of GCC (or Clang). Individual users are typically fine with this, but Linux distributions standardize on one version of GCC (usually the latest one) and they can't/won't change, meaning they're shipping binaries of our project known to be bad.
So, as has always been the case: use optimizers with caution, and verify the results. This is standard software development procedure. Some aspects of optimization are deterministic and straightforward, and are therefore pretty low risk; other optimizations can have unpredictable results that can break code.
OMG, you mean when you revise code, add features and alter functionality, it may result in bugs?
Undefined behavior is a very big problem in C and C++. It causes major headaches in producing cross-compiler code. I don't have much experience with multi-threaded code in GCC but it must cause major headaches because of all the timing involved.
He actually observed that different assembler code was generated. Well, how do you think you can generate _faster_ assembler code without generating _different_ assembler code?
The article does _not_ make any claim that any code would be working incorrectly, or give different results. The article _doesn't_ examine any user-reported issues. So on two accounts, the article summary is totally wrong.
I _cannot wait_ to see how much hilarity ensues in the Gentoo world, where it's real common for random clowns with no debugging (or bug reporting) ability to have -Oeverything set.
You stopped taking your meds again, didn't you? Maybe you should have a cookie and a glass of milk and then maybe a little nap.
If you depend on undefined behaviour, and it seems to work, you're just lucky.
The problem isn't the language, it's the offender's bad coding practices.
Having been somewhat involved in the migration of a lot of C++ code from older versions of gcc to gcc 4.8.1, I can tell you that 4.8.1 definitely has bugs, in particular with -ftree-slp-vectorize. This doesn't appear to be a huge problem in that almost all the (correct) C++ code we threw at the compiler produced good compiler output, meaning that the quality of the compiler is very good overall. If you do find a bug, and you have some code that reproduces the problem, file a bug report, and the gcc devs will fix the problem. At any rate, gcc 4.8.2 has been out for a number of months now, so if you're still on 4.8.1, you may want to upgrade.
Please correct me if I got my facts wrong.
Protip: Don't confuse compiler devs with people who specify programming languages. It makes you look stupid.
A successful API design takes a mixture of software design and pedagogy.
I haven't tried this with the latest version, but even a version 4.x GCC cannot generate inline code with the 8-byte version of cmpxchg in 32-bit code. Doing this in a function is OK.
I think the problem is that this instruction takes up almost all of the registers and GCC cannot cope with this if you want to do it inline.
cmpxchg8b is useful for lock-free code.
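For anyone wondering what that looks like without hand-written inline assembly, here is a rough sketch of a 64-bit compare-and-swap loop using C11 <stdatomic.h>. It assumes a toolchain that ships that header; the counter and function names are invented for the example. Whether a 32-bit x86 build actually lowers this to cmpxchg8b or falls back to a library call depends on the compiler version and target flags, so check the generated assembly rather than taking it on faith.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical shared state: a 64-bit running maximum. */
static _Atomic uint64_t max_seen = 0;

/* Raise max_seen to `candidate` if it is larger.
 * Returns true if this call performed the update. */
bool update_max(uint64_t candidate)
{
    uint64_t current = atomic_load(&max_seen);
    while (candidate > current) {
        /* On failure, `current` is refreshed with the latest value,
         * so the loop re-checks against what another thread stored. */
        if (atomic_compare_exchange_weak(&max_seen, &current, candidate))
            return true;
    }
    return false;
}

/* Read the current maximum. */
uint64_t read_max(void)
{
    return atomic_load(&max_seen);
}
```

The retry loop is the usual lock-free shape: read, compute, attempt the swap, and start over if someone else got there first.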
Government cannot make man richer, but it can make him poorer. - Ludwig von Mises
Here's some others: Don't confuse undefined behavior with useless behavior. Don't confuse undefined behavior with a free ticket to generate whatever crap code you'd want.
C does what you tell it to.
If you tell it to do something stupid, it will still try to do it.
It's up to YOU to not tell it to do stupid things.
Maybe you need a static code checker?
>GCC 4.2.1 for example, with -o0 (least optimisation) will eliminate if(p+100p)
Seriously? Wait, no, I think Slashdot just ate your <, and that should be if(p+100 < p)
edit: Wait, Slashdot silently swallows malformed "HTML tags", but doesn't parse < properly? How the $#@! are you supposed to include a less-than sign?
--- Most topics have many sides worth arguing, allow me to take one opposite you.
<< LIke This >>
Hint: the trailing ';' is not optional.
Tiller's Rule: Never use a word in written form that you've only heard and never read. You will end up looking foolish.
On a related note, does anyone have any suggestion on how to track down such bugs? Are there for example code-analysis tools that will highlight code with undefined behavior likely to give different results when optimized, or valid code that may trigger known compiler bugs? It seems like such a thing would be immensely valuable - if I have a compiler-related mystery bug *somewhere* in my codebase, being able to narrow that down to even the 0.1% of lines containing "suspicious" code could make the difference between it being impossible to solve and merely difficult.
In fact I'm rather surprised that "this code may cause undefined behavior" isn't a standard compiler warning. I mean C and C++ are performance-oriented languages that practically invite developers to come up with "clever" solutions, a warning that they have exceeded the sometimes non-obvious limits of defined behavior would probably save more debugging hours than any other warning on the planet.
> counting backwards is faster because comparing to the static value 0 is faster
Quite so. However if you're counting there's a pretty good chance that you're traversing an array, in which case caching optimizations that presume a forward traversal will tend to completely overwhelm any potential gains from comparing to a constant instead of a variable.
While we're ranting, why is _++ even a distinct operator from ++_? Are there really that many situations where i++ can streamline the code significantly? We're only human, it's physically impossible for us to retain and apply every subtlety of an 800+ page language specification indefinitely, even if it were completely unambiguous to begin with.
As for your example, actually there's a very good reason for it to be undefined behavior: The fewer restrictions placed on the order of evaluations of generally non-interacting elements, the more opportunities are exposed for optimization, especially once you bring parallel and vectorized CPUs into the equation. And imposing deterministic evaluation order only when side effects may be present then introduces numerous special cases which require expanding the standard and drastically complicating compiler implementation, with a corresponding increase in bugs and reduction in optimizations due to excessive caution. Warnings would be nice though. Even if the fact that Funky( Foo(), Bar() ) causes undefined behavior because of side effects buried 63 function-calls down would likely still slip by, at least the most obvious problems would be caught and perhaps serve as a reminder that caution must be exercised.
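A small sketch of the Funky( Foo(), Bar() ) situation, with made-up function bodies and a shared counter as the hidden side effect. Strictly speaking, C leaves the order in which the two arguments are evaluated unspecified rather than undefined, but the practical effect is the same kind of surprise: the result can differ between compilers or optimization levels. Pulling the calls out into separate statements pins the order down.

```c
/* Hypothetical shared state touched by both functions. */
static int counter = 0;

static int Foo(void) { return ++counter; }      /* modifies counter */
static int Bar(void) { return counter * 10; }   /* reads counter */

static int Funky(int a, int b) { return a + b; }

/* The language does not say whether Foo() or Bar() runs first here.
 * Foo first gives 1 + 10 = 11; Bar first gives 1 + 0 = 1. */
int funky_unordered(void)
{
    counter = 0;
    return Funky(Foo(), Bar());
}

/* Separate statements force Foo() to run before Bar(). */
int funky_ordered(void)
{
    counter = 0;
    int a = Foo();   /* a == 1, counter == 1 */
    int b = Bar();   /* b == 10 */
    return Funky(a, b);
}
```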
Is gcc 4.8 the one where the compiler source was completely converted to C++?
/me ducks.
Stick Men
If you're using virtual memory, you're doing it wrong. Malloc() my ass.
The start of the summary was just so bizarre to me. Of course different versions generate different code, that's what happens when you change how code is optimized. Why would someone set out to investigate this, except as a question about how it improves the code?
Now if there's a bug that's a different issue, and all compilers are going to have some sort of bugs somewhere as these are complex pieces of code. But a change in the output should never be treated as evidence of a bug.
This.
I've seen a few cases where I've inherited code working on 4.7 that broke on 4.8. In all cases compiling with -Wall -Wextra -Werror and then correcting all the mistakes it flagged up removed the errors. Basically the only reason it was working in the first place was the original developer was making assumptions about how the specific compiler would behave in undefined circumstances. By cleaning up the code it was now only relying on defined behaviour.
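As one illustration of the kind of mistake those flags catch (this is an invented example, not code from the projects mentioned above): a non-void function that falls off the end on some path. GCC's -Wall includes -Wreturn-type, which reports "control reaches end of non-void function" for the broken version shown in the comment, and using the missing return value is undefined behaviour.

```c
/* Broken version, flagged by -Wall (-Wreturn-type):
 *
 *     int sign_of(int x)
 *     {
 *         if (x > 0) return 1;
 *         if (x < 0) return -1;
 *     }                          <- no return when x == 0
 *
 * It can appear to work on one compiler version because whatever was
 * left in the return register happens to be 0.
 */

/* Fixed version: every path returns a value. */
int sign_of(int x)
{
    if (x > 0)
        return 1;
    if (x < 0)
        return -1;
    return 0;
}
```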
While we're ranting, why is _++ even a distinct operator from ++_?
Some platforms have preincrement and postdecrement opcodes, but not the converse. Some vice versa. Once upon a time, a programmer would care about such things, and C would never have gained the acceptance it did without the ability to use what looked fast on your platform. Heck, there are still coders who use the "register" keyword, as if that still did anything.
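For what it's worth, the classic place where the postfix form earns its keep is pointer-walking code, where "use the old value, then advance" is exactly the operation wanted. A small sketch follows; copy_str is a made-up name for illustration, not a standard function.

```c
#include <stddef.h>

/* Copy a NUL-terminated string from src to dst (dst must be large
 * enough). Returns the number of characters copied, not counting
 * the terminating NUL. */
size_t copy_str(char *dst, const char *src)
{
    const char *start = src;

    /* Each step stores through the old pointers, then advances both.
     * The loop stops after the NUL itself has been copied. */
    while ((*dst++ = *src++) != '\0')
        ;

    /* src now points one past the NUL, so subtract 1 for the length. */
    return (size_t)(src - start) - 1;
}
```

Written with prefix increments, the same loop needs the pointers to start one element early or an extra statement in the body, which is roughly the streamlining in question.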
Socialism: a lie told by totalitarians and believed by fools.
The real issue with GCC 4.8 is it uses C++ as implementation language, so now I need a C++ compiler to compile a C compiler. Fuuuuuuuuuck.
The GP has a valid point. Most people complaining of these optimizer "bugs" likely have undefined behavior. In C & C++, the compiler/optimizer/linker is given full freedom on what to do. Often, the compiler will just eliminate the code. It could, in theory, format your hard drive. Yes, compiler bugs do happen, but they tend to be rare and infrequent. Last GCC bug I saw was on a minor revision of 4.1.2 that caused an ICE (internal compiler error) when you had an anonymous namespace at the global namespace level.
Ah, that would explain it.
And I imagine there's some compilers for embedded systems that still take "register" seriously.
GCC 4.2.1 for example, with -o0 (least optimisation) will eliminate if (p+100<p)
With pointers that's fairly reasonable, because they practically never overflow like that anyway, but it applies to signed integers as well and there it's fairly dangerous optimization. Many overflow checks etc are easiest to do with wraparound arithmetic and it's terribly easy to forget the unsigned keyword there (thus invoking UB) and the resulting bugs get promoted from lurking portability issues to actual live security hazards.
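Here is a short sketch of the integer case, with hypothetical function names. The first check relies on signed wraparound and is therefore something an optimizer may legally fold away; the second tests against the limits before adding and is defined for every input; the third shows that the wraparound trick is fine once the operands really are unsigned.

```c
#include <limits.h>
#include <stdbool.h>

/* Fragile: a + b overflowing is undefined for signed int, so the
 * compiler may treat "a + b < a" as always false when b > 0. */
bool add_overflows_fragile(int a, int b)
{
    return b > 0 && a + b < a;
}

/* Defined for all inputs: compare against the limits first,
 * so the addition that might overflow is never performed. */
bool add_overflows(int a, int b)
{
    if (b > 0)
        return a > INT_MAX - b;
    if (b < 0)
        return a < INT_MIN - b;
    return false;
}

/* Unsigned arithmetic wraps modulo 2^N by definition, so the
 * wraparound test is well defined here. */
bool uadd_overflows(unsigned a, unsigned b)
{
    return a + b < a;
}
```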
Nm, rhetorical question, you ARE an idiot. GCC has extensive regression tests, any patch that you send in MUST come with tests, and if you RTFA you will see that it doesn't mention any actual bugs. Most complaints about GCC 4.8 optimizations are from users who wrote code that did stuff that the C standard said would cause undefined behavior, but in older compilers happened to do predictable things so the buggy user code went into production without anyone noticing. GCC 4.8 optimizes more aggressively and if your code does undefined operations, all the dire warnings about that will ACTUALLY COME TO PASS instead of being FUD. Of course there ARE compiler bugs but the more common case is user code bugs that newer compilers actually trigger. C is just a bloody dangerous language and people unwilling to deal with that shouldn't be writing in it. See blog.regehr.org for much more.
to find undefined behaviour, using Clang static analyzer or something like Coverity, etc. are all approaches to pursue.
Some info about KCC is here: http://blog.regehr.org/archives/523
Basically the code that comes out of it is slower than crap, but it checks just about every error condition and undefined behaviour you can imagine, and tells you if it hits such a thing. I want to start using it (haven't yet).
Even Stroustrup hates seeing C/C++.
C++ is an intersection, not a subset, of C.
If you are writing your C++ code with a C mindset, you are doing it wrong and should get the fuck out.
Get back to Python you script kiddie piece of shit.
You need the compiler to do everything for you?
If you don't know if something is stupid, GTFO.
Now the compiler is supposed to reason out your intent?
What are you smoking, numbnuts.