GCC 4.0 Preview
Reducer2001 writes "News.com is running a story previewing GCC 4.0. A quote from the article says, '(included will be) technology to compile programs written in Fortran 95, an updated version of a decades-old programming language still popular for scientific and technical tasks, Henderson said. And software written in the C++ programming language should run faster--"shockingly better" in a few cases.'"
What I'd like to see is features like OpenMP for thread-level parallelism.
Is it just me, or is compiling C++ code an order of magnitude slower than compiling C code? (exaggeration) I'm sure there's a very good reason why this is so, but it still doesn't make me happy.
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
"...software written in the C++ programming language should run faster--..."
Is this the programmer's way of saying it will run at some speed less than faster?
But will it compile C++ any faster? The difference between compile times of C and C++ files is staggering. Compiling Qt/KDE takes forever with gcc 3.x.
"GCC 4.0 also introduces a security feature called Mudflap, which adds extra features to the compiled program that check for a class of vulnerabilities called buffer overruns, Mitchell said. Mudflap slows a program's performance, so it's expected to be used chiefly in test versions, then switched off for finished products." - from the article
I really love this feature, it will probably cut down on a great deal of problems. My only concern is that some devs will think running it all the time is OK (read: "Mudflap slows a program's performance"), so hopefully that's not the case.
More detailed information on the mudflap system can be found here.
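To make "buffer overrun" concrete, here is a minimal hand-written sketch of the kind of check being discussed. It is not how Mudflap itself is implemented; `CheckedBuffer`, `kBufSize` and `checked_copy` are hypothetical names invented for illustration, and the code assumes a C++11 or later compiler.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical fixed-size buffer, used only for this illustration.
constexpr std::size_t kBufSize = 8;

struct CheckedBuffer {
    char data[kBufSize];
};

// Copies len bytes from src into dst.data, but refuses to write past the
// end of the buffer. Returns true if the copy happened, false if it would
// have overrun.
bool checked_copy(CheckedBuffer& dst, const char* src, std::size_t len) {
    if (len > kBufSize) {
        return false;  // an unchecked memcpy here would be a buffer overrun
    }
    std::memcpy(dst.data, src, len);
    return true;
}
```

Copying the 4 bytes of "abc" (including the terminator) succeeds, while an 11-byte copy is rejected. The extra comparison on every write is where the run-time cost mentioned in the article comes from.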
Problems are like gifts, it's better to give than to receive
Screenshots, screenshots! I need screenshots people!!!
Can we get Boost in standard library please ?
http://validator.w3.org/check?uri=http%3A%2F%2Fwww.slashdot.org Errors found while checking this document as HTML5!
And how many times will they break ABI, API and library compatibility in THIS major release? Count stands at 4 for the 3 series, maybe higher.
The biggest challenge with binary compatibility across Linux distros is the GCC release (followed by the glibc releases, who live in the same ivory tower). I realize that things have to change, but I wish that they would not break compat between versions quite so often...
I'd really like to be able to take a binary between versions, and have it just work.
This is one area where Sun rocks. Any binary from any solaris2 build will just work on any later version. With some libraries, you can go back to the SunOS days (4.1.4, 4.1.3UL, etc). That's 15 years or so.
Zapman
but the reason it takes forever to compile KDE lies in the fact that it uses templates extensively. While templates (a.k.a. generics) are a very useful language feature, they increase compile times. Supporting the export template feature could help, but only if people actually used it in their code.
You can run an experiment: try compiling KDE with the Intel C++ or Comeau C++ compilers, and you will see that not much is gained compared to GCC.
You can defy gravity... for a short time
I wish the compiler would output sane error messages when compiling code that uses a lot of templates (e.g. the STL). At least fixing it so that the line numbers are shown during debugging would be a huge improvement!
GCC is just the compiler. If you want an IDE that works with it, look around... there are a few. I don't really have any recommendations as I'm mostly working with other tools right now.
You cannot apply a technological solution to a sociological problem. (Edwards' Law)
The protection from buffer overruns is valuable enough that perhaps it is worth including all the time. After all, who knows what vulnerabilities lurk after you "turn off" mudflap?
;)
Besides, it might just be automating the addition of the same code that we would need to put in to fix buffer overrun vulnerabilities.
This is one case where I think it's worth "wasting" a small amount of performance (except perhaps in routines that need to be highly optimized) to give added security. Sure beats ray-traced-on-the-fly desktop widgets, or something, which you KNOW we're going to see advertised in another decade.
Does anyone have a LiveCD of this stuff? ;-)
echo "getuid(){return 0;}" > e.c; gcc -shared -o e.so e.c; LD_PRELOAD=./e.so sh
My guess is that they are using f2c (translating fortran to C first, then compiling), rather than integrating and updating g77. I don't expect this to match most native Fortran compilers for efficiency.
The gcc team seem to have no respect for legacy code. Incompatible syntax changes and incompatible dynamic libraries make me dread every new release.
Likewise, there are several IDEs that can nicely handle a C++ project which uses GCC. Eclipse is maybe the best example of these.
Besides, do you really want "Must have GUI to cope with compiler" on your resume? ;-)
If you're interested, here's a (long) discussion which makes reference to many of the things coming in the new GCC.
Game! - Where the stick is mightier than the sword!
Don't all compilers convert a program's source code into binary instructions?
That GCC is the staple in the embedded world. They could've mentioned that it is most probably the compiler used for the proverbial Internet toaster, or maybe even something sexier, like a Formula 1 engine-tuning app... ;-) Apparently the article is written to educate the "general public"; it would be nice to put this little tidbit into their minds.
Paul B.
Have they found some new-fangled magical technique for compilation?
Actually, SSA trees probably count; they are new in GCC 4 (though SSA itself was invented in the early '90s). Look here, scroll down to "Power Through Builds" for a list of improvements from SSA trees.
Of course, this claim may be due to no longer doing something shockingly inefficient.
My God, it's Full of Source!
OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
$ gcc --version
gcc (GCC) 4.0.0 20050310 (Red Hat 4.0.0-0.33)
Copyright (C) 2005 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
I've noticed it compiles a bit faster and the binaries are a bit bigger as well.
(I'm especially excited by the possibility of random compiler incompatibilities!)
First, you are losing sight of an ideology shared by many open source projects: create a very powerful, optimized tool that does not bind itself, its users, or any other project that wants to build on top of it to any particular GUI. Most programs do this by offering extremely flexible command-line interfaces, exposing library interfaces, or just being a library for external programs to reference. You do have a point, however, that there is a lack of good IDEs in the Linux community. I don't think any of us can deny the tremendous effect of an extremely good IDE (Eclipse for Java, for example). I think one of the biggest obstacles the open source community faces with people just picking up Linux and wanting to program is the lack of a good IDE. Honestly, when I'm programming in .NET on Visual Studio 2003, I feel like I'm in heaven. I only wish I could have the same type of luxury within Linux (especially with the Mono project!). But as with all things, it takes contribution.
"gcc" will switch languages based on the filename extension. Many people compile C++ by calling "gcc".
"g++" suppresses that bit of logic and forces the language to be C++, which is useful if you have some C code that you want to be built as C++, or if you're feeding the C++ source from stdin (hence, no filename extension).
Linking C++, though, you want to use g++ instead of gcc, unless you really know what you're doing. The "gcc" driver doesn't know which libraries to pull in -- yes, this is something we'd like to change someday -- and the "g++" driver will correctly pull in libstdc++, libm, etc, etc, in the correct order for your linker and your system.
(Hands up, everybody who remembers when "g++" was a shell script!)
You cannot apply a technological solution to a sociological problem. (Edwards' Law)
Even the most cursory search of the GCC mailing list archives would disprove this.
You cannot apply a technological solution to a sociological problem. (Edwards' Law)
What do they do that makes their c compiler so much faster ?
Damn!
That's: g++ -o myapp file1.cpp file2.cpp file3.cpp
General Relativity: Space-time tells matter where to go; Matter tells space-time what shape to be.
Anyone know when 4.0 will be ready for the distros?
3 months? 6 months? A year? Forever?
.\.\att Clare
Have you tried maintaining a compiler used in as many situations as GCC? (If not, you should try, before making complaints like this. It's an educational experience.)
We added a "select ABI version" option to the C++ front-end in the 3.x series. If you need bug-for-bug compatibility, you can have it.
Wanna know when this is gonna happen? Sooner, if you help.
You cannot apply a technological solution to a sociological problem. (Edwards' Law)
Unless things have changed recently, their front end is derived from gcc 2.96 (which, as most of you know, does not actually exist), and their backend is the Pro64/Open64 Itanium compiler retargeted to AMD64. And Pro64 is itself mostly SGI's MIPSPro retargeted to Itanium. The PathScale team leader is also the former SGI MIPSPro compiler team leader (Fred Chow). MIPSPro rocks, and PathScale is also pretty good. The best part is that the compiler is GPL'ed (but only v2, and with a patent infringement clause that actually violates the GPL itself *sigh*).
I don't know when that last time you're referring to was, but all the benchmarks I've seen in the last couple of years clearly show the superiority of Intel's compiler. The generated code was up to 3 times faster.
I just hope that GCC closes the gap with 4.0.
1's and 0's should be free.
You've got to be fucking kidding me.
Have a look at the mailing list anytime somebody reports a bug, and the choice is between fixing the bug and changing the ABI. Watch the flamefests erupt.
(Watch them die down a few days later as one of the brilliant core maintainers manages to do both, with a command-line option to toggle between the default fixed version and the buggy old version.)
Wait a few months. See a new corner-case weird bug come in. Lather, rinse, repeat.
Such as...?
All the ones I can think of were GCC extensions long before they were officially added to the languages. In fact, their presence in GCC actually influences their presence in an official language standard, because that's what the standards bodies do: standardize existing practice.
The troublesome part is when the syntax as added to the language standard differs from the extension that was originally put in GCC. Then we have to choose which one to support -- because supporting both is often not feasible -- knowing that whatever choice we make, slashdot is going to whinge about it. :-)
You cannot apply a technological solution to a sociological problem. (Edwards' Law)
Hands up, everybody who remembers when "g++" was a shell script!
...
Are you going to rob us? At first I thought that was your joke, but the more I think about it, the more I wonder if, being a part of the gcc team, you are inserting insidious code to look for credit card and bank account numbers on the disk during compiles and use steganography to embed them in executables; no one else would know about them, and all you'd need is a robot crawling download pages, looking for binaries with some magic code somewhere.
The little bit of extra disk thrashing during the combined compile and search would never be noticed, and no one looking at compiled machine language ever wonders why it is so odd looking. They just assume it's because of some newfangled optimization.
My god you are devious rascals!
Infuriate left and right
GCC is an incredibly versatile compiler, with frontends for C, C++, Java, Ada and Fortran provided with the basic install. 3rd party extensions include (but are probably not limited to) Pascal, D, PL/I(!!) and I'm pretty sure there are Cobol frontends, too.
They did drop CHILL (a telecoms language), which might have been useful now that telecoms are taking Linux and Open Source very seriously. As nobody seems to have picked it up, dusted it off, and forward-ported it to modern GCCs, I think it's a safe bet that even those interested in computer arcana aren't terribly interested in CHILL.
OpenMP has been discussed on and off for ages, but another poster here has implied that design and development is underway. OpenMP is a hybrid parallel architecture, mixing compiler optimizations and libraries, but I'm not completely convinced by the approach. There are just too many ways to build parallel systems and therefore too many unknowns for a static compile to work well in the general case.
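For readers who have not seen the OpenMP style, here is a minimal sketch in C++. The function name `parallel_sum` is made up for illustration. The pragma is standard OpenMP syntax, and a compiler built without OpenMP support ignores it and runs the loop serially, which is part of the appeal.

```cpp
#include <cstddef>
#include <vector>

// Sums a vector. The pragma asks an OpenMP-aware compiler to split the loop
// iterations across threads and to combine the per-thread partial sums into
// `total`. Without OpenMP support the pragma is ignored and the loop runs
// on a single thread with the same result.
double parallel_sum(const std::vector<double>& values) {
    double total = 0.0;
    const long n = static_cast<long>(values.size());
    #pragma omp parallel for reduction(+ : total)
    for (long i = 0; i < n; ++i) {
        total += values[static_cast<std::size_t>(i)];
    }
    return total;
}
```

Note that the decision about how to split the loop is made largely by the compiler and runtime, not by the programmer, which is the mix of compiler work and library work the comment above describes.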
Finally, the sheer size and complexity of GCC makes bugs almost inevitable. It provides some bounds checking (via mudflap), and there are other validation and testing suites. It might be worth doing a thorough audit of GCC at this point, so that the 4.x series can concentrate on improvements and refinements.
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
At the GCC conference in Ottawa in the summer of 2003, there were two very interesting features presented that they said might make it into GCC 4.0.
- LLVM. Low Level Virtual Machine. This is a low level and generic pseudo code generator and virtual machine.
http://llvm.cs.uiuc.edu/
This sounded fabulous, and the project appears to be progressing well (it's at v1.4 now). If I understand correctly it is only politics that has kept it out of GCC 4. Can anyone shed more light on this?
- Compiler Server. Rather than invoking GCC for each TU you would run the GCC-Server once for the whole app and then feed it the TU's. This would make the compile process much faster and allow for whole program optimization.
This would have been nice but perhaps they found better ways to achieve the same thing.
At present, there are numerous features of Fortran 95 that gfortran (the gcc Fortran 95 compiler) does not handle correctly. G95 http://www.g95.org/, from which gfortran forked, is closer to being a full Fortran compiler. One can search the Usenet group comp.lang.fortran to confirm these statements.
If you just want a free Fortran 95 compiler use g95. Bugs reported to Andy Vaught are usually fixed quickly, and fresh Linux compiler binaries are posted almost daily. If you want to participate in the development of a Fortran 95 compiler, gfortran is more democratic.
Not sure what you tried, but in most compiler benchmarks Intel ranges from "just as fast as the others" to "devastatingly faster". It sometimes generates code that's faster than hand-optimised assembly designed to do the same thing. The Intel compiler even generates better code for Athlons than other compilers.
It gets even more devastating on Fortran. Seems Intel has like the only good Fortran compiler in the world. That's part of the reason their chips do so well on SPEC: the FP part is all Fortran code and their compiler just rules at it.
If you Google around for compiler benchmarks you'll find a number of them, and virtually all show the Intel compiler dominating. One of the best, which I can't find a link for right now, was a test done by Tom's Hardware. They did MPEG-4 encoding with the P4 and found that it blew. Intel figured something was wrong, got the source and recompiled the program (it had been compiled with VC++ 6.0). The P4 almost quadrupled in speed (and got even faster with the SSE-optimised modes they added), and even the Athlons showed a near doubling in speed.
"They that can give up high performance to obtain a little temporary security deserve neither performance nor security."
--not Benjamin Franklin
My other first post is car post.
No idea about MSVC, it doesn't build very good Linux binaries though anyways.
Game! - Where the stick is mightier than the sword!
...would that not mean the speed other programs run at reaches "faster" more quickly?
"There is more worth loving than we have strength to love." - Brian Jay Stanley
GCC was what killed Fortran. GCC did not implement many Fortran features, forcing Fortran programmers to use a pathetic subset of the language. For example, in F77 they never implemented opening files read-only (only read-write), so you could never detect EOFs on pipes to Fortran 77. But the real death knell for Fortran was sounded by g95 and its reduced language elements. Anyone who mistook g95 for F95 would indeed be right in concluding Fortran was a dated, useless language. Contrary to a widespread belief created by g95, Fortran 95 does indeed have structures, classes, pointers and allocatable memory.
The irony is that Fortran 2000 is actually a wonderful language for scientific programming in the coming age of multi-core processors. I would not write a word processor in Fortran, but for scientific programming its efficient memory storage and the implicit parallelism in the most basic elements of the language (for example, for-loops that are allowed to iterate over their range out of order, and subroutines that declare which variables can have side effects) are perfect for the coming age of microprocessing.
My favorite parts of Fortran are that one cannot overflow a buffer, nor is it possible for a typo to compile. That last statement will elude most folks who have never tried to write a line parser for Fortran syntax, but its consequence is that hidden syntax errors that compile are impossible in Fortran. (Logic errors, of course, are possible in any language.) One trivial example is that you can't write = when you meant == or +=. And the declaration of intent on calling arguments lets you pass by reference without worrying that an array will be unintentionally modified. RIP Fortran 95, killed by g95.
Some drink at the fountain of knowledge. Others just gargle.
A start would be sticking to ISO C. If you can possibly avoid it, steer clear of writing code targeted at a specific compiler.
Tubal-Cain smokes the white owl.
OS X 10.2 shipped with GCC 3.1 I believe -- a while before it was released.
;-)
10.3 shipped with GCC 3.3, before 3.3 was released.
10.4 looks to continue the pattern. Apple takes a snapshot of GCC, forks it 6-9 months before the OS ships, tweaks/tunes/optimizes GCC, builds and ships with that version of the compiler, and then re-submits its changes, so future GCC builds (especially the PPC ones) get all the goodies.
And the compiler has had 6-9 months of QA from Apple, which is as good as the amount of credit you give their QA department.
Alas, from experience I can attest that usually this is your own fault for writing nonstandard code targeting some particular feature of gcc. The best thing you can do to your code is to make sure it compiles on multiple compilers. Listen to your compiler's warnings; you ignore them at your peril.
Is there anyone who knows what this LLVM issue is about? Anyone out there who is not just ranting incoherently about RMS?
Peace, or Not?
>> but Intel has actually been putting a fair bit of work into GCC
Bollocks. They only wrote some stuff to support IA64 because they were desperate and no-one else would.
LLVM is sort of a mostly-compiled form of a program (like preprocessed, but more work having been done). If gcc can convert C to LLVM, and LLVM to native, then you could replace either half with something proprietary. You could add a proprietary middle step that optimized LLVM code.
Just thought I would point out that Phil is one of the people who have been in charge of the libstdc++ (GNU Standard C++ Library) for a long time. I realise that no one is used to actual subject authorities on slashdot.
[RIAA] says its concern is artists. That's true, in just the sense that a cattle rancher is concerned about its cattle.
It's a pretty far-fetched idea, but...
LLVM can be used as a GPL bypass. If this were to become a problem, people would not feel as good about contributing to gcc.
Well, that's how RMS thinks anyway. Never mind that adding LLVM would enable some really neat stuff.
For me, *NIX is an IDE.
MyIDE = xterms + vim + grep + make + svn + man + the browser + diff + io redirection +....
It's not as polished as an IDE, not as cool. But you get to organize it any way you want.
And besides, considering most of my time is spent manipulating text, any IDE that doesn't have vim integrated in it is useless, at least to me.
(NB: if you like, you can subst emacs for vim in the above)
LLVM is written in C++, and RMS has dictated "Only C shalt thou write for gcc."
Saying that distcc is "less error prone" is a meaningless statement since you're comparing distcc against an unfinished project. The compile server can work "even when preprocessor tricks are used" - give us credit for having thought about the issues, and having come up with solutions, albeit partially implemented and not necessarily optimal.
Your compile server makes a lot of assumptions that many popular projects break.
So what? As long as many projects can benefit from it. If some projects benefit, that would encourage other projects to clean up their header files, which would be a good thing in itself. (A side benefit of the compile server is that it encourages clean design.)
I agree distcc is far simpler, and it will be challenging to engineer a compile server that can detect and recover from header files that aren't "clean", without the checks taking so much time we lose most of the benefit. It's essentially research, and there is no guarantee that it would justify the investment needed. But it does have good potential.
Note there are some limitations for distcc. First, of course it assumes you have multiple idle machines you can spread your compiles to. That may not be the case in a home environment or when travelling. Second, shipping pre-processed source code all over the place is quite expensive. Distcc doesn't save you time in preprocessing, optimizing, or code generation. All it helps with is parsing and semantic analysis, so the best it can give you is a modest constant-factor improvement. By this I mean that if you have M files that include N header files each, the compile-time with distcc is O(M*N), but with the compile server it could potentially be O(M+N).
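The O(M*N) versus O(M+N) point can be put into numbers with a toy cost model. This is only a sketch under stated assumptions: every file costs one "parse unit", all M source files include the same N headers, and the compile server parses each header exactly once and reuses the result. The function names are hypothetical.

```cpp
#include <cstdint>

// Toy model, one unit of work per file parsed.
// Separate compiles (with or without distcc): every source file is parsed
// together with all n headers it includes, so headers are re-parsed m times.
std::uint64_t parse_units_separate(std::uint64_t m, std::uint64_t n) {
    return m * (n + 1);
}

// Idealized compile server: each of the n headers is parsed once and
// shared, then each of the m source files is parsed once.
std::uint64_t parse_units_shared(std::uint64_t m, std::uint64_t n) {
    return m + n;
}
```

With 100 source files and 50 shared headers the model gives 5100 units for separate compiles against 150 for the idealized server. Distributing the separate compiles over several machines divides the first figure by the machine count but does not change how it grows.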
printf("%d\n", ({for(i=2;i<14;i++);i;}) );
Many modern filesystems use something called delayed allocation, so (eg, temp) files that are written and deleted shortly after, are removed from the write queue, and never actually make it to disk. I think I recall reading it coming to reiserfs a few years back. So the effect is that /tmp already is mounted in ram.
-2A
The revolution will not be televised... but it will have a page on Wikipedia
I didn't go into details because this has been covered elsewhere, and I'm tired of discussing it myself. But I didn't realize I would be accused of "uninformed slander". So. A bit of background info first.
Inside the guts of the compiler, after the parser is done working over the syntax (for whatever language), what's left over is an internal representation, or IR. This is what all the optimizers look at, rearrange, throw out, add to, spin, fold, and mutilate.
(Up to 4.0, there was really only one thing in GCC that could be properly called an IR. Now, like most other nontrivial compilers, there's more than one. It doesn't change the political situation; any of them could play the part of "the IR" here.)
Once the optimizers are done transforming your impeccable code into something unrecognizable, the chip-specific backends change the IR into assembly code. (Or whatever they've been designed to produce.)
Each of these transformations throws away information. What started out as a smart array class with bounds checking becomes a simple user-defined aggregate, which becomes a series of sequential memory references, which eventually all get turned into PEEK and POKE operations. (Rename for your processor as appropriate, or look up that old joke about syntactic sugar.)
Now -- leaving out all the details -- it would be Really Really Useful if we could look at the PEEKs and POKEs of more than one .o at a time. Since the compiler only sees one .c/.cpp/.whatever at a time, it can only optimize one .o at a time. Unfortunately, typically the only program that sees The Big Picture is the linker, when it pulls together all the .o's. Some linkers can do some basic optimization, most of them are pretty stupid, but all of them are limited by the amount of information present in the .o files... which is nothing more than PEEK and POKE.
As you can imagine, trying to examine a pattern of PEEK and POKE and working out "oh, this started off as a smart array class with bounds checking, let's see how it's used across the entire program" is essentially impossible.
Okay, end of backstory.
The solution to all this is to not throw out all that useful abstract information. Instead of, or in addition to, writing out assembly code or machine code, we write out the IR instead. (Either to specialized ".ir" files, or maybe some kind of accumulating database, etc, etc; the SGI compiler actually writes out .o files containing its IR instead of machine code, so that the whole process is transparent to the user.) Later on, when the linker runs, it can see the IR of the entire program and do the same optimizations that the compiler did / would have done, but on a larger scale.
This is more or less what all whole-program optimizers do, including LLVM. (I think LLVM has the linker actually calling back into the compiler.)
The "problem" is that between the compiler running and the linker running, the IR is just sitting on the disk. Other tools could do whatever they want with it. RMS's fear is that a company would write a proprietary non-GPL tool to do all kinds of neat stuff to the IR before the linker sees it again. Since no GPL'ed compiler/linker pieces are involved, the proprietary tool never has to be given to the community. Company wins, community loses.
End of problem description. Begin personal opinionating.
It's a legitimate concern, but many of us feel that a) it's going to happen eventually, and b) we do all GCC users a disservice by crippling the tools merely to postpone an inevitable scenario. As usual, there's a wide range of opinions among the maintainers, but the general consensus is that keeping things the way they are is an untenable position.
You cannot apply a technological solution to a sociological problem. (Edwards' Law)
Wasn't meant to be a joke... I know they build GCC with GCC, but I don't suppose they could have built the front-end with GCC 4.0 yet, due to the bugs still present. So current speed up is probably based on compilation with GCC 3.4.
Don't know why I bother writing this, no one's going to read it now...
1's and 0's should be free.
And if you are using templates, you are actually using a compile time language, which obviously will slow compilation down.
But they do claim C++ compilation is much faster with 4.0:
Personally, I don't care much. Development seems to be restricted by link time, for a well organized project. And I always compile with full optimization.
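The remark above about templates being a compile-time language can be illustrated with the classic factorial template. This is a minimal sketch, and `Factorial` is just an illustrative name. The `static_assert` line assumes a C++11 or later compiler, which is newer than the GCC versions discussed in this thread.

```cpp
// Factorial computed entirely by the compiler through recursive template
// instantiation. Each Factorial<N> forces the compiler to instantiate
// Factorial<N - 1>, and so on down to the specialization for 0.
template <unsigned N>
struct Factorial {
    static const unsigned long value = N * Factorial<N - 1>::value;
};

// Base case that ends the recursion.
template <>
struct Factorial<0> {
    static const unsigned long value = 1;
};

// Checked during compilation; no code runs at run time for this.
static_assert(Factorial<5>::value == 120, "evaluated by the compiler");
```

All of the work here happens inside the compiler, so heavy use of this style shifts cost from run time to compile time.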