Slashdot Mirror


GCC Gets PCH Support And New Parser

Screaming Lunatic writes "GCC will finally get precompiled header support which should help with faster compile times. GCC will also be fitted with a new recursive descent parser that fixes more than 100 bugs in GCC. I'm not sure how they decomposed C++ into a context free grammar so that it could be parsed using recursive descent."

83 comments

  1. Uh Links? by GigsVT · · Score: 1

    Which link is the story?

    --
    I've had enough abrasive sigs. Kittens are cute and fuzzy.
  2. the changelog for 3.4 does a better job by yugami · · Score: 3, Informative

    since the link for the parser is dated 2000 it's a bit confusing as to why this is news. However the 3.4 changelog (yes, 2 versions away for both features) meantions both.
    http://gcc.gnu.org/gcc-3.4/changes.html

  3. ANTLR? by angel'o'sphere · · Score: 5, Informative

    Terrence Parr, the author of the antlr compiler construction tool says that most languages can be parsed with LL-k grammars provided you have a deep enough look ahead (that means 'k' is big enough).

    Basicly: you aer NOT context free but context sensitive, of course.

    Terrence showed that in practice the hughe drawbacks of a look ahead of k does not appear often.

    I would think the typical look ahead for C++ is 1 in over 85% of the cases and 2 for the rest and in rare cases 4 ... however this is a guess(stemming from java parsers which are often LL-1)

    angel'o'sphere

    --
    Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.
    1. Re:ANTLR? by farnsworth · · Score: 2, Informative
      Wow, reading your post really hurt my head, partly because I don't really understand what you are talking about, and partly because of the way you wrote it. Here's the same content with commas, lists and some spelling corrections. May others' heads hurt less.

      Terrence Parr (author of the antlr compiler construction tool) says that most languages can be parsed with LL-k grammars, provided you have a deep enough 'look-ahead' (i.e., 'k' is big enough).

      Basicaly: you are NOT context-free, but you are context-sensitive.

      Terrence showed that, in practice, the drawbacks of look-ahead do not appear often.

      I would think the typical look-ahead for C++ is:

      1 in over 85% of the cases

      2 for the rest

      4 in rare cases

      ... however this is a guess based on java parsers, which are often LL-1.

      --

      There aint no pancake so thin it doesn't have two sides.

    2. Re:ANTLR? by bhurt · · Score: 3, Insightful

      The problem is that C++ requires basically an infinite k to parse context free. For example, consider what the compiler does when it sees a set of tokens like:
      a d;

      Now, is this:
      a) a strange but legal expression parsed
      like:
      (a ) d;

      Both are legal interpretations. The only way you can tell is by looking at the type of a before parsing the rest of the expression. And even this may not help. IIRC, the issue may be undecidable, as template names are in a different "namespace" than variable names (i.e. I can have both a template type and a variable with the same name at the same time.

      Note that d being a more complex expression than just a name also allows you to decide. But we're up to a k=7 on the simple case. Make b and c arbitrarily complex expressions themselves, and you can make k need to be arbitrarily large. Or heck, consider:
      a d, e, f;
      (three new variables d, e, and f, or four different expressions combined with the comma operator?).

      And it doesn't matter how infrequent this code construct is. The compiler has to deal with this situation, and situations like it. And deal correctly. C++ simply is not LALR(1) parsable, and likely not really parsable at all. It's a fundamental flaw in the language.

      And this does cost. YACC was developed because it's easier to write and maintain parsers in YACC than hand-rolled parsers for any nontrivial but LALR(1) parsable language. Time and effort will go to the development and maintainance of the compilers. For commercial products, this means higher prices. For free products like GCC, this means fewer new features and fewer bugfixes in other parts of the code.

      Brian

    3. Re:ANTLR? by bhurt · · Score: 2

      Argh. Plain old text obviously isn't.

      The above code should look like:
      a < b, c > d;
      (a < b), (c > d);
      (a < b, c >) d;
      a < b, c > d, e, f;
      etc. Must remember to preview before posting.

    4. Re:ANTLR? by angel'o'sphere · · Score: 2

      What I wanted to say is:

      building a LL-k parser on a language wich is not realy suited was a long time considered un good.

      Reason: k may be infinite.
      Reality: you parse only finite source files ... this reduces k a bit.

      Fact: constructs in wich you need a big k are rare!

      I did not say that your lists of comparisions do NOT exist or do only exist in rare cases, I ment: if they appear they are usualy short.

      So the specific k needed in the samples you brought up is: 6?

      angel'o'sphere

      --
      Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.
    5. Re:ANTLR? by Pseudonym · · Score: 4, Informative

      The quote from Terrence Parr is misleading.

      While most languages are LL(k) for some k, most grammars are not, and require some massaging to get into a form which LL parsers will accept. The massaged version is invariably not the "most preferred" way to specify the language. In many cases, a compiler compiler (such as ANTLR) can do the massaging automatically, but this is often not the case.

      Both ISO and ARM C++'s grammars, in particular, are inherently ambiguous and require semantic disambiguation no matter what you do.

      --
      sub f{($f)=@_;print"$f(q{$f});";}f(q{sub f{($f)=@_;print"$f(q{$f});";}f});
    6. Re:ANTLR? by SporkLand · · Score: 1

      "Basicaly: you are NOT context-free, but you are context-sensitive"

      Having k lookahead is not the same as being context sensitive. Context sensitive grammars come in the form.

      non terminal or terminal 0 or more times :- not terminal or terminal more times than the left side.

      Using K lookahead to resolve ambiguities is not the same as being context sensitive.

    7. Re:ANTLR? by cakoose · · Score: 1

      I came across a simpler example of this (I think) when I tried to write a C parser. Take the following code:

      f ( g );

      This can mean one of two (or possibly more) things:

      • An invocation of function 'f' with parameter 'g'. OR
      • A declaration of the variable 'g' of type 'f'.

      The meaning depends on what 'f' represents. If 'f' is a type, then it is a variable declaration. Otherwise, it is a function call. I think this is what makes C impossible to parse with a plain context-free grammar. You need to do some weird stuff in your YACC production rules to get the proper result.

      As far as I can tell, it's really a silly thing that could have easily been avoided by changing the pointer syntax a little. As a matter of fact, I think that the thing causing problems is the way pointer-type variables are declared (Java doesn't have this issue). I guess the reason the problem exists is that the original parser was hand-written and so the Bell Labs guys didn't bother with the computational-theory side of things.

    8. Re:ANTLR? by batand · · Score: 1

      The standard C++ grammar is highly ambiguous. So not even infinite lookahead can be used to parse it. The ambiguities require context-information to be resolved, and in many cases even given this context information (eg. is this identifier a type, variable or template) infinite lookahead is still required. Actually only very few problems are solved with finite (aka k) lookahead.

      Also the standard C++ grammar is not free of left-recursion, so LL (or recursive descent) parsing is not well-suited for this grammar at all.

      ANTLR uses infinite lookahead (he calls it predicates), and therefore in theory the parsers constructed using this are non-linear.

      Either Mark Mitchell of CodeSourcery has transformed the grammar into some unrecognizable gob of rules, or he uses backtracking and infinite look-ahead. In the first case some serious restructuring of the syntax tree is necessary.

      I have just finished a thesis on the problems of parsing C++, and I would not have chosen recursive descent. I would have chosen automatically generated recursive acent (LALR(1)) instead, with backtracking and dependency on context information (aka lexical feedback). But I hope the best for their approach.

      I just hope that the parser can also be used for other purposes than the GCC-compiler, ie. that it is a separate module, because a decent, free C++ frontend is needed by many projects (editors,
      code analyzers).

    9. Re:ANTLR? by AJWM · · Score: 2

      The standard C++ grammar is highly ambiguous. So not even infinite lookahead can be used to parse it.

      I figure that, after a reasonable attempt to figure it out from context, the compiler should be free to throw up its hands in disgust and tell the programmer something like: "I can't figure out what the hell you mean here, and if I can't, probably a human couldn't either. Please make it less ambiguous."

      As the various obfuscated C (etc) programming contests prove, just because code is syntactically (and semantically) legal doesn't mean it's comprehensible.

      infinite lookahead is still required. Actually only very few problems are solved with finite (aka k) lookahead.

      Not to worry. If it can't be solved with finite lookahead, the program source itself must be infinite (or incorrect), and the compiler will never halt anyway.

      Determining ahead of time whether this is the case is left as an exercise for the student ;-)

      --
      -- Alastair
    10. Re:ANTLR? by Anonymous Coward · · Score: 0

      This particular weird declaration construct is what forces C++ to have the "typename" keyword. Consider trying to tell apart, particularly when doing templates:

      A::B(C) as either being a variable C of type A::B, or a call to method B of class A with argument A.

      typename is being used to disambiguate that -- but at a cost of more verbose code in far more common cases.

  4. Could I use that parser indepedently? by Anonymous Coward · · Score: 2, Interesting

    One of things sorely missing in C++ is an easy way to analyze source code. Would that new parser simplify this? Can I use it outside of gcc to implement a refactoring tool (for example).

    Anyway, the links provide little to no information. Perhaps someone knows more.

    1. Re:Could I use that parser indepedently? by aled · · Score: 1

      easy: program in Java.

      Sorry, I couldn't help myself :-)

      --

      "I think this line is mostly filler"
    2. Re:Could I use that parser indepedently? by tbspit · · Score: 1

      but: he wants to program in a real language.

      Sorry, neither could I :-)

    3. Re:Could I use that parser indepedently? by __past__ · · Score: 2

      Have a look at GCC-XML.

  5. Why the extra step? by glenstar · · Score: 3, Interesting
    To create a precompiled header file, simply compile it as you would any other file, if necessary using the -x option to make the driver treat it as a C or C++ header file. You will probably want to use a tool like make to keep the precompiled header up-to-date when the headers it contains change.

    Why, why, why, why? Why can't the header file simply be compiled at the first inclusion and cached somewhere? I know I am bitching about a single step here, but can anyone explain to me the rationale behind this?

    1. Re:Why the extra step? by PD · · Score: 5, Informative

      Because that sort of thing can get screwed up easily and cause all sorts of problems. I'm thinking of how Borland's precompiled headers sometimes goofed up, or my horrible experiences with Sun's cached templates on their C++ compiler. I'd rather explicitly tell the compiler exactly what I want done in terms of precompilation than to let it guess and screw up on its own.

    2. Re:Why the extra step? by glenstar · · Score: 2

      Ok... I can see that... kind of. But wouldn't the existence of object files fall under the same category?

    3. Re:Why the extra step? by Anonymous Coward · · Score: 0

      This isn't nearly the same as object files. Read the link. There are all kinds of conditions on when and how you can use precompiled headers.

      For example, you can only have one precompiled header per compilation step (though that header can include other precompiled headers...)

    4. Re:Why the extra step? by PD · · Score: 5, Insightful

      No, object files are built one at a time with a makefile. The correlation would be if we just typed "gcc" in a directory and the compiler built every cpp file into an object. What if I didn't want a file compiled? What if that file was supposed to be copied into a directory after it was built with another tool? In that case, gcc would be doing the wrong thing by building every .o automatically.

      A makefile lets me control the building of each and every .o file myself, allowing for all sorts of things that I might want to do.

      Precompiled headers should work the same way, or they won't be as flexible as the .h files.

    5. Re:Why the extra step? by j7953 · · Score: 5, Interesting
      Why, why, why, why? Why can't the header file simply be compiled at the first inclusion and cached somewhere?

      But that's just what make will do. Why rebuild the same functionality within a different tool? Basically, the reason is (probably, I'm not a GCC developer) the UNIX philosophy of having small tools doing their job. GCC is a compiler and nothing else, make is a tool that decides what needs to be compiled.

      If you want automation, you can always use an IDE (or some other tool) that includes a make equivalent or that creates appropriate makefiles for you.

      --
      Sig (appended to the end of comments I post, 54 chars)
    6. Re:Why the extra step? by cookd · · Score: 1

      I'm a bit confused -- the excuse for not caching headers is because that is what make would do? Make doesn't cache precompiled headers, which is what we want to do. So what if the precompiled headers are recompiled using make's dependency rules. That is ideal, if you ask me.

      --
      Time flies like an arrow. Fruit flies like a banana.
    7. Re:Why the extra step? by j7953 · · Score: 2
      Make doesn't cache precompiled headers, which is what we want to do. So what if the precompiled headers are recompiled using make's dependency rules. That is ideal, if you ask me.

      Yes, that's what I meant. If you write an appropriate makefile rule, make will cause the headers to be recompiled only when they have been modified. Effectively, this means that make will be "caching" your precompiled headers.

      --
      Sig (appended to the end of comments I post, 54 chars)
    8. Re:Why the extra step? by cookd · · Score: 1

      Well, that kind of header caching isn't the idea behind precompiled headers. First, note that (absent the concept of precompiled headers) nobody directly compiles headers:

      gcc niftystuff.h -o niftystuff.o

      I suppose it might work, but it is meaningless. You can't access the declarations in niftystuff.h by linking with niftystuff.o, since the declarations are a compile-time feature.

      The best you can do is 1) minimize the headers included in each .C file to the minimum needed (possibly even organizing your program according to the headers needed) and 2) minimize the need to recompile .C files.

      As far as I can tell, you are talking about part 2. If properly set up, make can compile a .C file only when it or an included .h file has changed. But when you have to recompile the .C file, you have to compile the included .h files, too.

      The point of precompiled headers (.pch) is to save the time spent parsing the declarations of headers. That way, even when the .C file changes, you don't have to reparse the .h files it includes.

      This is primarily useful for headers that come from outside of the current project, e.g. OS headers, CRT headers, X11 toolkit headers, libraries, etc. You probably never change these, so you precompile these headers once when they are installed (or upgraded). Now you never have to wait for stdlib.h to parse.

      --
      Time flies like an arrow. Fruit flies like a banana.
  6. gcc already has precompiled headers? by Anonymous Coward · · Score: 1, Interesting

    Mac OS X talks uses precompiled headers, I thought GCC was already using them.

    1. Re:gcc already has precompiled headers? by Harik · · Score: 3, Informative
      Anonymous Coward Moaned
      Mac OS X talks uses precompiled headers, I thought GCC was already using them.
      ... without reading the article in question.
      Geoffrey Keating of Apple Computer, Inc., with support from Red Hat, Inc., has contributed a precompiled header implementation that can dramatically speed up compilation of some projects.
  7. what a coincidence by Anonymous Coward · · Score: 5, Funny

    context free grammar

    This is good for slashdot, which is a grammar-free context!

  8. old story? by reinard · · Score: 1

    ummm... from the article:

    This work will happen in early 2001.

    am i missing something?

    --
    Reinard
    1. Re:old story? by unDiWahn · · Score: 1

      Yeah -- the company was contracted to do the work in early 2002, but (according to the news on the gcc page) the code has only just been completed and integrated. I guess it just took longer than originally expected?

  9. PCH is so 1993... by Anonymous Coward · · Score: 1, Informative

    Microsoft Visual C++ had precompiled headers at least ten years ago. Y'know... when DOS was all the rage. :)

    Is this simply something that nobody cared about and finally got around to doing it - or was it a tricky feature to implement given that GCC already worked quite well enough without it?

    1. Re:PCH is so 1993... by Anonymous Coward · · Score: 1, Insightful

      There was an ample supply of eyeballs on the GCC project but eyeballs can't type. The extra time was needed to track down brains and hands to change the code.

  10. Standard C++ Easier by Euphonious+Coward · · Score: 5, Interesting
    ISO Standard 14882 C++ is easier to parse than ARM C++. The biggest difference is that the committee eliminated "implicit int" declarations, which eliminated a lot of ambiguities. Requiring typename in templates helped too.

    (OT) Just wait until you see C++0x. It will (probably) support variable definitions like

    auto iter = some_map.begin();
    and figure out a type for iter by looking at the result type from map<>::begin().
    1. Re:Standard C++ Easier by Cuthalion · · Score: 1

      Oh that would be so great..

      --
      Trees can't go dancing
      So do them a big favor
      Pretend dancing stinks!
    2. Re:Standard C++ Easier by jason_watkins · · Score: 2

      that's awsome...

      where's good info on the possible features of C++0x? I've seen the a/v presentations on technet from a year or so ago.

    3. Re:Standard C++ Easier by The+Bungi · · Score: 2
      Gawd I hope not. That's like turning off Option Explicit in Visual Basic (if that doesn't raise the hairs on your back I don't know what would).

      What's the problem with:

      blah::iterator it = myMap.begin();
      or simply defining meaningful-sounding typedefs for templates?
    4. Re:Standard C++ Easier by Curien · · Score: 2

      Hmm... I had always liked the (not really existant) typeof operator, so I'd been thinking we'd be able to say

      typeof(some_map)::iterator = some_map.begin();

      But I guess that

      typeof(some_map.begin()) iter = some_map.begin();

      is only a slight step away. And the auto there looks awfully nice. :-}

      --
      It's always a long day... 86400 doesn't fit into a short.
    5. Re:Standard C++ Easier by Euphonious+Coward · · Score: 2
      Curient wrote: "I had always liked the ... typeof operator..."

      That's there too. It's more useful for library declarations than for straight coding, but both of the declarations suggested will work.

    6. Re:Standard C++ Easier by PD · · Score: 1

      Only problem is that to me auto means to put the variable on the stack - the opposite of static.

    7. Re:Standard C++ Easier by Curien · · Score: 2

      Well, I agree that it could make things a little confusing, but auto is almost never used (many people don't think it should have ever been a keyword), and it'll only break code that was already broken by C++98 (code that relied on implicit int). For example

      auto int x = 0; // x is an int auto variable
      auto x = 0f; // x a float auto variable

      Unfortunately, it could make for confusing constructs like

      static auto x = 0f; // x is a float static var

      Maybe we should just stick with typeof -- it's more verbose but no less functional.

      --
      It's always a long day... 86400 doesn't fit into a short.
    8. Re:Standard C++ Easier by jhunsake · · Score: 2, Interesting

      But map::begin() is an overloaded function returning two types, iterator and const_iterator. How do you propose the compiler resolves that?

    9. Re:Standard C++ Easier by Euphonious+Coward · · Score: 2
      jason_watkins wrote: "where's good info on the possible features of C++0x? "

      There's nothing public I know of. Join! Help!

    10. Re:Standard C++ Easier by Anonymous+Brave+Guy · · Score: 2

      The problem with

      blah::iterator it = myMap.begin();

      is that sometimes, you don't know the type of blah, or even blah::iterator. Grab a Google Groups, and search the archives of comp.lang.c++.moderated, comp.std.C++ and similar groups if you're interested in why this matters.

      --
      If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
    11. Re:Standard C++ Easier by Anonymous+Brave+Guy · · Score: 3, Informative

      You can resolve the ambiguity using the same rules that decide which version you'll be calling (which you have to work out anyway, of course) and take the return type of that version. As long as you've got context -- which you have in the example case -- it's not a problem. In general, you'd need to specify which overload you wanted. It's kind of like explicit template instantiation (and equally horrible if you have to do it).

      --
      If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
    12. Re:Standard C++ Easier by jhunsake · · Score: 1

      What is the context? It depends on whether the person was later going to modify the map using that iterator. But even making a decision based on that is bad... using the const version is usually to prevent programmer mistakes.

    13. Re:Standard C++ Easier by tbspit · · Score: 1

      But you will not use

      auto x = map::begin();

      You will use

      auto x = some_map.begin();

      The overloading of begin() depends on whether some_map is itself const or not, so I see no problem here.

    14. Re:Standard C++ Easier by Anonymous+Brave+Guy · · Score: 2

      The context is (and must be) unambiguous at the time of the function call. In this case, you'll get the const version if you call begin() on a const object, and it's basically as simple as that. Whether or not you'll ever modify data doesn't have any bearing on its constancy in C++, which is unfortunate at other times, but helpful here.

      --
      If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
    15. Re:Standard C++ Easier by vidarh · · Score: 2

      But this is not always what you want. I often use const_iterator's when iterating over a non-const container, because the iteration in question have no need to change the container - it helps reducing the chance of accidentally calling functions that would change the container in places where you don't want to change it.

    16. Re:Standard C++ Easier by jhunsake · · Score: 1

      Exactly, this was my point. This is a case where the compiler is guessing... and may guess incorrectly.

    17. Re:Standard C++ Easier by jhunsake · · Score: 1

      No, suppose I am writing a function that uses a map. The function consists of two for loops. In the first, I iterate through the map, but don't want to change it. Hence I use a const_iterator to make sure that I don't try to change it in the body of the for. In the second loop, I do actually make some changes (or there is a possibility I could), so I use a iterator. The point of the constant version is to help prevent errors. Are you proposing that functions have to be divided along those lines?

    18. Re:Standard C++ Easier by tbspit · · Score: 1

      Your first comment talks about overloading which is simply calling a different function according to the parameter. So map::begin() is overloaded as:

      const_iterator begin() const;

      or

      iterator begin();

      To use overloading for your case:

      auto x = static_cast<const map>(some_map).begin();

      but this clearly defies the scope, as you still have to use the type here, and you might as well use:

      map::const_iterator x = some_map.begin();

    19. Re:Standard C++ Easier by jhunsake · · Score: 1

      So, in other words, auto is useless? :)

    20. Re:Standard C++ Easier by tbspit · · Score: 1

      Sure it is useless. Go on and write something like:

      for (std::map<std::string, std::vector<my_ns::my_class>, my_comp>::iterator i = v.begin(); i != v.end(); ++i)

      instead of the seemingly cleaner

      for (auto i = v.begin; i != v.end(); ++i)

      :)

    21. Re:Standard C++ Easier by jhunsake · · Score: 1

      I will, thank you. But let's not exaggerate.

      using std::map;
      using std::vector;
      using std::map;
      typedef map<string, vector<my_ns::my_class>, my_comp> mymap;
      for (mymap::iterator i = v.begin(); i != v.end(); ++i)


      Check out Stroustrup's book, as this is what he frequently does.

      I really hope they don't add it to the standard, it will only lead to more shitty code.

    22. Re:Standard C++ Easier by Anonymous+Brave+Guy · · Score: 2

      I appreciate that it's not always what you want, but it is always what you'll get. When you use a const_iterator on a non-const container, you're still getting a normal iterator back first, it's just then being implicitly converted to the constant access version. If you want to do that, you're explicitly overriding the rules of the language, in which case presumably the auto stuff mentioned isn't what you're looking for anyway.

      --
      If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
    23. Re:Standard C++ Easier by vidarh · · Score: 2
      You are NOT overriding the rules of the language. A non const value can ALWAYS legally be made const. This has a lot of value for instance in implementing const member functions that operate on STL containers. Example:

      class Foo { typedef std::map<std::string, std::string> Data; Data m_data; public: std::string getBar(const std::string & key) const { Data::const_iterator i = m_data.find(key); if (i == m_data.end()) return ""; return i->second; } };

      This is a common and useful idiom (btw. for people less experienced with C++: using [] to check for existence of a value in a map is a BAD thing, as the [] operator always create a new element if it can't find an existing matching element).

      Personally I find that I use const_iterator all over the place, most of the time with non-const containers, as I tend to keep the number of operations that mutate data members to a minimum, and const_iterator serves the dual purpose of protecting me from accidentally introducing bugs and making my const member functions correct.

      Personally I don't see much value of auto in the first place, and in this case I'd be unable to use it most of the time, while a proper typeof operator give the flexibility of auto and work with my case and provide lots of other benefits without the problem of overloading the meaning of yet another keyword (it's bad enough with static).

    24. Re:Standard C++ Easier by Anonymous+Brave+Guy · · Score: 2
      You are NOT overriding the rules of the language. A non const value can ALWAYS legally be made const.

      You're not overriding it in that sense, sure. My point was that the language rules are clear on the overriding: if you call the method on a non-const object, you get the non-const method, whether or not you will use it as such. This can be a pain, for example in the context of operator[] on an associative array type, where you might prefer to throw an exception rather than implicitly create an entry that does not yet exist if you're looking up rather than assigning. You are, of course, at liberty to convert the returned value yourself into a constant form, but that is a second step that the language would not ordinarily take for you.

      The purpose of something like auto would be to replace the first step. If you're wanting to explicitly set the type, you don't need auto. Its value is elsewhere; look at some of the recent template discussions in places like comp.lang.c++.moderated and you'll find the need for that feature, or something like typeof, does come up every now and then.

      --
      If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
  11. Because implementations just got contributed by sixseve · · Score: 5, Informative

    From the gcc.gnu.org homepage news:

    January 10, 2003

    Geoffrey Keating of Apple Computer, Inc., with support from Red Hat, Inc., has contributed a precompiled header implementation that can dramatically speed up compilation of some projects.

    December 27, 2002

    Mark Mitchell of CodeSourcery has contributed a new, hand-crafted recursive-descent C++ parser sponsored by the Los Alamos National Laboratory. The new parser is more standard conforming and fixes many bugs (about 100 in our bug database alone) from the old YACC-derived parser.

    1. Re:Because implementations just got contributed by MissMyNewton · · Score: 2

      January 10, 2003


      Geoffrey Keating of Apple Computer, Inc., with support from Red Hat, Inc., has contributed a precompiled header implementation that can dramatically speed up compilation of some projects.


      Where are the "Apple only steals, never gives back" trolls???

      --

      ---

      Information wants...you to shut your pie hole.

  12. Well done GCC, but.... by peterpi · · Score: 4, Insightful

    ... in my experience, good use of forward declarations (to avoid unrequired chains of #include), combined with simply putting less in each .c file is a lot more effective than adding the complication of precompiled headers into your build process.

    1. Re:Well done GCC, but.... by sohp · · Score: 4, Insightful

      That's exactly the sort of rule of thumb that John S. Lakos talks about in his terrific book, Large-Scale C++ Software Design (ISBN: 0201633620). Basically, pre-compiled headers are for developers who are too lazy or inexperienced to manage inter-module dependencies efficiently.

    2. Re:Well done GCC, but.... by m8pple · · Score: 3, Interesting
      Whilst that's true to a certain extent, pre-compiled headers still definitely have their use just for cutting down the time taken for the compiler to find, load and parse all the system headers, crt headers, stl headers, boost headers etc.

      You could argue that including only the very specific headers you need for each source file is the best way to go, but I think it is a reasonable trade-off to include all these static system/library headers in a precompiled header, then to re-reference the specific headers in the user source code to indicate the dependency explicitly. I totally agree with the stuff in the book about people binding modules together too tightly through inter-dependencies in user headers though (although I'm not convinced by everything he talks about :)

      I usually chuck almost all the stl and alot of boost into my pre-compiled headers when setting up a build, which cuts the full rebuild time at least in half, usually more. I'm always reluctant to do a full rebuild of the same module under g++ as I know it'll take a long time (comparatively). Admittedly the larger each source file is, the less the benefit, but I tend towards lots of small to medium sized source files anyway.

      I'm not sure how the gcc version will do it, but the msvc (boo, hiss etc.) version actually takes a memory dump of the parsed code tree when pre-compiling, then just copies this back into memory for successive compilations, so the speed increase is dramatic. Hopefully the gcc version does something similar (or better). Be nice if it had something similar to ccache built in as well.

    3. Re:Well done GCC, but.... by Anonymous Coward · · Score: 0

      Compilers are basically for developers who are too lazy or inexperienced to hand-code assembly language, that's my motto!

    4. Re:Well done GCC, but.... by Lumpish+Scholar · · Score: 5, Interesting
      ... in my experience, good use of forward declarations (to avoid unrequired chains of #include), combined with simply putting less in each .c file is a lot more effective than adding the complication of precompiled headers into your build process.
      My experience is just the opposite.

      Putting less into each .c file (so that changing a .c file requires less to be recompiled) is only useful if most of the code you need to compile is in .c files. Unfortunately, even with forward declarations, every .c file is likely to have thousands (or tens of thousands) of source from all the .h files that are (recursively) included; that's where the bulk of the compiled code is. Unless each of the smaller .c files can include significantly fewer .h files than the larger .c files could (which, in my experience, they can't), then doubling the number of .c files roughly doubles the amount of source code (.c files plus all the .h files per .c file) needed to compile a product.

      I haven't had a lot of luck with precompiled headers, either. (Context: a project with a hundred source files spread across a dozen directories, totalling about fifty thousand lines of source.)

      Best solution I know of for C++: Use as many forward declarations as you can, periodically trim your include directives, and have relatively large .c files. Each includes a lot of .h source, but this reduces the total bulk of what comes out of the preprocessor.

      I know of C++ systems that take a CPU week to build because of these issues!

      Note that Java doesn't have this problem, or the problem of teaching your makefile about header file dependencies. (Not important enough to get all projects to switch from C or C++, but among the reasons that some projects should.)
      --
      Stupid job ads, weird spam, occasional insight at
    5. Re:Well done GCC, but.... by GlassHeart · · Score: 2
      Unless each of the smaller .c files can include significantly fewer .h files than the larger .c files could (which, in my experience, they can't) [...]

      This occurs because programmers created these superheaders where everything go. Instead, related declarations should be clustered in a single include file, so that a single #include statement can satisfy several uses. Conversely, unrelated declarations should be split over different headers. Headers should have include guards to avoid repeated inclusion. Finally, headers should #include everything it needs to compile, but no more.

      Messy headers are a symptom of a deeper problem with the organization. If the developer didn't even bother thinking about where to put the API, it's a good bet that the API is not well documented either.

      Look at the Standard C Library for an example. It's a relatively small library by today's standards, yet its declarations are broken out over two dozen headers.

      (Not important enough to get all projects to switch from C or C++ [to Java], but among the reasons that some projects should.)

      Which projects? If it's small enough, the build time wasn't worth worrying about to start with. If it's big enough for the build time to be a problem, then the rewrite can cost you your entire competitive advantage.

    6. Re:Well done GCC, but.... by t_hunger · · Score: 2

      I guess the gcc-people are not really good designers then wrt. C++. Try this: Write a hello world in C++, include just one STL-header, run g++ -E and run a linecount on the output: Your few lines of code turned into well over 30000 lines that g++ sees and works on! So precompiled headers should make things faster, even if your own code is well structured.

      --
      Regards, Tobias
    7. Re:Well done GCC, but.... by p3d0 · · Score: 1

      What a great book. Lots of common sense in there.

      --
      Patrick Doyle
      I mod down every jackass who puts his moderation policy in his sig. Oh, wait a sec....
  13. Karma whore by Anonymous Coward · · Score: 0

    Burn karma to make meta-karma?

  14. C++ Type Inference, great.. by Tom7 · · Score: 2

    Great... that will make using C++ templates and stuff a bit nicer...
    (Of course, SML and O'Caml (and related languages) have had much more sophisticated type inference for 30 years!)

  15. Recursive Descent / Context Freeness by Tom7 · · Score: 5, Informative

    Just to clarify: A language does not need to be context-free in order to be parsed by a recursive descent parser, because you can augment the recursive functions with extra arguments that provide, well, context. For instance:

    [exp] ::= x | let [dec] in [exp] end | n | print [exp]

    [dec] ::= val x = [exp]

    (where x is the set of variables and n is the set of integer constants)

    This language is context-free, but the following restriction isn't: We say that strings are only in this language if variables aren't used before being declared. Legal:

    let
    val x = 3
    in
    print x
    end

    Illegal:

    let
    val x = 3
    in
    print y
    end

    This language isn't context-free (in the usual sense) but can be parsed easily by a recursive function. That function simply takes with it a list of all the declared variables. (In fact, you can pull this same sort of hack with lex/yac by having the lexer make a call into your code, which keeps a symbol table of variables it has seen as it runs.)

    (If I understand the problem with C and C++ correctly, the difficulty parsing has to do with recognizing a token as a type name or an identifier, so I think this is relevant.)

  16. LL(k)? I thought LALR(1) was "better." by SkewlD00d · · Score: 2

    A predictive bottom-up LR parser is more powerful than top-down LL. It's in the dragon book. Additionally, a top-down, recursive descent parser (using a stack or recursive calls) has huge, huge memory requirements (non-deterministic). SLR(1)/LALR(1)/LALR(k) parsers are usually table driven, and are essentially finite state machines (FSMs). Shift-Reduce parsers (such as those generated by lex/yacc/flex/bison) use a symbol stack that doesn't grow huge with one-or-more or zero-or-more conditionals, because it reduces. Bottom up parsers tend to push all the previous crap on to the stack, all the way up to the root node. Index operations are several orders of magnitude faster than procedure calls. Sounds like GCC is having a case of feature creep. I'll stick to 2.95 TYVM.

    --
    The biggest trick the devil pulled was letting lawyers become politicians so they can write the laws.
  17. Re:LL(k)? I thought LALR(1) was "better." by cakoose · · Score: 3, Informative
    A predictive bottom-up LR parser is more powerful than top-down LL.

    In terms of grammars alone, I believe that is somewhat correct but we're talking about a compiler here. LL parsing is often helpful because it can create and use inherited attributes. A top-down parser can perform some of the semantic work that an LR bottom-up parser cannot.

    I also don't think that the recursive call stack should be of much concern because GCC will probably do something like that anyway (though maybe not as fine grained) in the next compiler pass. As said before, it might actually reduce the amount of work.

    What do you mean by 'huge memory requirements (non-deterministic)'? Yes, an LL parser will maintain a deeper stack but the memory usage is by no means unbounded.

    I don't know how you can classify changing GCC's internal structure as 'feature-creep'. LL grammars are usually considered easier to read/understand and despite what some wannabe-macho programmers may think, readability/clarity is good. It is also easier to write meaningful error messages because an LL parser kind of models the way we naturally think.

    Now I'm not saying that LL parsers are better. The workings of LR grammars are extremely interesting (Knuth is a god). However, don't knock it for no reason at all (stick to 2.95? gimme a break). I'm sure the GCC guys know more than both of us about compiler design and have good reasons for their design decisions.

  18. Um, that's already in the library by devphil · · Score: 2


    You don't need GCC to implement it, nor do you need to wait on C++0x. Every container declares appropriate nested typedefs. Using your example:

    map<whatever>::iterator iter = some_map.begin();

    The "figure out a type for you" was done when some_map was declared.

    --
    You cannot apply a technological solution to a sociological problem. (Edwards' Law)
    1. Re:Um, that's already in the library by Euphonious+Coward · · Score: 2
      devphil wrote:
      map<whatever>::iterator iter = some_map.begin();
      I find it interesting that whenever people argue against this feature, they always abbreviate their examples. Verbosity for verbosity's sake is not a good thing: if we all know the type, why should we have to keep saying it? It doesn't make the code any safer.

      devphil's code above is an example of a workaround. It was my work to insist on having typedefs all over the STL classes, for this purpose. They were necessary clutter because of a weakness in the language. They will remain as not-so-necessary clutter, for backward compatibility.

  19. Yes, sort of, with some help by devphil · · Score: 3, Informative


    You don't really want that kind of thing done in the parser, because your refactorer (or whatever) would then have to handle the possibility of incorrect code. The parser is handling syntax (mostly), remember; semantic correctness is checked later.

    You want GCC to parse the code, check the code, and then do something other than generate assembly. To some extent that's being done already (there are command-line options to dump various representations of the source, e.g., -fdump-translation-unit).

    Also, the back-end (code generation) is seperate from the front end (language handling), so if you were to implement a back-end whose "assembly language" output was actually, say, XML, then you would have a C-to-XML, C++-to-XML, Java-to-XML, FORTRAN-to-XML, ObjectiveC-to-XML, and whatever-else-I'm-missing-to-XML converter. Dunno why you'd want such a thing, but you could probably build some kind of program database out of it (which is what IDEs use to do things like function name completion when typing code).

    All of which is independent of the front-end parser.

    --
    You cannot apply a technological solution to a sociological problem. (Edwards' Law)
  20. Close by devphil · · Score: 2


    The version of GCC that ships with OS X has been heavily hacked on to make it work better for that operating system. One of the things Apple added was a simple implementation of PCH.

    Other companies have done the same thing, many times. One of the customized versions of GCC that Cygnus did for a customer had a (simple) PCH in it, too.

    The version being checked in to GCC is the result of all these different implementations coming together.

    --
    You cannot apply a technological solution to a sociological problem. (Edwards' Law)
  21. Re:LL(k)? I thought LALR(1) was "better." by batand · · Score: 2, Informative

    Index operations are several orders of magnitude faster than procedure calls.

    Actually a table based solution like bison is slower than implementing the parser directly in code (like recursive descent).

    But giving up on LALR parsing is not necessary. You can use recursive ascent. See this article that claims a speedup of 2.5 to 6.5 compared to table based LALR(1).

  22. Re:LL(k)? I thought LALR(1) was "better." by Anonymous Coward · · Score: 0

    It's in the dragon book

    Don't believe everything you read in books. Especially old books.

  23. Re:LL(k)? I thought LALR(1) was "better." by rambham · · Score: 1
    I am one of the authors of the technical report that was sited.

    I have never heard of the term recursive ascent.

    Basically the work mentioned in that technical report says that if you convert a table driven LALR parser into a directly executable function then you can get a 2.5 to 6.5 speed up for the parser.

    A little background: The tables in bison and yacc generated parsers encode information about what to do at some particular point in the parse.For example the tables could say state 20 should shift token 1 or 2 but if its token 4 then a reduce should take place. Basically the tables in a LALR parser are state machines.

    What we showed was that if you replace the tables with executable code the resulting parser will be faster. And the amount of code that is generated is not too big. So the above example could be encoded as a switch statement with case labels for 1,2 and 4.

    Of course there are lot more details that need to be taken care of, but those are mentioned in the paper.

    I feel very safe concluding that a directly executable parser would always be faster (for any real world language) than a table based parser. I would similarly expect that the recursive descent parser added to gcc would be faster than the old table driven one.

    One final note: In case anyone wants the source code, its not ready. I've been wanting to clean it up and release it to the world for the past 7 years now, but I've not yet managed to find the time!!

  24. Objective-C++? by JoshRendlesham · · Score: 1

    Is there any word about the inclusion of Objective-C++ support for GCC within the near future? Would this new parser allow for the easier or quicker inclusion of Objective-C++ support?

    While considered ugly by some, Objective-C++ would allow for the easier creation of Objective-C bindings for C++ libraries under platforms supporting the GCC Objective-C compiler. I would sure look forward to Objective-C bindings to libraries such as wxWindows and FOX.

  25. How does that work with templates? by Per+Abrahamsen · · Score: 1

    Precompiled headers aren't much use with C, but for any C++ program that uses iostreams or STL it is godsend.

    I have no idea whether those libraries could be rewritten to have small headers, it would probably require a working version of the "export" keyword at least, but I don't know if that is enough.

  26. Re:LL(k)? I thought LALR(1) was "better." by SkewlD00d · · Score: 1

    All parsers are state machines in one way or another. Well, my point is that the advantage of LALR/SLR predictive parsers is that they don't have to waste time doing multiple token look-aheads { LL(1), LL(2)... LL(k) }. Certainly C is approximately LL(5). In LALR(1), it just shifts until it finds something that it can reduce. Additionally, error states can be introduced, and have associated actions, e.g., "missing semicolon", or whatever.

    Wait a minute, function calls tend to incur penalties on almost every OS & platform combination. Unless of course, there's alot of inline code and globals, then you can probably cheat and not setup any stack frames (except pushing the return address).

    But any program you can implement with recursive function calls, you can implement with one function and a stack. Your new task: implement a LALR(1) C parser w/ one function and a heap. A "real" solution, i.e. GCC, would probably involve a stack and FSM code generated by a tool that could do state minimization, the whole reason for using tables to being with.

    --
    The biggest trick the devil pulled was letting lawyers become politicians so they can write the laws.