Slashdot Mirror


Which Compiler to Extend for a Small Project?

Andreas(R) asks: "I am planning the design of a small programming language, and would appreciate some lessons learned from experienced programmers who have already tried this. I am investigating whether to start from an existing compiler and extend it. The compiler will be based on yacc or bison. The programming language will be interpreted and object-oriented, and will support higher-order programming. Perl 1 seems like a decent starting point, as it's yacc-based and only 5000 lines of code. Later versions of Perl are too large to get a good understanding of the whole program in a short period of time. Perl also has the right license (GPL). Is Python out of the question for such a project, since it's not GPL? What other small languages can be used instead? How do I go about designing a small programming language in practice, using what I already know about compiler theory?"

6 of 89 comments

  1. Holy cow! by sidecut · · Score: 5, Insightful
    Can you give us a few more specifics on what the language will be used for? Will it be embedded? Database connected? Real-time? Interactive?

    While this seems like beaucoup fun, I'd question the need to extend an existing language by altering the compiler. To that end, you might want to use LISP or Scheme, as language extension is built into the language. (See what Paul Graham has to say about the subject.)

    1. Re:Holy cow! by xp · · Score: 5, Interesting

      Or why don't you design a meta-language in which other languages can be designed -- a language that remains completely extensible -- something like MDef.

  2. This is going to get me in trouble by Dancin_Santa · · Score: 5, Informative

    Look. The source code to Perl 1 is only 5000 lines long. The source code is open and available for anyone who wants to use it for research and investigation as well as for educational purposes. If we had to release all our source code because at one time in our lives we saw some source code that was covered by the GPL, none of us could ever get jobs programming.

    The source is there. Use it as a base for your own program. Change it enough that you aren't blatantly copying it. Release it under your own license. If you like the GPL, then do that. Some of us like less restrictive licenses like BSD or the original (non-GPL compatible) Artistic License.

    Definitely, go with Perl 1. From the look of it, it seems to have some pretty good foundations to build upon. Taking a look at where the language itself is now, there's much improvement needed in v1. It seems like a pretty good place to start.

    Don't be restricted by licenses on small projects, much less ones that are essentially abandoned.

  3. my $0.02 after a couple compiler classes by blackcoot · · Score: 5, Informative

    if you have never taken a compiler class before or written a compiler on your own, i suggest the following:

    • while i encourage re-use, if your purpose is to learn how to write a compiler, don't extend someone else's. find a grammar for a language you're comfortable with (e.g. pascal) and start from there. you'll find that getting just plain pascal to compile properly will be quite a challenge. oo-ness just adds another layer of complexity to the compiler.
    • to aid your debugging, you'll want to spend some time thinking up good ways of a) visualizing your parse tree, b) representing your IL in a human readable format, c) representing the entire state of your interpreter in human readable format. these just by themselves can be very challenging projects.
    • acquire a copy of the dragon book. this is only a starting point, you may also want to peruse some of andrew appel's compiler books (such as this one or this one)
    • lex/flex and bison/yacc are rather antiquated, you may want to check out terence parr's antlr (formerly pccts) instead. this allows you to implement your compiler+interpreter in your language of choice, rather than being forced to use c. my compiler classes all required that we use lex/yacc, so that's what i did; however, i would have really liked to have the option of doing it all in java or c++.
    • how you set up your symbol table will be a large determining factor in your success. i've used trees of hashtables in the past quite successfully (each node in the tree corresponding to a lexical context such that all symbols visible in the current scope are on the path between the present node and the root). i expect that extending this to support an OO language shouldn't be too hard, e.g. augment nodes representing a type to include back pointers to parent types. you will have to modify search to do lookups as appropriate.
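    The tree-of-hashtables symbol table described in the last bullet might look roughly like the sketch below. It is only an illustration, not the poster's code: the names Scope, ClassScope, define, lookup and lookup_member are made up here, and the single-inheritance member search is an assumption about how the "back pointers to parent types" would be used.

```python
class Scope:
    """One lexical context: a hash table plus a link toward the root."""

    def __init__(self, parent=None):
        self.parent = parent      # enclosing scope, or None for the root
        self.symbols = {}         # name -> whatever the compiler stores

    def define(self, name, info):
        self.symbols[name] = info

    def lookup(self, name):
        # Walk the path from the current node to the root; the first
        # hit is the innermost visible definition.
        scope = self
        while scope is not None:
            if name in scope.symbols:
                return scope.symbols[name]
            scope = scope.parent
        raise KeyError(name)


class ClassScope(Scope):
    """Scope for a type, with a back pointer to its parent type."""

    def __init__(self, parent=None, base=None):
        super().__init__(parent)
        self.base = base          # parent type's scope (assumed single inheritance)

    def lookup_member(self, name):
        # Member lookup follows the inheritance chain, not the lexical one.
        cls = self
        while cls is not None:
            if name in cls.symbols:
                return cls.symbols[name]
            cls = cls.base
        raise KeyError(name)


root = Scope()
root.define("x", "int")
block = Scope(parent=root)
block.define("x", "float")        # shadows the outer x inside the block

animal = ClassScope(parent=root)
animal.define("speak", "method Animal.speak")
dog = ClassScope(parent=root, base=animal)
dog.define("fetch", "method Dog.fetch")
```

    Keeping the lexical chain (parent) separate from the inheritance chain (base) is one way to "modify search": plain names use lookup, while member access uses lookup_member.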

    there are several "toy" grammars out there which allow you to do useful stuff (recursion, 'interesting' data structures [i.e. self referential], etc.) without wading through a lot of useless cruft (implementing huge amounts of runtime support, for example). i'd go with one of those. once you're comfortable that you can make one of these learning languages work, then try to hack one of your own.

    this all said, good luck! i am by no means an expert on compiler construction (worked on a custom in-house scripting language as an intern a couple years back and had to take compiler classes to satisfy breadth requirements for my m.s. c.s.) but i do hope this is a little bit useful.

  4. A few ideas by RevAaron · · Score: 5, Insightful

    First, there are two kinds of small languages:
    1. small languages like Lua, Io, and Scheme that are small in the built-in libraries and in the total distro. These three are great places to start: all are small, OOPish, and allow higher-order programming by passing classes, objects, functions and methods as objects.

    2. Then there are languages that are big in some ways, but small in syntax. Some of these are easier to extend than so-called "little languages." The reason is usually that their syntax is small, in an isolated place, easy to get at, and meant to be modified. The two best examples for this are Smalltalk and Lisp. Both of these languages satisfy your other requirements and really kick ass for extension. Unlike the so-called little languages above, most Smalltalk and Lisp dialects have big, useful libraries. Unlike a big fat language like Perl or C++, having a useful library doesn't mean that the language is a huge pain in the ass to extend.

    Both Lisp and Smalltalk have a number of implementations. I am a big fan of Squeak Smalltalk, though systems like Little Smalltalk or even GNU Smalltalk may be worth checking out.

    A lot of people here have bad feelings about Lisp-like languages. It's a shame, since Scheme, ISLISP (OpenLisp is a great implementation) and Common Lisp are all *very* powerful languages. You can be quite productive with them once you get past the whining about parens. But Lisp may very well be the best option here, since there is a long history of people writing custom syntaxes and language extensions. Look up Common Lisp macros: power almost beyond comprehension, a lot of fun to play with, and with an elegance all its own.

    There are examples of people writing C-like syntaxes for various Scheme implementations. IIRC, Gambit-C (a Scheme to C compiler) comes with one. On Cliki, there are a bunch of other alternative Scheme syntaxes listed.

    To me, one of the big advantages to using a language in the second category is that syntax extension/modification is done in the language itself, rather than in C. With that comes familiarity with the language you're creating and the other benefits you gain by using a high-level language like Smalltalk or Common Lisp.

    Just some thoughts...

    --

    Working toward a usable PDA environment in the spirit of Newton OS: Dynapad
  5. Don't extend. It's overrated. by Anonymous Coward · · Score: 5, Informative

    Honestly, it's easier to write a recursive descent parser by hand for a programming language than you think, and interpreters are ridiculously easy unless you're worried about making them fast, which is way overrated too. It mattered with 640KB of RAM at 20MHz, but these days, it's just stupid to care unless you notice it's insanely slow.
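    To give a feel for how little code a hand-written recursive descent parser needs, here is a minimal sketch that parses and evaluates integer arithmetic in one pass. The grammar, the Parser class and the tokenize and evaluate helpers are all invented for this illustration; a real language would build a tree instead of computing values directly.

```python
import re

# A token is an integer or a single operator/parenthesis character.
TOKEN = re.compile(r"\s*(\d+|[-+*/()])")


def tokenize(text):
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    """One method per grammar rule; each rule returns the value it parsed."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):
        # expr := term (('+' | '-') term)*
        value = self.term()
        while self.peek() in ("+", "-"):
            if self.advance() == "+":
                value += self.term()
            else:
                value -= self.term()
        return value

    def term(self):
        # term := factor (('*' | '/') factor)*
        value = self.factor()
        while self.peek() in ("*", "/"):
            if self.advance() == "*":
                value *= self.factor()
            else:
                value /= self.factor()
        return value

    def factor(self):
        # factor := NUMBER | '(' expr ')'
        token = self.advance()
        if token == "(":
            value = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return value
        if token is not None and token.isdigit():
            return int(token)
        raise SyntaxError(f"unexpected token {token!r}")


def evaluate(text):
    parser = Parser(tokenize(text))
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return value
```

    Operator precedence falls out of the rule nesting (expr calls term, term calls factor), which is the main trick to notice; evaluate("2 + 3 * (4 - 1)") multiplies before it adds.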

    First off, if you've not found this link: http://compilers.iecc.com/crenshaw/, then I recommend you start with it. While it's about writing a compiler, it really helps make parsing much clearer.

    Scheme is a good language to check out if you want to start with another design (a Scheme interpreter can be written in a few hours, even in C, if you're slick; even if you're not, it would be a short project to get 90% of the way).
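    As a rough idea of why a Scheme interpreter is such a short project, the sketch below evaluates a tiny Scheme-like subset (integers, symbols, if, define, lambda and function calls). It is written in Python rather than C for brevity, covers far less than real Scheme, and the names parse, read, evaluate and GLOBAL_ENV are made up for this example.

```python
import operator
from collections import ChainMap


def parse(source):
    """Turn source text into nested Python lists, ints and symbol strings."""
    tokens = source.replace("(", " ( ").replace(")", " ) ").split()
    expr = read(tokens)
    if tokens:
        raise SyntaxError("unexpected input after expression")
    return expr


def read(tokens):
    if not tokens:
        raise SyntaxError("unexpected end of input")
    token = tokens.pop(0)
    if token == "(":
        items = []
        while tokens and tokens[0] != ")":
            items.append(read(tokens))
        if not tokens:
            raise SyntaxError("missing ')'")
        tokens.pop(0)  # discard the closing parenthesis
        return items
    if token == ")":
        raise SyntaxError("unexpected ')'")
    try:
        return int(token)
    except ValueError:
        return token  # anything that is not an integer is a symbol


# Built-in procedures are ordinary Python callables.
GLOBAL_ENV = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "<": operator.lt,
    "=": operator.eq,
}


def evaluate(expr, env):
    if isinstance(expr, str):       # symbol: look it up
        return env[expr]
    if isinstance(expr, int):       # literal: evaluates to itself
        return expr
    head = expr[0]
    if head == "if":
        _, test, then, alt = expr
        return evaluate(then if evaluate(test, env) else alt, env)
    if head == "define":
        _, name, value = expr
        env[name] = evaluate(value, env)
        return None
    if head == "lambda":
        _, params, body = expr
        # The closure keeps a reference to the defining environment;
        # each call layers a fresh frame of arguments on top of it.
        return lambda *args: evaluate(body, ChainMap(dict(zip(params, args)), env))
    func = evaluate(head, env)
    args = [evaluate(arg, env) for arg in expr[1:]]
    return func(*args)


env = dict(GLOBAL_ENV)
evaluate(parse("(define fact (lambda (n) (if (< n 2) 1 (* n (fact (- n 1))))))"), env)
```

    Because lambdas are plain values here, higher-order programming comes for free: a function can be passed as an argument or returned like any other value.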

    Some other reference material: Parsing Techniques (free online). Also: Modern Compiler Design by the same guys, which is well worth the investment. Concepts, Techniques, and Models of Computer Programming teaches the kernel-language approach to language design, and may open your mind to some techniques you may not be aware of.

    Checking out the archives on Lambda The Ultimate would be wise too. Also, if you're in Boston on December the 4th, you might check out the Lightweight Languages Workshop at MIT.