Which Compiler to Extend for a Small Project?
Andreas(R) asks: "While planning the design of my small programming language, and would appreciate some lessons learned from experienced programmers which have already tried this. I was investigating whether to start from an existing compiler and extend it. The compiler will be based on yacc, or bison. The programming language will be interpreted, object oriented and have higher order programming. Perl 1 seems like a decent starting point, as it's yacc based, and 5000 lines of code. Later versions of Perl are too large to get a good understanding of the whole program in a short period of time. Perl also has the right license (GPL). Is Python out of the question for such a project, since it's not GPL? What other small languages can be used instead? How do I go about designing a small programming language in practice, using what I already know about compiler theory?"
While this seems like beaucoup fun, I'd question the need to extend an existing language by altering the compiler. Towards that end, you might want to use LISP or Scheme, as language extension is built into the language. ( See what Paul Graham has to say about the subject)
Look. The source code to Perl 1 is only 5000 lines long. The source code is open and available for anyone who wants to use it for research and investigation as well as for educational purposes. If we had to release all our source code because at one time in our lives we saw some source code that was covered by the GPL, none of us could ever get jobs programming.
The source is there. Use it as a base for your own program. Change it enough that you aren't blatantly copying it. Release it under your own license. If you like the GPL, then do that. Some of us like less restrictive licenses like BSD or the original (non-GPL compatible) Artistic License.
Definitely, go with Perl 1. From the look of it, it seems to have some pretty good foundations to build upon. Taking a look at where the language itself is now, there's much improvement needed in v1. It seems like a pretty good place to start.
Don't be restricted by licenses on small projects, much less ones that are essentially abandoned.
if you have never taken a compiler class before or written a compiler on your own, i suggest the following:
there are several "toy" grammars out there which allow you to do useful stuff (recursion, 'interesting' data structures [i.e. self referential], etc.) without wading through a lot of useless cruft (implementing huge amounts of runtime support, for example). i'd go with one of those. once you're comfortable that you can make one of these learning languages work, then try to hack one of your own.
this all said, good luck! i am by no means an expert on compiler construction (worked on a custom in-house scripting language as an intern a couple years back and had to take compiler classes to satisfy breadth requirements for my m.s. c.s.) but i do hope this is a little bit useful.
First, there are two kinds of small languages:
1. small languages like lua, io, and scheme that are small in the built-in libraries and in the total distro. These three are great places to start- both are small, OOPish, allow higher-order programming by passing classes, objects, functions and methods as objects.
2. Then there are languages that are big in some ways, but small in syntax. Some of these are easier to extend than so-called "little languages." The reason is usually that their syntax is small, in an isolated place, easy to get at, and meant to be modified. The two best examples for this are Smalltalk and Lisp. Both of these languages satisfy your other requirements and really kick ass for extention. Unlike the above languages, the so-called little-languages, most Smalltalk and Lisp dialects have big, useful libraries. Unlike a big fat language like perl or C++, having a useful library doesn't mean that the language is a huge pain in the ass to extend.
Both Lisp and Smalltalk have a number of implementations. I am a big fan of Squeak Smalltalk, though systems like Little Smalltalk or even GNU Smalltalk maybe worth checking out.
A lot of people here have bad feelings about Lisp-like languages. It's a shame, since Scheme, ISLISP (OpenLisp is a great implementation) and Common Lisp are all *very* powerful languages. You can be quite productive with them once you get over the part about whining about parens. But Lisp may very well be the best option here, there is a long history of people writing custom-syntaxes and language extensions. Look up Common Lisp macros- power almost beyond comprehension, a lot of fun to play with, and with an elegance all its own.
There are examples of people writing a C-like syntaxes for various Scheme implementations. IIRC, Gambit-C (a Scheme to C compiler) comes with one. On Cliki, there are a bunch of other alternative Scheme syntaxes listed.
To, one of the big advantages to using a language in the second category is that syntax extension/modification is done in the language itself, rather than in C. With that comes the familiarity of the language you're creating and the other benefits you gain by using a high-level language like Smalltalk or Common Lisp.
Just some thoughts...
Working toward a usable PDA environment in the spirit of Newton OS: Dynapad
Honestly, its easier to write a recursive descent parser by hand for a programming language than you think, and interpreters are ridiculously easy unless you're worried about making it fast, which is way overrated too. It mattered with 640KB of RAM at 20MHz, but these days, its just stupid to care unless you notice its insanely slow.
First off, if you've not found this link: http://compilers.iecc.com/crenshaw/, then I recommend you start with it. While its about writing a compiler, it really help make parsing much clearer.
Scheme is a good language to check out if you want to start with another design(a scheme interpreter can be written in a few hours, even in C, if you're slick, even if you're not, it would be short project to get 90%).
Some other reference material: Parsing Techniques(free online). Also: Modern Compiler Design by the same guys and well worth the investment. Concepts, Techniques, and Models of Programming Languages, teaches kernal theory of language design, and may open your mind to some other techniques you may not be aware of.
Checking out the archives on Lambda The Ultimate would be wise too. Also, if you're in Boston on December the 4th, you might check out the Lightweight Languages Workshop at MIT.