Which Compiler to Extend for a Small Project?
Andreas(R) asks: "While planning the design of my small programming language, and would appreciate some lessons learned from experienced programmers which have already tried this. I was investigating whether to start from an existing compiler and extend it. The compiler will be based on yacc, or bison. The programming language will be interpreted, object oriented and have higher order programming. Perl 1 seems like a decent starting point, as it's yacc based, and 5000 lines of code. Later versions of Perl are too large to get a good understanding of the whole program in a short period of time. Perl also has the right license (GPL). Is Python out of the question for such a project, since it's not GPL? What other small languages can be used instead? How do I go about designing a small programming language in practice, using what I already know about compiler theory?"
Look. The source code to Perl 1 is only 5000 lines long. The source code is open and available for anyone who wants to use it for research and investigation as well as for educational purposes. If we had to release all our source code because at one time in our lives we saw some source code that was covered by the GPL, none of us could ever get jobs programming.
The source is there. Use it as a base for your own program. Change it enough that you aren't blatantly copying it. Release it under your own license. If you like the GPL, then do that. Some of us like less restrictive licenses like BSD or the original (non-GPL compatible) Artistic License.
Definitely, go with Perl 1. From the look of it, it seems to have some pretty good foundations to build upon. Taking a look at where the language itself is now, there's much improvement needed in v1. It seems like a pretty good place to start.
Don't be restricted by licenses on small projects, much less ones that are essentially abandoned.
Also available under The Artistic License. You might like it over the GPL.
Perl might not be the best starting point if you want to learn something about desining a programming language. If you're thinking of using that, I'd say you're better off starting from scratch.
;)
Functional languages, especially those with pattern matching primitives (like ML or Haskell), are really good at this kind of program -- in fact, writing compilers and interpreters is really their shining point. I highly recommend using one of these languages rather than C. Lots of undergraduate computer science classes (like 15-212 and 15-312 at my university) write interpreters in functional languages as part of the curriculum. You could try to find some course notes... I speak from experience when I say that this beats the hell out of mucking around in C or C++, not really knowing what you're doing.
if you have never taken a compiler class before or written a compiler on your own, i suggest the following:
there are several "toy" grammars out there which allow you to do useful stuff (recursion, 'interesting' data structures [i.e. self referential], etc.) without wading through a lot of useless cruft (implementing huge amounts of runtime support, for example). i'd go with one of those. once you're comfortable that you can make one of these learning languages work, then try to hack one of your own.
this all said, good luck! i am by no means an expert on compiler construction (worked on a custom in-house scripting language as an intern a couple years back and had to take compiler classes to satisfy breadth requirements for my m.s. c.s.) but i do hope this is a little bit useful.
"Is Python out of the question for such a project, since it's not GPL?"
You seem to be concerned about the limitations the license puts on you. Python's license is less restrictive than the GPL - read about it (http://python.org/psf/license.html).
One of many things this means is that if you decide the Python License isn't restrictive enough for you, you can relicense the combination of Python plus your changes under the GPL, as long as you adhere to Python's license (leaving its copyright and other required information intact).
You don't really say what sort of problems you're talking about or why you want to build the language.
If you just want to build a language that will teach you about programming languages, look at old fashioned Pascal not Delphi or Kylix or even Turbo Pascal, but good old-fashioned Jensen and Wirth 1974 Pascal.
If you want to design a programming language, the best advice is to write some code in the proposed language. Remember Tony Hoare's rule, and keep it simple. Most programming languages, from Perl to Python to Java 5, suffer from being accumulations of features.
Have a look at Ruby, Modula-n, Oberon, and so forth.
If you're looking to learn lots about programming in general, think about things you want to do, and construct a lanaguage that does them. Icon is a nice example. Look at SNOBOL, if only because you'll appreciate the "five miles through the snow" stories we old farts tell.
On the other hand, Paul Graham seems to like creating new programming languages. Very useful observations on how to go about creating a new language.
I'm not really sure what your goal here is. Are you wanting to write a compiler for the sheer joy of writing a compiler? That's a good goal, for sure, and I recommend that every programmer write a compiler or two during their careers, or at least some interpreters.
On the other hand, maybe you want to extend an existing language because you need some specific language feature. Also a good goal, but I do want to caution you to evaluate existing languages first; you may find that some language does what you want, or makes it easy to write a language library that does what you want.
Or maybe you want to write an interpreter to script a bigger program. Then I'd say that you may be better off using something that's already there.
In the first case, if you want to learn how to write a compiler, I generally recommend writing Scheme, or some other simple Lisp. Scheme has advantages in that its parse tree representation is obvious (that's why Lisp looks like it does), the structure of an interpreter and of a compiler are quite similar, and it covers the fundamentals of compilers without burdening you with a bunch of cruft. (If this is your first compiler, you may want to leave out continuations and garbage collection for now.) The very excellent book Structure and Interpretation of Computer Programming has what you need: chapter 4 is about writing interpreters (for Scheme, some modified Schemes, and Prolog), and chapter 5 is all about writing a compiler. All the code in SICP is in Scheme, but the book starts from the beginning with (+ 1 1).
SICP may be too academic for your taste, so you may prefer Paradigms of Artificial Intelligence Programming instead. It uses Common Lisp, and has a little bit more practical feel that SICP. Chapter 22 is about writing a Scheme interpreter, and chapter 23 is about writing a Scheme compiler. Unlike SICP, PAIP doesn't cover the garbage collector. PAIP uses Common Lisp, and although it has enough of an introduction to be a "refresher" for somebody already familiar with Lisp, it's not really suitable for learning the language.
It's really simple to write Lisp in Lisp. Indeed, a month ago I wrote one for a comp.lang.lisp post, just to get a silly quine to work! That's why I keep talking about Lisp and Scheme: it's easy to do.
If Scheme isn't your thing, then Pascal may be a good alternative. Don't try to get fancy with heap allocation, pointers, objects, and other new add-ons; I'd start with plain old Wirth-designed Pascal, and get fancy later if you want. Pascal is really designed for a classroom setting. I have a dim memory of Ada being used for this purpose in some books, but I can't speak very authoratively there.
Of course, the definitive book on writing compilers is the Dragon Book. But you may want to be familiar with some basic CS theory about FSMs first.
Now, that's if you want to learn about compilers for the joy of learning about compilers. But what if you just need one particular language feature for your problem, and that's why you want to write a compiler? Well, then I'd suggest you make sure you've looked at a lot of different languages first. Some languages have surprising features that may let you write a small in-language library to do what you need, instead of needing to extend the compiler.
Lisp is a good candidate for this. John Foderaro once described Lisp as "a programmable programming language". You can alter the language to suit the problem at hand, instead of having to work the other way around. At work now, I use Lisp as a bridge between two very different programming languages because I can extend it in both directions to cover what I nee
I highly recommend Lua. It's a brilliant language. The code base is clean, portable, and easy to read.
http://www.lua.org
In fact, I recommend you not create a new language at all and just use Lua. It's a deceptively simple language that allows you to extend it through certain meta-constructs to become pretty much anything you need-- from simple data description to full Object-Orientation.
I once read through the entire dragon book with the intention of creating my own language; I gave it all up when I found Lua.
Good stuff.
Honestly, its easier to write a recursive descent parser by hand for a programming language than you think, and interpreters are ridiculously easy unless you're worried about making it fast, which is way overrated too. It mattered with 640KB of RAM at 20MHz, but these days, its just stupid to care unless you notice its insanely slow.
First off, if you've not found this link: http://compilers.iecc.com/crenshaw/, then I recommend you start with it. While its about writing a compiler, it really help make parsing much clearer.
Scheme is a good language to check out if you want to start with another design(a scheme interpreter can be written in a few hours, even in C, if you're slick, even if you're not, it would be short project to get 90%).
Some other reference material: Parsing Techniques(free online). Also: Modern Compiler Design by the same guys and well worth the investment. Concepts, Techniques, and Models of Programming Languages, teaches kernal theory of language design, and may open your mind to some other techniques you may not be aware of.
Checking out the archives on Lambda The Ultimate would be wise too. Also, if you're in Boston on December the 4th, you might check out the Lightweight Languages Workshop at MIT.
Maybe you should have a look at Parrot? The CVS includes a lot of working and non-working language-implementations which you could have a look at..
I also recommend you to have a look at Lua which is a minimalistic yet beautiful language..
My other account has a 3-digit UID.
If you do as the parent says and "use it as a base" and "change enough so you aren't blatantly copying it" then you are still copying it and restricted in distrution by copyright law.
The GPL may grant you other privileges if you abide by its conditions. (Though AFAIK most PERL is licensed under the PERL Artisitic License which is more permissive than the GPL.)
Sam
blog.sam.liddicott.com
Write a compiler for parrot.
Lots of people are doing that right now, and you could learn a lot from the included compilers for different languages, all in varying stages of completion.
Stumbling in the dark
I hear slavering of jaws
Eaten by a grue.
A cool site for people interested in languages and their design is: http://lambda-the-ultimate.org/
I agree completely that only looking for GPL code is a mistaken approach, as the GPL is more restrictive than other licenses. You can almost always include BSD code in GPL, but not vice versa. In any case, the GPL is probably not an ideal license if you want your language to be very widely used.
Thirdly, if you want to look at some nice source code and an interesting way of doing things, have a look at Tcl. It's C sources (modulo the regexp package which came from somewhere else) are some of the nicest I have read. Beautifully commented, and very clear to read. Also, Tcl is a smart approach for a language designed to be extensible.
The article's author really needs to state his purpose, though... just for fun? To 'get famous'? For work? As a homework assignment?;-)
http://www.welton.it/davidw/
Oh, I would recommed the STL if you are in a hurry. I don't know why functional is better, personally it just gave me a headache in school.
Among the many reasons that functional languages are superior for this kind of task is algebraic data types and pattern matching. Since C++ and Java don't even have sum types, the only way to simulate this is with a load of objects and "instanceof" stuff. For example, here's a simple calculator interpreter in ML:
datatype exp =
Int of int
| Plus of exp * exp
| Times of exp * exp
(* if, then, else *)
| Ifzero of exp * exp * exp
(* evaluates an expression, returning an int *)
fun eval (Int i) = i
| eval (Plus (e1, e2)) = eval e1 + eval e2
| eval (Times (e1, e2)) = eval e1 + eval e2
| eval (Ifzero (b, t, f)) = if eval b = 0
then eval t
else eval f
That's it! These sorts of recursive passes over syntax trees is exactly what goes on for the bulk of a compiler or interpreter. Just think about what a mess this would be with the STL.