Which Compiler to Extend for a Small Project?

Posted by Cliff on Wednesday November 10, 2004 @03:10PM from the ground-floor-of-computer-language-construction dept.

Andreas(R) asks: "I am planning the design of my small programming language, and would appreciate some lessons learned from experienced programmers who have already tried this. I was investigating whether to start from an existing compiler and extend it. The compiler will be based on yacc or bison. The programming language will be interpreted, object-oriented, and support higher-order programming. Perl 1 seems like a decent starting point, as it's yacc-based and 5000 lines of code. Later versions of Perl are too large to get a good understanding of the whole program in a short period of time. Perl also has the right license (GPL). Is Python out of the question for such a project, since it's not GPL? What other small languages can be used instead? How do I go about designing a small programming language in practice, using what I already know about compiler theory?"


Holy cow! by sidecut · 2004-11-10 15:15 · Score: 5, Insightful

Can you give us a few more specifics on what the language will be used for? Will it be embedded? Database connected? Real-time? Interactive?
While this seems like beaucoup fun, I'd question the need to extend an existing language by altering the compiler. To that end, you might want to use LISP or Scheme, as language extension is built into the language. (See what Paul Graham has to say about the subject.)
Two quick comments by poincaraux · 2004-11-10 15:37 · Score: 4, Insightful

I can't really tell when you're asking questions and when you're stating project requirements, but ..

I was investigating whether to start from an existing compiler and extend it. The compiler will be based on yacc, or bison.

you might want to check out ANTLR.

Perl also has the right license (GPL). Is Python out of the question for such a project, since it's not GPL?

Did you just say "I can only use GPL'd things. Python isn't GPL'd. Can I use it?" I'll assume you meant something like "I want something with a nice license. Does Python have a nice license?" instead. If that's what you meant, you should check out the Python license for yourself. Summary: it's a nice license. It's a certified Open Source license, imposes fewer restrictions than the GPL and is compatible with the GPL.

Python definitely doesn't fit into 5000 lines of source, though, so Perl 1 might be a better bet. PyPy is pretty cool, if you're looking for something smallish and Pythonish.

Sorry if you're looking for something else and these comments turn out to be totally useless.
Rewrite! by acidrain · 2004-11-10 16:23 · Score: 4, Insightful

Adapting something that is not really what you are making is just taking on a crutch. You get moving faster to start, but become dependent on something that slows you down.

If you have what it takes to write a new language, then you would be best starting from scratch. Read 3-5 codebases, make a list of the things you liked/didn't like, and start out on your own. In the long run, having written the thing yourself will give you the advantage. You will know intuitively how everything works and how extensions will fit in. That will give you a 2x advantage.

Don't be afraid to read over two other existing implementations as you go. Sharing ideas is very important.

An approach I have also taken is to rewrite a program in parts. You pick a major component and replace it with what you need. This gives you testing checkpoints. The more often you get to a working state and test, the less time you will spend debugging overall. If you can look at Perl 1 and determine that you can add to it to make the parser a superset of what you want, you could start there. Then you can write something that translates the byte-code output (the subset generated by your language) into what you want, and write your own interpreter. Then you can tackle replacing the parser and byte-code generator... With flex and bison, that should be easy enough. But plan to replace the entire thing. Otherwise you will spend a lot of time reworking things that are not really what you want. If you discover a few gems along the way that you want to keep/port, all the better; just don't take on any crutches.
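
To make the "write your own interpreter" step concrete, here is a minimal sketch of a stack-based byte-code interpreter in Python. Everything in it is an assumption made for illustration: the opcode names (PUSH, LOAD, STORE, ADD, PRINT, JUMP, JUMP_IF_ZERO), the tuple encoding, and the run function are invented here and have nothing to do with Perl 1's internals.

```python
def run(program):
    """Execute a list of (opcode, argument) pairs; return the printed values."""
    stack, env, out = [], {}, []
    pc = 0  # index of the next instruction
    while pc < len(program):
        op, arg = program[pc]
        pc += 1
        if op == "PUSH":            # push a constant
            stack.append(arg)
        elif op == "LOAD":          # push the value of a variable
            stack.append(env[arg])
        elif op == "STORE":         # pop into a variable
            env[arg] = stack.pop()
        elif op == "ADD":           # pop two values, push their sum
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "PRINT":         # pop and record a value
            out.append(stack.pop())
        elif op == "JUMP":          # unconditional jump
            pc = arg
        elif op == "JUMP_IF_ZERO":  # pop; jump if the value was zero
            if stack.pop() == 0:
                pc = arg
        else:
            raise ValueError(f"unknown opcode: {op}")
    return out


# Hypothetical program: x = 3; while x != 0: print x; x = x + (-1)
COUNTDOWN = [
    ("PUSH", 3), ("STORE", "x"),
    ("LOAD", "x"), ("JUMP_IF_ZERO", 11),
    ("LOAD", "x"), ("PRINT", None),
    ("LOAD", "x"), ("PUSH", -1), ("ADD", None), ("STORE", "x"),
    ("JUMP", 2),
]
```

Because the interpreter only consumes a list of instructions, it can be tested on hand-written programs like COUNTDOWN long before a parser or code generator exists, which is one way to get the testing checkpoints described above.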

Oh, I would recommend the STL if you are in a hurry. I don't know why functional is better; personally it just gave me a headache in school. But some people claim great results with it...

Finally, a real language must be able to compile itself. Or at least generate its own byte-code in the case of interpreted languages. Think about it! You could hack Perl 1 to generate your byte-code, and then write your parser in Perl 1 and have a self-compiling language.

--
-- http://thegirlorthecar.com funny dating game for guys
1. Re:Rewrite! by some+guy+I+know · 2004-11-11 04:31 · Score: 2, Insightful
  
  a real language must be able to compile itself.
  Not necessarily.
  A more accurate statement would be "a real general-purpose language must be able to compile itself."
  Some languages fulfill narrow requirements that may not include compiling.
  Adding compiling ability to such a language may make it less efficient for fulfilling its primary purpose.
  Some examples of languages that would probably be made worse if self-compiling ability were added: SQL, APL, and most "descriptive" languages (e.g., HTML, XML, and other SGML derivatives, POV-Ray source, map source for games, etc.).
  
  --
  Those who sacrifice security to condemn liberty deserve to repeat history or something. - Benjamin Santayana
A few ideas by RevAaron · 2004-11-10 16:38 · Score: 5, Insightful

First, there are two kinds of small languages:
1. Small languages like Lua, Io, and Scheme that are small in the built-in libraries and in the total distro. These three are great places to start: all are small, OOPish, and allow higher-order programming by passing classes, objects, functions and methods as objects.

2. Then there are languages that are big in some ways, but small in syntax. Some of these are easier to extend than so-called "little languages." The reason is usually that their syntax is small, in an isolated place, easy to get at, and meant to be modified. The two best examples of this are Smalltalk and Lisp. Both of these languages satisfy your other requirements and really kick ass for extension. Unlike the above languages, the so-called little languages, most Smalltalk and Lisp dialects have big, useful libraries. Unlike a big fat language like Perl or C++, having a useful library doesn't mean that the language is a huge pain in the ass to extend.

Both Lisp and Smalltalk have a number of implementations. I am a big fan of Squeak Smalltalk, though systems like Little Smalltalk or even GNU Smalltalk may be worth checking out.

A lot of people here have bad feelings about Lisp-like languages. It's a shame, since Scheme, ISLISP (OpenLisp is a great implementation) and Common Lisp are all *very* powerful languages. You can be quite productive with them once you get over the part about whining about parens. But Lisp may very well be the best option here; there is a long history of people writing custom syntaxes and language extensions. Look up Common Lisp macros: power almost beyond comprehension, a lot of fun to play with, and with an elegance all its own.

There are examples of people writing C-like syntaxes for various Scheme implementations. IIRC, Gambit-C (a Scheme to C compiler) comes with one. On Cliki, there are a bunch of other alternative Scheme syntaxes listed.

To me, one of the big advantages to using a language in the second category is that syntax extension/modification is done in the language itself, rather than in C. With that comes the familiarity of the language you're creating and the other benefits you gain by using a high-level language like Smalltalk or Common Lisp.

Just some thoughts...

--

Working toward a usable PDA environment in the spirit of Newton OS: Dynapad
Re:Do not design a programming language. by radpole · 2004-11-11 01:22 · Score: 2, Insightful

You're right, no one should ever try anything new. All the smart people have already done everything anyway. Oh yeah, 640k is enough memory.
Very good suggestion by turgid · 2004-11-11 05:43 · Score: 2, Insightful

FORTH is trivial to implement (in a few hundred lines rather than a few thousand) and can be compiled or interpreted. It is interactive, the parser is completely minimal (all tokens are separated by spaces with few exceptions) and the compiler/interpreter/system can be extremely compact. The code also runs relatively quickly. FORTH was fairly popular in the days of 8-bit micros and 16-bit minis for these reasons, and is still used in microcontrollers and workstation firmware.
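
As a rough illustration of how little parsing a FORTH-style language needs, here is a toy interpreter sketched in Python. It is not a conforming FORTH: the function name run_forth and the handful of built-in words are choices made for this sketch, it only handles integers, and it supports just the simplest form of colon definitions.

```python
def run_forth(source, stack=None, words=None):
    """Evaluate a whitespace-separated FORTH-like program; return the stack."""
    stack = [] if stack is None else stack
    words = {} if words is None else words  # user-defined words: name -> tokens
    builtins = {
        "+": lambda s: s.append(s.pop() + s.pop()),
        "*": lambda s: s.append(s.pop() * s.pop()),
        "-": lambda s: s.append(-s.pop() + s.pop()),      # a b -  ->  a - b
        "dup": lambda s: s.append(s[-1]),
        "swap": lambda s: s.extend([s.pop(), s.pop()]),   # a b  ->  b a
        "drop": lambda s: s.pop(),
    }
    tokens = source.split()  # the whole "parser": split on whitespace
    i = 0
    while i < len(tokens):
        tok = tokens[i].lower()
        if tok == ":":
            # Colon definition:  : name body... ;
            end = tokens.index(";", i)
            words[tokens[i + 1].lower()] = tokens[i + 2:end]
            i = end + 1
            continue
        if tok in words:
            run_forth(" ".join(words[tok]), stack, words)
        elif tok in builtins:
            builtins[tok](stack)
        else:
            stack.append(int(tok))  # anything else must be an integer literal
        i += 1
    return stack
```

For example, run_forth(": square dup * ; 7 square") defines a new word and then uses it, leaving 49 on the stack. A real system adds control flow, memory access and a proper dictionary, but the basic shape stays this small.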

--
Stick Men
Re:Uh. by keesh · 2004-11-11 06:15 · Score: 2, Insightful

They're only 'equivalent' at a really basic level. Sure, from an academic "what can be calculated?" POV they're the same (although *not* when time complexity is considered), but for any practical purpose they are not equivalent. Compare how long it takes to write, say, a program which adds up ten numbers read from stdin in BF or INTERCAL with how long it takes to write the same program in Ruby or Haskell...
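
For a sense of scale, here is roughly what that program looks like in a high-level language. Python is used here as a stand-in for the Ruby or Haskell mentioned above, and the function name sum_first_numbers is made up for this sketch; it assumes the input is whitespace-separated integers.

```python
import io
import itertools


def sum_first_numbers(stream, count=10):
    """Add up the first `count` whitespace-separated integers in a text stream."""
    numbers = (int(tok) for line in stream for tok in line.split())
    return sum(itertools.islice(numbers, count))


# Demo on an in-memory stream; pass sys.stdin instead to read real input.
demo = io.StringIO("1 2 3\n4 5 6 7\n8 9 10 11\n")
print(sum_first_numbers(demo))  # prints 55
```

The equivalent in BF or INTERCAL has to build decimal parsing and multi-digit arithmetic by hand before it can add anything.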