Slashdot Mirror


Which Compiler to Extend for a Small Project?

Andreas(R) asks: "While planning the design of my small programming language, and would appreciate some lessons learned from experienced programmers which have already tried this. I was investigating whether to start from an existing compiler and extend it. The compiler will be based on yacc, or bison. The programming language will be interpreted, object oriented and have higher order programming. Perl 1 seems like a decent starting point, as it's yacc based, and 5000 lines of code. Later versions of Perl are too large to get a good understanding of the whole program in a short period of time. Perl also has the right license (GPL). Is Python out of the question for such a project, since it's not GPL? What other small languages can be used instead? How do I go about designing a small programming language in practice, using what I already know about compiler theory?"

11 of 89 comments (clear)

  1. Re:This is going to get me in trouble by SpaceLifeForm · · Score: 3, Interesting

    I can't argue your excellent points.
    I would add that (given time) that he may want to look at SmallTalk also.
    At least for inspiration.

    --
    You are being MICROattacked, from various angles, in a SOFT manner.
  2. Re:my $0.02 after a couple compiler classes by Profane+MuthaFucka · · Score: 4, Interesting

    Wow, all excellent ideas. I will add just one more: if you can possibly manage it, use a garbage collected language, or make use of the Boehm collector. If you're using a bison/yacc approach, the structure you're working in can make proper allocation and deallocation pretty complicated.

    --
    Fascism trolls keeping me up every night. When I starts a preachin', he HITS ME WITH HIS REICH!
  3. Try a compiled Lisp by Anonymous Coward · · Score: 1, Interesting
    You're building a compiler? A compiler requires a parser in the front and a code generator in the rear, tied together by some sort of syntax tree. Various language constructs will require transformations upon this syntax tree.

    As it happens, there already exists a class of languages that are strong at manipulating syntax trees, and at writing parsers. Several of them also support dynamic compilation, meaning that your language implementation can choose when to stop dicking with the syntax tree and instead compile it to native code.

    One such language is Steel Bank Common Lisp. SBCL runs on a number of platforms, chiefly Linux. It is a native-code compiler -- contrary to the usual myth about Lisps, SBCL doesn't even contain a Lisp interpreter; it compiles all expressions to native machine code. If that doesn't suit, consider CMUCL, from which SBCL is derived; it includes both an interpreter and compiler.

    Common Lisp is designed to be extended. You can customize the compiler by writing compiler-macros, which specify optimizations or special cases for compiled code. You can write your own parser, or extend the built-in syntax with read-macros -- so you can use a mix of built-in syntax and your own, or gradually move from the former to the latter while having a working code base at every step.

  4. Re:Holy cow! by xp · · Score: 5, Interesting

    Or why don't you design a meta-language using which other languages can be designed -- a language that remains completely extensible -- something like MDef.

  5. OCaml by Markus+Registrada · · Score: 2, Interesting
    OCaml was pretty much designed specifically for this sort of thing. Every part of the language system accepts plug-ins for your private variants. If you can bear to use a parsing language that is far more powerful than yacc, you might consider it.

    That said, interpreted languages are stupidly easy to do, so much so that it's hard to learn much -- they forgive every mistake until long after it's too late to fix it. (Witness Perl.) A high-performance language is a Good Thing not particularly because the programs run faster, although that's a nice side benefit, but rather because it enforces a fundamental rigor. There's no faking performance. When you face that kind of problem squarely, God speaks to you, and if you listen carefully enough you can learn deep truths.

    But that's not for everybody. Most people have had any desire to learn anything, deep or otherwise, beaten out of them. By all means go for easy comfort, the economic vitality of the nation depends on people like you. Forget I said anything. Garbage collection, bytecodes, regexps, yeah!

  6. Have A Look At LLVM by swagora · · Score: 3, Interesting

    Andreas(R):

    While you haven't provided enough details to comment in length, I do have some experience with what you're planning. A couple of years back I started a programming system (XPS) which was rather audacious in scope. After two years of working on it, I realized that I too needed a "back end" compilation system. I looked at various alternatives like GCC (too complex), research compilers (low quality), open source virtual machines like Mono (immature at the time). I was quite surprised when I looked at UIUC's LLVM compiler toolkit. I thought it would be just another half-baked compiler system from a University that never got finished when the Ph.D student left. Instead, I found a well designed, working, *toolkit* for compiler construction. While LLVM still lacks some features, its core is very solid and easily extensible. I've been working with it for a year now and its been quite a pleasure. Check it out at http://llvm.cs.uiuc.edu/

  7. Use a simple parser by Frans+Faase · · Score: 2, Interesting
    Consider to use a parser that is easier and more powerful than yacc and lex. Have a look at IParse, a simple, small interpretting parser. The whole source is in a single 92 Kbyte file.

    Forget about using a Virtual Machine. That is nice for speed, but it requires a lot of work. Beter make an interpretter that interprets the abstract program tree. I once started doing this for a JavaScript interpretter, but I never came to finish it.

  8. Uh. by warrax_666 · · Score: 2, Interesting

    All Turing Complete languages (which BF is) are equivalent. That means that any OO Turing Complete language can be implemented in a non-OO Turing Complete language. Case in point: C++ compiles to assembly/machine code which is certainly not OO.

    How to implement an OO language on a Turing Machine: Do something equivalent to how OO is implemented in C, ie. store "function pointers" (tape addresses) along with the data and use indirection to call those every time a virtual method is needed.

    --
    HAND.
    1. Re:Uh. by TwistedSquare · · Score: 2, Interesting

      You may be interested in this guy's thoughts on the matter of Turing Completeness.

  9. Battery life by tepples · · Score: 3, Interesting

    It mattered with 640KB of RAM at 20MHz

    How much CPU power is in inexpensive handheld devices again? I program for one that has 384 KB of RAM at 16.8 MHz. Wouldn't overclocking a handheld device drain the battery faster?

  10. Suggested reading/viewing by C+Joe+V · · Score: 2, Interesting
    Original poster: What, pray tell, is "compiler theory"? I'm a little perplexed that someone who knows something about "compiler theory" is asking Slashdot how to write a compiler. Most of the answers you will get here are from people who don't really know any "compiler theory".

    People suggesting Lisp/Scheme: Sure, these languages are extensible, but any extension of Lisp/Scheme you can create with macros or whatever will still look like Lisp/Scheme. If the point of this exercise is to design one's own language, one might not want to base the syntax on Lisp's.

    People suggesting using various parser generating tools (yacc, bison, antlr,...): A parser is not a compiler. I guess it's good to use the right tool for each part of the job, but parsing is really the least of his problems.

    If the point is designing a language for the fun of it, don't bother writing a compiler, write an interpreter. As others have pointed out, writing an interpreter in a sane high-level language like ML (or Scheme or, I suppose, even Java) is really dead easy if you're not worried about performance. It will probably distract you less from the language design than a compiled implementation would, and be easier to modify as you go along. Plus it will be portable, maybe. If your language becomes wildly popular (or if you feel that performance is the only thing standing between your language and wild popularity) you can write a better interepreter, or a JIT, or a compiler, later on when you're more sure of what you're doing.

    If the point is writing a compiler for the fun of it, why not do it all yourself from scratch (except maybe for the parser, for which you can use any of the excellent tools recommended by slashdotters if you don't want to do it by hand)? Compilers aren't really that hard (if you know "compiler theory"), especially if you compile to assembly or C and use an assembler or C compiler to go the rest of the way. Compiling to Parrot assembly/bytecode or JVM or .NET bytecode is also a possibility.

    Since you say your language has higher-order features, you will want to read up on implementation techniques for functional languages if you haven't already. For instance, check out the 90-minute Scheme-to-C compiler here.

    Best of luck,
    CJV