Slashdot Mirror


Parsing Algorithms and Resources?

Derek Williams asks: "I'm a senior majoring in computer engineering & computer science and I've been programming for about 7 years, mainly in C and Java. While I've had quite a few courses that delve into some of the deeper topics of programming (e.g. Object Oriented Design), I find that the majority of programs I write, both for work and elsewhere, involve parsing. Although I have no problem tackling these sorts of programs, I was wondering if there was some branch of computer science dedicated to the study of parsing. What books and websites out there are of interest to someone looking to learn more about parsing and algorithms relating to it?"

22 of 52 comments (clear)

  1. A quick search on Google for parsing returns... by arb · · Score: 4, Informative
  2. Compilers and Parsing... by hackwrench · · Score: 3, Interesting

    Compilers and parsing seem to go hand-in-hand. With that in mind, here's a link to Jack Crenshaw's How to Build a Compiler

  3. A quick search on Google by arb by kasal · · Score: 3, Funny
    Hallo all and arb,
    you should at least look to the results of the search. One of the links is about compression algorithms and other two are about natural language parsing.

    The following two are the only ones relevant:

    Stepan Kasal

  4. duh! welcome to the 21st century by jilles · · Score: 2

    In short, unless you have a very good reason (i.e. you know what you are doing) to write your own parsers, you should be using a parser generator (use google with the right keywords and you will find plenty of resources) or regular expressions (for the simple stuff, again: google is your best friend).

    --

    Jilles
  5. Read books on Compiler Design by soegoe · · Score: 2, Interesting

    Try some books on Compiler Design, especially the first chapters about analysis. They should provide lots of information about parsing. One good book is Modern Compiler Design .

    1. Re:Read books on Compiler Design by speedy1161 · · Score: 2, Interesting

      This book is heavy on the theory and such, a bit light on examples, but since they primarily use (f)lex and (b)yacc, finding examples if you get stuck isn't too hard. They dedicate a lot of pages to parsing and parser theory.

  6. Book recommendations by Evil+Attraction · · Score: 3, Informative

    Start with Learning Perl, proceed with Programming Perl and finish off with Mastering Regular Expressions. Your parsing needs will be filled forever.

  7. Red Dragon Book by blackcoot · · Score: 5, Informative

    You've got a couple choices -- finding yourself a good regular expression library seems like a good start ;-) If you're looking to do something a little more interesting than just lexical analysis, check out the red dragon book (better known as Compilers: Principles, Techniques, and Tools by Aho, Sethi & Ullman. I used it in my compiler course and I can tell you that they hit all the various parsing techniques (recursive descent, LA, LALR, SLR, etc.) very well, along with some other stuff. They concentrate on Lex/Yacc as tools -- you may prefer to check out ANTLR -- Terrence Parr's parser generator. It can be targeted at a bunch of languages and can also produce tree walkers for when it comes time to use your parsed data.

    1. Re:Red Dragon Book by PD · · Score: 2

      Imagine a program, where there is an embedded scripting language. Imagine that the scripts can be attached to various objects to tie the system together. Imagine that these scripts are interpreted by a parser, and all have to run at once. Imagine that you have just found a use for a reentrant parser.

    2. Re:Red Dragon Book by PD · · Score: 2

      Yes, you're misunderstanding me. A nice simple way of writing an interpreter is to use a parser like Bison. Write Bison rules to recognize all the statements in your language. When you've got a looping construct, just reset your input stream to the top of the loop and continue parsing. Each rule executes code that does something.

      Take a look at the ubiquitous calculator example and imagine that expanded to include all the regular language features that you'd expect to see.

  8. sed and regular expressions by den_erpel · · Score: 2, Informative

    For reading about parsing (and regular expressions), the book of O'Reilly "Sed and Awk" is a good start.
    Based (and extended), you'll find a lot of information in "Programming Perl".
    And if you're writing in C, again the O'reilly book "lex and awk".

    http://py-howto.sourceforge.net/regex/regex.html
    http://sitescooper.org/tao_regexps.html

    Hey, I can't help it, that I find the books by that publisher exactly what I'm looking for in my programming needs :)

    --
    Genius doesn't work on an assembly line basis. You can't simply say, "Today I will be brilliant."
  9. How to deal with parsing by Far� · · Score: 3, Informative
    There is a vast literature on parsing, and a lot of (mostly useless) parsing theory is a large part of the cursus in many CS departments. Check CiteSeer to get a glimpse of how much literature there is.

    The best tools to build parsers and manipulate parsed syntax trees are functional languages, such as OCaml (with its streams parsers, camlp4, ocamllex/ocamlyacc, etc.), or SML, Haskell, etc.

    Of course, if you were a LISPer, you'd know that although you have lots of well-known tools to build new parsers, such as Meta or Zebu, the best thing to do about a parser is not to write it, but rather to reuse the builtin extensible universal parser, READ, and its extensible universal unparser, WRITE.

    If you spend most of your time writing parsers, you're not just using the wrong tools, you're also using the wrong approach.

    Just my .2 mg of e-gold worth...

    --

    -- Faré @ TUNES.org
    Reflection & Cybernet

  10. compiler design books and resources. by ncarey · · Score: 3, Informative

    Well...the Dragon book for starters, as mentioned earlier. That's probably the ur-source for most of the theory behind the magic. Makes my head hurt, though.

    Terence Parr's book, Practical Computer Language Recognition and Translation (out of print). His doctoral dissertation is a useful thing too (try the Purdue University library).

    comp.compilers is another useful resource. It's archived at http://compilers.iecc.com.

    Alan Holub's Compiler Design in C is a classic.

    The ACM's SIGPLAN ("Special Interest Group On Programming Languages") and it's journal SIGPLAN Notices of the ACM are all fine resources. So is ACM Transactions on Programming Languages and Systems.

    Don't forget the IEEE as well.

    Not to mention Abelman and Sussman: Structure and Interpretation of Computer Programs.

    The garbage collection page is a good source for information on memory management and garbage collection.

    Your university's library is another good resource.

    Well. That should keep you out of trouble.

    --
    N. --
  11. Reverse parsing by ka9dgx · · Score: 3, Interesting
    Things get really interesting when the parser is set up to preserve context so that it can be run backwards, spitting out source from the symbol tables and code segment tokens. You need to preserve whitespace and declaration order, but it's definitely feasible for a language that doesn't use tons of Macros (sorry C/C++ programmers) like Pascal, Basic, etc.

    --Mike--

  12. Lex and Yacc by Aniquel · · Score: 2, Informative

    I'm surprised that no one has mentioned lex and yacc. Guys, perl != generic parser. If you're looking to delve into the science of scanners and parsers, from tokenizing to LR1 grammars, look at lex and yacc. Their java equivalents, JLex and CUP, are just as good for that language (and a whole lot easier to use, I might add). OReilly has a small book on using them, and it would probably be a good introduction to the science of parsing - not to mention the foundation of compilers.

  13. Compiler Books, data types, combinator parsing by Tom7 · · Score: 2

    While it's kind of scary that you're a senior with 7 years of experience and don't know this, it should be easy to catch up by reading the beginning of a compilers book. Older compiler books tend to concentrate on parsing a lot, since parsing used to be "most" of the work involved in compiling. Actually, all the theory on how to conserve the last bit of memory using some kind of wacky table is pretty useless these days, but it's there.

    Though perl's built-in regexp support makes its syntax more brief for matching regular grammars, many grammars are not regular (The grammars of most programming languages are context free or worse!). For those grammars, a parser generator like bison can make life a bit easier.

    Then, manipulating parse trees in a language like C or perl is pretty painful once you've used datatypes and pattern matching in a language like SML, O'Caml, or Haskell. If you're going to be doing a lot of work on the parsed data, and its structure is not trivial, give these a shot.

    Finally, about the only thing that's excited me about parsing for years is the idea of parsing combinators: writing parsers that look like the grammar IN the language itself. They can allow the parsing of arbitrary grammars, and don't require resorting to external tools. There's a good beginning explanation in Paulson's "ML for the Working Programmer", though I can't find a tutorial online... maybe it's time to write one!

  14. Not... by Tom7 · · Score: 2


    Your needs for parsing regular (or regular-ish) grammars in perl will be filled. Unfortunately, regular expressions are not always the right tool for the job (see the gargantuan e-mail address parser in the back of the MRE book for an example) because many grammars are not regular. Try parsing (say) C with perl! Also, perl has poor facilities for manipulating such parse trees, especially compared to a language with datatypes and pattern matching like SML or O'caml.

    Perl makes parsing flat data brief (though regular expression libraries in other high level languages make it just as easy, if a little more verbose), but it is not by any stretch the be-all and end-all of parsing.

  15. Java Book on Parsers by The+Slashdolt · · Score: 2

    I have found this book very good. I am a Java programmer and I wanted to understand parsing from the bottom up, before using something like ANTLR. This books gives a great foundation with code to explain it all.

    --
    mp3's are only for those with bad memories
  16. Branch of computer science dedicated to the study by geophile · · Score: 2

    The following is not meant as a flame.

    This is an odd question coming from a cs/ce senior. What, may I ask, have you been studying, when you should have been in a compiler course?

    I've been out of school for a while, but it is inconceivable that compilers would now be an optional part of any CS curriculum.

  17. A few tools do it all... by DrCode · · Score: 2

    While there's lots of theory, I've been living for years off the same small set of tools:

    For C/C++, there's YACC/LEX or BISON/FLEX.

    For Java, there's CUP/JFLEX.

  18. Other books by Pseudonym · · Score: 2

    Some have mentioned the dragon book (Aho, Ullman et all) and the tiger book (Appel's "Modern Compiler Implementation"). Both excellent references.

    If you're at all mathematically minded, you should look into a good book on formal languages, such as Harrison's "Introduction to Formal Language Theory", or R. Nigel Horspool's "LR Parsing". (Caution: Don't buy these. Look for them in your local university library.)

    Once you have the basics, you should look into the work of Mark Hopkins. The guy is a genius, and understands more of the theory of parsing than anyone I've ever seen, and has invented some techniques which far surpass the "state of the art" of lex and yacc.

    --
    sub f{($f)=@_;print"$f(q{$f});";}f(q{sub f{($f)=@_;print"$f(q{$f});";}f});
  19. dumb question by khuber · · Score: 2
    Try using google.com. sheesh

    -Kevin