Parsing Algorithms and Resources?
Derek Williams asks: "I'm a senior majoring in computer engineering & computer science and I've been programming for about 7 years, mainly in C and Java. While I've had quite a few courses that delve into some of the deeper topics of programming (e.g. Object Oriented Design), I find that the majority of programs I write, both for work and elsewhere, involve parsing. Although I have no problem tackling these sorts of programs, I was wondering if there was some branch of computer science dedicated to the study of parsing. What books and websites out there are of interest to someone looking to learn more about parsing and algorithms relating to it?"
Parsing Techniques - A Practical Guide
Flexible Parsing
Workshop on The Evaluation of Parsing Systems
Robust Parsing
Parsing Resources
Probably the last one on that list would be the most useful starting place...
Start with Learning Perl, proceed with Programming Perl and finish off with Mastering Regular Expressions. Your parsing needs will be filled forever.
You've got a couple choices -- finding yourself a good regular expression library seems like a good start ;-) If you're looking to do something a little more interesting than just lexical analysis, check out the red dragon book (better known as Compilers: Principles, Techniques, and Tools by Aho, Sethi & Ullman). I used it in my compiler course and I can tell you that they cover all the various parsing techniques (recursive descent, LA, LALR, SLR, etc.) very well, along with some other stuff. They concentrate on Lex/Yacc as tools -- you may prefer to check out ANTLR, Terence Parr's parser generator. It can be targeted at a bunch of languages and can also produce tree walkers for when it comes time to use your parsed data.
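To give a feel for the simplest of those techniques, here is a minimal sketch of a recursive descent parser in Python. It is not taken from the dragon book or from ANTLR; the grammar (integer arithmetic with parentheses) and the names `tokenize`, `Parser` and `evaluate` are made up for illustration. Each grammar rule becomes one method, and the method bodies mirror the rule they implement.

```python
import re


def tokenize(text):
    # Integers and single-character operators; anything else
    # (including whitespace) is simply skipped in this sketch.
    return re.findall(r"\d+|[-+*/()]", text)


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        # Look at the next token without consuming it.
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        # Consume and return the next token (None at end of input).
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self):
        # expr := term (('+' | '-') term)*
        value = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        # term := factor (('*' | '/') factor)*
        value = self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            rhs = self.factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def factor(self):
        # factor := NUMBER | '(' expr ')'
        tok = self.advance()
        if tok == "(":
            value = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return value
        if tok is not None and tok.isdigit():
            return int(tok)
        raise SyntaxError(f"unexpected token: {tok!r}")


def evaluate(text):
    parser = Parser(tokenize(text))
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token: {parser.peek()!r}")
    return value
```

Calling `evaluate("2 + 3 * (4 - 1)")` walks `expr`, `term` and `factor` in turn, so multiplication binds tighter than addition without any explicit precedence table. A generator such as Yacc or ANTLR produces this kind of code (or a table-driven equivalent) from the grammar alone.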
For reading about parsing (and regular expressions), the O'Reilly book "sed & awk" is a good start.
:)
Building on that (and going further), you'll find a lot of information in "Programming Perl".
And if you're writing in C, again the O'Reilly book "lex & yacc".
http://py-howto.sourceforge.net/regex/regex.htm
http://sitescooper.org/tao_regexps.html
Hey, I can't help it that I find the books by that publisher to be exactly what I'm looking for in my programming needs.
Genius doesn't work on an assembly line basis. You can't simply say, "Today I will be brilliant."
The best tools to build parsers and manipulate parsed syntax trees are functional languages, such as OCaml (with its streams parsers, camlp4, ocamllex/ocamlyacc, etc.), or SML, Haskell, etc.
Of course, if you were a LISPer, you'd know that although you have lots of well-known tools to build new parsers, such as Meta or Zebu, the best thing to do about a parser is not to write it, but rather to reuse the builtin extensible universal parser, READ, and its extensible universal unparser, WRITE.
If you spend most of your time writing parsers, you're not just using the wrong tools, you're also using the wrong approach.
Just my .2 mg of e-gold worth...
-- Faré @ TUNES.org
Reflection & Cybernet
Well...the Dragon book for starters, as mentioned earlier. That's probably the ur-source for most of the theory behind the magic. Makes my head hurt, though.
Terence Parr's book, Practical Computer Language Recognition and Translation (out of print). His doctoral dissertation is a useful thing too (try the Purdue University library).
comp.compilers is another useful resource. It's archived at http://compilers.iecc.com.
Allen Holub's Compiler Design in C is a classic.
The ACM's SIGPLAN ("Special Interest Group On Programming Languages") and its journal, SIGPLAN Notices, are both fine resources. So is ACM Transactions on Programming Languages and Systems.
Don't forget the IEEE as well.
Not to mention Abelson and Sussman: Structure and Interpretation of Computer Programs.
The garbage collection page is a good source for information on memory management and garbage collection.
Your university's library is another good resource.
Well. That should keep you out of trouble.
N. --
I'm surprised that no one has mentioned lex and yacc. Guys, Perl != generic parser. If you're looking to delve into the science of scanners and parsers, from tokenizing to LR(1) grammars, look at lex and yacc. Their Java equivalents, JLex and CUP, are just as good for that language (and a whole lot easier to use, I might add). O'Reilly has a small book on using them, and it would probably be a good introduction to the science of parsing, not to mention the foundation of compilers.
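For anyone who hasn't seen what the tokenizing half looks like, here is a rough Python sketch of the kind of scanner that lex or JLex would generate from a list of patterns. It is not lex syntax and not generated output; the token names (`NUMBER`, `IDENT`, `OP`) and the `tokenize` function are assumptions chosen for illustration. The rules are tried in order, like rules in a lex specification.

```python
import re

# Each rule is (token name, regular expression). Order matters:
# the first alternative that matches at a position wins.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[-+*/=()]"),
    ("SKIP", r"\s+"),       # whitespace is recognized, then discarded
    ("MISMATCH", r"."),     # any other character is an error
]

# Combine the rules into one pattern with a named group per rule.
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def tokenize(text):
    """Return a list of (kind, lexeme) pairs for the input text."""
    tokens = []
    for match in MASTER.finditer(text):
        kind = match.lastgroup
        if kind == "SKIP":
            continue
        if kind == "MISMATCH":
            raise SyntaxError(
                f"unexpected character {match.group()!r} "
                f"at offset {match.start()}"
            )
        tokens.append((kind, match.group()))
    return tokens
```

Feeding it `"x = 42 + y1"` yields a flat stream of tagged tokens, which is exactly what a yacc- or CUP-generated parser expects to consume on the other side.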