Slashdot Mirror


XML and Perl

davorg writes "One of Perl's great strengths is in processing text files. That is, after all, why it became so popular for generating dynamic web pages -- web pages are just text (albeit text that is supposed to follow particular rules). As XML is just another text format, it follows that Perl will be just as good at processing XML documents. It's therefore surprising that using Perl for XML processing hasn't received much attention until recently. That's not to say that there hasn't been work going on in that area -- many of the Perl XML processing modules have long and honourable histories -- it's just that the world outside of the Perl community doesn't seem to have taken much notice of this work. This is all set to change with the publication of this book and O'Reilly's Perl and XML." Read on to see how well Davorg thinks this book introduces XML text processing with Perl to the wider world.

XML and Perl
Author: Mark Riehl, Ilya Sterin
Pages: 378
Publisher: New Riders
Rating: 8
Reviewer: Davorg
ISBN: 0735712891
Summary: Good introduction to processing XML with Perl

XML and Perl is written by two well-known members of the Perl XML community. Both are frequent contributors to the "perl-xml" mailing list, so there's certainly no doubt that they know what they are talking about, which is always a good thing in a technical book.

The book is made up of five sections. The first section has a couple of chapters which introduce you to the concepts covered in the book. Chapter one introduces you separately to XML and Perl and then chapter two takes a first look at how you can use Perl to process XML. This chapter finishes with two example programs for parsing simple XML documents.

Section two goes into a lot more detail about parsing XML documents with Perl. Chapter three looks at event-driven parsing, using XML::Parser and XML::Parser::PerlSAX to build example programs, before going on to talk in some detail about XML::SAX, which is currently the state of the art in event-driven XML parsing in Perl. It also looks at XML::Xerces, a Perl interface to the Apache Software Foundation's Xerces parser. Chapter four covers tree-based XML parsing and presents examples using XML::Simple, XML::Twig, XML::DOM and XML::LibXML. In both of these chapters the pros and cons of each module are discussed in detail, so that you can easily decide which solution to use in any given situation.

Section three covers generating XML documents. In chapter five we look at generating XML from text sources using simple print statements and also the modules XML::Writer and XML::Handler::YAWriter. Chapter six looks at taking data from a database and turning that into XML using modules like XML::Generator::DBI and XML::DBMS. Chapter seven looks at miscellaneous other input formats and contains examples using XML::SAXDriver::CSV and XML::SAXDriver::Excel.

Section four covers more advanced topics. Chapter eight is about XML transformations and filtering, concentrating on using XSLT to transform XML documents. It covers the modules XML::LibXSLT, XML::Sablotron and XML::XPath.

Chapter nine goes into detail about Matt Sergeant's AxKit, the Apache XML Kit, which allows you to create a website in XML and automatically deliver it to your visitors in the correct format.

Chapter ten rounds off the book with a look at using Perl to create web services. It looks at the two most common modules for creating web services in Perl - XML::RPC and SOAP::Lite.

Finally, section five contains the appendices, which provide more background to the introductions to XML and Perl in chapter one.

There was one small point that I found a little annoying when reading the book: each example was accompanied by a sample of the XML documents to be processed, together with both a DTD and an XML Schema definition for the document. This seemed to me to be overkill. Did we really need both DTDs and XML Schemas for every example? I would have found it less distracting if one (or even both) of these had been moved to an appendix.

That small complaint aside, I found it a useful and interesting book. It will be particularly valuable to Perl programmers (like myself) who will increasingly be expected to process (and provide) data in XML formats.


4 of 138 comments

  1. It's a great book about a terrific subject by PhysicsGenius · Score: -1, Troll

    Perl and XML. XML and Perl. It's a marriage made in heaven. One of them uses a cryptic, machine-readable-only encoding to concisely depict data and programs. The other is a markup language.

  2. If Larry Wall gave his wife a gift... by digitalgimpus · Score: -1, Troll

    If Larry Wall gave his wife a gift...

    Would it be a Perl Necklace? ;)

  3. Formalised features of Perl (in this book?) by Rat+Tank · Score: -1, Troll

    (Moderators: skip to note at the end before you moderate this ;) )
    I've programmed for some time in Perl, but at no time has this been anything to do with the CompSci degree I'm studying; no course even mentions it. Why is this? Perl doesn't seem to have much respect in educated programming circles, and I think this is why:
    Its type system is not entirely sound. Inference upon the typing rules (which aren't formally stated anywhere; I had to derive them from the source code) can lead to propositional contradictions.
    It is most certainly _not_ Turing complete (trivially provable); hence not all algorithms can be implemented in it that you could with a Turing complete language like Java or C(++).
    Its reference counting system of garbage collection can sometimes result in memory leaks, as opposed to the more thorough graph traversal employed in other languages.
    IMPORTANT: Please, only reply to this post or moderate it if you actually understand the principles of compsci that I'm arguing about. I've been smacked down by ignorant kiddies before, and would much prefer to see more reason in the future.

  4. Perl6 is a mistake by Anonymous Coward · Score: -1, Troll
    I've been using perl pretty much constantly since the Pink Camel, and believe me, Perl 5 is an extremely good language for quick scripting things. That's what it was designed for. Sure, you can do big projects in it, but it's not exactly ideal. Recently I've started using Ruby as well, and I intend to move my department over to it instead of wasting time with Perl 6.

    One of the goals of Perl 6 is to make non-trivial projects possible. That's good. The way it's being done is bad. Perl was once a lightweight, extremely flexible language. Now it's become a huge ugly monster. People wanted OO, so a nasty hack was bolted on top to allow some semblance of it. Now this nasty hack is being expanded. Sure, the code's different, but the basic form is the same. Kludge upon kludge upon kludge; I'd much rather have a nice, clean, pure language (and not one with loads of irritating whitespace, thank you very much).

    The same goes for the syntax. All the switching between $, @ and % is really irritating (ask a newbie how to get at the length of the keys array of a hash inside a hash, for example), and the changes proposed for 6 are just making this worse -- it seems that Larry, in his infinite wisdom, wants to prefix every data type with a different hard-to-type character. Perl was only designed for the three data types, and adding more is a mess.

    Perl 6 is a complete rewrite, but it keeps all the mess which has accumulated over the previous versions. This is not good. Sure, my const int $var = 27; may look neat (in the same way that, say, Pascal does), but $var isn't entirely constant, or entirely an integer, it's just a hack which makes it sort of behave like one. The whole thing is an exercise in pseudo-computer science masturbation with little real purpose except to please the managers who dislike the one thing that makes Perl special.

    On a similar note are regexes. I'm an avid fan of regular expressions simply because a nondeterministic finite automaton is far more flexible than linear code. However, Larry must have been smoking that cheap $2 crack when he wrote this. Does he want Perl 6 to be flex or something?

    I won't be going on to use 6. It's a nice idea, but it's completely unnecessary. It won't make large projects any easier to manage (the language is still, at heart, an almighty hack -- an impressive one, but still a hack). It won't make OO any cleaner. It won't make development any faster. To put it bluntly, Perl scripts will still look less beautiful than our friend Mr Goatse. I'd prefer to use a language which has always been pure synthesis of science and engineering, not some half-baked imposter.

    Perl 6 will be nice, but I'm guessing it will be the end of Perl. It can't do what it wants to do whilst still being based upon a nasty mess. There are now other options, which provide all of Perl's power and none of the mess. Sorry, but BSD^W Perl is dying. Larry is buggering it up the ass without lubricants, just like Shoeboy is doing to Larry's daughter.