XML and Perl
XML and Perl is written by two well-known members of the Perl XML community. Both are frequent contributors to the "perl-xml" mailing list, so there's certainly no doubt that they know what they are talking about, which is always a good thing in a technical book.
The book is made up of five sections. The first section has a couple of chapters which introduce you to the concepts covered in the book. Chapter one introduces you separately to XML and Perl and then chapter two takes a first look at how you can use Perl to process XML. This chapter finishes with two example programs for parsing simple XML documents.
Section two goes into a lot more detail about parsing XML documents with Perl. Chapter three looks at event-driven parsing, using XML::Parser and XML::Parser::PerlSAX to build example programs before going on to talk in some detail about XML::SAX, which is currently the state of the art in event-driven XML parsing in Perl. It also looks at XML::Xerces, which is a Perl interface to the Apache Software Foundation's Xerces parser. Chapter four covers tree-based XML parsing and presents examples using XML::Simple, XML::Twig, XML::DOM and XML::LibXML. In both of these chapters the pros and cons of each of the modules are discussed in detail so that you can easily decide which solution to use in any given situation.
Section three covers generating XML documents. In chapter five we look at generating XML from text sources using simple print statements and also the modules XML::Writer and XML::Handler::YAWriter. Chapter six looks at taking data from a database and turning that into XML using modules like XML::Generator::DBI and XML::DBMS. Chapter seven looks at miscellaneous other input formats and contains examples using XML::SAXDriver::CSV and XML::SAXDriver::Excel.
Section four covers more advanced topics. Chapter eight is about XML transformations and filtering. This chapter covers using XSLT to transform XML documents. It covers the modules XML::LibXSLT, XML::Sablotron and XML::XPath.
Chapter nine goes into detail about Matt Sergeant's AxKit, the Apache XML Kit which allows you to create a website in XML and automatically deliver it to your visitors in the correct format.
Chapter ten rounds off the book with a look at using Perl to create web services. It looks at the two most common modules for creating web services in Perl - XML::RPC and SOAP::Lite.
Finally, section five contains the appendices which provide more background on the introductions to XML and Perl from chapter one.
There was one small point that I found a little annoying when reading the book: each example was accompanied by a sample of the XML documents to be processed together with both a DTD and an XML Schema definition for the document. This seemed to me to be overkill. Did we really need both DTDs and XML Schemas for every example? I would have found it less distracting if one (or even both) of these had been moved to an appendix.
That small complaint aside, I found it a useful and interesting book. It will be very useful to Perl programmers (like myself) who will increasingly be expected to process (and provide) data in XML formats.
You can purchase XML and Perl from bn.com.
The whole point of XML is that it is NOT just a string of text. That's why Perl isn't particularly any better than Java or C++ or VB or whatever for processing XML - you're going to be using a library that gives you SAX or DOM access to your XML, and you'll never need to know that there's a text representation being serialized onto some wires somewhere.
One new, and cool, Perl XML module that people might not know about is Petal (PErl Template Attribute Language).
It is an implementation of the Zope TAL (Template Attribute Language) specification and it basically allows you to create XML templates where all the templating commands are just attributes of existing tags.
This allows things like XHTML templates which are very WYSIWYG friendly since the editors don't do anything with attributes that they don't know about.
Check out MKDoc a mod_perl CMS
Would be nice to have a book with more than just one chapter on web services.
You might be interested in Programming Web Services with Perl then.
Radioactive cats have 18 half-lives.
How effective were the examples? How easy to read and understand were the general concepts? Were the descriptions of libraries and APIs clear? Was the writing generally readable?
Would this book even make a good reference?
Jeez, anyone want to follow up the post with a real review?
Indeed, the Perl-only XML libraries are quite slow. I believe most of the quality Perl XML handling is done by modules that use C libraries to do the grunt work. However, if the data in the XML itself is text data, then of course, Perl and XML are a good match. Add SOAP and mod_perl into the mix, and you've got some very nifty tools.
(For reference, see this rant by the brilliant net.kook Erik Naggum. The most quotable bit, for the lazy among you, is )
I think one of the main reasons Perl and XML aren't generally used together is because Perl isn't object oriented in the same way that Java and C# are. I know that OO concepts have been bolted on to Perl in the same way that OO was bolted on to C++ and in my opinion with similar results (i.e., kludge-fest). It's very natural in Java to parse an XML doc and get an object, while it's more natural to parse a log file or CSV file with Perl.
Well, perhaps not your soul, but your Perl code just reflects the way you think to a greater extent than other languages. This isn't something that's done underhandedly; it is well advertised in every posting in c.l.perl and the Camel book, and every other book about Perl. Which is that Perl is not at all orthogonal: TMTOWTDI (there's more than one way to do it). If you want to be rigorous and declare everything and not have your typos become references automatically, you "use strict" and your magic line is "#!/usr/bin/perl -w". If not, well, Perl allows you to do that too. If you want objects, you can do that; if not, not.
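To make the "use strict" point concrete, here is a minimal sketch using only core Perl. The variable names are made up for illustration. The typo is compiled inside a string eval purely so the script can report the error instead of refusing to start; in a real program strict would simply stop compilation.

```perl
#!/usr/bin/perl -w
use strict;

my $total = 10;

# "$totl" is a deliberate typo for "$total". Under strict, an undeclared
# variable is a compile-time error rather than a silently created global.
# The string eval inherits "use strict" from the enclosing scope.
my $ok = eval q{ $totl = $total + 1; 1 };

if ($ok) {
    print "compiled without complaint\n";
}
else {
    print "strict caught the typo: $@";
}
```

Without the "use strict" line the same assignment would quietly create a package variable called $totl, and -w would at most warn about it.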
// and tr and split I get along just fine.
It is possible to write quality code in Perl. Just because the language allows you to not do so isn't its fault. It doesn't stop you from doing it, because that'd stop you from doing brilliant things.
To address some specific things you mentioned, you can do full-fledged exception handling in Perl if you want to (with eval and specific modules), or, you know, not. And I'm not familiar with the false positive matches in regexps (perhaps you're referring to some famous problem). But if a regexp doesn't do what you want it to, isn't it wrong? Between
As you're also aware, most Comp Sci courses fawn over Pascal, a VERY formalized language. However, it's not mentioned much past education circles (and Apple aficionados).
In practice, reference counting doesn't seem to lead to memory leaks as you describe. And, I would argue it is much more efficient than Java's method.
PERL is an excellent SCRIPTING language. Larry Wall describes it as a "glue" language. XML is a good thing to glue together. It's perfect for that. Every tool has its purpose; push any too far, and you start abusing it.
Trying to find the quote from Larry Wall. I think it goes something like this: "Perl did easy things easily and made impossible things doable."
To Copy from One is Plagiarism; To Copy from Many is Research.
Ah no, see, you forgot to read the first line:
"One of Perl's great strengths is in processing text files."
Perl is good at handling text files. XML is a text file. Therefore, Perl is good at handling XML.
As opposed to:
My pasta maker is good at making pasta. Pasta is a type of food. Ice-cream is also food. Therefore, my pasta maker is good at making ice-cream.
Does that help?
Score:-1, Funny
I see the table of contents explained in paragraph form. And then one complaint about the organization of the book. And then I expect to read the review, but it's already on to "you can buy this book here", and user comments.
I know complaining about slashdot stories is like shooting those proverbial barreled fish, but sheesh.
But if people are interested in getting a good price rather than putting a commission into your pocket (and contributing to a company that abuses software patents), maybe they should order it from Bookpool instead, for $3 less than Amazon. (I don't have any affiliation with Bookpool.)
Actually, /. used to link to Amazon, and had an affiliate program. Once Amazon started enforcing their one-click patent, and the Amazon boycott began, /. switched to Fatbrain (which was bought by BN).
I'm seeing a lot of comments that perl doesn't have any particular strengths when dealing with XML. A good module people should check out is XML::Simple. Basically, it automagically turns XML into a nested data structure, and automagically turns a nested data structure into XML. The great thing about it is that you just make a single API call, and then directly access the data from there without having to learn anything more complicated. Definitely not an end-all solution, but it definitely handles the common case wonderfully, and has quite a few handy options to allow more fine-tuned control.
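A rough sketch of what that single call looks like, assuming XML::Simple is installed from CPAN (it is not a core module). The sample document and its element names are invented for illustration. ForceArray and KeyAttr are set explicitly so the shape of the result does not depend on how many elements happen to be present.

```perl
use strict;
use warnings;
use XML::Simple;   # CPAN module; exports XMLin and XMLout

# Invented sample data, not taken from the book.
my $xml = <<'XML';
<library>
  <book isbn="0-000-00000-0">
    <title>Example Title</title>
    <author>A. Author</author>
  </book>
  <book isbn="0-000-00000-1">
    <title>Another Title</title>
    <author>B. Author</author>
  </book>
</library>
XML

# One call turns the document into nested hashes and arrays.
# ForceArray keeps <book> as a list even if there is only one;
# KeyAttr => [] switches off folding of lists into hashes.
my $data = XMLin($xml, ForceArray => ['book'], KeyAttr => []);

for my $book (@{ $data->{book} }) {
    print "$book->{isbn}: $book->{title} by $book->{author}\n";
}

# And one call turns the structure back into XML text.
my $round_trip = XMLout($data, RootName => 'library');
```

By default XMLout writes scalar values as attributes, so the round-tripped text is not guaranteed to match the input byte for byte; that is one of the places where the extra options come in.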
Except that your syllogism is faulty, whereas his is not.
His:
1. (from earlier in his post) Perl is well suited for processing all text formats.
2. XML is a text format.
3. Therefore, Perl is well suited for processing XML.
Yours:
1. Your pasta maker is good at making pasta.
2. Pasta is a type of food.
3. Therefore, your pasta maker is good at making all types of food (for example, ice cream).
You can see that he went from general to specific, whereas you went from specific to general. He argues that being able to do all things in a given set (process all text formats) gives the ability to do one of the things in that set (process a particular text format). You argue that being able to do one thing in a set (make a particular food) gives the ability to do all things in the set (make all foods).
You could save your argument by changing your middle point to be "All foods are a type of pasta," and then your conclusion becomes trivially true. But you'd also have to get everyone to agree that ice cream is pasta.
--
Promoting critical thinking since 1994.
The reviewer is correct, Perl is a good tool for slamming and jammin' text, including XML. What I'm not so sure of is the quote "It's therefore surprising that using Perl for XML processing hasn't received much attention until recently."
I mean one need only scroll down the extensive list of CPAN Modules to see well over 50, as well as many sites/authors devoting time, energy and resources.
Similarly, I would point out some press modules supporting web services via XML, such as SOAP::Lite as far back as 02/26/01 and XML-RPC also in '01 -- or O'Reilly's own XML.com with articles such as "Processing XML with Perl" written shortly after the turn of the millennium.
Point is, though I personally love Perl, blatant plugs such as "... it's just that the world outside of the Perl community doesn't seem to have taken much notice of this work. This is all set to change with the publication of this book and O'Reilly's Perl and XML." don't inspire confidence in the reviewer's objectivity.
--- have you healed your church website?
check it out. http://axkit.org/
"Apache AxKit is an XML Application Server for Apache. It provides on-the-fly conversion from XML to any format, such as HTML, WAP or text using either W3C standard techniques, or flexible custom code. AxKit also uses a built-in Perl interpreter to provide some amazingly powerful techniques for XML transformation."
picture Cocoon for perl. using perl for xsp pages and doing pipeline transformations on xml. great stuff.
"I once rewrote a Perl parser in Java and went from 9hrs to 45mins"
... However, Java is just as goddamn interpreted as Perl, if not more so! Perl compiles to *native* byte-code prior to execution, unless you are talking about eval'd strings, whereas Java sits in non-native byte-code that has to be interpreted real-time by the VM. Best case: you have a good just-in-time compiler that pulls Java up to even with Perl (that is, compiled immediately prior to run-time into native byte-code).
Well, shit. I once rewrote a Perl parser in *Perl* and went from 9hrs to 45mins. What the hell kind of flame-bait shit is this!?
It is true that extremely well-written C code can outperform perl code at anything. It is also true that for things that perl is made for (like ripping through tons of text-data), a typical Perl program will *most likely* do it better than a typical C program, simply because it is making use of more optimized underlying algorithms (even though the actual execution structure is slightly more bloated than C... double-dereferencing pointers, compile-time immediately before run-time, etc).
Also, Java has all the same disadvantages with respect to C... that is more insulation from the *actual* memory (no such thing as a real pointer in either, garbage-collection, etc).
Anyway, bottom-line is this. If what you say is at all true, then you had a shittily-written Perl program. I promise you that I can write just as shitty a program in Java... does that mean that we should trash Java?!?!? Abso-f*cking-lutely not! I'll do you one better, too: I'll write just as shitty and slow of a parser in Java that doesn't even *look* that bad to someone who doesn't understand the subtleties behind such simple abstractions as strings, lists and arrays.
I'm very serious with what I said originally. I have, in fact, taken a Perl parser (a super-light-weight XML parser, actually) and reduced the parse-time by several orders of magnitude. The idiot who wrote it originally (myself) went walking through the string or stream looking for '<'s (with a regexp), at the highest level. It is *terribly* slow to strip leading characters off of a long string in Perl (I'm pretty sure that it copies the whole goddamn string, minus those 10 (or however many) characters on the front). I made a *very* simple change, namely this:
# split on positive lookahead assertion of a '<'
# then we just deal individually with blocks of text that all start
# with a '<'... should save time
my @xml = split(/(?=<)/,' '.$xml);
shift @xml;
And, you'll note that I f*cking commented it (something which people just don't seem to understand when they trash perl). Bang! Many orders of magnitude in speed improvement. Simple.
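For anyone who wants to see what that split actually produces, here is a small self-contained sketch. It assumes the delimiter is the '<' that opens each tag, and the sample document is made up.

```perl
use strict;
use warnings;

# Made-up sample input.
my $xml = '<doc><item id="1">first</item><item id="2">second</item></doc>';

# The lookahead (?=<) matches the empty position just before each '<',
# so split cuts the string there without consuming any characters.
my @chunks = split /(?=<)/, ' ' . $xml;

# The first field is only the prepended ' ' (plus any text that came
# before the first tag), so it is discarded.
shift @chunks;

# Every remaining chunk now starts with '<': a tag followed by whatever
# text trails it, up to the next tag.
print "$_\n" for @chunks;
```

Each chunk can then be handled on its own, which avoids repeatedly stripping characters off the front of one long string.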
Anyway, pull your head out of your ass.
:Wq
Not an editor command: Wq
Since my pasta maker is good at making pasta, and ice cream and pasta are both foods, it follows my pasta maker will be just as good at making ice cream.
That only correlates if ice cream is a type of pasta, because XML is a text format.
This is a lot more like saying "since my pasta maker is good at making Ziti, Rigate, Macaroni, etc., all pastas really, and Spaghetti is a type of pasta, my pasta maker should be good at making Spaghetti."
Information wants to be anthropomorphized.