XML and Perl
XML and Perl is written by two well-known members of the Perl XML community. Both are frequent contributors to the "perl-xml" mailing list, so there's no doubt that they know what they are talking about, which is always a good thing in a technical book.
The book is made up of five sections. The first section has a couple of chapters which introduce you to the concepts covered in the book. Chapter one introduces you separately to XML and Perl and then chapter two takes a first look at how you can use Perl to process XML. This chapter finishes with two example programs for parsing simple XML documents.
Section two goes into a lot more detail about parsing XML documents with Perl. Chapter three looks at event-driven parsing, using XML::Parser and XML::Parser::PerlSAX to build example programs before going on to talk in some detail about XML::SAX, which is currently the state of the art in event-driven XML parsing in Perl. It also looks at XML::Xerces, a Perl interface to the Apache Software Foundation's Xerces parser. Chapter four covers tree-based XML parsing and presents examples using XML::Simple, XML::Twig, XML::DOM and XML::LibXML. In both chapters the pros and cons of each module are discussed in detail, so you can easily decide which solution to use in any given situation.
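For readers new to the distinction, here is a rough sketch of event-driven versus tree-based parsing. It uses Python's standard library purely as a stand-in for the Perl modules the book covers (xml.sax in place of a SAX parser such as XML::SAX, xml.etree.ElementTree in place of a tree builder); the sample document and the function names are made up for illustration.

```python
import xml.etree.ElementTree as ET
import xml.sax

DOC = "<books><book id='1'>XML and Perl</book><book id='2'>Perl and XML</book></books>"


class TitleCollector(xml.sax.ContentHandler):
    """Event-driven: the parser calls these methods as it reads the input."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_book = False
        self._buf = []

    def startElement(self, name, attrs):
        if name == "book":
            self._in_book = True
            self._buf = []

    def characters(self, content):
        # Text can arrive in several chunks, so collect it in a buffer.
        if self._in_book:
            self._buf.append(content)

    def endElement(self, name):
        if name == "book":
            self.titles.append("".join(self._buf))
            self._in_book = False


def titles_via_events(doc):
    handler = TitleCollector()
    xml.sax.parseString(doc.encode("utf-8"), handler)
    return handler.titles


def titles_via_tree(doc):
    # Tree-based: build the whole document in memory, then navigate it.
    root = ET.fromstring(doc)
    return [book.text for book in root.findall("book")]


print(titles_via_events(DOC))
print(titles_via_tree(DOC))
```

Both functions return the same titles. The event-driven version only keeps the text it is currently collecting, which is the usual reason to prefer it for very large documents; the tree version is shorter to write.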
Section three covers generating XML documents. Chapter five looks at generating XML from text sources using simple print statements and also the modules XML::Writer and XML::Handler::YAWriter. Chapter six looks at taking data from a database and turning it into XML using modules like XML::Generator::DBI and XML::DBMS. Chapter seven looks at other miscellaneous input formats, with examples using XML::SAXDriver::CSV and XML::SAXDriver::Excel.
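Generating XML with plain print statements works until the data contains a markup character. Here is a minimal Python sketch of the problem (the helper names naive_item, safe_item and is_well_formed are hypothetical); xml.sax.saxutils.escape and quoteattr do the kind of escaping that a dedicated writer module is meant to take off your hands.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def naive_item(name, note):
    # Plain string formatting, like a bare print statement: nothing is escaped.
    return f'<item name="{name}">{note}</item>'


def safe_item(name, note):
    # escape() replaces &, < and >; quoteattr() escapes the value as well
    # and wraps it in quotes suitable for an attribute.
    return f"<item name={quoteattr(name)}>{escape(note)}</item>"


def is_well_formed(text):
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


for make in (naive_item, safe_item):
    out = make("AT&T", "a < b")
    print(out, is_well_formed(out))
```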
Section four covers more advanced topics. Chapter eight is about XML transformations and filtering, in particular using XSLT to transform XML documents, and covers the modules XML::LibXSLT, XML::Sablotron and XML::XPath.
Chapter nine goes into detail about Matt Sergeant's AxKit, the Apache XML Kit, which allows you to create a website in XML and automatically deliver it to your visitors in the correct format.
Chapter ten rounds off the book with a look at using Perl to create web services. It covers the two most common modules for creating web services in Perl: XML::RPC and SOAP::Lite.
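To give a feel for what such a web service actually sends over the wire, here is a small sketch that uses Python's standard xmlrpc.client module (not the Perl modules discussed in the book) to serialize and then parse an XML-RPC method call. The method name examples.add is hypothetical, and no network connection is made.

```python
import xmlrpc.client


def build_call(method, *params):
    # Serialize a method call as an XML-RPC <methodCall> document.
    return xmlrpc.client.dumps(params, methodname=method)


def parse_call(payload):
    # Returns a (params, method_name) tuple.
    return xmlrpc.client.loads(payload)


request = build_call("examples.add", 2, 3)
print(request)
print(parse_call(request))
```

The printed request is an ordinary XML document with a methodName element and one param per argument, which is why any language with an XML parser can take part in XML-RPC.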
Finally, section five contains the appendices, which provide more background to the introductions to XML and Perl in chapter one.
There was one small point that I found a little annoying when reading the book: each example was accompanied by a sample of the XML documents to be processed, together with both a DTD and an XML Schema definition for the document. This seemed to me to be overkill. Did we really need both DTDs and XML Schemas for every example? I would have found it less distracting if one (or even both) of these had been moved to an appendix.
That small complaint aside, I found it an interesting and useful book. It will be very valuable to Perl programmers (like myself) who will increasingly be expected to process (and provide) data in XML formats.
You can purchase XML and Perl from bn.com.
Perl is a markup language?
slashdot!=valid HTML
Though the reviewer didn't think so, I like it when DTD and XML Schema examples are side by side. Having looked at DTDs for quite some time now, I have to change gears to the new standard of using XML Schemas.
It would be nice to have a book with more than just one chapter on web services. There is a plethora of Java/C# web services books out there, but it's hard to find one just for Perl, PHP, etc.
--------
Free your mind.
... but I thought Perl was a write-only language? How can I be expected to read the book, if it's just gibberish like Perl? Geez. :)
(Okay, fine - I admit it - I kinda like Perl. But that's another story.)
As XML is just another text format, it follows that Perl will be just as good at processing XML documents.
Since my pasta maker is good at making pasta, and ice cream and pasta are both foods, it follows my pasta maker will be just as good at making ice cream.
The whole point of XML is that it is NOT just a string of text. That's why Perl isn't particularly better than Java or C++ or VB or whatever for processing XML: you're going to be using a library that gives you SAX or DOM access to your XML, and you'll never need to know that there's a text representation being serialized onto some wires somewhere.
Not really. If you're using XML as "just another text format", then you're making a fundamental mistake. Within your software, you should always be treating XML as a hierarchical data structure, not as a text stream. Apart from manipulating CDATA or attribute value text, Perl has no particular strength with XML.
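A small illustration of the tree-not-text point, using Python's xml.etree.ElementTree as a stand-in for any DOM-style library. The two hypothetical documents below differ in attribute order, quote style and whitespace, so they are different strings, but they describe the same data.

```python
import xml.etree.ElementTree as ET

A = '<order id="7" status="new"><item qty="2">widget</item></order>'
B = """<order status='new' id='7'>
  <item qty='2'>widget</item>
</order>"""


def summarize(xml_text):
    """Extract the data a program cares about, ignoring serialization details."""
    root = ET.fromstring(xml_text)
    return {
        "id": root.get("id"),
        "status": root.get("status"),
        "items": [(item.text, int(item.get("qty"))) for item in root.iter("item")],
    }


print(A == B)                        # False: the text differs
print(summarize(A) == summarize(B))  # True: the data is the same
```

Code that works on the parsed tree never has to care about line breaks or attribute order; code that pattern-matches the raw text does.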
One new, and cool, Perl XML module that people might not know about is Petal (PErl Template Attribute Language).
It is an implementation of the Zope TAL (Template Attribute Language) specification and it basically allows you to create XML templates where all the templating commands are just attributes of existing tags.
This allows things like XHTML templates, which are very WYSIWYG-friendly, since the editors leave alone any attributes they don't know about.
Check out MKDoc, a mod_perl CMS
How effective were the examples? How easy to read and understand were the general concepts? Were the descriptions of libraries and APIs clear? Was the writing generally readable?
Would this book even make a good reference?
Jeez, anyone want to follow up the post with a real review?
(For reference, see this rant by the brilliant net.kook Erik Naggum. The most quotable bit, for the lazy among you, is ...)
Knowing Slashdot, someone would have made a copy of the newsletter and filed that too...
Ah am not a crook! (\(-__-)/)
Although I agree that Perl/XML sounds like a powerful and flexible way to serve dynamic content, I can't help thinking that it is ultimately better to adapt existing frameworks (Slashcode, PHP-Nuke and friends, etc.). Maybe a friendly group of Perl/XML gods will read the book and produce a framework/toolkit that the rest of us mere mortals can use. I suspect that I will buy this book anyway, read it, and after frying my brain for a few days I will stuff it on my bookshelf and walk away with a huge inferiority complex. My bookshelf makes me look like a guru, but secretly, my encyclopaedic knowledge comes from here.
Modest doubt is called the beacon of the wise. - William Shakespeare
and i know there are going to be a lot of posts saying "XML obviates Perl!"...
but i disagree. Perl absolutely RIPS through this stuff, unlike the Java stuff i've written. sometimes, there's nothing like some good, old-fashioned procedural code to munge one document into another.
the only problem i had was with UTF-8 stuff. perl really wasn't quite there until perl 5.8, and i'm having trouble finding installs of it on the machines i need to use it on at the university i work for.
Just raise the taxes on crack.
I think one of the main reasons Perl and XML aren't generally used together is that Perl isn't object oriented in the same way Java and C# are. I know that OO concepts have been bolted onto Perl in the same way OO was bolted onto C++, and in my opinion with similar results (i.e., a kludge-fest). It's very natural in Java to parse an XML doc and get an object, while it's more natural to parse a log file or CSV file with Perl.
Interestingly enough, our data processing, which has nothing to do with the web, is built heavily on Perl. We love its flexibility. It doesn't take too long for a new person to figure out how our daily processing works.
In fact I have been looking into perl-xml for processing of scalc spreadsheets that our stores send to us every day. It has been a valuable tool and we would be up a creek with Windows tools trying to do the exact same thing.
--Travis
Well, perhaps not your soul, but your Perl code just reflects the way you think to a greater extent than other languages do. This isn't done underhandedly; it is well advertised in every posting in c.l.perl, in the Camel book and in every other book about Perl: Perl is not at all orthogonal. TMTOWTDI (there's more than one way to do it). If you want to be rigorous, declare everything and not have your typos become references automatically, you "use strict" and your magic line is "#!/usr/bin/perl -w". If not, well, Perl allows you to do that too. If you want objects, you can have them; if not, not.
It is possible to write quality code in Perl. Just because the language allows you not to doesn't make that its fault. It doesn't stop you from writing poor code, because that would also stop you from doing brilliant things.
To address some specific things you mentioned: you can do full-fledged exception handling in Perl if you want to (with eval and specific modules), or, you know, not. And I'm not familiar with false-positive matches in regexps (perhaps you're referring to some famous problem). But if a regexp doesn't do what you want it to, isn't the regexp wrong? Between // and tr and split I get along just fine.
Then maybe you should get it from Amazon, where it is $12 cheaper.
Please, Rob, explain to us how whatever deal you have with bn.com is worth your user base overpaying by so much. Users can buy the book through the link above, and I will put a third of my affiliate commission (about $1.40 per copy) towards Perl development projects. This way everybody wins. Using your link, I assume you win, and that bn wins, but your loyal user base is out an additional $12, and I can't imagine your deal with bn.com nets you that much for providing the link.
Work for Change & GET PAID!
As you're also aware, most Comp Sci courses fawn over Pascal, a VERY formalized language. However, it's not mentioned much past education circles (and Apple aficionados).
In practice, reference counting doesn't seem to lead to memory leaks as you describe. And, I would argue it is much more efficient than Java's method.
PERL is an excellent SCRIPTING language. Larry Wall describes it as a "glue" language. XML is a good thing to glue together. It's perfect for that. Every tool has its purpose; push any too far, and you start abusing it.
Trying to find the quote from Larry Wall. I think it goes something like this: "Perl did easy things easily and made impossible things doable."
To Copy from One is Plagiarism; To Copy from Many is Research.
Treat XML as Lisp sexps, but with terrible syntax (and that's compared to Lisp!).
It's much less painful in the long run.
I see the table of contents explained in paragraph form. And then one complaint about the organization of the book. And then I expect to read the review, but it's already on to "you can buy this book here", and user comments.
I know complaining about slashdot stories is like shooting those proverbial barreled fish, but sheesh.
I'd agree with you, but I can't get to slashdot to read your post.
XML may look like text, but it isn't.
Instead, XML is structured data, represented as text.
Perl's text processing operators are all regular-expression (pattern) based. That works great for text (such as old-style log files) but works piss-poorly when it comes to a structured file such as XML.
It'd be a royal pain in the arse to match, using a regular expression, some of the things you need to match when processing XML. Don't believe me? Take a look at what you do with XSLT (a great language for processing XML) and think of the matching power XPath gives you that Perl's regular expressions don't.
- David
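To make the XPath comparison concrete, here is a sketch in Python, whose ElementTree supports only a limited subset of XPath (so it understates what full XPath in XSLT can do). The catalog document and function names are hypothetical. The question "which sections directly contain a book?" depends on nesting, and the naive regex gets it wrong.

```python
import re
import xml.etree.ElementTree as ET

CATALOG = """<catalog>
  <item type="book">XML and Perl</item>
  <section name="perl">
    <item type="cd">Perl Sounds</item>
    <section name="xml">
      <item type="book">Perl and XML</item>
    </section>
  </section>
</catalog>"""


def book_titles(xml_text):
    root = ET.fromstring(xml_text)
    # ".//" searches at any depth; the predicate filters on an attribute.
    return [e.text for e in root.findall(".//item[@type='book']")]


def sections_with_books(xml_text):
    root = ET.fromstring(xml_text)
    # find() without ".//" only looks at direct children, so this asks
    # "does this section itself contain a book?"
    return [s.get("name") for s in root.iter("section")
            if s.find("item[@type='book']") is not None]


def sections_with_books_regex(xml_text):
    # Naive attempt: "a section start tag followed, somewhere later, by a book".
    return re.findall(r'<section name="(\w+)">.*?type="book"', xml_text, re.S)


print(book_titles(CATALOG))                # both titles
print(sections_with_books(CATALOG))        # ['xml']
print(sections_with_books_regex(CATALOG))  # ['perl'], the wrong section
```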
You are an idiot. HTH.
Perl is excellent... if you need to push your CPU to its limits. Why bother running Folding@Home or SETI@Home when Perl is sucking away all the CPU?
As you're also aware, most Comp Sci courses fawn over Pascal, a VERY formalized language. However, it's not mentioned much past education circles (and Apple aficionados).
Ever heard of Borland's Delphi product? The language is object-Pascal.
I'm seeing a lot of comments that Perl doesn't have any particular strengths when dealing with XML. A good module people should check out is XML::Simple. Basically, it automagically turns XML into a nested data structure, and automagically turns a nested data structure into XML. The great thing about it is that you just make a single API call and then access the data directly, without having to learn anything more complicated. It's definitely not an end-all solution, but it handles the common case wonderfully and has quite a few handy options for more fine-tuned control.
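As a rough illustration of the "XML in, nested data structure out" idea, here is a Python sketch with a made-up mapping (attributes become keys, repeated child tags become lists, text-only leaves become strings). It is not XML::Simple's actual set of rules, which are configurable through its options.

```python
import xml.etree.ElementTree as ET


def element_to_data(elem):
    """Convert an element into nested dicts/lists (hypothetical mapping)."""
    data = dict(elem.attrib)  # attributes become keys
    for child in elem:
        value = element_to_data(child)
        if child.tag in data:
            # A repeated child tag is collected into a list.
            if not isinstance(data[child.tag], list):
                data[child.tag] = [data[child.tag]]
            data[child.tag].append(value)
        else:
            data[child.tag] = value
    text = (elem.text or "").strip()
    if text:
        if not data:
            return text  # leaf element with only text
        data["content"] = text
    return data


def xml_to_data(xml_text):
    return element_to_data(ET.fromstring(xml_text))


config = xml_to_data("""<config>
  <server name="alpha" port="80"/>
  <server name="beta" port="8080"/>
  <admin>root</admin>
</config>""")
print(config["server"][1]["port"])  # 8080
print(config["admin"])              # root
```

Note the classic pitfall: a tag that appears once becomes a single value, while the same tag appearing twice becomes a list. XML::Simple's ForceArray option exists to deal with this kind of inconsistency.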
A few comments of my recent experiences with Perl & XML:
On my current dev project, I was asked to design a small application or script that would read an XML file, validate it using a DTD, perform more complex validation using data in our DB, and then save the XML file into various DB tables.
Issue #1: The XML::Parser module doesn't do any validation. You will need XML::LibXML, which uses the GNOME XML library (libxml2).
Issue #2: XML::LibXML was a pain to install on our Solaris environment. There were a couple of dependencies. Overall, having to install XML::LibXML on multiple machines would be very difficult.
These issues weren't showstoppers; however, we ultimately decided to go the Java route. Deployment with Java was much simpler, and the Java XML parsers handle validation. Also, incorporating this into our WebLogic app would be easier (if we later decided to do that).
For the most basic uses, XML::Parser should suffice. XML::Parser::Simple is really easy to use (it creates a hash table of hash tables representing your XML document for you to parse).
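The first issue above is the difference between checking that a document is well-formed and validating it against a DTD. XML::Parser is built on the non-validating expat parser; Python's ElementTree also uses expat, so it makes a convenient stand-in to show the difference. The documents and the check helper below are hypothetical.

```python
import xml.etree.ElementTree as ET

# The internal DTD says <order> must contain exactly one <item>,
# but the document contains a <note> instead.
INVALID_BUT_WELL_FORMED = """<?xml version="1.0"?>
<!DOCTYPE order [
  <!ELEMENT order (item)>
  <!ELEMENT item (#PCDATA)>
]>
<order><note>no item here</note></order>"""

# Mismatched tags: not well-formed, so any XML parser must reject it.
NOT_WELL_FORMED = "<order><item>unclosed</order>"


def check(xml_text):
    try:
        ET.fromstring(xml_text)
        return "parsed"
    except ET.ParseError as err:
        return f"rejected: {err}"


print(check(INVALID_BUT_WELL_FORMED))  # parsed: the DTD is read but not enforced
print(check(NOT_WELL_FORMED))
```

Enforcing the DTD needs a validating parser such as libxml2, which is what XML::LibXML wraps.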
My $.02: XML is composed of text because it only allows ASCII characters. That's it. Well-formed XML "the language" requires more definitions, but an XML "file" is just another text file format. You're talking about the nondeterministic finite automaton quintuple that specifies how XML is parsed, understood, etc. But within that quintuple, I is the set of all ASCII characters >= 32 and 128. At least I think that's true. Can someone post if I'm wrong? I appreciate learning of my misconceptions.
Nothing great was ever achieved without enthusiasm
XML is NOT just a text file (just because we can read it with a simple "more hello.xml"). Perl is good at processing text, because it knows regular expressions and some extensions to them. However, an XML DTD (or a Schema) defines a context-free grammar, which make a language class above the regular languges. That's why we can't fully parse XML files with Perl's RE. A good example would be nested tags that result from recursive grammar rules in the DTD. These cannot be parsed without some serious geekism in Perl RE. However, I love to write those little tools that operate on XML data in Perl. Very often, you can work with regular expressions on context-free/sensitive language data!
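A minimal sketch of the nesting problem, in Python, since its re module has no recursion (Perl 5.10 and later add recursive patterns such as (?R), one form of the "serious geekism" mentioned above). The non-greedy regex stops at the first closing tag; tracking depth with a counter, the simplest form of the stack a context-free grammar needs, finds the right one. The scanner is a toy that only understands bare div tags, not an XML parser.

```python
import re

DOC = "<div>outer <div>inner</div> tail</div>"


def first_div_regex(text):
    # Non-greedy match stops at the FIRST </div>, which closes the inner div.
    m = re.search(r"<div>(.*?)</div>", text)
    return m.group(1) if m else None


def first_div_counting(text):
    """Content of the first <div>, found by tracking nesting depth.

    Toy code: it only recognises bare <div> and </div> tags.
    """
    start = text.find("<div>")
    if start == -1:
        return None
    depth = 0
    for m in re.finditer(r"<(/?)div>", text[start:]):
        depth += -1 if m.group(1) else 1
        if depth == 0:
            return text[start + len("<div>"):start + m.start()]
    return None  # unbalanced input


print(first_div_regex(DOC))     # outer <div>inner
print(first_div_counting(DOC))  # outer <div>inner</div> tail
```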
The reviewer is correct, Perl is a good tool for slamming and jammin' text, including XML. What I'm not so sure of is the quote "It's therefore surprising that using Perl for XML processing hasn't received much attention until recently."
I mean one need only scroll down the extensive list of CPAN Modules to see well over 50, as well as many sites/authors devoting time, energy and resource.
Similarly, I would point out some press modules supporting web services via XML, such as SOAP::Lite as far back as 02/26/01 and XML-RPC also in '01, or O'Reilly's own XML.com with articles such as "Processing XML with Perl" written shortly after the turn of the millennium.
Point is, though I personally love Perl, blatant plugs such as "... it's just that the world outside of the Perl community doesn't seem to have taken much notice of this work. This is all set to change with the publication of this book and O'Reilly's Perl and XML." don't inspire confidence in the reviewer's objectivity.
--- have you healed your church website?
XML allows unicode characters.
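A short Python check (the sample text is made up) showing that a standard XML parser accepts non-ASCII characters, whether written literally or as numeric character references. The helper is_xml10_char follows the Char production in the XML 1.0 specification, which is where a "32 and up" intuition comes from: apart from tab, newline and carriage return, characters below U+0020 are not allowed, but most characters from U+0020 upward are.

```python
import xml.etree.ElementTree as ET

# Literal non-ASCII text plus a numeric character reference (&#x263A; is U+263A).
DOC = "<greeting lang='ja'>こんにちは &#x263A; Ærøskøbing</greeting>"


def is_xml10_char(c):
    """True if c matches the Char production of XML 1.0."""
    cp = ord(c)
    return (cp in (0x9, 0xA, 0xD) or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD or 0x10000 <= cp <= 0x10FFFF)


root = ET.fromstring(DOC)
text = root.text
print(text)
print(all(ord(c) < 128 for c in text))  # False: the parsed text is not ASCII

# Serialise back to UTF-8 bytes and parse again; the text survives intact.
data = ET.tostring(root, encoding="utf-8")
print(ET.fromstring(data).text == text)  # True
```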
check it out. http://axkit.org/
"Apache AxKit is an XML Application Server for Apache. It provides on-the-fly conversion from XML to any format, such as HTML, WAP or text using either W3C standard techniques, or flexible custom code. AxKit also uses a built-in Perl interpreter to provide some amazingly powerful techniques for XML transformation."
picture Cocoon for perl. using perl for xsp pages and doing pipeline transformations on xml. great stuff.
The above includes several places that *should* have had a less-than character. You'd think that posting "Plain Old Text" would properly escape them as &lt;, but I guess you'd be wrong.
Oh, well. You know what I meant.
:Wq
Not an editor command: Wq
Good to know, thanks.
Who gives a damn? XML is generally a pain-in-the-ass, long-winded, over-the-top way to store simple data. Simple comma-delimited text files are more than sufficient for many data needs, and serious people who need real storage of complex relational data use a database engine of some sort.
The conclusion? There isn't much development of Perl XML tools because no one cares about them. XML, with the exception of a few specialized purposes, is a buzzword for marketdroids and the technically incompetent. Why should I spend hours working out an application to parse XML data when I can write a quick script in any language to parse a comma separated text file in a few minutes?
Maybe CSV isn't quite as kewl sounding as XML?
Use AxKit! You're selling yourself short if you start to develop a site without it. It's just the ideal way to get the whole separation of content and presentation thing that XML is supposed to be all about. It makes it dirt easy to store your content in XML, use XSLT for transformations and XSP for dynamic back-end processing. Check it out!
Also read this
simon
home page
Perl's strength in text processing was its ability to work with (read and generate) poorly structured data. XML makes it easy to create well-structured data, which makes writing document-processing code in languages like C++ easier. People who don't know Perl, or people who learned other XML toolkits first, have less reason to learn XML with Perl.
Cause it is so true.
Why is Slashdot so slow? My God, it is so slow as to be unreadable!
Is slashdot now hosted on a Lego Brick? A Mac SE? An Atari 2600? WTF?
Poorly (spell)checked stories, duplicates, and now unbearably slow.
How about /. changing to a newsletter that gets mailed out every day? The page updates would be faster, the EDs could use the spellcheck in MS Works, and stories could be filed once! in a filing cabinet.
This is a book review? How about a Slashdot review?
Alright. Start your oh-so-predictable mods. Yawn. Wake me when the page refreshes.
Maybe the editors should read this book, and speed up slashdot...
http://www.cgisecurity.com/lib
http://www.cgisecurity.com/xml.shtml
Click here to see more posts that detail Erik's dislike of XML.
When Erik Naggum speaks, the Internet listens.
And you have a shitty sense of humor.
That Perl was geared toward text processing has been an obstacle to XML support in my admittedly limited experience. We're trying to interface with a third-party system that claims to use XML for data interchange. But because their programmers are used to traditional text processing, their XML support is _very_ kludgy: stupid things like requiring line feeds after each element, etc.
XML is Unicode, which != text in older perls.
Thanks for the ad hominems. There's a lot of Perl bashing out there, but one thing you'll notice again and again is that the people who do the bashing have rarely written even a line of Perl in their lives.
Every language can be misused. Perhaps it's Perl's flexibility and openness to a variety of solutions (even bad ones) that scares people who want to follow The One Right Way of programming. In contrast, experienced programmers don't want to be hemmed in or told how to program, and relish Perl's flexibility.
Indeed, I have. But it's just as dead as its parent. Squeezing features into Pascal is like squeezing OO into PERL and C++. Smalltalk is the best OO language implementation I've seen so far. But, guess what? No one uses it either.
Just because you CAN doesn't necessarily mean you SHOULD (e.g. Objective-C).
i wish someone would use something to render the text of this article readable.
xml::dom::fu:LOL::blah:BLAH::bLaH
that gave me a headache...
anyway, isn't this why people use php?
Large print giveth, and the small print taketh away
You can write such parsers as regular expressions, but that makes them even slower.
Despite this, I parse millions of lines of SGML/HTML/XML into trees of HTML::Element, using only Perl. But it's clunkier than it should be.