XML and Perl
XML and Perl is written by two well-known members of the Perl XML community. Both are frequent contributors to the "perl-xml" mailing list, so there's certainly no doubt that they know what they are talking about, which is always a good thing in a technical book.
The book is made up of five sections. The first section has a couple of chapters which introduce you to the concepts covered in the book. Chapter one introduces you separately to XML and Perl and then chapter two takes a first look at how you can use Perl to process XML. This chapter finishes with two example programs for parsing simple XML documents.
Section two goes into a lot more detail about parsing XML documents with Perl. Chapter three looks at event-driven parsing, using XML::Parser and XML::Parser::PerlSAX to build example programs before going on to talk in some detail about XML::SAX, which is currently the state of the art in event-driven XML parsing in Perl. It also looks at XML::Xerces, which is a Perl interface to the Apache Software Foundation's Xerces parser. Chapter four covers tree-based XML parsing and presents examples using XML::Simple, XML::Twig, XML::DOM and XML::LibXML. In both of these chapters the pros and cons of each of the modules are discussed in detail so that you can easily decide which solution to use in any given situation.
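To make the event-driven style concrete, here is a minimal sketch. It is written in Python with the standard library's xml.sax rather than with the Perl modules the book covers, and the sample document and the TitleCollector class are made up for illustration. The parser calls the handler for each start tag, run of text, and end tag, and the program keeps only what it needs.

```python
import xml.sax

# Hypothetical sample document, not taken from the book.
SAMPLE = (b"<library>"
          b"<book><title>XML and Perl</title></book>"
          b"<book><title>Programming Perl</title></book>"
          b"</library>")


class TitleCollector(xml.sax.ContentHandler):
    """Collects the text of every <title> element as parse events arrive."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title = True
            self._buffer = []

    def characters(self, content):
        # characters() may be called more than once for a single text node.
        if self._in_title:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._in_title = False


def collect_titles(xml_bytes):
    handler = TitleCollector()
    xml.sax.parseString(xml_bytes, handler)
    return handler.titles
```

Nothing here builds a tree in memory, which is the usual argument for event-driven parsing on large documents.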
Section three covers generating XML documents. In chapter five we look at generating XML from text sources using simple print statements and also the modules XML::Writer and XML::Handler::YAWriter. Chapter six looks at taking data from a database and turning that into XML using modules like XML::Generator::DBI and XML::DBMS. Chapter seven looks at miscellaneous other input formats and contains examples using XML::SAXDriver::CSV and XML::SAXDriver::Excel.
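The difference between plain print statements and a writer module mostly comes down to escaping. The following is a rough Python sketch of that idea, not code from the book: naive_element, safe_element, written_element and is_well_formed are hypothetical helper names, and xml.sax.saxutils stands in for a writer such as XML::Writer.

```python
import io
import xml.etree.ElementTree as ET
from xml.sax.saxutils import XMLGenerator, escape


def naive_element(name, text):
    # Plain string formatting: markup characters in the text go through unescaped.
    return "<%s>%s</%s>" % (name, text, name)


def safe_element(name, text):
    # Same approach, but the text is escaped by hand first.
    return "<%s>%s</%s>" % (name, escape(text), name)


def written_element(name, text):
    # A writer object takes care of the escaping itself.
    out = io.StringIO()
    writer = XMLGenerator(out, encoding="utf-8")
    writer.startElement(name, {})
    writer.characters(text)
    writer.endElement(name)
    return out.getvalue()


def is_well_formed(xml_text):
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False
```

With text such as "a < b & c", the naive version produces output that a parser rejects, while the other two produce the same well-formed element.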
Section four covers more advanced topics. Chapter eight is about XML transformations and filtering. This chapter covers using XSLT to transform XML documents. It covers the modules XML::LibXSLT, XML::Sablotron and XML::XPath.
Chapter nine goes into detail about Matt Sergeant's AxKit, the Apache XML Kit which allows you to create a website in XML and automatically deliver it to your visitors in the correct format.
Chapter ten rounds off the book with a look at using Perl to create web services. It looks at the two most common modules for creating web services in Perl - XML::RPC and SOAP::Lite.
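For readers who have not seen XML-RPC on the wire, this small Python sketch shows what a call looks like once it is serialized. It uses Python's standard xmlrpc.client rather than the Perl modules from the chapter, and the method name "add" and its arguments are invented for the example.

```python
import xmlrpc.client

# Serialize a hypothetical call add(2, 3) into an XML-RPC request body.
request_xml = xmlrpc.client.dumps((2, 3), methodname="add")

# Decode the request again, as a server would before dispatching the call.
params, method = xmlrpc.client.loads(request_xml)
```

The request is an ordinary XML document with a methodCall root element, which is why general XML tooling applies to web services at all.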
Finally, section five contains the appendices which provide more background on the introductions to XML and Perl from chapter one.
There was one small point that I found a little annoying when reading the book: each example was accompanied by a sample of the XML documents to be processed, together with both a DTD and an XML Schema definition for the document. This seemed to me to be overkill. Did we really need both DTDs and XML Schemas for every example? I would have found it less distracting if one (or even both) of these had been moved to an appendix.
That small complaint aside, I found it a useful and interesting book. It will be very useful to Perl programmers (like myself) who will increasingly be expected to process (and provide) data in XML formats.
You can purchase XML and Perl from bn.com.
Or what should be called.. "How to switch to MS Access and VB"
has a woody for banner ads
i have gotten 1st 2nd 3rd
Perl and XML. XML and Perl. It's a marriage made in heaven. One of them uses a cryptic, machine-readable-only encoding to concisely depict data and programs. The other is a markup language.
fourth too?
Though the reviewer didn't think so, I like it when DTD and XML Schema examples are side by side. Having looked at DTDs for quite some time now, I have to change gears to the new standard of using XML Schemas.
Would be nice to have a book with more than just one chapter on web services. There are a plethora of Java/C# web services books out there, but it's hard to find one just for Perl, PHP, etc.
--------
Free your mind.
... but I thought Perl was a write-only language? How can I be expected to read the book, if it's just gibberish like Perl? Geez. :)
(Okay, fine - I admit it - I kinda like Perl. But that's another story.)
"that the world outside of the Perl community doesn't seem to have taken much notice of this work"
That's because most programmers have no need of Perl. Just because people go on and on about websites doesn't mean that it's responsible for the majority, or even a significant amount, of the code written each day by most coders.
As XML is just another text format, it follows that Perl will be just as good at processing XML documents.
Since my pasta maker is good at making pasta, and ice cream and pasta are both foods, it follows my pasta maker will be just as good at making ice cream.
The whole point of XML is that it is NOT just a string of text. That's why Perl isn't particularly any better than Java or C++ or VB or whatever for processing XML - you're going to be using a library that gives you SAX or DOM access to your XML, and you'll never need to know that there's a text representation being serialized onto some wires somewhere.
Not really. If you're using XML as "just another text format", then you're making a fundamental mistake. Within your software, you should always be treating XML as a hierarchical data structure, not as a text stream. Apart from manipulating CDATA or attribute value text, Perl has no particular strength with XML.
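A quick way to see the "data structure, not text stream" point is to parse two documents that differ as text but not as structure. This is only a sketch in Python's standard xml.etree.ElementTree; the order documents and the summarize function are made up for the example.

```python
import xml.etree.ElementTree as ET

# Two serializations that differ in quoting, spacing and line breaks.
DOC_A = '<order id="7"><item qty="2">pasta</item></order>'
DOC_B = """<order   id='7'>
  <item qty='2'>pasta</item>
</order>"""


def summarize(xml_text):
    """Return the parts of the document a program would actually care about."""
    root = ET.fromstring(xml_text)
    items = [(item.get("qty"), item.text) for item in root.iter("item")]
    return (root.tag, root.get("id"), items)
```

Both inputs give the same summary, so code written against the parsed tree never has to care which textual form arrived.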
If Larry Wall gave his wife a gift...
;)
Would it be a Perl Necklace?
...the Perl XML's you!
One new, and cool, Perl XML module that people might not know about is Petal (PErl Template Attribute Language).
It is an implementation of the Zope TAL (Template Attribute Language) specification and it basically allows you to create XML templates where all the templating commands are just attributes of existing tags.
This allows things like XHTML templates which are very WYSIWYG friendly since the editors don't do anything with attributes that they don't know about.
Check out MKDoc a mod_perl CMS
(Moderators: skip to note at the end before you moderate this ;) )
I've programmed for some time in Perl, but at no time has this been anything to do with the CompSci degree I'm studying; no course even mentions it. Why is this? Perl doesn't seem to have much respect in educated programming circles, and I think this is why;
Its type system is not entirely sound. Inference upon the typing rules (which aren't formally stated anywhere; I had to derive them from the source code) can lead to propositional contradictions.
It is most certainly _not_ Turing complete (trivially provable); hence not all algorithms can be implemented in it that you could with a Turing complete language like Java or C(++).
Its reference counting system of garbage collection can sometimes result in memory leaks, as opposed to the more thorough graph traversal employed in other languages.
IMPORTANT: Please, only reply to this post or moderate it if you actually understand the principles of compsci that I'm arguing about. I've been smacked down by ignorant kiddies before, and would much prefer to see more reason in the future.
One of the goals of Perl 6 is to make non-trivial projects possible. That's good. The way it's being done is bad. Perl was once a lightweight, extremely flexible language. Now it's become a huge ugly monster. People wanted OO, so a nasty hack was bolted on top to allow some semblance of it. Now this nasty hack is being expanded. Sure, the code's different, but the basic form is the same. Kludge upon kludge upon kludge; I'd much rather have a nice, clean, pure language (and not one with loads of irritating whitespace thank you very much).
The same goes for the syntax. All the switching between $, @ and % is really irritating (ask a newbie how to get at the length of the keys array of a hash inside a hash, for example), and the changes proposed for 6 are just making this worse -- it seems that Larry, in his infinite wisdom, wants to prefix every data type with a different hard-to-type character. Perl was only designed for the three data types, and adding more is a mess.
Perl 6 is a complete rewrite, but it keeps all the mess which has accumulated over the previous versions. This is not good. Sure, my const int $var = 27; may look neat (in the same way that, say, Pascal does), but $var isn't entirely constant, or entirely an integer, it's just a hack which makes it sort of behave like one. The whole thing is an exercise in pseudo-computer science masturbation with little real purpose except to please the managers who dislike the one thing that makes Perl special.
On a similar note are regexes. I'm an avid fan of regular expressions simply because a nondeterministic finite automaton is far more flexible than linear code. However, Larry must have been smoking that cheap $2 crack when he wrote this. Does he want Perl 6 to be flex or something?
I won't be going on to use 6. It's a nice idea, but it's completely unnecessary. It won't make large projects any easier to manage (the language is still, at heart, an almighty hack -- an impressive one, but still a hack). It won't make OO any cleaner. It won't make development any faster. To put it bluntly, Perl scripts will still look less beautiful than our friend Mr Goatse. I'd prefer to use a language which has always been pure synthesis of science and engineering, not some half-baked imposter.
Perl 6 will be nice, but I'm guessing it will be the end of Perl. It can't do what it wants to do whilst still being based upon a nasty mess. There are now other options, which provide all of Perl's power and none of the mess. Sorry, but BSD^W Perl is dying. Larry is buggering it up the ass without lubricants, just like Shoeboy is doing to Larry's daughter.
Why is Slashdot so slow? My God, it is so slow as to be unreadable!
Is slashdot now hosted on a Lego Brick? A Mac SE? An Atari 2600? WTF?
Poorly (spell)checked stories, duplicates, and now unbearably slow.
How about /. changes to a newsletter that gets mailed out every day? The page updates would be faster, the EDs could use the spellcheck in MS Works, and stories could be filed once! in a filing cabinet.
This is a book review? How about a Slashdot review?
Alright. Start your oh-so-predictable mods. Yawn. Wake me when the page refreshes.
How effective were the examples? How easy to read and understand were the general concepts? Were the descriptions of libraries and APIs clear? Was the writing generally readable?
Would this book even make a good reference?
Jeez, anyone want to follow up the post with a real review?
(For reference, see this rant by the brilliant net.kook Erik Naggum. The most quotable bit, for the lazy among you, is
)
Although I agree that Perl/XML sounds like a powerful and flexible way to serve dynamic content, I can't help thinking that it is ultimately better to adapt existing frameworks (Slashcode, PHP-Nuke & friends etc..). Maybe a friendly group of Perl/XML gods will read the book and produce a framework/toolkit that the rest of us mere mortals can use. I suspect that I will buy this book anyway, read it, and after frying my brain for a few days I will stuff it on my bookshelf and walk away with a huge inferiority complex. My bookshelf makes me look like a guru, but secretly, my encyclopaedic knowledge comes from here.
Modest doubt is called the beacon of the wise. - William Shakespeare
and i know there are going to be a lot of posts saying "XML obviates Perl!"...
but i disagree. Perl absolutely RIPS through this stuff, unlike the Java stuff i've written. sometimes, there's nothing like some good, old-fashioned procedural code to munge one document into another.
the only problem i had was with UTF-8 stuff. perl really wasn't quite there until perl 5.8, and i'm having trouble finding installs of it on the machines i need to use it on at the university i work for.
Just raise the taxes on crack.
I think one of the main reasons Perl and XML aren't generally used together is because Perl isn't object oriented in the same way that Java and C# are. I know that OO concepts have been bolted on to Perl in the same way that OO was bolted on to C++ and in my opinion with similar results (i.e., kludge-fest). It's very natural in Java to parse an XML doc and get an object, while it's more natural to parse a log file or CSV file with Perl.
Well, perhaps not your soul, but your Perl code just reflects the way you think to a greater extent than other languages. This isn't something that's done underhandedly, it is well advertised in every posting in c.l.perl and the Camel book, and every other book about Perl. Which is that Perl is not at all orthogonal, TMTOWTDI (there's more than one way to do it). If you want to be rigorous and declare everything and not have your typos become references automatically, you "use strict" and your magic line is "#!/usr/bin/perl -w". If not, well Perl allows you to do that too. If you want objects, you can do that, if not, not.
It is possible to write quality code in Perl. Just because the language allows you to not do so isn't its fault. It doesn't stop you from doing it, because that'd stop you from doing brilliant things.
To address some specific things you mentioned, you can do full-fledged exception handling in Perl if you want to (with eval and specific modules), or, you know, not. And I'm not familiar with the false positive matches in regexps (perhaps you're referring to some famous problem). But if a regexp doesn't do what you want it to, isn't it wrong? Between // and tr and split I get along just fine.
Then maybe you should get it from Amazon, where it is $12 cheaper.
Please Rob, explain to us how whatever deal you have with bn.com is worth your user base overpaying by so much? Users can buy the book through the link above, and I will put a third of my affiliate commission (about $1.40 per copy) towards Perl development projects. This way everybody wins. Using your link, I assume you win, and that bn wins, but your loyal user base is out an additional $12 and I can't imagine your deal with bn.com nets you that much for providing the link.
Work for Change & GET PAID!
I can't access SLaSHdoT WIth MOZILLA!
It's blocking mozilla but I can get to it with IE!! UGh!
HeLp!!
Treat XML as Lisp sexps, but with terrible syntax (and that's compared to Lisp!).
It's much less painful in the long run.
I see the table of contents explained in paragraph form. And then one complaint about the organization of the book. And then I expect to read the review, but it's already on to "you can buy this book here", and user comments.
I know complaining about slashdot stories is like shooting those proverbial barreled fish, but sheesh.
XML may look like text, but it isn't.
Instead, XML is structured data, represented as text.
Perl's text processing operators are all regular-expression (pattern) based. That works great for text (such as old-style log files) but works piss-poorly when it comes to a structured file such as XML.
It'd be a royal pain in the arse to match, using a regular expression, some of the things you need to match when processing XML. Don't believe me? Take a look at what you do with XSLT (a great language for processing XML) and think of the matching power you can do with XPath, that you cannot do with Perl's regular expressions.
- David
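As a rough illustration of the kind of match meant here, the sketch below selects titles by their position in the tree and by an attribute on their parent. It uses the limited XPath subset in Python's xml.etree.ElementTree, not a full XPath engine, and the catalog document and english_titles function are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog; the indentation and line breaks are irrelevant to the query.
CATALOG = """
<catalog>
  <shelf name="perl">
    <book lang="en"><title>XML and Perl</title></book>
    <book lang="de"><title>Perl und XML</title></book>
  </shelf>
  <shelf name="java">
    <book lang="en"><title>Java and XML</title></book>
  </shelf>
</catalog>
"""


def english_titles(xml_text):
    """Titles of every <book> whose lang attribute is 'en', at any depth."""
    root = ET.fromstring(xml_text)
    return [t.text for t in root.findall(".//book[@lang='en']/title")]
```

The path expression says what is wanted in terms of elements and attributes, which a pattern over raw characters cannot do without re-implementing part of a parser.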
Perl is excellent... if you need to push your CPU to its limits. Why bother running Folding@Home or SETI@Home when Perl is sucking away all the CPU?
I'm seeing a lot of comments that perl doesn't have any particular strengths when dealing with XML. A good module people should check out is XML::Simple. Basically, it automagically turns XML into a nested data structure, and automagically turns a nested data structure into XML. The great thing about it is that you just make a single API call, and then directly access the data from there without having to learn anything more complicated. Definitely not an end-all solution, but it definitely handles the common case wonderfully, and has quite a few handy options to allow more fine-tuned control.
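For anyone who hasn't seen that style, here is a small sketch of the idea in Python. It is not XML::Simple and does not reproduce its options or exact output; to_data, the "content" key and the sample config are assumptions made for the example. Attributes and child elements become dictionary keys, and repeated children become lists.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration file.
CONFIG = ('<config debug="0">'
          '<server name="a"><port>80</port></server>'
          '<server name="b"><port>8080</port></server>'
          '<logdir>/var/log</logdir>'
          '</config>')


def to_data(element):
    """Turn an element into nested dicts, lists and strings."""
    children = list(element)
    text = (element.text or "").strip()
    if not children and not element.attrib:
        return text                      # leaf element: just its text
    data = dict(element.attrib)          # attributes become keys
    if text and not children:
        data["content"] = text           # assumed key name for leftover text
    for child in children:
        value = to_data(child)
        if child.tag in data:
            # Repeated tag: collect the values in a list.
            if not isinstance(data[child.tag], list):
                data[child.tag] = [data[child.tag]]
            data[child.tag].append(value)
        else:
            data[child.tag] = value
    return data


config = to_data(ET.fromstring(CONFIG))
```

After the single call, config["server"][1]["port"] is plain data access with no parser API in sight.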
A few comments of my recent experiences with Perl & XML:
At my current dev project I was asked to design a small application or script that would read an XML file, validate it using a DTD, perform more complex validation using data in our DB, and then save the XML file into various DB tables.
Issue #1: The XML::Parser module doesn't do any validation. You will need XML::LibXML, which uses the GNOME XML library.
Issue #2: XML::LibXML was a pain to install on our Solaris environment. There were a couple of dependencies. Overall, having to install XML::LibXML on multiple machines would be very difficult.
These issues weren't show stoppers. However, we ultimately decided to go the Java route. Deployment with Java was much simpler. The Java XML parsers handle validation. Also, incorporating this into our WebLogic app would be easier (if we later decided to do that).
For the most basic uses, XML::Parser should suffice. XML::Parser::Simple is really easy to use (it creates a hash table of hash tables representing your XML document for you to parse).
My $.02: XML is composed of text because it only allows ascii characters. That's it. Well-formed XML "the language" requires more definitions, but an xml "file" is just another text file format. You're talking about the nondeterministic finite automaton quintuple that specifies how XML is parsed, understood, etc. But within that quintuple, I is the set of all ascii characters >= 32 and 128. At least I think that's true. Can someone post if I'm wrong? I appreciate learning of my misconceptions.
Nothing great was ever achieved without enthusiasm
XML is NOT just a text file (just because we can read it with a simple "more hello.xml"). Perl is good at processing text, because it knows regular expressions and some extensions to them. However, an XML DTD (or a Schema) defines a context-free grammar, which puts it in a language class above the regular languages. That's why we can't fully parse XML files with Perl's RE. A good example would be nested tags that result from recursive grammar rules in the DTD. These cannot be parsed without some serious geekism in Perl RE. However, I love to write those little tools that operate on XML data in Perl. Very often, you can work with regular expressions on context-free/sensitive language data!
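The nested-tag problem is easy to reproduce. The sketch below is in Python, and the NESTED document and both function names are made up, but a non-greedy pattern of this shape behaves the same way in Perl: it pairs the first opening tag with the first closing tag it finds, regardless of nesting.

```python
import re
import xml.etree.ElementTree as ET

# A list whose first item contains another list.
NESTED = "<list><item>a<list><item>b</item></list></item><item>c</item></list>"


def regex_items(xml_text):
    # Non-greedy match: stops at the first </item>, even if it closes an inner item.
    return re.findall(r"<item>(.*?)</item>", xml_text)


def parsed_items(xml_text):
    # A parser tracks nesting, so each item is matched with its own end tag.
    root = ET.fromstring(xml_text)
    return [item.text for item in root.iter("item")]
```

The regex version returns two fragments, one of them with stray markup in it, while the parser returns the three items.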
The reviewer is correct, Perl is a good tool for slamming and jammin' text, including XML. What I'm not so sure of is the quote "It's therefore surprising that using Perl for XML processing hasn't received much attention until recently."
I mean one need only scroll down the extensive list of CPAN Modules to see well over 50, as well as many sites/authors devoting time, energy and resource.
Similarly, I would point out some press modules supporting web services via XML, such as SOAP::Lite as far back as 02/26/01 and XML-RPC also in '01 -- or O'Reilly's own XML.com with articles such as "Processing XML with Perl" written shortly after the turn of the millennium.
Point is, though I personally love Perl, blatant plugs such as "... it's just that the world outside of the Perl community doesn't seem to have taken much notice of this work. This is all set to change with the publication of this book and O'Reilly's Perl and XML." don't inspire confidence in the reviewer's objectivity.
--- have you healed your church website?
XML allows unicode characters.
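A short check, sketched in Python with a made-up document: the bytes below are UTF-8 and also contain a numeric character reference, and the parser hands back ordinary Unicode text for both.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: UTF-8 encoded bytes plus a character reference for e-acute.
DOC = '<?xml version="1.0" encoding="UTF-8"?><word>caf&#233; こんにちは</word>'.encode("utf-8")

word = ET.fromstring(DOC).text
```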
check it out. http://axkit.org/
"Apache AxKit is an XML Application Server for Apache. It provides on-the-fly conversion from XML to any format, such as HTML, WAP or text using either W3C standard techniques, or flexible custom code. AxKit also uses a built-in Perl interpreter to provide some amazingly powerful techniques for XML transformation."
picture Cocoon for perl. using perl for xsp pages and doing pipeline transformations on xml. great stuff.
The above includes several places that *should* have had a less-than character. You'd think that posting "Plain Old Text" would properly escape them as &lt;, but I guess you'd be wrong.
Oh, well. You know what I meant.
:Wq
Not an editor command: Wq
Good to know, thanks.
Nothing great was ever achieved without enthusiasm
Who gives a damn? XML is generally a pain in the ass, a long-winded, over-the-top way to store simple data. Simple comma delimited text files are more than sufficient for many data needs, and serious people who need real data storage of complex relational data use a database engine of some sort.
The conclusion? There isn't much development of Perl XML tools because no one cares about them. XML, with the exception of a few specialized purposes, is a buzzword for marketdroids and the technically incompetent. Why should I spend hours working out an application to parse XML data when I can write a quick script in any language to parse a comma separated text file in a few minutes?
Maybe CSV isn't quite as kewl sounding as XML?
Use AxKit! You're selling yourself short if you start to develop a site without it. It's just the ideal way to get the whole separation of content and presentation thing that XML is supposed to be all about. It makes it dirt easy to store your content in XML, use XSLT for transformations and XSP for dynamic back-end processing. Check it out!
Also read this
simon
home page
Perl's strength in text processing was its ability to work with (read and generate) poorly structured data. XML makes it easy to create well structured data, thus writing document processing code in languages like C++ is easier. People who don't know Perl, or people who learned other XML toolkits first, have less reason to learn XML with Perl.
Cause it is so true.
http://www.cgisecurity.com/lib
http://www.cgisecurity.com/xml.shtml
Click here to see more posts that detail Erik's dislike of XML.
When Erik Naggum speaks, the Internet listens.
That Perl was geared toward text processing has been an obstacle to XML support in my admittedly limited experience. We're trying to interface with a 3rd party system that claims to use XML for data interchange. But because their programmers are used to traditional text processing, their XML support is _very_ kludgy. Stupid things like requiring line feeds after each element, etc.
XML is unicode; which != text in older perls.
Thanks for the ad hominems. There's a lot of Perl bashing out there, but one thing you'll notice again and again is that the people who do the bashing have rarely written even a line of Perl in their lives.
Every language can be misused. Perhaps it's Perl's flexibility and openness to a variety of solutions (even bad ones) that scares people who want to follow The One Right Way of programming. In contrast, experienced programmers don't want to be hemmed in or told how to program, and relish Perl's flexibility.
i wish someone would use something to render the text of this article readable.
xml::dom::fu:LOL::blah:BLAH::bLaH
that gave me a headache...
anyway, isn't this why people use php?
Large print giveth, and the small print taketh away
You can write such parsers as regular expressions, but that makes them even slower.
Despite this, I parse millions of lines of SGML/HTML/XML into trees of HTML::Element, using only Perl. But it's clunkier than it should be.