Perl and XML


Posted by timothy on Thursday August 22, 2002 @04:00AM from the but-the-book-is-readable dept.

prostoalex writes: "In the world of information technology, information, as the name suggests, is as important as technology itself. Erik T. Ray's and Jason McIntosh's Perl and XML is an attempt to take a look at perhaps the most popular languages for data processing. XML is an open-standard specification for documents, while Perl's natural powers lie in the area of data processing, and, as the name suggests, practical extracting and reporting." prostoalex has reviewed Perl & XML below; read on for his take on the book.

Perl and XML
Author: Erik T. Ray, Jason McIntosh
Pages: 216
Publisher: O'Reilly
Rating: 4/5
Reviewer: Alex Moskalyuk
ISBN: 059600205X
Summary: Introduction to XML processing with Perl

With qualities like these, one might think that the marriage of Perl and XML would be total bliss, and the two languages would live happily ever after. In reality, however, the marriage has been far from perfect, and has produced an enormous number of kids: some uglier, some prettier, some simpler, some more sophisticated. Perl & XML is a good attempt to provide an overview of the XML processing techniques currently available in the Perl world.

The book does not attempt to give you even a brief introduction to Perl, and thus avoids the weakness of trying to be another Camel book, as many publications in the field do. The logical assumption is that you know Perl and have heard something about XML. The first chapter of the book tells you why there are so many variations of Perl modules for XML processing, who is behind the well-known modules and why the interaction of Perl with XML has been rather disorganized. Indeed, a short visit to the XML section of CPAN brings up dozens of available modules, most of which are characterized by intimidating or non-descriptive names like SAX, Grove, YAWriter, etc.

The second chapter is titled "XML Recap"; the contents of the chapter, though, are good enough to be called "Concise but Informative Introduction to XML". Don't get your expectations too high -- O'Reilly has a whole bundle of books related just to learning XML, and thus a single chapter can barely touch the surface of what you might need to know, but it provides a good introduction to the world of markup, elements, namespaces, character encoding, processing instructions, schemas and transformations in XML.

Chapter 3 goes from theory to practice, and gives the reader an opportunity to try his first Perl script on XML data. The parsers covered in this chapter are XML::Parser, XML::LibXML, XML::XPath and XML::Writer. Document validation and well-formedness are also explained, and luckily enough this exact chapter is what O'Reilly Publishing decided to publish as a free chapter available on the Web. In this chapter, the authors make a distinction between stream-based and tree-based XML processing, and thus it doesn't come as a big surprise that the next four chapters are dedicated to examples of such processing.

Chapter 4, Event Streams, discusses the issues of processing an XML document as a stream of data, where your application has to react to various input without really knowing where the end of the document is. XML::PYX and XML::Parser are covered in this chapter.
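To make the idea of an event stream concrete, here is a minimal sketch. It is written in Python rather than Perl, using the standard library's expat binding (expat is the same C parser that XML::Parser wraps), and it is not taken from the book. The sample document and the function name collect_events are made up for illustration.

```python
import xml.parsers.expat

# Hypothetical sample data, in the spirit of the book's shopping lists.
XML_DOC = """<shopping-list>
  <item qty="2">milk</item>
  <item qty="1">bread</item>
</shopping-list>"""


def collect_events(xml_text):
    """Parse xml_text as a stream and return the events in arrival order."""
    events = []

    def on_start(name, attrs):
        # Called when an opening tag is seen; the parser has not read ahead.
        events.append(("start", name, dict(attrs)))

    def on_end(name):
        events.append(("end", name))

    def on_text(data):
        # Whitespace-only text between elements is skipped to keep output short.
        if data.strip():
            events.append(("text", data.strip()))

    parser = xml.parsers.expat.ParserCreate()
    parser.buffer_text = True  # deliver adjacent character data in one call
    parser.StartElementHandler = on_start
    parser.EndElementHandler = on_end
    parser.CharacterDataHandler = on_text
    parser.Parse(xml_text, True)  # True marks this as the final chunk
    return events


if __name__ == "__main__":
    for event in collect_events(XML_DOC):
        print(event)
```

The handlers only ever see one event at a time, so any context (such as which element the text belongs to) has to be tracked by the application itself.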

Chapter 5 shows examples of using SAX for XML processing with Perl, and also provides an overview of SAX history, which in a nutshell tells you that SAX was designed for Java, with its strong type checking and interface classes. It goes on to explain that using it in Perl, which is known for its forgiving nature, therefore requires a certain responsibility on the part of the programmer. XML::Handler::YAWriter is also discussed in this chapter.
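The point about responsibility can be illustrated with a small sketch. It uses Python's standard xml.sax module as a stand-in for the Perl SAX modules the chapter covers, since Python is similarly forgiving. The class name ItemCollector and the function parse_items are hypothetical, not from the book.

```python
import xml.sax


class ItemCollector(xml.sax.ContentHandler):
    """Collects the text of every <item> element.

    Nothing checks that the method names below match the SAX interface.
    A misspelled override (say, startelement) would simply never be
    called, with no error raised. That is the responsibility the
    forgiving language leaves to the programmer.
    """

    def __init__(self):
        super().__init__()
        self.items = []
        self._buffer = None  # None means "not currently inside an <item>"

    def startElement(self, name, attrs):
        if name == "item":
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several pieces, so it is accumulated.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "item" and self._buffer is not None:
            self.items.append("".join(self._buffer))
            self._buffer = None


def parse_items(xml_text):
    """Run the handler over xml_text and return the collected item texts."""
    handler = ItemCollector()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.items


if __name__ == "__main__":
    print(parse_items("<list><item>milk</item><item>bread</item></list>"))
```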

From stream processing, the authors take you to parsing XML trees. In this case, the document is loaded into memory and the Perl script can safely assume that the whole XML document is available. XML::Simple, XML::SimpleObject, XML::TreeBuilder and XML::Grove are discussed in this chapter, with XML::Parser revisited.
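For contrast with the stream approach, here is a rough sketch of tree-based processing. It uses Python's xml.etree.ElementTree as an analogue of the Perl tree modules listed above and is not code from the book. The address-book data and the function names_by_city are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, in the spirit of the book's address books.
ADDRESS_BOOK = """<address-book>
  <entry><name>Ada</name><city>London</city></entry>
  <entry><name>Larry</name><city>Mountain View</city></entry>
</address-book>"""


def names_by_city(xml_text):
    """Load the whole document into a tree, then group names by city."""
    root = ET.fromstring(xml_text)  # the entire document is now in memory
    result = {}
    for entry in root.findall("entry"):
        # Children can be visited in any order, as often as needed.
        city = entry.findtext("city")
        name = entry.findtext("name")
        result.setdefault(city, []).append(name)
    return result


if __name__ == "__main__":
    print(names_by_city(ADDRESS_BOOK))
```

The trade-off is memory: the code is simpler because the whole tree is available at once, but the whole tree has to fit.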

DOM (Document Object Model) is another standard recommended by W3C and it is mostly concerned with how an XML document is stored in a computer's memory. XML::DOM is discussed in this chapter with XML::LibXML revisited. The authors also provide a good overview of the DOM standard.

The last three chapters deal with applications of Perl in XML data processing that go beyond stream and tree processing -- XPath and XSLT are explained with copious examples. Remember, though, that both technologies have several-hundred-page books written about them, and thus several pages in a Perl and XML book can serve at best as a good introduction. Chapter 9 deals with RSS and writing SOAP with Perl and XML, with XML::RSS and SOAP::Lite being explained. The last chapter deals with such issues as namespacing and subclassing, and for Web designers provides a handy tutorial on converting your XML data into HTML via XSLT stylesheets.
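As a taste of what XPath-style selection looks like, here is a small sketch. It is in Python, not Perl, and relies on the limited XPath subset that xml.etree.ElementTree supports rather than a full XPath engine such as XML::XPath. The recipe data and the function titles_of_type are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, in the spirit of the book's recipes.
RECIPES = """<recipes>
  <recipe type="soup"><title>Borscht</title></recipe>
  <recipe type="dessert"><title>Pavlova</title></recipe>
  <recipe type="soup"><title>Minestrone</title></recipe>
</recipes>"""


def titles_of_type(xml_text, recipe_type):
    """Return the titles of all recipes whose type attribute matches."""
    root = ET.fromstring(xml_text)
    # A path expression replaces hand-written tree walking. Note that this
    # naive string formatting assumes recipe_type contains no quote characters.
    path = f".//recipe[@type='{recipe_type}']/title"
    return [title.text for title in root.findall(path)]


if __name__ == "__main__":
    print(titles_of_type(RECIPES, "soup"))
```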

The table of contents is posted on the publisher's Web site.

The first three chapters of the book are easy to read, since they provide a general overview of the data-processing world and the history of XML, with reference to the relevant events in the Perl community. However, data processing can hardly be called an exciting topic, and thus the bulk of the book is about routinely introducing particular modules, telling you what you can do with each, and then giving you an example of Perl code processing some XML document. The examples are apt and relate to the kind of data processing that some of us have had to do, e.g. shopping lists, address books, recipes, diaries of mad professors, etc.

The code examples are numerous, and if you get tired after looking at pages and pages of Perl lines, you had better plan accordingly, as sometimes a subchapter consists of nothing more than an XML file and the related Perl processing code with the authors' notes. For a 200-page book, Perl and XML provides a great introduction to the area, provided you have a good knowledge of Perl and of using CPAN modules, and some general knowledge of data processing. The book would probably have a more exact title if it had the word "Cookbook" in its name -- some might consider it a good reference. However, for those just getting acquainted with XML, another tutorial might be needed to fully comprehend XML's power.

You can purchase Perl & XML from bn.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.

Great book! by legLess · 2002-08-22 04:17 · Score: 4, Funny

It's so good, it gets two reviews!

--
This isn't as much "normalization" as it is "don't take so many drugs when you're designing tables."
1. Re:Great book! by kisrael · 2002-08-22 04:33 · Score: 4, Funny
  
  Well, given that XML tends to be a bit redundant, I guess it's only fitting...
  
  --
  SO YOU'RE GOING TO DIE: The Comic for Dealing with Death
2. Re:Great book! by TheKubrix · 2002-08-22 04:40 · Score: 3, Funny
  
  dummy, they're TOTALLY different,......
  
  This book:
  Pages: 216
  Rating: 6
  Pub: O'Reilly
  Summary: Introduction to XML processing with Perl
  
  That book:
  Pages: 202
  Rating: 4/5
  Pub: O'Reilly and Associates
  Summary: Good introduction to XML for Perl programmers.
  
  ....and quite frankly the ONLY thing those two books have similar (and this is a longshot) is that ISBN # thingy....
  
  next time PLEASE read the article before posting.
3. Re:Great book! by namespan · 2002-08-22 04:46 · Score: 2
  
  Having multiple opinions about something is often a good idea for many things. It might not be as important for book reviews as it is for say, removing your gall bladder or getting rotator cuff repair, but it's still sortof nice.
  
  The alternative -- "Isn't one opinion good enough for anything?" -- seems rather frightening to me.
  
  --
  Libertarianism is rich wolves and poor sheep playing gambler's ruin for dinner.
MOD UP by Anonymous Coward · 2002-08-22 04:33 · Score: 2, Funny

I agree fully, I have also read the book and come to the same conclusions. Perl is good for writing quick and dirty scripts but is not suited for a real site.

Personally I am more impressed with microsofts script language, called batch processing (.bat files) and I would like to see this used more for internet applications.
Totally Different. by JasonMaggini · 2002-08-22 04:34 · Score: 3, Funny

This review is "Perl and XML". The other review was "Perl & XML". Big difference.
Re:perl is overrated by Tablizer · 2002-08-22 04:43 · Score: 3, Interesting

(* Perl is fine for single developer, heavenly might some say. But for team projects it's hell in disguise. *)

Perl developers are so productive that you don't need teams :-)

Seriously, I have come to the conclusion that languages are pretty much subjective when it comes to productivity, error rates, etc.

While I am not a Perl fan myself, if Perl fans are productive under it, then I have no reason to complain, as long as they don't shove it down non-fan throats (like some Sun technologies).

--
Table-ized A.I.
Re:This book is destined by Wee · 2002-08-22 04:45 · Score: 5, Insightful

"The author correctly points out that Perl is best suited for quick, unmaintainable hacks and not serious, large-scale engineering."
The same argument best used in favor of MySQL applies here as well: not everything has to be a super scalable, completely buzzword compliant, ninth-wonder-of-the-world engineering project. In fact, I'd wager that if you counted up all "developer time" spent in the US every day, you'd find that a large percentage of it is done on "throwaway" projects.
Lots of people just need to regex through logs and make a simple database and draw quick-and-dirty graphs and walk a MIB on a router and automate a build process and probe a firewall and so on and so on. They use perl, shell, whatever. And that's completely fine. Whatever works, works.
"And XML, as he notes, is bloated by ivory-tower academic requirements and has made (and will make) zero headway in the Real World."
You couldn't be more wrong. XML couldn't be more real world. SOAP, office doc file formats, database output, jabber, web services, config files, the list goes on. Even nmap spits out XML if you want it to. Hell, we've been fighting to get it used at the university where I work because we've used it in the "real world" and know and like it...
-B

--
Ash and Hickory, straight-grained and true, make excellent bludgeons, dandy for the cudgeling of vegetarians.
Re:Oh good! by m_ilya · 2002-08-22 05:12 · Score: 2

Where did you get it?
BTW it is Perl, not PERL.

--
--
Ilya Martynov (http://martynov.org/)
Samples, Learning by fm6 · 2002-08-22 05:52 · Score: 5, Insightful

The sample chapter on the O'Reilly site contains a lot of what I dislike about O'Reilly books: too much cuteness, not enough organization, too many useless examples. The last are particularly dumb: you don't want to do your own XML parsing, the authors say repeatedly that you don't want to do your own parsing -- yet most of the examples are parsing code!
It's also dumb to have any introductory XML material if you're not going to be serious about it. Which gives me an excuse to plug my favorite XML for beginners book, Harold's XML Bible, 2nd edition. Yeah, stupid title (who are you, Moses?), and the CD-ROM is badly put together (you'll need to convert some text files from Mac format!). But the book itself is very good. No assumptions at all about previous knowledge. If you know jack about XML, or you understand the basics, but can't figure out how it's used, this is the book you want.
1. Re:Samples, Learning by Wiggins · 2002-08-22 06:31 · Score: 2, Interesting
  
  "The sample chapter on the O'Reilly site contains a lot of what I dislike about O'Reilly books: too much cuteness, not enough organization, too many useless examples. The last are particularly dumb: you don't want to do your own XML parsing, the authors say repeatedly that you don't want to do your own parsing -- yet most of the examples are parsing code!"
  O'Reilly chose a poor example chapter. The examples of how to do parsing are important in the context of understanding what the modules are doing and being able to determine when one should use which module, which is much of what the book is about.
  "It's also dumb to have any introductory XML material if you're not going to be serious about it"
  That's the beauty of this particular book: they are "serious about it"; they are serious about it being an introduction. For myself, who has read about XML standards existing and been doing web development, I didn't need an all-encompassing book about what the hell a markup language is and how they are generally formatted, etc., and all of the minute details in the XML spec; what I needed was a very concise overview of what DOM, SAX, XSL, etc. are and where they are applied. Which was accomplished beautifully in this particular book.
  Having said all of that I very much appreciated the book as a perfect fit for someone with a fair amount of Perl experience, lots of web experience, but no XML experience. Only fault I found was the price, but that is O'Reilly NOT the author or the text itself.
  
  --
  Funny and I thought Perl == Paid employment recently located ....hmmph.....
Re:perl is overrated by Tablizer · 2002-08-22 06:32 · Score: 2

(* Some languages are more productive than others, some are faster than others, and some are safer than others. *)

Well, toy languages aside, their tradeoffs tend to counter other things. For example, dynamic languages might not check as much at compile time, but the tradeoff is that the language may be simpler and easier to read and test and combine easier with parts from "outside the type tree".

I don't want to get into another long "my language is better than yours" debate, because they never go anywhere.

There is at least as much evidence that letting people use the language they are more comfortable with increases productivity as there is that there is "one best language".

--
Table-ized A.I.
Re:How Ironic by Wee · 2002-08-22 06:49 · Score: 2

"So, Perl is optimized for quick and dirty problems. The kind in which you know in advance you won't be maintaining it, and don't need a solid, easily-maintained language."
When did I ever say that? Perl is "optimized" to give you enough rope to hang yourself. Or climb. Or whatever you want. If you want to use it for one-off glue scripts, more power to you. You want to write something big and complicated with a definite maintenance path? Go right ahead. And if you don't like/know/use Perl, then that's fine too. Use Tcl or shell or Ruby or expect or Python or DOS Batch or even Brainfuck if you want. But don't begrudge another's use of Perl. They may have a very good reason to use it.
"And XML is a sophisticated mechanism for describing data for when a simple pipe-delimited file format won't achieve a sufficient buzzword high."
That's bullshit, and you know it. You can't always use CSV/pipes. Yeah, you can very often just use simple name/value pairs (for example), but XML fills a very crucial role when data storage needs become more complicated or have to grow beyond original design specs. I'd give you examples from stuff I've personally encountered, but you probably wouldn't care so I won't bother to go dig stuff up. You sound like someone that likes to roll their own parsers anyway...
"It's so powerful that this book doesn't even go into detail on learning XML - the reviewer recommends additional books for really getting to understand XML, should you desire to replace the fast & easy delimited format."
Your argument is specious.
I have an older book on XML by Charles Goldfarb (of SGML fame) which is 1100+ pages. In it, he admits that even with that much paper, the book can't fully impart knowledge of XML to the reader. The nature of XML defies that sort of easy classification and it's just too big a concept.
I'm going to write a book called "Oil-based Paints and Art" and it'll be around 216 pages long. In it, I'm going to claim to fully describe not just painting with oils, but also completely describe "art", and everything you can do with it. Then, in the face of both ludicrous claims, I'm going to ask that you trust me and my knowledge of the subject enough that you'll purchase my book.
"Oh yeah, sounds like a match made in heaven."
To some, it is. And they probably don't get on your case for your using your delimited format for everything. They probably don't have the slightest clue what your design goals are, and so couldn't possibly understand why you choose the tech you choose. Likely they figure that you've weighed your options and made the best choice, from a technological standpoint -- that you've chosen the right tool for the job at hand, whatever it may be. Since you know best what you need for your job.
I will never understand why people will constantly engage in techno-snobbery. If someone wants to use Perl and XML, what the hell should you care?
-B

--
Ash and Hickory, straight-grained and true, make excellent bludgeons, dandy for the cudgeling of vegetarians.
Re:This book is destined by ajs · 2002-08-22 06:55 · Score: 4, Informative

"The author correctly points out that Perl is best suited for quick, unmaintainable hacks and not serious, large-scale engineering."

A common misconception about Perl, and I understand why you would think this way. Having written several large systems in Perl (as well as several smaller systems in which maintainability was key), I will point out these problems with your thesis:

1. If your quick hacks are unmaintainable, get another job. If your prototypes aren't good enough to turn into a product with a small amount of work, you're wasting lots of your own (presumably valued) time.

2. PDL belies your comments. Take a look at it. It's large, maintainable and certainly serious. Scientific computing is a dream with it.

I generally find that "large scale engineering" is the easy part. What's hard is applying sound engineering and scientific principles to your code. You can do that as easily in C as in Perl as in Python as in Eiffel. It's a little harder in C++, though honestly not by much, because of the action-at-a-distance nature of C++'s object model.

Oddly, Java has a problem in neither area. Java's problems have very little to do with the language itself (which is an acceptable C derivative, if you want a C derivative with GC) and more to do with what Sun does with it, and how that has trained its users to approach it.

Perl's biggest problems are the following:

1. Incomplete object model. This was deliberate, and has given Perl6 the ability to form its object model out of a decade of best practices by those who "rolled their own" in Perl5.

2. Circa AWK/shell subroutine semantics. Also a major aim of the Perl6 effort, this is a subtle thorn in Perl's side. It's easily overcome once you get used to it, but leads to nearly impossible-to-optimize function calls in average code.

3. The lack of true compilation of the language leaves software developers with very few options in terms of shipping programs. Again, one of the first aims of Perl6.

See the Perl6 theme? Wait for it... it will rock your world.
Re:How Ironic by kpharmer · 2002-08-22 07:24 · Score: 2, Interesting

Jeez,

Calm down and look at what we're talking about here:
- Perl: a language well known for emphasizing ease of development at the cost of ease of maintenance
- XML: a distributed metadata mechanism that emphasizes ease of maintenance at the cost of ease of development.

Doesn't that sound like an odd mix that may occasionally be reasonable, but often shows a poor understanding of priorities and options?

What's next? How about:
- MySQL & WebLogic
- Dos bat files & Oracle
- Zope with isam files
- SAP sitting on top of Dbase IV

Common problems in it:
- emotional attachments to tools
- blind acceptance of marketing hype
- lack of perspective beyond a single project
Combined, these characteristics probably make these two technologies look ideal.
Re:Oh good! by m_ilya · 2002-08-22 07:33 · Score: 2

You are making a very wrong assumption that there is still a connection between the acronym and the language. PERL 1 and Perl 5 are very different languages.
On the other hand, many things can be described as "Practical Extraction and Reporting". Like getting XML data from a middle layer (extraction), converting it on the fly to an HTML/Excel/CSV report using XSLT stylesheets and serving the result via mod_perl (reporting) :) It is something I implemented not very long ago. Perl perfectly suited my needs.

--
--
Ilya Martynov (http://martynov.org/)
Re:This book is destined by ajs · 2002-08-22 13:32 · Score: 2

Your definition of "quick hack" seems to be questionable. I've cranked out quick hacks in every language I've ever used (C, C++, Perl, Pascal, CLIPS, numerous line-oriented scripting languages, LISP, etc.) It's always possible, and downright good. What's bad is writing code for any reason which others cannot understand and manage. If you do so in Perl, it's just as bad as doing so in Java, Python, C or COBOL.

Every language can be used to write obfuscated code. The question is, do you write obfuscated code for fun and when appropriate, or do you do it because you cannot do otherwise? If you cannot do otherwise, get out of the kitchen. If you can, do... in Perl or whatever language suits the job at hand.

Perl simply suits more jobs than most languages because of its flexibility, from complex data management (yes, of which its heralded capacity for text processing is a minor and rather unimpressive aspect) to elegant implementations of such high-level constructs as closures to the grace with which it does what you probably wanted in the first place by default, but always offers you the option to do otherwise.

Your comments make it pretty clear that you either do not know Perl, or you do not know it well. This is not shocking. Most people "know Perl" because they've seen some overglorified mailroom clerk's attempt at text processing written in it. The fact that Perl lends itself to being used by low-end programmers does not mean that that is the only domain in which it functions. If it were, I would have moved on long ago.
Re:perl is overrated by Tablizer · 2002-08-22 17:04 · Score: 2

(* Perl is easier to read than C or C++ because it is dynamic?!?! *)

Maybe for a Perl fan it is. (Plus, Perl is not the only way to make a dynamic language and there are exceptions.)

Like I said, different people are suited for different languages. Perl models Larry Wall's head, and people who think kind of like Larry like Perl and can read Perl.

--
Table-ized A.I.
Re:This book is destined by ajs · 2002-08-23 14:15 · Score: 2

"You increase disorder by writing obfuscated code when it is not originally required."

No, I most certainly do not. I don't know how you got that impression, but you're wrong. As I alluded to (perhaps too casually), I do obfuscate code for fun in competitions such as the IOCCC, but when it comes to work, I consider obfuscation to be downright evil. Anyone who writes hard-to-read code in Perl, C, C++, Java, Python, Eiffel, LISP or any other language should be shown the door. It's simply not acceptable.

I think you and I agree on this point, so I'm not quite sure how we got on opposite sides of the debate. I guess you just assumed that because a lot of lousy programmers have used Perl, that Perl is a language only suited to lousy programmers. That is certainly not true.