Perl & XML


Posted by timothy on Thursday July 11, 2002 @04:15AM from the merging-realms dept.

dooling writes: "Perl & XML is a well-written book that accomplishes what it sets out to do. It states in the preface that it is written for Perl programmers who want to learn about XML and what is available in Perl for XML processing. It achieves this goal, but little else. When you are done reading this book you will have been given an overview of Perl and XML, know where to begin to attack an XML document, and know where to look to find more information." For dooling's more complete review, read on below.

Perl & XML
author: Erik T. Ray & Jason McIntosh
pages: 202
publisher: O'Reilly and Associates
rating: 6
reviewer: dooling
ISBN: 059600205X
summary: Good introduction to XML for Perl programmers.

The book starts out with a brief explanation of why XML and Perl are well-suited for each other. It then provides a teaser of things to come: an explanation of how to use the XML::Simple module. The first chapter concludes with some warnings and gotchas that seem a little premature, since the authors have not yet really explained XML. Fortunately, most of these gotchas are covered in context later in the book.

The second chapter provides a whirlwind overview of XML -- covering its structure, DTDs, schemas, and XSLT (transformation). The discussions of XML in general, its history, and the parts of an XML document are well done. They give someone who is familiar with static HTML the needed background to understand the structure of an XML document and the vocabulary used to describe it. Unfortunately, the discussion of where XML begins to distinguish itself from HTML, namely with DTDs, the new replacement for DTDs called schemas, and the transformation language XSLT, is too brief. The authors gloss over these topics with little explanation and few examples. That said, there are other books that do provide more in-depth coverage of XML (this book only promises an introduction).

The next five chapters cover Perl modules designed to process XML, starting with simple parsers and writers. Only methods and syntax relating to XML processing are explained. Therefore, if you are considering reading this book, you should be fairly comfortable with Perl and object-oriented (OO) interfaces to CPAN modules (nearly all the modules discussed provide OO APIs). Again, there are other books and perldoc documentation that cover Perl and its OO features, so read them first if you are not familiar with OO Perl. If you are familiar with OO Perl, these chapters provide a good overview of the different ways XML can be processed (stream- and tree-based approaches), the advantages and disadvantages of each, and the Perl modules best suited for each approach. These chapters are the biggest strength of this book. The modules discussed in these chapters are by no means an exhaustive list of XML-related modules available from CPAN, nor do the explanations of each module cover everything the module does. These chapters do, however, provide the reader with enough information that she can begin to process XML documents intelligently and know where to turn when she needs more information.
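To make the stream-versus-tree distinction concrete, here is a minimal sketch. It is not taken from the book and does not use the Perl modules reviewed here; it uses Python's standard library as a stand-in, and the sample document and function names are invented for illustration. Both functions count the item elements in the same document, one by reacting to parser events and one by loading the whole tree first.

```python
import xml.sax
import xml.etree.ElementTree as ET

# Hypothetical sample document, made up for this illustration.
DOC = "<inventory><item>hammer</item><item>nails</item></inventory>"


class ItemCounter(xml.sax.ContentHandler):
    """Stream-based: respond to events as the parser emits them."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        # Called once per opening tag; nothing is kept in memory
        # beyond the running count.
        if name == "item":
            self.count += 1


def count_items_stream(text):
    handler = ItemCounter()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.count


def count_items_tree(text):
    # Tree-based: the whole document is parsed into memory first,
    # then queried after the fact.
    root = ET.fromstring(text)
    return len(root.findall("item"))


if __name__ == "__main__":
    print(count_items_stream(DOC), count_items_tree(DOC))
```

The trade-off is roughly the one the review describes: the stream version never holds the full document in memory but has to track its own state, while the tree version is easier to query but costs memory proportional to the document.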

The next chapter, Chapter 8, covers XML tree iterators, XPath, XSLT, and XML::Twig. All of these topics are covered in a span of 16 pages (with only slightly over two pages dedicated to XSLT). Indeed, after reading the chapter, you may get the feeling that it was only included so the authors could cram more trite colloquialisms into the book. The short shrift given to these topics creates the impression, which is strengthened in the chapters that follow, that this book was rushed a bit to press.

Chapter 9 discusses applications of XML, including RSS and SOAP, and Chapter 10 is mostly example code. These chapters are intended to give you a feeling for what is possible without really giving you enough information to make it happen. The main problem with these chapters is the examples: the examples are long and the explanations are short. Thus, they are more useful as templates or a quick reference than for learning these topics in detail. Of course, the authors never promised you would be programming SOAP applications when you were done reading this book. And again, there are other books out there which discuss these topics in more detail. So the authors stay true to their promise throughout the book: they will introduce you to XML and tell you how to interact with XML using Perl, no more.

Personally, I found this book did, in general, give me enough information to get started using XML and pointed me where I needed to go to get more information. I am an experienced Perl programmer who is new to XML and comfortable with on-line documentation. This book seems to be written for people who fit this profile and who want to learn by doing (finding the answers to the "hard" questions as they arise). It does introduce a wide variety of XML-related topics and the Perl modules used to interact with them, which is what the authors promised to do in the preface. While it is by no means an authoritative text on Perl and XML, there is something to be said for keeping promises ...

Index

As with most first-edition books, the index was adequate but not complete. For example, XML::Twig, which has an entire section covering it, does not appear in the index at all.

Contents
Preface

Perl and XML
- Why Use Perl with XML?
- XML Is Simple with XML::Simple
- XML Processors
- A Myriad of Modules
- Keep in Mind ...
- XML Gotchas
An XML Recap
- A Brief History of XML
- Markup, Elements, and Structure
- Namespaces
- Spacing
- Entities
- Unicode, Character Sets, and Encodings
- The XML Declaration
- Processing Instructions and Other Markup
- Free-Form XML and Well-Formed Documents
- Declaring Elements and Attributes
- Schemas
- Transformations
XML Basics: Reading and Writing
- XML Parsers
- XML::Parser
- Stream-Based Versus Tree-Based Processing
- Putting Parsers to Work
- XML::LibXML
- XML::XPath
- Document Validation
- XML::Writer
- Character Sets and Encodings
Event Streams
- Working with Streams
- Events and Handlers
- The Parser as Commodity
- Stream Applications
- XML::PYX
- XML::Parser
SAX
- SAX Event Handlers
- DTD Handlers
- External Entity Resolution
- Drivers for Non-XML Sources
- A Handler Base Class
- XML::Handler::YAWriter as a Base Handler Class
- XML::SAX: The Second Generation
Tree Processing
- XML Trees
- XML::Simple
- XML::Parser's Tree Mode
- XML::SimpleObject
- XML::TreeBuilder
- XML::Grove
DOM
- DOM and Perl
- DOM Class Interface Reference
- XML::DOM
- XML::LibXML
Beyond Trees: XPath, XSLT, and More
- Tree Climbers
- XPath
- XSLT
- Optimized Tree Processing
RSS, SOAP, and Other XML Applications
- XML Modules
- XML::RSS
- XML Programming Tools
- SOAP::Lite
Coding Strategies
- Perl and XML Namespaces
- Subclassing
- Converting XML to HTML with XSLT
- A Comics Index

Index

You may also want to check out Erik T. Ray's home page, Jason McIntosh's home page, or O'Reilly's page for the book. You can purchase Perl & XML from bn.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.

4 of 125 comments

my opinion by larry+bagina · 2002-07-11 04:23 · Score: 5, Informative

I am a professional developer, working mostly with Perl. I work in the field of biology and bioinformatics, but have spent the last 8 years working as a web and database Internet developer. And, I own practically every O'Reilly Perl book ever published (not that I necessarily think they're all worth buying). So, now that you know where I'm coming from...

If you are preparing to do a serious amount of XML development, and you're in the process of determining a) which Perl XML modules on CPAN you want to use, and b) how to use them; and, you don't have a whole lot of time to spend tracking down the sometimes-hard-to-find documentation on these modules; then buying this book is a no-brainer. It covers all the major XML modules and how to use them, and really helps you figure out when to use the different modules.

Even if you're not new to XML and Perl, this book would serve as an excellent refresher course on what XML tools are available out there for you... Maybe you haven't looked at your code in a while, or want to update it to use a newer module from CPAN? Or, maybe you're looking for a better way to do it? Then, this book would definitely help you out.

While a fan of O'Reilly books in general, I'll be the first to admit some of them are more useful than others. I highly recommend this book, though, as it's actually useful, comprehensive and very well presented. I find myself cracking it open all the time, especially as my utilization of XML has grown more complicated. It has definitely earned its place in my Aqua Perl book collection.

--
Do you even lift?
These aren't the 'roids you're looking for.

Right... by alexhmit01 · 2002-07-11 04:36 · Score: 5, Insightful

You realize that if I get an XML file, I can figure out what it is saying and decide what to do with it. With your ideal (binary) files, I need to reverse engineer the format.

With binary, I need permission to interoperate. With XML, I need a text editor (or print-out) and some common sense.

You worry all you want about the computer's efficiency. I use my machines to make my life easier. I don't jump through hoops to make the computer's life easier...

Taking troll bait,
Alex

how to attack and neutralize the wild XML document by Dr.+Awktagon · 2002-07-11 04:42 · Score: 5, Funny

...know where to begin to attack an XML document...

I can tell you from personal experience, you want to attack the soft, weak center of each element, or, even better, any undefended #PCDATA.

You'll want to avoid attacking the sharp angle brackets present on every element. Your sword blows will simply glance off, and then the XML document will jab you with the sharp corner.

Entities are another hidden danger. The ampersand prefix character is very quick and wily, and even though it appears smooth and undefended, it can quickly turn on you, showing its offensive nature and bristling an array of pointy teeth. (Note, this depends on your screen font).

In short, attacking XML documents is risky, but with the proper strategy, can yield a nearly limitless supply of delicious data.

Ahem.

Does anybody know of any Perl XSLT module that allows Perl functions to be called from the templates? I.e., to format dates or stuff like that.

Re:Why parse XML in the first place? by The+Pim · 2002-07-11 09:36 · Score: 5, Insightful
Interoperability is great and all, but I think XML is nothing but hype.
Heck, let me give this my crack... :-)
Ok, obviously the biggest reason for XML's popularity is hype. That's just the way the industry works; it doesn't make XML good or bad.
There are several legitimate technical benefits to XML that might be persuasive in one context or another.
- It looks like HTML, so everyone intuitively "gets" it.
- It's textual (not binary)--but of course, many formats are textual.
- It's reasonably easy for humans to understand without a spec, provided the tag and attribute names are not obfuscated, and the relationships are relatively simple. Note this does not make it easy for programs to understand!
- You don't have to write your own parser. You don't even have to write a grammar--just throw in a tag and the corresponding code to read and write it. This advantage is not as big as some make it out to be: many languages have easy-to-use features for parsing, and those that don't can make use of easy-to-use parser-generator tools.
- There are lots of libraries and tools. Of course, this is self-reinforcing (tools -> popular -> more tools -> more popular -> ...).
Many XML proponents, including some in this thread, would add to this list that XML is a good data storage and/or interchange format. Some "insightfully" note that it is better for data interchange than data storage. This is the biggest delusion over XML: XML is a rotten format for data.
Remember what XML was back before the hype machine was in overdrive? It was a better HTML, and a simpler SGML. HTML and SGML have always been formats for documents, and XML was intended to be the same. XML is indeed a pretty good match for documents. (This is debated of course: documents are complex things, and modeling them is non-trivial. Embedded Markup Considered Harmful, by Ted Nelson, is a good introduction.)
But XML is a poor match for data. This is because an XML document is a tree, and most data are not hierarchical. Consider that the database industry abandoned hierarchical databases many years ago (ok, abandoned is a little strong: we still use LDAP). Hierarchical data formats force you to pick which relationships will define the hierarchy, and any other relationships have to be kluged in.
Take a simple example of the sort of thing people use XML for: address book entries. Say you start out with a person element (I'm not going to write out the examples in XML syntax because it's too painful on slashdot) containing a name element and an address element. Now, you realize that multiple people may live at the same address, and you don't want to duplicate the address (data formats should be normalized). You either have to turn things inside out, putting the person element inside the address element, or make person and address both top-level elements, and link them somehow. In the former case, you have chosen an awkward hierarchy, and have "used-up" your ability to group people. What if you want a different grouping in the future? In the latter case, you have given up a lot of simplicity and read/writability (since now names and their corresponding addresses are in different places) by forcing non-hierarchical data into a hierarchical format.
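For readers who want to see the two layouts side by side, here is one hypothetical rendering. The post deliberately leaves the XML unwritten, so every element name below (addressbook, street, addressref, and so on) is invented for illustration, and the Python around it is only a sketch of what reading each layout involves.

```python
import xml.etree.ElementTree as ET

# Layout 1: "inside out" -- people are nested under the address they share.
NESTED = """
<addressbook>
  <address>
    <street>12 Elm St</street>
    <person><name>Ann</name></person>
    <person><name>Bob</name></person>
  </address>
</addressbook>
"""

# Layout 2: people and addresses are both top-level and linked by an id.
LINKED = """
<addressbook>
  <address id="a1"><street>12 Elm St</street></address>
  <person><name>Ann</name><addressref>a1</addressref></person>
  <person><name>Bob</name><addressref>a1</addressref></person>
</addressbook>
"""


def people_nested(text):
    """Name -> street, read straight off the hierarchy."""
    root = ET.fromstring(text)
    return {
        person.findtext("name"): address.findtext("street")
        for address in root.findall("address")
        for person in address.findall("person")
    }


def people_linked(text):
    """Name -> street, after resolving the id references by hand."""
    root = ET.fromstring(text)
    streets = {a.get("id"): a.findtext("street") for a in root.findall("address")}
    return {
        p.findtext("name"): streets[p.findtext("addressref")]
        for p in root.findall("person")
    }


if __name__ == "__main__":
    print(people_nested(NESTED))
    print(people_linked(LINKED))
```

Both functions recover the same name-to-street mapping, but the linked layout needs an explicit lookup step that the format itself does nothing to enforce, which is the loss of simplicity the paragraph above describes.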
What is the solution? Well, I won't assert that it is the best data model that will ever exist, but the database industry has settled (roughly) on the relational model. So I think we should create a format describing relations, combined with the other advantages of XML: extensible, textual, readable, and most of all, standards-based. Yes, this would mean we would have to learn two technologies, one for documents and one for data. But the technology for data would be so much simpler--and as a bonus, integrate easily into our databases--that it would be a huge win overall. I don't have time to defend this model in depth. But think about it.
By the way, another example of the bad match between XML and data is the great debate over when you should use elements, and when you should use attributes. The fact that there is an arbitrary decision to be made shows that XML has degrees of complexity that only get in the way when you use it as a data format. (If you're going to use XML for data, at least have the decency to eschew attributes except for an id attribute.)
--

The evaluation of an action as 'practical' . . . depends on what it is that one wishes to practice.