Perl & XML
The book starts out with a brief explanation of why XML and Perl are well-suited for each other. It then provides a teaser of things to come: an explanation of how to use the XML::Simple module. The first chapter concludes with some warnings and gotchas that seem a little premature, since the authors have not yet really explained XML. Fortunately, most of these gotchas are covered in context later in the book.
The second chapter provides a whirlwind overview of XML, covering its structure, DTDs, schemas, and XSLT (transformation). The discussions of XML in general, its history, and the parts of an XML document are well done. They give someone who is familiar with static HTML the background needed to understand the structure of an XML document and the vocabulary used to describe it. Unfortunately, the coverage of the areas where XML begins to distinguish itself from HTML, namely DTDs, schemas (the newer replacement for DTDs), and the transformation language XSLT, is too brief. The authors gloss over these topics with little explanation and few examples. That said, other books provide more in-depth coverage of XML, and this book only promises an introduction.
The next five chapters cover Perl modules designed to process XML, starting with simple parsers and writers. Only the methods and syntax relating to XML processing are explained, so if you are considering this book, you should be fairly comfortable with Perl and with the object-oriented (OO) interfaces of CPAN modules (nearly all the modules discussed provide OO APIs). Again, other books and the perldoc documentation cover Perl and its OO features, so read them first if you are not familiar with OO Perl. If you are, these chapters provide a good overview of the different ways XML can be processed (stream- and tree-based approaches), the advantages and disadvantages of each, and the Perl modules best suited to each approach. These chapters are the biggest strength of this book. The modules they discuss are by no means an exhaustive list of the XML-related modules available from CPAN, nor do the explanations cover everything each module does. These chapters do, however, give the reader enough information to begin processing XML documents intelligently and to know where to turn when she needs more.
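To make the stream-versus-tree distinction concrete, here is a minimal sketch. It uses Python's standard-library xml.etree.ElementTree as a stand-in for the Perl modules the book covers (XML::Parser, XML::LibXML, and so on). The sample document and the function names are hypothetical. The tree version builds the whole document in memory before querying it. The stream version handles each element as the parser finishes it and then clears it.

```python
import io
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical sample document shared by both approaches.
SAMPLE = b"""<library>
  <book id="1"><title>Perl &amp; XML</title></book>
  <book id="2"><title>Another Title</title></book>
</library>"""


def titles_tree(data: bytes) -> List[str]:
    """Tree-based: build the whole document in memory, then query it."""
    root = ET.fromstring(data)
    return [book.findtext("title") for book in root.iter("book")]


def titles_stream(data: bytes) -> List[str]:
    """Stream-based: act on each element as the parser finishes it."""
    titles = []
    for _event, elem in ET.iterparse(io.BytesIO(data), events=("end",)):
        if elem.tag == "title":
            titles.append(elem.text)
        elif elem.tag == "book":
            elem.clear()  # discard the finished <book>'s contents to save memory
    return titles


if __name__ == "__main__":
    print(titles_tree(SAMPLE))
    print(titles_stream(SAMPLE))
```

Both functions return the same titles. The tree approach is easier to query. The stream approach can keep memory use much lower on large files. That is the trade-off these chapters weigh.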
The next chapter, Chapter 8, covers XML tree iterators, XPath, XSLT, and XML::Twig. All of these topics are covered in a span of 16 pages (with only slightly over two pages dedicated to XSLT). Indeed, after reading the chapter, you may get the feeling that it was only included so the authors could cram more trite colloquialisms into the book. The short shrift given to these topics creates the impression, which is strengthened in the chapters that follow, that this book was rushed a bit to press.
Chapter 9 discusses applications of XML, including RSS and SOAP, and Chapter 10 is mostly example code. These chapters are intended to give you a feeling for what is possible without really giving you enough information to make it happen. The main problem with these chapters is the examples: they are long and the explanations are short. Thus, they are more useful as templates or a quick reference than for learning these topics in detail. Of course, the authors never promised you would be programming SOAP applications when you were done reading this book, and again, other books discuss these topics in more detail. So the authors stay true to their promise throughout the book: they will introduce you to XML and show you how to interact with it using Perl, no more.
Personally, I found this book did, in general, give me enough information to get started using XML and pointed me where I needed to go to get more information. I am an experienced Perl programmer who is new to XML and comfortable with on-line documentation. This book seems to be written for people who fit this profile and who want to learn by doing (finding the answers to the "hard" questions as they arise). It does introduce a wide variety of XML-related topics and the Perl modules used to interact with them, which is what the authors promised to do in the preface. While it is by no means an authoritative text on Perl and XML, there is something to be said for keeping promises ...
Index: As with most first-edition books, the index was adequate but not complete. For example, XML::Twig, which has an entire section covering it, does not appear in the index at all.
Contents
Preface
- Perl and XML
- Why Use Perl with XML?
- XML Is Simple with XML::Simple
- XML Processors
- A Myriad of Modules
- Keep in Mind ...
- XML Gotchas
- An XML Recap
- A Brief History of XML
- Markup, Elements, and Structure
- Namespaces
- Spacing
- Entities
- Unicode, Character Sets, and Encodings
- The XML Declaration
- Processing Instructions and Other Markup
- Free-Form XML and Well-Formed Documents
- Declaring Elements and Attributes
- Schemas
- Transformations
- XML Basics: Reading and Writing
- XML Parsers
- XML::Parser
- Stream-Based Versus Tree-Based Processing
- Putting Parsers to Work
- XML::LibXML
- XML::XPath
- Document Validation
- XML::Writer
- Character Sets and Encodings
- Event Streams
- Working with Streams
- Events and Handlers
- The Parser as Commodity
- Stream Applications
- XML::PYX
- XML::Parser
- SAX
- SAX Event Handlers
- DTD Handlers
- External Entity Resolution
- Drivers for Non-XML Sources
- A Handler Base Class
- XML::Handler::YAWriter as a Base Handler Class
- XML::SAX: The Second Generation
- Tree Processing
- XML Trees
- XML::Simple
- XML::Parser's Tree Mode
- XML::SimpleObject
- XML::TreeBuilder
- XML::Grove
- DOM
- DOM and Perl
- DOM Class Interface Reference
- XML::DOM
- XML::LibXML
- Beyond Trees: XPath, XSLT, and More
- Tree Climbers
- XPath
- XSLT
- Optimized Tree Processing
- RSS, SOAP, and Other XML Applications
- XML Modules
- XML::RSS
- XML Programming Tools
- SOAP::Lite
- Coding Strategies
- Perl and XML Namespaces
- Subclassing
- Converting XML to HTML with XSLT
- A Comics Index
You may also want to check out Erik T. Ray's home page, Jason McIntosh's home page, or O'Reilly's page for the book.
You realize that if I get an XML file, I can figure out what it is saying and decide what to do with it. With your ideal (binary) files, I need to reverse engineer the format.
With binary, I need permission to interoperate. With XML, I need a text editor (or print-out) and some common sense.
You worry all you want about the computer's efficiency. I use my machines to make my life easier. I don't jump through hoops to make the computer's life easier...
Taking troll bait,
Alex
Parsing XML indeed. I mean seriously, have any of you ever actually tried to implement XML parsing? It's an order of magnitude slower than accessing a database, ten zillion times slower than reading a flat-file ASCII database, and a trillion times more expensive (well, I'm exaggerating a bit) than reading in a text file with nested variable=value pairs.
Interoperability is great and all, but I think XML is nothing but hype.
Programmers, hear my cry! Spend your precious hours working on your program interface, your error-checking, your overall design and modularity; don't spend time worrying about a scheme with a fancy name that saves data like this: <variable>value</variable>.
Don't mod me up or down, I just want to foster a discussion about this. I mean, as a standalone programmer using Perl for a majority of their web application products, what benefit does XML give you other than buzzword compliance?
----
Slogan-free since April! We pass the savings on to you!
- Conversion/translation complexity
- Syntax Errors
- Conversion errors
- Storage requirements (object files)
The only benefit AFAIK is that people can read the code better. However, the applications still have to understand the standard coding syntax, which consists of a hideous number of keywords and styles. Said applications would have been better off using Assembly (read: efficient) code in the first place. Ban C!
Please note the extremely sarcastic tone of this post.
Your complaints are old-fashioned. Maintainability is a major overlooked flaw in Computer Science.
Good quote, too many chars. Seriously, the Slashdot 120-char limit sucks!
OK, I'll reply to this. I'm probably being trolled :P
Conversion complexity, granted. It does take a bit of work. But would you recommend describing each record with individual lines? That's a bigger pain than ever. What XML gives you isn't just a structure for your data, but a language to describe it. It also allows for non-2D data: a record can contain nested subsets of data, and those subsets can be of different types. This is great, as now we have a language that describes data in a logical manner and is completely portable.
Conversion errors: please be more specific. If you convert to a comma-delimited format, you are screwed if you do it wrong. If you convert straight to binary, you have to worry about how many bits represent any given piece of data. Why do you think Perl's pack has so many different switches for converting data?
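Here is a minimal illustration of the point about bit widths. It uses Python's struct module as a stand-in for Perl's pack, which offers comparable templates such as N and V for big- and little-endian 32-bit integers. The value is arbitrary. The same number gets a different byte layout depending on width and byte order. A reader that assumes the wrong layout silently gets a different number. The XML text form avoids that ambiguity.

```python
import struct
import xml.etree.ElementTree as ET

value = 1_000_000  # arbitrary example number

little_32 = struct.pack("<I", value)  # 4 bytes, little-endian unsigned
big_32 = struct.pack(">I", value)     # same number, 4 bytes, big-endian
little_64 = struct.pack("<Q", value)  # same number, 8 bytes, little-endian

# A reader that assumes the wrong byte order gets a different number, with no error.
misread = struct.unpack(">I", little_32)[0]

# The XML form spells the value out as text, so width and byte order never come up.
from_xml = int(ET.fromstring(f"<value>{value}</value>").text)

print(little_32.hex(), big_32.hex(), little_64.hex())
print(misread, from_xml)
```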
Just because you use XML doesn't mean you must store data in XML form. Hell, if you have gigs of data, it's stupid to use XML to relate it all. DBs don't use XML except to express data back to the user or software they talk to (if asked for).
If you are worried about bandwidth, on a simplistic level, gzip it. Yes, compress it. Hell, do a gzip stream which is supported by many browsers.
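Here is a minimal sketch of the "just gzip it" suggestion, using Python's standard gzip module. The repetitive sample document is hypothetical. Markup with many repeated tag names tends to compress well, and the round trip is lossless. Over HTTP, the same idea is usually applied by the server sending `Content-Encoding: gzip` to clients whose `Accept-Encoding` header says they support it.

```python
import gzip


def build_sample_xml(n: int) -> bytes:
    """Build a hypothetical document of n repetitive <record> elements."""
    body = "".join(
        f"<record><id>{i}</id><status>active</status></record>" for i in range(n)
    )
    return f"<records>{body}</records>".encode("utf-8")


xml_doc = build_sample_xml(1000)
compressed = gzip.compress(xml_doc)     # what you would send or store
restored = gzip.decompress(compressed)  # what the receiver gets back

print(f"{len(xml_doc)} bytes raw, {len(compressed)} bytes gzipped")
```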
If your program plans on sharing data, you'll want to use XML. If you never want to share your data, fine, use binary. It's not terrible. But once you wanna share it between two machines or processes, you have to worry about deciphering the binary format. That is, unless you work by yourself and have documentation on everything.
-
ping -f 255.255.255.255 # if only
For a more detailed, and more depressing, take on the above, see http://www.xmlsucks.org/but_you_have_to_use_it_anyway/.
Yes, it's a PDF. Unroll it - it's worth the effort.
To a Lisp hacker, XML is S-expressions in drag.
I use XML as the interchange format for a web publishing system that publishes our internet web site (http://www.bms.com), but the data is actually stored in an Oracle database. I have a Perl object that handles all the fuss of getting and putting XML into the database.
As an interchange format, XML is ideal; think of it as comma-separated files on steroids. When all your data can be serialized to XML, you get the following benefits:
1. XML has rich data structures for complex info.
2. XML can be self describing.
3. XML is 100% portable.
As with HTML, people will discover uses for your XML files that you never thought of. Also, if you lose all the docs, you can read the XML in a standard text or Unicode editor and figure it out. This is even better than comma-separated files, since most CSV files don't bother to include a first-row field description.
Like CSV files, XML files can be searched with standard command-line tools like grep. And in 100 years, all those Oracle tablespaces will require a lot of reverse engineering to get the data off them, while your text-based XML files will still be parsable.
I do agree, though, with the general notion of the parent. Definitely don't use XML just because it sounds cool. Use the best process for the job, and for many data-related jobs, relational tables and SQL are best.
One thing you can do to improve speed: serialize your DOM objects using the Perl Storable module and save them alongside your plain-text versions. Then, when you need to access the data, all you need to do is deserialize the object, which is a lot faster than reparsing.
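Here is a minimal sketch of that caching idea. It is written in Python, with pickle standing in for Perl's Storable (whose store and retrieve functions play the same role). Rather than pickling a DOM object, it converts the parsed tree into plain nested dicts using a hypothetical element_to_dict helper. It reparses only when the cache file is missing or older than the XML. File names are hypothetical. As with any serialized cache, only load files you wrote yourself, because unpickling untrusted data is unsafe.

```python
import os
import pickle
import tempfile
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Hypothetical helper: turn an Element into plain nested dicts and lists."""
    return {
        "tag": elem.tag,
        "attrib": dict(elem.attrib),
        "text": (elem.text or "").strip(),
        "children": [element_to_dict(child) for child in elem],
    }


def load_with_cache(xml_path, cache_path):
    """Return the parsed structure, reparsing only if the cache is missing or stale."""
    if os.path.exists(cache_path) and os.path.getmtime(cache_path) >= os.path.getmtime(xml_path):
        with open(cache_path, "rb") as fh:
            return pickle.load(fh)  # fast path: no XML parsing
    data = element_to_dict(ET.parse(xml_path).getroot())
    with open(cache_path, "wb") as fh:
        pickle.dump(data, fh)  # save next to the plain-text XML
    return data


# Usage with throwaway files (names are hypothetical).
with tempfile.TemporaryDirectory() as tmp:
    xml_path = os.path.join(tmp, "data.xml")
    cache_path = xml_path + ".pickle"
    with open(xml_path, "w", encoding="utf-8") as fh:
        fh.write('<config><item key="a">1</item></config>')
    first = load_with_cache(xml_path, cache_path)   # parses, then writes the cache
    second = load_with_cache(xml_path, cache_path)  # normally served from the cache
```

Caching plain data structures rather than live DOM objects keeps the cache independent of any one parser's object model. The trade-off is that you lose DOM methods after reloading.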
Peace, or Not?