XML and Perl


Posted by timothy on Thursday January 30, 2003 @04:30AM from the texty-bits dept.

davorg writes "One of Perl's great strengths is in processing text files. That is, after all, why it became so popular for generating dynamic web pages -- web pages are just text (albeit text that is supposed to follow particular rules). As XML is just another text format, it follows that Perl will be just as good at processing XML documents. It's therefore surprising that using Perl for XML processing hasn't received much attention until recently. That's not saying that there hasn't been work going on in that area -- many of the Perl XML processing modules have long and honourable histories -- it's just that the world outside of the Perl community doesn't seem to have taken much notice of this work. This is all set to change with the publication of this book and O'Reilly's Perl and XML." Read on to see how well Davorg thinks this book introduces XML text processing with Perl to the wider world.

XML and Perl
author: Mark Riehl, Ilya Sterin
pages: 378
publisher: New Riders
rating: 8
reviewer: Davorg
ISBN: 0735712891
summary: Good introduction to processing XML with Perl

XML and Perl is written by two well-known members of the Perl XML community. Both are frequent contributors to the "perl-xml" mailing list, so there's no doubt that they know what they are talking about, which is always a good thing in a technical book.

The book is made up of five sections. The first section has a couple of chapters which introduce you to the concepts covered in the book. Chapter one introduces you separately to XML and Perl and then chapter two takes a first look at how you can use Perl to process XML. This chapter finishes with two example programs for parsing simple XML documents.

Section two goes into a lot more detail about parsing XML documents with Perl. Chapter three looks at event-driven parsing, using XML::Parser and XML::Parser::PerlSAX to build example programs before going on to talk in some detail about XML::SAX, which is currently the state of the art in event-driven XML parsing in Perl. It also looks at XML::Xerces, which is a Perl interface to the Apache Software Foundation's Xerces parser. Chapter four covers tree-based XML parsing and presents examples using XML::Simple, XML::Twig, XML::DOM and XML::LibXML. In both of these chapters the pros and cons of each of the modules are discussed in detail so that you can easily decide which solution to use in any given situation.
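
To make the difference between the two parsing styles concrete, here is a minimal sketch. It uses Python's standard library (xml.sax and xml.dom.minidom) as a stand-in for the Perl modules the book covers; the sample document and the TitleCollector class are invented for illustration and are not taken from the book.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical sample document, not taken from the book.
SAMPLE = (
    "<library>"
    "<book><title>XML and Perl</title></book>"
    "<book><title>Perl and XML</title></book>"
    "</library>"
)


class TitleCollector(xml.sax.ContentHandler):
    """Event-driven style: the parser calls these methods as it reads the input."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so collect it until the end tag.
        if self._in_title:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._in_title = False


def titles_event_driven(xml_text):
    handler = TitleCollector()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.titles


def titles_tree_based(xml_text):
    # Tree style: the whole document is loaded into memory, then queried.
    doc = minidom.parseString(xml_text)
    return [
        "".join(n.data for n in el.childNodes if n.nodeType == n.TEXT_NODE)
        for el in doc.getElementsByTagName("title")
    ]
```

Both functions return the same list of titles for SAMPLE. The event-driven version never holds the whole document in memory, while the tree version is shorter to write and allows random access; that is roughly the trade-off the review says the two chapters discuss.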

Section three covers generating XML documents. In chapter five we look at generating XML from text sources using simple print statements and also the modules XML::Writer and XML::Handler::YAWriter. Chapter six looks at taking data from a database and turning that into XML using modules like XML::Generator::DBI and XML::DBMS. Chapter seven looks at miscellaneous other input formats and contains examples using XML::SAXDriver::CSV and XML::SAXDriver::Excel.
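
A short sketch of why a writer module can be preferable to bare print statements. This is Python rather than Perl: xml.sax.saxutils.XMLGenerator plays a role loosely comparable to the writer modules named above, and the function names and the sample author string are assumptions made up for the example.

```python
import io
from xml.sax.saxutils import XMLGenerator


def write_with_print(name):
    # Plain string formatting: nothing escapes markup characters in the data.
    return "<author>%s</author>" % name


def write_with_generator(name):
    # XMLGenerator is a SAX handler that serialises events and escapes text.
    out = io.StringIO()
    gen = XMLGenerator(out, encoding="utf-8")
    gen.startElement("author", {})
    gen.characters(name)
    gen.endElement("author")
    return out.getvalue()
```

With the input "Riehl & Sterin" the first function emits a bare ampersand, which is not well-formed XML, while the second emits `&amp;`.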

Section four covers more advanced topics. Chapter eight is about XML transformations and filtering. This chapter covers using XSLT to transform XML documents. It covers the modules XML::LibXSLT, XML::Sablotron and XML::XPath.

Chapter nine goes into detail about Matt Sergeant's AxKit, the Apache XML Kit which allows you to create a website in XML and automatically deliver it to your visitors in the correct format.

Chapter ten rounds off the book with a look at using Perl to create web services. It looks at the two most common modules for creating web services in Perl - XML::RPC and SOAP::Lite.

Finally, section five contains the appendices which provide more background on the introductions to XML and Perl from chapter one.

There was one small point that I found a little annoying when reading the book: each example was accompanied by a sample of the XML documents to be processed, together with both a DTD and an XML Schema definition for the document. This seemed to me to be overkill. Did we really need both DTDs and XML Schemas for every example? I would have found it less distracting if one (or even both) of these had been moved to an appendix.

That small complaint aside, I found it a useful and interesting book. It will be very useful to Perl programmers (like myself) who will increasingly be expected to process (and provide) data in XML formats.


138 comments

Re:It's a great book about a terrific subject by zapfie · 2003-01-30 04:42 · Score: 1

Perl is a markup language?

--
slashdot!=valid HTML
Nice by Gortbusters.org · 2003-01-30 04:46 · Score: 1

Though the reviewer didn't think so, I like it when DTD and XML Schema examples are side by side. Having looked at DTDs for quite some time now, I have to change gears to the new standard of using XML schemas.

Would be nice to have a book with more than just one chapter on web services. There are a plethora of Java/C# web services books out there, but it's hard to find one out there just for Perl, PHP, etc.

--
--------
Free your mind.
1. Re:Nice by DaRobin · 2003-01-30 05:01 · Score: 5, Informative
  
  Would be nice to have a book with more than just one chapter on web services.
  
  You might be interested in Programming Web Services with Perl then.
  
  --
  Radioactive cats have 18 half-lives.
I'd buy it ... by B3ryllium · 2003-01-30 04:46 · Score: 1, Funny

... but I thought Perl was a write-only language? How can I be expected to read the book, if it's just gibberish like Perl? Geez. :) (Okay, fine - I admit it - I kinda like Perl. But that's another story.)
1. Re:I'd buy it ... by nmtratman · 2003-01-30 05:17 · Score: 1
  
  Perl excels at text processing in part because Perl excels at regular expressions. They are part of the language, instead of a tacked-on library interface. It is easier to extract arbitrarily formatted text using Perl, while other languages have a more difficult time, since the regular expressions don't come naturally in them.
  
  XML has a regular, structured format. It is easily parsed, but almost no one parses it directly. They use a model which represents the data, usually some form of DOM or SAX. Libraries are present in most languages. The need to rely heavily on regular expressions isn't there, and it allows people to choose other languages without paying a huge development penalty.
  
  Not that there isn't a development penalty, but the penalty is mostly the same as developing under that language normally. Developing in C will generally take more time than in, say, Perl, Tcl, or Python, because of low-level issues that the other languages don't have. The resulting code, though, isn't necessarily uglier or different in structure.
  
  There are lots of pages on Perl and XML (check google if you don't believe me), but it just seems that Perl doesn't have the overwhelming advantage on other languages on this subject. That's not to say it isn't useful. But if I were to do XML processing, I probably wouldn't be using Perl.
  
  Unless it was to process nasty, arbitrarily formatted text into XML.
  
  If you really want your Perl script to be write only, use "chmod 0333 myScript.perl". Nifty language that is constantly coaxing you to the dark side, begging you to give in to your inner desires, to write code that will rip the sanity from those who look at it!
  
  --
  Car analogies work about as well as a Ford Pinto with a keg of beer in the passenger seat.
2. Re:I'd buy it ... by B3ryllium · 2003-01-30 05:45 · Score: 1
  
  I haven't looked into XML on Perl (although I had a friend write a regexp-based parser of a fairly large XML feed), but I did look passingly at XML in PHP ... it seemed like PHP had a fairly decent implementation. I plan to explore the PHP version in the future, but if the need exists, I'm always open to Perl. :)
  
  I've never even looked at Python code, but I hear it has a few ... oddities ... over other languages. I've never heard anyone say it sucks, though, so that's a plus. :)
You lost me on the incredible leap of logic... by rand.srand() · 2003-01-30 04:50 · Score: 0, Insightful

As XML is just another text format, it follows that Perl will be just as good at processing XML documents.

Since my pasta maker is good at making pasta, and ice cream and pasta are both foods, it follows my pasta maker will be just as good at making ice cream.
1. Re:You lost me on the incredible leap of logic... by sheriff_p · 2003-01-30 05:45 · Score: 3, Insightful
  
  Ah no, see, you forgot to read the first line:
  
  "One of Perl's great strengths is in processing text files."
  
  Perl is good at handling text files. XML is a text file. Therefore, Perl is good at handling XML.
  
  As opposed to:
  
  My pasta maker is good at making pasta. Pasta is a type of food. Ice-cream is also food. Therefore, my pasta maker is good at making ice-cream.
  
  Does that help?
  
  --
  Score:-1, Funny
2. Re:You lost me on the incredible leap of logic... by Anonymous Coward · 2003-01-30 05:45 · Score: 0
  
  if x is a subset of y and you can do z to y, you can do z to x.
  in other words, it's not that text and xml are both text formats but that xml is a subset of text.
3. Re:You lost me on the incredible leap of logic... by Jagunco · 2003-01-30 06:00 · Score: 1
  
  Perl's ability to process text files doesn't make DOM or SAX any easier (or better) than any other implementation. It's plain dumb to say that since XML files are text files and perl is good at processing text files (sic), perl is good for processing XML documents.
4. Re:You lost me on the incredible leap of logic... by IpalindromeI · 2003-01-30 06:53 · Score: 3, Insightful
  
  Except that your syllogism is faulty, whereas his is not.
  
  His:
  1. (from earlier in his post) Perl is well suited for processing all text formats.
  2. XML is a text format.
  3. Therefore, Perl is well suited for processing XML.
  
  Yours:
  1. Your pasta maker is good at making pasta.
  2. Pasta is a type of food.
  3. Therefore, your pasta maker is good at making all types of food (for example, ice cream).
  
  You can see that he went from general to specific, whereas you went from specific to general. He argues that being able to do all things in a given set (process all text formats) gives the ability to do one of the things in that set (process a particular text format). You argue that being able to do one thing in a set (make a particular food) gives the ability to do all things in the set (make all foods).
  
  You could save your argument by changing your middle point to be "All foods are a type of pasta," and then your conclusion becomes trivially true. But you'd also have to get everyone to agree that ice cream is pasta.
  
  --
  Promoting critical thinking since 1994.
5. Re:You lost me on the incredible leap of logic... by andy@petdance.com · 2003-01-30 08:15 · Score: 1
  
  Of course, many (most?) of the Perl XML modules aren't doing the parsing directly, but calling the expat library, which is not Perl at all.
6. Re:You lost me on the incredible leap of logic... by Requiem · 2003-01-30 08:16 · Score: 1
  
  No, it actually makes perfect sense.
  
  Consider the following:
  
  for all elements e of a set x, y(e).
  z belongs to x.
  therefore, by definition, y(z).
  
  Note the "by definition" part.
7. Re:You lost me on the incredible leap of logic... by C+Joe+V · 2003-01-30 08:45 · Score: 1
  
  Note the "by definition" part.
  y(z) does not follow "by definition" unless your first assumption is the definition of y. It is not the definition of perl that it is good at processing text files. It is at most a fact about perl.
  In fact I don't think it makes sense to claim "For all text files T, perl is good at processing T." That's pretty nonsensical if you ask me. And I don't think it's even true that "For all operations O on text files, perl is a good implementation language for O." If this were really true, then it would follow that perl is a good language for any operation that assumes its input is an XML file, which is what the original poster seemed to mean.
  What's more reasonable is to relax the statement to "There exists a fairly large class C of operations on text files such that perl is a good implementation language for any operation O in C." But it does not follow from this that perl is good for XML, unless the intersection of C and "XML operations" is a fairly large portion of the latter set.
  CJV
8. Re:You lost me on the incredible leap of logic... by Golias · 2003-01-30 10:08 · Score: 2, Insightful
  
  As XML is just another text format, it follows that Perl will be just as good at processing XML documents.
  Since my pasta maker is good at making pasta, and ice cream and pasta are both foods, it follows my pasta maker will be just as good at making ice cream.
  That only correlates if ice cream is a type of pasta, because XML is a text format.
  This is a lot more like saying "since my pasta maker is good at making Ziti, Rigate, Macaroni, etc., all pastas really, and Spaghetti is a type of pasta, my pasta maker should be good at making Spaghetti."
  
  --
  Information wants to be anthropomorphized.
9. Re:You lost me on the incredible leap of logic... by Anonymous Coward · 2003-01-30 12:04 · Score: 0
  
  "His:
  1. (from earlier in his post) Perl is well suited for processing all text formats.
  2. XML is a text format.
  3. Therefore, Perl is well suited for processing XML."
  
  is as faulty as is
  
  1. VW Beetles are well suited for driving on all roads.
  2. Formula 1 race tracks are roads.
  3. Therefore, VW Beetles are well suited for competing on Formula 1 race tracks.
  
  Yes you can process XML as text (just as you can use a VW Beetle in a Formula 1 race), but if you treat XML as XML (by using specialized Perl libs), you are going to have a lot more fun (as they do with their fast and specialized cars).
10. Re:You lost me on the incredible leap of logic... by Directrix1 · 2003-01-30 14:20 · Score: 1
  
  Really I think the logical fallacy here is the fact that people think that PERL is good at processing text files. PERL is good at parsing text files which can be expressed by a regular language. That is about the scope of the usefulness of PERL's pattern matching.
  
  --
  Occam's razor is the blind faith in the natural selection of least resistance and in universal oversimplification. -- EF
11. Re:You lost me on the incredible leap of logic... by C+Joe+V · 2003-01-31 06:03 · Score: 1
  
  I agree completely. Not having any particular expertise in this area, I won't argue whether regular languages occur often enough in practice to justify blanket statements about perl. But my point, which I never quite said explicitly, was that XML is not regular (in the sense of regular expressions) and therefore perl's advantage for the task is limited if it exists at all.
  CJV
12. Re:You lost me on the incredible leap of logic... by IpalindromeI · 2003-01-31 18:27 · Score: 1
  
  The logic isn't faulty. That is a completely separate matter from the claims of the premises. You may not agree that XML is merely a text format, but the logic is sound, nevertheless. Unlike yours, where your premises don't support your conclusion. Your first premise claims VW Beetles are well suited for driving, which you then try to use as support for a conclusion about high-performance racing. Most driving would hardly be considered high-performance racing. Your first premise should have been that "VW Beetles are well suited for competing on all roads." This would make your logic sound, but I think you'd have a much harder time convincing people of that than you would convincing them that XML is a text format.
  
  --
  Promoting critical thinking since 1994.
13. Re:You lost me on the incredible leap of logic... by Requiem · 2003-02-01 18:27 · Score: 1
  
  The question is not "Is Perl good for processing text?" The question is "does the conclusion follow validly from the premises?" Given the premises, yes, it does. It says nothing of the real-world truth of the premises, only that we're taking them to be true in this particular argument.
XML is NOT just text! by Anonymous Coward · 2003-01-30 04:51 · Score: 5, Insightful

The whole point of XML is that it is NOT just a string of text. That's why Perl isn't particularly any better than Java or C++ or VB or whatever for processing XML - you're going to be using a library that gives you SAX or DOM access to your XML, and you'll never need to know that there's a text representation being serialized onto some wires somewhere.
1. Re:XML is NOT just text! by DaRobin · 2003-01-30 05:28 · Score: 3, Interesting
  
  True and then not so. Perl's flexible data structures and OO make it a simpler approach than languages that think XML == Object Serialisation. It is also very likely that a lot of what you're going to see flying by in SAX or hanging around in DOM will be text. Sometimes lots of it, sometimes text that has non-XML structure and requires microparsing.
  
  But anyway, what really puts Perl ahead of the pack (together with Python, the only viable competitor I've tried -- Java is really lagging these days) is its large wealth of SAX (and to a lesser degree, DOM) tools. All sorts of very useful filters can be grabbed, complex pipeline management is a given, the SAX writing framework is cool, there are SAX parsers for many non-XML formats, etc.
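
To illustrate what a SAX filter pipeline looks like, here is a minimal sketch in Python's standard library (xml.sax.saxutils.XMLFilterBase), used as a stand-in for the Perl SAX tools mentioned above. The DropElement filter, the strip_element helper and the sample element names are hypothetical, made up for this example.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLFilterBase, XMLGenerator


class DropElement(XMLFilterBase):
    """SAX filter that removes one element (and its content) from the event stream."""

    def __init__(self, parent, name):
        super().__init__(parent)
        self._name = name
        self._depth = 0  # greater than 0 while inside the element being dropped

    def startElement(self, name, attrs):
        if self._depth or name == self._name:
            self._depth += 1
        else:
            super().startElement(name, attrs)

    def endElement(self, name):
        if self._depth:
            self._depth -= 1
        else:
            super().endElement(name)

    def characters(self, content):
        if not self._depth:
            super().characters(content)


def strip_element(xml_text, name):
    # Pipeline: parser -> DropElement filter -> XMLGenerator (serialiser).
    out = io.StringIO()
    pipeline = DropElement(xml.sax.make_parser(), name)
    pipeline.setContentHandler(XMLGenerator(out, encoding="utf-8"))
    pipeline.parse(io.BytesIO(xml_text.encode("utf-8")))
    return out.getvalue()
```

Because each filter is itself a reader for the next stage, further filters can be chained by passing one filter as the parent of another.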
  
  --
  Radioactive cats have 18 half-lives.
2. Re:XML is NOT just text! by consumer · 2003-01-30 05:31 · Score: 3, Insightful
  Let's see...
  
  Editable in emacs (or vi). Check.
  
  Grep-able. Check.
  
  Diff-able. Check.
  
  Understandable to the naked eye. Check.
  
  Sure smells like text to me.
3. Re:XML is NOT just text! by Anonymous Coward · 2003-01-30 05:39 · Score: 3, Insightful
  
  What you're looking at there is one possible representation of an XML document. What you can see is NOT XML. XML is an idea - a hierarchical data structure. If you're manipulating some XML programmatically, you should be manipulating this hierarchical data structure, and you'll be using some sort of API (SAX or DOM, probably) to do so. You should emphatically NOT be manipulating text strings. Any code of the form
  tag = tag + "</" + tagname + ">"
  means you're doing it wrong.
  
  So, no, XML is not editable in emacs (or vi), grep-able, diff-able or understandable to the naked eye. Go and think about it again.
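
A minimal sketch of the contrast drawn above, written in Python with xml.etree.ElementTree as one possible tree API (neither SAX nor DOM, but the same idea). All function names and sample values are hypothetical.

```python
import xml.etree.ElementTree as ET


def element_by_concatenation(tagname, value):
    # The string-splicing style: the data is pasted in unescaped.
    return "<" + tagname + ">" + value + "</" + tagname + ">"


def add_item(xml_text, tagname, value):
    # Work on the parsed tree and let the library serialise it again.
    root = ET.fromstring(xml_text)
    child = ET.SubElement(root, tagname)
    child.text = value
    return ET.tostring(root, encoding="unicode")


def is_well_formed(xml_text):
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False
```

With the value "b < c", the concatenated string is not well-formed, whereas add_item produces `<item>b &lt; c</item>` inside the existing list.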
4. Re:XML is NOT just text! by EvlG · 2003-01-30 06:10 · Score: 2, Insightful
  
  I think it is interesting to note that this is precisely the reason that XML is poorly suited for any task that requires human intervention.
5. Re:XML is NOT just text! by Anonymous Coward · 2003-01-30 06:34 · Score: 0
  
  according to this paper on soap, your argument doesn't hold water. Perl xml parsing is actually equal but not better than Java or .NET.
6. Re:XML is NOT just text! by nosferatu-man · 2003-01-30 06:50 · Score: 1
  
  ... which of course kicks the chair out from under one of the primary arguments of the XML snakeoil salesmen, that XML is "human-readable", to say nothing of "human-editable".
  
  'jfb
  
  --
  To spur "enterprise Linux," Big Bang, the distributed two-phase commit.
7. Re:XML is NOT just text! by ClosedSource · 2003-01-30 07:50 · Score: 1
  
  I'm not sure that "human-readable" is a primary argument for XML, but I don't think it matters much. ASCII codes aren't "human-readable" either; text editors convert them to characters we can read.
  
  You can't efficiently use a text editor to edit pictures, sounds or movies but this doesn't limit our ability to edit them using more appropriate tools.
  
  If I were going to edit or process XML, I would use the best tool for the job and if that's not a text editor, so what?
8. Re:XML is NOT just text! by cygnus · 2003-01-30 09:02 · Score: 2, Insightful
  
  you're doing it wrong.
  ...
  So, no, XML is not editable in emacs (or vi), grep-able, diff-able or understandable to the naked eye. Go and think about it again.
  yes it is.. just because you claim that "you're doing it wrong," doesn't mean it's impossible.
  xml is text just as much as html is.. are you going to tell me that html isn't editable in emacs or human-readable? how is html different from DocBook, for example?
  
  --
  Just raise the taxes on crack.
9. Re:XML is NOT just text! by orcrist · 2003-01-30 09:41 · Score: 2, Insightful
  
  The whole point of XML is that it is NOT just a string of text. That's why Perl isn't particularly any better than Java or C++ or VB or whatever for processing XML - you're going to be using a library that gives you SAX or DOM access to your XML, and you'll never need to know that there's a text representation being serialized onto some wires somewhere.
  
  I'll respond to you though many others are making similar arguments. First of all, when you say "XML is NOT just text!" do you mean "XML is NOT merely text" or "XML is not solely text"? I'll agree with the first, but the second is generally not true.
  
  What no one seems to be mentioning is what you get out of those libraries: you get the entire structure in nodes thanks to the library's parser, but what are the contents of those nodes? Text! You might argue that the element names and most of the attributes are either defined by the DTD/schema, etc., but at least CDATA will often be arbitrary text. And, at least in my experience (mostly web-based applications), there will often be a need to process some of that text, e.g. extract links which are embedded in the text, convert newlines to <br>s, and many other things. And then, isn't it handy when the language reading the contents of those nodes has strong text-handling abilities?
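
To illustrate the link-extraction case, here is a small Python sketch: the parser handles the structure, and a regular expression handles the free text inside the nodes. The URL pattern is deliberately simplistic and the function name is an assumption for the example.

```python
import re
import xml.etree.ElementTree as ET

# Deliberately simple URL pattern for illustration; real link extraction needs more care.
URL_PATTERN = re.compile(r"https?://[^\s<>\"]+")


def links_in_text_nodes(xml_text):
    root = ET.fromstring(xml_text)
    links = []
    # itertext() yields the character data of every node in document order.
    for chunk in root.itertext():
        links.extend(URL_PATTERN.findall(chunk))
    return links
```

The markup is never touched by the regular expression here; it only ever sees text that the parser has already extracted.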
  
  Just a thought.
  
  -chris
  
  --
  San Francisco values: compassion, tolerance, respect, intelligence
10. Re:XML is NOT just text! by Golias · 2003-01-30 10:14 · Score: 1
  
  The whole point of XML is that it is NOT just a string of text.
  Actually, the whole point of XML is that it is just a string of text.
  If XML parsers used a file format that wasn't human-readable text, there would be little point in using it, and we would all just stick with object-model databases.
  
  --
  Information wants to be anthropomorphized.
11. Re:XML is NOT just text! by Ed+Avis · 2003-01-30 10:25 · Score: 1
  
  XML is text. XML is not just text.
  
  The point is that the document conforms to a certain structure: either rigidly (as when validating against a DTD or similar schema definition), loosely (as with well-formed XML, where elements must be closed correctly, but you can mix any elements and attributes you want), or something in between.
  
  It's not obvious at all that Perl is a natural mix for processing XML. The things which Perl does so well - line-by-line file processing, string operations, regular expressions - are not very useful on XML. (For example you cannot match a balanced tree structure with a regular expression, so you can't use the standard string processing to do something so simple as extract an element and its contents.) Indeed they may lead you in a false direction at first. For quick throwaway tools, where the file is already pretty-printed in a certain way, Perl string operations may do the trick; for building applications that need to handle XML they are inadequate.
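
A small sketch of the nesting problem described above, in Python. The sample document and function names are invented; the regular expression is the obvious non-greedy attempt, not a claim about what any particular Perl program does.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical input with an element nested inside an element of the same name.
NESTED = "<doc><div>outer <div>inner</div> tail</div></doc>"


def first_div_by_regex(xml_text):
    # The non-greedy match stops at the FIRST closing tag, which belongs to the
    # inner div. A greedy match would instead overshoot when divs are siblings.
    match = re.search(r"<div>.*?</div>", xml_text, re.DOTALL)
    return match.group(0) if match else None


def first_div_by_parser(xml_text):
    # The parser tracks nesting, so the element comes back with balanced tags.
    root = ET.fromstring(xml_text)
    return ET.tostring(root.find("div"), encoding="unicode")
```

The regex version returns `<div>outer <div>inner</div>`, which is unbalanced, while the parser version returns the whole outer element.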
  
  To read and write XML you will need libraries, and that is the case in any language. Perl has a good selection including the standard-API-but-very-slow XML::DOM, the nonstandard-API-but-useful XML::Twig, and the I-used-to-use-it-but-IMHO-it-is-best-avoided XML::Simple. But using these libraries isn't particularly easier from Perl than from any other language.
  
  The ideal XML processing language would have a type system which could check at compile time whether the output you are generating will be valid for the DTD you have chosen; and it would also map the XML's DTD or schema onto the language's type system at input. For example, no need to get the list of child elements and get the first element from it, if the DTD specifies that there must be exactly one child.
  
  --
  -- Ed Avis ed@membled.com
12. Re:XML is NOT just text! by nosferatu-man · 2003-01-30 10:35 · Score: 1
  
  > I'm not sure that "human-readable" is a primary
  > argument for XML ...
  
  Sure it is. It's the entire justification for having a text-based protocol -- otherwise, why waste the cycles?
  
  'jfb
  
  --
  To spur "enterprise Linux," Big Bang, the distributed two-phase commit.
13. Re:XML is NOT just text! by Anonymous Coward · 2003-01-30 12:13 · Score: 0
  
  I could not agree more. To a computer, there is nothing about xml that makes it better than an edi document, a cobol file, or any other file format. In fact, compared to the actual data, there is so much IO reading an xml file, that it is harder for a computer to read than most formats.
  
  Folks, we have seen this before: Java did not take over the world; not all code is object oriented; intranets did not solve all (or any for that matter) company information problems; pascal. . . ada. . . cobol. . . fortran. . . intelligent systems. . . paperless office. . . and xml will not be the only data exchange format.
  
  Without going into detail I will say that the following things really did change things: email, file sharing, erp, edi, and fax. They all have one thing in common; they save time on the front end. How does xml save time on the front end?
14. Re:XML is NOT just text! by fishbowl · 2003-01-30 12:56 · Score: 1
  
  >Sure it is. It's the entire justification for
  >having a text-based protocol -- otherwise, why
  >waste the cycles?
  
  You don't use the text-based *representation* unless you are marshalling or unmarshalling the data. When you work with bound XML objects, you are using the document model as a container for methods to process the data, but not necessarily as a means to present the data in a text format.
  
  You can use XML to represent data which is stored in a RDBMS. Naturally you can see that just because your query is presented as Document Nodes, and/or translated to a document marked up according to some DTD, that the document or the object in memory is not the same thing as the data in the database.
  
  XML in a text file is not "the data", unless that's where your application needs it. There are plenty of applications for XML where the data never sees ascii at all.
  
  Read up on JAXB.
  
  --
  -fb Everything not expressly forbidden is now mandatory.
15. Re:XML is NOT just text! by grantm · 2003-01-30 13:04 · Score: 2, Insightful
  What you're looking at there is one possible representation of an XML document.
  
  I couldn't agree less. In fact, XML is one possible representation of the abstract hierarchical data structure you described. Furthermore, XML is in fact a text representation. There are many other ways you could represent that data structure (e.g. a custom binary format, records in a relational or hierarchical database, an object serialised to a binary stream, etc.) but none of them are XML.
  
  The W3C themselves say that "XML is text" and then go on to point out that advantages of being a text format include:
  
  you can look at data without needing the program that produced it
  
  you can read it with your favourite text editor
  
  it's easier for developers to debug
  
  They also say: "Like HTML, XML files are text files that people shouldn't have to read, but may when the need arises".
  
  In parallel with the development of XML, our notion of the definition of 'text' has also moved forward. Through the adoption of standards like Unicode and bridging facilities like encoding declarations, we have moved past 7-bit ASCII as being the one true text.
  
  To claim that an XML file is not "editable in emacs (or vi), grep-able, diff-able or understandable to the naked eye" is demonstrably untrue. You'll obviously need a text editor that understands whichever encoding the file uses (both emacs and vim fit that bill) but a text editor is a perfectly serviceable tool for viewing and editing XML (obviously not the best tool in many cases, but acceptable nonetheless).
16. Re:XML is NOT just text! by bad-badtz-maru · 2003-01-30 18:32 · Score: 1
  
  What is the meaning of your sig? I am curious, because I feel that the lack of two-phase commit is one of the major issues preventing the open-source RDBMS servers from competing fully with Oracle. I was curious if your sig was related to this issue.
  
  maru
17. Re:XML is NOT just text! by DaRobin · 2003-01-30 21:58 · Score: 1
  
  I know that paper, so? I never said it was faster (especially as SOAP::Lite does not use the fastest parsers). I was talking about the flexibility and the wealth of tools. That's easily verifiable, and given that I use both Perl and Java for XML work all day, I simply know I'm right.
  
  --
  Radioactive cats have 18 half-lives.
18. Re:XML is NOT just text! by nosferatu-man · 2003-01-31 07:10 · Score: 1
  
  Heh. It's actually randomly generated text from a disassociating web crawler I wrote once. I used to run all kinds of dot.com press releases through it, just for kicks.
  
  'jfb
  
  --
  To spur "enterprise Linux," Big Bang, the distributed two-phase commit.
Natural? by CaseyB · 2003-01-30 04:55 · Score: 1, Redundant

As XML is just another text format, it follows that Perl will be just as good at processing XML documents.
Not really. If you're using XML as "just another text format", then you're making a fundamental mistake. Within your software, you should always be treating XML as a hierarchical data structure, not as a text stream. Apart from manipulating CDATA or attribute value text, Perl has no particular strength with XML.
1. Re:Natural? by mortonda · 2003-01-30 05:05 · Score: 3, Informative
  
  Not really. If you're using XML as "just another text format", then you're making a fundamental mistake. Within your software, you should always be treating XML as a hierarchical data structure, not as a text stream. Apart from manipulating CDATA or attribute value text, Perl has no particular strength with XML.
  
  Indeed, the perl only XML libraries are quite slow. I believe most of the quality perl XML handling is done by modules that use C libraries to do the grunt work. However, if the data in the XML itself is text data, then of course, perl and XML are a good match. Add SOAP and mod_perl into the mix, and you got some very nifty tools.
Petal by Chris+Croome · 2003-01-30 05:00 · Score: 3, Informative

One new, and cool, Perl XML module that people might not know about is Petal (PErl Template Attribute Language).

It is an implementation of the Zope TAL (Template Attribute Language) specification and it basically allows you to create XML templates where all the templating commands are just attributes of existing tags.
This allows things like XHTML templates which are very WYSIWYG friendly since the editors don't do anything with attributes that they don't know about.

--
Check out MKDoc a mod_perl CMS
This was a review? by Syris · 2003-01-30 05:03 · Score: 4, Insightful

I'm sorry, but this just wasn't a terribly deep review and well below par for /. Listing contents of a book and then nitpicking a detail don't a book review make.

How effective were the examples? How easy to read and understand were the general concepts? Were the descriptions of libraries and API's clear? Was the writing generally readable?

Would this book even make a good reference?

Jeez, anyone want to follow up the post with a real review?
XML frees us from Perl by Euphonious+Coward · 2003-01-30 05:07 · Score: 4, Interesting

The whole point of XML is to free us from having to do the kinds of things Perl is meant for. Absent free-form text munging, Perl really has no advantage over other languages. At the same time, it has real deficits for people who need to know they have solved a problem correctly and completely.
(For reference, see this rant by the brilliant net.kook Erik Naggum. The most quotable bit, for the lazy among you, is
...[Perl] rewards idiotic behavior in a way that no other language or tool has ever done, and on top of it, it punishes conscientiousness and quality craftsmanship -- put simply: you can commit any dirty hack in a few minutes in perl, but you can't write an elegant, maintainable program that becomes an asset to both you and your employer; you can make something work, but you can't really figure out its complete set of failure modes and conditions of failure. (how do you tell when a regexp has a false positive match?)
)
1. Re:XML frees us from Perl by Slugbait · 2003-01-30 05:22 · Score: 1
  
  The whole point of XML is to free us from having to do the kinds of things Perl is meant for. Absent free-form text munging, Perl really has no advantage over other languages. At the same time, it has real deficits for people who need to know they have solved a problem correctly and completely.
  
  I essentially agree with you but one still has the problem of merging a non-xml document into xml form. Here perl can be fairly useful.
2. Re:XML frees us from Perl by Euphonious+Coward · 2003-01-30 05:33 · Score: 1
  
  Slugbait writes: "...one still has the problem of merging a non-xml document into xml form."
  That's true, but the Perl XML-handling modules are not much help for that.
3. Re:XML frees us from Perl by DaRobin · 2003-01-30 05:40 · Score: 2, Informative
  
  Not much help? If you start counting the number of Perl modules that expose a SAX interface to non-XML data (not to mention the host of other super-useful SAX tools) you'll probably find only one equal, Python.
  
  And if you think that XML has freed us from additional text processing, you obviously haven't used XML much, or at least without much variety. Most people seem constantly bent on including microlanguages in attribute values or text content. Those need good text processing.
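A small sketch of what a "microlanguage in an attribute value" looks like in practice. The element and its CSS-like `style` attribute are hypothetical; the point is only that the XML parser stops at the attribute boundary and the rest is ordinary text processing.

```python
import xml.etree.ElementTree as ET

# Hypothetical element carrying a CSS-like microlanguage in an attribute.
elem = ET.fromstring('<box style="width: 10px; height: 20px"/>')

# The XML parser hands the attribute back as one opaque string...
style = elem.get("style")

# ...so extracting the key/value pairs is plain string munging.
pairs = {}
for declaration in style.split(";"):
    name, _, value = declaration.partition(":")
    if name.strip():
        pairs[name.strip()] = value.strip()

print(pairs)
```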
  
  --
  Radioactive cats have 18 half-lives.
4. Re:XML frees us from Perl by glwtta · 2003-01-30 05:45 · Score: 4, Insightful
  
  how do you tell when a regexp has a false positive match?
  A what? You (or rather the brilliant person being quoted) either mean that it matches a string that the expression isn't supposed to, which would be a serious bug in the language (and I am not aware of any such bugs); or you mean that it matches correctly, but matches things you didn't expect it to, in which case you tell, by (gasp!) testing your code. In any case, how do you tell a "false positive" regexp match in Java?
  but you can't write an elegant, maintainable program that becomes an asset to both you and your employer
  Perhaps you can't. I have, and I do.
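"Testing your code" for false positives can be as simple as keeping a list of strings the pattern must reject. A minimal sketch, with made-up date patterns and test strings:

```python
import re

# A deliberately naive pattern meant to find dates such as 2003-01-30.
NAIVE = re.compile(r"\d{4}-\d{2}-\d{2}")
# A stricter variant with month and day ranges, used with fullmatch().
STRICT = re.compile(r"\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])")

should_match = ["2003-01-30"]
should_not_match = ["2003-13-45", "x2003-01-30y"]

def false_positives(matcher, rejects):
    """Return the inputs the matcher accepts although it should not."""
    return [s for s in rejects if matcher(s)]

print(false_positives(NAIVE.search, should_not_match))
print(false_positives(STRICT.fullmatch, should_not_match))
```

The same harness works in any language with a regex engine; the reject list is the part that takes thought.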
  
  --
  sic transit gloria mundi
5. Re:XML frees us from Perl by Anonymous Coward · 2003-01-30 05:52 · Score: 1, Interesting
  
  Okay, I'll bite.
  
  The whole point of XML is to free us from having to do the kinds of things Perl is meant for.
  
  So how does XML do that in, let's say, system administration?
  
  Absent free-form text munging, Perl really has no advantage over other languages.
  
  So ehmm... what type of things is XML made out of? Elements' names, contents, etc, it's all text.
  
  You can commit any dirty hack in a few minutes in perl, but you can't write an elegant, maintainable program that becomes an asset to both you and your employer
  
  You can write a dirty hack in any language. And about the last part: what about CPAN?
  
  (How do you tell when a regexp has a false positive match?)
  
  That would be by understanding the regex, just as any other chunk of code. (Funny, that... When you want to say something bad about Perl, moan about its horrible, illegible, etc regexes. When you want to mention something positive about another language -- especially when comparing to Perl -- mention support for powerful, fast, etc regexes. And advertised as "Perl-compatible" at that.)
  
  -- Arien
6. Re:XML frees us from Perl by Anonymous Coward · 2003-01-30 06:30 · Score: 0
  
  Yeah, I'm amazed how many Perl programs don't handle error conditions well. By "don't handle well" I mean "ignore completely". When I first saw this page (it's written in Perl), some of the values were zero when they shouldn't have been. They get the data by scraping altavista, but they don't check for errors when they retrieve the data. Lucky it's just a novelty site and isn't actually showing something important.
  
  Slashdot (written in Perl) randomly gives me some weird "formkey" error when I try to post -- that's a step up, at least it's recognizing that an error occurred -- but it's caught too late, and the software tries to blame the error on me. It says I had pressed the back button (I hadn't) or I have a firewall (I have, but I don't see what that's got to do with random errors on Slashdot). Clearly an error had occurred earlier, but they didn't catch it at its origin.
  
  Also on Slashdot, when the site is under heavy load, the front page sometimes shows ads -- the same ad repeated -- between each story. I don't think that's meant to happen.
  
  Then there's the famous story of the two high-school kids who were suspected of taking a shotgun to school because of a subtle Perl error.
  
  That's just some anecdotal evidence, but it's representative of my personal experience with software written in Perl. I don't know if it's the language or the programmers. I suspect it's both. Some more anti-Perl material:
  
  What's wrong with Perl by Lars Marius Garshol
  
  Is Perl Difficult? by Paul Prescod
7. Re:XML frees us from Perl by Euphonious+Coward · 2003-01-30 07:38 · Score: 1
  
  Arien asked, "how does XML [free us from doing the kinds of things Perl is meant for] in, let's say, system administration"
  When config files are in XML, they can be munged programmatically without regexp hackery.
  He goes on, "... what type of things is XML made out of? Elements' names, contents, etc, it's all text."
  It's not free-form text, it's structured text. Somebody else pointed out, though, that there is a distressingly large amount of free-form text to be parsed in attribute strings, body text, and (!) comments, that XML structure extraction tools don't help with.
  (I won't answer criticism of Naggum's rant; he's not known as a net.kook for nothing. Take it up with him.)
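A sketch of "munging a config file programmatically without regexp hackery". The config layout, element names and port numbers are all invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical config contents; every name here is made up.
CONFIG = (
    '<config>'
    '<server name="web1" port="80"/>'
    '<server name="web2" port="8080"/>'
    '</config>'
)

root = ET.fromstring(CONFIG)

# Change one setting by addressing the node, not by pattern-matching text.
for server in root.iter("server"):
    if server.get("name") == "web2":
        server.set("port", "8443")

updated = ET.tostring(root, encoding="unicode")
print(updated)
```

The edit survives reformatting of the file (attribute order, quoting style, line breaks), which a regexp written against the raw text would have to anticipate.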
8. Re:XML frees us from Perl by scrytch · 2003-01-30 07:57 · Score: 2, Insightful
  
  Maybe the author was unable to write anything but hacks, and couldn't make anything elegant or maintainable. I've written programs with multiple subsystems, and put them well into maintenance without a lick of trouble, all in perl.
  
  Yes, $dd->updsp( 1,3, @ad ) looks worse than $Driver->update_displays( $Display:LOBBY, $Display:CUSTSERV, @additional ), and boy it's just a shame that perl doesn't let me use meaningful identifiers or document API's or forward declare functions for arg checking ahead of time. Oh wait... Really. The argument is dead, continuing to raise it is just trolling.
  
  I switched to python because I got tired of leaning on my shift key. Tcl has probably the prettiest syntax for me, but as a language it's braindead beyond belief (not to mention slow)
  
  --
  I've finally had it: until slashdot gets article moderation, I am not coming back.
9. Re:XML frees us from Perl by Internet+Dog · 2003-01-30 08:29 · Score: 1
  
  Absent free-form text munging, Perl really has no advantage over other languages. At the same time, it has real deficits for people who need to know they have solved a problem correctly and completely.
  Absolutely. Once you get beyond text parsing by standardizing the syntax, the goal of a program is to manipulate objects. XML maps very well into object trees and that is why it is commonly processed using Java and Python. If you want the powerful capabilities of a dynamically typed language, with a simple, easy to learn grammar, then you should use Python for processing XML, not Perl. (Perl's object syntax is as obtuse as the rest of the language and offers no advantages over the elegant object model of Python. In fact, Larry Wall borrowed much of the Perl object design from Python. Use the genuine original, not the imitation.) The standard Python library includes a fine package for navigating through XML data and zero text processing code needs to be written to do this. It's objects all the way down.
  There is a good article that explains how to use Python generators to process XML content. This is something you will never be able to do as easily in either Java or Perl.
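The article itself is not reproduced here, so the following is only a guess at the general shape of the idea: a generator that yields values as the parser reaches them, built on the standard library's `iterparse`. The `titles` helper and the sample data are assumptions for the example.

```python
import io
import xml.etree.ElementTree as ET

def titles(source):
    """Yield the text of each <title> as soon as its end tag is parsed."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "title":
            yield elem.text
        elem.clear()  # release content we no longer need

# Hypothetical input; a real caller could pass an open file instead.
SAMPLE = b"<books><book><title>XML and Perl</title></book><book><title>Perl and XML</title></book></books>"

for title in titles(io.BytesIO(SAMPLE)):
    print(title)
```

The consumer just iterates; the parsing is pulled along lazily, one element at a time.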
10. Re:XML frees us from Perl by Mark_Uplanguage · 2003-01-30 09:40 · Score: 1
  
  How do you tell when anything has a false positive match...TESTING
  
  --
  "The difference between stupidity and genius is that genius has its limits." -- Albert Einstein
11. Re:XML frees us from Perl by danpbrowning · 2003-01-30 18:40 · Score: 1
  
  Agreed. Apparently this Erik Naggum fellow isn't very well informed. Perl can be as beautiful as any language, or uglier than them all -- it's up to the programmer.
  
  --
  Daniel
Re:Mod this down, please. by Alcohol+Fueled · 2003-01-30 05:07 · Score: 0

"...and stories could be filed once! in a filing cabinet."

Knowing Slashdot, someone would have made a copy of the newsletter and filed that too...

--
Ah am not a crook! (\(-__-)/)
Let's reinvent the wheel again by Chocolate+Teapot · 2003-01-30 05:11 · Score: 1

Although I agree that Perl/XML sounds like a powerful and flexible way to serve dynamic content, I can't help thinking that it is ultimately better to adapt existing frameworks (Slashcode, PHP-Nuke & friends etc..). Maybe a friendly group of Perl/XML gods will read the book and produce a framework/toolkit that the rest of us mere mortals can use. I suspect that I will buy this book anyway, read it, and after frying my brain for a few days I will stuff it on my bookshelf and walk away with a huge inferiority complex. My bookshelf makes me look like a guru, but secretly, my encyclopaedic knowledge comes from here.

--
Modest doubt is called the beacon of the wise. - William Shakespeare
1. Re:Let's reinvent the wheel again by jslag · 2003-01-30 05:35 · Score: 1
  
  Maybe a friendly group of Perl/XML gods will read the book and produce a framework/toolkit that the rest of us mere mortals can use.
  
  That happened years ago: the Apache XML project's AxKit.
2. Re:Let's reinvent the wheel again by davorg · 2003-01-30 07:13 · Score: 1
  
  Although I agree that Perl/XML sounds like a powerful and flexible way to serve dynamic content,
  
  You need to move on from thinking that everything is there purely to be used for the web. Well over half of the work I've done with XML and Perl has nothing to do with the web.
3. Re:Let's reinvent the wheel again by Chocolate+Teapot · 2003-01-30 07:17 · Score: 1
  
  Considering that 'web' appears five times in the story, I don't think I jumped to any conclusions.
  
  --
  Modest doubt is called the beacon of the wise. - William Shakespeare
4. Re:Let's reinvent the wheel again by Anonymous Coward · 2003-01-30 11:34 · Score: 0
  
  this might be it!
  
  http://www.axkit.org/
5. Re:Let's reinvent the wheel again by hondo77 · 2003-01-30 11:49 · Score: 1
  
  Although I agree that Perl/XML sounds like a powerful and flexible way to serve dynamic content...
  
  I am using Perl, XML, and Apache but it has nothing to do with serving up content for the web (or humans, for that matter). Don't think of XML as a better HTML for web pages. It's a lot more.
  
  --
  I live ze unknown. I love ze unknown. I am ze unknown.
i hate perl... by cygnus · 2003-01-30 05:12 · Score: 1

and i know there are going to be a lot of posts saying "XML obviates Perl!"...

but i disagree. Perl absolutely RIPS through this stuff, unlike the Java stuff i've written. sometimes, there's nothing like some good, old-fashioned procedural code to munge one document into another.

the only problem i had was with UTF-8 stuff. perl really wasn't quite there until perl 5.8, and i'm having trouble finding installs of it on the machines i need to use it on at the university i work for.

--
Just raise the taxes on crack.
1. Re:i hate perl... by dubbayu_d_40 · 2003-01-30 06:45 · Score: 0, Flamebait
  
  Perl does not rip through text files. Programmers can rip through perl code, but perl is SLOW. I once rewrote a Perl parser in Java and went from 9hrs to 45mins.
2. Re:i hate perl... by etcshadow · 2003-01-30 07:51 · Score: 2, Insightful
  
  "I once rewrote a Perl parser in Java and went from 9hrs to 45mins"
  
  Well, shit. I once rewrote a Perl parser in *Perl* and went from 9hrs to 45mins. What the hell kind of flame-bait shit is this!?
  
  It is true that extremely well-written C code can outperform perl code at anything. It is also true that for things that perl is made for (like ripping through tons of text-data), a typical Perl program will *most likely* do it better than a typical C program, simply because it is making use of more optimized underlying algorithms (even though the actual execution structure is slightly more bloated than C... double-dereferencing pointers, compile-time immediately before run-time, etc). ... However, Java is just as goddamn interpreted as Perl, if not more so! Perl compiles to *native* byte-code prior to execution, unless you are talking about eval'd strings, whereas Java sits in non-native byte-code that has to be interpreted real-time by the VM. Best case: you have a good just-in-time compiler that pulls Java up to even with Perl (that is, compiled immediately prior to run-time into native byte-code).
  
  Also, Java has all the same disadvantages with respect to C... that is more insulation from the *actual* memory (no such thing as a real pointer in either, garbage-collection, etc).
  
  Anyway, bottom-line is this. If what you say is at all true, then you had a shittily-written Perl program. I promise you that I can write just as shitty a program in Java... does that mean that we should trash Java?!?!? Abso-f*cking-lutely not! I'll do you one better, too: I'll write just as shitty and slow of a parser in Java that doesn't even *look* that bad to someone who doesn't understand the subtleties behind such simple abstractions as strings, lists and arrays.
  
  I'm very serious with what I said originally. I have, in fact, taken a Perl parser (a super-light-weight XML parser, actually) and reduced the parse time by several orders of magnitude. The idiot who wrote it originally (myself) went walking through the string or stream looking for '<'s (with a regexp), at the highest level. It is *terribly* slow to strip leading characters off of a long string in Perl (I'm pretty sure that it copies the whole goddamn string, minus those 10 (or however many) characters on the front). I made a *very* simple change, namely this:
  
  # split on positive lookahead assertion of a '<'
  # then we just deal individually with blocks of text that all start
  # with a '<'... should save time
  my @xml = split(/(?=<)/,' '.$xml);
  shift @xml;
  
  And, you'll note that I f*cking commented it (something which people just don't seem to understand when they trash perl). Bang! Many orders of magnitude in speed improvement. Simple.
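For readers who want to try the trick, here is a rough Python equivalent. It assumes the pattern is a zero-width lookahead on the less-than character, and it needs Python 3.7 or later, where `re.split` honours empty matches. Like the original, it is a quick hack that will mis-split on a less-than sign inside a comment or CDATA section.

```python
import re

# Hypothetical input string.
xml = "<a><b>text</b></a>"

# Split *before* every '<' without consuming it, so each chunk starts
# with a tag. The split leaves an empty first chunk, which is dropped
# (the Perl version pads the string and shifts the first chunk off).
chunks = [c for c in re.split(r"(?=<)", xml) if c]

print(chunks)
```

Each chunk can then be handled on its own, without repeatedly trimming the front of one long string.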
  
  Anyway, pull your head out of your ass.
  
  --
  :Wq
  Not an editor command: Wq
3. Re:i hate perl... by dubbayu_d_40 · 2003-01-30 11:48 · Score: 1
  
  You are right of course. Perl is as fast as Java and parsing XML is most assuredly the domain of a scripting language.
  ha ha ha
The Right Tool for the Right Job by nathanz · 2003-01-30 05:23 · Score: 2, Funny

I think one of the main reasons Perl and XML aren't generally used together is because Perl isn't object oriented in the same way that Java and C# are. I know that OO concepts have been bolted on to Perl in the same way that OO was bolted on to C++ and in my opinion with similar results (i.e., kludge-fest). It's very natural in Java to parse an XML doc and get an object, while it's more natural to parse a log file or CSV file with Perl.
1. Re:The Right Tool for the Right Job by DaRobin · 2003-01-30 05:44 · Score: 1
  
  That's because you see XML more or less as an object serialisation syntax when it has been proven over and over again that there's serious impedance mismatch between those two views (at least, with Java's rather limited view of OO). See XML Schema if you don't think so.
  
  Don't forget that the Desperate Perl Hacker was in the requirements for XML. And they succeeded pretty well in making XML match Perl.
  
  --
  Radioactive cats have 18 half-lives.
2. Re:The Right Tool for the Right Job by Nohea · 2003-01-30 17:02 · Score: 1
  
  Actually, i use the SAX interface with Perl. This is writing an OO event handler. Same way Java folks do.
  
  http://sax.perl.org/
  
  It's just as natural to do OO w/Perl as it is doing text parsing. Perl just doesn't force you to do it one way.
Re:Who cares about Perl? by GombuMstr · 2003-01-30 05:26 · Score: 1

Uniquely enough our data processing that has nothing to do with the web is heavily constructed with perl. We love the flexibility of it. It doesn't take too long for a new person to figure out how our daily processing works.

In fact I have been looking into perl-xml for processing of scalc spreadsheets that our stores send to us every day. It has been a valuable tool and we would be up a creek with Windows tools trying to do the exact same thing.

--Travis
Perl is a reflection of your soul by Nexus7 · 2003-01-30 05:27 · Score: 4, Interesting

Well, perhaps not your soul, but your Perl code just reflects the way you think to a greater extent than other languages. This isn't something that's done underhandedly, it is well advertised in every posting in c.l.perl and the Camel book, and every other book about Perl. Which is that Perl is not at all orthogonal: TMTOWTDI (there's more than one way to do it). If you want to be rigorous and declare everything and not have your typos become references automatically, you "use strict" and your magic line is "#!/usr/bin/perl -w". If not, well Perl allows you to do that too. If you want objects, you can do that, if not, not.

It is possible to write quality code in Perl. Just because the language allows you to not do so isn't its fault. It doesn't stop you from doing it, because that'd stop you from doing brilliant things.

To address some specific things you mentioned, you can do full-fledged exception handling in Perl if you want to (with eval and specific modules), or, you know, not. And I'm not familiar with the false positive matches in regexps (perhaps you're referring to some famous problem). But if a regexp doesn't do what you want it to, isn't it wrong? Between // and tr and split I get along just fine.
If you REALLY want to buy the book by Cy+Guy · 2003-01-30 05:30 · Score: 1, Offtopic

Then maybe you should get it from Amazon, where it is $12 cheaper.

Please Rob, explain to us how whatever deal you have with bn.com is worth your user base overpaying by so much? Users can buy the book through the link above, and I will put a third of my affiliate commission (about $1.40 per copy) towards Perl development projects. This way everybody wins. Using your link, I assume you win, and that bn wins, but your loyal user base is out an additional $12 and I can't imagine your deal with bn.com nets you that much for providing the link.

--
Work for Change & GET PAID!
1. Re:If you REALLY want to buy the book by Anonymous Coward · 2003-01-30 05:52 · Score: 0
  
  They use BN because Jeff Bezos is a clown who abuses the patent system. Why support that shit?
2. Re:If you REALLY want to buy the book by graxrmelg · 2003-01-30 05:52 · Score: 3, Informative
  
  But if people are interested in getting a good price rather than putting a commission into your pocket (and contributing to a company that abuses software patents), maybe they should order it from Bookpool instead, for $3 less than Amazon. (I don't have any affiliation with Bookpool.)
3. Re:If you REALLY want to buy the book by Badger · 2003-01-30 06:30 · Score: 2, Informative
  
  Actually, /. used to link to Amazon, and had an affiliate program. Once Amazon started enforcing their one-click patent, and the Amazon boycott began, /. switched to Fatbrain (which was bought by BN).
4. Re:If you REALLY want to buy the book by Cy+Guy · 2003-01-30 08:36 · Score: 1
  
  But if people are interested in getting a good price rather than putting a commission into your pocket (and contributing to a company that abuses software patents),
  
  They can't really abuse the patent, they can only take advantage of it. If you want to boycott anyone over their one-click patent, boycott the US government that issued them the patent. If you think the patent was issued in error, then provide the prior art to discredit the patent.
  
  ...they should order it from Bookpool [bookpool.com] instead, for $3 less than Amazon.
  
  Except that unless you buy another book to get over BookPool's $40 Free Shipping threshold you will just be paying that $3 to UPS instead of me and the Perl Development Fund. Amazon's Free Shipping threshold of $25 falls conveniently just under their price for the book.
  
  --
  Work for Change & GET PAID!
5. Re:If you REALLY want to buy the book by zonker · 2003-01-30 15:16 · Score: 0
  
  it is funny how people deem the ethics of amazon to be worse than the ethics of barnes and noble putting small local booksellers out of business...
  
  --
  Large print giveth, and the small print taketh away
Re:Formalised features of Perl (in this book?) by Mr.+Droopy+Drawers · 2003-01-30 05:38 · Score: 2, Interesting

As you're also aware, most Comp Sci courses fawn over Pascal, a VERY formalized language. However, it's not mentioned much past education circles (and Apple aficionados).

In practice, reference counting doesn't seem to lead to memory leaks as you describe. And, I would argue it is much more efficient than Java's method.

PERL is an excellent SCRIPTING language. Larry Wall describes it as a "glue" language. XML is a good thing to glue together. It's perfect for that. Every tool has its purpose; push any too far, and you start abusing it.

Trying to find the quote from Larry Wall. I think it goes something like this: "Perl did easy things easily and made impossible things doable."

--
To Copy from One is Plagiarism; To Copy from Many is Research.
XML by Anonymous Coward · 2003-01-30 05:41 · Score: 0

Treat XML as Lisp sexps, but with terrible syntax (and that's compared to Lisp!).
It's much less painful in the long run.
1. Re:XML by Anonymous Coward · 2003-01-30 06:30 · Score: 0
  
  Treat XML as Lisp sexps
  
  I thought sexps was for printing pr0n pics on a PostScript printer, and had nothing to do with lisp.
So, where's the review? by mattdm · 2003-01-30 05:45 · Score: 3, Insightful

I see the table of contents explained in paragraph form. And then one complaint about the organization of the book. And then I expect to read the review, but it's already on to "you can buy this book here", and user comments.

I know complaining about slashdot stories is like shooting those proverbial barreled fish, but sheesh.
1. Re:So, where's the review? by Chelloveck · 2003-01-31 04:38 · Score: 1
  
  That was my thought exactly. I could get more depth by going to the publisher's website than I can get out of this review. At least the publisher gives me a couple of sample chapters to form an opinion around.
  
  --
  Chelloveck
  I give up on debugging. From now on, SIGSEGV is a feature.
Re:Mod this down, please. by Anonymous Coward · 2003-01-30 05:45 · Score: 0

I'd agree with you, but I can't get to slashdot to read your post.
XML isn't text by Anonymous Coward · 2003-01-30 06:04 · Score: 0

XML may look like text, but it isn't.

Instead, XML is structured data, represented as text.

Perl's text processing operators are all regular-expression (pattern) based. That works great for text (such as old-style log files) but works piss-poorly when it comes to a structured file such as XML.

It'd be a royal pain in the arse to match, using a regular expression, some of the things you need to match when processing XML. Don't believe me? Take a look at what you do with XSLT (a great language for processing XML) and think of the matching power you can do with XPath, that you cannot do with Perl's regular expressions.
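To give a flavour of path-based matching, here is a sketch using the XPath subset built into Python's standard library (full XPath and XSLT need a third-party library). The order document is invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document.
DOC = """<orders>
  <order status="open"><item sku="A1" qty="2"/><item sku="B7" qty="1"/></order>
  <order status="closed"><item sku="A1" qty="5"/></order>
</orders>"""

root = ET.fromstring(DOC)

# "Items inside open orders": the condition sits on the parent and the
# result is the child, which is awkward to express as a flat text pattern.
open_skus = [item.get("sku") for item in root.findall("./order[@status='open']/item")]

print(open_skus)
```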

- David
Re:It's a great book about a terrific subject by Anonymous Coward · 2003-01-30 06:12 · Score: 0

You are an idiot. HTH.
Perl is for suckas by Anonymous Coward · 2003-01-30 06:17 · Score: 0

Perl is excellent... if you need to push your CPU to its limits. Why bother running Folding@Home or SETI@Home when Perl is sucking away all the CPU?
1. Re:Perl is for suckas by Anonymous Coward · 2003-01-30 06:20 · Score: 0
  
  Perl is much better suited for CPU burn-in torture tests. Its CPU utilization would push any overclocked system to the brink of collapse. Folding@Home? SETI@Home? Bah, they aren't even in the same league.
Re:Formalised features of Perl (in this book?) by egoots · 2003-01-30 06:22 · Score: 1

As you're also aware, most Comp Sci courses fawn over Pascal, a VERY formalized language. However, it's not mentioned much past education circles (and Apple aficionados).

Ever heard of Borland's Delphi product? The language is object-Pascal.
XML::Simple by Anonymous Coward · 2003-01-30 06:37 · Score: 2, Interesting

I'm seeing a lot of comments that perl doesn't have any particular strengths when dealing with XML. A good module people should check out is XML::Simple. Basically, it automagically turns XML into a nested data structure, and automagically turns a nested data structure into XML. The great thing about it is that you just make a single API call, and then directly access the data from there without having to learn anything more complicated. Definitely not an end-all solution, but it definitely handles the common case wonderfully, and has quite a few handy options to allow more fine-tuned control.
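The "XML in, nested data structure out" idea is easy to sketch. This is not XML::Simple's behaviour or API, just an illustrative and deliberately lossy converter; the `to_data` helper and the sample config are assumptions.

```python
import xml.etree.ElementTree as ET

def to_data(elem):
    """Convert an element into nested dicts, lists and strings.

    Lossy on purpose: mixed content and the text of elements that also
    have attributes are dropped, and attribute names can collide with
    child tag names.
    """
    children = list(elem)
    if not children and not elem.attrib:
        return (elem.text or "").strip()
    data = dict(elem.attrib)
    for child in children:
        value = to_data(child)
        if child.tag in data:
            # Repeated tags are collected into a list.
            if not isinstance(data[child.tag], list):
                data[child.tag] = [data[child.tag]]
            data[child.tag].append(value)
        else:
            data[child.tag] = value
    return data

# Hypothetical input.
DOC = '<config debug="0"><server name="a"><port>80</port></server><server name="b"><port>81</port></server></config>'
config = to_data(ET.fromstring(DOC))
print(config["server"][1]["port"])
```

One call, then plain dictionary and list access, which is the appeal described above.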
1. Re:XML::Simple by Nohea · 2003-01-30 17:06 · Score: 1
  
  I think you're better off using XML::Writer and XML::SAX.
  
  Writing a SAX handler is invaluable for parsing huge XML files. There's only so much you can fit in memory.
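For anyone who has not seen one, a SAX handler is just an object whose methods are called as the parser streams through the input. A minimal sketch with Python's standard `xml.sax`; the `Counter` class and the sample log are made up.

```python
import xml.sax

class Counter(xml.sax.ContentHandler):
    """Count elements by name without ever building a tree."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        # Called once per opening tag as the parser streams the input.
        self.counts[name] = self.counts.get(name, 0) + 1

handler = Counter()
# Hypothetical input; a multi-gigabyte file would go through xml.sax.parse().
xml.sax.parseString(b"<log><entry/><entry/><entry/></log>", handler)
print(handler.counts)
```

Memory use depends on what the handler chooses to keep, not on the size of the document.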
Perl & XML -- issues by Anonymous Coward · 2003-01-30 06:38 · Score: 0

A few comments of my recent experiences with Perl & XML:

At my current dev project I was asked to design a small application or script that would read an XML file, validate it using a DTD, perform more complex validation using data in our DB, and then save the XML file into various DB tables.

Issue #1: The XML::Parser module doesn't do any validation. You will need XML::LibXML which uses the gnome xml library.

Issue #2: XML::LibXML was a pain to install on our Solaris environment. There were a couple of dependencies. Overall, having to install XML::LibXML on multiple machines would be very difficult.

These issues weren't show stoppers, however, we ultimately decided to go the Java route. Deployment with java was much simpler. The java XML parsers handle validation. Also, incorporating this into our WebLogic app would be easier (if we later decided to do that).

For the most basic uses, XML::Parser should suffice. XML::Parser::Simple is really easy to use (it creates a hash table of hash tables representing your XML document for you to parse).
Re:XML is NOT just text! (old school answer) by fishdan · 2003-01-30 06:38 · Score: 1

Wow, are we arguing about what is text? Now that is an old school computing argument that I'm not sure the kids will appreciate! (no offense intended.)
My $.02: XML is composed of text because it only allows ASCII characters. That's it. Well-formed XML "the language" requires more definitions, but an XML "file" is just another text file format. You're talking about the nondeterministic finite automaton quintuple that specifies how XML is parsed, understood, etc. But within that quintuple, I is the set of all ASCII characters >= 32 and < 128. At least I think that's true. Can someone post if I'm wrong? I appreciate learning of my misconceptions.

--
Nothing great was ever achieved without enthusiasm
XML doesn't create regular languages... by davids-world.com · 2003-01-30 07:06 · Score: 1

XML is NOT just a text file (just because we can read it with a simple "more hello.xml"). Perl is good at processing text, because it knows regular expressions and some extensions to them. However, an XML DTD (or a Schema) defines a context-free grammar, which makes a language class above the regular languages. That's why we can't fully parse XML files with Perl's RE. A good example would be nested tags that result from recursive grammar rules in the DTD. These cannot be parsed without some serious geekism in Perl RE. However, I love to write those little tools that operate on XML data in Perl. Very often, you can work with regular expressions on context-free/sensitive language data!
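The nested-tag problem is easy to demonstrate. In this sketch (sample document invented for the example) a simple non-greedy regular expression pairs the outer opening tag with the inner closing tag, while a parser tracks the nesting correctly.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical recursive structure: a list whose items may contain lists.
DOC = "<list><item>a<list><item>b</item></list></item><item>c</item></list>"

# The regex pairs each <item> with the *first* </item> that follows it,
# so the outer item is cut off in the middle of the nested list.
regex_items = re.findall(r"<item>(.*?)</item>", DOC)

# A parser keeps track of depth, so top-level items come out intact.
root = ET.fromstring(DOC)
top_level = [item.text for item in root.findall("item")]

def max_depth(elem, depth=1):
    """Depth of the deepest element below (and including) elem."""
    return max([depth] + [max_depth(child, depth + 1) for child in elem])

print(regex_items)
print(top_level, max_depth(root))
```

For flat, predictable fragments the regex is fine, which matches the observation that regular expressions often work on such data in practice; it is the recursion that breaks them.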
hasn't received much attention until recently? by HealYourChurchWebSit · 2003-01-30 07:20 · Score: 4, Informative

The reviewer is correct, Perl is a good tool for slamming and jammin' text, including XML. What I'm not so sure of is the quote "It's therefore surprising that using Perl for XML processing hasn't received much attention until recently."

I mean one need only scroll down the extensive list of CPAN Modules to see well over 50, as well as many sites/authors devoting time, energy and resource.

Similarly, I would point out some press modules supporting web services via XML, such as SOAP::Lite as far back as 02/26/01 and XML-RPC also in '01 -- or O'Reilly's own XML.com with articles such as "Processing XML with Perl" written shortly after the turn of the millennium.

Point is, though I personally love Perl, blatant plugs such as "... it's just that the world outside of the Perl community doesn't seem to have taken much notice of this work. This is all set to change with the publication of this book and O'Reilly's Perl and XML." don't inspire confidence in the reviewer's objectivity.

--
--- have you healed your church website?
1. Re:hasn't received much attention until recently? by Koschei · 2003-02-01 17:46 · Score: 1
  
  Funny you should mention the reviewer's objectivity.
  
  He wrote Manning's "Data Munging with Perl". Which is all about "slamming and jammin' text, including XML". In fact, it's got a reasonably large section on XML.
  
  But he doesn't mention it, and he doesn't push it. Instead he recommends two books from different publishers, neither of which he works for. As far as I know he doesn't provide any content for O'Reilly's oreillynet, but I may be wrong there.
  
  People who use Perl, and are part of the Perl community, know that you can slice and dice XML with Perl. What Dave is trying to say is that the managers and Java/Python/whatever programmers aren't so aware of this.
  
  --
  -- koschei
Re:XML is NOT just text! (old school answer) by Anonymous Coward · 2003-01-30 07:33 · Score: 0

XML allows unicode characters.
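A quick check anyone can run, using a made-up element with non-ASCII content encoded as UTF-8:

```python
import xml.etree.ElementTree as ET

# Hypothetical document containing characters well outside 7-bit ASCII.
doc = '<name lang="es">Señor Café ☕</name>'.encode("utf-8")

elem = ET.fromstring(doc)
non_ascii = [ch for ch in elem.text if ord(ch) > 127]

print(elem.text)
print(non_ascii)
```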
Axkit, perl & XML so happy together by porter235 · 2003-01-30 07:33 · Score: 2, Informative

check it out. http://axkit.org/

"Apache AxKit is an XML Application Server for Apache. It provides on-the-fly conversion from XML to any format, such as HTML, WAP or text using either W3C standard techniques, or flexible custom code. AxKit also uses a built-in Perl interpreter to provide some amazingly powerful techniques for XML transformation."

picture Cocoon for perl. using perl for xsp pages and doing pipeline transformations on xml. great stuff.
Friggin less-than's by etcshadow · 2003-01-30 07:59 · Score: 1

The above includes several places that *should* have had a less-than character. You'd think that posting "Plain Old Text" would properly escape them as &lt;, but I guess you'd be wrong.

Oh, well. You know what I meant.

--
:Wq
Not an editor command: Wq
1. Re:Friggin less-than's by juhaz · 2003-01-31 09:29 · Score: 1
  
  Hey, what can you expect? Slashcode is written in perl.
  
  (it's funny, laugh.)
2. Re:Friggin less-than's by etcshadow · 2003-02-07 02:36 · Score: 1
  
  It is funny. On the other hand, that's such a more suitable task for perl than dealing with xml:
  
  if ($form{Format} eq 'Plain Old Text') {
      # escape & first so the entities added below are not escaped again
      $form{Comment} =~ s/&/&amp;/g;
      $form{Comment} =~ s/</&lt;/g;
      $form{Comment} =~ s/>/&gt;/g;
  }
  
  bing-bang-boom. How hard was that?
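
The same idea can be tried out as a standalone script. This is only a minimal sketch: the sub name escape_html is hypothetical and has nothing to do with how Slashcode actually handles comments. It mainly shows why the ampersand has to be handled first.

```perl
use strict;
use warnings;

# Hypothetical helper (not Slashcode): escape the three characters
# that are special in HTML text.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;   # ampersand first, so the entities added below are not re-escaped
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Single quotes keep Perl from interpolating $a and $b.
print escape_html('if ($a < $b && $b > 0)'), "\n";
```

If the ampersand substitution ran last, the `&` in every freshly inserted `&lt;` would be escaped a second time.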
  
  --
  :Wq
  Not an editor command: Wq
3. Re:Friggin less-than's by juhaz · 2003-02-07 02:59 · Score: 1
  
  Would be more suitable, except for the fact that post mode names are misleading.
  
  Plain old text is not plain text. FAQ claims the following:
  
  HTML Formatted: You determine the formatting, using allowed HTML tags and entities.
  
  Plain Old Text: Same as "HTML Formatted", except that <BR> is automatically inserted for newlines, and other whitespace is converted to non-breaking spaces in a more-or-less intelligent way.
  
  Extrans: Same as "Plain Old Text", except that &, < and > are converted to entities (no HTML markup allowed).
  
  Code: Same as "Extrans", but a monospace font is used, and a best attempt is made at performing proper indentation.
  
  So it seems that "Extrans" (whatever that is supposed to mean) would have done the job...
Re:XML is NOT just text! (old school answer) by fishdan · 2003-01-30 08:10 · Score: 1

Good to know, thanks.

--
Nothing great was ever achieved without enthusiasm
A simple answer to the question: by danbeck · 2003-01-30 08:16 · Score: 0

Who gives a damn? XML is generally a pain in the ass, long-winded, over-the-top way to store simple data. Simple comma delimited text files are more than sufficient for many data needs and serious people who need real data storage of complex relational data use a database engine of some sort.

The conclusion? There isn't much development of Perl XML tools because no one cares about them. XML, with the exception of a few specialized purposes, is a buzzword for marketdroids and the technically incompetent. Why should I spend hours working out an application to parse XML data when I can write a quick script in any language to parse a comma separated text file in a few minutes?

Maybe CSV isn't quite as kewl sounding as XML?
1. Re:A simple answer to the question: by owlstead · 2003-01-30 13:35 · Score: 1
  
  XML itself is indeed a simple vehicle for storing data (the data itself can be quite complex, since you can put in anything you like). Obviously XML will not replace an RDBMS for storing and looking up data, and it does not need to.
  
  Though XML itself may look easy, I can assure you that the technically incompetent won't like the standards written around XML a bit. Schemas and XSLT take a while to get used to.
  
  Furthermore, you do not have to write an application to parse XML at all. It has been done already. You will be presented with the DOM or with SAX. With the DOM you get a pre-parsed tree structure, and with SAX the parser calls you back when it finds your data. 95% of the people in these discussions will know this.
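
The difference between the two styles can be sketched in a few lines of Perl. This assumes the CPAN module XML::Parser is installed. Its callback handlers and its 'Tree' style stand in here for SAX and the DOM; they are neither of those APIs, just the same two ideas in miniature. The sample document is made up.

```perl
use strict;
use warnings;
use XML::Parser;   # CPAN module, assumed to be installed

# Made-up sample document for illustration.
my $xml = '<books><book id="1">XML and Perl</book><book id="2">Perl and XML</book></books>';

# Event-driven style: the parser calls back as it meets each piece.
my @titles;
my $in_book = 0;
my $events = XML::Parser->new(
    Handlers => {
        Start => sub { my ($expat, $tag) = @_; if ($tag eq 'book') { $in_book = 1; push @titles, '' } },
        Char  => sub { my ($expat, $text) = @_; $titles[-1] .= $text if $in_book },
        End   => sub { my ($expat, $tag) = @_; $in_book = 0 if $tag eq 'book' },
    },
);
$events->parse($xml);

# Tree style: the whole document is parsed up front into nested arrays.
# Each content array starts with an attribute hash, followed by
# (tag, content) pairs; text nodes use the tag 0.
my $tree = XML::Parser->new(Style => 'Tree')->parse($xml);
my ($root, $content) = @$tree;
my @ids;
for (my $i = 1; $i < @$content; $i += 2) {
    my ($tag, $kids) = @{$content}[$i, $i + 1];
    push @ids, $kids->[0]{id} if $tag eq 'book';
}

print "titles: ", join(' | ', @titles), "\n";
print "ids: ", join(', ', @ids), "\n";
```

The callback version never holds the whole document in memory, while the tree version lets you walk back and forth once parsing is done.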
  
  The only conclusion I can draw from your writing is that you are as deep in XML as the writer of the original article: not at all. You see XML as just a text-file with some data in it. Other /. articles have already explained why this isn't so.
  
  Warper
  
  can anybody rewrite _all_ the linux configuration files to xml please? before lunch?
use AxKit! by sbwoodside · 2003-01-30 08:18 · Score: 1

Use AxKit! You're selling yourself short if you start to develop a site without it. It's just the ideal way to get the whole separation of content and presentation thing that XML is supposed to be all about. It makes it dirt easy to store your content in XML, use XSLT for transformations and XSP for dynamic back-end processing. Check it out!

Also read this

simon

--
home page
XML makes Perl less important by mackman · 2003-01-30 08:24 · Score: 1

Perl's strength in text processing was its ability to work with (read and generate) poorly structured data. XML makes it easy to create well structured data, thus writing document processing code in languages like C++ is easier. People who don't know Perl, or people who learned other XML toolkits first, have less reason to learn XML with Perl.
1. Re:XML makes Perl less important by hondo77 · 2003-01-30 12:08 · Score: 1
  
  That might be true in a perfect world where everything is XML. In the world in which I live I have to transform some goofy mainframe-generated files ("I don't care how wide you make field 154, just tell me how wide it is and I will process the file.") into XML. That makes perl very important.
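
A job like that might look roughly as follows. This is a sketch under loud assumptions: the three-field record layout, the field names and the element names are all invented for illustration, and a real mainframe file would need its own unpack template.

```perl
use strict;
use warnings;

# Hypothetical record layout: name (10 chars), account (6), balance (8).
my @fields   = qw(name account balance);
my $template = 'A10 A6 A8';   # 'A' strips trailing spaces on unpack

# Escape the characters that are special in XML text content.
sub xml_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    return $s;
}

# Turn one fixed-width line into one <record> element.
sub record_to_xml {
    my ($line) = @_;
    my %rec;
    @rec{@fields} = unpack $template, $line;
    return '<record>'
         . join('', map { "<$_>" . xml_escape($rec{$_}) . "</$_>" } @fields)
         . '</record>';
}

print record_to_xml('Smith & Co000042199.95  '), "\n";
```

Changing a field width means changing one number in the template, which is the sort of chore the comment is describing.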
  
  --
  I live ze unknown. I love ze unknown. I am ze unknown.
Re:Mod this up, please. by Anonymous Coward · 2003-01-30 08:25 · Score: 0

Cause it is so true.
Re:Mod this down, please. by Anonymous Coward · 2003-01-30 08:39 · Score: 0

I won't see the results for 3 days.
Why is Slashdot so slow? My God, it is so slow as to be unreadable!
Is slashdot now hosted on a Lego Brick? A Mac SE? An Atari 2600? WTF?
Poorly (spell)checked stories, duplicates, and now unbearably slow.
How about /. changes to a newsletter that gets mailed out every day? The page updates would be faster, the EDs could use the spellcheck in MS Works, and stories could be filed once! in a filing cabinet.
This is a book review? How about a Slashdot review?
Alright. Start your oh-so-predictable mods. Yawn. Wake me when the page refreshes.
Maybe the editors should read this book, and speed up slashdot...
XML and Perl security papers by Anonymous Coward · 2003-01-30 09:10 · Score: 0

http://www.cgisecurity.com/lib
http://www.cgisecurity.com/xml.shtml
FWIW Erik Naggum objects to *both* XML and Perl by Anonymous Coward · 2003-01-30 09:55 · Score: 0

See this post wherein Erik states

I still use SGML to produce documentation. I dislike HTML and the
incredible abuse it has seen. I positively /detest/ XML and the
disgusting mess it has introduced to the world.

Click here to see more posts that detail Erik's dislike of XML.

When Erik Naggum speaks, the Internet listens.
Re:It's a great book about a terrific subject by Anonymous Coward · 2003-01-30 10:16 · Score: 0

And you have a shitty sense of humor.
Part of the Problem by boatboy · 2003-01-30 10:22 · Score: 1

That Perl was geared toward text processing has been an obstacle to XML support in my admittedly limited experience. We're trying to interface with a 3rd party system that claims to use XML for data interchange. But because their programmers are used to traditional text-processing, their XML support is _very_ kludgy. Stupid things like requiring line feeds after each element, etc.
Re:XML is NOT just text! (old school answer) by Anonymous Coward · 2003-01-30 11:53 · Score: 0

XML is unicode; which != text in older perls.
Re:Mod this down, please. by Anonymous Coward · 2003-01-30 11:57 · Score: 0

Why is Slashdot so slow?
comments.pl . . . hmmmm
LIES by Anonymous Coward · 2003-01-30 11:58 · Score: 0

Let's try some text substitution on the Naggum quote:

...[Java] rewards idiotic behavior in a way that no other language or tool has ever done, and on top of it, it punishes conscientiousness and quality craftsmanship -- put simply: you can commit any dirty hack in a few minutes in Java, but you can't write an elegant, maintainable program that becomes an asset to both you and your employer; you can make something work, but you can't really figure out its complete set of failure modes and conditions of failure. (how do you tell when a regexp has a false positive match?)

...[Visual Basic] rewards idiotic behavior in a way that no other language or tool has ever done, and on top of it, it punishes conscientiousness and quality craftsmanship -- put simply: you can commit any dirty hack in a few minutes in Visual Basic, but you can't write an elegant, maintainable program that becomes an asset to both you and your employer; you can make something work, but you can't really figure out its complete set of failure modes and conditions of failure. (how do you tell when a regexp has a false positive match?)

...[Python] rewards idiotic behavior in a way that no other language or tool has ever done, and on top of it, it punishes conscientiousness and quality craftsmanship -- put simply: you can commit any dirty hack in a few minutes in Python, but you can't write an elegant, maintainable program that becomes an asset to both you and your employer; you can make something work, but you can't really figure out its complete set of failure modes and conditions of failure. (how do you tell when a regexp has a false positive match?)

Thanks for the ad hominems. There's a lot of Perl bashing out there, but one thing you'll notice again and again is that the people who do the bashing have rarely written even a line of Perl in their lives.

Every language can be misused. Perhaps it's Perl's flexibility and openness to a variety of solutions (even bad ones) that scares people who want to follow The One Right Way of programming. In contrast, experienced programmers don't want to be hemmed in or told how to program, and relish Perl's flexibility.
Re:Formalised features of Perl (in this book?) by Mr.+Droopy+Drawers · 2003-01-30 13:10 · Score: 1

Indeed, I have. But, it's just as dead as its parent. Squeezing features into Pascal is like squeezing OO into PERL and C++. SmallTalk is the best OO language implementation I've seen so far. But, guess what? No one uses it either.

Just because you CAN doesn't necessarily mean you SHOULD (i.e. Objective-C).

--
To Copy from One is Plagiarism; To Copy from Many is Research.
wow, that is ugly. by zonker · 2003-01-30 15:12 · Score: 0

i wish someone would use something to render the text of this article readable.

xml::dom::fu:LOL::blah:BLAH::bLaH

that gave me a headache...

anyway, isn't this why people use php?

--
Large print giveth, and the small print taketh away
Perl lousy for parsers. by Animats · 2003-01-30 20:04 · Score: 1

Actually, Perl is mediocre at processing XML/HTML/SGML. Ever write a lex-type state machine parser in Perl? You can do it, but it's not as easy as it should be. "Get next character from string" is slow and/or clunky in Perl. (If strings are long, removing the first character is expensive. And you can't just subscript your way through a string. So you need to manage a small working buffer explicitly, something you shouldn't have to do in a language like Perl.) Perl does tree structures of objects, but Perl 5 objects aren't all that fast. Parsers in Perl tend to either have C components (creating a portability problem) or are slow. This is a lack.
You can write such parsers as regular expressions, but that makes them even slower.
Despite this, I parse millions of lines of SGML/HTML/XML into trees of HTML::Element, using only Perl. But it's clunkier than it should be.
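
The regular-expression route mentioned above can be sketched with Perl's \G anchor and the /gc modifiers, which resume matching at the current position instead of trimming characters off the front of the string. This is only an illustration of the technique on a tiny tag-and-text subset. The tokenize sub is hypothetical, it is not a real XML or SGML parser, and it makes no claim about speed.

```perl
use strict;
use warnings;

# Hypothetical tokenizer for a tiny tag-and-text subset; not a real XML parser.
sub tokenize {
    my ($input) = @_;
    my @tokens;
    pos($input) = 0;
    while (pos($input) < length $input) {
        # \G anchors each match at the current position; /c keeps that
        # position when a match fails, so the next alternative can try.
        if    ($input =~ /\G<\/([A-Za-z][\w.-]*)\s*>/gc) { push @tokens, [ close => $1 ] }
        elsif ($input =~ /\G<([A-Za-z][\w.-]*)\s*>/gc)   { push @tokens, [ open  => $1 ] }
        elsif ($input =~ /\G([^<]+)/gc)                  { push @tokens, [ text  => $1 ] }
        else  { die "unexpected input at offset " . pos($input) . "\n" }
    }
    return @tokens;
}

print join(' ', map { "$_->[0]:$_->[1]" } tokenize('<p>Hi <b>there</b></p>')), "\n";
```

The string itself is never modified or copied inside the loop; the only state that changes is the match position.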