XML::Simple for Perl Developers
An anonymous reader writes "XML has become pervasive in the computing world and is buried more and more deeply into modern applications and operating systems. It's imperative for the Perl programmer to develop a good understanding of how to use it. In a surprisingly large number of cases, you only need one tool to integrate XML into a Perl application, XML::Simple. This article tells you where to get it, how to use it, and where to go next."
The CB App. What's your 20?
Outside of non-professional teenage Slashdot readers who still think the shitty Perl syntax is 'kewl', who the hell cares about the language anymore?
http://www.ruby-lang.org/en/
XML::LibXML is where it's at: it a) is quite a bit faster and b) has a sensible interface rather than giving you useless empty hashrefs in the middle of a tree.
Stop intellectual property from infringing on me
No, professionals write good code, regardless of the language. There is just as much shitty Python/Ruby out there as there is Perl.
Great, the XML bloat and slowness continues to propagate out of document storage...
Yes, but at least with Python you can read and understand the shit- and then fix it if need be.
You, sir, sound like someone who does not understand Perl. Decent Perl looks like an unreadable mess only to VBScript programmers.
Professionals use Perl and Ruby.
Idiots Indentation Nazis use Python.
Python is as a Python will always be, slow and constricting.
Coding in Python is like baby talking to a 2 year old. It's f*cking annoying.
Go back to Basic or Turtle!
fine, I'll feed the troll.
Parsing perl with wet-ware isn't always easy. Obfuscating your code in the name of optimization should be countered with good commenting. Every useful script will have to be maintained, and the grandparent post is totally correct. I work minor miracles with Perl; or, miracles to me, anyway--I couldn't have created my dissertation data without Fortran--specifically g95--and Perl.
I know there are lots of useful languages out there. Every language has its fanboys. Heck, I liked the PDP-11 macro language a lot. If people produce useful code with Perl, don't complain about it; be glad for them.
XML has become pervasive in the computing world and is buried more and more deeply into modern applications and operating systems.
Man, that's the scariest thing I've read all day. IT'S BURROWING INTO THE SYSTEM!! AHHHH!!!! ANGLE BRACKETS EVERYWHERE!
Seriously, the first thing that came to mind was the type of stuff you read on thedailywtf.com.. something like: "And then Joe realized that the reason the string_to_upper() was so slow was because it was calling a SOAP service on a machine at the lead developer's previous employer, passing 23K of XML in both directions...." .. And then INVARIABLY, there's a reply that says "Hey, that isn't such a bad idea...."
So, no thanks, I don't want XML burying itself in my code any more than I want my music player to squirt songs.
How is this news? XML::Simple is old. Really old. 1999 old.
After all, I am strangely colored.
This is the most pointless article I've seen linked from slashdot in a long time (and yes, I've seen a lot of crap here). What is the point of posting a run of the mill tutorial on something that's been covered many times before?
Having spent a lot of time playing with this crap lately, can I just butt into this pointless thread and say screw XML, use YAML or JSON instead. XML is a steaming, clumsy overrated turd. I benchmarked XML::Simple against YAML::Syck - the latter encoded 2.5 times faster and parsed nine times faster than XML::Simple. The syck library is indeed aptly named.
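A comparison of this kind is easy to set up. The sketch below is not a reproduction of the XML::Simple and YAML::Syck figures above: it uses Python's standard library (json and xml.etree.ElementTree) as stand-ins, and the record layout is made up purely for illustration. It only shows the shape of such a benchmark: parse the same data from both formats into the same structure, then time each parser.

```python
import json
import timeit
import xml.etree.ElementTree as ET

# Hypothetical sample data: 1000 small records, chosen only for illustration.
records = [{"id": i, "name": f"item{i}"} for i in range(1000)]

# The same records serialized two ways.
json_text = json.dumps(records)
root = ET.Element("records")
for rec in records:
    ET.SubElement(root, "record", id=str(rec["id"]), name=rec["name"])
xml_text = ET.tostring(root, encoding="unicode")


def parse_json(text):
    """Parse the JSON text into a list of dicts."""
    return json.loads(text)


def parse_xml(text):
    """Parse the XML text into the same list-of-dicts shape."""
    return [
        {"id": int(el.get("id")), "name": el.get("name")}
        for el in ET.fromstring(text).iter("record")
    ]


if __name__ == "__main__":
    for label, fn, text in (("json", parse_json, json_text),
                            ("xml", parse_xml, xml_text)):
        seconds = timeit.timeit(lambda: fn(text), number=100)
        print(f"{label}: {seconds:.3f}s for 100 parses")
```

The timings depend on the machine and the data, so no numbers are claimed here.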
"Leverage the power of XML" by deprecating it wherever you can for a more sensible cross platform format.
</rant>
-- Trinity in high heels carrying a whip: The donimatrix - there is no spoonerism
As true as that may be, I have never heard of any other language being referred to as a "write only language".
Love sees no species.
who can't RTFM!?!!
More bullshit from the big blue
Oh yeah, thats right, the Linux fanboy on some crazy patent free westworld trip!
Perl is for guys who live with their mum
Perl is for people who think how they think is how we all think
AMPERSAND, WHY YES AMPERSAND!
Of course you dummy
XML?
XML is some stuff one of those deranged geeks COULD decode if the shit hit the fan
XML is the ANGLE BRACKET DAWN OF A NEW AGE; A NEW ROBOT AGE WHERE WE WILL LIVE IN TOTAL WORLD PEACE
Perl was the fucking bastard son of AWK
Perl is Sooo 1995
Where the fuck is Perl 6?
Oh quick bolt on some OOPS stuff, I got an interview with Sun Microsystems tomorrow
Thats right Mr President sir, Perl will make our willys larger than Big Bad Bills
WE WILL DRENCH THEIR FACES IN OUR SUPER CUM
Fuck we can't give it away man, free as in out-of-date BEER
PS Has anyone confirmed Perl is dead yet?
When information is power, privacy is freedom.
I just today noticed the announcement of XML::Tiny.
you had me at #!
Ah, and now I can say that I've only known three languages to be referred to as write only - Perl being one of them (TECO and APL being the other two).
Regardless, even though the reputation is a shared one that doesn't make it a good thing.
Love sees no species.
Why is there always so much Perl bashing that goes on here? You can make any language's code look like a steaming pile of crap. And honestly I probably had some of the same sentiments before I started my current job, where I've done quite a bit of development in Perl. It's the right tool for the job, and I do have to take a little extra time when writing my code to make it as readable as, say, Java or C++, but really not all that much more. Be a smart programmer and you won't have to worry about how ugly the code is; it's only as ugly as the person writing it.
It only claims to support a subset of XML, and of course it is called "Tiny", so I guess I can understand not supporting CDATA or attributes (...maybe). But for a ">" in a CDATA block to cause it to fail? It doesn't seem very useful.
... utterly useless for anything of real size. XML::Simple is a huge memory sink, because, as mentioned elsewhere, it insists on generating full hash and array representations of the source XML text. This seems to be the side effect of taking too seriously a lot of Perl advice enthusiastically handed out in the older documentation. (Put file text into a huge array! Don't close your file descriptors!) The rest of us know better.
Dog is my co-pilot.
There's a reason they're called "Cowards". Those of us who've been around a while know that there are all kinds of languages out there, and plenty of them can be used to write good code. The real problem with Perl is that Larry Wall is slowly losing his mind, adding features that nobody asked for; a good example is the new, backwards-incompatible regexps. (Don't go off that there'll be a compatibility mode; that's beside the point. The hubris needed to upend this core part of the language is pretty astonishing.) Also, he seems to be spending a lot of time with the Parrot rehosting, something else that is (perhaps) of dubious value. The changes are so orthogonal to what I do with Perl (hello, how about faster OO calls?) I have to wonder if there'll ever be a reason to switch to Perl 6. (Perhaps someone out there who knows more can comment.)
Dog is my co-pilot.
gack!
"that's not encryption - it's a new perl script that I'm working on..." - from some Matrix parody
One thing I used it for was representing a database in XML. Once I had the DB layout in a data structure, it was one line to print it out. Of course, this was before I knew about DBIx::XML_RDB...
-B
Ash and Hickory, straight-grained and true, make excellent bludgeons, dandy for the cudgeling of vegetarians.
I'm hoping I die and/or retire before perl5 is discontinued.
"that's not encryption - it's a new perl script that I'm working on..." - from some Matrix parody
So you have never used APL, eh?
Perl is derided by people who quite frankly don't have a clue.
We get lots of flammage from the Java and Python programmers that seem to be unable to grasp that when they try to justify their language choices by putting down other languages, they demonstrate how clearly idiotic their choices are. They cannot come up with something better than "line noise"? My god, have they not heard of the obfuscated code contest?
One can write unreadable code in any language. Perl is not unique in this regard. Moreover, Perl itself does not admit more unreadable code than other languages. The regex engine in Perl is a language unto itself. You don't need to use it, ever. But once you do, you realize how incredibly powerful it is. And you learn how to parse it, and even more scary, emit it, in your head. What takes hundreds of lines in Java (well, what doesn't) becomes a single-digit number of lines in Perl.
In my career, I have used APL, Assembly (x86, 8080/Z80, 6502, 6800, F8, ...), Basic, C, C++, Fortran (66,77,90,95), Gauss, Icon, Java, maxima/macsyma, mumath, pascal, perl, prolog, python, ruby, R, VB and probably a few I forget in there. The big idea I have learned is to never force fit a tool to a problem. Select the right tool for the right problem. And go from there.
Perl is wonderful in that it allows for rapid application development, has a really huge library to draw from (www.cpan.org), orders of magnitude larger than competitive languages, an active developer base, an active contributor base, is portable (you can run Perl anywhere, windows, linux, mac, Cray, AIX, ...). It is not the perfect language for everything though, there are some missing bits.
Ruby is neat, though I am amused by those in the Java community running over to it, thinking it is better than Perl. It is slightly different, but the syntax is actually quite close to perl. Learning it isn't hard once you know Perl, you can go back and forth quite easily. The problem in Ruby's case is speed. This hopefully will improve over time.
Python is hard for me to use. I am reminded of BASIC on IBM PCs. Some people like it, I don't. Use it if you must.
Java has always felt to me to be a solution in search of a problem. I haven't seen things that are being done in Java that couldn't be done more quickly and efficiently in other languages. Java has developed a cult-like following. Many people drank the koolaid, committed company resources to it, and poo-pooed other, better solutions. Only to discover that each "advance" meant to deliver more performance dug people in deeper to the hole, made the systems harder and more expensive to develop. And until recently, the vast majority of people were in significant denial over the fact that java was and is just a marketing gimmick for Sun. They drank the koolaid.
Fortran ... spent 15 years developing Fortran code. May it never reach 16 years.
APL. You want write only? Parse this: +/x
In APL, we wrote complex calculation systems in very few lines. It was a tremendously powerful language.
In Fortran we wrote complex calculation systems in quite a few lines. Not very powerful for IO, really sucked for this.
In Perl we drive complex calculation codes written in almost any language. Insanely powerful. Expressive and concise syntax, reads well when well written. Good IO, good networking, good system hooks. Can use MVC and web tools, Jifty even comes with a pony.
In my personal experience, Simple is probably the worst implementation of an XML parser in perl. For a simple implementation, I have found Twig to be much more useful, sensible and fast.
Nobody is supposed to remember SNOBOL. There was no SNOBOL.
Help stamp out iliturcy.
Perl is a good development language. I especially appreciate the transparency of scripting languages - if something goes wrong I can examine the source immediately. But I think you underestimate the power of Java. CPAN is good, but the resources available to a Java developer are even more extensive, thanks to the combined efforts of the major software players and the open source community. Java-specific development tools are leaps ahead of most other languages. When a serious amount of effort is required to solve a problem then it makes sense to invest time in a tool that makes complex systems easier to develop and troubleshoot.
I agree that the article is pointless. XML::Simple is the oldest Perl XML library in existence, and there are better alternatives available. How does YAML::Syck hold up against XML::LibXML for performance? Is the syntax as easy to use as XML::XPath? Who else uses YAML? I don't want to invest my time in a data format that no one else uses.
This is the same approach that is built-in to the qore language http://qore.sourceforge.net/.
It makes it really easy to manipulate data in XML format.
However, qore supports deserialization of mixed text and data and multiple out-of order elements, XML attributes (imagine parsing a docbook file for example), as well as serialization (conversion of a qore data structure to an XML string) with the same features.
The same limitations regarding streaming input and very large files affect this approach, but in all other common cases, it makes it really remarkably easy to manipulate and create data in XML format using this approach.
(Qore also supports JSON with the same approach -- serialization and deserialization between JSON strings and qore data structures...)
thanks,
David
Oh yeah, it's just the absolute end of the world. Just a total show-stopper. After all, who would want readable, maintainable, consistent-looking code in a large team? What a disaster!
Hey thank you for pointing this out. I did not know about this. JSON and YAML are pretty nice.
Some drink at the fountain of knowledge. Others just gargle.
The author of that article says you'll want to use the magic of perl and XML::Simple because "XSLT can't do arithmetic" and proceeds to do magical things like increase numbers by 20%.
That's just ... bizarre. Of course you can do that with XSLT! <xsl:value-of select="whatever * 1.2"/>
Then he formats a number -- because XSLT, of course, doesn't have a format-number() function.
Next article -- why you should commute to work in an airplane because, as everyone knows, cars can't turn corners.
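For comparison, the transformation being argued about (scale a number by 20%, then format it) is only a few lines in a general-purpose language as well. The sketch below uses Python's standard library, not XML::Simple or XSLT, and the document layout and element names are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical input; the element names are invented for this example.
SOURCE = """<menu>
  <item><name>coffee</name><price>2.50</price></item>
  <item><name>bagel</name><price>1.25</price></item>
</menu>"""


def raise_prices(xml_text, factor=1.2):
    """Multiply every <price> by `factor` and format it to two decimals."""
    root = ET.fromstring(xml_text)
    for price in root.iter("price"):
        price.text = f"{float(price.text) * factor:.2f}"
    return root


if __name__ == "__main__":
    print(ET.tostring(raise_prices(SOURCE), encoding="unicode"))
```

Whether this or a stylesheet is the better fit is a matter of taste; the arithmetic itself is trivial either way.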
http://www.bartleby.com/100/420.47.html
Consistency is great... if all your code does the exact same thing, the exact same way. Otherwise, it can be misleading, and the ability to express differing functions in differing formatting is an indispensable boon to clarity.
Java is well-suited for large projects with fairly well defined requirements (and potentially complex interactions between objects/components).
If the requirements are ill defined, or the system small enough to likely be confined to a single box, or the object model relatively simple (few types, lots of instances) then perl is the first thing I think of...
Unless the object model is regular and layered, then I think ruby.
Unless there is a need for blistering IO and syscalltastic goodness with function overloading, then C++
Unless there isn't a need for too many object tricks or the STL, then C.
All of the above languages have excellent tools, environments, and libraries. I think they've all got it made in the shade.
BTW, the easiest language to develop and troubleshoot is JavaScript (ecmascript). Tools like firebug make it stupid-easy. Of course, there's no regularized non-web environment for it; I've seen small efforts to that end but they always end up not going far. It's a real shame, IMHO. Prototyping, lazy-evaluating, duck typing, easy-to-read language... what's not to love?
I don't know *kicks dirt*
THIS THING CAN TURN ON A DIME, MACROSSZERO STYLE ALSO FUCK BETA, ~NYORON
No, professionals don't work as a software programmer/engineer/whatevertitleyouwanttoglorifyitwith, that is a blue collar job nowadays :) Get with the times, move on :)
These comments are based on C++; though I have used Perl, I have been doing much more C++ coding lately.
XML files are text based. Text I/O has to be read in sequentially, whereas you can write out an entire block of allocated memory to the disk in binary format. The advantage of this is that you can read and write that data to and from disk 50 to 100 times faster than reading it in sequentially from a text file.
What advantages does XML provide at the expense of data I/O speed loss?
Nick Powers
Encryption: I may not agree with what you say, but I will defend your right to encrypt it...
What advantages does XML provide at the expense of data I/O speed loss?
Most XML parsers I have used had the option to read the complete file into memory in large chunks, and perform the actual parsing on the memory stream. So there is no performance penalty compared to binary file loading in terms of I/O.
The actual performance penalty comes from the actual parsing, which is of course slower than just memcpy()'ing memory-ready binary data into place. I have spent a whole lot of time on this performance issue and have found (the hard way) that the key to speeding this up lies in optimizing the xml document structure. Take it as a given that most xml tutorials advocate sub-par structure for the sake of simplicity.
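A small sketch of that point, in Python rather than C++ and using only the standard library. All three buffers below are already in memory, so the only difference between the loaders is decoding work. The two XML layouts are assumptions made for illustration (one element per number versus one element holding a whitespace-separated list); they are one possible reading of "optimizing the document structure", not a description of any particular format.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical data set, chosen so every value round-trips exactly.
values = [1.5, 2.25, 3.0, 4.75]

# Verbose layout: one element per number.
xml_verbose = ("<data>" + "".join(f"<v>{v}</v>" for v in values) + "</data>").encode()
# Compact layout: a single element with a whitespace-separated list.
xml_compact = ("<data><values>" + " ".join(str(v) for v in values)
               + "</values></data>").encode()
# Raw binary: little-endian doubles, 8 bytes each.
binary_blob = struct.pack(f"<{len(values)}d", *values)


def load_verbose(buf):
    """Parse one float per <v> element."""
    return [float(el.text) for el in ET.fromstring(buf).iter("v")]


def load_compact(buf):
    """Parse a single text node and split it on whitespace."""
    return [float(tok) for tok in ET.fromstring(buf).findtext("values").split()]


def load_binary(buf):
    """Unpack the doubles directly; no text parsing involved."""
    return list(struct.unpack(f"<{len(buf) // 8}d", buf))
```

The compact layout gives the parser far fewer elements to build, which is where most of the time goes; the binary loader skips text parsing entirely.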
Take a look at https://collada.org/public_forum/welcome.php for a good optimized xml-based format.
regards,
Whatever. If 80K for a hack (and that's what I am; a hack and thoroughly mediocre) is 'blue collar' well then, call me blue collar. Beats flipping burgers.
I've used Perl for a while now, and I've done some work on C#. Frankly speaking there isn't any library I see available in C# that can parse simple XML documents as simply and quickly as XML::Simple.
Guys, most of the time there's really no point talking about how much faster lib X is, or how lib Y is yada yada...Perl and XML::Simple does the job. Does it fast, does it CLEANly, and lets the overworked developer move on to the next job.
That's life. Get real.
I used XML::Simple in a script I wrote to grab data from my garmin GPS to use at http://www.gpsvisualizer.com/ Worked great without a lot of extra stuff to learn for a simple task.
I'm a bit leery of installing it on a machine where I do have root access. The incomprehensibility of the install code, plus the fact that it comes from IBM (which may be a bit better than MS, but does have a bit of history
Those who do study history are doomed to stand helplessly by while everyone else repeats it.
Oh boo hoo, he hasn't heard of these two random other languages, one of which is obscure and the other long-dead. He must really be an incompetent fool: anyone who wasn't around in the days of TECO is. What was your point again?
My point was that proof by regurgitated insult isn't any proof at all.
When information is power, privacy is freedom.
XML::Gay
I code in Perl for a living. I code in Python and Ruby for fun.
While many languages have excess baggage (like semicolons) simply because their designers didn't want to stray too far from the languages that inspired them, Perl is the only language out there that is made up entirely of this baggage. It's a Frankenstein's Monster of dead language ideas.
Also, Perl, above all other languages I have used, seems to have been designed with the philosophy "complexity is good." Today, this philosophy is almost universally recognized as being completely backward in the software engineering world.
Perl's criticism is well-deserved. The only real advantages of Perl over the other high level languages in its space are execution speed, availability of skills, and a large module library. It doesn't take a genius to realize that Perl's lead in these areas is shrinking rapidly.
And if you agree that complexity is bad, surely you realize that Perl encourages it, while the other languages do the opposite.
A slashdotter who didn't build his own computer is like a Jedi who didn't build his own lightsaber.
XSLT is disqualified because it's impossible to do anything of any complexity in it without going insane.
Not only is the syntax horrendously complex, there are also major misimplemented features that force strange and illogical workarounds, and people used to real programming languages have a really hard time grasping the template style, leading to even more frustration.
XSLT is a complete paradox, on one hand it's supposed to let non-programmers manipulate xml, but at the same time it's so hard to use that programmers can't use it.
I'd rather use a real programming language that I already know and manipulate the data with that, thank you very much.
-- To dream a dream is grand, but to live it is divine. -- Leto ][
Because you see, despite the fact that programmers like to pretend that they're supremely rational they're actually as faddish as a bunch of teenagers.
Yep, XML::Twig is the thing to use - gives you the efficiency of a stream/event based parser and the convenience of DOM/XML::Simple style access.
XML::Simple is a toy in comparison.
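The same hybrid idea (stream through the file, but get a small tree for each record) exists outside Perl too. The sketch below is not XML::Twig; it uses iterparse from Python's standard xml.etree.ElementTree as a stand-in, and the log/entry/msg layout is made up for the example.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical input: five small records inside one root element.
SOURCE = b"<log>" + b"".join(
    b"<entry id='%d'><msg>line %d</msg></entry>" % (i, i) for i in range(5)
) + b"</log>"


def iter_entries(stream):
    """Yield (id, msg) for each <entry> as soon as it has been fully read."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "entry":
            yield elem.get("id"), elem.findtext("msg")
            # Drop this entry's children and text once it has been used;
            # the emptied element itself stays attached to the root.
            elem.clear()


if __name__ == "__main__":
    for entry_id, msg in iter_entries(io.BytesIO(SOURCE)):
        print(entry_id, msg)
```

Each record is handled with tree-style calls (get, findtext), while only one record's content is held at a time.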
I don't realize that. Can you explain?
All I know is that it's impossible for Chinese people to communicate because someone who never learned the pictographs can't write a post-modern novel within a week of starting to learn Mandarin.
how to invest, a novice's guide
See the INSTALL_BASE argument to MakeMaker.
how to invest, a novice's guide
I wonder what's going on here?
Those who do study history are doomed to stand helplessly by while everyone else repeats it.
Not to mention you'd probably indent the code in any other language, python just gets rid of the extra braces.
... as one of more than one ways to do things. Unfortunately, every Perl program I have seen (with the merciful exception of POPFile) picks one of the other ways. TMTOWTDI, don't you know.
The problem with Perl isn't the language. OK, let me amend that: s/ (t)/just $1/. It's that the culture which surrounds the language encourages practices which are at odds with maintainability. You can't separate Perl code from Perl coders, and Perl coders are, to a disturbing degree, prone to creating unmaintainable cruft with copious use of syntactic sugar which is actually disguised cyanide. Perl would be better off if some things which are syntactically allowable were banned forever. Here are a few:
* copious use of default parameters (a wonderful way to introduce bugs into the program by accidentally overwriting one when inserting a new line of code between two sections of code which do not look obviously related)
* $_.=$& Code this obtuse needs to be discouraged, not encouraged. It's powerful, but totally obtuse, and programmer brain time is more important than finger time, which unfortunately Perl has decided to optimize for. (Incidentally: it appends the last regular expression match to whatever string you were operating on. I think.)
* Multiple methods of passing arguments to functions. (I have come across code which uses more than three in a single source file.) Function prototypes are like seatbelts. I know you think you're smart enough to not cause an accident. Statistically speaking, you're not. Buckle the heck up.
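For readers who don't speak punctuation, the second item can be spelled out. In Perl, $& holds the text of the last successful match and $_ is the default variable, so the statement appends that matched text to $_. The sketch below shows the same effect in Python with explicit names; the function name is made up, and it assumes the match was made against the same string that gets appended to.

```python
import re


def append_last_match(text, pattern):
    """Return `text` with the first match of `pattern` appended.

    Rough equivalent of running a match against $_ and then `$_ .= $&`.
    If nothing matches, the text is returned unchanged.
    """
    match = re.search(pattern, text)
    if match is None:
        return text
    return text + match.group(0)


if __name__ == "__main__":
    print(append_last_match("foo123bar", r"\d+"))
```

The longer form costs a few lines but names every value it touches.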
Help poke pirates in the eyepatch, arr.
You need a fairly recent version of ExtUtils::MakeMaker. Perhaps upgrading that (as root) will help. Otherwise, the PREFIX option exists in older versions, but it doesn't work the way almost everyone assumes it does.
how to invest, a novice's guide
Yeah, but once you screw up the indents in a whole file of Python source, you're not going to think that this is such a great thing. See, if you accidentally flatten out the indents of Python code, you have no clue where blocks used to begin and end. If the same thing happened in Perl, you could easily straighten things out by using the curlies as cues. (Heck, my editor will fix indents automatically for Perl.) This actually happened to me once--I accidentally sucked out all the leading spaces of every line of a Python program. The resulting mess made me re-think my infatuation with the language.
Having tried Python and having used Perl for years, I can't think of any reason why we need Python. Seems to me that anything you can do in Python you can also do in Perl. The reverse may be true, but ask yourself...does the world really need another programming language? I'm not going to be fanatic about it...use anything you think will do the job. But I'm not going to be using a language that gets upset if my indents are off by a space, and that runs counter to my intuitive perception that white space is syntactically meaningless.
Great men are almost always bad men--Lord Acton's Corollary
Many businesses consider C and C++ to be "write only languages" and ban their use in custom applications encoding business logic.
I personally know of far more businesses that consider C/C++ to be "write only" and consider Perl in the same way.
Not to mention you'd probably indent the code in any other language, python just gets rid of the extra braces.
Yeah, but once you screw up the indents in a whole file of Python source, you're not going to think that this is such a great thing. See, if you accidentally flatten out the indents of Python code, you have no clue where blocks used to begin and end.
Yeah; I've seen quite a bit of this problem from my experiments with python. Usually, it comes about when trying to exchange code via email. Lots of email software plays fast and loose with white space, not to mention the nasty problems caused by line wrapping. Getting python code safely through via email can be a real challenge. Undoing the damage for more than a few lines of code can give you a headache. It's a lot easier with languages that don't use white space syntactically; you just feed it to a "prettyprinter" and it's good again.
Of course, the real solution would be to round up all the idiots who write email software that munges the format of the text, take them out back, and work them over a bit. I've found that rational argument doesn't work with this crowd. They seem to think that it's perfectly acceptable for email software to "improve" the text by rewriting white space, and nothing you can say convinces them otherwise. Even more recalcitrant are the folks who like to convert email between plain text and HTML.
It doesn't even work to just put files in a web directory and tell people to download them. They usually do this with a browser, and lots of browsers will rewrite white space (especially tabs) and do line wrapping in text/plain and <pre> parts of HTML files, and there's no way to prevent this.
In a few cases, the only way we've found to prevent irrecoverable damage to python code is to encode the text with some simple tool like uuencode, and decode it at the receiving end. It's a PITA for what should be a simple file copy, but it works.
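A minimal sketch of that round trip, with base64 from Python's standard library standing in for uuencode (the idea is the same: the transmitted text contains no indentation for a mailer to mangle). The sample source and the function names are made up for the example.

```python
import base64

# Hypothetical snippet whose meaning depends on its leading whitespace.
SOURCE = "def f(x):\n    if x:\n        return 1\n    return 0\n"


def armor(text):
    """Encode text as base64 lines that contain no spaces or tabs."""
    return base64.encodebytes(text.encode("utf-8")).decode("ascii")


def unarmor(armored):
    """Decode the base64 text back to the original source, byte for byte."""
    return base64.decodebytes(armored.encode("ascii")).decode("utf-8")


if __name__ == "__main__":
    wire = armor(SOURCE)
    print(wire)
    print(unarmor(wire) == SOURCE)
```

The encoded form is unreadable in transit, which is the inconvenience mentioned above, but the indentation survives intact.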
With (nearly) fully-parenthesized languages like C and perl, most such damage is usually easy to undo. With python, it can be nearly impossible, because the block structure is simply gone.
Of course, this is only a problem if you're trying to share code with others. As a language for in-house, unshared code that never has to be transmitted to anyone else, python is probably fine.
Those who do study history are doomed to stand helplessly by while everyone else repeats it.
That, or the self-described programmers posting on Slashdot really are a faddish bunch of teenagers.
Hasn't everyone already heard of XML::Simple? Or are we pointing out again the reason why I stopped using perl? The fact that 99% of perl developers always reinvent the wheel rather than using CPAN.
To install local CPAN modules, see:
http://sial.org/howto/perl/life-with-cpan/non-root/
Intron: the portion of DNA which expresses nothing useful.
how to invest, a novice's guide
This is my big gripe with XML::Simple. I've been using XML as a protocol for data exchange for a number of years in a number of languages, and I do a great deal of Perl. I love XML::Simple when I'm parsing a very simple configuration file or a tree which is not very deep. However, it makes life so easy that I've seen people munge in huge files with Simple, where a SAX parser should have been used. I've seen people marshal out XML when they don't understand the underlying schema - or even that such a concept exists.
A little knowledge is a dangerous thing and a (way too) powerful API is even worse. There are way too many daft Perl projects and implementations out there (in the commercial world) for this very reason. It almost makes me loathe the language I love so much.