XML Compression Options?
ergo98 asks: "About a year ago I needed to evaluate XML compression technologies (for a project where two machines had to communicate via XML documents, and there was an excess of CPU power and a dearth of bandwidth). At the time the best option seemed to be a research project called XMill; however, even then it seemed to be an abandoned project with no more updates and little market presence, and it was available only as source for a command-line utility that would have required reworking into library form. I'm curious whether there has been any progress in the XML compression arena in the past year: if you have more CPU power than bandwidth, what is the best option for XML document compression? Have any XML-specific compression algorithms been made into a module for Apache?"
There is a content-encoding plugin for Apache called mod_gzip that will handle the server end for any output, including dynamically generated content. I've not tried it, but at face value it's a standards-based way of getting what you want.
I think, although I can't confirm it, that LWP supports gzip content-encoding too, which would mean that things like SOAP::Lite and XML-RPC would benefit as well.
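To get a feel for what gzip content-encoding buys you on XML, here is a rough Python sketch (not mod_gzip or LWP themselves, just the same idea using the standard library). The sample document, the function names, and the header dictionary are all made up for illustration; a real server or client library would set the headers for you.

```python
import gzip
from typing import Dict, Tuple


def build_sample_xml(n_records: int = 200) -> bytes:
    """Build a repetitive XML document (hypothetical sample data)."""
    rows = "".join(
        f'  <record id="{i}"><name>item{i}</name><price>{i * 1.5:.2f}</price></record>\n'
        for i in range(n_records)
    )
    return f'<?xml version="1.0"?>\n<records>\n{rows}</records>\n'.encode("utf-8")


def gzip_body(body: bytes, level: int = 9) -> Tuple[bytes, Dict[str, str]]:
    """Producer side: compress the body and describe it with HTTP-style headers."""
    compressed = gzip.compress(body, compresslevel=level)
    headers = {
        "Content-Type": "text/xml",
        "Content-Encoding": "gzip",
        "Content-Length": str(len(compressed)),
    }
    return compressed, headers


def gunzip_body(body: bytes, headers: Dict[str, str]) -> bytes:
    """Consumer side: undo the encoding only if the headers announce it."""
    if headers.get("Content-Encoding") == "gzip":
        return gzip.decompress(body)
    return body


if __name__ == "__main__":
    xml = build_sample_xml()
    wire, hdrs = gzip_body(xml)
    print(f"original: {len(xml)} bytes, gzipped: {len(wire)} bytes")
    assert gunzip_body(wire, hdrs) == xml
```

Since XML repeats the same tag names over and over, a general-purpose compressor already does well on it, and the consumer only needs to understand a standard header rather than a custom scheme.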
more about the content-encoding thing
The problem, of course, is that if you control both the producer and the consumer, you're greatly limiting the applicability of XML in the first place. Just yesterday I explained to my boss that one of the advantages of XML is for cases when you have 10 people who want your data, but you can't dictate what software they use...AND, 6 months from now, 10 other people who you haven't even met yet are going to want your data too. If you're in that boat, and you create any sort of compression scheme, then you're in trouble. If you're not, then you may not need XML at all (at least, not for moving your data around).
Perhaps you're hoping that there will be some compression module that becomes a standard part of XML, so that you can safely say "Anybody who is able to parse my XML message would also be able to decompress it"? Good luck. Even if that did happen, it would take ages for all of the parsers out there to get up to date.
What you'll probably find is that something like SOAP or WSDL will have a compression component. But in that case it's OK, because both the client and server sides of WSDL that do the marshalling/unmarshalling will be provided for you by your tools (such as BEA WebLogic). Think about what CORBA IDL was like -- you just write the interface, and then both client and server stubs are automatically generated for you. In that case, it's perfectly reasonable to expect that compression/decompression code could be written into the generated stubs automatically.
Replace any run of more than one space with a single space. You'll cut the size of your average XML document by a fair bit. :-)
In other words, you can have it concise and efficient, or you can have it human-readable and pleasant to look at. :-)
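For what it's worth, the space-squeezing trick is a one-liner. This is only a sketch in Python with a made-up function name, and it assumes the extra spaces are not significant: it will also squeeze spaces inside text content and attribute values, so it is not safe for documents that rely on them (for example under xml:space="preserve").

```python
import re


def collapse_spaces(xml_text: str) -> str:
    """Replace every run of two or more spaces with a single space.

    Assumption: runs of spaces carry no meaning in this document.
    Newlines and tabs are left alone; only the space character is touched.
    """
    return re.sub(r" {2,}", " ", xml_text)


if __name__ == "__main__":
    pretty = "<order>\n    <item>\n        <name>widget</name>\n    </item>\n</order>\n"
    compact = collapse_spaces(pretty)
    print(f"{len(pretty)} characters before, {len(compact)} after")
```

Most of the savings come from indentation, which is exactly the part that makes the document pleasant to read.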
really, how hard is it:
http://www.google.com/search?q=xml+compression
The very first thing that comes up is a project on SourceForge with an in-depth explanation of its algorithm.