Slashdot Mirror


XML Compression Options?

ergo98 asks: "About a year ago I needed to evaluate XML compression technologies for a project where two machines had to communicate via XML documents, with an excess of CPU power and a dearth of bandwidth. At the time the best option seemed to be a research project called XMill. Even then, however, it appeared to be abandoned, with no further updates and little market presence, and it was available only as source for a command-line utility that would need reworking into library form. I'm curious whether there has been any progress in the XML compression arena in the past year: if you have more CPU power than bandwidth, what is the best option for XML document compression? Have any XML-specific compression algorithms been made available as a module for Apache?"

3 of 51 comments

  1. GZIP by shemnon · · Score: 4, Informative

    How about simply using a general-purpose text compressor on the XML? Since gzip replaces repeated strings with back-references to earlier text, it compresses XML's repetitive tags quite nicely. It is available in Java as well as in C-based implementations. If Windows is your platform of choice, you can get at it via Cygwin.
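
As a rough illustration of the suggestion above, here is a minimal Python sketch that gzips an XML string and reports the size before and after. The sample document and the helper names `gzip_xml` and `gunzip_xml` are made up for this example; real documents will compress differently depending on how repetitive they are.

```python
import gzip

def gzip_xml(xml_text, level=9):
    """Compress an XML string with gzip (hypothetical helper for this sketch)."""
    return gzip.compress(xml_text.encode("utf-8"), compresslevel=level)

def gunzip_xml(blob):
    """Reverse gzip_xml and return the original XML string."""
    return gzip.decompress(blob).decode("utf-8")

# Invented sample: many records that share the same tag names.
sample = "<orders>" + "".join(
    f'<order id="{i}"><sku>A-100</sku><qty>1</qty></order>' for i in range(200)
) + "</orders>"

packed = gzip_xml(sample)
print(len(sample.encode("utf-8")), "bytes ->", len(packed), "bytes")
```

Level 9 trades extra CPU time for a smaller result, which suits the "more CPU than bandwidth" case described in the question.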

    --
    --Shemnon
    1. Re:GZIP by ergo98 · · Score: 5, Informative

      The problem with gzip is quite simply that it doesn't take advantage of XML specifics (i.e., XML has domain properties that make certain algorithms much more efficient than others). XMill, for example, manages to achieve almost twice the compression of gzip with approximately the same CPU usage.
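
To make "XML specifics" concrete: one idea commonly associated with XMill is to separate the tag structure from the text values and to group values of the same element together before handing each group to a general-purpose compressor. The sketch below is a toy illustration of that idea only, not XMill's actual code or file format. It assumes elements carry no attributes, no mixed content, and no newlines inside values, and every function name in it is hypothetical.

```python
import zlib
import xml.etree.ElementTree as ET
from collections import defaultdict

def split_containers(xml_text):
    """Split a document into a structure token list and per-element value lists."""
    structure = []                  # "<name" opens, ">" closes, "#" marks a text value
    containers = defaultdict(list)  # element name -> its text values in document order

    def walk(elem):
        structure.append("<" + elem.tag)
        text = (elem.text or "").strip()
        if text:
            structure.append("#")
            containers[elem.tag].append(text)
        for child in elem:
            walk(child)
        structure.append(">")

    walk(ET.fromstring(xml_text))
    return structure, dict(containers)

def compress_containers(structure, containers):
    """Compress the structure stream and each value container separately with zlib."""
    blobs = {"": zlib.compress("\n".join(structure).encode("utf-8"), 9)}
    for tag, values in containers.items():
        blobs[tag] = zlib.compress("\n".join(values).encode("utf-8"), 9)
    return blobs

def rebuild(structure, containers):
    """Reassemble an element tree from the structure tokens and value lists."""
    cursors = {tag: iter(values) for tag, values in containers.items()}
    stack, root = [], None
    for token in structure:
        if token.startswith("<"):
            elem = ET.Element(token[1:])
            if stack:
                stack[-1].append(elem)
            else:
                root = elem
            stack.append(elem)
        elif token == "#":
            stack[-1].text = next(cursors[stack[-1].tag])
        else:  # ">" closes the current element
            stack.pop()
    return root

# Invented sample log with two kinds of values.
doc = "<log>" + "".join(
    f"<entry><host>web{i % 3}</host><status>200</status></entry>" for i in range(50)
) + "</log>"

structure, containers = split_containers(doc)
blobs = compress_containers(structure, containers)
print("whole document:", len(zlib.compress(doc.encode("utf-8"), 9)), "bytes")
print("split containers:", sum(len(b) for b in blobs.values()), "bytes")
```

On a tiny document like this one the per-container overhead may well outweigh any gain, so the printed sizes are only a way to experiment; the point is the grouping step, which puts similar values next to each other.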

      In this case, adding a server-side module that applies XMill (honoring the Accept-Encoding header that the HTTP client sends, so a client that doesn't handle the custom compression would not be thwarted) would let us use a custom HTTP client that handles the compression and achieve twice the throughput on the limited pipe. As I mentioned in the submission, we have more CPU power than bandwidth, so that 2x compression improvement is very significant (there is a world of low-bandwidth vertical uses out there: satellite, frame relay, CDPD, etc.). For those who cringe at the idea of using XML to begin with, please realize that with the proper compression, XML-encoded data is smaller than any proprietary packaging because of its high degree of predictability.
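
The negotiation described above could look roughly like the following Python sketch: the server offers the custom coding only when the client lists it in Accept-Encoding, and otherwise falls back to gzip or to no compression. The coding name `x-xmill` is invented for this example (it is not a registered content coding), the function names are hypothetical, and the `*` wildcard is deliberately ignored so that a custom coding is never sent to a client that did not name it explicitly.

```python
def parse_accept_encoding(header):
    """Parse an Accept-Encoding value into {coding: q-value}."""
    accepted = {}
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        name, _, params = item.partition(";")
        params = params.strip()
        q = 1.0  # a coding without an explicit q-value is fully acceptable
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat a malformed q-value as "not acceptable"
        accepted[name.strip().lower()] = q
    return accepted

def choose_encoding(header, server_preference=("x-xmill", "gzip")):
    """Return the first server-preferred coding the client explicitly accepts."""
    accepted = parse_accept_encoding(header or "")
    for coding in server_preference:
        if accepted.get(coding, 0.0) > 0:
            return coding
    return "identity"  # send the response uncompressed

# An ordinary client gets gzip; the custom client gets the custom coding.
print(choose_encoding("gzip, deflate"))
print(choose_encoding("x-xmill, gzip;q=0.5"))
```

The server would then label the response with a matching Content-Encoding header, so ordinary browsers keep working while the custom client gets the denser encoding.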

  2. XML Compression... by bje2 · · Score: 3, Informative

    These guys claim to be able to compress XML at a 34:1 ratio...

    --

    "Facts are meaningless. You could use facts to prove anything that's even remotely true." - Homer Simpson