Slashdot Mirror


Old Protocol Could Save Massive Bandwidth

GFD writes: "The EETimes has a story about a relavtively old protocol for structured information call ASN.1 could be used to compress a 200 byte XML document to 2 bytes and few bits. I wonder if the same could be done with XHTML or even regular HTML."

12 of 287 comments (clear)

  1. mod_gzip ? by AdamInParadise · · Score: 4, Informative

    Ever heard of mod_gzip? It compress anything that goes trough your Apache webserver and it is supported by most browsers. With everything running over http theses days, this is the way to go...

    --
    Nobox: Only simple products.
    1. Re:mod_gzip ? by Nemesis][ · · Score: 3, Informative

      Yes, it's a very welcome and needed addition to a "bloated" protocol. But just be aware of some possible drawbacks when using it.

      It dosn't work with SSL easily. See this thread if curious. I ran into this when I wanted to force Open Webmail to use https only and found the pages were not getting compressed.

      And take note of possible problems with caching proxies serving pages to browsers that can't handle it.

      It has a few other quirks, but overall I for one am quite satisfied with it.
      Curious about the savings it brings? Use this.

      Machines are always broken till the repairman comes.

  2. Re:100:1 text compression ? by cREW+oNE · · Score: 3, Informative

    First....

    200 BYTE (!) XML documents are pretty rare. They probably standarized a few headers and instead of sending they just send some code.

    Don't believe for a second we're talking about a compression scheme here. The usual slashdot lack of information applies.

    --

    +++ATH0

  3. ASN.1 not suitable by cartman · · Score: 5, Informative

    ASN.1 is the basis of a great many protocols, LDAP among them. What is not mentioned in the article is that ASN.1 is a binary protocol and is therefore not human-readable. It may save space for bandwidth-constrained applications. However, bandwidth has a tendency to increase over time. When all wireless handhelds have a megabit of bandwidth, we would sorely regret being tied to ASN.1, as LDAP regrets it now.

    Not to mention, ASN.1 does not generally reduce the document size by more than 40% compared to XML. Think about it: how much space is really taken by tags?

    It's also worth noting that there is lots of documentation surrounding XML. With ASN.1 you have to download the spec from ITU which is an INCREDIBLY annoying organization and their specs are barely readable and they charge money to look at them, despite the fact that they are supposedly an open organization. The IETF and the W3C are actually open organizations; ITU just pretends to be. ITU does whatever it can to restrict the distribution of their specifications.

    1. Re:ASN.1 not suitable by pegacat · · Score: 5, Informative

      This is pretty much right. I do a lot of work on X500 / ldap / security, and ASN1 is used throughout all this. It does a pretty good job, but as the poster points out, the ITU is a completely brain damaged relic of the sort of big company old boys club that used to make standards. It's very difficult to get info out of them. (Once you get it though, it's usually pretty thorough!)

      As for the 'compression', well, yes, it sorta would be shorter under many circumstances. ASN1 uses pre-defined 'global' schema that everyone is presumed to have. Once (!) you've got that schema, subsequent messages can be very terse. (Without the schema you can still figure out the structure of the data, but you don't know what its for). For example, I've seen people try to encode X509 certificates (which are ASN.1) in XML, and they blow out to many times the size. Since each 'tag equivalent' in ASN.1 is a numeric OID (object identifier), the tags are usually far shorter than their XML equivalents. And ASN.1 is binary, whereas XML has to escape binary sequences (base64?).

      But yeah, ASN.1 is a pain to read. XML is nice for humans, ASN1 is nice for computers. Both require a XML parser/ ASN.1 compiler though. ASN.1 can be very neat from an OO point of view, 'cause your ASN.1 compiler can create objects from the raw ASN.1 (a bit like a java serialised object). But I can't see ASN.1 being much chop to compress text documents, there are much better ways of doing that around already (and I thought a lot of that stuff was automatically handled by the transport layer these days?)

      And just for the record... the XML people grabbed a bunch of good ideas from ASN.1, which is good, and LDAPs problems are more that they screwed up trying to do a cut down version of X500, than that they use ASN.1 :-)!

      --
      Wer mit Ungeheuern kämpft, mag zusehn, dass er nicht dabei zum Ungeheuer wird.
  4. ASN.1 was designed to be efficient by Anonymous Coward · · Score: 4, Informative

    If I remember the history right, ASN.1 was designed during the era of X.25 and charging for every 64 byte packet. I used to use ASN.1 for remote communications in a commercial product, but later changed it to a hybdrid of CORBA and XML, mostly due to more modern techologies, and since the actual bandwith did not cost that much anymore, it did not make sense to keep an old protocol alive. ASN.1 has it's drawbacks too--8 different ways to encode a floating point number. It was a political reason, because everyone involved wanted their own floating point format included, and as a net result, everyone has to be able to decode 8 different formats. A encoding designed by a committee (a stoneage telcom committe as a matter of fact).

  5. ASN.1 resources on the web. by gd23ka · · Score: 3, Informative

    Actually ASN.1 is a formal way of specifying how to encode data into binary representations like BER, CER, DER and PER which do save bandwidth compared to XML.

    Those of you that want to find out more about ASN.1, can pick up free e-books on ASN.1 here. There's some blatant propaganda in them for OSS Nokalva's ASN.1 compiler, but of course there's also snacc, an GPL'd open source ASN.1 compiler. Snacc however only generates code for encoding to BER, so you might also want to check out the a hacked version of snacc from Queensland University of Technology.

    ASN.1 is a base technology for a lot of standards out there like X.509, PKCS and LDAP, the OSI application layer protocols etc.

  6. Re:ASN.1 is evil by eigenhead · · Score: 3, Informative

    Its tha devils spawn I tell ya. Its extremely complex and hard to debug.

    Having worked with ASN.1 and CMIP I can certainly state that most examples for ASN.1 data types I've seen (M3100 and that lot) are far too complex (too many CHOICE, ANY values). But I still think ASN.1 and BER/PER are a decent way to efficiently encode data in a platform-independent manner. ASN.1 data types can be really simple or really complex, so blame the designers defining complex types in ASN.1 not the notation itself.

    The whole reason the net has taken off so quickly is the simple, open and clear protocols used. You need to debug your email server? Just telnet in and talk to it! With ASN.1 you need a compiler to make each damn data packet.

    I think it is only fair to state that a lack of good (I mean open and free, of course) ASN.1 decoders/encoders contributes to the lack of widespread adoption of technologies like ASN.1. Not that tools like SNACC are all that bad, but were good tools around in the early days of ASN.1? Certainly CMIP never had good free toolkits.

    The standards bodies play a role here. Making sure you advocate for your standard early on and doing your best to promote good open reference implementations goes a long way towards helping a standard gain widespread adoption.

    I think SNMP is a good example of how ASN.1 can be used effectively. Just because ASN.1 allows for complex types doesn't mean people have to build complex types into their standards/protocols.

    I'm growing tired of the "I've got the world on a String" school of data typing ;->

    Sometimes efficient, compact encoding/decoding is just what the solution calls for, whether it is ASN.1 BER/PER or the OMG IDL using CDR.

  7. Re:bandwidth is cheap by Jeffrey+Baker · · Score: 4, Informative
    at least XML gives a clear description that I can use with a packet sniffer when trying to debug something.

    Translated:

    My debugging tools are inadequate, and my brain is inadequate for improving them.

    You have a powerful, general-purpose computer at your disposal. Why should you care if the protocol can be inspected with the naked eye? Do you use an oscilloscope to pretty-print IP packets? No, you use ethereal! If XML is encoded using ASN.1, then the tools will be modified to decode ASN.1 before showing it to the human. Ethereal already knows about ASN.1 because it uses it to display LDAP traffic. If you don't like ethereal, try Unigone.

    Use your CPU, not your eyeballs!

  8. ASN.1 -- excellent choice by ciurana · · Score: 4, Informative

    Some people in this forum think that ASN.1 is a replacement for XML; others think of it as a "lossy" compression algorithm. ASN.1 is neither. Read the article and learn a bit about ASN.1 before forming an opinion. Most important, ASN.1 has been an interoperability standard for at least 10 years prior to the introduction of XML.

    ASN.1 is a standard interoperability protocol (ISO IS 8824 and 8825) that defines a transfer syntax irrespective of the local system's syntax. In the scenario described in the article, the local syntax is XML and the transfer syntax is ASN.1. ASN.1 is a collection of data values with some meaning associated with them. It doesn't specify how the values are to be encoded. The semantics of those values are left to the application to resolve (i.e. XML). ASN.1 defines only the transfer syntax between systems.

    ASN.1 codes are defined in terms of one or more octets (bytes) joined together in something called an encoding structure. This encoding structure may have values associated with it in terms on bits rather than bytes. An encoding structure has three parts: Identifier, Length, and Contents octets. Id octects are used for specifying primitive or constructor data types. Length octets define the size of the actual content. A boolean is thus represented by a single bit, and digits 0-9 could be BCD encoded. Each encoding structure carries with it it's interpretation.

    An XML document could thus be encoded by converting the tags into a lookup table and a single octect code. If the tags are too many, or too long (i.e. FIRST-NAME) then there are significant savings by replacing the whole tag with an ASN.1 encoded datum. If we assume there are up to 255 different potential tags in the XML document definition, then each could be assigned to a single byte. Thus, encoding the tag <FIRST-NAME> would only take two bytes: One for the ID, one for the length octet, and zero for the contents (the tag ID could carry its own meaning).

    I used to work with OSI networks at IBM. All the traffic was ASN.1-encoded. I personally think this is an excellent idea because ASN.1 parsers are simple and straightforward to implement, fast, their output is architecture independent, and the technology is very stable. Most important, this is a PRESENTATION LAYER protocol, not an APPLICATION LAYER protocol. The semantics of the encoding are left to the XML program. Carefully encoded ASN.1 will preserve the exact format of the original XML document while allowing its fast transmission between two systems.

    http://www.bgbm.fu-berlin.de/TDWG/acc/Documents/as n1gloss.htm has an excellent overview if you're interested.

    Cheers!

    E
    --
    http://eugeneciurana.com | http://ciurana.eu
  9. Those who do not undestand ASN.1 .... by StormyMonday · · Score: 3, Informative

    are condemned to repeat it. Badly.

    I have had to deal with dozens of binary protocols that do the same thing as ASN.1, and do it worse.

    As to comparisons, XML and ASN.1 are designed for different jobs. Designing a Web page in ASN.1 would be ridiculous. Sending (say) telemetry data encoded in XML is equally ridiculous. I can believe that *data* transmissions could be 100 times larger in XML than in ASN.1. You have the header, DTD, some namespace delcarations, and a bunch of nested tags, just to express a couple of numbers.

    Problem is, XML is one of the latest forms of fairy dust that Management has latched onto. "Sprinkle this on your project and it will fly!" So programs have XML grafted onto them anywhere it might fit.

    A particularly cute example is SOAP (Microsoft's firewall-bypass protocol) It's going to be fun to watch people try to squeeze some performance out of a SOAP based system that tries to do something interactive.

    As to the ISO, yeah, they're seriously obnoxious. They tend to go off into their own little world, redefine standard terminology so they're incomprehensible to outsiders, and come up with stuff that can't be implemented. (Nobody uses ASN.1 -- it's unimplementable. When people talk about using ASN.1 for something real, they're talking about a subset. A subset, of course, cannot claim conformance to the standard.) The crowning insult, of course, is that they fund the organization by selling the standards. Hey, it's a standard -- you *have* to buy it!

    "It's all in knowing what wrench to use to pound in the screw."

    --
    Welcome to the Turing Tarpit, where everything is possible but nothing interesting is easy.
  10. Re:Hello, haven't we read Comer's book? by isdnip · · Score: 3, Informative

    I've done real quantitative studies on the topic, and quite frankly you got it wrong. Moore's Law (for CPU power) is far stronger than "Moore's Law for bandwidth". Bandwidth growth has been on the order of 30-40%/year, while CPU power has grown faster than that for at least two decades.

    ASN.1 is well known outside of the IETF fundamentalist crowd. With its PER (packed encoding rules), it is very efficient of bandwidth and not all that CPU intensive either. Nor is it difficult, if used correctly (and anything can be tough if used wrong). It's a simple tag-length-value notation which can recurse. The only reason the Internet doesn't use it more is the usual NIH.