Posted by
Hemos
on from the reduce-reuse-recycle dept.
GFD writes: "The EETimes has a story about how a relatively old protocol for structured information called ASN.1 could be used to compress a 200-byte XML document to 2 bytes and a few bits. I wonder if the same could be done with XHTML or even regular HTML."
Ever heard of mod_gzip? It compresses anything that goes through your Apache web server, and it is supported by most browsers. With everything running over HTTP these days, this is the way to go...
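For anyone who wants to see what gzip does to markup without touching an Apache config, here is a minimal sketch using Python's standard gzip module rather than mod_gzip itself. The sample document is made up for illustration and is not taken from the article; real-world ratios depend entirely on the content.

```python
import gzip

# Hypothetical sample: a small, repetitive XML document (not from the article).
record = "<person><first-name>Ada</first-name><last-name>Lovelace</last-name></person>"
xml_doc = ("<people>" + record * 50 + "</people>").encode("utf-8")

def gzip_roundtrip(payload):
    """Compress payload with gzip, then decompress it again."""
    compressed = gzip.compress(payload)
    restored = gzip.decompress(compressed)
    return compressed, restored

compressed, restored = gzip_roundtrip(xml_doc)
print(len(xml_doc), "bytes of XML ->", len(compressed), "bytes gzipped")
```

The round trip is lossless, and repetitive tags are exactly the kind of input that generic compression handles well.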
ASN.1 not suitable
by cartman · Score: 5, Informative
ASN.1 is the basis of a great many protocols, LDAP among them. What is not mentioned in the article is that ASN.1 is a binary protocol and is therefore not human-readable. It may save space for bandwidth-constrained applications. However, bandwidth has a tendency to increase over time. When all wireless handhelds have a megabit of bandwidth, we would sorely regret being tied to ASN.1, as LDAP regrets it now.
Not to mention, ASN.1 does not generally reduce the document size by more than 40% compared to XML. Think about it: how much space is really taken by tags?
It's also worth noting that there is lots of documentation surrounding XML. With ASN.1 you have to download the spec from the ITU, which is an INCREDIBLY annoying organization: their specs are barely readable, and they charge money to look at them, despite supposedly being an open organization. The IETF and the W3C are actually open organizations; the ITU just pretends to be. The ITU does whatever it can to restrict the distribution of its specifications.
Re:ASN.1 not suitable
by pegacat · Score: 5, Informative
This is pretty much right. I do a lot of work on X.500 / LDAP / security, and ASN.1 is used throughout all this. It does a pretty good job, but as the poster points out, the ITU is a completely brain-damaged relic of the sort of big-company old boys' club that used to make standards. It's very difficult to get info out of them. (Once you get it, though, it's usually pretty thorough!)
As for the 'compression', well, yes, it sorta would be shorter under many circumstances. ASN.1 uses a pre-defined 'global' schema that everyone is presumed to have. Once (!) you've got that schema, subsequent messages can be very terse. (Without the schema you can still figure out the structure of the data, but you don't know what it's for.) For example, I've seen people try to encode X.509 certificates (which are ASN.1) in XML, and they blow out to many times the size. Since each 'tag equivalent' in ASN.1 is a numeric OID (object identifier), the tags are usually far shorter than their XML equivalents. And ASN.1 is binary, whereas XML has to escape binary sequences (base64?).
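To put a number on the escaping cost, here is a rough sketch that assumes base64 really is the escaping in use (the post only guesses at that). The 300-byte blob is an arbitrary stand-in for something like a key or certificate field, not real data.

```python
import base64

def base64_overhead(raw):
    """Ratio of base64-encoded length to raw length."""
    return len(base64.b64encode(raw)) / len(raw)

# Arbitrary 300-byte binary blob, purely for illustration.
blob = bytes(range(256)) + bytes(44)
encoded = base64.b64encode(blob)

print(len(blob), "raw bytes ->", len(encoded), "base64 characters")
```

Base64 emits 4 characters for every 3 input bytes, so binary payloads grow by about a third before any XML tags are wrapped around them.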
But yeah, ASN.1 is a pain to read. XML is nice for humans, ASN.1 is nice for computers. Both require a tool, though: an XML parser or an ASN.1 compiler. ASN.1 can be very neat from an OO point of view, 'cause your ASN.1 compiler can create objects from the raw ASN.1 (a bit like a Java serialised object). But I can't see ASN.1 being much chop for compressing text documents; there are much better ways of doing that around already (and I thought a lot of that stuff was automatically handled by the transport layer these days?).
And just for the record... the XML people grabbed a bunch of good ideas from ASN.1, which is good, and LDAP's problems are more that they screwed up trying to do a cut-down version of X.500 than that they use ASN.1 :-)!
-- He who fights with monsters should take care that he does not become a monster in the process.
ASN.1 was designed to be efficient
by Anonymous Coward · Score: 4, Informative
If I remember the history right, ASN.1 was designed during the era of X.25 and charging for every 64 byte packet.
I used to use ASN.1 for remote communications in a commercial product, but later changed it to a hybrid of CORBA and XML, mostly because more modern technologies were available; since the actual bandwidth did not cost that much anymore, it did not make sense to keep an old protocol alive.
ASN.1 has its drawbacks too: 8 different ways to encode a floating point number. The reason was political: everyone involved wanted their own floating point format included, and as a net result, everyone has to be able to decode 8 different formats.
An encoding designed by a committee (a stone-age telecom committee, as a matter of fact).
Re:bandwidth is cheap
by Jeffrey Baker · Score: 4, Informative
at least XML gives a clear description that I can use with a packet sniffer when trying to debug something.
Translated:
My debugging tools are inadequate, and my brain is inadequate for improving them.
You have a powerful, general-purpose computer at your disposal. Why should you care if the protocol can be inspected with the naked eye? Do you use an oscilloscope to pretty-print IP packets? No, you use ethereal! If XML is encoded using ASN.1, then the tools will be modified to decode ASN.1 before showing it to the human. Ethereal already knows about ASN.1 because it uses it to display LDAP traffic. If you don't like ethereal, try Unigone.
Some people in this forum think that ASN.1 is a replacement for XML; others think of it as a "lossy" compression algorithm. ASN.1 is neither. Read the article and learn a bit about ASN.1 before forming an opinion. Most important, ASN.1 had been an interoperability standard for at least 10 years before XML was introduced.
ASN.1 is a standard interoperability protocol (ISO 8824 for the notation, ISO 8825 for its encoding rules) that defines a transfer syntax irrespective of the local system's syntax. In the scenario described in the article, the local syntax is XML and the transfer syntax is ASN.1. An ASN.1 description is a collection of data values with some meaning associated with them. The notation by itself doesn't specify how the values are to be encoded; that is the job of the encoding rules. The semantics of those values are left to the application to resolve (here, XML). ASN.1 defines only the transfer syntax between systems.
ASN.1 encodings are defined in terms of one or more octets (bytes) joined together in something called an encoding structure. This encoding structure may have values associated with it in terms of bits rather than bytes. An encoding structure has three parts: Identifier, Length, and Contents octets. Identifier octets specify whether the data type is primitive or constructed. Length octets define the size of the actual content. Depending on the encoding rules, a boolean can thus be represented by as little as a single bit, and digits 0-9 could be BCD encoded. Each encoding structure carries its interpretation with it.
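As a concrete illustration of the Identifier / Length / Contents layout, here is a small sketch that assumes the Basic Encoding Rules (BER), which is only one of several ASN.1 encoding rule sets. It handles only low tag numbers (below 31) and short-form lengths (below 128), and the function names are mine. Under BER a BOOLEAN takes three octets; the bit-level packing mentioned above belongs to the packed encoding rules, which this sketch does not cover.

```python
# Simplified BER helpers: low tag numbers (< 31) and short-form lengths (< 128) only.

TAG_CLASSES = ("universal", "application", "context-specific", "private")

def parse_identifier(octet):
    """Split one identifier octet into (class, constructed?, tag number)."""
    tag_class = TAG_CLASSES[octet >> 6]   # top two bits select the class
    constructed = bool(octet & 0x20)      # bit 6: primitive (0) or constructed (1)
    tag_number = octet & 0x1F             # low five bits hold the tag number
    return tag_class, constructed, tag_number

def encode_tlv(identifier, contents):
    """Join identifier octet, length octet, and contents octets."""
    if len(contents) > 127:
        raise ValueError("long-form lengths are not handled in this sketch")
    return bytes([identifier, len(contents)]) + contents

def encode_boolean(value):
    # Universal tag 1, primitive; one contents octet (0x00 is FALSE, non-zero is TRUE).
    return encode_tlv(0x01, b"\xff" if value else b"\x00")

print(encode_boolean(True).hex())   # 0101ff
print(parse_identifier(0x30))       # ('universal', True, 16), the SEQUENCE type
```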
An XML document could thus be encoded by converting the tags into a lookup table and a single-octet code. If the tags are too many, or too long (e.g. FIRST-NAME), then there are significant savings in replacing the whole tag with an ASN.1-encoded datum. If we assume there are up to 255 different potential tags in the XML document definition, then each could be assigned to a single byte. Thus, encoding the tag <FIRST-NAME> would take only two bytes: one for the ID octet, one for the length octet, and none for the contents (the tag ID could carry its own meaning).
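The lookup-table idea can be sketched in a few lines. This is a deliberately simplified scheme, not conformant BER: the table contents, the one-byte IDs, and the function names are all assumptions made up for illustration, and both ends would need to share the same table.

```python
# Hypothetical lookup table shared by sender and receiver (not from the article).
TAG_IDS = {"FIRST-NAME": 0x01, "LAST-NAME": 0x02}
TAG_NAMES = {code: name for name, code in TAG_IDS.items()}

def encode_tag(name, contents=b""):
    """Return ID octet + length octet + contents for a known tag name."""
    if len(contents) > 255:
        raise ValueError("a single length octet cannot describe more than 255 bytes")
    return bytes([TAG_IDS[name], len(contents)]) + contents

def decode_tag(data):
    """Inverse of encode_tag for a buffer holding exactly one structure."""
    code, length = data[0], data[1]
    return TAG_NAMES[code], data[2:2 + length]

empty = encode_tag("FIRST-NAME")
print(len("<FIRST-NAME>"), "bytes as text,", len(empty), "bytes encoded")
```

With an empty contents field the tag costs two bytes against twelve for the literal `<FIRST-NAME>`, which is the saving described above; the price is that the bytes mean nothing without the table.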
I used to work with OSI networks at IBM. All the traffic was ASN.1-encoded. I personally think this is an excellent idea because ASN.1 parsers are simple and straightforward to implement, fast, their output is architecture independent, and the technology is very stable. Most important, this is a PRESENTATION LAYER protocol, not an APPLICATION LAYER protocol. The semantics of the encoding are left to the XML program. Carefully encoded ASN.1 will preserve the exact format of the original XML document while allowing its fast transmission between two systems.
http://www.bgbm.fu-berlin.de/TDWG/acc/Documents/asn1gloss.htm has an excellent overview if you're interested.
Cheers!
E
http://eugeneciurana.com | http://ciurana.eu