Posted by
Hemos
on from the reduce-reuse-recycle dept.
GFD writes: "The EETimes has a story about how a relatively old protocol for structured information called ASN.1 could be used to compress a 200-byte XML document to 2 bytes and a few bits. I wonder if the same could be done with XHTML or even regular HTML."
ASN.1 not suitable
by
cartman
·
· Score: 5, Informative
ASN.1 is the basis of a great many protocols, LDAP among them. What is not mentioned in the article is that ASN.1 is a binary protocol and is therefore not human-readable. It may save space for bandwidth-constrained applications. However, bandwidth has a tendency to increase over time. When all wireless handhelds have a megabit of bandwidth, we would sorely regret being tied to ASN.1, as LDAP regrets it now.
Not to mention, ASN.1 does not generally reduce the document size by more than 40% compared to XML. Think about it: how much space is really taken by tags?
It's also worth noting that there is lots of documentation surrounding XML. With ASN.1 you have to download the spec from the ITU, which is an INCREDIBLY annoying organization: their specs are barely readable, and they charge money to look at them, despite the fact that they are supposedly an open organization. The IETF and the W3C are actually open organizations; ITU just pretends to be. ITU does whatever it can to restrict the distribution of their specifications.
Re:ASN.1 not suitable
by
pegacat
·
· Score: 5, Informative
This is pretty much right. I do a lot of work on X.500 / LDAP / security, and ASN.1 is used throughout all this. It does a pretty good job, but as the poster points out, the ITU is a completely brain damaged relic of the sort of big company old boys club that used to make standards. It's very difficult to get info out of them. (Once you get it though, it's usually pretty thorough!)
As for the 'compression', well, yes, it sorta would be shorter under many circumstances. ASN.1 uses a pre-defined 'global' schema that everyone is presumed to have. Once (!) you've got that schema, subsequent messages can be very terse. (Without the schema you can still figure out the structure of the data, but you don't know what it's for.) For example, I've seen people try to encode X.509 certificates (which are ASN.1) in XML, and they blow out to many times the size. Since each 'tag equivalent' in ASN.1 is a numeric OID (object identifier), the tags are usually far shorter than their XML equivalents. And ASN.1 is binary, whereas XML has to escape binary sequences (base64?).
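To put rough numbers on the tag-size and base64 points, here is a minimal sketch in Python. The `encode_oid` helper is hypothetical (written for this illustration, not taken from any ASN.1 library) and only handles short-form lengths; the `commonName` element name is likewise just an assumed XML spelling of the same attribute.

```python
import base64

def encode_oid(dotted: str) -> bytes:
    """DER-encode a dotted OID string (tag 0x06, short-form length only)."""
    arcs = [int(a) for a in dotted.split(".")]
    # The first two arcs share one value: 40 * first + second.
    values = [40 * arcs[0] + arcs[1]] + arcs[2:]
    body = bytearray()
    for v in values:
        # Base-128 digits, most significant first; the high bit is set
        # on every byte except the last one of each arc.
        chunk = [v & 0x7F]
        v >>= 7
        while v:
            chunk.append((v & 0x7F) | 0x80)
            v >>= 7
        body.extend(reversed(chunk))
    if len(body) > 127:
        raise ValueError("this sketch handles short-form lengths only")
    return bytes([0x06, len(body)]) + bytes(body)

# 2.5.4.3 is the commonName attribute used in X.509 names.
oid = encode_oid("2.5.4.3")
xml_tags = b"<commonName></commonName>"  # assumed XML equivalent of the same "tag"

# Binary payloads must be escaped for XML; base64 adds about a third.
raw = bytes(range(48))
escaped = base64.b64encode(raw)

print(len(oid), len(xml_tags))   # 5 bytes vs 25 bytes
print(len(raw), len(escaped))    # 48 bytes vs 64 bytes
```

This only compares the identifiers and the binary escaping; it says nothing about how well either format does once a general-purpose compressor is applied on top.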
But yeah, ASN.1 is a pain to read. XML is nice for humans, ASN.1 is nice for computers. Both require an XML parser / ASN.1 compiler though. ASN.1 can be very neat from an OO point of view, 'cause your ASN.1 compiler can create objects from the raw ASN.1 (a bit like a Java serialised object). But I can't see ASN.1 being much chop to compress text documents; there are much better ways of doing that around already (and I thought a lot of that stuff was automatically handled by the transport layer these days?)
And just for the record... the XML people grabbed a bunch of good ideas from ASN.1, which is good, and LDAP's problems are more that they screwed up trying to do a cut-down version of X.500 than that they use ASN.1 :-)!
-- Whoever fights monsters should see to it that in the process he does not become a monster.
Totally misses the point
by
coyote-san
·
· Score: 5, Insightful
This idea totally misses the point.
ASN.1 achieves good compression because the designer must specify every single parameter for all time. The ASN.1 compiler, among other things, then figures out that the "Letterhead, A4, landscape" mode flag should be encoded as something like 4.16.3.23.1.5, which is actually a sequence of bits that can fit into 2 bytes because the ASN.1 grammar knows exactly how few bits are sufficient for every possible case.
In contrast, XML starts with *X* because it's designed to be extensible. The DTDs are not cast in stone, and in fact a well-behaved application should read the DTD for each session and extract only the items of interest. It's not an error if one site decides to extend their DTD locally, provided they don't remove anything.
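A rough sketch of why a fixed schema lets a value like that shrink to a few bits, in Python. This is not real PER, just the underlying idea: if both ends agree on every value a field may ever take, each field needs only enough bits to index its list. The `SCHEMA` contents and field names below are made up for illustration.

```python
# Hypothetical schema: each field lists every value it may ever take.
# Both sides must hold exactly this table, in exactly this order.
SCHEMA = [
    ("stock", ["plain", "letterhead", "envelope", "label"]),
    ("size", ["A3", "A4", "A5", "letter", "legal"]),
    ("orientation", ["portrait", "landscape"]),
]

def field_width(choices):
    """Bits needed to index any entry of a fixed list of choices."""
    return (len(choices) - 1).bit_length()

def pack(record):
    """Pack a record into the fewest whole bytes the schema allows."""
    value, nbits = 0, 0
    for name, choices in SCHEMA:
        width = field_width(choices)
        value = (value << width) | choices.index(record[name])
        nbits += width
    return value.to_bytes((nbits + 7) // 8, "big"), nbits

def unpack(data):
    """Reverse of pack(); meaningless without the same SCHEMA."""
    value = int.from_bytes(data, "big")
    record = {}
    for name, choices in reversed(SCHEMA):
        width = field_width(choices)
        record[name] = choices[value & ((1 << width) - 1)]
        value >>= width
    return record

record = {"stock": "letterhead", "size": "A4", "orientation": "landscape"}
packed, nbits = pack(record)
print(packed.hex(), nbits)   # 13 6  (six bits, carried in one byte)
print(unpack(packed) == record)
```

Appending a sixth paper size is harmless here, but adding a ninth would widen the field and silently change the meaning of every message already packed, which is the "cast in stone" cost described in this comment.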
But if you use ASN.1 compression, you either need to cast those XML DTDs into stone (defeating the main reason for XML in the first place), or compile the DTD into an ASN.1 compiler on the fly (an expensive operation, at least at the moment).
This idea is actually pretty clever if you control both sides of the connection and can ensure that the ASN.1 always matches the DTD, but as a general solution it's the wrong idea at the wrong time.
-- For every complex problem there is an answer that is clear, simple, and wrong. -- H L Mencken
The ASN.1 faithful just don't get it
by
RobertGraham
·
· Score: 5, Insightful
Preface: I've written parsers for ASN.1 (esp. SNMP MIBs, but also generic), BER/DER (same thing), PER, HTML, XML, and while we are at it, XDR and CORBA IDL. I've written a BER decoder that can decode SNMP at gigabit/second speeds.
There are a vast number of differences between ASN.1 and XML. To think that ASN.1 is in any way related to XML demonstrates that they just don't "get it".
1. Why not XDR or just raw binary?
Why not just specify your own binary format for your application? The thing that the ASN.1 bigots don't understand is that in most real-world applications, the ASN.1 formatting provides only overhead but no real-world value. This happens in XML, too, but the value proposition for XML is much clearer. A good example is the H.323 series PER encoding, which is just plain wrong: a well-documented custom encoding would have been tons better.
2. DTD or no DTD
The ASN.1 language is essentially a DTD; it gets encoded in things like BER. The trick is that I can parse "well-formed" XML content without knowing the DTD. This is impossible with current ASN.1 encoding. The idea of DTD-free "well-formed" input and DTD-based "valid" input is at the core of XML. Yes, ASN.1 and XML both format data, but proposing ASN.1 as a valid substitute means you just don't grok what XML is all about.
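The "well-formed without a DTD" point is easy to see with Python's standard `xml.etree.ElementTree` module. The sample document and the `walk` and `is_well_formed` helpers are made up for this sketch; nothing below ever loads a DTD or schema.

```python
import xml.etree.ElementTree as ET

# An arbitrary document; the parser has never seen a DTD for it.
doc = "<order id='42'><item qty='2'>widget</item><note>rush</note></order>"

def walk(elem, depth=0):
    """Return an indented outline of element names, attributes and text."""
    text = (elem.text or "").strip()
    lines = [f"{'  ' * depth}{elem.tag} {elem.attrib} {text!r}"]
    for child in elem:
        lines.extend(walk(child, depth + 1))
    return lines

def is_well_formed(text):
    """True if the parser accepts the text; malformed input is rejected outright."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

root = ET.fromstring(doc)
print("\n".join(walk(root)))
print(is_well_formed("<a><b></a>"))  # mismatched tags are an error, not a guess
```

The names and nesting come out of the document itself, which is what makes a generic parser possible. Whether the data is "valid" for a given application is a separate question that a DTD answers.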
3. Interoperability
The Internet grew up in an environment where parsers were expected to be liberal in what they receive. This was important for early interoperability, but now is a detriment. For example, it is impossible to write an interoperable HTML parser. XML took the radical zen approach of mandating that any parser that accepts malformed input is BAD. As a result, anybody writing a parser knows the input will be well-formed. There is one-and-only-one way to represent input (barring whitespace), so writing parsers is easy. ASN.1 has taken the opposite approach: there are a zillion ways to represent input.
As a result, non-interoperable ASN.1 implementations abound. For example, most SNMP implementations are incompatible. They work only "most" of the time. Go to a standard SNMP MIB repository and you'll find that the same MIB must be published multiple times to handle different ASN.1 compilers.
The long and the short of it is that ASN.1 implementations today are extremely incompatible with each other, whereas XML libraries have proven to be extremely interoperable. Right now, XML has proven to be the MOST interoperable way to format data, and ASN.1 has proven to be the LEAST.
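One concrete case of "many ways to represent the same input" is the BER length field, which may use a short form or a long form with any number of leading zero octets. A minimal Python sketch follows; the function names are made up for this illustration, and indefinite-length encoding is deliberately left out.

```python
def decode_ber_length(data: bytes, offset: int = 0):
    """Decode a BER length field; return (length, offset after the field)."""
    if offset >= len(data):
        raise ValueError("no length octet present")
    first = data[offset]
    if first < 0x80:
        # Short form: the octet is the length itself.
        return first, offset + 1
    count = first & 0x7F
    if count == 0:
        raise ValueError("indefinite length not handled in this sketch")
    end = offset + 1 + count
    if end > len(data):
        raise ValueError("truncated length field")
    # Long form: the next `count` octets hold the length, big-endian.
    return int.from_bytes(data[offset + 1:end], "big"), end

def is_minimal(data: bytes) -> bool:
    """True if the length uses the shortest encoding, as DER requires."""
    length, used = decode_ber_length(data)
    if length < 0x80:
        return used == 1
    return data[1] != 0  # long form must not start with a zero octet

# Three different byte strings, one and the same length value.
for encoded in (b"\x05", b"\x81\x05", b"\x82\x00\x05"):
    print(encoded.hex(), decode_ber_length(encoded)[0], is_minimal(encoded))
```

A decoder that accepts all three forms and a filter that only recognizes the first will disagree about what is on the wire. Checking for the minimal form, as DER does, is one way to close that gap.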
4. Bugs
Most XML parsers have proven to be robust; most ASN.1 parsers have proven to be buggy. You can DoS a lot of devices today by carefully crafting malformed SNMP BER packets.
5. Security
You can leverage ASN.1's multiple encodings to hack. For example, my SideStep program shows how to play with SNMP and evade network intrusion detection systems:
http://robertgraham.com/tmp/sidestep.html
At the same time, ASN.1 parsers are riddled with buffer-overflows.
Anyway, sorry for ranting. I think XML advocates are a little overzealous (watch your possessions carefully or some XMLite will come along and encode them), but ASN.1 is just plain wrong. The rumor is that somebody threw it together as a sample to point out problems, but it was accidentally standardized. It is riddled with problems; it should be abandoned. An encoding system is rarely needed, but if you need one, pick XDR for gosh sakes.