DTD vs. XML Schema
AShocka writes "The W3C XML Schema Working Group has released the first public Working Draft of Requirements for XML
Schema 1.1. Schemas are technology for specifying and constraining
the structure of XML documents. The draft adds functionality and
clarifies the XML Schema Recommendation Part 1 and Part 2. The XML Schema Valid FAQ
highlights development issues and resources for using XML Schema. This article at webmasterbase.com addresses the
XML DTDs Vs XML Schema issue.
Also see the W3C Conversion Tool from DTD to XML Schema
and other XML Schema/DTD Editors."
I think James Clark's "RELAX NG and W3C XML Schema" is the best description (if slightly biased ;--) of the relative strengths of the two technologies. Note that James Clark also just released a new version of Trang, a tool that does conversions between RELAX NG, W3C XML Schemas and DTDs.
Look, that's why there's rules, understand? So that you think before you break 'em. (Terry Pratchett)
This is a misunderstanding of the way schema validation is supposed to work. Schemas have what are called "location hints", which should be used only when you have never before encountered a particular namespace. The key word, however, is "hints": you should never have to obtain a schema remotely if you don't need to.
..."master" XSD schema... you never ever have to get it remotely - the parser should be implementing it already...
In most cases, if you are doing schema validation, you already know what schema to expect, so it should be not only available locally but also cached in memory...
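To make the "hints, not requirements" point concrete, here is a minimal sketch using only Python's standard library. It reads the `xsi:schemaLocation` attribute (a whitespace-separated list of namespace/location pairs) and prefers a local copy of each schema, falling back to the document's hint only for namespaces it has never seen. The catalog `LOCAL_SCHEMAS`, the `urn:example:*` namespaces and the `example.com` URLs are hypothetical; a real validator (for example lxml's `XMLSchema` class) would then load each local file once and keep the parsed schema in memory.

```python
import xml.etree.ElementTree as ET

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"

# Hypothetical local catalog: namespace URI -> schema file already on disk.
LOCAL_SCHEMAS = {
    "urn:example:orders": "schemas/orders.xsd",
}


def schema_location_hints(xml_text):
    """Return {namespace: hinted location} taken from xsi:schemaLocation."""
    root = ET.fromstring(xml_text)
    # ElementTree stores namespaced attributes as "{namespace}localname".
    raw = root.get(f"{{{XSI_NS}}}schemaLocation", "")
    tokens = raw.split()
    # The attribute value is a list of (namespace, location) pairs.
    return dict(zip(tokens[0::2], tokens[1::2]))


def resolve_schemas(xml_text, catalog=LOCAL_SCHEMAS):
    """Prefer known local schemas; use the document's location only as a fallback hint."""
    return {
        ns: catalog.get(ns, hint)
        for ns, hint in schema_location_hints(xml_text).items()
    }


doc = (
    '<o:order xmlns:o="urn:example:orders" '
    'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
    'xsi:schemaLocation="urn:example:orders http://example.com/orders.xsd '
    'urn:example:other http://example.com/other.xsd"/>'
)
print(resolve_schemas(doc))
```

Only the unknown `urn:example:other` namespace would ever send the validator to the network; the namespace you expected resolves to the local file.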
As for the
In my experience, many of XML's benefits show up in the presentation layers of application architectures, where syndicated data can be repurposed at will. Here are a few examples:
Effective use of XML and XSLT lets you easily aggregate data from one or more sources and "repurpose" it for an endless variety of business and technical goals.
One of the main benefits of XML is that it offers an effective, textual representation of "structured data" that can be conveniently accessed and manipulated using a slew of related standards such as XPath, DOM, XSLT and namespaces.
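As a rough illustration of aggregating several sources and querying the result, here is a sketch with Python's standard `xml.etree.ElementTree`. Python's standard library has no XSLT processor, so plain DOM-style tree manipulation is swapped in for the XSLT step; the two feeds and their `<item>/<title>/<price>` structure are hypothetical.

```python
import xml.etree.ElementTree as ET

# Two hypothetical feeds that share the same simple structure.
FEED_A = """<items>
  <item><title>Alpha</title><price>10</price></item>
  <item><title>Beta</title><price>25</price></item>
</items>"""
FEED_B = """<items>
  <item><title>Gamma</title><price>5</price></item>
</items>"""


def aggregate(*sources):
    """Merge the <item> elements of several XML sources into one <catalog> tree."""
    combined = ET.Element("catalog")
    for text in sources:
        # findall() accepts a limited XPath subset; "./item" selects direct children.
        combined.extend(ET.fromstring(text).findall("./item"))
    return combined


def titles_under(tree, max_price):
    """Repurpose the merged data: titles of items at or below a price."""
    # ElementTree's XPath subset has no numeric comparisons, so filter in Python.
    return [
        item.findtext("title")
        for item in tree.iter("item")
        if float(item.findtext("price")) <= max_price
    ]


catalog = aggregate(FEED_A, FEED_B)
print(titles_under(catalog, 20))  # ['Alpha', 'Gamma']
```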
There certainly is a "vs." involved. There are many good reasons to choose DTDs for a given validation requirement rather than W3C XML Schemas. I address some of those in an IBM developerWorks article:
Comparing W3C XML Schemas and Document Type Definitions (DTDs)
This is a bit old, but still correct. Not a lot has changed in either spec.
I am currently working on a series of articles on RELAX NG. In most ways, I think RELAX NG really is the best of all worlds. It is more powerful than W3C XML Schemas, while being a natural extension of the semantics of DTDs. Moreover, if you choose to use the compact syntax (non-XML), you get something very easy to read and edit by hand.
David...
Buy Text Processing in Python
That's funny, I just looked at the man page for gzip.
Gzip uses the Lempel-Ziv algorithm used in zip and PKZIP. The amount of compression obtained depends on the size of the input and the distribution of common substrings. Typically, text such as source code or English is reduced by 60-70%. Compression is generally much better than that achieved by LZW (as used in compress), Huffman coding (as used in pack), or adaptive Huffman coding (compact).
Mind you, XML is highly repetitive in its tag use on long documents. "Long" as in many records, not necessarily a large byte count.
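A quick way to see the "many records" effect is to gzip synthetic documents of growing length with Python's standard `gzip` module (the same DEFLATE algorithm the gzip tool uses). The record layout below is made up for illustration; the point is only that the saved fraction tends to rise as the same tags repeat more often.

```python
import gzip


def xml_records(n):
    """Build a hypothetical XML document with n records sharing one structure."""
    rows = "".join(
        f"<record><id>{i}</id><name>user{i}</name><status>active</status></record>\n"
        for i in range(n)
    )
    return f"<records>\n{rows}</records>\n".encode("utf-8")


def savings(data):
    """Fraction of bytes removed by gzip compression (header included)."""
    return 1 - len(gzip.compress(data)) / len(data)


for n in (10, 100, 1000):
    doc = xml_records(n)
    print(f"{n:5d} records: {len(doc):6d} bytes, {savings(doc):.1%} saved")
```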
Now let's take a larger file since, after all, even modem users can download 5k of HTML really quickly. I took the SOAP distribution from Apache (or was it Sun?), concatenated all the XML files in it, and got a 22k XML file. Not huge, but big enough for this example.
Here's my findings:
[caligraphy:~] spencerp% ls -al o.xml
-rw-r--r-- 1 spencerp staff 22118 Jan 23 21:21 o.xml
[caligraphy:~] spencerp% gzip o.xml
[caligraphy:~] spencerp% ls -al o.xml.gz
-rw-r--r-- 1 spencerp staff 3021 Jan 23 21:21 o.xml.gz
[caligraphy:~] spencerp% gzip -l o.xml.gz
compressed uncompr. ratio uncompressed_name
3021 22118 86.4% o.xml
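The ratio column can be checked by hand from the two sizes in the listing. This small sketch computes the space savings (the share of the original bytes removed). It gives 86.3% rather than the 86.4% printed above; as far as I know, `gzip -l` leaves the gzip header and trailer (which include the stored file name) out of the compressed size, so treat the last digit as approximate.

```python
def space_savings(compressed, uncompressed):
    """Percent of the original size removed by compression."""
    return 100 * (1 - compressed / uncompressed)


# Sizes taken from the ls/gzip listing above.
print(f"{space_savings(3021, 22118):.1f}%")  # 86.3%
```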
Not bad for non-repetitive text using a mix of unrelated XML schemas: 86.4%. Now imagine a larger file with a consistent schema; compression goes even higher. Granted, it will be slightly larger than a binary format. But even a 100-megabyte file can be moved across a 100-megabit network in well under 5 minutes. And THAT is a lot of data.
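The transfer-time claim is simple arithmetic: bytes times 8 gives bits, divided by the link rate. The sketch below assumes "100meg" means 100 decimal megabytes and a 100 Mbit/s link; the `efficiency` parameter is a hypothetical knob for protocol overhead and real-world throughput, not a measured figure.

```python
def transfer_seconds(size_bytes, link_bits_per_second, efficiency=1.0):
    """Idealized transfer time; efficiency < 1 roughly models overhead (an assumption)."""
    return size_bytes * 8 / (link_bits_per_second * efficiency)


size = 100 * 10**6  # "100meg", taken as 100 decimal megabytes
link = 100 * 10**6  # 100 Mbit/s
print(transfer_seconds(size, link))       # 8.0 seconds at full line rate
print(transfer_seconds(size, link, 0.5))  # 16.0 seconds at 50% efficiency
```

Even with generous allowances for overhead, the result stays far below the 5-minute figure.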
Btw, there is a fallacy in your math. If I get 50% compression on an XML file whose data could have been stored in a binary format, that doesn't mean the binary format would be 49 times smaller.
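Converting a compression percentage into an "N times smaller" factor shows why the numbers don't scale linearly: the factor is 1 / (1 - savings). At 50% savings the compressed file is only half the size, i.e. 2 times smaller; you need 98% savings to get 50 times smaller. The function name below is just for illustration.

```python
def times_smaller(savings_pct):
    """Convert a space-savings percentage into an 'N times smaller' factor."""
    if not 0 <= savings_pct < 100:
        raise ValueError("savings must be in [0, 100)")
    return 1 / (1 - savings_pct / 100)


print(times_smaller(50))    # 2.0, i.e. half the size
print(times_smaller(86.4))  # roughly 7.4
print(times_smaller(98))    # roughly 50
```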
-
ping -f 255.255.255.255 # if only