Is Dedicated Hosting for Critical DTDs Necessary?
pcause asks: "Recently there was a glitch when someone at Netscape took down a page that had an important DTD (for RSS), used by many applications and services. This got me thinking that many or all of the important DTDs that software and commerce depend on are hosted at various commercial entities. Is this a sane way to build an XML-based Internet infrastructure? Companies come and go all of the time; this means that the storage and availability of those DTDs is in constant jeopardy. It strikes me that we need an infrastructure akin to the root server structure to hold the key DTDs that are used throughout the industry. What organization would be the likely custodian of such data, and what would be the best way to ensure such an infrastructure stays funded?"
Exactly. How about hosting these important files via a decentralized BitTorrent tracker?
Of course, that would eliminate the use of a UNIFORM RESOURCE LOCATOR, since the file would no longer live at one central location.
There needs to be a way to refer to decentralized internet resources in a unique fashion. We need the equivalent of the URL for a file that is hosted simultaneously in many places.
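One way to picture such a location-independent name is to derive it from the file's contents rather than from where it lives. The sketch below is only an illustration: the `sha256:` prefix, the `content_id` and `fetch_by_id` helpers, and the idea of mirrors as plain callables are all assumptions made for the example, not an existing scheme from this thread.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Name a file by the SHA-256 digest of its bytes (hypothetical scheme)."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def fetch_by_id(wanted_id, mirrors):
    """Try each mirror in turn and return the first copy whose hash matches.

    `mirrors` maps a mirror name to a zero-argument callable returning bytes.
    A mirror that is down (raises OSError) or serves altered bytes is skipped.
    """
    for name, fetch in mirrors.items():
        try:
            data = fetch()
        except OSError:
            continue  # mirror unreachable, try the next one
        if content_id(data) == wanted_id:
            return data
    raise LookupError("no mirror served a copy matching " + wanted_id)
```

Because the name is the hash, it does not matter which host answers: any copy that matches is the right file, and a tampered copy simply fails the comparison.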
What if you're using a third-party library that has references to the DTD, schema, or whatever? You don't really want to go through and change all of them.
What if XML files, for instance, are being exchanged between your application and others, and they include a DTD that doesn't reside within your domain?
I'm sure there are other scenarios as well.
Most tools provide a way to refer to a DTD by a public URL, yet use a local copy instead (e.g., the taglib-location directive in Java).
Doing anything else strikes me as fundamentally dangerous and insecure: it turns a remote DNS vulnerability into an easy application DoS (or worse).
TODO: 753) write sig.
but just have your DTD as a W3C standard, distribute copies with your software, and don't bother a remote server until a new version of the DTD is released. Then distribute it with a new version of your software.
Seriously, what the fuck were they thinking relying on a server to be always available?
Hail Eris, full of mischief...
E pluribus sanguinem
What about a distributed file system that works like DNS? Hierarchical servers that are each responsible for a different level of the DTD. The "Root" is a trusted group of servers, which maintain a list of other servers where you can get a copy of the rest of the DTD. Then plugin builders and other sub-entities can have their own server for extensions to the base DTD.
Unfortunately, the DNS method has proven not to be the best way, given the poisoning and other attacks that can occur. Of course, it was designed in the days when they didn't just let anyone on the internet. But if you are paranoid, you can always diff your copy all the way back to the publisher, or have a signing server publish a signature or an MD5-style checksum of the DTD.
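To make the delegation idea concrete, here is a toy in-memory sketch. Everything in it is hypothetical: the server names, the slash-separated DTD names, and the `resolve` helper are invented for illustration, and nothing here touches a network or addresses the poisoning problem.

```python
# Each "server" holds some DTDs itself and delegates name prefixes to others.
SERVERS = {
    "root": {"delegates": {"rss": "rss-authority"}, "dtds": {}},
    "rss-authority": {
        "delegates": {"rss/extensions": "plugin-vendor"},
        "dtds": {"rss/core": b"<!ELEMENT rss (channel)>"},
    },
    "plugin-vendor": {
        "delegates": {},
        "dtds": {"rss/extensions/podcast": b"<!ELEMENT enclosure EMPTY>"},
    },
}


def resolve(name, servers, start="root"):
    """Follow delegations from the root until some server holds `name`.

    Returns (dtd_bytes, list_of_servers_visited).
    """
    server, visited = start, []
    while True:
        visited.append(server)
        record = servers[server]
        if name in record["dtds"]:
            return record["dtds"][name], visited
        # Pick the longest delegated prefix that covers the requested name.
        matches = [p for p in record["delegates"]
                   if name == p or name.startswith(p + "/")]
        if not matches:
            raise LookupError("no server is responsible for " + name)
        server = record["delegates"][max(matches, key=len)]
        if server in visited:
            raise LookupError("delegation loop while resolving " + name)
```

Resolving `rss/extensions/podcast` walks root, then the RSS authority, then the plugin vendor, much as a DNS lookup walks down from the root zone.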
Cool! Amazing Toys.
I recently (within the last year) deployed an application that end users use for downloading and viewing custom content. They are intended to install the app onto laptops, tablets, and other portable devices, allowing them to view said content both online and offline.
When prototyping our "offline mode", we ran into this exact same problem because the XML APIs we used wanted to validate XML against online DTDs. We amended the validator's resolver to use locally embedded or cached DTDs for all our doctypes; problem solved.
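A resolver along those lines might look roughly like the following. This is a sketch using Python's standard `xml.sax` interfaces, not the poster's actual code; the `LocalDtdResolver` name, the example.org URL, and the public ID are made up for the example.

```python
import io
from xml.sax.handler import EntityResolver
from xml.sax.xmlreader import InputSource


class LocalDtdResolver(EntityResolver):
    """Serve DTDs from bundled copies and never go to the network."""

    def __init__(self, local_copies):
        # Maps a public ID or system ID (URL) to the DTD bytes shipped
        # with the application.
        self._local = dict(local_copies)

    def resolveEntity(self, publicId, systemId):
        for key in (publicId, systemId):
            if key in self._local:
                source = InputSource(systemId)
                source.setPublicId(publicId)
                source.setByteStream(io.BytesIO(self._local[key]))
                return source
        # Fail loudly instead of silently fetching a remote file.
        raise LookupError("no local copy for %r / %r" % (publicId, systemId))


# Hypothetical identifiers, for illustration only.
resolver = LocalDtdResolver({
    "-//Example//DTD Sample 1.0//EN": b"<!ELEMENT sample (#PCDATA)>",
    "http://example.org/dtds/sample.dtd": b"<!ELEMENT sample (#PCDATA)>",
})
```

A SAX reader accepts such an object through `setEntityResolver`; whether it is actually consulted for the external DTD depends on the parser and its feature settings, so that part needs checking against whatever XML stack is in use.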
In my app it was an obvious problem to solve because offline usage was a big scenario, but I could imagine that being "out of scope" for a less-than-robust website.
<tag
</tag>data</tag> - A closing tag being used for an opening tag
<tag>data<tag> - The opposite problem - two opening tags!
<tag attribute="data" attribute="data2" attribute="data3"> - The same attribute appearing multiple times because they wanted to send multiple values. God forbid they create child nodes!
and other fun things. Imagine my frustration when I was told, "The customer is always right; if that is what they are sending, then find a way to handle it..." Different XML events had special processing to turn the invalid XML into a well-formed document so that it could be parsed. Ugh.
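For what it's worth, any conforming parser rejects all of the examples above outright, which is why the pre-processing was needed. A quick check with Python's standard library (the `well_formedness_error` helper is just a name made up for this sketch):

```python
import xml.etree.ElementTree as ET


def well_formedness_error(text):
    """Return the parser's complaint, or None if the text is well-formed."""
    try:
        ET.fromstring(text)
    except ET.ParseError as exc:
        return str(exc)
    return None


samples = [
    "</tag>data</tag>",                                   # closing tag used as opener
    "<tag>data<tag>",                                     # two opening tags
    '<tag attribute="data" attribute="data2">x</tag>',    # repeated attribute
    "<tag><value>data</value><value>data2</value></tag>", # child nodes instead
]
for sample in samples:
    print(sample, "->", well_formedness_error(sample) or "well-formed")
```

Only the last sample, which uses child nodes for the multiple values, parses cleanly.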
AmeriQuest had farmed all the work out to an Indian outsourcing firm. You get what you pay for...
Love sees no species.
Exactly. What always struck me is that certain applications do a DTD-conformant XSLT processing step _every_time_ a web page is checked. That means my web app depends on that location on the internet being reachable (proxies!! downtime!! all that yummy goodness!!), plus the unacceptable overhead. But... they merrily keep on making XSLT processors that _will_not_run_ without access to the DTD (I'm looking at you, Java!).
Religion is what happens when nature strikes and groupthink goes wrong.