Is Dedicated Hosting for Critical DTDs Necessary?
pcause asks: "Recently there was a glitch, when someone at Netscape took down a page that had an important DTD (for RSS), used by many applications and services. This got me thinking that many or all of the important DTDs that software and commerce depend on are hosted at various commercial entities. Is this a sane way to build an XML-based Internet infrastructure? Companies come and go all of the time; this means that the storage and availability of those DTDs is in constant jeopardy. It strikes me that we need an infrastructure akin to the root server structure to hold the key DTDs that are used throughout the industry. What organization would be the likely custodian of such data, and what would be the best way to ensure such an infrastructure stays funded?"
ICANN!
Mhahahahaha. Yeah. I know, I crack myself up.
Deleted
Nothing too insightful to write, but worth saying in today's volatile political climate. Centralization makes me nervous.
Regards.
w3c.org. There's no better place to keep the standards related to the web.
ilex paraguariensis for all
ICANN do it!
and DTD stands for? Distributed Technical Dependency?
Libertarian Leaning Political Discussion Forum.
...keep a copy, host it on your own site and reference that instead. There was no problem except that some were using that file to download the definitions. Or just expand the definition to include a checksum and a list of mirrors. Is this even a problem worth solving? I mean except for the slashdot post it seemed to me like this went by without anyone noticing.
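For what it's worth, the "checksum and a list of mirrors" idea could look something like the rough Python sketch below. The function name `fetch_verified` and the injected `fetch` callable are assumptions made up for illustration (nothing in the RSS spec defines them); the point is only that a client can accept a copy from any mirror as long as the hash matches.

```python
import hashlib
from typing import Callable, Iterable


def fetch_verified(mirrors: Iterable[str],
                   expected_sha256: str,
                   fetch: Callable[[str], bytes]) -> bytes:
    """Return the first mirror's content whose SHA-256 matches, else raise.

    `fetch` is a hypothetical callable (URL -> bytes) supplied by the caller,
    so this sketch does not commit to any particular HTTP library.
    """
    for url in mirrors:
        try:
            data = fetch(url)
        except OSError:
            continue  # mirror unreachable, try the next one
        if hashlib.sha256(data).hexdigest() == expected_sha256:
            return data
        # Hash mismatch: treat the copy as stale or tampered and move on.
    raise LookupError("no mirror returned a copy matching the checksum")
```

With this shape, it stops mattering much who hosts the file, since a mirror that serves a modified copy is simply skipped.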
Live today, because you never know what tomorrow brings
"This got me thinking that many or all of the important DTDs that software and commerce depend on are hosted at various commercial entities. "
Install them into every commercial and consumer router.
Such a system should also allow stable storage and management of ontology definitions, used within the semantic web.
I would suggest someone like OSTG or the Mozilla foundation...
You sound like a PHB who thinks to himself, "XML is a buzzword, I'll bet it'll get the job done."
Well, I wouldn't call it sane if anybody who is actively using XML and needs a DTD isn't hosting it right along with whatever web site they're using the XML for. Relying on somebody else to maintain a critical DTD that you use isn't sane. It's pretty dumb.
I don't respond to AC's.
You shouldn't be using DTDs any more. Validation is better achieved with RelaxNG, and you shouldn't use them for entity references because then non-validating parsers won't be able to handle your code.
For those document types that already use DTDs, either you ship the DTDs with your application, or you cache them the first time you parse a document of that type.
The Netscape DTD issue was caused not by the DTD being unavailable, but by some client applications not being sufficiently robust. You shouldn't be looking at the hosting to solve the problem.
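A minimal sketch of the "cache them the first time you parse" option, in Python. The name `cached_dtd`, the cache directory layout, and the `fetch` callable are all hypothetical; a real client would also want some policy for refreshing or pre-seeding the cache.

```python
from pathlib import Path
from typing import Callable
from urllib.parse import quote


def cached_dtd(system_id: str, cache_dir: Path,
               fetch: Callable[[str], bytes]) -> bytes:
    """Return the DTD for system_id, calling fetch only on a cache miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Percent-encode the whole identifier so it is safe as a single file name.
    cache_file = cache_dir / quote(system_id, safe="")
    if cache_file.exists():
        return cache_file.read_bytes()
    data = fetch(system_id)  # only reached the first time this DTD is seen
    cache_file.write_bytes(data)
    return data
```

After the first successful fetch, the remote host can disappear and the application keeps working from the cached copy.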
Bogtha Bogtha Bogtha
This is just not an issue worth solving...
This can help: http://en.wikipedia.org/wiki/Document_Type_Definition
If the absence of these files will break your app or service, then you need to make your app or service more robust.
Sure, DTD files are necessary for development. If your app requires that they be used to validate something in real time each time it comes in from a client or whatever, then use an internal copy of the version of the DTD file that you support. If the host makes a change to it (or drops it, or lets it get hacked), your app won't break, and you can decide when you will implement and support that change.
I really don't see what is gained by making the real time operation of your application dependent on the availability and pristinity of remotely and independently hosted files. It just makes you fragile, and you can get all the benefits you need from just checking the files during your maintenance and development cycles.
The only other language I know of that even allows file sourcing over HTTP is PHP, and there it's a gaping security hole that defaults to off. In everything else, the dependencies *get installed to the local file system*.
NTP.org maintains a pool of public NTP servers that are accessible via the hostname "pool.ntp.org", so perhaps something similar would work for a global DTD repository. An industry organization with a vested interest (the W3C seems like the most logical) could maintain the DNS zone, and organizations could volunteer some server space and bandwidth to host a mirror of the collected pool of DTDs. Volunteering organizations might come and go, but when that happens it's just a matter of updating the DNS zone to reflect the change, and everyone using DTDs just needs to know that a single generic hostname will always provide a copy of the required DTD.
Just a thought...
UNIX? They're not even circumcised! Savages!
Most tools provide a way to refer to a DTD on a public URL, yet use the local copy instead (e.g., the taglib-location directive in Java).
Doing anything else strikes me as fundamentally dangerous and insecure: it makes a remote dns vulnerability into an easy application DoS (or worse).
TODO: 753) write sig.
but just have your DTD as a W3C standard, distribute copies with your software, and don't bother a remote server until a new version of the DTD is released. Then distribute it with a new version of your software.
Seriously, what the fuck were they thinking relying on a server to be always available?
Hail Eris, full of mischief...
E pluribus sanguinem
A key mistake in your assumptions was brought up when the Netscape fiasco was news, and I will bring it up again...
"http://my.netscape.com/publish/formats/rss-0.91.dtd" is a URI. It uniquely identifies a file. It *HAPPENS* to also be the URL for that same file, for now, but that is just a fortunate intentional coincidence. Your software should not rely on or require the file to be located at that URL. /var/dtd/rss-0.91.dtd is a perfectly valid location for the file identified by the URI "[whatever]/rss-0.91.dtd". What we need is for XML-using-software authors to support and embrace local DTD caches, AND package DTDs along with their applications (with the possibility of updating them from the web if necessary).
It is silly that millions of RSS readers fetch a non-changing file from the same web site every day. It is only very slightly less silly that they fetch it from the web at all.
Don't usually do this, but the above comment is the first one in this conversation that explains why this problem doesn't really exist.
I have seen the future, and it is inconvenient.
I think there is an OASIS standard called XML Catalogs for redirecting offsite schema requests to a local copy...
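To make that concrete: an OASIS XML Catalog is itself a small XML file that maps public and system identifiers to local URIs. Below is a hedged Python sketch that reads such a catalog with the standard library and does the lookup by hand. The catalog namespace and the `public`/`system` element names come from the XML Catalogs spec; the local `file:///usr/local/share/dtd/...` paths and the helper names are invented for illustration, and the sketch ignores the rest of the spec (`prefer`, rewrite rules, delegation, chained catalogs).

```python
import xml.etree.ElementTree as ET

CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"

# Hypothetical catalog; the local file paths are illustrative only.
CATALOG_XML = """\
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <public publicId="-//W3C//DTD XHTML 1.0 Transitional//EN"
          uri="file:///usr/local/share/dtd/xhtml1-transitional.dtd"/>
  <system systemId="http://my.netscape.com/publish/formats/rss-0.91.dtd"
          uri="file:///usr/local/share/dtd/rss-0.91.dtd"/>
</catalog>
"""


def load_catalog(text):
    """Return (public_map, system_map) built from a catalog document."""
    root = ET.fromstring(text)
    public, system = {}, {}
    for el in root.iter(f"{{{CATALOG_NS}}}public"):
        public[el.get("publicId")] = el.get("uri")
    for el in root.iter(f"{{{CATALOG_NS}}}system"):
        system[el.get("systemId")] = el.get("uri")
    return public, system


def resolve(public_id, system_id, public, system):
    """Return the local URI for an identifier pair, or None if unmapped."""
    if system_id in system:
        return system[system_id]
    if public_id in public:
        return public[public_id]
    return None
```

In practice you would let a catalog-aware parser do this rather than roll your own; the sketch is just to show that the redirection is a plain table lookup with no network involved.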
What's to stop someone from hosting these files locally, for their own use, on a local server? In some cases this would not be practical, with redirects for downloading, etc., but could this be done for some instances?
"It is a greater offense to steal men's labor, than their clothes"
Are Critical DTDs Necessary?
As far as I know this is quite an old story.
And we (the Slashdotters) already came to the conclusion that programmers who write code that relies on this kind of external resource need to be fired because they're obviously incompetent and a danger to the business of their employer.
So, it doesn't really matter if such external resources are hosted one way or the other. You stay away from them. You don't use these external resources in your code.
People are still using DTD's? I thought everybody switched to XML Schema a while back. God, I can't keep up with this constant flux!
I need some chinese food. Hmm...
Schezuan!
NO CARRIER
This has been covered before here and elsewhere... anyone who is using a DTD as a URL rather than a URI needs to be taken out and shot. I say bring them all down and let all the apps that rely on them die or be fixed.
Web2.0: I love when people Flickr my cuil and digg my boingboing until my google is reddit and I start to yahoo
I recently (within the last year) deployed an application that end users use for downloading and viewing custom content. They are intended to install the app onto laptops, tablets, and other portable devices, allowing them to view said content both online and offline.
When prototyping our "offline mode", we ran into this exact same problem because the XML APIs we used wanted to validate XML against online DTDs. We amended the validator's resolver to use locally embedded or cached DTDs for all our doctypes; problem solved.
In my app it was an obvious problem to solve because offline usage was a big scenario, but I could imagine that being "out of scope" for a less-than-robust website.
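The comment doesn't say which XML API was involved, so the following is only a guess at the general shape, using Python's SAX `EntityResolver` interface as a stand-in. The class name `LocalDtdResolver` and the `/var/dtd/...` path are hypothetical. Refusing to fall back to the network on a miss is a deliberate choice in this sketch, not something the original poster described.

```python
from xml.sax.handler import EntityResolver


class LocalDtdResolver(EntityResolver):
    """Map known system identifiers to local copies; never go to the network."""

    def __init__(self, local_copies):
        # local_copies: dict of system identifier -> local path (assumed layout)
        self._local = dict(local_copies)

    def resolveEntity(self, publicId, systemId):
        local = self._local.get(systemId)
        if local is None:
            # Fail loudly instead of silently fetching an unknown remote DTD.
            raise LookupError(f"no local copy registered for {systemId!r}")
        return local  # SAX accepts a system identifier string here
```

You would hand an instance to a reader with `setEntityResolver(...)`. Whether the reader actually consults it depends on its feature settings; recent Python versions, for example, do not process external general entities by default.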
The trick is to make centralized copies of important, or oft-used, files available. I wouldn't stop at DTDs. As AJAX, Web 2.0, or whatever you wanna call it grows more popular and demands that users download more and more JavaScript, images, etc. that are often the same files across different websites, it could be very useful to store a copy of those shared files on one server, with caching properly configured, so that users need only download and store one copy instead of dozens.
You don't have to centralize the originals - just copies. You get the benefits of a centralized resource without the risks of a centralized organization.
At what price learning? At what cost wisdom? The price is a man's peace of mind, and the cost is his life.
Wikipedia's Root nameserver entry says that 4 of the 13 root nameservers are run by private companies.
Am I missing something here, or is this problem solved by catalog files? Surely any decent XML parser that can download an external DTD subset from a URI can get the DTD subset via a catalog file?
What's wrong with this website?
Linux box with an uptime of 153 days. It does have to go down now and again so I can clean the dust and cat fur out of it, but that doesn't take too long.
Loose lips lose spit.
Unfortunately, DTDs aren't just for validation... they're also the only good way to define "entities" (e.g. "&foo;") in XML. This comes up a lot when trying to put HTML in XML feeds, because HTML has a lot of entities that aren't in the XML spec. Specifically, you may notice that you can't type "&nbsp;" in ordinary XML.
It's trivial to define "&nbsp;" yourself in a DTD (<!ENTITY nbsp "&#xa0;">), and many of the standard DTDs out there do define it, but by the XML 1.0 standard it's got to be defined somewhere or else the XML won't parse.
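A quick way to see this for yourself, sketched with Python's standard ElementTree parser (the helper name `try_parse` is made up). The first document uses the nbsp entity with no definition anywhere; the second declares it in an internal DTD subset, so nothing has to be fetched.

```python
import xml.etree.ElementTree as ET

# No DTD at all: the nbsp entity is undefined as far as XML is concerned.
WITHOUT_DTD = "<p>a&nbsp;b</p>"

# Same content, but the entity is declared in an internal DTD subset.
WITH_INTERNAL_SUBSET = (
    '<!DOCTYPE p [<!ENTITY nbsp "&#xA0;">]>'
    "<p>a&nbsp;b</p>"
)


def try_parse(text):
    """Return the root element's text, or None if the parser rejects it."""
    try:
        return ET.fromstring(text).text
    except ET.ParseError:
        return None
```

Here `try_parse(WITHOUT_DTD)` comes back as `None` because the parse fails, while the second document parses and yields the text with a real non-breaking space (U+00A0) in the middle.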
When I moderate, I only use "-1, Overrated". That way, I never get meta-moderated!
I don't know why important DTDs aren't just turned into serializations. HTML 5 (and, in practice, HTML in general) has a text/html serialization because the major browsers don't care about DTDs. It seems like well-published specifications like RSS should just be serialized and DTDs ignored, even though they are presented, instead of breaking when the DTD can't be found. I guess that wouldn't work if a generic XML parser was used for RSS, but for RSS readers, the DTD shouldn't matter.
The last time I checked, there is no mechanism by which an XML file can provide a link to the corresponding RelaxNG schema in the same way that it can provide a DTD.
Thus, while an application which expects files conforming to a specific schema can validate against that schema, it is not possible for a program to validate an arbitrary XML file. For example, there is no way xmllint can automatically find the related RelaxNG schema, in the same way that it can find the DTD.
If I am wrong, and there is a way to provide the schema, please enlighten me.
www.eFax.com are spammers
Quick, someone register http://all.your.dtds.are.belong.to.us/ :-)
Seriously though, we don't need dedicated hosting for DTDs. We need XML language spec writers, authors and user agent vendors to realise that DTDs are useless. Web browser vendors realised this a long time ago. No browser ever read HTML's SGML DTDs, and they do not use validating parsers for XHTML either (although, they use a hack to parse a subset of the DTD to handle XHTML and MathML entity references).
DTDs are bad for several reasons:
Plus, if a UA needs to request the DTD every time it parses the file, that adds significant overhead by the time it fetches the DTD, parses it and checks the document for validity. It's just not worth it. The Netscape RSS DTD issue was a mistake, and it's time to learn from that. There are much better alternatives available for validating XML than DTDs, such as RelaxNG or Schematron.
By reading this signature, you hereby agree with the content of the above comment.
Isn't this what doctypes like this are for:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
That whole PUBLIC thing means that the browser can have its own copy so that it doesn't have to fetch it off the website. Is there a reason that this is not the standard way of doing this?
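As a rough illustration of how a tool could use the PUBLIC identifier, here is a Python sketch that pulls both identifiers out of the DOCTYPE with the standard expat bindings and then checks a table of bundled copies. The `BUNDLED` table and its relative path are hypothetical; expat's `StartDoctypeDeclHandler` is real, and by default expat does not fetch the external DTD while doing this.

```python
import xml.parsers.expat


def doctype_identifiers(document: str):
    """Return (public_id, system_id) from the DOCTYPE, or (None, None)."""
    found = {"public": None, "system": None}

    def on_doctype(name, system_id, public_id, has_internal_subset):
        found["public"] = public_id
        found["system"] = system_id

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(document, True)
    return found["public"], found["system"]


# Hypothetical table of DTD copies shipped with the application.
BUNDLED = {
    "-//W3C//DTD XHTML 1.0 Transitional//EN": "dtd/xhtml1-transitional.dtd",
}
```

A browser-like tool would look up the public identifier in `BUNDLED` first and only consider the system identifier (the URL) if it has no local copy.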
As the Americans learned so painfully in Earth's final century, free flow of information is the only safeguard against...
Validation is overrated, especially when it comes to RSS. There are so many competing "compatible" standards that really aren't. feedparser.org has a great write-up about the state of RSS. It's pathetic.
If you're reading a doc, don't bother validating it. You're probably going to have to handle "invalid" XML anyway. When you're constructing XML, you should write it according to the DTD, but if you're relying on a remote site, then you're asking for trouble. Just cache the version locally, but seriously, your tool shouldn't really need it. Your engineers do, but not the tool.
Finally, it's trivial to reconstruct a dtd from sample documents.
A DTD spec SHOULD have both a PUBLIC identifier and a SYSTEM identifier. The system identifier is strongly recommended to be a URL so that a validating parser can fetch the DTD if the DTD is not found in the system catalog.
The system catalog is supposed to map from the PUBLIC identifier to a local file, so that the parser needn't go to the network.
If you are running a recent vintage Linux, look in
So:
www.eFax.com are spammers
Now I may not have quite grasped the importance of DTDs, but I can think of only one scenario where retrieving a DTD from a to-be-determined location would be useful: validating XML against any DTD. (Solution: whoever wants to validate will also provide the DTD.)
To my knowledge, any other application could just depend on built-in DTDs for validating the formats it knows and not care about whatever it doesn't know, as it wouldn't be able to intelligently use them anyway.
Did I forget to take into account one of those nice tiny little huge details somewhere?
No, you can't./Yes, ICANN. No, you can't./Yes, ICANN. No, you can't./Yes, ICANN, Yes, ICANN!
Anything you can be ICANN be greater./Sooner or later, I'm greater than you.
No, you're not. Yes, I am./No, you're not. Yes, I am./No, you're NOT!. Yes, I am./Yes, I am!
ICANN shoot a partridge With a single cartridge./ICANN get a sparrow With a bow and arrow.
ICANN live on bread and cheese.
And only on that?/Yes./So can a rat!
Any note you can reach ICANN go higher.
ICANN sing anything Higher than you.
No, you can't. (High)
Yes, ICANN. (Higher) No, you can't. (Higher)
Yes, ICANN. (Higher) No, you can't. (Higher)
Yes, ICANN. (Higher) No, you can't. (Higher)
Yes, ICANN. (Higher) No, you can't. (Higher)
Yes, ICANN! (Highest)
Anything you can buy ICANN buy cheaper./ICANN buy anything Cheaper than you.
Fifty cents?/Forty cents! Thirty cents?/Twenty cents! No, you can't!
Yes, ICANN, Yes, ICANN!
Anything you can say ICANN say softer./ICANN say anything Softer than you.
No, you can't. (Softly)
Yes, ICANN. (Softer) No, you can't. (Softer)
Yes, ICANN. (Softer) No, you can't. (Softer)
Yes, ICANN. (Softer)
YES, ICANN! (Full volume)
ICANN drink my liquor Faster than a flicker./ICANN drink it quicker And get even sicker!
ICANN open any safe.
Without bein' caught?/Sure./That's what I thought--you crook!
Any note you can hold ICANN hold longer./ICANN hold any note Longer than you.
No, you can't.
Yes, ICANN No, you can't/Yes, ICANN No, you can't.
Yes, ICANN
Yes, I-I-I-I-I-I-I-I-I No, you C-A-A-A-A-A-A-A-A-A-A-A-A-N'T--
CA-A-A-A-N! (Cough, cough!)
Yes, you ca-a-a-an!
Anything you can wear ICANN wear better./In what you wear I'd look better than you.
In my coat?/In your vest! In my shoes?/In your hat! No, you can't!/Yes, ICANN Yes, ICANN!
Anything you say ICANN say faster./ICANN say anything Faster than you.
No, you can't. (Fast)
Yes, ICANN. (Faster) No, you can't. (Faster)
Yes, ICANN. (Faster) Noyoucan't. (Faster)
YesIcan! (Fastest)
ICANN jump a hurdle./ICANN wear a girdle.
ICANN knit a sweater./ICANN fill it better!
ICANN do most anything!/Can you bake a pie?
No./Neither can I.
Anything you can sing ICANN sing sweeter./ICANN sing anything Sweeter than you.
No, you can't. (Sweetly)
Yes, ICANN. (Sweeter) No, you can't. (Sweeter)
Yes, ICANN. (Sweeter) No, you can't. (Sweeter)
Yes, ICANN. (Sweeter) No, you can't, can't, can't (sweeter)
Yes, ICANN, CAN, CAN (Sugary)
Yes, ICANN! No, you can't!
tasks(723) drafts(105) languages(484) examples(29106)
A URL has:
The only one of those that the machine itself has any control over hiding from the user is the path, which can be virtualized. However, many aren't. DTDs certainly don't seem to be.
A distributed system for this kind of mission-critical information is what we need. Think DNS for documents, rather than just hosts.
You can map public and system identifiers to local resources. Use them for dtds, schemas, stylesheets, etc. Here's the spec. Google for more information.
Do it like the DNS system, have a bunch of companies (*cough* google yahoo verizon mozilla microsoft (or not) *cough*) host this stuff, I doubt they'll all go bankrupt at the same time ;P
Why doesn't the content provider just provide the DTD? Why worry about caching it or random errors popping up in it, when the DTD can be stored on the very same server as the website, or stored with the application? Then it doesn't matter if another company screws up or if some malicious hacker decides to attack the DTD; it doesn't affect your product...
Some might think, "Well, what if it changes?"
Well, it's obvious: download the new one and update your XHTML/XML or application to the specific changes.
Using URLs is just a non-bureaucratic way to avoid name clashes, which is rather clever. However, using http:// as the prefix is rather brain-dead because all the other brain-dead people will assume that you have to fetch something from this URL. It would have been smarter to add a dedicated URN prefix for this like "namespace:", "spec:" or "whatever:".
Kind of, but not really.
Yes, XML catalogs are the answer.
Nothing in the XML specs says that any actual document is hosted at the URI. It's a mechanism to specify a globally unique identifier. It's an identifier, not a promise to host a document. Some folks host the DTD document at the URI, but there's no requirement to do that.
While I'm sure not every RSS client uses a high-quality XML implementation, it seems clearly true that every RSS client is an XML processor. RSS is an XML format. So an RSS client is, um, processing XML...
As for checking for updates, that's a non-issue. Remember, the URI is a unique identifier. If you ever update the DTD, you'd generate a new URI.
Having anything in a live project linking externally is insane! I never understood how developers can risk this.
We use Maven, and use DTDs, schemas, WSDL, etc. Many of the WSDL and other files refer to online locations. We download these and alter the references to be local. Otherwise we would have a build fail because of an internet issue, which is just nuts.
Same with maven, we have our own local repository where we keep a subset of what we use. Again same situation. In these cases this is just for building, I can't imagine doing this on a live site. This can especially go for externally referenced javascript... local copies are your friend.
The real question is whether the Internet users at large would help pay for robust infrastructure at w3 to support this.
Just flush XML and then it wouldn't be an issue...
That is all.
I am all for it, but such a thing needs to be international, NOT in control of a company based in any country *cough* ICANN *cough*
Just the other day, I was looking for a DTD for 4ML related to music and lyric notations. But the website is not working. Most probably the guy got bored with it and forgot to pay the hosting company.
We definitely need some sustainable way to host the DTDs.
~Sivaraj
1) There are some sensitive environments (military, etc) where you simply do *NOT* connect your internal network to "teh interweb". No ifs, ands, ors, buts. The result is a broken browser where the DTD's are required.
2) Remember the incident where popular "safe" Superbowl sites were compromised and laced with malware-installing code? What happens to millions of Firefox-on-Windows users when a bunch of Russian mobsters or Chinese government agents hijack a DTD host and load it with a zero-day Windows exploit?
3) Remember "pharming", where DNS servers are hijacked to redirect *CORRECTLY TYPED URLS* to malware-infested sites. Even if the bad-guys can't hijack the DTD host, they can still hijack Windows-based DNS servers (ptui!) and anybody who relies on them gets redirected to a malware-install site.
That's the problem; here's my solution. It's composed of two parts.
A) DTDs will be *LOCAL FILES ON YOUR WORKSTATION* (excepting "thin clients").
B) Browsers (or possibly Operating Systems) will include new DTDs with updates. In posix OS's (*NIX, BSD) DTDs will be stored in /etc/dtd/ and users will be able to add their own DTDs in ~/.dtd
Windows will have its own locations. When you get your regular update for your browser (or alternatively, your OS), part of the update will be any new DTDs. There will be a separate file for each DTD and version, so that your browser can properly handle multiple tabs opening to sites using different versions of the same base DTD.
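A sketch of what the lookup side of this proposal might look like, in Python. The `/etc/dtd/` and `~/.dtd` locations are the ones proposed above; checking the user's directory before the system one is my assumption, as are the function names and the idea that each DTD version gets its own file name.

```python
from pathlib import Path
from typing import Iterable, List, Optional


def default_dirs() -> List[Path]:
    """Proposed locations; user directory first (precedence is an assumption)."""
    return [Path.home() / ".dtd", Path("/etc/dtd")]


def find_dtd(filename: str, search_dirs: Iterable[Path]) -> Optional[Path]:
    """Return the first existing copy of filename in search_dirs, else None."""
    for directory in search_dirs:
        candidate = Path(directory) / filename
        if candidate.is_file():
            return candidate
    return None
```

For example, `find_dtd("rss-0.91.dtd", default_dirs())` would return a path only if one of those directories actually holds that file, and the browser never has to touch the network.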
I'm not repeating myself
I'm an X window user; I'm an ex-Windows user
Why, the same organization that should probably be responsible for *all* critical Internet infrastructure standards, just as it is responsible for the standards relating to telecommunications and radio communications.
The ITU.
Go ahead, laugh, but I think it's long past time for control of such functions as DNS, NTP, assigned numbers, et cetera, to be transferred out of the hands of primarily US-based corporations and loosely coupled organizations such as the IETF and IANA and into the hands of some sort of international treaty organization.
Since the ITU not only fits this description, but in fact was founded to deal with precisely these sorts of issues, why not let it do what it does for the Internet as well?
The best way would be to use a decentralized information storage system like ed2k: URI scheme or Magnet: URI scheme
Is it not obvious that it may as well be the W3C? XML is their standard; operating a registry for public-use DTDs would be a rather reasonable service to provide.
As many people have alluded to already, it's an incredibly bad idea to make your application have an unnecessary dependency on an external service. Keep a local copy, just copy it down once and you're done, simple as can be.
But maybe there are urls out there pointing to "the latest and greatest" version, rather than a specific version, and you like the idea of using "the latest and greatest". So, think for a moment what happens when the DTD/schema changes. Is your app magically going to change how it deals with the xml at the same time? Of course not!
So, until you can get out a patch, you'd be refusing xml docs your code/xsl has been built to handle, and possibly letting in xml docs that your code/xsl has not been built to handle. Whereas, if you just kept a local DTD/schema, you would have no trouble keeping it and the code/xsl behind it in sync.
If everyone mirrored every DTD they need, we'd not have this problem. They aren't large; just mirror the ones you use.
The DTD is great for doing the development. It is horrible for having to validate a transaction each and every time.
Think about this. Why should you pull down a DTD each time you go to validate a transaction? Does your transaction dynamically change each time or does it stay the same for long periods of time? Likely it stays the same.
Additionally if you are referencing a DTD that is external to your environment, why the hell would you trust that DTD? How do you verify that that DTD is the correct DTD? There is enough cracking and whatever else out there that somebody could sneak in and either change the DTD without telling anybody and it still seems to work or maybe not. Perhaps the change introduces a hole into your system or somebody's system that allows a cracker in. It is purely dumb to ever reference an external DTD ever!
In my job, I explicitly remove DTD references before parsing XML documents to prevent a security issue from happening. Also, what if your customer providing the DTD changes the DTD without telling you and simultaneously changes the document you validate against to match those changes? Will your system be able to handle that change and adjust. In theory it should, but in practice, especially if you are a business, that is bad business. Contracts likely need to be changed, people need to be paid, etc.
If you want your system to break, reference an external DTD. If you want to increase your security risk, reference an external DTD.
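The comment doesn't say how the DTD references get removed, so here is just one naive way it could be done in Python, with a regular expression. The `strip_doctype` name is made up, and the pattern is an assumption that only covers simple DOCTYPE declarations: it will get confused by a `]` inside an internal subset or a `>` or `[` inside a quoted identifier, so treat it as a sketch rather than a hardened filter.

```python
import re
import xml.etree.ElementTree as ET

# Matches <!DOCTYPE ...> with an optional, simple [...] internal subset.
_DOCTYPE = re.compile(r"<!DOCTYPE[^\[>]*(?:\[[^\]]*\])?\s*>")


def strip_doctype(text: str) -> str:
    """Remove the first DOCTYPE declaration so no external DTD is referenced."""
    return _DOCTYPE.sub("", text, count=1)


def parse_without_dtd(text: str) -> ET.Element:
    """Parse a document after dropping its DOCTYPE declaration."""
    return ET.fromstring(strip_doctype(text))
```

One side effect to keep in mind: any entities that were only defined by the DTD disappear along with it, so documents that use them will no longer parse.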
The statement below is FALSE
The statement above is TRUE
...they are unique identifiers, not URLs.
They don't need to be hosted for the same reason that there isn't a machine out there called com.sun.java.util.dates.FunnyDate