Slashdot Mirror


Is Dedicated Hosting for Critical DTDs Necessary?

pcause asks: "Recently there was a glitch when someone at Netscape took down a page that hosted an important DTD (for RSS) used by many applications and services. This got me thinking that many or all of the important DTDs that software and commerce depend on are hosted at various commercial entities. Is this a sane way to build an XML-based Internet infrastructure? Companies come and go all of the time; this means that the storage and availability of those DTDs is in constant jeopardy. It strikes me that we need an infrastructure akin to the root server structure to hold the key DTDs that are used throughout the industry. What organization would be the likely custodian of such data, and what would be the best way to ensure such an infrastructure stays funded?"

45 of 140 comments

  1. I know! by Colin Smith · · Score: 4, Funny

    ICANN!

    Mhahahahaha. Yeah. I know, I crack myself up.

    --
    Deleted
    1. Re:I know! by mollymoo · · Score: 2, Insightful

      The point was that relying on a single entity isn't a good idea. Google is a single company, The Internet Archive is a single organisation.

      I'd suggest something more along the lines of DNS, where although there would be a single ultimate authority, the day-to-day business of serving DTDs would be distributed and handled by multiple levels of servers.

      --
      Chernobyl 'not a wildlife haven' - BBC News
    2. Re:I know! by commodoresloat · · Score: 4, Funny

      ICANN!

      Mhahahahaha. Yeah. I know, I crack myself up.

      No you cann't!
    3. Re:I know! by UltraAyla · · Score: 2, Insightful

      I think you're right on the money here. DNS-like was my first thought as well. Have a root system where all updates are made, then have organizations which check for updates to a package of multiple critical DTDs on a weekly or monthly basis or something. Then people can have a list of DTD sources in the event that one goes down (though I'm pretty sure XML only supports one DTD in each document - someone correct me if I'm wrong). This would reduce the burden on any one person, allow organizations to manage their DTDs on their own if they like, etc.

  2. Centralization by ushering05401 · · Score: 4, Insightful

    Nothing too insightful to write, but worth saying in today's volatile political climate. Centralization makes me nervous.

    Regards.

    1. Re:Centralization by radarsat1 · · Score: 2, Interesting

      Exactly. How about hosting these important files via a decentralized bittorrent tracker?
      Of course, that would eliminate the use of a UNIFORM RESOURCE LOCATOR, since it would no longer be centralized.
      There needs to be a way to refer to decentralized internet resources in a unique fashion. We need the equivalent of the URL for a file that is hosted simultaneously in many places.

    2. Re:Centralization by Bogtha · · Score: 4, Informative

      There needs to be a way to refer to decentralized internet resources in a unique fashion. We need the equivalent of the URL for a file that is hosted simultaneously in many places.

      This is known as a URN. URLs and URNs are together known as URIs.

      --
      Bogtha Bogtha Bogtha
    3. Re:Centralization by kwark · · Score: 2, Informative

      You meant something like magnet URIs?
      http://en.wikipedia.org/wiki/Magnet:_URI_scheme

    4. Re:Centralization by frisket · · Score: 3, Informative

      The defects of the URN/URI/URL mechanism were well known at the time this was discussed in the working groups and SIGs while XML was gestating.

      The correct solution would have been to fix the outstanding problems with FPIs and use a combination of local catalog and DNS-style resolution, but this was turned down. Perhaps it's time to wake it up.

      In the 1990s I did try to devise a resolution server for FPIs, in the hope that someone like the (then) GCA (now IdeAlliance) -- who were the ISO 9070 Registration Authority and theoretically still are -- would pick up the idea.

      I still have the large collection of SGML DTDs used at the time, now largely redundant, but replacing it with current XML is not the problem. This is something that should probably be discussed at the Markup conference in Montreal this summer.

  3. w3c by partenon · · Score: 5, Insightful

    w3c.org . There's no better place to keep the standards related to the web.

    --
    ilex paraguariensis for all
    1. Re:w3c by JordanL · · Score: 4, Funny

      There's no better place to keep the standards related to the web.
      Some say that wistfully, others begrudgingly.
    2. Re:w3c by inKubus · · Score: 2, Interesting

      What about a distributed file system that works like DNS? Hierarchical servers, each responsible for a different level of the DTD. The "Root" is a trusted group of servers, which maintain a list of other servers where you can get a copy of the rest of the DTD. Then plugin builders and other sub-entities can have their own server for extensions to the base DTD.

      Unfortunately, the DNS method has proven to not necessarily be the best way, with poisoning and stuff that can occur. Of course, it was designed during the days when they didn't just let anyone on the internet. But you can always diff your copy against the publisher's if you are paranoid, or use a signing server or something MD5ish that signs the DTD.

      --
      Cool! Amazing Toys.
  4. DTD? by mastershake_phd · · Score: 4, Insightful

    and DTD stands for? Distributed Technical Dependency?

    1. Re:DTD? by x_MeRLiN_x · · Score: 4, Informative

      Document Type Definition

    2. Re:DTD? by Sporkinum · · Score: 4, Funny

      It's the sound Carlos Mencia makes...

      --
      "He's lost in a 'floyd hole"
  5. In case of death... by Kjella · · Score: 4, Insightful

    ...keep a copy, host it on your own site and reference that instead. There was no problem except that some were using that file to download the definitions. Or just expand the definition to include a checksum and a list of mirrors. Is this even a problem worth solving? I mean except for the slashdot post it seemed to me like this went by without anyone noticing.
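
    A minimal Python sketch of the "checksum and a list of mirrors" idea, not anything the RSS DTD actually specifies. The function name fetch_verified and the injected fetch callable are assumptions made for illustration; in a real client fetch would wrap an HTTP request.

```python
import hashlib
from typing import Callable, Iterable


def fetch_verified(mirrors: Iterable[str], sha256_hex: str,
                   fetch: Callable[[str], bytes]) -> bytes:
    """Return the first mirror's content whose SHA-256 matches sha256_hex."""
    for url in mirrors:
        try:
            data = fetch(url)
        except OSError:
            continue  # mirror unreachable: try the next one
        if hashlib.sha256(data).hexdigest() == sha256_hex:
            return data
        # Checksum mismatch: treat it as a bad copy and keep going.
    raise LookupError("no mirror returned a copy matching the checksum")
```

    Because the checksum travels with the reference, a dead or tampered mirror costs one extra request instead of breaking the client.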

    --
    Live today, because you never know what tomorrow brings
    1. Re:In case of death... by centinall · · Score: 2, Interesting

      What if you're using a 3rd party library that has references to the DTD, schema or whatever? You don't really want to go through and change all of them.

      What if XML files, for instance, are being exchanged between your application and others and they are including a DTD that doesn't reside within your domain?

      I'm sure there are other scenarios as well.

  6. Sane? by DogDude · · Score: 5, Insightful

    Well, I wouldn't call it sane if anybody who is actively using XML and needs a DTD isn't hosting it right along with whatever web site they're using the XML for. Relying on somebody else to maintain a critical DTD that you use isn't sane. It's pretty dumb.

    --
    I don't respond to AC's.
    1. Re:Sane? by DogDude · · Score: 2, Insightful

      Well, even if you're not, then you should absolutely, positively, and without any doubt, at least in my mind, have a copy of all of your DTDs.

      --
      I don't respond to AC's.
    2. Re:Sane? by curunir · · Score: 2, Insightful

      Exactly. If you write an application that requires a DTD (or XSD for that matter) to parse an XML document, include that file as part of the software. The XML processing code should intercept entity references and load them from the local copy. Not only does this make your application more reliable, it also makes it faster.

      Public hosting of schema documents should not be for application use where the application knows ahead of time what kind of document it will be parsing (like the RSS situation). In all likelihood, a change to that schema document will cause an error in the XML parsing anyway, since the parser isn't expecting new or changed elements.

      Public hosting of documents should be reserved for editors that create XML documents that must comply with a given format. This allows XML authors to validate their documents against the schema, but nothing breaks when the publicly-hosted document becomes unavailable.
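
      Intercepting the reference and answering it from a bundled copy could look roughly like this with Python's standard xml.sax module. The class name LocalDtdResolver, the BUNDLED_DTDS table and the dtd/ directory layout are assumptions for illustration. Recent Python versions leave external entity processing switched off by default, so the sketch enables it explicitly; whether and when a given parser consults the resolver depends on the parser.

```python
import os
import xml.sax
from xml.sax.handler import EntityResolver, feature_external_ges

# Hypothetical table: identifier used in the DOCTYPE -> file shipped with the app.
BUNDLED_DTDS = {
    "http://my.netscape.com/publish/formats/rss-0.91.dtd": "dtd/rss-0.91.dtd",
}


class LocalDtdResolver(EntityResolver):
    """Answer requests for known system identifiers with a bundled local file."""

    def __init__(self, app_dir, bundled=BUNDLED_DTDS):
        self.app_dir = app_dir
        self.bundled = bundled

    def resolveEntity(self, publicId, systemId):
        relative = self.bundled.get(systemId)
        if relative is None:
            # Unknown identifier: fail instead of silently going to the network.
            raise xml.sax.SAXException("no local copy for " + str(systemId))
        return os.path.join(self.app_dir, relative)


def make_offline_parser(app_dir):
    """Build a SAX parser that asks LocalDtdResolver for external entities."""
    parser = xml.sax.make_parser()
    parser.setFeature(feature_external_ges, True)
    parser.setEntityResolver(LocalDtdResolver(app_dir))
    return parser
```

      Refusing unknown identifiers is a deliberate choice in this sketch: the application only claims to understand the document types it ships DTDs for.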

      --
      "Don't blame me, I voted for Kodos!"
  7. No by Bogtha · · Score: 5, Insightful

    You shouldn't be using DTDs any more. Validation is better achieved with RelaxNG, and you shouldn't use them for entity references because then non-validating parsers won't be able to handle your code.

    For those document types that already use DTDs, either you ship the DTDs with your application, or you cache them the first time you parse a document of that type.

    The Netscape DTD issue was caused not by the DTD being unavailable, but by some client applications not being sufficiently robust. You shouldn't be looking at the hosting to solve the problem.
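
    The "cache them the first time" option might be sketched like this in Python. The function name cached_dtd, the cache directory and the injected fetch callable are assumptions for illustration; fetch stands in for whatever HTTP client the application already uses.

```python
import hashlib
import os
from typing import Callable


def cached_dtd(url: str, cache_dir: str, fetch: Callable[[str], bytes]) -> bytes:
    """Return the DTD for url, calling fetch only if no cached copy exists."""
    os.makedirs(cache_dir, exist_ok=True)
    # Hash the URL so any identifier maps to a safe file name.
    name = hashlib.sha256(url.encode("utf-8")).hexdigest() + ".dtd"
    path = os.path.join(cache_dir, name)
    if os.path.exists(path):
        with open(path, "rb") as f:
            return f.read()
    data = fetch(url)
    # Write to a temporary name first so a crash cannot leave a half-written entry.
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        f.write(data)
    os.replace(tmp, path)
    return data
```

    After the first successful call the remote host can disappear without the application noticing.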

    --
    Bogtha Bogtha Bogtha
    1. Re:No by Anonymous Coward · · Score: 3, Insightful

      The Netscape DTD issue was caused not by the DTD being unavailable, but by some client applications not being sufficiently robust.

      Not sufficiently robust is an understatement. ****ing stupid is what I would call it. If every browser had to hit the W3C site for the HTML DTDs every time they loaded a web page, the web would collapse.

  8. Don't know what a DTD is? by ryanisflyboy · · Score: 2, Informative
  9. Don't use them by Anonymous Coward · · Score: 5, Insightful

    If the absence of these files will break your app or service, then you need to make your app or service more robust.

    Sure, DTD files are necessary for development. If your app requires that they be used to validate something in real time each time it comes in from a client or whatever, then use an internal copy of the version of the DTD file that you support. If the host makes a change to it (or drops it, or lets it get hacked), your app won't break, and you can decide when you will implement and support that change.

    I really don't see what is gained by making the real time operation of your application dependent on the availability and pristinity of remotely and independently hosted files. It just makes you fragile, and you can get all the benefits you need from just checking the files during your maintenance and development cycles.

    1. Re:Don't use them by Skreems · · Score: 4, Informative

      Exactly. The only point of having a URL associated with a DTD is to assure a unique identifier for each one. It wasn't worth starting a group specifically to regulate DTD identifiers, so they hooked it to a system that's already regulated. Yeah, it's nice to have the DTD live at that location, so if you get a file with a reference to an unfamiliar DTD you can pull it down on the spot, but it shouldn't be required.

      --
      Slashdot needs a "-1, Wrong" moderation option.
      The Urban Hippie
  10. Perhaps something like "pool.ntp.org"? by Zocalo · · Score: 4, Insightful

    NTP.org maintains a pool of public NTP servers that are accessible via the hostname "pool.ntp.org", so perhaps something similar would work for a global DTD repository. An industry organization with a vested interest (the W3C seems like the most logical) could maintain the DNS zone, and organizations could volunteer some server space and bandwidth to host a mirror of the collected pool of DTDs. Volunteering organizations might come and go, but when that happens it's just a matter of updating the DNS zone to reflect the change, and everyone using DTDs just needs to know that a single generic hostname will always provide a copy of the required DTD.

    Just a thought...

    --
    UNIX? They're not even circumcised! Savages!
  11. using non-local cached copy considered harmful by tota · · Score: 4, Interesting

    Most tools provide a way to refer to a DTD on a public URL, yet use the local copy instead (e.g., the taglib-location directive in Java).

    Doing anything else strikes me as fundamentally dangerous and insecure: it makes a remote DNS vulnerability into an easy application DoS (or worse).

    --
    TODO: 753) write sig.
  12. Call me crazy... by Nimey · · Score: 4, Interesting

    but just have your DTD as a W3C standard, distribute copies with your software, and don't bother a remote server until a new version of the DTD is released. Then distribute it with a new version of your software.

    Seriously, what the fuck were they thinking relying on a server to be always available?

    --
    Hail Eris, full of mischief...

    E pluribus sanguinem
    1. Re:Call me crazy... by Megane · · Score: 2, Interesting

      Even more stupid is that the URI had a freaking version number in the filename! It's not like someone would update it, and then give it the old version number. It's going to give you the same file even when there's a newer version!

      --
      #naabhaprzrag, #sverubfr-000, #agi-fcbafberq, negvpyr[pynff*=' negvpyr-ary-'] { qvfcynl: abar !vzcbegnag; }
    2. Re:Call me crazy... by Nimey · · Score: 4, Funny

      It's not like someone would update it, and then give it the old version number.


      Your trust in the world is cute. :-)
      --
      Hail Eris, full of mischief...

      E pluribus sanguinem
  13. URI vs URL by Sparr0 · · Score: 5, Insightful

    A key mistake in your assumptions was brought up when the Netscape fiasco was news, and I will bring it up again...

    "http://my.netscape.com/publish/formats/rss-0.91.dtd" is a URI. It uniquely identifies a file. It *HAPPENS* to also be the URL for that same file, for now, but that is just a fortunate intentional coincidence. Your software should not rely on or require the file to be located at that URL. /var/dtd/rss-0.91.dtd is a perfectly valid location for the file identified by the URI "[whatever]/rss-0.91.dtd". What we need is for XML-using-software authors to support and embrace local DTD caches, AND package DTDs along with their applications (with the possibility of updating them from the web if necessary).

    It is silly that millions of RSS readers fetch a non-changing file from the same web site every day. It is only very slightly less silly that they fetch it from the web at all.

  14. Not again by dedazo · · Score: 3, Informative

    This has been covered before here and elsewhere... anyone who is using a DTD as a URL rather than a URI needs to be taken out and shot. I say bring them all down and let all the apps that rely on them die or be fixed.

    --
    Web2.0: I love when people Flickr my cuil and digg my boingboing until my google is reddit and I start to yahoo
  15. Supply local DTDs with your app by Dragonshed · · Score: 4, Interesting

    I recently (within the last year) deployed an application that end users use for downloading and viewing custom content. They are meant to install the app on laptops, tablets, and other portable devices, allowing them to view said content both on- and offline.

    When prototyping our "offline mode", we ran into this exact same problem because the XML APIs we used wanted to validate XML against online DTDs. We amended the validator's resolver to use locally embedded or cached DTDs for all our doctypes; problem solved.

    In my app it was an obvious problem to solve because offline usage was a big scenario, but I could imagine that being "out of scope" for a less-than-robust website.

  16. sure there is! by commodoresloat · · Score: 2, Funny

    What's wrong with this website?

  17. I have a server in my basement we could use. by fyoder · · Score: 4, Funny

    Linux box with an uptime of 153 days. It does have to go down now and again so I can clean the dust and cat fur out of it, but that doesn't take too long.

    --
    Loose lips lose spit.
    1. Re:I have a server in my basement we could use. by Skapare · · Score: 2, Funny

      I have an old Sun Sparc 5/70 that still works. Rock solid machine and has OpenBSD loaded on it. I even have a static IP address on my dialup service I could put it on.

      --
      now we need to go OSS in diesel cars
  18. DTDs, XML entities and the non-breaking space by Darkforge · · Score: 3, Funny

    Unfortunately, DTDs aren't just for validation... they're also the only good way to define "entities" (e.g. "&foo;") in XML. This comes up a lot when trying to put HTML in XML feeds, because HTML has a lot of entities that aren't in the XML spec. Specifically, you may notice that you can't type "&nbsp;" in ordinary XML.

    It's trivial to define "&nbsp;" yourself in a DTD (<!ENTITY nbsp "&#xa0;">), and many of the standard DTDs out there do define it, but by the XML 1.0 standard it's got to be defined somewhere or else the XML won't parse.
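
    A small Python illustration of the point, using the standard xml.etree.ElementTree parser. The <item> element and its content are made up for the example; the entity declaration is the one quoted above, placed in an internal DTD subset.

```python
import xml.etree.ElementTree as ET

# The internal DTD subset declares nbsp as U+00A0, so the parser can expand it.
WITH_DECLARATION = """<!DOCTYPE item [
  <!ENTITY nbsp "&#xa0;">
]>
<item>10&nbsp;km</item>"""

# The same document without the declaration: nbsp is unknown to plain XML.
WITHOUT_DECLARATION = "<item>10&nbsp;km</item>"


def parses(document: str) -> bool:
    """Report whether the parser accepts the document."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


text = ET.fromstring(WITH_DECLARATION).text
print(repr(text))
print(parses(WITHOUT_DECLARATION))
```

    Declaring the entity inline like this avoids any external fetch, at the cost of repeating the declaration in every document.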

    --

    When I moderate, I only use "-1, Overrated". That way, I never get meta-moderated!

  19. short answer: no by coaxial · · Score: 3, Insightful

    Validation is overrated, especially when it comes to RSS. There are so many competing "compatible" standards that really aren't. feedparser.org has a great write-up about the state of RSS. It's pathetic.

    If you're reading a doc, don't bother validating it. You're probably going to have to handle "invalid" XML anyway. When you're constructing XML, you should write it according to the DTD, but if you're relying on a remote site, then you're asking for trouble. Just cache the version locally, but seriously, your tool shouldn't really need it. Your engineers do, but not the tool.

    Finally, it's trivial to reconstruct a dtd from sample documents.

    1. Re:short answer: no by KermodeBear · · Score: 2, Interesting

      Off-topic gripe, but:

      If you're reading a doc, don't bother validating it. You're probably going to have to handle "invalid" XML anyway.
      I did work developing a large XML-based integration with the mortgage lender AmeriQuest. Boy, did they have interesting ideas on what valid XML is! I had to deal with fun things like:

      <tag />data</tag> - An empty tag being used for an opening tag
      </tag>data</tag> - A closing tag being used for an opening tag
      <tag>data<tag> - The opposite problem - two opening tags!
      <tag attribute="data" attribute="data2" attribute="data3"> - The same attribute appearing multiple times because they wanted to send multiple values. God forbid they create child nodes!

      and other fun things. Imagine my frustration when I was told, "The customer is always right, if that is what they are sending then find a way to handle it..." Different XML events had special processing to turn the invalid XML into a well-formed document so that it could be parsed. Ugh.

      AmeriQuest had farmed all the work out to an Indian outsourcing firm. You get what you pay for...
      --
      Love sees no species.
  20. EXACTLY by wowbagger · · Score: 4, Insightful

    Exactly right, but it is even worse than that:

    A DTD spec SHOULD have both a PUBLIC identifier and a SYSTEM identifier. The system identifier is strongly recommended to be a URL so that a validating parser can fetch the DTD if the DTD is not found in the system catalog.

    The system catalog is supposed to map from the PUBLIC identifier to a local file, so that the parser needn't go to the network.

    If you are running a recent vintage Linux, look in /etc/xml/ - there are all the catalog maps for all the various DTDs in use.

    So:
    1. The application writers SHOULD have added the DTDs to the local system's catalog.
    2. Failing that, the application SHOULD have cached the DTD locally the first time it was fetched, and never fetched it again.


  21. XML catalog files let your app use local copies... by KarmaRundi · · Score: 3, Informative

    You can map public and system identifiers to local resources. Use them for dtds, schemas, stylesheets, etc. Here's the spec. Google for more information.
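
    To give a feel for how such a mapping works, here is a simplified Python sketch that reads an OASIS-style XML catalog and resolves identifiers against it. The sample catalog and its local file paths are made up for the example, and the resolver handles only <public> and <system> entries; real catalog processors also support prefer, rewrite, delegate and nextCatalog entries, which are ignored here.

```python
import xml.etree.ElementTree as ET

CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"

# A tiny catalog; the local file paths are hypothetical.
SAMPLE_CATALOG = """<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <public publicId="-//Netscape Communications//DTD RSS 0.91//EN"
          uri="file:///usr/share/xml/rss/rss-0.91.dtd"/>
  <system systemId="http://my.netscape.com/publish/formats/rss-0.91.dtd"
          uri="file:///usr/share/xml/rss/rss-0.91.dtd"/>
</catalog>"""


def load_catalog(xml_text):
    """Return (public_map, system_map) built from <public> and <system> entries."""
    root = ET.fromstring(xml_text)
    public_map = {e.get("publicId"): e.get("uri")
                  for e in root.iter("{%s}public" % CATALOG_NS)}
    system_map = {e.get("systemId"): e.get("uri")
                  for e in root.iter("{%s}system" % CATALOG_NS)}
    return public_map, system_map


def resolve(public_id, system_id, public_map, system_map):
    """Return the local URI for an identifier pair, or None if nothing matches."""
    # Simplification: try the system identifier first, then the public one.
    if system_id in system_map:
        return system_map[system_id]
    return public_map.get(public_id)
```

    A parser wired to a lookup like this never needs to contact the host named in the system identifier as long as the catalog has an entry.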

  22. DTD Critical Hosting by liothen · · Score: 2, Insightful

    Why doesn't the content provider just provide the DTD? Why worry about caching it or random errors popping up in it, when the DTD can be stored on the very same server as the website, or stored with the application? Then it doesn't matter if another company screws up or if some malicious hacker decides to attack the DTD; it doesn't affect your product...
    Some might think: well, what if it changes?
    Well, it's obvious: download the new one and update your XHTML/XML or application for the specific changes.

  23. The security implications are extremely ugly by knorthern knight · · Score: 2, Insightful

    1) There are some sensitive environments (military, etc) where you simply do *NOT* connect your internal network to "teh interweb". No ifs, ands, ors, buts. The result is a broken browser where the DTDs are required.

    2) Remember the incident where popular "safe" Superbowl sites were compromised and laced with malware-installing code? What happens to millions of Firefox-on-Windows users when a bunch of Russian mobsters or Chinese government agents hijack a DTD host and load it with a zero-day Windows exploit?

    3) Remember "pharming", where DNS servers are hijacked to redirect *CORRECTLY TYPED URLS* to malware-infested sites? Even if the bad guys can't hijack the DTD host, they can still hijack Windows-based DNS servers (ptui!) and anybody who relies on them gets redirected to a malware-install site.

    That's the problem; here's my solution. It's composed of two parts.

    A) DTDs will be *LOCAL FILES ON YOUR WORKSTATION* (excepting "thin clients").

    B) Browsers (or possibly Operating Systems) will include new DTDs with updates. In POSIX OSes (*NIX, BSD) DTDs will be stored in /etc/dtd/ and users will be able to add their own DTDs in ~/.dtd; Windows will have its own locations. When you get your regular update for your browser (or alternatively, your OS), part of the update will be any new DTDs. There will be a separate file for each DTD and version, so that your browser can properly handle multiple tabs opening to sites using different versions of the same base DTD.
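
    A rough Python sketch of how a client might look up a DTD in those two locations. Checking the per-user directory before the system-wide one is an assumption of this sketch, as is the function name find_local_dtd; the proposal above only names the directories.

```python
import os

# Directories from the proposal above: per-user first, then system-wide.
DEFAULT_SEARCH_PATH = [os.path.expanduser("~/.dtd"), "/etc/dtd"]


def find_local_dtd(filename, search_path=DEFAULT_SEARCH_PATH):
    """Return the first existing path for filename, or None if not installed."""
    # Use only the base name so an identifier cannot escape the DTD directories.
    base = os.path.basename(filename)
    for directory in search_path:
        candidate = os.path.join(directory, base)
        if os.path.isfile(candidate):
            return candidate
    return None
```

    Returning None rather than falling back to the network keeps the lookup consistent with point A: a missing DTD is reported locally instead of triggering a remote fetch.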

    --

    I'm not repeating myself
    I'm an X window user; I'm an ex-Windows user
  24. Re:Catalog files? by EsbenMoseHansen · · Score: 4, Insightful

    Or better yet why can't you just copy the blasted thing to your own site if you're going to use it?

    Is there some technical reason I'm not aware of that means it has to stay somewhere central?

    There shouldn't be, yet I would be greatly surprised if some application didn't match on the entire DTD string, hostname and all.

    I am equally baffled at what applications need the DTD for anyway. Except for generic XML applications, what use is a DTD? Most applications only handle a fixed few XML document types anyway.

    Finally, if they really need that DTD... any distro has most major DTDs available. No reason why they couldn't carry a few extra. Should be easy to just search for them locally.

    --
    Religion is regarded by the common people as true, by the wise as false, and by rulers as useful.
  25. Re:Localized hosting by bytesex · · Score: 2, Interesting

    Exactly. What always struck me is that certain applications do a DTD-conformant XSLT processing step _every_time_ a web page is checked. That means my web app is dependent on the location on the internet being reachable (proxies!! downtime!! all that yummy goodness!!), plus the unacceptable overhead. But... they merrily keep on making XSLT processors that _will_not_run_ without access to the DTD (I'm looking at you, Java!).

    --
    Religion is what happens when nature strikes and groupthink goes wrong.