Is Dedicated Hosting for Critical DTDs Necessary?

I know! by Colin Smith · 2007-05-17 10:09 · Score: 4, Funny

ICANN!

Mhahahahaha. Yeah. I know, I crack myself up.

--
Deleted

Re:I know! by Score Whore · 2007-05-17 10:13 · Score: 1

Clearly the particular organization is not yet formed, however there is absolutely no question that it should be hosted in Iran.
Re:I know! by rs79 · 2007-05-17 10:38 · Score: 1

Why there? Did you want to run an MLM?

Google or archive.org come to mind as a more sensible choice.

--
Need Mercedes parts ?
Re:I know! by mollymoo · 2007-05-17 11:44 · Score: 2, Insightful

The point was that relying on a single entity isn't a good idea. Google is a single company; The Internet Archive is a single organisation.

I'd suggest something more along the lines of DNS, where although there would be a single ultimate authority, the day-to-day business of serving DTDs would be distributed and handled by multiple levels of servers.

--
Chernobyl 'not a wildlife haven' - BBC News
Re:I know! by commodoresloat · 2007-05-17 11:45 · Score: 4, Funny

ICANN!

Mhahahahaha. Yeah. I know, I crack myself up.

No you cann't!
Re:I know! by UltraAyla · 2007-05-17 15:27 · Score: 2, Insightful

I think you're right on the money here. DNS-like was my first thought as well. Have a root system where all updates are made, then have organizations which check for updates to a package of multiple critical DTDs on a weekly or monthly basis or something. Then people can have a list of DTD sources in the event that one goes down (though I'm pretty sure XML only supports one DTD in each document - someone correct me if I'm wrong). This would reduce the burden on any one person, allow organizations to manage their DTDs on their own if they like, etc.
Re:I know! by Anonymous Coward · 2007-05-17 15:31 · Score: 1, Informative

ICANN!
Mhahahahaha. Yeah. I know, I crack myself up.

You laugh, but ICANN's Internet Assigned Numbers Authority (IANA) has a track record of running countless protocol registries, e.g. port numbers, SNMP private enterprise numbers, MIME types, etc. It seems to make sense to me.
Re:I know! by Bazer · 2007-05-18 02:23 · Score: 1

Why not use a round-robin entry like pool.ntp.org under w3c.org and be done with it?
Re:I know! by Anonymous Coward · 2007-05-18 06:53 · Score: 0

They also have a track record of publicizing contact e-mail addresses, that you are required to provide, in their registries.
Registering a port number, private enterprise number etc is guaranteed to bring you loads of spam (especially 419).
And they are not at all willing to change the situation (not that it would matter anymore).

Centralization by ushering05401 · 2007-05-17 10:09 · Score: 4, Insightful

Nothing too insightful to write, but worth saying in today's volatile political climate. Centralization makes me nervous.

Regards.

Re:Centralization by radarsat1 · 2007-05-17 10:21 · Score: 2, Interesting

Exactly. How about hosting these important files via a decentralized bittorrent tracker?
Of course, that would eliminate the use of a UNIFORM RESOURCE LOCATOR, since it would no longer be centralized.
There needs to be a way to refer to decentralized internet resources in a unique fashion. We need the equivalent of the URL for a file that is hosted simultaneously in many places.
Re:Centralization by Bogtha · 2007-05-17 10:27 · Score: 4, Informative

There needs to be a way to refer to decentralized internet resources in a unique fashion. We need the equivalent of the URL for a file that is hosted simultaneously in many places.

This is known as a URN. URLs and URNs are together known as URIs.

--
Bogtha Bogtha Bogtha
Re:Centralization by TheRaven64 · 2007-05-17 10:28 · Score: 1

Doctypes do not contain a URL indicating the location of the DTD; they include a URI. This URI is typically a URL, but could easily be something else.

--
I am TheRaven on Soylent News
Re:Centralization by kwark · 2007-05-17 10:53 · Score: 2, Informative

You meant something like magnet URIs?
http://en.wikipedia.org/wiki/Magnet:_URI_scheme
Re:Centralization by Anonymous Coward · 2007-05-17 11:41 · Score: 0

Why? It's not as if the XML specification would have to change. People would still be able to host their DTDs wherever they want.
Re:Centralization by Doctor Memory · 2007-05-17 14:38 · Score: 1

Of course, that would eliminate the use of a UNIFORM RESOURCE LOCATOR, since it would no longer be centralized.

Nope, it just means that your torrent tracker would have to have a way to resolve the reference. Whether something like DNS where you have specific "go-to" hosts, or whether you just ask every host you're connected with, or something else (maybe a kind of dynamic mesh with ad-hoc gateways), the choice is up to you.

Maybe something like NTP, where you have the stratum-1 time servers, and then the designated stratum-2 servers, and everyone is encouraged to set up a stratum-3 server for their own subnet. This way nobody's really dependent on anyone else if they don't want to be. Once this gets set up, maybe you could even have a dtd: protocol that specifies how to find a server, and how to cache DTDs once you get them (and how to expire or occlude them when a new version comes out).
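A rough Python sketch of the tiered caching with expiry described above. Nothing here is an existing protocol: the `DtdCache` class, the `upstream` callable and the TTL are all made up for illustration, and serving a stale copy when the upstream is unreachable is an assumption about what such a scheme would want.

```python
import time
from typing import Callable, Dict, Tuple


class DtdCache:
    """One tier: answers from its own cache, asks the next tier up on a miss or expiry."""

    def __init__(self, upstream: Callable[[str], bytes], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._upstream = upstream    # hypothetical: the next tier up, or the origin
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, bytes]] = {}

    def get(self, uri: str) -> bytes:
        now = self._clock()
        hit = self._entries.get(uri)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]            # still fresh: no upstream traffic at all
        try:
            body = self._upstream(uri)
        except OSError:
            if hit is not None:
                return hit[1]        # upstream unreachable: serve the stale copy
            raise
        self._entries[uri] = (now, body)
        return body
```

Tiers chain by passing one cache's `get` as the next one's `upstream`, e.g. `subnet = DtdCache(regional.get, ttl_seconds=600)`, so a subnet-level cache only bothers the tier above it when its own copy has expired.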

--
Just junk food for thought...
Re:Centralization by frisket · 2007-05-17 23:45 · Score: 3, Informative

The defects of the URN/URI/URL mechanism were well known at the time this was discussed in the working groups and SIGs while XML was gestating.
The correct solution would have been to fix the outstanding problems with FPIs and use a combination of local catalog and DNS-style resolution, but this was turned down. Perhaps it's time to wake it up.
In the 1990s I did try to devise a resolution server for FPIs, in the hope that someone like the (then) GCA (now IdeAlliance) -- who were the ISO 9070 Registration Authority and theoretically still are -- would pick up the idea.
I still have the large collection of SGML DTDs used at the time, now largely redundant, but replacing it with current XML is not the problem. This is something that should probably be discussed at the Markup conference in Montreal this summer.

w3c by partenon · 2007-05-17 10:09 · Score: 5, Insightful

w3c.org. There's no better place to keep the standards related to the web.

--
ilex paraguariensis for all

Re:w3c by JordanL · 2007-05-17 10:16 · Score: 4, Funny

There's no better place to keep the standards related to the web.
Some say that wistfully, others begrudgingly.

--
FanFictionRecs.net
Re:w3c by inKubus · 2007-05-17 11:09 · Score: 2, Interesting

What about a distributed file system that works like DNS? Hierarchical servers that each are responsible for a different level of the DTD. The "Root" is a trusted group of servers, which maintain a list of other servers where you can get a copy of the rest of the DTD. Then plugin builders and other sub-entities can have their own server for extensions to the base DTD.

Unfortunately, the DNS method has proven to not necessarily be the best way, with poisoning and stuff that can occur. Of course, it was designed during the days when they didn't just let anyone on the internet. But if you are paranoid you can always diff your copy all the way back to the publisher, or have a signing server (or something MD5ish) sign the DTD.

--
Cool! Amazing Toys.
Re:w3c by bofkentucky · 2007-05-17 11:17 · Score: 1

The TXT record is more than capable of doing this, just like your spf statement for your approved mail exchangers.

--
09f911029d74e35bd84156c5635688c0
Re:w3c by flooey · 2007-05-17 11:23 · Score: 1

w3c.org. There's no better place to keep the standards related to the web.

I'd expand on that and say: whatever organization is responsible for developing the format that the DTD is for. The W3C is responsible for things like XHTML, so they should be hosting the DTD for it. The IETF should have the DTD for Atom. RSS is currently maintained by Harvard and the DTD should be maintained by them.
Re:w3c by J'raxis · 2007-05-17 11:47 · Score: 1

RSS 0.9x was developed by Netscape; having the originator host it, forever, is how we got into this problem in the first place.

--
Liberty in your lifetime
Re:w3c by Anonymous Coward · 2007-05-17 12:27 · Score: 0

That place is for standards. Just because it's an important dtd doesn't mean it's a standard. A better place would be at the GNAA.

hmm... by Anonymous Coward · 2007-05-17 10:10 · Score: 0, Redundant

ICANN do it!

Re:hmm... by pionzypher · 2007-05-17 10:16 · Score: 0, Redundant

Yeah? Well whatever you can do, ICANN do better.

--
I'll believe in corporations having personhood when Texas executes one... - advocate_one
Re:hmm... by Short Circuit · 2007-05-17 12:12 · Score: 1

ICANN do anything better than you.

--
tasks(723) drafts(105) languages(484) examples(29106)

DTD? by mastershake_phd · 2007-05-17 10:10 · Score: 4, Insightful

and DTD stands for? Distributed Technical Dependency?

--
Libertarian Leaning Political Discussion Forum.

Re:DTD? by ralf1 · 2007-05-17 10:16 · Score: 0

Google is your friend. http://en.wikipedia.org/wiki/Document_Type_Definition

--
"Would you, could you, with a goat?" Dr Seuss
Re:DTD? by x_MeRLiN_x · 2007-05-17 10:17 · Score: 4, Informative

Document Type Definition
Re:DTD? by Anonymous Coward · 2007-05-17 10:21 · Score: 0

Death by Text Data
Re:DTD? by Sporkinum · 2007-05-17 11:07 · Score: 4, Funny

It's the sound Carlos Mencia makes...

--
"He's lost in a 'floyd hole"
Re:DTD? by bckrispi · 2007-05-17 11:28 · Score: 1

^If only Mencia were that funny!

--
Xenon, where's my money? -Borno
Re:DTD? by Joebert · 2007-05-17 11:43 · Score: 1

Thank you, my eyes are still watering, great laugh!

--
Wanna fight ? Bend over, stick your head up your ass, and fight for air.
Re:DTD? by Chris Mattern · 2007-05-17 13:39 · Score: 1

I thought it was the sound Twiki makes.

"DTDTDT, that's right, Buck."

Chris Mattern
Re:DTD? by The Iconoclast · 2007-05-18 01:21 · Score: 1

No, that's

"BDBDBDBD"

--
Quando Omni Flunkus Moritati

In case of death... by Kjella · 2007-05-17 10:11 · Score: 4, Insightful

...keep a copy, host it on your own site and reference that instead. There was no problem except that some were using that file to download the definitions. Or just expand the definition to include a checksum and a list of mirrors. Is this even a problem worth solving? I mean except for the slashdot post it seemed to me like this went by without anyone noticing.
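For what "a checksum and a list of mirrors" could look like on the client side, here is a minimal Python sketch. It is an assumption, not anything the XML specification defines: `fetch_verified` and the injected `fetch` callable are hypothetical names, and SHA-256 is simply one reasonable digest to pick.

```python
import hashlib
from typing import Callable, Iterable


def fetch_verified(mirrors: Iterable[str], sha256_hex: str,
                   fetch: Callable[[str], bytes]) -> bytes:
    """Return the first mirror's copy whose SHA-256 matches the expected digest."""
    errors = []
    for url in mirrors:
        try:
            body = fetch(url)
        except OSError as exc:       # mirror unreachable: move on to the next one
            errors.append(f"{url}: {exc}")
            continue
        if hashlib.sha256(body).hexdigest() == sha256_hex.lower():
            return body
        errors.append(f"{url}: checksum mismatch")
    raise LookupError("no mirror returned a valid copy: " + "; ".join(errors))
```

In real use `fetch` might be something like `lambda u: urllib.request.urlopen(u).read()`; it is passed in here so the mirror-walking logic can be exercised without a network.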

--
Live today, because you never know what tomorrow brings

Re:In case of death... by Anonymous Coward · 2007-05-17 10:19 · Score: 0

Many applications have already addressed this with custom DTD readers. Having your mission-critical web application depend on some server you don't control is just asking for trouble.
Re:In case of death... by centinall · 2007-05-17 10:36 · Score: 2, Interesting

What if you're using a 3rd-party library that has references to the DTD, schema or whatever? You don't really want to go through and change all of them.

What if XML files, for instance, are being exchanged between your application and others and they include a DTD that doesn't reside within your domain?

I'm sure there are other scenarios as well.
Re:In case of death... by Anonymous Coward · 2007-05-17 19:36 · Score: 0

It would be nice if third-party libraries supplied the DTD along with their distribution, as Hibernate does. It first looks for the DTD in the local domain.

D2D by Anonymous Coward · 2007-05-17 10:14 · Score: 0

"This got me thinking that many or all of the important DTDs that software and commerce depend on are hosted at various commercial entities. "

Install them into every commercial and consumer router.

Not only DTDs, but also ontology definitions by Anonymous Coward · 2007-05-17 10:18 · Score: 1, Insightful

Such a system should also allow stable storage and management of ontology definitions, used within the semantic web.

I would suggest someone like OSTG or the Mozilla foundation...

Re:Not only DTDs, but also ontology definitions by Achromatic1978 · 2007-05-17 11:10 · Score: 1

I would suggest someone like OSTG or the Mozilla foundation...
Hahaha. You crack me up.
Why?

XML? What? by moderatorrater · 2007-05-17 10:21 · Score: 0, Flamebait

You sound like a PHB who thinks to himself, "XML is a buzzword, I'll bet it'll get the job done."

Re:XML? What? by pete6677 · 2007-05-17 16:50 · Score: 1

Well it is, isn't it? And if it doesn't, there's always XML 2.0

Sane? by DogDude · 2007-05-17 10:25 · Score: 5, Insightful

Well, I wouldn't call it sane if anybody who is actively using XML and needs a DTD isn't hosting it right along with whatever web site they're using the XML for. Relying on somebody else to maintain a critical DTD that you use isn't sane. It's pretty dumb.

--
I don't respond to AC's.

Re:Sane? by sconeu · 2007-05-17 10:48 · Score: 1

Who says you're using XML for a website?

--
General Relativity: Space-time tells matter where to go; Matter tells space-time what shape to be.
Re:Sane? by DogDude · 2007-05-17 12:07 · Score: 2, Insightful

Well, even if you're not, then you should absolutely, positively, and without any doubt, at least in my mind, have a copy of all of your DTDs.

--
I don't respond to AC's.
Re:Sane? by curunir · 2007-05-17 12:26 · Score: 2, Insightful

Exactly. If you write an application that requires a DTD (or XSD for that matter) to parse an XML document, include that file as part of the software. The XML processing code should intercept entity references and load them from the local copy. Not only does this make your application more reliable, it also makes it faster.

Public hosting of schema documents should not be for application use where the application knows ahead of time what kind of document it will be parsing (like the RSS situation). In all likelihood, a change to that schema document will cause an error in the XML parsing anyway, since the parser isn't expecting new or changed elements.

Public hosting of documents should be reserved for editors that create XML documents that must comply with a given format. This allows XML authors to validate their documents against the schema, but nothing breaks when the publicly-hosted document becomes unavailable.
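A minimal sketch of "intercept entity references and load them from the local copy", using Python's standard xml.sax. The identifier, the `BUNDLED_DTDS` table and the one-line DTD are placeholders for this example; a real application would ship the DTD as a file next to its code.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, EntityResolver, feature_external_ges
from xml.sax.xmlreader import InputSource

# Hypothetical bundled copy, keyed by the system identifier used in the DOCTYPE.
BUNDLED_DTDS = {
    "http://example.org/dtd/feed-1.0.dtd": b'<!ENTITY copy "&#169;">\n',
}


class LocalDtdResolver(EntityResolver):
    """Serve external entities from bundled copies and never from the network."""

    def resolveEntity(self, publicId, systemId):
        body = BUNDLED_DTDS.get(systemId)
        if body is None:
            raise xml.sax.SAXException(f"no local copy of {systemId!r}")
        source = InputSource(systemId)
        source.setByteStream(io.BytesIO(body))   # parser reads this, not the URL
        return source


class TextCollector(ContentHandler):
    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def parse_text(document: bytes) -> str:
    parser = xml.sax.make_parser()
    # External lookups are off by default in current Python; switch them on,
    # but route every one of them through the local resolver.
    parser.setFeature(feature_external_ges, True)
    parser.setEntityResolver(LocalDtdResolver())
    handler = TextCollector()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(document))
    return "".join(handler.chunks)
```

Parsing a document whose DOCTYPE names that system identifier should then expand `&copy;` from the bundled declaration, and a DOCTYPE naming anything else fails loudly instead of quietly reaching out to someone else's server.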

--
"Don't blame me, I voted for Kodos!"

No by Bogtha · 2007-05-17 10:25 · Score: 5, Insightful

You shouldn't be using DTDs any more. Validation is better achieved with RelaxNG, and you shouldn't use them for entity references because then non-validating parsers won't be able to handle your code.

For those document types that already use DTDs, either you ship the DTDs with your application, or you cache them the first time you parse a document of that type.

The Netscape DTD issue was caused not by the DTD being unavailable, but by some client applications not being sufficiently robust. You shouldn't be looking at the hosting to solve the problem.

--
Bogtha Bogtha Bogtha

Re:No by Anonymous Coward · 2007-05-17 10:45 · Score: 3, Insightful

The Netscape DTD issue was caused not by the DTD being unavailable, but by some client applications not being sufficiently robust.

Not sufficiently robust is an understatement. ****ing stupid is what I would call it. If every browser had to hit the W3C site for the HTML DTDs every time they loaded a web page, the web would collapse.
Re:No by rholtzjr · 2007-05-17 11:45 · Score: 1

Okay, so what you are saying is that we ship the SAME DTD that is already defined with the application that we provide. ???? WHAT ????

This does not follow OO design methodologies! REUSE!!!! The whole point behind OO design is that we reuse existing components. If we cannot do this then what is the point of OO? If we have defined a DTD that can be used BY the community, then it should be made available FOR the community. The re-distribution of the DTD does not make sense, as it could be altered from one iteration to another. If we suggest this approach, we would have MANY iterations of the SAME DTD.

I am not sure I agree with your assessment of "If you use a DTD, then you must provide it". This does not solve the issue

RelaxNG may provide a mechanism to allow the custom modification of existing schemas, but providing a modification still does not solve the issue of REGISTERING your schema with any application that uses it.

The real problem here is the fact that an XML document REQUIRES a DTD!!!, whether it's a default or custom defined.

If a default/Custom DTD is required for the parsing of an XML documentation, then it SHOULD be provided as a service, as this follows the re-use advantages of an OO design pattern!

In other words, if your application requires parsing of a structure, then this structure MUST be publicly available.

The bigger question is HOW we REMOVE this dependency!!!
Re:No by msuarezalvarez · 2007-05-17 15:11 · Score: 1

From what you write, it is clear that this is among the least of your problems... Anyways: please do not shout as much!

Mod parent up by Mr 44 · 2007-05-17 10:31 · Score: 1

This is just not an issue worth solving...

Don't know what a DTD is? by ryanisflyboy · 2007-05-17 10:34 · Score: 2, Informative

This can help:
http://en.wikipedia.org/wiki/Document_Type_Definition

Don't use them by Anonymous Coward · 2007-05-17 10:35 · Score: 5, Insightful

If the absence of these files will break your app or service, then you need to make your app or service more robust.

Sure, DTD files are necessary for development. If your app requires that they be used to validate something in real time each time it comes in from a client or whatever, then use an internal copy of the version of the DTD file that you support. If the host makes a change to it (or drops it, or lets it get hacked), your app won't break, and you can decide when you will implement and support that change.

I really don't see what is gained by making the real time operation of your application dependent on the availability and pristinity of remotely and independently hosted files. It just makes you fragile, and you can get all the benefits you need from just checking the files during your maintenance and development cycles.

Re:Don't use them by Skreems · 2007-05-17 13:06 · Score: 4, Informative

Exactly. The only point of having a URL associated with a DTD is to assure a unique identifier for each one. It wasn't worth starting a group specifically to regulate DTD identifiers, so they hooked it to a system that's already regulated. Yeah, it's nice to have the DTD live at that location, so if you get a file with a reference to an unfamiliar DTD you can pull it down on the spot, but it shouldn't be required.

--
Slashdot needs a "-1, Wrong" moderation option.
The Urban Hippie

Doctypes are completely broken design. by Ant P. · 2007-05-17 10:36 · Score: 1

The only other language I know of that even allows file sourcing over HTTP is PHP, and there it's a gaping security hole that defaults to off. In everything else, the dependencies *get installed to the local file system*.

Perhaps something like "pool.ntp.org"? by Zocalo · 2007-05-17 10:39 · Score: 4, Insightful

NTP.org maintains a pool of public NTP servers that are accessible via the hostname "pool.ntp.org", so perhaps something similar would work for a global DTD repository. An industry organization with a vested interest (the W3C seems like the most logical) could maintain the DNS zone, and organizations could volunteer some server space and bandwidth to host a mirror of the collected pool of DTDs. Volunteering organizations might come and go, but when that happens it's just a matter of updating the DNS zone to reflect the change, and everyone using DTDs just needs to know that a single generic hostname will always provide a copy of the required DTD.
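From the client's side, a pool hostname is just one name that resolves to several addresses. A small Python sketch of listing them, assuming a made-up pool name such as dtd.pool.example.org (no such pool exists); the resolver is passed in so the function can be tried without touching DNS.

```python
import socket
from typing import Callable, List


def pool_addresses(hostname: str, port: int = 80,
                   resolve: Callable = socket.getaddrinfo) -> List[str]:
    """Return every distinct address behind a pool hostname, in resolver order."""
    addresses: List[str] = []
    for _family, _type, _proto, _canon, sockaddr in resolve(
            hostname, port, type=socket.SOCK_STREAM):
        if sockaddr[0] not in addresses:     # getaddrinfo may repeat an address
            addresses.append(sockaddr[0])
    return addresses
```

A client would try the addresses in turn and move on when one is down, which is what makes volunteers coming and going invisible to everyone else.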

Just a thought...

--
UNIX? They're not even circumcised! Savages!

using non-local cached copy considered harmful by tota · 2007-05-17 10:41 · Score: 4, Interesting

Most tools provide a way to refer to a DTD on a public URL, yet use the local copy instead (e.g. the taglib-location directive in Java).

Doing anything else strikes me as fundamentally dangerous and insecure: it makes a remote DNS vulnerability into an easy application DoS (or worse).

--
TODO: 753) write sig.

Call me crazy... by Nimey · 2007-05-17 10:41 · Score: 4, Interesting

but just have your DTD as a W3C standard, distribute copies with your software, and don't bother a remote server until a new version of the DTD is released. Then distribute it with a new version of your software.

Seriously, what the fuck were they thinking relying on a server to be always available?

--
Hail Eris, full of mischief...

E pluribus sanguinem

Re:Call me crazy... by libkarl2 · 2007-05-17 11:44 · Score: 1

Seriously, what the fuck were they thinking relying on a server to be always available?

I've noticed the trend lately. Folks *want* some server to always be available. They want this so badly, they just go about their business as if the server in question would always be available. Even trained pros, who know better, sometimes think and/or act this way. Especially with regards to systems they can't see, and do not have to maintain. Thus, the Hard & Painful Lessons of Life(tm) still have their place in the world. ;(

--
You are where you are at the time you are there.
Re:Call me crazy... by Megane · 2007-05-17 13:24 · Score: 2, Interesting

Even more stupid is that the URI had a freaking version number in the filename! It's not like someone would update it, and then give it the old version number. It's going to give you the same file even when there's a newer version!

--
#naabhaprzrag, #sverubfr-000, #agi-fcbafberq, negvpyr[pynff*=' negvpyr-ary-'] { qvfcynl: abar !vzcbegnag; }
Re:Call me crazy... by Nimey · 2007-05-17 13:41 · Score: 4, Funny

It's not like someone would update it, and then give it the old version number.

Your trust in the world is cute. :-)

--
Hail Eris, full of mischief...

E pluribus sanguinem

URI vs URL by Sparr0 · 2007-05-17 10:41 · Score: 5, Insightful

A key mistake in your assumptions was brought up when the Netscape fiasco was news, and I will bring it up again...

"http://my.netscape.com/publish/formats/rss-0.91.d td" is a URI. It uniquely identifies a file. It *HAPPENS* to also be the URL for that same file, for now, but that is just a fortunate intentional coincidence. Your software should not rely on or require the file to be located at that URL. /var/dtd/rss-0.91.dtd is a perfectly valid location for the file identified by the URI "[whatever]/rss-0.91.dtd". What we need is for XML-using-software authors to support and embrace local DTD caches, AND package DTDs along with their applications (with the possibility of updating them from the web if neccessary).

It is silly that millions of RSS readers fetch a non-changing file from the same web site every day. It is only very slightly less silly that they fetch it from the web at all.

Re:URI vs URL by Fnkmaster · 2007-05-17 12:11 · Score: 1

Actually, I'd go a step further. It might be useful to actually *not* host the DTD itself at that URI. As I recall, there was never a requirement that DTDs actually be located at the URI if it was treated as a URL.

If instead the URL just returned a page that said: "You can find a copy of the appropriate DTD at the following locations..." and listed them, it would remove the temptation to introduce a programmatic dependency on that URL being live but still give people a way to find that resource, and force developers to map the URI to a file internally in their applications.
Re:URI vs URL by uctechdude · 2007-05-17 14:05 · Score: 1

Agreed... though I don't think it's silly with RSS... it just helps the lazies :D

--
Linux fixes all the cracked Windows.
Re:URI vs URL by treeves · 2007-05-18 06:34 · Score: 1

very slightly less silly
This reminds me of something. . .
Voice-over: Here at Luton it's a three-cornered fight between Alan Jones - Sensible Party; in the middle, Tarquin Fintimlimbimwhinbimlim Bus-stop F'tang F'tang Olé Biscuitbarrel, Silly Party, and Kevin Phillips-Bong, the Slightly Silly candidate.

--
...the future crusty old bastards are already drinking the Kool-Aid.

MOD PARENT UP by timster · 2007-05-17 10:50 · Score: 1

Don't usually do this, but the above comment is the first one in this conversation that explains why this problem doesn't really exist.

--
I have seen the future, and it is inconvenient.

XML Catalogs by Chris Chiasson · 2007-05-17 10:55 · Score: 1

I think there is an OASIS standard called XML Catalogs for redirecting offsite schema requests to a local copy...
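For a feel of what a catalog does, here is a deliberately tiny Python sketch that reads an OASIS-style catalog and maps identifiers to local copies. It only understands `system` and `public` entries and ignores `prefer`, rewrite rules, delegation and `nextCatalog`, so treat it as an illustration rather than a conforming resolver. The local path is the /var/dtd example used elsewhere in this discussion, and the public identifier is included for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"

# Example catalog; the file:// locations are assumptions for this sketch.
CATALOG = """\
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <system systemId="http://my.netscape.com/publish/formats/rss-0.91.dtd"
          uri="file:///var/dtd/rss-0.91.dtd"/>
  <public publicId="-//Netscape Communications//DTD RSS 0.91//EN"
          uri="file:///var/dtd/rss-0.91.dtd"/>
</catalog>
"""


def load_catalog(text: str) -> Dict[str, Dict[str, str]]:
    """Collect the system and public entries of a catalog into lookup tables."""
    root = ET.fromstring(text)
    system = {e.get("systemId"): e.get("uri")
              for e in root.iter(f"{{{CATALOG_NS}}}system") if e.get("systemId")}
    public = {e.get("publicId"): e.get("uri")
              for e in root.iter(f"{{{CATALOG_NS}}}public") if e.get("publicId")}
    return {"system": system, "public": public}


def resolve(catalog: Dict[str, Dict[str, str]], public_id: Optional[str],
            system_id: Optional[str]) -> Optional[str]:
    """Return the local URI for an identifier, or None if the catalog has no entry."""
    if system_id in catalog["system"]:
        return catalog["system"][system_id]
    if public_id in catalog["public"]:
        return catalog["public"][public_id]
    return None
```

The point of the indirection is that documents keep naming the public identifier while each machine decides for itself where the bytes actually live.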

Re:XML Catalogs by holloway · 2007-05-17 11:23 · Score: 1

Yes, you're right, that's the standard way of caching them locally. I'm not sure that all RSS clients are XML processors though.

HTML clients (browsers) don't go requesting the HTML DTD, so it could be said that an RSS client shouldn't either. RSS clients are purer, though, in that they take the DTD's definition of entities literally, so they do need access to the DTD.

But you'd expect clients to cache them, using XML catalogs as you say. They should be packaged with the standard DTDs, a default DTD with all the HTML entities, and only check for updates occasionally without requiring it.

--
-Docvert converts MSWord to OpenDocument, clean HTML

Localized hosting by Alien54 · 2007-05-17 10:59 · Score: 1

What's to stop someone from hosting these files locally, for their own use, on a local server? In some cases this would not be practical, with redirects for downloading, etc., but could this be done for some instances?

--
"It is a greater offense to steal men's labor, than their clothes"

Re:Localized hosting by FLEB · 2007-05-17 11:27 · Score: 1

I think (might be wrong) that most of the problems come from some apps which:

1.) Use the DTD URI to determine a document's type, from a list of known URI/type associations in the application. (For instance, a web browser that checks the DTD to determine whether to render in HTML or XHTML mode.)

and

2.) Validate the document against the DTD from the copy stored at the URI (given that the URI is a URL... it does not necessarily have to be.)

And, if the DTD isn't at the URL (fails on 2), it barfs from not being able to validate the document. However, if the URI is not one from its known list (hosted elsewhere, for instance), it would not know which of its rendering schemes to use to display/process/etc. the document.
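Step 1 on its own needs no network at all, which a short Python sketch makes plain. The table and the `document_kind` name are illustrative; the point is only that the identifier is compared as an opaque string and an unknown one falls back to a default instead of failing.

```python
from typing import Optional

# Illustrative table: identifiers this hypothetical application knows about.
KNOWN_DOCTYPES = {
    "http://www.w3.org/TR/html4/strict.dtd": "html",
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd": "xhtml",
    "http://my.netscape.com/publish/formats/rss-0.91.dtd": "rss-0.91",
}


def document_kind(system_id: Optional[str], default: str = "unknown") -> str:
    """Classify a document by its DOCTYPE identifier; nothing is ever fetched."""
    if system_id is None:
        return default
    return KNOWN_DOCTYPES.get(system_id.strip(), default)
```

An application that stops here, and validates (if at all) against a copy it already holds, never notices whether the URL still resolves.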

--
Information wants to be free.
Entertainment wants to be paid.
You just want to be cheap.
Re:Localized hosting by rfreedman · 2007-05-17 11:33 · Score: 1

Indeed, I've always considered this a must for production applications, particularly intranet applications. The overhead of retrieving the DTD from the web is simply unacceptable in many situations.
Re:Localized hosting by bytesex · 2007-05-17 19:42 · Score: 2, Interesting

Exactly. What always struck me about certain applications is that they do a DTD-conformant XSLT processing step _every_time_ a web page is checked. That means my web app is dependent on the location on the internet being reachable (proxies!! downtime!! all that yummy goodness!!), plus the unacceptable overhead. But.. they merrily keep on making XSLT processors that _will_not_run_ without access to the DTD (I'm looking at you, Java!).

--
Religion is what happens when nature strikes and groupthink goes wrong.
Re:Localized hosting by Anonymous Coward · 2007-05-17 20:03 · Score: 0

Apps with built-in behavior for certain attributes or elements should look for them by qualified name. Checking against a fixed set of DTD or schema URIs is intensely stupid. How are you going to define new attributes or elements or entities if you can't fork the public DTD/schema and use a new URI in your own file: or http: space to point to it? And if it's not a problem that you can't do that, why bother with XML?
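A short Python sketch of looking elements up by qualified name, using the Atom namespace as the example. The `x:rating` extension element and its namespace are invented here to stand in for somebody's private additions.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional

ATOM = "http://www.w3.org/2005/Atom"   # the Atom namespace name

DOC = """\
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:x="http://example.org/ext">
  <title>Example</title>
  <x:rating>5</x:rating>
</feed>
"""


def titles(doc: str) -> List[Optional[str]]:
    """Return the text of every Atom title element, ignoring anything unknown."""
    root = ET.fromstring(doc)
    # ElementTree spells a qualified name as {namespace}localname.
    return [el.text for el in root.iter(f"{{{ATOM}}}title")]
```

The extension element rides along untouched: code keyed on the qualified name neither needs the DTD or schema URI nor breaks when a document adds elements from another namespace.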
Re:Localized hosting by mhall119 · 2007-05-18 07:22 · Score: 1

Apps with built-in behavior for certain attributes or elements should look for them by qualified name.
So that changing the format of the XML document requires a recompile of the app? No thanks.

--
http://www.mhall119.com

Re: Is Dedicated Hosting for Critical DTDs Necess by Anonymous Coward · 2007-05-17 11:04 · Score: 0

Are Critical DTDs Necessary?

As far as I know this is quite an old story.

And we (the Slashdotters) already came to the conclusion that programmers who write code that relies on this kind of external resource need to be fired, because they're obviously incompetent and a danger to the business of their employer.

So, it doesn't really matter if such external resources are hosted one way or the other. You stay away from them. You don't use these external resources in your code.

Uhhm.... I thought we were using XML Schema now??? by SadGeekHermit · 2007-05-17 11:11 · Score: 1, Offtopic

People are still using DTDs? I thought everybody switched to XML Schema a while back. God, I can't keep up with this constant flux!

I need some chinese food. Hmm...

Schezuan!

--
NO CARRIER

Not again by dedazo · 2007-05-17 11:14 · Score: 3, Informative

This has been covered before here and elsewhere... anyone who is using a DTD as a URL rather than a URI needs to be taken out and shot. I say bring them all down and let all the apps that rely on them die or be fixed.

--
Web2.0: I love when people Flickr my cuil and digg my boingboing until my google is reddit and I start to yahoo

Re:Not again by Anonymous Coward · 2007-05-17 20:27 · Score: 0

The goal is to make sure everyone can process the document, so the URI should be a URL for the author's copy of the DTD or schema with any changes they've made. If it doesn't resolve what good is it?

If your answer is "Thou Shalt Not Modify The DTD" your format is not extensible and you shouldn't be using XML at all.
Re:Not again by Anonymous Coward · 2007-05-18 02:18 · Score: 0

BS, it's just an "identifier". Using a URI that points to *your* domain prevents any random bozo from using the same identifier by accident. Registering a MIME type or similar would take too much time and effort. You are definitely not supposed to download the DTD, especially not automagically. People nowadays just fscking can't think globally. They assume ish like "oh well it's only one request per hour at most" and then there are 10 million users, each of whom causes *only* one request per hour. Yippy yay kay, being brain-dead is bliss.

Supply local DTDs with your app by Dragonshed · 2007-05-17 11:23 · Score: 4, Interesting

I recently (within the last year) deployed an application that end users use for downloading and viewing custom content. They are intended to install the app on laptops, tablets, and other portable devices, allowing them to view said content both online and offline.

When prototyping our "offline mode", we ran into this exact same problem because the XML APIs we used wanted to validate XML against online DTDs. We amended the validator's resolver to use locally embedded or cached DTDs for all our doctypes, problem solved.

In my app it was an obvious problem to solve because offline usage was a big scenario, but I could imagine that being "out of scope" for a less-than-robust website.

Centralization of more than DTDs is good. by MikeFM · 2007-05-17 11:23 · Score: 1

The trick is to make centralized copies of important, or oft-used, files available. I wouldn't stop at DTDs. As AJAX, Web 2.0, or whatever you wanna call it grows more popular and demands that users download more and more JavaScript, images, etc., which are often the same files across different websites, it could be very useful to store a copy of those shared files on one server, with caching properly configured, so that users only need to download and store one copy instead of dozens.

You don't have to centralize the originals - just copies. You get the benefits of a centralized resource without the risks of a centralized organization.

--
At what price learning? At what cost wisdom? The price is a man's peace of mind, and the cost is his life.

Re:Centralization of more than DTDs is good. by Pieroxy · 2007-05-18 00:21 · Score: 1

That's very good! So that hackers need to hack this one server and modify a tiny javascript file to screw up thousands of websites... It looks as if security was overlooked in your reasoning. Damn hackers !

--
Write boring code, not shiny code!
Re:Centralization of more than DTDs is good. by MikeFM · 2007-05-22 05:37 · Score: 1

Security on one server, run by someone who knows what they're doing, will be better than what we have now, which is thousands of webservers run by people who have almost no idea what they are doing and no time or money to dedicate to doing it right.

Take a look at what servers get hacked. It isn't often those that are well maintained by trained people with years of experience. It's usually people stupid enough to run a server that hasn't been updated in three years.

--
At what price learning? At what cost wisdom? The price is a man's peace of mind, and the cost is his life.

The DNS root servers are run by... by _iris · 2007-05-17 11:36 · Score: 1

Wikipedia's Root nameserver entry says that 4 of the 13 root nameservers are run by private companies.

Catalog files? by aamcf · 2007-05-17 11:40 · Score: 1

Am I missing something here, or is this problem solved by catalog files? Surely any decent XML parser that can download an external DTD subset from a URI can get the DTD subset via a catalog file?

Re:Catalog files? by gmack · 2007-05-17 14:44 · Score: 1

Or better yet, why can't you just copy the blasted thing to your own site if you're going to use it?

Is there some technical reason I'm not aware of that means it has to stay somewhere central?
Re:Catalog files? by EsbenMoseHansen · 2007-05-17 17:57 · Score: 4, Insightful

Or better yet, why can't you just copy the blasted thing to your own site if you're going to use it?

Is there some technical reason I'm not aware of that means it has to stay somewhere central?
There shouldn't be, yet I would be greatly surprised if some application didn't match on the entire DTD string, hostname and all.
I am equally baffled as to what applications need the DTD for anyway. Except for generic XML applications, what use is a DTD? Most applications only handle a fixed few XML document types anyway.
Finally, if they really need that DTD... any distro has most major DTDs available. No reason why they couldn't carry a few extra. Should be easy to just search for them locally.

--
Religion is regarded by the common people as true, by the wise as false, and by rulers as useful.
Re:Catalog files? by Anonymous Coward · 2007-05-18 03:24 · Score: 0

Yes, either individual organizations, or ISPs, or consortia could cache DTDs and XML Schemas and such - there are several pretty mature (although not well understood or used in many IT spaces) catalog providers for XML parsers. The obvious step is for a few major corporations or open source houses to set up internal systems like this and publish how they did it.

http://www.xml.com/pub/a/2004/03/03/catalogs.html and plenty of other places can tell you how to use a catalog resolver of your very own.

In terms of standards organizations - who's to say that a W3C server is more likely to stay up forever holding some chunk of DTD/Schemas than a Netscape server? It's not a question of finding the One True Universewide Schema Server - it's more about spreading around the information and routing around damaged sections of the net (to overuse and stretch an old meme).
Re:Catalog files? by jmcwork · 2007-05-18 04:41 · Score: 1

The DTD (or more recently schema) should define the allowable content and structure of the XML document for validation purposes. This is supposed to be one of the selling points of XML, being able to verify that a document is valid.
Re:Catalog files? by EsbenMoseHansen · 2007-05-18 07:39 · Score: 1

The DTD (or more recently schema) should define the allowable content and structure of the XML document for validation purposes. This is supposed to be one of the selling points of XML, being able to verify that a document is valid.
Sure, but no sane RSS reader is going to read the DTD, parse it and validate the served XML. Nor will a browser chew through the (humongous) XHTML DTD and then validate the page against it. Rather, such applications will have the structure of XHTML hardwired into them and ignore any unknown tags. The only check would be whether the DTD is known, plus any structure tests that the application chooses to implement --- usually the free ones.
Authoring tools are, of course, another matter, but those would probably want the relevant DTDs locally installed anyway.

--
Religion is regarded by the common people as true, by the wise as false, and by rulers as useful.

sure there is! by commodoresloat · 2007-05-17 11:48 · Score: 2, Funny

What's wrong with this website?

I have a server in my basement we could use. by fyoder · 2007-05-17 11:53 · Score: 4, Funny

Linux box with an uptime of 153 days. It does have to go down now and again so I can clean the dust and cat fur out of it, but that doesn't take too long.

--
Loose lips lose spit.

Re:I have a server in my basement we could use. by Skapare · 2007-05-17 13:27 · Score: 2, Funny

I have an old Sun Sparc 5/70 that still works. Rock solid machine and has OpenBSD loaded on it. I even have a static IP address on my dialup service I could put it on.

--
now we need to go OSS in diesel cars

DTDs, XML entities and the non-breaking space by Darkforge · 2007-05-17 11:56 · Score: 3, Funny

Unfortunately, DTDs aren't just for validation... they're also the only good way to define "entities" (e.g. "&foo;") in XML. This comes up a lot when trying to put HTML in XML feeds, because HTML has a lot of entities that aren't in the XML spec. Specifically, you may notice that you can't type "&nbsp;" in ordinary XML.

It's trivial to define "&nbsp;" yourself in a DTD (<!ENTITY nbsp "&#xA0;">), and many of the standard DTDs out there do define it, but by the XML 1.0 standard it's got to be defined somewhere or else the XML won't parse.
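
A quick way to see this with Python's standard ElementTree parser. The element name and sample text are made up; the entity is declared in the document's internal DTD subset as a decimal character reference:

```python
import xml.etree.ElementTree as ET

# Same content twice: once with no declaration for nbsp, once with the
# entity declared in an internal DTD subset.
WITHOUT_DTD = "<p>a&nbsp;b</p>"
WITH_DTD = '<!DOCTYPE p [<!ENTITY nbsp "&#160;">]><p>a&nbsp;b</p>'

def try_parse(text):
    """Return the root element's text, or a short error description."""
    try:
        return ET.fromstring(text).text
    except ET.ParseError as err:
        return f"parse error: {err}"

print(repr(try_parse(WITHOUT_DTD)))  # fails: the entity is undefined
print(repr(try_parse(WITH_DTD)))     # parses: nbsp expands to U+00A0
```

The internal subset travels with the document, so nothing has to be fetched to resolve the entity.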

--

When I moderate, I only use "-1, Overrated". That way, I never get meta-moderated!

Re:DTDs, XML entities and the non-breaking space by Lachlan+Hunt · 2007-05-17 12:06 · Score: 1

You're better off using decimal or hexadecimal character references instead, or just encoding the file in UTF-8 and using whatever character you need directly. Although, it would have really helped if XML 1.0 had predefined the entire set of entity references defined in HTML4, instead of just amp, lt, gt, quot and apos. Then they all could have been used without a DTD.
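
As a rough sketch of that advice, named HTML entities can be rewritten as numeric references before the text goes into XML. The function name is hypothetical, and the lookup table is Python's html.entities.name2codepoint (HTML 4 names only):

```python
import re
from html.entities import name2codepoint

# The five entities every XML parser already knows; leave these alone.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

def named_to_numeric(text):
    """Rewrite HTML named entities as decimal character references."""
    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # predefined or unknown: keep as written
        return f"&#{name2codepoint[name]};"
    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", replace, text)

print(named_to_numeric("fish&nbsp;&amp;&nbsp;chips"))
```

Unknown names are passed through untouched, so a later parse will still flag them instead of silently losing text.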

--
By reading this signature, you hereby agree with the content of the above comment.

HTML 5 by somethinghollow · 2007-05-17 11:58 · Score: 1

I don't know why important DTDs aren't just turned into serializations. HTML 5 (and, in practice, HTML in general) has a text/html serialization because the major browsers don't care about DTDs. It seems like well-published specifications like RSS should just be serialized and DTDs ignored, even though they are presented, instead of breaking when the DTD can't be found. I guess that wouldn't work if a generic XML parser was used for RSS, but for RSS readers, the DTD shouldn't matter.

A few problems with RelaxNG validation by wowbagger · 2007-05-17 11:58 · Score: 1

The last time I checked, there is no mechanism by which an XML file can provide a link to the corresponding RelaxNG schema in the same way that it can provide a DTD.

Thus, while an application which expects files conforming to a specific schema can validate against that schema, it is not possible for a program to validate an arbitrary XML file. For example, there is no way xmllint can automatically find the related RelaxNG schema, in the same way that it can find the DTD.

If I am wrong, and there is a way to provide the schema, please enlighten me.

--
www.eFax.com are spammers

Re:A few problems with RelaxNG validation by SimHacker · 2007-05-17 13:59 · Score: 1

There's a reason for that!
Here's a discussion about it on the Relaxng-user mailing list:
http://relaxng.org/pipermail/relaxng-user/2003-October/thread.html
>> I'm a relatively new "convert" (from XML Schema) to RELAX NG. I understand that there is no standard way to associate a RELAX NG schema with a document. I'm just wondering if there is any plan to make this possible.
> Not really. The theory is that you might want to validate a document against different schemas for different purposes, and no one schema is really preferred.
James Clark weighs in with his usual clarity:

>> In simpler words, the people who designed the technology don't see a consistent way to formally express an association that already exists, or didn't implement it yet.
> It's part of the general problem of specifying appropriate XML processing; an RNG-specific solution is neither particularly general nor, IMHO, particularly useful.
I would divide the problem of specifying appropriate XML processing for a document into:
(a) how to specify the process to be performed
(b) how to locate the appropriate processing specification
I see (b) as a special case of the problem of how to specify rules that, given an XML document, find a related resource. This is the problem that the XML vocabulary that I've designed for nXML mode is intended to solve. It's not specific to RELAX NG or for that matter to schemas. You could use the same vocabulary to describe how to find the XSLT stylesheet to use to display an XML document.
[...]
Although it's important to be able to individually specify the schema to use for a particular document, it's also convenient to be able to specify rules that apply to classes of document. For example, on my system I have a rule that says when the namespace URI of the document element is http://relaxng.org/ns/structure/1.0, then the schema is /home/jjc/schema/relaxng.rnc.
James
> Hum, that's a place where I would expect the XML Catalogs to take a role in abstracting the file paths.
I think that's an independent issue. If you are in an environment that has a policy of using XML catalogs for URI remapping in XML-related contexts, then it would make sense to use them for remapping both URIs occurring in include/externalRef in schemas and URIs occurring in locating files. However I don't see any need to explicitly couple locating files to catalogs. My personal opinion is that, although XML catalogs are an appropriate solution to the problem of publicId-to-URI mapping, using XML catalogs to perform URI-to-URI mapping is an XML-specific solution to a non-XML-specific problem.
James
> Thus, for me the only reasonable choice is still to use the DOCTYPE declaration for all associations
If you want to use DOCTYPEs, the nXML method can accommodate you (by doctypePublicId rules). However, I find the problems of using DOCTYPEs worse by far than the problem of associations disappearing on a rename. And even with DOCTYPEs, you can still get the problem of the association changing; you still have to associate your DOCTYPEs with schemas. If you force me to put something in the instance, I would much prefer a processing instruction.
There's no single right way to do the association. Different users will legitimately prefer different approaches. A solution needs to be flexible enough to accommodate them.
James
> My opinion is that this association should be obligatory once present and could not be overridden.
It's a basic tenet of RELAX NG that the schema is not inherent in the document and that validation is a process that has two independently-specifiable inputs. Section 8 of the spec says: "A conforming RELAX NG validator must be a

--
Take a look and feel free: http://www.PieMenu.com

DTDs are Useless by Lachlan+Hunt · 2007-05-17 11:58 · Score: 1

Quick, someone register http://all.your.dtds.are.belong.to.us/ :-)

Seriously though, we don't need dedicated hosting for DTDs. We need XML language spec writers, authors and user agent vendors to realise that DTDs are useless. Web browser vendors realised this a long time ago. No browser ever read HTML's SGML DTDs, and they do not use validating parsers for XHTML either (although, they use a hack to parse a subset of the DTD to handle XHTML and MathML entity references).

DTDs are bad for several reasons:

DTDs pollute the document with schema-specific syntax. Since the document itself declares the rules, the question answered by DTD validation is not the question that should be asked. DTD validation answers the question "Does this document conform to the rules it declares itself?" The interesting question is "Does this document conform to these rules?" when the person who asks the question chooses the rules the question is about.

DTDs mix a validation mechanism, an inclusion mechanism and an infoset augmentation mechanism. The inclusion mechanism is mainly used for character entities, which solve (but only if the DTD is processed, and processing it is not required!) an input problem by burdening the recipient instead of keeping input matters between the editing software and the document author.

DTDs aren't particularly expressive.

DTDs don't support Namespaces in XML.

Plus, if a UA needs to request the DTD every time it parses the file, that adds significant overhead by the time it fetches the DTD, parses it and checks the document for validity. It's just not worth it. The Netscape RSS DTD issue was a mistake, and it's time to learn from that. There are much better alternatives available for validating XML than DTDs, such as RelaxNG or Schematron.

--
By reading this signature, you hereby agree with the content of the above comment.

Re:DTDs are Useless by nagora · 2007-05-17 23:57 · Score: 1

We need XML language spec writers, authors and user agent vendors to realise that DTDs are useless
That would involve them not being idiots. Not going to happen.
TWW

--
"Encyclopedia" is to "Wikipedia" what "Library" is to "Some people at a bus stop"

Isn't this addressed already? by Talchas · 2007-05-17 12:01 · Score: 1

Isn't this what doctypes like this are for:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

That whole PUBLIC thing means that the browser can have its own copy so that it doesn't have to fetch it off the website. Is there a reason that this is not the standard way of doing this?

--
As the Americans learned so painfully in Earth's final century,free flow of information is the only safeguard against...

Re:Isn't this addressed already? by nevali · 2007-05-19 10:45 · Score: 1

No.

The URI there (http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd) doesn't have to be a URL. It could be a URN, or some other kind of URI.

In other words, it's just an identifier--using a URL was just a nice easy way of making sure it was unique.
Re:Isn't this addressed already? by Anonymous Coward · 2007-05-26 23:16 · Score: 0

XML-based formats are supposed to be extensible. At any time I can fork a schema or DTD and publish my tweaked version. An opaque identifier that doesn't enable the parser to fetch and cache the proper schema or DTD is unusable, and in fact doing so is why the SGML doctype declaration supported (file)system and public identifiers in the first place (rather than just a top-level element name and version number).

short answer: no by coaxial · 2007-05-17 12:03 · Score: 3, Insightful

Validation is overrated, especially when it comes to RSS. There are so many competing "compatible" standards that really aren't. feedparser.org has a great write-up about the state of RSS. It's pathetic.

If you're reading a doc, don't bother validating it. You're probably going to have to handle "invalid" XML anyway. When you're constructing XML, you should write it according to the DTD, but if you're relying on a remote site, then you're asking for trouble. Just cache a copy locally. But seriously, your tool shouldn't really need it. Your engineers do, but not the tool.

Finally, it's trivial to reconstruct a dtd from sample documents.

Re:short answer: no by JustNiz · 2007-05-17 12:12 · Score: 1

>> Finally, it's trivial to reconstruct a dtd from sample documents.

But it won't be the same DTD as the one used to create the documents, which is probably the 'standard' one.
Re:short answer: no by coaxial · 2007-05-17 12:46 · Score: 1

But it doesn't matter. It's the one that's actually used. Real world data always trumps ideal data.
Re:short answer: no by KermodeBear · 2007-05-17 15:54 · Score: 2, Interesting

Off-topic gripe, but:
If you're reading a doc, don't bother validating it. You're probably going to have to handle "invalid" XML anyway.
I did work developing a large XML-based integration with the mortgage lender AmeriQuest. Boy, did they have interesting ideas on what valid XML is! I had to deal with fun things like:

<tag />data</tag> - An empty tag being used for an opening tag
</tag>data</tag> - A closing tag being used for an opening tag
<tag>data<tag> - The opposite problem - two opening tags!
<tag attribute="data" attribute="data2" attribute="data3"> - The same attribute appearing multiple times because they wanted to send multiple values. God forbid they create child nodes!

and other fun things. Imagine my frustration when I was told, "The customer is always right, if that is what they are sending then find a way to handle it..." Different XML events had special processing to turn the invalid XML into a well-formed document so that it could be parsed. Ugh.
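
For what it's worth, each of those samples is rejected by a conforming parser. A small check using Python's standard ElementTree; the is_well_formed helper is hypothetical and the last sample is a well-formed control added for comparison:

```python
import xml.etree.ElementTree as ET

SAMPLES = [
    "<tag />data</tag>",      # empty tag used as an opening tag
    "</tag>data</tag>",       # closing tag used as an opening tag
    "<tag>data<tag>",         # two opening tags, nothing closed
    '<tag attribute="data" attribute="data2" attribute="data3"></tag>',
    "<tag>data</tag>",        # well-formed control case
]

def is_well_formed(text):
    """True if the parser accepts the text as a complete XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

results = {sample: is_well_formed(sample) for sample in SAMPLES}
for sample, ok in results.items():
    print("ok " if ok else "BAD", sample)
```

Only the control case passes, which is why input like this needs repairing before a real XML parser can touch it.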

AmeriQuest had farmed all the work out to an Indian outsourcing firm. You get what you pay for...

--
Love sees no species.
Re:short answer: no by coaxial · 2007-05-17 16:47 · Score: 1

AmeriQuest had farmed all the work out to an Indian outsourcing firm. You get what you pay for...

I'll admit, that's immediately what I thought when you described the code. It reminded me of what a friend of mine said when he had to interface with some Indian outsourced code. In dealing with them, he said, "They have all the arrogance and skill of a freshman CS student."

That's not to say Indians can't code; anyone who's been in any large western company or university knows they can. The problem is with the outsourcing firms. Many have crappy "engineers" in an effort to lower costs and cash in on a growth industry. It's a standard economic phenomenon.
Re:short answer: no by JustNiz · 2007-05-18 07:38 · Score: 1

You're putting the cart before the horse; otherwise, why bother with a DTD at all?

The purpose of a DTD is to define what the legal/agreed format of the data is, not what format the data is actually in.

You can't deduce the DTD from the XML itself, as the XML may be illegally/incorrectly formatted or otherwise corrupted.

Also, the XML file you create the DTD from may not happen to use all the legal variations of allowed formats, so you won't get a complete DTD anyway.
Re:short answer: no by coaxial · 2007-05-18 11:44 · Score: 1

The purpose of a DTD is to define what the legal/agreed format of the data is, Not what format the data is actually in.

Huh?

A DTD is a computer-readable formal spec of a document structure. Why it needs to be computer-readable is beyond me. All it does is allow for pedantic software, which really isn't a desirable quality, except in a lint tool.

Practically speaking, the easiest way to spec an XML document is simply to construct an example document using every bell and whistle available in the system.

You can't deduce the DTD from the XML itself as the XML may be illegally/incorrectly formatted or otherwise corrupted.

You are simply, utterly, and completely wrong.

Malformed and invalid XML can be handled through simple regexps. How do you think HTML has been handled since day 1? You can even correct the XML prior to sending it to a true parser if you want, but you don't have to.

Furthermore, the literature is full of techniques to infer DTDs from example XML documents. It's a standard problem and there's a standard solution.

Also the XML file you create the DTD from may not happen to use all the legal variations of allowed formats, therefore you won't get a complete DTD anyway.

True, but you just keep sampling documents until you get a good enough DTD.

Say you're sampling XHTML documents. Does it really matter if you don't find the "loz" entity? Of course not. No one uses it. If it was relevant, and you had a representative sample, you would have seen it. Constructing the true DTD isn't necessary. You only need the parts you actually use. Even if you do come across something you missed, you simply throw a warning and ignore it. It's called failing gracefully. That's what you're supposed to do anyway; not only in the XML world, but the engineering world in general.
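
A toy sketch of the sampling idea, assuming well-formed samples and ignoring ordering, cardinality, and entities. The function names are hypothetical and the output is only a loose approximation of a DTD, not one of the inference techniques from the literature:

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def infer_structure(documents):
    """Collect, per element name, the child elements and attributes seen."""
    children = defaultdict(set)
    attributes = defaultdict(set)
    for text in documents:
        for elem in ET.fromstring(text).iter():
            children[elem.tag].update(child.tag for child in elem)
            attributes[elem.tag].update(elem.attrib)
    return children, attributes

def to_dtd(children, attributes):
    """Emit loose declarations: any seen child, any order, any count."""
    lines = []
    for tag in sorted(children):
        kids = sorted(children[tag])
        model = "(#PCDATA | " + " | ".join(kids) + ")*" if kids else "(#PCDATA)"
        lines.append(f"<!ELEMENT {tag} {model}>")
        for attr in sorted(attributes[tag]):
            lines.append(f"<!ATTLIST {tag} {attr} CDATA #IMPLIED>")
    return "\n".join(lines)

# Made-up sample feeds; more samples widen the inferred declarations.
samples = [
    '<rss version="0.91"><channel><title>t</title></channel></rss>',
    "<rss><channel><item/></channel></rss>",
]
print(to_dtd(*infer_structure(samples)))
```

Each extra sample can only add children or attributes, which matches the "keep sampling until it's good enough" approach described above.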

DTDs are a complete nonissue.

EXACTLY by wowbagger · 2007-05-17 12:03 · Score: 4, Insightful

Exactly right, but it is even worse than that:

A DTD spec SHOULD have both a PUBLIC identifier and a SYSTEM identifier. The system identifier is strongly recommended to be a URL so that a validating parser can fetch the DTD if the DTD is not found in the system catalog.

The system catalog is supposed to map from the PUBLIC identifier to a local file, so that the parser needn't go to the network.

If you are running a recent vintage Linux, look in /etc/xml/ - there are all the catalog maps for all the various DTDs in use.

So:

The application writers SHOULD have added the DTDs to the local system's catalog.
Failing that, the application SHOULD have cached the DTD locally the first time it was fetched, and never fetched it again.
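
The lookup order described above (catalog, then local cache, then a single fetch) could be sketched like this. DtdStore and fake_fetch are hypothetical names, the in-memory dicts stand in for files on disk, and no real network access happens:

```python
from typing import Callable, Dict

class DtdStore:
    """Resolve a DTD: system catalog first, then local cache, then one fetch."""

    def __init__(self, catalog: Dict[str, bytes], fetch: Callable[[str], bytes]):
        self.catalog = catalog             # PUBLIC identifier -> local DTD bytes
        self.fetch = fetch                 # injected downloader (hypothetical)
        self.cache: Dict[str, bytes] = {}  # SYSTEM identifier -> fetched bytes

    def get(self, public_id: str, system_id: str) -> bytes:
        if public_id in self.catalog:      # 1. catalog hit, no network at all
            return self.catalog[public_id]
        if system_id not in self.cache:    # 2. first miss: fetch exactly once
            self.cache[system_id] = self.fetch(system_id)
        return self.cache[system_id]       # 3. every later request is local

calls = []

def fake_fetch(url):
    """Stand-in for an HTTP download; records each call it receives."""
    calls.append(url)
    return b"<!-- fetched DTD -->"

XHTML_PUBLIC = "-//W3C//DTD XHTML 1.0 Transitional//EN"
store = DtdStore(catalog={XHTML_PUBLIC: b"<!-- local DTD -->"}, fetch=fake_fetch)
```

Asking the store for an uncatalogued DTD twice triggers fake_fetch only once; a real application would persist the cache to disk so it survives restarts.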

--
www.eFax.com are spammers

Builtin DTDs everywhere! by darthflo · 2007-05-17 12:13 · Score: 1

Now I may not have quite grasped the importance of DTDs, but I can think of only one scenario where retrieving a DTD from a to-be-determined location would be useful: validating XML against an arbitrary DTD. (Solution: whoever wants to validate will also provide the DTD.)
To my knowledge, any other application could just depend on built-in DTDs for validating the formats it knows and not care about whatever it doesn't know, as it wouldn't be able to use them intelligently anyway.

Did I forget to take into account one of those nice tiny little huge details somewhere?

ICANN song. by Short+Circuit · 2007-05-17 12:26 · Score: 1

I'm sorry...it just has to be done. Source should be obvious. But I butchered it horribly because I kept getting pwned by the line length filter.

Anything you can do, ICANN do better./ICANN do anything Better than you.

No, you can't./Yes, ICANN. No, you can't./Yes, ICANN. No, you can't./Yes, ICANN, Yes, ICANN!

Anything you can be ICANN be greater./Sooner or later, I'm greater than you.

No, you're not. Yes, I am./No, you're not. Yes, I am./No, you're NOT!. Yes, I am./Yes, I am!

ICANN shoot a partridge With a single cartridge./ICANN get a sparrow With a bow and arrow.
ICANN live on bread and cheese.
And only on that?/Yes./So can a rat!
Any note you can reach ICANN go higher.
ICANN sing anything Higher than you.
No, you can't. (High)
Yes, ICANN. (Higher) No, you can't. (Higher)
Yes, ICANN. (Higher) No, you can't. (Higher)
Yes, ICANN. (Higher) No, you can't. (Higher)
Yes, ICANN. (Higher) No, you can't. (Higher)
Yes, ICANN! (Highest)

Anything you can buy ICANN buy cheaper./ICANN buy anything Cheaper than you.

Fifty cents?/Forty cents! Thirty cents?/Twenty cents! No, you can't!
Yes, ICANN, Yes, ICANN!

Anything you can say ICANN say softer./ICANN say anything Softer than you.
No, you can't. (Softly)
Yes, ICANN. (Softer) No, you can't. (Softer)
Yes, ICANN. (Softer) No, you can't. (Softer)
Yes, ICANN. (Softer)
YES, ICANN! (Full volume)
ICANN drink my liquor Faster than a flicker./ICANN drink it quicker And get even sicker!

ICANN open any safe.
Without bein' caught?/Sure./That's what I thought--you crook!

Any note you can hold ICANN hold longer.ICANN hold any note Longer than you.

No, you can't.
Yes, ICANN No, you can't/Yes, ICANN No, you can't.
Yes, ICANN
Yes, I-I-I-I-I-I-I-I-I No, you C-A-A-A-A-A-A-A-A-A-A-A-A-N'T--
CA-A-A-A-N! (Cough, cough!)
Yes, you ca-a-a-an!

Anything you can wear ICANN wear better./In what you wear I'd look better than you.
In my coat?/In your vest! In my shoes?/In your hat! No, you can't!/Yes, ICANN Yes, ICANN!

Anything you say ICANN say faster./ICANN say anything Faster than you.
No, you can't. (Fast)
Yes, ICANN. (Faster) No, you can't. (Faster)
Yes, ICANN. (Faster) Noyoucan't. (Faster)
YesIcan! (Fastest)

ICANN jump a hurdle./ICANN wear a girdle.
ICANN knit a sweater./ICANN fill it better!
ICANN do most anything!/Can you bake a pie?
No./Neither can I.
Anything you can sing ICANN sing sweeter./ICANN sing anything Sweeter than you.

No, you can't. (Sweetly)
Yes, ICANN. (Sweeter) No, you can't. (Sweeter)
Yes, ICANN. (Sweeter) No, you can't. (Sweeter)
Yes, ICANN. (Sweeter) No, you can't, can't, can't (sweeter)
Yes, ICANN, CAN, CAN (Sugary)

Yes, ICANN! No, you can't!

--
tasks(723) drafts(105) languages(484) examples(29106)

URLs were never sane by trimbo · 2007-05-17 12:28 · Score: 1

Think about it.

A URL has:

A hostname
A PORT number
A path on that machine

The only one of those that the machine itself has any control over hiding from the user is the path, which can be virtualized. However, many aren't. DTDs certainly don't seem to be.

A distributed system for this kind of mission-critical information is what we need. Think DNS for documents, rather than just hosts.

XML catalog files let your app use local copies... by KarmaRundi · 2007-05-17 13:06 · Score: 3, Informative

You can map public and system identifiers to local resources. Use them for dtds, schemas, stylesheets, etc. Here's the spec. Google for more information.
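
A minimal sketch of such a mapping, reading an OASIS-style catalog with Python's standard ElementTree. The local file paths are invented, and the lookup here simply tries system entries and then public entries; the real spec has further rules (prefer, rewriting, delegation) that this ignores:

```python
import xml.etree.ElementTree as ET

CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"

# Example catalog; the file:// targets are made up for illustration.
CATALOG_XML = f"""\
<catalog xmlns="{CATALOG_NS}">
  <public publicId="-//W3C//DTD XHTML 1.0 Transitional//EN"
          uri="file:///usr/share/xml/xhtml1-transitional.dtd"/>
  <system systemId="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"
          uri="file:///usr/share/xml/xhtml1-strict.dtd"/>
</catalog>
"""

def load_catalog(text):
    """Return (public_map, system_map) built from the catalog entries."""
    root = ET.fromstring(text)
    public = {e.get("publicId"): e.get("uri")
              for e in root.iter(f"{{{CATALOG_NS}}}public")}
    system = {e.get("systemId"): e.get("uri")
              for e in root.iter(f"{{{CATALOG_NS}}}system")}
    return public, system

def resolve(public_id, system_id, catalog):
    """Map identifiers to a local URI, or None if the catalog has no entry."""
    public, system = catalog
    if system_id in system:
        return system[system_id]
    return public.get(public_id)

catalog = load_catalog(CATALOG_XML)
print(resolve("-//W3C//DTD XHTML 1.0 Transitional//EN",
              "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd",
              catalog))
```

Returning None for unknown identifiers leaves the policy decision (fail, warn, or fetch) to the caller.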

DNS? by saltmiser · 2007-05-17 13:21 · Score: 1

Do it like the DNS system, have a bunch of companies (*cough* google yahoo verizon mozilla microsoft (or not) *cough*) host this stuff, I doubt they'll all go bankrupt at the same time ;P

DTD Critical Hosting by liothen · 2007-05-17 13:22 · Score: 2, Insightful

Why doesn't the content provider just provide the DTD? Why worry about caching it or random errors popping up in it, when the DTD can be stored on the very same server as the website, or stored with the application? Then it doesn't matter if another company screws up or if some malicious hacker decides to attack the DTD; it doesn't affect your product...
Some might think, well, what if it changes?
Well, it's obvious: download the new one and update your XHTML/XML or application for the specific changes.

Think first, implement later by Anonymous Coward · 2007-05-17 13:49 · Score: 0

Using URLs is just a non-bureaucratic way to avoid name clashes, which is rather clever. However, using http:// as the prefix is rather brain-dead, because all the other brain-dead people will assume that you have to fetch something from this URL. It would have been smarter to add a dedicated URN prefix for this, like "namespace:", "spec:" or "whatever:".

yes: XML Catalogs; no: DTD document hosting by jmaline · 2007-05-17 13:54 · Score: 1

Kind of, but not really.

Yes, XML catalogs are the answer.

Nothing in the XML specs says that any actual document is hosted at the URI. It's a mechanism to specify a globally unique identifier. It's an identifier, not a promise to host a document. Some folks host the DTD document at the URI, but there's no requirement to do that.

While I'm sure not every RSS client uses a high-quality XML implementation, it seems clearly true that every RSS client is an XML processor. RSS is an XML format. So an RSS client is, um, processing XML...

As for checking for updates, that's a non-issue. Remember, the URI is a unique identifier. If you ever update the DTD, you'd generate a new URI.

Re:yes: XML Catalogs; no: DTD document hosting by holloway · 2007-05-17 18:16 · Score: 1

Nothing in the XML specs says that any actual document is hosted at the URI. It's a mechanism to specify a globally unique identifier. It's an identifier, not a promise to host a document. Some folks host the DTD document at the URI, but there's no requirement to do that.
For the XMLNS yes, but for the DOCTYPE no. The system resource field in a doctype does refer to a system path or a url, and there's no way to resolve non-default XML entities. You could assume HTML entities, as most RSS parsers do.
While I'm sure not every RSS client uses a high-quality XML implementation, it seems clearly true that every RSS client is an XML processor. RSS is an XML format. So an RSS client is, um, processing XML...
They're processing tags, but not everything that touches XML qualifies as an "XML processor" (as per the W3C definition). An XML processor must fail when faced with an unresolvable entity or invalid XML. Many RSS parsers, such as the Universal Feed Parser, do not read RSS as XML but through string manipulation, regexes, and complex parsing logic. They're taking a leaf from HTML parsers and doing the best they can with the given document in order to be more robust. They tend to dynamically replace entities through string replacement rather than resolving external DTDs. This is backed up by Chris Finke of Netscape, who said:
Theoretically, RSS readers load this file when parsing an RSS 0.91 feed. However, In practice, most readers (including those built into Firefox and Internet Explorer) either just ignore the file or load their own cached copy. The unavailability of this file had the effect of causing certain feed readers - Microsoft's Live.com RSS gadget, for one - to refuse to display RSS 0.91 feeds.
So, there are some RSS parsers that resolve DTDs, and some that don't. The ones that want to parse a DTD to resolve entities should cache it, as I was saying.

--
-Docvert converts MSWord to OpenDocument, clean HTML

the best host is localhost by rickla · 2007-05-17 14:07 · Score: 1

Having anything in a live project linking externally is insane! I never understood how developers can risk this.

We use Maven, along with DTDs, schemas, WSDL, etc. Many of the WSDL and other files refer to online locations. We download these and alter the references to be local. Otherwise a build could fail because of an internet issue, which is just nuts.

Same with Maven: we have our own local repository where we keep the subset of what we use. Again, same situation. And in these cases it's just for building; I can't imagine doing this on a live site. This especially goes for externally referenced JavaScript... local copies are your friend.

Re: W3C Not a Bad Idea by Douglas+Goodall · 2007-05-17 15:27 · Score: 1

The real question is whether the Internet users at large would help pay for robust infrastructure at w3 to support this.

Well... by frank_adrian314159 · 2007-05-17 15:47 · Score: 1

Just flush XML and then it wouldn't be an issue...

--
That is all.

guess who will want control over it by mr_musan · 2007-05-17 16:15 · Score: 1

I am all for it, but such a thing needs to be international, NOT in the control of a company based in any country *cough* ICANN *cough*

Missing 4ml.org by Sivaraj · 2007-05-17 16:51 · Score: 1

Just the other day, I was looking for a DTD for 4ML related to music and lyric notations. But the website is not working. Most probably the guy got bored with it and forgot to pay the hosting company.

We definitely need some sustainable way to host the DTDs.

~Sivaraj

The security implications are extremely ugly by knorthern+knight · 2007-05-17 16:58 · Score: 2, Insightful

1) There are some sensitive environments (military, etc.) where you simply do *NOT* connect your internal network to "teh interweb". No ifs, ands, ors, or buts. The result is a broken browser wherever the DTDs are required.

2) Remember the incident where popular "safe" Superbowl sites were compromised and laced with malware-installing code? What happens to millions of Firefox-on-Windows users when a bunch of Russian mobsters or Chinese government agents hijack a DTD host and load it with a zero-day Windows exploit?

3) Remember "pharming", where DNS servers are hijacked to redirect *CORRECTLY TYPED URLS* to malware-infested sites? Even if the bad guys can't hijack the DTD host, they can still hijack Windows-based DNS servers (ptui!) and anybody who relies on them gets redirected to a malware-install site.

That's the problem; here's my solution. It's composed of two parts.

A) DTDs will be *LOCAL FILES ON YOUR WORKSTATION* (excepting "thin clients").

B) Browsers (or possibly operating systems) will include new DTDs with updates. On POSIX OSes (*NIX, BSD), DTDs will be stored in /etc/dtd/ and users will be able to add their own DTDs in ~/.dtd. Windows will have its own locations. When you get your regular update for your browser (or alternatively, your OS), part of the update will be any new DTDs. There will be a separate file for each DTD and version, so that your browser can properly handle multiple tabs opening to sites using different versions of the same base DTD.
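
The lookup proposed in A) and B) could be sketched roughly as follows. The comment only names the two directories; the search order (per-user directory before the system one) and the function name are assumptions made for this example.

```python
import os


def find_local_dtd(filename, search_dirs=None):
    """Return the first existing path for `filename`, or None if no copy is installed."""
    if search_dirs is None:
        # Assumed order: the user's own DTDs take precedence over system ones.
        search_dirs = [os.path.expanduser("~/.dtd"), "/etc/dtd"]
    for directory in search_dirs:
        candidate = os.path.join(directory, filename)
        if os.path.isfile(candidate):
            return candidate
    # Nothing installed locally; the caller decides what to do, with no network fallback here.
    return None
```

With one file per DTD and version, the caller would pass a versioned file name (the naming scheme is left open by the comment) and never touch the network.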

--

I'm not repeating myself
I'm an X window user; I'm an ex-Windows user

What organization? by amper · 2007-05-17 19:37 · Score: 1

Why, the same organization that should probably be responsible for *all* critical Internet infrastructure standards, just as it is responsible for the standards relating to telecommunications and radio communications.

The ITU (also here).

Go ahead, laugh, but I think it's long past time for control of such functions as DNS, NTP, assigned numbers, et cetera, to be transferred out of the hands of primarily US-based corporations and loosely coupled organizations such as the IETF and IANA and into the hands of some sort of international treaty organization.

Since the ITU not only fits this description, but in fact was founded to deal with precisely these sorts of issues, why not let it do what it does for the Internet as well?

Decentralization (Much better) by drakoo · 2007-05-17 21:40 · Score: 1

The best way would be to use a decentralized information storage system like ed2k: URI scheme or Magnet: URI scheme
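
To make the content-addressing idea concrete, here is a small sketch that builds a magnet link from a DTD's bytes and checks a downloaded copy against it. It assumes the common `urn:sha1` form with a Base32-encoded SHA-1 digest; the function names and the sample DTD are hypothetical.

```python
import base64
import hashlib
from urllib.parse import parse_qs, quote


def _sha1_base32(data):
    """Base32-encoded SHA-1 digest of `data`, as used in urn:sha1 magnet links."""
    return base64.b32encode(hashlib.sha1(data).digest()).decode("ascii")


def magnet_for(data, name):
    """Build a magnet link whose exact topic (xt) identifies `data` by its hash."""
    return "magnet:?xt=urn:sha1:" + _sha1_base32(data) + "&dn=" + quote(name)


def verify(data, magnet):
    """Return True if `data` hashes to the digest named in the magnet link."""
    query = magnet.split("?", 1)[1]
    topic = parse_qs(query)["xt"][0]      # e.g. "urn:sha1:<digest>"
    expected = topic.rsplit(":", 1)[1]
    return _sha1_base32(data) == expected
```

Because the link names the content rather than a host, any mirror can serve the file and the client can still check that it received the right bytes.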

Stupid question :) by NekoXP · 2007-05-17 22:44 · Score: 1

What organization would be the likely custodian of such data?

Is it not obvious that it may as well be the W3C? XML is their standard, so operating a registry for public-use DTDs would be a rather reasonable service to provide.

Bad idea any way you slice it by Yogs · 2007-05-18 00:10 · Score: 1

As many people have alluded to already, it's an incredibly bad idea to make your application have an unnecessary dependency on an external service. Keep a local copy, just copy it down once and you're done, simple as can be.

But maybe there are urls out there pointing to "the latest and greatest" version, rather than a specific version, and you like the idea of using "the latest and greatest". So, think for a moment what happens when the DTD/schema changes. Is your app magically going to change how it deals with the xml at the same time? Of course not!

So, until you can get out a patch, you'd be refusing xml docs your code/xsl has been built to handle, and possibly letting in xml docs that your code/xsl has not been built to handle. Whereas, if you just kept a local DTD/schema, you would have no trouble keeping it, and the code/xsl behind it, in sync.

Mirror them yourself by Anonymous Coward · 2007-05-18 01:56 · Score: 0

If everyone mirrored every DTD they need, we wouldn't have this problem. They aren't large; just mirror the ones you use.

Security is key and is solution by fsbogus · 2007-05-18 06:58 · Score: 1

The DTD is great for doing the development. It is horrible for having to validate a transaction each and every time.

Think about this. Why should you pull down a DTD each time you go to validate a transaction? Does your transaction dynamically change each time or does it stay the same for long periods of time? Likely it stays the same.

Additionally, if you are referencing a DTD that is external to your environment, why the hell would you trust that DTD? How do you verify that it is the correct DTD? There is enough cracking and whatever else out there that somebody could sneak in and change the DTD without telling anybody, and things might still seem to work, or might not. Perhaps the change introduces a hole into your system or somebody's system that allows a cracker in. It is purely dumb to ever reference an external DTD ever!

In my job, I explicitly remove DTD references before parsing XML documents to prevent a security issue from happening. Also, what if your customer providing the DTD changes the DTD without telling you and simultaneously changes the document you validate against to match those changes? Will your system be able to handle that change and adjust? In theory it should, but in practice, especially if you are a business, that is bad business. Contracts likely need to be changed, people need to be paid, etc.

If you want your system to break, reference an external DTD. If you want to increase your security risk, reference an external DTD.

--

The statement below is FALSE

The statement above is TRUE

No... by KillerCow · 2007-05-19 07:31 · Score: 1

...they are unique identifiers, not URLs.

They don't need to be hosted for the same reason that there isn't a machine out there called com.sun.java.util.dates.FunnyDate

Slashdot Mirror

140 comments