Netscape Dumps Critical File, Breaks RSS 0.9 Feeds

Posted by ryuzaki0 on Sunday January 14, 2007 @03:28AM from the hate-when-that-happens dept.

An anonymous reader writes "In the standard definition of RSS 0.91, there are a couple of lines referring to 'DOCTYPE' and referencing a 'dtd' spec hosted on Netscape's website. According to an article on DeviceForge.com quite a few RSS feeds around the web probably stopped working properly over the past few weeks because Netscape recently stopped hosting the critical rss-0.91.dtd file. Probably someone over at netscape.com simply thought he was cleaning up some insignificant cruft." Some explanation has been offered by a Netscape employee.

Ugh by SuperBanana · 2007-01-14 03:35 · Score: 4, Insightful

According to an article on DeviceForge.com quite a few RSS feeds around the probably web stopped working properly over the past few weeks because Netscape recently stopped hosting the critical rss-0.91.dtd file.
STOP, Grammar time. Ooooh whoooaaa oh oh...
Probably someone over at netscape.com simply thought he was cleaning up some insignificant cruft."
Or Netscape got tired of people using their bandwidth. Regardless of the reasons: if you reference a file on someone's site, it's hardly their fault if they move/change/delete it, and it breaks your stuff.

--
Please help metamoderate.
1. Re:Ugh by aiken_d · 2007-01-14 04:30 · Score: 2, Insightful
  
  Well, you're doing pretty good spotting grammar problems, but you're pretty ignorant when it comes to the content of the story.
  
  RSS is a spec that was created and promoted by Netscape. For proper validation, this remote (Netscape-hosted) file is needed and was supplied by Netscape to facilitate implementation of RSS. Removing it breaks a standard that they promoted. That is, they encouraged people to use this file that they hosted as part of the RSS spec. To reiterate, they told people to use this file and then (presumably accidentally) removed it, thereby breaking some RSS feeds and readers. Got it?
  
  Maybe you should stick to the grammar nazi stuff on stories you actually understand, or at least leave out the technically ignorant comments and only complain about syntax problems in the story.
  
  --
  If I wanted a sig I would have filled in that stupid box.
Obligatory Lamport quote on distributed systems by Programmer_Errant · 2007-01-14 03:38 · Score: 5, Insightful

"A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable."
SPOF? by basketcase · 2007-01-14 03:41 · Score: 2, Insightful

Or maybe some smart person at Netscape decided to teach some people a lesson about using a 3rd party as a single point of failure?
The point... by Junta · 2007-01-14 03:51 · Score: 3, Insightful

Is to have a common component shared among many documents without replication.

Class paths in Java are the perfect example of how it *should* work. Java CLASSPATHs in every application/installation I have seen are site-local, with all paths accessible without going over the internet to another site to get classes.

To be similar, an RSS site should copy this DTD to their local server, or to a server with which they have a concrete understanding of the relationship: either a commercial agreement with a peer, or at least a server from an organization that explicitly defines the purpose of hosting to be a common place to promote it as a standard.

Did netscape promise itself to be an organization sharing that DTD explicitly, or did site developers get in the practice because 'it just always was there'?

--
XML is like violence. If it doesn't solve the problem, use more.
Re:Then they're broken! by r00t · 2007-01-14 04:25 · Score: 4, Insightful

Fetching the spec is idiotic.

First of all, that's a needless operation. It can take time; don't forget the DNS lookup and all.

Second of all, it's not as if you could handle any random DTD. Software doesn't work that way. (this is one of the reasons why XML itself is a mostly-lame idea) If the XML doesn't match expectations, you can't convert it to your own internal representation. You probably have a C struct that you need to fill in. Even in some wild interpreted language like perl, you just won't have any use for unexpected data structures and you damn well need the expected data structures.
Re:Why would this break RSS readers? by Anonymous Coward · 2007-01-14 04:45 · Score: 1, Insightful

You're saying that it's impossible to correctly parse XML unless you can get a port 80 connection to a random internet host while doing it?

Wow, XML is even more fucked up than I thought.
Seriously bad programming by owlstead · 2007-01-14 04:52 · Score: 5, Insightful

If I were to create a reader that was dependent on version 0.91 of the distribution, it sure as hell would include the DTD in local storage. It makes no sense to create a reader that can also use, say, version 0.92 since you would not know what had changed (and there is no such thing as inheritance between versions of a DTD afaik). Actually, as other readers noted, it would be terribly stupid to make your web-server or client rely on a third party computer for which you cannot guarantee the uptime.

These URLs are mainly there for their uniqueness, not so much as a place of guaranteed storage. Of course, they are also a nice place to look for the actual definition, but after that you would need a local repository. This is the first thing an XML library should support, and the first thing a moderately intelligent programmer should look at. I get *very* annoyed if these kinds of basic rules are ignored. And I've even seen them ignored by people pointing to the XML digital signature definitions, where security and reliability should be the first requirements in the design.
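
A minimal Python sketch of treating the URL as a name rather than a download location: it reads the DOCTYPE identifiers from a feed with the standard-library expat bindings and never opens a network connection. The sample feed and the `read_doctype` helper are illustrative assumptions, and the DOCTYPE shown is the RSS 0.91 declaration as commonly published.

```python
import xml.parsers.expat

# Illustrative sample feed carrying the RSS 0.91 DOCTYPE discussed in the story.
FEED = b"""<?xml version="1.0"?>
<!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN"
  "http://my.netscape.com/publish/formats/rss-0.91.dtd">
<rss version="0.91"><channel><title>Example</title></channel></rss>
"""


def read_doctype(data):
    """Return (name, public_id, system_id) from the DOCTYPE, or None if absent.

    Hypothetical helper: expat reports the identifiers as plain strings and,
    as configured here, does not fetch the external DTD.
    """
    found = []

    def on_doctype(name, system_id, public_id, has_internal_subset):
        found.append((name, public_id, system_id))

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(data, True)
    return found[0] if found else None
```

A reader written this way can compare the public identifier against the versions it knows and load its own bundled copy of the DTD, so the remote file disappearing changes nothing.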

Also, what would happen if w3c.org or netscape.com go the way of the Gopher? If they go bust? It's a quickly changing world out there.
Re:Sorry about that by martyb · 2007-01-14 05:18 · Score: 3, Insightful

(Nobody on the team was especially aware of this DTD file since all of the old Netscape employees were let go last year around the time Netscape.com was redeveloped; anybody working at Netscape now was hired since then.)
Now, why this file was living under my.netscape.com is anybody's guess, but we'll have it restored ASAP. I only wish that someone had brought it to our attention so that I didn't have to find out about it from Slashdot.

Ummm, maybe I'm missing something here, but I would think that your web log would show a spike in 404 errors for this file, right? In my experience, it is helpful to assume that I do not know what I don't know, and to put procedures in place to help make those omissions stick out. So, a scan of your log files not only for this file, but for any others that also have a high number of 404's (especially from a multitude of referers) would be worth investigating.
BTW, best of luck on the redesign!
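
The log scan suggested above could look roughly like this in Python. It is a sketch under assumptions: the log is taken to be in Common Log Format, and `LOG_LINE` and `top_missing` are hypothetical names, not part of any Netscape tooling.

```python
import re
from collections import Counter

# Picks the request path and status code out of a Common Log Format line
# (the log format is an assumption; adjust the pattern for other layouts).
LOG_LINE = re.compile(r'"(?:GET|HEAD|POST) (?P<path>\S+)[^"]*" (?P<status>\d{3}) ')


def top_missing(lines, limit=10):
    """Return the most frequently requested paths that answered 404."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and match.group("status") == "404":
            counts[match.group("path")] += 1
    return counts.most_common(limit)
```

Run periodically over the access log, a report like this makes a suddenly popular missing file stand out. Counting distinct referers per path would be a natural extension.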
Re:Why would this break RSS readers? by KrisWithAK · 2007-01-14 05:19 · Score: 3, Insightful

You are right. I wish I had seen this article earlier so that I could have posted sooner, and so that others would get to see the "solution"!

Ever since I started developing on a laptop during my commute, I discovered that XML-based programs like J2EE servers would simply stop working. I experienced the same thing at work where, by default, your desktop applications (namely Eclipse) do not have access to the internet, and the servers will never have access to the "Internet".

The proper thing to do is for your application to use an XML catalog for resolving entities/URIs. There is a good article at http://xml.apache.org/commons/components/resolver/resolver-article.html that helped me out. In addition, if you are using Eclipse with the web tools platform, you can customize the catalog so it resolves DTDs and entities locally. See http://wiki.eclipse.org/index.php/Using_the_XML_Catalog.
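
The catalog idea can be sketched in Python with the standard-library SAX `EntityResolver` interface. This is not the Apache resolver or the Eclipse catalog, only an illustration of the lookup: the `CATALOG` mapping, the `dtds/` paths, and the `CatalogResolver` class are all assumptions.

```python
from xml.sax.handler import EntityResolver

# Hypothetical catalog: public and system identifiers on the left,
# local copies on the right (the dtds/ directory is an assumption).
CATALOG = {
    "-//Netscape Communications//DTD RSS 0.91//EN": "dtds/rss-0.91.dtd",
    "http://my.netscape.com/publish/formats/rss-0.91.dtd": "dtds/rss-0.91.dtd",
}


class CatalogResolver(EntityResolver):
    """Resolve external identifiers from a local mapping, never the network."""

    def __init__(self, catalog):
        self.catalog = catalog

    def resolveEntity(self, publicId, systemId):
        # Prefer the public identifier, then fall back to the system identifier.
        for key in (publicId, systemId):
            if key in self.catalog:
                return self.catalog[key]
        # Refuse unknown identifiers rather than silently fetching them.
        raise LookupError(f"no local copy for {publicId!r} / {systemId!r}")
```

An instance can be handed to a SAX parser with `setEntityResolver`. Whether the parser actually consults it for the external DTD depends on the parser and its feature settings, so treat that wiring as something to verify for your setup.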
Re:W3C doctype by DavidTC · 2007-01-14 05:51 · Score: 2, Insightful

It's not any different, except the w3c is run by intelligent people, and Netscape, apparently, is not.
I've always thought the full paths were a bit stupid too, and they should have some sort of shortcut standard, one that says "Use w3c's HTML4.0 standard", and the web browser knows how to construct a path to find w3c standards. That way, when the "Use netscape's RSS0.91 standard" shortcut stopped working, web browsers could have a trivial update, or their config could even be changed manually, to tell them where to find netscape's standards.
Granted, they already have something like this in the DOCTYPE, that's what '-//W3C//DTD HTML 4.01//EN' is, but then they blow it by then including the path after that. The parser should, instead, have to look at W3C and go 'Hey, I know where that is, that's w3c.org' and construct a standardized path using 'DTD HTML 4.01', like 'http://w3c.org/doctype/HTML4.01.dtd'. (And I just realized that string mysteriously doesn't include 'strict' or whatever in it, so now I'm slightly confused as to what good it's for.)
That way, when something happened to a server, the standard could be trivially updated to say 'W3C now means this domain, instead of w3c.org', and every damn page in existence wouldn't have to change. Mandate that every parser should expose these locations to be reconfigured manually if needed, although obviously some sort of automatic updating is a good idea. (Notice, in general, application software doesn't need to be updated, because application software doesn't try to download the stuff in the first place.)
Now someone's going to host the DTD at some random place, and everyone will manually update everything to load the wrong URL when someone asks for "http://netscape.com/publish/formats/rss-0.91.dtd". Then Netscape will move it back, and some applications will change back, and some won't, and it will be a big mess.
I understand the point of paths, in that, in theory, everyone can produce their own format and publish their own DTD. This has not happened, and probably is not going to, and at this point all browsers interpret DOCTYPE strings as unparsable strings, and the only ones who actually read the things are the validators.
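
The shortcut scheme proposed above can be sketched in Python. This illustrates the commenter's idea and is not an existing standard: `OWNER_BASES`, `parse_fpi`, and `dtd_url` are hypothetical names, and the W3C base URL simply mirrors the made-up example path in the comment.

```python
# Hypothetical, reconfigurable registry: identifier owner -> base URL.
OWNER_BASES = {"W3C": "http://w3c.org/doctype/"}


def parse_fpi(fpi):
    """Split an identifier such as '-//W3C//DTD HTML 4.01//EN' into fields."""
    registration, owner, text, language = fpi.split("//")
    text_class, _, description = text.partition(" ")
    return {
        "registration": registration,
        "owner": owner,
        "class": text_class,
        "description": description,
        "language": language,
    }


def dtd_url(fpi, bases=OWNER_BASES):
    """Build a DTD location from the owner's base URL and the description."""
    parts = parse_fpi(fpi)
    base = bases[parts["owner"]]  # KeyError means the owner is not registered
    return base + parts["description"].replace(" ", "") + ".dtd"
```

Calling `dtd_url('-//W3C//DTD HTML 4.01//EN')` yields the path from the comment, and passing a different `bases` mapping relocates every document at once without touching the documents themselves.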

--
If corporations are people, aren't stockholders guilty of slavery?