A New Data Model for the Web
An anonymous reader writes "Adam Bosworth delivered what
could be considered a seminal lecture (mp3) at the last MySQL conference about a new data model
for the web, why the plain HTML web succeeded, and why XQuery or the
Semantic web are failures. He is emphatic that RSS 2.0/Atom are the
next big thing and represent the new data model for the web. The audio
is rather long at forty plus minutes and there are a few
places
where the
talk has been covered."
This is a great talk, and I really enjoyed it, but I'm not sure I buy it.
I haven't really digested the talk, so maybe that's why. But this is my gut reaction against what he's saying.
I don't think that geeks fully acknowledge the role of what I think of as bibliography in the web ecosystem.
I was an English major. Let's say that you want to learn about Faulkner. If you go to the card catalogue, and search for books about Faulkner, you get a lot of hits -- more books than you could ever read. It's essentially useless.
What you really need is a bibliography -- something written by a Faulkner scholar who says "these are the really important and groundbreaking books about Faulkner." That's one of the cool things about Encyclopedia Brittanica -- at the end of their articles, they tend to give you a run down of some of the key books on the subject.
So if you want to read a biography of George Washington, EB will let you find the right one. That's important, because there are so many biographies of George Washington out there.
That's my key point. If you go to a university library and use the catalogue to do a mechanical search for books about George Washington, the results aren't very useful. But if you read the bibliography at the end of the Encyclopedia Brittanica article, it's extremely useful.
I'm trying to draw a distinction between mechanical searches, on one hand, and selections based on human judgement on the other.
Google is useful in larege part, I think, because page rank lets you find what are essentially good bibliography pages. You use a dumb mechanical search to put you in touch with people who know their subjects and who have good judgement (hopefully).
The other day, for example, I was thinking about an old programming language called APL. I searched for it, and found a couple of pages that seemed to have collected just about everything APL -- anecdotes, personal histories, tutorials, implementations, pictures of the goofy APL keyboards, etc.
The Google powered web is cool because it combines the mechanical and the bibliographic so well. Google gets me to the bibliography -- it pulls that needle out of the haystack. But it's the bibliography that lets me drill down.
This is important. The really good stuff I read about APL didn't come directly from the actual google result page. There was a link in between -- the google result page took me to the APL bibliography page, and from there I was able to hit the meat of the matter.
We've seen, over the past decade, an explosion in which mechanical searching can do. Because it's been getting so much better so quickly, it's dominating the way we think about how we find information. It's causing us to give bibliography -- the judgement of experts -- short shrift.
But bibliography is absolutely key to the google ecosystem.
My problem with attempts to impose more structure on data is that it always breaks things. It's beefing up mechanical searches, which are already very good, and it does it at the expense of bibliography.
I buy the argument in this lecture more than the guy making it does. He complains about heavier structures, and how the complexity will prevent people from producing and consuming information. I think that almost any move away from what we have now will do the same thing. The more you structure information, the harder it is for people to provide bibliography.
The point is that the ideal medium for bibliogrphy is free form -- one person saying, "this is what I think" to another.
The genius of google is that page rank gives you a mechanical way to uncover the best bibliographies. The best ones tend to show up at the top of the results.
In the old days, there was alta vista, and there was yahoo. Yahoo used human beings to categorize data manually. They'd put sunglasses next to the best sites in many categories -- flag something as a "cool site". Alta vista was pure mechanical searching, with no human judg