A New Data Model for the Web
An anonymous reader writes "Adam Bosworth delivered what could be considered a seminal lecture (mp3) at the last MySQL conference about a new data model for the web, why the plain HTML web succeeded, and why XQuery and the Semantic Web are failures. He is emphatic that RSS 2.0/Atom are the next big thing and represent the new data model for the web. The audio is rather long at forty-plus minutes, and there are a few places where the talk has been covered."
I'm downloading the speech now, but if it's anything like this great speech he gave last year, it will be well worth listening to. That one changed my mind about what great things might look like. I've realized the great and wonderful content management system that my group is building is utterly doomed, for example, and I already have a new job in hand. It's all about the sloppiness.
There's way more to successful formats than the structure. But let me name two essentials.
What use is a format of data if the data itself is useless?
How can a format take off when only a few people can publish in it?
That's the way Gopher went. Only admins could add pages. Meanwhile, most people with access to the net were able to create their own ~/public_html pages.
Now RSS is the big thing. People add RSS to everything. Where are MSIE's "channels"? Spamvertisement available to the chosen few. Revolutionary videotape technologies competitive with VHS: none in shops, few movies available. And so on, and so on...
This is a great talk, and I really enjoyed it, but I'm not sure I buy it.
I haven't really digested the talk, so maybe that's why. But this is my gut reaction against what he's saying.
I don't think that geeks fully acknowledge the role of what I think of as bibliography in the web ecosystem.
I was an English major. Let's say that you want to learn about Faulkner. If you go to the card catalogue, and search for books about Faulkner, you get a lot of hits -- more books than you could ever read. It's essentially useless.
What you really need is a bibliography -- something written by a Faulkner scholar who says "these are the really important and groundbreaking books about Faulkner." That's one of the cool things about the Encyclopaedia Britannica -- at the end of its articles, it tends to give you a rundown of some of the key books on the subject.
So if you want to read a biography of George Washington, EB will let you find the right one. That's important, because there are so many biographies of George Washington out there.
That's my key point. If you go to a university library and use the catalogue to do a mechanical search for books about George Washington, the results aren't very useful. But if you read the bibliography at the end of the Encyclopaedia Britannica article, it's extremely useful.
I'm trying to draw a distinction between mechanical searches, on one hand, and selections based on human judgement on the other.
Google is useful in large part, I think, because PageRank lets you find what are essentially good bibliography pages. You use a dumb mechanical search to put you in touch with people who know their subjects and who have good judgement (hopefully).
The other day, for example, I was thinking about an old programming language called APL. I searched for it, and found a couple of pages that seemed to have collected just about everything APL -- anecdotes, personal histories, tutorials, implementations, pictures of the goofy APL keyboards, etc.
The Google-powered web is cool because it combines the mechanical and the bibliographic so well. Google gets me to the bibliography -- it pulls that needle out of the haystack. But it's the bibliography that lets me drill down.
This is important. The really good stuff I read about APL didn't come directly from the actual Google result page. There was a link in between -- the Google result page took me to the APL bibliography page, and from there I was able to hit the meat of the matter.
We've seen, over the past decade, an explosion in what mechanical searching can do. Because it's been getting so much better so quickly, it's dominating the way we think about how we find information. It's causing us to give bibliography -- the judgement of experts -- short shrift.
But bibliography is absolutely key to the Google ecosystem.
My problem with attempts to impose more structure on data is that it always breaks things. It's beefing up mechanical searches, which are already very good, and it does it at the expense of bibliography.
I buy the argument in this lecture more than the guy making it does. He complains about heavier structures, and how the complexity will prevent people from producing and consuming information. I think that almost any move away from what we have now will do the same thing. The more you structure information, the harder it is for people to provide bibliography.
The point is that the ideal medium for bibliography is free form -- one person saying, "this is what I think" to another.
The genius of Google is that PageRank gives you a mechanical way to uncover the best bibliographies. The best ones tend to show up at the top of the results.
In the old days, there was AltaVista, and there was Yahoo. Yahoo used human beings to categorize data manually. They'd put sunglasses next to the best sites in many categories -- flag something as a "cool site". AltaVista was pure mechanical searching, with no human judg
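To make the "mechanical way to uncover the best bibliographies" point concrete, here is a toy power-iteration version of PageRank. It is a textbook sketch, not Google's production ranking: the link graph and page names are hypothetical, and 0.85 is simply the commonly used damping factor. Pages that many others link to (like a good APL bibliography page) accumulate the most rank.

```python
# Toy PageRank by power iteration (textbook formulation, not Google's system).
# The graph below is hypothetical; 0.85 is the commonly used damping factor.

EXAMPLE_GRAPH = {
    "apl_bibliography": ["apl_tutorial", "apl_history", "apl_keyboards"],
    "apl_tutorial": ["apl_bibliography"],
    "apl_history": ["apl_bibliography"],
    "apl_keyboards": ["apl_bibliography"],
    "random_blog": ["apl_bibliography"],
}


def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Return a rank per page; ranks sum to 1. Every link target must be a key."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Each page first gets its share of the "random jump" probability.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                # Split this page's rank evenly among the pages it links to.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank over every page.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    ranks = pagerank(EXAMPLE_GRAPH)
    for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
        print(f"{page:18s} {score:.3f}")
```

The hub page that everything points at comes out on top, while a page nobody links to keeps only the baseline random-jump share.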
Or is it just me? I know it is hard to predict where technology is going, but the only reason HTML is still around is that it works, it was widely adopted, and nothing else gives any [real] benefits (for now).
:-)
As far as I am concerned, however you split up content, style, updates and 'sitefiles' (my collective term for RSS and related technologies), the fact is that one coherent, styled document must be the end result.
Too much is being read into content management and RSS. Yes, RSS is cute: I use it to have BBC and CNN links in my Firefox, and I just click once to read articles instead of going to the site.
RSS and podcasting are the worst combination of not-new hype ever. Downloading a file through the web? Wow, new!
Seriously, podcasting should be renamed "downloading audio".
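In fairness to the "downloading audio" point, a podcast feed is usually just an RSS 2.0 feed whose items carry an `<enclosure>` element with `url`, `length` and `type` attributes. A minimal sketch using Python's standard `xml.etree.ElementTree`; the feed, titles and URLs are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed: one item with an audio enclosure, one without.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Show</title>
    <link>http://example.com/</link>
    <description>A made-up podcast feed</description>
    <item>
      <title>Episode 1</title>
      <enclosure url="http://example.com/ep1.mp3" length="1234567" type="audio/mpeg"/>
    </item>
    <item>
      <title>Show notes only</title>
    </item>
  </channel>
</rss>"""


def audio_enclosures(feed_xml: str) -> list[tuple[str, str]]:
    """Return (item title, enclosure URL) for every item with an audio enclosure."""
    root = ET.fromstring(feed_xml)
    results = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        if enclosure is not None and enclosure.get("type", "").startswith("audio/"):
            results.append((item.findtext("title", default=""), enclosure.get("url")))
    return results


if __name__ == "__main__":
    for title, url in audio_enclosures(SAMPLE_FEED):
        print(title, url)
```

Everything past this point (actually fetching the MP3) is ordinary HTTP downloading.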
Open tag - close tag - encode ampersands, greater-than and less-than signs.
Use appropriate character-encoding and -decoding at I/O-borders.
Finished.
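The recipe above fits in a few lines of Python. This sketch uses the standard library's `xml.sax.saxutils.escape` (which replaces `&`, `<` and `>`) for the text, and encodes to UTF-8 with a matching declaration at the output border; the helper names `element` and `to_utf8_document` are made up for illustration, and the tag name is assumed to already be a valid XML name.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def element(tag: str, text: str) -> str:
    """Open tag, escaped character data, close tag.

    Assumes `tag` is already a valid XML name; only the text is escaped.
    """
    return f"<{tag}>{escape(text)}</{tag}>"


def to_utf8_document(body: str) -> bytes:
    """Encode at the output border: declare UTF-8 and actually emit UTF-8 bytes."""
    return ('<?xml version="1.0" encoding="UTF-8"?>\n' + body).encode("utf-8")


if __name__ == "__main__":
    doc = to_utf8_document(element("title", "Fish & Chips <cheap>, café"))
    print(doc.decode("utf-8"))
    # Decoding at the input border: the parser honours the declared encoding.
    print(ET.fromstring(doc).text)
```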
Everyone who is not able to do these things correctly by hand or to make his script output correct XML should continue flipping burgers and does not belong in this industry.
What kind of kindergarten is IT turning into?
Fuck.
Miss a tag in XML, sorry, no rendering today. The result? No one writes XML by hand.
Actually, it works the other way around. Because syntax errors are immediately obvious when writing XML, it's a lot easier to write by hand, because when you make a mistake, you notice it straight away.
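The "you notice it straight away" point is easy to demonstrate: a strict XML parser refuses a document with a missing close tag and reports where it failed. A minimal sketch with Python's standard `xml.etree.ElementTree`; the helper name `check_well_formed` and the sample snippets are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def check_well_formed(text: str) -> Optional[str]:
    """Return None if `text` is well-formed XML, else the parser's error message."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        # The message includes the line and column where parsing stopped.
        return str(err)
    return None


if __name__ == "__main__":
    print(check_well_formed("<item><title>Hi</title></item>"))  # well-formed
    print(check_well_formed("<item><title>Hi</item>"))          # missing </title>
```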
The reason why so many people use libraries with XML is because it's a standard format with libraries for practically every language. Using a library often saves time compared with writing stuff by hand.
That means your average Perl, Python or PHP coder will actually have to read some docs or a specification to remember how to output this stuff, so they just won't bother.
Rubbish. They'll do exactly what they did to learn how to generate HTML - look at a few examples and make their own that looks like the example. <?php echo('<foo>My XML Document</foo>'); ?> is no harder than <?php echo('<h1>My HTML Document</h1>'); ?>
Bosworth says that's why RSS 2.0 beats the pants off RSS 1.0: anyone can create these files, and the freely available libraries that handle this stuff are really, really fault tolerant.
Both RSS 1.0 and RSS 2.0 use XML syntax and have freely available libraries anybody can use. But didn't you just say that nobody will bother using XML formats because people won't read the documentation that tells them how to use such libraries?
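For a sense of how little a library asks of you, here is a sketch that builds a minimal RSS 2.0 feed with Python's standard `xml.etree.ElementTree`, which also takes care of escaping. It only emits the required channel elements (`title`, `link`, `description`) plus a title and link per item; the function name and sample data are hypothetical, and this is not presented as what Bosworth recommends.

```python
import xml.etree.ElementTree as ET


def build_rss(title: str, link: str, description: str,
              items: list[tuple[str, str]]) -> str:
    """Build a minimal RSS 2.0 feed; `items` is a list of (title, link) pairs."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # title, link and description are the required channel elements in RSS 2.0.
    for name, value in (("title", title), ("link", link), ("description", description)):
        ET.SubElement(channel, name).text = value
    for item_title, item_link in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = item_title
        ET.SubElement(item, "link").text = item_link
    # ElementTree escapes &, < and > in text when serializing.
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    print(build_rss("Example site", "http://example.com/", "A made-up feed",
                    [("Fish & Chips", "http://example.com/fish")]))
```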
Sorry, I trust Sir Tim Berners-Lee more than I trust "Adam Bosworth".
That guy can start by learning how to add some <br />'s in what he writes (go check out his blog -- horrendous!) before pretending to talk about Web fundamentals.
NNTP is an irreplaceable source of technical information. In contrast the world wouldn't skip a beat if all RSS feeds stopped tomorrow.