A New Data Model for the Web

Posted by timothy on Tuesday July 26, 2005 @08:51PM from the data's-no-model dept.

An anonymous reader writes "Adam Bosworth delivered what could be considered a seminal lecture (mp3) at the last MySQL conference about a new data model for the web, why the plain HTML web succeeded, and why XQuery or the Semantic web are failures. He is emphatic that RSS 2.0/Atom are the next big thing and represent the new data model for the web. The audio is rather long at forty plus minutes and there are a few places where the talk has been covered."

9 of 54 comments

A war between the humans and computer scientists by Felonius Thunk · 2005-07-26 21:38 · Score: 3, Insightful

I'm downloading the speech now, but if it's anything like this great speech he gave last year, it will be well worth listening to. That one changed my mind about what great things might look like. I've realized the great and wonderful content management system that my group is building is utterly doomed, for example, and I already have a new job in hand. It's all about the sloppiness.
Content, Availability... by Vo0k · 2005-07-26 22:56 · Score: 3, Insightful

There's way more to successful formats than the structure. But let me name two essentials.
What use is a format of data if the data itself is useless?
How can a format take off when only few have access to publishing in it?
That's the way Gopher went: only admins could add pages. Meanwhile, most people with access to the net were able to create their own ~/public_html.
Now RSS is the big thing. People add RSS to everything. Where are MSIE's "channels"? Spamvertisement available to the chosen few. Revolutionary video tape technologies competitive with VHS: none in shops, few movies available. And so on, and so on...

--
Anagram("United States of America") == "Dine out, taste a Mac, fries"
Really enjoyed, but not sure I buy by astrashe · 2005-07-26 23:36 · Score: 5, Insightful

This is a great talk, and I really enjoyed it, but I'm not sure I buy it.

I haven't really digested the talk, so maybe that's why. But this is my gut reaction against what he's saying.

I don't think that geeks fully acknowledge the role of what I think of as bibliography in the web ecosystem.

I was an English major. Let's say that you want to learn about Faulkner. If you go to the card catalogue, and search for books about Faulkner, you get a lot of hits -- more books than you could ever read. It's essentially useless.

What you really need is a bibliography -- something written by a Faulkner scholar who says "these are the really important and groundbreaking books about Faulkner." That's one of the cool things about Encyclopedia Britannica -- at the end of their articles, they tend to give you a rundown of some of the key books on the subject.

So if you want to read a biography of George Washington, EB will let you find the right one. That's important, because there are so many biographies of George Washington out there.

That's my key point. If you go to a university library and use the catalogue to do a mechanical search for books about George Washington, the results aren't very useful. But if you read the bibliography at the end of the Encyclopedia Britannica article, it's extremely useful.

I'm trying to draw a distinction between mechanical searches, on one hand, and selections based on human judgement on the other.

Google is useful in large part, I think, because page rank lets you find what are essentially good bibliography pages. You use a dumb mechanical search to put you in touch with people who know their subjects and who have good judgement (hopefully).

The other day, for example, I was thinking about an old programming language called APL. I searched for it, and found a couple of pages that seemed to have collected just about everything APL -- anecdotes, personal histories, tutorials, implementations, pictures of the goofy APL keyboards, etc.

The Google powered web is cool because it combines the mechanical and the bibliographic so well. Google gets me to the bibliography -- it pulls that needle out of the haystack. But it's the bibliography that lets me drill down.

This is important. The really good stuff I read about APL didn't come directly from the actual google result page. There was a link in between -- the google result page took me to the APL bibliography page, and from there I was able to hit the meat of the matter.

We've seen, over the past decade, an explosion in what mechanical searching can do. Because it's been getting so much better so quickly, it's dominating the way we think about how we find information. It's causing us to give bibliography -- the judgement of experts -- short shrift.

But bibliography is absolutely key to the google ecosystem.

My problem with attempts to impose more structure on data is that it always breaks things. It's beefing up mechanical searches, which are already very good, and it does it at the expense of bibliography.

I buy the argument in this lecture more than the guy making it does. He complains about heavier structures, and how the complexity will prevent people from producing and consuming information. I think that almost any move away from what we have now will do the same thing. The more you structure information, the harder it is for people to provide bibliography.

The point is that the ideal medium for bibliography is free form -- one person saying, "this is what I think" to another.

The genius of google is that page rank gives you a mechanical way to uncover the best bibliographies. The best ones tend to show up at the top of the results.
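
To make "a mechanical way to uncover the best bibliographies" a little more concrete, here is a minimal sketch of link-based ranking by power iteration. It is a textbook simplification, not Google's actual algorithm; the page names, the damping factor of 0.85, and the iteration count are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Rank pages by incoming links using simple power iteration.

    `links` maps each page name to the list of pages it links to.
    Every linked-to page must also appear as a key.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            if outgoing:
                # A page passes its rank on, split evenly over its links.
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # A page with no links spreads its rank over everyone.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy web: two blogs and a tutorial all point at one hub page.
links = {
    "apl-hub": ["tutorial"],
    "tutorial": ["apl-hub"],
    "blog-a": ["apl-hub"],
    "blog-b": ["apl-hub", "tutorial"],
}
ranks = pagerank(links)
top = max(ranks, key=ranks.get)
```

In this toy graph the most linked-to page ends up on top, which is the sense in which the human judgement of page authors gets collated mechanically.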

In the old days, there was alta vista, and there was yahoo. Yahoo used human beings to categorize data manually. They'd put sunglasses next to the best sites in many categories -- flag something as a "cool site". Alta vista was pure mechanical searching, with no human judg
1. Re: Really enjoyed, but not sure I buy by gidds · 2005-07-27 00:37 · Score: 2, Insightful
  
  Don't you think that Google itself is functioning like a bibliography? The important pages, the ones most worth seeing, are likely to be the most linked-to, and so appear at the top of the list. The rating is done by every web site creator, and the collation by Google; doesn't that make PageRank effectively a bibliographic tool?
  
  --
  Ceterum censeo subscriptionem esse delendam.
a case of capt. obvious? by tod_miller · 2005-07-26 23:58 · Score: 4, Insightful

or is it just me? I know it is hard to predict the way technology is going, the only reason HTML still is around is because it works, and was widely adopted, and nothing else gives any [real] benefits (for now).

as far as I am concerned, however you split up content, style, updates, 'sitefiles' (my collective analogue for rss and related technologies) the fact is one coherent, styled document must be the end result.

Too much is being read into content management and RSS. Yes, RSS is cute; I use it to have a BBC and CNN link in my Firefox, and I just click once to read articles instead of going to the site.

RSS and podcasting are the worst combination of not-new hype ever. Downloading a file through the web, wow new! :-)

Seriously, podcasting should be renamed downloading audio.

--
#hostfile 0.0.0.0 primidi.com 0.0.0.0 www.primidi.com 0.0.0.0 radio.weblogs.com
Re:Just listened to the whole thing by Anonymous Coward · 2005-07-27 00:18 · Score: 3, Insightful

Open tag - close tag - encode ampersand, greater-than, less-than.

Use appropriate character-encoding and -decoding at I/O-borders.

Finished.
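
For illustration only, the steps above might look like this in Python. This is a minimal sketch using the standard library's `xml.sax.saxutils.escape`; the helper names `make_element` and `to_bytes` and the sample text are made up for the example, and UTF-8 is an assumed choice of encoding.

```python
from xml.sax.saxutils import escape


def make_element(tag: str, text: str) -> str:
    """Open tag, escaped text, close tag."""
    # escape() replaces &, < and > with &amp;, &lt; and &gt;.
    return f"<{tag}>{escape(text)}</{tag}>"


def to_bytes(document: str) -> bytes:
    """Encode at the I/O border (UTF-8 is an assumption here)."""
    return document.encode("utf-8")


snippet = make_element("title", "Q&A: is 1 < 2 > 0?")
payload = to_bytes(snippet)
```

The string stays as text inside the program and only becomes bytes at the point where it is written out.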

Everyone who is not able to do these things correctly by hand or to make his script output correct XML should continue flipping burgers and does not belong in this industry.

What kind of Kindergarten is IT turning into?

Fuck.
Re:Just listened to the whole thing by Linus Torvaalds · 2005-07-27 00:23 · Score: 4, Insightful

Miss a tag in XML, sorry, no rendering today. The result? No-one writes XML by hand

Actually, it works the other way around. Because syntax errors are immediately obvious when writing XML, it's a lot easier to write by hand, because when you make a mistake, you notice it straight away.
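
A small sketch of what "you notice it straight away" can look like in practice, using Python's standard `xml.etree.ElementTree` parser. The `check_xml` helper is a hypothetical name, and the exact error text depends on the underlying parser.

```python
import xml.etree.ElementTree as ET


def check_xml(text: str) -> str:
    """Return "ok" if the text parses, otherwise the parser's complaint."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        # The parser refuses the whole document and says where it gave up.
        return f"not well-formed: {err}"
    return "ok"


good = check_xml("<foo>My XML Document</foo>")
bad = check_xml("<foo>My XML Document")  # closing tag forgotten
```

The second call fails outright instead of producing a half-rendered result, so the mistake surfaces while the author is still looking at the file.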

The reason why so many people use libraries with XML is because it's a standard format with libraries for practically every language. Using a library often saves time compared with writing stuff by hand.

that means your average Perl, Python, PHP coder will actually have to read some docs or a specification to remember how to output this stuff so they just won't bother.

Rubbish. They'll do exactly what they did to learn how to generate HTML - look at a few examples and make their own that looks like the example. <?php echo('<foo>My XML Document</foo>'); ?> is no harder than <?php echo('<h1>My HTML Document</h1>'); ?>

Bosworth says that's why RSS 2.0 beats the pants off RSS 1.0, anyone can create these files and the freely available libraries that handle this stuff are really really fault tolerant.

Both RSS 1.0 and RSS 2.0 use XML syntax and have freely available libraries anybody can use. But didn't you just say that nobody will bother using XML formats because people won't read the documentation that tells them how to use such libraries?
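
As a rough sketch of what using such a library involves, here is a minimal RSS 2.0 feed built with Python's standard `xml.etree.ElementTree`. The `build_feed` helper, the feed contents, and the example.com URLs are all made up for illustration; only the `rss`, `channel`, `item`, `title`, `link`, and `description` element names come from RSS 2.0 itself.

```python
import xml.etree.ElementTree as ET


def build_feed(title, link, description, items):
    """Build a minimal RSS 2.0 document and return it as a string.

    `items` is a list of (title, link) pairs.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # A channel needs a title, a link and a description.
    for tag, value in (("title", title), ("link", link),
                       ("description", description)):
        ET.SubElement(channel, tag).text = value
    for item_title, item_link in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = item_title
        ET.SubElement(item, "link").text = item_link
    # The library escapes &, < and > in text while serializing.
    return ET.tostring(rss, encoding="unicode")


feed_xml = build_feed(
    "Example feed",
    "http://example.com/",
    "A made-up feed for illustration",
    [("Ham & eggs", "http://example.com/ham-and-eggs")],
)
```

The escaping and tag matching are handled by the library, which is the time saving described above.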
Very skeptical by Exaton · 2005-07-27 02:12 · Score: 2, Insightful

Sorry, I trust Sir Tim Berners-Lee more than I trust "Adam Bosworth".

That guy can start by learning how to add some <br />'s in what he writes (go check out his blog -- horrendous!) before pretending to talk about Web fundamentals.
RSS Same As NNTP Newsgroups... by Anonymous Coward · 2005-07-27 03:46 · Score: 2, Insightful
but with
- no standardized means of replying/interacting,
- no means of maintaining topicality,
- no means of adding attachments,
- no bona fide archive,
- poor performance,
- egos that want to control responses to their posts.
NNTP is an irreplaceable source of technical information. In contrast the world wouldn't skip a beat if all RSS feeds stopped tomorrow.