A New Data Model for the Web
An anonymous reader writes "Adam Bosworth delivered what could be considered a seminal lecture (mp3) at the last MySQL conference about a new data model for the web, why the plain HTML web succeeded, and why XQuery or the Semantic Web are failures. He is emphatic that RSS 2.0/Atom are the next big thing and represent the new data model for the web. The audio is rather long at forty-plus minutes, and there are a few places where the talk has been covered."
Do we take two steps back every week in this industry or what? RSS is a text file format. It's not a "data model".
What are the operators for manipulating this data? What is the type system? How is integrity guaranteed? How do I build a distributed database system with it?
There is only one complete data model: the relational model. Demonstrate to me how this "new" data model is not either 1) some subset of the relational model or 2) a bunch of nonsense, not a data model at all.
He's got one thing right: XQuery (return to the hierarchic databases of yesterday) and RDF (return to the network model, but with a fixed 3-value schema) are nothing to waste your time on.
To me his assertions are like saying, for example, the fundamental theorems of electromagnetism no longer apply to cell phones because they can now play MP3s, or something. Makes no sense.
Unfortunately, there is nobody left in this industry that has any clue about databases.
He is emphatic that RSS 2.0/Atom are the next big thing and represent the new data model for the web.
Here's the thing: RSS 2.0 and Atom really don't have a revolutionary data model. They are just file formats that list short descriptions, in a sequential order, with a bit of meta data, that get polled on a regular interval. That's all.
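To make the point concrete, here is a minimal Python sketch (standard library only) of how little a reader has to do with an RSS 2.0 feed: find the channel, walk its items in order, and pull out a title and a link. The feed content and the read_items helper are hypothetical illustrations. A real aggregator would also fetch the feed over HTTP and re-poll it on a schedule, which this sketch leaves out.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal RSS 2.0 document: a channel with a little metadata
# and an ordered list of items, each one a short description.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example News</title>
    <link>http://example.com/</link>
    <description>Hypothetical feed for illustration</description>
    <item>
      <title>First story</title>
      <link>http://example.com/1</link>
      <description>A short summary of the first story.</description>
    </item>
    <item>
      <title>Second story</title>
      <link>http://example.com/2</link>
      <description>A short summary of the second story.</description>
    </item>
  </channel>
</rss>"""


def read_items(feed_xml: str) -> list[tuple[str, str]]:
    """Return (title, link) pairs in document order (hypothetical helper)."""
    root = ET.fromstring(feed_xml)
    channel = root.find("channel")
    items = []
    for item in channel.findall("item"):
        # findtext returns the default when a child element is missing,
        # so an item without a <link> still yields a usable entry.
        items.append((item.findtext("title", ""), item.findtext("link", "")))
    return items


if __name__ == "__main__":
    for title, link in read_items(SAMPLE_FEED):
        print(title, link)
```

That really is the whole "model": an ordered list of small records plus channel metadata.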
They are only popular because the use pattern is different to normal web pages. The tech itself is pretty mundane. Internet Explorer 4.0 had something similar with "channels", way back in the 90s.
You could have done the same thing with a subset of HTML 2.0 in the 90s. The main reasons people didn't are that they didn't think of it and the need wasn't as great.
The Semantic Web, on the other hand, is doing new stuff. Some of it we don't know how to do yet. Some of it is immediately practical, some of it isn't. The Semantic Web is more of an idea than a tangible product.
By saying that RSS and Atom somehow "beat" the Semantic Web, he's comparing apples to oranges. It just doesn't make sense.
The reason the web took off so well was because it was built from a few simple principles that could be generalised. Resources that could be addressed. Simple, text-based markup. Simple, text-based protocol.
The Semantic Web will probably take off in the same way, with various bits already being used to varying degrees of success (e.g. Mozilla already uses RDF). But it's a much bigger problem, so expecting it to take off just as quickly is naive.
I'm downloading the speech now, but if it's anything like this great speech he gave last year, it will be well worth listening to. That one changed my mind about what great things might look like. I've realized the great and wonderful content management system that my group is building is utterly doomed, for example, and I already have a new job in hand. It's all about the sloppiness.
Good presentation. It reminded me of an email I got the other week on my local LUG mailing list. Someone was complaining about how strict XML processing is compared with HTML processing. If you miss a tag in HTML, yeah, no problem, the parser will forgive you. Miss a tag in XML, sorry, no rendering today. The result? No-one writes XML by hand (unless they're a masochist), and that means your average Perl, Python, or PHP coder would actually have to read some docs or a specification to remember how to output this stuff, so they just won't bother. Bosworth says that's why RSS 2.0 beats the pants off RSS 1.0: anyone can create these files, and the freely available libraries that handle this stuff are really, really fault-tolerant. He says a lot about scalability and other things, but you can just listen to the mp3 if you wanna hear what he said.
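A minimal Python sketch of the strict-versus-forgiving difference, using only the standard library: xml.etree.ElementTree refuses markup with an unclosed tag, while html.parser.HTMLParser just tokenizes it and carries on. The sample markup and the helper names are hypothetical. Note that HTMLParser is a tokenizer, not a browser-style error-recovering parser, so this only illustrates the contrast rather than reproducing what a browser does.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical markup with a missing closing tag: the <b> is never closed.
BROKEN = "<p>Hello <b>world</p>"


class TextCollector(HTMLParser):
    """Collects text nodes; HTMLParser does not enforce tag nesting."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def parse_as_xml(markup: str) -> bool:
    """Return True if the markup is well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        # The strict parser rejects the whole document: no rendering today.
        return False


def parse_as_html(markup: str) -> str:
    """Return the text content using the lenient stdlib HTML tokenizer."""
    collector = TextCollector()
    collector.feed(markup)
    collector.close()
    return "".join(collector.chunks)


if __name__ == "__main__":
    print(parse_as_xml(BROKEN))   # False
    print(parse_as_html(BROKEN))  # Hello world
```

One missing tag costs the XML consumer the whole document, while the HTML-style consumer still gets the text out.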
How we know is more important than what we know.
Heh heh heh... He said 'seminal'...
There's way more to successful formats than the structure. But let me name two essentials.
What use is a format of data if the data itself is useless?
How can a format take off when only few have access to publishing in it?
That's the way Gopher went. Only admins could add pages. Meanwhile, most people with access to the net were able to create their own ~/public_html.
Now RSS is the big thing. People add RSS to everything. Where are MSIE's "channels"? Spamvertisement available to the chosen few. Revolutionary videotape technologies competing with VHS: none in shops, few movies available. And so on, and so on...
Anagram("United States of America") == "Dine out, taste a Mac, fries"
u:newidea
p:ideaman
via BugMeNot.
This is the speech in mp3 and the speech in AAC/M4B (for iTunes/iPod).
The slashdot story mis-sells the content of the speech. For me it was just AB talking about how it would be useful to have a simple system of aggregation that goes beyond subscribing to an RSS feed.
It's not a new data model & the Semantic Web has not failed; in fact, it's more important when considering how to work with the diverse resulting data.
boakes.org
This is a great talk, and I really enjoyed it, but I'm not sure I buy it.
I haven't really digested the talk, so maybe that's why. But this is my gut reaction against what he's saying.
I don't think that geeks fully acknowledge the role of what I think of as bibliography in the web ecosystem.
I was an English major. Let's say that you want to learn about Faulkner. If you go to the card catalogue, and search for books about Faulkner, you get a lot of hits -- more books than you could ever read. It's essentially useless.
What you really need is a bibliography -- something written by a Faulkner scholar who says "these are the really important and groundbreaking books about Faulkner." That's one of the cool things about Encyclopedia Britannica -- at the end of their articles, they tend to give you a rundown of some of the key books on the subject.
So if you want to read a biography of George Washington, EB will let you find the right one. That's important, because there are so many biographies of George Washington out there.
That's my key point. If you go to a university library and use the catalogue to do a mechanical search for books about George Washington, the results aren't very useful. But if you read the bibliography at the end of the Encyclopedia Britannica article, it's extremely useful.
I'm trying to draw a distinction between mechanical searches, on one hand, and selections based on human judgement on the other.
Google is useful in large part, I think, because PageRank lets you find what are essentially good bibliography pages. You use a dumb mechanical search to put you in touch with people who know their subjects and who have good judgement (hopefully).
The other day, for example, I was thinking about an old programming language called APL. I searched for it, and found a couple of pages that seemed to have collected just about everything APL -- anecdotes, personal histories, tutorials, implementations, pictures of the goofy APL keyboards, etc.
The Google powered web is cool because it combines the mechanical and the bibliographic so well. Google gets me to the bibliography -- it pulls that needle out of the haystack. But it's the bibliography that lets me drill down.
This is important. The really good stuff I read about APL didn't come directly from the actual Google result page. There was a link in between -- the Google result page took me to the APL bibliography page, and from there I was able to hit the meat of the matter.
We've seen, over the past decade, an explosion in what mechanical searching can do. Because it's been getting so much better so quickly, it's dominating the way we think about how we find information. It's causing us to give bibliography -- the judgement of experts -- short shrift.
But bibliography is absolutely key to the Google ecosystem.
My problem with attempts to impose more structure on data is that it always breaks things. It's beefing up mechanical searches, which are already very good, and it does it at the expense of bibliography.
I buy the argument in this lecture more than the guy making it does. He complains about heavier structures, and how the complexity will prevent people from producing and consuming information. I think that almost any move away from what we have now will do the same thing. The more you structure information, the harder it is for people to provide bibliography.
The point is that the ideal medium for bibliography is free form -- one person saying, "this is what I think" to another.
The genius of Google is that PageRank gives you a mechanical way to uncover the best bibliographies. The best ones tend to show up at the top of the results.
In the old days, there was AltaVista, and there was Yahoo. Yahoo used human beings to categorize data manually. They'd put sunglasses next to the best sites in many categories -- flag something as a "cool site". AltaVista was pure mechanical searching, with no human judgement...
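For the curious, the textbook (simplified) PageRank idea can be sketched as a power iteration over a link graph: every page repeatedly shares its score among the pages it links to. This is a minimal Python sketch under stated assumptions: a damping factor of 0.85, a fixed 50 iterations, pages with no outgoing links spreading their score evenly, and a made-up APL link graph. It is not Google's production ranking.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Simplified PageRank by power iteration (illustrative sketch)."""
    # Every page that appears as a source or as a target is a node.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        # The "random jump" share every page receives regardless of links.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's score evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page (no outgoing links): spread its score over all pages.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical link graph: a few search hits all point at one APL
# "bibliography" page, which in turn points at the detailed material.
APL_LINKS = {
    "search-hit-1": ["apl-bibliography"],
    "search-hit-2": ["apl-bibliography"],
    "search-hit-3": ["apl-bibliography", "apl-tutorial"],
    "apl-bibliography": ["apl-tutorial", "apl-history", "apl-keyboards"],
}

if __name__ == "__main__":
    ranks = pagerank(APL_LINKS)
    for page in sorted(ranks, key=ranks.get, reverse=True):
        print(f"{page:18s} {ranks[page]:.3f}")
```

With this toy graph the page everyone links to, apl-bibliography, comes out on top: the mechanical search surfacing the human-curated bibliography.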
A new data model?
Couldn't we please focus on implementing the old data model correctly first?
Karma: Positive (probably because of superior intellect)
or is it just me? I know it is hard to predict where technology is going, but the only reason HTML is still around is that it works, it was widely adopted, and nothing else gives any [real] benefits (for now).
:-)
As far as I am concerned, however you split up content, style, updates, and 'sitefiles' (my collective name for RSS and related technologies), the fact is that one coherent, styled document must be the end result.
Too much is being read into content management and RSS. Yes, RSS is cute; I use it to have a BBC and a CNN link in my Firefox, and I just click once to read articles rather than going to the site.
RSS and podcasting are the worst combination of not-new hype ever. Downloading a file through the web, wow, new!
Seriously, podcasting should be renamed downloading audio.
#hostfile 0.0.0.0 primidi.com 0.0.0.0 www.primidi.com 0.0.0.0 radio.weblogs.com
I didn't understand why he said that. I've always heard it was good to put all your logic into the DB.
Would anyone care to explain that a little? And please dumb it down a lot, I'm not that smart in databases.
The average Slashdot story links to a 2-5 minute article, and most people don't even bother to read that before they post a comment. Since this story links to a 40-minute MP3 that no one will bother listening to, the comments page should be an interesting read...
-William Brendel
Sorry, I trust Sir Tim Berners-Lee more than I trust "Adam Bosworth".
That guy can start by learning how to add some <br />'s in what he writes (go check out his blog -- horrendous!) before pretending to talk about Web fundamentals.
Nevertheless, for the masses in their office cubicles RSS feeds are the next "big thing".
Try telling the masses that the next big thing is a new data model for the web, based on semantics, and 99% of them will ask you what "semantic" means, never mind the intangible data model that is the real underlying improvement.
Show them a little program that sits on their desktop and feeds them the latest from CNN, the BBC, etc., and they understand that.
Web development, and IT in general, is running a real risk of falling into the same litany of problems that similar industries have had, without trying to mitigate them or indeed learn anything from them.
As MoonFog points out, we run the risk of doing what AI researchers did: promising that the Silver Bullet was just around the corner... "If the average Joe can't understand a computer, then we can make the computer understand them!"
...and then being able to deliver that promise in the time frame we promised.
How about mobile telecoms companies such as Nokia or 3? They accidentally invented the bane of good social graces in the 20th century, text messaging, and have been trying ever since to come up with the next great money spinner. Have MMS, 3G, video phones, PTT, mobile gaming, etc. ever really produced as much money or been as much of a success as texting?
Finally, the manufacturing industry and mechanical engineers (esp. in the UK). It may have been Maggie who nailed the coffin shut (and to this day the manufacturing industry in Britain is a bit of a mess; just look at Rover), but it was foreshadowed by a calamitous shift in project management responsibilities. Large mechanical engineering projects went from being managed by those who did it for the love (Isambard Kingdom Brunel right through to Sir Frank Whittle), to "professional" managers who couldn't maintain the level of innovation, and finally back to the engineers themselves, who promptly allowed everything to go over budget and over time because nobody had taught them about project management.
Where am I going with all this? Is RSS our predictive text, a nice addition but not truly the next big thing? So what is the "next big thing"? Web 2, the Semantic Web? Possibly, but we need to be careful: our management structures are currently at the stage of evolution between leaving those who did it for the love (Hello Linus ;^) and Professional Management (and I think we can all link back to "those" discussions), who may not be able to understand the potential and maintain our innovation, leading us down the path of the AI'ers.
If the masses can't tell the difference between the apples and the oranges, then the comparison won't bother them; just give them the apple now and work on making the orange even better.
Regards, Phil
The language Tutorial D in the article you refer to is yet another language for relational databases! Darwen and Date are critics of SQL implementations; they are NOT critics of the relational database as you imply. They are instead the strongest relational database proponents.
Indeed the relational model is the only model with logically provable underpinnings. In ON DOCUMENT- VS. DATA-BASES Chris Date explains:
And about "document databases" (this would include HTML & XML):
NNTP is an irreplaceable source of technical information. In contrast the world wouldn't skip a beat if all RSS feeds stopped tomorrow.
"He's got one thing right: XQuery (return to the hierarchic databases of yesterday) and RDF (return to the network model, but with a fixed 3-value schema) are nothing to waste your time on."
Well damn! There goes my Firefox extension.
In the speech, Adam Bosworth predicted that "RSS 2.0 and Atom will be the lingua franca that will be used to consume all data from everywhere" because they "are simple formats that are sloppily extensible."
It's true that many seem to be moving in this direction. For example, A9's OpenSearch is a simple extension to RSS. The Findory API offers simple, RSS-based access to news and blog search results. Yahoo offers a few services through the more complex Yahoo APIs, but offers many more through Yahoo RSS, including news and web search results.
It seems that most web services may end up standardizing on simple REST protocols using RSS and Atom.
Is there a Dweeb mod point?
OSGGFG - Open Source Gamers Guide to Free Games
http://www.google.com/profiles/malachid