Web Redesigned With Hindsight
Randy Sparks writes "Tim Berners-Lee has been speaking about his vision for the Web. He proposed the Semantic Web six years ago and it's taken that long for the W3C to ratify his plans for Resource Description Framework (RDF) and the OWL Web Ontology Language (OWL). Effectively, the Semantic Web is the Web as we know it put into database form and with added metadata. You can read more about it over on MacWorld and see a Semantic Web proof-of-concept at the Web Archive."
The web is popular because it's easy to create web pages. The semantic web stuff strikes me as something that only someone with a PhD in semantics could love. IMO it violates the KISS principle.
Have you read my blog lately?
This may seem a subtle point, but being a Bush fan is far worse than being JUST a Republican.
Good thoughts, it's a shame that Microsoft's bundling of IE with Windows makes anything the WWW Consortium does largely irrelevant, even when the specs come from MS themselves (CSS).
That being said, relying on publisher-embedded metadata to be relevant on the WWW is probably wrong. Someone, somewhere, is going to try to lie in that metadata as a way of making money.
Burn Hollywood Burn
The MacWorld article isn't very informative to someone who's never heard of this "next generation" web, but it seems like they want to add it on top of the existing WWW.
Why can't someone just invent a new, similar, improved web that is separated from the current WWW, with its own specific browser, and implement the various ins, outs and whathaveyous to keep the riffraff from exploiting it in very annoying ways?
This kind of thing goes to show how much difference can be made by getting the initial trajectory right.
A few small changes at the start can lead to BIG consequences later as the inertia of the whole mess gets going.
Anyone else out there with a really great idea? Do us all a favor and think as far ahead as you can before you release it on the world. Even then, it will still eventually not be going in the optimal direction.
"Provided by the management for your protection."
- Intelligent search engines that produce much better results than Google etc. because they can index the meaning of documents, not the words they contain.
- Agent technology that can retrieve information for you, price compare items you are shopping for and automate a number of interesting processes.
- Automatic clustering of websites around subjects of interest to create much richer knowledge-oriented navigation.
But the Semantic Web project can't succeed as it is currently specified. It is working towards standards for storing and managing the meta-content required for this Brave New World but doesn't tackle the much harder problem of how to create meta-content that is consistent and pervasive. At present this is left to individual web page authors with no mechanism to ensure consistency. Without consistency, the Semantic Web is doomed. If I tag a web page as being about "software engineering" and another person uses the tag "computer programming", the Semantic Web can't tell they are about the same thing.
In a world where an estimated 70% of web pages don't even have a title, isn't it rather unrealistic to expect most web page authors will learn a complex new representation like RDF and consistently tag their pages with it?
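To make the tagging problem concrete, here is a rough Python sketch. The page names, the tags, and the shared vocabulary are all made up for illustration; this is not RDF or any real Semantic Web tooling, just the naive string comparison the argument is about, next to what an agreed vocabulary would buy you if someone maintained one.

```python
# Hypothetical page metadata: two authors tag the same topic differently.
pages = {
    "page_a.html": "software engineering",
    "page_b.html": "computer programming",
}

def same_subject(a, b):
    """Naive check: subjects match only if the tag strings are identical."""
    return pages[a] == pages[b]

# An assumed shared vocabulary (someone would have to build and maintain it)
# that maps free-form tags onto one agreed term.
VOCABULARY = {
    "software engineering": "software-development",
    "computer programming": "software-development",
}

def same_subject_with_vocabulary(a, b):
    """Compare tags after normalising them through the shared vocabulary."""
    term_a = VOCABULARY.get(pages[a], pages[a])
    term_b = VOCABULARY.get(pages[b], pages[b])
    return term_a == term_b

print(same_subject("page_a.html", "page_b.html"))
print(same_subject_with_vocabulary("page_a.html", "page_b.html"))
```

The second function only works because both tags happen to be in the vocabulary; any tag nobody anticipated falls straight back to the naive comparison.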
Clay Shirky has a very good article on this. I recommend reading it before you get too excited about the Semantic Web.
Sailing over the event horizon
The semantic web does keep it simple. It's supplemental to current web pages and is optional. It simply adds more data for computers to read. It's something very basic that leaves the opportunity for much more complex things later. Anyone who can't understand a triple - a subject, verb, and object - probably failed second grade English.
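For anyone who wants to see how little there is to a triple, here is a minimal Python sketch. It uses plain tuples rather than real RDF syntax, and the subjects, verbs, and the `match` helper are invented for the example.

```python
# Each triple is (subject, verb, object). All names are made up for illustration.
triples = [
    ("page_a.html", "has_topic", "software engineering"),
    ("page_a.html", "written_by", "alice"),
    ("page_b.html", "has_topic", "computer programming"),
]

def match(subject=None, verb=None, obj=None):
    """Return every triple that agrees with whichever fields were given."""
    return [
        t for t in triples
        if (subject is None or t[0] == subject)
        and (verb is None or t[1] == verb)
        and (obj is None or t[2] == obj)
    ]

# Everything known about one page, then every topic statement.
print(match(subject="page_a.html"))
print(match(verb="has_topic"))
```

Leaving a field as `None` acts as a wildcard, which is about all a basic query over triples needs.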
Developers: We can use your help.
Excuse me, but can they stop overdesigning HTML? It's a freaking pseudo-layout language. The whole beauty of it is that complete newbs can learn to text-edit it. Now, with all the crufty front matter, it's impossible to hand-write HTML that will pass a verifier. Many of the more useful layout features that don't have anything to do with style classes are being put into CSS instead of HTML proper. HTML is a dead simple concept, and as such should be a newbie tool. Instead, it's just getting increasingly baroque. It really doesn't need more crap.
Now, the HTTP system itself - that could do with some upgrades. More support for "push" content is what it needs - like slashdot telling _me_ when there is new news so my browser can refresh, and sending me a diff instead of the full new page. Or support for distributed file hosting. Or some way to receive HTTP requests from behind a NAT (even if it requires an external name server to help you along) without forwarding ports to yourself (if that's at all possible). My knowledge of network topology is limited at best, but if I can get ICQ messages while behind a NAT, why can't I serve HTML? It's still just receiving unrequested data - messages in one case, requests for content in the other.
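The "send me a diff" part is easy enough to picture. Here is a small Python sketch using the standard `difflib` module; the two page versions and the `page_diff` helper are made up, and this only illustrates the idea of shipping the changed lines, not anything HTTP itself does.

```python
import difflib

# Two hypothetical versions of a front page, one line per element.
old_page = ["<h1>News</h1>", "<p>Story one</p>", "<p>Story two</p>"]
new_page = ["<h1>News</h1>", "<p>Breaking story</p>", "<p>Story one</p>", "<p>Story two</p>"]

def page_diff(old, new):
    """Return a unified diff between two versions of a page as a list of lines."""
    return list(difflib.unified_diff(old, new, fromfile="old", tofile="new", lineterm=""))

delta = page_diff(old_page, new_page)
print("\n".join(delta))
```

On a three-line toy page the diff is no smaller than the page, of course; the saving would only show up when a large page changes by a story or two.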
Semantic web was thought up and designed six years ago and a lot has happened on the internet since the first road map by TBL.
/end rant
Lots of sites and portals are offering services similar to semantic web, but they do _not_ want to share the precious metadata they are harvesting from various web sites; google.com, yahoo.com, pricerunner.com, the list goes on and on.
Besides, the way I see it semantic web tries to solve a problem AI research has struggled with for decades: giving a machine the ability to reason. SW wants to squeeze complex real world objects into formal representations by creating chunks, or graphs, of metadata.
A computer using reasoning/inference to understand different graphs from different contexts will probably fail miserably because simple lists of metadata won't be enough to determine whether two graphs/contexts actually describe similar objects.
So, who's gonna provide high quality metadata and who/what is going to use it?
Having access to tons of annotated data is a wonderful dream. I could see academic institutions going for this, but not corporations for the most part.
You see, corporations don't WANT you to be able to access data easily. One of the major driving factors of the current web is advertising. Basically, this is something none of us want to see, but with web pages it's easy to try and force us to see it. Properly annotated data would kill advertising as we know it, something the corporations will not let happen.
Also, corporations do not want us to be able to easily compare data either. Take prices for instance. Many stores have promises like "we'll match any price". This works on the basis that it's hard and tedious to go check other prices and people will think "well, hey, if they are making this promise surely they already have the lowest price otherwise everyone would be calling them on it". Well, no, most people will not go check for lower prices, and if they do and end up finding lower prices elsewhere, they will often buy elsewhere. Easy price comparisons are not something online stores want to allow.
Ultimately, most sites want to force you to look at data they want you to look at (ads). I doubt we'll ever see all web data in a nice annotated form allowing us to view only what we are interested in.
The real problem is that people are creating so-called semi-structured data in the first place.
You are absolutely right: people are wrong. Data must fit into the relational model or it doesn't get to play on the web.
I look forward to the web going down for schema updates. Hmm, I'm not sure this approach scales too well...
"..useful layout features that don't have anything to do with style classes..." ??
you're joking, right?
"beauty of... complete newbs... text-edit"
gack. if you think what you see when you view source in your average web page is beautiful, you sir, are beyond help.
html *should* be simple -- but in practice it's bloated, convoluted, and full of things that have only to do with presentation. the markup should simply describe the content. css should describe how it looks. it's cleaner, more readable, *easier* to write, read, maintain... and it's better-performing.
this separation of content from presentation is so clearly a design goal for web developers and architects. do you really oppose it?
"it's impossible to hand-write html that will pass a verifier"
this is also ignorant and false. it's quite easy. and using tools like TIDY to help is straightforward if you have trouble.
I won't get into your ideas about changes to HTTP/1.1 now, but had to say something about your distorted perception of the role of html/css in the web.
The only road to paradise begins in hell
I admit, I use GoLive for my websites. Because it does most of the work for me - and together with some scripted exporting and stuff I hardly have to touch the code, and it's nicely compliant and lean. I *can* code. I just don't really enjoy it, and it's not worth it for the amount of work I do nowadays.
;-)
I'd love to jump on the next thing, and I see the use of all this meta stuff. I try to treat meta tags with respect btw, and only use them on relevant pages.
But for this to take off, you'd need tools that organize the meta data FOR you. So that you only have to edit it lightly, to take out the silliness. Akin to using automated translation.
Which begs the question: why not make search engines and agents smarter instead?
I mean, I can't be the only lazy person here, can I? And I have sort of an interest in the stuff, so I'd probably do what's required, but most people wouldn't I'm sure.
If I were a betting man, I'd put my money on agents - even after all the bullshit and the failed expectations from the late '90s. I'd love to have some clever agents do my searches for me, and on the mac, there are already some pretty clever programs available for free (http://www.devon-technologies.com/)
(yeah, I'm too lazy to put this post in HTML too, so sue me
I think, therefore I am...I think.
Yes, it is beautiful. Why? 'cause it was written by a twelve year old who read a three page hand out her teacher gave her on "how to make a webpage", and she's been learning by tinkering since then.
People are not coders. People are users. Users want to just use things - not muck around with research, not have to learn whole new lexicons for each task, just get stuff done. HTML is practically the only pure-text system they seem to do that in anymore - everything else is covered in complex GUIs. To many people, HTML is the bridge to programming. With that bridge lost, they might never want to use anything that's not pure WYSIWYG, and there aren't many programming languages like that.
Like it or not, HTML has become the learning ground for many budding computer users.
My CSS complaints came out wrong - what I was complaining about was that originally, everything that could be done in CSS could be done in HTML as well. You could write proper, stripped HTML and use robust CSS, or you could just do the whole damn thing in ugly, ugly HTML, and still have access to the whole feature set. Now there are features that exist only in CSS beyond simply defining classes of things that already occur in HTML. So, newb HTML-only users end up with an incomplete feature set. If CSS was more intuitive this wouldn't be a problem, but currently it is far too cryptic to push onto an uninformed user. As a result, learning users stick to pure HTML, and thus are stuck with half a feature set.
Please explain to me how this:
is better, or more readable, than:
Username taken, please choose another one.
Technologies like this are typically doomed to failure, because they violate one key precept: computers should work for us, not the other way around.
Few people will bother with the effort of semantically marking up their documents, and fewer still will do so in a way that is consistent enough to be useful.
Computers / programmers will need to become better at analyzing human communication; anything else hardly seems worth the effort.
Nice idea though.