Challenging the Ideas Behind the Semantic Web
mytrip writes to tell us that after a recent presentation to the American Association for Artificial Intelligence (AAAI), Tim Berners-Lee was challenged by Google exec Peter Norvig, who cited some of the many problems behind the Semantic Web. From the article: "'What I get a lot is: "Why are you against the Semantic Web?" I am not against the Semantic Web. But from Google's point of view, there are a few things you need to overcome, incompetence being the first,' Norvig said. Norvig clarified that it was not Berners-Lee or his group that he was referring to as incompetent, but the general user."
I'm calling the Anti-Neutrality Web Designers of Amerika!
Demands of inequality such as this should be allowed!
(btw, the spelling doctor has "loosing" as in "loosing the hownds for the huhnt")
But you just gotta have another sigarette
The current semantic web seems to offer a technology too fragile to use on the global scale: consider the complexity of the various classification and ontological schemes, the work needed to provide the metadata, etc. Also, the semantic web seems to offer great opportunities for spammers and other mischief makers. We already have comment and reference spamming, but the semantic web (on the global scale) raises the possibilities enormously.
I dunno, I'm guessing he knew exactly what he was saying... But I do wonder if he was trying to tease Tim Berners-Lee a little. It would be interesting to see/hear audio/video of that exchange.
Sure, the technical limitations of Joe Public might slow the growth of the Semantic Web on the whole, but what few people realize is that the Semantic Web has already existed for years in in-house or limited-audience networks. Just look at FOAFnaut (an update in a few weeks will return it to full usability) or the very much real-world examples in Geroimenko & Chen's Visualizing the Semantic Web (Springer, 2005).
In one of the very first papers mentioning the Semantic Web, a paragraph was devoted to something since lost in the hype around the Semantic Web: the Web of Trust, which was meant to be something like a certification of metadata. Perhaps this should again be regarded as important for the Semantic Web and the web in general (although it is not easy to manage).
By the way, Norvig is not only a Google exec, but also a well-known AI researcher and author of one of the most important books on that subject.
Slightly offtopic. Peter Norvig gave a talk at my university on similar topics, and there was a short Q&A afterwards.
:)
One of the students asked him what he did for his 20% project. He said that he was usually too busy keeping tabs on what the other employees were doing with their 20% time, so he didn't quite get around to working on his. He told us what he wanted to do, as motivation for himself.
The basic idea is that when he used to work for NASA, it always upset him when people saw faces in random spots on the moon's terrain and claimed it was aliens that NASA was covering up, or similar. So, he was planning on taking facial recognition software and running it on all of Google Earth. I think it'd be pretty awesome.
Any progress yet, Mr. Norvig? I'd love to see the results..
Powered by Web3.5 RC 2
It's the business users too that are a problem. I'm currently trying to get a project on the rails based on semantic web technology, and I'm confronted with an IT department where some are even struggling with the difference between subtyping and instantiation, let alone more advanced modelling issues... It doesn't help, of course, that most people have never even heard of conceptual modelling languages such as ORM but instead were taught to use UML and ER, where it's the modeller's responsibility to distinguish between what is conceptual, logical and physical, which of course most never did.
In regards to the Google issue, I think the idea that you should crawl everything is faulty, because you need to be able to trust the source. Most ontologies will simply be restricted to a certain domain and corresponding user group, often in a B2B context. Integrating every man and his dog, the lawnmower and the kitchen sink with some kind of top-level ontology is merely a nice-to-have philosophical issue that I don't expect to be solved in the near future, if only because we haven't seen much progress since Aristotle started toying around with the idea. In other words, at Google they are worried about an issue that's at least a decade away from now, probably even more.
Especially if the rules appear to be an incomprehensible ad-hoc mix of principles taken from a dozen not-quite-fully-baked AI dissertations.
I still don't think I truly understand how RDF is supposed to work...
I don't think anyone does.
I'm not saying that the semantic web is bullshit, but it does trigger my bullshit detector. At least one of them must be broken.
Am I part of the core demographic for Swedish Fish?
Do not forget that the Semantic Web is not a replacement for existing technologies: HTML content will always be there. But what if these little 'metadata' descriptions were added to ALL web pages? In that case, pages could be categorized, analysed and searched much more easily, and the algorithms for these operations would be better. In such a scenario, the choice of one web search engine over another would be irrelevant, because all of them would have powerful and accurate algorithms. Maybe a threat to Google's business model? That would be the perfect world, but we have to assume that websites would certainly lie or make mistakes in their semantic descriptions. OK, but... would that produce a scenario worse than the current one? Fake websites are already quite common; irrelevant sites advertise themselves by every available means to attract the most visitors, misleading them. The best web search engine is the one that best filters these sites out of its results. In a semantically described Web the problem will be the same, but there would be another easy-to-use filtering criterion to enhance the results. The web search engines' algorithms will be better for sure.
On my website, there are a few links, among which are:
- Download
- Forums
The Download and Forums links are next to each other, and highly visible (48x48 icons with labels). But people go to the forum to ask where they can download my program! When I ask them why they didn't click on the Download link, they don't give an answer.
If that isn't user incompetence, then what is it? And yes, this happened for real. In fact, it happens all the time, so it's not just 1 or 2 people.
The problem with users (authors) is valid when we consider individual authors creating data (RDF, HTML, ...) "by hand". TimBL has referred to the Semantic Web as a global database of knowledge (as compared to the current web of text content). The problem of incompetent users goes away, and higher-value data is achieved, when existing content and databases are exposed on the Semantic Web. Think sites like Slashdot, wordpress.com, amazon.com, NY Times, ...
Authoring of RDF data is not so different from authoring XML or RSS. This means that costs of putting your site on the Semantic Web are quite low. The benefits are a global reuse of information.
For example: it is easy to install WordPress SIOC plugin to export RDF from any WordPress based weblog. Individual users don't have to care what RDF is or looks like. And the data about all posts and comments are now computer readable and can be reused in a number of ways, e.g., to create a TimeLine of your posts.
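To make the "users don't have to care what RDF looks like" point concrete, here is a minimal sketch of the kind of export such a plugin might perform. It is not the actual WordPress SIOC plugin code: the post dictionary, the example.org URLs and the helper names `literal` and `post_to_triples` are assumptions made up for illustration. Only the SIOC, Dublin Core and RDF namespace URIs are real vocabulary.

```python
# Hypothetical sketch: turn weblog posts into N-Triples lines.
SIOC = "http://rdfs.org/sioc/ns#"
DCTERMS = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def literal(text):
    """Quote a string as a plain literal, escaping the common special characters."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def post_to_triples(post):
    """Describe one post (a plain dict, assumed shape) as a list of triple lines."""
    subject = f"<{post['url']}>"
    return [
        f"{subject} <{RDF_TYPE}> <{SIOC}Post> .",
        f"{subject} <{DCTERMS}title> {literal(post['title'])} .",
        f"{subject} <{SIOC}has_creator> <{post['author_url']}> .",
        f"{subject} <{SIOC}content> {literal(post['body'])} .",
    ]


# Made-up sample data standing in for rows from the weblog database.
posts = [
    {
        "url": "http://example.org/blog/1",
        "title": "Hello, Semantic Web",
        "author_url": "http://example.org/people/alice",
        "body": 'My first "machine readable" post.',
    }
]

for post in posts:
    for line in post_to_triples(post):
        print(line)
```

The author only ever writes the post; the software emits the triples, much as it already emits RSS.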
If we take this approach and expose data from existing sites in RDF, the task of authoring quality data can be accomplished. The problem of spam referred to in the article can be dealt with by signing the information. Since the Semantic Web is still young, the problems of misuse can be addressed in the architecture right from the beginning.
I would like to focus your attention on another important area: consumers of Semantic Web data. There is and will be quality data out there. What is interesting now is to find new and useful ways to use this information and add value over what can be done with simple web pages.
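As a rough illustration of what "signing the information" could mean, here is a small sketch. Big caveat: it uses an HMAC with a shared secret from the Python standard library as a stand-in. A real web-of-trust design would presumably use public-key signatures, and proper RDF canonicalization is more involved than sorting lines (blank nodes, for one). The function names, the key and the sample triples are all made up for the example.

```python
import hashlib
import hmac


def canonical_bytes(triples):
    """Sort and de-duplicate triple lines so statement order does not affect the signature."""
    return "\n".join(sorted(set(triples))).encode("utf-8")


def sign(triples, key):
    """Return a hex HMAC-SHA256 tag over the canonical form of the triples."""
    return hmac.new(key, canonical_bytes(triples), hashlib.sha256).hexdigest()


def verify(triples, key, signature):
    """Check a tag in constant time; False means the data or the key differs."""
    return hmac.compare_digest(sign(triples, key), signature)


# Made-up publisher key and data, for illustration only.
key = b"publisher-secret"
triples = [
    '<http://example.org/blog/1> <http://purl.org/dc/terms/title> "Hello" .',
    "<http://example.org/blog/1> <http://rdfs.org/sioc/ns#has_creator> <http://example.org/people/alice> .",
]
signature = sign(triples, key)

# A spammer rewriting the title invalidates the signature.
tampered = [triples[0].replace("Hello", "Cheap pills"), triples[1]]

print(verify(triples, key, signature))
print(verify(tampered, key, signature))
```

A consumer could then choose to ignore any metadata whose signature does not check out against a publisher it trusts.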
But what, exactly, is the definition of the 'Semantic Web'? How is it different from what has been done in the past? Is there any agreement of any sort as to what it means? If yes, please let me know. If not, then how can we achieve this goal if we do not know what it is?
I am confused, I really do not see too many differences in the web in the last few years. Nothing 'Earth Shattering' anyway.
putting the 'B' in LGBTQ+
We already have that problem without the Semantic Web. Semantic Web coding is not a fix for that problem, it's a fix for other problems.
This is like saying "Don't use Open Source software because people will do bad things with it". People will do bad things with or without Open Source software, and with or without the Semantic Web.
Anyway the article isn't very clear... By "Semantic Web", are we talking about using <div>s and <p>s instead of <table>s and <br>s? Or are we talking about microformats? Or something else?
One more thing... if google doesn't support the "Semantic Web", it will most likely fail.
Depends--if Norvig got Russell (co-author with him on Artificial Intelligence - a Modern Approach) to go in with him for a tag-team kind of thing, they'd probably win. On the other hand, Berners-Lee has the W3C on his side, a notoriously large and heavy organization, which could be hard to topple.
As a side note, I heard from a friend who was attending that Norvig's opening comment about people always asking him "Why are you against the Semantic Web?" was a response to Berners-Lee's opening, 'People always ask me, "Why are you against Artificial Intelligence?"'
Google has the right idea, automatic extraction of semantics from content.
But content has no semantics.
Meaning is a verb, and "to mean" is an action of a knowing subject. Communication is an attempt to stimulate the same meanings in multiple subjects--kind of a psychological choreography.
As such, meaning is not extracted from content, ever. Rather, probable meaning is inferred from content, and the basis of inference is fundamentally psychological. What a given word, symbol, sentence, paragraph or page means will depend entirely on who is doing the meaning, and any attempt to infer meaning from content will therefore require a model of who is doing the meaning.
Building such a model is non-trivial, and multiple models will be required to serve the needs of multiple constituencies. The meaning of "nekkid women with goats" will vary widely depending on who is doing the meaning: a man or a woman, a religious person or a rational person, a child or an adult, and so on.
Tagging mechanisms that allow anyone who visits a page to classify it will be subject to an enormous range of variation even without spammers, bored teenagers and other malicious entities. People have mentioned the low quality of data in freeDB, and that is within an extremely narrow, specialized area with well-recognized commercially-created categories. Consider the mess that will result from the average American, nearly half of whom believe that god created humans in our current form in the past few thousand years, being free to tag pages dealing with evolution and ID.
None of this is to say that some form of classification wouldn't be a good thing, nor that useful classifications aren't possible. But successful attempts at associating common (within a given culture/sex/age group) meanings with content are going to be vastly harder than most advocates of the semantic web believe, and will have to be based on a fundamental awareness that content on its own has no semantics. Until that fact is recognized and incorporated into the designs at the deepest level, we will get nothing useful out of the semantic web.
Blasphemy is a human right. Blasphemophobia kills.
I wouldn't be so sure (that it is not a general case).
A regular user won't be inventing his own ontologies, just as he is not inventing a new RSS format. There is a set of well-defined ontologies that you can use to describe your data. And a regular user won't be hand-crafting RDF data either. Instead, RDF data will be exported from his applications the same way RSS and Atom are exported from his weblog software, or the way Word saves users' files.
RDF data will still merge together, provided there are "crystallisation points" that are common to data from different sources.
Regarding the Luc Steels research you mention, could you give some pointers to his work?
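A toy sketch of what I mean by merging around a shared point, modelling each source as a plain Python set of (subject, predicate, object) tuples. The two sources, the example.org URIs and the helper names `nodes` and `describe` are invented for the example; only the FOAF and SIOC namespace URIs are real vocabulary. A real system would use an RDF library rather than bare tuples.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
SIOC = "http://rdfs.org/sioc/ns#"

# The same URI is used for one person by both sources: the crystallisation point.
alice = "http://example.org/people/alice"

# Source 1: exported by weblog software (made-up data).
weblog_data = {
    (alice, FOAF + "name", "Alice"),
    ("http://example.org/blog/1", SIOC + "has_creator", alice),
}

# Source 2: exported by an address book (made-up data).
address_book_data = {
    (alice, FOAF + "mbox", "mailto:alice@example.org"),
    (alice, FOAF + "name", "Alice"),
}


def nodes(graph):
    """All subjects and objects that appear in a set of triples."""
    return {t[0] for t in graph} | {t[2] for t in graph}


def describe(graph, node):
    """Every triple in which the node appears as subject or object."""
    return sorted(t for t in graph if node in (t[0], t[2]))


# Merging is just set union: identical statements collapse into one.
merged = weblog_data | address_book_data
shared = nodes(weblog_data) & nodes(address_book_data)

print(alice in shared)
for triple in describe(merged, alice):
    print(triple)
```

Neither source knows about the other, yet because both use the same identifier for Alice, the merged data connects her posts to her mailbox. If the two sources had used different URIs for her, nothing would join up, which is why the common points matter.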