Challenging the Ideas Behind the Semantic Web

Posted by ryuzaki0 on Tuesday July 18, 2006 @05:39PM from the there-isn't-any-deception-on-the-internet dept.

mytrip writes to tell us that, after a recent presentation to the American Association for Artificial Intelligence (AAAI), Tim Berners-Lee was challenged by Google exec Peter Norvig, who cited some of the many problems behind the Semantic Web. From the article: "'What I get a lot is: "Why are you against the Semantic Web?" I am not against the Semantic Web. But from Google's point of view, there are a few things you need to overcome, incompetence being the first,' Norvig said. Norvig clarified that it was not Berners-Lee or his group that he was referring to as incompetent, but the general user."

6 of 144 comments

Semantics... by Thakandar2 · 2006-07-18 17:52 · Score: 5, Funny

"Norvig clarified that it was not Berners-Lee or his group that he was referring to as incompetent, but the general user."

Here I was, thinking we were arguing over Semantics...
Semantic web is currently fragile technology by UR30 · 2006-07-18 17:56 · Score: 5, Interesting

The current semantic web seems to offer a technology too fragile to use on a global scale: consider the complexity of the various classification and ontological schemes, the work needed to provide the metadata, etc. Also, the semantic web seems to offer great opportunities for spammers and other mischief makers. We already have comment and reference spamming, but the semantic web (on a global scale) raises the possibilities enormously.
Re:Googlebombing by Wastl · 2006-07-18 18:28 · Score: 5, Insightful

The "Semantic Web" is not about search engines, as you and many other posters seem to believe. It is about representing Web content in a structured, formal way that is more easily accessed by machines, going beyond simple presentation. This can be used for searching, but also for many other applications, e.g. integration, exchange, and personalisation.

Spam content on the Semantic Web is in no way different to spam content on the normal Web (well, except that it is formal). This also means that a search engine that is capable of working with Semantic Web data has exactly the same issues with trust as traditional search engines. Except that on the Semantic Web, trust can be expressed formally as well. Similar to the authorities in Google, whose outgoing links make a statement about the trustworthiness of other sites, an "authority" on the Semantic Web can make statements about the trustworthiness of other sites. However, these statements are explicit, and they could also be used to state that another site is *not* trustworthy.
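To make the idea of explicit trust statements concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `TrustStatement` record, the `trust_score` function, the simple +1/-1 scoring, and the `.example` site names are all hypothetical and do not correspond to any Semantic Web standard or to how any search engine works. It only shows that a statement can say "not trustworthy" explicitly, and that statements from non-authorities can be ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrustStatement:
    """One explicit statement: `source` says `target` is (not) trustworthy."""
    source: str    # site making the statement (hypothetical name)
    target: str    # site the statement is about
    trusted: bool  # False means an explicit "not trustworthy"


def trust_score(target, statements, authorities):
    """Add +1 / -1 for each statement about `target` made by a known authority.

    Statements from sites outside `authorities` are ignored entirely.
    This scoring rule is an assumption chosen for simplicity.
    """
    score = 0
    for s in statements:
        if s.target == target and s.source in authorities:
            score += 1 if s.trusted else -1
    return score


authorities = {"authority.example", "other-authority.example"}
statements = [
    TrustStatement("authority.example", "good.example", True),
    TrustStatement("authority.example", "spam.example", False),
    TrustStatement("other-authority.example", "spam.example", False),
    # Self-promotion by a non-authority: ignored by trust_score.
    TrustStatement("spam.example", "spam.example", True),
]

good = trust_score("good.example", statements, authorities)
spam = trust_score("spam.example", statements, authorities)
```

The open question, of course, is who decides which sites count as authorities; the sketch simply takes that set as given.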

"Google has the right idea, automatic extraction of semantics from content."

Google does not extract any semantics from content. It merely analyses the linking between websites and connects that with keywords. No semantics here.

Sebastian
Web of Trust by VDM · 2006-07-18 18:40 · Score: 5, Interesting

In one of the very first papers mentioning the Semantic Web, a paragraph was devoted to something that was later lost in the hype around the Semantic Web: the Web of Trust, which was meant to be something like a certification of metadata. Perhaps this should once again be regarded as important for the Semantic Web and the web in general (although it is not easy to manage).
By the way, Norvig is not only a Google exec, but also a well-known AI researcher and the author of one of the most important books on that subject.
Hmph... by Jello B. · 2006-07-18 19:06 · Score: 5, Funny

That anti-semantic bastard...
Re:A bad example: FreeDB by kthejoker · 2006-07-19 00:22 · Score: 5, Insightful

Ugh, this is the major misconception of proper Semantic Web implementation.

There are two types of users of Semantic Web material: the individual user and the group.

The individual user only cares about context. It's like a Proustian adventure for him. If he tags Slashdot as "blatherscyte" because that's how he views it, then that's valid. If he tags it as "cmdrTaco" because he is stalking Rob, then that's valid, too. And if he tags it as "monkey" because one time he was petting a monkey while he viewed the site, then that's valid, too. It's like the old saying, "Whether you think you can or think you can't, you're right." There are no wrong semantics for the individual user, because it is his context alone which defines the usefulness of a tag.

For this reason, the individual user should be allowed to tag freely and without limits, and also be able to edit or remove tags later.

----

Now for the group, they have a different goal. Context does them no good, because they don't have the same context. Their goal then is consensus. Take your problem at FreeDB. The simple solution is to let people vote on the accuracy of disputed tags. Or flag ones they view as incorrect, and then review those that meet a certain threshold for flagging. Basically, you want the group to filter out things that don't apply to the group, WHILE maintaining individual context. You don't delete the tags that the group has rejected - you just hide them from the person who has come to view the group tags.
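The split between individual context and group consensus can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `TagStore` class, its method names, and the threshold value are all hypothetical, and nothing here reflects how FreeDB or any real tagging site works. It also simplifies the "flag, then review" step by treating any tag that reaches the flag threshold as rejected by the group.

```python
from collections import defaultdict


class TagStore:
    """Hypothetical tag store: personal tags are never deleted by the group."""

    def __init__(self, flag_threshold=3):
        self.flag_threshold = flag_threshold  # assumed cutoff for hiding a tag
        self.tags = defaultdict(set)    # item -> {(user, tag)}
        self.flags = defaultdict(set)   # (item, tag) -> users who flagged it

    def add_tag(self, user, item, tag):
        self.tags[item].add((user, tag))

    def remove_tag(self, user, item, tag):
        # Only the owner's own (user, tag) pair is removed.
        self.tags[item].discard((user, tag))

    def flag(self, user, item, tag):
        # A set means each user counts at most once per tag.
        self.flags[(item, tag)].add(user)

    def personal_view(self, user, item):
        """Everything this user tagged, regardless of what the group thinks."""
        return sorted(t for u, t in self.tags[item] if u == user)

    def group_view(self, item):
        """Tags on the item, hiding (not deleting) heavily flagged ones."""
        all_tags = {t for _, t in self.tags[item]}
        return sorted(
            t for t in all_tags
            if len(self.flags[(item, t)]) < self.flag_threshold
        )


store = TagStore(flag_threshold=2)
store.add_tag("alice", "slashdot.org", "news")
store.add_tag("bob", "slashdot.org", "monkey")
store.flag("alice", "slashdot.org", "monkey")
store.flag("carol", "slashdot.org", "monkey")

group_tags = store.group_view("slashdot.org")
bob_tags = store.personal_view("bob", "slashdot.org")
```

In this example the group no longer sees "monkey", but bob still does, because the group view filters the data rather than deleting it.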

I think this dichotomy of group vs. individual is what has gotten us into trouble with the Semantic Web. To use one example, I think delicious' big mistake was to show you "popular" tags for a given link. That encourages you not to create your own tags, but to just piggyback on popularity instead. Over time, this creates homogeneity, which is great for the group, but not for the individual user. Sure, they can probably find that link again in a minimal amount of time, but if an individual tag would have helped them find it faster and they shunned individual tags for groupthink, so much the worse for them.

On the flip side, if you don't build proper weighting and trust metrics into your tagging system, you are opening yourself up not only to abuse and inappropriate behavior, but also to the "incompetence" mentioned in the article, which is not so much incompetence as a zero-filter. It's like reading Slashdot at -1. It's kind of a touchy-feely way to look at it, but in Web 2.0 thinking, it's bad to delete content; just filter it out instead. It's bad to censor opinions from the software side; let each user do their own stifling. Give the users complete control over the content, and they will find models that work. It's that simple.

The main problem with the Google guy's point is that philosophically, Google is more groupthink than individual user, because they're a search engine. They value consensus over context. In the future, perhaps they will value context a little bit more than they do. Until then, they have to stand where they stand, because they can't let context into their system. They've tried some clunky mechanisms to do so (Personal Search, anyone?) but until they get it right, the Semantic Web won't have any value to them.