Challenging the Ideas Behind the Semantic Web


Posted by ryuzaki0 on Tuesday July 18, 2006 @05:39PM from the there-isn't-any-deception-on-the-internet dept.

mytrip writes to tell us that after a recent presentation to the American Association for Artificial Intelligence (AAAI) Tim Berners-Lee was challenged by fellow Google exec Peter Norvig citing some of the many problems behind the Semantic Web. From the article: "'What I get a lot is: "Why are you against the Semantic Web?" I am not against the Semantic Web. But from Google's point of view, there are a few things you need to overcome, incompetence being the first,' Norvig said. Norvig clarified that it was not Berners-Lee or his group that he was referring to as incompetent, but the general user."

Problems w/ the Semantic Web by CTalkobt · 2006-07-18 17:49 · Score: 4, Insightful

is the users.

Not the ones searching but the ones creating the content.

There'll be some idiot out there (like there is now) who will code his data in a way that guarantees he gets the most page views, etc. So often-searched terms will turn up in search indexes and the like.

It's a losing proposition unless you come up with filters, but then they have their own set of problems.

--
There's a gorilla from Manilla who's a fella that stinks of vanilla and has salmonella.
1. Re:Problems w/ the Semantic Web by CRCulver · 2006-07-18 18:11 · Score: 4, Interesting
  
  ...is the users. Not the ones searching but the ones creating the content.
  
  Sure, the technical limitations of Joe Public might slow the growth of the Semantic Web on the whole, but what few people realize is that the Semantic Web has already existed for years in in-house or limited-audience networks. Just look at FOAFnaut (an update in a few weeks will return it to full usability) or the very much real-world examples in Geroimenko & Chen's Visualizing the Semantic Web (Springer, 2005).
Semantics... by Thakandar2 · 2006-07-18 17:52 · Score: 5, Funny

"Norvig clarified that it was not Berners-Lee or his group that he was referring to as incompetent, but the general user."

Here I was, thinking we were arguing over Semantics...
Damn by ErikTheRed · 2006-07-18 17:53 · Score: 4, Funny

"...Norvig clarified that it was not Berners-Lee or his group that he was referring to as incompetent, but the general user."
Because Norvig vs. Berners-Lee going 10 rounds in a cage is something I'd pay to see.

--

Help save the critically endangered Blue Iguana
Semantic web is currently fragile technology by UR30 · 2006-07-18 17:56 · Score: 5, Interesting

The current semantic web seems to offer a technology too fragile to use on the global scale: consider the complexity of the various classification and ontological schemes, the work needed to provide the metadata, etc. Also, the semantic web seems to offer great opportunities for spammers and other mischief makers. We already have comment and reference spamming, but the semantic web (on the global scale) raises the possibilities enormously.
1. Re:Semantic web is currently fragile technology by znu · 2006-07-18 18:36 · Score: 4, Insightful
  
  The full semantic web scheme really ignores a lot of what the Internet has taught us about what technologies succeed. It's not about grand visions and long specifications, it's about simple stuff that solves real problems of limited scope. Look at RSS, for instance; it's about the simplest thing which could do the job it does.
  
  I think we'll eventually realize most of the benefits of the semantic web, but it won't be a result of a grand vision imposed from the top down and implemented all at once. It'll probably be through increasing adoption of microformats, which don't try to classify and specify everything, and are implemented entirely using existing web standards.
  
  --
  This space unintentionally left unblank.
Googlebombing by QuantumFTL · 2006-07-18 18:11 · Score: 4, Insightful

The biggest problem with the semantic web is spam. If you can trust the tags, it's a beautiful idea. If you can't, it's worse than useless - it's a waste of time. Google has the right idea, automatic extraction of semantics from content. If there's no real content, then (hopefully) that will be reflected in the semantic analysis.

Me, I estimate we're 5-10 years away from doing anything terribly useful with all of this stuff, but I can definitely envision the day when an internet without semantics seems as distant as an internet without Google.
1. Re:Googlebombing by Wastl · 2006-07-18 18:28 · Score: 5, Insightful
  
  The "Semantic Web" is not about search engines, as you and many other posters seem to believe. It is about representing Web content in a structured, formal way that is more easily accessed by machines, going beyond simple presentation. This can be used for searching, but also for many other applications, e.g. integration, exchange, personalisation, ... .
  
  Spam content on the Semantic Web is in no way different to spam content on the normal Web (well, except that it is formal). This also means that a search engine that is capable of working with Semantic Web data has exactly the same issues with trust as traditional search engines. Except that on the Semantic Web, trust can be expressed formally as well. Similar to the authorities in Google, whose outgoing links make a statement about the trustworthiness of other sites, an "authority" on the Semantic Web can make statements about the trustworthiness of other sites. However, these statements are explicit, and they could also be used to state that another site is *not* trustworthy.
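A minimal sketch of what "trust expressed formally" could look like, assuming trust statements are stored as simple (source, predicate, target) triples. The predicate names `trusts` and `distrusts`, the site names, and the weights are all invented for illustration; this is not an existing Semantic Web vocabulary.

```python
# Hypothetical trust statements as (source, predicate, target) triples.
# Predicate names and site names are assumptions made for this sketch.
STATEMENTS = [
    ("authority.example", "trusts", "good-data.example"),
    ("authority.example", "distrusts", "spam-farm.example"),
    ("other.example", "trusts", "spam-farm.example"),
]

# How much weight the reader gives each source (also an assumption).
WEIGHTS = {"authority.example": 1.0, "other.example": 0.25}


def trust_score(site, statements, weights):
    """Sum weighted explicit statements about `site`.

    A "trusts" statement adds the source's weight, a "distrusts"
    statement subtracts it. Sources without a weight count as 0.
    """
    score = 0.0
    for source, predicate, target in statements:
        if target != site:
            continue
        weight = weights.get(source, 0.0)
        if predicate == "trusts":
            score += weight
        elif predicate == "distrusts":
            score -= weight
    return score
```

Unlike an outgoing hyperlink, the explicit `distrusts` statement lets an authority lower a site's score, which is the difference the paragraph above points out.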
  
  Google has the right idea, automatic extraction of semantics from content.
  
  Google does not extract any semantics from content. It merely analyses the linking between websites and connects that with keywords. No semantics here.
  
  Sebastian
2. Re:Googlebombing by QuantumFTL · 2006-07-18 18:53 · Score: 4, Informative
  
  Google does not extract any semantics from content. It merely analyses the linking between websites and connects that with keywords. No semantics here.
  
  I believe you are referring to PageRank, which is one of many algorithms used by Google to determine search relevance. This article discusses their use of Latent Semantic Indexing, which is a somewhat crude but effective form of semantic inference which is widely used in the field of NLP.
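For readers unfamiliar with the link analysis being referred to as PageRank here, a rough sketch of the textbook power-iteration formulation follows. It assumes a tiny hand-made link graph (`LINKS`) and the commonly quoted damping factor of 0.85; it says nothing about how Google's production ranking actually works.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    `links` maps each page to the list of pages it links to.
    Returns a dict mapping each page to its rank; ranks sum to 1.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly over its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Toy graph (an assumption): three pages link to "c", and "c" links to "a".
LINKS = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
```

Note that nothing in this computation looks at page content, only at who links to whom, which is the point the parent was making about PageRank specifically.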
Incompetence of users such as Slashdot editors... by rsidd · 2006-07-18 18:14 · Score: 4, Insightful

Thanks for the illustration of what Norvig meant. How is "Google Director of Search and AAAI Fellow Peter Norvig" (original article) semantically equivalent to "fellow Google exec" (Slashdot summary)? The latter suggests that Tim Berners-Lee too is a Google exec, which would be news to him.
Always bet on the million monkeys by IvyMike · 2006-07-18 18:26 · Score: 4, Insightful

It's really, really difficult to get people to follow rules. We're lazy, we're incompetent (yes), and some of us are evil. I still don't think I truly understand how RDF is supposed to work exactly, and it doesn't even seem like it will be fun to try.

On the other hand, it's really easy to release a million monkeys and let them create what they will. It's not so easy to sort through what they end up producing, but Google does a surprisingly good job of this.

It reminds me of the early days of the Web, when companies like CompuServe and AOL wanted to design and own all content. On the other hand, an internet server with httpd let anybody make a ~/public_html directory and put up whatever they wanted to. The million monkeys won that battle. I think they'll win this one, too.
Web of Trust by VDM · 2006-07-18 18:40 · Score: 5, Interesting

In one of the very first papers mentioning the Semantic Web, a paragraph was devoted to something since lost in the hype around the semantic web: the Web of trust, which was meant to be something like a certification of metadata. Perhaps this should again be regarded as important for the semantic web and the web in general (although it is not easy to manage).
By the way, Norvig is not only a Google exec, but also a well-known AI researcher, author of one of the most important books on that subject.
Norvig's personal project by tfinniga · 2006-07-18 18:43 · Score: 4, Interesting

Slightly offtopic. Peter Norvig gave a talk at my university on similar topics, and there was a short Q&A afterwards.

One of the students asked him what he did for his 20% project. He said that he was usually too busy keeping tabs on what the other employees were doing with their 20% time, so he didn't quite get around to working on his. He told us what he wanted to do, as motivation for himself.

The basic idea is that when he used to work for NASA, it'd always make him upset when people saw faces in random spots on the moon's terrain, and claimed it was aliens that NASA was covering up, or similar. So, he was planning on taking facial recognition software and running it on all of Google Earth. I think it'd be pretty awesome..
Any progress yet, Mr. Norvig? I'd love to see the results.. :)

--
Powered by Web3.5 RC 2
Hmph... by Jello+B. · 2006-07-18 19:06 · Score: 5, Funny

That anti-semantic bastard...
Semantic knight by Anonymous Coward · 2006-07-18 20:20 · Score: 4, Funny

This reminds me of the famous Semantic knight parody...
Re:A bad example: FreeDB by kthejoker · 2006-07-19 00:22 · Score: 5, Insightful

Ugh, this is the major misconception of proper Semantic Web implementation.

There are two user types of Semantic Web material: the individual user and the group.

The individual user only cares about context. It's like a Proustian adventure for him. If he tags Slashdot as "blatherscyte" because that's how he views it, then that's valid. If he tags it as "cmdrTaco" because he is stalking Rob, then that's valid, too. And if he tags it as "monkey" because one time he was petting a monkey while he viewed the site, then that's valid, too. It's like the old saying, "Whether you think you can or think you can't, you're right." There are no wrong semantics for the individual user, because it is his context alone which defines the usefulness of a tag.

For this reason, the individual user should be allowed to tag freely and without limits, and also be able to edit or remove tags later.

----

Now for the group, they have a different goal. Context does them no good, because they don't have the same context. Their goal then is consensus. Take your problem at FreeDB. The simple solution is to let people vote on the accuracy of disputed tags. Or flag ones they view as incorrect, and then review those that meet a certain threshold for flagging. Basically, you want the group to filter out things that don't apply to the group, WHILE maintaining individual context. You don't delete the tags that the group has rejected - you just hide them from the person who has come to view the group tags.
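A minimal sketch of the hide-don't-delete idea, assuming a simple in-memory store. The class name `TagStore`, its methods, and the flag threshold are hypothetical, and the sketch simplifies "review those that meet a certain threshold" to "hide from the group view once the threshold is met".

```python
from collections import defaultdict


class TagStore:
    """Keeps every user's tags; the group view hides flagged tags
    but never deletes them, so individual context survives."""

    def __init__(self, flag_threshold=3):
        self.flag_threshold = flag_threshold
        self.tags = defaultdict(set)   # (item, tag) -> users who applied it
        self.flags = defaultdict(set)  # (item, tag) -> users who flagged it

    def add_tag(self, user, item, tag):
        self.tags[(item, tag)].add(user)

    def flag(self, user, item, tag):
        # Sets mean one user can flag a given tag only once.
        self.flags[(item, tag)].add(user)

    def personal_view(self, user, item):
        """All tags this user applied to the item, flagged or not."""
        return sorted(
            tag for (tagged_item, tag), users in self.tags.items()
            if tagged_item == item and user in users
        )

    def group_view(self, item):
        """Tags on the item with fewer flags than the threshold."""
        return sorted(
            tag for (tagged_item, tag) in self.tags
            if tagged_item == item
            and len(self.flags.get((tagged_item, tag), ())) < self.flag_threshold
        )
```

For example, if bob tags Slashdot as "monkey" and enough other users flag that tag, `group_view("slashdot")` stops listing it while `personal_view("bob", "slashdot")` still returns it.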

I think this dichotomy of group vs. individual is what has gotten us into trouble with the Semantic Web. To use one example, I think delicious' big mistake was to show you "popular" tags for a given link. What that does is encourage you not to create your own tags, but instead just piggyback on popularity. Over time, this creates homogeneity, which is great for the group, but not for the individual user. Sure, they can probably find that link again in a minimal amount of time, but if an individual tag would have helped them find it faster and they shunned individual tags for groupthink, so much the worse for them.

And on the flipside, if you don't build proper weighting and trust metrics into your tagging system, you are opening yourself up to not only abuse and inappropriate behavior, but also to the "incompetence" mentioned in the article, which is not so much incompetence as a zero-filter. It's like reading Slashdot at -1. It's kind of a touchy-feely way to look at it, but in Web 2.0 thinking, it's bad to delete content; just filter it out instead. It's bad to censor opinions from the software side; let each user do their own stifling. Give the users complete control over the content, and they will find models that work. It's that simple.

The main problem with the Google guy's point is that philosophically, Google is more groupthink than individual user, because they're a search engine. They value consensus over context. In the future, perhaps they will value context a little bit more than they do. Until then, they have to stand where they stand, because they can't let context into their system. They've tried some clunky mechanisms to do so (Personal Search, anyone?) but until they get it right, the Semantic Web won't have any value to them.