Challenging the Ideas Behind the Semantic Web

Posted by ryuzaki0 on Tuesday July 18, 2006 @05:39PM from the there-isn't-any-deception-on-the-internet dept.

mytrip writes to tell us that after a recent presentation to the American Association for Artificial Intelligence (AAAI) Tim Berners-Lee was challenged by fellow Google exec Peter Norvig citing some of the many problems behind the Semantic Web. From the article: "'What I get a lot is: "Why are you against the Semantic Web?" I am not against the Semantic Web. But from Google's point of view, there are a few things you need to overcome, incompetence being the first,' Norvig said. Norvig clarified that it was not Berners-Lee or his group that he was referring to as incompetent, but the general user."

7 of 144 comments

Problems w/ the Semantic Web by CTalkobt · 2006-07-18 17:49 · Score: 4, Insightful

is the users.

Not the ones searching but the ones creating the content.

There'll be some idiot out there (like there is now) who will code his data in a way that guarantees he gets the most page views, etc. So often-searched terms will turn up on search indexes and their ilk.

It's a losing proposition unless you come up with filters, but then they have their own set of problems.

--
There's a gorilla from Manilla who's a fella that stinks of vanilla and has salmonella.
Googlebombing by QuantumFTL · 2006-07-18 18:11 · Score: 4, Insightful

The biggest problem with the semantic web is spam. If you can trust the tags, it's a beautiful idea. If you can't, it's worse than useless - it's a waste of time. Google has the right idea, automatic extraction of semantics from content. If there's no real content, then (hopefully) that will be reflected in the semantic analysis.

Me, I estimate we're 5-10 years away from doing anything terribly useful with all of this stuff, but I can definitely envision the day when an internet without semantics seems as distant as an internet without Google.
1. Re:Googlebombing by Wastl · 2006-07-18 18:28 · Score: 5, Insightful
  
  The "Semantic Web" is not about search engines, as you and many other posters seem to believe. It is about representing Web content in a structured, formal way that is more easily accessed by machines, going beyond simple presentation. This can be used for searching, but also for many other applications, e.g. integration, exchange, and personalisation.
  
  Spam content on the Semantic Web is in no way different to spam content on the normal Web (well, except that it is formal). This also means that a search engine that is capable of working with Semantic Web data has exactly the same issues with trust as traditional search engines. Except that on the Semantic Web, trust can be expressed formally as well. Similar to the authorities in Google, whose outgoing links make a statement about the trustworthiness of other sites, an "authority" on the Semantic Web can make statements about the trustworthiness of other sites. However, these statements are explicit, and they could also be used to state that another site is *not* trustworthy.
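
  A minimal sketch of what such explicit trust statements could look like, in Python. The `TrustStatement` record, the `trust_score` function, and the example hostnames are hypothetical names invented for illustration; they are not part of RDF or any Semantic Web vocabulary. The only point is that a negative statement is representable, which a plain hyperlink cannot express.

```python
from typing import Iterable, NamedTuple, Set


class TrustStatement(NamedTuple):
    """One explicit claim: `source` says `target` is (or is not) trustworthy."""
    source: str
    target: str
    trusted: bool


def trust_score(target: str,
                statements: Iterable[TrustStatement],
                authorities: Set[str]) -> int:
    """Sum the claims about `target` made by sources we accept as authorities.

    Positive claims count +1 and negative claims count -1. Statements from
    sources outside `authorities` are ignored. This weighting is an
    assumption made for the sketch, not an established metric.
    """
    score = 0
    for statement in statements:
        if statement.target == target and statement.source in authorities:
            score += 1 if statement.trusted else -1
    return score


statements = [
    TrustStatement("authority.example", "blog.example", True),
    TrustStatement("authority.example", "spam.example", False),
    TrustStatement("spam.example", "spam.example", True),  # self-promotion
]
authorities = {"authority.example"}

print(trust_score("blog.example", statements, authorities))
print(trust_score("spam.example", statements, authorities))
```

  The spam site's claim about itself is ignored because it is not in the set of accepted authorities, so the same trust problem remains: someone still has to decide who the authorities are.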
  
  Google has the right idea, automatic extraction of semantics from content.
  
  Google does not extract any semantics from content. It merely analyses the linking between websites and connects that with keywords. No semantics here.
  
  Sebastian
Incompetence of users such as Slashdot editors... by rsidd · 2006-07-18 18:14 · Score: 4, Insightful

Thanks for the illustration of what Norvig meant. How is "Google Director of Search and AAAI Fellow Peter Norvig" (original article) semantically equivalent to "fellow Google exec" (Slashdot summary)? The latter suggests that Tim Berners-Lee too is a Google exec, which would be news to him.
Always bet on the million monkeys by IvyMike · 2006-07-18 18:26 · Score: 4, Insightful

It's really, really difficult to get people to follow rules. We're lazy, we're incompetent (yes), and some of us are evil. I still don't think I truly understand how RDF is supposed to work exactly, and it doesn't even seem like it will be fun to try.

On the other hand, it's really easy to release a million monkeys and let them create what they will. It's not so easy to sort through what they end up producing, but Google does a surprisingly good job of this.

It reminds me of the early days of the Web, when companies like CompuServe and AOL wanted to design and own all content. On the other hand, an internet server with httpd let anybody make a ~/public_html directory and put up whatever they wanted to. The million monkeys won that battle. I think they'll win this one, too.
Re:Semantic web is currently fragile technology by znu · 2006-07-18 18:36 · Score: 4, Insightful

The full semantic web scheme really ignores a lot of what the Internet has taught us about what technologies succeed. It's not about grand visions and long specifications, it's about simple stuff that solves real problems of limited scope. Look at RSS, for instance; it's about the simplest thing which could do the job it does.

I think we'll eventually realize most of the benefits of the semantic web, but it won't be a result of a grand vision imposed from the top down and implemented all at once. It'll probably be through increasing adoption of microformats, which don't try to classify and specify everything, and are implemented entirely using existing web standards.

--
This space unintentionally left unblank.
Re:A bad example: FreeDB by kthejoker · 2006-07-19 00:22 · Score: 5, Insightful

Ugh, this is the major misconception of proper Semantic Web implementation.

There are two types of user of Semantic Web material: the individual user and the group.

The individual user only cares about context. It's like a Proustian adventure for him. If he tags Slashdot as "blatherscyte" because that's how he views it, then that's valid. If he tags it as "cmdrTaco" because he is stalking Rob, then that's valid, too. And if he tags it as "monkey" because one time he was petting a monkey while he viewed the site, then that's valid, too. It's like the old saying, "Whether you think you can or think you can't, you're right." There are no wrong semantics for the individual user, because it is his context alone which defines the usefulness of a tag.

For this reason, the individual user should be allowed to tag freely and without limits, and also be able to edit or remove tags later.

----

Now for the group, they have a different goal. Context does them no good, because they don't have the same context. Their goal then is consensus. Take your problem at FreeDB. The simple solution is to let people vote on the accuracy of disputed tags. Or flag ones they view as incorrect, and then review those that meet a certain threshold for flagging. Basically, you want the group to filter out things that don't apply to the group, WHILE maintaining individual context. You don't delete the tags that the group has rejected - you just hide them from the person who has come to view the group tags.
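
The flag-and-hide idea above can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the `TagStore` class, its method names, and the threshold value are hypothetical and do not describe how FreeDB or any real tagging service works. The sketch only shows that a tag can drop out of the group view while staying in its owner's personal view.

```python
from collections import defaultdict


class TagStore:
    """Keeps every tag, but hides heavily flagged ones from the group view."""

    def __init__(self, flag_threshold=3):
        # Hypothetical cutoff: a tag flagged by this many distinct users
        # is hidden from the group view.
        self.flag_threshold = flag_threshold
        self.tags = defaultdict(set)   # (user, item) -> tags that user applied
        self.flags = defaultdict(set)  # (item, tag) -> users who flagged it

    def add_tag(self, user, item, tag):
        self.tags[(user, item)].add(tag)

    def flag(self, user, item, tag):
        # A set means each user can flag a given tag only once.
        self.flags[(item, tag)].add(user)

    def personal_view(self, user, item):
        """The individual's own tags are always shown to that individual."""
        return sorted(self.tags.get((user, item), set()))

    def group_view(self, item):
        """Tags from all users, minus those flagged past the threshold."""
        visible = set()
        for (_owner, tagged_item), tags in self.tags.items():
            if tagged_item != item:
                continue
            for tag in tags:
                flaggers = self.flags.get((item, tag), set())
                if len(flaggers) < self.flag_threshold:
                    visible.add(tag)
        return sorted(visible)


store = TagStore(flag_threshold=2)
store.add_tag("alice", "slashdot", "news")
store.add_tag("bob", "slashdot", "monkey")
store.flag("alice", "slashdot", "monkey")
store.flag("carol", "slashdot", "monkey")

print(store.group_view("slashdot"))
print(store.personal_view("bob", "slashdot"))
```

Nothing is deleted: the rejected tag is filtered at read time, so lowering the threshold or asking for the owner's view brings it back.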

I think this dichotomy of group vs. individual is what has gotten us into trouble with the Semantic Web. To use one example, I think delicious' big mistake was to show you "popular" tags for a given link. What that does is encourage you not to create your own tags, but instead just piggyback on popularity. Over time, this creates homogeneity, which is great for the group, but not for the individual user. Sure, they can probably find that link again in a minimal amount of time, but if an individual tag would have helped them find it faster and they shunned individual tags for groupthink, so much the worse for them.

And on the flip side, if you don't build proper weighting and trust metrics into your tagging system, you are opening yourself up not only to abuse and inappropriate behavior, but also to the "incompetence" mentioned in the article, which is not so much incompetence as a zero-filter. It's like reading Slashdot at -1. It's kind of a touchy-feely way to look at it, but in Web 2.0 thinking, it's bad to delete content; just filter it out instead. It's bad to censor opinions from the software side; let each user do their own stifling. Give the users complete control over the content, and they will find models that work. It's that simple.

The main problem with the Google guy's point is that philosophically, Google is more groupthink than individual user, because they're a search engine. They value consensus over context. In the future, perhaps they will value context a little bit more than they do. Until then, they have to stand where they stand, because they can't let context into their system. They've tried some clunky mechanisms to do so (Personal Search, anyone?) but until they get it right, the Semantic Web won't have any value to them.