Challenging the Ideas Behind the Semantic Web
mytrip writes to tell us that after a recent presentation to the American Association for Artificial Intelligence (AAAI) Tim Berners-Lee was challenged by fellow Google exec Peter Norvig citing some of the many problems behind the Semantic Web. From the article: "'What I get a lot is: "Why are you against the Semantic Web?" I am not against the Semantic Web. But from Google's point of view, there are a few things you need to overcome, incompetence being the first,' Norvig said. Norvig clarified that it was not Berners-Lee or his group that he was referring to as incompetent, but the general user."
is the users.
Not the ones searching but the ones creating the content.
There'll be some idiot out there (like there is now) who will code his data in a way that guarantees he gets the most page views, etc. So often-searched terms will turn up in search indexes and the like.
It's a losing proposition unless you come up with filters, but then they have their own set of problems.
There's a gorilla from Manilla who's a fella that stinks of vanilla and has salmonella.
"Norvig clarified that it was not Berners-Lee or his group that he was referring to as incompetent, but the general user."
Here I was, thinking we were arguing over Semantics...
The current semantic web seems to offer a technology too fragile to use on the global scale: consider the complexity of the various classification and ontological schemes, the work needed to provide the metadata, etc. Also, the semantic web seems to offer great opportunities for spammers and other mischief makers. We already have comment and reference spamming, but the semantic web (on the global scale) raises the possibilities enormously.
The biggest problem with the semantic web is spam. If you can trust the tags, it's a beautiful idea. If you can't, it's worse than useless - it's a waste of time. Google has the right idea, automatic extraction of semantics from content. If there's no real content, then (hopefully) that will be reflected in the semantic analysis.
Me, I estimate we're 5-10 years away from doing anything terribly useful with all of this stuff, but I can definitely envision the day when an internet without semantics seems as distant as an internet without Google.
Thanks for the illustration of what Norvig meant. How is "Google Director of Search and AAAI Fellow Peter Norvig" (original article) semantically equivalent to "fellow Google exec" (Slashdot summary)? The latter suggests that Tim Berners-Lee too is a Google exec, which would be news to him.
It's really, really difficult to get people to follow rules. We're lazy, we're incompetent (yes), and some of us are evil. I still don't think I truly understand how RDF is supposed to work exactly, and it doesn't even seem like it will be fun to try.
On the other hand, it's really easy to release a million monkeys and let them create what they will. It's not so easy to sort through what they end up producing, but Google does a surprisingly good job of this.
It reminds me of the early days of the Web, when companies like CompuServe and AOL wanted to design and own all content. On the other hand, an internet server with httpd let anybody make a ~/public_html directory and put up whatever they wanted to. The million monkeys won that battle. I think they'll win this one, too.
In one of the very first papers mentioning the Semantic Web, a paragraph was devoted to something that was later lost in the hype around the Semantic Web: the Web of Trust, which was meant to be something like a certification of metadata. Perhaps this should again be regarded as important for the Semantic Web and the web in general (although it is not easy to manage).
By the way, Norvig is not only a Google exec, but also a well-known AI researcher and author of one of the most important books on that subject.
Slightly offtopic. Peter Norvig gave a talk at my university on similar topics, and there was a short Q&A afterwards.
:)
One of the students asked him what he did for his 20% project. He said that he was usually too busy keeping tabs on what the other employees were doing with their 20% time, so he didn't quite get around to working on his. He told us what he wanted to do, as motivation for himself.
The basic idea is that when he used to work for NASA, it'd always make him upset when people saw faces in random spots on the moon's terrain and claimed it was aliens that NASA was covering up, or similar. So, he was planning on taking facial recognition software and running it on all of Google Earth. I think it'd be pretty awesome..
Any progress yet, Mr. Norvig? I'd love to see the results..
Powered by Web3.5 RC 2
That anti-semantic bastard...
This reminds me of the famous Semantic Knight parody...
Ugh, this is the major misconception of proper Semantic Web implementation.
There are two types of users of Semantic Web material: the individual user and the group.
The individual user only cares about context. It's like a Proustian adventure for him. If he tags Slashdot as "blatherscyte" because that's how he views it, then that's valid. If he tags it as "cmdrTaco" because he is stalking Rob, then that's valid, too. And if he tags it as "monkey" because one time he was petting a monkey while he viewed the site, then that's valid, too. It's like the old saying, "Whether you think you can or think you can't, you're right." There are no wrong semantics for the individual user, because it is his context alone which defines the usefulness of a tag.
For this reason, the individual user should be allowed to tag freely and without limits, and also be able to edit or remove tags later.
----
Now for the group, they have a different goal. Context does them no good, because they don't have the same context. Their goal then is consensus. Take your problem at FreeDB. The simple solution is to let people vote on the accuracy of disputed tags. Or flag ones they view as incorrect, and then review those that meet a certain threshold for flagging. Basically, you want the group to filter out things that don't apply to the group, WHILE maintaining individual context. You don't delete the tags that the group has rejected - you just hide them from the person who has come to view the group tags.
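To make the flag-and-hide idea concrete, here is a minimal sketch. Everything in it is hypothetical (the `TagStore` class, the method names, the threshold of 3); it is not how FreeDB or any real tagging site works, just one way the scheme described above could look. Flagged tags are never deleted; they simply drop out of the group view once enough people have flagged them, while each user's own tags stay intact.

```python
from collections import defaultdict

FLAG_THRESHOLD = 3  # hypothetical cutoff; a real site would tune this


class TagStore:
    """Keeps every tag anyone applies; the group view only hides flagged ones."""

    def __init__(self, flag_threshold=FLAG_THRESHOLD):
        self.flag_threshold = flag_threshold
        # item -> tag -> set of users who applied that tag
        self.taggers = defaultdict(lambda: defaultdict(set))
        # item -> tag -> set of users who flagged that tag as wrong
        self.flaggers = defaultdict(lambda: defaultdict(set))

    def tag(self, user, item, tag):
        self.taggers[item][tag].add(user)

    def flag(self, user, item, tag):
        # A set means each user can flag a given tag at most once.
        self.flaggers[item][tag].add(user)

    def personal_tags(self, user, item):
        # Individual context: a user's own tags are always valid for that user.
        return sorted(t for t, users in self.taggers[item].items() if user in users)

    def group_tags(self, item):
        # Consensus view: hide (do not delete) tags that reached the threshold.
        return sorted(
            t for t in self.taggers[item]
            if len(self.flaggers[item][t]) < self.flag_threshold
        )
```

With a threshold of 2, a "monkey" tag flagged by two different users would vanish from `group_tags` but still show up in `personal_tags` for whoever applied it.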
I think this dichotomy of group vs. individual is what has gotten us into trouble with the Semantic Web. To use one example, I think delicious' big mistake was to show you "popular" tags for a given link. What that does is encourage you not to create your own tags, but instead to piggyback on popularity. Over time, this creates homogeneity, which is great for the group, but not for the individual user. Sure, they can probably find that link again in a minimal amount of time, but if an individual tag would have helped them find it faster and they shunned individual tags for groupthink, so much the worse for them.
And on the flipside, if you don't build proper weighting and trust metrics into your tagging system, you are opening yourself up to not only abuse and inappropriate behavior, but also to the "incompetence" mentioned in the article, which is not so much incompetence as a zero-filter. It's like reading Slashdot at -1. It's kind of a touchy-feely way to look at it, but in Web 2.0 thinking, it's bad to delete content; just filter it out instead. It's bad to censor opinions from the software side; let each user do their own stifling. Give the users complete control over the content, and they will find models that work. It's that simple.
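A rough sketch of what "weighting and trust metrics" plus "filter, don't delete" could mean in practice. The function names, the trust numbers, and the default trust of 0.1 for unknown users are all made up for illustration; no real site's scoring is being described here. Each vote on a tag counts in proportion to the voter's trust, and each reader picks their own threshold, much like choosing a browsing level on Slashdot.

```python
def tag_scores(votes, trust, default_trust=0.1):
    """Sum trust-weighted votes per tag.

    votes: iterable of (user, tag, value) where value is +1 or -1.
    trust: dict mapping user -> weight; unknown users get default_trust
           (an assumed value, not taken from any real system).
    """
    scores = {}
    for user, tag, value in votes:
        weight = trust.get(user, default_trust)
        scores[tag] = scores.get(tag, 0.0) + value * weight
    return scores


def visible_tags(scores, threshold):
    # Nothing is deleted: a lower threshold simply shows more of the noise.
    return sorted(tag for tag, score in scores.items() if score >= threshold)
```

A reader who sets the threshold low sees everything, spam included; a reader who sets it higher sees only tags backed by trusted voters. The stored votes stay the same either way.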
The main problem with the Google guy's point is that philosophically, Google is more groupthink than individual user, because they're a search engine. They value consensus over context. In the future, perhaps they will value context a little bit more than they do. Until then, they have to stand where they stand, because they can't let context into their system. They've tried some clunky mechanisms to do so (Personal Search, anyone?) but until they get it right, the Semantic Web won't have any value to them.