Semantic Web Under Suspicion
Dr Occult writes "Much of the talk at the 2006 World Wide Web conference has been about the technologies behind the so-called semantic web. The idea is to make the web intelligent by storing data such that it can be analyzed better by our machines, instead of the user having to sort and analyze the data from search engines. From the article: 'Big business, whose motto has always been time is money, is looking forward to the day when multiple sources of financial information can be cross-referenced to show market patterns almost instantly.' However, concern is also growing about the misuses of this intelligent web as an affront to privacy and security."
...and growing and evolving.
Take a look at the "blogosphere" and the tagging/classification initiative that's happening there.
Sure, it seems crude and unrefined but it's working, like most grass-roots initiatives do when compared with grandiose "industry standards" and the big, bulky workgroups that try to define them.
What I really want to see is the search engine reduce the duplicated content to single entries (try Googling for a Java class name and you'll see how many Google-searched websites have the API on them), or order results by recurrence of the word or phrase, giving the context more value than the popularity of the page.
There is a huge problem with this, and it goes back to the days of people jamming 1000 instances of their keywords at the bottom of their pages in the same font color as the background. Also, your desire to rate the pages on context requires an ontology-type algo, which is NOT easy. Google has been working on this for a little while now, but it is a big hill to climb. They are using popularity as a substitute for this. It is not the most effective, but it is a pretty decent second option.
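To make the trade-off concrete, here is a rough sketch of the two ideas being argued about: collapsing identical pages into one entry, then ordering what is left by how often the search term appears. Everything in it is made up for illustration (the URLs, the page text, the function names), and exact-match hashing is only the crudest way to spot duplicates. It is not how Google or any real engine works.

```python
import hashlib
import re


def fingerprint(text: str) -> str:
    """Hash of the case- and whitespace-normalized text, used to spot exact duplicates."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def dedupe(pages: dict[str, str]) -> dict[str, str]:
    """Keep only the first URL seen for each distinct page body."""
    seen: set[str] = set()
    unique: dict[str, str] = {}
    for url, text in pages.items():
        key = fingerprint(text)
        if key not in seen:
            seen.add(key)
            unique[url] = text
    return unique


def rank_by_frequency(pages: dict[str, str], term: str) -> list[str]:
    """Order URLs by how often the term occurs as a whole word, most frequent first."""
    pattern = re.compile(r"\b" + re.escape(term.lower()) + r"\b")
    counts = {url: len(pattern.findall(text.lower())) for url, text in pages.items()}
    return sorted(counts, key=lambda url: counts[url], reverse=True)


# Hypothetical pages: two mirrors of the same API doc, a tutorial, and a stuffed page.
PAGES = {
    "mirror-a.example/ArrayList": "ArrayList is a resizable array. ArrayList implements List.",
    "mirror-b.example/ArrayList": "ArrayList  is a resizable array.  ArrayList implements List.",
    "tutorial.example/lists": "How to pick between ArrayList and LinkedList.",
    "spam.example/buy": "Cheap pills! " + "ArrayList " * 50,
}

unique_pages = dedupe(PAGES)
ranking = rank_by_frequency(unique_pages, "ArrayList")
```

The second mirror drops out, which is the deduplication the first poster asked for, but the stuffed page lands at the top of `ranking`. That is the keyword-jamming objection in miniature: raw term counts are trivial to game.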
There is another issue with the approach you suggest. If Google decides that javapage.htm is the end-all, be-all of Java knowledge, and removes all other listings from their database - then everyone and their grandmother will be fed information from this one source. That will ultimately reduce the effectiveness of Google to return valid responses to people who do not use search like a robot.
There is a human element at play here that Google is attempting to cater to through sheer numbers. Not everyone knows how to use search properly; hell, most people have no idea. Keyword order, booleans, quotes - these will all affect the results given back, but very few people use them right off the bat. If you reduce the number of returned listings for a single word search to one area that was determined to be the authority, you have just made your search engine less effective in the eyes of the less skilled. I would be willing to bet that this less skilled group makes up most of Google's user base.
If you don't cater to these people, then you lose marketshare, and then you lose revenue from advertisers, and then you go out of business.
"Big business, whose motto has always been time is money"
That motto is really "anything for a buck". Even if business has to wait or waste time to get money, it will wait until the cows come home - then sell them.
--
make install -not war
You could already do this semantic web nonsense if people would just stick to a standard and be honest with what they publish.
Nobody wants to do that, however. Mobile phone companies always try to make their offering sound as attractive as possible by highlighting the good points and hiding the bad ones. Phone stores try to cut through this by making their own charts for comparing phone companies but in turn try to hide the fact that they get a bigger cut from some companies than others.
It wouldn't be at all hard to set up a standard that would make it very easy to tell what cell phone subscription is best for you. Getting the companies involved to participate is impossible however.
This is the real problem with searching the web right now. It wouldn't be at all hard to use Google today if everyone was honest with their site content. For instance, by removing the word "review" from a product page if no review is available.
Do you think this is going to happen any day soon? No? Then the semantic web will not be with us any day soon either.
MMO Quests are like orgasms:
You may solo them, I prefer them in a group.
The next great leap in searching the web won't be due to the semantic web. It'll be natural language processing. Soon the day will come when you will be able to type in a "real" question and truly get the best answers back. We all know keyword searching doesn't cut it. But a complete question can be translated into a logical query. It'll require no change to current web pages. Just a much smarter search engine.
Developers: We can use your help.
All the hoopla around the Semantic Web reminds me exactly of the days "XML" became the latest high-flying meme touted by "tech" writers en masse. Witness:
The semantic search engine would then cross-reference all of the information about hotels in Majorca, including checking whether the rooms are available, and then bring back the results which match your query.
And here in all its glory is the 1999 version:
The software would then use XML to cross-reference all of the information about hotels in Majorca, including checking whether the rooms are available, and then bring back the results which match your query.
Of course, the problem with this fantasy of XML was that the lack of schema standardization led to an infinite mix of tagging, and thus the layperson's idea that "this XML document can be read and understood by any software" was pure bunk.
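A tiny sketch of that schema problem, assuming two invented hotel feeds (the tag names, the hotel, and the helper function are all hypothetical). Both documents are well-formed XML and carry the same facts, but code written against one vocabulary gets nothing out of the other.

```python
import xml.etree.ElementTree as ET

# Two hypothetical feeds describing the same hotel with different, made-up schemas.
FEED_A = "<hotel><name>Sol Mar</name><city>Palma</city><roomsFree>4</roomsFree></hotel>"
FEED_B = "<lodging title='Sol Mar'><location>Palma</location><vacancy count='4'/></lodging>"


def rooms_free_schema_a(xml_text: str):
    """Read availability the way schema A publishes it; None if the tag is absent."""
    node = ET.fromstring(xml_text).find("roomsFree")
    return int(node.text) if node is not None else None


from_a = rooms_free_schema_a(FEED_A)  # schema A: the tag is found and read
from_b = rooms_free_schema_a(FEED_B)  # schema B: parses fine, but the meaning is lost
```

Parsing succeeds in both cases; it is the agreement on what the tags mean that is missing, and no parser supplies that.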
Granted, the semantic web addresses many of these problems, but IMHO the underlying problem remains: layers of context on top of content still need to be parsed and understood.
So the question remains: will the Semantic Web be implemented in a useful fashion before someone develops a Contextual Web Mining system that understands web content to a degree that it fulfills the promise of the Semantic Web without additional context?
Disclaimer: I work on contextual web content extraction software, so yes, I may be biased towards this solution, but I really think the Semantic Web has an insanely high hurdle (proper implementation in millions of web pages) before we can tell how successful it is.
"All of this data is public data already," said Mr Glaser. "The problem comes when it is processed."
The privacy and security concerns are bizarre. They're saying that there is currently an implicit "security through obscurity" and that's ok. However, if someone were to make the available data easier to find, then it would be less secure?
Here's a radical thought; don't make any data public you don't want someone to see. Blaming Google because you put your home address on your blog and "bad people" found you is absurd. If data is sensitive it shouldn't be there now.
You can't really bitch about peeping Toms if you built the glass house.
The huge, glaring issue with the Semantic Web idea that I see is: how do you force the creators of web content to put the right semantic tags on their content? What's to stop there being thousands of sites full of nothing but semantic tags so that even Swoogling for "747" brings up porn first? The clear answer is that the tags will have to be out of the control of the creators of the web content. That means somebody or someTHING else - namely, your Semantic Web search engine of choice - has to figure out your site's tags for you. And the ONLY way to accurately judge, classify and rank a web page is by its actual, real content. This is just another way of looking at the same problem. I'm waiting to be impressed.
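As a rough illustration of judging a page by its content instead of trusting its declared tags, here is a minimal sketch. The stopword list, the sample pages, and both function names are assumptions made up for this example; a real classifier would need far more than word counts.

```python
import re
from collections import Counter

# Tiny, hypothetical stopword list; real systems use much larger ones.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "for", "on", "with"}


def words_in(text: str) -> list[str]:
    """Lowercased alphanumeric tokens from the page text."""
    return re.findall(r"[a-z0-9]+", text.lower())


def content_tags(text: str, n: int = 3) -> list[str]:
    """Derive tags from the page itself: the n most frequent non-stopword tokens."""
    counts = Counter(w for w in words_in(text) if w not in STOPWORDS)
    return [word for word, _ in counts.most_common(n)]


def declared_tags_supported(declared: list[str], text: str) -> bool:
    """True only if every tag the author declared actually appears in the content."""
    present = set(words_in(text))
    return all(tag.lower() in present for tag in declared)


honest_text = "The 747 is a wide-body airliner. Boeing builds the 747."
spam_text = "Hot pics, click here, click now."

honest_ok = declared_tags_supported(["747", "Boeing"], honest_text)
spam_ok = declared_tags_supported(["747"], spam_text)
top_honest_tag = content_tags(honest_text)[0]
```

The spam page's "747" tag fails the check because nothing in its body backs it up, while the honest page yields "747" as its top derived tag without the author declaring anything. Either way the work is done on the real content, which is the point being made above.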
qntm.org