Semantic Web Under Suspicion
Dr Occult writes "Much of the talk at the 2006 World Wide Web conference has been about the technologies behind the so-called semantic web. The idea is to make the web intelligent by storing data such that it can be analyzed better by our machines, instead of the user having to sort and analyze the data from search engines. From the article: 'Big business, whose motto has always been time is money, is looking forward to the day when multiple sources of financial information can be cross-referenced to show market patterns almost instantly.' However, concern is also growing about the misuses of this intelligent web as an affront to privacy and security."
What I really want to see is the search engine reduce the duplicated content to single entries (try Googling for a Java classname and you'll see how many Google-searched websites have the API on them), or order them by reoccurrance of the word or phrase giving the context more value than the popularity of the page.
There is a huge problem with this, and it goes back to the days of people jamming 1000 instances of their keywords at the bottom of their pages in the same fant color as the background. Also, your desire to rate the pages on context requires an ontology type algo, which is NOT easy. Google has been working on this for a little while now, but it is a big hill to climb. They are using popularity as a substitution for this. It is not the most effective, but it is a pretty decent second option.
There is another issue with the approach you suggest. If Google decides that javapage.htm is the end all be all of JAVA knowledge, and removes all other listings from their database - then everyone and their grandmother will be fed information from this one source. That will ultimately reduce the effectiveness of Google to return valid responses to people who do not use search like a robot.
There is a human element at play here that Google is attempting to cater to through sheer numbers. Not everyone knows how to use search properly, hell most people have no idea. Keyword order, booleans, quotes - these will all affect the results given back, but very few people use them right off the bat. If you reduce the number of returned listings for a single word search to one area that was detirmined to be the authority, you have just made your search engine less effective in the eyes of the less skilled. I would be willing to bet that this less skilled group composed most of Googles userbase.
If you don't cater to these people, then you lose marketshare, and then you lose revenue from advertisers, and then you go out of business.