Publishers Seek Change in Search Result Content
explosivejared writes "The Washington Post is running a story on the fight between publishers and search engines over just what exactly is allowed to be shown by search results. From the article: 'The desire for greater control over how search engines index and display Web sites is driving an effort launched yesterday by leading news organizations and other publishers to revise a 13-year-old technology for restricting access. Currently, Google, Yahoo and other top search companies voluntarily respect a Web site's wishes as declared in a text file known as robots.txt, which a search engine's indexing software, called a crawler, knows to look for on a site ... [new] proposed extensions, known as Automated Content Access Protocol, partly grew out of those disputes. Leading the ACAP effort were groups representing publishers of newspapers, magazines, online databases, books and journals. The AP is one of dozens of organizations that have joined ACAP.'"
http://www.the-acap.org/project-documents.php
At first glance it appears to be a set of extensions to robots.txt that allow newspapers to specify things like:
This article will disappear from our site in N days, so it better disappear from search engines at the same time
Don't frame this article
Don't extract images or thumbnails from this article
If you show a cached copy of this article, it better include the original ads
etc.
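To make the list above concrete, here is a rough Python sketch of how a search engine might model those per-article wishes once it has parsed them. This is not ACAP syntax and not anything from the project documents: the class `UsagePolicy`, its field names, and the two helper functions are all hypothetical, made up purely for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class UsagePolicy:
    """Hypothetical per-article policy; field names are illustrative, not ACAP terms."""
    expires_after_days: Optional[int] = None  # None means the article never expires
    allow_framing: bool = True
    allow_thumbnails: bool = True
    cached_copy_needs_ads: bool = False


def may_list(policy: UsagePolicy, published: date, today: date) -> bool:
    """True while the article may still appear in search results."""
    if policy.expires_after_days is None:
        return True
    return today < published + timedelta(days=policy.expires_after_days)


def may_serve_cache(policy: UsagePolicy, published: date, today: date,
                    cache_includes_ads: bool) -> bool:
    """True if a cached copy may be shown under this policy."""
    if not may_list(policy, published, today):
        return False
    return cache_includes_ads or not policy.cached_copy_needs_ads


# Assumed example: an article that vanishes after 7 days and wants its ads kept.
policy = UsagePolicy(expires_after_days=7, allow_framing=False,
                     allow_thumbnails=False, cached_copy_needs_ads=True)
published = date(2007, 11, 30)
print(may_list(policy, published, date(2007, 12, 3)))
print(may_serve_cache(policy, published, date(2007, 12, 3), cache_includes_ads=False))
```

As with robots.txt, nothing here enforces anything; it only works if the crawler chooses to consult the policy before listing, framing, or caching a page.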
Note that robots.txt, favicon.ico and /w3c/p3p have been raised as issues for the W3C Technical Architecture Group:
http://www.w3.org/2001/tag/group/track/issues/36
See Tim B-L's original mail here:
http://lists.w3.org/Archives/Public/www-tag/2003Feb/0093
One can only hope that any new efforts keep this issue in mind (hint: stop polluting *everyone's* namespace!).
Hopefully someone can explain this to me, as the stuff in the article led me to believe the publishers are making a big mistake.
Simple - They want to have their cake and eat it too.
They already have the absolute power to block Google. Beyond that, Google (and every major search engine out there) honors the robots.txt file, so they don't even need to go so far as actually "blocking" Google; they can politely tell it to go away.
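For anyone who hasn't seen what "politely telling it to go away" looks like, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt contents and the `news.example.com` URLs are made up for illustration; a well-behaved crawler runs roughly this check before fetching a page.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary publisher: Googlebot is asked to
# stay out entirely, every other crawler only out of /archive/.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: *
Disallow: /archive/
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# The crawler asks before it fetches; compliance is entirely voluntary.
print(parser.can_fetch("Googlebot", "https://news.example.com/story.html"))
print(parser.can_fetch("OtherBot", "https://news.example.com/story.html"))
print(parser.can_fetch("OtherBot", "https://news.example.com/archive/old.html"))
```

Two lines in a text file are all it takes to opt out of Google completely.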
However, doing that amounts to committing web-suicide for any online content producer, and the publishers know it. So they can't really do that. Thus, they bitch and whine about the unfairness of all the traffic (and corresponding ad revenue) Google brings them, for the sake of the very small number of "lost" hits resulting from people getting a sufficient answer directly from the search results page.
Can you hear the violins?