Publishers Seek Change in Search Result Content
explosivejared writes "The Washington Post is running a story on the fight between publishers and search engines over just what exactly is allowed to be shown by search results. From the article: 'The desire for greater control over how search engines index and display Web sites is driving an effort launched yesterday by leading news organizations and other publishers to revise a 13-year-old technology for restricting access. Currently, Google, Yahoo and other top search companies voluntarily respect a Web site's wishes as declared in a text file known as robots.txt, which a search engine's indexing software, called a crawler, knows to look for on a site ... [new] proposed extensions, known as Automated Content Access Protocol, partly grew out of those disputes. Leading the ACAP effort were groups representing publishers of newspapers, magazines, online databases, books and journals. The AP is one of dozens of organizations that have joined ACAP.'"
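For anyone who hasn't looked at how the existing mechanism works, here is a rough sketch of a crawler checking robots.txt before fetching a page, using Python's standard urllib.robotparser. The site name, the bot name and the rules themselves are made up for illustration; a real crawler would download the file from the site rather than use an inline string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary news site (not from any real publisher).
ROBOTS_TXT = """\
User-agent: *
Disallow: /archive/
Disallow: /subscribers/
"""


def make_parser(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text that has already been fetched."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed(parser: RobotFileParser, agent: str, url: str) -> bool:
    """Return True if the rules permit this user agent to fetch the URL."""
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    rules = make_parser(ROBOTS_TXT)
    # "ExampleBot" and news.example.com are placeholder names.
    for url in (
        "http://news.example.com/archive/2007/story.html",
        "http://news.example.com/today/story.html",
    ):
        print(url, allowed(rules, "ExampleBot", url))
```

Note that nothing enforces this: the check is a courtesy the crawler chooses to perform, which is the "voluntarily respect" part of the article.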
http://www.the-acap.org/project-documents.php
At first glance it appears to be a set of extensions to robots.txt that allow newspapers to specify things like:
This article will disappear from our site in N days, so it better disappear from search engines at the same time
Don't frame this article
Don't extract images or thumbnails from this article
If you show a cached copy of this article, it better include the original ads
etc.
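To make the list above a bit more concrete, here is a minimal sketch of how a search engine might model such per-article wishes internally. This is not ACAP syntax and not taken from the project documents; the class, field names and function are all hypothetical, and only the "disappear in N days" rule is actually evaluated.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class ArticlePolicy:
    """Hypothetical per-article publisher wishes (illustrative only, not ACAP)."""
    expires_after_days: Optional[int] = None  # None means no expiry requested
    allow_framing: bool = True
    allow_thumbnails: bool = True
    cached_copy_needs_ads: bool = False


def may_show_in_results(policy: ArticlePolicy, published: datetime, now: datetime) -> bool:
    """Return True while the article is still inside its requested lifetime."""
    if policy.expires_after_days is None:
        return True
    return now < published + timedelta(days=policy.expires_after_days)


if __name__ == "__main__":
    # Made-up example: publisher asks for removal after 7 days.
    policy = ArticlePolicy(expires_after_days=7, allow_framing=False)
    published = datetime(2007, 12, 1)
    print(may_show_in_results(policy, published, datetime(2007, 12, 5)))
    print(may_show_in_results(policy, published, datetime(2007, 12, 10)))
```

As with robots.txt, this only works if the search engine decides to run the check.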
Note that robots.txt, favicon.ico and /w3c/p3p have been raised as issues for the W3C Technical Architecture Group:
http://www.w3.org/2001/tag/group/track/issues/36
See Tim Berners-Lee's original mail here:
http://lists.w3.org/Archives/Public/www-tag/2003Feb/0093
One can only hope that any new efforts keep this issue in mind (hint: stop polluting *everyone's* namespace!).