Slashdot Mirror


Publishers Seek Change in Search Result Content

explosivejared writes "The Washington Post is running a story on the fight between publishers and search engines over just what exactly is allowed to be shown by search results. From the article: 'The desire for greater control over how search engines index and display Web sites is driving an effort launched yesterday by leading news organizations and other publishers to revise a 13-year-old technology for restricting access. Currently, Google, Yahoo and other top search companies voluntarily respect a Web site's wishes as declared in a text file known as robots.txt, which a search engine's indexing software, called a crawler, knows to look for on a site ... [new] proposed extensions, known as Automated Content Access Protocol, partly grew out of those disputes. Leading the ACAP effort were groups representing publishers of newspapers, magazines, online databases, books and journals. The AP is one of dozens of organizations that have joined ACAP.'"

3 of 181 comments

  1. Here's the documentation by Wesley Felter · · Score: 5, Informative

    http://www.the-acap.org/project-documents.php

    At first glance it appears to be a set of extensions to robots.txt that allow newspapers to specify things like:
    This article will disappear from our site in N days, so it better disappear from search engines at the same time
    Don't frame this article
    Don't extract images or thumbnails from this article
    If you show a cached copy of this article, it better include the original ads
    etc.

  2. robots.txt a W3C issue by m94mni · · Score: 5, Informative

    Note that robots.txt, favicon.ico and /w3c/p3p have been raised as issues for the W3C Technical Architecture Group:

    http://www.w3.org/2001/tag/group/track/issues/36

    See Tim B-L's original mail here:

    http://lists.w3.org/Archives/Public/www-tag/2003Feb/0093

    One can only hope that any new efforts keep this issue in mind (hint: stop polluting *everyone's* namespace!).

  3. Re:The Text I Actually Submitted by pla · · Score: 4, Informative

    Hopefully someone can explain this to me, as the stuff in the article led me to believe the publishers are making a big mistake.

    Simple - They want to have their cake and eat it too.

    They already have the absolute power to block Google. Further than that, Google (and every major search engine out there) honors the robots file, so they don't even need to go so far as actually "blocking" Google, they can politely tell it to go away.

    However, doing that amounts to committing web-suicide for any online content producer, and the publishers know it. So they can't really do that. Thus, they bitch and whine about the unfairness of all the traffic (and corresponding ad revenue) Google brings them, for the sake of the very small number of "lost" hits resulting from people getting a sufficient answer directly from the search results page.

    Can you hear the violins?
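
    To make the robots.txt point in comment 3 concrete, here is a minimal sketch of how a well-behaved crawler might decide whether it may fetch a page. It uses Python's standard urllib.robotparser module. The robots.txt contents, the host news.example.com, the "ExampleBot" user agent, and the allowed() helper are all made up for illustration; none of this is ACAP, and it is not how any particular search engine actually implements its crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a publisher might serve. The rules and paths
# are illustrative assumptions, not taken from any real site.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: *
Disallow: /archive/
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("Googlebot", "ExampleBot"):
        for path in ("/news/today.html", "/archive/2007.html"):
            url = "http://news.example.com" + path
            print(agent, path, allowed(ROBOTS_TXT, agent, url))
```

    With these example rules, Googlebot is politely told to stay away from the whole site, while any other crawler is only kept out of /archive/. Nothing enforces this: compliance is voluntary on the crawler's side, which is the situation the article describes.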