Slashdot Mirror


Publishers Seek Change in Search Result Content

explosivejared writes "The Washington Post is running a story on the fight between publishers and search engines over just what exactly is allowed to be shown in search results. From the article: 'The desire for greater control over how search engines index and display Web sites is driving an effort launched yesterday by leading news organizations and other publishers to revise a 13-year-old technology for restricting access. Currently, Google, Yahoo and other top search companies voluntarily respect a Web site's wishes as declared in a text file known as robots.txt, which a search engine's indexing software, called a crawler, knows to look for on a site ... [new] proposed extensions, known as Automated Content Access Protocol, partly grew out of those disputes. Leading the ACAP effort were groups representing publishers of newspapers, magazines, online databases, books and journals. The AP is one of dozens of organizations that have joined ACAP.'"

7 of 181 comments

  1. Here's the documentation by Wesley Felter · · Score: 5, Informative

    http://www.the-acap.org/project-documents.php

    At first glance it appears to be a set of extensions to robots.txt that allow newspapers to specify things like:
    This article will disappear from our site in N days, so it better disappear from search engines at the same time
    Don't frame this article
    Don't extract images or thumbnails from this article
    If you show a cached copy of this article, it better include the original ads
    etc.
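A rough sketch of how a crawler might model the kinds of per-article rules listed above. This is not ACAP syntax: the `ArticlePolicy` class, its field names, and the `may_list` helper are all hypothetical, made up only to show what "disappear after N days" would mean in practice.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class ArticlePolicy:
    # Every field name here is hypothetical; none of this is ACAP syntax.
    expires_after_days: Optional[int] = None  # "disappears from our site in N days"
    allow_framing: bool = True                # "don't frame this article"
    allow_thumbnails: bool = True             # "don't extract images or thumbnails"
    cache_must_keep_ads: bool = False         # "cached copy must include the original ads"


def may_list(policy: ArticlePolicy, published: date, today: date) -> bool:
    """Return True while the article may still appear in search results."""
    if policy.expires_after_days is None:
        return True  # no expiry stated, so the listing never lapses
    return today < published + timedelta(days=policy.expires_after_days)


# Example: a publisher restricting one article in every way from the list.
policy = ArticlePolicy(
    expires_after_days=30,
    allow_framing=False,
    allow_thumbnails=False,
    cache_must_keep_ads=True,
)
published = date(2007, 11, 1)
still_listed = may_list(policy, published, date(2007, 11, 15))
expired = not may_list(policy, published, date(2007, 12, 1))
```

The point of the sketch is only that each rule is something the search engine has to choose to honor, exactly as with robots.txt today.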

  2. And the link to ACAP... by Bill Dimm · · Score: 3, Informative

    You would think an article about ACAP would provide a link to it.

  3. robots.txt a W3C issue by m94mni · · Score: 5, Informative

    Note that robots.txt, favicon.ico and /w3c/p3p have been raised as issues for the W3C Technical Architecture Group:

    http://www.w3.org/2001/tag/group/track/issues/36

    See Tim B-L's original mail here:

    http://lists.w3.org/Archives/Public/www-tag/2003Feb/0093

    One can only hope that any new efforts keep this issue in mind (hint: stop polluting *everyone's* namespace!).

  4. What a joke by WindBourne · · Score: 2, Informative

    If these publishers want to own the search engines, then they should build their own! These engines do them a favor. This is no different than the music publishers trying to control the bands and how they get paid.

    --
    I prefer the "u" in honour as it seems to be missing these days.

  5. Re:The Text I Actually Submitted by fm6 · · Score: 3, Informative

    Even without your comments, your submission is way too long. You quoted nearly one third of the article! Next time, take the time to summarize the article in a few sentences. Not only will that make room for your opinions, it will make for a more readable submission that's more likely to be accepted.

  6. Re:The Text I Actually Submitted by pla · · Score: 4, Informative

    Hopefully someone can explain this to me, as the stuff in the article led me to believe the publishers are making a big mistake.

    Simple - They want to have their cake and eat it too.

    They already have the absolute power to block Google. Beyond that, Google (and every major search engine out there) honors the robots file, so they don't even need to go so far as actually "blocking" Google; they can politely tell it to go away.
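For anyone who hasn't seen the "politely tell it to go away" part in action, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt contents, the `OtherBot` name, and the example.com URLs are made up for illustration; a real crawler would fetch the file from the site rather than parse a hard-coded string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut Googlebot out entirely, and keep
# everyone else out of /private/ only.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

story = "http://example.com/news/story.html"
private = "http://example.com/private/draft.html"

google_may_fetch = parser.can_fetch("Googlebot", story)        # blocked by "Disallow: /"
other_may_fetch = parser.can_fetch("OtherBot", story)          # falls through to the * rules
other_may_fetch_private = parser.can_fetch("OtherBot", private)
```

Nothing here enforces anything: the file only states a wish, and a well-behaved crawler checks it before fetching.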

    However, doing that amounts to committing web-suicide for any online content producer, and the publishers know it. So they can't really do that. Thus, they bitch and whine about the unfairness of all the traffic (and corresponding ad revenue) Google brings them, for the sake of the very small number of "lost" hits resulting from people getting a sufficient answer directly from the search results page.

    Can you hear the violins?

  7. Re:The Text I Actually Submitted by 1u3hr · · Score: 3, Informative

    But why is it opt out when every other media is opt in?

    1) "Every other media is opt in" -- not true. Fair use applies for most media, allowing summaries and brief quotes without permission, which is what this is about. E.g.: Watch your TV news and you'll often see video taken from other TV news shows, clearly often without explicit permission -- does any US station pay Al Jazeera when they use their video?

    2) The web has always been "opt-out". Thus if you change this assumption, the vast majority of web pages, with no expressed policy, would be excluded from search engines.
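To make point 2 concrete, here is a small sketch of the two defaults, again using Python's standard `urllib.robotparser`. The `allowed_opt_in` function is hypothetical; it only illustrates what flipping the assumption would do to a site with no stated policy, and is not how any real crawler behaves.

```python
from urllib.robotparser import RobotFileParser


def allowed_opt_out(robots_lines, agent, url):
    """Today's convention: anything not disallowed may be crawled."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(agent, url)


def allowed_opt_in(robots_lines, agent, url):
    """Hypothetical opt-in rule: no stated policy means no crawling."""
    if not any(line.strip() for line in robots_lines):
        return False
    return allowed_opt_out(robots_lines, agent, url)


no_policy = []  # a site that never wrote a robots.txt
url = "http://example.com/any/page.html"

under_opt_out = allowed_opt_out(no_policy, "AnyBot", url)
under_opt_in = allowed_opt_in(no_policy, "AnyBot", url)
```

Under the opt-out default the page is crawlable; under the hypothetical opt-in rule the same page drops out of every index.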