Publishers Seek Change in Search Result Content
explosivejared writes "The Washington Post is running a story on the fight between publishers and search engines over just what exactly is allowed to be shown by search results. From the article: 'The desire for greater control over how search engines index and display Web sites is driving an effort launched yesterday by leading news organizations and other publishers to revise a 13-year-old technology for restricting access. Currently, Google, Yahoo and other top search companies voluntarily respect a Web site's wishes as declared in a text file known as robots.txt, which a search engine's indexing software, called a crawler, knows to look for on a site ... [new] proposed extensions, known as Automated Content Access Protocol, partly grew out of those disputes. Leading the ACAP effort were groups representing publishers of newspapers, magazines, online databases, books and journals. The AP is one of dozens of organizations that have joined ACAP.'"
http://www.the-acap.org/project-documents.php
At first glance it appears to be a set of extensions to robots.txt that allow newspapers to specify things like:
This article will disappear from our site in N days, so it better disappear from search engines at the same time
Don't frame this article
Don't extract images or thumbnails from this article
If you show a cached copy of this article, it better include the original ads
etc.
You would think an article about ACAP would provide a link to it.
Note that robots.txt, favicon.ico and /w3c/p3p have been raised as issues for the W3C Technical Architecture Group:
http://www.w3.org/2001/tag/group/track/issues/36
See Tim B-L's original mail here:
http://lists.w3.org/Archives/Public/www-tag/2003Feb/0093
One can only hope that any new efforts keep this issue in mind (hint: stop polluting *everyone's* namespace!).
If these publishers want to own the search engines, then they should build their own! These engines do them a favor. This is no different than the music publishers trying to control the bands and how they get paid.
I prefer the "u" in honour as it seems to be missing these days.
Even without your comments, your submission is way too long. You quoted nearly one third of the article! Next time, take the time to summarize the article in a few sentences. Not only will that make room for your opinions, it will make for a more readable submission that's more likely to be accepted.
Hopefully someone can explain this to me, as the stuff in the article led me to believe the publishers are making a big mistake.
Simple - They want to have their cake and eat it too.
They already have the absolute power to block Google. Beyond that, Google (and every major search engine out there) honors the robots file, so they don't even need to go so far as actually "blocking" Google; they can politely tell it to go away.
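For anyone who hasn't looked at how the "politely tell it to go away" part works, here is a rough sketch using Python's standard urllib.robotparser. The robots.txt contents, the example.com URLs and the is_allowed helper are all made up for illustration; real crawlers run their own parsers and may differ in the details.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a publisher could serve. The paths are invented.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: *
Disallow: /archive/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a well-behaved crawler may fetch url under these rules."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    story = "http://example.com/news/story.html"
    old = "http://example.com/archive/old.html"
    # Googlebot is shut out of the whole site.
    print(is_allowed(ROBOTS_TXT, "Googlebot", story))
    # Other crawlers are only kept out of /archive/.
    print(is_allowed(ROBOTS_TXT, "OtherBot", story))
    print(is_allowed(ROBOTS_TXT, "OtherBot", old))
```

Note that this is purely voluntary: the file only states a wish, and it is up to the crawler to check it before fetching.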
However, doing that amounts to committing web-suicide for any online content producer, and the publishers know it. So they can't really do that. Thus, they bitch and whine about the unfairness of all the traffic (and corresponding ad revenue) Google brings them, for the sake of the very small number of "lost" hits resulting from people getting a sufficient answer directly from the search results page.
Can you hear the violins?
1) "Every other media is opt in" -- not true. Fair use applies for most media, allowing summaries and brief quotes without permission, which is what this is about. E.g.: Watch your TV news and you'll often see video taken from other TV news shows, clearly often without explicit permission -- does any US station pay Al Jazeera when they use their video?
2) The web has always been "opt-out". Thus if you change this assumption, the vast majority of web pages, with no expressed policy, would be excluded from search engines.
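To make point 2 concrete, here is a small sketch of the difference, again with Python's urllib.robotparser. The allowed_opt_in function is a hypothetical rule invented for this comment (treat "no policy" as "keep out"); it is not something any search engine or the ACAP proposal is known to implement.

```python
from urllib.robotparser import RobotFileParser


def allowed_opt_out(robots_txt: str, user_agent: str, url: str) -> bool:
    """Today's convention: anything not explicitly disallowed may be fetched."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def allowed_opt_in(robots_txt: str, user_agent: str, url: str) -> bool:
    """Hypothetical opt-in rule: a site with no expressed policy is off-limits."""
    if not robots_txt.strip():
        return False
    return allowed_opt_out(robots_txt, user_agent, url)


if __name__ == "__main__":
    page = "http://example.com/page.html"
    # A site that never wrote a robots.txt is represented by an empty string.
    print(allowed_opt_out("", "AnyBot", page))  # crawlable under opt-out
    print(allowed_opt_in("", "AnyBot", page))   # excluded under opt-in
```

Since most sites never publish any policy at all, flipping the default would drop them from the index.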