Publishers Seek Change in Search Result Content

Posted by ryuzaki0 on Sunday December 2, 2007 @08:29AM from the content-overprotection dept.

explosivejared writes "The Washington Post is running a story on the fight between publishers and search engines over just what exactly is allowed to be shown by search results. From the article: 'The desire for greater control over how search engines index and display Web sites is driving an effort launched yesterday by leading news organizations and other publishers to revise a 13-year-old technology for restricting access. Currently, Google, Yahoo and other top search companies voluntarily respect a Web site's wishes as declared in a text file known as robots.txt, which a search engine's indexing software, called a crawler, knows to look for on a site ... [new] proposed extensions, known as Automated Content Access Protocol, partly grew out of those disputes. Leading the ACAP effort were groups representing publishers of newspapers, magazines, online databases, books and journals. The AP is one of dozens of organizations that have joined ACAP.'"

The Text I Actually Submitted by explosivejared · 2007-12-02 08:37 · Score: 5, Interesting

When I submitted this I added that a lot of times the more I see in a search result, the more likely I am to hit that website. I know going in that the search engine isn't going to have the full story. It's a summary. That being said, I submitted this to point out the misstep I think publishers are taking. Search engines and aggregators drive their business, and usually they do it for free. I don't understand why anyone would think it would be a good idea to mess with that. Hopefully someone can explain this to me, as the stuff in the article led me to believe the publishers are making a big mistake.

--
I got a catholic block.
1. Re:The Text I Actually Submitted by Anonymous Coward · 2007-12-02 09:00 · Score: 3, Interesting
  
  It may be the case that they have ads on the page for which they get paid by the page view, and by allowing search engines to show a summary, you may be saved from going to the page, depriving them of revenue.
  
  However, I tend to agree with you, and when I don't see a relevant summary, I'm simply less likely to click through to the page, so this may well backfire on them. Either they're not understanding search users' usage patterns, or else they believe that this is so prevalent that nothing will have summaries, and searchers will be forced to click through to find anything.
2. Re:The Text I Actually Submitted by wizardforce · 2007-12-02 09:22 · Score: 2, Interesting
  
  Tom Curley, the AP's chief executive, said the news cooperative spends hundreds of millions of dollars annually covering the world, and that its employees often risk their lives doing so. Technologies such as ACAP, he said, are important to protect AP's original news reports from sites that distribute them without permission.
  
  Currently, Google, Yahoo and other top search companies voluntarily respect a Web site's wishes as declared in a text file known as robots.txt, which a search engine's indexing software, called a crawler, knows to look for on a site.
  I hope they realize that restricting search engine crawlers with robots.txt this way really doesn't do much other than decrease the number of people who visit their site in the first place. I wonder how that will alter their revenue streams. Let them go ahead with it and the whole thing will be self-correcting.
  
  --
  Sigs are too short to say anything truly profound so read the above post instead.
3. Re:The Text I Actually Submitted by ShieldW0lf · 2007-12-02 10:56 · Score: 2, Interesting
  
  As it is, I don't care if the publishers rot.
  
  I do. Every time I hear about something like this, the site goes on my CustomizeGoogle blacklist, never to be seen again. It was the slashdot policy of posting "registration required" links to the New York Times that got me started on this path, and honestly, I'm better informed for it. All these big "news" publishers deliver is sanitized, oversimplified, dumbed down, biased and superficial stories blended with propaganda and outright lies concocted by private interests who stand to gain by your being misinformed. They make you stupider for having been exposed to them. Anyone with integrity has already adapted or left long ago, and those that are left are personally responsible for the wreckage. I hope airplanes land on their heads.
  
  --
  -1 Uncomfortable Truth
4. Re:The Text I Actually Submitted by ShieldW0lf · 2007-12-02 11:34 · Score: 4, Interesting
  
  Wow. Don't you think you're overreacting, just a little?
  
  Your sig is particularly ironic here. If you want information to be free, you're welcome to offer to pay the salaries of all the journalists, reporters, cameramen, sound crews, and support folks who are out there all over the world collecting it. Go ahead and put your money where your mouth is.
  
  I am.
  
  I'll be launching a service in the new year to help actively creating artists make a profit off selling original works, leveraging the copyleft and mashup cultures to generate a fanbase and simultaneously devalue the global copyright pool.
  
  For the right types of creators, the strategy of increasing the amount of budgets available for custom work by annihilating the cost of existing bodies of work is a valid one, and I intend to make it very easy for those types of people to do so as a side effect of their making money off the things that you cannot copy.
  
  You'll excuse me if I wait till the new year to slashdot myself, but I assure you, I have sunk hundreds and hundreds of man hours and a lot of my own dough into putting my money where my mouth is, and when I'm ready, you will know all about it whether you like it or not, because it will be some noteworthy stuff.
  
  So no. I don't think I'm overreacting at all. I like to think when it all pans out in the end I'm going to play some small but important personal role in bringing the old things crashing down as a matter of fact. And have the people doing the real work be richer for it.
  
  --
  -1 Uncomfortable Truth
So they tell you what they don't want you to see? by Anonymous Coward · 2007-12-02 08:41 · Score: 5, Interesting

Hmm, I wonder how long before someone opens a search engine that indexes only what is "hidden" (yeah, really...) by the ACAP settings.

Just don't do it in the US or someone will tell the judge: "The defendant knowingly circumvented the DRM - which is called ACAP - of our online newspaper".

ACAP - Anonymous Coward Anonymously Posting
Historical footnote: where robots.txt came from by charlie · 2007-12-02 08:48 · Score: 5, Interesting

My one lasting legacy on the web ...

Back in 1993, when I was teaching myself Perl in my spare time (while working for a -- cough -- UNIX company called The Santa Cruz Operation -- no relation to the current Utah asshats of that name), I was practicing by working on a spider. Now, back then SCO's Watford engineering centre was connected to the internet by a humongous 64kbps leased line. And I was working with a variety of sources on robots, and it just so happened that because I was doing a deterministic depth-first traversal of the web (hey, back then you could subscribe to the NCSA "what's new on the web" bulletin and visit all the interesting new websites every day before your coffee cooled), I kept hitting on Martijn Koster's website. And Martijn's then employers (who were doing something esoteric and X.509 oriented, IIRC) only had a 14.4kbps leased line. (Yes, you read that right: a couple of years later we all had faster modems, but this was the stone age.)

Eventually Martijn figured out that I was the bozo who kept leeching all his bandwidth, and contacted me. Throttling and QoS stuff was all in the future back then, so he went for a simpler solution: "Look for a text file called /robots.txt. It has a list of stuff you are not to pull in. Obey it, or I yell at your sysadmins." And so, I guess, my first attempt at a spider was also the first spider to obey the embryonic robot exclusion protocol. Which Martijn subsequently generalized and which got turned into a standard.

So if you're wondering why robots.txt is rather simplistic and brain-dead, it's because it was written to keep this rather simplistic and brain-dead perl n00b from pillaging Martijn's bandwidth.
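
For anyone who has never looked inside one, here is a minimal sketch of what "obey the list" amounts to for a spider today, using Python's standard urllib.robotparser rather than the original Perl. The file contents, the example.com URLs, the "ExampleSpider" name, and the allowed_urls helper are all invented for illustration; nothing here is taken from the original site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; the paths are invented for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
"""


def allowed_urls(robots_txt, user_agent, urls):
    """Return only the URLs that robots_txt permits user_agent to fetch."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(user_agent, url)]


if __name__ == "__main__":
    candidates = [
        "http://example.com/index.html",
        "http://example.com/tmp/scratch.html",
        "http://example.com/cgi-bin/search",
    ]
    for url in allowed_urls(ROBOTS_TXT, "ExampleSpider", candidates):
        print("would fetch:", url)
```

A polite spider would run a check like this before every request and simply skip whatever is filtered out. Compliance is voluntary: nothing in the file itself stops a crawler that ignores it.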

Ah, the good old days when you could accidentally make someone invent a new protocol before breakfast ...
1. Re:Historical footnote: where robots.txt came from by mboverload · 2007-12-02 09:09 · Score: 3, Interesting
  
  That is one of the coolest stories I've heard in a long time.
  
  I'm fascinated at the beginnings of the web and the people who drove it.
  
  If you know any place where I can hear more of these please let me know. (reading your blog right now)
Re:Here's a tip... by hedwards · 2007-12-02 09:02 · Score: 2, Interesting

That was largely my thought. It makes very little sense as to why anybody would blind click on a link in this day and age. I personally depend upon the summaries to decide whether or not to click. If I don't get a summary I don't click.

It would make far more sense for these institutions to just take their sites completely off of the search engines via robots.txt and save up those slots in the search results for sites that want traffic. Or perhaps limit it to just the front page, but I think that one can still do that with a competently crafted robots.txt as well.
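
As a rough sketch of the "take the site off completely" option, the two-line file below tells every crawler to stay out, and Python's standard urllib.robotparser can confirm how a compliant crawler would read it. The example.com URLs, the "AnyCrawler" name, and the is_blocked helper are assumptions for illustration. The front-page-only variant is not attempted here, since it depends on how a given crawler treats Allow rules.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt that asks every crawler to stay away from the whole site.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"


def is_blocked(robots_txt, user_agent, url):
    """Return True if robots_txt tells user_agent not to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("http://example.com/", "http://example.com/news/story.html"):
        print(url, "blocked:", is_blocked(BLOCK_ALL, "AnyCrawler", url))
```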
It's not a bad thing. by 91degrees · 2007-12-02 10:12 · Score: 2, Interesting

Seems to be a lot of people slightly upset over this. But I think this is a good thing. They already have the ability to stop search engines from indexing at all. Now they have much more fine-grained control. They can also make their results more useful by setting expiry dates. Presumably they'll also be able to be more specific about what the summary says, and it might actually be more useful.

Now some sites will probably want to over control, but they'll lose out.
Re:So they tell you what they don't want you to se by morgan_greywolf · 2007-12-02 10:49 · Score: 3, Interesting

Specifically, this seems geared towards sites like Google News that aggregate stories and then publish snippets of them on their home page.

Personally, I don't really see the problem. You either want your site spidered or you don't. You don't get to control the presentation of the data that is spidered, only the search engines get to do that.

So the thing here is that Google takes its ordinary web spider, applies a little magic to it, and then displays the results as a news page. Big deal.

You either want your site spidered or you don't. You can't have your cake and eat it too.

--
My blog
Yeah, but... by Anonymous Coward · 2007-12-02 11:10 · Score: 1, Interesting

Those things are stupid. Were I Google, I'd put up something on my website that made them consent to my terms, or forgo indexing entirely. I can't blame them for wanting more control, but I don't think they should get it. I don't trust them at all.
Re:A prediction by allthingscode · 2007-12-02 11:21 · Score: 2, Interesting

If the news and book sites wanted to keep the search engines out, they would just set up their robots.txt files to block all access. Then they would never show up on Google. They don't want to do that because they know it would be death to them. Google doesn't supply any content, but it does supply a service: It's the first place people go to find out information. If they need more than a summary, they can click on links from the summary page to get details. People aren't going to go to ten websites to look for something if they can start at one place.

You are right: If the search engines disappeared, the big news services wouldn't care. Actually, they would probably enjoy it, because people would go to the New York Times, Washington Post, and other big-name sites rather than seeing these smaller sites with better reporting and commentary. But you contradict yourself as well. You say that if the search engines disappeared, the internet would just create more, but then you say that if the big news services stopped providing news, the search engines would die. No they wouldn't. The internet would create more, filling the need.

If the news sites want to control their content better, fine. But I guarantee you the next whine you will hear from them is how Google isn't directing traffic to their websites and it must all be retribution by Google for being made to limit what it displays, rather than people clicking on sites where they can read the summary.
Re:A prediction by zantolak · 2007-12-02 13:20 · Score: 2, Interesting

The organized synthesis and presentation of this content is, in itself, useful content. The number of people using news aggregators should have clued you in on this.