Slashdot Mirror


Publishers Seek Change in Search Result Content

explosivejared writes "The Washington Post is running a story on the fight between publishers and search engines over just what exactly is allowed to be shown by search results. From the article: 'The desire for greater control over how search engines index and display Web sites is driving an effort launched yesterday by leading news organizations and other publishers to revise a 13-year-old technology for restricting access. Currently, Google, Yahoo and other top search companies voluntarily respect a Web site's wishes as declared in a text file known as robots.txt, which a search engine's indexing software, called a crawler, knows to look for on a site ... [new] proposed extensions, known as Automated Content Access Protocol, partly grew out of those disputes. Leading the ACAP effort were groups representing publishers of newspapers, magazines, online databases, books and journals. The AP is one of dozens of organizations that have joined ACAP."

7 of 181 comments (clear)

  1. The Text I Actually Submitted by explosivejared · · Score: 5, Interesting

    When I submitted this I added that a lot of times the more I see in a search result, the more likely I am to hit that website. I know going in that the search engine is going to have the full story. It's a summary. That being said, I submitted this to point out the misstep I think publishers are taking. Search engines and aggregators drive their business, and usually they do it for free. I don't understand why anyone would think it would be a good idea to mess with that. Hopefully someone can explain this to me, as the stuff in the article led me to believe the publishers are making a big mistake.

    --
    I got a catholic block.
    1. Re:The Text I Actually Submitted by Anonymous Coward · · Score: 3, Interesting

      It may be the case that they have ads on the page for which they get paid by the page view, and by allowing search engines to show a summary, you may be saved from going to the page, depriving them of revenue.

      However, I tend to agree with you, and when I don't see a relevant summary, I'm simply less likely to click through to the page, so this may well backfire on them. Either they're not understanding search users' usage patterns, or else they believe that this is so prevalent that nothing will have summaries, and searchers will be forced to click through to find anything.

    2. Re:The Text I Actually Submitted by ShieldW0lf · · Score: 4, Interesting

      Wow. Don't you think you're overreacting, just a little?

      Your sig is particularly ironic here. If you want information to be free, you're welcome to offer to pay the salaries of all the journalists, reporters, cameramen, sound crews, and support folks who are out there all over the world collecting it. Go ahead and put your money where your mouth is.


      I am.

      I'll be launching a service in the new year to help actively creating artists make a profit off selling original works, leveraging the copyleft and mashup cultures to generate a fanbase and simultaneously devalue the global copyright pool.

      For the right types of creators, the strategy of increasing the amount of budgets available for custom work by annihilating the cost of existing bodies of work is a valid one, and I intend to make it very easy for those types of people to do so as a side effect of their making money off the things that you cannot copy.

      You'll excuse me if I wait till the new year to slashdot myself, but I assure you, I have sunk hundreds and hundreds of man hours and a lot of my own dough into putting my money where my mouth is, and when I'm ready, you will know all about it whether you like it or not, because it will be some noteworthy stuff.

      So no. I don't think I'm overreacting at all. I like to think when it all pans out in the end I'm going to play some small but important personal role in bringing the old things crashing down as a matter of fact. And have the people doing the real work be richer for it.

      --
      -1 Uncomfortable Truth
  2. So they tell you what they don't want you to see? by Anonymous Coward · · Score: 5, Interesting

    Hmm, i wonder how long before someone opens a search engine that indexes only what is "hidden"(yeah, really...) by the ACAP settings.

    Just don't do it in the US or someone will tell the judge: "The defendant knowingly circumvented the DRM - which is called ACAP - of our online newspaper".

    ACAP - Anonymous Coward Anonymously Posting

  3. Historical footnote: where robots.txt came from by charlie · · Score: 5, Interesting

    My one lasting legacy on the web ...

    Back in 1993, when I was teaching myself Perl in my spare time (while working for a -- cough -- UNIX company called The Santa Cruz Operation -- no relation to the current Utah asshats of that name), I was practicing by working on a spider. Now, back then SCO's Watford engineering centre was connected to the internet by a humongous 64kbps leased line. And I was working with a variety of sources on robots, and it just so happened that because I was doing a deterministic depth-first traversal of the web (hey, back then you could subscribe to the NCSA "what's new on the web" bulletin and visit all the interesting new websites every day before your coffee cooled), I kept hitting on Martin Kjoster's website. And Martin's then employers (who were doing something esoteric and X.509 oriented, IIRC) only had a 14.4kbps leased line. (Yes, you read that right: a couple of years later we all had faster modems, but this was the stone age.)

    Eventually Martin figured out that I was the bozo who kept leeching all his bandwidth, and contacted me. Throttling and QoS stuff was all in the future back then, so he went for a simpler solution: "Look for a text file called /robots.txt. It has a list of stuff you are not to pull in. Obey it, or I yell at your sysadmins." And so, I guess, my first attempt at a spider was also the first spider to obey the embryonic robot exclusion protocol. Which Martin subsequently generalized and which got turned into a standard.

    So if you're wondering why robots.txt is rather simplistic and brain-dead, it's because it was written to keep this rather simplistic and brain-dead perl n00b from pillaging Martin's bandwidth.

    Ah, the good old days when you could accidentally make someone invent a new protocol before breakfast ...

    1. Re:Historical footnote: where robots.txt came from by mboverload · · Score: 3, Interesting

      That is one of the coolest stories I've heard in a long time.

      I'm fascinated at the beginnings of the web and the people who drove it.

      If you know any place where I can hear more of these please let me know. (reading your blog right now)

  4. Re:So they tell you what they don't want you to se by morgan_greywolf · · Score: 3, Interesting

    Specifically, this seems geared towards sites like Google News that aggregate stories and then publish snippets of them on their home page.

    Personally, I don't really see the problem. You either want your site spidered or you don't. You don't get to control the presentation of the data that is spidered, only the search engines get to do that.

    SO the thing is here is that Google takes its ordinary web spider, applies a little magic to it, and then displays the results as a news page. Big deal.

    You either want your site spidered or you don't. You can't have your cake and eat it too.