Slashdot Mirror


Publishers Seek Change in Search Result Content

explosivejared writes "The Washington Post is running a story on the fight between publishers and search engines over just what exactly is allowed to be shown by search results. From the article: 'The desire for greater control over how search engines index and display Web sites is driving an effort launched yesterday by leading news organizations and other publishers to revise a 13-year-old technology for restricting access. Currently, Google, Yahoo and other top search companies voluntarily respect a Web site's wishes as declared in a text file known as robots.txt, which a search engine's indexing software, called a crawler, knows to look for on a site ... [new] proposed extensions, known as Automated Content Access Protocol, partly grew out of those disputes. Leading the ACAP effort were groups representing publishers of newspapers, magazines, online databases, books and journals. The AP is one of dozens of organizations that have joined ACAP.'"

51 of 181 comments

  1. The Text I Actually Submitted by explosivejared · · Score: 5, Interesting

    When I submitted this I added that a lot of times the more I see in a search result, the more likely I am to hit that website. I know going in that the search engine isn't going to have the full story. It's a summary. That being said, I submitted this to point out the misstep I think publishers are taking. Search engines and aggregators drive their business, and usually they do it for free. I don't understand why anyone would think it would be a good idea to mess with that. Hopefully someone can explain this to me, as the stuff in the article led me to believe the publishers are making a big mistake.

    --
    I got a catholic block.
    1. Re:The Text I Actually Submitted by Anonymous Coward · · Score: 3, Interesting

      It may be the case that they have ads on the page for which they get paid by the page view, and by allowing search engines to show a summary, you may be saved from going to the page, depriving them of revenue.

      However, I tend to agree with you, and when I don't see a relevant summary, I'm simply less likely to click through to the page, so this may well backfire on them. Either they're not understanding search users' usage patterns, or else they believe that this is so prevalent that nothing will have summaries, and searchers will be forced to click through to find anything.

    2. Re:The Text I Actually Submitted by iminplaya · · Score: 3, Insightful

      Not only that, but if the big search engines start restricting search results, we might see many more "home grown" search engines fill the net with spiders that won't respect robots.txt, and start clogging the tubes which are already getting clogged with advertisement. As it is, I don't care if the publishers rot.

      --
      What?
    3. Re:The Text I Actually Submitted by wizardforce · · Score: 2, Interesting

      Tom Curley, the AP's chief executive, said the news cooperative spends hundreds of millions of dollars annually covering the world, and that its employees often risk their lives doing so. Technologies such as ACAP, he said, are important to protect AP's original news reports from sites that distribute them without permission.

      Currently, Google, Yahoo and other top search companies voluntarily respect a Web site's wishes as declared in a text file known as robots.txt, which a search engine's indexing software, called a crawler, knows to look for on a site.
      I hope they realize that restricting search engine crawlers with robots.txt this way really doesn't do much other than decrease the number of people who visit their site in the first place. I wonder how that will alter their revenue streams. Let them go ahead with it and the whole thing will be self-correcting.
      --
      Sigs are too short to say anything truly profound so read the above post instead.
    4. Re:The Text I Actually Submitted by fm6 · · Score: 3, Informative

      Even without your comments, your submission is way too long. You quoted nearly one third of the article! Next time, take the time to summarize the article in a few sentences. Not only will that make room for your opinions, it will make for a more readable submission that's more likely to be accepted.

    5. Re:The Text I Actually Submitted by pla · · Score: 4, Informative

      Hopefully someone can explain this to me, as the stuff in the article led me to believe the publishers are making a big mistake.

      Simple - They want to have their cake and eat it too.

      They already have the absolute power to block Google. Further than that, Google (and every major search engine out there) honors the robots file, so they don't even need to go so far as actually "blocking" Google, they can politely tell it to go away.

      However, doing that amounts to committing web-suicide for any online content producer, and the publishers know it. So they can't really do that. Thus, they bitch and whine about the unfairness of all the traffic (and corresponding ad revenue) Google brings them, for the sake of the very small number of "lost" hits resulting from people getting a sufficient answer directly from the search results page.

      Can you hear the violins?
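
      For anyone who has not seen how "politely telling it to go away" works in practice, here is a minimal sketch using Python's standard urllib.robotparser. The robots.txt contents, the news.example.com host and the SomeOtherBot name are made up for illustration; only the Googlebot user-agent token is real. It shows what a well-behaved crawler concludes from the file, not how any particular search engine is implemented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary publisher: one named crawler is
# asked to stay out entirely, every other crawler is left unrestricted.
ROBOTS_TXT = [
    "User-agent: Googlebot",
    "Disallow: /",
    "",
    "User-agent: *",
    "Disallow:",
]


def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if the rules above permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT)
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    story = "http://news.example.com/2007/11/story.html"  # made-up URL
    for agent in ("Googlebot", "SomeOtherBot"):
        print(agent, is_allowed(agent, story))
```

      Nothing here is enforced by the server. The crawler reads the file and decides for itself whether to comply, which is why the scheme only works with crawlers that choose to honor it.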

    6. Re:The Text I Actually Submitted by ShieldW0lf · · Score: 2, Interesting

      As it is, I don't care if the publishers rot.

      I do. Every time I hear about something like this, the site goes on my CustomizeGoogle blacklist, never to be seen again. It was the slashdot policy of posting "registration required" links to the New York Times that got me started on this path, and honestly, I'm better informed for it. All these big "news" publishers deliver is sanitized, oversimplified, dumbed down, biased and superficial stories blended with propaganda and outright lies concocted by private interests who stand to gain by your being misinformed. They make you stupider for having been exposed to them. Anyone with integrity has already adapted or left long ago, and those that are left are personally responsible for the wreckage. I hope airplanes land on their heads.

      --
      -1 Uncomfortable Truth
    7. Re:The Text I Actually Submitted by Anonymous Brave Guy · · Score: 2, Insightful

      Wow. Don't you think you're overreacting, just a little?

      Your sig is particularly ironic here. If you want information to be free, you're welcome to offer to pay the salaries of all the journalists, reporters, cameramen, sound crews, and support folks who are out there all over the world collecting it. Go ahead and put your money where your mouth is.

      --
      If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
    8. Re:The Text I Actually Submitted by LiquidCoooled · · Score: 2, Insightful

      That's the problem.

      They want you on their site, but they want the power to summarise and manage their search engine face to maximise foot traffic whilst not giving the whole story away.

      --
      liqbase :: faster than paper
    9. Re:The Text I Actually Submitted by ShieldW0lf · · Score: 4, Interesting

      Wow. Don't you think you're overreacting, just a little?

      Your sig is particularly ironic here. If you want information to be free, you're welcome to offer to pay the salaries of all the journalists, reporters, cameramen, sound crews, and support folks who are out there all over the world collecting it. Go ahead and put your money where your mouth is.


      I am.

      I'll be launching a service in the new year to help actively creating artists make a profit off selling original works, leveraging the copyleft and mashup cultures to generate a fanbase and simultaneously devalue the global copyright pool.

      For the right types of creators, the strategy of increasing the amount of budgets available for custom work by annihilating the cost of existing bodies of work is a valid one, and I intend to make it very easy for those types of people to do so as a side effect of their making money off the things that you cannot copy.

      You'll excuse me if I wait till the new year to slashdot myself, but I assure you, I have sunk hundreds and hundreds of man hours and a lot of my own dough into putting my money where my mouth is, and when I'm ready, you will know all about it whether you like it or not, because it will be some noteworthy stuff.

      So no. I don't think I'm overreacting at all. I like to think when it all pans out in the end I'm going to play some small but important personal role in bringing the old things crashing down as a matter of fact. And have the people doing the real work be richer for it.

      --
      -1 Uncomfortable Truth
    10. Re:The Text I Actually Submitted by 1u3hr · · Score: 3, Informative
      But why is it opt out when every other media is opt in?

      1) "Every other media is opt in" -- not true. Fair use applies for most media, allowing summaries and brief quotes without permission, which is what this is about. E.g.: Watch your TV news and you'll often see video taken from other TV news shows, clearly often without explicit permission -- does any US station pay Al Jazeera when they use their video?

      2) The web has always been "opt-out". Thus if you change this assumption, the vast majority of web pages, with no expressed policy, would be excluded from search engines.

    11. Re:The Text I Actually Submitted by iminplaya · · Score: 2, Insightful

      You got it wrong. I don't care about the livelihood of the people who try to restrict and monopolize the information I read everyday on the web. Big difference there.

      --
      What?
    12. Re:The Text I Actually Submitted by Tanktalus · · Score: 2, Funny

      As a slashdot user, I *only* look at the summaries. I don't click to read the actual article, but learn everything I need to know about a subject simply by the summary available on google.

      It works fine here, so why not on google?

  2. So they tell you what they don't want you to see? by Anonymous Coward · · Score: 5, Interesting

    Hmm, I wonder how long before someone opens a search engine that indexes only what is "hidden" (yeah, really...) by the ACAP settings.

    Just don't do it in the US or someone will tell the judge: "The defendant knowingly circumvented the DRM - which is called ACAP - of our online newspaper".

    ACAP - Anonymous Coward Anonymously Posting

  3. This says it all... by doas777 · · Score: 4, Insightful

    From TFA: "The free riding deprives AP of economic returns on its investments," he said.

    Same old rule applies: never trust anyone who uses business terms like ROI, for he cares not for you or society, but only for what he can remove from your wallet without getting arrested over it.

    1. Re:This says it all... by Seumas · · Score: 2, Insightful

      I don't see how difficult this is. If you do not want something on the internet, don't put it on the internet. It's not like Google is going out there and signing into a paying account and indexing paid-for content. In fact, how many times have you found something on google, clicked it -- only to find that all you can read is one paragraph before the site (NYT, etc) throws you a sales pitch to pay $5 or $20 if you want to read the rest?

      My opinion? Good riddance to the lot of them. Please take all the "yahoo answers" and "mahalo" results that show up in various searches with you.

      If news outlets and publishers had their way, google would have to pay them for the privilege to index their sites. And then every time their site's link appeared in a google results page, they'd charge google again. It's a pathetic attempt to try to wrangle some revenue out of a failing concept.

  4. My reaction... by Z80xxc! · · Score: 5, Insightful

    Personally, I think that it's useful for Google and other search engines to show what's truly relevant when you're searching for a page. The fact is, I'm more likely to click on a search result if I can see some of the actual content, and more specifically, the actual text or images that I was looking for. If they don't show me what I want to see, I won't see the rest of it. If it only shows some text that they decide I should see, then it makes it much harder to determine what I'm actually looking at. Even as it is now, when results come up that are ambiguous, I find myself less likely to click on them. I readily admit that robots.txt is getting old and isn't really enough any more, but I'm not sure if what they're proposing is the right answer. Additionally, if Google were to implement a new method of searching using ACAP, then what would happen to the sites using the old methods? Would they not be indexed? What if I want all my material to be shown and I don't feel like going through and choosing every little detail about what to allow and not to allow? It's an idea worth looking at, but it's not anywhere near a finished, usable idea.

  5. Terms & Condition by SaidinUnleashed · · Score: 5, Insightful

    I really wish that the AP and other similar entities would realize that no matter the legal backing of their terms and conditions of redistribution, very few people actually care, and people care less every day. At Burger King, they provide a copy of the newspaper. Does the AP get money for every reader? I think not. This is just as ridiculous as it would be if they tried to make Burger King pay for every person who reads the newspaper while in the restaurant.

    --
    Shiny. Let's be bad guys.
  6. Here's the documentation by Wesley Felter · · Score: 5, Informative

    http://www.the-acap.org/project-documents.php

    At first glance it appears to be a set of extensions to robots.txt that allow newspapers to specify things like:
    This article will disappear from our site in N days, so it better disappear from search engines at the same time
    Don't frame this article
    Don't extract images or thumbnails from this article
    If you show a cached copy of this article, it better include the original ads
    etc.
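
    One rough way to picture those per-article permissions is as a small record attached to each article, which an aggregator would consult before listing or caching it. The sketch below is only that: the class, field and function names are invented for illustration and are not ACAP syntax, which is defined in the documents linked above.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class ArticlePolicy:
    """Hypothetical per-article permissions (illustrative names, not ACAP syntax)."""
    expires_after_days: Optional[int] = None  # None: may stay listed indefinitely
    allow_framing: bool = True                # may the article be shown in a frame?
    allow_thumbnails: bool = True             # may images or thumbnails be extracted?
    cached_copy_needs_ads: bool = False       # must a cached copy keep the original ads?


def may_list(policy: ArticlePolicy, published: date, today: date) -> bool:
    """True while the article is still within its listing window."""
    if policy.expires_after_days is None:
        return True
    return today <= published + timedelta(days=policy.expires_after_days)


def may_show_cached(policy: ArticlePolicy, cache_includes_ads: bool) -> bool:
    """True if a cached copy satisfies the policy's advertising requirement."""
    return cache_includes_ads or not policy.cached_copy_needs_ads


if __name__ == "__main__":
    policy = ArticlePolicy(expires_after_days=7, allow_framing=False,
                           allow_thumbnails=False, cached_copy_needs_ads=True)
    published = date(2007, 11, 30)
    print(may_list(policy, published, date(2007, 12, 5)))
    print(may_list(policy, published, date(2007, 12, 10)))
    print(may_show_cached(policy, cache_includes_ads=False))
```

    As with robots.txt, a record like this only describes what the publisher wants. Whether a crawler honors it is up to the crawler.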

  7. Seriously by Anonymous Coward · · Score: 5, Insightful

    If you don't want anything to be indexed or archived, it needs to be behind a secure connection or NOT POSTED AT ALL.

  8. Here's a tip... by Digital Vomit · · Score: 5, Insightful

    Here's a tip:

    If you don't want something to become public knowledge -- accessible by anyone -- then don't put it on the internet.

    --
    Modern copyright is theft of culture from everyone and it retards the progress of the useful arts and sciences.
    1. Re:Here's a tip... by hedwards · · Score: 2, Interesting

      That was largely my thought. It makes very little sense as to why anybody would blind click on a link in this day and age. I personally depend upon the summaries to decide whether or not to click. If I don't get a summary I don't click.

      It would make far more sense for these institutions to just take their sites completely off of the search engines via robots.txt and save up those slots in the search results for sites that want traffic. Or perhaps limit it to just the front page, but I think that one can still do that with a competently crafted robots.txt as well.
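
      A front-page-only robots.txt might look something like the sketch below, checked here with Python's standard urllib.robotparser. It assumes the front page is served at /index.html on a made-up host, and it relies on the Allow directive, which is a widely supported extension rather than part of the original exclusion convention. Treat it as an illustration, not a recipe for any specific search engine.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: crawlers may fetch the front page and nothing else.
# Python's parser applies the first matching rule, so Allow comes first.
FRONT_PAGE_ONLY = [
    "User-agent: *",
    "Allow: /index.html",
    "Disallow: /",
]


def crawlable(url: str, user_agent: str = "*") -> bool:
    """Return True if the front-page-only rules permit fetching url."""
    parser = RobotFileParser()
    parser.parse(FRONT_PAGE_ONLY)
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/index.html", "/2007/11/story.html", "/"):
        print(path, crawlable("http://news.example.com" + path))
```

      Note that the bare "/" URL is still disallowed under plain prefix matching, so a real site would need to decide which address counts as its front page.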

  9. Historical footnote: where robots.txt came from by charlie · · Score: 5, Interesting

    My one lasting legacy on the web ...

    Back in 1993, when I was teaching myself Perl in my spare time (while working for a -- cough -- UNIX company called The Santa Cruz Operation -- no relation to the current Utah asshats of that name), I was practicing by working on a spider. Now, back then SCO's Watford engineering centre was connected to the internet by a humongous 64kbps leased line. And I was working with a variety of sources on robots, and it just so happened that because I was doing a deterministic depth-first traversal of the web (hey, back then you could subscribe to the NCSA "what's new on the web" bulletin and visit all the interesting new websites every day before your coffee cooled), I kept hitting on Martin Kjoster's website. And Martin's then employers (who were doing something esoteric and X.509 oriented, IIRC) only had a 14.4kbps leased line. (Yes, you read that right: a couple of years later we all had faster modems, but this was the stone age.)

    Eventually Martin figured out that I was the bozo who kept leeching all his bandwidth, and contacted me. Throttling and QoS stuff was all in the future back then, so he went for a simpler solution: "Look for a text file called /robots.txt. It has a list of stuff you are not to pull in. Obey it, or I yell at your sysadmins." And so, I guess, my first attempt at a spider was also the first spider to obey the embryonic robot exclusion protocol. Which Martin subsequently generalized and which got turned into a standard.

    So if you're wondering why robots.txt is rather simplistic and brain-dead, it's because it was written to keep this rather simplistic and brain-dead perl n00b from pillaging Martin's bandwidth.

    Ah, the good old days when you could accidentally make someone invent a new protocol before breakfast ...

    1. Re:Historical footnote: where robots.txt came from by mboverload · · Score: 3, Interesting

      That is one of the coolest stories I've heard in a long time.

      I'm fascinated at the beginnings of the web and the people who drove it.

      If you know any place where I can hear more of these please let me know. (reading your blog right now)

    2. Re:Historical footnote: where robots.txt came from by Cheapy · · Score: 4, Funny

      It figures that perl would be the root of one giant misunderstanding.

      --
      Would you kindly mod me +1 insightful?
  10. And the link to ACAP... by Bill Dimm · · Score: 3, Informative

    You would think an article about ACAP would provide a link to it.

    1. Re:And the link to ACAP... by Jah-Wren Ryel · · Score: 5, Funny

      You would think an article about ACAP would provide a link to it.

      Sorry, their new exclusionary rules prevent any linking to their content.
      --
      When information is power, privacy is freedom.
  11. What right do they have to limit crawlers? by Entropius · · Score: 5, Insightful

    As I understand it the main purpose of robots.txt is to prevent crawlers from consuming excessive amounts of network resources, not to "protect content". It's not a contract; it's not legally-binding; it's a request that automated web agents choose to follow if they want to be polite, or rather a description of how to be polite in the context of a certain site. (Nobody wants crawlers to be indexing dynamically-generated pages, for instance.) As an example, the physics preprint archive arXiv.org has a rather sternly-worded warning: "Follow our robots.txt file or you'll wander off into terabytes of dynamically-generated files, chewing up lots of our bandwidth, and we'll have to ban you to protect our bandwidth bill." That's what it's for, not "protecting content".

    Banning Google from visiting a page and then summarizing its result on a search page is pretty much equivalent to Slashdot banning me from saying "There's this article at goatpron.slashdot.org/whatever that has a description of goat bestiality that I think you might find interesting".

    As long as the summaries are sufficiently short so that they fall under the fair use exception (which Google search results surely do), Google can keep on doing what they're doing.

    1. Re:What right do they have to limit crawlers? by Jeffrey Baker · · Score: 4, Insightful

      You might find it odd, but there's a lot of lawyers out there (almost all of them, in my experience) who seriously claim that the Terms of Service linked at the bottom of every commercial website is a binding contract. They say it's binding even if you've never read it, and even if it changes and you haven't read the changes. It's binding even if it's not linked from anywhere obvious.

      Now, I realize that these people are idiots, and that probably their future involves a wall, their backs, and a revolution, but at present their counsel is widely respected among the holders of wealth and power. So when you say that robots.txt is "not a contract" you should talk to a lawyer about that. You'd be amazed at the things they say.

    2. Re:What right do they have to limit crawlers? by piojo · · Score: 4, Insightful

      You might find it odd, but there's a lot of lawyers out there (almost all of them, in my experience) who seriously claim that the Terms of Service linked at the bottom of every commercial website is a binding contract. They say it's binding even if you've never read it, and even if it changes and you haven't read the changes. It's binding even if it's not linked from anywhere obvious.

      That's true, but I'm interested in whether a computer program has to obey contracts. If I write a program and it breaks contracts, am I immediately responsible, or must someone tell me that the program is breaking contracts? If the program is viewed as a tool or an extension of myself, it's probably the former. But programs are frequently not extensions of myself. For instance, if I downloaded the program rather than wrote it, there would be no way I could know it was violating contracts.
      --
      A cat can't teach a dog to bark.
  12. Go for it, publishers! by xigxag · · Score: 4, Insightful

    I understand completely. I too would like to stop my nosy neighbors from peering at me out of their window when I leave my house in the morning. My plan is to implement "pay per stare" at some point in the future but they aren't gonna pay if they can get their jollies for free. I blame the "Sun" and "street lamps" and "glass" and other devices that interfere with my ability to effect sole distribution over the intellectual property that is my personal image. Well, at the very least, I should be able to sue torch/flashlight manufacturers into oblivion and then use my deserved winnings to tackle the big boys 150 gigameters away.

    --
    There are two kinds of people: 1) those who start arrays with one and 1) those who start them with zero.
  13. robots.txt a W3C issue by m94mni · · Score: 5, Informative

    Note that robots.txt, favicon.ico and /w3c/p3p have been raised as issues for the W3C Technical Architecture Group:

    http://www.w3.org/2001/tag/group/track/issues/36

    See Tim B-L's original mail here:

    http://lists.w3.org/Archives/Public/www-tag/2003Feb/0093

    One can only hope that any new efforts keep this issue in mind (hint: stop polluting *everyone's* namespace!).

  14. Hoisted by their own petard by hal9000(jr) · · Score: 4, Insightful

    I know my position is very un-slashdotish, but there is nothing wrong with content producers wanting to control how their content, in particular, the stuff they paid to generate, from being indexed. It's not that they don't want you to see the content, it's that they want to control how you see that content. They want it wrapped in their page, with ads, and not summarized on a search page. Egads, what if you read the summary and decided not to visit the site after all?

    Fine. But as we all know, we probably have a few sites that we bookmark and visit often. We probably get a lot of news from RSS. But a lot of people are directed to sites via search engines. So if a content producer, say a newspaper, doesn't want its content indexed, then fine. It will only result in a LOSS of traffic to their site.

    Look, content producers have to make money. They have people to pay, stuff to print, etc. They have expenses. It is truly sad that rather than trying to figure out how to make content relevant and useful, some content producers simply want to continue analog methods in a digital world.

    Gee, just a thought, but what about a way to display a summary and an ad chosen by the content producer along with the summary? Advertisers would spend lots for that kind of exposure.

    1. Re:Hoisted by their own petard by grcumb · · Score: 2, Insightful

      I know my position is very un-slashdotish, but there is nothing wrong with content producers wanting to control how their content, in particular, the stuff they paid to generate, from being indexed.

      I'm going to assume that you actually mean "...is being indexed."

      It's not that they don't want you to see the content, it's that they want to control how you see that content. They want it wrapped in their page, with ads, and not summarized on a search page. Egads, what if you read the summary and decided not to visit the site after all?

      Your tongue-in-cheek tone is noted. But at the end of the day, the Internet doesn't allow the kind of control old-school publishers want. Not only is that horse gone, there's no barn left to put it back in, if we ever did manage to find it.

      It's an unfortunate fact of life that these people need to have a smart, communicative geek (like, say Larry Lessig) sit down with them and explain that a fundamental aspect of digital information is that it can be replicated with virtually no effort and next to no cost. Additionally, the Internet is a point-to-point network. It is agnostic by design, and works only as long as we accept that we have more to gain by getting along together than by working alone, following our own arbitrary rules. (Make sure Ballmer's not in the room when you get to this part.)

      People like to talk a lot about the Tragedy of the Commons, but the one thing the Internet teaches us is that it's a fallacy where networks are concerned. The Internet is ubiquitous and effectively infinite - 'effectively' in the sense that there's always another copy of a given piece of information on the Internet.

      This means that control is a pipe dream. The best we can do is use moral suasion to request that people respect our wishes with regards to particular content. We ceased to control it the moment we put it on the Net. The fact that most people actually do play nice is one of the miracles of online society.

      --
      Crumb's Corollary: Never bring a knife to a bun fight.
    2. Re:Hoisted by their own petard by ScrewMaster · · Score: 3, Insightful

      The best we can do is use moral suasion [google.vu] to request that people respect our wishes with regards to particular content.

      Ha, yeah. I recently purchased a DVD where the opening scene showed a kid snatching a woman's purse and running off, with the voice of doom saying "You wouldn't steal a purse, would you?" in a ridiculous, nay, pathetic attempt at "moral suasion". I was then subjected to several more unskippable minutes of this asinine lecturing, various legal threats, plus a couple of movie previews and advertisements that I couldn't skip past either. What the hell? So by the time I reached the main feature, I was so irritated (seeing that I'd just paid sixteen bucks for the damn disc) that I pulled the disc from the player and fired up DVD-Shrink. Half an hour later I had a re-authored copy without all the crap, and that's what I watched.

      Idiots.

      --
      The higher the technology, the sharper that two-edged sword.
  15. Pointless by HalAtWork · · Score: 3, Insightful

    If you put it on the internet, and users are meant to access it, why should search engines differentiate any content based on probably arbitrary criteria? If pay sites restrict content and give out special logins for paying users, search engines cannot index it and the content is kept 'private'. If a site has content that is not restricted by a special login, then why shouldn't it be indexed? Restricting it would be a disadvantage to the end user, because they cannot find the content as easily (especially if the web site's search engine sucks), and it would be a disservice to the content provider, since their site would be less likely to show up in search results. What is the point? Is this the same thing as people disabling right-click on certain web sites to try and prevent you from 'stealing' content, the same content that is available in your cache, and that would be illegal to use if the content is copyrighted anyway? Is this the same thing as people embedding pictures in flash for the same reason?

    If all of this results in less usable, less indexable content and more annoyances, just to restrict the way content is accessed and viewed, then that's not the web anymore, that's not really in the spirit of the internet... why not just stick to print or something? And then have it in a special store where you can only buy it with some currency you made up, with an exchange rate you control? Oh, and have a special door for the store that can only be opened with a special device you have to order! Er, anyway... I hope you can understand my point.

  16. oblig... by doyoulikeworms · · Score: 3, Funny

    Bustin' ACAP in Google's ass.

  17. What a joke by WindBourne · · Score: 2, Informative

    If these publishers want to own the search engines, then they should build their own! These engines do them a favor. This is no different than the music publishers trying to control the bands and how they get paid.

    --
    I prefer the "u" in honour as it seems to be missing these days.
  18. Average people and news consumption by jhRisk · · Score: 3, Insightful

    I think the mistake we're making here is assuming that most folks consume their news like we do. Sorry to generalize, but I believe most of us seek to become informed and thoroughly review and critique what we read. However, most people are satisfied with tidbits and in fact want nothing more. For example, the macabre are satisfied with a headline like "Multiple Car Accident Kills 50" and a thumbnail of the pile-up... the nosy like "Brad Wears Ugly Glasses For the First Time" and a thumbnail... etc. Yes, those are terrible headlines and hyperbole to make my point. Imagine a search engine, unlike Google, which provides summaries of multiple sources offering these tidbits in a single page without the source's ads. Oh wait, http://www.ask.com/ and perhaps others, although I'm stating solely that they have such a type of offering and not that they do so violating any rules.

    I'm against most tactics that appear to be an organization seeking to squash an alternative or new and unknown element they think is encroaching on their bottom line and this move smells of it but feel it's a rare case of smoke without an actual fire. Just wanted to throw that out there while I seek more info on this tidbit.

    --
    That's just my POV... no more, no less.
  19. It's not a bad thing. by 91degrees · · Score: 2, Interesting

    Seems to be a lot of people slightly upset over this. But I think this is a good thing. They already have the ability to stop search engines from indexing at all. Now they have much more fine-grained control. They can also make their results more useful by setting expiry dates. Presumably they'll also be able to be more specific about what the summary says, and it might actually be more useful.

    Now some sites will probably want to over control, but they'll lose out.

  20. Just so I'm clear by Jay L · · Score: 4, Insightful

    A bunch of publishing organizations have gathered together and are attempting to create an Internet standard for restricting searchable content.

    They haven't involved Google, Yahoo, or Microsoft in the process. In fact, the only search company they mention in their FAQ is Exalead, who I didn't think I'd even heard of (though now I think I may have once downloaded their desktop trial product).

    This is going to be implemented how?

    In related news, I have issued a new policy for how I (and anyone who joins my club) am to be treated in airport security lines. I will be publishing this policy on my home page, and I am certain it will win widespread adoption among travelers.

    Q:Have you discussed this with security administrators?

    A:In addition to the many travelers who have co-signed the new policy, we have an agreement-in-principle from Madge, the security and commissary chief at the fourth-largest regional airport in greater Bozeman.

  21. Re:So they tell you what they don't want you to se by morgan_greywolf · · Score: 3, Interesting

    Specifically, this seems geared towards sites like Google News that aggregate stories and then publish snippets of them on their home page.

    Personally, I don't really see the problem. You either want your site spidered or you don't. You don't get to control the presentation of the data that is spidered, only the search engines get to do that.

    So the thing here is that Google takes its ordinary web spider, applies a little magic to it, and then displays the results as a news page. Big deal.

    You either want your site spidered or you don't. You can't have your cake and eat it too.

  22. A prediction by Anonymous+Brave+Guy · · Score: 4, Insightful

    That being said, I submitted this to point out the misstep I think publishers are taking. Search engines and aggregators drive their business, and usually they do it for free. I don't understand why anyone would think it would be a good idea to mess with that.

    This being Slashdot, I predict that huge numbers of people will now arrive in this thread and say that you're absolutely right, the search engines are providing a great service, and the publishers should just suck it up because they'd die without them.

    The thing is, they're completely wrong. It's actually the other way around, for the simple reason that news aggregators produce no useful content of their own.

    For you or me, as someone who wants to know what's happening today, we can do one of two obvious things using a web browser. We can visit a specific news site we already know about (or at least guess at a URL), or we can start with an aggregator like Google News. Either way, many people will only read the headlines and summary for most stories. Either way, someone had to go out and get the information to write that story. But in one case, the people who brought the knowledge to the public get the page hit, while in the other, the search engine gets the hit in exchange for ripping much of the value of the other sites' content and the people who actually provided the content get nada.

    It's common at this point for someone to pipe up with a fair use argument, but again, they are wrong, for the simple reason that while the headlines and summaries on news aggregators may only be small excerpts from the entire article, they represent a very significant chunk of the value. You can easily determine this by observing the proportion of users who look something up on an aggregator and never follow through to read any article in more detail; I don't know exactly what the answer is, but I'll wager it's a substantial proportion, perhaps even the majority.

    Another common argument is that the news sites would die without input from search engines, but again I can't believe this is really true. When I reach lunchtime at work, I do not visit Google to find the BBC News web site, I just type in news.bbc.co.uk. (Actually, I visit the bookmark, but the first time that's what I typed.) Google, or any other news aggregator, is wholly unnecessary to my finding the main news site. Even without that, I could easily have guessed that the BBC News web site could be reached at www.bbc.co.uk/news or news.bbc.co.uk, either of which would have got me there immediately. The site is advertised via the BBC's other media as well. A significant proportion of the links I e-mail to and receive from friends and family are direct links to stories on the site.

    Basically, if every search engine on the planet disappeared tomorrow, I rather doubt the big news services would care. As with everything else to do with search engines, they are just a middleman service, and one that is entirely expendable. If they weren't around, the Internet community would just develop an alternative or five, probably rather quickly, just as it always does.

    On the other hand, if the big news services stopped providing news tomorrow, aggregator services like Google News would be completely dead, because they provide absolutely no value in themselves. They simply scrounge content from one source and visitors from another, and insert themselves as a middle man to cream off some of the profits.

    The very fact that one service could survive quite happily without the other, while the other would die immediately without the first, tells us everything we need to know about the merits and public service benefits of each. That being the case, I find it hard to argue with the publishers' position that the news aggregators are basically ripping them off, and I don't really have much sympathy with the two most common counter-arguments people seem to be making in this Slashdot discussion.

    --
    If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
    1. Re:A prediction by allthingscode · · Score: 2, Interesting

      If the news and book sites wanted to keep the search engines out, they would just set up their robots.txt files to block all access. Then they would never show up on Google. They don't want to do that because they know it would be death to them. Google doesn't supply any content, but it does supply a service: It's the first place people go to find out information. If they need more than a summary, they can click on links from the summary page to get details. People aren't going to go to ten websites to look for something if they can start at one place.

      You are right: If the search engines disappeared, the big news services wouldn't care. Actually, they would probably enjoy it, because people would go to the New York Times, Washington Post, and other big-name sites rather than seeing these smaller sites with better reporting and commentary. But you contradict yourself as well. You say that if the search engines disappeared, the internet would just create more, but then you say that if the big news services stopped providing news, the search engines would die. No they wouldn't. The internet would create more, filling the need.

      If the news sites want to control their content better, fine. But I guarantee you the next whine you will hear from them is how Google isn't directing traffic to their websites and it must all be retribution by Google for being made to limit what it displays, rather than people clicking on sites where they can read the summary.

    2. Re:A prediction by Anonymous+Brave+Guy · · Score: 2, Insightful

      If the news and book sites wanted to keep the search engines out, they would just set up their robots.txt files to block all access. Then they would never show up on Google. They don't want to do that because they know it would be death to them.

      I'm not at all convinced that's really true. To borrow a related copyright-area theme, it's like the RIAA saying that they have to use DRM, because otherwise no-one will buy legal copies of their stuff. It's just an assumption, which they aren't yet willing to risk violating in case it goes wrong. That doesn't necessarily mean that if they had no choice but to work on a different basis, they'd lose out.

      But you contradict yourself as well. You say that if the search engines disappeared, the internet would just create more, but then you say that if the big news services stopped providing news, the search engines would die. No they wouldn't. The internet would create more, filling the need.

      Actually, I said the community would create alternatives. I have been rather sceptical about the real benefits brought by the search engines for a long time, principally because of their tendency to leech off the most valuable content from others, without ever giving any back themselves. I'm also not convinced they're particularly useful anyway these days; they are so frequently gamed by sites taking advantage of SEO that I find fewer and fewer useful sites on there and turn more and more to following links from other sites, recommendations from friends, and so on.

      For an idea of how far this can scale, consider the nature of linking in the world of blogs: popular articles get widely cited very fast, and the quality of links is generally equal to or better than that of the source site since the links are all hand-picked. A good blog can develop a huge readership in a matter of months, and the whole system is one big meritocracy right down to the level of individual articles.

      --
      If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
    3. Re:A prediction by Anonymous+Brave+Guy · · Score: 2, Insightful

      You're imposing blanket assumptions on a specialist niche, which is never a smart thing to do. TFA is talking about news sources, and so is everyone else in this discussion. What anyone else on the web does is pretty much irrelevant here.

      And in any case, you're wrong about the value. In the US, which has one of the most liberal fair use regimes in any jurisdiction today, whether the copy being made affects the value for the original is a major question when deciding whether the copy constitutes fair use. In most other places, the restrictions are far tighter anyway, which I suspect explains the settlements referred to in the article.

      --
      If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
    4. Re:A prediction by zantolak · · Score: 2, Interesting

      The organized synthesis and presentation of this content is, in itself, useful content. The number of people using news aggregators should have clued you in on this.

    5. Re:A prediction by Gallowglass · · Score: 2, Insightful

      You wrote: "I'm also not convinced they're particularly useful anyway these days..."

      And yet, every time I Google, I find what I'm looking for. To my mind, that's useful.

  23. Makes no sense by mattwarden · · Score: 2, Insightful

    If search engine caching of their content is hurting these publishers, then they would use currently-supported methods to keep crawlers out:

    User-agent: *
    Disallow: /

    Oh, but that's right, they do want to be indexed in search engines because it increases their revenue.

    So, what's the problem, again?
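
    For anyone who wants to see what that rule does from the crawler's side, here is a minimal sketch using Python's standard urllib.robotparser. The bot name "ExampleBot", the news.example.com URLs, and the /archive/ path are made up for illustration; a well-behaved crawler would run a check like this before fetching a page, though compliance is voluntary.

```python
from urllib.robotparser import RobotFileParser

# The blanket rule quoted above: every crawler, every path.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

# A hypothetical narrower rule: only the /archive/ section is off limits.
BLOCK_ARCHIVE = """\
User-agent: *
Disallow: /archive/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    story = "http://news.example.com/today/story.html"
    old_story = "http://news.example.com/archive/1999/story.html"
    print(is_allowed(BLOCK_ALL, "ExampleBot", story))
    print(is_allowed(BLOCK_ARCHIVE, "ExampleBot", story))
    print(is_allowed(BLOCK_ARCHIVE, "ExampleBot", old_story))
```

    Note that this only answers "may I fetch this URL or not". Plain robots.txt has no way to say how much of a fetched page may be shown in a summary, which is the gap the publishers say they want to fill.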

  24. huh? by wap911 · · Score: 2, Insightful

    What do they not understand about *DO NOT CRAWL*? Robots.TXT is just fine. If it ain't broke, don't try to fix it. So now I have to have a .robotaccess to go along with .htaccess?

  25. Re:So they tell you what they don't want you to se by gbjbaanb · · Score: 2, Funny

    I think it would have been better named Content Retrieval Access Protocol.