Slashdot Mirror


Lawsuit Stops Headline Scraping

Stephen Larson alerts us to the out-of-court settlement of Gatehouse v NY Times, a lawsuit that attempted to stop the Boston Globe from linking to headlines and excerpting initial sentences from a competitor's Web site. At issue was the Globe's practice — barely distinguishable from those of Google News, Yahoo, and others — of linking to another news source's coverage of local news. The upshot is that the Boston Globe will stop the linking. No judicial precedent was set, because the case was settled before reaching a judge.

24 of 85 comments

  1. Lawsuit Stops Headline Scraping by tpheiska · · Score: 5, Funny

    Stephen Larson alerts us to the out-of-court settlement of Gatehouse v NY Times, a lawsuit that attempted to stop the Boston Globe from linking to headlines and excerpting initial sentences from a competitor's Web site. Read more here.

    --
    "wahts woring iwth my tyoping?"
    1. Re:Lawsuit Stops Headline Scraping by $RANDOMLUSER · · Score: 4, Funny

      Mr. Piquepaille?!? We thought you was dead!

      --
      No folly is more costly than the folly of intolerant idealism. - Winston Churchill
  2. Web fundamental by Anonymous Coward · · Score: 3, Insightful

    Since links are so fundamental to the web, wouldn't it be easier if they just GTF off the internet rather than bother with these lawsuits?

    1. Re:Web fundamental by h4rm0ny · · Score: 5, Insightful


      I don't know. Screen scrapers can be pretty fucking irritating. Particularly in the parallel case of support forums. It's a problem when you want to search for a problem with some code or a database and the first eight hits are all the same post on different "forums" (usually all ripped off Usenet). How do you know if the replies are the same on all threads? What if *you* want to reply? Which site do you use? And they obscure different answers just through drowning them out. Ideally, I want a Google or Yahoo search engine plugin which will let me exclude all the scrapers.

      --

      Aide-toi, le Ciel t'aidera - Jeanne D'Arc.
    2. Re:Web fundamental by Anonymous Coward · · Score: 5, Insightful

      I agree, but not everything that is annoying should be made illegal.

    3. Re:Web fundamental by Anonymous+Brave+Guy · · Score: 5, Insightful

      This isn't really about the links, though, is it? On a news site, the effort required to identify a story and get the key facts right is a large part of the value of the site. If someone else can come along and copy the headline and intro, they've got most of that same value for nothing. They are just parasites, damaging the people who are doing the real work, and not even adding any useful value for society more generally. This is why places with sensible copyright laws judge fair use by criteria other than just the size of the excerpt.

      --
      If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
    4. Re:Web fundamental by dattaway · · Score: 4, Informative

      I would give almost anything to have a blacklist of domains I could set while logged into google so that those never showed up in my searches ever again...

      Exactly what you are looking for, Google's customizable search engine:

      http://www.google.com/coop/cse/

    5. Re:Web fundamental by Ioldanach · · Score: 2, Informative

      This isn't really about the links, though, is it? On a news site, the effort required to identify a story and get the key facts right is a large part of the value of the site. If someone else can come along and copy the headline and intro, they've got most of that same value for nothing.

      I took a look at this when the first article came out. The plaintiff's site has an RSS feed. The defendant's site looks like it was aggregating the headlines and initial sentence or so of several locally relevant news sites' RSS feeds, displaying that, and linking the headline back to the originating site. Basically, exactly what you expect an rss aggregator to do.
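The aggregator behaviour described above (headline, first sentence, link back to the source) can be sketched roughly as follows. This is only an illustration under assumptions, not the code either site actually ran: the feed content, the `news.example.com` URLs, and the `first_sentence` and `aggregate` helper names are all hypothetical, and a real aggregator would fetch each feed over HTTP instead of using an inline string.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed; a real aggregator would download this.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example Local News</title>
    <item>
      <title>Town council approves budget</title>
      <link>http://news.example.com/council-budget</link>
      <description>The council voted 5-2 on Tuesday. Debate ran for three hours.</description>
    </item>
    <item>
      <title>School opens new library</title>
      <link>http://news.example.com/new-library</link>
      <description>Students toured the building Monday.</description>
    </item>
  </channel>
</rss>"""


def first_sentence(text):
    """Return the text up to and including the first sentence-ending mark."""
    stripped = text.strip()
    match = re.match(r"(.+?[.!?])(\s|$)", stripped, flags=re.DOTALL)
    return match.group(1) if match else stripped


def aggregate(rss_text):
    """Return (headline, excerpt, link) tuples for each item in the feed."""
    root = ET.fromstring(rss_text)
    entries = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        description = item.findtext("description", default="")
        # Keep only the opening sentence; the headline links back to the source.
        entries.append((title, first_sentence(description), link))
    return entries


if __name__ == "__main__":
    for headline, excerpt, link in aggregate(SAMPLE_RSS):
        print(f"{headline} <{link}>")
        print(f"    {excerpt}")
```

Each entry keeps the originating link, so a click on the headline goes to the source site and not to a copy of the article.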

    6. Re:Web fundamental by pauljlucas · · Score: 2, Interesting

      I just looked at Google's CSE. I don't see any way to blacklist domains. You can whitelist, but not blacklist.

      --
      If you reply, do so only to what I explicitly wrote. If I didn't write it, don't assume or infer it.
  3. Conflicting interests by anticlimate · · Score: 4, Insightful

    Deeplinking and "stealing" your stories may hurt you in the short term financially. But - let's face it - the real reason of operating a newspaper or site is to make your audience see the world through your goggles. The more your opinionated news are linked or copied in one, the more influence you have on other people's thoughts, decisions etc.
    Yes I'm that cynical (in the case of the news industry at least).

    1. Re:Conflicting interests by gmack · · Score: 4, Insightful

      Actually deep linking helps you in the short term.

      You end up with a site like boston.com sending their own customers to your site where their customers read your news articles and you get revenue because you get paid when they see your ads.

      Objections to deep linking come from the flawed idea that without deep linking the customers would have come to the main page and read the ads there before going on to the page in question. I find it much more likely that they would never have known about the article at all.

      Whoever filed that lawsuit needs to be fired.

    2. Re:Conflicting interests by clickclickdrone · · Score: 3, Insightful

      >the real reason of operating a newspaper or site is to make your audience see the world through your goggles
      Not quite - newspapers, magazines etc exist to sell advertising space. The editorial content is just there to support this aim. It was different prior to the early 70's when magazines/papers genuinely existed to provide information to the readers but now that's a byproduct.

      --
      I want a list of atrocities done in your name - Recoil
    3. Re:Conflicting interests by Ender_Stonebender · · Score: 4, Insightful

      They won a battle. It doesn't mean that they've won the war. Especially since the settlement was out-of-court, so the legality of their action hasn't truly been tested. (There are many reasons for settling out of court: you know you can't win; you know you can win but it won't be worth the price; you might win but the cost of the judgement against you plus legal fees will be higher than what the other party is willing to take in settlement; etc.)

      And as other people have pointed out in this thread, there's a good chance that deeplinking actually drives increased page views by sending people directly to content they are interested in rather than relying on them to find interesting content on their own via the site's main page.

      --
      Loose things are easy to lose. You're getting your hair cut. They're going there to see their aunt.
  4. This is ridiculous by biscuitlover · · Score: 5, Informative

    FTA, it sounds like Gatehouse see this as a copyright violation but, as several other posters have pointed out, the same thing goes on on news aggregator sites all the time. In fact most stories on Slashdot contain snippets from other sites. It's an unavoidable and very useful facet of the web.

    This is yet another example of 'old' media not really understanding online practices. Most sites benefit tremendously from others linking to them - look at what happens with Slashdot. That is, unless the 'benefit' is so great that their server turns to dust.

    1. Re:This is ridiculous by Anonymous+Brave+Guy · · Score: 4, Interesting

      FTA, it sounds like Gatehouse see this as a copyright violation but, as several other posters have pointed out, the same thing goes on on news aggregator sites all the time.

      Which doesn't make it any less of a copyright violation. "Him too" is not a defence in law.

      In fact most stories on Slashdot contain snippets from other sites.

      And sometimes Slashdot does go too far, but at least it's in a grey area, with original content and editorial control as well. Presenting factual information is one thing. Mechanically cloning another's work and using their exact words, while adding no value at all of your own, is another.

      It's an unavoidable and very useful facet of the web

      What is, the using links part, or the mechanical copying without adding value part?

      This is yet another example of 'old' media not really understanding online practices.

      It sounds to me like yet another example of 'new' media thinking that by being on the Internet they are somehow exempt from the law.

      Most sites benefit tremendously from others linking to them - look at what happens with Slashdot.

      In this context? I'd like to see some evidence of the benefits the people doing the original work derive in this sort of case, please.

      By the way, Slashdot is a particularly unfortunate example, since people not reading the original article is a running joke and "Slashdot Effect" is not a term used to describe an abundance of ad revenue giving your business a huge boost.

      --
      If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
    2. Re:This is ridiculous by analog_line · · Score: 2, Interesting

      the same thing goes on on news aggregator sites all the time

      Depends on what you mean by "the same thing". First off, Boston.com is not a news aggregator. They are a news generator. They make money selling ads because theoretically someone wants to see the content they generate (and up until now at least, the Boston Globe staff has produced quite a lot of important news that people want to read, the whole expose on presidential signing statements was broken by a Globe reporter). The main problem here is that the people that run Boston.com decided that they have the right to sell ads next to content that they do not own, nor have a contractual right to sell ads next to.

      Google News (since it's the only news aggregator I use) sells no ads next to any page under the news.google.com subdomain that I've been able to find. Yes I just looked. No ads on the news search pages, no nothing, not even when I turned javascript on for google.com. That does 2 things that favor Google in this regard. First off, they aren't directly profiting from the link, and this preserves a lot of goodwill on the part of the people they link to, so they're less likely to be viewed as a direct financial parasite and sued. Secondly, it's a lot more likely that a fair use claim is going to be upheld in the event that someone decides to sue (remember, no use is fair use unless a court rules that it is), if it even gets to that point, since Google has set up systems for you to exclude your site from index/display on anything Google if you happen to think they're the devil and a thief.

    3. Re:This is ridiculous by Scrameustache · · Score: 2, Interesting

      You sound like one more person who fails to understand the concept of fair use and that old laws are not written with new technological possibilities in mind.

      You might like to reflect on what you wrote there, until you understand the irony.

      New technologies do not negate fair use; they just add new uses. Some of which are fair, and some not.

      I maintain that linking with an extract is fair:
      The 1961 Report of the Register of Copyrights on the General Revision of the U.S. Copyright Law cites examples of activities that courts have regarded as fair use: "quotation of excerpts in a review or criticism for purposes of illustration or comment; quotation of short passages in a scholarly or technical work, for illustration or clarification of the author's observations; use in a parody of some of the content of the work parodied; summary of an address or article, with brief quotations, in a news report; reproduction by a library of a portion of a work to replace part of a damaged copy; reproduction by a teacher or student of a small part of a work to illustrate a lesson; reproduction of a work in legislative or judicial proceedings or reports; incidental and fortuitous reproduction, in a newsreel or broadcast, of a work located in the scene of an event being reported."

      --

      You can't take the sky from me...

  5. But links to your site are good ... by hattig · · Score: 3, Insightful

    Linking to other media sites is a common feature of many news sites. BBC News has links to other sites' reporting for stories. It's just a headline and link, nothing special.

    That link boosts the other site's search rankings, and every click-through is a reader that they didn't have before, and an ad-hit, and maybe a repeat visitor.

    Taking the headline and the entire article is a different issue altogether, but I don't think that is the situation in this case. It is like all the Belgian (?) newspapers that want to have zero online presence or searchability. It makes no sense! You either participate, or you fade away on the fringes. That's why there is a "web" in "world wide web". Why be a bit of gossamer drifting on the wind when you can be in the web and actually be useful?

  6. Re:General law about search and link services? by NoisySplatter · · Score: 4, Insightful

    No, the fact that they settled means that the court case was likely to cost more than the settlement. They agreed to stop the linking so they lost by default in the settlement.

    --
    In Soviet Russia meme tires of you!
  7. Comment removed by account_deleted · · Score: 2, Interesting

    Comment removed based on user account deletion

  8. Lawsuit stops what? by houghi · · Score: 2, Informative

    No judicial precedent was set, because the case was settled before reaching a judge.

    As it was settled outside the lawsuit, the lawsuit settled nothing. Also no precedent, so this is actually bad news.
    Now we still do not know what is and what is not legal. A complete lawsuit would have been better, be it for or against linking.

    --
    Don't fight for your country, if your country does not fight for you.
  9. Wrong by Jabbrwokk · · Score: 2, Insightful

    the real reason of operating a newspaper or site is to make your audience see the world through your goggles.

    No it isn't. The real reason is to make money. If your competitor is stealing your work and using it for their own financial gain, I think you have a right to be pissed off and sue.

    You give the media too much credit -- its motives are surprisingly shallow. It doesn't really care what you think, you are free to agree or disagree, as long as you are reading/watching/listening, and of course, paying attention to those wonderful advertisers who make the whole thing possible.

  10. Laches? Fair Use? by mdmkolbe · · Score: 2, Interesting

    Which doesn't make it any less of a copyright violation. "Him too" is not a defence in law.

    Actually, laches could be a defense. If the plaintiff did not sue other entities that engaged in this practice, and the defendant, on seeing that the plaintiff didn't sue, also engaged in that practice, but the plaintiff then suddenly decided to sue the defendant but not the other entities, then the defense could claim laches.

    (That is in theory; however, the facts of this case probably don't support laches because (1) google/yahoo/etc are not competing with the newspaper but the other newspaper is, so it is a slightly different act, and (2) laches requires the defendant to suffer some harm from the "trick" of not suing for a long time and then suing.)

    Regardless of the above they might still have a defense under fair use or at least be able to modify their reporting to make it fair use. Regarding the four tests for fair use, they will most certainly lose on the first test (commercial nature), but on the second (nature of work: news/facts), third (amount: one sentence) and fourth (commercial impact: more viewers on page thus more advertisement revenue) tests they could win.

  11. Why this makes sense. by Anonymous+Freak · · Score: 2, Informative

    Unlike Google News, the Boston Globe is, itself, a news-reporting organization. Mixing their own stories with those from competitors can lead to confusion. I didn't manage to see the offending page before they took down their linked stories; but I imagine it was done in such a way as to have the original source difficult to identify.

    A pure aggregator service, like Google News, is different because it is rather obvious that ALL it is doing is aggregating. There is no 'new reporting' being done by "Google".

    --
    Another non-functioning site was "uncertainty.microsoft.com."
    The purpose of that site was not known.