Slashdot Mirror


Image Detecting Search Engines' Legal Fight Continues

Mr. steve points to this New York Times article about sites like ditto.com and the new google image-search engine, writing: "Search engines that corral images are raising Napsteresque copyright issues." Expect to see a lot more sites with prominent copying policies and "no-download" images, and trivial circumvention of both. If an image is part of your site's design, you wouldn't truly want to prevent downloads, would you? ;)

9 of 220 comments

  1. bullsh*t by teknopurge · · Score: 2, Informative

    Google clearly posts comments about the copyrights possibly associated with the images that it returns.

    http://techienews.utropicmedia.com help us beta!!

  2. Don't sign up for NYTimes: by cavemanf16 · · Score: 5, Informative

    Here's the story without the signup restriction: http://archive.nytimes.com/2001/09/06/technology/circuits/06IMAG.html

  3. Re:well... by jesser · · Score: 3, Informative

    Google cache does not contain your images. When you view the page from the Google cache, Google adds <BASE HREF="http://www.iceball.net/peter/"> at the top of the page to instruct your browser to treat all relative URLs in the page not as relative to Google's cache of the page, but to your page. So when your browser sees <img src="PSORGLOGO.jpg"> later in the document, it interprets that as <img src="http://www.iceball.net/peter/PSORGLOGO.jpg"> and loads the image from your server. If your site were down, and I went to Google's cache of your site, I would not be able to see the images.
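
    The resolution step described above is the ordinary relative-URL rule. A minimal Python sketch of it, using `urllib.parse.urljoin` from the standard library; the helper name `resolve_against_base` is made up for illustration and is not anything Google or a browser exposes:

```python
from urllib.parse import urljoin

def resolve_against_base(base_href: str, src: str) -> str:
    """Resolve a (possibly relative) src against a BASE HREF value."""
    return urljoin(base_href, src)

# Values taken from the comment above.
base = "http://www.iceball.net/peter/"
print(resolve_against_base(base, "PSORGLOGO.jpg"))
```

    Because the resolved URL points at the original server, the cached copy of the page has nothing to show for the image when that server is unreachable.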

    --
    The shareholder is always right.
  4. Re:Some sites are already doing this with cookies by drodver · · Score: 2, Informative

    Opera has a wonderful setting allowing sites to set all the cookies they like, and when you close the browser every single one goes into the trash. No problems viewing pages or placing orders and it makes tracking you a little bit harder.

  5. Next step.... by www.sorehands.com · · Score: 4, Informative
    What about the companies that build databases of images, websites, etc. from spidering the web?

    They sell access to these databases to their clients to search for illegal copies of their works, or to see any mention of them in an unfavorable light. Is this an infringement?

  6. Especially since robots.txt lets you disallow this by MemeRot · · Score: 4, Informative
    A little thing called robots.txt - look it up here or here if you don't know what it is.

    Allows really useful features like marking given directories, pages, or files off-limits to a specific robot or all robots in general. Boy... a technical solution to a technical problem? Who'd a thunk it?

    Quickie examples (this is SO simple folks):
    User-agent: *
    Disallow: /

    Boom! No more google telling that horrible world of pirates and thieves about your site. Not many visitors either though....

    So maybe you want to exclude just googlebot from your images and image directory with the following:

    User-agent: googlebot
    Disallow: /image

    If you want to do this for multiple directories, you add on more Disallow lines:

    User-agent: *
    Disallow: /image
    Disallow: /cgi-bin/

    Now if you put

    <meta name="robots" content="All,INDEX">
    <meta name="revisit-after" content="5 days">

    in your code to show up high on the search engines, you shouldn't be surprised or upset when you SHOW UP HIGH ON THE SEARCH ENGINES.

    Not all robots follow the robots.txt standard, and there's no way of forcing them to. But Google does, and that seems to be the big concern here.

    A real-life example, Slashdot's robots.txt file (at slashdot.org/robots.txt):

    # robots.txt for Slashdot.org
    User-agent: *
    Disallow: /index.pl
    Disallow: /article.pl
    Disallow: /comments.pl
    Disallow: /users.pl
    Disallow: /search.pl
    Disallow: /palm
    Disallow: index.pl
    Disallow: article.pl
    Disallow: comments.pl
    Disallow: users.pl
    Disallow: search.pl
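
    To see how a well-behaved robot would read rules like the quickie examples in this comment, here is a rough Python sketch using the standard library's `urllib.robotparser`. The rule text combines two of the examples above; the host `example.com`, the agent name `otherbot`, and the helper `is_allowed` are assumptions for illustration. It shows how Python's parser interprets the rules, not how Googlebot itself is implemented.

```python
from urllib.robotparser import RobotFileParser

# Rules assembled from the examples in the comment above.
RULES = """\
User-agent: googlebot
Disallow: /image

User-agent: *
Disallow: /cgi-bin/
"""

def is_allowed(rules: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)

print(is_allowed(RULES, "googlebot", "http://example.com/image/logo.jpg"))
print(is_allowed(RULES, "googlebot", "http://example.com/index.html"))
print(is_allowed(RULES, "otherbot", "http://example.com/cgi-bin/form"))
```

    In this parser, a robot that is named in its own group uses only that group, and every other robot falls back to the `*` group. None of this is enforced: the file only helps against robots that choose to honor it.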
  7. So say no to the robots :) by MemeRot · · Score: 3, Informative

    You can use a little thing called robots.txt - look it up here or here if you don't know what it is.

    Allows really useful features like marking given directories, pages, or files off-limits to a specific robot or all robots in general. Boy... a technical solution to a technical problem instead of a new round of lawsuits?

    Quickie examples (this is SO simple folks):
    User-agent: *
    Disallow: /

    Boom! No more google telling that horrible world of pirates and thieves about your site. Not many visitors either though....

    So maybe you want to exclude just googlebot from your images and image directory with the following:

    User-agent: googlebot
    Disallow: /image

    This will still allow your main pages to be indexed according to your meta keywords, but will disallow any 'napsterization'. Of course, since it requires people running sites to do work and understand technology, lots of people will probably decide lawsuits are easier.

    Robots.txt DOES require you to run your own domain. If you don't, try using meta tags in the head of the html code for a similar effect, but it is harder to implement (must be on each page rather than site wide) and less supported. Info here.

    If you spend that much time on the images... spend 5 minutes making a robots.txt file to indicate you don't want them taken by bots. But always consider anything you put on the net as published, if something's private don't put it on the net.
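
    For the meta tag route mentioned above, a robot has to read the tag out of every page it fetches. A small Python sketch of that reading step, using the standard library's `html.parser`; the class `RobotsMetaParser`, the function `robots_directives`, and the sample page are hypothetical, and `noindex` is the commonly documented opt-out value rather than something stated in this thread:

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collect the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        for part in content.split(","):
            part = part.strip().lower()
            if part:
                self.directives.add(part)

def robots_directives(html: str) -> set:
    """Return the lower-cased robots directives declared in an HTML page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives

# Hypothetical page that asks robots not to index it.
page = '<html><head><meta name="robots" content="NOINDEX, NOFOLLOW"></head></html>'
print("noindex" in robots_directives(page))
```

    This also shows why the tag is more work than robots.txt: it has to be present on each page, and it only has an effect on robots that bother to look for it.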

  8. Do you have any idea how robots.txt works? by MemeRot · · Score: 4, Informative

    User-agent: *
    Disallow: /image
    Put all image files in the /image directory.

    or I would recommend for him:
    User-agent: *
    Disallow: /
    - I don't think he has any 'right' to use the search sites to promote his site if he doesn't consent to them copying his data. Is HTML code protected by copyright? If it were, every search site would be illegal, and the internet would be destroyed as a usable resource. Because those consequences would be untenable, we should answer no.

    That's all. Meta tags, which you seem to be thinking of, are a pain in the ass, poorly supported, and only worth using if you don't control the domain and can't put up your own robots.txt file.

    If I put 10 pizzas on a picnic table with a note saying 'please don't eat my pizza' and leave it there for 3 days - it will be eaten. If I do this ignoring the safe that's right there that I could use to lock them in, then I'm an idiot.
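
    One detail worth knowing about the two rule sets at the top of this comment: Disallow values are matched as path prefixes. A quick Python sketch with the standard library's `urllib.robotparser` makes that visible; the helper `blocked_paths`, the agent name `anybot`, and the sample paths are assumptions for illustration:

```python
from urllib.robotparser import RobotFileParser

def blocked_paths(rules: str, agent: str, paths: list) -> list:
    """Return the subset of `paths` that `agent` may not fetch under `rules`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return [p for p in paths if not parser.can_fetch(agent, p)]

IMAGE_ONLY = "User-agent: *\nDisallow: /image\n"
EVERYTHING = "User-agent: *\nDisallow: /\n"
paths = ["/image/logo.jpg", "/images/logo.jpg", "/index.html"]

print(blocked_paths(IMAGE_ONLY, "anybot", paths))
print(blocked_paths(EVERYTHING, "anybot", paths))
```

    With `Disallow: /image`, anything whose path starts with `/image` is off-limits, which includes `/images/...` as well, so it pays to pick the directory name with that in mind.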

  9. Re:robots.txt by prizog · · Score: 3, Informative

    See the Ticketmaster case: copyright notices are not binding on spiders.

    Grep for "terms and conditions" in:
    http://www.gigalaw.com/library/ticketmaster-tickets-2000-03-27.html