Image Detecting Search Engines' Legal Fight Continues

Posted by timothy on Thursday September 6, 2001 @09:14AM from the mmmm-thumbnails dept.

Mr. steve points to this New York Times article about sites like ditto.com and the new google image-search engine, writing: "Search engines that corral images are raising Napsteresque copyright issues." Expect to see a lot more sites with prominent copying policies and "no-download" images, and trivial circumvention of both. If an image is part of your site's design, you wouldn't truly want to prevent downloads, would you? ;)

16 of 220 comments

Don't sign up for NYTimes: by cavemanf16 · 2001-09-06 09:17 · Score: 5, Informative

Here's the story without the signup restriction: http://archive.nytimes.com/2001/09/06/technology/circuits/06IMAG.html
well... by fjordboy · 2001-09-06 09:20 · Score: 4, Insightful

My site (Peterswift.org) is cached on Google, and they have my images and pretty much everything else on my site on their site. However, this doesn't bother me at all. They don't claim ownership of any of it; in fact, they blatantly say that they don't own any of it! I don't have a problem with them taking my page and putting it on their site. That just means more people access my page, and if my site ever happens to be down, I don't have to worry as much. In fact, I hope Google caches my site today, because I uploaded about 40 or 50 images to my pictures folder in the last week, and if they cache it, I don't have to worry about screwing up my HTML or anything: I can always pull my site from Google. It is just another backup. :)

--
The anti-salmon
1. Re:well... by jesser · 2001-09-06 09:55 · Score: 3, Informative
  
  Google cache does not contain your images. When you view the page from the google cache, Google adds <BASE HREF="http://www.iceball.net/peter/"> at the top of the page to instruct your browser to treat all relative URLs in the page not as relative to Google's cache of the page, but to your page. So when your browser sees <img src="PSORGLOGO.jpg"> later in the document, it interprets that as <img src="http://www.iceball.net/peter/PSORGLOGO.jpg"> and loads the image from your server. If your site was down, and I went to Google's cache of your site, I would not be able to see the images.
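
A minimal sketch of the URL resolution described above, using Python's standard `urljoin`. The base URL and image name come from the comment; `CACHE_URL` is a made-up stand-in, since the real address of the cached copy is not given here.

```python
from urllib.parse import urljoin

# Base URL taken from the <BASE HREF> tag quoted in the comment.
PAGE_BASE = "http://www.iceball.net/peter/"

# Hypothetical address of a cached copy (assumption, for illustration only).
CACHE_URL = "http://cache.example.com/search?q=cache:peter"


def resolve(src, base):
    """Resolve an <img src> value against a base URL, roughly as a browser does."""
    return urljoin(base, src)


if __name__ == "__main__":
    # With the BASE tag, the relative name points back at the original server.
    print(resolve("PSORGLOGO.jpg", PAGE_BASE))
    # Without it, the same name would be looked up on the cache host instead.
    print(resolve("PSORGLOGO.jpg", CACHE_URL))
```

Either way the cached HTML carries only the image's name, not its bytes, which is why the images disappear when the original server is down.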
  
  --
  The shareholder is always right.
What right do they have? by Mike+Schiraldi · 2001-09-06 09:22 · Score: 3, Interesting

Putting a picture on the web is like walking around in a public place.

If someone takes a picture of me out on the street, i have no right to keep them from publishing it. If i don't want people to take pictures of me doing something, i don't do it in a public place.

If you don't want Google picking up your pictures, and you don't want people saving your pictures to their hard drives, don't put the pictures on the web.

--

--
Mod up a post Rob doesn't like and you'll never mod again
Wouldn't that? by wbav · 2001-09-06 09:24 · Score: 4, Funny

Wouldn't that make ie/netscape/mozilla/opera/etc. the program you are downloading with?

I can just see it now:
Judge shuts down Microsoft for distributing software that allows you to violate copyrights by downloading images. Microsoft was shut down Monday for its popular browser Internet Exploder. A representative from the company said "We were shocked. I mean, we didn't really expect the software to work in the first place."

Of course we won't see such a headline, but still, turnabout is fair play.

--

=================
Unix is very user friendly, it's just picky about who its friends are.
m$ breaking the DMCA with image toolbar? by BrookHarty · 2001-09-06 09:27 · Score: 5, Funny

I just started using IE's image toolbar. Nice thing: I was on a site that tried to protect its images with JavaScript; I just clicked on an image, the toolbar popped up, and I clicked save picture...

Is m$ breaking the DMCA with their circumvention?
robots.txt by mj6798 · 2001-09-06 09:27 · Score: 5, Insightful

The guy's site (http://www.goldrush1849.com/) still does not have a robots.txt file. Either Kelly is incompetent, or he is doing this deliberately to trick other people into "using" his content so he can sue them later.
1. Re:robots.txt by prizog · 2001-09-06 11:48 · Score: 3, Informative
  
  See the Ticketmaster case: copyright notices are not binding on spiders.
  
  Grep for "terms and conditions" in:
  http://www.gigalaw.com/library/ticketmaster-tickets-2000-03-27.html
  
  --
  Become a FSF associate member before the low #s are used
There is a very simple answer by graveyhead · 2001-09-06 09:31 · Score: 4, Insightful

There is a very simple answer for the artist in the ditto.com case. Watermark all your production images. You can create yourself a Photoshop action to automate this very easily, and a GIMP script version wouldn't be all that tough either. Make them unusable unless they obtain a (non-web based) copy from you. I couldn't even finish reading the horrible article because they compared the pitiful ditto.com vs nobody case to Napster vs. RIAA twice before the article was half-finished.

--
std::disclaimer<std::legalese> sig=new std::disclaimer; sig->dump(); delete sig;
Free Advertising by B.B.Wolf · 2001-09-06 09:43 · Score: 3, Insightful

I spend much time and effort on my graphics. For me it is a form of art. When I see one of my backgrounds on a coworker's system, I am tickled pink. I would like to make money at this someday. The more my work spreads, the easier it will be to sell my services. As for downloading and distributing my backgrounds being art theft, get real. The only valuable item is the 16 to 30 layer 1600x1200 GIMP file and all the various auxiliary files I used to generate the 1024x768 .png. It is just like a photograph. It isn't the print, but the negative that is important, despite what some anal photo trade organizations are pushing for.
Next step.... by www.sorehands.com · 2001-09-06 10:01 · Score: 4, Informative

What about the companies that build databases of images, websites, etc. from spidering the web?

They sell access to these databases to their clients to search for illegal copies of their works, or to see any mention of them in an unfavorable light. Is this an infringement?

--
Fight Spammers!
1. Re:Next step.... by Angst+Badger · 2001-09-06 10:51 · Score: 3, Insightful
  
  Imagine a world in which there are no search engines impartially indexing the web. You'd end up with a few popular outlets that would either showcase their own subsidiaries or sell listings to their partners.
  
  Bing! You have reinvented TV, but with online ordering capabilities. Having failed to create interactive television, Big Business is systematically destroying those elements of the web that made it better than interactive TV.
  
  Your spoonfeeding, already in progress, will now resume.
  
  --
  Proud member of the Weirdo-American community.
Especially since robots.txt lets you disallow this by MemeRot · 2001-09-06 10:07 · Score: 4, Informative

A little thing called robots.txt - look it up here or here if you don't know what it is.

Allows really useful features like marking given directories, pages, or files off-limits to a specific robot or all robots in general. Boy... a technical solution to a technical problem? Who'd a thunk it?

Quickie examples (this is SO simple folks):
User-agent: *
Disallow: /

Boom! No more google telling that horrible world of pirates and thieves about your site. Not many visitors either though....

So maybe you want to exclude just googlebot from your images and image directory with the following:

User-agent: googlebot
Disallow: /image

If you want to do this for multiple directories, you add on more Disallow lines:

User-agent: *
Disallow: /image
Disallow: /cgi-bin/

Now if you put

<meta name="robots" content="All,INDEX">
<meta name="revisit-after" content="5 days">

in your code to show up high on the search engines, you shouldn't be surprised or upset when you SHOW UP HIGH ON THE SEARCH ENGINES.

Not all robots follow the robots.txt standard, and there's no way of forcing them to. But google does, and that seems to be the big concern here.
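
To see how a well-behaved robot would read the rules above, here is a rough sketch using Python's standard `urllib.robotparser`. The robots.txt bodies are the ones quoted in this comment; the `allowed` helper, the `example.com` URLs, and the `otherbot` agent name are made up for illustration. Real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The three rule sets quoted in the comment.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

BLOCK_GOOGLEBOT_IMAGES = """\
User-agent: googlebot
Disallow: /image
"""

BLOCK_SEVERAL = """\
User-agent: *
Disallow: /image
Disallow: /cgi-bin/
"""


def allowed(robots_txt, agent, url):
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    site = "http://example.com"  # placeholder host, not a real site from the thread
    print(allowed(BLOCK_ALL, "googlebot", site + "/index.html"))
    print(allowed(BLOCK_GOOGLEBOT_IMAGES, "googlebot", site + "/image/logo.jpg"))
    print(allowed(BLOCK_GOOGLEBOT_IMAGES, "otherbot", site + "/image/logo.jpg"))
    print(allowed(BLOCK_SEVERAL, "otherbot", site + "/about.html"))
```

In this parser, `Disallow: /image` is a prefix match, so it also covers paths such as `/images/`. The file only expresses a request, so a robot that never checks it is not stopped by any of this.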

A real life example, slashdot's robots.txt file (at slashdot.org/robots.txt):

# robots.txt for Slashdot.org
User-agent: *
Disallow: /index.pl
Disallow: /article.pl
Disallow: /comments.pl
Disallow: /users.pl
Disallow: /search.pl
Disallow: /palm
Disallow: index.pl
Disallow: article.pl
Disallow: comments.pl
Disallow: users.pl
Disallow: search.pl
So say no to the robots :) by MemeRot · 2001-09-06 10:18 · Score: 3, Informative

You can use a little thing called robots.txt - look it up here or here if you don't know what it is.

Allows really useful features like marking given directories, pages, or files off-limits to a specific robot or all robots in general. Boy... a technical solution to a technical problem instead of a new round of lawsuits?

Quickie examples (this is SO simple folks):
User-agent: *
Disallow: /

Boom! No more google telling that horrible world of pirates and thieves about your site. Not many visitors either though....

So maybe you want to exclude just googlebot from your images and image directory with the following:

User-agent: googlebot
Disallow: /image

This will still allow your main pages to be indexed according to your meta keywords, but will disallow any 'napsterization'. Of course, since it requires people running sites to do work and understand technology, lots of people will probably decide lawsuits are easier.

Robots.txt DOES require you to run your own domain. If you don't, try using meta tags in the head of the html code for a similar effect, but it is harder to implement (must be on each page rather than site wide) and less supported. Info here.
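
The comment does not spell out the tag, so as an assumption this sketch uses the common form `<meta name="robots" content="noindex,nofollow">`. It shows roughly how a robot might read that per-page directive with Python's standard `html.parser`; the class and function names are made up for illustration.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from any <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        # The content attribute is a comma-separated list, e.g. "noindex, nofollow".
        for directive in attr.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html):
    """Return the set of robots meta directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
    print(sorted(robots_directives(page)))
```

A robot has to download and parse each page before it can see the tag, which is part of why this is clumsier than a single site-wide robots.txt.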

If you spend that much time on the images... spend 5 minutes making a robots.txt file to indicate you don't want them taken by bots. But always consider anything you put on the net as published, if something's private don't put it on the net.
Do you have any idea how robots.txt works? by MemeRot · 2001-09-06 10:53 · Score: 4, Informative

User-agent: *
Disallow: /image
Put all image files in the /image directory.

or I would recommend for him:
User-agent: *
Disallow: /
- i don't think he has any 'right' to use the search sites to promote his site if he doesn't consent to them copying his data. Is html code protected by copyright? This would make all search sites illegal, and destroy the internet as a usable resource. So because the consequences would be untenable, we should answer no.

That's all. Meta tags, which you seem to be thinking of, are a pain in the ass, poorly supported, and only worth using if you don't control the domain and can't put up your own robots.txt file.

If I put 10 pizzas on a picnic table with a note saying 'please don't eat my pizza' and leave them there for 3 days, they will be eaten. If I do this while ignoring the safe right there that I could lock them in, then I'm an idiot.
Re:property by anticypher · 2001-09-06 13:20 · Score: 3, Interesting

The answer is YES! Maybe.

Images on my site are my property. In every jpeg image (and powerpoint, word and text file) I create, I place my copyright statement. I also have a robots.txt file to prevent copying by search engines. To google's credit, they obey the robots.txt file, but others are not so considerate.

Recently, I had the occasion to place a number of images and other copyrighted works on a website hosted on one of my machines. The copyrighted works were available for a period of about 20 minutes, long enough for my friend (who paid me in beer, including many pints tonight just before I typed this, apologies for typos and bad grammar) and his brother to retrieve the works. My friend used AOL Instant Messenger to tell his brother where to find the images, including the obscure URL.

After I saw the two of them had retrieved the images, I left the site up for some stupid reasons (end of work day, beers calling, phone calls from idiots). Apache was running on an obscure port (28962) on an IP address with no DNS/reverse DNS entry. About 14 hours after my friend had sent the URL to his brother via AIM, I saw an AOL spider crawl my site for those works.

It's pretty fucking obvious that AOL is sucking up every copyrighted work they can, presumably to have copies of everything of value that passes by AIM. Their EULA allows them unlimited copyright to anything that passes by their systems, even if it is hosted on a third party system that doesn't agree to their EULA.

The machines involved slowly crawled the site, about one hit per minute from 4 different IP addresses. Machines like:
spider-loh-ta012.proxy.aol.com, spider-loh-ta016.proxy.aol.com, cache-loh-aa01.proxy.aol.com, and
cache-loh-ab02.proxy.aol.com carefully worked the site, following every link, and grabbing every (huge) jpeg and ppt file. Stupid of me to not filter AOL from my website, but I've learned. From now on, only password protected protocols that can't be easily picked up in plaintext streams.

Since that incident, I've been able to work this demonstration into my security reports. A client can set up a totally fake URL on a random port, send a message by AIM, and within 24 hours, the site is spidered by AOL, regardless of the robots.txt file. Sending an FTP username and password will result in the site being accessed within 24 hours. AOL hasn't responded to any of my queries, so that makes the whole thing even more interesting from a security aspect, and makes me even more money.
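
One way such a demonstration could be scripted, as a sketch only: generate an unguessable path, publish it nowhere except the channel being tested, then look for requests to it in the web server's access log. This assumes logs in the Common Log Format with hostname lookups enabled. The function names, sample hostnames, and log lines below are invented for illustration and are not data from the incident described here.

```python
import re
import secrets

# Matches the start of a Common Log Format line:
#   host ident authuser [timestamp] "METHOD path protocol" ...
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<when>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+)[^"]*"'
)


def make_canary_path(nbytes=16):
    """Return a random, unguessable URL path to use as the bait."""
    return "/" + secrets.token_urlsafe(nbytes) + "/"


def canary_hits(log_lines, canary_path):
    """Return (host, timestamp, path) for every request under the canary path."""
    hits = []
    for line in log_lines:
        match = LOG_LINE.match(line)
        if match and match.group("path").startswith(canary_path):
            hits.append((match.group("host"), match.group("when"), match.group("path")))
    return hits


if __name__ == "__main__":
    # Invented sample log lines; a real run would read the server's access log.
    sample = [
        'friend.example.org - - [06/Sep/2001:09:00:00 +0000] "GET /index.html HTTP/1.0" 200 512',
        'crawler.example.net - - [06/Sep/2001:23:00:00 +0000] "GET /abc123/photo.jpg HTTP/1.0" 200 1024',
    ]
    for hit in canary_hits(sample, "/abc123/"):
        print(hit)
```

Any hit from a host other than the intended recipient suggests the URL was picked up somewhere along the way, though a log entry alone does not show where.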

So don't place any intellectual property on any internet connected machine, if you want to retain control of your copyright. Large corporations will take your works, and if they happen to have great value later on, you won't see any recompense. I actually feel bad for the RIAA/MPAA giants, because they can't defend themselves, even with the DMCA and new European laws. You may own the IP for a work, but the internet doesn't care. "Get over it".

the AC

--
Hemos is like...sci-fi fans;he thinks technology is cool, but he hasn't bothered to understand the science it's based on