Wayback Archives as a Law Tool

Posted by Zonk on Friday July 29, 2005 @05:28AM from the time-travel-is-fun dept.

Carl Bialik from the WSJ writes "The Wayback Machine's internet archive and Google's cached pages are becoming indispensable tools for some lawyers, especially specialists in intellectual-property law. Dell has used copies of expired websites to get the domain name DellComputersSuck.com transferred to it, the Wall Street Journal reports. EchoStar used Wayback in a case against a Polish TV company. Playboy checks Wayback to look for infringers of its trademark bunny or other images. And Wayback was even used to discredit a witness and reach a mistrial in a Canada murder case."

18 of 198 comments

Text (Yes) Images (not always) by bigwavejas · 2005-07-29 05:29 · Score: 3, Informative

WBM works great for pulling up historical text content, but I've noticed it tends to be hit-and-miss with images. Try pulling up a website and chances are you'll see broken image links.

--
"Simplify, simplify, simplify!" Thoreau
1. Re:Text (Yes) Images (not always) by el_gordo101 · 2005-07-29 05:43 · Score: 3, Informative
  
  That's because they don't save the image files, as far as I can tell. The images actually point back to the site that was archived through some sort of redirect on the WBM site. If the image files no longer exist on the original site, they will not display on the WBM archived page.
  
  --
  TODO: Insert witty sig
2. Re:Text (Yes) Images (not always) by Adult+film+producer · 2005-07-29 05:49 · Score: 2, Informative
  
  That's because they don't save the image files, as far as I can tell. The images actually point back to the site that was archived through some sort of re-direct on the WBM site.
  
  I don't think that's the case; I don't think it even tries to grab the images from the server. At least with my webpage, I just clicked on the Nov 17th, 2001 archive. It had all the old images that I've long since deleted, and the server logs show no hits or 404s for those images.
  
  Maybe that's not always how it operates though, *shrug*
3. Re:Text (Yes) Images (not always) by el_gordo101 · 2005-07-29 06:02 · Score: 3, Informative
  
  Interesting. Our sites date back to 1999. The image files from the older versions of the sites do not show up, as these files were deleted long ago. The newer versions, which use image files that are still resident in our /images directory, work fine. I am not sure how they handle images.
  
  --
  TODO: Insert witty sig
One easy workaround... by DJ+Rubbie · 2005-07-29 05:35 · Score: 5, Informative

$ cat robots.txt
User-agent: ia_archiver
Disallow: /

My site is not archived there, problem solved.

(Of course, if another of these services pops up...)
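
A rule like the one above can be checked before relying on it. What follows is a minimal sketch using Python's standard urllib.robotparser module; the example.com URL and the SomeOtherBot agent name are placeholders, not anything from the thread.

```python
from urllib.robotparser import RobotFileParser

# The two rules from the post above, as they would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# example.com and SomeOtherBot are placeholders for illustration only.
PAGE = "http://example.com/index.html"
archiver_ok = is_allowed(ROBOTS_TXT, "ia_archiver", PAGE)
other_ok = is_allowed(ROBOTS_TXT, "SomeOtherBot", PAGE)
print("ia_archiver allowed:", archiver_ok)
print("other crawler allowed:", other_ok)
```

The rule names only ia_archiver, so any crawler using a different user-agent string is unaffected, which is the caveat the post ends on.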

--
Please direct all bug reports to /dev/null
So perhaps censoring the archive is wrong? Signed? by expro · 2005-07-29 05:36 · Score: 4, Informative

Destroying evidence? If I don't want to be caught and ask for the removal of older web pages that may contain incriminating evidence, such as illegal copies of things or illegal links, is this different from a request by any other copyright holder to have his pages removed, and can it be punished? What are the archive's retention policies, and have legal orders been served to prevent destruction of evidence?

Even better would be if the archives digitally signed their archives and kept the signatures even of those things they had been asked to remove, so that the validity of a copy made for legal purposes could be established (SCO, Scientologists, and other things come to mind) even if the original is later censored.
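
The idea of keeping a verifiable fingerprint of each capture can be sketched roughly as follows. This is only an illustration, not a description of how the Archive works: it computes a SHA-256 digest rather than a true digital signature (a real scheme would additionally sign the digest with a private key), and the function names, the URL, and the timestamp format are all assumptions.

```python
import hashlib
import hmac


def snapshot_digest(url: str, captured_at: str, body: bytes) -> str:
    """Return a hex SHA-256 digest covering the URL, capture time, and page body."""
    h = hashlib.sha256()
    for field in (url.encode("utf-8"), captured_at.encode("utf-8"), body):
        # Length-prefix each field so the field boundaries are unambiguous.
        h.update(len(field).to_bytes(8, "big"))
        h.update(field)
    return h.hexdigest()


def copy_matches(url: str, captured_at: str, body: bytes, recorded: str) -> bool:
    """Check a copy held by someone else against a previously recorded digest."""
    return hmac.compare_digest(snapshot_digest(url, captured_at, body), recorded)


# Hypothetical capture; the URL and timestamp are placeholders.
page = b"<html><body>old page</body></html>"
recorded = snapshot_digest("http://example.com/", "20011117000000", page)

print(copy_matches("http://example.com/", "20011117000000", page, recorded))
print(copy_matches("http://example.com/", "20011117000000", page + b"!", recorded))
```

Because only the digest has to be retained, the archive could delete a page on request and still confirm later whether a copy held by a third party is byte-for-byte what was captured.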
Re:Employers are using Google too by op12 · 2005-07-29 05:37 · Score: 4, Informative

It's like that CNN article (I think it was posted here) from a few days back:

Bloggers learn the price of telling too much
Re:I wonder... by Peyna · 2005-07-29 05:45 · Score: 2, Informative

Right, I know that the website committed a crime in the past, but they didn't know it was a crime at the time, so it's not the same as beating the crap out of someone.

Ignorance of the law is no excuse.

I can't believe I actually have to point that out to someone.

--
What?
robots.txt by Cyburbia · 2005-07-29 05:48 · Score: 3, Informative

Even if the Wayback Machine archived your site, adding an appropriate robots.txt file to your Web site's root directory will make _all_ previous archives inaccessible to the public. I discovered this after I accidentally blocked the Wayback Machine robot in an attempt to control malicious spiders. After I corrected robots.txt, all the old archives reappeared within a few weeks.

I used the Wayback Machine to grab thousands of messages from an old WWWBoard-based message board that I ran, for eventual conversion to vBulletin. Some years, the Wayback Machine crawled every month; in others it didn't visit at all. Probably 80% of the messages that were posted before 2000 are lost to the ether of cyberspace. Guess you can't expect it to archive everything.
Old Drivers by TheSeventh · 2005-07-29 05:57 · Score: 4, Informative

I was able to use the wbm last year to find some old device drivers for a no-name motherboard I had from '97. The company went out of business, and their remaining stockpiles were bought by some other Chinese company, but the wbm actually had old copies of the drivers, and even a bios update for the board. Now, I always check there when I am having a hard time finding stuff that I knew used to be around.

--
Just because you're paranoid, it doesn't mean that they're not out to get you.
Re:School by LnxAddct · 2005-07-29 06:08 · Score: 4, Informative

It's been a bit since I was in high school and had to get around filters, but the most surefire way is to run an SSH server at home on a port that your school's firewall lets through (most allow 22, but to be less suspicious choose something like 25 or 443) and then carry PuTTY around on a pen drive. Whenever you need unrestricted access, open PuTTY, connect to your server, and create a dynamic tunnel. On that pen drive you can also keep Firefox, set up to use a SOCKS proxy on port 1080 or whatever port you chose for the dynamic tunnel. There you go: unlimited, encrypted surfing that bypasses your school's filters and is tunneled through your house. This all assumes that your school's firewall blocks based on ports rather than protocol.
Regards,
Steve
If only there was a Firefox extension by Rurik · 2005-07-29 06:17 · Score: 3, Informative

Google Caching and Wayback lookups. You could easily look URLs up by right-clicking on them.

Oh, wait, there is one.
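
The lookup such an extension performs amounts to building a URL. Below is a minimal sketch, assuming the Wayback Machine's web.archive.org/web/<timestamp>/<url> address scheme; the function name and example.com are hypothetical, and the Google cache lookup is left out.

```python
from typing import Optional

# Assumed base address for Wayback Machine captures.
WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_url(url: str, timestamp: Optional[str] = None) -> str:
    """Build a Wayback Machine address for url.

    timestamp is a YYYYMMDDhhmmss string (a shorter prefix such as
    YYYYMMDD also works); None yields the "*" form that lists all captures.
    """
    stamp = timestamp if timestamp is not None else "*"
    return f"{WAYBACK_PREFIX}/{stamp}/{url}"


# example.com is a placeholder, not a site from the thread.
print(wayback_url("http://example.com/"))
print(wayback_url("http://example.com/", "20011117"))
```

A right-click handler would only need to pass the link's address to a function like this and open the result in a new tab.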

/shameful plug
disclaimer by TTK+Ciar · 2005-07-29 06:27 · Score: 3, Informative

I do not speak for The Archive. The above post should not be considered to reflect the official position of The Archive. It is purely my own personal opinion, and it was uttered under the influence of painkillers (I had my wisdom teeth yanked out of my jaw Wednesday, qv my Slashdot journal entry). Else I probably would have refrained -- talking about this at all while there's a court case pending was probably a really stupid idea, and I (usually) know better.

-- TTK
Re:What about copyrights? by Ngwenya · 2005-07-29 06:28 · Score: 5, Informative

OK, you may look at WBM as a library, but IS it?

Yes. It really is. It's a registered member of the American Library Association. Details on http://www.archive.org/about/about.php

It's an honest-to-God library, which also means that Section 108 of Title 17 of the U.S. Code (the copyright title) applies. Public libraries in the US (and here in the UK) have some pertinent exemptions to the copyright restrictions that bind us mere mortals.

--Ng
Re:Childish Grudges by shawb · 2005-07-29 06:36 · Score: 2, Informative

Well, the DellComputersSuck.com website actually DESERVED to be sued. Is this a free speech issue? Not really. They were selling another brand of computers, so trademark infringement would actually come into play here.

--
I'll never make that mistake again, reading the experts' opinions. - Feynman
Re:False info being planted by shawb · 2005-07-29 06:43 · Score: 2, Informative

That's why the archive's administrators signed an affidavit stating that the information was, to the best of their knowledge, not tampered with. And it would be up to the other lawyers to prove that you were falsifying the information, which could lead you into further trouble or at the very least remove any doubt of your guilt from the jury and show that you were indeed acting in bad faith.

--
I'll never make that mistake again, reading the experts' opinions. - Feynman
Re:But see, they signed a piece of paper by sholden · 2005-07-29 06:45 · Score: 2, Informative

They might. But if the other side can provide some evidence that it isn't a true archive, then said technician is in deep shit.
There ought to be a Law by Nom+du+Keyboard · 2005-07-29 08:54 · Score: 2, Informative

Requests from third parties to remove information are generally denied. The Wayback Machine makes exceptions in certain circumstances, for example if the Web pages contain personal information provided in confidence, such as medical data.
I bet the (so-called) Church of Scientology gets everything they want pulled.
In addition, Web-site operators can prevent material from remaining in the public domain by using a piece of computer code, known as a robots.txt file, which stops bots belonging to the Wayback Machine and regular search engines from copying pages.
This is pretty bogus because it only works if there is still a current web-site at the spidered address that is on-line and can deliver a robots.txt file saying DON'T! It has already been proven in another case that rapid-fire multiple requests to WBM will cause it to give up pages even when robots.txt says not to.
I see two ways to fix this problem of misuse of a valuable archive:
1: Federal law PROHIBITING the use of evidence from the Wayback Machine in court trials. This is a valuable historical archive that will be less valuable if people worry that it can be used against them in the future in unforeseen ways, and block contributing to it. How many sites already block the WBM TCP/IP address range?
2: WBM could simply announce that they refuse to cooperate in any future trials -- AND THEN DO EXACTLY THAT! Without them to attest to the accuracy of the retrieved data, many cases relying on that data would fall flat on their faces.
Think for a moment. The WBM was not created to make lawyers' lives easier, and their law firms richer!

--
"It's the height of ridiculousness to say for those 9 lines you get hundreds of millions."