Wayback Archives as a Law Tool
Carl Bialik from the WSJ writes "The Wayback Machine's internet archive and Google's cached pages are becoming indispensable tools for some lawyers, especially specialists in intellectual-property law. Dell has used copies of expired websites to get the domain name DellComputersSuck.com transferred to it, the Wall Street Journal reports. EchoStar used Wayback in a case against a Polish TV company. Playboy checks Wayback to look for infringers of its trademark bunny or other images. And Wayback was even used to discredit a witness and reach a mistrial in a Canada murder case."
WBM works great for pulling up historical text content, but I've noticed it tends to be hit-and-miss with images. Try pulling up a website and chances are you'll see broken image links.
"Simplify, simplify, simplify!" Thoreau
$ cat robots.txt
User-agent: ia_archiver
Disallow: /
My site is not archived there, problem solved.
(Of course, if another of these services pops up...)
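For anyone who wants to check what those two lines actually do, here is a minimal sketch using Python's standard `urllib.robotparser`. The helper name `crawler_allowed`, the `example.com` URL, and the `SomeOtherBot` agent are made up for illustration; only the `ia_archiver` rule comes from the post above.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from the post above, verbatim.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""


def crawler_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent fetch url (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "http://example.com/index.html"  # placeholder URL
    for agent in ("ia_archiver", "SomeOtherBot"):
        print(agent, crawler_allowed(ROBOTS_TXT, agent, url))
```

Note that the rule names one crawler only, which is exactly the caveat in the parenthetical: any other bot that comes along is still allowed unless it gets its own entry or a `User-agent: *` rule.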
Please direct all bug reports to
Destroying evidence? Suppose I don't want to get caught, so I ask for older web pages to be removed that may contain incriminating evidence, such as illegal copies of things or illegal links. Is this any different from a request by any other copyright holder to have his pages removed, and can it be punished? What are the archive's retention policies, and have legal orders been served to prevent destruction of evidence?
Even better would be for the archives to digitally sign their snapshots and keep the signatures even for material they have been asked to remove, so that the validity of a copy made for legal purposes could be established (SCO, Scientologists, and other things come to mind) even if the original is later censored.
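A rough sketch of how that could work, using only Python's standard library. This is not anything the Archive is known to do; the function names, the key, and the sample URL and timestamp are all hypothetical. It uses an HMAC as a stand-in for a real public-key signature (a real deployment would presumably want something like Ed25519 so that third parties can verify without holding the secret).

```python
import hashlib
import hmac


def sign_snapshot(key: bytes, url: str, timestamp: str, body: bytes) -> str:
    """Return a hex tag binding the URL, capture time, and page content together."""
    # Only the digest of the body goes into the signed message, so the archive
    # can keep the tag even after the page itself has been removed.
    digest = hashlib.sha256(body).hexdigest()
    message = f"{url}\n{timestamp}\n{digest}".encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify_snapshot(key: bytes, url: str, timestamp: str, body: bytes, tag: str) -> bool:
    """Check that a copy produced later matches what was originally signed."""
    expected = sign_snapshot(key, url, timestamp, body)
    return hmac.compare_digest(expected, tag)


if __name__ == "__main__":
    key = b"archive-secret-key"        # hypothetical key held by the archive
    page = b"<html>old page</html>"    # hypothetical captured content
    tag = sign_snapshot(key, "http://example.com/", "20050729000000", page)
    print(verify_snapshot(key, "http://example.com/", "20050729000000", page, tag))
    print(verify_snapshot(key, "http://example.com/", "20050729000000", page + b"!", tag))
```

Someone holding a private copy of a removed page could then have it checked against the retained tag, which is the "even if later censored" part of the idea.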
It's like that CNN article (I think it was posted here) from a few days back:
Bloggers learn the price of telling too much
Right, I know that the website committed a crime in the past, but they didn't know it was a crime at the time, so it's not the same as beating the crap out of someone.
Ignorance of the law is no excuse.
I can't believe I actually have to point that out to someone.
What?
I used the Wayback machine to grab thousands of messages from an old WWWBoard-based message board that I ran, for eventual conversion to vBulletin. Some years, the Wayback Machine crawled every month; others it didn't even visit. Probably 80% of the messages that were posted before 2000 are lost to the ether of cyberspace. Guess you can't expect it to archive everything.
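For the curious, a sketch of how a month-by-month sweep like that might be set up in Python. The URL patterns below (`web.archive.org/web/<timestamp>/<url>` and the `archive.org/wayback/available` lookup) are my assumption about the Archive's present-day public layout, not something from the post, and the board address is a placeholder. The code only builds URLs; it does not fetch anything.

```python
from urllib.parse import urlencode


def snapshot_url(original_url: str, timestamp: str) -> str:
    """Build a Wayback-style URL for the capture closest to timestamp (assumed layout)."""
    return f"https://web.archive.org/web/{timestamp}/{original_url}"


def availability_query(original_url: str, timestamp: str) -> str:
    """Build a lookup URL asking which capture is closest to timestamp (assumed endpoint)."""
    query = urlencode({"url": original_url, "timestamp": timestamp})
    return "https://archive.org/wayback/available?" + query


def monthly_timestamps(year: int) -> list[str]:
    """One YYYYMMDD timestamp per month, to probe for months with no crawl."""
    return [f"{year}{month:02d}01" for month in range(1, 13)]


if __name__ == "__main__":
    board = "example.com/wwwboard/"  # placeholder for the old message board
    for ts in monthly_timestamps(1999):
        print(availability_query(board, ts))
        print(snapshot_url("http://" + board, ts))
```

Comparing the capture date that comes back against the date requested is one way to spot the months the crawler skipped.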
I was able to use the WBM last year to find some old device drivers for a no-name motherboard I had from '97. The company went out of business, and their remaining stockpiles were bought by some other Chinese company, but the WBM actually had old copies of the drivers, and even a BIOS update for the board. Now I always check there when I am having a hard time finding stuff that I know used to be around.
Just because you're paranoid, it doesn't mean that they're not out to get you.
It's been a bit since I was in high school and had to get around filters, but the most surefire way is to run an SSH server at home on a port that your school's firewall lets through (most allow 22, but to be less suspicious choose something like 25 or 443) and then carry PuTTY around on a pen drive. Whenever you need unrestricted access, open PuTTY, connect to your server, and create a dynamic tunnel. On that pen drive you can also keep Firefox, configured to use a SOCKS proxy on port 1080 or whatever port you chose for the dynamic tunnel. There you go: unlimited, encrypted surfing that bypasses your school's filters and is tunneled through your house. This all assumes that your school's firewall blocks based on ports rather than on protocol.
Regards,
Steve
Google Caching and Wayback lookups. You could easily look URLs up by right-clicking on them.
/shameful plug
Oh, wait, there is one.
I do not speak for The Archive. The above post should not be considered to reflect the official position of The Archive. It is purely my own personal opinion, and it was uttered under the influence of painkillers (I had my wisdom teeth yanked out of my jaw Wednesday, qv my Slashdot journal entry). Else I probably would have refrained -- talking about this at all while there's a court case pending was probably a really stupid idea, and I (usually) know better.
-- TTK
OK, you may look at WBM as a library, but IS it?
Yes. It really is. It's a registered member of the American Library Association. Details on http://www.archive.org/about/about.php
It's an honest-to-God library, which also means that Section 108 of the US copyright statute (17 U.S.C. § 108) applies. Public libraries in the US (and here in the UK) have some pertinent exemptions from the copyright restrictions that bind us mere mortals.
--Ng
Well, the DellComputersSuck.com website actually DESERVED to be sued. Is this a free speech issue? Not really. They were selling another brand of computers, so trademark infringement would actually come into play here.
I'll never make that mistake again, reading the experts' opinions. - Feynman
That's why the archive's administrators signed an affidavit stating that the information was, to the best of their knowledge, not tampered with. And it would be up to the other lawyers to prove that you were falsifying the information, which could lead you into further trouble or at the very least remove any doubt of your guilt from the jury and show that you were indeed acting in bad faith.
I'll never make that mistake again, reading the experts' opinions. - Feynman
They might. But if the other side can provide some evidence that it isn't a true archive, then said technician is in deep shit.
I bet the (so-called) Church of Scientology gets everything they want pulled.
In addition, Web-site operators can prevent material from remaining in the public domain by using a piece of computer code, known as a robots.txt file, which stops bots belonging to the Wayback Machine and regular search engines from copying pages.
This is pretty bogus because it only works if there is still a current web-site at the spidered address that is on-line and can deliver a robots.txt file saying DON'T! It has already been proven in another case that rapid-fire multiple requests to WBM will cause it to give up pages even when robots.txt says not to.
I see two ways to fix this problem of misuse of a valuable archive:
1: Federal law PROHIBITING the use of evidence from the Wayback Machine in court trials. This is a valuable historical archive that will be less valuable if people worry that it can be used against them in the future in unforeseen ways, and block it from archiving their sites. How many sites already block the WBM TCP/IP address range?
2: WBM could simply announce that they refuse to cooperate in any future trials -- AND THEN DO EXACTLY THAT! Without them to attest to the accuracy of the retrieved data, many cases relying on that data would fall flat on their faces.
Think for a moment. The WBM was not created to make lawyers' lives easier and their law firms richer!
"It's the height of ridiculousness to say for those 9 lines you get hundreds of millions."