Wayback Archives as a Law Tool
Carl Bialik from the WSJ writes "The Wayback Machine's internet archive and Google's cached pages are becoming indispensable tools for some lawyers, especially specialists in intellectual-property law. Dell has used copies of expired websites to get the domain name DellComputersSuck.com transferred to it, the Wall Street Journal reports. EchoStar used Wayback in a case against a Polish TV company. Playboy checks Wayback to look for infringers of its trademark bunny or other images. And Wayback was even used to discredit a witness and reach a mistrial in a Canada murder case."
WBM works great for pulling up historical text content, but I've noticed it tends to be hit-and-miss with images. Try pulling up a website and chances are you'll see broken image links.
"Simplify, simplify, simplify!" Thoreau
People go to the library and dig through hundreds of old newspapers and records. What's the big deal with using Wayback for websites?
A bullet may have your name on it but splash damage is addressed "To whom it may concern."
... WBM respects the site's decision to not allow archiving. Unfortunately, the sites that might be the most interesting know that, and know that they can block archiving.
A few weeks ago /. had a bit on an Internet Archive that got sued for having material that was ordered withdrawn by a court. How can they have things both ways? I think I'm beginning to understand why law school is so hard: you have to learn to think like a lawyer, which is like learning to type with your nose.
Bacardi + slashdot = negative karma.
Maybe this is a bit off-topic, but employers are also known to use Google and web archives to check up on the past of a potential employee. So be careful what kind of statements you make on the net using your real name.
The owls are not what they seem
I wonder if people will try to sue website owners for content that they have already pulled off their website. I would hope not, but I could see how this could happen. A person realizes that certain content is copyrighted and pulls it, and later on some lawyer for the owner of that content sues and uses Google cache or WBM as a tool to prove the site posted copyrighted material.
http://www.law.com/jsp/article.jsp?id=1121245511855
What?
If Google cached pages from the WBM and the WBM archived Google cached pages, wouldn't that cause an infinite loop? j/k
$ cat robots.txt
User-agent: ia_archiver
Disallow: /
My site is not archived there, problem solved.
(Of course, if another of these services pops up...)
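For anyone who wants to sanity-check what those two lines do, here is a minimal sketch using Python's standard `urllib.robotparser`. It only shows how a crawler that honors robots.txt would read the rules; the `is_allowed` helper, the example.com URL, and the "SomeOtherBot" agent name are made up for illustration, and nothing here says how the Archive's crawler is actually implemented.

```python
from urllib.robotparser import RobotFileParser

# The same two rules as in the robots.txt above.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a rule-abiding crawler with this user agent may fetch url.

    Hypothetical helper: it parses the rules from a string instead of
    downloading them, so it runs without network access.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "http://example.com/page.html"  # placeholder URL
    print("ia_archiver allowed:", is_allowed(ROBOTS_TXT, "ia_archiver", page))
    print("SomeOtherBot allowed:", is_allowed(ROBOTS_TXT, "SomeOtherBot", page))
```

Note that the rules name one specific bot, which is the point of the parenthetical: a new archiving service with a different user agent is not covered unless it gets its own entry or a `User-agent: *` block is added.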
Please direct all bug reports to
Destroying evidence? If I don't want to be caught and ask for older web pages that may contain incriminating evidence, such as illegal copies of things or illegal links, to be removed, is this different from a request by any other copyright holder to have his pages removed, and can it be punished? What are the archives' retention policies, and have legal orders been served to prevent destruction of evidence?
Even better would be for the archives to digitally sign their snapshots and keep the signatures even for material they had been asked to remove, so that the validity of a copy made for legal purposes could still be established (SCO, Scientologists, and other things come to mind) even if the content was later censored.
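A rough sketch of the "keep the fingerprint even after the content is gone" idea, in Python. To be clear about the assumptions: this uses a plain SHA-256 digest from the standard library as a stand-in, which is not a true digital signature (that would need a private key and a crypto library), and the `DigestLedger` class, its methods, and the sample URL and timestamp are all hypothetical, not anything the archives are known to run.

```python
import hashlib
import hmac


class DigestLedger:
    """Hypothetical ledger that keeps a SHA-256 digest per archived snapshot.

    Only the digest is stored, so it can be retained after the snapshot
    itself has been removed on request.
    """

    def __init__(self) -> None:
        self._digests: dict[tuple[str, str], str] = {}

    def record(self, url: str, timestamp: str, content: bytes) -> str:
        """Store and return the digest of a snapshot taken at timestamp."""
        digest = hashlib.sha256(content).hexdigest()
        self._digests[(url, timestamp)] = digest
        return digest

    def verify(self, url: str, timestamp: str, content: bytes) -> bool:
        """Check whether a copy someone presents matches the recorded digest."""
        expected = self._digests.get((url, timestamp))
        if expected is None:
            return False  # nothing was ever recorded for this snapshot
        actual = hashlib.sha256(content).hexdigest()
        return hmac.compare_digest(expected, actual)


if __name__ == "__main__":
    ledger = DigestLedger()
    original = b"<html>page as crawled</html>"
    ledger.record("http://example.com/", "2005-07-29", original)

    # Later, the snapshot is withdrawn, but a saved copy can still be checked.
    print(ledger.verify("http://example.com/", "2005-07-29", original))
    print(ledger.verify("http://example.com/", "2005-07-29", b"<html>edited</html>"))
```

A bare digest only proves a copy matches what was recorded; it says nothing about who recorded it, which is the gap a real signature would close.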
Stand back Sherman as we set the dials on the wayback machine to 1845....
Seriously, there have been many times I've wanted to kiss the person running Wayback. I lost my home a few years back, and with it several websites, because I hosted them out of my house. I have been able to rebuild those original sites, or come fairly close to duplicating them.
As for lawyers, if there wasn't somebody already archiving all these sites, they'd get someone to do it for them, and then it would not be accessible to the public. I guess we need to take the good with the bad on this.
I read Slashdot for the headlines, because the headlines, unlike the articles, are usually original and never duplicated
In the article, it mentions one of the archive's technicians signing an affidavit saying they think it's a true archive. No one would ever lie about that for a big corporate payout.
Never confuse volume with power.
The childish nature of these corporations is ridiculous. Looking through archives of up to nine years just to point out: "Hey, you said we suck!" Who cares.
If Dell did not suck, they would not have to be so defensive.
Sometimes being able to see in the past is more valuable than being able to see in the future. It makes sense for lawyers to try to find ways to look back, for evidence and proof can only be found in the past.
If you have evidence, you can prove your claims. If you can prove your claims, you win a dispute. If you win the dispute in favor of your client, that makes you one good lawyer.
Technology ramblings : Simple is Beautiful
In the future, what will stop someone or some entity from falsifying information, knowing that the legal system will use this?
What if she had sent WBM's bot to a "nice" page and not the real site? That case could have gone differently.
How can the info in archive.org or Google be trusted? You can make redirect pages based on Googlebot or the Wayback bot that have nothing to do with what is really on the site.
In the article it is mentioned that vodaphone.com was taken by a squatter, and they used Wayback to show that her intentions were "intended to misleadingly attract consumers."
I think that if someone wants to they can plan ahead and use this in a nefarious fashion.
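To show how little it takes for an archived copy to differ from what visitors saw, here is a toy Python sketch of choosing a page by User-Agent. It is an illustration of the worry raised above, not a description of any real site: the `select_page` function and the two file names are hypothetical, and the crawler tokens are just the two bots mentioned in this thread.

```python
# Substrings that identify the crawlers discussed in this thread (assumed values).
CRAWLER_TOKENS = ("ia_archiver", "googlebot")


def select_page(user_agent: str) -> str:
    """Return which page a server would send, based only on the User-Agent header.

    Hypothetical logic: known crawlers get one page, everyone else another,
    so the archived copy need not match what human visitors were shown.
    """
    ua = user_agent.lower()
    if any(token in ua for token in CRAWLER_TOKENS):
        return "nice.html"
    return "real.html"


if __name__ == "__main__":
    print(select_page("ia_archiver"))
    print(select_page("Mozilla/5.0 (Windows NT 5.1)"))
```

Which is why an affidavit that a snapshot is a true copy of what the crawler received is not quite the same as proof of what the public saw.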
Success is not the result of spontaneous combustion, you must set yourself on fire.
I used the Wayback machine to grab thousands of messages from an old WWWBoard-based message board that I ran, for eventual conversion to vBulletin. Some years, the Wayback Machine crawled every month; others it didn't even visit. Probably 80% of the messages that were posted before 2000 are lost to the ether of cyberspace. Guess you can't expect it to archive everything.
From Apple.com circa 1996:
New PowerBook Family
Addressing the needs of customers in small offices, home offices, business and education, Apple announced the Macintosh PowerBook 1400 series, combining 117MHz PowerPC speed with a removable CD-ROM drive and expansion options.
Ah, 1996 was how long ago? I remember lusting after those 117MHz.
I was able to use the WBM last year to find some old device drivers for a no-name motherboard I had from '97. The company went out of business, and their remaining stockpiles were bought by some other Chinese company, but the WBM actually had old copies of the drivers, and even a BIOS update for the board. Now, I always check there when I am having a hard time finding stuff that I know used to be around.
Just because you're paranoid, it doesn't mean that they're not out to get you.
Playboy checks Wayback to look for infringers of its trademark bunny or other images.
So they're basically just sitting around surfing porn too, eh?
"If I could live to be several hundred
I could take a walk and really wander, really wonder."
It's been a while since I was in high school and had to get around filters, but the most surefire way is to run an ssh server at home on a port that your school's firewall lets through (most allow 22, but to be less suspicious choose something like 25 or 443) and then carry PuTTY around on a pen drive. Whenever you need unrestricted access, open PuTTY, connect to your server, and create a dynamic tunnel. On that pen drive you can also keep Firefox, with a SOCKS proxy configured on port 1080 or whichever port you chose for the dynamic tunnel. There you go: unlimited, encrypted surfing that bypasses your school's filters by being tunneled through your house. This all assumes that your school's firewall blocks based on ports rather than on protocol.
Regards,
Steve
yuhuhpoiu sauick
Google Caching and Wayback lookups. You could easily look URLs up by right-clicking on them.
/shameful plug
Oh, wait, there is one.
I do not speak for The Archive. The above post should not be considered to reflect the official position of The Archive. It is purely my own personal opinion, and it was uttered under the influence of painkillers (I had my wisdom teeth yanked out of my jaw Wednesday, qv my Slashdot journal entry). Else I probably would have refrained -- talking about this at all while there's a court case pending was probably a really stupid idea, and I (usually) know better.
-- TTK
I imagine if you searched for sites talking about 9-11 pre-9-11-2001 you'd find some interesting things. Post 9-11 there were a zillion references, but pre-9-11 there couldn't have been that many.
Same with the London bombings, no?
because I have been enjoined by this Holy Office to abandon the false opinion which maintains that the Sun is the centre
OK, you may look at WBM as a library, but IS it?
Yes. It really is. It's a registered member of the American Library Association. Details on http://www.archive.org/about/about.php
It's an honest-to-God library, which also means that Section 108 of Title 17 of the US Code (the copyright title) applies. Public libraries in the US (and here in the UK) have some pertinent exemptions to the copyright restrictions that bind us mere mortals.
--Ng
Playboy checks Wayback to look for infringers of its trademark bunny or other images.
In a related story, managers at Playboy have taken note of productivity differences between John Salem, who was tasked with finding instances of people illegally using the Playboy logo, and Henry Waxman, who has been looking for instances of "other images," but has been observed taking frequent bathroom breaks.
They killed Canada! Those bastards!
- Peace
Free as in "the Truth shall set you..."
I bet the (so-called) Church of Scientology gets everything they want pulled.
In addition, Web-site operators can prevent material from remaining in the public domain by using a piece of computer code, known as a robots.txt file, which stops bots belonging to the Wayback Machine and regular search engines from copying pages.
This is pretty bogus because it only works if there is still a current web-site at the spidered address that is on-line and can deliver a robots.txt file saying DON'T! It has already been proven in another case that rapid-fire multiple requests to WBM will cause it to give up pages even when robots.txt says not to.
I see two ways to fix this problem of misuse of a valuable archive:
1: Federal law PROHIBITING the use of evidence from the Wayback Machine in court trials. This is a valuable historical archive that will be less valuable if people worry that it can be used against them in the future in unforeseen ways, and block contributing to it. How many sites already block the WBM TCP/IP address range?
2: WBM could simply announce that they refuse to cooperate in any future trials -- AND THEN DO EXACTLY THAT! Without them to attest to the accuracy of the retrieved data, many cases relying on that data would fall flat on their faces.
Think for a moment. The WBM was not created to make lawyers' lives easier, and their law firms richer!
"It's the height of ridiculousness to say for those 9 lines you get hundreds of millions."
Here's what happens, I think: 1) At first, no reference to Wayback machine in robots.txt. Site is spidered and the archive placed on line. 2) Add Wayback Machine to robots.txt. The site is no longer spidered, and old archives are hidden from public view. 3) Remove Wayback Machine from robots.txt again. Spidering resumes, and all the old archives of the site reappear. However, there is no archive of your site from the time robots.txt was up; remember, it wasn't spidered then.
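The three steps above can be written out as a tiny Python model. This only encodes the poster's guess ("I think") about how visibility follows the current robots.txt state; the `ArchiveModel` class, its method names, and the year labels are invented for the sketch, and the real Wayback Machine may behave differently.

```python
from dataclasses import dataclass, field


@dataclass
class ArchiveModel:
    """Toy model of the guessed behavior: crawling and visibility both
    depend on whether the site's robots.txt currently blocks the archiver."""

    snapshots: list[str] = field(default_factory=list)
    blocked: bool = False

    def set_blocked(self, blocked: bool) -> None:
        """Simulate adding or removing the archiver from robots.txt."""
        self.blocked = blocked

    def crawl(self, label: str) -> None:
        """Take a snapshot only while the site is not blocking the archiver."""
        if not self.blocked:
            self.snapshots.append(label)

    def visible(self) -> list[str]:
        """Old snapshots are hidden, not deleted, while the block is in place."""
        return [] if self.blocked else list(self.snapshots)


if __name__ == "__main__":
    site = ArchiveModel()
    site.crawl("2003")          # step 1: no rule, site is spidered
    site.set_blocked(True)      # step 2: rule added
    site.crawl("2004")          # skipped, nothing is archived
    print(site.visible())       # hidden while blocked
    site.set_blocked(False)     # step 3: rule removed
    site.crawl("2005")
    print(site.visible())       # old archives reappear, with a gap for 2004
```

The gap for the blocked period is the only permanent effect in this model; everything else is a display decision made at lookup time.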