Wayback Archives as a Law Tool
Carl Bialik from the WSJ writes "The Wayback Machine's internet archive and Google's cached pages are becoming indispensable tools for some lawyers, especially specialists in intellectual-property law. Dell has used copies of expired websites to get the domain name DellComputersSuck.com transferred to it, the Wall Street Journal reports. EchoStar used Wayback in a case against a Polish TV company. Playboy checks Wayback to look for infringers of its trademark bunny or other images. And Wayback was even used to discredit a witness and reach a mistrial in a Canada murder case."
WBM works great for pulling up historical text content, but I've noticed it tends to be hit-and-miss with images. Try pulling up a website and chances are you'll see broken image links.
"Simplify, simplify, simplify!" Thoreau
People go to the library and dig through hundreds of old newspapers and records. What's the big deal with using Wayback for websites?
A bullet may have your name on it but splash damage is addressed "To whom it may concern."
... WBM respects the site's decision to not allow archiving. Unfortunately, those sites who might be the most interesting know that, and know that they can block archiving.
A few weeks ago /. had a bit on an Internet Archive that got sued for having material that a court had ordered withdrawn. How can they have things both ways? I think I'm beginning to understand why law school is so hard: you have to learn to think like a lawyer, which is like learning to type with your nose.
Bacardi + slashdot = negative karma.
Maybe this is a bit off-topic, but employers are also known to use Google and web archives to check up on the past of a potential employee. So be careful what kind of statements you make on the net using your real name.
The owls are not what they seem
I wonder if people will try to sue website owners for content that they have already pulled off of their website. I would hope not, but I could see how this could happen: a person realizes that certain content is copyrighted and pulls it, and later on some lawyer for the owner of that content sues and uses Google cache or WBM as a tool to prove the site posted copyrighted material.
http://www.law.com/jsp/article.jsp?id=1121245511855
What?
If Google cached pages from the WBM and the WBM archived Google cached pages, wouldn't that cause an infinite loop? j/k
$ cat robots.txt
User-agent: ia_archiver
Disallow: /
My site is not archived there, problem solved.
(Of course, if another of these services pops up...)
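For anyone who wants to check what a rule like that actually blocks, here is a minimal sketch using Python's standard urllib.robotparser. It parses the two lines above from a string rather than fetching anything. The URL example.com is a placeholder and is_allowed is a made-up helper name; whether a given crawler honors the file is up to the crawler.

```python
from urllib.robotparser import RobotFileParser

# The two-line robots.txt from the comment above.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""

def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

if __name__ == "__main__":
    # example.com is a placeholder, not a site from this thread.
    for agent in ("ia_archiver", "Googlebot"):
        print(agent, is_allowed(ROBOTS_TXT, agent, "http://example.com/index.html"))
```

Note that the rule names only one user agent, so any other crawler is unaffected by it.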
Please direct all bug reports to
Is his name Tommy?
Can you be Even More Awesome?!
Destroying evidence? Suppose I don't want to be caught, and I ask for older web pages that may contain incriminating evidence, such as illegal copies of things or illegal links, to be removed. Is this different from a request by any other copyright holder to have his pages removed, and can it be punished? What are the archives' retention policies, and have legal orders been served to prevent destruction of evidence?
What would be even better would be if the archives digitally signed their snapshots and kept the signatures even for material they had been asked to remove, so that the validity of a copy made for legal purposes could be established (SCO, Scientologists, and other things come to mind) even if the material was later censored.
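A rough sketch of the fingerprint half of that idea, in Python with only the standard library. It computes a SHA-256 digest of a captured page, which an archive could retain even after the page itself is withdrawn. A real scheme would also sign the digest with the archive's private key; that part is left out here. The function names and sample bytes are made up for illustration and are not anything the Archive is known to do.

```python
import hashlib
import hmac

def fingerprint(page_bytes: bytes) -> str:
    """Return the SHA-256 hex digest of a captured page."""
    return hashlib.sha256(page_bytes).hexdigest()

def matches(page_bytes: bytes, retained_digest: str) -> bool:
    """Check a copy produced later against the digest retained at capture time."""
    return hmac.compare_digest(fingerprint(page_bytes), retained_digest)

if __name__ == "__main__":
    captured = b"<html><body>Original page</body></html>"  # hypothetical capture
    digest = fingerprint(captured)
    print(matches(captured, digest))         # identical bytes
    print(matches(captured + b" ", digest))  # one extra byte appended
```

A single changed byte yields a different digest, so the check only works if someone still holds the exact bytes that were captured.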
Stand back Sherman as we set the dials on the wayback machine to 1845....
Seriously, there have been many times I have wanted to kiss the person running Wayback. I lost my home a few years back and had several websites that I lost because I hosted out of my house. I have been able to rebuild, or come fairly close to duplicating, those original sites.
As for lawyers, if there wasn't somebody already archiving all these sites, they'd get someone to do it for them and then it would not be accessible to the public. I guess we need to take the good with the bad on this.
I read Slashdot for the headlines, because the headlines, unlike the articles, are usually original and never duplicated
I wonder how this relates to copyright laws. If I write a dissertation on my blog, copyright it, and publish it, wouldn't it be a copyright violation for WBM to republish it for profit?
You got any karma man? I really neeed it. Just a little hit! Come on!
http://web.archive.org/web/*/http://sonicblue.com/ > before Dec 2, 2000. I miss that site! Slashdot totally messes up the url though.
* /http://sonicblue.com/
http://sonicblue.com/>http://web.archive.org/web/
The first is by the url method, the second a href. Get them on it pronto.
In the article, it mentions one of the archive's technicians signing an affidavit saying they think it's a true archive. No one would ever lie about that for a big corporate payout.
Never confuse volume with power.
Most companies send a cease and desist. So if they have to use the Wayback Machine to find infringements (because people are not CURRENTLY infringing), then there's no cease and desist order to be sent, yes? Or are they going to drag your ass to court and sue you for something that you stopped doing already, even though the Internet Archive is CURRENTLY doing it (with your material AND the infringed party's material)?
The childish nature of these corporations is ridiculous. Looking through archives of up to nine years just to point out: "Hey, you said we suck!" Who cares.
If Dell did not suck, they would not have to be so defensive.
Is there nothing it can't do?
If brevity is the soul of wit, then how does one explain Twitter?
Sometimes being able to see in the past is more valuable than being able to see in the future. It makes sense for lawyers to try to find ways to look back, for evidence and proof can only be found in the past.
If you have evidence, you can prove your claims. If you can prove your claims, you win a dispute. If you win the dispute in favor of your client, that makes you one good lawyer.
Technology ramblings : Simple is Beautiful
In the future, what will stop someone or some entity from falsifying information, knowing that the legal system will use this?
How can the info in archive.org or Google be trusted? You can make redirect pages based on Googlebot or Wayback that have nothing to do with what is really on the site.
In the article it is mentioned that vodaphone.com was taken by a squatter, and they used Wayback to show that her site was "intended to misleadingly attract consumers". What if she had sent WBM's bot to a "nice" page and not the real site? That case could have gone differently.
I think that if someone wants to, they can plan ahead and use this in a nefarious fashion.
Success is not the result of spontaneous combustion, you must set yourself on fire.
Can the wayback archive be used as a tool for showing admins on /. that this article is basically a dupe?
+1 funny, -2 overrated. Life isn't fair.
I used the Wayback machine to grab thousands of messages from an old WWWBoard-based message board that I ran, for eventual conversion to vBulletin. Some years, the Wayback Machine crawled every month; others it didn't even visit. Probably 80% of the messages that were posted before 2000 are lost to the ether of cyberspace. Guess you can't expect it to archive everything.
Many organizations that block access to sites also block the Wayback Machine. Here in DoD where I work, this is the case.
"Who are in control, they are not in control of anything - they don't even control themselves!" - Glen Beck
From Apple.com circa 1996:
New PowerBook Family
Addressing the needs of customers in small offices, home offices, business and education, Apple announced the Macintosh PowerBook 1400 series, combining 117MHz PowerPC speed with a removable CD-ROM drive and expansion options.
Ah 1996 was how long ago? I remember lusting after those 117Mhz.
I was able to use the wbm last year to find some old device drivers for a no-name motherboard I had from '97. The company went out of business, and their remaining stockpiles were bought by some other Chinese company, but the wbm actually had old copies of the drivers, and even a bios update for the board. Now, I always check there when I am having a hard time finding stuff that I knew used to be around.
Just because you're paranoid, it doesn't mean that they're not out to get you.
And when they block archive.org, what will you do then? Even stupid admins tend to grow clues from time to time.
What makes him so good?
Or maybe this was intended: "Internet Archive's Wayback Machine's internet archive."
Lawyers rarely google or do research; paralegals do it. Pretty soon, knowledge of where this stuff comes from is going to trickle up to the actual lawyers who use it. Then, being rich, they will go buy loads of Google stock, if for no other reason than to keep these goodies flowing.
Sorry about the writing. Robot fingers, you know? Cliff Steele in DOOM PATROL #23
Playboy checks Wayback to look for infringers of its trademark bunny or other images.
So they're basically just sitting around surfing porn too, eh?
"If I could live to be several hundred
I could take a walk and really wander, really wonder."
Dear Mr. FlameBait:
No need to defend myself against imbeciles who have no idea how to speak to a lady.
+1, Duh
It's been a bit since I was in high school and had to get around filters, but the most surefire way is to run an ssh server at home on a port that your school's firewall lets through (most let 22 through, but to be less suspicious choose something like 25 or 443) and then carry PuTTY around on a pen drive. Whenever you need unrestricted access, pop open PuTTY, connect to your server and create a dynamic tunnel. On that pen drive you can also keep Firefox with a SOCKS proxy set up to use port 1080, or whatever port you chose for the dynamic tunnel. There you go: unlimited, encrypted surfing, all bypassing your school's filters and being tunneled through your house. This all assumes that your school's firewall blocks based on ports rather than protocol.
Regards,
Steve
It would be extraordinarily difficult to forge old yellowed hardcopy of newspapers or microfiche copies that are duplicated across numerous library archives.
Thank goodness it's equally difficult to forge a few HTML pages and file timestamps. No one could ever do that.
Virtual reality indeed. I just shake my head and wonder.
yuhuhpoiu sauick
Google Caching and Wayback lookups. You could easily look URLs up by right-clicking on them.
/shameful plug
Oh, wait, there is one.
One time, the administrator I co-founded WrongPlanet.net with (I'm no longer running the place) put up a "mirror" of an autism site that went down. It was actually the Archive.org mirror displayed in a frame. The owner of the site was irate and rude about the whole thing, and wanted to sue for "copyright infringement". My friend changed it to a text link to archive.org in a forum post, and they still persisted, talking about how horrible it was, how much pain it caused them, and how we were "exploiting" them.
A lot of people really don't understand the internet, it's crazy.
I do not speak for The Archive. The above post should not be considered to reflect the official position of The Archive. It is purely my own personal opinion, and it was uttered under the influence of painkillers (I had my wisdom teeth yanked out of my jaw Wednesday, qv my Slashdot journal entry). Else I probably would have refrained -- talking about this at all while there's a court case pending was probably a really stupid idea, and I (usually) know better.
-- TTK
I imagine if you searched for sites talking about 9-11 pre-9-11-2001 you'd find some interesting things. Post 9-11 there were a zillion references, but pre-9-11 there couldn't have been that many.
Same with the London bombings, no?
because I have been enjoined by this Holy Office to abandon the false opinion which maintains that the Sun is the centre
Here in DoD where I work, this is the case.
And they don't mind you posting with the nym of the ex-Info Minister of Iraq? Hooray! A little rose of humour blooms in the darkest depths of the Pentagon's labyrinths!
--Ng
Playboy checks Wayback to look for infringers of its trademark bunny or other images.
In a related story, managers at Playboy have taken note of productivity differences between John Salem, who was tasked with finding instances of people illegally using the playboy logo, and Henry Waxman, who has been looking for instances of "other images," but has been observed taking frequent bathroom breaks.
Well, why should anyone care what someone may or may not have said about a company in the past?
Furthermore, there was no logic involved in saying Dell sucks--merely a statement of preferential opinion that is absolutely meaningless in the broader scope of things; does this mean Dell should sue me, too? Therein lies the point.
I never knew that such a thing (Wayback Machine) existed! I almost cried tears of joy to find that while I had to take down (when the hosts started charging) the sites I developed during my college years long back, almost all of it, including all changes, is there! Bwahaha!!! http://web.archive.org/web/*/http://www.thomso.net
And when they block archive.org, what will you do then?
:D No one will be able to block you then! Oh, wait...
www.archive.org.nyud.net:8090. Ta-da!
How many people immediately went to dellcomputerssuck.com to see what was there?
"Stop throwing the Constitution in my face, it's just a goddamned piece of paper!" - George W. Bush Nov. 2005
But how can you force the submitter of the removal request to store a copy, let alone an exact copy, from which the checksum can be calculated?
If the submitter keeps the page it is not necessarily the same set of bytes as the removed one (think dynamic pages).
Avantslash: low-bandwidth mobile slashdot.
So if some criminal tries to use robots.txt to hide information, the lawyers can still get it, either by subpoena/discovery to wayback, or an injunction to get the guy to remove his robots.txt, at which point the data returns to wayback!
cool!
Thank you for adding some intelligence to this inane debate. Cheers.
Wow, who would have thunk that an archive of history could be useful to anyone, let alone for rich companies to use to sue?
I think it's widely recognized that the Internet Archive is a general societal good, and it should be funded as such.
you have to learn to think like a lawyer, which is like learning to type with your nose, and look good while you're doing it.
The Kruger Dunning explains most post on
that Joe Wilson outed his own wife in 2002.
http://web.archive.org/web/20030720060539/http://www.mideasti.org/html/bio-wilson.html
So they're checking for past infringements. Is that the same as current infringement? I.e., even if you get a takedown notice and obey it, are you still equally guilty because once upon a time you were in violation? That's what it's sounding like here.
"It's the height of ridiculousness to say for those 9 lines you get hundreds of millions."
They killed Canada! Those bastards!
- Peace
Free as in "the Truth shall set you..."
I bet the (so-called) Church of Scientology gets everything they want pulled.
In addition, Web-site operators can prevent material from remaining in the public domain by using a piece of computer code, known as a robots.txt file, which stops bots belonging to the Wayback Machine and regular search engines from copying pages.
This is pretty bogus because it only works if there is still a current web-site at the spidered address that is on-line and can deliver a robots.txt file saying DON'T! It has already been proven in another case that rapid-fire multiple requests to WBM will cause it to give up pages even when robots.txt says not to.
I see two ways to fix this problem of misuse of a valuable archive:
1: Federal law PROHIBITING the use of evidence from the Wayback Machine in court trials. This is a valuable historical archive that will be less valuable if people worry that it can be used against them in the future in unforeseen ways, and block contributing to it. How many sites already block the WBM TCP/IP address range?
2: WBM could simply announce that they refuse to cooperate in any future trials -- AND THEN DO EXACTLY THAT! Without them to attest to the accuracy of the retrieved data, many cases relying on that data would fall flat on their faces.
Think for a moment. The WBM was not created to make lawyers' lives easier, and their law firms richer!
"It's the height of ridiculousness to say for those 9 lines you get hundreds of millions."
BS. How can I be modded Offtopic? Offtopic would be if I had said something about paving streets. THAT is offtopic. Referring to the Wayback machine IS NOT OFFTOPIC IN THIS ARTICLE. try to RTFA next time you mod me down.
Honesty may be the best policy, but by process of elimination, dishonesty is the second best policy.
Here's what happens, I think: 1) At first, no reference to Wayback machine in robots.txt. Site is spidered and the archive placed on line. 2) Add Wayback Machine to robots.txt. The site is no longer spidered, and old archives are hidden from public view. 3) Remove Wayback Machine from robots.txt again. Spidering resumes, and all the old archives of the site reappear. However, there is no archive of your site from the time robots.txt was up; remember, it wasn't spidered then.
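The three steps can be written down as a toy model. This is only a sketch of the behavior as described in the comment above, not how the Wayback Machine is actually implemented; the class and method names are invented, and the dates are placeholders.

```python
class ArchiveModel:
    """Toy model of the crawl/hide/reappear behavior described above."""

    def __init__(self):
        self.snapshots = []   # dates on which the site was spidered
        self.blocked = False  # whether robots.txt currently excludes the archiver

    def set_robots_block(self, blocked: bool) -> None:
        self.blocked = blocked

    def crawl(self, date: str) -> None:
        # A blocked site is not spidered, so no snapshot exists for that date.
        if not self.blocked:
            self.snapshots.append(date)

    def visible(self) -> list:
        # While the block is in place, old snapshots are hidden, not deleted.
        return [] if self.blocked else list(self.snapshots)

if __name__ == "__main__":
    site = ArchiveModel()
    site.crawl("2003-01")         # step 1: spidered and shown
    site.set_robots_block(True)   # step 2: block added, archives hidden
    site.crawl("2004-01")         # not spidered while blocked
    print(site.visible())
    site.set_robots_block(False)  # step 3: block removed, old archives return
    site.crawl("2005-01")
    print(site.visible())
```

The point of the model is the gap: the 2004 visit never produces a snapshot, so nothing from the blocked period can reappear later.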
when i was in high school i just plugged in a laptop, used the right static IP and could do whatever i wanted :)
district office REALLY didn't like encrypted traffic going in and out via ssh though.
Here's what happens, I think:
This is absolutely correct and consistent with previous experiences with WBM.
cpghost at Cordula's Web.
For what it's worth, the Internet Archive has at least at one point in its history collected Usenet posts. This isn't in the Wayback machine, though.
http://www.archive.org/about/faqs.php#The_Wayback_Machine
--Pat / zippy@cs.brandeis.edu / blog / pics
Rove is still a lyingscumbagcrook. The WBM reference states: "He is married to the former Valerie Plame and has two sons and two daughters." (he = Joe Wilson)
Rove, may he burn in hell, added the little extra: 'VP work in covert ops at the firm'.
Little known fact: the Internet Archive also has a search engine, as it must in order to find all those things to cache. It searches more of the Internet than Google.
Unlike Google/cache, where the 'do no evil' motto is looking more & more as though it comes second to 'unless we can take over the world by actually doing evil', WBM has no objective other than, say, making all data & knowledge on earth available to everyone for free.
Not mentioned in the WSJ article: this is run as a not-for-profit. It is privately funded.
Sorry to offend your delicate senses. However, the conclusion that you drew in your original post was terribly flawed. If you can't see that from the overly dramatic response that I posted, then I guess it's hopeless.
Incidentally, whether or not I know how to speak to a lady has no bearing on the serious flaw (both logically and morally) in your original post. But it's a nice way of shifting attention away from my point.
-h-
Having an opinion about a machine does not necessarily involve so much logic, and especially not morals. I own a Dell and I can say it sucks if I feel like it. If that's immoral, off to Hell I go. :-P
You work for Dell or something?
I was involved in a lawsuit over contents of a web page, and we used the Wayback machine to gather evidence of a web page over time. The only problem was that the dates the page was archived in the Wayback database were inconsistent and sometimes too far apart. Some significant contents of the site were missed because they were only available for a few weeks, and were skipped over by their spider. In the end, it didn't help much, it only hurt our case.
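One way to spot that problem before relying on the archive is to measure the gaps between captures. A small Python sketch, assuming you already have the capture timestamps in the 14-digit YYYYMMDDhhmmss form that appears in Wayback URLs. The function name, the sample timestamps and the 30-day threshold are all made up.

```python
from datetime import datetime, timedelta

def capture_gaps(timestamps, max_gap_days=30):
    """Return (start, end, gap) for consecutive captures further apart than max_gap_days."""
    times = sorted(datetime.strptime(ts, "%Y%m%d%H%M%S") for ts in timestamps)
    limit = timedelta(days=max_gap_days)
    return [(a, b, b - a) for a, b in zip(times, times[1:]) if b - a > limit]

if __name__ == "__main__":
    # Hypothetical capture timestamps, not from any real site.
    sample = ["20030101000000", "20030115000000", "20030601000000"]
    for start, end, gap in capture_gaps(sample):
        print(f"no capture between {start:%Y-%m-%d} and {end:%Y-%m-%d} ({gap.days} days)")
```

Anything that was only online inside one of the reported gaps cannot be shown from the archive at all.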
Usually, you don't have to make a big proof of facts. During a deposition, you ask a witness whether the website said such-and-such at a given time. You show them the document, and ask again. Since there is high probability the archive.org content is unadulterated, the witness is usually pretty dumb to deny under oath, and more often will simply admit and authenticate the document.
Sometimes they may shrug their shoulders, and the best you can get is a "dunno": an acknowledgment that they can't deny the site said what it said at that time, but don't know that it did. Many witnesses cannot do this without diminishing their credibility when they later testify that they are sure about something else. Either way, depositions and admissions are likely to get you all the evidence you need, and the document becomes more of a demonstrative exhibit than principal evidence.
If that doesn't work, you have two threshold evidentiary issues: (1) authenticating the document as legit; and (2) overcoming the hearsay rule.
Authentication isn't hard. You get a declaration from archive.org saying it is a legit business record, and the other side rarely has any evidence to the contrary. That will be enough to get it in, and probably enough for the jury to give it credence.
Then you have the hearsay problem, which may be one-way or two-way hearsay. The document is a record of Archive saying that susie.com said "Jimmy eats rice" on a particular date, and is being offered for the truth of that statement. It may also be offered for the truth of the proposition that "Jimmy eats rice," as opposed to the mere proposition that Susie said "Jimmy eats rice," in which case it is double hearsay.
There are a host of exceptions that may be applicable in any particular case. The primary hearsay issue is routinely overcome using the business records exception, and other exceptions may or may not apply depending upon the nature of the statement, the party making the statement, independent indicia of credibility and so forth. It is both easier and harder than it seems to solve these problems, and this kind of stuff is why you pay lawyers the big bucks.