Wayback Archives as a Law Tool
Carl Bialik from the WSJ writes "The Wayback Machine's internet archive and Google's cached pages are becoming indispensable tools for some lawyers, especially specialists in intellectual-property law. Dell has used copies of expired websites to get the domain name DellComputersSuck.com transferred to it, the Wall Street Journal reports. EchoStar used Wayback in a case against a Polish TV company. Playboy checks Wayback to look for infringers of its trademark bunny or other images. And Wayback was even used to discredit a witness and reach a mistrial in a Canada murder case."
WBM works great for pulling up historical text content, but I've noticed it tends to be hit-and-miss with images. Try pulling up a website and chances are you'll see broken image links.
"Simplify, simplify, simplify!" Thoreau
People go to the library and dig through hundreds of old newspapers and records. What's the big deal with using Wayback for websites?
A bullet may have your name on it but splash damage is addressed "To whom it may concern."
... WBM respects the site's decision to not allow archiving. Unfortunately, those sites who might be the most interesting know that, and know that they can block archiving.
Personally, I use it as a proxy to get past the school's firewall. Stupid admins can't keep me out!
Honesty may be the best policy, but by process of elimination, dishonesty is the second best policy.
A few weeks ago /. had a bit on the Internet Archive getting sued for having material that was ordered withdrawn by a court. How can they have things both ways? I think I'm beginning to understand why law school is so hard: you have to learn to think like a lawyer, which is like learning to type with your nose.
Bacardi + slashdot = negative karma.
Maybe this is a bit off-topic, but employers are also known to use Google and web archives to check up on the past of a potential employee. So be careful what kind of statements you make on the net using your real name.
The owls are not what they seem
I wonder if people will try to sue website owners for content that they have already pulled off their website. I would hope not, but I could see how this could happen: a person realizes that certain content is copyrighted and pulls it, and later some lawyer for the content's owner sues and uses Google cache or WBM as a tool to prove the site posted copyrighted material.
http://www.law.com/jsp/article.jsp?id=1121245511855
What?
If Google cached pages from the WBM and the WBM archived Google cached pages, wouldn't that cause an infinite loop? j/k
$ cat robots.txt
User-agent: ia_archiver
Disallow: /
My site is not archived there, problem solved.
(Of course, if another of these services pops up...)
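As a rough illustration of what the robots.txt above expresses, the sketch below feeds the same two rules to Python's standard `urllib.robotparser` and asks whether a given crawler name may fetch a page. The helper name `is_allowed` and the example.com URL are made up for the example, and this only shows how the rules parse; it says nothing about how the Internet Archive's crawler actually behaves.

```python
from urllib.robotparser import RobotFileParser

# The rules quoted in the comment above.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if these robots.txt rules let `user_agent` fetch `url`.

    Hypothetical helper for illustration only.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "http://example.com/index.html"  # placeholder URL
    for agent in ("ia_archiver", "Googlebot"):
        print(agent, is_allowed(ROBOTS_TXT, agent, page))
```

Only the crawler named in the file is shut out; any other service with a different user-agent string is unaffected, which is the caveat in the parenthetical above.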
Please direct all bug reports to
Is his name Tommy?
Can you be Even More Awesome?!
Destroying evidence? If I don't want to be caught and ask for older web pages that may contain incriminating evidence, such as illegal copies of things or illegal links, to be removed, is this different from a request by any other copyright holder to have his pages removed, and can it be punished? What are the archives' retention policies, and have legal orders been served to prevent destruction of evidence?
What would be even better would be if the archives digitally signed their archives and kept signatures even of those things that had been asked to remove so that the validity of a copy could be established if made for legal purposes (SCO, Scientologists, and other things come to mind) even if later censored.
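A minimal sketch of that idea, under stated assumptions: the archive keeps a SHA-256 digest of each snapshot plus a keyed tag over the digest, and retains both even after the snapshot itself is taken down. A real deployment would presumably use public-key signatures so that anyone could verify; the standard library has none, so HMAC with a secret key stands in here. All function names and the key are hypothetical, and nothing below describes what archive.org actually does.

```python
import hashlib
import hmac


def snapshot_digest(content: bytes) -> str:
    """SHA-256 digest of the archived bytes, as a hex string."""
    return hashlib.sha256(content).hexdigest()


def sign_digest(digest: str, archive_key: bytes) -> str:
    """Keyed tag over the digest (HMAC stands in for a real signature)."""
    return hmac.new(archive_key, digest.encode("ascii"), hashlib.sha256).hexdigest()


def verify_copy(copy: bytes, stored_digest: str, stored_tag: str,
                archive_key: bytes) -> bool:
    """Check a copy against the digest and tag retained by the archive."""
    digest = snapshot_digest(copy)
    expected_tag = sign_digest(digest, archive_key)
    return (hmac.compare_digest(digest, stored_digest)
            and hmac.compare_digest(expected_tag, stored_tag))


if __name__ == "__main__":
    key = b"hypothetical-archive-secret"
    page = b"<html><body>old page</body></html>"
    digest = snapshot_digest(page)
    tag = sign_digest(digest, key)
    # The snapshot is later removed, but digest and tag are retained.
    print(verify_copy(page, digest, tag, key))          # identical copy
    print(verify_copy(page + b" ", digest, tag, key))   # altered copy
```

The check is byte-exact, so it only helps if someone kept the exact bytes that were archived.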
Stand back Sherman as we set the dials on the wayback machine to 1845....
Seriously, there have been many times I've wanted to kiss the person running Wayback. I lost my home a few years back and, because I hosted out of my house, lost several websites with it. I have been able to rebuild, or come fairly close to duplicating, those original sites.
As for lawyers, if there wasn't somebody already archiving all these sites, they'd get someone to do it for them and then it would not be accessible to the public. I guess we need to take the good with the bad on this.
I read Slashdot for the headlines, because the headlines, unlike the articles, are usually original and never duplicated
I wonder how this relates to copyright laws. If I write a dissertation on my blog, copyright it, and publish it, wouldn't it be a copyright violation for WBM to republish it for profit?
You got any karma man? I really neeed it. Just a little hit! Come on!
I've instructed archive.org to remove my sites from the archive. If I can't put copyrighted stuff on my page, then they can't put my copyrighted stuff on their page. This way I can at least correct errors of judgement without leaving a trail.
http://web.archive.org/web/*/http://sonicblue.com/ > before Dec 2, 2000. I miss that site! Slashdot totally messes up the url though.
* /http://sonicblue.com/
http://sonicblue.com/>http://web.archive.org/web/
The first is by the url method, the second a href. Get them on it pronto.
In the article, it mentions one of the archive's technicians signing an affidavit saying they think it's a true archive. No one would ever lie about that for a big corporate payout.
Never confuse volume with power.
It would be more interesting if he could beat people at games where the best strategy isn't to wiggle the controller as fast as possible.
Most companies send a cease and desist. So if they have to use the Wayback Machine to find infringements (because people are not CURRENTLY infringing), then there's no cease-and-desist order to be sent, yes? Or are they going to drag your ass to court and sue you for something that you stopped doing already, even though the Internet Archive is CURRENTLY doing it (with your material AND the infringed party's material)?
The childish nature of these corporations is ridiculous. Looking through archives of up to nine years just to point out: "Hey, you said we suck!" Who cares.
If Dell did not suck, they would not have to be so defensive.
Is there nothing it can't do?
If brevity is the soul of wit, then how does one explain Twitter?
Sometimes being able to see in the past is more valuable than being able to see in the future. It makes sense for lawyers to try to find ways to look back, for evidence and proof can only be found in the past.
If you have evidence, you can prove your claims. If you can prove your claims, you win a dispute. If you win the dispute in favor of your client, that makes you one good lawyer.
Technology ramblings : Simple is Beautiful
How can the info in the Wayback Machine or Google be trusted? You can make redirect pages keyed to Googlebot or the Wayback crawler that have nothing to do with what is really on the site.
In the article it is mentioned that vodaphone.com was taken by a squatter and they used Wayback to show that her intentions were "intended to misleadingly attract consumers." What if she had sent WBM's bot to a "nice" page and not the real site? That case could have gone differently.
In the future, what will stop someone or some entity from falsifying information, knowing that the legal system will use this?
I think that if someone wants to, they can plan ahead and use this in a nefarious fashion.
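The worry in this comment is what is usually called cloaking: a server looks at the User-Agent header and hands crawlers a different page than it hands ordinary visitors. A minimal sketch of the decision logic follows, assuming the crawler can be recognized by a substring of its user-agent string. The token list and the two file names are hypothetical.

```python
# Substrings assumed to identify archive and search crawlers (illustrative only).
CRAWLER_TOKENS = ("ia_archiver", "googlebot")


def choose_page(user_agent: str) -> str:
    """Pick which page a cloaking server would serve for this User-Agent."""
    ua = user_agent.lower()
    if any(token in ua for token in CRAWLER_TOKENS):
        return "nice.html"   # what the crawler, and so the archive, sees
    return "real.html"       # what human visitors see


if __name__ == "__main__":
    for ua in ("ia_archiver", "Mozilla/5.0 (X11; Linux x86_64)"):
        print(ua, "->", choose_page(ua))
```

An archived snapshot taken this way is an accurate record of what the crawler was sent, which is not the same thing as what visitors saw.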
Success is not the result of spontaneous combustion, you must set yourself on fire.
Can the wayback archive be used as a tool for showing admins on /. that this article is basically a dupe?
+1 funny, -2 overrated. Life isn't fair.
I used the Wayback machine to grab thousands of messages from an old WWWBoard-based message board that I ran, for eventual conversion to vBulletin. Some years, the Wayback Machine crawled every month; others it didn't even visit. Probably 80% of the messages that were posted before 2000 are lost to the ether of cyberspace. Guess you can't expect it to archive everything.
http://www.archive.org/web/web.php How can the lawyers or the judges even say that the Wayback scripts (PHP or Perl or Java) did NOT modify a website's content before committing it to their archive database?
Chris ,
Php Programmers.
From Apple.com circa 1996:
New PowerBook Family
Addressing the needs of customers in small offices, home offices, business and education, Apple announced the Macintosh PowerBook 1400 series, combining 117MHz PowerPC speed with a removable CD-ROM drive and expansion options.
Ah, 1996 was how long ago? I remember lusting after those 117MHz.
I was able to use the wbm last year to find some old device drivers for a no-name motherboard I had from '97. The company went out of business, and their remaining stockpiles were bought by some other Chinese company, but the wbm actually had old copies of the drivers, and even a bios update for the board. Now, I always check there when I am having a hard time finding stuff that I knew used to be around.
Just because you're paranoid, it doesn't mean that they're not out to get you.
What makes him so good?
Or maybe this was intended: "Internet Archive's Wayback Machine's internet archive."
The childish nature of SilentShriek is ridiculous. Looking through archives of up to nine years just to point out: "Hey, you said I'm a child rapist!" Who cares.
If SilentShriek was not a child rapist, he would not have to be so defensive.
We were also amused to see what people once found impressive. Back then the sites were designed with HTML editors and Paint, instead of Dreamweaver and Photoshop.
Lawyers rarely google or do research; paralegals do it. Pretty soon word of where this stuff is coming from is going to trickle up to the actual lawyers who use it. Then, being rich, they will go buy loads of Google stock, if for no other reason than to keep these goodies flowing.
Sorry about the writing. Robot fingers, you know? Cliff Steele in DOOM PATROL #23
Playboy checks Wayback to look for infringers of its trademark bunny or other images.
So they're basically just sitting around surfing porn too, eh?
"If I could live to be several hundred
I could take a walk and really wander, really wonder."
+1, Duh
It would be extraordinarily difficult to forge old yellowed hardcopy of newspapers or microfiche copies that are duplicated across numerous library archives.
Thank goodness it's equally difficult to forge a few HTML pages and file time stamps. No one could ever do that.
Virtual reality indeed. I just shake my head and wonder.
yuhuhpoiu sauick
, the dog from the Rocky and Bullwinkle show. He or his heir, Sherman, will sue Google and win.
Google Caching and Wayback lookups. You could easily look URLs up by right-clicking on them.
/shameful plug
Oh, wait, there is one.
One time, the administrator I co-founded WrongPlanet.net with (I'm no longer running the place) put up a "mirror" of an autism site that had gone down. It was actually the Archive.org mirror displayed in a frame. The owner of the site was irate and rude about the whole thing, and wanted to sue for "copyright infringement". My friend changed it to a text link to archive.org in a forum post, and they still persisted, talking about how horrible it was, how much pain it caused them, and how we were "exploiting" them.
A lot of people really don't understand the internet, it's crazy.
Silly question, but how are these teams archiving the archive for the court case? I'm assuming they can't just use archive.org's cache directly because what's to stop the owner of the domain from requesting archive.org remove the site (aside from an injunction)?
That's probably it.
I played some of these games against an experienced player (this was my first time playing these particular games, as I'm not a gamer at all). After being beaten twice, I decided to try an experiment.
It worked. I beat my friend every time, eight games in all.
How did I do it? I just mashed all of the controls on the controller as quickly as I could. It worked every time. I didn't even need to watch the action on the screen.
In short, the blind kid is pulling a scam. Most people don't know how these games work, so it looks like he has an extraordinary talent, when in fact he's just exploiting a weakness in the design of most combat games.
I do not speak for The Archive. The above post should not be considered to reflect the official position of The Archive. It is purely my own personal opinion, and it was uttered under the influence of painkillers (I had my wisdom teeth yanked out of my jaw Wednesday, qv my Slashdot journal entry). Else I probably would have refrained -- talking about this at all while there's a court case pending was probably a really stupid idea, and I (usually) know better.
-- TTK
I imagine if you searched for sites talking about 9-11 pre-9-11-2001 you'd find some interesting things. Post 9-11 there were a zillion references, but pre-9-11 there couldn't have been that many.
Same with the London bombings, no?
because I have been enjoined by this Holy Office to abandon the false opinion which maintains that the Sun is the centre
Here in DoD where I work, this is the case.
And they don't mind you posting with the nym of the ex-Info Minister of Iraq? Hooray! A little rose of humour blooms in the darkest depths of the Pentagon's labyrinths!
--Ng
From the headline:
"...mistrial in a Canada murder case."
I never heard about that. I have, on the other hand, heard about a Canadian murder case.
Playboy checks Wayback to look for infringers of its trademark bunny or other images.
In a related story, managers at Playboy have taken note of productivity differences between John Salem, who was tasked with finding instances of people illegally using the playboy logo, and Henry Waxman, who has been looking for instances of "other images," but has been observed taking frequent bathroom breaks.
Well, why should anyone care what someone may or may not have said about a company in the past?
Furthermore, there was no logic involved in saying Dell sucks--merely a statement of preferential opinion that is absolutely meaningless in the broader scope of things; does this mean Dell should sue me, too? Therein lies the point.
I never knew that such a thing (Wayback Machine) existed! I almost cried tears of joy to find that while I had to take down (when the hosts started charging) the sites I developed during my college years long back, almost all of it, including all changes, is there! Bwahaha!!! http://web.archive.org/web/*/http://www.thomso.net
But isn't that the point? The websites that did mention the attacks before they happened would lead you to the people who planned the attacks.
maybe this is a slashdot/linux joke I'm not getting.
How many people immediately went to dellcomputerssuck.com to see what was there?
"Stop throwing the Constitution in my face, it's just a goddamned piece of paper!" - George W. Bush Nov. 2005
A good friend of mine is a professional photographer, and he's currently suing Avnet for more than liberal use of his copyrighted images, for which they didn't want to pay him the use rights. When he discovered the first image, he called me and I told him he could get more info about how and when they had used his images from the Wayback Machine. He and his lawyer started combing it, came up with tons of examples, and showed exactly when and how his images had been used by Avnet. The case has been going on for two years and so far Avnet is not looking so well.
After they filed suit, Avnet's archives disappeared from the wayback machine, but it doesn't matter because they have hard copies of everything.
Why would this stuff disappear?
But how can you force the submitter of the removal request to store a copy, let alone an exact copy, from which the checksum can be calculated?
If the submitter keeps the page it is not necessarily the same set of bytes as the removed one (think dynamic pages).
Avantslash: low-bandwidth mobile slashdot.
Playboy shouldn't be able to go after anyone infringing on the bunny logo. They allow the military and others to use that logo all the time; they didn't enforce their rights against those offenders and thereby lost the right to enforce against anyone who infringes on it.
I wrote their lawyer on behalf of a client of mine who wanted to use it and mentioned the above, and their counsel refused to comment on it, probably because she knows I'm right.
So if some criminal tries to use robots.txt to hide information, the lawyers can still get it, either by subpoena/discovery to wayback, or an injunction to get the guy to remove his robots.txt, at which point the data returns to wayback!
cool!
Wow, who would have thunk that an archive of history could be useful to anyone, let alone for rich companies to use to sue?
I think it's widely recognized that the Internet Archive is a general societal good; it should be funded as such.
you have to learn to think like a lawyer, which is like learning to type with your nose, and look good while you're doing it.
The Kruger Dunning explains most post on
"If you win the dispute in favor of your client, that makes you one good lawyer."
No, it makes you a SKILLED lawyer. A GOOD lawyer is someone who would only accept victory in a case if they are SUPPOSED to win, i.e. in my book getting a murderer (who is actually guilty and in fact SHOULD be punished) off the hook with legal trickery makes the lawyer almost as bad for society as the murderer him/herself. In that case, in my opinion, the GOOD lawyer would plead the facts (guilty) and do their best to make the best representation of the reality they can. But getting them off when they actually ARE guilty should be a crime in and of itself.
The only disadvantage I can see is opinion content. People change opinions over the years. Stuff they believed in when they were younger changes as they mature and realize what life is really about. The problem is that, with the ability to look back in the archives, what was said years before could come back and bite you.
Web users should be given the option to opt out of the Wayback Machine, since a few years ago no one realized that websites and such were being archived.
Remember what you said years ago can still hurt you.
that Joe Wilson outed his own wife in 2002.
http://web.archive.org/web/20030720060539/http://www.mideasti.org/html/bio-wilson.html
So they're checking for past infringements. Is that the same as current infringement? I.e., even if you get a takedown notice and obey it, are you still equally guilty because once upon a time you were in violation? That's what it's sounding like here.
"It's the height of ridiculousness to say for those 9 lines you get hundreds of millions."
They killed Canada! Those bastards!
- Peace
Free as in "the Truth shall set you..."
I bet the (so-called) Church of Scientology gets everything they want pulled.
In addition, Web-site operators can prevent material from remaining in the public domain by using a piece of computer code, known as a robots.txt file, which stops bots belonging to the Wayback Machine and regular search engines from copying pages.
This is pretty bogus because it only works if there is still a current web-site at the spidered address that is on-line and can deliver a robots.txt file saying DON'T! It has already been proven in another case that rapid-fire multiple requests to WBM will cause it to give up pages even when robots.txt says not to.
I see two ways to fix this problem of misuse of a valuable archive:
1: Federal law PROHIBITING the use of evidence from the Wayback Machine in court trials. This is a valuable historical archive that will be less valuable if people worry that it can be used against them in the future in unforeseen ways, and block contributing to it. How many sites already block the WBM TCP/IP address range?
2: WBM could simply announce that they refuse to cooperate in any future trials -- AND THEN DO EXACTLY THAT! Without them to attest to the accuracy of the retrieved data, many cases relying on that data would fall flat on their faces.
Think for a moment. The WBM was not created to make lawyer's lives easier, and their law firms richer!
"It's the height of ridiculousness to say for those 9 lines you get hundreds of millions."
Here's what happens, I think:
1) At first, no reference to Wayback machine in robots.txt. Site is spidered and the archive placed on line.
2) Add Wayback Machine to robots.txt. The site is no longer spidered, and old archives are hidden from public view.
3) Remove Wayback Machine from robots.txt again. Spidering resumes, and all the old archives of the site reappear. However, there is no archive of your site from the time robots.txt was up; remember, it wasn't spidered then.
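The three steps above can be written down as a toy model. This encodes only the commenter's description (a current robots.txt block hides every snapshot, and no snapshots are taken while blocked); it is not a statement of how the Wayback Machine is implemented, and the function names and dates are invented for the example.

```python
from datetime import date


def crawl(snapshots, day, blocked_on_day):
    """Return the snapshot list after a crawl attempt on `day`.

    While robots.txt blocks the crawler, nothing new is stored.
    """
    if blocked_on_day:
        return list(snapshots)
    return list(snapshots) + [day]


def visible_snapshots(snapshots, blocked_today):
    """Snapshots the public can see, given today's robots.txt state."""
    return [] if blocked_today else sorted(snapshots)


if __name__ == "__main__":
    snaps = crawl([], date(2004, 1, 1), blocked_on_day=False)    # step 1
    snaps = crawl(snaps, date(2004, 6, 1), blocked_on_day=True)  # step 2
    print(visible_snapshots(snaps, blocked_today=True))
    snaps = crawl(snaps, date(2005, 1, 1), blocked_on_day=False) # step 3
    print(visible_snapshots(snaps, blocked_today=False))
```

In this model the old snapshots are hidden, not deleted, which is why they come back in step 3 while the blocked period stays empty.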
So, the Wayback Machine copies websites (gee, wonder if they get access to pay-per-view sites...). Do they get permission to copy trademarked and copyrighted info on all those sites? If not, and the info is clearly trademarked/copyrighted, and this does not sound like it fits into a category of the exceptions (such as personal use), how do they claim not to be in violation of the trademark or copyright holder's rights (and therefore not liable)?
Hmm, I smell lawsuits (just sat for the bar...) Just think, if it can gain access to some corporation's websites where trade secrets are stored for employee (but not public) access, and the company's competition uses those trade secrets to gain market advantage...
Here's what happens, I think:
This is absolutely correct and consistent with previous experiences with WBM.
cpghost at Cordula's Web.
For what it's worth, the Internet Archive has at least at one point in its history collected Usenet posts. This isn't in the Wayback machine, though.
http://www.archive.org/about/faqs.php#The_Wayback_Machine
--Pat / zippy@cs.brandeis.edu / blog / pics
Rove is still a lyingscumbagcrook. The WBM reference states: "He is married to the former Valerie Plame and has two sons and two daughters." (he = Joe Wilson)
Rove, may he burn in hell, added the little extra: 'VP work in covert ops at the firm'.
Little known fact: the Internet Archive also has a search engine
... as it must in order to find all those things to cache. It searches more of the Internet than Google.
Unlike Google/cache, where the 'do no evil' motto is looking more & more as though it comes second to
... 'unless we can take over the world by actually doing evil', WBM has no objective other than, say, making all data & knowledge on earth available to everyone for free.
Not mentioned in the WSJ article: this is run as a not-for-profit. It is privately funded.
A lawyer who intentionally loses cases, for any reason, is a bad lawyer in my book. Our legal system is based on the assumption that having qualified representatives arguing the case for both sides in a dispute as effectively as they can will result in a passable approximation of justice. If a lawyer starts intentionally performing worse for certain clients based on popular opinion (or the lawyer's personal judgement, which can be seen simply as a subset of popular opinion) of the case, they are betraying that system in a very fundamental sense.
I'm not saying that our legal system is perfect, but it is, in my opinion, superior to a system where it is acceptable for a lawyer to take on a judging role at their discretion.
I bet it hasn't been mentioned, so -
There are (spam) companies who check (popular) web addresses to see if they're no longer in use; they then buy the address and put any of your run-of-the-mill "search-engines" on it, and ALSO a robots.txt that doesn't allow any archiving.
Meaning what? It kills whatever WBM has stored on the site.
Also, I've been thinking that there needs to be an Art-WBM, I really loved gameart.com, and trying to get what was on the site from the WBM is quite hopeless. Also just small sites - small artists - would be real nice to have their work stored for future generations, Steven Garofalo's I remember being questioned on sijun where it went - the site takes you to just the thing I said about "search-engines". ("rustedfaith.com What you need, when you need it | Popular Categories | Sex Art Music Rust Faith Blackjack")
In regards to law, I can't remember the tangible address or website, but in the name of law, companies have gotten WBM to remove sites they didn't like. (Oh, could always try on "Really evil company")
the sun is god
I was involved in a lawsuit over contents of a web page, and we used the Wayback machine to gather evidence of a web page over time. The only problem was that the dates the page was archived in the Wayback database were inconsistent and sometimes too far apart. Some significant contents of the site were missed because they were only available for a few weeks, and were skipped over by their spider. In the end, it didn't help much, it only hurt our case.
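The coverage problem described here can be checked up front. Wayback URLs carry a 14-digit capture timestamp (YYYYMMDDhhmmss, as in the archive links elsewhere in this thread), so given the list of capture timestamps for a page, one can find the longest stretch with no capture at all. The sketch below assumes the timestamps have already been collected by hand; the sample values and function names are invented.

```python
from datetime import datetime


def parse_ts(ts: str) -> datetime:
    """Parse a 14-digit Wayback-style timestamp such as 20030720060539."""
    return datetime.strptime(ts, "%Y%m%d%H%M%S")


def largest_gap(timestamps):
    """Return (gap, start, end) for the longest interval between captures."""
    times = sorted(parse_ts(t) for t in timestamps)
    if len(times) < 2:
        raise ValueError("need at least two captures to measure a gap")
    gaps = [(later - earlier, earlier, later)
            for earlier, later in zip(times, times[1:])]
    return max(gaps, key=lambda g: g[0])


if __name__ == "__main__":
    # Hypothetical capture times for one page.
    captures = ["20030101000000", "20030115000000", "20030720060539"]
    gap, start, end = largest_gap(captures)
    print(f"no captures for {gap.days} days between {start} and {end}")
```

Anything that was online for less than the largest gap may simply never have been captured, which is the situation the comment describes.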
In Soviet Russia, Wayback archives you!
What would be even better would be if the archives digitally signed their archives and kept signatures even of those things that had been asked to remove so that the validity of a copy could be established if made for legal purposes (SCO, Scientologists, and other things come to mind) even if later censored.
You must be kidding! Have you ever heard of the right of protection from self-incrimination? Many of these pages are not authored by what would be considered public people. Some of them are private citizens discussing private thoughts with friends and family. What gives these archives the right to make them public?
This is why I post on /. anonymously. You never know who may be trying to infringe your rights.
Usually, you don't have to make a big proof of facts. During a deposition, you ask a witness whether the website said such-and-such at a given time. You show them the document, and ask again. Since there is high probability the archive.org content is unadulterated, the witness is usually pretty dumb to deny under oath, and more often will simply admit and authenticate the document.
Sometimes, they may shrug their shoulders, and the best you can get is a "dunno," and acknowledgment that they can't deny whether it said what it said at that time, but don't know that it did. Many witnesses cannot do this credibly, without diminishing their credibility when they later testify they are sure about something else, but the facts dictate the likelihood that depositions and admission can get you all the evidence you need, and the document becomes more of a demonstrative exhibit than principal evidence.
If that doesn't work, you have two threshold evidentiary issues: (1) authenticating the document as legit; and (2) overcoming the hearsay rule.
Authentication isn't hard. You get a declaration from archive.org saying it is a legit business record, and the other side rarely has any evidence to the contrary. That will be enough to get it in, and probably enough for the jury to give it credence.
Then you have the hearsay problem, which may be one-way or two-way hearsay. The document is a record of Archive saying that susie.com said "Jimmy eats rice" on a particular date, and is being offered for the truth of that statement. It may also be offered for the truth of the proposition that "Jimmy eats rice," as opposed to the mere proposition that "Susie said 'Jimmy eats rice,'" in which case it is double hearsay.
There are a host of exceptions that may be applicable in every particular case. The primary hearsay issue is routinely overcome using the business records exception, and other exceptions may or may not apply depending upon the nature of the statement, the party making the statement, independent indicia of credibility and so forth. It is both easier and harder than it seems to solve these problems, and this kind of stuff is why you pay lawyers the big bucks.