Federal Judge Says Internet Archive's Wayback Machine A Perfectly Legitimate Source Of Evidence
Tim Cushing, reporting for TechDirt (condensed): Those of us who dwell on the internet already know the Internet Archive's "Wayback Machine" is a useful source of evidence. So, it's heartening to see a federal judge arrive at the same conclusion, as Stephen Bykowski of the Trademark and Copyright Law blog reports. From the report: The potential uses of the Wayback Machine in IP litigation are powerful and diverse. Historical versions of an opposing party's website could contain useful admissions or, in the case of patent disputes, invalidating prior art. Date-stamped websites can also contain proof of past infringing use of copyrighted or trademarked content. From TechDirt: The defendant tried to argue that the Internet Archive's pages weren't admissible because the Wayback Machine doesn't capture everything on the page or update every page from a website on the same date. The judge, after receiving testimony from an Internet Archive employee, disagreed. He found the site to be a credible source of preserved evidence -- not just because it captures (for the most part) sites as they were on relevant dates but, more importantly, it does nothing to alter the purity of the preserved evidence.
this means archive.is isn't as good a source, since it heavily alters pages in the process of storing them.
Just submit a DMCA request. Poof!
that once something hits the Web, it's there forever.
Does this 1990 archive of clownsex.org make my ass look big???
What if the Wayback Machine starts altering pages? Who will independently verify that they were not tinkered with? Can we trust a private entity with collecting information?
Same applies also to companies which collect torrent infringement info. They can put there anything they want, even inflate the numbers (e.g. this is why Switzerland banned such collection of data from being evidence).
The internet archive is, frankly, quite very crappy. I'm going to ignore the problems they have retaining good and knowledgeable employees. Their internal data structures are lossy: much of what they ought to have you cannot access, because the software is doing really stupid stuff in the background. This is visible from the outside and I had (back then still) employees confirm that to me. Short version: Their framing sucks big large hair balls through small tubes. There is also the fact that blanket robots.txt files set up by domain squatters are allowed to retroactively alter the visible record.
So, while what they have is occasionally useful (though more often the stuff I need is simply not accessible, so they are only useful as a source of last resort), and their own current employees will naturally insist that archive.org is not terminally broken somehow, using them as evidence is iffy at best. The defendant has the right of it: archive.org can very well distort the evidence it coughs up in dangerous ways, if not so much by altering the record (though it might do that too, since you do not get a full clean original back, not by a long shot), then at the very least by omission. And that too can be quite damaging and distortive.
Internet Archive has a DMCA Exemption http://archive.org/about/dmca....
It's amazing what is trusted these days. For example, archive.org is not regulated, controlled, managed, or ANYTHING WHATSOEVER that could be considered legally binding yet here they are trusting it for legal decisions. Do they not understand how easy it would be to put fake data up there, remove data, alter data, etc? This is equivalent to asking a random private citizen that has nothing to do with a case to testify as a witness in said case. It's ridiculous.
If they do want to make legal decisions then the source should be a legally liable source bound by strict legal guidelines and control.
Please consider installing the "ArchiveTheWeb" Chrome extension then: https://chrome.google.com/webstore/detail/archivetheweb/jgpbjlabbfodbjecclkddfnanflgkjfe?hl=en-US
It automatically saves the web pages you surf and browse to the Internet Archive's Wayback Machine.
Where's the federal funding to make sure that it's a maintained repository? It's a charitable organization, but I would think some sort of royalty arrangement should be provided. I mean, if the copyright/trademark/patent system or the plaintiffs/defendants are making use of it, then it should have some direct funding stream in terms of its value as a provider of information. I could also see litigants subpoenaing witnesses to ascertain how information is collected etc. That doesn't come for free, not by a long shot.
Harrison's Postulate - "For every action there is an equal and opposite criticism"
it does nothing to alter the purity of the preserved evidence.
Good, but that shouldn't be enough. What does it do to keep that purity from being altered?
The case appears to be about a trucker jobs website using a trucking company's "Trademark" (a logo maybe?) without permission. The "trademark" apparently was removed from their website before the court case but was still available on the Wayback Machine. Maybe if there was some time frame that was necessary to the case they might have a point, but the case is about the defendant EVER using the logo. So unless they can prove that the Wayback Machine ADDS random content to websites it archives, they don't have a leg to stand on.
Several questions arise.
1) How do I know the data in the Wayback machine has not been tampered with?
2) How is the chain of custody of the information verified?
The rules of evidence are that when evidence is recovered from the crime scene, everything must be accounted for, including how the evidence was obtained and handled.
The wayback machine supposedly scrapes the web and saves the data. But, what happens to the data between the hosting server and the Wayback machine's client? How do we know what route the data took? How do we know it was not tampered with along the way? How do we know it was not tampered with once it arrived at Wayback's server? How do we know it was not tampered with while in Wayback's custody?
These are important questions that need to be resolved before the legitimacy of the Wayback machine can be trusted. Otherwise, what's to stop me intercepting a connection between the wayback machine and, say, a church pastor's personal website, and inserting a bunch of child porn that could later be used to prosecute the pastor?
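One common way to address the "tampered with while in custody" part of these questions is to record a cryptographic digest of the captured bytes at capture time and re-check it later. The sketch below is purely illustrative: the function names `fingerprint` and `verify` are made up for this example, and nothing here is a claim about how the Wayback Machine actually stores or verifies its captures.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Return the SHA-256 hex digest of the captured bytes."""
    return hashlib.sha256(content).hexdigest()


def verify(content: bytes, recorded_digest: str) -> bool:
    """Check that stored content still matches the digest recorded at capture time."""
    return fingerprint(content) == recorded_digest


# Hypothetical capture: the page body as it was fetched.
captured = b"<html><body>Original page</body></html>"
digest_at_capture = fingerprint(captured)

# Later, someone alters the stored copy.
tampered = captured.replace(b"Original", b"Altered")

print(verify(captured, digest_at_capture))  # unchanged copy matches
print(verify(tampered, digest_at_capture))  # altered copy does not
```

A digest like this only helps if it is recorded somewhere the archive itself cannot quietly rewrite, and it says nothing about tampering in transit before the capture was made, so it covers only part of the chain-of-custody concern raised above.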
does it leave what we browse incognito out??
This means it could be used to fight patent trolls who abuse open source and creative commons items. Makerbot comes to mind on this.
We need to make an internet archive archive!
Was addressed in Voter Verified v. Premier Election Solutions, 698 F.3d 1374 (Fed. Cir. 2012). And that was some random forum.
See pages 8-9: http://www.cafc.uscourts.gov/sites/default/files/opinions-orders/11-1553.pdf
Well, the actual quote is:
Of course, in the case of archive.org the equivalent of 1984's "Memory Hole" is the way they treat domain hijackers that put a robots.txt block on prior content of that domain name:
It's gone. Poof.. not even a bright flash of plasma, let alone smoke.
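For anyone wondering why a squatter's robots.txt is such a blunt instrument, here is a minimal sketch using Python's standard `urllib.robotparser`. The helper `is_allowed`, the example URL, and the use of `ia_archiver` as the crawler's user-agent string are assumptions for illustration; whether and how archive.org applies such a rule retroactively to older captures is what the comments here claim, not something this code demonstrates.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# What the original site owner might have served: nothing disallowed.
ORIGINAL_ROBOTS = "User-agent: *\nDisallow:\n"

# What a domain squatter might put up after taking over the name.
BLANKET_BLOCK = "User-agent: *\nDisallow: /\n"

url = "http://example.com/old/page.html"  # hypothetical archived page

print(is_allowed(ORIGINAL_ROBOTS, "ia_archiver", url))
print(is_allowed(BLANKET_BLOCK, "ia_archiver", url))
```

The two-line blanket rule covers every path on the domain, so if it is honored against past captures as described above, the whole history of the site disappears from view at once.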
Seastead this.
An interesting link on robots.txt and preservation:
http://www.netpreserve.org/web...
(SPOILER: no answers)
This will just result in more businesses opting out of the Wayback machine as a matter of course, limiting its usefulness.