The Internet Archive Sued Over Stored Pages
Kailash Nadh writes "The Internet Archive, which has been storing snapshots of millions of webpages since 1996, has been sued by the firm Harding Earley Follmer & Frailey of Philadelphia. The firm was defending Health Advocate, a company in suburban Philadelphia that helps patients resolve health care and insurance disputes, against a trademark action brought by a similarly named competitor. In preparing the case, representatives of Earley Follmer used the Wayback Machine to turn up old Web pages - some dating to 1999 - originally posted by the plaintiff, Healthcare Advocates of Philadelphia. Last week Healthcare Advocates sued both the Harding Earley firm and the Internet Archive, saying the access to its old Web pages, stored in the Internet Archive's database, was unauthorized and illegal." CT: Update: note that the submitter got it backwards: Healthcare Advocates is the one suing Wayback and Harding Earley Follmer & Frailey, not the other way around.
I can tell you exactly where the problem lies (and I know this because I have customers who behave this way):
When they write documents, they write them in HTML format. When they send their email, they send it in HTML format. When I asked them to prepare content for their website, they gave me a Microsoft Word document in HTML format, and said "You don't have to use the same fonts I used in this document, but please keep the layout the same on my website."
These users equate "a document" with "a website", and they think that once they stop using or sending out that document, their "website" should be removed as well. They think websites are "sent" to people, not requested "by" people, and that when you close your browser, your "document" is gone.
That simply is not the case, and people need to be re-educated to understand these technologies and how they work. The Internet was MEANT to be self-healing: if one node or another went down, information and information pathways would still be functioning.
It seems rather more likely that the plaintiffs fucked up their robots.txt file entries and that is why they were spidered.
At the risk of receiving yet another deposition, I was part of the conversations that led to robots.txt. It was never intended to be an access control mechanism or an effective content control mechanism within the meaning of the DMCA. The objective was simply to allow sites with automatically generated content to tell the robots that parts of their site are not suitable for spidering.
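To make the "advisory, not access control" point concrete, here is a minimal sketch in Python using the standard library's `urllib.robotparser`. The robots.txt contents, the `ExampleArchiveBot` agent name, and the example.com URLs are all made up for illustration. The thing to notice is that the check runs inside the crawler; the server never enforces it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a site with auto-generated content.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
"""


def polite_crawler_may_fetch(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if a well-behaved crawler would consider `url` fair game.

    This is a courtesy check performed by the crawler itself. A crawler
    that skips this call can still request the page; nothing stops it.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in ("http://example.com/about.html",
                "http://example.com/cgi-bin/report.cgi"):
        verdict = polite_crawler_may_fetch(ROBOTS_TXT, "ExampleArchiveBot", url)
        print(url, "->", "ok to spider" if verdict else "asked not to spider")
```

A robot that never calls the check gets the same bytes from the server as one that does, which is why it is hard to call robots.txt a protection measure.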
So now it looks like we are going to have to revisit the business model for the Wayback Machine and work out how to fund a litigation fund.
Actually one way that it could be done is to sign and timestamp material on receipt and offer the signatures as a premium service.
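Roughly what I have in mind, as a sketch only. It uses an HMAC from the Python standard library as a stand-in for a real public-key signature, and the function names, receipt fields, and key are invented for illustration. A real service would sign with a private key (and probably use a trusted timestamping authority) so that third parties could verify a receipt without holding the secret.

```python
import hashlib
import hmac
import json


def make_receipt(content: bytes, key: bytes, received_at: int) -> dict:
    """Bind a content hash to the time the archive received it."""
    digest = hashlib.sha256(content).hexdigest()
    # Serialize deterministically so the same inputs give the same tag.
    payload = json.dumps(
        {"sha256": digest, "received_at": received_at}, sort_keys=True
    ).encode("utf-8")
    tag = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return {"sha256": digest, "received_at": received_at, "tag": tag}


def verify_receipt(content: bytes, key: bytes, receipt: dict) -> bool:
    """Recompute the tag and compare it in constant time."""
    expected = make_receipt(content, key, receipt["received_at"])
    return hmac.compare_digest(expected["tag"], receipt["tag"])


if __name__ == "__main__":
    key = b"archive-secret-key"  # placeholder, not a real key
    page = b"<html>old page</html>"
    receipt = make_receipt(page, key, received_at=1121040000)
    print(verify_receipt(page, key, receipt))         # unmodified page
    print(verify_receipt(page + b"!", key, receipt))  # altered page
```

The premium part would be the archive standing behind the receipt: attesting that this exact content was on hand at that time.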