Snapshotting the Whole Internet?
Anonymous Coward writes "CNN is running a story about a company that is saving periodic 'snapshot' archives of the whole www (or as much as they can) for historical purposes. Interestingly they say that although they might have considered saving everything except ads, they didn't throw away the ads because historians claim that ads give a better "glimpse of what life was like" in the past. I wonder what legal ramifications will arise for possessing such archives of the "whole web" as snapshots-in-time. Thoughts of DeCSS, CPHack, MS Kerberos' click-wrap license, I.P. "ownership" of collected databases cross my mind."
There is info on the side on how the archive is accessed, created, who pays for it, everything. Read it before you hit that post button another time.
---
END OF LINE
Our website is at http://www.archive.org.
We are *NOT* a company. We are a non-profit organization making our archives freely available to researchers, scholars, historians, etc. A for-profit company may not be the right model to ensure the longevity of the collections. We only archive publicly available information on the Internet.
We currently have about 17TB of Web pages and images on disk. We've also got about 6TB of older stuff on tape that we are migrating to disk. We're growing at about 3-4TB/month. We are not yet getting Usenet or streaming media because of labor limitations. Anyone wanna come work for us?
We buy storage PCs with twenty 75GB IDE hard drives, two 667MHz CPUs, and 512MB of RAM. We run Linux, but are migrating to FreeBSD because of the 2GB file size barrier.
Access currently requires a bit of UNIX skill. There is no browser interface to our collections. You'll need to be able to write your own search software, as the only index we have right now is a URL index. If you want access, you'll need to fill out a form at http://www.archive.org/proposal.html.
Kurt Bollacker
Technical Director, Internet Archive
kurt@archive.org -- www.archive.org
P.O. BOX 29244, San Francisco, CA 94129
vox: 415-561-6796 -- fax: 415-561-6768
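For anyone who wants to sanity-check the hardware numbers in the post above, here is a rough back-of-the-envelope sketch. It uses only the figures quoted there (twenty 75GB drives per storage PC, about 17TB on disk, 6TB on tape, 3-4TB/month growth). The function names are made up for illustration, and the sketch assumes decimal units (1TB = 1000GB) and raw capacity with no filesystem overhead or redundancy, neither of which the post specifies.

```python
import math

# Figures quoted in the post.
DRIVES_PER_MACHINE = 20
DRIVE_GB = 75

# Assumption (not stated in the post): decimal units, 1TB = 1000GB.
GB_PER_TB = 1000


def machine_capacity_tb():
    """Raw capacity of one storage PC in TB, ignoring any overhead."""
    return DRIVES_PER_MACHINE * DRIVE_GB / GB_PER_TB


def machines_needed(data_tb):
    """Smallest whole number of storage PCs whose raw capacity holds data_tb."""
    return math.ceil(data_tb * GB_PER_TB / (DRIVES_PER_MACHINE * DRIVE_GB))


if __name__ == "__main__":
    print("Raw capacity per machine (TB):", machine_capacity_tb())
    print("Machines for 17TB on disk:", machines_needed(17))
    print("Machines for 17TB + 6TB from tape:", machines_needed(17 + 6))
    print("New machines per month at 3TB growth:", machines_needed(3))
    print("New machines per month at 4TB growth:", machines_needed(4))
```

Under those assumptions each box holds 1.5TB raw, so growth of 3-4TB/month works out to roughly two or three new machines every month. Real numbers would be higher once overhead and spares are counted.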
Well, I suppose that the sheer amount of perversion and degradation available on the net at this point in history will provide a lot of interest to future historians, so in that context sure it'll be "historically interesting"!
You just don't get it, do you? Should historians gloss over the Holocaust, the Reconstruction, and the Dark Ages simply because they were "icky?" Sometimes the darker elements of society are the most worth examining in a historical context. The whole point of the saying that those who don't study history are doomed to repeat it isn't that you should study only the good points and avoid the bad ones.
But, pornography aside, what is there of real historical value on the net? Sure there are any number of mindless geocities homepages full of drivel about people's pets, but sifting through this would drive anyone mad and there are a lot more "insightful" sources already available about today's culture.
Do you think it's not just as frustrating to shuffle through archives of old 19th century newspapers to find ads and articles about the medicine of the day? The point that the man speaking for the Internet Archive was making is that this is not a study of only the famous. With these archives at hand, you can study the transition from the early days of research papers to the rise of pornography and personal websites to the current days of e-commerce to whatever major social trend the web next holds. An archive of the web shows how society has adapted to the format. You can see what issues were hot enough to spur crops of websites only to fade away in the span of a year or two.
Face the fact that the majority of humanity isn't putting out "insightful" commentary. Ignoring the common man is a mistake that many historians simply can't avoid, because there's nothing available about him. All the "mindless" Geocities sites give an insight into the kind of people that use them.
Unfortunately the web as it stands at the moment shows the worst side of humanity rather than its best side - historians looking through terabytes of things like the Anarchist Cookbook, virulent anti-Christian diatribes, terrorist manifestos and race hate sites will hardly pick up a balanced view of society, will they?!
Sounds like you're the one with the hardly balanced view of society if you honestly think that is what the majority of the web is. The fact is that the majority of the web currently is commercial sites and those "mindless" Geocities sites you like to talk down about. Though there are some bad elements on the web, it's also worth historical note that the web led to the coming out of many of these fringe groups. The very anarchy and rebellion of the web is of major historical interest, and the web is becoming one of the more important socio-economic influences of the turn of the century, at least in America.
But unless it will be used as the basis for future studies then this project is a waste of time, so I don't think you have a valid point here.
Ah, but it will be. Say in 30 years you want to do some research on the Y2K hysteria of the turn of the century. While there will be plenty of books to read through, a major factor in spreading the word about Y2K was the Web. However, these web sites are already mostly gone from the Web today. Fortunately, the Internet Archive may have already preserved them for future study.
Would you like to study the rise of Linux or of the web itself? Many of the early web pages about the topics could provide priceless research. Hell, even if you really object to the large amount of pornography, the booming porn industry on the web was a major driving factor in advances in e-commerce. It would also be valuable in studying the "warez" counter-culture of today.
Plus, like it or not, it's not for you to say. This is being done by a privately funded group. If you really feel so strongly that the web is worthless and should absolutely not be archived for historical purposes, then go torch the place. While you're at it, go ahead and start burning those libraries that hold material about history you object to. Otherwise, your choices are "shut up" and "like it."
If it's for-profit but free, you're not the customer -- you're the product (e.g., the Slashdot Beta's "audience").