Internet Archive Gets 4.5PB Data Center Upgrade
Lucas123 writes "The Internet Archive, the non-profit organization that scrapes the Web every two months in order to archive web page images, just cut the ribbon on a new 4.5 petabyte data center housed in a metal shipping container that sits outside. The data center supports the Wayback Machine, the Web site that offers the public a view of the 151 billion Web page images collected since 1997. The new data center houses 63 Sun Fire servers, each with 48 1TB hard drives running in parallel to support both the web crawling application and the 200,000 visitors to the site each day."
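A quick sanity check of the hardware figures in the summary. This is a minimal sketch that assumes decimal units (1 PB = 1,000 TB) and counts raw drive capacity only; the constant names are illustrative. On those assumptions the listed drives come to about 3 PB, which is less than the 4.5 PB headline figure, and the summary does not say what accounts for the difference.

```python
# Figures taken from the story summary; decimal units are an assumption.
SERVERS = 63
DRIVES_PER_SERVER = 48
TB_PER_DRIVE = 1

# Raw capacity before any RAID, filesystem, or spare-drive overhead.
raw_tb = SERVERS * DRIVES_PER_SERVER * TB_PER_DRIVE
raw_pb = raw_tb / 1000

print(f"{raw_tb} TB raw, about {raw_pb:.2f} PB")
```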
One would assume that something like this does regular off-site backups, which must add up to a hell of a lot. Could someone with experience in such matters offer a little insight into the logistics of backing up such a vast system?
Are there any resources that let us see websites from 1996, 95, 94, or 93? I would love to revisit the web as it appeared when I first discovered it (1994 at psu.edu).
"I disapprove of what you say, but I will defend to the death your right to say it." - historian Evelyn Beatrice Hall
from http://www.lesk.com/mlesk/ksg97/ksg.html The 20-terabyte size of the Library of Congress is widely quoted and as far as I know is derived by assuming that LC has 20 million books and each requires 1 MB. Of course, LC has much other stuff besides printed text, and this other stuff would take much more space.
1. Thirteen million photographs, even if compressed to a 1 MB JPG each, would be 13 terabytes.
2. The 4 million maps in the Geography Division might scan to 200 TB.
3. LC has over five hundred thousand movies; at 1 GB each they would be 500 terabytes (most are not full-length color features).
4. Bulkiest might be the 3.5 million sound recordings, which at one audio CD each, would be almost 2,000 TB.
This makes the total size of the Library perhaps about 3 petabytes (3,000 terabytes).
So the 4.5 PB data center holds about 230 Libraries of Congress by the old standard, or 1.5 by the new standard.
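For anyone who wants to check the arithmetic, here is a rough sketch using the estimates quoted from Lesk. The 2,000 TB for sound recordings is the quoted "almost 2,000" rounded up, and the variable names are made up for illustration. The 230 figure seems to match binary units (1 PB = 1,024 TB); with decimal units it comes out to 225.

```python
# Per-collection estimates from the quoted Lesk page, in terabytes.
estimates_tb = {
    "books": 20,              # 20 million books at 1 MB each
    "photographs": 13,        # 13 million at 1 MB each
    "maps": 200,
    "movies": 500,            # 500,000 at 1 GB each
    "sound recordings": 2000  # "almost 2,000 TB", rounded up here
}

total_tb = sum(estimates_tb.values())

OLD_STANDARD_TB = 20    # the widely quoted text-only figure
NEW_STANDARD_TB = 3000  # Lesk's "perhaps about 3 petabytes"

decimal_tb = 4.5 * 1000  # data center size if 1 PB = 1,000 TB
binary_tb = 4.5 * 1024   # data center size if 1 PB = 1,024 TB

print(f"sum of estimates: {total_tb} TB")
print(f"old standard: {decimal_tb / OLD_STANDARD_TB:.0f} (decimal), "
      f"{binary_tb / OLD_STANDARD_TB:.0f} (binary)")
print(f"new standard: {decimal_tb / NEW_STANDARD_TB:.1f}")
```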
... one would assume that something like this does regular off-site backups, which must add up to a hell of a lot ...
As I recall from one of Brewster's talks: Part of the idea was that you can install redundant copies of this data center around the world and keep 'em synced.
You can ship 4.5 petabytes over a single OC-192 link in about 71 days.
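A back-of-the-envelope version of that estimate, as a sketch only. It assumes decimal petabytes and the nominal OC-192 line rate of 9,953.28 Mbit/s with no protocol overhead; the function names are hypothetical. At full line rate the transfer works out to roughly six weeks, so the 71-day figure would correspond to a sustained rate of a little under 6 Gbit/s. Presumably the poster allowed for overhead or less than full utilization, but that is a guess.

```python
DATA_BYTES = 4.5e15               # 4.5 PB, decimal units (assumption)
OC192_BITS_PER_SEC = 9.95328e9    # nominal OC-192 line rate
SECONDS_PER_DAY = 86_400

def transfer_days(num_bytes, bits_per_sec):
    """Days needed to move num_bytes at a constant bit rate."""
    return num_bytes * 8 / bits_per_sec / SECONDS_PER_DAY

def implied_rate_gbps(num_bytes, days):
    """Sustained rate in Gbit/s implied by moving num_bytes in the given days."""
    return num_bytes * 8 / (days * SECONDS_PER_DAY) / 1e9

line_rate_days = transfer_days(DATA_BYTES, OC192_BITS_PER_SEC)
rate_for_71_days = implied_rate_gbps(DATA_BYTES, 71)

print(f"at full line rate: {line_rate_days:.1f} days")
print(f"rate implied by 71 days: {rate_for_71_days:.2f} Gbit/s")
```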
Bantam Dominique roosters crow a four-note song. Once you've heard it as "Happy BIRTHday" you can't NOT hear it that way
Here's a video tour of one if you need it for reference.
Don't forget to turn off the water and unplug the ethernet cables. Just be very careful with the power cords.
Dual Opteron < $600
This seems to be an exact use case for the X4500-type system, which as far as I'm aware is pretty unique.
Indeed. Sun is on a density kick. Check out the X4600, which does for processing power what the X4500 did for storage.
In both cases, there actually are competing products that are sort of the same. The most conspicuous difference is that the Sun versions cram the whole caboodle into 4 rack units per system, about half the space required by their competitors.
More absurdly-dense Sun products:
http://www.sun.com/servers/x64/x4240/
http://www.sun.com/servers/x64/x4140/
The point of these systems is that they take up less expensive rack space than equivalent competitors. They're also "greener": if you broke all that storage and computing power down into less dense systems, you'd need a lot more electricity to run them and keep them cool. The density not only saves money, it also lets the owner claim they're working on their carbon footprint.
The CDs are already in digital format, so compressing them is a cardinal sin.
The photos, movies, and maps are in analog format to start with, so we don't feel so bad using lossy compression. Image files are really big. I think the 1GB estimate per movie is pretty good, considering shorts, black and white, and the standard (or lower) definition of most of them. That would allow for a very high detail scan of the movie in something like MPEG4.
And, since they started in analog formats, there's no fair way to determine at what resolution to scan them. I mean, even a million by a million pixels could not be a 'lossless' interpretation of a 1x1cm image, so you have to accept that any digital conversion will be lossy regardless of encoding.
At least that would be my rationale. Not that this question needed to be answered...
Buckle your ROFL belt, we're in for some LOLs.