Internet Archive Gets 4.5PB Data Center Upgrade
Lucas123 writes "The Internet Archive, the non-profit organization that scrapes the Web every two months in order to archive web page images, just cut the ribbon on a new 4.5 petabyte data center housed in a metal shipping container that sits outside. The data center supports the Wayback Machine, the Web site that offers the public a view of the 151 billion Web page images collected since 1997. The new data center houses 63 Sun Fire servers, each with 48 1TB hard drives running in parallel to support both the web crawling application and the 200,000 visitors to the site each day."
One would assume that something like this does regular off-site backups, which must add up to a hell of a lot. Could someone with experience in such matters shed a little insight into the logistics of backing up such a vast system?
I have no idea how much 4.5 PB is until it's given in units of Libraries of Congress.
Does lusting after all their space make me a peta-phile?
Life==Jeopardy. All the answers are right in front of us - the hard part is coming up with the correct question.
So all one needs to do to "own the internet" is drive a big rig and ... lift the container off their parking lot?
I can now theoretically steal "the internet" with a flatbed truck and a lift. There's something to be said for conventional data centers: They're rather hard to load onto a truck and drive off with.
#fuckbeta #iamslashdot #dicemustdie
Well I hope it is bolted down.
http://michaelsmith.id.au
Just imagine what you could do with a beowulf cluster of 4.5 PB datacenters. You could create regular archives of the internet archives!
(As a webserver administrator, I can't stress how important it is to keep backups.)
Yes, "thumper" refers to the rabbit. I have a Sun Managed Storage slide somewhere about how data tends to, er, multiply...
--dave
davecb@spamcop.net
Are there any resources that let us see websites from 1996, 95, 94, or 93? I would love to revisit the web as it appeared when I first discovered it (1994 at psu.edu).
"I disapprove of what you say, but I will defend to the death your right to say it." - historian Evelyn Beatrice Hall
Unfortunately the Wayback Machine will still be slower than hell. :p
"Those who would sacrifice essential liberties for a little temporary safety deserve neither liberty nor safety." - BenF
It sometimes takes the form of a giant blue linksys router. So that we may better worship it.
The internet is only about 2TB once you've removed all the redundant copies of 2g1c and goatse.cx.
"Common sense will be the death of us all"
The Internet Archive also works with about 100 physical libraries around the world whose curators help guide deep Internet crawls. The Internet Archive's massive database is mirrored to the Bibliotheca Alexandrina, the new Library of Alexandria in Egypt, for disaster recovery purposes.
Incidentally: FileFront is closing in five days, taking with it any files that aren't hosted elsewhere.
I am told that many of the Half-Life mods hosted there are not available anywhere else, so get while the getting is good...
... of a 4.5 petabyte datacenter in a shipping container in transit.
Bantam Dominique roosters crow a four-note song. Once you've heard it as "Happy BIRTHday" you can't NOT hear it that way
and not a single "finally a place big enough to store all of my porn" reference? Y'all are slacking tonight.
On a very slightly serious note, how much content would be referenced by, say, TPB? Sure, the trackers are small, but that's got to be huge.
Didn't that burn down a few thousand years ago?
So where does the 4.5PB come into this?
That wasn't the ribbon, it was the power cord! Someone's going to be embarrassed!
Fascism trolls keeping me up every night. When I starts a preachin', he HITS ME WITH HIS REICH!
I guess /.'s readers can no longer multiply, but 63 servers * 48TB/server = 3024TB =~ 3PB.
I'm guessing they had 1.5PB already?
Andy
P.S. yes, I'm looking for a class 8 truck and a set of hydraulic jacks... but before I steal the Internet Archive, as a consumer, I DEMAND that the entire thing fit in my shirt pocket, and have an Apple logo on it!!!
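Andy's multiplication checks out. A minimal sketch of the same arithmetic, assuming decimal terabytes (as drive vendors count them) and counting raw disk only, before any RAID or filesystem overhead, so the usable figure would be lower still:

```python
# Raw capacity from the summary: 63 servers x 48 drives x 1 TB each.
SERVERS = 63
DRIVES_PER_SERVER = 48
TB_PER_DRIVE = 1  # decimal TB (10**12 bytes)


def raw_capacity_tb(servers: int, drives: int, tb_per_drive: float) -> float:
    """Total raw (pre-RAID, pre-filesystem) capacity in TB."""
    return servers * drives * tb_per_drive


raw_tb = raw_capacity_tb(SERVERS, DRIVES_PER_SERVER, TB_PER_DRIVE)
raw_pb = raw_tb / 1000
claimed_pb = 4.5

print(f"raw capacity: {raw_tb:.0f} TB = {raw_pb:.3f} PB")
print(f"gap to the claimed {claimed_pb} PB: {claimed_pb - raw_pb:.3f} PB")
```

The roughly 1.5 PB gap is what the "they had 1.5PB already?" guess is trying to account for.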
So where does the 4.5PB come into this?
... one would assume that something like this does regular off-site backups, which must add up to a hell of a lot...
As I recall from one of Brewster's talks: Part of the idea was that you can install redundant copies of this data center around the world and keep 'em synced.
You can ship 4.5 petabytes over a single OC-192 link in about 71 days.
Bantam Dominique roosters crow a four-note song. Once you've heard it as "Happy BIRTHday" you can't NOT hear it that way
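For a rough check on the OC-192 figure: at the raw SONET line rate of 9,953.28 Mbit/s, 4.5 PB takes about six weeks, and a 71-day estimate would be consistent with roughly 59-60% effective throughput. A minimal sketch; the efficiency factor is an assumption standing in for framing and protocol overhead, not a measured value:

```python
OC192_LINE_RATE_BPS = 9_953_280_000  # SONET OC-192 line rate, bits per second
SECONDS_PER_DAY = 86_400


def transfer_days(num_bytes: float, bits_per_second: float, efficiency: float = 1.0) -> float:
    """Days needed to push num_bytes through a link.

    efficiency < 1.0 is an assumed fraction of the line rate left over
    after framing/protocol overhead.
    """
    seconds = num_bytes * 8 / (bits_per_second * efficiency)
    return seconds / SECONDS_PER_DAY


archive_bytes = 4.5e15  # 4.5 PB, decimal

print(f"at full line rate: {transfer_days(archive_bytes, OC192_LINE_RATE_BPS):.1f} days")
print(f"at 60% efficiency: {transfer_days(archive_bytes, OC192_LINE_RATE_BPS, 0.6):.1f} days")
```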
63 servers * 48 disks of 1 TB each = 3024 TB. According to the announcement on archive.org, 3 petabytes would be right.
Riiiight... because you happen to have a really really good mental image of exactly how many rooms/shelves/books/pages are stored in the Library of Congress!
(Which incidentally doesn't happen to be static, BTW; yo momma's LoC ain't the same size as my LoC.)
I don't know if I'm the only one who read it this way, but the summary makes it seem like these servers have a bit of a job on their hands as it is, what with hosting the site and doing their web-crawling/archiving...and we slashdotted this thing? We're going to blow that little metal building up.
The new data center houses 63 Sun Fire servers
That's not very specific. "Sun Fire" is a brand that for a while got applied to all of Sun's rack-mount servers (except for NEBS-compliant servers, which were and are called "Sun Netra"). A little confusing, of course, which is why they've started calling new SPARC boxes "Sun SPARC Enterprise" to differentiate them from those mangy x64 "Sun Fire" systems. Except that there are still SPARC systems called "Sun Fire", so I guess the confusion factor didn't get any better...
Anyway, the specific server being used here is the Sun Fire X4500, a system with no fewer than 48 1 TB disks in a 4U space. Notice that this model is EOLed; presumably iarchive got a deal on some remaindered machines.
The shipping container is something we've seen before.
They cut the ribbon? How are they supposed to access that much data unless they buy a new one?
You would be stealing A backup copy of THE Internet. An incomplete one at that, but still quite extensive.
Now... If you were somehow able to steal that copy AND break the internet... your stolen internet may be considered THE internet.
Against stupidity, the gods themselves struggle in vain.
From TFA (yeah, I know):
So they get all 200,000 hits in a 7-minute window? I picture a sysadmin going insane for a few moments then napping in a hammock for the rest of the day.
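To put the joke in numbers: 200,000 visitors spread evenly over a day is only a couple per second, while cramming them into seven minutes means several hundred per second. A minimal sketch; note that visitors are not requests (each visitor loads many pages and assets), so real request rates would be some multiple of these figures:

```python
SECONDS_PER_DAY = 24 * 60 * 60
VISITORS_PER_DAY = 200_000  # figure from the summary


def rate_per_second(events: int, window_seconds: float) -> float:
    """Average events per second if the events arrive evenly across the window."""
    return events / window_seconds


spread_over_day = rate_per_second(VISITORS_PER_DAY, SECONDS_PER_DAY)
crammed_into_7_min = rate_per_second(VISITORS_PER_DAY, 7 * 60)

print(f"spread over a day: {spread_over_day:.2f} visitors/s")
print(f"crammed into 7 minutes: {crammed_into_7_min:.0f} visitors/s")
```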
Dewey, what part of this looks like authorities should be involved?
[subject correction]
PB, not TB... hehe.
Stop Global Warming!
Just say no to irreversible processes!
The presence of weaponized sharks implies the need for a moat. Somehow I doubt the city and county governments would appreciate its construction on the premise, as the presence of the said sharks would preclude passing it off as a swimming pool.
The internets isn't like a truck! It's a series of tubes!
Actually, I was thinking the largest collection of pr0n the world has ever seen (to date.)
Inquiring minds want to know.
They're keeping the offsite backup distributed around the Internet, using the World-Wide Web to store it in real time.
Part of it may even be on *your* machine! We've really got to stop Brewster from leeching all your storage and make him store his backup himself - this business of using the originals to back up the backup just isn't sustainable!
Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
Yo dawg we hear you like to back up your internets so we put a backup container inside your backup container so you can back up your back up while you back up your back up.
So there are 48 1TB discs running in parallel? What does that mean? RAID1? Well, we know it can't mean that, so RAID5? Seems like it couldn't mean that either. Maybe multiple RAID5 arrays with 6 discs per array?
Maybe some sort of nifty new filesystem I don't know about?
So, Slashdot, how do you think the discs are arranged, and what filesystem do you think they use?
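The thread doesn't say how the Archive actually arranged the drives, so the layouts below are hypothetical illustrations, not its documented configuration. A minimal sketch of how much of one 48-disk server each scheme leaves usable; RAID-Z2 is ZFS's double-parity layout, included because of the OpenSolaris/ZFS guess elsewhere in the thread. A real deployment would also set aside drives for booting and hot spares, which this ignores:

```python
from dataclasses import dataclass

DISKS = 48      # drives per server, from the summary
DISK_TB = 1.0   # decimal TB per drive


@dataclass
class Layout:
    name: str
    group_size: int   # disks in each redundancy group
    redundancy: int   # disks' worth of capacity spent on mirrors/parity per group

    def usable_tb(self, disks: int, disk_tb: float) -> float:
        """Usable capacity; disks that don't fill a whole group are simply left over."""
        groups = disks // self.group_size
        return groups * (self.group_size - self.redundancy) * disk_tb


# Hypothetical layouts for illustration only.
LAYOUTS = [
    Layout("striped, no redundancy (RAID0)", 1, 0),
    Layout("two-way mirrors (RAID1/RAID10)", 2, 1),
    Layout("RAID5, 6-disk groups", 6, 1),
    Layout("double parity (RAID6 or ZFS RAID-Z2), 8-disk groups", 8, 2),
]

for layout in LAYOUTS:
    usable = layout.usable_tb(DISKS, DISK_TB)
    print(f"{layout.name}: {usable:.0f} TB usable of {DISKS * DISK_TB:.0f} TB raw")
```

This is why "48 discs running in parallel" most plausibly means several smaller redundancy groups striped together rather than one giant array.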
What's the biggest tape backup one can buy nowadays?
Let's see... 4.5 PB = 4500 TB
A Linksys NAS200 can support two 1 TB hard drives, which is 1 TB of storage configured as RAID 1.
Thus, 4500 NAS200 boxes could hold that much data. Dang. My house only has a 200 A main; I don't even have enough electrical service to run all of them. And my wife would have a fit when our electric bill showed up.
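The box count follows directly from the mirroring: each two-bay box gives up half its raw space to RAID 1. A minimal sketch of the count, using only the figures in the comment (no power figures, since the NAS200's draw isn't given here):

```python
import math

TARGET_TB = 4500       # 4.5 PB in decimal TB
DRIVES_PER_BOX = 2     # NAS200 drive bays, per the comment
DRIVE_TB = 1.0
# RAID 1 mirrors the two drives, so only one drive's worth is usable per box.
USABLE_TB_PER_BOX = DRIVES_PER_BOX * DRIVE_TB / 2

boxes = math.ceil(TARGET_TB / USABLE_TB_PER_BOX)
drives = boxes * DRIVES_PER_BOX
raw_tb = drives * DRIVE_TB

print(f"{boxes} boxes, {drives} drives, {raw_tb:.0f} TB raw for {TARGET_TB} TB usable")
```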
http://www.archive.org/donate/
... ribbon on a new 4.5 petabyte data center [CC] housed in a metal shipping container that sits outside.
I knew it! The internet is a big truck you can throw stuff in! What's this series of tubes business?
The game.
Now we'll never have to worry about losing the old cached version of goatse
Why use it if webpages are being deleted from it?
I have tried to use it before on some websites and the information was ALL DELETED.
If a person puts a PUBLIC website up then it can be archived.
The internet archive just shows that it has no backbone and it ISN'T interested in being a legitimate archive.
I noticed that the dates on many webpages are entirely incorrect. For example, it says my webpage existed in 2001, when I started it in 2005 . . .
A healthy brain forgets that which is not important enough to remember - maybe we as a planet should do the same?
This is a total waste of energy to maintain just so a couple of nerds can exclaim, "ZOMG remember frames!?!"
Or possibly they're just making the content unavailable until copyright expires? Seriously, they don't have any law behind what they do, so they have to tread relatively carefully in order not to cause themselves bigger problems than not being able to archive a small number of websites.
[FUCK BETA]
Actually, 100 km/h (62.14 mph) is 27.78 m/s (91.13 ft/s). So a 20-foot container on a truck will pass any given point in about 0.219 s. That's a burst bandwidth of 20.5 PB/s, or 164 Pb/s.
My "fast" internet connection is more than 9 orders of magnitude slower, at a mere 100Mb/s. Now I'm really annoyed with my ISP.
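The truck-bandwidth figures hold up. A minimal sketch reproducing them, assuming a nominal 20 ft container length, decimal petabytes, and the 4.5 PB payload from the summary:

```python
import math

KMH_TO_MS = 1000 / 3600   # km/h -> m/s
FT_TO_M = 0.3048          # exact definition of the international foot

speed_ms = 100 * KMH_TO_MS            # truck speed from the comment
container_m = 20 * FT_TO_M            # nominal 20-foot container
pass_time_s = container_m / speed_ms  # time for the container to pass a fixed point

payload_bytes = 4.5e15                # 4.5 PB, decimal
burst_bytes_per_s = payload_bytes / pass_time_s
burst_bits_per_s = burst_bytes_per_s * 8

home_bits_per_s = 100e6               # the commenter's 100 Mb/s connection
orders_slower = math.log10(burst_bits_per_s / home_bits_per_s)

print(f"{speed_ms:.2f} m/s, passes in {pass_time_s:.3f} s")
print(f"burst: {burst_bytes_per_s / 1e15:.1f} PB/s = {burst_bits_per_s / 1e15:.0f} Pb/s")
print(f"home link is {orders_slower:.1f} orders of magnitude slower")
```

Latency, of course, is measured in hours of driving rather than milliseconds.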
Those who can make you believe absurdities can make you commit atrocities. - Voltaire
...it'll have the whole canon of 1930s Film Noir to use as reference material.
4.5 petabytes should be enough for anyone.
63 servers with 3024 hard disks in total jammed into the confines of a metal shipping container that sits out in the sun.
That sounds like either a recipe for disaster or a great Mythbusters episode.
I wonder if they're running OpenSolaris/ZFS on these hosts...
Isn't the Internet Archive the same thing as Google Cached Pages?
I don't know how many times I've gone to the "way back machine" only to find NOTHING!
Ever since it started being used in court, the archive has been deliberately censored BEFORE backups even get inside! I bet most of it now is spam sites and startups-before-lawyers-get-involved sites.
If MS takes something down, don't bother looking at archive.org. It was never copied there to begin with.
Obama's legacy: (N)othing (S)ecure (A)nywhere and (T)error (S)imulation (A)dministration
63 * 48 * 1TB = 3024TB ~= 3PB
For where does the number 4.5PB originate?
OK, so Google is indexing the whole web, as are these guys... that's great, we have a sort of redundancy should anything go wrong.