Slashdot Mirror


Internet Archive Gets 4.5PB Data Center Upgrade

Lucas123 writes "The Internet Archive, the non-profit organization that scrapes the Web every two months in order to archive web page images, just cut the ribbon on a new 4.5 petabyte data center housed in a metal shipping container that sits outside. The data center supports the Wayback Machine, the Web site that offers the public a view of the 151 billion Web page images collected since 1997. The new data center houses 63 Sun Fire servers, each with 48 1TB hard drives running in parallel to support both the web crawling application and the 200,000 visitors to the site each day."

235 comments

  1. Where do they store 4.5TB off site by wjh31 · · Score: 5, Interesting

    One would assume that something like this does regular off-site backups, which must add up to a hell of a lot. Could someone with experience in such matters shed a little insight into the logistics of backing up such a vast system?

    1. Re:Where do they store 4.5TB off site by fuzzyfuzzyfungus · · Score: 3, Informative

      TFA indicates that they have a mirror at the library of Alexandria. Unless things have changed since last I read about them, the mirroring is pretty much it. The Internet Archive does very impressive work; but they don't have that much money. No Real Big Serious Enterprise tape silos here.

    2. Re:Where do they store 4.5TB off site by LiquidCoooled · · Score: 5, Funny

      One would assume that something like this does regular off-site backups, which must add up to a hell of a lot. Could someone with experience in such matters shed a little insight into the logistics of backing up such a vast system?

      floppy disks.
      lots of floppy disks.

      --
      liqbase :: faster than paper
    3. Re:Where do they store 4.5TB off site by MichaelSmith · · Score: 4, Funny

      It's like the two USB hard disks I use for backups. Pick up the container and swap it with the container from secure storage.

    4. Re:Where do they store 4.5TB off site by MrEricSir · · Score: 4, Funny

      It's simple, the backups are compressed -- they simply remove all those useless zeroes from the binary data.

      --
      There's no -1 for "I don't get it."
    5. Re:Where do they store 4.5TB off site by DigiShaman · · Score: 4, Interesting

      Umm, how many forklifts and 18 wheelers does it take to swap out 4.5 petabytes worth of data each day?

      --
      Life is not for the lazy.
    6. Re:Where do they store 4.5TB off site by CannonballHead · · Score: 1

      [subject correction]
      PB, not TB... hehe.

    7. Re:Where do they store 4.5TB off site by clarkkent09 · · Score: 1

      It's 4.5PB, which is a whole different thing, and TFA says it's mirrored at the library of Alexandria, Egypt. I guess that counts as off-site :)

      --
      Negative moral value of force outweighs the positive value of good intentions.
    8. Re:Where do they store 4.5TB off site by Bearhouse · · Score: 2, Funny

      Not reliable enough.
      I suggest that this important resource be backed up to punched cards.
      This would also enable handy comparisons in units that us oldies understand, such as ELOCs
      (Equivalent Library of Congress).
      I'd calculate it myself, but seem to have mislaid my slide rule...

    9. Re:Where do they store 4.5TB off site by commodore64_love · · Score: 5, Funny

      They'd better have it backed up. Last time the Alexandria library burned down, we lost about one thousand years of collected information from ancient Greece and Rome. Ooopsie.

      --
      "I disapprove of what you say, but I will defend to the death your right to say it." - historian Evelyn Beatrice Hall
    10. Re:Where do they store 4.5TB off site by Anonymous Coward · · Score: 1, Funny

      They have Charlie Babbitt on their staff. No need to replicate.

    11. Re:Where do they store 4.5TB off site by houghi · · Score: 2, Funny

      If they need much more, I have some AOL disks lying around that they can use.

      --
      Don't fight for your country, if your country does not fight for you.
    12. Re:Where do they store 4.5TB off site by TheGratefulNet · · Score: 1

      one would assume that something like this does regular off-site back-ups

      there are BIG fat cables you connect, wait 3 seconds, then do a massively parallel 'dd if= ...'

      --
      "It is now safe to switch off your computer."
    13. Re:Where do they store 4.5TB off site by __aasqbs9791 · · Score: 3, Funny

      I'd suggest also using stone slabs. Water can do serious damage to paper, and don't get me started on fire hazards. Good old Stone Slabs resist both of those really well. I'm not sure what the write speed is, however, so you'll probably need to hire many stonecutters to work in parallel.

    14. Re:Where do they store 4.5TB off site by TheGratefulNet · · Score: 2, Funny

      It's simple, the backups are compressed -- they simply remove all those useless zeroes from the binary data.

      in music today, there is a so-called 'loudness war' and I think I've discovered what it is: they're removing the zeroes, thinking that 'all ones' will make the music even louder!

      I wonder if it's reversible? where do the zeroes go? can they be unzeroed? we should try to find them.

      --
      "It is now safe to switch off your computer."
    15. Re:Where do they store 4.5TB off site by Wingman+5 · · Score: 1
      Jeeze how much bandwidth would you need to fill that thing in one day... runs to google.
      (4.5 petabyte) / (1 day) = 54.6133333 GBps
      According to List of device bandwidths the closest things to filling it in one day are:
      • HyperTransport 3.1 (3.2 GHz, 32-pair) 409,600 Mbit/s 50 GB/s
      • PC3-16000 DDR3-SDRAM (triple channel) 480.4 Gbit/s 48.4 GB/s
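
The division above can be reproduced in a few lines. This is a minimal sketch that assumes binary prefixes (1 PB = 2^50 bytes, 1 GB = 2^30 bytes), which is what yields the 54.61 figure; the function name is made up for illustration.

```python
# Sustained rate needed to fill a store of a given size in a given time.
# Assumption: binary prefixes (1 PB = 2**50 bytes, 1 GB = 2**30 bytes).
SECONDS_PER_DAY = 24 * 60 * 60


def fill_rate_gb_per_s(petabytes: float, days: float = 1.0) -> float:
    """Return the GB/s needed to write `petabytes` within `days`."""
    total_bytes = petabytes * 2**50
    return total_bytes / (days * SECONDS_PER_DAY) / 2**30


if __name__ == "__main__":
    print(f"{fill_rate_gb_per_s(4.5):.4f} GB/s")
```

With decimal prefixes instead (1 PB = 10^15 bytes, 1 GB = 10^9 bytes) the same division gives roughly 52 GB/s, so the comparison with the listed buses holds either way.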
    16. Re:Where do they store 4.5TB off site by Sponge+Bath · · Score: 1

      Followed by run length encoding of the remaining ones.

    17. Re:Where do they store 4.5TB off site by pedrop357 · · Score: 2, Insightful

      4.5TB isn't that bad. Heck, we have 1TB tapes right now. 5 of them can be carried in a small bag.

      It's the 4.5PB that the Internet Archive could use that's hard to store offsite. 4500 1TB tapes can be pretty unruly.

    18. Re:Where do they store 4.5TB off site by jd · · Score: 1

      When you get right down to it, any hard-coded data on silicon is just data on a stone slab. Since you can compile SystemC into a hardware spec, you can write stone slabs as fast as you can generate C.

      --
      It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
    19. Re:Where do they store 4.5TB off site by wealthychef · · Score: 1

      Can you say, Parallelism?

      --
      Currently hooked on AMP
    20. Re:Where do they store 4.5TB off site by medelliadegray · · Score: 3, Insightful

      i find it impressive they have all that hardware for a mere 200k users a day.

      --
      Troll, Troll, go away and flame again some other day
    21. Re:Where do they store 4.5TB off site by Anonymous Coward · · Score: 2, Informative

      In Brewster Kahle's December 2007 TED talk he mentions a third mirror in the Netherlands.
      http://www.ted.com/index.php/talks/brewster_kahle_builds_a_free_digital_library.html

      As he puts it, the Archive is mirrored on 'a fault line, a flood plain, and in the Middle East'.

      Funny thing is I can't find another reference to the Netherlands mirror. The Bibliotheca Alexandrina site mentions a plan to eventually have four sites (California, Alexandria, Europe, and Asia), but that's it. Anyone know what happened with the Netherlands site?

    22. Re:Where do they store 4.5TB off site by Anonymous Coward · · Score: 3, Interesting

      One would assume that something like this does regular off-site backups, which must add up to a hell of a lot. Could someone with experience in such matters shed a little insight into the logistics of backing up such a vast system?

      Create snapshot of zpool (think LVM VG):
      # zfs snapshot mydata@2009-03-24

      Send snapshot to remote site:
      # zfs send mydata@2009-03-24 | ssh remote "zfs recv mydata@2009-03-24"

      Create a new snapshot the next day:
      # zfs snapshot mydata@2009-03-25

      Send only the incremental changes between the two:
      # zfs send -i mydata@2009-03-24 mydata@2009-03-25 | ssh remote "zfs recv mydata@2009-03-25"

      Now this looks a lot like rsync, but the difference is that rsync has to traverse the file system tree (directories and files), while ZFS only has to look at the 'birth time' (think ctime) of each block of data (not even the full file metadata) to see if it's newer than the first snapshot. If you're talking about tens (or hundreds) of thousands of directories, and an order of magnitude more files, that's a lot of overhead if nothing has changed. For 48 TB raw (what a Sun X4500 can have), ZFS can see nothing has changed in a few minutes.

      Creation of snapshots is instantaneous and there is no overhead in them (except that the space from deleted files isn't reclaimed / reused). There are people who create them every five seconds, and sync with a remote server--so at most you would lose five seconds worth of data if your disk died.

      All changes are also ACID, so if you start your send-recv, and the transmission dies part way through, the receiving end won't have a partial copy of the latest snapshot--it's all or nothing of the last good change.
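
The daily cycle described in this comment can be sketched as a small command builder. This is only an illustration: it assembles the same `zfs snapshot`, `zfs send` and `zfs recv` command lines as strings and runs nothing. The dataset `mydata` and host `remote` come from the comment; the function names are hypothetical.

```python
# Build the snapshot and send/recv command lines from the comment as strings.
# Nothing is executed here; the helper names are hypothetical.
from typing import Optional


def snapshot_cmd(dataset: str, tag: str) -> str:
    """Command that creates the snapshot dataset@tag."""
    return f"zfs snapshot {dataset}@{tag}"


def send_cmd(dataset: str, tag: str, host: str,
             prev_tag: Optional[str] = None) -> str:
    """Command that ships dataset@tag to host, incrementally if prev_tag is given."""
    target = f"{dataset}@{tag}"
    # With a previous snapshot, -i sends only the changes between the two.
    incremental = f"-i {dataset}@{prev_tag} " if prev_tag else ""
    return f'zfs send {incremental}{target} | ssh {host} "zfs recv {target}"'


if __name__ == "__main__":
    print(snapshot_cmd("mydata", "2009-03-24"))
    print(send_cmd("mydata", "2009-03-24", "remote"))
    print(snapshot_cmd("mydata", "2009-03-25"))
    print(send_cmd("mydata", "2009-03-25", "remote", prev_tag="2009-03-24"))
```

Keeping the previous tag as an optional argument mirrors the two cases in the comment: the first transfer is a full send, and every later one names the last snapshot the remote side already holds.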

    23. Re:Where do they store 4.5TB off site by rackserverdeals · · Score: 1

      One would assume that something like this does regular off-site backups, which must add up to a hell of a lot. Could someone with experience in such matters shed a little insight into the logistics of backing up such a vast system?

      Dude, the Internet Archive IS the offsite backup.

      At least mine anyway. Tape drives be damned.

      --
      Dual Opteron < $600
    24. Re:Where do they store 4.5TB off site by Anonymous Coward · · Score: 0

      I'm surprised that you got no serious answer.

      At that scale, you don't back up anymore; you save the data triple (or more) redundantly in the first place.

      In essence, all the machines form sort of a peer-to-peer network with a distributed hashtable (DHT) to store and lookup data.

      The 'petabox' had a pretty cool website once which explained all that in detail, but it seems to redirect only to a page on archive.org now...

      Freenet or YaCy may also be good examples of that technique which offer background (basic concepts understanding) material if someone is curious.

    25. Re:Where do they store 4.5TB off site by dziban303 · · Score: 2, Insightful
      (The truth about AIG and Congress.) http://www.foxnews.com/video2/video08.html?maven_referralObject=3833532 - Watch now

      I can't take anyone seriously who puts "truth" and a link to Fox news in the same signature.

    26. Re:Where do they store 4.5TB off site by Anonymous Coward · · Score: 0

      "At that scale, you don't back up anymore; you save the data triple (or more) redundantly in the first place."

      Hypothesis: nonsense
      Demonstration: Your container catches fire and fluf! All your triple (or more) redundant data is lost in a burst.

      Surely you are one of those that believe RAID is a valid backup strategy.

    27. Re:Where do they store 4.5TB off site by Anonymous Coward · · Score: 5, Funny

      Can you say, Parallelism?

      Parallelogram.... crap
      Parallellellell... dammit
      Parapalouza... >

      Why did you have to point that out to everyone? :(

    28. Re:Where do they store 4.5TB off site by bitrex · · Score: 1

      What you've got to do is you've got to make punch-card copies of all your data. Then you're gonna take those punch cards, and you're going to put them on a wooden table. Then you're going to take digital photographs of them, email them to your backup site in the Netherlands, where they'll get the photographs printed and stack them in a vacuum sealed airtight length of 24" PVC sewer pipe stored on the top floor of a windmill to avoid flood damage. That is what we call mission critical backup procedure, my friends.

    29. Re:Where do they store 4.5TB off site by notthepainter · · Score: 4, Interesting

      Sadly, even modern day archives get wrecked. See http://www.spiegel.de/international/germany/0,1518,611311,00.html

    30. Re:Where do they store 4.5TB off site by Samah · · Score: 2, Funny

      It's simple, the backups are compressed -- they simply remove all those useless zeroes from the binary data.

      Compressed with XML! Because XML makes everything better... right?
      Right?

      --
      Homonyms are fun!
      You're driving your car, but they're riding their bikes there.
    31. Re:Where do they store 4.5TB off site by zach297 · · Score: 5, Funny

      I'd suggest also using stone slabs. Water can do serious damage to paper, and don't get me started on fire hazards. Good old Stone Slabs resist both of those really well. I'm not sure what the write speed is, however, so you'll probably need to hire many stonecutters to work in parallel.

      A math problem. My favorite. I don't know much about stone cutters, but let's assume they can write one bit every 2 seconds. That's 1 byte in 16 seconds. The Internet Archive is (4.5 x 1,125,899,906,842,624) 5,066,549,580,791,808 (5 quadrillion) bytes. That works out to 81,064,793,292,668,928 (81 quadrillion) seconds or about 2,570,547,732 (2.57 billion) years. That is far too long for their stringent 2 month backup cycle. They would need 15,423,286,395 (15.4 billion) stone cutters to keep schedule, assuming they had unlimited stone. Last time I checked there were only between 6 and 7 billion people, with only a small fraction of them being stone cutters. That leaves but one solution. Force the web developers to become stone cutters. This would not only increase the work force but also reduce the amount needed to back up, because fewer people will be making more web pages to back up.
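
The figures in this comment check out. Here is a minimal sketch of the arithmetic, assuming (as the comment appears to) a binary petabyte, a 365-day year, and a two-month cycle equal to one sixth of a year; the constant names are made up for illustration.

```python
# Re-run the stone cutter estimate from the comment.
# Assumptions: 1 PB = 2**50 bytes, 365-day years, 2 months = 1/6 year.
SECONDS_PER_BIT = 2
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

archive_bytes = 45 * 2**50 // 10                   # 4.5 PB as an exact integer
carve_seconds = archive_bytes * 8 * SECONDS_PER_BIT
carve_years = carve_seconds / SECONDS_PER_YEAR     # one cutter working alone
cutters_needed = carve_seconds / (SECONDS_PER_YEAR / 6)

if __name__ == "__main__":
    print(f"{archive_bytes:,} bytes")
    print(f"{carve_seconds:,} seconds for one cutter")
    print(f"{carve_years:,.0f} years for one cutter")
    print(f"{cutters_needed:,.0f} cutters for a two-month cycle")
```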

    32. Re:Where do they store 4.5TB off site by Anonymous Coward · · Score: 0

      Are you sure it wasn't middle earth? Frodo is pretty good at taking care of things.

    33. Re:Where do they store 4.5TB off site by fortapocalypse · · Score: 1

      But then when the stone slabs deteriorate you have to build an Ark to carry them.

    34. Re:Where do they store 4.5TB off site by Omniscient+Lurker · · Score: 4, Interesting

      Instead of writing in binary you could write the data in a base-36 format and then convert back to binary. The stone cutters could then store more data per glyph increasing their write rate considerably (and decreasing read rate) by amounts I am unwilling to calculate.
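
For the curious, the gain is easy to put a number on: a base-36 glyph carries log2(36), or about 5.17 bits, so the glyph count drops by roughly that factor. The sketch below assumes the digits 0-9 and a-z as the 36 glyphs and that carving one glyph takes as long as carving one bit; both are illustrative assumptions, and the function names are hypothetical.

```python
# Base-36 round trip plus the information carried per glyph.
# Assumption: the 36 glyphs are the digits 0-9 followed by a-z.
import math
import string

DIGITS = string.digits + string.ascii_lowercase


def bits_per_glyph(base: int) -> float:
    """Information carried by one glyph of the given base."""
    return math.log2(base)


def encode_base36(data: bytes) -> str:
    """Treat the bytes as one big integer and write it in base 36."""
    n = int.from_bytes(data, "big")
    if n == 0:
        return "0"
    out = []
    while n:
        n, r = divmod(n, 36)
        out.append(DIGITS[r])
    return "".join(reversed(out))


def decode_base36(text: str, length: int) -> bytes:
    """Convert back to binary; `length` restores any leading zero bytes."""
    return int(text, 36).to_bytes(length, "big")


if __name__ == "__main__":
    carved = encode_base36(b"hi")
    print(carved, decode_base36(carved, 2), f"{bits_per_glyph(36):.2f} bits/glyph")
```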

    35. Re:Where do they store 4.5TB off site by MaggieL · · Score: 1

      Silly. There's no requirement that it be willful. Look at the education system.

      --
      -=Maggie Leber=-
    36. Re:Where do they store 4.5TB off site by Anonymous Coward · · Score: 0

      Can't they just use the Wayback Machine to recover the data?

      I mean, I assume the Internet Archive at least archives itself.

    37. Re:Where do they store 4.5TB off site by Tubal-Cain · · Score: 1

      Did you know that if you stick the same disk back in, you can fit the entire backup on a single floppy?

    38. Re:Where do they store 4.5TB off site by Tubal-Cain · · Score: 1

      any hard-coded data on silicon is just data on a stone slab.

      Ceramic and glass are both stone, right?

    39. Re:Where do they store 4.5TB off site by k3vlar · · Score: 1

      It's simple, the backups are compressed -- they simply remove all those useless zeroes from the binary data.

      We should back up the digits 1 and 0 too, just to be safe. In the event of a fire, we want to be able to reconstruct the data, and what will we do if all the 1's get damaged? We'll also have a handy comparison for the 0, so we can insert it back without too much effort when we un-compress the data.

      --
      Unlike porn, which yada yada rimshot hey-ooh!
    40. Re:Where do they store 4.5TB off site by Rural · · Score: 2, Informative

      Their aim is to preserve the content found on the Web. They need the hardware for that. I assume they don't need much for the "serving users" part.

    41. Re:Where do they store 4.5TB off site by garry_g · · Score: 1

      It's simple, the backups are compressed -- they simply remove all those useless zeroes from the binary data.

      ... and once the useless zeroes are out, a RLE of the remaining data takes care of the rest ... then, once a day, some monkey comes in and does a backup on an old 1541 ... the disk is then xeroxed and the paper copy safely stored in between the stocks and derivatives of AIG, ensuring that no-one will ever find it in between that worthless pile of paper ...

    42. Re:Where do they store 4.5TB off site by Anonymous Coward · · Score: 0

      They could use the c64 web-server c64web @ http://www.c64web.com -- a 1541 drive holds a lot of bytes.

      Kind of David meets Goliath.

    43. Re:Where do they store 4.5TB off site by Anonymous Coward · · Score: 0

      If you complain about the school system, you're living in a district that doesn't have
      the resources to provide a quality education.
      I have worked with kids that attended the best schools in Portland ( Lincoln and Lake Oswego High )
      and have found that the students attending have a broad and varied education.
      I was told by one student that a weak education was due to the failings
      of the parents to achieve wealth and prestige and move to districts that provided a quality education.
      This does not bode well for everyone that's excluded from these intense, focused and expensive
      "publicly" founded institutions.

      The children of the new poor will rise up and butcher the bastards that stole their
      dream of a place of their own and food for their kids. We will have a Mao style cleansing.
      And then it will be over.

    44. Re:Where do they store 4.5TB off site by PReDiToR · · Score: 1

      Actually, all those "zeros" aren't totally zero. There are micro traces of energy used to make them up that get detected out of the equation and rounded down to zero.

      I have written a program to transfer all those tiny bits of electricity into a battery I own in another country, and sooner or later with all those round downs, I'll have enough energy to sell to the ex-USSR countries so they won't need gas any more.

      Profit!

      --
      Do not meddle in the affairs of geeks for they are subtle and quick to anger
    45. Re:Where do they store 4.5TB off site by aetherworld · · Score: 1

      You could also GZIP it first!

    46. Re:Where do they store 4.5TB off site by Anonymous Coward · · Score: 0

      Nice TDWTF reference.

    47. Re:Where do they store 4.5TB off site by knutkracker · · Score: 1

      TFA indicates that they have a mirror at the library of Alexandria.

      You mean the site of possibly the most catastrophic series of data-loss incidents in recorded history?

      Who's idea was that?

    48. Re:Where do they store 4.5TB off site by fuzzyfuzzyfungus · · Score: 1

      Don't worry, the Romans are gone, and they purchased some lions to keep Christians away from the backups...

    49. Re:Where do they store 4.5TB off site by ATMD · · Score: 1

      > Anyone know what happened with the Netherlands site?

      Umm... it flooded?

      --
      Nobody else has this sig.
    50. Re:Where do they store 4.5TB off site by morie · · Score: 1

      This is also the reason real loud amps go up to 11...

      --
      Sig (appended to the end of comments I post, 54 chars)
    51. Re:Where do they store 4.5TB off site by the+cleaner · · Score: 1

      Could someone with experience in such matters shed a little insight into the logistics of backing up such a vast system?

      The secret of backups in a datacenter that big is: You don't. Back up, that is.

      A simple calculation shows that a full backup would basically never end, and incremental backups only help you if you had something to increment from. And a restore would never end as well.

      Instead you make damn sure, you have redundancy in everything. DRBD is your friend. RAID controllers are, too.

      If you have the money, integrate with the big ones (EMC, HP, SUN). If you don't, use a clever solution (Caringo).

      --
      Could be worse. Could be raining.
    52. Re:Where do they store 4.5TB off site by commodore64_love · · Score: 2, Insightful

      >>>I can't take anyone seriously who puts "truth" and a link to Fox news in the same signature.

      Neither can I take seriously anyone who believes MSNBC or CNN are unbiased and/or better alternatives. Or is prejudiced (prejudges a report without ever watching it). For example I may think Rachel Maddow is a joke, but at least I listen to what she has to say before I laugh. And sometimes, she says something worthy of hearing... it's good to keep an open mind and listen to the opposition.

      --
      "I disapprove of what you say, but I will defend to the death your right to say it." - historian Evelyn Beatrice Hall
    53. Re:Where do they store 4.5TB off site by commodore64_love · · Score: 1

      >>>Cracks had likewise been discovered in the archive recently, but had been discounted.
      >>>The collapse may be connected with the construction of a new subway line under the street

      Brilliant. You know there's a subway being built underground, and you notice there's cracks appearing, but you pretend the two are unconnected and do nothing. Way to live in denial museum curators.

      --
      "I disapprove of what you say, but I will defend to the death your right to say it." - historian Evelyn Beatrice Hall
    54. Re:Where do they store 4.5TB off site by Anonymous Coward · · Score: 0

      That's the whole point! The Archive has a partnership with the new Library of Alexandria to duplicate files; the Archive preserves its content in multiple locations for this very reason.

    55. Re:Where do they store 4.5TB off site by CopaceticOpus · · Score: 2, Funny

      "XML is like violence. If it doesn't solve your problem, you're not using enough of it."

    56. Re:Where do they store 4.5TB off site by jd · · Score: 1

      Ceramic certainly is, glass is one of those weird cases where the state of matter is not definitively in either the category of solid or liquid but if you were to use conventional classifications, glass could be considered an igneous rock.

      --
      It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
    57. Re:Where do they store 4.5TB off site by AMuse · · Score: 1

      One or two forklifts and one 18-wheeler. The data is already pre-packed in a shipping container.

    58. Re:Where do they store 4.5TB off site by bandmassa · · Score: 1

      The stone cutters just have to work faster or they'll get fired.

      --
      "I hope you like Guinness, Sir. I find it a refreshing substitute for, er... food." Col. Jack O'Neil, SG-1
    59. Re:Where do they store 4.5TB off site by DiLLeMaN · · Score: 1

      In fact, if you have ever tried to retrieve an old version of a website, you would've found out that "serving users" is done using a single pentium 1 machine in the basement.

      --
      /var/run/twitter.sock is a twitter socket puppet.
    60. Re:Where do they store 4.5TB off site by DiLLeMaN · · Score: 1

      they're in /dev/null, of course.

      --
      /var/run/twitter.sock is a twitter socket puppet.
    61. Re:Where do they store 4.5TB off site by Samah · · Score: 1

      I've heard that quote somewhere before, and I was actually thinking of it when I posted. :)

      --
      Homonyms are fun!
      You're driving your car, but they're riding their bikes there.
    62. Re:Where do they store 4.5TB off site by Anonymous Coward · · Score: 0

      Who's idea

      "Whose".

  2. Story is meaningless without LOC measurement by Dr_Banzai · · Score: 5, Funny

    I have no idea how much 4.5 PB is until it's given in units of Libraries of Congress.

    1. Re:Story is meaningless without LOC measurement by Anonymous Coward · · Score: 5, Informative
    2. Re:Story is meaningless without LOC measurement by Wingman+5 · · Score: 5, Interesting

      from http://www.lesk.com/mlesk/ksg97/ksg.html The 20-terabyte size of the Library of Congress is widely quoted and as far as I know is derived by assuming that LC has 20 million books and each requires 1 MB. Of course, LC has much other stuff besides printed text, and this other stuff would take much more space.

      1. Thirteen million photographs, even if compressed to a 1 MB JPG each, would be 13 terabytes.
      2. The 4 million maps in the Geography Division might scan to 200 TB.
      3. LC has over five hundred thousand movies; at 1 GB each they would be 500 terabytes (most are not full-length color features).
      4. Bulkiest might be the 3.5 million sound recordings, which at one audio CD each, would be almost 2,000 TB.

      This makes the total size of the Library perhaps about 3 petabytes (3,000 terabytes).

      so 230 libraries by the old standard or 1.5 by the new standard
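
The conversion behind those two numbers can be sketched as follows. The 20 TB and 3,000 TB Library sizes are the ones quoted above; the 230 figure only appears when a petabyte is taken as 1,024 TB, so that factor is left as a parameter. The function name is made up for illustration.

```python
# Express an archive size in Libraries of Congress (LOC).
# Assumption: tb_per_pb is 1024 (binary) or 1000 (decimal), chosen by the caller.
def in_locs(archive_pb: float, loc_tb: float, tb_per_pb: int = 1024) -> float:
    """Return how many LOCs of size `loc_tb` terabytes fit in `archive_pb` petabytes."""
    return archive_pb * tb_per_pb / loc_tb


if __name__ == "__main__":
    print(f"old standard (20 TB):  {in_locs(4.5, 20):.1f} LOC")
    print(f"new standard (3,000 TB): {in_locs(4.5, 3000, tb_per_pb=1000):.1f} LOC")
```

With decimal prefixes throughout, the old-standard figure would be 225 rather than 230; the new-standard figure stays at about 1.5 either way.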

    3. Re:Story is meaningless without LOC measurement by commodore64_love · · Score: 3, Informative

      83 terabyte in the LOC, so 4.5 petabytes == 54 Libraries of Congress

      4.5 petabytes == 4500 terabyte hard drives, times $75 each == ~$340,000 == how much taxpayers spend, each hour, to maintain the LOC

      --
      "I disapprove of what you say, but I will defend to the death your right to say it." - historian Evelyn Beatrice Hall
    4. Re:Story is meaningless without LOC measurement by commodore64_love · · Score: 1

      P.S. My "83 terabyte" quote comes directly from the Library of Congress statistics, mid-2008.

      --
      "I disapprove of what you say, but I will defend to the death your right to say it." - historian Evelyn Beatrice Hall
    5. Re:Story is meaningless without LOC measurement by clarkkent09 · · Score: 1

      Bah, LOC is outdated. 4.5PB = 1 Shipping Container

      --
      Negative moral value of force outweighs the positive value of good intentions.
    6. Re:Story is meaningless without LOC measurement by v1 · · Score: 1

      a new 4.5 petabyte data center

      4.5 PB? Is that the best you can do? sheesh, amateurs....

      Though it also did surprise me they only get 200,000 hits/day. I expected the WayBack Machine to get a lot more traffic than that.

      --
      I work for the Department of Redundancy Department.
    7. Re:Story is meaningless without LOC measurement by clarkkent09 · · Score: 1

      I think that's 200K unique visitors. According to alexa, archive.org is the 386th most visited site on the internet last week which is not to be sneezed at

      --
      Negative moral value of force outweighs the positive value of good intentions.
    8. Re:Story is meaningless without LOC measurement by dln385 · · Score: 2, Insightful

      from http://www.lesk.com/mlesk/ksg97/ksg.html The 20-terabyte size of the Library of Congress is widely quoted and as far as I know is derived by assuming that LC has 20 million books and each requires 1 MB. Of course, LC has much other stuff besides printed text, and this other stuff would take much more space.

      1. Thirteen million photographs, even if compressed to a 1 MB JPG each, would be 13 terabytes.
      2. The 4 million maps in the Geography Division might scan to 200 TB.
      3. LC has over five hundred thousand movies; at 1 GB each they would be 500 terabytes (most are not full-length color features).
      4. Bulkiest might be the 3.5 million sound recordings, which at one audio CD each, would be almost 2,000 TB.

      This makes the total size of the Library perhaps about 3 petabytes (3,000 terabytes).

      so 230 libraries by the old standard or 1.5 by the new standard

      Compress each audio file to a 5 MB MP3. That's 17.5 TB. Total size would be 750 terabytes.

      So the data would be 6 LOC.
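
A quick sketch of that recalculation, using the component sizes quoted above and decimal units (1 TB = 1,000,000 MB, 1 PB = 1,000 TB), which is an assumption; the dictionary keys are just labels chosen for illustration.

```python
# Recompute the Library of Congress total with audio stored as 5 MB MP3s.
# Assumption: decimal units (1 TB = 1_000_000 MB, 1 PB = 1_000 TB).
MB_PER_TB = 1_000_000

audio_tb = 3_500_000 * 5 / MB_PER_TB   # 3.5 million recordings at 5 MB each

components_tb = {
    "books": 20,
    "photographs": 13,
    "maps": 200,
    "movies": 500,
    "audio (MP3)": audio_tb,
}
total_tb = sum(components_tb.values())
archive_in_locs = 4.5 * 1000 / total_tb

if __name__ == "__main__":
    print(f"audio: {audio_tb} TB, total: {total_tb} TB")
    print(f"4.5 PB is about {archive_in_locs:.1f} LOC")
```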

    9. Re:Story is meaningless without LOC measurement by merreborn · · Score: 2, Insightful

      Bulkiest might be the 3.5 million sound recordings, which at one audio CD each, would be almost 2,000 TB.

      You compressed the video, and the photographs, but not the audio? And why do you need a full CD for every sound recording? Surely many of them are far shorter than a full CD?

    10. Re:Story is meaningless without LOC measurement by Xtravar · · Score: 2, Interesting

      The CDs are already in digital format, so compressing them is a cardinal sin.

      The photos, movies, and maps are in analog format to start with, so we don't feel so bad using lossy compression. Image files are really big. I think the 1GB estimate per movie is pretty good, considering shorts, black and white, and the standard (or lower) definition of most of them. That would allow for a very high detail scan of the movie in something like MPEG4.

      And, since they started in analog formats, there's no fair way to determine what resolution to scan them. I mean, even a million by a million pixels could not be a 'lossless' interpretation of a 1x1cm image, so you have to accept that any digital conversion will be lossy regardless of encoding.

      At least that would be my rationale. Not that this question needed to be answered...

      --
      Buckle your ROFL belt, we're in for some LOLs.
    11. Re:Story is meaningless without LOC measurement by turbidostato · · Score: 1

      "According to alexa, archive.org is the 386th most visited site on the internet"

      386? I would think that with all those Sun boxes they already were 64 bits at least!

    12. Re:Story is meaningless without LOC measurement by martin-boundary · · Score: 1
      1 Library Of Congress = 1/200 inch wide cube approx.

      Feynman estimated the LOC at about 1 petabit, which would make the Internet Archive, at roughly 36 petabits, a cube on the order of 1/50 inch wide.

      So it should fit in your pocket.
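
The scaling step can be made explicit. If storage density is uniform, volume grows linearly with the number of bits, so the cube's width grows with the cube root. The sketch below takes the 1 petabit and 1/200 inch figures from the comment as given; the variable names are made up for illustration.

```python
# Scale the cube from the comment: width grows with the cube root of the bit count.
# Assumptions: 1 LOC = 1 petabit in a cube 1/200 inch wide, uniform density.
LOC_PETABITS = 1
LOC_WIDTH_INCH = 1 / 200

archive_petabits = 4.5 * 8   # 4.5 petabytes at 8 bits per byte
archive_width_inch = LOC_WIDTH_INCH * (archive_petabits / LOC_PETABITS) ** (1 / 3)

if __name__ == "__main__":
    print(f"{archive_petabits:.0f} petabits -> cube about 1/{1 / archive_width_inch:.0f} inch wide")
```

That works out to a cube roughly 1/60 inch wide, the same order of magnitude as the 1/50 inch quoted.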

    13. Re:Story is meaningless without LOC measurement by Thaelon · · Score: 1

      This should probably be added to units.

      --
      Question everything

    14. Re:Story is meaningless without LOC measurement by Anonymous Coward · · Score: 4, Funny

      460.8 Lines of Code? What's that supposed to be? Hello World in COBOL?

    15. Re:Story is meaningless without LOC measurement by Anonymous Coward · · Score: 0

      How many shipping containers in a football field?

    16. Re:Story is meaningless without LOC measurement by robinesque · · Score: 1

      Google needs to add this to their unit conversion table so that I can convert "number of horns on a unicorn petabytes in libraries of congress".

    17. Re:Story is meaningless without LOC measurement by Anonymous Coward · · Score: 1, Funny

      And I guess the new meme will be "how many Internet Archive data centers is that?"

    18. Re:Story is meaningless without LOC measurement by emj · · Score: 1

      In my experience black and white movies compress at about the same bit rate as color movies. Especially if they are bad quality.

    19. Re:Story is meaningless without LOC measurement by Anonymous Coward · · Score: 0

      You really think it costs $75 per drive per hour? Get real.

      Also putting FoxNews in your sig makes you a wingnut.

    20. Re:Story is meaningless without LOC measurement by Lunzo · · Score: 1

      You Yanks and your old-fashioned measurements. The rest of the world has moved to the metric system's "Libraries of Alexandria" unit.

  3. Storage Envy by jacksinn · · Score: 5, Funny

    Does lusting after all their space make me a peta-phile?

    --
    Life==Jeopardy. All the answers are right in front us - the hard part is coming up with the correct question.
    1. Re:Storage Envy by fm6 · · Score: 1

      Yes.

    2. Re:Storage Envy by Rude+Turnip · · Score: 1

      Why don't you have a seat over there...

    3. Re:Storage Envy by Anonymous Coward · · Score: 0

      I do believe you have won.

    4. Re:Storage Envy by Anonymous Coward · · Score: 0

      Not to be confused with a PETA-phile. Those animal lover lovers.

    5. Re:Storage Envy by Zapotek · · Score: 1

      *Knock* *knock* Open up, we know you're inside!

    6. Re:Storage Envy by Anonymous Coward · · Score: 0

      ohai, were from the partyvan club! you won a free ride to jailbait island! come with us nao!

  4. Own the internet! by Anonymous Coward · · Score: 5, Funny

    so all one needs to do to "own the internet" is to drive a big rig and ... lift the container off their parking lot?

    1. Re:Own the internet! by peragrin · · Score: 5, Funny

      well if you plug in a laser printer you can print off a hard copy for your boss.

      --
      i thought once I was found, but it was only a dream.
    2. Re:Own the internet! by Anonymous Coward · · Score: 0

      Jack Bauer did it last week.

    3. Re:Own the internet! by PReDiToR · · Score: 1

      I hope you're using company paper and ink.

      Never pay for at home that which you can use for free at work.

      --

      Do not meddle in the affairs of geeks for they are subtle and quick to anger
    4. Re:Own the internet! by rrohbeck · · Score: 1

      No joke. Where I work we had a big UPS with diesels for the data center outside on a trailer because the building didn't have enough space.

      It disappeared one weekend.

    5. Re:Own the internet! by peragrin · · Score: 1

      Honestly, when I had two inkjet printers have their ink go bad and solidify between the infrequent uses of printers at home, I stopped keeping one.

      If it is important I print it at work, or if it is too large, I send it to a friend who can print it on a large commercial laser jet.

      --
      i thought once I was found, but it was only a dream.
  5. Slight problem? by girlintraining · · Score: 5, Funny

    I can now theoretically steal "the internet" with a flatbed truck and a lift. There's something to be said for conventional data centers: They're rather hard to load onto a truck and drive off with.

    --
    #fuckbeta #iamslashdot #dicemustdie
    1. Re:Slight problem? by rackserverdeals · · Score: 4, Interesting

      Here's a video tour of one if you need it for reference.

      Don't forget to turn off the water and unplug the ethernet cables. Just be very careful with the power cords.

      --
      Dual Opteron < $600
    2. Re:Slight problem? by fightinfilipino · · Score: 3, Funny

      so the Internet really is a big truck, hauling all of our lulz and our memes across the globe.

      take THAT, Ted Stevens!

    3. Re:Slight problem? by diablovision · · Score: 1

      Nothing is stopping you from putting it inside a building, cementing it into its foundation, or surrounding it with appropriately weaponized sharks.

      --
      120 characters isn't enough to explain it.
    4. Re:Slight problem? by bigsteve@dstc · · Score: 1

      But don't forget to wrap the container in cling-film before you drop it into the shark tank.

    5. Re:Slight problem? by caffeinemessiah · · Score: 1

      There's something to be said for conventional data centers: They're rather hard to load onto a truck and drive off with.

      Yes, but imagine the bandwidth!

      --
      An old-timer with old-timey ideas.
    6. Re:Slight problem? by caerwyn · · Score: 1

      Never underestimate the bandwidth of a... stationwagon filled with 4.5 petabyte shipping containers?

      --
      The ringing of the division bell has begun... -PF
    7. Re:Slight problem? by taucross · · Score: 1

      The idea of winning "one internets" is suddenly not quite so appealing.

      --
      "In the absence of the ability to establish the attribute of truth they tried to establish the noble attributes."
    8. Re:Slight problem? by couchslug · · Score: 1

      "or surrounding it with appropriately weaponized sharks."

      or defending it with other ISO containerized systems:

      Super Sangar

      http://www.mod.uk/DefenceInternet/Templates/NewsArticle.aspx?NRMODE=Published&NRNODEGUID={8F432B04-D3C9-419E-8365-5204663E648C}&NRORIGINALURL=%2FDefenceInternet%2FDefenceNews%2FEquipmentAndLogistics%2FSecuritySurveillanceAndsuperSangars.htm&NRCACHEHINT=Guest

      --
      "This post is an artistic work of fiction and falsehood. Only a fool would take anything posted here as fact."
    9. Re:Slight problem? by rackserverdeals · · Score: 1

      Speed limit is 65mph, 3024GB going 65mph = 196560GB/h or 3276GB/min. That's pretty fast.

      --
      Dual Opteron < $600
    10. Re:Slight problem? by JambisJubilee · · Score: 1

      I can now theoretically steal "the internet" with a flatbed truck and a lift.

      Carmen Sandiego?

    11. Re:Slight problem? by Voyager529 · · Score: 1

      This gives the term "Rick Rolling" a whole new meaning.

  6. Housed in a metal shipping container.... by MichaelSmith · · Score: 1

    Well I hope it is bolted down.

    1. Re:Housed in a metal shipping container.... by stonedcat · · Score: 0

      And let's hope those bolts go into something solid, like not dirt.

      --
      You can't take the sky from me.
    2. Re:Housed in a metal shipping container.... by keithjr · · Score: 1

      I'm surprised the summary doesn't mention it, but the shipping container is a Sun "Project Blackbox", or rather "Modular Data Center" as it's officially called.

    3. Re:Housed in a metal shipping container.... by Anonymous Coward · · Score: 0

      A story about 63 Sun servers replacing 800 linux servers won't fly here.

  7. Only one new datacenter? by Anonymous Coward · · Score: 0

    Just imagine what you could do with a beowulf cluster of 4.5 PB datacenters. You could create regular archives of the internet archives!

    (As a webserver administrator, I can't stress how important it is to keep backups.)

  8. Nice use for a bunny-rabbit by davecb · · Score: 1

    Yes, "thumper" refers to the rabbit. I have a Sun Managed Storage slide somewhere about how data tends to, er, multiply...

    --dave

    --
    davecb@spamcop.net
  9. What about 1996 and earlier? by commodore64_love · · Score: 4, Interesting

    Are there any resources that let us see websites from 1996, 95, 94, or 93? I would love to revisit the web as it appeared when I first discovered it (1994 at psu.edu).

    --
    "I disapprove of what you say, but I will defend to the death your right to say it." - historian Evelyn Beatrice Hall
    1. Re:What about 1996 and earlier? by Tumbleweed · · Score: 4, Funny

      I would love to revisit the web as it appeared when I first discovered it (1994 at psu.edu).

      No, you wouldn't.

    2. Re:What about 1996 and earlier? by Matheus · · Score: 2, Funny

      The entire internet prior to 1996 is archived on an old PC that I'm currently trying to get the 5GB disk restored on.. why I've kept all that old porn for so long completely escapes me tho. :)

    3. Re:What about 1996 and earlier? by Profane+MuthaFucka · · Score: 2, Informative

      Because after 1996 women shaved all their hair off due to a mistaken belief that men prefer their women to look like little girls. We don't, we like the big bushes, and that is why you must save that porn for the good of mankind.

      --
      Fascism trolls keeping me up every night. When I starts a preachin', he HITS ME WITH HIS REICH!
    4. Re:What about 1996 and earlier? by scottrocket · · Score: 1, Informative

      Yes, "The Wayback Machine", at archive.org. Coincidentally, I was there just last night, looking at a January '98 Slashdot.

    5. Re:What about 1996 and earlier? by scottrocket · · Score: 1

      Oops. '93-'96 - apparently in my universe, '98 is before '93.

    6. Re:What about 1996 and earlier? by Hurricane78 · · Score: 0, Offtopic

      Speak for yourself. I like my pussy shaven. But as someone who never licks his girlfriend's pussy, you would not know why that is, would you?

      --
      Any sufficiently advanced intelligence is indistinguishable from stupidity.
    7. Re:What about 1996 and earlier? by Profane+MuthaFucka · · Score: 1

      Are you trying to insult me?

      --
      Fascism trolls keeping me up every night. When I starts a preachin', he HITS ME WITH HIS REICH!
    8. Re:What about 1996 and earlier? by TuaAmin13 · · Score: 1

      Failed Connection. We're sorry. Your request failed to connect to our servers. This may be due to temporary problems in our data center, or difficulty serving a higher-than-usual volume of traffic.

      I think Slashdot just slashdotted the wayback archive of Slashdot. I just tried looking it up and got the above error.

      So it IS possible to slashdot slashdot.

    9. Re:What about 1996 and earlier? by scottrocket · · Score: 1
      "So it IS possible to slashdot slashdot."

      Yes but only in the past, in which case measures would be taken to block future people-but then we wouldn't do it, in which case no measures would be taken to block future people and then we would do it, and measures would be taken to block future people, and then...and then

      ^error

    10. Re:What about 1996 and earlier? by cffrost · · Score: 1

      Next, we just need to thieve Hollywood's instantaneous "Enhance That!" app that converts 4-bpp GIF thumbnails into 6144x4096 floating-point TIFFs.

      --
      Thank you, Edward Snowden.

      "Arguments from authority are worthless." —Carl Sagan
    11. Re:What about 1996 and earlier? by Anonymous Coward · · Score: 0

      If I could attach an animated GIF of a construction worker and some yellow and black tape looking images and put UNDER CONSTRUCTION inside of an <H1> tag, that would account for >90% of the webpages back then.

    12. Re:What about 1996 and earlier? by Matheus · · Score: 1

      OR maybe the data center just drove away..

    13. Re:What about 1996 and earlier? by skeeto · · Score: 1

      Surfing Freenet today is like surfing the web around '94 or '95, but with no blink tags.

    14. Re:What about 1996 and earlier? by Hurricane78 · · Score: 1

      Nah. If i would try to insult you, I would try to come up with something that is not true.

      KAZBAAAM! XD

      --
      Any sufficiently advanced intelligence is indistinguishable from stupidity.
    15. Re:What about 1996 and earlier? by Profane+MuthaFucka · · Score: 1

      If you want to learn how to insult someone, you should take lessons from the woman who caught me masturbating outside her window last night. You're a fucking newb compared to her. Don't even bother to try.

      --
      Fascism trolls keeping me up every night. When I starts a preachin', he HITS ME WITH HIS REICH!
  10. That is a shit ton of space by webheaded · · Score: 1

    Unfortunately the Wayback Machine will still be slower than hell. :p

    --
    "Those who would sacrifice essential liberties for a little temporary safety deserve neither liberty nor safety." - BenF
    1. Re:That is a shit ton of space by jd · · Score: 1

      Fortunately, Hell has now been upgraded to 2 mb/s, thanks to British Telecom.

      --
      It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
    2. Re:That is a shit ton of space by kelnos · · Score: 1

      Two millibits per second? And that's an upgrade? Ouch.

      --
      Xfce: Lighter than some, heavier than others. Just right.
    3. Re:That is a shit ton of space by jd · · Score: 1

      One method of suspend-to-disk is to do a freeze/thaw. It has taken Hell over 16 billion years to do just the freeze. Two millibits per second should be able to do both in less than half the time.

      --
      It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
  11. They had it on South Park by chadimus · · Score: 1

    It sometimes takes the form of a giant blue linksys router. So that we may better worship it.

  12. In reality... by tacarat · · Score: 1

    The internet is only about 2TB once you've removed all the redundant copies of 2g1c and goatse.cx.

    --
    "Common sense will be the death of us all"
    1. Re:In reality... by EdZ · · Score: 1

      Then you lose any data that may be stored in the arrangement of those many redundant, redistributed and reencoded copies. Distributed steganography, if you will

  13. They store 4.5PB in Egypt! by CannonballHead · · Score: 4, Funny

    The Internet Archive also works with about 100 physical libraries around the world whose curators help guide deep Internet crawls. The Internet Archive's massive database is mirrored to the Bibliotheca Alexandrina, the new Library of Alexandria in Egypt, for disaster recovery purposes.

    1. Re:They store 4.5PB in Egypt! by Anonymous Coward · · Score: 4, Funny

      Egypt could be a good choice. The area is fairly famous for reliable persistent storage. From papyrus scrolls to stone engravings, things tend to keep there better than most places. There really aren't many other geographical areas on earth that can claim the same kind of data retention rates over the time periods they've dealt with. Though despite their impeccable track record with avoiding hardware failures, they've done significantly worse when it comes to data loss due to theft and/or hackers/pirates.

      The one curious part about that choice is that the library at Alexandria is the one notable case where mass amounts of data were irreparably lost. So it's odd that they'd choose to entrust their data to that specific institution. Perhaps they felt that since it's under new management, the previous problems will have been resolved.

      However, had the choice been mine, I would have chosen to store my offsite data in Luxor. Its data retention was quite good, and included one data store that was preserved in its entirety for over 3000 years. As an added benefit, it seems that they've opened a second location that's significantly more convenient for the IA since there's no overseas transmission to worry about.

    2. Re:They store 4.5PB in Egypt! by Darkk · · Score: 1

      I oughta start a new company called:

      Off-Planet backups where I'd use the moon to store your precious data!

      Only three things I'd have to worry about would be:

      1) Aliens (if they are out there)
      2) Meteors
      3) Solar Flairs

      Other than that pretty solid plan to me!

    3. Re:They store 4.5PB in Egypt! by Hadlock · · Score: 1

      3) Solar Flairs

      Don't forget:
       
      4) Spelling Nazis

      --
      moox. for a new generation.
    4. Re:They store 4.5PB in Egypt! by Tubal-Cain · · Score: 1

      Just broadcast the (encrypted) data out into the void. When you need to retrieve it, simply invent an FTL drive and jump far enough that the desired data will arrive shortly after.

    5. Re:They store 4.5PB in Egypt! by Anonymous Coward · · Score: 0

      I don't want to talk about my solar flair.

    6. Re:They store 4.5PB in Egypt! by timbck2 · · Score: 1

      How many pieces of solar flair do you have?

      --
      Absurdity: A statement or belief manifestly inconsistent with one's own opinion. -- Ambrose Bierce
  14. In Other News by Erik+Fish · · Score: 5, Informative

    Incidentally: FileFront is closing in five days, taking with it any files that aren't hosted elsewhere.

    I am told that many of the Half-Life mods hosted there are not available anywhere else, so get while the getting is good...

    1. Re:In Other News by bluesatin · · Score: 1

      It's a sad sad day when FileFront shuts its doors.

      Please don't make us start downloading things from FilePlanet again, it makes me cry a little inside.

      )':

    2. Re:In Other News by skeeto · · Score: 1

      I was going to say "good riddance", but I was mixing them up with places like FilePlanet. I pictured that annoying situation where you want to download a 200kB file from them, but first you have to sign up for an account. So, you fill out a big form, check your e-mail, log in, click your way through several pages to the actual download, then get in a 40 minute queue. When you make it out of the queue you have 1 minute to start your download or else you have to get back in line and wait again. Awful.

      FileFront looks like it didn't have that bullshit. Sad news then.

  15. Never underestimate the bandwidth ... by Ungrounded+Lightning · · Score: 4, Insightful

    ... of a 4.5 petabyte datacenter in a shipping container in transit.

    --
    Bantam Dominique roosters crow a four-note song. Once you've heard it as "Happy BIRTHday" you can't NOT hear it that way
    1. Re:Never underestimate the bandwidth ... by Wingman+5 · · Score: 1

      (4.5 petabyte) / (1 year) = 153.114984 MBps
        now that would be some bad lag.
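
      The figure above can be reproduced with a short sketch. It assumes binary units (1 PB taken as 2^50 bytes, 1 MB as 2^20 bytes) and a year of 365.242199 days; those conventions are a guess at how the number was produced, not something the comment states. The function name is made up for illustration.

```python
# Sustained rate needed to move 4.5 PB in one year.
# Assumptions (not stated in the comment): binary prefixes and a
# 365.242199-day year.
SECONDS_PER_YEAR = 365.242199 * 24 * 60 * 60


def rate_mib_per_s(petabytes: float, seconds: float) -> float:
    """Return the average transfer rate in MiB/s for binary petabytes."""
    total_bytes = petabytes * 2**50
    return total_bytes / seconds / 2**20


rate = rate_mib_per_s(4.5, SECONDS_PER_YEAR)
print(f"{rate:.3f} MiB/s")
```

      With decimal prefixes instead (10^15 bytes per PB, 10^6 bytes per MB) the same division gives a somewhat lower number, roughly 143 MB/s.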

    2. Re:Never underestimate the bandwidth ... by Anonymous Coward · · Score: 0

      I'd reply in a year and get modded +5 Funny, except /. closes discussions after a short time.

    3. Re:Never underestimate the bandwidth ... by giblfiz · · Score: 1

      Government economic stimulus: Treating a patient for anemia with an iron supplement made from his own extracted blood.

      I can't resist replying to your Sig...
      It's like treating a patient for anemia with iron supplements made from his own extracted blood from the future. We are taking on debt, not trying to push through a one year balanced budget. I'm not sure it's a good idea, but it's a much better one than what you're describing.

    4. Re:Never underestimate the bandwidth ... by Ungrounded+Lightning · · Score: 1

      It's like treating a patient for anemia with iron supplements made from his own extracted blood from the future.

      Unfortunately, when you finance with debt on an economy-wide basis you pay double - or more. There's the return payment. (Plus the interest - which is the "more".) But there's also the cost to the economy of whatever WOULD have been done with the "borrowed" resources but now is not done because the resources were diverted.

      When they talk of how many jobs were created by the stimulus, ask how many jobs were destroyed by it: Destroyed because money was stolen by taxes or the value was stolen from the existing money by inflation. Normally the answer will be "more" - "a lot more" - because the funded programs are less productive than what was defunded and government-managed funds transfer is far less than 100% efficient.

      Look up the "fallacy of the broken window".

      --
      Bantam Dominique roosters crow a four-note song. Once you've heard it as "Happy BIRTHday" you can't NOT hear it that way
  16. 30 comments... by Anonymous Coward · · Score: 0

    and not a single "finally a place big enough to store all of my porn" reference? Y'all are slacking tonight.

    on a very slightly serious note, how much content would be referenced by, say, TPB? Sure the trackers are small, but that's got to be huge.

  17. library of Alexandria, Egypt by Anonymous Coward · · Score: 0

    Didn't that burn down a few thousand years ago?

  18. 63 x 48 = 3024Tb by eotwawki · · Score: 3, Insightful

    So where does the 4.5PB come in to this?

    1. Re:63 x 48 = 3024Tb by glitch23 · · Score: 1

      The article doesn't make it clear so I can only guess that the missing storage capacity is part of some SAN. Maybe the 48 1TB hard drives are only local storage (obviously) but are in addition to some existing SAN that they didn't mention in this particular article. Either that or the article is just wrong about the 4.5PB database.

      --
      this nation, under God, shall have a new birth of freedom. -- Lincoln, Gettysburg Address
    2. Re:63 x 48 = 3024Tb by SirLoadALot · · Score: 1

      Good point. My best guess would be that they are actually 1.5 TB drives. That would get the numbers about right.

    3. Re:63 x 48 = 3024Tb by NickW1234 · · Score: 1

      That was my assumption as well. Of course, that's not accounting for any redundancy.

    4. Re:63 x 48 = 3024Tb by spinkham · · Score: 4, Informative

      TFA says "...eight racks filled with 63 Sun Fire x4500 servers with dual- or quad-core x86 processors running Solaris 10 with ZFS. Each Sun server is combined with an array of 48 1TB hard drives." (emphasis mine)

      I would guess this means there's a x4500 with 24TB in local disks, and 48TB in attached storage per machine. (24+48)*63 does give us the quoted number

      --
      Blessed are the pessimists, for they have made backups.
    5. Re:63 x 48 = 3024Tb by rackserverdeals · · Score: 1

      I don't think that's right. Sun's site has a video tour of it. Haven't finished it yet but it's here.

      --
      Dual Opteron < $600
    6. Re:63 x 48 = 3024Tb by rackserverdeals · · Score: 1

      The new datacenter is only 3PB. I guess the total storage, with the old data centers is 4.5 PB.

      So 48x63 gives you 3PB of raw storage. I'm guessing they're using less because I can't imagine them running it in raid 0.

      --
      Dual Opteron < $600
    7. Re:63 x 48 = 3024Tb by pwnies · · Score: 1

      Actually it's a bit less than that even. The Sun Fire servers they're using, or "thumpers" as they're nicknamed, generally use ZFS to store their data. However, the default configuration of these systems is to use a raidz config for the drives (think RAID 5). Essentially, the configuration uses eight 6-disk raidz groups, all aggregated together into one giant pool. The reason why it's less than what they state here is that one disk's worth of capacity in each of those eight raidz groups goes to parity. That drops the theoretical storage to only 40TB per thumper. That puts us at 2520TB.

      Again though, even that's a high number, because once formatted, each thumper only delivers about 36TB of storage. So in actuality, they only have about 2268TB of storage (half of what they claim).

      My only guess is that the 4.5PB number comes from Sun's advertising dept, who are running the numbers on the theoretical max that the container could hold. If you use the highest capacity drives readily available on the market right now (1.5TB drives, as the 2TB drives are a bit hard to get ahold of), no parity, and no loss in formatting, the numbers are correct. 63 x 48 x 1.5 = ~4500
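
      The arithmetic in this comment can be laid out as a small sketch. The inputs are the commenter's own figures (63 servers, 48 disks each, eight 6-disk raidz groups with one disk of parity apiece, about 36TB usable per server after formatting); the variable names are only illustrative, and none of this is confirmed as the Internet Archive's actual pool layout.

```python
# Capacity estimates in decimal TB, using the figures quoted in the comment.
SERVERS = 63
DISKS_PER_SERVER = 48
RAIDZ_GROUPS = 8                 # eight 6-disk raidz groups per server
PARITY_DISKS_PER_GROUP = 1       # single-parity raidz
FORMATTED_TB_PER_SERVER = 36     # commenter's observed usable capacity


def raw_tb(disk_tb: float) -> float:
    """Raw capacity of the whole container with no parity or formatting loss."""
    return SERVERS * DISKS_PER_SERVER * disk_tb


def after_parity_tb(disk_tb: float) -> float:
    """Capacity left once one disk per raidz group is given over to parity."""
    data_disks = DISKS_PER_SERVER - RAIDZ_GROUPS * PARITY_DISKS_PER_GROUP
    return SERVERS * data_disks * disk_tb


formatted_tb = SERVERS * FORMATTED_TB_PER_SERVER

print("raw, 1TB disks:      ", raw_tb(1.0))
print("after parity:        ", after_parity_tb(1.0))
print("after formatting:    ", formatted_tb)
print("raw, 1.5TB disks:    ", raw_tb(1.5))
```

      Only the last line lands near 4.5PB, which is why the 1.5TB-drive guess fits the headline number.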

    8. Re:63 x 48 = 3024Tb by rackserverdeals · · Score: 2, Informative

      Sun has more information and an Interactive tour of the Internet Archive modular data center on their site.

      The total raw capacity of the container is 3 peta bytes. In reality it's going to be less than that. First, 2 disks are likely to be setup in a mirrored pool for the system disks. I believe the root pool only supports mirrors, not raidz. Not sure if this has changed.

      That leaves you with 46 disks for data. Maybe they partitioned part of the root pool to include in the data pools, not sure, but zfs works better with whole disks.

      In the interactive tour, they weren't clear on how they set up the pools.

      Side note. Maybe I'm cynical, but if this was the other way around, with linux servers replacing sun/solaris servers that probably would have been the headline.

      Pretty neat to find out that the internet archive is powered by Java too. The wayback machine is java as well as the crawlers.

      --
      Dual Opteron < $600
    9. Re:63 x 48 = 3024Tb by Anonymous Coward · · Score: 0

      DO you understand how RAiD work? 4.5 petaphiles total, 3 peteaphiles usable.

    10. Re:63 x 48 = 3024Tb by Anonymous Coward · · Score: 0

      If the headline was "800 x86 Linux servers replaced by 63 Sun Servers running Solaris" people would have thought April first came early this year.

    11. Re:63 x 48 = 3024Tb by Anonymous Coward · · Score: 0

      They might actually be using 1.5TB drives.

    12. Re:63 x 48 = 3024Tb by isorox · · Score: 1

      The total raw capacity of the container is 3 peta bytes. In reality it's going to be less than that. First, 2 disks are likely to be setup in a mirrored pool for the system disks. I believe the root pool only supports mirrors, not raidz. Not sure if this has changed.

      Our Thumpers are like that, but the new ones (4550) have a CF card for booting the OS. ZFS booting is supported since u5 or u6, but only booting from a mirrored pool.

      I was all excited by the storage density until recently. We had an issue with one of the controllers. A later patch fixed it. We installed that patch, which hung the machine (our backup machine). After powercycling (the alom didn't want to depower the box), we had a corrupt ufs boot partition that fsck couldn't fix. In the end I had to install u6 onto the boot drives.

      Combine the ridiculous not-quite-x86 not-quite-sparc remote access (the VGA port doesn't work with our raritan systems, the alom network grabbing the console isn't reliable, depending on whether vga is plugged in), and the difficulty for our org of maintaining solaris (compared with linux), as well as the cost, means we're not recommending them for our field offices.

      We have our machines set up in a z2, 44TB storage, with 2TB of parity. That's 40TiB, or 10TiB per rack unit.
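
      The TB-to-TiB step here is just a change of prefix, sketched below. It assumes 44 decimal terabytes of data capacity and the 4U chassis height mentioned elsewhere in the thread; the function name is made up for illustration.

```python
# Convert decimal terabytes to binary tebibytes and spread over rack units.
TB = 10**12      # bytes in a decimal terabyte
TIB = 2**40      # bytes in a binary tebibyte
RACK_UNITS = 4   # assumed chassis height of the server


def tb_to_tib(terabytes: float) -> float:
    """Return the same number of bytes expressed in TiB."""
    return terabytes * TB / TIB


usable_tib = tb_to_tib(44)
print(f"{usable_tib:.2f} TiB total, {usable_tib / RACK_UNITS:.2f} TiB per rack unit")
```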

    13. Re:63 x 48 = 3024Tb by rackserverdeals · · Score: 1

      You had all the drives set up as one raidz2 array? From what I understand, that's not the best way of doing it. I can't find the link but there was an entry on blogs.sun.com about how it's better to use a number of raidz's in a pool than have the pool consist of a single raidz array. Here's an example of that zfs configuration. And here's one that discusses performance and MTTDL.

      I was under the impression that the boards use the same type of daughter card that the TYAN server boards use. Raritan makes an IP KVM card for those boards. Have you tried contacting Raritan to see if you could use that instead?

      --
      Dual Opteron < $600
    14. Re:63 x 48 = 3024Tb by isorox · · Score: 1

      No, it's not the best way of doing it for a given value of best (performance). Our value was more storage density and disk failure. We offset the chance of getting random corruption by weekly scrubs and nightly rsyncs.

  19. Whoopsie! by Profane+MuthaFucka · · Score: 1

    That wasn't the ribbon, it was the powercord! Someone's going to be embarrassed!

    --
    Fascism trolls keeping me up every night. When I starts a preachin', he HITS ME WITH HIS REICH!
  20. 3PB or 4.5PB by Anonymous Coward · · Score: 0

    I guess /.'s readers can no longer multiply, but 63 servers * 48TB/server = 3024TB =~ 3PB.

    I'm guessing they had 1.5PB already?

    Andy

      P.S. yes, I'm looking for a class 8 truck and a set of hydraulic jacks... but before I steal the Internet Archive, as a consumer, I DEMAND that the entire thing fit in my shirt pocket, and have an Apple logo on it!!!

  21. 63 x 48 =3024 by eotwawki · · Score: 0, Redundant

    So where does the 4.5PB come into this?

  22. You can ship it over OC-192... by Ungrounded+Lightning · · Score: 4, Interesting

    ... one would assume that something like this does regular off-site back-ups, which must add up to a hell of a-lot,..

    As I recall from one of Brewster's talks: Part of the idea was that you can install redundant copies of this data center around the world and keep 'em synced.

    You can ship 4.5 petabytes over a single OC-192 link in about 71 days.
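
    For anyone who wants to check the estimate, here is a rough sketch. It assumes decimal petabytes and the nominal OC-192 line rate of 9953.28 Mbit/s with no framing or protocol overhead, which is optimistic; the function names are only illustrative. Under those assumptions the transfer comes out nearer 42 days, so the 71-day figure would correspond to an effective throughput of roughly 5.9 Gbit/s.

```python
# Time to move a data set over a link, and the throughput a given
# duration implies. Assumes decimal petabytes and zero overhead.
OC192_BITS_PER_S = 9_953.28e6    # nominal OC-192 line rate
SECONDS_PER_DAY = 86_400


def transfer_days(petabytes: float, bits_per_s: float) -> float:
    """Days needed to send the data at a constant bit rate."""
    total_bits = petabytes * 1e15 * 8
    return total_bits / bits_per_s / SECONDS_PER_DAY


def implied_gbit_per_s(petabytes: float, days: float) -> float:
    """Average rate in Gbit/s needed to finish within the given days."""
    total_bits = petabytes * 1e15 * 8
    return total_bits / (days * SECONDS_PER_DAY) / 1e9


days_at_line_rate = transfer_days(4.5, OC192_BITS_PER_S)
rate_for_71_days = implied_gbit_per_s(4.5, 71)
print(f"{days_at_line_rate:.1f} days at line rate")
print(f"{rate_for_71_days:.2f} Gbit/s implied by 71 days")
```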

    --
    Bantam Dominique roosters crow a four-note song. Once you've heard it as "Happy BIRTHday" you can't NOT hear it that way
    1. Re:You can ship it over OC-192... by TheGratefulNet · · Score: 5, Funny

      You can ship 4.5 petabytes over a single OC-192 link in about 71 days.

      yeah, but just at the 70th day, someone will pick up the phone and the whole thing will have to be resent.

      --
      "It is now safe to switch off your computer."
    2. Re:You can ship it over OC-192... by Anonymous Coward · · Score: 0

      You can ship 4.5 petabytes over a single OC-192 link in about 71 days.

      Assuming you have OC-192s to every location over trans-oceanic distances dedicated only to this.

      The initial sync would probably be done locally with a trunked 10 GigE, and then ship the duplicate container to the back up location. Then send deltas over your trans-oceanic links.

      Syncing snapshots (and only sending the deltas) is very easy with ZFS (which is presumably what they're using if they're on Solaris 10).

    3. Re:You can ship it over OC-192... by Anonymous Coward · · Score: 0

      whoosh! four mods didn't get the joke??

    4. Re:You can ship it over OC-192... by aaarrrgggh · · Score: 4, Insightful

      Or, you can ship the 40' containers in just under two weeks!

    5. Re:You can ship it over OC-192... by Plaid+Phantom · · Score: 1

      But I don't think they would fit in a station wagon...

      --
      All comments are properties and trademarks of the voices in my head. Not like I'm gonna claim them.
    6. Re:You can ship it over OC-192... by Anonymous Coward · · Score: 0

      "Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway."
      -- Andrew S. Tanenbaum

      s/station wagon/shipping container/
      s/tapes/Sun Fire servers/
      s/highway/ocean currents/

    7. Re:You can ship it over OC-192... by evilviper · · Score: 1

      Or, you can ship the 40' containers in just under two weeks!

      Yeah, but RSync over tractor-trailer is still in ALPHA.

      --
      Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
    8. Re:You can ship it over OC-192... by jamiethehutt · · Score: 1

      Never underestimate the bandwidth of a cargo ship full of datacenters?

  23. Math by PowerKe · · Score: 3, Informative

    63 servers * 48 disks of 1 TB = 3024 TB. According to the announcement on archive.org, 3 petabytes would be right.

    1. Re:Math by Anonymous Coward · · Score: 0

      And so, 1 library of congress.

  24. Mental images of libraries by macraig · · Score: 1

    Riiiight... because you happen to have a really really good mental image of exactly how many rooms/shelves/books/pages are stored in the Library of Congress!

    (Which incidentally doesn't happen to be static, BTW; yo momma's LoC ain't the same size as my LoC.)

  25. Slashdotted by thefolkmetal · · Score: 1

    I don't know if I'm the only one who read it this way, but the summary makes it seem like these servers have a bit of a job on their hands as it is, what with hosting the site and doing their web-crawling/archiving...and we slashdotted this thing? We're going to blow that little metal building up.

  26. "Sun Fire" by fm6 · · Score: 3, Informative

    The new data center houses 63 Sun Fire servers

    That's not very specific. "Sun Fire" is a brand that for a while got applied to all of Sun's rack-mount servers (except for NEBS-compliant servers, which were and are called "Sun Netra"). A little confusing, of course, which is why they've started calling new SPARC boxes "Sun SPARC Enterprise" to differentiate them from those mangy x64 "Sun Fire" systems. Except that there are still SPARC systems called "Sun Fire", so I guess the confusion factor didn't get any better...

    Anyway, the specific server being used here is the Sun Fire X4500, a system with no less than 48 1 TB disks in a 4U space. Notice that this model is EOLed; presumably iarchive got a deal on some remaindered machines.

    The shipping container is something we've seen before.

    1. Re:"Sun Fire" by ximenes · · Score: 2, Informative

      Since they're using one of Sun's modular datacenters that is actually on the Sun campus, I would imagine that they got some financial incentives / support from Sun for all of this.

      The X4500 is EOL as you mention, although it was still sold a few months back. It lives on as the X4540, which really isn't that different; the main thing is it's moved to a newer Opteron processor type and is a fair bit cheaper. So they didn't really miss out on anything.

      It's kind of interesting to me that they went this route, as opposed to a bunch of servers talking to a bunch of storage separately. This seems to be an exact use case for the X4500-type system, which as far as I'm aware is pretty unique.

    2. Re:"Sun Fire" by fm6 · · Score: 4, Interesting

      This seems to be an exact use case for the X4500-type system, which as far as I'm aware is pretty unique.

      Indeed. Sun is on a density kick. Check out the X4600, which does for processing power what the X4500 did for storage.

      In both cases, there actually are competing products that are sort of the same. The most conspicuous difference is that the Sun versions cram the whole caboodle into 4 rack units per system, about half the space required by their competitors.

      More absurdly-dense Sun products:

      http://www.sun.com/servers/x64/x4240/
      http://www.sun.com/servers/x64/x4140/

      The point of these systems is that they take up less expensive rack space than equivalent competitors. They're also "greener": if you broke all that storage and computing power down into less dense systems, you'd need a lot more electricity to run them and keep them cool. That not only saves money, it gives the owner the ability to claim they're working on the carbon footprint.

    3. Re:"Sun Fire" by Anonymous Coward · · Score: 2, Informative

      Anyway, the specific server being used here is the Sun Fire X4500 [sun.com], a system with no less than 48 1 TB disks in a 4U space. Notice that this model is EOLed; presumably iarchive got a deal on some remaindered machines.

      There are newer X4540s which are mostly the same, but have newer CPUs, and can hold more memory (16 -> 64 GB).

    4. Re:"Sun Fire" by Anonymous Coward · · Score: 0

      Anyway, the specific server being used here is the Sun Fire X4500, a system with no less than 48 1 TB disks in a 4U space.

      Speaking of density, these guys put 896 TB of raw disk (624 TB usable after RAID, overhead, and spares) in one cabinet.

    5. Re:"Sun Fire" by mgblst · · Score: 1

      So not only is the archive full of old content, it runs on old hardware as well!

    6. Re:"Sun Fire" by MrPerfekt · · Score: 1

      Because ZFS is a gross memory hog. :) These servers work out pretty well for cold storage, but put lots of random-read I/O stress on them and you'll be left wanting some 15k SCSI disks. Oh well, price vs. performance.

      --
      I just wasted your mod points! HA!
  27. they cut the ribbon? by unfunk · · Score: 1

    They cut the ribbon? How are they supposed to access that much data unless they buy a new one?

    1. Re:they cut the ribbon? by jd · · Score: 1

      Easy. Ribbon's only good for short-distance parallel links. If they've got backups in Egypt, they must be using serial cables.

      --
      It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
  28. Not "THE" but "A" internet... by denzacar · · Score: 1

    You would be stealing A backup copy of THE Internet. An incomplete one at that, but still quite extensive.

    Now... If you were somehow able to steal that copy AND break the internet... your stolen internet may be considered THE internet.

    --
    Mit der Dummheit kämpfen Götter selbst vergebens
  29. Load distributions by Just+Some+Guy · · Score: 1

    From TFA (yeah, I know):

    a Web site that gets about 200,000 visitors a day or about 500 hits per second on the 4.5 petabyte database.

    So they get all 200,000 hits in a 7-minute window? I picture a sysadmin going insane for a few moments then napping in a hammock for the rest of the day.

    --
    Dewey, what part of this looks like authorities should be involved?
    1. Re:Load distributions by diablovision · · Score: 1

      hit != visitor

      --
      120 characters isn't enough to explain it.
    2. Re:Load distributions by JesseMcDonald · · Score: 1

      I imagine that each visitor generates more than one database hit. This /. article page is made up of over thirty files, for example, each of which would probably count as at least one hit. By my simplistic calculations each visitor would need to generate about 212 database hits to maintain an average of 500 hits/s, which, while high, is not entirely unreasonable.
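
      The arithmetic behind both readings of TFA's numbers can be sketched in a few lines of Python. This assumes traffic is spread evenly over a full 86,400-second day, which is a simplification; the function names are invented for the example. Under that assumption the per-visitor figure lands within a couple of percent of the number quoted above.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 s, assuming an evenly loaded day


def hits_per_visitor(hits_per_second, visitors_per_day):
    """Average database hits each visitor must generate to sustain the rate."""
    return hits_per_second * SECONDS_PER_DAY / visitors_per_day


def burst_window_minutes(total_hits, hits_per_second):
    """How long the traffic would last if every visitor were a single hit."""
    return total_hits / hits_per_second / 60


per_visitor = hits_per_visitor(500, 200_000)
window = burst_window_minutes(200_000, 500)
print(f"{per_visitor:.0f} hits per visitor, or a {window:.1f}-minute burst")
```

      The second function is the "one hit per visitor" reading from the parent comment, where all the day's traffic fits in a few minutes.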

      --
      "The state is that great fiction by which everyone tries to live at the expense of everyone else." - Bastiat
  30. Re:Where do they store 4.5PB off site by Chosen+Reject · · Score: 1, Offtopic

    [subject correction]
    PB, not TB... hehe.

    --
    Stop Global Warming!
    Just say no to irreversible processes!
  31. Minor problem with weaponized sharks by Anonymous Coward · · Score: 0

    Nothing is stopping you from putting it inside a building, cementing it into its foundation, or surrounding it with appropriately weaponized sharks.

    The presence of weaponized sharks implies the need for a moat. Somehow I doubt the city and county governments would appreciate its construction on the premises, as the presence of said sharks would preclude passing it off as a swimming pool.

  32. Fool by Anonymous Coward · · Score: 0

    The internets isn't like a truck! It's a series of tubes!

  33. 4.5 PB... yumm by Anonymous Coward · · Score: 0

    Just imagine what you could do with a beowulf cluster of 4.5 PB datacenters. You could create regular archives of the internet archives!

    Actually, I was thinking the largest collection of pr0n the world has ever seen (to date.)

  34. Ok, how do you backup this thing ? by slashdotlurker · · Score: 1

    Inquiring minds want to know.

    1. Re:Ok, how do you backup this thing ? by Zapotek · · Score: 1

      Compressed and incrementally?
      Or is it incrementally compressed?
      I'm no expert anyways. :)

  35. The off-site backup IS the Internet. by billstewart · · Score: 4, Funny

    They're keeping the offsite backup distributed around the Internet, using the World-Wide Web to store it in real time.

    Part of it may even be on *your* machine! We've really got to stop Brewster from leeching all your storage and make him store his backup himself - this business of using the originals to back up the backup just isn't sustainable!

    --

    Bill Stewart
    New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
    1. Re:The off-site backup IS the Internet. by PopeGumby · · Score: 1

      They're keeping the offsite backup distributed around the Internet, using the World-Wide Web to store it in real time.

      I heard that sometimes the backup even gets updated BEFORE the proper archive? As much as two months before!

      What's up with that?

  36. oblig. by indi0144 · · Score: 1

    Yo dawg we hear you like to back up your internets so we put a backup container inside your backup container so you can back up your back up while you back up your back up.

  37. RAID Array/File system? by Anonymous Coward · · Score: 0

    So there are 48 1TB discs running in parallel? What does that mean? RAID1? Well, we know it can't mean that, so RAID5? Seems like it couldn't mean that either. Maybe multiple RAID5 arrays with 6 discs per array?

    Maybe some sort of nifty new filesystem I don't know about?

    So Slashdot, how do you think the discs are arranged and what filesystem do you think they use?

    What's the biggest tape backup one can buy nowadays?

  38. 4500 TB... by Anonymous Coward · · Score: 0

    Letsee...4.5 PB = 4500 TB

    A Linksys NAS200 can support two 1 TB hard drives, which is 1 TB of storage configured as RAID 1.

    Thus, 4500 NAS200 boxes could hold that much data. Dang. My house only has a 200 A main; I don't even have enough electrical service to run all of them. And my wife would have a fit when our electric bill showed up.
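
    A rough Python sketch of the box count, plus the power budget each box would have to stay under. The 240 V service voltage is an assumption (the post only gives the 200 A figure), no actual NAS200 power draw is claimed here, and the function names are hypothetical.

```python
import math


def boxes_needed(total_tb, drives_per_box, tb_per_drive, mirrored=True):
    """Number of NAS boxes needed; RAID 1 mirroring halves usable capacity."""
    usable_per_box = drives_per_box * tb_per_drive / (2 if mirrored else 1)
    return math.ceil(total_tb / usable_per_box)


def watts_available_per_box(main_amps, volts, boxes):
    """Power each box could draw if the whole service fed nothing else."""
    return main_amps * volts / boxes


boxes = boxes_needed(4500, drives_per_box=2, tb_per_drive=1)
budget = watts_available_per_box(200, 240, boxes)  # 240 V is an assumption
print(f"{boxes} boxes, at most {budget:.1f} W each on a 200 A main")
```

    If a box draws more than that budget, the 200 A service really is too small, even before the rest of the house is counted.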

  39. Donate by Anonymous Coward · · Score: 0
  40. I knew it! by thatskinnyguy · · Score: 1

    ... ribbon on a new 4.5 petabyte data center [CC] housed in a metal shipping container that sits outside.

    I knew it! The internet is a big truck you can throw stuff in! What's this series of tubes business?

    --
    The game.
  41. Awesome . . . by petegas · · Score: 1

    Now we'll never have to worry about losing the old cached version of goatse

  42. Internet Archive= Censored webpages.Donors =STUPID by zymano · · Score: 1

    Why use it if webpages are being deleted from it.

    I have tried to use it before on some websites and the information was ALL DELETED.

    If a person puts a PUBLIC website up then it can be archived.

    The internet archive just shows that it has no backbone and it ISN'T interested in being a legitimate archive.

  43. way back machine dates are wrong by societyofrobots · · Score: 1

    I noticed that the dates on many webpages are entirely incorrect. For example, it says my webpage existed in 2001, when I started it in 2005 . . .

    1. Re:way back machine dates are wrong by Anonymous Coward · · Score: 0

      Is it possible someone else owned the domain name before you?

  44. Is this really fucking necessary? by Anonymous Coward · · Score: 0

    How many copies of "website under construction" GIFs do we really need to archive from 1996?

    A healthy brain forgets that which is not important enough to remember - maybe we as a planet should do the same?

    This is a total waste of energy to maintain just so a couple of nerds can exclaim, "ZOMG remember frames!?!"

  45. Re:Internet Archive= Censored webpages.Donors =STU by u38cg · · Score: 1

    Or possibly, they're just making the content unavailable until copyright expires? Seriously, they don't have any law behind what they do, so they have to tread relatively carefully in order to not cause themselves bigger problems than not being able to archive a small number of websites.

    --
    [FUCK BETA]
  46. Bandwidth by AliasMarlowe · · Score: 1

    Actually, 100km/h (62.14mph) is 27.78m/s (91.13fps). So a 20-foot container on a truck will pass any given point in about 0.219sec. That's a burst bandwidth of 20.5PB/s or 164Pb/s.

    My "fast" internet connection is more than 9 orders of magnitude slower, at a mere 100Mb/s. Now I'm really annoyed with my ISP.
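
    The same calculation as a small Python sketch, for anyone who wants to plug in a different truck. It assumes the full 4.5 PB rides in one 20-foot container and uses decimal petabytes; the function name and the comparison against a 100 Mb/s line are just for illustration.

```python
FEET_PER_METRE = 1 / 0.3048  # exact, by definition of the international foot
PB = 10**15                  # decimal petabyte, an assumption


def truck_burst_bandwidth(payload_bytes, container_length_ft, speed_kmh):
    """Bytes per second 'delivered' while the container passes a fixed point."""
    speed_ms = speed_kmh * 1000 / 3600      # km/h -> m/s
    speed_fps = speed_ms * FEET_PER_METRE   # m/s -> ft/s
    transit_s = container_length_ft / speed_fps
    return payload_bytes / transit_s, transit_s


bytes_per_s, transit_s = truck_burst_bandwidth(4.5 * PB, 20, 100)
bits_per_s = bytes_per_s * 8
ratio = bits_per_s / 100e6  # versus a 100 Mb/s home connection

print(f"transit {transit_s:.3f} s, {bytes_per_s / PB:.1f} PB/s, "
      f"{bits_per_s / PB:.0f} Pb/s, {ratio:.2e}x a 100 Mb/s line")
```

    This is burst bandwidth only; latency depends on how far the truck has to drive.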

    --
    Those who can make you believe absurdities can make you commit atrocities. - Voltaire
    1. Re:Bandwidth by rackserverdeals · · Score: 1

      That's pretty fast! I figured I had the calculation wrong.

      I've seen someone calculate truck bandwidth before when trying to decide whether to transfer backups over the wire or by wheels.

      We should get rid of wires altogether and have the internet run on trucks. :)

      --
      Dual Opteron < $600
  47. When "Skynet" becomes self-aware... by Peet42 · · Score: 1

    ...it'll have the whole canon of 1930s Film Noir to use as reference material.

  48. Obligatory by nmg196 · · Score: 1

    4.5 petabytes should be enough for anyone.

    1. Re:Obligatory by hesaigo999ca · · Score: 1

      Not me, I have a desire to get ALL the movies off of pirate bay

  49. What about all that heat? by Ukab+the+Great · · Score: 1

    63 servers with 3024 hard disks in total jammed into the confines of a metal shipping container that sits out in the sun.

    That sounds like either a recipe for disaster or a great Mythbusters episode.

  50. ZFS? by otis+wildflower · · Score: 1

    I wonder if they're running OpenSolaris/ZFS on these hosts...

    1. Re:ZFS? by rackserverdeals · · Score: 1

      Solaris 10 and ZFS according to the article.

      --
      Dual Opteron < $600
  51. Confused by imajinarie · · Score: 1

    isn't the Internet Archive the same thing as Google Cached Pages?

  52. You beat me to it by denis-The-menace · · Score: 1

    I don't know how many times I've gone to the "way back machine" to find NOTHING!

    Ever since it's been used in courts, the archive has been deliberately censored BEFORE backups even get inside! I bet most of it now is spam sites and startups-before-lawyers-get-involved sites.

    If MS takes something down, don't bother looking at archive.org. It was never copied there to begin with.

    --
    Obama's legacy: (N)othing (S)ecure (A)nywhere and (T)error (S)imulation (A)dministration
  53. math by misterjava66 · · Score: 1

    63 * 48 * 1TB = 3024TB ~= 3PB
    Where does the 4.5PB number come from?

  54. What about google.... by hesaigo999ca · · Score: 1

    OK, so google is indexing the whole web, as well as these guys.....that's great, we have a sort of redundancy should anything go wrong.