Slashdot Mirror


PetaBox: Big Storage in Small Boxes

An anonymous reader writes "LinuxDevices.com is reporting that a Linux-based system comprising more than a petabyte of storage has been delivered to the Internet Archive, the non-profit organization that creates periodic snapshots of the Internet. The PetaBox products, made by Capricorn Technologies, are based on Via mini-ITX motherboards running Debian or Fedora Linux. The IA's PetaBox installation consists of about 16 racks housing 600 systems with 2,500 spinning drives, for a total capacity of roughly 1.5 petabytes, according to the article. Now to strap one of those puppies to my iPod!" The Internet Archive continues to astound.

20 of 295 comments

  1. Good to see. by Anonymous Coward · · Score: 5, Funny

    For all the jokes out there about people 'downloading the internet' it's good to know someone is actually doing it.

    1. Re:Good to see. by FireballX301 · · Score: 4, Funny

      Who the heck cares about the rest of the internet, can this thing hold all the pr0n?

    2. Re:Good to see. by Anonymous Coward · · Score: 5, Funny

      But does it run Lin... um.

      How about a Beo.. oh damn

    3. Re:Good to see. by Council · · Score: 5, Interesting

      In one of the weirder perspective exercises I've ever conceived:

      5 petabytes of storage is enough for a brief five-minute DVD-quality sex scene for each person of legal age in the US (two to a scene). 100 petabytes would be five minutes of porn of every pair of people in the world.

      I actually wonder about this a little; how many women have posed nude on the internet? There seem to be an awful lot; I haven't been able to see them all (though I will continue to try). Where do they mostly come from, I wonder.

      --
      xkcd.com - a webcomic of mathematics, love, and language.
    4. Re:Good to see. by Mark+Hood · · Score: 4, Funny

      There seem to be an awful lot; I haven't been able to see them all (though I will continue to try). Where do they mostly come from, I wonder.

      Let me get this straight, you're trying to see all the porn in the world, and you still don't know where babies come from? :)

      --
      Liked this comment? Why not buy me something nice
  2. You hear about the Petabox? by Dancin_Santa · · Score: 5, Funny

    Michael Jackson was heard breathing a sigh of relief. He thought it was where they sent Petafiles.

    R. Kelly was scrambling to find the company's phone number.

    1. Re:You hear about the Petabox? by pyrrhonist · · Score: 4, Funny
      Michael Jackson was heard breathing a sigh of relief. He thought it was where they sent Petafiles.

      Hmmm, this seems almost familiar...

      Let's analyze this situation:

      • The time on our posts is exactly the same.
      • There's a difference of only 3 in the post id values.
      • I was unable to foresee the R. Kelly connection.
      This can only mean one thing... You are the Kwisatz Haderach!

      GET OUT OF MY MIND!!!

      --
      Show me on the doll where his noodly appendage touched you.
  3. archive.org by Nasarius · · Score: 4, Interesting
    Internet Archive, the non-profit organization that creates periodic snapshots of the Internet.

    They do a lot more than that! I've just been downloading some Warren Zevon shows from their Live Music Archive.

    --
    LOAD "SIG",8,1
  4. copyright by DualG5GUNZ · · Score: 5, Interesting

    Not to sound like an advocate or anything... But how is it that the Internet Archive project resists claims of copyright infringement and the like when they have copies of entire websites in their records?

    --
    "I'm a philosophy major. That means I can think deep thoughts about being unemployed." -- Bruce Lee
  5. Petabox? by eclectro · · Score: 4, Funny


    Isn't that what naked girls climb out of to protest fur coats?

    Thank you, I'll be here all week.

    --
    Take the cheese to sickbay, the doctor should see it as soon as possible - B'Elanna Torres, "Learning Curve"
  6. great usage. by Bananatree3 · · Score: 4, Informative

    Seriously, I think archive.org deserves such a storage system. I have very often wanted to go back and view an archive of a website from a while ago, but the cache on Google was from yesterday. It also gives multiple archives of the website by day, which can be quite handy, especially for news-related sites. I think they quite well deserve it.

  7. 'small box' by MonoSynth · · Score: 5, Funny

    So the inventor of the microprocessor dies and suddenly the definition of 'small box' for computer components is again reduced to 'fits in a big room'....

  8. maybe i'll be quoted in 15 years.. by qda · · Score: 4, Funny

    "nobody needs more than a petabyte of storage"

  9. Slashdotted .... by theoddbot · · Score: 4, Informative
  10. A Great Historical Tool by simrook · · Score: 5, Insightful

    The Internet represents a great historical tool. A case in point is what happened on 9/11. Being able to go back and see the progression, paranoia, patriotism, and early Iraq/Afghanistan/bin Laden/Hussein posts and opinions on various news sites is amazing. CNN, Fox, the NY Times, all are archived several times on 9/11 on archive.org.

    I for one think that archive.org should turn into some UN effort, with a mission to chronicle and store daily/timely snapshots of the internet and the culture at the time, preserving it for future generations. What a tool for future historians!

    The ability to look at a large representation of society at one single critical moment in time, and being able to have first-hand sources for all that information, is something that can truly change the way history is recorded (and not in the bad Newspeak/Ingsoc way either). In fact, a holistic archive of what happens day-to-day, in an easily accessible format, might well help written history to be more representative of actual history (instead of, say, the history Bush wants us to believe: that the Iraq war was for human rights and not WMDs). I love Foucault.

    The internet archive rocks... really hope this project continues full blast.

    - Peace

    --
    'Truth' is linked in a circular relation with systems of power which produce and sustain it...
  11. Wayback and Slashdot by mcrbids · · Score: 4, Funny

    Go ahead. Try Slashdot in the wayback machine.

    Slashdot has looked virtually identical since 1998!

    --
    I have no problem with your religion until you decide it's reason to deprive others of the truth.
    1. Re:Wayback and Slashdot by hawk · · Score: 4, Funny
      Oh, c'mon. It's not that bad.

      Why, just last year they introduced an entirely new story into the rotation of duplicates . . .

      :)

      hawk

  12. Re:No redundancy? WTF? by Depili · · Score: 4, Informative

    According to archive.org (http://www.archive.org/web/petabox.php) they do indeed have some redundancy, but not RAID. They are operating each system as a separate node, and mirroring nodes. The above link also sheds light on other questions regarding TFA.

  13. They don't like RAID by billstewart · · Score: 4, Interesting
    I was a bit puzzled by that also - the article said the things come in racks of 40 or 64TB, and 16 racks times 64TB is about 1PB, not 1.5.
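
A rough sanity check of that arithmetic, as a sketch only. It uses the figures quoted in the summary and in this comment (16 racks, 40 or 64 TB per rack, 2,500 drives, 1.5 PB claimed) and assumes decimal units (1 PB = 1000 TB), which the article does not confirm. All names are made up for the illustration.

```python
# Figures quoted in the story summary and in the comment above.
RACKS = 16
TB_PER_RACK_OPTIONS = (40, 64)   # rack sizes mentioned in the article
CLAIMED_TOTAL_PB = 1.5
DRIVES = 2500

TB_PER_PB = 1000  # assumption: decimal units, not confirmed by the article


def total_pb(racks, tb_per_rack):
    """Total capacity in PB for a given number of racks and rack size."""
    return racks * tb_per_rack / TB_PER_PB


# Capacity of the whole installation for each quoted rack size.
totals = {tb: total_pb(RACKS, tb) for tb in TB_PER_RACK_OPTIONS}

# What the claimed 1.5 PB would imply per rack and per drive.
implied_tb_per_rack = CLAIMED_TOTAL_PB * TB_PER_PB / RACKS
implied_gb_per_drive = CLAIMED_TOTAL_PB * TB_PER_PB * 1000 / DRIVES

for tb, pb in totals.items():
    print(f"{RACKS} racks x {tb} TB = {pb:.3f} PB")
print(f"1.5 PB over {RACKS} racks implies {implied_tb_per_rack:.2f} TB per rack")
print(f"1.5 PB over {DRIVES} drives implies {implied_gb_per_drive:.0f} GB per drive")
```

Neither quoted rack size reaches 1.5 PB across 16 racks, so the mismatch is in the article's numbers rather than in the rounding.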

    Also, the article says they don't like RAID, due to bad experiences with RAID5, and the system is configured as JBOD (Just a Bunch Of Disks). It doesn't say why, or what users should do to get equivalent protection. My guess is that depending on RAID within a box means you're still vulnerable if the box's CPU or disk controller decides to scribble the disks, or the power supply decides to catch fire or short out and deliver 240VAC on the +5V line or whatever. So if you want a RAID-like set of redundancy, set up your applications or file system mounting or something to calculate the protection disk in software and hand it off to another 1U box for storage.

    The overhead of the motherboards here is not that high - they're about $150-200, and support 4 disks that probably cost $200-300 each, so they're only about 20% of the cost, which is not bad. The article didn't say they're using SATA, and it sounded like it's some IDE variant instead, but if you're only using 100 Mbps Ethernet to connect to the box and not the optional GigE, it's not the bottleneck anyway. If you wanted an alternative design, you could probably do something with a couple of 4-way SATA controllers per CPU, with a lot of disks stacked vertically in a 3-4U box looking like an X-serve or something. But that wouldn't necessarily have much of an advantage.
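
The cost share can be bracketed from the price ranges guessed at above. This is a sketch under those guesses ($150-200 per board, $200-300 per disk, 4 disks per board); it ignores chassis, power supply, and everything else, and the names are hypothetical.

```python
# Price guesses taken from the comment above, in US dollars.
BOARD_COST_USD = (150, 200)
DISK_COST_USD = (200, 300)
DISKS_PER_BOARD = 4


def board_share(board_cost, disk_cost, disks=DISKS_PER_BOARD):
    """Fraction of (board + disks) cost that goes to the motherboard."""
    return board_cost / (board_cost + disks * disk_cost)


# Cheapest board with priciest disks gives the lowest share, and vice versa.
lowest = board_share(min(BOARD_COST_USD), max(DISK_COST_USD))
highest = board_share(max(BOARD_COST_USD), min(DISK_COST_USD))

print(f"motherboard share of node cost: {lowest:.0%} to {highest:.0%}")
```

With these inputs the share runs from about 11% to 20%, so "about 20%" is the upper end of the range.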

    --

    Bill Stewart
    New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
  14. Two points by Salamander · · Score: 4, Interesting

    First off, this isn't quite an example of a company suddenly deciding to donate stuff to the Archive. As can be seen on their own website, Capricorn was spun off from the Archive on July 1, 2004. To a large extent, Capricorn exists for the specific purpose of providing storage to the Archive, and if that same storage can be sold to others so much the better.

    Second, what about interconnects and performance? The product descriptions say nothing about SCSI or FC or other storage-oriented connectivity, so one must assume that the connection to these boxes is through a network. That would mean each node is an NFS server (or similar), serving up 1.6TB using a 1GHz C3 processor, a maximum of 1GB of memory (for caching etc.) and what appears to be a single GigE link. Can you say unbalanced? The Internet Archive might be the only system with an access pattern so sparse that the ratio between capacity and performance wouldn't be crippling. Don't try using one of these with any other kind of application if performance is a concern...and BTW they don't seem to say anything about high availability or other storage functionality (e.g. integrated backup or snapshots) either. Capricorn's big play seems to be power consumption, but there are other players that can beat them on density (e.g. Copan with 224TB per rack) and multitudes who can offer better performance/functionality. I hate to sound negative, but this is a product so specialized as to be uninteresting.
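
To put a number on "unbalanced", here is a sketch of how long it would take just to read one full node over its network link. It assumes the 1.6TB per node quoted above, decimal units, and a link running at full line rate with no protocol or disk overhead, which is optimistic. The 100 Mbps case is the default Ethernet speed mentioned elsewhere in this thread. Function and variable names are made up for the example.

```python
NODE_CAPACITY_TB = 1.6  # per-node capacity quoted above
LINK_SPEEDS_MBPS = {"100 Mbps Ethernet": 100, "GigE": 1000}


def hours_to_read_node(capacity_tb, link_mbps):
    """Hours to stream a node's full capacity at line rate (best case)."""
    bits = capacity_tb * 1e12 * 8          # decimal TB -> bits
    seconds = bits / (link_mbps * 1e6)     # Mbps -> bits per second
    return seconds / 3600


for name, mbps in LINK_SPEEDS_MBPS.items():
    hours = hours_to_read_node(NODE_CAPACITY_TB, mbps)
    print(f"{name}: {hours:.1f} hours to read {NODE_CAPACITY_TB} TB")
```

Even in this best case a full read takes hours over GigE and more than a day over 100 Mbps, which is fine for an archive that is rarely read end to end and painful for most other workloads.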

    Disclaimer: I think I met some of the Copan guys once and they seemed cool enough, but there's no other relationship between me and them. That just happened to be the first name I thought of in this space.

    --
    Slashdot - News for Herds. Stuff that Splatters.