Slashdot Mirror


Ask Slashdot: How Do You Store a Half-Petabyte of Data? (And Back It Up?)

An anonymous reader writes: My workplace has recently had two internal groups step forward with a request for almost a half-petabyte of disk to store data. The first is a research project that will computationally analyze a quarter petabyte of data in 100-200MB blobs. The second is looking to archive an ever increasing amount of mixed media. Buying a SAN large enough for these tasks is easy, but how do you present it back to the clients? And how do you back it up? Both projects have expressed a preference for a single human-navigable directory tree. The solution should involve clustered servers providing the connectivity between storage and client so that there is no system downtime. Many SAN solutions have a maximum volume limit of only 16TB, which means some sort of volume concatenation or spanning would be required, but is that recommended? Is anyone out there managing gigantic storage needs like this? How did you do it? What worked, what failed, and what would you do differently?
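For a rough sense of scale, the figures in the question can be turned into a back-of-the-envelope calculation. This is only a sketch: it assumes decimal units (1 TB = 10^12 bytes), ignores RAID, filesystem, and replication overhead, and the function names are made up for illustration.

```python
# Decimal storage units (an assumption; vendors and filesystems may use binary units).
MB = 10**6
TB = 10**12
PB = 10**15


def volumes_needed(total_bytes, volume_limit_bytes):
    """Smallest number of volumes of the given size that can hold total_bytes."""
    # Integer ceiling division avoids floating-point rounding.
    return -(-total_bytes // volume_limit_bytes)


def blob_count_range(total_bytes, min_blob_bytes, max_blob_bytes):
    """(fewest, most) blobs, depending on whether blobs are all large or all small."""
    fewest = -(-total_bytes // max_blob_bytes)
    most = -(-total_bytes // min_blob_bytes)
    return fewest, most


if __name__ == "__main__":
    # Half a petabyte spread over volumes capped at 16TB each.
    print(volumes_needed(PB // 2, 16 * TB))
    # A quarter petabyte stored as 100-200MB blobs.
    print(blob_count_range(PB // 4, 100 * MB, 200 * MB))
```

Under those assumptions, a half-petabyte needs at least 32 volumes of 16TB each, and the research data set works out to somewhere between 1.25 million and 2.5 million blobs, which is why a single navigable directory tree implies concatenation, spanning, or a clustered filesystem.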

2 of 219 comments

  1. Depends who you ask... by snowgirl · · Score: 4, Interesting

    At Facebook, it's memcached, with an HDD backup, eventually put onto tape...

    At Google, it's a ramdisk, backed up to SSD/HDD, eventually put onto tape...

    For anyone who can't afford half a petabyte of RAM with the commensurate number of computers? I have no good ideas... except maybe RAM cache of SSD, cache of HDD, backed up on tape...

    Using something like HDFS to store your data in a Hadoop cluster that serves the file requests is likely the best F/OSS solution you're going to get for that...

    --
    WARNING! This girl exceeds the MAXIMUM SAFE standards established by the FDA for BRATTINESS
  2. You're asking like you will be implementing it... by tlambert · · Score: 4, Interesting

    You're asking like you will be implementing it... don't.

    Gather all their requirements, gather your requirements on top of it (I'm pretty confident that some of those requirements were your additions for "you'd be an idiot to have that, but not also have this...", possibly including the backup).

    Then put out a preliminary RFP to the major storage vendors, including asking them what they'd say you'd missed in the preliminary.

    Then take the recommendations they make on top of the preliminary with a grain of salt, since most of them will be intended to ensure vendor lock-in to their solution set. Revise the preliminary, and put out a final RFP.

    Then accept the bid that you like which management is willing to approve.

    Problem solved.

    P.S.: You don't have to grow everything yourself from seed you genetically modify yourself, you know...