Slashdot Mirror


Long-Term Storage of Moderately Large Datasets?

hawkeyeMI writes "I have a small scientific services company, and we end up generating fairly large datasets (2-3 TB) for each customer. We don't have to ship all of that, but we do need to keep some compressed archives. The best I can come up with right now is to buy some large hard drives, use software RAID in Linux to make a RAID 5 set out of them, and store them in a safe deposit box. I feel like there must be a better way for a small business, but despite some research into Blu-ray, I've not been able to find a good, cost-effective alternative. A tape library would be impractical at the present time. What do you recommend?"

5 of 411 comments

  1. Different manufacturers by idiot900 · Score: 4, Insightful

    Hard drives are ridiculously cheap these days, especially for how much data you are storing. You may wish to consider buying drives from different manufacturers but of the same size to put in a single mirrored set. This way if there is a problem with a particular batch of drives it won't ruin everything.

  2. Re:Exactly what you're doing by hardburn · Score: 4, Insightful

    That's why you hot-swap them. You treat them just like tapes. In fact, once you start doing that, you realize that RAID mirroring isn't helping you any (striping is another matter).

    The best way to back up a big hard drive these days is with another big hard drive.

    --
    Not a typewriter

  3. Re:Exactly. by Anonymous Coward · Score: 5, Insightful

    Ok, yes, we see you know a lot about this.

    So what's your recommendation?

  4. Re:Exactly what you're doing by Again · Score: 4, Insightful

    (Or btrfs on a Linux distro)

    Are you honestly suggesting using an in-development filesystem for backup purposes?

  5. Re:Exactly. by TooMuchToDo · Score: 4, Insightful

    Either MogileFS, Lustre, or possibly Hadoop (depending on the type and size of the data). Any sort of distributed file system where multiple chunks, replicas, etc. (3 is a good number; more is better if you have cheap disk and deduping at the filesystem level) are constantly available.

    Feel free to ask more questions.
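
    To make the replica idea concrete, here is a minimal Python sketch. It assumes plain directories stand in for separate disks or nodes, and it is not how MogileFS, Lustre, or Hadoop work internally. The names sha256_of, replicate, and damaged_replicas are hypothetical, made up for illustration.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so multi-GB archives never sit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replicate(archive: Path, replica_dirs: list[Path]) -> str:
    """Copy the archive into every replica directory (assumed to be separate
    disks) and return the source checksum for later verification."""
    expected = sha256_of(archive)
    for directory in replica_dirs:
        directory.mkdir(parents=True, exist_ok=True)
        target = directory / archive.name
        shutil.copy2(archive, target)
        # Re-read the copy so a bad write is caught immediately.
        if sha256_of(target) != expected:
            raise IOError(f"copy to {target} does not match source")
    return expected


def damaged_replicas(name: str, replica_dirs: list[Path], expected: str) -> list[Path]:
    """Return the replica directories whose copy is missing or corrupted."""
    bad = []
    for directory in replica_dirs:
        target = directory / name
        if not target.is_file() or sha256_of(target) != expected:
            bad.append(directory)
    return bad
```

    Running damaged_replicas on a schedule, with the checksum that replicate returned, would show which copies need to be rebuilt from a healthy one. With three replicas, a single bad disk still leaves two good copies to restore from.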