Slashdot Mirror


Distributed Storage Systems for Linux?

elambrecht asks: "We've got a _lot_ of data we'd like to archive and make sure it is accessible via the web 24/7. We've been using a NetApp for this, but that solution is just waaaay to expensive to scale. We want to move to using a cluster of Linux boxes that redundantly store and serve up the data. What are the best packages out there for this? GFS? MogileFS?"

2 of 52 comments

  1. Our crystal ball is fuzzy! by afabbro · Score: 4, Insightful
    What kind of idiotic Ask Slashdot is this? All of the important data is missing:
    • What's "a lot"? 1MB is a lot of data if you think about it. When people start talking about "a lot" of data these days, I assume they mean hundreds of terabytes. Is that what you mean?
    • What's the budget? What performance do you need? Do you need to back it up? Do you need to replicate it? Your post is sort of like "hi, I have a problem. What is the answer? Thanks!"
    Also, it's "too expensive to scale," my friend. You'd think an "Editor" like Cliffy would fix posts, but he's too lazy.

    If you can afford NetApp, why not keep with NetApp? A bunch of Linux boxes is not a storage solution. Indeed, what does Linux have to do with anything? We're talking storage here. What are you planning to do - put in 200 of them with internal SATA drives? Yeah, that'll be a lot cheaper to maintain...

    I'm not shilling for NetApp, but if you really have "a lot" of data to put "on the web" "24/7" then you need some kind of real storage solution like a NetApp or one of their competitors.

    Now go away and please take Cliff with you.

  2. We use OpenAFS by Bamfarooni · Score: 4, Insightful

    We have about 27TB of data from Mars (and add another TB per month) that we need to keep online. We have been using NetApps, but at ~$25K/TB, plus maintenance (3 years of maintenance costs about as much as a whole new system), they're just WAY too expensive for data warehousing.

    We've moved to Linux-based OpenAFS servers. A high-quality 3U box (qsol.com) loaded with 16x 300GB ATA drives costs about $8.5K and provides about 3.5TB usable (2 drives for parity, 2 drives for hot-swap). That works out to about $2.5K/TB. If your risk tolerance is higher than mine, you can get $8K for 5.5TB, or about $1.5K/TB. We really want 99.999% availability, so just to be safe, we keep a 100% redundant read-only copy on a second machine (AFS supports this beautifully, including automatic fail-over).
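    The capacity and cost figures above can be checked with a few lines of arithmetic. This is a minimal sketch that uses only the numbers from the comment (16 x 300GB drives, 2 parity, 2 hot spares, ~$8.5K per box, ~$25K/TB for NetApp). Two things are assumptions, not from the post: decimal units (1TB = 1000GB), and that the read-only replica sits on an identical second box, so mirroring doubles the hardware cost per usable TB. The raw result is 3.6TB, which the poster rounds down to "about 3.5TB".

```python
# Rough cost-per-TB arithmetic for the 3U OpenAFS box described above.
# Assumption: decimal units, 1 TB = 1000 GB.

def usable_tb(drives: int, drive_gb: int, parity: int, hot_spares: int) -> float:
    """Usable capacity once parity and hot-spare drives are set aside."""
    data_drives = drives - parity - hot_spares
    return data_drives * drive_gb / 1000


def cost_per_tb(box_cost: float, capacity_tb: float) -> float:
    """Hardware cost divided by usable capacity."""
    return box_cost / capacity_tb


capacity = usable_tb(drives=16, drive_gb=300, parity=2, hot_spares=2)
per_tb = cost_per_tb(8_500, capacity)

# Assumption: the read-only replica lives on an identical second box,
# so keeping a full mirror doubles the hardware cost of each usable TB.
mirrored_per_tb = 2 * per_tb

print(f"usable: {capacity:.1f} TB")                  # usable: 3.6 TB
print(f"single box: ${per_tb:,.0f}/TB")              # single box: $2,361/TB
print(f"with mirror: ${mirrored_per_tb:,.0f}/TB")    # with mirror: $4,722/TB
print(f"NetApp at ~$25K/TB: {25_000 / mirrored_per_tb:.1f}x the mirrored cost")  # 5.3x
```

    Even with a full mirror, the Linux boxes come in at roughly a fifth of the quoted NetApp price per TB, before NetApp's maintenance costs are counted.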

    OpenAFS has a couple of features that make it better than NFS (client-side cache, for instance), but it also has a few drawbacks, such as no support for files larger than 2GB.
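    Before moving an archive onto a filesystem with a file-size cap like this, it helps to find the files that would not fit. A minimal sketch, assuming the cap is 2**31 - 1 bytes (the largest value a signed 32-bit offset can hold); the exact limit for a given OpenAFS version is an assumption here, and `oversized_files` is a hypothetical helper, not part of OpenAFS.

```python
import os

# Assumption: files must fit in a signed 32-bit offset,
# i.e. at most 2**31 - 1 bytes (just under 2 GiB).
MAX_FILE_BYTES = 2**31 - 1


def oversized_files(root: str, limit: int = MAX_FILE_BYTES) -> list[tuple[str, int]]:
    """Walk `root` and return (path, size) for every regular file larger than `limit`."""
    too_big = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # skip symlinks; only real files are copied
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if size > limit:
                too_big.append((path, size))
    return too_big
```

    For example, `oversized_files("/archive")` (a placeholder path) lists the files that would need to be split or stored elsewhere before the move.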