Distributed Storage Systems for Linux?
elambrecht asks: "We've got a _lot_ of data we'd like to archive and make sure it is accessible via the web 24/7. We've been using a NetApp for this, but that solution is just waaaay too expensive to scale. We want to move to using a cluster of Linux boxes that redundantly store and serve up the data. What are the best packages out there for this? GFS? MogileFS?"
Check out Lustre at http://www.lustre.org/. It's being developed and used by the DOE on a lot of supercomputer cluster systems, for multi-terabyte storage stuff.
Panasas http://www.panasas.com/products_overview.html has some products which probably fit your requirement of high speed distributed storage.
Get A Centera.
I'm biased, but this is a high-level, Linux-based storage system done right. It's not easy to create a coherent storage system out of lots of separate machines; the software that runs on this cluster does a lot of work. This thing is fully redundant with no single point of failure, dynamically expandable without even taking it offline, and it scales to hundreds of terabytes and manages all that content continuously (scanning for corruption and fixing it, garbage collecting, etc.). The cluster has redundant backend networks and parallel paths everywhere, and it even uses reiserfs to store the data. There's a lot of good engineering in this unit, and they sell it at a decent price compared to NAS boxes.
Check it out:
http://www.emc.com/products/systems/centera.jsp
I do work for EMC (like I said.. I'm biased) but I don't speak for them, my opinions are my own.
Storage clustering is simply hard to do while still presenting a low level filesystem interface. Tossing that out and creating file storage as a high level service with a richer interface seems like the right approach to me. Show me a storage clustering solution that doesn't do that and I'll show you something full of bugs, expandability issues, limitations, and pain points.
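To make the "high level service" idea concrete, here is a minimal sketch of content-addressed storage with a background scrub, written in Python. It is not Centera's interface or implementation; the `ContentStore` class, its `put`/`get`/`scrub` methods, and the use of local directories as stand-ins for storage nodes are all assumptions made for illustration. The point is only that a client hands over a blob and gets back an address derived from the content, which lets the service verify and repair replicas without exposing a low-level filesystem.

```python
import hashlib
import tempfile
from pathlib import Path


class ContentStore:
    """Toy content-addressed store; each directory stands in for one storage node."""

    def __init__(self, node_dirs):
        self.nodes = [Path(d) for d in node_dirs]
        for node in self.nodes:
            node.mkdir(parents=True, exist_ok=True)

    @staticmethod
    def _digest(data: bytes) -> str:
        # The SHA-256 hex digest of the content doubles as its address.
        return hashlib.sha256(data).hexdigest()

    def put(self, data: bytes) -> str:
        """Write the blob to every node and return its content address."""
        address = self._digest(data)
        for node in self.nodes:
            (node / address).write_bytes(data)
        return address

    def get(self, address: str) -> bytes:
        """Return the first replica whose hash still matches its address."""
        for node in self.nodes:
            path = node / address
            if path.exists():
                data = path.read_bytes()
                if self._digest(data) == address:
                    return data
        raise KeyError(address)

    def scrub(self) -> int:
        """Rewrite missing or corrupt replicas from a good copy; return repair count."""
        repaired = 0
        addresses = {p.name for node in self.nodes for p in node.iterdir()}
        for address in addresses:
            try:
                good = self.get(address)
            except KeyError:
                continue  # no intact replica left; nothing to repair from
            for node in self.nodes:
                path = node / address
                if not path.exists() or self._digest(path.read_bytes()) != address:
                    path.write_bytes(good)
                    repaired += 1
        return repaired


def demo() -> int:
    """Store one blob on three nodes, corrupt one replica, then scrub."""
    with tempfile.TemporaryDirectory() as root:
        store = ContentStore([Path(root) / f"node{i}" for i in range(3)])
        address = store.put(b"archived data")
        (store.nodes[0] / address).write_bytes(b"bit rot")  # simulate corruption
        repaired = store.scrub()
        assert store.get(address) == b"archived data"
        return repaired
```

Calling `demo()` stores a blob on three hypothetical nodes, damages one replica, and lets `scrub()` restore it from an intact copy. A real system would place replicas selectively, handle node failure, and garbage collect, none of which this sketch attempts.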
set softtabstop=4 shiftwidth=4 expandtab nocp worlddomination
I did some research on clustering filesystems for work a while ago. Here's the Cliffs-notes version:
GFS: High-end, a pain in the ass to set up and run. Wants a RHEL server or two to run.
OpenGFS: Started as a fork of GFS when the GFS license changed, it has followed a bit of a different path. Not nearly as stable or fast as GFS, but might be there some day.
Lustre: Lustre should be really nice, but is horrendous to run (at least, that's the word from my friends at Sandia, who know a thing or two about it). General consensus is that you need a full-time staff member just to make it work. If you can afford that, it's a good way to go.
PVFS: Fast, light-weight, not POSIX-compatible. If your apps don't need the stuff it doesn't do, or you're willing to write some glue code for your app to speak PVFS natively instead of using the FS driver, this is a great way to go. Looks simple to set up (as simple as these things get).
^]:wq