Which OSS Clustered Filesystem Should I Use?
Dishwasha writes "For over a decade I have had arrays of 10-20 disks providing larger than normal storage at home. I have suffered twice through complete loss of data once due to accidentally not re-enabling the notification on my hardware RAID and having an array power supply fail and the RAID controller was unable to recover half of the entire array. Now, I run RAID-10 manually verifying that each mirrored pair is properly distributed across each enclosure. I would like to upgrade the hardware but am currently severely tied to the current RAID hardware and would like to take a more hardware agnostic approach by utilizing a cluster filesystem. I currently have 8TB of data (16TB raw storage) and am very paranoid about data loss. My research has yielded 3 possible solutions: Luster, GlusterFS, and Ceph."
Read on for the rest of Dishwasha's question.
"Lustre is well accepted and used in 7 of the top 10 supercomputers in the world, but it has been sullied by the buy-off of Sun to Oracle. Fortunately the creator seems to have Lustre back under control via his company Whamcloud, but I am still reticent to pick something once affiliated with Oracle and it also appears that the solution may be a bit more complex than I need. Right now I would like to reduce my hardware requirements to 2 servers total with an equal number of disks to serve as both filesystem cluster servers and KVM hosts."
"GlusterFS seems to be gaining a lot of momentum now having backing from Red Hat. It is much less complex and supports distributed replication and directly exporting volumes through CIFS, but doesn't quite have the same endorsement as Lustre."
"Ceph seems the smallest of the three projects, but has an interesting striping and replication block-level driver called Rados."
"I really would like a clustered filesystem with distributed, replicated, and striped capabilities. If possible, I would like to control the number of replications at a file level. The cluster filesystem should work well with hosting virtual machines in a high-available fashion thereby supporting guest migrations. And lastly it should require as minimal hardware as possible with the possibility of upgrading and scaling without taking down data."
"Has anybody here on Slashdot had any experience with one or more of these clustered file systems? Are there any bandwidth and/or latency comparisons between them? Has anyone experienced a failure and can share their experience with the ease of recovery? Does anyone have any recommendations and why?"
"GlusterFS seems to be gaining a lot of momentum now having backing from Red Hat. It is much less complex and supports distributed replication and directly exporting volumes through CIFS, but doesn't quite have the same endorsement as Lustre."
"Ceph seems the smallest of the three projects, but has an interesting striping and replication block-level driver called Rados."
"I really would like a clustered filesystem with distributed, replicated, and striped capabilities. If possible, I would like to control the number of replications at a file level. The cluster filesystem should work well with hosting virtual machines in a high-available fashion thereby supporting guest migrations. And lastly it should require as minimal hardware as possible with the possibility of upgrading and scaling without taking down data."
"Has anybody here on Slashdot had any experience with one or more of these clustered file systems? Are there any bandwidth and/or latency comparisons between them? Has anyone experienced a failure and can share their experience with the ease of recovery? Does anyone have any recommendations and why?"
LVM, mdadm & Ext4 or ZFS seems like it would be more then adequate for this. A 2U server can hold 36TB of raw data with software raid and consumer disks. 2.5" would be preferable for home use considering power usage unless your a fellow Canadian; in which case servers make great space heaters.
If you have more than one server then it's pretty easy to set up rsync with rolling backups (rsnapshot or rdiff-backup or whatever) which is more of a proper backup solution. It's also probably a bit easier to administrate than a clusterfs.
Having said that, Hadoop's HDFS looks quite good. AFAIK it is pretty robust, and it runs on top of an existing FS so you won't need to repartition, which is useful. FUSE file system driver, and Java, will be a bit slower than in-kernel, but probably not an issue for bulk data storage.
Oh, and another option is the Distributed Replicated Block Device. Though this is basically network RAID and not replication on a per file basis.
Drobo pro with 3 TB drives setup with dual redundancy will get you 18 Gigs of drive space. In the future, just swap out drives as drive sizes get larger and you can continue to expand. www.drobo.com
-----BEGIN PGP SIGNATURE-----
12345
-----END PGP SIGNATURE-----
Lustre is pretty cool, but it's not magic pixie dust. It won't break the laws of physics and somehow make a single node faster than it would be as a NFS server. It's for situations when a single file server doesn't have the bandwidth to handle lots of simultaneous readers and writers. A "small" Lustre filesystem these days usually has 8-16 object storage servers serving mid-high tens of TB. The high end filesystems have literally hundreds of OSSes and multiple PB served. The largest I know of right now is the 5PB Spider filesystem at Oak Ridge National Labs.
One nice thing about Lustre on the low end is that you can grow it... Start out small and add new OSSes and OSTs as you need them. This often makes sense in Life Sciences and digital animation scenarios where the initial fast storage needs are unknown or the initial budget is limited (but expected to grow). But if you're never planning to get beyond the capacity of a single node or two, Lustre is just going to be overhead. I don't know much about the other clustered filesystem options.
A host is a host from coast to coast...
Unless it's down, or slow, or fails to POST!
What kind of performance are you after? If you're not after anything over 40MB/S, I'd go for unRAID. I use this at home and it's brilliant. I've replaced many drives over the years, and I've had two hard drives fail with no massive consequences (data isn't striped). Plus, many many plugins are now available. SimpleFeatures (replacement gui), Plex Media Server, SQL, Email notifications with APCUPSD support etc etc.
Yeah, subby just needs to:
Two issues here:
1. You're approaching the problem from the wrong angle. IMV, the angle you take should be "how long can can I afford to be without this data and how much money am I prepared to throw at a solution?" rather than "what technology exists that I can use to make the system more reliable?". Taking the former approach allows you to plan exactly how you'd deal with data loss - whether it's through human error, software/hardware failure, fire, theft, flood or what have you. Taking the latter approach tends to result in some whacking great Heath Robinson (or if you're American, Rube Goldberg) of a solution that still has a whacking great hole in it somewhere.
2. 8TB of data is not an enormous amount by any modern standard. You can buy a NAS box off-the-shelf today that will take 12x3TB hard disks for 36TB (18TB if you've got the good sense to run them in a RAID 1+0 configuration) of storage; at this level they typically have replication built right into them so you can buy two and replicate one to the other (though like all replication-type solutions, it's not a form of backup and you mustn't treat it as such). If that doesn't appeal, simply put a couple of SATA controllers in a cheap box and run OpenFiler. Anything you cobble together yourself based on the latest clustered filesystem du jour will suffer from one huge flaw - a system that's designed to be highly-available is frequently less reliable than one that isn't, simply because you're making it that much more complicated that there's a lot more to go wrong.