Which OSS Clustered Filesystem Should I Use?
Dishwasha writes "For over a decade I have had arrays of 10-20 disks providing larger than normal storage at home. I have twice suffered complete loss of data, once because I accidentally did not re-enable the notification on my hardware RAID, an array power supply failed, and the RAID controller was unable to recover half of the entire array. Now I run RAID-10, manually verifying that each mirrored pair is properly distributed across each enclosure. I would like to upgrade the hardware but am currently severely tied to the current RAID hardware and would like to take a more hardware-agnostic approach by utilizing a cluster filesystem. I currently have 8TB of data (16TB raw storage) and am very paranoid about data loss. My research has yielded 3 possible solutions: Lustre, GlusterFS, and Ceph."
Read on for the rest of Dishwasha's question.
"Lustre is well accepted and used in 7 of the top 10 supercomputers in the world, but it has been sullied by the buy-off of Sun to Oracle. Fortunately the creator seems to have Lustre back under control via his company Whamcloud, but I am still reticent to pick something once affiliated with Oracle and it also appears that the solution may be a bit more complex than I need. Right now I would like to reduce my hardware requirements to 2 servers total with an equal number of disks to serve as both filesystem cluster servers and KVM hosts."
"GlusterFS seems to be gaining a lot of momentum now having backing from Red Hat. It is much less complex and supports distributed replication and directly exporting volumes through CIFS, but doesn't quite have the same endorsement as Lustre."
"Ceph seems the smallest of the three projects, but has an interesting striping and replication block-level driver called Rados."
"I really would like a clustered filesystem with distributed, replicated, and striped capabilities. If possible, I would like to control the number of replications at a file level. The cluster filesystem should work well with hosting virtual machines in a high-available fashion thereby supporting guest migrations. And lastly it should require as minimal hardware as possible with the possibility of upgrading and scaling without taking down data."
"Has anybody here on Slashdot had any experience with one or more of these clustered file systems? Are there any bandwidth and/or latency comparisons between them? Has anyone experienced a failure and can share their experience with the ease of recovery? Does anyone have any recommendations and why?"
"GlusterFS seems to be gaining a lot of momentum now having backing from Red Hat. It is much less complex and supports distributed replication and directly exporting volumes through CIFS, but doesn't quite have the same endorsement as Lustre."
"Ceph seems the smallest of the three projects, but has an interesting striping and replication block-level driver called Rados."
"I really would like a clustered filesystem with distributed, replicated, and striped capabilities. If possible, I would like to control the number of replications at a file level. The cluster filesystem should work well with hosting virtual machines in a high-available fashion thereby supporting guest migrations. And lastly it should require as minimal hardware as possible with the possibility of upgrading and scaling without taking down data."
"Has anybody here on Slashdot had any experience with one or more of these clustered file systems? Are there any bandwidth and/or latency comparisons between them? Has anyone experienced a failure and can share their experience with the ease of recovery? Does anyone have any recommendations and why?"
Wait a minute. I'm a manager, and I've been reading a lot of case studies and watching a lot of webcasts about The Cloud. Based on all of this glorious marketing literature, I, as a manager, have absolutely no reason to doubt the safety of any data put in The Cloud.
The case studies all use words like "secure", "MD5", "RSS feeds" and "encryption" to describe the security of The Cloud. I don't know about you, but that sounds damn secure to me! Some Clouds even use SSL and HTTP. That's rock solid in my book.
And don't forget that you have to use Web Services to access The Cloud. Nothing is more secure than SOA and Web Services, with the exception of perhaps SaaS. But I think that Cloud Services 2.0 will combine the tiers into an MVC-compliant stack that uses SaaS to increase the security and partitioning of the data.
My main concern isn't with the security of The Cloud, but rather with getting my Indian team to learn all about it so we can deploy some first-generation The Cloud applications and Web Services to provide the ultimate platform upon which we can layer our business intelligence and reporting, because there are still a few verticals that we need to leverage before we can move to The Cloud 2.0.
RAID is not a backup solution!
Get a girlfriend.
-- if it doesn't succeed in protecting your data, it'll make your wife die trying.
Is the only reason you're looking at a clustered filesystem that you don't want to lose data? Because if it is, it's probably not what you want. The purpose of a clustered filesystem is to minimize downtime in the face of a hardware failure. You still need a backup in the case of a software failure or in case you fat finger something, because a mass deletion can replicate to all copies.
Where is PronFS when we desperately need one?
In theory there is no difference between theory and practice. In practice there is. - Yogi Berra
Would recommend you look at nagios monitoring - you can monitor your raid with that. Has saved me a number of times (always nice to be notified when something fails).
20 disks seems like overkill for your storage needs. It seems like the more disks you use, the greater the risk of failure of one or more of them. Also, your electricity bill must be through the roof. I have 4 3TB drives with a 3Ware controller in a RAID5 array, which gives me the same storage capacity with 1/5th the drives. Aren't you making this more complicated than it needs to be? ...Maybe that's the point?
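For anyone who wants to check the capacity arithmetic behind comparisons like this, here is a minimal Python sketch. The function name usable_tb is made up for illustration, the formulas are the idealised ones (no filesystem overhead, no hot spares), and the 8 x 2TB layout for the original poster's RAID-10 is an assumption, since the question only gives 16TB raw.

```python
def usable_tb(level: str, disks: int, disk_tb: float) -> float:
    """Idealised usable capacity in TB for a few common RAID levels.

    Hypothetical helper for illustration only; real arrays lose a little
    more to metadata, spares and filesystem overhead.
    """
    if level == "raid10":
        if disks < 2 or disks % 2:
            raise ValueError("RAID-10 needs an even number of disks")
        return disks * disk_tb / 2        # every block is stored twice
    if level == "raid5":
        if disks < 3:
            raise ValueError("RAID5 needs at least 3 disks")
        return (disks - 1) * disk_tb      # one disk's worth of parity
    if level == "raid6":
        if disks < 4:
            raise ValueError("RAID6 needs at least 4 disks")
        return (disks - 2) * disk_tb      # two disks' worth of parity
    raise ValueError(f"unknown RAID level: {level}")


if __name__ == "__main__":
    # 4 x 3TB in RAID5, as described in the comment above
    print(usable_tb("raid5", 4, 3.0))
    # Assumed layout for the original poster: 8 x 2TB in RAID-10 (16TB raw)
    print(usable_tb("raid10", 8, 2.0))
```

Under those assumptions the two setups land within a terabyte of each other on capacity, but they tolerate different failure patterns, which is a separate question from size.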
...I just came for the free beer.
LVM, mdadm & Ext4 or ZFS seems like it would be more than adequate for this. A 2U server can hold 36TB of raw data with software RAID and consumer disks. 2.5" would be preferable for home use considering power usage, unless you're a fellow Canadian, in which case servers make great space heaters.
Set up a mirrored server at a parent's or relative's house, pre-seeded, and run rsync jobs to it. Add more storage and you can do generations too.
You ask about the technical specifications; but, when commenting regarding the three likely candidates you found, you've put philosophical objections first and foremost. I think you first need to figure out which factor is more important to you - specs, or philosophy. Otherwise you're probably going to waste a lot of time arguing in circles.
#DeleteChrome
We've had a few problems with Gluster (nodes getting out of sync and corrupting data - despite following the docs to the letter). Very nice in theory, and will be great if the stability gets a bit of work, but until then I'm hesitant to recommend it. We've also found the performance a bit lacking.
How about ZFS with your RAID controllers in single drive mode (or worst case JBOD)? Let ZFS handle the vdevs as mirrors or raidz1/2 as you wish. ZFSforLinux is rapidly maturing and definitely stable enough for a home nas. Or go the OpenIndiana route if that's what you're comfortable with.
My 4TB setup has actually been a joy to maintain since committing to ZFS, with BTRFS waiting in the wings. The only downside is biting the bullet and using modern CPUs and 4-8GB memory. Recommissioning old hardware isn't the ideal way to go, ymmv.
Just a thought.
We have been using OCFS (Oracle Cluster File System) for some time in production between a few different servers.
Now, I am not a sysadmin so can't comment on that aspect. I'm like a product manager type, so I only really see two sides of it: 1) when it is working normally and everything is fine 2) when it stops working and everything is broken.
Overall from my perspective, I would rate it as "satisfactory". The "working normally" aspect is most of the time; everything is relatively seamless - we add new content to our servers using a variety of techniques (HTTP uploads, FTP uploads, etc) and they are all magically distributed to the nodes.
Unfortunately we have had several problems where something happens to the node and it seems to lose contact with the filesystem or something. At that point the node pretty much becomes worthless and needs to be rebooted, which seems to fix the problem (there might be other less drastic measures but this seems to be all we have at the moment).
So far this has JUST been not annoying enough for us to look at alternatives. Downtime hasn't been too bad overall; now we know what to look for we have alarming and stuff set up so we can catch failures a little bit sooner before things spiral out of control.
I have very briefly looked at the alternatives listed in the OP and look forward to reading what other readers' experiences are like with them.
Why are you spending your money like that? Sneaker-net your drives to AWS EBS. It's a no-brainer.
I was going to say Lustre, but then I saw that you only have 16TB. 15 years ago that would have been impressive, but these days, those supercomputers you mention probably have that much in DRAM, and their file storage is in the multi-petabyte range. Lustre is optimized for large scale clusters, in which you have entire nodes (a node is a computer, here) dedicated to I/O - bringing external data into the in-cluster network fabric, while other nodes are compute nodes - they don't talk to the outside world, except by getting data via the I/O nodes.
That's why you'll see all this talk of OSSs and OSTs, as though they'd be distinct systems - on a large scale cluster they are.
For only 16TB, what you want is a SAN, or maybe even a NAS.
If you want open source, then go with openfiler. It supports pretty much everything. I haven't stress tested it, but it seems to work well for that order of magnitude of data.
Try Tahoe-LAFS.
/bin/rm
I wonder how long it would take to backup 8TB to carbonite dot com?
I think the best disk-hardware agnostic solution for preventing filesystem data loss is an LTO-4 autoloader and regular tape backups (hopefully taken off site regularly). They are pretty cheap; a SuperLoader 3 with an 8-tape (6TB/12TB) capacity is less than $3000. Or buy a refurb LTO-3 autoloader for a third the price and half the capacity.
You will spend all this effort to build this solution... and then your house will catch fire.
On the good side, the fire department WILL manage to save the basement by filling it with 80,000 gallons of water at 2,000GPM per fire engine.
Or, you'll be wiped out by a flood. Or a drunk will drive through the side of your house. Or you'll have a gas leak and the house will detonate. Or carpenter ants will eat away the floor joists.
RAID is not a backup solution. Neither is replication... if you whack the data, it'll likely be replicated. If you get a compromised machine somewhere, files it touches will likely be replicated. The only thing you're creating is an overly complex hardware mitigation. If THAT is how you define "data preservation"... you're doing it wrong.
Look more for a solution to move stuff offsite - a cheap pair of N routers running Tomato or OpenWRT, to a neighbor's house, and you reciprocate with each other. Bonus points if you use versions, transaction logs, journals, etc.
help me i've cloned myself and can't remember which one I am
"For over a decade I have had arrays of 10-20 disks providing larger than normal storage at home"
At home?? I've met some people pretty fanatical about their porn collections but this hits some new highs! Kudos to you, Sir!
Take it from someone who has been there: there is no better resource than IRON MOUNTAIN to store a backup copy of your data.
Send it offsite every day or once a week, depending on how important your data is. Do fulls and deltas.
You can use Apache Hadoop's HDFS. http://hadoop.apache.org/hdfs/ It is fairly simple to set up, very scalable, and it is very easy to set up a replication factor so that all your data is replicated 2, 3 or even more number of times across your cluster. It is used at many places for distributed computing, but I see no reason that it couldn't serve you well as a large personal file service.
Unraid works well for a home solution. I had 2x2 TB drives fail within a one month period of time and lost no data.
Of course he did, because what kind of frame-up would it be if he didn't confess, and reveal the body?
They clearly threatened him with something to get him to cooperate. If they're going to suborn the justice system, why stop there? Why not actually kill the woman, and then threaten the father if he doesn't confess to the crime?
This paranoid conspiracy has been brought to you by the letters U,F,O and the number 52.
Drobo Pro with 3 TB drives set up with dual redundancy will get you 18 TB of drive space. In the future, just swap out drives as drive sizes get larger and you can continue to expand. www.drobo.com
-----BEGIN PGP SIGNATURE-----
12345
-----END PGP SIGNATURE-----
Seriously, stop with the experimental filesystem projects still in beta. You need one that is mature and time-tested. Do a bit of research. I don't even run RAID and have yet to permanently lose anything in probably 20 years.
Only the State obtains its revenue by coercion. - Murray Rothbard
http://en.wikipedia.org/wiki/Global_File_System
http://en.wikipedia.org/wiki/OCFS
Lustre is pretty cool, but it's not magic pixie dust. It won't break the laws of physics and somehow make a single node faster than it would be as a NFS server. It's for situations when a single file server doesn't have the bandwidth to handle lots of simultaneous readers and writers. A "small" Lustre filesystem these days usually has 8-16 object storage servers serving mid-high tens of TB. The high end filesystems have literally hundreds of OSSes and multiple PB served. The largest I know of right now is the 5PB Spider filesystem at Oak Ridge National Labs.
One nice thing about Lustre on the low end is that you can grow it... Start out small and add new OSSes and OSTs as you need them. This often makes sense in Life Sciences and digital animation scenarios where the initial fast storage needs are unknown or the initial budget is limited (but expected to grow). But if you're never planning to get beyond the capacity of a single node or two, Lustre is just going to be overhead. I don't know much about the other clustered filesystem options.
A host is a host from coast to coast...
Unless it's down, or slow, or fails to POST!
Lustre - no replication (it's on the roadmap for sometime in the next few years), and it relies on access to shared storage (read: FC/iSCSI disk array, and if that fails you lose your data).
OCFS - no replication, designed for multiple servers accessing one array.
Ceph - has replication, but still in active development, and somewhat complex. Good if you don't mind losing your data (it's in alpha... if it breaks, you get to keep both pieces...)
GlusterFS - I have no experience with it, but it seems to be pretty stable at this point. And it has some degree of replication, which is a plus.
If all you're going for is replicated storage across two systems, I'd recommend just setting them up separately and rsync'ing from one to the other. Otherwise, one filesystem crash will take out all your data - parallel filesystems can buy you some reliability, but still can't be considered "backup" strategies. And you still need to pay attention to things like RAID (at least RAID6! RAID5 is likely to fall apart after one disk failure with >2 TB disks).
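The RAID5 remark above is usually justified with a back-of-the-envelope calculation along these lines. This is only a sketch: the function name is hypothetical, the default error rate of one unrecoverable read error per 1e14 bits is an assumed spec-sheet figure for consumer drives (not something stated in the thread), and the model treats every bit read as an independent trial, which real drives do not strictly obey.

```python
import math


def rebuild_ure_probability(disks: int, disk_tb: float,
                            ure_per_bit: float = 1e-14) -> float:
    """Chance of hitting at least one unrecoverable read error (URE)
    while rebuilding a RAID5 array after a single disk failure.

    Assumptions (all illustrative): independent errors, the given
    per-bit error rate, and a rebuild that reads every surviving disk
    in full.
    """
    if disks < 3:
        raise ValueError("RAID5 needs at least 3 disks")
    bits_to_read = (disks - 1) * disk_tb * 1e12 * 8
    # 1 - (1 - p)^n, computed with log1p/expm1 to keep precision for tiny p
    return -math.expm1(bits_to_read * math.log1p(-ure_per_bit))


if __name__ == "__main__":
    for size_tb in (0.5, 1.0, 2.0, 3.0):
        p = rebuild_ure_probability(4, size_tb)
        print(f"4 x {size_tb}TB RAID5: {p:.0%} chance of a URE during rebuild")
```

Under these assumptions the risk climbs quickly with disk size, which is the intuition behind preferring RAID6 for large disks. Whether real drives actually fail at the spec-sheet rate is debated, so treat the numbers as a rough argument rather than a prediction.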
seriously. NetApp
What kind of performance are you after? If you're not after anything over 40MB/S, I'd go for unRAID. I use this at home and it's brilliant. I've replaced many drives over the years, and I've had two hard drives fail with no massive consequences (data isn't striped). Plus, many many plugins are now available. SimpleFeatures (replacement gui), Plex Media Server, SQL, Email notifications with APCUPSD support etc etc.
I wish Bob were my neighbor.
MFS seems to work quite well. It's designed for large files, not lots of small ones. It will allow you to set the number of replicas required at the file or directory level.
http://www.moosefs.org/
One as the primary, sharing space via NFS for your VMs and whatever else. Throw a couple of SSDs in there for caching.
The second replicating from the first (via ZFS send/receive, or just simple rsync) with snapshotting for backups and regular syncs to some off-site data store for truly irreplaceable data.
This is the setup I use at home, and it sits behind a 3-node VMware cluster, several desktop PCs (one of which boots from the main server over iSCSI), and couple of media PCs.
Other than that, your requirements seem a bit confused. "Cluster filesystem" looks to be a buzzword being thrown out there without any actual need for same. "the cluster filesystem should work well with hosting virtual machines in a high-available fashion thereby supporting guest migrations" is a non-sequitur as neither a cluster filesystem, nor high-availability are a necessity for "guest migration".
What are your key requirements here ? Data reliability is a lot easier (=cheaper) to achieve than high availability, and it's a struggle to see how real high availability could be any sort of requirement in a home server scenario.
go buy 2 or 3 cheap 8-10TB NAS devices
cycle one of them through every few months for a backup, and then store it at another physical location
that will run you less than $3000 total and a lot fewer headaches
http://serverfault.com
This is what I understood the common wisdom to be for years now.
You're serious about protecting your porn...
I have been using MooseFS for over a year now, it has proven to be amazingly solid, and very easy to set up and manage. I am running a 600 TB install that is maintaining over 40 million files for a large music service.
Check out the MooseFS website:
http://moosefs.org
Moose can also run on any Unix like system, so you are not restricted to Linux, I have connected Linux, FreeBSD and Mac OSX systems to it, it also scales very cleanly and was much faster in our initial tests than GlusterFS. I highly recommend it!
Is online redundancy (IE availability) your concern? Or is it recover-ability?
If your concern is the ability to recover in the event of hardware failure, you are over complicating the situation. I have about 1.5 TB of "data" between pictures of the family, movies, music, games, configs, documentation, and the list goes on. So, my primary storage server at home has 2x 2TB Western Digital Green drives that are just in a simple Linux software mirror. I also have two more disks that alternate between my house and a safe deposit box at the bank. About once a month (or more frequently if I add files to my server), I rsync my data to the disk at home, and take it to the bank.
The script that syncs simply runs rsync --delete -avx /blah/ /backup/ and I also mount /blah (the source) read-only while the rsync runs, to prevent something stupid from happening.
Now, you mentioned you had a large array, and that's fine. I'd buy a few 3TB drives and create a volume group with them, create your /backup on that volume group, and do the same thing. These are backup disks, they don't need to be fast.
I don't trust hardware raid (specialized controller raid), and while I am a unix admin, and manage large GPFS, Ibrix, and GFS clusters at work, I think that simplicity is always better.
The safe deposit box costs me about $25 / year, and keeps me safe in the event of a fire, theft, meteor, zombie invasion, etc.
A friend suggested that I just put a few drives in one of his servers, and rsync via ssh to his box. I don't want to do this for two reasons.
1) I don't have a lot to hide, but I don't really want everyone poking through all my pictures and whatnot
2) I'm lazy, so I'd probably script it up and I wouldn't think about it until I needed it. So, it wouldn't prevent me accidentally blowing data away on the replica before I noticed I blew something up.
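The worry in point 2, a scripted sync quietly propagating an accidental deletion to the replica, can be reduced with a pre-flight check. Below is a minimal Python sketch of that idea, not the commenter's actual script: the function names and the 10% threshold are invented for illustration, and it only compares file names, not contents.

```python
import os


def relative_files(root):
    """Return the set of file paths under root, relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            found.add(os.path.relpath(full, root))
    return found


def pending_deletions(source, replica):
    """Files present on the replica that a mirroring sync would delete."""
    return relative_files(replica) - relative_files(source)


def safe_to_sync(source, replica, max_fraction=0.10):
    """True if a mirroring sync would delete at most max_fraction of the
    replica's files. The 10% default is an arbitrary illustrative value."""
    replica_files = relative_files(replica)
    if not replica_files:
        return True  # nothing on the replica to lose
    doomed = replica_files - relative_files(source)
    return len(doomed) / len(replica_files) <= max_fraction
```

A cron job could call safe_to_sync() first and refuse to run the mirror when it returns False, leaving a human to decide whether the deletions were intended. rsync's own --max-delete option offers a similar guard from the command line.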
Remember, a truly wise man never plays leapfrog with a unicorn
I worked for a 911 call center. We had redundant RAID arrays for the Oracle database. One array was RAID 0+1 (striped mirrors), and the other was RAID 6 (block-level striping with double distributed parity). The idea was that you have redundancy, so the odds of both arrays crapping out at once are low (wear patterns on the drives will not be identical; whichever one dies first gets fixed first, then mirrored, and if the other dies the next day, you don't lose any data and there is no downtime). It's not a bad idea to archive data too. That was done on a regular basis. I have my own databases for a website. Not huge, not Oracle, but I run weekly archives (I dump the database, archive all of the scripts and website directories, and compress it all). My database currently only sucks up about 600MB, and the website directories (and all the config files) maybe another 150MB. My directory archives are incremental after the first to save space. Either you archive, make redundant archives, or risk data loss. I've worked for places that moved tapes offsite. That's one option; another is a wifi-connected NAS in the garage. If there is a fire, your data is safe. It's another option you could look at. Just sayin'.
Make sure they're a neighbor you can trust, though. You don't want the FBI kicking in your door all of a sudden b/c your neighbor was backing up their questionable porn stash to your household.
FreeBSD 9 is in RC and supports a fairly current ZFS version. I just switched a few weeks ago and couldn't be happier. I was running Nexenta before (Solaris kernel, Ubuntu-ish userland). The Nexenta zpool imported into FreeBSD flawlessly.
I guess you're not a smurf.
Openindiana with zfs
http://openindiana.org/
Or
Greyhole
http://www.greyhole.net/
I've got a measly couple TB, but I was reminded the hard way that digital replication isn't perfect unless all the hardware is perfect.
I was running a server with normal (non-ECC) RAM and was hit by single bit replication errors that only occurred about every 500GB due to a faulty motherboard memory circuit.
PAR2 or RAR style repair blocks can prevent this problem if you don't let it go too many generations between the times that you verify your backups.
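Even without repair blocks, the "verify your backups" half of this advice can be covered with a plain checksum manifest. The following Python sketch is an illustration only, with made-up function names; unlike PAR2 it can detect a flipped bit but cannot repair it, so a good copy still has to exist somewhere.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large files fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = sha256_of(full)
    return manifest


def find_mismatches(manifest, root):
    """Relative paths under root that are missing or whose digest differs
    from the one recorded in the manifest."""
    bad = []
    for rel_path, expected in sorted(manifest.items()):
        full = os.path.join(root, rel_path)
        if not os.path.isfile(full) or sha256_of(full) != expected:
            bad.append(rel_path)
    return bad
```

One way to use it: build the manifest from the primary copy while it is known to be good, store it alongside the backup, and run find_mismatches() against each backup generation before trusting it.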
If you're looking to have any kind of decent performance in your VMs this just won't work.
I've worked with VMs on all different kinds of storage: fiber channel SAN, local disk, iSCSI SAN (over 1Gb and 10Gb Ethernet), local hardware RAID, NFS file shares, GFS2 (as in the Red Hat cluster file system), MooseFS, and GlusterFS. All of these have been either in large test labs or in production cloud deployments. I've never had a cluster file system get close to passing muster as a storage medium for VM usage. IO is the number 1 bottleneck in virtualized environments, and these schemes just add completely unacceptable latency and bandwidth restrictions.
The only way to really run VMs is fiber channel SAN, local disk (or hardware RAID), or iSCSI with 10GbE (on the storage server side). Even iSCSI with 2GbE (2x1GbE bonded) is not speedy enough to support more than 5-10 VMs running concurrently. You'll start to see problems at 5 VMs if the VMs are Windows... For whatever reason Windows really likes to write to the disk. Currently I have 4 servers in my basement: a single storage server (6 2TB drives in a RAID6, giving 8TB of usable disk) and 3 VM servers (2 2TB drives each, in hardware RAID1). I run the VMs locally and back them up to the storage machine over iSCSI nightly. I also have a shared volume on the storage system that all VMs and my household computers can access. I use openfiler for my storage system. If I had the money it would be nice to get a second storage server and replicate it (which openfiler supports), but I don't have that cash just sitting around right now.
Backing up 8TB of data (OK, so I have about 5TB used) offsite is basically impossible, so we have a "special" folder on the shared drive that is backed up using CrashPlan. It's about 600GB, and the first backup took nearly 3 months over a 5 Mbps upload.
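To put the upload figures in perspective, here is a small Python sketch of the best-case arithmetic. The function name and the efficiency parameter are illustrative assumptions, and the calculation ignores protocol overhead, throttling and anything else sharing the link.

```python
def transfer_days(size_gb: float, link_mbps: float,
                  efficiency: float = 1.0) -> float:
    """Days needed to move size_gb (decimal gigabytes) over a link of
    link_mbps megabits per second.

    efficiency is the assumed fraction of the line rate actually
    achieved; 1.0 gives the theoretical best case.
    """
    if link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link speed must be positive and efficiency in (0, 1]")
    bits = size_gb * 1e9 * 8
    seconds = bits / (link_mbps * 1e6 * efficiency)
    return seconds / 86400


if __name__ == "__main__":
    print(f"600GB at 5 Mbps, best case: {transfer_days(600, 5):.1f} days")
    print(f"8TB at 5 Mbps, best case: {transfer_days(8000, 5):.1f} days")
```

The best case for 600GB works out to a bit over 11 days, so a first backup that took nearly 3 months implies the effective rate was a small fraction of the line rate. The sketch cannot say why; the backup service, the ISP and other household traffic are all plausible contributors.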
The above setup is the only one I've found that is both a) somewhat affordable, and b) performs well enough to do actual work in the VMs. It provides for some mobility in the event of a hardware failure. If a VM server crashes, I can run the crashed VMs via iSCSI on another server (from the day-old backup). If the storage server crashes, the only "important" data is the 600GB in the special folder, which would take 2 months to download over my home connection but could be downloaded in stages (i.e. get the most important stuff immediately). If both a VM server and the storage server crash, I'm out the VMs that were running on the VM server, but again the important data is off-site, and the VMs can be rebuilt in a day or less.
Simply use ZFS across your drives. There is no way you can use all your resources (network bandwidth, disk bandwidth) even on a low-end machine unless you get to ~50-200TB and require more than ~100,000 IOPS (which is doable on a single machine loaded with SSD, memory and 10GbE). There are setups that offer 1PB with 1M IOPS running on 2 very beefy (failover) hosts, only after that, distributed becomes necessary (unless of course you need geographical distribution).
Distributed file systems are nice if you know what to use them for. If you don't (as you already admit being lackluster with eg. your RAID setup), you'll risk losing more data to it than it will ever help you. Yes, doing it wrong has a much higher chance of your data getting lost than simply going for a single machine.
Custom electronics and digital signage for your business: www.evcircuits.com
Quickly reviewing, I would go with GlusterFS. GlusterFS is free software, licensed under GNU GPL v3 license. Lustre Filesystem is GPL, but tainted by Oracle as you noted. Ceph is LGPL. I would go with the license you are most comfortable with. OrangeFS is also LGPL which you may wish to check out.
You are being MICROattacked, from various angles, in a SOFT manner.
Questioner: Clustered filesystems like the ones you listed are needed for much larger scaled problems than your situation -- trying to employ them for your scenario will introduce a whole new layer of necessary complexity that is likely more of a risk to your data (or at least your sanity) than the problem you are trying to solve. Not the right toolbox for the job.
Stick with RAID (software or something like ZFS if you are concerned about hardware compatibility -- research the caveats).
A couple machines running NexentaStor in a replicated config might be a relatively low-pain way to go.
If you're that concerned about your data, you should really consider regular off-site backups (lots of cloud services nowadays that do the job well).
Oh, and RAID IS NOT BACKUP.
Because you get the same amount of single sector failures, no matter what the capacity of your discs is. As soon as they can slam more data on the same surface, they will do so, because the commercial threshold for data loss seems to be the chance of single sector failure.
Also, if I had only 10 SATA discs for virtual machine image storage, I'd be really unhappy, let alone three. The summary clearly states that hosting VMs for HA is a requirement. Judging by the number of disks without looking at the requirements is bad, m'kay?
Because you need the VMs with HA, I'd really be looking at enterprise level storage with decent backups. Distributed filesystems will, as far as I know, not grant you transparent failover for your hypervisor. You'll still need some server to centralize your storage requests on a block device level, making the distribution layer invisible to your hypervisors.
I was promised a flying car. Where is my flying car?
You don't seem to understand a few basics about storage, so let me explain them briefly:
Backup is a method of storing your data in a safe place, so if you accidentally or purposefully delete it, or if you have a (severe) hardware failure, you still have your data. This automatically means you'll want to store your backup data on a totally, physically separated medium. If someone wants to destroy your data, a distributed filesystem won't do you any good. Taking one-way snapshots over a network link to a remote location, for instance using rsync and a remote filesystem that supports snapshots, can be a viable solution for short term backups, but if you want longer term retention, "old hat" backup equipment still is a viable solution. How are you planning to restore from data corruption that happened 2 weeks ago? How do you protect against single sector failure? I have yet to see consumer grade RAID controllers that actually do a read-verify on every read, so with the setups you're looking at, you're depending on RAID scrubbing to detect failures. A backup is for recovery of data lost on your primary storage system. You can make your primary storage system resilient by distributing and snapshotting it to within an inch of its life, but it's not a backup. If you don't make backups, your data obviously isn't worth it, so why bother making your primary storage resilient in the first place?
A "Super blahblahbla" or whatever hardware you are planning to buy now, will not give you "a decade's worth of time". Look at 10, 20 and 30 years ago. Would you honestly say you'd want to store all your data on a state of the art 7*40GB RAID5 system, as was the bees' knees in 2001? Or how about a pristine 40MB IDE hard drive, the best you could buy in 1991? I think 1981 was still cassette or single sided floppy disc territory.... Seriously, never look forward more than 3 years with setups like this.
I was promised a flying car. Where is my flying car?
Or alternatively you back everything off to tape (rotating sets) and store them in a fireproof safe.
Taking one way snapshots over a network link to a remote location, for instance using rsync and a remote filesystem that supports snapshots, can be a viable solution for short term backups, but if you want longer term retention, "old hat" backup equipment still is a viable solution. How are you planning to restore from data corruption that happened 2 weeks ago?
It's easy enough to keep any desired schedule of incremental backups with rsync - search for rsnapshot for example, or BackupPC if you want a fancy web-based interface.
Otherwise, 100% agreement: backups should be physically separated from the primary data, preferably by significant geographical distance (think about fire) and duplicated on several locations.
I hear this "must move stuff off site" all the time, and it is in theory a good practice. In reality it is overkill for the home user for the majority of his downloaded movies.
When I looked at the data that I REALLY needed to keep, I came to "not very much". Nothing that I could not host (encrypted) at my provider. I am talking less than 20MB in data.
When my house burns down, I have other worries than my MP3 collection or my movies. I have not taken copies of all the books I have in the house either, even though that is technically possible.
Remember: this is a home solution. Now if it were a business solution, then off-site backup that you pay for must be an option. If that is too expensive, then the data is not worth saving.
Don't fight for your country, if your country does not fight for you.
Stop harping on the RAID != backup thing. We all know, and it's irrelevant to the question.
I believe his main concern is having one giant volume (say 30 TB) to store data, and not about using it as a backup solution. (he did not even use the word)
A backup for that volume would simply be duplicating the setup offsite, possibly offline archiving the cloned disks or (what I'd do) the complete hardware setup.
I once investigated GlusterFS too, was impressed, and decided that it's for larger-scale projects and that for me it would only overcomplicate things.
I ultimately solved this by buying several cheap QNAP TS410s, giving up on single volumes over 6TB in size, and mounting those separately on the machines that use that data.
I'm still interested, however, in the possibility of running GlusterFS on QNAP products.
Hivemind harvest in progress..
I currently have online 1x RAID5 (3 disks, 1TB), 1x RAID5 (3 disks, 2TB), 1 x mirror (2 x 1TB -- these were in a RAID5, but one died leaving me with two "free" disks since I moved the 2TB to one array), 2x750GB single disks over iSCSI, 2x250GB single disks, and 1x750GB single disk.
Let me reiterate:
File server:
1x3 disk RAID5, 2TB usable
1x3 disk RAID5, 4TB usable
2x 750GB single disks
Linked via 1Gb Ethernet (slow, I know, but I don't want to spend $300 on 2x 1Gb Ethernet NICs with jumbo frames and bonding for 2Gbps (internal + PCIe card) total bandwidth). The file server serves iSCSI and handles encryption, and I run the filesystem drivers on the local box (NFS has never given me more than 2.5MBps, even over a 1Gbps link -- so I don't do it. I don't need to mount the partition from multiple hosts, anyway.)
primary box:
1 eSATA 250GB
1 3-bay rack 250GB
1 3-bay rack 750GB
2x 1TB disk mirror
Main machine is an i7-920, on which I mount the filesystems over iSCSI (fileserver handles RAID and encryption, remember), with iSCSI set up to checksum the data. In the past 6 months, I've had no problem with this setup, ext4 on all partitions. Note that I make the partition size at _MOST_ one disk -- a 3x 1TB disk array has 2 partitions, 1TB each -- I do this so that if I need to, I can move everything to one disk (2TB disks are doable, but if I had a 4TB array.. uhh..) and rebuild/fix that partition of the RAID (or possibly the whole RAID itself) without much trouble. This has saved me.
In the past I tried BTRFS, but due to _SEVERE_ performance cost (perhaps because I was running an SQLite database, but 30 second lag to fsync), I dropped it and went back to ext4. ext4 has one or two bugs* (and fsck, which btrfs lacks -- STILL), but MUCH better. BTRFS seems to be like WinFS.
Ok, so: a dedicated box to do encryption/RAID (hey, like hardware RAID -- just in software and limited only by your bonded ethernet); your local box mounts via iSCSI and manages the filesystems (which you can still fsck on the fileserver box -- they can't be mounted at that time, anyway). So do this: make sure that you can copy _all_ of the data for a partition to _one_ disk. Then if your RAID loses a disk and, after you replace it, you get a corrupt partition, you can copy off/format/recopy, etc. You're safe. I've been doing this for 6 years, and have only lost data on FreeBSD's graid5 -- good, but hard to work with. On Linux, I've had to use an old version of mdadm (3.0 requires 2MB offset from start, 3.0 doesn't -- guess which built the raid and which I used to try to repair it), but with Linux I have not lost data. 8TB over 4 years, no data loss. It can be done; the above is what you want to do.
Don't try and make a 20 disk array. You will screw yourself. Make it so that you can copy _all_ your data elsewhere and rebuild. Use a tool like Static File Dupes (sourceforge) to manage duplicate data.
*
EXT4-fs (sdi): initial error at 1311095913: __ext4_get_inode_loc:4929: inode 2401866: block 144179490
EXT4-fs (sdi): last error at 1311095913: __ext4_get_inode_loc:4929: inode 2401866: block 144179490
EXT4-fs (sdk): error count: 1
EXT4-fs (sdk): initial error at 1302253869: ext4_put_super:719
EXT4-fs (sdk): last error at 1302253869: ext4_put_super:719
EXT4-fs (sdj): error count: 1
EXT4-fs (sdj): initial error at 1302253902: ext4_put_super:719
EXT4-fs (sdj): last error at 1302253902: ext4_put_super:719
Each of these ISCSI disks were fsck'd before being mounted. Each is a separate filesystem/partition. Over time, or perhaps as soon as being mounted after fsck, each has kernel errors.
Linux 2.6.39.4 #4 SMP 2011 x86_64 Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz GenuineIntel GNU/Linux
A mirror of live spinning disks updated at intervals is icing on the cake if you can afford it after you have real backups - doing it instead can look extremely stupid when things go wrong.
A web hosting company near me failed spectacularly due to that mistake - their mirror was mirroring garbage and they lost all of their clients' files. Of course it even made it into the print media and it made them look very stupid.
I'm surprised that removable backup media has not caught up with the speed of change in hard disk sizes. In the olden days we used to back up with a couple of QIC-60s and we were happy. Later it was DAT backups. What inexpensive tape backup technologies are available today? It would seem that the best alternative is to use a drive itself as a backup medium and take it off-site.
OCFS was originally designed specifically for storing Oracle datafiles, in a cluster, in a non-POSIX fashion. After that came OCFS2, which is POSIX compliant, but can deadlock when NFS exported due to the way NFS handles locking, in a way that can be worked around with the "nodirplus" NFS mount option (not available on all OSes, but Linux is ok). They have since developed ASM (Automatic Storage Management), which threw away the traditional filesystem presentation of your Oracle datafiles, and subsequently bundled that into the release of 11gR2 clusterware and extended the functionality to give us ACFS - ASM Clustered Filesystem.
11gR2 clusterware is designed to be clustered with shared storage, and depending on the options when created will happily give you a POSIX compliant clustered filesystem for any occasion - datafiles, regular files - whatever. It is Oracle's implementation of their "best practice" Stripe And Mirror Everything methodology with the aim of not only high availability, but consistently high performance, through spreading all your data across all your disks, and implementing mirroring in a sane way too (split your disks into two (or three!) failure groups, and the software will ensure there are 2 (or 3!) copies of each block). All you do is add disks to the pool(s), and if you have the space you can dynamically remove disks from the pool too. You can fsck, mkfs, mount and unmount it, take snapshots (!), and the lead-up to all that is not much of a stretch from LVM. Google for Oracle ACFS and see the "Basic Steps to Manage Oracle ACFS Systems" section.
OCFS was only ever available for Linux, but ACFS now supports other platforms... probably doesn't matter to you. The one catch I've found so far is the ~1GB of RAM overhead to run the clusterware PER NODE. There's other reasonable stuff, like you need the network layer to be up in order to start the ACFS supporting services, so you can't put anything related to the basic boot process on those volumes.
The cost of 11gR2 clusterware? ... nothing. I think it's one of very few "free" (as in beer) products they do. It will work on anything they've compiled it for though - generally means your Enterprise OS like RHEL5 (and should be easy to shoehorn onto CentOS), a recent SuSE release, and of course their own Oracle Enterprise Linux - which I believe is also free to use, but pay through the nose if you want them to support your implementation. Remember that this system is the platform for some very expensive Oracle products, but at the same time it is perhaps a younger product than some you'll have already looked at.
As for the fencing method, it all works via heartbeat to disks in your ACFS pool. If the clusterware can't "ping" the disk within the threshold, it forces the system that's having the issue to reboot. Such is the nature of ensuring sanity when using shared disk. I suggest looking at it if your boxen can spare the RAM and you're happy to accept their OTN license agreement, as it really does seem to be one of Oracle's better products at an amazing price for what you get.
Take a look at the Tahoe Least-Authority File System (Tahoe-LAFS).
It is intended to be used on systems that are distributed in a network (like the internet) for secure and failsafe storage of data. But no one prevents you from running multiple instances of this software on a single system.
Data is encrypted on the client (so the storage servers know nothing about the data) and distributed in a configurable way. If you have 10 disks you can configure it so that all data is distributed, for example, to at least 8 disks and any 5 (configurable) disks are enough to recover the data. So even if 3 random disks fail, your data is still safe!
Take a look at it at https://tahoe-lafs.org/trac/tahoe-lafs
Did I mention that it is free and open source?
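To make the arithmetic behind the "5 of 8" example above concrete, here is a small sketch. It assumes one share per disk and treats all disks as equally sized; the function name is made up for illustration and is not part of Tahoe-LAFS.

```python
def share_plan(needed: int, placed: int) -> dict:
    """Summarise a k-of-n erasure-coding layout with one share per disk.

    needed: number of shares required to reconstruct a file (k)
    placed: number of shares actually stored on distinct disks (n)
    """
    if not 1 <= needed <= placed:
        raise ValueError("need 1 <= needed <= placed")
    return {
        # Any `needed` shares suffice, so this many disks may be lost.
        "tolerated_failures": placed - needed,
        # Raw space consumed per byte of user data.
        "expansion": placed / needed,
    }


if __name__ == "__main__":
    plan = share_plan(needed=5, placed=8)
    print(plan["tolerated_failures"], plan["expansion"])
```

With 5 shares needed and 8 placed, 3 disks can fail and the data occupies 1.6 times its original size, compared with 2 times for a plain mirror that survives only one failure per pair.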
http://www.xtreemfs.org/ is a distributed fs with no single point of failure (i guess, depending on the configuration), for high latency networks, if you want to put nodes on WAN. It's fairly easy to set up, now it replicates also mutable files, I dunno about its performance or reliability.
---- MISSING MISCELLANEOUS DATA SEGMENT --- [sigdash] trolololol
You talk about VMs, distributed data..... but what do you actually want?
Have you heard of the CAP theorem? Does it apply to what you're doing?
How dynamic is your data?
How granular are updates?
Is it of a transactional nature?
Is it your DVD/Bluray collection?
From what I'm hearing, well-managed software RAID would suit, with ATAoE at 10Gbps+; no need for more complexity. Of course you could use a rack-aware file system, but first you'd need to provide a problem that it actually solves.
So you run BangBros out of your basement?
In terms of just backing it up... you can use Amazon S3... and you can just mail them hard drives instead of uploading over the internetorium. Of course, they mail your hard drives back just as soon as they have sucked all of the data off them and put it in your S3 account. Then you can start EC2 instances and do all of the supercomputing you want. I can't imagine what you would have AT HOME that takes up 5 TB of space though. I suppose you could be running your own version of pirate Netflix or something, but even so, a few hundred movies would, if ripped at the full 8 GB per film, take up only about 3 TB of space... so you might consider cramming all of those movies down to .avi files of 1 GB or so and thereby freeing up 4 TB of disk space, which will of course save you a lot of money when you upload it all to your S3 account.
if your life is such a big joke then why should I care?
why not mirror them to other nodes using DRBD or something like Hadoop where there are copies of parts of data distributed across a larger subset of machines?
You can figure out what you need without asking it here, asking this question on /. is overkill, and frankly the only interesting thing to me is what exactly the 16 TB of data contains.
Seriously.
My wife and I thought through this, and the only thing we felt we HAD to put offsite was our pictures. So we have an account with a backup provider that allows rsync, and I have it set up to update nightly. Works great so far.
We also discussed building three 'backup boxes' that we could place at some relative's houses...then everyone with one of the backup boxes could back up to the other two. We decided not to do it for the expense, though, and we didn't think the relatives would be that interested.
Those filesystems are not designed primarily with your scenario in mind. If you want a hardware-agnostic setup, use software RAID or a non-cluster filesystem like ZFS.
Distributing your storage will probably not enhance your ability to survive a mishap. In fact, the complexity of the situation probably increases your risk of messing up your data (I have heard of more than a couple of instances of someone accidentally destroying all the contents of a distributed filesystem, but in those professional contexts they have a real backup strategy). You'll be pissing away money on power to drive multiple computers that you really don't need to power.
If you care about catastrophic recovery, you need a real backup solution. This may mean identifying what's "important" from a practical home situation. If you don't mind downtime so long as your data is accessible in a day or two (e.g. time to get replacement parts) without going to your backup media and without suffering the loss of non-critical data, then also having a software RAID or ZFS is the way to go. If you want to avoid downtime (within reason), get yourself a box with basic redundancy designed into it like a tower server from Dell/HP/IBM. If Intel, you would sadly want to go Xeon to get ECC; on AMD you can get ECC cheaper. In terms of drive count, I'd dial it back to 4 3TB drives in a RAID5 (or 5 in RAID6 if you wanted), to save on power and reduce risk in the system.
XML is like violence. If it doesn't solve the problem, use more.
Home brew solutions are good for a small business, but once you move into multi-terabyte solutions, you should consider a more Enterprise ready solution. If I were in your position, I would consider a dedicated storage area network device such as an EMC VNX or NetApp storage array. Both handle multi-terabyte solutions well. Both are also easy to manage and integrate well into most network environments (CIFS / NFS / FC / FCoE / iSCSI). If you are looking for just NFS / CIFS, Isilon also makes a very fast and scalable NAS device that is super easy to manage.
Eric Bursley
I have a small herd of AFS servers at home, sounds like it would meet your needs.
One RW and a herd of RO replicants, at least for the important stuff. The RO replicants are updated automatically every day or so by a script I wrote, I can also run it manually. I believe you are limited to 6 RO replicants for each RW volume and I'm bumping up against that limit at home, don't know how big installations survive that limitation.
If the RW blows up, which hasn't happened, supposedly it's trivial to make one of the RO a live RW.
I also take a snapshot backup each night at 2am and have the daily snapshots mounted on ~/backup to make it easy to correct accidental deletion errors.
If you use AFS be prepared for an avalanche of people who have never used it, or haven't used it since 1996, or tried to use it without reading any docs or howtos or tutorials, telling you it's impossible and too complicated and too difficult and should never be attempted and it'll never work. On the other hand, I just used some simple tutorials and walkthroughs found via Google, practically screencasts in terms of level of detail, and found it to be quite trivial, like a couple of hours' work, which isn't bad for all it does. I (almost) feel sorry for the haters. Sucks to be them, I guess.
"Science flies us to the moon. Religion flies us into buildings." - Victor Stenger
Stick six 3T drives into a box, run rsync between'em daily (cron!). Build identical box in a different room/house, run rsync between first box and 2nd box daily/weekly (depending if local network or not)? each box, maybe ~$2k. That's 18T of space per box, with maybe 9T of it if you want to keep two copies of everything on each box.
Monitor disks, and as soon as OS tells ya the disk is going bad, replace it. It's amazing what modern disks are capable of.
Unless data changes rapidly, there's no need for replication that happens more than daily (e.g. typical home server box for whatever it is you're doing). Worst case scenario, you lose updates for that date. Have a tiny raid for critical stuff if that's not acceptable.
Easy to recover, 'cause, you have all the files right there, just swap disk, rsync will ensure all is good. No weird formats to deal with, no configurations to fiddle with (not something you want to be doing while recovering from a bad disk).
"If anything can go wrong, it will." - Murphy
Hi,
I work for a supercomputing center and am the maintainer of our 1/2 PB Lustre deployment. I also hang out on the GlusterFS and Ceph IRC channels and mailing lists and have spent some time looking at both solutions for some of our other systems.
For what you want, Lustre isn't really the right answer. It's very fast for large transfer (though slow for small ones). On our storage I'm getting about 12GB/s under ideal conditions and that's totally uninteresting as far as Lustre goes. There are very few other options out there that are competitive at the ultra-high-end (ie PBs of storage at 100+ GB/s). On the other hand you *really* need to understand the intricacies of how it works to properly maintain it. It doesn't handle hardware failures very gracefully and there are still numerous bugs in production releases. A lot of progress has been made since the Oracle acquisition, but it's going to be a while before I'd consider Lustre mainstream. I wouldn't use it for anything other than scratch (ie temporary data) storage space on a top500 cluster.
GlusterFS and Ceph are both interesting. GlusterFS is pretty easy to set up and has a replication mode, but last I heard there were some issues with enabling striping and replication at the same time. Now that Red Hat is backing it I imagine it's going to pick up in popularity really fast. Also, having the metadata distributed on the storage servers eliminates a major problem that Lustre still has: a single centralized metadata server. Having said this, it's still pretty young as far as these kinds of filesystems go, and it's not immune from problems either. Read through the mailing list.
Ceph is also very interesting, but you should really run it on btrfs and that's just not there yet. You can also run it on XFS but there have been some bugs (see the mailing list). Ceph is really neat but I wouldn't consider it production ready. Rumors abound though that DreamHost is going to be making some announcements soon. Watch this space.
Ok, if you are still reading, here's what I would do if I were you:
If you are running on straight up gigabit ethernet you basically have no reason to bother with distributed storage from a performance perspective. 10GE is a cheap upgrade path and a single server will easily be able to handle the number of clients you'll have on a home network. From a reliability standpoint I've personally found that something like 70-80% of the hardware problems I have are with hardware raid controllers. I'd stick with something like ZFS on BSD (or Nexenta if you don't mind staying under 18TB for the free license). Then export via NFS or iscsi depending on your needs. If you want HA across multiple servers, here's what people are doing on BSD with ZFS:
http://blather.michaelwlucas.com/archives/221
Two issues here:
1. You're approaching the problem from the wrong angle. IMV, the angle you take should be "how long can I afford to be without this data and how much money am I prepared to throw at a solution?" rather than "what technology exists that I can use to make the system more reliable?". Taking the former approach allows you to plan exactly how you'd deal with data loss - whether it's through human error, software/hardware failure, fire, theft, flood or what have you. Taking the latter approach tends to result in some whacking great Heath Robinson (or if you're American, Rube Goldberg) of a solution that still has a whacking great hole in it somewhere.
2. 8TB of data is not an enormous amount by any modern standard. You can buy a NAS box off-the-shelf today that will take 12x3TB hard disks for 36TB (18TB if you've got the good sense to run them in a RAID 1+0 configuration) of storage; at this level they typically have replication built right into them so you can buy two and replicate one to the other (though like all replication-type solutions, it's not a form of backup and you mustn't treat it as such). If that doesn't appeal, simply put a couple of SATA controllers in a cheap box and run OpenFiler. Anything you cobble together yourself based on the latest clustered filesystem du jour will suffer from one huge flaw - a system that's designed to be highly-available is frequently less reliable than one that isn't, simply because you're making it that much more complicated that there's a lot more to go wrong.
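The capacity figures quoted in this thread (36TB raw versus 18TB in RAID 1+0, or four 3TB drives in RAID5) follow from simple rules. Here is a rough sketch, with a hypothetical function name, that ignores filesystem overhead, hot spares, and the TB/TiB difference:

```python
def usable_tb(level: str, disks: int, size_tb: float) -> float:
    """Approximate usable capacity of an array of identical disks."""
    if level == "raid10":
        if disks < 4 or disks % 2:
            raise ValueError("RAID 1+0 needs an even number of disks, at least 4")
        return (disks // 2) * size_tb      # half the disks hold mirror copies
    if level == "raid5":
        if disks < 3:
            raise ValueError("RAID5 needs at least 3 disks")
        return (disks - 1) * size_tb       # one disk's worth of parity
    if level == "raid6":
        if disks < 4:
            raise ValueError("RAID6 needs at least 4 disks")
        return (disks - 2) * size_tb       # two disks' worth of parity
    raise ValueError(f"unknown level: {level}")


if __name__ == "__main__":
    for level, disks in (("raid10", 12), ("raid5", 4), ("raid6", 5)):
        print(level, disks, usable_tb(level, disks, 3.0))
```

Twelve 3TB disks in RAID 1+0 give 18TB, and both four disks in RAID5 and five in RAID6 give 9TB, which is enough for the 8TB in question.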
Clustered filesystems are not designed to make your data safer, or to provide ease of recovery. In fact, they make both of those things a bit more difficult. In the case of Lustre, the point is performance -- I have N servers that I am willing to dedicate to serving the filesystem, I can therefore get N times the throughput for large distributed jobs.
File systems that provide replication help, but unless they are copy-on-write (COW), replication does not take the place of backups.
If you are paranoid about data safety, invest in a backup solution. The only reason to use a distributed file system is for increased performance.
Posting anonymously because I know you guys will mod this down.
Anyway, Windows DFS-R (Distributed File System Replication) does seem to fit the bill. It's hardware agnostic, very scalable, and uses a multi-master replication model. We use it across 10 sites for the last 7 or 8 years (previously it was called FRS) with excellent results.
The best part about it is that it's easy to use and set up. You can create replica reports (and schedule them) that show you exactly what is going on. Across slow links you can throttle how much bandwidth it uses and even schedule when replication happens. We've found that with a low number of writes real-time works great. Also, it only replicates what has changed in the file, not the entire file (it will replicate just an ACL change if that's all that has happened).
Disclaimer: I'm the project lead for HekaFS, which is based on GlusterFS.
If you're concerned about data protection, you'll want to worry about node as well as disk failures. Some distributed filesystems, including Lustre and PVFS*, take a rather old-school "use RAID and implement your own heartbeat/failover between server pairs" approach, and that just sucks. GlusterFS and Ceph don't have that wart; neither do MooseFS or XtreemFS, which I would consider the other alternatives. They all have their own forms of replication built into the filesystem, so you don't need to set up and maintain another layer for them. Unfortunately, neither MooseFS nor Ceph survived even simple tests - write a few files in parallel, flush caches, read them back in parallel - when I ran those tests on the same hardware as GlusterFS and XtreemFS which did fine. That was a while ago, though, so take that with a grain of salt. Ceph in particular has a lot of awesome technology and has a very bright future IMO, but it's taking a while for it to realize that potential.
Out of GlusterFS and XtreemFS, the choice has a lot to do with your exact use case. XtreemFS has a pretty strong focus on wide-area replication, so if that's part of your need now or likely to be in the future then it's probably a bit stronger. GlusterFS does have some wide-area replication, but I consider it rather weak. Within a single data center, I'd give GlusterFS the edge. It has better local performance than XtreemFS in my tests, and it has what I consider by far the best setup/management interface.
The one caveat I'd offer is that all of the filesystems I've mentioned excel at sequential access to large files. For random access, and especially for metadata-heavy workloads, they all suck to some degree. As others have mentioned, you might very well be better off with a simple NFS server pair with cheap shared storage and heartbeat/failover to ensure availability.
Slashdot - News for Herds. Stuff that Splatters.
20 disks seems like overkill for your storage needs. Seems like the more disks you use the greater the risk of failure of one or more of them. Also, your electricity bill must be through the roof. I have 4 3TB drives with a 3Ware controller in RAID5 array which gives me the same storage capacity with 1/5th the drives.
You should seriously consider adding another drive and migrating to RAID6, because RAID5 has a fatal flaw that may cost you all of your data (or at least force you to restore from backup, but, seriously, keeping 9 TB backed up isn't easy).
The problem with RAID5 is that if you lose one of your drives it leaves your array in a very risky state. This seems obvious, since it's clear that the failure of any drive at that point will lose all of your data, but it's actually at least an order of magnitude worse than it appears. Why? Because the failure of a second drive at that point is actually quite likely. When you install a replacement drive, the array has to resync to incorporate the new drive and get back to a healthy state. To do this, the resync operation has to read every single block of every remaining drive. This means that if there are any other latent failures, unrecoverable blocks that just haven't been noticed yet, the resync will find them and the resulting failure will lose all of your data.
In fact, even a transient failure can lose all of your data. I was actually able to recover mine once, due to the fact that I was using software RAID rather than hardware. Linux mdraid allowed me to "forcibly" restart my degraded array (carefully specifying the order of my disks exactly as they had been; which information I had thanks to the e-mails md had sent me), at which point I ran out and bought enough big disks that I could back the entire set up. The backup succeeded.
After a similar experience which was even more harrowing because the failure wasn't transient, I abandoned RAID5 for data I care about and switched to RAID6.
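A back-of-the-envelope sketch of the resync risk described above. It assumes an unrecoverable read error (URE) rate of one per 1e14 bits read, a figure often quoted for consumer drives and used here purely as an assumption, and it treats errors as independent. It does not model a whole second drive dying during the rebuild.

```python
import math


def rebuild_failure_probability(surviving_disks: int, disk_tb: float,
                                ure_per_bit: float = 1e-14) -> float:
    """Chance of hitting at least one unreadable bit during a RAID5 resync.

    Every bit on every surviving disk must be read to rebuild the array.
    """
    bits_read = surviving_disks * disk_tb * 1e12 * 8
    # 1 - (1 - p)^n, written with log1p/expm1 to stay accurate for tiny p.
    return -math.expm1(bits_read * math.log1p(-ure_per_bit))


if __name__ == "__main__":
    # Four 3TB drives in RAID5 with one failed: three drives left to read.
    print(f"{rebuild_failure_probability(3, 3.0):.2f}")
```

Under those assumptions a degraded array of four 3TB drives has roughly a one-in-two chance of hitting an unreadable block during the rebuild. With RAID6 the second parity can still reconstruct such a block, which is the argument being made here.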
My current approach is:
In addition, I also run regular surface scans on all of my drives. In theory this should make RAID5 acceptable since it should identify any waiting problems before the array is degraded. In practice, I still don't trust RAID5.
Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
When I looked at the data that I REALLY needed to keep, I came to "not very much". Nothing that I could not host (encrypted) at my provider. I am talking less than 20MB of data.
Don't have kids?
For those of us that do, we typically end up with a lot of photos and video that we would really, really hate to lose.
My solution for this is high-volume off-site backup using Tahoe LAFS. I have about 200 GB backed up now, and will have 400 GB within a couple of months.
I'm hearing you say "clusterfs" but what I'm reading from your post is "remotely recoverable filesystem". A cluster filesystem makes lots and lots of sense if you've got 100 nodes that need high-speed access to a single piece of storage.... this doesn't sound like that application...
What you should set up is a ZFS box (I'm a fan of RAID10, but pick the RAID of your choice). We'll call this machine A.
Now go build an identical box named "B".
Now, what you'll want to do is set up an rsync process that does the following...
1. send a zfs command from A to B creating a snapshot of the appropriate file systems.
2. rsync said filesystem from A to B
3. Sleep some amount of time, goto step 1.
Now at some point you'll want to clean up all the snapshots on B, but that's an exercise I'll leave to the reader.
Another option is to take the snapshot on A and then use zfs send to send the snapshots as well.
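The three steps above might be scripted along these lines. This is only a sketch: the dataset name, host name, and paths are hypothetical, and it assumes passwordless ssh from A to B with rsync and the zfs command available. Snapshot cleanup on B is still left out.

```python
import subprocess
import time
from datetime import datetime
from typing import List


def cycle_commands(dataset: str, source_dir: str, host: str,
                   dest_dir: str, stamp: str) -> List[List[str]]:
    """Return the commands for one snapshot-then-sync cycle, run from box A."""
    # Step 1: snapshot the dataset on B, preserving its state before the sync.
    snapshot = ["ssh", host, "zfs", "snapshot", f"{dataset}@{stamp}"]
    # Step 2: bring B up to date with A; --delete mirrors removals too.
    sync = ["rsync", "-a", "--delete",
            source_dir.rstrip("/") + "/",
            f"{host}:{dest_dir.rstrip('/')}/"]
    return [snapshot, sync]


def run_forever(dataset: str, source_dir: str, host: str, dest_dir: str,
                interval_seconds: int = 24 * 3600) -> None:
    """Step 3: repeat the cycle, sleeping between runs."""
    while True:
        stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
        for cmd in cycle_commands(dataset, source_dir, host, dest_dir, stamp):
            subprocess.run(cmd, check=True)  # stop on the first failure
        time.sleep(interval_seconds)


if __name__ == "__main__":
    # Print one cycle rather than running it; call run_forever(...) for real use.
    for cmd in cycle_commands("tank/data", "/tank/data", "boxB",
                              "/tank/data", "20110801-020000"):
        print(" ".join(cmd))
```

Because the snapshot on B is taken before each sync, a bad change replicated from A can still be undone from B's previous snapshot.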
Yes Francis, the world has gone crazy.
just use GlusterFS and get to your happy place, seriously awesome stuff with a great community, having tried many alternatives, GlusterFS is light weight, easy to setup and manage :)
ciao/Riaan
I've put MooseFS through its paces with good results on FreeBSD and a couple of MacBook Pros. The easy configuration, real-time stats, self-healing and the ability to quickly add more instances to increase throughput are just a few highlights. Documentation is a bit terse, but complete enough for anyone with a few hours to spare to get it up and running. There are a few quite large companies using it in production in Europe.
Gluster - up and down all the time in heavy LAMP production for about a year. Ended up replacing it with Netapp.
P.S. I am in no way affiliated with any of the products/companies mentioned in my post.
Most certainly is very relevant to the question. His entire premise is NOT losing data. He went out of his way to recite two separate anecdotes to that effect. And none of his solutions resolve that goal.
help me i've cloned myself and can't remember which one I am
... That's a lot of porn
"help me i've cloned myself and can't remember which one I am" ----> Kill the less powerful one.
I come to Slashdot only to read sigs. One you are reading is mine.
Just use ZFS / SAN... why not ???
http://www.drbd.org/
Yeah, when looking at his post it seems that things can never get boring around him;)
rdiff-backup keeps old versions around via backwards differences. And if you use it right, the most recent version of a file can just be read from the backup drive without using rdiff-backup.
And it can operate over a network.
According to the docs, it's supposed to be available for mac and windows, too.
-- hendrik
Try FreeNAS or Openfiler; they both offer software RAID with double parity. If you want, seed a second box and drop it in a remote location (parents' or friend's house), then keep it up to date with whatever method you prefer: tunneled rsync, whatever. They support snapshots, so even if you replicate something bad you can get it back.
I know this is not 4chan but
> 8 TB
> 'researching' a cluster file system
> one or two servers
OP is retarded... slashretarded
I hope you lose all your porn, retard
Then you will need to care only for 4 MB of important data
I have experience with glusterfs - it's ok for large volumes and you can always access underlying filesystem directly, but I wouldn't call it safe. Some files might not get written completely (if any server gets restarted in process), or some replication errors (like invalid startup sequence after power outage) could lead to files missing completely from all machines. So, my scripts are like copy/wait/verify/delete for most access.
I would recommend software mirroring + lvm on top of that with periodic backups on online storage (also diff file lists for deletes and email reports for files deleted locally).
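The copy/wait/verify/delete routine mentioned above could look roughly like the following. This is a sketch with made-up function names, not the poster's actual scripts. Note that reading a file back right after writing it may be served from the local cache, so on a real replicated volume the check is more convincing when run after a delay or from another client.

```python
import hashlib
import os
import shutil
import time


def sha256_of(path: str) -> str:
    """Hash a file in 1 MiB chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def move_verified(src: str, dst: str, settle_seconds: float = 0.0) -> None:
    """Copy src to dst, wait, verify the copy, and only then delete src."""
    expected = sha256_of(src)
    shutil.copy2(src, dst)             # copy
    time.sleep(settle_seconds)         # wait, e.g. for replication to settle
    if sha256_of(dst) != expected:     # verify
        raise IOError(f"checksum mismatch for {dst}; keeping {src}")
    os.remove(src)                     # delete the original last
```

The original is removed only after the copy has been read back and matches, so an interrupted or partial write leaves the source file in place.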
I tried Ceph at around 5 computers for two weeks and experienced dataloss (reported the bug). They have probably fixed it by now.
After that, I switched to MooseFS, which works great (although it's not as advanced as Ceph). MooseFS has done a great job for a year now.
You can also consider MooseFS http://www.moosefs.org/
Setup is easy, it can be mounted in user space, and rpm/deb packages exist.