Distributed Filesystems for Linux?
Zoneball looked at three distributed filesystems; here are his thoughts:
"OpenAFS was the solution I chose because I have experience with it from college. For performance, AFS was built with an intelligent client-side cache, but it did not handle network disconnects nicely. But there are other alternatives out there.
Coda appears to be a research fork from an earlier version of AFS. Coda supports disconnected operations. But, the consensus on the Usenet (when I looked into filesystems a while ago) was that Coda was still too 'experimental.'
InterMezzo looks like it was started with the lessons learned from Coda, but (again from Usenet) people have said that it is still too unstable and it crashes their servers. The last 'news' on their site is dated almost a year ago, so I don't even know if it's being developed or not."
So if you were to recommend a distributed filesystem for Linux machines, would you choose one of the three filesystems listed here, or something else entirely?
If some latency is acceptable, you could just set up cron to run rsync, or some other synchronization tool, every 5 minutes. Just don't forget to run an NTP server on your network, and synchronize the time on every computer that runs rsync. Otherwise you might lose data due to clocks being out of sync.
It seems like a distributed filesystem might be overkill for your needs. If what you really want is the appearance of a single common machine, why not just pick one as a server, and set up your other boxes as X clients? You can even pull out most of their memory and storage, and stick it in the server, thus turning them all into pretty powerful machines.
I usually use rsync for one-way backups, and unison where I need two-way synchronization.
Rsync is nice because you can update lots of files very quickly, as it only moves binary diffs between files. Also, if it is a costly network link, you have the option to specify max transfer rates, so you don't kill your pipe when it runs from your cron job.
Unison is nice because it is pretty smart about determining which files should be moved, and can correctly handle new and deleted files on either end of the link. Plus it supports doing all of its communication via ssh, so it's secure.
rsync
unison
The downside to both is that neither is instantaneous. However, I've had much success running both of these as often as every 5 minutes. Just make sure that you launch them from a script that is smart enough to check for already running instances before it starts trying to move data.
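For what it's worth, here is a minimal sketch of the kind of wrapper script described above, written in Python. It takes a non-blocking exclusive lock on a file before starting rsync, so a cron run that fires while the previous one is still copying simply exits. The lock path, the source and destination arguments, and the function names are all hypothetical; `--bwlimit` is rsync's real option for capping the transfer rate.

```python
import fcntl
import subprocess

LOCK_PATH = "/tmp/home-sync.lock"  # hypothetical lock file location


def acquire_lock(path):
    """Return an open file holding an exclusive lock, or None if it is already held."""
    handle = open(path, "w")
    try:
        # LOCK_NB makes flock fail immediately instead of waiting for the lock.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle


def build_rsync_command(source, dest, bwlimit_kbps=None):
    """Build the rsync argument list, optionally capping the transfer rate."""
    cmd = ["rsync", "-az"]
    if bwlimit_kbps is not None:
        cmd.append(f"--bwlimit={bwlimit_kbps}")
    cmd += [source, dest]
    return cmd


def run_sync(source, dest, bwlimit_kbps=None, lock_path=LOCK_PATH):
    """Run one sync pass unless another instance still holds the lock."""
    lock = acquire_lock(lock_path)
    if lock is None:
        return 0  # previous run still in progress; skip this one
    try:
        return subprocess.call(build_rsync_command(source, dest, bwlimit_kbps))
    finally:
        lock.close()  # closing the file releases the lock
```

A cron entry would then call something like `run_sync("/home/me/", "backuphost:/backup/me/", bwlimit_kbps=500)` every 5 minutes (the host and paths are made up). The lock is released automatically if the process dies, which is the main reason to prefer `flock` over checking for a stale PID file.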
NFS alone does not handle this, but you can provide (or at least adequately simulate) many of these features with additional software.
Given that you distribute each user to one host's disk, you can combine them all into a common /home directory (or wherever you like to mount them) using autofs. You will often create an auto.home configuration file in autofs that contains a mount point for each user. These all then appear in the common /home directory.
Connection and disconnection are handled by your automounter daemon, "autofs".
In order to help you maintain your auto.home file you might use an administration aid such as NIS (or NYS or NIS+). This helps you keep your configuration files on several machines common to one another.
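As a rough illustration of the auto.home idea, here is a small Python sketch that renders an autofs indirect map from a table of which host holds each user's home directory. The host names, the `/export/home` path, and the mount options are assumptions made up for the example; the line layout (key, options, location) is the usual autofs map format.

```python
def render_auto_home(user_hosts, export_root="/export/home", options="-fstype=nfs,rw"):
    """Render one autofs map line per user: key, mount options, host:path."""
    lines = []
    for user in sorted(user_hosts):
        host = user_hosts[user]
        lines.append(f"{user}\t{options}\t{host}:{export_root}/{user}")
    return "\n".join(lines) + "\n"


# Hypothetical layout: each user's home lives on exactly one file server.
example_map = render_auto_home({"bob": "fs2", "alice": "fs1"})
print(example_map, end="")
```

The output could be written to the auto.home file on each client, or pushed out as a NIS map so every machine sees the same table.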
Replication is handled by mirroring the disks and possibly providing dual servers.
While the discreteness of your distribution of files is in user account chunks, this does make for a distributed file system.
Yes, it is all kluged together, and something better needs to be created.
Nope, NFS is -not- a distributed file system. NFS is a point to point file system. And, unless you are using kerberized NFS, it is not secure.
The only file system that is truly distributed, has a global namespace, replication, and fault tolerance is AFS.
NFS is pretty much the same as CIFS for Windows. And, version 4 still doesn't have global namespace and volume location.
So, NFS can't be a common answer because it isn't even allowed to be in the game.
+4 cents.
Plan 9 gives you a different perspective and it is interesting.
That's like saying "jumping off a cliff is not the most intelligent thing to do." NFS is easily the LEAST secure option of ANY filesharing system.
NFS is only appropriate on a 100% secured (physical and network-level) network. If anyone/someone can plug in, forget it. If anyone has root on ANY system or there are ANY non-unix systems, forget it. If ANY system is physically accessible and can be booted off, say, a CDROM, forget it. The only major security tool at your disposal is access by IP, which is pathetic. Oh, and you can block root access.
Even though you can block root access for some/all clients, it's still massively insecure, and this remains NFS's greatest problem. You have zero way of authenticating a system. NFS is like a store where you could walk in, pick up any item you wanted, and say "I'm Joe Shmoe, bill me for this!" and they'd say "Right-o!" without even looking at you. All systems with the right IPs are explicitly trusted, and their user/permissions setups are also explicitly trusted.
NFS is a pretty good performer, especially when tuned right and on a non-broken client (which Linux is VERY far from). However, its entire security model is in dire need of a complete overhaul. There needs to be a way to authenticate hosts, for one, more similar to WinNT's domain setup, which is actually incredibly intelligent (aside from the weak LANMAN encryption). The administrative functionality in NFS can't compare to the features that have been available to MacOS and Windows administrators for over a decade, and it's purely embarrassing.
Either that, or AFS/Coda need to get a lot more documentation and (for Coda) implementation fixes. The unix world desperately needs a good filesharing system...
Other options like LDAPS and Kerberos offer at least some form of security.
ypcat, then a brute force attack on the resulting passwd file, is as old as dirt, and sadly still works. I was a bit disappointed when I saw NIS as a required service on the Red Hat cert syllabus.
This may sound harsh, but I don't think there is much excuse for running NIS in this day and age. Anyone who does this in an environment where security is a concern deserves what they get.
It's been featured on slashdot before.
Non-Linux Penguins ?
You can use it as a distributed file system. I think you'd need a bit of glue logic to make it look like one, though.
I'm doing this, but one thing that annoys me about Linux is that it can't do swap over NFS, which means I'm stuck with just installed RAM on my diskless clients.
Sure, there are patches around for it, but they're not exactly reliable. Come on, guys, Solaris has been doing this for years now - how hard can it be?
Linux Journal (I think) had a story by a guy who was using cvs to sync his home directory between work and home. I think he said he did commits and updates every few days, or when he got tired of things being out of sync. For what he wanted, consistent config files and so on with little hassle, and the ability to intelligently merge differences if necessary, it worked well enough for him.
In the medium term, however, I think WebDAV will become a better option, because it can be served and accessed with standard web servers and clients, in addition to being mappable onto the file system.
The Linux kernel already has WebDAV support (CODA hooks plus some user-mode process), although I'm not sure how well it works.
To do a true backup, you must copy permissions. To copy permissions, the target system needs to have the same UIDs and GIDs as the source system. This is hard to do on Windows and OS X. Typical tools such as rsync, Unison and rdiff-backup make no effort to solve this problem. Suggestions?
Doesn't samba-tng support a true DFS, exporting a virtual file system (combination of shares from multiple systems)?
Anyone using it?
http://shfs.sourceforge.net/
This has some potential.
I'm from the same lab from which SFS comes, so I'm a bit biased, but I've been using it in a production setting for the last two years. My major use is to work from home and access my MIT filesystem remotely. I also maintain a network of ~40 machines distributed around the world, and I use SFS to provide access to centralized home directories on them. Very, very convenient. The software is stable, and the support is good. It works on *BSD and Linux. It also works on some versions of MacOS X, but may require an upgraded gcc on the latest (see the fs.net mail archives).
Highly recommend checking it out. Mega convenient.
Why don't you try Linux FailSafe? It's GPL and available on SGI's web site. It can cluster applications as well as filesystems.
Use rsync. The default is to map user and group names at both ends of the connection, unless you specify --numeric-ids. Of course you have to have at least the names right, otherwise there's nothing to work with. And you need root privileges on the receiving end, but that's also to be expected.
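To make the difference concrete, here is a simplified Python model of the two behaviours. It is only a sketch of the idea, not rsync's actual code: the function name and the lookup tables are hypothetical, and it assumes the documented fallback that an ID with no matching name on the other side is kept as the raw number.

```python
def map_owner_id(src_id, src_id_to_name, dest_name_to_id, numeric_ids=False):
    """Pick the ID to apply on the receiving side for a file owned by src_id."""
    if numeric_ids:
        # With --numeric-ids the raw number is copied, names are ignored.
        return src_id
    name = src_id_to_name.get(src_id)
    if name is None:
        # No name on the source: nothing to match, keep the number.
        return src_id
    # Same name on the destination wins; otherwise fall back to the number.
    return dest_name_to_id.get(name, src_id)


# Hypothetical accounts: "alice" is UID 1000 on Linux but 501 on the Mac.
linux_users = {1000: "alice", 1001: "bob"}
mac_users = {"alice": 501}
print(map_owner_id(1000, linux_users, mac_users))
print(map_owner_id(1000, linux_users, mac_users, numeric_ids=True))
```

So as long as the account names agree on both machines, the numeric UIDs and GIDs don't have to.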
I've been using rsync for some time now to manage moving research data between home and school and I'm thoroughly impressed. Great piece of software.
Stefan Axelsson
IBM deployed a new distributed filesystem that goes beyond AFS; it's called GPFS and it's part of the xCAT package. You can find it here.
Unfortunately, documentation is really poor at this moment... but I think it could be a really good solution.
I mean, if I use AFS, does that mean from now on, every time I run an install script for some random package that chmods something, I have to realize that the script doesn't really work, and then I have to analyze its intent and then do some ACL thing that accomplishes the same intent? Ugh, I am not interested in things that create more work for humans.
Another annoying-looking thing is that it's really a filesystem, even from the servers' point of view. Unlike sharing/exporting services such as NFS and Samba, which you can run on top of your choice of filesystem (ext3, Reiserfs, xfs, etc), it appears that AFS/OpenAFS combines both the disk and the network topics. That means you don't get the advantages of all the great work the filesystem geeks have been doing in the last few years.
It almost strikes me as inelegant design or something, that a single project concerns itself with both the details of how things are laid out on a disk, and also how to do network-related things such as replication. Somebody made their black box too big for my tastes.
Am I wrong about all this?
How does it handle disconnects?
Are the files available off-line?
I guess this doesn't really apply to "home usage" but I have to manage a lot of machines over a SAN and if you don't want people screwing up your SAN, you better use something like CXFS.
:)
CXFS uses a sort of token technique and allows multiple file accesses. That way, we get the same files on all the machines but without the NFS overhead and network congestion. File reads and writes are done over the Fibre Channel switch and the "metadata" is done over a private network. This is way faster than NFS over Gigabit Ethernet. One good thing about CXFS is the possibility of redundancy. You can have failover servers and other neat things.
The only drawback is that you need an SGI server, but then you can use Windows and Solaris clients. Very stable, but probably not for home use.