Slashdot Mirror


Distributed Filesystems for Linux?

zoneball asks: "What would you use for a distributed file system for Linux? I have several GNU/Linux machines running at home, and wanted to be able to see more or less the same file tree (especially all the ~user directories) regardless of which machine I'm connected to, and where the traversal into the distributed file system space is largely transparent for the end-user. Are there any URLs or documents that compare the features, bugs, road map, stability of these and other distributed filesystems? Which offers the best stability and protection from future obsolescence?"

Zoneball looked at three distributed filesystems; here are his thoughts:

"OpenAFS was the solution I chose because I have experience with it from college. For performance, AFS was built with an intelligent client-side cache, but it did not handle network disconnects gracefully. There are other alternatives out there, though.

Coda appears to be a research fork from an earlier version of AFS. Coda supports disconnected operations. But the consensus on Usenet (when I looked into filesystems a while ago) was that Coda was still too 'experimental.'

InterMezzo looks like it was started with the lessons learned from Coda, but (again from Usenet) people have said that it is still too unstable and that it crashes their servers. The last 'news' on their site is dated almost a year ago, so I don't even know if it's being developed or not."

So if you were to recommend a distributed filesystem for Linux machines, would you choose one of the three filesystems listed here, or something else entirely?

28 of 375 comments

  1. rsync? by vadim_t · · Score: 1, Interesting

    If some latency is acceptable, you could just set up cron to run rsync, or some other synchronization tool, every 5 minutes. Just don't forget to run an NTP server on your network, and synchronize the time on every computer that runs rsync. Otherwise you might lose data due to clocks being out of sync.

  2. None of the above by SlightlyMadman · · Score: 3, Interesting

    It seems like a distributed filesystem might be overkill for your needs. If what you really want is the appearance of a single common machine, why not just pick one as a server, and set up your other boxes as X clients? You can even pull out most of their memory and storage, and stick it in the server, thus turning them all into pretty powerful machines.

    --

    Money I owe, money-iy-ay
  3. Re:Mirroring file system by Arethan · · Score: 4, Interesting

    I usually use rsync for one way backups, and unison where I need 2 way synchronization.
    Rsync is nice because you can update lots of files very quickly, as it only moves binary diffs between files. Also, if it is a costly network link, you have the option to specify max transfer rates, so you don't kill your pipe when it runs from your cron job.
    Unison is nice because it is pretty smart about determining which files should be moved, and can correctly handle new and deleted files on either end of the link. Plus it supports doing all of its communication via ssh, so it's secure.

    rsync

    unison

    The downside to both of these is that neither of them is instantaneous. However, I've had much success running both of these as often as every 5 minutes. Just make sure that you launch them from a script that is smart enough to check for already running instances before it starts trying to move data.
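
    A minimal sketch of such a wrapper script, in Python. It takes a non-blocking exclusive lock with fcntl.flock so that a second run exits quietly while the first is still moving data, and it passes rsync's --bwlimit option to cap the transfer rate. The lock path, source, destination and rate are hypothetical placeholders, not values from this thread.

```python
import fcntl
import subprocess
import sys

# Hypothetical values for illustration; none of these come from the thread.
LOCK_PATH = "/tmp/home-sync.lock"
SOURCE = "/home/"
DEST = "backuphost:/srv/mirror/home/"


def try_lock(path):
    """Return an open file holding an exclusive lock, or None if it is taken."""
    handle = open(path, "w")
    try:
        # LOCK_NB makes flock fail immediately instead of waiting.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle


def rsync_command(src, dest, bwlimit=None):
    """Build the rsync argument list; bwlimit caps the transfer rate."""
    cmd = ["rsync", "-a"]
    if bwlimit is not None:
        cmd.append(f"--bwlimit={bwlimit}")
    return cmd + [src, dest]


def main():
    lock = try_lock(LOCK_PATH)
    if lock is None:
        print("previous sync still running, skipping", file=sys.stderr)
        return 0
    try:
        return subprocess.call(rsync_command(SOURCE, DEST, bwlimit=500))
    finally:
        lock.close()  # closing the file releases the lock
```

    To use it from cron, end the script with sys.exit(main()) and schedule it with a "*/5 * * * *" crontab entry. The kernel drops the lock when the process exits, so a crashed run should not leave a stale lock behind.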

  4. Re:NFS by Anonymous Coward · · Score: 1, Interesting

    NFS alone does not handle this, but you can do (or at least adequately simulate) many of these features with additional software.

    Connection and disconnection are handled by your automounter daemon, "autofs".

    Given that you distribute each user to one host's disk, you can combine them all to a common /home directory (or wherever you like to mount them) using autofs. You will often create an auto.home configuration file in autofs that contains a mount point for each user. These all then appear in the common /home directory.
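
    As a rough illustration of what that auto.home file holds, here is a small Python sketch that renders one map line per user in the usual autofs "key -options location" form. The user names, host names, export root and mount options are made up for the example.

```python
def render_auto_home(user_hosts, export_root="/export/home", options="rw"):
    """Render autofs map lines of the form: key -options host:/path.

    user_hosts maps each user name to the host that stores that
    user's home directory. All names here are hypothetical.
    """
    lines = []
    for user, host in sorted(user_hosts.items()):
        lines.append(f"{user}\t-{options}\t{host}:{export_root}/{user}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example: two users whose home directories live on different servers.
    print(render_auto_home({"alice": "server1", "bob": "server2"}), end="")
```

    The map is then tied to the /home mount point by a line such as "/home /etc/auto.home" in auto.master. The same file has to be present on every client, which is the part that a directory service would distribute.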

    In order to help you maintain your auto.home file you might use an administration aid such as NIS (or NYS or NIS+). This helps you keep your configuration files on several machines common to one another.

    Replication is handled by mirroring the disks and possibly providing dual servers.

    While the discreteness of your distribution of files is in user account chunks, this does make for a distributed file system.

    Yes, it is all kluged together, and something better needs to be created.

  5. Re:NFS by rmdyer · · Score: 4, Interesting

    Nope, NFS is -not- a distributed file system. NFS is a point to point file system. And, unless you are using kerberized NFS, it is not secure.

    The only file system that is truly distributed, has a global namespace, replication, and fault tolerance is AFS.

    NFS is pretty much the same as CIFS for Windows. And, version 4 still doesn't have global namespace and volume location.

    So, NFS can't be a common answer because it isn't even allowed to be in the game.

    +4 cents.

  6. Plan 9 by Anonymous Coward · · Score: 1, Interesting

    Plan 9 gives you a different perspective and it is interesting.

  7. NFS is not even close to secure by SuperBanana · · Score: 4, Interesting
    It's not the most secure option around

    That's like saying "jumping off a cliff is not the most intelligent thing to do." NFS is easily the LEAST secure option of ANY filesharing system.

    NFS is only appropriate on a 100% secured (physical and network-level) network. If just anyone can plug in, forget it. If anyone has root on ANY system or there are ANY non-Unix systems, forget it. If ANY system is physically accessible and can be booted off, say, a CD-ROM, forget it. The only major security tool at your disposal is access by IP, which is pathetic. Oh, and you can block root access.

    Even though you can block root access for some/all clients, it's still massively insecure, and this remains NFS's greatest problem. You have zero way of authenticating a system. NFS is like a store where you could walk in, pick up any item you wanted, and say "I'm Joe Shmoe, bill me for this!" and they'd say "Right-o!" without even looking at you. All systems with the right IPs are explicitly trusted, and their user/permissions setups are also explicitly trusted.

    NFS is a pretty good performer, especially when tuned right and on a non-broken client (which Linux is VERY far from). However, its entire security model is in dire need of a complete overhaul. There needs to be a way to authenticate hosts, for one, more similar to WinNT's domain setup, which is actually incredibly intelligent (aside from the weak LANMAN encryption). The administrative functionality in NFS can't compare to the features that have been available to MacOS and Windows administrators for over a decade, and it's purely embarrassing.

    Either that, or AFS/Coda need to get a lot more documentation and (for Coda) implementation fixes. The Unix world desperately needs a good filesharing system...

    1. Re:NFS is not even close to secure by rneches · · Score: 3, Interesting

      And if you're lazy and/or adventurous, you can turn on NFS over TCP in your kernel and tunnel it over ssh or ppp/ssh. I've never tried it, but it ought to work. I understand that NFS over TCP is relatively untested, but is reputed to work rather well. Doing weird things like this would be a pretty good way to test the NFS over TCP code, and I'm sure the developers would be interested to hear how it goes. Particularly if you run a lot of data over it for a long time, and have a good way of verifying that all is well. Or, better still - if all is not well, and you have a good way of articulating what went wrong.

      Of course, that doesn't mean it's a good idea. I think your solution with IPSec is much more elegant. Unfortunately, I happen to need to get through a heavily packet-shaped network that massively favors port 80, and drops random packets everywhere else. Not IPSec friendly at all. I avoid this by running multiple ppp/ssh tunnels through the retarded parts of the network and letting my gateway balance between them. Unfortunately, this requires privileged accounts on many, many boxes in odd places.

      By the way, 10 points to any Northeastern University students who send polite, well considered complaints to Network Services. Not RESNet - they exist only to prevent you from talking to Network Services. Don't bother yelling at them - they exist specifically for that purpose. RESNet has no authority whatsoever to, for instance, allow CVS to work when Network Services decides to drop 90 percent of packets on port 2401. This is for your benefit - I'm perfectly happy with my tunnels.

      --
      In spite of the suggestions and all the tests that I have made, I have not cavato a spider from the hole.
    2. Re:NFS is not even close to secure by bfields · · Score: 4, Interesting
      "Maybe NFS4 is your answer?"
      More up-to-date NFSv4 links: As part of University of Michigan/CITI's work on NFSv4, we're implementing rpcsec_gss on Linux, which uses Kerberos to authenticate every NFS request and reply. This applies equally well to earlier versions of NFS, and interoperates with other vendors' NFS implementations. While it's still not sufficiently tested for production use, the code is going into the 2.5 kernel series (thank-you, Mr. Torvalds, for accepting crypto into 2.5...) and is being actively developed.

      --Bruce Fields

    3. Re:NFS is not even close to secure by Agthorr · · Score: 3, Interesting
      I run NFS over IPSec. That solves many of the security issues.

      -- Agthorr

  8. NIS == "Hack me please" by Kunta+Kinte · · Score: 4, Interesting
    Don't use NIS, unless you have absolutely no other option.

    Other options like LDAPS and Kerberos offer at least some form of security.

    ypcat, then a brute force attack on the resulting passwd file, is as old as dirt, and sadly still works. I was a bit disappointed when I saw NIS as a required service on the Red Hat cert syllabus.

    This may sound harsh, but I don't think there is much excuse for running NIS in this day and age. Anyone who does this in an environment where security is a concern deserves what they get.

    --
    Based on upvotes, Ageism is the only "-ism" Slashdotters care about and think isn't SJW
    1. Re:NIS == "Hack me please" by Morrig · · Score: 2, Interesting

      NIS supports shadow, so there's no need to distribute an unshadowed passwd map like your example.

      Anyway, a lot of Unix systems that have been around for a long time still use NIS. This is probably why it's still on the Red Hat certs (people wanting to put a Red Hat workstation on their existing network, or to upgrade their existing NIS server to something newer).

      Besides, it can be a pain to get rid of using NIS, especially on older networks. So it will probably be around, and will need to be supported, for many many years.

  9. Something more than this... by dargaud · · Score: 3, Interesting
    May I suggest something more than this? If you have static IPs and you are running Linux, why not install OpenMosix? It's a cluster patch to the kernel, very easy to install and use. Not only does it turn your pile of hardware into one giant SMP system, it also comes with a special filesystem on top of ext3, so that you can see all drives from all nodes. I have it running on 24 processors, but I don't know how well it would perform through the Internet.

    It's been featured on Slashdot before.

    --
    Non-Linux Penguins ?
  10. WebDAV by Anonymous Coward · · Score: 1, Interesting

    You can use it as a distributed file system. I think you'd need a bit of glue logic to make it look like one, though.

  11. Re:NFS/BOOTP by BJH · · Score: 2, Interesting

    I'm doing this, but one thing that annoys me about Linux is that it can't do swap over NFS, which means I'm stuck with just installed RAM on my diskless clients.

    Sure, there are patches around for it, but they're not exactly reliable. Come on, guys, Solaris has been doing this for years now - how hard can it be?

  12. cvs? subversion? by Anonymous Coward · · Score: 1, Interesting

    Linux Journal (I think) had a story by a guy who was using cvs to sync his home directory between work and home. I think he said he did commits and updates every few days, or when he got tired of things being out of sync. For what he wanted, consistent config files and so on with little hassle, and the ability to intelligently merge differences if necessary, it worked well enough for him.

  13. WebDAV by g4dget · · Score: 3, Interesting
    Right now, I think the answer is to run NFS: it's by far the easiest to set up and the best in a UNIX environment. AFS, CODA, Intermezzo, and SMB are pretty iffy in comparison.

    In the medium term, however, I think WebDAV will become a better option, because it can be served and accessed with standard web servers and clients, in addition to being mappable onto the file system.

    The Linux kernel already has WebDAV support (CODA hooks plus some user-mode process), although I'm not sure how well it works.

  14. Re:Mirroring file system - example w/ssh by Earlybird · · Score: 2, Interesting

    To do a true backup, you must copy permissions. To copy permissions, the target system needs to have the same UIDs and GIDs as the source system. This is hard to do on Windows and OS X. Typical tools such as rsync, Unison and rdiff-backup make no effort to solve this problem. Suggestions?

  15. Samba 3 Virtual File System by Anonymous Coward · · Score: 2, Interesting

    Doesn't samba-tng support a true DFS, exporting a virtual file system (combination of shares from multiple systems)?

    Anyone using it?

  16. SHFS by TimCrider · · Score: 1, Interesting

    http://shfs.sourceforge.net/

    This has some potential.

    1. Re:SHFS by Cyuonut · · Score: 1, Interesting
      This has some potential.
      As the web page states:
      "To say the truth, it is our work for Operating Systems course at Charles University. It is just hack, but works, at least for me :-)."

      Obviously LuFS has still more potential.
  17. Re:Self Certifying File System by angio · · Score: 3, Interesting

    I'm from the same lab from which SFS comes, so I'm a bit biased, but I've been using it in a production setting for the last two years. My major use is to work from home and access my MIT filesystem remotely. I also maintain a network of ~40 machines distributed around the world, and I use SFS to provide access to centralized home directories on them. Very, very convenient. The software is stable, and the support is good. It works on *BSD and Linux. It also works on some versions of MacOS X, but may require an upgraded gcc on the latest (see the fs.net mail archives).

    Highly recommend checking it out. Mega convenient.

  18. Linux FailSafe by Anonymous Coward · · Score: 1, Interesting

    Why don't you try Linux FailSafe? It's GPL and available on SGI's web site. It can cluster applications as well as filesystems.

  19. Re:Mirroring file system - example w/ssh by lars_stefan_axelsson · · Score: 2, Interesting
    To do a true backup, you must copy permissions. To copy permissions, the target system needs to have the same UIDs and GIDs as the source system.

    Use rsync. By default it maps user and group names at both ends of the connection, unless you specify --numeric-ids. Of course you have to have at least the names right, otherwise there's nothing to work with. And you need root on the receiving end, but that's also to be expected.
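
    To make the difference concrete, here is a small Python sketch of the mapping idea. It is an illustration of the behaviour described above, not rsync's actual code, and the uid tables are invented for the example. It assumes that when a name has no match on the receiving side, the numeric ID from the source is kept.

```python
def map_owner(src_uid, src_names, dest_uids, numeric_ids=False):
    """Return the uid a file would be given on the receiving side.

    src_names maps uid -> user name on the sending machine.
    dest_uids maps user name -> uid on the receiving machine.
    Both tables are hypothetical stand-ins for the passwd databases.
    """
    if numeric_ids:
        # Like --numeric-ids: copy the number as-is, ignore names.
        return src_uid
    name = src_names.get(src_uid)
    # Fall back to the source number when the name is unknown.
    return dest_uids.get(name, src_uid)


if __name__ == "__main__":
    linux_names = {1000: "alice"}   # sending side
    osx_uids = {"alice": 501}       # receiving side
    print(map_owner(1000, linux_names, osx_uids))
    print(map_owner(1000, linux_names, osx_uids, numeric_ids=True))
```

    So as long as "alice" exists on both machines, her files end up owned by her on the target even though the numbers differ.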

    I've been using rsync for some time now to manage moving research data between home and school and I'm thoroughly impressed. Great piece of software.

    --
    Stefan Axelsson
  20. IBM GPFS by LynXmaN · · Score: 2, Interesting

    IBM deployed a new distributed filesystem that goes beyond AFS. It's called GPFS and it's part of the xCAT package. You can find it here.

    Unfortunately, documentation is really poor at this moment... but I think it could be a really good solution.

    --
    May the source be with you!
  21. What troubles me about AFS/OpenAFS... by Sloppy · · Score: 2, Interesting
    ..is that apparently it doesn't use unix-style file permissions. "ACLs are better" you might say, but still, it's different. It sounds like using it would not be transparent -- not just for the admin who has to learn how to set it up but also for users and existing software and scripts, which assume the chmod way of doing things.

    I mean, if I use AFS, does that mean from now on, every time I run an install script for some random package that chmods something, I have to realize that the script doesn't really work, and then I have to analyze its intent and then do some ACL thing that accomplishes the same intent? Ugh, I am not interested in things that create more work for humans.

    Another annoying-looking thing is that it's really a filesystem, even from the servers' point of view. Unlike sharing/exporting services such as NFS and Samba, which you can run on top of your choice of filesystem (ext3, Reiserfs, xfs, etc), it appears that AFS/OpenAFS combines both the disk and the network topics. That means you don't get the advantages of all the great work the filesystem geeks have been doing in the last few years.

    It almost strikes me as inelegant design or something, that a single project concerns itself with both the details of how things are laid out on a disk, and also how to do network-related things such as replication. Somebody made their black box too big for my tastes.

    Am I wrong about all this?

    --
    As copyright owner of this comment, I authorize everyone to defeat any technological measure which limits access to it.
  22. Re:Self Certifying File System by oneself · · Score: 2, Interesting

    How does it handle disconnects?
    Are the files available off-line?

  23. I use CXFS at work by leeet · · Score: 2, Interesting

    I guess this doesn't really apply to "home usage" but I have to manage a lot of machines over a SAN and if you don't want people screwing up your SAN, you better use something like CXFS.
    CXFS uses a sort of token technique and allows multiple file accesses. That way, we get the same files on all the machines but w/o the NFS overhead and network congestion. File reads/writes are done over the Fibre Channel switch and the "metadata" is done over a private network. This is WAY faster than NFS over Gigabit Ethernet. One good thing about CXFS is the redundancy possibility. You can have failover servers and other neat things.

    The only drawback is that you need an SGI server, but then you can use Windows and Solaris clients. Very stable but probably not for home use :)

    --
    -- Leeeter than leet