Slashdot Mirror


Distributed Filesystems for Linux?

zoneball asks: "What would you use for a distributed file system for Linux? I have several GNU/Linux machines running at home, and wanted to be able to see more or less the same file tree (especially all the ~user directories) regardless of which machine I'm connected to, and where the traversal into the distributed file system space is largely transparent for the end-user. Are there any URLs or documents that compare the features, bugs, road map, stability of these and other distributed filesystems? Which offers the best stability and protection from future obsolescence?"

Zoneball looked at three distributed filesystems; here are his thoughts:

" Open AFS was the solution I chose because I have the experience with it from college. For performance, AFS was built with an intelligent client-side cache, but did not support network disconnects nicely. But there are other alternatives out there.

Coda appears to be a research fork from an earlier version of AFS. Coda supports disconnected operations. But, the consensus on the Usenet (when I looked into filesystems a while ago) was that Coda was still too 'experimental.'

Intermezzo looks like it was started with the lessons learned from Coda, but (again from Usenet) people have said that it is still too unstable and it crashes their servers. The last 'news' on their site is dated almost a year ago, so I don't even know if it's being developed or not"

So if you were to recommend a distributed filesystem for Linux machines, would you choose one of the three filesystems listed here, or something else entirely?

30 of 375 comments

  1. NFS by mao+che+minh · · Score: 4, Informative
    I know that this is going to be the most common answer, but just go with NFS. It's not the most secure option around, but obviously the simplest to implement and the best documented.

    NFS Linux FAQ
    Howto #1
    Howto #2

    If you find yourself needing help, try asking people at Just Linux forums, or trying the NFS mailing list.

    1. Re:NFS by gallir · · Score: 5, Insightful

      Naaaaaaaaaa.....

      NFS is not distributed, it's only "networked" or "remote". It doesn't support any of replication, disconnection, sharing, or distribution. It is centralised and requires the same user name/number space and security.

      In one word, it's far from meeting the requirements, at least if you compare it with the filesystems listed in the question.

      --
      sgis ddo ekil t'nod i
    2. Re:NFS by rmdyer · · Score: 4, Interesting

      Nope, NFS is -not- a distributed file system. NFS is a point to point file system. And, unless you are using kerberized NFS, it is not secure.

      The only file system that is truly distributed, has a global namespace, replication, and fault tolerance is AFS.

      NFS is pretty much the same as CIFS for Windows. And, version 4 still doesn't have global namespace and volume location.

      So, NFS can't be a common answer because it isn't even allowed to be in the game.

      +4 cents.

    3. Re:NFS by nosferatu-man · · Score: 4, Insightful

      "For every complex problem, there is an answer that is clear, simple, and wrong." -- HL Mencken

      'jfb

      --
      To spur "enterprise Linux," Big Bang, the distributed two-phase commit.
  2. Self Certifying File System by nescafe · · Score: 5, Informative

    I would use SFS, the Self Certifying File System. Assuming all the systems you are using are supported, it offers global, secure access to anything you care to export.

  3. Well it depends... by Tsugumi · · Score: 5, Informative
    For my money, nfs in a LAN, afs over a WAN, it really depends on the size of the network you're trying to play with.

    Since openafs forked from the old transarc/IBM codebase, it looks as if it has a real future. It's used by a load of educational and research institutions (notably CERN), as well as Wall Street firms.

  4. Background on DFS by El+Pollo+Loco · · Score: 5, Informative

    Check here for a good background on DFS. It also has a quick table comparison of the popular programs, and a walkthrough to set up Intermezzo.

  5. PVFS by Kraken137 · · Score: 5, Informative

    We use PVFS at work to give us a high-performance network filesystem for use with our clusters.

    http://parlweb.parl.clemson.edu/pvfs/

  6. openmosix by joeldg · · Score: 5, Informative

    I run an openmosix cluster with the openmosix filesystem here at work. Three computers.. no problems...
    If you want to take a look..
    http://lucifer.intercosmos.net/index.php
    linkage and I am going to be placing some tutorials up. -joeldg

  7. Ye olde Samba by Anonymous Coward · · Score: 4, Informative

    Samba works fine. I personally have approximately 5 samba mounts in my filesystem totally transparent for anybody who was to walk up and use my computer.

    No need to unnecessarily complicate things here, samba is simple to set up and functions great.

  8. Re:permissions? by phorm · · Score: 4, Informative

    That's what NIS is for. You can schedule regular downloads of group/passwd files, which are updated in a NIS database stored on a master server, and passed down to "slave" servers.

  9. Intermezzo does appear to be a current project by Dr.Zap · · Score: 5, Informative

    While there is no new news posted on the site, there are current tarballs on the ftp server, as recent as 5.9.03. (but that file appears to be a redux, last update to code seems to be 3.13.03)

    The sourceforge page for the project (http://sourceforge.net/projects/intermezzo) shows status as production/stable but the info there looks stale too.

  10. Future obsolescence ? by Rosco+P.+Coltrane · · Score: 4, Insightful
    Which offers the best stability and protection from future obsolescence?

    This guy must have installed too many versions of the same Microsoft products.
    In the GNU/Linux world, BSD world, and to some extent in the entire Unix world, good designs do not become obsolete. Even not-so-good designs often stick around, for the sake of backward compatibility. In the newest greatest Linux kernel, you can still have a.out support, NFS, Minix, FAT16 filesystem support ... You can still configure your networking using scripts for 2.0- or 2.2-based distros. You can often use 20 year old programs under Unix, albeit sometimes with some effort.

    Only in the M$ world is obsolescence such a big issue, because that obsolescence is planned. In short, don't worry that much about obsolescence : if Coda is as good as it looks, it'll be there for a long time. If SomeCrappyDistributedFS FileSystem is used by enough users, it'll stay around for compatibility's sake anyway, even if it sucks.

    --
    "A door is what a dog is perpetually on the wrong side of" - Ogden Nash
  11. Re:permissions? by Dysan2k · · Score: 4, Informative

    To be honest, big time, but a lot of people forget the other side of life with NFS, and that's NIS/NIS+. The yp-tools include pretty good NIS support, but not sure of NIS+. Would use neither in a production environment personally, but a common Auth system which is easy to manage would solve that issue.

    Could also look into LDAP (VERY complex, no good starting point that I've been able to find) and Kerberos auth methods as well.

    Should give you a central point for uids/usernames. But NFS does not have transparent mounting that I'm aware of so that you could mount, say the /home directory of 5 computers onto / on a central system and it display all the mounts simultaneously. For example:

    <ECODE>
    CPU1 contains: /home/foo
                   /home/baz

    CPU2 contains: /home/tic
                   /home/tac

    CPU3 contains: /home/toe

    on CPU4, you'd do the following:
    mount CPU1:/home /home
    mount CPU2:/home /home
    mount CPU3:/home /home

    And you'd end up with on CPU4:
    /home/tic
    /home/tac
    /home/toe
    /home/foo
    /home/baz
    </ECODE>

    If there is a way to do this, please lemme know. I've heard people talk about it in the past, but haven't seen anything come of it yet.
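
    One common way to get close to this is an automounter indirect map, where each user's directory is mounted on demand from whichever host exports it, rather than stacking three mounts on /home. The sketch below only generates such a map; it assumes Linux autofs, reuses the CPU1..CPU3 names from the example above as hostnames, and the function name make_home_map is made up for illustration.

```shell
# make_home_map: read "host user" pairs on stdin and print one autofs
# indirect-map entry per user, in the form: key  -options  host:/path
make_home_map() {
    while read -r host user; do
        # skip blank lines and comment lines in the input
        case $host in ''|'#'*) continue ;; esac
        printf '%s\t-rw\t%s:/home/%s\n' "$user" "$host" "$user"
    done
}
```

    Feeding it lines such as "CPU1 foo" and "CPU2 tic" yields entries like "foo -rw CPU1:/home/foo". The output would typically be saved as a map file (for example /etc/auto.home, an assumed name) and referenced from auto.master with a line like "/home /etc/auto.home", so that /home/foo and /home/tic appear side by side on the client.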

    --
    -What have you contributed lately?
  12. Re:Format, Install Windows Server 2000 or 2003 by donkeyboy · · Score: 5, Funny

    That should have read...

    Format, Install Windows Server 2000 or 2003, Repeat

  13. NFS is not a DFS by purplebear · · Score: 5, Informative

    Just so you all know. NFS is a network accessible FS. A DFS can also be network accessible from clients, but it physically resides on multiple systems.

  14. Re:Mirroring file system by dlakelan · · Score: 4, Informative

    Whoa, you definitely need Unison.

    Unison will synchronize any two file trees in The Right Way (TM).

    Get the gtk version for interactive conflict resolution.

    --
    ((lambda (x) (x x)) (lambda (x) (x x))) http://www.endpointcomputing.com a scientific approach to custom computing.
  15. Obsolete ? by CmdrTostado · · Score: 5, Funny

    Which offers the best stability and protection from future obsolescence?

    The best protection from future obsolescence is to use something that is already obsolete.

  16. Re:Mirroring file system by Arethan · · Score: 4, Interesting

    I usually use rsync for one way backups, and unison where I need 2 way synchronization.
    Rsync is nice because you can update lots of files very quickly, as it only moves binary diff's between files. Also, if it is a costly network link, you have the option to specify max transfer rates, so you don't kill your pipe when it runs from your cron job.
    Unison is nice because it is pretty smart about determining which files should be moved, and can correctly handle new and deleted files on either end of the link. Plus it supports doing all of its comm via ssh, so it's secure.

    rsync

    unison

    The downside to both of these being that neither of them are instantaneous. However, I've had much success running both of these as often as every 5 minutes. Just make sure that you launch them from a script that is smart enough to check for already running instances before it starts trying to move data.
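
    A minimal sketch of such a wrapper, assuming a POSIX shell. It uses a lock directory because mkdir either creates the directory or fails in one atomic step; the function name run_exclusive and the lock path are made up for illustration.

```shell
# run_exclusive LOCKDIR COMMAND [ARGS...]
# Run COMMAND only if no other instance currently holds LOCKDIR.
run_exclusive() {
    lockdir=$1
    shift
    # mkdir fails if the directory already exists, so it doubles as an
    # atomic "test and take the lock" step.
    if ! mkdir "$lockdir" 2>/dev/null; then
        echo "previous run still active, skipping" >&2
        return 75
    fi
    status=0
    "$@" || status=$?
    # release the lock and pass the command's exit status through
    rmdir "$lockdir"
    return "$status"
}
```

    From cron this could be called as, say, run_exclusive /tmp/home-sync.lock rsync -az --bwlimit=200 /home/ backuphost:/home/ (backuphost is a placeholder, and --bwlimit is the rsync rate cap mentioned above). Note that if the job is killed hard the lock directory is left behind and has to be removed by hand.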

  17. AFS vs NFS by runderwo · · Score: 4, Insightful
    It takes more time to set up an AFS cell than a NFS server, but the rewards are pretty tremendous IMO.

    It's become such a part of my day to day life that I can't really describe the things I was missing before. The best things about it are probably the strong, flexible security and ease of administration. It also gives you everything you need from a small shop all the way up to a globally available decentralized data store.

    There seems to be a good comparison here. I would strongly recommend AFS for all of your distributed filesystem needs. (The OpenAFS developers are cool too!)

    1. Re:AFS vs NFS by pHDNgell · · Score: 4, Informative

      I'm disturbed at the number of people who are recommending NFS as a distributed filesystem solution. While it might be easy to get going initially, I've had more long-term problems with my NFS server and client interactions than my AFS. To get my NFS clients to behave anything like AFS clients, I had to build and install an automounter that could use NIS config.

      You only have to wait for the first day you want to reboot a fileserver without breaking every system on your network or waiting for startup dependencies, etc... One day, I moved all of the volumes off of an active fileserver (i.e. volumes being written) and shut the thing down and moved it to another machine room, brought it back up, and moved the volumes back. The reads and writes continued uninterrupted, no clients had to be restarted, no hung filesystems anywhere, etc...

      --
      -- The world is watching America, and America is watching TV.
  18. Tutorial by TheFlu · · Score: 5, Informative

    I just went through this process a few weeks ago and I must say I'm really glad I went through the trouble of setting it up...it's very cool. I actually wrote a tutorial about how to accomplish this by using NIS and NFS. I hope you find it helpful.

    The only trouble you might run into with the setup I used is some file-locking issues with programs wanting to share the same preference files.

  19. NFS is not even close to secure by SuperBanana · · Score: 4, Interesting
    It's not the most secure option around

    That's like saying "jumping off a cliff is not the most intelligent thing to do." NFS is easily the LEAST secure option of ANY filesharing system.

    NFS is only appropriate on a 100% secured(physical and network-level) network. If anyone/someone can plug in, forget it. If anyone has root on ANY system or there are ANY non-unix systems, forget it. If ANY system is physically accessible and can be booted off, say, a CDROM, forget it. The only major security tool at your disposal is access by IP, which is pathetic. Oh, and you can block root access.

    Even though you can block root access for some/all clients, it's still massively insecure, and this remains NFS's greatest problem. You have zero way of authenticating a system. NFS is like a store where you could walk in, pick up any item you wanted, and say "I'm Joe Shmoe, bill me for this!" and they'd say "Right-o!" without even looking at you. All systems with the right IPs are explicitly trusted, and their user/permissions setups are also explicitly trusted.

    NFS is a pretty good performer, especially when tuned right and on a non-broken client(which linux is VERY far from.) However, its entire security model is in dire need of a complete overhaul. There needs to be a way to authenticate hosts, for one, more similar to WinNT's domain setup, which is actually incredibly intelligent(aside from the weak LANMAN encryption.) The administrative functionality in NFS can't compare to the features that have been available to MacOS and Windows administrators for over a decade, and it's purely embarrassing.

    Either that, or AFS/Coda need to get a lot more documentation and (for Coda)implementation fixes. The unix world desperately needs a good filesharing system...

    1. Re:NFS is not even close to secure by tzanger · · Score: 5, Informative

      I use a very simple script to help keep NFS secure:

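      IPTABLES=/usr/sbin/iptables
      RPCINFO=/usr/sbin/rpcinfo
      GREP=/usr/bin/grep
      AWK=/usr/bin/awk

      # Empty the nfs chain, then create it if it does not exist yet
      $IPTABLES -F nfs
      $IPTABLES -N nfs &> /dev/null
      # Add a DROP rule for every port an NFS-related RPC service has registered
      $RPCINFO -p localhost | $AWK '/portmap|mount|nfs|lock|stat/ \
      { print "iptables -A nfs -p " $3 " --dport " $4 " -j DROP" }' | \
      /bin/bash

      # Hook the nfs chain into INPUT unless that has already been done
      $IPTABLES -L INPUT -vn | $GREP -q 'nfs all -- !ipsec0+'
      if [ $? -ne 0 ]; then
        $IPTABLES -I INPUT 1 -i eth0 -j nfs
      fi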

      Basically it only allows incoming NFS-related connections over ipsec, dropping anything that is not. NFS port allocation is dynamic by default and I know you can force ports, but this seemed far easier to scale.

      One thing I have noticed (and perhaps it's common knowledge to NFS experts) is that in order to get locking to work at all, my NFS clients had to be running statd and lockd. Without 'em everything worked but locking would fail every time.

    2. Re:NFS is not even close to secure by bfields · · Score: 4, Interesting
      "Maybe NFS4 is your answer?"
      More up-to-date NFSv4 links: As part of University of Michigan/CITI's work on NFSv4, we're implementing rpcsec_gss on Linux, which uses kerberos to authenticate every NFS request and reply. This applies equally well to earlier versions of NFS, and interoperates with other vendors' NFS implementations. While it's still not sufficiently tested for production use, the code is going into the 2.5 kernel series (thank-you, Mr. Torvalds, for accepting crypto into 2.5...) and is being actively developed.

      --Bruce Fields

  20. NIS == "Hack me please" by Kunta+Kinte · · Score: 4, Interesting
    Don't use NIS, unless you have absolutely no other option.

    Other options like LDAPS and Kerberos offer at least some form of security.

    ypcat, then brute force attack on the resulting passwd file is as old as dirt, and sadly still works. I was a bit disappointed when I saw NIS as a required service on the Redhat cert syllabus.

    This may sound harsh, but I don't think there is much excuse for running NIS in this day and age. Anyone who does this in an environment where security is a concern deserves what they get.

    --
    Based on upvotes, Ageism is the only "-ism" Slashdotters care about and think isn't SJW
  21. OpenAFS all the way by fsmunoz · · Score: 5, Informative

    I had more or less the same basic requirements and I opted for AFS.

    My needs were a little more demanding (had to be implemented in GNU/Linux, Solaris, AIX, HP-UX and as an extra Windows 2000) and grokking AFS can be difficult at first but it was the best choice by far. Stable across all the Unices, very secure (this was another requirement) and integrates perfectly with our Kerberos Domain and LDAP accounting info. It provides a unique namespace that can span multiple servers transparently, does replication, automatic backups and read-only copies, client-side cache with callbacks, has a backup (to tape) system that can be used stand-alone or integrated with existing backup structures (Amanda, Legato, TSM) AND was the basis for the DCE filesystem, DFS (as a side note I find it interesting - and sad - that most things people try to emulate these days are present in DCE, and Windows 2000 got many of the "new features" from a technology initially made for Unix: DFS, DCOM, Directory Services, SSO, DCE-RPC, etc.)

    AFS is amazing and much more robust than any distributed filesystem I know of; it has shortcomings when servers time out, but apart from that it's really an excellent solution; an example I generally use to give an idea of some of the good features of AFS is a relocation of a home directory to another server. The user doesn't even notice that his home directory was moved to another server *even if he was using it and was writing stuff to disk*; at most all writing calls to his home dir have a small delay (a couple of seconds) even if his/her home dir was 5 Gb worth.

    Kerberos integration is an added bonus; if you can, use this as an excuse to kerberize your systems and form a Kerberos Domain. If you don't want to, just stick with the standard AFS KA server.

    In my setup I have Windows users accessing their home dirs in AFS using the Kerberos tickets they have from the Windows login and the fact that a cross-realm trust was made between the Unix Domain and the AD; they can edit all the files they are entitled to with that ticket, and the system is so secure that Transarc used to put the source code in its public AFS share and added the customers that bought the source to the ACL of the directory that contained it.

    With all these features it would be hard not to vividly recommend OpenAFS as the best solution for a unified, distributed filesystem. Bandwidth utilization is, in my experience, at least half of what NFS uses, which is an added bonus.

    cheers,

    fsmunoz

    1. Re:OpenAFS all the way by MilliAtAcme · · Score: 4, Informative

      I second this "all the way" thought. I've been running OpenAFS for almost 2 years now on Debian GNU/Linux (many Thanks to Sam Hartman, the maintainer) and have never been disappointed. It's been pretty darn solid and, most importantly, has never lost any of my data through various upgrade cycles. It's a bit of a change in thinking, however, for those coming from an NFS background.

      There were three big wins for me...

      (1) Global file namespace managed server-side and accessible from anywhere... LAN, WAN, whatever. All clients see files in the same location.

      Unlike NFS, where you have to "mount" volumes within the file system on each client, the AFS file system is globally the same, living under "/afs", so every client accesses the same information via the same file system path. A notion of "cells" makes this possible... information under a single administrative authority lives in a "cell", e.g., "/afs/athena.mit.edu" is the top-most "mount point" for a well-known cell at MIT. Volumes, in AFS parlance, also aren't tied to any particular server or even location in the name space as far as the clients know. A client doesn't have to know explicitly in its configuration which server a given bit of information lives on, and that data can be moved around behind the scenes as necessary (increase the volume space, increase the redundancy, taken offline, etc...) All volume mounts are handled server-side. The clients only have to know about the cell database server, and that can be determined via AFSDB records in DNS. (I.e., your AFS "cell" name matches up with your domain name, e.g., /afs/athena.mit.edu matches up with "athena.mit.edu" in DNS.) So almost all management aspects are handled server-side.

      (2) Client side implementations.

      All my Linux and Windows machines can access the same AFS file space. An OS X client is available too, but I've not needed that to date, but might someday. I thus have all home directory information, as well as a lot of binaries, living in the AFS file space, in one place. And behind the scenes, that info is on multiple AFS servers that have RAID-5 disk arrays and weekly tape backups going on.

      (3) The file system "snapshot" feature, for backups.

      You can take a snapshot of volume(s) at a particular point in time and roll them onto tape without needing to take them offline. You don't have to worry about inconsistencies in the files. Folks can continue to update files but the backup snapshot doesn't change. Very much the same as the snapshot feature on Netapps. These snapshots, called backup volumes, can even be mounted in the file space so folks can get access to the old view of the volume, e.g., accidentally deleted a critical file and need it back.

      And security via Kerberos is nice, especially if you already have an infrastructure. But it's not too hard to set up a single KDC to get started. In the Debian distribution docs for OpenAFS, there's a setup and configuration transcript that makes things relatively easy and clears up a lot of questions.

      In summary, OpenAFS is a very good solution here.

  22. A potted review of several distributed filesystems by elronxenu · · Score: 5, Informative

    Why not stick with NFS for the time being?

    I went through the "is coda right for me?" phase, and also "is intermezzo right for me?" and also spent tens of hours researching distributed filesystems and cluster filesystems online ... my conclusion is that the area is still immature, I will let the pot simmer for a few more years (hopefully not many), and use NFS in the meantime.

    My situation: desire for scalable and fault-tolerant distributed filesystem for home use with minimal maintenance or balancing effort. Emphasis on scalable, I want to be able to grow the filesystem essentially without limit. I also don't want to spend much time moving data between partitions. And last but not least, the bigger the filesystem grows, the less able I will be to back it up properly. I want redundancy so that if a disk dies the data is mirrored onto another disk, or if a server dies then the clients can continue to access the filesystem through another server.

    All that seems to be quite a tall order. I checked out coda, afs, PVCS, sgi's xfs, frangipani, petal, nfs, intermezzo, berkeley's xfs, jfs, Sistina's gfs and some project Microsoft is doing to build a serverless filesystem based on a no-trust paradigm (that's quite unusual for Microsoft!).

    Berkeley's xFS (now.cs.berkeley.edu) sounded the most promising but it appears to be a defunct project, as their website has been dead ever since I learned of it, and I expect the team never took it beyond the "research" stage into "let's GPL this and transform it into a robust production environment". Frangipani sounds interesting also, and maybe a little more alive than xFS.

    On the other hand coda, afs and intermezzo are all in active development. afs IMHO suffered from kerberitis, i.e. once you start using kerberos it invades everything and it has lots of problems (which I read about on the openAFS list every day). AFS doesn't support live replication (replication is done in a batch sense) either.

    CODA doesn't scale and doesn't have expected filesystem functionality: for 80 gigs of server space I would require 3.2 gigs of virtual memory, and there's a limit to the size of a CODA directory (256k) which isn't seen in ordinary filesystems. There's also the full-file-download "feature". CODA is good for serving small filesystems to frequently disconnected clients but it is not good for serving the gigabyte AVIs which I want to share with my family.

    Intermezzo is a lot more lightweight than CODA and will scale a lot better, but it's still a mirroring system rather than a network filesystem. I might use that to mirror my remote server where I just want to keep the data replicated and have write access on both the server and the client, but it's again not a solution for my situation.

    The best thing about intermezzo is that it sits on top of a regular filesystem, so if you lose intermezzo the data is still safe in the underlying filesystem. CODA creates its own filesystem within files on a regular filesystem, and if you lose CODA then the data is trapped.

    Frangipani is based on sharing data blocks, so like NFS it should be suitable for distributing files of arbitrary size. I need to look at it in a lot more detail; this is probably the right way to build a cluster filesystem for the long haul. For the short term, Intermezzo is probably the right way for a lot of people: it copies files from place to place on top of existing filesystems.

    What I did in the end:

    • new server (Celeron 1.3 GHz, 512 meg RAM)
    • 2 x 80 gig IDE disks
    • Each IDE drive has 2 partitions (one small, one huge)
    • Each partition is RAID-1 mirrored with its partner on the other disk
    • The huge RAID partition is defined to Linux LVM (logical volume manager)
    • Logical volumes are created within that for root, /home, etc...
    • All logical volumes are of type ext3 for recoverability.
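
    A rough sketch of the commands behind a layout like the one listed above, written as a dry run that only prints what it would do. The tools (mdadm, LVM2, mkfs.ext3) are an assumption, as are the device names, the volume group name vg0, and the volume sizes; none of them come from the setup described here.

```shell
# plan_storage DISK_A DISK_B
# Print (do not run) the commands for a RAID-1 + LVM + ext3 layout.
plan_storage() {
    a=$1
    b=$2
    # mirror the small partitions and the huge partitions pairwise
    echo "mdadm --create /dev/md0 --level=1 --raid-devices=2 ${a}1 ${b}1"
    echo "mdadm --create /dev/md1 --level=1 --raid-devices=2 ${a}2 ${b}2"
    # hand the huge mirror over to LVM
    echo "pvcreate /dev/md1"
    echo "vgcreate vg0 /dev/md1"
    # carve out logical volumes and format each one as ext3
    for lv in root:4G home:40G; do
        name=${lv%%:*}
        size=${lv##*:}
        echo "lvcreate -L $size -n $name vg0"
        echo "mkfs.ext3 /dev/vg0/$name"
    done
}
```

    Calling plan_storage /dev/hda /dev/hdc prints eight commands to review. Because the volume group can be extended later with further mirrored pairs, logical volumes can grow without moving data between partitions by hand.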

    The way it works is tha

  23. Watch for NFSv4 in the future! by Sri+Ramkrishna · · Score: 4, Informative

    Watch for the new version of NFS, NFSv4. There is already a sample implementation in the linux 2.5 tree. NFSv4 will address most of the problems that NFSv3 and others have, including pluggable security models, namespace, and revamped ACL handling.

    It's also WAN friendly, letting several operations be done at the same time with a single directive. (COMPOUND directive) It also allows you to migrate one filesystem to another with no stale filehandles. Basically, it's trying to be an AFS killer.

    For more information, take a look at
    http://www.nfsv4.org/

    Lots of good info including the IETF spec. It's an interesting read.

    The spec is not quite complete. Currently, I believe there are discussions about how NFSv4 will work with IPsec.

    Cheers,
    sri