Distributed Filesystems for Linux?
Zoneball looked at 3 distributed filesystems, here are his thoughts:
" Open AFS was the solution I chose because I have the experience with it from college. For performance, AFS was built with an intelligent client-side cache, but did not support network disconnects nicely. But there are other alternatives out there.
Coda appears to be a research fork from an earlier version of AFS. Coda supports disconnected operations. But, the consensus on the Usenet (when I looked into filesystems a while ago) was that Coda was still too 'experimental.'
Intermezzo looks like it was started with the lessons learned from Coda, but (again from Usenet) people have said that it is still too unstable and it crashes their servers. The last 'news' on their site is dated almost a year ago, so I don't even know if it's being developed or not"
So if you were to recommend a distributed filesystem for Linux machines, would you choose one of the three filesystems listed here, or something else entirely?
NFS Linux FAQ
Howto #1
Howto #2
If you find yourself needing help, try asking people at Just Linux forums, or trying the NFS mailing list.
I would use SFS, the Self Certifying File System. Assuming all the systems you are using are supported, it offers global, secure access to anything you care to export.
Since openafs forked from the old transarc/IBM codebase, it looks as if it has a real future. It's used by a load of educational and research institutions (notably CERN), as well as Wall Street firms.
Check here for a good background on DFS. It also has a quick table comparison of the popular programs, and a walkthrough to set up Intermezzo.
We use PVFS at work to give us a high-performance network filesystem for use with our clusters.
http://parlweb.parl.clemson.edu/pvfs/
I run an openmosix cluster with the openmosix filesystem here at work. Three computers.. no problems...
If you want to take a look..
http://lucifer.intercosmos.net/index.php
linkage and I am going to be placing some tutorials up. -joeldg
anime+manga together at last.. in real time.
Samba works fine. I personally have approximately 5 samba mounts in my filesystem totally transparent for anybody who was to walk up and use my computer.
No need to unnecessarily complicate things here, samba is simple to set up and functions great.
That's what NIS is for. You can schedule regular downloads of group/passwd files, which are updated in a NIS database stored on a master server, and passed down to "slave" servers.
While there is no new news posted on the site, ther are current tarballs on the ftp server, as recent as 5.9.03. (but that file appears to be a redux, last update to code seems to be 3.13.03)
The sourceforge page for the project (http://sourceforge.net/projects/intermezzo) shows status as production/stable but the info there looks stale too.
This guy must have installed too many versions of the same Microsoft products. ... You can still configure you networking using scripts for 2.0- or 2.2-based distros. You can often use 20 year old programs under Unix, albeit sometimes with some effort.
In the GNU/Linux world, BSD world, and to some extend in the entire Unix world, good designs do not become obsolete. Even not-so-good designs often stick around, for the sake of backward compatibility. In the newest greatest Linux kernel, you can still have a.out support, NFS, Minix, FAT16 filesystem support
Only in the M$ world is obsolescence such a big issue, because that obsolescence is planned. In short, don't worry that much about obsolescence : if Coda is as good as it looks, it'll be there for a long time. If SomeCrappyDistributedFS FileSystem is used by enough users, it'll stay around for compatibility's sake anyway, even if it sucks.
"A door is what a dog is perpetually on the wrong side of" - Ogden Nash
To be honest, big time, but a lot of people forget the other side of life with NFS, and that's NIS/NIS+. The yp-tools include pretty good NIS support, but not sure of NIS+. Would use niether in a production environment personally, but a common Auth system which is easy to manage would solve that issue.
/home directory of 5 computers onto / on a central system and it display all the mounts simultaneously. For example:
/home/foo
/home/baz
/home/tic
/home/tac
/home/toe
/home /home /home
/tac
/toe
/foo
/baz
Could also look into LDAP (VERY complex, no good starting point that I've been able to find) and Kerbreos auth methods as well.
Should give you a central point for uids/usernames. But NFS does not have transparent mounting that I'm aware of so that you could mount, say the
<ECODE>
CPU1 contains:
CPU2 contains:
CPU3 contains:
on CPU4, you'd do the following:
mount CPU1:/home
mount CPU2:/home
mount CPU3:/home
And you'd end up with on CPU4:
/home/tic
</ECODE>
If there is a way to do this, please lemme know. I've heard people talk about it in the past, but haven't seen anything come of it yet.
-What have you contributed lately?
That should have read...
Format, Install Windows Server 2000 or 2003, Repeat
Just so you all know. NFS is a network accessible FS. A DFS can also be network accessible from clients, but it physically resides on multiple systems.
Whoa, you definitely need Unison.
Unison will synchronize any two file trees in The Right Way (TM).
Get the gtk version for interactive conflict resolution.
((lambda (x) (x x)) (lambda (x) (x x))) http://www.endpointcomputing.com a scientific approach to custom computing.
Which offers the best stability and protection from future obsolescence?
The best protection from future obsolescence is to use something that is already obsolete.
I usually use rsync for one way backups, and unison where I need 2 way synchronization.
Rsync is nice because you can update lots of files very quickly, as it only moves binary diff's between files. Also, if it is a costly network link, you have the option to specify max transfer rates, so you don't kill your pipe when it runs from your cron job.
Unison is nice because it is pretty smart about determining which files should be moved, and can correctly handle new and deleted files on either end of the link. Plus it supports doing all of it's comm via ssh, so it's secure.
rsync
unison
The downside to both of these being that neither of them are instantaneous. However, I've had much success running both of these as often as every 5 minutes. Just make sure that you launch them from a script that is smart enough to check for already running instances before it starts trying to move data.
It's become such a part of my day to day life that I can't really describe the things I was missing before. The best things about it are probably the strong, flexible security and ease of administration. It also gives you everything you need from a small shop all the way up to a globally available decentralized data store.
There seems to be a good comparison here. I would strongly recommend AFS for all of your distributed filesystem needs. (The OpenAFS developers are cool too!)
LRC, the best-read libertarian site on the web
I just went through this process a few weeks ago and I must say I'm really glad I went through the trouble of setting it up...it's very cool. I actually wrote a tutorial about how to accomplish this by using NIS and NFS. I hope you find it helpful.
The only trouble you might run into with the setup I used is some file-locking issues with programs wanting to share the same preference files.
--It's Pimptastic!--
That's like saying "jumping off a cliff is not the most intelligent thing to do." NFS is easily the LEAST secure option of ANY filesharing system.
NFS is only appropriate on a 100% secured(physical and network-level) network. If anyone/someone can plug in, forget it. If anyone has root on ANY system or there are ANY non-unix systems, forget it. If ANY system is physically accessible and can be booted off, say, a CDROM, forget it. The only major security tool at your disposal is access by IP, which is pathetic. Oh, and you can block root access.
Even though you can block root access for some/all clients, it's still massively insecure, and this remains NFS's greatest problem. You have zero way of authenticating a system. NFS is like a store where you could walk in, pick up any item you wanted, and say "I'm Joe Shmoe, bill me for this!" and they'd say "Right-o!" without even looking at you. All systems with the right IPs are explicitly trusted, and their user/permissions setups are also explicitly trusted.
NFS is a pretty good performer, especially when tuned right and on a non-broken client(which linux is VERY far from.) However, its entire security model is in dire need of a complete overhaul. There needs to be a way to authenticate hosts, for one, more similar to WinNT's domain setup, which is actually incredibly intelligent(aside from the weak LANMAN encryption.) The administrative functionality in NFS can't compare to the features that have been available to MacOS and Windows administrators for over a decade, and it's purely embarassing.
Either that, or AFS/Coda need to get a lot more documentation and (for Coda)implementation fixes. The unix world desperately needs a good filesharing system...
Please help metamoderate.
Other options like LDAPS and Kerberos offer at least some form of security.
ypcat, then brute force attack on the resulting passwd file is as old as dirt, and sadly still works. I was a bit dissappointed when I saw NIS as a required service on the Redhat cert syllabus.
This may sound harsh, but I don't think there is much excuse for run NIS in this day and age. Anyone who does this in an environment where security is a concerns deserves what they get.
Based on upvotes, Ageism is the only "-ism" Slashdotters care about and think isn't SJW
I had more or less the same basic requirements and I opted for AFS.
:DFS, DCOM, Directory Services, SSO, DCE-RPC, etc.)
My needs were a little more demanding (had to be implemented in GNU/Linux, Solaris, AIX, HP-UX and as an extra Windows 2000) and grocking AFS can be difficult at first but it was the best choice by far. Stable across all the Unices, very secure (this was another requirement) and integrates perfectly with our Kerberos Domain and LDAP accounting info. It provides a unique namespace that can span multiple servers transparently, does replication, automatic backups and read-only copies, client-side cache with callbacks, has a backup (to tape) system that can be used stand-alone or integrated with existing backup structures (Amanda, Legato, TSM) AND was the basis for the DCE filesystem, DFS (as a side note I find it interesting - and sad - that most things people try to emulate this days are present in DCE , and Windows 2000 got many of the "new features" from a technology initially made for Unix
AFS is amazing and much more robust than any distributed filesystem I know of; it has shortcomings when servers time out, but apart from that it's really an excellent solution; an example I generally use to give an idea of some of the good features of AFS is a relocation of a home directory to another server. The user doesn't even notice that his home directory was moved to another server *even if he was using it and was writing stuff to disk*; at most all writing calls to his home dir have a small delay (a couple of seconds) even if his/her home dir was 5 Gb worth.
Kerberos integration is an added bonus, if you can you can use this as an excuse to kerberize your systems and form a Kerberos Domain. If you don't want to just stick with the standard AFS KA server.
In my setup I have Windows users accessing their home dirs in AFS using the Kerberos tickets they have from the Windows login and the fact that a cross-realm trust was made between the Unix DOmain and the AD; the can edit all the files they are entitled to with that ticket, and the system is so secure that Transarc used to put the source code in it's public AFS share and added the customers that bought the source to the ACL of the directory that contained it.
With all this features it would be hard not to vivedly recommend OpenAFS as the best solution for a unified, distributed filesystem. Bandwidth utilization is, in my experience, at least half of what NFS uses, which is an added bonus.
cheers,
fsmunoz
Why not stick with NFS for the time being?
I went through the "is coda right for me?" phase, and also "is intermezzo right for me?" and also spent tens of hours researching distributed filesystems and cluster filesystems online ... my conclusion
is that the area is still immature, I will let the pot simmer for a
few more years (hopefully not many), and use NFS in the meantime.
My situation: desire for scalable and fault-tolerant distributed filesystem for home use with minimal maintenance or balancing effort. Emphasis on scalable, I want to be able to grow the filesystem essentially without limit. I also don't want to spend much time moving data between partitions. And last but not least, the bigger the filesystem grows, the less able I will be to back it up properly. I want redundancy so that if a disk dies the data is mirrored onto another disk, or if a server dies then the clients can continue to access the filesystem through another server.
All that seems to be quite a tall order. I checked out coda, afs, PVCS, sgi's xfs, frangipani, petal, nfs, intermezzo, berkeley's xfs, jfs, Sistina's gfs and some project Microsoft is doing to build a serverless filesystem based on a no-trust paradigm (that's quite unusual for Microsoft!).
Berkeley's xFS (now.cs.berkeley.edu) sounded the most promising but it appears to be a defunct project, as their website has been dead ever since I learned of it, and I expect the team never took it beyond the "research" stage into "let's GPL this and transform it into a robust production environment". Frangipani sounds interesting also, and maybe a little more alive than xFS.
On the other hand coda, afs and intermezzo are all in active development. afs IMHO suffered from kerberitis, i.e. once you start using kerberos it invades everything and it has lots of problems (which I read about on the openAFS list every day). AFS doesn't support live replication (replication is done in a batch sense) either.
CODA doesn't scale and doesn't have expected filesystem functionality: for 80 gigs of server space I would require 3.2 gigs of virtual memory, and there's a limit to the size of a CODA directory (256k) which isn't seen in ordinary filesystems. There's also the full-file-download "feature". CODA is good for serving small filesystems to frequently disconnected clients but it is not good for serving the gigabyte AVIs which I want to share with my family.
Intermezzo is a lot more lightweight than CODA and will scale a lot better, but it's still a mirroring system rather than a network filesystem. I might use that to mirror my remote server where I just want to keep the data replicated and have write access on both the server and the client, but it's again not a solution for my situation.
The best thing about intermezzo is that it sits on top of a regular filesystem, so if you lose intermezzo the data is still safe in the underlying filesystem. CODA creates its own filesystem within files on a regular filesystem, and if you lose CODA then the data is trapped.
Frangipani is based on sharing data blocks, so like NFS it should be suitable for distributing files of arbitrary size. I need to look at it in a lot more detail; this is probably the right way to build a cluster filesystem for the long haul. For the short term, Intermezzo is probably the right way for a lot of people: it copies files from place to place on top of existing filesystems.
What I did in the end:
The way it works is tha
Watch for the new version of NFSv4. There are already a sample implementation in the linux 2.5 tree. NFSv4 will address most of the problems that NFSv3 and others have. Including plugin security models, namespace, and revamped ACL handling.
It's also WAN friendly, letting several operations be done at the same time with a single directive. (COMPOUND directive) It also allows you to migrate one filesystem to another with no stale filehandles. Basically, it's trying to be an AFS killer.
For more information, take a look at
http://www.nfsv4.org/
Lots of good info including the IETF spec. It's a interesting read.
The spec is not quite complete. Currently, I believe there are discussions with how NFSv4 will work with IPsec.
Cheers,
sri