Slashdot Mirror


The Linux Filesystem Challenge

Joe Barr writes "Mark Stone has thrown down the gauntlet for Linux filesystem developers in his thoughtful essay on Linux.com. The basic premise is that Linux must find a next-generation filesystem to keep pace with Microsoft and Apple, both of whom are promising new filesystems in a year or two. Never mind that Microsoft has been promising its "innovative" native database/filesystem (copying an idea from IBM's hugely successful OS/400) for more than ten years now. Anybody remember Cairo?"

33 of 654 comments

  1. New FS by stecoop · · Score: 5, Interesting

    Linux must find a next-generation filesystem to keep pace

    What are the winds of change saying? R..E..I..S..E..R...4...

    1. Re:New FS by AstroDrabb · · Score: 5, Informative
      Reiser4 is going to be great. Here are some of the features for those who don't like to click-n-read:

      * Reiser4 is the fastest filesystem, and here are the benchmarks.
      * Reiser4 is an atomic filesystem, which means that your filesystem operations either entirely occur, or they entirely don't, and they don't corrupt due to half occurring. We do this without significant performance losses, because we invented algorithms to do it without copying the data twice.
      * Reiser4 uses dancing trees, which obsolete the balanced tree algorithms used in databases (see farther down). This makes Reiser4 more space efficient than other filesystems because we squish small files together rather than wasting space due to block alignment like they do. It also means that Reiser4 scales better than any other filesystem. Do you want a million files in a directory, and want to create them fast? No problem.
      * Reiser4 is based on plugins, which means that it will attract many outside contributors, and you'll be able to upgrade to their innovations without reformatting your disk. If you like to code, you'll really like plugins....
      * Reiser4 is architected for military grade security (sponsored by DARPA). You'll find it is easy to audit the code, and that assertions guard the entrance to every function.
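
      To put rough numbers on the block-alignment point in the list above, here is a minimal Python sketch. It is not from Namesys and says nothing about how Reiser4 packs files; the 4 KiB block size and the workload of one million 100-byte files are assumptions picked purely for illustration.

```python
def allocated_bytes(file_size: int, block_size: int = 4096) -> int:
    """Space consumed when every file must occupy whole blocks."""
    blocks = (file_size + block_size - 1) // block_size  # integer ceiling
    return blocks * block_size


def slack_report(file_sizes, block_size: int = 4096):
    """Return (payload, allocated, wasted) byte counts for a set of files."""
    payload = sum(file_sizes)
    allocated = sum(allocated_bytes(size, block_size) for size in file_sizes)
    return payload, allocated, allocated - payload


if __name__ == "__main__":
    # Hypothetical workload: one million 100-byte files in one directory.
    payload, allocated, wasted = slack_report([100] * 1_000_000)
    print(f"payload:   {payload / 2**20:8.1f} MiB")
    print(f"allocated: {allocated / 2**20:8.1f} MiB")
    print(f"wasted:    {wasted / 2**20:8.1f} MiB ({wasted / allocated:.1%})")
```

      Any filesystem that can share a block between small files gets most of that wasted figure back, which is the claim being made for dancing trees.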

      Notice the plugin feature. This will create endless possibilities for what you can do with the file system. Want to tie a DB/SQL search function in to it? Write a plugin. Want special security? Write a plugin. Tons of possibilities with ReiserFS4 and it is _very_ fast. This is hands down better than the MS "a filesystem as a DB" approach. ReiserFS4 will be like Firebird, lean-n-mean-n-fast. Want more features, grab _your_ favorite plugins!
      --
      If Tyranny and Oppression come to this land,
      it will be in the guise of fighting a foreign enemy. -James Madison
    2. Re:New FS by prisoner-of-enigma · · Score: 4, Interesting

      I've been shouted down before about this, but I'm going to keep asking for it because it's a useful feature for my company: what about per-file compression in the file system? Now before anyone has a hissy fit, let me explain.

      We output a lot of digitally-created video files that are huge (think HDTV resolution). Most of these files are output uncompressed because either (a) the file format doesn't support compression or (b) the multimedia program doesn't support compression. Either way, a few minutes of HDTV-quality uncompressed video will absolutely destroy a few hundred gigabytes of space in no time.

      We have to hold on to some of this video for quite some time, but we only need to get at it infrequently. It's too big to fit on DVD-R's, tape is too slow, ZIPping it up hinders easy access later, and removable hard drives are expensive. File system compression, on the other hand, does wonders. We routinely get 60%-80% compression on archived video files, and it's allowed us to stretch our disk capacity a long, long way because of it.

      We've considered archiving our video in some kind of compressed streaming format like AVI, Quicktime, or MPEG-2, but none of these offer lossless codecs that are appropriate for us, and we're unwilling to accept using a lossy compressor.

      So, I ask the question again: when, if ever, is anyone going to implement file compression on a Linux file system? Or does it already exist but is buried somewhere in some arcane HOWTO or website?

      --
      In the end they will lay their freedom at our feet and say to us, Make us your slaves, but feed us. - Fyodor Dostoyevsky
    3. Re:New FS by FleaPlus · · Score: 5, Informative

      If you're concerned about compression speed, you may want to take a look at LZO. It's got incredibly fast compression, and even faster decompression. I think it was even used on the Mars Rovers.
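
      If you want to check what lossless compression would buy on your own footage before committing to anything, a rough sketch along these lines may help. Note the assumptions: it uses zlib from the Python standard library as a stand-in (LZO bindings are not in the standard library), and the synthetic "frame" is a made-up buffer of repeated pixels, so the numbers it prints say nothing about real video.

```python
import time
import zlib


def measure(data: bytes, level: int):
    """Compress with zlib at `level`; return (fraction saved, seconds taken)."""
    start = time.perf_counter()
    packed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    # Lossless means the round trip must give back the exact same bytes.
    assert zlib.decompress(packed) == data
    saved = 1 - len(packed) / len(data)
    return saved, elapsed


if __name__ == "__main__":
    # Hypothetical stand-in for one uncompressed 1920x1080 frame, 3 bytes/pixel.
    frame = (b"\x10\x80\x10" * 1920) * 1080
    for level in (1, 6, 9):  # 1 = fastest, 9 = smallest output
        saved, elapsed = measure(frame, level)
        mb_per_s = len(frame) / 2**20 / elapsed if elapsed else float("inf")
        print(f"level {level}: saved {saved:.1%} at {mb_per_s:.0f} MiB/s")
```

      Swap in a real file's bytes for `frame` to see how the speed versus ratio trade-off looks on your material.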

  2. Hans Reiser's vision of the future by alanw · · Score: 5, Informative

    Hans Reiser has written a white paper containing his thoughts on the design of the next major version of ReiserFS.

    1. Re:Hans Reiser's vision of the future by minginqunt · · Score: 5, Interesting

      In addition to Reiser4, there is a whole host of projects that aim to provide all or part of what BFS achieved, and what Spotlight (MacOS X Tiger) and WinFS will achieve.

      This includes Beagle/Dashboard

      http://www.nat.org/dashboard
      http://www.gnome.org/projects/beagle/

      And of course, the ambitious Gnome Storage project, being pushed by Seth Nickell. He recently wrote a paper comparing all the technologies, found here:

      http://www.gnome.org/~seth/blog/document-indexing

    2. Re:Hans Reiser's vision of the future by MouseR · · Score: 5, Informative

      Apple's Spotlight isn't a file system. It's a search engine that uses and maintains metadata stored in the file system.

      HFS+ is the current OS X file system, and that of Tiger (next revision of OS X) as well. Spotlight uses HFS+'s built-in metadata support to enhance its search capabilities. What Tiger adds for application developers is an API to attach metadata to documents, something that was limited until now.

  3. Don't try to keep up with Microsoft and Apple by suso · · Score: 5, Insightful

    Instead, try to keep up with the demands and needs of users.

    1. Re:Don't try to keep up with Microsoft and Apple by Anonymous Coward · · Score: 4, Insightful

      Don't try to keep up with Microsoft and Apple. Instead, try to keep up with the demands and needs of users.

      In this case, they're one and the same.

    2. Re:Don't try to keep up with Microsoft and Apple by jilles · · Score: 4, Insightful

      Actually that involves keeping up with the rest of the field as well. Not every feature MS adds to their OS should be duplicated. But some features are useful and should be considered.

      MS has basically announced/demonstrated most of the new features that are in longhorn. Effectively that has given the linux community two years to come up with competing features. Adding database features to a filesystem makes sense, beos has demonstrated that you can do some nifty stuff with it and both apple and MS have announced plans to do this.

      The linux community however is divided. You can install reiserfs, maybe develop some tools that use some of its more advanced features but that doesn't fundamentally change anything if openoffice, KDE and Gnome and other programs don't coevolve to use the new features.

      The same goes for stuff like avalon. While everybody is still talking about how such technology might be used in OSS projects like mozilla and Gnome, MS is well on their way to implementing something that may actually work.

      Filesystems with rich metadata were already a good idea ten years ago. The OSS community has talked about them where others have implemented them. Two years of more talking would be fairly consistent. IMHO the OSS community is underperforming in picking up new technology and good ideas.

      --

      Jilles
    3. Re:Don't try to keep up with Microsoft and Apple by kfg · · Score: 4, Insightful

      Users often demand and think they need all sorts of pointless, worthless, daft shit. Commercial companies, of course, have to cater to this, and the less ethical directly exploit it ( I'll sell you speaker cables that I've meditated over while sitting under a mystical waterfall to infuse them with energy and align their molecules, only $2000 a set. If you can't hear the difference it's because your chakras are blocked, but don't worry, I've developed a homeopathic remedy, only $20 a bottle. Oh, they only work while listening to my taped lecture series though, just $499. Remember to sit on my special magnetic pad at the same time (available to members only)).

      How's about this for a better idea: instead of trying to keep up with Microsoft, try to keep up with sound software engineering principles in designing our file systems?

      There may even come a time when the required action to implement this idea is to do nothing.

      KFG

  4. easy answer by dAzED1 · · Score: 5, Insightful
    nfs4, with solid integrations for auth servers (ldap to active directory, etc).

    We live in a network-based universe. Local filesystems are already good - whether it's just continued development in Reiser, or whatever else.

    Nfs4, though - it's like afs, only without the sucky stuff. AIX is now including nfs4 in its AIX5.3 release, even! With the Big Dog on board, we should realize there's wisdom in that direction ;)

    1. Re:easy answer by Anonymous Coward · · Score: 4, Informative

      Man, you totally miss the point. NFS is not a file system (don't be fooled by the name), it's a network protocol. The files provided by an NFS server have to be physically stored on some (real) filesystem, like ext3 or reiserfs.

      This is very much like saying "the future of filesystems is apache2, local filesystems are already good, now we have to concentrate on apache2".

  5. ReiserFS is pretty damn good by bigberk · · Score: 4, Informative

    Hans Reiser has some interesting ideas about the role of a modern file system. Here's a recent USENET post describing some of the immediately visible features of reiserfs v3. Some people have said that there was corruption in the past, but I think there are no longer any problems in recent 2.4 kernels. Namesys is now developing Reiser4, which appears to be more flexible (still needs time to stabilize though). If I had to put down my money on a future filesystem though, it would be ReiserFS.

  6. I want a transparent filesystem/VM by valen · · Score: 5, Interesting


    I want a disk equivalent of top - something that'll tell me what processes are kicking the shit out of the disks, and by how much.
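
    For what it's worth, a crude version of this can be sketched in Python, assuming a Linux kernel new enough to expose per-process I/O counters in /proc/<pid>/io (the `read_bytes` and `write_bytes` fields). That file is an assumption about your system, not something the kernels discussed in this thread necessarily provide, and reading other users' counters usually needs root.

```python
import os
import time


def parse_io(text: str) -> dict:
    """Parse the 'key: value' lines of a /proc/<pid>/io file into ints."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            stats[key.strip()] = int(value)
    return stats


def snapshot(proc_root: str = "/proc") -> dict:
    """Map pid -> (read_bytes, write_bytes) for every process we may inspect."""
    result = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "io")) as fh:
                stats = parse_io(fh.read())
        except OSError:
            # The process exited, or we lack permission to read its counters.
            continue
        result[int(entry)] = (stats.get("read_bytes", 0), stats.get("write_bytes", 0))
    return result


def top_io(before: dict, after: dict, count: int = 5):
    """Rank pids by bytes read plus written between two snapshots."""
    deltas = []
    for pid, (r1, w1) in after.items():
        # A pid missing from `before` started mid-interval; count it as zero.
        r0, w0 = before.get(pid, (r1, w1))
        deltas.append(((r1 - r0) + (w1 - w0), pid))
    return sorted(deltas, reverse=True)[:count]


if __name__ == "__main__":
    first = snapshot()
    time.sleep(1.0)
    for nbytes, pid in top_io(first, snapshot()):
        print(f"pid {pid:>7}: {nbytes} bytes in the last second")
```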

    If Linux could do that - it's more a VM thing than a filesystem - I'd stick with ext3 for years to come.

    Who needs a filesystem in a database when you have a database that lives on your filesystem (updatedb). Get that updating in realtime, with more things (like permissions, access times etc.) and a lot of the work is done.

    john

  7. Filesystems are tools by tikoloshe · · Score: 5, Insightful

    Filesystems are tools that will suit different purposes. Some are good for databases, some for lots of small files, some for lots of reading, some for writing, some for networks, some for streaming.
    So developing one handy "swiss army knife" of filesystems may not be the best route. For the most part one knows what a system will be doing and can build in the most appropriate filesystem for the job.

    1. Re:Filesystems are tools by beee · · Score: 5, Insightful

      A good filesystem should be capable of handling all potential applications (for example, FAT32 has found its way into grandmother's desktop and production web servers). Specializing a FS is a huge mistake, and any highly-specific FS introduced to date has been a huge flop. This is not the best route to travel for Linux.

      --


      + Donald Gunth
      + Email: dgunth@quicktek.net
      "Caffeine is the greatest lubricant ever created." -ESR
    2. Re:Filesystems are tools by Anonymous Coward · · Score: 4, Funny

      for example, FAT32 has found its way into grandmother's desktop and production web servers

      Wow, your grandmother has production webservers! Cool. ;-)

  8. Gnome Storage by leandrod · · Score: 4, Interesting

    Gnome Storage should be a step in the right direction, and it gets it right by not reinventing the wheel, just using PostgreSQL as its database engine.

    This way we can test the waters without messing with the kernel. When the concept is tried, we can decide if we make PostgreSQL a required part of a GNU/Linux system, or a Hurd translator, or whatever.

    --
    Leandro Guimarães Faria Corcete DUTRA
    DA, DBA, SysAdmin, Data Modeller
    GNU Project, Debian GNU/Lin
  9. Re:bah! by Fjornir · · Score: 5, Funny

    I'll use flat files and grep like god intended.

    --
    I want a new world. I think this one is broken.
  10. Compatible by cubicledrone · · Score: 4, Funny

    Just make sure it is incompatible with all the current applications so we can rewrite everything. Add a cool feature or something too.

    --
    Business isn't willing to pay for products, innovation and careers, so we get brands, mortgage commercials and layoffs.
  11. Keep it all modular, please by JBMcB · · Score: 5, Interesting

    Make the core filesystem small, robust and fast. Journalling, realtime and not much else. Make add-on modules for fancy things like ACL's, quota, compression, encryption, compatibility, extended attributes, etc... Put in shims for calling attributes from a database (db or SQL or whatever).

    XFS comes close, ReiserFS 4 is nice, too. The most important thing is keeping the base filesystem simple and FAST. You think NTFS is fast? Try deleting a complete Cygwin install (>30K files). It takes AGES, even from the command prompt. I've deleted 15K files (That's 15 THOUSAND files) on Reiser 3 on the same machine, it took a few seconds.
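
    Anyone who wants to repeat the comparison on their own filesystems can start from something like the Python sketch below. It is only a rough probe under stated assumptions: the files are empty, they live in whatever filesystem backs the system temp directory, and caching will flatter the result, so treat the figure as a ballpark and not a benchmark.

```python
import os
import tempfile
import time


def time_bulk_delete(count: int = 15_000) -> float:
    """Create `count` empty files in a scratch directory, then time unlinking them."""
    with tempfile.TemporaryDirectory() as scratch:
        paths = [os.path.join(scratch, f"f{i:06d}") for i in range(count)]
        for path in paths:
            open(path, "wb").close()
        start = time.perf_counter()
        for path in paths:
            os.unlink(path)
        return time.perf_counter() - start


if __name__ == "__main__":
    seconds = time_bulk_delete()
    print(f"deleted 15000 files in {seconds:.2f} s")
```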

    DO NOT make a database driven filesystem. Some day we will have a true, document based desktop paradigm (OpenDoc anyone?) but probably not for several years, until then we need SPEED.

    --
    My Other Computer Is A Data General Nova III.
  12. Next generation? by stratjakt · · Score: 5, Interesting

    Let's get the "this generation" filesystems working correctly, shall we?

    Solid, universal support for ACLs, and while we're at it, let's fix the whole user/group namespace mess Unix has with it. Let's use an SID-style id like Windows does.

    For example: my small network at home, centrally authenticated through ldap.

    Now, windows knows the difference between the user "jim" on local machine A, "jim" on machine B, and "jim" the domain user. They'd be shown as MACHINEA/jim, DOMAIN/jim, etc.. The various SIDs take the domain (or workstation) SID and append the UID. So if his number is 100, his sid is "long-domain-sid" + uid. So when you pass around sid tokens, you know exactly which jim you're talking about.

    Now in linux, we just have numbers for users and groups. If user 100 on machine A is "jim", user 100 could be "sally" on machine B. Moving that stuff to ldap becomes messy, now I have to reconcile the numbering schemes of all the machines I want to migrate. Ick. And you get all kinds of screwy stuff sharing folders, if you ls it on one machine it'll show wholly different ownerships.. It's the source of about a billion and one nfs security holes.
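
    The collision is easy to show in a few lines of Python. This is a toy model: the two passwd tables and the `Principal` type are hypothetical, and the (authority, id) pair only mimics the idea of a domain-qualified identifier, not the real Windows SID layout.

```python
from typing import NamedTuple


class Principal(NamedTuple):
    """An id qualified by whoever issued it (hypothetical, not a real SID)."""
    authority: str  # machine or domain name
    local_id: int


# Hypothetical per-machine user tables: same number, different people.
MACHINE_A = {100: "jim"}
MACHINE_B = {100: "sally"}


def owner_name(uid: int, passwd: dict) -> str:
    """What `ls -l` would show for a file owned by `uid` on a given machine."""
    return passwd.get(uid, str(uid))


if __name__ == "__main__":
    # A bare uid stored on a shared disk means different people on each host.
    print(owner_name(100, MACHINE_A), "vs", owner_name(100, MACHINE_B))

    # Qualified ids stay distinct no matter which host reads them.
    jim = Principal("MACHINEA", 100)
    sally = Principal("MACHINEB", 100)
    print(jim == sally)
```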

    And of course, since a file can only have one permission set - owner, group, other - it sure does make for some sucky shit. The lazy among us would just run as root all the time to avoid the whole damn mess.

    I know there's a circle jerk of workarounds, patches and gotchas to avoid this, but it should never be a problem in the first place. The basic unix security model is out-of-date, and is the source of many systemic problems.

    --
    I don't need no instructions to know how to rock!!!!
    1. Re:Next generation? by mattdm · · Score: 5, Interesting

      It's the source of about a billion and one nfs security holes.

      Or rather, it is the source of the NFS security hole. But it's okay. NFS4 (or 3, even) with Kerberos totally solves this problem, much more elegantly.

      Everyone's all excited by ACLs, but I'm sceptical of their real world value. The "keep it simple" principle of security can't be emphasized enough. With ACLs, you have to really examine the access rights of a given object to figure out what's going on. With the standard Unix user/group system -- with simple directory-based inheritance -- it's completely transparent.

      And, most importantly, I've yet to see one thing worth doing with ACLs which couldn't be set up with user/group permissions instead -- and more simply.

    2. Re:Next generation? by Malor · · Score: 5, Interesting

      Properly done, an ACL system will give you a MORE secure system, not a less secure one, because there are fewer chances for mistakes.

      In the NT 4.0 days, one of the better ways to handle permissions was the 'AGLP' standard. User A)ccounts go in G)lobal groups, G)lobal groups go in L)ocal groups, and local groups get P)ermissions.

      This allows a nice level of indirection. I implemented this standard by specifying that Global groups described groups of people, and that Local groups specified access privileges. I built Local groups on each server describing the kind of access privileges they offered. Generally, I would make four groups for each of my intended shares: Share NA (no access), Share RO, Share RW, and Share Admin. I would assign the appropriate ACLs in the filesystem, and then put Global groups from the domain into the proper Local groups. The Accounting group, for instance, might get RW on the Accounting share. Management might get RO, and the head of Accounting and the network admins would go into the Share Admin group.

      What this meant was that, once I set up a server, I *never again* had to touch filesystem permissions. Not ever. All I had to do was manipulate group membership with User Manager... with the caveat, of course, that affected users had to log off and on again for permissions to take effect. But this is also true with Unix, in many cases. (when group membership changes).

      Note that Windows 2K and XP have more advanced ways to handle this, so don't use this design in a Win2K+ network.... this is the beginnings of the right idea, but 2K added some new group concepts. Under Active Directory, this idea isn't quite right. (I'd be more specific but I have forgotten the details... I don't work much with Windows anymore.)

      ACLs are key to this setup, because I can arbitrarily specify permissions and assign those permissions to arbitrary groups.

      By comparison, User, Group, and Other are exceedingly coarse permissions, and it is very easy to make a mistake. What if someone from Human Resources needs access to a specific Accounting share, but nothing else? Under Unix, I can't just put them in the Accounting group, because that will give them access to everything under that Group permission. I'd probably have to make a new group, and put everyone from Accounting and the one person from HR into that, and then put the special shared files into a specific directory, and make sure the directory is setgid. That is a lot of steps. Everything is always done in a hurry in IT, and lots of steps are a great way to make mistakes. Messing up just one can result in security compromise.

      In my group-based ACL system, I'd still have to make a custom group, perhaps "HR People With Access to Accounting Share". But I'd only have to touch one user account, the HR person's, and wouldn't have to disrupt Accounting's normal workflow at all, or touch any filesystem permissions.

      Instead of a whole series of steps, any one of which can be done wrong, I have only three: Create new Global group, put HR person in new Global group, put Global group in the correct Local group. All done. Hard to screw this up too badly.
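
      The indirection is easier to see written down. Below is a minimal Python model of the AGLP chain; every account, group, and share name in it is invented for illustration, and it leaves out the "no access" deny group and anything Active Directory adds.

```python
# A -> G: user accounts go into global groups (groups of people).
ACCOUNTS_IN_GLOBAL = {
    "Accounting": {"alice", "bob"},
    "Management": {"carol"},
    "HR People With Access to Accounting Share": {"dave"},
}

# G -> L: global groups go into local groups (kinds of access on one server).
GLOBALS_IN_LOCAL = {
    "Accounting Share RW": {"Accounting", "HR People With Access to Accounting Share"},
    "Accounting Share RO": {"Management"},
}

# L -> P: only local groups appear in the filesystem ACL, set once at build time.
LOCAL_PERMISSIONS = {
    "Accounting Share RW": {"read", "write"},
    "Accounting Share RO": {"read"},
}


def effective_permissions(user: str) -> set:
    """Walk account -> global group -> local group -> permissions."""
    granted = set()
    for local_group, global_groups in GLOBALS_IN_LOCAL.items():
        if any(user in ACCOUNTS_IN_GLOBAL.get(g, set()) for g in global_groups):
            granted |= LOCAL_PERMISSIONS[local_group]
    return granted


if __name__ == "__main__":
    for user in ("alice", "carol", "dave", "eve"):
        print(user, sorted(effective_permissions(user)))
```

      Granting the HR person access only touched the first two tables; `LOCAL_PERMISSIONS`, the stand-in for the filesystem ACLs, never changed.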

      Now, I'll be the first to admit that a badly-implemented ACL setup is a complete nightmare. But a clean, well-thought-out ACL system, in a complex environment, is virtually always superior to standard Unix permissions.

  13. dtrace by DarkMan · · Score: 4, Informative

    dtrace, due with Solaris 10, does that. It's not quite a top equivalent, but it does let you answer your questions ("What processes are kicking the shit out of the disk", and "By how much"), along with the also useful "In what way", i.e. many small writes, high seek to read ratio, or what have you.

    It is, however, expert driven, unlike top, which is simple to use. Still, I think that dtrace shows the future of performance monitoring apps.

    Note that dtrace lives partially in the kernel - it's not portable to Linux.

  14. Re:Next premise, please by sql*kitten · · Score: 5, Informative

    And neither of whom have a journaled filesystem yet, while Linux has many to choose from.

    What are you talking about? NTFS has had journalling for over a decade. And Unicode. And ACLs. And streams. And reparse points (these are amazingly cool). And compression. And encryption. And ... you get the point.

    Now, MS doesn't use most of this good stuff, but it's all in there. Even three-letter file extensions on Windows are obsolete, since everything on NTFS can be an OLE server. There's nothing on Linux that comes close to the capabilities of NTFS. About the only major thing NTFS is missing is versioning, which VMS has.

  15. Re:why not improved ramdisk? by Jeff+Mahoney · · Score: 4, Informative

    If you load everything on the filesystem to memory on boot, you end up wasting a lot of memory, since you typically use only a very small subset of your filesystem at any given time.

    The solution would be to load things "on demand," as you've suggested.

    Linux already does this, and it does more.

    If you've ever looked at the output of free(1) after your system has been running for an hour or so, it will appear as if almost all your memory is in use. See those last two columns, "buffers" and "cached"? That's your "on-demand ramdisk" at work.

    Linux will use memory that applications aren't using to cache filesystem data (including executables and metadata) to speed future accesses. If your applications need more memory than is currently free, the kernel will drop cached data rather than swap out application memory to disk. That way, you get the benefits of having your executables on a ramdisk, with the flexibility of not having to sacrifice running application performance in the process.
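
    If you'd rather see the same numbers without squinting at free(1), they come from /proc/meminfo. Here is a small Python sketch that adds up the two fields mentioned above; it assumes the usual "Key:  value kB" layout of that file and only looks at `Buffers` and `Cached`, so it is an approximation of what the kernel could give back.

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style 'Key:   value kB' lines into {key: kB}."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            info[key.strip()] = int(fields[0])
    return info


def cache_kb(info: dict) -> int:
    """Memory currently lent to the buffer and page caches, in kB."""
    return info.get("Buffers", 0) + info.get("Cached", 0)


if __name__ == "__main__":
    with open("/proc/meminfo") as fh:
        info = parse_meminfo(fh.read())
    print(f"total:  {info.get('MemTotal', 0)} kB")
    print(f"free:   {info.get('MemFree', 0)} kB")
    print(f"cached: {cache_kb(info)} kB (dropped on demand, not really 'used')")
```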

  16. Re:Another solution in search of problem by lightknight · · Score: 5, Insightful

    Right, and how often do you misplace files?

    More than three times a week, and that's criminal.

    I mean, throwing things about in your home or My Documents directory is fairly standard. How often do you put your (picture) files in a \qw3r3et354t\bchnjc8g45\3j4n45g9u98d directory?

    While everyone seems to see WinFS (and associated services) as some sort of search panacea, your ability to retrieve those files is linked to 1.) its metadata and 2.) your ability to recall a search term that appears in the metadata. If you search for "bird" and the metadata specifies "hawk", short of a dictionary search, you still cannot find it. It doesn't matter if the uber search capabilities can span the entire hard drive in 5 secs, and run through multi-dimensional data. You still need a search term, and that search term (in whole or in part) must appear somewhere in the file, be it the filename or metadata.
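
    A toy Python sketch of the bird/hawk problem, for the sceptical. Nothing here reflects how WinFS actually indexes anything; the file names, tags, and the tiny synonym table standing in for a "dictionary search" are all made up.

```python
# Hypothetical index: file name -> metadata tags someone bothered to fill in.
INDEX = {
    "IMG_0042.jpg": {"hawk", "sky"},
    "notes.txt": {"meeting"},
}


def search(term: str, index: dict, synonyms: dict = None) -> list:
    """Return files whose name or tags contain the term (or a listed synonym)."""
    wanted = {term} | set((synonyms or {}).get(term, ()))
    hits = []
    for name, tags in index.items():
        if wanted & tags or any(word in name.lower() for word in wanted):
            hits.append(name)
    return sorted(hits)


if __name__ == "__main__":
    print(search("hawk", INDEX))                      # tag matches exactly
    print(search("bird", INDEX))                      # no match: nobody typed "bird"
    print(search("bird", INDEX, {"bird": {"hawk"}}))  # needs the extra dictionary
```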

    Essentially, WinFS makes data appear more ordered (assuming you take the time to fill out the fields). Otherwise, it's useless.

    --
    I am John Hurt.
  17. Re:not so fast ... by cortana · · Score: 5, Funny

    What's that? The ghost of Andrew Tanenbaum... mouthing the words "Microkernel, microkernel" over and over again!

  18. Re:not so fast ... by AstroDrabb · · Score: 4, Insightful

    ReiserFS 3 had bugs in the early versions just as all software will. That is why ReiserFS was not used for production systems for a while. It will probably be the same with ReiserFS 4. I will use it at home when it first comes out, but not where I don't want to chance data corruption.

    --
    If Tyranny and Oppression come to this land,
    it will be in the guise of fighting a foreign enemy. -James Madison
  19. Re:not so fast ... by poelzi · · Score: 5, Informative
    P.S. The whole thing - filesystem as a DB - is complete crap. You can't do a bunch of fs operations in a single transaction and have ACID semantics on the transaction as a whole. Sure - searching is great. But database means much more than just a searching interface.


    Sorry, but you are wrong here. Reiser4 is atomic and you can pack as many operations into one transaction as you like; you just have to use the reiser4 system call. This is because there is no standard system call for atomic filesystem transactions. Modern filesystems are databases, built to store files and query them through filenames. Reiser4 is the first filesystem where the search path can be handled through plugins, so you can index everything you want.
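
    For contrast, this is roughly what applications do today without such a call. The Python sketch below is the common write-to-temp-then-rename workaround, not the reiser4 system call (whose interface is not shown in this thread); the helper name `atomic_write` is made up. It makes the update of one file all-or-nothing and cannot group several operations, which is exactly the gap being argued about.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Replace `path` with `data` so readers see the old or new file, never half."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must be on the same filesystem for the rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before switching over
        os.replace(tmp_path, path)  # the single atomic step
    except BaseException:
        # Something failed before the switch; remove the leftover temp file.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        target = os.path.join(scratch, "config")
        atomic_write(target, b"version 1\n")
        atomic_write(target, b"version 2\n")
        with open(target, "rb") as fh:
            print(fh.read())
```
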
    --
    kindly regards daniel
  20. Re:New FS (Reiser4 has a compression plugin coming by hansreiser · · Score: 5, Informative

    Reiser4 has a compression plugin coming. We got gzip to work, but it consumes too much cpu, so now we are doing lzo which can compress at disk drive speed. The lzo plugin has a bug, maybe next week....

    Hans

    (You can email edward@namesys.com for details).