State Of The Filesystem
Skeme writes "Have you heard of Plan 9 or Reiser4 but don't know much about them? Are you curious about the improvements free software is making to its filesystems in general? Read my summary of the current developments in the filesystem: namely, what improvements we can expect (a lot), and what Linux and the BSDs can do to improve on the filesystem."
I've always wondered how these filesystems with metadata handle transferring files between different systems. It would suck to have all your MP3 info in filesystem metadata and then lose it all when you transferred to a system without fs metadata. Anyone have any insight?
Will ReiserFS be ready for the 2.6 kernel? Just curious.
Any sufficiently advanced libertarian utopia is indistinguishable from government.
... trailing / in filenames now.
Usage of text editors as well, cat and echo will rule!
chess
Sure, I'm going to listen to this guy about filesystem implementation when he can't even set up MIME types on his webserver. Jackass.
It's really nice. But what does it bring that is new, such that we should rewrite 90% of all system tools to use these new features? I find "cp /a/..uid /b/..uid" the same as "chown --reference=/a /b"...
There has been talk on the kernel mailing list of implementing 9p, the Plan 9 distributed filesystem.
Now that it is open source, I hope it will happen.
Seriously it does. When I first tried Linux (Mandrake 8.1) I found that ReiserFS meant I could shut down my system improperly with only a small risk of corruption and no need to defrag. It was also a big stride for Linux on the desktop. <grandma>Why is my computer telling me to run fsck manually, what does /dev/hdb15 mean?</grandma> Now grandma doesn't need to worry when her grandchildren pull out the plug.
I've been a happy ReiserFS user for over a year and a half, and I hope that ext2 gets deprecated in 2.6 to put it out of its misery forever. XFS and JFS are good for servers, but ReiserFS is the most popular "desktop" file system.
The article is not about filesystems in anything like a general sense; it's just a PR piece for ReiserFS and its creator, enunciating criteria specific to that implementation and then attempting to show how it actually (gasp!) meets its own idiosyncratic goals. What a waste of time.
Slashdot - News for Herds. Stuff that Splatters.
As far as i understand this.
...and not very general. Interesting for its comments on what's being tried out in R-FS & Plan9 but certainly doesn't manage to be a general summary of what's going on.
How about the changes coming in 2.6 (like XFS support built in)?
The article makes some good points, but for me it could have done with rewriting to make it more general, to separate the analysis of filesystem implementation problems from technical detail, and to include more examples from other file systems.
"we demand rigidly defined areas of doubt and uncertainty!"
The concept of reducing primitives is a good one, and based in sound mathematical theory. As already pointed out though, you need some container format that can handle some of these new ideas, things like very small files, files as directories and so on. This is needed, because when you transfer files through lossy mediums like the internet, or older filing systems, you don't want to lose the structure.
As far as I know, there isn't a container format that can do this. Tar is showing its age already, I wouldn't like to see it hacked yet again. Zip is alright, but you'd need to break compatibility to add in all those extra features, and then it wouldn't be zip anymore. It'd also be inefficient.
So, what I propose is rather than reinvent the wheel to solve this problem, we add support for "boxing" to the Linux kernel.
A box is a filing system in a file. We already use them, to some extent - it's been possible to mount ISO images using the loopback filing system for a while. What's needed is to take this to the next level. The first thing is that we need the ability to use files as mount points, not just directories. When files and directories are the same, well, I guess that should be easier.
The .box file format simply contains a short header with some useful metadata, like maybe a checksum, and the filing system it contains (maybe that isn't needed). The fun part is the UI. What you need is the ability to right click on any dirfile (for want of a better term) and choose the "Box it" option. You'd need a better label. What essentially happens then is that the hierarchy below this point is sucked up and transformed into an ISO containing perhaps a "Reiser4-Lite" filing system. You can forgo the journal and other things that are redundant purely for storage.
The user has then converted their file or directory into something that can be transferred across the net, on Windows compatible CDs and so on, without losing the inherent structure of the original.
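To make the header idea concrete, here is a rough Python sketch of such a container. The `BOX1` magic value, the field layout, and the function names are all made up for illustration; nothing like this exists as a real format, and the "filing system" inside is treated as opaque bytes.

```python
import hashlib
import struct

MAGIC = b"BOX1"            # hypothetical magic number, not a real format
HEADER = ">4sQ32s"         # magic, payload length, SHA-256 of the payload

def make_box(image: bytes) -> bytes:
    """Prefix a filesystem image with a short header holding a checksum."""
    digest = hashlib.sha256(image).digest()
    return struct.pack(HEADER, MAGIC, len(image), digest) + image

def open_box(box: bytes) -> bytes:
    """Check the header and checksum, then return the embedded image."""
    header_size = struct.calcsize(HEADER)
    magic, length, digest = struct.unpack(HEADER, box[:header_size])
    image = box[header_size:header_size + length]
    if magic != MAGIC or len(image) != length:
        raise ValueError("not a valid box")
    if hashlib.sha256(image).digest() != digest:
        raise ValueError("checksum mismatch")
    return image
```

The checksum lets the receiving end detect damage from a lossy transfer before it tries to mount anything.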
At the other end, choosing the "Unbox" option mounts the contents of the box using the loopback FS, mounted at the point of the file. To the user, it is seamless, far easier than zips or tarballs.
Of course, there are lots of complications. You have to agree on the format to use inside the box, for one, because the need to have kernel mods and so on makes it more complex than just installing tar.
I think MacOS has something a little bit similar with mountable disk image (.dmg) files, but the MacOS filing system is rather poor, and I don't know how easy it is for users to create them. Also the OS unfortunately applies some magic to them - for instance Safari will automatically extract the contents of the DMG file then destroy it when you download one (but other stuff does not, oops).
Anyway. That's one way to prevent loss of vital structure when transferring across lossy mediums, that I can think of. There are probably others.
If Linux and related systems move to filesystems with really powerful metadata support, presumably the lock-in would be much stronger. Moving a directory from Linux to a Windows system may be possible but the programming to do it will become increasingly painful and the risk of data loss will rise. And with mainframe integrity, why would you want to, Mr. customer?
Apart from the CS issues, is this an attempt to use the embrace, extend weapons of Microsoft against it by turning the Linux filesystem into a full mainframe system, effectively squeezing out Windows servers by a convergence between big tin and small boxes? I guess this is pretty pie in the sky but I'd like to think so.
Panurge has posted for the last time. Thanks for the positive moderations.
Really, I think someone should get on with finishing the NTFS filesystem access in the kernel. With people migrating to XP it's really becoming more important that this driver is fixed (how long has it been declared "dangerous" for write use now?!).
I'd really like to know why this driver has taken so long to complete - is there some information that the developers don't have access to? Some technical reason? What?!
Code, Hardware, stuff like that.
The OS should have a 'list' of what's supported by each fs. When you copy a file from fsa to fsb, the OS (or program) should compare features and let you know (somehow) that something is not (or may not be) going to work right. If you copy something from a regular ext2 system that is case sensitive to an MS-DOS floppy disk, something should try to remind you. Or the program checks this and looks for problems.
Remember that not all problems can be detected, so are you willing to live with 'this may not work correctly' messages?
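A quick Python sketch of that comparison, just to show the shape of it. The capability table and feature names here are invented for the example and are not taken from any real OS interface.

```python
# Hypothetical capability table; the feature names are illustrative only.
FS_FEATURES = {
    "ext2": {"case_sensitive", "long_names", "symlinks", "unix_permissions"},
    "msdos": set(),
}

def copy_warnings(src_fs: str, dst_fs: str) -> list:
    """List the features the source has and the destination lacks."""
    lost = FS_FEATURES[src_fs] - FS_FEATURES[dst_fs]
    return [
        f"'{feature}' is not supported on {dst_fs}; this may not work correctly"
        for feature in sorted(lost)
    ]
```

Copying from "ext2" to "msdos" would produce one warning per missing feature, while the reverse direction produces none. A set difference like this only catches what the table knows about, which is the point about undetectable problems.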
eric
If that crock, that bag-on-the-side, that mess is what we have to look forward to, I think I'll switch to BSD.
/directory/..owner is a big ugly crock.
I mean, accessing owner data by travelling into a directory then backwards out of it again like: vi
Feel that power? That's mah MOUSING FINGER
I see it happening more and more that people present their summaries, articles and technical papers on the net as pdfs. This is very inconvenient.
Pdfs are nice for printing and publishing on traditional media, because you can be sure they will be in the correct layout etcetera for the printer. But on the web, where people browse between lightweight, easy html-documents, they're just a nuisance.
Please, if you must publish a pdf, publish an html version next to it.
I do not know much about file systems, so I have a few questions.
Once we see the GConf example, other possibilities immediately spring to mind. In almost all multimedia formats--such as MP3, MPEG, and OGG--there is a tagging system for storing things such as the author and title. Instead of storing those attributes in the tag--yet another namespace--store it in the file-as-a-directory. I have a file /music/Millencolin/PennybridgePioneers/RightAboutNow.ogg. I let RightAboutNow.ogg/title be "Right About Now", /artist be "Millencolin". I could add a file RightAboutNow.ogg/genre and put "punk" in it.
While this is a nice example, I wonder if there really is an advantage to this kind of file system, because it seems like it takes more effort to keep track of all those sub-files instead of keeping all the info in a single file. Can anyone shed some light on this?
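For a feel of the bookkeeping involved, the Ogg example can be emulated on an ordinary filesystem with a plain directory standing in for the file-as-directory. This is only a sketch: the helper names are hypothetical, and a real Reiser4-style implementation would keep the audio data and the attributes under one name.

```python
import tempfile
from pathlib import Path

def write_attrs(track_dir: Path, attrs: dict) -> None:
    """Store each attribute as its own small file, e.g. track/title."""
    track_dir.mkdir(parents=True, exist_ok=True)
    for name, value in attrs.items():
        (track_dir / name).write_text(value, encoding="utf-8")

def read_attrs(track_dir: Path) -> dict:
    """Reading the tags back is one open/read/close per attribute."""
    return {
        p.name: p.read_text(encoding="utf-8")
        for p in track_dir.iterdir()
        if p.is_file()
    }

def demo() -> dict:
    with tempfile.TemporaryDirectory() as root:
        track = Path(root) / "RightAboutNow.ogg"
        write_attrs(track, {"title": "Right About Now",
                            "artist": "Millencolin",
                            "genre": "punk"})
        return read_attrs(track)
```

The upside is that any tool that can read a file can read a tag; the cost is exactly the extra sub-files being asked about.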
Also, what if I format my partitions with different file systems, say Reiserfs 4 and ext3, could there be any incompatibility issues? Imagine I kept mp3's on both partitions, would my mp3 player know how to handle both formats, since the tag info is dealt with differently on both systems?
Before adopting any of these ideas, one must consider the security implications of doing so.
If we assume that the filesystem is decoupled from the access control layer in the kernel, then one must ensure that any operation that potentially affects security is adequately controlled.
For example, on systems with _POSIX_CHOWN_RESTRICTED, the following ought to be illegal:
cp foo/..uid bar/..uid
This can be accomplished by making the ..uid files mode 444. Without _POSIX_CHOWN_RESTRICTED, the ..uid file is mode 644. However, we have now moved a systemwide security feature into the filesystem. If multiple filesystems are configured into one kernel, then they ought to be consistent; otherwise the security model will be flawed.
As for things such as allowing access to an environment, doesn't that break encapsulation? It means for a certain filename, the filesystem must grovel through a user-space process to find the environment. Also, if a change in some external environment immediately affects some partially-related processes (e.g. daemons started from that shell), then a whole new raft of security holes will come up based on a process' environment or filesystem layout changing unexpectedly.
Cool ideas, but let's be careful lest we make a steaming pile of Swiss cheese.
I'm really getting tired of the ever-creeping assertion that transactions are required for [x]. At first x was ACID-compliant relational databases, and such was true because ACID was defined as such. However, then I started to see assertions that relational databases had to be ACID-compliant (mostly from the anti-MySQL camps who were ignoring the long history of highly valuable, non-ACID relational databases).
Now, in this article, I see the assertion that databases in general require transactions, and thus cannot be supported by a filesystem.
Worse, the logic is self-refuting, as the article previously states that a filesystem is a database, just a limited one. As it happens, POSIX-type filesystems are quite powerful, and let's not kid ourselves into thinking that they have not served us well for 20-30 years! Yes, changes are coming and I'm frankly quite impressed by Hans Reiser's accomplishment in finally coming up with a balanced-tree-based filesystem. Many have tried and failed where he succeeded.
That's because his was a great step forward, not because the old UNIX filesystems weren't also. Let's stop trying to re-define terms so that we can explain why the last 20 years were the dark-ages. They simply were not.
so in other words, no one will ever use it? great, next story please.
I agree that we need a revolution in how filesystems work inside an operating system, but it seems that the arguments presented in this paper have a lot of holes.
For one thing, the case for changing a filesystem should not rest solely on space or metadata. I think security, speed of data retrieval, and self-correcting error engines should be central to the new systems.
The reason speed of data retrieval matters more than data size is that hard drives are growing in capacity much faster than they are growing in speed. In five years, we may have 20 terabyte drives, but the access speeds will still be horrible.
Security and error correction are obvious points that should be implemented on a systemwide level. When these features are system wide, then management becomes much easier for all system users.
a read/write implementation of BeFS for linux!
I agree that metadata in the filesystem is a risky proposition. Just on general principle, I prefer my data inside the file and not left with the filesystem. The MP3 metadata example, to me, is like Windows file extensions on HGH. I remember John Dvorak wrote a piece about Windows file extensions a long while back, and he argued that file types, etc. should be inside the file. A header of sorts. I tended to agree then, and I see filesystem metadata as a bad trend.
"Have you heard of Plan 9 or Reiser4 but don't know much about them?"
Then STFW!!!
This article seems to just be the author brainstorming or feeling excited about reiserfs. It's hardly a "summary of developments in the filesystem". Now if he was asking about opinions on his article it'd be fine, but he's not, so I'll just discard this as another non-news.
I'm surprised at the negativity of some of the comments here, moaning that POSIX semantics are perfect and nothing else can possibly be countenanced...
Plan9 namespaces and Reiser4 really do bring a lot more to the table in terms of useful expanded semantics and utility than all the POSIX filesystems. POSIX extended attributes are very limited, and some filesystem implementors seem to be keen to implement them in the most restricted way possible (e.g. size limitations in ext3).
The annoying thing with ReiserFS is that the VFS will lag it by a few versions, and very very few apps will make any use of its special system call. Sigh. We'll be stuck with databases in a file for a long while yet.
One thing I would like to know about reiserfs is how attributes are attached to directories? If they are just small files in the "directory" bit of a file, what distinguishes them from children of the directory? Or are attributes just banned from dirs? Seems limiting.
Heh!
My grandma is a kernel hacker.
Nobody (apart from perhaps this guy) has ever claimed that this syntax will actually ever be used, or needed. There are other possible syntaxes available, and in fact one long term blue sky plan for RFS is to allow many different types of syntax within the same file path, including for instance things that vaguely resemble database queries.
So, don't get hung up on the syntax given in this article.
We had plan9 machines here 10 years ago...
I don't think any exist anymore, in fact I don't even think the inferno install works anymore.
But anyway, it isn't a "new advance" anymore.
People keep trying to use file hierarchies as databases. You can do a lot of stuff, but arrays and m to n forward and reverse mappings aren't among the things you can do with filesystems. That's why you have databases and XML.
It seems that the author presumed that the only use of LDAP is to provide passwords for user authentication. While that is a common use of LDAP it is not the only use.
It would seem that having a file system that is LDAP aware could be extremely useful. Imagine if your LDAP tree were reflected as a tree in your file system. You wouldn't need to embed LDAP calls in your application, it would just be data in your file system. So looking up an attribute for the current user, or a user, would be as simple as reading a file that holds the value of the attribute.
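As a sketch of how the mapping might look, here is a small Python function that turns a distinguished name into a path. The `/ldap` mount point is an assumption, and the naive comma split ignores escaped commas in DNs, so treat it as an illustration only.

```python
from pathlib import PurePosixPath

def dn_to_path(dn: str, attribute: str) -> PurePosixPath:
    """Map an LDAP DN plus attribute name to a file path under /ldap.

    DNs list the most specific part first, so the parts are reversed
    to get a root-to-leaf directory order.
    """
    rdns = [part.strip() for part in dn.split(",")]
    return PurePosixPath("/ldap", *reversed(rdns), attribute)
```

With that layout, looking up a user's mail attribute would just be reading the file at `dn_to_path("uid=alice,ou=people,dc=example,dc=org", "mail")`.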
Maybe I'm missing something but isn't ext3 the most popular "desktop" file system?
Is for someone to come up with a real unlimited snapshotting filesystem for Linux. I don't want to use user-mode hacks (as nice as they are, rsync-style snapshotting isn't reliable enough), or snapshotting that only allows a shadow copy of the entire volume. I want to be able to tell the users that they can just go into ~/.snapshot/time (where time can be hours, days, or weeks in the past) and copy the file they messed up back into their home directory. Basically I want the most useful feature of NetApps without the HUGE markup =) The savings in admin time, both in user interaction and in the reduced need to do tape retrieval and file restores, are immense.
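From the user's side, the restore being described is nothing more than a copy out of a read-only tree. A minimal Python sketch, assuming the snapshots appear under `~/.snapshot/<name>`; the snapshot name used in any call is whatever the filesystem exposes, and the helper is hypothetical.

```python
import shutil
from pathlib import Path

def restore(home: Path, snapshot: str, relpath: str) -> Path:
    """Copy one file from home/.snapshot/<snapshot>/ back into home."""
    source = home / ".snapshot" / snapshot / relpath
    target = home / relpath
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source, target)   # copy2 also preserves timestamps
    return target
```

The hard part is the filesystem keeping those snapshot trees cheaply and consistently, which is exactly what a user-mode copy cannot promise.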
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
[100% ISO 646 Compliant]
SVM, ERGO MONSTRO.
I use HFS+ because Apple makes me. Is it efficient? reliable? Well... it works most of the time.
The + kept us busy for a while.
Man and Goat
I think it's going to be an absolute mess.
I don't know much, but storing meta-data in files in which you have free rein over what you can put in them cannot be good. Firstly a lot of files are going to be full of garbage. I could theoretically copy-paste an ASCII dump of Lord of the Rings into a meta tag. Imagine having to search a directory - oh sorry files-as-directories - in my filestructure for a specific tag.
Furthermore I can see people putting .mp3's online with nothing but SPAM in them. You've basically found a new way for lowlifes to make a living.
And then there are viruses, exploits etc.
Yuioup
Cool ideas, but let's be careful lest we make a steaming pile of Swiss cheese.
Evidently you haven't used ReiserFS. It already does this.
ReiserFS only journals filesystem metadata. Because it uses a B+ tree balanced allocation scheme for file blocks, when the system crashes the last pair of blocks written will often be swapped with respect to their files. For example (this has happened to me and separately to a friend), if you modify /etc/passwd just before the system crashes, you'll be unable to log in again when the system comes up.
What happened? syslogd wrote a panic message to /var/log/messages as the system went down, and you'll find this message (as well as the rest of its 4k block) in /etc/passwd, while your changed password file may be found at the end of /var/log/messages. This is a feature of ReiserFS, not a bug.
I still miss the raw speed of ReiserFS, to be sure, but EXT3 has kept every last one of my hundred-odd filesystems rock solid for two years now, which is really what you want a journalled FS to do.
ObOnTopicComment: Miner's examples are clumsy and ill-considered. BBN's Dave Mankins put a relational database into the 4.1BSD filesystem back in 1984, and Plan 9 took a more rational approach with its namespace algebra. This is not a new idea, so there's absolutely no excuse for breathless exposition based merely on coolness factor.
At best, Miner's descriptions obscure any true value Reiser's proposal might have. Organize my MP3 collection with FS metadata and lose it all when I try to move to another FS? What is he thinking? Is he thinking at all?
Sheesh.
I find this very interesting.
:-)
For example, I like the current devfs and procfs (although not perfect). They help me a lot to "debug" new hardware connections. Simple and coherent.
I also imagine some RDB support. Imagine a "select * from account" where account would be a simple directory. It would be nice to use "find" for the where clause.
Imagine also implementing OO classes with inheritance using symlinks... And more...
Yes, that would be very nice!
D.
It looks insightful
The only thing I'm concerned about is backward compatibility - if someone accidentally tries to open a file with a trailing slash, and gets an error because now it's a directory, then it's a Bad Thing.
...are not necessarily a good thing; many big programs (particularly proprietary ones) need some tweaking of ENV variables to work properly
I enjoyed the presentation of Reiser and Plan 9 information, however...
... well, you get the idea.
I think the author should revisit a few things.
- The author assumes his position is correct, without noting, and then refuting, opposing arguments.
- Very few reference materials, three of which were internet sources.
- Inconsistent use of pronouns. Is this a first person or impersonal paper?
- Comparisons to ext3, ntfs, et alii, are missing.
- Some terms are introduced, but not explained. Although you are writing with technical people in mind, even they might not be familiar with all of the terms.
The author appears to believe a synthesis of Plan 9, Reiser, and RDBMS features is the best solution, however, why this particular synthesis was chosen is not described.
I'm just trying to be helpful. I mean if people only give compliments
A filesystem that goes wrong properly...
Imagine your filesystem is a library...
Imagine you drop a bomb on it...
Books and pages scattered all over the place... Yet you can still work out which page belongs to which book, and where on the shelves they used to sit..
Having been caught short by LVM and reiser before (which just couldn't deal with a 46GB gap in the filesystem where a disk used to be) it seems to me that no-one's made a filesystem that breaks properly...
For me.. speed is not an issue.. nor is CPU usage... I'm quite happy to throw a dedicated box at handling the filesystem... I just want something that is written with the thought in mind... "Our hardware is unreliable.. It will die.. it will lose bytes... How do we deal with it"... not the usual thing that people do of "Our disk is 100% reliable... how can we most efficiently organise it... Hrm.. now what do we do when we encounter problem X" There should be no fsck... That should be handled by 'mount'
Sure.. there's always RAID... but I reckon I've got about 28MB of data in total that's critically important... the rest is just crap I downloaded from the net and can always get again...
Continuing my analogy further...
If your filesystem's a library, then your physical disks are wings of the library.. (east wing.. west wing.. etc)...
I want a librarian whose job is to work out the best way to organise the data in the different parts of the library... When you're looking for a book, it's often easiest just to ask the librarian.. who magically holds much of that information in his/her head.
The librarian is also responsible for making sure that multiple copies are maintained for popular or important works...
It'd also let me do the thing I've always wanted...
chattr +mirror filename
(i.e. This file is important and must be mirrored on 2 or more disks)
Just my two pennies worth... I'll write it one day if someone else doesn't get there first.
Market forces can be a mixed blessing, there were quite a few brilliant innovations in there that never made it onto the Wintel platform. jm
This sig is just as redundant as the rest of this posting
This is said by someone who obviously hasn't done any real world application profiling. It's quite the opposite -- CPU is relatively rarely a limiting factor in desktop applications, dealing with the HDD very often is.
This is very often why adding more memory to a system makes it seem more responsive -- larger disk buffers, less need for disk based virtual memory.
Basically hard disks are very often *the* limitation; CPUs are fast.
Adobe Still Ignores Elcomsoft-discovered Holes describes security problems with PDFs.
That explains why I had so many problems with ReiserFS...I was trying to use it on a *functioning* system! They should put some kind of warning in their docs or something.
The more I read this, the more it reminded me of the marketing version of how Apple would like us to think of Resource Forks.
Truthfully, there isn't exactly a lot of difference in the concept or the idea. Implementation is vastly different but the idea remains very similar.
Why do I want to accept this sort of idea any more than I want to accept resource forks? If I copy a file with resource forks from one of my macs to nearly any other OS on the market that's not specifically configured to support them, I lose that information. Why do I want to continue this?
I use HFS+ because I have to. To get all the functionality I want out of my macs, it's the only real option I have. But for anything other than system level files that are never likely to be copied to another machine, this is just a waste of time to me.
Next question. Say I do run this file system on my machines. I build up a heap of data and I'm using "files as directories" to store metadata about those files. How do I back it up? Don't even try to tell me "rebuild tar". Haven't we put tar through enough to try and extend its capabilities? I wouldn't touch a file system with these capabilities without a guaranteed way of being able to back up ALL the data. Otherwise it's just truly not worth the effort.
I wish people with clever ideas to redesign POSIX namespaces would spend ten years in system administration first so they realise what's involved with managing REAL WORKING SYSTEMS.
Some of the ideas might well lead in useful directions, but some (at least as described in the paper) are plain silly. viz:
1) with overlayed mounts:
suppose my home dir is mounted read-write over a read-only system root, and I do not have a "/bin/prog" in my home dir. Consider:
cp /bin/prog /bin/prog
First time, it copies the system /bin/prog into my home fs - Counter-intuitive to the path semantics. If I run this a second time it copies my copy of /bin/prog over itself - Inconsistent.
2) Attributes in the namespace
We have a rather carefully written setuid chown/chgrp/chmod replacement which can be run by users in an "admin" group, and allows devolution of 1st-line support tasks to nominated users. It won't touch files whose uid/gid is 100, so they can only touch non-system files.
If attributes (file uid) is file/..uid and cp is supposed to handle what chown does, the above breaks big-time. We now need a custom cp replacement. Either that or we have to add an ACL for the admin group to every file we want them to manage, which is a great deal of effort, and likely end up inconsistent.
Contrary to the paper, setuid and PARTICULARLY setgid is NOT going to go away in the real world any time soon, as far as files are concerned. Ports less than 1024 are a different matter and I agree with the document.
3) Consider the number of file descriptors involved if /etc/passwd becomes a hierarchy of files. Just logging in one user will involve multiple open()-read()-close() operations. Whilst these might be efficiently implementable at fs-level, it is still very inefficient in user space, or will at least require a dramatic rethink of unix tools.
In POSIX, attributes...However, as more people want more attributes (such as support for access control lists)
Great now DARPA wants to control my access to my pr0n too!
All your base are belong to us!
So why, why, why is anyone working on improving the FS until NFS is fixed or replaced with a decent security model? Ironically, fixing NFS appears to be trivial: something like ssh to authenticate the client might do the job. But this might break a lot of systems in an inhomogeneous network. Even so, I don't see any movement here. Everyone's worried about layers over NFS but not NFS itself.
If I'm an idiot then I'm in good company. I know dozens of other sys admins and none of us has a clue how to secure an NFS system against people who can jack in to the network.
Ironically, it gets even worse now that people use ssh and ssh keys to log in. It makes the network less, not more, secure. It's simpler to attack a client with a remote-mounted home directory using ssh keys than it is to sniff for telnet passwords!
Some drink at the fountain of knowledge. Others just gargle.
I assume Plan9 is an ironic nod to the "worst film ever". When I develop my new filing system, which will only allow numeric characters in filenames, will delete the MFT every time the computer is rebooted, and will require a new directory for each file added to the system - that FAT16 limit of 512 was FAR too generous - I'm going to call it BattlefieldEarthFS.
When I am king, you will be first against the wall.
I like this, VERY much so! Unfortunately, it's going to be a bitch to get everybody on the same page and moved over to a system like this. I'm not saying it's impossible, it's just going to be incredibly hard.
We'll probably need to see a distro (probably Debian based since Debian based distros tend to do these things first) that purely focuses on stuff such as this. It would hopefully set the standard that others would have to live by.
We'll see... too far down the road to know for sure though. Still some unanswered questions, such as performance hits and/or memory usage.
Well, what do you use for foo.h and foo.c files?
fooSource and fooHeader?
Hardly an improvement, I'd say.
The real problem is Windows compatibility. I have to use FAT32 as a default just to be able to read the file system on my Mac, Windows, and Linux boxes. Whatever the future holds, it at least needs a driver for Windows that is easy to use....
One such implementation already exists, such that FreeBSD can expose its file system to Plan 9 machines. As you would expect, it gets imported into your namespace, which can be a different place depending on the namespace of the current process. It can even (temporarily) "replace" your local files with versions on the FreeBSD box, if that's what you want.
There are places where the networks are not touching,and there are places where they are-Boeing's Lori Gunter
Why not do both? It would seem the easiest solution would be to implement common header files, like Dvorak suggests, that then get mirrored into the file system. This could easily be done by the file system whenever it writes a file. That way, the fs could have a relational database for searching and all, but the files retain control over the metadata. Transferring the file would be no problem. The metadata would get transferred in the header of the file, and then written to the database by the filesystem. (And yes, there would be a little overhead for checking and writing the metadata to the fs every time a file is written, but this is being done anyway by any fs that uses a metadata database, yes/no?)
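Roughly, the write path could look like the Python sketch below. The header layout ("key: value" lines ending at the first blank line), the `IndexingStore` class, and the in-memory index are all assumptions made up for the example; a real filesystem would do this inside its write code and persist the index.

```python
def split_header(data: bytes):
    """Parse 'key: value' header lines up to the first blank line."""
    head, _, body = data.partition(b"\n\n")
    meta = {}
    for line in head.decode("utf-8").splitlines():
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta, body

class IndexingStore:
    """Toy 'filesystem' that mirrors in-file headers into a search index."""

    def __init__(self):
        self.files = {}    # name -> raw bytes, header included
        self.index = {}    # (key, value) -> set of file names

    def write(self, name: str, data: bytes) -> None:
        # Drop stale index entries first, so the header stays authoritative.
        for names in self.index.values():
            names.discard(name)
        self.files[name] = data
        meta, _ = split_header(data)
        for item in meta.items():
            self.index.setdefault(item, set()).add(name)

    def find(self, key: str, value: str) -> list:
        return sorted(self.index.get((key, value), set()))
```

Because the index is rebuilt from the header on every write, copying the file to a system without the index loses nothing; the next indexing filesystem can regenerate it.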
As anyone who read the grandparent post would know, when I wrote `latex2pdf' in the parent post, I meant `latex2html'.
I am TheRaven on Soylent News
Mac OSX mounts disks as either NFS, AFS or SMB. I assumed that AFS was apple files service or apple file share. Is this just a case of two things with the same acronym?
It isn't so much lack of support (NTFS is actually pretty good with metadata) as the old command-line mentality that sticks with an ancient method of identifying file types out of pure inertia. You don't have to be a Mac or Be person (I'm neither) to wish for a more reliable and less ambiguous method of typing files. But how are you going to get all those Windows "Power Users" to change? Sure, extensions are error prone. And it's a pain when two applications seize the same extension. (Even different Microsoft applications do this!) And newbies are always getting in trouble with them. But extensions are easy to understand. Case closed, alas.
#!/usr/bin/bash
I totally agree with you.
All application-specific data about a file should be included with the file.
--meh--
To create one, use dd to get the size you want (from /dev/zero) or copy from some real partition (i.e., a floppy or a hd), run the desired mkfs if it was empty, and mount with the proper options. I have been using this to check ISOs and to clone floppies and store them onto CDs for ages. I have used it to get perfect FS images too (machine updates, i.e.) and then extract the contents anywhere (eer, ok, Linux machines :) ). If you have access to a RH, check /sbin/mkinitrd, it is a script that builds the initrd in a similar fashion.
BTW, Linux also supports mounting dirs into other dirs, with -o bind.
I used to live like that, then I took my big hard drive, slapped it into a linux box and shared it out with NFS, Samba, and NetAtalk. Now I can access all my files, which automagically get backed-up, from any machine on my LAN. Stop waiting for the 'universal' FS to show up, it'll never happen.
"Sometimes, I think Trent just needs a cup of hot chocolate and a blankie." -Tori Amos on Nine Inch Nails
I never read pdfs. If it isn't in html, or plain text, it's not worth reading.
Anyone working with very large amounts of very small data should be using a database. That's what databases are designed for, and heavily optimized for. If you want file system access to the database, there's nothing stopping you from doing this. I know there are linux kernel patches that will access SQL databases. Turning the whole filesystem into a database is a compatibility nightmare waiting to happen. You don't need database optimizations for storing files like /etc/fstab, and if you do, you've got other problems.
Yes, some of these experimental filesystems are returning some interesting results and creating some nice niche capabilities, but personally the filesystem he describes being well suited only for very large files is exactly what I and a great many other people need. Anything small is cached in memory. The only small-write function I need is the file system journal.
WARNING: there is a trojan on your
I hate to do that. :)
Oh, well. At least he didn't spell it, "Wala!".
file/..uid/..uid/..uid/..u
I'm delighted with the prospect of metadata-as-files and files-as-directories (ergo, metadata-as-directories?) -- but here we have another problem to address: Insufficiently escaped data. Human-readable data fields (including filename, if the user can read/write it) should be able to contain any human-readable characters. Filenames should be free to contain normal punctuation; path separators -- again, if the user can read/write paths -- should be selected from outside the normal punctuation set, or else the stuff between the separators should be escaped. Or the user-accessible file and path names should be stored as metadata.
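To make the "escape the stuff between the separators" option concrete, here is a small Python sketch. Percent-encoding is just my pick for the example (any reversible escaping would do), and build_path/split_path are hypothetical helper names, not anything a real filesystem exposes.

```python
from urllib.parse import quote, unquote

SEPARATOR = "/"

def escape_component(name: str) -> str:
    # safe="" makes quote() escape "/" as well; by default it leaves "/" alone.
    # "%" itself is escaped too, so the encoding can always be reversed.
    return quote(name, safe="")

def build_path(components):
    """Join user-visible names into a stored path with unambiguous separators."""
    return SEPARATOR + SEPARATOR.join(escape_component(c) for c in components)

def split_path(path):
    """Recover the original user-visible names from a stored path."""
    return [unquote(part) for part in path.split(SEPARATOR)[1:]]

names = ["Music", "AC/DC", "Who Made Who?.ogg"]
stored = build_path(names)
```

The user still sees and types "AC/DC"; only the stored form carries the escapes, so the slash inside the band name can never be mistaken for a path separator.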
Can't tell you how much frustration I've endured over other people's improperly escaped data. This just looks like one more case.
(Mac OS <=9 used a colon as a path separator, making it the only forbidden character for file/folder names, which could have been avoided so easily: How about a pipe, guys? or (shiver) a backslash? Or, even better, some control character unique to the Mac, akin to the Option-Shift-K Apple logo? Programmers. Sheesh.)
I just encountered the strangest artefact with the WinXP filesystem.
The "date modified" attribute of a folder never changes! It doesn't matter what I do in the folder, delete files, move files, add files, modify files; the "date modified" of the containing folder never changes.
How can this be? Win2k certainly worked the way you would expect. Has MS intentionally broken the XP filesystem to screw over all those 3rd party programs that rely on getting a reliable value for this attribute?
any insight much appreciated.
This is just dumb.
Hard drive space is under a buck a gig. And, some moron is talking about the benefits of saving a few kbytes, or maybe even several megs across the whole system by replacing config files in a universal standard with a completely non-portable implementation. Brilliant!
Blech. Dumb. Dumb. Dumb.
-- Intrusion prevention for Linux servers. www.cylant.com
An important advance in this direction is LFS, the Log-structured FileSystem. It's not exactly new; most of the recent improvements in it have been fine-tuning of the access and cleaning algos.
Basically, the main structure on disk is the Log. It stores all the iNodes and all the file data and metadata. If you have to write something to disk, you write it at the end of the Log. With a good buffer cache, this is extremely fast because you write large amounts of data contiguously.
Every so often you create a checkpoint, which is the metadata required to locate all the inodes and file data on disk at a particular time. Although I'm not aware of any implementation that allows this, you could theoretically roll the filesystem, or some part of it, back to any particular checkpoint (which has not been cleaned yet), or make it look to some user level program as if you had (they'd only have r/o access tho). Checkpoints also make crash recovery pretty fast.
Reading times are not quite as good as Ext2/3 under some circumstances (some workloads can massively frag files) but if you rewrite the majority of a file at a time, reading times can actually be faster. And running the cleaner a lot makes it even better.
There is a lot of CPU overhead, but very little disk-seeking overhead. The result is that as CPUs get faster, your IO will get faster; disk seek times are not getting much faster, and they are not the bottleneck with LFS.
The only major downside of LFS is the cleaner. Since the log only gets written at the end, it accumulates cruft and fragmentation over time, and grows enormous. You need garbage collection. So you have to deallocate data which has been rewritten later in the log, and compact highly fragmented segments into a smaller number of dense segments, in order to vacate segments for writing. This cleaner process burns serious CPU and disk, and is the main thing keeping LFS off the desktop. But if you're content to let your CPU spin for a while every night to clean up, you don't have to run it while you're working.
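For anyone who finds this easier to follow in code, here is a toy Python model of the idea. It is a sketch under heavy simplifying assumptions, not how Sprite LFS or any real implementation lays things out: the "log" is a Python list instead of disk segments, the "inode map" is a dict, and the TinyLFS name and its methods are invented for the example.

```python
class TinyLFS:
    """Toy log-structured store: every write appends, nothing is overwritten."""

    def __init__(self):
        self.log = []          # stands in for the on-disk log: (file_id, data)
        self.inode_map = {}    # file_id -> position of the live record
        self.checkpoints = []  # saved copies of the inode map

    def write(self, file_id, data):
        self.log.append((file_id, data))          # always at the end of the log
        self.inode_map[file_id] = len(self.log) - 1

    def read(self, file_id, inode_map=None):
        imap = self.inode_map if inode_map is None else inode_map
        return self.log[imap[file_id]][1]

    def checkpoint(self):
        """Remember where everything lives right now."""
        self.checkpoints.append(dict(self.inode_map))
        return len(self.checkpoints) - 1

    def read_at(self, checkpoint_no, file_id):
        # Read-only view of the past; only valid until the cleaner runs.
        return self.read(file_id, self.checkpoints[checkpoint_no])

    def clean(self):
        """Garbage-collect: keep only live records, packed densely."""
        live = sorted(self.inode_map.items(), key=lambda kv: kv[1])
        self.log = [self.log[pos] for _, pos in live]
        self.inode_map = {fid: i for i, (fid, _) in enumerate(live)}
        self.checkpoints.clear()   # old checkpoints point at discarded records
        return len(self.log)

fs = TinyLFS()
fs.write("a", "v1")
fs.write("b", "hello")
cp = fs.checkpoint()
fs.write("a", "v2")   # the old record for "a" is now dead weight in the log
```

Rewriting "a" leaves the old record in the log, which is why read_at(cp, "a") can still see the earlier contents, and also why the log keeps growing until clean() throws the dead records (and the checkpoints that referenced them) away.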
Google for Sprite LFS
I hereby place the above post in the public domain.
The whole point of GConf is that lots of tiny files make it too slow to read, right? Presumably all that overhead of open/read/close syscalls is the reason. If they just made it all one big file, they would need only 4 syscalls to read in the whole config (stat/open/read/close).
If the FS went and made each config value a separate file, you would have to readdir/open/read/close a thousand times! Now imagine how many packets need to be sent when your GConf directory tree is across the WAN. The solution to this is to have a new syscall which would take all of these file requests and do them at once, returning the bits in some annoying new data structure (XML?).
Ideally, this would be implemented with the files staying the way they are, and simply having plug-ins that know how to read/write a specific format (MP3, GConf, passwd, etc.) to allow humans to play with them as necessary. This would allow me to easily write small scripts/programs to manipulate small bits of metadata without causing incompatibility with downlevel systems or massive slowdowns due to an explosion of syscalls.
aqazaqa
There were two pretty good articles in LinuxJournal written by Hans Reiser. Part II was published in May '03, but I'm not sure when Part I was published. The articles are high-level, for the most part, but have some interesting analysis of the algorithms used in Reiser4's design. The articles are available online to LJ subscribers.
Linux: The world's best text-adventure game.
Maybe we should replace filesystems with
persistent hashmaps that have O(1) lookup?
This way, you can add any attribute to any
object you like. See Python's hashmaps.
So for /etc/passwd access, something like:
passwdhash = disk0["passwords"]
roothash = passwdhash["root"]
rootshell = roothash["shell"]
rootgid = roothash["gid"]
For chmod of /bin:
disk0["/bin"]["perms"] = 0o755
To check for available attribs, do:
print(disk0["/bin"].keys())
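A runnable approximation of the above, using Python's standard shelve module as the persistent hashmap. This is only a sketch: disk0 and the key names are the hypothetical ones from the post, the store lives in a temp directory rather than on a raw disk, and whether lookups are really O(1) depends on which dbm backend shelve picks on your system.

```python
import os
import shelve
import tempfile

def open_disk(path):
    # writeback=True lets nested dicts be modified in place and saved on close.
    return shelve.open(path, writeback=True)

store_path = os.path.join(tempfile.mkdtemp(), "disk0")

with open_disk(store_path) as disk0:
    disk0["passwords"] = {"root": {"shell": "/bin/sh", "gid": 0}}
    disk0["/bin"] = {"perms": 0o755}
    disk0["/bin"]["owner"] = "root"      # add any attribute you like

with open_disk(store_path) as disk0:     # reopened: the data persisted
    rootshell = disk0["passwords"]["root"]["shell"]
    rootgid = disk0["passwords"]["root"]["gid"]
    bin_attribs = sorted(disk0["/bin"].keys())
```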
Bram
Bram Stolk http://stolk.org/tlctc/
I think the disk image approach is popular because, as you said, some types of metadata do not translate into all filesystems.. and now that we have OSX supporting NFS, CIFS, and a handful of other systems that could very likely be attached and in use, it's far easier to use an HFS+ disk image, if that's the kind of system that is ideal for you to install from.
For the uninitiated... mounting a disk image on a mac does not require any actual thought.. you double click it, and the disk pops up on the desktop. It's a very convenient tool. You can, of course, mount it the old fashioned unix way if using the gui violates your principles
What was great about BFS/Tracker? Can you elaborate?
I click on "PCTools 1.1.dmg"
then, after finder is done with it
# mount
... /dev/disk2s2 on /Volumes/PCTools 1.1 (local, nodev, nosuid, read-only)
Just like unix. Oh wait, it is unix.
It quite possibly is lif... funny enough, osx tells me
% file blah.dmg
blah.dmg: VAX COFF executable not stripped - version 376
He lost me as soon as he held up GConf as an example of what was to be accomplished. Have you ever LOOKED at the "xml" files that GConf generates? Ever tried to climb the ~/.gconf (and /etc/gconf) trees? I put GConf (and anything that aspires to be like it) in the same category with the Windows Registry. GConf is, by far, the thing I like least about GNOME (and, on the whole, I like GNOME).
Why do people keep adding needless complexity to fix systems that aren't even broken? If I can't edit my configs with vi, I'd rather use something else.
I want all of the power and none of the responsibility.
I don't think we necessarily need new filesystems or filesystem features. We just need a reliable and efficient way to get the bits onto the media. What I do think we need are new filesystem objects. In the past we've mostly been limited to the use of files, directories, and maybe links. Those work fine and are good building blocks, but how about extending the directory model? Have the directory begin to act like a file.
This is really more of an OS issue, but rather than having all the components of an application in many various locations, locate them all within subdirectories of one main directory. Then when you want to use that application, you click on the main directory rather than the actual application binary within that directory. Within that directory, you could include all kinds of metadata that would be useful to check the integrity of the application (SHA-1, MD5, file lists).
You could also use this idea to implement a kind of CVS for individual files. Rather than having some text file called foo.txt, you have a directory called foo.txt. Within that directory, there would exist the most recent version of foo.txt, but there would also be diffs that could be used to review the history of that file's evolution. If you want to review the changes you've made or to revert to a version of that file two saves prior, the OS would automagically apply the necessary diffs on the fly.
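Here is roughly what "latest version plus diffs, applied on the fly" could look like, sketched in Python with the standard difflib module. VersionedFile and the delta format are invented for this example; a real implementation would keep the deltas as files inside the foo.txt directory rather than in a Python list.

```python
import difflib

def reverse_delta(new, old):
    """Describe how to rebuild `old` from `new` (both are lists of lines)."""
    matcher = difflib.SequenceMatcher(a=new, b=old, autojunk=False)
    delta = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))       # reuse lines from `new`
        else:
            delta.append(("data", old[j1:j2]))   # lines that only `old` had
    return delta

def apply_delta(new, delta):
    out = []
    for op in delta:
        if op[0] == "copy":
            out.extend(new[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out

class VersionedFile:
    """Keeps the newest text in full and older versions as reverse deltas."""

    def __init__(self, lines):
        self.current = list(lines)
        self.deltas = []   # deltas[i] rebuilds version i from version i + 1

    def save(self, lines):
        lines = list(lines)
        self.deltas.append(reverse_delta(lines, self.current))
        self.current = lines

    def version(self, saves_back=0):
        if not 0 <= saves_back <= len(self.deltas):
            raise ValueError("no such version")
        lines = self.current
        newest = len(self.deltas)
        for i in range(newest - 1, newest - 1 - saves_back, -1):
            lines = apply_delta(lines, self.deltas[i])
        return lines

foo = VersionedFile(["hello"])
foo.save(["hello", "world"])
foo.save(["hello", "there", "world"])
```

foo.version(2) walks back through two stored deltas to give the file as it was two saves prior, which is the "automagic" part the OS would do behind the directory.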
We just need to begin to use the filesystem more creatively.
Anyone can make up a cock-and-bull story like this.
Just on general principle, I prefer my data inside the file and not left with the filesystem. The MP3 metadata example, to me, is like Windows file extensions on HGH.
I'm with you -- I like self-contained file formats.
But I don't think he was proposing that you not use Ogg tags or MP3 tags; he was talking about the filesystem abstracting the tags. If you changed "Stagnation.ogg/album" to the string "Trespass", then the filesystem abstraction layer should update the Ogg "album" tag inside the file to be "Trespass".
The key benefit here is that you would not need some wacky command-line utility program to let you view and change tags on Ogg files. You could just use the shell. In bash:
for ii in *.ogg; do echo "Trespass" > "$ii/album"; done
Note that this same one-liner would work if you were in a directory with MP3 files, and you changed "*.ogg" to "*.mp3". Currently, you need to run vorbiscomment for your Ogg files, and mp3info for MP3 files. (I just checked, and sure enough, they take different arguments to do the same operations.)
Personally, I'd like to see a standard metadata portable XML format for legacy systems. People talk about copying a file from a rich metadata filesystem and having new files like .attributes created on the target legacy system; I'd be happier if just one big XML file could be created with the same name as the original file.
Suppose you back up server //rich onto server //legacy, and then you want to restore some files from //legacy to //rich. If all the metadata was stored in a big XML file, then when you copy the file from //legacy to //rich you restore all the metadata; you wouldn't accidentally slice off attributes by forgetting to copy one or more rich attributes files.
You could do most of the fancy tricks of the rich metadata filesystem on a legacy filesystem that used the big XML file to store the rich metadata. And as long as the legacy system is just smart enough to look at the main data part of the XML and leave the metadata tags alone, you could still modify the file with sed, awk, perl or whatever, and then copy the big XML file onto your rich metadata filesystem and still not lose any rich metadata.
Note also that the big XML file could be used to deal with existing rich metadata systems, like resource forks from Macintosh filesystems, or multiple data streams from NTFS files.
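A minimal Python sketch of the "one big XML file" idea, using only the standard library. The element names (richfile, metadata, attr, data) and the pack/unpack helpers are made up for this example; no such standard format exists as far as I know.

```python
import base64
import xml.etree.ElementTree as ET

def pack(name, data: bytes, attributes: dict) -> bytes:
    """Bundle a file's contents and its rich metadata into one XML document."""
    root = ET.Element("richfile", name=name)
    meta = ET.SubElement(root, "metadata")
    for key, value in sorted(attributes.items()):
        ET.SubElement(meta, "attr", name=key).text = value
    # Base64 keeps arbitrary bytes safe inside XML text.
    ET.SubElement(root, "data", encoding="base64").text = (
        base64.b64encode(data).decode("ascii")
    )
    return ET.tostring(root, encoding="utf-8")

def unpack(blob: bytes):
    """Return (name, data, attributes) from a document made by pack()."""
    root = ET.fromstring(blob)
    attributes = {a.get("name"): a.text or "" for a in root.find("metadata")}
    data = base64.b64decode(root.find("data").text or "")
    return root.get("name"), data, attributes

blob = pack("Stagnation.ogg", b"\x00OggS\xff",
            {"album": "Trespass", "artist": "Genesis"})
```

One tradeoff in this sketch: base64 makes binary payloads safe, but it also means sed, awk and friends can't edit the data part directly. For plain text files the data element could hold the text as-is, which is closer to what I described above.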
steveha
lf(1): it's like ls(1) but sorts filenames by extension, tersely
IBM will be launching their quantum series where information will be stored in the nucleus of the atoms
WHAT!!! Any references for this particular technology where I can get more information? raj
Well, if your name is Bill Gates, and the filesystem in question is the one underpinning Longhorn, that's not a bug, it's a feature.
What we have is a filesystem that is much more complicated in usage and concept. You can no longer see /usr/local/etc/apache.conf and know which parts are directories and which one is a flat configuration file. A database in one big file that is sitting in a standard filesystem is much more usable and easy to work with than the proposed solution.
"Give orange me give eat orange me eat orange give me eat orange give me you." -Nim Chimpsky
The transfer of metadata to a metadataless system is actually an old problem, as anyone who's copied Mac files to DOS/Windows any time in the last, uh, 20 years could tell you.
Once again, the cutting edge of Linux research has finally caught up to state-of-the-art proprietary systems of ~15 years ago. Brav-fucking-oh.
I think keywords are a bad idea.
In the wonderful world of DOS they have keywords like CON and NUL
Which have caused no end of trouble.
If you allow a file/directory to have arbitrary tags, and access them like directories you will have problems.
Why take a step backwards?
ext2+journal=ext3
:/
fat32+journal=fat33?
fat32x is already taken
I *like* using disjoint namespaces to organize disjoint functions or concepts. I like having separate tools that do one job well, rather than having to stop and think about what tool X's operation *means* in this part of the namespace.
I've been doing Unix for a decade and I *still* think that /dev is an off-the-wall idea. I certainly don't want the filesystem on *my* machines to become even more ambiguous.
i was surprised to see no mention of it in the article, nor did i see any comments that brought it up. maybe it's off-topic, i guess.
:)
well, whatever the case, the Be File System is a POSIX-like file system (with all of the POSIX-mandated features), but it extends it far beyond with its named attributes. any file can have any number of arbitrary attributes that contain any arbitrary length of arbitrary data. i think the only real limitation was free space for the inode.
the only thing that is different from Reiser and Plan9, as i understand it, is how those attributes are accessed. it's been a while, but i think you had to use a rather clumsy utility to access the attributes from the CLI, but any application could be written to take advantage of them. quite easily, in fact.
it seems, if we took Reiser and BeFS and did a kind of crossover, we'd have an almost ideal file system.
grey wolf
LET FORTRAN DIE!
What is this obsession with reinventing filesystems? Yes, we all want blazingly fast and efficient filesystems that are self tuning. But why this rush to throw out the basic concepts that have been proven to work?
The article is a grab bag of featurettes, but with no coherent model. The classic UNIX filesystem (ufs, ffs, ext) has a simple model: everything is a file. Directories are files. Devices are files. Files are files. 99.99% of the time this is sufficient. It could use some tuning, but there is no reason to replace it with a big accretion of warts. Environment variables as files is an excellent idea. But the example of "gconf keys as files" is a wart, since gconf files are already files. If there is a pressing need for inode-free files (metadata only) then extend the existing model with inode-free files. Simple.
When you do a brainstorming session, you don't take all of the ideas on the chalkboard and roll them up into one new system. Instead you take the best ideas that work together and leave the others behind. Sometimes you even have to leave a good idea on the table. It's great that people are coming up with great ideas. But those ideas need to be fit into a working model that is lucid and elegant.
A Government Is a Body of People, Usually Notably Ungoverned
When you look at the examples, Plan 9 or pre OS X Apple, the basic concepts were all worked out over ten years ago. Apple has backed off because of competitive pressure, and Bell Labs is effectively dead. Anything that Microsoft does will be plagued by bugs and bad design, and will have 'features' designed to trap users into only using Microsoft products. (Note: trapping users is an underlying reason that Microsoft has such bad design and buggy code.)
I know that academics are trying a lot of new things, but for work like this you need a real live operating system. Without Open Source, only those inside a large company research group with an OS would be able to tackle this class of problem. Times have changed, and there are far fewer places like Xerox PARC, Bell Labs or DEC research, and those that exist are on a short leash. If the state of the art is going to be extended, Open Source is going to make it happen.
Go ahead and use ReiserFS. I dare you.
Filesystems must be replaced with persistent object trees where each object maps to a system class available to all programming languages through a COM-like mechanism. Directory objects should implement a collection interface, and the O/S will have the burden of the object I/O (since it will know the object's class). Reflection is enough to know what a file is and which "metadata" it has.
Feh.
Search 2010 Gen Con events
The security hole in NFS is simply that client and server don't mutually authenticate each other. Encryption of transmitted data is not required. Or rather, it's not a security hole per se: you might or might not care if someone reads your data, but you sure as heck do care if someone can exploit NFS to either edit your data or log onto the client.
Thus a better solution would be only to use ssh during the mount (and export) steps, then let all traffic flow over a normal NFS channel.
I remember reading an interesting paper a while back about metadata and some security implications. Doesn't the whole idea of metadata-as-file leave the casual user open to problems of unsure content?
For example:
1) Someone makes an image called "Cool Wallpaper.jpg" (cuz the average Joe doesn't use PNG).
2) Joe user downloads it and installs it in some common directory
3) Joe gets busted for child pr0n cuz they attached something nasty in the metadata
This is a pretty bad example, but I can think of all sorts of interesting issues with badly configured common space and executable files. (I realize that if the ACL's are correct this isn't a problem, but Joe Average tends to mess up a lot)
The Apollos had a typed file system, and as I recall wbak captured the type information, ACLs, and possibly other information.
Here is some info from: Harvard
While I agree that using a uniform metadata system and a Plan9-like OS could be very interesting, I think that the type should be kept within the filename (foo.ogg) instead of having a file foo with an attribute type=ogg inside.
Why? For the same reason that we use names for files instead of numbers: it's easier for the users. If you embed the type inside an attribute, what happens if you have a file foo.ogg with an attribute type=exec ?
Lots of problems for the users!
The type attribute is special IMHO: it tells the users what this file will be used for.
If I download a file foo.ogg, I'm not especially careful before activating it in my browser (at worst it will only hurt my ears); if it is an executable file, I should be especially careful.
Putting the type attribute in the filename is the easiest way to tell this information reliably to the user, so let's use the KISS principle here!
Like that old IBM workhorse, ISAM.
Eventually, we will have filesystems implemented using relational databases instead of hierarchical databases. BeOS's filesystem was a good first step towards this reality.
Imagine a filesystem organized like a relational database conforming to Codd's 12 rules of databases. Programmers could grow and change the filesystem schema to match engineering requirements. It will be cool!
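To give a feel for it, here is a small Python sketch with the standard sqlite3 module playing the part of the filesystem. The files and tags tables, the sample rows, and the files_with_tags helper are all invented for illustration; this says nothing about how BeOS or any real filesystem stores things.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE files (id INTEGER PRIMARY KEY, name TEXT, size INTEGER);
    CREATE TABLE tags  (file_id INTEGER REFERENCES files(id), tag TEXT);
""")
db.executemany("INSERT INTO files VALUES (?, ?, ?)", [
    (1, "budget.txt", 1200), (2, "trip.jpg", 90000), (3, "notes.txt", 300)])
db.executemany("INSERT INTO tags VALUES (?, ?)", [
    (1, "work"), (1, "2003"), (2, "personal"), (2, "2003"), (3, "work")])

# Growing the schema later takes a single statement.
db.execute("ALTER TABLE files ADD COLUMN mime_type TEXT")
db.execute("UPDATE files SET mime_type = 'text/plain' WHERE name LIKE '%.txt'")

def files_with_tags(*wanted):
    """Names of files that carry every one of the given tags."""
    marks = ",".join("?" * len(wanted))
    rows = db.execute(
        f"SELECT f.name FROM files f JOIN tags t ON t.file_id = f.id "
        f"WHERE t.tag IN ({marks}) GROUP BY f.id "
        f"HAVING COUNT(DISTINCT t.tag) = ? ORDER BY f.name",
        (*wanted, len(wanted)),
    )
    return [r[0] for r in rows]
```

A query like files_with_tags("work", "2003") cuts across categories that a single directory tree would force you to nest one inside the other.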
Try copying some files between a VFAT partition and an EXT2 or EXT3 partition if you want to get an idea how Linux handles such issues.
The tradeoff is garbage on one file that was being written to when the crash happened, or garbage in the whole disk after holding down 'y' through thousands of incomprehensible fsck questions.
Please remove the period key from his keyboard. Also try slapping him sharply at the end of sentences when he speaks, do not allow him to trail off, until he demonstrates that he doesn't think like he writes.
When will File Systems become optional user databases above Orthogonal Persistent storages, which are much easier to implement efficiently, and form a much more convenient platform to work on? :)
Well, the filesystems in OS/2 (FAT, HPFS, JFS) do support metadata (Extended Attributes). CD-ROMs don't. A cheap trick for not losing the attributes from files when saving on CD-ROMs is packing the files in .zip files (compressed or not). The .zip format supports metadata but I think they usually are OS-specific, though.
Another option is to cut the EAs into a separate .EA file for each original file and glue it back later, but this is ugly.
__
Men with no respect for life must never be allowed to control the ultimate instruments of death.
GW Bu
Nobody's arguing that journalling isn't a big win.
EXT3 doesn't require you to fsck either, but it gives you data journalling as well as metadata journalling, so your system files don't get mysteriously trashed.
you idiot.
Although most file systems are not relational databases, I sure wish they were. Hierarchies don't handle multiple orthogonal categories very well. Hierarchies are okay for small stuff, but get messy when the volume exceeds about 100 folders or so IMO.
But, hierarchies are easier for most users to relate to. It seems like a choice between the "proper" way to organize information (relational or set-based), and the "easy to learn" way. Thus, unfortunately, I don't think relational/set file systems will catch on any time soon. Bummer.
Table-ized A.I.
You have complete isolation between the physical and logical filesystem. Adding extra disk space is no harder than using the logical volume manager to grow the filesystem. And if your box ever happens to go down, it is back up in no time.
It also scales very well under heavy loads, even on lowly intel hardware. Of course there is no substitute for a nice big Origin :)
How does change notification work in Reiser 4? Is it publish/subscribe (efficient) or polling (as FAM does)? How does it interact with security?
Nothing is further from the truth. See "Mailing Disks is Faster than Uploading Data". From the linked article: "it will take a year to read a 20-terabyte disk"
Filesystems most certainly need developing!