State Of The Filesystem
Skeme writes "Have you heard of Plan 9 or Reiser4 but don't know much about them? Are you curious about the improvements free software is making to its filesystems in general? Read my summary of the current developments in the filesystem: namely, what improvements we can expect (a lot), and what Linux and the BSDs can do to improve on the filesystem."
I've always wondered how these filesystems with metadata handle transferring files between different systems. It would suck to have all your MP3 info in filesystem metadata and then lose it all when you transferred to a system without fs metadata. Anyone have any insight?
There has been talk on the kernel mailing list of implementing 9p, the Plan 9 distributed filesystem.
Now that it is Open Source, I hope it will happen.
If Linux and related systems move to filesystems with really powerful metadata support, presumably the lock-in would be much stronger. Moving a directory from Linux to a Windows system may be possible, but the programming to do it will become increasingly painful and the risk of data loss will rise. And with mainframe integrity, why would you want to, Mr. customer?
Apart from the CS issues, is this an attempt to use the embrace, extend weapons of Microsoft against it by turning the Linux filesystem into a full mainframe system, effectively squeezing out Windows servers by a convergence between big tin and small boxes? I guess this is pretty pie in the sky but I'd like to think so.
Panurge has posted for the last time. Thanks for the positive moderations.
I do not know much about file systems, so I have a few questions.
Once we see the GConf example, other possibilities immediately spring to mind. In almost all multimedia formats--such as MP3, MPEG, and OGG--there is a tagging system for storing things such as the author and title. Instead of storing those attributes in the tag--yet another namespace--store it in the file-as-a-directory. I have a file /music/Millencolin/PennybridgePioneers/RightAboutNow.ogg. I let RightAboutNow.ogg/title be "Right About Now", /artist be "Millencolin". I could add a file RightAboutNow.ogg/genre and put "punk" in it.
While this is a nice example, I wonder if there really is an advantage to this kind of file system, because it seems like it takes more effort to keep track of all those sub-files instead of keeping all the info in a single file. Anyone can shed some light on this?
Also, what if I format my partitions with different file systems, say Reiserfs 4 and ext3? Could there be any incompatibility issues? Imagine I kept MP3s on both partitions: would my MP3 player know how to handle both, since the tag info is dealt with differently on each system?
I agree that we need a revolution in how filesystems work inside an operating system, but it seems that the arguments in this paper had a lot of holes.
For one thing, the case for changing a filesystem should not rest solely on space or metadata. I think security, speed of data retrieval, and self-correcting error engines should be central to the new systems.
The reason speed of data retrieval matters more than data size is that hard drives are growing in capacity much faster than they are growing in speed. In five years, we may have 20 terabyte drives, but the access speeds will still be horrible.
Security and error correction are obvious points that should be implemented on a systemwide level. When these features are system wide, then management becomes much easier for all system users.
I agree that metadata in the filesystem is a risky proposition. Just on general principle, I prefer my data inside the file and not left with the filesystem. The MP3 metadata example, to me, is like Windows file extensions on HGH. I remember John Dvorak wrote a piece about Windows file extensions a long while back, and he argued that file types, etc. should be inside the file. A header of sorts. I tended to agree then, and I see filesystem metadata as a bad trend.
I may be way off base, but I see the need to keep a non-journalling file system around. Someone stop me if I'm wrong, but in my mind things like audio recording, video capturing and the like would suffer performance-wise from being run on journalled file systems.
The stars that shine and the stars that shrink
in the face of stagnation the water runs before your eyes
It seems that the author presumed that the only use of LDAP is to provide passwords for user authentication. While that is a common use of LDAP it is not the only use.
It would seem that having a file system that is LDAP aware could be extremely useful. Imagine if your LDAP tree were reflected as a tree in your file system. You wouldn't need to embed LDAP calls in your application, it would just be data in your file system. So looking up an attribute for the current user, or a user, would be as simple as reading a file that holds the value of the attribute.
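A rough sketch of that idea, with no real LDAP involved: the tree below is a stand-in written out by a hypothetical helper `export_tree`, and the entry and attribute names are made up for illustration. Once the tree exists, looking up an attribute is an ordinary file read.

```python
import tempfile
from pathlib import Path


def export_tree(root: Path, entries: dict) -> None:
    """Write nested dicts as directories and leaf values as small files."""
    for name, value in entries.items():
        target = root / name
        if isinstance(value, dict):
            target.mkdir(parents=True, exist_ok=True)
            export_tree(target, value)
        else:
            target.write_text(str(value), encoding="utf-8")


def read_attr(root: Path, entry: str, attr: str) -> str:
    """Look up one attribute of one entry by reading a file."""
    return (root / entry / attr).read_text(encoding="utf-8")


# Made-up directory data standing in for an LDAP subtree.
directory = {
    "people": {
        "alice": {"mail": "alice@example.org", "loginShell": "/bin/sh"},
    },
}

mount_point = Path(tempfile.mkdtemp())
export_tree(mount_point, directory)
shell = read_attr(mount_point, "people/alice", "loginShell")
```

A real LDAP-aware filesystem would fetch entries on demand rather than export them up front; the point here is only what the application side would look like.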
XFS is a good example of journalling filesystems. But how about filesystems like Coda, AFS and Intermezzo, a new generation of networking (actually - distributed) filesystems allowing disconnected operations?
Less is more !
What I want is for someone to come up with a real unlimited snapshotting filesystem for Linux. I don't want to use user mode hacks (as nice as they are, rsync style snapshotting isn't reliable enough), or snapshotting that only allows a shadow copy of the entire volume. I want to be able to tell the users that they can just go into ~/.snapshot/time (where time can be hours, days, or weeks in the past) and copy the file they messed up back into their home directory. Basically I want the most useful feature of NetApps without the HUGE markup =) The savings in admin time, both in user interaction and in reduced need to do tape retrieval and file restores, are immense.
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
I find this very interesting.
:-)
For example I like the current devfs and procfs (although not perfect). Those help me a lot to "debug" new hardware connections. Simple and coherent.
I also imagine some RDB support. Imagine a "select * from account" where account would be a simple directory. It would be nice to use "find" for the where clause.
Imagine also implementing OO classes with inheritance using symlinks... And more...
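One way to picture the "select * from account" idea is the sketch below. It assumes a layout nobody in the thread specified: each subdirectory of the table directory is a row, each file inside it is a column, and the `where` argument plays the role that "find" would. The function name `select` and the sample data are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Callable, Dict, List

Row = Dict[str, str]


def select(table: Path, where: Callable[[Row], bool] = lambda row: True) -> List[Row]:
    """Treat each subdirectory of `table` as a row and each file in it as a column."""
    rows = []
    for entry in sorted(table.iterdir()):
        if not entry.is_dir():
            continue
        row = {"name": entry.name}
        for column in entry.iterdir():
            if column.is_file():
                row[column.name] = column.read_text(encoding="utf-8").strip()
        if where(row):
            rows.append(row)
    return rows


# Build a made-up "account" table to query.
account = Path(tempfile.mkdtemp()) / "account"
for name, balance in [("alice", "120"), ("bob", "15")]:
    (account / name).mkdir(parents=True)
    (account / name / "balance").write_text(balance, encoding="utf-8")

everyone = select(account)
rich = select(account, lambda row: int(row["balance"]) > 100)
```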
Yes would be very nice!
D.
Actually, ZIP is extensible. You can add new data blocks to the ZIP file and if you try to unzip it with a decompressor that doesn't understand your extensions, they'll just be ignored. The Be ZIP implementation extends ZIP with support for its native file system attributes meta data, for example.
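To illustrate the extensibility point, here is a small sketch using Python's standard `zipfile` module. The header ID `0xCAFE` and the `genre=punk` payload are invented for this example and are not the IDs or layout the Be implementation uses. Each extra-field record is a 2-byte ID, a 2-byte length, and then the data, so a reader that does not know the ID can skip over it.

```python
import io
import struct
import zipfile

# Hypothetical header ID for this sketch; not a registered extension.
ATTR_HEADER_ID = 0xCAFE


def pack_extra(payload: bytes) -> bytes:
    """Build one extra-field record: 2-byte ID, 2-byte length, then data."""
    return struct.pack("<HH", ATTR_HEADER_ID, len(payload)) + payload


def find_extra(extra: bytes, header_id: int):
    """Walk the extra-field records and return the payload for header_id."""
    offset = 0
    while offset + 4 <= len(extra):
        record_id, size = struct.unpack_from("<HH", extra, offset)
        if record_id == header_id:
            return extra[offset + 4 : offset + 4 + size]
        offset += 4 + size
    return None


buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    info = zipfile.ZipInfo("RightAboutNow.ogg")
    info.extra = pack_extra(b"genre=punk")
    archive.writestr(info, b"audio bytes would go here")

with zipfile.ZipFile(buffer) as archive:
    member = archive.getinfo("RightAboutNow.ogg")
    # Reading the file data does not require understanding the extra field.
    data = archive.read(member)
    attrs = find_extra(member.extra, ATTR_HEADER_ID)
```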
TAR isn't so much showing its age as dribbling into a bib in the old folks' home.
You're right, XML isn't the most efficient way for gconf to store data. But is rewriting the filesystem really the answer?
LordBodak's journal.
The only thing I'm concerned about is backward compatibility - if someone accidentally tries to open a file with a trailing slash, and gets an error because now it's a directory, then it's a Bad Thing.
[100% ISO 646 Compliant]
SVM, ERGO MONSTRO.
A filesystem that goes wrong properly...
Imagine your filesystem is a library...
Imagine you drop a bomb on it...
Books and pages scattered all over the place... Yet you can still work out which page belongs to which book, and where on the shelves they used to sit..
Having been caught short by LVM and reiser before (which just couldn't deal with a 46GB gap in the filesystem where a disk used to be) it seems to me that no-one's made a filesystem that breaks properly...
For me.. speed is not an issue.. nor is CPU usage... I'm quite happy to throw a dedicated box at handling the filesystem... I just want something that is written with the thought in mind... "Our hardware is unreliable.. It will die.. it will lose bytes... How do we deal with it"... not the usual thing that people do of "Our disk is 100% reliable... how can we most efficiently organise it... Hrm.. now what do we do when we encounter problem X" There should be no fsck... That should be handled by 'mount'
Sure.. there's always RAID... but I reckon I've got about 28MB of data in total that's critically important... the rest is just crap I downloaded from the net and can always get again...
Continuing my analogy further...
If your filesystem's a library, then your physical disks are wings of the library.. (east wing.. west wing.. etc)...
I want a librarian whose job is to work out the best way to organise the data in the different parts of the library... When you're looking for a book, it's often easiest just to ask the librarian.. who magically holds much of that information in his/her head.
The librarian is also responsible for making sure that multiple copies are maintained for popular or important works...
It'd also let me do the thing I've always wanted...
chattr +mirror filename
(i.e. This file is important and must be mirrored on 2 or more disks)
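There is no `+mirror` attribute in chattr today, so the following is only a user-space sketch of the behaviour being asked for, under the assumption that each "disk" is just a directory. The helper names `mirror_write` and `mirror_read` are hypothetical. The checksum is what lets the reader notice a copy that has lost bytes and fall back to another one.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import List


def mirror_write(name: str, data: bytes, disks: List[Path]) -> str:
    """Store a copy of data on every disk and return its SHA-256 digest."""
    for disk in disks:
        (disk / name).write_bytes(data)
    return hashlib.sha256(data).hexdigest()


def mirror_read(name: str, digest: str, disks: List[Path]) -> bytes:
    """Return the first copy whose checksum still matches."""
    for disk in disks:
        copy = disk / name
        if not copy.exists():
            continue
        data = copy.read_bytes()
        if hashlib.sha256(data).hexdigest() == digest:
            return data
    raise IOError(f"no intact copy of {name!r} left")


# Two temporary directories stand in for two physical disks.
disks = [Path(tempfile.mkdtemp()), Path(tempfile.mkdtemp())]
digest = mirror_write("thesis.txt", b"critically important", disks)

# Simulate the first disk losing bytes.
(disks[0] / "thesis.txt").write_bytes(b"critically imp")
recovered = mirror_read("thesis.txt", digest, disks)
```

A filesystem doing this for real would also have to store the digest redundantly and repair the damaged copy; the sketch leaves both out.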
Just my two pennies worth... I'll write it one day if someone else doesn't get there first.
The more I read this, the more it reminded me of the marketing version of how Apple would like us to think of Resource Forks.
Truthfully, there isn't exactly a lot of difference in the concept or the idea. Implementation is vastly different but the idea remains very similar.
Why do I want to accept this sort of idea any more than I want to accept resource forks? If I copy a file with resource forks from one of my Macs to nearly any other OS on the market that's not specifically configured to support them, I lose that information. Why do I want to continue this?
I use HFS+ because I have to. To get all the functionality I want out of my Macs, it's the only real option I have. But for anything other than system level files that are never likely to be copied to another machine, this is just a waste of time to me.
Next question. Say I do run this file system on my machines. I build up a heap of data and I'm using "files as directories" to store metadata about those files. How do I back it up? Don't even try to tell me "rebuild tar". Haven't we put tar through enough to try and extend its capabilities? I wouldn't touch a file system with these capabilities without a guaranteed way of being able to back up ALL the data. Otherwise it's just truly not worth the effort.
Good old Acorn RISC OS already supported the use of directories as files back in the eighties. E.g.: Click on a file to open it, shift click to show a directory of sub-files (recurse at will).
You're thinking of application directories. It could not be done with ordinary files - it had to be a directory.
The only difference was that if a directory had an exclamation mark as its first character, RISC OS's default action would be to execute a file called !Run inside that directory.
Fantastic idea - It meant that you could keep all the libraries and files required for an application in one folder. It also meant that you could move the program and its associated files wherever you liked on the hard disk - There was only ever one icon to move.
If you ever decided to move stuff around like that in Windows, prepare for your programs to stop working.
AmigaOS had a single meta-data field in its various filesystems, called the "file comment" or "file note". While it was clearly far less powerful than what is proposed here, it was incredibly useful for lots of purposes.
The problem you describe, losing the meta-data when moving to another filesystem, was non-existent on the Amiga for two reasons: all filesystems implemented filenotes, and LhA (the Amiga-equivalent of tar/gzip) saved and restored filenotes. A similar solution can be used for Linux (tar and gzip are getting very long in the tooth anyway).
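As a sketch of what "the archiver saves and restores the note" could look like on Linux: tar's pax format can carry extra key/value records per file, and Python's standard `tarfile` module exposes them as `pax_headers`. The key name `EXAMPLE.filenote` is made up for this illustration; nothing here restores the note onto a filesystem, it only shows the note surviving a round trip through the archive.

```python
import io
import tarfile

# Hypothetical key for this sketch, written in the vendor-prefixed style pax allows.
NOTE_KEY = "EXAMPLE.filenote"

payload = b"audio bytes would go here"
buffer = io.BytesIO()

with tarfile.open(fileobj=buffer, mode="w", format=tarfile.PAX_FORMAT) as archive:
    info = tarfile.TarInfo("RightAboutNow.ogg")
    info.size = len(payload)
    info.pax_headers = {NOTE_KEY: "ripped from my own CD"}
    archive.addfile(info, io.BytesIO(payload))

buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r") as archive:
    member = archive.getmember("RightAboutNow.ogg")
    note = member.pax_headers.get(NOTE_KEY)
    data = archive.extractfile(member).read()
```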
I would love to see proper meta-data fields implemented throughout the system, especially if it is pervasive throughout the OS (so I can search for specific values from the desktop, for example).
Finally, I believe BeOS had something similar, and users of that filesystem also tend to be enthusiastic about it. All this doesn't mean the current solution is necessarily the best one, of course - just that the concept is pretty damn useful.
Already one such implementation exists, such that FreeBSD can expose its file system to Plan 9 machines. As you would expect, it gets imported into your namespace, which can be a different place depending on the namespace of the current process. You can even (temporarily) "replace" your local files with versions on the FreeBSD box, if that's what you want.
There are places where the networks are not touching,and there are places where they are-Boeing's Lori Gunter
I used to live like that, then I took my big hard drive, slapped it into a linux box and shared it out with NFS, Samba, and NetAtalk. Now I can access all my files, which automagically get backed-up, from any machine on my LAN. Stop waiting for the 'universal' FS to show up, it'll never happen.
"Sometimes, I think Trent just needs a cup of hot chocolate and a blankie." -Tori Amos on Nine Inch Nails
syslogd wrote a panic message to /var/log/messages as the system went down, and you'll find this message (as well as the rest of its 4k block) in /etc/passwd, while your changed password file may be found at the end of /var/log/messages. This is a feature of ReiserFS, not a bug.
Care to explain that? How is corrupting one of the files essential to the normal operation of your system not a bug? The whole point of a modern filesystem is to recover cleanly; even fsck can run unattended (mostly) but what you're saying here is that after an unplanned reboot, you might or might not need a sysadmin to manually intervene.
The whole point of GConf is that lots of tiny files make it too slow to read, right? Presumably all that overhead of open/read/close syscalls is the reason. If they just made it all one big file, they would need only 4 syscalls to read in the whole config (stat/open/read/close).
If the FS went and made each config value a separate file, you would have to readdir/open/read/close a thousand times! Now imagine how many packets need to be sent when your GConf directory tree is across the WAN. The solution to this is to have a new syscall which would take all of these file requests and do them at once, returning the bits in some annoying new data structure (XML?).
Ideally, this would be implemented with the files staying the way they are, and simply having plug-ins that know how to read/write a specific format (MP3, GConf, passwd, etc.) to allow humans to play with them as necessary. This would allow me to easily write small scripts/programs to manipulate small bits of metadata without causing incompatibility with downlevel systems or massive slowdowns due to an explosion of syscalls.
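A minimal sketch of such a plug-in arrangement, using the passwd format because its layout (seven colon-separated fields) is simple. The registry `PLUGINS` and the function names are hypothetical, and a real version would need plug-ins for MP3, GConf and so on. The file on disk stays in its existing format; the plug-in only gives scripts a structured view of it.

```python
from typing import Callable, Dict, Tuple

PASSWD_FIELDS = ["name", "password", "uid", "gid", "gecos", "home", "shell"]


def parse_passwd(text: str) -> Dict[str, Dict[str, str]]:
    """Turn passwd-style text into {user: {field: value}}."""
    users = {}
    for line in text.splitlines():
        if not line:
            continue
        values = line.split(":")
        users[values[0]] = dict(zip(PASSWD_FIELDS, values))
    return users


def format_passwd(users: Dict[str, Dict[str, str]]) -> str:
    """Serialize the structure back into the original flat format."""
    lines = [":".join(user[field] for field in PASSWD_FIELDS) for user in users.values()]
    return "\n".join(lines) + "\n"


# Hypothetical plug-in registry: format name -> (reader, writer).
PLUGINS: Dict[str, Tuple[Callable, Callable]] = {
    "passwd": (parse_passwd, format_passwd),
}

original = "root:x:0:0:root:/root:/bin/sh\n"
reader, writer = PLUGINS["passwd"]
users = reader(original)
users["root"]["shell"] = "/bin/bash"
updated = writer(users)
```

The whole file is still read and written in one go, so the syscall count stays the same as for the flat file.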
aqazaqa
Maybe we should replace filesystems with persistent hashmaps that have O(1) lookup?
This way, you can add any attribute to any
object you like. See Python's hashmaps.
So for /etc/passwd access, something like:
passwdhash = disk0["passwords"]
roothash = passwdhash["root"]
rootshell = roothash["shell"]
rootgid = roothash["gid"]
For chmod of /bin:
disk0["/bin"]["perms"] = 0o755
To check for available attribs, do:
print(disk0["/bin"].keys())
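For anyone who wants to try the idea, Python's standard `shelve` module gives a persistent dictionary that is close enough for a sketch. The name `disk0` and the keys are taken from the post above; the temporary path and the extra `owner` attribute are assumptions for the demo, and how fast lookups actually are depends on the dbm backend in use, so the O(1) part is not guaranteed here.

```python
import os
import shelve
import tempfile

path = os.path.join(tempfile.mkdtemp(), "disk0")

# writeback=True lets nested dicts be modified in place and saved on close.
with shelve.open(path, writeback=True) as disk0:
    disk0["passwords"] = {"root": {"shell": "/bin/sh", "gid": 0}}
    disk0["/bin"] = {"perms": 0o755}
    disk0["/bin"]["owner"] = "root"  # add any attribute to any object

# Reopen to show the attributes survived on disk.
with shelve.open(path) as disk0:
    rootshell = disk0["passwords"]["root"]["shell"]
    attribs = sorted(disk0["/bin"].keys())
```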
Bram
Bram Stolk http://stolk.org/tlctc/
He lost me as soon as he held up GConf as an example of what was to be accomplished. Have you ever LOOKED at the "xml" files that GConf generates? Ever tried to climb the ~/.gconf (and /etc/gconf) trees? I put GConf (and anything that aspires to be like it) in the same category with the Windows Registry. GConf is, by far, the thing I like least about GNOME (and, on the whole, I like GNOME).
Why do people keep adding needless complexity to fix systems that aren't even broken? If I can't edit my configs with vi, I'd rather use something else.
I want all of the power and none of the responsibility.
Just on general principle, I prefer my data inside the file and not left with the filesystem. The MP3 metadata example, to me, is like Windows file extensions on HGH.
I'm with you -- I like self-contained file formats.
But I don't think he was proposing that you not use Ogg tags or MP3 tags; he was talking about the filesystem abstracting the tags. If you changed "Stagnation.ogg/album" to the string "Trespass", then the filesystem abstraction layer should update the Ogg "album" tag inside the file to be "Trespass".
The key benefit here is that you would not need some wacky command-line utility program to let you view and change tags on Ogg files. You could just use the shell. In bash:
for ii in *.ogg; do echo "Trespass" > "$ii/album"; done
Note that this same one-liner would work if you were in a directory with MP3 files, and you changed "*.ogg" to "*.mp3". Currently, you need to run vorbiscomment for your Ogg files, and mp3info for MP3 files. (I just checked, and sure enough, they take different arguments to do the same operations.)
Personally, I'd like to see a standard, portable XML metadata format for legacy systems. People talk about copying a file from a rich metadata filesystem and having new files like .attributes created on the target legacy system; I'd be happier if just one big XML file could be created with the same name as the original file.
Suppose you back up server //rich onto server //legacy, and then you want to restore some files from //legacy to //rich. If all the metadata was stored in a big XML file, then when you copy the file from //legacy to //rich you restore all the metadata; you wouldn't accidentally slice off attributes by forgetting to copy one or more rich attributes files.
You could do most of the fancy tricks of the rich metadata filesystem on a legacy filesystem that used the big XML file to store the rich metadata. And as long as the legacy system is just smart enough to look at the main data part of the XML and leave the metadata tags alone, you could still modify the file with sed, awk, perl or whatever, and then copy the big XML file onto your rich metadata filesystem and still not lose any rich metadata.
Note also that the big XML file could be used to deal with existing rich metadata systems, like resource forks from Macintosh filesystems, or multiple data streams from NTFS files.
steveha
lf(1): it's like ls(1) but sorts filenames by extension, tersely