Database File System
ozy writes "With all the fuss about searching and Spotlight and WinFS, check out the Database File System a completely different interface for your files, implemented in KDE. There is actually a request for developers to join a project to implement this under GNOME and leave how we use the desktop today behind."
...seems to have something more interesting: storage
...I can kind of see this would make it easy to search and locate documents. What about backups though? How would a user be able to group (manually) related files together, so that the whole bunch can be backed up later, without having to search for all seeminly related (or unrelated) keywords to trace all hitherto-unrelated documents?
Secondly, with this mass of files being spread over several disks, surely, this is in a way forcing the user to "search" for everything. Or isnt it? Will the underlying FS layer still be accessible in the general way that it is?
http://efil.blogspot.com/
Isn't this thing with DB's getting a little excessive? You're adding another layer and step to storing data which will in all likely hinder performance. I'm not sure the benefit out weight the cost.
-----
One is born into aristocracy, but mediocrity can only be achieved through hard work.
Such thing should be implemented at kernel level to be transparent for *any* aplication. Without this it will just lead to a mess (like 4 different implementations) and some apps working with it and most not. As f.e. you can browse SMB network with Nautilus but when you actually try to open a file (from SMB via Nautilus) in OpenOffice.org you will get a info that viewer does not support this method... It must be a standard system routine not another level between system and GUI.
Article doesn't address whether or not we can turn DBFS off and use the more traditional hierarchical method of file placement. Will we be dragged into this kicking and screaming?
I have always thought that version control (file histories, branching and atomic changes) would be nice to have at the file system level. Instead of storing myessay-firstdraft.doc, myessay-seconddraft.doc, myessay-final.doc, the file system should do the work. Then if I want to make a bunch of changes (perhaps I want to try a new page layout), I should be able to commit them as one atomic change (or throw them all away if I change my mind). Then, when I want to make a set of documents with US spelling, I should be able to branch the whole lot (using no disk space) and make the small changes from UK spelling while still being able to integrate other changes I have made.
How much permforance overhead will this cause? The 'Desktop Environments' already eat a lot of RAM and CPU.
How much disk space will you lose over this? All the metadata has to be stored somewhere, and just glancing over the link I read something about a versioning system, which will definently take up quite a bit of space. Will a 20gb hard drive become 15gb with DBFS?
Doesn't seem to be true. Article indicates the DBFS will run on top of other filesystems. if you don't use KDE, you don't use DBFS... ext2/3 etc is still there. The real question is whether you can use KDE without using DBFS...
Perhaps I am more organized than most but I already categorize my files and such in the hierachal file structure.
Isn't Rieserfs planning to do this on the kernel level?
Where does that leave other fs choices and storage and other idea dbfs?
I see more and more people saying look what neat things you can do with these tools.
But really why do you personally want something like this?
Curious to see the response is all.
ACK
The author is asking for help to move the project to Gnome.
Quote:There is of-course the hard choice of platform. I choose KDE? because I am familiar with QT a bit, and because it is inherently object-oriented, being C++ and all. But in my mind GNOME? is much closer to how I would like a desktop system to function. So I would like to go for the GNOME option. I leave KDE developers to do what they want with this, and I am offering them support. But I would like to focus my efforts on GNOME and implementing the above in GNOME.
Any volunteers?
In the first place we will need developers. Would you like to join, send me an email (o.gorter@student.utwente.nl) with DBFS and JOIN somewhere in the subject. If you are not a developer, but still would like to help, please revisit this page in a few weeks. There will probably be a community website by then somewhere.
The database file system originated from the ideas of an object-orientated database. Keywords and references are all part of the orientation objects of the database to index to files or other objects. It does away with the traditional hierarchal view, being rooted at some place. The OODB does not need to be rooted as it is more like a web. The DBFS seems to try to implement part of the concept of the OODB. Good. There are many more features an OODBFS can offer: dynamic organization, classification, and mutliple "skeletal" views to name a few. I hope that this DBFS will give a taste of what an OODBFS offers.
(In the course of the heated discussion about Reiser4.)
While everybody is busy making fun of WinFS, Microsoft is very quietly and successfully letting their customers install SharePoint sites all over the place.
As usual, Xerox came up with the concept years ago (DocuShare). Sigh.
there's no place like ~
Maybe we could call it a "filing" system since it indexes files that are on another file system. Really, a file system IS a database, not an add-on that indexes files. Still, perhaps this is a better approach than trying to redo all the file-system internals. Although to be truly useful, this needs to be an API that is GUI-independent, with GUI-bindings as needed.
Stop letting good ideas be victimized by the M$ marketing FUD machine. Someone said "database filesystem" That was a good idea. M$ can along and said gee lets steal that idea. Hey there is no existing implementation to copy how do we do it. The answer is they did not do it. They put a database to keep rather mundane information on top of NTFS(a real filesystem) and called it WINFS(NTFS and a almost completely unrealated database, not a file system). Keeping data on the relations ships between files is a nice idea. Putting it in user space is dumb. Its overhead. Look at most of the example he gave "find a word document I worked on last month". All that info is already in the filesystem. A filesystem really is already a database in the strictest since. It stores whats on which inode assoicated with how many blocks which you could think of as attributes. It also sores attibutes like permissions and dates. Why not just put some more attributes into it like subject and relatedtopicID . If you did that and then added the ability to maintain some other tables where you could put extended descriptions and stuff, and built up the query engine to be able to efficently solve queries users will likely ask then you'd have what your really looking for. Addionally you would lose the overhead to a degree because you'd be storing informaiton once instead of in the FS and in the database.
Repeal the 17th Amendment TODAY! Also Please Read http://www.gnu.org/philosophy/right-to-read.html
Normally I file things in a hierarchial method by year and month and by project name (2004file/9sep/) or (2004file/workfile/projectname), but still I lose track now and then and need keywords. Change the "slash" slant to fit your OS.
Have a look at this for a userspace filesystem http://www.inf.bme.hu/~mszeredi/avfs/
/etc/apt/sources.list
There are a debian package called fuse-source and fuse-utils. I.e. use all these nice kio plugins on filesystem level.
Just add
deb http://www.kalyxo.org/debian/ unstable main
deb-src http://www.kalyxo.org/debian/ unstable main
to your
apt-get update
apt-get fuse-source fuse-utils
Actually, I didn't test that yet. Someone else?
Adding a database layer to Gnome sounds like using another 300 megabytes of storage on my hard drive. I simply do not need the database.
If the FSF/GNU folks really want to do something revolutionary, they should fork Linux+Gnome into 2 distinct paths: minimalist and maximalist. The maximalist is what we have now. The minimalist is a minimally featured Linus+Gnome distribution. It is the bare minimum in functionality that we need to have a decent operating system and desktop.
Into this minimalist installation, I will then add the applications (e.g. MatLab) that I use daily.
Keeping the traditional file system is only logical as this database sort of file access is in fact a higher level of abstraction.
.....
Considering there are numerious project of such higher level of file access abstraction going on, it does become a secondary choice for the user if they want to use one of these higher abstraction level file access systems.
To remove the traditional file system altogether would be a mistake, as then it could become a system of babel or keywords.... "what was I thinking when I created that keyword and lets not even get into what crazy joe was thining when he came up with his keywords...
But hey, given how MS based developers would create some obscure dll name and place it in some obscure location in order to copy protect
higher level abstractions are useful only to the point that you can, if need be, drop down in abstraction level to get your bearings as to where you are. If you cannot touch physical reality then how do you know you are not floating around aimlessly?
being out of touch with physical reality can evenm be very dangerious and hard to correct.
standard filesystems only have ONE index - a hierarchical one that contains a certain amount of real-time-updated indexing (such as the timestamps on a directory)
but it is NOT a relational database: you CANNOT easily create or use an alternative index to your files.
that's what all the fuss is about.
some people mentioned here that they already organise their files. great. fantastic.
HOW LONG DID IT TAKE YOU?
and how long would it take to reorganise?
with a relational database, all your indexes are updated AUTOMATICALLY.
therefore, doing searches on a relational database filesystem (find me all music files with dates between last week and last month: SELECT * from files WHERE files.type = "music" and files.date NOW() - 7days
you _can't_ do that sort of thing on a traditional filesystem.
sure, you can emulate it by creating symbolic links all over the place, but what happens when a file is deleted or moved? you need to manage / relocate the symbolic links...
nah.
databases.
fantastic idea.
now can we have them as a kernel module, pleeease?
At a deeper technical level, nany of the questions asked here have historical answers or clues in The Design and Implementation of the Inversions File System. The abstract reads:
Note that this paper was published in early 1993. Many of the issues it addresses are relevant to DBFS, and many of DBFS's advantages are foreseen by that paper. IMHO DBFS has chosen a direction that should have better performance than inversion, not to mention lower risk and easier failure recovery.Inversion was built on POSTGRES, which makes one wonder what happened to the source.
Sorry, but I have to say it, you are an idiot!
...
Did you even care to RTFA?
This has nothing to do with the gnome or kde devs. Some developer invested his time to come up with something he thought was useful and all you can do is complain?
And if you look at the project, it is something completley different then a plugin for the admittedly great ReiserFS4 and it is here and usable right now.
So friggin stop your stupid whinig.
And to the mods who modded parent interesting
Btw., why don't you whine to the Reiser people that they should stop developing now? It would be as justified as your whining aboutt this project.
A db fs with rich searching of metadata requires the orderly and accurate entry of said metadata.
You can't get organisation out of nothing - you are just asking people to be organised in a slightly different manner.
An organised person can already work effectively in a filesystem with current tools. The fact that they are organised is the key.
A disorganised person is not helped as their metadata will be erratic or absent. In fact might they now have the capability to be even more disorganised?
As I see it this is not solving an organisational problem as much as shifting the interface to the problem. I do not believe it to be either an easier or better way to organise personal data. Conversely I do not believe it to be inherently worse either. It's just different.
Where's the gain?
For those of you that have not yet looked at the Mac OS X (Tiger) preview and WWDC web cast, the new spotlight technology built into the next version of Mac OS X looks very much like a fully integrated database file system. And it's incredibly fast. Go check it out!. Note: QuickTime required. Mplayer may work for us Linux heads but I haven't tried it.
-- DuckWing
People can offer their opinions for or against this, but I think that any innovation benefits linux. I've read about WinFS and it sounds like a good idea, but who knows when it will be ready. If people working in their spare time can get something like this working in linux before Microsoft can get it out, I think that would just be another reason to trust the open source model of developing code and squash Ballmer's FUD.
I don't have too much trouble using a hierarchy file system. I keep my stuff pretty organized, but computers are supposed to save time, not create more problems. If this database can do a good job, I'll give it a shot.
This basically seems like giving find the capability to do file and then storing the results with locate. And add some time stamping, sorting capabilities and GUI.
Since all but the GUI are basic commands, it would seem sensible to have an underlying library with hooks for use by your choice of desktop manager.
To-do List: Receive telemarketing call during a tornado warning. Check.
I wouldn't normally jump in except this is modded +4, even though the poster doesn't appear to have read the article.
The article doesn't talk about an actual "file system" (it admits its a bit of a misnomer itself), merely a way of referencing files with "keywords" in database that, according to the author, will make it easier to find your files. The author has written a file browser and file selector dialogs for KDE that use this, then goes on to say that he's planning to make a GNOME implementation soon, but nowhere does anyone start any name calling.
Since this article essentially discusses a GUI and not a file system (which could be implemented at kernel level), it would be a little silly to use the same tool for GNOME and KDE, since they have different look-and-feel. That said, a common library of functions needed by both the GNOME and KDE versions would obviously be sensible, and would speed up porting to other desktops.
Furthermore, this idea claims to work on top of existing heirarchical file systems (removing the idea of a hierarchical FS completely would involve a major restructuring of the whole operating system and everything that works on it), so just saying that it should use reiser 4, I suppose because its just l33t, is a little redundant. This should work on top of just about any FS, ext2, reiser, or even an NFS mount.
Whether or not you look at system files every day probably depends on what you are doing with your machine and what you consider "system files" to be. Moreover, this idea would seem to go entirely against the whole UNIX "everything is a file" philosophy which is supposed to be one of the great strengths of UNIX.
flossie
Write now. Defend liberty
I'm using Subversion for a project and the idea of Atomic Commits seems like an obvious direction for file systems. If that other slashdot story is correct, storage becomes less of an issue and it would be possible to roll back the system to any point in time or to only roll back one file if need be. Now throw an intuitive way to navigate files on top of that and you've got a sure winner.
In the grand scheme of things, only a very small handful of us on earth are aware of Linux or even know what an Operating System is for that matter. File systems seem to be the big stumbling block for new users. Anything that can make computers and therefore access to information easier for the coming waves of new computer users (maybe billions of people?) will be a good thing. Even if the "bloat" slows down the system by 10%.
I hate to preach but that old quote comes to mind "With great power comes great responsibility". I don't think most of the people working on the OS that will soon dominate in developing nations (that's Linux) are aware of the harm they can do by slowing down Linux development with petty personal disputes. Like it or not, Linux is no longer just an edgy hacker tool. It has the potential to change the lives of Billions of people.
What if Digg added local news and a Slashdot inspired comment karma system? ---
http://houndwire.com
Search for: Documents
Modified since: last backup
I don't know about the other implementations but Searchlight and WinFS are implemented atop the existing filesystem. (The FS in WinFS supposedly stands for "Future Storage" and not "File System") Sort of like how Google is implemented "on top" of the regular hypertext-linked internet.
Take a look at an AS/400 (iSeries). They've been around for more than 10 years. And before that you had the System 38 and System 36... and so on..... Why is this some big revelation now?
-Cnik
This did mean that you had to use the right tools to get into the files or you had to cope with the changes in programs that worked on them.
VMS also had file revision numbers on files as a couple of posters have noted.
Both of these were nice in some ways, but relatively difficult to deal with in other ways. By comparison unix is straight-jacketed but easy to use.
What about backups though? How would a user be able to group (manually) related files together, so that the whole bunch can be backed up later, without having to search for all seeminly related (or unrelated) keywords to trace all hitherto-unrelated documents?
Defining "group" is one of the problems with hierarchical systems. What is the group you want is dependent on your needs at the moment. Relational makes it easier to create ad-hoc groups.
For example, do you want to backup by file age, file type, and/or by topic? And in the real world topics naturally overlap. Set theory has an easier time with this because it is meant to deal with overlaps; but hierarchies get messy beyond about 3 orthogonal factors. Relational is closer to set theory than trees are.
A drawback of sets is that most people are not familiar with non-tree arrangements. There will probably be a "training hump".
More about sets versus trees.
Table-ized A.I.
Before inventing something you should check if no one did this earlier. Because there you have GNOME Storage. Don't be fooled by screenshots there. Storage isn't only cool search facility with native language parser ("computer, find me all porn I've downloaded yesterday" anyone?).
Storage is, suprisingly, method to store files decomposed to contents. The great searching ability is a side effect.
Imagine collaborating in of group of people over one document. Every one got some paragraphs to edit. With Storage, everyone can edit this document in the same time, seeing other's changes as letters are typed. Store version history and you have revision control. Throw in network transparency (you go to other department, connect laptop and automagically you can work on those department files) with OpenTalk (Zeroconf/Rendezvous) and you got best idea since hierarhical directories.
Be sure to read whitepaper about Storage available on mentioned site. Also check for Storage related entries in Seth's blog (Seth is one architect of GNOME Storage). Now if only KDE people work on compatibility with Storage, freenix desktop would rule the world.
BTW, KDE, don't miss chance of integration! KDE is planning to introduce google-like search in desktop. Don't reinvent wheel! Beagle is here, working. Just integrate Beagle with KDE desktop and we are set.
:wq
Joel on Software said it best:
For example, WinFS, advertised as a way to make searching work by making the file system be a relational database, ignores the fact that the real way to make searching work is by making searching work. Don't make me type metadata for all my files that I can search using a query language. Just do me a favor and search the damned hard drive, quickly, for the string I typed, using full-text indexes and other technologies that were boring in 1973.
Laws do not persuade just because they threaten. --Seneca
I hate GNOME-VFS and KIOSlaves and all that stuff that people try to roll into a higher layer than they should. The idea of combining VFS functionality and a *desktop environment* is *stupid* and exactly the sort of fragmentation that hurts everyone. Want a userspace VFS layer (like, other than the existing transparent systems like LUFS and FUSE, which are much better if you can use them, because existing apps continue to work) Fine. Make one. Call it libvfs, and make KDE and GNOME bindings. But for the love of God, quit trying to roll filesystem functionality into desktop environments. It's ridiculous, and not where it belongs.
That being said, I'm still not a huge fan of this.
There are three main features that people seem to want with a DB-based FS:
* Transaction/views/high-level-DB functionality.
You don't want a FS that tries to do all this. DBMSes have worked for a long time to do this efficiently. If you want this, use a real DBMS, because it's going to smoke what someone tried to hack up into a filesystem.
* Automated index updates. Basically, locate but atomatically updated as changes occur.
Mac OS Classic had synchronous index updates as the filesystem ran. This makes the filesystem slow. It sucks for servers.
It's a much better idea to do asynchronous index updates, where the index approximates the filesystem quickly, but perhaps not right away -- you can do that without killing performance. Basically, you have the kernel notify a userspace daemon when changes occur, which rebuilds the part of the directory hierarchy (possibly waiting a while to see if it can use idle time). If a ton of changes occur to a small portion of the directory hierarchy, instead of trying to keep up with each change (say, a million changes to part of the directory hierarchy), the update daemon rolls all those queued up updates into a singled queued up full update of that entire portion of the directory hierarchy. It does a bit of extra indexing, but doesn't get backlogged.
This *could* be done under Linux, but there is one bit of kernel support that is missing -- currently, it is unusably inefficient for an app using Linux's existing dnotify() directory monitoring mechanism to deal with (a) changes to a directory containing many entries and (b) changes to directories anywhere on a filesystem (currently, dnotify() requires a FD for each directory to be monitored).
There *is* a enhanced dnotify patch out that would improve Linux's dnotify() mechanism to the point where people can write daemons with this, but Linus has not yet rolled it in. Once he does so, we can get the ball rolling on such daemons.
* Indexing of metadata.
People want to be able to search for their files using metadata. Again, this can just as easily be done using such a daemon. Existing apps continue to work, people can choose where to put something in a unique hirarchy so that they can reach it again, but files can also be addressed via metadata. If you want to provide, say, a tabbed file selector containing a tab for selecting files via metadata, that's quite feasible.
A major reasons why you don't want to provide a full DB-based interface is that you lose the existing hierarchical representation, and every app in the world stops working. You don't want people to just "save this file to a filesystem" and have to address it via metadata -- you want the user to give the thing some kind of unique identifier.
May we never see th