Database File System

Backups, and being organized in a general way? by manavendra · 2004-09-06 03:19 · Score: 5, Interesting

...I can kind of see this would make it easy to search and locate documents. What about backups though? How would a user be able to group (manually) related files together, so that the whole bunch can be backed up later, without having to search for all seemingly related (or unrelated) keywords to trace all hitherto-unrelated documents?

Secondly, with this mass of files being spread over several disks, surely this is in a way forcing the user to "search" for everything. Or isn't it? Will the underlying FS layer still be accessible in the general way that it is?

--
http://efil.blogspot.com/

Re:Backups, and being organized in a general way? by manavendra · 2004-09-06 03:37 · Score: 4, Funny

Well, you may find it funny, I've already messed up two pair of trousers - once when my phone rang and second when my friend put his hand on my shoulder to get my attention...

--
http://efil.blogspot.com/

Re:Backups, and being organized in a general way? by Aeiri · 2004-09-06 03:46 · Score: 5, Funny

You should have posted that as Anonymous Coward...

"Implementing in GNOME" by kosmosik · 2004-09-06 03:19 · Score: 5, Insightful

Such a thing should be implemented at kernel level to be transparent for *any* application. Without this it will just lead to a mess (like 4 different implementations), with some apps working with it and most not. For example, you can browse an SMB network with Nautilus, but when you actually try to open a file (from SMB via Nautilus) in OpenOffice.org you will get a message that the viewer does not support this method... It must be a standard system routine, not another level between system and GUI.

Version control would be nice as well by zero-one · 2004-09-06 03:21 · Score: 5, Interesting

I have always thought that version control (file histories, branching and atomic changes) would be nice to have at the file system level. Instead of storing myessay-firstdraft.doc, myessay-seconddraft.doc, myessay-final.doc, the file system should do the work. Then if I want to make a bunch of changes (perhaps I want to try a new page layout), I should be able to commit them as one atomic change (or throw them all away if I change my mind). Then, when I want to make a set of documents with US spelling, I should be able to branch the whole lot (using no disk space) and make the small changes from UK spelling while still being able to integrate other changes I have made.
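As a rough illustration of the three features named above (file histories, atomic multi-file changes, and branches that cost no extra space to create), here is a minimal in-memory sketch. The `VersionedStore` class and all its method names are hypothetical and do not correspond to any real file system or to DBFS; merging changes between branches is deliberately left out.

```python
class VersionedStore:
    """Toy in-memory store: every commit is an immutable snapshot {path: content}."""

    def __init__(self):
        self.commits = [{}]          # commit 0 is the empty snapshot
        self.parents = [None]        # parent commit id of each commit
        self.branches = {"main": 0}  # branch name -> commit id

    def commit(self, branch, changes):
        """Apply several file changes as one atomic step; a value of None deletes."""
        base = self.branches[branch]
        snapshot = dict(self.commits[base])  # shallow copy, unchanged contents are shared
        for path, content in changes.items():
            if content is None:
                snapshot.pop(path, None)
            else:
                snapshot[path] = content
        self.commits.append(snapshot)
        self.parents.append(base)
        self.branches[branch] = len(self.commits) - 1
        return self.branches[branch]

    def branch(self, new_name, from_branch):
        """Creating a branch only records a pointer; no file data is copied."""
        self.branches[new_name] = self.branches[from_branch]

    def read(self, branch, path):
        return self.commits[self.branches[branch]].get(path)

    def history(self, branch, path):
        """Return the distinct contents of one file on a branch, oldest first."""
        versions = []
        commit_id = self.branches[branch]
        while commit_id is not None:
            versions.append(self.commits[commit_id].get(path))
            commit_id = self.parents[commit_id]
        versions.reverse()
        distinct = []
        for content in versions:
            if content is not None and (not distinct or distinct[-1] != content):
                distinct.append(content)
        return distinct


store = VersionedStore()
store.commit("main", {"myessay.doc": "first draft, colour"})
store.commit("main", {"myessay.doc": "second draft, colour"})
store.branch("us-spelling", "main")  # no data copied here
store.commit("us-spelling", {"myessay.doc": "second draft, color"})
store.commit("main", {"notes.txt": "new page layout ideas"})
```

One file name replaces the myessay-firstdraft/seconddraft/final naming scheme, and the older drafts stay reachable through `history`. A real implementation would also need a merge step to carry later changes from one branch into the other.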

Re:Version control would be nice as well by ultrabot · 2004-09-06 03:27 · Score: 4, Interesting

I have always thought that version control (file histories, branching and atomic changes) would be nice to have at the file system level.

Sounds like a job for an SVN plugin for Reiser4 file system. Anyone doing one already?

--
Save your wrists today - switch to Dvorak

Disadvantages by BHearsum · 2004-09-06 03:21 · Score: 4, Insightful

How much performance overhead will this cause? The 'Desktop Environments' already eat a lot of RAM and CPU.
How much disk space will you lose over this? All the metadata has to be stored somewhere, and just glancing over the link I read something about a versioning system, which will definitely take up quite a bit of space. Will a 20 GB hard drive become 15 GB with DBFS?

Re:Disadvantages by aodl · 2004-09-06 04:03 · Score: 4, Insightful

While performance is something that should always be kept in mind, we are a long way away from the days of the original Macintosh where a desk accessory had to weigh in at 600 bytes in order to make the cut and fit into both memory and on a floppy disk. As current desktop machines outperform the high end servers of a few years ago, it would be nice to put a lot of that muscle to use in improving the user experience. I'm not excusing bloated and slow code here, but we don't really need to be counting bytes.

In any case, database-based operating systems have been around for decades, from OS/400 to the BeOS. Many BeOS users claimed it was hands down faster than any other shipping OS at the time, and it featured a journaling, database-styled file system. One of the primary developers of that file system is now working at Apple on Mac OS X 10.4's Spotlight functionality.

The thing is - as our desktop storage continues to grow at the pace that it does, and as we curiously find ways to fill it up, new ways of looking at and finding the information we store are going to be needed.

DBFS, Gnome Storage, Apple's Spotlight, and WinFS all take different routes to get there. It's worth looking at all of them, what they offer, and where they differ.

WinFS is a new storage layer that combines file system resources with more structured data in a relational/XML hybrid system, with the aim (from what I gather) of turning the file system into a global "soup" of data. That sort of soup can be seen in office suites or PDA-style applications, and in older operating systems like the Newton OS, where everything is a shared resource that is stored and available through common structures.

Spotlight, on the other hand, combines file system searches and indexes (think 'locate') with full content indexes and a metadata index, which uses 'importers' to parse out other file formats. Spotlight is not a new file system, but an indexing system that acts on files in the file system. From what I remember of Gnome Storage, it is similar, using the VFS layer and Postgres triggers and callbacks, along with plug-ins, to parse and extract relevant metadata and contents out of files. DBFS looks to be like WinFS in that it purely wants to be a new kind of information store.

I don't know which style will win out. My theory is that technologies like Spotlight will eventually evolve into a new kind of storage system, while remaining familiar and file based for today's users and developers. But this is an idea whose time has more than come. It's something that's been promised for the desktop for at least a decade, and has been shown to work, albeit in targeted OSes (the Newton) or ones that never achieved mass market penetration (BeOS).

So I think that performance isn't that big of a concern, so long as (like all development) there are good people working on the solution.

Reiserfs, storage and why do you want this? by ACK!! · 2004-09-06 03:25 · Score: 4, Interesting

Perhaps I am more organized than most, but I already categorize my files and such in the hierarchical file structure.

Isn't Reiserfs planning to do this at the kernel level?

Where does that leave other FS choices, storage, and other ideas like DBFS?

I see more and more people saying look what neat things you can do with these tools.

But really why do you personally want something like this?

Curious to see the response is all.

--
ACK /ak/ interj. 2. [from the comic strip "Bloom County"] An exclamation of surprised disgust, esp. i

Re:Reiserfs, storage and why do you want this? by ctr2sprt · 2004-09-06 05:18 · Score: 4, Interesting

OK, let's consider an example other than documents. Joe is a big music fan with a couple hundred CDs. He likes having instant access to any one song when he wants it and has a lot of hard drive space, so he's MP3-ized (or ogg-ized or flac-ized or whatever) his entire collection, plus all the MP3s he's acquired via other means.
Joe is pretty good about organization, so he names every MP3 properly with the group, album, and track names, plus the track number. (Something like The Beatles/White Album/01-Back in the USSR.mp3.) This way if he knows, for example, the track name but not what album it's on, he can find it pretty quickly using a method like yours.
The problem is if he wants to do something like put all the country music he has, for example, in a playlist. How does he do that? It can be done, certainly, but if he has a collection with several thousand MP3s it's so tedious and difficult as to be effectively impossible. What if he wants to listen to 60s rock? What if he wants to find a particular song he has, but all he knows is it's between 3 and 5 minutes long, came out between 2002 and 2003, and is probably categorized as either "pop" or "alternative?" What if he just wants a list of all the songs he never listens to because he's sick of what he's been playing lately? Or maybe he needs to free up disk space and wants to find out what he'll miss the least.
All these things are impossible to do in an efficient and timely manner using our current system. He can certainly use a command-line ID3 tagger to strip out the things he cares about, something like
find mp3 -type f -print0 | xargs -0 id3tag -l | grep 'Genre: Country'
but that's painfully slow: a second or two per file means a big collection will take 15 minutes or longer to scan, and if you typed "Gerne" by mistake you have to do it all over.
Now if you had a filesystem-like object which could be smart and store ID3 metadata in the filesystem, then it would be much faster: the main overhead to using the find/xargs/id3tag/grep approach described above comes from having to seek through the MP3 file to get at the metadata. The reason this needs to be a "core OS component," perhaps even part of the kernel, is that MP3 tags can change at whim and the filesystem needs to know about it or its metadata can get out of sync. It's possible, but impractical, to update this on a periodic basis, like the locatedb; it makes much more sense to have the kernel inform some plugin "Hey, this file just changed, do you care?" And if the plugin does care, it can look at the changes, see if it's affected, and possibly update the metadata to match. It could also go the other way, where Joe updates the filesystem metadata and the OS knows to update the MP3's ID3 tags too.
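To make the idea concrete, here is a small sketch of a metadata index kept up to date by a change notification, using Python's standard `sqlite3` module. Everything in it is an assumption for illustration: the `on_file_changed` hook stands in for the "hey, this file just changed" callback described above, the tag values are passed in as a plain dict instead of being parsed from real ID3 data, and the paths are made up.

```python
import sqlite3


def open_index():
    """Create an in-memory index; a real system would persist this."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE tracks ("
        " path TEXT PRIMARY KEY, genre TEXT, year INTEGER, seconds INTEGER)"
    )
    return db


def on_file_changed(db, path, tags):
    """Hypothetical hook, called whenever a file's tags are written.

    The row for the file is replaced, so the index never goes stale
    and no periodic rescan of the whole collection is needed.
    """
    db.execute(
        "INSERT OR REPLACE INTO tracks (path, genre, year, seconds)"
        " VALUES (?, ?, ?, ?)",
        (path, tags["genre"], tags["year"], tags["seconds"]),
    )


def find_tracks(db, genres, min_seconds, max_seconds, first_year, last_year):
    """Answer 'pop or alternative, 3 to 5 minutes, 2002 to 2003' from the index alone."""
    placeholders = ", ".join("?" for _ in genres)
    rows = db.execute(
        f"SELECT path FROM tracks WHERE genre IN ({placeholders})"
        " AND seconds BETWEEN ? AND ? AND year BETWEEN ? AND ?"
        " ORDER BY path",
        (*genres, min_seconds, max_seconds, first_year, last_year),
    )
    return [row[0] for row in rows]


db = open_index()
on_file_changed(db, "Artist A/Album/01-Song.mp3",
                {"genre": "Pop", "year": 2002, "seconds": 215})
on_file_changed(db, "Artist B/Album/03-Tune.mp3",
                {"genre": "Country", "year": 1998, "seconds": 190})
on_file_changed(db, "Artist C/Album/07-Long.mp3",
                {"genre": "Alternative", "year": 2003, "seconds": 420})
# Joe fixes a wrong tag; the hook fires again and the old row is replaced.
on_file_changed(db, "Artist B/Album/03-Tune.mp3",
                {"genre": "Alternative", "year": 2003, "seconds": 190})

matches = find_tracks(db, ["Pop", "Alternative"], 180, 300, 2002, 2003)
```

The query never opens an MP3 file, which is where the find/xargs/id3tag/grep pipeline spends its time. A typo in the genre name costs one more index lookup, not another full scan.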

Comments by the Author by data1 · 2004-09-06 03:28 · Score: 5, Interesting

The author is asking for help to move the project to Gnome.

Quote: There is of-course the hard choice of platform. I choose KDE because I am familiar with QT a bit, and because it is inherently object-oriented, being C++ and all. But in my mind GNOME is much closer to how I would like a desktop system to function. So I would like to go for the GNOME option. I leave KDE developers to do what they want with this, and I am offering them support. But I would like to focus my efforts on GNOME and implementing the above in GNOME.

Any volunteers?
In the first place we will need developers. Would you like to join, send me an email (o.gorter@student.utwente.nl) with DBFS and JOIN somewhere in the subject. If you are not a developer, but still would like to help, please revisit this page in a few weeks. There will probably be a community website by then somewhere.

Re:Performance? by Anonymous Coward · 2004-09-06 03:31 · Score: 5, Insightful

Depends on what you are using your computer for of course.

You can say the same thing for a GUI, and its correct for certain applications of computers, but wrong in others.

Re:Performance? by psavo · 2004-09-06 03:34 · Score: 4, Insightful

Isn't this thing with DBs getting a little excessive? You're adding another layer and step to storing data which will in all likelihood hinder performance. I'm not sure the benefits outweigh the cost.

Well, if it's only a name-translation thingy, then it shouldn't affect performance of file reading (when operating on sufficiently big files), only file opening/stat:ing.

--
fucktard is a tenderhearted description

Sigh, Andrew Morton seemed to be right... by greppling · 2004-09-06 03:37 · Score: 5, Interesting

...when he said on LKML, slightly paraphrased: "The only reason I see to put filesystem semantic enhancements into the kernel is that it would be socially hard to get people to agree on a single userspace library."

(In the course of the heated discussion about Reiser4.)

This is not a file system by MobyDisk · 2004-09-06 03:40 · Score: 4, Insightful

Maybe we could call it a "filing" system since it indexes files that are on another file system. Really, a file system IS a database, not an add-on that indexes files. Still, perhaps this is a better approach than trying to redo all the file-system internals. Although to be truly useful, this needs to be an API that is GUI-independent, with GUI-bindings as needed.

This looks like BLINKX plus more by lcsjk · 2004-09-06 03:43 · Score: 4, Interesting

I already use blinkx, beta, from http://www.blinkx.com/, to automatically search my files along with internet keywords. It doesn't have the search by date or extension and is not configurable to my liking, but it seems to do a good job of finding things I have misplaced. Integrated with the author's system, this could make a great search system.

Normally I file things in a hierarchical method by year and month and by project name (2004file/9sep/) or (2004file/workfile/projectname), but still I lose track now and then and need keywords. Change the "slash" slant to fit your OS.

Re:Performance? by kosmosik · 2004-09-06 03:48 · Score: 4, Informative

Nobody is suggesting using such a database FS for the entire system. Only for specific data (e.g. user documents) - not the entire system (binaries, libraries etc.) where such performance matters. In fact it will improve performance, since right now applications that need such indexing (the best examples are apps for organizing music (like iTunes) or digital picture collections (like Adobe Photo Album or Google Picasa)) do it themselves, which probably is not the fastest way and is not unified across the system. For *some* applications, a view on files that lets you query for specific files/objects and operate on query results as if they were directories of files has much benefit. But it is only for organizing data, and in a limited scope (as I've said - digital music, photography, probably some other fields). I don't really believe that this would speed up searching for office documents over a LAN or something - when somebody's documents are a mess, DB-FS won't change anything, as the documents probably lack metadata and proper naming anyway.

Re:Performance? by BenjyD · 2004-09-06 03:54 · Score: 4, Insightful

Why not just run in console mode? All this GUI stuff is just getting in the way of absolute performance.

If it adds 0.5 seconds to every time you save a file, but saves you 20 seconds of filesystem navigating every time you open the file, that's a worthwhile tradeoff. Add to that the fact that computers don't get tired or bored, while humans do, and it makes even more sense to shift as much of the burden of working onto the computer as is practical.
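The tradeoff can be checked with a few lines of arithmetic. The two constants below are the illustrative figures from the comment above, not measurements, and the function name is made up for this sketch.

```python
SAVE_OVERHEAD_S = 0.5  # extra seconds per save (figure from the comment above)
OPEN_SAVING_S = 20.0   # seconds of navigation avoided per open (same source)


def net_seconds_saved(saves, opens):
    """Positive result: the indexing layer saved time overall."""
    return opens * OPEN_SAVING_S - saves * SAVE_OVERHEAD_S


# Number of saves per open at which the overhead cancels the benefit.
break_even_saves_per_open = OPEN_SAVING_S / SAVE_OVERHEAD_S

# Example: a file saved 10 times and later located once.
example = net_seconds_saved(saves=10, opens=1)
```

With these figures a file would have to be saved 40 times for every time it is looked up before the overhead outweighs the time saved.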

Re:Performance? by jgardn · 2004-09-06 03:59 · Score: 4, Insightful

Not necessarily. Consider the performance of finding a document you wrote two years ago. How long does it take you to walk through the directory hierarchy browsing file names? How fast is the file search tool? Wouldn't it be faster if you could say "Show me the documents I wrote two years ago" and then refine the search or browse the results?

Storing data in a relational database is natural because it is more like the way we store data in our minds than the hierarchical structures of traditional file systems.

Also, we allow a complete abstraction of the underlying database in relational systems. The database can store the data however it sees fit, and can arrange the data on disk without the users noticing a change.

I look forward to experimenting with a relational filesystem. I think it would be a wonderful thing to try out and see if it actually has the advantages I outlined above. I'd also like to see the actual disadvantages.

--
The radical sect of Islam would either see you dead or "reverted" to Islam.

Read the article, read some history by eschasi · 2004-09-06 04:00 · Score: 4, Interesting

To quote from the article (which most folks have not read, as usual):

The DBFS does not actually store files, it holds references to files on the underlying hierarchy based file system.

That line alone should answer many of the questions re backup, speed of FS performance, etc.
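A sketch of why a reference-only design keeps backups simple: if the database holds only paths into the ordinary file system, a "group" of related files is just a query result that can be handed to any existing backup tool. The table layout, function names, and paths below are all hypothetical and are not taken from the DBFS source.

```python
import sqlite3


def create_catalog():
    """The catalog stores references (paths) and keywords, never file contents."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE refs ("
        " path TEXT NOT NULL, keyword TEXT NOT NULL,"
        " UNIQUE (path, keyword))"
    )
    return db


def tag(db, path, keyword):
    """Attach a keyword to a file that lives on the underlying file system."""
    db.execute("INSERT OR IGNORE INTO refs (path, keyword) VALUES (?, ?)",
               (path, keyword))


def backup_list(db, keyword):
    """Return the real paths for one manual grouping, ready for tar, rsync, etc."""
    rows = db.execute(
        "SELECT path FROM refs WHERE keyword = ? ORDER BY path", (keyword,)
    )
    return [row[0] for row in rows]


catalog = create_catalog()
tag(catalog, "/home/joe/docs/thesis.tex", "thesis")
tag(catalog, "/home/joe/pics/figure1.png", "thesis")
tag(catalog, "/home/joe/pics/figure1.png", "photos")
tag(catalog, "/home/joe/music/song.ogg", "music")
tag(catalog, "/home/joe/docs/thesis.tex", "thesis")  # duplicate tag is ignored

to_back_up = backup_list(catalog, "thesis")
```

Because the files themselves stay where they are, the hierarchy remains usable in the normal way and raw file I/O does not pass through the database at all.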

At a deeper technical level, many of the questions asked here have historical answers or clues in The Design and Implementation of the Inversion File System. The abstract reads:

This paper describes the design, implementation, and performance of the Inversion file system. Inversion provides a rich set of services to file system users, and manages a large tertiary data store. Inversion is built on top of the POSTGRES database system, and takes advantage of low-level DBMS services to provide transaction protection, fine-grained time travel, and fast crash recovery for user files and file system metadata. Inversion gets between 30% and 80% of the throughput of ULTRIX NFS backed by a non-volatile RAM cache. In addition, Inversion allows users to provide code for execution directly in the file system manager, yielding performance as much as seven times better than that of ULTRIX NFS.

Note that this paper was published in early 1993. Many of the issues it addresses are relevant to DBFS, and many of DBFS's advantages are foreseen by that paper. IMHO DBFS has chosen a direction that should have better performance than Inversion, not to mention lower risk and easier failure recovery.

Inversion was built on POSTGRES, which makes one wonder what happened to the source.

Apple's Spotlight by DuckWing · 2004-09-06 04:19 · Score: 4, Informative

For those of you who have not yet looked at the Mac OS X (Tiger) preview and WWDC web cast, the new Spotlight technology built into the next version of Mac OS X looks very much like a fully integrated database file system. And it's incredibly fast. Go check it out! Note: QuickTime required. Mplayer may work for us Linux heads but I haven't tried it.

--
-- DuckWing

Not Unix by flossie · 2004-09-06 05:20 · Score: 5, Interesting

the DBFS does not store system files: No shared libraries, no font files or others like that. These are not documents, not files you look up at a day to day basis, and have no place in a file system.

Whether or not you look at system files every day probably depends on what you are doing with your machine and what you consider "system files" to be. Moreover, this idea would seem to go entirely against the whole UNIX "everything is a file" philosophy which is supposed to be one of the great strengths of UNIX.

--
flossie

Write now. Defend liberty

SVN + DBFS + 10TB Nano Hoozie by KrackHouse · 2004-09-06 05:33 · Score: 4, Interesting

I'm using Subversion for a project and the idea of Atomic Commits seems like an obvious direction for file systems. If that other slashdot story is correct, storage becomes less of an issue and it would be possible to roll back the system to any point in time or to only roll back one file if need be. Now throw an intuitive way to navigate files on top of that and you've got a sure winner.

In the grand scheme of things, only a very small handful of us on earth are aware of Linux or even know what an Operating System is for that matter. File systems seem to be the big stumbling block for new users. Anything that can make computers and therefore access to information easier for the coming waves of new computer users (maybe billions of people?) will be a good thing. Even if the "bloat" slows down the system by 10%.

I hate to preach but that old quote comes to mind "With great power comes great responsibility". I don't think most of the people working on the OS that will soon dominate in developing nations (that's Linux) are aware of the harm they can do by slowing down Linux development with petty personal disputes. Like it or not, Linux is no longer just an edgy hacker tool. It has the potential to change the lives of Billions of people.

--
What if Digg added local news and a Slashdot inspired comment karma system? ---
http://houndwire.com
