Database File System
ozy writes "With all the fuss about searching and Spotlight and WinFS, check out the Database File System, a completely different interface for your files, implemented in KDE. There is actually a request for developers to join a project to implement this under GNOME and leave how we use the desktop today behind."
...seems to have something more interesting: storage
...I can kind of see this would make it easy to search and locate documents. What about backups though? How would a user be able to group (manually) related files together, so that the whole bunch can be backed up later, without having to search for all seemingly related (or unrelated) keywords to trace all hitherto-unrelated documents?
Secondly, with this mass of files being spread over several disks, surely this is in a way forcing the user to "search" for everything. Or isn't it? Will the underlying FS layer still be accessible in the general way that it is?
http://efil.blogspot.com/
Isn't this thing with DBs getting a little excessive? You're adding another layer and step to storing data, which will in all likelihood hinder performance. I'm not sure the benefits outweigh the cost.
-----
One is born into aristocracy, but mediocrity can only be achieved through hard work.
Such a thing should be implemented at kernel level to be transparent for *any* application. Without this it will just lead to a mess (like 4 different implementations), with some apps working with it and most not. For example, you can browse an SMB network with Nautilus, but when you actually try to open a file (from SMB via Nautilus) in OpenOffice.org you will get a message that the viewer does not support this method... It must be a standard system routine, not another level between system and GUI.
So now, you will lose access to your file system if you use a simple window manager instead of KDE?
Great idea.
Article doesn't address whether or not we can turn DBFS off and use the more traditional hierarchical method of file placement. Will we be dragged into this kicking and screaming?
I have always thought that version control (file histories, branching and atomic changes) would be nice to have at the file system level. Instead of storing myessay-firstdraft.doc, myessay-seconddraft.doc, myessay-final.doc, the file system should do the work. Then if I want to make a bunch of changes (perhaps I want to try a new page layout), I should be able to commit them as one atomic change (or throw them all away if I change my mind). Then, when I want to make a set of documents with US spelling, I should be able to branch the whole lot (using no disk space) and make the small changes from UK spelling while still being able to integrate other changes I have made.
I like the looks of this and the way it can search the file system. Nice job! This would be a great way to keep track of multiple projects. Blows away the Winfs idea, I will try it out.
Professional Politicians are not the solution, they ARE the problem.
How much performance overhead will this cause? The 'Desktop Environments' already eat a lot of RAM and CPU.
How much disk space will you lose over this? All the metadata has to be stored somewhere, and just glancing over the link I read something about a versioning system, which will definitely take up quite a bit of space. Will a 20GB hard drive become 15GB with DBFS?
why don't the nice KDE people and the nice Gnome people work on developing a library that sits on top of this and then we can stop all the stupid name calling and use the right tool for the right job
...vividly encapsulates that post-Watergate/pre-punk/coked-up moment when you could trust no one, least of all yourself.
"The DBFS does not actually store files, it holds references to files on the underlying hierarchy based file system. The GUI part is implemented in KDE where it replaces all hierarchy based file accesses. This gives an impression that there is no hierarchy, but to applications nothing has changed, the open-file and save-file dialogs have the same APIs."
Perhaps I am more organized than most, but I already categorize my files and such in the hierarchical file structure.
Isn't ReiserFS planning to do this on the kernel level?
Where does that leave other fs choices and storage and other idea dbfs?
I see more and more people saying look what neat things you can do with these tools.
But really why do you personally want something like this?
Curious to see the response is all.
ACK
This might be slightly off-topic, but is there any open source software for Windows that could do the same thing? I produce a lot of notes, and I want to be able to categorize my files.
A little stupidity is as unlikely as a little pregnancy
The author is asking for help to move the project to Gnome.
Quote: There is of course the hard choice of platform. I choose KDE because I am familiar with Qt a bit, and because it is inherently object-oriented, being C++ and all. But in my mind GNOME is much closer to how I would like a desktop system to function. So I would like to go for the GNOME option. I leave KDE developers to do what they want with this, and I am offering them support. But I would like to focus my efforts on GNOME and implementing the above in GNOME.
Any volunteers?
In the first place we will need developers. Would you like to join, send me an email (o.gorter@student.utwente.nl) with DBFS and JOIN somewhere in the subject. If you are not a developer, but still would like to help, please revisit this page in a few weeks. There will probably be a community website by then somewhere.
The database file system originated from the ideas of an object-orientated database. Keywords and references are all part of the orientation objects of the database to index to files or other objects. It does away with the traditional hierarchical view, being rooted at some place. The OODB does not need to be rooted as it is more like a web. The DBFS seems to try to implement part of the concept of the OODB. Good. There are many more features an OODBFS can offer: dynamic organization, classification, and multiple "skeletal" views to name a few. I hope that this DBFS will give a taste of what an OODBFS offers.
neither gnome nor kde are window managers, and I doubt berman would allow either to appear on enterprise today or tomorrow.
(In the course of the heated discussion about Reiser4.)
What stupid thinking, on both counts: making it for KDE and porting it to Gnome.
In reality DBFS should only provide a higher-level API besides the standard one. If a desktop or application supports this API, it gets all of the functionality. If not, plain FS access is all you get. In Gnome's case gnome-vfs should detect and use DBFS, and in KDE the equivalent is probably a KIO slave.
The kernel should set some standard API for filesystems like DBFS and Reiser4, if for no other reason than to set a starting access point for filesystems like these.
Signature Pro version 1.13.2-3 release 83.5 beta3try7 after-breakfast edition
While everybody is busy making fun of WinFS, Microsoft is very quietly and successfully letting their customers install SharePoint sites all over the place.
As usual, Xerox came up with the concept years ago (DocuShare). Sigh.
there's no place like ~
Maybe we could call it a "filing" system since it indexes files that are on another file system. Really, a file system IS a database, not an add-on that indexes files. Still, perhaps this is a better approach than trying to redo all the file-system internals. Although to be truly useful, this needs to be an API that is GUI-independent, with GUI-bindings as needed.
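To make the "index on top of another file system" idea concrete, here is a rough sketch of a GUI-independent keyword index. It is not the DBFS schema (the article does not publish one); the table names and the functions `open_index`, `add_reference` and `find_by_keywords` are invented for illustration. The point is only that the index stores paths and keywords, never the file contents.

```python
import sqlite3


def open_index():
    """Create an in-memory index; the files themselves are never copied into it."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE refs (id INTEGER PRIMARY KEY, path TEXT UNIQUE NOT NULL);
        CREATE TABLE keywords (
            ref_id  INTEGER NOT NULL REFERENCES refs(id),
            keyword TEXT NOT NULL,
            UNIQUE (ref_id, keyword)
        );
    """)
    return db


def add_reference(db, path, keywords):
    """Record a path on the underlying file system and tag it with keywords."""
    db.execute("INSERT OR IGNORE INTO refs (path) VALUES (?)", (path,))
    (ref_id,) = db.execute("SELECT id FROM refs WHERE path = ?", (path,)).fetchone()
    db.executemany(
        "INSERT OR IGNORE INTO keywords (ref_id, keyword) VALUES (?, ?)",
        [(ref_id, k.lower()) for k in keywords],
    )
    return ref_id


def find_by_keywords(db, keywords):
    """Return the paths that carry every one of the given keywords."""
    wanted = sorted({k.lower() for k in keywords})
    if not wanted:
        return []
    marks = ",".join("?" for _ in wanted)
    rows = db.execute(
        f"""SELECT r.path
            FROM refs r JOIN keywords k ON k.ref_id = r.id
            WHERE k.keyword IN ({marks})
            GROUP BY r.id
            HAVING COUNT(DISTINCT k.keyword) = ?
            ORDER BY r.path""",
        (*wanted, len(wanted)),
    ).fetchall()
    return [path for (path,) in rows]
```

Any front end (KDE, GNOME, or a plain shell tool) could call the same three functions, which is the GUI-independence being asked for.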
Stop letting good ideas be victimized by the M$ marketing FUD machine. Someone said "database filesystem". That was a good idea. M$ came along and said, gee, let's steal that idea. Hey, there is no existing implementation to copy, so how do we do it? The answer is they did not do it. They put a database that keeps rather mundane information on top of NTFS (a real filesystem) and called it WinFS (NTFS plus an almost completely unrelated database, not a file system). Keeping data on the relationships between files is a nice idea. Putting it in user space is dumb. It's overhead. Look at most of the examples he gave: "find a word document I worked on last month". All that info is already in the filesystem. A filesystem really is already a database in the strictest sense. It stores what's on which inode, associated with how many blocks, which you could think of as attributes. It also stores attributes like permissions and dates. Why not just put some more attributes into it, like subject and relatedtopicID? If you did that, then added the ability to maintain some other tables where you could put extended descriptions and stuff, and built up the query engine to efficiently solve the queries users will likely ask, then you'd have what you're really looking for. Additionally, you would lose the overhead to a degree because you'd be storing information once instead of in both the FS and the database.
Repeal the 17th Amendment TODAY! Also Please Read http://www.gnu.org/philosophy/right-to-read.html
What I meant is.. the DFS interface should come as a filesystem tool (xfsprogs, e2fsprogs, reiser4progs etc) rather than depending on one desktop or window manager.
[ I can not bring myself to believe that if knowledge presents danger, the solution is ignorance ] -- Isaac Asimov
Read about Beagle here. I posted about this on Slashdot a few days ago.
Simpy
Normally I file things in a hierarchical method by year and month and by project name (2004file/9sep/) or (2004file/workfile/projectname), but still I lose track now and then and need keywords. Change the "slash" slant to fit your OS.
Have a look at this for a userspace filesystem http://www.inf.bme.hu/~mszeredi/avfs/
There are Debian packages called fuse-source and fuse-utils, i.e. you can use all these nice kio plugins at the filesystem level.
Just add
deb http://www.kalyxo.org/debian/ unstable main
deb-src http://www.kalyxo.org/debian/ unstable main
to your /etc/apt/sources.list and run
apt-get update
apt-get install fuse-source fuse-utils
Actually, I didn't test that yet. Someone else?
DoXfs
Adding a database layer to Gnome sounds like using another 300 megabytes of storage on my hard drive. I simply do not need the database.
If the FSF/GNU folks really want to do something revolutionary, they should fork Linux+Gnome into 2 distinct paths: minimalist and maximalist. The maximalist is what we have now. The minimalist is a minimally featured Linus+Gnome distribution. It is the bare minimum in functionality that we need to have a decent operating system and desktop.
Into this minimalist installation, I will then add the applications (e.g. MatLab) that I use daily.
Keeping the traditional file system is only logical as this database sort of file access is in fact a higher level of abstraction.
.....
Considering there are numerous projects of such higher-level file access abstraction going on, it does become a secondary choice for the user if they want to use one of these higher abstraction level file access systems.
To remove the traditional file system altogether would be a mistake, as then it could become a system of babel or keywords.... "what was I thinking when I created that keyword", and let's not even get into what crazy joe was thinking when he came up with his keywords...
But hey, given how MS based developers would create some obscure dll name and place it in some obscure location in order to copy protect
higher level abstractions are useful only to the point that you can, if need be, drop down in abstraction level to get your bearings as to where you are. If you cannot touch physical reality then how do you know you are not floating around aimlessly?
being out of touch with physical reality can even be very dangerous and hard to correct.
The idea of a database-driven file system is nothing new.. try googling for mysqlfs
My internet connection uses their network, so it may become sluggish for some time, while the /.ing is in effect...
standard filesystems only have ONE index - a hierarchical one that contains a certain amount of real-time-updated indexing (such as the timestamps on a directory)
but it is NOT a relational database: you CANNOT easily create or use an alternative index to your files.
that's what all the fuss is about.
some people mentioned here that they already organise their files. great. fantastic.
HOW LONG DID IT TAKE YOU?
and how long would it take to reorganise?
with a relational database, all your indexes are updated AUTOMATICALLY.
therefore, doing searches on a relational database filesystem is easy (find me all music files with dates between last week and last month: SELECT * FROM files WHERE files.type = "music" AND files.date < NOW() - 7days AND files.date > NOW() - 1month)
you _can't_ do that sort of thing on a traditional filesystem.
sure, you can emulate it by creating symbolic links all over the place, but what happens when a file is deleted or moved? you need to manage / relocate the symbolic links...
nah.
databases.
fantastic idea.
now can we have them as a kernel module, pleeease?
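For what it's worth, the query described above can be tried in user space today. A minimal sketch with Python's sqlite3 module, assuming a made-up `files` table with `path`, `type` and `mtime` columns (none of this comes from DBFS itself, and the sample rows are invented):

```python
import sqlite3
from datetime import datetime, timedelta


def build_catalog(rows):
    """Load (path, type, modified) tuples into an in-memory table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE files (path TEXT PRIMARY KEY, type TEXT, mtime REAL)")
    # The database keeps this index up to date on every insert or update,
    # which is the "indexes are updated automatically" point.
    db.execute("CREATE INDEX files_type_mtime ON files (type, mtime)")
    db.executemany(
        "INSERT INTO files VALUES (?, ?, ?)",
        [(path, kind, modified.timestamp()) for path, kind, modified in rows],
    )
    return db


def music_between(db, now, newest_days=7, oldest_days=30):
    """Paths of music files modified between oldest_days and newest_days ago."""
    newest = (now - timedelta(days=newest_days)).timestamp()
    oldest = (now - timedelta(days=oldest_days)).timestamp()
    cur = db.execute(
        "SELECT path FROM files "
        "WHERE type = ? AND mtime BETWEEN ? AND ? ORDER BY mtime",
        ("music", oldest, newest),
    )
    return [path for (path,) in cur]


# Hypothetical sample data, relative to a fixed "now" so the result is stable.
now = datetime(2004, 9, 30, 12, 0)
db = build_catalog([
    ("/music/two_days.ogg", "music", now - timedelta(days=2)),
    ("/music/two_weeks.ogg", "music", now - timedelta(days=14)),
    ("/music/two_months.ogg", "music", now - timedelta(days=60)),
    ("/docs/essay.doc", "document", now - timedelta(days=14)),
])
print(music_between(db, now))
```

Adding a second way of looking at the same files (by project, by author) is one more `CREATE INDEX`, not a reshuffle of directories or symlinks.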
At a deeper technical level, many of the questions asked here have historical answers or clues in The Design and Implementation of the Inversion File System. The abstract reads:
Note that this paper was published in early 1993. Many of the issues it addresses are relevant to DBFS, and many of DBFS's advantages are foreseen by that paper. IMHO DBFS has chosen a direction that should have better performance than Inversion, not to mention lower risk and easier failure recovery. Inversion was built on POSTGRES, which makes one wonder what happened to the source.
Sorry, but I have to say it, you are an idiot!
...
Did you even care to RTFA?
This has nothing to do with the gnome or kde devs. Some developer invested his time to come up with something he thought was useful and all you can do is complain?
And if you look at the project, it is something completely different than a plugin for the admittedly great ReiserFS4, and it is here and usable right now.
So friggin stop your stupid whining.
And to the mods who modded parent interesting
Btw., why don't you whine to the Reiser people that they should stop developing now? It would be as justified as your whining about this project.
"grep" - why didn't anyone else on the planet think of that one? That's fantastic. Now I just HOPE to goodness I've always managed to always save everything under ~/Documents, never anywhere else, and then make sure I'm only searching for ASCII data inside any files. I'm all set! Thanks. You've made it much easier for me to find all my pictures and audio files related to various topics.
creation science book
"I am currently looking for a job. Interested in employing me? Drop me an email."
Ozy has worked on this in his time available as a student. If he gets a job doing something different, he might drop DBFS, and it might die a lonely orphan. Email him only with DBFS job offers, please!
--
make install -not war
The smallish (Windows) app that you downloaded and it made an index of everything on your PC?
(or was it from altavista?)
http://slashdot.org/~GuyFawkes/journal
After reading through the comments to this story I have to say I'm sincerely disgusted with you once again.
/. mod crew.
About 90% of the posters don't even care to read the article.
Now this fact of course doesn't stop you from bitching and whining, on the contrary, it's so much easier if you don't have a clue about what you are talking about.
So as in any story that reports about something new, about half the posts consist of "I don't need this, it suxors!!11!!one!", while the other half is busy telling everyone that yet another project that does the same is hardly needed. Of course the project does something no other project does, but hey, who am I to actually read the article?
I mean I can remember there being other stories about database and filesystem, so hey, this project must be redundant. And if it isn't, who cares, at least I'll be modded insightful by the always competent
Disgusted as always,
AC
A db fs with rich searching of metadata requires the orderly and accurate entry of said metadata.
You can't get organisation out of nothing - you are just asking people to be organised in a slightly different manner.
An organised person can already work effectively in a filesystem with current tools. The fact that they are organised is the key.
A disorganised person is not helped as their metadata will be erratic or absent. In fact might they now have the capability to be even more disorganised?
As I see it this is not solving an organisational problem as much as shifting the interface to the problem. I do not believe it to be either an easier or better way to organise personal data. Conversely I do not believe it to be inherently worse either. It's just different.
Where's the gain?
For those of you that have not yet looked at the Mac OS X (Tiger) preview and WWDC web cast, the new Spotlight technology built into the next version of Mac OS X looks very much like a fully integrated database file system. And it's incredibly fast. Go check it out! Note: QuickTime required. Mplayer may work for us Linux heads but I haven't tried it.
-- DuckWing
People can offer their opinions for or against this, but I think that any innovation benefits linux. I've read about WinFS and it sounds like a good idea, but who knows when it will be ready. If people working in their spare time can get something like this working in linux before Microsoft can get it out, I think that would just be another reason to trust the open source model of developing code and squash Ballmer's FUD.
I don't have too much trouble using a hierarchy file system. I keep my stuff pretty organized, but computers are supposed to save time, not create more problems. If this database can do a good job, I'll give it a shot.
Just give every doc a url ? A bit more restful
"If the King's English was good enough for Jesus, it's good enough for me!" -- "Ma" Ferguson, Governor of Texas (circa
From the FAQ
"KDE, GNOME, make up your mind.
Choice is a good thing. Myself I use Mac OS X daily and love it (much to the irritation of my friends and family). I am not against KDE, or GNOME. Actually the DBFS has two parts in it, a low level part, which can be shared by KDE and GNOME (or Mac OS X or Windows XP) and a GUI part. The DBFS cannot do without a graphical display, and I have to choose a platform. GNOME seems to go the route of instant apply and simplicity for the user, which is more inline to my own ideas. That is why I now want to focus on GNOME."
WE DON'T NEED NO BLOG CONTROL.
DBFS is a much more accurate model of our stored data, and how we use it, than the hierarchical databases we've used to bootstrap the world into using personal computers. But it's still really a prototype, proof of concept, because its data server calls the hierarchical filesystem API of the Linux filesystem model, on which it runs. Underneath DBFS, and above the disk (or other storage media) driver, is an inode database, which is hierarchical. A giant improvement in efficiencies, whether speed, space, or complexity, would come from implementing a SQL-to-inode engine directly, without the intervening layer of even something good like a ReiserFS or something well supported like ext2/3.
Linux's VFS system for modular filesystem installation is a good package. Packaging a DBFS in a VFS API, which operates directly on inodes, while replying to calls from a storage server like DBFS, SAMBA, NFS, or all of them, gives a powerful storage layer that interoperates with current apps, while opening the future for new apps that can use the relations among the data, as well as the meta-/data. A SQL API, as well as familiar data object references, would make such a system complete. Then work could begin on factoring file browsers, file access dialogs, email GUIs, and every other app/feature that will be revolutionized by dragging data out of the 1960s hierarchical file cabinet, into the 21st Century of networked relational multimedia objects. Let's get cracking!
--
make install -not war
When I got my first PC in 1990, I was faced with a similar problem. I first put all my data (I'm a writer and I used Word Perfect 5.1) in hierarchical directories descriptive of the project, since WP supported long filenames and the ability to search the long names.
I did, however, use the DOS .3 extensions to "keycode" my files: finished edits ended with my initials; works-in-progress were *.raw and partially-done files ended in *.edt.
Later I added *.fnl for dead files and *.prt for formatted files that have been or are ready for printing.
Soon I found the limitations to WP's scheme and found XTree-Gold, a DOS tree-based filemanager similar to Norton Commander, but more powerful in that it has a global searchable list function by filename, too, and can be made to sort by date.
So what can any of these new DB filesystems do that XTree-Gold for DOS or its clone ytree (http://www.han.de/~werner/ytree.html) for Linux/Unix can't, given the ability to use long, descriptive filenames on ordinary fss?
A large part of proprietary email systems (GroupWise and Exchange in particular) is the quick search and retrieval of messages (files).
Email systems do this by storing pointers to messages (files), along with pertinent data (subject, received/sent date, etc.) in the DB. GroupWise also stores small (2kb) messages inside the DB, but IMHO, storing the files themselves in the DB is probably a bad idea.
The messages (files) are still accessible in the event of DB corruption and the DB can be rebuilt by scanning the files.
I don't see any reason the whole FS couldn't work this way...
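A minimal sketch of the rebuild-by-scanning idea, assuming the messages are plain RFC 822 style text files in one directory tree. This is not how GroupWise or Exchange actually lay out their stores; `rebuild_index` and the `messages` table are names made up for the example.

```python
import os
import sqlite3
import tempfile
from email import message_from_binary_file


def rebuild_index(message_dir):
    """Recreate the pointer database from nothing but the message files."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE messages (path TEXT PRIMARY KEY, subject TEXT, sent TEXT)"
    )
    for root, _dirs, names in os.walk(message_dir):
        for name in sorted(names):
            path = os.path.join(root, name)
            with open(path, "rb") as fh:
                msg = message_from_binary_file(fh)
            # Only a pointer and a few header fields go into the database;
            # the message body stays in the file.
            db.execute(
                "INSERT INTO messages VALUES (?, ?, ?)",
                (path, msg["Subject"], msg["Date"]),
            )
    return db


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as store:
        with open(os.path.join(store, "1.msg"), "w") as fh:
            fh.write("Subject: DBFS\nDate: Thu, 30 Sep 2004 12:00:00 +0000\n\nhello\n")
        index = rebuild_index(store)
        print(index.execute("SELECT path, subject FROM messages").fetchall())
```

Because the index holds nothing that cannot be re-derived, losing or corrupting it costs a rescan rather than any data.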
Goofy, Geeky Gifts and More!
Am I the only one to notice that this article didn't have a from the x-or-y department line?
Here's the facts:
Database filesystems REQUIRE traditional filesystems to write on top of (unless the SQL server is implemented IN the kernel, which everyone (and I) agree is too much bloat). So, for DESKTOP machines, and STORAGE SERVERS, this technology would rock. It improves the ability for a user to find his/her files efficiently.
Meanwhile, for mission critical systems, for the underlying systems, for EVERYTHING ELSE, they will continue using ordinary hierarchy file systems like Reiser.
"Victory means exit strategy, and it's important for the President to explain to us what the exit strategy is." G.W.Bush
You need a graphical display? Wow, great, send to trash automatic systems. Or blind people.
This basically seems like giving find the capability to do file and then storing the results with locate. And add some time stamping, sorting capabilities and GUI.
Since all but the GUI are basic commands, it would seem sensible to have an underlying library with hooks for use by your choice of desktop manager.
To-do List: Receive telemarketing call during a tornado warning. Check.
Using a linux desktop is like using Win3.1 or 95 hyped up on really lame raver drugs. I've always found it sad and extremely frustrating that a venue with MASSIVE potential for innovation has instead been spending its time reimplementing Windows Explorer. :|
:-(
I'm down with anything that makes the linux desktop experience a real linux desktop experience, not a pissass wannabe win95 experience or a solaris experience. There's so much cool stuff going on under the hood... but the thin candy shell feels like GPX or Coby* slapped onto BMW internals. I know it's possible to do better.... but after years of seeing Windows and MacOS features reimplemented (not nearly as well in most cases) in linux, I'm starting to lose faith.
*GPX and Coby make shit electronics that are cheap and break very, very easily. Coby specifically, in that their products (and LOGO!) are modeled after Sony... and the best they can do is to come across as a cheap ripoff.
One of the greatest features of GNU/Hurd is the idea of translators. By getting rid of the traditional file system model, you can make many interfaces to systems that "look" like filesystems, and such keyword based indexing/browsing/searching could be implemented at the user level *as a filesystem*, rather than as a set of calls on top of a traditional filesystem requiring the application to "know" how to use it.
The Hurd documents talk about everything from an "ftp filesystem" to ways to rewrite X. I can imagine an IM program where you could echo and cat to friends like:
echo "Hi there" >> ~/IM/jabber/friend@jabberserver.org
When you think of the filesystem as a set of interfaces rather than a set of files, the possibilities are endless, including the one presented in the project.
It's sad that Hurd hasn't done better in making a working system, and it's also sad that Linux hasn't taken this "Best of Breed" idea.
Whether or not you look at system files every day probably depends on what you are doing with your machine and what you consider "system files" to be. Moreover, this idea would seem to go entirely against the whole UNIX "everything is a file" philosophy which is supposed to be one of the great strengths of UNIX.
flossie
Write now. Defend liberty
It's more a document repository than a filesystem. But sure, it's a good idea. :)
Reluctance to change is a pretty common public reaction to just about anything especially when it's about something that people don't understand.
Think about something as simple as USB technology. It was a great idea from the beginning, but we were all so used to "parallel for printers, serial for modems" mentality that we couldn't see into how useful and universal it could become. But how about today? Just about anything new can be given a USB interface.
Now I'm reading about all kinds of reservations about changing the way files are accessed.... or even stored. A database filesystem? Now we're playing with our comfortable concept of "how things are done" again. When will we ever stop shaking things up like this!? Damnit!!!
Actually, consider this. People are damned afraid of it coming from Microsoft. With MS, it's pretty much an "all one way" kind of thing. Once Microsoft's "viral infection" starts to proliferate and this early adopter causes the next company to upgrade and the next and the next, there's no stopping it.
Now let's look at open source. If an idea doesn't work, you can abandon it with little to no investment lost. Get another computer to run the newest experiments in systems concepts and don't abandon your old way until you're ready.
Shhh! Don't let David Blunkett hear you! If he finds out that computers can do this, he will make it illegal not to use keylogging.
flossie
Write now. Defend liberty
I'm using Subversion for a project and the idea of Atomic Commits seems like an obvious direction for file systems. If that other slashdot story is correct, storage becomes less of an issue and it would be possible to roll back the system to any point in time or to only roll back one file if need be. Now throw an intuitive way to navigate files on top of that and you've got a sure winner.
In the grand scheme of things, only a very small handful of us on earth are aware of Linux or even know what an Operating System is for that matter. File systems seem to be the big stumbling block for new users. Anything that can make computers and therefore access to information easier for the coming waves of new computer users (maybe billions of people?) will be a good thing. Even if the "bloat" slows down the system by 10%.
I hate to preach but that old quote comes to mind "With great power comes great responsibility". I don't think most of the people working on the OS that will soon dominate in developing nations (that's Linux) are aware of the harm they can do by slowing down Linux development with petty personal disputes. Like it or not, Linux is no longer just an edgy hacker tool. It has the potential to change the lives of Billions of people.
What if Digg added local news and a Slashdot inspired comment karma system? ---
http://houndwire.com
Search for: Documents
Modified since: last backup
I don't know about the other implementations but Spotlight and WinFS are implemented atop the existing filesystem. (The FS in WinFS supposedly stands for "Future Storage" and not "File System".) Sort of like how Google is implemented "on top" of the regular hypertext-linked internet.
Is this the same Hurd that will ship one year after Duke Nukem Forever?
Frankly, one of the greatest features of Hurd is that people still talk about it, in the present tense no less, despite appearances that it will never exist.
From GNU Hurd: It is not ready for production use, as there are still many bugs and missing features.
On the negative side, the support for character devices (like sound cards) and other hardware is mostly missing.
As I understand it, Spotlight is implemented as indexed metadata in addition to the standard Mac OS HFS+ file system. It's not a true "database," but rather a quick and fairly functional facsimile. (Similarly, WinFS is not a FS at all, anymore.)
Take a look at an AS/400 (iSeries). They've been around for more than 10 years. And before that you had the System 38 and System 36... and so on..... Why is this some big revelation now?
-Cnik
A few years ago I remember seeing a filesystem based access gateway to MySQL. That was pretty neat because you could access rows of information using standard Unix tools like grep, sed, awk, numsum and so on.
I know it was there in System 7, which still puts us in the early nineties, I think. Not sure of the year--not sure of the System version--not sure whether it was a database or a Metadata whoozywhatsis. But I am very sure that I never had my operating system complain that "somebody must have (gasp!) *moved* a file!".
What strikes me here is not the ease with which you could find a file (it was not a sure thing), but the rarity in which you had mis-placed one. Even the most elderly Macs you might come across kept pretty tight track of what was going on in the FS.
A resourceful* Mac user could assign one of eight or ten categories to a file, which would then show up COLOR-CODED in your SORTABLE display. Note that these were not, however, fields--just one-shot categories. If you wished to use them as fields, great, but you had a total of eight (or ten) values.
I was just surprised not to see it mentioned here yet. Nevermind OSX. If I could have the MacOS 9 interface (same as System 7, basically) on top of something POSIX, well, then we'd be talking. And yes, I would use my desktop database to find things impossible in other pricey consumer-oriented OSs.
* sorry bout that
Did you work on that file last month? Find all files you worked on last month. Was it a word document? Find all word documents you worked on last month. Was it for a certain project. Find all word documents from that project you worked on last month. That is the thinking the DBFS supports.
What a coincidence. That is the thinking that "find" supports, too. And I don't have to run a bloated desktop environment to use it. Cool.
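For comparison, here is the same "word documents I worked on in the last month" question asked with no index at all, as a small Python sketch of a find-style walk (roughly what `find` does with a name pattern and `-mtime -30`). `find_files` and its parameters are made up for this example.

```python
import os
import time


def find_files(top, suffix, newer_than_days, now=None):
    """Yield paths under `top` ending in `suffix` modified within the last N days."""
    now = time.time() if now is None else now
    cutoff = now - newer_than_days * 86400
    for root, _dirs, names in os.walk(top):
        for name in names:
            path = os.path.join(root, name)
            # Every candidate costs a stat() call: the whole tree is visited
            # on every query, which is the price of having no index.
            if name.lower().endswith(suffix) and os.stat(path).st_mtime >= cutoff:
                yield path


if __name__ == "__main__":
    for hit in find_files(os.path.expanduser("~"), ".doc", 30):
        print(hit)
```

It answers the question, but only for what the file system already records (name, dates, size). "From that project" only works if the project is encoded in the path.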
This did mean that you had to use the right tools to get into the files or you had to cope with the changes in programs that worked on them.
VMS also had file revision numbers on files as a couple of posters have noted.
Both of these were nice in some ways, but relatively difficult to deal with in other ways. By comparison unix is straitjacketed but easy to use.
Added to which, if someone wants "slimware", it's already out there - it was written in 1995. If you are stuck with a 486, boo-hoo, fortunately there was a time when this was the cutting edge, and during that time people wrote and optimized code like lynx and fvwm and xview. So the code is there if you still need it, stop complaining!
I am tired of the posts on here telling us about the crippled/lameduck hardware some welfare case is running, and how we all need to accommodate him. For Christ's sake folks, you can get a P4 desktop for $500.
Let me guess, the first app you fire up is top, and you get hives if the system dips below 95% idle?
If considering file-system as database, then consider application as database. The Run-Time-Access package makes your program's internal data structures appear as tables in a PostgreSQL DB. RTA is to a DB as /proc is to a file-system.
http://www.runtimeaccess.com/
...that's all us tech folk need is for the users to be EVEN MORE RETARDED! More abstraction is not necessarily a good thing.
The database-like filesystem of BeOS was replaced in the "Advanced Access" release (before PR 9). It was said to have inadequate performance for multimedia work. The old filesystem was replaced with BFS, which has been written by Dominic Giampaolo. It is pretty powerful with its indexable custom attributes. But it is not really a database-like system - as the old file system of BeOS was.
By the time you factor in the near annual cost of replacing DVD-R drives that have failed due to laser burnout it costs about as much today to store data on hard drives as on DVDs. I have 320GB in storage on my desktop that cost all of $200 - why give a fuck if my "search engine" eats up 30GB? These types of accessories are an investment in memory augmentation - buying a bigger hard drive is an incredibly cheap investment for the benefits obtained.
If you want minimal, install gentoo or something and learn to love blackbox. For a long time I used blackbox with mdk8 on 200mhz pentium MMX systems and with only 128MB ram it was pretty damn acceptable and it looked very nice. Minimal is easy - minimal is what linux has been all along.
When I was stuck in windows I used Thumbsplus for this (which, despite its name, will work with just about any MIME type, not just pics). I had several databases, all of them 2GB (for easy backup to DVD). The one thing I have found no acceptable substitute for in linux, ironically, is thumbsplus.
I've been wanting something like this for years. Many times I've stumbled upon a website during a search that seemed mildly interesting for whatever reason at the time but not relevant, only to think a few days later "what was that site where I found...?"
I know that sounds a bit like dashboard, but what I've seen of dashboard falls short of this - it's almost like the two products are the same concepts but built from different directions. When they meet in the middle, many of us will finally have something an order of magnitude closer to our dream desktop.
300MB? Hell, my /tmp is four times that!
What about backups though? How would a user be able to group (manually) related files together, so that the whole bunch can be backed up later, without having to search for all seemingly related (or unrelated) keywords to trace all hitherto-unrelated documents?
Defining "group" is one of the problems with hierarchical systems. Which group you want depends on your needs at the moment. Relational makes it easier to create ad-hoc groups.
For example, do you want to backup by file age, file type, and/or by topic? And in the real world topics naturally overlap. Set theory has an easier time with this because it is meant to deal with overlaps; but hierarchies get messy beyond about 3 orthogonal factors. Relational is closer to set theory than trees are.
A drawback of sets is that most people are not familiar with non-tree arrangements. There will probably be a "training hump".
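To make the overlap point concrete, here is a tiny sketch of ad-hoc grouping with plain sets. The file names, the tags and the select helper are all made up for illustration; a real system would keep this in a database rather than a dict.

```python
def select(catalog, all_of=(), any_of=()):
    """Return the sorted names whose tag set contains every tag in
    `all_of` and, if `any_of` is given, at least one tag from it."""
    all_of, any_of = set(all_of), set(any_of)
    hits = []
    for name, tags in catalog.items():
        if not all_of <= tags:          # subset test: every required tag present
            continue
        if any_of and not (any_of & tags):  # intersection test: at least one match
            continue
        hits.append(name)
    return sorted(hits)


# Hypothetical catalog: each file carries several overlapping tags.
CATALOG = {
    "budget.xls": {"spreadsheet", "finance", "2004"},
    "invoice.doc": {"document", "finance", "customer"},
    "pitch.doc": {"document", "customer", "2004"},
    "holiday.jpg": {"image", "2004"},
}
```

A backup "group" is then just a query: select(CATALOG, all_of={"finance"}) today, select(CATALOG, all_of={"2004"}, any_of={"document", "image"}) tomorrow, with no need to decide up front which factor gets to be the top-level folder.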
More about sets versus trees.
Table-ized A.I.
Before inventing something you should check whether someone has already done it, because there you have GNOME Storage. Don't be fooled by the screenshots there. Storage isn't only a cool search facility with a natural-language parser ("computer, find me all porn I've downloaded yesterday" anyone?).
Storage is, surprisingly, a method of storing files decomposed into their contents. The great searching ability is a side effect.
Imagine collaborating in a group of people on one document. Everyone has some paragraphs to edit. With Storage, everyone can edit this document at the same time, seeing others' changes as the letters are typed. Store version history and you have revision control. Throw in network transparency (you go to another department, connect your laptop and automagically you can work on that department's files) with OpenTalk (Zeroconf/Rendezvous) and you've got the best idea since hierarchical directories.
Be sure to read the whitepaper about Storage available on the mentioned site. Also check for Storage-related entries in Seth's blog (Seth is one of the architects of GNOME Storage). Now if only the KDE people would work on compatibility with Storage, the freenix desktop would rule the world.
BTW, KDE, don't miss the chance of integration! KDE is planning to introduce Google-like search in the desktop. Don't reinvent the wheel! Beagle is here, working. Just integrate Beagle with the KDE desktop and we are set.
:wq
Change directory could be made to handle DB queries by having some bracketing character sequence in the directory names be made reserved and using it to enclose relational queries. On access the db could execute the queries and would need to remove the pseudo path elements from what gets sent to the underlying file system. Then any scripting system that needed to pass a path could take advantage of the DB access without the app really being aware of it and without massive reprogramming.
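A rough sketch of the path-splitting half of that idea, assuming square brackets as the reserved bracketing characters (my choice for the example, nothing standard). It only separates the pseudo path elements from the real ones; actually running the queries against a DB is left out.

```python
def split_query_path(path, open_ch="[", close_ch="]"):
    """Split a path into (real_path, queries).

    Any path component wrapped in the reserved bracket characters is
    treated as a query and removed from the path that would be handed
    to the underlying file system.
    """
    real_parts, queries = [], []
    for part in path.split("/"):
        if len(part) >= 2 and part.startswith(open_ch) and part.endswith(close_ch):
            queries.append(part[1:-1])  # strip the brackets, keep the query text
        else:
            real_parts.append(part)
    return "/".join(real_parts), queries
```

For example, split_query_path("/home/me/[type=doc]/reports") gives ("/home/me/reports", ["type=doc"]). One obvious limitation of this sketch: a query containing a "/" would be split across components, so a real implementation would need an escaping rule.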
Joel on Software said it best:
For example, WinFS, advertised as a way to make searching work by making the file system be a relational database, ignores the fact that the real way to make searching work is by making searching work. Don't make me type metadata for all my files that I can search using a query language. Just do me a favor and search the damned hard drive, quickly, for the string I typed, using full-text indexes and other technologies that were boring in 1973.
Laws do not persuade just because they threaten. --Seneca
I hate GNOME-VFS and KIOSlaves and all that stuff that people try to roll into a higher layer than they should. The idea of combining VFS functionality and a *desktop environment* is *stupid* and exactly the sort of fragmentation that hurts everyone. Want a userspace VFS layer (like, other than the existing transparent systems like LUFS and FUSE, which are much better if you can use them, because existing apps continue to work)? Fine. Make one. Call it libvfs, and make KDE and GNOME bindings. But for the love of God, quit trying to roll filesystem functionality into desktop environments. It's ridiculous, and not where it belongs.
That being said, I'm still not a huge fan of this.
There are three main features that people seem to want with a DB-based FS:
* Transaction/views/high-level-DB functionality.
You don't want a FS that tries to do all this. DBMSes have worked for a long time to do this efficiently. If you want this, use a real DBMS, because it's going to smoke what someone tried to hack up into a filesystem.
* Automated index updates. Basically, locate but automatically updated as changes occur.
Mac OS Classic had synchronous index updates as the filesystem ran. This makes the filesystem slow. It sucks for servers.
It's a much better idea to do asynchronous index updates, where the index approximates the filesystem quickly, but perhaps not right away -- you can do that without killing performance. Basically, you have the kernel notify a userspace daemon when changes occur, which rebuilds the part of the directory hierarchy (possibly waiting a while to see if it can use idle time). If a ton of changes occur to a small portion of the directory hierarchy, instead of trying to keep up with each change (say, a million changes to part of the directory hierarchy), the update daemon rolls all those queued up updates into a single queued-up full update of that entire portion of the directory hierarchy. It does a bit of extra indexing, but doesn't get backlogged.
This *could* be done under Linux, but there is one bit of kernel support that is missing -- currently, it is unusably inefficient for an app using Linux's existing dnotify() directory monitoring mechanism to deal with (a) changes to a directory containing many entries and (b) changes to directories anywhere on a filesystem (currently, dnotify() requires a FD for each directory to be monitored).
There *is* an enhanced dnotify patch out that would improve Linux's dnotify() mechanism to the point where people can write daemons with this, but Linus has not yet rolled it in. Once he does so, we can get the ball rolling on such daemons.
* Indexing of metadata.
People want to be able to search for their files using metadata. Again, this can just as easily be done using such a daemon. Existing apps continue to work, people can choose where to put something in a unique hierarchy so that they can reach it again, but files can also be addressed via metadata. If you want to provide, say, a tabbed file selector containing a tab for selecting files via metadata, that's quite feasible.
A major reason why you don't want to provide a full DB-based interface is that you lose the existing hierarchical representation, and every app in the world stops working. You don't want people to just "save this file to a filesystem" and have to address it via metadata -- you want the user to give the thing some kind of unique identifier.
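To illustrate the "roll a flood of changes into one full rescan" behaviour of the asynchronous update daemon described above, here is a minimal sketch of just the queueing logic. The class name, the threshold and the notify/drain interface are assumptions for the example; how the kernel delivers the change events is deliberately left out.

```python
class CoalescingQueue:
    """Queue of pending index updates, keyed by directory.

    Individual changes are remembered per directory until there are
    more than `threshold` of them; after that the directory is marked
    for a single full rescan and further changes to it are dropped.
    """

    def __init__(self, threshold=1000):
        self.threshold = threshold
        # directory -> set of changed entry names, or None meaning "rescan it all"
        self.pending = {}

    def notify(self, directory, name):
        entries = self.pending.setdefault(directory, set())
        if entries is None:
            return  # already scheduled for a full rescan
        entries.add(name)
        if len(entries) > self.threshold:
            self.pending[directory] = None

    def drain(self):
        """Return the queued work as (action, path) pairs and reset the queue."""
        work = []
        for directory, entries in self.pending.items():
            if entries is None:
                work.append(("rescan", directory))
            else:
                work.extend(("update", f"{directory}/{n}") for n in sorted(entries))
        self.pending = {}
        return work
```

The indexer would call drain() whenever it gets idle time. The trade-off is the one described in the post: a rescan does some redundant indexing, but the backlog for any one directory is bounded by the threshold.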
May we never see th
Oh, and as for reasons that I don't like this particular implementation of a DB-based filesystem -- they add the idea of attributes, which is cutesy, kind of like Nautilus did with emblems (grr...darn desktop-environment-level FS stuff), but then they overload the keyword namespace with attributes, which is just plain silly, and runs the risk of any app using attributes getting into name collisions.
I can't really see the need for it in a single machine environment - nice touch, but unnecessary.
Where I see the real benefit for this type of file system is when it's networked. Then the whole idea of thin network clients starts to come into its own. Not only does the end user naturally then have to save all his/her work onto the SAN (easily doable today with other methods), but they transparently do it in a way that means other people can easily find what they've done. They'd also naturally have access to shared resources without having to consciously search through shared drives etc. Just an idea, but it could be an absolute boon for collaborative working, if done correctly...
Bring the network storage right into the opsys!
If it uses the Qt library, forget it being used by GNOME people.
The fact is that XML in some form or other seems to be the direction that many "office" type programs are headed, and it is amenable to being used as the underlying means of indexing other document types too - there being XML extensions and filters for many databases as well as native databases for it.
In fact, Reiser with the XML plug in is supposed to be one of the fastest XML databases out there.
So... if the implementation is split into file store and meta-data store, and the metadata is stored in XML - then it should be possible to cater to both those who can (and want to) use Reiser and those who can't/don't.
Of course I may just be blowing smoke out my ears after the Labor day blowout last night ;)
The other point I'd make is that there should be some way to plug-in tools for after-the-fact analysis and indexing of documents to add to the system - I have e-mail and correspondence going back to 1983 (from Xenix and TRS-Dos days) that I'd love to include in the system but only if the system will find and catalog it for me. I'd certainly be willing to teach it how to deal with Scripsit and Multi-plan files (even Visicalc).
Don't worry if the analysis takes days/weeks/months - shove it off into background and let it vie for time with your screen saver or SETI. I don't care because I don't expect to get around to it manually for another 20+ years ;)
Taking a look at ctime and mtime (on Unix systems), for example, to estimate when a document was initially created and last modified would go a long way towards filling in the gaps on the timeline (strictly speaking, ctime records the last inode change rather than creation, but it is still a useful hint). Then categorize by the keywords in the directory tree structure the document is found in (there are several trees from the past that contain the word "customer" for instance - and all items below those trees should be able to be aggregated) and by gross file-type (visicalc, multiplan, excel, etc. are all spreadsheets for example) and things would start looking up.
Use the magic file, not file extensions - look inside the file (using "strings" for instance) for keywords if the file type's major structure isn't (yet) known - and abstract comments and other info (date/time/f-stop, copyright notices, etc.) from the likes of image files too. Even abstract the image files down to a predominant color (show me all the green images - or all the flesh-colored ones). Abstract the image size and track it too - show me all the images except the 160x120 and 640x480 (they are created from the originals)
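As a very small sketch of that kind of after-the-fact cataloguing: sniff the type from the leading bytes instead of the extension, pull timestamps from stat, and take keywords from the directory names. The signature table only holds a handful of well-known magic numbers, and the describe/sniff names and the record layout are assumptions for the example; a real tool would lean on the full magic database.

```python
import os

# A few well-known leading-byte signatures; nowhere near a full magic file.
MAGIC = [
    (b"%PDF", "pdf"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"PK\x03\x04", "zip"),
]


def sniff(path):
    """Guess a file's type from its first bytes, ignoring the extension."""
    with open(path, "rb") as f:
        head = f.read(16)
    for signature, kind in MAGIC:
        if head.startswith(signature):
            return kind
    return "unknown"


def describe(path):
    """Build a small metadata record for one file."""
    st = os.stat(path)
    directory = os.path.dirname(os.path.abspath(path))
    return {
        "path": path,
        "kind": sniff(path),
        "size": st.st_size,
        "modified": st.st_mtime,       # last content change
        "inode_changed": st.st_ctime,  # on Unix: last inode change, not creation
        "dir_keywords": [p.lower() for p in directory.split(os.sep) if p],
    }
```

Running describe over a whole tree in the background and dumping the records into any database would already answer "everything under a customer directory" or "everything that is really a PDF regardless of what it is called".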
Think about the fact that the average desktop system is idle a majority of the time and use that CPU horsepower to make my life easier.
Also - don't worry about using more storage for the meta-data than the original document used - storage is now so cheap compared to my time that you can use 10-100 times as much if it will save me an hour looking for something.
I'm watching any/all of these types of projects looking for one that will come even close to what I think I need.
Been there, done that, paid for the T-shirt
and didn't get it
Yes I own a Mac, I admit that bias. But even before I had heard about Spotlight I had doubts about the WinFS approach to file storage.
Why? Because a user should not have to help manage metadata if at all possible. If they do, things will just become a mess. I think in the end the only reliable data at all is what is actually in the files themselves - so any approach that parses and extracts file metadata from contents is going to end up with way better searches for a user than one where you have to say "this is a picture of Uncle Fred".
The advantage a Newton or Palm has is they provide all the key apps for the OS. I think that even in Windows Microsoft would have a hard time getting enough programs to get behind the storage of key metadata to make it as useful as they would like - who knows, perhaps acquiring such backing is part of the WinFS delay.
The Spotlight approach seems cool in that it offers the possibility of custom metadata-extraction plugins, that perhaps go the extra mile to provide meta-meta data (like figuring out which pictures were mostly water or sky).
"There is more worth loving than we have strength to love." - Brian Jay Stanley
Let's take a step back and consider some fundamentals:
- Filesystems are a way to represent resources in an organized namespace.
- Disk capacity is increasing faster than our ability to utilize it effectively.
- Networks are everywhere, from Internet to WiFi to Lan Party.
- How you use data is more relevant than where you use it.
- Social networks provide context and reputation to resources and conversations.
Let's mix them all together and see what comes out:
- A filesystem that is not "fixed" to a disk, but is a "view" associated with you (securely) that travels wherever you go, using networks and aggressive caching on plentiful disk space to make access fast.
- A filesystem that is a shared collaborative space organized by the way you use resources within it, and the way your peers in your social network use their resources as well. New resources can be quickly brought to your attention as something "interesting". Common resources are more aggressively cached, making them fast to access and distribute, more reliable.
- A filesystem that has rich semantics and robust metadata associated with the resources contained within. This metadata is built via implicit feedback based on the way you and your peers use its contents as well as explicit feedback / assignment you manually perform.
Microsoft and everyone else is focused on the flexible view of resources but only on local data contained on disks you have in front of you.
Where OSS can beat everyone to the punch is the decentralized, networked view which makes file resources as mobile as the peers who use them. This scares content owners, and breaks DRM models. This is an area where OSS has an advantage, and can be on the leading edge of innovation instead of the trailing copycat angle (WinFS/Spotlight, etc)
Why doesn't anyone care to implement this on a lower level so all DEs/filemanagers/whatever can use these kind of features instead of binding it to one or two single environments?
45,000 files is nothing - I know people who easily have ten times that. And if it's just about finding redheads with big ones - well.. ok then let's go there: What if you want to put together a slideshow of redheads with big ones? What if you're an old schooler and want it to include those other redheads with not so big ones as well? And what if you want your slideshow to include only "big ones" plus your favorite shots of Julia Hayes and Angie Everhardt? Now you have to navigate to god knows how many directories, fish through thousands of jpegs and add them to some application's "sideshow" list. All so you can enjoy a thirty minute wank. Sure, you can save the slideshow - but repeating this dozens or hundreds of times sounds to me like a huge waste of wanking time.
Enter your data properly, tag it as you add it (and tag the old stuff as you have time) and there's no need to do any more. That directory of 500 Julia Hayes images might be "anonymous" to the database because they're all just numbers and letters, but you only have to select the "JuliaHayes" directory and enter "Julia Hayes Redhead Penthouse Canada usenet dancer height:66 mea:37C-23-34 zoology elephants ballerina" one time to correlate every single file in that folder to the database with those properties. Then the computer will know which are your favorites - they're the ones you keep clicking on and leaving open for minutes at a time.
So now putting together a good wank is only a matter of clicking on your "favorites" link, typing "redheads ++big ones," and enjoying the show. And if sharing this info means it might give some salesdroid the impetus to offer you even more Julia Hayes (ahem) "data"... well, ain't that the point?
The whole point of the DB file system is to easily search for files in a massive HD. The files you'd likely need to search for are not data used by games or applications but your own media, such as text / mp3s / pics / vids.. yadaa.
Thus, forcing all file transactions to take place through a database wouldn't be useful all the time.
You pretty much want something along the lines of what windows media player "Media library" does with your music and videos.
You build a database when you're not using the computer.. and you load the "explorer/searcher" program when you need to find something.
calling it a new filesystem seems a lil much.
The server and client are implemented using O'Caml, but the client has different APIs: O'Caml, C, C++ and Objective-C.
The only languages I see there of any use to 99% of the world are C and C++. What I really need to improve my support structure is to adopt OSS implemented in O'Caml -- give me a break.
Or should you put them in the folder named "grandma"?
The idea of a database filesystem is, of course, not novel at all. Implementations have existed for years on the command line, including a mysql mounter..
Here's my idea, as I've been telling people for years (and for which I've been awaiting the arrival of Reiser4):
The darn thing needs to start with a filesystem that supports extended attributes and very rapid name/metadata reading. Then, we need to add or extend commands such as the "ls" command to take advantage of SQL querying of filenames.
Next, build a GUI kpart for making queries and the ability to drag and drop a query as an icon on the desktop (stored view).
Imagine a GUI interface organized into rows and columns where the first row
(1) holds checkboxes for (show/don't show) the results.
(2) The next row holds attribute names in dropdown boxes so you can pick what attribute names are shown (or not shown) from left to right...such as name, creation date, project, and document type.
(3) The third row holds dropdown boxes for sorting with the options: Ascending, Descending, or None.
(4) The fourth row are where-clause text entry boxes. For example, one might put "> 06/25/04 and 07/25/04" under the creation date attribute.
(5) And all following rows are basic "Or" statement lines...show anything that matches the criteria on this line or any other line.
This mechanism has been referred to as a QBE (Query By Example) and it translates easily into a SQL statement. SQL statements can be parsed and a list of relevant files quickly produced.
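Here is a rough sketch of how such a grid could be turned into a SQL string. The column-dict layout, the qbe_to_sql name and the "files" table are assumptions for illustration only. Each column carries its show flag, its sort choice and a list of where-clause fragments, one per criteria row; fragments on the same row are ANDed, and the rows are ORed together.

```python
def qbe_to_sql(columns, table="files"):
    """Translate a Query-By-Example grid into a SQL SELECT string.

    Each column is a dict with keys:
      name  - attribute name
      show  - True if the attribute appears in the results
      sort  - "ASC", "DESC" or None
      where - list of criteria fragments (or None), one per criteria row
    NOTE: fragments are pasted in verbatim, so this is only suitable
    for trusted input; a real tool would parse and parameterize them.
    """
    shown = [c["name"] for c in columns if c.get("show")]
    if not shown:
        raise ValueError("at least one column must be shown")

    row_count = max((len(c.get("where", [])) for c in columns), default=0)
    alternatives = []
    for i in range(row_count):
        conds = []
        for c in columns:
            criteria = c.get("where", [])
            if i < len(criteria) and criteria[i]:
                conds.append(f"{c['name']} {criteria[i]}")
        if conds:
            alternatives.append("(" + " AND ".join(conds) + ")")

    ordering = [f"{c['name']} {c['sort']}" for c in columns
                if c.get("sort") in ("ASC", "DESC")]

    sql = f"SELECT {', '.join(shown)} FROM {table}"
    if alternatives:
        sql += " WHERE " + " OR ".join(alternatives)
    if ordering:
        sql += " ORDER BY " + ", ".join(ordering)
    return sql
```

A stored view dragged to the desktop would then amount to a saved grid (or the generated statement) that gets re-run whenever the icon is opened.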
One thing I hadn't thought of was also putting this mechanism into the File Open dialog.....but I guess that's a natural....I just didn't try to think much about it....good idea.
Matthew C. Tedder
Why is IT dumbing down to the level of those consumers who call the monitor a computer and the beige box under their desks a hard drive?
I've been using this for over a year - saved my ass so many times..
Subversion and DavFS
restore yesterday's file:
svn cat -r {yesterday} http:/..../myfile.doc > myfile.doc
I really need to look at fixing davfs for 2.6 though.
Taking PHP to the next level: phpmole, php codedoc, php-gtk pear installer, DataObjects for php, ldap schema viewer and
Well, the link doesn't work as of course Microsoft cannot comprehend someone might want to link directly to a section..
I found it after a little looking. The API is mainly for programs that want to add metadata, which is neat and all... but not for the recognizers.
Now I figured they had to have something that would at least pull EXIF data from files (and I was also thinking of facial recognition parsers for meta-meta data). But I can't find how you would write a custom one for the system, or how it would get invoked - not all of the links under the WinFS overview appear to work.
I really think Microsoft is banking more on programs creating rich trees of relationships though, more than having a bunch of file parsers (they just need them to essentially pre-load the DB).
"There is more worth loving than we have strength to love." - Brian Jay Stanley
Ok, related to the last sentence of my last post (Microsoft intends people more to make use of rich API's) I found this blog complaining about WinFS - and describing exactly the biggest problem I see with the whole thing.
If you read the blog entry linked to, you find this paragraph:
No - the "big deal" about WinFS IMHO isn't "search". Like Web Services - it's about the fact that it's an attempt to get a higher level of interoperability between programs through agreement on schemas. Hopefully, toward the goal of bootstrapping network effects, and unintended/innovative consequences, on the client. WinFS defines an extensible object model and persistence mechanism, as well a rich and extensible "relationship" mechanism that can be used to intertwingle objects with that are somehow related to one another. Some kinds of stock relationships are obvious: common Author, common Artist, common Location, common Priority, etc. Some may be more subtle or domain-specific: common Project, common Client, common Contributors, or even manual and thus not-easily-described ties.
Like the first blog says, this hope of "interoperability between applications" is a pipe-dream that may work OK with office but will then proceed to fall flat on its face beyond that realm.
It's pretty telling that Tiger will have Spotlight working next year, while you won't even see a public WinFS in 2006. The domain is more complex, but does it need to be when Spotlight is achieving the same practical effect for 99.9% of use?
"There is more worth loving than we have strength to love." - Brian Jay Stanley
Guys,
Oracle has already created this, it's iFS. I don't remember how advanced this has gone on Linux, but I know that it's available on Windows platforms..
Perhaps the Beagle folks should have done some research about Gadget, which has been in planning for about a year, and been working for more than four months. Why did they go and implement their own thing when they could have just integrated Gadget into the GNOME desktop?
The original idea is both brilliant and quite old. Alan Cooper wrote about it in his landmark book "About Face: The essentials of interface design" in 1995 and again in "About Face 2.0: The essentials of interaction design". He's got a lot of good suggestions for a decent implementation. I hope that the DBFS developers read that. WinFS developers probably have read it too, but as always, they're trying to add too much spice in it, and end up with an indigestible agglomeration of crap, that won't come out of their colons before sometime around 2007.
Ok, I understand that searchable databases can be useful for a lot of things, but I still do not see how this can replace hierarchical directory structures?
I mean there are times when I don't know the name of the file, when I created it, or what it was about. But I do know what directory I placed it in?
What if you still want to be able to find a file based on location?
Sorry for the clueless question.
Anybody who ever tried to gather a significant bibliography, won't need any other example to convince him that a DB tool for dealing with metadata can be darn useful.
Sure, I tried to handle it all with symlinks for a while, but a single paper can have many authors, subjects, versions (ps, pdf), etc. rendering maintenance painstakingly complex.
Since I work in an IT lab, we ended up building our own set of DB tools to allow access to papers through author, date, journal, URL, lab, subject, title, etc. and to generate bibTeX, html or XML indexes.
But I guess there are many other labs struggling to maintain bibliographies, unaware that such tools even exist. And they would benefit greatly from the possibility to handle metadata through a relational DB at system level.
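For anyone wondering why symlinks run out of steam here, the many-authors-per-paper case is the textbook many-to-many relation. Below is a minimal sketch using SQLite; the schema, the function names and any sample titles you feed it are invented for the example and are not the tools our lab actually built.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory bibliography with a many-to-many author link."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE papers  (id INTEGER PRIMARY KEY, title TEXT, year INTEGER);
        CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE paper_authors (
            paper_id  INTEGER REFERENCES papers(id),
            author_id INTEGER REFERENCES authors(id)
        );
    """)
    return con


def add_paper(con, title, year, author_names):
    """Insert a paper and link it to each author, creating authors as needed."""
    cur = con.execute("INSERT INTO papers (title, year) VALUES (?, ?)", (title, year))
    paper_id = cur.lastrowid
    for name in author_names:
        con.execute("INSERT OR IGNORE INTO authors (name) VALUES (?)", (name,))
        (author_id,) = con.execute(
            "SELECT id FROM authors WHERE name = ?", (name,)).fetchone()
        con.execute("INSERT INTO paper_authors VALUES (?, ?)", (paper_id, author_id))
    return paper_id


def papers_by(con, author_name):
    """Return titles of all papers with the given author, oldest first."""
    rows = con.execute("""
        SELECT p.title
        FROM papers p
        JOIN paper_authors pa ON pa.paper_id = p.id
        JOIN authors a        ON a.id = pa.author_id
        WHERE a.name = ?
        ORDER BY p.year, p.title
    """, (author_name,))
    return [title for (title,) in rows]
```

Subjects, journals and file versions (ps, pdf) follow the same pattern with one more link table each, which is exactly the part that becomes unmaintainable as a forest of symlinks.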
Nothing is foolproof to a sufficiently talented fool.
Doesn't XFS already support operations on extended attributes?
...
The OpenBeOS people have already completed the work on OpenBFS (in fact it is one of the only completed parts) and they implemented a lot of features not done in the Be implementation, but already planned, like for instance searching on extended attributes.
Perhaps it is time to port it to Linux.
RSX-11 for the PDP-11 also had versioning (the ;1 ;2 etc. that another respondent mentioned).
It was a DEC thing; they had file versioning on many of their OSes.
Every time that I saved a file, say, from a text editor, the system created a new version.
Multiple versions got annoying after a while, because there'd be so many of them.
Fortunately, what passed for their shell supported wildcards (e.g., "DEL FILE.FOR;[10-21]" (or something similar)).
Versioning would be nice, as long as it is not "in your face".
For example, right-click on a file to pop up a menu that, among other things, accesses earlier versions, or (in a hierarchical filesystem) have a special directory ("..."? "..ver.."?) that contains earlier versions.
Those who sacrifice security to condemn liberty deserve to repeat history or something. - Benjamin Santayana
I was using Safari, not sure why the links would not work right. They do work in Mozilla.
I am a little disappointed to see that you register file promoters only by extension, and not by other criteria.
"There is more worth loving than we have strength to love." - Brian Jay Stanley
Probably no one remembers this, but there was a project called Medusa in development 4 or 5 years ago by Rebecca Schulman and the other happy hackers over at Eazel. It had some awesome indexing technology but a lot of people eyed it suspiciously because it was something of a resource hog, especially for its day. Probably, these days no one would flinch at dropping that kind of RAM on something like this. Unfortunately, after Eazel crashed and burned, Medusa was heaped on the pile of dead carcasses of so many other free software projects that line the road to software Nirvana. It's good that this functionality will finally come to fruition with projects like Storage and DBFS, but I always thought that had Medusa been saved, we might already have a mature document indexing system by now.
Alas, alack.