Using Relational Databases as Virtual Filesystems?
"To conquer our fears we're trying to get a handle on exactly what is where, with the goal of reorganizing the true physical locations of data to minimize the business impact if any single NFS server goes down. At the moment, the plan of attack is to construct a relational Oracle 8.1.6 database on linux which will basically mirror the filesystem in a DB. To accomplish this, I'm writing a horde of scripts using the perl DBI which will poll the entirety of the NFS filesystems on our network and create what basically amounts to a virtual filesystem in the DB which we can then drill into for specific information in much less time than it would take us to search through the actual filesystems in question. In addition, we gain the ability to maintain historical data, which allows us, among other things, to know exactly what went wrong if a luser rm's, mv's, or cp's the wrong thing to the wrong place.
Has anyone tried this before? And is this even a good idea? Does anyone know of existing packages that will do this? I'm really curious what the Slashdot community thinks of the idea. I was several hours into this before someone said to me, 'Do you realize you're writing a filesystem in SQL?'"
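For anyone who wants to picture the crawl described in the question, here is a minimal sketch. It assumes Python with SQLite as stand-ins for the Perl DBI and Oracle setup, and the table name `fs_entry`, its columns, and the helpers `snapshot_tree` and `vanished` are all made up for illustration. Each scan gets its own number, so older scans stay around as history and two scans can be compared.

```python
import os
import sqlite3
import stat
import time


def snapshot_tree(conn, root):
    """Walk root and record one row per file or directory; return the scan number."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS fs_entry (
               scan_id INTEGER, scanned_at REAL, path TEXT,
               is_dir INTEGER, size INTEGER, uid INTEGER, mtime REAL)"""
    )
    # Hypothetical numbering scheme: one more than the highest scan stored so far.
    scan_id = conn.execute(
        "SELECT COALESCE(MAX(scan_id), 0) + 1 FROM fs_entry"
    ).fetchone()[0]
    now = time.time()
    rows = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            try:
                st = os.lstat(full)  # lstat: do not follow symlinks
            except OSError:
                continue  # entry vanished or is unreadable; skip it
            rows.append((scan_id, now, full, int(stat.S_ISDIR(st.st_mode)),
                         st.st_size, st.st_uid, st.st_mtime))
    conn.executemany("INSERT INTO fs_entry VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
    return scan_id


def vanished(conn, old_scan, new_scan):
    """Paths that were present in old_scan but are missing from new_scan."""
    cur = conn.execute(
        """SELECT path FROM fs_entry WHERE scan_id = ?
           EXCEPT
           SELECT path FROM fs_entry WHERE scan_id = ?
           ORDER BY path""",
        (old_scan, new_scan),
    )
    return [row[0] for row in cur]
```

Run `snapshot_tree` from cron against each mount, and `vanished(conn, yesterday, today)` tells you what disappeared in between. It will not tell you who did it or when, only that it happened between two scans.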
I see nothing wrong with what you are doing, and it is one heck of a good idea. I really wish I had the time to go in and write some little drivers that would journal the addition/deletion of files and folders to the personal SQL server on my PC, so that when I do my (frequent) searches for files, they would be quick. Locate is nice, but it isn't real-time...
You'll get good info, and the info is the most important part of doing a good job of reorganizing things. Ad-hoc can be fine when everybody is responsible for their own stuff, but when the whole system is supposed to work cohesively, nothing is as cool as a really well-engineered large system that is bulletproof, and you can't do that without really good planning.
Maybe off topic...
I think the whole "files and folders" system is artificial anyway. We had no concept of files at first, until the technology on the mainframes was good enough that we were able to finally put some structure around the data. Then we started categorizing the files, and eventually the device, folders/directories, etc. structure evolved. In addition, we have this server hierarchy, with mount points, etc. It is a lot more complex, and somewhat more capable of organizing our data, but it still isn't even close to how we really think. It was invented because it was a good way to organize that was efficient for the horsepower available at the time. Now we're somewhat entrenched by it.
Database filesystems are a much more natural way to do things, though. How do you categorize your MP3s? By Hard rock/Soft Rock/Pop/Oldies? By Artist? By title of the song? With folders you have to pick one method. With a database, you can switch anytime.
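To make the MP3 point concrete, here is a small sketch, assuming Python and an in-memory SQLite table. The `track` table, the `browse_by` helper, and the sample rows are all hypothetical placeholders. The same rows get regrouped by whichever column you ask for, with no files moved.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE track (file_id INTEGER PRIMARY KEY, title TEXT, artist TEXT, genre TEXT)"
)
# Placeholder rows, not real songs.
conn.executemany(
    "INSERT INTO track VALUES (?, ?, ?, ?)",
    [
        (1, "Song One", "Artist A", "Hard Rock"),
        (2, "Song Two", "Artist A", "Pop"),
        (3, "Song Three", "Artist B", "Pop"),
    ],
)


def browse_by(conn, column):
    """Group track titles by the chosen column, like picking a folder layout on the fly."""
    if column not in {"artist", "genre", "title"}:
        raise ValueError(f"cannot browse by {column!r}")
    # The column name is checked against the whitelist above before being interpolated.
    query = f"SELECT {column}, title FROM track ORDER BY {column}, title"
    groups = {}
    for key, title in conn.execute(query):
        groups.setdefault(key, []).append(title)
    return groups
```

`browse_by(conn, "genre")` and `browse_by(conn, "artist")` give two different "folder trees" over the same three files.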
Now that the computers are capable of it, we're starting to move in the direction of database filesystems already. MP3 categorizers are coming in quickly, as are filesystem indexers (locate, and MS's indexing server). Handheld devices kind of go by a "database" filesystem as well.
I envision a filesystem as follows: a flat set of files, each with a serial number (inode number basically). Also, a database that associates each inode with any number of attributes. There are certain pre-defined attributes with globally well-understood meanings, and namespace rules about defining new attributes and personal-use attributes. The attributes can themselves have attributes (more on this later).
Attributes include file names (as opposed to "filenames," though similar), creation/modification/access dates, owner, comment, file type, keywords. And a file may or may not be assigned a "Default open with" attribute... Or maybe 2 or 3 (in which case a list box would pop up when you double clicked on the file!)
If the file didn't have a "default open with," what then? Well, it probably has a file type attribute. Say, File Type text/plain. Well, the attribute text/plain has a "default open with" attribute of "GVim.EXE." Well, cool!
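Here is roughly how I picture that lookup, as a sketch in Python. The `AttributeStore` class and its method names are invented for illustration, and a real version would sit in a database rather than a dict. The trick is that an attribute value such as "text/plain" can itself be a subject that carries attributes.

```python
from collections import defaultdict


class AttributeStore:
    """Maps a subject (an inode number or an attribute value) to named attribute lists."""

    def __init__(self):
        self._attrs = defaultdict(lambda: defaultdict(list))

    def add(self, subject, name, value):
        """Attach one more value of attribute `name` to `subject`."""
        self._attrs[subject][name].append(value)

    def get(self, subject, name):
        """Return all values of `name` on `subject` (empty list if none)."""
        return list(self._attrs.get(subject, {}).get(name, []))

    def openers(self, inode):
        """Programs to offer for a file: its own first, else those of its file types."""
        direct = self.get(inode, "default open with")
        if direct:
            return direct
        inherited = []
        for file_type in self.get(inode, "file type"):
            # The file type is itself a subject that can carry attributes.
            inherited.extend(self.get(file_type, "default open with"))
        return inherited
```

So after `store.add(1001, "file type", "text/plain")` and `store.add("text/plain", "default open with", "GVim.EXE")`, asking for `store.openers(1001)` falls through to the type and comes back with GVim. If the list has more than one entry, that is where the list box would pop up.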
This would be nice in a lot of places: source code control would be very simple. Just put a "version" attribute on, and move the "latestversion" tag around. Also, this eliminates the need for multiple copies of a file in a build environment (well, theoretically, you should be able to eliminate this by proper engineering, but we all know that sometimes you miss something and have to copy a file somewhere...) -- you just refer to the same file twice.
That is my idea of how the database file system should work. Of course, it really messes with our current paradigm, and it introduces some problems (bye bye canonical pathnames for files!) but sometimes I really wish I had a database behind my filesystem, not a hierarchy.
Time flies like an arrow. Fruit flies like a banana.
We don't use Filers where I work, but just the local file system. Our CVS root looks something like this:
systems/$hostname
systems/$hostname/home
systems/$hostname/etc
systems/$hostname/var
and so forth. You can then add files as you want -- if you don't want to back up everything in /etc as it was created automagically, then don't.
This setup uses a very mature piece of software. It has a lot of nice interfaces (cvsweb, wincvs). It also gives your users the ability to pull their own backups and branch their home directories :)
It works really well, as you can back up the central repository at whatever frequency you want.
It may take a lot of storage to back it all up, but it will probably be smaller than an Oracle database and you will get diffs for your history.
-- DrZaius - Minister of Sciences and Protector of the Faith
The guy who wrote ReiserFS has quite a lot to say on this very subject. What he says (in short) is:
.) Databases shouldn't exist. They exist only because system programmers didn't listen to application programmers' needs.
.) Those application programmers implemented their own solution to the problem: databases.
That's why he started ReiserFS. Today it is the ideal filesystem for storing little bits of information (around 100 bytes) of the kind you would normally keep in a database because you don't want millions of small files lying around.
(I know this is a simplified view, read his own arguments for a broader discussion.)