Slashdot Mirror


Using Relational Databases as Virtual Filesystems?

Pogie asks: "At my office, we've got what one could only describe as a huge Network attached storage infrastructure. We're talking multiple terabytes of applications, user trees, data files, sybase and oracle databses, etc. 'In the beginning' it was a concious decision to create a shared NFS infrastructure using NetApp Filers (I humbly recommend them over SAN solutions any day...flame on!), but our data center has grown so large, and there are so many interdependencies that we're becoming concerned that if the wrong filer goes down, our production network would be, to say the least, hosed." To combat this problem, Pogie wants to implement his filesystem in a relational database...Oracle to be precise. Read on for his reasoning.

"To conquer our fears we're trying to get a handle on exactly what is where, with the goal of reorganizing the true physical locations of data to minimize the business impact if any single NFS server goes down. At the moment, the plan of attack is to construct a relational Oracle 8.1.6 database on linux which will basically mirror the filesystem in a DB. To accomplish this, I'm writing a horde of scripts using the perl DBI which will poll the entirety of the NFS filesystems on our network and create what basically amounts to a virtual filesystem in the DB which we can then drill into for specific information in much less time than it would take us to search through the actual filesystems in question. In addition, we gain the ability to maintain historical data, which allows us, among other things, to know exactly what went wrong if a luser rm's, mv's, or cp's the wrong thing to the wrong place.

Has anyone tried this before? And is this even a good idea? Does anyone know of existing packages that will do this? I'm really curious what the slashdot community thinks of the idea. I was several hours into this before someone said to me, 'Do you realize you're writing a filesystem in SQL?'"

3 of 52 comments (clear)

  1. Re:sounds crazy by rfreynol · · Score: 2, Informative

    Actually, if he used Oracle iFS (internet files system) the files could reside in the db (security and ability to get good, clean backups) while being mountable in via many protocols (nfs, samba, http, ftp, etc.)

    I often hit files with grep/sed/awk from a unix box that has a iFS filesystem mounted via nfs.

    The solution to his problem is iFS, and he probably already has a license for it.

  2. Database storage -- and the real battle: the users by webwench_72 · · Score: 2, Informative

    There's a product I've been working with for 3 years which does something similar to what you are asking for:

    OpenText Livelink

    They have a number of other products that probably have similar underpinnings.

    Basically, a database (can be Oracle, Sybase, or SQL Server) keeps meta-information on items stored in the database. Items can be documents, folders, URL links, tasks, discussion topics and replies, etc. Meta-information includes dates of creation, updating, deletion (i.e. the makings of an audit trail), whether the document is checked out for editing and if so, by whom (that makings of a simple source-control system), who can see or edit the document (implies that a table of users is also maintained, which it is), etc.

    Livelink provides a server (basically a big CGI app) that you can run through a web server, allowing this stuff to be navigated and maintained through the web. The web pages are customizable. Really, the whole thing is customizable, which means you can write all kinds of little apps and processes above and beyond what is supplied by Livelink (our most common examples are scanning apps, that scan and store new documents in one step, and document expiration processes, which force certain documents to be read and revised every 6 months or whatever). There's also an API for VB, C++, and Java, to allow access methods other than the web.

    Depending on the number and size of documents you're storing, documents can be kept in the database itself, or can be kept in a filesystem, and pointers to those documents stored in the database. The second option is usually preferred because the first option will cause trouble when it comes time to backup or restore from backup, or to migrate data, etc.

    The biggest disadvantage from a user's point of view is the need to log in, if you plan to keep any semblance of an ACL, source control, or auditing. You could provide one common login for read-only access to most of your files, which would ease the pain a bit. Or you could 'roll your own' solution, based on some of the premises used by this type of system.

    I'd like to add that I think the technology to use is the least of your issues. The biggest issue will be in finding and categorizing all of the content that's already out there. When you find 4 different variations of the same document, or 3 different builds of the same source code, or tables for 3 different apps in one database schema, and it's not clear what is what, how will you know who to contact? Who among your users has the extra headcount to spare to give you detailed info on all their files and databases, etc? How much stuff is out there, that was owned by people who have left or been laid-off, who no one else can provide info on? That's gonna be your real battle.

    --

  3. Re:This is how things should really be anyway by Anonymous Coward · · Score: 1, Informative

    I'm not sure, but isn't this in some ways how the BFS (The Be File System) is organized? I mean a BFS file could be associated with arbitary amounts of metadata, one of which was the "default open with" mentioned in the post I'm responding to.

    In his article comparing Mac OS X to Be OS Scott Hacker (author of the Be Bible) even used the MP3 example to demonstrate the flexibility of Be OS metadata. He demonstrated that he could, by only using metadata effectivly mirror the ID3-tags in his MP3 collection and query by artist/genre/year/etc. directly from the Tracker (the Be OS desktop application). How the BFS goes about handling folders I know nothing about.