Slashdot Mirror


Using Relational Databases as Virtual Filesystems?

Pogie asks: "At my office, we've got what one could only describe as a huge network-attached storage infrastructure. We're talking multiple terabytes of applications, user trees, data files, Sybase and Oracle databases, etc. 'In the beginning' it was a conscious decision to create a shared NFS infrastructure using NetApp Filers (I humbly recommend them over SAN solutions any day...flame on!), but our data center has grown so large, and there are so many interdependencies, that we're becoming concerned that if the wrong filer goes down, our production network would be, to say the least, hosed." To combat this problem, Pogie wants to implement his filesystem in a relational database...Oracle to be precise. Read on for his reasoning.

"To conquer our fears we're trying to get a handle on exactly what is where, with the goal of reorganizing the true physical locations of data to minimize the business impact if any single NFS server goes down. At the moment, the plan of attack is to construct a relational Oracle 8.1.6 database on Linux which will basically mirror the filesystem in a DB. To accomplish this, I'm writing a horde of scripts using the Perl DBI which will poll the entirety of the NFS filesystems on our network and create what basically amounts to a virtual filesystem in the DB, which we can then drill into for specific information in much less time than it would take us to search through the actual filesystems in question. In addition, we gain the ability to maintain historical data, which allows us, among other things, to know exactly what went wrong if a luser rm's, mv's, or cp's the wrong thing to the wrong place.

Has anyone tried this before? And is this even a good idea? Does anyone know of existing packages that will do this? I'm really curious what the Slashdot community thinks of the idea. I was several hours into this before someone said to me, 'Do you realize you're writing a filesystem in SQL?'"
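For readers who want something concrete, the polling approach described in the post might look roughly like the sketch below. It is not Pogie's code: Python and the standard-library sqlite3 module stand in for Perl DBI and Oracle, and the table names (`scan`, `entry`) and function names are hypothetical. Each scan is stored as its own snapshot, so two scans can be compared to see what disappeared in between.

```python
import os
import sqlite3
import tempfile
import time


def open_catalog(db_path=":memory:"):
    """Create the (hypothetical) catalog tables: one row per scan, one per file seen."""
    conn = sqlite3.connect(db_path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS scan (
            id INTEGER PRIMARY KEY, root TEXT NOT NULL, started REAL NOT NULL
        );
        CREATE TABLE IF NOT EXISTS entry (
            scan_id INTEGER NOT NULL REFERENCES scan(id),
            path    TEXT NOT NULL,      -- path relative to the scanned root
            size    INTEGER NOT NULL,
            mtime   REAL NOT NULL,
            uid     INTEGER NOT NULL,
            PRIMARY KEY (scan_id, path)
        );
    """)
    return conn


def record_scan(conn, root):
    """Walk `root` and store metadata for every file as a new snapshot."""
    with conn:  # one transaction per scan: all rows land, or none do
        scan_id = conn.execute(
            "INSERT INTO scan (root, started) VALUES (?, ?)", (root, time.time())
        ).lastrowid
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                full = os.path.join(dirpath, name)
                try:
                    st = os.lstat(full)
                except OSError:
                    continue  # file vanished mid-scan; skip it
                conn.execute(
                    "INSERT INTO entry VALUES (?, ?, ?, ?, ?)",
                    (scan_id, os.path.relpath(full, root),
                     st.st_size, st.st_mtime, st.st_uid),
                )
    return scan_id


def missing_since(conn, old_scan, new_scan):
    """Paths present in the old snapshot but absent from the new one."""
    rows = conn.execute(
        """SELECT path FROM entry WHERE scan_id = ?
           EXCEPT
           SELECT path FROM entry WHERE scan_id = ?
           ORDER BY path""",
        (old_scan, new_scan),
    )
    return [r[0] for r in rows]


# Demo on a throwaway directory: scan, delete a file, scan again.
with tempfile.TemporaryDirectory() as root:
    for name in ("a.txt", "b.txt"):
        with open(os.path.join(root, name), "w") as fh:
            fh.write("x")
    conn = open_catalog()
    first = record_scan(conn, root)
    os.remove(os.path.join(root, "b.txt"))
    second = record_scan(conn, root)
    vanished = missing_since(conn, first, second)
    print(vanished)  # ['b.txt']
```

Note that this only catalogs metadata. It can say that a file went missing between two scans, but not who removed it or what it contained.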

7 of 52 comments

  1. You're really putting too much effort into this by Mr. Foogle · · Score: 2, Insightful
    Or rather, too much effort in the wrong direction. I can't imagine this would work very well.

    You don't need to stuff your file system data into a database; you need to investigate high availability ... ya know, failover systems, redundant everything, etc.

    My biggest objection would be that you're violating the KISS rule, and making life a living hell for whoever follows you.

    Or, maybe I just have a limited imagination.

    --
    Display some adaptability.
  2. Oracle iFS by sohp · · Score: 3, Insightful

    Why yes, this has been done, and you can get it from Oracle under the name Oracle Internet File System. I've played with it a bit. Interesting concept, not a very robust implementation, but perhaps it's gotten better since I tried it under 8.1.7? It's kind of neat to be able to mount a drive under Windows that's really data in an Oracle table.

  3. Re:HA Linux by sigwinch · · Score: 3, Insightful
    "In the end, what extra capabilities is Oracle going to give you that the right sort of filesystem wouldn't?"

    1. Consistent backups are trivial. Are there any common filesystems that provide this?
    2. Applications can execute atomic transactions that involve multiple files and directories.
    3. It is easy to keep older versions of files around and do undeletes.
    4. If you keep the data and metadata in separate tables, it is easy to create totally different views of the same files. Dunno if this is useful...
    5. You can do access controls as appropriate to your problem.

    I'm not necessarily advocating RDBMS-as-filesystem, but the idea does have some merit.
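Point 2 in the list above (atomic transactions spanning several files) is the easiest to illustrate. What follows is only a sketch under assumptions: SQLite via Python's standard library stands in for Oracle, and the `file` table and `replace_all` helper are hypothetical names. The update of two related files either applies completely or not at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical one-table "filesystem": path -> contents.
conn.execute("CREATE TABLE file (path TEXT PRIMARY KEY, data BLOB NOT NULL)")
conn.executemany(
    "INSERT INTO file VALUES (?, ?)",
    [("/etc/app.conf", b"v1"), ("/etc/app.conf.sig", b"sig-v1")],
)
conn.commit()


def replace_all(conn, updates):
    """Rewrite several files as one unit: every update applies, or none do."""
    with conn:  # commits on success, rolls back if an exception escapes
        for path, data in updates:
            cur = conn.execute(
                "UPDATE file SET data = ? WHERE path = ?", (data, path)
            )
            if cur.rowcount != 1:
                raise LookupError(f"no such file: {path}")


# The second path does not exist, so the first update is rolled back too.
try:
    replace_all(conn, [("/etc/app.conf", b"v2"), ("/etc/missing.sig", b"sig-v2")])
except LookupError:
    pass

contents = dict(conn.execute("SELECT path, data FROM file"))
print(contents["/etc/app.conf"])  # b'v1'
```

On a conventional filesystem the closest equivalent is a single atomic rename, which covers one file at a time.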

    "They're two very different ideas for data storage (hierarchical vs relational is just for starters)."

    Hierarchical is a special case of relational: 1) Each item has a foreign key for its parent directory, or NULL if it's in the root. 2) There is a UNIQUE constraint on foreign key + item name.
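A minimal sketch of that two-rule schema, assuming SQLite through Python's standard library rather than Oracle; the `item` table and the helper names are hypothetical. One wrinkle worth flagging: SQL treats NULLs as distinct inside a UNIQUE constraint, so rule 2 alone does not stop duplicate names in the root. The sketch covers that case with an extra partial index.

```python
import sqlite3

SCHEMA = """
CREATE TABLE item (
    id        INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES item(id),  -- rule 1: NULL means "in the root"
    name      TEXT NOT NULL,
    UNIQUE (parent_id, name)                -- rule 2: one name per directory
);
-- NULL parent_ids never collide under UNIQUE, so root names get their own index.
CREATE UNIQUE INDEX root_names ON item(name) WHERE parent_id IS NULL;
"""


def add_item(conn, parent_id, name):
    """Insert a file or directory under parent_id (None for the root)."""
    cur = conn.execute(
        "INSERT INTO item (parent_id, name) VALUES (?, ?)", (parent_id, name)
    )
    return cur.lastrowid


def full_path(conn, item_id):
    """Rebuild the slash-separated path by following parent links upward."""
    rows = conn.execute(
        """
        WITH RECURSIVE chain(id, parent_id, name, depth) AS (
            SELECT id, parent_id, name, 0 FROM item WHERE id = ?
            UNION ALL
            SELECT i.id, i.parent_id, i.name, c.depth + 1
            FROM item i JOIN chain c ON i.id = c.parent_id
        )
        SELECT name FROM chain ORDER BY depth DESC
        """,
        (item_id,),
    ).fetchall()
    return "/" + "/".join(r[0] for r in rows)


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript(SCHEMA)

home = add_item(conn, None, "home")
data = add_item(conn, home, "data")
somefile = add_item(conn, data, "somefile.txt")
print(full_path(conn, somefile))  # /home/data/somefile.txt
```

Resolving a path means walking the parent links one level at a time (here via a recursive query), which is the main cost of this representation compared with a native directory lookup.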

    --
    Kuro5hin.org: where the good times never end. ;-)

  4. Why relational? by cperciva · · Score: 3, Insightful

    IANADE (... Database Expert), but...

    Why are you considering a *relational* database? Unless you're planning on completely changing filesystem semantics I don't see why you wouldn't just use a simple hierarchical database.

    I mean, seriously, you want to have a filesystem which acts like a distributed database; but you don't really need to be able to run RDBMS queries do you? You'll probably end up with a much better result if you work down a checklist and decide which database features you want and which will just add bloat.

  5. The idea dates back to the 60's by one-egg · · Score: 4, Insightful
    Back in the 60's, there was the Michigan Terminal System, out of the University of Michigan. Their filesystem was DB-based. That was before relational became the "in" thing, so it was ISAM.

    It was an interesting idea. I think that the problem they had in MTS will be the same with your idea: not everything fits neatly into the DB model. In fact, some things really have to be shoehorned in.

    The insightful reader will be saying, "But wait! You also have to shoehorn stuff into the conventional FS model." True enough. The question is how much fits naturally and how much has to be shoehorned.

    My contention is that the conventional model is a better fit for most stuff. That's especially (perhaps sadly) true because of legacy software that expects the conventional model. Perhaps a ground-up OS and application implementation would be able to rethink some of those issues and find new insights. But I'm naturally skeptical.

    There is also the issue of performance. I know little about DBs (my loss), but it seems to me that if the FS is stored in an existing relational system, you're going to have to warp some stuff to make it fit. I'd suspect that either you're going to have to make every file be a different table, or you're going to have to store the contents of every file as a variable-length text field. Either option is going to have really nasty effects on the efficiency of the DB, which has been highly optimized under the assumption that each table contains tons of highly homogeneous records.

    I wouldn't want to dive into that kind of can of worms as an "I want to use it in production" project. It might make interesting research on a 5-year horizon, though.

  6. sounds crazy by DrSkwid · · Score: 1, Insightful

    i can understand why one would want to put the database into the filesystem but to put the filesystem into the database!!

    you hide the data from many of the day to day tools people are used to using

    cat, more, >, >>, awk, sed, grep, join, and a host of other cli tools.

    hierarchies work and people understand them. How are you going to switch the filesystem tree into RDBMS tables? It would keep Codd himself burning the midnight oil.

    You might be getting yourself some extra sleep at night secure that your Oracle won't fail you but other tools are around to help keep a filesystem available (as other posters have mentioned).

    Think about what you are throwing away and locking in to. All new employees will have to learn Oracle SQL to get to files, a delete with a badly phrased where clause could cost you plenty of time (and don't say you haven't done it).

    A simple example
    vi /home/data/somefile.txt
    or
    select data from home_data where filename='somefile.txt' > tmp
    vi tmp
    sed -f escapescript.sed tmp
    copy the output to the clipboard
    update home_data set data = 'paste here' where filename='somefile.txt'

    Ok I realise that a few scripts can replace it but please, what's the gain?
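One of those "few scripts" might look like the sketch below, offered as an illustration rather than a recommendation. It assumes SQLite through Python's standard library instead of Oracle, and a hypothetical `home_data(filename, data)` table. The file is copied to a temporary path, handed to an editing step, and written back with a bound parameter, so the manual escaping pass is not needed.

```python
import os
import sqlite3
import tempfile


def edit_stored_file(conn, filename, edit):
    """Fetch a stored file, let `edit(path)` modify a temp copy, then save it back."""
    row = conn.execute(
        "SELECT data FROM home_data WHERE filename = ?", (filename,)
    ).fetchone()
    if row is None:
        raise FileNotFoundError(filename)

    fd, tmp = tempfile.mkstemp(suffix=".txt")
    try:
        with os.fdopen(fd, "w") as fh:
            fh.write(row[0])
        edit(tmp)  # e.g. launch $EDITOR on tmp and wait for it to exit
        with open(tmp) as fh:
            new_text = fh.read()
    finally:
        os.remove(tmp)

    with conn:  # bound parameters handle quotes; no escape script required
        conn.execute(
            "UPDATE home_data SET data = ? WHERE filename = ?",
            (new_text, filename),
        )


def append_line(path):
    """Stand-in for an interactive editor so the sketch runs unattended."""
    with open(path, "a") as fh:
        fh.write("second line\n")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE home_data (filename TEXT PRIMARY KEY, data TEXT NOT NULL)")
conn.execute(
    "INSERT INTO home_data VALUES (?, ?)",
    ("somefile.txt", "it's a 'quoted' line\n"),
)
conn.commit()

edit_stored_file(conn, "somefile.txt", append_line)
result = conn.execute(
    "SELECT data FROM home_data WHERE filename = 'somefile.txt'"
).fetchone()[0]
print(repr(result))
```

In interactive use, `edit` could be something like `lambda p: subprocess.call([os.environ.get("EDITOR", "vi"), p])`. Even so, the wrapper only covers one tool, and cat, grep, awk and the rest would each need similar treatment.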

    The expertise of your internal support staff will need to be at the advanced Oracle user level.

    But, hey, go for it

    come back in a year and let us know how it went

    --
    There are places where the networks are not touching, and there are places where they are. -Boeing's Lori Gunter
  7. Re:Some possible advantages and shortcomings by duffbeer703 · · Score: 3, Insightful

    There are plenty of reasons to want to use an RDBMS.

    If you are working with financial data or health records, there are Federal reporting requirements relating to who accesses what data where & when. Using a central Oracle or other RDBMS makes it easier to keep track of what's up.

    Why Oracle? Maybe the organization has a bunch of PL/SQL gurus. Maybe having Java integrated into the DB is advantageous. Or maybe they have a giant Oracle server sitting around with extra cycles.

    --
    Conformity is the jailer of freedom and enemy of growth. -JFK