Slashdot Mirror


MySQL FS

xcyber writes "Developer, Database Admin and user, MySQL is developing a MySQL filesystem for Linux to mount a database on Linux as a fs. This is still in the development stage and the development team would like to receive comments on this. So please let us know." "Because you can" dammit. That's just plain awesome.

77 of 198 comments

  1. Re:SQL? by smileyy · · Score: 2

    Amongst all these other examples, it's probably worth noting that SQL is a declarative language. Basically, it allows you to express the results -- without worrying about the procedure used to generate the results.
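
To make the declarative point concrete, here is a minimal sketch that computes the same result twice: once with an explicit loop, once by describing the result in SQL. The `employees` table and its rows are hypothetical, and SQLite merely stands in for any SQL engine.

```python
import sqlite3

# Hypothetical sample data, used only for illustration.
rows = [("alice", 52000), ("bob", 48000), ("carol", 61000)]

# Procedural: spell out how to walk the data and build the result.
procedural = []
for name, salary in rows:
    if salary > 50000:
        procedural.append(name)
procedural.sort()

# Declarative: state what result is wanted; the engine picks the procedure.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?, ?)", rows)
declarative = [
    name
    for (name,) in conn.execute(
        "SELECT name FROM employees WHERE salary > 50000 ORDER BY name"
    )
]
conn.close()

print(procedural == declarative)
```

The SQL version says nothing about iteration order, indexes, or sorting strategy; those choices are left to the database.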

    --
    pooptruck
  2. Re:Liquid file system by William+Tanksley · · Score: 2

    You're absolutely right that a prototype could be built using current file systems, but said prototype would be SLOW and eat a LOT of space. It's better to use appropriate data structures and algorithms.

    And yes, file systems are databases; they're merely inflexible databases using ANCIENT technology. Not all databases are created equal.

    -Billy

  3. sweet! by Phexro · · Score: 4

    phexro!pyramid:~$ SELECT * from pr0n WHERE sex='f' AND species='goat';

    1. Re:sweet! by Fervent · · Score: 2
      Since I haven't used SQL in a few years, what command would make !(this SQL command) [e.g., no goats, please]?
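
One way to answer this, sketched under assumptions: wrap the whole condition in `NOT (...)`. The `animals` table below is a hypothetical stand-in for the one in the parent post, and SQLite is used only because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table standing in for the one in the parent post.
conn.execute("CREATE TABLE animals (name TEXT, sex TEXT, species TEXT)")
conn.executemany(
    "INSERT INTO animals VALUES (?, ?, ?)",
    [
        ("nanny", "f", "goat"),
        ("billy", "m", "goat"),
        ("dolly", "f", "sheep"),
        ("mystery", None, "goat"),  # NULL sex: see the note below
    ],
)

# Negate the whole condition by wrapping it in NOT (...).
negated = conn.execute(
    "SELECT name FROM animals "
    "WHERE NOT (sex = 'f' AND species = 'goat') ORDER BY name"
).fetchall()

# By De Morgan's laws this is the same as: sex <> 'f' OR species <> 'goat'.
de_morgan = conn.execute(
    "SELECT name FROM animals "
    "WHERE sex <> 'f' OR species <> 'goat' ORDER BY name"
).fetchall()
conn.close()

print(negated)
```

Note that a row with NULL in a compared column can match neither the original condition nor its negation, because comparisons with NULL evaluate to unknown. Add an explicit `IS NULL` test if those rows should be included.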

      -
      -Be a man. Insult me without using an AC.

      --

      - I don't care if they globalize against free speech. All my best free thoughts are done in my head.

    2. Re:sweet! by jonnythan · · Score: 2

      Isn't it amazing how much more you post to slashdot and stuff right after you break up with a girl?

      Just went through it too. Good Luck ;)

  4. Re:One step further by William+Tanksley · · Score: 3

    Nice. However, first things first: any replacement for the current system has to start by doing all the things the current system does, at least as simply. This is the main reason I think 'cd' is a good command to include.

    It's BAD to try for too much with the first release. If you'd like an 'object system', by all means prototype one using conventional directories; you'll decide quickly that it's little different from modern Unix (remember ioctl!). In other words, an overly complex solution.

    We need a true file system, one in which ioctl isn't needed. See the latest plan9 OS for details.

    -Billy

  5. Oracle File System by eric2hill · · Score: 3

    A while back (a year maybe?) Oracle announced their iFS product. Dubbed the Internet file system, it gave file system, IMAP, POP, FTP, and web access to the database through a common software. I haven't had the chance to work with it, and it still may not even be available, but to be able to store files in the database and enforce integrity, it's extremely easy to track revisioning, maintain lists, and perform searches and reports. It seems like wonderful technology that should be a part of every OS, but I'm curious as to performance. Has anyone had any experience with iFS?

    --
    LOAD "SIG",8,1
    LOADING...
    READY.
    RUN
  6. Re:People are missing the point. by drudd · · Score: 2

    You can almost always access BLOBs (or equivalent fields) from different languages and environments; the problem is that they're all different. DBI, JDBC, ODBC, etc... each one has its own gotchas. Being able to access these fields as a part of a filesystem gives it the ultimate portability. Even shell scripts can quickly and easily access the database!

    Doug

    --
    Venn ist das nurnstuck git und Slotermeyer? Ya! Beigerhund das oder die Flipperwaldt gersput!
  7. Re:This project should help MySQL by daniell · · Score: 2
    The additional consideration for this line of thinking is:
    • Does MySQL have the same directed and well defined core of development that linux development had and has? [think kernel here]
    • Is the current base of MySQL well written enough, with enough source infrastructure to survive eventual restructuring during concurrent feature enhancement?
    • Is there, as there was with linux, little competition in similar projects offering a similar feature set that might attract more followers or be a better candidate than MySQL for development attention?
    I'm not saying that MySQL lacks any of these. But there are tons of open-sourced projects that just needed a bit of getting better that never did because they never really were good enough on a source level. Linux is a lucky case, but take heart, if there hadn't been linux, you still could have run most of the GNU system on BSD thanks to GCC.

    Lastly, I'm largely unaware of any linux-only apps that actually make or break a user's choice to use linux vs. any other unix. I think what really makes or breaks the choice is price-point and perceived momentum. pauvre pauvre netBSD.

    -Daniel

  8. Re:This might stimulate nerds and developers by Masem · · Score: 2
    Not necessarily: I'd assume that when you get to the level below the field level in the dir structure, things would behave like links to items. That is, if I have items '325', '326', etc under Employee_ID, and items 'Smith, A.', 'Anderson, N.', etc, those would point to the same set of unique objects, specifically the database records. So Mr. HR guy comes along, and if he knows that Smith gets a $100,000 raise, he doesn't have to know he's employee #326, just that he's an Employee, findable under a standard OS find function.
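
A rough sketch of that "links to the same record" idea, using plain Python dicts in place of directories. The records and salaries are made up; only the IDs and names come from the example above.

```python
# Hypothetical employee records; each dict stands in for one database row.
smith = {"employee_id": 326, "name": "Smith, A.", "salary": 60000}
anderson = {"employee_id": 325, "name": "Anderson, N.", "salary": 55000}
records = [smith, anderson]

# Two "directories" over the same rows. Each entry is a reference to the
# record, much like a link, not a copy of it.
by_id = {str(r["employee_id"]): r for r in records}
by_name = {r["name"]: r for r in records}

# The HR user finds Smith by name and never needs the ID.
by_name["Smith, A."]["salary"] = 100000

# The change is visible through the other path as well.
print(by_id["326"]["salary"])  # 100000
```

Because both lookups resolve to the same object, there is nothing to keep in sync, which is the property a link-based directory layout would need as well.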

    --
    "Pinky, you've left the lens cap of your mind on again." - P&TB
    "I can see my house from here!" - ST:
  9. Good idea by debrain · · Score: 2
    I think this is a brilliant idea. It opens up the database to a whole slew of standard commands, but in particular it makes sure the database has a sensible way of being accessed.

    As well, this would be fantastic for configurations (in particular the complex ones of Gnome and KDE) since large amounts of data could be elegantly compartmentalized in a standard way. I find this nifty given the growing complexity of file structures in these config sets: they would be open to editing and updating through the standard filesystem method, or through a standard SQL query system.

    1. Re:Good idea by Masem · · Score: 2
      XML.

      Seriously. The average config file is a hierarchical structure as opposed to a table structure. This makes it ideal to use XML.

      I think that by making config files XML-based, one could write custom configuration apps that would work with any program. Example: mythical "gappconf" would simply need the rc file as well as some DTD description of the tags for that program, and it would present the standard tree widget with descriptions of what options you can change, what restrictions you have, etc.

      Now, extending your idea above, I would think it's also possible, but beyond my ability, to write a fs that fakes a directory structure based on an XML file, so that you can descend into directories via your favorite shell and change specific options that you want.
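
The mapping such a filesystem would need can be sketched without any kernel work: walk the XML tree and turn each leaf element into a path. This is only the path-mapping half, not a mountable fs, and the rc file contents and element names below are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical rc file; the element names are made up for illustration.
CONFIG = """
<gapp>
  <window>
    <width>800</width>
    <height>600</height>
  </window>
  <editor>
    <font>fixed</font>
  </editor>
</gapp>
"""

def xml_to_paths(element, prefix=""):
    """Map each leaf element to a file-like path holding its text."""
    path = f"{prefix}/{element.tag}"
    children = list(element)
    if not children:
        return {path: (element.text or "").strip()}
    paths = {}
    for child in children:
        paths.update(xml_to_paths(child, path))
    return paths

paths = xml_to_paths(ET.fromstring(CONFIG))
for path, value in sorted(paths.items()):
    print(path, "=", value)
```

One limitation of this sketch: sibling elements with the same tag would collide on one path, so a real design would need some naming rule for repeated elements.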

      --
      "Pinky, you've left the lens cap of your mind on again." - P&TB
      "I can see my house from here!" - ST:
  10. Read "Distributed Systems" by Sape Mullender by mibailey · · Score: 2

    Section 14.2.2 has a discussion of file systems versus databases written by M. Satyanarayanan (of AFS and Coda fame). He says that although file systems and databases have much in common, there are several areas in which they differ conceptually, including encapsulation, naming, and the ratio of search time to usage time. Basically, file systems are appropriate when there is high temporal locality, while databases are used in situations where there is little locality and fine-grained concurrent read and write sharing of data is required.

  11. MysqlFS by cazz · · Score: 2

    All of the high-end RDBMSs use raw file system access. By not using a "regular" file system, you gain a huge performance jump. If MySQL is doing its own locking and recovery, then the overhead from the file system is wasteful.

    Oracle has been doing this for years.

    This is just another important step that MySQL needs to hurdle before it is considered for high-end applications.

    --
    -b
    1. Re:MysqlFS by Erasmus+Darwin · · Score: 2
      All of the high-end RDBMSs use raw file system access. By not using a "regular" file system, you gain a huge performance jump.

      Which actually has nothing to do with the posted article. This article is about creating a virtual filesystem that serves as an interface to the contents of an existing MySQL database (kind of like the /proc filesystem). It doesn't mention anything about creating a special filesystem for MySQL to store its internal data in a more efficient form.

  12. PalmOS by Spankophile · · Score: 2

    This sounds vaguely similar to how PalmOS apps work.

    The entire filesystem is based around the idea of a database, where memory chunks are accessed based on the name of the app, etc....

  13. Re:Isn't this what Reiser FS is for? by gimpboy · · Score: 2

    Warning: I'm human. Sometimes stuff I post here is wrong. Use your head. Question authority

    this must be one of those times. without some way of querying your file system (other than ls), you lose the relational aspect of the database. then you just have a filesystem that stores metadata for fast recovery. this is good from a fs point of view, but it does nothing to help you find the files you are looking for. it provides no relations between files, and does not store file descriptions (that are useful to a human).

    eg. this is an image file that can be categorized under political, humor, bill clinton, letch, etc.

    use LaTeX? want an online reference manager that

    --
    -- john
  14. Re:Liquid file system by Khelder · · Score: 2
    A system like this has been implemented at Xerox PARC. It's called Placeless Documents. It seems to have ended, but there's a follow-on project called Harland that provides an attribute-based storage mechanism for Java (and is available "for trial use", whatever that means).

    I've seen presentations about Placeless Documents and it's really cool.

  15. A DB FS? by jd · · Score: 2

    Hmmm. Interesting concept. Not sure what the use would be. 'where' would be easy to implement in SQL, though.

    --
    It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
  16. Like the AS/400! by kroymen · · Score: 2

    This is a great idea if it's implemented well. The AS/400 is an example of a system that was entirely implemented around the idea of a full-featured DB implemented as a filesystem.
    ...and since it still has one of the best uptime records in the industry, and transaction processing times that consistently rank in the best-of-the-best lists, it's a good platform to imitate. Too often it's overlooked because of the green-screen terminals, but at its core, the AS/400 is easily one of the most advanced implementations of computer technology available to the general public.

  17. Re:Other low-level data storage systems out there? by costas · · Score: 2

    Well, SQL won't work --I just don't need (or want) the slowdown from interpreting and abstracting SQL commands --and yes, I want *all* the speed I can get.

    I am looking for something way lower level. ReiserFS isn't a bad solution, I just don't believe that their plugin API is mature enough to base another project on top of --or am I wrong?

  18. Re:This might stimulate nerds and developers by rho · · Score: 2

    Good point -- didn't think about being able to use the "Find" function of an OS.

    However, how is this better than a dedicated web app for HR flacks?

    Find your favorite non-computer-literate person and see if they even KNOW that there's a "Find Files and Folders" in their Start menu. (I'm assuming a Windows-centric office here.)

    Like I said, I think it's cool and potentially useful, but probably not as useful for non-nerds.

    --
    Potato chips are a by-yourself food.
  19. Re:Liquid file system by William+Tanksley · · Score: 2

    I'm not sure what you mean when you say 'statically' or 'dynamically'. I suspect, however, that you're assuming that the foundation of a fluid file system is a set of files, directories, and links. It's not. It's almost certainly a relational database, one optimised for the task of getting a set of items (objects, files, whatever) which are categorised under a given set of categories.

    I also don't know what you mean by 'number of possible categories'. I think you're mistaking 'categories' for 'sets of categories'. In my example, "/etc/wtanksle" is a set of two categories; "etc" is a category. I could see some reason to cache the results of category queries; that's an optimization concern, and not my specialty. I don't see any reason to try to precache all possible queries, as you seem to imply.

    Its speed will be almost irrelevant; I predict that it'll be about as fast as the current system, but even if it's hundreds of times slower it'll still be fast enough, since caching is trivial and looking up a file based on a full filespec is almost never done.

    -Billy

  20. Re:This project should help MySQL by mr · · Score: 2

    Would the kernel be anywhere near where it is today if people hadn't gotten others interested by writing intriguing, linux-only apps? Probably not.

    Your analysis is wrong.

    Most anything shipped on a linux distro is nothing more than a Unix program PORTED.

    Unless GCC, X and others are 'linux only' apps.

    --
    If it was said on slashdot, it MUST be true!
  21. reiserfs is intended to be a database fs by MenTaLguY · · Score: 3

    Much of the ultimate point of ReiserFS is the marriage of databases and filesystems (filesystems are really just a limited sort of database anyway). This is the reason for all the commercial funding; there are people out there who really want this.

    See Hans Reiser's White Paper for information on where he's going with this.

    For what it's worth, database filesystems are not a new thing at all. Hans is just planning on accomplishing this in a way that completely preserves the Unix file metaphor and related concepts.

    --

    DNA just wants to be free...
  22. Never use a gift horse to do a man's job by laertes · · Score: 2
    This seems like a horrible idea. The idea of both a filesystem and a database is to store data in a (hopefully) secure, long term fashion. However, to call their approaches radically different is an understatement bordering on absurdity.

    A database is about data. The data is partitioned into tables and columns, with a large number of additional constraints (unique, primary key, foreign key and check clauses, for example) to limit the values of the data. Additionally, the data is strongly typed. In order to access this data, SQL supports very high level commands, like SELECT, INSERT, UPDATE and DELETE.

    The power of a database is most basically in its very high level nature. You, as the user/programmer, do not care where the data is, who else is using it, how it is stored, or what the old values were. The database management system takes care of all of that. Other powerful features of databases include indexes, joins, subselects, real NULLs, aggregate (set) functions, and GROUP BYs (sub-setting).

    Now, contrast this with the low level file/directory structure. In this, you have a hierarchy of directories, each of which contains zero or more files. A file is nothing more than a stream of bytes, and the only constraint they have is that they be uniquely named within their directory. Also, a single file can be in more than one directory.

    In order to use a file, the programmer must know where the file is, possibly who else is using it (with lock files, for example), what format the data is stored in and, if they want to be able to undo their actions, the old values. The advantages to files are the plethora of tools for manipulating them (at least in the case of text), and lower startup cost (eg. it takes less time to make a stupid file format than a SQL schema).

    This project is therefore brain-dead as an application development platform. 'But,' I can hear the reply, 'it's useful for users who want to change the data in the database.' Reply: every database accepts SQL, which modifies the data. Some SQL APIs I've seen only take two lines of code to retrieve some data. And SQL won't shit on your data if you accidentally type it in in the wrong format, it'll complain, but your data will be safe and secure.
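
The "it'll complain, but your data will be safe" behaviour can be sketched like this. The schema is hypothetical, and SQLite is used only because it ships with Python; it is loosely typed by default, so the sketch relies on an explicit CHECK clause, whereas a stricter engine would reject the value on type alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema. SQLite is loosely typed by default, so the CHECK
# clause enforces the type and range explicitly.
conn.execute(
    """
    CREATE TABLE employees (
        employee_id INTEGER PRIMARY KEY,
        salary INTEGER NOT NULL
            CHECK (typeof(salary) = 'integer' AND salary >= 0)
    )
    """
)
conn.execute("INSERT INTO employees VALUES (325, 50000)")
conn.commit()

try:
    # A value in the wrong format is refused instead of being stored.
    conn.execute(
        "UPDATE employees SET salary = ? WHERE employee_id = ?",
        ("one hundred grand", 325),
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

salary = conn.execute(
    "SELECT salary FROM employees WHERE employee_id = 325"
).fetchone()[0]
conn.close()

print(rejected, salary)
```

Writing the same bad string into a plain file would succeed silently, which is the contrast being drawn above.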

    This is quite possibly the worst idea I've ever heard. Worse than Linux as an Internet Explorer plugin, worse than Napster as a family tree generator, worse than Quake III as a spreadsheet, and even worse than Apache as a VMS shell.

    Not that I have anything personal against it.

    --

    Yes, I'm still a junky. Are you still a bitch?
  23. Re:Liquid file system by Elbows · · Score: 2

    This sounds really cool, but it seems there could be some problems with implementation. If you build category listings dynamically, this drastically slows down tasks like a simple directory listing (or even locating a file by name), because you start having to do searches. Of course you can speed this up with good indexing, but you still have to pull those indices off the disk and do a fair amount of processing.

    You might be able to build some of the categories statically, but if your fs is truly fluid, then the number of possible categories is gonna be too huge to build and maintain statically. Maybe it needs to be a little less liquid, or maybe you can find a way to identify commonly accessed files/categories and build that stuff statically, then do everything else dynamically.

    I also think this needs to integrate with rather than replace a traditional fs. I doubt this method will ever be as efficient in terms of looking up, creating, and deleting files as a traditional fs, so it would be bad for system stuff like /bin, /tmp, etc. OTOH, it would be great for home directories where the user is mostly storing documents and a relatively minor performance hit isn't noticeable.

  24. Re:More like BeOS's filesystem? by Alderete · · Score: 2

    Early versions of BeOS did use a database FS for the entire system, but they dropped it by R4 (I think that's the right version) because of performance issues.

    The early versions of BeOS used a separate database (not very complex) and filesystem, which wound up being very difficult to work with, so eventually they merged the two. The "database" aspects of the BeOS filesystem are more of being able to add (relatively) arbitrary data to particular filetypes, and do searching based on those criteria. It isn't a formal database in any sense of the word.

    Versions of BeOS prior to the Preview Release had a file system and a separate database. Because it was difficult to keep the data in the two separate systems consistent, it was decided that they should merge. This happened in Preview Release 1, and BFS remains relatively unchanged today.

    At the time there was a lot of enthusiasm for the merged design to be a database-based file system, but after a lot of research, Dominic Giampaolo, the engineer doing the design and coding, determined that wasn't going to work. The reason is it becomes too difficult to filter out the files you aren't interested in. There is a lot of organizational value in a hierarchical, structured, traditional file system.

    The design for BFS that was implemented is best described as an "attribute-adorned file system," with a query engine that can search against the attributes, and some indexing to make common queries fast. There's a fairly simple query language (along with simple GUI tools), but it's not as complex or capable as SQL (nor would you really want it to be). You can execute those queries from the command line if you want, which can be pretty useful when piped to another program (much as find is in Unix, but simpler to work with).

  25. Re:Liquid file system by Hard_Code · · Score: 4

    Damn, you...both of you stole *my* idea! ;)

    For a long time now I've been thinking about filesystem-as-database concept. We've passed the point where computing is about optimizing hardware resources. It is now about optimizing *user* and *information* resources. If your hardware is blazingly fast, but you are lost in a sea of irrelevant information, you can't do anything. I think that's where the database/meta-filesystem comes in.

    With all this rich content around, we should not be searching for files based on some arbitrary linear categorical name. We should be searching on *attributes*. We should be searching on *association*. E.g., "List all files relating to my work that I have stored on my home computer", "Now, of those, show me all files that pertain to status reports". Or "List all data I have on the artists and bands in my music collection". etc.

    This is where plain, flat, hierarchical file systems fail. We need basically a data "repository", and various ways of obtaining information from that repository, based on attributes, categories, mime types, relation to *other* files, etc.

    --

    It's 10 PM. Do you know if you're un-American?
  26. Isn't this what Reiser FS is for? by Bruce+Perens · · Score: 4
    Reiser FS is for building a database as a filesystem. See namesys.org.

    Bruce

    1. Re:Isn't this what Reiser FS is for? by cduffy · · Score: 2

      The reiserfs folks are working on both a plugin API and additional hooks to add access to its DB features (which have been designed into the low-level stuff for quite some time).

    2. Re:Isn't this what Reiser FS is for? by PureFiction · · Score: 3

      Except there is no SQL interface to reiserFS ;)

    3. Re:Isn't this what Reiser FS is for? by Bruce+Perens · · Score: 2

      Yes, you don't have to tell me that :-) .

  27. Liquid file system by William+Tanksley · · Score: 5
    This is exciting on a number of levels, even if the specific database being used isn't my choice; I've been looking for a suitable base for some of my ideas regarding a "fluid file system" (someone else generated a good writeup, calling their ideas a liquid file system, but my name is better :-).

    In my vision, 'documents' would be categorised, and the categories could be viewed in a manner very similar to how we now view directories, except that a file is in more than one folder at a time. A file which is named /etc/wtanksle/ppp.conf could also be referred to as /wtanksle/etc/ppp.conf, or if it's unambiguous, /etc/ppp.conf. /dev/removable gives the list of all removable devices; /dev/scsi gives the SCSI devices (including the removable ones).
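
A toy sketch of that lookup rule, assuming each document is simply stored with a set of categories. The class name, method names, and the device names are hypothetical; a real implementation would sit on a database rather than a Python list.

```python
class FluidIndex:
    """Toy category index: a document is filed under a set of categories."""

    def __init__(self):
        self._docs = []  # list of (name, frozenset of categories)

    def add(self, name, categories):
        self._docs.append((name, frozenset(categories)))

    def lookup(self, path):
        """Resolve '/cat1/cat2/name'; the order of categories is ignored."""
        *cats, name = [part for part in path.split("/") if part]
        wanted = set(cats)
        matches = [
            (n, c) for n, c in self._docs if n == name and wanted <= c
        ]
        if len(matches) != 1:
            raise LookupError(f"{path}: {len(matches)} matches")
        return matches[0]

    def listing(self, *cats):
        """List every document filed under all of the given categories."""
        wanted = set(cats)
        return sorted(n for n, c in self._docs if wanted <= c)


index = FluidIndex()
index.add("ppp.conf", {"etc", "wtanksle"})
index.add("sda", {"dev", "scsi"})
index.add("zip0", {"dev", "scsi", "removable"})
index.add("fd0", {"dev", "removable"})

print(index.lookup("/wtanksle/etc/ppp.conf"))
print(index.listing("dev", "removable"))
```

A short path such as `/etc/ppp.conf` resolves only while it is unambiguous; once a second `ppp.conf` is filed under `etc`, `lookup` raises instead of guessing.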

    The potential uses are many -- I think it would make a lot of common computer tasks a lot easier.

    Oh well -- anyhow. :-)

    -Billy

    1. Re:Liquid file system by gimpboy · · Score: 3

      i've been working on something sort of similar. i upload a file into the database (currently storing the files on a normal partition) and the file has associated with it a file type, description, md5 hash, and a couple other things. now whenever i want a picture of clinton, i do a select where file type is image and description has the word clinton in it. right now i only have a php interface, but i have a friend who's going to do a perl/console interface.

      the cool thing is that i can stream the data via apache to whatever application i want. so i'm going to upload all of my mp3's and build file lists based on the primary keys of the files. then i can stream the data to mpg123/xmms. it works really well, and since i store the md5sum i can prevent myself from storing exact copies of a file.
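
A minimal sketch of the metadata table and the md5-based duplicate check described here. The column names are guesses based on the description, and SQLite stands in for postgres so the example runs anywhere.

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: the post names file type, description and md5 hash.
conn.execute(
    """
    CREATE TABLE files (
        id INTEGER PRIMARY KEY,
        file_type TEXT NOT NULL,
        description TEXT NOT NULL,
        md5 TEXT NOT NULL UNIQUE
    )
    """
)

def store(data, file_type, description):
    """Record a file's metadata; return False if the content is a duplicate."""
    digest = hashlib.md5(data).hexdigest()
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO files (file_type, description, md5) "
                "VALUES (?, ?, ?)",
                (file_type, description, digest),
            )
    except sqlite3.IntegrityError:
        return False
    return True

first = store(b"fake image bytes", "image", "clinton at a podium")
second = store(b"fake image bytes", "image", "same picture again")
third = store(b"fake audio bytes", "audio", "some song")

# The select from the post: images whose description mentions clinton.
hits = conn.execute(
    "SELECT description FROM files "
    "WHERE file_type = 'image' AND description LIKE '%clinton%'"
).fetchall()

print(first, second, third, hits)
```

Putting the UNIQUE constraint on the hash lets the database do the duplicate check, so the application never has to search before inserting.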

      i'm using postgres right now. if they had the ability to mount raw partitions, and get over the 8k limit (this one's coming soon) that would be great. it would make backups easier. now i just have to dump the database and then backup the db dump and the /files directory to tape.



      use LaTeX? want an online reference manager that

      --
      -- john
  28. This project should help MySQL by Galvatron · · Score: 3
    The licensing issues are, for many people, the MOST important. Just because this won't be very good at first doesn't mean it won't improve, and most likely the more popular this project gets, the more work will get done on MySQL.

    Take the linux kernel. Would the kernel be anywhere near where it is today if people hadn't gotten others interested by writing intriguing, linux-only apps? Probably not. Perhaps one day MySQL will evolve to the point where this will be useful, perhaps due to developers attracted by this project.

    --
    "The question of whether a computer can think is no more interesting than that of whether a submarine can swim" -EWD
    1. Re:This project should help MySQL by Bruce+Perens · · Score: 2
      Well, he's a bit confused but not too far off. The kernel would have been a heck of a lot less interesting if there hadn't already been the GNU system to run on top of it.

      Thanks

      Bruce

  29. This might stimulate nerds and developers by rho · · Score: 2

    The everyday user won't exactly go nuts over it, though.

    The site gives the example "imagine marketroids browsing through the directories to directly access columns and entries" (or words to that effect)

    No way. Hey, don't get me wrong, I LIKE that idea, and it gives me a pretty cool idea for a couple of projects that I'm working on, but think carefully about it: any sufficiently useful database for a large company is also sufficiently large that a directory tree is absolutely the slowest and most confusing way to access data held within a database.

    For example, compare two approaches:

    • input SQL directly: "update employees set salary = '100000' where employee_id = '325'"
    • browse directory tree: "okay, double-click on 'Databases', double-click on 'Human-Resources', double-click on 'Employees', scroll down until you see '325', double-click on '325', then double-click on 'Salary'. Change the number to '100000'."
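
The two routes end in the same statement. As a sketch, here is how a write to a path like `/employees/325/salary` could be translated into that UPDATE; the path layout, the whitelist tables, and the function name are assumptions for illustration, not how MySQL-FS is actually implemented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id TEXT PRIMARY KEY, salary TEXT)")
conn.execute("INSERT INTO employees VALUES ('325', '50000')")

# Identifiers cannot be bound as parameters, so they are checked against
# a whitelist. The table layout and key column here are assumptions.
KEY_COLUMNS = {"employees": "employee_id"}
WRITABLE = {"employees": {"salary"}}

def write_path(path, value):
    """Treat writing to /<table>/<key>/<column> as a single-field UPDATE."""
    table, key, column = path.strip("/").split("/")
    if column not in WRITABLE.get(table, set()):
        raise PermissionError(path)
    sql = f"UPDATE {table} SET {column} = ? WHERE {KEY_COLUMNS[table]} = ?"
    cursor = conn.execute(sql, (value, key))
    return cursor.rowcount

changed = write_path("/employees/325/salary", "100000")
salary = conn.execute(
    "SELECT salary FROM employees WHERE employee_id = '325'"
).fetchone()[0]

print(changed, salary)
```

Values are bound as parameters, while table and column names are validated against fixed sets, since a path component is user input and must never be pasted into SQL unchecked.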

    It's not bad, but it's not as good. Plus, with good programmers (and good communication between programmers and management), the SQL is so abstracted out, it makes no difference. It gets condensed to a list of names and a checkbox next to the names. Those that get "checked" get a raise to $100,000.

    To be truly useful to non-programmers (or non-analytical thinkers, if you will), the MySQL-FS would have to abstract out so much of the Database, you're back to a filesystem and a set of scripts to update a MySQL database.

    It's cool, but it's not for your regular joe. Beyond a couple of levels, the average computer user gets lost in a hierarchical filesystem -- assuming they don't fill it up with "Untitled Folders" and such.

    --
    Potato chips are a by-yourself food.
    1. Re:This might stimulate nerds and developers by rho · · Score: 2

      and POW! You've got the religion!

      Currently, programmers are about as far removed from the computers they work on as a chimpanzee is from a spider monkey. An end user is an armadillo.

      People DON'T think in terms of files and folders -- that's a holdover from filing cabinets where it is the only method to wrangle and organize physical documents. People think in terms of tasks and projects.

      If you're working on a project and you remember something you did 2 years ago that you can re-use (or at least build from), you don't say, "It's in that tape we backed up to in the 'Projects for Harold' folder". You say "Didn't we do something similar for Harold a few years ago? Something to do with sand and apricots, I think"

      You pull up Harold's info, look through the list of projects done for him, and find one described as "Peaches and Beaches" -- bingo, you've found it.

      --
      Potato chips are a by-yourself food.
  30. whats the point? by Sanity · · Score: 2
    Why does everyone associated with SQL suddenly think that the more things that use that bloated, outdated language, the better? What is the point of this? It is just adding a very thick, slow, and unnecessary layer to something where it is normally essential that it works as quickly as possible.


  31. Re:Other low-level data storage systems out there? by cduffy · · Score: 2

    You can compile your SQL statements at the beginning of your program, so that they aren't reinterpreted later. Thus, unless load time is essential to you, you may be better off with an existing database.

    Regarding the ReiserFS plugin API, you're probably right. However, you don't necessarily need plugins if your project is simple enough. That is to say, if all you're doing is associating a set of data with a key, you make a file (named by the key) and put the data in. Need multiple keys? Use symlinks.
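
The file-per-key idea with symlinks for extra keys can be sketched on any Unix filesystem. The helper names and the record format below are made up, and keys must be valid file names for this to work.

```python
import os
import tempfile

root = tempfile.mkdtemp()

def put(key, data):
    """Store data in a file named by its primary key."""
    with open(os.path.join(root, key), "w") as handle:
        handle.write(data)

def alias(new_key, existing_key):
    """Add a second key as a relative symlink to the first."""
    os.symlink(existing_key, os.path.join(root, new_key))

def get(key):
    """Read the data stored under a key or any of its aliases."""
    with open(os.path.join(root, key)) as handle:
        return handle.read()

put("326", "Smith, A.;100000")
alias("smith", "326")

same = get("smith") == get("326")
print(same)
```

On a conventional filesystem each small record still costs at least one block and one inode, which is the overhead that efficient small-file support is meant to remove.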

    If your project is of some size, lightweight file support will likely be done before you are (it certainly will if you throw Reiser some money -- he funds his team that way).

    Really, though, I think SQL is almost certainly your best option. The hashing and caching done by most modern databases more than makes up for whatever speed is lost to SQL support -- and once again, that speed loss is a load-time thing only if you write your app correctly.

  32. Turn the unix philosophy on its ear... by Nugget94M · · Score: 2

    Unix: "Everything is a file" Linux: "Everything is a file except for the files, which are records."

  33. Re:Other low-level data storage systems out there? by costas · · Score: 2

    Hmm... yes, Reiser may be a way to go. Some benchmarks are in order. But, alas, SQL-based DBs are still too slow for what I am planning. SQL commands/queries etc. may be interpreted to some intermediate language/bytecode, *but* the real slowdown comes from the abstraction layers needed to support SQL queries and the like.

    Again, for a normal application you're absolutely right. But if you want to push/crunch a few GBs around in a coupla minutes, every little slowdown counts :-)... That's why most high-end datamining applications don't use RDBMSs...

  34. Re:Other low-level data storage systems out there? by costas · · Score: 2

    Well, I am glad you're happy, but just about anything implementing a b-tree or skip-list exclusively in RAM will get blazing speeds. The problem of course is, what happens when your application's needs exceed practical RAM sizes (say 7-8GBs these days)?

    I think a well-balanced solution with cache and FS-level access (ReiserFS maybe, in a coupla years from now) will do better. Although, I am really more impressed with SGI's XFS.

  35. To see how this works, check out IBM AS/400 by Loge · · Score: 3

    The AS/400 uses a relational database as a universal data store for all system, application, and user data resources. The database is protected with very fine-grained access privileges and managed with well-defined administrative tools, which dramatically boosts security (since there is only one global security mechanism to manage all system and application resources).

    This approach also simplifies development, which helps to make the AS/400 such a powerful application engine.

  36. Re:Other low-level data storage systems out there? by costas · · Score: 2

    No, not quite that low-level :-)... B/B* tree implementation and the ability to handle well over 2GB of data comfortably (speed wise) is also a must --say around the neighborhood of ~1TB. Multi-user capabilities are also good, and ACID would be cool, but not a must.

  37. Re:Other low-level data storage systems out there? by cduffy · · Score: 2

    The abstraction layers on *your* end or that of the database? The former don't need to exist (one word: "inline") and the latter have been optimized very, very heavily.

    Not all SQL-based databases are alike. If you have the hardware budget for a {SMP,clustering,mainframe} system, a good RDBMS will take advantage of it -- something which might not be said of solutions optimized to perform well on lower-end hardware.

    So, yes -- do your benchmarks, on hardware comparable to what you'll be using for your actual production system. And don't count SQL-based DBs out yet; I would be entirely unsurprised if the overhead which makes them flexible is more than made up for by the heavy optimizations done elsewhere.

  38. They're "duals" of one another by Christopher+B.+Brown · · Score: 2
    In some senses. But they're not exactly isomorphic.
    • ReiserFS provides a way that you should be able to efficiently build a DB hierarchically as a set of directories and files, where files are the "leaf nodes" that contain field data, and where you might use symbolic links to represent secondary indices.

      It would provide pretty "weak typing" of a sort of TCLish style where "everything is a string, sort-of."

    • In contrast, MySQL provides a way of representing "structured data," with "strongly typed fields." And the filesystem view provides a convenient way of looking at that data.
    In effect the ReiserFS approach is to provide a way of building "weakly-typed" hierarchical databases; MySQLFS provides a way of putting a conveniently-browsable hierarchy on top of a strongly-typed relational database.
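
    For what it's worth, here is a rough Python sketch of the first bullet: records as directories, fields as small files holding strings, and a symbolic link acting as a secondary index. It runs on any Unix filesystem, not ReiserFS specifically, and the layout (by_id/, by_last_name/) and the function names are made up for illustration.

```python
import os
import tempfile

def put_record(root, table, key, fields):
    """Store one record as a directory; each field is a small file holding a string."""
    rec_dir = os.path.join(root, table, "by_id", key)
    os.makedirs(rec_dir, exist_ok=True)
    for name, value in fields.items():
        with open(os.path.join(rec_dir, name), "w") as f:
            f.write(str(value))  # everything is stored as a string (weak typing)
    return rec_dir

def index_record(root, table, field, key):
    """Add a secondary index entry: a symlink named after the field's value."""
    rec_dir = os.path.join(root, table, "by_id", key)
    with open(os.path.join(rec_dir, field)) as f:
        value = f.read()
    idx_dir = os.path.join(root, table, "by_" + field)
    os.makedirs(idx_dir, exist_ok=True)
    os.symlink(rec_dir, os.path.join(idx_dir, value))

def lookup(root, table, field, value, wanted):
    """Follow the index symlink and read one field of the record behind it."""
    path = os.path.join(root, table, "by_" + field, value, wanted)
    with open(path) as f:
        return f.read()

root = tempfile.mkdtemp()
put_record(root, "users", "11023", {"last_name": "Brown", "age": 42})
index_record(root, "users", "last_name", "11023")
age = lookup(root, "users", "last_name", "Brown", "age")
print(repr(age))  # the integer went in, a string comes back
```

    Note that the age comes back as a string even though an integer went in, which is the "everything is a string, sort-of" point. A typed column in a relational table would keep the integer.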

    There are probably a lot of useful applications out there that wouldn't care much about the distinctions. That probably parallels the way that a lot of applications out there don't really care that MySQL does not satisfy the ACID properties or offer triggers, foreign keys, or other such things.

    It also might be regarded as parallelling the way that Lisp-like languages have "strongly-typed data" with dynamic typing, which is a bit the way ReiserFS might be used, whilst "MySQLFS" looks a bit more like the "static strong typing" of ML/Haskell. Which is a rather weaker analogy...

    In any case, the distinctions between ReiserFS-as-DB and MySQLFS are fairly strong. MySQLFS looks a lot, by the way, like the NameSpace concept in Casbah.

    --
    If you're not part of the solution, you're part of the precipitate.
  39. Zope file system by Doc+Hopper · · Score: 3
    One advantage of Zope is easy access to the database via FTP. Although this isn't a true "UNIX file system", it can demonstrate the value of using a DB filesystem -- you FTP files up, and with built-in versioning you can view any number of versions via the Zope interface.
    I believe that is one of the goals of ReiserFS as well -- that database vendors use file systems to store data instead of having to use raw disk partitions, or deal with file system overhead plus database overhead...

    Matt Barnson

  40. How would this work? by PureFiction · · Score: 2

    Let's see...

    goober:$ cd /mnt/sqldb
    goober:$ ls
    USER_ID FIRST_NAME LAST_NAME TIMESTAMP
    goober:$ mkdir AGE
    goober:$ echo "Oh crap, there goes my schema!"
    Oh crap, there goes my schema!
    goober:$ cd USER_ID
    goober:$ ls
    11023 11025 11044 11055 11092
    goober:$ rm 11023
    goober:$ echo "Wow! I hope that wasn't relational!"
    Wow! I hope that wasn't relational!
    goober:$ exit

    Seriously, what type of integrity checking will be enforced in this filesystem?

    I am betting that you either have robust integrity, which would give a completely counterintuitive file system, or lax integrity, which would open the doors for all sorts of mischievous errors and data corruption.
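
    To make the "robust integrity" half of that concrete, here is a small Python sketch using SQLite as a stand-in (not MySQL, and not anything from the actual MySQL FS code). The logins table and the rm_user helper are hypothetical. With foreign keys enforced, the delete behind an "rm 11023" is simply refused, which is exactly the counterintuitive behavior a filesystem user would run into.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, first_name TEXT)")
conn.execute(
    "CREATE TABLE logins ("
    " login_id INTEGER PRIMARY KEY,"
    " user_id INTEGER NOT NULL REFERENCES users(user_id))"
)
conn.execute("INSERT INTO users VALUES (11023, 'Goober')")
conn.execute("INSERT INTO logins VALUES (1, 11023)")
conn.commit()

def rm_user(conn, user_id):
    """Roughly what 'rm 11023' would turn into; returns True if the row was removed."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("DELETE FROM users WHERE user_id = ?", (user_id,))
        return True
    except sqlite3.IntegrityError:
        # Another table still references this row, so the database refuses.
        return False

removed = rm_user(conn, 11023)
remaining = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(removed, remaining)
```

    A real filesystem layer would then have to map that refusal onto some errno, and picking one that makes sense to "rm" is part of the problem.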

    1. Re:How would this work? by runswithd6s · · Score: 2

      Open your mind a little further. You're only envisioning the first steps. Certainly being able to traverse the database like a filesystem is an extremely powerful interface. Data integrity could be preserved by interacting with the shell the user is in while (s)he is trying to manipulate data. For example:

      goober:$ cd USER_ID
      goober:$ ls
      11023 11025 11044 11055 11092
      goober:$ rm 11023
      Removing USER_ID 11023 will also remove the data for this user in the FIRST_NAME, LAST_NAME, TIMESTAMP, and AGE fields. Do you really want to do this? (Y/n) Yes
      goober:$

      Get the picture? ;-) Essentially, the interface will have to understand how to translate a row into a hierarchical representation. Getting a useful and easy-to-understand interface this way may not be the easiest thing to do. Add the difficulty of resolving foreign key dependencies and such a CLI would get quite confusing. An interesting spin on this would be to bind canned queries to directory names. Still, it has a very limited scope. Complex queries would be difficult to attain through such a CLI.

      --
      assert(expired(knowledge)); /* core dump */
  41. Re:Other low-level data storage systems out there? by Abcd1234 · · Score: 2

    Umm... Berkeley DB and your favorite C compiler? ;)

  42. Integrity of data in a database filesystem. by lamester · · Score: 2

    This database filesystem might have a real advantage in terms of keeping the database records in a consistent state. Remember that in Unix, files hold arbitrary data. So a journaling filesystem tends to keep only the metadata in a consistent state; it doesn't do much about the application data written into the files. However, if MySQLFS had knowledge of the records being written, presumably it could do a lot of the cool stuff done in mainframe OSes to ensure integrity. I don't know the VFS interfaces though, so I'm not sure if this is implementable under the current Linux framework.

  43. well, wasn't that just lovely by tewwetruggur · · Score: 2
    I do regret that the post this is attached to happened.

    This is what happens when you discover that your co-worker has been posting crap as cyb0rq_m0nk3y, and then they feel that it would be funny to post their inane rant on my computer while in the restroom.

    Makes us (the tewwetruggur contingent) look far dumber than normal.

    Again, my apologies.

    --
    Hi! This is the Sig, blatantly attached to the end of this comment.
  44. Re:A real world use for this by rho · · Score: 2

    I've never been thrilled with the performance of storing LOBs in any kind of DB -- Oracle, PostgreSQL, or MySQL. The plain-old filesystem tends to do it better and faster. I usually store the path to an object in the DB instead.
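
    A minimal Python sketch of that "path in the DB, bytes on the filesystem" pattern, using SQLite as a stand-in database. The objects table and the save_object/load_object helpers are invented for the example.

```python
import os
import sqlite3
import tempfile

store = tempfile.mkdtemp()  # directory that holds the actual binary objects
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE objects (name TEXT PRIMARY KEY, path TEXT NOT NULL)")

def save_object(name, data):
    """Write the bytes to a plain file and record only its path in the DB."""
    path = os.path.join(store, name)
    with open(path, "wb") as f:
        f.write(data)
    with conn:
        conn.execute("INSERT OR REPLACE INTO objects VALUES (?, ?)", (name, path))

def load_object(name):
    """Look the path up in the DB, then read the bytes from the filesystem."""
    row = conn.execute("SELECT path FROM objects WHERE name = ?", (name,)).fetchone()
    if row is None:
        raise KeyError(name)
    with open(row[0], "rb") as f:
        return f.read()

save_object("logo.gif", b"GIF89a...")  # placeholder bytes, not a real image
data = load_object("logo.gif")
```

    The catch is that the file write and the row insert are two separate steps, so a crash between them can leave an orphaned file or a dangling path.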

    That being said, I have used the LOB storage in Postgres to implement a versioning system for in-house work (and it worked well enough to prove to me that it's do-able, but not well enough to actually use). The concept is sound, but the implementation needs some work.

    However, using a DB's LOB is a helluva lot better than using CVS for binary objects. CVS seems afflicted with unseemly memory bloat when checking in/out large binary objects...

    --
    Potato chips are a by-yourself food.
  45. More like BeOS's filesystem? by Trinition · · Score: 2

    Would this MySQL-based file system be more like BeOS's file system, where files can have arbitrary attributes at the FS level, and you can query based on them?

    1. Re:More like BeOS's filesystem? by Shadowlion · · Score: 5

      No. I believe BeOS just has a meta attribute at the FS level filled with supposed attributes.

      BeOS doesn't have one big clump of data that is partitioned; it has a lot of little bits of data. There is nothing "supposed" about the attributes.

      For instance, an "MP3" datatype might have fields for Artist, Title, Album, Year, and Comments. You could then search for any song with the word "Land," performed by a group with "Men" in their name, on any album between the years 1982-1988, with the phrase "sounds like crap" in the comment field.

      There's no way to actually define a new field at the FS level.

      Sure there is. Preferences->Filetypes allows you to add new attributes to a particular filetype, as well as define new filetypes. It's up to the associated applications to do anything meaningful with the new field or fields, but you can pretty much do what you want.

      On the command line, I don't think you can manipulate a global filetype. However, for individual files, you can add your own attributes, delete existing ones, and so on.

      Early versions of BeOS did use a database FS for the entire system but they dropped it by R4 (I think that's the right version) because of performance issues.

      The early versions of BeOS used a separate database (not very complex) and filesystem, which wound up being very difficult to work with, so eventually they merged the two. The "database" aspects of the BeOS filesystem are more of being able to add (relatively) arbitrary data to particular filetypes, and do searching based on those criteria. It isn't a formal database in any sense of the word.


      --

    2. Re:More like BeOS's filesystem? by Drone-X · · Score: 2
      Would this MySQL-based file system be more like BeOS's file system, where files can have arbitrary attributes at the FS level, and you can query based on them?

      No, the MySQL filesystem will only provide the means to access regular databases as if they were filesystems.

      I imagine that in the root of the mount point there will be a directory tables/ with a list of tables in it. This is not like the BeOS's filesystem that stores files in a table.

      Nonetheless, this is pretty cool. If a stable version of this thing is released (which is probably not in the near future), I imagine using this to make backups to a CVS server (like the article suggested).

  46. What about metadata by Bruce+Perens · · Score: 2
    I suppose that file-associated metadata, or simply a file containing metadata, could take care of typing.

    Thanks

    Bruce

  47. Is MySQL ready for that? by Trinition · · Score: 5

    No offense to MySQL, but is it ready for such a task? Last I heard, MySQL didn't have record-level locking except in some experimental forks. Are there any features lacking from MySQL that might make another database more appropriate (ignoring their licenses for the moment)?

  48. *holds up sign* by Enahs · · Score: 2

    "Please do not feed the trolls."

    Hell, I will anyway. WTF are you talking about??? GTK *flickers*? Since when? I used to have GTK apps on an old 40MHz 486 (and it was a DLC machine at that...a Cyrix 486 that plugged into a 386 mobo) and I didn't see said flickers *unless the app was poorly written, was doing animation, and didn't double-buffer.*

    Bah, Troll Tech wrote somewhere? The PR department wrote an official release that said something along the lines of "we did not emulate the slow and flickery refresh of GTK(or was it gnome?)" Bullshit. Show me the link. Why would Troll Tech have a position on GNOME, anyway? They don't compete with GNOME in any way. They write a toolkit. I've seen some poorly-written QT programs that display flicker like all hell, and I've seen some GTK apps with decent animation. I've also seen well-written QT apps that display no flickering, and bad GTK apps that do. It depends on the app, I suppose.

    --
    Stating on Slashdot that I like cheese since 1997.
  49. One step further by Travis+Fisher · · Score: 2
    Think of the file system as a database, yes, but think of the files not as records but as objects in the sense of object-oriented programming. A file is an instance of a particular class of object [*]. Associated with that class are various methods (e.g. associated with an HTML file are methods to view it with netscape, or edit it in my favorite raw text editor, or edit it with my favorite HTML composer, etc.; associated with an executable file is the execution of it). There is a hierarchy of classes with inheritance (the class of HTML files is a subclass of PLAINTEXT files).

    [*] really the class has other data structures besides the actual file data: e.g. file name, a field for comments about the file, etc., which may vary from class to class

    There are also a variety of classes which serve as containers. The most obvious are what traditionally are directories or desktops. Another container class is "query", which has typical database search methods associated. These can be saved, copied, etc.

    Imagine this: your command line should not be associated with a particular directory location, but rather a particular query. On the command line you most frequently use "cq" ("change query"), "rq" ("restrict query"), and "eq" ("expand query"). So to view the penguin image I know lurks somewhere on my drive, the sequence would be something like

    % cq type=image
    5037 files selected
    % rq *penguin*
    2 files selected
    % ls
    penguin_57.jpg
    penguin.gif
    % ./penguin.gif
    No default action for type "gif"; performing default action for type "image": opening penguin.gif with gimp...
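
    The session above can be mocked up in a few lines of Python, purely at the user-interface level and without touching the filesystem. Everything here is hypothetical: the QueryShell class, the in-memory FILES list standing in for the drive, and the choice of glob matching for "rq".

```python
from fnmatch import fnmatch

# Stand-in for file metadata the system would keep; contents are invented.
FILES = [
    {"name": "penguin_57.jpg", "type": "image"},
    {"name": "penguin.gif", "type": "image"},
    {"name": "sunset.png", "type": "image"},
    {"name": "notes.txt", "type": "text"},
]

class QueryShell:
    """A shell whose 'current location' is a query result, not a directory."""

    def __init__(self, files):
        self.all_files = files
        self.selected = list(files)

    def cq(self, **attrs):
        """Change query: start over from all files matching the given attributes."""
        self.selected = [f for f in self.all_files
                         if all(f.get(k) == v for k, v in attrs.items())]
        return len(self.selected)

    def rq(self, pattern):
        """Restrict query: keep only selected files whose name matches the glob."""
        self.selected = [f for f in self.selected if fnmatch(f["name"], pattern)]
        return len(self.selected)

    def ls(self):
        return [f["name"] for f in self.selected]

sh = QueryShell(FILES)
n_images = sh.cq(type="image")
n_penguins = sh.rq("*penguin*")
names = sh.ls()
print(n_images, n_penguins, names)
```

    An "eq" (expand query) would be the same idea run against all_files with the union of old and new matches.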

    (And, of course, there are obvious database sorts of features that any sensible graphical file explorer should have...)

    To summarize:
    (1) YES!!! Regardless of how exactly the system implements it, the filesystem should be interfaced as a database.
    (2) Furthermore, don't view files just as RECORDS -- view them as active OBJECTS that are instances within a hierarchical class structure.

    Finally, I think a lot of this can be done just with user interface, without having it explicitly in the filesystem. In fact, things have definitely been moving in this direction, at least for graphical file explorers. Has anyone added this sort of thing to a command shell?

  50. Au Contraire by Christopher+B.+Brown · · Score: 2
    The fact that MySQL doesn't have all those funky "relational features" like foreign keys, triggers, rules, and stored procedures means that this sort of "view" is just about perfect.

    All those complications that MySQL eschews are the sorts of things that would muss up the idea of viewing "database as FS hierarchy."

    And as for the "locking" and "transactional" issues, the point is not terribly different. Filesystems generally don't provide ACID properties; neither does MySQL; that fits together well.

    Mind you, it's quite possible that there's a much bigger controversy concerning stability; based on the MySQLFS web page, it appears that they're passing a CORBA IOR into the kernel. What can that possibly mean other than that they're assuming the presence of the "kORBit" implementation in the kernel? The flaming that surrounded "Why don't we try putting an ORB in the Linux kernel?" was much more vigorous than any flaming about MySQL lacking some ACID features! :-).

    --
    If you're not part of the solution, you're part of the precipitate.
  51. Re:Other low-level data storage systems out there? by cduffy · · Score: 2

    First off, if you want something with serious DB features but without using SQL, you'd do well to just write a wrapper which adds/looks up entries in an SQL database but can be accessed without SQL. I don't know of anything like this existing right now simply because people who want serious database features (or who are writing a serious database) use SQL.

    Well, almost.

    You can also use ReiserFS -- particularly in a little while, after it implements lightweight files (thus reducing the amount of overhead for each record). Yup, ReiserFS has low-level support for relational storage, and lots of Other Cool Stuff. I understand that Squid has accelerated support for it; I've also seen a system for indexing newsgroup articles that uses ReiserFS as its backend. Roughly put, this is possible because of ReiserFS's blazing speed when working with small files; it also has a plugin API (in-progress?) and Assorted Other Good Stuff.
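
    As for the wrapper idea from the first paragraph, a minimal sketch in Python might look like the following. It uses SQLite as the backing store only because it ships with Python; the Table class and its put/get methods are invented here and are not an existing library.

```python
import sqlite3

class Table:
    """Tiny put/get wrapper so callers never write SQL themselves."""

    def __init__(self, conn, name):
        if not name.isidentifier():
            # Table names cannot be bound as parameters, so validate them.
            raise ValueError("bad table name")
        self.conn = conn
        self.name = name
        with conn:
            conn.execute(
                f"CREATE TABLE IF NOT EXISTS {name} (k TEXT PRIMARY KEY, v BLOB)"
            )

    def put(self, key, value):
        with self.conn:  # each put is its own transaction
            self.conn.execute(
                f"INSERT OR REPLACE INTO {self.name} VALUES (?, ?)", (key, value)
            )

    def get(self, key, default=None):
        row = self.conn.execute(
            f"SELECT v FROM {self.name} WHERE k = ?", (key,)
        ).fetchone()
        return default if row is None else row[0]

conn = sqlite3.connect(":memory:")
t = Table(conn, "records")
t.put("11023", b"payload")
t.put("11023", b"newer")   # overwrites the earlier value
value = t.get("11023")
missing = t.get("nope")
```

    The SQL layer is still there underneath, so this hides the query language without removing whatever overhead it carries.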

  52. Be OS by BoySetsFire · · Score: 2

    The Be OS has had a database-like, journaling filesystem since its very first release. It's the best of both a database and a file system. I don't know what I would do without it. It makes sorting my thousands upon thousands of mp3s a snap. Add a CD to the collection, fill in the attributes for genre, album, year of release, and so on, and I have a fully searchable collection.

    --
    "One man's "magic" is another man's engineering."-- Robert A. Heinlein
  53. Re:Just because you can ... by MarkCC · · Score: 2


    Well, but maybe you should.

    Sure - a DB accessed as a filesystem doesn't present the full power of the DB through the filesystem API. And sure, a DB filesystem doesn't necessarily have the same performance characteristics as a standard filesystem.

    But there are some very significant applications where a DB presented as a filesystem makes brilliant sense. Here are two simple ones off the top of my head.

    Configuration management. Systems like CVS go to great trouble to get transactional behavior, so that you can't lose code if the program crashes in the middle of an update. If you're using a DBFS, you've got transactionality and rollback for free.

    Micro-applications. There are a lot of simple applications which really need transactionality/rollback facilities, but which can't (either for portability or for size reasons) make use of a complete transactional database facility. Write it to access files, and let the database take care of transactions.

    I don't have anything to do with this project, but I think it's a great idea, because I'm doing almost the same thing with DB2. (Why DB2? Because I work for IBM Research..) I'm building an SCM system, and I don't want the higher layers of my system to need to understand the database or the particular table layout that I'm using. So they access it as a filesystem; down below, it's a rock-solid database.

    Of course, all of the above assumes transactionality - which is not yet fully supported by MySQL. So I'd be a little paranoid before using this, to make certain that they're using the transactional tables!

    -Mark

  54. A real world use for this by sphix42 · · Score: 2

    Many people like to store binaries, such as images, in MySQL databases. This would really help improve their ability to code this.

    As PHP is used in conjunction with MySQL a lot, functions like move_uploaded_file could be used to store blobs in the database instead of an INSERT into a blob field. That would make your code much easier to read but, of course, make the server setup a lot more complicated.

    Without row level locking, however, you will face bottlenecks if you try to do anything besides a mostly read-only file system.

  55. Other low-level data storage systems out there? by costas · · Score: 2

    Let me go OT for just a second here: does anybody out there know of any open-source systems that can do large-scale data storage *without* SQL? I am thinking of a simple C/C++ API that you can use to retrieve and write data from/to tables/fields, nothing much fancier than that. So far, my best bet seems to be ColdStore. Any other pointers?

  56. Re:easy & dangerous. Like sirens? by Pinball+Wizard · · Score: 2
    I have long wanted a database to be integral to the OS. However, the way I have thought of it, instead of using the filesystem you would have a table. The field names would be something like "FileName", "Owner", "LastModified", "Text", etc.

    This would be great for text based files and spreadsheets. The possibilities for searching and updating your files would be greatly enhanced by having them maintained by a database.

    I don't think a database would be appropriate for graphics or music files (other than storing pointers to those files), but certainly any text-based file would be ideal.

    Given my thoughts on how a database enabled filesystem would work, I don't think very many joins or triggers would be necessary. Most things could be handled by single tables.

    Besides, there is the matter that MySQL doesn't support foreign keys or triggers anyway, and last I checked those features weren't on the to-do list. :)

    --

    No, Thursday's out. How about never - is never good for you?

  57. Re:Please explain by (void*) · · Score: 2

    Not just that, but for a robust DB, the commit and rollback operations are atomic. It either happens or it doesn't. No half-way measures. So if your disk crashes during a commit operation, you are guaranteed that either the operation went through or not. No mangling of the data.
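
    A small Python illustration of that all-or-nothing behavior, using SQLite as a stand-in. The accounts table and the transfer function are made up, and the "crash" is only a simulated exception in the middle of the operation; surviving a real disk failure depends on the engine's journaling.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 100), ("b", 0)])
conn.commit()

def transfer(amount, fail_midway=False):
    """Move money from a to b; both updates happen or neither does."""
    with conn:  # commit on success, rollback if an exception escapes
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = 'a'", (amount,)
        )
        if fail_midway:
            raise RuntimeError("simulated crash between the two updates")
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = 'b'", (amount,)
        )

try:
    transfer(30, fail_midway=True)
except RuntimeError:
    pass
balances = dict(conn.execute("SELECT name, balance FROM accounts"))

transfer(30)
after = dict(conn.execute("SELECT name, balance FROM accounts"))
print(balances, after)
```

    After the failed attempt the balances are untouched, with no half-debited account; after the clean run both sides have changed together.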

  58. Re:SQL? by Nafai7 · · Score: 2

    SQL is a language used for database interaction.

    SELECT Age, Height, Name FROM MyAddressBook WHERE Age<18

    The above statement will return the fields Age, Height, and Name from the table "MyAddressBook" and limit the rows to only those where Age is less than 18.

    There are also ways to use SQL to insert data, create tables, and lots and lots of other stuff.
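
    If you want to try it, here is a runnable version in Python with SQLite, covering the create-table and insert parts too. The column types and the three sample rows are invented, and an ORDER BY is added only so the output order is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Create the table and insert some made-up rows.
conn.execute("CREATE TABLE MyAddressBook (Name TEXT, Age INTEGER, Height INTEGER)")
conn.executemany(
    "INSERT INTO MyAddressBook VALUES (?, ?, ?)",
    [("Ann", 12, 150), ("Bob", 34, 180), ("Cy", 17, 172)],
)

# The SELECT from the comment, plus ORDER BY for a stable result order.
rows = conn.execute(
    "SELECT Age, Height, Name FROM MyAddressBook WHERE Age<18 ORDER BY Age"
).fetchall()
print(rows)
```

    Bob is 34, so he is filtered out; the other two come back as (Age, Height, Name) tuples.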

  59. Re:SQL? by prizog · · Score: 2

    "Structured Query Language". A standard way of getting data into and out of DBs.

    Example: "select answer from whizbang where qid = 22 and sid = 1"

    whizbang is a table (imagine a spreadsheet that's accessed row-by-row). The query grabs the answer column from the whizbang table, but only in rows where the value in the qid column is 22, and the value in the sid column is 1.

  60. PostgreSQL filesystem by Ray+Dassen · · Score: 2
    There used to be a PostgreSQL filesystem (see this Linux Journal article).

    Can someone compare this to the MySQL filesystem, or perhaps point me to a place where pgfs can still be downloaded?

  61. Database independence by Carl+Drougge · · Score: 2

    While this doesn't really seem very useful to me (SQL is after all good at what it does..), it seems silly to make it for just one database. It's easier to use common APIs (ODBC?), or at least something custom-made but generic, and try to keep the SQL generic too (nothing fancy is needed for this sort of thing anyway) from the start. It's so much harder to change after the fact. (Not that they said anything about this, but I assume that means it's as MySQL-specific as it can be..)

  62. Re:People are missing the point. by kaisyain · · Score: 2

    If a filesystem is a database is a filesystem and getting BLOBs out of a database is so hard, why don't you just store the pathname in the database and the image in the filesystem? I'm having a hard time envisioning storing images in a database as being "often necessary". But what do I know? :-)

  63. Filesystem interfaces by pergamon · · Score: 2

    You can give just about anything a filesystem interface; it's just a matter of how good the implementation is and how useful it is.

    There have already been even FTP and HTTP filesystems for several operating systems if memory serves, and I know there have been a couple other odd ones for BeOS. I nearly did a database FS for Be a few years ago myself.

    Speaking of which, this would be much easier to implement (filesystems are simple to write) and more useful IMHO (because there are already standard APIs to query filesystems and support any number of attributes for files at the OS-Filesystem level) to do for BeOS.

    I'm sure it can be done for Linux too, but I have doubts as to the usefulness of it under any OS, much less one where you don't have the luxury of being able to utilize existing attribute support.

    It might give some shortcuts for reading, but writing will likely be very complicated. I don't see a good way to do anything along the lines of joins either. The idea of using "." files/directories will help provide some of that I suppose. Permissions will also be a problem, though I guess you could just go by login name.

    A good reason to have filesystem interfaces to complex resources (like FTP, HTTP, databases, etc) is that it is easy to access things on a filesystem from within just about every programming language on every platform. However, by forcing the normal interfaces to those resources down into what can be done to a filesystem, some things also become very complicated. To do those more complicated things will either mean complicated interfaces or programs that give the filesystem information through some other means (ioctl?) or perhaps writing commands to a file within that filesystem ("echo 'SELECT blah FROM blahtable' > /mnt/mysql/queries/testquery" and then looking in /mnt/mysql/queries/testquery/ for the result set, for example).
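
    The "write a query to a file, read the result set from a directory" idea can be faked in user space to see how it feels. This is a Python sketch with SQLite and a temp directory, not a real mounted filesystem. The run_query_file function, the .sql suffix (added so the query file and the result directory can coexist in an ordinary directory), and the one-file-per-row layout are all assumptions for illustration.

```python
import os
import sqlite3
import tempfile

def run_query_file(conn, queries_dir, name):
    """Read <name>.sql, run it, and write each result row to <name>/<row number>."""
    with open(os.path.join(queries_dir, name + ".sql")) as f:
        sql = f.read()
    out_dir = os.path.join(queries_dir, name)
    os.makedirs(out_dir, exist_ok=True)
    for i, row in enumerate(conn.execute(sql)):
        # One file per row, columns separated by tabs.
        with open(os.path.join(out_dir, str(i)), "w") as out:
            out.write("\t".join(str(col) for col in row) + "\n")
    return out_dir

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE blahtable (blah TEXT)")
conn.executemany("INSERT INTO blahtable VALUES (?)", [("one",), ("two",)])

queries = tempfile.mkdtemp()  # plays the role of /mnt/mysql/queries
with open(os.path.join(queries, "testquery.sql"), "w") as f:
    f.write("SELECT blah FROM blahtable ORDER BY rowid")

result_dir = run_query_file(conn, queries, "testquery")
listing = sorted(os.listdir(result_dir))
with open(os.path.join(result_dir, "0")) as f:
    first = f.read()
print(listing, repr(first))
```

    Reading works out fine this way. What is left unanswered is when the results get refreshed and how errors in the query are reported back through plain file operations.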

    In short, I'm sure it'll be very fun to implement and be an interesting toy which may even have some uses...

  64. People are missing the point. by mikehoskins · · Score: 4
    Did you ever hear about BLOBs? Imagine being able to load and unload BLOBs (Binary Large OBjects) in a database in an easier fashion, or, at least, in two different ways.

    Imagine a dynamic web site that uses this! You could simply copy files (especially graphics files) to/from a table easily and look them up via SQL queries! My goodness, the usefulness is extreme, people.

    Have any of you (fs!=db) nay-sayers ever tried to store/retrieve GIFs and JPEGs in a relational database for a web site -- an often daunting, but often necessary task? There are whole articles on how to store/retrieve pics as BLOBs via MySQL/PHP on PHPBuilder.com: http://www.phpbuilder.com/columns/florian19991014.php3 and (sorta) http://www.phpbuilder.com/columns/bealers20000904.php3

    So, for those of you who can't get over this idea, try doing sites that store images in databases sometime. An idea like this (one being done by the big RDBMS vendors -- and I work for one of those) is a BOON for websites. It also has many other applications.

    A layer of abstraction is often a good thing for filesystems, and it's where things are headed. IMHO, I think db's could provide BETTER security and make things more distributed, rather than current filesystems. Imagine whole new networked filesystems that are distributed databases. Open your mind. Think about it hard before brushing it aside.

    Besides, a db is an fs is a db. It depends on how you look at it, your definition, etc. Is a filesystem relational? Does a db use local storage, often RAW storage? The true computer definition of the two is not all that different. And, SQL is not the only query language out there. Haven't you heard of CLI, which uses commands like cat, ls, echo, rm, mv to handle data? What about those relationships called directories?

    I say, what's the real issue? Raw speed? Oh, wah! Grow up and join the enterprise! Oops, I guess the AS/400 must not be a viable platform; they've been doing this HOW LONG?!?!?

    Q: When are we Linux/Open Source people going to get enterprise-level file and storage management?
    A: When we get to the point that we implement at least a JFS (if not a full-fledged logged filesystem), good logical volume management, real uninterruptible power, truly fault-tolerant hardware/software clustering, better security, and fully distributed storage management that backs up and versions data automagically.

    On a lighter note, MySQL now implements a filesystem. :-)