MySQL FS
xcyber writes "Developers, database admins and users: MySQL is developing a MySQL filesystem for Linux, to mount a database on Linux as a filesystem. It is still in the development stage, and the development team would like to receive comments on it. So please let us know." "Because you can", dammit. That's just plain awesome.
Amongst all these other examples, it's probably worth noting that SQL is a declarative language. Basically, it allows you to express the results -- without worrying about the procedure used to generate the results.
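To make the declarative point concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module rather than MySQL, and the table, columns, and rows are invented for illustration. The SQL query only states which rows are wanted; the hand-written version has to spell out the scan, the filter, and the sort.

```python
import sqlite3

# In-memory SQLite database; the schema and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (name TEXT, kind TEXT, size INTEGER)")
conn.executemany(
    "INSERT INTO files VALUES (?, ?, ?)",
    [("notes.txt", "text", 120), ("logo.png", "image", 2048), ("todo.txt", "text", 64)],
)

# Declarative: describe the result and let the engine pick the procedure.
declarative = [row[0] for row in conn.execute(
    "SELECT name FROM files WHERE kind = 'text' ORDER BY size"
)]

# Procedural equivalent: fetch everything, then sort and filter by hand.
rows = conn.execute("SELECT name, kind, size FROM files").fetchall()
procedural = [name for name, kind, size in sorted(rows, key=lambda r: r[2])
              if kind == "text"]
```

Both lists hold the same names in the same order; only the first leaves the "how" to the database.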
pooptruck
You're absolutely right that a prototype could be built using current file systems, but said prototype would be SLOW and eat a LOT of space. It's better to use appropriate data structures and algorithms.
And yes, file systems are databases; they're merely inflexible databases using ANCIENT technology. Not all databases are created equal.
-Billy
phexro!pyramid:~$ SELECT * from pr0n WHERE sex='f' AND species='goat';
--
Nice. However, first things first: any replacement for the current system has to start by doing all the things the current system does, at least as simply. This is the main reason I think 'cd' is a good command to include.
It's BAD to try for too much with the first release. If you'd like an 'object system', by all means prototype one using conventional directories; you'll decide quickly that it's little different from modern Unix (remember ioctl!). In other words, an overly complex solution.
We need a true file system, one in which ioctl isn't needed. See the latest plan9 OS for details.
-Billy
A while back (a year maybe?) Oracle announced their iFS product. Dubbed the Internet File System, it gave file system, IMAP, POP, FTP, and web access to the database through a common piece of software. I haven't had the chance to work with it, and it still may not even be available, but once you can store files in the database and enforce integrity, it becomes extremely easy to track revisions, maintain lists, and perform searches and reports. It seems like wonderful technology that should be a part of every OS, but I'm curious as to performance. Has anyone had any experience with iFS?
LOAD "SIG",8,1
LOADING...
READY.
RUN
You can almost always access BLOBs (or equivalent fields) from different languages and environments; the problem is that they're all different. DBI, JDBC, ODBC, etc... each one has its own gotchas. Being able to access these fields as part of a filesystem gives it the ultimate portability. Even shell scripts can quickly and easily access the database!
Doug
Venn ist das nurnstuck git und Slotermeyer? Ya! Beigerhund das oder die Flipperwaldt gersput!
- Does MySQL have the same directed and well-defined core of development that linux development had and has? [think kernel here]
- Is the current base of MySQL well written enough, with enough source infrastructure to survive eventual restructuring during concurrent feature enhancement?
- Is there, as there was with linux, little competition in similar projects offering a similar feature set that might attract more followers or be a better candidate than MySQL for development attention?
I'm not saying that MySQL lacks any of these. But there are tons of open-source projects that just needed a bit of getting better and never did, because they never really were good enough on a source level. Linux is a lucky case, but take heart: if there hadn't been linux, you still could have run most of the GNU system on BSD thanks to GCC. Lastly, I'm largely unaware of any linux-only apps that actually make or break a user's choice to use linux vs. any other unix. I think what really makes or breaks the choice is price point and perceived momentum. pauvre pauvre netBSD.
-Daniel
"Pinky, you've left the lens cap of your mind on again." - P&TB
"I can see my house from here!" - ST:
As well, this would be fantastic for configurations (in particular the complex ones of Gnome and KDE), since large amounts of data could be elegantly compartmentalized in a standard way. I find this nifty given the growing complexity of file structures in these config sets: they would be open to editing and updating through the standard filesystem method, or through a standard SQL query system.
Section 14.2.2 has a discussion of File Systems versus Databases written by M. Satyanarayanan (of AFS and CODA fame). He says that although file systems and databases have much in common, there are several areas in which they differ conceptually, including encapsulation, naming, and the ratio of search time to usage time. Basically, file systems are appropriate when there is high temporal locality, while databases are used in situations where there is little locality and concurrent read and write sharing of data at a fine-grained level is required.
All of the high-end RDBMSs use raw file system access. By not using a "regular" file system, you gain a huge performance jump. If MySQL is doing its own locking and recovery, then the overhead from the file system is wasteful.
Oracle has been doing this for years.
This is just another important hurdle that MySQL needs to clear before it is considered for high-end applications.
-b
This sounds vaguely similar to how PalmOS apps work.
The entire filesystem is based around the idea of a database, where memory chunks are accessed based on the name of the app, etc....
Warning: I'm human. Sometimes stuff I post here is wrong. Use your head. Question authority
this must be one of those times. without some way of querying your file system (except for ls) you lose the relational aspect of the database. then you just have a filesystem that stores metadata for fast recovery. this is good from a fs point of view, but it does nothing to help you find the files you are looking for. it provides no relations between files, and does not store file descriptions (that are useful to a human).
eg. this is an image file that can be categorized under political, humor, bill clinton, letch, etc.
use LaTeX? want an online reference manager that
-- john
I've seen presentations about Placeless Documents and it's really cool.
Hmmm. Interesting concept. Not sure what the use would be. 'where' would be easy to implement in SQL, though.
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
This is a great idea if it's implemented well. The AS/400 is an example of a system that was entirely implemented around the idea of a full-featured DB implemented as a filesystem.
...and since it still has one of the best uptime records in the industry, and transaction processing times that consistently rank in the best-of-the-best lists, it's a good platform to imitate. Too often it's overlooked because of the green-screen terminals, but at its core, the AS/400 is easily one of the most advanced implementations of computer technology available to the general public.
Well, SQL won't work -- I just don't need (or want) the slowdown from interpreting and abstracting SQL commands -- and yes, I want *all* the speed I can get.
I am looking for something way lower level. ReiserFS isn't a bad solution, I just don't believe that their plugin API is mature enough to base another project on top of -- or am I wrong?
Good point -- didn't think about being able to use the "Find" function of an OS.
However, how is this better than a dedicated web app for HR flacks?
Find your favorite non-computer-literate person and see if they even KNOW that there's a "Find Files and Folders" in their Start menu? (I'm assuming a Windows-centric office here)
Like I said, I think it's cool and potentially useful, but probably not as useful for non-nerds
Potato chips are a by-yourself food.
I'm not sure what you mean when you say 'statically' or 'dynamically'. I suspect, however, that you're assuming that the foundation of a fluid file system is a set of files, directories, and links. It's not. It's almost certainly a relational database, one optimised for the task of getting a set of items (objects, files, whatever) which are categorised under a given set of categories.
I also don't know what you mean by 'number of possible categories'. I think you're mistaking 'categories' for 'sets of categories'. In my example, "/etc/wtanksle" is a set of two categories; "etc" is a category. I could see some reason to cache the results of category queries; that's an optimization concern, and not my specialty. I don't see any reason to try to precache all possible queries, as you seem to imply.
Its speed will be almost irrelevant; I predict that it'll be about as fast as the current system, but even if it's hundreds of times slower it'll still be fast enough, since caching is trivial and looking up a file based on a full filespec is almost never done.
-Billy
Would the kernel be anywhere near where it is today if people hadn't gotten others interested by writing intriguing, linux-only apps? Probably not.
Your analysis is wrong.
Most anything shipped on a linux distro is nothing more than a Unix program PORTED.
Unless GCC, X and others are 'linux only' apps.
If it was said on slashdot, it MUST be true!
Much of the ultimate point of ReiserFS is the marriage of databases and filesystems (filesystems are really just a limited sort of database anyway). This is the reason for all the commercial funding; there are people out there who really want this.
See Hans Reiser's White Paper for information on where he's going with this.
For what it's worth, database filesystems are not a new thing at all. Hans is just planning on accomplishing this in a way that completely preserves the Unix file metaphor and related concepts.
DNA just wants to be free...
A database is about data. The data is partitioned into tables and columns, with a large number of additional constraints (unique, primary key, foreign key and check clauses, for example) to limit the values of the data. Additionally, the data is strongly typed. In order to access this data, SQL supports very high level commands, like SELECT, INSERT, UPDATE and DELETE.
The power of a database is most basically in its very high level nature. You, as the user/programmer, do not care where the data is, who else is using it, how it is stored, or what the old values were. The database management system takes care of all of that. Other powerful features of databases include indexes, joins, subselects, real NULLs, aggregate (set) functions, and GROUP BYs (sub-setting).
Now, contrast this with the low level file/directory structure. In this, you have a hierarchy of directories, each of which contains one or more files. A file is nothing more than a stream of bytes, and the only constraint they have is that they be uniquely named within their directory. Also, a single file can be in more than one directory.
In order to use a file, the programmer must know where the file is, possibly who else is using it (with lock files, for example), what format the data is stored in and, if they want to be able to undo their actions, the old values. The advantages to files are the plethora of tools for manipulating them (at least in the case of text), and lower startup cost (eg. it takes less time to make a stupid file format than a SQL schema).
This project is therefore brain-dead as an application development platform. 'But,' I can hear the reply, 'it's useful for users who want to change the data in the database.' Reply: every database accepts SQL, which modifies the data. Some SQL APIs I've seen only take two lines of code to retrieve some data. And SQL won't shit on your data if you accidentally type it in in the wrong format; it'll complain, but your data will be safe and secure.
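As a rough illustration of "it'll complain, but your data will be safe", the sketch below declares a few constraints and then tries to insert bad rows. It uses Python's built-in sqlite3 module as a stand-in (not MySQL), and the table and values are invented. Because SQLite is loosely typed, the sketch adds a typeof() check to get the strict typing the comment describes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people ("
    " user_id INTEGER PRIMARY KEY,"
    " email TEXT NOT NULL UNIQUE,"
    # SQLite would otherwise store 'forty' as text, so check the type explicitly.
    " age INTEGER CHECK (typeof(age) = 'integer' AND age BETWEEN 0 AND 150))"
)
conn.execute("INSERT INTO people VALUES (1, 'a@example.org', 30)")

rejected = []
bad_rows = [
    (2, "a@example.org", 41),       # duplicate email: violates UNIQUE
    (3, "b@example.org", "forty"),  # wrong format: violates CHECK
    (4, None, 25),                  # missing email: violates NOT NULL
]
for row in bad_rows:
    try:
        conn.execute("INSERT INTO people VALUES (?, ?, ?)", row)
    except sqlite3.IntegrityError:
        rejected.append(row[0])  # the database complained and stored nothing

kept = [r[0] for r in conn.execute("SELECT user_id FROM people ORDER BY user_id")]
```

Every bad row is refused and the original row is untouched. A plain file would have accepted all three writes.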
This is quite possibly the worst idea I've ever heard. Worse than Linux as an Internet Explorer plugin, worse than Napster as a family tree generator, worse than Quake III as a spreadsheet, and even worse than Apache as a VMS shell.
Not that I have anything personal against it.
Yes, I'm still a junky. Are you still a bitch?
This sounds really cool, but it seems there could be some problems with implementation. If you build category listings dynamically, this drastically slows down tasks like a simple directory listing (or even locating a file by name), because you start having to do searches. Of course you can speed this up with good indexing, but you still have to pull those indices off the disk and do a fair amount of processing.
You might be able to build some of the categories statically, but if your fs is truly fluid, then the number of possible categories is gonna be too huge to build and maintain statically. Maybe it needs to be a little less liquid, or maybe you can find a way to identify commonly accessed files/categories and build that stuff statically, then do everything else dynamically.
I also think this needs to integrate with rather than replace a traditional fs. I doubt this method will ever be as efficient in terms of looking up, creating, and deleting files as a traditional fs, so it would be bad for system stuff like /bin, /temp, etc. OTOH, it would be great for home directories where the user is mostly storing documents and a relatively minor performance hit isn't noticeable.
The early versions of BeOS used a separate database (not very complex) and filesystem, which wound up being very difficult to work with, so eventually they merged the two. The "database" aspects of the BeOS filesystem are more of being able to add (relatively) arbitrary data to particular filetypes, and do searching based on those criteria. It isn't a formal database in any sense of the word.
Versions of BeOS prior to the Preview Release had a file system and a separate database. Because it was difficult to keep the data in the two separate systems consistent, it was decided that they should merge. This happened in Preview Release 1, and BFS remains relatively unchanged today.
At the time there was a lot of enthusiasm for the merged design to be a database-based file system, but after a lot of research, Dominic Giampaolo, the engineer doing the design and coding, determined that wasn't going to work. The reason is it becomes too difficult to filter out the files you aren't interested in. There is a lot of organizational value in a hierarchical, structured, traditional file system.
The design for BFS that was implemented is best described as an "attribute-adorned file system," with a query engine that can search against the attributes, and some indexing to make common queries fast. There's a fairly simple query language (along with simple GUI tools), but it's not as complex or capable as SQL (nor would you really want it to be). You can execute those queries from the command line if you want, which can be pretty useful when piped to another program (much as find is in Unix, but simpler to work with).
Damn, you...both of you stole *my* idea! ;)
For a long time now I've been thinking about filesystem-as-database concept. We've passed the point where computing is about optimizing hardware resources. It is now about optimizing *user* and *information* resources. If your hardware is blazingly fast, but you are lost in a sea of irrelevant information, you can't do anything. I think that's where the database/meta-filesystem comes in.
With all this rich content around, we should not be searching for files based on some arbitrary linear categorical name. We should be searching on *attributes*. We should be searching on *association*. E.g., "List all files relating to my work that I have stored on my home computer", "Now, of those, show me all files that pertain to status reports". Or "List all data I have on the artists and bands in my music collection". etc.
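A minimal sketch of searching on attributes rather than names, assuming a simple (file, key, value) table. The schema, file names, and the find() helper are all hypothetical, and Python's built-in sqlite3 module stands in for whatever store a real system would use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One row per attribute, so a file can carry any number of them.
conn.execute("CREATE TABLE attrs (file TEXT, key TEXT, value TEXT)")
conn.executemany("INSERT INTO attrs VALUES (?, ?, ?)", [
    ("q3-status.doc", "topic", "work"), ("q3-status.doc", "kind", "status report"),
    ("budget.xls", "topic", "work"), ("budget.xls", "kind", "spreadsheet"),
    ("holiday.jpg", "topic", "family"), ("holiday.jpg", "kind", "photo"),
])

def find(**wanted):
    """Return the files that carry every requested attribute=value pair."""
    result = None
    for key, value in wanted.items():
        rows = conn.execute(
            "SELECT file FROM attrs WHERE key = ? AND value = ?", (key, value)
        )
        names = {row[0] for row in rows}
        result = names if result is None else result & names
    return sorted(result or [])

work_files = find(topic="work")                             # "all files relating to my work"
status_reports = find(topic="work", kind="status report")   # "now, of those, status reports"
```

The second call narrows the first by intersecting attribute matches, which mirrors the "now, of those" refinement in the comment.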
This is where plain, flat, hierarchical file systems fail. We need basically a data "repository", and various ways of obtaining information from that repository, based on attributes, categories, mime types, relation to *other* files, etc.
It's 10 PM. Do you know if you're un-American?
Bruce
Bruce Perens.
In my vision, 'documents' would be categorised, and the categories could be viewed in a manner very similar to how we now view directories, except that a file is in more than one folder at a time. A file which is named /etc/wtanksle/ppp.conf could also be referred to as /wtanksle/etc/ppp.conf, or if it's unambiguous, /etc/ppp.conf. /dev/removable gives the list of all removable devices; /dev/scsi gives the SCSI devices (including the removable ones).
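The path rules described here can be sketched in a few lines. This is only an illustration under assumed semantics: every path component except the last is treated as a category, order does not matter, and a path is usable as a file name only when it matches exactly one file. The catalogue and the lookup() and resolve() helpers are invented.

```python
# Hypothetical catalogue: each file has a name and an unordered set of categories.
FILES = [
    ("ppp.conf", frozenset({"etc", "wtanksle"})),
    ("ppp.conf", frozenset({"etc", "root"})),
    ("hosts", frozenset({"etc"})),
]

def lookup(path):
    """Return every file whose name matches and whose categories include
    all the categories named in the path, in any order."""
    *categories, name = [part for part in path.split("/") if part]
    wanted = set(categories)
    return [(n, cats) for n, cats in FILES if n == name and wanted <= cats]

def resolve(path):
    """Return the single matching file, or raise if the path is ambiguous or unknown."""
    matches = lookup(path)
    if len(matches) != 1:
        raise LookupError(f"{path!r} matched {len(matches)} files")
    return matches[0]
```

With this catalogue, resolve("/etc/wtanksle/ppp.conf") and resolve("/wtanksle/etc/ppp.conf") return the same file, while resolve("/etc/ppp.conf") raises LookupError because two files match.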
The potential uses are many -- I think it would make a lot of common computer tasks a lot easier.
Oh well -- anyhow. :-)
-Billy
Take the linux kernel. Would the kernel be anywhere near where it is today if people hadn't gotten others interested by writing intriguing, linux-only apps? Probably not. Perhaps one day MySQL will evolve to the point where this will be useful, perhaps due to developers attracted by this project.
"The question of whether a computer can think is no more interesting than that of whether a submarine can swim" -EWD
The everyday user won't exactly go nuts over it, though.
The site gives the example "imagine marketroids browsing through the directories to directly access columns and entries" (or words to that effect)
No way. Hey, don't get me wrong, I LIKE that idea, and it gives me a pretty cool idea for a couple of projects that I'm working on, but think carefully about it: any sufficiently useful database for a large company is also sufficiently large that a directory tree is absolutely the slowest and most confusing way to access data held within a database.
For example, let's look at two examples:
It's not bad, but it's not as good. Plus, with good programmers (and good communication between programmers and management), the SQL is so abstracted out, it makes no difference. It gets condensed to a list of names and a checkbox next to the names. Those that get "checked" get a raise to $100,000.
To be truly useful to non-programmers (or non-analytical thinkers, if you will), the MySQL-FS would have to abstract out so much of the Database, you're back to a filesystem and a set of scripts to update a MySQL database.
It's cool, but it's not for your regular joe. Beyond a couple of levels, the average computer user gets lost in a hierarchical filesystem -- assuming they don't fill it up with "Untitled Folders" and such.
Potato chips are a by-yourself food.
--
You can compile your SQL statements at the beginning of your program, so that they aren't reinterpreted later. Thus, unless load time is essential to you, you may be better off with an existing database.
Regarding the ReiserFS plugin API, you're probably right. However, you don't necessarily need plugins if your project is simple enough. That is to say, if all you're doing is associating a set of data with a key, you make a file (named by the key) and put the data in. Need multiple keys? Use symlinks.
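The "file per key, symlink per extra key" idea can be sketched on any POSIX filesystem, not just ReiserFS. The put() and get() helpers and the sample record are hypothetical, and the sketch assumes the platform allows unprivileged symlinks.

```python
import os
import tempfile

# Hypothetical store: one file per primary key, symlinks for secondary keys.
store = tempfile.mkdtemp()

def put(key, data, aliases=()):
    """Write data under the primary key and link each alias to it."""
    with open(os.path.join(store, key), "w") as f:
        f.write(data)
    for alias in aliases:
        # Relative target, resolved inside the store directory.
        os.symlink(key, os.path.join(store, alias))

def get(key):
    """Read a record by primary key or by alias; open() follows the symlink."""
    with open(os.path.join(store, key)) as f:
        return f.read()

put("user-11023", "Alice Example", aliases=["alice", "alice@example.org"])
```

After this, get("alice") and get("user-11023") return the same record. Nothing keeps the aliases consistent if the primary file is removed, which is one of the things a real database would handle for you.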
If your project is of some size, lightweight file support will likely be done before you are (it certainly will if you throw Reiser some money -- he funds his team that way).
Really, though, I think SQL is almost certainly your best option. The hashing and caching done by most modern databases more than makes up for whatever speed is lost to SQL support -- and once again, that speed loss is a load-time thing only if you write your app correctly.
Unix: "Everything is a file" Linux: "Everything is a file except for the files, which are records."
Hmm... yes, Reiser may be a way to go. Some benchmarks are in order. But, alas, SQL-based DBs are still too slow for what I am planning. SQL commands/queries etc. may be interpreted to some intermediate language/bytecode, *but* the real slowdown comes from the abstraction layers needed to support SQL queries and the like.
:-)... That's why most high-end datamining applications don't use RDBMSs...
Again, for a normal application you're absolutely right. But if you want to push/crunch a few GBs around in a coupla minutes, every little slowdown counts.
Well, I am glad you're happy, but just about anything implementing a b-tree or skip-list implementation exclusively in RAM will get blazing speeds. The problem of course is, what happens when your application's needs exceed practical RAM sizes (say 7-8GBs these days)?
I think a well-balanced solution with cache and FS-level access (ReiserFS maybe, in a coupla years from now) will do better. Although, I am really more impressed with SGI's XFS.
The AS/400 uses a relational database as a universal data store for all system, application, and user data resources. The database is protected with very fine-grained access privileges and managed with well-defined administrative tools, which dramatically boosts security (since there is only one global security mechanism to manage all system and application resources).
This approach also simplifies development, which helps to make the AS/400 such a powerful application engine.
No, not quite that low-level :-)... B/B* tree implementation and the ability to handle well over 2GB of data comfortably (speed wise) is also a must --say around the neighborhood of ~1TB. Multi-user capabilities are also good, and ACID would be cool, but not a must.
The abstraction layers on *your* end or that of the database? The former don't need to exist (one word: "inline") and the latter have been optimized very, very heavily.
Not all SQL-based databases are alike. If you have the hardware budget for a {SMP,clustering,mainframe} system, a good RDBMS will take advantage of it -- something which might not be said of solutions optimized to perform well on lower-end hardware.
So, yes -- do your benchmarks, on hardware comparable to what you'll be using for your actual production system. And don't count SQL-based DBs out yet; I would be entirely unsurprised if the overhead which makes them flexible is more than made up for by the heavy optimizations done elsewhere.
- ReiserFS provides a way that you should be able to efficiently build a DB hierarchically as a set of directories and files, where files are the "leaf nodes" that contain field data, and where you might use symbolic links to represent secondary indices.
- In contrast, MySQL provides a way of representing "structured data," with "strongly typed fields." And the filesystem view provides a convenient way of looking at that data.
In effect the ReiserFS approach is to provide a way of building "weakly-typed" hierarchical databases; MySQLFS provides a way of putting a conveniently-browsable hierarchy on top of a strongly-typed relational database. It would provide pretty "weak typing" of a sort of TCLish style where "everything is a string, sort-of."
There are probably a lot of useful applications out there that wouldn't care much about the distinctions. That probably parallels the way that a lot of applications out there don't really care that MySQL does not satisfy the ACID properties or offer triggers, foreign keys, or other such things.
It also might be regarded as parallelling the way that Lisp-like languages have "strongly-typed data" with dynamic typing, which is a bit the way ReiserFS might be used, whilst "MySQLFS" looks a bit more like the "static strong typing" of ML/Haskell. Which is a rather weaker analogy...
In any case, the distinctions between ReiserFS-as-DB and MySQLFS are fairly strong. MySQLFS looks a lot, by the way, like the NameSpace concept in Casbah.
If you're not part of the solution, you're part of the precipitate.
I believe that is one of the goals of ReiserFS as well -- that database vendors use file systems to store data instead of having to use raw disk partitions, or deal with file system overhead plus database overhead...
Matt Barnson
Matthew P. Barnson
I learn what I think when I read what I write
Lets see...
goober:$ cd /mnt/sqldb
goober:$ ls
USER_ID FIRST_NAME LAST_NAME TIMESTAMP
goober:$ mkdir AGE
goober:$ echo "Oh crap, there goes my schema!"
"Oh crap, there goes my schema!"
goober:$ cd USER_ID
goober:$ ls
11023 11025 11044 11055 11092
goober:$ rm 11023
goober:$ echo "Wow! I hope that wasnt relational!"
"Wow! I hope that wasnt relational!"
goober:$ exit
Seriously, what type of integrity checking will be enforced in this filesystem?
I am betting that you either have robust integrity, which would give a completely counterintuitive file system, or lax integrity, which would open the doors for all sorts of mischievous errors and data corruption.
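For comparison, here is what "robust integrity" looks like on the database side when something like `rm 11023` arrives as a DELETE. This is a sketch with Python's built-in sqlite3 module, not MySQL, and the orders table is invented. How a MySQL filesystem would surface the refusal to the shell is not something this thread answers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, first_name TEXT)")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY,"
    " user_id INTEGER NOT NULL REFERENCES users(user_id))"
)
conn.execute("INSERT INTO users VALUES (11023, 'Goober')")
conn.execute("INSERT INTO orders VALUES (1, 11023)")

# The relational answer to `rm 11023`: refused while another table refers to the row.
try:
    conn.execute("DELETE FROM users WHERE user_id = 11023")
    deleted = True
except sqlite3.IntegrityError:
    deleted = False

remaining = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
```

The delete fails and the row survives. A filesystem front end would have to turn that into something like a failed unlink, which is exactly the counterintuitive behaviour the comment predicts.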
Umm... Berkeley DB and your favorite C compiler? ;)
This database filesystem might have a real advantage in terms of keeping the database records in a consistent state. Remember that in Unix, files hold arbitrary data, so a journaling filesystem tends to keep only the metadata in a consistent state. It doesn't do much about the application data written into the files. However, if mySQLfs had knowledge of the records being written, presumably it could do a lot of the cool stuff done in mainframe OSes to ensure integrity. I don't know the VFS interfaces though, so I'm not sure if this is implementable under the current linux framework.
This is what happens when you discover that your co-worker has been posting crap as cyb0rq_m0nk3y, and then they feel that it would be funny to post their inane rant on my computer while in the restroom.
Makes us (the tewwetruggur contingency) look by far dumber than normal.
again, my apologies.
Hi! This is the Sig, blatantly attached to the end of this comment.
I've never been thrilled with the performance of storing LOBs in any kind of DB -- Oracle, PostgreSQL, or MySQL. The plain-old filesystem tends to do it better and faster. I usually store the path to an object in the DB instead.
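The "store the path, not the LOB" pattern might look roughly like this. It is a sketch using Python's built-in sqlite3 and tempfile modules; the objects table and the store_object() and load_object() helpers are made up for the example.

```python
import os
import sqlite3
import tempfile

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE objects (id INTEGER PRIMARY KEY, title TEXT, path TEXT NOT NULL)"
)

def store_object(title, payload):
    """Write the bytes to an ordinary file and record only its path in the database."""
    fd, path = tempfile.mkstemp(suffix=".bin")
    with os.fdopen(fd, "wb") as f:
        f.write(payload)
    cur = conn.execute("INSERT INTO objects (title, path) VALUES (?, ?)", (title, path))
    return cur.lastrowid

def load_object(object_id):
    """Look the path up in the database, then read the bytes from the filesystem."""
    (path,) = conn.execute(
        "SELECT path FROM objects WHERE id = ?", (object_id,)
    ).fetchone()
    with open(path, "rb") as f:
        return f.read()

oid = store_object("logo", b"\x89PNG fake image bytes")
```

The trade-off is that the database can no longer guarantee the file still exists or matches its row, so cleanup and consistency become the application's job.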
That being said, I have used the LOB storage in Postgres to implement a versioning system for in-house work (and it worked well enough to prove to me that it's do-able, but not well enough to actually use). The concept is sound, but the implementation needs some work.
However, using a DB's LOB is a helluva lot better than using CVS for binary objects. CVS seems afflicted with unseemly memory bloat when checking in/out large binary objects...
Potato chips are a by-yourself food.
Would this MySQL-based file system be more like BeOS's file system, where files can have arbitrary attributes at the FS level, and you can query based on them?
Thanks
Bruce
Bruce Perens.
No offense to MySQL, but is it ready for such a task? Last I heard, MySQL didn't have record-level locking except in some experimental forks. Are there any features lacking from MySQL that might make another database more appropriate (ignoring their licenses for the moment)?
"Please do not feed the trolls."
Hell, I will anyway. WTF are you talking about??? GTK *flickers*? Since when? I used to have GTK apps on an old 40MHz 486 (and it was a DLC machine at that...a Cyrix 486 that plugged into a 386 mobo) and I didn't see said flickers *unless the app was poorly written, was doing animation, and didn't double-buffer.*
Bah, Troll Tech wrote somewhere? The PR department wrote an official release that said something along the lines of "we did not emulate the slow and flickery refresh of GTK(or was it gnome?)" Bullshit. Show me the link. Why would Troll Tech have a position on GNOME, anyway? They don't compete with GNOME in any way. They write a toolkit. I've seen some poorly-written QT programs that display flicker like all hell, and I've seen some GTK apps with decent animation. I've also seen well-written QT apps that display no flickering, and bad GTK apps that do. It depends on the app, I suppose.
Stating on Slashdot that I like cheese since 1997.
[*] really the class has other data structures besides the actual file data: e.g. file name, a field for comments about the file, etc., which may vary from class to class
There are also a variety of classes which serve as containers. The most obvious are what traditionally are directories or desktops. Another container class is "query", which has typical database search methods associated. These can be saved, copied, etc.
Imagine this: your command line should not be associated with a particular directory location, but rather a particular query. On the command line you most frequently use "cq" ("change query"), "rq" ("restrict query"), and "eq" ("expand query"). So to view the penguin image I know lurks somewhere on my drive, the sequence would be something like
% cq type=image
5037 files selected
% rq *penguin*
2 files selected
% ls
penguin_57.jpg
penguin.gif
% ./penguin.gif
No default action for type "gif"; performing default action for type "image": opening penguin.gif with gimp...
(And, of course, there are obvious database sorts of features that any sensible graphical file explorer should have...)
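One possible reading of the cq/rq/eq commands, sketched as a small class. The semantics are assumptions, since the comment only names the commands: cq replaces the current query, rq ANDs a further condition onto it, and eq ORs in an alternative. The catalogue and the QueryShell class are invented for illustration.

```python
from fnmatch import fnmatch

# Hypothetical catalogue of (name, type) pairs standing in for the whole drive.
CATALOGUE = [
    ("penguin.gif", "image"),
    ("penguin_57.jpg", "image"),
    ("sunset.jpg", "image"),
    ("penguin_notes.txt", "text"),
]

class QueryShell:
    """Keeps a current query instead of a current directory."""

    def __init__(self):
        # An OR of AND-groups; an empty group matches everything.
        self.alternatives = [[]]

    def cq(self, predicate):
        """Change query: replace the current query outright."""
        self.alternatives = [[predicate]]

    def rq(self, predicate):
        """Restrict query: every alternative must also satisfy the predicate."""
        for group in self.alternatives:
            group.append(predicate)

    def eq(self, predicate):
        """Expand query: also accept anything that satisfies the predicate."""
        self.alternatives.append([predicate])

    def ls(self):
        """List the names selected by the current query."""
        return sorted(
            name for name, kind in CATALOGUE
            if any(all(p(name, kind) for p in group) for group in self.alternatives)
        )

shell = QueryShell()
shell.cq(lambda name, kind: kind == "image")
shell.rq(lambda name, kind: fnmatch(name, "*penguin*"))
selected = shell.ls()
```

Here the query narrows from every image to the two penguin images, much like the transcript. As the comment suggests, all of this lives in the shell and needs nothing special from the filesystem.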
To summarize:
(1) YES!!! Regardless of how exactly the system implements it, the filesystem should be interfaced as a database.
(2) Furthermore, don't view files just as RECORDS -- view them as active OBJECTS that are instances within a hierarchical class structure.
Finally, I think a lot of this can be done just with user interface, without having it explicitly in the filesystem. In fact, things have definitely been moving this direction, at least for graphical file explorers. Has anyone added this sort of thing to a command shell?
All those complications that MySQL eschews are the sorts of things that would muss up the idea of viewing "database as FS hierarchy."
And as for the "locking" and "transactional" issues, the point is not terribly different. Filesystems generally don't provide ACID properties; neither does MySQL; that fits together well.
Mind you, it's quite possible that there's a much bigger controversy concerning stability; based on the MySQLFS web page, it appears that they're passing a CORBA IOR into the kernel. What can that possibly mean other than that they're assuming the presence of the "kORBit" implementation in the kernel? The flaming that surrounded "Why don't we try putting an ORB in the Linux kernel?" was much more vigorous than any flaming about MySQL lacking some ACID features! :-).
If you're not part of the solution, you're part of the precipitate.
First off, if you want something with serious DB features but without using SQL, you'd do well to just write a wrapper which adds/looks up entries in an SQL database but can be accessed without SQL. I don't know of anything like this existing right now simply because people who want serious database features (or who are writing a serious database) use SQL.
Well, almost.
You can also use ReiserFS -- particularly in a little while, after it implements lightweight files (thus reducing the amount of overhead for each record). Yup, ReiserFS has low-level support for relational storage, and lots of Other Cool Stuff. I understand that Squid has accelerated support for it; I've also seen a system for indexing newsgroup articles that uses Reiserfs as its backend. Roughly put, this is possible because of reiserfs's blazing speed when working with small files; it also has a plugin API (in-progress?) and Assorted Other Good Stuff.
the Be OS has had a database-like, journaling filesystem since its very first release. it's the best of both a database, and a file system. I don't know what i would do without it. it makes sorting my thousands upon thousands of mp3s a snap. Add a CD to the collection, fill in the attributes for genre, album, year of release, and so on, and I have a fully searchable collection.
"One man's "magic" is another man's engineering."-- Robert A. Heinlein
Well, but maybe you should.
Sure - a DB accessed as a filesystem doesn't present the full power of the DB through the filesystem API. And sure, a DB filesystem doesn't necessarily have the same performance characteristics as a standard filesystem.
But there are some very significant applications where a DB presented as a filesystem makes brilliant sense. Here's two simple ones off the top of my head.
Configuration management. Systems like CVS go to great trouble to get transactional behavior, so that you can't lose code if the program crashes in the middle of an update. If you're using a DBFS, you've got transactionality and rollback for free.
Micro-applications. There are a lot of simple applications which really need transactionality/rollback facilities, but which can't (either for portability or for size reasons) make use of a complete transactional database facility. Write it to access files, and let the database take care of transactions.
I don't have anything to do with this project, but I think it's a great idea, because I'm doing almost the same thing with DB2. (Why DB2? Because I work for IBM Research.) I'm building an SCM system, and I don't want the higher layers of my system to need to understand the database or the particular table layout that I'm using. So they access it as a filesystem; down below, it's a rock-solid database.
Of course, all of the above assumes transactionality - which is not yet fully supported by MySQL. So I'd be a little paranoid before using this, to make certain that they're using the transactional tables!
-Mark
Many people like to store binaries, such as images, in MySQL databases. A filesystem interface would make that a lot easier to code.
PHP is used in conjunction with MySQL a lot, so functions like move_uploaded_file could be used to store blobs in the database instead of an insert into a blob field. That would make your code much easier to read but, of course, make the server setup a lot more complicated.
Without row level locking, however, you will face bottlenecks if you try to do anything besides a mostly read-only file system.
Let me go OT for just a second here: does anybody know of any open-source systems out there that can do large-scale data storage *without* SQL? I am thinking of a simple C/C++ API that you can use to retrieve and write data from/to tables/fields, nothing much fancier than that. So far, my best bet seems to be ColdStore. Any other pointers?
This would be great for text based files and spreadsheets. The possibilities for searching and updating your files would be greatly enhanced by having them maintained by a database.
I don't think a database would be appropriate for graphics or music files (other than storing pointers to those files), but certainly any text based file would be ideal.
Given my thoughts on how a database enabled filesystem would work, I don't think very many joins or triggers would be necessary. Most things could be handled by single tables.
Besides, there is the matter that MySQL doesn't support foreign keys or triggers anyway, and last I checked those features weren't on the to-do list. :)
No, Thursday's out. How about never - is never good for you?
Not just that, but for a robust DB, the commit and rollback operations are atomic. It either happens or it doesn't. No half-way measures. So if your disk crashes during a commit operation, you are guaranteed that either the operation went through or not. No mangling of the data.
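Here's a small sketch of that all-or-nothing behavior. Assumptions: Python's built-in sqlite3 stands in for a transactional database, the "crash" is just a raised exception rather than a real disk failure, and the files table and update_with_crash function are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (name TEXT PRIMARY KEY, body TEXT)")
conn.execute("INSERT INTO files VALUES ('main.c', 'version 1')")
conn.commit()


def update_with_crash(conn):
    # "with conn" commits if the block finishes and rolls back if it raises.
    with conn:
        conn.execute("UPDATE files SET body = 'version 2' WHERE name = 'main.c'")
        conn.execute("INSERT INTO files VALUES ('util.c', 'new file')")
        raise RuntimeError("simulated crash in the middle of an update")


try:
    update_with_crash(conn)
except RuntimeError:
    pass

# Neither the UPDATE nor the INSERT is visible after the rollback.
rows = conn.execute("SELECT name, body FROM files ORDER BY name").fetchall()
```

After the failed update, the table holds only the original 'main.c' row with its old contents. Surviving an actual power cut also depends on the storage engine and the disk, which this sketch doesn't exercise.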
SQL is a set of commands used for interacting with a database.
SELECT Age, Height, Name FROM MyAddressBook WHERE Age<18
The above statement will return the fields Age, Height, and Name from the table "MyAddressBook" and limit the results to only those rows whose Age is less than 18.
There are also ways to use SQL to insert data, create tables, and lots and lots of other stuff.
"Structured Query Language". A standard way of getting data into and out of DBs.
Example: "select answer from whizbang where qid = 22 and sid = 1"
whizbang is a table (imagine a spreadsheet that's accessed row-by-row). The query grabs the answer column from the whizbang table, but only in rows where the value in the qid column is 22, and the value in the sid column is 1.
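If you want to try that query yourself, here's a quick sketch using Python's built-in sqlite3. The column types and the three sample rows are assumptions I made up; only the query itself comes from the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema and data for the whizbang table.
conn.execute("CREATE TABLE whizbang (qid INTEGER, sid INTEGER, answer TEXT)")
conn.executemany(
    "INSERT INTO whizbang VALUES (?, ?, ?)",
    [(22, 1, "yes"), (22, 2, "no"), (23, 1, "maybe")],
)

# Only the row where qid is 22 AND sid is 1 matches.
query = "select answer from whizbang where qid = 22 and sid = 1"
answers = [row[0] for row in conn.execute(query)]
```

With this sample data, answers ends up holding just the one matching value.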
Become a FSF associate member before the low #s are used
Can someone compare this to the MySQL filesystem, or perhaps point me to a place where pgfs can still be downloaded?
While this doesn't really seem very useful to me (SQL is after all good at what it does..), it seems silly to make it for just one database. It's easier to use common APIs (ODBC?), or at least something custom-made but generic, and try to keep the SQL generic too (nothing fancy is needed for this sort of thing anyway) from the start. It's so much harder to change after the fact. (Not that they said anything about this, but I assume that means it's as MySQL-specific as it can be..)
If a filesystem is a database is a filesystem and getting BLOBs out of a database is so hard, why don't you just store the pathname in the database and the image in the filesystem? I'm having a hard time envisioning storing images in a database as being "often necessary". But what do I know? :-)
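A rough sketch of the "pathname in the database, image in the filesystem" approach, assuming Python's built-in sqlite3 in place of MySQL. The images table and the save_image/load_image helpers are hypothetical names for illustration.

```python
import sqlite3
import tempfile
from pathlib import Path


def save_image(conn, image_dir, name, data):
    """Write the bytes to disk, then record only the path in the database."""
    path = Path(image_dir) / name
    path.write_bytes(data)
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO images (name, path) VALUES (?, ?)",
            (name, str(path)),
        )
    return path


def load_image(conn, name):
    """Look up the path by name and read the file; None if unknown."""
    row = conn.execute(
        "SELECT path FROM images WHERE name = ?", (name,)
    ).fetchone()
    return Path(row[0]).read_bytes() if row else None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE images (name TEXT PRIMARY KEY, path TEXT)")
image_dir = tempfile.mkdtemp()
save_image(conn, image_dir, "logo.gif", b"GIF89a...")
```

The catch is that the file write and the database insert are two separate steps, so a crash between them can leave a file with no row (or, after a delete, a row with no file). That gap is what the store-it-all-in-the-database crowd is trying to close.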
You can give just about anything a filesystem interface, it's just a matter of how good the implementation is and how useful it is.
There have already been even FTP and HTTP filesystems for several operating systems if memory serves, and I know there have been a couple other odd ones for BeOS. I nearly did a database FS for Be a few years ago myself.
Speaking of which, this would be much easier to implement (filesystems are simple to write) and more useful IMHO (because there are already standard APIs to query filesystems and support any number of attributes for files at the OS-Filesystem level) to do for BeOS.
I'm sure it can be done for Linux too, but I have doubts as to the usefulness of it under any OS, much less one where you don't have the luxury of being able to utilize existing attribute support.
It might give some shortcuts for reading, but writing will likely be very complicated. I don't see a good way to do anything along the lines of joins either. The idea of using "." files/directories will help provide some of that I suppose. Permissions will also be a problem, though I guess you could just go by login name.
A good reason to have filesystem interfaces to complex resources (like FTP, HTTP, databases, etc) is that it is easy to access things on a filesystem from within just about every programming language on every platform. However, by forcing the normal interfaces to those resources down into what can be done to a filesystem, some things also become very complicated. To do those more complicated things will either mean complicated interfaces or programs that give the filesystem information through some other means (ioctl?) or perhaps writing commands to a file within that filesystem ("echo 'SELECT blah FROM blahtable' > /mnt/mysql/queries/testquery" and then looking in /mnt/mysql/queries/testquery/ for the result set, for example).
In short, I'm sure it'll be very fun to implement and be an interesting toy which may even have some uses...
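To make the "write a query to a file, then look in a directory for the result set" idea concrete, here's a user-space sketch. It is not how the MySQL filesystem actually works (I don't know its interface). Assumptions: Python's built-in sqlite3 replaces MySQL, a temp directory replaces the mount point, the query lives in testquery.sql so a plain directory named testquery/ can sit next to it, and run_query_file is a made-up helper.

```python
import sqlite3
import tempfile
from pathlib import Path


def run_query_file(conn, query_file):
    """Run the SQL stored in query_file; write one file per result row."""
    query_file = Path(query_file)
    sql = query_file.read_text().strip()

    # testquery.sql -> testquery/ (hypothetical layout for this sketch)
    result_dir = query_file.with_suffix("")
    result_dir.mkdir(exist_ok=True)

    cursor = conn.execute(sql)
    columns = [desc[0] for desc in cursor.description]
    for i, row in enumerate(cursor):
        lines = [f"{col}={val}" for col, val in zip(columns, row)]
        (result_dir / f"row{i:04d}").write_text("\n".join(lines) + "\n")
    return result_dir


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE blahtable (blah TEXT)")
conn.executemany("INSERT INTO blahtable VALUES (?)", [("one",), ("two",)])

queries = Path(tempfile.mkdtemp())
(queries / "testquery.sql").write_text("SELECT blah FROM blahtable ORDER BY blah")
result_dir = run_query_file(conn, queries / "testquery.sql")
names = sorted(p.name for p in result_dir.iterdir())
```

Reading is easy this way. Writing is where it gets ugly, since there's no obvious file operation that maps onto an UPDATE or a join, which is the complication described above.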
Imagine a dynamic web site that uses this! You could simply copy files (especially graphics files) to/from a table easily and look them up via SQL queries! My goodness, the usefulness is extreme, people.
Have any of you (fs!=db) nay-sayers ever tried to store/retrieve GIFs and JPEGs in a relational database for a web site -- an often daunting, but often necessary task? There are whole articles on how to store/retrieve pics as BLOBs via MySQL/PHP on PHPBuilder.com: http://www.phpbuilder.com/columns/florian19991014.php3 and (sorta) http://www.phpbuilder.com/columns/bealers20000904.php3
So, for those of you who can't get over this idea, try doing sites that store images in databases sometime. An idea like this (one being done by the big RDBMS vendors -- and I work for one of those) is a BOON for websites. It also has many other applications.
A layer of abstraction is often a good thing for filesystems, and it's where things are headed. IMHO, db's could provide BETTER security than current filesystems and make things more distributed. Imagine whole new networked filesystems that are distributed databases. Open your mind. Think about it hard before brushing it aside.
Besides, a db is an fs is a db. It depends on how you look at it, your definition, etc. Is a filesystem relational? Does a db use local storage, often RAW storage? The true computer definition of the two is not all that different. And, SQL is not the only query language out there. Haven't you heard of CLI, which uses commands like cat, ls, echo, rm, mv to handle data? What about those relationships called directories?
I say, what's the real issue? Raw speed? Oh, wah! Grow up and join the enterprise! Oops, I guess the AS/400 must not be a viable platform; they've been doing this HOW LONG?!?!?
Q: When are we Linux/Open Source people going to get enterprise-level file and storage management?
A: When we get to the point that we implement at least a JFS (if not a full-fledged logged filesystem), good logical volume management, real uninterruptible power, truly fault-tolerant hardware/software clustering, better security, and fully distributed storage management that backs up and versions data automagically.
On a lighter note, MySQL now implements a filesystem. :-)