How To Implement A Database Oriented File System
ALundi writes "A really great read from Andrew Orlowski over at The Register on how Benoit Schillings and Dominic Giampaolo created the 64-bit journaled and attribute-based Be File System. Schillings and Giampaolo discuss a variety of design and implementation issues, including data integrity and file system performance." Interesting in the context of MSFT's plans to implement a DB filesystem in future versions of MS Windows.
I used Be on and off for about 6 months. Once you get the hang of it (the filesystem), you see the true power, especially with the address book.
If you want some real insights into the OS as a whole, check out the BeOS Bible. It's not so good if you want an in-depth discussion, but for non-kernel hackers it's a fun read and very informative.
Read the section explaining how their address book works. It's really cool.
Looking for Book Reviews? Check out Literary Escapism.
If I remember correctly, Be originally used a "true" database backend as the filesystem, but ran into performance issues compared to the R5 fs implementation. I can't help but wonder how many of these issues were largely due to the speed of the technology used at the time.
I suppose it depends on your application, but I know a lot of web-based platforms already use true db backends (Oracle, PostgreSQL) to handle all data storage, representing a filesystem as a hierarchical set of rows in numerous tables. I've written several applications this way, and am currently working on a content management platform which also uses this model. Need to make a change to the filesystem structure (adding attributes, changing the security model)? Just modify the DB structure and you're done, especially with databases like PostgreSQL where you can use the database engine itself for a *lot* of functions (via triggers, stored procedures, security settings, etc).
As more and more functionality is brought into web-based application environments, I can see the importance of "old style" filesystems starting to fade somewhat for a lot of apps. Yes, they'll still be necessary (the database itself has to reside somewhere, obviously), but not in the same way they used to be. Just a few thoughts.
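For anyone who hasn't seen the rows-in-tables approach, here is a minimal sketch. It assumes SQLite rather than Oracle or PostgreSQL, and the `node`/`attr` schema and the `add_node`/`path_of` helpers are made up for illustration, not taken from any real platform:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE node (
    id        INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES node(id),  -- NULL for the root
    name      TEXT NOT NULL,
    content   BLOB                          -- NULL for directories
);
CREATE TABLE attr (                         -- free-form per-file attributes
    node_id INTEGER NOT NULL REFERENCES node(id),
    key     TEXT NOT NULL,
    value   TEXT,
    PRIMARY KEY (node_id, key)
);
""")

def add_node(parent_id, name, content=None):
    """Insert a file or directory row and return its id."""
    cur = conn.execute(
        "INSERT INTO node (parent_id, name, content) VALUES (?, ?, ?)",
        (parent_id, name, content))
    return cur.lastrowid

def path_of(node_id):
    """Rebuild a slash-separated path by walking the parent_id links."""
    rows = conn.execute("""
        WITH RECURSIVE chain(id, parent_id, name, depth) AS (
            SELECT id, parent_id, name, 0 FROM node WHERE id = ?
            UNION ALL
            SELECT n.id, n.parent_id, n.name, c.depth + 1
            FROM node n JOIN chain c ON n.id = c.parent_id
        )
        SELECT name FROM chain ORDER BY depth DESC
    """, (node_id,)).fetchall()
    return "/".join(r[0] for r in rows)

root = add_node(None, "")
docs = add_node(root, "docs")
memo = add_node(docs, "memo.txt", b"hello")
conn.execute("INSERT INTO attr VALUES (?, ?, ?)", (memo, "author", "alice"))
memo_path = path_of(memo)
```

Adding a new kind of attribute is just another row in `attr`; changing the security model would mean a schema change plus whatever triggers or stored procedures the real database offers.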
Dominic's book, Practical File System Design with the Be File System, is wonderful. I'd never delved into the innards of a file system before. Reading his book was enjoyable and interesting. I learned quite a lot.
Transcript show: self sigs atRandom.
XFS is also "database-like". But BFS seems to be rather more ambitious an effort -- and very intriguing.
This is one of several BeOS features that the Open Source community should really consider stealing. But let's consider these features individually, with one eye on whether they're likely to achieve acceptance outside the ranks of BeOS enthusiasts. Let's not waste time on wholesale BeOS clones and compatibility layers. Those are exercises in denial. BeOS was a nice piece of work, but it's as dead as CP/M. Deal with it.
Simple joins, and most of them can be replicated with links if necessary. Almost all the databases I've seen would lose little from moving out of the DB and into the filesystem.
But then, you lose all the advantages of database journaling (not just integrity, but also the ability to roll back to a previous state if necessary), consistency (you have to make sure the files can't be accessed by other operations before the commits are done) and replication.
If I were building an application today with Oracle, I would be very tempted to use iFS for these reasons.
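To make the rollback point concrete, here is a small sketch of all-or-nothing updates. It assumes SQLite via Python's `sqlite3` module (not Oracle or iFS), and the `doc` table and `update_all` helper are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE doc (name TEXT PRIMARY KEY, body TEXT)")
conn.execute("INSERT INTO doc VALUES ('report', 'v1')")
conn.commit()

def update_all(conn, changes):
    """Apply every (name, body) change, or none of them."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            for name, body in changes:
                cur = conn.execute(
                    "UPDATE doc SET body = ? WHERE name = ?", (body, name))
                if cur.rowcount == 0:
                    raise KeyError(name)  # unknown document: abort the batch
        return True
    except KeyError:
        return False

# The second change fails, so the first one is undone as well.
ok = update_all(conn, [("report", "v2"), ("missing", "x")])
body = conn.execute(
    "SELECT body FROM doc WHERE name = 'report'").fetchone()[0]
```

Doing the same with plain files means writing your own staging, locking and cleanup logic, which is the part the database gives you for free.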
One of the areas that I am interested in is using a DB FS with various "helper" apps. These helper apps would provide a way of managing your data. They could be integrated into another app directly or as a plugin or they could be standalone apps by themselves.
One could provide a helper app that allows you to look at the stored files in ways other than your typical file listing. To do this would require various metadata attributes to be associated with the data. Helper apps would be provided for all of the standard applications that read and write data to files.
For example, one app could set attributes to categorize data. One could then search for data on your system, and potentially others, much in the same way as you would search for stuff on Yahoo.
Say you are in a rush to finish your taxes but you need to put together an itemized list of business expenses. You have information on them stored in various places including text files, e-mail, spreadsheets, etc. You could use find and grep to go poking around looking for them, or you could use a helper app that does a quick search of attributes and presents you with a list of candidates and can even call up other apps or services to look at the data. Once you have identified the data, another attribute can be set that your tax software uses to record it and pull it into your tax forms.
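A rough sketch of that tax scenario, with attributes held in a plain in-memory list rather than a real filesystem. The `Entry` class, the `find` helper, the paths and the attribute names (`category`, `year`, `tax_item`) are all invented for the example:

```python
from dataclasses import dataclass, field

@dataclass
class Entry:
    path: str
    attrs: dict = field(default_factory=dict)

def find(entries, **wanted):
    """Return the entries whose attributes match every key/value in `wanted`."""
    return [e for e in entries
            if all(e.attrs.get(k) == v for k, v in wanted.items())]

store = [
    Entry("mail/0412.eml", {"category": "business-expense", "year": 2002}),
    Entry("sheets/travel.csv", {"category": "business-expense", "year": 2002}),
    Entry("notes/todo.txt", {"category": "personal"}),
]

# Step 1: the helper app searches by attribute instead of find and grep.
candidates = find(store, category="business-expense", year=2002)

# Step 2: once confirmed, set the attribute the tax software looks for.
for entry in candidates:
    entry.attrs["tax_item"] = True

# Step 3: the (hypothetical) tax software pulls in everything flagged.
tax_inputs = [e.path for e in find(store, tax_item=True)]
```

The point is that the e-mail, the spreadsheet and the text file are all found by the same query, regardless of which app wrote them.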
As the ExtremeTech article pointed out, they are not even considering putting the full-blown SQL Server into Windows. SQL Server is too resource-intensive (it really wants to use all of the available CPU, memory and disk space), too much overhead, and most importantly to MS, too profitable (sales of SQL Server / BackOffice make up about 10-15% of MS's revenue.) There's no reason to bundle it if people are willing to pay a ton for it separately!
As the article says, they're thinking (nothing decided yet) about including MSDE, which is exactly the same as SQL Server 2000, except it is tuned for 5 concurrent users (and hard-limited to 10), the database size is limited to 2GB per database (the same as the Jet DB, aka Access), and it doesn't have the nice GUI admin tools bundled.
Also, the OFS (Object File System) discussed previously probably won't get added either. There's a good reason why it was talked about way back in the Cairo (pre-Win95) days but never implemented - it's really, really hard to do, and it's hard to even convince anybody of its value. (Just look at Be.) Active Directory was originally supposed to be an object store, but I don't think anybody uses it for that (if anybody even uses it at all.)
What probably will be included is an improved version of Indexing Service, which is currently included in Windows 2000 and XP. For those of you who are fortunate enough to be unfamiliar with Indexing Service (formerly Index Server), it's an NT service (think "daemon") that periodically scans the file system for new / updated files, and then adds whatever metadata it can extract into a database of sorts, which is then used to speed up searches in the built-in Search dialog on the Start menu.
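As a toy illustration of the scan-and-index idea (not how Indexing Service is actually implemented), the sketch below walks a directory, records path, size and modification time in SQLite, and skips files that haven't changed since the last pass. The `file_index` table and the function names are assumptions made for the example:

```python
import os
import sqlite3
import tempfile

def open_index(db_path=":memory:"):
    """Create (or open) the index database."""
    conn = sqlite3.connect(db_path)
    conn.execute("""CREATE TABLE IF NOT EXISTS file_index (
        path TEXT PRIMARY KEY, size INTEGER, mtime REAL)""")
    return conn

def scan(conn, root):
    """One pass: (re)index files that are new or changed since the last pass."""
    updated = 0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            row = conn.execute(
                "SELECT mtime FROM file_index WHERE path = ?",
                (path,)).fetchone()
            if row is not None and row[0] == st.st_mtime:
                continue  # unchanged since it was last indexed
            conn.execute(
                "INSERT OR REPLACE INTO file_index VALUES (?, ?, ?)",
                (path, st.st_size, st.st_mtime))
            updated += 1
    conn.commit()
    return updated

def search(conn, fragment):
    """Answer a search from the index instead of walking the disk."""
    rows = conn.execute(
        "SELECT path FROM file_index WHERE path LIKE ? ORDER BY path",
        (f"%{fragment}%",))
    return [r[0] for r in rows]

with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "taxes.txt"), "w") as f:
        f.write("receipts")
    index = open_index()
    first_pass = scan(index, root)   # indexes the new file
    second_pass = scan(index, root)  # nothing changed, nothing re-indexed
    hits = search(index, "taxes")
    index.close()
```

A real service would run `scan` on a timer or on change notifications and would extract document properties and content, not just the stat fields used here.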
There are a couple of problems with the current implementation:
So, in summary, MS's plans for the DB-in-the-filesystem look a lot more like Reiser4 than like BeFS or SQL Server.
It's really not Microsoft's innovation.
IBM's AS/400 (a midrange computer system targeted for commercial use/accounting/warehouse/etc...) is based on an object-oriented database filesystem which is implemented at the firmware level (SLIC) rather than at the OS-level - and this system has been around for about 20 years and IIRC it always had quite good performance.
-arch----
A few words about its architecture, if you're interested...
The operating system (OS/400) itself runs on top of this object-oriented low-level "OS" by calling its APIs - as a result, most parts of OS/400 are platform-independent. If you managed to get the SLIC running on another hardware platform, you could probably install a nearly unmodified version of OS/400, and it would do its work.
Actually, I'd call the SLIC code the 'real' operating system kernel rather than OS/400, because OS/400 itself would not work without an appropriate SLIC layer.
Everything on the system is an object, so you'll always have to use the object's methods to perform some operation.
For some applications that may be an advantage, because security is enforced on each object at the firmware level. For other applications it might also be a disadvantage, because you'll always have to use a limited set of APIs for modifying data. That blocks many methods commonly used for writing highly optimized code.
-end arch----
One of the benefits of having a database-filesystem is probably the fact that you do not need to run a database product on top of the OS.
Every object on the system can be backed up and restored in a very simple way. Logical files (multiple logical views of one physical file) can help to keep data management simple and consistent.
On the other hand, you will have to update the entire OS (including the kernel) when you need to install a new release of the database - which means that you'll have to reboot the machine.
And - last but not least - the more code you have in the OS kernel, the higher the probability of having dangerous bugs somewhere in the kernel.
It should not be necessary to mention that bugs in the OS kernel may compromise all system security.
There are certainly many advantages and disadvantages regarding the database-filesystem issue, so I think it all depends on what you want to do with your computer.
-----
kind regards from Austria,
octogen
PS: i hope my english isn't too poor..
And - by the way - even Microsoft uses AS/400 boxes for running its business, so what do you think, where did they get their inspiration from...?
No, you don't.
Be used the following method of determining which app would open a file:
Notice that nowhere in here does it take into account what app created a file.
Even better, you can always right click on any file (or any list of files of the same type) and get a list of:
- The preferred app for that file
- The preferred app for that filetype
- Any apps that can open that filetype
- Any apps that can open the supertype
- Any apps that can open any filetype
And the list shows up pretty much instantaneously. Do that in any other system, anywhere, no matter how many hacks you add. And yes, you'd use it every single day of your life if it were available to you. So no, I'd say that you don't understand the problem.
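For readers who never used it, the right-click list above can be sketched roughly like this. This is not the Be API: the `openers` function, the dictionary layout, the `"*"` wildcard key and the app names are all invented, and the only thing taken from the list is its ordering (file preference, then filetype preference, then filetype handlers, then supertype handlers, then catch-all handlers):

```python
def openers(file_pref, mime_type, type_prefs, handlers):
    """Build the ordered, de-duplicated list of apps offered for one file.

    file_pref  -- app stored as an attribute on this particular file, or None
    type_prefs -- maps a MIME type to the preferred app for that type
    handlers   -- maps a MIME type, a supertype, or "*" to apps supporting it
    """
    supertype = mime_type.split("/", 1)[0]
    candidates = []
    if file_pref:
        candidates.append(file_pref)                 # preferred app for the file
    if mime_type in type_prefs:
        candidates.append(type_prefs[mime_type])     # preferred app for the type
    candidates.extend(handlers.get(mime_type, []))   # apps for the exact type
    candidates.extend(handlers.get(supertype, []))   # apps for the supertype
    candidates.extend(handlers.get("*", []))         # apps that open anything

    seen, ordered = set(), []
    for app in candidates:
        if app not in seen:  # keep the first (highest-priority) position
            seen.add(app)
            ordered.append(app)
    return ordered

handlers = {
    "text/html": ["Browser", "Editor"],
    "text": ["Editor"],
    "*": ["HexViewer"],
}
type_prefs = {"text/html": "Browser"}

menu = openers("MyEditor", "text/html", type_prefs, handlers)
```

Because every step is a dictionary lookup keyed on attributes the system already stores, there is nothing to scan, which fits with the list appearing instantly. Nothing in the lookup depends on which app created the file.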