newdocms: Beyond the Hierarchical File System
Manuel Arriaga writes "After two years of hard work (and many scrapped versions), I have just released an (ugly, but working!) preview version of newdocms, a completely new document management system. newdocms isn't a file browser: it is a layer between the hierarchical file system (HFS) and the user, which provides a radically new way to store and retrieve documents. No longer will you browse complex directory trees or directly interact with the HFS; instead, you define any number of document attributes when saving a document and then query a database of those attributes when trying to retrieve it later on.
For the first time you have a true alternative to the hierarchical file system at the OS level. Through the modification of the KDE shared libraries, newdocms currently works with all KDE apps! (I am looking for volunteers to add support for GNOME and OpenOffice.org!) This is a testament to the power of free software: this sort of innovation could never happen if it weren't for the free software nature of the underlying systems."
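For readers who want a concrete picture of "define attributes when saving, query them when retrieving", here is a minimal Python sketch using SQLite. It is not newdocms's actual code or schema: the table layout and the function names `open_store`, `save_document` and `find_documents` are assumptions made up purely for illustration.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open a tiny attribute database (hypothetical schema, not newdocms's)."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS docs ("
        "id INTEGER PRIMARY KEY, location TEXT NOT NULL)"
    )
    db.execute(
        "CREATE TABLE IF NOT EXISTS attrs ("
        "doc_id INTEGER REFERENCES docs(id), "
        "name TEXT NOT NULL, value TEXT NOT NULL)"
    )
    return db


def save_document(db, location, **attributes):
    """Record where the file really lives plus any number of attributes."""
    cur = db.execute("INSERT INTO docs (location) VALUES (?)", (location,))
    doc_id = cur.lastrowid
    db.executemany(
        "INSERT INTO attrs (doc_id, name, value) VALUES (?, ?, ?)",
        [(doc_id, name, str(value)) for name, value in attributes.items()],
    )
    db.commit()
    return doc_id


def find_documents(db, **wanted):
    """Return locations of documents that match ALL requested attributes."""
    if not wanted:
        rows = db.execute("SELECT location FROM docs ORDER BY id")
        return [row[0] for row in rows]
    clauses = " OR ".join("(a.name = ? AND a.value = ?)" for _ in wanted)
    params = [x for name, value in wanted.items() for x in (name, str(value))]
    sql = (
        "SELECT d.location FROM docs d JOIN attrs a ON a.doc_id = d.id "
        f"WHERE {clauses} "
        "GROUP BY d.id HAVING COUNT(DISTINCT a.name) = ? ORDER BY d.id"
    )
    return [row[0] for row in db.execute(sql, params + [len(wanted)])]
```

With something like this, a user who only remembers that a document "involved a loan" could call `find_documents(db, topic="loan")` without knowing any path.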
I'm already using The Brain. It's *really* unique, and it works. It works very well. And, in addition to organizing files the way YOU want them organized, it also connects random thoughts, web sites, emails, etc. If you haven't seen it, check it out. It's pretty damn incredible.
Don't I remember reading something about the Blackcomb file system being database driven? Billg called the current file system a "cesspool" and said it's going to be completely overhauled, IIRC.
Oh well, in a few years the *n?x-philes will be screaming about M$ stealing their ideas. Figures.
When Microsoft suggested something like this, everyone went mental, and I got bitch slapped for saying I thought it was a good idea.
- Not confusing enough.
- No possibility of new patents.
- Lack of ability to lock users into your proprietary file system.
I didn't know HFS was broken.
NetInfo connection failed for server 127.0.0.1/local
I have worked with many a user that has had problems with the concept of folders (directories). Perhaps those users can grasp this concept more easily.
1. "Filesystem? I don't need no stinkin filesystem!" An ideal Palm-esque computing environment wouldn't have any filesystem. There simply isn't any reason for it. Why would you store addresses in an address file or a book report in a word file? Saving/Opening files should be transparent to the end user. Versioning should be built in, yet simple to understand. Forking files can be accomplished without copying a file. This is intuitively the simplest idea.
2. If you somehow *have* to think in terms of files, then your conclusion may be to use files. However, I don't see why anybody would come up with a hierarchical file system, unless they were accommodating hardware limitations. Placing files somewhere within a huge directory tree is just too darn complicated. Why should the same file not exist in multiple directories? Why should copies of a file exist? Everything, including advanced security policies (more advanced than what is currently possible), is available for a *keyword* driven filesystem.
I believe this is a step in the right direction and I can't wait until my favorite OS (not Linux) adopts a similar feature.
That's the whole reason for the program -- you shouldn't have to remember long, detailed folder structures and filenames in order to retrieve a file you were looking for.
I can't tell you how many times I've had to help users find some file, shortcut, document or spreadsheet that they've "lost" because they forgot the correct path. But they do remember it involved a loan, or it involved a party announcement, or something similar. I swear, just the other day I spent an hour waiting on another employee to get off the phone so I could find a folder shortcut another employee had lost. She wasn't sure what folder the shortcut referred to, but she knew it contained documents of a certain type.
Do you see a pattern here? To me, this sounds just like what Microsoft is trying to do with Longhorn, and potentially Office 11. People are tired of searching and hunting through folders and hierarchies full of oddly named files and temp folders that can confuse Joe User.
This is awesome software and definitely a step forward. It might not change the geek community, but it will certainly help out system admins of the world. While your method still works (and hopefully, in the future, these two systems should work hand-in-hand, but that's another project I suppose), this is a damn fine alternative.
I agree. Basically the only way this is different from your HFS is that it encapsulates the meta-data (that is currently in the path name) differently. I'm not sure that's any better or worse. In fact, I myself like to be able to see at a glance all the categories of documents that I have, which is quite easy with HFS, but doesn't sound so easy here. Perhaps that's more because this is a new idea and not mature yet.
Everyone seems hot to SQL the file system, and while I think that will be the way of the future, I don't think that there is a clear view of how that works from the user's perspective yet. Remember that this is a rather large paradigm shift from what everyone is used to. It's going to take a while for this to mature to the point that Joe User is going to be able to hack it. I mean, I looked at the Save As dialog on that page, and while it looks cool it also looks counter-intuitive to me and I'm a developer! How much more will a user get confused?
All in all we're going in the right direction, but by no means are we anywhere near the goal yet.
Ben
Exactly. In fact, these hierarchies do not make sense to anyone encountering them for the first time. There's nothing user friendly about them at all, really. They aren't even alphabetically sorted, which is the least you usually expect from a file cabinet. It's just the simplest way of doing things and it seems logical to you, because you haven't worked with any other kind of file system since your first computer experience. Admittedly, a keyword driven system would not give you a shorter syntax. But administering a system using thousands of levels of subdirectories would not do that, either. Imagine a database driven file system, combined with near-perfect speech-recognition software. Suddenly the additional keywords required do not matter so much, and the advantages of a system like this could really become obvious.
This is exactly what I have been wanting for almost a decade now.
Some uses I imagine
- Create music playlists on the fly (MoodLogic doesn't count)
- Categorize work files (Across the whole partition, find images that serve as bumps, HDRI ..etc as well as those that are simply wallpapers and photos). More importantly, if you see a good bump texture for a certain surface, describe it as such without changing the filename.
- Install Windows and service packs first, mark files as "windows native". Then install apps. Some OS glitch, you need to reinstall? Back up all files, with directory structure, which don't have the "windows native" tag, along with c:\program files and the registry. Reinstall Windows, restore the backed up files. Voila, no app installations required.
I was reading about Reiser4 last weekend and HR mentioned similar functionality IIRC. I would hope everyone can see the point behind metadata...it's kinda the reason XML is considered a GOOD THING. The question is...can we shift our paradigms to use this newer model? Change is hard to effect...this would have to be adopted by a mainstream OS for this to really catch on and be widely used. (Asbestos underwear on!) Isn't Longhorn's new DB filesystem also supposed to offer some or all of this? (RTFA if you want to reply please!) MS might not be as behind the curve as we'd like to think....time will tell if this will actually be widely accepted. My .02.
Always value the individual over the system. --Bruce Lee "I don't need a Sig - I have a custom 191" - me
of the difference in the GUI vs. command line mind set.
These abstraction layers have been used before on OSes such as Mac OS and OS/2. The problems always came into play when you passed the files around. There is always a step that strips the extended information. The key is wide acceptance and an established standard for the data storage. Be sure there is a way to pass the extended data in a text format (e.g. XML) when you want to store the files on a non-supported system (or so command line tools can be easily modified to update the db).
The idea is good and I am sure it will be very useful to a lot of people. Good Luck.
This looks a lot like something I've used in the past - FileNET Content Management Services. FileNET lets you create meta-data for each document you save, as well as a complete version history and check-in/check-out for each document if you want to. It also allows for hierarchical storage of files as well as using the meta-data so you can still categorize things by folder if you want, but still query documents by any of the indexes that you have built. It will even add a full-text search across everything in the library if you want, and it has no problems indexing most standard formats including Word and PDF files.
The system I have been dreaming of for a while would be far more graphical (had a quick look at thebrain.com, it's still text with a few lines as far as I can see).
My dream system would enable you to specify file attributes such as size, path(s), name, type etc, as well as regex greps on the content, and then plot the filing system in 3D space, through which you could move with a joystick. You would be able to assign attributes to graphical features, eg make scripts cuboid, text files spherical, bigger files bigger on a logarithmic scale and so on. Related files would appear like solar systems, and by changing the importance of the file attributes you could change the way the files grouped.
Probably not what you'd want to use every day, but I'm sure I'd find a few mislaid files with such a system.
Virtually serving coffee
Hierarchical file systems are as close to intuitive as you get. Everything you do in the real world, as pertains to dealing with information, mimics a hierarchical file system. Your Chilton manuals are in the garage, your cookbooks and recipe boxes are in the kitchen or dining room, your computer books are by your computer. You don't look in the computer manual for how to change your oil. When you are trying to bake a cake, you don't walk out into the garage for inspiration. Having information organized into different places, and then having those places subdivided into different boxes is intuitive, and is how most organized people think.
v able\Yesterday\Tomorrow\A WeekAgoToday might be confusing, but the filesystem paradigm isn't.
1. (a) "We don't need no stinking filesystem." The ideal palmesque OS would have the same idea just demonstrated differently. You aren't going to open up your notepad to see an address. The address file is in the address program (directory). The schedule file is in the calendar program(directory). The programs you use to open the files become your folders.
1. (b) "Saving/Opening files should be transparent" The only people that would think like this in the real world have been living with someone that picks up after them all the time. When you are working on some (paper and pencil) project, and just stand up and walk away, do you expect it to be available at the office tomorrow? When you start working on several projects in succession on your desk, and have reams of loose paper, can you easily bore your way back down? No, reasonable, organized people pick up the project they are working on, file it away in the file cabinet/brief case/wherever it is supposed to go. There are logical beginnings and endings to your working on a project that only you can decide on. A spreadsheet, for example, do you want it to save every time you make a change... No, by their design, you would normally set up all your formulas, save that, and then every day/month/year open up the spreadsheet, plug the numbers, get the results, and save the specific results to a different file, or just look at the values produced. Not to mention, when you sit down at your desk in the morning, do you expect your desktop to know what project you want to work on? No, and you don't expect your computer to know what project you are working on either. Opening/Saving files shouldn't be and can't be transparent to the user.
I used to use a lot of floppies when growing up. I appropriated a lot of disks from other places. I used the "grab the black disk with the couple of remnant label pieces... no the other black disk... No, the one with the two small pieces of adhesive... Ooops, the one with the three pieces..." Now, I have to search all the disks every time I want anything off of them, because I never labeled them. Saving things in well defined locations, for well defined tasks, is a reasonable, intuitive, and necessary task to saddle a user of any system/technology/information with.
2. I don't really need to address this point specifically, since the answer is inherent in the points above. The overly large filesystems are part of a whole system that the user doesn't really need to know about. That is why the "Desktop/..." paradigm of Windows came about, and is so useful. People working on your word processor have a reason to put the font files in one directory, the plugins in another, and the preferences in a third. The user couldn't care less. If you start the user in a directory tree just for them, then they won't be stuck in a huge file system, and can still work in a fashion that has made sense for literally thousands of years.
The filesystem paradigm has been around for a long time, again literally thousands of years, because it works, it is easy, and it is how people think.
G:\Netowkrfilesystem\
Accounting\AccountsRecie
Should my porn directory be organized into movies, stills and texts or perhaps perverted, spicy and nice? Whichever attribute I choose I will have trouble searching on the other.
How about storing the files alphabetically, by model name? Install PostgreSQL/PHP and assign key words. Use drop down menus (breast size, hair color, action, file type (image, movie, etc.) etc.) to only bring up what you're looking for. It seems that this is what this guy's doing, only for KDE save/open.
I drank what? -- Socrates
I have recently become very annoyed with the way I am storing information. I've realised that I have four parallel, similar, yet completely independent methods for cataloguing information. One is the file system - directories containing documents, images, etc on various subjects. The second is the Favourites list in my web browser, containing links to web sites on various subjects. The third is my email contact list, containing groups of contacts in various categories. The fourth is my mailbox hierarchy, containing archives of emails on various subjects.
What I really need isn't a way to help store one category of information, but a single unified way to store all related information together, by subject. All my documents, emails, web links and addresses relevant to a particular customer, for example.
I don't really need a search tool (although they're always a nice function to have), I need a way to keep everything together and easily accessible.
Simon Hibbs
I've noticed about three main types of people in the world of open source: those who fix things, those who try to improve existing things (i.e., make them run faster, smaller, etc.) and those who like to tinker and make new stuff. This person seems to fit in the third category. As far as I can tell, this person is not so much trying to "fix" the file system, but to make a new and different version and/or approach to it. This may be a good thing. But if you don't like it, don't use it.
Life sucks, but death doesn't put out at all....
--Thomas J. Kopp
The author mentions softlinks, but claims that most uses of them are to make shortcuts. Well, maybe on M$ systems, but real systems let you use them better, and a little education of users (and perhaps a GUI-based frontend for 'ln') would make them more popular.
Everyone I know using UNIX-based systems easily grasps the idea of linking a file so that it appears in more than one place, and uses that.
A zillion years ago there was a concept called the 'Xanadu file system' which, if I am recalling correctly, was very similar to what the author has actually implemented. I did a quick google and found one tangential reference to it for those interested in late 1980s/early 1990s history
http://tgif.fremont.ca.us/~mfw/diss/node39.html
That the author has produced working code is a HUGE INNOVATION. That this innovation has been produced by one person with a personal itch to be scratched is the reason that free/open software works so well.
This sort of improvement in the user interface is what will allow Linux/BSD derivatives to drive right over the top of certain proprietary systems in common use today.
I am very easy to get along with, but I don't have time to waste being nice to people who are being stupid. -Theo
I think the tree model is ideal. What is not ideal is everything after the tree.
The file selection widget (FSW) is a core element of any high-level toolkit, and yet I've never seen one that provided any kind of utility that I need to make a filesystem work well in a GUI.
For starters, all FSWs should have memory, and they should understand what they're being used for. All of my graphics apps should "remember" where the last graphics app saved a file and default to that directory. Same goes for opening a file. Or office apps.
They should also have a history pull-down.
We also need a graphical abstraction for the filesystem (other than the MS-like horizontal tree) that customizes itself through use. If, for example, there are three directories that I load and save files to/from all the time, they should be the most obvious and accessible things in the tree.
Do these things, and graphical interaction with a filesystem makes sense.
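To make the wish list above a bit more concrete, here is a rough Python sketch of the bookkeeping such a file selection widget might keep: a last-used directory per kind of application, a history pull-down, and the most-used directories. No existing toolkit is being described here; the class `FileDialogMemory` and all its method names are hypothetical.

```python
from collections import defaultdict, deque


class FileDialogMemory:
    """Hypothetical state shared by every file dialog on the desktop."""

    def __init__(self, history_size=10):
        self._last = {}                      # category -> last directory used
        self._history = deque(maxlen=history_size)
        self._counts = defaultdict(int)      # directory -> times used

    def record(self, category, directory):
        """Call whenever a file is opened or saved through the dialog."""
        self._last[category] = directory
        if directory in self._history:
            self._history.remove(directory)  # move to the front, no duplicates
        self._history.appendleft(directory)
        self._counts[directory] += 1

    def default_dir(self, category, fallback="~"):
        """Where a dialog for this kind of app (graphics, office...) opens."""
        return self._last.get(category, fallback)

    def history(self):
        """Entries for the history pull-down, most recent first."""
        return list(self._history)

    def favourites(self, n=3):
        """The n most-used directories, to be shown most prominently."""
        ranked = sorted(self._counts.items(), key=lambda kv: (-kv[1], kv[0]))
        return [directory for directory, _ in ranked[:n]]
```

The point of keying on a category rather than on the individual application is that all graphics apps would then share one remembered directory, which is the behaviour asked for above.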
As for a metadata filesystem, I think there's utility in it to some extent, but unless "rm" understands it, and it's easy to use from that level too, it's useless to anyone who really USES a UNIX(-like) system.
Actually I would LOVE to have everything accessible in a database somehow. I've been wondering about something using the userfs stuff. Not really mounting a MySQL database as a usermode filesystem but having information from the system available that way.
I've found myself many times wishing I could just type "select location, filename from datastore where contents like '%resume%'"
SQL comes much more naturally to me than the find command does. I would love an easier way to index the contents of every file on my system by an arbitrary number of metadata and then have that accessible via a simple SQL statement.
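Something close to that query can be tried today with a throwaway index. The sketch below, in Python with SQLite, walks a directory tree, stores each file's text in a table named `datastore` (the name and columns are simply taken from the query above), and runs the `LIKE` search. The helper names `build_datastore` and `search` are made up for this example, and reading every file as text is a simplifying assumption that would not scale to a whole system.

```python
import os
import sqlite3


def build_datastore(root, db_path=":memory:"):
    """Index every readable file under root into a 'datastore' table."""
    db = sqlite3.connect(db_path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS datastore "
        "(location TEXT, filename TEXT, contents TEXT)"
    )
    for location, _dirs, files in os.walk(root):
        for filename in files:
            path = os.path.join(location, filename)
            try:
                # Assumption: treat everything as text, replacing bad bytes.
                with open(path, "r", encoding="utf-8", errors="replace") as fh:
                    contents = fh.read()
            except OSError:
                continue  # unreadable file: skip it
            db.execute(
                "INSERT INTO datastore VALUES (?, ?, ?)",
                (location, filename, contents),
            )
    db.commit()
    return db


def search(db, pattern):
    """Run the query from the comment, with the pattern as a parameter."""
    sql = (
        "SELECT location, filename FROM datastore "
        "WHERE contents LIKE ? ORDER BY location, filename"
    )
    return db.execute(sql, (pattern,)).fetchall()
```

Calling `search(db, "%resume%")` then returns `(location, filename)` pairs for every indexed file whose contents mention "resume".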
I remember Scot Hacker did something similar with BeFS and his webserver at some point but he's long gone as is BeOS.
Am I the only one that this makes sense to?
"Fighting the underpants gnomes since 1998!" "Bruce Schneier knows the state of schroedinger's cat"
Gosh, are you telling me I have to think up keywords and the like? Smells like work to me.
Wouldn't it be great if this overpowered POS (piece of silicon) could categorize the document itself? It would not really need "natural language" ability; just steal (er, borrow) ideas from web search engines and have a thesaurus handy.
Combine this with the idea that the "save" button is outdated (there was a /. article about this some months ago), and the user could really be confused! It might be neat to have the system automatically find neighborhoods of documents (by content matching and by time).
The real silver bullet to good programs is caffeine; lots and lots of caffeine! *twitch, twitch*
"Filesystem? I don't need no stinkin filesystem!" An ideal Palm-esque computing environment wouldn't have any filesystem.
I've been thinking along these lines for a couple years now. Suppose a computing appliance, perhaps handheld, or not, didn't have a filesystem. How would you make use of the hard disk?
Suppose the software saves everything in a memory-resident database. No filesystem, and no disk. Everything stays in memory. But it is virtual memory. Every page in memory has a reserved backing store page on the disk. The disk partition for this OS is just a big swap area. The total size of your usable "memory" is the swap area, not RAM. Now powering off the device becomes very fast. And so does powering on. No more "booting up" nonsense. You press the "off" button, and almost instantly the device is off. No matter how much data you have, or if you were in the middle of a huge unsaved word processing document, the device instantly powers off and back on again. No artificial concept of "saving" a file -- just like PalmOS. You don't "save" anything. In fact, no artificial concept of computer files. (For flamers: I'm not outlining a fully fleshed out implementation here, just some rough ideas, think different.)
You can still move your stuff to other computers via "syncing" or whatever you want to call it. It's just that higher level concepts are copied, uploaded, downloaded, e-mailed, etc. rather than a file (i.e. collection of untyped, unlabeled bytes). I may move my mp3's, and they are still categorized by artist, recording, date, label, etc.
I've also been thinking that a filesystem such as NTFS or ReiserFS that allows attaching huge amounts of metadata, or small amounts of metadata, to any file would be important. For instance, my 4096x2048 digital photograph of the Grand Canyon (big file) should still be able to have a thumbnail (say about 128 KB) attached as metadata. Since the thumbnail is part of the "directory" information of the file, merely copying the file to another location retains all the metadata. (As opposed to Windows or KDE, where the thumbnail is another little hidden file somewhere near where the original file was stored.) Heck, I might want a graphic thumbnail metadata attached to an mp3 file. Of course, I suggest ReiserFS or NTFS because there should be no limit on the number of labeled metadata attachments, nor on their size. I should be able to attach metadata "Title":"Grand Canyon", "Type":"TIFF", or "Audio Clipping":<5 MB of audio data> just as easily. When I move the file, the metadata moves with it -- but the metadata is not seen in the primary information flow -- i.e. sequence of bytes -- that make up the "file" data.
As much as I hate Microsoft, I expect that it is they who will do stuff like this first. Ideas such as I am discussing here will encounter lots of resistance from the old school. Just look at the resistance to the topic of this article in this discussion. (I remember when we had to organize and save our files ourselves, and we used stupid extensions like ".jpeg" as the only metadata, and it was uphill both ways.)
Drifting to a different topic, I wonder if true innovations at higher levels come from us geeks? We put up with the most abysmal user interfaces for so long that we are not even capable of recognizing a bad user interface. We are comfortable with what we've got. I frequently see the attitude: if I can learn this stuff, then you can too. If you can't get under the hood of your 1920's car and fix it when it frequently has minor troubles, then you shouldn't be driving. Where I'm going with this is that it may take talented people who are being paid to build next generation interfaces who follow someone else's vision who is not constrained by the present.
Just some opinions. I should quit rambling now.
The price of freedom is eternal litigation.
Case 1:
I'm your average home user, but even so I have about 100 documents I work on. However, I was smart enough to give them meaningful filenames and locations where it takes only a few seconds to find the file. Remembering attributes for each and every file would be a pain.
Case 2:
I'm a developer. I'm sorry, but I want file Y in F/O/O/BAR. I need something exact to describe where a file is at least. Anything else doesn't work.
Case 3:
I'm a mornon who doesn't give a flying-f*** about where I put my files, and I don't care what I name them. I already have documents in my C\:, C:\Windows/Temp, C:\sdf34\, and C:Documants. It takes me a couple minutes or two to find a file. What? I have to classify by keyword now? Who do you think I am? It needs to classify the files for me or I won't have any of it.
Case 4:
I'm a scientist/business man that deals with classifications on a day to day basis. I already have a database because I needed it to be efficient. If it was on the file system level, then it'd be pretty cool.
I can't think of any other positive cases where this product is useful. Thus, it's my bet that it'll be niche forever. Anybody got any other use cases that I'm obviously missing?
While I do think the work presented is a great idea, it seems to me that it's a lot of effort just to set up the system.
That's pretty much the problem with meta-data based file systems. They're great for new projects, where you have a clean start and can actually add metadata to the files. The real problem is legacy data.
My home directory weighs in at just under five gigabytes, and has files dating back over ten years, and that's just the "personal stuff". My work partition has about eight gigabytes, which is mainly source code.
I'm really not going to be able to associate metadata with every individual file by hand, at least not until automatic tools come along that will data-mine the file content and automatically do some minimal level of association.
On top of this a whole new generation of development tools needs to be written. At a very basic level you need a version of make that will build all C source files on the disk with associated meta data "Belonging to Project X, dated no later than last week".
When you think about it you'll realise that while as a concept it's fairly powerful, we won't be switching to using this sort of thing soon. For the same reasons the semantic web and RDF are having problems getting adopted, metadata based file systems face real problems before people will start widely adopting them...
Al.
The Daily ACK - Eclectic posts by yet another hacker
For those that don't want to use this... don't. But I can almost guarantee that you don't know where all your scripts and documents are located off the top of your head. And other people who might need to browse your shared directories certainly don't. And you can't tell me that there has never been a time that a doc could have been properly placed in more than one folder: everything can't be pigeonholed into one exact category... that is the nature of information. However, I agree that this system has its limitations. I've been wanting to do something like this, but it would allow you to save your document where you want in the HFS, and would make links from other "category" directories to the "actual" directories based on what the computer knows about the file (e.g. it is an mp3 file, so it goes in the audio category and the mp3 category, at least) and categories that you select in the Save dialog box. These categories and files would also be indexed in some type of database for additional searching capabilities. Please let me know of any products that do something similar to this.
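The "links from category directories" part of that idea is easy to prototype with symbolic links. Below is a small Python sketch under a few stated assumptions: the extension-to-category table `EXTENSION_CATEGORIES` and the function `link_into_categories` are invented for this example, the platform must support symlinks, and the database indexing step mentioned above is left out.

```python
import os

# Hypothetical mapping from file extension to automatic categories.
EXTENSION_CATEGORIES = {
    ".mp3": ["audio", "mp3"],
    ".txt": ["text"],
}


def link_into_categories(actual_path, categories_root, extra_categories=()):
    """Leave the file where the user saved it; add a link per category.

    Categories come from the file extension plus whatever the user picked
    (extra_categories), as they might in a Save dialog.
    """
    name = os.path.basename(actual_path)
    ext = os.path.splitext(name)[1].lower()
    categories = list(EXTENSION_CATEGORIES.get(ext, [])) + list(extra_categories)

    links = []
    for category in categories:
        category_dir = os.path.join(categories_root, category)
        os.makedirs(category_dir, exist_ok=True)
        link = os.path.join(category_dir, name)
        if not os.path.lexists(link):        # do not clobber an existing link
            os.symlink(os.path.abspath(actual_path), link)
        links.append(link)
    return links
```

One weakness of this approach, compared with a real metadata store, is that the links go stale if the actual file is later moved or renamed by a tool that does not know about them.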
I agree with you totally!
:-)
I'm sick and tired of having to navigate to the same folder each time from each app. Even once I set the "default directory" in each app, some of them ignore it. Most apps don't even care where I want to store my files, so don't give me an option. I like the idea of pushing that functionality into the FSW and entirely removing it from the app.
One clarification concerning management of file locations. I frequently find myself flipping between 2 or 3 basic locations because the files are of the 3 types. The FSW would have to be smart enough to anticipate which folder I'm wanting to work with (not just look at the file extension, but based on the pattern: I'm going between A, B and C, and just used C, so chances are I'm going back to A). If that is the level of intelligence you are considering, OH GOD YES! I'm there, dude!
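The "I just used C, so I'm probably going back to A" guess does not need much intelligence; counting which folder usually follows which already gets there. Here is a minimal Python sketch of that idea. The class name `FolderPredictor` and its methods are hypothetical, and a real widget would presumably combine this with other hints such as file type.

```python
from collections import Counter, defaultdict


class FolderPredictor:
    """Guess the next folder from which folder has followed which so far."""

    def __init__(self):
        self._transitions = defaultdict(Counter)  # folder -> Counter of next folders
        self._previous = None

    def visit(self, folder):
        """Record that the user just opened or saved something in folder."""
        if self._previous is not None:
            self._transitions[self._previous][folder] += 1
        self._previous = folder

    def predict(self):
        """Most frequent successor of the last folder used.

        Falls back to the last folder itself when there is no history for
        it, and to None when nothing has been visited at all.
        """
        if self._previous is None:
            return None
        following = self._transitions.get(self._previous)
        if not following:
            return self._previous
        return following.most_common(1)[0][0]
```

After visiting A, B, C, A, B, C in that order, `predict()` returns A, which is the behaviour described above.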
=======================
Psyclo, the dark night.
Mike, the computer geek.
We have a document imaging system that does basically just that. It's a Win32 package called application extender from OTG software. It hooks into your file->save dialogs and stores all your documents in a share with a nasty ID as the name, but then you look things up via the attributes you've set. Normal users don't actually even interact, or know where the true files are stored.
It's actually excruciatingly painful for users to deal with sometimes, since their interface makes it very difficult for normal users to figure out how to open an "actual file" rather than something in the application extender database.
My problem has always been, for example, for class note-taking, do I set up:
college
    class 1
        homework
        schedule
        notes
    class 2
        assignments
        schedule
        notes
    (etc)
or
college
    homework
        class 1
        class 2
    schedule
        class 1
        class 2
    notes
        class 1
        class 2
    (etc)
And I've often thought about this ability. Perhaps add some autodetection capabilities... give files automatic attributes such as "English"/"Spanish"/"Romaji" or "C"/"C++"/"Perl"...?
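With attributes instead of fixed directories, the choice between those two layouts goes away, because either tree can be generated on demand from the same tagged files. A tiny Python sketch of that, where the attribute names `class` and `kind` and the function `view` are just assumptions for the example:

```python
from collections import defaultdict


def view(files, first, second):
    """Group tagged files into a two-level tree: first attribute, then second."""
    tree = defaultdict(lambda: defaultdict(list))
    for f in files:
        tree[f[first]][f[second]].append(f["name"])
    # Convert to plain dicts so the result prints and compares cleanly.
    return {outer: dict(inner) for outer, inner in tree.items()}


# Each file is stored once and simply carries its attributes.
files = [
    {"name": "hw1.txt", "class": "class 1", "kind": "homework"},
    {"name": "lecture1.txt", "class": "class 1", "kind": "notes"},
    {"name": "essay.txt", "class": "class 2", "kind": "homework"},
]

by_class = view(files, "class", "kind")   # college/class 1/homework/...
by_kind = view(files, "kind", "class")    # college/homework/class 1/...
```

Both `by_class` and `by_kind` describe the same three files; only the grouping order differs.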
if the answer isn't violence, neither is your silence / freedom of expression doesn't make it alright
For one thing, HFS makes document security simple. By storing in directory X, you limit use of the document to those with various levels in User Group X.
For the home user/single PC, it's GIGO -- no matter the file system, whether HFS or metadata, the user has to recall it. Usually when looking for those 2-y.o. records, the user will give up and do a full content search. No great loss in productivity for the simple home user, who doesn't have that much data to organize in the first place.
For corporations with networks and immense document structures (where metadata comes in handy), there are already dozens of software packages/servers that allow indexing by metadata -- like Centra2000 (now Konfig), or *gag* Sharepoint Portal Server, or Documentum. The admin stores documents in an HFS (for determining security/accessibility), but the users find the docs using metadata, indexing, or links without having to worry about the OS directory location. Very reliable, easy for users to understand.
In the end, the problem is solved for business, and for home users, the problem is the home user, not the amount of data or structure of the FS.