newdocms: Beyond the Hierarchical File System
Manuel Arriaga writes "After two years of hard work (and many scrapped versions), I have just released an (ugly, but working!) preview version of newdocms, a completely new document management system. newdocms isn't a file browser: it is a layer between the hierarchical file system (HFS) and the user, which provides a radically new way to store and retrieve documents. No longer will you browse complex directory trees or directly interact with the HFS; instead, you define any number of document attributes when saving a document and then query a database of those attributes when trying to retrieve it later on.
For the first time you have a true alternative to the hierarchical file system at the OS level. Through the modification of the KDE shared libraries, newdocms currently works with all KDE apps! (I am looking for volunteers to add support for GNOME and OpenOffice.org!) This is a testament to the power of free software: this sort of innovation could never happen if it weren't for the free software nature of the underlying systems."
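To make the save-with-attributes, query-later idea concrete, here is a minimal sketch in Python. It is not newdocms code: the table layout, the function names (`open_store`, `save_document`, `find_documents`) and the sample paths are all assumptions made up for illustration, and it uses an in-memory SQLite database only because that ships with Python.

```python
import sqlite3

def open_store():
    """Create an in-memory attribute database (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE attrs (path TEXT, name TEXT, value TEXT)")
    return db

def save_document(db, path, **attributes):
    """Record the user-chosen attributes for a document at save time."""
    db.executemany(
        "INSERT INTO attrs (path, name, value) VALUES (?, ?, ?)",
        [(path, name, str(value)) for name, value in attributes.items()],
    )

def find_documents(db, **criteria):
    """Return sorted paths whose attributes match every name/value pair."""
    matches = None
    for name, value in criteria.items():
        rows = db.execute(
            "SELECT path FROM attrs WHERE name = ? AND value = ?",
            (name, str(value)),
        )
        paths = {row[0] for row in rows}
        # Intersect so that a document must satisfy all criteria.
        matches = paths if matches is None else matches & paths
    return sorted(matches or [])

# Hypothetical documents; the user never has to remember these paths.
db = open_store()
save_document(db, "/home/me/docs/0001.txt", project="thesis", kind="draft", year=2002)
save_document(db, "/home/me/docs/0002.txt", project="thesis", kind="notes", year=2002)
save_document(db, "/home/me/docs/0003.txt", project="taxes", kind="draft", year=2001)

print(find_documents(db, project="thesis", kind="draft"))
```

The point of the sketch is only that retrieval goes through the attributes rather than through a directory path; a real layer would also have to hook the applications' open and save dialogs, which is the part the submission says is done inside the KDE libraries.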
"This is a testament to the power of free software: this sort of innovation could never happen if it weren't for the free software nature of the underlying systems."
... or not. As I recall, BeOS had a fully functional database-driven file system, although it did not entirely throw out the hierarchical side of things either (probably a good decision in my opinion). In fact, I recall reading a while back that future versions of Windows were supposed to have database-driven file systems as well.
While free software is great, let's not get too cocky about what kind of innovations it can produce when we aren't aware of what the traditional software companies have already done.
Microsoft SharePoint also allows you to store your own metadata with files, and it also grabs the "properties" from Office files. This is not a substitute for the folder tree but an addition to it, and it is indeed a great tool (aimed more at the corporation than the individual).
:-P
But it's MS and here I am burning karma for just mentioning it. Big deal, I can spare the karma.
I like this guy's enthusiasm for open source.
I have questions, though, about the user's ability to apply meaningful attributes to large amounts of content. If the user fails to provide meaningful attributes, the system fails to provide the user with meaningful results. In that case I would judge this system to be less user-friendly, because the files would be returned in one big lump.
This idea strikes me as an implementation of something similar to the Dublin Core Metadata Initiative, except for local content. Wouldn't this project benefit from enabling the user to manage ALL types of information, even remote? It wouldn't be a large stretch of the imagination to take that step.
If anybody is interested how the Dublin Core works in application you might want to check out the Zope CMF(Content Management Framework).
My experience from using Zope's CMF is that the initial learning process for a user of this method of organization was slow and bumpy. I must point out, though, that my experience was with a single implementation only, so I'm not asserting that an implementation couldn't be designed that would improve the learning curve for users.
I would also like to point out to the people who have said this would ruin Linux that they don't understand exactly what this tool does. It's a means of efficiently cataloguing and managing content. Using the tool does not restrict the user to that tool alone; it can be used in conjunction with the traditional HFS. The author even says so in the article.
"Russian puppets - forgot the name
Babushkas. If you want some, there's always Google."
Nope. They are called Matryoshka dolls. Babushka means 'Grandma'.
Windows XP has most of the groundwork for this - Windows has actually had it for a while; for some reason the last piece (the filesystem that lets you take advantage of it all) keeps not showing up.
You want metadata on files? NTFS streams give you a place to store metadata (much like Mac resource forks but with any number of named streams).
You want to search on the metadata? The Microsoft Indexing Server will build a database and let you search on it (though it's a very strange system to use: in XP go into Administrative Tools, Computer Management, Services and Applications, Indexing Service, System and click on "Query the Catalog"). You can do instant searches for all kinds of stuff; look at the help.
OLE Structured Storage is like a single file version of the filesystem we're talking about - a way of saving a bunch of objects (some of which you didn't create but that are in your document) into a file. I believe Microsoft's Office apps use it (could be wrong there though).
Right-click on an MP3 file and pick Properties in XP and go to the Summary tab. There's the metadata - the stuff the index server is going to index. If you add a new file format to the system, you can supply a DLL that will be able to supply the metadata for those files - so you download an MP3, save it on your disk, and the index server uses the DLL to get the metadata and add it to the database. It works pretty well.
I don't really have a point to all this, just listing some stuff that Windows has that "should" make it easy for Microsoft to add the OO FS someday and have it instantly work with existing apps.
- Steve
Extracting and populating the metadata from the generating application would be a good and obvious starting point.
e.g.
HTML: use TITLE as an ABOUT entry (and Hn blocks too), the URL as the SOURCE etc..
DOCUMENTS: use the XML schema for attributes
PICTURES: The COLOUR DEPTH and RESOLUTION can be simply extracted
Then let the user add/edit the attributes that have been automatically set. Not ideal, but a starting point.
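For the HTML case, the automatic part can be sketched with nothing but Python's standard library. This is only an illustration of the suggestion above: the class name `HtmlMetadataParser`, the function `suggest_attributes`, and the ABOUT/SOURCE dictionary layout are assumptions, not part of any existing tool.

```python
from html.parser import HTMLParser

class HtmlMetadataParser(HTMLParser):
    """Collect the TITLE and heading (H1-H6) text of an HTML page."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.title = ""
        self.headings = []
        self._current = None  # tag currently being captured, if any
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" or tag in self.HEADINGS:
            self._current = tag
            self._buffer = []

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self._current:
            return  # e.g. </em> inside a heading; keep capturing
        text = " ".join("".join(self._buffer).split())  # collapse whitespace
        if tag == "title":
            self.title = text
        else:
            self.headings.append(text)
        self._current = None

def suggest_attributes(html_text, url):
    """Propose attributes the user can then add to or edit (hypothetical layout)."""
    parser = HtmlMetadataParser()
    parser.feed(html_text)
    parser.close()
    about = [text for text in [parser.title] + parser.headings if text]
    return {"ABOUT": about, "SOURCE": url}

page = ("<html><head><title>My  Trip</title></head><body>"
        "<h1>Day <em>one</em></h1><p>text</p><h2>Day two</h2></body></html>")
print(suggest_attributes(page, "http://example.org/trip.html"))
```

The returned dictionary is only a suggestion to show the user; as the comment says, the user still gets to add to or correct whatever was filled in automatically.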
XP & 2000 have full text indexing. You can either run a service that constantly indexes your files so searches are quick, or it will just search through your files, but that takes a while.
That indexing on W2K is just about worthless. It's much slower than anything in Unix, and it frequently gets convinced that your hard drive is empty; by that I mean all searches instantly return false. I ended up turning it off. It truly is a worthless piece of junk.
War is necrophilia.
Folders as a way of describing file hierarchies were part of the "desktop metaphor" that was developed in the late-70s/early-80s at Xerox PARC as part of the Xerox Star system. (I might have some of these details wrong, I worked at Xerox in the 80s.) The whole point of the desktop metaphor was to transform geeky computer internals into concepts the average office worker or exec could understand. Star even used "file cabinets" to organize folders.
/a/maze/of/twisty/little/folder/names/all/different.
Anyway, we did a lot of other cool stuff at Xerox in the 80s. There were two other information management systems that used non-hierarchical organizations. The Analyst (implemented in Smalltalk-80) and NoteCards (in Lisp) both had lattice file systems. You could create arbitrary links from one item to another, with lots of different kinds of links each with its own semantic meaning. It was an amazingly powerful way to navigate your files.
Why go to all that trouble? Because we found that it didn't matter how carefully people filed stuff away, they always were losing things. So the important thing was to make it as easy as possible for people to find their files, either by browsing or searching. In The Analyst, a document could be linked to by multiple folders, keywords, or other documents. The browser and search tools took advantage of the richness of linkages to make finding things easy. You just had to remember a few things about the item to locate it, rather than having to recall
Corel products have been doing essentially that for years -- you can set any number of directories as "places to find whichever sort of file", named and sorted however you like. And they do remember where you were working last time you had that app up.
~REZ~ #43301. Who'd fake being me anyway?
OK, that worked for that company's files, but how do I manage the 100,000+ files just on my laptop (and no, it's not porn. Some of us have real lives and real data!)? Moore's law also applies to the number of files collected by users on their hard drives, and we are rapidly reaching the limit of files we can manage (i.e. navigate, not store) with traditional file systems.
This is a problem of sufficient complexity that it is probably beyond any single individual to solve. The existing hierarchical system is flexible and simple to use. It ain't broke, but it is limited in its ability to support the management of a large amount of disparate but loosely linked information. However, getting a taxonomical attribute classification file system into the mainstream will require a quantum leap, as there are many problems to be solved to achieve the same simplicity as our existing solution.
A good attribute based file system is real hard:
Attributes can be used for different purposes and by different users
Attributes should be hierarchical (classifying): E.g.: Operating System::\bin\bash, or Operating System::\system32\drivers\bin
Note that while it is possible to have strong taxonomies that are non-hierarchical through use of multiple attributes, the hierarchical structure gives a richer and more precise language for the taxonomy. Also, simple attributes like "blue" are fairly meaningless - are we talking about "emotion\blue" or "color\blue"?
Need to have multiple attributes to achieve a full classification. E.g. Operating System::~\My Documents\, Applications::\Word\, User Classifications::~\work\\projects\
Classification data required should be minimal in the beginning
Need to classify files enough to maintain "sensible" uniqueness. Timestamps can differentiate between two files for uniqueness (e.g. financial reports#1999 Vs financial reports#2000), but more meaningful classifications can be found (e.g. financial reports#My tax Vs financial reports#Enron Corp). (Especially as file system timestamps often don't match the original time relevant to the content).
Classifications for existing content should evolve over time, i.e. more precise classification data should be added to old content as new content is added.
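A rough sketch of what hierarchical attributes across several dimensions could look like, in Python. Everything here is hypothetical: the `Catalog` class, the dimension names (`user`, `application`), the file ids, and the use of "/" instead of "\" as separator are assumptions chosen for illustration, and a plain in-memory dictionary stands in for whatever a real file system would use.

```python
from collections import defaultdict

def _parts(path):
    """Split 'work/projects/tax' into ('work', 'projects', 'tax')."""
    return tuple(path.strip("/").split("/"))

class Catalog:
    """Maps file ids to hierarchical attributes grouped by dimension."""

    def __init__(self):
        # file id -> dimension name -> set of attribute paths (tuples)
        self._attrs = defaultdict(lambda: defaultdict(set))

    def classify(self, file_id, dimension, path):
        """Attach an attribute such as 'color/blue' under a dimension.

        Can be called again later to refine an old file's classification.
        """
        self._attrs[file_id][dimension].add(_parts(path))

    def find(self, **queries):
        """Return file ids matching every dimension=path-prefix query."""
        wanted = {dim: _parts(path) for dim, path in queries.items()}
        hits = []
        for file_id, dims in self._attrs.items():
            if all(
                any(path[:len(prefix)] == prefix for path in dims.get(dim, ()))
                for dim, prefix in wanted.items()
            ):
                hits.append(file_id)
        return sorted(hits)

catalog = Catalog()
catalog.classify("report-a", "user", "work/projects/tax")
catalog.classify("report-a", "application", "Word")
catalog.classify("report-b", "user", "work/projects/enron")
catalog.classify("report-b", "application", "Word")
catalog.classify("photo", "user", "color/blue")
catalog.classify("poem", "user", "emotion/blue")

print(catalog.find(user="work/projects"))                         # both reports
print(catalog.find(user="work/projects/tax", application="Word")) # one report
print(catalog.find(user="color/blue"))                            # the photo only
```

A broad prefix query narrows as more path components or more dimensions are given, and "color/blue" and "emotion/blue" stay distinct. It says nothing about the hard parts listed above, such as keeping the classification effort minimal for the user.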
We need multiple layers to support such a file system:
Underlying file system, APIs etc
Applications that are taxonomically aware
GUI and command line based tools of equal capabilities
Tools for manually classifying content
Navigation tools that provide rapid, intuitive navigation of multiple dimensions
Users trained to understand the file system metaphors & mechanics
And then we need:
Intelligence tools to automatically suggest classification attributes based on the content and the system's learnt understanding of the user.
We need to know when to stop! Hey, we could build a full AI system to manage contextual relationships and understand content, but let's get the minimal set of features required to make the use of the system compelling.
SQL is not the answer here...
The metadata management risks and issues are similar to other file systems. Remember DOS compressed drives? Eventually MS achieved reliable file system compression in NTFS. Yes, this is complex and risky until the bugs are sorted out. No, I don't want to run a production system on an unstable filesystem.
So good luck to Manuel. This will be an area of much activity.