newdocms: Beyond the Hierarchical File System
Manuel Arriaga writes "After two years of hard work (and many scrapped versions), I have just released an (ugly, but working!) preview version of newdocms, a completely new document management system. newdocms isn't a file browser: it is a layer between the hierarchical file system (HFS) and the user, which provides a radically new way to store and retrieve documents. No longer will you browse complex directory trees or directly interact with the HFS; instead, you define any number of document attributes when saving a document and then query a database of those attributes when trying to retrieve it later on.
For the first time you have a true alternative to the hierarchical file system at the OS level. Through the modification of the KDE shared libraries, newdocms currently works with all KDE apps! (I am looking for volunteers to add support for GNOME and OpenOffice.org!) This is a testament to the power of free software: this sort of innovation could never happen if it weren't for the free software nature of the underlying systems."
It sounds basically like this: when you want to find a file, you type in a few pieces of meta-data and then hit "search". It's a way to do it, but it seems to me (and it's early, so bear with me) that it's easier to remember one piece of meta-data (i.e. the path to the file) than several. With this setup you would have to supply more than one piece of data to differentiate between documents created, let's say, by the same author on the same day. Maybe I'm just used to an HFS, but I find it simple to open up a command prompt and type "pico /documents/foo/bar/fubar.txt".
Anyway, an interesting concept.
This is a testament to the power of free software: this sort of innovation could never happen if it weren't for the free software nature of the underlying systems.
This is completely untrue. There are lots of other options (like The Brain) that have been out for a while that have nothing to do with "free software". Hell, the fact that other proprietary systems (that are better, in my opinion) came out earlier shows that not only is "free software" irrelevant in this discussion, but it actually lags behind software driven by the profit model.
While I do think the work presented is a great idea, it seems to me that it's a lot of effort just to set up the system.
It would be ideal if the computer -- the thing that is supposed to make life easier -- did the classification. Until that happens I cannot see myself even considering such a file access method.
-- bartman
Who came up with the idea of "folders" anyway? Not hierarchical trees, but the metaphor.
The biggest problem with folders is that no one wants to be a file clerk and weed, sort, and file their docs. The act of socking away a doc should be as mindless as possible, not because (all) users are mindless but because they have better things to do, and shouldn't spend a minute adding keywords to every doc they might never see again.
You know how it is -- you're searching and coming up with junk, and want to yell at the computer, do what I meant, not what I said! This would be one of my first picks for AI on a personal computer.
I agree folders don't cut it, though as a metaphor for explaining the tree it's not bad. The problem is the tree.
That was done pre-UNIX with PICK. The whole O/S was a database.
Microsoft has been working on an Object File System for years and it is rumored that it might finally ship in Yukon.
A database-backed file system is a great idea for an O/S. But the relational model is long overdue for the garbage pail. Modern programming languages since C have used pointers or object references. If JOIN and messing around with tables is so good, why don't we all use COBOL?
One of the things that appeared in VMS a while back that was pretty cool (and pretty easy to do on a log-based file system) was transactions at the file level. You could take any set of file I/O operations and wrap a transaction around them. This meant that you could have atomic updates to any file-based resource without having to suffer the pain of SQL.
It would be pretty easy to implement this on a Linux log-based file system (or Windows, for that matter). All you do is extend the log structure so you can group operations together and implement some sort of commit flag.
You could then build an object-oriented filestore database using XML flat files. OK, so maybe the system is not going to be up to storing millions of records without more infrastructure. However, most programming tasks use configuration files that are unlikely to be more than a few tens of KB and are routinely managed as in-memory structures anyway.
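The "group operations together and add a commit flag" idea can be sketched in user space. This is only an illustration, not how VMS or any Linux file system does it: the JSON-lines journal format and the names append_transaction, replay and demo are all made up for the example. Writes are appended to a journal, and a group only counts once its commit record is on disk.

```python
import json
import os
import tempfile


def append_transaction(log_path, operations):
    """Append a group of (path, content) writes, then a commit record."""
    with open(log_path, "a", encoding="utf-8") as log:
        for path, content in operations:
            record = {"op": "write", "path": path, "content": content}
            log.write(json.dumps(record) + "\n")
        # The commit flag: the group is only valid once this line exists.
        log.write(json.dumps({"op": "commit"}) + "\n")
        log.flush()
        os.fsync(log.fileno())


def replay(log_path):
    """Rebuild file contents from the journal, skipping uncommitted groups."""
    state = {}
    pending = []
    with open(log_path, encoding="utf-8") as log:
        for line in log:
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                break  # torn final record: treat the rest as uncommitted
            if record["op"] == "write":
                pending.append((record["path"], record["content"]))
            elif record["op"] == "commit":
                state.update(pending)  # apply the whole group at once
                pending = []
    return state


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        log_path = os.path.join(tmp, "journal.log")
        append_transaction(log_path, [("a.conf", "x=1"), ("b.conf", "y=2")])
        # Simulate a crash: a write reached the log, its commit never did.
        with open(log_path, "a", encoding="utf-8") as log:
            crashed = {"op": "write", "path": "a.conf", "content": "x=99"}
            log.write(json.dumps(crashed) + "\n")
        return replay(log_path)


if __name__ == "__main__":
    print(demo())  # {'a.conf': 'x=1', 'b.conf': 'y=2'}
```

The half-finished write to a.conf never shows up after replay, which is the atomic-update property described above, with no SQL involved.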
Looking for an Information Security student project suggestion?
Try http://dotcrimeManifesto.com/
I used "The Brain" while I was on Windows, but it was nearly useless because it didn't support the two most important things:
a) Web browsing
It should know the sites you've visited, know your bookmarks, and allow you to open everything you have found with a simple click.
b) E-Mail
When it finds an e-mail, a simple double-click should be enough to open it in your mail client, show you the thread it belongs to, etc.
I guess I'm not the only one who has more important things in mails than in .docs or .xls.
Bye egghat.
-- "As a human being I claim the right to be widely inconsistent", John Peel
1. Paths tend to get long.
2. You have to be careful of your "current path". Some apps have weird defaults and if you're not careful, you end up with your file in a strange location.
3. Some items do not fit into the hierarchical structure. Should my porn directory be organized into movies, stills and texts or perhaps perverted, spicy and nice? Whichever attribute I choose, I will have trouble searching on the other.
Of course I can always use locate or find, but these tools only look at preset attributes (filename, last access date, substrings) and the solution from the article lets you specify your own attributes.
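To make point 3 concrete, here is a toy attribute index. It is not newdocms code; the class AttributeIndex and the attribute names "kind" and "tone" are invented for the example. Each document carries any number of user-defined attributes, and a query can use whichever of them you happen to remember.

```python
from collections import defaultdict


class AttributeIndex:
    """Toy in-memory index from user-defined attributes to document ids."""

    def __init__(self):
        self._docs = {}                    # doc id -> attribute dict
        self._by_value = defaultdict(set)  # (attribute, value) -> doc ids

    def add(self, doc_id, **attributes):
        # Note: re-adding an existing id is not handled in this sketch.
        self._docs[doc_id] = attributes
        for key, value in attributes.items():
            self._by_value[(key, value)].add(doc_id)

    def find(self, **criteria):
        """Return sorted ids of documents matching every given attribute."""
        if not criteria:
            return sorted(self._docs)
        matches = [self._by_value.get(item, set()) for item in criteria.items()]
        return sorted(set.intersection(*matches))


index = AttributeIndex()
index.add(1, kind="movie", tone="spicy")
index.add(2, kind="still", tone="spicy")
index.add(3, kind="movie", tone="nice")

print(index.find(kind="movie"))                # [1, 3]
print(index.find(tone="spicy"))                # [1, 2]
print(index.find(kind="movie", tone="spicy"))  # [1]
```

In a directory tree you have to pick either kind or tone as the top level; here neither attribute is privileged.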
So where do your documents go when you save them with newdocms? As you might have noticed (if you looked at the window titles after saving something), they are stored as ~/Docs/{numeric id}.{ext}. All the metadata is stored in a file called ~/newdocms.db. (It is not wise to delete it!) In that file each document's attributes are associated with its unique numeric id (the one which is used as a file name).
Right.
This is astoundingly bad software engineering.
Manuel, when your software fails, and it will, and somehow that db file gets trashed, you've turned that user's files into a huge heap of unsorted data. Effectively, it would be 100 times worse than never implementing your system, rather than 10 times better. No matter how bulletproof you think your code is, it probably isn't 100% perfect, so having all your eggs in one basket is unwise to say the least.
Even if your code is 100% perfect this is a mistake. What happens when a sector goes bad and this file is trashed? What happens when the first really dangerous Linux worm makes it a point to delete *.db from the filesystem?
Give the files names that are coded with human readable attribs! Double up that db file! Jesus, man... build SOME kind of redundancy in your system before you throw away the old way of storing the data.
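One way to get that kind of redundancy, sketched here as an assumption and not as anything newdocms actually does: put the attributes into the file name itself, so the index can be rebuilt from a directory listing if the db file is lost. The naming scheme ("id,key=value,...ext") and the helpers encode_name, decode_name and rebuild_index are all hypothetical. Percent-encoding keeps "," and "=" inside values from clashing with the separators.

```python
from urllib.parse import quote, unquote


def encode_name(doc_id, ext, attributes):
    """Build a file name like '42,author=Some%20One,kind=letter.txt'."""
    fields = [str(doc_id)]
    for key, value in sorted(attributes.items()):
        # safe='' also escapes ',' '=' and '/' so they cannot break parsing.
        fields.append(f"{quote(key, safe='')}={quote(value, safe='')}")
    return ",".join(fields) + "." + ext


def decode_name(filename):
    """Recover (doc_id, ext, attributes) from a name made by encode_name."""
    stem, ext = filename.rsplit(".", 1)  # assumes the extension has no dot
    first, *pairs = stem.split(",")
    attributes = {}
    for pair in pairs:
        key, value = pair.split("=", 1)
        attributes[unquote(key)] = unquote(value)
    return int(first), ext, attributes


def rebuild_index(filenames):
    """Recreate the id -> attributes mapping from file names alone."""
    index = {}
    for name in filenames:
        doc_id, _ext, attributes = decode_name(name)
        index[doc_id] = attributes
    return index


name = encode_name(42, "txt", {"author": "Manuel Arriaga", "kind": "letter"})
print(name)  # 42,author=Manuel%20Arriaga,kind=letter.txt
print(rebuild_index([name, "7.png"]))
```

File name length limits put a ceiling on how much metadata fits this way, so this is a fallback next to the database, not a replacement for it.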
There's a reason why there is such a scramble to implement a general attribute system at the FS level on many FS projects right now(*). The time has come for OSS to start being smart about this, but cramming all your metadata into a single file and throwing the backup out the window is just a very, very poor idea.
(*) BeOS was, yet again, way ahead of its time with BeFS.
I believe metadata is a useful additional means to find files; however, I would still want hierarchy as the primary storage. For most people the only metadata they ever consider is the name of a file, and files are often poorly named. I applaud the effort of the person who is doing this project though.
-- Solaris Central - http://w