Slashdot Mirror


newdocms: Beyond the Hierarchical File System

Manuel Arriaga writes "After two years of hard work (and many scrapped versions), I have just released an (ugly, but working!) preview version of newdocms, a completely new document management system. newdocms isn't a file browser: it is a layer between the hierarchical file system (HFS) and the user, which provides a radically new way to store and retrieve documents. No longer will you browse complex directory trees or directly interact with the HFS; instead, you define any number of document attributes when saving a document and then query a database of those attributes when trying to retrieve it later on. For the first time you have a true alternative to the hierarchical file system at the OS level. Through the modification of the KDE shared libraries, newdocms currently works with all KDE apps! (I am looking for volunteers to add support for GNOME and OpenOffice.org!) This is a testament to the power of free software: this sort of innovation could never happen if it weren't for the free software nature of the underlying systems."

29 of 650 comments (clear)

  1. I already use a different one: by NineNine · · Score: 5, Interesting

    I'm already using The Brain. It's *really* unique, and it works. It works very well. And, in addition to organizing files the way YOU want them organized, it also connects random thoughts, web sites, emails, etc. If you haven't seen it, check it out. It's pretty damn incredible.

    1. Re:I already use a different one: by NineNine · · Score: 4, Insightful

      This is a testament to the power of free software: this sort of innovation could never happen if it weren't for the free software nature of the underlying systems.

      This is completely untrue. There are lots of other options (like The Brain) that have been out for a while that have nothing to do with "free software". Hell, the fact that other proprietary systems (that are better, in my opinion) came out earlier shows that not only is "free software" irrelevant in this discussion, but it actually lags behind software driven by the profit model.

  2. Interesting... by Akardam · · Score: 5, Insightful

    It sounds basically like when you want to find a file, you go type in a few pieces of meta-data, and then hit "search". It's a way to do it, but it seems to me (and it's early, so bear with me) that it's easier for me to remember one piece of meta-data (i.e. the path to the file) than several (as it would seem with this setup, as you would have to present more than one piece of data to differentiate between different documents, let's say, created by the same author on the same day). Maybe I'm just used to a HFS, but I find it simple to open up a command prompt and type "pico /documents/foo/bar/fubar.txt".

    Anyway, an interesting concept.

    1. Re:Interesting... by Obiwan+Kenobi · · Score: 5, Interesting
      Maybe I'm just used to a HFS, but I find it simple to open up a command prompt and type "pico /documents/foo/bar/fubar.txt"


      That's the whole reason for the program -- you shouldn't have to remember long, detailed folder structures and filenames in order to retrieve a file you were looking for.

      I can't tell you how many times I've had to help users find some file, shortcut, document or spreadsheet that they've "lost" because they forgot the correct path. But they do remember it involved a loan, or it involved a party announcement, or something similar. I swear, just the other day I spent an hour waiting on another employee to get off the phone so I could find a folder shortcut another employee had lost. She wasn't sure what folder the shortcut referred to, but she knew it contained documents of a certain type.

      Do you see a pattern here? To me, this sounds just like what Microsoft is trying to do with Longhorn, and potentially Office 11. People are tired of searching and hunting through folders and hierarchies full of oddly named files and temp folders that can confuse Joe User.

      This is awesome software and definitely a step forward. It might not change the geek community, but it will certainly help out system admins of the world. While your method still works (and hopefully, in the future, these two systems should work hand-in-hand, but that's another project I suppose), this is a damn fine alternative.

    2. Re:Interesting... by Elwood+P+Dowd · · Score: 5, Insightful

      Except that those users that can't remember where their shortcuts are aren't going to set up good metadata in the first place. So knowing that it's about loans isn't going to help anyway.

      When it comes to that, users just need full text indexing of their documents so they can do full text searches more quickly. I dunno about Windows, but we've definitely got that in Mac OS.

      --

      There are no trails. There are no trees out here.
    3. Re:Interesting... by Theatetus · · Score: 5, Insightful
      When it comes to that, users just need full text indexing of their documents so they can do full text searches more quickly. I dunno about Windows, but we've definitely got that in Mac OS.

      Great for writers, not so good for graphic artists. I sysadmined for a few years in a graphics/video shop that had tens of thousands of images on the various fileservers. I essentially wrote a very simple version of this "DB on top of FS" idea because I was tired of helping people find their TIFFs.

      Yes, /home/projects/DOJ/annual_report/masters is just one piece of metadata, and some people find that easier to remember than several keywords. OTOH, suppose two years later you want to reuse that image of the hispanic male using a computer. Was that in /home/projects/DOJ/annual_report/masters or /home/projects/USDA/website/images ?

      My solution (and, it would seem, the article's, though I'm sure that one is a lot more robust), was to keep the users away from the FS completely. Just let them bring up all the images tagged with "hispanic male computer." Most graphics shops I've seen either built a DB file manager or bought one.
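
      A minimal sketch of that kind of tag lookup, in Python with an in-memory SQLite database. The table layout, the `tag_file` and `find_by_tags` helpers, and the sample file names are assumptions made up for illustration; this is not the schema of the tool described above or of newdocms.

```python
import sqlite3

def make_db():
    """Create an in-memory database with a files table and a tags table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE files (id INTEGER PRIMARY KEY, path TEXT UNIQUE)")
    db.execute("CREATE TABLE tags (file_id INTEGER, tag TEXT, UNIQUE(file_id, tag))")
    return db

def tag_file(db, path, tags):
    """Register a file (if new) and attach the given keywords to it."""
    db.execute("INSERT OR IGNORE INTO files (path) VALUES (?)", (path,))
    file_id = db.execute("SELECT id FROM files WHERE path = ?", (path,)).fetchone()[0]
    for tag in tags:
        db.execute("INSERT OR IGNORE INTO tags (file_id, tag) VALUES (?, ?)",
                   (file_id, tag.lower()))

def find_by_tags(db, tags):
    """Return the paths of files that carry every one of the given tags."""
    wanted = sorted({t.lower() for t in tags})
    if not wanted:
        return []
    placeholders = ",".join("?" for _ in wanted)
    rows = db.execute(
        f"SELECT f.path FROM files f JOIN tags t ON t.file_id = f.id "
        f"WHERE t.tag IN ({placeholders}) "
        f"GROUP BY f.id HAVING COUNT(DISTINCT t.tag) = ? ORDER BY f.path",
        (*wanted, len(wanted)),
    ).fetchall()
    return [row[0] for row in rows]

# Hypothetical image files under the two project trees mentioned above.
DOJ_IMAGE = "/home/projects/DOJ/annual_report/masters/img_0042.tif"
USDA_IMAGE = "/home/projects/USDA/website/images/field.tif"

db = make_db()
tag_file(db, DOJ_IMAGE, ["hispanic", "male", "computer"])
tag_file(db, USDA_IMAGE, ["farm", "male"])
matches = find_by_tags(db, "hispanic male computer".split())
```

      The user never sees either directory; the query only cares that all three keywords are attached to the same file.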

      Honestly, I think the idea of computers holding a lot of "files" organized into "directories" is a little old. It was great in 1970 but maybe (like this guy is doing) we should rethink it a little. Why not say a computer has certain knowledge ("files") and certain capabilities ("executables")? Rather than naming files, describe the data you want the computer to retain, and retrieve it later from that description.

      As somebody pointed out, Office2K/XP and W2K/XP have something like this already, but people don't use it because they still have to name files. That's the crucial step, I think, and that's why I took that power out of my users' hands. They never named files; the app did it for them. Instead, they described files and versions. Abstraction and all that...

      Anyways, this idea may not help everybody, but it sounds like my old users would have liked it (they, btw, were very good about using specific and accurate keywords... no QWERTY effect here; they just didn't think in terms of files and directories). Plus, it's nice to see somebody trying to move past the "files and directories" mindset we've had for the past 3 decades.

      --
      All's true that is mistrusted
  3. Remind anyone of something? by chrisseaton · · Score: 5, Interesting

    When Microsoft suggested something like this, everyone went mental, and I got bitch slapped for saying I thought it was a good idea.

  4. Re:What's wrong with hierachical systems anyway? by MacAndrew · · Score: 5, Funny

    What's wrong with hierachical systems anyway?

    Well, they're pretty darn hard to spell, for one thing. ;-)

  5. Re:What's wrong with hierachical systems anyway? by Mononoke · · Score: 5, Interesting
    They work fine for me
    What's wrong with HFS?
    1. Not confusing enough.
    2. No possibility of new patents.
    3. Lack of ability to lock users into your proprietary file system.
    I didn't know HFS was broken.
    --
    NetInfo connection failed for server 127.0.0.1/local
  6. LIAR! by gazbo · · Score: 5, Funny

    Microsoft couldn't have come up with this idea: the submission explicitly states that it wouldn't be possible outside the free software model. QED.

  7. looks like very high quality work, but... by bartman · · Score: 4, Insightful

    While I do think the work presented is a great idea, it seems to me that it's a lot of effort just to set up the system.

    It would be ideal if the computer -- the thing that is supposed to make life easier -- did the classification. Until that happens I cannot see myself even considering such a file access method.

    --
    -- bartman
  8. Re:What's wrong with hierachical systems anyway? by archeopterix · · Score: 5, Funny

    The answer is in the G:\archived\userFolders\shlemiel\appfiles\textdocs\myFavEditorFiles\compDocs\scratch\WhyHierarchicalFSBad.txt file.

  9. Re:What's wrong with hierachical systems anyway? by b_pretender · · Score: 4, Interesting
    If you try to forget everything that you know about computers, and then abstractly think about what a filesystem should be you come to one of the following two conclusions:

    1. "Filesystem? I don't need no stinkin filesystem!" An ideal Palm-esque computing environment wouldn't have any filesystem. There simply isn't any reason for it. Why would you store addresses in an address file or a book report in a word file? Saving/Opening files should be transparent to the end user. Versioning should be built in, yet simple to understand. Forking files can be accomplished without copying a file. This is intuitively the simplest idea.

    2. If you somehow *have* to think in terms of files, then your conclusion may be to use files. However, I don't see why anybody would come up with a hierarchical file system, unless they were accommodating hardware limitations. Placing files somewhere within a huge directory tree is just too darn complicated. Why should the same file not exist in multiple directories? Why should copies of a file exist? Everything, including advanced security policies (more advanced than what is currently possible), is available for a *keyword* driven filesystem.

    I believe this is a step in the right direction and I can't wait until my favorite OS (not Linux) adopts a similar feature.

  10. Didn't BeOS have this years ago by nosse_elendili · · Score: 5, Informative

    "This is a testament to the power of free software: this sort of innovation could never happen if it weren't for the free software nature of the underlying systems."

    ... or not. As I recall, BeOS had a fully functional database driven file system, although it did not entirely throw out the hierarchical side of things either (probably a good decision in my opinion). In fact, I recall reading a while back that future versions of Windows were supposed to have database driven file systems as well.

    While free software is great, let's not get too cocky about what kind of innovations it can produce when we aren't aware of what the traditional software companies have already done.

  11. Historical Q by MacAndrew · · Score: 5, Insightful

    Who came up with the idea of "folders" anyway? Not hierarchical trees, but the metaphor.

    The biggest problem with folders is no one wants to be a file clerk and weed, sort, and file their docs. The act of socking away a doc should be as mindless as possible, not because (all) users are mindless but because they have better things to do, and shouldn't spend a minute adding keywords to every doc they might never see again.

    You know how it is -- you're searching and coming up with junk, and want to yell at the computer, do what I meant, not what I said! This would be one of my first picks for AI on a personal computer.

    I agree folders don't cut it, though as a metaphor for explaining the tree it's not bad. The problem is the tree.

  12. Not sure it's any better... by ArthurDent · · Score: 5, Interesting

    I agree. Basically the only way this is different from your HFS is that it encapsulates the meta-data (that is currently in the path name) differently. I'm not sure that's any better or worse. In fact, I myself like to be able to see at a glance all the categories of documents that I have, which is quite easy with HFS, but doesn't sound so easy here. Perhaps that's more because this is a new idea and not mature yet.

    Everyone seems hot to SQL the file system, and while I think that will be the way of the future, I don't think that there is a clear view of how that works from the user's perspective yet. Remember that this is a rather large paradigm shift from what everyone is used to. It's going to take a while for this to mature to the point that Joe User is going to be able to hack it. I mean, I looked at the Save As dialog on that page, and while it looks cool it also looks counter-intuitive to me and I'm a developer! How much more will a user get confused?

    All in all we're going in the right direction, but by no means are we anywhere near the goal yet.

    Ben

  13. This system would demand a lot of discipline... by MyNameIsFred · · Score: 4, Insightful
    ...you define any number of document attributes when saving a document and then query a database of those attributes when trying to retrieve it later on...
    The problem I see with this system is that it requires you to be disciplined when you save a document. I could see something like this working for things like MP3s where there is an internet database that could be used to select the appropriate attributes. However, in the work environment where you're cataloging Word files and Excel spreadsheets, I don't see it as useful. From my experience, when I'm searching for an old file, it's never for the reason I would have guessed, so I wouldn't have picked the right attributes when I saved it. In fact, I find it best to use features such as the MacOS X find dialog (or grep on the command line) that allow me to search by content.
    1. Re:This system would demand a lot of discipline... by Just+Some+Guy · · Score: 5, Insightful
      Furthermore, it's hard enough to get people to give their documents reasonable names. Convincing them to tag their files with accurate meta-data seems like an exercise in futility. I can hear the conversations:

      IT staffer: "That's the 3rd quarter financial report? You should click 'Financial', 'Quarterly', 'Company-wide', and 'Public'."
      Secretary: "I already named it T42f.doc. Get it? 'T' for third. '4' for quarter. '2' for 2002. 'f' for financial - 'F' is for filing."
      IT staffer: "But no one but you can find it!"
      Secretary, with a wink: "Hmmm... I never thought about that."

      I'm really not joking. If you can't get people to use filenames like "Preliminary quote to Foo, Inc. for widget sales 2002-12-23.doc", why are they going to bother picking those attributes from a menu?

      How about this: Give the users a palette of choices (with the ability to add more as required), and generate the filename based on their choices. Don't even give them the option of whipping up their own personal hash table - make them let the program come up with reasonable names for everything. You could even set a threshold, such as "At least one attribute from each category must be checked", or "every file must have at least 4 attributes".
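
      Here is a rough sketch of that idea in Python. The `PALETTE` categories, the `build_filename` function, and the underscore-joined naming scheme are all invented for illustration; a real palette would be editable by users, as suggested above.

```python
import re

# Hypothetical attribute palette: category -> allowed choices.
PALETTE = {
    "type": ["Financial", "Legal", "Marketing"],
    "period": ["Quarterly", "Yearly"],
    "scope": ["Company-wide", "Department"],
    "audience": ["Public", "Internal"],
}

def build_filename(choices, ext, require_every_category=True, min_attributes=0):
    """Build a file name from {category: choice}; raise ValueError if a rule fails."""
    if not choices:
        raise ValueError("pick at least one attribute")
    for category, choice in choices.items():
        if category not in PALETTE or choice not in PALETTE[category]:
            raise ValueError(f"unknown attribute {category}={choice}")
    if require_every_category:
        missing = [c for c in PALETTE if c not in choices]
        if missing:
            raise ValueError(f"pick an attribute from: {', '.join(missing)}")
    if len(choices) < min_attributes:
        raise ValueError(f"need at least {min_attributes} attributes")
    # Keep palette order so the same choices always give the same name.
    parts = [choices[c] for c in PALETTE if c in choices]
    slug = "_".join(re.sub(r"[^A-Za-z0-9-]+", "-", part) for part in parts)
    return f"{slug}.{ext}"

name = build_filename(
    {"type": "Financial", "period": "Quarterly",
     "scope": "Company-wide", "audience": "Public"},
    "doc",
)
# name is "Financial_Quarterly_Company-wide_Public.doc"
```

      Because the name is derived from the checked attributes, the user never types one, and either threshold can be switched on through the two keyword arguments.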

      --
      Dewey, what part of this looks like authorities should be involved?
  14. Re:SQL does not cut it by Zeinfeld · · Score: 4, Insightful
    What we really need is a really relational, full DBMS (with sane defaults) as the fundamental storage component of an OS.

    That was done pre-UNIX with PICK. The whole O/S was a database.

    Microsoft has been working on an Object File System for years and it is rumored that it might finally ship in Yukon.

    A database backed file system is a great idea for an O/S. But the relational model is long overdue for the garbage pail. Modern programming languages since C have used pointers or object references. If JOIN and messing around with tables is so good why don't we all use COBOL?

    One of the things that appeared in VMS a while back that was pretty cool (and pretty easy to do on a log based file system) was transactions at the file level. You could take any set of file I/O operations and wrap a transaction around them. This meant that you could have atomic updates to any file-based resource without having to suffer the pain of SQL.

    It would be pretty easy to implement this on a Linux log based file system (or windows for that matter). All you do is extend the log structure so you can group operations together and implement some sort of commit flag.
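
    To make the grouping and commit-flag idea concrete, here is a toy, in-memory sketch in Python. It is not how VMS or any Linux journaling file system actually works; the `Journal` class, its method names, and the record layout are assumptions chosen only to show the principle that a group of writes takes effect only if its commit record made it into the log.

```python
class Journal:
    """Toy log: writes are grouped per transaction and applied only if committed."""

    def __init__(self):
        self.records = []   # append-only log of (kind, txn_id, path, data)
        self.next_id = 1

    def begin(self):
        """Start a new group of operations and return its id."""
        txn_id = self.next_id
        self.next_id += 1
        self.records.append(("BEGIN", txn_id, None, None))
        return txn_id

    def write(self, txn_id, path, data):
        """Log a whole-file write as part of the given group."""
        self.records.append(("WRITE", txn_id, path, data))

    def commit(self, txn_id):
        """Append the commit record; this is the 'commit flag' for the group."""
        self.records.append(("COMMIT", txn_id, None, None))

    def replay(self):
        """Rebuild file contents, applying each group when its commit is seen."""
        pending = {}   # txn_id -> list of (path, data) not yet committed
        files = {}
        for kind, txn_id, path, data in self.records:
            if kind == "BEGIN":
                pending[txn_id] = []
            elif kind == "WRITE":
                pending[txn_id].append((path, data))
            elif kind == "COMMIT":
                for committed_path, committed_data in pending.pop(txn_id):
                    files[committed_path] = committed_data
        return files   # groups still in `pending` are simply dropped

log = Journal()
t1 = log.begin()
log.write(t1, "app.conf", "color=blue")
log.write(t1, "app.conf.bak", "color=red")
log.commit(t1)

t2 = log.begin()                       # simulated crash: never committed
log.write(t2, "app.conf", "color=green")

state = log.replay()
```

    After the replay, `state` holds both files from the first group and nothing from the second, which is the all-or-nothing behaviour described above.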

    You could then build an object oriented filestore database using XML flat files. OK so maybe the system is not going to be up to storing millions of records without more infrastructure. However most programming tasks use configuration files that are unlikely to be more than a few tens of Kb and are routinely managed as in memory structures anyway.

    --
    Looking for an Information Security student project suggestion?
    Try http://dotcrimeManifesto.com/
  15. Plz don't forget E-Mail and Web documents by egghat · · Score: 4, Insightful

    I have used "The Brain" while I was in Windows, but it was nearly useless as it didn't support the two most important things:

    a) Web browsing

    It should know the sites you've visited, know your bookmarks and allow you to open everything you have found with a simple click.

    b) E-Mail.

    When it finds an E-Mail a simple double-click should be enough to open it in your mail, show you the thread it belongs to, etc.

    I guess I'm not the only one who has more important things in mails than in .docs or .xls.

    Bye egghat.

    --
    -- "As a human being I claim the right to be widely inconsistent", John Peel
  16. Re:What's wrong with hierachical systems anyway? by archeopterix · · Score: 5, Insightful
    Yeah, non-descriptive directory names are poo. But make those directory names descriptive, and all of a sudden you're not so much of an idiot.
    There are bigger problems than non-descriptive names:

    1. Paths tend to get long.
    2. You have to be careful of your "current path". Some apps have weird defaults and if you're not careful, you end up with your file in a strange location.
    3. Some items do not fit into the hierarchical structure. Should my porn directory be organized into movies, stills and texts or perhaps perverted, spicy and nice? Whichever attribute I choose I will have trouble searching on the other.

    Of course I can always use locate or find, but these tools only look at preset attributes (filename, last access date, substrings) and the solution from the article lets you specify your own attributes.
  17. This should be implemented at the FS level by Anonymous Coward · · Score: 4, Insightful

    So where do your documents go when you save them with newdocms? As you might have noticed (if you looked at the window titles after saving something), they are stored as ~/Docs/{numeric id}.{ext}. All the metadata is stored in a file called ~/newdocms.db. (It is not wise to delete it!) In that file each document's attributes are associated with its unique numeric id (the one which is used as a file name).

    Right.

    This is astoundingly bad software engineering.

    Manuel, when your software fails, and it will, and somehow that db file gets trashed, you've rendered that user's files a huge heap of unsorted data. Effectively it would be 100 times worse than never implementing your system, rather than 10 times better. No matter how bulletproof you think your code is, it probably isn't 100% perfect so having all your eggs in one basket is unwise to say the least.

    Even if your code is 100% perfect this is a mistake. What happens when a sector goes bad and this file is trashed? What happens when the first really dangerous Linux worm makes it a point to delete *.db from the filesystem?

    Give the files names that are coded with human readable attribs! Double up that db file! Jesus, man... build SOME kind of redundancy in your system before you throw away the old way of storing the data.

    There's a reason why there is such a scramble to implement a general attribute system at the FS level on many FS projects right now(*). The time has come for OSS to start being smart about this, but cramming all your metadata into a single file and throwing the backup out the window is just a very, very poor idea.

    (*) BeOS was, yet again, way ahead of its time with BeFS.

  18. Intuitive by ACNeal · · Score: 4, Interesting

    Hierarchical file systems are as close to intuitive as you get. Everything you do in the real world, as pertains to dealing with information, mimics a hierarchical file system. Your Chilton manuals are in the garage, your cookbooks and recipe boxes are in the kitchen or dining room, your computer books are by your computer. You don't look in the computer manual for how to change your oil. When you are trying to bake a cake, you don't walk out into the garage for inspiration. Having information organized into different places, and then having those places subdivided into different boxes is intuitive, and is how most organized people think.

    1. (a) "We don't need no stinking filesystem." The ideal palmesque OS would have the same idea just demonstrated differently. You aren't going to open up your notepad to see an address. The address file is in the address program (directory). The schedule file is in the calendar program(directory). The programs you use to open the files become your folders.

    1. (b) "Saving/Opening files should be transparent" The only people that would think like this in the real world have been living with someone that picks up after them all the time. When you are working on some (paper and pencil) project, and just stand up and walk away, do you expect it to be available at the office tomorrow? When you start working on several projects in succession on your desk, and have reams of loose paper, can you easily bore your way back down? No. Reasonable, organized people pick up the project they are working on and file it away in the file cabinet/brief case/wherever it is supposed to go. There are logical beginnings and endings to your working on a project that only you can decide on. Take a spreadsheet, for example: do you want it to save every time you make a change? No. By design, you would normally set up all your formulas, save that, and then every day/month/year open up the spreadsheet, plug in the numbers, get the results, and save the specific results to a different file, or just look at the values produced. Not to mention, when you sit down at your desk in the morning, do you expect your desktop to know what project you want to work on? No, and you don't expect your computer to know what project you are working on either. Opening/Saving files shouldn't be and can't be transparent to the user.

    I used to use a lot of floppies when growing up. I appropriated a lot of disks from other places. I used the "grab the black disk with the couple of remnant label pieces... no the other black disk... No, the one with the two small pieces of adhesive... Ooops, the one with the three pieces..." approach. Now, I have to search all the disks every time I want anything off of them, because I never labeled them. Saving things in well defined locations, for well defined tasks, is a reasonable, intuitive, and necessary task to saddle a user of any system/technology/information with.

    2. I don't really need to address this point specifically, since the answer is inherent in the points above. The overly large filesystems are part of a whole system that the user doesn't really need to know about. That is why the "Desktop/..." paradigm of Windows came about, and is so useful. People working on your word processor have a reason to put the font files in one directory, the plugins in another, and the preferences in a third. The user couldn't care less. If you start the user in a directory tree just for them, then they won't be stuck in a huge file system, and can still work in a fashion that has made sense for literally thousands of years.

    The filesystem paradigm has been around for a long time, again literally thousands of years, because it works, it is easy, and it is how people think.

    G:\Networkfilesystem\Accounting\AccountsReceivable\Yesterday\Tomorrow\AWeekAgoToday might be confusing, but the filesystem paradigm isn't.

  19. agree by ragnar · · Score: 4, Insightful

    I believe metadata is a useful additional means to find files; however, I would still want hierarchy as the primary storage. For most people the only metadata they ever consider is the name of a file, and this is often poorly chosen. I applaud the effort of the person who is doing this project though.

    --
    -- Solaris Central - http://w
  20. Re:What's wrong with hierachical systems anyway? by OneEyedApe · · Score: 5, Interesting

    I've noticed about three main types of people in the world of open source: those who fix things, those who try to improve existing things (i.e., make it run faster, smaller, etc.) and those who like to tinker and make new stuff. This person seems to fit in the third category. As far as I can tell, this person is not so much trying to "fix" the file system, but to make a new and different version and/or approach to it. This may be a good thing. But if you don't like it, don't use it.

    --
    Life sucks, but death doesn't put out at all....
    --Thomas J. Kopp
  21. Re:Folders by egreB · · Score: 4, Interesting

    Russian puppets - forgot the name
    Babushkas. If you want some, there's always Google.

    Consider this: you save your spreadsheet today as "Yearly Report 2002", and two days later, when you want to call it back, your mind just doesn't say "Yearly Report 2002", but more like "Financial Data last year". Then your nice database-filesystem won't find it either. Unless there is some serious AI backing it.
    Now that would be an interesting file storage abstraction. I've played with the idea of a relational file structure, that would enable one to save meta-information on a file and later find it by information that relates to it. Implemented correctly, you could save your "Yearly Report 2001" and later find it by asking for "financial data two years ago". Something that combines newdocms and ThoughtTreasure.

    ThoughtTreasure is a relational information storage handler combined with a (semi-)intelligent AI. You can supply information like "Peter loves Paul" and "Paul hates Cahtrine." You can then ask questions like "Who does Peter like?" and "What relationship is there between Paul and Cahtrine?" If you say stuff like "Peter dislikes Paul" it complains like "But I thought Peter loved Paul." But it goes far further than that. You can have it parse a movie review, and ask for information about the movie: "Who directed Pulp Fiction? Who starred in it?"

    Combined with a file storage solution, this would open quite interesting, new forms of computer file storage.

  22. Re:If I can't text process it, then I don't want i by tweek · · Score: 4, Interesting

    Actually I would LOVE to have everything accessible in a database somehow. I've been wondering about something using the userfs stuff. Not really mounting a mysql database as a usermode filesystem but having information from the system available that way.

    I've found myself many times wishing I could just type "select location,filename from datastore where contents like %resume%"

    SQL comes much more naturally to me than the find command does. I would love an easier way to index the contents of every file on my system by an arbitrary number of metadata and then have that accessible via a simple SQL statement.
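
    A small sketch of what that could look like, using Python's built-in sqlite3 module rather than a usermode filesystem. The `datastore` table and its columns follow the query above; the `index_tree` helper and the two demo files are hypothetical. Note that in real SQL the pattern needs quoting ('%resume%'); here it is passed as a bound parameter instead.

```python
import os
import sqlite3
import tempfile

def index_tree(db, root):
    """Walk root and store each readable file's location, name and text."""
    db.execute("CREATE TABLE IF NOT EXISTS datastore "
               "(location TEXT, filename TEXT, contents TEXT)")
    for location, _dirs, filenames in os.walk(root):
        for filename in filenames:
            full_path = os.path.join(location, filename)
            try:
                with open(full_path, encoding="utf-8", errors="replace") as fh:
                    contents = fh.read()
            except OSError:
                continue  # unreadable file: skip it
            db.execute("INSERT INTO datastore VALUES (?, ?, ?)",
                       (location, filename, contents))

# Demo on a throwaway directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "cv.txt"), "w", encoding="utf-8") as fh:
        fh.write("My resume: ten years of sysadmin work")
    with open(os.path.join(root, "notes.txt"), "w", encoding="utf-8") as fh:
        fh.write("buy milk")

    db = sqlite3.connect(":memory:")
    index_tree(db, root)
    hits = db.execute(
        "SELECT location, filename FROM datastore WHERE contents LIKE ?",
        ("%resume%",),
    ).fetchall()
    found = [filename for _location, filename in hits]
```

    Reading every file into one text column only works for small trees of text files; a real index would need to handle binary formats and keep itself up to date as files change.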

    I remember Scott Hacker did something similar with BeFS and his webserver at some point but he's long gone as is BeOS.

    Am I the only one that this makes sense to?

    --
    "Fighting the underpants gnomes since 1998!" "Bruce Schneier knows the state of schroedinger's cat"
  23. Re:What's wrong with hierarchical systems anyway? by Anonymous Coward · · Score: 5, Funny

    I've noticed about three main types of people in the world of open source

    Unfortunately you overlook the fourth and largest group -- those who COMPLAIN about everything and do nothing. :)

  24. Windows groundwork by SteveX · · Score: 5, Informative

    Windows XP has most of the groundwork for this - Windows has actually had it for a while; for some reason the last piece (the filesystem that lets you take advantage of it all) keeps not showing up.

    You want metadata on files? NTFS streams give you a place to store metadata (much like Mac resource forks but with any number of named streams).

    You want to search on the metadata? The Microsoft Indexing Server will build a database and let you search on it (though it's a very strange system to use - in XP go into Administrative Tools, Computer Management, Services and Applications, Indexing Service, System and click on "Query the Catalog"). You can do instant searches for all kinds of stuff; look at the help.

    OLE Structured Storage is like a single file version of the filesystem we're talking about - a way of saving a bunch of objects (some of which you didn't create but that are in your document) into a file. I believe Microsoft's Office apps use it (could be wrong there though).

    Right-click on an MP3 file and pick Properties in XP and go to the Summary tab. There's the metadata - the stuff the index server is going to index. If you add a new file format to the system, you can supply a DLL that will be able to supply the metadata for those files - so you download an MP3, save it on your disk, and the index server uses the DLL to get the metadata and add it to the database. It works pretty well.

    I don't really have a point to all this, just listing some stuff that Windows has that "should" make it easy for Microsoft to add the OO FS someday and have it instantly work with existing apps.

    - Steve