newdocms: Beyond the Hierarchical File System
Manuel Arriaga writes "After two years of hard work (and many scrapped versions), I have just released an (ugly, but working!) preview version of newdocms, a completely new document management system. newdocms isn't a file browser: it is a layer between the hierarchical file system (HFS) and the user, which provides a radically new way to store and retrieve documents. No longer will you browse complex directory trees or directly interact with the HFS; instead, you define any number of document attributes when saving a document and then query a database of those attributes when trying to retrieve it later on.
For the first time you have a true alternative to the hierarchical file system at the OS level. Through the modification of the KDE shared libraries, newdocms currently works with all KDE apps! (I am looking for volunteers to add support for GNOME and OpenOffice.org!) This is a testament to the power of free software: this sort of innovation could never happen if it weren't for the free software nature of the underlying systems."
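To make the "define attributes when saving, query them later" idea concrete, here is a minimal sketch in Python. It is not newdocms's actual code or schema: the table layout and the function names (`open_store`, `save_document`, `find_documents`) are assumptions made up for illustration, using only the standard `sqlite3` module.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a hypothetical attribute database."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS docs ("
        "id INTEGER PRIMARY KEY, ext TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS attrs ("
        "doc_id INTEGER NOT NULL REFERENCES docs(id), "
        "name TEXT NOT NULL, value TEXT NOT NULL)"
    )
    return conn


def save_document(conn, ext, attributes):
    """Register a document and its attributes; return its numeric id."""
    cur = conn.execute("INSERT INTO docs (ext) VALUES (?)", (ext,))
    doc_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO attrs (doc_id, name, value) VALUES (?, ?, ?)",
        [(doc_id, name, value) for name, value in attributes.items()],
    )
    conn.commit()
    return doc_id


def find_documents(conn, **wanted):
    """Return ids of documents that match every name=value pair given."""
    if not wanted:
        return [row[0] for row in conn.execute("SELECT id FROM docs ORDER BY id")]
    clauses = " OR ".join("(name = ? AND value = ?)" for _ in wanted)
    params = [item for pair in wanted.items() for item in pair]
    sql = (
        f"SELECT doc_id FROM attrs WHERE {clauses} "
        "GROUP BY doc_id HAVING COUNT(DISTINCT name) = ? ORDER BY doc_id"
    )
    return [row[0] for row in conn.execute(sql, params + [len(wanted)])]
```

Saving two documents by the same author and then calling `find_documents(conn, author="alice", type="letter")` narrows the result to the one letter, which is the kind of multi-attribute lookup the submission describes.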
They work fine for me
I'm already using The Brain. It's *really* unique, and it works. It works very well. And, in addition to organizing files the way YOU want them organized, it also connects random thoughts, web sites, emails, etc. If you haven't seen it, check it out. It's pretty damn incredible.
It sounds basically like when you want to find a file, you go type in a few pieces of meta-data, and then hit "search". It's a way to do it, but it seems to me (and it's early, so bear with me) that it's easier for me to remember one piece of meta-data (i.e. the path to the file) than several (as it would seem with this setup, as you would have to present more than one piece of data to differentiate between different documents, let's say, created by the same author on the same day). Maybe I'm just used to a HFS, but I find it simple to open up a command prompt and type "pico /documents/foo/bar/fubar.txt".
Anyway, an interesting concept.
I think this will rely on the user providing meaningful information in too big a way... I have users now that can't find the files they saved just the other day, and who can't cope with hierarchical folders arranged in chronological format...
Great idea though... although, come to think of it, it might just be that everyone is so used to what they have, they just treat anything else as anathema.... keep at it though, they used to burn people who said the earth was round...
Don't I remember reading something about the Blackcomb file system being database driven? Billg called the current file system a "cesspool" and said it's going to be completely overhauled, IIRC.
Oh well, in a few years the *n?x-philes will be screaming about M$ stealing their ideas. Figures.
When Microsoft suggested something like this, everyone went mental, and I got bitch slapped for saying I thought it was a good idea.
"This is a testament to the power of free software: this sort of innovation could never happen if it weren't for the free software nature of the underlying systems."
How is an "ugly" beta version of an untested new file management system a testament to the power of free software? And why is this better than a hierarchical system? Hierarchies make sense to Joe user. Normalized databases (you did normalize it, right?) do not. And why on earth would I want to set all kinds of BS attributes on a file instead of just clicking File, Save As, and then hitting the little "My Documents" button in the window that pops up?
I have worked with many a user that has had problems with the concept of folders (directories). Perhaps those users can grasp this concept easier.
but I already organize my docs, apps and files by hard drive/app/purpose/category/file. so not sure if I want to change the way I've been organizing my systems. now if only it was written say 10 years ago, then it would have a better chance.
Of Ray Kurzweil's "narrative filesystem". He did a presentation on it at an expo last year, it seemed pretty interesting. Although, at least for now, I'm with the guy who posted above expressing satisfaction with hierarchical filesystems, it's nice to see that there is open development of this kind of thing. I shall download and test forthwith!
Don't read this!
This seems to be written from a user centric point of view.
So when I start a daemon it will have to do a database query to read its config file?
What kind of unforeseen security issues are gonna pop out of this thing?
Sure searching for end user documents is nice, but uh heh well "locate *.rtf" works for me just fine at this point...
Wouldn't it be better to do this at kernel level, just like BeOS did? This would need extended attributes (Linux: partially done), attribute indexing, and notifications (Linux: dnotify), all used together. BeOS "live queries" or "live searches" were basically index listings with notifications on change.
SQL is not good enough, because it subtracts features, adds arbitrary restrictions and is not as simple and powerful as the relational model for database management.
What we really need is a really relational, full DBMS (with sane defaults) as the fundamental storage component of an OS.
Leandro Guimarães Faria Corcete DUTRA
DA, DBA, SysAdmin, Data Modeller
GNU Project, Debian GNU/Lin
sPh
XFS sounds like it would be the perfect underlying file system for this.
Microsoft couldn't have come up with this idea: the submission explicitly states that it wouldn't be possible outside the free software model. QED.
While I do think the work presented is a great idea, it seems to me that it's a lot of effort just to setup the system.
It would be ideal if the computer -- the thing that is supposed to make life easier -- did the classification. Until that happens I cannot see myself even considering such a file access method.
-- bartman
How much revenue is made from this?
In a time of disastrous unemployment and financial breakdown in the technology sector of the economy I think the money-making question is _very_ important regarding all new projects.
My father is really going to understand that. Not a bad idea but the implementation appears to need work. Another interesting thing to note is that this is probably coded in C++ and is going to be a bitch once again to interface with scripting libraries. I love KDE but it is a difficult task to integrate other languages with.
Got Code?
I think that this program highlights a problem with *nix that really needs to be addressed before it becomes a truly mainstream desktop OS. Namely, the file system is so complex that it is extremely intimidating for a new user to know where things are located. At the risk of receiving nasty comments in reply, MS has known for years that documents, programs, and OS should be placed in their own distinct directories. This program is a step in the right direction, but the underlying issues that it deals with also require attention.
Exactly. Users STILL have to create their own type of organization.
/documents contains documents. Duh.
/documents/work contains documents for work.
The problem is people don't want to be organized, so they look to technology to help them be lazy. Plus try explaining 'metadata' to someone. At least now you can use the file cabinet, drawers, folders, papers example to explain the layout to someone.
"I can't give you a brain, so I'll give you a diploma" - The Great Oz (blatantly stolen sig)
Sounds similar to Scopeware, which was covered by Slashdot earlier.
reSisTanCe iS fUtILe
ting systems, not that the commercial developers couldn't do it with their own products. Clearly they could. Or are we now claiming that "innovation" belongs solely to the open source community and not to commercial developers?
You like your Macintosh better than me, don't you Dave? Dave? Can you hear me Dave?
"This is a testament to the power of free software: this sort of innovation could never happen if it weren't for the free software nature of the underlying systems."
... or not. As I recall, BeOS had a fully functional database driven file system, although it did not entirely throw out the hierarchical side of things either (probably a good decision in my opinion). In fact, I recall reading a while back that future versions of Windows were supposed to have database driven file systems as well.
While free software is great, let's not get too cocky about what kind of innovations it can produce when we aren't aware of what the traditional software companies have already done.
Who came up with the idea of "folders" anyway? Not hierarchical trees, but the metaphor.
The biggest problem with folders is no one wants to be a file clerk and weed, sort, and file their docs. The act of socking away a doc should be as mindless as possible, not because (all) users are mindless but because they have better things to do, and shouldn't spend a minute adding keywords to every doc they might never see again.
You know how it is -- you're searching and coming up with junk, and want to yell at the computer, do what I meant, not what I said! This would be one of my first picks for AI on a personal computer.
I agree folders don't cut it, though as a metaphor for explaining the tree it's not bad. The problem is the tree.
This idea was made by GNOME and is now being inherited and implemented by KDE. Read here. And again please don't make Linux start to suck with that idea.
The Brain is an interface on top of your current FS. Things like this have been done going back to the days of the Leading Edge Word Processor (separate file to get around the 8.3 naming conventions).
I believe the point that this mad scientist was making was that he's completely replaced the FS with this new database-based one.
It's certainly not innovative, but it's something different I guess.
--- I wish I could hear the soundtrack to my life. That way I'd know when to duck.
I agree. Basically the only way this is different from your HFS is that it encapsulates the meta-data (that is currently in the path name) differently. I'm not sure that's any better or worse. In fact, I myself like to be able to see at a glance what all the categories of documents that I have are which is quite easy with HFS, but doesn't sound so easy here. Perhaps that's more because this is a new idea and not mature yet.
Everyone seems hot to SQL the file system, and while I think that will be the way of the future, I don't think that there is a clear view of how that works from the user's perspective yet. Remember that this is a rather large paradigm shift from what everyone is used to. It's going to take a while for this to mature to the point that Joe User is going to be able to hack it. I mean, I looked at the Save As dialog on that page, and while it looks cool it also looks counter-intuitive to me and I'm a developer! How much more will a user get confused?
All in all we're going in the right direction, but by no means are we anywhere near the goal yet.
Ben
Ok, everyone, before you say "I like hierarchical systems, why the hell did he do this?" think about the benefits. I, for one, find this to be one of the most interesting OSS projects I've seen in a while because of the layer that it operates on.
As we generate and download more and more "content" on our systems we will need better and better ways to search it. I'm tired of navigating through 5 folder deep structures to get to particular files.
Proprietary file formats are to blame. Imagine if every file format was similar to open office xml based documents. A background process could be indexing the documents and making available very powerful searching. Then add a natural expression engine on top to get the documents. "show me all documents that contain the word resume"
Got Code?
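The background-indexing idea in the comment above can be sketched as a tiny inverted index. This is only an illustration under stated assumptions: `build_index` and `containing` are made-up names, and the documents are assumed to be already available as plain text (extracting text from real file formats is the hard part and is not shown).

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each lowercase word to the set of document names containing it.

    `documents` is a dict of {name: plain text}.
    """
    index = defaultdict(set)
    for name, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(name)
    return index


def containing(index, word):
    """Answer 'show me all documents that contain the word X'."""
    return sorted(index.get(word.lower(), set()))
```

A natural-language front end would still have to turn the user's sentence into a call such as `containing(index, "resume")`; the sketch covers only the lookup.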
Who needs this? As one poster put it, isn't the path the only real piece of meta data you need to find a file? Think about mom and dad. What do they want to know? "Where are the Christmas pictures of the baby from last year?" "What happened to that email I sent my brother last week?" "Where's the latest copy of my resume?" and so on. Natural language aside, these are all metadata-type queries (mostly dealing with time and filetype data, both of which can be extracted without any additional effort by the user). I think that if such a system of searching files is ever perfected, we'd have a serious killer app on our hands. Isn't this part of what the "semantic web" is all about? Isn't it frustrating to everybody that even the best search engine in the world still can't understand "find me all books whose author is mark twain"? It seems like a logical progression to expect that. Just like most of us aren't searching the web for *pages*, but rather particular *information* on those pages, I think that Joe User doesn't care about looking for *files*, but rather the information contained in those files. Thus it's only reasonable that if you give a user a way of easily describing those files by something more than just a filepath, that it will then be easier to find the information later.
www.HearMySoulSpeak.com
or something very much like it a few years ago.
i used it and it works like a charm.
of course hierarchical file systems are easy to use, you can name folders after categories, and they are easy to backup.
Interfacer.
all your HFS are belong to us.
for search etc.
Otherwise people will stumble with attributes, use misspellings etc.
Can happen with HFS, too. But people tend to stick to the old directories ("my files, my pictures, my porn") rather than creating new ones.
Owner of a Mensa membership card.
But thankfully, it's an article about file systems.
Stop corporate
My job has been shopping for this during the last year. Apparently, they already bought a system and will be installing it in Feb. I just wish there had been an alternative back a year ago.
Congrats anyhow!!!
Sounds a lot like BFS.
If memory serves me correctly, the BeOS team was originally trying to do a pure database filesystem (no hierarchy), but found (in the early '90s) that the performance hit was too heavy on the hardware of the time.
The whole desktop/file/filesystems may indeed be ripe for a new metaphor to help conceptualise them. When computers were the principal domain of workers, the idea of a desktop with files and folders allowed them to grasp alien concepts.
But computers are becoming ubiquitous, pervasive. Perhaps a new metaphor could be found. An example could be objects in rooms. Think of different folders as different rooms - all files (or rather, all streams) are objects in those rooms. Navigation between rooms is possible through doors.
Of course, as others have pointed out, the HFS ain't broken, so why fix it? (Answer: why not? PC cases aren't broken, but we still have case-modders, don't we?)
It seems to me that the majority of people who reply with "I use HFS just fine, file-> save as -> my documents works just fine" are also the type of people who don't actually create more than a few documents anyway.
I write a lot of documents and my filing system becomes ever more difficult to manage; without the skills that a librarian or filing secretary has, I find that my documents become harder to locate over time. To me this is a potential solution to that problem. I do, however, appreciate that "Joe Bloggs" will not understand what it is about, but as far as I am concerned "Joe Bloggs" should not be using computers in the first place. Pandering to his ilk has set computing back 10 years.
The potential pitfall of this system could be where many documents have been written about the same subject i.e. testresult001.txt to testresult999.txt. The user would know with the traditional system that he wants testresult823.txt but with the new system would be presented with 1000 choices. I am possibly being myopic here!
Perhaps it is time for a new paradigm and I for one will be looking at this method with great interest.
Sounds like Microsoft Sharepoint to me. Sorry, but that doesn't excite me too much...
If so, and based on the bad Sharepoint implementations I have seen, this seems unlikely to be world-shaking. How about a follow-up article on this in a year?
- -
Are you an SF Fan? Are you a Tru-Fan?
This is exactly what I have been wanting for almost a decade now.
Some uses I imagine
- Create music playlists on the fly (MoodLogic doesn't count)
- Categorize work files (Across the whole partition, find images that serve as bumps, HDRI ..etc as well as those that are simply wallpapers and photos). More importantly, if you see a good bump texture for a certain surface, describe it as such without changing the filename.
- Install Windows and service packs first, mark files as "windows native". Then install apps. Some OS glitch, you need to reinstall? Back up all files (with directory structure) which don't have the "windows native" tag, along with c:\program files and the registry. Reinstall Windows, restore the backed up files. Voila, no app installations required.
Correct me if I'm wrong but weren't the early versions of PICK based on a concept such as this? Alas, seen from here (Raining Data) it seems that PICK Systems has merged... and here PICKBASIC History is an interesting read...
I remember back when Apple was going out of business and I switched from Mac OS 7 to Windows. The single most disorienting thing about it was the effort that Microsoft went to to insulate me from the filesystem.
On the Mac, I was used to double-clicking the hard drive, then navigating through the directories to the application I wanted to use, and launching it from there. In Windows there was this blasted Start button and some mysterious hierarchy of things under the "Programs" tab... No idea what they referred to, no sense of where they came from. It felt like if you were to install an application that didn't put something there, you'd never see it again.
Needless to say, I hear "insulate the user from the filesystem", and I start to worry.
On the other hand, this is free software after all. Free as in Choice. I may well download it and play with it. At this moment I doubt pretty strongly that I'll prefer it to an HFS approach. But I'm all for choice!
As it stands now, it seems to me that this approach is the same thing as putting all files in one large flat directory and then viewing the files based on the extension, i.e. *.avi, *.mp3, etc. Seems to me that you can do this with a hierarchical file system and the way we do things today gives you more control in how to store items. For example putting multiple word docs in different folders based on use.
I have used "The Brain" while I was in Windows, but it was nearly useless as it didn't support the two most important things:
a) Web browsing
It should know the sites you've visited, know your bookmarks and allow you to open everything you have found with a simple click.
b) E-Mail.
When it finds an E-Mail a simple double-click should be enough to open it in your mail, show you the thread it belongs to, etc.
I guess that I'm not the only one who has more important things in mails than in .docs or .xls.
Bye egghat.
-- "As a human being I claim the right to be widely inconsistent", John Peel
> The problem is people don't want to be organized.
Thank you for telling me what I want and what I don't. Without you and your knowledge of what I and others want, this world luckily doesn't suck.
I was reading about Reiser4 last weekend and HR mentioned similar functionality IIRC. I would hope everyone can see the point behind metadata...it's kinda the reason XML is considered a GOOD THING. The question is...can we shift our paradigms to use this newer model? Change is hard to effect...this would have to be adopted by a mainstream OS for this to really catch on and be widely used. (Asbestos underwear on!) Isn't Longhorn's new DB filesystem also supposed to offer some or all of this? (RTFA if you want to reply please!) MS might not be as behind the curve as we'd like to think....time will tell if this will actually be widely accepted. My .02.
Always value the individual over the system. --Bruce Lee "I don't need a Sig - I have a custom 191" - me
What happens when the database gets trashed?
This idea isn't that great.
Sounds like someone thinks they're smarter than they really are.
of the difference in the GUI vs. command line mind set.
These abstraction layers have been used before on OSes such as Mac OS and OS/2. The problems always came into play when you passed the files around. There is always a step that strips the extended information. The key is wide acceptance and establishing a standard for the data storage. Be sure there is a way to pass the extended data in a text format (i.e. XML) when you want to store the files on a non-supported system (or so command line tools can be easily modified to update the db).
The idea is good and I am sure it will be very useful to a lot of people. Good Luck.
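The suggestion above, passing the extended data along in a text format such as XML, could look roughly like this. The element names (`document`, `attribute`) and the two helper functions are assumptions chosen for illustration, not a format that newdocms or any existing standard defines; only the standard `xml.etree.ElementTree` module is used.

```python
import xml.etree.ElementTree as ET


def attributes_to_xml(doc_id, attributes):
    """Serialize one document's attributes to a small XML string."""
    root = ET.Element("document", id=str(doc_id))
    for name in sorted(attributes):
        attr = ET.SubElement(root, "attribute", name=name)
        attr.text = attributes[name]
    return ET.tostring(root, encoding="unicode")


def attributes_from_xml(xml_text):
    """Parse the XML back into (doc_id, attributes)."""
    root = ET.fromstring(xml_text)
    attributes = {a.get("name"): a.text or "" for a in root.findall("attribute")}
    return int(root.get("id")), attributes
```

A file copied to a system without attribute support could carry such a string in a sidecar file, and a command line tool on the other end could read it back to update its own database.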
I believe the point that this mad scientist was making was that he's completely replaced the FS with this new database-based one
"it is a layer between the hierarchical file system (HFS) and the user"
Also, have you ever heard about ODMA (Open Document Management API)? This was intended as a cross-platform interface for DMSs, but so far only has Windows implementations. Why not use an already established standard so that any DMS-aware application can use whatever DMS is installed on the system?
I think this is great. I've been thinking for a long time that the hierarchical file system is not necessarily the best way of organising data. This idea definitely merits serious consideration.
Even if it never comes to maturity in its present form, it's good to see free software being used as a testing ground for new ideas as well as re-implementing existing concepts.
SIR, I suggest you make use of the racial slurs database. I will not do your homework for you.
Personally, I do not think computers need to move towards making things yet easier for the user (instead people need to get a clue!). I do see how this might save some in support costs, but the end result will be dumber users, who now still can't find things, just for different reasons.
If you can't see the value in jet powered ants you should turn in your nerd card. - Dunbal (464142)
this sort of innovation could never happen if it weren't for the free software nature of the underlying systems
Surely this is an overstatement. I think what you mean is that a guy off the street couldn't add this file navigation scheme to an existing commercial OS, not that the commercial developers themselves couldn't do it. Or are we now suggesting that the open software movement is the sole owner of the term "innovation"?
You like your Macintosh better than me, don't you Dave? Dave? Can you hear me Dave?
So where do your documents go when you save them with newdocms? As you might have noticed (if you looked at the window titles after saving something), they are stored as ~/Docs/{numeric id}.{ext}. All the metadata is stored in a file called ~/newdocms.db. (It is not wise to delete it!) In that file each document's attributes are associated with its unique numeric id (the one which is used as a file name).
Right.
This is astoundingly bad software engineering.
Manuel, when your software fails, and it will, and somehow that db file gets trashed, you've rendered that user's files a huge heap of unsorted data. Effectively it would be 100 times worse than never implementing your system, rather than 10 times better. No matter how bulletproof you think your code is, it probably isn't 100% perfect, so having all your eggs in one basket is unwise to say the least.
Even if your code is 100% perfect this is a mistake. What happens when a sector goes bad and this file is trashed? What happens when the first really dangerous Linux worm makes it a point to delete *.db from the filesystem?
Give the files names that are coded with human readable attribs! Double up that db file! Jesus, man... build SOME kind of redundancy in your system before you throw away the old way of storing the data.
There's a reason why there is such a scramble to implement a general attribute system at the FS level on many FS projects right now(*). The time has come for OSS to start being smart about this, but cramming all your metadata into a single file and throwing the backup out the window is just a very, very poor idea.
(*) BeOS was, yet again, way ahead of its time with BeFS.
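One way to read the "give the files names that are coded with human readable attribs" suggestion is sketched below: the attributes are written into the file name itself, so they could be recovered even if the metadata database were lost. The naming scheme (id, then `name=value` pairs joined by `__`) and both function names are hypothetical and are not how newdocms stores files.

```python
from urllib.parse import quote, unquote


def _escape(text):
    # Percent-encode everything unsafe, and also "_" so that the "__"
    # separator can never appear inside a name or value.
    return quote(text, safe="").replace("_", "%5F")


def encode_name(doc_id, ext, attributes):
    """Build a file name that carries the document's attributes."""
    parts = [str(doc_id)]
    for name in sorted(attributes):
        parts.append(f"{_escape(name)}={_escape(attributes[name])}")
    return "__".join(parts) + "." + ext


def decode_name(filename):
    """Recover (doc_id, ext, attributes) from such a file name."""
    stem, _, ext = filename.rpartition(".")
    doc_id, *pairs = stem.split("__")
    attributes = {}
    for pair in pairs:
        name, _, value = pair.partition("=")
        attributes[unquote(name)] = unquote(value)
    return int(doc_id), ext, attributes
```

File name length limits would cap how many attributes fit, so this is a fallback alongside the database, not a replacement for it.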
This looks a lot like something I've used in the past - FileNET Content Management Services. FileNET lets you create meta-data for each document you save, as well as a complete version history and check-in/check-out for each document if you want to. It also allows for hierarchical storage of files as well as using the meta-data so you can still categorize things by folder if you want, but still query documents by any of the indexes that you have built. It will even add a full-text search across everything in the library if you want, and it has no problems indexing most standard formats including Word and PDF files.
Microsoft Sharepoint also allows you to store your own metadata with files - and also grab the "properties" from office files. This is not to substitute the folder tree, but in addition to it, and it's indeed a great tool (aimed more at the corporation than the individual)
But it's MS and here I am burning karma for just mentioning it. Big deal, I can spare the karma :-P
The AS/400 and its predecessors have had a database oriented file system for centuries (in Moore-law years).
I find it a nuisance from a programmer's point of view, and indeed it can get quite messy when you have about 100 different Libraries on your 400, each with a few dozen Objects, some of them with another few hundred members etc etc. From the point of view of the application, and the end user (who will typically have only a single version of a few applications installed on his dedicated database server), it is the greatest thing since sliced silicon.
I think it predates the hierarchical filesystem by a lifetime as well (again in Moore's law years).
The system I have been dreaming of for a while would be far more graphical (had a quick look at thebrain.com, it's still text with a few lines as far as I can see).
My dream system would enable you to specify file attributes such as size, path(s), name, type etc, as well as regex greps on the content, and then plot the filing system in 3D space, through which you could move with a joystick. You would be able to assign attributes to graphical features, eg make scripts cuboid, text files spherical, bigger files bigger on a logarithmic scale and so on. Related files would appear like solar systems, and by changing the importance of the file attributes you could change the way the files grouped.
Probably not what you'd want to use every day, but I'm sure I'd find a few mislaid files with such a system.
Virtually serving coffee
Hierarchical file systems are as close to intuitive as you get. Everything you do in the real world, as pertains to dealing with information, mimics a hierarchical file system. Your chilton manuals are in the garage, your cookbooks and recipe boxes are in the kitchen or dining room, your computer books are by your computer. You don't look in the computer manual for how to change your oil. When you are trying to bake a cake, you don't walk out into the garage for inspiration. Having information organized into different places, and then having those places subdivided into different boxes is intuitive, and is how most organized people think.
G:\Netowkrfilesystem\Accounting\AccountsRecievable\Yesterday\Tomorrow\AWeekAgoToday might be confusing, but the filesystem paradigm isn't.
1. (a) "We don't need no stinking filesystem." The ideal palmesque OS would have the same idea just demonstrated differently. You aren't going to open up your notepad to see an address. The address file is in the address program (directory). The schedule file is in the calendar program (directory). The programs you use to open the files become your folders.
1. (b) "Saving/Opening files should be transparent" The only people that would think like this in the real world have been living with someone that picks up after them all the time. When you are working on some (paper and pencil) project, and just stand up and walk away, do you expect it to be available at the office tomorrow? When you start working on several projects in succession on your desk, and have reams of loose paper, can you easily bore your way back down? No, reasonable, organized people pick up the project they are working on and file it away in the file cabinet/brief case/wherever it is supposed to go. There are logical beginnings and endings to your working on a project that only you can decide on. A spreadsheet, for example, do you want it to save every time you make a change... No, by their design, you would normally set up all your formulas, save that, and then every day/month/year open up the spreadsheet, plug the numbers, get the results, and save the specific results to a different file, or just look at the values produced. Not to mention, when you sit down at your desk in the morning, do you expect your desktop to know what project you want to work on? No, and you don't expect your computer to know what project you are working on either. Opening/Saving files shouldn't be and can't be transparent to the user.
I used to use a lot of floppies when growing up. I appropriated a lot of disks from other places. I used the "grab the black disk with the couple of remnant label pieces... no the other black disk... No, the one with the two small pieces of adhesive... Ooops, the one with the three pieces..." Now, I have to search all the disks every time I want anything off of them, because I never labeled them. Saving things in well defined locations, for well defined tasks, is a reasonable, intuitive, and necessary task to saddle a user of any system/technology/information with.
2. I don't really need to address this point specifically, since the answer is inherent in the points above. The overly large filesystems are part of a whole system that the user doesn't really need to know about. That is why the "Desktop/..." paradigm of Windows came about, and is so useful. People working on your word processor have a reason to put the font files in one directory, the plugins in another, and the preferences in a third. The user couldn't care less. If you start the user in a directory tree just for them, then they won't be stuck in a huge file system, and can still work in a fashion that has made sense for literally thousands of years.
The filesystem paradigm has been around for a long time, again literally thousands of years, because it works, it is easy, and it is how people think.
I believe metadata is a useful additional means to find files, however I would still want hierarchy as the primary storage. For most people the only metadata they ever consider is the name of a file, and this is often poorly named. I applaud the effort of the person who is doing this project though.
-- Solaris Central - http://w
What is the deal with people wanting EVERYTHING in a SQL/LDAP style database! Every intern I have to manage out of college seems to have been brainwashed to think that whatever the app, its data should be in a relational database.
I like databases, but for some things they should not be used!
When it comes to the OS, I want to be able to text process data EASILY...with BASH! This road leads to things like binary configuration files and that leads to things like the Microsoft registry which I detest.
Databasizing everything (including the filesystem) IS NOT THE ANSWER
How would this be better than searching an index of a HFS volume or disk? I don't have a problem finding files by name, size, volume location, or type on a plain old HFS. What's the big deal?
Transistors and Beer!!
I'm apparently not reading too well today. I managed to skip that sentence and zeroed in on the later one about changing libraries.
--- I wish I could hear the soundtrack to my life. That way I'd know when to duck.
It's about the same thing as a normal filesystem! Just with keywords.
First it talks about assigning keywords to files. Okay, that's a useful feature. Now, I look at the "improved" file save dialog. First it looks more confusing. Second, I wouldn't say the document is a letter. I might rather call it 'Essay' (not sure if it's the right word), and here I just lost my ability to find it by keywords. Following that same line of thought, I never remember using the term "computing" either. I might search for "Computers", "File systems" or "Data storage". It seems to me that every file would need 20 keywords to be easily found.
Later, it just seems to show that this idea won't fly by introducing collections. Hello, that's a directory! Or at least I see no difference between that collection and ~/photos/2002_07/beach, for example. The difference between that and ~/photos/beach/2002_07/ doesn't worry me at all, by the way, because there's a handy tool called find(1).
And I can think of more problems with that. How do I back up my music? I can imagine different categories it could be under, maybe 'music', or 'sound', or 'audio', or 'songs'. Can I be sure that all my music is tagged as 'music' and not one of the other ones?
Oh, and I just thought of another thing. How do we delimit physical disks in this way? On my computer I know /cdrom is my CD-ROM. On my server /usr is a different partition. How would that be represented?
I have recently become very annoyed with the way I am storing information. I've realised that I have four parallel, similar, yet completely independent methods for cataloguing information. One is the file system - directories containing documents, images, etc on various subjects. The second is the Favourites list in my web browser, containing links to web sites on various subjects. The third is my email contact list, containing groups of contacts in various categories. The fourth is my mailbox hierarchy, containing archives of emails on various subjects.
What I really need isn't a way to help store one category of information, but a single unified way to store all related information together, by subject. All my documents, emails, web links and addresses relevant to a particular customer, for example.
I don't really need a search tool (although they're always a nice function to have), I need a way to keep everything together and easily accessible.
Simon Hibbs
That Amazon doesn't have a patent on this?
Funnies aside, I'll bet you half a bunch of grapes that there's at least one patent on something that arguably describes this, just waiting for someone to implement it so that they can sic the lawyers on them. Them being you. I hope not, but I really wouldn't be surprised.
If you were blocking sigs, you wouldn't have to read this.
Shared-library hacks are not "the OS level" even if you're talking about libc, and even less so for something KDE-specific.
Wrong again. At least five years ago I was using a Windows shell extension that let me attach metadata to files and search by metadata. I don't remember the name off the top of my head, but it was similar to Explorer Notes or FileNotes or Annotater. Sure, those only work in Explorer, but that's no worse than only working in KDE. Far from being a "free-software innovation" this is something that's been kicking around for ages in the non-free world and the free-software version is (as usual) pretty late on the scene.
Slashdot - News for Herds. Stuff that Splatters.
My god. Our users are so dumb they can barely remember the name they gave a file 10 minutes ago, and now you want them to think in terms of "concepts," "terms," and "shortcuts"? This will confuse them sooo much it isn't even funny.
unique: Being the only one of its kind.
*really* unique: c.f. redundancy
If you were blocking sigs, you wouldn't have to read this.
Actually there are many products that are designed specifically for organizing pictures. I am familiar with the commercial ones, but there may be open source ones as well. For example, MacOS X comes with iPhoto, which automatically downloads pictures from digital cameras. It allows you to categorize photos, provide keyword descriptions for searching, etc. Extensis Portfolio provides more advanced capabilities for professionals. There are others...
The author mentions softlinks, but claims that most uses of them are to make shortcuts. Well, maybe on M$ systems, but real systems let you use them better, and a little education of users (and perhaps a GUI-based frontend for 'ln') would make them more popular.
Everyone I know using UNIX-based systems easily grasps the idea of linking a file so that it appears in more than one place, and uses that.
I don't think that manually putting your files into categories will really do much to improve your usage experience. What we need is "Google built into the OS". What about a file attribute that, when it's set, causes the operating system to index the file in the background, so that its content can be quickly searched? Of course some kind of "Page Rank (tm)" system would be required to sort the files according to importance.
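To make the "index the content so it can be searched quickly" idea concrete, here is a minimal sketch of an inverted index in Python. Everything in it (the `build_index` and `search` names, the sample documents) is made up for illustration; it ignores the background-indexing and ranking parts entirely and only shows the lookup structure.

```python
import re
from collections import defaultdict

def build_index(docs):
    """Map each lowercase word to the set of document names containing it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(name)
    return index

def search(index, query):
    """Return the names of documents containing every word of the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    # Intersect the posting sets; a missing word yields an empty set.
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)

# Hypothetical sample data, standing in for files flagged for indexing.
docs = {
    "resume.txt": "Resume of a systems administrator",
    "notes.txt": "Notes on file systems and metadata",
}
index = build_index(docs)
print(search(index, "systems"))
```

A real implementation would also need some ordering of the hits, which is the hard part the comment above alludes to; this sketch just sorts by name.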
What do you do when you see an endangered animal eating an endangered plant?
A zillion years ago there was a concept called the 'Xanadu file system' which, if I am recalling correctly, was very similar to what the author has actually implemented. I did a quick google and found one tangential reference to it for those interested in late 1980s/early 1990s history
http://tgif.fremont.ca.us/~mfw/diss/node39.html
That the author has produced working code is a HUGE INNOVATION. That this innovation has been produced by one person with a personal itch to be scratched is the reason that free/open software works so well.
This sort of improvement in the user interface is what will allow Linux/BSD derivatives to drive right over the top of certain proprietary systems in common use today.
I am very easy to get along with, but I don't have time to waste being nice to people who are being stupid. -Theo
Plan 9 could do all you are asking. It uses 9P, a protocol for file access; the file server responds to requests for files, as can user-level processes.
You could bundle all the meta data you like into your user level file server process and present the data in whatever format you like.
It's really no big deal
There are places where the networks are not touching,and there are places where they are-Boeing's Lori Gunter
It's very interesting to be able to fit thousands of nodes on a screen and to be able to focus on just a few that are of interest. Pictures of the system are available here.
http://github.com/gbook/nidb
Go to www.scopeware.com to get the same thing for windows with a cool stream view interface!
databasing everything isn't the answer.
databasing everything well is.
Even Windows Registry could be presented as a Text file. The problem with the registry is the amount of useless information stored in the same, and not well managed, registry.
not the idea itself.
OK, lemme break it down for you...
Lycoris.
Mandrake.
and the 800 lb. gorilla in the race, MacOS X.
Yes there are xNIX-like operating systems on the desktop. When I get home from working at a Windows-centric company and am finished fighting Windows quirks and MS Office quirks for the day, I fire up my PC running Mandrake 9 with KDE and I am HOME AGAIN. Yes, Linux has its own peculiar set of quirks. But I don't mind them.
Please note: this is my opinion. Do not just say "oh yeah, another Windoze SuX0rz post." If you really, really like Windows, that's your prerogative. Enjoy.
-.\\-H-
"But you've already got a DVD. It lasts forever....In the digital world, we don't need back-ups..."
-- Jack Valenti
On slashdot saying the emperor has no clothes is considered "trolling".
This is a testament to the power of free software: this sort of innovation could never happen if it weren't for the free software nature of the underlying systems."
Look, I'm an advocate of free software. I also have experience developing on proprietary platforms. To say that "this sort of innovation could never happen" on a proprietary system is hogwash. Seriously, any well-documented system can be modified in this way. Even totally undocumented ones (think older TI calculators) can be hacked to add new layers of functionality. Open source (not necessarily free) software is easier to develop for if docs are lacking, but otherwise this is not a case where OSS proves better than proprietary systems.
Just so you don't think I'm biased against free software, I'm writing this message on a machine running KDE.
t'nera semordnilap
Oracle has had something very similar for a long while, with awesome features (like running SQL queries to find a set of files).
Xanadu just became reality!
668: Neighbour of the Beast
Well, I might as well be unpopular today. I do understand the reasoning behind this product. As a Windows developer (bye, bye, Karma), I frequently have to deal with paths like "C:\Documents and Settings\myUsername\My Documents\Visual Studio Projects\My Solution\Some Project\bin\Debug\datalayer.dll".
However, if I wish to search an area of the namespace, it's simple enough to set up an Index Service catalog for it (bye, bye, more Karma). So long as you're sane about it and don't index your entire filesystem, things perform fine. I use the search feature all the time. Sometimes I even define keywords on the file for searching.
If I don't use the indexer, I can always use grep, file search, or whatever, to search the namespace by content. It takes a few extra seconds, but it works.
Email is a great example of this. If you're anything like me, you get 10 spams for every real email you get, and I get lots. A few years ago, I got tired of constantly filing all my email in a folder structure. Now I just treat my inbox like a giant stream and search it whenever I need anything. The mail is already stored in a database (Exchange 2000, bye bye Karma!), and the search is quick even with a few hundred megs of mail being searched across my VPN. If I ever get around to installing SpamAssassin, my methods may change.
Now, I'm sure I'll get slammed for using Microsoft products, but the fact is I've got gigs of data on my primary development box. I've got every remotely important file and email I've ever worked with in the last ten years - pared down to about 20GB of data. I'm sure I'd be completely lost without these search features.
How is this new product different?
BRENT ROCKWOOD, EST'd 1975
when it was called find|grep|awk etc.
I'm pretty fond of my organized hierarchy of projects and work documents. And I'd like to keep it that way, most of the time. Most documents indeed fit into the structure. I'm doing code and latex most of the time, so it would be useless throwing these in random directories, so you can't process them through compilers.
But when I'm trying to put down some misc ideas, and I'd like to store them, there's no way it fits in my perfectly organized structure (try to find the contradiction in this sentence). For these little ideas, newdocms would be pretty useful, but I'd like to be able to switch between both systems.
So, what I'd like to propose is that you shouldn't just overwrite open/save widgets in any random desktop environment, but you should offer newdocms as an alternative next to HFS, so any user can pick whatever organizational style he prefers, whether it is HFS or a newdocms-styled system.
--- Sigmentation Fault - Comments Dumped
"it is a layer between the hierarchical file system (HFS) and the user, which provides a radically new way to store and retrieve documents"
The only things that are radically new are that a) it is open source and b) it is aimed at individuals.
Commercial EDMS (Electronic Document Management Systems) vendors have been doing this for years - companies like Documentum and Filenet. Like newdocms, they combine the filesystem and a database of attributes for file storage and retrieval. Documentum itself has gotten very sophisticated; it can read the text of the document and, using an XML taxonomy of your choice, auto-file the document in the appropriate place. Likewise, searches can be performed on user-supplied attributes, computer-generated taxonomic values, or the text of the document itself. Companies like pharmaceutical manufacturers, with millions of complex documents, simply couldn't function without it. And it works with multimedia files as well, actually reading the closed captioning on movie clips to perform its auto-tagging and auto-filing operations.
That said, it is a great accomplishment, and a welcome addition to this KDE user. While I have worked with EDMS systems off and on for the past 5 years, I could never actually afford one myself, and none of the current EDMS vendors that I'm aware of support Linux.
As work begins on version 1.1, I suggest taking a look at the features available with the commercial EDMS products for ideas - especially for things they haven't thought of!
Good work!
Actually I would LOVE to have everything accessible in a database somehow. I've been wondering about something using the userfs stuff. Not really mounting a mysql database as a usermode filesystem but having information from the system available that way.
I've found myself many times wishing I could just type "select location, filename from datastore where contents like '%resume%'"
SQL comes much more naturally to me than the find command does. I would love an easier way to index the contents of every file on my system by an arbitrary number of metadata and then have that accessible via a simple SQL statement.
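For anyone who wants to play with that query, here is a rough sketch using Python's built-in sqlite3 module. The `datastore` table and its three columns are taken from the query quoted above; the sample rows and the `find_by_contents` helper are invented for illustration, and nothing here actually walks or indexes a real filesystem.

```python
import sqlite3

# In-memory database standing in for a hypothetical file index.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE datastore (location TEXT, filename TEXT, contents TEXT)"
)
conn.executemany(
    "INSERT INTO datastore VALUES (?, ?, ?)",
    [
        ("/home/me/docs", "cv.txt", "my resume, updated 2002"),
        ("/home/me/misc", "todo.txt", "buy milk"),
    ],
)

def find_by_contents(conn, term):
    """Run the 'contents LIKE %term%' query with a bound parameter."""
    pattern = f"%{term}%"
    rows = conn.execute(
        "SELECT location, filename FROM datastore WHERE contents LIKE ?",
        (pattern,),
    )
    return rows.fetchall()

print(find_by_contents(conn, "resume"))
```

Binding the pattern as a parameter avoids having to quote it by hand. A LIKE with a leading wildcard scans every row, so this would not scale to the contents of a whole disk without a proper full-text index.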
I remember Scott Hacker did something similar with BeFS and his webserver at some point, but he's long gone, as is BeOS.
Am I the only one that this makes sense to?
"Fighting the underpants gnomes since 1998!" "Bruce Schneier knows the state of schroedinger's cat"
Mentioned Microsoft.
The xmas issue of new scientist had an article about some research on office desk layouts. The messy desks were more efficient than the clean desks as the user knew where things were. The tidy workers who always filed everything (or were forced to by policy) spent a lot of time looking for documents.
:-)
This is "of course" why you can drag&drop docs onto your "desktop". So why shouldn't there be a proper implementation of a messy filesystem?
No, you don't get it. First of all, once again: LINUX IS NO DESKTOP OS. That doesn't mean that you can't run a DESKTOP on it. I'm doing the same here with KDE on top of Linux as a layer. The problem is that people (let's name them JOE DUMBASS USER FROM WINDOWS) want Linux to behave exactly like Windows. And there are a bunch of developers outside from the GNOME community that exactly try to make this happen. E.g. their plan is to completely hide the underlying OS from these users; they want to change everything inside Linux to be simple, more along the lines of Windows. Their idea is that the user does not want to know how Linux works, how their homedir works, how they read files etc.
In other words, people who in no way agree with the philosophy of Linux and its complexity, and who are also a minority of the whole Unix and Linux community, are paid to change Linux so that it matures into some sort of Windows.
That's the point; I hope you understand it. We all have nothing against a cool, flexible Desktop. But if so, then please integrate this Desktop into Linux and don't try to integrate Linux into the Desktop.
checking for Qt... configure: error: Qt (>= Qt 3.1 (20021021)) (library qt-mt) not found.
It needs a Qt from October(?) apparently. A stock RH8 (Qt3.0.5) isn't good enough, even though his (limited) docs indicate that you just need KDE3.x (except he then recommends 3.1pr5 !).
creation science book
In case anyone is arguing that this is simply useless fluff (which it might turn out to be, but for now, give it a fair shake), keep in mind how much your perspective might change if you actually had an ENORMOUS number of files on your computer, let's say a couple of decades' worth, each year massive on its own, and every so often you do want to find that report Carl wrote somewhere back there in the 80s.
Sorry for the preaching, but please try changing your perspective when thinking about this.
Gosh, are you telling me I have to think up keyword and the like? Smells like work to me.
Wouldn't it be great if this overpowered POS (piece of silicon) could categorize the document itself? It would not really need "natural language" ability; just steal (er, borrow) ideas from web search engines and have a thesaurus handy.
Combine this with the idea that the "save" button is outdated (there was a /. article about this some months ago), and the user could really be confused! It might be neat to have the system automatically find neighborhoods of documents (by content matching and by time).
The real silver bullet to good programs is caffeine; lots and lots of caffeine! *twitch, twitch*
What a radically original idea (not! Hello Documentum!) - do a bad shared library hack, then store everything underneath in one really _HUGE_ linear directory. Dude, filesystems scale because you use directories. If you leave everything in one top directory your project is going to come to a crawl after you start loading up a few thousand files (depending on the FS). Consider using a digital tree (say 500 files underneath each sub directory max).
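A quick sketch of what that bucketing could look like, in Python. The 500-entry limit comes from the comment above; the `shard_path` name, the two directory levels, the zero-padded names and the `.doc` suffix are all assumptions made for the example, not anything newdocms is known to do.

```python
FANOUT = 500  # maximum entries per directory, the figure suggested above

def shard_path(doc_id, fanout=FANOUT):
    """Map a numeric document id to a nested relative path.

    Each leaf directory receives at most `fanout` files, and each top-level
    directory at most `fanout` leaf directories (for ids below fanout**3).
    """
    if doc_id < 0:
        raise ValueError("doc_id must be non-negative")
    bucket = doc_id // fanout            # which leaf directory the file lands in
    top, leaf = divmod(bucket, fanout)   # group leaf directories under top ones
    return f"{top:03d}/{leaf:03d}/{doc_id}.doc"

for n in (0, 499, 500, 250000):
    print(n, "->", shard_path(n))
```

With two levels this keeps every directory at or under 500 entries for up to 125 million documents; beyond that the top level would start to grow and a third level would be needed.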
I like this guy's enthusiasm for open source.
I have questions though about the user's ability to apply meaningful attributes to the numerous amounts of content. If the user fails to provide meaningful attributes, the system fails to provide the user with meaningful results. In which case I would judge this system to be less user-friendly, because the files would be returned in one big lump.
This idea strikes me as an implementation of something similar to the Dublin Core Metadata Initiative, except for local content. Wouldn't this project benefit from enabling the user to manage ALL types of information, even remote? It wouldn't be a large stretch of the imagination to take that step.
If anybody is interested how the Dublin Core works in application you might want to check out the Zope CMF (Content Management Framework).
My experience from using Zope's CMF is that the initial learning process of a user using this method of organization was slow and bumpy. Although I must point out that my experience with the system was only with using a single implementation, so I'm not making the assertion that an implementation couldn't be designed that could improve the learning curve for users.
I would also like to point out to the people that have said this would ruin Linux that they don't understand exactly what this tool does. It's a means of efficiently cataloguing and managing content. Any use of the tool does not restrict the user to that tool alone; it can be used in conjunction with the traditional HFS. The author even says so in the article.
...the contents of the file itself. Anything else is extra. Nice to have, but extra.
You want to find last quarter's financial data and can't remember where you put it? OK, search for something that would be in that file ("Nov 2002"), and of file type X (whatever you use for a spreadsheet).
HFS or file descriptor info adds to this searching, but if you can't remember where you put it....search the text.
Doesn't work with binaries or images, but you get the idea.
Sorry, I couldn't resist. Pico is just such a bad, baaad editor. It's sort of become the "editor of the idiot" on Linux. When I used to use pico, I was too embarrassed to admit it, so whenever posting something about editing a file, I would substitute something else for pico ;-)
So next time, to save some embarrassment, try substituting nedit, vim, vi, emacs, or better yet, ed ;-)
Sticking feathers up your butt does not make you a chicken - Tyler Durden
He has a great point. When you copy a file into an email or onto a disk, its metadata has to get attached to it somehow. Otherwise all the categories you put a file into are stripped and your victim.. er .. coworker gets a file called 645325.xls and SMACK you're worse off than when you started.
Categorization such as this breaks down when a file falls under multiple categories. That picture I took of the family that had a picture of my mom, dad, sister, and dog? Where should I put it?
etc.
When I come back a month later, and want to find all pictures that contain my dog, having a metadata description field that says "This is my mom, dad, sister, and dog on our trip in 99." and then searching said metadata would seem a whole lot easier than trying to remember which of many folders it might happen to be in.
This isn't that revolutionary an idea. There is a scheme that was presented in the 50s or 60s, maybe earlier, using cards with holes that could be cut or not. Pass a knitting needle through the holes for the attributes you want and shake, and the cards that are the logical "and" of your search criteria fall on the floor. A truly parallel associative memory. Your idea sounds like a variant of this 'prior art'. But a good idea nonetheless.
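The needle-and-cards trick maps neatly onto bitmasks: one bit per hole, and a card "falls out" when it has every requested bit set. Below is a small Python sketch of that reading; the attribute names, the `punch` and `select` helpers and the sample cards are all invented for the example.

```python
# One bit position per attribute "hole" on the card (hypothetical attributes).
ATTRIBUTES = ["letter", "photo", "banking", "personal"]
BIT = {name: 1 << i for i, name in enumerate(ATTRIBUTES)}

def punch(*names):
    """Return the bitmask of a card notched for the given attributes."""
    mask = 0
    for name in names:
        mask |= BIT[name]
    return mask

def select(cards, *wanted):
    """Return titles of cards that carry ALL wanted attributes (logical AND)."""
    need = punch(*wanted)
    return [title for title, mask in cards.items() if (mask & need) == need]

cards = {
    "bank statement": punch("banking", "personal"),
    "letter to bank": punch("letter", "banking"),
    "beach photo": punch("photo", "personal"),
}
print(select(cards, "banking", "personal"))
```

Every card is tested against the same mask independently, which is the software analogue of shaking the whole deck at once.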
OK. We have had proprietary DMS for a long time which behave exactly like this. Alter the dialog and index the document. This completely bypasses the actual HFS indexing, insofar as if you place a document in the HFS by hand (from the shell) it is not indexed into the DMS. We have seen this before when DOS became Windows (8.3 -> LFN).
Now I must confess to not having read the web page but this sounds like a stop gap until the true VFS can be written to work this way.
I've been working on a very similar, almost identical system for a while. Still mind-ware, but I'm glad someone got around to it. Good job.
Question
http://www.ironfroggy.com/
...is something like a cross between a meta-data based document repository (like this or like The Brain), with automatic-mirroring capability for websites - or at least something that can be integrated into a weblog. I like to store interesting links and articles I encounter daily and be able to search them later. However, as we all know, what a link points to today, may be gone tomorrow, so that article you want to reference or remember might have *poof*ed out of existence. Right now this involves me manually mirroring every weblog post with 'wget' which is decidedly non-optimal and provides very little as far as searching and crossreferencing (which would be the real win: "what other articles do I have that relate to this article"). I'm sure it would be easy to hack together something for this specific case (cookies are a problem though), but I would hope there could be a general solution.
It's 10 PM. Do you know if you're un-American?
I'd just like to say thanks for giving us such a nice piece of software. It's needed!
...perhaps.
Why it causes a stir / borderline controversial:
Well,
- goes against HFS, or rather doesn't use it. hierarchies are a bit of a `paradyme` for a lot of people.
We don't want to have to choose between "Shall I index this document by HFS" or "Shall I index this document by newdocms" or "damn I've got to do both anyway"
- it's usability over dynamics (can't change fields and stuff quite so much)
I'm not too bothered about these 2 things at the moment because, well it's nice software simply to have.
But I must admit I would have liked it to store its database via HFS. This would have been complex to envision? Easier said than done perhaps.
A system that works with HFS would have to be really dynamic and thus, a LOT of work.
I feel it's a system I could design and a programmer could write, but it's a rare person who is both a designer and programmer.
I'm sure in the future there'll be newdocms/HFS converters, and converters for similar programs, if newdocms doesn't grow to be comparable in importance to HFS.
But, I'll still be using this software for sure.
A blog I run for the wealth
watch closely, or even participate, on this project. might be useful.
What ? Me, worry ?
Here's something that is not stressed enough in school: the HFS is a database, with the fully qualified path name as unique ID and basic operations of update, delete, record lock, and retrieve supported across most operating systems.
Other query operations are supported such as wildcard characters and, in large OSes other than Unix, a variety of other attribute queries (a la "/usr/bin/find" but accessible from "ls").
Now the file table itself is a database, which can be readily implemented using a relational database. Microsoft NT and other OSes have had such support for quite a while now.
I'm glad to see the full relational database FS model starting to hit the mainstream. By this time researchers are looking into XML based File Systems (store metadata in XML-like syntax, support any XML query on the files).
Which brings us back to an often overlooked fact. Linux has, in general, not been at the leading edge of OS research (with the possible exception of the beowulf architecture). This is alright as for many years the goal of Linux was to reimplement Unix on the intel x386 architecture. However we must keep in mind that the really advanced OS features out there have yet to make it into Linux, things such as new environment metaphors, persistent data support, and intelligent user interactions.
For example, with music. Music comes on CDs, organized by Artist, then Album, then Track. So I have /music/Tool/Undertow on my filesystem. This works, but it's still somewhat lacking. I would like my Tool albums also categorized into the categories "Metal" and also "Kinda Weird". Obviously, with HFS I can't put Tool into two different categories (I've tried this with symlinks, it's not so fun...).
That way, if I wanted to listen to random metal, or random weird shit, I could enter a query, and could get results for a bunch of good stuff to listen to. OTOH, if I know I want to listen to a particular song, or album, or artist, I would still have my HFS.
In other terms, think of the real world. EVERYTHING in the real world has an actual physical location, and so should everything on your computer. That's why, in the real world, we have catalogs, indexes, etc. for finding the real physical locations of our stuff.
I think what would be really, really excellent would be if all files on a filesystem could have meta-data. Then file selectors could have an extra query field, then you could go to either your root(/) or your music folder(/music), then enter some queries, and everything under the current folder matching those queries could be displayed in the fileselector.
That, IMO, is the optimal way of doing things.
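Roughly, the "query field in the file selector" could behave like the Python sketch below: keep the hierarchy, and filter whatever sits under the current folder by tags. The `METADATA` table, the tag names and the `query` function are hypothetical stand-ins; where the metadata would really live (extended attributes, a side database) is left open.

```python
from pathlib import PurePosixPath

# Hypothetical metadata store: absolute path -> set of tags.
METADATA = {
    "/music/Tool/Undertow/01.ogg": {"metal", "kinda weird"},
    "/music/Bach/Cello/01.ogg": {"classical"},
    "/docs/essay.txt": {"metal"},
}

def query(folder, *tags):
    """Return files below `folder` that carry every one of the given tags."""
    root = PurePosixPath(folder)
    wanted = set(tags)
    return sorted(
        path
        for path, file_tags in METADATA.items()
        # The file must sit somewhere under the current folder...
        if root in PurePosixPath(path).parents
        # ...and its tags must include all requested ones.
        and wanted <= file_tags
    )

print(query("/music", "metal"))
print(query("/", "metal"))
```

Starting from /music narrows the search to the music tree, while starting from / searches everything, which matches the two entry points described above.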
Sticking feathers up your butt does not make you a chicken - Tyler Durden
A new filing system needs to appear to be very similar to our existing document filing system in order to gain acceptance among the bulk of users.
The primary element lacking in today's HFS is a good interface to soft links (or aliases).
This is my 2nd-generation HFS wishlist. Don't take my "Save" or "Open" dialogs away (just yet).
What's wrong with hierarchical systems?
Well, for one thing they place a needless step in the path of users attempting to find information they desire.. that of finding "where it is". If I want my banking records, I should just search for files bearing the "banking" stamp rather than find the door to the maze in which I hope they will be found. I think a query-based storage/retrieval system places "file search" in its rightful place in the process (as the entry point) rather than as a fallback to be resorted to when the user is unsure "where to look".
Another weakness is that they impose an arbitrary order on what is really an unordered series of attributes the files being stored answer to. A user may decide to store my vacation plans in ~/personal/banking . But an equally valid choice is ~/banking/personal (or any number of other "places"). The problem becomes vastly larger when a large file system has multiple users storing files in it and looking for files they (or others) have placed there. People misfile information or fail to find files they are looking for. But the honest truth is that these two places SHOULD be the same place, and an attributed file system could treat them as such.
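One way to picture "these two places should be the same place" is to treat the path components as an unordered set of attributes. The Python sketch below does exactly that; the `attribute_key`, `save` and `find` helpers and the in-memory `store` are made up for illustration and say nothing about how a real attributed file system would be built.

```python
def attribute_key(path):
    """Turn a path into an order-independent set of attribute names."""
    parts = path.strip("/").split("/")
    # Drop empty components and the home-directory marker.
    return frozenset(part for part in parts if part and part != "~")

store = {}  # hypothetical in-memory store: attribute set -> list of documents

def save(path, document):
    """File a document under the attributes named by the path."""
    store.setdefault(attribute_key(path), []).append(document)

def find(path):
    """Look up documents by attributes, ignoring the order they are given in."""
    return store.get(attribute_key(path), [])

save("~/personal/banking", "vacation-plans.txt")
print(find("~/banking/personal"))
```

Because a frozenset ignores ordering, both spellings of the "place" hash to the same key, so the misfiling problem described above cannot arise for this particular pair.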
HFS implementations often attempt to sidestep this issue by providing search tools, which are lame in every real-world end-user system ever made. They treat "search by attribute" as a recovery mechanism.
The schema choices made when creating the hierarchy optimize the storage according to a particular philosophy and obstruct those users who have a different need than those anticipated by the designer of the hierarchy structure.
I've noticed about three main types of people in the world of open source
:)
Unfortunately you overlook the fourth and largest group -- those who COMPLAIN about everything and do nothing.
First of all, I agree that this is a great idea to supplement HFS storage of documents. Documents will often fit into more than one category, so users have to choose one, and then do multiple searches to find it later. (Should I file a review for an employee under his name, reviews, and year, or year, reviews, and name, or reviews, name, and year??)
If users don't use thoughtful, meaningful, and clear directory structures and file names now, what possible argument can be given that they will use anything else in a thoughtful, meaningful, and clear way? I get enough calls from users that can't even remember where they saved documents or what they named them (or why I should know) to make me doubt their ability to utilize such a toolset.
The ability to store keywords in a document and search by them has been in word processing programs for years. Those that are organized will take new tools such as newdocms and use them to the greatest advantage and receive the most benefit. Those that are not organized will only be further confused as they continue to save Document1.doc, Document2.doc, etc. with even more meaningless keywords such as 'expense report' or 'boobie picture'.
I rarely read replies, it's my opinion and if you thought about your opinion a little more, I'm OK with that.
Yep, that's what IPTC fields, and a misc directory, are for. Throw something like this into grep, and grep those fields based on the file's magic number. Of course, that assumes that someone didn't already add that type of function to grep :)
Oh, here's a caption indexing program. You should be using your IPTC fields when creating your images (Whoops, that would be self-organization), and this program will create an index based on your captions. Grep that! :P
I'm sorry, but all these tools already exist. A new filesystem isn't necessary. IPTC fields are huge, and wouldn't really work with this new os-level filesystem. If you're looking for pictures, you query the IPTC fields. If you're looking for an email, you should use a proper subject.
Again, it's just people looking for an easy way out of organizing.
"I can't give you a brain, so I'll give you a diploma" - The Great Oz (blatantly stolen sig)
Windows XP has most of the groundwork for this - Windows has actually had it for a while; for some reason the last piece (the filesystem that lets you take advantage of it all) keeps not showing up.
You want metadata on files? NTFS streams give you a place to store metadata (much like Mac resource forks but with any number of named streams).
You want to search on the metadata? The Microsoft Indexing Server will build a database and let you search on it (though it's a very strange system to use - in XP go into Administrative Tools, Computer Management, Services and Applications, Indexing Service, System and click on "Query the Catalog"). You can do instant searches for all kinds of stuff; look at the help.
OLE Structured Storage is like a single file version of the filesystem we're talking about - a way of saving a bunch of objects (some of which you didn't create but that are in your document) into a file. I believe Microsoft's Office apps use it (could be wrong there though).
Right-click on an MP3 file and pick Properties in XP and go to the Summary tab. There's the metadata - the stuff the index server is going to index. If you add a new file format to the system, you can supply a DLL that will be able to supply the metadata for those files - so you download an MP3, save it on your disk, and the index server uses the DLL to get the metadata and add it to the database. It works pretty well.
I don't really have a point to all this, just listing some stuff that Windows has that "should" make it easy for Microsoft to add the OO FS someday and have it instantly work with existing apps.
- Steve
Holy shit, that sounds like a whole lot of infrastructure just to work around shortcomings of windows 9x! ;-)
Sticking feathers up your butt does not make you a chicken - Tyler Durden
Canto has done something similar for Windows/Mac: a database-driven metadata storage system for digital assets of any type. It's also extensible through custom plug-ins. It goes a big step further by adding a server component for group/enterprise-wide asset storage.
regular expressions?
And unless I'm missing something here I haven't a clue as to what this has to do with free software. You've created a database that stands as a layer *between* the HFS and the user, as a middleware layer.
Well, I can do that in Access and VBA.
Besides, as everyone *and his grandmother, literally* is saying already, people understand and *LIKE* the HFS. That's why it has lasted as long as it has, not for any technical reason.
"Mary, get me the deeds for the Swanson account."
Ok, go to the "Records" room. Go to the "file cabinets." Go to the "S" section. Open the "Sw" drawer. Remove the "Swanson" file folder. Remove the "deeds" envelope. Open envelope and remove "deeds."
HFS is how the real world works. *YOU* are trying to invent a "logical" system that can only be applied in a virtual space, which means that people will find it *less* intuitive.
And you've done it in a bloatful, crufty, way that emits bogons like crazy. It makes me want to scream "It's alive!"
File locations are already stored in a database, so why not only reinvent the wheel but glue an extra set to the side? It would have been far more elegant and resource friendly if you'd just made a user friendly front end to find and regular expressions.
Which of course brings us to where the problem of finding files really lies: the failure of the "user" to give files meaningful search terms. I don't see your system really addressing this issue in any meaningful way. This is actually much easier to do in a flat HFS way than in an RDBMS way. Some*one* is going to have to *tell* the database that this is an article *about* the NYT, not an article *from* the NYT.
It really does all come down to the nut that holds the keyboard. You can only go so far in designing around that.
KFG
Finally! A PICK-like filesystem.
Having worked with PICK, a database-like OS, I found it's much easier and better to have data in a database than in a hierarchical structure of files.
Who in a sane condition would put 10 real folders one inside another?! Like a Russian Matryoshka, but folders?! The folders model has no real equivalent in office work.
Russian puppets - forgot the name
... sort of. :)
Babushkas. If you want some, there's always Google.
Um, I'm pretty sure babushka is Russian for an old woman or grandmother, or a statue of the same. (Or, I see in the dictionary, a head scarf. This is sort of like aloha.)
I think the poster refers to Ukrainian (or Russian) Nesting Dolls.
Well, you did ask
This doesn't really sound like an hfs replacement. More of a needed addition.
I mean replacing the normal tree with meta tags sounds like a dangerous plan, but being able to search for a meta tag, instead of the file name sounds useful.
Also having the same file in multiple locations is something that would be quite useful in the win domain as most programs don't treat the shortcut the same as a real file.
later
I think Multics dates from 1965, see History and 1965 paper on the file system.
Just to let you know that MS didn't innovate this one.
Back in the 80's a British computer company, ICL, produced a chunk of hardware called CAFS. This did a basically similar thing.
Unfortunately for them their management was absolutely crap and a number of their other ideas (Distributed Array Processor, mainframes with a tagged architecture, compilable scripting language) all disappeared. I believe they are now an MS reseller.
In my years of Desktop Support I found that non-techie users always used Outlook as their filing system. They left all their attachments in their messages and used to find things by searching through their emails by sender's name, date, subject, and text indexing built into Outlook.
Since the whole .pst (email file) was one big file, this was a really bad idea, and I used to try to show the users how to set up a useful directory structure in which to store their attachments, but they never wanted to bother learning anything but Word, Excel and Outlook.
If you could reproduce the search abilities of Outlook in the OS then these people still probably wouldn't use it but it is worth a try.
Neither "newdocms" nor "The Brain" is any big breakthrough. Attributed and associative information storage go back all the way to Vannevar Bush, if not before. There have been numerous attempts to implement them, mostly in research settings. The most successful one, the web itself, is an experiment in associative information storage. Commercial implementations are, as usual, very late to the game and just trying to skim the cream off decades of research by others.
So...you never have to resort to find/grep, right?
Also, see one of Joe Celko's SQL for Smarties columns as an illustration of how to use clever hackery to get around the lack of one relational feature with ANSI-SQL.
Another good resource is the Database Debunking web page. Browse through the "Content" section to see many discussions on various data models.
I once heard a good example of how Human Computer Interaction should work. The human can't find something so asks the computer.
H: Where did you put it?
C: Put what?
H: You know...
C: Oh, in the usual place.
I had thought that this was a well-known HCI story, but I couldn't find it anywhere.
Anyway, the point is that users don't want to specify where things go, either by path or by meta-data. They just want the computer to look after them and return them when required.
What if I could google my file system? Then finding that expenses claim I made last March would be as easy as finding dubious web sites. I wouldn't have to remember where I saved it, what meta-data I attached to it, or even what format it was in. I haven't used any file search utility that worked as well as a web search.
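For what it's worth, the core of "googling" a file system is an inverted index: a map from each word to the files that contain it. Here is a minimal sketch in Python. The file names and contents are made up for illustration, and a real tool would also need to extract text from each file format, which this skips entirely.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each lowercase word to the set of document names containing it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(name)
    return index


def search(index, query):
    """Return the names of documents that contain every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical contents, standing in for text extracted from real files.
documents = {
    "claim_final.doc": "Expenses claim for March: train tickets and hotel",
    "notes.txt": "Meeting notes from March about the hotel booking",
    "todo.txt": "Buy milk",
}
index = build_index(documents)
print(search(index, "expenses march"))
```

Note that nothing here asks the user for a path or for attributes; the index is built from the contents alone, which is the point of the suggestion.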
Alert readers may note that this suggestion is slightly spoiled by the fact that I couldn't find the HCI example I was looking for on Google...
I think the approach taken by "The Brain" is better: no need to burden the file system with indexing and attribution. However, there is nothing unique to "The Brain"; there are plenty of other systems that create, maintain, organize, and display relationships among documents. "The Brain" just has a better marketing strategy behind it and a slicker UI.
been there done that ;-)
Sticking feathers up your butt does not make you a chicken - Tyler Durden
Sharepoint Portal Server uses something pretty much exactly like this for document management. I wrote a small web based application for my last employer's intranet that did the same thing. I didn't see it as innovative or cool, even, I thought it was just "the way to do it".
As far as "free software" having anything at all to do with it, I just don't get that.
The truth doesn't care what I think.
Microsoft has been working on this for ages. Remember Cairo? I think I first heard presentations on it in 92. Basically all of it has been implemented over the years except for the famed object file system, which is basically what this sounds like. Plans are currently to implement it in Longhorn.
You can already search on these things within your email client...but if it's integrated with the OS, you can search on these attributes from anywhere. You end up with one database for all your apps.
There are obvious namespace issues, it wouldn't be trivial to get it all working smoothly, but this is a giant first step.
It's a bad idea to put this sort of thing in the kernel, like Microsoft is doing: file system attributes and indexing add too much unnecessary complexity and overhead to the kernel; kernel file systems aren't used primarily for organizing user-created documents.
(You also deserved to get "bitch slapped" for calling an old and tired idea like that "radical".)
Case 1:
I'm your average home user, but even so I have about 100 documents I work on. However, I was smart enough to give them meaningful filenames and locations where it takes only a few seconds to find the file. Remembering attributes for each and every file would be a pain.
Case 2:
I'm a developer. I'm sorry, but I want file Y in F/O/O/BAR. I need something exact to describe where a file is at least. Anything else doesn't work.
Case 3:
I'm a moron who doesn't give a flying-f*** about where I put my files, and I don't care what I name them. I already have documents in my C\:, C:\Windows/Temp, C:\sdf34\, and C:Documants. It takes me a minute or two to find a file. What? I have to classify by keyword now? Who do you think I am? It needs to classify the files for me or I won't have any of it.
Case 4:
I'm a scientist/business man that deals with classifications on a day to day basis. I already have a database because I needed it to be efficient. If it was on the file system level, then it'd be pretty cool.
I can't think of any other positive cases where this product is useful. Thus, it's my bet that it'll be niche forever. Anybody got any other use cases that I'm obviously missing?
Well, I'm sorry, but Windows NT allows you to design your own filesystems too. I'm sure that the task is easier with Open Source (especially if you do it in userland libraries), but you can create your own low level file system on NT.
I passed the Turing test.
This program sounds somewhat like the information management program called The Literary Machine. It is a program not only for writers but for anyone who wants a powerful way of storing and retrieving tidbits of info. Manuel, have you looked at this program? They have a free version that is very functional. Here is the link:
http://www.sommestad.com/LM_1_5.htm
The Strengths of his approach
Manuel's system permits grouping by arbitrary metadata in an arbitrary order - huge plus
It appears to fit inside existing file systems - gives simplicity and portability
Integration with existing applications
Hierarchical organization is also useful - think poor man's AI.
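To make the "arbitrary metadata in an arbitrary order" point concrete, here is a rough sketch of an attribute store using Python's built-in sqlite3 module. This is not how newdocms is implemented (I haven't looked at its schema); the table layout, the `tag` and `find` helpers, and the numeric document IDs are all assumptions made up for illustration.

```python
import sqlite3

# One row per (document, attribute name, attribute value) triple.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE attrs (doc TEXT, name TEXT, value TEXT)")


def tag(doc, **attributes):
    """Attach any number of name=value attributes to a document."""
    conn.executemany(
        "INSERT INTO attrs VALUES (?, ?, ?)",
        [(doc, name, value) for name, value in attributes.items()],
    )


def find(**criteria):
    """Return documents matching ALL given attributes, sorted by ID."""
    if not criteria:
        return []
    clauses = " OR ".join("(name = ? AND value = ?)" for _ in criteria)
    params = [item for pair in criteria.items() for item in pair]
    rows = conn.execute(
        f"SELECT doc FROM attrs WHERE {clauses} "
        "GROUP BY doc HAVING COUNT(DISTINCT name) = ? ORDER BY doc",
        params + [len(criteria)],
    )
    return [row[0] for row in rows]


# Hypothetical documents, named by number as in the preview release.
tag("0001", author="mary", kind="deed", client="Swanson")
tag("0002", author="mary", kind="letter", client="Swanson")
tag("0003", author="bob", kind="deed", client="Jones")

print(find(kind="deed", client="Swanson"))
print(find(client="Swanson", kind="deed"))  # same query, different order
```

The order in which you give the criteria makes no difference, which is exactly what a fixed directory path cannot offer: /Swanson/deeds and /deeds/Swanson are two different places in a tree.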
Areas for Improvement
While it fits in the existing HFS, it doesn't take advantage of existing metadata in HFS. Filenames are now numbers (not helpful). Build it to still take a standard HFS filename and path, and maybe automatically glean info from the filename and path (uncategorized metadata)
It should automatically grab a minimum set of metadata based upon file type (MIME type). Pictures have size, color depth, etc. Documents have word counts, page counts. All files have creation dates, edit dates, edit bys, etc.
Completely arbitrary metadata becomes a jungle. Set up a way to manage desirable metadata for each file type/MIME type.
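The "grab a minimum set of metadata automatically" suggestion above could look something like the sketch below, using only the Python standard library. The field names in the returned dictionary are my own invention, and the word count for text files is just one example of a per-type extra; image dimensions or page counts would need format-specific libraries that I'm not assuming here.

```python
import mimetypes
import os
from datetime import datetime, timezone


def basic_metadata(path):
    """Collect metadata every file has, plus a word count for text files."""
    mime, _encoding = mimetypes.guess_type(path)
    info = os.stat(path)
    meta = {
        "name": os.path.basename(path),
        "directory": os.path.dirname(os.path.abspath(path)),
        "mime": mime or "application/octet-stream",
        "size_bytes": info.st_size,
        "modified": datetime.fromtimestamp(
            info.st_mtime, tz=timezone.utc
        ).isoformat(),
    }
    # Per-type extras: only plain-text types are handled in this sketch.
    if meta["mime"].startswith("text/"):
        with open(path, encoding="utf-8", errors="replace") as handle:
            meta["word_count"] = len(handle.read().split())
    return meta
```

Calling `basic_metadata("report.txt")` on an existing file returns a dictionary that could be stored alongside whatever attributes the user types in, so the user only has to supply what the computer can't work out for itself. Keeping the original name and directory also covers the point about not throwing away the metadata already present in the HFS path.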
Another approach that could be just as useful would be a standalone librarian/card catalog application. It would have a daemon scanning all new files to automatically find standard metadata. Heck, use the Library of Congress hierarchical topic system and start with card catalog metadata - grow from there. The pro of this approach is less intrusiveness (on existing data, filesystems, etc.). The con is another application, which means it is less integrated with existing applications.
It does sound like Longhorn without the required new hardware.
But a computer magnifies the user. If they are organized, it makes them better. But if they are disorganized, it amplifies the mess.
Wish him luck. The answer is something like TiVo: some bit of crude AI that tries to remove the human as much as possible from the real work.
Shouldn't this be posted on FM?
You might come up with new ways to organize the tree, maybe some different ways but it's still the only way you can process data.
Hierarchical file systems have become as prominent as they are because they are simple, they are efficient, they work, and they support all the indexing and attribution people need (well, that may not be true of VFAT, but it is true of modern UNIX file systems). Attribution schemes are easily mapped onto hierarchies. For example, I organize my documents and projects as ~/{work,personal}/(project)/...
But I also have other directory trees. I store stuff by keywords under "~/notes"; file names in that directory are very long, containing many words. To find something, I can search for it with "locate keyword" or "ls ~/notes | grep keyword", and to browse through it, I might use "cd ~/notes; view $(ls | grep keyword)", or "cd ~/notes; ls | grep keyword | xbrowse".
Experimental results are stored as ~/data/(experiment)/(condition)/(subcondition)/(subsubcondition)/(measurements), and I have dozens of gigabytes of that stuff. The notion of a working directory lets me keep my data and my projects apart, and the hierarchical naming provides a quick and simple attribution scheme.
Directory trees are also easy to query in UNIX/Linux. If I don't remember whether project "foo" was personal or work-related, I can refer to it as "~/*/foo". If I'm looking for all the projects containing a particular image "img.jpg", I use something like "find ~/{work,personal} | fgrep img.jpg". Looking for any TeX file containing a particular phrase is also simple: "locate .tex | xargs grep -l phrase" for an older file and "find ~ | fgrep .tex | xargs grep -l phrase" for one I modified today. The command line utilities that UNIX and Linux come with out of the box are more powerful for indexing, attributing, and associating information than anything any commercial product or file system hack does, and they give me far more freedom in making crucial space/time tradeoffs.
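For anyone who would rather script these queries than type them, the two searches above translate to a few lines of Python with os.walk. This is only a rough equivalent: the function names are made up, and unlike the fgrep pipelines (which match a substring anywhere in the path) these match an exact file name and a file name suffix.

```python
import os


def find_named(root, filename):
    """Return every path under root whose file name is exactly `filename`."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if filename in filenames:
            hits.append(os.path.join(dirpath, filename))
    return sorted(hits)


def files_containing(root, suffix, phrase):
    """Return paths under root ending in `suffix` whose text has `phrase`."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(suffix):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as handle:
                if phrase in handle.read():
                    hits.append(path)
    return sorted(hits)
```

Something like `find_named(os.path.expanduser("~"), "img.jpg")` walks the whole tree on every call, so it has the same space/time tradeoff as find: no index to maintain, but no index to speed things up either.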
These operations are not as fast as if the system had maintained indexes (well, for "locate" it does), but they are fast enough. Given that most of the time, I just want data access as quickly as possible, that's the right tradeoff as far as I'm concerned. I don't want to have to add additional metadata or pay overhead for indexes and attributions to speed up a rare and already fast operation further.
I think a lot of these efforts to add attribution and indexing to the UNIX file system are because people just don't understand the capabilities they have already at their fingertips in UNIX.
Of course, there is probably value for people working in GUI environments on Linux to get some handholding: you can't "find" or "grep" from within a GUI, so GUI-based users really need extra help. And for that purpose, a library based approach like "newdocms" is the right one. But this stuff does not belong in the kernel, at least on a general purpose OS like UNIX or Linux.
The hierarchical method of organization is not the primary trouble. Even for novice users the concept of files and folders for their data is relatively simple and intuitive.
The problem lies with the "everything is a file" concept. This idea is powerful but archaic. Very few average users need to directly manipulate files on their system *other* than the stuff they created in their applications.
By calling everything a file, when the user looks for their stuff, it's needle in a haystack time.
To the average user there are only three types of files..
1) Programs (executables to you and me)
2) My stuff
3) Other crap that makes the computer run.
Make the default file system work like that, and people won't lose so many files. Leave the 'everything is a file' method in expert mode.
-Z
MS introduced a version of this many years ago, granted somewhat simplified in its first incarnation, and users completely failed to use it most of the time.
They are attempting to do a much better version in the next version of their operating system.
But then you can freetext search files, that are compatible with the indexing engine in MS oses today, and that is really neat.
This kind of "expansion" upon the meta-data model has been implemented (including the user level interface to interact and manipulate files based on those extra file properties). A simple example could be ID3 tags within MP3 files allowing an MP3 software application such as iTunes to dynamically manipulate smart playlists.
Surely, for this expansion upon the HFS paradigm to flourish and be healthy, it's going to be colossally important to standardize the actual meta-data "tags" (similar to how ID3 tags are slowly becoming standardized). Without some standard for people to follow, there's going to be a tremendous amount of fragmentation between implementations.
Standardization of something like this might seem like it's once again moving away from allowing people to fully organize things the way they absolutely want, but like file formats and protocols, people are going to want their data to work across as many software applications/file systems/architectures/etc. as possible. If it's necessary, a higher level abstraction can always be designed to allow people to be even more anal about organization.
Find the 2003 financial statement... ...289384 files match
...289383 files match
...839485 files match
Find the 2003 financial statement for mycompany...
Where the %@*# is the #?$^@ file I just created...
Who would win this election: Andrew Weiner vs Andrew Weiner's weiner.
For those that don't want to use this... don't. But I can almost guarantee that you don't know where all your scripts and documents are located off the top of your head. And other people who might need to browse your shared directories certainly don't. And you can't tell me that there has never been a time that a doc could have been properly placed in more than one folder; everything can't be pigeonholed into one exact category... that is the nature of information. However, I agree that this system has its limitations. I've been wanting to do something like this, but it would allow you to save your document where you want in the HFS, and would make links from other "category" directories to the "actual" directories based on what the computer knows about the file (i.e. it is an mp3 file, so it goes in the audio category and the mp3 category, at least) and categories that you select in the Save dialog box. These categories and files would also be indexed in some type of database for additional searching capabilities. Please let me know of any products that do something similar to this.
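A rough sketch of the category-links idea, assuming a platform where ordinary users can create symbolic links (Linux, macOS). The file stays wherever it was saved, and a symlink to it appears under one directory per category. The `categories_for` and `link_into_categories` names and the layout of the categories directory are my own assumptions, not any existing product.

```python
import mimetypes
import os


def categories_for(path, chosen=()):
    """Derive categories from the file type, then add user-chosen ones."""
    categories = []
    mime, _encoding = mimetypes.guess_type(path)
    if mime:
        categories.append(mime.split("/")[0])  # e.g. "audio"
    extension = os.path.splitext(path)[1].lstrip(".").lower()
    if extension:
        categories.append(extension)  # e.g. "mp3"
    categories.extend(chosen)  # e.g. picked in the Save dialog
    return categories


def link_into_categories(path, categories_root, chosen=()):
    """Create one symlink to `path` under each category directory."""
    links = []
    for category in categories_for(path, chosen):
        category_dir = os.path.join(categories_root, category)
        os.makedirs(category_dir, exist_ok=True)
        link = os.path.join(category_dir, os.path.basename(path))
        if not os.path.lexists(link):
            os.symlink(os.path.abspath(path), link)
        links.append(link)
    return links
```

For a file named song.mp3 with "jazz" chosen by the user, this gives links under audio/, mp3/ and jazz/. Two files with the same name in different directories would collide inside a category directory, so a real tool would need a naming scheme for that; the sketch just keeps the first link.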
FWIW, I'm an MS Active Server Pages (ASP) developer with zero Linux experience. Read on if you still care............
About a year ago, I started working with MS Indexing Service, which is what powers the "Find..." feature in Windows 2000. It seemed like a great concept -- take a hint from the database world, create an index of document properties for files residing on that machine, and then query that index first when searching for files. MS even provided a nifty little API to allow programmatic access to this functionality. Good stuff......
However, I started encountering problems on a number of fronts. First, the number of 3rd-party COM objects that allow programmers to set/get these properties in code approaches zero: the only supported component I can find is Desaware's "File Property" component. And while that works right now, the documentation isn't as robust as I would like..... and, of course, it's proprietary. Second, Indexing Service tends to shine only when used in conjunction with files that are local to the computer where the scripts reside. Once you get into files on shared / remote boxes, you're forced to use a single, hardcoded username/password to run searches -- there's no support for delegation, or for a logged-in user's credentials to be applied when performing those searches. Finally, I'm hard-pressed to find an affordable solution for querying another box's Indexing Service catalogs (i.e. indexes)....... for that, they want you to upgrade to SharePoint Server. That comes out to $4,000 for a server license PLUS $72 per CAL, on top of the cost of Windows 2000.
NewDocMS sounds like a pretty interesting alternative to OLE Structured Storage on MS....... it may not be mature yet, but if it continues to evolve, it'll be yet another quality app for Linux to reduce the TCO for businesses, especially small- and mid-sizers that don't have the deep pockets that standardizing on an MS platform requires. And because it's OSS, it has the potential to mature faster than its proprietary siblings. This, coupled with Linux's unwillingness to foist DRM onto its users (at this time, at the software level) makes it an increasingly attractive file serving alternative.
Who knows, maybe one day soon I'll break down and start messing with Linux. For one thing, I'm loath to return to the command-line interface, and I don't have enough time for things as-is. But I've already sworn that I won't buy any further versions of Windows beyond Win2k, nor any versions of Office beyond O2k, because they fill my needs nicely without forcing me to do too much. I've already dumped Windows Media Player in favor of Winamp 3.0 with plug-in support for WMV; DivX shows more promise as a video format anyway. And now that WinAmp supports Ogg Vorbis (as of 2.80), I'm considering converting all my Mp3s to Ogg format as well.
Sounds weird, right? Let me explain why.
If all I ever wanted was 2 files, I could name them "A" and "B" and I would know which is the right file to open since I can remember it all.
If I wanted to get some more order in place, and had to manage 10s of files (30-40 not 100 odd), I could still name them as photo1.doc, phone.xls, and so on....
If I wanted to store 1000s of files (let's skip over the 100s part), I would definitely need order. If I wanted to access a specific file at any time, I would use either the complete file path (directory and filename) or the meta keywords I associated with the file.
Sounds good. So what's the problem?
The problem is that most people do not start by cataloging 1000 files. It always starts with a few files and then adds up. So if I have only 10 files saved on my computer (or let's say 100 birthday pictures and an address list), I hardly realize the concept of structuring my content. It is only when the files get piled up that we think of organizing them. (I know there are people - less than 10% I think - who start everything organized and may handle this better, but I am talking about the general populace.) So here's how it works - you have a few files, you spend very little effort to organize them. You add some more files, you bring more structure to the file organization. You add some more........You do not have the time to go change all the other filenames or directory structures to fit your new scheme. And then you start getting into a mess.
The same issues will apply whether it is a hierarchical file system or a meta keyword driven filesystem. If you are left to define your own structure, you will start with the minimum that will suffice and then extend it slowly as needed - eventually leading to either a mess, or a situation where occasionally you will not be able to find files. That is just how it evolves. Also think - you did very well organizing files with 20 keywords and now suddenly see the need for the 21st keyword which will also be meaningful to at least half your other files. Would you go back and change the other files' categorization now?
How do we get around this? Thrust an organization upon people. Publish a small cheat-sheet / best practices on file organization which people can learn and use. Or provide predefined folders in the user's home directory. Or enforce storage of meta-tags every time a file is saved.
I seem to be rambling, so I will try to cut it short. The idea is nothing different from what we have done in software development. Why do we have guidelines on how to name variables, or functions, or modules? Some of it is to get everyone to the same level. However, part of it is to bring order to the clutter in the program namespace. Same thought applies here. Provide people with more than just a home directory to start with. If going meta-tag way, provide templates. (I am sure all non-programmers will appreciate the order it brings).
What's the bad part? Someone will need to own the definition process. We will end up with a Microsoft definition which might say that all birthday pictures are stored in folder home/myfiles/pictures/birthday or an Apple way which might say that all birthday pictures are stored in folder home/myevents/birthday/pictures.....
Well....
We have a document imaging system that does basically just that. It's a Win32 package called application extender from OTG software. It hooks into your file->save dialogs and stores all your documents in a share with a nasty ID as the name, but then you look things up via the attributes you've set. Normal users don't actually even interact, or know where the true files are stored.
It's actually excruciatingly painful for users to deal with sometimes, since their interface makes it very difficult for normal users to figure out how to open an "actual file" rather than something in the application extender database.
In any one directory there should never be:
I learnt this at the office where I used Windows. I couldn't stand a cluttered "Start Menu", so I broke everything up basically as I said, and voilà, I could find any item within three or four menus. Since they were split logically according to usage, it took me seconds to find anything, if that long.
On my computer, I like to do this with my home directory. Everything becomes so easy to find. I really can't understand why people don't use it properly.
Have you read my journal today?
Good to hear that someone's making it a reality -- it seems like a much more logical approach than a hierarchy for some things...
With something like this, my family could, just possibly, manage to keep our digital photos organized...
This is really important stuff. The file cabinet concept worked well for many years, because it bridged the tree model systems design could create, and an intuitive real-world model users could understand. There are three prevalent models individuals use to interact with the world. Any system design that encompasses only one of these will always be difficult to use for the majority of the population. This app doesn't break new ground beyond the text based approach, but it DOES try to break out of the document/tree centered way things work now. That is a good thing and we should applaud.
The next big mountain to climb is expanding the interface. A variety of ways to interact with systems in different circumstances is needed before the real promise of computers can be realized in everyday life. That means a variety of ways to manage documents and information. While I am at my desk, a HFS is fine. But I want to use a wearable computer for active tasks. A voice driven, keyword oriented filing system with audio feedback is probably going to be more useful than a point and click driven HFS in that situation. An app like this one is helping to build the foundation for these kinds of new interfaces.
The posters that ask what's wrong with HFS, or say that unorganized users will lose their files anyway, are suffering from ivory tower vertigo. They are like the monks that saw no reason why anything needed to be written in anything but Latin. Ignore them. They don't understand what you are doing.
This sounds like BeOS's solution?...
Why don't you call it "HFS+"
(OK, it's a lame mac joke - sorry.)
Cool. But how can you mash this into a filename string so it works with everything else in the OS?
But wait, I thought you said it was at the OS level.
Don't label something "offtopic" unless you know the topic well enough to tell what's on topic.
Had this in the BeFS.
;)
It worked great! Glad to see linux is finally starting to catch up to BeOS!
The problem is people don't want to be organized, so they look to technology to help them be lazy. Plus try explaining 'metadata' to someone. At least now you can use the file cabinet, drawers, folders, papers example to explain the layout to someone.
You're right, people don't want to be organized. They want to be lazy. That's what computers are for... to do otherwise thoughtless work.
As far as explaining the word metadata, why would I try to do that? People already understand the word 'description'. Describe the file.
People use metadata every day. When my mom wants to play a game online, she doesn't go to games.yahoo.com. She goes to google and types "online card game". When someone wants to buy a pair of shoes, they don't type http://www.amazon.com/exec/obidos/tg/detail/-/B000072G1W/qid=1041616305//104-5897266-1255115?v=glance&n=507846. They type amazon.com and then search for "Nike Shoes". People will understand it if it's presented correctly (which of course, it won't be until Apple or MS get a hold of it).
-- sorry about that last comment. I couldn't help it.
This is a testament to the power of free software
It is? Why? Microsoft thought of this a long time ago and has it in the works for Longhorn.
And the bottom line is that neither you NOR Microsoft have bothered presenting your creation to real users to see if they actually like that approach. Instead, you've just unleashed it on the world without bothering to measure the real demand for it.
Typical engineering bravado: if we build it, they will come. The real smart engineers are the ones who bother to assess the need for something before building it, and get feedback about the design via prototypes, etc, before building anything.
Moderator hint: a comment is neither "Flamebait" nor "Troll" if it is true.
But we do have subgroups here, some of which tend to act as a mob a la religious fanatics. Try posting something unflattering to Apple or to Mac users in any Apple-Whatever discussion. Doesn't matter if you had a good point or an honest observation; you'll get modded down to oblivion anyway. And yeah, it's tribalism in action, that mortal enemy of independent thought. But of course that could *never* happen on Slashdot, where folk are expected to think for themselves ;}
~REZ~ #43301. Who'd fake being me anyway?
Cheers,
Soulfry
Putting it into the kernel also strikes me as just a dandy way to lose or corrupt all your data, in the event that the filesystem gets wonked -- even what are now trivial disk errors could become disasters. How do you recover your data if the only way to tell which file is what is in a now-mangled index??
Seriously, has anyone got ideas about that? (remembering that current backups are increasingly a pleasant fantasy, rather than functional reality.)
~REZ~ #43301. Who'd fake being me anyway?
Let's approach the problem with a question. Who knows what, about the problem? Solution? A more intelligent software chain, tools that work the entire mental chain from the "I have an idea" level to finished product. Lyx is a niche in this chain. A higher level of intent (is it a chapter or a list?) is embedded in the document. Going further up the chain: is that chapter about the perils of mountain climbing, or that list the ten reasons not to smoke? Idea organizers/processors need to get better, so a richer amount of information can be transparently passed down. Information that can be used by "intelligent agents" (Tetex) to give better results than a human could. Remember, work is only work if you notice it.
No longer will you browse complex directory trees or directly interact with the HFS; instead, you define any number of document attributes when saving a document and then query a database of those attributes when trying to retrieve it later on.
Yes... let's make OSes even more confusing for the average person!
Seriously... why? All you did was change the way we access the HFS... Instead of having a directory called "Program Files" or "home", we now have an "attribute" called "Program Files" and "home". Congratulations, you've successfully created a file system that forces you to type in the location of the file instead of just double clicking on some icons...
I wish Linux (and Unix) file systems offered an integrated option for dual-fork files, like those on the Mac. That feature would facilitate clean implementation of ideas like the subject new file structure, and much else...
"free software" ... actually lags behind software driven by the profit model.
Who says free software is never driven by the profit model? Tell that to Red Hat.
Find free books.
This actually sounds like a really good idea. There have been rumors flying around that MS is planning on making a DB based filesystem akin to the way that Exchange mail stores work. This approach sounds better to me since it gives you the benefit of both approaches: a DB for fast file location and a true HFS that still allows legacy applications to utilize it. Knowing MS's usual approach, they are probably going to break their existing application base with the new filesystem. Users and admins will likely be in a predicament similar to the one mixed-mode domains are currently in: partially NTFS and partially "Storage FS", "NTFS2" or whatever they wind up calling it.
I posted the following on Slashdot a while back and think it fits in nicely here, so I'll repost:
:) But, the only input devices we have are still limiting. The closest thing I've seen to something useful for text input is "Dasher". Combine this with eye tracking and I think you have a great solution for portable computing with no need for KB, twiddler, or the like. The other thing I think we should be looking at is the possibility of CLIs actually learning what we do most and creating aliases based on those actions with notification that we have a new alias that we can use for those actions. The other possibility is textual access to that same DB that the normal users would have in the GUI. This DB would allow us to use our machines in CLI mode with automatic suggestions for related commands, data, services appearing in a "scratch" location on the CLI for the machine's "stream of consciousness". It would become symbiotic. As we learn about our machines, and our machines learn about us, we augment each other. And THAT is what we should be working towards: computers that augment us as individuals while being as transparent or intrusive as the user desires.
I've been thinking about this at great length for the past year or so. The W.I.M.P. interface is going to be with us for a while no matter what we think of it. It will evolve and get enhanced by other developments in input devices (eye tracking, speech recognition, humanoid virtual androids, etc..), but will probably largely remain the same. The real "innovations" (for lack of a less used word) are to be had in new approaches to using the computer to actually get work done.
Unfortunately, I think Microsoft has us in a bad spot right now. I've heard rumours for a while that one of their big projects is some kind of storage/document management system. When you think about it, this makes sense for the business world as the "next big thing" because the suits don't care about data formats and don't WANT to learn about what type of data is compatible with other data. If my hunch is correct (based on the info I've seen in various spots on the net) they are planning to make a transparent, centralized (within an enterprise) mass data storage system that completely abstracts data from file formats. More than likely, the end result will be based on that DB centered filesystem we've been hearing about. So when a user creates data, whether it's graphic, text, audio, etc... it all goes into this DB with appropriate links drawn automatically between the different data. The user never has to think about file formats. They just create their data (which they will likely think of as "documents" with no type) and save it to their published "Folder". The filesystem/OS will take care of all the data type matching. Exchange and Windows XP for Pen Computing are the first glimpses at this kind of thing.
If we really want to get something new happening, we really have to start thinking about a few items:
1. Computers (even with W.I.M.P.) force people to interact in non-human ways.
2. To be truly efficient, every task that a computer could be used for requires different UI approaches to be "optimized" for that use. (Witness the turnkey systems out there for the button pushing monkeys to use)
3. You either have maximum flexibility and number of features at the cost of true ease of use, or you limit your user to make things easier to use. There is no compromise.
To tackle the first point: People have been working for so long on trying to make computers "user friendly" that they've added so many things that actually cripple the user. As Neal Stephenson pointed out in his essay, "In the Beginning... Was the Command Line", many metaphors actually prevent the new device from being used to its full potential. He had an example of a steam powered car that used reins for steering because it was something people were familiar with. However, it's obvious to us now that the steering wheel (while a new concept) was actually the better interface. I think we need to question whether we really need to hold onto a lot of the metaphors in use today. Should we try and meet our machines halfway, especially since their eventual role will probably be to augment us in many ways? Or maybe we should come up with new, less limiting metaphors? I think it will all come down to how each individual uses their computer.
I know that I feel very limited by GUIs these days. It doesn't matter if it's Windows, Linux or MacOS. I've used them all and can easily move between all of them since they really aren't different at all anymore. However, I do get a lot more usability and flexibility from the CLI for the way I use my machines. Still... the CLI is limiting too. The time to integrate CLI and GUI into something more cohesive than just running an xterm in X, or CMD in Explorer has come. Why don't we have a CLI that has modern text editing facilities? There are many times when I wish I could do a text search through the text in my scrollback buffer. Or how about being able to "drag and drop" filenames to directories in a CLI window, instead of having both a GUI file manager and a CLI open? Or dragging a console command line out of a script you're editing to the desktop and having a new CLI window (or maybe a new tab if you have an MDI capable CLI) pop up with the line ready to execute by pressing enter? Or maybe a way to use the command history to create new scripts easily? Just arrow up to the commands you just used and tag them in the order you want them and have them output to a new script in your home dir. These are basically shortcuts that could make CLI life a lot easier. However, this still barely touches the real issue.
The real problem is that the computers (with ANY UI) still force users into limited ways of interacting and thinking. To manage your files, you have to think in hierarchical fashion even if that ISN'T the way that you work with real paper/books/printouts, etc... File management should be approached in a much different way than it is currently. (Most users I know never even touch their file managers unless they are going to read a floppy.) The "search" tools that many GUIs provide do this to some extent, but it's only ephemeral. A search is not a permanent record of a state. The only "views" that we currently have in a GUI are limited to the way that a computer "tech" thinks, not a user. In fact, the very use of the word "file" may be an impediment to using a computer in the most efficient way.
If we take a more object based view, the data would make a slight transformation from "graphic image file" to simply "Picture", regardless of the format. Text data would no longer be the mish-mash of formats that it currently is (ASCII text, "DOC", RTF, PDF). It would instead become "Letters", "Articles", "Recipes", "Source Code", "Personal Photos", "Promotional Pictures", etc...
Instead of the user arranging folders that contain all of these categories, the OS would already have a pre-ordered layout of filing by these categories. However, this would not be the normal folder structure that a filesystem uses, but it would be a database that manages the underlying filesystem. As new applications get installed, more categories for those apps get added if they don't already exist. When the user opens their personal information store, they would be presented with a list of the categories (with a bias towards the most often used types) to scan through. Once they select the ONE category they are interested in, all other categories disappear from the list and a new interface is presented with the option to search for a specific document or select a "view". The "view" could be chronological, alphabetical, or relational. If they pick chronological, their choices can be Today, Yesterday, Within the Past Week/Month/Year, Specific Date. If they pick alphabetical, they get the options for Forward/Reverse order, or Specific Letter - Forward/Reverse (Abilities, Accidental, Actionable...). If they pick relational, they can select a specific document and it will present them with a "web" of all related documents on their system, network, or corporate enterprise. This is just a simple illustration of "what could be" for the typical end user. Let's take a look now at what could be for the advanced user.
A lot of times, I find myself with a strong desire to have access to my machines, but being limited by the other things I need to do in daily life. The concept of the wearable computer appeals more and more.
My second point is that depending on how you use your machine, certain UI/input device combos may be more efficient than a "one size fits all" approach. For instance a musician may want to use a computer with a KB, Mouse and a real mixing board input device for virtual studio work. Or an artist might want to use a tablet interface that allows them to draw on screen just as on paper. One of the things that Linux has going for it in this way is that you really could make dedicated distros for different types of work. This would be a great way to usurp Windows from certain arenas since MS would likely never take this approach as it would cost too much. But it needn't cost as much for Linux. The freedom it would allow for in UI design would be incredible. Imagine the new kinds of tools and approaches that could be created without being fettered by a "desktop" metaphor. This is where I think some extra specialized work needs to be done: hardware input devices. If we can get Linux to support as many input devices as possible, and combine that with very specific task focused distros (or a distro with "task plug-ins"), we could gain more acceptance in specialized fields.
The third factor is how much power to actually give the user. As we've all seen with the various W.I.M.P. interfaces out there, having more than one way to do something is great, but it gets in the way of user friendliness. I've seen plenty of people get EXTREMELY confused by seeing that they could minimize a window by clicking on the _ widget OR by left clicking on the application's window menu on the left side and selecting "Minimize", or by right clicking on the application's listing in the task menu and right clicking to select "Minimize", or... you get the picture. While it's nice to have all those options (especially as the user becomes more adept), it's likely to confuse the user. I still wonder why no one has taken notice of Nautilus' old (weak, but clueful) approach of having different modes: Beginner, Intermediate, Advanced. Someone needs to sit down and figure out what the easiest GUI thing for most users to do is and pick that ONE approach for a function. Then all of those simple approaches would become the "Beginner" settings. The "Intermediate" settings could incorporate other GUI based approaches that are less commonly used but might be preferred by a more intermediate user. And the KB shortcuts (there should be one for every function in the GUI) are left to the "Advanced" user mode.
Instead of completely removing features to try and avoid confusing the user, the features should be categorized throughout all apps and the OS environment into categories of some kind to limit what a beginning user is exposed to. Some people will never break past that, and that is fine. Others will want to explore and learn more. Either way... the real goal needs to be more humanization of the UIs, and more machination of the humans."
I suggest that the view that should be taken of this project is that the average user (especially in businesses) shouldn't need to know what filetype their data is or where it is kept on the storage system. They should be able to search through their data by the attributes that this project has created. The system should really do the "file management" for them behind the scenes. This is not something that the uber-user needs, but "Joe User" would probably find an OS that does these things a LOT more attractive than others...
Un-news
the main 'innovation' of the internet is to have everything in the world cataloged in a gigantic HFS (the url system). this is nothing new, though, libraries have been doing something like an HFS for at least a century. they dont call it 'library and information science' for nothing.
now if you want a 'better way to search it' then make a database layer that can find all that crap automagically. i heard beos did something like this but since the bloody sod wouldnt install i never got to find out.
The problem with a HFS is just that paths are usually not commutative. That is, I might save some .ogg file as 'Albums/Indie/Slut/Blow Up.ogg', but I can not easily simultaneously access it as 'Indie/Albums/Slut/Blow Up.ogg'.
Why would I want that? The first path is useful when I want to browse all full albums. The second is useful when I'm looking for all Indie songs, including those which belong to collections of full albums.
Another path would be simply '/Slut/Hope.ogg', for getting all songs by Slut, including those not belonging to a full album. I would want to find 'Blow Up.ogg' in that same directory as well.
A nicely sorted ogg/mp3 collection is just one example. There are lots of documents on my computer that I could sort more nicely based on attributes instead of a plain hierarchy.
I'm not saying that newdocms meets my requirements, as I haven't tried it yet. My vision is some way of storing files along with attributes and using paths as a kind of query on attributes or keywords. Something to the effect of '/this/that/file' matching attributes/keywords 'this' and 'that', or maybe advanced queries allowing not only and-type queries, but more general boolean or regex operators.
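To make the "paths as queries" idea concrete, here is a minimal sketch in Python. It is not how newdocms works; the tag sets, the `FILES` table and the `query` function are all made up for illustration. Every path component is treated as a required keyword, so the order of the components stops mattering.

```python
# Hypothetical in-memory index: file name -> set of attributes/keywords.
# The tags below are invented to mirror the example paths in the comment.
FILES = {
    "Blow Up.ogg": {"Albums", "Indie", "Slut"},
    "Hope.ogg": {"Indie", "Slut"},
    "notes.txt": {"Documents"},
}


def query(path, files=FILES):
    """Treat each path component as a required tag (an AND query).

    Component order is ignored, so 'Albums/Indie' and 'Indie/Albums'
    select the same files.
    """
    wanted = {part for part in path.split("/") if part}
    return sorted(name for name, tags in files.items() if wanted <= tags)


print(query("/Albums/Indie/Slut"))
print(query("/Indie/Albums/Slut"))
print(query("/Slut"))
```

The first two queries return the same single file, and the last one returns both songs. OR-type or regex queries would need a richer query syntax than a plain path.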
Of course. Your computer is not going to clean up after you. And I don't want it to. But Manuel's attempt at least seems to be trying to give me a more powerful tool to sort my files. Maybe people don't want to be organized, but some might like more advanced features. A hierarchical system works pretty well, but nobody said it's the ultimate perfection.
Congrats to Manuel Arriaga!
As information becomes more complex and we collectively become computerized, we will need to better manage and relate data items and aggregations (files for short:-).
Funny to think that in a few generations people will look back to the concept of using files as knowing too much about how the computer works and stores data much like we think about having to load a tape or disk device a few decades ago. Also makes you realize the possibility and power of changing the core OS to better match how it will be used in the future. Of course, like Object databases, we'll have to wait a few generations for the dead wood to decay out of our companies before new ideas are used.
Most of this functionality has been available in mainstream OSes for a long time. For example, the "Find" dialog in MacOS allows me to search by file name, creation date, modification date, file type, size, extension and even content (or any combination of the above). So it's very easy for me to find all the text documents created in the last week and search them for the word "Grandma."
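For comparison, the kind of combined search described above can be approximated on any plain HFS with a short script. This is only a sketch: `find_files` and its parameters are hypothetical names, and it uses modification time as a stand-in for creation time, which is not available portably.

```python
import os
import tempfile
import time


def find_files(root, ext=None, max_age_days=None, containing=None):
    """Return files under root that satisfy every criterion that is given."""
    cutoff = None if max_age_days is None else time.time() - max_age_days * 86400
    hits = []
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            full = os.path.join(dirpath, name)
            if ext is not None and not name.lower().endswith(ext):
                continue
            # Modification time stands in for creation time here.
            if cutoff is not None and os.path.getmtime(full) < cutoff:
                continue
            if containing is not None:
                with open(full, encoding="utf-8", errors="ignore") as fh:
                    if containing not in fh.read():
                        continue
            hits.append(full)
    return sorted(hits)


# Tiny demo in a throwaway directory with made-up files.
with tempfile.TemporaryDirectory() as root:
    samples = [("letter.txt", "Dear Grandma"), ("todo.txt", "buy milk"), ("pic.jpg", "Grandma")]
    for name, text in samples:
        with open(os.path.join(root, name), "w", encoding="utf-8") as fh:
            fh.write(text)
    recent_grandma = [os.path.basename(p) for p in find_files(root, ".txt", 7, "Grandma")]

print(recent_grandma)
```

Note that this walks and reads every file on each search, which is exactly the cost an attribute database is meant to avoid.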
This is Google for your Harddrive. Why do people keep complaining about this? The idea is AWESOME.
Don't convert MP3 to Ogg.
You might be interested in checking out System Restore on Windows XP. It creates periodic system restore points, or you can create one any time you anticipate problems, but it doesn't affect user files.
For one thing, HFS makes document security simple. By storing in directory X, you limit use of the document to those with various levels in User Group X.
For the home user/single PC, it's GIGO -- no matter the file system, whether HFS or metadata, the user has to recall it. Usually when looking for those 2-y.o. records, the user will give up and do a full content search. No great loss in productivity for the simple home user, who doesn't have that much data to organize in the first place.
For corporations with networks and immense document structures (where metadata comes in handy), there are already dozens of software packages/servers that allow indexing by metadata -- like Centra2000 (now Konfig), or *gag* Sharepoint Portal Server, or Documentum. The admin stores documents in an HFS (for determining security/accessibility), but the users find the docs using metadata, indexing, or links without having to worry about the OS Directory location. Very reliable, easy for users to understand.
In the end, the problem is solved for business, and for home users, the problem is the home user, not the amount of data or structure of the FS.
you are all jackasses.
cowboy neal is stupid.
So that would work for pictures...
Now what about some movies of the family? How would they be arranged? What about a short story that the kid wrote for school?
Hello McFly!! People aren't just storing documents! Music, pictures, movies, email, and so on, all need to be stored. Making a hack for one type of format doesn't help for the 15 other types.
So forget grep. Forget find. HFS isn't cutting it.
No, I don't trust in god. He'll have to pay up front, like everybody else.
Quoting the article...
...write an OS from scratch or perhaps license code from Microsoft or Apple...
Obviously not a true geek then. Why not talk of licensing code from Sun or IBM?
Follow me
and shouldn't it have been in the release of Longhorn? I don't see any difference between what he is doing and what Microsoft has already done, except that Microsoft will use MSSQL to drive the system. This is great for locating files (searching).
The critique that open source is not that innovative is, in my opinion, somewhat right if we look at the final product (not how it's made).
I know it's not a new idea but I've never seen any real implementation either. This is at least an attempt at something new and not seen in any other major desktop OS.
still reading?
I don't mean to be a jerk -- I think document metadata is great when you have tons of documents -- but this isn't "radically new". The concept of indexing documents by attributes is the foundation of any document management system. Take a look at OnBase, 1mage, and Laserfiche. Of course, those are all commercial, proprietary products, so if you're dedicated to Free software, they won't do.
Despite that, I think this is a cool project. The direct system integration is neat. I just completely reorganized my Documents folder, but I'm still not satisfied and I feel constrained by the file system. This is the kind of thing I need to help keep my stuff in order.
irb(main):001:0>
Not many... I'm working now for a government office, trying to convert plans to PDF. They come in bits and pieces and lots of the original material is lost. Most of the time I'll end up scanning in whatever I can't find. I wish I could do a SQL query like "SELECT * FROM PLANS WHERE NO = '9985' AND STATUS = 'FINAL'" and magically have everything I need, but like hell that's ever going to happen. When they can't even find a very specific file using the search function, there's no way in hell they'll add sufficient metadata.
Kjella
Live today, because you never know what tomorrow brings
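For what it's worth, the query quoted in the comment above is cheap to support once the metadata exists; the hard part, as the poster says, is getting people to enter it. A minimal sketch with Python's built-in sqlite3 module follows. The table layout, the `plan_no` column name (used instead of NO to stay clear of SQL keywords) and the sample rows are all assumptions.

```python
import sqlite3

# In-memory database with a hypothetical schema and made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE plans (plan_no TEXT, status TEXT, path TEXT)")
conn.executemany(
    "INSERT INTO plans VALUES (?, ?, ?)",
    [
        ("9985", "FINAL", "scans/9985-final.pdf"),
        ("9985", "DRAFT", "scans/9985-draft.pdf"),
        ("1204", "FINAL", "scans/1204-final.pdf"),
    ],
)


def find_plans(conn, plan_no, status):
    """Return the stored paths of all plans matching number and status."""
    rows = conn.execute(
        "SELECT path FROM plans WHERE plan_no = ? AND status = ? ORDER BY path",
        (plan_no, status),
    )
    return [row[0] for row in rows]


print(find_plans(conn, "9985", "FINAL"))
```

The placeholders (`?`) keep user-typed values out of the SQL text itself.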
I have also been kicking around ways to toss the trees:
http://geocities.com/tablizer/sets1.htm
I don't give any implementation details (other than showing some relational schemas), but I talk about possible features and interfaces. Perhaps such a thing can be built on top of an existing RDBMS rather than started from scratch.
Table-ized A.I.
I agree that people are brainwashed to think plain text is evil... XML is an improvement, since it is typically human readable, but if it is a smallish application with a reasonably small search space, then plain text, or even XML is much better than a binary DB.
However, I think something on the scale of a filesystem needs something more efficient, both in terms of storage space efficiency and performance. A plain text fs index would be large and slow to query compared with an optimized database format. That does not mean, of course, that the data has to have special tools to access it. Quite the contrary, when implemented at the OS level, textual representations can be generated rather quickly on the fly for editing if you so choose. I seem to recall reading up on some of the ideas in the next-gen Reiser filesystem. Their example was /etc/passwd as a directory, able to cd in and see the data listed in different ways simply by going to different directories and doing an ls. Individual entries could be opened in a text editor, or vi /etc/passwd and you get the view you are accustomed to. It's all the same to the filesystem, no matter how you do it. If the lowlevel filesystem drivers are aware of the workings of the database, they can and should implement transparent text editing facilities (and structures convenient to shells). The MS registry is a train wreck for many reasons. One of those is they never intended on heavy end-user direct modification, so it is ugly and not well integrated in terms of access. It *could* be as easy as editing a text file when implemented at a low enough level, it's just that people don't.
Of course I don't agree with the goal of this article either. Too obtrusive to the user. I would prefer something along the lines of 'locate' with more sophisticated and up-to-date indexing methods and more comprehensive attribute recording. Take the many many hints already available rather than asking a user to provide more. Would have to be implemented on a lower level to do it right, but a small price to pay to avoid using find when up-to-the-minute info is needed.
XML is like violence. If it doesn't solve the problem, use more.
Isn't that the entire point of AlienBrain? Source control for artists. Because otherwise they might be tempted to call files "Final version of nysetex32 more recent final32.max" or something like that and then copy that into "Backups" and "new" directories :-)
Of course AlienBrain ain't cheap (and I haven't used it myself), but they've obviously seen a need...
We overlook those assholes to save money on antidepressants.
And yet, what is the filesystem itself but a very highly structured database anyway?
Look at the direction filesystems have been going. First we had simple filesystems with no directories, that just stored names and content. Then we had more complex filesystems that store directories, names, content, and specific attributes (owner, group, permissions). Then we had even more complex filesystems that stored arbitrary ACLs and metadata (NTFS). Alongside that development we have journalled filesystems that use (horrors!) transaction logs.
Since the filesystem is evolving towards being a full-fledged, transaction-capable database anyway, why not leap over the stuff in the middle and go straight for the gold?
Use 'slashdot stuff' in the subject line in any email you send me if you want to get past the spam filter.
There are dozens of organizers for these types of files, all simple indexes. I can metatag any type of file in any OS using a simple indexing system. Simple indexing doesn't require integration of the index into the OS.
OK, my caveat is that this is the sort of product I won't use -- I find HFSs fit my needs pretty well -- but I see this as good and important anyway for competitive reasons:
This is the same sort of result that Microsoft is trying to achieve in Longhorn, I believe. And this just means that we are beating them and coming out with their intended features *first.*
So, I am all for this because if people oooh and ahhh over Longhorn, we will have the alternative to offer them.
LedgerSMB: Open source Accounting/ERP
I help manage a hell of a lot more machines than that, buddy, and in my experience, which is in no way conclusive but is a damn sight more than yours, it's got some serious problems.
Last time we had to reinstall win2k on a machine was over a year ago. I wish I could remember the details of what went wrong with it. If I remember correctly, it had something to do with very extensive filesystem corruption.
Sticking feathers up your butt does not make you a chicken - Tyler Durden
I've been pondering this idea for a while.
:)
I've spoken about it in many IRC channels.
Now reading this Slashdot article, it seems that my idea is finally implemented, and even the description is put in almost the exact same words I used to describe the idea.
I wonder if I inspired someone...
Even if they don't, as someone else suggested, always fill in the fields with the same bogus data, they'll figure out a way to create an empty document, save that with bogus data, then always open the document and save it under a new name or with as little extra typing as possible.
:-)
And then, they'll demand that YOU find it. The only way to solve these problems is to make the computer smarter than the user. Shouldn't be hard, but it is.
* And remember, it's spelled N-e-t-s-c-a-p-e, but it's pronounced "Mozilla."
load webpage. control-F. find: "scale"
0 Results Found
Not promising... if you pound on this thing, will it start taking forever?
I think a good metainformation system is desperately needed, though. I was thinking an XML structure with agreed-upon tags at the head of every file that the OS would intercept before it passed the file along to consumer applications.
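A rough sketch of what such a header could look like, using Python's standard xml.etree.ElementTree. Everything about the format here is an assumption made for illustration: the `<meta>` element, the tag names, and the convention that the header ends at the first `</meta>` followed by a newline. `split_header` plays the role of the OS layer that peels the header off before a consumer application sees the payload.

```python
import xml.etree.ElementTree as ET

END = b"</meta>\n"  # assumed terminator of the metadata header


def add_header(payload: bytes, **tags) -> bytes:
    """Prepend a <meta> block with one child element per tag."""
    root = ET.Element("meta")
    for key, value in tags.items():
        ET.SubElement(root, key).text = value
    # short_empty_elements=False keeps the closing tag even with no children.
    header = ET.tostring(root, short_empty_elements=False)
    return header + b"\n" + payload


def split_header(data: bytes):
    """Return (tags, payload); files without a header pass through untouched."""
    if not data.startswith(b"<meta>"):
        return {}, data
    end = data.index(END) + len(END)
    root = ET.fromstring(data[:end])
    return {child.tag: child.text for child in root}, data[end:]


stored = add_header(b"\xff\xd8 fake jpeg bytes", author="kid", title="Short story")
tags, body = split_header(stored)
print(tags, body)
```

A real design would also have to deal with applications that write files without going through the intercepting layer, which this sketch ignores.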
So I need a different index for every type of file! I don't want an indexer for jpgs, a separate indexer for movies, another one for documents. What if I want to look for a common characteristic across several different types of documents? So I need to search through several little index programs. Not convenient.
Don't get me wrong - I still want to keep the HFS. But, I think an overlay with metadata would be significantly more effective.
Organizing a 120Gig drive is a chore. You have documents, music, pictures, movies, backups, notes, configuration files, and so on. HFS worked for 2Gig drives, but they are limited.
As for integration into the OS, lots of things are integrated into the OS now. Let me mention windows - thumbnails of pictures, web browsing, help functionality. KDE has sampling of audio files too. Just make an option to disable it, if you like it, use it. No arm-twisting either way.
This subject is a love-it/hate-it, that's for sure.
No, I don't trust in god. He'll have to pay up front, like everybody else.
A company I worked for in the mid-nineties wrote an ODMA integration module for AutoCAD which required that the user complete the title block of the drawing before they could save the file. The pertinent attributes were extracted from the drawing and passed on to the document management system (DMS).
With most DMSs, the file to be saved is full-text indexed as well (often this work is done as a background task during slower periods) so that you can locate a document with fuzzy searches, even if you do not know what attributes were used to store it.
Novell Groupwise includes an ODMA compliant DMS which also includes viewer modules for many common file formats, and with the web interface can allow a user on the road to search an entire library, view the results via a web browser, and download or checkout the desired documents.
It would be wonderful if someone could come up with a standards based way to provide similar functionality in a Free Software based DMS. I know of a few companies who cannot switch away from Windows/Novell because of the need for a robust DMS, and the clients to integrate with it. This is especially important in fields like Medicine and Legal, where large numbers of documents are generated on a regular basis.
All opinions expressed are mine, if you want them it'll cost you.
No, not a separate index for each file type. Doesn't matter what the file type is. It only takes a very simple db to set up a metatag index.
With that said, I do have one caveat about hierarchical systems that isn't typically addressed by such systems - the problem when you need to have a copy of the same thing reside in two (or more) areas.
I typically run into this with bookmarked links - say I have a bookmark for "3D Computer Graphics Programming" - well, it would be nice if it could be referenced in multiple areas (each of those words could be a topic, for instance), without needing a copy in each area.
*nix tends to solve this (in the "standard" hierarchical filesystems typically used) through symbolic linking, which isn't a bad thing, but maintenance can be high (of course, *nix and shell scripting can come to the rescue here). Also, sometimes it would be nice if I could just search on some "keywords" and find the links/files I need based on a description (many times I find myself googling and bookmarking a link I already have - it would be nice to google and get the links I already have in my bookmarks first, then the links from an external search engine - I want to keep my links instead of only using google, because sometimes the link "goes away", but there is enough info in it and the URL to track down the site again, if it has moved or whatnot)...
Metadata filesystems try to bridge these areas, but they suffer from the issue of the user needing to enter in data about the file (especially if it is a sound file - ie, MP3 ID tags - or image file, like a jpeg - and IIRC, there are tags for jpegs as well - but are hardly ever filled in) as they save it.
Perhaps what is needed is more of a combo - keep storing stuff in a hierarchical filesystem, plus store a pointer to that file in a searchable database, populating the fields with the metadata from the file itself (if it has such fields, like jpegs, MP3s, etc - if not, ask for some generic data to be filled in). Finally, you would need some front-end "search" tools for the metadata representations, as well as some maintenance tools or something to keep the links/pointers in the DB up-to-date with the location of the file.
Something tells me that a simple version of such a system could be whipped up with a few Perl scripts, MySQL, and a cron job (maintenance)...
Reason is the Path to God - Anon
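The "combo" proposed in the comment above (keep the HFS, store pointers in a searchable database, run periodic maintenance) can be sketched in a few dozen lines. The poster suggests Perl, MySQL and cron; this sketch swaps in Python and SQLite purely to stay self-contained. The schema and the function names are assumptions, and the keywords are typed in by hand rather than read from ID3 or JPEG tags.

```python
import os
import sqlite3
import tempfile


def open_index(db_path=":memory:"):
    """Open (or create) the pointer database."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        "path TEXT PRIMARY KEY, ext TEXT, size INTEGER, keywords TEXT)"
    )
    return conn


def index_tree(conn, root, keywords=""):
    """Store a pointer to every file under root. Re-running overwrites old rows."""
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            full = os.path.join(dirpath, name)
            ext = os.path.splitext(name)[1].lower()
            conn.execute(
                "INSERT OR REPLACE INTO files VALUES (?, ?, ?, ?)",
                (full, ext, os.path.getsize(full), keywords),
            )
    conn.commit()


def prune_missing(conn):
    """Maintenance pass (the cron job): drop pointers to files that are gone."""
    paths = [p for (p,) in conn.execute("SELECT path FROM files")]
    stale = [p for p in paths if not os.path.exists(p)]
    conn.executemany("DELETE FROM files WHERE path = ?", [(p,) for p in stale])
    conn.commit()
    return stale


def search(conn, word):
    """Front-end search: match the word against the path or the keywords."""
    like = f"%{word}%"
    rows = conn.execute(
        "SELECT path FROM files WHERE path LIKE ? OR keywords LIKE ? ORDER BY path",
        (like, like),
    )
    return [r[0] for r in rows]


# Demo in a throwaway directory with one made-up file.
with tempfile.TemporaryDirectory() as root:
    song = os.path.join(root, "blow_up.ogg")
    with open(song, "w") as fh:
        fh.write("placeholder")
    conn = open_index()
    index_tree(conn, root, keywords="indie album")
    found = search(conn, "indie")
    os.remove(song)
    pruned = prune_missing(conn)
    remaining = search(conn, "indie")

print(found, pruned, remaining)
```

The file stays wherever the user put it in the hierarchy; only the pointer lives in the database, so a mangled index costs search convenience, not the data itself.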
"This is a testament to the power of free software: this sort of innovation could never happen if it weren't for the free software nature of the underlying systems." Umm, this is a very old idea.
Sometimes I have trouble naming things because there are so many choices. Now if I could just throw the choices into an attribute list, that might be easier. If his way of doing things is slightly faster than with directories and symlinks, then something has been gained, and it appears that it is.
What an idiot you are.
Here's a clue: if his code is done right, he should be able to rip out the .db-based metadata engine and replace it with hooks into MetaWhateverFS when it's ready. Until then, it's kinda hard to use a capability that doesn't exist yet, right?
Metadata has always been a chicken-and-egg problem: fs metadata capability is useless without apps, and app support for metadata using non-fs databases gets morons like you screaming for the head of whoever committed the AWFUL CRIME of THINKING DIFFERENTLY!!
Fortunately for the rest of us, there have been a few free thinkers with either the thick skin or the general recognition of superiority to ignore people like you.
I do agree about people being lazy, but only when they store files from the Internet. I think they aren't lazy when they create files, so sharing files with metadata would be a quick and clean way to organize personal storage. I have an idea that is not really cute but simple: XML and a browser/P2P plugin.
Is that on a word or doubleword boundary? I'm afraid that I'm having trouble understanding your argument, even given that you probably mean structure their thoughts. The proper tools can be an aid to structuring your thoughts, and just because a person can't structure their thoughts given one set of tools doesn't mean that when given another set of tools they still can't.
Taking end users that can't handle directories and subjecting them to some sort of SQL syntax is simply absurd.
On the contrary, teaching them some sort of SQL syntax may be exactly what they need to wrap their brains around the problem.
As others have said: if they weren't meaningfully naming files and directories before, they're not going to create useful metadata under this new system.
File names and directories present a completely different set of challenges to generating meaningful names than those that exist under the new system. I don't see what evidence is being used to come to the conclusion that everyone who wasn't able to give meaningful names to files and directories under a hierarchical file system won't create useful metadata under the new system.
I realize that this comment will most likely be lost in the 600+ already made by the time I found it. But I did a search for "PDA" in the comments and found nothing. Am I the ONLY person who noticed that this is what PDAs do already? My Zaurus, for example, has user-definable categories for everything. They can apply to anything from Todo entries, calendar events, contact information, even documents! And the interesting thing is that the documents still have their normal HFS name/path in the Linux file system. So the PDA implemented this thing on top of the normal file structure. That's what this new system needs. Instead of saving all files in one directory as {number}.{extension}, let people save it as whatever they want wherever they want it. And then store that path in the database instead of the number/extension. That doesn't seem very difficult to me. I guess if you got OS specific you could even store an inode number or something. But this thing has basically already been implemented pretty well on PDAs. It IS cool and useful. I've always wanted to see this kind of system on a normal computer. My only complaint is the lack of hierarchical categories (e.g., "Math302" should imply "School") on my PDA. But that seems to be something this guy has improved upon. He just needs to look at the Zaurus and see how they did it on top of the normal HFS system.
if(!cool) exit(-1);
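The "Math302 should imply School" idea from the comment above can be sketched as a child-to-parent table that is walked at query time. This is only an illustration: the `PARENT` table, the sample paths and the function names are made up, and it says nothing about how the Zaurus or newdocms actually store categories.

```python
# Hypothetical category tree, child -> parent. Assumed to contain no cycles.
PARENT = {"Math302": "School", "Physics101": "School"}

def expand(category):
    """Return the category followed by every ancestor it implies."""
    chain = []
    while category is not None:
        chain.append(category)
        category = PARENT.get(category)
    return chain

def files_in(category, index):
    """Paths tagged with the category itself or with any of its descendants.

    `index` maps an ordinary file path to the categories assigned directly to it,
    so the file keeps its normal name and location on disk.
    """
    return sorted(
        path
        for path, assigned in index.items()
        if any(category in expand(c) for c in assigned)
    )
```

With `index = {"/home/me/hw1.tex": ["Math302"]}`, asking for `"School"` finds the homework file even though it was only ever tagged `"Math302"`.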
Ok, that worked for that company's files, but how do I manage the 100,000+ files just on my laptop (and no, it's not porn. Some of us have real lives and real data!)? Moore's law also applies to the amount of files collected by users on their hard drives, but we are rapidly reaching the limit of files we can manage (i.e. navigate, not store) with the traditional file systems.
This is a problem of sufficient complexity that it is probably beyond any single individual to solve. The existing hierarchical system is flexible and simple to use. It ain't broke, but it is limited in its ability to support the management of a large amount of disparate but loosely linked information. However, getting a taxonomical attribute classification file system into the mainstream will require a quantum leap, as there are many problems to be solved to achieve the same simplicity as our existing solution.
A good attribute based file system is real hard:
They can be used for different purposes and users
Attributes should be hierarchical (classifying), e.g.: Operating System::\bin\bash, or Operating System::\system32\drivers\bin
Note that while it is possible to have strong taxonomies that are non-hierarchical through use of multiple attributes, the hierarchical structure gives a richer and more precise language for the taxonomy. Also, simple attributes like "blue" are fairly meaningless - are we talking about "emotion\blue" or "color\blue"?
Need to have multiple attributes to achieve a full classification. E.g. Operating System::~\My Documents\, Applications::\Word\, User Classifications::~\work\\projects\
Classification data required should be minimal in the beginning
Need to classify files enough to maintain "sensible" uniqueness. Timestamps can differentiate between two files for uniqueness (e.g. financial reports#1999 Vs financial reports#2000), but more meaningful classifications can be found (e.g. financial reports#My tax Vs financial reports#Enron Corp). (Especially as file system timestamps often don't match the original time relevant to the content).
Classifications for existing content should evolve over time, i.e. more precise classification data should be added to old content as new content is added.
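A rough sketch of how the "Dimension::\path" attributes listed above could be matched, so that "color\blue" and "emotion\blue" stay distinct and a query for a parent node also finds everything classified below it. The `Tags` dimension and the function names are invented for the example; only the `::` and backslash notation comes from the list.

```python
def parse(attr):
    """Split 'Dimension::\\a\\b' into ('Dimension', ('a', 'b')).

    Empty path parts (leading, trailing or doubled backslashes) are dropped.
    """
    dimension, _, path = attr.partition("::")
    return dimension, tuple(part for part in path.split("\\") if part)

def matches(file_attrs, query):
    """True if some attribute of the file lies at or below the queried node."""
    q_dim, q_path = parse(query)
    for attr in file_attrs:
        dim, path = parse(attr)
        if dim == q_dim and path[:len(q_path)] == q_path:
            return True
    return False

# One file classified in two independent dimensions.
attrs = [r"Tags::color\blue", r"User Classifications::~\work\projects"]
```

Here `matches(attrs, r"Tags::color")` holds while `matches(attrs, r"Tags::emotion\blue")` does not, which is the disambiguation the bare attribute "blue" cannot give.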
We need multiple layers to support such a file system:
Underlying file system, APIs etc
Applications that are taxonomically aware
GUI and command line based tools of equal capabilities
Tools for manually classifying content
Navigation tools that provide rapid, intuitive navigation of multiple dimensions
Users trained to understand the file system metaphors & mechanics
And then we need:
Intelligence tools to automatically suggest classification attributes based on the content and the system's learnt understanding of the user.
We need to know when to stop! Hey, we could build a full AI system to manage contextual relationships and understand content, but let's get the minimal set of features required to make the use of the system compelling.
SQL is not the answer here...
The metadata management risks and issues are similar to other file systems. Remember DOS compressed drives? Eventually MS achieved reliable file system compression in NTFS. Yes, this is complex and risky until the bugs are sorted out. No, I don't want to run a production system on an unstable filesystem.
So good luck to Manuel. This will be an area of much activity.
In addition to the (ample) feedback provided by others here:
1. I think that the save dialog should be reconsidered. "Traditional" is not an obvious label for the concept of saving a "regular file". Perhaps a multi-tab with "File" and "Database" as options would be less confusing for Joe User.
2. It is very important for such a system to also work at the filesystem level. Actually, for the guru users around here, it is key to acceptability! So, if I'm using bash tab-completion, why can't I do something like:
/Data/Year:2002/Month:December/Type:Email/From:foo@bar.com
etc, browse my files according to a virtual hierarchy based on various metadata?
3. Backups. I very much like the idea of doing:
tar cvzf 2002December.tgz /Data/Year:2002/Month:December/*
:-) ?
Similarly for delete!
4. External data. Is this system for only user data? What about files on an Intranet shared filesystem, or on Internet websites. Can I use this for Bookmarks out into the network? Filed along with relevant documents of my own?
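As a sketch of point 2, a virtual path such as /Data/Year:2002/Month:December can be read as a set of attribute filters and resolved against the metadata records. The record layout (a dict per document with a "path" key) and the function names are assumptions; a real implementation would have to sit behind the filesystem interface for bash completion and tar to see it.

```python
def parse_virtual_path(vpath):
    """Turn '/Data/Year:2002/Month:December' into {'Year': '2002', 'Month': 'December'}.

    Segments without a colon (such as 'Data' or a trailing '*') are ignored.
    """
    filters = {}
    for segment in vpath.strip("/").split("/"):
        if ":" in segment:
            key, value = segment.split(":", 1)
            filters[key] = value
    return filters

def list_virtual_dir(vpath, records):
    """Return the real paths of all records matching every filter in the virtual path."""
    filters = parse_virtual_path(vpath)
    return [
        rec["path"]
        for rec in records
        if all(rec.get(key) == value for key, value in filters.items())
    ]

# Made-up sample records standing in for the metadata database.
records = [
    {"path": "mail/0001.eml", "Year": "2002", "Month": "December",
     "Type": "Email", "From": "foo@bar.com"},
    {"path": "docs/report.tex", "Year": "2002", "Month": "December", "Type": "LaTeX"},
    {"path": "mail/0002.eml", "Year": "2003", "Month": "January",
     "Type": "Email", "From": "foo@bar.com"},
]
```

The backup in point 3 would then amount to handing the result of `list_virtual_dir("/Data/Year:2002/Month:December/*", records)` to tar.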
All in all a good direction to take things, but I feel that it will not gain critical mass if only _some_ tools can talk to the database, hence the need for system-level integration. How will the system cope with LaTeX, ps2pdf, etc.?
Well, a lot of it *is* the interface... It helps me remember things that I wouldn't normally remember on my trips into the hierarchies on the various computers. It's truly *one* place to find things... I could do the same thing with shortcuts to different things, but not only do I point to documents, but also to email, web sites, etc. And they're organized by the way that I think, so it doesn't matter if it's a text document, email, web site, application, etc. It's all connected. Also, I get to jot down notes about all kinds of things. No need for multiple text files, or one big text file.
One example:
I run a brick & mortar shop where I'm constantly buying new products. I get requests, I find cool things, and I have to keep track of them. I have a few thoughts that are "immediate", "longer term", and "like to have these if I can find 'em". Sometimes they're requests, so I jot a quick note. Sometimes they're in emails, so I'll link the email. Sometimes they're websites, so I'll link to those. And, I also have links from products, to say, ways I'd like to arrange stuff in the store. I need to get product X. Also, I'd like to re-arrange those shelves like that. And, oh yeah, I also have a bill to pay to vendor Y. And I also have to buy product Z next time I order from vendor Y. And here's the fax number for vendor Y. Oh, and Vendor Y has a custom app to order via EDI. Here's a link to that app.
It's all connected in ways that I think to keep it centralized and to jog my memory when I've got hundreds of different things going on. The only drawback is training myself to use it, like any other system. But when I do, I find myself forgetting less, and working much more efficiently.
Well, I'm certainly lazy about organizing my stuff. I don't care about having things organized on paper, I care about understanding things in my head. I would love to have a computer keep track of things that I needed to know, as long as it learned to follow what I'm thinking. Obviously we're a long way from that, but this here is a small but very useful evolutionary step. No doubt people will want to continue to categorize and hierarchically represent things, drawing on the strengths of HFS. But they will also learn to use search and metadata based filesystems to organize things in other ways. And once they get a chance to do this, we can use what we learn to take more steps towards being lazy. And then slovenly victory will be ours!
Ah, yes all those millions of Google users who don't really understand metadata sure are having a hard time searching for things...
I use the Perl script find-documents,
http://www.byyt.com/find-documents
which returns a phrase's context,
1. hierarchical filesystem location, including sub-directory and filename
2. outline-context within that filename, including repeated character lines like
###ANALYSIS##################
and standard grammar school outline entries like
B. Second Point
3) third entry
i) first entry with search phrase somewhere
This find-documents script highlights the first instance
of a relevant file in green, and outlines found phrases in red.
Every outline-context line is also displayed; for example,
"find-documents acroread" replies,
./Linux/commands.readme: ~~~PDF Files:~~~~~~~~~~~~~~~~
./Linux/commands.readme: 13. To read Adobe (Acrobat) PDF files,
./Linux/commands.readme: acroread #from Adobe itself
...
postscript.readme: 3. pdf
postscript.readme: a. acroread 4.05 seems to work well.
postscript.readme: e. xv succeeded,
postscript.readme: However, acroread seems better.
Here, the search word "acroread" always gets highlighted red;
on their first instances, the search filenames
"commands.readme" and "postscript.readme" get highlighted green;
and the file-contents after each filename and ":" get left-aligned more neatly.
If I have no outline organisation in my files,
this find-documents still works,
though the only returned context becomes the file structure.
This find-documents script ignores backup files
like *.old2, *.10-12-2002.
This script acts rather like "grep",
but returns more lines (outline-context) from a file and highlights words.
I chose this approach as a way to keep and access my notes,
yet allowing other Linux tools to search my notes.
I have used this script for 4 years.
It searches my ascii notes well, but probably wouldn't work well
for indexing mp3 and Microsoft Word files.
If you use this find-documents script,
the option "--help" gives options,
and the script's first 150 lines document itself.
For example, "-r" recursively searches files in subdirectories.
Jameson C. Burt, jameson@coost.com
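For readers who want a feel for the outline-context idea described above, here is a small sketch in Python. It is not the find-documents script itself: it assumes outline levels are expressed by indentation, and it leaves out the filename prefix, the colour highlighting, the handling of repeated-character lines and the backup-file filter. The function name `outline_context` is made up.

```python
def outline_context(lines, phrase):
    """Return each line containing `phrase`, preceded by the less-indented
    lines above it (its outline ancestors). Each line is emitted at most once."""
    stack = []       # (indent, index, text): current chain of enclosing entries
    emitted = set()  # indices already placed in the output
    out = []
    needle = phrase.lower()
    for i, raw in enumerate(lines):
        text = raw.rstrip("\n")
        if not text.strip():
            continue  # blank lines carry no outline information
        indent = len(text) - len(text.lstrip())
        # Entries at the same or deeper level are no longer ancestors.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        if needle in text.lower():
            for _, j, ancestor in stack:
                if j not in emitted:
                    emitted.add(j)
                    out.append(ancestor)
            emitted.add(i)
            out.append(text)
        stack.append((indent, i, text))
    return out
```

Given notes indented like an outline, a search for "acroread" returns the matching lines together with the headings they sit under, much like the postscript.readme reply quoted above, while unrelated sections are skipped.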
Hmmm, I don't know. Which do you use more, Google (search engine) or Open Directory Project (without the search engine into the directory)??? The advantage of a search engine over a directory (hierarchical file system, dmoz, whatever) only increases when the user submits some metadata when saving the file (google has very little metadata to work with).
OK, so why does version 0.1 of some software package warrant a Slashdot posting? Especially a package that claims to be a "true alternative... at the OS level", but in the very next sentence "works with all KDE apps" "through the modification of the KDE shared libraries". How is that at the OS level? While I'm at it, two years of work to collect some metadata and store it in a file named *.db??? I know guys who could do this in an afternoon.
I've noticed. A fair number of them seem to have been reading this thread...and posting to it.
Life sucks, but death doesn't put out at all....
--Thomas J. Kopp
Just so everyone is up to speed here, all filesystems have to store data on a physical disk. Pretty basic stuff, but when you start to think about the ideas suggested here (SQL queries against a raw disk device?? Physical layout by file type??), you need to rethink things a bit. The reason we have used HFS for the last 30 years is that it's very easy and efficient to store data on a disk using the directory/file paradigm at both the physical and UI levels. If someone here is smart enough to figure out a way to do a DB-on-a-disk, and (heaven forbid!) actually implements such a beast, you will forever be my hero.
All on-disk HFSes store metadata, and most (all?) store file metadata right next to the physical file. I know for sure this is the way ext2 and reiserfs work, having gone over the code myself recently. The hooks in the Linux 2.x FS API are designed to be even more flexible than this. If anyone knows of more and better metadata than any current FS implements, I invite you to roll your own and release a module.
In addition, there is no reason why anyone couldn't put together a user-space daemon to go over the on-disk filesystem recursively, say as a cron job, and put together a searchable index of metadata. This index could then be searched (via SQL even) by another user-land process, maybe even a new shell (sqlsh, anyone?). This shell (and the associated replacements for 'ls', 'cd', etc) wouldn't have to present the FS in hierarchical terms.
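The crawl-and-index part of that idea fits in a few lines. The sketch below assumes SQLite as the index and a made-up `files` table holding only what `os.stat` reports; `build_index` is a hypothetical name, and the cron scheduling and any replacement shell are left out.

```python
import os
import sqlite3

def build_index(root, db):
    """Walk `root` recursively and record basic stat metadata for every file."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        " path TEXT PRIMARY KEY, size INTEGER, mtime REAL, ext TEXT)"
    )
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                st = os.stat(full)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            ext = os.path.splitext(name)[1].lower()
            db.execute(
                "INSERT OR REPLACE INTO files VALUES (?, ?, ?, ?)",
                (full, st.st_size, st.st_mtime, ext),
            )
    db.commit()
```

After `build_index("/home/me", db)`, a query such as `SELECT path FROM files WHERE ext = '.txt' AND size > 0` lists files by attribute with no reference to where they sit in the tree.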
In any event, what we've got on disk isn't likely to change any time soon without a radical insight by someone who has experience bit-grovelling, and what we see as users is most likely to go on reflecting what we've got on disk unless someone writes the aforementioned userland tools. I have too many other things I'm working on right now, but maybe someone else will help us all out and start a new project?
This post expresses my opinion, not that of my employer. And yes, IAAL.
... are a better idea: search on Google,
there are many (proto) implementations.
The basic idea is to create dynamic folders
in which files appear only if they meet
semantic criteria.
Bye
A well adjusted person is one who makes the same mistake twice without getting nervous.
>SQL comes much more naturally to me than the find command does.
Then build a 'pseudo-SQL' front end to the 'find' command. What I mean is, a script which inputs SQL syntax and changes it to 'find' syntax behind the scenes.
% find
FIND> select where title like '%pig%' AND '%mud%'
FIND> {numbered files shown one to a line - type number to open in default program for type}
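A very small version of such a front end might look like the sketch below. It departs from the session above in two ways, both assumptions: it matches on the file name (find has no notion of a title), and every condition must be spelled out as name like '...'. Only the % wildcard is translated, and the quoting is naive (a pattern containing " and " would be split). The function name `sql_to_find` is made up; `-iname` is supported by GNU and BSD find.

```python
import re

def sql_to_find(query, root="."):
    """Translate "select where name like '%pig%' and name like '%mud%'"
    into an argument list for find(1). The command is built, not executed."""
    m = re.fullmatch(r"\s*select\s+where\s+(.+?)\s*", query, flags=re.IGNORECASE)
    if not m:
        raise ValueError("expected: select where name like '<pattern>' [and ...]")
    argv = ["find", root]
    for clause in re.split(r"\s+and\s+", m.group(1), flags=re.IGNORECASE):
        c = re.fullmatch(r"name\s+like\s+'([^']*)'", clause.strip(), flags=re.IGNORECASE)
        if not c:
            raise ValueError("unsupported clause: " + clause)
        # SQL's % wildcard becomes the shell glob *; find ANDs its tests by default.
        argv += ["-iname", c.group(1).replace("%", "*")]
    return argv
```

The resulting list can be passed to `subprocess.run` unchanged, which avoids any shell quoting problems with the glob characters.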
I have always wondered why FTP servers do not have a simple keyword find feature which is either restricted to the files a particular logged on user is allowed to see or the works for the admin.
Directory services like Novell and Active Directory do lookup of information by attribute.
;-), man i hate this moderation business.
Looks like what you are doing.
(only that directory services are mainly read only - update is a relatively slow operation. Maybe that's your problem too?)
of course, posting anonymous makes sure nobody reads this
I think that the hierarchical filesystem is the best way to organize file data. Remember that filesystems store a specific type of data. If you have to store something that doesn't fit well into the model, perhaps you need to use a database or something.
Jordan Bettis
Probably because having to FILL OUT the fields was a PITA. I think what the parent post had in mind was more like a series of checkboxes.
Speaking from bitter experience, you wouldn't believe the lengths some users will go to buck "the system". They see even simple checkboxes and drop-down menus as a gross infringement of their god-given right to create a useless mess. The same finger muscles that these people think with to make "New Document 1", "New Document 2" etc. get used to learn a series of rapid spacebar-and-tab clicks to make the bad metadata box go away. The result is all their documents get tagged as a single type. It takes good auditing (and a stern papal decree from management) in a large corporation to keep this to a manageable minimum.
OTOH, in small groups, where everyone can see who does what, it's harder to hide this kind of laziness. Newdocms looks very promising for workgroups.
Dear Mr. Arriaga,
:-)
We represent Microsoft Corporation, the owner of the federally registered trademark and service mark "MS". The mark "MS" is registered with the United States Patent and Trademark Office under registration numbers 1,775,441; 1,540,928; 1,342,353; 1,329,474; 1,318,717; 1,306,997; and 898018, copies of which are attached. We also represent the editors of Ms. Magazine, which is the licensee of the "Ms." trademark.
The federally registered trademark, "MS", is famous, distinctive and unique. Your use of this mark in the "newdocms" name dilutes the distinctiveness of the mark in violation of the federal trademark antidilution statute, 15 U.S.C. § 1125(c) and California's antidilution statute. See, Archdiocese of St. Louis v. Internet Entertainment Group, Inc., 34 F.Supp.2d 1145 (E.D. Mo. 1999); Mattel, Inc. v. Internet Dimensions, Inc., 55 U.S.P.Q.2d 1620 (S.D.N.Y. 2000); Deere & Co. v. MTD Products, Inc., 41 F.3d 39, 43 (2nd Cir. 1994).
Accordingly, in the hope that we may resolve this matter amicably, we request that you immediately cease and desist the use of this name and transfer it to us at once.
Yours Sincerely,
His Infernal Sliminess, Screwtape
Screwtape, Slubgob and Wormwood Attorneys
666 Wilshire Blvd.
Los Angeles, CA 90010
Suppose the software saves everything in a memory-resident database. No filesystem, and no disk. Everything stays in memory.
My Newton MessagePad 2000 computer (what I used before I could afford a real laptop) did this. And I got burned twice, in the same way: I had accidentally deleted the contents of a file and then typed some text. But because the app (called "NewtonWorks") saw those as two changes, and the app had only one level of undo, I had no way to recover the text because Newton applications automatically commit changes permanently. Had the app used an open/save metaphor like a Mac or Windows app, I would have been able to roll back my changes by closing the document and answering no to "Save changes before closing?".
should still be able to have a thumbnail (say about 128 KB) attached as metadata.
Detail nit: Why should a thumbnail image take 128 KB? Most thumbnail images in image management programs I've seen are stored in a resolution close to 160x120 or smaller. At 160x120, a JFIF (.jpg) image saved in GIMP, with quality cranked up to just where the tiling disappears, weighs in at about 8 KB.
Will I retire or break 10K?
In windows there was this blasted Start button
I always thought of Explorer.exe's Start menu from Windows 9x, NT 4, and 2000 as a direct analog of Mac OS 7-9's Apple menu. The "Programs" item was Microsoft's attempt to twist the Win3.1 program manager into a half-clone of the "Recent Applications" feature of Mac OS 7.5.
Will I retire or break 10K?
I've never had to reinstall Windows 2000.
I've had to reinstall Windows 2000 because the power failed during a Windows Update. When I restarted my computer, I got "Login could not load msgina.dll. Please replace this file or reinstall Windows."
Will I retire or break 10K?