How Do You Organize Your Data?
kpellegr asks: "After returning from a well deserved holiday, I was faced with an exploding inbox. While organizing and deleting my mail, I realised I was having trouble classifying each mail into one specific folder. I had the feeling I should be able to link to one email from several folders (e.g. product information should be linked to from the 'vendor' folder, as well as from a specific project folder where this product is used). The more I thought about this, the more I realised that trees (such as the Windows filesystems) are not really ideally suited for organizing data. On UNIX-like filesystems, symbolic links allow the creation of simple graphs for organising data, but I have the feeling data could be organized more efficiently. How does the Slashdot crowd organize their data? How do you manage files, email, contacts, meetings and all the relationships that might exist between them?"
This is exactly the concept behind virtual folders. The idea is that folders, whether in an email program or a filesystem, are actively updated searches. For example, all of your emails could be in one pool, invisible to you. Then each folder would be associated with a rule similar to the email filter rules we use now. If an email matches, it shows up, maybe in multiple folders. Bayesian rules allow for even better classification: if an email is similar enough to several categories, it can show up in all of them.
Spencer Ogden
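A minimal sketch of the virtual-folder idea described above, assuming each folder is nothing more than a named rule evaluated against one shared pool of messages. The `Email` class, the folder names and the example addresses are all hypothetical, made up for illustration; no real mail client's API is implied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Email:
    sender: str
    subject: str
    body: str


# Hypothetical rules: a virtual folder is just a name plus a predicate.
VIRTUAL_FOLDERS = {
    "vendor": lambda m: m.sender.endswith("@vendor.example"),
    "project-x": lambda m: "project x" in (m.subject + " " + m.body).lower(),
}


def folder_contents(pool, name):
    """Re-run the folder's rule over the whole pool (an 'actively updated search')."""
    rule = VIRTUAL_FOLDERS[name]
    return [m for m in pool if rule(m)]


def folders_for(message):
    """List every virtual folder a single message shows up in."""
    return sorted(name for name, rule in VIRTUAL_FOLDERS.items() if rule(message))


# One pool of mail; nothing is ever moved or copied.
pool = [
    Email("sales@vendor.example", "Widget datasheet", "Specs for the widget used in Project X."),
    Email("boss@corp.example", "Project X kickoff", "Meeting on Monday."),
    Email("sales@vendor.example", "Holiday hours", "We are closed next week."),
]
```

Here the datasheet message appears in both "vendor" and "project-x" without being duplicated, which is the case the original question asks about. A Bayesian classifier could replace the hand-written predicates, but that is not shown here.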
A dash of arbitrary directory trees and a pinch of grep.
But seriously, this subject is kind of lacking. The problem I have with organized storage is keeping it organized. I have neither the time nor the will. I need some sort of automagic organization.
The good news is that, while the Windows file system may not support this, if you wait until 2005 (2006, 2007?), this highly demanded feature will be in the next release of Windows -- yes, everyone's favourite Longhorn will turn everything into a database.
Frankly, I don't think turning an OS into a DBMS is the right thing to do, but for certain applications, having this functionality omnipresent will be useful. Well, OK, for this one application, I'm still waiting to see examples of others.
A friend of mine used to use what he termed an archaeological filing system.
It was based on the simple principle that the older something was, the further down in the pile it would be.
Your all-in-one-folder technique and "ls -t" would work equally well.
Boffoonery - downloadable Comedy Benefit for Bletchley Park
All mail is kept in one place (say, a MySQL database). You, however, set up filters (that is, SQL queries) that show your e-mails in virtual folders.
That is, a message can appear in as many folders as it meets the selection criteria for.
In addition to the obvious "from", "date", "subject", you could assign an arbitrary number of categories which could constitute more selection criteria.
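A rough sketch of the filters-as-SQL-queries idea above. SQLite (via Python's standard `sqlite3` module) stands in for MySQL so the example is self-contained; the table names, column names and sample rows are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE message (
    id INTEGER PRIMARY KEY, sender TEXT, sent_on TEXT, subject TEXT
);
-- Arbitrary user-assigned categories: any number per message.
CREATE TABLE message_category (
    message_id INTEGER REFERENCES message(id), category TEXT
);
""")
conn.executemany("INSERT INTO message VALUES (?, ?, ?, ?)", [
    (1, "sales@vendor.example", "2003-08-01", "Widget datasheet"),
    (2, "boss@corp.example", "2003-08-02", "Project X kickoff"),
    (3, "joe@corp.example", "2003-08-03", "Lunch?"),
])
conn.executemany("INSERT INTO message_category VALUES (?, ?)", [
    (1, "vendor"), (1, "project-x"), (2, "project-x"),
])

# Hypothetical virtual folders: each one is just a stored query returning message ids.
VIRTUAL_FOLDERS = {
    "Vendors": "SELECT id FROM message WHERE sender LIKE '%@vendor.example'",
    "Project X": "SELECT message_id FROM message_category WHERE category = 'project-x'",
}


def open_folder(name):
    """Run the folder's query and return the matching message ids."""
    return sorted(row[0] for row in conn.execute(VIRTUAL_FOLDERS[name]))
```

Message 1 is stored once but turns up in both folders, because membership is decided by the query rather than by where the row lives.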
David Gelernter, the comp sci professor, author and Unabomber victim, has created software he calls Scopeware. It basically organizes information in a series of related chains. These can be date based or otherwise. I haven't used it, but I've read that he is responding to some of the same concerns you mention.
On a less lofty, but free, note, Evolution has "virtual folders" in which you can place anything a filter expression can select. I use them to sort my email by sender address. I still have my main inbox, and all the categorized subfolders, but the virtual folders select particular people out of the massive mail database. So I can recall that Joe said something three weeks ago that relates to a current problem, and look in the "Joe" virtual folder to find it. There's still no easy way to add arbitrary messages to a virtual folder, other than adding a filter rule that selects just that one message. At least I haven't found a way. But it seems to address part of your concern, for email at least.
"Even if you are on the right track, you'll get run over if you just sit there" - Will Rogers
Or is that KAOS (as in "Get Smart") ?
I'm currently playing around with putting all my mail messages, bookmarks, web pages loaded, file accesses (on a day to day basis) into a database. Maybe not all the actual data, but the stuff that might help me find it when I need it. I'm hoping to eventually scan everything that changes on my computer or that I do for keywords and so on and then organize them so I can browse them by some kind of visual graph/map metaphor on any of several axes (type of file, date/time, keywords, directory ....).
I want to be able to go in with a query like "sometime in july I did something having to do with a picnic and watermelon" and get a list of possibilities, then be able to rate those in the hopes of finding the exact info I'm looking for.
OK, so far I only have some pieces of it. But I'm getting closer to a database schema for the information and that will help me figure out better what info I need to collect.
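For what it's worth, here is one possible shape for that kind of schema, as a sketch only. It assumes two tables (items and the keywords extracted from them) and ranks candidates by how many of the query words they match. All table names, columns and rows are invented for the example; SQLite is used just because it ships with Python.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
-- One row per thing worth finding again: a mail, a file, a bookmark...
CREATE TABLE item (
    id INTEGER PRIMARY KEY, kind TEXT, location TEXT, seen_on TEXT
);
-- Keywords scanned out of each item (assumed distinct per item).
CREATE TABLE keyword (
    item_id INTEGER REFERENCES item(id), word TEXT
);
""")
db.executemany("INSERT INTO item VALUES (?, ?, ?, ?)", [
    (1, "mail", "inbox/1234", "2003-07-04"),
    (2, "file", "~/docs/picnic-list.txt", "2003-07-12"),
    (3, "bookmark", "http://example.com/melons", "2003-08-02"),
])
db.executemany("INSERT INTO keyword VALUES (?, ?)", [
    (1, "picnic"), (2, "picnic"), (2, "watermelon"), (3, "watermelon"),
])


def search(month, words):
    """Items seen in the given month ('01'..'12'), best keyword match first."""
    placeholders = ",".join("?" for _ in words)
    sql = f"""
        SELECT item.location, COUNT(*) AS hits
        FROM item JOIN keyword ON keyword.item_id = item.id
        WHERE strftime('%m', item.seen_on) = ?
          AND keyword.word IN ({placeholders})
        GROUP BY item.id
        ORDER BY hits DESC, item.id
    """
    return db.execute(sql, [month, *words]).fetchall()
```

`search("07", ["picnic", "watermelon"])` is the "sometime in July, something about a picnic and watermelon" query: the August bookmark is filtered out, and the file matching both words is listed ahead of the mail matching one.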
As many people will probably point out here, you should check out Evolution's "virtual folders".
JWZ once proposed a more sophisticated approach to store mail without the hierarchical folder structure limits. You can read about it here: Intertwingle
I don't know what came of that. I think it is a good idea still waiting to be implemented.
I know other people have mentioned Evolution's vFolders, but here is a little more.
My goal is to never have an email that has value to me land in my inbox. Every time I get an email of "value" which stays in Evolution's inbox, I right click, and "Create Filter from Message". (I'm paraphrasing.)
Every good message should have at least one filter putting it into at least one folder. Some emails have more than one rule, but the whole right click -> create filter thing makes this quick and easy.
-Pete
Soccer Goal Plans
Keeping email organized is a lot harder than it should be. There is no good way to deal with things like a seminar announcement that I need to keep for two weeks but is junk after that, or stuff that I need to remember to read or reply to but don't want to read right now (or stuff I keep because I should read it but don't want to actually read ever).
It is also hard to remember that, when someone emails me some document, the place to store it is not in an email folder, but a directory dedicated to that project or subject. Like if someone sends a reference for a paper I am writing, it should go in ~/papers/journalname/papername/references or something, not just stay as an attachment in my inbox.
And once in a while, you have to waste a day or two reorganizing your crap and deleting old email. This is especially hard when I have copies of documents or programs on different computers, because I have to figure out which ones are the most recent and are the authoritative copy. CVS and rsync help here; CVS makes it obvious which copy is the best one (the one in CVS), and rsync makes it easy to keep things identical on different machines so you don't have the problem to begin with.
What was the question? Oh yeah. Let Google index your entire file tree and use it to find stuff.
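The "which copy is the most recent" chore mentioned above can be partly scripted when the copies are reachable as local paths. This is a small sketch, not a replacement for CVS or rsync: it assumes modification times are trustworthy, and the function names are made up for the example.

```python
import hashlib
import os


def digest(path):
    """SHA-256 of a file's contents, used to detect identical copies."""
    with open(path, "rb") as fh:
        return hashlib.sha256(fh.read()).hexdigest()


def newest_copy(paths):
    """Return (most recently modified path, whether all copies are identical)."""
    identical = len({digest(p) for p in paths}) == 1
    newest = max(paths, key=lambda p: os.stat(p).st_mtime)
    return newest, identical
```

If the second value is true there is nothing to reconcile; otherwise the first value is only a hint, since a newer timestamp does not prove a copy is the authoritative one.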
I think MIT has a project called Haystack designed just for this
Ugh... hate to say it... Outlook client using exchange.
There, I said it. OK, to be fair, I use it because that is what is available and that is what everyone uses, all 800 or so of us... and that is in our org, which is a child org to a much larger org... so a total userbase of about 6000 users...
Here's why it works. I use the partially Bayesian-based InBoxer to kill spam. Our Exchange server also runs Norton anti-virus (which has saved us from SoBig and all that crap)... and then the Exchange server also has a spam filter which adds "spam:" to the subject of all incoming known spam e-mails (which does me not much good).
OK, that takes care of spam. All list-serves I belong to get put into their own folders... Emails from friends get put into a specific folder. This leaves my inbox. My inbox is shared with all my 'trusted' coworkers. When I am gone, they check it on a regular basis for me. If I am expecting a high priority e-mail from a certain person, I set it up so an alert e-mail is sent to the right person when that comes in.
For my tasks, that is also shared. When I am gone, I forward my tasks that are due during that period to the right person.
My calendar is also shared. On my calendar, I mark when I will be gone, and then set up a special list of those who should be alerted when they send me an e-mail or task during that period (this stops an e-mail alert being sent to those list-serves I am on when I am gone).
As for files: I manage the share on our central server that we all use. We just went through a major undertaking to get it up to par. ALL files are saved on the server. Everyone has a private drive, and then each 'task' or 'subject' or 'project' has its own folder on the server. Some folders are public, or 'all on our domain'... a majority are 'departmental access' (every one in our small org)... the rest are specific, generally with 3-4 people.
It takes work. But I have access to the files I need and so do the other people in my org. It takes a lot of user education, training, and hand holding.
Couple all this with decent VPN (cisco based) and most users get what they need when they need it.
Oh, and this is at a college. Most departments are as well off as we are. And, yes, slammer has been a bitch to deal with as students move in... but many dedicated staff have solved that problem (not to mention some ingenious network guys... hats off!).
There's always The Brain (thebrain.com), which has a pretty high geek factor but works on a fairly simple premise: data can be organized many different ways in one's brain, and the software provides paths to information based on those associations.
"Do not be swept up in the momentum of mediocrity." - anon
It helps to have a filesystem designed with database features in mind (i.e. just like the BeOS file system). Emails are stored as normal files, with attributes like To, From, Title etc. stored in the database. The same concept can be used for media files (MP3 attributes are stored in the database). When you wish to search your data, you can write queries, which are live on BeOS, and have the results displayed in a directory window.
It's rather awkward to explain, but it works amazingly well in practice. Once you've tried it, you realise that there is no need to store data in directories, just make sure that the attributes are up-to-date, and finding any file is a query away. Rumour has it that Windows will adopt a similar system in Longhorn. Yeah, we BeOS users (all 20 of us :-) have been using this feature for years now...
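Since it is awkward to explain, here is a toy model of attributes plus live queries. To be clear, this is not the BeOS API: `AttributeIndex` and `LiveQuery` are hypothetical classes that only imitate the behaviour described above, where a query's result set updates itself as file attributes change.

```python
class LiveQuery:
    """A saved attribute match whose results stay current."""

    def __init__(self, wanted):
        self.wanted = wanted      # attribute name -> required value
        self.results = set()      # paths currently matching

    def consider(self, path, attrs):
        # Add or drop the path depending on whether it matches right now.
        if all(attrs.get(k) == v for k, v in self.wanted.items()):
            self.results.add(path)
        else:
            self.results.discard(path)


class AttributeIndex:
    """Stand-in for a filesystem that indexes per-file attributes."""

    def __init__(self):
        self._attrs = {}      # path -> attribute dict
        self._queries = []    # live queries to notify on every change

    def set_attributes(self, path, **attrs):
        self._attrs.setdefault(path, {}).update(attrs)
        for query in self._queries:
            query.consider(path, self._attrs[path])

    def live_query(self, **wanted):
        query = LiveQuery(wanted)
        for path, attrs in self._attrs.items():
            query.consider(path, attrs)
        self._queries.append(query)
        return query
```

The point of the sketch is that the directory a file sits in never enters into it: a "folder" is whatever `query.results` happens to contain at the moment you look.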
Revolution = Evolution
I think this is a step in the right direction. I have been using it for a while now - check it out.
"The goal here is to do for email (starting with your personal mailbox) what Google did for the web... The Google principle: It doesn't matter where information is because I can get to it with a keystroke. So what is Zoe? Think about it as a sort of librarian, tirelessly, continuously, processing, slicing, indexing, organizing, your messages. The end result is this intertwingled web of information. Messages put in context. Your very own knowledge base accessible at your fingertip. No more "attending to" your messages. The messages organization is done automatically for you so as to not have the need to "manage" your email. Because once information is available at a keystroke, it doesn't matter in which folder you happened to file it two years ago. There is no folder. The information is always there. Accessible when you need it. In context." ZOE
I used to beta this thing by this company called Autonomy which would sort and sift all your (and everyone else's) cruft to assemble a list of relevant links (to your stuff and others') in response to your activities.
;-)
IMO it did this in real-time, must have made for some impressive indices.
Maybe this is the answer, open-source Autonomy. I am a mere perlmonks acolyte so I will leave it up to the real brains to figure it out
Not sure if this has been mentioned (probably has), but the new Longhorn release of Windows is supposed to be shipped with a new file system (WinFS) which does exactly what you need. It (again, all just theory right now) will work by using a SQL database instead of a FAT table. This means you can now classify files.
So you'll access a "folder" which basically has a list of properties, and all files with those properties will be shown. So if I want all my pictures from my vacation to Hawaii, as well as my monthly financial reports, I'd create a folder that "contains" all files on those subjects, and whenever I accessed that folder it'd show me all files that fit those categories. At the same time, I can have another "folder" which shows me just my vacation pictures.
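The property-based "folders" described above boil down to set operations over classified files. A small sketch, assuming files have already been tagged with categories; the category names and file names are invented, and nothing here reflects how WinFS would actually be implemented.

```python
# Hypothetical classification: category -> set of files carrying it.
files_by_category = {
    "vacation-hawaii": {"beach.jpg", "luau.jpg"},
    "finance-monthly": {"2003-07.xls", "2003-08.xls"},
    "pictures": {"beach.jpg", "luau.jpg", "cat.jpg"},
}


def folder_any(*categories):
    """Files in at least one of the categories (set union)."""
    return set().union(*(files_by_category[c] for c in categories))


def folder_all(*categories):
    """Files in every one of the categories (set intersection)."""
    return set.intersection(*(files_by_category[c] for c in categories))
```

`folder_any("vacation-hawaii", "finance-monthly")` is the combined folder from the example, while `folder_all("pictures", "vacation-hawaii")` is the narrower one showing just the vacation pictures. The same file sits in both without being copied.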
I have pondered the same thing. Being a relational fan, I of course lean toward sets instead of (or in addition to) trees. Here is my webpage describing various post-tree approaches and interfaces:
http://www.geocities.com/tablizer/sets1.htm (I know, geocities sucks, but there are too many links to it already to switch.)
Table-ized A.I.
I know other users have already pointed out how well Evolution works for sorting mail, but I just wanted to attest to how well it works even for large amounts of email.
I used to create new folders for specific types of email, but I found it very difficult to manage and search all the folders after a while, so I ended up moving all of my email to a single folder, Inbox. I currently have 24,949 messages in my Inbox and Evolution is still extremely fast when it comes to sorting and searching through them all.
I also make use of the excellent VFolders feature of Evolution, to save frequent searches into their own folders. I've been using Evolution now for several years, and it just keeps getting better and better.
--It's Pimptastic!--
For file systems I use symbolic links in a column-view filesystem browser. I really like what a company formerly known as NeXT has done with some of their products. Their picture and music software both have a "Library". From there you can drag songs or pictures into "Playlists" (music) or "Albums" (photos).
Very cool.
As for software, I use OmniGraffle and OmniOutliner from OmniGroup. OmniOutliner is especially simple, yet unique. I wonder why no one else has an idea organizer that is so incredible? I couldn't do my job without it. Well, I could, but I'd use a lot of paper or spend a lot of time in OpenOffice messing around with things.
porn1
porn2
New Folder
New Folder(1)
unsorted_porn
mp3s
I made the mistake of making too many partitions on my drive. So my porn on my /home kills my /home disk freeocity, so I move some stuff to /usr/local, and set up a symlink. Then /usr/local runs out of diskspace, so I set up another symlink to /var, etc. Eventually, it all comes crashing down when I can't make a symlink in /dos, because of stupid lacking features of a dos fs.
I'm sure I've got all this porn stashed away somewhere on some random partition on my drive that has no links pointing to it, so I'll never find it. I love it when I do find a 1GB stash of .avi files though, that I didn't know I had :)