Best Way To Archive Emails For Later Searching?
An anonymous reader writes "I have kept every email I have ever sent or received since 1990, with the exception of junk mail (though I kept a lot of that as well). I have migrated my emails faithfully from Unix mail, to Eudora, to Outlook, to Thunderbird and Entourage, though I have left much of the older stuff in Outlook PST files. To make my life easier I would now like to merge all the emails back into a single searchable archive, just because I can. But there are a few problems: a) Moving them between email systems is SLOW; while the data is only a few GB, it is hundreds of thousands of emails and all of the email systems I have tried take forever to process the data. b) Some email systems (e.g. Outlook) become very sluggish when their database goes over a certain size. c) I don't want to leave them in a proprietary database, as within a few years the format becomes unsupported by the current generation of the software. d) I would like to be able to search the full text, keep the attachments, view HTML emails correctly and follow email chains. e) Because I use multiple operating systems, I would prefer platform independence. f) Since I hope to maintain and add emails for the foreseeable future, I would like to use some form of open standard. So, what would you recommend?"
I have kept every email I have ever sent or received since 1990, with the exception of junk mail (though I kept a lot of that as well) ...
You are a hostile lawyer's fantasy come true. ;-)
See subject.
You, sir, are a mental case! I suspect you have OCD with some component of Asperger's that is making you have this fixation on doing all this work to save ancient bits of information.
How was this modded Informative? Saving correspondence for future reference is critically important. I have many times needed to refer back to messages that are years old, in order to pull up a vital bit of information that was suddenly relevant. I have needed to pull up an attachment from an email a few months old, or view the exact wording of correspondence, check the date of a quotation, etc., more times than I can count, so searching and retrieval are both vitally important. When I run events, I need to be able to post-hoc review all of the correspondence for demographic analysis, often done two years after the event when the final reports are being written. Saying that this sort of behavior is odd, or not normal, is either being a troll or not understanding how the world works when you're not just a drone.
IMO, this is one of the best Slashdot questions ever, and I am greatly anticipating hearing some good answers, especially if they don't include suggesting GMail as a panacea, as I want to have the email text and attachments in my possession.
Put my fist through my alarm clock with its ding-dong death inside my ear. - The Blackjacks.
Yes! The thing that appeals to me the most about using Gmail is that searching through 5+GB of old emails won't make everything in my machine slow to a crawl. Even with the free Gmail account, you can up the storage to 20GB for $5/year, and that extra space is available from other Google services connected to the same account.
If you want to have more flexibility, sign up for a Backupify account, which can back up Gmail pretty well. As a bonus, Backupify stores your backups in plain text format, so you can always pull these and move them elsewhere without having to worry about issues with Gmail's storage formats.
Pedro
----
The Insomniac Coder
At work, we needed to archive (for compliance purposes) all the inbound/outbound email messages of our users (about 1K of them). We set up an Ubuntu server with Postfix and Dovecot IMAP over SSL, using Maildir.
Our users generate about 20K email messages daily, and we store each day in its own directory, something like this:
INBOX
|- YYYY
   |- MM
      |- DD
The auditors use Evolution to connect to the archive server and search the emails. It takes a little while to load a day of emails for the first time, but once it has loaded, searching is really fast. The server is not that powerful: it's a VM with 2 CPUs and 2GB of RAM. You do need a lot of storage, though.
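For anyone who wants to try the one-folder-per-day layout, here is a minimal sketch using Python's standard mailbox module. It is not the setup described above, just an illustration: the function names are made up, and it assumes Maildir++ style folder names ("YYYY.MM.DD" stored on disk as ".YYYY.MM.DD"), which many IMAP servers can present as a nested YYYY/MM/DD tree depending on how they are configured.

```python
import email.utils
import mailbox
from email.message import EmailMessage


def daily_folder_name(msg):
    """Return a folder name such as '2010.06.01' taken from the Date header.

    Assumption: the message has a parseable Date header; a missing or
    malformed header raises TypeError or ValueError here.
    """
    parsed = email.utils.parsedate_to_datetime(msg["Date"])
    return parsed.strftime("%Y.%m.%d")


def archive_message(archive_root, msg):
    """File msg in a per-day folder under archive_root, creating it if needed.

    Returns the folder name and the key of the stored message.
    """
    inbox = mailbox.Maildir(archive_root, create=True)
    name = daily_folder_name(msg)
    if name in inbox.list_folders():
        folder = inbox.get_folder(name)
    else:
        # Creates the directory '.<name>' with cur/new/tmp inside it.
        folder = inbox.add_folder(name)
    return name, folder.add(msg)
```

Each message ends up as a separate plain file, so the archive stays readable without any particular mail client.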
Hope this helps.
--Necesito una chela, bien fria...
I can't tell you the number of times I nearly deleted my archived data, going back to 1997 in my case, not just e-mail either.
Then I got falsely accused of everything except 9-11 as part of a separation / child custody battle that started with a nuclear attack out of the blue.
It is amazing how much of that old data is relevant in such cases, "He did x on 1st June 2000 at our house!" and you have data showing you were 200 miles away doing something you had completely forgotten, with someone you haven't spoken to or seen for 7 years, at the time...
DO NOT DELETE YOUR ARCHIVES, EVER!***
*** unless of course you are a bad person and they incriminate you, in which case you'd better avoid everyone else who archives data.
http://slashdot.org/~GuyFawkes/journal
just because I can.
That's a big assumption. You are asking Slashdot, so I'm thinking you can't. Especially because IMAP never occurred to you.
It's obvious, upload them to gmail!
(only half kidding)
Flappinbooger isn't my real name
Virtualbox is platform independent, and he also mentioned using a VM. Once all the email is on the IMAP server in the VM, you could easily attach to it with a client that runs on any platform.
Also, IMAP servers are platform independent, as they can run on OSX, Windows, Linux, BSD, and almost any other popular OS I can think of. It's just that Linux distros are common, easy to set up, and light enough on resources that they would be easy to set up in a VM, and without the licensing costs of OSX or Windows, it becomes price comparable to lesser solutions.
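Getting old mail onto such a server can be scripted rather than dragged across in a mail client. The following is a rough sketch, assuming the old mail has already been exported to an mbox file; upload_mbox is a made-up helper name, and conn is expected to be an already logged-in imaplib connection (or anything with a compatible append method).

```python
import email.utils
import imaplib
import mailbox
import time


def upload_mbox(conn, mbox_path, folder):
    """Append every message in an mbox file to `folder` via conn.append().

    Returns the number of messages sent. The destination folder is assumed
    to exist on the server already.
    """
    count = 0
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for msg in box:
            # Keep the original date when the header parses; otherwise use now.
            try:
                stamp = email.utils.parsedate_to_datetime(msg["Date"]).timestamp()
            except (TypeError, ValueError):
                stamp = time.time()
            conn.append(folder, None, imaplib.Time2Internaldate(stamp), msg.as_bytes())
            count += 1
    finally:
        box.close()
    return count
```

Usage would look something like conn = imaplib.IMAP4_SSL("archive.example.org"), then conn.login(user, password), then upload_mbox(conn, "old-mail.mbox", "Archive"), where the host, credentials, file and folder names are all placeholders.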
I know it's a lot to ask these days to get people to read the comments that they are replying to, but maybe, just maybe, someone complaining about a lack of reading comprehension should take more time to read.
Watch for Penguins, they eat Apples and throw rocks at Windows.