Slashdot Mirror


Best Way To Archive Emails For Later Searching?

An anonymous reader writes "I have kept every email I have ever sent or received since 1990, with the exception of junk mail (though I kept a lot of that as well). I have migrated my emails faithfully from Unix mail, to Eudora, to Outlook, to Thunderbird and Entourage, though I have left much of the older stuff in Outlook PST files. To make my life easier I would now like to merge all the emails back into a single searchable archive, just because I can. But there are a few problems: a) Moving them between email systems is SLOW; while the data is only a few GB, it is hundreds of thousands of emails and all of the email systems I have tried take forever to process the data. b) Some email systems (e.g. Outlook) become very sluggish when their database goes over a certain size. c) I don't want to leave them in a proprietary database, as within a few years the format becomes unsupported by the current generation of the software. d) I would like to be able to search the full text, keep the attachments, view HTML emails correctly and follow email chains. e) Because I use multiple operating systems, I would prefer platform independence. f) Since I hope to maintain and add emails for the foreseeable future, I would like to use some form of open standard. So, what would you recommend?"

18 of 385 comments

  1. IMAP by klingens · · Score: 5, Informative

    An IMAP server (dovecot, cyrus, courier) of your choice for Linux. If you don't have a Linux server you can always run it inside a small VM.

  2. A Lawyer's Fantasy ... by perpenso · · Score: 4, Insightful

    I have kept every email I have ever sent or received since 1990 with the exception of junk mail (though I kept a lot of that as well) ...

    You are a hostile lawyer's fantasy come true. ;-)

  3. Re:Psychiatric consultation! by balaband · · Score: 5, Funny

    This is slashdot. We save computers older than your dad just to use them as alarm clocks. Please leave.

  4. Print by JustOK · · Score: 4, Funny

    Print then scan

    --
    rewriting history since 2109
  5. Gmail? by spiffydudex · · Score: 5, Informative

    While not open source, Gmail has a good search engine that isn't sluggish. Plus it has roughly 7.5 gigs of space to store data. Use IMAP to push all of your emails to the server and then use that Gmail account for archive email only.
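
The "use IMAP to push all of your emails to the server" step can be sketched in Python with the standard `imaplib` and `mailbox` modules. This is a minimal sketch, not a tested migration tool: the folder name `Archive`, the helper names, and the assumption that the old mail is already in an mbox file are all illustrative and not from the comment above. The host, user and password passed to `connect` are placeholders for whatever the account requires.

```python
import imaplib
import mailbox


def connect(host, user, password):
    """Open an SSL IMAP connection and log in (host and credentials are placeholders)."""
    connection = imaplib.IMAP4_SSL(host)
    connection.login(user, password)
    return connection


def upload_mbox(connection, mbox_path, folder="Archive"):
    """Append every message in an mbox file to an IMAP folder; return the count uploaded."""
    box = mailbox.mbox(mbox_path, create=False)
    uploaded = 0
    try:
        for message in box:
            # No flags and no explicit internal date: the server assigns the
            # arrival time, while the original Date header stays in the message.
            status, _ = connection.append(folder, "", None, message.as_bytes())
            if status != "OK":
                raise RuntimeError("server refused message %d" % uploaded)
            uploaded += 1
    finally:
        box.close()
    return uploaded
```

Uploading one message per round trip is slow for hundreds of thousands of emails, so a real run would probably want to log progress and be able to resume after a dropped connection.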

  6. mbox + grep by Anonymous Coward · · Score: 5, Funny

    I use mbox format files and grep.

    IMO, one can't get much more portable than that.
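
For anyone who wants the grep approach with per-message results rather than per-line ones, here is a rough Python equivalent using the standard `mailbox` module. The function name `grep_mbox` is made up for illustration. Like plain grep, it matches against the raw message text, so base64 or quoted-printable bodies are not decoded before matching.

```python
import mailbox
import re


def grep_mbox(path, pattern):
    """Yield (subject, sender) for every message whose raw text matches pattern."""
    regex = re.compile(pattern, re.IGNORECASE)
    box = mailbox.mbox(path, create=False)
    try:
        for message in box:
            # as_string() flattens headers and body into one searchable string.
            if regex.search(message.as_string()):
                yield message.get("Subject", ""), message.get("From", "")
    finally:
        box.close()
```

A call such as `list(grep_mbox("archive.mbox", "invoice"))` would return the subject and sender of each matching message; the file name is only an example.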

  7. Maildir by alexhs · · Score: 4, Informative

    Maildir.

    And if you have an e-mail client that doesn't support it, use an IMAP server to feed your client. /thread

    --
    I have discovered a truly marvelous proof of killer sig, which this margin is too narrow to contain.
  8. Good IMAP Server by caffeinejolt · · Score: 5, Informative

    If this is really important to you, and you want it all to work across multiple workstations/OSes, your best bet will be to store it all in IMAP. If you have the means and motivation to run this yourself, I would recommend Dovecot. If you don't have the means and motivation, then you can use a service like Gmail to run your IMAP although you give up certain freedoms in doing so. For example, I use Dovecot coupled with Maildir++ as the physical storage format - as a result I can (if I wanted to) change to any email client I wish very quickly, use different email clients at the same time, etc.
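
Getting old mail into a Maildir that a server like Dovecot can serve usually starts with a conversion step. The sketch below assumes the old mail has already been exported to mbox (an assumption, since the question mentions PST files too) and uses only Python's standard `mailbox` module. The function name is illustrative, and it writes a plain Maildir without any Maildir++ subfolder layout.

```python
import mailbox


def mbox_to_maildir(mbox_path, maildir_path):
    """Copy every message from an mbox file into a Maildir; return the count copied."""
    source = mailbox.mbox(mbox_path, create=False)
    target = mailbox.Maildir(maildir_path, create=True)
    copied = 0
    try:
        for message in source:
            # Wrapping in MaildirMessage carries over mbox status flags
            # (read, replied, flagged) where the two formats overlap.
            target.add(mailbox.MaildirMessage(message))
            copied += 1
    finally:
        source.close()
        target.close()
    return copied
```

Because each message lands in its own file, a conversion interrupted halfway leaves the already-copied messages intact, though rerunning it as written would copy them a second time.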

  9. Re:Psychiatric consultation! by pz · · Score: 4, Insightful

    You, sir, are a mental case! I suspect you have OCD with some component of Asperger's that is making you have this fixation on doing all this work to save ancient bits of information.

    How was this modded Informative? Saving correspondence for future reference is critically important. I have many times needed to refer back to messages that are years old, in order to pull up a vital bit of information that was suddenly relevant. I have needed to pull up an attachment from an email a few months old, or view the exact wording of correspondence, check the date of a quotation, etc., more times than I can count, so searching and retrieval are both vitally important. When I run events, I need to be able to post-hoc review all of the correspondence for demographic analysis, often done two years after the event when the final reports are being written. Saying that this sort of behavior is odd, or not normal is either being a troll, or not understanding how the world works when you're not just a drone.

    IMO, this is one of the best Slashdot questions ever, and I am greatly anticipating hearing some good answers, especially if they don't include suggesting GMail as a panacea, as I want to have the email text and attachments in my possession.

    --

    Put my fist through my alarm clock with its ding-dong death inside my ear. - The Blackjacks.
  10. Maildir by roderickm · · Score: 4, Interesting

    Maildir storage format is resistant to bit-rot because it stores each message in a separate file, and uses filesystem directories for mail folders. It's widely supported by user agents (mail readers) and IMAP/POP3/SMTP servers, so you'll never be stranded by the actions of a single software vendor. Finally, it's easily searched using everyday unix tools - find, grep, sed, awk, etc., and you can use the full-text search engine of your choice for speedy searches.
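
To illustrate what "the full-text search engine of your choice" does with a Maildir, here is a toy inverted index in Python. It is a sketch for illustration, not a replacement for a real indexer: it keeps everything in memory, indexes only the subject and `text/plain` parts, and all function names are made up.

```python
import mailbox
import re
from collections import defaultdict

WORD = re.compile(r"[a-z0-9]+")


def text_of(message):
    """Return the subject plus decoded text/plain parts of a message, lower-cased."""
    chunks = [str(message.get("Subject", ""))]
    for part in message.walk():
        if part.get_content_type() != "text/plain":
            continue
        payload = part.get_payload(decode=True)  # undoes base64 / quoted-printable
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        try:
            chunks.append(payload.decode(charset, errors="replace"))
        except LookupError:  # unknown charset name in the header
            chunks.append(payload.decode("utf-8", errors="replace"))
    return "\n".join(chunks).lower()


def build_index(maildir_path):
    """Map each word to the set of Maildir keys whose message contains it."""
    index = defaultdict(set)
    box = mailbox.Maildir(maildir_path, create=False)
    for key, message in box.iteritems():
        for word in WORD.findall(text_of(message)):
            index[word].add(key)
    box.close()
    return index


def search(index, query):
    """Return the keys of messages that contain every word in the query."""
    words = WORD.findall(query.lower())
    if not words:
        return set()
    return set.intersection(*(index.get(word, set()) for word in words))
```

The keys returned by `search` can be passed to `mailbox.Maildir.get_message` to load the matching messages. Building the index reads every file once, which is where a persistent index on disk would pay off for an archive of this size.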

  11. Re:Psychiatric consultation! by Cylix · · Score: 4, Interesting

    I never thought of turning an ancient host into an alarm clock.

    Once however, I did hollow out an SGI case and turn it into a refrigerator.

    The case was just too damned pretty to throw away.

    --
    "You should always go to other people's funerals; otherwise, they won't come to yours." -- Yogi Berra
  12. An Advertiser's Fantasy ... by perpenso · · Score: 5, Interesting

    And now the poster becomes an advertiser's dream come true in addition to being a hostile lawyer's dream come true. ;-)

    Remember that from Google's perspective gmail is a tool to better profile you for targeted advertising. Make sure you are OK with that before giving them access to all your emails.

  13. Re:RETARD MODERATION by Anonymous Coward · · Score: 5, Funny

    Parent is +informative and/or +interesting, not troll. Fucking brain dead moderators these days. Sheesh.

    it suggested a linux solution and made the windows weenies realize how useless their os is. by extension they realized how tiny their penises are and then they finally understood why they like Micro Soft because it describes them perfectly. so they got mad and said "i'll mod it down, yeah, that'll teach them a lesson and make me feel like a real man again!"

  14. DO NOT DELETE. by GuyFawkes · · Score: 5, Insightful

    I can't tell you the number of times I nearly deleted my archived data, going back to 1997 in my case, not just e-mail either.

    Then I got falsely accused of everything except 9-11 as part of a separation / child custody battle that started with a nuclear attack out of the blue.

    It is amazing how much of that old data is relevant in such cases, "He did x on 1st June 2000 at our house!" and you have data showing you were 200 miles away doing something you had completely forgotten, with someone you haven't spoken to or seen for 7 years, at the time...

    DO NOT DELETE YOUR ARCHIVES, EVER!***

    *** unless of course you are a bad person and they incriminate you, in which case you'd better avoid everyone else who archives data.

    --
    http://slashdot.org/~GuyFawkes/journal
  15. Echo chamber... by MrNemesis · · Score: 4, Informative

    ...has me doing a "me too!" to everyone telling you to use IMAP + maildir; I use dovecot myself, complete with self-signed SSL cert (curse you firefox!).

    El_Muerte_TDS has just pointed me towards mairix, a dedicated maildir + friends indexing system which I've just tried out, and seems to be ideal for my use - fast email search has always been a good thing for me, but I've rarely found a nice lightweight indexing solution that was catered only to mail; "desktop" search engines tend to take the opinion that if I want one thing indexed then I automatically want everything indexed, and also insist on running around the clock. Much nicer for my needs to just have one little lightweight indexing program that only runs when I want it to.

    Best thing about mairix IMHO is the way it creates a virtual maildir on the fly using symlinks, so not only is it easily viewable on the command line, it's also automatically compatible with all of those IMAP + maildir clients out there... which, last time I looked, was all of them. Useful hack for KMail users here.

    Disclaimer: my IMAP server has all its databases on an SSD, so even full text searches from the client are pretty speedy (seriously - the lack of access times on small chunks of random data cuts down search times by at least an order of magnitude), but obviously mairix has the advantage of being able to scale to multiple users with >X GB mailboxes much easier than spending a fortune on fast storage.
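
The symlink trick described above can be sketched in a few lines of Python. This is not how mairix itself is implemented; it is a simplified illustration that matches only on the Subject header, and the function name and paths are hypothetical. It assumes a filesystem that supports symlinks.

```python
import email
import os


def link_matching(source_path, results_path, needle):
    """Symlink messages whose Subject contains needle into a results Maildir."""
    for sub in ("tmp", "new", "cur"):
        os.makedirs(os.path.join(results_path, sub), exist_ok=True)
    linked = []
    for sub in ("new", "cur"):
        folder = os.path.join(source_path, sub)
        for name in sorted(os.listdir(folder)):
            path = os.path.join(folder, name)
            with open(path, "rb") as handle:
                message = email.message_from_binary_file(handle)
            if needle.lower() not in str(message.get("Subject", "")).lower():
                continue
            # Keep the same subdirectory and file name so Maildir flags survive.
            link = os.path.join(results_path, sub, name)
            if not os.path.lexists(link):
                os.symlink(os.path.abspath(path), link)
            linked.append(name)
    return linked
```

Since the results folder is itself a valid Maildir, any client or IMAP server that reads Maildir can open it, and deleting the folder removes only the links, never the original messages.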

    --
    Moderation Total: -1 Troll, +3 Goat
  16. Domino by Belial6 · · Score: 4, Funny

    Yes, it is not free, and yes, this suggestion will bring out the trolls, but you might want to consider Lotus Notes/Domino. It is ~$140 for the system, and ~$40 a year maintenance (Includes all upgrades) cost per user, but IBM isn't going anywhere any time soon.

    It has good full text indexing, you can keep your mail on a client, and on the server, with incredibly flexible replication rules for what is stored where.
    It supports IMAP, so it talks well to most clients.

    The iPhone syncs seamlessly with it via ActiveSync, and an Android client is in beta as we speak.

    It includes an http client, and the http client even offers offline access. That's right. You can use the http client, and still read your mail and write emails that will be sent the next time you make a connection.

    It also has folders, but you can put any email into as many folders as you want, so you have the best of both Outlook folders and Gmail tags.

    It supports auto-processing rules for automatic filing of data, as well as being a full development environment if you want to get really fancy.

    It is brain dead easy to set up and maintain.

    The server runs on Linux and Windows, and the client runs on Linux, Windows and Mac.

  17. Re:It's obvious - Gmail by flappinbooger · · Score: 4, Insightful

    It's obvious, upload them to gmail!

    (only half kidding)

    --
    Flappinbooger isn't my real name
  18. Re:RETARD MODERATION by halltk1983 · · Score: 5, Insightful

    Virtualbox is platform independent, and he also mentioned using a VM. Once all the email is on the IMAP server in the VM, you could easily attach to it with a client that runs on any platform.

    Also, IMAP servers are platform independent, as they can run on OSX, Windows, Linux, BSD, and almost any other popular OS I can think of. It's just that Linux distros are common, easy to set up, and light enough on resources that they would be easy to set up in a VM, and without the licensing costs of OSX or Windows, it becomes price comparable to lesser solutions.

    I know it's a lot to ask these days to get people to read the comments that they are replying to, but maybe, just maybe, someone complaining about a lack of reading comprehension should take more time to read.

    --
    Watch for Penguins, they eat Apples and throw rocks at Windows.