Slashdot Mirror


Best Way To Archive Emails For Later Searching?

An anonymous reader writes "I have kept every email I have ever sent or received since 1990, with the exception of junk mail (though I kept a lot of that as well). I have migrated my emails faithfully from Unix mail, to Eudora, to Outlook, to Thunderbird and Entourage, though I have left much of the older stuff in Outlook PST files. To make my life easier I would now like to merge all the emails back into a single searchable archive, just because I can. But there are a few problems: a) Moving them between email systems is SLOW; while the data is only a few GB, it is hundreds of thousands of emails, and all of the email systems I have tried take forever to process the data. b) Some email systems (e.g. Outlook) become very sluggish when their database goes over a certain size. c) I don't want to leave them in a proprietary database, as within a few years the format becomes unsupported by the current generation of the software. d) I would like to be able to search the full text, keep the attachments, view HTML emails correctly and follow email chains. e) Because I use multiple operating systems, I would prefer platform independence. f) Since I hope to maintain and add emails for the foreseeable future, I would like to use some form of open standard. So, what would you recommend?"

64 of 385 comments

  1. It's obvious by Mikkeles · · Score: 3, Funny

    Alphabetically!

    --
    Great minds think alike; fools seldom differ.
  2. IMAP by klingens · · Score: 5, Informative

    An IMAP server (dovecot, cyrus, courier) of your choice for Linux. If you don't have a Linux server you can always run it inside a small VM.

    1. Re:IMAP by wealthychef · · Score: 2, Informative

      Well, on OS X the searching problem *should* be solved by Spotlight, as it indexes "all files on your hard drive" (not) into constant-time searches automagically. The trouble with Spotlight is that Apple does not search all folders and I do not know of a way to enable it to search all folders. If you import it into Mail.app, you do get the indexed behavior, and my situation is similar to yours, and I do exactly that. But all those billions of old messages, I keep in an archive that I never look at.
      Anyhow, look into Spotlight on OS X. Ooops, you said "open sourced," right? Damn. I don't know, then.

      --
      Currently hooked on AMP
    2. Re:IMAP by 19thNervousBreakdown · · Score: 3, Informative

      Seconding this. I've been using Dovecot with Maildir on EXT3 for the last few years--my mailbox is about 25k messages, which I keep all in a single folder and use IMAP tags to organize into different virtual folders, much like Gmail's system but without the privacy concerns.

      Dovecot's supplementary indexes make everything extremely fast (tags, dates, etc.), and anything they don't catch, Thunderbird does: I can search my entire mailbox for a single word in less than a second. I lose my Thunderbird indexes whenever I move to a new computer, but rebuilding them is just a matter of leaving the client up for a few hours.

      --
      <xml><I><am><so><damn>Web 2.0</damn></so></am></I></xml>
    3. Re:IMAP by Graff · · Score: 2, Informative

      No way to archive Entourage data with Spotlight.

      Actually, there is:
      Using Spotlight to search Entourage.

    4. Re:IMAP by Compuser · · Score: 2, Insightful

      Huh? How does a server help with a local archive of emails? Do any of these servers help with importing emails (pre-mbox ARPANET emails, for instance, or dbx emails for a more modern example)? Does it provide fast searching (including .doc and .ppt awareness)? This may be a storage approach, but it does not begin to deal with the question raised.

    5. Re:IMAP by flyingfsck · · Score: 2, Informative

      Totally. All my email since about 1998 is in a Citadel mail server. It uses the BerkeleyDB, which can handle 256 Terabytes of mail. That should be enough for any semi-sane person...

      --
      Excuse me, but please get off my Pennisetum Clandestinum, eh!
    6. Re:IMAP by AlterEager · · Score: 2, Informative

      Afaik Cyrus does not support server-side mail searches

      Oh yes it does. Check out squatter.

    7. Re:IMAP by vanyel · · Score: 2, Informative

      I am using dovecot and thunderbird, and have about 60 "live" folders, some with tens of thousands of messages (a couple with 150+k messages). It is a constant battle with thunderbird, which often goes away for long periods of time, even when not doing anything one would expect to involve the larger folders.

      I'm working on some scripts to archive messages into 30-90-180 day archive folders to keep the live folders down to a manageable size, but it would be nice to find something that already exists...
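
A minimal sketch of the bucketing decision such a script might make. It assumes "30-90-180" means three age thresholds in days; the function name `archive_bucket` and the `archive-<N>d` folder names are hypothetical, not anything the poster described.

```python
from datetime import datetime
from typing import Optional, Sequence


def archive_bucket(
    msg_date: datetime,
    now: datetime,
    thresholds: Sequence[int] = (30, 90, 180),
) -> Optional[str]:
    """Return the oldest archive bucket the message qualifies for, or None.

    A message younger than the smallest threshold stays in the live folder.
    """
    age_days = (now - msg_date).days
    bucket = None
    for days in sorted(thresholds):
        if age_days >= days:
            # Keep overwriting so the largest matching threshold wins.
            bucket = f"archive-{days}d"
    return bucket
```

Actually moving the messages would still be up to the IMAP client or server; this only decides where each one belongs.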

  3. Delete by Anonymous Coward · · Score: 2, Insightful

    Time to delete them all

  4. A Lawyer's Fantasy ... by perpenso · · Score: 4, Insightful

    I have kept every email I have ever sent or received since 1990 with the exception of junk mail (though I kept a lot of that as well) ...

    You are a hostile lawyer's fantasy come true. ;-)

    1. Re:A Lawyer's Fantasy ... by ShakaUVM · · Score: 2, Insightful

      >>You are a hostile lawyer's fantasy come true. ;-)

      We've won a couple lawsuits because I save all of my email.

      We had a contract to do a workshop with Maricopa County - the same people whose Sheriff is under investigation by the FBI right now, and of Immigration Law fame. And who have a lot of other shady things going on right now, but I digress.

      I'd traded a series of emails with them planning the workshop. Everything was all set. Then, about a week before the workshop, they say they don't need me to come after all. Ok, sure. So I try to reschedule with them. Nope, sorry, you didn't show up to the workshop, so you breached the contract. I sent them a copy of all the emails. Nope, sorry.

      Filed a lawsuit. They wouldn't settle. Showed the email trail of everything. Got a check for over $30k. Didn't have to do the work. (Of course, I'd have preferred if everyone had just done as they'd said, and it was much more of a hassle to sue than to just do the damn work.)

      Lawsuits are often won over who has the best documentation. If you do your work honestly, having full email records is probably going to help you more than hurt you in lawsuits.

  5. Google Mail. by sidragon.net · · Score: 3, Insightful

    See subject.

  6. Re:Psychiatric consultation! by balaband · · Score: 5, Funny

    This is slashdot. We save computers older than your dad just to use them as alarm clocks. Please leave.

  7. Not Much by maxume · · Score: 2, Informative

    It isn't particularly platform independent (because no one is paying much attention to Windows), but Not Much offers threads and full text search:

    http://notmuchmail.org/

    --
    Nerd rage is the funniest rage.
    1. Re:Not Much by koiransuklaa · · Score: 3, Informative

      +1

      Notmuch can manage absolutely insane amounts of email without any artificial 'archiving'. Of course, if you are looking for a program that does something other than tagging and searching (like sending, composing or receiving email), you need to look elsewhere.

  8. Print by JustOK · · Score: 4, Funny

    Print then scan

    --
    rewriting history since 2109
  9. Gmail? by spiffydudex · · Score: 5, Informative

    While not open source, Gmail has a good search engine that isn't sluggish. Plus it has roughly 7.5 gigs of space to store data. Use IMAP to push all of your emails to the server and then use that Gmail account for archive email only.

    1. Re:Gmail? by siliconbits · · Score: 2, Insightful

      I second that. Invest in Google Apps to benefit from additional services as well.

    2. Re:Gmail? by pvera · · Score: 3, Insightful

      Yes! The thing that appeals to me the most about using Gmail is that searching through 5+GB of old emails won't make everything in my machine slow to a crawl. Even with the free Gmail account, you can up the storage to 20GB for $5/year, and that extra space is available from other Google services connected to the same account.

      If you want to have more flexibility, sign up for a Backupify account, which can back up Gmail pretty well. As a bonus, when Backupify stores your backups they are kept in plain text format, so you can always pull these and move them elsewhere without having to worry about issues with Gmail's storage formats.

      --
      Pedro
      ----
      The Insomniac Coder
  10. OK, My Favorite by BoRegardless · · Score: 2, Interesting

    MailSteward on the Mac.

    SQL database. Good, Inexpensive, works w/many tens of thousands of emails & more.

    http://mailsteward.com/

  11. Mbox or SQLite by Anonymous Coward · · Score: 2, Insightful

    If you want an "email format" why not mbox? Many things currently support that as an import option.

    If you want a database, why not SQLite? It's about as open as can be, backwards compatibility is almost a religion, and it should have no problem with hundreds of thousands of entries.
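
A minimal sketch of the SQLite route using Python's standard `sqlite3` module. The table layout and the function names (`open_archive`, `add_message`, `search`) are assumptions made up for illustration, not an established schema.

```python
import sqlite3
from email import message_from_string


def open_archive(path=":memory:"):
    """Open (or create) the archive database with a single messages table."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        " id INTEGER PRIMARY KEY,"
        " sender TEXT, subject TEXT, date TEXT, raw TEXT)"
    )
    return db


def add_message(db, raw_text):
    """Store one raw message plus a few headers pulled out for querying."""
    msg = message_from_string(raw_text)
    db.execute(
        "INSERT INTO messages (sender, subject, date, raw) VALUES (?, ?, ?, ?)",
        (msg.get("From", ""), msg.get("Subject", ""), msg.get("Date", ""), raw_text),
    )
    db.commit()


def search(db, word):
    """Return subjects of messages whose raw text contains `word`."""
    rows = db.execute(
        "SELECT subject FROM messages WHERE raw LIKE ? ORDER BY id",
        ("%" + word + "%",),
    )
    return [subject for (subject,) in rows]
```

The `LIKE` query scans every row, so it is a plain substring search rather than an index. With hundreds of thousands of messages, one of SQLite's full-text extensions would probably be the better fit, provided the local build includes it.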

  12. mbox + grep by Anonymous Coward · · Score: 5, Funny

    I use mbox format files and grep.

    IMO, one can't get much more portable than that.

    1. Re:mbox + grep by Anonymous Coward · · Score: 2, Informative

      I second that, and I also use mutt for this because it's so damn fast.

      1. cd Mail/old/
      2. grep -c pattern *
      3. mutt -f candidate-file
      4. use 'l' commands with patterns on mail fields, e.g. subject, from/to, body
      5. view limited message set in thread-sorted mode
      6. tag messages of interest
      7. save tagged messages to a small mbox, or attach them to a newly composed message and send

      It takes mutt about 3 seconds to load a 280 MB archive file with 16k messages on my machine, and less than a second to limit the display by recipient, or about 3 seconds again to limit by keywords in the message body. I used to make an mbox per quarter year, but then I started merging them into one per year, as well as going back and purging some of the largest attachments. (Mutt also makes this easy: sorting by message size, then selectively saving and/or deleting attachments. There are usually 10-50 messages in a huge archive which are dramatically larger than the rest, and dealing with these makes the archive much more manageable.)
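
For anyone without mutt at hand, the "limit by a pattern on a header" step can be roughly approximated with Python's standard `mailbox` module. This is only a sketch: the function name `grep_mbox` is hypothetical, and it makes no claim to match mutt's speed.

```python
import mailbox
import re


def grep_mbox(path, pattern, header="Subject"):
    """Yield (index, header value) for messages whose header matches the regex."""
    regex = re.compile(pattern, re.IGNORECASE)
    # create=False: fail loudly instead of silently creating an empty mbox.
    box = mailbox.mbox(path, create=False)
    try:
        for key, msg in box.iteritems():
            value = str(msg.get(header, ""))
            if regex.search(value):
                yield key, value
    finally:
        box.close()
```

For example, `list(grep_mbox("Mail/old/2009", "workshop"))` would list the matching subjects, assuming an mbox file exists at that (made-up) path.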

  13. Maildir by alexhs · · Score: 4, Informative

    Maildir.

    And if you have an e-mail client that doesn't support it, use an IMAP server to feed your client. /thread

    --
    I have discovered a truly marvelous proof of killer sig, which this margin is too narrow to contain.
    1. Re:Maildir by El_Muerte_TDS · · Score: 3, Informative

      mairix is a useful addition to a maildir setup: http://www.rpcurnow.force9.co.uk/mairix/

    2. Re:Maildir by Anonymous Coward · · Score: 2, Informative

      Maildir.

      And if you have an e-mail client that doesn't support it, use an IMAP server to feed your client. /thread

      With the proviso that you probably want to break up your archives in something akin to the following format:
      . 2009
      . . Q1
      . . . Sent
      . . . Received
      . . Q2
      . . . Sent
      . . . Received
      [...]
      . 2010
      . . Q2
      . . . Sent
      . . . Received

      Lots and lots of messages in a directory can cause problems with many file systems. If you have more than ~8K or so messages in a folder, I'd recommend breaking it up. This is what I do at work (CY/Qx/Sent-Received), which also allows me to move entire quarters into PST files when I hit my quota, but also keep a bunch of stuff online for access via web mail. Instead of quarters you can also do "halves" (H1 for Jan-Jun, H2 for Jul-Dec) and then Sent/Received within that.
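
The year/quarter layout above can be sketched as a small path helper. The function name `archive_folder` and the `/` separator are assumptions for illustration; a given IMAP server may use a different hierarchy delimiter.

```python
from datetime import date


def archive_folder(msg_date: date, sent: bool) -> str:
    """Return a 'YYYY/Qn/Sent' or 'YYYY/Qn/Received' folder path for a message date."""
    # Months 1-3 map to Q1, 4-6 to Q2, and so on.
    quarter = (msg_date.month - 1) // 3 + 1
    direction = "Sent" if sent else "Received"
    return f"{msg_date.year}/Q{quarter}/{direction}"
```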

      For my personal account, I generally haven't broken 4K messages per year yet, and so simply have (CalendarYear/Sent-Received). I currently have everything going back to about 1997 or so.

      Except for sorting mailing list traffic, I don't use folders besides Sent & Received: new messages go into my Inbox or a list folder. If I want to keep the message, it always goes into Received regardless of where it originally arrived. I try to clear out the mailing list folders daily so there's no build-up (select-all, delete). If I want a message that I got, I know it's going to be in one of the Received folders, and so don't have to go digging in a dozen different places trying to figure out some sort of classification system.

      Work is Exchange/Outlook; personal is IMAP.

    3. Re:Maildir by Sancho · · Score: 2, Informative

      mairix is good, but it has some warts and it is not under development anymore. Among other things, it can run out of memory, has problems with parsing certain multipart messages, and can't search for an IP address (or any other string with dot-separated tokens.)

      It's about the best I've found, but I wish someone would pick up development and fix some of the issues. As time goes on, bit-rot is going to set in and mairix will get less and less useful.

    4. Re:Maildir by jgrahn · · Score: 2, Insightful

      Maildir storage format is resistant to bit-rot because it stores each message in a separate file, and uses filesystem directories for mail folders. It's widely supported by user agents (mail readers) and IMAP/POP3/SMTP servers, so you'll never be stranded by the actions of a single software vendor. Finally, it's easily searched using everyday unix tools - find, grep, sed, awk, etc., and you can use the full-text search engine of your choice for speedy searches.

      The only sane alternatives are, as far as I'm concerned:

      • a collection of mbox files
      • a collection of gzipped mbox files
      • a collection of Maildir folders
      • a collection of tarred and gzipped Maildir folders

      Maildir isn't quite as well supported as mbox, but I suppose it's sometimes more convenient to grep these since you get a hit on the particular mail you're searching for, not the mbox file which contains that mail and a thousand others.

      I use gzipped mbox files. One thing I have considered doing is to convert away Quoted-Printable MIME encoding and use Latin 1 (or UTF-8) everywhere. That would make the mboxes easier to use with standard tools like text editors and grep.
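
A rough sketch of the "convert away Quoted-Printable" idea using Python's standard `email` package. It only pulls out the first text/plain part as decoded text; rewriting a whole mbox in place is left out, and the function name `decoded_text` is hypothetical.

```python
from email import message_from_bytes


def decoded_text(raw_bytes):
    """Return the first text/plain body as str, with QP/base64 encoding undone."""
    msg = message_from_bytes(raw_bytes)
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            # decode=True removes the Content-Transfer-Encoding and returns bytes.
            payload = part.get_payload(decode=True)
            # Assumption: fall back to Latin-1 when no charset is declared.
            charset = part.get_content_charset() or "latin-1"
            return payload.decode(charset, errors="replace")
    return ""
```

The resulting string can then be written out as UTF-8, which is what makes it friendly to grep and text editors. A message that declares a charset Python does not know would raise `LookupError` here.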

      I would never use a database for this. It serves no purpose, except as an invitation for the fuckup fairy. The searches you'd want to do are free-text searches anyway.

  14. Good IMAP Server by caffeinejolt · · Score: 5, Informative

    If this is really important to you, and you want it all to work across multiple workstations/OSes, your best bet will be to store it all in IMAP. If you have the means and motivation to run this yourself, I would recommend Dovecot. If you don't have the means and motivation, then you can use a service like Gmail to run your IMAP although you give up certain freedoms in doing so. For example, I use Dovecot coupled with Maildir++ as the physical storage format - as a result I can (if I wanted to) change to any email client I wish very quickly, use different email clients at the same time, etc.

  15. Re:Psychiatric consultation! by pz · · Score: 4, Insightful

    You, sir, are a mental case! I suspect you have OCD with some component of Asperger's that is making you have this fixation on doing all this work to save ancient bits of information.

    How was this modded Informative? Saving correspondence for future reference is critically important. I have many times needed to refer back to messages that are years old, in order to pull up a vital bit of information that was suddenly relevant. I have needed to pull up an attachment from an email a few months old, or view the exact wording of correspondence, check the date of a quotation, etc., more times than I can count, so searching and retrieval are both vitally important. When I run events, I need to be able to post-hoc review all of the correspondence for demographic analysis, often done two years after the event when the final reports are being written. Saying that this sort of behavior is odd, or not normal, is either being a troll or not understanding how the world works when you're not just a drone.

    IMO, this is one of the best Slashdot questions ever, and I am greatly anticipating hearing some good answers, especially if they don't include suggesting GMail as a panacea, as I want to have the email text and attachments in my possession.

    --

    Put my fist through my alarm clock with its ding-dong death inside my ear. - The Blackjacks.
  16. Maildir by roderickm · · Score: 4, Interesting

    Maildir storage format is resistant to bit-rot because it stores each message in a separate file, and uses filesystem directories for mail folders. It's widely supported by user agents (mail readers) and IMAP/POP3/SMTP servers, so you'll never be stranded by the actions of a single software vendor. Finally, it's easily searched using everyday unix tools - find, grep, sed, awk, etc., and you can use the full-text search engine of your choice for speedy searches.
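
To make the one-file-per-message point concrete, here is a minimal sketch with Python's standard `mailbox` module. The helper names `store` and `message_files` are made up for the example.

```python
import mailbox
import os


def store(maildir_path, raw_text):
    """Add one message to a Maildir (created if missing) and return its key."""
    box = mailbox.Maildir(maildir_path, create=True)
    try:
        return box.add(raw_text)
    finally:
        box.close()


def message_files(maildir_path):
    """List the per-message files in the Maildir's new/ and cur/ directories."""
    found = []
    for sub in ("new", "cur"):
        folder = os.path.join(maildir_path, sub)
        found.extend(os.path.join(folder, name) for name in sorted(os.listdir(folder)))
    return found
```

Each path returned by `message_files` is an ordinary text file holding exactly one message, which is why find, grep and friends work on it directly.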

  17. Re:Psychiatric consultation! by Cylix · · Score: 4, Interesting

    I never thought of turning an ancient host into an alarm clock.

    Once however, I did hollow out an SGI case and turn it into a refrigerator.

    The case was just too damned pretty to throw away.

    --
    "You should always go to other people's funerals; otherwise, they won't come to yours." -- Yogi Berra
  18. citadel by samjam · · Score: 3, Informative

    citadel at www.citadel.org is a full pop3/imap server with full-text indexing.

    Thunderbird can use server-side searches to find messages, and I find that works pretty well.

  19. An Advertiser's Fantasy ... by perpenso · · Score: 5, Interesting

    And now the poster becomes an advertiser's dream come true in addition to being a hostile lawyer's dream come true. ;-)

    Remember that from Google's perspective gmail is a tool to better profile you for targeted advertising. Make sure you are OK with that before giving them access to all your emails.

    1. Re:An Advertiser's Fantasy ... by Nemilar · · Score: 2, Interesting

      OK, so I hear this a lot and I never really understand the problem.

      The "unwritten gmail contract" (and it actually applies to most Google products) is this: We will give you a service for free (in this case Gmail), and in return we are going to profile your use of that service to select ads for you. In the case of gmail, they give you however many GB of storage, always-on cloud email, and the best searchable email system I've ever seen. There are other Google examples, from gtalk to Google Docs. The basic principle behind it is the same, most people understand the deal, and I don't see anything wrong with it. There's no such thing as a free lunch, but this is pretty close.

      --
      Nemilar http://www.techthrob.com - Visit Me!
    2. Re:An Advertiser's Fantasy ... by zekele2 · · Score: 2, Informative

      This is why I pay Google for handling my email. I use Google Apps Premier Edition with my own domain. At $50 per user per year, it's cheaper than paying for Office/Outlook, there's 25GB of space per user, and NO advertising. Using my own domain means there is no lock-in: I can use IMAP and switch to another provider any time.

  20. Re:Psychiatric consultation! by garcia · · Score: 2, Interesting

    Starting with GMail I have kept every e-mail since 6/22/2004. I also brought over many e-mails I had in my saved folders from long before that. Am I insane? No. I have found this archive incredibly useful for any variety of uses even 6 years later.

    Nothing like having your wife ask, "man, I wish we still had the recipe for deviled eggs we made in college. Too bad it was back in 2001." "No problem honey, hold."

    Date: Fri, 26 Jan 2001 13:40:46 -0500
    From: yoyoskippy
    To: garcia@tigerose.com (now dead, have at it spammers)
    Subject: Deviled eggs

    Deviled Eggs

    6 hard cooked eggs
        (throw two more eggs in, so you can check how they are doing)

    pinch of salt (thats a pinch boy, wayyyyy less than 1/4 tsp.)

    1/4 tsp. pepper
    1/2 tsp. dry mustard
    2 Tbsp. Hellmans
    1 Tbsp. Miracle Whip
    Paprika (sprinkles)

    Boil the eggs, use the extra two eggs to check the eggs process. when boiled crack the shell a bit with a spoon. then put the eggs in cold water w/ice cubes. this makes it easier to peel the shell off the egg. Next take the yolks out of the eggs and smash up very finely with fork. next add all of the ingredients together to make the topping. mix well. spoon the mixture onto the egg and then sprinkle on paprika. enjoy. yum yum!!

    Pulled that out a couple weeks ago for a picnic. Yum yum!! was right.

  21. Store them in mbx format by Anonymous Coward · · Score: 2, Insightful

    I recommend mbox (MBX) format.

    1. The format is text-based and not likely to become unreadable anytime in the foreseeable future.

    2. There is no shortage of tools for manipulating mbox.

    3. It's easily indexed by full-text search applications (MS Search, included with Windows).

    The Outlook tools save dialog has an Apple export option, which is actually the mbox format.

    In terms of archival access, I recommend an IMAP server with a folder hierarchy based on month/year. Your mail client should be configured to leave the messages on the server (not attempt to download via IMAP). This somewhat future-proofs migration to different mail clients.

    The only issue is that IMAP searches are out of the question, so you will need to do searches offline with a full-text indexing/search application to first find the general folder location of the message you are seeking.

    If your computer has lots of memory, then why not just use grep and write a small shell script to forward the message from the archival file to your inbox so that formatting, etc. is preserved? If you're doing lots of searches, the disk cache will keep most of it in RAM even if it's a few GB.

    1. Re:Store them in mbx format by Sancho · · Score: 2, Insightful

      I find that Maildir works better than mbox for my purposes. Roughly all of the same pros, plus:
      4) Doesn't require locking your entire mailbox to modify one message.
      5) Resistant to file/inode corruption (will likely only corrupt one message instead of several.)
      6) Can essentially use shell tools to copy individual messages.

      One thing that's neat to do with maildir mailboxes is to search using grep+xargs and copy the messages you find into a new maildir mailbox (named, perhaps, searchresults). Then you have a handy mailbox populated with your search results. I imagine one could even do this using procmail, so that you could populate the mailbox remotely.
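
The search-into-a-results-mailbox trick can also be sketched in Python instead of grep+xargs. This is an illustrative equivalent, not the poster's script; the function name `copy_matches` is hypothetical. It matches against the raw bytes of each message, as grep would, so encoded bodies are not decoded first.

```python
import mailbox
import re


def copy_matches(source_path, results_path, pattern):
    """Copy every message whose raw text matches `pattern` into a results Maildir."""
    regex = re.compile(pattern.encode(), re.IGNORECASE)
    source = mailbox.Maildir(source_path, create=False)
    results = mailbox.Maildir(results_path, create=True)
    copied = 0
    try:
        for key in source.iterkeys():
            raw = source.get_bytes(key)
            if regex.search(raw):
                # The original stays put; the results box gets its own copy.
                results.add(raw)
                copied += 1
    finally:
        source.close()
        results.close()
    return copied
```

Pointing a mail client at the results Maildir then shows only the hits, and the folder can be deleted afterwards without touching the archive.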

  22. We have something similar at Work by juanca · · Score: 3, Insightful

    At work, we needed to archive (for compliance purposes) all the inbound/outbound email messages of our users (about 1K). We set up an Ubuntu server with postfix and dovecot IMAP over SSL, using Maildir.

    Our users generate about 20K email messages daily, and we store each day in its own directory, something like this:

    INBOX
      |- YYYY
           |- MM
                |- DD

    The auditors use Evolution to connect to the archive server and search the emails. It takes a little while to load a day of emails for the first time, but once it's properly loaded, searching is really fast. The server is not that powerful: it's a VM with 2 CPUs and 2GB of RAM. You do need a lot of storage, though.
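
A small sketch of how a per-day folder could be derived from a message's own Date header. The function name `daily_folder` and the `/` separator are assumptions; nothing here is taken from the poster's actual postfix/dovecot setup.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime


def daily_folder(raw_text, root="INBOX"):
    """Return 'INBOX/YYYY/MM/DD' based on the message's Date header.

    Raises an exception if the Date header is missing or unparseable.
    """
    msg = message_from_string(raw_text)
    when = parsedate_to_datetime(msg["Date"])
    # Uses the date as written in the header, in the sender's stated timezone.
    return f"{root}/{when:%Y/%m/%d}"
```

A compliance archive might prefer the delivery time over the sender-supplied Date header, since the latter can be wrong or forged.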

    Hope this helps.

    --
    --Necesito una chela, bien fria...
  23. Re:RETARD MODERATION by Anonymous Coward · · Score: 5, Funny

    Parent is +informative and/or +interesting, not troll. Fucking brain dead moderators these days. Sheesh.

    it suggested a linux solution and made the windows weenies realize how useless their os is. by extension they realized how tiny their penises are and then they finally understood why they like Micro Soft because it describes them perfectly. so they got mad and said "i'll mod it down, yeah, that'll teach them a lesson and make me feel like a real man again!"

  24. Kmail for Outlook stuff and Search. by twitter · · Score: 2, Informative

    Kmail has an excellent .pst converter that will pull out your old Outlook mail. Once you have it in Kmail, you can drag and drop it into any of the supported formats, mbox, mdir etc. If you have already established filters, you can let them sort things out. If not you can use a manual search for to, from, mail list, subject, etc. From there you can run your imap.

    I carry everything around on my laptop and use kmail instead of using imap. With full drive encryption and xscreensaver, I don't have any worry about losing private information and know that my ISPs have better collections of my email anyway, despite what they say about size limits. I could use Gmail's imap instead of my own but prefer to suck my gmail out with kmail's imap support. Until US networks get more reasonable, I want my mail with me instead of on my own server and I would not advise anyone to leave their mail on someone else's server without having a copy yourself.

    Because your question is all about search, I have to plug Kmail again. With proper organization of your mail into subfolders for friends, family, lists, companies and projects, mail searches are quick, even on modest hardware like my ancient PIII laptop. Searching everything takes a little longer, but it is not such a burden. Evolution may do as well but something about Gnome turns me off. The only downside is that the 3.5 branch does not seem to be able to search through encrypted mail but I imagine there's some gpg-agent fix for that I'm not aware of.

    --

    Friends don't help friends install M$ junk.

    1. Re:Kmail for Outlook stuff and Search. by AndGodSed · · Score: 2, Informative

      ++ the above, or Evolution - it also imports PST's and from there you can move it to Thunderbird for Windows. If you want uber searchability you could then upload the whole shebang to a gmail account that you sync offline via gears.

      I personally would balk at having all that stuff online with Google, but hey, that would be the best searchable option I know. You can also sync with your Gmail account via the IMAP protocol if gears and the web interface are not for you. The problem with that is that you will lose Gmail's great search capability.

      Then again Thunderbird has some really cool search addons that might just take care of your needs altogether, plus it is platform agnostic - you can have it on BSD, Linux, Windows or Mac.

      HTH!

  25. IMAP with maildir backend by Fat+Cow · · Score: 2, Insightful

    I migrated all my old personal emails to gmail using IMAP. You can use this to migrate between different on-disk formats like maildir, mbox and pst. I had all my email in yahoo and pulled it down using POP to a maildir, then used an IMAP mail client to copy it across to gmail. Then I regularly back them up from gmail to an on-disk maildir format using mbsync. I picked maildir because it's open and seemed better designed than the alternative, mbox. It's not completely standardized though. I've seen PSTs become corrupt so I try and stay away.
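
The poster moved mail between formats through an IMAP client. As a local alternative for the mbox-to-maildir case only, here is a minimal sketch with Python's standard `mailbox` module; the function name `mbox_to_maildir` is hypothetical, and PST files are out of scope since the standard library cannot read them.

```python
import mailbox


def mbox_to_maildir(mbox_path, maildir_path):
    """Copy every message from an mbox file into a Maildir; return the count."""
    source = mailbox.mbox(mbox_path, create=False)
    target = mailbox.Maildir(maildir_path, create=True)
    count = 0
    try:
        for message in source:
            # Wrapping in MaildirMessage carries over read/flagged status
            # where the mbox recorded it in Status headers.
            target.add(mailbox.MaildirMessage(message))
            count += 1
    finally:
        source.close()
        target.close()
    return count
```

The source mbox is left untouched, so the conversion can be checked before anything is deleted.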

    --
    stay frosty and alert
  26. Re:Psychiatric consultation! by ciderbrew · · Score: 3, Funny

    What do they say?

    June 2001 - "Dave, can't go out tonight. I got a date with that fat chick.YEAH!"
    Sept 2001 - "Dave, She's told me she pregnant."
    Jan 2002 - "Dave, will you be the best man at the wedding :(".


    Shhhh - Dave's the real father (AC doesn't know)..

  27. Re:POO (Plain Old Outlook) by dakohli · · Score: 2, Insightful

    I have to say that PSTs can be convenient. However, I have seen many corrupted PSTs over the years, and yes, I know that there are tools to fix this, but the name of the game here is to actually get your emails out with a minimum of fuss. Also, as to compatibility, I know MS has arbitrarily changed the format of Word. There is nothing to stop them from doing the same to the PST format, and there are several versions of that in existence now. Add to this the fact that as the PSTs get bigger, performance drops off. As a quick, expedient solution, PSTs will work, but not well. As a real solution to the problem, however, I think they will only compound the issues in the long run.

  28. DO NOT DELETE. by GuyFawkes · · Score: 5, Insightful

    I can't tell you the number of times I nearly deleted my archived data, going back to 1997 in my case, not just e-mail either.

    Then I got falsely accused of everything except 9-11 as part of a separation / child custody battle that started with a nuclear attack out of the blue.

    It is amazing how much of that old data is relevant in such cases, "He did x on 1st June 2000 at our house!" and you have data showing you were 200 miles away doing something you had completely forgotten, with someone you haven't spoken to or seen for 7 years, at the time...

    DO NOT DELETE YOUR ARCHIVES, EVER!***

    *** unless of course you are a bad person and they incriminate you, in which case you'd better avoid everyone else who archives data.

    --
    http://slashdot.org/~GuyFawkes/journal
    1. Re:DO NOT DELETE. by cervo · · Score: 3, Insightful

      This can also work against you. Most big companies have record retention policies that include when to delete e-mails, because those same archives that saved you can bite you in the butt. Also, in reality you should be innocent until proven guilty anyway, although I know civil court works differently. But if there is anything you did, maybe an e-mail to another woman that can be spun as evidence you had another girlfriend (even if it was a harmless e-mail just saying hi), then it could bite you.

      Plus no one is 100% squeaky clean. Maybe you admitted you were speeding to someone. Maybe you bought porn website memberships (which could be spun as the reason for a break up, or that you are an unfit parent). Maybe you admitted you were a little too drunk to drive but did it anyway. Maybe you ordered a set of army knives and have the receipt and that gets spun as you have weapons all over the place that could endanger the kids....

      Anyway, just saying that too many records could bite you too. Especially if someone from court gets an order for all of them. Then they can be pulled out of context and could be very damaging. Even medical issues could be in the e-mail archives from correspondence with doctors, confirmations of appointments, etc... If that data ever got out it could be damaging to buying insurance as well.

    2. Re:DO NOT DELETE. by afabbro · · Score: 2, Insightful

      Alternatively, spend more time on your personal relationships and home life than maintaining your email archives.

      --
      Advice: on VPS providers
  29. Echo chamber... by MrNemesis · · Score: 4, Informative

    ...has me doing a "me too!" to everyone telling you to use IMAP + maildir; I use dovecot myself, complete with self-signed SSL cert (curse you firefox!).

    El_Muerte_TDS has just pointed me towards mairix, a dedicated maildir + friends indexing system which I've just tried out, and seems to be ideal for my use - fast email search has always been a good thing for me, but I've rarely found a nice lightweight indexing solution that was catered only to mail; "desktop" search engines tend to take the opinion that if I want one thing indexed then I automatically want everything indexed, and also insist on running around the clock. Much nicer for my needs to just have one little lightweight indexing program that only runs when I want it to.

    Best thing about mairix IMHO is the way it creates a virtual maildir on the fly using symlinks, so not only is it easily viewable on the command line, it's also automatically compatible with all of those IMAP + maildir clients out there... which, last time I looked, was all of them. Useful hack for KMail users here.
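
    The symlink trick is easy to picture with a rough Python sketch. This is not mairix's code or its index format: build_virtual_maildir and naive_search are made-up names, and the brute-force scan only stands in for a real index lookup. It also assumes a filesystem where symlinks work.

```python
from pathlib import Path


def build_virtual_maildir(matches, target):
    """Create a Maildir-shaped folder whose cur/ entries are symlinks to matches."""
    target = Path(target)
    # A Maildir is just three subdirectories: tmp, new and cur.
    for sub in ("tmp", "new", "cur"):
        (target / sub).mkdir(parents=True, exist_ok=True)
    # Drop links left over from a previous search.
    for old in (target / "cur").iterdir():
        if old.is_symlink():
            old.unlink()
    # Link each hit instead of copying it, so the results cost almost no space.
    for msg in matches:
        msg = Path(msg).resolve()
        (target / "cur" / msg.name).symlink_to(msg)
    return target


def naive_search(maildir, needle):
    """Hypothetical stand-in for an index lookup: scan every message file."""
    hits = []
    for sub in ("new", "cur"):
        folder = Path(maildir) / sub
        if not folder.is_dir():
            continue
        for path in sorted(folder.iterdir()):
            if needle.lower() in path.read_text(errors="replace").lower():
                hits.append(path)
    return hits
```

    Point any maildir-aware client at the target folder and the hits show up as an ordinary mailbox, while the real messages stay where they were.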

    Disclaimer: my IMAP server has all its databases on an SSD, so even full text searches from the client are pretty speedy (seriously - the lack of access times on small chunks of random data cuts down search times by at least an order of magnitude), but obviously mairix has the advantage of being able to scale to multiple users with >X GB mailboxes much easier than spending a fortune on fast storage.

    --
    Moderation Total: -1 Troll, +3 Goat
  30. Domino by Belial6 · · Score: 4, Funny

    Yes, it is not free, and yes, this suggestion will bring out the trolls, but you might want to consider Lotus Notes/Domino. It is ~$140 for the system, and ~$40 a year maintenance (Includes all upgrades) cost per user, but IBM isn't going anywhere any time soon.

    It has good full text indexing, you can keep your mail on a client, and on the server, with incredibly flexible replication rules for what is stored where.
    It supports IMAP, so it talks well to most clients.

    The iPhone syncs seamlessly with it via ActiveSync, and an Android client is in beta as we speak.

    It includes an http client, and the http client even offers offline access. That's right. You can use the http client, and still read your mail and write emails that will be sent the next time you make a connection.

    It also has folders, but you can put any email into as many folders as you want, so you have the best of both Outlook folders and Gmail tags.

    It supports auto-processing rules for automatic filing of data, as well as being a full development environment if you want to get really fancy.

    It is brain dead easy to set up and maintain.

    The server runs on Linux and Windows, and the client runs on Linux, Windows and Mac.

    1. Re:Domino by Belial6 · · Score: 2, Informative

      Seriously, what is wrong with or for that matter, the Notes client web client?

      I call you out troll. I also call you out on your made up problem of not knowing if something is read or not. Unread marks replicate between servers.

    2. Re:Domino by Belial6 · · Score: 2

      Ok, I read your link. You know, the one that praises the UI for Notes. I still call you out for trolling. The complaint the guy had had NOTHING to do with caching. As was explained by the follow-ups to your #6. Unread marks are maintained for each USER. The unread marks replicate just like all other data. No caching issue at all. Basically you are complaining because Exchange is broken, and you want Notes broken in the same way. Why on earth would I want MY unread marks to be changed by YOU reading something?

      Shared mail is generally a questionable action to take anyway. While there are a few exceptions, it is generally what incompetent admins do because Exchange is feature incomplete, and they try to set it up like Exchange. The proper thing to do is use the mail template and create a mail-in database. That way you don't have a user to maintain, and you are not paying for extra licenses that you are not using. Whether you are correctly using a mail-database derived from the mail template, or if you are incorrectly using a fake user, you can still have a mark to indicate that a document has been opened by SOMEBODY. Just put @SetField("GroupRead", "X") in the QueryOpenEvent of the form and add a column to show that. You can get fancier, but that is no harder than a simple Excel spreadsheet. With Notes/Domino, you can have proper per user read marks, OR group read marks.

      Of course, even though Notes handles it fine, I would like to know what tool handles it 'right' by your standards, and how that tool lets me know if I have read the email as opposed to anybody having read it.

      So, again, I call you out troll...

  31. Re:Psychiatric consultation! by Jawnn · · Score: 2, Insightful

    How was this modded Informative? Saving correspondence for future reference is critically important. I have many times needed to refer back to messages that are years old, in order to pull up a vital bit of information that was suddenly relevant. I have needed to pull up an attachment from an email a few months old, or view the exact wording of correspondence, check the date of a quotation, etc., more times than I can count, so searching and retrieval are both vitally important.

    While the value you place on being able to retrieve critical pieces of information may be valid, your choice of storage medium is not. An email system is not a file server or database. Most index poorly, if at all, making searches horribly inefficient. And as has already been observed, it may be quite likely that those same things you value will be more than offset by their value to a hostile litigant.

  32. just because I can. by socsoc · · Score: 3, Insightful

    just because I can.

    That's a big assumption. You are asking slashdot, so I'm thinking you can't. Especially because imap never occurred to you.

  33. What about the privacy of those you email with? by perpenso · · Score: 2, Insightful

    What about the privacy of those you correspond with? If they send an email to a gmail account that is one thing, but you are unilaterally deciding to have them participate in the targeted advertising profiling.

  34. Re:MySQL by Anonymous Coward · · Score: 2, Informative

    I've been using hMailServer for a few years now, and it's free. It's an ISP solution, but has some great IMAP facilities like shared storage. I have over 120GB of IMAP data and it's not twitching. It has a MySQL Lite backend, and capabilities for web, pop, imap, AD integration, etc etc. Would recommend it.

  35. Re:Psychiatric consultation! by pz · · Score: 2, Interesting

    When I run events, I need to be able to post-hoc review all of the correspondence for demographic analysis, often done two years after the event when the final reports are being written. Saying that this sort of behavior is odd, or not normal is either being a troll, or not understanding how the world works when you're not just a drone.

    This sort of behavior is odd and not normal. If you want to keep your email, then that's fine, but thinking that it's "vitally important" is odd and I think without question points to some "OCD with some component of Aspberger". If you don't then maybe you need to re-evaluate.

    I am however interested in how you pull demographic analysis out of emails? I mean, hopefully you're not suggesting that you go and chomp on the text to pull out fields of data?

    So on the one hand, you think my saving email for later access and analysis is not useful, but then, you want to know why it is useful?

    I run a research laboratory where we do two things, one is work on restoring sight to the blind, the other is to organize a conference every two years. The primary demographic analysis I need to do is to analyze the country-of-origin for email traffic pertinent to the conference. This has helped to raise many tens of thousands of dollars of support for the conference by demonstrating various aspects of the global attendance to funding agencies.
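
    For anyone wondering what that kind of tally looks like in practice, here is a rough Python sketch. It is only a guess at one way to do it, not the method actually used: tally_sender_tlds is a made-up name, and it treats the last label of the sender's domain as a stand-in for country, which says nothing useful for generic domains like .com or .org.

```python
from collections import Counter
from email.utils import parseaddr


def tally_sender_tlds(messages):
    """Count the last domain label of each message's From address."""
    counts = Counter()
    for msg in messages:
        # parseaddr splits "Name <user@host>" into (name, address).
        _, address = parseaddr(msg.get("From", ""))
        domain = address.rpartition("@")[2].lower()
        if "." not in domain:
            # Missing or unparseable sender.
            counts["unknown"] += 1
            continue
        counts[domain.rsplit(".", 1)[1]] += 1
    return counts
```

    Anything that yields message objects with a get method will do as input, for example a mailbox.Maildir or mailbox.mbox from the Python standard library.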

    Being able to access my email and locate attachments, review discussions, find references, remember addresses, etc., in other words, to recall what someone once wrote to me, has resulted in millions of dollars of grant money to fund my research. Without the ability to review email that is, at times, years old, that would not be possible. Having rich access to my email stream has allowed me to fund my lab, and therefore feed and house my family and the people who work for me, publish high-impact papers, receive numerous awards, get coverage in the international press, etc., or, put better, to run the daily business of a research lab at a high-profile university. While the tools I use are good, they leave a lot to be desired, and having a better system would make me more productive.

    IMO, this is one of the best Slashdot questions ever, and I am greatly anticipating hearing some good answers, especially if they don't include suggesting GMail as a panacea,

    I think that GMail could be the panacea here. I mean, if you're just trying to make sure it lasts and you can search it with ease, then GMail can do it better than you can.

    I dislike GMail for my professional correspondence for a number of reasons: (1) it does not allow me to readily use my university affiliation address (and since that's a top university, that makes a difference whether people like it or not), (2) I do not have ownership of my email, (3) the lack of a good filing / archiving interface makes it hard to associate different threads together, or to limit searches (I intensely dislike the tagging feature), (4) GMail has only a rudimentary ability to edit text since it's browser-based.

    I do use GMail for my personal correspondence, but that's mostly because it's the best of a bunch of poor, but free, services. It does have the best searching features, but falls down in a lot of other ways. It also would be against my employer's policies to store HIPAA-regulated email offsite. So GMail is not a panacea. Thanks for the suggestion, though.

    --

    Put my fist through my alarm clock with its ding-dong death inside my ear. - The Blackjacks.
  36. Re:It's obvious - Gmail by flappinbooger · · Score: 4, Insightful

    It's obvious, upload them to gmail!

    (only half kidding)

    --
    Flappinbooger isn't my real name
  37. Hold the phone! by Anonymous Coward · · Score: 2, Insightful

    Computers, hard drives, backups, electricity, rack space, and maintenance are all free! Fuck! Tell me where you shop for this stuff.

  38. Re:No, not IMAP, it bogs down by wkcole · · Score: 2, Informative

    As anyone who actually uses IMAP can tell you, it bogs down quickly on large mailboxes, violating the poster's requirement about b)

    Not true. Not absolutely false, either. IMAP is an access protocol, not a storage or indexing mechanism, and there is nothing inherent in IMAP that dooms it to be slow in handling large mailboxes. Different combinations of client and server, configurations, and mailbox content and usage can make huge differences in performance. Tens of thousands of messages in a single IMAP folder on a memory-lean server that uses Maildir storage on a UFS or ext2 filesystem with atimes enabled is going to suck horribly, especially with a client that doesn't cache heavily or maintain its own indices. Make that an mbox, and it will work great until you start trying to change it every couple of seconds.
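
    Since the storage format is a separate choice from the access protocol, moving between the two formats is mostly a copy job. A minimal Python sketch using the standard mailbox module follows. mbox_to_maildir is a made-up name, the paths are whatever you pass in, and it assumes nothing else is writing to the mbox while it runs (it takes no lock).

```python
import mailbox


def mbox_to_maildir(mbox_path, maildir_path):
    """Copy every message from an mbox file into a Maildir; return the count."""
    # create=False: fail loudly if the source file is missing.
    src = mailbox.mbox(mbox_path, create=False)
    # create=True: make the tmp/new/cur directories if they do not exist yet.
    dest = mailbox.Maildir(maildir_path, create=True)
    copied = 0
    try:
        for message in src:
            # Each message becomes its own file under the Maildir.
            dest.add(message)
            copied += 1
    finally:
        dest.flush()
        src.close()
    return copied
```

    The mbox is one big file, so changing any message means rewriting part of it; the Maildir ends up with one file per message, which is cheap to change but puts the load on the filesystem once a folder holds tens of thousands of them.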

  39. Re:RETARD MODERATION by halltk1983 · · Score: 5, Insightful

    Virtualbox is platform independent, and he also mentioned using a VM. Once all the email is on the IMAP server in the VM, you could easily attach to it with a client that runs on any platform.
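
    To give an idea of how little the client side needs, here is a hedged Python sketch using the standard imaplib module. search_text is a made-up helper, the host, login and folder in the comments are placeholders, and the folder name is assumed to contain no spaces. The matching itself happens on the server, so speed depends on how the server indexes its mail.

```python
import imaplib


def search_text(conn, folder, phrase):
    """Return message sequence numbers in `folder` whose text contains `phrase`."""
    # Open read-only so searching never changes message flags.
    status, _ = conn.select(folder, readonly=True)
    if status != "OK":
        raise RuntimeError(f"cannot open folder {folder!r}")
    # IMAP quoted strings need backslashes and double quotes escaped.
    quoted = '"' + phrase.replace("\\", "\\\\").replace('"', '\\"') + '"'
    status, data = conn.search(None, "TEXT", quoted)
    if status != "OK":
        raise RuntimeError("search failed")
    return data[0].split()


# Usage against a real server (placeholder host and credentials):
#
#     with imaplib.IMAP4_SSL("imap.example.org") as conn:
#         conn.login("user", "password")
#         print(search_text(conn, "Archive", "grant budget"))
```

    The same few calls work from any OS that runs Python, which is the whole point of putting the archive behind IMAP.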

    Also, IMAP servers are platform independent, as they can run on OSX, Windows, Linux, BSD, and almost any other popular OS I can think of. It's just that Linux distros are common, easy to set up, and light enough on resources that they would be easy to set up in a VM, and without the licensing costs of OSX or Windows, it becomes price comparable to lesser solutions.

    I know it's a lot to ask these days to get people to read the comments that they are replying to, but maybe, just maybe, someone complaining about a lack of reading comprehension should take more time to read.

    --
    Watch for Penguins, they eat Apples and throw rocks at Windows.
  40. Re:RETARD MODERATION by insertwackynamehere · · Score: 2, Informative

    Do you know what imap is? He's gonna have to have some central storage thing but the mail access is platform independent..yeah if he wants his imap server to be his own then he'll have to pick one os to serve from but every nonshit mail application has imap support from desktop to mobile and hands down gives him what he wants if he takes the time to organize and set it all up