Slashdot Mirror


Ask Slashdot: Best Way To Archive and Access Ancient Emails?

An anonymous reader writes "I started using email in the early 90s and have lost most of that first decade due to ignorance, botched backups, and so on. But since about 2000, I've got most — if not all — of my email in some form or other. I run Linux, so this has mainly been in a mix of various programs: Kmail, Evolution, Thunderbird. The past 2-3 years are still on the IMAP servers. My problem is that I only rarely NEED to look back to email of 5 years ago. But sometimes it's nice. Or I just want to reminisce about something...or find an old attachment that I was sent. But I do not want to be clogging my current email client of choice with vast backups and even more, I don't know if it will even easily convert. The file structures are different, some are mbox, others maildir, etc., and I would ideally like a way to 1) store and archive these emails, 2) access them, and 3) search by Sender, Subject, Date, Attachments. Is there anything I can do or do I just have to keep legacy applications on hand for this? Should I keep trying to upgrade and pull old files into the new applications? Any help or suggestions about what YOU do would be great."

282 comments

  1. Use getmail and dovecot? by Anonymous Coward · · Score: 1

    Personally, I use getmail and dovecot for my mail, not just archived. Everything's available, sorted and filtered on retrieval. I even added dspam to catch what google misses. I think I wrote a script to get the old mbox files into maildir via dovecot's processing, but it all worked and continues to work. Multiple email accounts aren't a problem.
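    A minimal sketch of that kind of mbox-to-maildir conversion, assuming Python's standard mailbox module rather than dovecot's own processing (the poster's actual script is not shown). The function name and the paths in the usage note are hypothetical.

```python
import mailbox


def mbox_to_maildir(mbox_path, maildir_path):
    """Copy every message from an mbox file into a Maildir; return the count."""
    source = mailbox.mbox(mbox_path, create=False)       # fail if the mbox is missing
    target = mailbox.Maildir(maildir_path, create=True)  # creates cur/, new/, tmp/
    copied = 0
    try:
        for message in source:
            target.add(message)  # each message becomes its own file
            copied += 1
        target.flush()
    finally:
        source.close()
        target.close()
    return copied
```

    Something like mbox_to_maildir("old-mail.mbox", "Maildir-archive") leaves the original mbox untouched, so the copy can be checked before anything is deleted.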

  2. IMAP by sylvandb · · Score: 4, Informative

    Just IMAP it all.

    I went IMAP in 1997 and have never looked back.

    I've also used IMAP as a temporary conversion measure for people switching e-mail clients so even if you aren't sure, it makes a good first step.

    I don't understand the concern about too many e-mails. I can access my email back to 1992. With multiple folders it shouldn't be a problem and with modern indexing a search shouldn't be an issue.

    1. Re:IMAP by DNS-and-BIND · · Score: 4, Informative

      When I search for my brother's name I don't want to wait 30 seconds for a search to complete, nor do I want to see his emails from 10 years ago. I just want to see his last email that he sent about the trip we're taking next week. That's the concern about too many emails.

      --
      Shutting down free speech with violence isn't fighting fascism. It IS fascism!
    2. Re:IMAP by Anonymous Coward · · Score: 0

      Do you mean you run your own IMAP server, and put all the stuff on there? Or are you just saying you leave all your mail on the remote server? If it's the former, what program/setup do you use?

    3. Re:IMAP by kwerle · · Score: 5, Insightful

      This.

      I fired up imap servers for all my old mail.
      I fired up a modern mail client (OSX Mail.app) and connected to all of 'em and also to gmail.
      I dragged all my old email into gmail. In a GUI. And it worked.

      Done.

      I no longer run mailservers. Too much of a headache. gmail is awesome (with imap access, even). Indexing, instant searching, etc.

      If you don't want/trust your email to the cloud, then this isn't for you. Unless you want to run your own imap server with whatever backend suits you - then you can dump it all there. I just can't be bothered to manage that after 15+ years of doing so.

    4. Re:IMAP by AK+Marc · · Score: 1

      That's when you sort by sender, then look for his name, since it's all alphabetical. What I don't like is the lack of a convenient multifield sort. I want an email with "law" or "IRS" in it from between 1998 and 2000 sent directly to me, with nobody in the "cc" field.
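      A multifield query like that one can be sketched as a local filter, assuming the mail is available as parsed messages (for example from Python's mailbox module). The function names and the address in the usage note are hypothetical, and keywords are matched as whole words in the subject and text parts only.

```python
import re
from email.utils import getaddresses, parsedate_to_datetime


def message_text(message):
    """Return the subject plus every text/* part, decoded leniently."""
    chunks = [str(message.get("Subject", ""))]
    for part in message.walk():
        if part.get_content_maintype() == "text":
            payload = part.get_payload(decode=True) or b""
            chunks.append(payload.decode("utf-8", errors="replace"))
    return "\n".join(chunks)


def matches(message, my_address, keywords, first_year, last_year):
    """True if the message meets all four criteria at once."""
    raw_date = message.get("Date")
    if raw_date is None:
        return False
    try:
        sent = parsedate_to_datetime(raw_date)
    except (TypeError, ValueError):
        return False  # unparseable date: treat as no match
    if not first_year <= sent.year <= last_year:
        return False
    to_addrs = [a.lower() for _, a in getaddresses(message.get_all("To", []))]
    if my_address.lower() not in to_addrs:
        return False  # not sent directly to me
    cc_addrs = [a for _, a in getaddresses(message.get_all("Cc", [])) if a]
    if cc_addrs:
        return False  # somebody was in the cc field
    pattern = r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b"
    return re.search(pattern, message_text(message), re.IGNORECASE) is not None
```

      Calling matches(msg, "me@example.com", ["law", "IRS"], 1998, 2000) on each message of an archive would express the query above as a single test.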

    5. Re:IMAP by Anonymous Coward · · Score: 2, Interesting

      IMAP offers "server side" storage with "client side" viewing.

      Besides the privacy implications, are you seriously suggesting that the OP
      a) find or create an IMAP server,
      b) force feed that server all his archived emails (presuming that there is some way to bulk import email into the IMAP server), and
      c) change his current email setup so that, from now on, his email is sent to the mail server on which the IMAP server runs?

      How is that any easier to manage than his current predicament? ISTM that your suggestion forces him to go into a big "migration" phase, and change his email provider/provisioning. Not the simple solution that I would suggest.

    6. Re:IMAP by hairyfeet · · Score: 2

      Got a better idea? The guy has them in Lord knows how many programs splattered all over the place, he wants it all in one place and searchable and IMAP is good at that. I mean sure we would all prefer a "push button and it's done" kinda deal but AFAIK no such thing exists that will let him store it locally. You could run it all into Gmail but some folks have privacy concerns and of course it's not like Google hasn't lost stuff in the past, so given a complex situation like TFA I'd say IMAP is probably gonna be the least painful of the bunch.

      Hell he's already on Linux, not like adding an email server role in Linux is hard, millions of Linux boxes do that role every day. But if you got a better call Hoss I'd like to hear it, given the requirements I'd say IMAP would fit the bill closest with the least amount of hassles. But let's face it when you are dealing with a decade plus worth of data in different formats? It's never gonna be clean and easy, it just don't work that way.

      --
      ACs don't waste your time replying, your posts are never seen by me.
    7. Re:IMAP by arth1 · · Score: 3, Informative

      Pretty much everything not made by Microsoft will support export to good old mbox. It's a good format to store in, because you can always import from it into other formats.
      And you can run simple scripts against the mbox files.
      More than once, I've done a grep against my mail archive, and more than once I've moved it to a new machine and new mail software.
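      As one example of a simple script against an mbox file, here is a sketch that lists named attachments, which is one of the things the original question asks to search by. It assumes Python's standard mailbox module; the function name is hypothetical.

```python
import mailbox


def list_attachments(mbox_path):
    """Yield (sender, subject, filename) for each named attachment in an mbox."""
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for message in box:
            for part in message.walk():
                filename = part.get_filename()  # None for parts without a filename
                if filename:
                    yield (
                        str(message.get("From", "")),
                        str(message.get("Subject", "")),
                        filename,
                    )
    finally:
        box.close()
```

      Because it only reads the file, this can be run against an archive copy without any mail client installed.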

    8. Re:IMAP by Anonymous Coward · · Score: 0

      With just a few squares of toilet paper.

      “I propose a limitation be put on how many squares of toilet paper can be used in any one sitting.” - Sheryl Crow

    9. Re:IMAP by brendank310 · · Score: 1

      You're taking a trip with him next week and you don't remember his name? Stop worrying about email and see a doctor.

    10. Re:IMAP by skids · · Score: 1

      a) cyrus. done.
      b) there is. pine.
      c) also pretty easy.

      Personally I have 40K messages on an imaps store. A full text search does take a few minutes, but only because the hardware is utterly ancient -- less powerful than many modern cell phones. I should get around to upgrading that.

    11. Re:IMAP by Antique+Geekmeister · · Score: 4, Informative

      _NO_. Under no circumstances use "mbox" for mail storage, or anything other than a temporary stage on the way to transferring it to something contemporary and usable such as Maildir. If you lose that one mbox file, by file system corruption or by fat finger accident or overflowing a partition or in the process of merging new email with it, you've lost _all_ your mail in that mbox. And as you read, mark, or save mail, that file is constantly churning, making backup and replication of the mail spool far more dangerous and fragile, especially when the mail directory is bulky with years or decades of active mail threads or simply undeleted email.

      mbox was useful when the available inodes on a file system were limited, programs benefited from using a single inode for transactions, and backups occurred on magtape, but there has simply been no point to it in decades.

    12. Re:IMAP by DNS-and-BIND · · Score: 0, Offtopic

      Mozilla Thunderbird is amazingly slow. This shitting comment is mean-spirited, obvious flamebait and I have no idea why you got modded up to +5.

      --
      Shutting down free speech with violence isn't fighting fascism. It IS fascism!
    13. Re:IMAP by icebraining · · Score: 3, Insightful

      I agree with that for new emails, but for an archive file, none of it really applies. File system corruption and fat fingers should be handled by just restoring from backup, and merging / marking as read / etc is not really applicable for old mail, which should be accessed by either viewing it readonly or making a disposable copy.

      mbox might have its problems, but I don't think there's any good reason to spend time converting old files to Maildir.

    14. Re:IMAP by hobarrera · · Score: 3, Insightful

      Archive old emails by year:


      Archives/2013
      Archives/2012
      Archives/2011
      Archives/2010
      Archives/2009 ...

      Only search in the appropriate ones. Easy, right?
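      The per-year layout above could be produced from an existing mbox roughly as follows. This is a sketch that assumes Python's standard mailbox module and writes one mbox file per year; the function name and the "undated" bucket for messages without a parseable Date header are illustrative choices, not part of the suggestion.

```python
import mailbox
import os
from email.utils import parsedate_to_datetime


def split_by_year(mbox_path, archive_dir):
    """Copy messages into archive_dir/<year> mbox files; return {year: count}."""
    os.makedirs(archive_dir, exist_ok=True)
    source = mailbox.mbox(mbox_path, create=False)
    targets = {}  # year -> open mbox for that year
    counts = {}
    try:
        for message in source:
            try:
                year = str(parsedate_to_datetime(message["Date"]).year)
            except (TypeError, ValueError):
                year = "undated"  # missing or unparseable Date header
            if year not in targets:
                targets[year] = mailbox.mbox(os.path.join(archive_dir, year))
            targets[year].add(message)
            counts[year] = counts.get(year, 0) + 1
    finally:
        for box in targets.values():
            box.flush()
            box.close()
        source.close()
    return counts
```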

    15. Re:IMAP by hobarrera · · Score: 1

      It's a shame Gmail has no real filters, like (sieve).

      Yes, the web interface CAN create filters. But if you're using a desktop client, it's not comfortable to have to open a DIFFERENT client to configure filters.

      Gmail also lacks some imap features, notably, the sort command.

    16. Re:IMAP by peragrin · · Score: 2

      That's why I do both.

      for every day I use gmail

      but once a year I fire up a current email client and download everything archived. I also purge the archive every couple of years of truly useless emails.

      I keep a copy for safety and away I go.

      --
      i thought once I was found, but it was only a dream.
    17. Re:IMAP by Anonymous Coward · · Score: 0

      Then what IMAP client, preferably cross-platform, do you recommend, oh wise one?

    18. Re:IMAP by martin-boundary · · Score: 1, Funny
      Hello kwerle,
      Thank you for your efforts, we appreciate it.

      Sincerely,
      The FBI.

    19. Re:IMAP by arth1 · · Score: 5, Informative

      _NO_. Under no circumstances use "mbox" for mail storage, or anything other than a temporary stage on the way to transferring it to something contemporary and usable such as Maildir. If you lose that one mbox file, by file system corruption or by fat finger accident or overflowing a partition or in the process of merging new email with it, you've lost _all_ your mail in that mbox.

      Thus speaks ignorance. If you write corrupt data to a mbox file, nothing prior to the corruption is affected at all. Unlike most formats that don't store each mail in a separate file, you can also very easily run recovery against a mbox file. Heck, a one-liner perl script can retrieve anything from before and after a corruption.
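      The recovery idea can be sketched as follows. This is not the Perl one-liner the poster has in mind, only an illustration in Python of the same principle: split the raw file on "From " separator lines and keep whichever chunks still parse as mail. The function name and the rule for what counts as intact (both From and Date headers present) are assumptions.

```python
import email


def salvage_messages(raw):
    """Split raw mbox bytes on 'From ' separator lines; return chunks that still look like mail."""
    chunks, current = [], None
    for line in raw.splitlines(keepends=True):
        if line.startswith(b"From "):
            if current is not None:
                chunks.append(b"".join(current))
            current = []  # start a new chunk; the separator line itself is dropped
        elif current is not None:
            current.append(line)
    if current is not None:
        chunks.append(b"".join(current))

    salvaged = []
    for chunk in chunks:
        message = email.message_from_bytes(chunk)
        # Assumption: a chunk missing From or Date is part of the damaged region.
        if message["From"] and message["Date"]:
            salvaged.append(message)
    return salvaged
```

      A message whose body was overwritten would still pass this check, so the result is a starting point for manual review rather than a guarantee.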

      And "overflowing a partition"? Um, run that by us again. If you mean disk full, that doesn't truly affect a format that's made for appending. You won't be able to append. Any other format you can come up with will have the same problem.

      And for archival purposes, this also does not apply. You don't make changes to your archive. Period.
      And you back it up. Period.
      Which is a heck of a lot easier to do with mbox than most other formats.

      But again, the main strength is that it is so simple, which means that pretty much every mail program out there will support it, one way or another.
      Choosing a more modern format leaves you with fewer options, and less certainty that it will be supported in the future. 20 years down the road, mbox will still be supported. It has an RFC - http://tools.ietf.org/html/rfc4155

      Can you say the same about ANY other format? Maildir doesn't work on systems that don't allow colons in file names, and hashes the filename based on the hostname, which both isn't portable and crashes badly for many implementations if you have a non-ascii hostname. Not to mention that the format has balkanized, to the point that it's no longer compatible between implementations.

      Again, for archival purposes, simplicity is the key.

    20. Re:IMAP by adolf · · Score: 3, Interesting

      What's wrong with MBOX? I've been using it with gigabyte-sized folders for over a decade and nothing bad has ---@@@ From MAILER-DAEMON Fri Jul 8 12:08:34 2011

      Seriously, though. Even when mbox gets trashed due to disk corruption, it is still every bit as useable as a trashed maildir: Messages get lost, not whole folders...and even then, that's what backups are for (right?).

      Or at least, that was my experience over the past many moons. As a static archive (Sent-mail-2012), I can't think of a single thing wrong with mbox. And it's easier to cat to tape than maildir.

    21. Re:IMAP by Anonymous Coward · · Score: 0

      Wouldn't work for me, they only give you a bit more than 10GB last I checked.

    22. Re:IMAP by kwerle · · Score: 1

      Yeah.

      • The FBI cares about what you discuss
      • They can't possibly see what it is, already, because your system is super secure - as are all the systems your email travels through inbound and outbound.
      • Everyone you exchange email with wears the same brand of tinfoil hats you do, and there is no way they could be leaking data.
      • Storing stuff on gmail means you could not possibly use PGP (or similar)
    23. Re:IMAP by kwerle · · Score: 1

      It always made me crazy that I would create a filter on a client and it didn't work with other clients - so I'm happy to do my filter config on the server (using gmail's web UI).

      Do you use a client/server config that allows you to create filters using a client UI that then executes on the server? If so, what is it? I guess MS Exchange may do something like that?

    24. Re:IMAP by arth1 · · Score: 2

      Or at least, that was my experience over the past many moons. As a static archive (Sent-mail-2012), I can't think of a single thing wrong with mbox. And it's easier to cat to tape than maildir.

      You don't even have to create a new tape entry, just append to the old one.

      Not to mention that mboxes can be cat'ed together, as they get older. Like combining all dailies to a weekly, weeklies to a monthly, and monthlies to a yearly.
      What it doesn't have is indices, but nothing prevents you from importing and converting a mbox to an indexed format. Or creating an index for that matter.

      When archiving, the main point is that you want a format that is mature, long-term-supported, and easy to use tools against. What do you do with an Informix or Sybase binary archive today? Yet those admins who got laughed at for making humanly readable and script-parsable SQL dumps have no problem when they need to access 15 year old data.
      Same with e-mail. Go with simple, plain text, and well supported. I.e. mbox.
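      Creating an index for a set of mbox files might look like the sketch below. It assumes Python's standard mailbox and sqlite3 modules; the table layout and function name are hypothetical, and dates are stored as ISO strings exactly as parsed, without timezone normalization.

```python
import mailbox
import sqlite3
from email.utils import parsedate_to_datetime


def build_index(mbox_paths, db_path=":memory:"):
    """Record sender, subject and date of every message in a SQLite table."""
    db = sqlite3.connect(db_path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS mail "
        "(mbox TEXT, msg_key INTEGER, sender TEXT, subject TEXT, sent TEXT)"
    )
    for path in mbox_paths:
        box = mailbox.mbox(path, create=False)
        try:
            for key, message in box.iteritems():
                try:
                    sent = parsedate_to_datetime(message["Date"]).isoformat()
                except (TypeError, ValueError):
                    sent = None  # missing or unparseable Date header
                db.execute(
                    "INSERT INTO mail VALUES (?, ?, ?, ?, ?)",
                    (path, key, str(message.get("From", "")),
                     str(message.get("Subject", "")), sent),
                )
        finally:
            box.close()
    db.commit()
    return db
```

      The mbox files stay the archive of record; the index can be thrown away and rebuilt at any time, and the stored path and key say where to find the full message.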

    25. Re:IMAP by Anonymous Coward · · Score: 0

      he's making a point and expressing himself freely in doing so. it's not flame bait, you just wish it was so you can feel righteous about it.

    26. Re:IMAP by Yosho · · Score: 3, Interesting

      For what it's worth, my personal mail server is an Athlon X2 3800+, still running Ubuntu 8.04.4 LTS. Pretty old by today's standards. I've got a dovecot server offering IMAP access and a Roundcube webmail server on it. My inbox has about 25,000 messages in it, and there are e-mails in there that go back to 2007.

      Doing a search by sender took maybe 1 or 2 seconds, and the most recent e-mails came up right away since they're sorted by date.

      --
      Karma: Terrifying (mostly affected by atrocities you've committed)
    27. Re:IMAP by Anonymous Coward · · Score: 0

      What I would like is all the old eMail dating back to 1994 to maybe 2 years ago to some DVD+Rs
      and the HEADER info with "subject" and maybe the first 2 lines if I want to be over the top,
      in my local eMail database so I can search through that part and then the App will prompt me for that disc to put in the slot.

      and I got about 4 GB so far in gmail and 40 gigs in my .mac archive to just 5 years ago .

      So I miss my old eMail.... stuck on thunder and other eudoras

    28. Re:IMAP by kcbnac · · Score: 2

      Any web-based client *should* (unless they use a plugin or other weird config tool) save any filters, etc. to the web-based-profile (Like GMail does). Otherwise, if you choose Thunderbird (or any other sane client) you could just copy over the profile between installs. Even across OSes.

      I've successfully used a singular Thunderbird profile on both a Windows and Linux boot off the same machine; granted Linux access to the NTFS partition Windows sat on and it used that profile directory. Been copying it forward to new installs for a few years now.

      Migrating towards a VM (that gets backed up regularly) holding the 'core' stuff that doesn't sync well (Firefox and Chrome both do; so can run that on whatever) - then just use that VM for non-GMail email, and whatever else is worth consolidating down to one machine.

    29. Re:IMAP by martin-boundary · · Score: 0
      • Are you now or have you ever been a member of the Terrorist Party? Never mind, we'll just google your emails.
      • Why intercept emails retroactively in time, when those archives are being voluntarily put on servers we can google automatically from the office? Hey Schmidt, pass me a donut, will ya?
      • Meh, who says you're the target? Thanks for your data, now we can track that terrorist sympathising friend of yours!
      • PGP what now? We just care about your regular address book, we'll ask you to give us the PGP key after we find your suspicious activity from 10 years ago. You don't have anything to hide, do you?
    30. Re:IMAP by kthreadd · · Score: 0

      Mutt for command line, Mulberry for GUI.

    31. Re: IMAP by Anonymous Coward · · Score: 0

      mbox FTW.

    32. Re:IMAP by WillKemp · · Score: 3, Interesting

      [......] are you seriously suggesting that the OP
      a) find or create an IMAP server,

      Ridiculously simple. They're already running Linux, they just have to install dovecot and they've got a fully functional IMAP server (no configuration required) - which has access to all their local mail boxes.

      b) force feed that server all his archived emails (presuming that there is some way to bulk import email into the IMAP server)

      Ridiculously simple. Fire up Thunderbird, configure it to access your local IMAP server, select all, drag and drop.

      c) change his current email setup so that, from now on, his email is sent to the mail server on which the IMAP server runs?

      Why would they need to do that? Thunderbird (or other mail reader of choice) can access multiple accounts.
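      For anyone who would rather script step b) than drag and drop, here is a rough sketch using Python's standard mailbox module with a connection object from imaplib. The function name is hypothetical, and the host, credentials and folder in the usage comment are placeholders; the target folder is assumed to exist already.

```python
import mailbox
from email import message_from_bytes
from email.utils import parsedate_to_datetime


def upload_mbox(conn, mbox_path, folder):
    """Append each message of an mbox to an IMAP folder; return how many the server accepted."""
    box = mailbox.mbox(mbox_path, create=False)
    accepted = 0
    try:
        for key in box.iterkeys():
            raw = box.get_bytes(key)
            try:
                when = parsedate_to_datetime(message_from_bytes(raw)["Date"])
                if when.tzinfo is None:
                    when = None  # imaplib needs a timezone-aware datetime
            except (TypeError, ValueError):
                when = None  # let the server assign the internal date
            status, _ = conn.append(folder, None, when, raw)
            if status == "OK":
                accepted += 1
    finally:
        box.close()
    return accepted


# Usage (placeholders, not run here):
#   import imaplib
#   conn = imaplib.IMAP4_SSL("imap.example.com")
#   conn.login("user", "password")
#   upload_mbox(conn, "old-mail.mbox", "Archives/2004")
```

      Passing the original Date header as the internal date keeps old mail sorted where it belongs instead of showing up as received today.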

    33. Re:IMAP by CAIMLAS · · Score: 1

      This is a good recommendation, but it's got a couple "gotchas" along the road. Is he going to push it to the cloud (i.e. his current IMAP system), quite possibly losing access to it?

      As a mail administrator (cough), I've had to do all sorts of conversions. It can be a sticky mess, and mail migrated from one client to another sometimes won't preserve all the IMAP tags (or client specific tags).

      IMAP as a temporary conversion medium is most certainly going to be required, even if it's just through a google account.

      I would strongly, strongly recommend he take a look at the most excellently functional IMAP Tools by Rick Sanders, who has been most helpful to me in the past when I've needed features or stumbled upon bugs or missing features. It'll do things like mbox/Maildir, etc. cross-conversion, if he wants to keep things local, or he can push it up to IMAP. There's also imapsync, but my experience is that it's significantly lesser: it's more error prone, doesn't do full conversions (IMAP tags often/usually get lost), lacks half the functionality of IMAPTools, and more likely to produce a WTF situation when it does error.

      As for general approach, I personally keep everything in Maildir, segregated by year, on my file server. I can access this through a dovecot setup I've got which is really just plain Jane; I've got over 100GB of mail at this point, though I too lost much of my mail from the 90s. I keep the mail system in my MUA, but unsubscribe from the archived years.

      --
      ~/ssh slashdot.org ssh: connect to host slashdot.org port 22: too many beers
    34. Re:IMAP by CAIMLAS · · Score: 1

      I'm pretty sure he was talking about the (more common) filesystem corruption. BAM, you lose a single file (not uncommon) and that's your mbox? You're done.

      --
      ~/ssh slashdot.org ssh: connect to host slashdot.org port 22: too many beers
    35. Re:IMAP by houghi · · Score: 1

      I just delete them once in a while. Why would I keep mails that say "OK, I will see you later". Anything you say can and will be used against you.

      No idea what they will use it for, but I know it will not be in my favor.

      I generally throw away everything that I have not used in one or two years. Obviously there are some exceptions, but they are few and far between.

      --
      Don't fight for your country, if your country does not fight for you.
    36. Re:IMAP by Anonymous Coward · · Score: 1

      Yes, gmail is awesome, but you don't know when google will close the service like google reader.

    37. Re:IMAP by Sigg3.net · · Score: 1

      Bobby Tables?

    38. Re:IMAP by funkboy · · Score: 2

      Run your own IMAP server. For the past decade or so, Dovecot has by far & away become the best choice. If you've set up any other daemons before it's really not very complicated software.

    39. Re:IMAP by SomeKDEUser · · Score: 2

      Then use Kmail and behold the power of akonadi and nepomuk. searching in the later versions is fast -- and I have a very large number of mails.

      It turns out that for large enough collections you _do_ need a DB :)

    40. Re:IMAP by kwark · · Score: 1

      Exactly, my archive contains compressed mboxes and maildir folders spanning almost 20 years, split per year depending on the quantity of mail. Mutt has no problem with this mixed setup; in cases where mutt fails to find the mail I'm looking for (there are some folders that contain gpg encrypted messages) I can always fall back to the standard unix text search tools.

    41. Re:IMAP by Antique+Geekmeister · · Score: 2

      Thank you, yes, I was speaking of file system corruption. And because any ongoing email management, such as deleting or even marking as read or unread the old messages, causes change in the content of the mbox file, from the message edited onward, the claim that "it's only risking messages after the point where you edit" is disingenuous. I'm afraid it's not "ignorance" speaking, it's lengthy and painful experience.

      _If_ the mbox files are absolutely static, then mbox can be considered reasonably stable. But if the messages are resorted into new folders, or even worse if the oldest, earliest entries are ever deleted, then the contents of the mbox file _from that message forward_ have to be rewritten. There is no graceful way with most filesystems to simply "snip this 3095 characters content out of the middle between the start and end of this particular message". The means used can be fascinatingly clever and complex but normally involve overwriting _everything after the beginning of the removal_ with the remaining, preserved old content. And we could explore further what happens on the disk when you actually try to delete content after a certain point in a file, and how that churns the underlying filesystem itself, but it's heavily filesystem dependent.

      This means that touching the early entries, and any accidents that occur, corrupts anything after those early entries. It also means that touching those mbox files causes filesystem churn, because the files no longer match the old files and have distinct contents. Unless deletions or additions of the entries somehow align with the old blocks, even most deduplication based filesystems will fail to optimize. And the tendency of some old mbox users to keep _everything_ in simple, large mbox "folders" which are actually single mbox files compounds the issue with backup problems tied to very, very large files, and tied to small edits of those very, very large files causing churn in the backup system.

      Having an RFC for an older, simpler protocol does not make it ideal for modern use. mbox was useful when filesystems were distinctly slowed by many hundreds or thousands of files in one directory, and when the number of inodes available for your home directory and the ability to monitor or manage a mailbox in a consistent format was critical. But Maildir and various tools based on it have, correctly, replaced it. The filesystem issues are one critical reason, and the other is what Dan Bernstein talked about when he wrote Maildir: safe locking or transaction handling for multiple simultaneous client access. (See http://cr.yp.to/proto/maildir.html)

      Maildir successfully follows one of the critical lessons of robust programming. If you make only small changes, you make only small mistakes, and the message handling is vastly safer from adding, deleting, or relocating small files than from merging or extracting individual messages stored in a necessarily vulnerable single archive.

    42. Re:IMAP by hobarrera · · Score: 1

      Do you use a client/server config that allows you to create filters using a client UI that then executes on the server? If so, what is it? I guess MS Exchange may do something like that?

      There's a link to such a protocol in the comment you're replying to. Maybe you should have a look at it?
      Thunderbird has a plugin to do client-side configuration of these. I believe kmail includes this as well. I'm pretty sure there's plenty more clients since, as you can see in the link above, it's an IETF standard.

    43. Re:IMAP by jaak · · Score: 0

      I love all these "Use IMAP!" replies.

      That's like asking, "How should I archive all my old web pages?" and answering "Use HTTP!"

      IMAP is an email access protocol, not a storage or archiving format.

    44. Re:IMAP by Cajun+Hell · · Score: 3, Interesting

      Besides the privacy implications

      There aren't any privacy implications. If there were any, then you would have named one or mentioned an example. The situation prior to typing "sudo apt-get install dovecot" is that he had the data (so it's already subpoena-able or whatever you're trying to imply) and after that he'll also have the data. Nothing changes. Are you complaining that he's keeping the data rather than deleting it? I don't get your point at all.

      are you seriously suggesting that the OP a) find or create an IMAP server,

      Yes, because it's easy. This can be done in literally ten minutes. Maybe a little more if he doesn't already have good storage allocated for it (e.g. a Reiser formatted ~/Maildir, or whatever your own religion commands).

      b) force feed that server all his archived emails (presuming that there is some way to bulk import email into the IMAP server)

      Yes, because it's trivial. It's highly likely that whatever he is using to read each of his different mail archives, can also talk IMAP, because everything talks IMAP. You say "force feeding" as though literally selecting and dragging in a GUI, or picking "copy" or "import" off some menu, is hard. It's not.

      As for step c (changing how he receives email), I don't think that's being suggested but it may be a good idea. He can decide later, whether or not he wants his archive server to become his main/active server. That decision can wait and is not part of the scenario being discussed; it's an opportunity for the future.

      How is that any easier to manage than his current predicament?

      Because then he'll have his archive stored in a system that is specialized for handling the problem, accessible and searchable by any client he wishes to use, possibly even the very same tool he uses for his day-to-day non-archive mail reading. Or he can pick some other IMAP client if it handles mass/archive use case better than the routine use case. Everything Just Works, all together. All of his complexity and exceptions disappear. And at virtually no cost; there's no downside to counter any of the advantages.

      This is one of the easiest no-brainer Ask Slashdots, ever. There is one right objective best simple easy-and-fast-and-good(!) answer, and setting up an IMAP server is it. Probably because email storage is an old, very-solved problem.

      --
      "Believe me!" -- Donald Trump
    45. Re:IMAP by kwerle · · Score: 1

      Seriously?

      It's a protocol/language. It doesn't even list a single client or server implementation on the page. So I'll try again with you:

      Do you use a client/server config that allows you to create filters using a client UI that then executes on the server? If so, what is it?

    46. Re:IMAP by Anonymous Coward · · Score: 0

      "Storing stuff on gmail means you could not possibly use PGP (or similar)" ...That's only true for webmail, with current technology.

      But you can use Gmail in any email client, via either POP or IMAP. And most decent email clients do a nice job of supporting S/MIME or GPG encryption. I'm especially fond of mail.app on the Mac.

      But given the current state of U.S. Federal warrant policies, only a fool would keep email on Gmail's servers longer than six months. Some might argue only a fool would keep email on Gmail's servers for less than that, or use Gmail at all. But Gmail can be entirely safe if you encrypt. (Of course, encrypted emails can't benefit from Gmail's lovely search and instant recovery capabilities. That's kinda the point, of course.)

      Tinfoil hattery? Sure, but just because you're paranoid doesn't mean someone's not out to get you. Or your business data. And if you travel to China or other surveillance states, you're a damn fool if you don't encrypt anything potentially sensitive or valuable.

      And given the many recent hacks of certificate authorities, the CA-mediated trust system is looking increasingly shaky, so manage your own trust using GPG or something of the sort. GPGTools was released for Mountain Lion just a couple days ago, by the way, and is super-slick, and free/open-source. (www.GPGTools.org)

      Meanwhile, to address the OP's question: There are various ways of backing-up your gmail locally in easily searchable form; among the slickest is CloudPull (http://www.goldenhillsoftware.com) which maintains its own store of archived emails which can be searched easily and opened in your mail client if desired.

    47. Re:IMAP by hobarrera · · Score: 1

      I formerly used Thunderbird + http://sieve.mozdev.org/.
      Nowadays, I use mutt and haven't needed to edit my filters yet (mutt doesn't support sieve, regrettably).

      My server is dovecot.

      Here's a more complete list of clients that support sieve: http://sieve.info/clients, since thunderbird's plugin wasn't too good.

    48. Re:IMAP by Anonymous Coward · · Score: 0

      Creating an imap server is easy as pie. Running courier imap for more than a decade here. Haven't bothered with tuning it one bit since almost as long (I did at first, to learn, but since then, straight apt-get install).

      Mail is only accessible from my home network. Running VPN off of the fritz box. IPSec. All clickedy clack easy setup.

      What not to like? I have switched mail clients at least five times in that time frame. Even read my mail using vi on the server (maildir ever since the beginning, when I used qmail. Now using straight apt-get postfix) for a few days when computer was borked.

      All of this is fed by fetchmail getting multiple addresses I accumulated over the years.

    49. Re:IMAP by Anonymous Coward · · Score: 0

      I can do that. I use Eudora. Yes, the defunct and abandoned piece of software from Qualcomm.

      It's simple, feature packed, searching is fast, and it just plain works.

    50. Re:IMAP by hairyfeet · · Score: 1

      Yeah but read TFA again the guy wants ALL his email, and when I read all I generally take all to mean...well all, as the ones coming NOW and the ones coming THEN which means churn which means bad idea.

      Again, no choice here is gonna be really elegant or "push button and done" because it's a messy job, but given the reqs of TFA I'd say IMAP fits the bill closest with the least risks. Remember he wants the WHOLE THING searchable, which means new AND old have to be in the same place, so MBox would not be the safest move here; IMAP is the better choice.

      --
      ACs don't waste your time replying, your posts are never seen by me.
    51. Re:IMAP by WuphonsReach · · Score: 1

      Run IMAP with a local cached copy of everything. Thunderbird searches will search locally unless you tell it to search on the IMAP server.

      And emails older than a year should be split out into annual archive folders. Which is a fast and dirty way of organizing. If you remember that you talked about something in 2011, it's probably easier to find with good subject lines / searching than trying to keep track of a complex hierarchy of folders.

      Worst case, you have to search both the 2010 and 2012 folders to find something.
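
      A minimal sketch of the annual-folder split, assuming the old mail sits in an mbox file and the archive is a set of per-year Maildir folders. The function names and the "undated" bucket are illustrative assumptions, not something the comment specifies; only Python's standard library is used.

```python
import mailbox
import os
from email.utils import parsedate_to_datetime


def year_of(message):
    """Return the year from the Date header, or None if it cannot be parsed."""
    try:
        return parsedate_to_datetime(str(message.get("Date") or "")).year
    except (TypeError, ValueError):
        return None


def archive_by_year(mbox_path, dest_root):
    """Copy each message into dest_root/<year>/ (a Maildir); return counts per folder."""
    os.makedirs(dest_root, exist_ok=True)
    folders = {}   # folder name -> open Maildir
    counts = {}    # folder name -> number of messages copied
    source = mailbox.mbox(mbox_path, create=False)
    try:
        for message in source:
            year = year_of(message)
            name = str(year) if year is not None else "undated"
            if name not in folders:
                folders[name] = mailbox.Maildir(os.path.join(dest_root, name), create=True)
            # MaildirMessage() carries over the read/answered flags from the mbox copy.
            folders[name].add(mailbox.MaildirMessage(message))
            counts[name] = counts.get(name, 0) + 1
    finally:
        source.close()
    return counts
```

      Messages with a missing or unparseable Date header end up in one catch-all folder rather than being dropped, so nothing is lost if the split is imperfect.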

      Thunderbird also has a "fast filter" system where you can quickly filter a folder based on keywords in either the subject, sender, receiver or body (or all of the above).

      --
      Wolde you bothe eate your cake, and have your cake?
    52. Re:IMAP by kmoser · · Score: 0

      I'm still waiting for Eudora's awesome search feature to be added to Penelope.

    53. Re:IMAP by Anonymous Coward · · Score: 0

      Print them. Done. If you don't want to waste paper, print to a PDF file instead. If you think PDF files might not be readable in the future, save them as ODF, or whatever your favorite thing is. I prefer these methods to Save-As HTML, because I either lose images, or end up with a file AND a folder with all the images separate, which to me is sloppy and unnecessary.

      People who want to leave shit on gmail are obviously unconcerned with safety, security, or privacy.

      What happens when enough people regard e-mail the way they now regard snail-mail? It's already happening, more and more people use Shitter and Facefuck for their communication needs, or their text-messages, etc. One day in the not-too distant future, (I expect to see at least one story about this two days from now... but I mean for real, in the not too distant future,) Google may decide to terminate gmail, just the way it has killed so many other services when it realizes they're no longer profitable.

      Gmail has gotten to be such a pain in the ass with Google insisting on changing the format and forcing things on people so much I'm considering abandoning it myself. If they ever do away with plain-html mode, seriously, I will probably leave them.

    54. Re:IMAP by AK+Marc · · Score: 0

      I'm sure I still have a working copy of Eudora running around somewhere. The modern ones work, but you essentially have to write a script in the search field. I want it easy and usable. I haven't used Eudora since I used ISP emails for my email (pre hotmail/gmail).

    55. Re:IMAP by icebraining · · Score: 1

      They're orthogonal; IMAP is just a protocol, not a storing mechanism. You can just add an MBOX mailbox to the IMAP server's configuration if you have one.

    56. Re:IMAP by kwerle · · Score: 1

      As a user, how would you rate your experience of sieve as a tool? Was it easy? Easy like configuring gmail filters?

      Did it pretty much do everything you wanted? (sounds like it from the fact you've not recently edited filters)

    57. Re:IMAP by smoore · · Score: 1

      Real easy. I remember that email with the address of the friend who moved away a few years ago was in the 2009 folder. Or was it the 2008? Or the 2010 folder?

      --
      Shawn Moore http://www.teuse.net
    58. Re:IMAP by hobarrera · · Score: 1

      So, you want to quickly search for the email, without waiting too many seconds, but you've no idea from what year the email is? I'm assuming you're already indexing it, so sorry, the only solution is to buy faster SSDs, faster RAM, and faster CPU to do the search faster.

    59. Re:IMAP by BitZtream · · Score: 1

      Cyrus IMAP supports server side filtering via sieve, and it can be controlled via the client if your client supports it. Many do, Thunderbird included.

      Cyrus also supports server side searches, so you can search all your years of email in a few seconds via a server side index. Even collections like mine that date back to the late 90s and include over 10GB of email.

      --
      Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
  3. Use the IMAP by joel48 · · Score: 2

    Use the IMAP server - if you have control and/or space available.

    I just have a single large archive IMAP folder into which everything that isn't spam gets pushed. You could optionally create subfolders for time ranges (every 1-2 years, whatever works for you). Using dovecot with good indexing support on the backend, quick searching has been great. If you do a sub-archive breakout on time, the searches will be quicker; you could also then create a virtual mailbox combining them all for when a search really needs to span time (and take a good chunk longer).

    There are scripts/utilities available to push mbox, etc. into an IMAP folder, push everything there and use it.
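
    As a rough illustration of what such a script might look like, here is a sketch that appends every message from an mbox file to an IMAP folder using Python's standard `imaplib` and `mailbox` modules. The function names are made up for this example, and `imap` is assumed to be a connection you have already opened and logged in to (for instance with `imaplib.IMAP4_SSL`); host, credentials, and folder naming are left to you.

```python
import imaplib
import mailbox
import time
from email.utils import parsedate_to_datetime


def internal_date(message):
    """Build an IMAP date string from the Date header, falling back to 'now'."""
    try:
        stamp = parsedate_to_datetime(str(message.get("Date") or "")).timestamp()
        return imaplib.Time2Internaldate(stamp)
    except (TypeError, ValueError, OverflowError, OSError):
        return imaplib.Time2Internaldate(time.time())


def upload_mbox(imap, mbox_path, folder):
    """Append each message in the mbox to `folder`; return how many were uploaded."""
    imap.create(folder)  # servers answer "NO" if it already exists, which is ignored here
    box = mailbox.mbox(mbox_path, create=False)
    uploaded = 0
    try:
        for key in box.keys():
            raw = box.get_bytes(key)  # message bytes without the mbox "From " separator
            status, detail = imap.append(folder, None, internal_date(box[key]), raw)
            if status != "OK":
                raise RuntimeError(f"append failed for message {key}: {detail!r}")
            uploaded += 1
    finally:
        box.close()
    return uploaded
```

    Passing the original Date as the internal date keeps server-side sorting by date meaningful; without it, every old message would appear to have arrived on the day of the upload.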

  4. Maildir by the+eric+conspiracy · · Score: 2

    I have all my personal email from 1998 in a Maildir directory with Dovecot as the server on a dual core Atom server running Centos. About 900 MB worth.

    Plenty fast.

     

    1. Re: maildir by shaiay · · Score: 2

      This! Also, for fast Maildir searches, have a look at mairix

    2. Re:Maildir by hedwards · · Score: 3, Interesting

      The main problem that I personally have is where I get emails from people that reference images and such from other servers. Most of the time it's commercial messages that I delete, but sometimes there's a newsletter that I want to save, and the images themselves turn out to be relatively important. Kind of annoys me to have to print them to PDF so that the formatting gets preserved.

    3. Re:Maildir by Anonymous Coward · · Score: 0

      You could always create a script which detects such incoming emails and passes all URLs to wget. Of course, then you would be quite vulnerable to spam (showing them that your account exists and is active) and other malicious email (links to zero-days etc). Also, by visiting those links you give out your IP-address and a lot of information about your email client.
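
      A cautious sketch of the first half of that idea: it only lists the remote image URLs referenced by a message's HTML part and does not fetch anything, so the tracking and malware concerns above only come into play if you later hand the list to a downloader. The helper name `remote_image_urls` is an assumption for illustration.

```python
from email import message_from_bytes, policy
from html.parser import HTMLParser


class _ImageSources(HTMLParser):
    """Collect http(s) URLs found in <img src="..."> attributes."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        for name, value in attrs:
            if name == "src" and value and value.lower().startswith(("http://", "https://")):
                self.urls.append(value)


def remote_image_urls(raw_message):
    """Return remote image URLs referenced by the HTML parts of a raw message."""
    message = message_from_bytes(raw_message, policy=policy.default)
    urls = []
    for part in message.walk():
        if part.get_content_type() == "text/html":
            parser = _ImageSources()
            parser.feed(part.get_content())
            urls.extend(parser.urls)
    return urls
```

      Images embedded in the message itself (`cid:` references) are skipped on purpose, since those are already stored with the email.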

    4. Re:maildir by thogard · · Score: 1

      Is Mr Slippery from True Names? If that is the case, wouldn't you want to avoid anything related to mailman?

      True Names by Vernor Vinge is a story that starts out with a guy being questioned by the police since he had to be up to no good since he had more CPU and storage than normal people.

    5. Re:Maildir by Anonymous Coward · · Score: 0

      Considering that it's just some emails, it makes more sense to have a manual program that does it just on the particular emails that I want. But, it's bullshit that so many people break email like that.

  5. Look to the past by jaak · · Score: 2

    Trying to figure out what formats will be available in the future is pretty hard, it's easier to see what formats have been around a long time and are still in use.

    As such, two formats come up readily:

    mbox http://en.wikipedia.org/wiki/Mbox and maildir http://en.wikipedia.org/wiki/Maildir

    1. Re:Look to the past by AK+Marc · · Score: 3, Interesting

      And PST.

    2. Re:Look to the past by Anonymous Coward · · Score: 1

      Yes, actually (though I see what you did there). Everyone appears so focused on patching together custom open source solutions (yes, it is /.), that I see no one has mentioned outlooks pst's.
      I have many many GB from each work assignment, and the new versions of outlook have whopping powerful indexing so I can quickly find obscure emails from 4 years ago. And I have had occasion to do so.
      I acknowledge the awesomeness of gmail in particular, but I don't want google to have the only copy of my emails, and don't trust them to not mine the data and give away the answers, so cloud is out for me, at least for archives.
      And as for a format that is future proof, given that Outlook is the dominant email client on the planet, I feel confident that importing psts will be a feature of at least 2 gens of clients -after- everyone's moved on from MS.

    3. Re:Look to the past by AK+Marc · · Score: 1

      I wasn't trying to be cute or tricky, but just pointing out what is probably the most highly used mailbox store. And like you noted, sometimes Slashdot ignores the obvious microsoft solution to promote open source. I was making no judgment, just pointing out another option.

    4. Re:Look to the past by dcollins117 · · Score: 1

      Well, to be fair, the OP states "I run Linux" right up front. So, it's not like anyone is pushing an agenda, it's more like we're answering his question appropriately.

    5. Re:Look to the past by Anonymous Coward · · Score: 0

      And PST.

      Ah, the ubiquitous Office pst file. In just a few short months, you too can be babysitting an inscrutable 20GB file of archived mail, hidden somewhere in your Windows user directories, that sporadically disappears from your client, gets corrupted for no discernible reason, and needs to be rebuilt monthly.

  6. Convert it to one format by CrypticSpawn · · Score: 1

    Convert all your mail to maildir and keep it on your home filesystem. Whenever you need access, from wherever you are, connect to your home VPN and then to the filesystem. I have an account in Thunderbird where I can search, or do whatever I want to it. Seems to work well.
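
    One possible way to do the conversion, sketched with Python's standard `mailbox` module: it merges a mix of mbox files and existing Maildir folders into a single Maildir. The function names and the "has a cur/ subdirectory" test for recognising a Maildir are assumptions made for this example; no de-duplication is attempted.

```python
import mailbox
import os


def open_source(path):
    """Open `path` as a Maildir if it looks like one, otherwise as an mbox file."""
    if os.path.isdir(os.path.join(path, "cur")):
        return mailbox.Maildir(path, create=False)
    return mailbox.mbox(path, create=False)


def merge_into_maildir(source_paths, dest_path):
    """Copy every message from the given sources into one Maildir; return the count."""
    dest = mailbox.Maildir(dest_path, create=True)
    copied = 0
    for path in source_paths:
        source = open_source(path)
        try:
            for message in source:
                # Converting to MaildirMessage keeps flags such as "seen" where possible.
                dest.add(mailbox.MaildirMessage(message))
                copied += 1
        finally:
            source.close()
    return copied
```

    The sources are only read, never modified, so the originals can be kept as a fallback until the merged archive has been checked.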

    1. Re:Convert it to one format by vajorie · · Score: 1

      I thought thunderbird didn't support the maildir format?

  7. Just dump them by sk999 · · Score: 5, Interesting

    Had the same need 20 years ago when migrating from VAX/VMS to Unix. The old emails were saved in a not quite readable format, but I figured I could recover them if necessary. In the end, never bothered. Yes, there are a few (actually, only two) that I'd like to resurrect now, but life moves on.

    1. Re:Just dump them by Anonymous Coward · · Score: 0

      Just to add to what you're saying: it takes time to maintain archives, particularly if those archives are large. If you aren't using those archives, and it sounds like you aren't, then any time spent maintaining the archive is time taken out of your life.

    2. Re:Just dump them by foniksonik · · Score: 1, Insightful

      This. Don't be a data hoarder. Go through them if you must and re-mail the best/important ones. Then dump it to dev/null and move on.

      Do the same with movies, books, bookmarks, photos, apps, docs, etc. you'll be happier without all that baggage. Music of course is another story. Keep that forever and only toss out the dreck (those extra songs on that album you bought because singles didn't exist yet).

      --
      A fool throws a stone into a well and a thousand sages can not remove it.
    3. Re:Just dump them by tutufan · · Score: 2

      Yeah, that's about where I'm at, too. I think of those Buddhist monks making the sand paintings (which they then sweep away). It's an exercise in recognizing the impermanence of all things.

    4. Re:Just dump them by hedwards · · Score: 1

      For mail and bookmarks that makes little sense. It would take me more time to make those decisions than it's really worth. Emails and bookmarks take up such a small amount of space that it's not really worthwhile to worry about.

      Now, general filesystem files, that's a different matter, I used to have a system where I only backed up things I cared about, and let filesystem crashes wipe out the rest of the data. Seemed to work out just fine.

    5. Re:Just dump them by icebraining · · Score: 1

      How large can a personal email archive really be? Even if you're storing the equivalent of a copy of War and Peace per day, for ten years, that's just a couple dozen gigabytes. Nowadays, it's hardly any trouble archiving that.
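
      The estimate can be checked with a few lines of arithmetic. The figure of roughly 3 MB for a plain-text copy of War and Peace is an assumption (it varies with edition and encoding), and the function name is made up for this sketch.

```python
BOOK_BYTES = 3_000_000  # assumption: one plain-text copy of War and Peace, roughly 3 MB


def archive_size_gb(bytes_per_day, years):
    """Total archive size in gigabytes (10**9 bytes) for a constant daily volume."""
    return bytes_per_day * 365 * years / 1e9


ten_year_total = archive_size_gb(BOOK_BYTES, 10)  # about 11 GB under the assumption above
```

      Under that assumption the total is on the order of ten gigabytes of raw text; MIME encoding and HTML copies would inflate it somewhat, but it stays in the range the comment describes and fits on any modern disk.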

    6. Re:Just dump them by dcollins117 · · Score: 1

      Don't be a data hoarder.

      Don't tell me what to do with my data.

      Archived emails are an excellent way to track your own history. They often contain data about your life that simply doesn't exist anywhere else. I've used mine to answer important questions about what medications I've taken before, who I've talked to about business problems, etc. Knowing exactly what was said and the date it was said on is extremely important data to me. I'm archiving it.

  8. Transfer it all to imap by Antique+Geekmeister · · Score: 1

    Translate it _all_ to IMAP services, in MAILDIR format if available. I've repeatedly been faced with clients, partners, and colleagues who use their email as their institutional memory and need to migrate to a new service. There are few technologies as straightforward, and robust, as a simple IMAP server running a light, uncluttered IMAP daemon such as "dovecot", without the complex and unnecessary requirements of a Cyrus IMAP daemon, and most _definitely_ without the complex support requirements of an Exchange, Zimbra, or other corporate grade mail service.

    The primary technology difficulty of this approach is in slurping the mail from your numerous external sources and getting it into the consistent layout. Use folders, not database folders but actual directory folders to separate them. Split them by year to reduce the size of the bulkiest folders. (which MAILDIR does very well). The secondary difficulty is a robust offsite backup policy, so that a hardware or system error does not lose this personal treasure trove of data.

    1. Re:Transfer it all to imap by Anonymous Coward · · Score: 1

      "There are few technologies as straightforward, and robust, as a simple IMAP server running a light"

      How about concatenating all your files into a single mbox file. I wrote a MIME parser using ragel which could parse _and_ generate data structures for mbox messages faster than they could be loaded from the SLC SSD. It would be trivial to index them. If you want something which will stand the test of time, you want a _single_ file. Directories or folders may seem primitive, but they're not necessarily going to stick around. GMail doesn't even use maildir. All your messages are just blobs in complex data structures spread over multiple servers.
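
      To give a feel for how simple such an index can be, here is a sketch that records the byte offset and length of each message in an mbox file. It assumes the common convention that every message starts with a line beginning with "From " and that such lines inside message bodies have been escaped (as ">From "); it is not the ragel-based parser the comment mentions, and the function names are illustrative.

```python
def index_mbox(path):
    """Return a list of (offset, length) pairs, one per message in the mbox file."""
    starts = []
    offset = 0
    with open(path, "rb") as handle:
        for line in handle:
            if line.startswith(b"From "):
                starts.append(offset)
            offset += len(line)
    # Each message runs from its own separator line to the next one (or to end of file).
    ends = starts[1:] + [offset]
    return [(start, end - start) for start, end in zip(starts, ends)]


def read_message(path, entry):
    """Read one message's raw bytes, including its "From " separator line."""
    offset, length = entry
    with open(path, "rb") as handle:
        handle.seek(offset)
        return handle.read(length)
```

      With the offsets stored alongside the archive, any single message can be pulled out with one seek instead of rescanning the whole file.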

      Also, if you think dovecot is light and uncluttered, then you've been drinking too much of the kool-aid. The whole "dovecot" is faster, safer, more reliable than UW-IMAP or Courier IMAP is marketing. Pro-tip: they all suck. They're all composed of horribly messy code. So... pick your poison.

    2. Re:Transfer it all to imap by icebraining · · Score: 1

      If you want a single file, you can just use tar.

      Directories or folders may seem primitive, but they're not necessarily going to stick around.

      Abstracting the underlying storage (which differs anyway between systems), a directory is just a set of files. They'll exist for as long as files themselves exist, even if they have different names, and they'll certainly be here long after the SMTP headers (used by both Maildir and mbox) are long dead and buried.

      GMail doesn't even use maildir.

      They don't use mbox either.

    3. Re:Transfer it all to imap by epyT-R · · Score: 1

      um.. having one file per message is a lot more portable than some concatenated mess that needs parsing. directories and files aren't going anywhere. While they might be replaced by database structures....oh wait that's right, filesystems ARE databases.

    4. Re:Transfer it all to imap by Immerman · · Score: 1

      A potentially excellent idea IF you can guarantee that one single file won't ever suffer from data corruption. Maildir or other multi-file formats will have a bit more overhead in terms of space and performance, but is far more resistant to data corruption. Good indexing eliminates most of the performance difference, and for my money I'll take data robustness over space any day, at least for personal files. Sure, regular backups are even better, but I know very few people that are actually good about doing that.

      --
      --- Most topics have many sides worth arguing, allow me to take one opposite you.
    5. Re:Transfer it all to imap by arth1 · · Score: 1

      If you want a single file, you can just use tar.

      And then you need to extract the tar, and your current file system doesn't support colons in file names. Oops.

      A single file where the file name is irrelevant is by far the most portable option.

    6. Re:Transfer it all to imap by tomtomtom · · Score: 1

      A potentially excellent idea IF you can guarantee that one single file won't ever suffer from data corruption. Maildir or other multi-file formats will have a bit more overhead in terms of space and performance, but is far more resistant to data corruption. Good indexing eliminates most of the performance difference, and for my money I'll take data robustness over space any day, at least for personal files. Sure, regular backups are even better, but I know very few people that are actually good about doing that.

      But if the "cur" directory in the Maildir is corrupted then you're back to the same problem; in fact potentially worse depending on how resistant to corruption your filesystem is (will that trash the whole directory or just a part of it? If a single file has corruption in the middle of it can you still read before and after the corruption?). The correct solution to corruption concerns like that is to make good backups. Maildir has advantages over mbox in terms of the consequences of a crash/segfault/whatever while your mail client is writing the mailbox causing an issue but for an archive you won't ever write to it so this shouldn't be an issue.

    7. Re:Transfer it all to imap by icebraining · · Score: 1

      man tar

      --transform, --xform EXPRESSION
                            use sed replace EXPRESSION to transform file names

    8. Re:Transfer it all to imap by arth1 · · Score: 1

      man tar

      --transform, --xform EXPRESSION
                            use sed replace EXPRESSION to transform file names

      Then you don't use tar anymore, you use GNU tar. I admin at least a dozen systems where that isn't the case. Depending on a specific program must be the first "no" of archiving anything for posterity.

      You could write a sh script for transforming the file names through standard commands with standard options, and bundle that with your Maildir tar, but that wouldn't solve the real problem: The file names aren't cross system compatible in the first place - changing them won't automatically make Maildir software work with the new file names.

    9. Re:Transfer it all to imap by icebraining · · Score: 1

      I'm not depending on any specific program, I'm just giving the easiest way of doing it. Manually transforming the filenames is easy anyway, and even that is only important under the assumption that I'm working with a broken filesystem.

      The file names aren't cross system compatible in the first place - changing them won't automatically make Maildir software work with the new file names.

      If the Maildir software is running on a system which doesn't support colons, it necessarily has to support some colon-less format. If the Maildir software is running on a system that supports colons, why the hell wouldn't I extract the files there?
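
      For what it's worth, the renaming does not have to depend on GNU tar. A sketch using Python's standard `tarfile` module is shown below; it replaces colons in member names while unpacking. The function name and the choice of "!" as the replacement character are assumptions for illustration, and whether a given Maildir reader accepts the renamed files is a separate question, as the thread points out.

```python
import os
import shutil
import tarfile


def extract_without_colons(tar_path, dest_dir, replacement="!"):
    """Unpack a tar archive, replacing ':' in names; return the file names written."""
    written = []
    dest_root = os.path.realpath(dest_dir)
    with tarfile.open(tar_path) as archive:
        for member in archive:
            safe_name = member.name.replace(":", replacement)
            target = os.path.realpath(os.path.join(dest_root, safe_name))
            # Refuse entries that would land outside the destination directory.
            if os.path.commonpath([dest_root, target]) != dest_root:
                continue
            if member.isdir():
                os.makedirs(target, exist_ok=True)
                continue
            if not member.isfile():
                continue  # links and special files are skipped in this sketch
            os.makedirs(os.path.dirname(target), exist_ok=True)
            with archive.extractfile(member) as source, open(target, "wb") as out:
                shutil.copyfileobj(source, out)
            written.append(safe_name)
    return written
```

      File contents are copied byte for byte; only the names change, and permissions and timestamps are not restored.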

    10. Re:Transfer it all to imap by Immerman · · Score: 1

      Not really, have you never had to recover data from a corrupted directory before? All the files are still present and pristine, you've only lost (part of) the index. Raw disk analysis and repair can typically do a pretty good job of recreating the index, especially in modern filesystems that contain redundant organizational data such as bi-directional linked lists. You may still see some actual data corruption, but likely far less than in a consolidated file where a sequence of corrupted data near the beginning may significantly alter the interpretation of everything that follows*.

      You're quite right about actively modified files being more vulnerable to corruption than a read-only archive, but bit-rot still occurs in all common storage media, and the least-modified data is the most vulnerable to that. Moreover a read-only mbox file is still vulnerable to corruption of the directory containing it, or any parent directory thereof.

      *Disclaimer, I have no idea what the internal structure of an mbox file is, perhaps it contains a lot of redundant structural data itself. I do suspect though that there are far more powerful tools designed to repair corrupted filesystems than corrupted mbox files.

      --
      --- Most topics have many sides worth arguing, allow me to take one opposite you.
    11. Re:Transfer it all to imap by arth1 · · Score: 1

      If the Maildir software is running on a system which doesn't support colons, it necessarily has to support some colon-less format. If the Maildir software is running on a system that supports colons, why the hell wouldn't I extract the files there?

      You confuse system with file system.
      The system may very well support colons in file names, even though the file system doesn't.

      A probably typical use for a mail archive would be to keep it on a transportable hard drive with a file system that can be understood by most operating systems. Like, for instance, NTFS.
      So your local Maildir reading software may very well understand and expect the colons, while your Maildir can't have them because they're on a file system that uses the colons to mark a stream, a concept your normal file system doesn't have (ext2/3/4) or supports differently (xfs).

      Maildir wasn't built for compatibility or portability. That the filenames are hashes that include the hostname should give you a clue that it wasn't designed to be moved.
      (This also creates problems when using traditional NFS-mounted mail and home directories, but that shouldn't be an issue for archival)
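
      Some Maildir software does offer an escape hatch for this. As one example, Python's standard `mailbox.Maildir` class documents a `colon` attribute that can be set to another character for file systems that reject ":". The sketch below uses it; the wrapper function name and the "!" separator are assumptions for illustration, and other mail tools would need to be told about the same substitution.

```python
import mailbox


def open_colonless_maildir(path, separator="!"):
    """Open (or create) a Maildir whose file names use `separator` instead of ':'."""
    box = mailbox.Maildir(path, create=True)
    box.colon = separator  # per-instance override of the flag separator
    return box
```

      A message stored as seen then gets a name ending in `!2,S` rather than `:2,S`, and the same setting is needed again whenever the folder is reopened.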

  9. Use a database! by cosm · · Score: 5, Interesting

    I'm a big fan of throwing together a DB when I want to store things categorically like that and want fast searches. If you are up to the task, hunt down some tools/roll your own so that you have a nice relational database and some stored procedures for getting what you want when you need it.

    You could export your emails to some parsable format, write an importer to extract the basics that you want to keep (from/to/subject/body,attachments/entire binary blob/etc) and then bulk insert that mess into on a mysql/sql server tucked away somewhere locally or "in the cloud" (EC2, Azure). Just another option as I'm sure you'll see here many here. At least with this route you are in full control of how you index, what you can search, encryption, performance, level of backups, etc. Maybe not the best way for some but I know if I had over 100000 emails that I wanted searchable very very quickly with advanced SQL like searching, this would be a cool way to do it (time permitting). Good luck! And to the pedantry to ensue...Yes. Good day.
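
    A small sketch of that importer, with SQLite swapped in for the MySQL/SQL Server mentioned above so the example stays self-contained (it ships with Python). The table layout, column names, and function names are assumptions made for illustration; the input is assumed to be an mbox file.

```python
import mailbox
import sqlite3
from email.utils import parsedate_to_datetime

SCHEMA = """
CREATE TABLE IF NOT EXISTS messages (
    id          INTEGER PRIMARY KEY,
    sender      TEXT,
    recipient   TEXT,
    subject     TEXT,
    sent_at     TEXT,   -- ISO 8601 timestamp, NULL when the Date header is unusable
    attachments TEXT,   -- attachment file names, one per line
    raw         BLOB    -- the complete original message
)
"""


def iso_date(message):
    """Return the Date header as an ISO 8601 string, or None if it cannot be parsed."""
    try:
        return parsedate_to_datetime(str(message.get("Date") or "")).isoformat()
    except (TypeError, ValueError):
        return None


def attachment_names(message):
    """List the file names of all parts that declare one."""
    return [part.get_filename() for part in message.walk() if part.get_filename()]


def import_mbox(db, mbox_path):
    """Insert every message of an mbox file into the messages table; return the count."""
    db.execute(SCHEMA)
    box = mailbox.mbox(mbox_path, create=False)
    count = 0
    try:
        for key in box.keys():
            msg = box[key]
            db.execute(
                "INSERT INTO messages (sender, recipient, subject, sent_at, attachments, raw)"
                " VALUES (?, ?, ?, ?, ?, ?)",
                (str(msg.get("From") or ""), str(msg.get("To") or ""),
                 str(msg.get("Subject") or ""), iso_date(msg),
                 "\n".join(attachment_names(msg)), box.get_bytes(key)),
            )
            count += 1
    finally:
        box.close()
    db.commit()
    return count


def search(db, sender_like="%", subject_like="%"):
    """Return (sent_at, sender, subject) rows matching the given LIKE patterns."""
    return db.execute(
        "SELECT sent_at, sender, subject FROM messages"
        " WHERE sender LIKE ? AND subject LIKE ? ORDER BY sent_at",
        (sender_like, subject_like),
    ).fetchall()
```

    Keeping the raw message in a BLOB column means the original can always be exported again, so the extracted columns are only a search aid rather than the sole copy.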

    --
    'We are trying to prove ourselves wrong as quickly as possible, because only in that way can we find progress.' RPF
    1. Re:Use a database! by Anonymous Coward · · Score: 0

      Sheesh, talk about a solution in search of a problem.

    2. Re:Use a database! by zachary.grafton · · Score: 1

      Using a database would be a cool option, but he could also write a Hadoop Map/Reduce job to transform the various input formats into a standard XML format that he could then load into the Apache SOLR search server. It'd save a lot of time trying to mess with the SQL for the search function. Also, he'd get basically free replication if he set up a small Hadoop cluster...

    3. Re:Use a database! by beodd · · Score: 1

      I completely agree.. Database is the answer and most databases can export directly to XML or something for future use. Every OS these days has at least one free database engine available, including android. Once your data is in SQL you can do practically ANYTHING with it!

    4. Re:Use a database! by Anonymous Coward · · Score: 5, Interesting

      And you could make a doilie, and a hat, and a casserole, and wallpaper with the headers, and knit the .signatures into a fancy flying cape.

      Just use IMAP and Maildir. Modern systems are fast enough to allow you to search the content directly, and not vulnerable to the database support wackiness this sort of "I can pre-organize it now and make my life better by wasting it pre-programming my queries" approach.

    5. Re:Use a database! by hobarrera · · Score: 1

      Indeed, databases are the perfect solution, though not relational ones; rather email database formats: like Maildir!

    6. Re:Use a database! by Anonymous Coward · · Score: 0

      When I search for emails, I always search for text either in the subject or in the body. The type of searching features you have in a (SQL-based, I assume) database might solve the problem, but there are far more efficient tools out there. Also, you only need a single table for all your emails, and perhaps one for your contact list, which makes the overhead of the database engine quite extreme.

      As other posters have stated: Use Maildir for storage and find a good tool for indexing and searching. Or just use grep, since it's most likely fast enough if you sort your email in decently named folders.

    7. Re:Use a database! by thogard · · Score: 1

      Or just keep it in mbox files and use grep to filter your searches. I have over 200,000 email messages in my archive and sometimes I do need to find a message from several years back but the cool imap and email programs aren't very good at helping me find just what I want most of the time.

    8. Re:Use a database! by Anonymous Coward · · Score: 0

      Since 1998 I've copied emails from 5 different systems (GMail, Qmail,Axigen etc) to a homespun system based around a Firebird (originally Interbase) DB. The email systems are accessible in their own way, but at any time I can go back and retrieve anything in any way quickly, as all relevant fields are indexed. I work with a number of company ventures, with some cross over situations, so having all email accessible in one place, should I need it, is quite handy at times. I've written a UI in Delphi (for PC), and one in PHP (for remote access) so can view (and reply to / forward if required) the email anywhere.

      As regards comments on "hoarding", I have a rich historical resource which I now value highly.

    9. Re:Use a database! by PuZZleDucK · · Score: 1

      I've always wanted _THAT_ cape :D ... you know me so well Mr. Coward.

      --
      Can a person program a new solution to a problem? Why should anyone be able to stop such a thing? -Richard Stallman
  10. Plain text by Anonymous Coward · · Score: 1

    I use plain text or HTML if it has embedded pictures. Works great.

  11. RE: Best Way To Archive and Access Ancient Emails? by Anonymous Coward · · Score: 0

    Check out Zimbra Desktop, it may be able to handle all the old formats. I use it to download my yahoo mail without paying for premium yahoo garbage in order to back it all up. It's open source and has a linux version.

  12. mbox files by Anonymous Coward · · Score: 0

    one per year, done.

  13. Gmail by lga · · Score: 5, Informative

    Best method of storing and searching old email? Gmail. It can import from pop and imap so you can point it at your other inboxes and let it get on with it.You can upload from other mail clients to Google's imap server. Obviously it's amazing at searching through the archives.

    Best method if you're concerned about Gmail's privacy? I'm still working on that one.

    1. Re:Gmail by zekele2 · · Score: 4, Informative

      Best method of storing and searching old email? Gmail. It can import from pop and imap so you can point it at your other inboxes and let it get on with it.You can upload from other mail clients to Google's imap server. Obviously it's amazing at searching through the archives.

      Best method if you're concerned about Gmail's privacy? I'm still working on that one.

      The solution is Google Apps for your own domain. $5 a month per user, 25Gb space, IMAP, no advertising (which is where most of the privacy issues arise), and most importantly, no lock-in as you can switch your email to a different provider at any time without changing email address. As you said, Gmail is by far the best for searching old email. I haven't run an email server for years.

    2. Re:Gmail by icebraining · · Score: 1

      Better hope the algorithmic overlords don't decide you're a spammer and lock you out. Remember not to keep your eggs in a single basket.

    3. Re:Gmail by hobarrera · · Score: 1

      Obviously it's amazing at searching through the archives.

      Regrettably, gmail's IMAP implementation does not support the sort command.

    4. Re:Gmail by Anonymous Coward · · Score: 0

      Best method if you're concerned about gmail privacy? Look around, everyone cares less.

    5. Re:Gmail by Sigg3.net · · Score: 1

      "Best method if you're concerned about Gmail's privacy? I'm still working on that one."

      Buy Google and shut it down for everyone else. Simple!

    6. Re:Gmail by aliquis · · Score: 1

      Best method if you're concerned about Gmail's privacy? I'm still working on that one.

      You solve one third of it by using GPG.

      (At first I wanted half as in not the senders and receivers ID but then I thought about all those cases with shitty pages sending you information unencrypted.)

      By volume I guess you solve 0.2% because close to no one encrypts.

    7. Re:Gmail by lga · · Score: 1

      The other privacy fear apart from adverts is open access for the state to trawl through Gmail's servers at will.
      http://www.slate.com/blogs/future_tense/2013/03/26/andrew_weissmann_fbi_wants_real_time_gmail_dropbox_spying_power.html
      Of course here in the UK the government want to intercept communications before they even get to Google's servers so the only real answer is a vpn and a private mail server in some other country.

    8. Re:Gmail by Anonymous Coward · · Score: 0

      Use a freebie web service that may delete your account without notice for storage? You're nuts. Gmail shouldn't be used for anything but porn mailing lists and throwaway logins.

  14. maildir by Mr.+Slippery · · Score: 1

    I keep mail archives going as far back as 1996 on my home box in mh format. Sylpheed (my usual mail client), alpine (used over ssh), and nmh (occasionally used in scripting fashion) can all access it, plus I've got the usual Unixy goodness of grep and find and so on. It's a robust and simple setup.

    I pull mail from my server onto my home box via POP. Why anyone wants their e-mail archives on a box that's not under their physical control is beyond my comprehension.

    --
    Tom Swiss | the infamous tms | my blog
    You cannot wash away blood with blood
  15. I use gmail by Anonymous Coward · · Score: 0

    I have email there from 2005 or earlier and I can get it on pretty much any device I want.

    1. Re:I use gmail by the+eric+conspiracy · · Score: 3, Insightful

      So can anyone with a subpoena. And you can bet Google would be running their advertising stuff on that.

      There is no way I would put my life on a public server like that.

    2. Re:I use gmail by wisnoskij · · Score: 2

      It is email. AKA over the web. AKA public.
      And someone with a subpoena can get your records off of your ISP, or just come into your house and take it off your computers.

      --
      Troll is not a replacement for I disagree.
    3. Re:I use gmail by the+eric+conspiracy · · Score: 1

      A subpoena won't get you into a house. That requires a search warrant, which requires probable cause of a crime.

      Completely different.

    4. Re:I use gmail by Mister+Liberty · · Score: 1

      Except the gmail server and the ISP server might be located in
      different places.

    5. Re:I use gmail by Anonymous Coward · · Score: 1

      For a criminal case, yes. Not for a civil case.

      OTOH, for a subpoena to issue for private papers like that the court typically must already know what's in those papers, more or less. You cannot use a subpoena to simply go hunting for evidence to help make a claim. This is why it's so hard to catch illegal dumping of toxic waste, for example. Cancer cluster plaintiffs can't just go ask a judge for a subpoena to scour corporate records. They need a whistleblower to say, "yeah... they did it, and it's documented, and it's in a filing cabinet at such-and-such address". _Then_ you can get a subpoena.

    6. Re:I use gmail by JWSmythe · · Score: 1

      If you get a pissy enough opposing counsel and a judge to cooperate, warrants can be issued. Trust me, I've been on the wrong side of it. Of course, this will vary by your jurisdiction, IANAL, and especially not yours.

      --
      Serious? Seriousness is well above my pay grade.
    7. Re:I use gmail by arth1 · · Score: 1

      It is email. AKA over the web.

      No, e-mail is not over the web. The SMTP protocol is older than and not a part of the HTTP protocol.
      There are some e-mail clients that use the web for the presentation layer, but that has nothing to do with e-mail itself.

      AKA public.

      Also patently false. Most e-mail servers today use SSL to communicate. Even if you sniff the line, you can't get the content of my e-mail.

    8. Re:I use gmail by xiux · · Score: 1

      So I take it that you only send and receive email with people running their own private mail server; certainly not with anyone using an email provider that 'anyone with a subpoena' could access.

    9. Re:I use gmail by wisnoskij · · Score: 1

      Well it is as public as any Gmail account.

      --
      Troll is not a replacement for I disagree.
    10. Re:I use gmail by Anonymous Coward · · Score: 0

      ANY IT company doing business with people in the U.S. MUST give the gov't, on request, the info they serve to the US person or company. This includes Gmail, Apple mail, Microsoft mail, AOL mail, Yahoo mail, or even your local ISP's mail.

      Now, for the really interesting stuff, or the really paranoid, remember that the NSA makes a copy of ALL internet traffic? So even if you have your own email, guess who can decode it.

  16. The obvious answer by 93+Escort+Wagon · · Score: 5, Funny

    Design a MySQL database for storing your mail messages, keying on sender, subject, date, and presence of attachments (bonus points for storing the attachments as blobs rather than as external files). Then write a perl script that'll automatically parse all your incoming email and convert it to database entries. I suppose if you're lazy the script could just monitor your mail spool, but it'd be better to just have it listen for incoming connections and handle the mail directly.

    Next, make copies of that script, modifying as necessary to process all your old mail archives.

    Oh, and you'll need to write another perl script to access all new mail - not from your mail spool, but from this database. You should probably name this system after some animal too. If you absolutely MUST have a graphical interface on it, don't use anything newer than TCL+Tk - but going with curses would be a better choice.

    Oh - it has to be GPLv3, or we'll hate you and probably mailbomb your machine.

    What - isn't that the Slashdot way?

    --
    #DeleteChrome
    1. Re:The obvious answer by the+eric+conspiracy · · Score: 4, Funny

      Holy wheel reinvention, Batman.

    2. Re:The obvious answer by Anonymous Coward · · Score: 0

      way too old-school. You totally need to use a nosql database and ruby to do this.

    3. Re:The obvious answer by NoMaster · · Score: 1

      You mock - but ...

      --
      What part of "a well regulated militia" do you not understand?
    4. Re:The obvious answer by Quick+Reply · · Score: 1

      Seriously, can someone suggest some FOSS solutions that do just this. I have a whole bunch of mbox stores forked at different times. I want to put it all together, remove the duplicates and then run queries to weed out what I need and what to turf! I have hoarded for too long. Gmail came out in 2004 and they said "Archive instead of delete!"... Well now my Inbox is practically unmanageable! Google get most things right but not that one unfortunately.
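      For the "put it all together, remove the duplicates" part, a minimal sketch using only Python's standard `mailbox` module might look like the following. It assumes duplicates share a Message-ID header, which is usually but not always true; the function name and file paths are made up for illustration, and nothing else should be writing to the files while it runs.

```python
import mailbox


def merge_mboxes(sources, dest_path):
    """Copy messages from several mbox files into one, skipping repeats.

    Two messages count as duplicates when they carry the same Message-ID.
    Messages without a Message-ID are always kept, since they cannot be
    compared safely. Returns (kept, skipped).
    """
    seen = set()
    kept = skipped = 0
    dest = mailbox.mbox(dest_path)  # created if it does not exist yet
    try:
        for path in sources:
            src = mailbox.mbox(path, create=False)
            try:
                for msg in src:
                    msg_id = str(msg.get("Message-ID", "")).strip()
                    if msg_id and msg_id in seen:
                        skipped += 1
                        continue
                    if msg_id:
                        seen.add(msg_id)
                    dest.add(msg)
                    kept += 1
            finally:
                src.close()
    finally:
        dest.close()  # flushes pending writes to disk
    return kept, skipped
```

      Calling `merge_mboxes(["old1.mbox", "old2.mbox"], "merged.mbox")` would leave the source files untouched, so the result can be checked before anything is deleted.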

    5. Re:The obvious answer by Anonymous Coward · · Score: 0

      I think there's an emacs extension that does exactly what you have described, but the key bindings are difficult to set up.

  17. Stop being a hoarder by realmolo · · Score: 4, Insightful

    You don't need all those e-mails. Keep the few you actually care about (copy and paste the text into a regular file, and save any attachments you want), and get on with your life.

    People that keep every e-mail are weird. Quit living in the past.

    1. Re:Stop being a hoarder by Anonymous Coward · · Score: 0

      You don't need all those e-mails. Keep the few you actually care about (copy and paste the text into a regular file, and save any attachments you want), and get on with your life.

      People that keep every e-mail are weird. Quit living in the past.

      Amen. Not to mention the fact that emails are fully disclosable - in public - in any civil or criminal action you may ever face. I used to keep every email too, until it struck me a few years ago that emails you perceived as a casual conversation with friends could be taken out of context in a more formal setting. Imagine being on the stand in court and suddenly you're confronted with a joke in poor taste you made one Friday night after a few beers. Why set yourself up like that? If it wasn't important enough to save at the time, delete it after a few months and move on.

    2. Re:Stop being a hoarder by Ardyvee · · Score: 5, Interesting

      It's kind of like photos, you know? Or letters, and such. People like to store those things, because they serve as a memory aid for what the mind no longer holds. It is also quite useful for history reconstruction/when you are old and have nothing else to do but a box full of photos/letters/etc.

      Not to say that you are wrong on your point, except on the weird part. Unless you are okay with double standards, or you also consider anybody who keeps photos of parties/graduations/etc weird... Just saying.

      --
      I don't care if I'm wrong. I only care about everyone obtaining something from the discussion.
    3. Re:Stop being a hoarder by Anonymous Coward · · Score: 1

      I'm the opposite. I have a few photos, winnowed down from many. Sometimes mice or moths get a few, but mostly it is culling the herd with the common rationale being...bad photography, bad memory, or just...do I really need this?

      Maybe I'll be an unhappy and confused old person, or maybe I'll invent happier memories in my senility.

    4. Re:Stop being a hoarder by FuzzNugget · · Score: 3, Insightful

      Right, I don't need that client email from a few years ago to remind me about a detail on a project. It'd be better just to look like an idiot in front of them.

      Just because *you* don't need that archive doesn't mean everyone else doesn't need it.

      Why the hell *not* keep a conveniently categorized, organized, sorted, indexed and searchable database of all your important electronic communications? A few gigs is nothing these days.

    5. Re:Stop being a hoarder by icebraining · · Score: 4, Interesting

      But why would I waste time manually finding and copying individual emails, when I can just let the backup script archive them all for virtually no cost?

    6. Re:Stop being a hoarder by bill_mcgonigle · · Score: 1

      The OCD guy was projecting onto hoarders.

      --
      My God, it's Full of Source!
      OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
    7. Re:Stop being a hoarder by Anonymous Coward · · Score: 0

      That is exactly right.

      If you need a photo to help you remember something, then that something is obviously not important enough for you to remember without the photo. Therefore, it is not important to remember it. Therefore, you don't need the photo.

      If you, on the other hand, are senile and need photos to remember important things, then you might as well spend all your time looking at photos. Just make sure you retire first, otherwise you waste company (or government) resources by taking up a position that someone with a sound mind could have.

    8. Re:Stop being a hoarder by CAIMLAS · · Score: 2

      I recently had to go back 5 years to retrieve an email as evidence in my pending divorce. "I never said that!" - like hell you didn't, I've got it right here (and here, and here, and here). She wanted to play hardball, so hardball it was. :(

      I've had to go back 6+ months to retrieve important mail for myself for work and other personal matters as well. Every written correspondence with my ex-wife was in email - some damning, as above, but all of our "love letters" when we were courting. Wouldn't you want to show that to your kids one day, maybe?

      --
      ~/ssh slashdot.org ssh: connect to host slashdot.org port 22: too many beers
    9. Re:Stop being a hoarder by houghi · · Score: 2

      Not to say that you are wrong on your point, except on the weird part. Unless you are okay with double standards, or you also consider anybody who keeps photos of parties/graduations/etc weird... Just saying.

      I would. In the olden days you would have 24 photos of a whole week. I have been on holidays for 6 weeks and have a film of 36 photos to prove it.
      Now you have 300 photos of just packing your suitcase. So yes, if you keep all those thousands of photos that are completely irrelevant, then you are weird.
      So limit yourself to a reasonable number of pictures per year that you are allowed to keep. And no, 3000 pictures of your baby is not reasonable. It is punishment for the baby when it gets older. It is as if you are saying that it used to be cute and now you hate it.

      Also, when you are old and have nothing else to do, there is your memory. My great-aunt was 155 when she died and the last several years of her life when she did not have any visitors, she would just REMEMBER the events. With her brain.

      And keeping all your letters? Good luck when your wife finds your love letters. It does not matter if that was 10 years before you met her, you will sleep on the couch till they are burned.

      --
      Don't fight for your country, if your country does not fight for you.
    10. Re:Stop being a hoarder by Sigg3.net · · Score: 1

      That's the reason I save "important" e-mail in PST format. Never going to bother. Have a beer, instead.

    11. Re:Stop being a hoarder by Tom · · Score: 2

      It's not about living in the past, it is about not worrying what to keep and what to store. And the message you think most important today is likely to be completely worthless in five years, while you would love to still have that other message you thought unimportant back then.

      I keep everything so I can decide today which old message I consider important. It is very rarely that I venture into the archive, but I have needed messages two years old at times.

      --
      Assorted stuff I do sometimes: Lemuria.org
    12. Re:Stop being a hoarder by Anonymous Coward · · Score: 0

      My great-aunt was 155 when she died

      Citation? :)

    13. Re:Stop being a hoarder by johnw · · Score: 1

      There's another good reason to keep all your e-mails. If you keep the lot then you can say with much more certainty, "You never sent me an e-mail about that." You can search your archive with confidence that if it isn't there then it never existed.

      It's very tempting to start deleting some, on the grounds that you just *know* that they aren't important enough to keep, but almost immediately you'll find yourself wavering over where exactly to draw the line. If you don't have a line - just store everything - then the whole process gets much faster. Gmail's Archive button is brilliant.

      As someone else said, a few gigs of e-mail storage is nothing these days.

  18. Thunderbird - for when you're done with GMail by Nexus7 · · Score: 3, Informative

    I need to archive emails that I can search later - but with a twist. These are employees who've left the company. I can't keep 'em on at Google Apps 'cause I have to pay for that by user. So I use IMAP (making sure to set Chats to be shown in the IMAP list), create an account in Thunderbird, and slurp it all on to the local machine. It keeps all the folders, although it doesn't seem to be smart enough to figure out multiple labels, so it looks like it downloads the same email multiple times, once for its folder, and once for "All Mail." Then I delete the account at Google. You just have to be sure to click through all the folders in Thunderbird and make sure it is done downloading before you blow the Google account away.

    1. Re:Thunderbird - for when you're done with GMail by Anonymous Coward · · Score: 0

      I do the same thing here for our Gmail users who leave. Works great!

    2. Re:Thunderbird - for when you're done with GMail by Anonymous Coward · · Score: 0

      I use a python imap2mbox script. Thunderbird, Apple Mail and many others can read it.... and best part: it's plain text and holds all attachments uuencoded.
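      The poster's script isn't shown, so the following is only a rough sketch of what an IMAP-to-mbox dump could look like with the standard library. `dump_folder` is a made-up name, and `conn` is assumed to be an already logged-in `imaplib.IMAP4_SSL` (or compatible) object.

```python
import mailbox


def dump_folder(conn, folder, mbox_path):
    """Append every message in one IMAP folder to a local mbox file.

    `conn` is assumed to be a logged-in imaplib connection. The folder is
    opened read-only, so nothing on the server is flagged or changed.
    Returns the number of messages written.
    """
    typ, _ = conn.select(folder, readonly=True)
    if typ != "OK":
        raise RuntimeError("cannot open folder %r" % folder)
    typ, data = conn.search(None, "ALL")
    if typ != "OK":
        raise RuntimeError("search failed in folder %r" % folder)

    saved = 0
    box = mailbox.mbox(mbox_path)  # created if missing
    try:
        for num in data[0].split():
            typ, parts = conn.fetch(num, "(RFC822)")
            if typ != "OK":
                continue
            for part in parts:
                # The raw message arrives as the second item of a tuple;
                # other items in the list are protocol framing.
                if isinstance(part, tuple):
                    raw = part[1].replace(b"\r\n", b"\n")  # mbox wants LF
                    box.add(raw)
                    saved += 1
    finally:
        box.close()
    return saved
```

      Usage would be roughly `conn = imaplib.IMAP4_SSL(host)`, `conn.login(user, password)`, then `dump_folder(conn, "INBOX", "inbox.mbox")`. Folder names containing spaces generally need to be passed with double quotes around them.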

    3. Re:Thunderbird - for when you're done with GMail by Anonymous Coward · · Score: 0

      Or just check the option to have all the mail and documents moved to a specified account when you delete it from your Google Apps user list.

  19. Mbox format and compress them by dbIII · · Score: 1

    You can even read them in a text editor, every half decent email client can use them and there are free or cheap converters for the email clients that are not half decent.

  20. notmuch by Anonymous Coward · · Score: 3, Informative

    http://notmuchmail.org is Gmail for people that don't trust Google. Works great with your existing IMAP server using offlineimap.

    1. Re:notmuch by Beetle+B. · · Score: 1

      I second notmuch. You don't have to use it as your mail reader - you can just use it for indexing and queries. It has Python bindings which makes it really nice. It can search by all the criteria listed except perhaps attachments (it does tag messages with attachments, but I'm not sure what types of searches the submitter wants to do with them). Date based searching is possible, but the syntax is a pain - a nicer way to specify dates has been on their TODO list forever.

      At the moment it doesn't support mbox, and all my mail had been in that format. It was a pain to convert everything to maildir or a similar format it supported, but it was only a one time pain...
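      The one-time mbox-to-maildir conversion mentioned above can be sketched with Python's standard `mailbox` module. This is an illustration under assumptions, not the poster's actual procedure; the function name and paths are hypothetical.

```python
import mailbox


def mbox_to_maildir(mbox_path, maildir_path):
    """Copy every message from an mbox file into a Maildir directory.

    The source file is only read, never modified. Returns the number of
    messages copied.
    """
    src = mailbox.mbox(mbox_path, create=False)
    dest = mailbox.Maildir(maildir_path, create=True)
    count = 0
    try:
        for msg in src:
            # Converting to MaildirMessage carries over the read/replied
            # status that mbox records in its Status headers.
            dest.add(mailbox.MaildirMessage(msg))
            count += 1
    finally:
        src.close()
        dest.close()
    return count
```

      After something like `mbox_to_maildir("archive-2004.mbox", "Maildir-2004")`, an indexer that expects one file per message can be pointed at the new directory.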

      --
      Beetle B.
  21. Gmail. by jtownatpunk.net · · Score: 3, Insightful

    As soon as gmail made IMAP available, everything went there. I used to get my stuff via POP and saved it all going back to the early 90s. When IMAP went live on gmail, I let it chug away for hours and hours until it was synced and all my archived stuff was stored on my gmail account. They've been bumping up the limit faster than my mail's built up so I'm now at 3.9 gigs used of 10.1 available, holding about twenty years of email. I have email clients on a desktop and couple laptops that I fire up every couple of months to sync with gmail and keep local stores in the event that google screws up and loses my data. (I like to think I'd be smart enough to disconnect from the internet before accessing the local clients if my gmail account ever went blank but I've got multiple copies just in case I forget.)

    I know that won't work for email fiends who pile up a gig a month but it works for me. I don't even bother sorting my email any more. It's faster to just search. Not like the old days when it would take my email client half an hour to slog through all the messages. :)

    1. Re:Gmail. by girlinatrainingbra · · Score: 1

      I did not know that gmail had something like that which would let you import your old emails into gmail. I'd just seen imap as a way to export mail out of gmail. Damn. They've got a time machine point of view into all the nooks and crannies of communication done by everyone who's bothered to imap-archive their ancient emails into google. That's even sicker than I'd ever anticipated or would have thought I'd have believed. ;>)

    2. Re:Gmail. by Anonymous Coward · · Score: 0

      That's plain stupid. It only takes a tweak in some magic Google algorithm or an unfounded complaint about spam to delete your account. You will not be notified, and if they react to your protest at all, the only answer will be "Your mail is gone, deal with it." Remember you're not Google's customer, you're their product and their bots can throw you away whenever they feel you've become stale.

  22. Maildir by chipperdog · · Score: 2

    Set up a local courier IMAP server and copy mails there, and archive the Maildirs...each message will be a file and you can use tools like grep to search the Maildirs
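    Since each Maildir message is its own file, a small script can also cover the sender, subject, date and attachment searches the submitter asked about, where plain grep gets awkward. A minimal sketch with the standard library follows; the function names are invented, and treating any MIME part with a filename as an attachment is a simplifying assumption.

```python
import mailbox
from email.utils import parsedate_to_datetime


def has_attachment(msg):
    """True if any MIME part of the message carries a filename."""
    return any(part.get_filename() for part in msg.walk())


def search_maildir(path, sender=None, subject=None, year=None,
                   with_attachment=False):
    """Return the keys of messages in a Maildir that match every filter.

    `sender` and `subject` are case-insensitive substring matches;
    `year` compares against the year in the Date header.
    """
    box = mailbox.Maildir(path, create=False)
    hits = []
    for key, msg in box.iteritems():
        if sender and sender.lower() not in str(msg.get("From", "")).lower():
            continue
        if subject and subject.lower() not in str(msg.get("Subject", "")).lower():
            continue
        if year is not None:
            try:
                when = parsedate_to_datetime(str(msg.get("Date", "")))
            except (TypeError, ValueError):
                continue  # missing or unparseable Date header
            if when.year != year:
                continue
        if with_attachment and not has_attachment(msg):
            continue
        hits.append(key)
    return hits
```

    A call such as `search_maildir("Maildir", sender="alice", year=2007, with_attachment=True)` returns keys that can be passed to `mailbox.Maildir(path).get_message(key)`. This reads every message on each search, so for a large archive a real indexer will be much faster.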

  23. Hoarder? by Anonymous Coward · · Score: 0

    Does this classify you as a hoarder? :) I know I am!

  24. You know the old joke... by digitalhermit · · Score: 1

    Call it "Lexi Diamond - Ronda Rousey mud wrestle" and share it on a torrent and soon the whole world will back it up for you....

    Seriously though, even if you were a previous email hoarder, you will likely be able to comfortably archive all your emails *and* the tools needed to access them on a USB stick. Start by finding all the tools you need, source included, and place them on your storage medium. Compress it. Send it to the cloud.

    Mail files can be stored by year (easy enough to do with awk or other mail tools). It will be a lot smaller than some may think when you compare the size of your mail spool to the typical Library of Congress (10 Terabytes around 2002). Newegg currently has a 3TB drive for $140...
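    Splitting by year can also be done with Python's standard `mailbox` module instead of awk. The sketch below is one possible approach, not the poster's; the function name and output naming scheme are assumptions, and messages with a missing or unparseable Date header go into a separate "undated" file rather than being dropped.

```python
import mailbox
import os
from email.utils import parsedate_to_datetime


def split_mbox_by_year(src_path, out_dir):
    """Copy messages from one mbox into per-year files like 2003.mbox.

    The source file is only read. Returns a dict mapping each label
    (a year, or "undated") to the number of messages written.
    """
    os.makedirs(out_dir, exist_ok=True)
    boxes = {}
    counts = {}
    src = mailbox.mbox(src_path, create=False)
    try:
        for msg in src:
            try:
                when = parsedate_to_datetime(str(msg.get("Date", "")))
                label = str(when.year)
            except (TypeError, ValueError):
                label = "undated"
            if label not in boxes:
                target = os.path.join(out_dir, label + ".mbox")
                boxes[label] = mailbox.mbox(target)
            boxes[label].add(msg)
            counts[label] = counts.get(label, 0) + 1
    finally:
        for box in boxes.values():
            box.close()  # flushes each yearly file to disk
        src.close()
    return counts
```

    The resulting per-year files are plain text, so each one can then be compressed individually.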

  25. The Email Mandala by Irate+Engineer · · Score: 1, Informative

    Just sweep it all into the Trash Bin, breathe deep, and move on with your life confident in the impermanence of all things.

    Namaste!

    --

    Left MS Windows for Linux Mint and never looked back!

    Vote for Bernie in 2016!

    1. Re:The Email Mandala by Tumbleweed · · Score: 2

      Just sweep it all into the Trash Bin, breathe deep, and move on with your life confident in the impermanence of all things.

      Plus that Trash Bin program has _great_ compression!

    2. Re:The Email Mandala by johnw · · Score: 1

      Plus that Trash Bin program has _great_ compression!

      but empirical tests indicate that it's lossy compression.

  26. Gmail, Hotmail by wisnoskij · · Score: 1

    Take your pick.

    --
    Troll is not a replacement for I disagree.
  27. mbox + dovecot + mairix by Creosote · · Score: 2

    I wouldn't posit this as the best way, but it's what I do. I keep my archival mail on a local filesystem arranged in directories, stored in the old-school mbox format. I run Dovecot under OS X for IMAP access to those messages from anywhere; when I need to search through the whole collection, I use mairix (an indexing and retrieval system).

  28. You are kidding, right? by certain+death · · Score: 3

    Just delete some goddamn email.. hoarder!

    --
    "My immediate reaction is "WTF? What kind of moron doesn't make things 64-bit safe to begin with?" Linus
    1. Re:You are kidding, right? by Anonymous Coward · · Score: 0

      buuuut, future generations might want to know what it was really like to subscribe to bonzi buddy and coupons.com.

    2. Re:You are kidding, right? by LihTox · · Score: 1

      Just delete some goddamn email.. hoarder!

      ...and this is why we can't watch half of the first six seasons of Doctor Who.

  29. maildir + mutt + maildir-utils by Emperor+Tiberius · · Score: 1

    Simple. Archive mail by the year as it gets too big. Use mutt's search for the basic searching and maildir-utils for the heavy lifting.

    To those saying keeping email forever is hoarding: not if it's done right. You'd be surprised how useful it is to go back and find an email from four years ago.

    1. Re:maildir + mutt + maildir-utils by Anonymous Coward · · Score: 2, Interesting

      To those saying keeping email forever is hoarding: not if it's done right.

      That's like saying neck deep rooms of newspapers/magazines in a house isn't hoarding if you stack them neatly with little paths running through it.

    2. Re:maildir + mutt + maildir-utils by Anonymous Coward · · Score: 0

      That analogy doesn't fit with the physical world. Your old emails don't get in the way of everyday living.

      Also those who are so afraid of looking back, maybe they are afraid of the examined life. Might reveal how purposeless their efforts have been.

    3. Re:maildir + mutt + maildir-utils by tqk · · Score: 1

      Archive mail by the year as it gets too big. Use mutt's search for the basic searching and maildir-utils for the heavy lifting.

      Agreed, and add OfflineIMAP, grepmail, and ImapFilter. Best is grepmail output to a file creates a new mail file, and mutt can handle gzipped mail files. There's your backup.

      --
      "Tongue tied and twisted, just an Earth bound misfit ..." -- Pink Floyd.
  30. Really? This is what passes for ... by Anonymous Coward · · Score: 1

    .... "News for nerds. Stuff that matters" these days?

    Oh, and stick it all in imap.

  31. Outlook Express! by Vegan+Cyclist · · Score: 2

    heh - i have all my email going back to '98 in Outlook Express. Best email program ever! It's nearly perfect for what i want. (Any way to get it to do inline spell checking, ie, underlines misspelled words as you type?) Still running it on an XP box. Been using Windows Live Essentials a bit for Win8, it's not horrific, but lacks some of the characteristics..hope MS injects some of the OE spirit into it..

    1. Re:Outlook Express! by PuZZleDucK · · Score: 1

      Did I miss a pun or are you serious?

      --
      Can a person program a new solution to a problem? Why should anyone be able to stop such a thing? -Richard Stallman
  32. Mailstore by Anonymous Coward · · Score: 1

    I have been using Mailstore for this purpose for the last few years. Works for my gmail, hosted exchange and my old, unlamented exchange system. Faster to find things with their query than Thunderbird/Outlook search. And the price was right. Before I retired, I kept separate email archives for my major clients -- made it easy to cleanup the file when the relationship wound down. This no longer matters. Everything ends up in Mailstore now -- except the immense quantities of spam. Works for me -- your mileage may vary.

  33. hypermail by sshock · · Score: 1

    An old open-source tool called hypermail may be what you're looking for. It parses mbox files and produces HTML pages with the emails sorted by thread, author, subject, date, etc. http://hypermail-project.org/

  34. Eudora by saccade.com · · Score: 2

    Eudora still runs on my Win7 box. I have email going back to at least the early '90s. All plaintext and easily searchable.

  35. Google by Tihstae · · Score: 1

    Upload it to one or several Google accounts and you have a permanent searchable archive.

    1. Re:Google by icebraining · · Score: 1

      No; Google's data mining algorithms will have a permanent searchable archive. You'll just have a temporary archive, until they decide to retire the service or ban you for some offense perceived by their automated "police".

    2. Re:Google by Tihstae · · Score: 1

      Yep, you are right. I did forget the air quotes around "permanent".

  36. Thunderbird Local Inbox by corychristison · · Score: 1

    I use Thunderbird.

    My mailboxes are all IMAP, so I found a use for the Local Inbox in Thunderbird that I always thought was a useless feature.

    At the end of the year, I create a subfolder labeled by Year, and I download all copies from the year before last (eg, my last download was of 2011 emails), then I purge them from the IMAP server to save space. This way I still have universal access to my last year's emails but easily searchable archives available at home.

    If you keep regular backups of your /home dir then you need not worry about losing them.

    1. Re:Thunderbird Local Inbox by fa2k · · Score: 1

      Same idea here. For my personal mail I have a filter to move sent and received mail older than 30 days off to a local folder. I use my web host/DNS registrar's IMAP service, and I would probably keep them for longer if I ran my own. Most of my /home including the local folder is synced between my desktop and laptop, so I still have access from there, but only to the last 30 days from the web and my mobile. Thunderbird seems to use a reasonable format where I can actually read the messages directly from the file, so even if there are no programs to read it, I can scan the archive and get what I want

  37. Same rules as any archiving: by jrronimo · · Score: 5, Insightful

    I'd say follow the same rules as any archiving of media:

    Pick one format and migrate all of your messages to that: In this case, I'd say mbox. Thunderbird and most other mail programs read it and you can get most of your mail into mbox format via IMAP/Thunderbird from whatever mail client can read your old ones. You can store your mbox files locally in Thunderbird and gain Thunderbird's searching (for instance) without the need for an actual back-end. I was able to read some mail stored in Netscape Mail because it was just mbox files and opening them in Thunderbird was a breeze.

    Most importantly: Every 5-10 years, re-evaluate your storage choice. Is Thunderbird still around? Is mbox still pretty well regarded? If you find you need to migrate again, do it! If both are still active / supported, then hold onto 'em. The only way to perpetually maintain media access is to make sure your choices are still valid on a regular basis. This is true for any media: As the old formats go obsolete (cassette tape, VHS), you need to migrate that data to the next readily accessible format (CDs, DVDs; FLACs, MPEG(?)).

    I think the biggest problem is that you have a mish-mash of stored files right now. You'll save yourself a headache in the future by tearing the band-aid off now and taking the time to get all of your mail into one format. Then, in the future, when you need to convert, it'll be many steps easier since you won't have to visit Slashdot and find out what to do about your mail again next time. :)

    1. Re:Same rules as any archiving: by Anonymous Coward · · Score: 0

      mbox is terrible for searching and archiving. You can't create differential backups, you can't use regular filesystem tools to manage the emails, and you are very sensitive to HDD problems.

      Always use Maildir.

    2. Re:Same rules as any archiving: by CAIMLAS · · Score: 1

      mbox has not been well regarded for 10 years. Maildir or Maildir+, please. And avoid cyrusIMAP at all costs.

      If possible, everything mail related should be saved as Maildir or, secondly, Microsoft's OST format (which has a pretty good history of being readable).

      --
      ~/ssh slashdot.org ssh: connect to host slashdot.org port 22: too many beers
    3. Re:Same rules as any archiving: by Bronster · · Score: 1

      I'm interested in your reasons for avoiding Cyrus. Which version are you looking at?

    4. Re:Same rules as any archiving: by CAIMLAS · · Score: 1

      The reason for not looking at Cyrus is, in part, because you actually have to look at the version. I've dealt with IIRC version 1.9 up through 2.2 something or other. But that's beside the point; there is a reason why RedHat moved away from Cyrus years ago, and why you won't find it: the developers don't support it unless you are on the very latest version. There are a myriad of other reasons, but of major significance:

      * The tools are not well documented
      * the code is poorly internally documented and is mostly an unknown kludge with few developers
      * it uses a proprietary mail store
      * metadata is stored independently from the data ('for performance' they say - a problem 10+ years ago but no longer, really)
      * it is designed for a very narrow use case (large scale distributed installations)
      * there are myriad problems with upgrading due to minor db changes etc. along the way
      * its developers/ML/IRC channel won't help you unless you're on the current developers snapshot, or something equally asinine.
      * did I mention it's poorly documented?
      * versions offered by distros are old and crusty
      * nobody uses it anymore?
      * hardly anyone knows it

      --
      ~/ssh slashdot.org ssh: connect to host slashdot.org port 22: too many beers
    5. Re:Same rules as any archiving: by Bronster · · Score: 1

      Honestly, we won't support 2.2 or earlier any more. 2.3 was released in 2005. There comes a point when you have to move on.

      2.3 is still supported for security issues, but it's not going to see much more development.

      2.4 is supported just fine, and has been out since 2010. I really object to the characterisation of the mailing list and the IRC channel, since I'm on both and I try pretty hard to deal with any problems, though if you're on 2.2 or 2.3 I will often say "that's fixed in 2.4, and is not really fixable in earlier versions due to massive architectural changes which were required to make the behaviour consistent in the first place".

      Poor documentation is a problem, and using berkeley DB was a really stupid idea. The upgrading problems are always due to berkeley version incompatibility. It sucks so bad that transitioning everything away from berkeley is an important goal for 2.5.

      Plenty of people still use it, though Dovecot is kinda eating our lunch.

    6. Re:Same rules as any archiving: by CAIMLAS · · Score: 1

      And yet, Debian, Ubuntu, RedHat, et al package 2.2 or (in the case of RHEL) 2.3 still. Why?

      If the developers won't support what's being packaged for common use (even "how do I?" type questions get answered with "upgrade"), then how is the end user supposed to get any support?

      I believe I've talked to you personally on the IRC channel and remember you being helpful. But you were by far the exception in that regard.

      A big part of why Cyrus has its lunch eaten by Dovecot is because moving from Cyrus to anything else is just as easy, if not easier than, upgrading CyrusIMAP to a later version. Cyrus upgrade to a later version: I've more than likely got to not only upgrade my distribution but also pull down a 3rd party repository or build from source. As an added "gotcha", documentation for migration from Cyrus to anything else (well, courier or dovecot) is unimaginably better than Cyrus upgrade documentation - and markedly less complex with fewer corner cases. I've been down this road before myself, twice, and gave Cyrus good faith investigation prior to doing the migrations (I've got 3+ years Cyrus admin experience and less than a year on Dovecot/Courier). If I'm going to go to the headache of upgrading and migrating to a newer distro (as is the case when you inherit the usually-ancient Cyrus box), why not get something that has proper, standardized Maildir support and won't rape itself in rare circumstances (power outages, hardware failure)?

      And dovecot is simply easier to administrate than Cyrus, largely due to the "separate metadata" issue. Night and day: I can just drop a single email they accidentally deleted from yesterday and call it good with dovecot; with cyrus, I've got to restore the whole directory and rebuild it (in all likelihood).

      Yes, it's easier to get 3rd party tools which are, for all intents and purposes, sysadmin hack jobs (imapsync, IMAP Tools) working between two disparate MSAs than it is to actually upgrade Cyrus. And that I can use 'supported', non-PPA, non-3rd party packages to do so? Score.

      --
      ~/ssh slashdot.org ssh: connect to host slashdot.org port 22: too many beers
    7. Re:Same rules as any archiving: by Bronster · · Score: 1

      Yeah, fair enough. The failure to package new versions is a real problem. Thanks for the honest feedback!

  38. lightweight local IMAP server for Windows? by Miamicanes · · Score: 1

    Does there exist any program that's basically a lightweight Windows auto-starting (but 99.999% asleep and inert unless you're actively using it) background service that does nothing besides act like an abstraction layer between some kind of reasonable file-based mailstore roughly analogous to an Outlook .pst file (AFAIK, canonical Maildir is a physical impossibility under Windows) and any IMAP-compatible email client?

    I don't care about being able to access it from anywhere besides my local PC... binding to localhost, and refusing to talk to anything external to my PC is fine. I've just had it with the mess Thunderbird's developers made of their local mailstore right around the time it completely went to hell ~4 years ago (well, and the mess they made with Thunderbird in general). For years, I just moved mailstore files around. Then, for some insane reason, it seems like Thunderbird's files just kind of exploded and proliferated... and worse, did so in ways that seem to screw up and confuse newer versions if you try to make them use files from an older version. If I could just run a semi-fake local IMAP server on my PC to abstract my mail storage away from Thunderbird itself, I could try other mail clients without having to worry about how I'm going to get my mail into them (a remote IMAP server is out of the question... I literally have gigabytes of email, some that literally came from Eudora Pro more than 18 years ago and just got converted and converted as I went along).

    I thought about the usual option of an ARM-based mini-server or an old laptop, but I also have zero-tolerance for server slowness. Stutters and hiccups are bad enough without adding a server that has the resources & performance of a 500MHz Pentium III (on a good day) into the equation. At least if I'm running it locally & it spends 99.999% of its time harmlessly asleep, when I *do* go to access it, it'll have the full resources of a quadcore 3.2GHz i7 behind it for nearly-instantaneous response. The problem with running a full-blown IMAP4 server on my PC is that it's going to always be soaking up ram, and running at a higher background level (anticipating constant remote users it'll never actually see). I just want something that runs as fast and hard as it needs to and can when called upon to do so, then goes and silently hides in the corner until the next time I speak to it.

  39. maildir: qmail, courier-imapd, roundcube by czth · · Score: 2

    I run qmail for sending/receiving mail (on Gentoo; netqmail package), using maildir, of course. On top of that, I run the Courier IMAP server on my internal network (with TLS encryption). Until a few months ago I used Mutt as a client (console-based), but I've moved to using Roundcube (web-based email), which I initially installed for my wife, and have been happy with it. I also have some automatic filtering to folders via Maildrop (another Courier utility; it looks at a ~/.mailfilter file to route mail).

    Roundcube/the IMAP server's search is OK most of the time - I keep my inbox small and move older mail to sub-folders - when I want to do advanced searches or search large mailboxes I log in and grep through folders of interest; this works well with the maildir format with one file per message. Maildir was also quite resilient when I had a HD crash and needed to recover some lost mail (block scan for blocks that look like mail headers found most missing items, and I do better backups now - mail is under ~/.maildir and gets backed up automatically).

    I would move older messages to maildir (there are plenty of mbox converters, and almost anything non-proprietary should be convertible to mbox or maildir via existing programs or a short perl script) - even if at some point maildir dies off entirely, which seems unlikely, converting it to another format will always be trivial due to its simplicity and it has the advantages mentioned above of being able to search easily with grep etc.
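
    For what it's worth, the "short script" route does not have to be perl. Below is a minimal sketch in Python using only the standard-library mailbox module. The function name and the paths you pass in are made up for illustration, and it assumes the source really is a well-formed mbox file. Messages land in the Maildir's new/ folder, and mbox status flags are not carried over in this simple form.

```python
import mailbox


def mbox_to_maildir(mbox_path, maildir_path):
    """Copy every message from an mbox file into a Maildir; return the count.

    Hypothetical helper: the name and arguments are illustrative only.
    """
    source = mailbox.mbox(str(mbox_path), create=False)
    dest = mailbox.Maildir(str(maildir_path), create=True)
    copied = 0
    try:
        for message in source:
            # Each message becomes one file under <maildir>/new
            dest.add(message)
            copied += 1
        dest.flush()
    finally:
        source.close()
        dest.close()
    return copied
```

    Calling mbox_to_maildir("old-inbox.mbox", "archive/Maildir") (both paths hypothetical) leaves the original mbox untouched, so it can be kept until the copy has been checked.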

    1. Re:maildir: qmail, courier-imapd, roundcube by eneville · · Score: 0

      That sounds fine; you were doing things correctly until you stopped using mutt. Roundcube and SquirrelMail both require PHP and a web server just to look at mail. mutt has an excellent search facility, can keep an index of headers, and will show you what you want 99.9% of the time, giving the result I want within two or three lines: l ~d 7d ~s holiday etc

  40. Try Object Storage by Anonymous Coward · · Score: 0

    Convert to maildir (one message per file). Then upload into an object storage service, such as Amazon S3 or something based on Openstack Swift. Those services are designed to handle millions of objects in a single, flat namespace.

    Since S3 and Swift both publish over HTTP, you can use some sort of simple text document indexing service (local Solr installation, maybe? Dunno). Object storage is perfect for archiving and storage of emails, but I don't know of any commercial or open source implementations around that yet.

  41. PSTs / nightly backup by Tuxedo+Jack · · Score: 1

    I use PSTs and nightly backup.

    Sure, you can use GMail or the amorphous cloud for your purposes, but quite frankly, remember - if it's not in your possession, it's not as secure as it could be.

    No, I don't have world-ending secrets in my possession, but yes, I do get paranoid about my data.

    --

    Striking fear in the authors of godawful fanfiction, I am here, appearing in darkness, Tuxedo Jack!
  42. Clueless. Takes 10 minutes start to finish by raymorris · · Score: 3, Informative

    Force feed? WTF are you talking about? Dovecot can use any major mail format. Just set MAILDIR if it's in a non-standard directory. So the whole procedure is:

    yum install dovecot
    vim /etc/dovecot.conf (only if using a nonstandard mail location)
    service dovecot restart
    set username and password in GUI client

    I never will understand why some people feel the need to post on topics they don't have the slightest clue about.

    1. Re:Clueless. Takes 10 minutes start to finish by WillKemp · · Score: 5, Funny

      I never will understand why some people feel the need to post on topics they don't have the slightest clue about.

      Because it's a long standing Slashdot tradition!

    2. Re:Clueless. Takes 10 minutes start to finish by Anonymous Coward · · Score: 0

      Ever since Senator Pizza started Slashdot in 2004, people have engaged in this sort of behavior, you idiot.

  43. Someone has figured this out by Anonymous Coward · · Score: 0

    because LinkedIn is sending me suggestions that I know people, that I know for a fact I only corresponded with a few times ten years ago on another old email account, and know no other way

  44. Archiving and retrieval solution. by Anonymous Coward · · Score: 0

    Just print them all out. Why?

    1. Reliability. Paper lasts longer and is not subject to bit rot, computer crashes, or system failures.
    2. Redundancy. Just make extra copies.
    3. Disaster recovery. Ship a second set of copies to another city.
    4. Indexing and retrieval? Boost the economy and create jobs by hiring a nephew/niece/virtual assistant.

  45. Maildir+IMAP by Anonymous Coward · · Score: 0

    I have them backed up in my gmail account as well, but basically I have an IMAP folder called Archive and all of my old mail is in there. The search in mutt ignores those folders unless I am in those folders. Tada!

    As a side point I have 51GB of mail!!! (1988 onward)

  46. Take a look at MH by tbuskey · · Score: 1

    MH stores each email as a plain text file, each folder as a directory. It uses the unix filesystem as its database. It's very quick and has tools to re-order a folder quickly.

    In addition, MH has tools to convert mail formats. It was designed in the days of low cpu power and small disks. It also lent itself well to being wrapped by other tools like xmh, exmh and mh-e so you don't have to learn the raw MH commands.

    Yes, IMAP is cool, but don't discount MH. Plus the O'Reilly MH book is free as a PDF.

    Oh, some IMAP servers and mail clients use MH format or something derived from it.

    1. Re:Take a look at MH by RedLeg · · Score: 1
      +1

      What he said, MH is the tool for this task. I have mail going back to early 90s, each message in a separate text file, sorted into directories by year. Once you're archiving in this format, you can then index the files for more rapid searches, or, if you're old school, just grep around when you're looking for something.

      Best thing is, once you have them organized this way, you're done, and can burn backups of the archive (by year) directories to CD or other long term storage, and not have to worry about losing anything.

      One warning: beware filesystem limitations on number of files in a directory. If you convert a HUGE amount of mail at one time and dump it into one dir, you may end up with a problem, so RTFM (read the friendly man pages) and plan ahead accordingly. You may need for example to split a year into quarters if that year's mail exceeds a limit (not that I've run into that problem....)
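
      A rough Python sketch of that by-year (or by-quarter) split, assuming each message is available as raw text with a parseable Date header. The function name, the "undated" bucket and the "YYYY/Qn" layout are my own inventions for illustration, not anything MH does itself.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime


def archive_subdir(raw_message, split_quarters=False):
    """Return the archive subdirectory ("1996" or "1996/Q2") for one message.

    Hypothetical helper; the directory layout is an assumption.
    """
    msg = message_from_string(raw_message)
    try:
        sent = parsedate_to_datetime(msg["Date"])
    except (TypeError, ValueError):
        # Missing or unparseable Date header: park it in a catch-all bucket
        return "undated"
    if split_quarters:
        quarter = (sent.month - 1) // 3 + 1
        return f"{sent.year}/Q{quarter}"
    return str(sent.year)
```

      Turning on split_quarters only for the years that are actually too big keeps the rest of the tree flat.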

      BTW, the O'Reilly book is a must. Grab the pdf, but get a paper copy if you can as it's quite hefty.

      Hope this helps.....

      Red

    2. Re:Take a look at MH by uckelman · · Score: 1

      (n)MH is definitely what you want for this. I have mail going back to the late 90s stored this way. And you want to use mairix for indexing it all. When I need to find something in the 450k messages I have, I can find it from the command line using mairix in about the time it takes me to type the command. (I would argue that nmh is an excellent mail client for working with new mail as well. And in the past year or so, there's been a burst of development activity, as well---the most I've seen in the whole time I've been using it---involving such luminaries as Paul Vixie. So it's definitely not dying.)

  47. It's IO bound, not CPU or RAM by raymorris · · Score: 1

    The 500 MHz Pentium and the Core i7 will have roughly the same performance in this use case because IO is the bottleneck. The speed is the speed of the disk and filesystem.

    To be more specific, a Pentium has a throughput of around 2 GB/s. Compare to 10 MB/s for a 7200 RPM drive doing random access on small files, 100 MB/s on large ones.

    So it's entirely reasonable to use a small low power Linux system like a Western Digital World Edition network drive or the ARM based stuff you mentioned for IO bound applications such as a file server or IMAP. You won't lose any appreciable performance.
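
    To put numbers on that, here is a back-of-the-envelope calculation in Python. It takes the throughput figures above at face value and assumes a hypothetical 5 GB archive read cold from disk, with nothing cached in RAM.

```python
GB = 1000 ** 3
MB = 1000 ** 2


def scan_seconds(archive_bytes, bytes_per_second):
    """Time to stream the whole archive once at a given sustained rate."""
    return archive_bytes / bytes_per_second


if __name__ == "__main__":
    archive = 5 * GB  # hypothetical archive size, not a figure from the thread
    rates = {
        "CPU (2 GB/s)": 2 * GB,
        "disk, small files (10 MB/s)": 10 * MB,
        "disk, large files (100 MB/s)": 100 * MB,
    }
    for label, rate in rates.items():
        print(f"{label}: {scan_seconds(archive, rate):.1f} s")
```

    On those assumptions the disk is tens to hundreds of times slower than the CPU, which is the whole argument. If the archive is already sitting in the filesystem cache, the picture changes.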

    1. Re:It's IO bound, not CPU or RAM by Anonymous Coward · · Score: 0

      That's ridiculous. Any e-mail archive these days will fit in RAM (cached filesystem). So the Core i7 will blow away the Pentium on searches and indexing and everything else.

    2. Re:It's IO bound, not CPU or RAM by Anonymous Coward · · Score: 0

      A slow CPU with a SSD would beat a fast CPU without one.

  48. Convert to a common format by crath · · Score: 1

    I gathered up all my historical email records a few years ago, and used Aid4Mail to convert all the various mailbox formats to the common format I use today. Choose a format that's convenient for you, and standardize on it. Here's the product website: http://www.aid4mail.com/

    1. Re:Convert to a common format by crath · · Score: 1

      I should add that I use X1, http://x1.com/, to index and access the Aid4Mail converted emails.

  49. Consistency and Simplicity by FuzzNugget · · Score: 1

    Find a mail program that you like and stick with it. Important factors to consider:

    * How it stores the mail and attachments: mbox or other ASCII format good, proprietary binary format like PST, bad.

    * How well it manages years and years worth of 10's or 100's of emails a day.

    * How gracefully it fails from data corruption (this is where storage configurations that keep eggs in separate baskets are a very good thing indeed)

    * Something with good importers. If not, there are 3rd party programs and services that claim to be able to convert from any mail client to another.

    Personally, I've used Eudora for the last ten years (v7.x when it was still maintained by Qualcomm, not that godawful travesty Mozilla cobbled together, which is just Thunderbird dolled up to look slightly like Eudora but function nothing like it) and have scarcely considered anything else.

    Yes, it's a bit goofy, requires some advanced trickery at times and the configuration screens might as well all be labeled "Miscellaneous", but it more than makes up for it...

    No other mail client can come close to the MDI that lets you view endlessly configurable summaries of any number of mailboxes at a glance.

    It stores all mail in plain text (close to mbox, but not quite ... though close enough that you can grab an mbox file and trick Eudora into thinking it's a native file without any manual editing) and dumps all attachments as normal files into a single directory. Yes, that directory becomes kind of ... well, huge, so it's kind of klunky in that way, but it does ensure that you can access those files at any time without having to deal with any interface beyond the operating system.

    Mail consumes pretty much just what that amount of text and files would. Meta data and configuration takes up little else.

    I have 10+ years (~3GB) of email and there are zero performance issues. It does offer the option of indexing mailboxes for faster searches as well.

    It is truly the geek's mail client. I love this mail program so much, I will use it in a VM when it eventually becomes incompatible, but it works problem-free on Windows 7. And even in the event that it were somehow unusable, I'd still have access to all my mail; after all, it's just a bunch of flat text files.

    1. Re:Consistency and Simplicity by FuzzNugget · · Score: 1

      Aw, sonofabitch, posting from a phone doesn't always work out. Just stop reading after the first "flat text files".

  50. Are you guys kidding??? by Anonymous Coward · · Score: 1

    Try Mailstore http://www.mailstore.com/.

  51. Same problem, but at the server level. by evenmoreconfused · · Score: 1

    Our family / family business has run, with increasing formality, email servers in various flavours since the mid-90's. These servers have processed messages including everything from lots (like really lots -- in the tens of thousands at least) of family pictures to (no doubt) lots of personal email of the many dozens of staff who have worked with us over the years. In general, the server settings have always been set to "retain everything", including full Exchange journalling, because there was no way to delete things without risking losing some important pictures someone sent to someone else.

    I'm not too worried about the business activity traffic, because anything recent is well replicated in many other places -- primarily in various cached Outlook data files. But where family members threw away their old machines, the only copies of these important things are in the server journals we have archived. Is there some solution that can rationalize these millions of messages into some sort of structure?

    In addition, I presume that this can only be done for individuals who actually want old items to be retrieved from the archives, as anyone else would be protected by privacy rights.

    --
    No. Well...maybe. Actually, yes. It really just depends.
  52. IMAPSize will do the job by Anonymous Coward · · Score: 0

    Try IMAPSize

    Since your email service is IMAP based, it will make an off-line cache which you can periodically sync.
    You can search it in various ways, and it will let you take backups.

    You can even use it to migrate your mail history to another IMAP server.

    Enjoy.

  53. I've wondered that by rickb928 · · Score: 1

    I've got 16 years' worth of email in a multitude of formats, including all of those that are excoriated as unreliable, fatally flawed, or Satan's preferred means of communication with our world.

    I have them on O-L-D CD-ROMS, DVDs, saved to two cloud services, on my personal server, and tar'd/zipped/RAR'd/Stuff'd in some of the same places, and up to three copies in each of these places. I wonder if the .sitf files will even decompress, but no point in deleting them.

    I've really only used a very few mail clients though; pine, elm, Eudora, GroupWise, Outlook in so many versions, POTP. My own servers have been the usual evolution of Sendmail, Dovecot, and now dbMail. And I use Yahoo! and Gmail as mirrors. Yahoo! Mail is my spam bucket as well as second realtime mirror, Gmail I use as a mirror and for some primary communications. my 'personal' email has been the same since 1996, but I've had three work emails.

    And my email archives are close to unusable, of course. I guess I should try and take some of this advice.

    And when I do rifle through the really old stuff, I need to put it through a spam filter. Some of that old spam people would pay for today. Not the IRC stuff.

    I really should take a week and make sense of it. Naw, crap, who am I kidding? 1996?

    --
    deleting the extra space after periods so i can stay relevant, yeah.
  54. how archival? by davidwr · · Score: 1

    If you just want "archival, for the next 5-10 years, then redo it all over again as technology changes" then the other answers in this thread are what you want.

    If you want "archival, for 20+ years, without having to do it over every 5-10 years" then some form of human-readable plain text or at least represented-as-plain-text-for-attachments is what you want. Make sure all file attachments are in well-documented formats (e.g. JPEG) so someone will be able to write a decoder for them 20 years from now if one isn't readily available. If they aren't, be sure to store file-format information with your archive.

    If you want "archival, for 200+ years" then you want all of the above, stored on archival media that are likely to be readable 200+ years from now along with a description of how to interpret html and file attachments. Archival paper, archival microfilm, archival "etched onto plastic but microscopically" media, etc. are what you want.

    If you want "archival, for 20,000+ years" then talk to the people who are working on how to label long-term (10,000+ -year) nuclear waste storage dumps, they may have some ideas that work.

    If you want "archival, 2M+ years" then I'm out of ideas. Look me up in 2M+1 years and tell me what you found that worked for you.

    --
    Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
  55. i like this for email archiving by chaos4u · · Score: 2

    http://www.mailstore.com/en/mailstore-home.aspx

    works well, quick searches, and it's local.

    unfortunately it's Windows only but may work fine under Wine.

    --
    Music the Paint dancefloor the canvas your body the brush
  56. In my case by kilodelta · · Score: 1

    I've got six years of archived email. I like Thunderbird's archiving scheme. You can have it automatically create archives as a calendar year goes by.

  57. Keep raw, search raw by holophrastic · · Score: 1

    If you're not actually working with the old e-mails, and you don't mind waiting a few moments to search them, just keep the raw e-mails, in raw transmissible e-mail format, and be done.

    They are nothing more than a whack of text files at that point. And they are properly formatted with headers and everything.

    Want to search? Full text search and you're done. Want to search by subject only? Simple regex search /^Subject\:.*?cucumber/ finds "cucumber" only on the subject line (yeah yeah, header folding exists, this isn't a regex lesson).
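
    If header folding does bite, one way around it is to let a mail parser unfold the header before the regex sees it. A minimal Python sketch using the standard-library email package follows; the function name is made up, and it assumes the raw message is available as text.

```python
import re
from email import policy
from email.parser import HeaderParser


def subject_matches(raw_message, pattern):
    """True if the (unfolded) Subject header matches the regex pattern.

    Hypothetical helper; only the headers are inspected, never the body.
    """
    headers = HeaderParser(policy=policy.default).parsestr(raw_message)
    subject = headers["Subject"]
    if subject is None:
        return False
    # With policy.default the header value comes back already unfolded
    return re.search(pattern, str(subject), re.IGNORECASE) is not None
```

    Looping this over a directory of raw message files gives the same subject-only search as the regex, without tripping over subjects that wrap onto a second line.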

    Every e-mail client from the birth of the first one until the death of the last one supports raw e-mail formats. And you can probably just pipe them all to sendmail and send them all again.

    All of that said, I'm a big proponent of forgetting the past. Hoarding is consistent with many psychological problems.

  58. MailSteward by cmholm · · Score: 1

    I've been using MailSteward on OSX. The starter version handles 15k or so entries using SQLite before it starts to bog, while the trade up is a front end to MySQL.

    --
    Luke, help me take this mask off ... Just for once, let me butterfly kiss you with my own eyes.
  59. Imap shlylock by Anonymous Coward · · Score: 0

    All you need to do is export to UNIX MBX, then you can convert to anything

    for windows, thebat! can doo this shiznits

  60. ElasticSearch by geekboybt · · Score: 1

    Store it all as plain text files (mbox format?), and write a quick script to send it all to an ElasticSearch index.
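
    A sketch of what the first half of that quick script might look like in Python, standard library only. It turns an mbox file into the newline-delimited JSON body that Elasticsearch's bulk endpoint accepts (one action line, then one document line, per message). The index name "mail-archive", the field names and the function names are all assumptions for illustration, and only the first text/plain part of each message is indexed.

```python
import json
import mailbox


def message_to_doc(message):
    """Reduce one message to a small JSON-friendly dict (field names assumed)."""
    body = ""
    for part in message.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True) or b""
            charset = part.get_content_charset() or "utf-8"
            try:
                body = payload.decode(charset, errors="replace")
            except LookupError:
                # Unknown charset label: fall back to UTF-8
                body = payload.decode("utf-8", errors="replace")
            break
    return {
        "from": str(message.get("From", "")),
        "subject": str(message.get("Subject", "")),
        "date": str(message.get("Date", "")),
        "body": body,
    }


def bulk_payload(mbox_path, index_name="mail-archive"):
    """Build an NDJSON bulk request body for every message in an mbox file."""
    box = mailbox.mbox(str(mbox_path), create=False)
    lines = []
    try:
        for message in box:
            lines.append(json.dumps({"index": {"_index": index_name}}))
            lines.append(json.dumps(message_to_doc(message)))
    finally:
        box.close()
    # The bulk format requires a trailing newline after the last line
    return "\n".join(lines) + "\n"
```

    The resulting string would then be POSTed to the cluster's /_bulk endpoint with Content-Type application/x-ndjson. That step is left out here because it needs a running server.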

  61. Stop being a data hoarder by Anonymous Coward · · Score: 0

    Seriously... Just delete that old, stale useless data and move on. Stop being OCD and hoarding that old crap...

  62. Completely missing the point... by ZeroPly · · Score: 3, Insightful

    The problem is that a throwaway email might become critically important later on. There is no way to know in advance what is important and what is not.

    True story: while deployed in the Army, our communications guy could not find a piece of equipment which was very important and very pricey. He had been signing the monthly inventory forms saying he had it, assuming it was in a cabinet. He could not find any paperwork showing it was signed out - it had just disappeared sometime in the last 3 months and no one had seen it.

    On a long shot, I started searching my email - since I keep every last one. Sure enough, about 2 months prior, there was a throwaway email from him to the effect that he was going to turn in item X for repair since it was acting flaky. He checked at the contractor mentioned in that email, and it was sitting on the shelf waiting for pickup.

    --
    Support microSD: in a post 9/11 world, it is unwise to carry your data on media that you cannot comfortably swallow.
    1. Re:Completely missing the point... by Anonymous Coward · · Score: 0

      "our communications guy could not find a piece of equipment which was very important and very pricey"

      Let me guess - was it a hammer or a toilet seat? If you had said Air Force, I would have suggested B2.

  63. tar by radicimo · · Score: 1

    tar tzvf | grep

    I've got email going back 20 years, and this has not failed me. Maybe you need to sub 'x' for 't' and use less, but don't over-engineer this problem.

    --
    100 REM PISS OFF CODE FASCISTS 200 GOTO 100
  64. Re:Really? This is what passes for ... by radicimo · · Score: 1

    And this is what passes for snarky comment, now-a-days?

    Back in my day, we'd walk a mile uphill in the snow and get our 9600 baud modems connected before chiming in from the peanut gallery.

    --
    100 REM PISS OFF CODE FASCISTS 200 GOTO 100
  65. Ancient e-mail? by Black+Parrot · · Score: 1

    I just put the original clay tablets in shoeboxes and stack them in my garage.

    --
    Sheesh, evil *and* a jerk. -- Jade
  66. Ancient Email by rossdee · · Score: 1

    You should probably get Dr Daniel Jackson or Samantha Carter or Dr Rodney McKay to translate it into english first.

    1. Re:Ancient Email by aristotle-dude · · Score: 1

      You should probably get Dr Daniel Jackson or Samantha Carter or Dr Rodney McKay to translate it into english first.

      I was looking for someone to say this and I was not disappointed. Thank you. Would Ancient be encoded in Unicode?

      --
      Jesus was a compassionate social conservative who called individuals to sin no more.
  67. Parchment by SpaghettiPattern · · Score: 3, Funny

    Parchment -no less- does it for my ancient emails.

    --

    I hadn't the slightest objection to his spending his time planning massacres for the bourgeoisie... (P.G. Wodehouse)
    1. Re:Parchment by Anonymous Coward · · Score: 0

      Meh - that's a faddish new technology that won't last. I carve all my emails onto stone tablets.

  68. What I did for all about 50,000 emails by gaspyy · · Score: 2

    I had to archive the emails since 1996. They were in multiple formats - Outlook Express dbx, Mailbox from Netscape Navigator and Thunderbird, Outlook.

    I converted all of them to .eml format. It's a simple text format that can be read by the OS and easily parsed by any program and script. Much better than mbox or something else. Then I renamed all of them according to a rule - YYYYMMDDhhmm [From] [Subject]
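
    For anyone wanting to script that renaming rule, here is one possible Python take on it. It is a sketch, not the tool actually used above: the function name, the all-zero timestamp for undated mail, the 80-character subject cap and the set of characters replaced with "_" are all my own assumptions.

```python
import re
from email import message_from_string
from email.utils import parseaddr, parsedate_to_datetime


def eml_filename(raw_message):
    """Build a 'YYYYMMDDhhmm [From] [Subject].eml' name from a raw message."""
    msg = message_from_string(raw_message)
    try:
        stamp = parsedate_to_datetime(msg["Date"]).strftime("%Y%m%d%H%M")
    except (TypeError, ValueError):
        stamp = "000000000000"  # placeholder for missing or broken dates

    def safe(text):
        # Replace characters that are awkward or illegal in file names
        return re.sub(r'[\\/:*?"<>|\r\n]+', "_", text).strip()

    sender = parseaddr(str(msg.get("From", "")))[1] or "unknown"
    subject = str(msg.get("Subject", "no subject"))
    return f"{stamp} [{safe(sender)}] [{safe(subject)[:80]}].eml"
```

    The timestamp uses the time as written in the Date header, so mail from different time zones sorts by the sender's clock rather than by one common zone.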

    Now I can easily find any email. I can browse them using the file system, I can search them using the OS or via a script. Windows indexes them and extracts the metadata so any search is very quick.

    1. Re:What I did for all about 50,000 emails by bigjosh · · Score: 1

      I tried to do the same thing, but not as much luck...

      http://josh.com/notes/archive-huge-pst-email-collection/default.htm

      How did you get Windows to read and index the meta data from the headers in the EML files?

      How do you view the EML files, especially ones with MIME html and attachments?

      Thanks,
      josh

  69. Delete by Anonymous Coward · · Score: 0

    Delete them

  70. office365 by ljw1004 · · Score: 1

    I switched to Office365 for this. $8/month for unlimited storage and good bandwidth.

    I used to use mbox format on a regular Linux web host (pair networks) from 1995 which worked fine. But they weren't scaling up their storage allowance as technology progressed, so as my archive grew bigger, I was paying too much for it. Tipping point was about 2005.

    Next I switched to maildir format on an opensuse box running in my basement, 1tb RAID2 hard disk backed up automatically to a 1tb Usb drive, and also mirrored (using unison) to an offsite machine. My email archive was important to me and I never wanted to lose it.

    But this was a pain. It was a pain to administer the system, a pain to make sure I had spare disks for when they failed, a pain to be sure my software RAID would even work, a pain to make sure my firewall was always open to inbound IMAPS, a pain to periodically move email from pair networks to this archive every year.

    Also I provide email for my family on the other side of the Atlantic, and this basement server wasn't suitable for them (not enough uptime when e.g. rewiring my house).

    Office365 wound up being cheaper for my family than pair networks. It has an "unlimited" $8/month plan for me, and 25gb $4/month for my family. It has a decent enough webclient and great (fast) online search, far faster than any searching I did with mbox or maildir servers. I feel more secure with its reliability and uptime. And being a Microsoft employee (C#/VB language design team, unrelated to Office) I use Windows devices and email clients, which generally work better with Exchange than IMAP.

    1. Re:office365 by ljw1004 · · Score: 1

      PS. My archive is 26gb at the moment.

    2. Re:office365 by Anonymous Coward · · Score: 0

      You're a M$ employee and you're paying retail? We're paying less than $1/month/user as part of our new EA

  71. Best medium ever by Anonymous Coward · · Score: 0

    Zip disks.

  72. Emailchemy is your ticket by 1stumpy · · Score: 1

    Use www.weirdkid.com/products/emailchemy to ensure your mails are "normalized" to a homogenous format ( rfc2822 ). Then, find a Linux solution akin to www.mailsteward.com to manage your archive.

  73. Sometimes searching is a pain by thogard · · Score: 1

    I have a number of messages from the mid 80s that are in MMDF or PMDF format as well as mbox but they are on a reel to reel tape and my new computer doesn't have any place for the tape to go.

    Can anyone in Melbourne read a 9 track tape?

    1. Re:Sometimes searching is a pain by johnw · · Score: 1

      Can anyone in Melbourne read a 9 track tape?

      If you'd asked that question 25 years ago I would have been able to help you.

      Now I too have tapes which I wrote in Melbourne then, but no means to read them back. If I could only read them, I would happily dump them to a tiny bit of my current on-line storage and it would be nice to look at some of that old material.

      The dangers of not keeping your storage media up to date...

  74. mboxes by Anonymous Coward · · Score: 0

    I don't trust anyone else for mail than myself.

    I have all mails dating back to the early 90ies in monthly mbox files. Every year or so i gzip -9 those mbox files.

    For archival purpose the first rule in my procmailrc is "write copy of email to archive". I _never_ delete mails from the archive.

    Looking up mails is done by "grepmail" which constructs a new temporary mbox file from old mbox files by searching header and/or body.
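
    For readers without grepmail, the same idea can be roughed out in Python with the standard library. This is only a sketch of the technique, not how grepmail itself works: the function name and paths are hypothetical, it unpacks the gzip file to a temporary copy first, and it matches against the raw message text, so base64-encoded bodies are not decoded before searching.

```python
import gzip
import mailbox
import os
import re
import shutil
import tempfile


def grep_gz_mbox(gz_path, pattern, out_path):
    """Copy messages matching a regex from a gzipped mbox into a new mbox.

    Returns the number of matching messages. Hypothetical helper.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    hits = 0
    with tempfile.TemporaryDirectory() as tmp:
        plain = os.path.join(tmp, "unpacked.mbox")
        # The mailbox module needs a real file, so decompress to a temp copy
        with gzip.open(gz_path, "rb") as src, open(plain, "wb") as dst:
            shutil.copyfileobj(src, dst)
        source = mailbox.mbox(plain, create=False)
        result = mailbox.mbox(out_path)
        try:
            for message in source:
                # Headers and body are searched together as one string
                if regex.search(message.as_string()):
                    result.add(message)
                    hits += 1
            result.flush()
        finally:
            source.close()
            result.close()
    return hits
```

    The output file is an ordinary mbox, so it can be opened directly in mutt or any other client that reads mbox.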

    That's approx. 5GB of gzipped mbox files for my private emails.

    mbox files are much better for archival and compression than maildirs are.

    This all fits well into my email setup with mutt, procmail, bogofilter, gpg, fetching mails with uucp to my laptop. Sounds all very 70ies but i am used to being able to access all my emails without network connectivity. Open the lid, read/write mails, answer to mailinglists etc. The next time i have IP connectivity e.g. a low bandwidth GSM connection i exchange mails - compressed, failure resilient etc.. UUCP has long been forgotten and people try to get the same comfort with offlineimap etc ...

    A UNIX system be it a laptop should have email connectivity.

    Flo f@zz.de

  75. Re:Pentrate my tight virgin anus, apk by nospam007 · · Score: 0

    Anybody having a Greasemonkey script to filter out this asshole?

    It's getting tedious lately, my Finger hurts from scrolling.

  76. Use safe format by Anonymous Coward · · Score: 0

    You had better choose a safe and well tested format. You know that DVD, HDD, SSD or similar modern storage technologies are not reliable enough or do not have a historic proof of reliability. Choose something with centuries of history. You know, my grandma's love letters and photos survived WWII and all the bombings. And birth records on paper go back to the middle-ages. But you may turn for more sound technology to ancient Mesopotamia.

  77. pine & bzgrep by funkboy · · Score: 1

    Personally, I'm lazy. I've been using Pine (now Alpine) directly on a mail server for all my mail since 1995 (on my own servers since '97). Old habits die hard.

    It works great over really low bandwidth connections (though sometimes high latency can be annoying), you can view any attachments you need automagically with X11 forwarding via SSH, and you don't care at all about which machine you're accessing it from. Also you get to read the TEXT in your mails & not HTML, most of which is useless garbage when it comes to emails (for the 0.1% of HTML mail I do actually need to read as HTML, such as tables, Lynx often gets the job done, & if not I just bounce it to my gmail account, which is pretty much full of spam otherwise).

    When various folders get Too Big (or I move on to another job, or whatever) I move them into an "archive" folder (& I have an "old-archive" folder for the really ancient stuff) and bzip2 them. I archive my inbox files at the beginning of every year too. When I need to find something old, I just bzgrep for it. After an archiving session (which takes all of 5 minutes) the whole thing gets backed up from my mail server to my NAS at home.
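    For anyone without bzgrep on the box, the same trick can be sketched in Python with the standard bz2 module. This is only an illustration: the function name grep_bz2 and the archive path in the comment are made up, and it is a plain line-by-line regex match, not a replacement for bzgrep.

```python
import bz2
import re


def grep_bz2(paths, pattern, ignore_case=True):
    """Yield (path, line_number, line) for each matching line in bzip2 files."""
    flags = re.IGNORECASE if ignore_case else 0
    regex = re.compile(pattern, flags)
    for path in paths:
        # Text mode with errors="replace": old mail is often not valid UTF-8.
        with bz2.open(path, "rt", encoding="utf-8", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if regex.search(line):
                    yield str(path), number, line.rstrip("\n")


# Hypothetical usage, assuming archives live in ~/mail/archive/*.bz2:
#   import glob, os
#   files = sorted(glob.glob(os.path.expanduser("~/mail/archive/*.bz2")))
#   for hit in grep_bz2(files, r"^Subject:.*invoice"):
#       print("%s:%d:%s" % hit)
```

    It reads each archive as a stream, so nothing has to be decompressed to disk first.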

    Did I mention that my backup MX is a SparcStation 20 and still works just fine for all this? Of course I don't keep much on it but if my main server dies I can still send & receive mail just fine.

    Note that this is not exactly something I sat down & spent time thinking about, I just started moving mail out of the way like this when I left college & built a couple of OpenBSD mail & DNS servers, and kept doing it as it works well enough.

  78. "An anonymous reader writes" by Emperor+Shaddam+IV · · Score: 1

    "An anonymous reader writes" - What? I've been on Slashdot for a while and enough is enough. "An anonymous reader" shouldn't be able to submit articles. "Anonymous" cowards are already trolling Slashdot to death. If you don't have the guts to post under your username, then why should you have the right to post anything?

    Just my opinion.

  79. Move on by rich_salz · · Score: 2

    Ugh. Drop all that stuff. Who needs it? My gmail folder has 20 messages in it. Lighten your (psychic) load.

  80. Local Maildir + Imap Server or Mutt by Anonymous Coward · · Score: 0

    I would convert everything to Maildir and either use mutt directly on them and/or run a local Imap server and rely on its searching capabilities.
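    The conversion step can be sketched with Python's standard mailbox module. The function name and paths are placeholders, and this assumes the old mail is already in plain mbox files; Kmail or Evolution stores in other layouts would need their own handling first.

```python
import mailbox


def mbox_to_maildir(mbox_path, maildir_path):
    """Copy every message from an mbox file into a Maildir; return the count."""
    source = mailbox.mbox(mbox_path, create=False)
    # create=True makes the cur/, new/ and tmp/ subdirectories if missing.
    target = mailbox.Maildir(maildir_path, create=True)
    copied = 0
    try:
        for message in source:
            # Converting carries mbox status flags over as Maildir flags.
            target.add(mailbox.MaildirMessage(message))
            copied += 1
        target.flush()
    finally:
        source.close()
        target.close()
    return copied
```

    The source mbox is only read, never modified, so it can be kept until the Maildir copy has been checked in mutt or over IMAP.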

  81. Compressed monthly mbox files by dskoll · · Score: 1

    I have more than 13 years' worth of archived mail; I keep two bzip-compressed mbox files for each month: Sent-YYYY-MM.bz2 and Received-YYYY-MM.bz2

    Searching is a bit slow, but I hardly ever have to search that far back so I don't mind. More recent mail (going back about a year) stays on the IMAP server. Also, my company produces an email archiving product that lets me search very quickly based on sender, recipient, subject, full-text body search, etc. which is great for mail going back up to about two years.
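    A rough sketch of producing that kind of per-month layout from one big mbox, using only the Python standard library. The helper names and the "unknown" bucket for unparseable dates are assumptions for illustration, not how the files above were actually made. It expects a fresh output directory, since an existing .bz2 with the same name would be overwritten.

```python
import bz2
import mailbox
import os
import shutil
from email.utils import parsedate_to_datetime


def month_key(message):
    """Return 'YYYY-MM' from the Date header, or 'unknown' if it cannot be parsed."""
    try:
        return parsedate_to_datetime(str(message.get("Date", ""))).strftime("%Y-%m")
    except (TypeError, ValueError):
        return "unknown"


def split_by_month(mbox_path, out_dir, prefix="Received"):
    """Write <out_dir>/<prefix>-YYYY-MM.bz2 files; return {month: message count}."""
    outputs = {}
    counts = {}
    source = mailbox.mbox(mbox_path, create=False)
    try:
        for message in source:
            key = month_key(message)
            if key not in outputs:
                plain = os.path.join(out_dir, "%s-%s" % (prefix, key))
                outputs[key] = (plain, mailbox.mbox(plain))
            outputs[key][1].add(message)
            counts[key] = counts.get(key, 0) + 1
    finally:
        source.close()
    for plain, box in outputs.values():
        box.flush()
        box.close()
        # Compress the finished month, then drop the uncompressed copy.
        with open(plain, "rb") as raw, bz2.open(plain + ".bz2", "wb") as packed:
            shutil.copyfileobj(raw, packed)
        os.remove(plain)
    return counts
```

    Running it a second time with prefix="Sent" on the sent-mail mbox gives the matching Sent-YYYY-MM.bz2 files.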

  82. problem ? by Tom · · Score: 1

    I fail to see the problem. I have mails going back a decade or more all stored in maildir on an imap server. Done. I've changed clients several times, servers several times, no problem.

    So what's the problem that makes an "ask slashdot" necessary?

    --
    Assorted stuff I do sometimes: Lemuria.org

  83. Mozilla mail (Seamonkey) and one folder per year. by Anonymous Coward · · Score: 0

    I've got 10 years' worth sitting there right now, it all works fine. For me it's about 3000 emails in each. It's pretty useful to have it all there online, I probably go back and dig something out once or twice a week.

    Easy to back up the files, and a documented format.

  84. Same Problem; My Solution by Thumper_SVX · · Score: 1

    This one will probably get buried because of the sheer weight of comments in this thread... but here goes.

    I had the same quandary about four years ago; mail going back a decade at that point which I wanted to keep around. It was in various clients, as in your case. What I did was build a Postfix / IMAP server using (at the time) Gentoo. I then attached those clients and simply copied all the archived email up, one client at a time. I then went about building a SquirrelMail front end which did great for a while.

    The problem, as you can probably ascertain, was search. It was tough to trawl through all those emails... but last year I converted my entire email system to Zimbra and simply did an IMAP import of all the data from my old IMAP server to the Zimbra database. While Zimbra still stores everything in MBX format (I think), it also uses MySQL to store index data. It also happens that Zimbra has a really nice web front end, and everything's really nicely integrated. Now I have email going back 15 years or thereabouts, all searchable in pretty swift order. I added the Zimbra Desktop app to my laptops and I even have a local cache. As for backups, I have a Linode running a custom kernel and the ZFS filesystem, and nightly I have a script on my server that backs up the entire Zimbra store using "zfs send / zfs recv". Since my entire email store is around 9GB it isn't terribly expensive... and I use the same Linode for hosting a hub for my OpenVPN network... which means all my computers can communicate privately from anywhere in the world across a constantly up VPN tunnel.

    And for those who think you don't need to keep all that email... bully for you. I have had to refer to decade old emails before in order to provide better service to my customers. My email archive also came in very handy during the divorce from my ex wife for reasons you can probably imagine but I'd rather not get into. That's also handy stuff to keep around... just in case.

  85. Unhelpful Answers. Restate Question. by Anonymous Coward · · Score: 0

    Wow, lots of unhelpful 'dump it' stuff here. Allow me to restate the question in a fashion that might draw a decent response.

    I need a mail archiving solution. I have lots of mail/mailboxes, some years old that must be retained due to policy or legal requirement. I want to off load it from my client(s) and server(s) for performance and backup reasons, but I need to be able to go back and extract messages for evidence, discovery, or what-have-you.

    What's an efficient and inexpensive way for me to archive my mail away from my email client and server, yet keep it available, should I need to search for something? Ideally, the solution would not be home built/custom. A COTS solution seems like a better idea for the sake of ongoing compatibility.

    There are many solutions for Outlook/Exchange, but they don't support Evolution and they are also very expensive. What are my options for non-Microsoft systems?

    1. Re:Unhelpful Answers. Restate Question. by dskoll · · Score: 1

      We (Roaring Penguin Software Inc.) have an anti-spam system that has an archiving add-on if you're looking for commercial software. It's built on PostgreSQL, so supports searching including full-text body searches.

      Searching is done via a Web interface; we don't have specific integration with particular email clients.

    2. Re:Unhelpful Answers. Restate Question. by ImdatS · · Score: 1

      There were enough really good solutions proposed above:

      1) Standardize on one format - preferably maildir(1)
      2) Convert all your emails into rfcxxxx (i forgot - but you can look it up) and copy to maildir-format
      3) on Linux or other *nix-based systems, you can use many tools to search

      (1) I have 15 years of email, about 60GB, roughly 120,000 Emails (sent + received). I use Mac OSX, so I have stored them in Mail.app - because Mail.app uses something like maildir-format and I will never lose my emails, even when I switch to another client.
      Every time a year ends, I create two new folders under Archives/Inbox and Archives/Sent respectively, with the year in four digits, e.g.:

      Archives/Inbox/2012
      Archives/Sent/2012

      Then I move the emails to the respective folder. From then on, I exclude these entries from "standard default search"; Only when I purposefully want to search in them, I choose to do so.

      This has worked quite well for fifteen years now - and before Mail.app, I used to use PowerMail, Eudora, Outlook Express, Mutt, Pine, and so on - now I standardized on Mail.app with its maildir-structure and am happy.

    3. Re:Unhelpful Answers. Restate Question. by ImdatS · · Score: 1

      Oh, and forgot to mention:

      I would suggest NOT to use any (commercial) solution that stores your emails in some weird BLOB from which there is no export possibility at one point. As long as any (commercial) solution supports something like maildir, you will be fine - anything else will be a sure guarantee that you won't be able to read your emails anymore once the solution-provider is gone and there is no documentation about their storage format.

      Lastly: on backups - don't look for anything that is email-specific - I mean with that: treat your emails like any other important file/data that you have. There's nothing wrong with being paranoid with regards to backups (I have a 4-level-backup system for my emails, photos, music, and other important documents... the only thing I'm missing at the moment is an off-site backup solution for these...)

  86. "unexpected online backup" by Anonymous Coward · · Score: 0

    You could just leak them online. They'd be around forever.

  87. Huge Legal Liability by TechnoJoe · · Score: 0

    IANAL, but those emails pose a HUGE legal liability if you ever get sued. You might think it's innocent enough -- maybe a cat picture or something -- but you have no idea how creative a lawyer can be. Perhaps he'll try to claim copyright infringement or something.

    You need to take the complete opposite approach. You should only be archiving emails that have a clear need to be retained. I realize you cannot always know that in advance. However, in the rare occurrences I didn't have an email I needed, I was able to get the information another way. IMHO, you are far better off risking not having an email than a sh!t storm of legal woes from having too many emails.

  88. I did this a bit over a year ago by Anonymous Coward · · Score: 0

    I did this a bit over a year ago with my Gmail account. I wanted to have a local backup "just in case" I had problems with Google in the future (I think the Google+ push-out contributed to my motivation). I know it's not entirely analogous to your scenario because I've only got a single source, but the process should be adaptable.

    I created a loopback ZFS filesystem (gmail.backup) and a script that runs every day at 11am (cron! w00t!). The script

    1. Mounts the ZFS loopback;
    2. Marks the ZFS as read-write;
    3. Runs offlineimap (python script) to copy gmail IMAP to local IMAP
    4. Takes a snapshot;
    5. Marks the ZFS as read-only;
    6. Unmounts the ZFS
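    Those six steps could be wired together roughly as below. This is a guess at the shape of such a script, not the original: the dataset name passed in is a placeholder, offlineimap is assumed to be configured already via its own config file, and the snapshot name simply uses a YYYYMMDD stamp.

```python
import subprocess
from datetime import date


def backup_plan(dataset, today=None):
    """Return the six steps listed above as argv lists, in order."""
    stamp = (today or date.today()).strftime("%Y%m%d")
    return [
        ["zfs", "mount", dataset],
        ["zfs", "set", "readonly=off", dataset],
        ["offlineimap"],
        ["zfs", "snapshot", "%s@%s" % (dataset, stamp)],
        ["zfs", "set", "readonly=on", dataset],
        ["zfs", "unmount", dataset],
    ]


def run_backup(dataset, dry_run=True):
    """Print each command; also execute it unless dry_run is set."""
    for argv in backup_plan(dataset):
        print(" ".join(argv))
        if not dry_run:
            # check=True stops at the first failing step.
            subprocess.run(argv, check=True)
```

    One caveat with this sketch: if a step fails midway, the filesystem is left mounted and read-write, so a real script would want some cleanup around it.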

    To access the archives, I use mutt pointed at the local imap folders (i.e., "mutt -f gmail/INBOX"). To get a "point in time" picture of my email, I point mutt at the relevant snapshot ("mutt -f gmail.snapdir/20120630/INBOX").

    I haven't dug into the specifics of retrieving emails beyond the mutt interface, but since each message (maildir format) is a file, I'm assuming that if/when the need arises that I'll be able to get what I need. In the meantime, I've got almost nine years of messages synced to my local system and updated every day.

  89. maildir + notmuch by Anonymous Coward · · Score: 0

    http://notmuchmail.org/

  90. Recoll - private personal search by Anonymous Coward · · Score: 0

    Recoll http://www.lesbonscomptes.com/recoll/ allows you to search your messages, and it handles most storage formats. You will get duplicates if you index live and backup messages, but you can filter by path in an advanced search.

    recoll runs locally, with your messages stored in mbox or most other common email formats, so the archive remains private and out of hosted services, if you want privacy (and / or speed).

    (email is just a tiny part of desktop search - you get a keyword index of every document in the search path, with stemming)

  91. Searching email archives by israel · · Score: 1

    I have had the same issue, email archives that are complete from the mid-90s and sporadic emails from the 1980s. What I've been doing is archiving most of the messages in text files in mbox format, one file per month, and I gzip them after a certain period of time to conserve space.

    Unfortunately 'grep' and similar utilities have been insufficient to do decent searches on them. What I ended up doing is building my own search utility in python. It allows me to specify multiple search terms, regular expressions or strings, search blocks of files (e.g. in this case finding blocks that are delimited by a starting '^From ' line), as well as automatically descending into directories, tar files, gzipped files, etc. With this I can easily run a search across any set of files that I desire (even if I've tarred and compressed them) and get out resulting output that I can read with a mail reader program such as Mutt. I've found it to be extremely useful for this, as well as almost all other search tasks that I do.

    If you are interested in using it, I've made it available on github. It's at https://github.com/bruceisrael/search
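    For readers who just want the gist of block searching, here is a minimal Python sketch. It is not the utility linked above and shares no code with it; the function names are invented, and it only handles lines already read from an uncompressed file.

```python
import re


def split_blocks(lines, start=r"^From "):
    """Group lines into blocks, starting a new block at each line matching start."""
    marker = re.compile(start)
    block = []
    for line in lines:
        if marker.search(line) and block:
            yield block
            block = []
        block.append(line)
    if block:
        yield block


def search_blocks(lines, patterns, start=r"^From "):
    """Yield blocks (as strings) in which every pattern matches somewhere."""
    regexes = [re.compile(p, re.IGNORECASE | re.MULTILINE) for p in patterns]
    for block in split_blocks(lines, start):
        text = "".join(block)
        if all(regex.search(text) for regex in regexes):
            yield text
```

    Because each hit is a whole message including its "From " line, the hits written one after another to a file form an mbox that a mail reader such as Mutt can open.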

  92. What's wrong with IMAP? by lpq · · Score: 1

    Dovecot handles all the formats you mentioned, mbox, maildir, etc...

    Then access everything w/IMAP.

    I keep everything in mbox format...going back to 1999....

    Things are very hierarchical. I don't keep everything. List mails
    go into list-boxes and I read them like newsgroups.

    I have multiple levels of personal mail.....sorta like google's circles...
    but unrelated to that...

    Keep it all in /home/lpq/mail ... about 5.1G of it...

  93. Mairix by subreality · · Score: 1

    I don't bother sorting or categorizing or anything. I just have procmail send a copy to an archive file which I rotate once a year, and I index it all with mairix: http://www.rpcurnow.force9.co.uk/mairix/ . I can search on date, sender, subject, body, etc, and in a few seconds I have what I need.
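    To give an idea of what a header index buys you, here is a toy version in Python with SQLite. It is not how mairix works internally and is far cruder; the table layout and function name are made up for the example, and it indexes only sender, subject, date and attachment file names.

```python
import mailbox
import sqlite3
from email.utils import parseaddr, parsedate_to_datetime


def build_index(mbox_path, db_path=":memory:"):
    """Index sender, subject, date and attachment names of every message."""
    db = sqlite3.connect(db_path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS mail"
        " (key INTEGER, sender TEXT, subject TEXT, sent TEXT, attachments TEXT)"
    )
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for key, message in box.iteritems():
            try:
                raw_date = str(message.get("Date", ""))
                sent = parsedate_to_datetime(raw_date).date().isoformat()
            except (TypeError, ValueError):
                sent = None  # missing or unparseable Date header
            names = [p.get_filename() for p in message.walk() if p.get_filename()]
            sender = parseaddr(str(message.get("From", "")))[1].lower()
            subject = str(message.get("Subject", ""))
            db.execute(
                "INSERT INTO mail VALUES (?, ?, ?, ?, ?)",
                (key, sender, subject, sent, "\n".join(names)),
            )
    finally:
        box.close()
    db.commit()
    return db
```

    A query such as SELECT key, subject FROM mail WHERE sender = ? AND sent >= ? AND attachments != '' then returns the mbox keys of matching messages, which can be fetched from the original file.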

  94. Archiving by Anonymous Coward · · Score: 0

    I am presently working on this myself... I have pulled down all my webmail into Outlook 2010.

    As long as the view is not set to "show in conversations", you can ctrl-a the "all mail" list, and then go to

    File > Save as > text only

    It takes a while, and Outlook looks like it is going to lock up, but it will dump out 25,000+ messages into one pretty large text file (22 MB). Still legible, full message headers, no attachments though.

    I put it in Evernote (I am guessing OneNote would work too), but the .txt files have to be broken down into files of 4 MB or so, as Evernote refuses to deal with files longer than 5 million characters. Evernote will accept files of up to 25 MB according to their FAQ, but if it's trying to index the file I think the rules change.
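    Breaking the dump into pieces does not have to be done by hand. A small Python sketch follows; the 4,000,000 character default is just a round number under the limit mentioned above, and the function name is made up. It breaks at line ends, not at message boundaries, so a message can straddle two pieces.

```python
def split_text(text, limit=4000000):
    """Split text into pieces of at most `limit` characters, breaking at line ends."""
    chunks, current, size = [], [], 0
    for line in text.splitlines(keepends=True):
        # Hard-wrap any single line that is longer than the limit on its own.
        for start in range(0, len(line), limit):
            piece = line[start:start + limit]
            if size + len(piece) > limit and current:
                chunks.append("".join(current))
                current, size = [], 0
            current.append(piece)
            size += len(piece)
    if current:
        chunks.append("".join(current))
    return chunks
```

    Joining the pieces back together gives the original text exactly, so nothing is lost by splitting.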

    Anyway, I have 15 years of webmail stored away there. Text is a decent archive solution for me in this case.