Ask Slashdot: Best (or Better) Ways To Archive Email?
An anonymous reader writes: I've been using email since the early '90s and have probably half a million emails in various places and accounts. Some of them are currently in .tar files, others in the original folders from obsolete or I-don't-use-them-anymore mail clients. Some IMAP, some POP3. You get the picture. I don't often need to access emails older than a year or two, but when I do, I have found that my only hope for the truly archived ones is to guess what Grep combo might find the right text in the file ... and then pick through the often unformatted, unwrapped, super ugly text until I find the email address or info that I'm searching for. Because of this, I tend to at-all-costs leave emails on servers or at least in the clients so that I can more easily search and find.
My question is whether there's any way to safely store them so that I can actually use them later, offline, with easy date searches, email address searches, and so on. Thunderbird for example has 'Archive' as an option, but if I migrate to a different client I assume that won't work anymore. So how do people archive emails effectively? Or is this totally a lost cause and I should keep limping along with grep?
MailStore Home is the de facto best free method I've found: http://www.mailstore.com/en/mailstore-home-email-archiving.aspx
I remember having a similar problem years ago with E-mail in several systems and getting annoyed that everything was in different formats in different E-mail clients. I fixed the problem by setting up my own IMAP server. An IMAP server is a mail server that's compatible with virtually ALL E-mail clients, but what's important is that, unlike POP3, it acts as a mail store, so you can upload mail to an IMAP server without screwing up formatting or anything. Then once you get all your E-mail up to your IMAP server, you can choose to just store it there (just remember to back it up now and then) or you can redownload it all into a Mail folder in Thunderbird (back up Thunderbird's Mail store folder for protection). Thunderbird probably isn't going away in the foreseeable future, but if it does, sometime down the road you can reuse your IMAP server to transfer it to another mail client.
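For anyone who wants to script the "upload it all to IMAP" step, here is a minimal sketch using Python's standard imaplib and mailbox modules. It assumes the old mail has already been converted to mbox files; the function name upload_mbox, the folder name "Archive", and the host and login details in the usage note are made up for illustration.

```python
import imaplib
import mailbox
import time
from email.utils import parsedate_to_datetime


def upload_mbox(conn, mbox_path, folder="Archive"):
    """Append every message in an mbox file to an IMAP folder; return the count."""
    # If the folder already exists the server normally just answers "NO".
    conn.create(folder)
    uploaded = 0
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for msg in box:
            # Keep the original date where the header parses; otherwise use "now".
            try:
                when = parsedate_to_datetime(msg["Date"]).timestamp()
            except (TypeError, ValueError, OverflowError, OSError):
                when = time.time()
            conn.append(folder, "", imaplib.Time2Internaldate(when), msg.as_bytes())
            uploaded += 1
    finally:
        box.close()
    return uploaded
```

To use it, open a connection first, e.g. conn = imaplib.IMAP4_SSL("imap.example.org") followed by conn.login(user, password), then call upload_mbox(conn, "old-mail.mbox"). Try it on a small test file before pointing it at years of mail.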
Use the Thunderbird archive.
Thunderbird for example has 'Archive' as an option, but if I migrate to a different client I assume that won't work anymore.
Nope! :-)
I have about 10 years of email in Thunderbird. It keeps data in the mbox format, which is a well-supported open standard. The files are human readable and can be grepped. There are lots of 3rd-party tools that support mbox. Thunderbird builds indexes (maybe those are proprietary) which are good enough that I can search that decade of email in a few seconds. (Maybe that is only searching by subject, to, and from. Message body searches might take longer.) I remove attachments from old mail though, because that eats up space and is not valuable. If I needed the attachment, I saved it somewhere more appropriate.
The Thunderbird archive feature merely moves the mail into separate mbox folders to keep the main file from getting too big. It doesn't make them proprietary.
The hard part might be moving existing mail into that format from whatever it is in now.
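To illustrate the point about mbox being readable outside Thunderbird, here is a rough sketch of the date and address searches the original question asks for, using only Python's standard mailbox module. The function name search_mbox is made up, since and until are assumed to be timezone-aware datetimes, and messages without a parseable Date header are simply skipped.

```python
import mailbox
from datetime import timezone
from email.utils import getaddresses, parsedate_to_datetime


def search_mbox(path, address=None, since=None, until=None):
    """Yield (date, from, subject) for messages matching an address and/or date range."""
    box = mailbox.mbox(path, create=False)
    try:
        for msg in box:
            # Collect every address in From/To/Cc, lower-cased for comparison.
            headers = [str(h) for name in ("From", "To", "Cc")
                       for h in msg.get_all(name, [])]
            people = {addr.lower() for _, addr in getaddresses(headers)}
            if address and address.lower() not in people:
                continue
            try:
                sent = parsedate_to_datetime(str(msg["Date"]))
            except (TypeError, ValueError):
                continue  # no usable Date header
            if sent.tzinfo is None:
                # Assumption: dates without a zone are treated as UTC.
                sent = sent.replace(tzinfo=timezone.utc)
            if since and sent < since:
                continue
            if until and sent >= until:
                continue
            yield sent, str(msg["From"]), str(msg["Subject"])
    finally:
        box.close()
```

Point it at one of the mbox files in the Thunderbird profile's Mail folder (the exact path depends on your profile) and loop over the results. It reads the file from start to finish on every search, so it is slower than Thunderbird's index, but it does not depend on any particular mail client.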
Holding your business emails too long is a liability risk... they are subject to discovery in the case of a lawsuit. Most businesses have a limited email retention policy for that very reason.
This sword clearly cuts both ways.
And that is why sane companies come up with sane records retention schedules. Minutes from the fun committee meeting? Yeah, toss 'em after 5 years. Design and patent work? 25 years, plus additional terms if required.
Some of the things we have at the major Canadian bank I work at have lifetimes ranging from 5 years to 20 years to permanent. It's a PITA to set up because people hate changing their routines, but it absolutely comes in handy in exactly those types of cases.
I asked a similar question to Slashdot about a month ago, where I wanted to stash E-mail and have it accessible if I'm on the road.
I looked at a few options. Using a virtual machine, an offsite storage provider, and so on.
What I have wound up doing is buying a NAS. Synology or QNAP are good companies for this. The NAS I bought was a basic one, but it supports RAID 1, which is critical. It also gets backed up automatically via a script that goes in via SSH, creates a tar file, pipes it to zbackup which has a repository on another NAS. zbackup is ideal for backups of E-mail, and having another machine pull the backups helps deal with ransomware, once the bad guys start hitting devices.
I then enabled the mail server functionality, which gave me an implementation of dovecot and roundcube. This gave me not just IMAP access but also access via the web (over SSL). Using the onboard firewalling, I limited the IP range that the NAS talks with to just the IP range of the commercial VPN service I use (which is a small provider, run by some competent admins). This way, for an attacker to even get to an open port forwarded past the router to the machine, they have to have an account with that small VPN provider.
For me, this has worked well. I have access to my E-mail over IMAP or the web. Since the NAS doesn't send or receive mail directly (mail just gets copied to it when archived), it doesn't need SMTP access in or out.
Caveat: Focus on security when setting this up. Ideally, you could use the NAS's built-in eCryptFS capability to protect the IMAP maildir directories so physical theft of the NAS doesn't mean your critical E-mails belong to someone else. From there, put the NAS in its own DMZ, blocking all outgoing traffic except for OS update checks, and only allowing incoming traffic to the TLS-based ports, preferably with heavy IP restrictions. For backups, do a pull-based system, so if the NAS gets infected, the bad guys can only put garbage in the backups, and not attack previously stored data.
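The pull-style backup described above (the backup machine goes in via SSH, tars the mail, and pipes it to zbackup) might look roughly like the sketch below. It is not the actual script. The function names, the "mail-DATE.tar" naming, and the host and paths in the usage note are made up, it assumes key-based SSH login is already set up, and it assumes the zbackup repository has already been initialised. Any encryption or password options your zbackup repository needs are left out.

```python
import shlex
import subprocess
from datetime import date


def pull_backup_commands(nas_host, remote_dir, repo, today=None):
    """Return the (ssh + tar, zbackup) argv pair for one pull-style backup run."""
    stamp = (today or date.today()).isoformat()
    # ssh hands the command to a remote shell, so quote the directory name.
    producer = ["ssh", nas_host, "tar", "-cf", "-", "-C", shlex.quote(remote_dir), "."]
    # zbackup reads the stream on stdin and stores it under <repo>/backups/.
    consumer = ["zbackup", "backup", f"{repo}/backups/mail-{stamp}.tar"]
    return producer, consumer


def run_pull_backup(nas_host, remote_dir, repo):
    """Stream a tar of remote_dir over SSH straight into zbackup."""
    producer, consumer = pull_backup_commands(nas_host, remote_dir, repo)
    tar = subprocess.Popen(producer, stdout=subprocess.PIPE)
    zb = subprocess.Popen(consumer, stdin=tar.stdout)
    tar.stdout.close()  # lets tar see a broken pipe if zbackup exits early
    zb_rc = zb.wait()
    tar_rc = tar.wait()
    if zb_rc != 0 or tar_rc != 0:
        raise RuntimeError(f"backup failed (tar={tar_rc}, zbackup={zb_rc})")
```

Run it from the backup machine, not from the mail NAS, e.g. run_pull_backup("backup@mail-nas", "/volume1/mail", "/srv/zbackup-repo") from cron. That way the mail NAS never holds credentials for the backup repository.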