Ask Slashdot: Best (or Better) Ways To Archive Email?
An anonymous reader writes: I've been using email since the early '90s and have probably half a million emails in various places and accounts. Some of them are currently in .tar files, others in the original folders from obsolete or I-don't-use-them-anymore mail clients. Some IMAP, some POP3. You get the picture. I don't often need to access emails older than a year or two, but when I do, I have found that my only hope for the truly archived ones is to guess what Grep combo might find the right text in the file ... and then pick through the often unformatted, unwrapped, super ugly text until I find the email address or info that I'm searching for. Because of this, I tend to at-all-costs leave emails on servers or at least in the clients so that I can more easily search and find.
My question is whether there's any way to safely store them in a way that I can actually use them later, offline, in a way that allows for easy date searches, email address searches, and so on. Thunderbird for example has 'Archive' as an option, but if I migrate to a different client I assume that won't work anymore. So what ways to people archive emails effectively? Or is this totally a lost cause and I should keep limping along with grep?
My question is whether there's any way to safely store them in a way that I can actually use them later, offline, in a way that allows for easy date searches, email address searches, and so on. Thunderbird for example has 'Archive' as an option, but if I migrate to a different client I assume that won't work anymore. So what ways to people archive emails effectively? Or is this totally a lost cause and I should keep limping along with grep?
I remember having a similar problem years ago with E-mail in several systems and getting annoyed that everything was in different formats in different E-mail clients. I fixed the problem by setting up my own IMAP server. An IMAP server is a mail server that's compatible with virtually ALL E-mail clients but what's important about them is they act as mail stores unlike POP3 so you can upload mail to an IMAP server without screwing up formatting or anything. Then once you get all your E-mail up to your IMAP server, you can chose to just store it there (just remember to back it up now and then) or you can redownload it all into a Mail folder on ThunderBird (Backup Thunderbird's Mail store folder for protection) ThunderBird probably isn't going away in the foreseeable future but if it does, sometime down the road you can reuse your IMAP server to transfer it to another mail client.
Use the Thunderbird archive.
Thunderbird for example has 'Archive' as an option, but if I migrate to a different client I assume that won't work anymore.
Nope! :-)
I have about 10 years of email in Thunderbird. It keeps data in the mbox format which is a well supported open standard. The files are human readable and can be greped. There's lots of 3rd-party tools that support mbox. Thunderbird builds indexes (maybe those are proprietary) which are good enough that I can search that decade of email in a few seconds. (Maybe that is only searching by subject, to, and from. Message body searches might take longer). I remove attachments from old mail though, because that eats up space and is not valuable. If I needed the attachment, I saved it somewhere more appropriate.
The Thunderbird archive feature merely moves the mail into separate mbox folders to keep the main file from getting too big. It doesn't make them proprietary.
The hard part might be moving existing mail into that format from whatever it is in now.
Holding your business emails too long is a liability risk... they are subject to discovery in the case of a lawsuit. Most businesses have a limited email retention policy for that very reason.
I friggin' hate people who, on an Ask Slashdot, completely fail to answer the question and say something that has nothing to do with the topic at hand.
And yes, I am aware of the irony of posting a comment like this to criticize one, so you needn't bother pointing that out.
Shutting down free speech with violence isn't fighting fascism. It IS fascism!
Holding your business emails too long is a liability risk..
I was just asked to recover email from the late 90s as part of a means to prove we had prior art on a patent that was being asserted against us. The email history included draft drawings, work orders to a manufacturer requesting customizations to our manufacturing equipment, invoices and negotiations with customers to work with it. etc. All with a clearly documented timeline that could be verified with multiple 3rd parties if it came to a court situation.
This sword clearly cuts both ways.
There is no good reason to keep 25 years of email.
There is no good reason to assume that your needs are the same as those of others.
I had a client who insisted he needed to keep every email forever. I thought he was full of shit until he explained to me why.
He works as a vendor rep, helping them sell shit to a well-known Fortune 50 retailer.
As it turns out, this Fortune 50 company periodically audits years old (like sometimes 5+ years) invoices and receiving information and arbitrarily decides "we just realized that shipment you sent us in 2009 was short, but we paid the invoice in full. So we're going to subtract the overpayment -- plus interest -- from the current amount we owe you."
Part of this guy's job was the ability to get the shipping/receiving info as it happens, and the old email lets him present info that basically says "you said it was a complete shipment in 2009, so no deductions".
What I found kind of amazing was that somehow this retroactive auditing is considered acceptable. My guess is vendors are just expected to eat it or not get their product on the shelves.
I asked a similar question to Slashdot about a month ago, where I wanted to stash E-mail and have it accessible if I'm on the road.
I looked at a few options. Using a virtual machine, an offsite storage provider, and so on.
What I have wound up doing is buying a NAS. Synology or QNAP are good companies for this. The NAS I bought was a basic one, but it supports RAID 1, which is critical. It also gets backed up automatically via a script that goes in via SSH, creates a tar file, pipes it to zbackup which has a repository on another NAS. zbackup is ideal for backups of E-mail, and having another machine pull the backups helps deal with ransomware, once the bad guys start hitting devices.
I then enabled the mail server functionality, which gave me an implementation of dovecot and roundcube. This not just gave me IMAP access, but access via the web (SSL). Using the onboard firewalling, I limited the IP range that the NAS talks with, to just the IP range of the commercial VPN service I use (which is a small provider, run by some competent admins.) This way, for an attacker to even get to an open port forwarded past the router to the machine, they have to have an account with that small VPN provider.
For me, this has worked well. I have access to my E-mail over IMAP or the web. Since the NAS doesn't send or receive mail directly (mail just gets copied to it when archived), it doesn't need SMTP access in or out.
Caveat: Focus on security when setting this up. Ideally, you could use the NAS's built in eCryptFS capability to protect the IMAP maildir directories so physical theft of the NAS doesn't mean your critical E-mails belong to someone else. From there, put the NAS in its own DMZ, blocking all outgoing traffic except for it checking for OS updates, and only allowing incoming traffic to the TLS-based ports, preferably with heavy IP restrictions. For backups, do a pull based system, so if the NAS gets infected, the bad guys can only put garbage in the backups, and not attack previously stored data.