Ask Slashdot: Best (or Better) Ways To Archive Email?
An anonymous reader writes: I've been using email since the early '90s and have probably half a million emails in various places and accounts. Some of them are currently in .tar files, others in the original folders from obsolete or I-don't-use-them-anymore mail clients. Some IMAP, some POP3. You get the picture. I don't often need to access emails older than a year or two, but when I do, I have found that my only hope for the truly archived ones is to guess what grep combo might find the right text in the file ... and then pick through the often unformatted, unwrapped, super ugly text until I find the email address or info that I'm searching for. Because of this, I tend to leave emails on servers, or at least in the clients, at all costs so that I can more easily search and find them.
My question is whether there's any way to safely store them in a way that I can actually use them later, offline, in a way that allows for easy date searches, email address searches, and so on. Thunderbird for example has 'Archive' as an option, but if I migrate to a different client I assume that won't work anymore. So what ways do people archive emails effectively? Or is this totally a lost cause and I should keep limping along with grep?
MailStore Home is the de facto best free method I've found: http://www.mailstore.com/en/mailstore-home-email-archiving.aspx
That's true, but from the sounds of it this is for business reasons. For business it's probably more important than if it was personal.
Website Just Down For Me? Find out
`OfflineImap` (for fetching into a local maildir), then `mu` for indexing and searching.
As for converting your already-archived mail into maildir format, that's a little more tricky. Once they're in maildir format, you can just use `tar` to compress the ones you don't currently need to access.
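For the conversion step, here is a minimal sketch using Python's standard `mailbox` module. It assumes the old mail is already in mbox files (mail stuck in some other client's format would have to be exported to mbox first). The function name and the paths are made up for illustration.

```python
import mailbox


def mbox_to_maildir(mbox_path, maildir_path):
    """Copy every message from an mbox file into a Maildir; return the count."""
    src = mailbox.mbox(mbox_path, create=False)       # fail if the mbox is missing
    dst = mailbox.Maildir(maildir_path, create=True)  # creates cur/, new/, tmp/
    count = 0
    try:
        for message in src:
            # Wrapping in MaildirMessage carries over mbox status flags
            # (read, replied, ...) to their Maildir equivalents.
            dst.add(mailbox.MaildirMessage(message))
            count += 1
    finally:
        src.close()
        dst.close()
    return count
```

Something like `mbox_to_maildir("old-archive.mbox", "Mail/archive")` (hypothetical paths) would then leave a directory that `mu` or any other maildir-aware tool can index.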
https://www.mailarchiva.com/
Works pretty well.
I'm sorry, I can't hear you over the sound of how awesome I am.
In that case, everyone at the company should print out each email they receive.
I remember having a similar problem years ago, with E-mail spread across several systems, and getting annoyed that every E-mail client used a different format. I fixed the problem by setting up my own IMAP server. An IMAP server is compatible with virtually ALL E-mail clients, but what's important is that, unlike POP3, it acts as a mail store, so you can upload mail to it without screwing up the formatting or anything. Once you get all your E-mail up to your IMAP server, you can choose to just store it there (just remember to back it up now and then), or you can re-download it all into a mail folder in Thunderbird (back up Thunderbird's mail store folder for protection). Thunderbird probably isn't going away in the foreseeable future, but if it does, you can reuse your IMAP server to transfer everything to another mail client.
I've been using email since the early 1980's, 1982 specifically. I was using "mail" then, later mailx, later whizbang graphical clients.
I still have tar archives of emails from a PDP-11. I can still read them today. Why? Because open formats. Tar archives from the dawn of time can still be read on a modern Linux system today. Once you start locking things up in proprietary formats such as used by Outlook, it gets harder to read them once that format dies. Not impossible, but certainly a bigger PITA.
Tar will probably still be here long after I am gone, so from my POV it is a format with suitable longevity. The underlying messages were encoded in plain old (mbox, I think) mail format, which is also still readable by modern mail clients, and even if it wasn't, it's plain old ASCII, so "less" would suffice in a pinch. Stay away from weird binary / closed formats!
Use the Thunderbird archive.
Thunderbird for example has 'Archive' as an option, but if I migrate to a different client I assume that won't work anymore.
Nope! :-)
I have about 10 years of email in Thunderbird. It keeps data in the mbox format, which is a well-supported open standard. The files are human-readable and can be grepped. There are lots of 3rd-party tools that support mbox. Thunderbird builds indexes (maybe those are proprietary) which are good enough that I can search that decade of email in a few seconds. (Maybe that is only searching by subject, to, and from. Message body searches might take longer.) I remove attachments from old mail though, because they eat up space and are not valuable. If I needed the attachment, I saved it somewhere more appropriate.
The Thunderbird archive feature merely moves the mail into separate mbox folders to keep the main file from getting too big. It doesn't make them proprietary.
The hard part might be moving existing mail into that format from whatever it is in now.
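Because mbox is an open format, the date and address searches the submitter asks about don't need any particular client. Here is a rough sketch using only Python's standard library. The function names are invented for this example, and it assumes the `since`/`until` arguments are timezone-aware datetimes. Messages with a missing or unparseable Date header are skipped whenever a date filter is given.

```python
import mailbox
from datetime import timezone
from email.utils import parsedate_to_datetime


def message_date(msg):
    """Return the Date header as an aware datetime, or None if unusable."""
    raw = msg.get("Date")
    if raw is None:
        return None
    try:
        parsed = parsedate_to_datetime(str(raw))
    except (TypeError, ValueError):
        return None
    if parsed.tzinfo is None:
        # A "-0000" zone yields a naive datetime; treat it as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def search_mbox(path, sender=None, since=None, until=None):
    """Return (date, from, subject) tuples for messages matching all filters."""
    hits = []
    box = mailbox.mbox(path, create=False)
    try:
        for msg in box:
            from_header = str(msg.get("From", ""))
            if sender and sender.lower() not in from_header.lower():
                continue
            date = message_date(msg)
            if since or until:
                if date is None:
                    continue
                if since and date < since:
                    continue
                if until and date > until:
                    continue
            hits.append((date, from_header, str(msg.get("Subject", ""))))
    finally:
        box.close()
    return hits
```

This is a linear scan, so on hundreds of thousands of messages it will be far slower than an indexed search; it is only meant to show that nothing about the format locks you in.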
Holding your business emails too long is a liability risk... they are subject to discovery in the case of a lawsuit. Most businesses have a limited email retention policy for that very reason.
I friggin' hate people who, on an Ask Slashdot, completely fail to answer the question and say something that has nothing to do with the topic at hand.
And yes, I am aware of the irony of posting a comment like this to criticize one, so you needn't bother pointing that out.
Shutting down free speech with violence isn't fighting fascism. It IS fascism!
That's true, but from the sounds of it this is for business reasons. For business it's probably more important than if it was personal.
For business it can be even more important to clean things out. Having old things on hand is more likely to work against you than work in your favor. Yes, some documents need to be carefully retained and kept on file for the life of the business and the best place to do that is not in email. Most of these communications should be disposed of on a regular basis.
Most business lawyers I've worked with have strongly recommended a data retention policy to dump email regularly and always before the 3-month government communications free-for-all. Most work places I've been at have had 3 months before automatic forced deletion of email. If it is important it does not belong in email. Unread email is treated differently under the law, and currently any email that is six months old or older and marked as unread can be opened and read by federal agencies without a warrant. Similarly, transitory communications like chat logs and even file transfers through services like DropBox are easily accessed by government's prying eyes. Don't keep data there because lots of organizations, including government agencies, corporate spies, and opposition lawyers, can all get access to it.
If it is important it gets printed and filed, or moved to electronic documents that are properly archived, or otherwise moved to a better location than email. Paper files and electronic archives get properly maintained with their own data retention policies. Contracts and agreements made get filed with dates.
There is no good reason to keep 25 years of email.
Print out and properly file what is important. Agreements and important documents get filed. Properly file and archive personal mementos (not in email) or put them in a scrap book.
//TODO: Think of witty sig statement
Holding your business emails too long is a liability risk..
I was just asked to recover email from the late 90s as part of a means to prove we had prior art on a patent that was being asserted against us. The email history included draft drawings, work orders to a manufacturer requesting customizations to our manufacturing equipment, invoices and negotiations with customers to work with it, etc. All with a clearly documented timeline that could be verified with multiple 3rd parties if it came to a court situation.
This sword clearly cuts both ways.
There is no good reason to keep 25 years of email.
There is no good reason to assume that your needs are the same as those of others.
I had a client who insisted he needed to keep every email forever. I thought he was full of shit until he explained to me why.
He works as a vendor rep, helping them sell shit to a well-known Fortune 50 retailer.
As it turns out, this Fortune 50 company periodically audits years old (like sometimes 5+ years) invoices and receiving information and arbitrarily decides "we just realized that shipment you sent us in 2009 was short, but we paid the invoice in full. So we're going to subtract the overpayment -- plus interest -- from the current amount we owe you."
Part of this guy's job was the ability to get the shipping/receiving info as it happens, and the old email lets him present info that basically says "you said it was a complete shipment in 2009, so no deductions".
What I found kind of amazing was that somehow this retroactive auditing is considered acceptable. My guess is vendors are just expected to eat it or not get their product on the shelves.
Bob from accounting.
When you move, if you find a carton from the previous move unopened, discard it without opening. Follow the same rule and throw away the old emails. There is nothing of value in it.
sed -e 's/Chuck Norris/Rajnikant/g' joke > fact
Put all your mail on an imap server. You'll be able to access it with any mail client. Set up the imap server as the archive destination for TBird. Now all your mail is archived in the imap server and is accessible.
You don't trust your email host? That's fair. Run your own imap server on your NAS or even your desktop machine. Everything stays right there on your own media and is still future-proof with regard to changing clients. If you need to change servers you just use your favorite email client to transfer mail from one to another.
I have everything online at my email provider. In my case, "everything" goes back to the mid-90s. I recently switched hosting providers and did just as I described: Set up separate accounts in TBird with the old and new providers. Select all in a folder on the old provider, drag to a folder on the new provider. (Well, actually I had to do it in chunks of under 5000 messages or TBird would get all crashy on me. But you get the idea.) It was kind of tedious to move hundreds of thousands of messages, but it was merely tedious. It wasn't problematic.
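The same folder-by-folder, chunk-by-chunk copy can be scripted instead of dragged. What follows is only a sketch with Python's standard `imaplib`, not what the poster actually did. `copy_folder`, `batched`, and the batch size are names and values chosen for this example. It assumes both connections are already logged in, and it does not carry over flags or the server's internal date (the Date header inside each message is untouched).

```python
import imaplib


def batched(items, size):
    """Yield successive lists of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def copy_folder(src, dst, folder, batch_size=500):
    """Copy all messages in `folder` from src to dst; return how many were copied.

    `src` and `dst` are logged-in imaplib.IMAP4 or IMAP4_SSL connections.
    """
    dst.create(folder)  # answers NO if the folder already exists; ignored here
    typ, _ = src.select(folder, readonly=True)
    if typ != "OK":
        raise RuntimeError(f"cannot open {folder!r} on the source server")
    typ, data = src.search(None, "ALL")
    if typ != "OK":
        raise RuntimeError("search failed on the source server")
    ids = data[0].split() if data and data[0] else []

    copied = 0
    for batch in batched(ids, batch_size):
        # One FETCH per chunk keeps each round trip (and memory use) bounded.
        typ, parts = src.fetch(b",".join(batch), "(RFC822)")
        if typ != "OK":
            raise RuntimeError("fetch failed on the source server")
        for part in parts:
            # Message bodies arrive as (envelope, raw_bytes) tuples.
            if isinstance(part, tuple):
                typ, _ = dst.append(folder, None, None, part[1])
                if typ != "OK":
                    raise RuntimeError("append failed on the destination server")
                copied += 1
    return copied
```

In use, you would open the two connections with something like `imaplib.IMAP4_SSL("old.example.com")` and `imaplib.IMAP4_SSL("new.example.com")` (placeholder hosts), call `login()` on each, then call `copy_folder(src, dst, "INBOX")` per folder. Folder names containing spaces may need to be quoted by the caller.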
Chelloveck
I give up on debugging. From now on, SIGSEGV is a feature.
After trying several solutions I settled on Mairix. Searches are screaming fast (less than a second to search several hundred thousand emails), indexing is fast, it's reliable (no problems in the 5+ years I've been using it), and the search language is easy and flexible.
* I use procmail to send a copy of everything to an archive, rotated monthly
* The archive is therefore just a handful of mbox files
* I have a cron job to run "mairix -Q" every 5 minutes, and "mairix -p" nightly
* I have this in my .bashrc: "function search() { mairix -o $$ $* && mutt -f ~/Mail/$$ ; rm ~/Mail/$$ ; }"
* And here's my .mairixrc:
base=~/Mail
database=~/.mairixdb
mbox=archive-*
mformat=mbox
omit=spam
With the above, I can find:
* everything from slashdot in the last two months: search f:slashdot d:2m-
* any emails I sent containing "squishy" in the body: search f:subreality b:squishy
* messages with "password" or "passwd" or similar in the subject: search s:passw=
* get a quick summary of the search language: search -h
It's so good that I download all my email from my work Gmail account so I can search it... sometimes Google's search just isn't precise enough to find what I need.
I use a combination of mutt + offlineimap + notmuch for mail, local archiving and a very powerful search.
I've been on this setup for the past 6 years or so. If mutt isn't your thing, this approach is modular, so you could simply sync with offlineimap and index/search with notmuch.
Have a squat over at the hobo house.
I asked a similar question to Slashdot about a month ago, where I wanted to stash E-mail and have it accessible if I'm on the road.
I looked at a few options. Using a virtual machine, an offsite storage provider, and so on.
What I have wound up doing is buying a NAS. Synology or QNAP are good companies for this. The NAS I bought was a basic one, but it supports RAID 1, which is critical. It also gets backed up automatically via a script that goes in via SSH, creates a tar file, pipes it to zbackup which has a repository on another NAS. zbackup is ideal for backups of E-mail, and having another machine pull the backups helps deal with ransomware, once the bad guys start hitting devices.
I then enabled the mail server functionality, which gave me an implementation of dovecot and roundcube. This gave me not just IMAP access, but also access via the web (SSL). Using the onboard firewalling, I limited the IP range that the NAS talks with to just the IP range of the commercial VPN service I use (which is a small provider, run by some competent admins). This way, for an attacker to even get to an open port forwarded past the router to the machine, they have to have an account with that small VPN provider.
For me, this has worked well. I have access to my E-mail over IMAP or the web. Since the NAS doesn't send or receive mail directly (mail just gets copied to it when archived), it doesn't need SMTP access in or out.
Caveat: Focus on security when setting this up. Ideally, you could use the NAS's built in eCryptFS capability to protect the IMAP maildir directories so physical theft of the NAS doesn't mean your critical E-mails belong to someone else. From there, put the NAS in its own DMZ, blocking all outgoing traffic except for it checking for OS updates, and only allowing incoming traffic to the TLS-based ports, preferably with heavy IP restrictions. For backups, do a pull based system, so if the NAS gets infected, the bad guys can only put garbage in the backups, and not attack previously stored data.