Ask Slashdot: Best (or Better) Ways To Archive Email?
An anonymous reader writes: I've been using email since the early '90s and have probably half a million emails in various places and accounts. Some of them are currently in .tar files, others in the original folders from obsolete or I-don't-use-them-anymore mail clients. Some IMAP, some POP3. You get the picture. I don't often need to access emails older than a year or two, but when I do, I have found that my only hope for the truly archived ones is to guess what Grep combo might find the right text in the file ... and then pick through the often unformatted, unwrapped, super ugly text until I find the email address or info that I'm searching for. Because of this, I tend to at-all-costs leave emails on servers or at least in the clients so that I can more easily search and find.
My question is whether there's any way to safely store them in a way that I can actually use them later, offline, in a way that allows for easy date searches, email address searches, and so on. Thunderbird for example has 'Archive' as an option, but if I migrate to a different client I assume that won't work anymore. So what ways do people archive emails effectively? Or is this totally a lost cause and I should keep limping along with grep?
Use the Thunderbird archive.
Thunderbird for example has 'Archive' as an option, but if I migrate to a different client I assume that won't work anymore.
Nope! :-)
I have about 10 years of email in Thunderbird. It keeps data in the mbox format, which is a well-supported open standard. The files are human readable and can be grepped. There are lots of 3rd-party tools that support mbox. Thunderbird builds indexes (maybe those are proprietary) which are good enough that I can search that decade of email in a few seconds. (Maybe that is only searching by subject, to, and from. Message body searches might take longer.) I remove attachments from old mail though, because that eats up space and is not valuable. If I needed the attachment, I saved it somewhere more appropriate.
The Thunderbird archive feature merely moves the mail into separate mbox folders to keep the main file from getting too big. It doesn't make them proprietary.
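To make the "lots of 3rd-party tools" point concrete: Python's standard library ships a `mailbox` module that reads mbox files directly, so the date and address searches the submitter asked about don't depend on any mail client. Below is a minimal sketch, not anything Thunderbird itself provides. The names `search_mbox` and `_parse_date` are made up for illustration, and the sketch assumes the mbox file is not being written by a mail client while you read it.

```python
import mailbox
from datetime import datetime, timezone
from email.utils import getaddresses, parsedate_to_datetime


def _parse_date(msg):
    """Return the Date header as an aware datetime, or None if unusable."""
    raw = msg["Date"]
    if raw is None:
        return None
    try:
        date = parsedate_to_datetime(str(raw))
    except (TypeError, ValueError):
        return None
    # Treat headers without a usable UTC offset as UTC so comparisons work.
    if date.tzinfo is None:
        date = date.replace(tzinfo=timezone.utc)
    return date


def search_mbox(path, address=None, since=None, until=None):
    """Yield (date, sender, subject) for messages matching every filter given.

    since/until must be timezone-aware datetimes; until is exclusive.
    """
    box = mailbox.mbox(path, create=False)
    try:
        for msg in box:
            if address is not None:
                # Gather every address in From/To/Cc and compare lowercased.
                headers = []
                for name in ("From", "To", "Cc"):
                    headers.extend(str(v) for v in msg.get_all(name, []))
                found = {addr.lower() for _, addr in getaddresses(headers)}
                if address.lower() not in found:
                    continue
            date = _parse_date(msg)
            if since is not None or until is not None:
                # Messages with a missing or broken Date can't match a range.
                if date is None:
                    continue
                if since is not None and date < since:
                    continue
                if until is not None and date >= until:
                    continue
            yield date, str(msg.get("From", "")), str(msg.get("Subject", ""))
    finally:
        box.close()
```

You would call it with the path to one of your own mbox files (the location varies by client and profile, so no path is assumed here), for example `search_mbox(path, address="someone@example.com", since=datetime(2009, 1, 1, tzinfo=timezone.utc))`. This walks the whole file on every search, so it will be slower than Thunderbird's index on a decade of mail, but it keeps working no matter which client you migrate to.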
The hard part might be moving existing mail into that format from whatever it is in now.
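How hard that is depends on what the old clients left behind, which the submitter doesn't say. If some of those folders happen to be in Maildir format (one file per message under `cur/`, `new/`, and `tmp/`), Python's `mailbox` module can copy them into an mbox file. A rough sketch under that assumption; `maildir_to_mbox` is a hypothetical helper name, and other formats would need a different reader:

```python
import mailbox


def maildir_to_mbox(maildir_path, mbox_path):
    """Append every message in a Maildir folder to an mbox file; return the count."""
    src = mailbox.Maildir(maildir_path, create=False)
    dst = mailbox.mbox(mbox_path)  # created if it does not exist yet
    copied = 0
    try:
        for msg in src:
            # mboxMessage(...) carries over read/replied flags where it can.
            dst.add(mailbox.mboxMessage(msg))
            copied += 1
        dst.flush()
    finally:
        # No locking is done here: run this only while no mail client
        # has the destination file open.
        dst.close()
        src.close()
    return copied
```

Work on a copy of the old folders rather than the originals, and spot-check the resulting file with grep or a mail client before deleting anything.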
Holding your business emails too long is a liability risk... they are subject to discovery in the case of a lawsuit. Most businesses have a limited email retention policy for that very reason.
I friggin' hate people who, on an Ask Slashdot, completely fail to answer the question and say something that has nothing to do with the topic at hand.
And yes, I am aware of the irony of posting a comment like this to criticize one, so you needn't bother pointing that out.
Holding your business emails too long is a liability risk..
I was just asked to recover email from the late 90s as part of a means to prove we had prior art on a patent that was being asserted against us. The email history included draft drawings, work orders to a manufacturer requesting customizations to our manufacturing equipment, invoices, negotiations with customers to work with it, etc. All with a clearly documented timeline that could be verified with multiple 3rd parties if it came to a court situation.
This sword clearly cuts both ways.
There is no good reason to keep 25 years of email.
There is no good reason to assume that your needs are the same as those of others.
I had a client who insisted he needed to keep every email forever. I thought he was full of shit until he explained to me why.
He works as a vendor rep, helping them sell shit to a well-known Fortune 50 retailer.
As it turns out, this Fortune 50 company periodically audits years old (like sometimes 5+ years) invoices and receiving information and arbitrarily decides "we just realized that shipment you sent us in 2009 was short, but we paid the invoice in full. So we're going to subtract the overpayment -- plus interest -- from the current amount we owe you."
Part of this guy's job was the ability to get the shipping/receiving info as it happens, and the old email lets him present info that basically says "you said it was a complete shipment in 2009, so no deductions".
What I found kind of amazing was that somehow this retroactive auditing is considered acceptable. My guess is vendors are just expected to eat it or not get their product on the shelves.