Ask Slashdot: Handling and Cleaning Up a Large Personal Email Archive?
First time accepted submitter txoof writes "I have a personal email archive that goes back to 2003. The early archives are around 2 megabytes. Every year the archives have grown significantly in size, from a few tens of megs to nearly 500 megs for 2010. The archive is for storage only; it is a mirror of my Gmail account. The archives contain both sent and received mail, compressed in a hierarchy of weekly, monthly and yearly mbox files. I've chosen mbox for a variety of reasons, but mostly because it is the simplest to implement with fetchmail. After inspecting some of the archives, I've noticed that the larger files are a result of attachments sent by well-meaning family members: things like baby pictures, wedding pictures, etc. What I would like to do from this point forward is strip out all of the attachments and only save the text of the emails. What would be a sane way to do that using simple tools like fetchmail?"
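One possible approach, sketched here under the assumption that Python 3 is available on the archive machine: run a separate pass over the mbox files fetchmail has already written (fetchmail itself is not involved), using the standard-library `mailbox` and `email` modules. The sketch replaces every non-text part, and any part explicitly marked `Content-Disposition: attachment`, with a one-line placeholder naming the removed file, then appends the result to a new mbox. The function names `strip_attachments` and `strip_mbox` are hypothetical, inline HTML bodies are kept, and the input is assumed to be an uncompressed mbox (decompress the archive first).

```python
import mailbox


def strip_attachments(msg):
    """Replace attachments in msg (in place) with short text placeholders.

    A leaf part counts as an attachment if it is not text/*, or if it is
    explicitly marked Content-Disposition: attachment. Returns the number
    of parts replaced.
    """
    removed = 0
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers only; walk() visits their children next
        disposition = part.get_content_disposition()
        if part.get_content_maintype() == "text" and disposition != "attachment":
            continue  # keep the readable body (plain text or HTML)
        # Record what is being dropped, ASCII-only so the placeholder is valid us-ascii.
        name = (part.get_filename() or "unnamed").encode("ascii", "replace").decode("ascii")
        ctype = part.get_content_type()
        for header in ("Content-Type", "Content-Transfer-Encoding", "Content-Disposition"):
            del part[header]  # deleting a missing header is a no-op
        part["Content-Type"] = 'text/plain; charset="us-ascii"'
        part.set_payload(f"[attachment removed: {name} ({ctype})]\n")
        removed += 1
    return removed


def strip_mbox(src_path, dst_path):
    """Append every message from src_path to dst_path with attachments stripped.

    The source mbox is only read; dst_path is created if it does not exist.
    Returns the total number of parts replaced.
    """
    src = mailbox.mbox(src_path, create=False)
    dst = mailbox.mbox(dst_path)
    dst.lock()
    total = 0
    try:
        for msg in src:
            total += strip_attachments(msg)
            dst.add(msg)  # an mboxMessage keeps its original "From " line
    finally:
        src.close()
        dst.close()  # flushes pending writes and releases the lock
    return total
```

For example, `strip_mbox("2010.mbox", "2010-text.mbox")` returns the number of parts it replaced. Writing to a new file rather than rewriting the original is deliberate: the original stays untouched until the stripped copy has been checked, so a mistake in the filter costs nothing.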
You have. Thunderbird includes archival folders and a Lucene search engine.
Exactly this, and even if it's a few GB. It's just too small an amount to bother about.
Agreed. 500MB is trivial, especially if it includes a bunch of large attachments. I just checked my email directory at home, and it's 2.7GB in size. It's on a network drive and Thunderbird accesses it more-or-less instantly; there is no discernible lag in showing the content of any mail folder. The hierarchy of folders is complicated, and some folders are large. The network drive is backed up automatically three times a week, so its risk of loss is tolerably low. With modern email clients, the penalty of huge email directories should be tiny.
Why bother indeed? When I look at my mail folders, I try to recall the last time I actually searched my personal mail for something older than one year.
The only mails I keep are orders I placed and passwords I requested. Everything else I delete after one year.
I already delete a lot right after reading; e.g. most mailing-list mail is deleted almost immediately. Bug reports I filed are kept until they are closed.
This is something I do in real life as well. If I have not used something in a year and there is no emotional value, I will throw it away. Even though it is technically possible to keep everything, I see no reason to do so.
Ask yourself: When are you ever going to read all those emails again? When is *anybody* ever going to read them again?
As soon as:
1) You divorce
2) You get arrested for ANYTHING
3) They arrive with a search warrant for any reason
4) You sue or are sued
5) You run for office
6) You get hacked
Seriously, I keep VERY little historical email. Very little.
I am not so vain that I believe there is any historical significance, and I have never needed to go back more than a couple of months for anything.
Just delete it. It's safer that way.
I'm at about 12GB myself, and that's one of the two big reasons that I keep the mail in maildir format and connect all clients to it via IMAP. Using a real mail server has kept that from happening to me (again) for years now. The other reason is that it makes it really easy to switch clients just to play around, or to access the mail from lots of places.
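For anyone wanting to follow the maildir route, an existing mbox archive can be converted with Python's standard-library `mailbox` module. This is a minimal sketch assuming Python 3; `mbox_to_maildir` is a hypothetical name, and serving the resulting directory over IMAP is left to whichever IMAP server you run, which the comment does not name.

```python
import mailbox


def mbox_to_maildir(mbox_path, maildir_path):
    """Copy every message in an mbox file into a Maildir directory.

    The Maildir (with its tmp/, new/ and cur/ subdirectories) is created
    if it does not exist. Returns the number of messages copied.
    """
    src = mailbox.mbox(mbox_path, create=False)
    dst = mailbox.Maildir(maildir_path, create=True)
    copied = 0
    try:
        for msg in src:
            # Converting to MaildirMessage maps mbox Status/X-Status flags
            # onto Maildir flags and picks the new/ or cur/ subdirectory.
            dst.add(mailbox.MaildirMessage(msg))
            copied += 1
    finally:
        src.close()
        dst.close()
    return copied
```

Wrapping each message in `mailbox.MaildirMessage` is what carries the mbox state across: messages marked old (`O` in the `Status` header) land in `cur/` instead of `new/`, and read messages keep their seen flag, so an IMAP client does not suddenly show years of archive as unread.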