Ask Slashdot: Best Way To Archive and Access Ancient Emails?
An anonymous reader writes "I started using email in the early 90s and have lost most of that first decade due to ignorance, botched backups, and so on. But since about 2000, I've got most, if not all, of my email in some form or other. I run Linux, so this has mainly been in a mix of various programs: Kmail, Evolution, Thunderbird. The past 2-3 years are still on the IMAP servers. My problem is that I only rarely NEED to look back to email from 5 years ago. But sometimes it's nice. Or I just want to reminisce about something, or find an old attachment that I was sent. But I do not want to clog my current email client of choice with vast backups and, even more, I don't know if the old mail will even convert easily. The file structures are different (some are mbox, others maildir, etc.), and I would ideally like a way to 1) store and archive these emails, 2) access them, and 3) search by Sender, Subject, Date, Attachments. Is there anything I can do, or do I just have to keep legacy applications on hand for this? Should I keep trying to upgrade and pull old files into the new applications? Any help or suggestions about what YOU do would be great."
Just IMAP it all.
I went IMAP in 1997 and have never looked back.
I've also used IMAP as a temporary conversion measure for people switching e-mail clients so even if you aren't sure, it makes a good first step.
I don't understand the concern about too many e-mails. I can access my email back to 1992. With multiple folders it shouldn't be a problem and with modern indexing a search shouldn't be an issue.
Had the same need 20 years ago when migrating from VAX/VMS to Unix. The old emails were saved in a not quite readable format, but I figured I could recover them if necessary. In the end, never bothered. Yes, there are a few (actually, only two) that I'd like to resurrect now, but life moves on.
I'm a big fan of throwing together a DB when I want to store things categorically like that and want fast searches. If you are up to the task, hunt down some tools/roll your own so that you have a nice relational database and some stored procedures for getting what you want when you need it.
You could export your emails to some parsable format, write an importer to extract the basics that you want to keep (from/to/subject/body/attachments/entire binary blob/etc.) and then bulk insert that mess into a MySQL/SQL Server instance tucked away somewhere locally or "in the cloud" (EC2, Azure). Just another option, as I'm sure you'll see many here. At least with this route you are in full control of how you index, what you can search, encryption, performance, level of backups, etc. Maybe not the best way for some, but I know if I had over 100,000 emails that I wanted searchable very quickly with advanced SQL-like searching, this would be a cool way to do it (time permitting). Good luck! And to the pedantry to ensue...Yes. Good day.
'We are trying to prove ourselves wrong as quickly as possible, because only in that way can we find progress.' RPF
Best method of storing and searching old email? Gmail. It can import from POP and IMAP, so you can point it at your other inboxes and let it get on with it. You can upload from other mail clients to Google's IMAP server. Obviously it's amazing at searching through the archives.
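If the old mail lives in mbox files rather than in a client that can drag and drop it, the upload can be scripted. The following is a minimal sketch, assuming Python's standard `imaplib` and `mailbox` modules. The function names and the "Archive" folder are made up for the example, the host and credentials are whatever your provider gives you, and some providers may want an app-specific password or OAuth rather than a plain login.

```python
import imaplib
import mailbox
from email.utils import parsedate_to_datetime


def upload_mbox(imap, mbox_path, folder="Archive"):
    """Append every message in an mbox file to `folder` on an open IMAP connection."""
    box = mailbox.mbox(mbox_path, create=False)
    uploaded = 0
    try:
        for key in box.iterkeys():
            msg = box[key]
            # Keep the original date when it parses with a time zone;
            # otherwise pass None and let the server use the upload time.
            try:
                when = parsedate_to_datetime(str(msg["Date"]))
                if when.tzinfo is None:
                    when = None
            except (TypeError, ValueError):
                when = None
            status, detail = imap.append(folder, None, when, box.get_bytes(key))
            if status != "OK":
                raise RuntimeError(f"APPEND failed for message {key}: {detail!r}")
            uploaded += 1
    finally:
        box.close()
    return uploaded


def connect_and_upload(host, user, password, mbox_path, folder="Archive"):
    """Log in over TLS, make sure the folder exists, upload, and log out."""
    imap = imaplib.IMAP4_SSL(host)
    try:
        imap.login(user, password)
        imap.create(folder)  # returns "NO" if the folder already exists; ignored here
        return upload_mbox(imap, mbox_path, folder)
    finally:
        imap.logout()
```

Because the upload function only needs an object with an `append` method, it works the same against any IMAP server, which also makes this a handy way to move mail between clients.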
Best method if you're concerned about Gmail's privacy? I'm still working on that one.
A latent existence
Design a MySQL database for storing your mail messages, keying on sender, subject, date, and presence of attachments (bonus points for storing the attachments as blobs rather than as external files). Then write a perl script that'll automatically parse all your incoming email and convert it to database entries. I suppose if you're lazy the script could just monitor your mail spool, but it'd be better to just have it listen for incoming connections and handle the mail directly.
Next, make copies of that script, modifying as necessary to process all your old mail archives.
Oh, and you'll need to write another perl script to access all new mail - not from your mail spool, but from this database. You should probably name this system after some animal too. If you absolutely MUST have a graphical interface on it, don't use anything newer than TCL+Tk - but going with curses would be a better choice.
Oh - it has to be GPLv3, or we'll hate you and probably mailbomb your machine.
What - isn't that the Slashdot way?
#DeleteChrome
You don't need all those e-mails. Keep the few you actually care about (copy and paste the text into a regular file, and save any attachments you want), and get on with your life.
People that keep every e-mail are weird. Quit living in the past.
I'd say follow the same rules as any archiving of media:
:)
Pick one format and migrate all of your messages to that: In this case, I'd say mbox. Thunderbird and most other mail programs read it and you can get most of your mail into mbox format via IMAP/Thunderbird from whatever mail client can read your old ones. You can store your mbox files locally in Thunderbird and gain Thunderbird's searching (for instance) without the need for an actual back-end. I was able to read some mail stored in Netscape Mail because it was just mbox files and opening them in Thunderbird was a breeze.
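For the maildir folders in the mix, the conversion to mbox does not strictly need a mail client. Here is a small sketch, assuming Python's standard `mailbox` module. The function name `maildir_to_mbox` is made up for the example, it only handles the top-level folder (not maildir subfolders), and it assumes no other program is writing to either mailbox while it runs.

```python
import mailbox


def maildir_to_mbox(maildir_path, mbox_path):
    """Copy every message from a Maildir folder into an mbox file; return the count."""
    src = mailbox.Maildir(maildir_path, factory=None, create=False)
    dst = mailbox.mbox(mbox_path)  # created if missing, appended to otherwise
    copied = 0
    try:
        for key in src.iterkeys():
            # mboxMessage() carries over what it can of the Maildir flags
            # (read, replied, flagged) into mbox Status/X-Status headers.
            dst.add(mailbox.mboxMessage(src[key]))
            copied += 1
        dst.flush()
    finally:
        dst.close()
        src.close()
    return copied
```

Running it once per old folder, e.g. `maildir_to_mbox("/path/to/old/Maildir", "/path/to/archive/2003.mbox")` with your own paths, leaves one mbox file per folder that Thunderbird or anything else mbox-aware should be able to open.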
Most importantly: Every 5-10 years, re-evaluate your storage choice. Is Thunderbird still around? Is mbox still pretty well regarded? If you find you need to migrate again, do it! If both are still active / supported, then hold onto 'em. The only way to perpetually maintain media access is to make sure your choices are still valid on a regular basis. This is true for any media: As the old formats go obsolete (cassette tape, VHS), you need to migrate that data to the next readily accessible format (CDs, DVDs; FLACs, MPEG(?)).
I think the biggest problem is that you have a mish-mash of stored files right now. You'll save yourself a headache in the future by tearing the band-aid off now and taking the time to get all of your mail into one format. Then, in the future, when you need to convert, it'll be many steps easier since you won't have to visit Slashdot and find out what to do about your mail again next time.
I never will understand why some people feel the need to post on topics they don't have the slightest clue about.
Because it's a long standing Slashdot tradition!