Ask Slashdot: Best Way To Archive and Access Ancient Emails?
An anonymous reader writes "I started using email in the early 90s and have lost most of that first decade due to ignorance, botched backups, and so on. But since about 2000, I've got most — if not all — of my email in some form or other. I run Linux, so this has mainly been in a mix of various programs: Kmail, Evolution, Thunderbird. The past 2-3 years are still on the IMAP servers. My problem is that I only rarely NEED to look back to email of 5 years ago. But sometimes it's nice. Or I just want to reminisce about something...or find an old attachment that I was sent. But I do not want to be clogging my current email client of choice with vast backups and even more, I don't know if it will even easily convert. The file structures are different, some are mbox, others maildir, etc., and I would ideally like a way to 1) store and archive these emails, 2) access them, and 3) search by Sender, Subject, Date, Attachments. Is there anything I can do or do I just have to keep legacy applications on hand for this? Should I keep trying to upgrade and pull old files into the new applications? Any help or suggestions about what YOU do would be great."
Just IMAP it all.
I went IMAP in 1997 and have never looked back.
I've also used IMAP as a temporary conversion measure for people switching e-mail clients so even if you aren't sure, it makes a good first step.
I don't understand the concern about too many e-mails. I can access my email back to 1992. With multiple folders it shouldn't be a problem and with modern indexing a search shouldn't be an issue.
Use the IMAP server - if you have control and/or space available.
I just have a single large archive IMAP folder into which everything that isn't spam gets pushed. You could optionally create subfolders for time ranges (every 1-2 years, whatever works for you). Using Dovecot with good indexing support on the backend, searching has been quick. If you break the archive out by time, searches will be quicker; you could then also create a virtual mailbox combining them all for when a search really needs to span time (and takes a good chunk longer).
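One way to run that kind of search from a script is Python's standard imaplib. The sketch below builds IMAP SEARCH criteria for sender, subject and date, then runs them over a list of per-year folders, so the server (and Dovecot's indexes, in the setup above) still does the actual work. The `Archive/<year>` folder names are hypothetical, the search terms are assumed to be ASCII (imaplib sends commands as ASCII by default), and the returned message numbers are only valid for the current session. Note that IMAP's SINCE/BEFORE match the server's internal date; SENTSINCE/SENTBEFORE would match the Date header instead.

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def imap_date(d):
    """Format a date the way IMAP SEARCH expects it, e.g. 07-Mar-2005."""
    # Spelled-out month names avoid the locale-dependent strftime("%b").
    return f"{d.day:02d}-{MONTHS[d.month - 1]}-{d.year}"


def quote(s):
    """Wrap a string in IMAP double quotes, escaping backslashes and quotes."""
    return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_criteria(sender=None, subject=None, since=None, before=None):
    """Turn optional filters into IMAP SEARCH tokens (ALL when none are given)."""
    criteria = []
    if sender:
        criteria += ["FROM", quote(sender)]
    if subject:
        criteria += ["SUBJECT", quote(subject)]
    if since:
        criteria += ["SINCE", imap_date(since)]
    if before:
        criteria += ["BEFORE", imap_date(before)]
    return criteria or ["ALL"]


def search_archive(conn, folders, **filters):
    """Yield (folder, message number) for every hit in each archive folder."""
    criteria = build_criteria(**filters)
    for folder in folders:
        typ, _ = conn.select(quote(folder), readonly=True)
        if typ != "OK":
            continue  # folder missing or not selectable; skip it
        typ, data = conn.search(None, *criteria)
        if typ == "OK":
            for num in data[0].split():
                yield folder, num.decode()
```

With a live server, `conn` would be an `imaplib.IMAP4_SSL` object that has already called `login()`, used e.g. as `search_archive(conn, ["Archive/2004", "Archive/2005"], sender="bob@example.com")`.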
There are scripts/utilities available to push mbox, etc. into an IMAP folder. Push everything there and use it.
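As one possible way to do that push, here is a sketch using only Python's standard library: the mailbox module reads the mbox file, and each message is uploaded with the IMAP APPEND command through imaplib's `append()`. `conn` is assumed to be an already logged-in `imaplib.IMAP4`/`IMAP4_SSL` connection; the folder name is a placeholder, and marking everything `\Seen` is an assumption that suits old archived mail. The original Date header is reused as the server's internal date so date-based searching and sorting still make sense.

```python
import email.utils
import mailbox
from datetime import timezone


def header_date(value):
    """Best-effort parse of a Date header into an aware datetime, else None."""
    if not value:
        return None
    try:
        dt = email.utils.parsedate_to_datetime(value)
    except (TypeError, ValueError, IndexError, OverflowError):
        return None
    # imaplib only accepts aware datetimes; treat an unknown zone (-0000) as UTC.
    return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)


def push_mbox(conn, mbox_path, folder, flags=r"(\Seen)"):
    """Append every message in an mbox file to an IMAP folder; return the count."""
    conn.create(folder)  # creating an existing folder returns NO rather than raising
    box = mailbox.mbox(mbox_path)
    count = 0
    try:
        for key in box.iterkeys():
            raw = box.get_bytes(key)  # original bytes, without the "From " separator line
            when = header_date(box.get_message(key)["Date"])
            # imaplib converts `when` to an INTERNALDATE; None lets the server choose.
            conn.append(folder, flags, when, raw)
            count += 1
    finally:
        box.close()
    return count
```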
I have all my personal email from 1998 on in a Maildir directory, with Dovecot as the server on a dual-core Atom box running CentOS. About 900 MB worth.
Plenty fast.
Trying to figure out what formats will be available in the future is pretty hard, it's easier to see what formats have been around a long time and are still in use.
As such, two formats come up readily:
mbox http://en.wikipedia.org/wiki/Mbox and maildir http://en.wikipedia.org/wiki/Maildir
Had the same need 20 years ago when migrating from VAX/VMS to Unix. The old emails were saved in a not quite readable format, but I figured I could recover them if necessary. In the end, never bothered. Yes, there are a few (actually, only two) that I'd like to resurrect now, but life moves on.
I'm a big fan of throwing together a DB when I want to store things categorically like that and want fast searches. If you are up to the task, hunt down some tools/roll your own so that you have a nice relational database and some stored procedures for getting what you want when you need it.
You could export your emails to some parsable format, write an importer to extract the basics you want to keep (from/to/subject/body, attachments, the entire binary blob, etc.), and then bulk insert that mess into a MySQL/SQL Server database tucked away somewhere locally or "in the cloud" (EC2, Azure). Just another option, as I'm sure you'll see many here. At least with this route you are in full control of how you index, what you can search, encryption, performance, level of backups, etc. Maybe not the best way for some, but I know if I had over 100,000 emails that I wanted searchable very, very quickly with advanced SQL-like searching, this would be a cool way to do it (time permitting). Good luck! And to the pedantry to ensue... Yes. Good day.
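A minimal sketch of such an importer, swapping in SQLite (Python's built-in sqlite3) for MySQL/SQL Server so it runs without a database server; the table layout and column names are hypothetical. It reads one mbox file, decodes encoded sender and subject headers, normalizes the Date to UTC so the text column sorts chronologically, flags messages that contain attachment parts, and keeps the whole original message as a blob.

```python
import email.errors
import email.utils
import mailbox
import sqlite3
from datetime import timezone
from email.header import decode_header, make_header

SCHEMA = """
CREATE TABLE IF NOT EXISTS messages (
    id INTEGER PRIMARY KEY,
    sender TEXT,
    subject TEXT,
    sent_utc TEXT,             -- ISO 8601 in UTC, NULL if the Date is unparseable
    has_attachments INTEGER,   -- 0 or 1
    raw BLOB                   -- the entire original message
)"""


def text(value):
    """Decode RFC 2047 encoded-words (=?utf-8?...?=) so headers are searchable."""
    if value is None:
        return None
    try:
        return str(make_header(decode_header(str(value))))
    except (LookupError, UnicodeDecodeError, email.errors.HeaderParseError):
        return str(value)  # unknown charset or malformed header: keep it as-is


def sent_utc(value):
    """Parse a Date header and return it as an ISO 8601 UTC string, or None."""
    try:
        dt = email.utils.parsedate_to_datetime(value)
    except (TypeError, ValueError, IndexError, OverflowError):
        return None
    if dt.tzinfo is None:  # "-0000" means unknown zone; assume UTC
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc).isoformat()


def has_attachment(msg):
    return any(part.get_content_disposition() == "attachment" for part in msg.walk())


def import_mbox(db, mbox_path):
    """Bulk-insert one mbox file into the messages table; return rows added."""
    db.execute(SCHEMA)
    box = mailbox.mbox(mbox_path)
    rows = []
    try:
        for key in box.iterkeys():
            msg = box.get_message(key)
            rows.append((text(msg["From"]), text(msg["Subject"]), sent_utc(msg["Date"]),
                         int(has_attachment(msg)), box.get_bytes(key)))
    finally:
        box.close()
    with db:  # one transaction for the whole batch
        db.executemany(
            "INSERT INTO messages (sender, subject, sent_utc, has_attachments, raw) "
            "VALUES (?, ?, ?, ?, ?)", rows)
    return len(rows)
```

Searching is then ordinary SQL, for example `db.execute("SELECT sent_utc, subject FROM messages WHERE sender LIKE ? AND has_attachments = 1 ORDER BY sent_utc", ("%bob%",))`.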
Best method of storing and searching old email? Gmail. It can import from POP and IMAP accounts, so you can point it at your other inboxes and let it get on with it. You can also upload from other mail clients to Google's IMAP server. Obviously it's amazing at searching through the archives.
Best method if you're concerned about Gmail's privacy? I'm still working on that one.
Design a MySQL database for storing your mail messages, keying on sender, subject, date, and presence of attachments (bonus points for storing the attachments as blobs rather than as external files). Then write a perl script that'll automatically parse all your incoming email and convert it to database entries. I suppose if you're lazy the script could just monitor your mail spool, but it'd be better to just have it listen for incoming connections and handle the mail directly.
Next, make copies of that script, modifying as necessary to process all your old mail archives.
Oh, and you'll need to write another perl script to access all new mail - not from your mail spool, but from this database. You should probably name this system after some animal too. If you absolutely MUST have a graphical interface on it, don't use anything newer than TCL+Tk - but going with curses would be a better choice.
Oh - it has to be GPLv3, or we'll hate you and probably mailbomb your machine.
What - isn't that the Slashdot way?
You don't need all those e-mails. Keep the few you actually care about (copy and paste the text into a regular file, and save any attachments you want), and get on with your life.
People that keep every e-mail are weird. Quit living in the past.
So can anyone with a subpoena. And you can bet Google would be running their advertising stuff on that.
There is no way I would put my life on a public server like that.
I need to archive emails that I can search later, but with a twist: these are employees who've left the company. I can't keep 'em on Google Apps 'cause I have to pay for that per user. So I use IMAP (making sure to set Chats to be shown in the IMAP list), create an account in Thunderbird, and slurp it all onto the local machine. It keeps all the folders, although it doesn't seem to be smart enough to figure out multiple labels, so it looks like it downloads the same email multiple times: once for its folder, and once for "All Mail." Then I delete the account at Google. You just have to be sure to click through all the folders in Thunderbird and make sure it's done downloading before you blow the Google account away.
http://notmuchmail.org is Gmail for people that don't trust Google. Works great with your existing IMAP server using offlineimap.
As soon as gmail made IMAP available, everything went there. I used to get my stuff via POP and saved it all going back to the early 90s. When IMAP went live on gmail, I let it chug away for hours and hours until it was synced and all my archived stuff was stored on my gmail account. They've been bumping up the limit faster than my mail's built up so I'm now at 3.9 gigs used of 10.1 available, holding about twenty years of email. I have email clients on a desktop and couple laptops that I fire up every couple of months to sync with gmail and keep local stores in the event that google screws up and loses my data. (I like to think I'd be smart enough to disconnect from the internet before accessing the local clients if my gmail account ever went blank but I've got multiple copies just in case I forget.)
I know that won't work for email fiends who pile up a gig a month but it works for me. I don't even bother sorting my email any more. It's faster to just search. Not like the old days when it would take my email client half an hour to slog through all the messages. :)
Set up a local Courier IMAP server, copy your mail there, and archive the Maildirs. Each message will be a file, and you can use tools like grep to search the Maildirs.
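Plain `grep -rl` over a Maildir's cur/ and new/ directories works as-is, because every message is its own file. If you'd rather match one specific header than the whole file, a small sketch with Python's standard mailbox module could look like this (the function name and the case-insensitive regex matching are choices made for this example).

```python
import mailbox
import re


def grep_maildir(path, header, pattern):
    """Yield (key, subject) for messages whose given header matches a regex."""
    rx = re.compile(pattern, re.IGNORECASE)
    box = mailbox.Maildir(path, factory=None, create=False)
    for key, msg in box.iteritems():
        value = msg.get(header, "")
        if rx.search(str(value)):
            yield key, msg.get("Subject", "")
```

For example, `list(grep_maildir("/home/me/Maildir", "From", r"alice@"))` would list every message from Alice; the path is a placeholder.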
It is email. AKA over the web. AKA public.
And someone with a subpoena can get your records off of your ISP, or just come into your house and take it off your computers.
Troll is not a replacement for I disagree.
I wouldn't posit this as the best way, but it's what I do. I keep my archival mail on a local filesystem arranged in directories, stored in the old-school mbox format. I run Dovecot under OS X for IMAP access to those messages from anywhere; when I need to search through the whole collection, I use mairix (an indexing and retrieval system).
Just delete some goddamn email... hoarder!
"My immediate reaction is "WTF? What kind of moron doesn't make things 64-bit safe to begin with?" Linus
Heh, I have all my email going back to '98 in Outlook Express. Best email program ever! It's nearly perfect for what I want. (Is there any way to get it to do inline spell checking, i.e., underline misspelled words as you type?) I'm still running it on an XP box. I've been using Windows Live Essentials a bit on Win8; it's not horrific, but it lacks some of the characteristics. I hope MS injects some of the OE spirit into it.
Eudora still runs on my Win7 box. I have email going back to at least the early '90s. All plaintext and easily searchable.
Just sweep it all into the Trash Bin, breathe deep, and move on with your life confident in the impermanence of all things.
Plus that Trash Bin program has _great_ compression!
To those saying keeping email forever is hoarding: not if it's done right.
That's like saying neck-deep rooms of newspapers and magazines in a house isn't hoarding if you stack them neatly with little paths running through them. :)
I'd say follow the same rules as for archiving any media:
Pick one format and migrate all of your messages to that: In this case, I'd say mbox. Thunderbird and most other mail programs read it and you can get most of your mail into mbox format via IMAP/Thunderbird from whatever mail client can read your old ones. You can store your mbox files locally in Thunderbird and gain Thunderbird's searching (for instance) without the need for an actual back-end. I was able to read some mail stored in Netscape Mail because it was just mbox files and opening them in Thunderbird was a breeze.
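A minimal sketch of that "pick one format" step with Python's standard mailbox module, assuming your old stores can already be opened as mbox or Maildir (proprietary formats would need converting first). It skips repeated Message-IDs, which also takes care of the same message having been downloaded more than once (for example once per folder and once for "All Mail"). Using the Message-ID as the duplicate test is an assumption; it won't catch copies that lack the header.

```python
import mailbox


def consolidate(sources, dest_path):
    """Copy messages from several mailboxes into one mbox, skipping duplicate Message-IDs.

    `sources` may mix mailbox.mbox and mailbox.Maildir objects (or other
    mailbox.Mailbox types). Messages without a Message-ID are always copied,
    since there is nothing reliable to compare them on.
    """
    dest = mailbox.mbox(dest_path)
    dest.lock()
    try:
        # Seed with IDs already in the destination so re-runs don't duplicate.
        seen = {mid for mid in ((m["Message-ID"] or "").strip() for m in dest) if mid}
        added = 0
        for src in sources:
            for msg in src:  # iterating a Mailbox yields its messages
                msg_id = (msg["Message-ID"] or "").strip()
                if msg_id and msg_id in seen:
                    continue
                if msg_id:
                    seen.add(msg_id)
                dest.add(msg)
                added += 1
        dest.flush()
    finally:
        dest.close()  # flushes and also releases the lock
    return added
```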
Most importantly: Every 5-10 years, re-evaluate your storage choice. Is Thunderbird still around? Is mbox still pretty well regarded? If you find you need to migrate again, do it! If both are still active / supported, then hold onto 'em. The only way to perpetually maintain media access is to make sure your choices are still valid on a regular basis. This is true for any media: As the old formats go obsolete (cassette tape, VHS), you need to migrate that data to the next readily accessible format (CDs, DVDs; FLACs, MPEG(?)).
I think the biggest problem is that you have a mish-mash of stored files right now. You'll save yourself a headache in the future by tearing the band-aid off now and taking the time to get all of your mail into one format. Then, in the future, when you need to convert, it'll be many steps easier since you won't have to visit Slashdot and find out what to do about your mail again next time.
I run qmail for sending/receiving mail (on Gentoo; netqmail package), using maildir, of course. On top of that, I run the Courier IMAP server on my internal network (with TLS encryption). Until a few months ago I used Mutt as a client (console-based), but I've moved to using Roundcube (web-based email), which I initially installed for my wife, and have been happy with it. I also have some automatic filtering to folders via Maildrop (another Courier utility; it looks at a ~/.mailfilter file to route mail).
Roundcube/the IMAP server's search is OK most of the time - I keep my inbox small and move older mail to sub-folders - when I want to do advanced searches or search large mailboxes I log in and grep through folders of interest; this works well with the maildir format with one file per message. Maildir was also quite resilient when I had a HD crash and needed to recover some lost mail (block scan for blocks that look like mail headers found most missing items, and I do better backups now - mail is under ~/.maildir and gets backed up automatically).
I would move older messages to maildir (there are plenty of mbox converters, and almost anything non-proprietary should be convertible to mbox or maildir via existing programs or a short perl script) - even if at some point maildir dies off entirely, which seems unlikely, converting it to another format will always be trivial due to its simplicity and it has the advantages mentioned above of being able to search easily with grep etc.
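As an illustration of how short such a converter can be, here is a sketch using only Python's standard mailbox module (not any particular existing tool). Converting each message to `mailbox.MaildirMessage` before adding it carries the mbox read/flagged status over as Maildir flags; adding the mbox message directly would drop that. The paths are placeholders.

```python
import mailbox


def mbox_to_maildir(mbox_path, maildir_path):
    """Convert one mbox file into a Maildir (one file per message); return the count."""
    src = mailbox.mbox(mbox_path)
    dest = mailbox.Maildir(maildir_path, create=True)
    count = 0
    try:
        for msg in src:
            # mboxMessage -> MaildirMessage maps Status flags, e.g. R (read) -> S (seen).
            dest.add(mailbox.MaildirMessage(msg))
            count += 1
    finally:
        src.close()
        dest.close()
    return count
```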
Force feed? WTF are you talking about? Dovecot can use any major mail format. Just set MAILDIR if it's in a non-standard directory. So the whole procedure is:
yum install dovecot
vim /etc/dovecot.conf (only if using a nonstandard mail location)
service dovecot restart
set username and password in GUI client
I never will understand why some people feel the need to post on topics they don't have the slightest clue about.
http://www.mailstore.com/en/mailstore-home.aspx
It works well, searches are quick, and it's local.
Unfortunately it's Windows-only, but it may work fine under Wine.
The problem is that a throwaway email might become critically important later on. There is no way to know in advance what is important and what is not.
True story: while deployed in the Army, our communications guy could not find a piece of equipment which was very important and very pricey. He had been signing the monthly inventory forms saying he had it, assuming it was in a cabinet. He could not find any paperwork showing it was signed out - it had just disappeared sometime in the last 3 months and no one had seen it.
On a long shot, I started searching my email - since I keep every last one. Sure enough, about 2 months prior, there was a throwaway email from him to the effect that he was going to turn in item X for repair since it was acting flaky. He checked at the contractor mentioned in that email, and it was sitting on the shelf waiting for pickup.
Parchment -no less- does it for my ancient emails.
I had to archive my emails going back to 1996. They were in multiple formats: Outlook Express .dbx, mailbox files from Netscape Navigator and Thunderbird, and Outlook.
I converted all of them to .eml format. It's a simple text format that can be read by the OS and easily parsed by any program or script; much better than mbox or anything else. Then I renamed all of them according to a rule: YYYYMMDDhhmm [From] [Subject]
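A minimal Python sketch of the export-and-rename step, assuming the messages are already readable through Python's standard mailbox module (mbox or Maildir; Outlook Express .dbx would need a separate converter first). It reads the brackets in the rule as placeholders, uses the sender's address and the clock time written in the Date header, and replaces characters that aren't allowed in file names. The 200-character cap and the " (1)", " (2)" suffix for collisions are arbitrary choices for this sketch.

```python
import email.errors
import email.utils
import os
import re
from email.header import decode_header, make_header

# Characters that are unsafe in Windows and/or Unix file names.
UNSAFE = re.compile(r'[\\/:*?"<>|\x00-\x1f]+')


def text(value):
    """Decode RFC 2047 encoded-words; fall back to the raw header on errors."""
    try:
        return str(make_header(decode_header(str(value))))
    except (LookupError, UnicodeDecodeError, email.errors.HeaderParseError):
        return str(value)


def eml_name(msg):
    """'YYYYMMDDhhmm sender subject.eml', using the time as written in the Date header."""
    try:
        stamp = email.utils.parsedate_to_datetime(msg["Date"]).strftime("%Y%m%d%H%M")
    except (TypeError, ValueError, IndexError, OverflowError):
        stamp = "000000000000"  # unknown date: sorts first
    sender = email.utils.parseaddr(str(msg["From"] or ""))[1] or "unknown"
    subject = text(msg["Subject"] or "") or "no subject"
    return UNSAFE.sub("_", f"{stamp} {sender} {subject}")[:200] + ".eml"


def export_eml(box, out_dir):
    """Write each message of a mailbox.Mailbox to out_dir as its own .eml file."""
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for key in box.iterkeys():
        base = eml_name(box.get_message(key))
        path, n = os.path.join(out_dir, base), 1
        while os.path.exists(path):  # same minute, sender and subject
            path = os.path.join(out_dir, f"{base[:-4]} ({n}).eml")
            n += 1
        with open(path, "wb") as f:
            f.write(box.get_bytes(key))  # raw message bytes, unchanged
        written.append(os.path.basename(path))
    return written
```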
Now I can easily find any email. I can browse them using the file system, I can search them using the OS or via a script. Windows indexes them and extracts the metadata so any search is very quick.
Ugh. Drop all that stuff. Who needs it? My gmail folder has 20 messages in it. Lighten your (psychic) load.