How Do You Store and Reconcile Email Archives?
heyitsjustme wants to know how you deal with old email. "I delete most of what I get but keep the stuff from friends and relations as an archive. Unfortunately I have these email archives from the late '80s through today in the form of Macintosh, Linux and Windows mailboxes, including AOL 1.0 mailboxes. What does everyone use to archive email across multiple platforms and non-standard mailbox formats? Is there an easy solution out there? Does anyone archive IM?"
...with fetchmail / procmail / cyrdeliver for sorting and storing from other sources. How can 5GB of mail be wrong?! I can slice and dice all my email (including about a gig of spam...) for choice bits of information.
My God! It's full of Voids!
I use the basic Unix mail format, essentially a plain-text series of messages. Eudora does fine with it, and most anything else can read/import it. I have email going back to the '80s in this format. The one time I had to convert was when I was working for a company that used "Quickmail" on the Mac. I wound up reverse engineering their format and hacking up a program to convert it to plain text.
I also have email archives that stretch back to the early 1990s. I pretty much still have every email I've ever sent or received. When upgrading email clients, I migrate my archives with me, converting them using whatever built-in import and export functions the clients have available. I went from Eudora to Outlook Express to Thunderbird to Mac Mail. I also have programs that "pop" webmail off their sites (gmail, hotmail and yahoo) to consolidate them in whatever mail client I'm currently using. I just keep them in neat folders ("Old Eudora Mail," "Old Yahoo Mail").
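For anyone wondering what "most anything can read it" looks like in practice, here is a minimal sketch using the `mailbox` module from Python's standard library. The function name `summarize_mbox` is made up for illustration, and it assumes the archive is a well-formed mbox file with ordinary `From:` and `Subject:` headers.

```python
import mailbox


def summarize_mbox(path):
    """Return (sender, subject) pairs for every message in an mbox file."""
    # create=False: fail loudly instead of silently creating an empty file.
    box = mailbox.mbox(path, create=False)
    try:
        return [
            (str(msg.get("From", "(unknown sender)")),
             str(msg.get("Subject", "(no subject)")))
            for msg in box
        ]
    finally:
        box.close()
```

Calling `summarize_mbox("old-eudora.mbox")` (a hypothetical filename) gives a quick listing of who wrote what, without touching the archive itself.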
but ...
... just wondering, as the Submitter did, which is what I like /. Submitters to do: make me think and look for new, better stuff ... or better ways to do old stuff.
:)
Along these lines, is there an OSS package that can read the varied formats the Submitter is referring to, tag and drop them in a DB with a nice, friendly, web-enabled (secure) front-end for searching?
My former employer kept *all* of his email from the last 20 years in tar.gz files. Let's just say it wasn't easy to find an email from, er, 15 years ago.
Is there a package that can read the mbox, the other box-formats, plain text, pull from pop, old tar.gz bundles, categorize (sorta), tag and make such things searchable?
Totally a shot in the dark here; I'm not a mail guy at all.
It is the "drink" that makes me wonder, sorry
I second this.
I started running my own IMAP server on an old machine a year or so ago - and synced all my old mail archives to various folders.
My mailserver also solves another problem - multiple POP accounts. I have my IMAP server set up so that mail from each of my POP accounts gets automatically tagged and sent to its own folder.
A third common problem this solves is having multiple machines. Now my desktop's email client is always synced with my laptop's email client. Before, I ran into problems whenever I traveled and fetched my email from the road.
I dub thee... Sir Phobos, Knight of Mars, Beater of Ass.
I never keep emails, or archive IMs or any other form of communication. Once an email is read, it is deleted. Same goes for normal old-skool mail: I read it and then trash it. The only exceptions are letters/emails of some importance, such as information I need to keep handy, or anything with some kind of sentimental value (letters from deceased relatives, for example).
Sure, HDD space is cheap; but I tend to equate people who archive every single form of written communication with those who have Obsessive Compulsive Disorder, in that they hoard everything in sight: newspapers, snail mail, magazines, boxes, etc.
Commit to memory and destroy the evidence. That's my way of handling archives.
One word: IMAP
...who knows what else. I've got freedom to try whatever I want at any given moment without losing my current or past mail.
Absolutely. I use no fewer than two mail clients on two different machines on any given business day. Every email I've sent since 1995 or something like that, and received since 1998 is available and searchable. Over this time, I've accessed this archive with the following clients:
* pine (lots of pine)
* mac mail
* thunderbird
* various netscapes/mozillas
* ML (some random IMAP reader)
* My phone (my old Sony Ericsson speaks IMAP)
* My palm (two different apps)
* python
* a java webmail system I wrote
* three or four other webmail systems
* mutt
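Since "python" is on that list: a rough sketch of what poking at an IMAP archive from Python can look like, using the standard `imaplib` module. The function name `subjects_from`, the host and the credentials in the usage comment are all placeholders, and the sketch assumes plain (unencoded) Subject headers and folder names without spaces.

```python
import imaplib


def subjects_from(conn, folder, sender):
    """List Subject headers of messages in `folder` sent by `sender`.

    `conn` is an already-authenticated imaplib.IMAP4 or IMAP4_SSL object.
    """
    status, _ = conn.select(folder, readonly=True)  # read-only: flags stay as-is
    if status != "OK":
        raise RuntimeError(f"cannot open folder {folder!r}")
    status, data = conn.search(None, "FROM", f'"{sender}"')
    if status != "OK":
        raise RuntimeError("search failed")
    subjects = []
    for num in data[0].split():
        status, parts = conn.fetch(num, "(BODY.PEEK[HEADER.FIELDS (SUBJECT)])")
        if status != "OK":
            continue
        for part in parts:
            if isinstance(part, tuple):  # (response prefix, literal header bytes)
                header = part[1].decode("utf-8", errors="replace").strip()
                subjects.append(header.partition(":")[2].strip())
    return subjects


# Usage (hypothetical server and account):
# conn = imaplib.IMAP4_SSL("imap.example.org")
# conn.login("user", "password")
# print(subjects_from(conn, "INBOX", "friend@example.com"))
# conn.logout()
```

The search runs on the server, so the client only pulls down the headers it asks for.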
-- The world is watching America, and America is watching TV.
I have several CDs worth of stuff archived with ForKeeps:
http://www.fkeeps.com/whofor.htm
It's a bit of an old program and the interface is clunky, but it works reasonably well once you work through it.
--Steve
Maybe other companies do it but until there is proof then you can't slander them but Microsoft do it, so they're fair game.
Thanks, but I'll pass.
I do data mining research, most recently on the Enron email dataset, and I've actually been having to roll my own multi-mailbox storage, access, and retrieval systems. It's taking way more time than I'd like. At this point I've gotten a database and web-based viewers made up (beware, they're quite slow).
If anyone has an idea of an open-source application similar to what the submitter is looking for, it would help my research quite a bit. There are practical research applications in this stuff, if someone's interested in making it.
Just about every email program that I've used has managed to export to CSV. A few web-based email systems didn't allow such exports, and some hunting on the web found some sort of converter (like YahooPOPS!, etc.) that converted to POP; then I exported them to CSV using Eudora or Outlook, or whatever program I was particularly enamored with.
Admittedly, sometimes the column names didn't match up ("Sender" v "From"), etc., but for the most part that's how I did it. At this point in time, most everything is stored in the form of .PST files that are archived on CDs and on an external hard drive.
I also made an effort to keep my email accounts to a minimum, which probably made this entire process significantly easier, and when I did close an account (like when I finished work at a company), I exported the emails from there and kept them in .PST in case I needed them for anything later on.
As far as indexing works - I have them stored in 6 month segments (Jan97-Jun97, Jul97-Dec97, ...), since I can usually remember roughly when I got an email that I was looking for - alternatives include perhaps by name of sender or company. :)
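If you wanted to automate that six-month naming, something like the following would do it. This is only a sketch; the function name `segment_label` is invented, and it assumes two-digit years are acceptable in the label, as in the examples above.

```python
from datetime import date


def segment_label(d):
    """Return the half-year archive label for a date, e.g. 'Jan97-Jun97'."""
    yy = f"{d.year % 100:02d}"
    return f"Jan{yy}-Jun{yy}" if d.month <= 6 else f"Jul{yy}-Dec{yy}"
```

So `segment_label(date(1997, 3, 15))` gives `'Jan97-Jun97'`, which can be used directly as a folder or file name.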
I do archive IMs - Trillian worries about it for me.
Hope this helps.
I keep the emails in mailbox format (that is, in plain text as it is stored in most UNIX systems), in several files. The reason I do that is that most email readers (MUA) can read mailbox format. I keep them in several files to make it more manageable.
The tools that I use to manipulate emails are mostly "from", "procmail", "grep", and "less". There used to be tools from the "elm" era (still remember them?), such as "frm" (which is better than "from"), "reademail" (to read individual email, given the number of email in the archive), "deletemail" (which can delete an individual email in the archive). Too bad, these tools are gone. At one point I slapped a simple Tk interface as a front end to those tools. But it didn't scale well.
At one point I experimented with storing emails in individual files, but the tools to manipulate them are limited. I used MH.
The next experiment I did was to take all those email headers and put them in a database. (I used msql, which was popular at that time.) Then I had a Java applet and a perl script to make queries to the database (and actually did an analysis of my reading habits). The actual emails were stored as plain text files, each email in its own file. Basically, the original email was untouched. I got bored and never continued the project.
Now ... I am still searching for the perfect email tools.
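The headers-in-a-database idea is easy to try today. Below is a minimal sketch that swaps in SQLite (via Python's standard `sqlite3` module) for msql and reads from an mbox file rather than one file per message; the function name `index_headers` and the table layout are assumptions made for illustration, not the original design.

```python
import mailbox
import sqlite3


def index_headers(mbox_path, db_path):
    """Store one row of headers per message; the mbox itself is left untouched."""
    db = sqlite3.connect(db_path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS headers ("
        "msg_key INTEGER, sender TEXT, subject TEXT, sent TEXT)"
    )
    box = mailbox.mbox(mbox_path, create=False)
    try:
        rows = [
            (key,
             str(msg.get("From", "")),
             str(msg.get("Subject", "")),
             str(msg.get("Date", "")))
            for key, msg in box.items()
        ]
    finally:
        box.close()
    with db:  # commits the inserts on success
        db.executemany("INSERT INTO headers VALUES (?, ?, ?, ?)", rows)
    return db
```

A query such as `db.execute("SELECT subject FROM headers WHERE sender LIKE ?", ("%alice%",))` then finds messages by sender, and `msg_key` points back at the message in the original file.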
My father was concerned about the longevity of his e-mail a few years ago, so I created a small batch file that converts his Outlook Express mail archive into mbox on a monthly basis. Last month he asked if I could convert them "into a web site" so he can get an idea of a thread history without parsing a huge file. When I get a moment I'm planning to write a script that outputs each message to a new file wrapped in HTML tags and uses the message subject and date to create a rudimentary index.html.
I'm surprised no one has tried this before. It's a good low-tech solution for people who require information in a hurry and is more immediate than a flat file.
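A script along those lines might look roughly like this. It is a sketch, not a finished tool: the function name `mbox_to_html` and the `msgNNNNN.html` naming are made up, multipart messages are written with an empty body, and no threading is attempted.

```python
import html
import mailbox
import os


def mbox_to_html(mbox_path, out_dir):
    """Write one HTML page per message plus an index.html linking to them."""
    os.makedirs(out_dir, exist_ok=True)
    box = mailbox.mbox(mbox_path, create=False)
    links = []
    try:
        for n, msg in enumerate(box):
            subject = str(msg.get("Subject", "(no subject)"))
            sent = str(msg.get("Date", "(no date)"))
            # Only simple single-part bodies are handled in this sketch.
            payload = b"" if msg.is_multipart() else msg.get_payload(decode=True)
            body = (payload or b"").decode("utf-8", errors="replace")
            name = f"msg{n:05d}.html"
            with open(os.path.join(out_dir, name), "w", encoding="utf-8") as f:
                f.write(
                    f"<html><body><h1>{html.escape(subject)}</h1>"
                    f"<p>{html.escape(sent)}</p>"
                    f"<pre>{html.escape(body)}</pre></body></html>\n"
                )
            links.append(
                f'<li><a href="{name}">{html.escape(sent)}: '
                f"{html.escape(subject)}</a></li>"
            )
    finally:
        box.close()
    with open(os.path.join(out_dir, "index.html"), "w", encoding="utf-8") as f:
        f.write("<html><body><ul>\n" + "\n".join(links) + "\n</ul></body></html>\n")
    return len(links)
```

Escaping the subject and body with `html.escape` matters here, since old mail is full of stray angle brackets that would otherwise break the pages.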
Outlook + IMAP is the way I do it. You can drag messages between local storage and your mail server.
You have to print it with something. Ink: one of the most expensive ways to put stuff on paper. Heck, they say it costs seven times more than champagne per drop! That, plus the costs of cartridges and printer maintenance and, and... oh the horror! ;)
Me? I obsessively reinstall my operating system and reimport old mailboxes into my mail client, so I have a dozen copies of 5-year old email, ten copies of 4-year old email, 8 copies of 3-year old email, etc. No need for backups... plus when I search my computer for old email, I get a dozen copies of what I'm looking for!
This is why the House of Lords was resistent to the prosecution of Nazi war criminals for so long, incidentally.
Wikileaks, no DNS
I consolidated all my personal e-mail since 1995 into a Maildir (which I access using IMAP). It totalled only 60 MB. I don't think that's enough to make me worry about disk space, searching, or my IMAP server not being able to handle it. The way I have it organized, my searches don't touch any of the old mail (unless I want them to). The only point I think you were right about is the evidence used against me (in my case, anyhow). It's kinda entertaining to go back and read some of my old correspondence and see how much of a different person I was back then. It's kinda like looking at old diaries or something.
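For anyone with old mbox files who wants to try the same consolidation, Python's standard `mailbox` module can write Maildirs. A minimal sketch follows; the function name `mbox_to_maildir` is invented, and read/unread flags are not translated, so everything lands as new mail.

```python
import mailbox


def mbox_to_maildir(mbox_path, maildir_path):
    """Copy every message from an mbox file into a Maildir; returns the count."""
    src = mailbox.mbox(mbox_path, create=False)
    dest = mailbox.Maildir(maildir_path, create=True)
    copied = 0
    try:
        for msg in src:
            dest.add(msg)  # the mbox file is only read, never modified
            copied += 1
    finally:
        src.close()
        dest.close()
    return copied
```

Running it once per old mailbox, all pointed at the same Maildir, gives one consolidated store that an IMAP server can then serve.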
With the advent and subsequent improvements of LiveCD distros, it should be relatively painless for the average /.'er to:
* Create a multi-session CD/DVD with your favorite Linux LiveCD distro
(or roll your own and create an ISO for future use)
and
* Backup email files to said CD/DVD
(I suggest a set of re-writable media of good quality to play it safe.)
Further suggestions:
1. It would be advisable to split your archives by year (i.e., Mail2004, etc.), especially if you plan to retain a sizeable amount of mail.
2. Convert archives from older mail clients before creating backup, or use a newer mail client that can read the old files with ease.
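The yearly split in suggestion 1 can be scripted if the archive is already in mbox format. A rough sketch, assuming Python's standard `mailbox` module and a usable `Date:` header on most messages; the function name `split_by_year` and the `MailUndated` catch-all file are my own additions.

```python
import mailbox
import os
from email.utils import parsedate_to_datetime


def split_by_year(mbox_path, out_dir):
    """Copy messages into per-year mbox files named like Mail2004."""
    os.makedirs(out_dir, exist_ok=True)
    src = mailbox.mbox(mbox_path, create=False)
    outputs = {}
    try:
        for msg in src:
            try:
                year = str(parsedate_to_datetime(msg["Date"]).year)
            except (TypeError, ValueError):
                year = "Undated"  # missing or unparseable Date header
            if year not in outputs:
                outputs[year] = mailbox.mbox(os.path.join(out_dir, f"Mail{year}"))
            outputs[year].add(msg)
    finally:
        src.close()
        for box in outputs.values():
            box.close()  # close() flushes pending writes to disk
    return sorted(outputs)
```

Note that running it twice against the same output directory appends the messages again, so start from an empty directory before burning the results to disc.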
Good luck!