How Do You Store and Reconcile Email Archives?
heyitsjustme wants to know how you deal with old email. "I delete most of what I get but keep the stuff from friends and relations as an archive. Unfortunately I have these email archives from the late 80's through today in the form of macintosh, linux and windows mailboxes including AOL 1.0 mailboxes. What does everyone use to archive email across multiple platforms and non-standard mailbox formats? Is there an easy solution out there? Does anyone archive IM?"
No need for rear view mirrors. What is behind you is not important.
Some drink at the fountain of knowledge. Others just gargle.
I archive all my pr0n on DVDs these days. It's really easy and oh wait... fsck!
rm -rf
and you are done!
Save it all. With the exception of some mail archives lost to catastrophic disk failures (I keep archives for my own convenience, not for any official purposes, so I don't back them up), I keep all my email.
Thunderbird is able to import all my old mail archives (from years and years of Eudora) and search it effectively. If I were inclined to export all my archives from my Mac to my Windows machine, I could use Google Desktop Search to really search through it all.
...so I just delete everything after a major deal falls through.
The best way to consolidate various types of emails may be to email them to a common destination, then archive from there.
I delete most of what I get
You must work for microsoft
----
Squirrel
...with fetchmail / procmail / cyrdeliver for sorting and storing from other sources. How can 5GB of mail be wrong?! I can slice and dice all my email (including about a gig of spam...) for choice bits of information.
My God! It's full of Voids!
I use the basic Unix mail format, essentially plain text series of messages. Eudora does fine with it; and most anything else can read/import it. I have email going back to the 80's in this format. The one time I had to convert was when I was working for a company that used "Quickmail" on the Mac. I wound up reverse engineering their format and hacking up a program to convert it to plain text.
I delete almost everything, and only save a few very important or personal emails. For those I do keep, I print to PDF, and archive by date and person/subject. It works exceptionally well for me. It is all electronic, takes very little disk space, and keeps the clutter to a minimum, and eliminates most of the cross platform nightmares.
One word: IMAP. If you can read your email using any decent email client, it should support moving it to an IMAP server. If you are using web-based email or some crappy client which can't export emails to a standard/raw format, you'll have to write a script to convert the messages.
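For the "write a script" case, here is a rough Python sketch of pushing an existing mbox file into an IMAP folder with the standard library. The function name `upload_mbox` and the folder name in the usage note are made up for illustration, and it assumes you have already opened and logged in an `imaplib` connection and that the target folder exists.

```python
import mailbox


def upload_mbox(conn, mbox_path, folder):
    """Append every message in a local mbox file to an IMAP folder.

    `conn` is assumed to be an already logged-in imaplib.IMAP4 or
    imaplib.IMAP4_SSL object. Returns how many messages the server accepted.
    """
    uploaded = 0
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for key in box.iterkeys():
            raw = box.get_bytes(key)  # message bytes without the mbox "From " line
            # flags=None and date_time=None: the server stamps the message
            # with the upload time, so original arrival dates are not kept here.
            status, _ = conn.append(folder, None, None, raw)
            if status == "OK":
                uploaded += 1
    finally:
        box.close()
    return uploaded
```

Usage would be along the lines of `conn = imaplib.IMAP4_SSL("imap.example.org")`, `conn.login(user, password)`, then `upload_mbox(conn, "old-eudora.mbox", "Archive")`. The host and file names there are placeholders, and folder names containing spaces may need quoting depending on the server.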
Ever since I first got acquainted with e-mail on my Apple IIe in the '80s, I've used e-mail programs that offer plain-text storage as at least an option. It's one of the most universal formats in existence, and can be read one way or another on computers both decades old and brand new. I encountered some weird proprietary clients in the '90s that still stored e-mail in this format, because from a corporate perspective, this stuff was still in its infancy, plus HTML hadn't yet mucked everything up. To this day I still store in plain text from Eudora 6.2.
I burn it to CD-Rs that I know won't get moved around or scratched. They stand a good chance of lasting the rest of my life.
The coolest voice ever.
EmailMan has the answers to your problem.
More utilities than I want to bother with, but hopefully they'll have the converter(s) you need.
Good Luck!
This might be useful, if they don't collapse under /.
"It is a greater offense to steal men's labor, than their clothes"
I give mine to Microsoft for the safe storage and instant retrieval they are renowned for. Oh wait..
That's one of the many reasons why I have stayed with Pegasus Mail for many years. Because they were created in the same program I know that I can still access my old mail files without problems.
What I do at year end is move all of that year's messages to a new folder and reset my filters so that the new year's messages go into a new set of folders.
Periodically I just copy off previous year's messages to CD.
At least a few times I have been able to go back a couple of years and find information that I lacked.
Three Squirrels
but ...
... just wondering, as the Submitter did what I like /. Submitters to do: make me think and look for new, better stuff ... or better ways to do old stuff.
:)
Along these lines, is there an OSS package that can read the varied formats the Submitter is referring to, tag and drop them in a DB with a nice, friendly, web-enabled (secure) front-end for searching?
My former employer kept *all* of his email from the last 20 years in tar.gz files. Let's just say it wasn't easy to find an email from, er, 15 years ago.
Is there a package that can read the mbox, the other box-formats, plain text, pull from pop, old tar.gz bundles, categorize (sorta), tag and make such things searchable?
Totally a shot in the dark here, i'm not a mail guy at all
It is the "drink" that makes me wonder, sorry
I only use e-mail clients that store mail in ASCII with standard headers. This means no Outlook mail. I still use the Netscape e-mail client to view and organize my mail. Also I have various perl scripts that can access the e-mail archive. I have 22 years of e-mail, archived on my PC. It gets backed up with the nightly backup onto a swappable firewire drive. I swap the backup every morning and have one of the drives with me.
Almost every email client around can import and export mbox formats. Getting your email in a format that is going to be readable in 20 years is the first step, otherwise why bother?
If worse comes to worst, mbox is readable as plain text.
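To illustrate how little is needed to read mbox, here is a minimal Python sketch using the standard `mailbox` module. The function name `list_messages` is just for this example; nothing here is specific to any one mail client's export.

```python
import mailbox


def list_messages(mbox_path):
    """Return a (date, sender, subject) tuple for each message in an mbox file."""
    box = mailbox.mbox(mbox_path, create=False)
    try:
        return [
            (str(msg.get("Date", "")),
             str(msg.get("From", "")),
             str(msg.get("Subject", "")))
            for msg in box
        ]
    finally:
        box.close()
```

Printing the result of `list_messages("some-archive.mbox")` (a placeholder file name) gives a quick table of contents for an old archive without importing it into any client.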
Gmail?
I don't know about you but I generate about 6GB of email archives per year. Besides that, having my email potentially available for searching doesn't sit well with me. I'm not sure where it stands now but there were a lot of potential privacy issues with Gmail.
No, I don't receive hordes of email, just a lot of engineering-related mail with source code, research, and white papers attached. If you do anything business related it's important to keep all of the original emails received so there is an electronic paper trail.
I second this.
I started running my own IMAP server on an old machine a year or so ago - and synced all my old mail archives to various folders.
My mailserver also solves another problem - multiple POP accounts. I have my IMAP server set up so that each one of my POP accounts gets automatically tagged and sent to its own folder.
A third common problem this solves is having multiple machines. Now my desktop's email client is always synced with my laptop's email client. Before, I had run into problems whenever I traveled and fetched my email from the road.
I dub thee... Sir Phobos, Knight of Mars, Beater of Ass.
Don't ask Microsoft.
Democracy is two wolves and a sheep voting on lunch.
I've been using mbox format since the eighties, and never had a problem with it on any platform. It's pretty much been THE standard for email for as long as email has existed. If I ever were to switch, I'd probably switch to maildir, which has nearly as wide-spread support these days.
I never keep emails, or archive IMs or any other form of communication. Once an email is read, it is deleted. Same goes for normal old-skool mail, I read it and then trash it. The only exceptions are letters/email of some importance such as information I need to keep handy, or if it has some kind of sentimental value (letters from deceased relatives for example.)
Sure, HDD space is cheap; but I tend to equate people who archive every single form of written communication to those who have an Obsessive Compulsive Disorder, in that they hoard everything in sight: newspapers, snail mail, magazines, boxes, etc..
Commit to memory and destroy the evidence. That's my way of handling archives.
IMAP isn't really a word, it's an acronym. But I agree, IMAP is what I use. It keeps the relevant email in one place, and on my network I use both Linux and Windows (with a dedicated Linux box). I have Evolution set up to continually check and sort my email into IMAP folders and I generally read them off my Linux box. If I need to click on links, I generally open up any Windows email client (from Thunderbird to Outlook) and it'll connect to IMAP and my emails will all appear (nicely sorted too!). If I need webmail, I have squirrelmail (which I use) to access my IMAP system remotely using any web browser and I can get at my hotmail email (from the old days, but my accounts are still active) using freepops or some other Web Email to POP3 gateway. Everything (but gmail, my mailing list archive) is in my IMAP server - I just back up one area.
I always wondered where this setting was...
Convert everything into mbox format. formail will help you with that.
Use mairix to search through email.
mutt is the best mail client ever.
-rsw
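Once everything is in mbox, consolidating the pieces is mostly bookkeeping. formail is the tool recommended above; the following is only a Python stand-in sketch that merges several mbox files into one and skips messages whose Message-ID has already been seen. The name `merge_mboxes` is hypothetical, and it assumes nothing else is writing to the destination file while it runs.

```python
import mailbox


def merge_mboxes(source_paths, dest_path):
    """Copy messages from several mbox files into one, skipping repeated Message-IDs.

    Messages without a Message-ID header are always copied. IDs already present
    in an existing destination file are not checked in this sketch.
    """
    seen = set()
    copied = 0
    dest = mailbox.mbox(dest_path)  # created if it does not exist yet
    try:
        for path in source_paths:
            src = mailbox.mbox(path, create=False)
            try:
                for msg in src:
                    msg_id = msg.get("Message-ID")
                    if msg_id is not None:
                        msg_id = str(msg_id).strip()
                        if msg_id in seen:
                            continue  # duplicate of something already copied
                        seen.add(msg_id)
                    dest.add(msg)
                    copied += 1
            finally:
                src.close()
        dest.flush()
    finally:
        dest.close()
    return copied
```

This is also a cheap cure for the "dozen copies of the same five-year-old email" problem that comes from importing the same mailbox over and over.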
Combine this with spotlight/tiger in mac os. Spotlight indexes PDF content. print it to pdf and it will be searchable. Assuming you have a Mac that is.
Some drink at the fountain of knowledge. Others just gargle.
I archive my mail on /dev/null. Send it there daily.
I log and keep all my traffic including IRC logs going back to '94.
Hey B5_geek, here's a trick to free up a lot of disk space *and* raise the S/N ratio in your logs:
mv irclog.txt irclog.txt.fat && grep -vi lol irclog.txt.fat > irclog.txt && rm -f irclog.txt.fat
"A door is what a dog is perpetually on the wrong side of" - Ogden Nash
Whatever you do, I think it's best to keep it in an open and obvious format like mbox or maildir. The nice thing about maildir though, is that since all the messages are separate, it might be a little easier to write a program to put them into a new format.
Personally, since 1999, I've been using a combination of maildir and procmail to archive and save my mail. Every message that comes in goes to a folder called .saved-messages-YYYY-MM and also to my inbox. I simply don't touch the saved-messages folders and when I am done with the message in my inbox, I just delete it. This has worked well for me and makes it much easier to deal with archiving old mail. In the end, having categorized folders and such is just a waste of time. It's kinda like the wm2 (window manager) way of thinking, but for mailboxes.
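As a sketch of how easy it is to move maildir into another format, Python's standard `mailbox` module can read a maildir and write an mbox in a few lines. The name `maildir_to_mbox` is made up for this example, and it assumes a plain maildir directory (with `cur`, `new` and `tmp` inside it) rather than any server-specific layout.

```python
import mailbox


def maildir_to_mbox(maildir_path, mbox_path):
    """Copy every message in a maildir into a single mbox file; return the count."""
    src = mailbox.Maildir(maildir_path, create=False)
    dest = mailbox.mbox(mbox_path)  # created if it does not exist yet
    count = 0
    try:
        for msg in src:
            # mboxMessage() carries over what flags it can (read, replied, ...).
            dest.add(mailbox.mboxMessage(msg))
            count += 1
        dest.flush()
    finally:
        dest.close()
        src.close()
    return count
```

Maildir iteration order is not guaranteed to be chronological, so sort afterwards in your mail client if order matters.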
One word: IMAP
Absolutely. I use no fewer than two mail clients on two different machines on any given business day. Every email I've sent since 1995 or something like that, and received since 1998 is available and searchable. Over this time, I've accessed this archive with the following clients:
* pine (lots of pine)
* mac mail
* thunderbird
* various netscapes/mozillas
* ML (some random IMAP reader)
* My phone (my old Sony Ericsson speaks IMAP)
* My palm (two different apps)
* python
* a java webmail system I wrote
* three or four other webmail systems
* mutt
...who knows what else. I've got freedom to try whatever I want at any given moment without losing my current or past mail.
-- The world is watching America, and America is watching TV.
I have several CDs worth of stuff archived with ForKeeps:
http://www.fkeeps.com/whofor.htm
It's a bit of an old program and the interface is clunky, but it works reasonably well once you work through it.
--Steve
That way it won't be subject to a subpoena. You'll regret it one day if you don't. Do you realize how much incriminating stuff you have in there?
Doesn't it make you feel good to know that our freedoms are protected by politicians, lawyers and journalists?
My car is bigger than yours. Move it or lose it!
10. Your mother told you to stop being such a pack rat.
9. Disks fill up, no matter how cheap they are. Low cost doesn't excuse gluttony.
8. Backups take forever.
7. Restores take an eternity, especially if you're not confident.
6. Mail client gets slower and slower.
5. Searches take too long.
4. Mail clients make mistakes, especially on big stores. See #7
3. Your CYA evidence may be used against you.
2. A mail store is not a file system and SMTP is not a file transfer protocol.
And the number one reason to delete your old email...
1. IT'S ALL A BUNCH OF USELESS CRAP JUST AS IT WAS WHEN YOU FIRST RECEIVED IT!!
6GB yearly? Holy shit...
Do you actually sign up to those free porn places?
I use grepmail to find old emails that I might need. Grepmail lets you use perl regular expressions to find messages and then outputs the entire message where a match was found. You can use grepm to open grepmail matches as a mailbox in mutt. grepine does the same for Pine, which I use.
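For anyone without grepmail handy, the basic idea (match a regular expression, then output the whole message) can be sketched in Python. This is not grepmail and does not use Perl's regex dialect; it uses Python's `re` module, the name `grep_mbox` is made up, and it searches the raw message text, so base64 or quoted-printable bodies will not match unless you decode them first.

```python
import mailbox
import re


def grep_mbox(mbox_path, pattern, flags=re.IGNORECASE):
    """Yield the full raw text of each message whose headers or body match."""
    regex = re.compile(pattern, flags)
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for key in box.iterkeys():
            # Decode leniently: old archives often contain mixed encodings.
            text = box.get_bytes(key).decode("utf-8", errors="replace")
            if regex.search(text):
                yield text
    finally:
        box.close()
```

Something like `for m in grep_mbox("archive-2004.mbox", r"invoice\s+\d+"): print(m)` would dump every matching message in full; the file name and pattern are placeholders.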
At the end of each year I clean the spam out of my archives using a procmail recipe and spamassassin. This recipe marks messages as deleted in the mailbox. I open these in pine, sort by deleted, and double check them. Once I'm sure they're all spam, I delete them:
The special spamassassin config turns off bayesian filtering and sets the threshold high:
The rest of the spam I clean out by hand.
Slashdot: Failed Car Analogies. Amateur Lawyering. Anecdote Battles.
Well, if you're going to be on this topic, a mention of ZOË is pretty much required.
ZOË is a sort of an archiving proxy that sits between your mail client and your mail server. It stores and indexes everything, so you can pop open a browser window and do a search on anything you've ever sent or received. Naturally, this was created before gmail.
With ZOË you don't need to worry about those pesky email folders and waiting for long searches.
Naturally, spam filtering before ZOË is a good idea.
Signatures are a waste of bandwi (buffering...)
I do data mining research, most recently on the Enron email dataset, and I've actually been having to roll my own multi-mailbox storage, access, and retrieval systems. It's taking way more time than I'd like, at this point I've gotten a database and web-based viewers made up (beware, they're quite slow).
If anyone has an idea of an open-source application similar to what the submitter is looking for, it would help my research quite a bit. There's practical research applications in this stuff, if someone's interested in making it.
Just about every email program that I've used has managed to export to CSV. A few web-based email systems didn't allow such exports, and some hunting on the web found some sort of convertor (like YahooPOPS!, etc.) that converted to POP, and then I exported them to CSV using Eudora or Outlook, or whatever program I was particularly enamored with.
Admittedly, sometimes the column names didn't match up ("Sender" v "From"), etc., but for the most part that's how I did it. I also made an effort to keep the number of email accounts that I had to a minimum. At this point of time, most everything is stored in the form of .PST files that are archived on CDs and on an external hard drive.
I also made an effort to keep my email accounts to a minimum, which probably made this entire process significantly easier and when I did close an account (like when I finished work at a company), I exported the emails from there and kept them in .PST in case I needed them for anything later on.
As far as indexing works - I have them stored in 6 month segments (Jan97-Jun97, Jul97-Dec97, ...), since I can usually remember roughly when I got an email that I was looking for - alternatives include perhaps by name of sender or company. :)
I do archive IMs - Trillian worries about it for me.
Hope this helps.
I've found the easiest way to handle EMail when it's in multiple formats like that is to just print everything out and store it in boxes in my garage.
I am NOT a man!
I am a free number!
I keep the emails in mailbox format (that is, in plain text as it is stored in most UNIX systems), in several files. The reason I do that is that most email readers (MUA) can read mailbox format. I keep them in several files to make it more manageable.
The tools that I use to manipulate emails are mostly "from", "procmail", "grep", and "less". There used to be tools from the "elm" era (still remember them?), such as "frm" (which is better than "from"), "reademail" (to read individual email, given the number of email in the archive), "deletemail" (which can delete an individual email in the archive). Too bad, these tools are gone. At one point I slapped a simple Tk interface as a front end to those tools. But it didn't scale well.
At one point I did experiment with storing emails in individual files. But the tools to manipulate them are limited. I used MH.
The next experiment I did was to take all those email headers and put them in a database. (I used msql, which was popular at that time.) Then, I had a Java applet and perl script to make queries to the database (and actually did an analysis of my reading habit). The actual emails were stored as plain text files. Each email was stored in an individual file. Basically, the original email was untouched. I got bored and never continued the project.
Now ... I am still searching for the perfect email tools.
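The headers-in-a-database experiment is easy to try again today. Below is a small Python sketch that indexes an mbox file's headers into SQLite, leaving the original mail untouched. SQLite is swapped in here for msql purely because it ships with Python; the table layout and the name `index_headers` are assumptions for this example.

```python
import mailbox
import sqlite3


def index_headers(mbox_path, db_path):
    """Store From/To/Subject/Date of every message in an SQLite table.

    The mbox file itself is only read; the row's msg_key is the message's
    position in that file, so the full text can be fetched later.
    """
    box = mailbox.mbox(mbox_path, create=False)
    try:
        rows = [
            (mbox_path, key,
             str(msg.get("From", "")), str(msg.get("To", "")),
             str(msg.get("Subject", "")), str(msg.get("Date", "")))
            for key, msg in box.iteritems()
        ]
    finally:
        box.close()

    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS headers ("
                "mbox TEXT, msg_key INTEGER, sender TEXT, "
                "recipient TEXT, subject TEXT, date TEXT)"
            )
            conn.executemany("INSERT INTO headers VALUES (?, ?, ?, ?, ?, ?)", rows)
    finally:
        conn.close()
    return len(rows)
```

After that, a query such as `SELECT mbox, msg_key, subject FROM headers WHERE sender LIKE '%alice%'` finds candidates quickly, and the matching message can be pulled from the mbox by its key.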
I strongly recommend Outport. It does an extremely good job of converting MSFT Outlook attachments into something more readable (mbox I think, it has been a while). MS Outlook usually mangles attachments into some wrapper called TNEF.
Also, anyone know of a client program that will recursively create folders on an IMAP server (maybe a server issue. In which case, what server?)
I had gotten over translating my years of Outlook email into something more universally readable, but I have so many nested folders that the inability to have the client recursively create IMAP folders is an issue. Suggestions?
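If no client will do it, recursive folder creation can be scripted. Here is a hedged Python sketch: it assumes the converted mail sits in a local directory tree that mirrors the folder nesting, that the server uses "/" as its hierarchy delimiter (many use "." instead), and that `conn` is an already logged-in `imaplib` connection. Both function names are made up for the example.

```python
import os


def folder_names(root, delimiter="/"):
    """List IMAP folder names mirroring the directory tree under `root`.

    Parents always come before their children, so they can be created in order.
    """
    names = []
    for dirpath, dirnames, _ in os.walk(root):
        dirnames.sort()  # make the traversal order predictable
        rel = os.path.relpath(dirpath, root)
        if rel != ".":
            names.append(delimiter.join(rel.split(os.sep)))
    return names


def create_folders(conn, names):
    """Send CREATE for each name; return the names the server refused."""
    failed = []
    for name in names:
        status, _ = conn.create(name)
        if status != "OK":  # e.g. the folder already exists
            failed.append(name)
    return failed
```

Folder names containing spaces or non-ASCII characters may need quoting or encoding that this sketch does not attempt.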
Have a look at bincimap. It works well, installs easily and seems to be quite secure.
See http://www.bincimap.org/ for more details.
It runs on my small linux server without problems and I can access my emails securely over ssl from anywhere. The only limit is the hd size, so even a couple of GB should be no problem.
...and let mutt sort out.
I had multiple folders, sorted by people/project. I got in a complete mess and finally snapped when I spent half an hour looking for a simple message.
Use procmail to write all incoming messages to 'all-mail-YYYY-MM' and use Mutt hooks to write out to the same directory.
At the end of the year, cat them together and make 'all-mail-YYYY'. Accessing and reading this mailbox can be done with 'mutt -R -f all-mail-YYYY' as this opens read-only. Use 'l' to do 'limit' searches and use ~t, ~f, and ~b in AND combinations to limit on To: From: and body of messages. It's lovely only having to look in one place!
Procmail:
INCOMING=all-mail-`date +%Y-%m`
# now I want to keep a copy of EVERYTHING in a dated directory
:0 c:
$INCOMING
Muttrc:
set record="+all-mail-`date +%Y-%m`"
Works for me!
Dr Fish
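The procmail setup above handles new mail as it arrives. For old archives that were never filed this way, roughly the same all-mail-YYYY-MM layout can be produced after the fact. This Python sketch is an assumption-laden stand-in: the name `file_by_month` is made up, it trusts each message's Date header, and anything with a missing or unparseable date goes into an "undated" file.

```python
import mailbox
import os
from email.utils import parsedate_to_datetime


def file_by_month(mbox_path, out_dir, prefix="all-mail"):
    """Copy each message into <out_dir>/<prefix>-YYYY-MM based on its Date header.

    Returns a dict mapping output file name to the number of messages added.
    """
    counts = {}
    outputs = {}
    src = mailbox.mbox(mbox_path, create=False)
    try:
        for msg in src:
            try:
                when = parsedate_to_datetime(str(msg["Date"]))
                name = f"{prefix}-{when:%Y-%m}"
            except (TypeError, ValueError):
                name = f"{prefix}-undated"  # missing or malformed Date header
            if name not in outputs:
                outputs[name] = mailbox.mbox(os.path.join(out_dir, name))
            outputs[name].add(msg)
            counts[name] = counts.get(name, 0) + 1
    finally:
        for box in outputs.values():
            box.flush()
            box.close()
        src.close()
    return counts
```

The resulting files are plain mbox, so the year-end `cat` and `mutt -R -f` steps described above should work on them unchanged.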
My father was concerned about the longevity of his e-mail a few years ago, so I created a small batch file that converts his Outlook Express mail archive into mbox on a monthly basis. Last month he asked if I could convert them "into a web site" so he can get an idea of a thread history without parsing a huge file. When I get a moment I'm planning to write a script that outputs each message to a new file in html tags and use the message subject and date to create a rudimentary index.html.
I'm surprised no one has tried this before. It's a good low-tech solution for people who require information in a hurry and is more immediate than a flat file.
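A rough Python sketch of that plan: one HTML file per message plus a rudimentary index.html built from subject and date. The function names and file naming scheme are made up for the example, only the first text/plain part of each message is shown, and attachments and threading are ignored.

```python
import html
import mailbox
import os

PAGE = ("<html><head><meta charset='utf-8'><title>{title}</title></head>\n"
        "<body><h1>{title}</h1><p>{date}</p><pre>{body}</pre></body></html>\n")


def _plain_text(msg):
    """Return the first text/plain part of a message, decoded leniently."""
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True) or b""
            charset = part.get_content_charset() or "utf-8"
            try:
                return payload.decode(charset, errors="replace")
            except LookupError:  # charset name Python does not know
                return payload.decode("utf-8", errors="replace")
    return ""


def mbox_to_html(mbox_path, out_dir):
    """Write msgNNNNN.html per message and an index.html; return message count."""
    os.makedirs(out_dir, exist_ok=True)
    links = []
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for key, msg in box.iteritems():
            subject = html.escape(str(msg.get("Subject", "(no subject)")))
            date = html.escape(str(msg.get("Date", "")))
            filename = f"msg{key:05d}.html"
            with open(os.path.join(out_dir, filename), "w", encoding="utf-8") as fh:
                fh.write(PAGE.format(title=subject, date=date,
                                     body=html.escape(_plain_text(msg))))
            links.append(f"<li><a href='{filename}'>{subject}</a> ({date})</li>")
    finally:
        box.close()
    with open(os.path.join(out_dir, "index.html"), "w", encoding="utf-8") as fh:
        fh.write("<html><body><ul>\n" + "\n".join(links) + "\n</ul></body></html>\n")
    return len(links)
```

Everything is passed through `html.escape`, so odd characters in old subjects and bodies should not break the pages.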
IMAP is the answer. I don't use IMAP on a regular basis, but it did let me export mail from outlook over to Evolution on linux.
I used the UW IMAP server, which is a little easier to set up than the Cyrus one.
The UW IMAPd keeps its folders in mbox format, so it's a great tool for converting oddly formatted mail.
Moving email is pretty easy -- it's harder to move calendar entries, address books, notes, and the other sorts of data that end up in a program like Outlook. I think the easiest way to do it would be to sync to a Palm device on Windows, and then do it again under Linux, although I haven't actually tried that.
As a bonus, you can tell which emails are worth reading by how they get moderated. All your work related emails will probably be modded Troll, except for your performance review, which will be modded +5 Funny. Email from your illicit lovers will be modded Insightful, since that type of thing is new to most of us. Email from your family will be conveniently modded down so you will not have to deal with it. Your friends won't need to send you any email at all, since they are probably already on Slashdot, and therefore, know enough to post in your threads.
Problem solved. Ah, Slashdot... Is there anything it can't do?
Show me on the doll where his noodly appendage touched you.
There are traps.
Information wants to be free.
Entertainment wants to be paid.
You just want to be cheap.
I have a machine that runs a dedicated IMAP server with one account on it (mine) which has my 2GB+ of e-mail since 1996. (minus the spam of course). That way I can easily switch between different clients and not have to worry about converting my e-mail all the time.
You have to print it with something. Ink: one of the most expensive ways to put stuff on paper. Heck, they say it costs seven times more than champagne per drop! That, plus the costs of cartridges and printer maintenance and, and... oh the horror! ;)
Me? I obsessively reinstall my operating system and reimport old mailboxes into my mail client, so I have a dozen copies of 5-year old email, ten copies of 4-year old email, 8 copies of 3-year old email, etc. No need for backups... plus when I search my computer for old email, I get a dozen copies of what I'm looking for!
One assumes, perhaps wrongly, that PDFs are a more durable format than mail. This of course is what the entire "Ask Slashdot" question was about: how do you deal with past mail in different mail programs? If you keep it in PDF format then it probably will be readable regardless of the mail program that generated it. However, then the problem is wading through 10,000 old e-mail PDFs. Spotlight solves this. Now that Spotlight exists, one assumes no operating system in the future will ever be without something like Spotlight.
Some drink at the fountain of knowledge. Others just gargle.
You mean, besides the NSA?
You open all your email with an email client and move all the disparate inboxes into a big IMAP store on your own computer or one provided by a joint like Fastmail.fm or Runbox.com
Then, you keep a local backup on any computer that you move to with offlineimap, a wonderful utility that doubles as a multi-inbox synchronizer and backup utility. I have been using it for the past two years and can attest to its reliability.
Enjoy
Pragmatism as an ideology is not particularly pragmatic in the long term. Keep it in mind when you dismiss Free Software
I actually have two backups of my mail:
I've about 15G of emails now dating back to the early 90s, all stored in a locally-installed Cyrus IMAP server (maildir format, technically). Never used AOL's mail or free webmails so that was never a concern of mine.
This is why the House of Lords was resistant to the prosecution of Nazi war criminals for so long, incidentally.
Wikileaks, no DNS
If you're serious about archiving or migrating your email, take a look at Mailbag Assistant and Aid4Mail for Windows. Mailbag Assistant makes it easy to read email from many different formats. It can search and display your email archives from CD-ROMs and any other location accessible to Windows Explorer. Aid4Mail supports even more mail clients and can archive your messages into highly compressed ZIP files. It can also help you migrate your email to another mail client or a database. Aid4Mail is very accurate; it can correctly migrate status information and is capable of rebuilding Eudora mailbox files and MS Outlook message folders into standard mbox files.
Mailbag Assistant 3.8:
http://www.fookes.com/mailbag/
Aid4Mail 1.0:
http://www.aid4mail.com/
--
Eric Fookes
http://www.fookes.com/
BTW, it's also the only way to reconcile an enforced corporate suffering of /barf/ Outlook and more sane programs like Evolution (slooow) and kmail (not quite as pretty) or pine (good for SSH on slow dialups ;-).
With an IMAP backend you can try it all without tying yourself into one format - that's Open Standards for you!
Insert
With the advent and subsequent improvements of LiveCD distros, it should be relatively painless for the average /.'er to:
* Create a multi-session CD/DVD with your favorite Linux LiveCD distro
(or roll your own and create an ISO for future use)
and
* Backup email files to said CD/DVD
(I suggest a set of re-writable media of good quality to play it safe.)
Further suggestions:
1. It would be advisable to split your archives (e.g. Mail2004, etc.), especially if you plan to retain a sizeable amount of mail.
2. Convert archives from older mail clients before creating backup, or use a newer mail client that can read the old files with ease.
Good luck!