Slashdot Mirror


How Do You Store and Reconcile Email Archives?

heyitsjustme wants to know how you deal with old email. "I delete most of what I get but keep the stuff from friends and relations as an archive. Unfortunately I have these email archives from the late 80's through today in the form of macintosh, linux and windows mailboxes including AOL 1.0 mailboxes. What does everyone use to archive email across multiple platforms and non-standard mailbox formats? Is there an easy solution out there? Does anyone archive IM?"

20 of 380 comments

  1. Cyrus Imap... by DaGoodBoy · · Score: 2, Interesting

    ...with fetchmail / procmail / cyrdeliver for sorting and storing mail from other sources. How can 5GB of mail be wrong?! I can slice and dice all my email (including about a gig of spam...) for choice bits of information.

    --
    My God! It's full of Voids!
  2. Unix mail format by saccade.com · · Score: 3, Interesting

    I use the basic Unix mail format, essentially a plain-text series of messages. Eudora handles it fine, and almost anything else can read or import it. I have email going back to the 80's in this format. The one time I had to convert was when I was working for a company that used "Quickmail" on the Mac. I wound up reverse engineering their format and hacking together a program to convert it to plain text.

    1. Re:Unix mail format by Zocalo · · Score: 2, Interesting
      Ditto; in my case the "mbox" format, to be precise. I currently use Procmail to automatically CC all incoming messages to a dedicated archive file, one per month, with each year in a separate folder. Outgoing mail is also sent to the same file, although I could easily have an "infile" and an "outfile", break mail apart by topic, or whatever. For more robust long-term backup, I simply tarball the dozen files within each directory into a file called "mail-yyyy.tar.gz" and back it up as normal.
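      For anyone who would rather not write Procmail recipes, here is a rough Python sketch of the same layout, assuming Python 3's standard `mailbox` and `tarfile` modules. Each raw message is appended to `root/YYYY/YYYY-MM.mbox` based on its Date header (falling back to the current month when the header is missing or unparseable), and a finished year is bundled into `mail-yyyy.tar.gz`. The function names and directory layout are illustrative, not Zocalo's actual setup.

```python
import mailbox
import os
import tarfile
from datetime import datetime
from email import message_from_bytes
from email.utils import parsedate_to_datetime


def archive_message(raw: bytes, root: str) -> str:
    """Append one raw RFC 822 message to root/YYYY/YYYY-MM.mbox and return that path."""
    msg = message_from_bytes(raw)
    try:
        when = parsedate_to_datetime(msg["Date"])
    except (TypeError, ValueError):
        # Missing or unparseable Date header: file it under the current month.
        when = datetime.now()
    year_dir = os.path.join(root, f"{when.year:04d}")
    os.makedirs(year_dir, exist_ok=True)
    path = os.path.join(year_dir, f"{when.year:04d}-{when.month:02d}.mbox")
    box = mailbox.mbox(path)
    box.lock()  # guard against two deliveries writing the same file at once
    try:
        box.add(msg)
        box.flush()
    finally:
        box.unlock()
        box.close()
    return path


def tarball_year(root: str, year: int) -> str:
    """Bundle root/YYYY/ into root/mail-YYYY.tar.gz and return the archive path."""
    src = os.path.join(root, f"{year:04d}")
    dest = os.path.join(root, f"mail-{year:04d}.tar.gz")
    with tarfile.open(dest, "w:gz") as tar:
        tar.add(src, arcname=f"{year:04d}")
    return dest
```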

      Since mbox is a pretty standard format, many tools have a built-in import routine, or there will at least be an existing third-party tool to handle the conversion. Failing that, it's fairly trivial to cobble together a one-off conversion tool in a scripting language, or even to batch-remail each message one at a time if your new email client uses some undocumented storage format or is an online service like Gmail.
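      As a rough illustration of such a one-off converter, here is a Python sketch that copies a Maildir into a single mbox file with the standard `mailbox` module (which also reads MH, Babyl and MMDF). It assumes the old mail is already in one of those formats; anything proprietary would still need its own parser first. The function name is hypothetical.

```python
import mailbox


def maildir_to_mbox(maildir_path: str, mbox_path: str) -> int:
    """Append every top-level message in an existing Maildir to an mbox; return the count.

    Maildir subfolders are not included."""
    src = mailbox.Maildir(maildir_path, create=False)
    dest = mailbox.mbox(mbox_path)
    dest.lock()
    count = 0
    try:
        for msg in src:
            # Converting to mboxMessage carries the Maildir flags (seen, replied, ...)
            # over into the Status/X-Status headers that mbox readers use.
            dest.add(mailbox.mboxMessage(msg))
            count += 1
        dest.flush()
    finally:
        dest.unlock()
        dest.close()
    return count
```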

      --
      UNIX? They're not even circumcised! Savages!
  3. Re:Italian school of driving by boybaha · · Score: 4, Interesting

    I also have email archives that stretch back to the early 1990s. I still have pretty much every email I've ever sent or received. When upgrading email clients, I usually migrate my archives with me, converting them with whatever built-in import and export functions the clients provide. I went from Eudora to Outlook Express to Thunderbird to Mac Mail. I also have programs that "pop" webmail off the providers' sites (Gmail, Hotmail and Yahoo) to consolidate it in whatever mail client I'm currently using. I just keep everything in neat folders ("Old Eudora Mail," "Old Yahoo Mail").
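    A minimal sketch of the "pop it off the server" step, assuming the provider offers POP3 over SSL (Python's standard `poplib`) and that everything should land in one local mbox. The host, account and file names are placeholders you would supply, and the function names are hypothetical. Nothing is deleted on the server, so running it twice stores duplicates locally.

```python
from __future__ import annotations

import mailbox
import poplib
from email import message_from_bytes


def save_retrieved(lines: list[bytes], box: mailbox.mbox) -> None:
    """Join the lines returned by POP3.retr() back into one message and store it."""
    box.add(message_from_bytes(b"\n".join(lines)))


def pop_to_mbox(host: str, user: str, password: str, mbox_path: str) -> int:
    """Copy every message from a POP3-over-SSL mailbox into a local mbox; return the count."""
    conn = poplib.POP3_SSL(host)
    try:
        conn.user(user)
        conn.pass_(password)
        count, _total_size = conn.stat()
        box = mailbox.mbox(mbox_path)
        try:
            for i in range(1, count + 1):
                _resp, lines, _octets = conn.retr(i)  # lines arrive without line endings
                save_retrieved(lines, box)
            box.flush()
        finally:
            box.close()
        return count
    finally:
        conn.quit()
```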

  4. Kinda Sorta OT by hot_Karls_bad_cavern · · Score: 4, Interesting

    but ...

    Along these lines, is there an OSS package that can read the varied formats the Submitter is referring to, tag and drop them in a DB with a nice, friendly, web-enabled (secure) front-end for searching?

    My former employer kept *all* of his email from the last 20 years in tar.gz files. Let's just say it wasn't easy to find an email from, er, 15 years ago.

    Is there a package that can read the mbox, the other box-formats, plain text, pull from pop, old tar.gz bundles, categorize (sorta), tag and make such things searchable?

    Totally a shot in the dark here; I'm not a mail guy at all ... just wondering, as the Submitter did what I like /. Submitters to do: make me think and look for new, better stuff ... or better ways to do old stuff.

    It is the "drink" that makes me wonder, sorry :)
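    The indexing core of such a tool can be fairly small. Here is a rough sketch, assuming the mail has already been converted to mbox and using Python's built-in `sqlite3` with plain `LIKE` matching (a real tool would want a full-text index and a web front-end on top). The table layout, the `source` tag and the function names are made up for illustration.

```python
import mailbox
import sqlite3
from email.utils import parsedate_to_datetime


def first_text_part(msg) -> str:
    """Return the first text/plain part of a message as str ('' if none)."""
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True) or b""
            charset = part.get_content_charset() or "latin-1"
            try:
                return payload.decode(charset, errors="replace")
            except LookupError:  # unknown charset name in old mail
                return payload.decode("latin-1", errors="replace")
    return ""


def index_mbox(mbox_path: str, db: sqlite3.Connection, source: str) -> int:
    """Load one mbox into a searchable SQLite table; `source` tags where it came from."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        " id INTEGER PRIMARY KEY, source TEXT, sender TEXT,"
        " subject TEXT, sent TEXT, body TEXT)"
    )
    box = mailbox.mbox(mbox_path)
    count = 0
    try:
        for msg in box:
            try:
                sent = parsedate_to_datetime(msg["Date"]).isoformat()
            except (TypeError, ValueError):
                sent = None  # undated message
            db.execute(
                "INSERT INTO messages (source, sender, subject, sent, body)"
                " VALUES (?, ?, ?, ?, ?)",
                (source, str(msg["From"] or ""), str(msg["Subject"] or ""),
                 sent, first_text_part(msg)),
            )
            count += 1
    finally:
        box.close()
    db.commit()
    return count


def search(db: sqlite3.Connection, term: str) -> list:
    """Substring search over subjects and bodies (LIKE is case-insensitive for ASCII)."""
    like = f"%{term}%"
    return db.execute(
        "SELECT sent, sender, subject, source FROM messages"
        " WHERE subject LIKE ? OR body LIKE ? ORDER BY sent",
        (like, like),
    ).fetchall()
```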

  5. Re:One Word by Murphy+Murph · · Score: 2, Interesting

    I second this.
    I started running my own IMAP server on an old machine a year or so ago - and synced all my old mail archives to various folders.

    My mail server also solves another problem: multiple POP accounts. I have my IMAP server set up so that each of my POP accounts gets automatically tagged and sent to its own folder.
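    The comment doesn't say which software does the tagging. As a rough Python equivalent, assuming Courier-style Maildir folders (which Python's `mailbox.Maildir` supports and which IMAP servers of that lineage serve), the sketch below files each message into a folder chosen by matching its Delivered-To/To/Cc addresses against a hypothetical `ACCOUNT_FOLDERS` table. All addresses and folder names are placeholders.

```python
import mailbox
from email import message_from_bytes
from email.utils import getaddresses

# Hypothetical mapping from each POP account's address to an IMAP folder.
ACCOUNT_FOLDERS = {
    "me@isp.example": "ISP",
    "me@work.example": "Work",
}
DEFAULT_FOLDER = "Unsorted"


def folder_for(msg) -> str:
    """Pick a folder by the first recipient address that belongs to a known account."""
    headers = []
    for name in ("Delivered-To", "To", "Cc"):
        headers.extend(str(v) for v in msg.get_all(name, []))
    for _realname, addr in getaddresses(headers):
        folder = ACCOUNT_FOLDERS.get(addr.lower())
        if folder:
            return folder
    return DEFAULT_FOLDER


def deliver(raw: bytes, maildir_root: str) -> str:
    """File one raw message into a Maildir subfolder, creating it if needed; return its name."""
    msg = message_from_bytes(raw)
    root = mailbox.Maildir(maildir_root, create=True)
    name = folder_for(msg)
    if name not in root.list_folders():
        root.add_folder(name)
    root.get_folder(name).add(msg)
    return name
```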

    A third common problem this solves is having multiple machines. Now my desktop's email client is always synced with my laptop's. Before, I ran into problems whenever I traveled and fetched my email on the road.

    --
    I dub thee... Sir Phobos, Knight of Mars, Beater of Ass.
  6. Archive what? by Mishura · · Score: 3, Interesting

    I never keep emails or archive IMs or any other form of communication. Once an email is read, it is deleted. The same goes for normal old-skool mail: I read it and then trash it. The only exceptions are letters/emails of some importance, such as information I need to keep handy, or ones with sentimental value (letters from deceased relatives, for example).

    Sure, HDD space is cheap, but I tend to equate people who archive every single form of written communication with people who have obsessive-compulsive disorder, in that they hoard everything in sight: newspapers, snail mail, magazines, boxes, etc.

    Commit to memory and destroy the evidence. That's my way of handling archives.

  7. Re:One Word by pHDNgell · · Score: 5, Interesting

    One word: IMAP

    Absolutely. I use no fewer than two mail clients on two different machines on any given business day. Every email I've sent since 1995 or something like that, and received since 1998 is available and searchable. Over this time, I've accessed this archive with the following clients:

    * pine (lots of pine)
    * mac mail
    * thunderbird
    * various netscapes/mozillas
    * ML (some random IMAP reader)
    * My phone (my old Sony Ericsson speaks IMAP)
    * My palm (two different apps)
    * python
    * a java webmail system I wrote
    * three or four other webmail systems
    * mutt
    * ...who knows what else.

    I've got freedom to try whatever I want at any given moment without losing my current or past mail.
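    Since python is on that list, here is a small sketch of searching an IMAP archive with the standard `imaplib` module, assuming an already logged-in connection. It opens a folder read-only (EXAMINE) and asks the server for the UIDs of messages whose Date header is on or after a given day (the SENTSINCE key; plain SINCE would use the server's internal date, which for migrated archives may be the import date), optionally containing some text. The server does the searching, so this works no matter which client filed the mail. The function names are hypothetical, and the text search assumes the string contains no double quotes.

```python
from __future__ import annotations

import imaplib
from datetime import date

_MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def imap_date(d: date) -> str:
    """Format a date the way IMAP SEARCH expects (e.g. 05-Mar-1998), independent of locale."""
    return f"{d.day:02d}-{_MONTHS[d.month - 1]}-{d.year}"


def uids_sent_since(conn: imaplib.IMAP4, folder: str, since: date,
                    text: str | None = None) -> list[bytes]:
    """Return UIDs in `folder` whose Date header is on/after `since`, optionally matching `text`."""
    typ, _ = conn.select(folder, readonly=True)  # EXAMINE: the folder cannot be modified
    if typ != "OK":
        raise RuntimeError(f"cannot select {folder!r}")
    criteria = ["SENTSINCE", imap_date(since)]
    if text:
        criteria += ["TEXT", f'"{text}"']  # assumes no double quotes in `text`
    typ, data = conn.uid("SEARCH", *criteria)
    if typ != "OK":
        raise RuntimeError("search failed")
    return data[0].split()
```

    Against a real server this might be called as `conn = imaplib.IMAP4_SSL("imap.example.org")`, `conn.login(user, password)`, then `uids_sent_since(conn, "INBOX", date(1998, 1, 1), "reunion")`, where the host name is a placeholder.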

    --
    -- The world is watching America, and America is watching TV.
  8. Archiving tool: ForKeeps by sstern · · Score: 3, Interesting

    I have several CDs worth of stuff archived with ForKeeps:

    http://www.fkeeps.com/whofor.htm

    It's a bit of an old program and the interface is clunky, but it works reasonably well once you work through it.

    --
    --Steve
  9. Re:email archive by Anonymous Coward · · Score: 1, Interesting
    Of course it's a fair comment. As you state, many Fortune 500 companies engage in record manipulation, and Microsoft WAS caught, so it is a perfectly legitimate comment.

    Maybe other companies do it too, but until there is proof you can't slander them. Microsoft does it, so they're fair game.

  10. Re:One word by Anonymous Coward · · Score: 0, Interesting
    And have all my mail arrive in the NSA's inbox?

    Thanks, but I'll pass.

  11. Practical research applications by Martian_Bob · · Score: 2, Interesting

    I do data mining research, most recently on the Enron email dataset, and I've had to roll my own multi-mailbox storage, access, and retrieval systems. It's taking way more time than I'd like; so far I've built a database and some web-based viewers (beware, they're quite slow).

    If anyone knows of an open-source application similar to what the submitter is looking for, it would help my research quite a bit. There are practical research applications for this stuff, if someone's interested in building it.
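    For the storage and access side, much of the work is simply walking the corpus. A rough sketch, assuming a directory tree with one RFC 822 message per file (MH-style; check the layout of your copy of the dataset), that parses only the headers with Python's `email.parser.BytesHeaderParser` and tallies the most frequent senders. The function names are illustrative.

```python
import os
from collections import Counter
from email.parser import BytesHeaderParser
from email.utils import parseaddr


def iter_headers(root: str):
    """Yield (relative_path, headers) for every file under root, one message per file."""
    parser = BytesHeaderParser()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            with open(path, "rb") as fh:
                headers = parser.parse(fh)  # headers only; the body is not parsed
            yield os.path.relpath(path, root), headers


def top_senders(root: str, n: int = 10) -> list:
    """Return the n most frequent sender addresses as (address, count) pairs."""
    counts = Counter()
    for _path, headers in iter_headers(root):
        _name, addr = parseaddr(headers.get("From", ""))
        if addr:
            counts[addr.lower()] += 1
    return counts.most_common(n)
```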

  12. CSV by vnangia · · Score: 2, Interesting

    Just about every email program that I've used has been able to export to CSV. A few web-based email systems didn't allow such exports, but some hunting on the web turned up a converter (like YahooPOPS!) that made them available over POP, and then I exported the mail to CSV using Eudora, Outlook, or whatever program I was particularly enamored with at the time.
    Admittedly, sometimes the column names didn't match up ("Sender" vs. "From"), but for the most part that's how I did it. At this point, almost everything is stored as .PST files that are archived on CDs and on an external hard drive.
    I also made an effort to keep my email accounts to a minimum, which probably made this entire process significantly easier, and when I did close an account (like when I finished working at a company), I exported the emails from it and kept them in .PST in case I needed them later on.
    As far as indexing goes, I have them stored in 6-month segments (Jan97-Jun97, Jul97-Dec97, ...), since I can usually remember roughly when I got the email I'm looking for; alternatives include organizing by sender or company.
    I do archive IMs - Trillian worries about it for me. :)
    Hope this helps.
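    A rough Python sketch of the two chores described here, assuming the mail has been converted to mbox: exporting messages to CSV with a 6-month segment label like `Jan97-Jun97`, and reading other programs' CSV exports with mismatched column names mapped onto one set. The alias table and function names are hypothetical; extend the table for whatever headers your clients actually produce.

```python
import csv
import mailbox
from email.utils import parsedate_to_datetime

# Hypothetical alias table: different clients name the same column differently.
COLUMN_ALIASES = {
    "from": "From", "sender": "From",
    "to": "To", "recipient": "To",
    "subject": "Subject",
    "date": "Date", "sent": "Date",
}


def half_year_segment(d) -> str:
    """Label a date with its 6-month bucket, e.g. 'Jan97-Jun97' or 'Jul97-Dec97'."""
    yy = f"{d.year % 100:02d}"
    return f"Jan{yy}-Jun{yy}" if d.month <= 6 else f"Jul{yy}-Dec{yy}"


def mbox_to_csv(mbox_path: str, csv_path: str) -> None:
    """Write one CSV row per message, plus a Segment column for the 6-month bucket."""
    fields = ["Segment", "Date", "From", "To", "Subject"]
    box = mailbox.mbox(mbox_path)
    try:
        with open(csv_path, "w", newline="", encoding="utf-8") as out:
            writer = csv.DictWriter(out, fieldnames=fields)
            writer.writeheader()
            for msg in box:
                try:
                    segment = half_year_segment(parsedate_to_datetime(msg["Date"]))
                except (TypeError, ValueError):
                    segment = "undated"
                row = {name: str(msg[name] or "") for name in fields[1:]}
                row["Segment"] = segment
                writer.writerow(row)
    finally:
        box.close()


def read_normalized(csv_path: str):
    """Yield rows from another program's CSV export with column names mapped via COLUMN_ALIASES."""
    with open(csv_path, newline="", encoding="utf-8") as fh:
        for row in csv.DictReader(fh):
            yield {COLUMN_ALIASES.get(k.strip().lower(), k.strip()): v
                   for k, v in row.items() if k is not None}
```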

  13. Email archiving and tools by rahard · · Score: 2, Interesting
    I archive most of my emails. So far, my email archive is close to 2 GB.

    I keep the emails in mailbox format (that is, in plain text as it is stored in most UNIX systems), in several files. The reason I do that is that most email readers (MUA) can read mailbox format. I keep them in several files to make it more manageable.

    The tools that I use to manipulate emails are mostly "from", "procmail", "grep", and "less". There used to be tools from the "elm" era (still remember them?), such as "frm" (which is better than "from"), "reademail" (to read an individual email, given its number in the archive), and "deletemail" (to delete an individual email from the archive). Too bad these tools are gone. At one point I slapped a simple Tk interface on those tools as a front end, but it didn't scale well.
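    The elm-era helpers are easy to approximate today. A minimal sketch with Python's standard `mailbox` module, assuming a single mbox file: `frm` lists numbered From/Subject lines, `reademail` returns message *n*, and `deletemail` removes message *n* under a lock. These are illustrative reimplementations and do not reproduce the original tools' options or output format.

```python
import mailbox


def _nth_key(box: mailbox.mbox, n: int):
    """Map a 1-based message number onto the mailbox key, in file order."""
    keys = sorted(box.keys())
    if not 1 <= n <= len(keys):
        raise IndexError(f"message {n} out of range 1..{len(keys)}")
    return keys[n - 1]


def frm(path: str) -> list:
    """List 'number: From  Subject' lines, roughly like the old frm tool."""
    box = mailbox.mbox(path)
    try:
        return [f"{i}: {box[k]['From']}  {box[k]['Subject']}"
                for i, k in enumerate(sorted(box.keys()), 1)]
    finally:
        box.close()


def reademail(path: str, n: int) -> str:
    """Return message number n (1-based) as text."""
    box = mailbox.mbox(path)
    try:
        return box.get_string(_nth_key(box, n))
    finally:
        box.close()


def deletemail(path: str, n: int) -> None:
    """Delete message number n (1-based) and rewrite the mbox file."""
    box = mailbox.mbox(path)
    box.lock()
    try:
        box.remove(_nth_key(box, n))
        box.flush()
    finally:
        box.unlock()
        box.close()
```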

    At one point I experimented with storing emails in individual files, but the tools to manipulate them were limited. I used MH.

    The next experiment was to take all those email headers and put them in a database. (I used msql, which was popular at that time.) Then I had a Java applet and a Perl script to query the database (and actually did an analysis of my reading habits). The actual emails were stored as plain-text files, one message per file, so the original email was untouched. I got bored and never continued the project.

    Now ... I am still searching for the perfect email tools.

  14. OE to mbox to html by Gax · · Score: 2, Interesting

    My father was concerned about the longevity of his e-mail a few years ago, so I created a small batch file that converts his Outlook Express mail archive into mbox on a monthly basis. Last month he asked if I could convert them "into a web site" so he can get an idea of a thread history without parsing a huge file. When I get a moment I'm planning to write a script that outputs each message to a new file in html tags and use the message subject and date to create a rudimentary index.html.

    I'm surprised no one has tried this before. It's a good low-tech solution for people who require information in a hurry and is more immediate than a flat file.
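    One way the planned script could look, sketched in Python under a few assumptions: the input is the mbox the batch file already produces, each message becomes `msgNNNNN.html` with its first text/plain part in a `<pre>` block, and `index.html` sorts messages by subject (with leading Re:/Fw: stripped) and then date, giving a crude thread view. File names and page layout are hypothetical.

```python
import html
import mailbox
import os
import re
from email.utils import parsedate_to_datetime

_PREFIX = re.compile(r"^((re|fw|fwd)\s*:\s*)+", re.IGNORECASE)


def body_text(msg) -> str:
    """Return the first text/plain part as str ('' if none)."""
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True) or b""
            try:
                return payload.decode(part.get_content_charset() or "latin-1", errors="replace")
            except LookupError:  # unknown charset name
                return payload.decode("latin-1", errors="replace")
    return ""


def mbox_to_site(mbox_path: str, out_dir: str) -> int:
    """Write one HTML page per message plus an index.html; return the message count."""
    os.makedirs(out_dir, exist_ok=True)
    entries = []
    box = mailbox.mbox(mbox_path)
    try:
        for i, msg in enumerate(box, 1):
            name = f"msg{i:05d}.html"
            subject = str(msg["Subject"] or "(no subject)")
            try:
                when = parsedate_to_datetime(msg["Date"]).strftime("%Y-%m-%d")
            except (TypeError, ValueError):
                when = "undated"
            title = html.escape(subject)
            page = (
                f"<html><head><meta charset='utf-8'><title>{title}</title></head><body>"
                f"<h1>{title}</h1><p>From: {html.escape(str(msg['From'] or ''))}<br>"
                f"Date: {when}</p><pre>{html.escape(body_text(msg))}</pre>"
                "<p><a href='index.html'>Back to index</a></p></body></html>"
            )
            with open(os.path.join(out_dir, name), "w", encoding="utf-8") as fh:
                fh.write(page)
            # Strip Re:/Fw: so replies sort next to the message they answer.
            thread = _PREFIX.sub("", subject).strip().lower()
            entries.append((thread, when, subject, name))
    finally:
        box.close()
    entries.sort()
    items = "\n".join(
        f"<li>{when} <a href='{name}'>{html.escape(subject)}</a></li>"
        for _thread, when, subject, name in entries
    )
    with open(os.path.join(out_dir, "index.html"), "w", encoding="utf-8") as fh:
        fh.write("<html><head><meta charset='utf-8'><title>Mail archive</title></head>"
                 f"<body><ul>\n{items}\n</ul></body></html>")
    return len(entries)
```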

  15. Re:Disk space is cheap. Why bother deleting? by glesga_kiss · · Score: 2, Interesting
    This may involve installing Outlook, exporting all of your mail to Outlook, and importing it all from Outlook, but it is worth it.

    Outlook + IMAP is the way I do it. You can drag messages between local storage and your mail server.

  16. Ink by PCheese · · Score: 2, Interesting

    You have to print it with something. Ink: one of the most expensive ways to put stuff on paper. Heck, they say it costs seven times more than champagne per drop! That, plus the costs of cartridges and printer maintenance and, and... oh the horror! ;)

    Me? I obsessively reinstall my operating system and reimport old mailboxes into my mail client, so I have a dozen copies of 5-year old email, ten copies of 4-year old email, 8 copies of 3-year old email, etc. No need for backups... plus when I search my computer for old email, I get a dozen copies of what I'm looking for!
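    For anyone whose duplicate copies are a bug rather than a feature: a minimal Python sketch that writes a deduplicated copy of an mbox, treating messages with the same Message-ID as duplicates and falling back to a SHA-256 hash of the raw message bytes when there is no Message-ID (so only byte-identical copies are caught in that case). It assumes the copies have already been gathered into one mbox; the function name is hypothetical.

```python
import hashlib
import mailbox


def dedupe_mbox(src_path: str, dest_path: str) -> tuple:
    """Copy src to dest keeping only the first copy of each message; return (kept, dropped)."""
    seen = set()
    kept = dropped = 0
    src = mailbox.mbox(src_path)
    dest = mailbox.mbox(dest_path)
    try:
        for key, msg in src.iteritems():
            mid = msg["Message-ID"]
            if mid:
                ident = str(mid).strip()
            else:
                # No Message-ID: fall back to hashing the exact message bytes.
                ident = hashlib.sha256(src.get_bytes(key)).hexdigest()
            if ident in seen:
                dropped += 1
                continue
            seen.add(ident)
            dest.add(msg)
            kept += 1
        dest.flush()
    finally:
        src.close()
        dest.close()
    return kept, dropped
```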

  17. ex post facto Law by Morosoph · · Score: 2, Interesting

    This is why the House of Lords was resistant to the prosecution of Nazi war criminals for so long, incidentally.

  18. Re:Dave's top ten by Anonymous Coward · · Score: 1, Interesting

    I consolidated all my personal e-mail since 1995 into a Maildir (which I access using IMAP). It totalled only 60 MB, which I don't think is enough to worry about disk space, searching, or my IMAP server being unable to handle it. The way I have it organized, my searches don't touch any of the old mail unless I want them to. The only point I think you were right about is the evidence used against me (in my case, anyhow). It's kinda entertaining to go back and read some of my old correspondence and see how different a person I was back then. It's kinda like looking at old diaries or something.

  19. Backup mail archives along with a Linux Live CD... by PCMeister · · Score: 2, Interesting

    With the advent and subsequent improvements of LiveCD distros, it should be relatively painless for the average /.'er to:

    * Create a multi-session CD/DVD with your favorite Linux LiveCD distro
    (or roll your own and create an ISO for future use)

    and

    * Backup email files to said CD/DVD
    (I suggest a set of good-quality rewritable media to play it safe.)

    Further suggestions:
    1. It would be advisable to split your archives (e.g. Mail2004, etc.), especially if you plan to retain a sizeable amount of mail.
    2. Convert archives from older mail clients before creating backup, or use a newer mail client that can read the old files with ease.
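    Suggestion 1 can be scripted before burning the disc. A rough Python sketch, assuming the old mail is already in mbox: it splits one big mbox into `MailYYYY.mbox` files by the year in each Date header, with undated messages going to `MailUndated.mbox`. The naming scheme and function name are just examples.

```python
import mailbox
import os
from email.utils import parsedate_to_datetime


def split_by_year(src_path: str, out_dir: str) -> dict:
    """Split one mbox into out_dir/MailYYYY.mbox files; return {label: message count}."""
    os.makedirs(out_dir, exist_ok=True)
    boxes = {}
    counts = {}
    src = mailbox.mbox(src_path)
    try:
        for msg in src:
            try:
                label = str(parsedate_to_datetime(msg["Date"]).year)
            except (TypeError, ValueError):
                label = "Undated"
            if label not in boxes:
                boxes[label] = mailbox.mbox(os.path.join(out_dir, f"Mail{label}.mbox"))
            boxes[label].add(msg)
            counts[label] = counts.get(label, 0) + 1
    finally:
        src.close()
        for box in boxes.values():
            box.close()  # close() also flushes pending writes
    return counts
```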

    Good luck!