Slashdot Mirror


Ask Slashdot: Best Way To Archive and Access Ancient Emails?

An anonymous reader writes "I started using email in the early 90s and have lost most of that first decade due to ignorance, botched backups, and so on. But since about 2000, I've got most — if not all — of my email in some form or other. I run Linux, so this has mainly been in a mix of various programs: Kmail, Evolution, Thunderbird. The past 2-3 years are still on the IMAP servers. My problem is that I only rarely NEED to look back to email of 5 years ago. But sometimes it's nice. Or I just want to reminisce about something...or find an old attachment that I was sent. But I do not want to be clogging my current email client of choice with vast backups and even more, I don't know if it will even easily convert. The file structures are different, some are mbox, others maildir, etc., and I would ideally like a way to 1) store and archive these emails, 2) access them, and 3) search by Sender, Subject, Date, Attachments. Is there anything I can do or do I just have to keep legacy applications on hand for this? Should I keep trying to upgrade and pull old files into the new applications? Any help or suggestions about what YOU do would be great."

10 of 282 comments

  1. Just dump them by sk999 · · Score: 5, Interesting

    Had the same need 20 years ago when migrating from VAX/VMS to Unix. The old emails were saved in a not quite readable format, but I figured I could recover them if necessary. In the end, never bothered. Yes, there are a few (actually, only two) that I'd like to resurrect now, but life moves on.

  2. Use a database! by cosm · · Score: 5, Interesting

    I'm a big fan of throwing together a DB when I want to store things categorically like that and want fast searches. If you are up to the task, hunt down some tools/roll your own so that you have a nice relational database and some stored procedures for getting what you want when you need it.

    You could export your emails to some parsable format, write an importer to extract the basics that you want to keep (from/to/subject/body, attachments/entire binary blob/etc.) and then bulk insert that mess into a MySQL/SQL server tucked away somewhere locally or "in the cloud" (EC2, Azure). Just another option, as I'm sure you'll see many here. At least with this route you are in full control of how you index, what you can search, encryption, performance, level of backups, etc. Maybe not the best way for some, but I know if I had over 100,000 emails that I wanted searchable very, very quickly with advanced SQL-like searching, this would be a cool way to do it (time permitting). Good luck! And to the pedantry to ensue...Yes. Good day.
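
    For what it's worth, the export-parse-insert idea above can be sketched in a few lines. This is only an illustration under assumptions: it swaps in SQLite for MySQL so that it runs with nothing but the Python standard library, it assumes the mail has already been exported to mbox, and the table layout and the names import_mbox, _text and _iso_date are made up for this example.

```python
import mailbox
import sqlite3
from email.utils import parsedate_to_datetime

# Hypothetical schema: a handful of searchable columns plus the untouched message.
SCHEMA = """CREATE TABLE IF NOT EXISTS mail (
    id INTEGER PRIMARY KEY,
    sender TEXT, recipient TEXT, subject TEXT,
    sent TEXT,               -- ISO 8601 text, NULL if the Date header is unusable
    has_attachment INTEGER,  -- 1 if any MIME part carries a filename
    raw BLOB                 -- the whole message as stored in the mbox
)"""


def _text(value):
    """Header values may be None or Header objects; store plain strings."""
    return "" if value is None else str(value)


def _iso_date(value):
    """Return the Date header as ISO 8601 text, or None if it cannot be parsed."""
    try:
        return parsedate_to_datetime(_text(value)).isoformat()
    except (TypeError, ValueError):
        return None


def import_mbox(mbox_path, db_path):
    """Copy every message of an mbox file into the mail table; return the count."""
    box = mailbox.mbox(mbox_path, create=False)
    db = sqlite3.connect(db_path)
    count = 0
    with db:  # one transaction for the whole import
        db.execute(SCHEMA)
        for key in box.iterkeys():
            msg = box.get_message(key)
            has_attachment = int(any(p.get_filename() for p in msg.walk()))
            db.execute(
                "INSERT INTO mail (sender, recipient, subject, sent,"
                " has_attachment, raw) VALUES (?, ?, ?, ?, ?, ?)",
                (_text(msg["From"]), _text(msg["To"]), _text(msg["Subject"]),
                 _iso_date(msg["Date"]), has_attachment, box.get_bytes(key)),
            )
            count += 1
    db.close()
    box.close()
    return count
```

    After an import, something like SELECT subject, sent FROM mail WHERE sender LIKE '%alice%' AND has_attachment = 1 covers the sender/date/attachment searches the submitter asked about. Keeping the raw blob means nothing is lost if the parsed columns turn out to be wrong.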

    --
    'We are trying to prove ourselves wrong as quickly as possible, because only in that way can we find progress.' RPF
    1. Re:Use a database! by Anonymous Coward · · Score: 5, Interesting

      And you could make a doily, and a hat, and a casserole, and wallpaper with the headers, and knit the .signatures into a fancy flying cape.

      Just use IMAP and Maildir. Modern systems are fast enough to allow you to search the content directly, and they are not vulnerable to the database support wackiness of this sort of "I can pre-organize it now and make my life better by wasting it pre-programming my queries" approach.
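
      As a rough illustration of searching the content directly, here is a brute-force scan of a Maildir using Python's standard mailbox module. It is a sketch, not a tuned tool: search_maildir and _body_text are names invented here, only text/plain parts are inspected, and subfolders are ignored.

```python
import mailbox


def _body_text(msg):
    """Concatenate the decoded text/plain parts of a message."""
    chunks = []
    for part in msg.walk():
        if part.get_content_type() != "text/plain":
            continue
        payload = part.get_payload(decode=True) or b""
        try:
            chunks.append(payload.decode(part.get_content_charset() or "utf-8", "replace"))
        except LookupError:  # unknown charset label in the message
            chunks.append(payload.decode("latin-1"))
    return "\n".join(chunks)


def search_maildir(path, needle, headers=("From", "Subject")):
    """Yield (key, sender, subject) for messages containing needle, case-insensitively."""
    needle = needle.lower()
    box = mailbox.Maildir(path, create=False)
    for key, msg in box.iteritems():
        haystack = [str(msg.get(h, "")) for h in headers]
        haystack.append(_body_text(msg))
        if any(needle in text.lower() for text in haystack):
            yield key, str(msg.get("From", "")), str(msg.get("Subject", ""))
```

      Every call re-reads every message, so whether this is "fast enough" depends on the size of the archive and the disk it sits on.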

  3. Gmail by lga · · Score: 5, Informative

    Best method of storing and searching old email? Gmail. It can import from POP and IMAP, so you can point it at your other inboxes and let it get on with it. You can also upload from other mail clients to Google's IMAP server. Obviously it's amazing at searching through the archives.
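
    If you'd rather script the upload than drag folders around in a mail client, the same thing can be sketched with Python's imaplib. Assumptions here: the old mail is in an mbox file, the target folder already exists on the server, and conn is an IMAP connection you have already logged in to. The name upload_mbox and the host and folder in the comments are placeholders, not anything Gmail-specific.

```python
import imaplib
import mailbox
import time
from email.utils import parsedate_to_datetime


def upload_mbox(conn, mbox_path, folder="Archive"):
    """Append every message in an mbox file to folder; return how many were sent."""
    box = mailbox.mbox(mbox_path, create=False)
    uploaded = 0
    for key in box.iterkeys():
        msg = box.get_message(key)
        try:
            # Preserve the original date so the archive sorts sensibly.
            when = parsedate_to_datetime(str(msg["Date"])).timestamp()
        except (TypeError, ValueError):
            when = time.time()  # no usable Date header: fall back to "now"
        status, detail = conn.append(
            folder, "", imaplib.Time2Internaldate(when), box.get_bytes(key)
        )
        if status != "OK":
            raise RuntimeError(f"server refused message {key!r}: {detail!r}")
        uploaded += 1
    box.close()
    return uploaded


# Typical use (placeholder host and credentials, not run here):
#   conn = imaplib.IMAP4_SSL("imap.example.com")
#   conn.login("user@example.com", "app-password")
#   upload_mbox(conn, "old-mail.mbox", "Archive")
#   conn.logout()
```

    The function stops at the first message the server rejects rather than silently skipping it, which seems the safer default for an archive.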

    Best method if you're concerned about Gmail's privacy? I'm still working on that one.

  4. The obvious answer by 93+Escort+Wagon · · Score: 5, Funny

    Design a MySQL database for storing your mail messages, keying on sender, subject, date, and presence of attachments (bonus points for storing the attachments as blobs rather than as external files). Then write a perl script that'll automatically parse all your incoming email and convert it to database entries. I suppose if you're lazy the script could just monitor your mail spool, but it'd be better to just have it listen for incoming connections and handle the mail directly.

    Next, make copies of that script, modifying as necessary to process all your old mail archives.

    Oh, and you'll need to write another perl script to access all new mail - not from your mail spool, but from this database. You should probably name this system after some animal too. If you absolutely MUST have a graphical interface on it, don't use anything newer than TCL+Tk - but going with curses would be a better choice.

    Oh - it has to be GPLv3, or we'll hate you and probably mailbomb your machine.

    What - isn't that the Slashdot way?

    --
    #DeleteChrome
  5. Re:IMAP by kwerle · · Score: 5, Insightful

    This.

    I fired up imap servers for all my old mail.
    I fired up a modern mail client (OSX Mail.app) and connected to all of 'em and also to gmail.
    I dragged all my old email into gmail. In a GUI. And it worked.

    Done.

    I no longer run mailservers. Too much of a headache. gmail is awesome (with imap access, even). Indexing, instant searching, etc.

    If you don't want/trust your email to the cloud, then this isn't for you. Unless you want to run your own imap server with whatever backend suits you - then you can dump it all there. I just can't be bothered to manage that after 15+ years of doing so.

  6. Re:Stop being a hoarder by Ardyvee · · Score: 5, Interesting

    It's kind of like photos, you know? Or letters, and such. People like to store those things, because they serve as a memory aid for what the mind no longer holds. It is also quite useful for history reconstruction, or for when you are old and have nothing else to do but go through a box full of photos/letters/etc.

    Not to say that you are wrong on your point, except on the weird part. Unless you are okay with double standards, or you also consider anybody who keeps photos of parties/graduations/etc weird... Just saying.

    --
    I don't care if I'm wrong. I only care about everyone obtaining something from the discussion.
  7. Same rules as any archiving: by jrronimo · · Score: 5, Insightful

    I'd say follow the same rules as any archiving of media:

    Pick one format and migrate all of your messages to that: In this case, I'd say mbox. Thunderbird and most other mail programs read it and you can get most of your mail into mbox format via IMAP/Thunderbird from whatever mail client can read your old ones. You can store your mbox files locally in Thunderbird and gain Thunderbird's searching (for instance) without the need for an actual back-end. I was able to read some mail stored in Netscape Mail because it was just mbox files and opening them in Thunderbird was a breeze.
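
    For mail that is already sitting in Maildir, the "migrate everything to mbox" step doesn't have to go through a mail client. A minimal sketch using Python's standard mailbox module follows; the function name maildir_to_mbox is invented for this example, and it only converts the top-level folder, not Maildir subfolders.

```python
import mailbox


def maildir_to_mbox(maildir_path, mbox_path):
    """Append every message from a Maildir to an mbox file; return the count."""
    src = mailbox.Maildir(maildir_path, create=False)
    dst = mailbox.mbox(mbox_path)  # created if it does not exist yet
    dst.lock()  # keep other programs from writing while we append
    count = 0
    try:
        for msg in src:
            # mboxMessage() carries over the delivery date and read/replied flags.
            dst.add(mailbox.mboxMessage(msg))
            count += 1
    finally:
        dst.close()  # flushes to disk and releases the lock
    return count
```

    Running it once per old Maildir, each time with the same mbox_path, builds up a single archive file that Thunderbird or any other mbox-aware client can open.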

    Most importantly: Every 5-10 years, re-evaluate your storage choice. Is Thunderbird still around? Is mbox still pretty well regarded? If you find you need to migrate again, do it! If both are still active / supported, then hold onto 'em. The only way to perpetually maintain media access is to make sure your choices are still valid on a regular basis. This is true for any media: As the old formats go obsolete (cassette tape, VHS), you need to migrate that data to the next readily accessible format (CDs, DVDs; FLACs, MPEG(?)).

    I think the biggest problem is that you have a mish-mash of stored files right now. You'll save yourself a headache in the future by tearing the band-aid off now and taking the time to get all of your mail into one format. Then, in the future, when you need to convert, it'll be many steps easier since you won't have to visit Slashdot and find out what to do about your mail again next time. :)

  8. Re:IMAP by arth1 · · Score: 5, Informative

    _NO_. Under no circumstances use "mbox" for mail storage, or as anything other than a temporary stage on the way to transferring it to something contemporary and usable such as Maildir. If you lose that one mbox file, by file system corruption or by fat finger accident or overflowing a partition or in the process of merging new email with it, you've lost _all_ your mail in that mbox.

    Thus speaks ignorance. If you write corrupt data to a mbox file, nothing prior to the corruption is affected at all. Unlike most formats that don't store each mail in a separate file, you can also very easily run recovery against a mbox file. Heck, a one-liner perl script can retrieve anything from before and after a corruption.
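
    To make the recovery claim concrete, here is a rough Python equivalent of that kind of salvage pass (not the perl one-liner itself, which isn't given here). It is a heuristic sketch: it splits the file on lines starting with "From " and keeps the chunks that still parse with recognizable headers. The name salvage_mbox is made up for this example.

```python
import re
from email import message_from_bytes

# mbox separator: a line that begins with "From " (note the space, not "From:").
_SEPARATOR = re.compile(rb"^From [^\n]*\n", re.MULTILINE)


def salvage_mbox(path):
    """Return the raw chunks of a damaged mbox that still look like messages."""
    with open(path, "rb") as f:
        data = f.read()
    starts = [m.start() for m in _SEPARATOR.finditer(data)]
    good = []
    for begin, end in zip(starts, starts[1:] + [len(data)]):
        chunk = data[begin:end]
        # Drop the separator line itself, then see whether real headers follow.
        msg = message_from_bytes(chunk.split(b"\n", 1)[1])
        if msg["From"] or msg["Date"] or msg["Message-ID"]:
            good.append(chunk)
    return good
```

    Writing the returned chunks back out with b"".join(...) gives a clean mbox containing everything before and after the damaged region. A message whose body was hit is kept as-is, damage and all.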

    And "overflowing a partition"? Um, run that by us again. If you mean disk full, that doesn't truly affect a format that's made for appending. You won't be able to append. Any other format you can come up with will have the same problem.

    And for archival purposes, this also does not apply. You don't make changes to your archive. Period.
    And you back it up. Period.
    Which is a heck of a lot easier to do with mbox than most other formats.

    But again, the main strength is that it is so simple, which means that pretty much every mail program out there will support it, one way or another.
    Choosing a more modern format leaves you with fewer options, and less certainty that it will be supported in the future. 20 years down the road, mbox will still be supported. It has an RFC - http://tools.ietf.org/html/rfc4155

    Can you say the same about ANY other format? Maildir doesn't work on systems that don't allow colons in file names, and it hashes the filename based on the hostname, which isn't portable and crashes badly in many implementations if you have a non-ASCII hostname. Not to mention that the format has balkanized to the point that it's no longer compatible between implementations.

    Again, for archival purposes, simplicity is the key.

  9. Re:Clueless. Takes 10 minutes start to finish by WillKemp · · Score: 5, Funny

    I never will understand why some people feel the need to post on topics they don't have the slightest clue about.

    Because it's a long standing Slashdot tradition!