Slashdot Mirror


Ask Slashdot: Best Way To Archive and Access Ancient Emails?

An anonymous reader writes "I started using email in the early 90s and have lost most of that first decade due to ignorance, botched backups, and so on. But since about 2000, I've got most — if not all — of my email in some form or other. I run Linux, so this has mainly been in a mix of various programs: Kmail, Evolution, Thunderbird. The past 2-3 years are still on the IMAP servers. My problem is that I only rarely NEED to look back to email of 5 years ago. But sometimes it's nice. Or I just want to reminisce about something...or find an old attachment that I was sent. But I do not want to be clogging my current email client of choice with vast backups and even more, I don't know if it will even easily convert. The file structures are different, some are mbox, others maildir, etc., and I would ideally like a way to 1) store and archive these emails, 2) access them, and 3) search by Sender, Subject, Date, Attachments. Is there anything I can do or do I just have to keep legacy applications on hand for this? Should I keep trying to upgrade and pull old files into the new applications? Any help or suggestions about what YOU do would be great."

17 of 282 comments

  1. IMAP by sylvandb · · Score: 4, Informative

    Just IMAP it all.

    I went IMAP in 1997 and have never looked back.

    I've also used IMAP as a temporary conversion measure for people switching e-mail clients so even if you aren't sure, it makes a good first step.

    I don't understand the concern about too many e-mails. I can access my email back to 1992. With multiple folders it shouldn't be a problem and with modern indexing a search shouldn't be an issue.

    1. Re:IMAP by DNS-and-BIND · · Score: 4, Informative

      When I search for my brother's name I don't want to wait 30 seconds for a search to complete, nor do I want to see his emails from 10 years ago. I just want to see his last email that he sent about the trip we're taking next week. That's the concern about too many emails.

      --
      Shutting down free speech with violence isn't fighting fascism. It IS fascism!
    2. Re:IMAP by kwerle · · Score: 5, Insightful

      This.

      I fired up imap servers for all my old mail.
      I fired up a modern mail client (OSX Mail.app) and connected to all of 'em and also to gmail.
      I dragged all my old email into gmail. In a GUI. And it worked.

      Done.

      I no longer run mailservers. Too much of a headache. gmail is awesome (with imap access, even). Indexing, instant searching, etc.

      If you don't want/trust your email to the cloud, then this isn't for you. Unless you want to run your own imap server with whatever backend suits you - then you can dump it all there. I just can't be bothered to manage that after 15+ years of doing so.
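
      For anyone who would rather script the drag-and-drop step, here is a minimal sketch using Python's standard `mailbox` and `imaplib` modules. The function name `upload_mbox` and the folder name "Archive" are made up for illustration, and `conn` is assumed to be an already logged-in `imaplib.IMAP4_SSL` connection to whatever server you are dumping the mail into.

```python
import email.utils
import imaplib
import mailbox
import time


def upload_mbox(conn, mbox_path, folder="Archive"):
    """Append every message in an mbox file to an IMAP folder.

    conn is assumed to be an authenticated imaplib.IMAP4 / IMAP4_SSL
    connection; "Archive" is a hypothetical folder name.
    Returns the number of messages the server accepted.
    """
    # If the folder already exists the server answers NO, which imaplib
    # returns as a status instead of raising.
    conn.create(folder)
    uploaded = 0
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for msg in box:
            # Keep the original date when the Date header parses,
            # otherwise fall back to the current time.
            parsed = email.utils.parsedate_tz(str(msg.get("Date", "")))
            when = email.utils.mktime_tz(parsed) if parsed else time.time()
            status, _ = conn.append(folder, None,
                                    imaplib.Time2Internaldate(when),
                                    msg.as_bytes())
            if status == "OK":
                uploaded += 1
    finally:
        box.close()
    return uploaded
```

      Something like `upload_mbox(conn, "old-kmail.mbox")` would be run once per old mailbox file. Servers differ in message size limits and rate limiting, so treat this as a starting point rather than a finished migration tool.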

    3. Re:IMAP by Antique+Geekmeister · · Score: 4, Informative

      _NO_. Under no circumstances use "mbox" for mail storage, or anything other than a temporary stage on the way to transferring it to something contemporary and usable such as Maildir. If you lose that one mbox file, by file system corruption or by fat finger accident or overflowing a partition or in the process of merging new email with it, you've lost _all_ your mail in that mbox. And as you read, mark, or save mail, that file is constantly churning, making backup and replication of the mail spool far more dangerous and fragile, especially when the mail directory is bulky with years or decades of active mail threads or simply undeleted email.

      mbox was useful when the available inodes on a file system were limited, programs benefited from using a single inode for transactions, and backups occurred on magtape, but there has been no point to it in decades.

    4. Re:IMAP by arth1 · · Score: 5, Informative

      _NO_. Under no circumstances use "mbox" for mail storage, or anything other than a temporary stage on the way to transferring it to something contemporary and usable such as Maildir. If you lose that one mbox file, by file system corruption or by fat finger accident or overflowing a partition or in the process of merging new email with it, you've lost _all_ your mail in that mbox.

      Thus speaks ignorance. If you write corrupt data to an mbox file, nothing prior to the corruption is affected at all. Unlike most formats that don't store each mail in a separate file, you can also very easily run recovery against an mbox file. Heck, a one-liner perl script can retrieve anything from before and after a corruption.

      And "overflowing a partition"? Um, run that by us again. If you mean disk full, that doesn't truly affect a format that's made for appending. You won't be able to append. Any other format you can come up with will have the same problem.

      And for archival purposes, this also does not apply. You don't make changes to your archive. Period.
      And you back it up. Period.
      Which is a heck of a lot easier to do with mbox than most other formats.

      But again, the main strength is that it is so simple, which means that pretty much every mail program out there will support it, one way or another.
      Choosing a more modern format leaves you with fewer options, and less certainty that it will be supported in the future. 20 years down the road, mbox will still be supported. It has an RFC - http://tools.ietf.org/html/rfc4155

      Can you say the same about ANY other format? Maildir doesn't work on systems that don't allow colons in file names, and it hashes the filename based on the hostname, which isn't portable and crashes badly in many implementations if you have a non-ASCII hostname. Not to mention that the format has balkanized, to the point that it's no longer compatible between implementations.

      Again, for archival purposes, simplicity is the key.
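
      To make the recovery claim concrete, here is a rough sketch in Python rather than perl. It relies only on the classic mbox convention that a line starting with "From " begins a new message. The function name `salvage_mbox` is invented for this example, and the "has a From or Date header" check is just one plausible way to decide that a chunk survived.

```python
import email


def salvage_mbox(raw):
    """Split raw mbox bytes on "From " separator lines and return the
    chunks that still parse as messages with a From or Date header.

    This is a recovery heuristic, not a full mbox parser: an unescaped
    body line starting with "From " would also be treated as a separator.
    """
    chunks, current = [], []
    for line in raw.splitlines(keepends=True):
        if line.startswith(b"From ") and current:
            chunks.append(b"".join(current))
            current = []
        current.append(line)
    if current:
        chunks.append(b"".join(current))

    recovered = []
    for chunk in chunks:
        if chunk.startswith(b"From "):
            # Drop the separator line itself before parsing the message.
            chunk = chunk.partition(b"\n")[2]
        msg = email.message_from_bytes(chunk)
        if msg["From"] or msg["Date"]:
            recovered.append(msg)
    return recovered
```

      Feeding it the bytes of a damaged file (`salvage_mbox(open(path, "rb").read())`) should return the messages on either side of the damaged region, since each message is parsed independently of its neighbours.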

  2. Just dump them by sk999 · · Score: 5, Interesting

    Had the same need 20 years ago when migrating from VAX/VMS to Unix. The old emails were saved in a not quite readable format, but I figured I could recover them if necessary. In the end, never bothered. Yes, there are a few (actually, only two) that I'd like to resurrect now, but life moves on.

  3. Use a database! by cosm · · Score: 5, Interesting

    I'm a big fan of throwing together a DB when I want to store things categorically like that and want fast searches. If you are up to the task, hunt down some tools/roll your own so that you have a nice relational database and some stored procedures for getting what you want when you need it.

    You could export your emails to some parsable format, write an importer to extract the basics that you want to keep (from/to/subject/body, attachments, the entire binary blob, etc.), and then bulk insert that mess into a MySQL or SQL Server instance tucked away somewhere locally or "in the cloud" (EC2, Azure). Just another option, as I'm sure you'll see many here. At least with this route you are in full control of how you index, what you can search, encryption, performance, level of backups, etc. Maybe not the best way for some, but I know if I had over 100,000 emails that I wanted searchable very, very quickly with advanced SQL-like searching, this would be a cool way to do it (time permitting). Good luck! And to the pedantry to ensue...Yes. Good day.
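
    A minimal sketch of that importer idea, with two substitutions worth naming: it uses SQLite from the Python standard library instead of MySQL or SQL Server, and it reads from an mbox file. The table layout, column names, and the functions `index_mbox` and `search` are all made up for illustration, not a recommended schema.

```python
import email.utils
import mailbox
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS messages (
    id INTEGER PRIMARY KEY,
    sender TEXT,
    subject TEXT,
    sent_at INTEGER,          -- Unix timestamp, NULL if Date is unusable
    has_attachment INTEGER,   -- 1 if any MIME part carries a filename
    raw BLOB                  -- the untouched message, so nothing is lost
);
CREATE INDEX IF NOT EXISTS idx_sender ON messages(sender);
CREATE INDEX IF NOT EXISTS idx_sent_at ON messages(sent_at);
"""


def index_mbox(db, mbox_path):
    """Load every message from an mbox file into the messages table."""
    db.executescript(SCHEMA)
    rows = []
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for msg in box:
            parsed = email.utils.parsedate_tz(str(msg.get("Date", "")))
            sent_at = email.utils.mktime_tz(parsed) if parsed else None
            has_att = any(part.get_filename() for part in msg.walk())
            rows.append((str(msg.get("From", "")),
                         str(msg.get("Subject", "")),
                         sent_at, int(has_att), msg.as_bytes()))
    finally:
        box.close()
    with db:  # one transaction for the whole bulk insert
        db.executemany(
            "INSERT INTO messages"
            " (sender, subject, sent_at, has_attachment, raw)"
            " VALUES (?, ?, ?, ?, ?)", rows)
    return len(rows)


def search(db, sender=None, subject=None, attachments_only=False):
    """Return (sender, subject, sent_at) rows matching the filters."""
    sql = "SELECT sender, subject, sent_at FROM messages WHERE 1=1"
    args = []
    if sender:
        sql += " AND sender LIKE ?"
        args.append(f"%{sender}%")
    if subject:
        sql += " AND subject LIKE ?"
        args.append(f"%{subject}%")
    if attachments_only:
        sql += " AND has_attachment = 1"
    return db.execute(sql + " ORDER BY sent_at", args).fetchall()
```

    Keeping the raw message as a blob next to the extracted columns means the index can be rebuilt or extended later without going back to the original files. Body search would need something more, such as a full-text index, which this sketch leaves out.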

    --
    'We are trying to prove ourselves wrong as quickly as possible, because only in that way can we find progress.' RPF
    1. Re:Use a database! by Anonymous Coward · · Score: 5, Interesting

      And you could make a doily, and a hat, and a casserole, and wallpaper with the headers, and knit the .signatures into a fancy flying cape.

      Just use IMAP and Maildir. Modern systems are fast enough to let you search the content directly, and this is not vulnerable to the database support wackiness of this sort of "I can pre-organize it now and make my life better by wasting it pre-programming my queries" approach.

  4. Gmail by lga · · Score: 5, Informative

    Best method of storing and searching old email? Gmail. It can import from POP and IMAP, so you can point it at your other inboxes and let it get on with it. You can upload from other mail clients to Google's IMAP server. Obviously it's amazing at searching through the archives.

    Best method if you're concerned about Gmail's privacy? I'm still working on that one.

    1. Re:Gmail by zekele2 · · Score: 4, Informative

      Best method of storing and searching old email? Gmail. It can import from POP and IMAP, so you can point it at your other inboxes and let it get on with it. You can upload from other mail clients to Google's IMAP server. Obviously it's amazing at searching through the archives.

      Best method if you're concerned about Gmail's privacy? I'm still working on that one.

      The solution is Google Apps for your own domain. $5 a month per user, 25 GB of space, IMAP, no advertising (which is where most of the privacy issues arise), and most importantly, no lock-in, as you can switch your email to a different provider at any time without changing email address. As you said, Gmail is by far the best for searching old email. I haven't run an email server for years.

  5. The obvious answer by 93+Escort+Wagon · · Score: 5, Funny

    Design a MySQL database for storing your mail messages, keying on sender, subject, date, and presence of attachments (bonus points for storing the attachments as blobs rather than as external files). Then write a perl script that'll automatically parse all your incoming email and convert it to database entries. I suppose if you're lazy the script could just monitor your mail spool, but it'd be better to just have it listen for incoming connections and handle the mail directly.

    Next, make copies of that script, modifying as necessary to process all your old mail archives.

    Oh, and you'll need to write another perl script to access all new mail - not from your mail spool, but from this database. You should probably name this system after some animal too. If you absolutely MUST have a graphical interface on it, don't use anything newer than TCL+Tk - but going with curses would be a better choice.

    Oh - it has to be GPLv3, or we'll hate you and probably mailbomb your machine.

    What - isn't that the Slashdot way?

    --
    #DeleteChrome
    1. Re:The obvious answer by the+eric+conspiracy · · Score: 4, Funny

      Holy wheel reinvention, Batman.

  6. Stop being a hoarder by realmolo · · Score: 4, Insightful

    You don't need all those e-mails. Keep the few you actually care about (copy and paste the text into a regular file, and save any attachments you want), and get on with your life.

    People that keep every e-mail are weird. Quit living in the past.

    1. Re:Stop being a hoarder by Ardyvee · · Score: 5, Interesting

      It's kind of like photos, you know? Or letters, and such. People like to store those things, because they serve as a memory aid for what the mind no longer holds. It is also quite useful for history reconstruction/when you are old and have nothing else to do but a box full of photos/letters/etc.

      Not to say that you are wrong on your point, except on the weird part. Unless you are okay with double standards, or you also consider anybody who keeps photos of parties/graduations/etc weird... Just saying.

      --
      I don't care if I'm wrong. I only care about everyone obtaining something from the discussion.
    2. Re:Stop being a hoarder by icebraining · · Score: 4, Interesting

      But why would I waste time manually finding and copying individual emails, when I can just let the backup script archive them all for virtually no cost?

  7. Same rules as any archiving: by jrronimo · · Score: 5, Insightful

    I'd say follow the same rules as any archiving of media:

    Pick one format and migrate all of your messages to that: In this case, I'd say mbox. Thunderbird and most other mail programs read it and you can get most of your mail into mbox format via IMAP/Thunderbird from whatever mail client can read your old ones. You can store your mbox files locally in Thunderbird and gain Thunderbird's searching (for instance) without the need for an actual back-end. I was able to read some mail stored in Netscape Mail because it was just mbox files and opening them in Thunderbird was a breeze.

    Most importantly: Every 5-10 years, re-evaluate your storage choice. Is Thunderbird still around? Is mbox still pretty well regarded? If you find you need to migrate again, do it! If both are still active / supported, then hold onto 'em. The only way to perpetually maintain media access is to make sure your choices are still valid on a regular basis. This is true for any media: As the old formats go obsolete (cassette tape, VHS), you need to migrate that data to the next readily accessible format (CDs, DVDs; FLACs, MPEG(?)).

    I think the biggest problem is that you have a mish-mash of stored files right now. You'll save yourself a headache in the future by tearing the band-aid off now and taking the time to get all of your mail into one format. Then, in the future, when you need to convert, it'll be many steps easier since you won't have to visit Slashdot and find out what to do about your mail again next time. :)
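
    As a rough illustration of the "get everything into one format" step, here is a sketch that copies a Maildir folder into a single mbox file with Python's standard `mailbox` module. The function name `maildir_to_mbox` and both paths are placeholders, and it only handles the top-level folder, not Maildir subfolders.

```python
import mailbox


def maildir_to_mbox(maildir_path, mbox_path):
    """Copy every message from a Maildir folder into one mbox file.

    Both paths are placeholders. Subfolders of the Maildir are not
    visited. Returns the number of messages copied.
    """
    source = mailbox.Maildir(maildir_path, create=False)
    target = mailbox.mbox(mbox_path)
    copied = 0
    target.lock()  # guard against another program writing the same file
    try:
        for msg in source:
            # Converting to mboxMessage carries Maildir flags such as
            # "seen" and "replied" over to the Status/X-Status headers.
            target.add(mailbox.mboxMessage(msg))
            copied += 1
        target.flush()
    finally:
        target.unlock()
        target.close()
        source.close()
    return copied
```

    Running it once per old Maildir, each with its own output file, leaves a set of mbox files that Thunderbird and most other clients can open. The same module also reads MH and Babyl mailboxes, so the loop could be adapted for those.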

  8. Re:Clueless. Takes 10 minutes start to finish by WillKemp · · Score: 5, Funny

    I never will understand why some people feel the need to post on topics they don't have the slightest clue about.

    Because it's a long standing Slashdot tradition!