Slashdot Mirror


Ask Slashdot: Best (or Better) Ways To Archive Email?

An anonymous reader writes: I've been using email since the early '90s and have probably half a million emails in various places and accounts. Some of them are currently in .tar files, others in the original folders from obsolete or I-don't-use-them-anymore mail clients. Some IMAP, some POP3. You get the picture. I don't often need to access emails older than a year or two, but when I do, I have found that my only hope for the truly archived ones is to guess what Grep combo might find the right text in the file ... and then pick through the often unformatted, unwrapped, super ugly text until I find the email address or info that I'm searching for. Because of this, I tend to at-all-costs leave emails on servers or at least in the clients so that I can more easily search and find.

My question is whether there's any way to safely store them so that I can actually use them later, offline, with easy date searches, email address searches, and so on. Thunderbird for example has 'Archive' as an option, but if I migrate to a different client I assume that won't work anymore. So what ways do people archive emails effectively? Or is this totally a lost cause and I should keep limping along with grep?
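
As one illustration of what "easy date searches, email address searches" could look like without any mail client, here is a minimal sketch that indexes an mbox file into SQLite using only the Python standard library. The function names (`build_index`, `search`), the table layout, and the assumption that the old mail is in mbox format are all illustrative; other archive formats would need converting first.

```python
import mailbox
import sqlite3
from email.utils import getaddresses, parsedate_to_datetime


def _addresses(msg, *headers):
    """Collect lower-cased addresses from the named headers as one string."""
    values = [str(v) for h in headers for v in msg.get_all(h, [])]
    return " ".join(addr.lower() for _, addr in getaddresses(values) if addr)


def build_index(mbox_path, db_path=":memory:"):
    """Index date, sender, recipients and subject of each message in an mbox."""
    db = sqlite3.connect(db_path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS mail (msg_key INTEGER, sent TEXT, "
        "sender TEXT, recipients TEXT, subject TEXT)"
    )
    box = mailbox.mbox(mbox_path, create=False)
    for key, msg in box.iteritems():
        try:
            # ISO dates (YYYY-MM-DD) sort and compare correctly as plain text.
            sent = parsedate_to_datetime(str(msg.get("Date", ""))).date().isoformat()
        except (TypeError, ValueError):
            sent = None  # missing or unparseable Date header
        db.execute(
            "INSERT INTO mail VALUES (?, ?, ?, ?, ?)",
            (key, sent, _addresses(msg, "From"),
             _addresses(msg, "To", "Cc"), str(msg.get("Subject", ""))),
        )
    box.close()
    db.commit()
    return db


def search(db, address=None, since=None, until=None):
    """Return (msg_key, sent, sender, subject) rows matching the filters."""
    clauses, params = [], []
    if address:
        clauses.append("(sender LIKE ? OR recipients LIKE ?)")
        params += [f"%{address.lower()}%"] * 2
    if since:
        clauses.append("sent >= ?")
        params.append(since)
    if until:
        clauses.append("sent <= ?")
        params.append(until)
    where = " WHERE " + " AND ".join(clauses) if clauses else ""
    sql = "SELECT msg_key, sent, sender, subject FROM mail" + where + " ORDER BY sent"
    return db.execute(sql, params).fetchall()
```

Something like `search(db, address="bob@example.com", since="2009-01-01", until="2009-12-31")` would then list matching messages, and the stored key can be used to pull the full message back out of the mbox. Re-running `build_index` against an on-disk database would insert duplicate rows, so a real tool would need to handle that.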

177 comments

  1. MailStore Home is the Answer by Anonymous Coward · · Score: 2, Informative

    MailStore Home is the de facto best free method I've found: http://www.mailstore.com/en/mailstore-home-email-archiving.aspx

    1. Re:MailStore Home is the Answer by Daniel_Staal · · Score: 2

      Sounds like it might be good, if you run Windows. Another option is just to set up a home IMAP server that you can dump into - Dovecot handles large volumes of mail quite effectively, for instance. The mails would get stored in Maildir folders, so you can migrate or hand search if you need to as well.
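
      A rough sketch of what "hand search" over such a Maildir might look like, using Python's standard `mailbox` module. The function name `find_from` is made up for illustration, and the code assumes the Maildir++ layout (subfolders stored as dot-prefixed directories) that Dovecot uses by default.

```python
import mailbox
from email.utils import parseaddr


def find_from(maildir_path, address):
    """Yield (folder, subject) for every message sent from the given address."""
    root = mailbox.Maildir(maildir_path, factory=None, create=False)
    # The top-level Maildir is the inbox; subfolders are listed separately.
    folders = [("INBOX", root)]
    folders += [(name, root.get_folder(name)) for name in root.list_folders()]
    wanted = address.lower()
    for name, folder in folders:
        for msg in folder:
            _, addr = parseaddr(str(msg.get("From", "")))
            if addr.lower() == wanted:
                yield name, str(msg.get("Subject", ""))
```

      Because each message is its own file, the same directory also stays searchable with plain grep.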

      The only downside is finding an IMAP client that will let you work with it without trying to make a local copy the moment you connect. (Mulberry is good, but hasn't been updated in ages. Or you can set up a webmail client on the 'server' box.)

      --
      'Sensible' is a curse word.
    2. Re:MailStore Home is the Answer by MightyYar · · Score: 1

      Yes - almost any email client (from the last, what, 20 years?) can handle IMAP. Fire up the old mail clients that produced each archive format and drag everything over to the IMAP server. You could even drag it all over to Gmail.
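
      For archives where the original client is no longer around to do the dragging, the same transfer can be scripted. This is only a sketch, assuming the old mail is in mbox format; the function name `upload_mbox`, the folder name "Archive", and the choice to mark uploaded mail as read are all assumptions, not anything from the thread.

```python
import email
import imaplib
import mailbox
from email.utils import parsedate_to_datetime


def upload_mbox(imap: imaplib.IMAP4, mbox_path: str, folder: str = "Archive") -> int:
    """Append every message in an mbox file to a folder on an IMAP server."""
    imap.create(folder)  # the server answers NO if the folder already exists
    box = mailbox.mbox(mbox_path, create=False)
    uploaded = 0
    try:
        for key in box.iterkeys():
            raw = box.get_bytes(key)  # message bytes without the mbox "From " line
            date_header = email.message_from_bytes(raw).get("Date", "")
            try:
                # Keep the original date so the server can sort by it.
                when = parsedate_to_datetime(str(date_header)).timestamp()
            except (TypeError, ValueError, OverflowError, OSError):
                when = None  # let the server use the upload time instead
            typ, _ = imap.append(folder, r"(\Seen)", when, raw)
            if typ == "OK":
                uploaded += 1
    finally:
        box.close()
    return uploaded


# Usage (hypothetical host and credentials):
#   imap = imaplib.IMAP4_SSL("mail.example.com")
#   imap.login("me", "secret")
#   print(upload_mbox(imap, "old-client.mbox"))
```

      Passing the connection in as a parameter keeps the function usable against a home Dovecot box or a hosted account alike.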

      --
      W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
    3. Re:MailStore Home is the Answer by Zak3056 · · Score: 1

      +1 for Mailstore, though we use the enterprise version and not the personal version. We had A LOT of resistance when we first deployed, but we managed to get all email into a single repository and get rid of all the damned PST files people had accumulated over the years. Resistance faded after a couple of weeks, and people are generally happy with it now.

      --
      What part of "shall not be infringed" is so hard to understand?
    4. Re:MailStore Home is the Answer by Jack+Griffin · · Score: 0

      Microsoft solved this problem 15 years ago (Outlook and PSTs). But this is Slashdot and we hate Microsoft, so must try to find an open source solution for the same problem, even if it's clunky as fuck and hard to understand.

    5. Re:MailStore Home is the Answer by Daniel_Staal · · Score: 2

      PSTs have a history of getting corrupted and having you lose everything in them - and also have some issues with going to large numbers of files per PST. But it's a solution.

      However, it's more complicated than dumping into an IMAP folder for the original requester (as everything would have to be imported into Outlook), and it costs more.

      But this isn't particularly clunky or hard to understand - set up an IMAP mail server (like any other, using common and well-documented tools) and transfer the mail to it. (Using the tools of whatever mail service they are in at the moment.) Done. Now you can access it with just about any email program out there - including Outlook, if you so desire.

      --
      'Sensible' is a curse word.
    6. Re:MailStore Home is the Answer by bigjosh · · Score: 1

      Agreed. I did a lot of work to export my multi-multi gigabyte email archive into a massive tree of RFC822 EML files and then come up with ways to sort and search them- but in the end I almost always just fire up MailStore Home to find an old email. I like having all the original EML files around for the very rare case when I need to do something special like searching for a string inside a header, but for normal stuff MailStore is great.
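
      The "string inside a header" case can be sketched in a few lines of Python over a tree of EML files. This is a hedged illustration rather than the poster's actual tooling: the function name `grep_header` and the `.eml` extension are assumptions.

```python
from email import policy
from email.parser import BytesParser
from pathlib import Path


def grep_header(root, header, needle):
    """Yield (path, value) for each .eml under root whose header contains needle."""
    needle = needle.lower()
    parser = BytesParser(policy=policy.default)
    for path in sorted(Path(root).rglob("*.eml")):
        with open(path, "rb") as fh:
            # headersonly=True skips parsing bodies and attachments.
            msg = parser.parse(fh, headersonly=True)
        for value in msg.get_all(header, []):
            # policy.default decodes RFC 2047 encoded words before comparing.
            if needle in str(value).lower():
                yield path, str(value)
```

      For example, `list(grep_header("mail-archive", "X-Mailer", "eudora"))` would list every message written with a particular client, assuming the archive lives in a directory called `mail-archive`.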

    7. Re:MailStore Home is the Answer by aaronb1138 · · Score: 1

      I was going to suggest running a pirate copy of Exchange in a VM so he could get the online functionality.

      As for the remarks about the PST corrupting... rarely happens in Outlook 2010+, never if the PST is static / not connected to Exchange (OSTs have issues on Outlook 2013 with repeated network adapter handoffs between wired and wifi). He needs one golden backup of the PST and it will be solid.

      Frankly, the most robust, mobile, inexpensive, and secure solution is an Outlook.com account used as an archive + index. Free, minimal data mining compared to GMail, plenty of space, accessible from browser or any client. If he wants absolutely zero data mining, go paid with O365 and add the legal/compliance archive to have a second paid backup.

    8. Re:MailStore Home is the Answer by Anonymous Coward · · Score: 0

      I need to talk to those folks - it is skipping/ignoring some messages on a brand new archive and I don't know why...until I have it figured, I wouldn't recommend it....archive should be everything - it isn't like I keep duplicate messages!

    9. Re:MailStore Home is the Answer by davester666 · · Score: 2

      FOIA requests to the NSA to access them. You don't need to do anything to archive, and storage is free. Only have to pay for access.

      --
      Sleep your way to a whiter smile...date a dentist!
    10. Re:MailStore Home is the Answer by Jack+Griffin · · Score: 1

      PSTs have a history of getting corrupted and having you lose everything in them

      Citation? Sure I've seen plenty of corruptions, from the file not being dismounted correctly, but Outlook has a built-in PST repair tool which fixes this effortlessly.
      Also if your data is precious, then keep a back up. PSTs are an archive so shouldn't change. Keeping two copies of each is trivial.

    11. Re:MailStore Home is the Answer by nv2r · · Score: 1

      +1 to e-mail server. My father receives scans of documents every day, and since his client is Thunderbird, from time to time the local mailbox hit its size limit (a 2 to 4 GB mailbox file was too small). I set up HMailServer (free, Windows-based) with remote IMAP for him. His Thunderbird deletes all local copies older than two months, and the IMAP server is used to access the archives (the IMAP accounts in Thunderbird are set not to download and store local e-mail). For more "cloud-like" solutions, a web-based mail client can be used.

    12. Re:MailStore Home is the Answer by Anonymous Coward · · Score: 0

      True, but let me say that in Outlook 365 the AutoArchive function is gone. No comment on this.

    13. Re:MailStore Home is the Answer by Joce640k · · Score: 1

      Microsoft solved this problem 15 years ago (Outlook and PSTs).

      Huh?

      I just had to migrate a bunch of Outlook mail for people who were moving from XP to Windows 10. "Solved" isn't a word I'd use to describe the convoluted process needed to do it.

      --
      No sig today...
    14. Re:MailStore Home is the Answer by Jack+Griffin · · Score: 1

      You're not clear on your problem, are you just upgrading from XP to Win10, or also upgrading Outlook/Exchange?
      It's just a guess, but I suspect your problems are less related to accessing a PST and more to do with all the other stuff that comes with OS/App version migrations.

    15. Re:MailStore Home is the Answer by Thumper_SVX · · Score: 1

      Functionally there's not a lot of need, though the database search features of Exchange are kind of nice.

      Myself, I actually use Zimbra which is open source and free for personal use. I have that in a VM on my home server and connect using IMAP and when on the road I can still access it via the web. It uses Postfix for email on the back end with a MySQL database that contains all the mail metadata. Yeah, Zimbra uses Java heavily which kind of sucks but it's really not too bad. As of today I have email going back to 1998 or so in my Zimbra archive and the entire VM eats up ~20GB of hard drive space give or take. It's also still a live email server so I receive new mail all the time. For a client I can use any IMAP client I like or hit up the web interface. I can even use Z-Push (also open-source) to connect it up to my phone using ActiveSync if I choose, though in truth I managed to break that about a year ago and just haven't gotten around to fixing it yet.

      I don't publish any IMAP ports on the public Internet BTW... there are in fact zero open ports on that system... technically. I have an OpenVPN network set up among my personal computers, and if I'm not on a personal computer I have my own web server on a Linode that uses NGINX to proxy back to my Zimbra server across that tunnel. The SMTP data coming in hits a Mailcleaner VM that I have set up as well that does all my mail filtering before being passed to Zimbra.

      Is this all overkill? Oh hell yes; but I do this sort of stuff for fun anyway and doing all of this taught me a bunch of skills. Bonus; on another VM I also host OwnCloud that's proxied through NGINX in the same way as Zimbra is so I have ~300GB of data I can selectively sync to my various computers... and I host email and OwnCloud for my son and girlfriend as well. Backups are all done through s3cmd to push the data to S3 where my S3 account is set to archive to Glacier after 24 hours. I think my entire backup of critical data costs me maybe $10 a month... then I pay $20 for my Linode. But all of that is sunk cost anyway because I'm using it for other stuff anyway. Plus, there's the learning aspect which is really valuable to me.

    16. Re: MailStore Home is the Answer by Anonymous Coward · · Score: 0

      A simple mbox or Maildir format would have had zero issues when upgrading the client. PST is horrible.

    17. Re: MailStore Home is the Answer by Anonymous Coward · · Score: 0

      Why not something like Postfix, Cyrus-IMAP and Roundcube? Kolab packages all that up quite nicely and gives you filtering, virus scanning and spam protection as well - all totally free and open source.

  2. hoarding mentality by Khashishi · · Score: 1, Insightful

    Sure, they might be useful at some point, but do you really need your emails from 20 years ago? Life is temporary. All things decay. Attachment causes suffering.

    1. Re:hoarding mentality by slazzy · · Score: 2

      That's true, but from the sounds of it this is for business reasons. For business it's probably more important than if it was personal.

      --
      Website Just Down For Me? Find out
    2. Re:hoarding mentality by Anonymous Coward · · Score: 0

      I thought hate leads to suffering?

    3. Re:hoarding mentality by Anonymous Coward · · Score: 2, Funny

      In that case, everyone at the company should print out each email they receive.

    4. Re:hoarding mentality by Anonymous Coward · · Score: 0

      I thought hate leads to suffering?

      I believe he's referring to Buddhism, not Star Wars.

    5. Re:hoarding mentality by sinij · · Score: 1, Funny

      I always print out my TPS reports that Bob sends out.

    6. Re:hoarding mentality by Anonymous Coward · · Score: 0

      I use this method.

      Done/notdone /keep

      In the keep folder is maybe 20-30 emails over 20 years. I get probably 200 a day. I also regularly purge the keep folder.

    7. Re:hoarding mentality by lord_mike · · Score: 4, Informative

      Holding your business emails too long is a liability risk... they are subject to discovery in the case of a lawsuit. Most businesses have a limited email retention policy for that very reason.

    8. Re:hoarding mentality by Anonymous Coward · · Score: 0

      Hate, attachment, Star Wars, Buddhism, a yoda-doll enema; what's the difference?

    9. Re:hoarding mentality by DNS-and-BIND · · Score: 4, Insightful

      I friggin' hate people who, on an Ask Slashdot, completely fail to answer the question and say something that has nothing to do with the topic at hand.

      And yes, I am aware of the irony of posting a comment like this to criticize one, so you needn't bother pointing that out.

      --
      Shutting down free speech with violence isn't fighting fascism. It IS fascism!
    10. Re:hoarding mentality by the_webmaestro · · Score: 1

      Which Bob?

    11. Re:hoarding mentality by Anonymous Coward · · Score: 1

      You don't need them until you need them. I've had more than one occasion where my finding an email in a haystack saved my bacon.

      A better associated question is why vastly more powerful search capabilities aren't built in to pretty near all email programs and services. It seems it is immeasurably harder to do good searching now than it was 30 years ago on unix boxes. Yes, the volume of storage has grown, but so has the memory and the cpu power. Everything is now "in the cloud" and almost nothing can be found except, possibly, by google and the NSA. Sad state of affairs.

    12. Re:hoarding mentality by Frobnicator · · Score: 2

      That's true, but from the sounds of it this is for business reasons. For business it's probably more important than if it was personal.

      For business it can be even more important to clean things out. Having old things on hand is more likely to work against you than work in your favor. Yes, some documents need to be carefully retained and kept on file for the life of the business and the best place to do that is not in email. Most of these communications should be disposed of on a regular basis.

      Most business lawyers I've worked with have strongly recommended a data retention policy to dump email regularly and always before the 3-month government communications free-for-all. Most work places I've been at have had 3 months before automatic forced deletion of email. If it is important it does not belong in email. Unread email is treated differently under the law, and currently any email that is six months old or older and marked as unread can be opened and read by federal agencies without a warrant. Similarly, transitory communications like chat logs and even file transfers through services like DropBox are easily accessed by government's prying eyes. Don't keep data there because lots of organizations, including government agencies, corporate spies, and opposition lawyers, can all get access to it.

      If it is important it gets printed and filed, or moved to electronic documents that are properly archived, or otherwise moved to a better location than email. Paper files and electronic archives get properly maintained with their own data retention policies. Contracts and agreements made get filed with dates.

      There is no good reason to keep 25 years of email.

      Print out and properly file what is important. Agreements and important documents get filed. Properly file and archive personal mementos (not in email) or put them in a scrap book.

      --
      //TODO: Think of witty sig statement
    13. Re:hoarding mentality by Anonymous Coward · · Score: 0

      Just ask Hillary

    14. Re:hoarding mentality by MightyYar · · Score: 1, Insightful

      Email is the new "box of letters". It can be fun and sentimental to go through old correspondence. When you die, your kids will have fun reading your old emails if they can figure out your devious passwords.

      --
      W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
    15. Re:hoarding mentality by vux984 · · Score: 5, Insightful

      Holding your business emails too long is a liability risk..

      I was just asked to recover email from the late 90s as part of a means to prove we had prior art on a patent that was being asserted against us. The email history included draft drawings, work orders to a manufacturer requesting customizations to our manufacturing equipment, invoices and negotiations with customers to work with it. etc. All with a clearly documented timeline that could be verified with multiple 3rd parties if it came to a court situation.

      This sword clearly cuts both ways.

    16. Re:hoarding mentality by jonnyj · · Score: 5, Insightful

      There is no good reason to keep 25 years of email.

      There is no good reason to assume that your needs are the same as those of others.

    17. Re:hoarding mentality by swb · · Score: 5, Interesting

      I had a client who insisted he needed to keep every email forever. I thought he was full of shit until he explained to me why.

      He works as a vendor rep, helping them sell shit to a well-known Fortune 50 retailer.

      As it turns out, this Fortune 50 company periodically audits years old (like sometimes 5+ years) invoices and receiving information and arbitrarily decides "we just realized that shipment you sent us in 2009 was short, but we paid the invoice in full. So we're going to subtract the overpayment -- plus interest -- from the current amount we owe you."

      Part of this guy's job was the ability to get the shipping/receiving info as it happens, and the old email lets him present info that basically says "you said it was a complete shipment in 2009, so no deductions".

      What I found kind of amazing was that somehow this retroactive auditing is considered acceptable. My guess is vendors are just expected to eat it or not get their product on the shelves.

    18. Re:hoarding mentality by malditaenvidia · · Score: 2

      Bob from accounting.

    19. Re:hoarding mentality by MobyDisk · · Score: 1

      This is common practice, but hiding evidence of past crimes is a scary reason to delete old emails.

    20. Re:hoarding mentality by Anonymous Coward · · Score: 1, Informative

      This sword clearly cuts both ways.

      And that is why sane companies come up with sane records retention schedules. Minutes from the fun committee meeting? Yeah, toss 'em after 5 years. Design and patent work? 25 years, plus additional terms if required.

      Some of the things we have at the major Canadian bank I work at have life times ranging from 5 years to 20 years, to permanent. It's a PITA to setup because people hate changing their routines, but it absolutely comes in handy in exactly those types of cases.

    21. Re:hoarding mentality by ole_timer · · Score: 1

      we live in a litigious society. keep all emails. never say anything you don't want repeated. we'll always have criminals.

      --
      nothing to see here - move along
    22. Re:hoarding mentality by ole_timer · · Score: 0

      if you commit crimes there's no excuse. most criminals are like dogs, they return to their vomit.

      --
      nothing to see here - move along
    23. Re:hoarding mentality by Anonymous Coward · · Score: 0

      Holding your business emails too long is a liability risk..

      I was just asked to recover email from the late 90s as part of a means to prove we had prior art on a patent that was being asserted against us. The email history included draft drawings, work orders to a manufacturer requesting customizations to our manufacturing equipment, invoices and negotiations with customers to work with it. etc. All with a clearly documented timeline that could be verified with multiple 3rd parties if it came to a court situation.

      This sword clearly cuts both ways.

      That's why I use my "home server" for email. If it's good enough for high ranking politicians to send top secret emails then it's good enough for me. I also archive it to my Lois Lerner hard drive. That's the best of both worlds because it's available if you need it, yet suddenly disappears in an "unrecoverable" way if subpoena'd.

    24. Re: hoarding mentality by Anonymous Coward · · Score: 0

      Email retention rules aren't normally to hide evidence of crimes. In the banking industry they are to protect the bank from things that were common and legal when being discussed but at some point in the future the government's position on it changes and they get retroactively charged with a crime for just doing business 10 years before.

    25. Re:hoarding mentality by jon3k · · Score: 1

      I use archiving in gmail. Out of sight, but I can dig them up if I ever need them. Best of both worlds.

    26. Re:hoarding mentality by Krishnoid · · Score: 1

      My guess is vendors are just expected to eat it or not get their product on the shelves.

      I guess they don't have enough clout to write a dispute length (e.g., up to 1 year) into their contract with the retailer. Although I suppose if they have to fall back to the contract in the event of a dispute, that retailer may not use them much longer.

    27. Re:hoarding mentality by TheCarp · · Score: 1

      I don't know that this is really a case for storing email forever. Yes that is true, but it also means that decades of email are available for searching and can be required to be searched or given up.

      The reality is, design docs should be saved. These sorts of notes and work should be saved. Retaining emails may provide a solution, but that doesn't mean it is a good solution or the right one. I would submit the real issue here is that nobody saved the documents; but instead relied on email to save it for them.

      Honestly, I think sometimes it's better to fail, because crisis precipitates change. All success in a case like this does is reinforce that lazily leaving important docs in email is OK.

      --
      "I opened my eyes, and everything went dark again"
    28. Re:hoarding mentality by Anonymous Coward · · Score: 0

      bullshit.

    29. Re:hoarding mentality by cyberchondriac · · Score: 1

      Surely your company would have other evidence than emails to support your prior art? Didn't your company apply for a patent? If so but you got denied, you have no case anyway. If you got approved, then the patent is your evidence. If you never applied, well.. I dunno if the patent office retains those records, probably not. What about the actual drafts and invoices themselves? Are they stored on a data server, or printed out and kept on file? I would never use an email server as a data repository, though it's convenient to have the date stamps and all.

      --

      Look back up at my post, now look back down, you're on the Internet. Now look back up. I'm a signature.
    30. Re:hoarding mentality by Anonymous Coward · · Score: 0

      Much colon cleansing I sense in you.

    31. Re:hoarding mentality by Anonymous Coward · · Score: 0

      So where are your instructions on "properly filing"? What is "properly filing"? Maybe having something with a subject, a date stamp, a header saying who it is from and who it is to, and then what it contains? I guess someone could make millions if they came up with a piece of software that could do all of this for a person.

      Another nice option to add to the above requirements would be a way to search all of these items that are "properly filed" and maybe a way to sort them.

      You really have me thinking

    32. Re:hoarding mentality by l3v1 · · Score: 1

      "There is no good reason to keep 25 years of email. "

      Of course there is. You should know, some people actually communicate about important matters, and keeping records can oftentimes be very beneficial.

      My moving window for keeping all e-mails from all e-mail accounts is 10 years, however, I also have some mails dating back as far as '95. Thing is, you can never know when and what you could need, and given the almost 0 long term cost of storing those e-mails, it's better to have them than not to have them.

      --
      I am putting myself to the fullest possible use, which is all I can think that any conscious entity can ever hope to do.
    33. Re:hoarding mentality by Anonymous Coward · · Score: 0

      Was it EMC? I bet it was EMC.

    34. Re:hoarding mentality by Anonymous Coward · · Score: 0

      I dunno but I can tell you our practices and make it sound authoritative - it's the Slashdot way! (KGIII - not logging in.)

      Regardless of your needs, you don't store it in email. You export it to the appropriate storage area so that it can be indexed, searched, and backed up with the appropriate, timely, backup solutions that should be set and configured for each type of storage with varied degrees of redundancy. You don't store it in your email client and on your email server. That's a horrible idea.

      There. That's my authoritative statement on the fact and it's right and I will accept no argument - even if you're right. And you're not! Hrumf! (That was our policy. So it must work for everyone.)

    35. Re:hoarding mentality by Anonymous Coward · · Score: 0

      This is still KGIII. We used the mighty CTRL + S and had our own search function until we bought a search appliance (which was not as good as advertised). You can save in eml, txt, or HTML format with most clients that I'm familiar with. You can even SHIFT + LEFT CLICK and select multiples and save them all at the same time. They even have a nice title and a date for when you saved them. Any decent search application can index them. Then have a sane automated deletion process with a short-term backup in place. We were doing this in the 1990s, very early 90s, and I assume that it's just as easy today.

      What's wrong with you kids? :D (I seriously need a nap.)

    36. Re:hoarding mentality by swb · · Score: 1

      I think that was the risk.

      From what I could tell, the products they repped were not like major name brands owned by other Fortune 50 (or even 100, or maybe even 500) companies, so it was the epitome of unequal bargaining power.

      It really was a case of either being able to dispute it effectively with documentation, eat the costs, or complain and lose a major chunk of your retail distribution.

      If it had been a vendor of equal weight to the retailer, then it gets a lot harder for the retailer.

    37. Re:hoarding mentality by LinuxIsGarbage · · Score: 1

      Most work places I've been at have had 3 months before automatic forced deletion of email.

      We have one of those. A big PITA. 1 year would be much more manageable. I back up the local sync'd cache every three months. If I need to dig back in old emails, I go offline, restore the old cache, and retrieve what I need. On more than one occasion I've had to ask customers or suppliers "Do you remember x many months or years ago we talked about y? Do you still have that email?"

      Luckily they are as much of a pack rat and can produce the email.

    38. Re:hoarding mentality by LinuxIsGarbage · · Score: 1

      My moving window for keeping all e-mails from all e-mail accounts is 10 years, however, I also have some mails dating back as far as '95. Thing is, you can never know when and what you could need, and given the almost 0 long term cost of storing those e-mails, it's better to have them than not to have them.

      While the long term cost is low, I think it's important to make a clear mark as to what's active, and what's archived. I use calendar years, and early in the year, archive digital photos, "my documents", etc from the previous year. That way I know the "2014" data set is fixed, and is the same on all backups. Then I only need to worry about stuff active in the current year. If I need to dig back to find old information I can, but it's not cluttering up my current workspace / hard drive / etc.
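
      Applied to mail, the calendar-year split described above might be sketched like this. It is an illustration only: the function name `split_by_year`, the mbox input, and the `<year>.mbox` output naming are assumptions.

```python
import mailbox
from email.utils import parsedate_to_datetime
from pathlib import Path


def split_by_year(mbox_path, out_dir):
    """Copy each message into out_dir/<year>.mbox based on its Date header."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    boxes, counts = {}, {}
    src = mailbox.mbox(mbox_path, create=False)
    try:
        for msg in src:
            try:
                year = str(parsedate_to_datetime(str(msg.get("Date", ""))).year)
            except (TypeError, ValueError):
                year = "undated"  # missing or unparseable Date header
            if year not in boxes:
                boxes[year] = mailbox.mbox(str(out / f"{year}.mbox"))
            boxes[year].add(msg)
            counts[year] = counts.get(year, 0) + 1
    finally:
        for box in boxes.values():
            box.close()  # close() also flushes pending writes to disk
        src.close()
    return counts
```

      Once a year is over, its file should not change again, so checksums of the per-year files can confirm that every backup holds the same data set. Running the function twice into the same directory would append duplicates, so it is meant as a one-off split.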

    39. Re:hoarding mentality by Frobnicator · · Score: 1

      "There is no good reason to keep 25 years of email. "

      Of course there is. You should know, some people actually communicate about important matters, and keeping records can oftentimes be very beneficial.

      Seems like you ignored the rest of the post, fixating on that one line.

      I am not saying to dump important documents. I am saying a hodgepodge of email systems is a terrible archival method.

      If there are communications that need to be preserved, preserve them properly.

      As I pointed out, contracts should be preserved properly, generally meaning a hard copy printed out and kept in a physical file folder, or electronic copies properly archived as electronic documents. Mementos should be preserved, either electronically or as hard copies, in an appropriate other container. Business-related documents should be classified, sorted, and archived correctly for their type.

      Many types of documents have useful lifetimes that your lawyer will be quick to explain. Three months and six months are both critical windows when it comes to government access: government agencies can swoop in and demand copies of any email or other communication not marked as 'read' at that age, and do so without a warrant or court order. Three years or four years is the legal limit for most documents to be used in civil suits; if the document is older than that then generally it cannot be used as it is no longer timely. There are very few other documents you will receive in email that you would need to keep longer than that. If you do, keep them in a more permanent form than email.

      It is surprising to me that so many /. users are missing the point of data retention policies. When the story is about government agencies, data retention policies and data destruction at the end of the limits come up quickly. In stories about businesses with decades of data, people quickly jump on them for not destroying it in a timely manner, and point out that job applications and similar records shouldn't be kept forever. But when it comes to a person's own policy, ain't nobody got time for that.

      Keep what needs to be kept, and keep it properly. Discard what does not need to be kept. 25 years with 500,000 emails is about 24.9 months and 499,950 too many. Sort that crap out.

      --
      //TODO: Think of witty sig statement
    40. Re:hoarding mentality by Jack+Griffin · · Score: 1

      "we just realized that shipment you sent us in 2009 was short, but we paid the invoice in full. So we're going to subtract the overpayment -- plus interest -- from the current amount we owe you."

      Can't see this holding up in court. Having been to court a few times, I can say that the longer the period between the event and you raising the dispute, the harder it is to convince anyone.
      And as with science, the law also recognises that the burden of proof is with the claimant, so good luck proving you had one box missing from your delivery 6 years ago, but are only raising it just now.

    41. Re:hoarding mentality by vux984 · · Score: 1

      Surely your company would have other evidence than emails to support your prior art?

      Sure it does. But email has the advantage of being time stamped, with copies sent to 3rd parties.

      The dates on purely internal digital documents are much harder to establish if their integrity is challenged.

      Didn't your company apply for a patent?

      No. We felt (and still feel) that the 'innovation' was obvious, and that the patent has no merit.

      But arguing that is expensive and time consuming and risky. If we can demonstrate that we were making, selling, and using the 'invention' well before they 'invented' it, then what exactly did they invent? And the patent falls apart.

      What about the actual drafts and invoices themselves?

      Electronic records. And the invoices only show that we sold the product, not what the product actually was. They are an important part of the puzzle but not a complete picture. The original CAD drawing files and other drawings... sure, they exist, but establishing a date that can withstand a challenge is much harder. When they are referenced and even attached in emails, though, that gives a strong timeline; and again with references to external 3rd parties who can testify to that timeline if needed.

      I would never use an email server as a data repository, though it's convenient to have the date stamps and all.

      The email was old enough that it wasn't on the live server. Hence I was tasked with pulling it from actual backups.

      If the email server had room for it all though, I'd have no issue with having it available live too. As it is, many of the company principals have mailboxes exceeding 30GB going back a decade or so.

    42. Re:hoarding mentality by vux984 · · Score: 1

      I don't know that this is really a case for storing email forever. Yes that is true, but it also means that decades of email are available for searching and can be required to be searched or given up.

      Yes, I agreed, it cuts both ways.

      The reality is, design docs should be saved. These sorts of notes and work should be saved. Retaining emails may provide a solution,

      Email provides a timeline that is much harder to forge, and which can be verified and testified to by external 3rd parties who were referenced and/or copied on various messages.

      Files in a folder somewhere... 5 minutes and anyone here could make them say they were written whatever date we wanted.

      I would submit the real issue here is that nobody saved the documents; but instead relied on email to save it for them.

      We had the documents. But email is what ties them all together, and provides strong evidence of a timeline. Documents are shown to be referenced on a given date, in a given context, etc. External parties received them on a given date, or even if they weren't attached they were referenced. And those external parties can themselves be asked to confirm the record.

      I would submit the real issue here is that nobody saved the documents; but instead relied on email to save it for them.

      That seems an argument of semantics. Much of the important documentation in question is the content of the email itself, not the attachments. You assume we didn't have the attachments, but that's not the case here. The case here is that the email "proves" the various attachments existed when we said they existed, and that copies sent to external parties contained the contents we claim they contained. And it's all mixed with personal records, external contacts, even mentions of then-current events, all of which combine to give the claims more credibility and verifiability than simply some digital documents that we claim were created on such and such a date.

      For example, if I say, here is a Word file from April 2002 showing the design. That's worthless. I could have made it yesterday. It could have contained anything in 2002. etc etc etc. But here is an email of THAT Word file, with a copy sent to the manufacturer discussing it, and also mentioning the hockey game. It all checks out. And you can go find the guy working for that manufacturer -- they might still have their own copy of that exact SAME email. And THAT is evidence that will stand up in court.

    43. Re:hoarding mentality by mcswell · · Score: 1

      I wish I still had the email from my first Internet (or maybe it was Arpanet) purchase back in 1986. I think it would be worth something as an antique. Of course it wasn't the kind of Internet purchase you'd make now. Some guy was advertising a used portable dishwasher on usenet.forsale.washington.kingcounty.net (or something like that). His wife had probably been telling him he was crazy, he should take out an ad in one of the paper want-ads they had back then. And I had been telling my wife she didn't need a dishwasher, she married one, but somehow she didn't believe me. Turned out the guy was maybe ten or twenty miles from us. We completed the deal in person (cash or check, don't remember), and I took it home. I suspect both our wives were astonished at what their geeky husbands had done. A year or two later, we shipped it to Colombia when we moved there. A few years after that, we returned to the US, and sold it for probably what we had paid plus shipping. So yes, I'd like to have that guy's usenet posting, and my email to him.

    44. Re:hoarding mentality by houstonbofh · · Score: 1

      Unread email is treated differently under the law, and currently any email that is six months old or older and marked as unread can be opened and read by federal agencies without a warrant.

      And how do they get to it without a warrant? My server is behind locked doors, and I have the keys...

    45. Re:hoarding mentality by thegarbz · · Score: 1

      you sent us in 2009 was short, but we paid the invoice in full

      Sounds like a company which is a SAP partner.

    46. Re:hoarding mentality by Tukz · · Score: 1

      If you are stapling the reports, please do remember that's my stapler. I would like it back, please.

      --
      - Don't do what I do, it's probably not healthy nor safe. -
    47. Re:hoarding mentality by swb · · Score: 1

      You're a company with $100 million in annual revenue for whom a Fortune 50 retailer represents some significant percentage of your total product distribution and sales.

      They pull some dubious move and you sue them.

      They easily determine the Chinese manufacturer of your product, obtain said product with their private label on them and drop your product.

      Now you're a $75 million company.

      That $50k or whatever in deductions from a past year audit you just saved suddenly isn't a very good stance, outside of its moral value.

    48. Re: hoarding mentality by MobyDisk · · Score: 1

      I can only speak for the United States, but here, the constitution explicitly forbids "ex post facto" laws that make something a crime retroactively. Can you cite an example where this has happened? Perhaps this precaution is to protect banks from such activity in other countries?

    49. Re:hoarding mentality by Gnomaana · · Score: 1

      Just the act of saying that is the reason to delete old email can be construed as obstruction of justice. If a lawyer sometime down the road tries to subpoena the email and discovers the sole reason it is no longer available was to make it unavailable for discovery, he can add obstruction to the list of charges.

    50. Re:hoarding mentality by cyberchondriac · · Score: 1

      Well, good luck in any case. Are these tape backups from the late '90s?

      --

      Look back up at my post, now look back down, you're on the Internet. Now look back up. I'm a signature.
    51. Re:hoarding mentality by vux984 · · Score: 1

      Are these tape backups from the late '90s?

      Nope. They live on spinning rust. Seriously... the entire thing is well under a terabyte. It's not exactly hard to keep it around.

    52. Re:hoarding mentality by dfsmith · · Score: 1

      Spinning rust (particulate iron oxide) hard disk drives were obsolete by the early 1990s—about 40MB was the cut-off for that disk technology.

    53. Re:hoarding mentality by Jack+Griffin · · Score: 1

      You're a company with $100 million in annual revenue for whom a Fortune 50 retailer represents some significant percentage of your total product distribution and sales.

      They pull some dubious move and you sue them.

      They easily determine the Chinese manufacturer of your product, obtain said product with their private label on them and drop your product.

      Now you're a $75 million company.

      That $50k or whatever in deductions from a past year audit you just saved suddenly isn't a very good stance, outside of its moral value.

      You've been watching too many movies.

    54. Re:hoarding mentality by Frobnicator · · Score: 1

      And how do they get to it without a warrant?

      Under the Stored Communications Act, with an administrative subpoena that does not require any probable cause statement or review by a judge. No warrant required, but full legal force.

      If you maintain it on your own you can fight it if you want.

      If they serve it on a third party like your ISP or some other service provider, that party might fight it, or might not. Choose your partners carefully.

      --
      //TODO: Think of witty sig statement
    55. Re:hoarding mentality by houstonbofh · · Score: 1

      If you maintain it on your own you can fight it if you want.

      Actually, in these cases, they don't try. They never want people to know when they are data trolling them.

  3. Notmuch by Anonymous Coward · · Score: 0

    A half a million messages? That's not much. You should use notmuch.

    notmuchmail.org

  4. Send it to Hillary.. by Anonymous Coward · · Score: 0

    She will hold onto it for you!

    1. Re:Send it to Hillary.. by Tablizer · · Score: 1

      Her server actually lasted longer than the one she was "supposed to" use. Contrary to popular myth, the office server was not designed for high-security or anything else special. It probably had lowest-bidder quality, and backups either failed or were lost. (A separate procedure was used for classified stuff.)

  5. Email? by Anonymous Coward · · Score: 0

    rm -rf *

  6. PDF by Anonymous Coward · · Score: 0

    I've always either manually PDF'd the ones I felt were worthwhile, or used an archival utility that did it for me. They go in hierarchical folders based on time/date.
    I'm a Mac user, so Spotlight gives me enough search functionality. Also, I can use various PDF utilities to join together PDFs that should stay connected.

    1. Re:PDF by Anonymous Coward · · Score: 0

      FFS..what a painfully awful process...if I were doing that I would hope one of those idiot drone aviators would fly into my head.

  7. Store as local maildir. by t551 · · Score: 2

    `OfflineImap` (for fetching into a local maildir), then `mu` for indexing and searching.

    As for converting your already-archived mail into maildir format, that's a little more tricky. Once they're in maildir format, you can just use `tar` to compress the ones you don't currently need to access.
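
    The `tar` step above could also be scripted. This is only a rough sketch using Python's standard `tarfile` module; the folder and file names are made up for illustration, and gzip is just one possible compression choice.

```python
import tarfile
from pathlib import Path


def archive_maildir(maildir: Path, dest: Path) -> Path:
    """Pack one maildir folder into a gzip-compressed tarball and return its path."""
    with tarfile.open(str(dest), "w:gz") as tar:
        # arcname keeps paths inside the archive relative to the folder name,
        # so the tarball unpacks as e.g. "Archive2009/cur/..." anywhere.
        tar.add(str(maildir), arcname=maildir.name)
    return dest
```

    Something like `archive_maildir(Path("Mail/Archive2009"), Path("Archive2009.tar.gz"))` would then produce a tarball you can unpack later and point `mu` at again.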

    1. Re:Store as local maildir. by Anonymous Coward · · Score: 0

      There are mbox2maildir utilities that will split the files into maildir format.
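
      If you'd rather not hunt for a utility, Python's standard `mailbox` module can do roughly the same job. A minimal sketch, assuming the old archive really is a well-formed mbox file; the function name and paths are placeholders, not part of any existing tool.

```python
import mailbox


def mbox_to_maildir(mbox_path: str, maildir_path: str) -> int:
    """Copy every message from an mbox file into a maildir; return the count."""
    src = mailbox.mbox(mbox_path, create=False)        # fail if the mbox is missing
    dest = mailbox.Maildir(maildir_path, create=True)  # makes cur/, new/, tmp/
    copied = 0
    try:
        for message in src:
            dest.add(message)  # each message becomes its own file
            copied += 1
        dest.flush()
    finally:
        src.close()
        dest.close()
    return copied
```

      The source mbox is only read, never modified, so you can keep the original file around until you've checked the result.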

  8. Mairix for local search by logicTrAp · · Score: 1

    mairix is another good solution for searching them, once you've got them in local mbox/mh/maildir spools. I think back when I was converting to maildir I scripted mutt to copy them in, but it's obviously harder if you've got them in proprietary formats.

  9. IMAP by Anonymous Coward · · Score: 0

    You already have your solution: store all the email on an IMAP server, then connect to it with whatever client you desire and do your searches. You can connect a client such as Thunderbird to multiple accounts and copy your messages to the one IMAP server. Thunderbird's archive feature just copies your emails to date-based folders for organization purposes; it's all still on the IMAP server. IMAP is client independent, so if your current client is discontinued you just pick up another IMAP client and keep going. You'll just need to keep backups of the IMAP server / data and migrate it as updates are needed.

    Now, I would encourage you to question why you really need 25 year old emails. Delete junk you really don't need.
     

    1. Re:IMAP by Sowelu · · Score: 1

      I go through >10 year old emails all the time. "Hey, I remember talking to a professor about this algorithm." "Where did I go camping that year?" "What was my order number for that game I bought ages and ages ago, since they accept them for free copies of the remake?" "I'm trying to gather information on something, but the person I talked to has long since died and their site isn't on archive.org." It's only going to happen more and more often for older and older stuff.

      Email is also really convenient for backing up work that's under the ten megabyte range...manuscripts, source code, etc. If someone doesn't have a proper backup system or it's not easy to use from the system they're on at the moment, emailing something to themselves is quick and easy. Old work gets rescued from floppies all the time, and surely there's some fascinating, ancient projects backed up in emails that people have long since forgotten about.

    2. Re:IMAP by LinuxIsGarbage · · Score: 1

      I go through >10 year old emails all the time. "Hey, I remember talking to a professor about this algorithm." "Where did I go camping that year?" "What was my order number for that game I bought ages and ages ago, since they accept them for free copies of the remake?" "I'm trying to gather information on something, but the person I talked to has long since died and their site isn't on archive.org." It's only going to happen more and more often for older and older stuff.

      Agree. Or I'm like "What was the flight I took last year from JFK to LAX? It worked good with my connections". Even recently for work I noticed that when I ordered software from one supplier, I got an email, copied to the local vendor, with the serial number. I had another package we bought from them (by someone else that since left) where I could track down the PO, but not the serial number. I emailed the guy copied on my email, and he could dig up the copy he was CC'd on.

      Email is also really convenient for backing up work that's under the ten megabyte range...manuscripts, source code, etc. If someone doesn't have a proper backup system or it's not easy to use from the system they're on at the moment, emailing something to themselves is quick and easy.

      I remember regularly emailing myself copies of critical university term-end reports. If my computer exploded, or I accidentally overwrote everything and hit save, I could restore to a known good copy.

      Old work gets rescued from floppies all the time, and surely there's some fascinating, ancient projects backed up in emails that people have long since forgotten about.

      I'm surprised about work getting rescued from floppies. Back when floppies were a thing, I was shocked at how many people relied on them to hold ALL their school work. Given they were the most unreliable storage format ever invented (and at the time, '99 or so, hard drives were relatively reliable), very frequently people would lose an entire term's worth of work. I remember once I was able to recover the auto-save copy off a nearly corrupt floppy of someone's large term end project. They were almost in tears.

  10. The dark side of the force is powerful.... by Anonymous Coward · · Score: 0

    I have Personal Storage Table (.pst) files from the late 90's that open in Outlook 2016, and in a free viewer thing (systools?) I have at home, with no issues...great folder support, tagging support, sorting, filtering, file system search speed, etc. Owned by MSFT, but openly published and free forever licensing

    1. Re:The dark side of the force is powerful.... by LinuxIsGarbage · · Score: 1

      The early version of PST files has a file limit of something like 2GB, at which point the whole database has a risk of becoming corrupt. So it is worth breaking it down into bitesize chunks (yearly?) that are easier to manage and archive.

  11. Just keep them like any other email by mattventura · · Score: 1

    I just put them in a mail folder. Make a new email account for them if you want. Then you still get the benefit of being able to access them on-demand anywhere through IMAP.

  12. Mailarchiva by NfoCipher · · Score: 2

    https://www.mailarchiva.com/
    Works pretty well.

    --
    I'm sorry, I can't hear you over the sound of how awesome I am.
    1. Re:Mailarchiva by acoustix · · Score: 1

      I use the enterprise version of Mailarchiva at my work. It is a solid product.

      --
      "A plan fiendishly clever in its intricacies"- Homer Simpson
  13. importexport tools - Outlook - eM Client by dejitaru · · Score: 1

    I know that Thunderbird has a plug-in that supports exporting email messages into .eml files where you can have the filename show date and subject and such. But it's not that easy to use. https://addons.mozilla.org/en-... I have personally been archiving with two programs, Outlook and eM Client. Outlook because it provides a .PST file where you have a database that's easy to search through (in Outlook), plus I can archive calendar, contacts, and tasks. eM Client is free to use for two email accounts (at a time, you can always delete and add another). It's like the Thunderbird plugin above (exports to .eml) but much more intuitive and works really well.

  14. This is digital hoarding by Anonymous Coward · · Score: 0

    A mental health practitioner is your best option here.

  15. Mail Consolidation IMAP by foxalopex · · Score: 3, Informative

    I remember having a similar problem years ago, with E-mail in several systems, and getting annoyed that everything was in different formats in different E-mail clients. I fixed the problem by setting up my own IMAP server. An IMAP server is a mail server that's compatible with virtually ALL E-mail clients, but what's important is that, unlike POP3, it acts as a mail store, so you can upload mail to an IMAP server without screwing up formatting or anything. Once you get all your E-mail up to your IMAP server, you can choose to just store it there (just remember to back it up now and then), or you can redownload it all into a mail folder in Thunderbird (back up Thunderbird's mail store folder for protection). Thunderbird probably isn't going away in the foreseeable future, but if it does, sometime down the road you can reuse your IMAP server to transfer the mail to another client.
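
    For mail that only exists as files on disk, the "upload to your IMAP server" step doesn't have to go through a mail client. A rough sketch in Python, assuming the old mail is in mbox files and that `conn` is an already logged-in `imaplib.IMAP4` or `imaplib.IMAP4_SSL` connection; the function name and the "Archive" folder are invented for the example.

```python
import mailbox


def upload_mbox(conn, mbox_path: str, folder: str = "Archive") -> int:
    """Append every message in a local mbox file to a folder on an IMAP server."""
    # CREATE on an existing folder just gets a "NO" reply, which imaplib
    # returns rather than raising, so this is safe to call every time.
    conn.create(folder)
    uploaded = 0
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for key in box.iterkeys():
            raw = box.get_bytes(key)  # message bytes, headers included
            # flags=None and date_time=None: the server stamps its own arrival
            # time; the original Date: header stays intact inside the message.
            status, detail = conn.append(folder, None, None, raw)
            if status != "OK":
                raise RuntimeError(f"APPEND failed for message {key}: {detail!r}")
            uploaded += 1
    finally:
        box.close()
    return uploaded
```

    You would open the connection yourself first, e.g. with `imaplib.IMAP4_SSL` pointed at your own server followed by `login`, and then call `upload_mbox(conn, "old-mail.mbox")`. Folder names containing spaces need extra quoting, which this sketch doesn't handle.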

    1. Re:Mail Consolidation IMAP by BitZtream · · Score: 0

      If the dude is asking on slashdot how to store his email ... that he's been using 'since the 90s' ... trying to get him to understand imap now is probably a waste of time.

      If he was capable of setting up an imap server, he never would have asked the question.

      --
      Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
    2. Re:Mail Consolidation IMAP by angel'o'sphere · · Score: 1

      Thanx for your enlightenment.

      What exactly has setting up an IMAP server to do with eMail archives? Making them searchable etc. ??

      Are you from a different planet where eMail and IMAP work differently?

      I'm only reading this thread because I have the same problem, about a million mails. How the fuck do you expect me to get them into an IMAP server? I have them on DISK!!

      And most mails I get, I get via POP. Why should I leave my mails on my provider's IMAP server?

      If you want to contribute, then say something constructive instead of flamebaiting the parent poster.

      --
      Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.
    3. Re:Mail Consolidation IMAP by Goglu · · Score: 1

      After trying to consolidate all my emails on Outlook and then losing a couple of years of archives because of a file corruption, in 2007, I did just that: set up an IMAP server (Dovecot), using MailDir format (which saves each e-mail in its own file). A regular job rsync's everything to another machine for simple backups.

      Everything was then migrated to a Plug Computer (low performance, but excellent power consumption).

      The whole setup, from installing Debian to having the server running took less than 4 hours, using online guides.

    4. Re:Mail Consolidation IMAP by The_Revelation · · Score: 1

      It should be noted that while virtually ALL E-mail clients are compatible with IMAP, Microsoft Outlook 2010+ doesn't play very well with it at all. You will typically find that, even after making all of the recommended account changes, patching it up to the nines, mail access is always gammy and delayed when talking to the IMAP connector.

      So, if you're considering IMAP, perhaps consider dumping Outlook.... or just use Outlook for your calendaring and run the rest of your email out of Thunderbird.


      Alternatively, use online services like gmail. That way you get the best of both worlds and don't have to bother backing up your mail server.

    5. Re:Mail Consolidation IMAP by Anonymous Coward · · Score: 0

      I faced a similar situation. I happen to use Mac Mail currently... looking at switching back to Thunderbird on Linux... Sorry Apple, 10.6 was the pinnacle of your OS, it's been headed downhill for years. Anyways, I host websites for the extended family's business ventures... so it was just a matter of one more host, correcting line endings in a few mbox files, importing them into Thunderbird and pushing to the IMAP server. Later I switched to Mac and everything just synced in. I could use either offline because they had a cached copy, and both have decent searches... I may be rather proficient with grep, but choosing a mailbox and typing a name into a UI field is a heck of a lot easier. And now, with moving back to Linux looking like a reality... email is the one thing I am confident will just work. Now if I just had a good export tool for my keychain...

    6. Re:Mail Consolidation IMAP by houstonbofh · · Score: 1

      Leave it to Microsoft to fuck up a universal standard that is 20 years old. So did Google, but at least gmail works, even if it does work weirdly...

    7. Re:Mail Consolidation IMAP by Thumper_SVX · · Score: 1

      Become your own provider; set up your own IMAP server either in-house or on a cheap hosted solution like Linode then import your data. If you want to get really complex then use scripting with S3CMD or some other tool so you can now back it all up to S3, then configure your S3 to archive to Glacier after 24 hours or so. Yeah, that means some costs but there are ways of mitigating that too.

      One possibility is have a server at home with all your mail... make it a VM or a PM... whatever. Import the data through IMAP and it's all available.

      I went one step further and grabbed Zimbra which has a MySQL metadata database. Gives me a nice web GUI and really good search capabilities. Stores my mail going back almost 20 years and does it in ~20GB of hard drive space. You can use your own imagination to figure out securing it and stuff like that.

      Of course, there are myriad options. Mailarchiva is really solid too with fantastic search... I've used it at a few small companies for mail archiving and it's brilliant. It all depends (a) how much work you're willing/able to put in, (b) how much money you want to spend and (c) how much functionality you really need. I like my Zimbra setup because it's nominally free (I already have the server, so spinning up a Zimbra VM is effectively zero cost) and gives me loads of functionality. I have an OpenVPN network for my private computers so I can access the IMAP ports, and I have the web interface published on a private web server so I can use it at machines that aren't mine... though typically I don't need to do that because I nearly always have a laptop with me and can use my phone in a pinch.

  16. Mail archives by Todd+Knarr · · Score: 1

    One option might be to set up a local IMAP server on your machine and archive your mail there. Then any mail client that talks IMAP could access it.

    Thunderbird's nice in that it uses the standard maildir format (one file per message, mail folders are just directories under the root of the tree) for its local copy. Most IMAP servers understand and can use that format, so you can just dump a copy of the local mail store into the IMAP server's user mail directory (or if that doesn't work, use the Unix movemail command to suck everything up from the local mail store and send it to the IMAP server) and be set. The message files are text, so grepping for content's still an option of last resort. There are database-based solutions that have more options for tagging and searching, but they tend to cost money, and once your mail's in them it's more of a headache to get it back out when you want to change software (this is an archive, it's inevitable that your current software will be unsuitable/unavailable at least once before the archive becomes old enough to be irrelevant).

    1. Re:Mail archives by rduke15 · · Score: 1

      Thunderbird's nice in that it uses the standard maildir format (one file per message, mail folders are just directories under the root of the tree)

      Unfortunately, NO it doesn't! Maybe you just mistyped this, or else you are confusing Thunderbird the mail client, with an IMAP server like Dovecot, Courier, or others.

      IMAP servers usually do use the "Maildir" system to store emails: 1 file per mail, which is very nice, and helps a lot with backups.

      Thunderbird, the mail client, stores in mbox format: 1 file per folder. So if you add 1 email to your 2GB folder, that 2GB file will need to be backed up again. But at least it's a text format, so it's still much better than Outlook's proprietary binary .pst files.

      Apple Mail used to use the mbox format. It now stores one file per message (.emlx), much like Maildir.

    2. Re:Mail archives by Todd+Knarr · · Score: 1

      It might be that I'm on Linux instead of Windows, but for me Thunderbird clearly says that the message storage type is "File per message (maildir)" and the directories exactly match the format of the maildir folders Dovecot uses on the server. You can even see the setting in the advanced preferences General tab although it's greyed out by default (the mail.server.default.canChangeStoreType setting probably controls that). I know Thunderbird used to use mbox files, but I've only ever seen it use maildir on Linux.

  17. My very unideal solution by sinij · · Score: 1

    My very unideal solution is to archive individual relevant emails under a 'relevant emails' folder as plain text files. Otherwise, I don't retain emails and intentionally purge them. This way, when something becomes taboo in the near or far future, it won't be easy to dig through my digital trash and establish a long-term pattern of 'abuse', allowing me to pretend that I am also outraged at these people still practicing such barbarism. Like not recycling your urine for drinking water. Who doesn't do that in 2035?!

  18. Thunderbird - where's the objection? by Archtech · · Score: 1

    With modern hard drive sizes I don't see the need for compression. Without compression you can use any good free text search tool. I have kept a good proportion of my email since about 1990, and it's all in Thunderbird. (Messages from earlier clients I just emailed to myself en masse).

    Thunderbird has pretty good search capability, but as I am still running on Windows 7 I use Copernic Desktop Search, which has some useful features. (It indexes and searches files, and handles Firefox as well as Thunderbird). With this kind of volume, I do think an indexing tool is better than grep unless you want to have a lot of coffee breaks.

    --
    I am sure that there are many other solipsists out there.
    1. Re:Thunderbird - where's the objection? by I4ko · · Score: 1

      Maildir, either directly connected to a client (e.g. Evolution) or backing up an IMAP server (e.g. Dovecot) is great. I personally prefer the IMAP route. I have close to 20 years of email that way.

    2. Re:Thunderbird - where's the objection? by tlambert · · Score: 1

      Here is the objection to Thunderbird:

      "On December 1, 2015, Mozilla Executive Chairwoman Mitchell Baker announced in a company-wide memo that Thunderbird needs to be uncoupled from Firefox. She referred to Thunderbird as paying a tax on Firefox and said that she does not believe Thunderbird has the potential for "industry-wide impact" that Firefox does."

  19. One Big File? by Tablizer · · Score: 1

    I don't understand why emails are not more often stored as one-file-per-message, with a time-stamp as the start of the file name (YYYY-MM-DD etc.).

    Some file systems are wasteful for lots of small files by padding actual space into large discrete chunks, but they should remedy that rather than stuff all messages into one big file.
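
    To make the naming idea concrete, here is one way it might look in Python. This is just an illustration: the exact layout (date, time, a slug of the subject, an `.eml` extension) is my own assumption, not something any mail client does, and the time is taken as written in the `Date:` header without converting time zones.

```python
import re
from email import message_from_bytes
from email.utils import parsedate_to_datetime


def archive_filename(raw_message: bytes) -> str:
    """Build a sortable file name from a message's Date and Subject headers."""
    msg = message_from_bytes(raw_message)
    try:
        stamp = parsedate_to_datetime(str(msg.get("Date", ""))).strftime("%Y-%m-%d_%H%M%S")
    except (TypeError, ValueError):
        # Missing or unparseable Date header: sort these first.
        stamp = "0000-00-00_000000"
    subject = str(msg.get("Subject", ""))
    # Keep only filesystem-safe characters; cap the length.
    slug = re.sub(r"[^A-Za-z0-9]+", "-", subject).strip("-")[:40] or "no-subject"
    return f"{stamp}_{slug}.eml"
```

    Because the stamp comes first, a plain directory listing sorts the messages chronologically, and a glob on a prefix such as `2002-04-*` gives a crude date search. MIME-encoded subjects would end up in the name in their encoded form; decoding them is left out to keep the sketch short.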

    1. Re:One Big File? by Anonymous Coward · · Score: 0

      As a corollary, also beware One Big Directory :-)

    2. Re:One Big File? by Tablizer · · Score: 1

      True. Using the existing file system to divide by topics and sub-topic also makes a lot of sense.

  20. use standard (open) formats w/ proven records by Anonymous Coward · · Score: 2, Interesting

    I've been using email since the early 1980's, 1982 specifically. I was using "mail" then, later mailx, later whizbang graphical clients.

    I still have tar archives of emails from a PDP-11. I can still read them today. Why? Because open formats. Tar archives from the dawn of time can still be read on a modern Linux system today. Once you start locking things up in proprietary formats such as used by Outlook, it gets harder to read them once that format dies. Not impossible, but certainly a bigger PITA.

    Tar will probably still be here long after I am gone, so from my POV it is a format with suitable longevity. The underlying messages were encoded in plain old (mbox, I think) mail format, which is also still readable by modern mail clients, and even if it wasn't, it's plain old ASCII, so "less" would suffice in a pinch. Stay away from weird binary / closed formats!

    1. Re:use standard (open) formats w/ proven records by mcswell · · Score: 1

      "Tar will probably still be here long after I am gone." Yes, but will anyone know how to use the tar command? http://xkcd.com/1168/

  21. Solr by wmelnick · · Score: 1

    Sounds like you are somewhat tech savvy. Dump all of your emails into files, or any basic store mechanism then load the whole thing into Solr and let it be your search engine.

  22. Yes: Thunderbird archive by MobyDisk · · Score: 4, Informative

    Use the Thunderbird archive.

    Thunderbird for example has 'Archive' as an option, but if I migrate to a different client I assume that won't work anymore.

    Nope! :-)

    I have about 10 years of email in Thunderbird. It keeps data in the mbox format, which is a well-supported open standard. The files are human readable and can be grepped. There are lots of 3rd-party tools that support mbox. Thunderbird builds indexes (maybe those are proprietary) which are good enough that I can search that decade of email in a few seconds. (Maybe that is only searching by subject, to, and from. Message body searches might take longer.) I remove attachments from old mail though, because they eat up space and are not valuable. If I needed the attachment, I saved it somewhere more appropriate.

    The Thunderbird archive feature merely moves the mail into separate mbox folders to keep the main file from getting too big. It doesn't make them proprietary.

    The hard part might be moving existing mail into that format from whatever it is in now.
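
    Since the files are plain mbox, the date and address searches the question asks for can also be done outside any mail client. A small sketch with Python's standard library; the function name and parameters are invented for the example, `since`/`until` are assumed to be timezone-aware datetimes, and there is no index, so every call rescans the whole file.

```python
import mailbox
from datetime import datetime, timezone
from email.utils import getaddresses, parsedate_to_datetime


def search_mbox(path, address=None, since=None, until=None):
    """Yield (date, from, subject) for messages matching the given filters."""
    box = mailbox.mbox(path, create=False)
    try:
        for msg in box:
            try:
                when = parsedate_to_datetime(str(msg.get("Date", "")))
            except (TypeError, ValueError):
                continue  # skip messages with a missing or broken Date header
            if when.tzinfo is None:
                when = when.replace(tzinfo=timezone.utc)  # assumption: treat as UTC
            if since is not None and when < since:
                continue
            if until is not None and when >= until:
                continue
            if address is not None:
                fields = [str(v) for h in ("From", "To", "Cc") for v in msg.get_all(h, [])]
                found = {addr.lower() for _, addr in getaddresses(fields)}
                if address.lower() not in found:
                    continue
            yield when, str(msg.get("From", "")), str(msg.get("Subject", ""))
    finally:
        box.close()
```

    For example, `search_mbox("Archives.sbd/2002", address="bob@example.com", since=datetime(2002, 1, 1, tzinfo=timezone.utc))` would list everything to or from that address since the start of 2002; the path and address here are placeholders.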

    1. Re:Yes: Thunderbird archive by MSG · · Score: 1

      I'd second "use the thunderbird archive" and add "use IMAP."

      Thunderbird can archive mail into a single folder, or per-year folders, or per-month folders. When you are using IMAP, those folders are on the server, and accessible from any client. All of the clients I'm aware of allow you to "subscribe" or not to folders of your choosing, and most offer more fine-grained control to choose what to download and keep locally in order to control client storage and bandwidth use.

      Thunderbird has an excellent search engine built in, so searching is straightforward.

      Thunderbird also supports IMAP tags (labels), so you can apply an arbitrary number of tags/labels to each message. This is a lot more flexible than sorting messages into folders manually. Once you start tagging messages, a simple workflow becomes clear:

      Your inbox should contain only messages that require you to act on them in some way. Once a message no longer requires action, tag it if necessary and archive it. Or, if it is definitely not required, delete it.

      Simple. Now your inbox is cleaner, you'll spend less time sorting mail, and a lot less time searching for it. You can unsubscribe from older archives if you like, or simply choose not to keep them locally to save disk space on the client.

    2. Re:Yes: Thunderbird archive by Solandri · · Score: 1

      Agreed, Thunderbird works well for an archive. There are just two gotchas I've encountered.

      1. The MBOX format gloms all your mail into one continuous text file. It does not have a special string to denote the beginning of a new mail message. It uses "From " (F r o m + a space) to figure out where the beginning of a mail message is. Consequently, if an email has a line in the body where someone has actually typed "From " as the beginning of a sentence, Thunderbird can mistake that as the beginning of a new email (there are a couple other checks it does - read the link if you want the details).

      2. If you used Thunderbird as an actual email client in the past, getting it to stop trying to log in to check for new emails can be problematic. My Thunderbird setup was extensively customized with mail sorted into different folders by subject. I don't want to lose that sorting so I can't simply dump it into a new Thunderbird install (at least not without a lot of work setting it up again). So I just put up with the program occasionally hanging for 10-15 seconds while it tries to connect to a defunct mail server to download new mail. This may have been fixed - I've only had to look up 3 or 4 archived emails in the past 5 years, so I haven't bothered upgrading Thunderbird in a long time.

    3. Re:Yes: Thunderbird archive by flargleblarg · · Score: 1

      Consequently, if an email has a line in the body where someone has actually typed "From " as the beginning of a sentence, Thunderbird can mistake that as the beginning of a new email (there are a couple other checks it does - read the link if you want the details).

      Actually, no. You're wrong. If there is "From " at the beginning of a line, then what the mbox format specifies is that it be reencoded as ">From ", so that it can be decoded.

      Unfortunately (and this is the real problem), it does not require that ">From " be reencoded as ">>From ", so in other words encoding and decoding is not an invertible situation, because most MDAs are stupid about encoding. :-/
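
The non-invertibility described above can be sketched in a few lines of Python. This is only an illustration, not what any particular MDA or Thunderbird does: "mboxo" and "mboxrd" are the informal names commonly given to the two quoting variants, and the function names below are made up for the example.

```python
import re


def mboxo_quote(body: str) -> str:
    """'mboxo'-style quoting: only a bare 'From ' at line start gets a '>'."""
    return re.sub(r"^From ", ">From ", body, flags=re.MULTILINE)


def mboxo_unquote(body: str) -> str:
    """Naive inverse: strip one '>' from every '>From ' line."""
    return re.sub(r"^>From ", "From ", body, flags=re.MULTILINE)


def mboxrd_quote(body: str) -> str:
    """'mboxrd'-style quoting: also quote lines that are already '>From '."""
    return re.sub(r"^(>*From )", r">\1", body, flags=re.MULTILINE)


def mboxrd_unquote(body: str) -> str:
    """Exact inverse of mboxrd_quote: remove exactly one leading '>'."""
    return re.sub(r"^>(>*From )", r"\1", body, flags=re.MULTILINE)
```

With a body that contains both a line starting with "From " and a line that already starts with ">From ", the mboxo round trip cannot tell the two apart afterwards, while the mboxrd round trip returns the original text.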

    4. Re:Yes: Thunderbird archive by literaldeluxe · · Score: 1

      For me, problem #1 causes issues when downloading mailboxes/folders outside of Thunderbird in mbox format and then copying them into Thunderbird. The only solution I've found is going through imported folders to figure out which messages had problems (which requires checking against the original, uncorrupted messages), then downloading those individually and importing them (Thunderbird figures out what to do if you copy a single message in).

      Problem #2 can be solved by unchecking all of the server-checking options, and then setting the mail server to 127.0.0.1.

  23. Standard storage formats: Mbox or Maildir by Anonymous Coward · · Score: 0

    The easiest way to convert is by uploading the mail to an IMAP Server and then using a tool of your choice to download the mails to one of the standard mail storage formats (or if you can access the mail server files and it stores mail as mbox files or in maildir format, get it directly from there).

  24. Why? by Anonymous Coward · · Score: 0

    This is not a normal problem to have. Filthy hoarder.

  25. In a mail client by Anonymous Coward · · Score: 0

    I don't see why you cannot just store them in mail client of your choice? You are probably better off using a local client over a webmail client (although gmail would be happy to import all of your mail and index it for you either over imap from a mail client or using one of the various loader tools out there), but I have never had any trouble importing old mail archives into thunderbird or outlook. If you set your folders or labels or whatever your client uses for organization correctly you should be able to search by account fairly easily.

  26. compressed mbox by Anonymous Coward · · Score: 0

    gzip -9 mbox

  27. suffering by Anonymous Coward · · Score: 0

    Attachments cause suffering.

    There, fixed that for you....
    -email Buddha

  28. Bluewave! by TWX · · Score: 1

    I just save them in my Bluewave offline mail reader! .qwk is the way to go!

    --
    Do not look into laser with remaining eye.
  29. What I did by Anonymous Coward · · Score: 0

    JustAnotherOldGuy here, posting from an undisclosed location....

    What I did was spend a day or so writing a script that extracted the emails from the various mail files they were in, and stuffed them all in a big honkin' MySQL database (just one table, no need to get fancy). It's about 500K rows all told. A simple interface lets me search any/all of the fields (subject, to/from email, body, etc.) and locate what I want without too much trouble. Yeah, it was a bit of a pain to do but it was worth it in the end.

    I only need to search it once in a while but when I do it's a lifesaver.

    This approach worked for me, it may or may not work for you.
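
A rough sketch of the one-table idea, for anyone who wants to try it. To be clear about the assumptions: this is not the script described above. It swaps in SQLite (from the Python standard library) for MySQL, assumes the mail is already in mbox files, and the table layout and function names are invented for illustration.

```python
import mailbox
import sqlite3


def body_text(msg) -> str:
    """Return the first text/plain part of a message, or '' if there is none."""
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True) or b""
            charset = part.get_content_charset() or "utf-8"
            try:
                return payload.decode(charset, errors="replace")
            except LookupError:  # unknown charset label in the message
                return payload.decode("utf-8", errors="replace")
    return ""


def load_mbox(db: sqlite3.Connection, mbox_path: str) -> int:
    """Copy every message of one mbox file into a single 'mail' table."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS mail "
        "(date TEXT, sender TEXT, recipient TEXT, subject TEXT, body TEXT)"
    )
    box = mailbox.mbox(mbox_path, create=False)
    count = 0
    try:
        for msg in box:
            db.execute(
                "INSERT INTO mail VALUES (?, ?, ?, ?, ?)",
                (
                    str(msg.get("Date", "")),
                    str(msg.get("From", "")),
                    str(msg.get("To", "")),
                    str(msg.get("Subject", "")),
                    body_text(msg),
                ),
            )
            count += 1
    finally:
        box.close()
    db.commit()
    return count


def search(db: sqlite3.Connection, term: str) -> list:
    """Substring search over subject, sender, recipient and body."""
    like = f"%{term}%"
    return db.execute(
        "SELECT date, sender, subject FROM mail "
        "WHERE subject LIKE ? OR sender LIKE ? OR recipient LIKE ? OR body LIKE ?",
        (like, like, like, like),
    ).fetchall()
```

Usage would be along the lines of `db = sqlite3.connect("mail.db")`, one `load_mbox(db, path)` call per mbox file, then `search(db, "squishy")`. A plain LIKE scan is probably tolerable for occasional lookups; a full-text index would be the next step if it gets slow.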

  30. Don't archive, migrate by hackertourist · · Score: 1

    Every time I switched mail clients or computers, I made sure to import all mail from the old to the new program. Messages that were made in my first mail account (in Eudora, on Macintosh System 7) are still accessible in my current Mac (Apple Mail, OSX 10.10). I don't need it often, but when I do, it's one search away.

  31. mail piler by Anonymous Coward · · Score: 0

    mailpiler.org

  32. Sounds Like You're Making a Classic Type III Error by CAOgdin · · Score: 1

    ...Solving the wrong problem.

    eMail is not a storage medium; it is for short communiques, and sometimes those lead to threads while an issue is threshed through. But using your eMail system for historical storage is like buying a small automobile for long-haul freight. Or, using Twitter to negotiate a contract.

    Decide what of all your data you intend to keep, and find a useful, generic tool for storage and retrieval, irrespective of content.

  33. Ever since... by Anonymous Coward · · Score: 0

    Ever since the NSA added a RESTful API to their database,
    it's been pretty trivial for me to search my email history.

    CAP === 'unprimed'

  34. No need to save emails.... by __aaclcg7560 · · Score: 1

    I'm periodically annoyed by some people who still respond to emails that I wrote 15 years ago as if it was only yesterday. Delete the old emails and move on in life.

  35. with sentbox_2013/ and archive/. Good guy or bad g by raymorris · · Score: 1

    This is what I do: run IMAP locally (Dovecot). Every year or so, I create a folder called sentbox_2013/ and move all the sent emails from 2013 there. My regular sentbox contains the last 14-20 months or so.

    I also have a folder called archive/ which holds the few messages I think I'll actually need again.

    Regarding whether it's a good idea or a bad idea to keep them in terms of legal disputes and such:
    Having the documents will allow someone to prove what was actually said. If you're a shady character, managing your business like it was Enron, you probably do not want to keep the evidence around. On the other hand, if you're working for the Software Freedom Law Center communicating with people who appear to be violating the GPL you probably want to save your communications - if the truth is clearly on your side, you may want to be able to prove what's true.

      If you're naturally very upfront and ethical in what you do and say, emails may be more likely to help you than hurt you.

  36. Dont be a pack rat. by 140Mandak262Jamuna · · Score: 2

    When you move, if you find a carton from the previous move unopened, discard it without opening. Follow the same rule and throw away the old emails. There is nothing of value in it.

    --
    sed -e 's/Chuck Norris/Rajnikant/g' joke > fact
    1. Re:Dont be a pack rat. by Anonymous Coward · · Score: 0

      There is nothing of value in it.

      Except when there is.

      Unfortunately you don't often know you'll need something until you do.

    2. Re:Dont be a pack rat. by Anonymous Coward · · Score: 0

      I don't find that to be useful advice. We just had a funeral in the family, and we have old boxes, unopened, with family photos. So it seems possible that some unopened boxes have more value than others. You could either invest the time and effort to sort them, or just keep them all. We have room to store them, so they're fine where they are.

      Email has a lot in common. It would take some amount of effort to sort and classify those old emails, but for now they're all in one pile. It makes sense to me to keep a copy of those old emails and just use a computer to search through the content when I want to find something old. The cost of storing a handful of gb of old emails is very small.

    3. Re:Dont be a pack rat. by GI+Jones · · Score: 1

      Except that one time when someone important (or not so important to you) dies and walking through old email correspondence lets you relive moments that are gone and may have been forever forgotten without the help of archived emails. While I don't want anyone rooting through my emails while I'm alive, I can imagine what a treasure trove of my history is trapped in email format and can be visited, explored and enjoyed by someone else, be it a distant grandchild doing research or a historian trying to understand the emergence of internet technology and its social impacts 75 years from now.

      --
      "Perhaps most amazingly, votaries of 'diversity' insist on absolute conformity." -- Tony Snow
    4. Re:Dont be a pack rat. by Anonymous Coward · · Score: 0

      Mom, did Dad have a small penis? He had SO many people offering him pills....

    5. Re:Dont be a pack rat. by Anonymous Coward · · Score: 0

      When you move, if you find a carton from the previous move unopened, discard it without opening. Follow the same rule and throw away the old emails. There is nothing of value in it.

      What if it has photos of your grandmother who's now dead?

      Family records you thought were gone forever?

      That cat you thought ran away? Oh wait, I guess on that one. . .

  37. Mail Format is Key by Anonymous Coward · · Score: 0

    There are 4 different formats for saving mail succinctly described in the following Mutt configuration page:
    http://mutt.blackfish.org.uk/storage/

    Your ability to open old mail files and parse the content depends on the format. If your 10+ years of emails are in different formats you have a challenge. Migrating between formats is not perfect and the altered files can be challenged when presented in court. If you are really going to spend hours on this, I would pick a format and try to migrate copies to that common format. Use the new format for the searches but keep the originals as presentable evidence.

    The value of a consistent format is that it makes searching easier. I use mh format which Sylpheed/Claws-Mail/Mutt can all search by mailbox.

  38. Make searchable PDFs by Bearhouse · · Score: 1

    In addition to rolling your own imap, as has already been suggested, you can/should also do this.
    If you are a Windows and Outlook user, (and if not, Google and torrents are your friend) burn a wet weekend learning the mysteries of those two plus acrobat pro. Get a clean install on a fast PC with plenty of memory and an ssd.
    Import all your old crap into outlook (look it up)
    Install acrobat pro including outlook plugin... Trivially use this to create searchable PDFs including attachments.

  39. IMAP server by Chelloveck · · Score: 2

    Put all your mail on an imap server. You'll be able to access it with any mail client. Set up the imap server as the archive destination for TBird. Now all your mail is archived in the imap server and is accessible.

    You don't trust your email host? That's fair. Run your own imap server on your NAS or even your desktop machine. Everything stays right there on your own media and is still future-proof with regard to changing clients. If you need to change servers you just use your favorite email client to transfer mail from one to another.

    I have everything online at my email provider. In my case, "everything" goes back to the mid-90s. I recently switched hosting providers and did just as I described: Set up separate accounts in TBird with the old and new providers. Select all in a folder on the old provider, drag to a folder on the new provider. (Well, actually I had to do it in chunks of under 5000 messages or TBird would get all crashy on me. But you get the idea.) It was kind of tedious to move hundreds of thousands of messages, but it was merely tedious. It wasn't problematic.

    --
    Chelloveck
    I give up on debugging. From now on, SIGSEGV is a feature.
  40. Re:Sounds Like You're Making a Classic Type III Er by Kardos · · Score: 1

    It sounds like you've made a Category 6C blunder by providing a solution to a different problem.

    Nobody has the time to sift through two decades of emails and pick out the important things. Even if they did, the custom database thing to put them in will definitely not be cross platform, necessitating keeping a copy of the original mess of mbox/tar/etc files around to dig through.

  41. +1 for Mairix by subreality · · Score: 2

    After trying several solutions I settled on Mairix. Searches are screaming fast (less than a second to search several hundred thousand emails), indexing is fast, it's reliable (no problems in the 5+ years I've been using it), and the search language is easy and flexible.

    * I use procmail to send a copy of everything to an archive, rotated monthly
    * The archive is therefore just a handful of mbox files
    * I have a cron job to run "mairix -Q" every 5 minutes, and "mairix -p" nightly
    * I have this in my .bashrc: "function search() { mairix -o $$ $* && mutt -f ~/Mail/$$ ; rm ~/Mail/$$ ; }"
    * And here's my .mairixrc:


    base=~/Mail
    database=~/.mairixdb
    mbox=archive-*
    mformat=mbox
    omit=spam

    With the above, I can find:

    * everything from slashdot in the last two months: search f:slashdot d:2m-
    * any emails I sent containing "squishy" in the body: search f:subreality b:squishy
    * messages with "password" or "passwd" or similar in the subject: search s:passw=
    * get a quick summary of the search language: search -h

    It's so good that I download all my email from my work Gmail account so I can search it... sometimes Google's search just isn't precise enough to find what I need.

  42. Delete key by Anonymous Coward · · Score: 0

    Use your delete key. Seriously. You don't need 25 years of every old email you've received.

    This is just pack-rat hoarding behavior you're engaging in, and shame on Slashdotters for trying to enable it.

    1. Re:Delete key by jeffb+(2.718) · · Score: 1

      Yes, it's so tragic whenever we see these stories about a lonely old hacker found dead in his apartment, trapped under a toppled pile of bits.

      Get a grip. Our digital closets are growing much faster than our digital hoards. Space and indexing technologies are growing faster than our compulsion to accumulate plaintext. Keeping email is not a problem.

  43. Barracuda by Anonymous Coward · · Score: 0

    Barracuda

  44. Inbox by aaaaaaargh! · · Score: 1

    I just leave them in the inbox or whatever folder they end up in according to my sorting scripts. I'm using claws-mail as a client.

    Works for me.

  45. Re:Sounds Like You're Making a Classic Type III Er by angel'o'sphere · · Score: 1

    eMail is not a storage medium
    Of course it is.
    eMail is no different from paper mail.

    Solving the wrong problem
    Depending on your "problem" you are obliged by law to store them and have them accessible for 10 years, minimum. Depending on situation up to 30 years.

    Or, using Twitter to negotiate a contract.
    And what would be wrong with that? With 90% of my business partners: I have no contract at all. All we do is negotiation: can you do that? Yes I can! What is your price/timeframe? Something like X/Y ... Oki, done!

    That easily runs via Twitter or Skype or even IRC.

    --
    Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.
  46. Just ask the NSA to search their archive by Anonymous Coward · · Score: 0

    Why waste your own time and bandwidth?

  47. For ease of use by ReginaldBarclay · · Score: 1

    I recommend the one-mail-per-file, and one-directory-per-folder, idea. It's not exactly, well, new - but it beats everything else by miles.
    Yes, this means you keep your mail local. This is a good thing, as this means /under your control/.
    grep? Yes, works. Easily.
    glimpseindex? Yes, works. Easily.
    Anything else? Yes, works. Easily.

    I keep all my mail from 1998 onwards (when I switched from a certain commercial provider with a proprietary email system) in that way. And it Just Works.

    ('course, Gmail/MSNmailorwhateveritscalledtoday are out. Who cares. exmh (and mutt in a pinch) FTW.)
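
If the old mail is sitting in mbox files, getting it into a one-mail-per-file layout does not take much. A minimal sketch using Python's standard `mailbox` module, assuming Maildir as the target; the function name is made up for the example.

```python
import mailbox


def mbox_to_maildir(mbox_path: str, maildir_path: str) -> int:
    """Copy every message from an mbox file into a Maildir, one file each.

    The source mbox is only read, never modified. maildir_path should not
    exist yet, or should already be a Maildir.
    """
    src = mailbox.mbox(mbox_path, create=False)
    dst = mailbox.Maildir(maildir_path, create=True)
    count = 0
    try:
        for msg in src:
            dst.add(msg)  # writes the message as a new file in the Maildir
            count += 1
    finally:
        src.close()
        dst.close()
    return count
```

The same module also has `mailbox.MH`, which should be the closer match for exmh users. Either way, run it on a copy first and keep the original mbox around until the message counts have been checked.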

    1. Re:For ease of use by dfsmith · · Score: 1

      I recommend the one-folder-per-file mbox idea. Beats MailDir handily.

      • grepmail? Yes works.
      • fsck? Much, much faster.
      • gzip? In the blink of an eye.
      • du -s? Lickety-split!
  48. offlineimap + mu4e by Anonymous Coward · · Score: 0

    I have about a dozen different IMAP accounts synced up with offlineimap and can search and filter through about 5GB of emails in under a second thanks to mu4e's indexing and rich filtering syntax.

    And because all the emails are stored on my server, they're all incrementally backed up as part of my daily system backup process.

  49. mutt + offlineimap + notmuch by Sadsfae · · Score: 2

    I use a combination of mutt + offlineimap + notmuch for mail, local archiving and a very powerful search.

    I've been on this setup for the past 6 years or so. If mutt isn't your thing, this approach is modular, so you could simply sync with offlineimap and index/search with notmuch.

    --
    Have a squat over at the hobo house.
  50. A few options by Anonymous Coward · · Score: 0

    Some options:
    1) Upload them to gmail - it has very strong search ability, and can do message bodies as well as metadata.
    2) Put them in an imap server, like dovecot. Last I heard, dovecot could index e-mail metadata, but not message bodies
    3) Put them in a Maildir or MH folder, and index them using lucene or pyindex or whatever. I use http://stromberg.dnsalias.org/~strombrg/pyindex.html , but I'm biased - I wrote it. This can do message bodies as well as metadata.

  51. Next step by Anonymous Coward · · Score: 0

    I already save it, now I want to index it and make some sense of it. It is impressively valuable.

    I don't mind deleting the spam. :D

    JJ

    1. Re:Next step by Anonymous Coward · · Score: 0

      Can you believe nobody has solved this basic need? Attachments are the point!

  52. rename and save as txt files by Anonymous Coward · · Score: 0

    auto rename each email according to date, time, other party and subject, and save as txt file.
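
One way the naming scheme could look, as a sketch only. It assumes the "other party" is the From address (for sent mail you would want To instead), and the exact filename pattern and function name are invented here.

```python
import re
from email.message import Message
from email.utils import parseaddr, parsedate_to_datetime


def _clean(text: str) -> str:
    """Reduce text to characters that are safe in filenames, max 60 chars."""
    return re.sub(r"[^A-Za-z0-9@.+-]+", "-", text).strip("-")[:60]


def archive_filename(msg: Message) -> str:
    """Build 'YYYY-MM-DD_HHMM_address_subject.txt' from a message's headers."""
    try:
        when = parsedate_to_datetime(str(msg.get("Date", "")))
        stamp = when.strftime("%Y-%m-%d_%H%M")
    except (TypeError, ValueError):  # missing or unparseable Date header
        stamp = "undated"
    _, address = parseaddr(str(msg.get("From", "")))
    party = _clean(address) or "unknown"
    subject = _clean(str(msg.get("Subject", ""))) or "no-subject"
    return f"{stamp}_{party}_{subject}.txt"
```

Two messages with the same minute, sender and subject would collide, so a real script would need to append a counter or part of the Message-ID before writing the file.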

  53. I asked a very similar question last month... by mlts · · Score: 3, Informative

    I asked a similar question to Slashdot about a month ago, where I wanted to stash E-mail and have it accessible if I'm on the road.

    I looked at a few options. Using a virtual machine, an offsite storage provider, and so on.

    What I have wound up doing is buying a NAS. Synology or QNAP are good companies for this. The NAS I bought was a basic one, but it supports RAID 1, which is critical. It also gets backed up automatically via a script that goes in via SSH, creates a tar file, pipes it to zbackup which has a repository on another NAS. zbackup is ideal for backups of E-mail, and having another machine pull the backups helps deal with ransomware, once the bad guys start hitting devices.

    I then enabled the mail server functionality, which gave me an implementation of dovecot and roundcube. This not just gave me IMAP access, but access via the web (SSL). Using the onboard firewalling, I limited the IP range that the NAS talks with, to just the IP range of the commercial VPN service I use (which is a small provider, run by some competent admins.) This way, for an attacker to even get to an open port forwarded past the router to the machine, they have to have an account with that small VPN provider.

    For me, this has worked well. I have access to my E-mail over IMAP or the web. Since the NAS doesn't send or receive mail directly (mail just gets copied to it when archived), it doesn't need SMTP access in or out.

    Caveat: Focus on security when setting this up. Ideally, you could use the NAS's built in eCryptFS capability to protect the IMAP maildir directories so physical theft of the NAS doesn't mean your critical E-mails belong to someone else. From there, put the NAS in its own DMZ, blocking all outgoing traffic except for it checking for OS updates, and only allowing incoming traffic to the TLS-based ports, preferably with heavy IP restrictions. For backups, do a pull based system, so if the NAS gets infected, the bad guys can only put garbage in the backups, and not attack previously stored data.

    1. Re:I asked a very similar question last month... by Anonymous Coward · · Score: 0

      Why is RAID 1 support critical?
      I find propagating disk errors on both copies ... sad. Sad, not critical.

    2. Re:I asked a very similar question last month... by justthinkit · · Score: 1

      Staggering levels of complexity and cost...

      My 20 years of emails are in the text file format native to Eudora. If I use any other email systems, I just bcc myself (i.e. Eudora). All in, ZIPped, I'm under 90MB.

      One post-processing thing helps -- I strip unneeded headers, and this chops out about half of the size.
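
For the curious, header stripping can be sketched like this in Python. The list of headers worth keeping is my own guess, not the one used above, and the function name is hypothetical. The MIME headers stay in the keep list because without them the body may no longer decode properly.

```python
from email import message_from_string
from email.message import Message

# Assumed keep-list for illustration; adjust to taste.
KEEP = {
    "from", "to", "cc", "date", "subject",
    "message-id", "in-reply-to", "references",
    "mime-version", "content-type", "content-transfer-encoding",
}


def strip_headers(msg: Message, keep=KEEP) -> Message:
    """Delete every top-level header whose name is not in `keep`."""
    for name in set(msg.keys()):
        if name.lower() not in keep:
            del msg[name]  # removes all occurrences of that header
    return msg
```

Something like `strip_headers(message_from_string(raw)).as_string()` gives back the slimmed-down text. Note that throwing away the Received headers also throws away the delivery trail, which matters if the mail might ever be needed as evidence.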

      Text files forever baby.

      --
      I come here for the love
    3. Re:I asked a very similar question last month... by mlts · · Score: 1

      It is more complex than just tossing the E-mail from Eudora (guessing mbox format) into a zip file. However, I do have access to the mail from anywhere, and clicking on a VPN, firing up a dedicated IMAP app isn't that bad.

      The costs are sunk anyway. The NAS gets used for other things (zbackup repository), so having its dual-core CPU handle some basic IMAP processing when I choose to click the "archive" button on Thunderbird doesn't hurt.

      Locally, the mail is stored in the maildir format. While not as convenient as the mbox format that Thunderbird uses, it is just a bunch of .eml files stuffed in a directory, and fairly easy to grep through by hand, should dovecot fail. The only downside is the sheer number of inodes all the messages take up.

    4. Re:I asked a very similar question last month... by wwalker · · Score: 1

      And then let's say the motherboard in your NAS dies. Let's say it happens in 10 years (I'm being generous here), and there is no Synology/QNAP around any more, or even if they still exist, they don't make compatible products any more. Can you pull HDDs out of your NAS and read data from them somehow, in a convenient non-spend-a-week-copying-individual-files-by-hand way?

        That's why a generic Linux install on a commodity PC hardware will beat any NAS for longevity.

    5. Re:I asked a very similar question last month... by Anonymous Coward · · Score: 0

      I have a similar setup with Synology and it works very well. Since Synology is using Dovecot and Postfix, everything is in their standard format. You can then use the Amazon Glacier app, if you so choose, or any of the backup apps on there (including Cloud Sync to Box.net, Dropbox, Microsoft's service) to back it up long term. You can use this as backup email server only (where you just offload emails to it) or as a everyday email server (as long as your ISP allows you).

      I asked a similar question to Slashdot about a month ago, where I wanted to stash E-mail and have it accessible if I'm on the road.

      I looked at a few options. Using a virtual machine, an offsite storage provider, and so on.

      What I have wound up doing is buying a NAS. Synology or QNAP are good companies for this. The NAS I bought was a basic one, but it supports RAID 1, which is critical. It also gets backed up automatically via a script that goes in via SSH, creates a tar file, pipes it to zbackup which has a repository on another NAS. zbackup is ideal for backups of E-mail, and having another machine pull the backups helps deal with ransomware, once the bad guys start hitting devices.

      I then enabled the mail server functionality, which gave me an implementation of dovecot and roundcube. This not just gave me IMAP access, but access via the web (SSL). Using the onboard firewalling, I limited the IP range that the NAS talks with, to just the IP range of the commercial VPN service I use (which is a small provider, run by some competent admins.) This way, for an attacker to even get to an open port forwarded past the router to the machine, they have to have an account with that small VPN provider.

      For me, this has worked well. I have access to my E-mail over IMAP or the web. Since the NAS doesn't send or receive mail directly (mail just gets copied to it when archived), it doesn't need SMTP access in or out.

      Caveat: Focus on security when setting this up. Ideally, you could use the NAS's built in eCryptFS capability to protect the IMAP maildir directories so physical theft of the NAS doesn't mean your critical E-mails belong to someone else. From there, put the NAS in its own DMZ, blocking all outgoing traffic except for it checking for OS updates, and only allowing incoming traffic to the TLS-based ports, preferably with heavy IP restrictions. For backups, do a pull based system, so if the NAS gets infected, the bad guys can only put garbage in the backups, and not attack previously stored data.

    6. Re:I asked a very similar question last month... by dfsmith · · Score: 1

      Hopefully the OP will have been doing regular IMAP offline-syncs with the mail server; on several machines.

    7. Re:I asked a very similar question last month... by mlts · · Score: 1

      The NAS uses Linux's LVM2 and ext4 for the drives in the machine, using a "secret sauce" to adjust the LVMs as disks are inserted/resized.

      I don't know how LVM software will be in 10 years, but I think Linux's LVM software (and ext4) isn't too hard to decode if I need to pull the drives out due to a failed component.

  54. It is called mbox by Anonymous Coward · · Score: 0

    .. and it is the standard format for storing emails that has been around since email was invented. Some proprietary mail applications like to use their own custom format ( I'm looking at you Micro$oft Outlook ), but Thunderbird still uses the standard format and so will always be usable.

  55. I did this a few years ago. by Anonymous Coward · · Score: 0

    I had email going back to 1990. Backups in various formats, including QWK, BlueWave, CSV, PST, elm and a dozen or so email accounts I rarely (if ever) used any more. I use gmail, and wanted it all accessible to me on gmail.

    I started by converting files. I found a utility that exported all of my QWK/BlueWave emails to CSV files (it also put attachments in a folder, linking to the file in the message. Very few attachments in the really old stuff...). Next I used Outlook to import the csv files from each of those accounts, each to its own pst file (this was just a smart move). A little hand work to add the 250 attachments back to the files they belonged in. I then created the matching folder structure on my gmail account and copied the messages over.

    Next, I fired up Thunderbird and imported all of the .elm messages into that. After the import, I used IMAP to create the directory structure and copied over all of the mail to gmail.

    Next came the CSV files. For that, I used Outlook's import feature to bring those emails in. Again, I then used IMAP to create the directory structure and import into gmail.

    Same went for the PST files. Opened them up one-by-one, created the structure and moved the mail to gmail.

    The initial move took time. It took me a week or so to import all 8 gig of email - most of it was waiting for processes to complete. But once it was done, it was done. But like most, I want it in more than one place. So I use getmail to backup my gmail account. This runs every 2 hours (I can tolerate a potential 2-hour loss).

    For the PST step, that meant opening Outlook with the .pst file and connecting Outlook to gmail via IMAP. Folder by folder, I copied the email (all 6GB of it) over to gmail using its pseudo-IMAP.

  56. Thunderbird v. Eudora by Anonymous Coward · · Score: 0

    Thunderbird is the next generation of Eudora. Thunderbird stores attachments within the email messages. Eudora stores the attachments as external files. Depending on your requirements, with Eudora you'll get much smaller mailboxes at the end. They're all flat text (without all the attachments), so they zip nicely. Eudora has a good search engine within as well.

  57. Mbox or maildir. by Anonymous Coward · · Score: 0

    Access them with whatever tool which can do it: Mutt, Dovecot imap, whatever. They'll index that for you, but remember: index is a throwaway convenience, the original is the mails in mbox or maildir format.

    Simple, no lockin to any stupid software insisting on reinventing wheels.

    Don't trust any software insisting in "converting" your mails.

  58. mbox by bigtreeman · · Score: 1

    Always used mbox format, got 7 years emails right here, immediately accessible,
    before that on an old hard drive, same format, easy to load, backed up in annual mbox files.
    Easy job for grep, or just open with Thunderbird and sort/search.

    --
    Go well
  59. Commercial Solutions by Anonymous Coward · · Score: 0

    The market is now shifting from Veritas EV to MS Exchange 2013, which has an archiving feature for free, and even to Office 365. I've done quite a few migrations this year.
    You can also look into EMC SourceOne and CommVault.

  60. Dsync from Dovecot by DeBaas · · Score: 1

    A tip: Dovecot has a nice sync tool http://wiki2.dovecot.org/Tools... Perfect to get your email from different IMAP sources to your own system. It can also change mailbox format etc. Combine that with Dovecot itself to give you IMAP access and you have access. You can also use it to keep it in sync with an off site archive.

    Dovecot does have full body search, but it is quite CPU intensive. No problem if you just run it for a few users and accept that it may take a while on a large amount of emails. Not too great if you're hosting for lots of users.

    --
    ---
  61. Does Anyone Have Any Actual Experience? by Anonymous Coward · · Score: 0

    Does anyone have any actual experience in this? I've got clients with mailboxes containing 500,000 to 1 million messages. Archiving isn't very hard. Searching is a fucking BITCH!

    500,000 messages is a HUGE amount of email; it takes forever to archive/index/re-index in a way that allows full text searches. I have yet to find anything, other than commercial SQL-based archive and discovery systems, that works reliably or is worth a shit. It's costly too, because not only do you have to pay a massive fee for the software, there are also the additional server and storage requirements.

    I'd love to find a solution. I see these idiotic posts on Slashdot about using IMAP and Thunderbird(!?). Are you kidding? You don't understand the problem.

    That much mail is an issue to leave on the server. Storage requirements, performance issues, indexing and recovery times are all major problems that make leaving all the messages on the server highly undesirable. Putting the messages into an offline archive is tedious, and the result is beyond slow or difficult to search, as the OP explained. The archive needs to be online or nearline. It needs to perform well and be quickly searchable with near instant retrieval. But it will be unused for most of its life, so it needs to be cost/resource effective. AND, unlike most of the commercial offerings, it needs some form of standards compliance. Exporting 5,000 messages for discovery in a proprietary format or as individual unsearchable image PDFs is not useful. It should support import/export in multiple formats, including formats mail clients can read and searchable documents like PDF/A.

  62. MailVault - Google-like search for your email by Anonymous Coward · · Score: 0

    Check out MailVault (mailvault.in). It is super easy to set up and use, runs on Windows & Linux, has a Google-like search (via a web interface) for ALL your email, or if you prefer, you can access your mail right from within your email client via the built-in IMAP server.

    It can import from a variety of sources (mbox, maildir, emls, thunderbird, pst, pop3, imap, smtp) - so you should be able to archive your old, as well as your current email.

    Assuming this is for personal use, you can run it on your laptop, back-up onto a portable hard-disk (it can also automatically make a secondary backup on the external disk), and you'll always have access to all your email, whenever you want.

    I love grep, but to handle many years' worth of email, I prefer MailVault :)

  63. Standard Unix mail spool / mbox files by Anonymous Coward · · Score: 0

    There's a standard that has been around forever and is the only one that's guaranteed to be openable forever - standard Unix mail spool files and mbox files.

    Why, you ask, will they be around forever? Because they're text files which can be viewed, searched and processed by anyone who knows the few rules about how they work. It's not like sed, awk and grep are somehow going to be replaced by the vendor with new programs that no longer support ASCII.
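
    As a rough sketch of how little is needed to get sender and date searches out of a plain mbox file, Python's standard library already knows those "few rules" (messages start at a line beginning with "From "). The function name search_mbox and its parameters are made up for illustration, and the substring match on the address is just one possible choice.

```python
import mailbox
from email.utils import parseaddr, parsedate_to_datetime


def search_mbox(path, sender=None, year=None):
    """Yield (date, from_address, subject) for messages matching the filters."""
    box = mailbox.mbox(path, create=False)
    try:
        for msg in box:
            addr = parseaddr(str(msg.get("From", "")))[1].lower()
            if sender and sender.lower() not in addr:
                continue
            try:
                date = parsedate_to_datetime(str(msg.get("Date", "")))
            except (TypeError, ValueError):
                date = None  # missing or unparseable Date header
            if year is not None and (date is None or date.year != year):
                continue
            yield date, addr, str(msg.get("Subject", ""))
    finally:
        box.close()
```

    Something like list(search_mbox("archive-2009.mbox", sender="alice@example.com", year=2009)) would then list the matching messages; the file name and address here are placeholders.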

  64. MailVault - Google-like search for your email by Anonymous Coward · · Score: 0

    Use MailVault. It is super easy to set up and use, runs on Windows & Linux, has a Google-like search (via a web interface) for ALL your email, or if you prefer, you can access your mail right from within your email client via the built-in IMAP server.

    It can archive email from a variety of sources (mbox, maildir, emls, thunderbird, pst, pop3, imap, smtp) - so you should be able to archive your old, as well as your current email.

    Assuming this is for personal use, you can run it on your laptop, back-up onto a portable hard-disk (it can also automatically make a secondary backup on the external disk), and you'll always have access to all your email, whenever you want.

    I love grep, but to store and manage many years' worth of email, I prefer MailVault :)