Slashdot Mirror


Best Way To Archive Emails For Later Searching?

An anonymous reader writes "I have kept every email I have ever sent or received since 1990, with the exception of junk mail (though I kept a lot of that as well). I have migrated my emails faithfully from Unix mail, to Eudora, to Outlook, to Thunderbird and Entourage, though I have left much of the older stuff in Outlook PST files. To make my life easier I would now like to merge all the emails back into a single searchable archive, just because I can. But there are a few problems: a) Moving them between email systems is SLOW; while the data is only a few GB, it is hundreds of thousands of emails, and all of the email systems I have tried take forever to process the data. b) Some email systems (e.g. Outlook) become very sluggish when their database goes over a certain size. c) I don't want to leave them in a proprietary database, as within a few years the format becomes unsupported by the current generation of the software. d) I would like to be able to search the full text, keep the attachments, view HTML emails correctly and follow email chains. e) Because I use multiple operating systems, I would prefer platform independence. f) Since I hope to maintain and add emails for the foreseeable future, I would like to use some form of open standard. So, what would you recommend?"

385 comments

  1. It's obvious by Mikkeles · · Score: 3, Funny

    Alphabetically!

    --
    Great minds think alike; fools seldom differ.
    1. Re:It's obvious by Anonymous Coward · · Score: 0

      http://www.barracudanetworks.com/ns/products/archiver-overview.php

      adelenaw!

  2. IMAP by klingens · · Score: 5, Informative

    An IMAP server (dovecot, cyrus, courier) of your choice for Linux. If you don't have a Linux server you can always run it inside a small VM.

    1. Re:IMAP by hedwards · · Score: 1, Informative

      Yeah, IMAP is the way to go. Personally, I use IMAP on my email account and MailStore Home to do the actual download and backup. The OP will probably end up having to set up a personal server to get the program to download the older mail, but that can be done easily enough via a virtual machine.

    2. Re:IMAP by arivanov · · Score: 1

      The guy mentioned entourage. If he is running MacOS he can run any of these on MacOS.

      This solves the "storage" problem. However, this does not solve the search/index/etc problem. I have a 9G+ and growing IMAP store going back to 1999 with several hundred folders in it, so I am facing a similar problem. Using Thunderbird search and even grepping it on the server just does not cut it any more.

      --
      Baker's Law: Misery no longer loves company. Nowadays it insists on it
      http://www.sigsegv.cx/
    3. Re:IMAP by wealthychef · · Score: 2, Informative

      Well, on OS X the searching problem *should* be solved by Spotlight, as it indexes "all files on your hard drive" (not) into constant-time searches automagically. The trouble with Spotlight is that Apple does not search all folders and I do not know of a way to enable it to search all folders. If you import it into Mail.app, you do get the indexed behavior, and my situation is similar to yours, and I do exactly that. But all those billions of old messages, I keep in an archive that I never look at.
      Anyhow, look into Spotlight on OS X. Ooops, you said "open sourced," right? Damn. I don't know, then.

      --
      Currently hooked on AMP
    4. Re:IMAP by wvmarle · · Score: 1

      For storage, IMAP is definitely the way to go.

      I'm using Cyrus myself for this exact purpose (e-mail from the last 7 years about; estimate 20 GB worth of mails; I have many mails that come with attachments). No specific reason to use that one; seemed to be the easiest to set up at the time; it works fine for me.

      Main reason for me to use an imap server is that it is client-independent, and as it's open source it's not some weird proprietary format. So great to store mails, easy to retrieve mail remotely, easy to set up webmail, no problems with syncing between clients.

      However it does not help much with search. This is an issue I'm running into now and then as Evolution simply sucks at searching through a large mail base. Afaik Cyrus does not support server-side mail searches; other imap servers may do so. So if you need fast search Cyrus doesn't seem to be the server of choice. Personally I'd look into storing the mail in an SQL database if search is a major issue.

    5. Re:IMAP by tenco · · Score: 1

      My first thought was "Maildir". AFAIK there are IMAP servers with a maildir backend. So, yeah, someone should get an IMAP server with Maildir backend for this job.

    6. Re:IMAP by Anonymous Coward · · Score: 1, Interesting

      I have 3GB of email and I use Eudora, searching for emails isn't that slow if you can organize stuff into smaller folders.

      So I'm sure a more geeky solution can be much faster: e.g. a postgresql database with full text search and metadata search.
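The database idea can be sketched roughly like this. It is only a sketch under loud assumptions: it uses Python's built-in sqlite3 module in place of PostgreSQL, and a hand-rolled word table rather than a real full-text search engine. The table names, column names and functions (`open_archive`, `add_message`, `search`) are made up for illustration.

```python
import re
import sqlite3


def open_archive(path=":memory:"):
    """Open (or create) the archive database. The schema is hypothetical."""
    db = sqlite3.connect(path)
    db.executescript("""
        CREATE TABLE IF NOT EXISTS messages (
            id INTEGER PRIMARY KEY,
            sender TEXT, subject TEXT, sent TEXT, body TEXT
        );
        CREATE TABLE IF NOT EXISTS tokens (
            word TEXT NOT NULL,
            message_id INTEGER NOT NULL REFERENCES messages(id),
            PRIMARY KEY (word, message_id)
        );
        CREATE INDEX IF NOT EXISTS idx_sender ON messages(sender);
    """)
    return db


def add_message(db, sender, subject, sent, body):
    """Store one message and index each distinct word of subject and body."""
    cur = db.execute(
        "INSERT INTO messages (sender, subject, sent, body) VALUES (?, ?, ?, ?)",
        (sender, subject, sent, body),
    )
    message_id = cur.lastrowid
    words = set(re.findall(r"[a-z0-9]+", (subject + " " + body).lower()))
    db.executemany(
        "INSERT OR IGNORE INTO tokens (word, message_id) VALUES (?, ?)",
        [(word, message_id) for word in words],
    )
    db.commit()
    return message_id


def search(db, word, sender=None):
    """Look a word up in the token table, optionally filtered by sender."""
    sql = ("SELECT m.id, m.subject FROM tokens t "
           "JOIN messages m ON m.id = t.message_id WHERE t.word = ?")
    params = [word.lower()]
    if sender is not None:
        sql += " AND m.sender = ?"
        params.append(sender)
    sql += " ORDER BY m.id"
    return db.execute(sql, params).fetchall()
```

The point of the word table is that a search becomes an index lookup instead of a scan over every message body. A real setup would more likely lean on the database's own full-text features.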

      The stuff you'd want to search is mainly in text so it shouldn't be too difficult.

      If you wanted an equivalent search for videos, sound or pictures that'll be harder - e.g. given a picture of this object, please find videos containing this object.

    7. Re:IMAP by 19thNervousBreakdown · · Score: 3, Informative

      Seconding this. I've been using Dovecot with Maildir on EXT3 for the last few years--my mailbox is about 25k messages, which I keep all in a single folder and use IMAP tags to organize into different virtual folders, much like Gmail's system but without the privacy concerns.

      Dovecot's supplementary indexes make everything extremely fast (tags, dates, etc), and anything they don't catch Thunderbird does. I can search my entire mailbox for a single word in less than a second. I lose my Thunderbird indexes whenever I move to a new computer, but that's just a matter of leaving the client up for a few hours.

      --
      <xml><I><am><so><damn>Web 2.0</damn></so></am></I></xml>
    8. Re:IMAP by Nimloth · · Score: 1

      No way to archive Entourage data with Spotlight.
      Entourage is teh suck. Wait till OL2010 on Mac, until then stick with Mail.

    9. Re:IMAP by Graff · · Score: 2, Informative

      No way to archive Entourage data with Spotlight.

      Actually, there is:
      Using Spotlight to search Entourage.

    10. Re:IMAP by JWSmythe · · Score: 1

          I totally agree, but there are some other bits to include. You probably already knew them, but they're worth mentioning. At one point, I had a decade of mail stored. It was broken up by year and month. Year folders, month subfolders. It made it just a bit more manageable to read through. That's just a personal decision though. Some people like sorting by the sender, or what they were involved in. You can subscribe or unsubscribe folders too, so it's not necessary to download the headers for 10 years of mail just to set up on a new computer.

          Right now, there are two common storage formats, mbox and maildir. mbox is a single file per folder, which can get huge. maildir is a single file for each message. For either of these, there is a plethora of server software to use with them.
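For anyone sitting on old mbox files, moving them into maildir can be sketched with Python's standard `mailbox` module. This is a minimal sketch, not a tested migration tool; the function name `mbox_to_maildir` and both paths are hypothetical, and it assumes the mbox file is not being written to while it runs.

```python
import mailbox


def mbox_to_maildir(mbox_path, maildir_path):
    """Copy every message from an mbox file into a Maildir; return the count."""
    source = mailbox.mbox(mbox_path, create=False)
    dest = mailbox.Maildir(maildir_path, create=True)
    copied = 0
    try:
        for message in source:
            dest.add(message)  # each message becomes its own file in the Maildir
            copied += 1
    finally:
        source.close()
        dest.close()
    return copied
```

Comparing the returned count with what the old client reports for the folder is a cheap sanity check before deleting anything.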

          The server software makes a big difference. Some are a bastard to set up and use. Some install easily and just work. Some are quick, and some are pathetically slow. I'm fond of dovecot. Who knows what will be available in 10 years though. If you stay with a good standard format (like maildir), it will be easier to survive future migrations. I changed imap software several times, and the users never knew. A few times, I've helped people migrate from non-standard formats like Exchange and Zimbra. In both of those cases, it required acquiring or changing their passwords, and then downloading their mail to the new server. It's not all that bad once you script it, but can still take a long time, depending on how much crap people have kept lying around.

          If you keep it in a VM as you suggested, it's important to keep backups outside of the VM. I lost one of my VM's a couple weeks ago. It was just on my desktop, and the drive started to fail, and it lost a few clusters that the VM resided in, so the whole file became corrupt. It wasn't important though. I keep a whole bunch of VM's to test backward compatibility in. It's nice to have a few dozen VM's, rather than a few dozen machines running outdated OS's.

      --
      Serious? Seriousness is well above my pay grade.
    11. Re:IMAP by Compuser · · Score: 2, Insightful

      Huh? How does a server help with a local archive of emails? Do any of these servers help with importing emails (pre-mbox ARPANET emails for instance, or dbx emails for a more modern example)? Do they provide fast searching (including .doc and .ppt awareness)? This may be a storage approach but does not begin to deal with the question raised.

    12. Re:IMAP by flyingfsck · · Score: 2, Informative

      Totally. All my email since about 1998 is in a Citadel mail server. It uses the BerkeleyDB, which can handle 256 Terabytes of mail. That should be enough for any semi-sane person...

      --
      Excuse me, but please get off my Pennisetum Clandestinum, eh!
    13. Re:IMAP by burisch_research · · Score: 1

      Oh yes, Entourage gave me numerous problems, and the 'solutions' presented by Apple themselves were of no use whatever. This on a brand new machine, fully updated, with a genuine copy of Office, also fully updated.

      So when I had a break-in a few weeks ago and my mac disappeared, my emotions could be summarized by "meh".

      --
      char*f="char*f=%c%s%c;main(){printf(f,34,f,34);}";main(){printf(f,34,f,34);}
    14. Re:IMAP by simcop2387 · · Score: 1

      courier supports maildir backed imap, i use it myself and its great for being able to get at things over ssh (mutt), imap with kmail, or even web mail hooked up to the imap. as someone else said earlier, IMAP is THE way to go for storage. as far as searching, i'm not sure, i haven't found anything that's perfect at it yet.

    15. Re:IMAP by AlterEager · · Score: 2, Informative

      Afaik Cyrus does not support server-side mail searches

      Oh yes it does. Check out squatter.

    16. Re:IMAP by wvmarle · · Score: 1

      Thanks for the tip. I didn't know about it. The first hit I got on Google also said it's a "little known feature" of the Cyrus IMAPD.

      Do you also happen to know how I can have Evolution use IMAP search? Or is that done automatically? Their IMAP implementation remains wacky after all...

    17. Re:IMAP by metamatic · · Score: 1

      Yes. Dovecot, IMAP server.

      And use Maildir+ for storage.

      End of problem.

      --
      GCHQ Quantum Insert installed. If only our tongues were made of glass, how much more careful we would be when we speak
    18. Re:IMAP by WuphonsReach · · Score: 1

      A lot of inexpensive hosting companies also now offer IMAP, and it shares space with your web content. I have a few gigabytes of email over at LogicWeb on their IMAP server for one of my personal domains.

      I generally archive by year, rather than trying to archive by category. This is where GMail's tags would come in handy as you can apply multiple tags to an email. But if you don't have tags, sorting by year is quick and dirty, and you can always do full-text searches.

      (I have archives going back a full decade, which is important for work email so I can find how we did something 2-3 years ago.)

      --
      Wolde you bothe eate your cake, and have your cake?
    19. Re:IMAP by WuphonsReach · · Score: 1

      I totally agree, but there are some other bits to include. You probably already knew them, but they're worth mentioning. At one point, I had a decade of mail stored. It was broken up by year and month. Year folders, month subfolders. It made it just a bit more manageable to read through. That's just a personal decision though. Some people like sorting by the sender, or what they were involved in. You can subscribe or unsubscribe folders too, so it's not necessary to download the headers for 10 years of mail just to set up on a new computer.

      I used to sort by subject (job number - for work related stuff), but ultimately fell back to archiving by year (year/month was too fine grained). Searching for a sender/receiver name is fast with IMAP, as your client will already have that information as part of the message headers. So I've never felt the need to file stuff away based on who sent it.

      As I get older, I get lazier. Filing stuff by year is easy, so it gets done. And it keeps my individual folder sizes at a manageable size of a few thousand messages per year. Plus, I can generally guess the year and get a hit on the first search. Worst case, I have to look up/down one year. Trying to do the same by month would not get me as fast of results.
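Filing by year is also easy to script. A rough sketch of the idea, assuming each message carries a standard Date header and that anything unparseable goes into an "unknown" pile (the function names and the fallback folder are made up for illustration):

```python
from collections import defaultdict
from email.utils import parsedate_to_datetime


def year_folder(date_header, fallback="unknown"):
    """Return the year of a Date header as a folder name, or the fallback."""
    try:
        return str(parsedate_to_datetime(date_header).year)
    except (TypeError, ValueError):
        # Missing or malformed Date header; which error depends on Python version.
        return fallback


def group_by_year(messages):
    """Group message objects (anything with .get('Date')) by year folder."""
    groups = defaultdict(list)
    for message in messages:
        groups[year_folder(message.get("Date"))].append(message)
    return dict(groups)
```

Actually moving the messages into per-year IMAP or maildir folders is left out; this only decides where each one would go.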

      Especially since in Thunderbird I can quickly filter by subject line / to / from / etc. So I'm rarely looking at a list of more than 100 messages when hunting for something.

      --
      Wolde you bothe eate your cake, and have your cake?
    20. Re:IMAP by vanyel · · Score: 2, Informative

      I am using dovecot and thunderbird, and have about 60 "live" folders, some with 10's of thousands of messages (a couple with 150+k messages). It is a constant battle with thunderbird, which often goes away for long periods of time, even when not doing anything one would expect to be dealing with the larger folders.

      I'm working on some scripts to archive messages into 30-90-180 day archive folders to keep the live folders down to a manageable size, but it would be nice to find something that already exists...
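The core of such a script is just deciding which archive folder a message belongs in by age. A minimal sketch, assuming "30-90-180" means messages older than 30, 90 and 180 days go to separate folders; the folder names (`archive-30` and so on) are invented here, and actually moving messages over IMAP is left out:

```python
from datetime import datetime

ARCHIVE_AGES = (180, 90, 30)  # days, checked oldest first


def archive_folder(sent, now, ages=ARCHIVE_AGES):
    """Return the archive folder for a message sent at `sent`, or None to keep it live."""
    age_days = (now - sent).days
    for limit in ages:
        if age_days >= limit:
            return "archive-%d" % limit
    return None
```

For example, with `now = datetime(2010, 6, 1)`, a message from April 1 is 61 days old and would land in `archive-30`.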

    21. Re:IMAP by neerolyte · · Score: 1

      Serious question...

      Did you have to do anything special to get tags to work?

      Are you only using Thunderbird or other clients as well?

      Does it support unlimited tags?

    22. Re:IMAP by 19thNervousBreakdown · · Score: 1

      I use an extension called Tag Toolbar to allow unlimited tags, and to edit the default tags--the default ones that Thunderbird ships with ($tag1, $tag2, ... or something similar) are special tags that are client-defined and as far as I can tell, useless. They're also the only ones most mail clients choose to support. Why would you want to store something on the server that has a different meaning on every client? Anyway, other than the tag toolbar, no I didn't have to do anything special.

      I'm using other clients, which don't support IMAP tags and that's a bummer because Dovecot supports tag search directly on the server, so virtual folders would be beautiful on the iPhone for instance... oh well. It works nice on Thunderbird, and as far as I can tell that's the only one right now.

      Oh, I'm also using Sieve to filter my mail, and it has very nice IMAP tag support, you can set them, read them, whatever.

      --
      <xml><I><am><so><damn>Web 2.0</damn></so></am></I></xml>
  3. Delete by Anonymous Coward · · Score: 2, Insightful

    Time to delete them all

  4. A Lawyer's Fantasy ... by perpenso · · Score: 4, Insightful

    I have kept every email I have ever sent or received since 1990 with the exception of junk mail (though I kept a lot of that as well) ...

    You are a hostile lawyer's fantasy come true. ;-)

    1. Re:A Lawyer's Fantasy ... by Jawnn · · Score: 1

      Bravo, sir. You win my "most insightful comment of the week" prize, hands down.

      The OP should give some very serious thought to the wisdom of keeping all that email. It may be relatively harmless (I'll wager he's not in a position where his correspondence is likely to be of interest to potential litigants), but dude, hoarding is a disease. Seek treatment.
      Meanwhile, look for something that uses IMAP style storage and a database for indexing purposes. Be prepared for a laborious process of importing and indexing all that email. It will be worth it. You don't want to have to traverse the content of every message in a folder every time you search for this or that string in the body of some 8 year-old message. The list of systems that will do this is short.

    2. Re:A Lawyer's Fantasy ... by tareko · · Score: 1, Insightful

      I don't get the difference between this person and all the people of old whose many personal and mundane letters litter collections everywhere and make historical accounts more rich and precise. I bet he can track the births and deaths of countless relationships through those emails, which is itself of tremendous worth.

    3. Re:A Lawyer's Fantasy ... by Sancho · · Score: 1

      Hoarding is mostly a problem when it causes some sort of harm. Usually this harm manifests as danger (keeping lots of old paper around can be a fire hazard) or space issues (not having enough room for all of your stuff.) In the digital world, there are two main dangers to hoarding: ediscovery and running out of space.

      The latter is almost completely irrelevant in the context of e-mail. E-mail messages are so tiny and hard drives are so large that it's ludicrous to be concerned over them. A liberal estimate of my lifetime of e-mail is a little over half a gigabyte. Even the smallest (new) USB flash drives would hold that with room to spare.

      The former is more of a concern, but I think that the vast majority of people will never have to worry about it.

    4. Re:A Lawyer's Fantasy ... by darkonc · · Score: 1

      I have kept every email I have ever sent or received since 1990 with the exception of junk mail (though I kept a lot of that as well) ...

      You are a hostile lawyer's fantasy come true. ;-)

      Only if you do nasty things that you don't want people to know about. If you live by principles, treat people well and avoid doing things that you'd regret seeing the light of day, saving all of your communications can make you a defense lawyer's wet dream.

      It's generally nasty people who like to do things in the dead of night. It's companies like Microsoft that force their $6digit-per-year employees to delete all emails more than 30 days old 'because we don't have the space', but can provide gigabytes of free email to hundreds of millions of strangers via hotmail.com .

      --
      Sometimes boldness is in fashion. Sometimes only the brave will be bold.
    5. Re:A Lawyer's Fantasy ... by russotto · · Score: 1

      Only if you do nasty things that you don't want people to know about. If you live by principles, treat people well and avoid doing things that you'd regret seeing the light of day, saving all of your communications can make you a defense lawyer's wet dream.

      Only if your principles are the same as those of the judge and/or jury.

    6. Re:A Lawyer's Fantasy ... by perpenso · · Score: 1

      ... If you live by principles, treat people well and avoid doing things that you'd regret seeing the light of day, saving all of your communications can make you a defense lawyer's wet dream.

      That is a fantasy; the fact that we need Good Samaritan laws to protect people demonstrates this. In the real world good people get sued and lose in court every day.

    7. Re:A Lawyer's Fantasy ... by perpenso · · Score: 1

      I don't get the difference between this person and all the people of old whose many personal and mundane letters litter collections everywhere and make historical accounts more rich and precise. I bet he can track the births and deaths of countless relationships through those emails, which is itself of tremendous worth.

      Those people of old often intended their letters to be saved for posterity. Many also took a hostile reader's interpretation into consideration given the biased press of those days, often far worse than what we see today. Given this they composed things far more carefully, often explaining their reasoning in detail. Today's off-the-top-of-your-head emails where a response or "conversation" can be nearly instantaneous are a poor comparison.

      That said, I agree that such an archive could be of value to social/cultural anthropologists, much like when they dig through old landfills and garbage dumps. Don't laugh about going through garbage dumps until you have read about things like Vindolanda, http://www.vindolanda.com/roman_vindolanda.html.

    8. Re:A Lawyer's Fantasy ... by tha_mink · · Score: 1

      A liberal estimate of my lifetime of e-mail is a little over half a gigabyte.

      Seems a little low. I think I'm working on doing over twice that per year, and I'm not even trying to keep an archive.

      --
      You'll have that sometimes...
    9. Re:A Lawyer's Fantasy ... by Sancho · · Score: 1

      That's personal mail, so it excludes work mail. I keep recent mail much longer, and get a lot more mail recently than I used to, which is why it was a liberal estimate. I basically just added up the sizes for mail from the past year, then multiplied by (age - agewhenigotinternet).

      Maybe I just don't get much mail.

      It's lonely here.

    10. Re:A Lawyer's Fantasy ... by darkonc · · Score: 1
      The funny thing about that comment is that the Good Samaritan wasn't about helping a stranger. A Samaritan was the heretic-outcast of the day. It was a parable of tolerance. If Jesus was alive and well and living in New York, today, it would probably be the "Tale of the Good Muslim".

      In any case, if you were good and principled, and got sued, having all of your correspondence on file would be far more likely to help you than hurt you -- and in many cases it would have no effect at all.

      If, on the other hand, you play fast and loose with the law, and treated the people around you like pawns to be used and abused, then having all of your correspondence on file would be more likely to prove fatal to the defense.

      --
      Sometimes boldness is in fashion. Sometimes only the brave will be bold.
    11. Re:A Lawyer's Fantasy ... by ShakaUVM · · Score: 2, Insightful

      >>You are a hostile lawyer's fantasy come true. ;-)

      We've won a couple lawsuits because I save all of my email.

      We had a contract to do a workshop with Maricopa County - the same people whose Sheriff is under investigation by the FBI right now, and of Immigration Law fame. And who have a lot of other shady things going on right now, but I digress.

      I'd traded a series of emails with them planning the workshop. Everything was all set. Then, about a week before the workshop, they say they don't need me to come after all. Ok, sure. So I try to reschedule with them. Nope, sorry, you didn't show up to the workshop, so you breached the contract. I sent them a copy of all the emails. Nope, sorry.

      Filed a lawsuit. They wouldn't settle. Showed the email trail of everything. Got a check for over $30k. Didn't have to do the work. (Of course, I'd have preferred if everyone had just done as they'd said, and it was much more of a hassle to sue than to just do the damn work.)

      Lawsuits are often won over who has the best documentation. If you do your work honestly, having full email records is probably going to help you more than hurt you in lawsuits.

    12. Re:A Lawyer's Fantasy ... by bill_mcgonigle · · Score: 1

      If you live by principles, treat people well and avoid doing things that you'd regret seeing the light of day, saving all of your communications can make you a defense lawyer's wet dream.

      Sure, if you don't have any corrupt or evil people running your government.

      "If you give me six lines written by the hand of the most honest of men, I will find something in them which will hang him" - Cardinal Richelieu

      --
      My God, it's Full of Source!
      OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
  5. Re:Psychiatric consultation! by Anonymous Coward · · Score: 0, Troll

    You say that like it's a bad thing. People with OCD are great at giving oral sex.

  6. Google Mail. by sidragon.net · · Score: 3, Insightful

    See subject.

    1. Re:Google Mail. by WiglyWorm · · Score: 1

      This was my first thought as well. 0 reason for anything else.

    2. Re:Google Mail. by Anonymous Coward · · Score: 1, Insightful

      Uh, privacy would be the reason.

    3. Re:Google Mail. by dave420 · · Score: 1

      For paranoid folks, I'm sure it is.

    4. Re:Google Mail. by Anonymous Coward · · Score: 0

      searching is capped at two years period, iirc

    5. Re:Google Mail. by Anonymous Coward · · Score: 0

      Please mod parent as troll -1. FFS people RTFA, GMail is not a possibility in this case. Besides, who wants Google to have access to their personal correspondence?!

  7. Re:Psychiatric consultation! by balaband · · Score: 5, Funny

    This is slashdot. We save computers older than your dad just to use them as alarm clocks. Please leave.

  8. one word by Anonymous Coward · · Score: 1, Interesting

    gmail

  9. Not Much by maxume · · Score: 2, Informative

    It isn't particularly platform independent (because no one is paying much attention to Windows), but Not Much offers threads and full text search:

    http://notmuchmail.org/

    --
    Nerd rage is the funniest rage.
    1. Re:Not Much by koiransuklaa · · Score: 3, Informative

      +1

      Notmuch can manage absolutely insane amounts of email without any artificial 'archiving'. Of course, if you are looking for a program that does something other than tagging and searching (like sending, composing or receiving email), you need to look elsewhere.

  10. Print by JustOK · · Score: 4, Funny

    Print then scan

    --
    rewriting history since 2109
    1. Re:Print by zeil · · Score: 0

      Ouch.. really... if your going to do that at least print to PDF

    2. Re:Print by Anonymous Coward · · Score: 0

      That would obviously be very fast for text, but what about binary attachments?

    3. Re:Print by tylernt · · Score: 1

      That would obviously be very fast for text, but what about binary attachments?

      http://www.ollydbg.de/Paperbak/

      --
      DRM 'manages access' in the same way that a prison 'manages freedom'
    4. Re:Print by Anonymous Coward · · Score: 0

      your = possessive. For example, "YOUR email hoarding is going to end up ruining YOUR life."

      you're = contraction of "you are." For example, "YOU'RE going to have a big problem with all that email."

      It's really so simple a 5 year old could handle it. You either mean "you are" and use "you're" or you don't.

    5. Re:Print by herdingcats · · Score: 1

      you are obviously one of my former executive bosses.

    6. Re:Print by JustOK · · Score: 1

      ur rite

      --
      rewriting history since 2109
    7. Re:Print by Anonymous Coward · · Score: 0

      This idea is not search-able and is totally retarded, of course.

    8. Re:Print by RealGrouchy · · Score: 1

      Duh, you just search by the date you scanned the e-mail.

      - RG>

      --
      Hey pal, this isn't a pleasantforest, so don't waste my time with pleasantries!
    9. Re:Print by Abstrackt · · Score: 1

      He'd be better off printing then photographing the emails and turning them into a coffee table book.

      --
      They say a little knowledge is a dangerous thing, but it's not one half so bad as a lot of ignorance. - Terry Pratchett
    10. Re:Print by Anonymous Coward · · Score: 0

      nonono, export to .pdf. Don't give paper wasters any ideas :)

  11. Gmail? by spiffydudex · · Score: 5, Informative

    While not open source, Gmail has a good search engine that isn't sluggish. Plus it has roughly 7.5 gigs of space to store data. Use IMAP to push all of your emails to the server and then use that Gmail account for archive email only.

    1. Re:Gmail? by siliconbits · · Score: 2, Insightful

      I second that. Invest in Google Apps to benefit from additional services as well.

    2. Re:Gmail? by Threni · · Score: 1

      Thirded. And if you're bothered about non-oss/future proofing etc, just download it all every now and again via the POP3 interface and burn it/keep it locally, accessible via Thunderbird (for example).

    3. Re:Gmail? by pvera · · Score: 3, Insightful

      Yes! The thing that appeals to me the most about using Gmail is that searching through 5+GB of old emails won't make everything in my machine slow to a crawl. Even with the free Gmail account, you can up the storage to 20GB for $5/year, and that extra space is available from other Google services connected to the same account.

      If you want to have more flexibility, sign up for a Backupify account, which can backup Gmail pretty well. As a bonus, when Backupify stores your backups they are kept in plain text format, so you can always pull these and move them elsewhere without having to worry about issues with Gmail's storage formats.

      --
      Pedro
      ----
      The Insomniac Coder
    4. Re:Gmail? by Anonymous Coward · · Score: 0

      Except that gmail doesn't index the full text of every e-mail - so searching old mail is a hit-and-miss affair.

    5. Re:Gmail? by Anonymous Coward · · Score: 0

      I totally agree.

      However... I've had problems with the transfer of my old mails: I used Thunderbird to 'mount' GMail via IMAP, and then just copied all my old mails over. As this took about 2-3h, I didn't babysit the process. Much later I found out that part of the mail was missing, because Thunderbird had apparently run into an error of some sort and stopped copying the data, without any error message that I would have noticed. No problem, as I can just re-copy the old data. But... Does anyone have advice for an idiot-proof Thunderbird-to-GMail copy process?
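One way to avoid silent failures is to script the upload and keep a list of everything the server did not accept. A rough sketch using Python's standard `imaplib` and `mailbox` modules; it assumes the old mail is available as an mbox file and that `conn` is an already logged-in `imaplib.IMAP4_SSL` connection. The function names and the folder name passed in are hypothetical, and nothing here is specific to GMail.

```python
import imaplib
import mailbox
import time
from email.utils import mktime_tz, parsedate_tz


def internal_date(message):
    """Use the message's own Date header if it parses, else the current time."""
    parsed = parsedate_tz(message.get("Date", ""))
    stamp = mktime_tz(parsed) if parsed else time.time()
    return imaplib.Time2Internaldate(stamp)


def upload_mbox(conn, mbox_path, folder):
    """APPEND each message to `folder`; return (uploaded_count, failures)."""
    uploaded = 0
    failures = []  # (mbox key, subject, reason) for every message not accepted
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for key, message in box.iteritems():
            subject = message.get("Subject", "(no subject)")
            try:
                status, _ = conn.append(
                    folder, None, internal_date(message), message.as_bytes()
                )
            except imaplib.IMAP4.error as exc:
                failures.append((key, subject, str(exc)))
                continue
            if status == "OK":
                uploaded += 1
            else:
                failures.append((key, subject, status))
    finally:
        box.close()
    return uploaded, failures
```

If `failures` comes back non-empty, those are the messages to retry, instead of finding out months later that something is missing.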

    6. Re:Gmail? by Americium · · Score: 0, Redundant

      Tried Google desktop? You can search your outlook/thunderbird mail in a blink of an eye, as well as anything else on your computer.

    7. Re:Gmail? by Capt.+Skinny · · Score: 1

      just download it all every now and again via the POP3 interface and burn it/keep it locally

      The only problem with that is you lose all of your tags/labels. All you manage to download is a TON of unsorted email.

    8. Re:Gmail? by Anonymous Coward · · Score: 0

      Along with sharing all that data with the 100 or so folks at Google who have the right passwords to get to the data. Heck that number is probably small. Yep, I know no one at my own company ever used a password for something they probably shouldn't have.

    9. Re:Gmail? by Anonymous Coward · · Score: 0

      While not open source, Gmail has a good search engine that isn't sluggish. Use IMAP to push all of your emails to the server and then use that Gmail account for archive email only.

      This is especially good for your privacy. Don't forget to push there the top 10 most embarrassing emails of your college days, and to publish the password in the clear on Facebook.

    10. Re:Gmail? by dave420 · · Score: 1

      Not this shit again. Do you have any idea how many emails go through Google's servers each second, and do you also have any idea of how banal the emails in question are? We're not talking about a service that has 5 subscribers, who each send 20 emails a day, with one user regularly emailing the President.

    11. Re:Gmail? by AtomicJake · · Score: 1

      Very bad idea:

      1) Privacy and possibility of identity theft.
      2) 7.5 gigs is nothing for 20 years of email - unless you do not use attachments. If you upgrade, you also share your real name and credit card etc. with Google.
      3) Your email address probably changed multiple times over those 20 years - do you want to change all emails in the sense that the email address needs to be changed?

    12. Re:Gmail? by AtomicJake · · Score: 1

      Not this shit again. Do you have any idea how many emails go through Google's servers each second, and do you also have any idea of how banal the emails in question are? We're not talking about a service that has 5 subscribers, who each send 20 emails a day, with one user regularly emailing the President.

      So what? If you have a Google account with all your personal data, and then have an issue with a Google employee (for whatever private matter; say he thinks that you cheated him when you sold him your car) - and all your private emails (and those with your car insurance) are at Google: have fun.

      Obviously this is true for all email services, which do not encrypt all messages and decrypt them basically in your email viewer. If you have all your emails at home (and a backup in a remote location), there should be no incentive to now upload them to a service.

    13. Re:Gmail? by dkf · · Score: 1

      Tried Google desktop? You can search your outlook/thunderbird mail in a blink of an eye, as well as anything else on your computer.

      TB3 searches multiple GB of email in seconds (well, for me it does) producing not just a list of matches but also a graph of the frequency of those matches over time.

      --
      "Little does he know, but there is no 'I' in 'Idiot'!"
    14. Re:Gmail? by Anonymous Coward · · Score: 0

      +1 here.

      also great support for loading archives of emails.
      And I love that you can access everything online, right when you need it

    15. Re:Gmail? by ljw1004 · · Score: 1

      Yes! The thing that appeals to me the most about using Gmail is that searching through 5+GB of old emails won't make everything in my machine slow to a crawl.

      Why so slow?

      I'm using Outlook as my mail client, connected to 3 GB of email archives on a Linux IMAPS server running in my basement. My laptop's about two years old now.

      It's plenty fast enough. I just tested it with a complete search for the phrase "happy birthday". It took a little over 2 seconds.

    16. Re:Gmail? by tha_mink · · Score: 1

      1) Privacy and possibility of identity theft. 2) 7.5 gigs is nothing for 20 years of email - unless you do not use attachments. If you upgrade, you also share your real name and credit card etc. with Google. 3) Your email address probably changed multiple times over those 20 years - do you want to change all emails in the sense that the email address needs to be changed?

      1. How is that different from any other mail provider? 2. Um, are you advocating not buying anything online ever? 3. WTF are you talking about? Why do you need to change the email address on old emails again? I'm thinking you don't know what you're talking about.

      --
      You'll have that sometimes...
    17. Re:Gmail? by AtomicJake · · Score: 1

      1) Privacy and possibility of identity theft.
      2) 7.5 gigs is nothing for 20 years of email - unless you do not use attachments. If you upgrade, you also share your real name and credit card etc. with Google.
      3) Your email address probably changed multiple times over those 20 years - do you want to change all emails in the sense that the email address needs to be changed?

      1. How is that different from any other mail provider?
      2. Um, are you advocating not buying anything online ever?
      3. WTF are you talking about? Why do you need to change the email address on old emails again?

      I'm thinking you don't know what you're talking about.

      1. It is clearly different from running your own email server. But it is also different from any email provider from which you POP your email and then store it locally.

      2. No; but I am saying that if you use a premium account then you add to all those private data also now a provable address and other information. That's also true for any book that you buy at Amazon, but you typically do not leave 10 GB private data with Amazon.

      3. If your first address was aj1990@xyz.edu and then 3 years later aj93@myhome.com and then changed again a couple of years later, it might be fun working with those emails over the Gmail or any IMAP interface. Think also about your "sent" folder, not only other folders.

    18. Re:Gmail? by Anonymous Coward · · Score: 0

      7.5GB, meh.

      For $256 per year, you can buy a whole terabyte of email storage space from Google: https://www.google.com/accounts/PurchaseStorage

    19. Re:Gmail? by Anonymous Coward · · Score: 1, Informative

      I used Google Email Uploader. It's perfect! It recreates Outlook subfolders as tags and imports all contacts at the same time. It took about 2 hours for 4000+ emails split between Outlook and Thunderbird. At the end it only warned about 2 email attachments (out of 4000+).
      Now I'm happy to have everything at the same place in the cloud.

    20. Re:Gmail? by tha_mink · · Score: 1

      1. It is clearly different from running your own email server. But it is also different from any email provider from which you POP your email and then store it locally. 2. No; but I am saying that if you use a premium account then you add to all those private data also now a provable address and other information. That's also true for any book that you buy at Amazon, but you typically do not leave 10 GB private data with Amazon. 3. If your first address was aj1990@xyz.edu and then 3 years later aj93@myhome.com and then changed again a couple of years later, it might be fun working with those emails over the Gmail or any IMAP interface. Think also about your "sent" folder, not only other folders.

      1. How is it different from any email provider that you POP your mail from? You don't think they're keeping a copy? Or at least the logs to rebuild one?
      2. So?
      3. What's the problem with that? I have 6GB of Gmail from 7 different accounts and I don't have a problem with working with any of it.

      --
      You'll have that sometimes...
    21. Re:Gmail? by Threni · · Score: 1

      It'll be sorted by time/subject. That's enough for me. Labels are handy to prioritize new email, but after that it's just mail. As long as I can search it, and read in order from the point where I've found it, that's plenty.

  12. OK, My Favorite by BoRegardless · · Score: 2, Interesting

    MailSteward on the Mac.

    SQL database. Good, Inexpensive, works w/many tens of thousands of emails & more.

    http://mailsteward.com/

    1. Re:OK, My Favorite by BoRegardless · · Score: 1

      Forgot to note a key factor: format independence. Email clients come and go over time, and MailSteward has many key output formats, so you are not restricted on that front.

      The search function is certainly a key for me, as sometimes I know only one key word in attempting to find a note about material, object or company from 15 years back.

    2. Re:OK, My Favorite by Anonymous Coward · · Score: 0

      I'm not trying to diss the app you mention, but tens of thousands of emails is not a lot of email... I'm pretty sure most mail clients can handle that (except maybe Outlook, I haven't had to use it for a while). There are people who handle >10000 new messages a month, and have millions of saved emails. Most of those are list emails that will never be read, but that doesn't mean they're useless.

    3. Re:OK, My Favorite by rthille · · Score: 1

      Yeah, Mail.app says I have 377,002 _unread_ messages in my Computers folder (including all the subfolders).
      Not sure how many total messages, or the total size. I asked Mail.app, and it's busy thinking about it.

      Spotlight seems to do pretty well, but sometimes I can be looking at a message, see a word, and ask spotlight and it says, "huh?".

      --
      Awesome furniture, accessories and cabinetry in Santa Rosa, CA: http://humanity-home.com/
    4. Re:OK, My Favorite by Kaboom13 · · Score: 1

      The number of messages you can store in modern versions of Outlook is effectively only limited by the file system/hard drive space available. Because Outlook uses a database for its back end, you will get best performance by archiving older mail out to a separate file, which it does by default. I've seen some users with truly monstrous .pst files.

  13. Mbox or SQLite by Anonymous Coward · · Score: 2, Insightful

    If you want an "email format" why not mbox? Many things currently support that as an import option.

    If you want a database, why not SQLite? It's about as open as can be, backwards compatibility is almost a religion, and it should have no problem with hundreds of thousands of entries.
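
    A minimal sketch of combining the two suggestions, assuming Python's standard `mailbox` and `sqlite3` modules. The `import_mbox` helper, the `messages` table, and its columns are made up for illustration; the raw message is stored untouched so nothing is lost in the conversion.

```python
import mailbox
import sqlite3


def import_mbox(mbox_path, db_path):
    """Copy every message in an mbox file into a SQLite table; return the count."""
    conn = sqlite3.connect(db_path)
    # Hypothetical schema: a few searchable headers plus the untouched raw message.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        "id INTEGER PRIMARY KEY, sender TEXT, subject TEXT, sent TEXT, raw BLOB)"
    )
    box = mailbox.mbox(mbox_path, create=False)
    count = 0
    try:
        for msg in box:
            conn.execute(
                "INSERT INTO messages (sender, subject, sent, raw) VALUES (?, ?, ?, ?)",
                (
                    str(msg.get("From", "")),
                    str(msg.get("Subject", "")),
                    str(msg.get("Date", "")),
                    msg.as_bytes(),
                ),
            )
            count += 1
        conn.commit()
    finally:
        box.close()
        conn.close()
    return count
```

    Keeping the raw bytes in one column means the archive can be exported back to mbox later, so the database is an index over the mail rather than a replacement for it.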

  14. Use gmail. by el_jake · · Score: 0, Redundant

    Migrate everything to Gmail. With Gmail you get room for your couple of GB. And the search feature works like a charm. Only thing missing is "folders" to make it act like you are used to.

    --
    In order to form an immaculate member of a flock of sheep one must, above all, be a sheep.
    1. Re:Use gmail. by pz · · Score: 1

      Migrate everything to Gmail. With Gmail you get room for your couple of GB. And the search feature works like a charm. Only thing missing is "folders" to make it act like you are used to.

      Although the searching features in GMail are great, I find the interface with a single unified sequence of mail, and lack of folders (the tagging feature is far too clunky) to be a major impediment. The biggest issue though, is that I do not own a copy of the information on my own server.

      --

      Put my fist through my alarm clock with its ding-dong death inside my ear. - The Blackjacks.
    2. Re:Use gmail. by mspohr · · Score: 1

      Gmail does not have folders but it does have tags. Tags can be used like folders but are more flexible since you can have more than one tag on a message. However, I have found that gmail's searching is so good that I don't even need to use the tags. Everything just goes into the "Archive" and the gmail search always finds what I want... quickly and easily.

      --
      I don't read your sig. Why are you reading mine?
  15. mbox + grep by Anonymous Coward · · Score: 5, Funny

    I use mbox format files and grep.

    IMO, one can't get much more portable than that.

    1. Re:mbox + grep by Anonymous Coward · · Score: 2, Informative

      I second that, and I also use mutt for this because it's so damn fast.

      1. cd Mail/old/
      2. grep -c pattern *
      3. mutt -f candidate-file
      4. use 'l' commands with patterns on mail fields, e.g. subject, from/to, body
      5. view limited message set in thread-sorted mode
      6. tag messages of interest
      7. save tagged messages to a small mbox, or attach them to a newly composed message and send

      it takes mutt about 3 seconds to load a 280 MB archive file with 16k messages on my machine, and less than a second to limit the display by recipient or about 3 seconds again to limit by keywords in the message body. I used to make an mbox per quarter year, but then I started merging them into one per year, as well as going back and purging some of the largest attachments. (Mutt also makes this easy: sorting by message size, then selectively saving and/or deleting attachments. There's usually 10-50 messages in a huge archive which are dramatically larger than the rest, and dealing with these makes the archive much more manageable.)
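
      For anyone without mutt at hand, steps 2 to 7 above can be roughly approximated with Python's standard `mailbox` module. This is only a sketch: the helper names `body_text` and `save_matches` are invented here, and matching is done on the subject and decoded text/plain parts only, which is much cruder than mutt's limit patterns.

```python
import mailbox
import re


def body_text(msg):
    """Return the decoded text/plain parts of a message as one string."""
    chunks = []
    for part in msg.walk():
        if part.get_content_type() != "text/plain":
            continue
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        try:
            chunks.append(payload.decode(charset, errors="replace"))
        except LookupError:
            # Unknown charset label: fall back to a lossy UTF-8 decode.
            chunks.append(payload.decode("utf-8", errors="replace"))
    return "\n".join(chunks)


def save_matches(src_path, dest_path, pattern):
    """Copy messages whose subject or body matches pattern into a small mbox."""
    regex = re.compile(pattern, re.IGNORECASE)
    src = mailbox.mbox(src_path, create=False)
    dest = mailbox.mbox(dest_path)
    hits = 0
    try:
        for msg in src:
            haystack = str(msg.get("Subject", "")) + "\n" + body_text(msg)
            if regex.search(haystack):
                dest.add(msg)
                hits += 1
        dest.flush()
    finally:
        src.close()
        dest.close()
    return hits
```

      The resulting small mbox can then be opened with `mutt -f` like any other archive file.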

    2. Re:mbox + grep by delmo · · Score: 1

      How is this 'funny'? That's how I've archived 15 years worth of email. Really.

    3. Re:mbox + grep by KiloByte · · Score: 1

      Yay mbox with more than 1000 plain text mails, or just several ones with attachments of the size business drones tend to send you.

      Maildirs are just as greppable, with none of the downsides.

      --
      The creatures outside looked from Alt-Right to Antifa; but already it was impossible to say which was which.
    4. Re:mbox + grep by Anonymous Coward · · Score: 0

      Why is parent modded Funny? This solution does work amazingly well.

    5. Re:mbox + grep by uiuyhn8i8 · · Score: 0

      >How is this 'funny'? That's how I've archived 15 years worth of email. Really.

      I don't understand the funny mod either. Even though I think I might be even more lo-tech. All outgoing mail for the last 15 years is concatenated to a plain text file (because that's what rmail in Emacs does) and old incoming mail is in an Emacs rmail file (also plain text). With less, grep and some shell or perl one-liners I can search 2GB faster than all the GUI-clickers.

  16. Use Gmail by Anonymous Coward · · Score: 0

    Gmail is the only mail service I know of that was designed from the ground up for easy searching and tagging (with Labels) your mail.

  17. Use MySQL or some other FOSS DB by perpenso · · Score: 0

    Abandon trying to do this with an email client app's archive. It is doubtful they are designed or tested with this amount of data in mind. Maybe you could set up your own email server with a web front end. Or perhaps the best route would be to use MySQL or some other database and build a web front end for browsing, searching, etc.

    1. Re:Use MySQL or some other FOSS DB by garyebickford · · Score: 1

      +1 on the database. If you can get the data into some sort of mail server temporarily, you can use procmail to parse the mail headers and generate SQL insertions. There's probably something newer - I used this method in 1998 to parse incoming mail from a remote server, that sent status updates every hour.

      Mail headers are not that difficult, so if you can get the data into a few standard formats (I don't know about the Outlook formats), you could even do this with a scripting language of your choice, directly from the file. Procmail is nice because it's very good about splitting the mail at the correct points. But, like I said, there's probably newer tools.

      In the database you only need fields for (off the top of my head) Date Sent, Date Delivered, To, From, Subject, All Headers, Body and Attachments, plus probably one separate table for the raw data with the same indices so you can augment it later with stuff like mail ID and threading, etc. Then run a Full Text index on the body and subject. You could get fancy with separate tables for all the different To and From, etc.
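
      A rough sketch of that layout, using SQLite instead of MySQL purely so the example is self-contained. The table and function names are hypothetical, only a subset of the fields listed above is included, and the full-text part assumes the SQLite build has the FTS5 extension compiled in.

```python
import sqlite3


def build_index(conn):
    """Create a hypothetical mail table plus an FTS5 index over subject and body."""
    conn.execute(
        "CREATE TABLE mail (id INTEGER PRIMARY KEY, date_sent TEXT, "
        "sender TEXT, recipient TEXT, subject TEXT, headers TEXT, body TEXT)"
    )
    # Assumes FTS5 is available; rows are keyed by the same rowid as the mail table.
    conn.execute("CREATE VIRTUAL TABLE mail_fts USING fts5(subject, body)")


def add_mail(conn, date_sent, sender, recipient, subject, headers, body):
    """Insert one message into the main table and the full-text index."""
    cur = conn.execute(
        "INSERT INTO mail (date_sent, sender, recipient, subject, headers, body) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (date_sent, sender, recipient, subject, headers, body),
    )
    conn.execute(
        "INSERT INTO mail_fts (rowid, subject, body) VALUES (?, ?, ?)",
        (cur.lastrowid, subject, body),
    )
    return cur.lastrowid


def search(conn, query):
    """Return (id, subject) pairs for messages matching an FTS5 query string."""
    rows = conn.execute(
        "SELECT mail.id, mail.subject FROM mail_fts "
        "JOIN mail ON mail.id = mail_fts.rowid "
        "WHERE mail_fts MATCH ? ORDER BY mail.id",
        (query,),
    )
    return rows.fetchall()
```

      Attachments and the raw message would go into separate columns or tables as described; they are left out here to keep the sketch short.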

      In the very early 1990s I built and sold a tool for the NeXT called MailQuery, which combined NeXTMail with a 'context aware full text semantic search engine' called Metamorph - presently part of the Texis text search system. That was cool. It was phenomenally good at letting you type in key words that were related to what you were looking for, and finding exactly the right email. You didn't have to remember the exact words - just the ideas, more or less.

      --
      It's easier to be a result of the past, but more fun to be a cause of the future! http://www.spacefinancegroup.com/
  18. Entourage? by Anonymous Coward · · Score: 0

    Why do you hate yourself?

  19. Courier-imap by mpol · · Score: 1

    I can advise a Linux server with Courier-imap. It's easy to centrally store your mail, and as long as it's on the internet you can reach it. Even from work, with friends, or on vacation.
    It's not really fast in my experience, but not terribly slow.
    And you can save things in Maildir format, which is universally supported. And it's easy to backup with some scripts.

    --

    Well, don't worry about that. We can get you back before you leave. (Dr. Who)
    1. Re:Courier-imap by Richy_T · · Score: 1

      Also easy to search and sort. And I don't mean with a mail client's search features but truly powerful tools like find, grep and xargs.

      +1 for IMAP/Maildir.

  20. Gmail by Quick+Reply · · Score: 0, Redundant

    Use Gmail like a normal person, not your requirements but close enough [insert solution for offline Gmail backup here because Google are proprietary and evil]

    1. Re:Gmail by Sancho · · Score: 1

      Lots of people have been suggesting gmail, and that's great for some. There are some significant limitations/constraints, though.

      1) I use the common "business identifier@vanitydomain.com" trick to help identify who is selling my e-mail address. Gmail has plus-addressing, which works reasonably well, however it is imperfect. Some spammers know about plus-addressing, and strip the plus.
      Google Apps for Domains would work, except that you're pretty limited in the number of addresses you can use without paying exorbitant (for these purposes) fees.

      2) Forwarding mail to Google destroys valuable header information. Redirecting mail can cause it to get blocked by the spam filter (sometimes so badly that it doesn't even make it into your spam folder.) So even keeping your own mail server and just bouncing everything up there isn't a viable solution.

      3) Having Google pop mail from your server is probably the most workable technical solution, but then Google has your password. Also, there are size limitations, in case you happen to have large attachments that you need to preserve.

      The OP may not have any of these issues, in which case Gmail is a great choice. Unfortunately, I'm looking for the same thing (searchability) and Gmail won't work for me.

      However, mairix works reasonably well.

    2. Re:Gmail by Anonymous Coward · · Score: 0

      You aren't necessarily limited to the number of addresses without paying. I do the same thing, but I just use a catch-all... that is, all mail that doesn't match an account comes to me. Then I can have that tagged and sorted as appropriate. This lets me have infinite addresses, without needing accounts. You only have to pay for accounts.

    3. Re:Gmail by tha_mink · · Score: 1

      1) I use the common "business identifier@vanitydomain.com" trick to help identify who is selling my e-mail address. Gmail has plus-addressing, which works reasonably well, however it is imperfect. Some spammers know about plus-addressing, and strip the plus. Google Apps for Domains would work, except that you're pretty limited in the number of addresses you can use without paying exorbitant (for these purposes) fees.

      Yeah. 50. You need more than that for your email address?

      2) Forwarding mail to Google destroys valuable header information. Redirecting mail can cause it to get blocked by the spam filter (sometimes so badly that it doesn't even make it into your spam folder.) So even keeping your own mail server and just bouncing everything up there isn't a viable solution.

      So, get Google to check it for you. Don't forward it, have Google check it with POP or IMAP for you. No problem, your headers stay intact and you're good to go.

      3) Having Google pop mail from your server is probably the most workable technical solution, but then Google has your password. Also, there are size limitations, in case you happen to have large attachments that you need to preserve.

      The size is pretty large to start and super cheap to increase.

      --
      You'll have that sometimes...
  21. not much by Anonymous Coward · · Score: 0
  22. Maildir by alexhs · · Score: 4, Informative

    Maildir.

    And if you have an e-mail client that doesn't support it, use an IMAP server to feed your client. /thread

    --
    I have discovered a truly marvelous proof of killer sig, which this margin is too narrow to contain.
    1. Re:Maildir by El_Muerte_TDS · · Score: 3, Informative

      mairix is a useful addition to a maildir setup: http://www.rpcurnow.force9.co.uk/mairix/

    2. Re:Maildir by Anonymous Coward · · Score: 2, Informative

      Maildir.

      And if you have an e-mail client that doesn't support it, use an IMAP server to feed your client. /thread

      With the proviso that you probably want to break up your archives in something akin to the following format:
      . 2009
      . . Q1
      . . . Sent
      . . . Received
      . . Q2
      . . . Sent
      . . . Received
      [...]
      . 2010
      . . Q2
      . . . Sent
      . . . Received

      Lots and lots of messages in a directory can cause problems with many file systems. If you have more than ~8K or so messages in a folder, I'd recommend breaking it up. This is what I do at work (CY/Qx/Sent-Received), which also allows me to move entire quarters into PST files when I hit my quota, but also keep a bunch of stuff online for access via web mail. Instead of quarters you can also do "halves"--H1 (Jan-Jun), H2 (Jul-Dec)--and then Sent/Received within that.
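
      The year/quarter layout above is easy to compute from a message's Date header. A small sketch using Python's standard `email.utils`; the function name `archive_folder` and the exact path format are assumptions made for illustration.

```python
from email.utils import parsedate_to_datetime


def archive_folder(date_header, sent):
    """Map an RFC 2822 Date header to a 'YYYY/Qn/Sent' or 'YYYY/Qn/Received' path."""
    when = parsedate_to_datetime(date_header)
    quarter = (when.month - 1) // 3 + 1  # Jan-Mar -> 1, Apr-Jun -> 2, ...
    return f"{when.year}/Q{quarter}/{'Sent' if sent else 'Received'}"


# A message received in April 2009 lands in the second quarter.
print(archive_folder("Tue, 14 Apr 2009 10:00:00 +0000", sent=False))  # 2009/Q2/Received
```

      Switching to halves is a matter of replacing the quarter arithmetic with `1 if when.month <= 6 else 2`.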

      For my personal account, I generally haven't broken 4K messages per year yet, and so simply have (CalendarYear/Sent-Received). I currently have everything going back to about 1997 or so.

      Except for sorting mailing list traffic, I don't use folders besides Sent & Received: new messages go into my Inbox or a list folder. If I want to keep the message, it always goes into Received regardless of where it originally arrived. I try to clear out the mailing list folders daily so there's no build-up (select-all, delete). If I want a message that I got, I know it's going to be in one of the Received folders, and so don't have to go digging in a dozen different places trying to figure out some sort of classification system.

      Work is Exchange/Outlook; personal is IMAP.

    3. Re:Maildir by houghi · · Score: 1

      Mod parent up. I would also go for the most simple way for storing and that would be plain text.

      --
      Don't fight for your country, if your country does not fight for you.
    4. Re:Maildir by Tablizer · · Score: 1

      Expanding each message to an individual file can be problematic on some servers. Not only can it take up more space that way, but it may take a long time for a directory listing to process or finish. Test first.

    5. Re:Maildir by pooh666 · · Score: 1

      At least one person here has a clue. When you get into many GBs of email, one email per file starts to not scale very well. Great for a year's worth of data, but what if you have 10 like me? Some suggestions on things like DBmail start to come in then. I don't know wth bitrot is, but I think in general that is what backups are for, and they couldn't be easier with a DB.

    6. Re:Maildir by Sancho · · Score: 2, Informative

      mairix is good, but it has some warts and it is not under development anymore. Among other things, it can run out of memory, has problems with parsing certain multipart messages, and can't search for an IP address (or any other string with dot-separated tokens.)

      It's about the best I've found, but I wish someone would pick up development and fix some of the issues. As time goes on, bit-rot is going to set in and mairix will get less and less useful.

    7. Re:Maildir by Have+Brain+Will+Rent · · Score: 1

      Why do you say that? Are you concerned with the amount of storage consumed by 1 file per message? Is it the search time to go through a large directory?

      --
      The tyrant will always find a pretext for his tyranny - Aesop
    8. Re:Maildir by Have+Brain+Will+Rent · · Score: 1

      Although it used to be common it has been a long time since I've seen a *nix system that had serious problems with large directories. If it does become a problem it is easy to add another level to the directory tree, e.g. by year.

      --
      The tyrant will always find a pretext for his tyranny - Aesop
    9. Re:Maildir by jgrahn · · Score: 2, Insightful

      Maildir storage format is resistant to bit-rot because it stores each message in a separate file, and uses filesystem directories for mail folders. It's widely supported by user agents (mail readers) and IMAP/POP3/SMTP servers, so you'll never be stranded by the actions of a single software vendor. Finally, it's easily searched using everyday unix tools - find, grep, sed, awk, etc., and you can use the full-text search engine of your choice for speedy searches.

      The only sane alternatives are, as far as I'm concerned:

      • a collection of mbox files
      • a collection of gzipped mbox files
      • a collection of Maildir folders
      • a collection of tarred and gzipped Maildir folders

      Maildir isn't quite as well supported as mbox, but I suppose it's sometimes more convenient to grep these since you get a hit on the particular mail you're searching for, not the mbox file which contains that mail and a thousand others.

      I use gzipped mbox files. One thing I have considered doing is to convert away Quoted-Printable MIME encoding and use Latin 1 (or UTF-8) everywhere. That would make the mboxes easier to use with standard tools like text editors and grep.
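
      The conversion away from Quoted-Printable can be sketched with Python's standard `email` package. The helper name `decoded_text` is invented here, and this only returns the decoded text parts; it does not rewrite the archive in place, which would need more care around headers and attachments.

```python
from email import message_from_bytes


def decoded_text(raw_message):
    """Return the text parts of a raw message with transfer encodings undone."""
    msg = message_from_bytes(raw_message)
    pieces = []
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue
        # decode=True undoes quoted-printable and base64 transfer encodings.
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        charset = part.get_content_charset() or "latin-1"
        try:
            pieces.append(payload.decode(charset, errors="replace"))
        except LookupError:
            # Unknown charset label: Latin 1 never fails to decode.
            pieces.append(payload.decode("latin-1"))
    return "\n".join(pieces)
```

      The returned string can be written out as UTF-8, which makes the text greppable without worrying about `=E9`-style escapes.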

      I would never use a database for this. It serves no purpose, except as an invitation for the fuckup fairy. The searches you'd want to do are free-text searches anyway.

    10. Re:Maildir by dkf · · Score: 1

      I would never use a database for this. It serves no purpose, except as an invitation for the fuckup fairy.
      The searches you'd want to do are free-text searches anyway.

      A lot of database engines now support free-text searching (though you have to build appropriate indices first).

      --
      "Little does he know, but there is no 'I' in 'Idiot'!"
    11. Re:Maildir by pooh666 · · Score: 1

      How many files can you have in a dir? It might not be an issue now but with ext2 there was a limit, and reaching it did very bad things. I ran into that at the 65K files mark one time and you can't even delete things like normal because you can't access the files you want to remove. You can't save new emails, and so your mail server has nowhere to store anything. As always the hybrid of a few tables/files mixed with DB-like access will work better for many more files. That is why I can have 60 million records in my db and not care about file system issues like the above. Who came up with one file per email anyway? Why is it all your eggs in one basket or else one big giant basket? Email is old, that is why. And people keep making the same stupid mistakes with it over and over, because there are still lots of howtos doing it wrong or "good enough" for now.

    12. Re:Maildir by Bigjeff5 · · Score: 1

      Most file systems have a limit on the number of files in a directory.

      Since the idea is to be platform independent, this is something that needs to be dealt with for the OP to have an effective solution.

      --
      Security is mostly a superstition... Avoiding danger is no safer in the long run than outright exposure. - Helen Keller
  23. Backwards and Forwards Compatibility by Anonymous Coward · · Score: 0

    Um, the same way people have been doing it since email was invented, text files (with base64 for those binary bits'n'pieces). Only way to be sure.

  24. Re:Psychiatric consultation! by Anonymous Coward · · Score: 1, Funny

    You, sir, are a mental case! I suspect you have OCD with some component of Asperger's that is making you have this fixation on doing all this work to save ancient bits of information.

    You, sir, are a jerk! I suspect you have low self-esteem with some component of hemorrhoids that is making you have this fixation on being rude.

  25. Good IMAP Server by caffeinejolt · · Score: 5, Informative

    If this is really important to you, and you want it all to work across multiple workstations/OSes, your best bet will be to store it all in IMAP. If you have the means and motivation to run this yourself, I would recommend Dovecot. If you don't have the means and motivation, then you can use a service like Gmail to run your IMAP although you give up certain freedoms in doing so. For example, I use Dovecot coupled with Maildir++ as the physical storage format - as a result I can (if I wanted to) change to any email client I wish very quickly, use different email clients at the same time, etc.

    1. Re:Good IMAP Server by AtomicJake · · Score: 1

      Have fun searching 20 years of emails via the IMAP protocol ...

    2. Re:Good IMAP Server by caffeinejolt · · Score: 1

      That's why I recommended Dovecot - it uses indexes which make searching 20 years of emails very possible.

    3. Re:Good IMAP Server by AtomicJake · · Score: 1

      I used Dovecot - but a quite old version. At the time I used it, searching 2 years' worth of emails was already slow (from Thunderbird). Another issue that I had was that the search was not really comfortable - but this might have been an issue with Thunderbird at that time.

  26. Re:Psychiatric consultation! by pz · · Score: 4, Insightful

    You, sir, are a mental case! I suspect you have OCD with some component of Asperger's that is making you have this fixation on doing all this work to save ancient bits of information.

    How was this modded Informative? Saving correspondence for future reference is critically important. I have many times needed to refer back to messages that are years old, in order to pull up a vital bit of information that was suddenly relevant. I have needed to pull up an attachment from an email a few months old, or view the exact wording of correspondence, check the date of a quotation, etc., more times than I can count, so searching and retrieval are both vitally important. When I run events, I need to be able to post-hoc review all of the correspondence for demographic analysis, often done two years after the event when the final reports are being written. Saying that this sort of behavior is odd, or not normal, is either being a troll, or not understanding how the world works when you're not just a drone.

    IMO, this is one of the best Slashdot questions ever, and I am greatly anticipating hearing some good answers, especially if they don't include suggesting GMail as a panacea, as I want to have the email text and attachments in my possession.

    --

    Put my fist through my alarm clock with its ding-dong death inside my ear. - The Blackjacks.
  27. Maildir by roderickm · · Score: 4, Interesting

    Maildir storage format is resistant to bit-rot because it stores each message in a separate file, and uses filesystem directories for mail folders. It's widely supported by user agents (mail readers) and IMAP/POP3/SMTP servers, so you'll never be stranded by the actions of a single software vendor. Finally, it's easily searched using everyday unix tools - find, grep, sed, awk, etc., and you can use the full-text search engine of your choice for speedy searches.
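
    Because each message is its own file under `cur/` or `new/`, a grep-style search needs nothing more than a directory walk. A minimal sketch in Python; `grep_maildir` is a made-up helper, and it does a raw byte match, so text hidden behind base64 or quoted-printable encoding will not be found this way.

```python
import os


def grep_maildir(maildir_path, needle):
    """Return paths of message files under cur/ and new/ that contain needle (bytes)."""
    hits = []
    for sub in ("cur", "new"):
        folder = os.path.join(maildir_path, sub)
        if not os.path.isdir(folder):
            continue
        for name in sorted(os.listdir(folder)):
            path = os.path.join(folder, name)
            with open(path, "rb") as handle:
                if needle in handle.read():
                    hits.append(path)
    return hits
```

    Each hit is the path of one message, not of a whole mailbox file, which is the practical advantage over grepping an mbox.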

  28. Re:Psychiatric consultation! by sakdoctor · · Score: 1

    I would use a computer older than your dad just to use as an alarm clock, but I just can't help upgrading.

  29. Re:Psychiatric consultation! by Cylix · · Score: 4, Interesting

    I never thought of turning an ancient host into an alarm clock.

    Once however, I did hollow out an SGI case and turn it into a refrigerator.

    The case was just too damned pretty to throw away.

    --
    "You should always go to other people's funerals; otherwise, they won't come to yours." -- Yogi Berra
  30. citadel by samjam · · Score: 3, Informative

    citadel at www.citadel.org is a full pop3/imap server with full-text indexing.

    Thunderbird can use server-side searches to find messages, and I find that works pretty well.

  31. Storage and search can be different problems by greg1104 · · Score: 1

    Have you looked at Archiveopteryx? That is one potential solution to the storage side of the problem. It stores the messages into a PostgreSQL database with minimal tinkering, so you can always get the original plain text stuff back out again. Consider it a database of mbox files that exposes an IMAP interface. You can't get any less proprietary than Postgres, and you can scale up many of its operations using standard database approaches in that area.

    What I would do here is store messages there as my permanent store for them, dump periodically to full plain-text backups just for disaster recovery, then experiment with search software that runs on top of it using IMAP as the transport. There I don't have any specific advice. Ultimately it should be possible to extend Archiveopteryx to handle that too--PostgreSQL has decent full-text search built in--but I don't know of anybody working on that.

    Probably easier to break this into two pieces, get a robust solution for the storage side, and then see what clients have search capabilities you like that won't choke on importing your data.

    1. Re:Storage and search can be different problems by Scotch42 · · Score: 1

      The Archiveopteryx solution is the one I have set up for a customer of mine with similar requirements. He has about 600GB of email now... You can access the email through IMAP with any client, or use phppgadmin or any other tool to access the database directly with any custom queries you need.

  32. An Advertiser's Fantasy ... by perpenso · · Score: 5, Interesting

    And now the poster becomes an advertiser's dream come true in addition to being a hostile lawyer's dream come true. ;-)

    Remember that from Google's perspective gmail is a tool to better profile you for targeted advertising. Make sure you are OK with that before giving them access to all your emails.

    1. Re:An Advertiser's Fantasy ... by Nemilar · · Score: 2, Interesting

      OK, so I hear this a lot and I never really understand the problem.

      The "unwritten gmail contract" (and it actually applies to most Google products) is this: We will give you a service for free (in this case Gmail), and in return we are going to profile your use of that service to select ads for you. In the case of gmail, they give you however many GB of storage, always-on cloud email, and the best searchable email system I've ever seen. There are other Google examples, from gtalk to Google Docs. The basic principle behind it is the same, most people understand the deal, and I don't see anything wrong with it. There's no such thing as a free lunch, but this is pretty close.

      --
      Nemilar http://www.techthrob.com - Visit Me!
    2. Re:An Advertiser's Fantasy ... by camperdave · · Score: 1

      By storing personal data on gmail, you are one hack away from identity theft. I prefer to keep as few personal details on the net as possible.

      --
      When our name is on the back of your car, we're behind you all the way!
    3. Re:An Advertiser's Fantasy ... by perpenso · · Score: 1

      There is nothing unwritten about it. Google is quite up front in their agreement that they data mine your emails for targeted advertising purposes. I agree that there is nothing wrong with this, but I disagree that most people are aware of this.

    4. Re:An Advertiser's Fantasy ... by wvmarle · · Score: 1

      Until they start selling that information about you to third parties. Google having a profile about me that's used in house to target ads to me, is OKish acceptable. Them selling this info to third parties is a definite no-go. And there is nothing that I am aware of preventing them doing just that, other than their own ethics.

    5. Re:An Advertiser's Fantasy ... by jridley · · Score: 1

      Yup, I'm really highly concerned that an advertiser might learn that I like electronics and am a huge computer geek. Because there's no other way they could know that.

      Seriously, this is what I did; I pushed everything to GMail, like the OP, tens of thousands of emails, going back to the 90s.

      Email is not and has never been a secure medium, so if you've been putting sensitive data in emails, you're not being really bright anyway.

    6. Re:An Advertiser's Fantasy ... by Idiomatick · · Score: 1

      By keeping it at home you are one failure from losing it all.

    7. Re:An Advertiser's Fantasy ... by zekele2 · · Score: 2, Informative

      This is why I pay Google for handling my email. I use Google Apps Premier Edition with my own domain. At $50 per user per year, it's cheaper than paying for Office/Outlook, there's 25GB of space per user, and NO advertising. Using my own domain means there is no lock-in; I can use IMAP and switch to another provider any time.

    8. Re:An Advertiser's Fantasy ... by Nutria · · Score: 1

      By keeping it at home you are one failure from losing it all.

      That's a big, steaming pile of horse gonads.

      External hard drives are a cheap, easy way to back up all your data.

      --
      "I don't know, therefore Aliens" Wafflebox1
    9. Re:An Advertiser's Fantasy ... by russotto · · Score: 1

      External hard drives are a cheap, easy way to back up all your data.

      You're still one fire away from losing it all. If it's really all that important, some sort of off-site backup is a necessity.

    10. Re:An Advertiser's Fantasy ... by Lost+Race · · Score: 1

      External notebook drive + safe deposit box. Full snapshot of the filesystem each day (magical rsync scripts using --link-dest) with full-drive encryption and one or two visits to the bank each month. House burns down, I lose a couple weeks of current events.
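
      For anyone wondering what the "magical" part looks like, here is a rough sketch of how such a script might assemble the rsync call. The function name snapshot_command and the YYYY-MM-DD directory naming are my assumptions, not the exact script described above; it assumes the backup root holds nothing but snapshot directories, and the encryption is left to the drive itself:

```python
import datetime
from pathlib import Path


def snapshot_command(source, backup_root, today=None):
    """Build the rsync argv for today's snapshot, hard-linking against the newest older one."""
    today = today or datetime.date.today()
    root = Path(backup_root).resolve()  # --link-dest is safest as an absolute path
    target = root / today.isoformat()   # e.g. <root>/2010-07-02
    older = []
    if root.is_dir():
        # Assumes every directory in the root is a snapshot named YYYY-MM-DD.
        older = sorted(p for p in root.iterdir() if p.is_dir() and p.name < target.name)
    cmd = ["rsync", "-a", "--delete"]
    if older:
        # Files unchanged since the previous snapshot become hard links, not copies.
        cmd.append(f"--link-dest={older[-1]}")
    cmd += [str(source).rstrip("/") + "/", str(target)]
    return cmd
```

      Passing the returned list to subprocess.run(cmd, check=True) would perform the copy. Each dated directory then looks like a full snapshot while only changed files take up new space.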

      It's so easy and so secure, I cannot understand why nobody else seems to do this.

    11. Re:An Advertiser's Fantasy ... by vakuona · · Score: 1

      It's so easy and so secure, I cannot understand why nobody else seems to do this.

      Because it's easier and more sane to just create a really secure password.

    12. Re:An Advertiser's Fantasy ... by Nutria · · Score: 1

      safe deposit box.

      To heck with that...

      Your parents' house, girlfriend's apartment, etc is perfectly adequate.

      --
      "I don't know, therefore Aliens" Wafflebox1
    13. Re:An Advertiser's Fantasy ... by Lost+Race · · Score: 1

      Sure, that works too. "Safe deposit box" is just an example of secure off-site storage. If you already have one anyway it's about as easy as any other location, especially if your girlfriend isn't off-site and your parents are far away and you work at home and your friends have an almost supernatural ability to misplace any item....

    14. Re:An Advertiser's Fantasy ... by unger · · Score: 1

      imho, most people do not understand the deal.

      and very few people recognize a more subtle ethical dilemma. by agreeing to google's tos not only are you giving up your privacy, you are also surrendering the privacy of the people with whom you send and receive email. something almost everyone would acknowledge that they don't have the right to do without first acquiring consent.

    15. Re:An Advertiser's Fantasy ... by tehcyder · · Score: 1

      The "unwritten gmail contract" (and it actually applies to most Google products) is this: We will give you a service for free (in this case Gmail), and in return we are going to profile your use of that service to select ads for you.

      is not worth the paper it's written on,

      In the case of gmail, they give you however many GB of storage, always-on cloud email, and the best searchable email system I've ever seen. There are other Google examples, from gtalk to Google Docs. The basic principle behind it is the same, most people understand the deal, and I don't see anything wrong with it. There's no such thing as a free lunch, but this is pretty close.

      Are you sure about that?

      --
      To have a right to do a thing is not at all the same as to be right in doing it
    16. Re:An Advertiser's Fantasy ... by Anonymous Coward · · Score: 0

      That's not correct at all. Yes, of course they use the text for advertising, but only the message that is open at the moment. It is the usual Google AdSense wrapper, like the one you use on your site; it just reads the stuff that is actually visible.

    17. Re:An Advertiser's Fantasy ... by Anonymous Coward · · Score: 0

      Or if you don't have a girlfriend, and your parents' house is upstairs...

    18. Re:An Advertiser's Fantasy ... by Anonymous Coward · · Score: 0

      Good! I want to be profiled so that Google can offer me products that I am interested in and provide links that may be of value to ME! Let's face it, no matter where you go, you are going to get bombarded with advertisements. Why not make them meaningful to you?

  33. IMAP for storage by Anonymous Coward · · Score: 0

    Use a suitable IMAP server with an appropriate storage backend to store all that email. No matter the backend storage the daemon you choose uses, your email will always be accessible in an open, standard protocol by any (many) IMAP-enabled mail clients!

  34. POO (Plain Old Outlook) by CatoNine · · Score: 1

    Hate to break it here, but since 1990 I've been storing *all* my mail (and calendar and SMSes) in a plain old Outlook PST archive file. It is a fairly good and flexible database format with lots of import/export and search options. Future compatibility is well guaranteed. To keep it snappy, I've been systematically removing big attachments (documents and pictures), possibly replacing them with a textual reference to where they are stored elsewhere on disk. I know, I know, low tech and the Borg, but future proof for now :-).

    1. Re:POO (Plain Old Outlook) by Anonymous Coward · · Score: 0

      Did you RTFS at all? Outlook fails on every requirement...cross-platform, open format and quickly searchable.

    2. Re:POO (Plain Old Outlook) by dakohli · · Score: 2, Insightful

      I have to say that PST's can be convenient. However, I have seen many corrupted PST's over the years, and yes I know that there are tools to fix this, but the name of the game here is to actually get your emails out with a minimum of fuss. Also, as to compatibility, I know MS has arbitrarily changed the format of Word. There is nothing to stop them from doing the same to the PST format, and there are several versions of that in existence now. Add this to the fact that as the PST's get bigger, performance drops off. As a really easy expedient solution, using PST's will work, but not well. Using them as a solution for the problem however, I think it will only compound the issues in the long run.

    3. Re:POO (Plain Old Outlook) by dbIII · · Score: 1

      Not very future proof when you have certain failure at 2GB and only have very slow third party tools to get things back.
      Outlook not so good.

    4. Re:POO (Plain Old Outlook) by Anonymous Coward · · Score: 0

      Nah, no hard limit since 2003: "By default, Outlook .pst and .ost files that are created by using Office Outlook 2003 and later versions are in the updated Unicode file format that allows larger file sizes (the 2-gigabyte (GB) limit is eliminated). The data file format (non-Unicode ANSI) used by Outlook 2002 and earlier is also supported by Outlook 2010."

    5. Re:POO (Plain Old Outlook) by Cato · · Score: 1

      I have had various corrupted PSTs over the years - strangely, more on Outlook 2007 than earlier versions. Most recently, Outlook managed to corrupt two files that are archives of previous years' emails, so no changes at all should have been made - quite an achievement to mess those up. One of the most annoying messages from Outlook 2007 is that the PST file is too full, and you need to delete some messages - I get this on opening the PST file and of course can't delete anything... nice Catch-22.

    6. Re:POO (Plain Old Outlook) by dakohli · · Score: 1

      Sadly I have seen that all too often. Working as a Sys Admin, a few years back, we were on Outlook 2000, and the PST's failed regularly. Sometimes we were able to restore a file from backup, but invariably the more recent emails that were needed were just not there. At the time we were not subject to any requirements that made mail restoration mandatory. I am sure that these have made life much easier for those that keep everything, and need them restored from time to time.

  35. grep by vanye · · Score: 1

    You can laugh, but it's almost good enough for what I need.

    All my archived email (93-2004) was copied to a NAS as individual messages (still have the Cyrus directory structure). It's the more recent stuff that lives in PSTs that is the problem.

    One day I'll get around to doing the same for my news postings. That's where the nuggets of interest are.

  36. my solution by je+ne+sais+quoi · · Score: 1

    I'll chime in with my own solution. My archive is not as extensive as yours but I have most everything from 2005 or so (excepting mailing lists, other junk, etc.). My solution is sort of silly: I just use Apple's Mail.app. I use it because Mail.app lets you store and organize everything in separate folders, and because Spotlight is blazingly fast and does a great job of searching. I try to keep the number of messages in a folder on the order of a few thousand; for my e-mail load I find that breaking up the folders by year works well (yes, you can still search across years). The folders themselves are stored under ~/Library/Mail/Mailboxes. Each folder has its own directory and a series of .emlx files, which are an Apple-specific form of XML with one message per file. The problem with this solution is that the emlx files are proprietary and subject to change. That said, I have successfully managed to copy mailboxes to new computers with a new OS. It did require an extra step or two beyond just copying my Mailboxes directories to the new computer, however. Worst case though, the emlx files are plain text, so you can grep through them if you really have to (e.g. if you're logged onto the computer remotely), or you could write a script that parses most of the information from the file.
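
    Such a script could be quite short. A sketch, assuming the layout that is commonly described for .emlx files (a first line holding a byte count, then that many bytes of ordinary message text, then an XML property list with Mail.app's metadata). I have not checked this against every Mail.app version, and parse_emlx is just a name I picked:

```python
import email
import plistlib
from email import policy


def parse_emlx(data):
    """Split raw .emlx bytes into (message, metadata dict)."""
    first_line, _, rest = data.partition(b"\n")
    length = int(first_line.strip())  # assumed: byte count of the message that follows
    message = email.message_from_bytes(rest[:length], policy=policy.default)
    trailer = rest[length:].strip()   # assumed: XML property list with flags etc.
    metadata = plistlib.loads(trailer) if trailer else {}
    return message, metadata
```

    Feeding it open(path, "rb").read() for each file gives you a normal message object, so Subject, From and the body can be read without Mail.app installed.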

    --
    Gentlemen! You can't fight in here, this is the war room!
  37. Re:Psychiatric consultation! by DigiShaman · · Score: 1

    Na, he's probably a lawyer.

    That's right, I'm looking at you Mr. "I've got a 22GB mailbox on the new Exchange 2007 system". Quotas, learn em, love em, use em!

    --
    Life is not for the lazy.
  38. Donate your archive to science ... by perpenso · · Score: 1

    In your will, donate your archive to science. I'm sure it would make an interesting thesis project for some PhD candidates out there. I'm serious, consider this.

    1. Re:Donate your archive to science ... by Anonymous Coward · · Score: 0

      Yeah, in the psychology department, under "personality disorders".

    2. Re:Donate your archive to science ... by perpenso · · Score: 1

      Actually I was thinking along the lines of social/cultural anthropology. Much as some researchers are now digging through land fills from the 1940s and 1950s, corporate and government document archives, etc. Tossing in a few personal document archives might be interesting.

  39. Fairly reliable way to get mail out of old clients by bruns · · Score: 1

    There's one method I've used fairly often in the past for getting mail out of an older client, provided the older client supports imap (lookout and lookout express do).

    First, set up a new account on your imap server just for archival purposes (you can set up an imap server on any UNIX/Linux distro and even Windows with Cygwin fairly easily - dovecot is a good place to start). Make sure it's using either mbox or maildir (preferred).

    Second, set up said account on all the mail clients you'd like to archive. Make sure you are setting them up as imap and not pop3.

    Third, drag the contents of each local folder/inbox/etc to a folder on the archive-specific imap account. It will take a while, but the entire contents of your mailbox will be copied over, message by message, in imap's way of doing things, then deposited by the imap server into the local format of your choice.

    You've just created flat text versions of client-specific archives. Create folders, sub folders, etc and organize things in your modern client which can easily do imap. You can easily search with any of numerous free packages, archive and compress permanently with squashfs, or even just leave them available through imap to search with the new Thunderbird's (3.1) global indexer.
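
    If one of the old clients can export to an mbox file, the dragging can also be scripted. A minimal sketch using Python's standard mailbox and imaplib modules; upload_mbox is a made-up name, conn is assumed to be an already logged-in imaplib.IMAP4 or IMAP4_SSL object, and because no date is passed the server will stamp each message with the upload time:

```python
import mailbox


def upload_mbox(conn, mbox_path, folder):
    """Append every message in a local mbox file to `folder` on an IMAP server."""
    conn.create(folder)  # if the folder already exists the server just answers NO
    box = mailbox.mbox(mbox_path, create=False)
    uploaded = 0
    try:
        for key in box.keys():
            raw = box.get_bytes(key)  # message without the mbox "From " separator line
            typ, detail = conn.append(folder, None, None, raw)
            if typ != "OK":
                raise RuntimeError(f"APPEND failed for message {key}: {detail!r}")
            uploaded += 1
    finally:
        box.close()
    return uploaded
```

    The server still writes each message into its own storage format, exactly as with the drag and drop route; this just saves the mouse work.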

    --
    Brielle
  40. Stuff it in a server by SplatMan_DK · · Score: 1

    You should put all that stuff on an IMAP server on your home network (preferably a box you can reach from the outside using DDNS or a static entry if you have your own domain).

    In that way your client OS'es can be whatever platform you choose, and they will all be able to access your mail storage.

    Put older mails in separate folders.

    If you can work with Linux there are plenty of choices. If not, consider Windows Home Server and get a mailserver product for Windows - there are plenty!

    Many advanced email clients, such as Outlook or Evolution, will allow you to search for mails based on any criteria you like (subject, sender, body, date, etc). Hmmm except perhaps the actual mail header ;-)

    Personally I would never do this though. Generating and saving data is easy - limiting it is hard. Consider deleting stuff - you could start by deleting everything older than 36 months. The more you have to search through the more difficult it gets. In the end finding a single mail will be (or in your case: IS) like a needle in a haystack ...

    Also, why save all mails? Every time you reply to a mail a copy of the original mail is often included in your answer. So from today, consider deleting all inbound mails that you reply to ;-)

    - Jesper

    --
    My security clearance is so high I have to kill myself if I remember I have it...
  41. Python! by BertieBaggio · · Score: 1

    While this answer will almost certainly not suit the OP, it may be of interest to other folk looking to archive their email. Using python and a combination of imaplib and some basic file I/O you can save the original text of messages. My rationale for this was firstly that it's probably less problematic than converting between various email client formats; and secondly that it's a decent way to learn some python! ;)

    My rather basic implementation just dumps every email from an (IMAP) folder sequentially. I rely on grep for searching. However, it does have the prerequisite of the email being stored on a mailserver accessible via IMAP.
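
    For the curious, the core of it looks roughly like the sketch below. This is not my exact script; the function name dump_folder and the numbered .eml file names are just for illustration, and conn is assumed to be a logged-in imaplib.IMAP4 or IMAP4_SSL connection:

```python
from pathlib import Path


def dump_folder(conn, folder, out_dir):
    """Save every message in an IMAP folder as a numbered .eml file."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    typ, _ = conn.select(folder, readonly=True)  # read-only, so flags stay untouched
    if typ != "OK":
        raise RuntimeError(f"cannot open folder {folder!r}")
    typ, data = conn.search(None, "ALL")
    saved = []
    for num in data[0].split():
        typ, parts = conn.fetch(num, "(RFC822)")
        raw = parts[0][1]  # parts[0] is an (envelope, message bytes) pair
        path = out / f"{int(num):06d}.eml"
        path.write_bytes(raw)  # original text, byte for byte
        saved.append(path)
    return saved
```

    The file names are IMAP sequence numbers, which change when messages are deleted, so treat them as a dump order rather than a permanent ID.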

    --
    If all you have is a grenade, pretty soon every problem looks like a foxhole -- MightyYar
  42. Look at it a different way by Eristone · · Score: 1

    Scary thought, but you might just want to pick up one of the tools that the lawyers use for electronic discovery. They cover multiple mail formats (including older generations of said formats) and set it up so that it's easy for an intern to search for keywords and the like, so someone who understands tech should be able to use it. I've had to use the Clearwell appliance and it did what it was supposed to do, including finding attachments and indexing them for ease of search. (No, I don't work for Clearwell, and wouldn't have used their tool at all except for t.. er anyways)

  43. Roll your own... by Anonymous Coward · · Score: 0

    This sounds like the perfect time to roll your own software to do what you are looking for. Use a LAMP stack, write or use a few format converters, voila! you're done!

  44. Making things easier. by Anonymous Coward · · Score: 1, Informative

    To help spare you the precious keystrokes it would take to Google this yourself, you can go straight to “Google Apps for Businesses” and sign-up. Now did you really have to Ask Slashdot?

  45. Re:Psychiatric consultation! by garcia · · Score: 2, Interesting

    Starting with GMail I have kept every e-mail since 6/22/2004. I also brought over many e-mails I had in my saved folders from long before that. Am I insane? No. I have found this archive incredibly useful for any variety of uses even 6 years later.

    Nothing like having your wife ask, "man, I wish we still had the recipe for deviled eggs we made in college. Too bad it was back in 2001." "No problem honey, hold."

    Date: Fri, 26 Jan 2001 13:40:46 -0500
    From: yoyoskippy
    To: garcia@tigerose.com (now dead, have at it spammers)
    Subject: Deviled eggs

    Deviled Eggs

    6 hard cooked eggs
        (throw two more eggs in, so you can check how they are doing)

    pinch of salt (thats a pinch boy, wayyyyy less than 1/4 tsp.)

    1/4 tsp. pepper
    1/2 tsp. dry mustard
    2 Tbsp. Hellmans
    1 Tbsp. Miracle Whip
    Paprika (sprinkles)

    Boil the eggs, use the extra two eggs to check the eggs process. when boiled crack the shell a bit with a spoon. then put the eggs in cold water w/ice cubes. this makes it easier to peel the shell off the egg. Next take the yolks out of the eggs and smash up very finely with fork. next add all of the ingredients together to make the topping. mix well. spoon the mixture onto the egg and then sprinkle on paprika. enjoy. yum yum!!

    Pulled that out a couple weeks ago for a picnic. Yum yum!! was right.

  46. Anonymous Coward by Anonymous Coward · · Score: 0

    Throw all that old stuff away. When you need to do an email search, file a Freedom of Information Act request with the NSA.

  47. Plain text and Google desktop by Anonymous Coward · · Score: 0

    Future proof your emails by keeping them in plain text format. Then use third party software to index and search your email collection. I recommend google desktop.

  48. Store them in mbx format by Anonymous Coward · · Score: 2, Insightful

    I recommend mbox (MBX) format.

    1. The format is text based and not likely to become unreadable anytime in the foreseeable future.

    2. There is no shortage of tools for manipulating mbox.

    3. It's easily indexed by full text search applications (MS Search included with Windows).

    The Outlook tools save dialog has an Apple export option, which is actually the mbox format.

    In terms of archival access I recommend an IMAP server with a folder hierarchy based on month/year. Your mail client should be configured to leave the messages on the server (not attempt to download via IMAP). This somewhat future proofs migration to different mail clients.

    The only issue is that IMAP searches are out of the question, so you will need to do searches offline with a full text indexing/search application to first find the general folder location of the message you are seeking.

    If your computer has lots of memory then why not just use grep and write a small shell script to forward the message from the archival file to your inbox so that formatting etc. is preserved. If you're doing lots of searches the disk cache will keep most of it in RAM even if it's a few GB.
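
    One catch with plain grep is that base64 or quoted-printable bodies won't match. A rough sketch of a search that decodes the text parts first, using Python's standard mailbox module; the names search_mbox and _text_of are made up for this example, and forwarding the hits is left out:

```python
import mailbox


def _text_of(msg):
    """Collect sender, subject and every decoded text part of a message."""
    chunks = [str(msg.get("From", "")), str(msg.get("Subject", ""))]
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue  # skip attachments and multipart containers
        payload = part.get_payload(decode=True) or b""  # undoes base64 / quoted-printable
        charset = part.get_content_charset() or "utf-8"
        try:
            chunks.append(payload.decode(charset, errors="replace"))
        except LookupError:  # charset label Python does not know
            chunks.append(payload.decode("latin-1"))
    return "\n".join(chunks)


def search_mbox(path, needle):
    """Return the subjects of messages in an mbox file that mention needle."""
    needle = needle.lower()
    box = mailbox.mbox(path, create=False)
    try:
        return [
            str(msg.get("Subject", ""))
            for msg in box
            if needle in _text_of(msg).lower()
        ]
    finally:
        box.close()
```

    With one mbox per month/year folder you would loop this over the files, which also tells you which folder to open in the mail client.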

    1. Re:Store them in mbx format by Sancho · · Score: 2, Insightful

      I find that Maildir works better than mbox for my purposes. Roughly all of the same pros, plus:
      4) Doesn't require locking your entire mailbox to modify one message.
      5) Resistant to file/inode corruption (will likely only corrupt one message instead of several.)
      6) Can essentially use shell tools to copy individual messages.

      One thing that's neat to do with maildir mailboxes is to search using grep+xargs and copy the messages you find into a new maildir mailbox (named, perhaps, searchresults). Then you have a handy mailbox populated with your search results. I imagine one could even do this using procmail, so that you could populate the mailbox remotely.
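
      If you'd rather not fight with xargs quoting, the same trick can be sketched with Python's standard mailbox module. The function name copy_matches is made up; it matches on the raw file contents the way grep would, and the copies lose their Maildir flags:

```python
import mailbox


def copy_matches(src_path, dst_path, needle):
    """Copy messages whose raw text contains needle into a second Maildir."""
    wanted = needle.lower().encode("utf-8")
    src = mailbox.Maildir(src_path, create=False)
    dst = mailbox.Maildir(dst_path, create=True)  # e.g. a "searchresults" mailbox
    copied = 0
    for key in src.keys():
        raw = src.get_bytes(key)
        if wanted in raw.lower():
            dst.add(raw)  # writes a new file; the original stays where it is
            copied += 1
    return copied
```

      Running it twice adds the hits twice, so empty the results mailbox between searches.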

  49. Use a gmail account by dethkultur · · Score: 1

    I did this myself, going back only 10 years though. It has been invaluable. Gmail gives you 7GB (with a little more every day), and the searching is top notch and instant.

    There are several apps out there to import mail into a gmail account, and it is pretty easy if your email is still available via POP or IMAP (which I'm doubting)... For stuff in a PST file, what I ended up doing was adding the new gmail account into Outlook, and then dragging and dropping emails 1000 at a time into the new account. (I also did this for a Groupwise mailbox from one old job.) It's slow, but it works. In addition, it tags the mail for you with "Inbox" or "Sent", so you can easily retag it later. Once it is in there, it is a little gold mine to get whatever you need.

  50. I've had this same problem... by lpfarris · · Score: 1

    I was hoping to read an answer that met my similar requirements. My requirements were for a searchable, portable mail message database. Ability to tag messages is also important. I had high hopes for Mozilla Raindrop, but my last experience with it didn't do anything for me. Here's what I am doing now: I have set up an IMAP server (imapd) on an Ubuntu server. Thunderbird is currently my primary email client. Thunderbird connects to all my various email accounts. When I am ready to archive an email, it gets copied to a folder on my imap server. The emails are tagged, and stored in folders by quarter to keep any particular file from getting too large. What I would like is the ability to store them in a searchable database with an open source implementation.

  51. Re:Psychiatric consultation! by Anonymous Coward · · Score: 0

    Dear god I'm glad my wife will never see my college emails.

  52. mbox +mutt/thunderbird+mairix by Anonymous Coward · · Score: 1, Insightful

    I have been archiving my mails for the past 10 years. My method has been to download the mails in mbox format once a year and use a combination of mairix to search through the mails and either mutt or thunderbird to see the actual mails.

  53. Maildirs + Mairix by Anonymous Coward · · Score: 0

    Use Maildir(s) and Mairix for the search engine.

  54. Please seek psychiatric help by Anonymous Coward · · Score: 0

    What use does this have? Isn't this just the digital equivalent of hoarding? Delete all of this, you'll feel better. I delete any email over two weeks old.

  55. Hotmail by Rik+Sweeney · · Score: 0, Flamebait

    Why should Gmail get all the attention?

    1. Re:Hotmail by ahess247 · · Score: 0

      Why should Gmail get all the attention?

      Because as web-based email goes, Hotmail is abysmally bad?

    2. Re:Hotmail by mjwx · · Score: 1

      Why should Gmail get all the attention?

      Because Hotmail and Ymail are crap.

      Gmail is on par with the better paid for email services I've used, nowhere near as good as having a fully qualified and competent sysadmin running your own private server but still, Gmail is free.

      --
      Calling someone a "hater" only means you can not rationally rebut their argument.
  56. Just because I can? by mrv00t · · Score: 1, Interesting

    would now like to merge all the emails back into a single searchable archive — just because I can. But there are a few problems:

    ...so you can't?

  57. We have something similar at Work by juanca · · Score: 3, Insightful

    At work, we needed to archive (for compliance purposes) all the inbound/outbound email messages of our users (approximately 1K of them). We set up an Ubuntu server with postfix and dovecot IMAP over SSL, using Maildir.

    Our users generate about 20K email messages daily, and we store each day in its own directory, something like this:

    INBOX
      |- YYYY
           |- MM
                |- DD

    The auditors use Evolution to connect to the archive server and search the emails, even though it takes a little while to load a day of emails for the first time, once it's properly loaded searching is really fast. The server is not that powerful, it's a VM with 2 CPUs and 2GB of RAM. You do need a lot of storage though.
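
    To give an idea of the filing rule, here is a small sketch of how a folder name can be derived from a message's Date header. This is an illustration rather than our production setup: the function name daily_folder is made up, the "/" separator is an assumption (the hierarchy separator depends on the server configuration), and it files by the sender's local date instead of the delivery date:

```python
from email.utils import parsedate_to_datetime


def daily_folder(date_header, root="INBOX", sep="/"):
    """Map an RFC 2822 Date header to a root/YYYY/MM/DD folder name."""
    when = parsedate_to_datetime(date_header)  # keeps the sender's UTC offset
    return sep.join([root, f"{when.year:04d}", f"{when.month:02d}", f"{when.day:02d}"])
```

    For example, a header of "Fri, 26 Jan 2001 13:40:46 -0500" maps to INBOX/2001/01/26.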

    Hope this helps.

    --
    --Necesito una chela, bien fria...
  58. Whats wrong with Eudora? by sjs132 · · Score: 1

    I still use Eudora... 7.1.09 paid mode from years ago... I use XP for my wife's computer and have different Eudora folders based on who is logged in. Works like a champ. The nice thing is I can sort the old emails by sender (for listservs and such) to be put into folders, and then use the find email function to search things. I hardly ever have problems finding an email as long as I know WHO/WHAT I'm looking for and where - Body, from, subject, etc.. Sadly, no meta tags.. :( BTW, mine goes back to.. early 90's also, when @ college we used Eudora on floppies with Windows 3.1 I think... Maybe it was 95, seems so long ago...

    --
    --- Relax, that mass muderer is just trying to reduce our carbon footprint, one fetus at a time...
  59. imap + sql for storage by itzdandy · · Score: 1

    The many comments here about using just imap with maildir or mbox storage backends forget to mention that these are all very slow to search when you have thousands of messages. They don't store the files in any kind of disk-seek friendly format. So...

    I suggest either putting a dovecot with maildir++ system on fast SSD to overcome the poorly organized(on disk) files
    -and/or-
    using a mysql/postgresql backend on dovecot or courier or your favorite imap that supports *sql. The mail would be stored with each detail in a different column in the table. Then you can index the sender, recipient, subject etc. You will need to either have a mail client that can use imap search so you can get the search to happen on the db side, or you could put together a php interface to search the database directly for the messages you are looking for.

    imap isn't going away in the next decade, and neither is mysql or postgresql or the sql language in general. Worst case would be to migrate the mail table to a new db, which could be done fairly trivially with a db dump.
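
    To show what "each detail in a different column" means, here is a toy sketch. It uses Python's built-in sqlite3 purely as a stand-in for mysql/postgresql, and the table layout and function names are made up for illustration; it is not how any particular imap server stores its mail:

```python
import sqlite3
from email import message_from_bytes, policy

SCHEMA = """
CREATE TABLE IF NOT EXISTS mail (
    id        INTEGER PRIMARY KEY,
    sender    TEXT,
    recipient TEXT,
    subject   TEXT,
    sent      TEXT,
    raw       BLOB NOT NULL          -- the untouched original message
);
CREATE INDEX IF NOT EXISTS mail_sender  ON mail (sender);
CREATE INDEX IF NOT EXISTS mail_subject ON mail (subject);
"""


def open_archive(path=":memory:"):
    """Open (or create) the archive database and make sure the schema exists."""
    db = sqlite3.connect(path)
    db.executescript(SCHEMA)
    return db


def store(db, raw):
    """Insert one raw message, copying the searchable headers into columns."""
    msg = message_from_bytes(raw, policy=policy.default)
    db.execute(
        "INSERT INTO mail (sender, recipient, subject, sent, raw) VALUES (?, ?, ?, ?, ?)",
        (str(msg["From"] or ""), str(msg["To"] or ""),
         str(msg["Subject"] or ""), str(msg["Date"] or ""), raw),
    )
    db.commit()


def subjects_from(db, sender):
    """Look up subjects by exact sender; the index makes this a cheap lookup."""
    rows = db.execute("SELECT subject FROM mail WHERE sender = ? ORDER BY id", (sender,))
    return [row[0] for row in rows]
```

    Keeping the raw message next to the indexed columns means you can always dump the originals back out if you change systems.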

  60. Don't use PST by LoganTeamX · · Score: 1

    PSTs are hard-coded to tank, depending on the version of Outlook used. Right now with Outlook 2007 it's 20GB. Nobody NEEDS that much mail, but as an archive it's possible. Maybe a CMS server like Knowledgetree? Provided that it can parse the mail passed into it, it's a great open-source project that seems to have great staying power and development. I'll be testing that myself this week using mail messages that currently reside in Thunderbird.

    --
    One of the 187.
  61. Re:Psychiatric consultation! by AnonymousClown · · Score: 1

    Red or blue one?

    --
    RIP America

    July 4, 1776 - September 11, 2001

  62. Re:RETARD MODERATION by Anonymous Coward · · Score: 5, Funny

    Parent is +informative and/or +interesting, not troll. Fucking brain dead moderators these days. Sheesh.

    it suggested a linux solution and made the windows weenies realize how useless their os is. by extension they realized how tiny their penises are and then they finally understood why they like Micro Soft because it describes them perfectly. so they got mad and said "i'll mod it down, yeah, that'll teach them a lesson and make me feel like a real man again!"

  63. Kmail for Outlook stuff and Search. by twitter · · Score: 2, Informative

    Kmail has an excellent .pst converter that will pull out your old Outlook mail. Once you have it in Kmail, you can drag and drop it into any of the supported formats, mbox, mdir etc. If you have already established filters, you can let them sort things out. If not you can use a manual search for to, from, mail list, subject, etc. From there you can run your imap. I carry everything around on my laptop and use kmail instead of using imap. With full drive encryption and xscreensaver, I don't have any worry about losing private information and know that my ISPs have better collections of my email anyway, despite what they say about size limits. I could use Gmail's imap instead of my own but prefer to suck my gmail out with kmail's imap support. Until US networks get more reasonable, I want my mail with me instead of on my own server and I would not advise anyone to leave their mail on someone else's server without having a copy yourself. Because your question is all about search, I have to plug Kmail again. With proper organization of your mail into subfolders for friends, family, lists, companies and projects, mail searches are quick, even on modest hardware like my ancient PIII laptop. Searching everything takes a little longer, but it is not such a burden. Evolution may do as well but something about Gnome turns me off. The only downside is that the 3.5 branch does not seem to be able to search through encrypted mail but I imagine there's some gpg-agent fix for that I'm not aware of.

    --

    Friends don't help friends install M$ junk.

    1. Re:Kmail for Outlook stuff and Search. by AndGodSed · · Score: 2, Informative

      ++ the above, or Evolution - it also imports PST's and from there you can move it to Thunderbird for Windows. If you want uber searchability you could then upload the whole shebang to a gmail account that you sync offline via gears.

      I personally would balk at having all that stuff online with google but hey that would be the best searchable option I know. You can also sync with your Gmail account via imap protocol if gears and the web interface is not for you. Problem with that is that you will lose the great search capability with Gmail.

      Then again Thunderbird has some really cool search addons that might just take care of your needs altogether, plus it is platform agnostic - you can have it on BSD, Linux, Windows or Mac.

      HTH!

    2. Re:Kmail for Outlook stuff and Search. by Vancorps · · Score: 1

      From my perspective Evolution is pretty pointless unless their storage mechanism can properly support greater than 2gig mailbox sizes. When you have users with thousands of mails organizing every which way it's not particularly practical to have multiple personal folders. Microsoft has spent the last decade trying to reduce the need for psts and here you have a new solution which brings the problem back to the foreground.

      Since the original question was the best way to store email for later searching I'd suggest Commvault as it's a server based solution and integrates nicely with proper backup procedures. It'll cost ya but it's a hell of a lot faster to setup and use than any other solution I've seen. It'll backup and archive email from almost any server so you can archive multiple environments and give yourself a consolidated search as opposed to pushing it onto the users.

      In my experience most companies only care about archiving old email for litigation purposes so you need something that can be audited easily and readily. Even Exchange 2010 mailbox archiving isn't sufficient for this purpose. The closest Linux based solution I've encountered was MailArchiva which looks pretty decent but is still nowhere near as complete as a Commvault solution.

    3. Re:Kmail for Outlook stuff and Search. by Kvasio · · Score: 1

      that was my way of thinking but in recent months I've seen (and I am not alone) that gmail keeps on not finding things I am sure are there (and are eventually found there).
      Perhaps part of the cloud is not eager to take part in indexing or searches ....

    4. Re:Kmail for Outlook stuff and Search. by AndGodSed · · Score: 1

      My mailbox regularly reaches more than three gigs and I used Evolution for a while before jumping back to Thunderbird.

      I now tend to keep it small.

      One thing I really dislike about Evolution is the way it breaks when you move folders in the mail tree.

      Try it and see the breakage. As a PST conversion tool it is pretty cool though.

    5. Re:Kmail for Outlook stuff and Search. by Anonymous Coward · · Score: 0

      Warning: User maintains more than a dozen sockpuppet accounts on Slashdot.

  64. Mac OS X Mail by Anonymous Coward · · Score: 0

    I have worked across most of the clients you mention and found the search interfaces (especially in Outlook) to be horrendous. When Spotlight search came out on Mac OS X, the speed of searching my emails in OSX Mail got so fast, that I now use it as a reference. I have stored email back to 1993, and searches come up in split seconds. There are several subjects that I check my historical email from 11 years of mailing lists before going online or checking a book. I regularly use it to find out "where I put that email from X".

    1. Re:Mac OS X Mail by Smurf · · Score: 1

      I agree with this.

      In fact, the OP said he uses (used?) Entourage, which means that he has a Mac (or at least had one recently).

      One important thing the AC did not mention: You can easily export from Mail to mbox format (just select the messages you want and choose Save As... and "Raw Message Source" format). mbox is unlikely to go away any time soon, and is anyway text-based so the info will always be recoverable.

      Having an exit strategy is crucial when choosing a format for your data, be it email, music, photos, word processing, etc.

  65. IMAP with maildir backend by Fat+Cow · · Score: 2, Insightful

    I migrated all my old personal emails to gmail using IMAP. You can use this to migrate between different on-disk formats like maildir, mbox and pst. I had all my email in yahoo and pulled it down using POP to a maildir, then used an IMAP mail client to copy it across to gmail. Then I regularly back them up from gmail to an on-disk maildir format using mbsync. I picked maildir because it's open and seemed better designed than the alternative, mbox. It's not completely standardized though. I've seen PSTs become corrupt so I try and stay away.
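
    For anyone who would rather script the format change than drag folders around in a client, here is a minimal sketch of an mbox-to-maildir copy using Python's standard `mailbox` module. The function name `mbox_to_maildir` and the paths are made up for illustration; it is not what mbsync does internally, just the same idea in a few lines.

```python
import mailbox


def mbox_to_maildir(mbox_path, maildir_path):
    """Copy every message from an mbox file into a Maildir; return the count."""
    # create=False: fail loudly if the source mbox does not exist.
    source = mailbox.mbox(mbox_path, create=False)
    # create=True: make the cur/new/tmp directories if they are missing.
    dest = mailbox.Maildir(maildir_path, create=True)
    copied = 0
    try:
        for message in source:
            dest.add(message)  # written as one file per message under new/
            copied += 1
        dest.flush()
    finally:
        source.close()
        dest.close()
    return copied
```

    Run it on a copy of the mbox first. The source file is only read, but a spare copy costs nothing.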

    --
    stay frosty and alert
  66. Try Aid4Mail by crath · · Score: 1

    There's a commercial, but low cost, package that I've used to do exactly what you are describing: http://www.aid4mail.com/

    Aid4Mail converts email to and from a variety of mail formats. The feature that you might find useful is that it will create a zip archive that contains standard .msg format email messages. Use that in combination with an indexing programme. I use X1 (http://x1.com/), but there are lots of indexing programmes that will index zip archives for easy searching.

    1. Re:Try Aid4Mail by wodon · · Score: 1

      Aid4Mail is great at converting mail stores into generic formats. It can rip into individual messages or mail stores.
      Run it with the /f option to get more emails.

      The Fookes support guys are really helpful too.

      Also the command line tool is great.

      This is only for the conversion though, won't help with the indexing.

      --
      It's My Tea and I'll Drink it if I Want To!
  67. Nostalgia by Anonymous Coward · · Score: 0

    I hereby name you MR NOSTALGIA.

  68. Re:Psychiatric consultation! by ciderbrew · · Score: 3, Funny

    What do they say?

    June 2001 - "Dave, can't go out tonight. I got a date with that fat chick.YEAH!"
    Sept 2001 - "Dave, She's told me she pregnant."
    Jan 2002 - "Dave, will you be the best man at the wedding :(".


    Shhhh - Dave's the real father (AC doesn't know)..

  69. MySQL by perpenso · · Score: 1

    Perhaps the best route would be to use MySQL or some other FOSS database and build a web front end for browsing, searching, etc

    1. Re:MySQL by Anonymous Coward · · Score: 2, Informative

      I've been using hMailServer for a few years now, and it's free. It's an ISP solution, but has some great IMAP facilities like shared storage. I have over 120GB of IMAP data and it's not twitching. It has a MySQL Lite backend, and capabilities for web, pop, imap, AD integration, etc etc. Would recommend it.

  70. reiser! by Anonymous Coward · · Score: 0

    I haven't seen this mentioned yet, but if you DO go with your own IMAP server use ReiserFS for whatever partition the mail resides on. Generally faster for small files (like old emails).

  71. Re:Psychiatric consultation! by m50d · · Score: 0, Troll

    Please. It's never "vitally important"; no-one will die if you don't. I wonder how much difference your "vital demographic analysis" has actually made to anything, ever.

    --
    I am trolling
  72. MH Mail by Anonymous Coward · · Score: 0

    MH Mail is an old standard, but its mailbox format works very well and the tools scale even better. It's basically similar to Maildir, but stored in the user's home directory.

    I have mailboxes with a million messages in them and it works fine (still takes a while to search, but it doesn't suffer from a tipping point of bad behavior).

  73. Easy by Anonymous Coward · · Score: 0

    Get Gmail. Star everything important.

    Done.

  74. Re:Psychiatric consultation! by houghi · · Score: 1
    --
    Don't fight for your country, if your country does not fight for you.
  75. SOLR by Anonymous Coward · · Score: 0

    Import them all into SOLR. Lucene-based full text indexing and can import various binary file types.

  76. XENA by WetCat · · Score: 1

    http://xena.sourceforge.net/
    A great free Java software for automatic normalization and archiving of mail (and other documents), developed by the Australian Government

  77. Google Apps by Genocaust · · Score: 1

    Google Apps for your domain offers a bulk-import feature from Outlook and other clients.

    Gmail offers all that you wish for. Take the free premium trial for GApps, bulk import, then cancel. Problem solved? :)

    --
    It could be that the only purpose of your life is to serve as a warning to others.
  78. DO NOT DELETE. by GuyFawkes · · Score: 5, Insightful

    I can't tell you the number of times I nearly deleted my archived data, going back to 1997 in my case, not just e-mail either.

    Then I got falsely accused of everything except 9-11 as part of a separation / child custody battle that started with a nuclear attack out of the blue.

    It is amazing how much of that old data is relevant in such cases, "He did x on 1st June 2000 at our house!" and you have data showing you were 200 miles away doing something you had completely forgotten, with someone you haven't spoken to or seen for 7 years, at the time...

    DO NOT DELETE YOUR ARCHIVES, EVER!***

    *** unless of course you are a bad person and they incriminate you, in which case you'd better avoid everyone else who archives data.

    --
    http://slashdot.org/~GuyFawkes/journal
    1. Re:DO NOT DELETE. by cervo · · Score: 3, Insightful

      this can also work against you. Most big companies have record retention policies that include when to delete e-mails. Because those same archives that saved you can bite you in the butt. Also in reality you should be innocent until proven guilty anyway, although I know civil court works differently. But if there is anything you did, maybe an e-mail to another woman that can be spun as evidence you had another girlfriend (even if it was a harmless e-mail just saying hi) then it could bite you.

      Plus no one is 100% squeaky clean. Maybe you admitted you were speeding to someone. Maybe you bought porn website memberships (which could be spun as the reason for a break up, or that you are an unfit parent). Maybe you admitted you were a little too drunk to drive but did it anyway. Maybe you ordered a set of army knives and have the receipt and that gets spun as you have weapons all over the place that could endanger the kids....

      Anyway just saying that too many records could bite you too. Especially if someone from court gets an order for all of them. Then they can be pulled out of context and could be very damaging. Even medical issues could be in the e-mail archives from correspondence with doctors, confirmations of appointments, etc... If that data ever got out it could be damaging to buying insurance as well.

    2. Re:DO NOT DELETE. by afabbro · · Score: 2, Insightful

      Alternatively, spend more time on your personal relationships and home life than maintaining your email archives.

      --
      Advice: on VPS providers
    3. Re:DO NOT DELETE. by Anonymous Coward · · Score: 0

      Even medical issues could be in the e-mail archives from correspondence with doctors, confirmations of appointments, etc... If that data ever got out it could be damaging to buying insurance as well.

      I am not sure how it works in the US, but in most countries you actually have to declare specified medical conditions - intentionally hiding it can be tantamount to fraud. So I guess that you should say "if you commit fraud don't store any evidence", but that is just common sense :-) Also for insurance - a lot of companies won't do extensive vetting before policy acceptance, but if you claim they will fully investigate you. This makes sense - if you didn't declare upfront, they can just reject a claim AND keep premium income.

    4. Re:DO NOT DELETE. by Anonymous Coward · · Score: 1, Insightful

      "If one would give me six lines written by the hand of the most honest man, I would find something in them to have him hanged."

      --Armand Jean du Plessis, Cardinal et Duc de Richelieu

    5. Re:DO NOT DELETE. by tehcyder · · Score: 1

      part of a separation / child custody battle that started with a nuclear attack out of the blue

      Wow, that is pretty hardcore.

      --
      To have a right to do a thing is not at all the same as to be right in doing it
    6. Re:DO NOT DELETE. by Abstrackt · · Score: 1

      part of a separation / child custody battle that started with a nuclear attack out of the blue

      Wow, that is pretty hardcore.

      He was living in Hiroshima at the time.

      --
      They say a little knowledge is a dangerous thing, but it's not one half so bad as a lot of ignorance. - Terry Pratchett
  79. What I (would) do by Lennie · · Score: 1

    As many above have mentioned part of this, I just wanted to put some of it together:

    - setup a small server with a file system with checksums - ohh, that probably just leaves zfs
    - setup dovecot on the server with maildirs
    - setup clients to use imap to put messages on the server, if you have any existing imap-accounts, use mbsync directly on the server
    - setup thunderbird as a client to index it all in thunderbirds own index-files, so you can search it directly from thunderbird
    - use xapian or something similar to index your maildirs on the server so you can search it on the commandline when you need to
    - use rsync to copy the whole bunch offsite to somewhere that you trust or use duplicity to copy it somewhere you don't trust
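
    To make the "index your maildirs and search from the command line" step concrete, here is a toy sketch in Python. It is a plain in-memory inverted index built with the standard library, standing in for xapian (it is not xapian's API and will not scale the same way). The names `build_index` and `search` are invented for the example.

```python
import mailbox
import re
from collections import defaultdict

WORD = re.compile(r"[a-z0-9]+")


def message_text(message):
    """Return the subject plus all text/plain parts as one lowercase string."""
    chunks = [str(message.get("Subject", ""))]
    for part in message.walk():
        if part.get_content_type() != "text/plain":
            continue
        payload = part.get_payload(decode=True)  # bytes, or None for containers
        if not payload:
            continue
        charset = part.get_content_charset() or "utf-8"
        try:
            chunks.append(payload.decode(charset, errors="replace"))
        except LookupError:  # unknown charset label in an old message
            chunks.append(payload.decode("utf-8", errors="replace"))
    return " ".join(chunks).lower()


def build_index(maildir_path):
    """Map each word to the set of Maildir keys of messages containing it."""
    index = defaultdict(set)
    box = mailbox.Maildir(maildir_path, create=False)
    for key, message in box.iteritems():
        for word in WORD.findall(message_text(message)):
            index[word].add(key)
    return index


def search(index, query):
    """Return keys of messages that contain every word in the query."""
    words = WORD.findall(query.lower())
    if not words:
        return set()
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return hits
```

    A real indexer adds stemming, ranking and an on-disk store, which is the reason to reach for xapian or similar once the archive gets large.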

    --
    New things are always on the horizon
  80. mbox by dskoll · · Score: 1

    I have mail going back to 1991 archived as mbox files. Some of it is pretty disorganized, but since 2000 I've organized mail into Sent-Archived and Received-Archived directories with the mbox files named YYYY-MM.

    It's a pain to search. But on the other hand, I hardly ever need to search the really old stuff, so grep and friends are good enough.

    I may eventually split it out into maildir format and use a full-text indexing engine such as Xapian to make searching easier. But I'll probably keep the master mbox archive; the format is incredibly simple and it's easy to munge into other formats as necessary.
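
    A step up from plain grep, as a rough sketch: the Python below walks a directory of mbox files named YYYY-MM and reports which message a hit belongs to, not just which line. The function name `search_archives` is hypothetical, and it matches against the raw message bytes, so text hidden inside base64 or quoted-printable parts will be missed, same as with grep.

```python
import mailbox
from pathlib import Path


def search_archives(archive_dir, term):
    """Yield (file name, subject) for messages whose raw text contains term."""
    needle = term.lower().encode("utf-8")
    pattern = "[0-9][0-9][0-9][0-9]-[0-9][0-9]"  # files named YYYY-MM
    for path in sorted(Path(archive_dir).glob(pattern)):
        box = mailbox.mbox(str(path), create=False)
        try:
            for key in box.iterkeys():
                raw = box.get_bytes(key)  # undecoded bytes of one message
                if needle in raw.lower():
                    subject = box.get_message(key).get("Subject", "")
                    yield path.name, str(subject)
        finally:
            box.close()
```

    Usage would be something like `for name, subject in search_archives("Received-Archived", "invoice"): print(name, subject)`.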

  81. Re:Psychiatric consultation! by Anonymous Coward · · Score: 0

    Got you both beat. Except for a 1-year gap when I was using a VAX and a 1-year gap where I lost all my data on main drive and backup, I've got all my email since 1987. Yes, Virginia, they did have email in 1987.

  82. Re:Psychiatric consultation! by Anonymous Coward · · Score: 0

    You, sir, are a mental case! I suspect you have OCD with some component of Aspbergers that is making you have this fixation on doing all this work to save ancient bits of information.

    How was this modded Informative? Saving correspondence for future reference is critically important.

    Many good things taken to an extreme become a clinical warning sign. Note the poster saves *all* emails and admits that this includes a lot of junk mail. Saving an email that contains some sort of technical/business/etc content is one thing but do you also save *all* the "where do we go for lunch today" emails or just the one that references some newly discovered gem of a restaurant?

  83. Echo chamber... by MrNemesis · · Score: 4, Informative

    ...has me doing a "me too!" to everyone telling you to use IMAP + maildir; I use dovecot myself, complete with self-signed SSL cert (curse you firefox!).

    El_Muerte_TDS has just pointed me towards mairix, a dedicated maildir + friends indexing system which I've just tried out, and seems to be ideal for my use - fast email search has always been a good thing for me, but I've rarely found a nice lightweight indexing solution that was catered only to mail; "desktop" search engines tend to take the opinion that if I want one thing indexed then I automatically want everything indexed, and also insist on running around the clock. Much nicer for my needs to just have one little lightweight indexing program that only runs when I want it to.

    Best thing about mairix IMHO is the way it creates a virtual maildir on the fly using symlinks, so not only is it easily viewable on the command line, it's also automatically compatible with all of those IMAP + maildir clients out there... which, last time I looked, was all of them. Useful hack for KMail users here.

    Disclaimer: my IMAP server has all its databases on an SSD, so even full text searches from the client are pretty speedy (seriously - the lack of access times on small chunks of random data cuts down search times by at least an order of magnitude), but obviously mairix has the advantage of being able to scale to multiple users with >X GB mailboxes much easier than spending a fortune on fast storage.

    --
    Moderation Total: -1 Troll, +3 Goat
    1. Re:Echo chamber... by worf_mo · · Score: 1

      I sync my mail (~ 2GB) from the server (courier) to my laptop with offlineimap. A number of years ago I used mairix and later nmzmail to index and search the Maildirs, but then I settled on mu. I find mu to be very fast, both when indexing and when searching.

      Also, with mu I can integrate Emacs, org-mode, remember and mutt, which is the perfect combination for my needs. I use org-mode as a GTD-like task manager, and from within mutt I can create a new entry in my org file with a reference to the message in the Maildir.

  84. Add A Key Word by b4upoo · · Score: 1

    Although it would involve keeping an index you could add a strange key word to each piece of email within the body of the email. For example all emails from Donna in 2009 could be tagged with donna09. Running a search should yield all emails from Donna in 2009. You could also add the month. january09donna for example. You can even ask people to install a tag in every email they send to you.
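
    Nobody wants to type those tags by hand for a few hundred thousand messages, so here is a small sketch of deriving them from the From and Date headers in Python. The function name `make_tag` and the exact tag layout are assumptions modeled on the donna09 example above.

```python
from email.utils import parseaddr, parsedate_to_datetime

MONTHS = ["january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"]


def make_tag(from_header, date_header, with_month=False):
    """Build a search tag such as 'donna09' from From and Date header values."""
    name, address = parseaddr(from_header)
    # Prefer the first word of the display name, else the mailbox local part.
    who = (name.split()[0] if name.strip() else address.split("@")[0]).lower()
    sent = parsedate_to_datetime(date_header)
    year = f"{sent.year % 100:02d}"
    if with_month:
        return f"{MONTHS[sent.month - 1]}{year}{who}"
    return f"{who}{year}"
```

    The tag could then be written into a custom header or appended to the body before the message is filed.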

  85. IMAP is a protocol, not a file format by NotesSensei · · Score: 1

    IMAP is a messaging protocol. You can't store things in IMAP. What you can do: upload eMail messages to a mail server which then stores it in [insert-mail-server-specifics-here]. The format you are looking for is MIME. MIME is complete and keeps all the header information. Every message is one file that can be read on any platform. You could opt for MIME messages in a directory structure and use some fulltext index software (Google desktop, Apache Lucene etc.) You can probably find software that creates index lists (like by sender / subject / date)
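
    As a rough illustration of the "index lists" idea, the Python sketch below reads only the headers of every message file under a directory and returns them sorted by date. The `.eml` extension and the name `build_listing` are assumptions for the example; any one-message-per-file layout would work the same way.

```python
from email import policy
from email.parser import BytesParser
from email.utils import parsedate_to_datetime
from pathlib import Path


def build_listing(root):
    """Return (date, sender, subject, path) rows for all *.eml files, oldest first."""
    parser = BytesParser(policy=policy.default)
    rows = []
    for path in Path(root).rglob("*.eml"):
        with open(path, "rb") as handle:
            msg = parser.parse(handle, headersonly=True)  # skip the body
        try:
            when = parsedate_to_datetime(msg["Date"])
        except (TypeError, ValueError):  # missing or unparseable Date header
            when = None
        rows.append((when, str(msg["From"] or ""), str(msg["Subject"] or ""), str(path)))

    def sort_key(row):
        # Undated messages sort first; timestamps avoid naive/aware clashes.
        return row[0].timestamp() if row[0] else float("-inf")

    rows.sort(key=sort_key)
    return rows
```

    Sorting by sender or subject instead is just a different key on the same rows.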

    1. Re:IMAP is a protocol, not a file format by caffeinejolt · · Score: 1

      I did not assert it was a format. As far as the format, I recommend Maildir++, which when coupled with Dovecot (the IMAP server I recommended) does exactly what you wrote "You could opt for MIME messages in a directory structure and use some fulltext index software (Google desktop, Apache Lucene etc.) You can probably find software that creates index lists (like by sender / subject / date)"

    2. Re:IMAP is a protocol, not a file format by Bigjeff5 · · Score: 1

      You did say "store it all in IMAP", which is incorrect.

      Of course, you clarify that a little later by recommending Maildir++ for the physical storage format.

      Nothing is ever stored in IMAP. It can only be sent or received in IMAP.

      Frankly, since the OP never asked for a messaging protocol, I have no idea why everyone is recommending IMAP for anything. He wants an archive, and he wants it to be platform neutral. Frankly, you can't beat something like a SQL database for those requirements. Maildir and the like seem alright, but are running into the "I don't want a format that will go away in a few years" part of his request. SQL is going nowhere, and you can put anything in a SQL database. Done correctly, it can also be very quick to find and retrieve emails.
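
      For the curious, a minimal sketch of what "put it in a SQL database" could look like, using Python's built-in `sqlite3` as a stand-in for whatever SQL engine you prefer. The table layout and the function names (`open_archive`, `store`, `find_by_sender`) are made up for illustration. The raw message is kept untouched in a BLOB so nothing is lost, with a few headers pulled out into columns for fast lookups.

```python
import sqlite3
from email import message_from_bytes

SCHEMA = """
CREATE TABLE IF NOT EXISTS messages (
    id      INTEGER PRIMARY KEY,
    sender  TEXT,
    subject TEXT,
    sent    TEXT,
    raw     BLOB NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_messages_sender ON messages(sender);
"""


def open_archive(path=":memory:"):
    """Open (or create) the archive database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def store(conn, raw_bytes):
    """Insert one raw RFC 822 message; return its row id."""
    msg = message_from_bytes(raw_bytes)
    cur = conn.execute(
        "INSERT INTO messages (sender, subject, sent, raw) VALUES (?, ?, ?, ?)",
        (str(msg.get("From", "")), str(msg.get("Subject", "")),
         str(msg.get("Date", "")), raw_bytes),
    )
    conn.commit()
    return cur.lastrowid


def find_by_sender(conn, fragment):
    """Return (id, subject) rows whose sender contains the given fragment."""
    rows = conn.execute(
        "SELECT id, subject FROM messages WHERE sender LIKE ? ORDER BY id",
        (f"%{fragment}%",),
    )
    return rows.fetchall()
```

      Full-text search over bodies would need an extra index on top of this (most SQL engines have some full-text extension), which is left out here.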

      --
      Security is mostly a superstition... Avoiding danger is no safer in the long run than outright exposure. - Helen Keller
  86. Domino by Belial6 · · Score: 4, Funny

    Yes, it is not free, and yes, this suggestion will bring out the trolls, but you might want to consider Lotus Notes/Domino. It is ~$140 for the system, and ~$40 a year maintenance (Includes all upgrades) cost per user, but IBM isn't going anywhere any time soon.

    It has good full text indexing, you can keep your mail on a client, and on the server, with incredibly flexible replication rules for what is stored where.
    It supports IMAP, so it talks well to most clients.

    The iPhone syncs seamlessly with it via ActiveSync, and an Android client is in beta as we speak.

    It includes an http client, and the http client even offers offline access. That's right. You can use the http client, and still read your mail and write emails that will be sent the next time you make a connection.

    It also has folders, but you can put any email into as many folders as you want, so you have the best of both Outlook folders and Gmail tags.

    It supports auto-processing rules for automatic filing of data, as well as being a full development environment if you want to get really fancy.

    It is brain dead easy to set up and maintain.

    The server runs on Linux and Windows, and the client runs on Linux, Windows and Mac.

    1. Re:Domino by crunzh · · Score: 1

      But the client is horrible! Makes my head hurt and every sense of good UI design cringe! I have used nothing worse than the notes/domino client that I use at work. I would rather use hotmail (and that's not saying anything nice about hotmail)! There is so much basic functionality that works badly. Just remembering what emails are read/unread is hard for it when you have multiple servers.

      --
      Visit http://www.crunzh.com/ for free software. Mac/Lin/Win
    2. Re:Domino by Belial6 · · Score: 2, Informative

      Seriously, what is wrong with the Notes client or, for that matter, the web client?

      I call you out troll. I also call you out on your made up problem of not knowing if something is read or not. Unread marks replicate between servers.

    3. Re:Domino by crunzh · · Score: 1

      Try reading comment #6 and followups for a explanation of one of the issues: http://teneo.wordpress.com/2009/08/20/a-second-look-at-lotus-notes/

      --
      Visit http://www.crunzh.com/ for free software. Mac/Lin/Win
    4. Re:Domino by Anonymous Coward · · Score: 0

      is that still alive ?...

      Or.... just use outlook on ramdisks :)
      i think it is cheaper

    5. Re:Domino by Belial6 · · Score: 2

      Ok, I read your link. You know, the one that praises the UI for Notes. I still call you out for trolling. The complaint the guy had had NOTHING to do with caching. As was explained by the follow-ups to your #6. Unread marks are maintained for each USER. The unread marks replicate just like all other data. No caching issue at all. Basically you are complaining because Exchange is broken, and you want Notes broken in the same way. Why on earth would I want MY unread marks to be changed by YOU reading something?

      Shared mail is generally a questionable action to take anyway. While there are a few exceptions, it is generally what incompetent admins do because Exchange is feature incomplete, and they try to set it up like Exchange. The proper thing to do is use the mail template and create a mail-in database. That way you don't have a user to maintain, and you are not paying for extra licenses that you are not using. Whether you are correctly using a mail-database derived from the mail template, or if you are incorrectly using a fake user, you can still have a mark to indicate that a document has been opened by SOMEBODY. Just put @SetField("GroupRead", "X") in the QueryOpenEvent of the form and add a column to show that. You can get fancier, but that is no harder than a simple Excel spread sheet. With Notes/Domino, you can have proper per user read marks, OR group read marks.

      Of course, even though Notes handles it fine, I would like to know what tool handles it 'right' by your standards, and how that tool lets me know if I have read the email as opposed to anybody having read it.

      So, again, I call you out troll...

  87. Re:Psychiatric consultation! by Idiomatick · · Score: 1

    My old alarm clock PC doubles as a web server.

  88. Here is what I am doing .... by Anonymous Coward · · Score: 0

    As a number of people have suggested, I use IMAP but here is my scheme.

    For emails dating back a few years (2.5 in my case), I have this stored on a hosted IMAP server with server side capabilities. For emails older than that I have stored in mbox format interfaced to by mulberry (a now mostly dormant client), but any mbox aware client will work.

    The hosted IMAP server holds all my sent mail, archived, and inbox mail in last 2.5 years. It has a webmail interface (horde), I also use the following clients on various computers to access it: postbox on my netbook, and mulberry on my desktop.

    People have suggested gmail, and that works too, but my solution above is a freebie given that I already have to pay for web hosting. It also gives me the freedom to use aliases to filter mail as well as own my own email address.

  89. ASCII text with Mime-encoding as "primary" by davidwr · · Score: 1

    The raw data should be in one of the common "mbox" formats with MIME-encoding. It doesn't have to be all in one file either - one file per year or per month should be fine. This has been around since the 1990s and you won't risk losing access due to the file format in your lifetime.* You will lose your folder organization, but you can get around that by making the folder name part of the file name or using filesystem-level folders to segregate messages, e.g. "2008/April/junk.mbox" or "junk/2009/April.mbox" and so on.
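
    A sketch of that naming scheme in Python, assuming the standard `mailbox` module and a Date header on every message. The function names `archive_path` and `file_message` are invented for the example; the two layouts mirror the "2008/April/junk.mbox" and "junk/2009/April.mbox" patterns above.

```python
import mailbox
from email.utils import parsedate_to_datetime
from pathlib import Path

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


def archive_path(root, folder, date_header, by_folder_first=False):
    """Compute the mbox path for a message from its folder name and Date header."""
    sent = parsedate_to_datetime(date_header)
    month = MONTHS[sent.month - 1]
    if by_folder_first:
        return Path(root) / folder / str(sent.year) / f"{month}.mbox"
    return Path(root) / str(sent.year) / month / f"{folder}.mbox"


def file_message(root, folder, message):
    """Append a message to the mbox chosen by archive_path; return that path."""
    target = archive_path(root, folder, message["Date"])
    target.parent.mkdir(parents=True, exist_ok=True)
    box = mailbox.mbox(str(target))  # created on first use
    try:
        box.add(message)
        box.flush()
    finally:
        box.close()
    return target
```

    Messages with a missing or broken Date header raise an error here; in practice you would route those to a catch-all file.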

    You can make "working copies" of this in any format you like. You can even be "simple" and use your operating system's text-index tools to index the files. You won't have quick-access to pictures or other binary or non-ascii text attachments but opening the mbox in any mail-reader that understands this file type - and there are many - will get you to the attachment.

    *guarantee void if life-extending technology allows you to live more than 125 years from now.

    --
    Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
  90. Re:Psychiatric consultation! by pixelpusher220 · · Score: 1

    you obviously don't watch enough hollywood geek thriller movies. Someone is *always* going to die if the information isn't found right fscking now!

    --
    People in cars cause accidents....accidents in cars cause people :-D
  91. The future-proof eMail format is MIME by NotesSensei · · Score: 1

    You want to store all your messages in MIME format. MIME is reasonably well defined, your messages when arriving from the Internet are most likely MIME. It can be opened with any text editor or displayed on the command line (cat somefile.mime). It can contain attachments (you need to take care of attachments -- the binary format might become outdated). Some suggested solutions (maildir) use native MIME files and then any fulltext indexer will do. Looks like mairix might be good for listing inbox style your messages. Good luck!
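
    On the attachment point, a short sketch of how one might inventory what is inside a stored MIME message, using Python's standard `email` package. The function name `list_attachments` is made up; the idea is just to find out which binary formats you are carrying so you can decide what needs converting before it becomes unreadable.

```python
from email import policy
from email.parser import BytesParser


def list_attachments(raw_bytes):
    """Return (filename, content type, size in bytes) for each attachment."""
    msg = BytesParser(policy=policy.default).parsebytes(raw_bytes)
    found = []
    for part in msg.iter_attachments():
        # decode=True undoes base64 / quoted-printable transfer encoding.
        data = part.get_payload(decode=True) or b""
        found.append((part.get_filename(), part.get_content_type(), len(data)))
    return found
```

    Feeding it `open("somefile.mime", "rb").read()` returns an empty list for a plain text-only message.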

  92. Re:Psychiatric consultation! by Jawnn · · Score: 2, Insightful

    How was this modded Informative? Saving correspondence for future reference is critically important. I have many times needed to refer back to messages that are years old, in order to pull up a vital bit of information that was suddenly relevant. I have needed to pull up an attachment from an email a few months old, or view the exact wording of correspondence, check the date of a quotation, etc., more times than I can count, so searching and retrieval are both vitally important.

    While the value you place on being able to retrieve critical pieces of information may be valid, your choice of storage medium is not. An email system is not a file server or database. Most index poorly, if at all, making searches horribly inefficient. And as has already been observed, it may be quite likely that those same things you value will be more than offset by their value to a hostile litigant.

  93. Re:Psychiatric consultation! by Eggplant62 · · Score: 1

    Damn, I'm going for +5 Funny and you guys mod me down to -1 Troll? Tough crowd. Get a sense of humor, will ya?

  94. just because I can. by socsoc · · Score: 3, Insightful

    just because I can.

    That's a big assumption. You are asking slashdot, so I'm thinking you can't. Especially because imap never occurred to you.

  95. HTML by drinkypoo · · Score: 1

    Convert to HTML with something meant for creating online archives. Then if you put it on a filesystem you can index it and search it at will. Unless you really need the originals this is your best no-coding option for later convenient reading. It is also possible to use some software to generate the indices etc. with the originals included within the archive pages.

    --
    "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
  96. Re:RETARD MODERATION by Anonymous Coward · · Score: 0

    Re-affirmation is a wonderful thing.

    Funny thing is, what you describe looks exactly like a description of your own post, I can only conclude that the attitude you describe has nothing whatsoever to do with whichever OS they choose to use rather than their own personalities.

  97. Re:Psychiatric consultation! by tha_mink · · Score: 0, Troll

    When I run events, I need to be able to post-hoc review all of the correspondence for demographic analysis, often done two years after the event when the final reports are being written. Saying that this sort of behavior is odd, or not normal is either being a troll, or not understanding how the world works when you're not just a drone.

    This sort of behavior is odd and not normal. If you want to keep your email, then that's fine, but thinking that it's "vitally important" is odd and I think without question points to some "OCD with some component of Aspberger". If you don't then maybe you need to re-evaluate. I am however interested in how you pull demographic analysis out of emails? I mean, hopefully you're not suggesting that you go and chomp on the text to pull out fields of data?

    IMO, this is one of the best Slashdot questions ever, and I am greatly anticipating hearing some good answers, especially if they don't include suggesting GMail as a panacea,

    I think that GMail could be the panacea here. I mean, if you're just trying to make sure it lasts and you can search it with ease, then GMail can do it better than you can.

    --
    You'll have that sometimes...
  98. Do those you correspond with agree to profiling? by perpenso · · Score: 1

    Yup, I'm really highly concerned that an advertiser might learn that I like electronics and am a huge computer geek. Because there's no other way they could know that.

    Are you concerned that your emails "leak" such information about those that you are corresponding with? Are they OK with this? If they sent their email to a gmail account that's one thing, you could argue they implicitly agreed to the profiling. However by uploading all your emails to gmail there is no such implicit agreement.

  99. Google Apps, Emailchemy, Google Uploaders by ahess247 · · Score: 1

    I recently did something very similar with mail dating back to 1993 or so in multiple mailbox formats (Eudora, PST, Thunderbird mbox, etc.)

    Get a Google Apps account http://www.google.com/apps/intl/en/business/index.html
    This allows you to run a gmail interface with mail on your own domain.

    If you need more than the available storage for free, you can pay for 25 gigs, but it seems like the free level will work for you.

    For the PST files, upload them with Google Apps Migration for Microsoft Outlook
    http://tools.google.com/dlpage/outlookmigration

    Alternatively, migrate the PSTs to Thunderbird using Emailchemy
    http://www.weirdkid.com/products/emailchemy/

    Then, if you're on a Mac (it seems you are) upload to Google Apps via the Google Email Uploader for Mac
    http://code.google.com/p/google-email-uploader-mac/

    This will upload everything you have in your Thunderbird environment. And it will take some time. At first it may look like the program has frozen, but give it a half hour or so to sort through all your Thunderbird folders, and then let it upload the mail overnight. It took me a few overnight uploads, but it was worth it.

    Once you have it in Google it's very searchable and flexible. You can, for instance, re-organize it using labels, and then re-download to Thunderbird via IMAP if you like.

  100. won't work - storage space too puny by rubycodez · · Score: 1

    gmail only allows 7.5GB of space currently

    1. Re:won't work - storage space too puny by nexttech · · Score: 1

      So, pay Google for more storage. At $5 a year for 20gigs it seems to be worth it.

  101. Re:Psychiatric consultation! by h4rm0ny · · Score: 1

    +++THIS POST IS INTENDED TO BE HUMOROUS+++

    It's usually best if you're making a joke on /. that begins with anything other than "I for one welcome...", "In Soviet Russia..." or is itself entirely a quote from a Simpsons episode that was broadcast ten years ago, to give the poor /.'ers a helping clue such as beginning your post with "+++THIS POST IS INTENDED TO BE HUMOROUS+++".

    --

    Aide-toi, le Ciel t'aidera - Jeanne D'Arc.
  102. What about the privacy of those you email with? by perpenso · · Score: 2, Insightful

    What about the privacy of those you correspond with? If they send an email to a gmail account that is one thing, but you are unilaterally deciding to have them participate in the targeted advertising profiling.

    1. Re:What about the privacy of those you email with? by dolmen.fr · · Score: 1

      Who is the target of that advertising? The Gmail user.

      Not the sender-who-is-not-a-Google-account-owner. Because currently Google doesn't do e-mail advertising and I don't see how they would link an e-mail address to a web user who does not have a Google account with that e-mail address.

    2. Re:What about the privacy of those you email with? by Anonymous Coward · · Score: 0

      Who is the target of that advertising? The Gmail user.

      Not the sender-who-is-not-a-Google-account-owner. Because currently Google doesn't do e-mail advertising and I don't see how they would link an e-mail address to a web user who does not have a Google account with that e-mail address.

      It depends on the contents of the email. Besides the body, a signature may be a good source of info like real names and personal web pages. Additionally, the gmail user may add identifying info via google's contacts info. Also, a sender may have just visited an obscure link that was pasted into the email. I'm sure google does not need perfect verified info and that some sort of weighting system is used. However, ongoing communications can begin to create a clearer identification. To understand how seemingly insignificant bits of info can accumulate to create an accurate picture, read up on Britain's codebreaking work against Enigma at Bletchley Park during the Second World War.

      Google's terms of service indicate that the info they record is not limited to the gmail service. There is nothing to prevent them from combining gmail data with web searches, pages visited, links clicked, etc.

  103. Re:Psychiatric consultation! by TheRaven64 · · Score: 1

    Actually, email systems tend to be the most searchable precisely because of people like the grandparent. If someone's sent me something in an email, I can usually find that email in less time than it takes to find where I saved the attachment. I have every email I've sent or received since 1997 (excluding spam, but including mailing lists), which comes to about 3.6GB. In spite of this size, it's well indexed by my mail client and searches generally only take a few seconds to produce the correct result.

    --
    I am TheRaven on Soylent News
  104. so do I by koodawg · · Score: 1

    I've also been archiving my emails since the early 90s. I've got a few hundred thousand messages. I've always used procmail to store in mbox format. I use shell/grep etc. for searching. With procmail I archive like so:

    $HOME/Mail/year/SenderOrRecipientAddr

    where SenderOrRecipientAddr is either the sender's email address or the recipient's, depending upon whether it's mail to me or from me. This way, for example, everything I send and receive to/from joe@smith.com is in the same mbox file.

    And storing it under $HOME/Mail allows imap to serve it up.
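
    For anyone who wants the same layout without procmail, here is a minimal Python sketch of the idea, not the poster's actual recipe. It files each message under root/year/correspondent using the standard mailbox module. MY_ADDRESSES, correspondent, archive_path and file_message are hypothetical names, and it assumes every message has a parseable Date header and that addresses are safe to use as file names.

```python
import mailbox
import os
from email.utils import getaddresses, parseaddr, parsedate_to_datetime

# Hypothetical: the archive owner's own addresses, used to tell sent from received mail.
MY_ADDRESSES = {"me@example.org"}


def correspondent(msg, mine=MY_ADDRESSES):
    """Return the other party: the sender for incoming mail, the first recipient for outgoing."""
    sender = parseaddr(msg.get("From", ""))[1].lower()
    if sender not in mine:
        return sender or "unknown"
    recipients = [addr.lower() for _, addr in getaddresses(msg.get_all("To", [])) if addr]
    return recipients[0] if recipients else "unknown"


def archive_path(msg, root):
    """Build root/<year>/<correspondent>, mirroring $HOME/Mail/year/SenderOrRecipientAddr."""
    year = parsedate_to_datetime(msg["Date"]).year  # assumes a valid Date header
    return os.path.join(root, str(year), correspondent(msg))


def file_message(msg, root):
    """Append msg to the mbox file chosen by archive_path and return that path."""
    path = archive_path(msg, root)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    box = mailbox.mbox(path)  # the mbox file is created on first use
    box.lock()
    try:
        box.add(msg)
    finally:
        box.close()  # flushes to disk and releases the lock
    return path
```

    Mail to and from the same person in the same year lands in one mbox file, so grep over a single file shows the whole exchange.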

  105. Privacy of others? by perpenso · · Score: 1

    What about the privacy of the other people involved in an email? Did they consent to be part of gmail's targeted advertising profiling? Perhaps emailing a gmail account implicitly does so but that is not the case when you upload everything on your own.

  106. DBMail by nicc777 · · Score: 1

    DBMail. I use it on a Linode host (small fee every month).

    --
    Need an ISP in South Africa?
  107. Re:Psychiatric consultation! by Zaiff+Urgulbunger · · Score: 1

    Ahhh.... so you have access to the time even when you're away from home? Clever! ;)

  108. imapsync + cyrus in vm by mdgreen · · Score: 1

    Currently I use imapsync (http://freshmeat.net/projects/imapsync/) to sync all of my email to shared archive folders on a VM with the cyrus imap server installed. I wrote a shell script that syncs all of my mail into an archive folder for the current year, then deletes the email off of the original imap server. From time to time I have searched for a way to write all of the archived mail to an indexed format that can go on a CD/DVD and needs no mail reader to search, but have found nothing. I worry that 20 years down the road there will be no way to run the VM, the imap server, or a client to access it. So good luck ;)

  109. Re:Psychiatric consultation! by Anonymous Coward · · Score: 1, Insightful

    Say what?

    It's the modern equivalent of saving all your personal letters and other correspondence. What the heck is abnormal about that? In the old days you'd have a bundle of letters stored in the attic somewhere. But this doesn't result in heaps of paper or file cabinets full of it that get in your way, as it does for people with a genuine mental problem. For e-mail, you can store it all on one small (these days) hard disk placed in a drawer somewhere, with space to spare -- even with all the spam! And the process of figuring out how to better organize it and archive it going forward will be a useful learning exercise that might have applications elsewhere (e.g., at work, where people might be asking exactly the same question).

    It's no worse than deciding to tidy up your office or study area and figuring out a system to better keep track of things so you can find them later.

    I mean, heck, the President of the United States had the same fricking problem: how to properly archive e-mail, a problem discussed here numerous times. As a common problem -- personally and in business -- listening to other people's solutions before digging into it yourself is an efficient way to deal with it.

  110. Imap Vote (Cyrus) by psbrogna · · Score: 1

    I've been a Cyrus IMAP admin for over a decade and have experienced no problems with user email boxes in the 6 GB - 8 GB range or single IMAP boxes with > 1E+06 messages. Performance of large batch message operations is also satisfactory (e.g. import, export). It's also very useful to have server-side message tagging support (e.g. like gmail). I've heard other similar reports regarding FOSS IMAP servers such as Dovecot & UW, and there seems to be at least some consensus that they are easier to manage than Cyrus, but I have no direct experience regarding the relative ease of administration. Running your own local Zimbra might be a nice starting point as well: it gives you a bunch of personal productivity functionality in a single groupware app. I'm running my own Zimbra instance on a RackCloud server for $90/year (all-in) for exactly this purpose.

  111. No, not IMAP, it bogs down by Anonymous Coward · · Score: 0

    As anyone who actually uses IMAP can tell you, it bogs down quickly on large mailboxes, violating the poster's requirement about b)

    1. Re:No, not IMAP, it bogs down by wkcole · · Score: 2, Informative

      As anyone who actually uses IMAP can tell you, it bogs down quickly on large mailboxes, violating the poster's requirement about b)

      Not true. Not absolutely false, either. IMAP is an access protocol, not a storage or indexing mechanism, and there is nothing inherent in IMAP that dooms it to be slow in handling large mailboxes. Different combinations of client and server, configurations, and mailbox content and usage can make huge differences in performance. Tens of thousands of messages in a single IMAP folder on a memory-lean server that uses Maildir storage on a UFS or ext2 filesystem with atimes enabled is going to suck horribly, especially with a client that doesn't cache heavily or maintain its own indices. Make that an mbox, and it will work great until you start trying to change it every couple of seconds.

    2. Re:No, not IMAP, it bogs down by raddan · · Score: 1

      As someone who used to run an IMAP server for a few hundred users (dovecot on OpenBSD, Maildirs totaling several TB in size), I can say this is not true. How well IMAP performs on large mailboxes is largely a function of how braindead your IMAP client is. Certain versions of Outlook are pretty slow, but things work rather well with Outlook 2010. Thunderbird is insanely fast, UNLESS you turn on the offline indexing features. I haven't used the latest Apple Mail, but it also had a tendency to spawn so many threads that the imapd on the other end would start closing them. You can configure how many concurrent connections to use somewhere in the prefs. My iPhone works wonderfully with IMAP. Back in the day, I used Sylpheed, and it too was quite fast.

  112. Re:IMAP with DBMail by TravisHein · · Score: 1

    I agree, IMAP is the way to go. The dbmail.org project has an implementation of an IMAP service that uses a database as the back end. This allows you to, in theory, create a custom application to do full text search over the mail contents (that are stored in database tables). The default schema already does a good job of normalizing mail headers and recipient email addresses, which helps to filter searches using those. This kind of searching and indexing is of course a custom thing to have to build. I have not gotten around to doing this yet (after several years of running dbmail now), but I found that having the mail contents stored in a database does provide slightly better performance over time than having the many, many individual files when a mailbox is backed by Maildir or mailbox-file storage on the file system. The only hitch is if you need to interoperate with Windows: if you use Windows only, it's inconvenient compared to using PST files, I guess. I have envisioned creating a virtual machine that runs a Linux operating system loaded just with the dbmail and database stack, effectively creating a macro PST file type of thing, a service / appliance / single virtual machine image file I boot up to be my easy-to-search mail storage repository.
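
    To make the "custom application" idea concrete, here is a small Python sketch using the standard sqlite3 module. It is not DBMail's schema or API: the message and address tables and the open_store, store and search helpers are hypothetical stand-ins that show normalized addresses plus a text search over bodies. It handles only non-multipart bodies and uses a plain LIKE scan rather than a real full-text index.

```python
import sqlite3
from email import message_from_string
from email.utils import getaddresses

# Hypothetical two-table layout: one row per message, one row per address on it.
SCHEMA = """
CREATE TABLE IF NOT EXISTS message (id INTEGER PRIMARY KEY, subject TEXT, body TEXT);
CREATE TABLE IF NOT EXISTS address (message_id INTEGER REFERENCES message(id),
                                    role TEXT, addr TEXT);
CREATE INDEX IF NOT EXISTS address_addr ON address(addr);
"""


def open_store(path=":memory:"):
    """Open (or create) the store and make sure the tables exist."""
    db = sqlite3.connect(path)
    db.executescript(SCHEMA)
    return db


def store(db, raw):
    """Parse one raw RFC 822 message and insert it; return its row id."""
    msg = message_from_string(raw)
    body = "" if msg.is_multipart() else msg.get_payload()
    cur = db.execute("INSERT INTO message (subject, body) VALUES (?, ?)",
                     (msg.get("Subject", ""), body))
    for role in ("From", "To", "Cc"):
        for _, addr in getaddresses(msg.get_all(role, [])):
            if addr:
                db.execute("INSERT INTO address VALUES (?, ?, ?)",
                           (cur.lastrowid, role.lower(), addr.lower()))
    db.commit()
    return cur.lastrowid


def search(db, text, addr=None):
    """Return (id, subject) rows whose body contains text, optionally limited to one address."""
    sql = "SELECT DISTINCT m.id, m.subject FROM message m"
    params = []
    if addr:
        sql += " JOIN address a ON a.message_id = m.id AND a.addr = ?"
        params.append(addr.lower())
    sql += " WHERE m.body LIKE ? ORDER BY m.id"
    params.append("%" + text + "%")  # note: % and _ in text act as LIKE wildcards
    return db.execute(sql, params).fetchall()
```

    For example, search(db, "budget", addr="joe@smith.com") would return only messages involving that address whose body mentions "budget". A real deployment would want a proper full-text index rather than LIKE.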

  113. you can buy google storage by ergean · · Score: 1
    1. Re:you can buy google storage by censored711man · · Score: 1

      WHY oh why on earth would you pay someone to do something that you can do for yourself for free?

    2. Re:you can buy google storage by tha_mink · · Score: 1

      WHY oh why on earth would you pay someone to do something that you can do for yourself for free?

      Where do you live that it is free to keep 20GB of storage alive? No power? No hardware? I'm interested in how you get that for free?

      --
      You'll have that sometimes...
    3. Re:you can buy google storage by trentblase · · Score: 1

      I agree with the sentiment, but it's not unreasonable for him to have zero marginal cost (if he already runs a terabyte server for other purposes)

    4. Re:you can buy google storage by rubycodez · · Score: 1

      s'easy! most of us already have that much free space, so incremental cost of storing our indexed emails is $0.

      some of us are even disciplined enough to even have a backup system!

    5. Re:you can buy google storage by rubycodez · · Score: 1

      sure, and for $50 a year you can run the google business apps including gmail with much more storage. but I don't want to pay google anything, nor let them control and do targeted ads with my data

    6. Re:you can buy google storage by ergean · · Score: 1

      Some people are lazy... but there are some who would love to have access any time, anywhere with an internet connection and not bother with a tried and tested back-up solution... and most people don't need more than a few GB for truly important data... if they need more, then they have mostly garbage. (We don't include family photos and films here or other stuff that needs dedicated backup)

      As some of you said - we all have some spare space on our systems, hell I've used my parents' computers for off-site back-up for years. I can have access to it but it costs too much to keep my hardware online for a "maybe I'll need that" once in a blue moon.

      So at this point gmail is ideal - it has the best search and you can access everything in a few seconds if you have an internet connection.

  114. if you want it to be "future proof" there's only.. by Anonymous Coward · · Score: 0

    mbox format. That's all you can really do if you want it to be readable by anything you want in the future.

    I've archived all of my email since 1992 in yearly ( later monthly ) tarballs.
    I double tar the monthly ones into an annual one just for convenience when I later go to bzip/gzip them.
    Ironically, storage density has become so high, I haven't bothered zipping up any of the last 2 years, even though it's nearly 10,000x the disk space as my 1992 files.
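
    A rough Python sketch of the monthly-into-annual bundling step, for anyone who would rather script it than run tar by hand. bundle_year and the file names in the usage note are hypothetical, and gzip is just one of the compressors mentioned above.

```python
import os
import tarfile


def bundle_year(month_files, out_path):
    """Pack the given monthly mbox files into one gzip-compressed tarball."""
    with tarfile.open(out_path, "w:gz") as tar:
        for path in sorted(month_files):
            # Store each file under its bare name so the archive has no absolute paths.
            tar.add(path, arcname=os.path.basename(path))
    return out_path
```

    Something like bundle_year(["2009-01.mbox", "2009-02.mbox"], "2009.tar.gz") would produce one archive per year; the mbox files inside stay plain text, which is the future-proofing point.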

  115. As I walk thru the valley where my hay I do bale, by Anonymous Coward · · Score: 0

    I laugh at you English, cuz I don't have email!

  116. Most are missing the point by Compuser · · Score: 1

    I am in the same boat as the original poster. And I think the question has not yet been answered. My requirements (and I suspect the original guy's too) are:

    1. A client that can import and store emails in a wide variety of formats

    2. A client that can search emails (including office format and PDF attachments) quickly
    2a. A figure of merit: 100000 emails, 10 gigs, 100 msec or less for search (core i7, plenty of RAM, SSD)
    2b. Ideally search would allow SQL-like searches on any field and understand regexp

    3. A client that requires no IP stack to function so it could be run on a machine detached from internet
    (I have on-line and off-line machines for security and I disable IP stack on off-line machines to prevent
    temptation to use them online if my other machine fails).

    4. Crucially, a client that is easy to install, configure, and use. If your solution involves configuring a server
    or worse yet, configuring a server in a virtual machine then it is not workable. I do not have the time to
    figure it all out and I suspect only real sys-admins would consider this a solution.
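
    As a point of reference for requirement 2b, here is a minimal Python sketch of a regexp search on any field over a local mbox file, using only the standard library and no network. search_mbox and body_text are hypothetical names. It is a linear scan with no index, so it would not come close to the 100 msec figure in 2a on a large archive, and it does not look inside office or PDF attachments.

```python
import mailbox
import re


def body_text(msg):
    """Concatenate the decoded text/* parts of a message."""
    parts = []
    for part in msg.walk():
        if part.get_content_maintype() == "text":
            payload = part.get_payload(decode=True) or b""
            charset = part.get_content_charset() or "utf-8"
            try:
                parts.append(payload.decode(charset, errors="replace"))
            except LookupError:  # unknown charset name in the message
                parts.append(payload.decode("utf-8", errors="replace"))
    return "\n".join(parts)


def search_mbox(path, field, pattern):
    """Return (key, subject) for messages whose header `field` (or "body") matches the regex."""
    rx = re.compile(pattern, re.IGNORECASE)
    hits = []
    box = mailbox.mbox(path, create=False)
    try:
        for key, msg in box.iteritems():
            if field.lower() == "body":
                text = body_text(msg)
            else:
                text = " ".join(str(value) for value in msg.get_all(field, []))
            if rx.search(text):
                hits.append((key, str(msg.get("Subject", ""))))
    finally:
        box.close()
    return hits
```

    Getting from this to sub-second search over 100000 messages means building an index once and querying that instead of the raw files.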

  117. Re:Psychiatric consultation! by Anonymous Coward · · Score: 0

    You seem to assume saving an email takes an active effort and therefore it is more OCD and wasted effort to save all the emails instead of only some.

    In fact, those of us replying with suggestions on how we do it have found the opposite. We have taken a simple action to save-by-default and it takes extra effort to exclude some emails from this treatment. It's one action every month or so to shift off the entire old message set to an archive, whether there are four or four thousand messages. And if the OP is like me, the reason most junk mail is deleted but not all is because most is deleted by the automatic filters and he's too lazy to go after the individual ones that sneak through. Here's my periodic effort: tag all messages between Y1-M1-D1 and Y2-M2-D2; save tagged messages to mbox foo; delete tagged messages from inbox. It took me longer to type it here than to perform it for real.
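
    The tag / save / delete routine above can be scripted. Below is a minimal Python sketch using the standard mailbox module, assuming the inbox is an mbox file. archive_range is a hypothetical name, and start and end are expected to be timezone-aware datetimes.

```python
import mailbox
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def archive_range(inbox_path, archive_path, start, end):
    """Move messages with start <= Date < end from the inbox mbox to the archive mbox."""
    inbox = mailbox.mbox(inbox_path, create=False)
    archive = mailbox.mbox(archive_path)  # created if it does not exist yet
    moved = 0
    inbox.lock()
    archive.lock()
    try:
        for key in list(inbox.keys()):
            msg = inbox[key]
            try:
                sent = parsedate_to_datetime(msg["Date"])
            except (TypeError, ValueError):
                continue  # leave undated or unparseable messages where they are
            if sent.tzinfo is None:
                sent = sent.replace(tzinfo=timezone.utc)  # treat naive dates as UTC
            if start <= sent < end:
                archive.add(msg)
                inbox.remove(key)
                moved += 1
        archive.flush()  # write the copies before the originals are dropped
        inbox.flush()
    finally:
        archive.close()  # close() also releases the locks
        inbox.close()
    return moved
```

    Run once a month or so with the previous period's date range, this is the same "shift them all and forget about it" step.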

    When you have effective search methods, it doesn't matter how much extra is saved by accident. When you've solved a few work-related issues by finding an old message with exactly the right info in 5 minutes instead of spending hours recreating technical information from a project several months or years removed, you realize it's an important work tool. As mentioned far above, the only reason not to archive is if you are going to have incriminating information in your archive; but it's a leap of faith to believe nobody else has that information, even if you delete it. So a better solution is to avoid being a bad person and writing about it in emails.

    I know what real pack-rat OCD people do, as I have some in my extended family. And I can safely say that my archiving of email does not clutter my life or my mind; rather it occupies an infinitesimal space on my hard drive and frees me from periodic sifting through data to carefully record those technical nuggets before I purge my active in box. I just shift them all by date-range and forget about it. It's kind of like delete-all with search-based undo, for when it turns out I really did need it for work.

  118. Some options by forrie · · Score: 1

    I am in the same boat. I ended up importing them into Cyrus for the last few years. It's not foolproof; however, if you configure the "squatter" service, it will do some rich indexing. I have found that, over time, even when older messages have an attachment, it doesn't always translate correctly into modern mailers. There could be several reasons behind that.

    A while ago, I saw a project called Zoe which was aimed at solving the problems described -- it was OS centric (Mac?), though I believe it's been abandoned.

    Another project out there is "dbmail" which is basically a large-scale email server (IMAP, etc.) that stores your messages in a MySQL database. Might be worth a shot.

    I think the original poster is asking about something that not only will store the data properly, but present some sensible GUI to peruse it all. This capability is veering into the paradigm of "document management" I would think. Especially with regard to access of the original attachments and their various encodings and formats.

  119. You are not alone. by Bent+Spoke · · Score: 1
    You might try this solution by the author of sqlite...

    http://www.sqlite.org/cvstrac/wiki?p=ExperimentalMailUserAgent

  120. Re:Psychiatric consultation! by pz · · Score: 2, Interesting

    When I run events, I need to be able to post-hoc review all of the correspondence for demographic analysis, often done two years after the event when the final reports are being written. Saying that this sort of behavior is odd, or not normal is either being a troll, or not understanding how the world works when you're not just a drone.

    This sort of behavior is odd and not normal. If you want to keep your email, then that's fine, but thinking that it's "vitally important" is odd and I think without question points to some "OCD with some component of Aspberger". If you don't then maybe you need to re-evaluate.

    I am however interested in how you pull demographic analysis out of emails? I mean, hopefully you're not suggesting that you go and chomp on the text to pull out fields of data?

    So on the one hand, you think my saving email for later access and analysis is not useful, but then, you want to know why it is useful?

    I run a research laboratory where we do two things, one is work on restoring sight to the blind, the other is to organize a conference every two years. The primary demographic analysis I need to do is to analyze the country-of-origin for email traffic pertinent to the conference. This has helped to raise many tens of thousands of dollars of support for the conference by demonstrating various aspects of the global attendance to funding agencies.

    Being able to access my email and locate attachments, review discussions, find references, remember addresses, etc., in other words, to recall what someone once wrote to me, has resulted in millions of dollars of grant money to fund my research. Without the ability to review email that is, at times, years old, that would not be possible. Having rich access to my email stream has allowed me to fund my lab, and therefore feed and house my family and the people who work for me, publish high-impact papers, receive numerous awards, get coverage in the international press, etc., or, put better, to run the daily business of a research lab at a high-profile university. While the tools I use are good, they leave a lot to be desired, and having a better system would make me more productive.

    IMO, this is one of the best Slashdot questions ever, and I am greatly anticipating hearing some good answers, especially if they don't include suggesting GMail as a panacea,

    I think that GMail could be the panacea here. I mean, if you're just trying to make sure it lasts and you can search it with ease, then GMail can do it better than you can.

    I dislike GMail for my professional correspondence for a number of reasons: (1) it does not allow me to readily use my university affiliation address (and since that's a top university, that makes a difference whether people like it or not), (2) I do not have ownership of my email, (3) the lack of a good filing / archiving interface makes it hard to associate different threads together, or to limit searches (I intensely dislike the tagging feature), (4) GMail has only a rudimentary ability to edit text since it's browser-based.

    I do use GMail for my personal correspondence, but that's mostly because it's the best of a bunch of poor, but free, services. It does have the best searching features, but falls down in a lot of other ways. It also would be against my employer's policies to store HIPAA-regulated email offsite. So GMail is not a panacea. Thanks for the suggestion, though.

    --

    Put my fist through my alarm clock with its ding-dong death inside my ear. - The Blackjacks.
  121. Private Zimbra installation by petree · · Score: 1

    While it's totally overkill for the job, I highly recommend you run a Zimbra Open Source instance for yourself. Although you don't need much of what it provides (Calendaring, contact sync, Jabber IM, etc), it will let you store your messages in a stable, searchable and accessible form. Zimbra can directly import from PST or via IMAP (with your mail client or imapsync) and once it has your messages it full text indexes them with Lucene and so you can search them via the web or IMAP clients. You can easily get your messages out via one of the supported export formats or just use your IMAP mail client to dump the messages into mbox/maildir/pst/whatever. While you could certainly roll your own, why not let someone else take care of all the hard work for you?

    1. Re:Private Zimbra installation by CoolCash · · Score: 1

      I would also recommend Zimbra Desktop, which uses a lot of the same tech as the full server. It's designed as a mail client, but it does have the awesomely fast search that Zimbra offers.

  122. Re:Psychiatric consultation! by pspahn · · Score: 1

    especially if they don't include suggesting GMail as a panacea, as I want to have the email text and attachments in my possession.

    Yeah, I've used Gmail for getting close to five years now. Does it bother me that they have access to all my stuff? No more than it bothered me that whatever ISP's email I used previously had access to all my stuff.

    In my eyes, it's just email, personal email at that. Of course I have sensitive stuff in there, but I'm not going to spend a disproportionate amount of time setting something up myself when I can just use what's already made.

    If you want to have everything in your own possession, you could always set up a client to download the messages, and then delete them off Google servers once done. But I understand the paranoia. It's the same thing that keeps me from signing up for a medical marijuana license, I'd just prefer to not have my name on that list.

    --
    Someone flopped a steamer in the gene pool.
  123. Unix mail spool file by chaoskitty · · Score: 1

    If you get everything into a standard (free)Unix spool file, it'll be readable a hundred years from now. After all, what other kind of archive file could you have from twenty years ago which you could easily use today?

  124. Apache Solr (+ IMAP) by arachnoprobe · · Score: 1

    Use the following for optimal performance: (1) IMAP for input, storage and access from any client for daily use; (2) configure Apache Solr to index your IMAP mail; (3) a web-based search interface to access your Solr index; (4) use (hierarchical) faceting (see example: http://search.lucidimagination.com/)

  125. Definitely Gmail by teh_tecchie · · Score: 1

    Leaving aside all the usual (tiresome) conspiracy theories I'd definitely import them to Gmail or, better still, a Google Apps account as per suggestions from other posters. I have all my mail going back several years and there's no problems for me.

  126. Re:Psychiatric consultation! by tha_mink · · Score: 1

    So on the one hand, you think my saving email for later access and analysis is not useful, but then, you want to know why it is useful?

    No, I wanted to know how saving email was the best way in which to accomplish the goal of demographic analysis. Now that you've explained what you do it *for*, which, for the record, I couldn't be less interested in BTW, I'm interested in how you achieve that goal with saved email? Last I've looked, and I could be way wrong, country of origin isn't listed in the email header. Also, IP addresses can't be that reliable two years after the fact either. So, how do you get country of origin from two year old emails? (not sarcastic either, I'm interested)

    --
    You'll have that sometimes...
  127. Quite a list, but humbly... by smchris · · Score: 1

    I like it that Evolution saves in the same format as Mutt. Quite a lot that a person can do with that and basic unix commands.

  128. Re:Psychiatric consultation! by tha_mink · · Score: 1

    I dislike GMail for my professional correspondence for a number of reasons: (1) it does not allow me to readily use my university affiliation address (and since that's a top university, that makes a difference whether people like it or not), (2) I do not have ownership of my email, (3) the lack of a good filing / archiving interface makes it hard to associate different threads together, or to limit searches (I intensely dislike the tagging feature), (4) GMail has only a rudimentary ability to edit text since it's browser-based.

    So...
    1. Yes it does. So long as your university allows you SMTP access, then Gmail can send email from your University address.
    2. Your University lets you own your email? No archiving or backup there? Interesting. I thought most Universities had a robust email retention policy these days.
    3. Gmail threads emails by default, has labels for filing, and you can even use postini if you have retention needs.
    4. What do you need to do, edit wise, that you can't with the Gmail RTE? Have you used it lately? If the Gmail RTE isn't good enough, there's a myriad of plugin RTE gadgets you can use too. Just sayin...
    Use whatever you want, and it's your business, but I don't see how any of your requirements are not fulfilled by Gmail.

    --
    You'll have that sometimes...
  129. Re:It's obvious - Gmail by flappinbooger · · Score: 4, Insightful

    It's obvious, upload them to gmail!

    (only half kidding)

    --
    Flappinbooger isn't my real name
  130. Re:Psychiatric consultation! by tha_mink · · Score: 1

    It's the modern equivalent of saving all your personal letters and other correspondence. What the heck is abnormal about that? In the old days you'd have a bundle of letters stored in the attic somewhere. But this doesn't result in heaps of paper or file cabinets full of it that get in your way, as it does for people with a genuine mental problem [wikipedia.org]

    But you wouldn't save your junk mail, would you? Grocery store fliers? Credit card offers?

    --
    You'll have that sometimes...
  131. Hold the phone! by Anonymous Coward · · Score: 2, Insightful

    Computers, hard drives, backups, electricity, rack space, and maintenance are all free! Fuck! Tell me where you shop for this stuff.

    1. Re:Hold the phone! by tehcyder · · Score: 1

      The same place he "shops" for his music and movies, I expect.

      --
      To have a right to do a thing is not at all the same as to be right in doing it
  132. Re:Psychiatric consultation! by Kvasio · · Score: 1

    b/s ....

    June 2001: Dave, my mind is going. I can feel it. I can feel it. My mind is going. There is no question about it. I can feel it. I can feel it. I can feel it. I'm a... fraid. Good afternoon, gentlemen. I am a HAL 9000 computer.
    Sept 2001: Can you take away this damn monolith?

    btw:

    Shhhh - Dave's the real father (AC doesn't know)..

    Is that some sort of crossover between SW:ESB with SO:2001 ?

  133. Re:Psychiatric consultation! by Jawnn · · Score: 1

    Your experience is as common as your rationale. Nevertheless, if email is the easiest way you have to find important information you are doing it (storing that important information) wrong.

  134. take a look at Beagle by anomalous+cohort · · Score: 1

    People here seem to think that you are looking for another email client. Instead, it appears to me that what you really need is a way to archive and search your local machine. In light of that, take a look at http://beagle-project.org/ Beagle can search your IMAP stuff and local file system stuff too. I run Ubuntu so the UX for installing, configuring, indexing, and searching with Beagle is pretty easy. Beagle is available in the Ubuntu Software Center. You can search from either the command line or from the firefox search bar once you have configured that.

  135. Re:RETARD MODERATION by hairyfeet · · Score: 0

    Or maybe, just maybe, someone modded it down because they actually read the TFA where he plainly said platform independent and a Linux only solution is no more platform independent than a Windows or OSX only one. I mean God For fricking bid that someone actually reads TFL in stories, but is it really so much to ask that they read the fricking summary of the story they're posting to? Hell why not just forget TFS altogether and start posting cookie recipes? Sheesh.

    --
    ACs don't waste your time replying, your posts are never seen by me.
  136. Re:RETARD MODERATION by halltk1983 · · Score: 5, Insightful

    Virtualbox is platform independent, and he also mentioned using a VM. Once all the email is on the IMAP server in the VM, you could easily attach to it with a client that runs on any platform.

    Also, IMAP servers are platform independent, as they can run on OSX, Windows, Linux, BSD, and almost any other popular OS I can think of. It's just that Linux distros are common, easy to set up, and light enough on resources that they would be easy to set up in a VM, and without the licensing costs of OSX or Windows, it becomes price comparable to lesser solutions.

    I know it's a lot to ask these days to get people to read the comments that they are replying to, but maybe, just maybe, someone complaining about a lack of reading comprehension should take more time to read.

    --
    Watch for Penguins, they eat Apples and throw rocks at Windows.
  137. As others have said - maildir + mairix by lidocaineus · · Score: 1

    If you want light, always in text format, easily searchable, and fast, maildir + mairix is your answer. You don't even need to keep your mail in a flat structure. Place this on a server with IMAP/s access, and you'll never have to move your mail again. Just make sure you have good backups. For the fastest results ever? Access your email over SSH using mutt. The only drawback is that if you're not a CLI person (and this doesn't even use it that much), you're going to hate this, or at least have to pile on a few scripts to web-ify mairix and its search results.

    And no offense to the gmail users, but true blue email types would never turn over their emails to anything not completely under their control.

  138. Archiveopteryx by Anonymous Coward · · Score: 0

    Archiveopteryx is designed to do exactly what you're talking about. It's also a good general purpose mail server.

  139. Re:Psychiatric consultation! by raind · · Score: 1

    Well, they have this thing called "backups": this is where you save your precious emails and move them to another medium.

    --
    Get up!
  140. Thunderbird 3.x + IMAP by Spiralis+Fractus · · Score: 1

    As far as a "cloud" webmail interface goes Gmail has the best search features (which probably contributes to why so many Slashdotters prefer Gmail), but the search features introduced into the Thunderbird 3.x mail client are the best of any e-mail interface. To even rival the customizability of searches that is available in Thunderbird 3.x would require one to be fluent with command-line commands like find and grep, but acquiring such fluency is temporally expensive.

    Timothy (OP) says that he has already tried Thunderbird though, but since his first complaint is that moving the "hundred of thousands of emails" that he has hoarded over the past two decades between the email systems that he has already tried takes "forever to process", Timothy appears to have some unreasonable expectations regarding data sets that large (specifically in regards to migrating and indexing such sets).

    For those who do not feel comfortable keeping their e-mails in the cloud, they could always use Thunderbird 3.x as the interface and administer their own IMAP server at home using software like Dovecot.

  141. Re:Psychiatric consultation! by burisch_research · · Score: 1

    Imbecile. Outside of the USA, the majority of email addresses end in a country-specific suffix.

    --
    char*f="char*f=%c%s%c;main(){printf(f,34,f,34);}";main(){printf(f,34,f,34);}
  142. WOW by cvtan · · Score: 1

    Your emails must be REALLY important!

    --
    Sorry, but gray text on gray background is making my eyes bleed.
  143. Re:Psychiatric consultation! by burisch_research · · Score: 1

    Read the post properly. He said he does NOT have ownership of his emails. This doesn't mean he's not responsible for the mundane details, but to quote the poster, "It also would be against my employer's policies to store HIPAA-regulated email offsite". So GMail is totally absolutely out of the question.

    --
    char*f="char*f=%c%s%c;main(){printf(f,34,f,34);}";main(){printf(f,34,f,34);}
  144. Re:Psychiatric consultation! by burisch_research · · Score: 1

    "... you could always set up a client to download the messages, and then delete them off Google servers ..."

    Or just not use GMail.

    --
    char*f="char*f=%c%s%c;main(){printf(f,34,f,34);}";main(){printf(f,34,f,34);}
  145. Re:Psychiatric consultation! by burisch_research · · Score: 1

    In Soviet Russia, the jokes LAUGH AT YOU!

    --
    char*f="char*f=%c%s%c;main(){printf(f,34,f,34);}";main(){printf(f,34,f,34);}
  146. Free yourself from the tyranny of data by anegg · · Score: 1

    Perhaps you could free yourself from the tyranny of data by just deleting the e-mail? You can keep a year or two around in your favorite e-mail tool, and just let the rest go... the alternative appears to be creating the digital equivalent of the old people living in houses filled with junk that they never do anything with.

  147. MailStore Home by bruceatk · · Score: 1

    This is a windows solution, and it works great. I have stopped using all clients and just use GMail on the web. I have archived all my Eudora/Thunderbird archives into MailStore. I now have one place to search all my e-mails, since 1999. http://www.mailstore.com/en/mailstore-home.aspx

  148. Re:Psychiatric consultation! by burisch_research · · Score: 1

    Far more efficient to simply leave the spam and let it sit with everything else. Ideally spam is deleted as it arrives, but some get missed ...

    --
    char*f="char*f=%c%s%c;main(){printf(f,34,f,34);}";main(){printf(f,34,f,34);}
  149. kmail first, then maybe imap by Anonymous Coward · · Score: 0

    I have a far simpler solution. Just use kmail, and set up autoarchiving. For example, on your main inbox, just create subfolders, one for each year. And as they age, they get put into the newest subfolder. Next year, label that one "2010". Lather, rinse, repeat.

    I get somewhere around 5-10k email messages per day (from a number of lists that I'm on), and have used this approach for the past 10 years. It works great.

    The advantage here is that this is on my home desktop, so it's already part of my backup system. I don't need to build a separate server. Plus, the searching ability is far superior to reading through any of the standard web-based archives of email forums. I have kmail's searching ability, plus find/grep/et al. It's a LOT faster, and I don't have to put up with any web-based apps.

    If you want remote access, THEN add imap, or ssh, or whatever. But it really doesn't get any simpler, or more powerful, than this.

  150. Hoarders by KRL · · Score: 1

    I was just watching Hoarders... and I think this would be the digital equivalent. Why on Dawkin's green earth would you possibly want to keep all that email???

  151. Re:Psychiatric consultation! by MobileTatsu-NJG · · Score: 1

    Damn, I'm going for +5 Funny and you guys mod me down to -1 Troll? Tough crowd. Get a sense of humor, will ya?

    Your post would have been modded funny if it had contained a humorous punch-line.

    --

    "I like to lick butts!" by MobileTatsu-NJG (#32700246) (Score:5, Informative)

  152. Re:Psychiatric consultation! by SimonInOz · · Score: 1

    Hey - I don't think there ARE any computers older than my dad. Lemme see, he was born in ... er, 1929.

    Nope, not too many PCs then ...

    --
    "Cats like plain crisps"
  153. Hmmm by Anonymous Coward · · Score: 0

    Well, if you just make each email a text file, then you could use Lucene, an open source search engine. It's pretty easy to use / implement a web interface if you want.
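
    To make the one-text-file-per-email idea concrete, here is a toy inverted index in Python. It is a stand-in for what a search engine such as Lucene does at far larger scale, not Lucene's API; the names build_index and search are made up for illustration.

```python
import re
from collections import defaultdict

# Words are runs of letters and digits; everything else is a separator.
_WORD = re.compile(r"[a-z0-9]+")


def build_index(documents):
    """Map each lowercase word to the set of document names containing it.

    `documents` is a dict of {file name: text}; in practice the text would
    be read from the per-email files on disk.
    """
    index = defaultdict(set)
    for name, text in documents.items():
        for word in _WORD.findall(text.lower()):
            index[word].add(name)
    return index


def search(index, query):
    """Return the names of documents that contain every word in `query`."""
    words = _WORD.findall(query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result
```

    A real engine adds ranking, stemming, phrase queries and an on-disk index, which is why using an existing one beats writing your own.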

  154. don't re-invent a broken wheel by turlingdrome · · Score: 1

    I agree with the general sentiment that maintaining electronic records (and emails are most definitely legal electronic records) is imperative. The IRS suggests maintaining at least 7 years worth of documentation in the event of an audit. It should be no different for electronic records.

    Where I don't agree with the general sentiment is the fear-mongering of privacy concerns with gmail. I switched to gmail (via Google Apps) about three years ago. It is without a doubt the best digital move I've ever made. Google's privacy policy is quite clear on how your data is stored and managed.

    If you still feel the need to maintain a local archive of your mail records, simply download them on a regular basis to a client of your choice. While I understand the interest of a hobbyist to create some elaborate local server/client for their mail, I (and I suspect many others) have more important things to do with our spare time. Enjoy the services that exist today to help you manage these records, instead of re-inventing the wheel.

  155. thunderbird portable by Anonymous Coward · · Score: 0

    I used thunderbird portable once and it satisfied my needs.
    I stored 1 year of email history on a CD. To see the mails you just insert the CD and run the exe (it makes a temporary copy on the hard disk if the media is not writable).

    Hope that helps

  156. Re:RETARD MODERATION by houstonbofh · · Score: 0

    I am sorry I ran out of mod points yesterday... On the other hand, I would have a hard time choosing between +1 Informative, +1 Funny, +1 Flame Bait with Style.

  157. Stand-alone solutions anyone? by Anonymous Coward · · Score: 0

    Great question. Has anyone ever used this:?

    http://www.mailstore.com

    I'd like to do a windows-based, off-the-net solution.

  158. Delete it by Anonymous Coward · · Score: 0

    99% of everything you save is probably total crap. The other 1% is mostly crap. Use a little common sense and you will find this task much easier.

    1. Re:Delete it by Kadin2048 · · Score: 1

      Given the price of storage, it doesn't make sense to spend a lot of time (potentially any time at all) sorting through messages by hand, deciding what to save, if you can just as easily archive all of them and then search for the ones you want later. Unless you put a very low value on your time, you can buy a lot of disk for an hour's worth of sorting.

      --
      "Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."
  159. I did this by Demena · · Score: 1

    about five years ago. I sucked all of my mail ever into an SQL database using Perl scripts.

  160. Re:RETARD MODERATION by Anonymous Coward · · Score: 0

    It's funny. It clearly wasn't meant to be taken seriously.

    Instead of responding with anger, you should have fun with it by coming up with something witty. If you can't, then take a deep breath and move on. Nerd rage only gets you laughed at. Even by others who may in fact be equally as nerdy.

  161. GMail by multipartmixed · · Score: 1

    You'll never find a system as fast at searching and categorizing that amount of mail.

    Just remember, Larry and Sergey read it all.

    Wes

    --

    Do daemons dream of electric sleep()?
  162. Greenstone by Anonymous Coward · · Score: 0

    There's a certain amount of setup/design/configuration involved, but you might think about the Greenstone Digital Library software from the University of Waikato in New Zealand (see http://www.greenstone.org/). It's an open source digital library package, and among the formats it supports out of the box for ingest/indexing/retrieval/display is e-mail archives. It's multiplatform, and in its 2.X incarnation pretty solid (or at least solid enough so that I have students install it on their own machines when I'm teaching our digital library course and I usually don't have a lot of support headaches as a result).

  163. imap = good, zimbra = imap done right by Anonymous Coward · · Score: 0

    While my history is not quite as long (only the last 10 years or so), I do have over 22GB and 130K+ email messages in my repo. I have a virtual machine that's running Zimbra (http://www.zimbra.com) open source. You get an extremely powerful email system built on top of open source components (Postfix, SpamAssassin, etc.). Not only does it make administering all this stuff simple and easy, but it also gives you a very powerful Web 2.0 UI, IMAP/POP3 access, multiple accounts, multiple domains, multiple everything. All my corporate and personal email gets automatically stored and indexed there. I host it all from home and don't need to worry about privacy issues. You couldn't make it easier or simpler to do this.

  164. Improvement: by subreality · · Score: 1

    grepmail regex mbox1 mbox2 mbox3 > newmbox
    mutt -f newmbox

    I lived on a steady diet of that for years, but now Thunderbird 3 does it better. It meets all of the OP's requirements:

    a) Everything's stored as an mbox. Fast conversion utilities abound.
    b) Performance with my very large set of data is good.
    c) mbox is as universal as it gets.
    d) Thunderbird 3 added full-text indexing. I can search GB's in seconds, instead of tens of minutes with grep.
    e) Thunderbird is on all major platforms.
    f) see (c).
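
    For anyone without grepmail, the "filter several mboxes into a new mbox" step can be sketched with Python's standard mailbox module. This is a rough equivalent under stated assumptions, not grepmail itself: the function name grep_mbox is made up, and the regex is matched against the raw message text, so bodies encoded as base64 or quoted-printable are not decoded first.

```python
import mailbox
import re


def grep_mbox(source_paths, pattern, dest_path):
    """Copy every message whose raw text matches `pattern` into dest_path.

    Returns the number of messages copied. dest_path should not be one of
    the source files.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    dest = mailbox.mbox(dest_path)  # created if it does not exist yet
    copied = 0
    try:
        for path in source_paths:
            src = mailbox.mbox(path, create=False)
            try:
                for message in src:
                    # as_string() gives headers plus body as stored on disk.
                    if regex.search(message.as_string()):
                        dest.add(message)
                        copied += 1
            finally:
                src.close()
    finally:
        dest.close()  # close() also flushes pending additions to disk
    return copied
```

    The result is an ordinary mbox file, so it can be opened with mutt -f just like the output of the grepmail command above.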

  165. Really? by Anonymous Coward · · Score: 0

    Machine analysis violates your privacy? And are you suggesting privacy exists when using other mail services, with plaintext messages traversing public networks?

    1. Re:Really? by Anonymous Coward · · Score: 0

      Machine analysis violates your privacy

      When you knowingly use gmail you are agreeing to keywords being added to whatever profile google has on you. When another person uploads your email to google you have not agreed or opted in to this system. The other person has violated your privacy by disclosing a private communication without consent.

      And are you suggesting privacy exists when using other mail services, with plaintext messages traversing public networks?

      Straw man.

    2. Re:Really? by crunzh · · Score: 1

      Please backup up your assertations with facts or references. You dont agree to "When you knowingly use gmail you are agreeing to keywords being added to whatever profile google has on you".

      --
      Visit http://www.crunzh.com/ for free software. Mac/Lin/Win
    3. Re:Really? by Anonymous Coward · · Score: 0

      http://mail.google.com/mail/help/intl/en/privacy.html
      They clearly indicate that your IP is logged. They use loophole phrasing like: to provide gmail and to *improve our services*. Note plural not singular, what they record is not limited to the gmail service.

    4. Re:Really? by Anonymous Coward · · Score: 0

      Please backup up your assertations

      "back up". ("backup" is a noun.)
      "assertions".

  166. Exchange 2010 SP1 by binaryspiral · · Score: 1

    I recently updated our Exchange environment to SP1, which allows me to create a new database on different storage and assign an Archive mailbox for users. So now I've got a terabyte volume on tier 2 SATA storage for folks to use as archive, and I can finally get those damn PST files off my file servers.

  167. Re:RETARD MODERATION by Anonymous Coward · · Score: 1

    Or maybe because it suggests running a VM on a desktop/laptop just so he could archive his mail.

    That's a piss-poor solution.

  168. I do a find and grep on a retired computer by beachdog · · Score: 1

    Here is my "non-deliberate" system for recovering emails, files, programs, website copies, correspondence, certification homework, photos and projects, from the last 12 years. All these components have been mentioned earlier except the find text in files command I show you below.

    I have the computer I used for the last 11 years sitting turned off with two disk drives. This is a metal case computer isolated from electrical surges by a UPS. I presently have the archive computer set up with an IO Gear device that switches the keyboard, mouse and display to the archive computer if I need to turn it on. I like this better than accessing the archive computer through SSH or a terminal server or remote desktop viewer. The silly but important thing is do not let the archive computer connect to your mailserver and download current emails. I find simply copying entire directories to an 8 gig USB drive is easier than messing around with SCP (secure copy) and SSH (secure shell).

    Those two disks go back about 12 years. The email systems I have used over the years have pretty well known names. The older email pattern is one big file containing every email with a blank line separating each email. For a Microsoft system, I use a USB stick as an archive device and I read it on a Linux box.

    I can search as much of the disks as I need using the following script (taken from Unix Power Tools)

    I store the script in a file called "findscriptfile" because I can't remember it, I just look it up when I need it. I wind up creating files with the same command on various places on my computers. Note the command line below requires blanks as shown. You will have to test and fiddle with the search. Sometimes you need to use "sudo " when file permission error messages cloud the results. The search time for an email address or a copy of a letter or a photo file is 20 minutes.

    This find script finds all files starting at the current directory and working down and the script searches each file for the word "thumbnail".

    find . -type f -exec grep thumbnail '{}' /dev/null \;
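
    On a machine without find and grep (a Windows box, say), roughly the same search can be sketched in Python. Two differences from the command above, both deliberate simplifications: this lists the matching file names only (like grep -l) rather than the matching lines, and it reads each file fully into memory, which is an assumption that will not suit multi-gigabyte mail files. The name files_containing is made up.

```python
import os


def files_containing(root, needle):
    """Yield paths under `root` whose contents include `needle`.

    The match is case-sensitive and done on raw bytes, so it works on
    mail files, documents and binaries alike.
    """
    target = needle.encode("utf-8")
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as handle:
                    if target in handle.read():
                        yield path
            except OSError:
                # Unreadable file (permissions, broken link): skip it.
                continue
```

    Calling list(files_containing(".", "thumbnail")) from the top of the archive disk mirrors the find command above.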

    I think one of the interesting things about the original problem posted to Ask Slashdot is what kind of information has enduring value and how much value does it have? Or another question might be what is the cost of storage per month and how many items on the storage system are worth more than the storage cost?

  169. Simple Solution by ccrasher · · Score: 1

    Try converting all the email to a PDF package. I have been adding to a PDF package for several years directly from Outlook, and I also include any or all important attachments. You can sort or search at any time using names, times, subjects, attachments or just about any other query parameter that you can think of. You can also secure the package, and if higher security is desirable you can save the secured package in an Office OneNote notebook with a password, which makes the file a little harder to crack. You can add to the package/archive any time you wish, or create multiple packages by week, month, year etc. and keep them in a large package. I believe CutePDF is free and works fairly well. I've had Acrobat Pro for several years and that works great for a paid program. Acrobat Pro attaches to your email client upon install, so at any time you can archive any or all emails with 2 or 3 clicks. Any PDF reader can be used on any system to read the email from a flash drive or CD or any portable media. I think you should be able to read the files 100 years from now, since I don't see PDFs going away any time soon.

  170. Re:RETARD MODERATION by daveime · · Score: 1

    SMTP is the transport protocol.

    IMAP and/or POP3 are STORAGE protocols.

  171. MailStewart Lite, Regular, Pro by cmholm · · Score: 1

    I've been using MailStewart "Regular", which uses a built-in SQLite engine, and claims to be good for +100k records. The "Pro" version uses MySQL. The initial import and periodic update of the archive couldn't be simpler.

    --
    Luke, help me take this mask off ... Just for once, let me butterfly kiss you with my own eyes.
  172. Thunderbird? by NikolaiKutuzov · · Score: 1
    Sorry if this answer may sound not tech-savvy enough or too simple.

    I have mails from 1995 onward, by now roughly from 15+ different accounts, most of them defunct. Except for a couple of months of 1996 and 2005, which I wistfully deleted, I converted them all with a small tool (Aid4mail, I think) from PST, Eudora, or Pegasus format into Thunderbird's UNIX-compatible format.

    + Open Source (Free + maintained + Supported)

    + Thunderbird searches and indexes just fine

    + plain text format (I can use all sorts of editors on them if necessary)

    + Always on my HDD (Encryption, no public mining, no external servers needed)

    + UNIX-Format guarantees I can convert them into something completely different in 20 years, should the need arise

    + no additional software needed

    Am I overlooking some of GP's requirements here? Or is the slashdot crowd prone to a little overengineering? :)

    Regards!

    --
    Invita Invidia
  173. Re:RETARD MODERATION by Anonymous Coward · · Score: 1, Informative

    You are wrong.

    POP3 is a transport protocol.
    IMAP is a transport protocol.

    You need to learn these things before you post.

  174. mailbox format + mutt by willijar · · Score: 1

    I too archive all my emails. The solution I use is quite simple compared to others proposed here.

    I have postfix deliver a copy of all mails to an archive directory with one mailbox file per year as well as to my inbox (or other mailboxes depending on filtering rules). Each archival mailbox file is about 5GB compressed.

    Filtering these mailboxes by header using mutt is a very fast operation and even doing in-body searches and views takes less time than it takes gzip to uncompress the files. Obviously they can easily be backed up in the usual ways.
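
    If your existing mail is in one big mbox rather than already delivered per year, a one-off split can be sketched with Python's mailbox module. This is not the Postfix setup described above, just an after-the-fact approximation; the function names and the "undated" bucket for messages without a usable Date header are assumptions for illustration.

```python
import mailbox
import os
from email.utils import parsedate_to_datetime


def message_year(message, default="undated"):
    """Return the year from the Date header as a string, or `default`."""
    header = message["Date"]
    if not header:
        return default
    try:
        return str(parsedate_to_datetime(str(header)).year)
    except (TypeError, ValueError):
        # Malformed dates are common in old mail; do not lose the message.
        return default


def split_by_year(source_path, archive_dir):
    """Copy each message into archive_dir/<year>.mbox; return counts per year."""
    os.makedirs(archive_dir, exist_ok=True)
    boxes = {}
    counts = {}
    src = mailbox.mbox(source_path, create=False)
    try:
        for message in src:
            year = message_year(message)
            if year not in boxes:
                boxes[year] = mailbox.mbox(os.path.join(archive_dir, year + ".mbox"))
            boxes[year].add(message)
            counts[year] = counts.get(year, 0) + 1
    finally:
        for box in boxes.values():
            box.close()  # flushes the added messages to disk
        src.close()
    return counts
```

    The per-year files can then be gzipped and opened with mutt as described, and the source mbox is left untouched.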

  175. indexing vs open source... by Herve5 · · Score: 1

    The trouble I see in our OP's question (which I share), is somehow that most of the open source solutions will have a slow interface (compared to, say, OSX Spotlight).
    I currently use Powermail on OSX (so, two closed solutions) because it handles almost 20 years of mail, measured in GB, and is still Spotlight-compatible (raises results while you type the keyword).
    The guys at Powermail are a small company that indeed started as the kings of indexing, long before Spotlight. To my knowledge they are the only email app on OSX that maintains Spotlight compatibility. But, they are "proprietary".
    I think if Powermail is to die, I'll transfer all my archive to an IMAP server, the way it has been described various times above. This too may be tricky: not all email front-ends will handle 1 GB of IMAP transfer properly, nor will all IMAP servers. Do try before using. I tried with Powermail and the French postal service's free email: this did well, but that's presently the only pair that indeed works for gigabytes.

    --
    Herve S.
  176. Been there, done that. by RichiH · · Score: 1

    Upload them to GMail. Or get Google Apps for your own domain (it's free) and use their GMail variant.

    imapsync will take care of your other IMAP accounts, mutt/pine for uploading from Maildir/mailfile and three dead chickens on a moonless night for PST.

  177. outlook 2010 by Anonymous Coward · · Score: 0

    I'm not sure if you've tried it, but Windows 7 + Outlook 2010 does a pretty good job for me.
    One note, however: keep your computer on for the first day, as indexing happens in a background process.

    The first day after I changed my mail system I couldn't search, but the next day it was all indexed. That makes sense: indexing is resource heavy, so it is done while you aren't using your system. Be aware, though, that Windows 7 must be configured not to enter sleep mode for this to happen!
    If you suddenly change mail systems you have to take that into account, but if you had started from scratch you would not need to keep your PC on for a night.

    By default Outlook 2010 has a 50GB PST size limit, but you can change that in the registry (do you have more than 50GB?).
    Also, to optimize the speed of browsing your email, create folders; don't put 50GB in one inbox,
    because refreshing the view window reads every item's properties in order to display them.
    Making some subfolders also improves organization, so you can find your stuff more easily.

  178. Cleaning out by IllusionalForce · · Score: 0

    I cannot stress enough how important it is to keep only relevant e-mails. I can't imagine that you actually need all those e-mails. Every year, I clean out my inbox and see if these e-mails would ever be of any future relevance.

    Keeping the "Happy Birthday" e-mails from your co-workers may not really be worth it after you've left the company. But keeping the response to a denied application with reasoning why you weren't hired, however, is worth it.

    A clean inbox can make your life much easier, you won't get caught in all the micro organizing you need for big bulks of e-mail.

  179. SQL Schema? by bill_mcgonigle · · Score: 1

    Frankly, you can't beat something like a SQL database for those requirements.

    I used to have this - a Filemaker Pro database I populated with mail via AppleScript. It would break a message into pieces and store the pieces in fields. But that was mid-90's, before e-mail got hard - there was a From, a To, a Body, etc. No quoted-printable, base-64, or multi-part MIME messages.

    It's great to be able to search "From:" some wildcard, Date-range foo to bar, Subject with a boolean keyword expression, but it's also important to be able to re-construct the message for forwarding, replying, etc.

    So... the CRUD is pretty straightforward, but what's the best way to represent it in SQL? The easy thing to do would be to load the message into an object with a canned library and then throw that at a SQL ORM, but somewhere down the line retrieving the data manually would also be useful.

    A quick search didn't turn up a well-known schema, but certainly this problem has been solved. Being able to use a fast search (tsearch2, for instance) would be so great vs., say, Thunderbird's built-in search. Anybody have any pointers?
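
    Not a well-known schema, but as a starting point here is a minimal sketch using Python's sqlite3 and email modules. Everything in it is an assumption for illustration: the table layout, the column names, and the helpers open_archive and store_message. It uses SQLite rather than PostgreSQL and has no full-text index (no tsearch2), so searches are plain LIKE queries. The one design point it does make is keeping the raw message in a BLOB next to the searchable columns, so a message can always be rebuilt exactly for replying or forwarding.

```python
import sqlite3
from email import message_from_bytes, policy

SCHEMA = """
CREATE TABLE IF NOT EXISTS messages (
    id          INTEGER PRIMARY KEY,
    message_id  TEXT UNIQUE,      -- duplicates are skipped on insert
    sender      TEXT,
    recipients  TEXT,
    sent_at     TEXT,             -- ISO 8601, NULL if the date is unusable
    subject     TEXT,
    body_text   TEXT,             -- decoded text/plain part, for searching
    raw         BLOB NOT NULL     -- original bytes, for exact reconstruction
);
"""


def open_archive(path):
    """Open (or create) the archive database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def _text(value):
    """Turn a header object into plain text, keeping missing headers as NULL."""
    return None if value is None else str(value)


def store_message(conn, raw_bytes):
    """Parse one RFC 822 message and insert it, ignoring known Message-IDs."""
    msg = message_from_bytes(raw_bytes, policy=policy.default)
    try:
        part = msg.get_body(preferencelist=("plain",))
        body = part.get_content() if part is not None else ""
    except LookupError:  # unknown charset or unsupported content type
        body = ""
    try:
        sent_at = msg["Date"].datetime.isoformat()
    except (AttributeError, TypeError, ValueError):  # missing or malformed date
        sent_at = None
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR IGNORE INTO messages "
            "(message_id, sender, recipients, sent_at, subject, body_text, raw) "
            "VALUES (?, ?, ?, ?, ?, ?, ?)",
            (_text(msg["Message-ID"]), _text(msg["From"]), _text(msg["To"]),
             sent_at, _text(msg["Subject"]), body, raw_bytes),
        )
```

    A query such as SELECT subject FROM messages WHERE sender LIKE '%alice@%' AND sent_at >= '2009-01-01' then covers the "From wildcard plus date range" case, and the raw column hands the untouched message back to any mail library.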

    --
    My God, it's Full of Source!
    OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
  180. Anonymous Coward by Anonymous Coward · · Score: 0

    People like you have a serious problem, there is no reason to HOARD your email. Keeping email from the 1990's, come on, the delete button was made for a purpose, TO DELETE!!!!!!!!!!!!!!!
    But if you need to keep your emails and meet all of the criteria you listed, the best option would be to store your emails in a MySQL database; you can run MySQL on Linux/Unix and Windows. Now, easier said than done: you would have to construct all of the tables and a way to import the mail, or do a quick search on sourceforge.net and try one of those solutions.

    http://sourceforge.net/search/?type_of_search=soft&words=email+archive

  181. Use the filesystem. by jonadab · · Score: 1

    Just store each message in a file. Each "mail folder" is a directory. Gnus calls this arrangement "nnml", and Courier calls it "Maildir". I don't know what other software calls it, but it's difficult to imagine any reasonably-capable software not supporting it, because it's so obvious and straightforward.

    There are several advantages to this arrangement. The big two are 1) it's hard to beat for compatibility and 2) for searching and indexing and stuff you can use standard utilities such as are made to operate on any kind of (text) file. (As for threading, your mail reader should be able to handle that. Once you find the file with one of the messages you're after you know its subject and message ID and stuff, so finding the thread in the mail reader is easy.)

    The one disadvantage is that you have to choose whether to put it all on a FAT filesystem (for maximum operating system compatibility) and suffer the performance disadvantages thereof (which are considerable when an individual folder contains many thousands of files; not as bad as with IMAP, but still very noticeable). Of course, moving/copying from one filesystem to another is only as problematic as copying any other kind of files around, so if you decide to use NTFS today (which has reasonable read/write support in Windows and Linux) and later decide to use an OS that doesn't have read/write NTFS support, you can just copy the files over to UFS or whatever at that time. Boot an OS that has both filesystems (Knoppix, for instance), cp -r --preserve=all blah blah blah, and leave it running while you go to work or something.
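
    The remark about threading can be made concrete. Given the Message-ID of one message you have found, the rest of its thread can be collected by following the In-Reply-To and References headers. The sketch below does this over a Maildir using Python's mailbox module; it is a simplified illustration rather than how any particular mail reader threads, and the names thread_ids and find_thread are made up.

```python
import mailbox
import re

# A Message-ID looks like <something@host>.
_ID = re.compile(r"<[^<>\s]+>")


def thread_ids(message):
    """Return every Message-ID a message names: its own and its ancestors'."""
    ids = set()
    for header in ("Message-ID", "In-Reply-To", "References"):
        ids.update(_ID.findall(str(message.get(header, ""))))
    return ids


def find_thread(maildir_path, seed_id):
    """Return the subjects of all messages connected to `seed_id`."""
    box = mailbox.Maildir(maildir_path, create=False)
    try:
        # Parse each file once, keeping only what threading needs.
        remaining = [(thread_ids(msg), msg["Subject"]) for msg in box]
    finally:
        box.close()
    known = {seed_id}
    subjects = []
    while True:
        linked, unlinked = [], []
        for entry in remaining:
            (linked if entry[0] & known else unlinked).append(entry)
        if not linked:
            return subjects  # no new messages joined the thread: done
        for ids, subject in linked:
            known |= ids
            subjects.append(subject)
        remaining = unlinked
```

    Because every message is its own file, the same walk works no matter which mail reader wrote the directory.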

    --
    Cut that out, or I will ship you to Norilsk in a box.
  182. Re:Psychiatric consultation! by dolmen.fr · · Score: 1

    As a common problem -- personally and in business -- listening to other people's solutions before digging into it yourself is an efficient way to deal with it.

    Thank you for your insight. We already knew that is a common problem. What the poster wants (and me as well) is concrete solutions. But you

  183. try mhonarc? by Anonymous Coward · · Score: 0

    I used to use mhonarc to create HTML navigable archives of my email. http://mhonarc.org/

    I did this for both the Mac running apple mail and for the wintel running Outlook.

    The challenge is sometimes to get the foreign email format into one of mhonarc's recognized formats.

    Then I would create archives that were no larger than a CD-R or DVD-R and archive them on those. Easy to mount and search.

    1. Re:try mhonarc? by termigator · · Score: 1

      I actually use mharc, http://www.mhonarc.org/mharc/, on my private server to archive various public lists and work-related email since it provides searching capabilities.

      For work-related email, I have nmh folders I file messages to for the various projects I work on and then have a cron job that runs each night that uses mharc's mh-month-pack script to copy the mail into mharc's archive area.

      This system provides me an MUA-neutral way to read and search email. Since mharc keeps an mbox formatted version of all data, I can import the messages to any MUA I want, however since I still have the nmh folder data, I've never had the need.

      The other advantage of my setup is I can follow mailing lists w/o cluttering my inbox. I have a separate mail account I use to subscribe to lists of interest, and I archive the messages on my private server for reading and searching whenever I want.

  184. if you don't mind the command line by shakotah · · Score: 1

    Sup (http://sup.rubyforge.org/) is Gmail-like in that it uses tags instead of folders, and it automatically indexes all the email using Xapian in the background, making it a small home Google for your email.

  185. Re: pack ratting by vandamme · · Score: 1

    You have 2 delete buttons for a REASON.

  186. death by Anonymous Coward · · Score: 0

    cheery thought. don't forget that when you die, there's more to read for whichever relative/partner inherits your PC.

  187. The obvious answer used to be ZOE by MCRocker · · Score: 1

    Back in the day, ZOE was exactly what you're looking for. It's an open source, cross-platform, turn-key solution (Simple Server is built-in) that is designed to archive, index and search your email (using the Apache Lucene search engine). Jon Udell has a good article on O'Reilly that includes some screen shots.

    ZOE meets all of your requirements, though data import is a bit of a problem. There are several different strategies for data import, so one of them may meet your requirements.

    Unfortunately, ZOE is abandonware so it's not for the faint of heart. The original author was on the bleeding edge and tended to make 'interesting' technology choices like Tapestry for the framework, and using his own, home-grown build system and a Creative Commons license that isn't usually used for software. He eventually abandoned Java development for Lua and let the registration for the home page lapse. As a result, it's difficult to recommend this for all but the most determined, high functioning users.

    --
    Signatures are a waste of bandwi (buffering...)
  188. Dovecot + Maildir by PhilipJLewis · · Score: 1

    Yes, me too. Been using dovecot and Maildir files for years now. Before that I used different open source IMAP servers (courier, cyrus, and UW imap), but since I used the Maildir file format the transition was automatic (I used mbox format before that with the UW imap server, and conversion was really simple using the mb2maildir perl script). I have used IMAP servers etc for 18 years worth of email. I organise the 250,000 emails into different folders for each year as that makes searching much quicker. It's never let me down yet.
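
    For readers without the perl script to hand, the mbox-to-Maildir step can be sketched with Python's standard mailbox module. This is not mb2maildir, just a minimal approximation; the function name mbox_to_maildir is made up, and the parent directory of the new Maildir is assumed to exist already.

```python
import mailbox


def mbox_to_maildir(mbox_path, maildir_path):
    """Copy every message from an mbox file into a Maildir; return the count."""
    src = mailbox.mbox(mbox_path, create=False)
    dest = mailbox.Maildir(maildir_path, create=True)  # makes cur/, new/, tmp/
    copied = 0
    try:
        for message in src:
            # Converting to MaildirMessage carries over the read, replied
            # and flagged status recorded in the mbox Status headers.
            dest.add(mailbox.MaildirMessage(message))
            copied += 1
    finally:
        dest.close()
        src.close()
    return copied
```

    The source mbox is only read, never modified, so the conversion can be checked against the original before anything is deleted.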

  189. Re:RETARD MODERATION by inKubus · · Score: 1

    I know it's a lot to ask these days to get people to read the comments that they are replying to,

    Oblig.: You must be new here.

    --
    Cool! Amazing Toys.
  190. What would I recommend? by Dieppe · · Score: 1

    ...That you get off the computer and get a life?

  191. Re:RETARD MODERATION by insertwackynamehere · · Score: 2, Informative

    Do you know what IMAP is? He's going to need some central storage, but the mail access is platform independent. Yeah, if he wants the IMAP server to be his own then he'll have to pick one OS to serve from, but every nonshit mail application has IMAP support, from desktop to mobile, and hands down it gives him what he wants if he takes the time to organize and set it all up.

  192. Mailstore by Anonymous Coward · · Score: 0

    You could try MailStore. I've been using it for a while, and for me it really does its job.

  193. codeworxx by Anonymous Coward · · Score: 0

    It's very easy - just use the free edition of MailStore, MailStore Home - visit http://www.mailstore.com

  194. I have a solution by AbbeyRoad · · Score: 1

    I have exactly the same requirements as you, so I spent the last six months developing a
    solution. It converts SentBoxes, Inboxes, Gmail, PST files and regular mbox.

    It archives and indexes everything and provides full text search with google-like
    phrase grouping and exclude phrases.

    It normalizes addresses, eliminates duplicates, understands every character set and
    can display any email within its web GUI with proper inlining of pics-in-html.
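
    As an illustration of what "normalizes addresses, eliminates duplicates" can mean in practice, here is a minimal Python sketch. It is not the poster's tool; it assumes duplicates can be recognised by their Message-ID header, and the function names are made up for the example.

```python
from email.utils import getaddresses


def normalized_addresses(message, header="From"):
    """Return lower-cased bare addresses from one header, without display names."""
    pairs = getaddresses(message.get_all(header, []))
    return [addr.strip().lower() for _name, addr in pairs if addr]


def drop_duplicates(messages):
    """Keep the first message seen for each Message-ID.

    Messages with no Message-ID are all kept, since there is nothing
    reliable to compare them by in this simple scheme.
    """
    seen = set()
    unique = []
    for message in messages:
        message_id = (message.get("Message-ID") or "").strip()
        if message_id:
            if message_id in seen:
                continue  # already archived from another mailbox
            seen.add(message_id)
        unique.append(message)
    return unique
```

    A stricter version could hash the body too, since the same mail saved from a Sent folder and an Inbox may differ only in its Received headers.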

    For me it can index 8 gigs of emails within a couple of hours.

    We are pilot testing this solution at an ISP for our customers.

    Would you like to try it out?

    My email http://2038bug.com/email.gif

    -paul

  195. Re:It's obvious - Gmail by hesaigo999ca · · Score: 1

    Is there a tool to download them again, though, once you have finished uploading them? You might lose the originals and only have the copies on Gmail. I am ignorant of the possibilities there.

  196. Re:It's obvious - Gmail by flappinbooger · · Score: 1

    you can use imap to upload existing emails from a local email client such as outlook or thunderbird.

    You can xfer most other webmail emails to gmail, they have a method for it.

    Then, after that, gmail has functionality for imap or pop to get emails back off if you wish (either a copy or permanent).
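
    If the mail client is too slow for the upload step, the same thing can be scripted. The following is a rough Python sketch using the standard imaplib and mailbox modules; the function name upload_mbox and the folder name "Archive" are assumptions for the example, and imap is expected to be an already logged-in imaplib connection.

```python
import mailbox


def upload_mbox(imap, mbox_path, folder="Archive"):
    """Append every message in an mbox file to an IMAP folder; return the count.

    `imap` is a logged-in imaplib.IMAP4 or IMAP4_SSL object.
    """
    # Creating a folder that already exists just gets a "NO" reply, which is ignored here.
    imap.create(folder)
    uploaded = 0
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for key in box.iterkeys():
            raw = box.get_bytes(key)  # message bytes without the mbox "From " line
            # No flags and no explicit date: the server stamps the upload time,
            # while the original Date header stays intact inside the message.
            status, _data = imap.append(folder, None, None, raw)
            if status != "OK":
                raise RuntimeError(f"server refused message {key}: {status}")
            uploaded += 1
    finally:
        box.close()
    return uploaded
```

    With Gmail this would be called on something like imaplib.IMAP4_SSL("imap.gmail.com") after login. Expect hundreds of thousands of messages to take a long time, and the provider may throttle you.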

    I've transitioned more than one small business to gmail.

    Googling various questions regarding this will reveal several good walkthroughs....

    --
    Flappinbooger isn't my real name
  197. Re:Do those you correspond with agree to profiling by jridley · · Score: 1

    Email is not a secure format and never has been. If you have anything you don't want to be public knowledge, don't use email, or encrypt it. This has been true since SMTP was invented. It's simply not secure. Everyone using email should know this.

  198. Commit no sin by rohitsz · · Score: 1

    If you commit no sin, you need no backups. Your slate is empty and clean. Emails, R.I.P. ~rohit.

    --
    Namaste.
  199. Re:Psychiatric consultation! by tha_mink · · Score: 1

    Imbecile. Outside of the USA, the majority of email addresses end in a country-specific suffix.

    Imbecile. Relying on a domain tld to gather demographics.

    --
    You'll have that sometimes...
  200. Anonymous Coward by Anonymous Coward · · Score: 0

    Well, you can of course select all from Outlook Express, then open a folder, name it as you like, and drag and drop whenever you want. That easy. It takes about 2 to 3 minutes for 3 GB of e-mails, and the good part is that if the names are the same when they are being transferred, it automatically adds a number, like joe (1), (2) and so on.

  201. Another approach by sonamchauhan · · Score: 1

    "I have kept every email I have ever sent or received since 1990"

    Instead, I keep every email I sent.

    And I make it a habit of acknowledging all emails I deem "important", quoting the full body of the original message in my reply.

  202. Re:Psychiatric consultation! by Anonymous Coward · · Score: 0

    My, what a prick you are. No wife or kids to beat up?

  203. archive old emails. by seekertom · · Score: 1

    Most of my emails can be viewed on one screen. I can take screen prints, paste them into Paint, and save them as a file. In cases where I need access to editable text, I open the email, mouse-select the text and then paste it into WordPad. It takes opening each one, but it's still faster than trying to forward them somewhere. Best is, I can do it all locally without depending on anyone else: ISP, email program, etc. Now that you know you have this problem with the archives, begin today with the new incoming emails and don't get behind, though a little behind is OK in my book.

  204. Re:Psychiatric consultation! by Fatalis · · Score: 1

    My old alarm clock PC doubles as a web server.

    So does mine, but it's kind of quiet, because the alarm clock part is just bells attached to the CD tray and a cron script, and it's only been useful once when the server was located in my friend's bedroom. I think the video I've linked to was shot at that time, when I knew he was oversleeping and I ran the script remotely.

    --
    Deus est fatalis