Slashdot Mirror


Ask Slashdot: Handling and Cleaning Up a Large Personal Email Archive?

First time accepted submitter txoof writes "I have a personal email archive that goes back to 2003. The early archives are around 2 megabytes. Every year the archives have grown significantly in size from a few tens of megs to nearly 500 megs from 2010. The archive is for storage only. It is a mirror of my Gmail account. The archives are both sent and received mail compressed in a hierarchy of weekly, monthly and yearly mbox files. I've chosen mbox for a variety of reasons, but mostly because it is the simplest to implement with fetchmail. After inspecting some of the archives, I've noticed that the larger files are a result of attachments sent by well-meaning family members. Things like baby pictures, wedding pictures, etc. What I would like to do from this point forward is strip out all of the attachments and only save the texts of the emails. What would be a sane way to do that using simple tools like fetchmail?"

167 comments

  1. Why bother? by grumbel · · Score: 4, Insightful

    Storage is cheap and 500MB is hardly worth worrying about. The damage done by reducing that amount will likely be far larger than any temporary benefits you might get. If you want to have it smaller so that you can have faster search, look for a tool that is better at searching and indexing the mails instead of trying to cut the mail into pieces.

    1. Re:Why bother? by InsightIn140Bytes · · Score: 1, Insightful

      Exactly this, and even if it's a few GB. It's just too small an amount to bother about. Besides, you never know which one you may want or need later. Even the ones you snobbishly think of as uninteresting now.

    2. Re:Why bother? by Anonymous Coward · · Score: 1

      That's true, but like the person posting the article it would be nice to have a convenient way to strip out all the attachments and have a "text only" archive with the attached images and other files stripped out (small and quickly searchable), and a "content/media only" archive of the attachments in the form of plain files rather than encoded within e-mail messages. The files collection would be smaller and easier to index with standard tools (e.g., thumbnails browser for the images). Having it all as a big "blob"-like mess crammed together may be simple, but it's an inefficient mess. And if I just had an easy way to browse through all the attachments at once I could probably strip the archive down to a tenth its regular size by throwing away most of the attachments I already have stored elsewhere. Binary files take up a lot of space when encoded as mail, and they're probably >90% of the space in the mailbox.
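
      A minimal sketch of the "plain files" half of this idea, assuming the archive has been decompressed to a single mbox file and that Python's standard `mailbox` module is acceptable. The function name `extract_attachments` and the counter-prefixed file names are illustrative choices, not something from the thread.

```python
import mailbox
import os


def extract_attachments(mbox_path, out_dir):
    """Save every named attachment in an mbox file as a plain file.

    Returns the list of paths written. Names get a running counter as a
    prefix so two attachments called "photo.jpg" do not overwrite each other.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    box = mailbox.mbox(mbox_path)
    try:
        for msg in box:
            for part in msg.walk():
                filename = part.get_filename()
                if not filename:
                    continue  # body text or a multipart container
                data = part.get_payload(decode=True)  # undo base64 etc.
                if data is None:
                    continue
                # Drop any directory components the sender put in the name.
                safe = os.path.basename(filename.replace("\\", "/")) or "unnamed"
                path = os.path.join(out_dir, f"{len(written):05d}-{safe}")
                with open(path, "wb") as fh:
                    fh.write(data)
                written.append(path)
    finally:
        box.close()
    return written
```

      The resulting directory can then be browsed with any thumbnail viewer; the mbox itself is left untouched by this pass.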

    3. Re:Why bother? by AliasMarlowe · · Score: 3, Interesting

      Exactly this, and even if it's a few GB. It's just too small an amount to bother about.

      Agreed. 500MB is trivial, especially if it includes a bunch of large attachments. I just checked my email directory at home, and it's 2.7GB in size. It's on a network drive and Thunderbird accesses it more-or-less instantly; there is no discernible lag in showing the content of any mail folder, even though the hierarchy of folders is complicated and some folders are large. The network drive is backed up automatically three times a week, so its risk of loss is tolerably low. With modern email clients, the penalty of huge email directories should be tiny.

      --
      Those who can make you believe absurdities can make you commit atrocities. - Voltaire
    4. Re:Why bother? by NotSanguine · · Score: 1

      I have my email going back to 1996. Several copies (2.4GB) of it in fact, as the email has moved with me through a number of disk and system upgrades. If you really must free up the space, why not just write the mail to a CD/DVD or a USB stick or put it on the SD card of your smart phone?

      --
      No, no, you're not thinking; you're just being logical. --Niels Bohr
    5. Re:Why bother? by Anonymous Coward · · Score: 1

      Mine's around 10 GB and kept in git in case TB decides to corrupt it again.

    6. Re:Why bother? by txoof · · Score: 2

      Storage is cheap, but backing it up to S3 is less cheap. I looked through a bunch of the mail and discovered that what I really wanted to save was the text. The rest is backed up on Google. If I lost it all, it wouldn't be a tragedy, but the mail between my wife and me before we were married and messages between my family are the things I treasure most, not the photos that I can find on facebook/flickr/gmail/picasa/etc.

      Finding a way to save some space and some bucks is worthwhile for me. After a lot of googling, I eventually landed on a script by Mike Leonetti that did most of the work for me stripping MIME attachments. I had to tweak it to work with fetchmail and procmail, but I eventually kludged it into working. I'm just testing it out now and hopefully it will do the job. Perhaps others would be interested. You can find a copy here: Stripping Mime Attachments.

      If anyone has a better solution, I'm definitely interested as my Perl fu is pretty weak and this solution is a pretty huge kludge.

      --
      This one's tricky. You have to use imaginary numbers, like eleventeen... --Hobbes
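
      For comparison, a rough Python sketch of the same idea (this is not the Perl script mentioned above). It assumes an "attachment" is any MIME part that carries a filename or a `Content-Disposition: attachment` header, so inline images without a filename would survive. The names `strip_attachments` and `strip_mbox` are hypothetical, and the sketch is a batch pass over an existing, decompressed mbox rather than a fetchmail/procmail filter.

```python
import mailbox


def strip_attachments(msg):
    """Remove attachment parts from a message in place; return how many."""
    if not msg.is_multipart():
        return 0
    removed = 0
    kept = []
    for part in msg.get_payload():
        is_attachment = (
            part.get_filename() is not None
            or part.get_content_disposition() == "attachment"
        )
        if is_attachment:
            removed += 1
        else:
            removed += strip_attachments(part)  # descend into nested multiparts
            kept.append(part)
    msg.set_payload(kept)
    return removed


def strip_mbox(src_path, dst_path):
    """Write a text-only copy of src_path to dst_path; src is not modified."""
    src = mailbox.mbox(src_path)
    dst = mailbox.mbox(dst_path)
    removed = 0
    try:
        for msg in src:
            removed += strip_attachments(msg)
            dst.add(msg)
        dst.flush()
    finally:
        src.close()
        dst.close()
    return removed
```

      Writing to a second file rather than rewriting in place means the original archive stays available until the stripped copy has been checked.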
    7. Re:Why bother? by erroneus · · Score: 2

      True-true.

      But on a related note, I have often longed for a "generic email database format" which could be a universal format for all email programs out there in some way. Pretty much a dream which is long over-due... about 10 years past-due. Perhaps there is already something like that and it has escaped me all these years but I seriously hate migrating email from one format to another. Not long ago, I was helping someone to recover some old email (Outlook Express) and contacts which were in Japanese and not in UTF-8 format. Turned out that Windows Live Mail didn't do a good job of importing that format/language of email at all no matter what I did. Fortunately, I was able to access Outlook Express on an old Windows XP VM I had and it worked out okay by exporting OE to MS Outlook in a PST file.

      Still... it would be nice if there were some universal email archive format which all email programs can use. And you know? The content of email hasn't changed since it was created long ago. Why can't we do something as seemingly obvious as this?

    8. Re:Why bother? by Spudley · · Score: 1

      Storage may be cheap, but that's hardly an excuse for being cluttered.

      Ask yourself: When are you ever going to read all those emails again? When is *anybody* ever going to read them again? And the more you have, the less likely it is that they ever will be read, because the more you have, the more time it will take to go through them.

      And don't tell me that doesn't matter because it's easy to run a search -- the same still applies, and you'd only bother running a search if you had something specific you wanted to search for. Is there anything in your 2003 email archives that you are likely to want to search for? The answer to this question may well be 'yes'; you know your archives better than I do; but I'll tell you this: if you haven't found the need to search an archive over the last five years, then the odds are diminishingly small that you'll need to in the future.

      My advice is to keep your archives, but take the time to filter out the stuff you really don't need or want any more.

      First, sort the list of emails by size.
      This will give you all the ones with attachments. The odds are most of the big stuff can be deleted. Most of the stuff you want to keep you'll already have extracted from your email and saved somewhere else. So feel free to delete them. There will also be obsolete software, video and flash attachments that were funny five years ago, and other junk. Deleting all this stuff will free up a substantial portion of your disk usage.

      Next sort the list by name of sender.
      This might sound odd, but it's a very quick way to see who you were talking to all those years ago. There might be a few surprises in there. People you'd lost touch with and virtually forgotten about. Maybe this is your chance to remind yourself to get back in touch? If so, then the exercise has been worthwhile even if you don't delete anything. Or maybe you know you don't want to talk to them. In that case, do you really want to keep those old emails from them? Get rid of them. It's cathartic.

      Next, check if you've been subscribed to any mailing lists over the years.
      Possibly you'll want to keep some of those archives, but equally there can be a lot of pretty mundane chatter on these things, and the bits that are relevant are often only relevant for the moment. It depends a lot on the individual lists, but my experience is that content five years old or more is unlikely to still be of much value. And in any case, most good mailing lists have their own archives online. So your own copies in your archives may be pretty pointless. Be ruthless and delete them.

      My guess is that if you followed that advice, your email archives are now about a quarter of their original size. And nothing of value was lost.

      In fact, doing an exercise like this every now and then can actually be helpful. Not because it saves disk space, but because it means that you do actually go back every now and then and look at what you were doing a few years ago. It's remarkable the things you forget over time. Sometimes it's good to be reminded. Other times you may not want to be reminded, but that's what the delete key is for; delete it, and you won't need to be reminded of it again when you do this same process next time.

      --
      (Spudley Strikes Again!)
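
      The "sort by size" step can also be done outside a mail client. A minimal sketch, assuming a decompressed mbox file and Python's standard `mailbox` module; `largest_messages` is an illustrative name, and size here means the raw size of the stored message, attachments included.

```python
import mailbox


def largest_messages(mbox_path, limit=20):
    """Return (size_in_bytes, sender, subject) tuples, largest first."""
    rows = []
    box = mailbox.mbox(mbox_path)
    try:
        for key in box.keys():
            size = len(box.get_bytes(key))  # raw message, still encoded
            msg = box[key]
            rows.append(
                (size, str(msg.get("From", "")), str(msg.get("Subject", "")))
            )
    finally:
        box.close()
    rows.sort(key=lambda row: row[0], reverse=True)
    return rows[:limit]
```

      Sorting the same rows by the sender field instead gives the "who was I talking to" view described above.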
    9. Re:Why bother? by ArundelCastle · · Score: 2

      nearly 500 megs from 2010.

      OP did not specify how much space is being used total, but everyone is taking the 500MB as the main sticking point. *facepalm*

      The point being it will get larger in the future, even if OP never runs the risk of exceeding Gmail's quota.

      Far as I can tell this is a TMI question about fetchmail and attachments. Wish I could help.

    10. Re:Why bother? by houghi · · Score: 3, Interesting

      Why bother indeed. When I look at my mail folders, I try to think of the last time I actually searched my personal mail for something older than one year.

      Mails that I keep are orders I placed and passwords that I requested. All the rest I delete after one year.

      I do a lot of deleting right after reading already, e.g. most mailing lists will be deleted almost immediately. Things I keep are bug reports I filed, till they are closed.

      This is something I do in real life as well. If I have not used something in a year and there is no emotional value, I will throw it away. Even though it is technically possible to keep everything, I see no reason to do so.

      --
      Don't fight for your country, if your country does not fight for you.
    11. Re:Why bother? by J4 · · Score: 2

      "snobbishly"

      WTF? It's his own personal email.

    12. Re:Why bother? by Anonymous Coward · · Score: 1

      Gmail provides an offline capability. Once it is installed, all attachments can be found somewhere in a folder inside your "Documents" folder. Kind of a workaround, but you can see all the current attachments! Does the job if you want to archive them...

    13. Re:Why bother? by grcumb · · Score: 5, Insightful

      I have often longed for a "generic email database format" which could be a universal format for all email programs out there in some way. Pretty much a dream which is long over-due... about 10 years past-due. Perhaps there is already something like that and it has escaped me all these years but I seriously hate migrating email from one format to another.

      Take a look at Maildir. It's not perfect, but it is generic, simple and easily transferred from one location to another.

      RANT: Over the course of my (far too many) years of working in technology, I've often been amazed just how enamoured everyone is with databases. There are some things that databases do well, granted, but just because something needs an index doesn't mean it needs a relational database. /RANT.

      --
      Crumb's Corollary: Never bring a knife to a bun fight.
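
      For anyone starting from mbox, a minimal conversion sketch, assuming Python's standard `mailbox` module (which can read and write both formats). The function name `mbox_to_maildir` and the paths passed to it are placeholders.

```python
import mailbox


def mbox_to_maildir(mbox_path, maildir_path):
    """Copy every message from an mbox file into a Maildir; return the count."""
    src = mailbox.mbox(mbox_path)
    dst = mailbox.Maildir(maildir_path, create=True)
    count = 0
    try:
        for msg in src:
            # Wrapping in MaildirMessage carries over mbox status flags
            # (read, replied, ...) where the module can translate them.
            dst.add(mailbox.MaildirMessage(msg))
            count += 1
    finally:
        src.close()
        dst.close()
    return count
```

      Each message ends up as its own file under the Maildir's `cur` or `new` subdirectory, which is what makes the format easy to copy around and inspect with ordinary file tools.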
    14. Re:Why bother? by grumbel · · Score: 4, Insightful

      My advice is to keep your archives, but take the time to filter out the stuff you really don't need or want any more.

      The problem with that is that it's extremely hard to judge what you will find valuable 20 years down the road.

      Simple example: Old TV recordings on VHS. I have all of Star Trek: TNG on VHS, labeled, sorted, with the commercials cut out. All nice and dandy you might think.

      You know which part I would love to rewatch? Now, some 15 years later? The commercials, exactly that part which I deleted. All the episodes I can get easily on DVD or on BluRay without problems, with higher quality and everything, but the stuff between the episodes? Nope, that's not available. Here and there a bit of stuff shows up on Youtube, but raw uncut TV from 15 years ago simply isn't easily available.

      There will also be obsolete software, video and flash attachments that were funny five years ago, and other junk.

      Yeah, and exactly that stuff might turn out to be extremely valuable years down the line, as your copy of it might be the only copy left or at least the only copy accessible to you.

      I have absolutely nothing against sorting, indexing and organizing the data, I quite welcome that, but that should be done as a layer on top of the data, not by hacking and slashing the original data itself.

    15. Re:Why bother? by Anonymous Coward · · Score: 0

      Projecting your own issues on others or what? Seriously, some people do have good memories they do in fact enjoy remembering. I sincerely hope you will have some too!

    16. Re:Why bother? by jtownatpunk.net · · Score: 1

      It's called "search". It would take tens of hours to manually sift thru all of my email and clean it up. And then I'd still need to use the search function to find stuff quickly. So what would I gain from this hypothetical cleanup scenario? I'd save maybe 2.5 gigs of storage. Be still my heart. It's a very poor economic tradeoff. My time (even unpaid time) is worth a heck of a lot more than that.

      I don't understand this "cluttered" concept that seems to distress you. They're bits on a hard drive, not filing cabinets in my office. If I have 50 messages in my archive or 58113, it looks exactly the same to me. I don't even bother filing now that search tools have become so quick. If a message isn't above the fold in my mail client, I just search it. Takes about 4 seconds to search my entire archive. Less time than it takes me to move a message to a folder. Why on earth would I spend 5 minutes, let alone 50 hours, clearing out old messages?

    17. Re:Why bother? by Larryish · · Score: 0

      GTIManiac, is that you?

      hash here.

      Hit me up, fucker. Use email, ICQ isn't up anymore.

    18. Re:Why bother? by batkiwi · · Score: 2

      Maildir is exactly what you say, a generic email database.

      It's not a relational database, as email isn't really relational in nature, but solves most/all of the problems you need to solve around storing emails. The only big "miss" in maildir is that attachments are stored inside the main message, making pass-through deduplication difficult/impossible.

      (many storage devices now can auto deduplicate files that are identical, so if you get the same image in 15 different emails due to reply-to-all etc you only store the image once).

      I think email clients (and servers for imap-searching) should keep a relational attribute-based index of emails (so that you can instantly pull up all emails from "bob," or all emails on oct 31), but that's an internal implementation note and not the actual mail store.

    19. Re:Why bother? by icebike · · Score: 4, Interesting

      Ask yourself: When are you ever going to read all those emails again? When is *anybody* ever going to read them again?

      As soon as:
      1) you divorce
      2) you get arrested for ANYTHING
      3) They arrive with a search warrant for any reason
      4) You sue or are sued
      5) You run for office
      6) You get hacked

      Seriously, I keep VERY little historical Email. Very little.
      I am not so vain that I believe there is any historical significance, and have never needed to go back more than a couple months for anything.

      Just Delete it. It's safer that way.

      --
      Sig Battery depleted. Reverting to safe mode.
    20. Re:Why bother? by Anonymous Coward · · Score: 3, Informative

      That's easy. (Old school) Eudora uses the mbx format, but separates the attachments from the mails.

    21. Re:Why bother? by aardvarkjoe · · Score: 1

      My guess is that if you followed that advice, your email archives are now about a quarter of their original size. And nothing of value was lost.

      Well, except the time that you spent sorting through all your old e-mails. I'm sure that I could erase 99% of the old e-mails in my archive ... but that would require actually going through them so that I could save the ones that I may need in the future. (And yes, every once in a while I have a reason to go find something from ten years ago or more.)

      Remember, "clutter" only matters where it actually impedes your efficiency. Your computer doesn't care how many junk e-mails you've got in your archives. In my case (and it sounds like the question submitter's case), storage prices are dropping and capacity is rising much faster than my e-mail archive's size. It makes a lot more sense for me to just save everything and search for what I need when I need it.

      --

      How can we continue to believe in a just universe and freedom to eat crackers if we have no ale?
    22. Re:Why bother? by pla · · Score: 2

      WTF? It's his own personal email.

      Poor choice of words, perhaps, but I completely understand the sentiment. I've had some form of email since around 1991, and despite my OCD-like "completionist" tendencies, I never thought to archive it all until sometime around 2003.

      Now, considering the tiny actual disk space those early emails would have taken, I sorely regret my earlier habit of read-respond-delete.

      These days, I delete spam (and some large attachments), and nothing else. And some day, I'll probably regret deleting even the spam... But since I get around a 10:1 ratio of spam, I can't realistically keep it all.

    23. Re:Why bother? by Anonymous Coward · · Score: 0

      Wow! You are _SO_ clever!

      Is your IQ, like, a bazillion?

      I wish I was as clever as you!

    24. Re:Why bother? by simcop2387 · · Score: 3, Interesting

      I'm at about 12GB myself, and that's one of the two big reasons that I keep the mail in maildir format and connect all clients to it via imap. Using a real mail server has kept that from happening to me (again) for years now. The other reason is that it makes it really easy to change clients to play around, or access it from lots of places.

    25. Re:Why bother? by pla · · Score: 2

      That's true, but like the person posting the article it would be nice to have a convenient way to strip out all the attachments and have a "text only" archive with the attached images and other files stripped out (small and quickly searchable), and a "content/media only" archive of the attachments in the form of plain files rather than encoded within e-mail messages.

      Search for everything. Sort by attachment. Select all (that have attachments). Save attachment(s). Delete attachment(s). Done.

    26. Re:Why bother? by Larryish · · Score: 1

      I wish I was as clever as you!

      You and everybody else.

      BTW why was that post modded troll on Slashdot?

      Do people still fall for that?

    27. Re:Why bother? by sootman · · Score: 1

      Funny. Maybe I've been working with databases too long that it's affected my mind (or maybe I got *into* databases because that was *already* how my mind worked) but I've *always* wanted to be able to say things like "show me all messages from my mom, dad, or sister, that arrived in 2005 and had attachments, and sort with the biggest at the top."

      On a related note, the fact that Gmail doesn't let you click column headings to sort absolutely kills me.

      --
      Dear Slashdot: next time you want to mess with the site, add a rich-text editor for comments.
    28. Re:Why bother? by grcumb · · Score: 1

      Funny. Maybe I've been working with databases too long that it's affected my mind (or maybe I got *into* databases because that was *already* how my mind worked) but I've *always* wanted to be able to say things like "show me all messages from my mom, dad, or sister, that arrived in 2005 and had attachments, and sort with the biggest at the top."

      Oh, don't get me wrong, I love that kind of stuff too. I once worked on a product that allowed you to construct queries along the lines of 'Show me every speech by every West Coast politician who spoke about the salmon fishery between July and September, 2009, translated into French.' But here's the thing: It didn't use a relational database.

      I love finding clever ways to mangle data. It's my bread and butter. But I do NOT love relational databases enough to use them for everything, all the time.

      --
      Crumb's Corollary: Never bring a knife to a bun fight.
    29. Re:Why bother? by Anonymous Coward · · Score: 0

      If you need people to be able to search to prove you are in the right, then you really should buy an email archiver with e-discovery (but they are priced for business consumption, not consumer consumption), such as models from Tangent, Critical Path, Intradyn, Barracuda, etc. (a web search will yield many appliances and cloud-based ones). Most will not tell you the costs. These will prove your side, provided the emails are as you state. These have no way of removing emails for years.

    30. Re:Why bother? by gaspyy · · Score: 1

      Been there, done that.
      My email archive dates back to 1995. Over the years I've been using Pine, Eudora, Outlook Express, Netscape Communicator, Outlook, Thunderbird, Windows Live Mail.

      I converted everything to EML. It's a simple format, easy to read and parse, recognized by the OS. With a simple script I renamed each file to YYYYMMDD-From-Subject.eml, so now it's accessible any way I like, by glancing at the file name, or by searching the contents (Windows 7 indexes EML files).

      Writing a script to strip attachments is trivial compared to mbox.
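
      The renaming script itself is not shown in the thread. A sketch of how such a YYYYMMDD-From-Subject.eml name might be derived from a message's headers, assuming Python's standard `email` package; the function name, the use of the sender's address for the "From" part, the 60-character cap and the "00000000" fallback are all illustrative assumptions.

```python
import re
from email.utils import parseaddr, parsedate_to_datetime


def _clean(text):
    """Keep only characters that are safe in file names on common systems."""
    return re.sub(r"[^A-Za-z0-9@._-]+", "_", text).strip("_")[:60]


def eml_filename(msg):
    """Build a 'YYYYMMDD-From-Subject.eml' name from a message's headers."""
    try:
        stamp = parsedate_to_datetime(str(msg.get("Date", ""))).strftime("%Y%m%d")
    except (TypeError, ValueError):
        stamp = "00000000"  # missing or unparseable Date header
    sender = parseaddr(str(msg.get("From", "")))[1] or "unknown"
    subject = str(msg.get("Subject", "")) or "no-subject"
    return f"{stamp}-{_clean(sender)}-{_clean(subject)}.eml"
```

      Two messages with the same date, sender and subject would collide under this scheme, so a real script would need a tie-breaker such as a counter.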

    31. Re:Why bother? by Malc · · Score: 1

      Thousands of tiny files? That sounds really efficient, especially if you want to copy them, or perform some other global operation. ZZzzzz.

    32. Re:Why bother? by hairyfish · · Score: 1

      Agree. I read email, deal with it, then delete it. 15+ years of using email and I've never found any reason to keep email ever. My current work email inbox is less than 100MB, most of which was generated in the last couple of weeks. My personal email has nothing in it. Why bother?

    33. Re:Why bother? by wvmarle · · Score: 1

      Cyrus imap server has indexing services built in. Works well.

      I can search my complete e-mail archive (something like 12 GB over 8 years, including attachments) in seconds, while I'm sitting at home with barely any mails copied locally (only mails that I actually opened are pulled in, for the rest only the headers are downloaded).

      Mail client is Evolution; server is Cyrus imapd. I do assume other IMAP servers will have similar functionality, and other IMAP mail clients will also handle server-side searches just fine.

    34. Re:Why bother? by wvmarle · · Score: 2

      I have some 12 GB of mail, mainly business related, lots of attachments, dating some 8 years back.

      Quite regularly (once a month or so) I am looking for some e-mail that I received a good half year to a year ago, to look up some detail about an old deal or offer.

      And sometimes I have to look up something that's a bit older than that. Two, three times so far I have been searching through e-mails that dated five, six years back, pretty much the beginning of the archive then. And that usually also had to do with the attachments.

      It is totally unpredictable what you could need in the future, which is why I don't delete any of it. Well, that is except the sent folder which I now and then trim (though the last time I did that is also two years ago or so by now), as almost everything that I send out comes back to me as quotes in a reply mail. And that's enough for me.

    35. Re:Why bother? by AmiMoJo · · Score: 1

      Google will sell you 20GB extra space for $5/year. You can now upload arbitrary file types to Docs too, including encrypted archives. The only cheaper alternative is Skydrive which gives you 25GB free, but bulk uploading does require Silverlight unfortunately.

      Considering the amount of time and effort I'd have to put in to sort through and reduce my archives $5 a year is an absolute bargain. You get Gmail's search and anti-spam systems too which are pretty valuable in themselves.

      --
      const int one = 65536; (Silvermoon, Texture.cs)
      SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
    36. Re:Why bother? by Anonymous Coward · · Score: 0

      Mail storage under flat file system? Sounds really great.

      I myself have always been a fan of (ab)using the file system as much as I can. It seems that by using Maildir I will be able to check email structure with programs like WindirStat or KDirstat.

      Thanks a lot :)

    37. Re:Why bother? by dbIII · · Score: 2

      7) New girlfriend
      "Why did you never send me emails like those ones you sent her?" she asked.
      "You didn't even have an email address before you moved in" just wasn't a good enough excuse.

    38. Re:Why bother? by Anonymous Coward · · Score: 0

      Sir, if your girlfriend has access to your email, you have already fucked up.

    39. Re:Why bother? by MooKungfu · · Score: 1

      Yeah, man, like, live in the now.

    40. Re:Why bother? by uglyduckling · · Score: 1

      I'm not keen on deleting, but archiving and putting elsewhere is a good idea, otherwise you end up with thousands of hits when you do a search for something. If I know I want something from 5 years ago I can always open the archives.

    41. Re:Why bother? by Anonymous Coward · · Score: 0

      Damn...reluctantly after storing them for many many years, I just got rid of the labeled (title and description) tapes of STNG, DS9, and Voyager that my mom made when the shows were new, ads included. I guess someone out there might have actually liked them... *sigh* (I love Star Trek, but I don't even have a VHS player now!)

  2. BURN IT !! by Anonymous Coward · · Score: 0, Offtopic

    Then don't go back. EVER !!

  3. Don't do it by Anonymous Coward · · Score: 1

    My email archive dates back to 1999 and is 2GByte in size which isn't much considering the attachments.

    I "handle" it by making a backup of it.

    I do not clean it up. I do clean around it by deleting mail archives that contain mails that have no personal value.

    I do not delete personal mails since they are precious, like photos. In 2011 nobody has to delete his personal mail.

    This news is stupid

  4. Isn't there a way... by jabberw0k · · Score: 1

    Surely a tool exists to keep email in a SQL database, so the envelope fields, plain text, and attachments are separately searchable. I have email back to 1996 with the same frustrations.

    One would think that Thunderbird would have done that a decade or more ago, but no. Nor do any of the standard IMAP servers seem to support SQL (MySQL, Postgres) as a backend: This seems like a serious project waiting to happen. Or have I overlooked an obvious solution?

    1. Re:Isn't there a way... by BitHive · · Score: 4, Interesting

      You have. Thunderbird includes archival folders and a Lucene search engine.

    2. Re:Isn't there a way... by zmughal · · Score: 5, Informative

      There is DBMail.

    3. Re:Isn't there a way... by icebraining · · Score: 2

      Sup uses Xapian, it's pretty fast too.

    4. Re:Isn't there a way... by cras · · Score: 2

      Email isn't stored in SQL, because typically it's rather pointless. Full text search indexing doesn't require SQL, and it's more efficient without SQL anyway. There are some good use cases for storing emails in SQL database, but efficiency isn't one of them.

    5. Re:Isn't there a way... by SealBeater · · Score: 1

      Actually the point of storing email in SQL isn't just for indexing, there's a huge speed advantage. DBMail (which I've administered and installed) is used for high volume mail transactions, on the order of 200,000k per sec. Also, having a DB backend carries with it all the advantages of having a DB, snapshots, mirroring, cross-regional updates, backups, etc. I agree, you can definitely get by without it but having email in a database is nice.

      --
      -- Its survival of the fittest...and we got the fucking guns!!!
    6. Re:Isn't there a way... by cras · · Score: 1

      Yes, there are some advantages to using an SQL database, like I said. But I highly doubt "huge speed advantage" is one of them, unless you compare to a really badly set up system. I know people have switched from DBMail to Dovecot simply because Dovecot is so much faster.

    7. Re:Isn't there a way... by bill_mcgonigle · · Score: 1

      Yes, there are some advantages to using an SQL database, like I said. But I highly doubt "huge speed advantage" is one of them, unless you compare to a really badly set up system.

      Yeah, like Thunderbird still using Mork databases after a decade of sharp-poke-in-the-eye performance.

      --
      My God, it's Full of Source!
      OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
  5. why not save attachments? by Anonymous Coward · · Score: 0

    I am confused why you would want to save only text and not attachments?
    What's the point of having a note that says : here are the pictures of your long lost relative : and not have the picture in your archive?
    It's about the attachment in most cases isn't it?

    1. Re:why not save attachments? by thogard · · Score: 1

      Many attachments are in the mailspool lots of times. This is how Google started allowing massive amounts of email storage, they only store a given attachment once (or so) even if it's in a million email messages.

      It would be nice if there was a better way of going through the archives and moving the attachments off to one place to deduplicate things.
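
      One way such a pass could find the duplicates, sketched under the assumption that two attachments count as "the same" when their decoded bytes have the same SHA-256 digest. The function name `duplicate_attachments` is hypothetical, and only Python's standard library is used.

```python
import hashlib
import mailbox
from collections import defaultdict


def duplicate_attachments(mbox_path):
    """Group attachments by the SHA-256 of their decoded bytes.

    Returns {digest: [filename, ...]} for content seen more than once.
    """
    seen = defaultdict(list)
    box = mailbox.mbox(mbox_path)
    try:
        for msg in box:
            for part in msg.walk():
                filename = part.get_filename()
                data = part.get_payload(decode=True) if filename else None
                if data:
                    digest = hashlib.sha256(data).hexdigest()
                    seen[digest].append(filename)
    finally:
        box.close()
    return {d: names for d, names in seen.items() if len(names) > 1}
```

      Hashing the decoded payload rather than the encoded text matters because the same picture can be wrapped differently by different senders' mail programs.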

  6. MailStore by Anonymous Coward · · Score: 0

    I use MailStore Portable version on a USB key. Works very well for me, and is free for home / personal use.

    http://www.mailstore.com/en/mailstore-home.aspx

    1. Re:MailStore by mysidia · · Score: 0, Flamebait

      It appears to be a Windows-only solution... no version for MacOS or Linux. Less than useful, I would say.

    2. Re:MailStore by Anonymous Coward · · Score: 0

      "Right, because no one uses Windows"

      I think he was addressing the smart folks. You know, the ones who don't use Windows.

  7. SQL by papabob · · Score: 0

    yes, I know it isn't what you asked, but if you know a little SQL you can create a simple database with a few tables: mails ordered by date, relations between them based on the headers (to follow up responses) and various types of attachments.

    Accessing it with SQL is no more complicated than with fetchmail (unless your fetchmail isn't the same fetchmail I remember ;) and as an extra you can create a simple web page with search options, and point relatives to it when they ask you for that photo of the dog they sent you five years ago.

    1. Re:SQL by Anonymous Coward · · Score: 1

      What rubbish! Don't talk about stuff you know nothing about. Go do your homework kid.

    2. Re:SQL by gl4ss · · Score: 1

      What rubbish! Don't talk about stuff you know nothing about. Go do your homework kid.

      yep, creating a friggin web app sure isn't the easy solution.
      the question asked here is actually "anyone know a script that would go through mbox and detect where an attachment starts and ends and just strips those away".

      the common sense answers are though "don't strip them away, you'll lose the content and besides storage space is cheap".

      storage space is cheap, both local and cloud. I wouldn't trust too many cloud startups to be out there in a while though..

      --
      world was created 5 seconds before this post as it is.
    3. Re:SQL by papabob · · Score: 1

      yay, my bad. I always forget this is a site where the men are still men and anything that doesn't involve writing an obscure script only known by its creator is forbidden (and a browser to share? Jesus!)

      Anyway, if there are any other pussies like me, there are tons of ways and utilities to convert mbox to maildir, and then it's easy to parse them into a DB (I'd recommend sqlite; it's still just a simple file in a directory, and the overhead would be minimal)

    4. Re:SQL by Anonymous Coward · · Score: 0

      My emails are all stored in a Postgres DB. All with my own scripts. Of course I wish it was only growing at 500MB/year. Now to get it internet accessible.

  8. 500 megs? How about 5GB/year! by Corporate+T00l · · Score: 1

    This problem seems almost too simple, text-only and only up to 500MB per year.

    I have a much tougher problem, a mailbox that is growing about 5GB per year that I still need searchable. And, stripping out the attachments is not okay, I need a way to still access them since many of them are receipts in PDF or edits on documents where the e-mail trail is the only record of changes over time. Thus ideally, the attachments should be indexed as well.

    I guess you could do what many apache.org sites use: mod_mbox to make a web-accessible version of your mail folders, possibly pre-processing them with an mbox splitting tool to get them into bite-sized chunks.

    Then, overlay a search tool like Lucid Imagination (which is what lucene.apache.org uses) or any other local web indexer of your choosing in order to build searchability.

  9. Seconded by Colin+Smith · · Score: 1, Redundant

    Nothing to see here. Move along.
     

    --
    Deleted
  10. Why keep it? by Anonymous Coward · · Score: 2, Insightful

    If you're not following Sarbanes-Oxley, just delete it. Fuck the pack-rat mentality.

    1. Re:Why keep it? by MrMickS · · Score: 1

      It's personal email. I've reasonably often searched back over a number of years for something I vaguely remembered, and finding the associated emails gave me the information I was looking for. I have personal email going back to 1996 all sitting behind an IMAP server. I did look at clearing it down at one time but, in the end, that was more effort than simply leaving it there.

      I guess I'm not part of the disposal culture that we have these days. I place value on history, even my own.

      --
      You may think me a tired, old, cynic. I'd have to disagree about the tired bit.
  11. Thunderbird + AttachmentExtractor by Anonymous Coward · · Score: 0

    https://addons.mozilla.org/en-US/thunderbird/addon/attachmentextractor/

    I've used it to process large batches, it's pretty robust.

  12. Come on, man by Anonymous Coward · · Score: 0

    Did you even try Google yet? http://lmgtfy.com/?q=strip+attachments+munpack

    You can probably also do it with procmail or perl or whatever scripting language you prefer. Let me know if you can't find the search box on Google and I'll post some more lmgtfy.com links for you.

  13. EASY SOLUTION HERE by Anonymous Coward · · Score: 0, Troll

    look dude... we have 2011 where people confuse TB and GB and... and people like you are seriously full of shit...

    just delete everything (including the baby pictures) of personal mail your ever received.. because all of it sure was wasted on you...

    500MB is a fucking joke... i have porn movies captured from VHS tapes that is larger than 500MB... and you want to delete baby pictures... damn you suck...

  14. Re:500 Mb only? by optimism · · Score: 5, Insightful

    Many people have a larger email store than you.

    It is not a sign of status.

    More likely, it is a sign of your incompetence to filter and save relevant data.

    Congratulations.

    Now back to the OP, who perhaps is smarter than you, since he has just 500MB of email to back up.

  15. get therapy by __aaacoe2998 · · Score: 0

    What possible reason could you have to save personal emails from that long ago? And you want to save the text, but not the attachments? Years from now you'll read an email that says: Here's the pix from xmas, enjoy!

  16. dovecot with mbox.gz by Anonymous Coward · · Score: 0

    This doesn't answer your question but may be helpful - dovecot supports (imap and pop3) reading gzipped mbox files. Keeping my archives gzipped brought them to a manageable size.

  17. I hope your family finds this by Anonymous Coward · · Score: 0

    You insensitive clod!

  18. Megs? by Anonymous Coward · · Score: 1

    My gmail inbox is using 2.7GB, or roughly 34%. I know someone using more than 70%. They provide a way to get more room for a reason.

    Just keep it all, and as other people have said, index it.

  19. Here is what I do by Anonymous Coward · · Score: 1

    fetchmail + procmail sorted into different folders in maildir format.

    I don't auto strip attachments on large mails, I just sort them out from the rest, but it would be easy to add.

    I have a maildir folder for inbox, outbox, notme (e.g., stuff addressed to distribution lists), large (here go all the mails with massive attachments).

    large, for me, is manageable to go through manually. It only has a few tens of messages / yr.

    If I had more to go through, procmail could call a simple script to strip the attachments on all the mails that are large enough to currently get sent to the "large" folder. I have something setup like this already to train a Bayesian filter on mail dropped to certain folders.

    Here is the relevant procmail (pretty simple to do):

    # if the message is huge, probably don't want to archive it, even if directly
    # to me
    :0:
    * > 200000
    ${DUBIOUS}.large/

  20. IMAP by spinkham · · Score: 2

    IMAP is another potential answer.

    I run Dovecot locally, and it stores every mail I've ever received, indexed for quick searches.

    This way I can get my mail with all history and a fast search index on all my devices also.

    --
    Blessed are the pessimists, for they have made backups.
  21. Write a program by somethingtoremember · · Score: 0

    That takes the subject of any email with an attachment and moves it out of the .mbox into a photo archive (my guess is that your attachments are mostly photos with some videos) After you strip the photos from the mbox, gzip it and you should be fine. You'll have a compressed archive of correspondence, and an easily browsed directory of photographs. For extra points, use the program ``touch'' to date the attachment files to their original received-by date.
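
    A minimal sketch of that extraction step with Python's standard `email` module, assuming each message has already been parsed (for example by iterating over `mailbox.mbox`). `save_attachments` is a made-up name, and `os.utime` stands in for ``touch''.

```python
import os
from email.message import Message
from email.utils import parsedate_to_datetime
from pathlib import Path
from typing import List


def save_attachments(msg: Message, out_dir: Path) -> List[Path]:
    """Write every named attachment of msg into out_dir.

    Each file's modification time is set to the message's Date header,
    when that header can be parsed. (Hypothetical helper.)
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    raw_date = msg.get("Date")
    try:
        stamp = parsedate_to_datetime(str(raw_date)).timestamp() if raw_date else None
    except (TypeError, ValueError):
        stamp = None  # missing or malformed Date: leave the mtime alone

    saved = []
    for part in msg.walk():
        name = part.get_filename()
        if not name:
            continue  # body text or a multipart container
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        safe = Path(name).name  # drop any directory components
        if safe in ("", ".."):
            safe = "attachment"
        target = out_dir / safe
        stem, suffix = target.stem, target.suffix
        counter = 1
        while target.exists():  # do not clobber an earlier file of the same name
            target = out_dir / f"{stem}-{counter}{suffix}"
            counter += 1
        target.write_bytes(payload)
        if stamp is not None:
            os.utime(target, (stamp, stamp))
        saved.append(target)
    return saved
```

    This only copies the attachments out; the messages themselves are left untouched, so removing the parts from the mbox would be a separate pass.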

  22. Read then purge ... by MacTO · · Score: 5, Insightful

    There is probably some email that you need to keep, but chances are that you don't need to keep most of your email. So just read, respond, then purge (when appropriate).

    As others have pointed out, disk space isn't really a concern in this day and age. But managing data that you don't need is a concern. A minute spent filing, backing up, etc. of unnecessary data is a minute wasted. Add enough of those seconds together, and it may amount to a good chunk of your life spent doing more interesting/productive things.

    As a side note, I notice that people sometimes get attached to things that don't really matter to them. I've known people who have lost all of their data due to circumstances beyond their control, then they became very distressed about that loss of data. The problem is that only a tiny fraction of that data was actually valuable, but they were worrying about all of the data. In some cases it was so traumatic to them that they spent more time worrying about the irrelevant stuff than the stuff that they would need to continue on in the future. So if you don't keep the irrelevant stuff, you can focus on what is relevant.

    1. Re:Read then purge ... by jones_supa · · Score: 0

      A minute spent filing, backing up, etc. of unnecessary data is a minute wasted. Add enough of those seconds together, and it may amount to a good chunk of your life spent doing more interesting/productive things.

      It creeps me how young geeks invest their time in managing their home NAS or even a personal mail server (+domain) while it would be much more simple to just use an USB hard disk and webmail and then use that time for something more creative.

    2. Re:Read then purge ... by Anonymous Coward · · Score: 1

      Good points. However - what is relevant is dependent on time. Your judgement of relevancy is all about what you care about in this point in time. As others have already pointed out - store it all, it's cheap. You really don't know what might be valuable to you later.

    3. Re:Read then purge ... by vadim_t · · Score: 3, Insightful

      It creeps me how young geeks hand out all their personal data to the first free provider they happen to come across.

      Yeah, it's a bit of a pain sometimes, but the benefit of having the data where I want it, dealt with how I want it, outweighs the cost IMO. It also makes for good system administration practice if you have an interest in that kind of thing.

    4. Re:Read then purge ... by lucm · · Score: 1

      I used to archive my emails, then one day by mistake they were deleted. For a minute or two I was freaking out, then I felt relieved. I needed to lose them completely to understand that I did not need them. It was like a security blanket (what if I need a cd-key I received by email, or if I want to read again the bad poetry I sent to my ex?), nothing else.

      For the last few years not only did I not archive my emails, I also made sure to change my email address once or twice a year to weed out the crap. And there was never a single time where this "lean" policy caused me a problem. It's like losing everything after a fire at home; losing that Dallas VHS box-set or all those National Geographic magazines is refreshing.

      --
      lucm, indeed.
    5. Re:Read then purge ... by Anonymous Coward · · Score: 0

      As they get older they will realize that. Until then they are having fun. I used to do the same as them. As I got older, I simplified and went to a big external drive just hanging off some low power computer that is running all the time. It just holds my junk files. 99% of which if I lost them I would not be too mad.

    6. Re:Read then purge ... by Anonymous Coward · · Score: 0

      First: I ain't so young...

      Second: My "NAS"/email server box is going on 8 years without need of maintenance except to add an account or two, replace and add more hard disks, and do backups. It'll get replaced soon.

      Third: Well written tools do the work for you without intervention so you can concentrate on getting your job done without extra labor and worries.

      Fourth: No corporation is pawing through my correspondence and files or giving access to them to others. They are private and will stay that way.

    7. Re:Read then purge ... by kava_kicks · · Score: 1

      I don't know that I am that young anymore, but I love my NAS (old PC, 2 x 2TB drives mirrored, Solaris, ZFS). I store all my photos, personal documents and videos of the kids. Everything is mirrored, snapshotted and stored offsite in the cloud for $5 a month (thank you CrashPlan). It takes no time to manage and keeps everything safe. Why would this creep you out?

    8. Re:Read then purge ... by jabberw0k · · Score: 1

      Apparently you never do research that you will need to consult later, nor do you correspond with people who might die. Do you only live in the moment?

    9. Re:Read then purge ... by lucm · · Score: 1

      If you rely on email for research, you have a bigger problem than living in the moment

      --
      lucm, indeed.
    10. Re:Read then purge ... by MrMickS · · Score: 1

      It creeps me how young geeks hand out all their personal data to the first free provider they happen to come across.

      Yeah, it's a bit of a pain sometimes, but the benefit of having the data where I want it, dealt with how I want it, outweighs the cost IMO. It also makes for good system administration practice if you have an interest in that kind of thing.

      I never cease to be amazed by this. If you read this site at first you would suspect that everyone is concerned about privacy and organisational access to their data. Then you come across questions like this where the solution is generally 'leave it to Google' or 'load it onto XYZ storage service'. To me the two things seem diametrically opposed. The uneducated viewer would be spinning trying to work out what people really think.

      I don't understand the trust in Google that people have, especially with personal email. I keep my email on my server. It took very little time to setup and pretty much runs itself. I just need to apply security updates from time to time.

      --
      You may think me a tired, old, cynic. I'd have to disagree about the tired bit.
  23. In my experience by Anonymous Coward · · Score: 0

    I had a practice of burning each year worth of email on one CDR in mbox format, but I have found that I never actually need to refer to those old messages. Also in general there's so much data these days that I've found it best to just archive the cream of the crop.

  24. What's the motivation? by Just+Brew+It! · · Score: 3, Insightful

    Even at today's post-Thailand-flood inflated hard drive prices, your entire e-mail history occupies less than a dollar's worth of disk space. I fail to see the issue.

  25. mutt, ruby libraries by subreality · · Score: 2

    For my own mail archives I just use mutt and weed things a bit by hand. I find that 90% of the mbox size is in fewer than a dozen attachments, so I can hand-filter those out in ten minutes once a year. Beyond that disk is too cheap to care and time is too valuable to make a really comprehensive solution. So what I do:

    'mutt -f archive.mbox'
    ':set pager_index_lines=6' (Lets you see the message index split above the body)
    'o' (Order), 'z' (siZe), End (last entry), Enter (Open).
    while(mbox.size > acceptable_size)
    {
            'v' (View attachments)
            'jjj' (down a few times to the attachment I want to nuke)
            'd' (Delete)
            while(more attachments) { 'd' (Delete more attachments) }
            'q' (Quit back to the message view)
            'k' (previous message)
    }
    'q' (Quit back to index)
    '$' (Sync changes to disk)
    'q' (Quit mutt)

    Note the 'j' and 'k' are vi-style up/down. The arrow keys work too if you're not a home row junkie like me.

    I don't know a good fully automated way to do this that's ready to slice it right out of the box. If you want to roll your own, just pick up a library like RMail or TMail for Ruby, or equivalent for the language you prefer. That's 80% of the work done but you'll still probably find a dozen corner cases involving oddly-named HTML-alternatives named things that look like binary attachments or terribly malformed spam.
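
    For those who prefer Python, the standard `email` module can play the role of those libraries. Below is a sketch, not a tested tool: it replaces every part that carries a filename with a one-line text note, so the reference to the file survives. `strip_attachments` is a hypothetical name, and the corner cases mentioned above (odd HTML alternatives, malformed spam) are not handled.

```python
from email.message import Message


def strip_attachments(msg: Message) -> int:
    """Replace each named attachment in msg with a short text placeholder.

    Modifies msg in place and returns the number of parts replaced.
    Parts without a filename (normal text/HTML bodies) are left alone.
    """
    removed = 0
    for part in msg.walk():
        if part.is_multipart():
            continue
        name = part.get_filename()
        if not name:
            continue
        size = len(part.get_payload(decode=True) or b"")
        # Keep the note pure ASCII so it can be stored as 7bit text.
        safe_name = name.encode("ascii", "replace").decode("ascii")
        note = f"[attachment removed: {safe_name}, {size} bytes]\n"
        # Drop the old MIME headers, then let set_payload() add fresh
        # Content-Type and Content-Transfer-Encoding headers for the note.
        del part["Content-Transfer-Encoding"]
        del part["Content-Disposition"]
        del part["Content-Type"]
        part.set_payload(note, charset="us-ascii")
        removed += 1
    return removed
```

    To process a whole archive you could iterate over `mailbox.mbox(path)`, call this on each message and add the result to a second mbox, keeping the original file until you have checked the output.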

  26. I have all email going back to October 2000. by Anonymous Coward · · Score: 0

    With the attachments. Storage is cheap and it takes only seconds to find anything as is. Seriously, if you're concerned about file size and the time to search some simple emails, perhaps your computer is just too old? (Your media attachments shouldn't be adding to the search time, so that is a lousy excuse.)

    1. Re:I have all email going back to October 2000. by pz · · Score: 1

      2000? Hell, I have my email back to the early 1980s.

      The real problem is that back then it was OK to put all messages in one file, and having one message per file is far more useful for searching with grep.

      --

      Put my fist through my alarm clock with its ding-dong death inside my ear. - The Blackjacks.
    2. Re:I have all email going back to October 2000. by tomtomtom · · Score: 2

      2000? Hell, I have my email back to the early 1980s.

      The real problem is that back then it was OK to put all messages in one file, and having one message per file is far more useful for searching with grep.

      Actually I find this less of an issue. Check out grepmail and mboxgrep. I use these pretty regularly and they're very useful for doing e.g. grepmail 'foo.*bar[a-z]' ~/Mail/mbox.gz >/tmp/messages; mutt -f /tmp/messages

    3. Re:I have all email going back to October 2000. by pz · · Score: 1

      Actually I find this less of an issue. Check out grepmail [sourceforge.net] and mboxgrep [mboxgrep.org]. I use these pretty regularly and they're very useful for doing e.g. grepmail 'foo.*bar[a-z]' ~/Mail/mbox.gz >/tmp/messages; mutt -f /tmp/messages

      HOLYCRAPTHANKYOU!!!! You have no idea how much time and headache you've saved me. Well, maybe you do, given that you can imagine how much of a pain it is to use plain grep on mbox and similar files.

      --

      Put my fist through my alarm clock with its ding-dong death inside my ear. - The Blackjacks.
  27. Of all the weird suggestions: Eudora MUA by Sipper · · Score: 1

    The Eudora Mail User Agent (i.e. email client) stores attachments in a directory as binaries but yet keeps the text of emails intact. Thus you should be able to import the email into Eudora, then when you export it the attachments should be stripped.

    This is also exactly why I don't use Eudora anymore, because attachments get stripped off when exporting the email (or at least that's the way email export or import from/to Eudora worked last I used it).

    Now, although this explains one way attachments can be stripped from email, I don't recommend doing that, because it alters the email. Generally I want email intact because otherwise what you're storing might refer to an "attached file", but yet not even knowing what the filename is that the email refers to. Plus it's actually useful to have sent mail attachments intact too, because it means you get to see what version of what file you sent to someone at the time.

    There are some other interesting options; the KMail MUA has an option of "delete attachment" when right-clicking on an email attachment, which does delete the attachment but not the reference to it, so you at least know the filename of what used to be attached. I just sent myself a test email and deleted the attachment and then viewed the email raw, but unfortunately Slashdot's filter won't let me send the result. But if you do that yourself and look at it, it should give you an idea how to re-form emails to strip attachments but not the references.

    1. Re:Of all the weird suggestions: Eudora MUA by Anonymous Coward · · Score: 0

      Yours is the closest so far, but seriously, nobody has ever written a batch program that takes an mbox file or similar format, pulls all the attachments out as binary, throws them into a folder, and then writes a new mbox file with embedded links to said files? Yes, I understand the importance of maintaining a link to those files from the relevant message, but does the binary stuff really have to be encoded, bloated and crammed into one gigantic mbox file in order to do that?

      Most of the messages so far have been trying to convince the original poster that the space doesn't matter or ask why they'd ever want to do that. There are good reasons. Apparently nobody has written something similar to what they are looking for? Bizarre. If there really is nothing then maybe I'll sit down and do something in perl or some similar hack, but I would have thought someone had already tried something.

      What about a program that converts mbox files into an HTML index by date and/or subject with each message as an HTML page with links to the attached files (relative links, of course)?

      Storage is cheap, but I don't want mail dating back to the 1990s taking up gigabytes of space when I know that 99% of the space is LOLcat images and other files inefficiently stored and that I probably have elsewhere anyway if they were important. At the very least it would make it easy to sort through the files and decide if they were worth keeping.

    2. Re:Of all the weird suggestions: Eudora MUA by Sipper · · Score: 1

      Good post.

      Yours is the closest so far, but seriously, nobody has ever written a batch program that takes an mbox file or similar format, pulls all the attachments out as binary, throws them into a folder, and then writes a new mbox file with embedded links to said files?

      It's difficult to "prove a negative" (i.e. "that tool doesn't exist"), but I can say that although I've been an email administrator for several years, I haven't seen something that does it in an automated manner -- but then again, I haven't looked to strip attachments before. Usually mail archiving is meant to store messages intact, so I consider stripping attachments a special case.

      Yes, I understand the importance of maintaining a link to those files from the relevant message, but does the binary stuff really have to be encoded, bloated and crammed into one gigantic mbox file in order to do that?

      Well, yes, although obviously the emails could be broken up into several mbox files or stored via maildir. The reason attachments are so prevalent is that there's generally no other convenient way to remotely share files. Some geeks are lucky enough to have a server with (scp | ftp | sftp) access, but even then the average layman would much rather get an email attachment. And lately I've even seen tech pundits online recommending sending email attachments to yourself as a method of having a backup of important documents.

      Most of the messages so far have been trying to convince the original poster that the space doesn't matter or ask why they'd ever want to do that.

      That's something I generally don't like, because it doesn't actually answer the OP's question. Instead, it changes the question to one that's trivial to answer; i.e. "simple, don't do that."

      ...

      What about a program that converts mbox files into an HTML index by date and/or subject with each message as an HTML page with links to the attached files (relative links, of course)?

      Well, no, because the intent the OP had was to reduce the storage size. What you're suggesting could be done though -- yes. But at the same time it seems awfully similar to "webmail", even though it's not exactly the same thing. If it were me it wouldn't be what I'd want, though, because it would also change the interface to getting the email -- a mail client allows you to search through emails by subject, date, author, etc, and so I'd want that same interface to the archived mail.

  28. Stanford's MUSE project by Anonymous Coward · · Score: 0

    I'm not sure if anyone has mentioned this, but Stanford has been working on an email client to help understand and visualize an archive of 50,000 emails - It lets you pull out the images, browse emails by 'sentiment-mapped' values and graph the patterns of activity over the full lifetime of the archive. You can see the project page here: http://mobisocial.stanford.edu/muse/

  29. Apple Mail's "Remove Attachments" by Anonymous Coward · · Score: 0

    In Mac OS X's built-in mail application, you can use:

    Message -> Remove Attachments ... so all you need to do is find a Mac and put your email on an IMAP server.

  30. Delete it! by Lazy+Jones · · Score: 2

    Google keeps a permanent copy anyway...

    --
    "I love my job, but I hate talking to people like you" (Freddie Mercury)
  31. Re:500 Mb only? by icebraining · · Score: 4, Insightful

    Who still uses e-mail?

    People who get stuff done instead of being interrupted every 5m? And who want to receive messages even while offline? And have decent systems for archiving, tagging and searching them?

  32. Wait what? by bmo · · Score: 1

    We're worrying about 500MB?

    Even at today's outrageous price-fixed (you know it's true) hard drive prices, you're talking 14 cents a GB. For your situation, we're talking 7 cents.

    You're complaining about 7 cents worth of storage space. And to cut down on this you want to mangle the archive?

    You're tight on space? Buy another drive, burn to CD/DVD.

    For those of us who grew up with a Corvus shoebox hard disk costing thousands on the Apple ][ network, this is a ridiculous "ask slashdot" question.

    --
    BMO

    1. Re:Wait what? by unitron · · Score: 1

      Possibly the first time in the history of the planet that a flood has fixed something.

      --

      I see even classic Slashdot is now pretty much unusable on dial up anymore.

  33. I have all email going back to 1980... by neurocutie · · Score: 2

    back to ARPA mail and UUCP mail days...

    for a while I used Eudora and every month religiously took each piece of email and filed it away in suitable mail folders. After Eudora started declining and I got too busy, I stopped that, but even now, religiously every month I clean out my mailbox of all junk and unwanted attachments (trimming 60-100MB to usually 20-30MB) and then stack that month's email away as a single mbox file, and start fresh with a new Inbox.

    the old mailbox files are on an IMAP server that I can easily read emails from at least 10 years ago -- older with a little more effort. As single mbox files each, I can do greps on them also. Seems to be an okay way to keep the stuff, some of which has proven to be important over the years....

    another big help: all semi-junky and non business emails I let Hotmail do the work (vendor stuff, Amazon orders, etc). Have been using Hotmail since before MS bought it. Works well as a place to direct mostly junky vendor stuff.

    1. Re:I have all email going back to 1980... by markdavis · · Score: 1

      That is exactly what I do. I have been using Hotmail way before it was Microsoft. I use that address for all vendor junk, netflix stuff, registrations, autoreplies, notifications, etc. I save my real Email address for stuff that matters more. Even still, that gets cluttered and huge pretty quickly (and I do spend time maintaining it and don't bother with archives).

      Not even counting stupid 100 Megapixel images people feel compelled to Email without resizing and other attachments, Email sizes are still dozens of times larger than they used to be.... Ten times as many headers. Stupid, wasteful HTML parts. Insanely large signatures. Clueless users who can't learn to trim and inline reply, so they just quote the entire thing and say "yes" at the top every time. Etc, etc.

      Still, I can't imagine a world without Email... it is a HUGE productivity booster for me. I could certainly live without all the damn spam, though.

  34. Procmail by massysett · · Score: 4, Funny

    Google for "procmail remove attachments":
    http://osdir.com/ml/mail.procmail/2002-11/msg00091.html

    That will get you started. You can do most anything with Procmail after you figure out the rather odd configuration file format.
    Make sure you have it backed up first because it's also quite easy to destroy data with Procmail.
    After you spend a lot of time futzing with Procmail scripts and sed and formail and the like, you'll wonder why you didn't go on Amazon or Newegg and buy a $10 flash drive that will hold all your mail several times over.

    1. Re:Procmail by bmo · · Score: 1

      This is the only way to do it... if your time is entirely worthless.

      If we measure time in minimum wage, the OP spent more time composing this question and submitting it than if the OP had just spent 7 cents worth of disk space and archived it away.

      This is a troll "ask slashdot"

      --
      BMO

      P.S. Where i get my 7 cents from: Go to Newegg. List internal 1TB drives by price. Pick lowest. 140 bux divided by 1000 = 14 cents per Salesman GB. He's using half. 7 cents.
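
      The same back-of-the-envelope sum, written out so other archive sizes or drive prices can be plugged in. The function name is made up; the $140 / 1000 GB figures are the ones quoted above.

```python
def storage_cost_usd(size_gb: float, drive_price_usd: float, drive_capacity_gb: float) -> float:
    """Cost of size_gb of data at the drive's price per (salesman) gigabyte."""
    return size_gb * drive_price_usd / drive_capacity_gb


# 500 MB is half a "salesman GB" on a $140, 1000 GB drive.
cost = storage_cost_usd(0.5, 140, 1000)
print(f"{cost * 100:.0f} cents")
```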

    2. Re:Procmail by txoof · · Score: 1

      Fantastic. In all my googling I never came across that. I'm going to have to give that a try. It's orders of magnitude more elegant than the disaster I've been kludging together. Thanks!

      --
      This one's tricky. You have to use imaginary numbers, like eleventeen... --Hobbes
    3. Re:Procmail by nine-times · · Score: 1

      Honestly, I kind of wish I had an email client that did this for me. Or maybe more to the point, I wish I had an email server that did this for me. What I have in mind is, instead of the normal attachment system, have the server automatically strip out attachments and store them where they can be accessed by webdav/http. Where the attachment was, substitute in a link to the attachment instead. That way, I could browse my attachments like a normal file system, delete stuff as I like, but the email message is only as large as the text.

      Somehow we've gotten ourselves to the point where people are using email as a filesystem, and it's not very well suited for that.

    4. Re:Procmail by mcmonkey · · Score: 1

      This is the only way to do it... if your time is entirely worthless.

      If we measure time in minimum wage, the OP spent more time composing this question and submitting it than if the OP had just spent 7 cents worth of disk space and archived it away.

      This is a troll "ask slashdot"

      --
      BMO

      P.S. Where i get my 7 cents from: Go to Newegg. List internal 1TB drives by price. Pick lowest. 140 bux divided by 1000 = 14 cents per Salesman GB. He's using half. 7 cents.

      But what if, in addition to storing old email, he ever actually needs to go back and search or read old email?

      You're the one saying his time is worthless by only looking at the cost of hard drive space.

    5. Re:Procmail by bmo · · Score: 1

      >But what if, in addition to storing old email, he ever actually needs to go back and search or read old email?

      So? What of it? Show me a modern computer system that cannot handle 500MB of email. Show me a /smartphone/ that cannot handle 500MB of email.

      >You're the one saying his time is worthless by only looking at the cost of hard drive space.

      Am I right in saying that you think he's going to /manually/ go through his email to find stuff? Why? Isn't that why we have computers and search algorithms?

      See there's this thing called Google Desktop Search. I click in it and I mention someone's email address and the topic and I get results showing me individual emails that I have stored. It takes milliseconds. That is one tool of many that I can use. The size of my mbox does not matter.

      If someone asks you for an old email and you are going through your mail by hand, you are doing it wrong.

      --
      BMO

    6. Re:Procmail by Anonymous Coward · · Score: 0

      yeah now go and buy yourself 500mb of storage for 7cents. you can't.

      trollolololol, loser.

  35. 500mb? by nurb432 · · Score: 2

    Amateur. When you get to 8+ gb then we can talk about 'large archive'. Until then, just stick it on a CD.. you don't even need a DVD for that.

    --
    ---- Booth was a patriot ----
  36. Re:500 Mb only? by bmo · · Score: 2

    Why delete when disk space even today is 14 cents in "Salesman Gigabytes"?

    Someone back there said he has 16 GB of mail going back over a decade. That's what, Two Bucks And A Quarter?

    It's less than a cup of coffee at Starbucks or even a Large at Dunkin' Donuts. It is fully irrational to worry about this.

    Anyone worrying about personal mbox size has OCD issues. Full stop.

    --
    BMO

  37. when you get older by iggymanz · · Score: 1

    you will be very sorry you deleted those pictures. Don't do that. Even right now, you could make many people very happy by giving them, as a gift, one of those digital picture frames that display a different stored photo every several seconds, loaded with your pictures of the people important to the recipient.

    1. Re:when you get older by markdavis · · Score: 1

      Email is not supposed to be a file repository. Although, it seems like every day I find people who treat it just like that. When I get attachments that matter and need to last (such as nice/important pictures), I save them off and put them in an appropriate directory structure. Then I KNOW where all my pictures are located. They can be backed up appropriately. They can be viewed logically.

    2. Re:when you get older by iggymanz · · Score: 1

      but I back up my emails and the database that indexes them by backing up the whole Thunderbird directory, so I can search by topics or phrases. I think it's better that way. Only takes up 2.5GB on the 10/20GB tapes I use for 12 years' worth. I do regret not having my emails from the mid 80s to 1998 as they were on disparate VAX/VMS and Unix systems, but oh well.....

  38. Some actual ideas to get you started by Anonymous Coward · · Score: 0

    Okay, so most of the people here have wasted your time trying to convince you that "storage is cheap" or that there isn't a good reason to store all that e-mail, let alone try to organize it all. I'm with you, not them. It's the fricking 2000s. It should be easier to archive this stuff and organize it if you *want* to.

    I've always wanted to do something about my messy mail archive of mbox files (dates back to the 1990s), but I dreaded the thought of coding something up from scratch given all the quirks of e-mail formatting. I had high hopes your post would elicit some sage advice from the readers of /., but so far I don't see much other than the good mutt+ruby solution. In frustration, I've started looking but I haven't found much either. For what it's worth, here's what I've got so far:

    1) There are plenty of commercial solutions that promise to do everything for a low price (e.g., MailSteward for OS X looks pretty good and has a free trial up to 15000 messages). Maybe. But I'm cheap and will exhaust the fully free solutions before spending money. Most of them are more focused on mailbox conversion/migration (e.g., Emailchemy) than actual filtering/archiving.

    2) Free / some assembly required:
    archivemail - mostly for date-selection of messages and archiving/compressing. Doesn't help with attachments. Python.
    archmbox - more capable than archivemail. Can do filtering based on date, header field matches, etc., copy selected messages and compress to archive. Perl. Closer.
    MHonArc - converts mbox to HTML files with links to attachments. Meant for mailing list archiving, but it should work the same for a personal mailbox. Perl. There's also an OS X front end for it.

    The HTML approach isn't ideal, but that could be a convenient way to browse through the archives (e.g., toss it all up on a password-protected web site and your mail archive is available anywhere, like your own personal and backed-up GMail), and a contributed program in the MHonArc distribution can turn an MHonArc archive back *into* an mbox file, which might let you do some modifications to the HTML files and linked attachments with scripts and then backconvert them after.

    I haven't tested any of these, but I think I'll try MHonArc and see how it goes.

  39. Re:500 Mb only? by mysidia · · Score: 1

    My entire mail store is over 16 Gb. I have single mbox files that are larger than 2 Gb.

    My entire mail store is over 1TB.

    I have single LZMA compressed mbox.xz files that are larger than 16 Gb.

  40. Remove the photos? Really? by enjar · · Score: 1

    Photos are one of the most treasured things in many families. Keep in mind it's highly unlikely Aunt Petunia is keeping great backups of her photos, and when it all goes south, you might be one of the family members who actually has a photo of a relative who has passed on that she wants to print when her hard drive gives up the ghost.

  41. Email archiver by Dupple · · Score: 2

    Enables you to save everything off line as a pdf. Personally I don't get the question or see the point. My archive is about 6 Gig, all backed up all searchable. Anyway the company that makes the software is www.spotdocuments.com Just back up

    --
    Watch those corners
  42. Something Like This? by pscottdv · · Score: 5, Informative

    We all think you're crazy, but here it is:

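    #!/usr/bin/env python
    from mailbox import mbox, mboxMessage

    # Placeholder paths; substitute your own.
    orig_mb = mbox('path/to/orig/mbox')
    new_mb = mbox('path/to/new/mbox')

    for key, msg in orig_mb.iteritems():
            # Descend to the first leaf part, which is normally the text body.
            part = msg
            while part.is_multipart():
                    part = part.get_payload()[0]
            new_msg = mboxMessage()
            new_msg.set_from(msg.get_from())
            for name, value in msg.items():
                    new_msg[name] = value
            # The copied headers still describe the multipart original, so
            # take the body's type and encoding from the part that was kept.
            for name in ('Content-Type', 'Content-Transfer-Encoding'):
                    del new_msg[name]
                    if part[name] is not None:
                            new_msg[name] = part[name]
            new_msg.set_payload(part.get_payload())
            new_mb.add(new_msg)
    new_mb.flush()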

    --

    this signature has been removed due to a DMCA takedown notice

    1. Re:Something Like This? by rgbscan · · Score: 1

      Will this work to save the attachments somewhere? I have a similar question to the OP but in reverse. I'm tired of searching my email for attachments that have been sent to me over the years (going back to '97) and would like to take my mbox, run it thru a program, and have all the attachments end up in a directory of my choosing. I can then delete or file them, having them now in a more sane place than using email as a file system

    2. Re:Something Like This? by Anonymous Coward · · Score: 0

      not to be dumb but in thunderbird you can just search for .. attachments.

    3. Re:Something Like This? by pscottdv · · Score: 1

      This script throws the attachments away which is what was requested. Saving them is a little more complicated.
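
      A rough sketch of the saving side, assuming Python 3 and its standard mailbox module. The function name save_attachments and the file-naming scheme are made up for illustration, and it treats any part that carries a filename as an attachment:

```python
import mailbox
import os


def save_attachments(mbox_path, out_dir):
    """Write every named attachment found in mbox_path into out_dir.

    Returns the list of file paths that were written.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    box = mailbox.mbox(mbox_path)
    for msg_no, msg in enumerate(box):
        for part_no, part in enumerate(msg.walk()):
            filename = part.get_filename()
            if not filename:
                continue  # body text or a multipart container
            data = part.get_payload(decode=True)
            if data is None:
                continue
            # Drop any directory components the sender put in the name, and
            # prefix message/part numbers so same-named files do not collide.
            safe_name = "%05d-%02d-%s" % (msg_no, part_no,
                                          os.path.basename(filename))
            target = os.path.join(out_dir, safe_name)
            with open(target, "wb") as handle:
                handle.write(data)
            written.append(target)
    box.close()
    return written
```

      Call it as save_attachments('path/to/orig/mbox', 'path/to/attachments'). Inline images that were sent without a filename get skipped, so check the output before deleting anything.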

      --

      this signature has been removed due to a DMCA takedown notice

  43. I want that with a GUI access app by Anonymous Coward · · Score: 0

    for Android and desktop Linux. I also want it accessible over the internet with the email archive hosted on my own server.

    1. Re:I want that with a GUI access app by icebraining · · Score: 2

      You're in good luck! The same author is developing a similar system in a client/server model, with the server (called Heliotrope) doing the actual work. You just need to write the Linux and Android clients. A client already exists, but it's ncurses based.

  44. Email cleanup? by Anonymous Coward · · Score: 0

    Ctrl-A, Shift-Del

  45. Meta-Facepalm! by Anonymous Coward · · Score: 3, Insightful

    FTS: "I have a personal email archive that goes back to 2003. The early archives are around 2 megabytes. Every year the archives have grown significantly in size from a few tens of megs to nearly 500 megs from 2010."

    So, the total space required thus far is definitely less than (8 * 0.5 GB) = 4 GB. A USB flash drive with that small a capacity is practically classified as electronic waste these days.

    Even if his or her annual e-mail archive size doubled every year for the next 10 years, it would only be 1+2+4+8+16+32+64+128+256+512=1023 GB.
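
    For anyone who wants to check that sum, a throwaway sketch (the function name is made up; it assumes year one is 1 GB and every later year is double the one before):

```python
def total_after_doubling(first_year_gb, years):
    """Sum of yearly archive sizes when each year is twice the previous one."""
    return sum(first_year_gb * 2 ** n for n in range(years))


print(total_after_doubling(1, 10))  # 1 + 2 + 4 + ... + 512, prints 1023
```

    The closed form is first_year_gb * (2 ** years - 1), which gives the same 1023.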

    A 3 TB hard drive he buys *today* for $100 would probably solve his "problem" for 10 more years.

    Hopefully, in the year 2021, we will have tiny 3 PB SSD drives for $100... But maybe we will be ruled by an A.I. by that time, if we haven't already destroyed ourselves with viruses, nanomachines, robots, nuclear weapons, etc.

    1. Re:Meta-Facepalm! by monkeyhybrid · · Score: 1

      And on top of that, I suspect email usage will start decreasing in the not too distant future, or that it already has. Most people I know use social networks, IM, Dropbox, etc, for file transfers these days.

    2. Re:Meta-Facepalm! by MattSausage · · Score: 1

      By the way, if you can point me at a $100 3TB hard drive, I'm in the market for one.

  46. The solution may not be a technical one by Anonymous Coward · · Score: 0

    I happen to be in the process of reading a book on hoarding to help my mother through some moderate hoarding issues that she is having; the problem described in this post sounds exactly what some of the underlying causes are for hoarding in general.

    If you keep looking for a purely technical solution to the problem, you're probably not ever going to solve it, and it will keep escalating despite whatever technical stop-gap you're able to come up with.

  47. Better compression? by tomtomtom · · Score: 1

    As others have said, the headache you will have if you do want to come back (potentially years later) to that one email you know you had only to find your attachment-stripping program has foobar'd the whole archive up (or that you need the attachment after all) probably isn't worth the hassle for saving 500MB per year this year (even taking into account reasonable growth rates - I'd note that bandwidth per $, which will be the factor limiting your email size, has been growing rather more slowly than storage capacity per $ over the past decade and things are likely to continue that way).

    If the problem is that you have significant duplication between emails (e.g. the same attachment being emailed several times), gzip and bzip2 may well miss the opportunity to de-dupe this because the distance between duplicated sections is large. One solution to consider if this is an issue may be to use something which is better at compressing over long distances. I would suggest trying something like lrzip to compress tarballs of the annual sets of mbox files before archiving those.

    Of course, if you just have lots of attachments which *aren't* duplicated (which is probably more likely), that won't really help much.
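
    One way to find out which case you are in before bothering with lrzip: a minimal sketch, assuming Python 3's standard mailbox module, that hashes every named attachment and reports how many bytes are repeats. The function name is made up, and it only counts byte-identical attachments, so a re-saved copy of the same photo will not register as a duplicate:

```python
import hashlib
import mailbox
from collections import defaultdict


def duplicate_attachment_bytes(mbox_paths):
    """Return (total_bytes, duplicate_bytes) for attachments across mboxes.

    Attachments are compared by the SHA-256 of their decoded content, so the
    same file sent twice counts as a duplicate even if its filename differs.
    """
    sizes = {}
    counts = defaultdict(int)
    for path in mbox_paths:
        box = mailbox.mbox(path)
        for msg in box:
            for part in msg.walk():
                if not part.get_filename():
                    continue  # not a named attachment
                data = part.get_payload(decode=True)
                if data is None:
                    continue
                digest = hashlib.sha256(data).hexdigest()
                sizes[digest] = len(data)
                counts[digest] += 1
        box.close()
    total = sum(sizes[d] * counts[d] for d in sizes)
    duplicate = sum(sizes[d] * (counts[d] - 1) for d in sizes)
    return total, duplicate
```

    If duplicate_bytes comes back as a small fraction of total_bytes, long-range compression is unlikely to buy much.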

  48. Re:500 Mb only? by Anonymous Coward · · Score: 0

    Don't act so smug - the 500MB figure was for one year only. He never gave us the full amount for all his emails.

    BTW, the "R" in "RTFS" means "READ".

  49. A few useful tools by Arrogant-Bastard · · Score: 1

    These are a few of the tools that I use (Unix/Linux, of course):

    formail (part of the procmail distribution) is very useful for rewriting mailboxes.
    uuexplode is useful for discovering and yanking out attachments.
    grepmail is REALLY useful for discovering messages which match certain criteria.
    csplit is useful for more than mail, but it also has applications with mailboxes.

  50. What the heck have you got in there? by msobkow · · Score: 1

    My personal email archive goes back to 1996, and is still only 262MB.

    My Google archive uses 164MB.

    I've no idea what my Yahoo account uses.

    But 500GB of email?!?!?!?! Are your relatives sending you entire videos as attachments or accidentally copying their entire music archives?

    --
    I do not fail; I succeed at finding out what does not work.
    1. Re:What the heck have you got in there? by msobkow · · Score: 1

      Ah, I get it. 500MB of 2010 emails. I'd misread that as gigabytes.

      Still, you've got as much email for one year as I've saved my entire life. Something doesn't add up.

      --
      I do not fail; I succeed at finding out what does not work.
  51. Attachment Disorder by macraig · · Score: 1

    Deal with the superfluous attachments first and then see how you feel. Attachments are often unnecessary baggage.

  52. You are talking about $0.035 in storage costs by tlambert · · Score: 1

    You are talking about $0.035 in storage costs

    Literally. I bought 3TB for $200.00 at Fry's yesterday. You probably have more than 500M of "Angry Birds" on your cell phone.

    If the point is to pull the data off your gmail account and not have it stored there (maybe you want to migrate away, maybe you are trying to get us to design a product for your file hosting service adjunct to gmail, whatever), fetchmail is a terrible tool, particularly since gmail permits IMAP4 access, and you don't have to worry about decoding headers in precedence order.

    In general, if you want to leave the email on your gmail account, and delete only attachments, sorry, but the mail is not stored that way on the gmail server, messages are stored as units, and attachments are not separate things, they are different sections in the same flat file using MIME encoding.

    You would have to pull the mail down with IMAP4, process it, and push it back up, again with IMAP4. You would then need to take additional steps via IMAP4 commands to make it not look like newly arrived mail to the gmail server, or you're going to see them all as new messages the next time you log into gmail (basically, after the put, you will need to mark the message read again).
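
    Roughly, that pull/process/push cycle could look like the sketch below, using Python's standard imaplib. Everything named here is an assumption for illustration: replace_message is a made-up helper, transform is whatever stripping function you supply, and conn is expected to be an already logged-in connection with the mailbox selected read-write. Uploading with the \Seen flag folds the "mark it read again" step into the append. How gmail itself treats \Deleted and expunge depends on account settings, which this does not address.

```python
import imaplib


def replace_message(conn, mailbox_name, num, transform):
    """Swap one message for a transformed copy over an open IMAP connection.

    conn         -- logged-in imaplib.IMAP4 / IMAP4_SSL object with
                    mailbox_name already selected read-write
    num          -- message sequence number as a string, e.g. "1"
    transform    -- hypothetical callable: raw message bytes in, bytes out
    """
    typ, data = conn.fetch(num, "(INTERNALDATE RFC822)")
    if typ != "OK" or not data or not isinstance(data[0], tuple):
        raise RuntimeError("fetch failed for message %s" % num)
    raw = data[0][1]
    # The date may arrive before or after the message literal, so search
    # all of the non-literal pieces of the response for it.
    meta = b" ".join(p[0] if isinstance(p, tuple) else p
                     for p in data if p)
    # Keep the original arrival date so the copy sorts where the old one did.
    when = imaplib.Internaldate2tuple(meta)
    # Upload with \Seen already set so it does not show up as new mail.
    typ, _ = conn.append(mailbox_name, r"(\Seen)", when, transform(raw))
    if typ != "OK":
        raise RuntimeError("append failed; original left untouched")
    # Only after a successful upload, flag the original and expunge it.
    conn.store(num, "+FLAGS", r"(\Deleted)")
    conn.expunge()
```

    Sequence numbers shift after every expunge, so anything that loops over a whole mailbox would want the UID variants of these commands instead.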

    If you additionally use POP3 access, the message IDs as reported by POP3 will change, and since it does its "leave on server" functionality by maintaining a local database of message IDs, it's going to be seen as new email there. There's no getting around that, unless you own the source code to your POP3 client and are willing to do correlation after the put operation on the IMAP4 connection to translate the ID.

    You also realize that if you are doing this for reasons of quasilegality of message contents, those messages aren't really gone, right? People accidentally delete things all the time and want them back, and that same recovery process in that case would be usable in discovery by subpoena for the email provider (there are in fact additional requirements for ISPs under Patriot II to maintain records of sent and received emails for up to two years).

    The bottom line here is that you are engaging in a pretty useless exercise here, unless you are trying to hide illegal activity or build a service and have us design it for you. In either case, good luck, you'll be writing a lot of code to get what you want.

    -- Terry

  53. findbigmail.com by matty619 · · Score: 2

    Is a project a friend of mine started. Interfaces w/ gmail's API, quite easy to use.

    www.findbigmail.com

  54. Similar Issue by zetetikos · · Score: 1

    I have an issue with some of my old Thunderbird mail archives. They are infected with various viri (w32 swen.A, N32 Netsky.T, Trojan Zbot) The anti-virus software I've tried just wants to delete the entire file not clean it. Would like to clean the files without risking infection but haven't been able to find a way to clean an offline mail file. Any ideas? Thanks.

  55. Copied to an archive in six-month blocks by VanessaE · · Score: 1

    I keep my email in maildir format (the default for Claws-Mail), and rotate every six months. The whole process is entirely manual, but since any given step only takes a few seconds, it works fine for me.

    Emails are sorted on receipt according to source or content via ordinary filters. Every email I receive that's worth keeping goes into a catch-all folder after reading. I probably should be preserving the sorting when I move something into that catch-all, but I don't receive enough email to bother with it yet.

    The start of the current six-month range is always part of that folder's name, e.g. "2011-07-01 to Current". Every six months, I rename the folder to add the end date (e.g. "2011-01-01 to 2011-06-30") and move it into a separate storage folder (still within Claws-Mail's folder tree). Then, I simply create a new Archive folder for the new 6-month period. Fill, rinse, repeat every six months.

    When the mood strikes me (roughly every couple of years), I'll compress the latest six-month block(s) and move the results into a long-term storage directory. I generally keep only the most recent few years' worth of emails at hand, so the oldest stuff gets deleted from time to time, leaving only the compressed files. If I need to search the older stuff, it's a small matter of extracting to a temp/work directory, doing whatever needs done, and deleting.

    On top of that, I run an incremental backup of my home directory and storage areas to a USB-connected disk every so often (the time between backups varies - usually once a fortnight or more often). So eventually, every email I decide to save ends up with one online and at least two offline backups. Since I use Gmail, technically they serve as an off-site backup of the most recent stuff (until I delete it anyway).

    I figure with this setup, it's easy to find whatever I need, and it would take a pretty big screwup to actually lose an email.

  56. Mail discipline by Anonymous Coward · · Score: 0

    I check my email twice a day, with rare exceptions.
    If there is information in an email message, I generally write it. Longhand. With a fountain pen on high quality paper (because that is all I use.)
    If it is worth keeping, it is worth writing down on archival paper with archival ink. It's the rare email message which doesn't get deleted while being read.

    I actually cannot relate to anyone who thinks it's necessary or wise to retain email messages longer than the amount of time required to reply and/or delete.

  57. Proc/formail + perl-Email-MIME-Attachment-Stripper by henry.thorpe · · Score: 1

    How about procmail (for new mail) or formail (for iterating through messages in existing mailboxes) and a perl-Email-MIME-Attachment-Stripper script?

    Strip (or separate and save) the attachments from a mail message.

  58. I win. by Spugglefink · · Score: 1

    $ du -hs ~/Mail
    1.9G /home/spugglefink/Mail

  59. There is value in personal history by erice · · Score: 2

    Ask yourself: When are you ever going to read all those email again? When is *anybody* ever going to read them again.

    1) I needed to order ink recently. Now, I don't print much but I vaguely remembered a good supplier that I had used in the past. But what was it called? A few moments of grepping and I found it: in a confirmation email from three years ago.

    2) I met a woman on a Meetup hike recently that I seem to have met before. Was this the blind date from four years ago? The smoking gun was in an email from 2007.

    3) I've had occasional need to look up old acquaintances. While I might have created a contact file at various points, odds are I have forgotten what I named it or where I put it. But I am quite sure the information is in email.

    The real treasure is email that is ten or more years old. You think you remember what happened in the right order? Trust me. You don't. An email archive is like a diary except it is less work and more complete.

  60. Apple Mail by benbean · · Score: 1

    If you use, or have access to a Mac, the Apple Mail client has for some years had a Remove Attachments option in the Message menu. Simply select all your mail in a folder with Cmd-A and select that menu option and it'll do exactly what you want. I use it regularly to prune my database.

    --
    It's a Unix system - I know this.
  61. Tired of this shit... by Anonymous Coward · · Score: 1

    Look...

    You've never had to look back thru your email for stuff
    You don't feel the need to have email from 1994 searchable
    You've been lucky enough to never have to find a contact that you lost

    BECAUSE

    You're too young to 'have a past'.
    You've never held a job with any type of seniority
    You've never held a job that was important
    You don't have friends or intimate relationships

    You, are not interesting or important enough to temporally affect
    anything beyond a few dozens of hours

    and you could probably disappear or die in your hovel and |except|
    for bills and rent not getting paid and perhaps a (more?) foul odor
    than before...

                                    NO ONE WOULD MISS YOU

    But see... that's why this question exists... because we do
    have friends, family, loved ones, jobs that are or were important
    and people would miss us, if we fuckin took a 1hr nap in the
    middle of the day! So, do us a favor and STFU... because us
    important folk with friends want to save our emails til the
    angels blow their trumpets.

    Thanks for playing!
    -@|

  62. Gmail 7.5GB and counting... by Anonymous Coward · · Score: 0

    That is 8 years x 500MB = 4GB... still got 3.5 GB. So why in the world would you do that??? oh.. you worry about losing your family's baby pictures... C'mon man.. they won't fail. And if you want to keep your work mail there, you should consider changing to a more reliable provider or upgrading to a pro gmail account.

  63. Just get a gmail account... by Old+Sparky · · Score: 1

    ...and forward everything there.

    Or set up (an) imap server(s) on additional machine(s) and use imapsync, or similar, to back everything up.

  64. 2003? by Frederic54 · · Score: 1

    I have all my emails since 1990 or 1991 :-/ and it's true that I use email less and less... I saved them in mbox format and used a script to remove binary attachments (yeah, cat pictures and things like this). It's mainly text and can be highly compressed.

    --
    "Science will win because it works." - Stephen Hawking
  65. 15 year old mail "lost" in cassette-tape land by peter303 · · Score: 1

    I still have the tapes, but the desire to read them. Since the mid-1990s I've had cloud email (hotmail) and haven't really lost anything.

  66. Fail safe method to manage email. by Anonymous Coward · · Score: 0

    Use my M$ Outlook method.

    1) Select all.
    2) Delete.

    done.

  67. email for file transfer by Onymous+Coward · · Score: 1

    Email is extremely convenient for file transfer, but I prefer not to have my mail store so bloated.

    What happens when my friends want to start emailing me movies?

    I haven't figured out the true nature / fundamentals involved here.

  68. mhonarc by hemhem · · Score: 1

    http://www.mhonarc.org/ is a Perl script that will recurse through folders of emails, unpack the attachments and generate HTML versions of the mail, with links. A simple find can then delete the attachments, or you could just keep them, as the result will be (mostly) smaller.

  69. Thunderbird or DB Mail by Anonymous Coward · · Score: 0

    I love Thunderbird but it becomes slow when handling large mail archives (I have about 10GB), mail folders become corrupt and I feel it even loses emails. I'm looking forward to Thunderbird 10 (or later) when it is able to store e-mails in a proper DB (e.g. SQLite). Until then I recommend DB Mail.