Slashdot Mirror


A Programmatically Accessible Email Archive?

JohnnyConatus asks: "Does anyone know of a service that offers corporate email archiving and also provides a read-only interface for accessing the archived emails programmatically? Ideally this would be in the form of a database connection or a web service. My current employer is required by the SEC to archive all email communication with clients and we would like to incorporate the archived emails into our internal applications. I have called just about every email archive service I could find via Google, and while most offer a web application to search the emails, none so far have a solution for doing so programmatically. For various reasons, archiving the emails ourselves is considered the last resort. If we had to implement archiving locally, a program that archived by acting as a mail gateway would be the ideal since we'll be supporting a wide range of mail servers."

13 of 61 comments

  1. IMAP by g_bowskill · · Score: 4, Insightful

    If all of the emails are stored in an IMAP account, then you could access them programmatically using PHP's IMAP functions. I do the same thing on my site: a cron job checks an email account every 5 minutes, and if there's new mail, it looks for an image attachment and, if it finds one, automatically posts it online for me.

    Information about PHP's IMAP functions can be found at http://uk.php.net/imap.

    I'm not entirely sure if this is the kind of thing you are looking for, but this is probably how I would deal with the problem.
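
    For readers not using PHP, here is a minimal sketch of the same polling idea in Python's standard library. It is an illustration, not the code running on my site: the function names, the host, and the credentials are all assumptions, and the mailbox is opened read-only so nothing in the archive is changed.

```python
import email
import imaplib
from email import policy


def image_attachments(raw_message: bytes):
    """Return (filename, payload_bytes) for each image attachment in a raw message."""
    msg = email.message_from_bytes(raw_message, policy=policy.default)
    found = []
    for part in msg.iter_attachments():
        if part.get_content_maintype() == "image":
            found.append((part.get_filename(), part.get_payload(decode=True)))
    return found


def fetch_unseen(host: str, user: str, password: str, folder: str = "INBOX"):
    """Yield the raw bytes of every unseen message in `folder`.

    Hypothetical helper: needs a real IMAP server, so it is not exercised here.
    """
    with imaplib.IMAP4_SSL(host) as conn:
        conn.login(user, password)
        # readonly=True opens the folder without write access, so fetching
        # does not flag messages as seen; the caller has to remember what
        # it has already processed.
        conn.select(folder, readonly=True)
        status, data = conn.search(None, "UNSEEN")
        for num in data[0].split():
            status, parts = conn.fetch(num, "(RFC822)")
            yield parts[0][1]


# Possible use from a cron job (host and credentials are placeholders):
# for raw in fetch_unseen("imap.example.com", "archive", "secret"):
#     for filename, payload in image_attachments(raw):
#         ...  # store or publish the image
```

    Keeping the parsing separate from the network code means the attachment logic can be checked against saved messages without touching the server.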

    Regards,

    Grant

    --
    Isee Stars Astro Image Hosting.
    1. Re:IMAP by eoyount · · Score: 3, Funny

      I'm not sure you should let everyone on /. know that you automatically post images in your email...

      --
      To understand recursion,
      you must first understand recursion.
  2. You missed the obvious by spribyl · · Score: 2, Insightful

    You said you did a Google search, but did you talk to the obvious choice? Google.

    They seem to have something of a specialty in archiving e-mail and search technology, and they usually have some kind of API.

  3. Re:geocities, archive.org and public key cryptogra by Saeed+al-Sahaf · · Score: 2, Funny

    So, you are suggesting saving possibly critical email at Geocities? Why didn't I think of that...

    --
    "Who are in control, they are not in control of anything - they don't even control themselves!" - Glen Beck
  4. IMAP as the API by GuyWithLag · · Score: 2, Interesting

    How much mail does your company move per day? Thousands of messages? A gig in attachments per day?

    You could very easily implement this as a simple forwarding daemon, or as a plugin to your existing MTA: just store all mail going anywhere in a separate, append-only mailbox, then use IMAP to access it remotely.

    IMAP is an industry-proven protocol with many open-source implementations, and it was specifically developed for situations where the mail will remain on the server. It provides you with searching and tagging, plus you can organize the mail store as you see fit (e.g. each year's mail in a separate folder while still being able to search all of it, or known spam sorted into a separate folder while keeping it around). Granted, I'm not aware of any IMAP server that uses an SQL back-end, so this may become a bottleneck for you.
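
    As a rough illustration of searching a per-year folder layout, here is a small Python sketch. The folder names and the helper are assumptions made up for the example. A plain IMAP SEARCH applies only to the currently selected mailbox, so the sketch loops over the folders and runs one search in each.

```python
def search_folders(conn, folders, *criteria):
    """Run one IMAP SEARCH per folder and return {folder: [message numbers]}.

    `conn` is expected to be a logged-in imaplib.IMAP4 or IMAP4_SSL object.
    Each folder is opened read-only, so the archive is never modified.
    Folder names containing spaces would need IMAP quoting by the caller.
    """
    hits = {}
    for folder in folders:
        status, _ = conn.select(folder, readonly=True)
        if status != "OK":
            continue  # folder missing or not selectable; skip it
        status, data = conn.search(None, *criteria)
        if status == "OK" and data and data[0]:
            hits[folder] = data[0].split()
    return hits


# Possible use (server, login, and folder names are placeholders):
# import imaplib
# conn = imaplib.IMAP4_SSL("imap.example.com")
# conn.login("archive", "secret")
# search_folders(conn, ["Archive/2004", "Archive/2005"],
#                "FROM", '"client@example.com"')
```

    The returned message numbers are only meaningful within their own folder, which is why the result is keyed by folder name.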

    1. Re:IMAP as the API by LetterJ · · Score: 3, Informative

      I personally use IMAPSize to archive my IMAP mail that is needed mostly for historical purposes. Just yesterday, I pulled 12,000 messages off of my IMAP server for long-term storage. It turns them either into an mbox file or individual emails. I've then got a script that dumps them into a database as well as just zipping them up for burning to optical media. The database is for quick searching, the files for backup/recovery. I looked for the solution mostly to speed up both my IMAP server and my client, neither of which was happy with the huge number of emails I was storing or with occasionally crappy connections. I've got a web interface to it that also lets me easily reply to a message directly from there, pull out related messages, etc.
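
      The mbox-to-database step could look roughly like the sketch below. This is not my actual script: the table layout, the column names, and the choice of SQLite are assumptions for illustration, using only Python's standard library.

```python
import mailbox
import sqlite3


def _header(msg, name):
    """Return a header as plain text ('' if absent) so SQLite can store it."""
    value = msg[name]
    return "" if value is None else str(value)


def load_mbox(mbox_path: str, db: sqlite3.Connection) -> int:
    """Copy every message in an mbox file into a `messages` table; return the count."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        " id INTEGER PRIMARY KEY,"
        " message_id TEXT, sender TEXT, subject TEXT, sent TEXT, raw BLOB)"
    )
    box = mailbox.mbox(mbox_path, create=False)
    count = 0
    try:
        for msg in box:
            db.execute(
                "INSERT INTO messages (message_id, sender, subject, sent, raw)"
                " VALUES (?, ?, ?, ?, ?)",
                (
                    _header(msg, "Message-ID"),
                    _header(msg, "From"),
                    _header(msg, "Subject"),
                    _header(msg, "Date"),
                    msg.as_bytes(),  # keep the full original for recovery
                ),
            )
            count += 1
    finally:
        box.close()
    db.commit()
    return count


# Possible use (file names are placeholders):
# db = sqlite3.connect("archive.db")
# load_mbox("2005.mbox", db)
# db.execute("SELECT subject FROM messages WHERE sender LIKE ?", ("%client%",))
```

      Storing the raw message alongside the extracted headers keeps the database useful for quick searches while the original bytes remain available for recovery.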

  5. Java by hexghost · · Score: 2, Interesting

    The JavaMail API can do everything you need, and you can plug Bouncy Castle's API in alongside it so you have it PGP encrypted.

  6. You're talking about SQL storage of messages by scotpurl · · Score: 2, Informative

    http://www.dbmail.org/

    is one starting point, but there are a few others.

    You're basically replacing /var/spool/mail with an SQL back-end. Things like MBOX or IMAP will suck for dealing with millions of records/messages, but SQL should handle it easily.

  7. Assentor by Anonymous Coward · · Score: 2, Informative

    We have the same SEC requirements here and we use iLumen's Assentor products. The configuration was painful initially, but it's quite effective. Here's an article on the Sarbanes-Oxley Compliance Journal. It stores ALL email and IMs and the contents and functionality can be made accessible via APIs or database calls.

  8. SEC will not allow exactly what you want by Anonymous Coward · · Score: 2, Informative

    Disclaimer: I work for a company that makes SOX compliance appliances.

    The SEC requires you to keep all email in house. As far as we can tell that means your storage must be in house, not at a service provider.

    We don't provide such an interface in our products. We want as few possibilities as we can for bugs where you can delete/alter email. By sticking to our interface we have a better chance of keeping you from doing something illegal (which could reflect on us). However, we do provide a web interface which a clever programmer can script.

    If you use something other than Microsoft Exchange, you can set the always-cc option to send email to several users, one of which is the account our device polls from, and one of which is an account you can do anything with. Frankly I prefer this option. We don't want you messing in our product for anything other than the compliance purposes we have designed it for, as it may open us up in court to questions of whether we are for compliance when we do those other things.
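
    To make the always-cc idea concrete, here is a small Python sketch of what a gateway could do: append the archive accounts to the envelope recipients before relaying. It does not describe our product or any particular mail server's option; the function names, addresses, and relay host are assumptions for illustration.

```python
import smtplib


def with_archive_copies(recipients, archive_addresses):
    """Return the envelope recipient list with each archive address appended once.

    Addresses are compared case-insensitively, which is a simplification.
    """
    seen = {addr.lower() for addr in recipients}
    result = list(recipients)
    for addr in archive_addresses:
        if addr.lower() not in seen:
            result.append(addr)
            seen.add(addr.lower())
    return result


def relay(msg, sender, recipients, archive_addresses, host="localhost"):
    """Relay `msg` through an SMTP server, copying the archive accounts.

    Hypothetical helper: needs a reachable SMTP server, so it is not exercised here.
    The message headers are left untouched; only the envelope is extended.
    """
    to_addrs = with_archive_copies(recipients, archive_addresses)
    with smtplib.SMTP(host) as smtp:
        smtp.send_message(msg, from_addr=sender, to_addrs=to_addrs)


# Possible use (addresses are placeholders):
# relay(msg, "broker@example.com", ["client@example.org"],
#       ["compliance-poll@example.com", "app-archive@example.com"])
```

    One archive address would be the account the compliance device polls, and the other an account your own applications are free to read.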

    Yes we are paranoid, but there are some strong laws around on this subject, and right now regulators are looking for examples to prove they are doing their job.

  9. Re:i remember seeing somethign like this once by tzanger · · Score: 3, Informative

    Exchange4Linux does exactly this. Works pretty well, we've got a shitload of email (videos too), 5000+ contacts and all manner of data sitting in a PostgreSQL database.

    It's NICE being able to execute SQL queries on your aggregate communications data. Perfect example: Our Asterisk head-end system knows which of our customer service people is on pager duty with an SQL query which looks at their service calendar. :-)

  10. Do some research ... by gstoddart · · Score: 3, Informative

    If your company is doing this for SEC compliance (meaning Sarbanes-Oxley), you really need to look into all that goes along with this.

    You'll still need to provide security as to who can view messages. You'll need search for legal purposes. You have document retention schedules you'll need to adhere to. You'll potentially have a freakin' huge volume of data to look at.

    I'm seeing a lot of references to PHP and Java classes -- something as important as SEC regulations for e-mail archiving shouldn't just be thrown together willy-nilly. Failure to get it right could cause *huge* legal problems downstream.

    Mail archiving for SEC/SOX is an utterly non-trivial undertaking.

    Cheers

    --
    Lost at C:>. Found at C.
  11. Courier by bobv-pillars-net · · Score: 2, Informative

    Courier has an optional "big-brother" mode that makes a copy of every email that passes through. It can be set up as an email gateway and has a flexible authentication and filtering mechanism with standard plugins for SQL, LDAP, PAM, and others.

    --
    The Web is like Usenet, but
    the elephants are untrained.