Slashdot Mirror


Text-Mining Your E-mail

Misha writes "There have been a number of weeks/months in anyone's life that called for better organization of the Inbox. Filtering and folders work, but it'd be nice to have a text-mining tool running in the background that categorizes incoming messages by topic as they arrive. It's nice to see that, besides NLP research, there are some great algorithmic advances being made, as seen in this paper. Perhaps even one of them Perl monkeys will quickly hack such a background tool." Note: it's a PostScript file.

7 of 217 comments

  1. PS-PDF Document format conversion by Misha · · Score: 5, Informative
    --



    I was thinking of how to intentionally fail my drug test... It would make a good memoir story someday.
  2. Yet another reason for.. by Dr+Caleb · · Score: 4, Informative
    Lotus Notes.

    It automagically does full text indexing of all specified databases. To it, your Inbox is just another database.

    --
    "History doesn't repeat itself, but it does rhyme." Mark Twain
  3. Remembrance Agent by Tekmage · · Score: 5, Informative

    It's more general than e-mail, but in the wearable computing community there's a little application called Remembrance Agent, written by Bradley Rhodes, that many folks use. As a stand-alone UI it's still quite primitive, but that's because it was built around dynamic hooks into Emacs.

    I've been playing around with some Java-based wrapper code, to wrap the ra-retrieve executable in a Server and allow clients to access the data via sockets. I have a Java-based client coded up that hooks into the System clipboard, but it's still in alpha-mode. All GPL'd of course, but needs a little time to mature. It's a proof-of-concept, work in progress. :-)

    Check out Brad's site for more insight into the work he did and is doing.

    --
    --The more you know, the less you know.
  4. procmail! [Re:The ultimate spam blocker?] by Styx · · Score: 5, Informative
    I use procmail, with weighted scoring.
    First, I sort out mail from the mailing lists I read.
    Then, mail from friends and people I correspond with a lot.
    Finally, I have a weighted scoring recipe:

    # HB: match the conditions against both the header and the body
    # (the From/To tests need the header, the keyword tests the body).
    :0 HB
    * -199^0
    # Start the score at -199; the mail is filed below only if the total is above 0 at the end of the recipe.
    * 50^1 ^(From|To):.*@hotmail\.com
    * 50^1 ^(From|To):.*@yahoo\.com
    * 50^1 ^(From|To):.*@aol\.com
    * 50^1 ^(From|To):.*@msn\.com
    * 50^1 ^(From|To):.*@excite\.com
    * 50^1 ^(From|To):.*@netscape\.net
    * 50^1 ^(From|To):.*@yahoo\.co\.uk
    #Most mail to and from these domains is spam, so score it.
    * 100^1 opt-out
    * 50^1 opt-in
    * 200^1 OTCBB
    * 50^1 viagra
    * 50^1 zyban
    * 50^1 propecia
    * 75^1 FREE
    * 75^1 GUARANTEED
    * 75^1 LEGAL
    * 50^2 MILLIONAIRE
    * 50^1 100%
    #Words I only see in spam.
    mail/Trash

    This works quite well for me. If any spam gets through, I try to find some words that I don't get in normal mail and add them to the scoring.
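
    For readers who don't speak procmail, the scoring idea in the recipe above can be sketched in Python roughly as follows. This is only an illustration, not procmail's implementation: the RULES table, score() and is_spam() are made-up names, only a handful of the conditions are copied over, and the handling of the exponent follows the documented w^x rule (first match adds w, the second w*x, the third w*x*x, and so on). Matching is case-insensitive because procmail's is by default.

```python
import re

# Hypothetical rule table: (weight, exponent, pattern), copied from a few of
# the conditions in the recipe above. Not a complete translation.
RULES = [
    (50, 1, r"^(From|To):.*@hotmail\.com"),
    (100, 1, r"opt-out"),
    (200, 1, r"OTCBB"),
    (50, 1, r"viagra"),
    (75, 1, r"FREE"),
    (50, 2, r"MILLIONAIRE"),
]

BASE_SCORE = -199  # same starting value as the recipe


def score(message: str) -> int:
    """Sum the weighted scores of every rule that matches the message."""
    total = BASE_SCORE
    for weight, exponent, pattern in RULES:
        # Count non-overlapping matches; ^ anchors at the start of each line.
        hits = len(re.findall(pattern, message, re.IGNORECASE | re.MULTILINE))
        # k-th match (counting from 0) contributes weight * exponent**k.
        total += sum(weight * exponent ** k for k in range(hits))
    return total


def is_spam(message: str) -> bool:
    """Mirror the recipe: file the message as spam if the score is above 0."""
    return score(message) > 0


if __name__ == "__main__":
    sample = (
        "From: bob@hotmail.com\n"
        "Subject: hi\n"
        "\n"
        "FREE viagra, opt-out here. MILLIONAIRE MILLIONAIRE\n"
    )
    print(score(sample), is_spam(sample))
```

    With an exponent of 1 every match of a word adds the same weight, so a message has to trip several rules (or one rule repeatedly) before it climbs past the -199 starting score; a single "free" in an otherwise normal mail is not enough.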

    --
    /Styx
    1. Re:procmail! [Re:The ultimate spam blocker?] by bruckie · · Score: 4, Informative

      Or you could just use SpamAssassin, which is designed specifically to do this and has many more rules that have been created by others.

      --Bruce

      --
      There are 10 kinds of people in the world: those who understand binary, and those who don't.
  5. Re:What I want by nosferatu-man · · Score: 5, Informative

    Welcome to Gnus. Have a sandwich.

    (jfb)

    --
    To spur "enterprise Linux," Big Bang, the distributed two-phase commit.
  6. Done already by Matts · · Score: 5, Informative

    "Perhaps even one of them Perl monkeys will quickly hack such a background tool."

    Been done already. Check out Mail::Miner.

    --

    Matt. Want XML + Apache + Stylesheets? Get AxKit.