Slashdot Mirror


Sorting the Spam from the Ham

MrClever writes "The Sydney Morning Herald (Aust) is running an article about the merits of Bayesian filtering and a good plain-English description of how it works. Might be handy if you need to explain it to non-technophiles. The main thing that may be useful is a Bayesian spam filter written to drop straight into Outlook 2k/XP available here and written in Python by Mark Hammond." Math buffs might enjoy reading these pages or browsing this writeup and its many links.

249 comments

  1. But without spam... by pnix · · Score: 3, Funny

    But without spam, I wouldn't get any email!

    1. Re:But without spam... by IthnkImParanoid · · Score: 3, Insightful

      You're getting modded as funny, but I just figured out exactly how true this is. My main email account is used primarily for work, so it was very easy to set up white lists for 30 or so email addresses with a few family and friends thrown in, and route to a special folder. I still check the default folder, of course, but I turned off notification for everything except the white folder.

      I went from checking my email every 5-10 minutes to a handful of times a day.

      --
      It's nothing but crumpled porno and Ayn Rand.
    2. Re:But without spam... by Anonymous Coward · · Score: 0

      Yup. We're running Spam Assassin locally (on the server with a central Bayes database as well as the other rules), and it's shocking to me now how little mail I actually get. Now that procmail files suspected spam into another mailbox I only check occasionally, I Have New Mail only once or twice a day max. The false positive rate so far is zero.

      The funny thing is that this is largely because I'd come to rely on email a lot less as the spam count rose to 30-50 messages/day. Now that I'm getting nearly none (Spam Assassin misses about one a week, and it'll be basically zero once I bump up the value of a high Vipul's Razor confidence) it'll be interesting to see whether I begin to use it at mid-'90s levels again.

    3. Re:But without spam... by DA_MAN_DA_MYTH · · Score: 1

      Yeah 95% of my incoming emails have this subject:

      Subject: You Blocked My MSN.

      I now have a new purpose in life. To reply to each and every one of them, even if the address exists or not.

      RE: You Blocked My MSN

      BECAUSE YOU KEEP SPAMMING ME ASSHOLE!!!

      --
      "It takes many nails to build a crib, but one screw to fill it."
  2. Finally.... by JohnnySkidmarks · · Score: 2, Funny

    I get to do something to stop my boss from enlarging his penis anymore... It's really starting to hurt.

    --

    I went to battle MC Escher but drew a blank

  3. Why not here? by Anonymous Coward · · Score: 5, Interesting

    What happens if Slashdot runs a Bayesian filter which runs a day after the stories are posted and programs itself with all the -1 comments as "Spam" and all the +5 comments as "Ham". Then let the Bayesian filter adjust all incoming messages by up to 2 points.

    I bet it'd work - and imagine if we did it to stories too! Maybe it'd reject all Taco's dupe submissions.
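
    A minimal sketch of the idea above, assuming a plain naive Bayes word model: comments moderated to -1 are trained as "spam", +5 comments as "ham", and the resulting spam probability is mapped to an adjustment of at most 2 points either way. The class name, the tokenizer, and the smoothing constants are illustrative assumptions, not anything Slashdot actually runs.

```python
import math
import re
from collections import Counter

TOKEN_RE = re.compile(r"[a-z0-9$']+")


def tokenize(text):
    """Lower-case the text and return its set of distinct word tokens."""
    return set(TOKEN_RE.findall(text.lower()))


class CommentFilter:
    """Toy naive Bayes filter trained on moderated comments (hypothetical)."""

    def __init__(self):
        self.counts = {"ham": Counter(), "spam": Counter()}
        self.totals = {"ham": 0, "spam": 0}

    def train(self, text, label):
        """Record one comment; label is "ham" (+5) or "spam" (-1)."""
        self.totals[label] += 1
        self.counts[label].update(tokenize(text))

    def spam_probability(self, text):
        """Combine per-token evidence as log-odds with add-one smoothing."""
        log_odds = math.log((self.totals["spam"] + 1) / (self.totals["ham"] + 1))
        for tok in tokenize(text):
            p_spam = (self.counts["spam"][tok] + 1) / (self.totals["spam"] + 2)
            p_ham = (self.counts["ham"][tok] + 1) / (self.totals["ham"] + 2)
            log_odds += math.log(p_spam / p_ham)
        # Clamp so that exp() cannot overflow on very long comments.
        log_odds = max(-50.0, min(50.0, log_odds))
        return 1.0 / (1.0 + math.exp(-log_odds))

    def adjustment(self, text, max_points=2):
        """Map probability 0..1 to a score change of +max_points..-max_points."""
        return round((1 - 2 * self.spam_probability(text)) * max_points)
```

    Because the model only counts words, a comment that reuses the vocabulary of highly rated posts scores well regardless of what it actually says.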

    1. Re:Why not here? by bmongar · · Score: 4, Interesting

      Very interesting, but I think it wouldn't work well: since most of the trolls and flamethrowers are talking about the same topics, the same words will show up in both ham and spam posts. But if someone could come up with a word pattern algorithm that could differentiate, that would rock.

      --
      As x approaches total apathy I couldn't care less.
    2. Re:Why not here? by Anonymous Coward · · Score: 0, Interesting
      It would do a great job of catching:
      Penis Bird
      First Post!
      $OS is dying
      goatse.cx

      It might make a better lameness filter than the one they have now

    3. Re:Why not here? by Anonymous Coward · · Score: 0

      That might work, but it would up the processing requirements for /. quite a bit. Plus, having people do more things encourages the community.

    4. Re:Why not here? by Anonymous Coward · · Score: 0

      "It might make a better lameness filter than the one they have now"

      So would a tampon.

    5. Re:Why not here? by prgammans · · Score: 1

      *** Please note quote censored to prevent auto moderation ***
      It would do a great job of catching:
      P***s B**d
      F***t Post!
      *** is dying
      ******.cx

      It must be working already, as you've just been modded (-1, Troll)
      Now all we will get is.....
      Pe<!-- sdfg ->ni<!-- dgsdfg ->s Bir<!-- cvxbd ->d
      Fi<!-- cvbdfd ->rst <!-- luigh ->ost!
      $OS<!-- dfgs76j -> i<!-- rbgmkyuz ->s dy<!-- dfgdfg ->ing
      g<!-- zxcgfvcb ->oats<!-- vnf6ui ->e.cx

    6. Re:Why not here? by kels · · Score: 2, Insightful
      I bet it'd work - and imagine if we did it to stories too! Maybe it'd reject all Taco's dupe submissions.

      Umm, a naive Bayesian filter would score duplicate posts highly, because after all they contain all the same words that were good last time.
      --
      "I believe that the cult of the particular brings only death - for it bases order on likeness." St.-Exupery
    7. Re:Why not here? by Anonymous Coward · · Score: 0

      Dude!! Bayesian rules... it works far better than you think. Initially, I was quite skeptical until I joined the Bayesian developers list and started to use it, and I picked up on quite a few interesting things I want to share with people.

      I learned that if I keep my corpus (ham and spam) relatively small, and develop a policy where I drop off the older ham and spam and always pick up new ones, then my corpus can change ahead of the spammers. It's best to not use old ham and spam, and always work with no more than a month's worth.
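
      The rolling-corpus policy described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `RollingCorpus` class, its 30-day default, and the requirement that messages arrive in chronological order are hypothetical choices, not the poster's actual setup.

```python
from collections import Counter, deque
from datetime import datetime, timedelta


class RollingCorpus:
    """Keeps token counts only for messages newer than max_age (hypothetical)."""

    def __init__(self, max_age=timedelta(days=30)):
        self.max_age = max_age
        self.messages = deque()  # (received_at, label, tokens), oldest first
        self.counts = {"ham": Counter(), "spam": Counter()}
        self.totals = {"ham": 0, "spam": 0}

    def add(self, received_at, label, tokens):
        """Train on one message, then drop anything that has aged out."""
        tokens = frozenset(tokens)
        self.messages.append((received_at, label, tokens))
        self.counts[label].update(tokens)
        self.totals[label] += 1
        self.expire(now=received_at)

    def expire(self, now):
        """Untrain every message older than now - max_age."""
        cutoff = now - self.max_age
        while self.messages and self.messages[0][0] < cutoff:
            _, label, tokens = self.messages.popleft()
            self.counts[label].subtract(tokens)
            self.totals[label] -= 1
        # Remove tokens whose count fell to zero so the tables stay small.
        for counter in self.counts.values():
            for tok in [t for t, n in counter.items() if n <= 0]:
                del counter[tok]
```

      Storing the token set of each message is what makes untraining possible; a filter that keeps only aggregate counts would have to be rebuilt from the saved mail instead.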

      After about 2 weeks of training, I'm getting better than 92% accuracy, and after 30 days it's not uncommon to get 99% accuracy. But if you don't change your corpus (ham and spam), it starts to get inaccurate again, as spammers are always evolving, trying to get past the filters.

      It didn't take spammers long to start putting spaces b e t w e e n t h e l e t t e r s in their spam messages, but once this was known the Bayesian filter picked up on it, and now marks ALL such messages as spam.

      The important thing to remember is that if you "teach" it right, it WILL identify the differences. And spammers are making things much easier by the way they encode their spam messages.... I mean... how many people do YOU know who encode messages like spammers do?

    8. Re:Why not here? by hipster_doofus · · Score: 1

      I can't wait! I'll just set my sig to "Linux Rules, Microsoft Sucks!" and wait for the karma points to roll in! :-)

      --
      Five Dolla Moddy-Moddy? ;->
    9. Re:Why not here? by Sloppy · · Score: 1

      But then when BSD or Stephen King do finally die, we'll never find out about it!

      --
      As copyright owner of this comment, I authorize everyone to defeat any technological measure which limits access to it.
    10. Re:Why not here? by pinky42 · · Score: 1

      But it would give the lamers something to do: try and fool the filter. Maybe that would keep them out of trouble.

      "No, no, don't tug on that, you don't know what it's attached to."-Buckaroo Banzai

    11. Re:Why not here? by Zork the Almighty · · Score: 2, Funny

      I can't understand why none of my emails ever get through. Let me start by first introducing myself, I am Dr. Zork the Almighty, a manager at the SOCIETE GENERALE BANK NIG. LTD. Lagos. I have come across your name in a private search for a reliable and reputable person to handle a very confidential transaction. I am looking for a foriegn partner to assist in a transaction involving FIVE BILLION US$ currently in an escrow account.

      --

      In Soviet America the banks rob you!
  4. What I want by Nate Fox · · Score: 5, Interesting

    is a scalable popfile for larger organizations. If I could get popfile (with its super-easy-to-train/use-web-interface) that would run on my linux server, scan my IMAP mail server (well, incoming mail would actually work fine, too. I've heard they have a smtp plugin for it in cvs), and then have a popfile config page for each person, or maybe tie it into the imap/smtp server's login. THAT would rock. I've heard SpamAssassin does Bayesian, but I couldn't see how it was trainable (and I don't want other people on my server to read each others' mail, obviously).

    1. Re:What I want by Telastyn · · Score: 2, Funny

      What I want is physical pain upon the sender whenever spam is sent. *THAT* would be much better I think :]

      Hell, even a fee or mental anguish would suffice...

    2. Re:What I want by franimal · · Score: 3, Informative

      Personally, I really like Spambayes and Procmail for use with my IMAP server. It's easy to setup for each user and they can train their own SPAM database. You can even run the training script as a cron job and the users only need to shuffle unknowns to the spam folder. Works well, because users never even have to see the spam, if they don't want to.

    3. Re:What I want by leshert · · Score: 5, Informative

      Spamassassin learns in two ways:
      1. Manual training: there is a tool called 'sa-learn'. You can pipe a message to it, or point it to a mailbox, and specify whether the mail is spam or ham.
      2. Automatic training: if the score of the mail is significantly high (definitely spam) or significantly low (definitely ham), it will automatically train on the message. This may seem useless, but it's useful in that SA will then start to figure out patterns in spam or ham that don't trigger its rules.
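
      The automatic-training rule can be sketched as a simple decision function. It assumes SpamAssassin's convention that a higher score means more spam-like; the function name and the two cut-off values are illustrative assumptions, not SpamAssassin's own code or necessarily its configured thresholds.

```python
def autolearn_label(score, ham_at_or_below=0.1, spam_at_or_above=12.0):
    """Return "spam", "ham", or None for a rule-based score.

    Only messages far from the decision boundary are used for training,
    so that a borderline (possibly wrong) verdict never teaches the
    Bayes database anything. Both cut-offs are assumed values.
    """
    if score >= spam_at_or_above:
        return "spam"
    if score <= ham_at_or_below:
        return "ham"
    return None  # too uncertain: leave it for manual training


# Usage: decide which of a batch of scored messages to train on.
for score in (15.2, 5.0, -1.0):
    print(score, autolearn_label(score))
```

      Messages that fall between the two cut-offs are exactly the ones worth handing to manual training.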

      I read mail with Mutt, and I've remapped the 'd'elete key to instead throw the message into a 'ham' mbox, and added a 'S'pam mapping to throw the message into a 'spam' mbox. Then I added a nightly cron job to run sa-learn over the two mboxes and truncate them. This has worked very, very well for me... I haven't had a single false positive since Bayes kicked in about two months ago, and today I got my first false negative in about two weeks. I typically trap 10-15 spams a day.

      One thing to notice: even if you enable it, Bayesian filtering won't kick in until you've recognized at least 200 spam and 200 ham messages. Took me a long time to figure that out (I had plenty of spam, but I wasn't training it on ham at all, which is why I started remapping the mutt commands).

      As far as installing it on a server, your users don't have to be able to read each others' mail. I have it installed so that my wife and I each have our own bayes dbs, so neither of us has to read each others' mail. Plus, different users will regard different mail as spam: anything about the Pittsburgh Steelers going to my mailbox is probably spam, but not hers; similarly, anything regarding Linux going to her mailbox is probably spam, but not mine.

    4. Re:What I want by Anonymous Coward · · Score: 0

      SpamProbe runs on Linux and works better than most, since it counts word pairs as well.
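
      To show what "counting word pairs" means in practice, here is a small tokenizer sketch that emits both single words and adjacent pairs. It is not SpamProbe's actual tokenizer; the function names and the regular expression are assumptions made for illustration.

```python
import re


def single_words(text):
    """Split text into lower-case word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def with_word_pairs(text):
    """Return single words followed by every adjacent two-word phrase."""
    words = single_words(text)
    pairs = [f"{a} {b}" for a, b in zip(words, words[1:])]
    return words + pairs


print(with_word_pairs("Click here now"))
# ['click', 'here', 'now', 'click here', 'here now']
```

      Pairs let a filter treat "click here" differently from "click" and "here" on their own, at the cost of a much larger token table.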

    5. Re:What I want by JohnGrahamCumming · · Score: 2, Informative

      I agree with you and we are planning to get to that ASAP. There's some underlying work we need to do on performance first (that's planned for v0.20.0) and then we'll have the foundation for multiusers, pretty much as you describe. If anyone out there wants to write an IMAP module (subclass of Proxy::Proxy) then I'd be very happy to accept it. John.

    6. Re:What I want by JohnGrahamCumming · · Score: 1

      Do you have some evidence to back up the claim that word pairs are more accurate than individual words? This is commonly quoted as a "better" approach but most of the research on Bayesian text classification shows that for email classification it is very close to the same performance as more complex k-word classifiers and it's faster. John.

    7. Re:What I want by slagdogg · · Score: 2, Interesting

      I read mail with Mutt, and I've remapped the 'd'elete key to instead throw the message into a 'ham' mbox, and added a 'S'pam mapping to throw the message into a 'spam' mbox.

      Would you mind sharing your .muttrc for this?

      --
      (Score:-1, Wrong)
    8. Re:What I want by rutledjw · · Score: 1
      There 'ya go! What we need is some good old fashioned ass-whoopin' corporeal punishment.

      I'm personally a proponent of tar-and-feathering, but that's just me. After a few times walking around like a deranged Big Bird, I think spammers might find a real job.

      --

      Computer Science is Applied Philosophy
    9. Re:What I want by Dionysus · · Score: 1

      Why would you need an IMAP module? Just intercept the email before it comes to the INBOX, and do the bucket assignment there.

      I was actually thinking of forking the popfile system to work with IMAP.

      --
      Je ne parle pas francais.
    10. Re:What I want by JohnGrahamCumming · · Score: 1

      Seems a pity to fork the POPFile code just to get IMAP support when I'm happy to include it. How would you suggest intercepting the mail? John.

    11. Re:What I want by Dionysus · · Score: 1

      email me at john at fjellstad dot org and we can discuss. I wouldn't mind helping out.

      --
      Je ne parle pas francais.
    12. Re:What I want by berenddeboer · · Score: 1

      If you want a Bayesian tool that works for an IMAP server, try emc: http://www.pobox.com/~berend/emc/.

      This tool builds your spam token list by scanning IMAP folders. It's a command-line tool, binaries for Windows and Linux.

      You will download the first release, so it might have undesirable properties. A new release is expected soon.

      --
      If I had a sig, I would put it here.
    13. Re:What I want by Phil Gregory · · Score: 1

      You can look at my macros for spam classification. (Linked instead of posted directly because slashcode kept inserting unwanted spaces in them.)

      With these, "S" will classify an email as spam. "H" will reclassify a false positive--it's designed to operate on an email that SpamAssassin has munged as spam and won't work on regular emails. I haven't bothered to write a macro to train regular emails as ham, though I probably should.

      --Phil (Mutt Mafia member since 1998)
      --
      355/113 -- Not the famous irrational number PI, but an incredible simulation!
    14. Re:What I want by slagdogg · · Score: 1

      Nice, thanks! I'm guessing that the reason for the extra spaces in slashcode is to prevent someone from mucking up the page rendering by including a long word with no spaces ... annoying, but definitely necessary.

      --
      (Score:-1, Wrong)
    15. Re:What I want by Alan Partridge · · Score: 1

      From Webster's Revised Unabridged Dictionary (1913):

      Corporeal \Cor*po"re*al\ (k[^o]r*p[=o]"r[-e]*al), a. [L.
      corporeus, fr. corpus body.]
      Having a body; consisting of, or pertaining to, a material
      body or substance; material; -- opposed to spiritual or
      immaterial.

      --
      That was classic intercourse!
    16. Re:What I want by leshert · · Score: 3, Interesting

      Not at all. The macros are short and sweet:


      macro index d ~/Mail/bham^my
      macro pager d ~/Mail/bham^my
      macro index S ~/Mail/bspam^my
      macro pager S ~/Mail/bspam^my


      Then the relevant sections of my crontab look like this:


      0 2 * * * /usr/bin/sa-learn --spam --mbox /home/tim/Mail/bspam
      15 2 * * * /usr/bin/sa-learn --ham --mbox /home/tim/Mail/bham


      In another post (as well as on several sites on the web), it's recommended to bind a key to pipe the message directly to sa-learn. I read my mail on the server, which is an embarrassingly old machine, and sa-learn takes on the order of 30 seconds per email--not fun when you're just doing 'that last check of email before heading home'. Copying the mail to a file is just about instantaneous, and the sa-learn can do its dirty work while I'm sleeping (or watching The Office, as the case may be).
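
      The "learn overnight, then truncate" step could also be wrapped in a small script. The sketch below is an assumption-laden illustration: the helper name `learn_and_truncate` is hypothetical, and the only sa-learn options it uses are the `--spam`, `--ham`, and `--mbox` flags shown in the crontab above.

```python
import subprocess
from pathlib import Path


def learn_and_truncate(mbox_path, kind, learn_cmd=("/usr/bin/sa-learn",)):
    """Feed one mbox to sa-learn, then empty it. Returns False if nothing to do.

    kind is "spam" or "ham". The mbox is only truncated when the learner
    exits successfully, so a failed run leaves the mail in place for the
    next night.
    """
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    mbox_path = Path(mbox_path)
    if not mbox_path.exists() or mbox_path.stat().st_size == 0:
        return False
    cmd = [*learn_cmd, f"--{kind}", "--mbox", str(mbox_path)]
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    mbox_path.write_text("")         # truncate after a successful learn
    return True
```

      One caveat with this approach: a message saved to the mbox between the learning step and the truncation would be lost, which is less of a concern for a job that runs at 2 a.m. than for one that runs during the day.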

    17. Re:What I want by thogard · · Score: 2, Interesting

      Some of them are dealing with the pain. A guy I met recently paid about AU$5000 to a spam house to send his ad out to a million people in an opt-in mail list. His web server got 40 hits that day compared to the daily average of 13 and none of them bought his book. He was taught a $5000 lesson that spamming doesn't work. What was interesting is that the "demo run" got more hits on his web site than the real run.

    18. Re:What I want by armb · · Score: 1

      I want their testicles deep fried, stuffed down their throat, and cut off.

      In that order.

      --
      rant
    19. Re:What I want by rutledjw · · Score: 1
      And your point is? Corporeal - as in physical. Physical punishment? Tar-and-feathering?

      I don't see the point of your post

      --

      Computer Science is Applied Philosophy
    20. Re:What I want by Alan Partridge · · Score: 1

      The word you wanted is "corporal". Punishment cannot possess a physical body, but a physical body can be punished.

      You see?

      --
      That was classic intercourse!
    21. Re:What I want by Anonymous Coward · · Score: 0

      anything about the Pittsburgh Steelers going to my mailbox is probably spam, but not hers; similarly, anything regarding Linux going to her mailbox is probably spam, but not mine.

      Way to attack gender roles. Kudos to you and your masculine wife.

  5. How to filter bloodninja? by Talisman · · Score: 2, Funny

    And if you could, would you really want to?

    bloodninja: Baby, I been havin a tough night so treat me nice aight?
    BritneySpears14: Aight.
    bloodninja: Slip out of those pants baby, yeah.
    BritneySpears14: I slip out of my pants, just for you, bloodninja.
    bloodninja: Oh yeah, aight. Aight, I put on my robe and wizard hat.
    BritneySpears14: Oh, I like to play dress up.
    bloodninja: Me too baby.
    BritneySpears14: I kiss you softly on your chest.
    bloodninja: I cast Lvl 3 Eroticism. You turn into a real beautiful woman.
    BritneySpears14: Hey...
    bloodninja: I meditate to regain my mana, before casting Lvl 8 Penis of the Infinite.
    BritneySpears14: Funny I still don't see it.
    bloodninja: I spend my mana reserves to cast Mighty of the Beyondness.
    BritneySpears14: You are the worst cyber partner ever. This is ridiculous.
    bloodninja: Don't shit with me biznitch, I'm the mightiest sorcerer of the lands.
    bloodninja: I steal yo soul and cast Lightning Lvl 1, 000, 000 Your body explodes into a fine bloody mist, because you are only a Lvl 2 Druid.
    BritneySpears14: Don't ever message me again you piece.
    bloodninja: Robots are trying to drill my brain but my lightning shield inflicts DOA attack, leaving the robots as flaming piles of metal.
    bloodninja: King Arthur congratulates me for destroying Dr. Robotnik's evil army of Robot Socialist Republics. The cold war ends. Reagan steals my accomplishments and makes like it was cause of him.
    bloodninja: You still there baby? I think it's getting hard now.
    bloodninja: Baby?

    --

    bloodninja: Ok baby, we got to hurry, I don't know how long I can keep it ready for you.
    j_gurli3: thats ok. ok i'm a japanese schoolgirl, what r u.
    bloodninja: A Rhinocerus. Well, hung like one, thats for sure.
    j_gurli3: haha, ok lets go.
    j_gurli3: i put my hand through ur hair, and kiss u on the neck.
    bloodninja: I stomp the ground, and snort, to alert you that you are in my breeding territory.
    j_gurli3: haha, ok, u know that turns me on.
    j_gurli3: i start unbuttoning ur shirt.
    bloodninja: Rhinoceruses don't wear shirts.
    j_gurli3: No, ur not really a Rhinocerus silly, it's just part of the game.
    bloodninja: Rhinoceruses don't play games. They fucking charge your ass.
    j_gurli3: stop, cmon be serious.
    bloodninja: It doesn't get any more serious than a Rhinocerus about to charge your ass.
    bloodninja: I stomp my feet, the dust stirs around my tough skinned feet.
    j_gurli3: thats it.
    bloodninja: Nostrils flaring, I lower my head. My horn, like some phallic symbol of my potent virility, is the last thing you see as skulls collide and mine remains the victor. You are now a bloody red ragdoll suspended in the air on my mighty horn.
    bloodninja: Fuck am I hard now.

    --

    BritneySpears14: Ok, are you ready?
    eminemBNJA: Aight, yeah I'm ready.
    BritneySpears14: I like your music Em... Tee hee.
    eminemBNJA: huh huh, yeah, I make it for the ladies.
    BritneySpears14: Mmm, we like it a lot. Let me show you.
    BritneySpears14: I take off your pants, slowly, and massage your muscular physique.
    eminemBNJA: Oh I like that Baby. I put on my robe and wizard hat.
    BritneySpears14: What the fuck, I told you not to message me again.
    eminemBNJA:
    BritneySpears14: I swear if you do it one more time I'm gonna report your ISP and say you were sending me kiddie porn you fuck up.
    eminemBNJA: Oh
    eminemBNJA: damn I gotta write down your names or something

    --

    "Study your math, kids. Key to the universe." -The Archangel Gabriel
    1. Re:How to filter bloodninja? by Anonymous Coward · · Score: 0
      Hey, fucknard...

      Quit stealing my trollage. It worked very well the first time I stole it.

      Anyway, if you want the whole series you may find it here and the posts that follow -->

      http://games.slashdot.org/comments.pl?sid=68468&threshold=0&commentsort=0&tid=211&mode=nested&cid=6264522


      AC
    2. Re:How to filter bloodninja? by DaemonGem · · Score: 1

      BritneySpears14: I swear if you do it one more time I'm gonna report your ISP and say you were sending me kiddie porn you fuck up.

      The problem is that it would be her fault too, for actually accepting the porn.

      -Dae

      --
      "Alle reden vom wetter. Wir nicht." - SDS Sozialistischer Deutscher Studentenbund.
      j00 4r3 3n73r1ng l337 w0r1d.
  6. This is bad news!!! by aborchers · · Score: 4, Funny

    The main thing that may be useful is a Bayesian spam filter written to drop straight into Outlook 2k/XP


    I've now lost one of my primary arguments for switching my colleagues to Mozilla!

    --
    Trouble making decisions? Just flip for it.
    1. Re:This is bad news!!! by stinky wizzleteats · · Score: 1

      I've now lost one of my primary arguments for switching my colleagues to Mozilla!

      Then switch them to kmail. Kmail has a pass-through script filter option that would allow you to use any console-mode spam filter for Linux, such as bogofilter.

    2. Re:This is bad news!!! by joeflies · · Score: 2, Informative

      From what I understand, beta testers tell me the next revision of the Outlook client contains a spam filtering function that works pretty well too. I do like the Mozilla 1.4 junk mail features though - works about as good as I could have hoped.

    3. Re:This is bad news!!! by aborchers · · Score: 2, Informative

      Er, wouldn't that first involve switching them to Linux? Come on, man, I have to take baby steps with people who need convincing to leave Outlook! :-)

      --
      Trouble making decisions? Just flip for it.
    4. Re:This is bad news!!! by nacturation · · Score: 1, Offtopic

      Okay, so don't switch then. If Outlook meets all your needs, then use Outlook. If there's a more compelling reason to use Mozilla -- be it ideological ... or logical :) -- then use Mozilla.

      I used to be in the "they're evil, shun the products" camp. I have since made the switch to the "use it, it has the benefits I want" camp. Some benefits, of course, are non-tangible and will vary from user to user.

      --
      Want to improve your Karma? Instead of "Post Anonymously", try the "Post Humously" option.
    5. Re:This is bad news!!! by Mikey-San · · Score: 4, Insightful

      I know your post was meant to be funny, but it brings up a point:

      So what? If more computer products benefit, don't we all? Anything that makes Outlook better is good in my book. Perhaps this will eliminate some virus-and-worm-carrying spam--and that's good for /all/ of us on teh intarweb. ;-)

      --
      Mikey-San
      Karma: +Eleventy billion (mostly affected by watching Celebrity Jeopardy)
    6. Re:This is bad news!!! by aborchers · · Score: 2, Interesting

      First, I'd refer you to my /. Moderation Aphorism #1. Second, I'll give a serious answer to your serious observation:

      I use MS Office under Crossover Office because it gives me the features I want (admittedly, one of them is the ability to share identically functional documents with Windows users) so I definitely agree with your perspective. In the case of Mozilla, there has been a great ruckus around here about spam, and I kept telling people it didn't affect me because I used Mozilla w/ Bayesian filters. Additionally, Outlook's rotten record for relaying mail worms has been a problem to me as a sys admin. Independent of the calendar/groupware features, in my immediate area, most people use Outlook as a mail client out of inertia because it came with Office and refuse to switch because of fear of the unknown rather than out of a choice based on features.

      --
      Trouble making decisions? Just flip for it.
    7. Re:This is bad news!!! by H310iSe · · Score: 2, Interesting

      I use outlook because my clients use outlook (though mostly I just use the awesome web interface that fastmail.fm provides). My clients use outlook because it has great, integrated calendaring and it syncs with their various PDAs. Such is life.

      I recently reviewed 7 client-side spam filters and ended up picking Spambully. It's not free and it's not perfect but for our environment (Win/outlook 2k2 w/ a weird mirapoint IMAP server and multiple PCs per user (so email needs to stay on the server)) it was the best. Very tight outlook integration (I'm a little worried about instabilities but so far it's smooth) and Bayesian.

      But it's really just the best of a bad lot. It's great to see someone working on an open source filter that might work w/ IMAP - we can't have enough of these since right now, well, we have almost none.

      --
      closed minded is as closed minded does
    8. Re:This is bad news!!! by Anonymous Coward · · Score: 0

      I was one of those. I said I'd never use Outlook because it's the security hole from hell and that I'd never run XP because it's just evil!!

      Well um... I run both of them now? lol

      I gave in and just used what worked best for me. I looked at Mozilla mail & news, Eudora, The Bat, and other email clients, but for what I do, Outlook turned out to be the best choice.

    9. Re:This is bad news!!! by Anonymous Coward · · Score: 0

      Yes, and when they say "now Outlook XP 2007 has spam filtering and virus detection" we can say "yes, and GNU/Linux had this for years" :)

  7. SpamAssassin works for me (even on Exchange) by AssFace · · Score: 5, Interesting

    My own personal account is on a shared server at pair.com, and I run SpamAssassin (the perl script, can't put the spamc/d on there since I'm not root).
    I have written on here before how I have saved myself a lot of hassle over the last few months by installing SA. I now stop 100+ messages a day (usually more like 140 now).
    My stats tell me that since Feb, I've stopped over 15K Spam messages. Hot damn.

    Where I currently work now we have Exchange and I wanted SpamAssassin on there, but we weren't getting the money approved to put it on.
    So I hacked in SpamAssassin via an Exchange 2000/2003 EventSink.
    If you want the code for it, feel free to grab it from http://www.cardboardutopia.com/ExchangeSpamFilter.zip

    But do note that if you have many users on your machine, you aren't going to want to use this - an EventSink on Exchange runs in serial, so SpamAssassin's Perl script (the spamc/d doesn't work under Win32) will get executed on every incoming mail, and it will have to wait until it is done before it gets the next one.

    We process about 2000-5000 incoming messages a day and it does okay, but we have a very light load.
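
    Since the sink handles one message at a time, a quick back-of-the-envelope check shows how much scanning time each message can take on average before mail starts to queue. The figures 2000 and 5000 come from the post above; the 2-second scan time is a purely hypothetical value for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_budget_seconds(messages_per_day):
    """Average time available per message if they are handled one by one."""
    return SECONDS_PER_DAY / messages_per_day


def utilization(messages_per_day, seconds_per_message):
    """Fraction of the day a serial scanner spends busy (above 1.0 = overload)."""
    return messages_per_day * seconds_per_message / SECONDS_PER_DAY


for volume in (2000, 5000):
    print(volume, "msgs/day:", round(average_budget_seconds(volume), 2), "s each;",
          "busy", round(100 * utilization(volume, 2.0), 1), "% at 2 s per scan")
```

    These are daily averages only. Mail tends to arrive in bursts, so a serial scanner can fall behind during busy periods even when the average utilization looks comfortable.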

    --

    There are some odd things afoot now, in the Villa Straylight.
    1. Re:SpamAssassin works for me (even on Exchange) by IthnkImParanoid · · Score: 2, Informative

      SpamAssassin is nice, but it's nowhere near the 99% elimination claim in the article (a vaporous claim in the article? The hell you say!)

      SpamAssassin, set at 5 (after I got a false positive at 4), stops about 75-80% of spam, but with some more rules from me (how did SpamAssassin let 'huge c-cks' get through?!) it stops closer to 90%.

      The only solution I've tried that worked well has been white lists, but that only works so well because I don't make a lot of new friends :)

      --
      It's nothing but crumpled porno and Ayn Rand.
    2. Re:SpamAssassin works for me (even on Exchange) by notque · · Score: 0, Redundant

      A few too many rules, and an email talking about SpamAssassin won't get through.

      Ha Ha.

      Cause it says Ass.

      See, It's funny.

      Cause it's true.

      (We are severely understaffed today and I'm delirious.)

      --
      http://use.perl.org
    3. Re:SpamAssassin works for me (even on Exchange) by AssFace · · Score: 1

      Our company is small, so it works well for us. We have 16 people and I would guess under 200 clients.
      I took our Global Contacts list and put all of the clients in there into the whitelist.

      Then I sat and watched the e-mail coming in for about a week (takes a surprisingly small amount of time with the mail load that we have).
      I manually sorted the mail out into spam/non-spam - mainly looking for mistakes that SA made.

      I had it set to 7.5 as a trigger - I wanted it to err on the side of letting spam through instead of marking client mail (even then, our mail currently just marks the mail so that end users can filter it in Outlook easily - I wrote the script to also allow other server side options of killing the mail).

      Anytime that I saw a mistake, I would add that person to the whitelist (I usually try to add whole domains to a whitelist if they are a domain that we deal with a lot and can be trusted) - but the e-mails that are from yahoo.com, hotmail.com, and all of the other services that people use for home mail, I have to do those on an individual basis.
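
      The whitelist policy described above (whole domains for trusted business contacts, individual addresses for free-mail services) might look like this in code. The function name, the `FREE_MAIL_DOMAINS` set, and the example addresses are hypothetical; this is a sketch of the policy, not the poster's script.

```python
# Free-mail domains are never trusted wholesale; only specific addresses are.
FREE_MAIL_DOMAINS = {"yahoo.com", "hotmail.com"}


def is_whitelisted(sender, trusted_domains, trusted_addresses):
    """Return True if the sender address should bypass spam scoring."""
    sender = sender.strip().lower()
    _, _, domain = sender.rpartition("@")
    if sender in trusted_addresses:
        return True
    # A whole-domain entry only counts for domains that are not free-mail.
    return domain in trusted_domains and domain not in FREE_MAIL_DOMAINS


# Usage with made-up entries:
domains = {"client.example"}
addresses = {"bob@yahoo.com"}
print(is_whitelisted("Alice@Client.example", domains, addresses))  # True
print(is_whitelisted("eve@yahoo.com", domains, addresses))         # False
```

      Keep in mind that sender addresses are easy to forge, so a whitelist like this reduces false positives rather than proving who sent the message.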

      After a week things settled down a lot.

      I am still working out a few extra things that will help out a lot. We have spam coming in to addresses that don't exist here (some used to, others never did) - so if we can block those the instant they hit the server, then the less bandwidth is wasted on them.
      The server was configured to send out a message letting people know that the address that they sent to didn't exist.
      But if that is sent to a spammer, then they too usually don't exist on their account anymore, so then that gets bounced back with a message as well.
      A series of those is annoying.
      I disabled that on our server, but still allow out of office and confirmation e-mails.

      I then set up an ASP page that lets me add white/black list users from a webpage - that way if I leave as an admin, whoever takes my place doesn't need to know about SpamAssassin and can just use that web page to handle issues if they come up (assuming they are minor false hits either way).

      I also wrote scripts to learn from the mail everyday and then clean it out - for the first little bit it was all manual, but now it is automated.
      Those scripts just generally learn from what it already filtered out as spam and ham, which isn't terribly excellent - ideally the missed messages in either one would be manually sorted out, but I prefer to have as much hands off time as possible with the mail.

      I have since modded the config files so that bayes_90 and bayes_99 score much higher since I have never seen those be wrong on spam.
      I also lower the required hits to 4.5 (4 was too low for us).
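
      To make the scoring above concrete, here is a rough Python sketch of additive rule scoring against a threshold. BAYES_90 and BAYES_99 are real SpamAssassin rule names and 4.5 is the threshold from this post; every score value and the EXAMPLE_* rules are made up for illustration, and this is not SpamAssassin's actual code.

```python
# Per-rule scores. BAYES_90/BAYES_99 are real rule names; the numbers and
# the EXAMPLE_* rules are hypothetical, chosen only to illustrate the idea.
RULE_SCORES = {
    "BAYES_90": 3.5,
    "BAYES_99": 5.0,
    "EXAMPLE_RULE_A": 0.5,
    "EXAMPLE_RULE_B": 1.0,
}

REQUIRED_HITS = 4.5  # threshold mentioned in the post


def total_score(hits, scores=RULE_SCORES):
    """Sum the scores of every rule that fired on a message."""
    return sum(scores.get(rule, 0.0) for rule in hits)


def is_spam(hits, required=REQUIRED_HITS, scores=RULE_SCORES):
    """Tag the message once its total score reaches the threshold."""
    return total_score(hits, scores) >= required
```

      With these made-up numbers a BAYES_99 hit is enough on its own, while BAYES_90 needs at least one more rule to fire before the message is tagged.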

      I had to disable the rbl checking since that doesn't work under Win32 and I also disabled auto learning since that slows things down and I was already scripting forced learning in anyway.

      On my personal shared server at pair.com, that has been great for me - I have my whitelist established and I can tweak the rules.
      The only time mail ever gets through is when the server is under a high load - then the spamassassin perl script will time out and let the mail through.
      This wouldn't happen if I either could run the spamc/d, or if I had a dedicated server - I will likely get one of those eventually there at pair.com (I am very pleased with them), but they are a bit expensive now until I can justify it with a business making money off of what it serves instead of just my own personal playground.

      --

      There are some odd things afoot now, in the Villa Straylight.
    4. Re:SpamAssassin works for me (even on Exchange) by AssFace · · Score: 2, Funny

      The company that I am at now is a financial services one - dealing largely with hedge funds.

      The term "asset" shows up in 90% of our mail - it is amusing how many issues the companies in our sector have with poorly written filters that think "asset" is a bad word.

      I suggested that we start referring to the same concept as "fuckerbabies" but it hasn't caught on yet.

      --

      There are some odd things afoot now, in the Villa Straylight.
    5. Re:SpamAssassin works for me (even on Exchange) by mpieters · · Score: 2, Interesting

      We ran SpamAssassin on Python.org and Zope.org for a considerable length of time. We had, however, many false positives to deal with (we manually checked everything that scored between 5 and 10 points on the SpamAssassin scale). Usually, we had to review between 10 and 15 messages a day like this.

      We recently switched to SpamBayes, and our false-positive rate so far is 6 out of 2200+ spams (almost 12 days of traffic, with certain foreign charactersets, malformed email headers and blacklisted email bounced and not included in this number), mostly because we are still in training mode.

      On top of that, because SpamBayes is written in Python, we can integrate it directly into Exim with Greg Ward's elspy, whereas we had to run SpamAssassin in a separate process, which occasionally bombed out. Way much faster this way!

      Way more hot damn!

      --
      "The truth shall make ye fret" -- The Truth, Terry Pratchett
    6. Re:SpamAssassin works for me (even on Exchange) by Anonymous Coward · · Score: 0

      pair.com provides SA for _all_ users, no need to install it. Just log in at their web interface, and configure it for each of your mailboxes (!) to your heart...

      Dunno why you claim you need to install SA..

      Best wishes,

      another happy pair.com customer

    7. Re:SpamAssassin works for me (even on Exchange) by vanyel · · Score: 4, Informative

      I run a small ISP with spamassassin installed, and I had to increase the default quota when I upgraded to the version with Bayesian filtering and its multi-megabyte databases per user. Combined with spamd bugs forcing me to switch back to running spamassassin individually and the fact that spamd still doesn't serialize processing, so the system still gets hammered by a flood of spam, I'm looking forward to greylisting to help take the load off spamassassin.

    8. Re:SpamAssassin works for me (even on Exchange) by AssFace · · Score: 1

      yeah, I see that now - but as I said in another post, when I spoke with them about it, they never mentioned it and I never found it when I looked on the server.

      I looked a few mins ago and saw that SA is on the server, but it is Version 2.43, and the first SA version to use Bayes was 2.50 - I currently use 2.60, so I don't really want to switch to their version.

      --

      There are some odd things afoot now, in the Villa Straylight.
    9. Re:SpamAssassin works for me (even on Exchange) by Stormbringer · · Score: 1

      Depending on your MTA, it might be well worth your while to use SpamProbe. It has the advantage of being compiled C++, so the Perl turbocharger-lag at startup isn't there to soak up cycles.

      I run getmail/qmail on my LAN, with SpamProbe spliced in (I threw some once-per-cron Perl together to iterate across Maildirs and SpamProbe-test each new mail), and I'm quite pleased with how clean our mail is now.

      It also grows a large db, but so far one global db seems to be serving us well. Spam is spam, I guess, whether it's meet-your-one-true-love aimed at me or lolita-with-animals aimed at the 10 yr old.

    10. Re:SpamAssassin works for me (even on Exchange) by Anonymous Coward · · Score: 0

      We run a web based Email server, so each "user" has their own directory for storage of their own unique corpus.

    11. Re:SpamAssassin works for me (even on Exchange) by Spoke · · Score: 1

      I assume you're using Sendmail which has a horrible method of handling short bursts of mail.

      I switched to Postfix so that I could get its much better rate limiting rules. Postfix will limit the number of incoming deliveries to a limit you set. This means you can directly control the load put on your server due to mail processing. By default it limits concurrency for incoming messages to 2, which works out well for a dedicated single CPU mail server.

      Much more effective than Sendmail, which sends off messages for delivery as fast as it can before it notices that the load has shot up. Before you know it you've got 100 spamc processes running and your machine is deep into swap!!!

  8. Spambayes by Chromodromic · · Score: 5, Informative

    I use Spambayes with Outlook 2000, and it takes a little tweaking, but it works as advertised. Ahhh, the magic of mathematics. Just now, brought up Outlook, checked my mail and three little messages offering a free Sony headset, 70% off cell accessories, and a chance to take an IQ test just got tossed into my spam folder. Thanks anyway, but I think that means I just passed my IQ test.

    Every so often I go in and take out some old, old spam, just to make sure my current preferences are being represented and that's all the maintenance that's required.

    This is, however, the second time I've trained the filter. The first time, it incorrectly identified my FreeBSD status mails as spam, and from then on was throwing those into the Spam folder. My own fault, though, since I hadn't included any of these messages in my representative ham.

    If you run Outlook, download this filter and use it. You'll be doing yourself, and a world that doesn't need fat-injected, herbally enhanced penises, a favor.

    --
    Chr0m0Dr0m!C
    1. Re:Spambayes by Anonymous Coward · · Score: 1, Informative

      And for those of you that use OE (or any other mail client), get Mailwasher. The one account version is free. If you have multiple POP accounts, you have to pay a small amount to get the "full" version. I've been using it for at least three months and it has labeled every spam mail accordingly and in very few instances does a legitimate mail get tagged. You should also download the extra filters from here. MW just works.

    2. Re:Spambayes by MattRog · · Score: 1

      I use SpamInspector by GIANT Company:
      http://www.giantcompany.com/products.aspx

      It catches virtually all my spam (except the one-liner emails that just have a URL; must not have a rule for that one for some reason) and rarely ham (can't think of a time in which it has caught ham). One positive feature is that it (supposedly) integrates everyone's checking of 'spam' or 'not' so that if enough people mark something as spam (that didn't get caught by the filters) it adjusts the filters accordingly so I don't get it (e.g. the Nigerian emails) (or vice versa).

      --

      Thanks,
      --
      Matt
    3. Re:Spambayes by AssFace · · Score: 4, Insightful

      I have seen all of the local client software and I personally have never bothered with it.

      I always felt that the whole point of spam being annoying was that it wasted bandwidth. It gets sent to my server, and then I have to download it all from my server, and then it gets sorted away from my eyes in my client.

      It is fairly trivial if you get enough regular mail for it to matter, and you are on a fast connection.

      But I can't tell you how annoying it is to be on a slow dial-up connection and download 50 messages and then see that they all got filtered into the spam folder and that there were no "real" messages.
      While there is a nice feeling of seeing them all get caught, it is annoying to have to wait for a download (and pay for it) and then get no return on the investment.

      That is why I always try to have the spam blocking on the server side. Although I now spend most of my time using ssh into my server and that way it isn't downloading all of the mail until I want to see something.

      Perhaps if I combine the fact that I have SA on the server, and then if I also had a client side option, I would get everything properly blocked that way (the only reason stuff gets through my server setup right now is if the server is under a high load, then my SA script will time out and the mail gets through).

      --

      There are some odd things afoot now, in the Villa Straylight.
    4. Re:SpamBayes by AssFace · · Score: 1

      I have never used SpamBayes, but just in general, if you are only training it on the spam, then it isn't going to get much smarter.

      You have to train it regularly on both spam and ham so that it works correctly on *your* mail.

      They come with a general rule set and then can get much smarter over time with training.

      I personally use SpamAssassin and really am pleased with how much it has helped me.

      --

      There are some odd things afoot now, in the Villa Straylight.
  9. An interesting way to deal with spam. by Meat+Blaster · · Score: 2, Interesting
    I've tried a number of different ways to filter spam, from whitelisting to Bayesian filtering, and Bayesian seems to offer a good balance between not eating too much of the ham while letting the spam through. Not too shabby, especially given that it comes with Mozilla now, and I think it's an excellent way of allowing clients to determine what they want to see without infringing free speech.

    I don't know if I'd want it in Python, though... it does seem to be a good deal slower already than other spam filtering methods without putting it in a scripting language. Getting it in Outlook can only be good for the net (can Bayesian be applied to things like spam from Internet virii as well?)

    1. Re:An interesting way to deal with spam. by gilesjuk · · Score: 1

      What can I say? I rarely ever see spam these days thanks to this approach. Popfile is one of the more mature solutions to spam, although it's a classifier not just a spam filter.

      Since Feb I've had 2,215 messages and it has made only 37 mistakes. 98.32% accuracy. I've tried a few commercial products and they were lucky to approach 50% accuracy.

    2. Re:An interesting way to deal with spam. by GreyPoopon · · Score: 1
      I've tried a number of different ways to filter spam, from whitelisting to Bayesian filtering, and Bayesian seems to offer a good balance between not eating too much of the ham while letting the spam through.

      Agreed. Although, I'm a bit disappointed that many of the bayesian filter projects don't offer whitelisting in conjunction with the filters. If I'm running a business, it's really important that I allow all email from *@myclient.com, regardless of what the spam filters think about it.

      I think it's an excellent way of allowing clients to determine what they want to see without infringing free speech.

      Just for the record, restricting spam doesn't infringe on free speech. Advocating free speech does not involve letting people use my money to promote their ideas (or products). As long as the cost of storage and bandwidth for my ISP gets passed on to me, I'm bearing a portion of the cost.

      --

      GreyPoopon
      --
      Why is it I can write insightful comments but can't come up with a clever signature?

    3. Re:An interesting way to deal with spam. by kefoo · · Score: 2, Interesting

      can Bayesian be applied to things like spam from Internet virii as well?

      What if the filtering programs had a feature that would allow somebody to send out the "signature" of an email virus, which the filter could use to block the virus before it had ever actually seen one? Its characteristics would be added to the list of things that weigh heavily toward spam, so it would be filtered out before ever reaching Exchange/Outlook.

    4. Re:An interesting way to deal with spam. by rusty0101 · · Score: 1

      In a sense, popfile does provide whitelisting at the level you are looking for. Popfile has what they call "Magnets" where you configure a string that you want popfile to look for, and before it applies any of the Bayesian rules to the message, if it sees that string, it classifies the message as you request.

      Functionally I believe this means that it effectively ignores the content at that point, meaning that other messages that you receive are not classified relative to these messages. I could be wrong however.
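
      The ordering described above (fixed strings first, Bayesian rules only as a fallback) might look roughly like this in Python. This is a sketch of the idea, not POPFile's code: the MAGNETS table, the bucket names, and the bayes_classifier callable are all hypothetical.

```python
# Each magnet: (header name, substring to look for, bucket to assign).
# These entries are illustrative only.
MAGNETS = [
    ("From", "@myclient.com", "work"),
    ("Subject", "[status]", "lists"),
]


def classify(headers, bayes_classifier, magnets=MAGNETS):
    """Return (bucket, how). Magnets are tried first; the Bayesian
    classifier is consulted only when no magnet matches."""
    for header, needle, bucket in magnets:
        if needle.lower() in headers.get(header, "").lower():
            return bucket, "magnet"
    return bayes_classifier(headers), "bayes"
```

      Because a magnet match returns before the classifier runs, mail from a trusted client domain lands in its bucket no matter how spammy the body looks.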

      About the only complaint I have about popfile is that there is no way to "reclassify" the messages I have received. I.e. I can not have it go back through the messages and change the subjects as is appropriate after I have passed through them in popfile and re-classified them there.

      -Rusty

      --
      You never know...
  10. Written by more than hammond by adamhupp · · Score: 4, Informative
    The Outlook plugin may have been written by Mark Hammond but spambayes is very much a group effort. The project can be found at spambayes.sf.net.

    I've been using spambayes for months now and it really is quite amazing. Now, when I get the occasional spam in my mailbox it's actually interesting because I want to figure out why it made it in. The number of false positives is almost nil, and the ones that do get hit are spammy looking autogenerated receipts from purchases I've made. It's made reading email a much more enjoyable activity.

    -Adam

    1. Re:Written by more than hammond by Wakko+Warner · · Score: 3, Interesting

      The number of false positives is almost nil, and the ones that do get hit are spammy looking autogenerated receipts from purchases I've made.

      This is quite possibly the only complaint I have about spambayes, too, and it's not even that big a deal to me. After about a month of collecting spam in its own folder (named SHIT, oddly enough), it had learned enough that I was able to dial down my SpamAssassin settings (I use an old version of SA still, too, without the bayesian stuff built in -- too lazy to switch; spambayes works well enough that it's not worth it.) I check my incoming spam folder once or twice a week now, as opposed to once or twice a day when I only ran SpamAssassin at a relatively forgiving (4.5-5.5) setting.

      There are a few thousand spams in SB's crap folder now; it's gotten so good that I can't really remember the last time I've had something miscategorized as spam, and of the 50-60 spams I get per day, usually only one or two make it through to my inbox, if that. Half of the time, I don't get any at all.

      If you didn't have a reason for installing a Python interpreter before, now you do.

      - A.P.

      --
      "Remember when the U.S. had a drug problem, and then we declared a War On Drugs, and now you can't buy drugs anymore?"
    2. Re:Written by more than hammond by Anonymous Coward · · Score: 0

      Getting false positives is very bad for a business account. You should give SpamProbe a try. It never gets false positives.

  11. Better Bayesian Filtering by Anonymous Coward · · Score: 2, Informative

    The first discovery I'd like to present here is an algorithm for lazy evaluation of research papers. Just write whatever you want and don't cite any previous work, and indignant readers will send you references to all the papers you should have cited. I discovered this algorithm after "A Plan for Spam" [1] was on Slashdot.

    Spam filtering is a subset of text classification, which is a well established field, but the first papers about Bayesian spam filtering per se seem to have been two given at the same conference in 1998, one by Pantel and Lin [2], and another by a group from Microsoft Research [3].

    When I heard about this work I was a bit surprised. If people had been onto Bayesian filtering four years ago, why wasn't everyone using it? When I read the papers I found out why. Pantel and Lin's filter was the more effective of the two, but it only caught 92% of spam, with 1.16% false positives.

    When I tried writing a Bayesian spam filter, it caught 99.5% of spam with less than .03% false positives [4]. It's always alarming when two people trying the same experiment get widely divergent results. It's especially alarming here because those two sets of numbers might yield opposite conclusions. Different users have different requirements, but I think for many people a filtering rate of 92% with 1.16% false positives means that filtering is not an acceptable solution, whereas 99.5% with less than .03% false positives means that it is.

    So why did we get such different numbers? I haven't tried to reproduce Pantel and Lin's results, but from reading the paper I see five things that probably account for the difference.

    One is simply that they trained their filter on very little data: 160 spam and 466 nonspam mails. Filter performance should still be climbing with data sets that small. So their numbers may not even be an accurate measure of the performance of their algorithm, let alone of Bayesian spam filtering in general.

    But I think the most important difference is probably that they ignored message headers. To anyone who has worked on spam filters, this will seem a perverse decision. And yet in the very first filters I tried writing, I ignored the headers too. Why? Because I wanted to keep the problem neat. I didn't know much about mail headers then, and they seemed to me full of random stuff. There is a lesson here for filter writers: don't ignore data. You'd think this lesson would be too obvious to mention, but I've had to learn it several times.

    Third, Pantel and Lin stemmed the tokens, meaning they reduced e.g. both "mailing" and "mailed" to the root "mail". They may have felt they were forced to do this by the small size of their corpus, but if so this is a kind of premature optimization.

    Fourth, they calculated probabilities differently. They used all the tokens, whereas I only use the 15 most significant. If you use all the tokens you'll tend to miss longer spams, the type where someone tells you their life story up to the point where they got rich from some multilevel marketing scheme. And such an algorithm would be easy for spammers to spoof: just add a big chunk of random text to counterbalance the spam terms.
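
    A minimal Python sketch of the "15 most significant tokens" idea follows. The text above only says that the 15 most significant tokens are used; taking "significant" to mean farthest from 0.5, and combining with the naive Bayes product formula, are assumptions based on the earlier "A Plan for Spam" essay. The function name is hypothetical and math.prod needs Python 3.8 or later.

```python
from math import prod


def combined_probability(token_probs, n_interesting=15):
    """Combine per-token spam probabilities, using only the tokens whose
    probability lies farthest from the neutral value 0.5.

    Assumes every probability is strictly between 0 and 1."""
    interesting = sorted(token_probs, key=lambda p: abs(p - 0.5), reverse=True)
    interesting = interesting[:n_interesting]
    if not interesting:
        return 0.5  # assumption: no evidence means "undecided"
    spam = prod(interesting)
    ham = prod(1.0 - p for p in interesting)
    return spam / (spam + ham)
```

    For a long message with five strongly spammy tokens (0.99) buried in two hundred mildly innocent ones (0.4), the top-15 version still scores close to 1, while combining every token drives the score toward 0, which is the padding weakness described above.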

    Finally, they didn't bias against false positives. I think any spam filtering algorithm ought to have a convenient knob you can twist to decrease the false positive rate at the expense of the filtering rate. I do this by counting the occurrences of tokens in the nonspam corpus double.
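
    The "knob" could be sketched in Python like this. Only the doubling of nonspam counts comes from the text above; the minimum-occurrence cutoff of 5 and the clamping to the range 0.01 to 0.99 are assumptions recalled from "A Plan for Spam", and all names are hypothetical.

```python
def token_spam_probability(spam_count, ham_count, n_spam, n_ham,
                           ham_weight=2, min_occurrences=5):
    """Estimate the probability that a message containing this token is spam.

    spam_count / ham_count: occurrences of the token in each corpus.
    n_spam / n_ham: number of messages in each corpus (assumed non-zero).
    ham_weight: the bias knob; 2 counts nonspam occurrences double.
    Returns None when the token is too rare to say anything about."""
    g = ham_weight * ham_count
    b = spam_count
    if g + b < min_occurrences:
        return None
    spam_freq = min(1.0, b / n_spam)
    ham_freq = min(1.0, g / n_ham)
    p = spam_freq / (spam_freq + ham_freq)
    return max(0.01, min(0.99, p))  # never fully certain either way
```

    A token seen ten times in each of two 1000-message corpora comes out at 1/3 rather than 1/2, so ambiguous words lean toward nonspam; raising ham_weight pushes further in that direction at the cost of letting more spam through.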

    I don't think it's a good idea to treat spam filtering as a straight text classification problem. You can use text classification techniques, but solutions can and should reflect the fact that the text is email, and spam in particular. Email is not just text; it has structure. Spam filtering is not just classification, because false positives are so much worse than false negatives that you should treat them as a different kind of error. And the source of error is not just random variation, but a live hum

    1. Re:Better Bayesian Filtering by GoatEnigma · · Score: 4, Funny
      Email is not just text; it has structure.

      You've obviously never received email from an AOL user!

    2. Re:Better Bayesian Filtering by kirkjobsluder · · Score: 1

      One of the things that I think needs to be considered in comparing false-positive rates is not "can the false positive rate be eliminated" but "is the false positive rate lower than human sorting alone". As far as I know, nobody has really bothered to factor in human performance.

  12. News for Pervs, Stuff that Matters. by notque · · Score: 5, Funny

    Would you use the phone if you had to listen to a 10-second brothel advertisement every time you made a call?

    Yes.

    Definitely Yes.

    Is that a feature I can have added?

    --
    http://use.perl.org
    1. Re:News for Pervs, Stuff that Matters. by Anonymous Coward · · Score: 2, Funny

      shouldn't that be:

      News for Pervs, Stuff that Splatters.

      ??

  13. Eudora users... by Control-Z · · Score: 2, Informative


    Eudora 6.0 beta has spam filtering which seems to be Bayesian. It's a little slower to learn than PopFile, but it's pretty good so far, and of course integrated with the Eudora UI.

    http://eudora.com/betas

  14. Spam filtering altogether by ToadMan8 · · Score: 5, Interesting

    I sat on the E-Mail policy team (a branch of the Strategic Planning team) for Miami University (Oxford, OH, not Florida) this last year (as a technical advisor, student and support desk employee). We looked at all sorts of spam solutions, as the president decided this should be a main focus (apparently the Viagra ads hit a bit too close to home for comfort ;)).

    The problem in the educational market, though, is that, not being a business that can make rules and force people to live by them, educational establishments sometimes have annoyed customers (students and faculty) if any spam is blocked (research, etc.). False positives absolutely can't be tolerated. So a ranked system (SpamAssassin) that suggests the possibility of spam is not only the best but the only solution we have available. Mail will be ranked and users can make rules that trash everything but a guaranteed perfect mail, if they so desire. Or they can leave them all alone. So intelligent filtering is a necessity, not just a benefit.

    On another page, I had an odd place during this discussion of the team. I do not receive spam. (Please, don't start now). My MUOhio.edu address simply doesn't get a single piece of spam e-mail. I have had the account for two years. I have over 3000 messages in various folders. And none are spam at all. I just haven't signed up for anything with it. I put the e-mail addy on webpages too (that I author) and haven't gotten a single thing. But oh my the trash "spam" account gets 60 a day. On AOL. That blocks 80% of incoming mail. Ironically, they had MUOhio.edu blocked weeks back.

    --
    I haven't posted in so long, my sig is out of date.
    1. Re:Spam filtering altogether by Anonymous Coward · · Score: 0

      U of Miami, eh? Does a d%ckhead named Darrow Neves still work there?

    2. Re:Spam filtering altogether by greed · · Score: 2, Informative

      I don't know spambayes, but bogofilter most definitely can operate in a "ranking" mode:

      • X-Bogosity: No, tests=bogofilter, spamicity=0.000000, version=0.12.2
      • X-Bogosity: Unsure, tests=bogofilter, spamicity=0.499150, version=0.12.2
      • X-Bogosity: Yes, tests=bogofilter, spamicity=0.969917, version=0.12.2

      Then you can header-match in your MUA all you want--or not. (I run it all through procmail, but that's because I want all the filtering done before it hits my IMAP server.)
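
      For anyone who would rather do the header match in a script than in a mail client, here is a small Python sketch that reads the X-Bogosity header shown above. The header layout is taken from those three samples only; the folder names and function names are hypothetical.

```python
from email import message_from_string


def bogosity(raw_message):
    """Return (verdict, spamicity) from an X-Bogosity header,
    or (None, None) if the message has no such header."""
    header = message_from_string(raw_message)["X-Bogosity"]
    if header is None:
        return None, None
    parts = [p.strip() for p in header.split(",")]
    verdict = parts[0]  # "Yes", "No" or "Unsure" in the samples above
    fields = dict(p.split("=", 1) for p in parts[1:] if "=" in p)
    spamicity = fields.get("spamicity")
    return verdict, (float(spamicity) if spamicity is not None else None)


def folder_for(raw_message):
    """Pick a folder from the verdict; anything unrecognized stays in the inbox."""
    verdict, _ = bogosity(raw_message)
    return {"Yes": "Spam", "Unsure": "Review"}.get(verdict, "Inbox")
```

      Keeping "Unsure" in its own folder gives users the ranked view: they can decide for themselves whether to read, skim, or trash the middle band.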

  15. SpamBayes by MImeKillEr · · Score: 1

    I've got this installed for Outlook XP. Either I don't have it configured correctly (likely) or it just doesn't work well. Even using the emails in the spam folder to 'train' it, it still misses messages.

    --
    Cruising the internet on my TI-99/4A @ a whopping 300 baud!
  16. What about Outlook Distress, er Express by Anonymous Coward · · Score: 0

    We need one too!

  17. Re:40-in-1 jokes by tomcio.s · · Score: 0

    Enough already. Sure it was funny the first time around.. Good chuckle second time. Now it's just getting old and annoying.

  18. Mozilla Mail by respite · · Score: 3, Informative

    In case anyone hasn't tried it yet, the Bayesian filters in the mail client of the Mozilla suite are really impressive. They have worked close to flawlessly for me.

    1. Re:Mozilla Mail by drinkypoo · · Score: 2, Interesting

      They work pretty well for me, but nowhere near flawless. Some days I get 25 messages that go into the spam folder and only 3 in my inbox, some days I get about 10 in the spam folder and 5 in the inbox... It's a lot better than nothing. The real reason I run Mozilla for mail is the HTML rendering, which is better than any other mail client I'm aware of; The secondary reason is the bayesian filtering, and the tertiary is Enigmail, though no one I know bothers to use encryption anyway.

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    2. Re:Mozilla Mail by Jsprat23 · · Score: 1

      I agree about enigmail. Hardly anyone I know uses gpg/pgp, but maybe my signing my mail and having the enigmail.mozdev.org link in the comments will help encourage people.

    3. Re:Mozilla Mail by jacquilynne · · Score: 1

      I have similar results with Mozilla's filtering. When I first started using it, it filtered nothing. Then, for awhile, it filtered everything and I was constantly having to retrieve my mail from the Spam box. Now, it moves most spam and a few things that aren't. Better, but not great, as I still have to review the spam box to be sure.

      To my mind, an ideal candidate quality for spam filtering would be spelling and grammar checks. My actual friends are a remarkably literate bunch. My mailing list acquaintances are somewhat less so, but I'm just going to delete the LOLers and the anti-capitalization protestors, anyway, so if they get filtered off as spam, so be it.

  19. Fight Spam with SpamProbe by steveha · · Score: 2, Informative

    I wrote an article on how to set up SpamProbe on a server, and make it easy to train. You could also use Bogofilter or any other trainable spam filter, set up the same way.

    I get at least 100 spam messages a day now, and I only see about a half-dozen or so. SpamProbe deals with the rest, and I don't have any problems with false positives. (SpamAssassin thinks that ads for LinuxWorld Expo are spam, but as I have it trained, SpamProbe doesn't.)

    steveha

    --
    lf(1): it's like ls(1) but sorts filenames by extension, tersely
  20. Popfile by isn't+my+name · · Score: 2, Informative

    I use PopFile. What I like about it is that it easily lets me use multiple personalities in Eudora, Outlook or any other mail client. Nice web based interface and a very active development community.

    You can run it locally on Windows or Linux. But, you can also set it up on a server and then use it to filter e-mail from multiple client machines. That's what I like about it. I have a home machine in my basement office but also upstairs in the TV room. Unlike plug-ins that only work locally, I can have my reclassification decisions apply to multiple client machines.

    Right now, they do not have multiple user capabilities so that my wife and I can both use the same instance and not have our classifications interfere with each other. However, you can set up multiple instances bound to different ports. The developers list multi-user capability as a priority.

    Worth checking out along with the other choices.

    1. Re:Popfile by Inthewire · · Score: 1

      I love popfile, but my main client doesn't support POP mail, so I can't use it there. Unfortunately, a whole lot of spam hits that account. Higher in this thread was talk of making popfile IMAP-capable - let's hope it happens.

      --


      Writers imply. Readers infer.
  21. So-so article by scottme · · Score: 3, Insightful

    For an article in an "IT tech" section of a paper, this is really very weak.

    It really doesn't do much more than precis Paul Graham's arguments, then ends in a blatant plug for just one Outlook addon.

    I suppose if there are still people in the column's audience who haven't heard this all before, and it gets the message out that spam can be effectively filtered, it's a minor goodness.

    1. Re:So-so article by goss · · Score: 1

      SMH has a pretty lightweight IT section - it's more for home users and/or a very low tech knowledge audience - I believe the aussie paper that goes a bit more in depth with their IT section would be The Australian (or is it The Age? I forget)...

  22. Remote Images in spam... by dioxn · · Score: 3, Interesting

    I've noticed that the spam that has been getting through my Mozilla filter are the ones with innocuous sounding subjects and an embedded image.
    Could this be the future of spam?
    Does anyone know if any spam filters pick up on this pattern or lack of pattern (after all there are no words in the body usually.)

    1. Re:Remote Images in spam... by cybermint · · Score: 2, Insightful

      Bayesian is more or less word based, so graphical only messages fly right by my Mozilla mail filters. I believe it does the check after the html has rendered. If they ran the filter before the html was rendered, they might have slightly better results. Eventually all spammers will learn the undetectable patterns that only a handful seem to know now, and it will once again render mail filters useless. I hate HTML e-mail.

    2. Re:Remote Images in spam... by zerocool^ · · Score: 3, Informative

      Does anyone know if any spam filters pick up on this pattern or lack of pattern (after all there are no words in the body usually.)

      Um, only read emails in plain text? Use mh.
      inc; scan; show last
      By the way, those images are baaaad. Usually they're something like img src="blahblah.jpg?userid=32898392" and then, when you open it, there's a log of the image with the userid 32898392 being fetched. Therefore, they know that your email address is valid. So, it's a good idea to filter out images anyway.

      But, come on. Email is a medium for transmitting text. It's not supposed to have flowery backgrounds, blinking text, and embedded images. Maybe I'm a purist? But it's another thing that used to be beautifully simple that the explosion of advertising on the internet has rendered unusable.

      --
      sig?
    3. Re:Remote Images in spam... by rusty0101 · · Score: 2, Informative

      This is one of the reasons I have configured Evolution to not display remote images, unless I request them. The other is that pulling remote images has the functionality of verifying your e-mail address. (The server operator generates a couple million unique random numbers, creates a table of associations between e-mail names and the random numbers, and sends each e-mail address its random number as an img src=protocol://server/uniqueRandomNumber/image.php, which does a lookup on the uniqueRandomNumber and confirms your e-mail address.) The spammer sells the list of confirmed e-mail addresses, and you get more spam.

      Suggestion. If your e-mail client does not allow you to disable remote image retrieval, at the very least turn off preview panes. Better is to find a client that does allow you to disable remote image retrieval.
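
      If you want to check a message yourself, a rough Python sketch for spotting remotely fetched images in an HTML body is below. It only lists img tags whose src would trigger a network fetch; it does not try to decide whether the URL carries a tracking number. The class and function names are hypothetical, and tracker.example in the usage note is a placeholder host.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class RemoteImageFinder(HTMLParser):
    """Collect the src of every <img> that points at a remote server."""

    def __init__(self):
        super().__init__()
        self.remote_images = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":  # HTMLParser lowercases tag and attribute names
            return
        src = dict(attrs).get("src") or ""
        # http(s) URLs and protocol-relative URLs cause a fetch;
        # cid: references point at attachments inside the message itself.
        if urlparse(src).scheme in ("http", "https") or src.startswith("//"):
            self.remote_images.append(src)


def find_remote_images(html_body):
    """Return the list of remote image URLs found in an HTML message body."""
    finder = RemoteImageFinder()
    finder.feed(html_body)
    finder.close()
    return finder.remote_images
```

      Calling find_remote_images on a body containing an img pointing at http://tracker.example/blahblah.jpg?userid=32898392 returns that URL, which a client or filter could then refuse to load.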

      -Rusty

      --
      You never know...
    4. Re:Remote Images in spam... by dacarr · · Score: 1

      This isn't the *future* of spam, it's the *present* of spam. It's basically a way for spammers to track valid addresses - rather than wait for a bounce to kill one, wait for an open to validate one.

      --
      This sig no verb.
    5. Re:Remote Images in spam... by letxa2000 · · Score: 1
      I don't know how Mozilla works, but any Bayesian filter worth its salt will analyze the raw content of the mail and consider HTML as such. I.e., a message which is pretty much just an IMG will have an HTML IMG tag. That should be a Bayesian "word" (token) right there. As more spammers do that the presence of an HTML IMG tag is going to score higher and higher as spam.

      So far I haven't seen any techniques used by spammers that will successfully get around Bayesian. If they think they find something and lots of spammers start doing it then that itself will become a red flag that will be noticed by Bayesian.

  23. Mozilla by Little+Dave · · Score: 2, Informative

    Having used the spam filtering built in to Mozilla for the last six months, I can testify to its effectiveness. In very little time at all, I'd trained it to send 95% of the filth to the spam directory and avoid doing the same for 95% of good mails. For me, not having to run a "middle man" piece of software was a real boon.

    However, my life isn't totally spam free, as I find that I become neurotic about those 5% false positives that get unhelpfully moved to the spam directory, so still end up having to sift through the grot every once in a while. On the plus side, I now have a solution to my tiny cock problem, I've arranged cheaper home insurance and I have the email address of several horny co-eds who I'm assured are hungry for man juice.

    1. Re:Mozilla by Anonymous Coward · · Score: 0

      I now have a solution to my tiny cock problem

      Oh, is that why they call you "Little Dave"?

  24. Why stop at classifying spam? Why not all e-mail? by Anonymous Coward · · Score: 5, Insightful

    As I wrote only late last night, using Bayesian classification with only two categories (spam and "non-spam") is somewhat short-sighted, since if properly trained, a Bayes classifier can do a much better job than ordinary mail filtering (procmail, Mozilla or Mail.app filters, you name it).

    In fact, if I had to bet on the next "killer apps", mail sorting and RSS filtering based on Bayesian classification would be right at the top of my list, based solely on the actual time-saving benefits for users. And I can't see any reason for Bayesian filtering not being included in Mozilla Mail and Apple's own (revamped) Mail.app.

    I have to use Outlook at work, and after setting up Outclass (which requires POPfile) with several "buckets" to classify my corporate e-mail by project and field, I'm definitely not going back. Outlook, even with extensive use of Rules Wizard and categories, simply cannot cope with the diverse kinds of project-related e-mail I swap with colleagues, and Outclass is the only thing I could find that could deal with Exchange, PST folders and multiple Bayesian "bucket" categories.
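As a rough illustration of what classifying into several "buckets" rather than just spam and non-spam involves, here is a minimal multi-class naive Bayes sketch. The class name, bucket names, and training messages are all hypothetical, and nothing here reflects how POPfile or Outclass are actually implemented.

```python
import math
from collections import Counter, defaultdict


class BucketClassifier:
    """Naive Bayes over any number of buckets, with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # bucket -> word -> count
        self.message_counts = Counter()          # bucket -> messages trained
        self.vocabulary = set()

    def train(self, bucket, text):
        words = text.lower().split()
        self.word_counts[bucket].update(words)
        self.message_counts[bucket] += 1
        self.vocabulary.update(words)

    def classify(self, text):
        words = text.lower().split()
        total_messages = sum(self.message_counts.values())
        best_bucket, best_log_prob = None, -math.inf
        for bucket, n_messages in self.message_counts.items():
            counts = self.word_counts[bucket]
            denom = sum(counts.values()) + len(self.vocabulary)
            # Start from the bucket's prior, then add each word's evidence.
            log_prob = math.log(n_messages / total_messages)
            for word in words:
                log_prob += math.log((counts[word] + 1) / denom)
            if log_prob > best_log_prob:
                best_bucket, best_log_prob = bucket, log_prob
        return best_bucket


clf = BucketClassifier()
clf.train("project-a", "firmware build failed on the microcontroller board")
clf.train("project-b", "invoice for the quarterly budget review")
clf.train("spam", "cheap pills click here now")
print(clf.classify("new firmware for the board"))
print(clf.classify("cheap pills"))
```

The prior term is where very unequal bucket sizes show up: a bucket trained on one message a week starts every comparison at a disadvantage against one trained on thirty.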

    Come on, do the right thing and tell Apple and The Mozilla Project that you want configurable Bayesian filtering on their mail clients.

  25. This is apparently decent by Anonymous Coward · · Score: 0

    Some of the departments at work have switched over to this:

    http://www.canit.ca/

    Haven't tried it myself, but it is apparently fairly competent.

  26. Liberal Arts Economy by Anonymous Coward · · Score: 0

    Another liberal arts economy around the corner.

    Heads up on another stock market buzz. Predictable daily fluctuations plus buying low and selling high for those that wish to capitalize on this (keep lawyers handy for insider trading abuses). Never keep too much money in the market at once, and have a good process of converting it into hard currency that is not prone to valuation swings.

    I guess the last two fraudulent pushes for the most part went unpunished; we'll likely see more until we (nationally) hit a certain desaturation point.

    Never keep more than 20,000 in at a time.

    pawl -- Reverse engineering commercials since 1987.

    1. Re:Liberal Arts Economy by Anonymous Coward · · Score: 0

      as hype is pushed, the market over steps with little downward adjustment until it becomes a necessity. Daily and hourly adjustments can become quite predictable.

      When psychology and group think is incorporated into the mix (day traders, stock market floors), barring a large disaster things can get like a runaway train. The result is the pattern.

      On another note, why don't genetically modified materials fall under the foreign species list?

    2. Re:Liberal Arts Economy by Anonymous Coward · · Score: 0

      Has anybody noticed the pharmaceutical infomercials playing across the news programs lately? How about plastic surgery? Should these reporters be considered tainted?
      Notice who they're targeting?

      The methodology is an interesting study.

    3. Re:Liberal Arts Economy by Anonymous Coward · · Score: 0

      The liberal momma knows best economy.

      It's not liberal and but it is a commodity in the human body and tongue and cheek disgrace for being born.

  27. I hate spam too, but... by Daimaou · · Score: 3, Funny

    I hate spam just as much as the next person, but I must admit, without it I wouldn't be the horse-sized love stud that I am. Thanks spam.

    1. Re:I hate spam too, but... by Lobo_Louie · · Score: 1

      How come 20 "Mortgage Enlargement" spams get through spamassassin? :^)

  28. Dirty Spammer Tricks by dprice · · Score: 2, Informative

    I have been using the Mozilla junk mail filter for a couple of months now. One pop mail account is one that I started using in 1996. It is a spam magnet. In the time I have been using Mozilla, it has accumulated over 12,000 spam messages. That should be plenty of training for the Bayesian filter.

    Mozilla's filter does a reasonably good job at catching spam, but I still get a handful of messages every day that slip through the filter. The ones that slip through seem to be messages that have intentionally munged the spammy words with spaces, numbers, and misspellings. The spammers know that people are filtering, and they are successfully getting through the filter with their dirty tricks. Another trick spammers use is to send a message with nothing but a graphic ad. The filter doesn't have enough words to judge the spam, so the message slips through.

    I also had some 'ham' messages get filtered, so I still have the annoyance of having to check the 'junk' folder periodically for wanted messages. The filtering makes life easier, but it is still not an ideal solution to the spam problem.

    1. Re:Dirty Spammer Tricks by dioxn · · Score: 1

      I agree about the problems with messages with only images in them, but munging "the spammy words with spaces, numbers, and misspellings" is more of an argument against traditional keyword-based filters. Bayesian filters, as I understand them, should become more effective in this case after the first spam.

      For instance, if a spammer writes to you and uses the word pen1s instead of penis, you mark that message as spam once and the filter learns the munged token. You may get legitimate email that has the word penis in it, but you will almost never get legitimate email containing pen1s, so the munged spelling ends up as a stronger spam indicator than the real word. Thus the next time a spammer uses pen1s, it is simple for the Bayesian filter to catch it.
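A small sketch of why a munged spelling can backfire on the spammer: per-token spam probabilities computed from simple counts. The class name, the add-one smoothing, the equal priors, and the training messages (using a munged "m0rtgage") are all assumptions for illustration, not the formula of any particular filter.

```python
from collections import Counter


class TokenStats:
    """Track in how many spam and ham messages each token has appeared."""

    def __init__(self):
        self.spam = Counter()
        self.ham = Counter()
        self.n_spam = 0
        self.n_ham = 0

    def train(self, text, is_spam):
        tokens = set(text.lower().split())  # count each token once per message
        if is_spam:
            self.spam.update(tokens)
            self.n_spam += 1
        else:
            self.ham.update(tokens)
            self.n_ham += 1

    def spam_probability(self, token):
        # Add-one smoothing so unseen tokens do not produce 0 or 1.
        p_spam = (self.spam[token] + 1) / (self.n_spam + 2)
        p_ham = (self.ham[token] + 1) / (self.n_ham + 2)
        return p_spam / (p_spam + p_ham)


stats = TokenStats()
stats.train("best m0rtgage rates apply now", is_spam=True)
stats.train("the mortgage paperwork is ready", is_spam=False)
stats.train("lunch on friday", is_spam=False)

print(stats.spam_probability("m0rtgage"))  # seen only in spam
print(stats.spam_probability("mortgage"))  # seen only in ham
```

After a single training message the munged token already leans toward spam, while the correctly spelled word, which also turns up in real mail, does not.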

    2. Re:Dirty Spammer Tricks by donutz · · Score: 1

      The ones that slip through seem to be messages that have intentionally munged the spammy words with spaces, numbers, and misspellings. The spammers know that people are filtering, and they are successfully getting through the filter with their dirty tricks.

      Well this is really self-defeating on their part. Sure, now they are getting their spam past your filters, but are you going to remortgage your house with a company that promises you "The best m0rtgag3 rates in the universe! Apply now for these incredible in terest rates!"

      (Hint: The answer is no.)

      If they have to stoop to such unprofessional crap, then they're going to find out they're not even making the one or two sales that a grammatically correct and professional-looking spam might have.

    3. Re:Dirty Spammer Tricks by Anonymous Coward · · Score: 0
      SPAM is, quite simply, an arms race.

      Trust me, a year or so from now, bayesian filtering will no longer be effective. Spammers will work around it. I have already seen the effectiveness of Popfile drop from 99% to 95% in the last 3 months. Now spammers are including several paragraphs of unrelated (ie, un-spammy) text at the end of their message, or -- even more clever -- in a undisplayed MIME multipart section.

      So now Popfile will have to have a MIME decoder?

      And then they'll send their SPAM in GIFs.

      So then Popfile will have to use some kind of text-to-graphic weighting factor (note: no longer pure Bayesian/Naive filtering...)

      And then they'll start attaching a megabyte of unrelated text to the SPAM.

      Arms race.

      And then people will abandon Bayesian filtering, similar to SpamAssassin. SA was great in the early days. Nowadays, for anyone who receives a good deal of spam, it is 50-50 accurate. Why bother.

      And note that the countermeasures will have increased the size of the average spam from 3-5K to a megabyte plus. Great for bandwidth.

      I hate spam.

    4. Re:Dirty Spammer Tricks by letxa2000 · · Score: 2, Insightful
      Trust me, a year or so from now, bayesian filtering will no longer be effective.

      No, I don't think I'll trust you on that... :)

      I have already seen the effectiveness of Popfile drop from 99% to 95% in the last 3 months.

      That's very strange, but based on what you said below it seems that that's due to a limitation of Popfile as opposed to Bayesian itself. I've seen my Bayesian effectiveness INCREASE in the last 3 months.

      Now spammers are including several paragraphs of unrelated (ie, un-spammy) text at the end of their message

      There is a common misconception--both among spammers and anti-spammers--that doing the above will get your messages through. In some rare cases it might, but you have to remember that a good Bayesian filter is only going to pay attention to the most spammy and least spammy words. Just entering a useless, non-spammy paragraph is not enough. Unless that non-spammy paragraph happens to contain quite a few words that are downright NON-SPAM in my corpus, all that verbiage isn't going to do squat to lower the overall spam score of the message.

      Basically, you need to know that my email typically talks about microcontrollers, I have a friend named Nathan, or my mom is named Angie. Just flooding me with words that don't appear in spam will do nothing unless you flood me with words that are extremely non-spammy in my particular corpus. And it's unlikely some random paragraph will manage to do that.
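The point about padding can be sketched with a combining rule in the style of Paul Graham's "A Plan for Spam", where only the tokens whose probabilities lie furthest from 0.5 are used. The function name and the example probabilities below are invented for illustration; real filters differ in the details.

```python
import math


def combined_spam_score(token_probs, max_tokens=15):
    """Combine per-token spam probabilities, using only the most extreme ones."""
    # Clamp so a single token can never force the result to exactly 0 or 1.
    clamped = [min(0.99, max(0.01, p)) for p in token_probs]
    # "Interesting" tokens are those furthest from the neutral value 0.5.
    interesting = sorted(clamped, key=lambda p: abs(p - 0.5), reverse=True)
    interesting = interesting[:max_tokens]
    spam = math.prod(interesting)
    ham = math.prod(1.0 - p for p in interesting)
    return spam / (spam + ham)


spammy = [0.99] * 5            # a few strongly spammy tokens
bland_padding = [0.4] * 200    # generic text that is only mildly innocent
personal_words = [0.02] * 10   # words that are strongly ham in one user's corpus

print(combined_spam_score(spammy + bland_padding))
print(combined_spam_score(spammy + personal_words))
```

Two hundred bland words barely move the score because only the fifteen most extreme tokens are counted and the bland ones carry little weight; a handful of words that are strongly ham for this particular user pull it down sharply, which is the information a spammer normally does not have.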

      So now Popfile will have to have a MIME decoder?

      You mean it doesn't now? This is what causes me to think that this is a limitation of Popfile more than a limitation of Bayesian and, perhaps, is why my Bayesian effectiveness is climbing and yours is falling.

      And then they'll send their SPAM in GIFs.

      At which point the fact that a message contains just an IMG is going to receive a high spam score. No-one says Bayesian can just score words. You can create a token that means "Message only contains an IMG" or something like that. Bayesian doesn't mean we're done developing--it just means that the logical work is done. Now all we need to do is keep our eyes out for new "characteristics" of spam that can be detected and considered to be a "token."

      So then Popfile will have to use some kind of text-to-graphic weighting factor (note: no longer pure Bayesian/Naive filtering...)

      Very doubtful for the reason mentioned above. You look at characteristics of the mail. And if you find that the message is basically just an IMG, that's a major strike against it. I severely doubt that you have to OCR the image unless your real email is also sent to you as images instead of text.

      And then they'll start attaching a megabyte of unrelated text to the SPAM.

      Again, just adding "innocent" text is not enough to get past Bayesian. You have to have the RIGHT innocent text, and that's different for each person. And, again, if they start adding megabytes of useless text you add a characteristic for "Text of message is over 100k". Suddenly Bayesian will realize that 99% of those messages are spam...
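One way to picture characteristics as tokens: a sketch that emits synthetic tokens for "message is basically just an IMG" and "text of message is over 100k", which could then be trained and scored alongside ordinary words. The token names, the five-word cutoff, and the crude tag stripping are assumptions made for illustration.

```python
import re


def characteristic_tokens(html_body, size_limit=100_000):
    """Return synthetic tokens describing structural traits of a message."""
    tokens = []
    img_count = len(re.findall(r"<img\b", html_body, flags=re.IGNORECASE))
    # Crude tag stripping; good enough to estimate how much visible text exists.
    visible_words = re.sub(r"<[^>]*>", " ", html_body).split()
    if img_count and len(visible_words) < 5:
        tokens.append("*image-only*")
    if len(html_body) > size_limit:
        tokens.append("*oversized*")
    return tokens


print(characteristic_tokens('<html><img src="ad.gif"></html>'))
print(characteristic_tokens("hello " * 30000))
print(characteristic_tokens("<p>see you at lunch on friday ok</p>"))
```

The filter does not need to be told these traits are bad. If nearly every message carrying "*image-only*" gets marked as spam, the token earns its high score through training like any other word.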

      And note that the countermeasures will have increased the size of the average spam from 3-5K to a megabyte plus. Great for bandwidth.

      I doubt that will be the case for the reasons mentioned above. Spammers are adding useless paragraphs now because they don't understand Bayesian.

      Again, you just need to remember that 1) Bayesian isn't fooled just by adding paragraphs or megabytes of meaningless text. 2) Bayesian doesn't mean we never have to think about spam. It just means the hard work of deciding whether or not a message is spam is done. Now all we need to do is keep our eyes open for new "identifying characteristics" that often appear in spam. The rest falls into place automatically.

  29. great spam filter software by mozkill · · Score: 1

    personally, I use K9 and it filters out nearly 98% of my junk mail though i need to baby sit it a bit while it learns who my friendly emails are...

    best of all, it's free! and free of spyware too.

    http://keir.net/k9.html

    --

    -- Betting on the survival of the media industry is a serious risk. I advise investing elsewhere.
    1. Re:great spam filter software by e4liberty · · Score: 1

      I also like K9. It is small and fast on Win2k. It has APOP support (which was a pain to add to POPFile). Here are my results for the first 2000 and next 700 messages using K9...

                                                          First batch  Next batch
      Total number of emails processed                          2,004         724
      Number of Good emails processed                           1,016         453
      Number of Spam emails processed                             988         271
      Number of emails re-classified to Good                       36           0
      Number of emails re-classified to Spam                       41           7
      Percentage misidentified as Spam (false positives)         1.8%        0.0%
      Percentage misidentified as Good (false negatives)         2.0%        1.0%
      Overall accuracy                                          96.2%       99.0%
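The percentages in these results appear to be the re-classification counts divided by the total number of messages processed; that is an inference from the figures, not something K9 documents here. A short check under that assumption (the function name is made up):

```python
def rates(total, reclassified_to_good, reclassified_to_spam):
    """Return (false positive rate, false negative rate, accuracy) as fractions."""
    false_pos = reclassified_to_good / total  # good mail that was filed as spam
    false_neg = reclassified_to_spam / total  # spam that was filed as good
    return false_pos, false_neg, 1 - false_pos - false_neg


first = rates(2004, 36, 41)
second = rates(724, 0, 7)

for label, (fp, fn, acc) in (("first batch", first), ("next batch", second)):
    print(f"{label}: {fp:.1%} false positives, {fn:.1%} false negatives, {acc:.1%} accurate")
```

Under that reading the reported numbers are internally consistent, and the second column shows the filter improving once it has been trained on the first two thousand messages.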
  30. Two things by blakestah · · Score: 1

    Our problem is that 99% of people read email via POP, and POP only serves one mailbox per person. It is extraordinarily difficult to train everyone to use a spam filter individually, and yet installing one on the server can't work with POP's limitations. Frustrating.

    Secondly, Microsoft is in the fray now. Bet any amount they will offer an authenticating email service that requires using Windows XP to work. It will work really well, and you won't be able to communicate well with people who don't use it - standard tactics. They want to give people incentive to upgrade, and to stay locked in.

  31. The spam I do see by steveha · · Score: 4, Interesting
    I'm using SpamProbe, and it blocks almost all spam I get.

    Much of the spam that gets past it is so minimalist it cannot be blocked by a Bayesian filter. I get messages like this:


    Subject: a nice lady wants to talk to you

    see the pictures

    no more mail


    It's like someone is trying to put so little in the message that there is nothing to filter. If only they would use the stock "We are sending you this because you opted-in on it. Click on this link to remove your address." If they used that, I'd never see the message; SpamProbe would grab it. But how could I train SpamProbe to detect the minimalist ones, without blocking everything forever?

    So far I don't get too many of the minimalist ones, and I just hit delete. If it becomes widespread, I'll have to start using Vipul's Razor or something.

    The other kinds of spam that get past SpamProbe are the ones that have rampant misspellings. Since none of the words are in the database, they don't match as spam terms:


    Subject: make moneey on EBAYxbbid

    Want to make moneyzseqw? Click here...


    I really think that I should write a filter that spell-checks an email, and rejects it if over 50% of the words with 5 or more letters are misspelled.
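A minimal sketch of that spell-check rule: flag a message when more than half of its words of five or more letters are missing from a dictionary. The function name and the tiny word set are stand-ins; a real filter would need a full word list, probably in several languages.

```python
import re


def mostly_misspelled(text, dictionary, min_len=5, threshold=0.5):
    """True if over `threshold` of the long words are not in `dictionary`."""
    words = [w.lower() for w in re.findall(r"[A-Za-z]+", text) if len(w) >= min_len]
    if not words:
        return False  # nothing long enough to judge
    misspelled = sum(1 for w in words if w not in dictionary)
    return misspelled / len(words) > threshold


# Toy dictionary for the demonstration only.
dictionary = {"money", "click", "please", "review", "attached",
              "schedule", "before", "friday"}

spam = "make moneey on EBAYxbbid Want to make moneyzseqw? Click here..."
ham = "Please review the attached schedule before Friday"

print(mostly_misspelled(spam, dictionary))
print(mostly_misspelled(ham, dictionary))
```

Short words are ignored on purpose, so a message made up only of short words (like the minimalist spam quoted earlier in this comment) is never flagged by this rule alone.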

    steveha
    --
    lf(1): it's like ls(1) but sorts filenames by extension, tersely
    1. Re:The spam I do see by pe1chl · · Score: 1

      >I really think that I should write a filter that spell-checks an email, and rejects it if over 50% of the words with 5 or more letters are misspelled.

      That is a good idea. Of course it will need a large dictionary of correctly-spelled words in a couple of languages.

      Maybe it is also possible to create an algorithm that decides if a string of characters is likely to be a valid word in a western language. I.e. that separates the correct words from the hwreqwuir. That could be used like the spellcheck.

      Rob

    2. Re:The spam I do see by be-fan · · Score: 1

      I really think that I should write a filter that spell-checks an email, and rejects it if over 50% of the words with 5 or more letters are misspelled.
      >>>>>>
      That would rock. If such a filter was on Slashdot, Slashdot's post volume would drop by 90% :)

      --
      A deep unwavering belief is a sure sign you're missing something...
    3. Re:The spam I do see by Porphyro · · Score: 1
      I really think that I should write a filter that spell-checks an email, and rejects it if over 50% of the words with 5 or more letters are misspelled.
      Then none of the emails from my parents would ever get through... maybe that isn't such a bad idea!
    4. Re:The spam I do see by slagdogg · · Score: 1

      The original article (http://www.paulgraham.com/spam.html) actually talks about this ... he suggests actually visiting the URL mentioned and running the returned page against the bayes rules, also factoring in HTTP redirects, etc. ... of course, analyzing the URL itself can help too.

      --
      (Score:-1, Wrong)
    5. Re:The spam I do see by Anonymous Coward · · Score: 0

      I get those minimalist messages as well. Remember that a Bayesian filter looks at the headers in addition to the message. I use popFile and I'm at 99.1% accuracy with about 4,000 messages. Can't ask for much better than that.

    6. Re:The spam I do see by Anonymous Coward · · Score: 0

      Hey, those links didn't work. I was all geared up to look at some pictures.

    7. Re:The spam I do see by anthony_baxter · · Score: 1

      You might want to look at the background page on http://spambayes.sf.net/background.html -- we tried a bunch of these sort of weird tokenising schemes, and we found that in the end you're better off just leaving it alone. If the spammers want to communicate _something_ to you, they're going to have to use real words, or something like them. After all, getting a spam advertising "Gnargle Froogu Retki Wakka!" might get past the spam filter, but it's not a very good advertisement.

    8. Re:The spam I do see by thogard · · Score: 1

      The spam companies claim that the email will increase hit counts. As far as the suckers that paid to get their ads out are concerned, a bot hitting their page is just another user hitting their web page.

  32. Re:Mozilla Mail/Thunderbird by Amerk_5 · · Score: 1

    That's one of the main reasons why I switched to Thunderbird. After about a week of training, it only missed one spam email & I haven't had any false positives. It might also help that you can choose to exclude any mail from people that are in your address book from being filtered.

  33. Re:What you want by Anonymous Coward · · Score: 1, Informative

    You might take a look at Spam Sleuth Enterprise I suspect it has what you want, since it has trainable Bayesian (individual to each user), works with any e-mail server, has a web client interface, and a lot more that you may or may not be interested in.

  34. Re:This is totally useless. by cybermint · · Score: 1

    Bayesian filtering isn't a miracle solution, but it does save about 45 seconds a day. That adds up to quite a bit over time.

  35. Re:40-in-1 jokes by Anonymous Coward · · Score: 0

    If you're trying to be funny, well, sorry you're not really. If your point is that all these jokes are lame, well, yeah they may be, but if we didn't have these then where would all our internet cliches originate and flourish?

  36. Battle of the network Bayesian allstars by dubStylee · · Score: 3, Interesting

    Suppose

    1. I have a friend who uses the same kinds of words as I do and who uses Outlook (ok, an acquaintance, because friends don't let friends ...)

    2. An email virus attacks this person, snarfs up his Ham, runs a Bayesian filter on it and comes up with Spam specifically tailored for this person's acquaintances.

    There's a science fiction book waiting to happen in here somewhere. If so, I own the SCOpyright on it.

  37. What I don't like by Boyceterous · · Score: 5, Interesting

    about this kind of filtering is that it has to download the email content - not always a good idea, especially in a Windows environment. Besides, I can identify spam just by looking at message header information. Sender, recipient, and subject line are nearly always enough. Plus I don't need to waste time or bandwidth, get subjected to offensive graphics, or risk 1-pixel confirmations or getting hacked by the latest security issue. My homespun message header analysis program drops nearly all spam, and results in few legit email rejections. I score the headers based on missing recipient, sender info, keywords in subject, string match in sender email or name, punctuation count in subject line, number of contiguous spaces in subject line, plus a few other things that seem to run common in the spam I get. I can also permit certain email addresses to pass no matter the score. It's not fancy, but it works, and I never have to waste time drawing the whole content down to my local machine. What I do may not work for everyone, but it seems that in most cases it should, unless you get a lot of email from unknown (non-spam) sources - not typical for the average email user.
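A header-only scorer along the lines described might look roughly like this. The keyword list, weights, whitelist entry, and function name are all made up for illustration; the poster's actual program and its rules are not shown in the thread.

```python
import re
from email import message_from_string

# Hypothetical rule data; a real list would be tuned to the user's own spam.
SUBJECT_KEYWORDS = ("mortgage", "free", "viagra")
WHITELIST = {"mom@example.com"}


def header_score(raw_headers):
    """Score a message from its headers alone; higher means more spam-like."""
    msg = message_from_string(raw_headers)
    sender = (msg.get("From") or "").lower()
    subject = msg.get("Subject") or ""

    # Whitelisted senders pass no matter what else the headers look like.
    if any(addr in sender for addr in WHITELIST):
        return 0

    score = 0
    if not msg.get("To"):
        score += 2                                   # missing recipient
    if not sender:
        score += 2                                   # missing sender info
    score += sum(1 for kw in SUBJECT_KEYWORDS if kw in subject.lower())
    if len(re.findall(r"[!?$*]", subject)) >= 3:
        score += 1                                   # heavy punctuation
    if re.search(r" {3,}", subject):
        score += 1                                   # run of contiguous spaces
    return score


spam = "From: deals@example.net\nSubject: FREE mortgage!!!     xk3j\n\n"
friend = "From: Mom <mom@example.com>\nSubject: FREE!!! dinner\n\n"
normal = "From: bob@example.org\nTo: me@example.org\nSubject: Meeting notes\n\n"

for raw in (spam, friend, normal):
    print(header_score(raw))
```

Because only headers are parsed, a scorer like this can run before the body is fetched, which is the bandwidth and safety argument made above.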

  38. Someone cares by Anonymous Coward · · Score: 0

    Who cares indeed. Someone does, because there were a lot of little slashbots lining up to mod you down. Why promote good, funny posts? It takes so much less to just slap stuff down, right?

    ROTFL ... so many tools. Sieg heil Taco.

  39. SpamBayes is great by mnemonic_ · · Score: 0, Redundant

    SpamBayes is great if you're a Cloudmark Spamnet refugee like me, who left Spamnet after it went subscription. You probably have a folder full of hundreds of pieces of spam from Spamnet. SpamBayes can be trained on that folder of spam, so it can start accurately identifying spam without further weeks of training, unlike other Bayesian filtering solutions.

  40. Spam is a poor use. by Lord+Bitman · · Score: 2, Interesting

    this is like inventing something as useful as the Knife, and using it only to attack salesmen. Why bother stopping with spam? Why not apply this filter to, say, absolutely everything? Since I just said "absolutely everything", I won't bother giving examples.
    Training something to know how likely something is to be true, that sounds too useful to waste any time with on spam at all.

    --
    -- 'The' Lord and Master Bitman On High, Master Of All
    1. Re:Spam is a poor use. by slasher+guy · · Score: 1

      "this is like inventing something as useful as the Knife, and using it only to attack salesmen."

      you're right, it should be used for bad actors, spammers, and trolls as well.

  41. is it really using whole words? by AssFace · · Score: 1

    I assume they are just simplifying the idea for the masses - but the article says that if "word one" shows up 60 times in spam, but twice in ham, then another message that has "word one" show up 40 times is more likely to be spam.

    But technically I don't think that is how the actual process works - well, the same way - but they use sub sections.
    If you only look at words, then you are really shooting yourself in the foot with bayesian analysis.

    Instead it seems it would be much better to look at a block of characters at a time.
    I know from my own experience playing with Markov matrices that 3 and 4 chars work fairly well in creating English text - so I would imagine that analysis would be similar.
    I don't know how much the size of the chunk looked at matters as much as the fact that it looks at every character and as long as the chunk is smaller than the entire text message ideally.

    So you would have a hash (dictionary if you use VB) and look at the first N characters of your e-mail and then put that in the hash and increment that value by one (in Perl: $myHash{$chunk}++).
    Then you move the chunk forward one character and put that into the hash and increment it up.
    That is the learning process.
    Then the analysis process would be to move the chunk over the text and add up the hits and then use the final score as a way to determine how it should be dealt with.
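The sliding-chunk idea above can be sketched in a few lines of Python, with a Counter playing the role of the hash. The function names and the chunk size of 4 are illustrative choices, and the scoring is the raw hit count described above rather than a true Bayesian probability.

```python
from collections import Counter


def chunks(text, n=4):
    """Yield every n-character window, moving forward one character at a time."""
    return (text[i:i + n] for i in range(len(text) - n + 1))


def learn(counts, text, n=4):
    """Learning step: increment the count of every chunk in a training message."""
    counts.update(chunks(text, n))


def hits(counts, text, n=4):
    """Analysis step: add up how often each chunk of `text` was seen in training."""
    return sum(counts[c] for c in chunks(text, n))


spam_counts = Counter()
learn(spam_counts, "buy cheap pills now")

print(hits(spam_counts, "cheap pills"))
print(hits(spam_counts, "lunch friday"))
```

To turn the hit counts into something Bayesian you would keep a second Counter for ham and compare the two per chunk, much as a word-based filter does per word.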

    The way SA does it, the Bayes analysis is just one part of the whole - so if the Bayes says a message really is likely spam, but 5 other things say it is likely not, then depending on how much weight you have given the Bayes result, the message could actually be classified as ham.
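A toy version of that kind of score combination, where the Bayes verdict is just one weighted test among several. The test names, weights, and threshold below are invented for illustration and are not SpamAssassin's actual rules or defaults.

```python
# Hypothetical tests and weights; positive means spam-like, negative ham-like.
WEIGHTS = {
    "bayes_says_spam": 3.0,
    "sender_in_address_book": -4.0,
    "html_only_body": 1.0,
    "reply_to_known_thread": -2.0,
}
THRESHOLD = 5.0  # messages scoring at or above this are treated as spam


def total_score(tests_that_fired):
    """Add up the weights of every test that matched the message."""
    return sum(WEIGHTS[name] for name in tests_that_fired)


def is_spam(tests_that_fired):
    return total_score(tests_that_fired) >= THRESHOLD


# Bayes flags the message, but two other tests vouch for it.
fired = ["bayes_says_spam", "sender_in_address_book", "reply_to_known_thread"]
print(total_score(fired), is_spam(fired))
```

Raising the weight on the Bayes test is the knob that decides how often it can override the other tests.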

    Anyway, I think the whole process is neat and I love playing with Markov matrices.

    --

    There are some odd things afoot now, in the Villa Straylight.
  42. SpamBayes not Marc Hammond's work only by mpieters · · Score: 5, Informative
    SpamBayes was originally conceived by Tim Peters and co at Python Labs, who improved on the original algorithm considerably. From there on out, many people helped tune and perfect the implementation, making it the most effective Bayesian-based spam filtering tool currently available (IMNSHO).

    Mark Hammond then wrote the Outlook plugin, which, admittedly, is considerably more code than SpamBayes, but not SpamBayes itself.

    For the complete background on why SpamBayes is so good at what it does, and its history, see:

    Mark's is not the only application frontend for SpamBayes; here is a list of others:

    No apologies for this my pedantry offered.
    --
    "The truth shall make ye fret" -- The Truth, Terry Pratchett
    1. Re:SpamBayes not Marc Hammond's work only by Anonymous Coward · · Score: 0

      There's a beta of a commercial product named InBoxer coming out that gives credit to Tim Peters, Python Labs, Mark Hammond and SpamBayes on their web site. I've been using it for a couple of weeks now and it seems to be working. They also have a rudimentary FAQ on Bayesian filters -- aimed at neophytes. (www.inboxer.com/2faqbayes.html)

  43. Re:This is totally useless. by serbanp · · Score: 3, Informative
    No it's not.

    At work I have Outlook always running with the excellent bayesian FREE filter Spammunition www.upserve.com. I also do check the mailbox from home over a dial-up connection.

    If I didn't use Spammunition, I would spend a lot of time downloading spam messages; as it is right now, I get just the ham (several messages instead of many).

    Serban

  44. Mail.app by Anonymous Coward · · Score: 1, Informative

    Isn't this similar to what is used in Apple's Mail.app for sorting junk mail?
    http://www.apple.com/macosx/jaguar/mail.html

  45. Re:Why stop at classifying spam? Why not all e-mai by Anonymous Coward · · Score: 0

    A problem is that the more classifiers you add, the greater the likelihood of an incorrect classification.

    I've designed hierarchical bayes algorithms in attempts to deal with this situation but they add very little in the way of accuracy. It's much easier to filter mailing lists on a few keywords than it is to worry about more classifiers.

    Especially irritating is when your corpus sizes are very dissimilar. If you only get 1 message a week for one corpus and 30 messages on another it can cause skewing, since that 1 message is worth quite a bit more than 1 message of the other corpus.

  46. Soundex to work around intentional misspellings? by GGardner · · Score: 3, Interesting
    For the spammers who are trying to use misspellings to get around filters, I wonder if soundex could fix that problem quickly. That is, instead of doing the Bayesian calculations on the raw tokens, calculate probabilities based on the soundex values of the tokens. You might need to teach soundex that the number one sounds like I, and other leet-speak-like things, but this might solve the problem quickly and easily.
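A sketch of that idea: undo a few leet-speak substitutions, then compute a standard four-character American Soundex code so that munged and clean spellings collapse to the same token. The substitution table is a guess at common swaps, and whether soundex codes make better Bayesian tokens than raw words is untested here.

```python
# Assumed leet-speak substitutions, including "the number one sounds like I".
LEET = str.maketrans({"0": "o", "1": "i", "3": "e", "4": "a",
                      "5": "s", "7": "t", "@": "a", "$": "s"})

# Standard Soundex consonant groups; vowels, h, w and y have no code.
CODES = {}
for digit, letters in enumerate(["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1):
    for ch in letters:
        CODES[ch] = str(digit)


def soundex(word):
    """Return the 4-character Soundex code of `word` after leet normalisation."""
    letters = [c for c in word.lower().translate(LEET) if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue              # h and w do not separate repeated codes
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code               # a vowel resets prev, allowing a repeat
    return (result + "000")[:4]


for word in ("mortgage", "m0rtgag3", "Robert", "Rupert"):
    print(word, soundex(word))
```

Feeding `soundex(token)` to the filter instead of the raw token means the misspelled and correct forms share one set of counts, at the price of also merging unrelated words that happen to sound alike.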

  47. Spambully by Symbha · · Score: 1

    I've been using Spambully for Outlook for a few months now. Though it's not open source, it works quite well and has the added bonus of sending bounce messages to the spam sender if you want, as well as a challenge-response mechanism for filtering. There's a free trial. Just an fyi.

    1. Re:Spambully by Symbha · · Score: 1

      Forgot to mention that it's a bayesian filter.

  48. Re:Luke Skywalker wrote my mail filter?! by wheany · · Score: 1

    No, that would be Mark Hamill.

  49. Re: expert rules & methodology, bayes overrate by alexjc · · Score: 1

    FYI, spamd is installed by default on pair.com accounts; you can call spamc from procmail. I spent most of yesterday afternoon setting it up, along with Spam Assassin and some nifty server-side IMAP filtering... It's nothing revolutionary, but it's satisfying to have it set up ;)

    Spam Assassin has bayes, but I've been getting 99% so far without it; expert rules work amazingly well, no need for learning. Methodology is the best way to foil spam. Have at least two/three email addresses: one public address (minimizing public exposure), one email as default reply-to (except for mailing lists) and optionally one address for close friends only. You can keep a low spam threshold then without much to worry about...

    The idea of sorting the spam folder by score by injecting the rating into the subject (from this article on Reverse Spam Filtering) works wonders and it's easy to set up with procmail. If things get worse, I'll most likely be setting up temporary addresses that expire within weeks (for website contact & feedback), or a password system (with the password and an explanation posted with the contact details on my homepage)... it's almost as good as GPG/PGP for this purpose without the inconveniences for the other party.

    I actually look forward to getting spam now!! hehe

  50. SpamNet by SunPin · · Score: 2, Informative

    I use spamnet by cloudmark. It catches everything. I can't remember the last time I had to click the "block" button. I'm very conscious of where my email ends up and I'm a hardcore advocate of email aliases. As a result, since September (last major crash), spamnet has blocked 4000 pieces while I've actively blocked only 11.

    That's pretty f'n good in my book. So good, in fact, that I send all blocked messages to the "Delete" folder instead of the default "spam" folder and set outlook to permanently delete on close.

    I have two concerns about this program:

    --Money. They are now charging and pretty much deserve it from the average user.
    --Reliability. This company could disappear tomorrow and sell off the server that has compiled spam data.

    Since mathematics isn't going anywhere, I'm leaning towards switching to an open source Bayesian alternative but, as mentioned above, all my spam gets thrown out the door on contact.

    What is the approximate training time of a Bayesian filter?

    --
    Laws are for people with no friends.
    1. Re:SpamNet by E-prospero · · Score: 1

      What is the approximate training time of a Bayesian filter?

      How long is a piece of string? :-) It depends on the inference algorithm in use and the size of the training set (both in terms of the number of training exemplars, and the number of features per exemplar).

      However, as a guideline - I just trained the SpamBayes filter on my Outlook mail box; approx 400 spam messages, 1000 ham messages, and the filter was trained in about a minute. This is on a PIII-800.

      So far, I'm pretty impressed with the performance of SpamBayes - the contents of my spam folder are all rated >97% probability; the highest rating on a ham message is 11% - and that was a mass-mailed RedHat Network update alert.
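
      To show why training is quick, here is a deliberately tiny naive Bayes sketch in Python. It is not what SpamBayes actually does (its tokenizer and scoring are more sophisticated); the class name, the tokenizer regex and the add-one smoothing are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

TOKEN_RE = re.compile(r"[a-z0-9$'!-]+")

def tokenize(text):
    """Lower-case the text and return its set of distinct tokens."""
    return set(TOKEN_RE.findall(text.lower()))

class TinyBayes:
    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.totals = {"spam": 0, "ham": 0}

    def train(self, text, label):
        # Training is one counting pass, so its cost grows with
        # (number of messages) x (tokens per message).
        self.counts[label].update(tokenize(text))
        self.totals[label] += 1

    def spam_probability(self, text):
        # Sum log-odds over the tokens present, with add-one smoothing.
        log_odds = math.log((self.totals["spam"] + 1) / (self.totals["ham"] + 1))
        for tok in tokenize(text):
            p_spam = (self.counts["spam"][tok] + 1) / (self.totals["spam"] + 2)
            p_ham = (self.counts["ham"][tok] + 1) / (self.totals["ham"] + 2)
            log_odds += math.log(p_spam / p_ham)
        log_odds = max(-700.0, min(700.0, log_odds))  # avoid exp() overflow
        return 1 / (1 + math.exp(-log_odds))
```

      Usage would be along the lines of calling train(text, "spam") or train(text, "ham") once per message and then spam_probability(text) on new mail.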

      Russ %-)

      --
      ... and never, ever play leapfrog with a unicorn.
    2. Re:SpamNet by SunPin · · Score: 1

      Good. I have the same processor. I just adjusted my outlook settings to move collected spam to the default "spam" folder so I can start building a sample. Will try the Bayesian filter within a few days. Thanks.

      --
      Laws are for people with no friends.
  51. Being used for spam, not invented for spam by michaelggreer · · Score: 1

    They didn't have email back in the 18th century when Thomas Bayes came up with this statistical method. It is simply being applied to spam, but has been used for other more "useful" purposes as well.

    Actually, I think spam is a major problem and not a trivial application of statistics.

  52. Re: expert rules & methodology, bayes overrate by AssFace · · Score: 1

    Well hot damn, didn't know it was already on the server. I looked all over the place on the server and didn't find it at all and when I wrote to them to get the okay to run the Perl script, they never mentioned it (I wasn't sure how psyched they would be if it was cpu intensive. But since they kill anything that runs over 30 seconds it doesn't matter too much).

    I have a bunch of e-mail addresses and they forward around depending on where I am (less so now that I don't have a cell phone - I had filters that would forward messages to my phone as text messages if they were from certain people during certain times).

    I'll have to look into the spamd - I wonder how long that has been there - I swear it wasn't when I started.
    Thanks!

    --

    There are some odd things afoot now, in the Villa Straylight.
  53. Great, but my problem is a bit more complicated .. by slagdogg · · Score: 3, Interesting

    Bayes rocks, been using it with spamassassin and it kills 99% of my spam. The problem is when some asshole spammer uses my email address in the 'From' header of his spam ... then I get scores of 'user not found' or 'virus detected' emails from legitimate mail servers ... it's not spam, but it's just as annoying. How do you guys deal with this problem?

    --
    (Score:-1, Wrong)
  54. The math by bpfinn · · Score: 2, Informative

    I think Tom Mitchell did a good job in explaining the math in his book Machine Learning. It's a very pricy book, so maybe you can look for a used copy.

  55. Any Perl programmers out there? by aquarian · · Score: 1

    is a scalable popfile for larger organizations. If I could get popfile (with its super-easy-to-train/use-web-interface) that would run on my linux server, scan my IMAP mail server (well, incoming mail would actually work fine, too. I've heard they have a smtp plugin for it in cvs), and then have a popfile config page for each person, or mayby tie it into the imap/smtp server's login. THAT would rock.

    Actually, I would love to have the same thing. Popfile is all Perl and open-source, so it could probably have its "guts" adapted for this use.

    I've been using Popfile on my laptop to filter several POP accounts. It filters around 1000 messages a day, with about 99.4% accuracy. And the misses are usually a spam message or two that gets through. I never miss a legit message. I want to start reading my mail from an IMAP server, because when travelling I don't always have a broadband connection to download all this crap and filter it locally. Wouldn't it be great to be able to transfer my already-trained filters to this new IMAP machine?

  56. "if you need to explain to non-technophiles" by bcronin · · Score: 0, Redundant

    Because every technophile here knows exactly how Bayesian filtering works, right?

  57. Re: expert rules & methodology, bayes overrate by AssFace · · Score: 1

    I just checked the server and found spamassassin version 2.43 on there.
    There is no Bayes in SA until versions 2.5 and after.

    I currently run 2.60 (they don't seem to have updated it in a long while now) and am too pleased with that to go back to an older version.

    I will write them again and see if they are willing to upgrade the one on my server - the worst they can say is no.
    Well, I suppose the worst they could say was that they are cancelling my account :)

    --

    There are some odd things afoot now, in the Villa Straylight.
  58. Why has Microsoft not done this already? by aquarian · · Score: 1

    This is the big question. Bayesian filtering has been in use for a couple of years now, and is well-proven, IMO. What's wrong with Microsoft? Why are they dragging their feet on this? They should have been shipping this with OE a couple of years ago, if not before. Not only would this have given the average user some relief, it would have slowed the recent explosion in spam itself. And it would have been so easy to do. Fuck Microsoft, one more time.

    1. Re:Why has Microsoft not done this already? by dacarr · · Score: 1

      Microsoft hasn't done this for their own reasons, but consider that they have consistently dragged their feet from day one regarding system security on Win32, Lookout, etc., and are only recently talking about end user system security integrations.

      --
      This sig no verb.
  59. The term "Ham" by kisrael · · Score: 1

    When the hell did the term "Ham" start getting used? I missed it completely 'til this article.

    Not that it matters all that much, but Hormel, who has taken use of "Spam" in pretty good graces, can't be happy about this at all. It's one thing when your product is linked to a negatively-perceived other concept, but then the further implication that Ham=good, Spam=bad... hrrm.

    --
    SO YOU'RE GOING TO DIE: The Comic for Dealing with Death
    1. Re:The term "Ham" by Mooncaller · · Score: 1

      Hormel could respond by direct marketing via email. Anyways I think it's more like Spam == fake, Ham == real.

  60. Re:Eudora users... (and other options...) by anti$pam · · Score: 1
    I recently started using SAProxy under Windows (I'm stuck with windows at work :-(), I've found that it does a decent job filtering out the spam.

    However I've really gotten hooked on Bloomba -- it has good SAProxy integration. Very simple to setup compared to some of the other antispam solutions.

  61. Re:YOU MUST BE FROM FRANCE... by Anonymous Coward · · Score: 0
    And I thought it was SCOTUS that just made sodomy compulsory in Texas...

  62. Bayesian filtering by JJahn · · Score: 1
    I was at first leery of trying a Bayesian filter. I thought to myself, it's so simple, it won't work well enough. So I installed PopFile (runs as a POP3 proxy on Windows, extremely easy to use/set up) and was amazed. After a bit of training I now have about 97% accuracy, sometimes even higher.

    I also used SpamAssassin for a while, but it never seemed to do quite as good a job. It let a lot of junk get through until I set it lower, then I got some false positives.

  63. Re: Too resource hungry by Anonymous Coward · · Score: 1, Interesting

    Have you tried running a Bayesian filter on many messages at once? The Mozilla implementation hangs the mail app for a few seconds on a 1.4GHz Athlon when going through a hundred or so messages. Assuming Slashcode would implement it through Perl, it would be even slower. For reference, running Spamassassin with Bayes filtering (Perl scripts, not spamc) isn't exactly speedy. Going through several messages brings CPU usage close to maximum.

    Bayesian filtering on comments would be too resource costly. A more plausible application would be to run stories through a Bayes-style filter, creating a profile for each story that checks each new story with previous profiles so that dupes could be reduced. But that would not be as good as having the editor looking at the current front page (as SCO stories would look similar).

  64. Useful ??? by Mooncaller · · Score: 1

    Now why would I care about an Outlook drop-in? Besides, the only sentence I could come up with that uses both "Outlook" and "useful" is, "Reformatting the hard drive is a useful way to remove Outlook and other viri, that also eliminates the MS detritus that common viri feed on."

  65. Spammunition by Anonymous Coward · · Score: 1, Informative

    Spammunition is a great Outlook plugin that does this.

    Come on!! Give some credit!

  66. An advantage of these apps over Popfile... by aquarian · · Score: 1

    ...is that they don't tie up port 8080, which you may need while doing web development locally. This isn't a huge problem (the defaults can be changed), but I wind up having to shut down Popfile when playing with Zope, for example.

    1. Re:An advantage of these apps over Popfile... by Thuktun · · Score: 1

      ...is that they don't tie up port 8080, which you may need while doing web development locally. This isn't a huge problem (the defaults can be changed), but I wind up having to shut down Popfile when playing with Zope, for example.

      ISTR you could configure Popfile to use whatever port you want. There's nothing magical about port 8080.

  67. Re: OE by Anonymous Coward · · Score: 0

    The filter only works with Outlook Express. So, no need to worry about home users NOT switching to Mozilla. But, the corporate users seem to be doomed forever with the plugin for their Outlook.
    Ok, now you can laugh.

  68. Outlook - turn off HTML mail by siamSam · · Score: 2, Informative

    Turn off html mail for Outlook and help keep them from validating your address through this method.

    Place these two keys in .reg files of their own and be able to quickly switch between viewing html and plain text mail. taah dahhh!

    [HKEY_CURRENT_USER\Software\Microsoft\Office\10.0\Outlook\Options\Mail]
    "ReadAsPlain"=dword:00000001

    OR to turn it back on and view those pretty pictures

    [HKEY_CURRENT_USER\Software\Microsoft\Office\10.0\Outlook\Options\Mail]
    "ReadAsPlain"=dword:00000000

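    If you would rather generate the two .reg files than paste them by hand, something like this Python sketch should do it. The file names are made up, and the "Windows Registry Editor Version 5.00" first line is the standard .reg header that the snippets above leave out.

```python
from pathlib import Path

# Registry key for Outlook 2002/XP (Office 10.0) mail options.
KEY = r"HKEY_CURRENT_USER\Software\Microsoft\Office\10.0\Outlook\Options\Mail"

def reg_file_text(read_as_plain: bool) -> str:
    """Build the text of a .reg file that sets ReadAsPlain to 1 or 0."""
    value = 1 if read_as_plain else 0
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{KEY}]",
        f'"ReadAsPlain"=dword:{value:08x}',
        "",
    ]
    return "\r\n".join(lines)

def write_reg_files(directory: str = ".") -> None:
    """Write one file per setting; the file names are hypothetical."""
    for name, flag in (("plain_text_on.reg", True), ("plain_text_off.reg", False)):
        # regedit exports version 5.00 files as UTF-16, so match that here.
        path = Path(directory) / name
        with open(path, "w", encoding="utf-16", newline="") as handle:
            handle.write(reg_file_text(flag))
```

    Double-clicking either file then merges it into the registry; Outlook may need a restart before it notices the change.
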
  69. Well... What would REALLY interest me is... by crazyphilman · · Score: 3, Funny

    A Bayesian filter that reads personal ads, compares them to ads posted by women who are KNOWN to have been "easy" (on a sliding scale, configurable, ranging from "mildly slutty" to "dangerously psychotic nymphomaniac"), and returns a list of likely phone numbers.

    Hell, I'd pay MONEY for a piece of software THAT good (Hmm, clickety-click, select "nymphomanic", enter search site... Ah! This one has an oral fixation! Thank you, Mr. Bayes!).

    --
    Farewell! It's been a fine buncha years!
  70. You make a good point by Anonymous Coward · · Score: 0

    You make a good point, but I still think i need a larger pen1s, n0 matter what you say about those spamers and their dirty market ing techniques.

  71. Admit it, Slashdot. You love spam. by jeduthun · · Score: 2, Interesting

    You guys are a bunch of hypocrites. You don't really want spam to stop. You love spam.

    Every spam thread is the same: I use X, and it blocks 98% of my spam, with no false positives! I use Y, and it blocks 99.9% -- take that! Here, I use Z + Y with these custom Perl scripts I wrote that interface with procmail and stop 101% percent of spam! It doesn't matter, because I never get ANY spam! Spam is only because people buy things in spam! What morons! Bow before me, for I am 1337!

    Spam gives you something to fight. Spam gives you an excuse to solve an interesting technical problem (i.e. separating spam from ham). Spam gives you a reason to boast. Spam gives you people to dislike.

    Admit it.

    You love spam.

  72. Check out Cloudmark's SpamNet by Hollinger · · Score: 1

    It's semi-distributed, in that users install a small plugin for Outlook, adding "block" and "unblock" buttons to the tool bar. The entire community of users works against spammers.

    It works well. When I check my mail, I can watch the 50 or so spams I get daily pop into my inbox, and then promptly fly right back out again.

    (Blatantly stolen from Spamnet's Learn More page)
    When the message comes in, SpamNet generates a unique fingerprint of that message. The fingerprint is a one-way hash, or unique string of numbers that represents the email and can absolutely NOT be decoded.

    This unique fingerprint of the message is sent to the server where it asks the database if this message is spam. The server comes back to the client with a confidence level of how sure it is that the message is spam by checking it with the other fingerprints in the database.

    If the same signature has been reported to the SpamNet database, this indicates that the message is spam and it is consequently moved from the member's Inbox to the Spam folder.

    If a spam slips through, the SpamFighter can use the "Block" button (vs. delete) to remove it from their inbox and report it to SpamNet to help themselves and the community. Again a unique fingerprint of the body of the message is generated and sent back to the server. Here is where TeS, or the trust system, comes into play to ensure that only valid spam messages are blocked. SpamNet looks at the reputation of the person that blocked the message and, depending on their individual trust rating, a confidence level is applied to that message to decide whether it should be blocked for the entire community. Each person starts with a zero trust rating and generates trust based on several factors including how accurate their reports are and the number of reports over time. This process happens instantly, taking less than 3 minutes to stop a spam message that's new to the system for the entire community.
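
    My own rough sketch of how that fingerprint-plus-trust scheme could work, in Python. This is NOT Cloudmark's algorithm: SHA-256 over a whitespace-normalized body stands in for their fingerprint, the confidence is just the summed trust of the reporters, and the threshold and all names are hypothetical.

```python
import hashlib
import re
from collections import defaultdict

def fingerprint(body: str) -> str:
    """One-way hash of the normalized body (a stand-in for the real scheme)."""
    normalized = re.sub(r"\s+", " ", body).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

class ReportDatabase:
    def __init__(self, block_threshold: float = 1.0):
        self.trust = defaultdict(float)   # reporter -> trust, starts at zero
        self.reports = defaultdict(set)   # fingerprint -> set of reporters
        self.block_threshold = block_threshold

    def report(self, reporter: str, body: str) -> None:
        """Record that this user pressed "Block" on this message."""
        self.reports[fingerprint(body)].add(reporter)

    def confidence(self, body: str) -> float:
        """Sum the trust of everyone who reported this fingerprint."""
        reporters = self.reports.get(fingerprint(body), ())
        return sum(self.trust[name] for name in reporters)

    def is_spam(self, body: str) -> bool:
        return self.confidence(body) >= self.block_threshold
```

    Only the hash ever leaves the client, and a brand-new user (trust zero) cannot get a message blocked for everyone on their own. How trust is earned is left out of the sketch.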

    1. Re:Check out Cloudmark's SpamNet by Anonymous Coward · · Score: 0

      Sure, this reduces the human work load, but a statistical filter eliminates the human workload almost completely.

      I suggest that you still try a Bayesian filter. These things are miraculously good, although some are even better. The ones that count single words easily can 97% of all spam, and the ones that count word pairs as well easily can over 99% of spam.

      SpamProbe is an example of the word pair counting variety. It runs on Linux and works fine when installed on a server to classify all mail for a whole corporation.
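
      The single-word versus word-pair difference is only a matter of which tokens you count. A minimal Python sketch (my own tokenizer for illustration, not SpamProbe's):

```python
import re

def words(text):
    """Split lower-cased text into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def tokens(text, use_pairs=True):
    """Return single words, plus adjacent word pairs when use_pairs is set."""
    singles = words(text)
    result = list(singles)
    if use_pairs:
        # Pairs capture phrases like "click here" that single words miss.
        result += [f"{a} {b}" for a, b in zip(singles, singles[1:])]
    return result
```

      Either token list can then be fed to the same counting and scoring code; the pairs just give the filter more specific evidence to work with.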

  73. Problem with this kind of filtering is.. by msimm · · Score: 1

    I don't want to look at the spam, ever. I want it to go to /dev/null before I even download my messages.

    This is at best a band-aid and with the usual mistakes and slip-ups it hardly seems like a very good one. I mean if I have to sort through my junk box to check for mislabeled emails it's not doing me so much of a favor.

    All this talk about smart filtering and I'm starting to feel like you've missed the point, you're still getting spam. Who cares if it's semi-sorted.

    --
    Quack, quack.
  74. what works well with Ximian Evolution? by linuxlover · · Score: 1

    I have Evo picking up my mail from server. Right now I have some simple filters, that catch the little bit of spam I get.

    Is there an 'integrated' solution that works within Evolution. I heard some time back they were going to do one.

    They have a command line filter. Maybe that can be utilized?

    any experiences?

    thanks
    LinuxLover

    1. Re:what works well with Ximian Evolution? by Bryan_W · · Score: 1

      popfile works with every mail client with an easy to use web interface.

  75. Re:Why stop at classifying spam? Why not all e-mai by JohnGrahamCumming · · Score: 1

    Do you have some evidence that points to this? I've been tracking statistics from actual POPFile users for some time and publishing them on the POPFile home page and there is only a *very* slight decrease in accuracy as the number of classification buckets increases. John.

  76. One of the most useful mail client features by lucifuge31337 · · Score: 1

    That exists in various forms in various packages (and that I want to see in Outlook, because that's what I use...bash it as you will...but it works best for me for various reasons) is:
    - Only load images from HTML mail from addresses in your personal address book
    and
    - Whitelist/classify based on users in your address book.

    Given those two additional features and my Spambayes setup, I'd be very happy.
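
    Roughly what I have in mind, as a Python sketch. The function names are invented, and the address book is assumed to be a set of lower-case addresses. Keep in mind the From header is trivially forged, so this is a convenience, not a security measure.

```python
from email import message_from_string
from email.utils import parseaddr

def sender_address(raw_message: str) -> str:
    """Extract the bare, lower-cased address from the From header."""
    msg = message_from_string(raw_message)
    return parseaddr(msg.get("From", ""))[1].lower()

def handling_for(raw_message: str, address_book: set) -> dict:
    """Decide whitelisting and image loading from address-book membership."""
    known = sender_address(raw_message) in address_book
    return {"whitelisted": known, "load_images": known}
```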

    --
    Do not fold, spindle or mutilate.
  77. Re:FP w00t by Anonymous Coward · · Score: 0

    I SAID that this message WAS NOT offtopic, it was a reply to parent, who used the SAME topic. That AC had claimed FP, and I said that he/she DIDN'T. Is that offtopic, or is it flamebait? BTW, that WAS my fp about a beowulf cluster of spam.

    I'm posting as AC to prevent my Positive karma from being hurt more. I was trying to get Excellent karma...

    Anyway, if I had modpoints and came across that post, I would mod it down to Score:0, Flamebait, not Score:-1, Offtopic.

    (plug type="blatant" subject="website")
    BTW, what this post got modded to is the name of my future website, which will be at http://score-1ot.no-ip.com. It does point to something, but it's my Geocities site.
    (/plug)

  78. ham is bad for you by Anonymous Coward · · Score: 0

    how about sorting the spam from the tofu and seitan

  79. Re:FINALLY GAY SEX IS LEGAL IN THE USA by Anonymous Coward · · Score: 0

    Hi, Pr0nboy. Now I know your secret. BTW, both parent and this post need modded down to -1 OT. THIS is an example of an offtopic post. BTW, this would also be either Flamebait or Troll.

    I'm just making sure that /. mods aren't stupid.

  80. Re:Why stop at classifying spam? Why not all e-mai by JohnGrahamCumming · · Score: 1

    You got it. That's the whole point of POPFile... Now, if Apple would give me a job I'd happily make Mail.app the coolest mail program out there by adding multi-bucket Bayesian mail sorting. John.

  81. Bayesian virus-filtering? by Stormbringer · · Score: 1

    (can Bayesian be applied to things like spam from Internet virii as well?)

    Yes. It all comes down to patterns and weights, like fuzzy logic.

    I have one account that regularly attracts Klez-type messages. Since I installed and trained SpamProbe, not a one has ended up in my in-basket, they've all gone into my spam-bucket. The last time I had to retrain on a false-positive was a week ago, so the filtering isn't uber-strict.

  82. I use Spam Assassin with Hotmail by esanbock · · Score: 3, Informative

    1. Use Debian
    2. apt-get install spamassassin
    3. apt-get install hotway
    4. Add this to your /etc/inetd.conf: pop3 stream tcp nowait nobody /usr/sbin/tcpd /usr/bin/hotwayd
    5. Switch to Kmail
    6. Menu: Settings|Configure Filters
    7. Add first filter.
    a. Select Match Any of the following
    b. Select size 250000
    c. Filter action: PIPE THROUGH spamassassin
    8. Add second filter
    a. Select 'Match any of the following'
    b. Type 'X-Spam-Flag' (no quotes)
    c. Select equals. Type 'YES'
    d. Filter action: Move to folder [your spam folder]
    9. It's crucial that the second filter happens after the first (use the arrows to the left).
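
    For the curious, the two KMail filters in steps 7 and 8 boil down to something like this Python sketch. It only shows the routing decision on a message that spamassassin has already processed; the folder names are placeholders, and I am assuming the size rule means "smaller than 250000 bytes".

```python
from email import message_from_bytes

MAX_SCAN_BYTES = 250000  # size from step 7b; assumed to mean "smaller than"

def should_scan(raw: bytes) -> bool:
    """First filter: only pipe reasonably small messages through the scanner."""
    return len(raw) < MAX_SCAN_BYTES

def target_folder(scanned: bytes, spam_folder: str = "spam", inbox: str = "inbox") -> str:
    """Second filter: route on the X-Spam-Flag header added by the scanner."""
    msg = message_from_bytes(scanned)
    flag = msg.get("X-Spam-Flag", "")
    return spam_folder if flag.strip().upper() == "YES" else inbox
```

    This is also why the order matters: the header the second filter looks for does not exist until the first filter has run.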

    There you have it - a spam-free Hotmail account. Not quite setup.exe, but this is Linux after all.

  83. And yet another one - Spammunition by Anonymous Coward · · Score: 0

    I've sampled it for a bit now, has some problems with some firewalls, but works pretty well for me so far.

    Spammunition http://www.upserve.com/spammunition/

  84. it does *not* work with Outlook Express by Anonymous Coward · · Score: 0

    ...so you may not be out of arguments yet!

  85. Re:Why stop at classifying spam? Why not all e-mai by scottme · · Score: 1

    This is exactly what SwiftFile does for Lotus Notes.

    OK, it's not the most widely used email client among Slashdot readers, but it is very extensible and this is just one example.

  86. I must have done something wrong... by chinton · · Score: 3, Funny
    I tried a few months ago to write a Spam filter in Python, but no matter what I tried, this was the only output I could receive:

    I DON'T LIKE SPAM! I DON'T LIKE SPAM! I DON'T LIKE SPAM!

  87. Bayesian tool for IMAP server downloadable now by berenddeboer · · Score: 1

    Well, people are not working on it, you can download it now already! :-)

    See http://www.pobox.com/~berend/emc/ for more details.

    --
    If I had a sig, I would put it here.
    1. Re:Bayesian tool for IMAP server downloadable now by H310iSe · · Score: 1

      wow, thanks! I'm actually burned out on spam-filters, I just had a marathon of testing but I've got your link and when I can next bring myself to look at the things I'll be installing yours first.

      --
      closed minded is as closed minded does
  88. Re:Why stop at classifying spam? Why not all e-mai by Anonymous Coward · · Score: 1, Interesting

    Makes sense. Consider classifying with a binary tree (e.g., first divide into spam and non-spam, then divide the non-spam into personal and business, and then divide personal into two groups, and so on). If each step can be done with 99% accuracy (something my experience with Bayesian spam filtering would indicate), then you could go 5 levels deep (32 buckets, if fully populated) and have roughly 95% accuracy. Not a "very slight" decrease, but still quite usable, and the cost of misclassification wouldn't be very high anyway.
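
    The arithmetic, as a quick Python sketch. It assumes the errors at each level are independent, which is my simplification and not something measured:

```python
def tree_accuracy(per_step_accuracy: float, depth: int) -> float:
    """Chance that every one of `depth` binary decisions is right."""
    return per_step_accuracy ** depth

for depth in range(1, 6):
    buckets = 2 ** depth  # a fully populated binary tree
    accuracy = tree_accuracy(0.99, depth)
    print(f"depth {depth}: up to {buckets:2d} buckets, accuracy ~{accuracy:.1%}")
```

    At depth 5 that is 0.99 to the fifth power, or about 95.1%.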

  89. SAproxy works for me by witort · · Score: 1

    I've found SAproxy to be a convenient way to run SpamAssassin with Outlook. You point your account at the locally running POP3 proxy, it filters your mail as it's coming in, prepending "*****SPAM*****" to the beginning of any suspected spam, then you add a rule in Outlook looking for that. Easy peasy (and freesy!)

  90. Not accurate enough by Moderation+abuser · · Score: 1

    I've tried multiple buckets with popfile/outclass, it gets a significant percentage of the classification correct, but it also gets enough wrong to be a serious pain in the arse when important mails get automatically misclassified into a low priority mailing list folder.

    --
    Government of the people, by corporate executives, for corporate profits.
  91. Anyone know of a Lotus Notes filter? by Moderation+abuser · · Score: 2, Interesting

    I've just been migrated to Notes from Outlook. Not a happy bunny till I discovered how powerful it is with stuff like agents.

    The only thing I'm missing now is a spam classification tool like popfile for notes.

    --
    Government of the people, by corporate executives, for corporate profits.
    1. Re:Anyone know of a Lotus Notes filter? by vawlk · · Score: 1

      I wrote one...works well for me and my company but it's not finished yet.

      http://www.spamwedge.net

    2. Re:Anyone know of a Lotus Notes filter? by scottme · · Score: 1

      You could look at SwiftFile from IBM's Alphaworks. IBM employees have some additional options and should check out the internal "stop-spam" forum for recommendations. One research product in particular regularly gets deserved high praise.

  92. What people don't realize about anti-spam tools by anti$pam · · Score: 1
    The economics of spamming:

    1) You gotta pay to send spam (there is a cost)

    2) Spammers make money on the responses they get.

    3) There is a breakeven point, you need a certain response rate in order to maintain the spamming rate -- otherwise you lose money.

    Anti-spamming tools help reduce the return rate (not saying we shouldn't investigate other options, just that this is something the "average joe/jane can do"). Enough people using anti spam tools will start putting a bite on the return rate, and put spammers out of business. Use the tools!!!
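
    To put numbers on the break-even idea, here is a small Python sketch. Every figure in it is made up for illustration (cost per message, profit per response, response rate); only the shape of the calculation matters.

```python
def break_even_response_rate(cost_per_message: float, profit_per_response: float) -> float:
    """Response rate at which revenue exactly covers the sending cost."""
    return cost_per_message / profit_per_response

def campaign_profit(messages: int, cost_per_message: float, profit_per_response: float,
                    response_rate: float, filtered_fraction: float = 0.0) -> float:
    """Profit when a fraction of the messages is filtered before anyone sees it."""
    seen = messages * (1.0 - filtered_fraction)
    revenue = seen * response_rate * profit_per_response
    return revenue - messages * cost_per_message

# Hypothetical figures: $0.0001 per message, $20 per response.
rate = break_even_response_rate(0.0001, 20.0)
print(f"break-even response rate: {rate:.6%}")
print(campaign_profit(1_000_000, 0.0001, 20.0, 1e-5))                         # unfiltered
print(campaign_profit(1_000_000, 0.0001, 20.0, 1e-5, filtered_fraction=0.6))  # 60% filtered
```

    With these invented numbers the campaign is profitable when nobody filters and loses money once 60% of recipients do, which is the whole argument in one line.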

    SpamAssassin

    SAProxy (windows)

  93. The spam war is over and we won... by Anonymous Coward · · Score: 0

    My tuppence worth:

    The Geeks have won the war against spam. Most common users do not get spam in their inboxes, since it is blocked by the ISPs and business servers.

    It still irritates the hell out of all Geeks, but I think it is time to announce to the world that the spam problem has been fixed and that sending spam doesn't work, so the spamvertisers can just as well stop sending it, since it just gets dumped anyway.

  94. Hang on a minute... by Alan+Partridge · · Score: 1

    Spam is "chopped pork and ham" right? So you can NEVER sort the Spam from the ham - Spam CONTAINS ham!

    You could sort Spam from beef, I should think.

    --
    That was classic intercourse!
  95. I'm a vegetarian by billstewart · · Score: 1
    ... you insensitive clod! ...

    Yes, I already did the no-karma-whore-bonus to mod myself down :-)

    --

    Bill Stewart
    New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
  96. client side wastes of time by mabu · · Score: 1

    For the Bazillionth Time, client side spam filtering does not address the problem. It's a waste of time.

    The more client-side filters that are in place, the more spam will increase. It's already a cat-and-mouse game between spammers and filters. In the mean time, almost 70% of existing mail traffic is UCE. Filters don't stop that at all.

    You have to stop spam at the source. You have to force spammers to act responsibly and not exploit network resources without appropriate compensation. Only when this is done, will any of us ever have any real control of the spam situation.

    Client-side spam filtering is a scheme devised by companies who actually have a financial interest in spam, and the more spam there is, the more money they make. Let's get smart about this and stop acting like the rest of the country's brain dead couch potatoes.

  97. The headers are the giveaway by SeanAhern · · Score: 1

    Don't delete those messages. Tell your filter to reclassify them as spam. See, the routing and the sender are things that you really don't look at, but your filter does.

    I noticed recently that my bayesian filter was blocking very minimalist messages. When I asked it what it was doing, it told me that the headers clued it in to its spam origins. I think the same guy had sent me something before.

    So the long and the short of it is: trust your filter. It'll keep getting it right...

    1. Re:The headers are the giveaway by anthony_baxter · · Score: 1

      Spambayes is scary-smart sometimes with the header-only scoring. I find that it nails the spams with just a URL almost all the time - viewing the clues, it's always header stuff.

  98. Ah spam! by pair-a-noyd · · Score: 1

    One of my favorites....

    URGENT BUSINESS PROPOSAL

    From:

    To:

    Date:
    06/26/03 10:04 pm

    From:Mr.Wilson Ajala
    Lagos-Nigeria.

    Attn:President/Ceo.

    STRICTLY CONFIDENTIAL BUSINESS PROPOSALRE: TRANSFER OF US$21.5 MILLION
    (TWENTY ONE MILLION,FIVE HUNDRED THOUSAND US DOLLARS ONLY).

    This letter is not intended to to cause any embarrassment but just to contact your esteem self-following the knowledge of your high repute and trustworthiness.By virtue of the nature of this transaction, I solicit for your confidentiality and secrecy.I know that the transaction of this magnitude will make one apprehensive and worried but I assure you that all will be well by the end of this transaction.

    I am a member of the Federal Government of Nigeria Contract Award and Monitoring Committee in the Nigeria National Petroleum Corporation (NNPC) Sometime ago, a contract was awarded to a foreign firm in NNPC by my Committee. This contract was over invoiced to the tune of US$21.5M.U.S.Dollars.This was done deliberately.The over-invoicing was a deal by my committee to benefit from the project.We now want to transfer this money which is in a suspense Account with NNPC into any Overseas Account which we expect you to provide for us.

    SHARE: -

    For providing the account where we shall remit the money into, you will be entitled to 20% of the money,75% will be for me and my partners while 5% has been mapped out from the total sum to cover any expenses that may be incurred by us during the course of this transfer, both locally and international expenses.

    It does not matter whether or not your company does contract projects of this nature described here.The assumption is that your company won the major contract and subcontracted it out to other companies. More often than not,big trading companies or firms of unrelated fields win major contracts and subcontracts to more specialized firms for execution of such contracts.

    We have strong reliable connections and contacts at the Central Bank of Nigeria, as well as the Federal Ministry of Finance and we have no doubt that all the money will be released and transferred if we get the necessary foreign partner to assist us in this deal. Therefore,when the business is successfully concluded we shall through our same connections withdraw all documents used from all the concerned Government Ministries for 100% security.

    We are ordinary civil servants and we will not want to miss this once in alifetime opportunity to get rich.We want this money to be transferred to the overseas Accounts for us, before the present Democratic Government starts Auditing all Federal Government ownedParastatals.

    Please contact me immediately through my email address whether or not you are interested in this deal.If you are not, it will enable me scout for another foreign partner to carry out this deal.But where you are interested,I will require the following details from you as soon as possible in order for us to commence communication immediately as time is of the essence in this business.

    1.YOUR NAME
    2.PHONE AND FAX NUMBERS

    I wait in anticipation of your fullest co-operation.

    Yours Faithfully,

    Wilson Ajala.

    Email: ajala8000@rediffmail.com

  99. shweet by autopr0n · · Score: 1

    There's already some kind of spam filter upstream of me. I get all these strange headers like 'perlmx-spamgauge: 67%' and a bunch of 'triggers' for the system, but it doesn't do me much good on Outlook (yeah, I know it's virusbait, but it's right there... : P). Outlook's built-in spam 'filters' suck ass. They give tons of false positives.

    --
    autopr0n is like, down and stuff.
  100. muttrc - Mod Parent Up by theTerribleRobbo · · Score: 0

    He's being helpful, damnit.

  101. Need the Bayes by Anonymous Coward · · Score: 0

    I've been running a beta of a program I found at www.inboxer.com, which, after looking at all of this dot hoopla, I checked and sure enough, it says that it is based on SpamBayes and gives credit to Mark Hammond. I got better results for blocking with their InBoxer beta version after receiving about 100 spam messages in the past two weeks, so I adjusted the sensitivity on Sunday, and I haven't seen a message since. It seems they have a control that lets you set the percentage point at which a message is considered spam. It completely knows when the man is sending me something pervy which is safe or if indeed is a penis enhancer. That my friends, is an excellent use of Bayesian

  102. Not it's not... by Goonie · · Score: 4, Insightful
    Client side filtering is not an ideal spam solution, but it's a good thing on both a micro and macro scale.
    • For the 99% of people who don't respond to spam, it makes no difference to the spammer whether they filter it or delete it manually. At an individual level, it reduces the amount of spam I have to deal with to manageable levels.
    • For the 1% that *do* respond to spam, having a filter might reduce the amount of spam they respond to and thus reduces the financial rewards for spammers. Anything that reduces the financial rewards for spammers is going to help reduce the spam levels.
    • If spammers are spending all their time and money figuring out how to beat filters, that's time and money that they're not using to send spam.

    As for your indictment of spam filtering providers, could you please explain where the spamassassin devteam is making money?

    My choices with regards to spam at the moment are simple. Use spamassassin or something like it, or wade through spam myself. I know which I'd prefer.

    --

    Any sufficiently advanced technology is indistinguishable from a rigged demo
    --Andy Finkel (J. Klass?)
    1. Re:Not it's not... by mabu · · Score: 1

      First, any time you implement filtering based on content you run the risk of blocking legitimate correspondence. This is unacceptable in many scenarios. And the only way to deal with the potential loss of legitimate mail is to have some facility to review messages flagged as "spam", which defeats the purpose of client-side filtering saving any time. It's just as easy to hit the delete key in that case.

      As I've said before, the issue here is inappropriate exploitation (and in many cases downright theft) of resources. If you stop spammers from their currently completely uncontrolled behavior, then everyone's performance will improve and the cost of conducting business and communication online will be dramatically reduced.

      There may be some filtering projects that aren't making money, but that's beside the point. Either way, it's plugging a leak in a boat by turning on a bilge pump, which just slows down the water draining in, but doesn't fix the problem, so the pump eventually has to be maintained in order to keep you from sinking. Eventually the Internet traffic noise level will cause everything to sink if we don't do something about it.

  103. Try InBoxer.com by LuckBeALady · · Score: 1

    Agreed, it does work and I just saw the beta download on betanews.com

  104. Give me a better email app, and I won by Angry+Pixie · · Score: 1

    After many years, I've gotten pretty good at identifying spam by scanning Sender and Subject headers, so I don't really see the benefit in using a Bayesian (or any other) filter. I'd rather just have a POP email client (in Windows) that will let me scan headers and remove unwanted emails on the POP server before downloading anything at all, and then only display those emails I download when I take an affirmative action to view a message body. In other words, clicking on the header should not launch a preview pane that loads the email body.

    For me the answer is a well-thought-out email client that assumes all email to be hostile, unlike Outlook, which assumes all email to be friendly. I've got an idea about chaining several apps together (fetchmail, Hotpop, etc.) to accomplish a lot of the functionality I want, but there's still no turnkey solution in Windows that I know of. :(
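
    For what it's worth, the "scan headers and delete on the server before downloading" part can be roughed out with Python's standard `poplib`. This is only a sketch: it assumes the server supports the optional POP3 TOP command, and `is_unwanted`, `purge_unwanted` and the `blocked_words` list are hypothetical names, not part of any existing client.

```python
import poplib
from email import message_from_bytes


def is_unwanted(header_lines, blocked_words):
    """Decide from the raw header lines alone whether to discard a message."""
    msg = message_from_bytes(b"\r\n".join(header_lines))
    seen = f"{msg.get('From', '')} {msg.get('Subject', '')}".lower()
    return any(word in seen for word in blocked_words)


def purge_unwanted(host, user, password, blocked_words):
    """Delete unwanted messages on the server without fetching any body."""
    conn = poplib.POP3_SSL(host)
    try:
        conn.user(user)
        conn.pass_(password)
        count, _size = conn.stat()
        deleted = []
        for num in range(1, count + 1):
            # TOP with 0 asks for the headers and zero lines of the body.
            _resp, lines, _octets = conn.top(num, 0)
            if is_unwanted(lines, blocked_words):
                conn.dele(num)
                deleted.append(num)
        return deleted
    finally:
        conn.quit()  # QUIT is what makes the server commit the deletions
```

    A human-in-the-loop version would print the From and Subject lines and ask before calling `dele`, which is closer to the manual scan described above.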

  105. Re:Great, but my problem is a bit more complicated by anthony_baxter · · Score: 1

    I find that they just get filtered fine by SB. The bounced spam still contains enough spammy tokens to get correctly filed away.

  106. Sendmail and SPAM by TheCovenant · · Score: 1

    If all of the postmasters out there would set up their mail servers properly, SPAM would be less of a problem.

    At the very least, we would know for sure exactly where it came from and could more easily filter from those domains.

    The solution to SPAM is already here. The fault for SPAM lies with the postmasters.

    !!!! Postmasters unite against SPAM !!!!!

    Send me mail if you're a postmaster and you want more info

    --
    cp -R /* /dev/null
  107. Re:Why stop at classifying spam? Why not all e-mai by ralphclark · · Score: 1

    What I really want is mail stored in a monolithic file with indexed access - with the *keys* placed in these buckets. With multiple categories applied to each email so that eg. a message from your brother which contains a new joke *and* clues you about a business opportunity can get filed under "family" "jokes" and "business" all at the same time without duplicating the underlying data.

    Actually, while we're at it, the proverbial *they* need to make this indexed data structure readable and writable by other applications. It would be nice if you could access *all* your data (contacts, post-it notes, appointment history, chat conversation logs, URL bookmarks, movie clips, everything) neatly, each item presented in the relevant format, from a single search query.

    Intelligent filtering is only half useful in its current form. The real benefits won't be perceptible until we are able to use it to index our stuff in a purely subject-oriented way rather than (as now) in a format-oriented way.
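
    A toy sketch of the "keys in buckets" idea, assuming an in-memory store rather than a real indexed file. `MailIndex` and its methods are invented for illustration; the point is only that each message is stored once and the categories hold nothing but keys.

```python
from collections import defaultdict


class MailIndex:
    """Store each message once; categories hold only message keys."""

    def __init__(self):
        self._messages = {}                   # key -> message text (single copy)
        self._by_category = defaultdict(set)  # category -> set of keys

    def add(self, key, text, categories):
        """File one message under any number of categories."""
        self._messages[key] = text
        for cat in categories:
            self._by_category[cat].add(key)

    def lookup(self, category):
        """Return the messages filed under a category, in key order."""
        keys = self._by_category.get(category, ())
        return [self._messages[k] for k in sorted(keys)]
```

    Filing the brother's email would then be `index.add(1, text, ["family", "jokes", "business"])`, and it shows up under all three lookups without being copied.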

  108. Re:Why stop at classifying spam? Why not all e-mai by Anonymous Coward · · Score: 0

    go ahead and do it anyway, and use it as sample code on your application to Apple

  109. They're getting smarter by qartis · · Score: 1

    The last piece of spam I actually downloaded from the server, cleverly enough, had the subject line "The server is down". Bastards.