Slashdot Mirror


How Apple's Mail.app Junk Filter Works

fmorgan writes "O'Reilly has now posted the second part of an article about Mac OS X Mail.app spam filtering, with more details on what this technology is (and isn't): 'Many myths have emerged about Mail's junk mail filter. No, it's not an extremely complex set of rules, no it doesn't look for keywords, and no, it doesn't use white magic ... Interestingly enough, the technology that underlies the Junk Mail filter began its life as an information retrieval system.'"

7 of 273 comments

  1. i know how by ShallowThroat · · Score: 5, Funny

    it's simple. it uses its extremely uninspired app name to scare away spam.

    --
    The "Insert Quote Here" line is almost as predictable as inserting an actual quote.
  2. subspaces? by thedogcow · · Score: 5, Funny

    The article mentions...

    "In mathematical terms, we would say that every document is a vector of n numbers or a point in a space with n dimensions."

    Funny. When I took linear algebra I was wondering if there was a practical application for this, and I guess there is... to eliminate penis enlargement advertisements.
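
As a rough illustration of the quoted sentence, here is a minimal sketch of turning documents into vectors of n numbers, one per word in the corpus, and comparing them by angle. This is not Mail.app's actual filter: the word-count weighting, the cosine comparison, the sample corpus, and all function names are assumptions made for the example.

```python
import math
from collections import Counter


def build_vocabulary(documents):
    """Return a sorted list of every distinct word in the corpus."""
    return sorted({word for doc in documents for word in doc.lower().split()})


def to_vector(document, vocabulary):
    """Map a document to a list of n word counts, one per vocabulary word."""
    counts = Counter(document.lower().split())
    return [counts[word] for word in vocabulary]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical three-message corpus, invented for this sketch.
corpus = [
    "cheap pills buy cheap pills now",
    "meeting notes for the linear algebra class",
    "buy pills now",
]
vocabulary = build_vocabulary(corpus)
vectors = [to_vector(doc, vocabulary) for doc in corpus]

# Each message is now a point in a space with len(vocabulary) dimensions.
spam_like = cosine_similarity(vectors[0], vectors[2])
unrelated = cosine_similarity(vectors[1], vectors[2])
```

In this toy corpus the third message points in nearly the same direction as the first and shares no words with the second, so `spam_like` is larger than `unrelated`.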

    --
    Yes! I listen to NYC Speedcore and do math at 3AM. I suggest you try it too.
  3. ...moderation ideas.... by j3ll0 · · Score: 5, Funny

    Why wouldn't a similar algorithm work to provide automated moderation? It seems to me that you could certainly identify clusters of words that indicate low-value posts.

    1. Re:...moderation ideas.... by wheresdrew · · Score: 5, Funny

      Yes, but the combination of too many all too common terms could cause the system to implode.

      "In Soviet Russia imagine a beowulf cluster of insensitive clods who don't RTFA because they're using linux to beat the GNAA to the first post."

  4. n-space by Anonymous Coward · · Score: 5, Funny

    Each document is in turn represented by a long string of numbers, one for each word in the corpus. In mathematical terms, we would say that every document is a vector of n numbers or a point in a space with n dimensions. This coordinate is then mapped onto a unique position in the goatse.cx photograph. If it lands in an objectionable region, the message is discarded as spam.

    It's an interesting method, but not having Mail.app myself, what I'm wondering is how well it works on the border regions; that is, when it is just barely objectionable. Say, on his leg.

  5. Re:how does it compare to Bayesian? by inburito · · Score: 5, Funny

    Wow. If your grandma is suggesting Viagra to you, I think your problems go way deeper than Bayesian misfirings.

  6. Re:Fast?!? by Alan · · Score: 5, Funny

    Dude, you seriously need to seek help for your mail-archiving condition :)

    Or, if nothing else, move some of the mail to a backup directory so the poor little imap server doesn't have to deal with YOUR pack-rat habits!